Web Crawlers

The following addresses belong to crawlers that have been banned by the IT Skeptic:
crawler.bloglines.com - controversial banning, but they leave error messages everywhere and eat huge resource
A hungry crawler from irldotcsdottamudotedu, doing "research". Piss off.
out of Turkey
out of Japan
USA Japanese. Big resource eaters. Look like spammers
Chinese Sogou spider corp(dot)sohu(dot)com(slash)20051130(slash)n240842344(dot)shtml
Dunno who it is, comes out of China, but very hungry spider
China Railway corporation! Dear me. Ate 50% more than Google. And right after the Chinese Govt featured in a spoof on the IT Skeptic. Chinese spooks I reckon.
So they can combine sex and travel, and ... out of Vietnam
Dunno who dwhl.de are but they provide an anonymous ftp server too
87.99.76.% from Latvia
89.34.173.% from Romania
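
For anyone wanting to test an address against wildcard entries like the last two, here is a minimal Python sketch. It assumes the "%" stands for "any value in the remaining octets", so 87.99.76.% is treated as the network 87.99.76.0/24. That reading, and the names BANNED_PATTERNS, pattern_to_network and is_banned, are my assumptions for illustration, not how the site's ban list is actually implemented.

```python
import ipaddress

# The two wildcard entries from the list above. Reading "%" as
# "any value in the trailing octets" is an assumption of this sketch.
BANNED_PATTERNS = ["87.99.76.%", "89.34.173.%"]


def pattern_to_network(pattern: str) -> ipaddress.IPv4Network:
    """Turn a pattern such as '87.99.76.%' into the network 87.99.76.0/24."""
    octets = pattern.split(".")
    if len(octets) != 4:
        raise ValueError(f"expected four dot-separated parts: {pattern!r}")
    fixed = [o for o in octets if o != "%"]
    # Only trailing wildcards are handled, e.g. '10.%.%.%' but not '10.%.1.2'.
    if octets[:len(fixed)] != fixed:
        raise ValueError(f"wildcards must come last: {pattern!r}")
    padded = fixed + ["0"] * (4 - len(fixed))
    prefix_length = 8 * len(fixed)
    return ipaddress.IPv4Network(".".join(padded) + f"/{prefix_length}")


def is_banned(address: str, patterns=BANNED_PATTERNS) -> bool:
    """Return True if the address falls inside any banned pattern."""
    addr = ipaddress.ip_address(address)
    return any(addr in pattern_to_network(p) for p in patterns)
```

Under those assumptions, is_banned("87.99.76.12") is true and is_banned("87.99.77.12") is false. Converting each pattern to a network keeps the comparison numeric, so an address like 87.99.760.1 is rejected as invalid rather than matched as a string prefix.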

These crawlers are permitted to access this site:
News gator. High hits but low resource usage - nice people.
Despite being by far the biggest remaining muncher of resource, who can say no to Google? But man Googlebot is hungry!!
"Burning Door", a.k.a feedburner. Better let them in, they serve my RSS feeds!
Nice light crawler
A Mickeysoft bot, average hunger but hits lightly

This list does not include known spammers who have been blocked. "Crawlers" are grouped (by me) by their high hit rates and/or high resource consumption: their intent may or may not be legitimate.
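
As a rough illustration of that kind of grouping, the Python sketch below tallies hits and bytes served per client from web server log lines. It assumes the logs are in Common Log Format, which may not match this site's server, and the sample lines use reserved documentation addresses rather than real visitors. The names LOG_LINE, tally and hungriest are made up for the example.

```python
import re
from collections import defaultdict

# Common Log Format: host ident user [time] "request" status bytes
# (assumed here; a different log format needs a different pattern).
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (\d+|-)')


def tally(lines):
    """Return {host: [hits, bytes_served]} for the lines that parse."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        host, size = match.groups()
        totals[host][0] += 1
        # A "-" in the size field means no body was sent.
        totals[host][1] += 0 if size == "-" else int(size)
    return dict(totals)


def hungriest(totals, top=5):
    """Hosts sorted by bytes served, largest first."""
    return sorted(totals.items(), key=lambda kv: kv[1][1], reverse=True)[:top]


# Hypothetical sample lines using documentation-only addresses.
SAMPLE = [
    '192.0.2.10 - - [01/Jan/2008:00:00:01 +0000] "GET /rss.xml HTTP/1.1" 200 5120',
    '192.0.2.10 - - [01/Jan/2008:00:00:02 +0000] "GET / HTTP/1.1" 200 20480',
    '198.51.100.7 - - [01/Jan/2008:00:00:03 +0000] "GET /node/1 HTTP/1.1" 304 -',
]
```

Sorting by bytes rather than hits reflects the distinction drawn in the lists above, where a crawler can hit often yet use little resource.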

Many thanks to domaintools.com and googleprr.com and of course standard Google search. All powerful tools in establishing the bona fides - or not - of an address.
