mssola / user_agent

This project has been moved, check the README.md file!

Home Page: https://github.com/mssola/useragent

Bots not marked as bots

grotos opened this issue:

Here is a list of User-Agent strings that are not marked as bots but in fact are:

"ADmantX Platform Semantic Analyzer - ADmantX Inc. - www.admantx.com - support@admantx.com"
"Apache-HttpClient/4.2.3 (java 1.5)"
"Apache-HttpClient/4.3 (java 1.5)"
"Apache-HttpClient/4.3.3 (java 1.5)"
"Application"
"CATExplorador/1.0beta (sistemes at domini dot cat; http://domini.cat/catexplorador.html)"
"COMODOSpider/Nutch-1.2"
"Comodo Spider 1.2"
"Comodo-Webinspector-Crawler 2.1"
"Faraday v0.8.9"
"GigablastOpenSource/1.0"
"GoogleBot 1.0"
"Google_Analytics_Snippet_Validator"
"HTTPClient/1.0 (2.3.4.1, ruby 1.9.3 (2013-06-27))"
"HTTPClient/1.0 (2.4.0, ruby 1.9.3 (2013-06-27))"
"Java/1.6.0_29"
"Java/1.6.0_45"
"Java/1.7.0_09"
"Java/1.7.0_21"
"Java/1.7.0_40"
"Java/1.7.0_60-ea"
"Java/1.7.0_65"
"Mozilla/2.0 (compatible; crw)"
"Mozilla/3.0 (compatible; Indy Library)"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.2)"
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDR; .NET4.0C; .NET4.0E; .NET CLR 1.1.4322; Tablet PC 2.0); 360Spider"
"Mozilla/4.0 (compatible; Netcraft Web Server Survey)"
"Mozilla/4.0 (compatible; Synapse)"
"Mozilla/4.0 (compatible; Win32; WinHttp.WinHttpRequest.5)"
"Mozilla/4.0 (compatible; http://search.thunderstone.com/texis/websearch/about.html)"
"Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)"
"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36 AlexaToolbar/alxg-3.1"
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1; 360Spider"
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1; 360Spider(compatible; HaosouSpider; http://www.haosou.com/help/help_3_2.html)"
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b"
"Mozilla/5.0 (Windows NT 6.1; Win64; x64) KomodiaBot/1.0"
"Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google (+https://developers.google.com/+/web/snippet/)"
"Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon"
"Mozilla/5.0 (Windows NT 6.2; WOW64) Runet-Research-Crawler (itrack.ru/research/cmsrate; rating@itrack.ru)"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv:1.9.0.13) Gecko/2009073022 Firefox/3.5.2 (.NET CLR 3.5.30729) Survey/2.3 (fr.wsdata.com)"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv:1.9.0.13) Gecko/2009073022 Firefox/3.5.2 (.NET CLR 3.5.30729) SurveyBot/2.3 (DomainTools)"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; )  Firefox/1.5.0.11; 360Spider"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11)  Firefox/1.5.0.11; 360Spider"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11) Gecko/20070312 Firefox/1.5.0.11; 360Spider"
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.11 (KHTML, like Gecko) DumpRenderTree/0.0.0.0 Safari/536.11"
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/27.0.1453 Safari/537.36"
"Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20100101 Firefox/21.0 WordPress.com mShots"
"Mozilla/5.0 (compatible; Google-Site-Verification/1.0)"
"Mozilla/5.0 (compatible; IstellaBot/1.18.81 +http://www.tiscali.it/)"
"Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 5.1) (http://name911.com)"
"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0); 360Spider"
"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0); 360Spider(compatible; HaosouSpider; http://www.haosou.com/help/help_3_2.html)"
"Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0; +info@netcraft.com)"
"Mozilla/5.0 (compatible; Owler/0.4; +; )"
"Mozilla/5.0 (compatible; PageAnalyzer/1.1;)"
"Mozilla/5.0 (compatible; XML Sitemaps Generator; http://www.xml-sitemaps.com) Gecko XML-Sitemaps/1.0"
"Mozilla/5.0 (compatible; archive.org_bot +http://www.archive.org/details/archive.org_bot)"
"Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
"Mozilla/5.0(compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)"
"Mozilla/5.0(compatible;Sosospider/2.0;+http://help.soso.com/webspider.htm)"
"Porkbun/Mustache (Website Analysis; http://porkbun.com; tech@porkbun.com)"
"PycURL/7.23.1"
"Python-urllib/1.17"
"Python-urllib/2.6"
"Python-urllib/2.7"
"Python-urllib/3.4"
"Robosourcer/1.0"
"Ruby"
"Sosospider+(+http://help.soso.com/webspider.htm)"
"W3C_Validator/1.3 http://validator.w3.org/services"
"WebTarantula.com Crawler"
"Wget/1.12 (linux-gnu)"
"Wget/1.13.4 (linux-gnu)"
"WhatWeb/0.4.8-dev"
"Who.is Bot"
"WinInet Test"
"YisouSpider"
"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.13.1.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2"
"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
"curl/7.35.0"
"ip-web-crawler.com"
"panscient.com"
"python-requests/1.1.0 CPython/2.7.4 Linux/3.8.0-19-generic"
"python-requests/1.2.0 CPython/2.7.4 Linux/3.8.0-33-generic"
"python-requests/2.2.1 CPython/2.7.6 Linux/3.13.0-24-generic"
"spotinfluence/Nutch-1.4 (Spot Influence crawler; http://spotinfluence.com; hello at spotinfluence dot com)"
"visaduhoc.info Crawler"
"wsr-agent/1.0"

Some more data from me: https://gist.github.com/crackcomm/40bad73724f14369b602
The second revision is from after #26.

commented

This one also appears to fail, possibly because it has more than one section (so the bot check doesn't match) and because it uses HTTPS (the site regexp appears to match only http://...).

Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)
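
To illustrate the HTTPS point, here is a minimal sketch using only Go's standard `regexp` package. Both patterns are my own stand-ins, not the library's actual regexp: `httpOnly` mimics a pattern that accepts only `http://`, and `eitherScheme` (a hypothetical replacement) accepts both schemes. The helper name `hasBotURL` is also made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// httpOnly is a stand-in for a pattern that only recognises "+http://" URLs.
// It is an assumption for illustration, not the library's real regexp.
var httpOnly = regexp.MustCompile(`\+http://\S+`)

// eitherScheme is a hypothetical fix: "s?" makes the trailing "s" optional,
// so both "+http://" and "+https://" contact URLs are recognised.
var eitherScheme = regexp.MustCompile(`\+https?://\S+`)

// hasBotURL reports whether the User-Agent string carries a "+URL" contact
// marker, which crawlers commonly include.
func hasBotURL(ua string) bool {
	return eitherScheme.MatchString(ua)
}

func main() {
	slack := "Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)"
	fmt.Println("http-only pattern matches:", httpOnly.MatchString(slack))
	fmt.Println("http/https pattern matches:", hasBotURL(slack))
}
```

A "+URL" marker alone would not catch every agent listed in this issue (for example the bare `curl/...` or `Java/...` strings), so a check like this could only supplement other rules.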

Here are a couple more user agents I consider bots, in addition to those reported by Bot():

AppEngine-Google; (+http://code.google.com/appengine; appid: s~something)
Slack-ImgProxy (+https://api.slack.com/robots)
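
Until agents like these are recognised upstream, one possible workaround is to layer your own list on top of the library's verdict. The sketch below is only an assumption about how that might look: `isBotExtended` and `extraBotPrefixes` are hypothetical names, and the `base` callback stands in for whatever wraps the library's `Bot()` result (stubbed out here so the example runs with the standard library alone).

```go
package main

import (
	"fmt"
	"strings"
)

// extraBotPrefixes holds the two agents mentioned above. The list is
// illustrative and would need to be maintained by the caller.
var extraBotPrefixes = []string{
	"AppEngine-Google",
	"Slack-ImgProxy",
}

// isBotExtended returns true if the base check flags the agent, or if the
// agent string starts with one of the extra prefixes.
func isBotExtended(ua string, base func(string) bool) bool {
	if base(ua) {
		return true
	}
	for _, prefix := range extraBotPrefixes {
		if strings.HasPrefix(ua, prefix) {
			return true
		}
	}
	return false
}

func main() {
	// Stub standing in for the library's bot check; it flags nothing.
	neverBot := func(string) bool { return false }

	for _, ua := range []string{
		"AppEngine-Google; (+http://code.google.com/appengine; appid: s~something)",
		"Slack-ImgProxy (+https://api.slack.com/robots)",
		"Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/21.0",
	} {
		fmt.Printf("%t  %s\n", isBotExtended(ua, neverBot), ua)
	}
}
```

Prefix matching is deliberately strict here, since a substring match on short names risks flagging ordinary browsers.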