mssola / user_agent

This project has been moved, check the file!

Bots not marked as bots

grotos opened this issue

Here is a list of UserAgent strings that are not marked as bots but in fact are:

"ADmantX Platform Semantic Analyzer - ADmantX Inc. - -"
"Apache-HttpClient/4.2.3 (java 1.5)"
"Apache-HttpClient/4.3 (java 1.5)"
"Apache-HttpClient/4.3.3 (java 1.5)"
"CATExplorador/1.0beta (sistemes at domini dot cat;"
"Comodo Spider 1.2"
"Comodo-Webinspector-Crawler 2.1"
"Faraday v0.8.9"
"GoogleBot 1.0"
"HTTPClient/1.0 (, ruby 1.9.3 (2013-06-27))"
"HTTPClient/1.0 (2.4.0, ruby 1.9.3 (2013-06-27))"
"Mozilla/2.0 (compatible; crw)"
"Mozilla/3.0 (compatible; Indy Library)"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.2)"
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDR; .NET4.0C; .NET4.0E; .NET CLR 1.1.4322; Tablet PC 2.0); 360Spider"
"Mozilla/4.0 (compatible; Netcraft Web Server Survey)"
"Mozilla/4.0 (compatible; Synapse)"
"Mozilla/4.0 (compatible; Win32; WinHttp.WinHttpRequest.5)"
"Mozilla/4.0 (compatible;"
"Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)"
"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36 AlexaToolbar/alxg-3.1"
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1; 360Spider"
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1; 360Spider(compatible; HaosouSpider;"
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b"
"Mozilla/5.0 (Windows NT 6.1; Win64; x64) KomodiaBot/1.0"
"Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google (+"
"Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon"
"Mozilla/5.0 (Windows NT 6.2; WOW64) Runet-Research-Crawler (;"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv: Gecko/2009073022 Firefox/3.5.2 (.NET CLR 3.5.30729) Survey/2.3 ("
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv: Gecko/2009073022 Firefox/3.5.2 (.NET CLR 3.5.30729) SurveyBot/2.3 (DomainTools)"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; )  Firefox/; 360Spider"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:  Firefox/; 360Spider"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv: Gecko/20070312 Firefox/; 360Spider"
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.11 (KHTML, like Gecko) DumpRenderTree/ Safari/536.11"
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/27.0.1453 Safari/537.36"
"Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20100101 Firefox/21.0 mShots"
"Mozilla/5.0 (compatible; Google-Site-Verification/1.0)"
"Mozilla/5.0 (compatible; IstellaBot/1.18.81 +"
"Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 5.1) ("
"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0); 360Spider"
"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0); 360Spider(compatible; HaosouSpider;"
"Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0;"
"Mozilla/5.0 (compatible; Owler/0.4; +; )"
"Mozilla/5.0 (compatible; PageAnalyzer/1.1;)"
"Mozilla/5.0 (compatible; XML Sitemaps Generator; Gecko XML-Sitemaps/1.0"
"Mozilla/5.0 (compatible; archive.org_bot +"
"Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +"
"Mozilla/5.0(compatible; Sosospider/2.0; +"
"Porkbun/Mustache (Website Analysis;;"
" Crawler"
"Wget/1.12 (linux-gnu)"
"Wget/1.13.4 (linux-gnu)"
" Bot"
"WinInet Test"
"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/ zlib/1.2.3 libidn/1.18 libssh2/1.2.2"
"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/ zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
"python-requests/1.1.0 CPython/2.7.4 Linux/3.8.0-19-generic"
"python-requests/1.2.0 CPython/2.7.4 Linux/3.8.0-33-generic"
"python-requests/2.2.1 CPython/2.7.6 Linux/3.13.0-24-generic"
"spotinfluence/Nutch-1.4 (Spot Influence crawler;; hello at spotinfluence dot com)"
" Crawler"

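Many of the strings above share a few telltale substrings, so a keyword check catches a good part of the list. The following is only a minimal sketch of that idea, not how the library detects bots: `botHints` and `looksLikeBot` are hypothetical names, and the hint list is an assumption drawn from the entries above.

```go
package main

import (
	"fmt"
	"strings"
)

// botHints is a hypothetical, incomplete list of lowercase substrings taken
// from the user agents reported above. It is not the library's own list.
var botHints = []string{
	"bot", "crawler", "spider", "survey",
	"wget/", "curl/", "python-requests/", "httpclient/", "faraday",
}

// looksLikeBot reports whether ua contains any of the hints, ignoring case.
func looksLikeBot(ua string) bool {
	lower := strings.ToLower(ua)
	for _, hint := range botHints {
		if strings.Contains(lower, hint) {
			return true
		}
	}
	return false
}

func main() {
	for _, ua := range []string{
		"Comodo Spider 1.2",
		"Wget/1.12 (linux-gnu)",
		"Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20100101 Firefox/21.0 mShots",
	} {
		fmt.Printf("%-5v %s\n", looksLikeBot(ua), ua)
	}
}
```

A check like this still misses entries such as the mShots one, which carries none of the hints, so it can only complement a list of known agents.
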
Some more data from me
Second revision is after #26


This one also appears to fail, possibly because it has more than one section (so the bot pattern doesn't match) and because it uses HTTPS (the site regexp appears to match only http://...).

Slackbot-LinkExpanding 1.0 (+
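
If the HTTPS guess is right, a pattern that accepts both schemes would cover that part of the problem. This is a sketch under that assumption only: `siteRe` and `botSite` are hypothetical names, the pattern is not the library's actual regexp, and the example.com URLs are placeholders because the real ones are cut off in this thread.

```go
package main

import (
	"fmt"
	"regexp"
)

// siteRe is a hypothetical pattern: it accepts http or https and stops at
// whitespace, ';' or ')' so the closing parenthesis is not captured.
var siteRe = regexp.MustCompile(`https?://[^\s;)]+`)

// botSite returns the first URL found in ua, or "" if there is none.
func botSite(ua string) string {
	return siteRe.FindString(ua)
}

func main() {
	// Placeholder user agents; the URLs are not taken from the report.
	fmt.Println(botSite("ExampleBot-LinkExpanding 1.0 (+https://example.com/robots)"))
	fmt.Println(botSite("Mozilla/5.0 (compatible; ExampleBot/2.1; +http://example.com/bot.html)"))
}
```

This only addresses the scheme; the multi-section name would still need its own handling.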

Here are a couple more user agents I consider bots, in addition to the results from Bot():

AppEngine-Google; (+; appid: s~something)
Slack-ImgProxy (+