WhichBrowser / Parser-PHP

Browser sniffing gone too far — A useragent parser library for PHP

Home page: http://whichbrowser.net

Reduce false positives by detecting two variables for some bots

summercms opened this issue

Some bots are detected by a single regex that matches a common word or domain, and that can produce a lot of false positives. Requiring two patterns to match in the user agent would reduce them. In this example, we first match the common pattern compatible; and then apply a custom regex for Blogger Bot. More bots could be added inside the compatible; check if their single pattern is likely to produce false positives.

        /* Reduce false positives by requiring two patterns:
         * a bot is only detected when the user agent contains
         * `compatible;` and also matches the bot's own regex
         */

        if (preg_match('/compatible;/u', $ua)) {
            /* Detect Blogger Bot */
            if (preg_match('/blogger\.com/u', $ua)) {
                $this->data->browser->name = 'Blogger Bot';
                $this->data->device->type = Constants\DeviceType::BOT;
            }

            /* ... other bots prone to false positives could be checked here */
        }

The code above would detect this user agent:

Mozilla/5.0 (compatible; blogger.com)

It first matches compatible; and then matches blogger.com.

This would replace the single-pattern entry proposed in the PR:

[ 'name' => 'Blogger Bot', 'id' => 'blogger', 'regexp' => '/blogger\.com/u' ],

That entry would produce a false positive if someone used a user agent such as:

https://example.blogger.com bad bot
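To illustrate the difference, here is a small standalone sketch that runs both user agents from this issue through a single-pattern check (like the PR entry) and through the two-pattern check. The function names matchesSinglePattern() and matchesTwoTokens() are hypothetical and are not part of WhichBrowser.

```php
<?php

/*
 * Hypothetical comparison of the two approaches.
 * matchesSinglePattern() mimics the PR's one-regex entry;
 * matchesTwoTokens() mimics the nested compatible; check.
 */
function matchesSinglePattern(string $ua): bool
{
    return preg_match('/blogger\.com/u', $ua) === 1;
}

function matchesTwoTokens(string $ua): bool
{
    // Both the generic token and the bot-specific pattern must be present
    return preg_match('/compatible;/u', $ua) === 1
        && preg_match('/blogger\.com/u', $ua) === 1;
}

$userAgents = [
    'Mozilla/5.0 (compatible; blogger.com)', // genuine Blogger Bot
    'https://example.blogger.com bad bot',   // false positive for the single pattern
];

foreach ($userAgents as $ua) {
    printf(
        "%-40s single=%s two-pattern=%s\n",
        $ua,
        matchesSinglePattern($ua) ? 'yes' : 'no',
        matchesTwoTokens($ua) ? 'yes' : 'no'
    );
}
```

The genuine Blogger Bot user agent matches both checks, while the example.blogger.com string matches only the single-pattern check.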

Detection for other bots could also be moved inside this if block to avoid similar false positives.
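One way to scale this to more bots, sketched here under assumptions, is to keep the PR's list format and add an optional guard pattern per entry. The 'requires' key, the BOTS constant and detectBot() are hypothetical; WhichBrowser does not define them.

```php
<?php

/*
 * Hypothetical data-driven variant: each entry follows the PR's
 * ['name', 'id', 'regexp'] shape, plus an assumed 'requires' key
 * holding a pattern that must also match. 'requires' is not an
 * existing WhichBrowser field.
 */
const BOTS = [
    [
        'name'     => 'Blogger Bot',
        'id'       => 'blogger',
        'regexp'   => '/blogger\.com/u',
        'requires' => '/compatible;/u',
    ],
    // Further bots prone to false positives would get a 'requires' entry too
];

function detectBot(string $ua): ?array
{
    foreach (BOTS as $bot) {
        // Skip the entry if it has a guard pattern that does not match
        if (isset($bot['requires']) && !preg_match($bot['requires'], $ua)) {
            continue;
        }
        // Guard passed (or absent): apply the bot-specific regex
        if (preg_match($bot['regexp'], $ua)) {
            return $bot;
        }
    }
    return null;
}

$bot = detectBot('Mozilla/5.0 (compatible; blogger.com)');
echo $bot === null ? 'no bot' : $bot['id'], "\n";
```

Entries without a 'requires' key behave like the existing single-pattern entries, so only the bots known to cause false positives would need the extra guard.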