omrilotan / isbot

🤖/👨‍🦰 Detect bots/crawlers/spiders using the user agent string

Home Page:https://isbot.js.org/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

"Invalid regular expression" on older ES versions (lack of negative lookbehind support)

jlowcs opened this issue · comments

Steps to reproduce

install es-check

create a file with the regex from the release:

var regex = / daum[ /]| deusu\/| yadirectfetcher|(?:^| )site|(?:^|[^g])news|(?<! (?:channel\/|google\/))google(?!(app|\/google| pixel))|(?<! cu)bot(?:[^\w]|_|$)|(?<! ya(?:yandex)?)search|(?<!(?:lib))http|(?<![hg]m)score|@[a-z]|\(at\)[a-z]|\(github\.com\/|\[at\][a-z]|^12345|^<|^[\w \.\-\(?:\):]+(?:\/v?\d+(\.\d+)?(?:\.\d{1,10})?)?(?:,|$)|^[^ ]{50,}$|^active|^ad muncher|^amaya|^anglesharp\/|^avsdevicesdk\/|^bidtellect\/|^biglotron|^bot|^btwebclient\/|^clamav[ /]|^client\/|^cobweb\/|^coccoc|^custom|^ddg[_-]android|^discourse|^dispatch\/\d|^downcast\/|^duckduckgo|^facebook|^fdm[ /]\d|^getright\/|^gozilla\/|^hatena|^hobbit|^hotzonu|^hwcdn\/|^jeode\/|^jetty\/|^jigsaw|^linkdex|^metauri|^microsoft bits|^movabletype|^mozilla\/\d\.\d \(compatible;?\)$|^mozilla\/\d\.\d \w*$|^navermailapp|^netsurf|^offline explorer|^php|^postman|^postrank|^python|^rank|^read|^reed|^rest|^snapchat|^space bison|^svn|^swcd |^taringa|^thumbor\/|^tumblr\/|^user-agent:|^valid|^venus\/fedoraplanet|^w3c|^webbandit\/|^webcopier|^wget|^whatsapp|^xenu link sleuth|^yahoo|^yandex|^zdm\/\d|^zoom marketplace\/|^{{.*}}$|adbeat\.com|appinsights|archive|ask jeeves\/teoma|bit\.ly\/|bluecoat drtr|browsex|burpcollaborator|capture|catch|check|chrome-lighthouse|chromeframe|classifier|cloud|crawl|cryptoapi|dareboost|datanyze|dataprovider|dejaclick|dmbrowser|download|evc-batch\/|feed|firephp|freesafeip|gomezagent|headless|httrack|hubspot marketing grader|hydra|ibisbrowser|images|inspect|iplabel|ips-agent|java(?!;)|library|mail\.ru\/|manager|monitor|neustar wpm|nutch|offbyone|optimize|pageburst|pagespeed|parser|perl|phantom|pingdom|powermarks|preview|proxy|ptst[ /]\d|reader|reputation|resolver|retriever|rexx;|rigor|robot|rss|scan|scrape|server|sogou|sparkler\/|speedcurve|spider|splash|statuscake|stumbleupon\.com|supercleaner|synapse|synthetic|torrent|tracemyfile|transcoder|twingly recon|url|virtuoso|wappalyzer|webglance|webkit2png|whatcms\/|wordpress|zgrab/i;

run es-check es2017 on that file

Expected behaviour

It should pass the check, or the README should explicitely say that older browsers are not supported.

Actual behaviour

It does not pass the check

Thank you for the Issue.
I will list supported items for version 4 and review using es-check.
It should match supported devices for regular expression "lookbehind" feature.
Isbot version 3 supports engines that does not support this feature.

Do you plan on maintaining v3 with new/updated bot user agents?

That's a very good question.
The non-lookbehind patterns include some false-positives. For example, a lot of android apps include the webview substring "Channel/googleplay", where the simplistic "google" substring pattern will recognise as a bot. That's why I think supporting version 3 pattern updates is not a priority.

Regular expression lookbehind is now widely supported.

Source: https://caniuse.com/?search=Lookbehind

Instead - I might create a shorter, more simplistic and more optimistic option for engines that do not support lookbehind as a fallback. This could mitigate your case as well.

For example:

let pattern;
try {
 pattern = new Regexp("(?<! cu)bot(?:[^\\w]|_|$)", "i");
} catch (error) {
 pattern = /bot/i;
}

What is your opinion about this feature?

That's a very good question. The non-lookbehind patterns include some false-positives. For example, a lot of android apps include the webview substring "Channel/googleplay", where the simplistic "google" substring pattern will recognise as a bot. That's why I think supporting version 3 pattern updates is not a priority.

Regular expression lookbehind is now widely supported.

True, although Safari only started supporting it in 2023...

A quarter of our users seem to be on older browser versions (ie over 4 versions old), for which we serve a less optimized bundle. As we're providing a platform for social change, we want to be mindful of the fact that not everyone have access to modern browsers, but we still want them to be able to use the platform.

Instead - I might create a shorter, more simplistic and more optimistic option for engines that do not support lookbehind as a fallback. This could mitigate your case as well.

For example:

let pattern;
try {
 pattern = new Regexp("(?<! cu)bot(?:[^\\w]|_|$)", "i");
} catch (error) {
 pattern = /bot/i;
}

What is your opinion about this feature?

I like this idea!

#237

This changeset adds a fallback for engines that do not support lookbehind in regular expressions. This fallback is less accurate but catches ~78% of bots with less than 1% false-positive.
It is also exposed as "isbotNaive" export.

Happy if you would take a look

The only regretful side effect is the pattern can not be tree-shaken for users who don't use the "isbot" named export.

The only regretful side effect is the pattern can not be tree-shaken for users who don't use the "isbot" named export.

Would using a IIFE fix that?

export const pattern = (() => {
  try {
	return new RegExp(expression, "i");
  } catch (error) {
	return naiveExpression;
  }
})();

No, because iife run anyways it will be included in all output versions.
I'm not very worried about this, most of usage is for "isbot" function anyway.

No, because iife run anyways it will be included in all output versions. I'm not very worried about this, most of usage is for "isbot" function anyway.

I think it depends on whether you set "sideEffects": false in your package.json, but I might be wrong.

https://webpack.js.org/guides/tree-shaking/

No, because iife run anyways it will be included in all output versions. I'm not very worried about this, most of usage is for "isbot" function anyway.

I think it depends on whether you set "sideEffects": false in your package.json, but I might be wrong.

https://webpack.js.org/guides/tree-shaking/

It's about what runs in-scope and what has to run on "module load".
I was able to separate the two exports by building both the string and the regex as individual exports.

Version 4.4.0 should be okay when importing isbot.

import { isbot } from  "isbot"

I greatly appreciate your involvement in this issue. It helped me reach a better outcome.

Thank you for fixing this so quickly, this makes it possible for us to upgrade to v4 again. :)

@omrilotan It appears this is still an issue for Safari versions that don't support lookbehind (In my case, Safari 15.5).

@bfaulk96 appologies for the delay. Proposed resolution: #243

@bfaulk96 appologies for the delay. Proposed resolution: #243

No worries! Thanks for the quick turnaround, though! #243 looks like it would do the trick!

Resolution of isbot@5 tested on Safari 15.6

Can confirm, isbot@5 is working now for my Safari 15.5 use-case. Thanks again for the quick turnaround on that!