Detection based on Version Information in JavaScript Files

Question


Phylu opened this issue 4 years ago.

Within WhatWeb plugins, I already have several ways to detect frameworks and their versions: regexes matched against the page source, or the presence of certain files. In addition to that, I would like to:

Check for JavaScript files that are included in the index page.
Check each of those JavaScript files for version information (e.g. based on a regex).
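A minimal sketch of the first step (finding the JavaScript files included in the index page), assuming the page HTML has already been fetched. It uses only Python's standard html.parser and urljoin; the function name script_urls is hypothetical and not part of WhatWeb, and it ignores <base href> and any scripts injected at runtime.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptSrcParser(HTMLParser):
    """Collects the src attribute of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <SCRIPT SRC=...> works too.
        if tag == "script":
            src = dict(attrs).get("src")
            if src:  # skips inline scripts and valueless src attributes
                self.srcs.append(src)


def script_urls(html, page_url):
    """Return absolute URLs of external scripts, in page order, without duplicates."""
    parser = ScriptSrcParser()
    parser.feed(html)
    parser.close()
    urls = []
    for src in parser.srcs:
        # Resolve relative and protocol-relative paths against the page URL.
        url = urljoin(page_url, src.strip())
        if url not in urls:
            urls.append(url)
    return urls
```

For example, script_urls('<script src="/js/vendor.js"></script>', "https://example.com/app/") gives ["https://example.com/js/vendor.js"].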

These JavaScript files (which are often named something like main.js or vendor.js) frequently contain comments like the following:

	 * http://jquery.com/
	 *
	 * Includes Sizzle.js
	 * http://sizzlejs.com/
	 *
	 * Copyright jQuery Foundation and other contributors
	 * Released under the MIT license
	 * http://jquery.org/license
	 *
	 * Date: 2016-01-08T20:02Z
	 */

Is there a way to implement something like this within a plugin? Or could it be done for all existing plugins, so that their regexes are also applied "recursively" to the included JS files?

Andrew Horton · Answer 1 · Thu Oct 01 2020 21:13:11 GMT+0800 (China Standard Time)

Surprisingly, I was just thinking 💡 about how to add JavaScript library detection to WhatWeb.
I'll just dump my thoughts here so we can kick off a discussion.

We will need:

An engine to recursively discover JavaScript URLs
A scanner that matches JavaScript content against patterns
A collection of patterns for JS libraries
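One way the scanner and the pattern collection could fit together, sketched in Python as a stand-alone illustration rather than WhatWeb's plugin API. The two entries in JS_PATTERNS are hypothetical examples (the jQuery one assumes a banner such as "/*! jQuery v3.5.1 | ..."), and every pattern runs over the whole file because a single bundle may contain several libraries.

```python
import re

# Hypothetical pattern collection: library name -> regex whose first group is the version.
# Illustrative only; these have not been validated against real library builds.
JS_PATTERNS = {
    "jQuery": re.compile(r"jQuery (?:JavaScript Library )?v(\d+(?:\.\d+)+)"),
    "Angular": re.compile(r"@license Angular v(\d+(?:\.\d+)+)"),
}


def scan_js(body, patterns=JS_PATTERNS):
    """Return {library: versions sorted as strings} for every pattern that matches."""
    found = {}
    for name, regex in patterns.items():
        versions = {m.group(1) for m in regex.finditer(body)}
        if versions:
            found[name] = sorted(versions)
    return found


def scan_all(bodies, patterns=JS_PATTERNS):
    """Apply the same patterns to each fetched JS body; bodies maps URL -> source text."""
    results = {}
    for url, body in bodies.items():
        hits = scan_js(body, patterns)
        if hits:  # only report files where something matched
            results[url] = hits
    return results
```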

Things that make JavaScript unique:

Minification - JS is compressed, with whitespace and comments removed (and comments make great patterns)
Webpack, Browserify, Gulp - JS files are bundled together
Source maps - when available, they can disclose more information intended for debugging
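Source maps are referenced either by a trailing "//# sourceMappingURL=..." comment (older tools wrote "//@") or by a SourceMap response header (X-SourceMap on older servers). The sketch below, with hypothetical function names, resolves that reference and lists the map's "sources" array, which often reveals original file paths (for webpack bundles, frequently paths under node_modules). Checking the headers before the comment is a choice made for this sketch.

```python
import json
import re
from urllib.parse import urljoin

# Matches a "//# sourceMappingURL=..." comment at the end of a line; "//@" is the older form.
SOURCEMAP_RE = re.compile(r"//[#@]\s*sourceMappingURL=(\S+)\s*$", re.MULTILINE)


def sourcemap_url(js_url, body, headers=None):
    """Return the absolute source map URL for a JS file, or None.

    Checks the SourceMap / X-SourceMap response headers first, then the last
    sourceMappingURL comment in the body. Inline data: URLs come back unchanged.
    """
    headers = {k.lower(): v for k, v in (headers or {}).items()}
    ref = headers.get("sourcemap") or headers.get("x-sourcemap")
    if ref is None:
        matches = SOURCEMAP_RE.findall(body)
        if not matches:
            return None
        ref = matches[-1]
    # The reference is resolved relative to the JS file's own URL.
    return urljoin(js_url, ref.strip())


def original_sources(map_json):
    """List the original file paths recorded in a source map's "sources" array."""
    return json.loads(map_json).get("sources", [])
```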

Thoughts:

Discovering, fetching, and parsing JS files would fit into aggressive level 2, a currently unused aggressive level.

Some questions to consider:

Should WhatWeb scan only same-site JS or also remote JS URLs?
Should WhatWeb parse JS to discover URLs for other loaded or imported JS files?
A headless browser like headless Chrome or Firefox would work to parse and discover JS URLs, but is it too resource heavy?
Is there something faster than a headless browser, such as jsdom, that could be used instead?
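For the first question, a simple starting point is to split the discovered script URLs by host, so that fetching remote JS can be a separate option. This sketch (split_by_origin is a hypothetical name) compares exact hostnames only, which is a simplification: a true same-site check compares registrable domains, so static.example.com counts as remote here.

```python
from urllib.parse import urlsplit


def split_by_origin(page_url, script_urls):
    """Split script URLs into (same_host, remote) relative to the scanned page.

    urlsplit(...).hostname is lowercased, so EXAMPLE.com and example.com match.
    """
    page_host = urlsplit(page_url).hostname
    same, remote = [], []
    for url in script_urls:
        target = same if urlsplit(url).hostname == page_host else remote
        target.append(url)
    return same, remote
```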

I guess step one is to start collecting JS Library patterns. Ideally we could have patterns that would survive the minify process.

Janosch Braukmann · Answer 2 · Thu Oct 01 2020 23:09:29 GMT+0800 (China Standard Time)

My thoughts here:

Should WhatWeb scan only same-site JS or also remote JS URLs?
I suggest fetching both, in order to check for:

Version numbers in the URL path
Version numbers in GET parameters
Version numbers in the JS files themselves
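The first two checks need no JS parsing at all. A rough sketch, assuming a version looks like dotted digits (3.5.1): it reports where each candidate was found and leaves pairing it with a library name (from the file name or path) to the caller. It will also flag dotted numbers that are not versions. url_version_hints is a hypothetical name.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Assumed version shape: two or more dot-separated digit runs, e.g. "3.5.1" or "1.12".
VERSION_RE = re.compile(r"\d+(?:\.\d+)+")


def url_version_hints(url):
    """Return (where, version) pairs for dotted version numbers found in a URL."""
    parts = urlsplit(url)
    # Path: covers file names (jquery-3.5.1.min.js) and segments (/jquery/3.5.1/).
    hints = [("path", v) for v in VERSION_RE.findall(parts.path)]
    # Query string: covers cache-busting parameters such as ?ver=1.12.4.
    for key, value in parse_qsl(parts.query):
        hints.extend((f"query:{key}", v) for v in VERSION_RE.findall(value))
    return hints
```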

Should WhatWeb parse JS to discover URLs for other loaded or imported JS files?

I suggest not doing this, at least in the beginning. Of course there are techniques like Google Tag Manager that load scripts dynamically, but as a first step (probably much easier and faster to implement and maintain), scanning only the files that are included directly, such as the minified JS files in a vendor folder, may be enough.

A headless browser like headless Chrome or Firefox would work to parse and discover JS URLs, but is it too resource heavy?

We have some experience here, and I totally agree about the resource issue. In addition, it would add huge third-party dependencies to WhatWeb.

I guess step one is to start collecting JS Library patterns. Ideally we could have patterns that would survive the minify process.

I would probably start with patterns built around version numbers, as they are a good way to learn about the libraries in use independently of their names.

Possible license string and pattern (I will keep my eyes open for more):

* @license Angular v8.0.2  -->  /@license ([a-zA-Z]+) v?(\d+)\.(\d+)(?:\.(\d+))?/
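A quick check of a license-banner pattern in Python. A digit class of [1-9] cannot match 0, so it cannot read v8.0.2 correctly, whereas \d+ accepts zeros and multi-digit parts such as v10. Banners like this are also a good candidate for surviving minification, since many minifiers (terser's default comment handling, for instance) keep comments that contain @license. license_versions is a hypothetical helper.

```python
import re

# Name, then major.minor with an optional patch component. Using \d+ rather than
# [1-9] accepts zeros and multi-digit parts such as "v10.0.0".
LICENSE_RE = re.compile(r"@license ([a-zA-Z]+) v?(\d+)\.(\d+)(?:\.(\d+))?")


def license_versions(body):
    """Return (library, version) pairs from every @license banner in a JS body."""
    results = []
    for m in LICENSE_RE.finditer(body):
        # Join major, minor and (if present) patch back into a version string.
        version = ".".join(part for part in m.groups()[1:] if part is not None)
        results.append((m.group(1), version))
    return results
```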