Detection based on Version Information in JavaScript Files
Phylu opened this issue · comments
Within the WhatWeb plugins, I have multiple ways to detect frameworks with versions based on regexes in the code or based on the occurrence of certain files. What I would like to do is the following in addition to that:
- Check for JavaScript files that are included in the index page.
- Check each of those JavaScript files for version information (e.g. based on a regex).
Often, these JavaScript files (which could be named main.js or vendor.js) contain comments like the following:
* http://jquery.com/
*
* Includes Sizzle.js
* http://sizzlejs.com/
*
* Copyright jQuery Foundation and other contributors
* Released under the MIT license
* http://jquery.org/license
*
* Date: 2016-01-08T20:02Z
*/
Is there a way to implement something like this within a plugin? Or for all existing plugins, so that their regexes could be applied "recursively" to the JS files that a page includes?
Surprisingly, I was just thinking 💡 about how to add JavaScript library detection to WhatWeb.
I'll just dump my thoughts here, so we can kick off a discussion.
We will need:
- An engine to recursively discover JavaScript URLs
- Scan JavaScript content for patterns
- A collection of patterns for JS Libraries
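To make the first item concrete, here is a minimal sketch of the discovery step. It is written in Python with only the standard library, not against WhatWeb's plugin API, and the names `ScriptSrcParser` and `discover_script_urls` are hypothetical. It only finds scripts that the page includes directly through `<script src=...>` and flags whether each one lives on the same host as the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ScriptSrcParser(HTMLParser):
    """Collects the src attribute of every <script> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; inline scripts have no src and are skipped.
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def discover_script_urls(page_url, html):
    """Return (absolute_url, is_same_host) for each directly included script."""
    parser = ScriptSrcParser()
    parser.feed(html)
    page_host = urlsplit(page_url).netloc
    found = []
    for src in parser.sources:
        absolute = urljoin(page_url, src)  # resolves relative and root-relative paths
        found.append((absolute, urlsplit(absolute).netloc == page_host))
    return found


# Made-up sample page; example.org and example.net are placeholder hosts.
sample_html = """
<html><head>
<script src="/static/vendor.js?v=3"></script>
<script src="https://cdn.example.net/lib/2.1.0/lib.min.js"></script>
<script>console.log("inline, no src");</script>
</head></html>
"""
SCRIPTS = discover_script_urls("https://www.example.org/index.html", sample_html)
for url, same_host in SCRIPTS:
    print(url, "same-host" if same_host else "remote")
```

Scripts that are injected at runtime (for example by a tag manager) would not show up with this approach; that is the part that would need a JS engine or headless browser.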
Things that make JavaScript unique:
- Minify - JS is compressed, with white space and comments removed (and comments make great patterns)
- Webpack, Browserify, Gulp - JS files are bundled together
- SourceMaps - when it's available it can disclose more information for debugging
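As a rough illustration of what can still be recovered from a minified file, the sketch below looks for two things: block comments that minifiers are commonly configured to keep (those starting with `/*!` or containing `@license`), and a trailing `sourceMappingURL` comment. It is plain Python rather than WhatWeb code, `inspect_script` is a hypothetical name, and the comment regex is deliberately naive (it does not skip string literals).

```python
import re

# Any /* ... */ block comment, non-greedy so adjacent comments stay separate.
BLOCK_COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)

# Matches the "//# sourceMappingURL=" form and the older "//@" form.
SOURCE_MAP_RE = re.compile(r"^\s*//[#@]\s*sourceMappingURL=(\S+)\s*$", re.MULTILINE)


def inspect_script(js_text):
    """Return preserved banner comments and the source map reference, if any."""
    preserved = [
        comment
        for comment in BLOCK_COMMENT_RE.findall(js_text)
        if comment.startswith("/*!") or "@license" in comment
    ]
    match = SOURCE_MAP_RE.search(js_text)
    return {
        "preserved_comments": preserved,
        "source_map": match.group(1) if match else None,
    }


# Made-up minified file; "ExampleLib" is a placeholder, not a real library.
minified = (
    "/*! ExampleLib v1.2.3 | MIT */\n"
    "!function(e){e.x=1}(window);\n"
    "//# sourceMappingURL=example.min.js.map\n"
)
INFO = inspect_script(minified)
print(INFO)
```

The source map reference is relative to the script's own URL, so a scanner would have to resolve it before fetching it.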
Thoughts:
- Discovering, fetching, and parsing JS files would fit into aggressive level 2, a currently unused aggressive level.
Some questions to consider:
- Should WhatWeb scan only same-site JS or also remote JS URLs?
- Should WhatWeb parse JS to discover URLs for other loaded or imported JS files?
- A headless browser like headless Chrome or Firefox would work to parse and discover JS URLs, but is it too resource heavy?
- Is there something faster than a headless browser that could be used instead, such as jsdom?
I guess step one is to start collecting JS Library patterns. Ideally we could have patterns that would survive the minify process.
My thoughts here:
- Should WhatWeb scan only same-site JS or also remote JS URLs?
I suggest fetching both, in order to check for:
- Version numbers in the URL path
- Version numbers in the GET parameters
- Version numbers in the JS Files themselves
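The first two checks do not even require fetching the file. A minimal sketch, assuming versions look like two to four dotted numeric components; `versions_from_url` is a hypothetical helper and the URLs are made up:

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Two to four dotted numeric components, e.g. 3.4, 3.4.1 or 1.12.0.1 (an assumption).
VERSION_RE = re.compile(r"\d+(?:\.\d+){1,3}")


def versions_from_url(url):
    """Collect version-like strings from the path and from GET parameter values."""
    parts = urlsplit(url)
    result = {"path": VERSION_RE.findall(parts.path), "query": {}}
    for key, value in parse_qsl(parts.query):
        # Only accept parameters whose whole value is a dotted version,
        # so cache busters such as cb=12345 are ignored.
        if VERSION_RE.fullmatch(value):
            result["query"][key] = value
    return result


CDN = versions_from_url("https://cdn.example.net/ajax/libs/jquery/3.4.1/jquery.min.js")
LOCAL = versions_from_url("https://www.example.org/js/main.js?ver=5.2.1&cb=12345")
print(CDN)
print(LOCAL)
```

A version found this way says nothing about which library it belongs to; that still has to come from the path segment, the parameter name, or the file content.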
- Should WhatWeb parse JS to discover URLs for other loaded or imported JS files?
I suggest not doing this (at least in the beginning). Of course there are techniques like Google Tag Manager that load scripts indirectly, but as a first step (probably much easier and faster to implement and maintain), covering the files that are included directly, such as all minified JS files from a vendor folder, may be fine.
- A headless browser like headless Chrome or Firefox would work to parse and discover JS URLs, but is it too resource heavy?
We have some experience here, and I totally agree about the resource issue. In addition, it would add huge third-party dependencies to WhatWeb.
I guess step one is to start collecting JS Library patterns. Ideally we could have patterns that would survive the minify process.
I would probably start with patterns that use version numbers, as they are a good way to get information about the libraries in use, independent of their names.
Possible license string & pattern (I will keep the eyes open for more):
* @license Angular v8.0.2\n --> /@license ([a-zA-Z]+) v?(\d+)\.(\d+)(?:\.(\d+))?/
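A quick way to check a license pattern of this shape against the Angular banner, sketched in Python. The version components are written as `\d+` so that zeros and multi-digit numbers such as `8.0.2` or `1.12.0` are accepted; `parse_license_banner` is a hypothetical helper name.

```python
import re

# Library name, then an optional "v", then major.minor with an optional patch part.
LICENSE_RE = re.compile(r"@license ([a-zA-Z]+) v?(\d+)\.(\d+)(?:\.(\d+))?")


def parse_license_banner(text):
    """Return (name, version) from an @license banner, or None if nothing matches."""
    match = LICENSE_RE.search(text)
    if match is None:
        return None
    name, major, minor, patch = match.groups()
    # The patch group is None when the banner only has major.minor.
    version = ".".join(part for part in (major, minor, patch) if part is not None)
    return name, version


BANNER = parse_license_banner(" * @license Angular v8.0.2\n")
print(BANNER)
```

Capturing the name and version separately means one generic pattern could cover every library that follows this banner convention, instead of one pattern per library.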