urbanadventurer / WhatWeb

Next generation web scanner

Home Page: https://www.morningstarsecurity.com/research/whatweb

Detection based on Version Information in JavaScript Files

Phylu opened this issue

Within the WhatWeb plugins, I have multiple ways to detect frameworks with versions based on regexes in the code or based on the occurrence of certain files. What I would like to do is the following in addition to that:

  • Check for JavaScript files that are included in the index page.
  • Check each of those JavaScript files for version information (e.g. based on a regex).
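A minimal sketch of the first step, finding the JavaScript files included in the index page. It is plain Ruby and not WhatWeb plugin code; the helper name `script_urls`, the regex, and the example.com URLs are assumptions made for illustration. A regex keeps the sketch dependency-free, though a real HTML parser would be more robust.

```ruby
require "uri"

# Hypothetical pattern: captures the src attribute of <script> tags
# (double-quoted, single-quoted, or unquoted).
SCRIPT_SRC = /<script\b[^>]*?\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i

# Returns the absolute URLs of all external scripts referenced by the page.
def script_urls(base_url, html)
  html.scan(SCRIPT_SRC).map do |double_quoted, single_quoted, bare|
    src = double_quoted || single_quoted || bare
    URI.join(base_url, src).to_s # resolve relative paths against the page URL
  end.uniq
end

html = <<~HTML
  <html><head>
    <script src="/js/vendor.js"></script>
    <script src='main.js?v=3'></script>
    <script>var inline = true;</script>
    <script src="https://cdn.example.com/lib.min.js"></script>
  </head></html>
HTML

urls = script_urls("https://example.com/app/index.html", html)
puts urls
```

Each returned URL could then be fetched and its content checked against version regexes.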

Often, these JavaScript files (which could be named main.js or vendor.js) contain comments like the following:

	 * http://jquery.com/
	 *
	 * Includes Sizzle.js
	 * http://sizzlejs.com/
	 *
	 * Copyright jQuery Foundation and other contributors
	 * Released under the MIT license
	 * http://jquery.org/license
	 *
	 * Date: 2016-01-08T20:02Z
	 */

Is there a way to implement something like this within a plugin? Or for all existing plugins so that the regexes could be used "recursively" on js pages that are included?

Surprisingly, I was just thinking 💡 about how to add JavaScript library detection to WhatWeb.
I'll just dump my thoughts here, so we can kick off a discussion.

We will need:

  • An engine to recursively discover JavaScript URLs
  • A scanner that checks JavaScript content for patterns
  • A collection of patterns for JS Libraries
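To make the last two pieces concrete, here is a rough sketch of what a pattern collection and a content scanner might look like. It is plain Ruby, not the WhatWeb plugin API; the `JS_PATTERNS` table, the `scan_js` helper, and both regexes are hypothetical examples rather than vetted signatures.

```ruby
# Hypothetical pattern table: library name => regexes whose first capture
# group is the version string.
JS_PATTERNS = {
  "jQuery"  => [/jQuery (?:JavaScript Library )?v(\d+(?:\.\d+)+)/],
  "Angular" => [/@license Angular v(\d+(?:\.\d+)+)/]
}.freeze

# Returns one { name:, version: } hash for every pattern that matches.
def scan_js(content)
  JS_PATTERNS.flat_map do |name, regexes|
    regexes.map do |re|
      m = content.match(re)
      { name: name, version: m[1] } if m
    end.compact
  end
end

# Sample input in the style of a minified file that kept its banner comment.
sample = "/*! jQuery v3.4.1 | (c) JS Foundation and other contributors */\n!function(e,t){}"
p scan_js(sample)
```

Keeping the patterns in a plain table would let the collection grow independently of the discovery engine.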

Things that make JavaScript unique:

  • Minification - JS is compressed, with white space and comments removed (and comments make great patterns)
  • Webpack, Browserify, Gulp - JS files are bundled together
  • Source maps - when available, they can disclose more information for debugging
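For the source map case, a small sketch of how a scanner might notice that a map is available. It relies on the conventional trailing `//# sourceMappingURL=` comment (older tools used `//@`); the helper name `source_map_url` and the example URL are assumptions.

```ruby
require "uri"

# Matches the conventional source map comment, in either the current (#)
# or the legacy (@) form.
SOURCE_MAP_COMMENT = %r{//[@#]\s*sourceMappingURL=(\S+)}

# Returns the absolute URL of the source map referenced by a JS file,
# or nil if there is none or the map is embedded as a data: URI.
def source_map_url(js_url, js_content)
  ref = js_content.scan(SOURCE_MAP_COMMENT).last&.first # last comment wins
  return nil if ref.nil? || ref.start_with?("data:")
  URI.join(js_url, ref).to_s
end

js = "!function(){console.log(1)}();\n//# sourceMappingURL=vendor.js.map\n"
puts source_map_url("https://example.com/static/vendor.js", js)
```

Whether the map itself should then be fetched is a separate decision, since it means one more request per file.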

Thoughts:

  • Discovering, fetching, and parsing JS files would fit into aggressive level 2, a currently unused aggressive level.

Some questions to consider:

  • Should WhatWeb scan only same-site JS or also remote JS URLs?
  • Should WhatWeb parse JS to discover URLs for other loaded or imported JS files?
  • A headless browser like headless Chrome or Firefox would work to parse and discover JS URLs, but is it too resource heavy?
  • Is there something faster than a headless browser that can be used like jsdom?
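On the same-site versus remote question, a sketch of how discovered script URLs could be split so that either policy is easy to apply. "Same site" is taken here to mean an identical host name, which is a simplifying assumption; the helper name `partition_scripts` and the URLs are made up for the example.

```ruby
require "uri"

# Splits script URLs into [same_site, remote] by comparing each URL's host
# with the host of the page that included it.
def partition_scripts(page_url, script_urls)
  page_host = URI(page_url).host.to_s.downcase
  script_urls.partition { |u| URI(u).host.to_s.downcase == page_host }
end

same_site, remote = partition_scripts(
  "https://example.com/index.html",
  ["https://example.com/js/main.js", "https://cdn.example.net/lib.min.js"]
)
puts "same-site: #{same_site.inspect}"
puts "remote:    #{remote.inspect}"
```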

I guess step one is to start collecting JS Library patterns. Ideally we could have patterns that would survive the minify process.

My thoughts here:

  • Should WhatWeb scan only same-site JS or also remote JS URLs?
    I suggest fetching both, in order to check for:
      • Version numbers in the URL path
      • Version numbers in the GET parameters
      • Version numbers in the JS files themselves
  • Should WhatWeb parse JS to discover URLs for other loaded or imported JS files?

I suggest not doing this (at least in the beginning). Of course there are techniques like Google Tag Manager that load scripts dynamically, but as a first step (probably much easier and faster to implement and maintain), scanning only the files that are included directly, such as the minified JS files from a vendor folder, may be fine.
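For directly included files, the URL path and GET parameter checks suggested above need no extra request at all. A minimal sketch, assuming a version looks like two or more dot-separated numbers; the `VERSION` regex, the helper name `versions_in_url`, and the example URLs are illustrative assumptions.

```ruby
require "uri"

# Assumption: a version has at least two numeric components, e.g. "3.4" or
# "3.4.1". A bare "?v=3" is deliberately ignored to limit false positives.
VERSION = /\d+(?:\.\d+)+/

# Returns the first version-like string found in the URL path and in the
# query parameter values (nil where nothing matches).
def versions_in_url(url)
  uri = URI(url)
  params = URI.decode_www_form(uri.query.to_s)
  query_version = params.map { |_key, value| value[VERSION] }.compact.first
  { path: uri.path[VERSION], query: query_version }
end

p versions_in_url("https://cdn.example.com/jquery/3.4.1/jquery.min.js")
p versions_in_url("https://example.com/js/main.js?ver=5.2.4")
```

A version found this way says nothing about which library it belongs to, so it would still need to be combined with the file name or the file content.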

  • A headless browser like headless Chrome or Firefox would work to parse and discover JS URLs, but is it too resource heavy?

We have some experience here, and I totally agree about the resource issue. In addition, it would add huge third-party dependencies to WhatWeb.

I guess step one is to start collecting JS Library patterns. Ideally we could have patterns that would survive the minify process.

I would probably start with patterns that use version numbers, as they are a good way to get information about the libraries in use, independently of their names.

Possible license string & pattern (I will keep the eyes open for more):

* @license Angular v8.0.2\n     --> /@license ([a-zA-Z]*) v?(\d+)\.(\d+)(?:\.(\d+))?/
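A quick sketch of how a license-banner pattern of this kind could be applied to file content. The regex here is a variant written for the example (it uses `\d+` so that components such as "0" or "10" match, and allows one to three components); the constant `LICENSE` and the helper `license_info` are hypothetical names.

```ruby
# Variant license-banner pattern: capture 1 is the library name,
# capture 2 is the version (one to three numeric components).
LICENSE = /@license\s+([A-Za-z][\w.-]*)\s+v?(\d+(?:\.\d+){0,2})/

# Returns { name:, version: } for the first @license banner found, or nil.
def license_info(js)
  m = js.match(LICENSE)
  m && { name: m[1], version: m[2] }
end

p license_info(" * @license Angular v8.0.2\n")
```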