HTTPArchive / httparchive.org

The HTTP Archive website hosted on App Engine

Home Page: https://httparchive.org

Store Technology meta data in HTTP Archive

tunetheweb opened this issue

It would be useful to pull more of the Wappalyzer metadata into the HTTP Archive.

For example, the meta->description and website fields (and possibly the icon) could be useful to display in the CWV Tech report.

The Aurora team also mentioned that the implies data could be useful for seeing how much technology is built on other technologies.

WPT stores both the processed and the raw Wappalyzer output in the HAR. The _detected_technologies field is lightly processed, and _detected_raw holds the raw, unprocessed detection results. Both include the description and the file name of the logo.

For example, for webpagetest.org:

                "_detected": {
                    "Programming languages": "PHP",
                    "Caching": "Varnish",
                    "JavaScript libraries": "jQuery UI 1.8.17,jQuery 1.7.1",
                    "Documentation": "Zendesk",
                    "Issue trackers": "Zendesk",
                    "Live chat": "Zendesk",
                    "Advertising": "Twitter Ads",
                    "Webmail": "Microsoft 365",
                    "Email": "Microsoft 365",
                    "Security": "HSTS,Cloudflare Bot Management",
                    "Tag managers": "Google Tag Manager",
                    "Analytics": "Google Analytics",
                    "CDN": "Cloudflare",
                    "Miscellaneous": "Open Graph"
                },
                "_detected_apps": {
                    "PHP": "",
                    "Varnish": "",
                    "jQuery UI": "1.8.17",
                    "Zendesk": "",
                    "Twitter Ads": "",
                    "Microsoft 365": "",
                    "jQuery": "1.7.1",
                    "HSTS": "",
                    "Google Tag Manager": "",
                    "Google Analytics": "",
                    "Cloudflare Bot Management": "",
                    "Cloudflare": "",
                    "Open Graph": ""
                },
                "_detected_technologies": {
                    "PHP": {
                        "name": "PHP",
                        "description": "PHP is a general-purpose scripting language used for web development.",
                        "slug": "php",
                        "categories": [
                            {
                                "id": 27,
                                "slug": "programming-languages",
                                "groups": [
                                    9
                                ],
                                "name": "Programming languages",
                                "priority": 5
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "PHP.svg",
                        "website": "http://php.net",
                        "pricing": [],
                        "cpe": "cpe:2.3:a:php:php:*:*:*:*:*:*:*:*"
                    },
                    "Varnish": {
                        "name": "Varnish",
                        "description": "Varnish is a reverse caching proxy.",
                        "slug": "varnish",
                        "categories": [
                            {
                                "id": 23,
                                "slug": "caching",
                                "groups": [
                                    7
                                ],
                                "name": "Caching",
                                "priority": 7
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Varnish.svg",
                        "website": "http://www.varnish-cache.org",
                        "pricing": [],
                        "cpe": "cpe:2.3:a:varnish-software:varnish_cache:*:*:*:*:*:*:*:*"
                    },
                    "jQuery UI": {
                        "name": "jQuery UI",
                        "description": "jQuery UI is a collection of GUI widgets, animated visual effects, and themes implemented with jQuery, Cascading Style Sheets, and HTML.",
                        "slug": "jquery-ui",
                        "categories": [
                            {
                                "id": 59,
                                "slug": "javascript-libraries",
                                "groups": [
                                    9
                                ],
                                "name": "JavaScript libraries",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "1.8.17",
                        "icon": "jQuery UI.svg",
                        "website": "http://jqueryui.com",
                        "pricing": [],
                        "cpe": "cpe:2.3:a:jquery:jquery_ui:*:*:*:*:*:*:*:*"
                    },
                    "Zendesk": {
                        "name": "Zendesk",
                        "description": "Zendesk is a cloud-based help desk management solution offering customizable tools to build customer service portal, knowledge base and online communities.",
                        "slug": "zendesk",
                        "categories": [
                            {
                                "id": 4,
                                "slug": "documentation",
                                "groups": [
                                    3
                                ],
                                "name": "Documentation",
                                "priority": 2
                            },
                            {
                                "id": 13,
                                "slug": "issue-trackers",
                                "groups": [
                                    3,
                                    18
                                ],
                                "name": "Issue trackers",
                                "priority": 2
                            },
                            {
                                "id": 52,
                                "slug": "live-chat",
                                "groups": [
                                    4,
                                    16
                                ],
                                "name": "Live chat",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Zendesk.svg",
                        "website": "https://zendesk.com",
                        "pricing": [
                            "low"
                        ],
                        "cpe": null
                    },
                    "Twitter Ads": {
                        "name": "Twitter Ads",
                        "description": "Twitter Ads is an advertising platform for Twitter 'microblogging' system.",
                        "slug": "twitter-ads",
                        "categories": [
                            {
                                "id": 36,
                                "slug": "advertising",
                                "groups": [
                                    2
                                ],
                                "name": "Advertising",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Twitter.svg",
                        "website": "https://ads.twitter.com",
                        "pricing": [
                            "payg"
                        ],
                        "cpe": null
                    },
                    "Microsoft 365": {
                        "name": "Microsoft 365",
                        "description": "Microsoft 365 is a line of subscription services offered by Microsoft as part of the Microsoft Office product line.",
                        "slug": "microsoft-365",
                        "categories": [
                            {
                                "id": 30,
                                "slug": "webmail",
                                "groups": [
                                    4
                                ],
                                "name": "Webmail",
                                "priority": 2
                            },
                            {
                                "id": 75,
                                "slug": "email",
                                "groups": [
                                    4,
                                    2
                                ],
                                "name": "Email",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Microsoft 365.svg",
                        "website": "https://www.microsoft.com/microsoft-365",
                        "pricing": [],
                        "cpe": null
                    },
                    "jQuery": {
                        "name": "jQuery",
                        "description": "jQuery is a JavaScript library which is a free, open-source software designed to simplify HTML DOM tree traversal and manipulation, as well as event handling, CSS animation, and Ajax.",
                        "slug": "jquery",
                        "categories": [
                            {
                                "id": 59,
                                "slug": "javascript-libraries",
                                "groups": [
                                    9
                                ],
                                "name": "JavaScript libraries",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "1.7.1",
                        "icon": "jQuery.svg",
                        "website": "https://jquery.com",
                        "pricing": [],
                        "cpe": "cpe:2.3:a:jquery:jquery:*:*:*:*:*:*:*:*"
                    },
                    "HSTS": {
                        "name": "HSTS",
                        "description": "HTTP Strict Transport Security (HSTS) informs browsers that the site should only be accessed using HTTPS.",
                        "slug": "hsts",
                        "categories": [
                            {
                                "id": 16,
                                "slug": "security",
                                "groups": [
                                    11
                                ],
                                "name": "Security",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "default.svg",
                        "website": "https://www.rfc-editor.org/rfc/rfc6797#section-6.1",
                        "pricing": [],
                        "cpe": null
                    },
                    "Google Tag Manager": {
                        "name": "Google Tag Manager",
                        "description": "Google Tag Manager is a tag management system (TMS) that allows you to quickly and easily update measurement codes and related code fragments collectively known as tags on your website or mobile app.",
                        "slug": "google-tag-manager",
                        "categories": [
                            {
                                "id": 42,
                                "slug": "tag-managers",
                                "groups": [
                                    8
                                ],
                                "name": "Tag managers",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Google Tag Manager.svg",
                        "website": "http://www.google.com/tagmanager",
                        "pricing": [],
                        "cpe": null
                    },
                    "Google Analytics": {
                        "name": "Google Analytics",
                        "description": "Google Analytics is a free web analytics service that tracks and reports website traffic.",
                        "slug": "google-analytics",
                        "categories": [
                            {
                                "id": 10,
                                "slug": "analytics",
                                "groups": [
                                    8
                                ],
                                "name": "Analytics",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Google Analytics.svg",
                        "website": "http://google.com/analytics",
                        "pricing": [],
                        "cpe": null
                    },
                    "Cloudflare Bot Management": {
                        "name": "Cloudflare Bot Management",
                        "description": "Cloudflare bot management solution identifies and mitigates automated traffic to protect websites from bad bots.",
                        "slug": "cloudflare-bot-management",
                        "categories": [
                            {
                                "id": 16,
                                "slug": "security",
                                "groups": [
                                    11
                                ],
                                "name": "Security",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "CloudFlare.svg",
                        "website": "https://www.cloudflare.com/en-gb/products/bot-management/",
                        "pricing": [],
                        "cpe": null
                    },
                    "Cloudflare": {
                        "name": "Cloudflare",
                        "description": "Cloudflare is a web-infrastructure and website-security company, providing content-delivery-network services, DDoS mitigation, Internet security, and distributed domain-name-server services.",
                        "slug": "cloudflare",
                        "categories": [
                            {
                                "id": 31,
                                "slug": "cdn",
                                "groups": [
                                    7
                                ],
                                "name": "CDN",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "CloudFlare.svg",
                        "website": "http://www.cloudflare.com",
                        "pricing": [],
                        "cpe": null
                    },
                    "Open Graph": {
                        "name": "Open Graph",
                        "description": "Open Graph is a protocol that is used to integrate any web page into the social graph.",
                        "slug": "open-graph",
                        "categories": [
                            {
                                "id": 19,
                                "slug": "miscellaneous",
                                "groups": [
                                    6
                                ],
                                "name": "Miscellaneous",
                                "priority": 10
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Open Graph.png",
                        "website": "https://ogp.me",
                        "pricing": [],
                        "cpe": null
                    }
                },
                "_detected_raw": [
                    {
                        "name": "PHP",
                        "description": "PHP is a general-purpose scripting language used for web development.",
                        "slug": "php",
                        "categories": [
                            {
                                "id": 27,
                                "slug": "programming-languages",
                                "groups": [
                                    9
                                ],
                                "name": "Programming languages",
                                "priority": 5
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "PHP.svg",
                        "website": "http://php.net",
                        "pricing": [],
                        "cpe": "cpe:2.3:a:php:php:*:*:*:*:*:*:*:*"
                    },
                    {
                        "name": "Varnish",
                        "description": "Varnish is a reverse caching proxy.",
                        "slug": "varnish",
                        "categories": [
                            {
                                "id": 23,
                                "slug": "caching",
                                "groups": [
                                    7
                                ],
                                "name": "Caching",
                                "priority": 7
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Varnish.svg",
                        "website": "http://www.varnish-cache.org",
                        "pricing": [],
                        "cpe": "cpe:2.3:a:varnish-software:varnish_cache:*:*:*:*:*:*:*:*"
                    },
                    {
                        "name": "jQuery UI",
                        "description": "jQuery UI is a collection of GUI widgets, animated visual effects, and themes implemented with jQuery, Cascading Style Sheets, and HTML.",
                        "slug": "jquery-ui",
                        "categories": [
                            {
                                "id": 59,
                                "slug": "javascript-libraries",
                                "groups": [
                                    9
                                ],
                                "name": "JavaScript libraries",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "1.8.17",
                        "icon": "jQuery UI.svg",
                        "website": "http://jqueryui.com",
                        "pricing": [],
                        "cpe": "cpe:2.3:a:jquery:jquery_ui:*:*:*:*:*:*:*:*"
                    },
                    {
                        "name": "Zendesk",
                        "description": "Zendesk is a cloud-based help desk management solution offering customizable tools to build customer service portal, knowledge base and online communities.",
                        "slug": "zendesk",
                        "categories": [
                            {
                                "id": 4,
                                "slug": "documentation",
                                "groups": [
                                    3
                                ],
                                "name": "Documentation",
                                "priority": 2
                            },
                            {
                                "id": 13,
                                "slug": "issue-trackers",
                                "groups": [
                                    3,
                                    18
                                ],
                                "name": "Issue trackers",
                                "priority": 2
                            },
                            {
                                "id": 52,
                                "slug": "live-chat",
                                "groups": [
                                    4,
                                    16
                                ],
                                "name": "Live chat",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Zendesk.svg",
                        "website": "https://zendesk.com",
                        "pricing": [
                            "low"
                        ],
                        "cpe": null
                    },
                    {
                        "name": "Twitter Ads",
                        "description": "Twitter Ads is an advertising platform for Twitter 'microblogging' system.",
                        "slug": "twitter-ads",
                        "categories": [
                            {
                                "id": 36,
                                "slug": "advertising",
                                "groups": [
                                    2
                                ],
                                "name": "Advertising",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Twitter.svg",
                        "website": "https://ads.twitter.com",
                        "pricing": [
                            "payg"
                        ],
                        "cpe": null
                    },
                    {
                        "name": "Microsoft 365",
                        "description": "Microsoft 365 is a line of subscription services offered by Microsoft as part of the Microsoft Office product line.",
                        "slug": "microsoft-365",
                        "categories": [
                            {
                                "id": 30,
                                "slug": "webmail",
                                "groups": [
                                    4
                                ],
                                "name": "Webmail",
                                "priority": 2
                            },
                            {
                                "id": 75,
                                "slug": "email",
                                "groups": [
                                    4,
                                    2
                                ],
                                "name": "Email",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Microsoft 365.svg",
                        "website": "https://www.microsoft.com/microsoft-365",
                        "pricing": [],
                        "cpe": null
                    },
                    {
                        "name": "jQuery",
                        "description": "jQuery is a JavaScript library which is a free, open-source software designed to simplify HTML DOM tree traversal and manipulation, as well as event handling, CSS animation, and Ajax.",
                        "slug": "jquery",
                        "categories": [
                            {
                                "id": 59,
                                "slug": "javascript-libraries",
                                "groups": [
                                    9
                                ],
                                "name": "JavaScript libraries",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "1.7.1",
                        "icon": "jQuery.svg",
                        "website": "https://jquery.com",
                        "pricing": [],
                        "cpe": "cpe:2.3:a:jquery:jquery:*:*:*:*:*:*:*:*"
                    },
                    {
                        "name": "HSTS",
                        "description": "HTTP Strict Transport Security (HSTS) informs browsers that the site should only be accessed using HTTPS.",
                        "slug": "hsts",
                        "categories": [
                            {
                                "id": 16,
                                "slug": "security",
                                "groups": [
                                    11
                                ],
                                "name": "Security",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "default.svg",
                        "website": "https://www.rfc-editor.org/rfc/rfc6797#section-6.1",
                        "pricing": [],
                        "cpe": null
                    },
                    {
                        "name": "Google Tag Manager",
                        "description": "Google Tag Manager is a tag management system (TMS) that allows you to quickly and easily update measurement codes and related code fragments collectively known as tags on your website or mobile app.",
                        "slug": "google-tag-manager",
                        "categories": [
                            {
                                "id": 42,
                                "slug": "tag-managers",
                                "groups": [
                                    8
                                ],
                                "name": "Tag managers",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Google Tag Manager.svg",
                        "website": "http://www.google.com/tagmanager",
                        "pricing": [],
                        "cpe": null
                    },
                    {
                        "name": "Google Analytics",
                        "description": "Google Analytics is a free web analytics service that tracks and reports website traffic.",
                        "slug": "google-analytics",
                        "categories": [
                            {
                                "id": 10,
                                "slug": "analytics",
                                "groups": [
                                    8
                                ],
                                "name": "Analytics",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Google Analytics.svg",
                        "website": "http://google.com/analytics",
                        "pricing": [],
                        "cpe": null
                    },
                    {
                        "name": "Cloudflare Bot Management",
                        "description": "Cloudflare bot management solution identifies and mitigates automated traffic to protect websites from bad bots.",
                        "slug": "cloudflare-bot-management",
                        "categories": [
                            {
                                "id": 16,
                                "slug": "security",
                                "groups": [
                                    11
                                ],
                                "name": "Security",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "CloudFlare.svg",
                        "website": "https://www.cloudflare.com/en-gb/products/bot-management/",
                        "pricing": [],
                        "cpe": null
                    },
                    {
                        "name": "Cloudflare",
                        "description": "Cloudflare is a web-infrastructure and website-security company, providing content-delivery-network services, DDoS mitigation, Internet security, and distributed domain-name-server services.",
                        "slug": "cloudflare",
                        "categories": [
                            {
                                "id": 31,
                                "slug": "cdn",
                                "groups": [
                                    7
                                ],
                                "name": "CDN",
                                "priority": 9
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "CloudFlare.svg",
                        "website": "http://www.cloudflare.com",
                        "pricing": [],
                        "cpe": null
                    },
                    {
                        "name": "Open Graph",
                        "description": "Open Graph is a protocol that is used to integrate any web page into the social graph.",
                        "slug": "open-graph",
                        "categories": [
                            {
                                "id": 19,
                                "slug": "miscellaneous",
                                "groups": [
                                    6
                                ],
                                "name": "Miscellaneous",
                                "priority": 10
                            }
                        ],
                        "confidence": 100,
                        "version": "",
                        "icon": "Open Graph.png",
                        "website": "https://ogp.me",
                        "pricing": [],
                        "cpe": null
                    }
                ],
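As a rough illustration of how the description, website and icon could be pulled out of that output, here is a minimal Python sketch. It assumes the page-level object from the HAR has already been loaded as a dict; the function name technology_metadata and the trimmed sample_page are made up for this example and are not part of WPT or the HTTP Archive pipeline.

```python
import json


def technology_metadata(page: dict) -> list[dict]:
    """Return one row per detected technology with the fields discussed above."""
    rows = []
    # _detected_technologies is keyed by technology name (see the sample above).
    for name, tech in page.get("_detected_technologies", {}).items():
        rows.append({
            "technology": name,
            "description": tech.get("description", ""),
            "website": tech.get("website", ""),
            "icon": tech.get("icon", ""),
            "categories": [c["name"] for c in tech.get("categories", [])],
        })
    return rows


# Trimmed copy of two entries from the webpagetest.org sample above.
sample_page = json.loads("""
{
  "_detected_technologies": {
    "PHP": {
      "name": "PHP",
      "description": "PHP is a general-purpose scripting language used for web development.",
      "categories": [{"id": 27, "name": "Programming languages"}],
      "icon": "PHP.svg",
      "website": "http://php.net"
    },
    "Zendesk": {
      "name": "Zendesk",
      "categories": [
        {"id": 4, "name": "Documentation"},
        {"id": 13, "name": "Issue trackers"},
        {"id": 52, "name": "Live chat"}
      ],
      "icon": "Zendesk.svg",
      "website": "https://zendesk.com"
    }
  }
}
""")

rows = technology_metadata(sample_page)
for row in rows:
    print(row["technology"], row["website"], row["icon"], row["categories"])
```

Using .get() with defaults keeps the sketch tolerant of entries that lack a field, such as the missing description in the trimmed Zendesk entry here.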

It doesn't look like the "implies" mapping, or whether a detection was direct or implied, is available in this output.
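If the implies mapping were available separately, direct and implied technologies could in principle be separated after the fact. The sketch below is only an illustration under assumptions: the definitions dict is hypothetical sample data shaped like Wappalyzer-style technology definitions (an "implies" value that is a string or a list, optionally with a "\;confidence:NN" suffix), and expand_implies is a made-up helper, not an existing WPT or Wappalyzer function.

```python
def expand_implies(detected: set[str], definitions: dict) -> dict[str, str]:
    """Label each technology as 'direct' or 'implied' by following implies links."""
    result = {name: "direct" for name in detected}
    queue = list(detected)
    while queue:
        current = queue.pop()
        implies = definitions.get(current, {}).get("implies", [])
        if isinstance(implies, str):
            implies = [implies]  # assumed: a single implied technology may be a bare string
        for entry in implies:
            # Assumed format: drop an optional "\;confidence:NN" suffix.
            implied = entry.split("\\;")[0]
            if implied not in result:
                result[implied] = "implied"
                queue.append(implied)  # follow transitive implications
    return result


# Hypothetical definitions, for illustration only.
definitions = {
    "jQuery UI": {"implies": "jQuery"},
    "ExampleCMS": {"implies": ["PHP\\;confidence:50", "ExampleDB"]},
    "ExampleDB": {"implies": "ExampleEngine"},
}

labels = expand_implies({"jQuery UI", "ExampleCMS"}, definitions)
for name in sorted(labels):
    print(name, labels[name])
```

Note that this labels a technology as direct whenever it appears in the detected set, so it cannot tell apart a technology that was both detected directly and implied by another one; that distinction would have to come from the detection step itself.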