# Analysis of federal .gov domains

Analysis of what technologies power federal .gov websites - 2021 edition.

This is a Jekyll site with a few custom plugins. It uses Site Inspector to gather information about domains.
## Crawling

- `script/fetch-domain-list` will fetch the latest .gov domain list from GSA.
- `script/crawl` will iterate over each domain, running Site Inspector to capture data.
- Once all domains are crawled, `script/dump-fields` parses some additional metadata necessary for the site to work.
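The crawl loop above can be sketched roughly as follows. This is purely illustrative: the real crawl is driven by `script/crawl` and the Ruby-based Site Inspector, the `inspect` callable here is a stand-in, and the `Domain Name` column name is an assumption about the layout of the GSA domain-list CSV.

```python
# Hypothetical sketch of the crawl loop; `inspect` stands in for Site Inspector.
import csv
import io

def crawl(domain_csv, inspect):
    """Run `inspect` on every domain in a GSA-style domain list CSV."""
    results = {}
    reader = csv.DictReader(io.StringIO(domain_csv))
    for row in reader:
        # Assumed column name; normalize to lowercase for consistent keys.
        domain = row["Domain Name"].lower()
        results[domain] = inspect(domain)
    return results

# Stubbed inspector in place of the real Site Inspector.
sample = "Domain Name,Agency\nGSA.GOV,General Services Administration\n"
print(crawl(sample, lambda d: {"live": True}))  # → {'gsa.gov': {'live': True}}
```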
## Data storage

- Crawled domains are stored in the `_domains` directory as `_domains/DOMAIN-GOV.html`.
- Crawl data is stored as YAML front matter within each domain file.
- Some additional metadata (the raw domain list, field names, technologies in use) is stored as JSON in the `_data` directory.
- `data.json` and `data.csv` are generated at build time and contain the complete dataset. They are available in the `_site` folder after running `script/server`.
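As a minimal sketch of the storage layout, the crawl data for each domain can be read back out of the YAML front matter block between the `---` delimiters at the top of its page. This assumes simple scalar `key: value` fields, and the field names in the sample are hypothetical, not the site's actual schema.

```python
def read_front_matter(page):
    """Extract simple key: value pairs from Jekyll-style YAML front matter."""
    # A Jekyll page looks like: "---\n<front matter>---\n<body>".
    _, raw, _ = page.split("---\n", 2)
    data = {}
    for line in raw.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            data[key.strip()] = value.strip()
    return data

# Hypothetical _domains/GSA-GOV.html content with made-up field names.
sample = "---\ndomain: gsa.gov\nhttps: true\n---\n<html></html>\n"
print(read_front_matter(sample))  # → {'domain': 'gsa.gov', 'https': 'true'}
```

A full implementation would use a real YAML parser (Jekyll itself does), but the delimiter convention is the same.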
## Running locally

- Clone the repo
- `script/bootstrap` to install dependencies
- `script/server` to boot up the server and open the site in your browser