# Analysis of federal .gov domains

Analysis of what technologies power federal .gov websites - 2021 edition.

This is a Jekyll site with a few custom plugins. It uses Site Inspector to gather information about domains.
## Crawling

- `script/fetch-domain-list` will fetch the latest .gov domain list from GSA.
- `script/crawl` will iterate over each domain, running Site Inspector to capture data.
- Once all domains are crawled, `script/dump-fields` parses some additional metadata necessary for the site to work.
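The crawl loop above can be sketched roughly as follows. This is purely illustrative: the real crawl is driven by `script/crawl` and the Ruby-based Site Inspector, the `inspect` callable here is a stand-in, and the `Domain Name` column name is an assumption about the layout of the GSA domain-list CSV.

```python
# Hypothetical sketch of the crawl loop; `inspect` stands in for Site Inspector.
import csv
import io

def crawl(domain_csv, inspect):
    """Run `inspect` on every domain in a GSA-style domain list CSV."""
    results = {}
    reader = csv.DictReader(io.StringIO(domain_csv))
    for row in reader:
        # Assumed column name; normalize to lowercase for consistent keys.
        domain = row["Domain Name"].lower()
        results[domain] = inspect(domain)
    return results

# Stubbed inspector in place of the real Site Inspector.
sample = "Domain Name,Agency\nGSA.GOV,General Services Administration\n"
print(crawl(sample, lambda d: {"live": True}))  # → {'gsa.gov': {'live': True}}
```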
## Data storage

- Crawled domains are stored in the `_domains` directory as `_domains/DOMAIN-GOV.html`.
- Crawl data is stored as YAML front matter within each domain file.
- Some additional metadata (the raw domain list, field names, technologies in use) is stored as JSON in the `_data` directory.
- `data.json` and `data.csv` are generated at build time and contain the complete dataset. They are available in the `_site` folder after running `script/server`.
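As a minimal sketch of the storage layout, the crawl data for each domain can be read back out of the YAML front matter block between the `---` delimiters at the top of its page. This assumes simple scalar `key: value` fields, and the field names in the sample are hypothetical, not the site's actual schema.

```python
def read_front_matter(page):
    """Extract simple key: value pairs from Jekyll-style YAML front matter."""
    # A Jekyll page looks like: "---\n<front matter>---\n<body>".
    _, raw, _ = page.split("---\n", 2)
    data = {}
    for line in raw.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            data[key.strip()] = value.strip()
    return data

# Hypothetical _domains/GSA-GOV.html content with made-up field names.
sample = "---\ndomain: gsa.gov\nhttps: true\n---\n<html></html>\n"
print(read_front_matter(sample))  # → {'domain': 'gsa.gov', 'https': 'true'}
```

A full implementation would use a real YAML parser (Jekyll itself does), but the delimiter convention is the same.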
## Running locally

- Clone the repo
- `script/bootstrap` to install dependencies
- `script/server` to boot up the server and open the site in your browser