[Feature request]: Rewrite of generate-environment-identifiers-dict.sh
molangning opened this issue
Describe the feature request:
The current script uses Majestic's domain list, which may be missing many domains compared with the other lists below. Another issue I found is that it opens a new (insecure) SSL session for every domain in the list, which is both insecure and inefficient.
Recommendations for more lists would be appreciated, as these lists may be outdated.
Additional context:
You can use this command to query the crt.sh SQL server directly:
psql -h crt.sh -p 5432 -U guest certwatch
https://groups.google.com/g/crtsh/c/sUmV0mBz8bQ/m/K-6Vymd_AAAJ
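A minimal sketch of running that connection non-interactively from the script, one domain at a time. The helper names (`is_valid_domain`, `build_crtsh_query`, `run_crtsh_query`) are hypothetical, and the SQL assumes the crt.sh schema exposes `certificate_and_identities`, an `identities()` function and a `certwatch` text-search configuration; that is an assumption to check against the thread above, not a confirmed query. The `psql` flags (`-X`, `-A`, `-t`, `-w`, `-c`) are standard options.

```shell
#!/usr/bin/env bash
# Sketch only: query crt.sh's public PostgreSQL instance for names under a domain.
# ASSUMPTION: certificate_and_identities, identities() and the 'certwatch'
# text-search configuration exist in the crt.sh schema; verify before use.

# Accept only plain lower-case hostnames, so the value can be embedded in the
# SQL string without quoting or injection problems.
is_valid_domain() {
  local re='^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$'
  [[ $1 =~ $re ]]
}

# Hypothetical helper: print the SQL for one domain (lower-cased first).
build_crtsh_query() {
  local domain
  domain=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  if ! is_valid_domain "$domain"; then
    echo "invalid domain: $1" >&2
    return 1
  fi
  # %% is a literal % for printf; in SQL LIKE it matches any subdomain prefix.
  printf "SELECT DISTINCT lower(ci.name_value) FROM certificate_and_identities ci WHERE plainto_tsquery('certwatch', '%s') @@ identities(ci.certificate) AND lower(ci.name_value) LIKE '%%.%s';\n" \
    "$domain" "$domain"
}

# Hypothetical helper: run the query with the same connection parameters as the
# psql command above. -X skips ~/.psqlrc, -A and -t print bare values one per
# line, -w never prompts for a password.
run_crtsh_query() {
  local sql
  sql=$(build_crtsh_query "$1") || return 1
  psql -X -A -t -w -h crt.sh -p 5432 -U guest -d certwatch -c "$sql"
}
```

Usage: `run_crtsh_query example.com | sort -u`. Validating the domain up front is a deliberate choice: it keeps untrusted list entries from being spliced into SQL.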
Domain lists:
https://hackertarget.com/top-million-site-list-download/
https://radar.cloudflare.com/domains
https://www.domcop.com/top-10-million-websites
https://s3-us-west-1.amazonaws.com/umbrella-static/index.html
https://majestic.com/reports/majestic-million
https://builtwith.com/top-sites
https://tranco-list.eu/
https://statvoo.com/dl/top-1million-sites.csv.zip
Next steps:
- Implement a script that pulls domains from DomCop, Alexa, Cloudflare, Majestic and other lists
- Deduplicate the combined list and find better ways to extract environment IDs
- Switch to the SQL interface
- I intend to open a pull request later
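A minimal sketch of the first two steps (pulling from several lists and deduplicating), assuming each downloaded list is a comma-separated file with the domain in a known column, e.g. `rank,domain`. The function names and the column numbers a caller passes in are hypothetical; check each file's layout (header, quoting, column order) before relying on them.

```shell
#!/usr/bin/env bash
# Sketch: merge several ranked domain lists into one deduplicated list.
# ASSUMPTION: each input is a CSV file whose domain sits in a caller-supplied
# column; the column numbers are not taken from any official spec.

# Print the domains from column $2 of CSV file $1, lower-cased, with double
# quotes and trailing CRs removed. Values without a dot (such as a "Domain"
# header cell) are skipped.
extract_domain_column() {
  local file=$1 col=$2
  awk -F',' -v col="$col" '
    {
      d = tolower($col)
      gsub(/["\r]/, "", d)
      if (d ~ /^[a-z0-9][a-z0-9.-]*\.[a-z0-9-]+$/) print d
    }' "$file"
}

# Hypothetical helper. Usage: merge_domain_lists FILE:COLUMN [FILE:COLUMN ...]
# The column is taken after the last ":", so file names may contain colons.
merge_domain_lists() {
  local spec
  for spec in "$@"; do
    extract_domain_column "${spec%:*}" "${spec##*:}"
  done | LC_ALL=C sort -u
}
```

For example, `merge_domain_lists tranco.csv:2 majestic.csv:3 > domains.txt` would combine a `rank,domain` file with a file whose third column holds the domain. Sorting with `LC_ALL=C sort -u` keeps the output stable across machines.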