thiagomayllart / domainthreat

Daily Domain Monitoring to detect phishing and brand impersonation with subdomain enumeration and source code scraping

domainthreat

Daily Domain Monitoring for Brands and Mailing Domain Names

Current Version 3.02

New in Version: 3.0

  • Add more Subdomain Scans via dnsdumpster and subdomain.center
  • Add Feature "Email Availability": Check if a domain is ready to receive mails (MX record) or ready to send mails (SPF record and DMARC record)
  • Bug Fixes in CSV Output
  • Improve Readability of code
  • Add Feature "parked domains": Check if domain is parked or not (experimental)

This is a Domain Monitoring tool. You can monitor your company brands (e.g. "amazon"), your mailing domains (e.g. "companygroup") or other words.

Motivation

Typical Domain Monitoring relies on brand names as input. This is not always sufficient to detect phishing attacks when the brand names and the mailing domain names differ.

Thought experiment: Suppose the example company "IBM" monitors its brand "IBM" and sends mails via @ibmgroup.com, and an attacker registers the domain ibrngroup.com (rn = m) for spear phishing purposes (e.g. CEO Fraud). Typical Brand (Protection) Domain Monitoring solutions may struggle here, because the distance between the monitored brand name "IBM" and the registered domain name "ibrngroup.com" is too large to classify the domain as a true positive. That makes it harder for the targeted company to take appropriate measures proactively. This scenario is avoidable by also monitoring your mailing domain names and thus focusing on text strings rather than only on brands.
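The gap described in the thought experiment can be made concrete with a small sketch. It uses `difflib.SequenceMatcher` from the Python standard library purely for illustration; this is an assumption and not necessarily the distance measure domainthreat applies.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, a, b).ratio()


# Name part of the look-alike domain ibrngroup.com
registered = "ibrngroup"

# Monitoring only the brand name yields a low score ...
brand_score = similarity("ibm", registered)
# ... while monitoring the mailing domain name yields a high one.
mailing_score = similarity("ibmgroup", registered)

print(f"brand 'ibm': {brand_score:.2f}")
print(f"mailing name 'ibmgroup': {mailing_score:.2f}")
```

The mailing domain name scores far higher than the bare brand name, which is why it is worth adding to the monitored keywords.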

This was the motivation for this project.

Detection Scope

  • full-word matching (e.g. amazon-shop.com),
  • regular typo squatting cases (e.g. ammazon.com),
  • typical look-alikes / phishing / so-called CEO-Fraud domains (e.g. arnazon.com (rn = m)),
  • IDN Detection / look-alike Domains based on full word matching (e.g. 𝗉ay𝞀al.com - Greek letter rho '𝞀' instead of Latin letter 'p'),
  • IDN Detection / look-alike Domains based on partial word matching (e.g. 𝗉ya𝞀a1.com - Greek letter rho '𝞀' instead of Latin letter 'p' AND "ya" instead of "ay" AND number "1" instead of letter "l")
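A minimal sketch of how the full-word IDN case above could be detected: fold each domain to a plain-Latin form and compare that with the keyword. The `CONFUSABLES` table and both function names are hypothetical and deliberately tiny; the sketch assumes the paypal look-alike is built from U+1D5C9 (mathematical sans-serif small p) and U+1D780 (mathematical sans-serif bold small rho). A real tool would need a much larger mapping.

```python
import unicodedata

# Hypothetical mini table: characters that survive NFKC normalization
# but still look like a Latin letter.
CONFUSABLES = {
    "\u03c1": "p",  # Greek small rho
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
}


def skeleton(domain: str) -> str:
    """Fold a domain name to a plain-Latin form for comparison."""
    # NFKC maps styled letters (e.g. mathematical alphanumerics) to their base letter.
    folded = unicodedata.normalize("NFKC", domain).lower()
    return "".join(CONFUSABLES.get(ch, ch) for ch in folded)


def is_homoglyph_of(domain: str, keyword: str) -> bool:
    """True if the keyword only becomes visible after folding the domain."""
    return keyword in skeleton(domain) and keyword not in domain.lower()


lookalike = "\U0001D5C9ay\U0001D780al.com"
print(skeleton(lookalike))
print(is_homoglyph_of(lookalike, "paypal"))
```

The partial-matching case (swapped letters, "1" for "l") would additionally need a fuzzy comparison on the folded string.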

Example Screenshot: Illustration of a detected topic keyword in the source code of newly registered domains

Example Screenshot: Illustration of detected subdomains of newly registered domains (as of version >= 2.2)

Features

Key & CSV Output Features

  • Check if domain is parked or not (experimental state)

  • Subdomain enumeration via crt.sh, dnsdumpster and subdomain.center (beware of rate limits)

  • Check website status via HTTP status codes: an HTTPError is reported for a 4XX client error or 5XX server error response

  • Check if a domain is ready to receive mails (MX record) or ready to send mails (SPF record and DMARC record)

  • Keyword detection in the source code of newly registered domains via the HTML title, description and keywords tags. Pages in other languages (e.g. Chinese) are translated first by different translators (normalized to English by default)

    ==> This is to cover needs of international companies and foreign-speaking markets

  • IDN / Homoglyph / Homograph Detection

  • Daily CSV export into one CSV file per calendar week (can be filtered by date)
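The mail readiness check from the list above can be sketched as a pure function over DNS answers that have already been fetched. The DNS lookup itself (MX and TXT at the domain, TXT at `_dmarc.<domain>`) is left to the caller. The function name, the record string format and the result keys are assumptions for illustration, not domainthreat's actual code.

```python
from typing import Dict, List


def _clean(record: str) -> str:
    """Strip whitespace and surrounding quotes from a TXT record string."""
    return record.strip().strip('"').strip().lower()


def mail_readiness(mx: List[str], txt: List[str], dmarc_txt: List[str]) -> Dict[str, bool]:
    """Classify a domain's mail setup from its DNS records.

    mx        -- MX answers such as "10 mail.example.com."
    txt       -- TXT answers of the domain itself (SPF lives here)
    dmarc_txt -- TXT answers of _dmarc.<domain>
    """
    # A "null MX" (exchange ".") explicitly states that no mail is accepted.
    has_mx = any(r.split()[-1] != "." for r in mx if r.strip())
    has_spf = any(_clean(r).startswith("v=spf1") for r in txt)
    has_dmarc = any(_clean(r).startswith("v=dmarc1") for r in dmarc_txt)
    return {"mx": has_mx, "spf": has_spf, "dmarc": has_dmarc}


print(mail_readiness(
    mx=["10 mail.example.com."],
    txt=['"v=spf1 include:_spf.example.com ~all"'],
    dmarc_txt=['"v=DMARC1; p=reject"'],
))
```

Reporting the three flags separately leaves it to the analyst whether SPF alone or only SPF together with DMARC counts as "ready for sending".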

Other Features

  • Multithreading (50 workers by default) & Multiprocessing
  • False Positive Reduction Instruments (e.g. self-defined blacklists, thresholds depending on string length)
  • Keyword detection in the source code of newly registered domains that neither contain a brand in the domain name nor resemble one
  • Mix of edit-based and token-based text distance algorithms to increase result quality, since attackers have a high degree of freedom in choosing domain name variations
  • Pre-defined thresholds of the fuzzy-matching algorithms can be changed
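To illustrate how an edit-based and a token-based measure can be combined with a length-dependent threshold, here is a minimal sketch. The specific algorithms (Levenshtein distance, Jaccard overlap of character bigrams), the threshold values and all function names are assumptions chosen for the example; the tool's own algorithms and defaults may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit-based: minimum number of single-character edits from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def bigram_jaccard(a: str, b: str) -> float:
    """Token-based: Jaccard overlap of the character bigram sets (0.0 to 1.0)."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def max_edits(keyword: str) -> int:
    """Hypothetical length-dependent threshold: short keywords tolerate fewer edits."""
    return 1 if len(keyword) <= 8 else 2


def is_similar(keyword: str, label: str, min_jaccard: float = 0.5) -> bool:
    """Flag a domain label only if both measures agree."""
    return (levenshtein(keyword, label) <= max_edits(keyword)
            and bigram_jaccard(keyword, label) >= min_jaccard)


print(is_similar("tuigroup", "tuiqroup"))
print(is_similar("amazon", "ammazon"))
print(is_similar("tuigroup", "usa-holiday"))
```

Requiring agreement between both measures is one way to cut false positives; loosening `max_edits` or `min_jaccard` trades precision for recall.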

Principles

1. Basic Domain Monitoring

1.1. Keywords from the file keywords.txt (e.g. tuigroup) are used for full-word detection (e.g. newtuigroup.shop) and similar-word detection (e.g. tuiqroup.com (g = q)) on newly registered domain names.

1.2. Keywords from the file topic_keywords.txt (e.g. travel) are searched for in the source code of the (translated) webpages (e.g. dulichtui.com) of the domain monitoring results from point 1.1.

==> Results are exported to Newly_Registered_Domains_Calender_Week_ .csv File
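Step 1.2 can be sketched with the standard library's `html.parser`: collect the title, description and keywords of a page and look for topic keywords in them. The class and function names are hypothetical, and the translation step is omitted, so the sketch assumes the text is already in English.

```python
from html.parser import HTMLParser
from typing import List


class MetaExtractor(HTMLParser):
    """Collect the <title> text and the description/keywords meta tags."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attr.get("name") or "").lower() in ("description", "keywords"):
            self.parts.append(attr.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def find_topic_keywords(html: str, topic_keywords: List[str]) -> List[str]:
    """Return the topic keywords that occur in the page's title or meta tags."""
    parser = MetaExtractor()
    parser.feed(html)
    text = " ".join(parser.parts).lower()
    return [kw for kw in topic_keywords if kw.lower() in text]


sample = """<html><head><title>Cheap Travel Deals</title>
<meta name="description" content="Book your holiday today"></head></html>"""
print(find_topic_keywords(sample, ["travel", "holiday", "sneaker"]))
```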

2. Advanced Domain Monitoring

2.1. Keywords from the file topic_keywords.txt (e.g. holiday) are used for full-word detection (e.g. usa-holiday.net) on newly registered domain names.

2.2. Keywords from the file unique_brand_names.txt (e.g. tui) are searched for in the content of the webpages of the monitoring results from point 2.1.

==> Results are exported to Advanced_Monitoring_Results_Calender_Week_ .csv File
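The two stages of the advanced mode can be sketched as a simple filter chain. The function `advanced_monitoring` and the injected `fetch_text` callable are hypothetical; passing the page fetcher in as a parameter keeps the sketch free of network code and is not a claim about how domainthreat structures it.

```python
from typing import Callable, Iterable, List, Tuple

Hit = Tuple[str, List[str], List[str]]


def advanced_monitoring(domains: Iterable[str],
                        topic_keywords: List[str],
                        brand_names: List[str],
                        fetch_text: Callable[[str], str]) -> List[Hit]:
    """Stage 2.1: topic keyword in the domain name; stage 2.2: brand in the page content."""
    hits: List[Hit] = []
    for domain in domains:
        topics = [kw for kw in topic_keywords if kw.lower() in domain.lower()]
        if not topics:
            continue  # only pages of stage 2.1 results are fetched
        content = fetch_text(domain).lower()
        brands = [b for b in brand_names if b.lower() in content]
        if brands:
            hits.append((domain, topics, brands))
    return hits


# Stubbed page contents standing in for real HTTP responses.
pages = {
    "usa-holiday.net": "Book your TUI package holiday now",
    "holiday-pics.com": "Photos from my last vacation",
}
result = advanced_monitoring(
    ["usa-holiday.net", "example.org", "holiday-pics.com"],
    topic_keywords=["holiday"],
    brand_names=["tui"],
    fetch_text=lambda d: pages.get(d, ""),
)
print(result)
```

Plain substring matching on a short brand such as "tui" also matches words like "intuitive", which is one reason for the blacklist described under Instructions.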

Instructions

How to install:

How to run:

  • python3 domainthreat.py

How to update:

  • cd domainthreat
  • git pull
  • In case of a Merge Error: Try "git reset --hard" before "git pull"

Before the first run - How it Works:

  1. Put your brand names or mailing domain names (without the TLD) into the TXT file "User Input/keywords.txt", one per line, for monitoring operations. Some "TUI" names are listed by default.

  2. Put common word collisions that you want to exclude from the results into the TXT file "User Input/blacklist_keywords.txt", one per line, to reduce false positives.

     • e.g. blacklist "lotto" if you monitor the keyword "otto", "amazonas" if you monitor "amazon", or "intuitive" if you monitor "tui" ...

  3. Put commonly used words that describe your brands, industry, brand names or products on websites into the TXT file "User Input/topic_keywords.txt", one per line. These keywords are used for searching / matching in the source code of websites. The default language is English: the HTML title, description and keywords tags are translated automatically by different translators.

     • e.g. keyword "fashion" for a fashion company, "sneaker" for a shoe company, "Zero Sugar" for Coca Cola Inc., "travel" for a travel company ...

  4. Put your brand names into the TXT file "User Input/unique_brand_names.txt", one per line, for monitoring operations (e.g. "tui"). These keywords are used for searching / matching in the source code of websites whose domain names neither contain your brand names nor resemble them (e.g. usa-holiday.net). Some "TUI" names are listed by default.
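How the keyword and blacklist files could interact is sketched below. The helper names `read_lines` and `full_word_matches` are hypothetical, and dropping a domain as soon as any blacklist term occurs in it is a simplifying assumption about the blacklist semantics.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def read_lines(path: str) -> List[str]:
    """Read one entry per line, skipping blank lines and normalizing case."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip().lower() for line in lines if line.strip()]


def full_word_matches(domains: Iterable[str],
                      keywords: List[str],
                      blacklist: List[str]) -> List[Tuple[str, str]]:
    """Return (domain, keyword) pairs, skipping domains with a blacklisted term."""
    results = []
    for domain in domains:
        name = domain.lower()
        if any(term in name for term in blacklist):
            continue  # known word collision, e.g. "lotto" when monitoring "otto"
        results.extend((domain, kw) for kw in keywords if kw in name)
    return results


# In a real run the lists would come from the files, e.g.
#   keywords = read_lines("User Input/keywords.txt")
#   blacklist = read_lines("User Input/blacklist_keywords.txt")
keywords = ["otto", "tui"]
blacklist = ["lotto", "intuitive"]
domains = ["otto-sale.shop", "lotto24.net", "intuitive-apps.io", "newtuigroup.shop"]
print(full_word_matches(domains, keywords, blacklist))
```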

Troubleshooting

  • In case of errors with modules "httpcore" or "httpx" - possible fixes:
    • pip uninstall googletrans (if you installed an older version of domainthreat, i.e. version <= 2.11)
    • pip install --upgrade pip
    • pip install --upgrade httpx
    • pip install --upgrade httpcore

Changelog

Notes

Author

TO DO

  • Add additional fuzzy matching algorithms to increase the true positive rate / accuracy (the sequence-based algorithm "Longest Common Substring" is already included but not activated by default)
  • Enhance source code keyword detection on subdomain level
  • Add the possibility to pass command-line arguments (e.g. number of workers for multithreading)
  • Logo Recognition / Similarity Matching
  • Replace multithreading with asyncio in rate-limited functions (e.g. subdomain enumeration) - done for crtsh and subdomaincenter

Additional

  • The public source used, whoisds (https://www.whoisds.com/newly-registered-domains), caps the number of daily registrations at 100,000.
  • Thresholds are intentionally tolerant by default (possibly high false positive rate) to account for the attacker's degree of freedom in choosing domain name variations. Change them to match your particular (company) needs
  • A perfect supplement to this wonderful project: https://github.com/elceef/dnstwist
  • Written in Python 3.10
  • Recommended Python Version >= 3.7
  • Some TLDs are not included in this public source (e.g. ".de" domains). You can work around this with my other project https://github.com/PAST2212/certthreat, which uses Certificate Transparency Logs as input instead.

License:MIT License


Languages

Language:Python 100.0%