ranguli / ip

ip: (the) i(nternet is) p(robably down)


"ip" is a complete stack for the procurement, processing, analysis and visualization of honeypot data.

Geographical visualization of attacker data

Sample of the SQLite database contents

Bubble map of attackers based on continent


  • Ingests Cowrie honeypot JSON logs into SQLite at 100,000+ insertions/sec, while adding geolocation data from MaxMind.
    • Gets the following information on honeypot attackers:
      • Continent, Country, ISP, Region, City, Timezone, and Postal Code
      • Latitude and Longitude with Accuracy Radius
      • Activity log (login success/fail, logout, credentials used)
      • Log of all access timestamps, plus the timestamps of each attacker's first and last sighting
      • Number of attacks on the honeypot from each IP
    • Visualizes data out of the box in the following ways:
      • Map IP addresses by geolocation, with color coding and labelling based on severity of threat
      • Chart IP addresses by number of attacks conducted
  • Exposes all SQLite data as a Pandas/GeoPandas dataframe, which can be directly manipulated and visualized in the included Jupyter Notebook
  • Low memory consumption
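This is not the project's own ingestion code. It only illustrates the technique usually behind high SQLite insert rates: stream the newline-delimited JSON log, hand all rows to executemany() inside a single transaction, and cache geolocation results so each IP is looked up once. The one-row-per-event table and the geolocate hook are hypothetical, and the field names (eventid, src_ip, timestamp, username, password) assume Cowrie's standard JSON output.

```python
import json
import sqlite3
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple


def iter_cowrie_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-blank line of a Cowrie JSON log."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def ingest(
    conn: sqlite3.Connection,
    lines: Iterable[str],
    geolocate: Callable[[str], Optional[str]] = lambda ip: None,
) -> int:
    """Bulk-insert events in one transaction and return the number of rows added.

    `geolocate` is a hypothetical hook mapping an IP to, e.g., a country code;
    the default performs no lookup.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "timestamp TEXT, src_ip TEXT, eventid TEXT, "
        "username TEXT, password TEXT, country TEXT)"
    )
    cache: Dict[str, Optional[str]] = {}  # geolocate each IP only once

    def rows() -> Iterator[Tuple]:
        for ev in iter_cowrie_events(lines):
            ip = ev.get("src_ip")
            if ip is None:  # skip events without a source address
                continue
            if ip not in cache:
                cache[ip] = geolocate(ip)
            yield (ev.get("timestamp"), ip, ev.get("eventid"),
                   ev.get("username"), ev.get("password"), cache[ip])

    before = conn.total_changes
    with conn:  # single transaction: commit on success, roll back on error
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", rows())
    return conn.total_changes - before
```

Committing once per file rather than once per row is what avoids a disk sync for every insert; the actual throughput still depends on hardware and the real schema.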


  • Clone the GitHub repository with git clone https://github.com/ranguli/ip
  • Install the prerequisite system packages for your OS, listed below.
  • Within the project root, run pip install -r requirements.txt, preferably inside a Python virtual environment (venv).


Debian-based systems

sudo apt install libpython3-dev proj-bin libgeos-dev libproj-dev


  • Run log_digester.py. This will do the following:
    • Create the SQLite schema, including views, needed to store the converted data
    • Geolocate attacker IP addresses
    • Create a SQLite view, 'wordlist', containing the credentials used by attackers. An option to export the wordlist to a .txt file is TBD.
    • Create a profile of each individual attacker, including number of attacks
    • Create a profile of each city and each country, including number of attacks
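A minimal sketch of the kind of schema this step could create. The table, view, and column names are hypothetical rather than taken from log_digester.py, and the sketch assumes every login attempt (successful or failed) counts as one attack. ISO 8601 timestamps in a single timezone sort correctly as text, which is what lets MIN() and MAX() return the first and last sighting.

```python
import sqlite3

# Hypothetical schema; names are illustrative, not the project's.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    timestamp TEXT NOT NULL,   -- ISO 8601, as written by Cowrie
    src_ip    TEXT NOT NULL,
    eventid   TEXT NOT NULL,   -- e.g. cowrie.login.failed
    username  TEXT,
    password  TEXT,
    city      TEXT,
    country   TEXT
);

-- Distinct credentials tried by attackers.
CREATE VIEW IF NOT EXISTS wordlist AS
    SELECT DISTINCT username, password FROM events
    WHERE eventid IN ('cowrie.login.success', 'cowrie.login.failed');

-- One row per attacker IP: attack count plus first and last sighting.
CREATE VIEW IF NOT EXISTS attacker_profile AS
    SELECT src_ip, COUNT(*) AS attacks,
           MIN(timestamp) AS first_seen, MAX(timestamp) AS last_seen
    FROM events
    WHERE eventid IN ('cowrie.login.success', 'cowrie.login.failed')
    GROUP BY src_ip;

-- Attack counts per country; a per-city view would be analogous.
CREATE VIEW IF NOT EXISTS country_profile AS
    SELECT country, COUNT(*) AS attacks FROM events
    WHERE eventid IN ('cowrie.login.success', 'cowrie.login.failed')
    GROUP BY country;
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the events table and its derived views if they do not exist."""
    conn.executescript(SCHEMA)
```

Keeping the profiles as views means they are recomputed from the raw events on every query, so they never drift out of sync with newly ingested logs.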

Structural Overview


  • System packages: libgeos-dev, libgdal-dev, libproj-dev
  • For Python requirements see requirements.txt
  • GeoLite2 City and ASN MMDB files in the root directory of the project, freely downloadable from MaxMind
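A sketch of a pre-flight check for the MMDB files and of pulling the per-attacker fields listed under the features out of them. It assumes MaxMind's default file names (GeoLite2-City.mmdb and GeoLite2-ASN.mmdb) and treats the ASN organization as the "ISP" field, since the free GeoLite2 databases carry no dedicated ISP data. The function names are hypothetical; the readers are passed in so the logic can be exercised without the databases present.

```python
from pathlib import Path
from typing import Any, Dict, List

# Assumed file names (MaxMind's defaults); adjust if yours differ.
REQUIRED_MMDB = ("GeoLite2-City.mmdb", "GeoLite2-ASN.mmdb")


def missing_mmdb_files(project_root: Path) -> List[str]:
    """Return the names of required MMDB files not present in project_root."""
    return [name for name in REQUIRED_MMDB if not (project_root / name).is_file()]


def geolocate(ip: str, city_reader: Any, asn_reader: Any) -> Dict[str, Any]:
    """Collect per-attacker fields from geoip2-style reader objects.

    city_reader and asn_reader are expected to behave like
    geoip2.database.Reader instances opened on the City and ASN databases.
    """
    city = city_reader.city(ip)
    asn = asn_reader.asn(ip)
    return {
        "continent": city.continent.name,
        "country": city.country.name,
        "region": city.subdivisions.most_specific.name,
        "city": city.city.name,
        "postal_code": city.postal.code,
        "timezone": city.location.time_zone,
        "latitude": city.location.latitude,
        "longitude": city.location.longitude,
        "accuracy_radius": city.location.accuracy_radius,  # in kilometres
        "isp": asn.autonomous_system_organization,  # AS org used as a stand-in
    }
```

With the geoip2 package installed, real readers can be opened with geoip2.database.Reader("GeoLite2-City.mmdb") and geoip2.database.Reader("GeoLite2-ASN.mmdb") and passed to geolocate().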

Data Size:

One day's worth of Cowrie JSON logs is 60MB on average. If the honeypot runs 24/7, that comes to about 20-30GB of uncompressed raw log data per honeypot per year, and substantially less if the logs are stored in compressed tar archives. The SQLite database turns a 60-80MB daily log into roughly 1MB of processed data, so the uncompressed database grows by roughly 365MB a year.

Extrapolating this out to a honeynet containing 5 sensors operated over 3 years:

Uncompressed:

  • Daily log yield: ~300MB
  • Yearly log yield: ~100GB
  • Total log yield: ~300GB
  • Total SQLite yield: ~5.5GB

Compressed:

  • Daily log yield: ~30MB
  • Yearly log yield: ~11GB
  • Total log yield: ~33GB
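The storage figures are straightforward multiplication of the per-sensor numbers quoted above. This sketch reproduces them, assuming decimal units (1GB = 1000MB), 365-day years, and about 6MB of compressed log per sensor per day (the rate implied by the ~30MB/day figure for five sensors). At 1MB of SQLite data per sensor per day, five sensors over three years come to roughly 5.5GB.

```python
# Storage estimate built from the per-sensor figures quoted above.
# Assumptions: decimal units (1GB = 1000MB) and 365-day years.
RAW_MB_PER_SENSOR_DAY = 60         # average uncompressed Cowrie JSON log
COMPRESSED_MB_PER_SENSOR_DAY = 6   # assumed; implied by ~30MB/day for 5 sensors
SQLITE_MB_PER_SENSOR_DAY = 1       # processed data per daily log


def honeynet_estimate(sensors: int, years: int) -> dict:
    """Return estimated storage for `sensors` honeypots run for `years` years."""
    days = 365 * years
    return {
        "raw_daily_mb": RAW_MB_PER_SENSOR_DAY * sensors,
        "raw_yearly_gb": RAW_MB_PER_SENSOR_DAY * sensors * 365 / 1000,
        "raw_total_gb": RAW_MB_PER_SENSOR_DAY * sensors * days / 1000,
        "compressed_daily_mb": COMPRESSED_MB_PER_SENSOR_DAY * sensors,
        "compressed_total_gb": COMPRESSED_MB_PER_SENSOR_DAY * sensors * days / 1000,
        "sqlite_total_gb": SQLITE_MB_PER_SENSOR_DAY * sensors * days / 1000,
    }


for name, value in honeynet_estimate(sensors=5, years=3).items():
    print(f"{name}: {value:g}")
```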


  • Create a virtualenv and install the packages from requirements.txt into it
  • Run python log_digester.py, which will use the sample data provided in the repo
  • Run jupyter notebook to view the data visualizations
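Ranking IP addresses by number of attacks, as the notebook's charts do, takes a single GROUP BY query. This sketch assumes a hypothetical events table with one row per attack and a src_ip column, and renders a plain-text bar chart so it runs without plotting libraries. In a notebook, pandas.read_sql_query(TOP_ATTACKERS_SQL, conn, params=(10,)) would return the same rows as a DataFrame ready for plotting.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical query: assumes an `events` table with one row per attack.
TOP_ATTACKERS_SQL = """
    SELECT src_ip, COUNT(*) AS attacks
    FROM events
    GROUP BY src_ip
    ORDER BY attacks DESC, src_ip
    LIMIT ?
"""


def top_attackers(conn: sqlite3.Connection, limit: int = 10) -> List[Tuple[str, int]]:
    """Return (ip, attack_count) pairs, most active first."""
    return conn.execute(TOP_ATTACKERS_SQL, (limit,)).fetchall()


def text_bar_chart(rows: List[Tuple[str, int]], width: int = 40) -> str:
    """Render rows as a plain-text bar chart scaled to the busiest IP."""
    peak = max((count for _, count in rows), default=0)
    lines = []
    for ip, count in rows:
        bar = "#" * (round(width * count / peak) if peak else 0)
        lines.append(f"{ip:<15} {bar} {count}")
    return "\n".join(lines)
```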


  • Dockerize
  • Write a Prometheus exporter
  • Extract and analyze data based on timeframes
    • Get the number of attacks per day, per month, etc.
      • How do we determine the number of attacks for a given day?
        • Need to normalize the timestamps first
    • Get the average frequency of attacks for a timeframe (every minute, twice a day, etc)
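One possible approach to this TODO, sketched under assumptions: normalize every timestamp to an aware UTC datetime first, so a day boundary means the same thing for every record, then bucket by a strftime pattern (%Y-%m-%d for days, %Y-%m for months). The average interval between attacks is the span from first to last attack divided by the number of gaps. The function names are hypothetical, and Cowrie's trailing "Z" is rewritten because datetime.fromisoformat() accepts it only from Python 3.11 onward.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional


def normalize(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp into an aware UTC datetime."""
    if ts.endswith("Z"):  # fromisoformat() accepts "Z" only from Python 3.11
        ts = ts[:-1] + "+00:00"
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:  # assumption: naive timestamps are already UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def attacks_per_period(timestamps: Iterable[str], fmt: str = "%Y-%m-%d") -> Dict[str, int]:
    """Count attacks per UTC period; "%Y-%m-%d" buckets by day, "%Y-%m" by month."""
    counts = Counter(normalize(ts).strftime(fmt) for ts in timestamps)
    return dict(sorted(counts.items()))


def mean_interval_seconds(timestamps: Iterable[str]) -> Optional[float]:
    """Average gap between consecutive attacks, or None with fewer than two."""
    times = sorted(normalize(ts) for ts in timestamps)
    if len(times) < 2:
        return None
    # Consecutive gaps sum to (last - first), so their mean is that span / gaps.
    return (times[-1] - times[0]).total_seconds() / (len(times) - 1)
```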

