connspy is a command line utility to parse connection logs in either batch or realtime and output summaries. It expects logs in the following line format, whitespace separated, "\n" end of line. TS is the UNIX timestamp FROM_HOST is the host that initiated the connection request TO_HOST is the host that recieved the connection request INSTALLATION ------------ Directly from github: pip install git+https://github.com/nicku33/clrtyconnspy Or clone and either pip install . or python setup.py install This will place the 'connspy' and 'connspy-stream' utility in your path as well. Tests can be run with ./test.sh USAGE of connspy ---------------- connspy simply searches within the given time range and returns a list of unique hosts with the 'to' host connected to. It attempts to avoid scanning the entire file by doing a binary search over the records for the --start timestamp. This can be disabled with --nofastseek usage: connspy [-h] [--to TO] --time_init TIME_INIT [--nofastseek] --time_end TIME_END [--max_log_late_seconds MAX_LOG_LATE_SECONDS] file connspy: parse connection logs to see who is connecting to who positional arguments: file the file to parse optional arguments: -h, --help show this help message and exit --to TO Collect all hosts who connected to this host --time_init TIME_INIT the earliest time stamp of the log entries we should consider --nofastseek do not use fast block seek to start time --time_end TIME_END the end of the time stamp range, noninclusive --max_log_late_seconds MAX_LOG_LATE_SECONDS the maximum time in seconds a log line can be late, relative to minimum time example: connspy --time_init 1565647264445 --time_end 1565733587895 --to=zyla sample_data/input-file-10000.txt alyson ramone zephyrus ... CONNSPY-STREAM -------------- This is intended to be used over a range of files or even tailing a file being written to. It senses when hour boundries are crossed and outputs a list of hosts that have connected to --to and connected by --from as well as the most active host (to + from) in the given period. The output format is chosen for easy reading into a database, no matter how large the collections get, since it is envisioned that there could be a large number of domains. The fields are NOTE: If given a list of files, they must be in timestamp order and not overlap beyond the configuarable --max_log_late_seconds, or records will be skipped. 1. Hour of summary as timestamp 2. Does this belong to the --to query or the --from query 3. One of the domains found Because of the use of a Bloom filter, for large (10M unique) records there is a 1% chance of a single missed record. This was done to keep memory usage constant in extreme situations. usage: connspy-stream [-h] --to TO --from FRM [--only_complete_hours] [--tail] [--max_log_late_seconds MAX_LOG_LATE_SECONDS] [files [files ...]] connspy: parse connection logs to see who is connecting to who positional arguments: files the files to parse, separated by space. Leave blank for STDIN optional arguments: -h, --help show this help message and exit --to TO Collect all hosts who connected to this host --from FRM Collect all hosts who this host connected to --only_complete_hours Normally at end of batch, partially completed hours are dumped. However you many only want completed hours --tail if present, the application will read all files but continue to scan the last file for appends and output when new hour boundries are detected. Note this means that an inactive log will not output a row Even if the clock time crosses the hour. "current time" is entirely a function of data. --max_log_late_seconds MAX_LOG_LATE_SECONDS the maximum time in seconds a log line can be late, relative to minimum time example: connspy-stream --to aselin --from tanya sample_data/input-file-10000.txt 1565654400.0 TO tyjhawn 1565654400.0 MOST cayce 1565658000.0 TO devonta 1565658000.0 FROM reneisha 1565658000.0 MOST zeplin 1565661600.0 MOST keyleigh DEPENDENCIES ------------ python 3.0+ bloom-filter>=1.3