nicku33 / clrtyconnspy

log parser exercise

connspy is a command-line utility that parses connection logs, either in batch or in real time, and outputs summaries.

It expects one record per line, terminated by "\n", with three whitespace-separated fields in this order:

TS        is the UNIX timestamp
FROM_HOST is the host that initiated the connection request
TO_HOST   is the host that received the connection request
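A minimal sketch of reading one line in this format, assuming exactly three fields and an integer timestamp (the sample --time_init values further down are 13-digit integers). Conn and parse_line are hypothetical names used for illustration, not part of connspy:

```python
from typing import NamedTuple, Optional

class Conn(NamedTuple):
    ts: int          # UNIX timestamp, kept as the integer found in the log
    from_host: str   # host that initiated the connection
    to_host: str     # host that received the connection

def parse_line(line: str) -> Optional[Conn]:
    """Parse one 'TS FROM_HOST TO_HOST' line; return None for blank or malformed lines."""
    fields = line.split()  # any run of whitespace separates fields; the trailing "\n" is dropped
    if len(fields) != 3:
        return None
    ts, from_host, to_host = fields
    try:
        return Conn(int(ts), from_host, to_host)
    except ValueError:
        return None  # timestamp is not an integer
```

For a made-up line, parse_line("1565647264445 alyson zyla\n") gives Conn(ts=1565647264445, from_host='alyson', to_host='zyla').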


INSTALLATION
------------
Directly from GitHub:
    pip install git+https://github.com/nicku33/clrtyconnspy

Or clone the repository and run either
    pip install .
or
    python setup.py install

This also places the 'connspy' and 'connspy-stream' utilities in your path.

Tests can be run with ./test.sh



USAGE of connspy
----------------
connspy searches within the given time range and
returns a list of the unique hosts that connected to the --to host.
It attempts to avoid scanning the entire file by doing
a binary search over the records for the --time_init timestamp,
which relies on the file being in timestamp order.
This can be disabled with --nofastseek.
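The fast seek can be sketched as a binary search over byte offsets: jump into the file, skip to the next line boundary, read that line's timestamp, and halve the range. This is a hypothetical sketch, not connspy's own code; seek_to_time and hosts_connected_to are illustrative names, it assumes a file sorted by integer timestamp, and widening the window by max_late is only a guess at how --max_log_late_seconds is applied:

```python
import io
from typing import BinaryIO, Optional, Set, Tuple

def _line_at_or_after(f: BinaryIO, pos: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (offset, timestamp) of the first line starting at or after byte pos."""
    if pos == 0:
        f.seek(0)
    else:
        # Step back one byte and discard through the next newline, so a pos
        # that is already the start of a line is not skipped.
        f.seek(pos - 1)
        f.readline()
    start = f.tell()
    line = f.readline()
    if not line.strip():
        return None, None  # end of file (or a trailing blank line)
    return start, int(line.split()[0])

def seek_to_time(f: BinaryIO, target: int) -> int:
    """Byte offset of the first line with timestamp >= target (file must be sorted)."""
    size = f.seek(0, io.SEEK_END)
    lo, hi = 0, size
    while lo < hi:
        mid = (lo + hi) // 2
        _, ts = _line_at_or_after(f, mid)
        if ts is None or ts >= target:  # EOF counts as "past the target"
            hi = mid
        else:
            lo = mid + 1
    start, _ = _line_at_or_after(f, lo)
    return size if start is None else start

def hosts_connected_to(f: BinaryIO, to_host: str, time_init: int, time_end: int,
                       max_late: int = 0) -> Set[str]:
    """Unique hosts that connected to to_host with time_init <= ts < time_end.

    max_late (same units as the timestamps) widens the scanned window to
    tolerate slightly out-of-order lines; this is an assumption.
    """
    f.seek(seek_to_time(f, time_init - max_late))
    found: Set[str] = set()
    for raw in f:
        fields = raw.split()
        if len(fields) != 3:
            continue
        ts, src, dst = int(fields[0]), fields[1].decode(), fields[2].decode()
        if ts >= time_end + max_late:
            break  # past the window: stop without reading the rest of the file
        if time_init <= ts < time_end and dst == to_host:
            found.add(src)
    return found
```

Each probe costs one or two readline calls, so locating --time_init takes a logarithmic number of reads instead of a full scan; only the lines inside the window are then parsed.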

usage: connspy [-h] [--to TO] --time_init TIME_INIT [--nofastseek] --time_end
               TIME_END [--max_log_late_seconds MAX_LOG_LATE_SECONDS]
               file

connspy: parse connection logs to see who is connecting to who

positional arguments:
  file                  the file to parse

optional arguments:
  -h, --help            show this help message and exit
  --to TO               Collect all hosts who connected to this host
  --time_init TIME_INIT
                        the earliest time stamp of the log entries we should
                        consider
  --nofastseek          do not use fast block seek to start time
  --time_end TIME_END   the end of the time stamp range, noninclusive
  --max_log_late_seconds MAX_LOG_LATE_SECONDS
                        the maximum time in seconds a log line can be late,
                        relative to minimum time

example: connspy --time_init 1565647264445 --time_end 1565733587895 --to=zyla sample_data/input-file-10000.txt

alyson
ramone
zephyrus
...

CONNSPY-STREAM
--------------
This is intended to be used over a range of files, or even to tail a file
that is still being written to. It detects when hour boundaries are crossed and
outputs, for each hour, the hosts that connected to the --to host, the hosts that
the --from host connected to, and the most active host (to + from) in that period.
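A rough sketch of the hour-boundary logic, under several assumptions that are not taken from connspy's source: timestamps are in milliseconds (the sample --time_init values have 13 digits) while the printed hour starts are in seconds, as in the example output further down; plain sets stand in for the Bloom filter; "most active" counts connections made plus connections received; and records arriving for an hour that was already emitted are simply dropped instead of being governed by --max_log_late_seconds. summarize and hour_start are hypothetical names:

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Set, Tuple

Record = Tuple[int, str, str]  # (ts, from_host, to_host)
Row = Tuple[float, str, str]   # (hour start, "TO" | "FROM" | "MOST", host)

MS_PER_HOUR = 3_600_000

def hour_start(ts_ms: int) -> float:
    """Start of the hour containing ts_ms, in UNIX seconds (assumes millisecond input)."""
    return float(ts_ms // MS_PER_HOUR * 3600)

def _rows(hour: float, to_set: Set[str], from_set: Set[str], activity: Counter) -> Iterator[Row]:
    for host in sorted(to_set):
        yield hour, "TO", host
    for host in sorted(from_set):
        yield hour, "FROM", host
    if activity:
        # most_common(1) breaks ties in favor of the host seen first.
        yield hour, "MOST", activity.most_common(1)[0][0]

def summarize(records: Iterable[Record], to_host: str, from_host: str,
              only_complete_hours: bool = False) -> Iterator[Row]:
    current: Optional[float] = None
    to_set: Set[str] = set()
    from_set: Set[str] = set()
    activity: Counter = Counter()
    for ts, src, dst in records:
        hour = hour_start(ts)
        if current is None:
            current = hour
        elif hour > current:
            # The data crossed an hour boundary: emit the finished hour, then reset.
            yield from _rows(current, to_set, from_set, activity)
            to_set, from_set, activity = set(), set(), Counter()
            current = hour
        elif hour < current:
            continue  # late record for an hour already emitted; dropped in this sketch
        if dst == to_host:
            to_set.add(src)
        if src == from_host:
            from_set.add(dst)
        activity[src] += 1  # activity = connections made + connections received
        activity[dst] += 1
    if current is not None and not only_complete_hours:
        yield from _rows(current, to_set, from_set, activity)  # trailing partial hour
```

Because a row is emitted only when a record from a later hour arrives, "current time" comes entirely from the data, which matches the --tail behavior described in the usage text.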

The output format is chosen for easy loading into a database no matter
how large the collections get, since there could be a large number of
domains. Each output line has three fields:

1. Start of the hour being summarized, as a timestamp
2. Which query the line belongs to: TO (--to), FROM (--from) or MOST (most active host)
3. One of the domains found

NOTE: If given a list of files, they must be in timestamp order
and must not overlap by more than the configurable --max_log_late_seconds,
or records will be skipped.
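One plausible reading of the lateness rule, sketched under the assumption that the files are read as one concatenated stream: track the newest timestamp seen so far and skip any record older than that by more than the allowed lateness. drop_too_late is a hypothetical helper, and max_late is taken in the same units as the timestamps:

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[int, str, str]  # (ts, from_host, to_host)

def drop_too_late(records: Iterable[Record], max_late: int,
                  skipped: List[Record]) -> Iterator[Record]:
    """Pass records through unless they are more than max_late older than the newest seen."""
    newest = None
    for rec in records:
        ts = rec[0]
        if newest is not None and ts < newest - max_late:
            skipped.append(rec)  # the "records will be skipped" case
            continue
        newest = ts if newest is None else max(newest, ts)
        yield rec
```

With two files whose ranges overlap by more than max_late, the overlapping records from the second file end up in skipped.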

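Because a Bloom filter is used, for large inputs (10M unique records)
there is a 1% chance of a single missed record: a false positive makes
a host that has not been seen yet look as if it had already been reported.
This trade-off keeps memory usage constant even in extreme situations.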
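connspy depends on bloom-filter>=1.3; rather than guess at that package's API, here is a self-contained sketch of the same technique. A filter sized for n items at false-positive rate p needs m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions; for n = 10M and p = 0.01 that is about 96 million bits (roughly 12 MB) and 7 hashes, no matter how long the host names are. bloom_size, BloomFilter and first_sightings are illustrative names, and SHA-256 double hashing is an arbitrary choice:

```python
import hashlib
import math
from typing import Iterable, Iterator, List, Tuple

def bloom_size(n: int, p: float) -> Tuple[int, int]:
    """Optimal bit count m and hash count k for n items at false-positive rate p."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k

class BloomFilter:
    def __init__(self, n: int, p: float) -> None:
        self.m, self.k = bloom_size(n, p)
        self.bits = bytearray((self.m + 7) // 8)  # fixed size, independent of inputs

    def _positions(self, item: str) -> List[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        d = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item))

def first_sightings(hosts: Iterable[str], bloom: BloomFilter) -> Iterator[str]:
    """Yield each host once; a false positive here is the rare "missed record"."""
    for host in hosts:
        if host not in bloom:
            bloom.add(host)
            yield host
```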


usage: connspy-stream [-h] --to TO --from FRM [--only_complete_hours] [--tail]
                      [--max_log_late_seconds MAX_LOG_LATE_SECONDS]
                      [files [files ...]]

connspy: parse connection logs to see who is connecting to who

positional arguments:
  files                 the files to parse, separated by space. Leave blank
                        for STDIN

optional arguments:
  -h, --help            show this help message and exit
  --to TO               Collect all hosts who connected to this host
  --from FRM            Collect all hosts who this host connected to
  --only_complete_hours
                        Normally at end of batch, partially completed hours
                        are dumped. However you may only want completed hours
  --tail                if present, the application will read all files but
                        continue to scan the last file for appends and output
                        when new hour boundaries are detected.
                        Note this means that an inactive log will
                        not output a row even if the clock time crosses the
                        hour. "current time" is entirely a function of data.
  --max_log_late_seconds MAX_LOG_LATE_SECONDS
                        the maximum time in seconds a log line can be late,
                        relative to minimum time

example: connspy-stream --to aselin --from tanya sample_data/input-file-10000.txt

1565654400.0    TO      tyjhawn
1565654400.0    MOST    cayce
1565658000.0    TO      devonta
1565658000.0    FROM    reneisha
1565658000.0    MOST    zeplin
1565661600.0    MOST    keyleigh
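The --tail mode can be sketched as a poll loop in the spirit of tail -f: read until EOF, sleep, and try again, holding back a partially written last line until its "\n" arrives. follow, poll_seconds and max_idle_polls are hypothetical names (the last exists only so the loop can be stopped in tests); this is not connspy's implementation:

```python
import time
from typing import Iterator, Optional, TextIO

def follow(f: TextIO, poll_seconds: float = 1.0,
           max_idle_polls: Optional[int] = None) -> Iterator[str]:
    """Yield complete lines from f, then keep polling for appended data like `tail -f`."""
    partial = ""  # text of a line whose "\n" has not been written yet
    idle = 0      # consecutive polls that found no new data
    while True:
        chunk = f.readline()
        if chunk:
            idle = 0
            partial += chunk
            if partial.endswith("\n"):
                yield partial
                partial = ""
            continue
        # At EOF: stop if the (test-only) idle limit is reached, otherwise wait and retry.
        # A trailing partial line is dropped when the loop stops.
        if max_idle_polls is not None and idle >= max_idle_polls:
            return
        idle += 1
        time.sleep(poll_seconds)
```

Lines from follow(open(path)) can then be parsed and summarized per hour; since nothing is emitted until a record from a new hour appears, an idle log produces no output even as wall-clock hours pass.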

DEPENDENCIES
------------

python 3.0+
bloom-filter>=1.3






