cerebraljam / looking-normal

Web service using surprisal analysis to flag anomalies submitted by clients

Does it look normal?

Security standards are keen on requiring logs to be monitored. It is also expected that whoever, or whatever, monitors these logs will detect anomalies.

... Unless the person or code monitoring the logs knows the likelihood of each event, everything will be hard to explain.

There are plenty of options out there, but I wanted to code one myself to see what it involved. I tried to do this using cross entropy analysis (information theory, https://en.wikipedia.org/wiki/Cross_entropy).
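To give a rough idea of what the analysis is based on (a simplified illustration, not the service's actual code): the rarer an action is across all observations, the more bits of surprise it carries, and a key's score is the sum of the surprisal of everything that key did.

import math
from collections import Counter

# Toy illustration of the idea, not the service's actual implementation:
# the rarer an action is overall, the more bits of surprise it carries,
# and a key's score is the sum of the surprisal of everything it did.
observed = [
    ("1.2.3.4", "/login:success"),
    ("1.2.3.4", "/login:success"),
    ("5.6.7.8", "/login:success"),
    ("5.6.7.8", "/login:failure"),
    ("5.6.7.8", "/password:reset"),
]

action_counts = Counter(action for _, action in observed)
total = sum(action_counts.values())

def surprisal(action):
    # bits of information carried by one occurrence of this action
    return -math.log2(action_counts[action] / total)

for key in ("1.2.3.4", "5.6.7.8"):
    actions = [a for k, a in observed if k == key]
    score = sum(surprisal(a) for a in actions)
    print(key, round(score, 2), "bits over", len(actions), "events")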

Prerequisites

  • docker
  • docker-compose
  • a source of logs that generates discrete outputs.

How to run the service

docker-compose up

This starts a web service that expects GET queries on the /ratemykey endpoint.

How to use

A client needs to submit 3 mandatory parameters to this service

  • The context parameter: identifies which collection (table) the value belongs to (ex.: syslog_by_ip)
  • The key parameter: a unique key identifying the actor doing an action, for example an IP address, username, MAC address, or a concatenation of multiple values that makes sense for you
  • The action parameter: what the actor did, for example login:success. This string can be anything, as long as it represents a discrete action. Anything dynamic (e.g. source port, time, process id) needs to be removed.

There is a last optional parameter:

  • The date, in ISO format. If it is undefined, the current server date is used. This is useful when events need to be replayed, or when events are fed in batches. The service only keeps values for the last 1 day (unless you modify the hardcoded value in the code), so the date of the entry is significant

Usage example

Submitting a value to the service looks like this

curl "http://service:5000/ratemykey?context=authentication&key=1.2.3.4&action=/login:success"

The service will return something like this

{"context": "authentication", "key": "1.2.3.4", "action": "/login:success", "date": "2020-12-18T01:58:06.005000", "runtime": 0.12289857864379883, "result": {"key": "1.2.3.4", "xentropy": 22668.486066189835, "count": 4524, "xz": 17.485557229264874, "normalized": 5.010717521262121, "nz": 1.1418003271427444, "outlier": "false"}}

Making sense of the output

  • The context, key, and action values simply echo what was provided.
  • The date value will be the one provided as parameter, or the current time if none was provided.
  • The runtime value is the processing time from start to finish, which can be useful to see if your system is fast enough for your needs.
  • The result parameter contains... the results
    • xentropy: total amount of information, in bits, generated by the key. This number can become big if the key's session is continuous. A key is only flushed after 24 hours without activity.
    • count: number of values observed for the key
    • normalized: xentropy / count. Since this is calculated for each key, we can use this value to check whether the normalized cross entropy of a key is still significantly greater than that of the others
    • xz: z score of the key against the others. This represents how many standard deviations a key sits under or over the average. xz >= 1 means the value is greater than 84.1% of the others, xz >= 2 means greater than 97.7% of all keys, and xz >= 3 means greater than 99.8% of all keys. So a key with xz >= 3 sits in the top 0.2% of all keys, which is unlikely (see the sketch after this list).
    • nz: z score for the normalized cross entropy. Same concept as xz, but on the normalized values.
    • outlier: if both xz and nz are 3 or greater, we can be quite confident that what the key is doing is unlikely. The xz and nz limits are defined in the Dockerfile.
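To make xz and nz concrete, here is a rough sketch of how a z score can be computed for a key against all the others (a simplification for illustration; the service's own computation may differ, and the numbers are made up):

from statistics import mean, stdev

# Hypothetical per-key totals: total cross entropy (bits) and event count.
keys = {
    "1.2.3.4": {"xentropy": 22668.5, "count": 4524},
    "5.6.7.8": {"xentropy": 310.2, "count": 80},
    "9.9.9.9": {"xentropy": 95.4, "count": 31},
}

def z_score(value, population):
    # how many standard deviations the value sits above or below the mean
    return (value - mean(population)) / stdev(population)

xent = [v["xentropy"] for v in keys.values()]
norm = [v["xentropy"] / v["count"] for v in keys.values()]

for key, v in keys.items():
    xz = z_score(v["xentropy"], xent)
    nz = z_score(v["xentropy"] / v["count"], norm)
    outlier = xz >= 3 and nz >= 3  # default limits described above
    print(key, round(xz, 2), round(nz, 2), outlier)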

How I use these results

I am using Open Policy Agent (OPA) to make decisions on events. The OPA rule is used to filter out known noise, and triggers an alert on anything else with xz >= 3 and nz >= 2. If OPA is not an option and filtering on xz and nz is not convenient, SZLIMIT and NZLIMIT can be modified in the Dockerfile to change the limits above which outlier is set to true. I noticed during testing that administrators and service accounts tend to trigger alerts more than regular users. Instead of raising the xz and nz limits too high, it might be worth filtering known good actions in OPA or through your filter of choice.
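If neither OPA nor a jq pipeline fits your setup, the same decision logic can be applied directly on the service's JSON response. A small sketch (the thresholds mirror the OPA rule above; the noise list is an empty placeholder for you to fill):

# Mirrors the decision described above: drop known noise first,
# then alert on anything with xz >= 3 and nz >= 2.
KNOWN_NOISE = set()  # fill with (key, action) pairs you consider expected

def should_alert(event, xz_limit=3.0, nz_limit=2.0):
    # `event` is the parsed JSON response returned by the service
    if (event["key"], event["action"]) in KNOWN_NOISE:
        return False
    result = event["result"]
    return result["xz"] >= xz_limit and result["nz"] >= nz_limit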

Sample script

./examples/feed_json.py can be used to feed a JSON log to the web service. This script was written to take a custom BigQuery output from Okta, modify the date, then feed it to the web service.

The output of feed_json.py can be filtered using jq as follows:

python3 feed_json.py 0 | jq '.|select((.result.xz >= 3) and (.result.nz >= 2) )'

The output will look like this

{
  "context": "okta_by_ip",
  "key": "10.200.129.56",
  "action": "user.authentication.sso:SUCCESS",
  "runtime": 0.05727791786193848,
  "result": {
    "key": "10.200.129.56",
    "xentropy": 423.69227524651967,
    "count": 81,
    "xz": 4.700408508719075,
    "normalized": 5.230768830203947,
    "nz": 2.19694324038594,
    "outlier": "false"
  }
}

./examples/generate_queries.py can also be used to generate junk.

Cleanup of the database

NOTE: it's a bad idea to expose this to the internet without protection

This query will wipe all data in the database for a specified context.

curl http://localhost:5000/reset?context=okta_by_ip

Differences between the Python version and the Nodejs version

Python

  • no cache: the app goes through a full refresh of the state on each query
  • there might be an issue in the code, or in how garbage collection is handled, but this Python app seems to have a memory leak

Nodejs

  • Caching the score lookup table both in mongodb for all workers, and in memory for the local instance
  • Caching the entries from the database used by the scorekey function, both in mongodb for all workers and in memory for the local instance (a rough sketch of this two-level lookup follows the list)
  • The statistics for keys are calculated on a sample size of 5000. This limit is arbitrary, but since a sample larger than 1050 is generally considered statistically sufficient, this gives some margin for reliable accuracy
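The two-level lookup mentioned above could look something like this sketch. It is written in Python for consistency with the other examples here; the actual implementation is in Node.js and uses mongodb as the shared layer, and the TTL value below is an assumption, not the real tuning.

import time

LOCAL_TTL = 60            # seconds; an assumption, not the service's actual tuning
local_cache = {}          # per-worker, in-memory cache
shared_store = {}         # stand-in for the mongodb-backed cache shared by all workers

def cached_score(key, compute):
    # 1. try the local in-memory cache first (cheapest)
    now = time.time()
    hit = local_cache.get(key)
    if hit is not None and now - hit[1] < LOCAL_TTL:
        return hit[0]
    # 2. fall back to the shared store, 3. recompute only as a last resort
    value = shared_store.get(key)
    if value is None:
        value = compute(key)
        shared_store[key] = value
    local_cache[key] = (value, now)
    return value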

The Nodejs version was tested against a log file with 1.8M distinct keys and over 3.8M transactions. Running locally with the current cache tuning, I managed to process ~170k events per hour, which is not super fast (~47 events per second). While this is sufficient to handle the ~44 events per second coming from that log file, horizontal scaling will likely be necessary to process more events.
