splunk / security_content

Splunk Security Content

Home Page: https://research.splunk.com

SPL Bug - Excessive DNS Failures

bowesmana opened this issue

@patel-bhavin

The SPL for this correlation search is broken:

https://research.splunk.com/network/104658f4-afdc-499e-9719-17243f9826f1/

The search contains a number of problems. The base search is:

| tstats `security_content_summariesonly` count values("DNS.query") as queries from datamodel=Network_Resolution where nodename=DNS "DNS.reply_code"!="No Error" "DNS.reply_code"!="NoError" DNS.reply_code!="unknown" NOT "DNS.query"="*.arpa" "DNS.query"="*.*" by "DNS.src","DNS.query"
| `drop_dm_object_name("DNS")`
| lookup cim_corporate_web_domain_lookup domain as query OUTPUT domain
| where isnull(domain)
| lookup update=true alexa_lookup_by_str domain as query OUTPUT rank
| where isnull(rank)
| stats sum(count) as count mode(queries) as queries by src
| `get_asset(src)`
| where count>50 
| `excessive_dns_failures_filter`

Problems are:

  1. values("DNS.query") as queries is redundant because the search's split-by clause already contains DNS.query, so it will only ever be a single value and can be optimised away.
  2. The order of the get_asset macro and the where clause should be reversed for performance reasons, as there is no point in running get_asset only to throw the majority of results away.
  3. mode(queries) is wrong: each query appears only once per src at that point, so it will only ever return the alphabetically FIRST queries value for each src, not the most frequent one.

This example shows the effect. It generates random counts across 3 src values and 3 queries:

| makeresults count=30
| eval queries=mvindex(split("a.b.com,x.y.com,j.k.com", ","), random() % 3)
| eval count=random() % 20 + 1
| eval src="source_".(random() % 3)
| fields - _time
| stats sum(count) as count by queries src
| stats sum(count) as count mode(queries) as mode_queries list(count) as list_count list(queries) as list_queries by src

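You will always see a.b.com as the mode(queries) in all rows (compare list_count and list_queries), which is NOT the most common query by src. After the first stats, each query occurs exactly once per src, so every value ties and mode() ignores the counts.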
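The same effect can be reproduced outside Splunk. The following is a minimal Python sketch, not Splunk's implementation: the rows are made-up data, and the alphabetical tie-break in `unweighted_mode` is an assumption modelled on the behaviour observed above.

```python
from collections import Counter

# Hypothetical pre-aggregated rows: one row per (src, query),
# which is the shape the data has after `stats ... by queries src`.
rows = [
    {"src": "source_0", "query": "a.b.com", "count": 3},
    {"src": "source_0", "query": "j.k.com", "count": 40},
    {"src": "source_0", "query": "x.y.com", "count": 12},
]

def unweighted_mode(rows):
    """Mode over row values only; the count column is ignored.

    Every query occurs in exactly one row, so all frequencies tie at 1.
    Assumption: ties are broken alphabetically, as observed in the issue.
    """
    freq = Counter(r["query"] for r in rows)
    top = max(freq.values())
    return min(q for q, n in freq.items() if n == top)

def weighted_mode(rows):
    """Query with the largest summed count, i.e. the intended answer."""
    totals = Counter()
    for r in rows:
        totals[r["query"]] += r["count"]
    return max(totals, key=totals.get)

print(unweighted_mode(rows))  # a.b.com
print(weighted_mode(rows))    # j.k.com
```

The unweighted version never looks at `count`, which is why the alphabetically first query wins regardless of how often it was actually seen.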

The correct logic to determine the most frequent query by src is to do this instead of the stats statement:

| eventstats max(count) as mc by src
| eval mode_query=if(count=mc, query, null())
| stats sum(count) as count values(mode_query) as query values(mc) as max_query_count by src

If several queries tie for the maximum count, query will be a multivalue field. If that is not OK, simply keep the first value:

| eval query=mvindex(query,0)
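For readers less familiar with eventstats, here is a rough Python emulation of the three-step logic above (per-src maximum, keep only the queries that hit it, then aggregate). It is a sketch under assumptions: the function name and sample rows are hypothetical, and sorting the tied queries stands in for the sorted, de-duplicated output of `values()`.

```python
from collections import defaultdict

def most_frequent_query_by_src(rows):
    # Step 1 (eventstats max(count) as mc by src): per-src maximum count.
    mc = defaultdict(int)
    for r in rows:
        mc[r["src"]] = max(mc[r["src"]], r["count"])

    # Steps 2 and 3 (eval mode_query=..., then stats ... by src):
    # sum the counts and keep only the queries whose count equals the maximum.
    out = {}
    for r in rows:
        entry = out.setdefault(
            r["src"],
            {"count": 0, "query": set(), "max_query_count": mc[r["src"]]},
        )
        entry["count"] += r["count"]
        if r["count"] == mc[r["src"]]:
            entry["query"].add(r["query"])

    # Sorted unique list, standing in for values().
    for entry in out.values():
        entry["query"] = sorted(entry["query"])
    return out

# Hypothetical rows: source_0 has two queries tied at the maximum.
rows = [
    {"src": "source_0", "query": "a.b.com", "count": 5},
    {"src": "source_0", "query": "x.y.com", "count": 20},
    {"src": "source_0", "query": "j.k.com", "count": 20},
    {"src": "source_1", "query": "a.b.com", "count": 7},
]

result = most_frequent_query_by_src(rows)
print(result["source_0"])              # both tied queries are kept
print(result["source_0"]["query"][0])  # first value, like mvindex(query,0)
```

Note that taking the first value of a tie is an arbitrary choice: it picks the alphabetically first of the tied queries, not a "more frequent" one.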

The SPL should probably read something like this, which fixes all 3 issues documented here:

| tstats `security_content_summariesonly` count from datamodel=Network_Resolution where nodename=DNS "DNS.reply_code"!="No Error" "DNS.reply_code"!="NoError" DNS.reply_code!="unknown" NOT "DNS.query"="*.arpa" "DNS.query"="*.*" by "DNS.src","DNS.query"
| `drop_dm_object_name("DNS")` 
| lookup cim_corporate_web_domain_lookup domain as query OUTPUT domain 
| where isnull(domain) 
| lookup update=true alexa_lookup_by_str domain as query OUTPUT rank 
| where isnull(rank) 
| eventstats max(count) as mc by src
| eval mode_query=if(count=mc, query, null())
| stats sum(count) as count values(mode_query) as query values(mc) as max_query_count by src
| where count>50 
| `get_asset(src)`
| `excessive_dns_failures_filter` 

One general issue with this correlation search is what should be done with reply_code. Currently all reply codes are lumped together, so it's not possible to see which reply_code is common. Adding a values(DNS.reply_code) to the tstats is probably not the way to go, as the search is looking for the most common query, as opposed to the most common reply_code/query combination.

It may be useful to split by DNS.reply_code as well as query, and perhaps lower the threshold test (50). That way the reply code can make its way to the final output, for example with this SPL:

| tstats `security_content_summariesonly` count from datamodel=Network_Resolution where nodename=DNS "DNS.reply_code"!="No Error" "DNS.reply_code"!="NoError" DNS.reply_code!="unknown" NOT "DNS.query"="*.arpa" "DNS.query"="*.*" by "DNS.src" "DNS.query" "DNS.reply_code"
| `drop_dm_object_name("DNS")` 
| lookup cim_corporate_web_domain_lookup domain as query OUTPUT domain 
| where isnull(domain) 
| lookup update=true alexa_lookup_by_str domain as query OUTPUT rank 
| where isnull(rank) 
| eventstats max(count) as mc by src reply_code
| eval mode_query=if(count=mc, query, null()) 
| stats sum(count) as count values(mode_query) as query values(mc) as max_query_count by src reply_code
| where count>50 
| `get_asset(src)` 
| `excessive_dns_failures_filter` 
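The reason the threshold may need lowering once reply_code is part of the split can be shown with a small Python sketch. The data, reply code labels, and helper name are all hypothetical, chosen only to illustrate the arithmetic.

```python
from collections import defaultdict

# Hypothetical failures from one source: 60 in total, split evenly
# between two reply codes.
rows = [
    {"src": "10.0.0.5", "reply_code": "NXDomain", "query": "foo.example", "count": 30},
    {"src": "10.0.0.5", "reply_code": "ServFail", "query": "bar.example", "count": 30},
]

def over_threshold(rows, keys, threshold):
    """Sum count per group (like `stats sum(count) by <keys>`),
    then keep groups above the threshold (like `where count>N`)."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[k] for k in keys)] += r["count"]
    return {k: v for k, v in totals.items() if v > threshold}

print(over_threshold(rows, ["src"], 50))                # src alone: 60 > 50, fires
print(over_threshold(rows, ["src", "reply_code"], 50))  # 30 per group: nothing fires
print(over_threshold(rows, ["src", "reply_code"], 25))  # lower threshold: both fire
```

Splitting by reply_code divides each source's failures across more groups, so a source that exceeded 50 in total can fall under the threshold in every individual group.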

Hello, thank you very much for the detailed feedback. I created a fix in PR #2490.