SPL Bug - Excessive DNS Failures

Question

bowesmana opened this issue 2 years ago

bowesmana commented 2 years ago

@patel-bhavin

The SPL for this correlation search is broken

https://research.splunk.com/network/104658f4-afdc-499e-9719-17243f9826f1/

The search contains a number of problems. The base search is:

| tstats `security_content_summariesonly` count values("DNS.query") as queries from datamodel=Network_Resolution where nodename=DNS "DNS.reply_code"!="No Error" "DNS.reply_code"!="NoError" DNS.reply_code!="unknown" NOT "DNS.query"="*.arpa" "DNS.query"="*.*" by "DNS.src","DNS.query"
| `drop_dm_object_name("DNS")`
| lookup cim_corporate_web_domain_lookup domain as query OUTPUT domain
| where isnull(domain)
| lookup update=true alexa_lookup_by_str domain as query OUTPUT rank
| where isnull(rank)
| stats sum(count) as count mode(queries) as queries by src
| `get_asset(src)`
| where count>50 
| `excessive_dns_failures_filter`

Problems are:

values("DNS.query") as queries is redundant because the search's split-by clause already contains DNS.query, so each row will only ever hold a single value. It can be optimised away.
The order of the get_asset macro and the where clause should be reversed for performance reasons, as there is no point in running get_asset on every row only to throw the majority of the results away.
mode(queries) is wrong in that it will only ever return the FIRST alphabetically ordered queries value by src. At that point each src/query pair occupies exactly one row, so every query appears once per src, and mode() never takes the count field into account.

This example shows the effect. It generates 30 events, each with a random count, one of 3 src values and one of 3 query values, then aggregates them the same way the correlation search does.

| makeresults count=30
| eval queries=mvindex(split("a.b.com,x.y.com,j.k.com", ","), random() % 3)
| eval count=random() % 20 + 1
| eval src="source_".(random() % 3)
| fields - _time
| stats sum(count) as count by queries src
| stats sum(count) as count mode(queries) as mode_queries list(count) as list_count list(queries) as list_queries by src

You will always see a.b.com as the mode(queries) in all rows, regardless of which query actually has the highest count for that src.
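
Because the example above uses random(), the numbers differ on every run. A minimal deterministic sketch of the same effect could look like this. The src, queries and count values are made-up sample rows, assumed here purely for illustration:

```spl
| makeresults
| eval row=split("source_0,a.b.com,2;source_0,x.y.com,40;source_0,j.k.com,5", ";")
| mvexpand row
| eval src=mvindex(split(row, ","), 0), queries=mvindex(split(row, ","), 1), count=tonumber(mvindex(split(row, ","), 2))
| fields - _time row
| stats sum(count) as count mode(queries) as mode_queries by src
```

Here x.y.com has by far the highest count for source_0. Given the behaviour described above, mode_queries would still be expected to come back as a.b.com, because each query occupies exactly one row and the count field plays no part in mode().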

The correct logic to determine the most frequent query by src is to replace the stats statement with this:

| eventstats max(count) as mc by src
| eval mode_query=if(count=mc, query, null())
| stats sum(count) as count values(mode_query) as query values(mc) as max_query_count by src

If several queries tie for the maximum count, query will be a multivalue field. If that is not OK, then simply keep the first value:

| eval query=mvindex(query,0)
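
To see the replacement logic and the tie case together, here is a small sketch on made-up rows. The sample values are assumptions chosen for illustration, with x.y.com and j.k.com tied on the maximum count for source_0:

```spl
| makeresults
| eval row=split("source_0,a.b.com,2;source_0,x.y.com,40;source_0,j.k.com,40", ";")
| mvexpand row
| eval src=mvindex(split(row, ","), 0), query=mvindex(split(row, ","), 1), count=tonumber(mvindex(split(row, ","), 2))
| fields - _time row
| eventstats max(count) as mc by src
| eval mode_query=if(count=mc, query, null())
| stats sum(count) as count values(mode_query) as query values(mc) as max_query_count by src
| eval query=mvindex(query,0)
```

Before the final eval, query should hold both tied values, with count=82 and max_query_count=40. values() returns its results in lexicographical order, so mvindex(query,0) keeps j.k.com. Drop the last line if every tied query should be reported.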

The SPL should probably read something like this, which fixes all 3 issues documented here

| tstats `security_content_summariesonly` count from datamodel=Network_Resolution where nodename=DNS "DNS.reply_code"!="No Error" "DNS.reply_code"!="NoError" DNS.reply_code!="unknown" NOT "DNS.query"="*.arpa" "DNS.query"="*.*" by "DNS.src","DNS.query"
| `drop_dm_object_name("DNS")` 
| lookup cim_corporate_web_domain_lookup domain as query OUTPUT domain 
| where isnull(domain) 
| lookup update=true alexa_lookup_by_str domain as query OUTPUT rank 
| where isnull(rank) 
| eventstats max(count) as mc by src
| eval mode_query=if(count=mc, query, null())
| stats sum(count) as count values(mode_query) as query values(mc) as max_query_count by src
| where count>50 
| `get_asset(src)`
| `excessive_dns_failures_filter`

bowesmana · Answer 1 · Mon Nov 07 2022 12:08:31 GMT+0800 (China Standard Time)

One general issue with this correlation search is what should be done with reply_code. Currently all reply codes are lumped together, so it's not possible to see which reply_code is common. Adding a values(DNS.reply_code) to the tstats is probably not the way to go, as the search is looking for the most common query, as opposed to the most common reply_code/query combination.

It may be useful to split by DNS.reply_code as well as query, and perhaps lower the threshold test (50). That way the reply code can make its way to the final output, for example with this SPL:

| tstats `security_content_summariesonly` count from datamodel=Network_Resolution where nodename=DNS "DNS.reply_code"!="No Error" "DNS.reply_code"!="NoError" DNS.reply_code!="unknown" NOT "DNS.query"="*.arpa" "DNS.query"="*.*" by "DNS.src" "DNS.query" "DNS.reply_code"
| `drop_dm_object_name("DNS")` 
| lookup cim_corporate_web_domain_lookup domain as query OUTPUT domain 
| where isnull(domain) 
| lookup update=true alexa_lookup_by_str domain as query OUTPUT rank 
| where isnull(rank) 
| eventstats max(count) as mc by src reply_code
| eval mode_query=if(count=mc, query, null()) 
| stats sum(count) as count values(mode_query) as query values(mc) as max_query_count by src reply_code
| where count>50 
| `get_asset(src)` 
| `excessive_dns_failures_filter`

Bhavin Patel · Answer 2 · Thu Dec 22 2022 05:16:26 GMT+0800 (China Standard Time)

Hello, thank you very much for the detailed feedback. Created this fix: PR #2490