Network analysis capabilities for pcap files

Question

aaronatp opened this issue 7 months ago

Hi @mr-tz and @williballenthin, I was looking at issues #1532 and #1549 and have been thinking about how to implement some network analysis capabilities for capa. Some of the capabilities I have been thinking about include:

- matching yara rules to pcap data, and printing any identified capabilities to the user
- printing malicious IP addresses and domain names (and the protocols and ports used in communication with each)
- checking if there are suspicious DNS queries, such as for newly registered domains, or domains that have been associated with malware. I have read that there may be some services that can do this, but I have not looked that far into it. It probably wouldn't be that hard for capa to check if a domain is newly registered anyway
- a simple, flexible graphing feature to print data like packet size or traffic volume over time. Users could feed various types of pcap data to it using wireshark-like filters. This could help users see, e.g., traffic from a suspicious domain name, or the number of large data packets, over time

All of these features would be run against a pcap file. What do you think of these features? Are there other capabilities that should be here too?

Also, with regard to the yara rules above, would it be a good idea to have "flexible" yara rules here? Since data integrity is relatively unreliable with some network protocols like UDP, some malicious sequences might become damaged in transmission. "Flexible" yara rules could alert users even if the rules' byte sequences are slightly damaged. This might be more useful for users whose pcap files are generated "in the wild" compared to users whose pcaps are from sandboxes.

It looks like there are some existing types of "flexible" algorithms for matching yara sequences based on "fuzzy hashing," the Smith-Waterman algorithm, and concepts called Levenshtein distance and Hamming distance. However, these flexible algorithms primarily address other problems; none improve yara rule matching when data packets are damaged in transit.

As an example of the damage that data packets may experience in transit: if a malicious actor sends a packet containing the 10-letter malicious sequence ABCDBCCBDA, and two of the letters are dropped or damaged in transit, we might receive ABC-BCC-DA. Yara rules expecting all 10 letters may not match against this slightly altered sequence.
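To make the example concrete, here is a minimal sketch of a sliding-window comparison that tolerates a fixed number of bytes damaged in place, using Hamming distance. The function name `hamming_matches` and the `max_mismatches` parameter are made up for illustration; this is not part of capa or yara, and it assumes the damaged bytes keep their position (nothing is actually removed from the stream).

```python
def hamming_matches(data: bytes, pattern: bytes, max_mismatches: int):
    """Return (offset, mismatches) for every window of `data` that differs
    from `pattern` in at most `max_mismatches` byte positions.

    Hypothetical helper for illustration only; not a capa or yara API.
    """
    n = len(pattern)
    hits = []
    for offset in range(len(data) - n + 1):
        window = data[offset:offset + n]
        # Hamming distance: count positions where the bytes differ.
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= max_mismatches:
            hits.append((offset, mismatches))
    return hits


# The damaged sequence from the example, embedded in some padding bytes.
payload = b"xxABC-BCC-DAyy"
print(hamming_matches(payload, b"ABCDBCCBDA", max_mismatches=2))  # [(2, 2)]
```

With `max_mismatches=1` the same payload produces no hits, which shows how the tolerance trades missed detections against false positives.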

A good algorithm should probably be able to determine roughly how many bytes in a sequence are dropped or damaged, and then apply capa's existing rules to efficiently look for "flexible" yara matches. Please let me know if you think that something like this sounds reasonable - there's a fair chance I've gone too far down the yara rule rabbit hole. "Flexible" yara rules may only be necessary if malicious data packets are sometimes damaged and if some other conditions are met.
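One possible way to estimate how many bytes were dropped or damaged is approximate substring matching with edit (Levenshtein) distance, which, unlike Hamming distance, also handles bytes that are missing entirely. The sketch below uses the standard dynamic-programming approach often attributed to Sellers; `best_approx_match` is a hypothetical name, not something capa or yara provides, and the quadratic running time would likely need optimizing for real pcap volumes.

```python
def best_approx_match(data: bytes, pattern: bytes):
    """Return (min_edits, end_offset): the smallest edit distance between
    `pattern` and any substring of `data`, and where that substring ends.

    Hypothetical helper for illustration only; not a capa or yara API.
    """
    m = len(pattern)
    # prev[i] = cost of matching pattern[:i] against an empty substring.
    prev = list(range(m + 1))
    best = (prev[m], 0)
    for end, byte in enumerate(data, start=1):
        # cur[0] stays 0 so that a match may start at any offset in data.
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            substitution = prev[i - 1] + (pattern[i - 1] != byte)
            extra_byte_in_data = prev[i] + 1
            byte_missing_from_data = cur[i - 1] + 1
            cur[i] = min(substitution, extra_byte_in_data, byte_missing_from_data)
        if cur[m] < best[0]:
            best = (cur[m], end)
        prev = cur
    return best


# Two bytes of ABCDBCCBDA were dropped, leaving ABCBCCDA in the payload.
print(best_approx_match(b"xxABCBCCDAyy", b"ABCDBCCBDA"))  # (2, 10)
```

A rule could then be treated as a "flexible" match when `min_edits` is below some threshold relative to the pattern length; choosing that threshold is the open design question.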

Moritz · Answer 1 · Wed Dec 13 2023 20:06:32 GMT+0800 (China Standard Time)

Hey, I've just opened another related issue at #1907.

As for the above suggestions, I think we'd want to focus on using the existing data (i.e., from sandbox runs) rather than generating new data or querying external databases. Some ideas here would be cool for a "capa for Wireshark" tool. For now, though, I think we could pull all important data from the sandbox traces (or from the summary section as mentioned in #1907).

aaronatp · Answer 2 · Thu Dec 14 2023 12:24:33 GMT+0800 (China Standard Time)

Thanks for clarifying some of these things @mr-tz and for opening that issue, I'll have a look!