JusticeRage / Manalyze

A static analyzer for PE executables.



Error parsing ldb file

malware-kitten opened this issue

Hello,

This might be a case of user error, but when I try to run parse_clamav.py against a custom set of ClamAV rules (https://raw.githubusercontent.com/wmetcalf/clam-punch/master/miscreantpunch099.ldb), it generates the following error:

Unable to understand the following offset: 48344426616d703b48354126616d703b*48353426616d703b48363826616d703b48363926616d703b48373326616d703b48323026616d703b48373026616d703b48373226616d703b48366626616d703b48363726616d703b48373226616d703b48363126616d703b48366426616d703b

This appears to be from this line in the ldb file.

MiscreantPunch.EXEInsideOfDoc.ASASCII.2;Target:0;(0);48344426616d703b48354126616d703b*48353426616d703b48363826616d703b48363926616d703b48373326616d703b48323026616d703b48373026616d703b48373226616d703b48366626616d703b48363726616d703b48373226616d703b48363126616d703b48366426616d703b::i

Any help you can give would be greatly appreciated!

Hi! Thanks for your report.
I've been looking into the ClamAV rules you provided, and I'm a bit stumped. The parser is based on the following (official?) specification.

Let's break down the rule that makes the script crash:
SignatureName: MiscreantPunch.EXEInsideOfDoc.ASASCII.2
TargetDescriptionBlock: Target:0 (Any file type)
LogicalExpression: (0) (Subsig0 must be present)
Subsig0: 48344426616d703b48354126616d703b*48353426616d703b48363826616d703b48363926616d703b48373326616d703b48323026616d703b48373026616d703b48373226616d703b48366626616d703b48363726616d703b48373226616d703b48363126616d703b48366426616d703b::i
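For reference, the breakdown above amounts to splitting the rule line on ";". Here is a minimal sketch of that step, assuming that the first three fields are always the name, the target description block and the logical expression, and that no subsignature contains a ";" of its own (which may not hold for every rule). `split_ldb_rule` is a hypothetical helper for illustration, not the function used in parse_clamav.py.

```python
def split_ldb_rule(line):
    """Split one .ldb rule line into its ';'-separated parts.

    Returns (name, target_blocks, expression, subsigs).
    Hypothetical helper: assumes no subsignature contains ';'.
    """
    fields = line.strip().split(";")
    if len(fields) < 4:
        # Name, target block, expression and at least one subsignature.
        raise ValueError("expected at least 4 ';'-separated fields")
    name, target_block, expression = fields[:3]
    subsigs = fields[3:]
    # The target description block is itself a comma-separated list.
    return name, target_block.split(","), expression, subsigs


if __name__ == "__main__":
    # Shortened, made-up rule in the same shape as the one above.
    print(split_ldb_rule("Example.Rule;Target:0;(0);4d5a*5045::i"))
```

Checking the field count before indexing means a line that is not a rule raises a clear error instead of an IndexError.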

So far, so good. From my understanding of the documentation, the "subsigN" can either be structured as "[offset]:[hex string]" or "[hex-string]".
Therefore, my script splits this line on ":", and if there is more than one token, it is assumed that the first one is an offset and the second one is the hexadecimal string. Apparently, this is not the case here, as there are 3 tokens. The first one seems to be the hex-string (expected in second position), there's a blank one in the middle, and I have no idea what the "i" at the end means.
So the hex-string is interpreted as an offset (which should look like "EIP+20" or something) and of course, it fails.
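A minimal sketch of how the split could tolerate such a suffix, assuming that a trailing "::" introduces some kind of modifier block (nothing in the specification I have confirms this). `split_subsig` is a hypothetical helper, not the code currently in parse_clamav.py, and the "EIP+20" offset is only an illustrative value.

```python
def split_subsig(subsig):
    """Split a subsignature into (offset, hex_string, modifiers).

    Assumption: anything after the first '::' is a modifier block.
    offset is None when the subsignature has no '[offset]:' prefix.
    """
    # Peel off the assumed modifier suffix first, so the '::' is not
    # mistaken for an offset separator followed by an empty token.
    body, _, modifiers = subsig.partition("::")
    offset, sep, hex_string = body.partition(":")
    if not sep:
        # No ':' left in the body: it is a bare hex string.
        offset, hex_string = None, body
    return offset, hex_string, modifiers


if __name__ == "__main__":
    print(split_subsig("4d5a*5045::i"))
    print(split_subsig("EIP+20:4d5a"))
```

With this ordering, the hex string from the failing rule is no longer treated as an offset, and the meaning of the suffix can be decided separately.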

So based on my understanding of the structure of ClamAV signatures (which is likely to be flawed and/or incomplete), the rules you linked to are indeed invalid, as several of them contain the "::i" suffix.
If you know of a more recent specification of the signatures explaining what they are, I'd be more than willing to update the parser!

After talking with the owner of the Clam rules, it sounds like there are some newer features in 0.99 that the signatures are utilizing. The direct quote from him is:
"Clamav-0.99+ and ldb sigs supports PCRE which we use pretty often in those rules, even using named captures etc. So there is probably no clean conversion in those cases."

So it sounds like converting these rules automatically might be a wash, and they may just require manual conversion (where applicable).

YARA supports regular expressions, so there might actually be a way to convert those signatures.
I would need to find this ClamAV-0.99+ specification/documentation however. I'll look around for it, but if the owner of the rules has it lying around somewhere, I'd be more than happy to get it from them!

Not sure if I need to open a new issue, since the issue title also applies to my case.
I have a problem with the main.ldb file: when it is parsed, the script fails with the following error:

$ ./update_clamav_signatures.py 
Downloading: main.cvd Bytes: 168205379
Rule Win.Trojan.EOL-1 seems to be malformed. Skipping...
Traceback (most recent call last):
  File "/home/dadokkio/Docker/Manalyze/bin/yara_rules/./update_clamav_signatures.py", line 166, in <module>
    update_signatures(URL_MAIN, args.skipdownload)
  File "/home/dadokkio/Docker/Manalyze/bin/yara_rules/./update_clamav_signatures.py", line 138, in update_signatures
    parse_ldb("%s.ldb" % file_basename, "clamav.yara", file_basename != "main")
  File "/home/dadokkio/Docker/Manalyze/bin/yara_rules/parse_clamav.py", line 324, in parse_ldb
    for block in data[1].split(","):
IndexError: list index out of range

The line that triggers the error (the very first one in the file) is:
# There must always be one line in the file

It seems that comment lines are not skipped.
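A minimal sketch of the kind of filtering that would avoid this, assuming that comment lines always start with "#" as in the line quoted above. `iter_rule_lines` is a hypothetical helper, not part of parse_clamav.py.

```python
def iter_rule_lines(lines):
    """Yield stripped rule lines, skipping blank lines and '#' comments."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        yield stripped


if __name__ == "__main__":
    sample = [
        "# There must always be one line in the file\n",
        "\n",
        "Example.Rule;Target:0;(0);4d5a\n",  # made-up rule for illustration
    ]
    for rule in iter_rule_lines(sample):
        print(rule)
```

The parser could then loop over `iter_rule_lines(f)` rather than over the file object directly, so only real rules reach the `split(";")` step.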

Thanks a lot! That looks new, I'll fix this ASAP.