sepinf-inc / IPED

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.

Define a maximum parsing time out again

lfcnassif opened this issue

The current time out control actually checks parsing progress: if there is some progress (chars written to the ContentHandler), the time out counter is reset. This was designed to handle huge files that may take a long time to parse, where the client thread can see progress from the parsing thread.
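A minimal sketch of that progress-based behaviour, to make the discussion concrete. The class and method names are hypothetical and are not IPED's actual code; timestamps are passed in explicitly so the logic can be checked without real waiting.

```java
// Hypothetical sketch; names are illustrative, not IPED's API.
final class ProgressTimeout {
    private final long timeoutMillis;
    private long lastProgressMillis;

    ProgressTimeout(long timeoutMillis, long startMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastProgressMillis = startMillis;
    }

    // Called by the parsing thread whenever chars are written to the content handler.
    synchronized void onProgress(long nowMillis) {
        lastProgressMillis = nowMillis; // this is the "reset" of the counter
    }

    // Polled by the client thread; true once no progress was seen for timeoutMillis.
    synchronized boolean isExpired(long nowMillis) {
        return nowMillis - lastProgressMillis >= timeoutMillis;
    }
}
```

In real use both threads would pass `System.currentTimeMillis()`. Note that a parser which keeps writing even one char per interval never expires here.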

Theoretically, a corrupted or maliciously hand-crafted file can cause parser code without proper data validation to enter an infinite loop, writing text to the content handler indefinitely, which would never trigger a time out. Today, that situation is handled by the ZipBombException check: if the text written to the content handler is much bigger than the data being parsed, a ZipBombException is thrown and parsing is interrupted.
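The ratio check can be sketched roughly as follows. This is only an illustration of the idea: the class, the exception type (a stand-in for ZipBombException) and the threshold values are assumptions, not IPED's implementation.

```java
// Hypothetical sketch: names and thresholds are illustrative, not IPED's actual code.
final class OutputRatioGuard {

    // Stand-in for the ZipBombException mentioned above.
    static final class SuspectedZipBombException extends RuntimeException {
        SuspectedZipBombException(String message) {
            super(message);
        }
    }

    private final long maxOutputChars;
    private long outputChars;

    // inputBytes: size of the data being parsed.
    // maxRatio: allowed chars of output per input byte (assumed value).
    // minOutputChars: output that is always allowed, so tiny inputs are not flagged too early.
    OutputRatioGuard(long inputBytes, long maxRatio, long minOutputChars) {
        this.maxOutputChars = Math.max(minOutputChars, inputBytes * maxRatio);
    }

    // Called for every chunk of text the parser writes to the content handler.
    void onCharsWritten(long count) {
        outputChars += count;
        if (outputChars > maxOutputChars) {
            throw new SuspectedZipBombException(
                    "Output of " + outputChars + " chars exceeds limit of " + maxOutputChars);
        }
    }
}
```

This bounds the total output, so an infinite writing loop is eventually stopped, but it says nothing about how long the parser takes to produce that output.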

But there are rare situations where parsing is progressing, but very slowly, like the one optimized in #2084 (4 days to parse a small RAR file). Another example is when a file of several GBs is wrongly detected as HTML: HTMLParser is very slow on it and can take days to finish.

So I think it would be worthwhile to check again for a total maximum parse time for each file, proportional to file size. What would be a reasonable waiting time per MB?
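A size-proportional limit could be computed along these lines. The method name, the fixed base allowance and the per-MB value are all assumptions for illustration; the right value per MB is exactly the open question here.

```java
// Hypothetical sketch; names and parameters are illustrative, not IPED's API.
final class MaxParseTime {
    private static final long MB = 1024L * 1024L;

    // baseMillis: fixed allowance every file gets, regardless of size (assumed).
    // millisPerMB: extra allowance per megabyte of input (value to be decided).
    static long maxParseMillis(long fileSizeBytes, long baseMillis, long millisPerMB) {
        long sizeMB = (fileSizeBytes + MB - 1) / MB; // round up to whole MBs
        return baseMillis + sizeMB * millisPerMB;
    }
}
```

For example, with a 60 s base and 1 s per MB (placeholder numbers only), `MaxParseTime.maxParseMillis(10L * 1024 * 1024, 60_000, 1_000)` gives 70000 ms for a 10 MB file. The base allowance keeps very small files from getting a near-zero limit.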

With the new maximum parse time out, maybe we can decrease the current progress time out. One important thing to keep in mind is that, for some complex formats or large files, it usually takes some time until the parser starts to output results. For example, large PST or OST files take time to be copied to the temp folder, and then more time to parse the mailbox index structures. So we can't decrease the current progress time out that much, or we might create a third time out counter to monitor the initial parsing phase, until it begins to output results...
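One way the three counters could fit together, as a rough sketch only. Everything here (class name, status values, constructor parameters) is hypothetical; timestamps are passed in so the logic is easy to test.

```java
// Hypothetical sketch; names are illustrative, not IPED's API.
final class ParseDeadlines {

    enum Status { OK, INITIAL_TIMEOUT, PROGRESS_TIMEOUT, TOTAL_TIMEOUT }

    private final long startMillis;
    private final long initialTimeoutMillis;  // allowed time before the first output
    private final long progressTimeoutMillis; // allowed gap between outputs afterwards
    private final long totalTimeoutMillis;    // hard cap, e.g. proportional to file size
    private boolean hasOutput;
    private long lastProgressMillis;

    ParseDeadlines(long startMillis, long initialTimeoutMillis,
                   long progressTimeoutMillis, long totalTimeoutMillis) {
        this.startMillis = startMillis;
        this.initialTimeoutMillis = initialTimeoutMillis;
        this.progressTimeoutMillis = progressTimeoutMillis;
        this.totalTimeoutMillis = totalTimeoutMillis;
    }

    // Called by the parsing thread when chars reach the content handler.
    synchronized void onProgress(long nowMillis) {
        hasOutput = true;
        lastProgressMillis = nowMillis;
    }

    // Polled by the client thread.
    synchronized Status check(long nowMillis) {
        if (nowMillis - startMillis >= totalTimeoutMillis) {
            return Status.TOTAL_TIMEOUT; // never reset by progress
        }
        if (!hasOutput) {
            return nowMillis - startMillis >= initialTimeoutMillis
                    ? Status.INITIAL_TIMEOUT : Status.OK;
        }
        return nowMillis - lastProgressMillis >= progressTimeoutMillis
                ? Status.PROGRESS_TIMEOUT : Status.OK;
    }
}
```

With this split, the initial time out can stay generous for formats like PST or OST, while the progress time out used after the first output can be much shorter.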

Or we can just stop resetting the time out counter and increase it a bit, so it would behave like a total maximum time out...

PS: The problem is that parsing time depends on CPU speed, so a fixed time limit per MB will not suit every machine...