parsing files in container.content
SuddenDevelopment opened this issue
Hi, I'm hoping you can help me with this. It's for this project:
https://github.com/SuddenDevelopment/ScanWordDoc
I'm trying to extract the macro as a string. I can detect whether it's there, but the cfb file's .content is a buffer. I can call toString('utf8') on that buffer, but the result is still a long way from workable: I get a lot of unreadable characters, with the macro somewhere in there. In this form I can't treat it like a string. I can't get an indexOf or regex match on anything except an attribute in .content that is in quotes.
How can I parse the .content buffer of a Word doc file so I can work with it further?
I have also tried passing the resulting .content buffers to cfb.parse and cfb.read with every option I could find :)
Thanks
The files are not actually stored as plain text. Each module stream starts with a performance cache, and the offset where the source code begins is stored in the main project file (the `dir` stream, which is itself compressed), so you need to read that first. The source code itself is stored with a lightweight compression scheme. The structure is described in the MS-OVBA specification.
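For reference, here is a minimal Node.js sketch of the decompression algorithm as I read it in MS-OVBA. It is not taken from any existing library, and the function name `decompressOVBA` is hypothetical. It expects a Buffer that begins at the compressed container's 0x01 signature byte (for a module stream, that is the part after the performance cache) and assumes well-formed input.

```javascript
"use strict";

// Sketch of MS-OVBA decompression (hypothetical helper, not a library API).
// Input: Buffer holding a CompressedContainer (0x01 signature byte, then chunks).
// Output: Buffer with the decompressed bytes.
function decompressOVBA(data) {
  if (data.length === 0 || data[0] !== 0x01) {
    throw new Error("not an MS-OVBA compressed container (missing 0x01 signature)");
  }
  const out = [];
  let pos = 1;
  while (pos < data.length) {
    // 2-byte little-endian chunk header: low 12 bits = chunk size - 3, bit 15 = compressed flag
    const header = data.readUInt16LE(pos);
    const chunkEnd = Math.min(pos + (header & 0x0fff) + 3, data.length);
    const isCompressed = (header & 0x8000) !== 0;
    pos += 2;
    const chunkStart = out.length; // decompressed position where this chunk begins

    if (!isCompressed) {
      // Raw chunk: 4096 bytes copied as-is
      for (let i = 0; i < 4096 && pos < data.length; i++) out.push(data[pos++]);
      continue;
    }

    while (pos < chunkEnd) {
      const flags = data[pos++]; // one flag bit per following token
      for (let bit = 0; bit < 8 && pos < chunkEnd; bit++) {
        if ((flags & (1 << bit)) === 0) {
          out.push(data[pos++]); // literal token: a single byte
        } else {
          // Copy token: how its 16 bits split into offset/length depends on
          // how many bytes of this chunk have been decompressed so far.
          const token = data.readUInt16LE(pos);
          pos += 2;
          const difference = out.length - chunkStart;
          let bitCount = 4;
          while ((1 << bitCount) < difference) bitCount++;
          const length = (token & (0xffff >> bitCount)) + 3;
          const offset = (token >> (16 - bitCount)) + 1;
          // Copy byte by byte so overlapping runs (offset < length) work
          const src = out.length - offset;
          for (let i = 0; i < length; i++) out.push(out[src + i]);
        }
      }
    }
  }
  return Buffer.from(out);
}
```

Assuming `moduleStream` is a module entry's .content and `textOffset` was read from the `dir` stream, something like `decompressOVBA(Buffer.from(moduleStream).subarray(textOffset)).toString("latin1")` should give the macro source. The text is in the project's code page, so "latin1" is only an approximation for Western code pages. The same routine applies to the whole `dir` stream, starting at offset 0.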
Yeah, I figured out that they were not plain text. Do you have an example of reading the payload in one of your libraries?
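For the offset step, here is a rough sketch, assuming the `dir` stream has already been decompressed. It walks the records (2-byte Id, 4-byte Size, then Size bytes of data) and collects each module's name, stream name and MODULEOFFSET value. The record IDs are from MS-OVBA. `readModuleOffsets` is a hypothetical helper, and the generic Id/Size walk is a simplification. It should hold for the `dir` stream except for the PROJECTVERSION quirk handled below, but it is worth checking against real files.

```javascript
"use strict";

// Sketch: walk a *decompressed* VBA "dir" stream and collect, per module,
// the name, the stream name, and the TextOffset from MODULEOFFSET.
// Simplified (hypothetical helper); assumes well-formed input.
function readModuleOffsets(dir) {
  const modules = [];
  let current = null;
  let pos = 0;
  while (pos + 6 <= dir.length) {
    const id = dir.readUInt16LE(pos);
    let size = dir.readUInt32LE(pos + 2);
    pos += 6;
    // PROJECTVERSION (0x0009) is irregular: its size field is 4, but 6 bytes follow.
    if (id === 0x0009) size = 6;
    const data = dir.subarray(pos, pos + size);
    pos += size;
    switch (id) {
      case 0x0019: // MODULENAME starts a new module record (code-page text; latin1 is approximate)
        current = { name: data.toString("latin1"), streamName: null, textOffset: null };
        modules.push(current);
        break;
      case 0x001a: // MODULESTREAMNAME: name of the stream holding this module
        if (current) current.streamName = data.toString("latin1");
        break;
      case 0x0031: // MODULEOFFSET: where the compressed source starts in that stream
        if (current) current.textOffset = data.readUInt32LE(0);
        break;
      case 0x0010: // end of the dir stream
        return modules;
    }
  }
  return modules;
}
```

Each module's compressed source then starts `textOffset` bytes into the stream named `streamName` inside the VBA storage. Everything before that offset is the performance cache, and the slice from that offset onward can be fed to an MS-OVBA decompressor.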