parsing files in container.content

Question

SuddenDevelopment opened this issue 6 years ago

Hi I'm hoping you can help me on this... for this project:
https://github.com/SuddenDevelopment/ScanWordDoc

I'm trying to extract the macro as a string. I can detect that it's there, but the cfb file.content is a buffer. I can call toString('utf8') on that buffer, but the result is still a long way from workable: I get a bunch of unreadable characters with the macro somewhere in there. In this format I can't treat it like a string; I can't get an indexOf or regex match on anything except an attribute in .content that is in quotes.

How can I parse the .content buffer in a Word doc file to work with it further?

I have also tried passing the resulting .content buffers to cfb.parse and cfb.read with every option I could find :)

thanks

Eric P Sheets · Answer 1 · Thu May 17 2018 20:00:25 GMT+0800 (China Standard Time)

The files are not actually stored as plaintext. Each individual file starts with a performance cache. The offset at which the content starts is stored in the main project file, so you need to read that first. The actual payload uses a light compression technique. The structure is described in the MS-OVBA specification.
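
For anyone landing here, a minimal sketch of the decompression step, based on my reading of the container format in MS-OVBA. The function name `decompressVba` is made up for illustration and is not part of cfb or any other library. It assumes the buffer you pass in already starts at the compressed container, i.e. you have sliced the module's .content at the source offset recorded in the project's `dir` stream. It does not read that offset for you, and it does little validation.

```javascript
// Sketch of MS-OVBA container decompression (hypothetical helper, not a library API).
// Input: a Buffer that starts at the compressed container (signature byte 0x01).
function decompressVba(buf) {
  if (buf.length === 0 || buf[0] !== 0x01) {
    throw new Error('not a compressed container (missing 0x01 signature byte)');
  }
  const out = [];
  let pos = 1;
  while (pos + 2 <= buf.length) {
    // 2-byte little-endian chunk header: low 12 bits = chunk size - 3, top bit = compressed flag.
    const header = buf[pos] | (buf[pos + 1] << 8);
    const chunkEnd = Math.min(pos + (header & 0x0fff) + 3, buf.length);
    const compressed = (header & 0x8000) !== 0;
    const chunkStart = out.length; // copy-token bit widths depend on position within the chunk
    pos += 2;
    if (!compressed) {
      // Raw chunk: the data is stored as-is.
      for (; pos < chunkEnd; pos++) out.push(buf[pos]);
      continue;
    }
    while (pos < chunkEnd) {
      // Each flag byte describes up to 8 following tokens, least significant bit first.
      const flags = buf[pos++];
      for (let bit = 0; bit < 8 && pos < chunkEnd; bit++) {
        if (((flags >> bit) & 1) === 0) {
          out.push(buf[pos++]); // literal byte
          continue;
        }
        // Copy token: 16 bits split into offset (high bits) and length (low bits).
        const token = buf[pos] | (buf[pos + 1] << 8);
        pos += 2;
        const written = out.length - chunkStart;
        let bitCount = 4; // offset field is at least 4 bits wide
        while ((1 << bitCount) < written) bitCount++;
        const length = (token & (0xffff >> bitCount)) + 3;
        const offset = (token >> (16 - bitCount)) + 1;
        const src = out.length - offset;
        // Copy byte by byte: source and destination may overlap.
        for (let i = 0; i < length; i++) out.push(out[src + i]);
      }
    }
  }
  return Buffer.from(out);
}
```

Usage would look roughly like `decompressVba(file.content.slice(textOffset)).toString('latin1')`, where `textOffset` is a placeholder for the value you read from the `dir` stream and the right encoding depends on the project's code page. As far as I can tell the `dir` stream itself is compressed the same way from byte 0, so the same helper should apply to it.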

SuddenDevelopment · Answer 2 · Thu May 17 2018 21:20:40 GMT+0800 (China Standard Time)

Yeah, I figured out that they were not plaintext. Is reading the payload something you have an example of for one of your libraries?