SheetJS / js-cfb

:floppy_disk: OLE File Container Format

Home Page: https://sheetjs.com/cfb-editor

parsing files in container.content

SuddenDevelopment opened this issue:

Hi, I'm hoping you can help me with this, for this project:
https://github.com/SuddenDevelopment/ScanWordDoc

I'm trying to extract the macro as a string. I can detect that it's there, but the cfb file.content is a buffer. I can call toString('utf8') on that buffer, but the result is still a long way from workable: I get a bunch of unreadable characters with the macro somewhere in there. In this form I can't treat it like a string; I can't get an indexOf or regex match on anything except for an attribute in .content that is in quotes.

How can I parse the .content buffer in a Word doc file to work with it further?

I have also tried passing the resulting .content buffers to cfb.parse and cfb.read with every option I could find :)

thanks

The files are not actually stored as plaintext. Each individual file starts with a performance cache. The starting offsets of the content are stored in the main project file, so you need to read those first. The actual payload uses a light compression technique. The structure is described in the MS-OVBA specification.

Yeah, I figured out that they were not plaintext. Is reading the payload something you have an example of in one of your libraries?
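
For reference, the "light compression" step mentioned above can be sketched roughly as follows. This is a minimal sketch based on a reading of the decompression algorithm in the MS-OVBA specification, not code taken from js-cfb. The name `decompressVba` is hypothetical, and the input is assumed to be the compressed part of a stream, that is, the `.content` bytes already sliced at the starting offset read from the project data (skipping the performance cache).

```javascript
// Hypothetical helper (not part of js-cfb): decodes an MS-OVBA
// "compressed container" held in a Buffer or Uint8Array.
function decompressVba(compressed) {
  if (compressed.length === 0 || compressed[0] !== 0x01) {
    throw new Error('expected signature byte 0x01');
  }
  const out = [];
  let pos = 1;
  while (pos + 2 <= compressed.length) {
    // 2-byte little-endian chunk header:
    // bits 0-11 = chunk size - 3, bit 15 = "chunk is compressed" flag.
    const header = compressed[pos] | (compressed[pos + 1] << 8);
    const chunkEnd = Math.min(compressed.length, pos + (header & 0x0fff) + 3);
    const isCompressed = (header & 0x8000) !== 0;
    pos += 2;
    if (!isCompressed) {
      // Raw chunk: up to 4096 bytes stored as-is.
      const rawEnd = Math.min(compressed.length, pos + 4096);
      for (; pos < rawEnd; pos++) out.push(compressed[pos]);
      continue;
    }
    const chunkStart = out.length;
    while (pos < chunkEnd) {
      // Each flag byte describes the next (up to) 8 tokens.
      const flags = compressed[pos++];
      for (let bit = 0; bit < 8 && pos < chunkEnd; bit++) {
        if (((flags >> bit) & 1) === 0) {
          out.push(compressed[pos++]); // literal byte
          continue;
        }
        if (pos + 2 > chunkEnd) throw new Error('truncated copy token');
        const token = compressed[pos] | (compressed[pos + 1] << 8);
        pos += 2;
        // The offset/length split depends on how much of the current
        // chunk has been decoded: bitCount = max(ceil(log2(written)), 4).
        const written = out.length - chunkStart;
        let bitCount = 4;
        while ((1 << bitCount) < written) bitCount++;
        const length = (token & (0xffff >> bitCount)) + 3;
        const offset = (token >> (16 - bitCount)) + 1;
        const src = out.length - offset;
        if (src < 0) throw new Error('copy token points before start');
        // Byte-by-byte copy so overlapping runs repeat correctly.
        for (let i = 0; i < length; i++) out.push(out[src + i]);
      }
    }
  }
  return Uint8Array.from(out);
}

// Usage with a small hand-checked sample container.
const sample = Uint8Array.from([
  0x01, 0x2f, 0xb0, 0x00, 0x23, 0x61, 0x61, 0x61, 0x62, 0x63, 0x64, 0x65,
  0x82, 0x66, 0x00, 0x70, 0x61, 0x67, 0x68, 0x69, 0x6a, 0x01, 0x38, 0x08,
  0x61, 0x6b, 0x6c, 0x00, 0x30, 0x6d, 0x6e, 0x6f, 0x70, 0x06, 0x71, 0x02,
  0x70, 0x04, 0x10, 0x72, 0x73, 0x74, 0x75, 0x76, 0x10, 0x77, 0x78, 0x79,
  0x7a, 0x00, 0x3c,
]);
console.log(String.fromCharCode(...decompressVba(sample)));
```

The decoded bytes are in the VBA project's code page, so turning them into text with `String.fromCharCode` or `toString('utf8')` is only an assumption that holds for plain ASCII macros. Once decoded, the result can be searched with `indexOf` or a regex like any other string.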