dbashford / textract

node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!



Excessive memory usage?

davidworkman9 opened this issue

We recently began sharding out our text extraction processes and noticed a significant spike in memory usage. It looks like it's coming from this module. I ran the following:

var textract = require('textract');

// Log this process's memory usage once per second
setInterval(function () {
  console.error(process.memoryUsage());
}, 1000);

This reports around 135 MB of memory in use. With the first line commented out, usage drops to around 10 MB.

Any ideas what's causing this?
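One way to isolate which dependency is responsible is to require each candidate in a fresh Node.js process and compare memory before and after the `require()` call. The sketch below is illustrative only: the helper name `measureRequire` is hypothetical, it compares only the `rss` field of `process.memoryUsage()`, and it uses built-in modules as stand-ins for textract's actual dependencies. Individual readings can fluctuate with garbage collection, so treat them as rough.

```javascript
'use strict';
const { execFileSync } = require('child_process');

// Hypothetical helper: load `moduleName` in a fresh Node.js process and
// report resident set size (in MB) before and after the require() call.
// A fresh process keeps previously loaded modules from skewing the numbers.
function measureRequire(moduleName) {
  const script = `
    const before = process.memoryUsage().rss;
    require(${JSON.stringify(moduleName)});
    const after = process.memoryUsage().rss;
    process.stdout.write(JSON.stringify({ before, after }));
  `;
  const out = execFileSync(process.execPath, ['-e', script], { encoding: 'utf8' });
  const { before, after } = JSON.parse(out);
  const toMB = (bytes) => Math.round(bytes / 1024 / 1024);
  return { module: moduleName, beforeMB: toMB(before), afterMB: toMB(after) };
}

// Example: compare a few candidates (swap in the dependencies you suspect).
for (const name of ['zlib', 'http']) {
  console.log(measureRequire(name));
}
```

Running the child with `node -e` resolves non-built-in module names from the current working directory, so run it from the project root when testing installed packages.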

I've found the culprit: the J (xls.js) module. Loading it raises memory usage from 14 MB to 129 MB.
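Because the cost is paid when J is loaded, one possible mitigation (a suggestion, not something confirmed in this thread or in textract itself) is to defer the `require()` until a spreadsheet is actually extracted. A minimal sketch of that lazy-require pattern follows; `lazyRequire` and `getJ` are hypothetical names, and the package name `'j'` is an assumption about how J is installed.

```javascript
'use strict';

// Wrap a require() call so the module is loaded on first use, then cached.
// Processes that never call the returned getter never pay the load cost.
function lazyRequire(moduleName) {
  let cached;          // holds the module once loaded
  let loaded = false;  // tracks whether require() has run yet
  return function get() {
    if (!loaded) {
      cached = require(moduleName);
      loaded = true;
    }
    return cached;
  };
}

// Hypothetical usage: instead of a top-level `var J = require('j');`,
// create a getter and call getJ() only inside the xls/xlsx handler.
// Creating the getter does not load the module.
const getJ = lazyRequire('j');
```

The trade-off is that the first spreadsheet extraction in each process pays the load time, while workers that only handle other formats stay small.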