dbashford / textract

node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!

cannot extract Chinese comma symbol

lcc19941214 opened this issue

I'm using textract and it's awesome! I can easily extract content from any of my .doc or .docx files.

However, most of the time I'm dealing with documents full of Chinese characters, and it seems textract has a problem extracting the Chinese comma symbol ',' (U+FF0C): it comes out as a space instead.
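For anyone who wants to check their own output for this, here is a rough sketch of a comparison that counts how many fullwidth commas (U+FF0C) went missing between the known source text and the extracted text. The helper names `countChar` and `lostFullwidthCommas` are made up for illustration and are not part of textract; the sample strings are hard-coded stand-ins, not real extraction output.

```javascript
// Hypothetical helpers, not part of textract: they only compare two strings.
const FULLWIDTH_COMMA = '\uFF0C'; // the Chinese comma ","

// Count non-overlapping occurrences of a single character in a string.
function countChar(text, ch) {
  return text.split(ch).length - 1;
}

// Number of fullwidth commas present in the source text but absent
// from the extracted text (0 means none were lost).
function lostFullwidthCommas(expected, extracted) {
  return countChar(expected, FULLWIDTH_COMMA) - countChar(extracted, FULLWIDTH_COMMA);
}

// Stand-in strings: the second mimics the reported behaviour,
// where the comma is replaced by a space.
const expected = '你好\uFF0C世界';
const extracted = '你好 世界';

console.log(lostFullwidthCommas(expected, extracted)); // prints 1
```

In practice the `extracted` string would be whatever text the extractor hands back for the document (for example, the text passed to the callback of textract's `fromFileWithPath`, assuming that is the entry point in use), and `expected` would be a sentence known to be in the file.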

👍

I tend to update textract every 3 months or so and I am overdue. Hoping to be taking a good look at all the various issues/PRs this time next week.

@lcc19941214 Not an issue anymore?

The last time I spent time on textract, I was working on this. I didn't check anything in, but I believe I was close to resolving it.

FWIW, the fix for this was just published as textract 2.1, thanks!