cannot extract Chinese comma symbol

Question

lcc19941214 opened this issue 8 years ago

I'm using textract and it's awesome! I can easily extract content from any of my .doc or .docx files.

However, most of the time I'm dealing with documents full of Chinese characters, and it seems textract has a problem extracting the Chinese comma symbol '，': a space appears in the output in its place.
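
To make the symptom easy to check, here is a minimal sketch. Everything in it is hypothetical and for illustration only: the helper names and both regular expressions are my own, not textract's code. It assumes the comma is lost in a character-whitelist cleanup step that replaces anything outside an allowed set with a space, which is only a guess at the cause.

```typescript
// Hypothetical helpers for diagnosing the symptom; not part of textract.
const FULLWIDTH_COMMA = "\uFF0C"; // the Chinese comma '，'

// Count how many fullwidth commas a string contains.
function countFullwidthCommas(text: string): number {
  return text.split(FULLWIDTH_COMMA).length - 1;
}

// Illustrative whitelist (an assumption, not textract's actual regex):
// keeps printable ASCII, CJK ideographs, and line breaks, but omits U+FF0C.
const NARROW_WHITELIST = /[^\x20-\x7E\u4E00-\u9FFF\n\r]/g;

// The same whitelist with U+FF0C added to the allowed set.
const WIDER_WHITELIST = /[^\x20-\x7E\u4E00-\u9FFF\uFF0C\n\r]/g;

// Replace every character matched by `disallowed` with a single space.
function sanitize(text: string, disallowed: RegExp): string {
  return text.replace(disallowed, " ");
}

const sample = "你好，世界";
const lossy = sanitize(sample, NARROW_WHITELIST); // comma becomes a space
const kept = sanitize(sample, WIDER_WHITELIST); // comma survives

console.log(countFullwidthCommas(lossy), countFullwidthCommas(kept));
```

Running `countFullwidthCommas` on the source text and on the extracted text shows whether any commas were dropped along the way.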

David Bashford · Answer 1 · Sat Oct 15 2016 00:53:30 GMT+0800 (China Standard Time)

👍

I tend to update textract every 3 months or so, and I am overdue. I'm hoping to take a good look at all the various issues/PRs this time next week.

David Bashford · Answer 2 · Tue Dec 20 2016 22:28:08 GMT+0800 (China Standard Time)

@lcc19941214 Not an issue anymore?

The last time I spent time on textract, I was working on this. I didn't check anything in, but I believe I was close to resolving it.

David Bashford · Answer 3 · Sat Dec 24 2016 00:48:56 GMT+0800 (China Standard Time)

FWIW, the fix for this was just published as textract 2.1, thanks!