endorno / python-tesseract

Automatically exported from code.google.com/p/python-tesseract

api.GetHocrText() returns malformed XML

GoogleCodeExporter opened this issue · comments

Control characters are inserted into the document, and XML parsers cannot 
handle them without first stripping them out. This problem was reportedly 
fixed in the main Tesseract SVN a few days ago, and I think producing an update 
linked against SVN will fix it.
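
Until a fixed build is available, the stripping workaround mentioned above can be sketched roughly as follows. This is only a minimal illustration, not the fix that went into Tesseract: the helper name `strip_control_chars` and the sample string are hypothetical, the sample stands in for real `api.GetHocrText()` output, and only the C0 control characters that XML 1.0 forbids are removed (tab, line feed and carriage return are kept).

```python
import re
import xml.etree.ElementTree as ET

# C0 control characters that XML 1.0 does not allow.
# Tab (\x09), line feed (\x0a) and carriage return (\x0d) are legal and kept.
_ILLEGAL_XML_CHARS = re.compile("[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_control_chars(text):
    """Return `text` with XML-illegal control characters removed."""
    return _ILLEGAL_XML_CHARS.sub("", text)


# Hypothetical sample standing in for hOCR output; real output is a full
# XHTML document and may need further handling (doctype, entities).
sample = (
    "<div class='ocr_page'>"
    "<span class='ocr_line'>Hello\x0b world\x1f</span>"
    "</div>"
)

cleaned = strip_control_chars(sample)
root = ET.fromstring(cleaned)          # parsing `sample` directly would fail
line_text = root.find("span").text
print(line_text)
```

Stripping before parsing keeps the rest of the pipeline unchanged, at the cost of silently discarding whatever the control characters were meant to represent.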

Using Python 2.7.3 under Windows 7 X64.

P.S. Are there any instructions for building from SVN with VS 2008? I see the 
binary under downloads, but there's no information as to how it was generated. 
Is it just libtesseract et al. wrapped with SWIG?

Original issue reported on code.google.com by stephen....@gmail.com on 9 Aug 2012 at 2:27

If you are trying to use the 64-bit version of Python, then the answer is no. 
I am still working out how to compile tesseract-ocr as a 64-bit Windows build.

Original comment by FreeT...@gmail.com on 10 Aug 2012 at 3:48

No; this is 32-bit python, and I have no interest in compiling/distributing 
anything exclusive to 64-bit machines. Apart from the occasional memory 
corruption from Tesseract and this issue, the package is working very well.

Original comment by stephen....@gmail.com on 10 Aug 2012 at 12:39

Since the current release of tesseract is relatively old, a build compiled
from SVN might not always be compatible with python-tesseract. Anyhow, I
will look into it and come back to you ASAP.

Original comment by FreeT...@gmail.com on 10 Aug 2012 at 2:55

Below is a build against tesseract-ocr svn737:
http://python-tesseract.googlecode.com/files/python-tesseract-0.7.5.win32-py2.7.exe

If it works, please buy me a coffee.

If not, please contact me.

Original comment by FreeT...@gmail.com on 10 Aug 2012 at 7:30

Well done; that seems to have fixed it. I'm more than happy to help feed your 
coffee addiction. Do you accept PayPal?

Also, if you would be willing to pass on any instructions for getting the SWIG 
portion to build properly under VS2008 (once Tesseract itself is built), I'd be 
happy to update my own copies on my development machine. Thanks again for the 
quick fix!

Steve

Original comment by stephen....@gmail.com on 10 Aug 2012 at 7:52

Try the following procedure and let me know whether it works for you:

svn checkout http://python-tesseract.googlecode.com/svn/trunk/ python-tesseract
cd python-tesseract
python setup.py build
python setup.py install
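
After the install step, a quick import check can confirm that the extension was built and is loadable. The sketch below is only an assumption-laden illustration: the helper `module_available` is hypothetical, and the module name `tesseract` is assumed rather than confirmed anywhere in this thread.

```python
def module_available(name):
    """Return True if the module `name` can be imported, else False."""
    try:
        __import__(name)
    except ImportError:
        # Also covers a compiled extension that fails to load its DLLs.
        return False
    return True


# "tesseract" is an assumed module name for the installed package.
if module_available("tesseract"):
    print("python-tesseract appears to be installed")
else:
    print("module 'tesseract' not found; the build or install may have failed")
```

The try/except form is used so the same check runs under both Python 2.7 and Python 3.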


Original comment by FreeT...@gmail.com on 10 Aug 2012 at 10:48

[deleted comment]
https://www.paypal.com/cgi-bin/webscr?cmd=_cart&business=VD2Y4PZSK7T86&lc=HK&item_name=To%20support%20the%20development%20of%20python%2dtesseract&amount=5%2e00&currency_code=USKD&button_subtype=products&add=1&bn=PP%2dShopCartBF%3abtn_cart_LG%2egif%3aNonHosted

Original comment by FreeT...@gmail.com on 10 Aug 2012 at 10:51

Worked like a charm. Sending you a couple of cups of coffee shortly. Thanks!
Steve

Original comment by stephen....@gmail.com on 13 Aug 2012 at 12:32

Thank you for the coffees.

Original comment by FreeT...@gmail.com on 13 Aug 2012 at 5:31

Original comment by FreeT...@gmail.com on 20 Aug 2012 at 8:47

  • Changed state: Done