endorno / python-tesseract

Automatically exported from code.google.com/p/python-tesseract

Python-tesseract on raspberry pi

GoogleCodeExporter opened this issue

A few months ago I got python-tesseract running on the Pi, but now the
installation fails.

What steps will reproduce the problem?
1. Try to build python-tesseract 0.7 from source
(python-tesseract_0.7.orig.tar.gz) on the Raspberry Pi, following the documented
steps:
python config.py
python setup.py clean
python setup.py build
sudo python setup.py install

The "python setup.py build" step is the one that fails.
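The four build steps can also be run from a small Python script that stops at the first command returning a non-zero exit status and reports which one it was. This is a minimal sketch, not part of python-tesseract: run_build_steps and BUILD_STEPS are hypothetical names, the script assumes it is started from the unpacked source directory, and it uses the running interpreter (sys.executable) in place of a bare "python". sudo is left out, so the install step still needs root privileges.

```python
import subprocess
import sys

# The build sequence from the report. "sudo" is left out, so the install
# step needs the script itself to be run with root privileges.
BUILD_STEPS = [
    [sys.executable, "config.py"],
    [sys.executable, "setup.py", "clean"],
    [sys.executable, "setup.py", "build"],
    [sys.executable, "setup.py", "install"],
]


def run_build_steps(steps, cwd=None):
    """Run each command in order and stop at the first failure.

    Returns None when every step exits with status 0, otherwise a tuple
    (zero-based index of the failing step, its command, its exit status).
    """
    for index, command in enumerate(steps):
        status = subprocess.call(command, cwd=cwd)
        if status != 0:
            return index, command, status
    return None
```

With the failure described in this report, run_build_steps(BUILD_STEPS) would return a tuple whose first element is 2, the zero-based index of "setup.py build".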

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?

include path=/usr/include
Current Version : 0.7
running build
running build_py
creating build
creating build/lib.linux-armv6l-2.7
copying tesseract.py -> build/lib.linux-armv6l-2.7
running build_ext
building '_tesseract' extension
swigging tesseract.i to tesseract_wrap.cpp
swig -python -c++ -I/usr/include/tesseract -I/usr/include -I/usr/include/leptonica -o tesseract_wrap.cpp tesseract.i
/usr/include/tesseract/publictypes.h:76: Warning 462: Unable to set dimensionless array variable
creating build/temp.linux-armv6l-2.7
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I. -I/usr/include/tesseract -I/usr/include -I/usr/include/leptonica -I/usr/include/python2.7 -c tesseract_wrap.cpp -o build/temp.linux-armv6l-2.7/tesseract_wrap.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for Ada/C/ObjC but not for C++ [enabled by default]
tesseract_wrap.cpp: In function ‘PyObject* _wrap_TessBaseAPI_SetDictFunc(PyObject*, PyObject*)’:
tesseract_wrap.cpp:6428:8: error: ‘Dict’ was not declared in this scope
tesseract_wrap.cpp:6428:8: note: suggested alternative:
/usr/include/tesseract/baseapi.h:79:7: note:   ‘tesseract::Dict’
tesseract_wrap.cpp:6428:21: error: expected primary-expression before ‘void’
tesseract_wrap.cpp:6428:38: error: expected primary-expression before ‘,’ token
tesseract_wrap.cpp:6428:39: error: expected primary-expression before ‘bool’
tesseract_wrap.cpp:6446:3: error: ‘arg2’ was not declared in this scope
tesseract_wrap.cpp:6446:32: error: expected ‘>’ before ‘(’ token
tesseract_wrap.cpp:6446:33: error: ‘Dict’ is not a class or namespace
tesseract_wrap.cpp:6446:39: error: expected unqualified-id before ‘*’ token
tesseract_wrap.cpp:6446:40: error: expected primary-expression before ‘)’ token
tesseract_wrap.cpp:6446:42: error: expected primary-expression before ‘void’
tesseract_wrap.cpp:6446:59: error: expected primary-expression before ‘,’ token
tesseract_wrap.cpp:6446:60: error: expected primary-expression before ‘bool’
error: command 'gcc' failed with exit status 1
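The key lines in this log are gcc's "suggested alternative" notes: the generated wrapper uses the unqualified name Dict, while the installed /usr/include/tesseract/baseapi.h only declares tesseract::Dict, which suggests that the 0.7 interface file does not match the installed Tesseract headers. The hypothetical helper below pulls those pairs out of a build log. It is a sketch that assumes gcc's plain ASCII quotes (as produced when building with LC_ALL=C); in a UTF-8 locale gcc prints typographic quotes instead, and the patterns would need adjusting.

```python
import re

NOT_DECLARED = re.compile(r"error: '([\w:]+)' was not declared in this scope")
ALTERNATIVE = re.compile(r"note:\s+'([\w:]+)'")


def suggested_qualifications(log_text):
    """Map each undeclared identifier to the qualified name gcc suggests."""
    suggestions = {}
    pending = None      # identifier from the most recent "not declared" error
    expecting = False   # True right after a "suggested alternative" note
    for line in log_text.splitlines():
        missing = NOT_DECLARED.search(line)
        if missing:
            pending = missing.group(1)
            expecting = False
            continue
        if "note: suggested alternative" in line:
            expecting = pending is not None
            continue
        alternative = ALTERNATIVE.search(line)
        if expecting and alternative:
            suggestions[pending] = alternative.group(1)
            expecting = False
    return suggestions
```

Run on the log above (with ASCII quotes), this would report that Dict should be tesseract::Dict, while arg2 gets no suggestion because it is only a follow-on error.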


Please provide any additional information below.
I use Python 2.7 and a fully up-to-date version of Raspbian (out of the box,
nothing removed).

Original issue reported on code.google.com by JorenVra...@gmail.com on 13 Jul 2014 at 9:00

I wish I could help, but I don't have a Raspberry Pi.

Original comment by FreeT...@gmail.com on 14 Jul 2014 at 10:28

After trying the shotgun approach, I found a way that works. The 0.7.4 version
is not promoted on the site, so it took some time to find it. Most of the
installed programs are unnecessary, but it will take some time to figure out
which ones are needed. This is what worked for me:

sudo apt-get install python-distutils-extra tesseract-ocr tesseract-ocr-eng \
    libopencv-dev libtesseract-dev libleptonica-dev python-all-dev swig libcv-dev \
    python-opencv python-numpy python-setuptools build-essential subversion

sudo apt-get install tesseract-ocr-eng tesseract-ocr-dev libleptonica-dev \
    python-all-dev swig libcv-dev

sudo svn checkout http://python-tesseract.googlecode.com/svn/python-tesseract-0.7.4/
cd python-tesseract-0.7.4

sudo python setup.py build
sudo python setup.py install 

Original comment by JorenVra...@gmail.com on 14 Jul 2014 at 1:23

It is great to know that it works. Did you make any modifications to the code?
Also, which Tesseract version are you using?

I need your input so that I can hopefully backport 0.7.4 to the mainstream
version.

Also, would you mind telling me whether you are doing this for fun or for work?

Joe

Original comment by FreeT...@gmail.com on 14 Jul 2014 at 2:42

I did not modify the code; I just checked it out with Subversion and installed
it with setup.py build and setup.py install. I have attached two files
containing the output of the "sudo python setup.py build" and "sudo python
setup.py install" commands.

I use Tesseract 3.02 (the latest version available on Raspbian).

At the moment I use python-tesseract for a school project.

P.S.
The necessary packages seem to be the following (some of which are already
installed on Raspbian):

sudo apt-get install tesseract-ocr tesseract-ocr-eng libtesseract-dev \
    libleptonica-dev python-all-dev swig build-essential subversion \
    python-setuptools

Original comment by JorenVra...@gmail.com on 14 Jul 2014 at 7:21
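Whether the minimal package list above is already present can be checked before installing anything. The sketch below asks dpkg-query for each package's status and keeps only those reported as fully installed. REQUIRED_PACKAGES, installed_from_status and missing_packages are hypothetical names, and the code assumes a Debian-based system such as Raspbian where dpkg-query is available.

```python
import subprocess

# Minimal package list from the comment above.
REQUIRED_PACKAGES = [
    "tesseract-ocr", "tesseract-ocr-eng", "libtesseract-dev",
    "libleptonica-dev", "python-all-dev", "swig", "build-essential",
    "subversion", "python-setuptools",
]


def installed_from_status(output):
    """Parse "<package> <want> <error> <status>" lines from dpkg-query.

    Only packages whose last status word is exactly "installed" count;
    "not-installed" or "config-files" do not.
    """
    installed = set()
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[-1] == "installed":
            installed.add(parts[0])
    return installed


def missing_packages(required):
    """Return the packages from `required` that dpkg does not report as installed."""
    # dpkg-query exits non-zero if any name is unknown, but the known ones
    # are still printed on stdout, so the exit status is ignored here.
    process = subprocess.Popen(
        ["dpkg-query", "-W", "--showformat=${Package} ${Status}\n"]
        + list(required),
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, _ = process.communicate()
    installed = installed_from_status(out.decode("utf-8", "replace"))
    return [name for name in required if name not in installed]
```

missing_packages(REQUIRED_PACKAGES) returns the names that would still need to go on an apt-get install line.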

I was able to get python-tesseract 0.7.4 to work on the Raspberry Pi with
Tesseract 3.02, but not with Tesseract 3.03-rc1 (revision 1049) and Leptonica
1.70 built from source. I reinstalled the libraries after compiling them from
source. Here's the error I get:

pi@raspberrypi:~/ocr/python-tesseract-0.7.4$ python
Python 2.7.3 (default, Mar 18 2014, 05:13:23) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> import cv
>>> import tesseract
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "tesseract.py", line 26, in <module>
    _tesseract = swig_import_helper()
  File "tesseract.py", line 18, in swig_import_helper
    import _tesseract
ImportError: /usr/local/lib/python2.7/dist-packages/python_tesseract-0.7.4-py2.7-linux-armv6l.egg/_tesseract.so: undefined symbol: _ZN9tesseract11TessBaseAPI14NormalizeTBLOBEP5TBLOBP3ROWbP6DENORM

Here's the version info:
pi@raspberrypi:~$ tesseract -v
tesseract 3.03
 leptonica-1.70
  libjpeg 8d : libpng 1.2.49 : libtiff 3.9.6 : zlib 1.2.7

Original comment by gcap...@gmail.com on 18 Aug 2014 at 4:01
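The ImportError means the dynamic linker could not resolve _ZN9tesseract11TessBaseAPI14NormalizeTBLOBEP5TBLOBP3ROWbP6DENORM in any loaded library, so one way to confirm a header/library mismatch is to ask the installed libtesseract whether it exports that symbol at all. The sketch below does this with ctypes; exports_symbol and find_library_path are hypothetical helper names, and it assumes the library can be located by its short name (here probably libtesseract.so.3 under /usr/local/lib).

```python
import ctypes
import ctypes.util


def find_library_path(name):
    """Resolve a short name such as "tesseract" to the file ctypes would load."""
    return ctypes.util.find_library(name)


def exports_symbol(library, symbol):
    """Return True if the shared library exports `symbol` (mangled C++ names work).

    ctypes raises AttributeError for an undefined symbol, which hasattr()
    turns into False.
    """
    if library is None:
        # ctypes.CDLL(None) would open the running program instead.
        raise ValueError("shared library not found")
    lib = ctypes.CDLL(library)
    return hasattr(lib, symbol)
```

If exports_symbol(find_library_path("tesseract"), "_ZN9tesseract11TessBaseAPI14NormalizeTBLOBEP5TBLOBP3ROWbP6DENORM") returns False, that would be consistent with the header shipped with python-tesseract declaring a method that the installed Tesseract 3.03 library does not provide.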

You may need to comment out NormalizeTBLOBE... in the include file.

Original comment by FreeT...@gmail.com on 19 Aug 2014 at 8:31

I could not find where one would comment out NormalizeTBLOBE... in the include 
file.  Can you give me more details?  Thanks for the help!

Original comment by gcap...@gmail.com on 19 Aug 2014 at 4:54

locate baseapi_mini.h

Comment out the following line:
static void NormalizeTBLOB(TBLOB *tblob, ROW *row, bool numeric_mode);

Original comment by FreeT...@gmail.com on 19 Aug 2014 at 8:25

Thank you!  I had to also comment out:

void SetFillLatticeFunc(FillLatticeFunc f);

Boxa* GetComponentImages(PageIteratorLevel level, bool text_only, Pixa** pixa, int** blockids);

void GetFeaturesForBlob(TBLOB* blob, const DENORM& denorm, INT_FEATURE_ARRAY int_features, int* num_features, int* FeatureOutlineIndex);

void RunAdaptiveClassifier(TBLOB* blob, const DENORM& denorm, int num_max_matches, int* unichar_ids, float* ratings, int* num_matches_returned);

Boxa* GetTextlines(Pixa** pixa, int** blockids);

Original comment by gcap...@gmail.com on 20 Aug 2014 at 2:57
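The header edits described in the last two comments can be applied with a small script instead of by hand. This is a minimal sketch with a hypothetical helper name; it assumes each declaration mentions the method name immediately followed by "(" and ends with ";", which also lets it comment out declarations that span several lines. Keep a copy of the original header before rewriting it.

```python
def comment_out_declarations(header_text, names):
    """Prefix "// " to every declaration that mentions one of `names`.

    A declaration may span several lines, so once a matching line is found,
    the following lines are commented out too until one containing ";" ends it.
    Lines that are already "//" comments are left unchanged.
    """
    output = []
    inside = False  # True while in the middle of a matched declaration
    for line in header_text.splitlines():
        already_commented = line.lstrip().startswith("//")
        if not inside and not already_commented and any(
                name + "(" in line for name in names):
            inside = True
        if inside:
            output.append("// " + line)
            if ";" in line:
                inside = False
        else:
            output.append(line)
    return "\n".join(output) + "\n"
```

For example, read baseapi_mini.h, pass its text through comment_out_declarations with the six names from this thread (NormalizeTBLOB, SetFillLatticeFunc, GetComponentImages, GetFeaturesForBlob, RunAdaptiveClassifier, GetTextlines), and write the result back.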

Given that you have the skills, you should have no problem braving a newer
version.

Have fun.


Original comment by FreeT...@gmail.com on 20 Aug 2014 at 3:14

One more thing... I'm getting a segmentation fault on this line:

api.End()

I attached the code.

Original comment by gcap...@gmail.com on 20 Aug 2014 at 4:09

Then comment out this line:
#api.End()


Original comment by FreeT...@gmail.com on 20 Aug 2014 at 4:33

Actually, I spoke too soon. I assumed it was the last line (api.End()) that
was causing the problem, since the print statement just before that line was
executed. It seems that merely creating api = tesseract.TessBaseAPI() causes an
error when the program exits. Sometimes, instead of a segmentation fault, it
gives the error:

*** glibc detected *** python: corrupted double-linked list: 0x01864570 ***
Aborted

It is unpredictable whether it gives that error or a segmentation fault.

Original comment by gcap...@gmail.com on 20 Aug 2014 at 5:05
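Because the crash happens when the interpreter exits rather than on a particular line, it can help to run the suspect code in a child process and look at how that process ended. On POSIX systems, subprocess reports a negative return code when the child is killed by a signal, so a segmentation fault shows up as -SIGSEGV and glibc's "corrupted double-linked list" abort (the "Aborted" message) as -SIGABRT. run_isolated is a hypothetical helper; this only classifies the crash and does not fix the underlying memory problem.

```python
import signal
import subprocess
import sys

# Names for the two failure modes reported above.
SIGNAL_NAMES = {signal.SIGSEGV: "SIGSEGV", signal.SIGABRT: "SIGABRT"}


def run_isolated(code):
    """Run Python `code` in a child interpreter and describe how it exited.

    A negative return code means the child was killed by that signal, so a
    crash during interpreter shutdown does not take the calling script down.
    """
    status = subprocess.call([sys.executable, "-c", code])
    if status < 0:
        name = SIGNAL_NAMES.get(-status, "signal %d" % -status)
        return status, "killed by " + name
    return status, "exited with status %d" % status
```

For instance, run_isolated("import tesseract\napi = tesseract.TessBaseAPI()") would exercise the case described above, where merely creating the API object crashes on exit, and report which of the two signals ended the process.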

Could you try a newer version? The old version did have a memory leak.


Original comment by FreeT...@gmail.com on 20 Aug 2014 at 8:38

First I tried 0.9, but it would not compile. The output is attached.

Then I tried r444, and when I ran import tesseract, I got:

pi@raspberrypi:~/ocr/python-tesseract-r444$ python
Python 2.7.3 (default, Mar 18 2014, 05:13:23) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tesseract
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "tesseract.py", line 26, in <module>
    _tesseract = swig_import_helper()
  File "tesseract.py", line 18, in swig_import_helper
    import _tesseract
ImportError: /usr/local/lib/python2.7/dist-packages/python_tesseract-r444-py2.7-linux-armv6l.egg/_tesseract.so: undefined symbol: cvSetData

I looked for cvSetData in baseapi_mini.h, but could not find it and it's not 
obvious to me which file to modify.

Original comment by gcap...@gmail.com on 20 Aug 2014 at 5:33
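cvSetData is a function from OpenCV's C API (the core module), so an undefined cvSetData at import time suggests, as one plausible cause not confirmed in this thread, that _tesseract.so was built without linking libopencv_core, or that the dynamic linker cannot find the library it was linked against. ldd shows which libraries a shared object needs and where they resolve; the sketch below parses that output. parse_ldd, linked_libraries and opencv_core_links are hypothetical helper names.

```python
import os
import subprocess


def parse_ldd(output):
    """Map each dependency named in `ldd` output to its resolved path.

    The path is None when ldd reports "not found" or gives only a load address.
    """
    libraries = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if "=>" in line:
            name, _, target = line.partition("=>")
            target = target.strip()
            if not target or target.startswith(("not found", "(")):
                path = None
            else:
                path = target.split()[0]
            libraries[name.strip()] = path
        else:
            # e.g. "/lib/ld-linux-armhf.so.3 (0xb6f5c000)" has no "=>".
            path = line.split()[0]
            libraries[os.path.basename(path)] = path
    return libraries


def linked_libraries(shared_object):
    """Run ldd on a shared object such as _tesseract.so and parse the result."""
    output = subprocess.check_output(["ldd", shared_object])
    return parse_ldd(output.decode("utf-8", "replace"))


def opencv_core_links(libraries):
    """Keep only the libopencv_core entries from parse_ldd()'s result."""
    return dict((name, path) for name, path in libraries.items()
                if name.startswith("libopencv_core"))
```

Applied to /usr/local/lib/python2.7/dist-packages/python_tesseract-r444-py2.7-linux-armv6l.egg/_tesseract.so, an empty result from opencv_core_links would point towards a missing link flag, while an entry with a None path would point towards the library search path.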

As you have used the newest version of Tesseract, you had better use the
newest version of svn.

Original comment by FreeT...@gmail.com on 20 Aug 2014 at 6:09

I'm using the following version:
svn, version 1.7.5 (r1336830)
   compiled Mar 22 2014, 03:08:50

Do you think I need to upgrade to 1.8.x?

Original comment by gcap...@gmail.com on 20 Aug 2014 at 6:26

svn checkout http://python-tesseract.googlecode.com/svn/trunk/ python-tesseract
cd python-tesseract/src
python setup.py build
python setup.py install

Make sure that Tesseract is 3.03 and Leptonica is 1.70.

Original comment by FreeT...@gmail.com on 20 Aug 2014 at 6:36

I followed those exact instructions and got the following error:

...
running build
running build_py
file tesseract.py (for module tesseract) not found
file tesseract.py (for module tesseract) not found
running build_ext
building '_tesseract' extension
swigging tesseract.i to tesseract_wrap.cpp
swig -python -c++ -I/usr/include/tesseract -I/usr/include/leptonica -I/usr/include/opencv2 -o tesseract_wrap.cpp tesseract.i
tesseract.i:98: Error: Unable to find 'renderer.h'
error: command 'swig' failed with exit status 1

Original comment by gcap...@gmail.com on 21 Aug 2014 at 3:23

The missing files were under /usr/local/include, so I modified makefile.shsh
and setup.py to use the correct paths.

Original comment by gcap...@gmail.com on 21 Aug 2014 at 4:21
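Instead of editing the paths by hand, a build script could detect where the headers actually live: distribution packages install under /usr, while libraries built from source with default settings install under /usr/local. This is a minimal sketch with hypothetical helper names; python-tesseract's own setup.py may not be structured this way.

```python
import os

# Candidate include prefixes, checked in order: source builds first,
# then distribution packages.
DEFAULT_PREFIXES = ("/usr/local/include", "/usr/include")


def find_include_prefix(header, prefixes=DEFAULT_PREFIXES):
    """Return the first prefix under which `header` (a relative path) exists."""
    for prefix in prefixes:
        if os.path.isfile(os.path.join(prefix, header)):
            return prefix
    return None


def swig_include_flags(prefix, subdirs=("tesseract", "leptonica")):
    """Build -I flags like the ones in the swig command line above."""
    return ["-I" + os.path.join(prefix, subdir) for subdir in subdirs]
```

With the setup described above, find_include_prefix("tesseract/renderer.h") would be expected to return "/usr/local/include", and swig_include_flags of that prefix gives the -I options to pass to swig.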

OK, I successfully installed 0.9, but I'm still getting the same error:

Python 2.7.3 (default, Mar 18 2014, 05:13:23) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tesseract
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "tesseract.py", line 26, in <module>
    _tesseract = swig_import_helper()
  File "tesseract.py", line 22, in swig_import_helper
    _mod = imp.load_module('_tesseract', fp, pathname, description)
ImportError: ./_tesseract.so: undefined symbol: cvSetData

Original comment by gcap...@gmail.com on 21 Aug 2014 at 4:56

Any ideas about the error in #22 above? I found that you got a similar error in
the past, according to this issue:
https://code.google.com/p/python-tesseract/issues/detail?id=7

Here's my OpenCV version:
>>> from cv2 import __version__
>>> __version__
'2.4.8'
>>> 

I have no problem importing either cv or cv2.  Could part of the problem be 
that I have

OpenCV 2.3.1 in
/usr/share/OpenCV

and

OpenCV 2.4.8 in
/usr/local/share/OpenCV

I did rename the directory that 2.3.1 sits in, but that did not seem to help.

Original comment by gcap...@gmail.com on 22 Aug 2014 at 4:32
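With OpenCV 2.3.1 under /usr and 2.4.8 under /usr/local, there may be two sets of shared libraries on the system. The share/OpenCV directories normally hold data and CMake configuration files rather than the libraries themselves, so renaming one of them would probably not change which libopencv_core gets loaded. The sketch below lists every matching library file; find_shared_libraries is a hypothetical helper, and the directory list is an assumption to adjust for the system at hand.

```python
import glob
import os

# Directories where a packaged and a source-built OpenCV typically land
# (assumed; Raspbian also uses the multiarch directory listed here).
LIBRARY_DIRS = ("/usr/lib", "/usr/lib/arm-linux-gnueabihf", "/usr/local/lib")


def find_shared_libraries(pattern, directories=LIBRARY_DIRS):
    """Return every file in `directories` whose name matches the glob `pattern`."""
    matches = []
    for directory in directories:
        matches.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return matches
```

find_shared_libraries("libopencv_core.so*") lists each copy; comparing that list with what ldd reports for _tesseract.so shows which version the extension actually picks up.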

Could you create an SSH account for me and send it to my Gmail account?

Original comment by FreeT...@gmail.com on 24 Aug 2014 at 8:57