charlesw / tesseract

A .Net wrapper for tesseract-ocr

System.Exception: dlsym: /app/x64/libtesseract41.so: undefined symbol: TessBaseAPIGetAltoText

rootn3rd opened this issue

I'm trying to build a Docker image for a console application that uses the Tesseract .NET library.

Below is the content of my Dockerfile:

FROM mcr.microsoft.com/dotnet/runtime:5.0 AS base
WORKDIR /app

RUN apt-get update -y && apt-get install -y git cmake build-essential && mkdir leptonica
RUN git clone https://github.com/DanBloomberg/leptonica.git /leptonica

WORKDIR /leptonica
RUN mkdir build
WORKDIR /leptonica/build
RUN cmake ..
RUN apt-get install -y libleptonica-dev libtesseract-dev 
RUN apt-get install -y tesseract-ocr


WORKDIR /app/x64
RUN mkdir somefolder
RUN touch my.txt
RUN ln -s /usr/lib/x86_64-linux-gnu/liblept.so.5 liblept.so.5
RUN ln -s /usr/lib/x86_64-linux-gnu/liblept.so.5 libleptonica-1.80.0.so
RUN ln -s /usr/lib/x86_64-linux-gnu/libtesseract.so.4.0.0 libtesseract40.so

FROM mcr.microsoft.com/dotnet/sdk:5.0 AS build
WORKDIR /src
COPY ["PdfConverter/PdfConverter.csproj", "PdfConverter/"]
RUN dotnet restore "PdfConverter/PdfConverter.csproj"
COPY . .
WORKDIR "/src/PdfConverter"
RUN dotnet build "PdfConverter.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "PdfConverter.csproj" -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .

WORKDIR /app
ENTRYPOINT ["dotnet", "PdfConverter.dll"]

However, I'm seeing the exception below whenever I run the container:

System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
 ---> System.Exception: dlsym: /app/x64/libtesseract41.so: undefined symbol: TessBaseAPIGetAltoText
   at InteropDotNet.UnixLibraryLoaderLogic.GetProcAddress(IntPtr libraryHandle, String functionName)
   at InteropDotNet.LibraryLoader.GetProcAddress(IntPtr dllHandle, String name)
   at InteropRuntimeImplementer.TessApiSignaturesInstance.TessApiSignaturesImplementation..ctor(LibraryLoader loader)
   --- End of inner exception stack trace ---
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.RuntimeType.CreateInstanceImpl(BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture)
   at System.Activator.CreateInstance(Type type, BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes)
   at System.Activator.CreateInstance(Type type, Object[] args)
   at InteropDotNet.InteropRuntimeImplementer.CreateInstance[T]()
   at Tesseract.Interop.TessApi.Initialize()
   at Tesseract.Interop.TessApi.get_Native()
   at Tesseract.TesseractEngine..ctor(String datapath, String language, EngineMode engineMode, IEnumerable`1 configFiles, IDictionary`2 initialOptions, Boolean setOnlyNonDebugVariables)
   at Tesseract.TesseractEngine..ctor(String datapath, String language, EngineMode engineMode)
   at PdfConverter.Program.ExtractText(MemoryStream stream) in D:\StudioWorks\Gothiadigital\OcrDemo\PdfConverter\Program.cs:line 67
   at PdfConverter.Program.Main(String[] args) in D:\StudioWorks\Gothiadigital\OcrDemo\PdfConverter\Program.cs:line 24

Any leads would be very helpful.

Thanks

I switched to using a Windows container for now.

I tested this with WSL Ubuntu 20.04 and .NET 6, and it works. The only difference is that /usr/lib/x86_64-linux-gnu/libtesseract.so.4.0.0 is now /usr/lib/x86_64-linux-gnu/libtesseract.so.4.0.1.
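
For anyone hitting the same dlsym error: the message means the shared object that was loaded does not export TessBaseAPIGetAltoText, so one quick check is to see which of the installed libtesseract files has that symbol before symlinking it into /app/x64. Below is a minimal sketch in Python using ctypes. The helper name exports_symbol is hypothetical, and the two paths are just the ones mentioned in this thread; adjust them for your image.

```python
import ctypes


def exports_symbol(library_path, symbol):
    """Return True if the shared library exports the given symbol.

    exports_symbol is a hypothetical helper for illustration only.
    """
    # Raises OSError if the library cannot be found or loaded.
    lib = ctypes.CDLL(library_path)
    # Attribute access on a CDLL does a dlsym-style lookup and raises
    # AttributeError when the symbol is missing, so hasattr returns False.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Paths taken from this thread; they may differ on other distributions.
    candidates = [
        "/usr/lib/x86_64-linux-gnu/libtesseract.so.4.0.0",
        "/usr/lib/x86_64-linux-gnu/libtesseract.so.4.0.1",
    ]
    for path in candidates:
        try:
            found = exports_symbol(path, "TessBaseAPIGetAltoText")
        except OSError as err:
            print(f"{path}: could not load ({err})")
        else:
            print(f"{path}: TessBaseAPIGetAltoText exported = {found}")
```

If the check reports False for the library your symlink points at, the wrapper will fail at startup exactly as in the stack trace above, regardless of what the symlink is named.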