kermitt2 / grobid

A machine learning software for extracting information from scholarly documents

Home Page: https://grobid.readthedocs.io

! org.grobid.core.exceptions.GrobidException: [BAD_INPUT_DATA] PDF to XML conversion failed with error code: 99

victor919 opened this issue

Hello, I downloaded the master branch of GROBID from GitHub and ran it in IntelliJ IDEA on macOS (Apple M1 Pro), and I get the following error. Could you please suggest a possible solution? Thank you very much!

ERROR [2023-08-15 07:53:29,206] org.grobid.core.process.ProcessPdfToXml: pdfalto process finished with error code: 99. [/Users/djp/IdeaProjects/grobid/grobid-home/pdfalto/mac_arm-64/pdfalto_server, -fullFontName, -noLineNumbers, -noImage, -annotation, -filesLimit, 2000, -l, 2, /Users/djp/IdeaProjects/grobid/grobid-home/tmp/origin1836236228304053289.pdf, /Users/djp/IdeaProjects/grobid/grobid-home/tmp/KWN8S7w7Hd.lxml, --timeout, 120, --ulimit, 6242304]
ERROR [2023-08-15 07:53:29,206] org.grobid.core.process.ProcessPdfToXml: pdfalto return message: 

ERROR [2023-08-15 07:53:29,206] org.grobid.service.process.GrobidRestProcessFiles: An unexpected exception occurs. 
! org.grobid.core.exceptions.GrobidException: [BAD_INPUT_DATA] PDF to XML conversion failed with error code: 99
! at org.grobid.core.document.DocumentSource.processPdfaltoServerMode(DocumentSource.java:248)
! at org.grobid.core.document.DocumentSource.pdfalto(DocumentSource.java:151)
! at org.grobid.core.document.DocumentSource.fromPdf(DocumentSource.java:64)
! at org.grobid.core.document.DocumentSource.fromPdf(DocumentSource.java:50)
! at org.grobid.core.engines.HeaderParser.processing(HeaderParser.java:79)
! at org.grobid.core.engines.Engine.processHeader(Engine.java:417)
! at org.grobid.core.engines.Engine.processHeader(Engine.java:385)
! at org.grobid.service.process.GrobidRestProcessFiles.processStatelessHeaderDocument(GrobidRestProcessFiles.java:99)
! at org.grobid.service.GrobidRestService.processHeaderDocumentReturnXml_post(GrobidRestService.java:160)
! at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
! at java.base/java.lang.reflect.Method.invoke(Method.java:578)
! at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
! at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
! at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
! at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
! at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
! at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
! at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
! at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
! at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
! at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
! at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
! at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
! at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
! at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
! at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
! at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
! at io.dropwizard.jetty.NonblockingServletHolder.handle(NonblockingServletHolder.java:49)
! at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
! at io.dropwizard.servlets.ThreadNameFilter.doFilter(ThreadNameFilter.java:35)
! at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
! at io.dropwizard.jersey.filter.AllowedMethodsFilter.handle(AllowedMethodsFilter.java:45)
! at io.dropwizard.jersey.filter.AllowedMethodsFilter.doFilter(AllowedMethodsFilter.java:39)
! at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
! at org.eclipse.jetty.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:311)
! at org.eclipse.jetty.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:265)
! at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
! at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:89)
! at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120)
! at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:135)
! at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
! at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
! at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
! at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
! at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
! at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
! at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
! at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
! at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
! at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
! at com.codahale.metrics.jetty9.InstrumentedHandler.handle(InstrumentedHandler.java:239)
! at io.dropwizard.jetty.RoutingHandler.handle(RoutingHandler.java:52)
! at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:703)
! at io.dropwizard.jetty.BiDiGzipHandler.handle(BiDiGzipHandler.java:67)
! at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:56)
! at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:174)
! at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
! at org.eclipse.jetty.server.Server.handle(Server.java:505)
! at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
! at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
! at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
! at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
! at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
! at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
! at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
! at java.base/java.lang.Thread.run(Thread.java:1623)

Hi @victor919

For M1 Mac architecture compatibility, the master branch had an issue for one week, but it should be fixed with 2c720dd.

Otherwise, there are always PDFs that will fail for one reason or another. If it's just one particular PDF, you can test it on the public demo to see whether it works in a non-M1 environment.
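
To tell a binary or architecture problem apart from a PDF-specific one, it may also help to re-run the logged pdfalto command outside GROBID and look at its exit code and output, since the return message in the log above is empty. The following is only a minimal sketch: `run_and_report` is a hypothetical helper, not part of GROBID, and the command list is assumed to be copied from your own log line.

```python
import subprocess


def run_and_report(cmd, timeout=120):
    """Run a command and return (exit_code, stdout, stderr) as text.

    Hypothetical helper for debugging; `cmd` is a list such as the one
    printed in the "pdfalto process finished with error code" log line.
    """
    # check=False: a non-zero exit code is what we want to inspect,
    # so do not raise CalledProcessError on failure.
    result = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return result.returncode, result.stdout, result.stderr


# Example call (paths are placeholders, replace them with the ones in your log):
# code, out, err = run_and_report(
#     ["/path/to/grobid-home/pdfalto/mac_arm-64/pdfalto_server",
#      "-noImage", "/path/to/input.pdf", "/path/to/output.lxml"])
# print(code, out, err)
```

If the same command fails for every PDF, the binary or platform is the more likely cause. If it fails only for one file, the PDF itself is the more likely cause.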