Question on config
zakariabouachra opened this issue
Ubuntu
java 8
I have a machine with 32 CPUs and 64 GB of memory, but I can't extract my PDFs.
I want to process 79,545 PDFs with the GROBID Python client.
ERROR [2023-11-12 07:18:28,906] org.grobid.service.process.GrobidRestProcessFiles: Could not get an engine from the pool within configured time. Sending service unavailable.
70.49.92.133 - - [12/Nov/2023:07:18:28 +0000] "POST /api/processFulltextDocument HTTP/1.1" 503 0 "-" "python-requests/2.31.0" 1794
Hi @zakariabouachra !
You can use the python client https://github.com/kermitt2/grobid_client_python - then you don't need to understand how to use the service web API directly.
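Otherwise, the 503 error is well documented and is used to manage parallel requests: the server returns it when all of its engines are busy, and the client is expected to wait and retry. See https://grobid.readthedocs.io/en/latest/Grobid-service/#apiprocessfulltextdocument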
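To illustrate the wait-and-retry handling of a 503, here is a minimal sketch using only the standard library. It is not the code of the GROBID Python client; the function name `post_with_retry`, the `send` callable, and the default retry count and wait time are all hypothetical and chosen for illustration. `send` stands in for whatever performs the actual POST to /api/processFulltextDocument and returns a status code and a body.

```python
import time
from typing import Callable, Tuple


def post_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,          # hypothetical default, tune as needed
    wait_seconds: float = 5.0,     # hypothetical default, tune as needed
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() again while the server answers 503, up to max_retries times."""
    status, body = send()
    attempts = 0
    while status == 503 and attempts < max_retries:
        # 503 means no engine was free in the pool: wait, then try again.
        sleep(wait_seconds)
        attempts += 1
        status, body = send()
    # Either a non-503 answer, or the last 503 once retries are exhausted.
    return status, body
```

In real use, `send` could wrap a `requests.post(...)` call and return `(response.status_code, response.text)`. Passing `sleep` as a parameter only makes the function easy to try out without actually waiting.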
Yes, I used that one, but I don't know how to configure the server concurrency, the n parameter, and the config file.
I read the documentation, but a short explanation would be appreciated.
You don't have a lot of PDFs and you have a large machine, so the default settings will run just fine, e.g. use n=10 in the client command line.
If you really want to use all CPUs, change concurrency in the server config and n to something like 30.
In any case, use the large Docker image (docker pull grobid/grobid:0.7.3) rather than the small one or your own build; the Deep Learning models will bring more accurate results. Even without a GPU, it should run fine with only 80K PDFs.
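To make the role of n more concrete, the sketch below shows the general idea of a client that keeps at most n requests in flight at once. It uses only the standard library and is not the actual implementation of the GROBID Python client; `process_all` and `process_one` are hypothetical names, and `process_one` stands in for the function that sends one PDF to the server and returns the HTTP status.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def process_all(
    pdf_paths: Iterable[str],
    process_one: Callable[[str], int],
    n: int = 10,
) -> Dict[str, int]:
    """Process every PDF with at most n requests running at the same time."""
    paths = list(pdf_paths)
    # The pool size plays the role of the client's n: it bounds how many
    # requests the server sees in parallel from this client.
    with ThreadPoolExecutor(max_workers=n) as pool:
        statuses = list(pool.map(process_one, paths))
    return dict(zip(paths, statuses))
```

If n is much higher than the server's concurrency setting, more requests arrive than there are engines, and the surplus gets 503 responses like the one in the log above. Keeping n at or slightly below the server concurrency avoids most of them.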