pd3f / pd3f

🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

Home page: https://pd3f.com

Example API call keeps waiting forever

rahulkrprajapati opened this issue · comments

I can't get the example script to return any result from the Docker image running pd3f: it stays in the waiting state forever.
I cloned the repo, ran the ./dev.sh script, and used the code below for inference:

```python
import time

import requests

# Upload the PDF; the first tuple element is the filename reported to the server.
with open("./test/pdfs/Admit Card.pdf", "rb") as pdf_file:
    files = {"pdf": ("Admit Card.pdf", pdf_file)}
    response = requests.post("http://localhost:1616", files=files, data={"lang": "de"})
job_id = response.json()["id"]

# Poll the job until the JSON response contains the extracted "text".
while True:
    r = requests.get(f"http://localhost:1616/update/{job_id}")
    j = r.json()
    if "text" in j:
        break
    print("waiting...")
    time.sleep(1)
print(j["text"])
```

Terminal output:

waiting...
waiting...
waiting...
waiting...
waiting...
waiting...
waiting...
waiting...
waiting...
waiting...
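The polling loop in the script has no exit other than a response containing "text", so a job that fails or stalls prints "waiting..." indefinitely, as in the output above. Below is a minimal sketch of a bounded poller, assuming only what the example already relies on (GET /update/<id> returns JSON and includes "text" once the job is done). The helper names `wait_for_text` and `pd3f_fetcher` are hypothetical, and printing the whole response on each poll is meant to surface whatever other fields the server returns, which this issue does not show.

```python
import time
from typing import Any, Callable, Dict


def wait_for_text(
    fetch_status: Callable[[], Dict[str, Any]],
    timeout_s: float = 300.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status() until its dict contains "text" or timeout_s elapses.

    Raises TimeoutError carrying the last response seen, so a stuck job
    reports what the server said instead of looping forever.
    """
    deadline = clock() + timeout_s
    while True:
        last = fetch_status()
        if "text" in last:
            return last["text"]
        if clock() >= deadline:
            raise TimeoutError(f"no 'text' after {timeout_s}s; last response: {last!r}")
        # Show the full response, not just "waiting...", to expose any status fields.
        print("waiting...", last)
        sleep(interval_s)


def pd3f_fetcher(
    job_id: str, base_url: str = "http://localhost:1616"
) -> Callable[[], Dict[str, Any]]:
    """Return a zero-argument callable that GETs /update/<job_id> (hypothetical helper)."""
    import requests  # third-party; imported lazily so the poller itself needs only stdlib

    def fetch() -> Dict[str, Any]:
        r = requests.get(f"{base_url}/update/{job_id}", timeout=10)
        r.raise_for_status()
        return r.json()

    return fetch
```

With a running server this would be called as `text = wait_for_text(pd3f_fetcher(job_id), timeout_s=600)`, where `job_id` is the `id` from the POST response. Injecting `fetch_status`, `sleep`, and `clock` keeps the loop testable without a live pd3f instance.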

Docker terminal output:

./dev.sh

[+] Running 10/5
 ✔ Network pd3f_default             Created  0.0s
 ✔ Container pd3f-ocr_worker-1      Created  0.1s
 ✔ Container pd3f-parsr-1           Created  0.1s
 ✔ Container pd3f-redis-1           Created  0.1s
 ! ocr_worker The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested  0.0s
 ! parsr The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested  0.0s
 ✔ Container pd3f-worker-1          Created  0.0s
 ! worker The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested  0.0s
 ✔ Container pd3f-web-1             Created  0.0s
 ! web The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested  0.0s
Attaching to ocr_worker-1, parsr-1, redis-1, web-1, worker-1
ocr_worker-1  | + mkdir -p /to-ocr
ocr_worker-1  | + sleep 1
redis-1       | 1:C 02 Mar 2024 14:23:56.137 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis-1       | 1:C 02 Mar 2024 14:23:56.137 # Redis version=6.2.14, bits=64, commit=00000000, modified=0, pid=1, just started
redis-1       | 1:C 02 Mar 2024 14:23:56.137 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
redis-1       | 1:M 02 Mar 2024 14:23:56.137 * monotonic clock: POSIX clock_gettime
redis-1       | 1:M 02 Mar 2024 14:23:56.137 * Running mode=standalone, port=6379.
redis-1       | 1:M 02 Mar 2024 14:23:56.137 # Server initialized
redis-1       | 1:M 02 Mar 2024 14:23:56.138 * Ready to accept connections
parsr-1       | Starting par.sr API : node api/server/dist/index.js
worker-1      | 14:23:56 Worker rq:worker:82fe312997394624a7e13aea0ea16aa5: started, version 1.5.2
worker-1      | 14:23:56 *** Listening on default...
worker-1      | 14:23:56 Cleaning registries for queue: default
web-1         |  * Serving Flask app "/app/app.py" (lazy loading)
web-1         |  * Environment: development
web-1         |  * Debug mode: on
web-1         |  * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
ocr_worker-1  | ++ find /to-ocr -name '*.pdf' -type f
ocr_worker-1  | + sleep 1
parsr-1       | [2024-03-02T14:23:57] INFO  (parsr-api/12 on c03dbcc7bc0f): Api listening on port 3001!
web-1         |  * Restarting with stat
web-1         |  * Debugger is active!
web-1         |  * Debugger PIN: 334-469-980
ocr_worker-1  | ++ find /to-ocr -name '*.pdf' -type f
ocr_worker-1  | + sleep 1
ocr_worker-1  | ++ find /to-ocr -name '*.pdf' -type f
ocr_worker-1  | + sleep 1
ocr_worker-1  | ++ find /to-ocr -name '*.pdf' -type f
ocr_worker-1  | + sleep 1
ocr_worker-1  | ++ find /to-ocr -name '*.pdf' -type f
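Every `!` line in the Compose output says the same thing: the pd3f images for `ocr_worker`, `parsr`, `worker` and `web` are built for linux/amd64, while the host reports linux/arm64/v8 (for example an Apple Silicon Mac). Docker can still start such containers through emulation where it is available, but emulated amd64 code is typically much slower, and the log excerpt alone does not show whether that is what stalls the job. A small stdlib sketch to check for the mismatch before running ./dev.sh follows; the machine-name mapping is an assumption covering common values, not an exhaustive list.

```python
import platform

# Assumed mapping from platform.machine() values to Docker architecture names.
# Unknown values fall through unchanged (lowercased).
_MACHINE_TO_DOCKER_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",   # Windows reports "AMD64"
    "aarch64": "arm64",  # Linux on ARM
    "arm64": "arm64",    # macOS on Apple Silicon
}


def docker_arch(machine: str) -> str:
    """Translate a platform.machine() string into Docker's architecture name."""
    return _MACHINE_TO_DOCKER_ARCH.get(machine.lower(), machine.lower())


def needs_emulation(image_platform: str = "linux/amd64", machine: str = "") -> bool:
    """True if an image for image_platform (e.g. "linux/amd64") would not run natively here."""
    host = docker_arch(machine or platform.machine())
    image_arch = image_platform.split("/")[1]  # "linux/arm64/v8" -> "arm64"
    return host != image_arch


if __name__ == "__main__":
    if needs_emulation("linux/amd64"):
        print("Images are linux/amd64; this host would run them under emulation, if available.")
    else:
        print("Host architecture matches linux/amd64.")
```

Docker also honors a `DOCKER_DEFAULT_PLATFORM` environment variable and a per-service `platform:` key in Compose files. Setting either to linux/amd64 should silence the warning, but the containers would still be emulated; only images built for arm64 (which this issue does not indicate exist) would run natively.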
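The `ocr_worker-1` lines look like a shell `set -x` trace (the `+`/`++` prefixes): the container runs `mkdir -p /to-ocr`, then repeatedly runs `find /to-ocr -name '*.pdf' -type f` and `sleep 1`, i.e. it polls a directory for PDFs to OCR once per second. An empty `find` followed by `sleep` only means nothing was queued for OCR at that moment, so these repeating lines are idle polling rather than an error; note also that no request lines from `web-1` appear in the excerpt. A rough Python equivalent of one polling round is sketched below; `find_pdfs` is a hypothetical name, and this is not pd3f's own code.

```python
import tempfile
from pathlib import Path
from typing import List


def find_pdfs(directory: str) -> List[Path]:
    """Approximate `find DIR -name '*.pdf' -type f`.

    Recursive and case-sensitive on Linux. Like `find` without -L,
    symlinks and directories named *.pdf are not counted as regular files.
    """
    root = Path(directory)
    return sorted(
        p for p in root.rglob("*.pdf") if p.is_file() and not p.is_symlink()
    )


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for /to-ocr.
    with tempfile.TemporaryDirectory() as queue_dir:
        print(find_pdfs(queue_dir))  # prints [] for an empty queue, like each idle round
        (Path(queue_dir) / "sub").mkdir()
        (Path(queue_dir) / "sub" / "scan.pdf").write_bytes(b"%PDF-1.4\n")
        print([p.name for p in find_pdfs(queue_dir)])  # prints ['scan.pdf']
```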