mayooear / gpt4-pdf-chatbot-langchain

GPT4 & LangChain Chatbot for large PDF docs

Home Page: https://www.youtube.com/watch?v=ih9PBGVVOO4

Support for BIG PDFs? (like 500 pages)

xnpeter opened this issue

Thank you for creating this project; it's been very useful. I have a few minor questions:

  • When working with very long PDFs, for example ones that are 500 pages, or when dealing with multiple documents, the API requests seem to become unresponsive. Would it be possible to add a feature for chunked uploading? (A rough sketch of what this could look like follows this list.)
  • Currently, if I add a new document to the 'doc' folder, do I need to re-ingest all the files again? Is it possible to implement an incremental ingest feature?
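
For reference, here is a minimal sketch of what chunked (batched) ingestion could look like on top of the repo's existing Pinecone setup. This is only an illustration, not code from the repository: the batch size, the use of PDFLoader instead of the repo's custom loader, and the ingestLargePdf helper are all my own assumptions.

```typescript
// Hypothetical sketch only: batching is an assumption, not an existing feature of this repo.
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { PDFLoader } from 'langchain/document_loaders/fs/pdf';
import { pinecone } from '@/utils/pinecone-client';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';

const BATCH_SIZE = 100; // assumed value; keeps each upsert request small

export const ingestLargePdf = async (filePath: string) => {
  // Load the PDF and split it into chunks, as the existing ingest script does.
  const rawDocs = await new PDFLoader(filePath).load();
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 200,
  });
  const docs = await splitter.splitDocuments(rawDocs);

  // Attach to the existing index once, then upsert in small batches so a
  // single request never has to carry all of a 500-page book at once.
  const index = pinecone.Index(PINECONE_INDEX_NAME);
  const store = await PineconeStore.fromExistingIndex(new OpenAIEmbeddings(), {
    pineconeIndex: index,
    namespace: PINECONE_NAME_SPACE,
    textKey: 'text',
  });

  for (let i = 0; i < docs.length; i += BATCH_SIZE) {
    await store.addDocuments(docs.slice(i, i + BATCH_SIZE));
    console.log(`upserted ${Math.min(i + BATCH_SIZE, docs.length)} / ${docs.length} chunks`);
  }
};
```

Because this ingests one file at a time instead of re-reading the whole docs folder, the same approach could also serve as a rough form of incremental ingest for the second question.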

Hi, my experience relates to the first question; I have ingested several books that are over 500 pages long. However, be mindful of the pinecone.io free tier's pod capacity, which books of that size can exceed; paid projects handle them without issue. If you would rather not pay and the book is very large, I would suggest checking out this branch https://github.com/mayooear/gpt4-pdf-chatbot-langchain/tree/feat/chroma to run the vector database (ChromaDB) locally.
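
For anyone trying that route, a minimal sketch of local Chroma ingestion with LangChain.js might look like the following. It assumes a Chroma server running on localhost:8000 and the same chunking settings the repo uses; the collection name and the ingestToLocalChroma helper are invented for illustration.

```typescript
// Minimal sketch, assuming a local Chroma server on http://localhost:8000;
// the collection name and helper name are made up for this example.
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { Chroma } from 'langchain/vectorstores/chroma';
import { PDFLoader } from 'langchain/document_loaders/fs/pdf';

export const ingestToLocalChroma = async (filePath: string) => {
  const rawDocs = await new PDFLoader(filePath).load();
  const docs = await new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 200,
  }).splitDocuments(rawDocs);

  // Embeddings are still computed via OpenAI, but the vectors live in a
  // local Chroma instance, so Pinecone's free-tier pod limits don't apply.
  await Chroma.fromDocuments(docs, new OpenAIEmbeddings(), {
    collectionName: 'pdf-chatbot', // assumed collection name
    url: 'http://localhost:8000',  // default local Chroma endpoint
  });
};
```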

As for the second question, I'm also looking for an answer; see #415.

Cheers,

Hi, @xnpeter

I'm helping the gpt4-pdf-chatbot-langchain team manage their backlog and am marking this issue as stale. The issue you opened requests support for handling large PDFs and suggests adding chunked uploading to prevent unresponsive API requests. You also asked about implementing an incremental ingest feature to avoid re-ingesting all files when adding a new document to the 'doc' folder. YIN-Renlong shared their experience with handling large PDFs, suggested keeping Pinecone's pod capacity in mind, and pointed to a related issue for the second question.

Could you please confirm if this issue is still relevant to the latest version of the gpt4-pdf-chatbot-langchain repository? If it is, please let the gpt4-pdf-chatbot-langchain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days. Thank you!