There are 2 repositories under the pdf-processing topic.
Document chatbot with support for multiple files, topics, chat windows, and chat history. Powered by GPT.
Text extraction from multiple and large PDF documents.
A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.
Python scripts that convert PDF files to text, split the text into chunks, and store their vector representations, computed with GPT4All embeddings, in a Chroma DB. The project also provides a script that queries the Chroma DB for a similarity search based on user input.
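The chunk-then-search pipeline in the entry above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not that project's code: a bag-of-words term-frequency vector stands in for GPT4All embeddings, an in-memory list of `(chunk, vector)` pairs stands in for the Chroma DB collection, and `chunk_text`, `embed`, `cosine`, and `search` are hypothetical names.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, store: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k stored chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

Usage: build the store once with `store = [(c, embed(c)) for c in chunk_text(pdf_text)]`, then call `search(question, store)`. The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.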
An npm package built on top of pdf-lib that provides functionality such as merging, rotating, and splitting PDFs, downloading PDFs to disk, and more.
Built with the pdf-actions npm package.
LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP researchers and developers working on advanced conversational AI and information retrieval systems.
A collection of useful mini projects I worked on while teaching myself Python programming.
A side project for easily extracting and annotating questions and answers for the PsychometryBot project database, using computer vision and PDF parsing.
A powerful Retrieval Augmented Generation (RAG) application built with NVIDIA AI endpoints and Streamlit. This solution enables intelligent document analysis and question-answering using state-of-the-art language models, featuring multi-PDF processing, FAISS vector store integration, and advanced prompt engineering.
A statistical data display and notifier app for the Covid-19 pandemic.
A powerful Chromium extension that leverages multiple AI APIs to assist with various text operations, image analysis, and PDF processing.
Berrylit is a simple chatbot interface that allows users to upload a PDF file and ask a question related to its contents. The chatbot uses the Berri API for processing.
Freedom to process PDF, DOC, and other document formats.
Extensive analysis of user guides in Swiss government-to-citizen software, correlating guide features with canton socio-economic factors.
PDF Extractor API is a FastAPI project for extracting information from PDFs. It includes user authentication, PDF uploading, and text extraction. The API supports secure PDF uploads, keyword-based extraction, and rate limiting.
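The entry above mentions rate limiting without saying how it is done. One common approach is a token bucket, sketched below independently of FastAPI; this is an assumed technique, not necessarily what the project uses, and `TokenBucket` is a hypothetical class. The clock is injectable so the behaviour can be checked without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float, clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the caller should be throttled."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In a web API, one bucket would typically be kept per authenticated user (for example in a dict keyed by user ID), with a rejected call answered by HTTP 429.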
CLI tool to merge and compress PDF files and to extract or delete pages from them.
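Page extraction and deletion both reduce to turning a user-supplied page range into a set of page indices. The sketch below assumes a hypothetical `"1-3,5"` range syntax (1-based, inclusive) and models a document as a plain list of pages; `parse_ranges`, `extract_pages`, and `delete_pages` are illustrative names, not that tool's interface.

```python
def parse_ranges(spec: str, page_count: int) -> list[int]:
    """Turn a 1-based spec such as "1-3,5" into sorted 0-based page indices."""
    selected: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"invalid page range: {part!r}")
        selected.update(range(start - 1, end))
    return sorted(selected)


def extract_pages(pages: list, spec: str) -> list:
    """Keep only the pages named in spec, in document order."""
    keep = set(parse_ranges(spec, len(pages)))
    return [p for i, p in enumerate(pages) if i in keep]


def delete_pages(pages: list, spec: str) -> list:
    """Drop the pages named in spec, keeping the rest in document order."""
    drop = set(parse_ranges(spec, len(pages)))
    return [p for i, p in enumerate(pages) if i not in drop]
```

With a real PDF library, the same indices would select which page objects are copied into the output writer.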
Azure Document Intelligence Result Processor: A toolset for annotating PDFs based on Azure Document Intelligence analysis results, featuring a React web application and a standalone Python script for processing and visualizing extracted data with confidence indicators.
A Python script to combine multiple PDFs, allowing the pages of one PDF to be inserted before the last page of another. It can also append further documents, which makes it handy for document management tasks.
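The "insert before the last page" ordering from the entry above is a simple list operation once each PDF is viewed as a sequence of pages. The sketch models pages as plain values and uses a hypothetical `insert_before_last` helper. With pypdf, for example, the same ordering could be applied to `PdfReader(...).pages` entries passed to `PdfWriter.add_page`, although that is an assumption about how such a script might be wired, not its actual code.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def insert_before_last(base: Sequence[T], extra: Sequence[T]) -> list[T]:
    """Return base's pages with all of extra's pages placed just before base's final page."""
    if not base:
        # Nothing to insert before; the result is just the extra pages.
        return list(extra)
    return [*base[:-1], *extra, base[-1]]
```

A typical use is slotting appendices in front of a signature or back-cover page.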
The 🐲EMU RPG API🐲 supports the EMU RPG Club’s events by managing game tables, players, and D&D character data. Built with FastAPI, it includes features like table/character management, real-time WebSocket updates, data validation, API monitoring, and secure access, providing an organized backend for tabletop RPG sessions.
PDFGuard is a user-friendly Python application that helps you enhance the security of PDF files by removing potential security threats and hidden content. It does this by converting PDF pages into images and then creating new, sanitized PDFs from these images.
OCR PDF Survey Response Extractor: Automate Paper Survey Analysis
A Flask-based web app that screens PDF documents for compliance terms from GDPR, NIST, and FAR frameworks, with a user-friendly interface for viewing and highlighting flagged terms.
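Screening extracted text for framework terms, as the entry above describes, can be sketched as case-insensitive phrase matching. The term lists here are hypothetical and heavily abbreviated, and `flag_terms` and `highlight` are illustrative names; the project's own vocabulary and highlighting are not shown in the description.

```python
import re

# Hypothetical, abbreviated term lists; a real screener would load curated vocabularies.
FRAMEWORK_TERMS = {
    "GDPR": ["personal data", "data subject", "right to erasure"],
    "NIST": ["access control", "incident response"],
    "FAR": ["contracting officer", "small business"],
}


def flag_terms(text: str, terms: dict = FRAMEWORK_TERMS) -> list[tuple[str, str, int]]:
    """Return (framework, term, offset) for every case-insensitive whole-phrase match."""
    hits = []
    for framework, phrases in terms.items():
        for phrase in phrases:
            pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
            hits.extend((framework, phrase, m.start()) for m in pattern.finditer(text))
    return sorted(hits, key=lambda h: h[2])


def highlight(text: str, terms: dict = FRAMEWORK_TERMS) -> str:
    """Wrap every flagged phrase in [[...]], preferring the longest phrase at each position."""
    phrases = sorted({p for ps in terms.values() for p in ps}, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, phrases)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"[[{m.group(0)}]]", text)
```

Sorting the alternatives longest-first keeps a longer phrase from being cut short by a shorter one that shares its prefix.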
An ATS-optimized resume analyzer providing AI-powered insights, skill scoring, and compatibility analysis to enhance resumes for specific job descriptions
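The entry above describes AI-powered skill scoring. A far simpler keyword-overlap score, sketched below, shows the basic idea of measuring how many of a job's required skills a resume mentions. This is a swapped-in technique rather than that project's method, and `skill_match` is a hypothetical name.

```python
import re


def skill_match(resume: str, job_skills: list[str]) -> tuple[float, list[str]]:
    """Return the share of job skills mentioned in the resume, plus the skills that are missing."""
    text = resume.lower()
    missing = [
        skill
        for skill in job_skills
        # Lookarounds instead of \b so skills ending in symbols (e.g. "c++") still match.
        if not re.search(r"(?<!\w)" + re.escape(skill.lower()) + r"(?!\w)", text)
    ]
    score = 1 - len(missing) / len(job_skills) if job_skills else 0.0
    return score, missing
```

The missing-skills list is often more actionable than the score itself, since it tells the user what to add.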
An automation tool designed to extract and process financial data from PDF bank statements, including decryption and extraction of credits, debits, billing periods, and customer details.
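Once a statement's text has been extracted (and, if needed, the PDF decrypted first, for example with pypdf's `PdfReader.decrypt`), separating credits from debits is mostly line-level pattern matching. Bank layouts differ widely, so the line formats below are hypothetical: an ISO date, a description, an amount, and a `CR`/`DR` marker, plus a "Statement period: ... to ..." line. `parse_transactions` and `billing_period` are illustrative names.

```python
import re
from decimal import Decimal

# Hypothetical layouts; real statements need per-bank patterns.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<desc>.+?)\s+(?P<amount>[\d,]+\.\d{2})\s+(?P<kind>CR|DR)$"
)
PERIOD_RE = re.compile(r"Statement period:\s*(\d{4}-\d{2}-\d{2})\s+to\s+(\d{4}-\d{2}-\d{2})")


def parse_transactions(lines: list[str]):
    """Split transaction lines into (credits, debits), each a list of (date, description, amount)."""
    credits, debits = [], []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # headers, footers, and other non-transaction lines
        # Decimal avoids binary floating-point rounding on currency amounts.
        amount = Decimal(m["amount"].replace(",", ""))
        entry = (m["date"], m["desc"], amount)
        (credits if m["kind"] == "CR" else debits).append(entry)
    return credits, debits


def billing_period(lines: list[str]):
    """Return (start, end) from the first statement-period line, or None if absent."""
    for line in lines:
        m = PERIOD_RE.search(line)
        if m:
            return m.group(1), m.group(2)
    return None
```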
A script that fixes the page order of PDF scans made with home printers that can only scan one side of a sheet at a time.
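With a one-sided scanner, a double-sided document is usually scanned in two passes: all fronts in order, then the stack is flipped and all backs are scanned, which yields the backs last-sheet-first. Restoring the reading order means interleaving the fronts with the reversed backs. The sketch below assumes exactly that flipping procedure; a different flip yields a different order. `restore_duplex_order` is a hypothetical name, and pages are modelled as plain values.

```python
def restore_duplex_order(fronts: list, backs: list) -> list:
    """Interleave a fronts pass with a backs pass scanned after flipping the whole stack.

    Flipping the stack means the backs arrive last-sheet-first, so they are reversed here.
    """
    if len(fronts) != len(backs):
        raise ValueError("each sheet needs exactly one front scan and one back scan")
    ordered = []
    for front, back in zip(fronts, reversed(backs)):
        ordered.extend((front, back))
    return ordered
```

If both passes end up in a single scanned PDF of 2n pages, the two halves `pages[:n]` and `pages[n:]` can be passed in as `fronts` and `backs`.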
PDF merger and stamper (watermark) using Python and PyPDF2, an open-source pure-Python PDF library.
A Spring Boot-based OCR Exporter tool that extracts text from image or PDF files using the OCR Space API and exports the results to various formats such as PDF, text, Word, or a database.
Chat-with-PDF is a web application designed to provide an interactive experience with PDF documents. Built using Streamlit, the app allows users to upload PDFs and engage in conversational queries with the content.
Merge multiple PDF files into a single PDF with ease using this simple Python PDF Merger. 🚀
AI-powered PDF Assistant: Upload PDFs and ask questions about their content, with answers powered by FastAPI and LangChain. A "Better Answer" option provides enhanced responses.