pdf-processing

There are 2 repositories under the pdf-processing topic.

dissorial / doc-chatbot
Document chatbot — multiple files, topics, chat windows and chat history. Powered by GPT.
openai typescript gpt-3 gpt-4 langchain mongoose nextjs openai-api chat chatbot document-embedding pdf-processing pinecone reactjs tailwindcss vectorization
Language:TypeScript 841
allenai / papermage
library supporting NLP and CV research on scientific papers
computer-vision machine-learning multimodal natural-language-processing pdf-processing python scientific-papers
Language:Python 748
ahmedkhemiri95 / PDFs-TextExtract
Multiple and Large PDF Documents Text Extraction.
data-science extract-text parser pdf pdf-document pdf-processing pdfminer pdfs pdfs-textextract pypdf2 python text-analytics
Language:Python 128
aws-samples / document-processing-pipeline-for-regulated-industries
A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.
amazon-comprehend amazon-dynamodb amazon-elasticsearch-service amazon-s3 amazon-sns amazon-sqs amazon-textract amazon-web-services aws aws-cdk aws-lambda cdk data-analytics data-governance data-lineage image-processing image-processing-python machine-learning pdf-processing processing-pipelines
Language:Python 62
Govind-S-B / pdf-to-text-chroma-search
Python scripts that convert PDF files to text, split them into chunks, and store their vector representations using GPT4All embeddings in a Chroma DB. A separate script queries the Chroma DB for similarity search based on user input.
chromadb pdf-processing similarity-search text-extraction vector-embeddings
Language:Python 23
ManasMadan / pdf-actions
An NPM package built on top of pdf-lib that provides functionalities like merge, rotate, split, download PDF to disk, and more.
pdf pdf-merge pdf-merger react reactjs react-component pdf-split pdf-splitter pdf-processing pdf-lib pdf-rotate pdf-downloader pdf-download javascript npm pdf-free pdf-online
Language:JavaScript 13
ManasMadan / PDFActions
Built with pdf-actions NPM package.
react pdf reactjs react-components react-component pdf-merge pdf-merger pdf-split pdf-splitter pdf-rotate pdf-lib pdf-downloader pdf-download pdf-processing
Language:JavaScript 7
ranguy9304 / LangGraphRAG
LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP researchers and developers working on advanced conversational AI and information retrieval systems.
chatbot information-retrieval langgraph natural-language-processing nlp-machine-learning openai-api pdf-processing python rag terminal-application vector-database web-scraping
Language:Python 7
Inc44 / MaTools
An all-in-one GUI management toolkit built with PyQt6, offering a suite of tools for file synchronization, media organization, PDF merging, code formatting, and more.
application audio-processing code-formatting file-management gui image-processing ocr pdf-processing productivity python qt rust speech-recognition video-processing youtube-downloader
Language:Python 5
Aleptonic / PdfSnipper
PdfSnipper is a lightweight and efficient Python package designed to simplify the management of PDF files, pages, and their conversions during various NLP, Computer Vision (CV), or other data processing tasks. The package eliminates the need for repetitive code by providing intuitive, ready-to-use functions for common PDF-related operations.
nlp-tools pdf-processing utilities
Language:Python 3
allanninal / document-summarizer
The Document Summarizer leverages Hugging Face’s facebook/bart-large-cnn model to transform lengthy documents into concise summaries. Built with ReactJS (Vite) for the frontend and Flask for the backend, it supports PDF and text files, offering real-time summarization for researchers, students, and professionals.
ai-tools document-summarizer flask huggingface nlp open-source-cods pdf-processing reactjs text-summarization vite
Language:JavaScript 3
thinhuos0913 / python_useful_mini_projects
These are some useful mini projects that I worked on while self-learning Python programming.
ocr opencv python image-processing pdf-processing
Language:Python 3
Yardenrsk / PsychometryReceiverCV
A side project to easily get and annotate questions and answers for the PsychometryBot project DB using computer vision and PDF parsing.
opencv-python pandas pdf-processing
Language:Python 3
Al-shwaib / Book-Preparation-for-Printing
A web application for preparing books and magazines for offset printing. Automatically arranges PDF pages for commercial A3 printing, supporting both Arabic (RTL) and English (LTR) books.
flask-application pdf-processing pymupdf rtl-support a3-printing arabic-books book-preparation commercial-printing offset-printing order-to-print
Language:Python 2
arsath-eng / RAG1-NVIDIA-GENAI
A powerful Retrieval Augmented Generation (RAG) application built with NVIDIA AI endpoints and Streamlit. This solution enables intelligent document analysis and question-answering using state-of-the-art language models, featuring multi-PDF processing, FAISS vector store integration, and advanced prompt engineering.
document-analysis embeddings faiss langchain llama-models llm nvidia-ai-faundry pdf-processing question-answering rag streamlit vector-store
Language:Python 2
dsckiet / covid-tracker-android-app
A statistical data display and notifier app for Covid-19 pandemic.
mvvm dagger2 statistics pdf-processing
Language:Kotlin 2
Farhaj499 / RAG_with_Weaviate_DB
This project implements a Retrieval Augmented Generation (RAG) system that answers questions based on a PDF document. It uses Weaviate as a vector database for efficient retrieval of relevant information and Gemini to generate natural language responses.
agentic-ai embeddings huggingface-transformers langchain pdf-processing python rag retrieval-augmented-generation semantic-search vector-database weaviate
Language:Jupyter Notebook 2
rithulkamesh / docproc
Opinionated and Sophisticated Document Region Analyzer.
pdf-processing document-analysis text-extraction equation-detection mathematical-symbols python ocr region-detection machine-learning layout-analysis content-extraction text-classification data-extraction document-parsing pdf-text-extraction
Language:Python 2
9-5 / Chromium-Intelligence
A powerful Chromium extension that leverages multiple AI APIs to assist with various text operations, image analysis, and PDF processing.
ai-assistant browser-automation browser-tools chrome-extension content-analysis custom-prompts gemini-api image-analysis manifest-v3 natural-language-processing pdf-processing productivity proofreading text-processing text-summarization tone-adjustment
Language:JavaScript 1
akshatpunia26 / berrylit_pdf_chat
Berrylit is a simple chatbot interface that allows users to upload a PDF file and ask a question related to its contents. The chatbot uses the Berri API for processing.
api chatbot natural-language-processing pdf-processing python streamlit
Language:Python 1
Aumlo123 / pdfdoom
DOOM in a PDF (as ascii art)
pdf-creation pdf-editor pdf-extraction pdf-generation pdf-library pdf-manipulation pdf-modification pdf-parser pdf-processing pdf-toolkit pdf-tools pdf-viewer github-pdf open-source-pdf pdfdoom
1
DioCrafts / ai-book-summarizer
📚 AI-Powered Book PDF Knowledge Extractor & Summarizer. Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study.
ai ai-powered-tools automation book-summary document-analysis educational-tools knowledge-extraction machine-learning markdown natural-language-processing openai pdf pdf-processing pdf-summarization pymupdf python study-materials text-analysis text-summarization
Language:Python 1
eddieyg / freedomfile
Freedom to use PDF, DOC and other document processing
image-to-pdf pdf pdf-processing image-processing
Language:TypeScript 1
Francesco-Sovrano / Swiss-G2C-User-Guide-Analysis
Extensive analysis of user guides in Swiss government-to-citizen software, correlating guide features with canton socio-economic factors.
content-classification correlation-analysis data-analysis government-data natural-language-processing open-data pdf-processing public-sector python-scripts user-documentation web-scraping swiss-digital-strategy
Language:Python 1
FurqanHun / textnomnom-py
Extract text from PDFs, PPTs, & URLs (with OCR support). Converts PPT to PDF & handles files or folders. 🦍
automation document-conversion pdf-processing pdf-to-text ppt ppt-to-text pptx pptx-to-text text-extraction automated-conversion image-text-extraction cross-platform linux windows
Language:Python 1
gs-ai / PDFProfessor
PDF Professor 1.0 extracts and processes PDF text, analyzed by Ollama for summarization, data extraction, and insights. More coming soon!
ai-analysis data-extraction document-processing machine-learning natural-language-processing ollama pdf-processing python text-extraction
Language:Python 1
HemantM29 / Multimodal-Document-Analysis-and-Query-Retrieval
This project performs multimodal document analysis and query retrieval by downloading PDFs, converting pages to images, indexing them for semantic search, and analyzing retrieved images using visual-language models like Qwen2VL and Blip2.
blip2 image-indexing multimodal-analysis natural-language-queries pdf-processing qwen2-vl retrieval-augmented-generation semantic-search transformers visual-language-models
Language:Jupyter Notebook 1
king04aman / PDF-Extractor-API
PDF Extractor API is a FastAPI project for extracting information from PDFs. It includes user authentication, PDF uploading, and text extraction. The API supports secure PDF uploads, keyword-based extraction, and rate limiting.
api-security docker-compose doker fastapi invoice-management invoice-pdf jwt-auth jwt-authentication jwt-token pdf-processing pdf-processor python python3 rate-limiting sap
Language:Python 1
Mateusz2734 / pdf-cli
CLI tool to merge, compress, extract or delete pages from PDF
cli pdf python pdf-processing pdf-tool
Language:Python 1
mohamedelareeg / ImageAutomaticCroppingWatcher
Image Automatic Cropping Watcher: A tool that automatically detects PDF files, converts them to images, corrects perspective distortion, and compiles them back into PDFs.
ai itextsharp json opencv pdf pdf-generation pdf-processing autoskew
Language:C# 1
RajnishProgrammer / PDF-Info-Processing-Service
A PDF processing project with backend integration using Python-Flask 🚀
api flask pdf-document pdf-processing python railway
Language:HTML 1
Remisu / GajyunETL
The goal of this project is to eliminate the need for paper by digitizing the process of handling client passport information.
automation csharp csv-processing data-integration database dotnet etl logging pdf-processing sql-server guesthouse-management
Language:C# 1
setuc / pdf-annotation-with-azure-doc-intel
Azure Document Intelligence Result Processor: A toolset for annotating PDFs based on Azure Document Intelligence analysis results, featuring a React web application and a standalone Python script for processing and visualizing extracted data with confidence indicators.
azure-document-intelligence confidence-scores form-recognizer javascript pdf-annotation pdf-processing python react vite
Language:JavaScript 1
ydvrahul19 / Invoice-Manager
A modern, intelligent invoice processing system with advanced multi-format data extraction capabilities. Process invoices from PDFs, Excel files, and images with smart data recognition.
data-extraction firebase framer-motion invoice-management invoice-processing material-ui pdf-processing react redux-toolkit
Language:JavaScript 1
AkshayG999 / MistralOCR---AI-Powered-Document-Extraction
MistralOCR is an open-source application that transforms documents into structured data using Mistral AI's OCR capabilities. Built with FastAPI and Streamlit, it provides an intuitive interface for extracting and processing text from PDFs and images, making document digitization effortless and accurate.
ai-tools business-automation-insights data-extraction data-science docker document-analysis document-automation document-processing fastapi image-processing invoice-processing machine-learning mistral-ai ocr pdf-processing python receipt-scanner rest-api streamlit text-extraction
Language:Python
nathania-rachael / Chat-with-Multiple-PDFs
An AI-powered chatbot that lets users upload multiple PDFs and ask questions based on their content. It extracts text, processes it with FAISS, and retrieves answers using Google Generative AI (Gemini Pro) through a simple Streamlit interface.
faiss gemini-pro google-generative-ai langchain pdf-chat-bot pdf-processing python streamlit
Language:Python
