document-retrieval

There are 13 repositories under the document-retrieval topic.

chroma-core / chroma
Open-source search and retrieval database for AI applications.
ai database document-retrieval embeddings llm llms rag rust rust-lang vector-database
Language: Rust · Stars: 24,275
vearch / vearch
Distributed vector search for AI-native applications
vectors vector-search cloud-native document-retrieval embeddings vector-database hybrid-search rag retrieval-augmented-generation ai-native ai-native-database
Language: Go · Stars: 2,240
Mintplex-Labs / vector-admin
The universal tool suite for vector database management. Manage Pinecone, Chroma, Qdrant, Weaviate, and other vector databases with ease.
ai aitools chroma database-management document-retrieval embeddings langchain llms pinecone vector-data-management vector-database vector-database-embedding vectordatabase weaviate ai-agents flowise langchain-js qdrant vector-search vectorspace
Language: TypeScript · Stars: 2,087
OpenBMB / VisRAG
Parsing-free RAG supported by VLMs
document-retrieval document-understanding multi-modal multi-modality rag retrieval retrieval-augmented-generation vision-language-model
Language: Python · Stars: 849
redis-developer / redis-arXiv-search
Vector search demo with the arXiv paper dataset, RedisVL, HuggingFace, OpenAI, Cohere, FastAPI, React, and Redis.
arxiv arxiv-papers document-retrieval document-search huggingface machine-learning nlp redis vector-search openai react cohere vector-database
Language: Python · Stars: 148
Amitha353 / Machine-Learning-Foundation-Case-Study
python sframe-dataframe regression classification clustering similarity recommendation-system deep-learning predicting-housing-prices sentiment-analysis document-retrieval product-recommendation song-recommender image-finder
Language: Jupyter Notebook · Stars: 21
grafana / vectorapi
pgvector + embeddings API
embeddings pgvector document-retrieval llms
Language: Python · Stars: 21
vTuanpham / Vietnamese_QA_System
Vietnamese long-form question-answering system with document retrieval.
document-retrieval dpr lfqa qa question-answering sentence-embeddings sentence-similarity sentence-transformers vietnamese vietnamese-nlp nlp instructions instruction-tune
Language: Python · Stars: 21
xndien2004 / ViDrill
[VLSP 2025] ViDRILL is a Vietnamese document retrieval system for VLSP 2025. It combines dense and sparse retrieval, reranking, and optional LLM-based query rewriting and reasoning to support high-accuracy information retrieval and future LLM-enhanced pipelines.
document-retrieval information-retrieval query-rewriting reinforcement-learning reranking vietnamese-nlp vlsp-2025
Language: Python · Stars: 17
HennyJie / GNN-DocRetrieval
Implementation of ECIR 2022 Paper: How Can Graph Neural Networks Help Document Retrieval: A Case Study on CORD19 with Concept Map Generation
concept-graphs document-retrieval graph-mining graph-neural-networks
Language: Python · Stars: 15
manan-paneri-99 / Vector-Space-based-Document-Retrieval-system
Retrieves the top 10 documents from the Wikipedia corpus for a free-text query entered by the user.
document-retrieval information-retrieval vector-space-model
Language: Python · Stars: 10
Syed007Hassan / Document-Querying-With-VectorDB
Document Querying with LLMs - Google PaLM API: Semantic Search With LLM Embeddings
chroma document-retrieval embeddings palm-api pdf-encoding vectordb
Language: Python · Stars: 9
marcomoldovan / hierarchical-language-modeling
We address the task of learning contextualized word, sentence and document representations with a hierarchical language model by stacking Transformer-based encoders on a sentence level and subsequently on a document level and performing masked token prediction.
natural-language-processing natural-language-understanding transformer transfer-learning attention-mechanism representation-learning word-embeddings sentence-embeddings deep-learning machine-learning pytorch document-embedding document-retrieval information-retrieval language-model
Language: Jupyter Notebook · Stars: 8
maxsagt / lambda-instructor
Run text embeddings with Instructor-Large on AWS Lambda.
aws aws-lambda embeddings lambda machine-learning vector-database document-retrieval llms serverless
Language: Shell · Stars: 8
agrawal-priyank / machine-learning-case-studies
Prediction and retrieval models for document retrieval, image retrieval, house price prediction, song recommendation, and sentiment analysis, built with machine learning algorithms in Python.
machine-learning logistic-regression supervised-learning predictive-analytics coursera university-of-washington graphlab-create jupyter-notebook python ipython-notebook knn-classification nearest-neighbours-classifier document-retrieval clustering similarity deep-learning transfer-learning deep-features image-retrieval image-classifier
Language: Jupyter Notebook · Stars: 7
aniketwdubey / chatpdf
This project is a Document Retrieval application that utilizes Retrieval-Augmented Generation (RAG) techniques to enable users to interact with uploaded PDF documents. By leveraging a Large Language Model (LLM), users can ask questions about the content of the documents and receive accurate answers based on the information retrieved.
chat-application document-retrieval fastapi huggingface large-language-models llm python rag retrieval-augmented-generation
Language: Jupyter Notebook · Stars: 6
YUSANITY / TF-IDF-DOCUMENT-RETRIEVAL-CHATBOT
tf-idf document-retrieval chatbot
Language: Jupyter Notebook · Stars: 6
boudinfl / redefining-absent-keyphrases
Code and dataset for the paper "Redefining Absent Keyphrases and their Effect on Retrieval Effectiveness"
absent-keyphrases digital-library document-retrieval information-retrieval keyphrase-generation retrieval-effectiveness
Language: Python · Stars: 5
DebanjanSarkar / askdoc
The intelligent "ASKDOC" project combines LangChain, Azure, OpenAI models, and Python into a question-answering system that scans your PDF documents and answers queries based on their contents. It can be queried in natural language.
artificial-intelligence azure-openai azure-openai-api chatbot document-retrieval faiss langchain langchain-python natural-language-processing natural-language-understanding pdf-document-query python3
Language: Python · Stars: 5
PRITHIVSAKTHIUR / Multimodal-OCR2
A comprehensive multimodal OCR application that supports both image and video document processing using state-of-the-art vision-language models. This application provides an intuitive Gradio interface for extracting text, converting documents to markdown, and performing advanced document analysis.
document-retrieval gradio huggingface-transformers image-analysis ocr-recognition pillow qwen2-5-vl smoldocling video-understanding vision-transformer
Language: Python · Stars: 4
shrebox / Information-Retrieval
A compilation of information retrieval code.
information-retrieval inverted-index postional-index tf-idf tf-idf-vectorizer document-retrieval relevance-feedback evaluation-metrics pr-curve naive-bayes knn-classification kmeans-clustering
Language: Jupyter Notebook · Stars: 4
spyros-briakos / Document-Retrieval-and-Question-Answering-with-BERT
First implements a document retrieval system with SBERT embeddings and evaluates it on the CORD-19 dataset, then fine-tunes a BERT model on the SQuAD v2 dataset to evaluate it on the question-answering task.
document-retrieval sbert fine-tuning-bert question-answering pytorch squad-dataset google-colab cord-19-dataset
Language: Jupyter Notebook · Stars: 4
SubhangiSati / LangChat-Explorer
"LangChat Explorer: Your intuitive document companion. Effortlessly explore vast information with natural language conversations. Simplify queries, gain insights, and embark on a seamless journey of knowledge discovery. Unleash the power of language with LangChat Explorer."
api deep-learning document-retrieval generative-ai llms machine-learning pdf-document-processor python3 q-and-a-bot
Language: Python · Stars: 4
ahmadvh / Context-based-document-search
A Python-based tool for context-based search across text documents using OpenAI embeddings and Chroma vector storage. This system enables efficient querying of document collections by generating vector embeddings, storing them persistently, and retrieving relevant results based on textual queries.
chromadb contextual-search document-retrieval embeddings langchain machine-learning nlp openai python vector-database
Language: Python · Stars: 3
anaramirli / snlp-information-retrieval
A two-stage information retrieval model using a baseline TF-IDF model and a refined BM25 model.
information-retrieval bm25 okapi-bm25 tf-idf tf-idf-calculation document-retrieval statistical-nlp
Language: Python · Stars: 3
Md-Emon-Hasan / Retrieval-Augmented-Generation-RAG
RAG enhances LLMs by retrieving relevant external knowledge before generating responses, improving accuracy and reducing hallucinations.
ai-chatbot chromadb custom-llm document-retrieval embedding-models faiss huggingface-rag knowledge-augmented-llm knowledge-graph langchain-rag llm-applications llm-retrieval multi-modal-rag prompt-engineering rag-pipeline retrieval-augmented-generation retrieval-qa semantic-search text-embedding vector-search
Language: Jupyter Notebook · Stars: 3
MohammedNasserAhmed / CodeXpert
CodeXpert: A cutting-edge AI-powered code analysis tool leveraging CodeLlama, FAISS, and HuggingFace for efficient code understanding, explanation, and optimization. 🚀✨
code-embedding code-explanation document-retrieval embeddings faiss huggingface llm natural-language-processing python vector-search ai-code-analyzer
Language: Python · Stars: 3
ndtands / Information-Retrieval
document-retrieval
Language: Python · Stars: 3
wlzhao22 / mirlecture
Course slides for Multimedia Information Retrieval.
course instance-search nearest-neighbor-search slides multimedia-information-retrieval document-retrieval
Language: TeX · Stars: 3
YesNLP / text-summ-for-doc-retrieval
Neural text summarization for document retrieval
text-summarization document-retrieval bert-model precision-medicine
Language: Python · Stars: 3
AGiannoutsos / COVID19-document-retrieval-with-BERT
A document retrieval system that returns the titles and context of scientific papers containing the answer to a given user question.
bert bert-embeddings cord-19-dataset covid-19 covid19 question-answering document-retrieval deep-learning
Language: Jupyter Notebook · Stars: 2
PRITHIVSAKTHIUR / Doc-VLMs-exp
An experimental document-focused Vision-Language Model application that provides advanced document analysis, text extraction, and multimodal understanding capabilities. This application features a streamlined Gradio interface for processing both images and videos using state-of-the-art vision-language models specialized in document understanding.
demo-app document-retrieval drex gradio huggingface-transformers ocr qwen2-5-vl spaces transformers vgpu vlms
Language: Python · Stars: 2
PRITHIVSAKTHIUR / Doc-VLMs-v2-Localization
Doc-VLMs-v2-Localization is a demo app for the Camel-Doc-OCR-062825 model, fine-tuned from Qwen2.5-VL-7B-Instruct for advanced document retrieval, extraction, and analysis. It enhances document understanding and also integrates other notable Hugging Face models.
7b document-retrieval gradio huggingface-transformers ocr ocr-recognition qwen2-5-vl table vision-language vision-transformer
Language: Python · Stars: 2
SavinRazvan / questions
The "Questions" project, part of Harvard's CS50 AI course, develops an AI system for answering questions by retrieving documents and passages from a text corpus using tf-idf. It aids in understanding natural language processing (NLP) and information retrieval techniques.
ai cs50 document-retrieval information-retrieval natural-language-processing nlp nltk passage-retrieval python question-answering tf-idf
Language: Jupyter Notebook · Stars: 2
timothyckl / iota
A minimal local embedding database.
document-retrieval embeddings python vector-database vector-search
Language: Python · Stars: 2
unendschlossen2 / chatbot_jade_hs_planspiel
An RAG chatbot developed for a business-oriented game at JADE HOCHSCHULE.
chatbot document-retrieval rag
Language: Python · Stars: 2