There are 13 repositories under the document-retrieval topic.
Open-source search and retrieval database for AI applications.
The universal tool suite for vector database management. Manage Pinecone, Chroma, Qdrant, Weaviate and more vector databases with ease.
Vector search demo with the arXiv paper dataset, RedisVL, HuggingFace, OpenAI, Cohere, FastAPI, React, and Redis.
Vietnamese long-form question answering system with document retrieval.
[VLSP 2025] ViDRILL is a Vietnamese document retrieval system for VLSP 2025. It combines dense and sparse retrieval, reranking, and optional LLM-based query rewriting and reasoning to support high-accuracy information retrieval and future LLM-enhanced pipelines.
Implementation of ECIR 2022 Paper: How Can Graph Neural Networks Help Document Retrieval: A Case Study on CORD19 with Concept Map Generation
Retrieves the top 10 documents from the Wikipedia corpus for a user-entered free-text query.
Document Querying with LLMs - Google PaLM API: Semantic Search With LLM Embeddings
We address the task of learning contextualized word, sentence and document representations with a hierarchical language model by stacking Transformer-based encoders on a sentence level and subsequently on a document level and performing masked token prediction.
Run text embeddings with Instructor-Large on AWS Lambda.
Built prediction and retrieval models for document retrieval, image retrieval, house price prediction, song recommendation, and sentiment analysis using machine learning algorithms in Python.
This project is a Document Retrieval application that utilizes Retrieval-Augmented Generation (RAG) techniques to enable users to interact with uploaded PDF documents. By leveraging a Large Language Model (LLM), users can ask questions about the content of the documents and receive accurate answers based on the information retrieved.
Code and dataset for the paper "Redefining Absent Keyphrases and their Effect on Retrieval Effectiveness"
The Intelligent "ASKDOC" project combines Langchain, Azure, OpenAI models, and Python to deliver a question-answering system that scans your PDF documents and answers queries based on their contents. It can be queried in natural language.
A comprehensive multimodal OCR application that supports both image and video document processing using state-of-the-art vision-language models. This application provides an intuitive Gradio interface for extracting text, converting documents to markdown, and performing advanced document analysis.
Compilation of Information Retrieval codes.
Implements a document retrieval system with SBERT embeddings and evaluates it on the CORD-19 dataset, then fine-tunes a BERT model on the SQuAD v2 dataset and evaluates it on a question answering task.
"LangChat Explorer: Your intuitive document companion. Effortlessly explore vast information with natural language conversations. Simplify queries, gain insights, and embark on a seamless journey of knowledge discovery. Unleash the power of language with LangChat Explorer."
A Python-based tool for context-based search across text documents using OpenAI embeddings and Chroma vector storage. This system enables efficient querying of document collections by generating vector embeddings, storing them persistently, and retrieving relevant results based on textual queries.
A two-stage information retrieval model using a baseline TF-IDF model and a refined BM25 model.
RAG enhances LLMs by retrieving relevant external knowledge before generating responses, improving accuracy and reducing hallucinations.
CodeXpert: A cutting-edge AI-powered code analysis tool leveraging CodeLlama, FAISS, and HuggingFace for efficient code understanding, explanation, and optimization. 🚀✨
Course slides for Multimedia Information Retrieval.
Neural text summarization for document retrieval
A document retrieval system that returns the titles and context of scientific papers containing the answer to a given user question.
An experimental document-focused Vision-Language Model application that provides advanced document analysis, text extraction, and multimodal understanding capabilities. This application features a streamlined Gradio interface for processing both images and videos using state-of-the-art vision-language models specialized in document understanding.
Doc-VLMs-v2-Localization is a demo app for the Camel-Doc-OCR-062825 model, fine-tuned from Qwen2.5-VL-7B-Instruct for advanced document retrieval, extraction, and analysis. It enhances document understanding and also integrates other notable Hugging Face models.
The "Questions" project, part of Harvard's CS50 AI course, develops an AI system for answering questions by retrieving documents and passages from a text corpus using tf-idf. It aids in understanding natural language processing (NLP) and information retrieval techniques.
A minimal local embedding database.
A RAG chatbot developed for a business-oriented game at Jade Hochschule.
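One of the descriptions above mentions a two-stage retrieval model that uses TF-IDF as a baseline and BM25 as a refinement. The following is a minimal sketch of that general idea, not the code of any listed repository. The function names (`tokenize`, `two_stage_search`, and so on), the whitespace tokenizer, the shared smoothed IDF table, and the parameter values `k1=1.5` and `b=0.75` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def idf_table(docs_tokens):
    # Smoothed inverse document frequency; always positive.
    n = len(docs_tokens)
    df = Counter(term for doc in docs_tokens for term in set(doc))
    return {t: math.log(1 + (n - c + 0.5) / (c + 0.5)) for t, c in df.items()}


def tfidf_score(query, doc, idf):
    # Stage-1 score: length-normalized term frequency times IDF.
    if not doc:
        return 0.0
    tf = Counter(doc)
    return sum(tf[t] / len(doc) * idf.get(t, 0.0) for t in query)


def bm25_score(query, doc, idf, avg_len, k1=1.5, b=0.75):
    # Stage-2 score: BM25 with term-frequency saturation (k1)
    # and document-length normalization (b).
    tf = Counter(doc)
    score = 0.0
    for t in query:
        f = tf[t]
        if f == 0:
            continue
        norm = f + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf.get(t, 0.0) * f * (k1 + 1) / norm
    return score


def two_stage_search(query, docs, candidates=3, top_k=2):
    if not docs:
        return []
    docs_tokens = [tokenize(d) for d in docs]
    q = tokenize(query)
    idf = idf_table(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / len(docs_tokens)

    # Stage 1: keep the best `candidates` documents by TF-IDF.
    first_pass = sorted(
        range(len(docs)),
        key=lambda i: tfidf_score(q, docs_tokens[i], idf),
        reverse=True,
    )[:candidates]

    # Stage 2: rerank only those candidates with BM25.
    reranked = sorted(
        first_pass,
        key=lambda i: bm25_score(q, docs_tokens[i], idf, avg_len),
        reverse=True,
    )
    return [docs[i] for i in reranked[:top_k]]


if __name__ == "__main__":
    corpus = [
        "graph neural networks for document retrieval",
        "house price prediction with regression",
        "sparse and dense retrieval with reranking",
        "song recommendation system",
    ]
    for hit in two_stage_search("document retrieval", corpus):
        print(hit)
```

In a real system the first stage would typically run over a large index and the second stage over a short candidate list, so the more expensive scorer touches far fewer documents.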