There are 116 repositories under the information-retrieval topic.
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, and more.
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
Private AI platform for agents, assistants and enterprise search. Built-in Agent Builder, Deep research, Document analysis, Multi-model support, and API connectivity for agents.
Weaviate is an open-source vector database that stores both objects and vectors, combining vector search and structured filtering with the fault tolerance and scalability of a cloud-native database.
Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.
💡 All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows
Retrieval and Retrieval-augmented LLMs
Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
Apache Lucene and Solr open-source search software
Fetches system/theme information in terminal for Linux desktop screenshots.
AdalFlow: The library to build & auto-optimize LLM applications.
Accelerated deep learning R&D
MTEB: Massive Text Embedding Benchmark
Learning to Rank in TensorFlow
Up to 100x faster strings for C, C++, CUDA, Python, Rust, Swift, JS, & Go, leveraging NEON, AVX2, AVX-512, SVE, GPGPU, & SWAR to accelerate search, hashing, sorting, edit distances, sketches, and memory ops 🦖
Track any IP address with IP-Tracer. IP-Tracer is developed for Linux and Termux. You can retrieve information about any IP address using IP-Tracer.
Deep neural network to extract intelligent information from invoice documents.
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Comprehensive and timely academic information on federated learning (papers, frameworks, datasets, tutorials, workshops)
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
A collection of research on knowledge graphs
Telegram group scraper tool. Fetches all information about group members.
Up to 200x Faster Dot Products & Similarity Metrics for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for AVX2, AVX-512, NEON, SVE, & SVE2 📐
A Collection of BM25 Algorithms in Python