Tracy Shen's starred repositories
mmdetection
OpenMMLab Detection Toolbox and Benchmark
Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles:Latest Advances on Multimodal Large Language Models
mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
DAVAR-Lab-OCR
OCR toolbox from Davar-Lab
llama_parse
Parse files for optimal RAG
StringZilla
Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc 🦖
CascadeTabNet
This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"
RAGatouille
Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
table-transformer
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
LLM-FineTuning-Large-Language-Models
LLM (Large Language Model) FineTuning
unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
financial-vss
Notebooks demonstrating vector search & RAG design patterns with Redis Python clients.
chatgpt-memory
Allows to scale the ChatGPT API to multiple simultaneous sessions with infinite contextual and adaptive memory powered by GPT and Redis datastore.
redis-product-search
Visual and semantic vector similarity with Redis Stack, FastAPI, PyTorch and Huggingface.