Fan's starred repositories
segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
CompilerGym
Reinforcement learning environments for compiler and program optimization tasks
scaling-with-vocab
📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
persona-hub
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
Phi-3CookBook
A cookbook for getting started with Phi-3, a family of open AI models developed by Microsoft. Phi-3 models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks.
flash-attention
Fast and memory-efficient exact attention
trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Awesome-DataCentric-LLM
Trending projects and awesome papers about data-centric LLM studies.