Dan Lou's starred repositories

llama.cpp

LLM inference in C/C++

Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically to do so.

Language: Python | License: Apache-2.0 | Stargazers: 36851 | Issues: 429 | Issues: 1641

whisper.cpp

Port of OpenAI's Whisper model in C/C++

stanford_alpaca

Code and documentation to train Stanford's Alpaca models, and generate the data.

Language: Python | License: Apache-2.0 | Stargazers: 29183 | Issues: 341 | Issues: 267

alpaca-lora

Instruct-tune LLaMA on consumer hardware

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 18422 | Issues: 154 | Issues: 468

StableLM

StableLM: Stability AI Language Models

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 15849 | Issues: 201 | Issues: 76

dspy

DSPy: The framework for programming—not prompting—foundation models

Language: Python | License: MIT | Stargazers: 14625 | Issues: 129 | Issues: 602

tortoise-tts

A multi-voice TTS system trained with an emphasis on quality

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 12504 | Issues: 169 | Issues: 502

deepface

A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python

Language: Python | License: MIT | Stargazers: 11044 | Issues: 135 | Issues: 1062

LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook | License: BSD-3-Clause | Stargazers: 9256 | Issues: 97 | Issues: 629

text-generation-inference

Large Language Model Text Generation Inference

Language: Python | License: Apache-2.0 | Stargazers: 8421 | Issues: 99 | Issues: 1213

llama-cpp-python

Python bindings for llama.cpp

Language: Python | License: MIT | Stargazers: 7244 | Issues: 67 | Issues: 989

axolotl

Go ahead and axolotl questions

Language: Python | License: Apache-2.0 | Stargazers: 6856 | Issues: 50 | Issues: 597

jsonformer

A Bulletproof Way to Generate Structured JSON from Language Models

Language: Jupyter Notebook | License: MIT | Stargazers: 4006 | Issues: 21 | Issues: 40

pyllama

LLaMA: Open and Efficient Foundation Language Models

Language: Python | License: GPL-3.0 | Stargazers: 2801 | Issues: 34 | Issues: 93

ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)

Language: Python | License: MIT | Stargazers: 2715 | Issues: 42 | Issues: 255

DeBERTa

The implementation of DeBERTa

Language: Python | License: MIT | Stargazers: 1924 | Issues: 42 | Issues: 122

PanoHead

Code Repository for CVPR 2023 Paper "PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360 degree"

Language: Python | License: CC0-1.0 | Stargazers: 1892 | Issues: 42 | Issues: 45

gpt-discord-bot

Example Discord bot written in Python that uses the completions API to have conversations with the `text-davinci-003` model, and the moderations API to filter the messages.

Language: Python | License: MIT | Stargazers: 1742 | Issues: 35 | Issues: 57

notion-sdk-py

The official Notion API client library, but rewritten in Python! (sync + async)

Language: Python | License: MIT | Stargazers: 1689 | Issues: 26 | Issues: 86

elia

A snappy, keyboard-centric terminal user interface for interacting with large language models. Chat with ChatGPT, Claude, Llama 3, Phi 3, Mistral, Gemma and more.

Language: Python | License: Apache-2.0 | Stargazers: 1640 | Issues: 10 | Issues: 39

cramming

Cramming the training of a (BERT-type) language model into limited compute.

Language: Python | License: MIT | Stargazers: 1263 | Issues: 22 | Issues: 34

llama-int8

Quantized inference code for LLaMA models

Language: Python | License: GPL-3.0 | Stargazers: 1053 | Issues: 17 | Issues: 17

pySBD

🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection module that works out-of-the-box.

Language: Python | License: MIT | Stargazers: 762 | Issues: 12 | Issues: 74

unifiedqa

UnifiedQA: Crossing Format Boundaries With a Single QA System

Language: Python | License: Apache-2.0 | Stargazers: 428 | Issues: 14 | Issues: 40

lm-question-generation

Multilingual/multidomain question generation datasets, models, and python library for question generation.

Language: Python | License: MIT | Stargazers: 286 | Issues: 3 | Issues: 21

hlb-gpt

Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in <100 seconds. Scales to larger models with one parameter change (feature currently in alpha).

Language: Python | License: Apache-2.0 | Stargazers: 261 | Issues: 9 | Issues: 5

medal

Large medical text dataset curated for abbreviation disambiguation, designed for natural language understanding pre-training in the medical domain

sodaverse

🥤🧑🏻‍🚀 Code and dataset for our EMNLP 2023 paper - "SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization"

Language: Python | License: MIT | Stargazers: 214 | Issues: 18 | Issues: 8

fabricator

[EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models.

Language: Python | License: Apache-2.0 | Stargazers: 99 | Issues: 6 | Issues: 32