Tianyu Gao's starred repositories
stable-diffusion
A latent text-to-image diffusion model
stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
unlimiformer
Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
contriever
Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning
LegalPapers
Must-read Papers on Legal Intelligence
knn-transformers
PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022), including an implementation of kNN-LM and kNN-MT
ModelCenter
Efficient, Low-Resource, Distributed transformer implementation based on BMTrain
ACL-anthology-corpus
This repository provides details and links to the ACL anthology corpus/collection, including .bib, .pdf, and GROBID extractions of the PDFs
MAVEN-dataset
Source code and dataset for EMNLP 2020 paper "MAVEN: A Massive General Domain Event Detection Dataset".
attribute_charge
The source code of our COLING'18 paper "Few-Shot Charge Prediction with Discriminative Legal Attributes".