Thom Lake's starred repositories
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
scikit-learn
scikit-learn: machine learning in Python
quickdraw-dataset
Documentation on how to access and use the Quick, Draw! Dataset.
CodeSearchNet
Datasets, tools, and benchmarks for representation learning of code.
json_repair
A Python module to repair invalid JSON, commonly used to parse the output of LLMs
ark-tweet-nlp
CMU ARK Twitter Part-of-Speech Tagger
TriangleCOPA
One hundred challenge problems for logical formalizations of commonsense psychology
treebank-scripts
Suite of scripts for preprocessing the Penn Treebank, primarily to extract lexical subcategorization frames and dependencies.
data-file-parsers
Stuff to parse data files
gen-twitter-data-files
Python scripts to get Twitter feeds (stream and search)
1milliontweets
A big text file full of tweets
tweepy-examples
Example Python source using the Tweepy Twitter API