Peter's repositories
vid2cleantxt
Python API & command-line tool to easily transcribe speech-based video files into clean text
BoulderAreaDetector
An app that uses a CNN to classify whether a satellite image shows an area that would be a good rock climbing spot. Runs on Streamlit.
confectionary
A tool to quickly create sweet PDF files from text files 🧁
ml4hc-s22-project01
An investigation into tabular classification with deep NNs for ETHZ Machine Learning for Healthcare on the MIT-BIH arrhythmia dataset.
scrape-viz-jobs
A tool for scraping and clustering job postings from ch.indeed.com; the results are visualized using various clustering and dimensionality reduction techniques.
pubmed-text-classification
ETHZ Machine Learning for Healthcare Problem 2: classification of PubMed paper sentences or text into document sections.
rpunct-cpu
📝 An easy-to-use package to restore punctuation in text, with CPU support
Slack-Export-JSON-to-CSV
Convert Slack messages exported in their complicated JSON format to simple CSV format, by channel or for the entire exported workspace
SummComparer
compiles and parses the summarization gauntlet and results from various models into a dataset-like format
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
autoEDA-resources
A list of software and papers related to automatic and fast Exploratory Data Analysis
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
autolabel
Label, clean and enrich text datasets with LLMs.
CASSINI_geo
for the CASSINI hackathon
contrastors
Train Models Contrastively in PyTorch
DailyDialogue-Parser
Parser for DailyDialogue Dataset, updated with some conventions and additional cleaning for text-generation
deepcluster
Custom PyTorch model (VGG-16 Auto-Encoder) and custom criterion (Local Aggregation) for image clustering. The repo also details the creation of fungi image data using the factory method.
inbox_cleaner
A Python script to help manage a Gmail inbox by filtering out promotional emails using GPT-3 or GPT-4.
LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
lm-evaluation-harness
A framework for few-shot evaluation of language models.
mteb
MTEB: Massive Text Embedding Benchmark
nanoT5
Fast & Simple repository for pre-training and fine-tuning T5-style models
optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
unlimiformer
Public repo for the preprint "Unlimiformer: Long-Range Transformers with Unlimited Length Input"