ProgrammerUnknown's repositories
ACL-anthology-corpus
This repository provides details of and links to the ACL Anthology corpus/collection, including .bib files, PDFs, and GROBID extractions of the PDFs.
Algorithm-Practice-in-Industry
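A collection of industry practice articles on search, recommendation, advertising, user growth, and more (sources: Zhihu, Datafuntalk, tech WeChat official accounts)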
aqua
AQuA: A Benchmarking Tool for Label Quality Assessment
CIS5528_Project
This is the project repository for the CIS5528 project, Spring 2023.
cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
cs-video-courses
List of Computer Science courses with video lectures.
data-centric-AI
A curated, but incomplete, list of data-centric AI resources.
finetuning
Fine-tune Llama-3-8B on the MathInstruct dataset
llama_index
LlamaIndex is a data framework for your LLM applications
Low-resource-KEPapers
A Paper List of Low-resource Information Extraction
machine-learning-interview
Machine Learning Interviews from FAANG, Snapchat, LinkedIn. I have offers from Snapchat, Coupang, Stitchfix etc. Blog: mlengineer.io.
micrograd
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
minbert-assignment
Minimalist BERT implementation assignment for CS11-711
nessie
Automatically detect errors in annotated corpora.
nlp-from-scratch-assignment-2022
An assignment for CMU CS11-711 Advanced NLP, building NLP systems from scratch
Prompt-Engineering-Guide
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
PromptNER
Prompting For Named Entity Recognition
retriv
A Python Search Engine for Humans 🥸
s2orc-doc2json
Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)
spacyex
SpaCyEx allows the creation of spaCy Matcher patterns with RegEx-like syntax.
TrainingDynamics
Compute training dynamics, plot data cartography, analyse data quality...
zshot
Zero- and few-shot named entity and relationship recognition