Lien Le's starred repositories
TransformerLens
A library for mechanistic interpretability of GPT-style language models
haystack
🔍 LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
basic-pitch
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
PhoWhisper
PhoWhisper: Automatic Speech Recognition for Vietnamese (2024)
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won the NAACL 2022 Best Demo Award.
XPhoneBERT
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)
gpt-researcher
GPT-based autonomous agent that performs comprehensive online research on any given topic
iwslt-2022
Systems submitted to IWSLT 2022 by the MT-UPC group.
SpeechTransProgress
Tracking the progress in end-to-end speech translation
video-subtitle-extractor
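Extracts hard-coded subtitles from videos and generates srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files.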
video-splitter
Simple Python script to split a video into equal-length chunks or chunks of equal size, duration, etc.
generative-ai-for-beginners
18 Lessons: Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
ColossalAI
Making large AI models cheaper, faster and more accessible
awesome-generative-ai
A curated list of modern Generative Artificial Intelligence projects and services
100-Days-Of-ML-Code
100 Days of ML Coding
Data-Science-For-Beginners
10 Weeks, 20 Lessons, Data Science for All!
Awesome-Diffusion-Models
A collection of resources and papers on Diffusion Models
label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
MLE-Flashcards
200+ detailed flashcards useful for reviewing topics in machine learning, computer vision, and computer science.
the-algorithm
Source code for Twitter's Recommendation Algorithm
data-science-road-map
A roadmap for those looking to start or expand a career in the data community
ann-benchmarks
Benchmarks of approximate nearest neighbor libraries in Python
ImageCaptioning.pytorch
I decided to sync up this repo and self-critical.pytorch. (The old master is archived in the old master branch.)
annotated_deep_learning_paper_implementations
🧑🏫 60 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠