Yohan Na's starred repositories
Awesome-AI-Data-GitHub-Repos
A collection of the most important GitHub repos for ML, AI & data science practitioners
Google_SCoRe
Paper reproduction of Google SCoRe (Training Language Models to Self-Correct via Reinforcement Learning)
awesome-production-llm
A curated list of awesome open-source libraries for production LLM
Megatron-LM
Ongoing research training transformer models at scale
Superfiltering
[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
NeMo-Curator
Scalable data preprocessing and curation toolkit for LLMs
Liger-Kernel
Efficient Triton Kernels for LLM Training
bad-word-filtering
A library that detects and handles profanity and vulgar language. Note that profanity and vulgar terms used for filtering may be visible.
KoreanBadwordDetection
A Python module for filtering Korean profanity, built without deep learning.
DiscordBadWordDetect
A Discord profanity-prevention system, built as a school event system.
badword-filter-ko
Provides a profanity filter and a list of profanities.
KoCommonGEN-V2
KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models
elasticsearch-labs
Notebooks & Example Apps for Search & AI Applications with Elasticsearch
elasticsearch-vector-crud
Storing image and text data in a vector database using Elasticsearch
sentence-transformers
State-of-the-Art Text Embeddings
KoMT-Bench
Official repository for KoMT-Bench built by LG AI Research
MindSearch
🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)
distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
bicleaner-ai
Bicleaner fork that uses neural networks