nayohan

LG Uplus CTO

Seoul, South Korea

huggingface.co/nayohan

Yohan Na's starred repositories

Awesome-AI-Data-GitHub-Repos

A collection of the most important GitHub repos for ML, AI, and data science practitioners

MIT · 75000

Google_SCoRe

Paper reproduction of Google SCoRe (Training Language Models to Self-Correct via Reinforcement Learning)

Language: Jupyter Notebook · Apache-2.0 · 6300

awesome-production-llm

A curated list of awesome open-source libraries for production LLM

MIT · 32200

trl

Train transformer language models with reinforcement learning.

Language: Python · Apache-2.0 · 957200

Megatron-LM

Ongoing research training transformer models at scale

Language: Python · NOASSERTION · 1014400

loft

LOFT: A 1 Million+ Token Long-Context Benchmark

Language: Python · Apache-2.0 · 13200

Superfiltering

[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning

Language: Python · 10500

datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Language: Python · Apache-2.0 · 197100

S-Eval

S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models

NOASSERTION · 3200

NeMo-Curator

Scalable data preprocessing and curation toolkit for LLMs

Language: Jupyter Notebook · Apache-2.0 · 48200

Liger-Kernel

Efficient Triton Kernels for LLM Training

Language: Python · BSD-2-Clause · 310600

qdrant

Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Language: Rust · Apache-2.0 · 1998500

bad-word-filtering

A library for detecting and handling profanity and vulgar language. Note that the profanity and vulgar terms used for filtering may be visible.

Language: Java · MIT · 3600

KoreanBadwordDetection

A Python module for filtering Korean profanity, built without deep learning.

Language: Python · MIT · 1800

DiscordBadWordDetect

A Discord profanity-prevention system, built as a school event system

Language: TypeScript · 100

badword-filter-ko

Provides a profanity filter and a profanity word list

Language: JavaScript · MIT · 200

KoCommonGEN-V2

KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models

Language: Python · 2500

cuhnsw

CUDA implementation of Hierarchical Navigable Small World Graph algorithm

Language: Cuda · Apache-2.0 · 13800

elasticsearch-labs

Notebooks & Example Apps for Search & AI Applications with Elasticsearch

Language: Jupyter Notebook · Apache-2.0 · 61500

elasticsearch-vector-crud

Storing image and text data in a vector database using ElasticSearch

Language: Python · 100

BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.

Language: Python · MIT · 603100

raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

Language: Cuda · Apache-2.0 · 74500

rapids-examples

Language: Jupyter Notebook · 3300

sentence-transformers

State-of-the-Art Text Embeddings

Language: Python · Apache-2.0 · 1496000

KoMT-Bench

Official repository for KoMT-Bench built by LG AI Research

Language: Python · LGPL-3.0 · 4500

MindSearch

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

Language: Python · Apache-2.0 · 475200

distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Language: Python · Apache-2.0 · 145200

korean_smile_style_dataset

bicleaner-ai

Bicleaner fork that uses neural networks

Language: Python · GPL-3.0 · 3700

MINT-1T

MINT-1T: A one trillion token multimodal interleaved dataset.