Yohan Na (nayohan)

Company: LG Uplus CTO

Location: Seoul, South Korea

Home Page: huggingface.co/nayohan

Yohan Na's starred repositories

Awesome-AI-Data-GitHub-Repos

A collection of the most important Github repos for ML, AI & Data science practitioners

License: MIT | Stargazers: 750 | Issues: 0

Google_SCoRe

Paper reproduction of Google SCoRe (Training Language Models to Self-Correct via Reinforcement Learning)

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 63 | Issues: 0

awesome-production-llm

A curated list of awesome open-source libraries for production LLM

License: MIT | Stargazers: 322 | Issues: 0

trl

Train transformer language models with reinforcement learning.

Language: Python | License: Apache-2.0 | Stargazers: 9572 | Issues: 0

Megatron-LM

Ongoing research training transformer models at scale

Language: Python | License: NOASSERTION | Stargazers: 10144 | Issues: 0

loft

LOFT: A 1 Million+ Token Long-Context Benchmark

Language: Python | License: Apache-2.0 | Stargazers: 132 | Issues: 0

Superfiltering

[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning

Language: Python | Stargazers: 105 | Issues: 0

datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Language: Python | License: Apache-2.0 | Stargazers: 1971 | Issues: 0

S-Eval

S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models

License: NOASSERTION | Stargazers: 32 | Issues: 0

NeMo-Curator

Scalable data pre-processing and curation toolkit for LLMs

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 482 | Issues: 0

Liger-Kernel

Efficient Triton Kernels for LLM Training

Language: Python | License: BSD-2-Clause | Stargazers: 3106 | Issues: 0

qdrant

Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Language: Rust | License: Apache-2.0 | Stargazers: 19985 | Issues: 0

bad-word-filtering

A library for detecting and handling profanity and vulgar language. Note that the profanity and vulgar terms used for filtering may be visible.

Language: Java | License: MIT | Stargazers: 36 | Issues: 0

KoreanBadwordDetection

A Python module for filtering Korean profanity, built without deep learning.

Language: Python | License: MIT | Stargazers: 18 | Issues: 0

DiscordBadWordDetect

A Discord profanity-prevention system, built as a school event system.

Language: TypeScript | Stargazers: 1 | Issues: 0

badword-filter-ko

Provides a profanity filter and a profanity word list.

Language: JavaScript | License: MIT | Stargazers: 2 | Issues: 0

KoCommonGEN-V2

KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models

Language: Python | Stargazers: 25 | Issues: 0

cuhnsw

CUDA implementation of Hierarchical Navigable Small World Graph algorithm

Language: Cuda | License: Apache-2.0 | Stargazers: 138 | Issues: 0

elasticsearch-labs

Notebooks & Example Apps for Search & AI Applications with Elasticsearch

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 615 | Issues: 0

elasticsearch-vector-crud

Storing image and text data in a vector database using ElasticSearch

Language: Python | Stargazers: 1 | Issues: 0

BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.

Language: Python | License: MIT | Stargazers: 6031 | Issues: 0

raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

Language: Cuda | License: Apache-2.0 | Stargazers: 745 | Issues: 0
Language: Jupyter Notebook | Stargazers: 33 | Issues: 0

sentence-transformers

State-of-the-Art Text Embeddings

Language: Python | License: Apache-2.0 | Stargazers: 14960 | Issues: 0

KoMT-Bench

Official repository for KoMT-Bench built by LG AI Research

Language: Python | License: LGPL-3.0 | Stargazers: 45 | Issues: 0

MindSearch

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

Language: Python | License: Apache-2.0 | Stargazers: 4752 | Issues: 0

distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Language: Python | License: Apache-2.0 | Stargazers: 1452 | Issues: 0

bicleaner-ai

Bicleaner fork that uses neural networks

Language: Python | License: GPL-3.0 | Stargazers: 37 | Issues: 0

MINT-1T

MINT-1T: A one trillion token multimodal interleaved dataset.

Stargazers: 738 | Issues: 0