Hyunjoong Kim's repositories
KR-WordRank
A library that automatically extracts words/keywords from Korean text using unsupervised learning methods
KoBERTScore
BERTScore for Korean
huggingface_konlpy
Training Hugging Face Transformers with KoNLPy
WordPieceModel
Lightweight Python implementation of the WordPiece model with tokenize/save/load functions
namuwikitext
Wikitext-format dataset of Namuwiki (a popular Korean wiki)
naver_news_search_scraper
Python code that collects Naver News articles and comments by search query
soykeyword
Python library for keyword extraction
clustering4docs
Clustering algorithm library; implements spherical k-means
naver_movie_scraper
Scraper for Naver Movie information and user-written reviews/ratings
levenshtein_finder
Similar-string search using Levenshtein distance
python_ml_intro
Practice code for the Fast Campus course "Introduction to Machine Learning with Python"
synthetic_dataset
Synthetic data generator for machine learning
petitions_archive
Archive of Blue House (Cheong Wa Dae) national petition data
pycrfsuite_spacing
Korean word-spacing corrector using python-crfsuite
kmeans_to_pyLDAvis
Visualizing k-means using pyLDAvis
flask_api_tutorial
Tutorial for building an API with Flask
text-dedup
Python package for memory-friendly text de-duplication
python_upload_webserver
Flask- and Waitress-based file upload web server
python-stopwatch
Python stopwatch
tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0
kwnlp-sql-parser
Utilities for parsing Wikipedia MySQL/MariaDB dumps.
parallelformers
Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
wikiextractor
A tool for extracting plain text from Wikipedia dumps