Hyunjoong Kim's repositories
KR-WordRank
A library that automatically extracts words/keywords from Korean text using unsupervised learning methods
KoBERTScore
BERTScore for Korean
huggingface_konlpy
Training Hugging Face Transformers with KoNLPy
WordPieceModel
Lightweight Python implementation of the WordPiece model with tokenize/save/load functions
namuwikitext
Wikitext-format dataset of Namuwiki (a popular Korean wiki)
naver_news_search_scraper
Python code that collects Naver News articles and comments by search query
soykeyword
Python library for keyword extraction
clustering4docs
Clustering algorithm library; implements spherical k-means
naver_movie_scraper
Scraper for Naver Movie information and user-written reviews/ratings
levenshtein_finder
Similar-string search using Levenshtein distance
python_ml_intro
Practice code for the Fast Campus course "Introduction to Machine Learning with Python"
synthetic_dataset
Synthetic data generator for machine learning
petitions_archive
Archive of Blue House (Cheong Wa Dae) national petition data
pycrfsuite_spacing
Korean word-spacing corrector using python-crfsuite
kmeans_to_pyLDAvis
Visualizing k-means using pyLDAvis
flask_api_tutorial
Tutorial for building an API with Flask
text-dedup
Python package for memory-friendly text de-duplication
python_upload_webserver
Flask- and Waitress-based file upload web server
python-stopwatch
Python stopwatch
tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0
kwnlp-sql-parser
Utilities for parsing Wikipedia MySQL/MariaDB dumps.
parallelformers
Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
wikiextractor
A tool for extracting plain text from Wikipedia dumps