Zae Myung Kim's repositories
streamlit-tutorial
A simple tutorial script on Streamlit using the Iris Dataset
Visualizing-Cross-Lingual-Discourse-Relations
Code for the paper "Visualizing Cross-Lingual Discourse Relations in Multilingual TED Corpora" (CODI 2021 @ EMNLP 2021)
crawl-naver-news-and-comments
Crawls the most-read news articles for each day over the years, along with their comments
bertviz
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)
Cornell-Conversational-Analysis-Toolkit
ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
Creative-Commons-Markdown
Markdown-formatted Creative Commons licenses
Discourse-Phenomena-in-Document-level-Neural-Machine-Translation
Datasets for "A Test Suite for Evaluating Discourse Phenomena in Document-level Neural Machine Translation", accepted for the Proceedings of the Second International Workshop of Discourse Processing
DMRST_Parser
An implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing".
good-translation-wrong-in-context
This is a repository with the data and code for the ACL 2019 paper "When a Good Translation is Wrong in Context: ..." and the EMNLP 2019 paper "Context-Aware Monolingual Repair for Neural Machine Translation"
google-research
Google Research
kmeans_pytorch
K-means clustering using PyTorch
korean_wordlist
Korean word list
Pytorch-Sequence-Bucket-Iterator
A minimal sampler example for bucketing sequences of similar lengths in PyTorch, based on @TrentBrick's script: https://gist.github.com/TrentBrick/bac21af244e7c772dc8651ab9c58328c
Shallow-Discourse-Annotation-for-Chinese-TED-Talks
Datasets for "Shallow Discourse Annotation for Chinese TED Talks", accepted at LREC 2020
st-annotated-text
A simple component to display annotated text in Streamlit apps.
transformer-lm
Transformer language model (GPT-2) with sentencepiece tokenizer
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
weightedWWL
Learning subtree pattern importance for WL-based graph kernels
zaemyung.github.io
A beautiful, simple, clean, and responsive Jekyll theme for academics