Kyoungrok Jang's repositories
Wikidata_relation_extractor
Code to extract KB triples from given contexts using the Wikidata API
art
Code and models for the paper "Questions Are All You Need to Train a Dense Passage Retriever" (TACL 2023)
bashidioms-examples
Example code from O'Reilly's bash Idioms
BERT-QPP
BERT-QPP: Contextualized Pre-trained transformers for Query Performance Prediction
BERTopic
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
big-ann-benchmarks
Framework for evaluating ANNS algorithms on billion scale datasets.
charset_normalizer
Truly universal encoding detector in pure Python
CircuitsVis
Mechanistic Interpretability Visualizations using React
dropout
Code release for "Dropout Reduces Underfitting"
einops
Deep learning operations reinvented (for pytorch, tensorflow, jax and others)
gpt3-blog-title-optimizer
Python code for building a GPT-3 based technical blog post optimizer.
imagen-pytorch
Implementation of Imagen, Google's Text-to-Image Neural Network, in PyTorch
lightning-transformers
Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.
nanoColBERT
Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832).
natural-questions
Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is designed for the training and evaluation of automatic question answering systems.
pmi-masking
This repository includes the masking vocabulary used in the ICLR 2021 spotlight PMI-Masking paper
SAELens
Training Sparse Autoencoders on Language Models
semantic-python-overview
(Subjective) overview of projects related to both Python and semantic technologies (RDF, OWL, Reasoning, ...)
splade
SPLADE: sparse neural search (SIGIR21, SIGIR22)
Streamlit-Authenticator
A secure authentication module to validate user credentials in a Streamlit application.
Style-Transformer-for-MSD
Expert-layman style transfer model based on the Style Transformer (Dai et al., 2019). The code uses the MSD dataset (Cao et al., 2020), which targets style transfer between expert-level language and layman language for ease of communication.
subspace-clustering
Toolbox for large scale subspace clustering
the-art-of-command-line
Master the command line, in one page