Dongjun Lee (DongjunLee)



Company: @lbox-kr

Location: South Korea

Home Page: https://dongjunlee.github.io/



Organizations
hb-research
KLUE-benchmark
lbox-kr
naver

Dongjun Lee's starred repositories

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Language: Python | License: MIT | Stargazers: 66653 | Issues: 559 | Issues: 0

pyscript

Try PyScript: https://pyscript.com Examples: https://tinyurl.com/pyscript-examples Community: https://discord.gg/HxvBtukrg2

Language: Python | License: Apache-2.0 | Stargazers: 17794 | Issues: 171 | Issues: 783

presto

The official home of the Presto distributed SQL query engine for big data

Language: Java | License: Apache-2.0 | Stargazers: 15862 | Issues: 861 | Issues: 6531

flash-attention

Fast and memory-efficient exact attention

Language: Python | License: BSD-3-Clause | Stargazers: 13119 | Issues: 115 | Issues: 987

trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Language: Java | License: Apache-2.0 | Stargazers: 10093 | Issues: 172 | Issues: 6546

datahub

The Metadata Platform for your Data Stack

Language: Java | License: Apache-2.0 | Stargazers: 9601 | Issues: 254 | Issues: 2133

fastapi-best-practices

FastAPI Best Practices and Conventions we used at our startup

keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes

Language: Go | License: Apache-2.0 | Stargazers: 8222 | Issues: 92 | Issues: 2228

alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

Language: Java | License: Apache-2.0 | Stargazers: 6783 | Issues: 442 | Issues: 2199

obsidian-dataview

A data index and query language over Markdown files, for https://obsidian.md/.

Language: TypeScript | License: MIT | Stargazers: 6768 | Issues: 40 | Issues: 1321

nbdev

Create delightful software with Jupyter Notebooks

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 4860 | Issues: 47 | Issues: 875

nvitop

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Language: Python | License: Apache-2.0 | Stargazers: 4447 | Issues: 26 | Issues: 83

quarto-cli

Open-source scientific and technical publishing system built on Pandoc.

Language: JavaScript | License: NOASSERTION | Stargazers: 3701 | Issues: 27 | Issues: 4791

docquery

An easy way to extract information from documents

Language: Python | License: MIT | Stargazers: 1689 | Issues: 24 | Issues: 46

pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

Language: Python | License: Apache-2.0 | Stargazers: 1610 | Issues: 18 | Issues: 540

ktlint-gradle

A ktlint gradle plugin

Language: Kotlin | License: MIT | Stargazers: 1444 | Issues: 16 | Issues: 394

coyo-dataset

COYO-700M: Large-scale Image-Text Pair Dataset

splade

SPLADE: sparse neural search (SIGIR21, SIGIR22)

Language: Python | License: NOASSERTION | Stargazers: 734 | Issues: 20 | Issues: 50

polyglot

Polyglot: Large Language Models of Well-balanced Competence in Multi-languages

tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.

Language: Python | License: Apache-2.0 | Stargazers: 452 | Issues: 9 | Issues: 93

DiffCSE

Code for the NAACL 2022 long paper "DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings"

Language: Python | License: MIT | Stargazers: 290 | Issues: 4 | Issues: 21

floret

🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy

Language: C++ | License: MIT | Stargazers: 278 | Issues: 5 | Issues: 3

TLM

ICML'2022: NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework

Language: Python | License: MIT | Stargazers: 255 | Issues: 5 | Issues: 19

CLIP-Caption-Reward

PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)

Language: Python | License: NOASSERTION | Stargazers: 233 | Issues: 5 | Issues: 12

Language: Python | License: Apache-2.0 | Stargazers: 191 | Issues: 2 | Issues: 16

COIL

NAACL2021 - COIL Contextualized Lexical Retriever

Language: Python | License: Apache-2.0 | Stargazers: 145 | Issues: 2 | Issues: 21

elasticsearch-jaso-analyzer

Korean Jaso Analyzer for Elasticsearch

Language: Java | License: MIT | Stargazers: 75 | Issues: 7 | Issues: 12

carecall-corpus

CareCall for Seniors: Role Specified Open-Domain Dialogue dataset generated by leveraging LLMs (NAACL 2022).

License: NOASSERTION | Stargazers: 59 | Issues: 3 | Issues: 0