Li-Kuang Chen's starred repositories

angular

Deliver web apps with confidence 🚀

Language: TypeScript | License: MIT | Stargazers: 95707 | Issues: 3015 | Issues: 28005

typst

A new markup-based typesetting system that is powerful and easy to learn.

Language: Rust | License: Apache-2.0 | Stargazers: 32748 | Issues: 85 | Issues: 2430

analytics

Simple, open source, lightweight (< 1 KB) and privacy-friendly web analytics alternative to Google Analytics.

Language: Elixir | License: AGPL-3.0 | Stargazers: 19744 | Issues: 135 | Issues: 692

pyo3

Rust bindings for the Python interpreter

Language: Rust | License: Apache-2.0 | Stargazers: 11983 | Issues: 98 | Issues: 1344

Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)

Language: Python | License: Apache-2.0 | Stargazers: 9564 | Issues: 142 | Issues: 240

owncast

Take control over your live stream video by running it yourself. Streaming + chat out of the box.

axolotl

Go ahead and axolotl questions

Language: Python | License: Apache-2.0 | Stargazers: 6856 | Issues: 50 | Issues: 597

pipreqs

pipreqs - Generate pip requirements.txt file based on imports of any project. Looking for maintainers to move this project forward.

Language: Python | License: Apache-2.0 | Stargazers: 6111 | Issues: 55 | Issues: 275

presidio

Context aware, pluggable and customizable data protection and de-identification SDK for text and images

Language: Python | License: MIT | Stargazers: 3636 | Issues: 67 | Issues: 406

ImageOptim-CLI

Make optimisation of images part of your automated build process

Language: TypeScript | License: MIT | Stargazers: 3449 | Issues: 51 | Issues: 171

roberta_zh

RoBERTa Chinese pre-trained models: RoBERTa for Chinese

data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Language: Python | License: Apache-2.0 | Stargazers: 2552 | Issues: 18 | Issues: 174

hunspell

The most popular spellchecking library.

Language: C++ | License: LGPL-2.1 | Stargazers: 2101 | Issues: 56 | Issues: 709

datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Language: Python | License: Apache-2.0 | Stargazers: 1939 | Issues: 44 | Issues: 120

llm-guard

The Security Toolkit for LLM Interactions

Language: Python | License: MIT | Stargazers: 1129 | Issues: 18 | Issues: 59

nanotron

Minimalistic large language model 3D-parallelism training

Language: Python | License: Apache-2.0 | Stargazers: 1115 | Issues: 40 | Issues: 75

TransformerLens

A library for mechanistic interpretability of GPT-style language models

Language: Python | License: MIT | Stargazers: 920 | Issues: 13 | Issues: 192

awesome-instruction-learning

Papers and Datasets on Instruction Tuning and Following. ✨✨✨

Language: Python | License: MIT | Stargazers: 449 | Issues: 7 | Issues: 0

csvdedupe

🆔 Command line tool for deduplicating CSV files

Language: Python | License: NOASSERTION | Stargazers: 410 | Issues: 15 | Issues: 81
Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 323 | Issues: 29 | Issues: 12
Language: Python | License: Apache-2.0 | Stargazers: 310 | Issues: 22 | Issues: 2

piicatcher

Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub

Language: Python | License: Apache-2.0 | Stargazers: 273 | Issues: 13 | Issues: 102

MR-Models

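MediaTek Research is dedicated to the study of foundation models. We put that research into models suited to Traditional Chinese users and, where licensing permits, make the models available for academic research or industry use.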

Language: Python | License: NOASSERTION | Stargazers: 156 | Issues: 5 | Issues: 7

conversationai-models

A repository to house model building experiments and tools that are part of the Conversation AI effort.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 137 | Issues: 15 | Issues: 53

pandas_dq

Find data quality issues and clean your data in a single line of code with a Scikit-Learn compatible Transformer.

Language: Python | License: Apache-2.0 | Stargazers: 125 | Issues: 5 | Issues: 4

c4-dataset-script

Inspired by Google's C4, a series of data cleaning scripts focused on CommonCrawl data processing, including the Chinese data processing and cleaning methods from MassiveText.

Language: Python | License: MIT | Stargazers: 118 | Issues: 5 | Issues: 0

features-across-time

Understanding how features learned by neural networks evolve throughout training

Language: Python | License: MIT | Stargazers: 30 | Issues: 4 | Issues: 0

edos

Public repository for SemEval 2023 - Task 10 - Explainable Detection of Online Sexism (EDOS)

Language: Python | License: CC0-1.0 | Stargazers: 18 | Issues: 1 | Issues: 3

perspectiveapi-proxy

Example code for an authenticated proxy for requests to the Perspective API

Language: TypeScript | License: Apache-2.0 | Stargazers: 6 | Issues: 7 | Issues: 0