ChenghaoMou

Chenghao Mou's repositories

text-dedup

All-in-one text de-duplication

Language: Python | Apache-2.0 | 553 4 57

touchbar-lyric

Show synced lyrics in the Touch Bar with BetterTouchTool and NetEase APIs

Language: Python | 81 2 23

pytorch-pQRNN

Implementation of pQRNN in PyTorch

Language: Python | MIT | 45 3 7

embeddings

zero-vocab or low-vocab embeddings

Language: Python | MIT | 16 4 2

awesome-data-deduplication

An awesome list of data deduplication use cases, papers, tools, and methods.

Language: Python | MIT | 3 1 2

chenghaomou.github.io

Personal Blog

Language: HTML | NOASSERTION | 3 30

deduplicate-text-datasets

A modified version of Google's tool for plain text files

Language: Rust | Apache-2.0 | 3 20

karafuru

Traditional Chinese colors in your terminal

Language: Python | MIT | 2 10

simhash

Simhash in C++

Language: C++ | MIT | 2 20

lightning-grid-template

A minimal template for pytorch-lightning and grid.ai

Language: Python | MIT | 1 20

mini-vae

Minimal GMM VAE model for NLP

Language: Python | MIT | 1 20

ai.robots.txt

A list of AI agents and robots to block.

MIT | 000

awesome-nlp

📖 A curated list of resources dedicated to Natural Language Processing (NLP)

CC0-1.0 | 010

bender-ruler

Bender Rule analysis for NLP papers

Language: Python | MIT | 020

bigcode-analysis

Repository for analysis notebooks and experiments of the BigCode project.

Language: Jupyter Notebook | Apache-2.0 | 010

bigcode-dataset

Language: Jupyter Notebook | Apache-2.0 | 000

blog

Public repo for HF blog posts

Language: Jupyter Notebook | 000

chenghaomou

020

closedapi

Tired of seeing not-so-open apis behind paywalls.

Language: Python | Apache-2.0 | 010

data_tooling

Tools for managing datasets for governance and training.

Language: HTML | Apache-2.0 | 010

edgar-crawler

SEC EDGAR Exhibit Downloader

Language: Python | GPL-3.0 | 000

file-explorer-markdown-titles

Obsidian plugin that adds the Markdown title within your notes to the file explorer

Language: TypeScript | GPL-3.0 | 010

go-wordninja

Probabilistically split concatenated words using NLP based on English Wikipedia unigram frequencies.

Language: Go | MIT | 010

open-source-mac-os-apps

🚀 Awesome list of open source applications for macOS. https://t.me/s/opensourcemacosapps

Language: Swift | CC0-1.0 | 010

paper2audio

Convert research papers to audio files.

Language: Python | MIT | 000

presidio

Context aware, pluggable and customizable data protection and de-identification SDK for text and images

Language: Python | MIT | 000

pytorch-dice-loss

Dice loss for data-imbalanced NLP tasks

Language: Python | 020

quartz

🌱 a fast, batteries-included static-site generator that transforms Markdown content into fully functional websites

Language: TypeScript | MIT | 000

star-classification

A tool for the projects you starred on GitHub

Language: Python | MIT | 020

table-transformer-doclaynet

Table Transformer Fine-tuned with DocLayNet Dataset

Apache-2.0 | 020