李熙's starred repositories
IDP-system
Intelligent Document Processing System
bucket-based_farthest-point-sampling_CPU
The CPU implementation of bucket-based farthest point sampling, achieving a 7-81x speedup over the conventional implementation
bucket-based_farthest-point-sampling_GPU
The GPU implementation of bucket-based farthest point sampling, achieving a 3-4x speedup over the conventional implementation
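For context on what these two repositories accelerate: conventional farthest point sampling greedily picks k points from a point cloud, each time choosing the point whose minimum distance to the already-selected set is largest, which costs O(n·k). The bucket-based variants above speed up this baseline; the sketch below is only the conventional algorithm (function name and NumPy implementation are my own, not taken from either repository).

```python
import numpy as np

def farthest_point_sampling(points, k, start=0):
    """Conventional O(n*k) farthest point sampling.

    points: (n, d) array of coordinates
    k:      number of points to select
    start:  index of the initial seed point
    """
    n = points.shape[0]
    selected = [start]
    # Minimum squared distance from every point to the selected set.
    min_d2 = np.sum((points - points[start]) ** 2, axis=1)
    for _ in range(k - 1):
        # Greedy step: take the point farthest from the current set.
        nxt = int(np.argmax(min_d2))
        selected.append(nxt)
        # Update each point's distance to the nearest selected point.
        d2 = np.sum((points - points[nxt]) ** 2, axis=1)
        min_d2 = np.minimum(min_d2, d2)
    return np.array(selected)
```

The inner distance update is what the bucketed versions optimize: by spatially partitioning points into buckets, most distance recomputations can be skipped, which is where the reported speedups come from.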
DecryptPrompt
A summary of Prompt & LLM papers, open-source data & models, and AIGC applications
awesome-sentence-embedding
A curated list of pretrained sentence and word embedding models
sentence-transformers
Multilingual Sentence & Image Embeddings with BERT
pycorrector
pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and LLaMA to error-correction scenarios, ready to use out of the box.
lihang-code
Code implementations for the book *Statistical Learning Methods* (《统计学习方法》)
text-classification-surveys
A collection of text classification resources, covering: deep learning models such as SpanBERT, ALBERT, RoBERTa, XLNet, MT-DNN, BERT, TextGCN, MGAN, TextCapsule, SGNN, SGM, LEAM, ULMFiT, DGCNN, ELMo, RAM, DeepMoji, IAN, DPCNN, TopicRNN, LSTMN, Multi-Task, HAN, CharCNN, Tree-LSTM, DAN, TextRCNN, Paragraph-Vec, TextCNN, DCNN, RNTN, MV-RNN, and RAE; shallow learning models such as LightGBM, SVM, XGBoost, Random Forest, C4.5, CART, KNN, NB, and HMM; text classification datasets such as MR, SST, MPQA, IMDB, Yelp, 20NG, AG, R8, DBpedia, Ohsumed, SQuAD, SNLI, MNLI, MSRP, MRDA, RCV1, and AAPD; evaluation metrics such as accuracy, Precision, Recall, F1, EM, MRR, HL, Micro-F1, Macro-F1, and P@K; and technical challenges, including multi-label text classification.
Conference-Acceptance-Rate
Acceptance rates for the major AI conferences
langdetect
Port of Google's language-detection library to Python.
ekphrasis
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets).
Summarization-Papers
Summarization Papers
heideltime
A multilingual, cross-domain temporal tagger developed at the Database Systems Research Group at Heidelberg University.
spacy-models
💫 Models for the spaCy Natural Language Processing (NLP) library
pumpkin-book
Detailed derivations of the formulas in the book *Machine Learning* (the "Watermelon Book", 《机器学习》)
COVID-19-tracker
The big-data research team at Beihang University organized and collected data sources, and used natural language processing and related techniques to extract structured information from the publicly released trajectories of 4,626 confirmed patients nationwide: basic information (gender, age, place of residence, occupation, Wuhan/Hubei contact history, etc.), trajectories (time, location, means of transport, events), and patient relationships.
git-for-win
Git for Windows. Downloading directly from the official site is difficult within mainland China, usually requiring a VPN; this repository provides a domestic mirror for convenient downloads.