李熙 (whxf)

Company: Beihang University

Location: Beijing, China

李熙's starred repositories

DisCo

This is the public repository of the EMNLP 2023 paper "DisCo: Co-training Distilled Student Models for Semi-supervised Text Mining"

Language: Python | Stargazers: 61 | Issues: 0

IDP-system

Intelligent Document Processing System

Language: Python | Stargazers: 59 | Issues: 0

BiGAE

Code Repo for EMNLP'23 paper "Bipartite Graph Pre-training for Unsupervised Extractive Summarization with Graph Convolutional Auto-Encoders"

Language: Python | Stargazers: 57 | Issues: 0

CPSUM

Code and Data Repo for COLING'22 paper "Noise-injected Consistency Training and Entropy-constrained Pseudo Labeling for Semi-supervised Extractive Summarization"

Language: Python | Stargazers: 57 | Issues: 0

bucket-based_farthest-point-sampling_CPU

A CPU implementation of bucket-based farthest point sampling that achieves a 7-81x speedup over the conventional implementation

Language: C++ | License: Apache-2.0 | Stargazers: 9 | Issues: 0

bucket-based_farthest-point-sampling_GPU

A GPU implementation of bucket-based farthest point sampling that achieves a 3-4x speedup over the conventional implementation

Language: Cuda | License: GPL-3.0 | Stargazers: 7 | Issues: 0

DecryptPrompt

A summary of Prompt & LLM papers, open-source data & models, and AIGC applications

Stargazers: 2468 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 12 | Issues: 0

awesome-sentence-embedding

A curated list of pretrained sentence and word embedding models

Language: Python | License: GPL-3.0 | Stargazers: 2207 | Issues: 0

sentence-transformers

Multilingual Sentence & Image Embeddings with BERT

Language: Python | License: Apache-2.0 | Stargazers: 14595 | Issues: 0

MatchSum

Code for ACL 2020 paper: "Extractive Summarization as Text Matching"

Language: Python | Stargazers: 519 | Issues: 0

pycorrector

pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and LLaMA to error-correction scenarios and works out of the box.

Language: Python | License: Apache-2.0 | Stargazers: 5390 | Issues: 0

lihang-code

Code implementations for the book "Statistical Learning Methods" (《统计学习方法》)

Language: Jupyter Notebook | Stargazers: 18739 | Issues: 0

ML-NLP

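This project covers the knowledge points and code implementations commonly tested in Machine Learning, Deep Learning, and NLP interviews. It is also the theoretical foundation that every algorithm engineer is expected to know.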

Language: Jupyter Notebook | Stargazers: 15560 | Issues: 0

text-classification-surveys

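A collection of text classification resources. It covers deep learning text classification models such as SpanBERT, ALBERT, RoBerta, Xlnet, MT-DNN, BERT, TextGCN, MGAN, TextCapsule, SGNN, SGM, LEAM, ULMFiT, DGCNN, ELMo, RAM, DeepMoji, IAN, DPCNN, TopicRNN, LSTMN, Multi-Task, HAN, CharCNN, Tree-LSTM, DAN, TextRCNN, Paragraph-Vec, TextCNN, DCNN, RNTN, MV-RNN, and RAE, as well as shallow learning models such as LightGBM, SVM, XGboost, Random Forest, C4.5, CART, KNN, NB, and HMM. It also introduces text classification datasets such as MR, SST, MPQA, IMDB, Yelp, 20NG, AG, R8, DBpedia, Ohsumed, SQuAD, SNLI, MNLI, MSRP, MRDA, RCV1, and AAPD; evaluation metrics such as accuracy, Precision, Recall, F1, EM, MRR, HL, Micro-F1, Macro-F1, and P@K; and technical challenges, including multi-label text classification.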

Language: Python | Stargazers: 588 | Issues: 0

Conference-Acceptance-Rate

Acceptance rates for the major AI conferences

Language: Jupyter Notebook | License: MIT | Stargazers: 4035 | Issues: 0

langdetect

Port of Google's language-detection library to Python.

Language: Python | License: NOASSERTION | Stargazers: 1690 | Issues: 0

ekphrasis

Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).

Language: Python | License: MIT | Stargazers: 661 | Issues: 0

Summarization-Papers

Summarization Papers

Language: TeX | Stargazers: 978 | Issues: 0

tilse

Timeline summarization and evaluation.

Language: Perl | License: MIT | Stargazers: 30 | Issues: 0

heideltime

A multilingual, cross-domain temporal tagger developed at the Database Systems Research Group at Heidelberg University.

Language: Java | License: GPL-3.0 | Stargazers: 341 | Issues: 0

spacy-models

💫 Models for the spaCy Natural Language Processing (NLP) library

Language: Python | Stargazers: 1571 | Issues: 0

factCC

Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

Language: Python | License: BSD-3-Clause | Stargazers: 275 | Issues: 0

pumpkin-book

Detailed formula explanations for the book "Machine Learning" (《机器学习》, the "Watermelon Book")

License: NOASSERTION | Stargazers: 23577 | Issues: 0

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Language: Python | License: NOASSERTION | Stargazers: 81043 | Issues: 0

stopwords

Commonly used Chinese stopword lists (the Harbin Institute of Technology stopword list, the Baidu stopword list, etc.)

Stargazers: 4533 | Issues: 0

README

An explanation of README file syntax, i.e., an introduction to GitHub Flavored Markdown syntax

License: Unlicense | Stargazers: 6780 | Issues: 0

COVID-19-tracker

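The research team at Beihang University's Advanced Innovation Center for Big Data collected and organized the data sources and, using natural language processing and related techniques, extracted structured information from the publicly released trajectories of 4,626 confirmed patients nationwide: basic information (gender, age, place of residence, occupation, Wuhan/Hubei contact history, etc.), trajectories (time, location, means of transport, events), and relationships between patients.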

License: MIT | Stargazers: 82 | Issues: 0

PreSumm

Code for the EMNLP 2019 paper "Text Summarization with Pretrained Encoders"

Language: Python | License: MIT | Stargazers: 1278 | Issues: 0

git-for-win

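Git for Windows. Downloading directly from the official site is difficult from within mainland China and requires circumventing the firewall. This repository provides a download mirror hosted in China for convenience.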

Stargazers: 2317 | Issues: 0