Beast code in Giters

Implementation of ChatGPT-style RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)

Language: Python · License: MIT · 534 · 0 · 0

pdf2image

A Python module that wraps the pdftoppm utility to convert PDFs to PIL Image objects

Language: Python · License: MIT · 1531 · 0 · 0

Text-Attention-Heatmap-Visualization

Plots attention-based text visualisations as vector graphics

Language: Python · 362 · 0 · 0

deep-significance

Enabling easy statistical significance testing for deep neural networks.

Language: Python · License: GPL-3.0 · 321 · 0 · 0

nlpcda

One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda

Language: Python · License: Apache-2.0 · 1727 · 0 · 0

annotated_deep_learning_paper_implementations

🧑‍🏫 60 implementations/tutorials of deep learning papers with side-by-side notes 📝, including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠

Language: Python · License: MIT · 52172 · 0 · 0

lime

Lime: Explaining the predictions of any machine learning classifier

Language: JavaScript · License: BSD-2-Clause · 11431 · 0 · 0

dygiepp

Span-based system for named entity, relation, and event extraction.

Language: Python · License: MIT · 565 · 0 · 0

ckipnlp

CKIP CoreNLP Toolkits

Language: Python · License: GPL-3.0 · 115 · 0 · 0

speech-nlp-datasets

Contains links to publicly available datasets for modeling health outcomes using speech and language.

108 · 0 · 0

lstm-crf-pytorch

LSTM-CRF in PyTorch

Language: Python · 456 · 0 · 0

CPT

CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

Language: Python · 475 · 0 · 0

unilm-pytorch

PyTorch version of the UniLM model

Language: Python · License: MIT · 25 · 0 · 0

Unilm

Language: Python · 433 · 0 · 0

Discourse-Chinese-Discourse-Parser-ACL2020

code

2 · 0 · 0

Lifelog-VidLife

VidLife contains personal life events in triple form from The Big Bang Theory, e.g. (Leonard, visit, Penny), and is designed for training and evaluating personal life event extraction systems. Part of the life event annotations is released in this repository and can be downloaded; the complete dataset will be made available online after our paper is accepted.

2 · 0 · 0
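A personal knowledge base of the kind these lifelog datasets target can be sketched by grouping (subject, predicate, object) event triples by subject. This is a minimal illustration under stated assumptions, not the authors' code: the names `LifeEvent`, `build_pkb`, and `query` are hypothetical, no file format for the released annotations is assumed, and only the (Leonard, visit, Penny) triple comes from the description above.

```python
from collections import defaultdict
from typing import NamedTuple


class LifeEvent(NamedTuple):
    """One life event as a (subject, predicate, object) triple (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


def build_pkb(events):
    """Group triples by subject, giving one personal knowledge base per person."""
    pkb = defaultdict(list)
    for event in events:
        pkb[event.subject].append(event)
    return dict(pkb)


def query(pkb, person, predicate):
    """Return the objects of every `predicate` event recorded for `person`."""
    return [e.obj for e in pkb.get(person, []) if e.predicate == predicate]


events = [
    LifeEvent("Leonard", "visit", "Penny"),  # example triple from the description
    LifeEvent("Penny", "visit", "Leonard"),  # hypothetical, for illustration only
]
pkb = build_pkb(events)
print(query(pkb, "Leonard", "visit"))  # ['Penny']
```

Keying the knowledge base by subject makes "what did this person do?" recall queries a single dictionary lookup followed by a filter on the predicate.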

Lifelog-LiveKB

People often forget things in daily life, so providing information recall support at the right time and place is an emerging need. Constructing a personal knowledge base for each individual is important for memory recall and living-assistance applications. We collected tweets from 18 users whose tweets are public, posted between 2009 and 2017. We aim to extract life events from these tweets and construct a personal knowledge base for each individual.

2 · 0 · 0

Lifelog-PKBQAC-Dataset

A Dataset for Personal Knowledge Base Question Answering and Unanswerable Question Correction

3 · 0 · 0

Lifelog-VisLife

People increasingly record their daily lives by filming video blogs (vlogs), which contain visual and audio data. Such large-scale multimodal data can support an information recall service that lets users query their past experiences. To this end, we construct a visual lifelogging dataset for investigating personal life event extraction from vlogs shared on YouTube and constructing a personal knowledge base (PKB) for each individual. The dataset contains 1,733 videos from three selected YouTubers, posted between 2016 and 2019; all of the crawled videos are about traveling.

2 · 0 · 0

Finance-NumClaim

Numerals provide important information in financial narratives. Our statistics on financial analysis reports show that 58.47% of sentences contain at least one numeral. Without numerals, much of the fine-grained information in these reports would be lost, which shows how important numerals are in financial narratives. Based on our observation, investors always make a claim with an estimate, and this estimate can serve as a cue for detecting the investor's fine-grained claim. We therefore propose NumClaim, an expert-annotated dataset for probing argument mining in financial narratives. Among the 5,144 numeral-containing instances in NumClaim, 23.78% are annotated as "In-claim" and 76.22% as "Out-of-claim".

1 · 0 · 0

Finance-ICRD

ICRD contains two tasks. The data are split into three parts: Train, Dev, and Test.

(1) Premise Detection. This task identifies whether a given sentence is a premise. Each instance has two keys: "sentence" is the given sentence, and "ans" is 1 if the sentence is a premise and 0 if it is not.

(2) Claim-Premise Inference. Given a claim and a sentence, models must predict whether the sentence is a premise of the claim. Each instance has three keys: "claim" is the given claim, "compare_sent" is the other given sentence, and "ans" is 1 if that sentence is a premise of the claim and 0 if it is not.

1 · 0 · 0
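The two ICRD instance formats described above can be checked and scored with a short Python sketch. It works on already-loaded lists of dicts because the on-disk file format is not stated here; the task labels `premise_detection` and `claim_premise_inference` and the helpers `validate` and `accuracy` are hypothetical, and the example instances are placeholders rather than ICRD data.

```python
from typing import Dict, List

# Keys described for each ICRD task; "ans" is 1 (premise) or 0 (not a premise).
# The task names used as dict keys here are hypothetical labels.
TASK_KEYS = {
    "premise_detection": {"sentence", "ans"},
    "claim_premise_inference": {"claim", "compare_sent", "ans"},
}


def validate(instances: List[Dict], task: str) -> None:
    """Raise ValueError if an instance lacks a required key or has a non-binary label."""
    required = TASK_KEYS[task]
    for i, inst in enumerate(instances):
        missing = required - set(inst)
        if missing:
            raise ValueError(f"instance {i} is missing keys: {sorted(missing)}")
        if inst["ans"] not in (0, 1):
            raise ValueError(f"instance {i} has a non-binary 'ans': {inst['ans']!r}")


def accuracy(instances: List[Dict], predictions: List[int]) -> float:
    """Fraction of predictions equal to the gold "ans" label."""
    if not instances or len(instances) != len(predictions):
        raise ValueError("need one prediction per instance and at least one instance")
    correct = sum(int(inst["ans"] == pred) for inst, pred in zip(instances, predictions))
    return correct / len(instances)


# Placeholder instances for illustration only; they are not taken from ICRD.
dev = [
    {"claim": "claim text", "compare_sent": "candidate premise", "ans": 1},
    {"claim": "claim text", "compare_sent": "unrelated sentence", "ans": 0},
]
validate(dev, "claim_premise_inference")
print(accuracy(dev, [1, 1]))  # 0.5
```

Both tasks share the binary "ans" label, so the same accuracy function scores either one; only the required keys differ.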