NTU NLP Lab (ntunlplab)

Natural Language Processing Laboratory, National Taiwan University

Location: Taiwan

Home Page: http://nlg.csie.ntu.edu.tw

NTU NLP Lab's repositories

traditional-chinese-alpaca

A Traditional-Chinese instruction-following model with datasets based on Alpaca.

Language: Python, License: Apache-2.0, Stargazers: 129, Issues: 5, Issues: 3

LifeEventDialog

Life Event Dialog contains fine-grained personal life event annotations on DailyDialog.

Language: Python, Stargazers: 5, Issues: 1, Issues: 0

Dialogue-MPDD

A dialogue dataset is an indispensable resource for building a dialogue system. Additional information such as emotions and interpersonal relationships labeled on conversations enables the system to capture the emotion flow of the participants in the dialogue. However, there is no publicly available Chinese dialogue dataset with emotion and relation labels. In this paper, we collect conversations from TV series scripts and annotate each utterance with emotion and interpersonal relationship labels. The dataset contains 25,548 utterances from 4,142 dialogues. We also set up experiments to observe the effects of the responded utterance on the current utterance, and the correlation between emotion and relation types, in emotion and relation classification tasks.

Stargazers: 4, Issues: 0, Issues: 0

Finance-FinProLex

FinProLex provides 5,162 tokens drawn from professional analysts' reports and posts on a financial social media platform, each with an expert-like score. The expert-like scores are calculated from pointwise mutual information (PMI).

Stargazers: 4, Issues: 0, Issues: 0

Finance-NTUSD-Fin

NTUSD-Fin provides several scores for each token: frequency, CFIDF, chi-squared value, market sentiment score, and word vector. Only tokens that appear at least ten times and show a significant difference between expected and observed frequency under a chi-squared test are retained in the dictionary. The predetermined significance level is 0.05. The market sentiment score is calculated by subtracting the bearish PMI from the bullish PMI. The constructed dictionary, NTUSD-Fin, contains 8,331 words, 112 hashtags and 115 emojis.
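
The market sentiment score described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the `(tokens, label)` input format, and the choice to count co-occurrence per post are all assumptions made for illustration, and the chi-squared significance filter is left out. Only the ten-occurrence threshold and the "bullish PMI minus bearish PMI" definition come from the description.

```python
import math
from collections import Counter


def pmi(joint_count, token_count, label_count, total):
    """Pointwise mutual information: log2(P(token, label) / (P(token) * P(label)))."""
    p_joint = joint_count / total
    p_token = token_count / total
    p_label = label_count / total
    return math.log2(p_joint / (p_token * p_label))


def market_sentiment_scores(posts, min_count=10):
    """Score tokens as bullish PMI minus bearish PMI.

    posts: iterable of (tokens, label) pairs, label in {"bullish", "bearish"}
    (a hypothetical input format). Probabilities are estimated per post.
    """
    token_label = Counter()  # posts containing a token, split by label
    token_total = Counter()  # posts containing a token
    label_total = Counter()  # posts per label
    for tokens, label in posts:
        label_total[label] += 1
        for tok in set(tokens):  # count each token once per post
            token_label[(tok, label)] += 1
            token_total[tok] += 1

    n = sum(label_total.values())
    scores = {}
    for tok, count in token_total.items():
        if count < min_count:
            continue  # frequency threshold from the description
        bull = token_label[(tok, "bullish")]
        bear = token_label[(tok, "bearish")]
        if bull == 0 or bear == 0:
            continue  # PMI is undefined when a token never co-occurs with a label
        bullish_pmi = pmi(bull, count, label_total["bullish"], n)
        bearish_pmi = pmi(bear, count, label_total["bearish"], n)
        scores[tok] = bullish_pmi - bearish_pmi
    return scores
```

Under these assumptions, a positive score means the token appears relatively more often in bullish posts and a negative score means it leans bearish. Tokens seen with only one label are skipped here; how the actual dictionary handles that case is not stated in the description.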

Chinese-Word-Ordering-Errors-Detection-and-Correction-Corpus

Word Ordering Errors (WOEs) are the most frequent type of sentence-level grammatical error made by non-native learners of Chinese. Learners of Chinese as a foreign language often place characters in the wrong positions in a sentence, which results in incorrect words or ungrammatical sentences. In addition, Chinese sentences have no clear word boundaries.

Stargazers: 3, Issues: 0, Issues: 0

Finance-FinNum

Numerals are a crucial part of financial documents. To understand the details of opinions in financial documents, we should not only analyze the text but also examine the numeric information in depth. Because of its informal writing style, social media data is more challenging to analyze than news and official documents. FinNum is a dataset for fine-grained numeral understanding in financial social media data, where the task is to identify the category of a numeral.

Stargazers: 3, Issues: 0, Issues: 0

Finance-Numeracy-600K

Numerals are a crucial part of narratives, especially in financial documents. We should not only analyze the text but also examine the numeric information in depth. Numeracy-600K is a dataset for testing the numeracy of machines.

Stargazers: 3, Issues: 0, Issues: 0

NL2KB

A total of 7,139 Chinese relation patterns that cover 1,087 DBpedia properties are extracted and verified by human annotators. This resource can be used for knowledge base construction and knowledge base retrieval (e.g., question-answering).

Stargazers: 3, Issues: 0, Issues: 0

NTU-Chinese-Causal-Corpus

A Chinese causal corpus containing 1,314 pairs of arguments based on the Chinese Discourse Treebank (CDTB) by Li et al. (2014).

Stargazers: 3, Issues: 0, Issues: 0

NTUSD

Sentiment words are employed to compute the tendency of a sentence, and then a document. To detect sentiment words in Chinese documents, a Chinese sentiment dictionary is indispensable. However, a small dictionary may suffer from the problem of coverage. A method to learn sentiment words and their strengths from multiple resources is developed in this task.

License: MIT, Stargazers: 3, Issues: 0, Issues: 0

AMDRD

Analysis Model of Discourse Relations within a Document (AMDRD)

Language: Python, License: GPL-3.0, Stargazers: 2, Issues: 1, Issues: 0

ICDA

Interactive Clinical Diagnostic Assistant for Medical Interview

Language: Python, Stargazers: 2, Issues: 0, Issues: 0

NTU-English-Tense-Predictor

A rule-based English tense predictor that works on the output of a dependency parser such as Stanford CoreNLP.

Language: Python, Stargazers: 2, Issues: 0, Issues: 0

NTU-Irony-Corpus

The NTU Irony Corpus consists of more than 1,000 microblog messages collected from the Plurk website. All the messages in the corpus are in Traditional Chinese and have been confirmed to be ironic. They are marked with three types of labels: (1) ironic word/phrase, (2) context, and (3) rhetoric element.

Stargazers: 2, Issues: 0, Issues: 0

tw-eH

Learning to Generate Explanation from e-Hospital Services for Medical Suggestion

Language: Python, License: MIT, Stargazers: 2, Issues: 0, Issues: 0

WSD-MSD-1030

A word similarity dataset with a high proportion of multi-sense words, designed to facilitate more reliable evaluation of sense embeddings.

Stargazers: 2, Issues: 0, Issues: 0

C2RC2

Categorizing Citation Relations in Scientific Papers Based on the Contributions of Cited Papers

License: MIT, Stargazers: 1, Issues: 0, Issues: 0

ContributionSum

The ContributionSum Dataset

License: GPL-3.0, Stargazers: 1, Issues: 1, Issues: 0

SEEN

SEEN: Structured Event Enhancement Network for Explainable Need Detection of Information Recall Assistance

Language: Python, License: MIT, Stargazers: 1, Issues: 1, Issues: 0

contrastive-debate-representation

Contrastively learning participant representations per round in thread-based debates.

Language: Python, Stargazers: 0, Issues: 1, Issues: 0

Citation-Intent-Classification-Evidence-Extraction-

Citation Intent Classification and Its Supporting Evidence Extraction for Citation Graph Construction

Stargazers: 0, Issues: 1, Issues: 0

NTUNLP-ImageGallery

Provides images for the NTU AI Center shared platform.

License: Apache-2.0, Stargazers: 0, Issues: 1, Issues: 0

PRRCA

Peer Review and Rebuttal Counter-Arguments Dataset

Stargazers: 0, Issues: 1, Issues: 0