Web IR / NLP Group @ NUS's repositories
SG-Deep-Question-Generation
This repository contains code and models for the paper: Semantic Graphs for Generating Deep Questions (ACL 2020).
nus-sms-corpus
This is the distribution point for the NUS SMS Corpus, a corpus of SMS (Short Message Service) messages collected for research at the Department of Computer Science at the National University of Singapore. This dataset consists of 67,093 SMS messages taken from the corpus on Mar 9, 2015. The messages largely originate from Singaporeans, mostly students attending the University. They were collected from volunteers who were made aware that their contributions would be made publicly available. The data collectors opportunistically collected as much metadata about the messages and their senders as possible, to enable different types of analyses. This corpus was collected by Tao Chen and Min-Yen Kan. If you use this data, please cite the following paper; for more details, refer to the Citation field. Tao Chen and Min-Yen Kan (2013). Creating a Live, Public Short Message Service Corpus: The NUS SMS Corpus. Language Resources and Evaluation, 47(2), pages 299-355. URL: https://link.springer.com/article/10.1007%2Fs10579-012-9197-9
Summarization-Papers
Summarization Papers
lib4moocdata
Library for processing MOOC data dumps. Currently limited to Coursera data.
AutomaticKeyphraseExtraction
Data for the Automatic Keyphrase Extraction task
CRS-Paper-List
In this repository, we compile a list of papers on conversational recommendation systems and related areas.
SemanticTokenizer
Item Tokenization: the future of recommender systems
CoAnnotating
This is the official repository for "CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation"
ControllableLyricTranslation
Code for the paper "Songs Across Borders: Singable and Controllable Neural Lyric Translation"
LLM-Misinfo-QA
This repository contains data and code used for "On the Risk of Misinformation Pollution with Large Language Models" (to appear in Findings of EMNLP 2023).
nnose
Codebase for NNOSE: Nearest Neighbor Occupational Skill Extraction
RL-for-Question-Generation
This repository contains code and models for the paper: Exploring Question-Specific Rewards for Generating Deep Questions (COLING 2020).
SciTab
The project page for "SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables"
ssnlp-2023
Singapore Symposium on Natural Language Processing (SSNLP 2023)
UNO-DST
Official Repo for Project UNO-DST: Leveraging Unlabelled Data in Zero-shot Dialogue State Tracking