Master Thesis Instructions

Contact Points:

Luca Cagliero: luca.cagliero@polito.it
Moreno La Quatra: moreno.laquatra@polito.it

This repository contains the information needed to develop a master thesis in the NLP domain at PoliTO.

Instructions and templates
Thesis proposals
Ongoing Topics
Published Theses

Instructions and templates

The LaTeX template for writing the master thesis is available on Overleaf.

The first step is to create a GitHub Education account and an ad-hoc repository containing all the code and information relevant to the master thesis.

The research work expected during the development of the master thesis covers the following steps.

State-of-the-art exploration

Collect, read and analyze the most recent and relevant publications in the proposed application field. Related works can be summarized and presented using the Markdown template available here. Publications can be searched using the following services:

Data collection and finding

Most theses require a data collection or data search step. While exploring the state of the art, the student is asked to collect and organize the data used by each publication. Datasets must be presented in an organized way using the Markdown template available here. If a new data collection is created or parsed, please create a dedicated Markdown file (template available here) explaining both the data collection procedure and the statistics of the collection.
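The statistics section of such a Markdown file could start from a few basic figures. The sketch below computes them for a collection assumed to be a list of (text, label) pairs with whitespace tokenization; the function name `collection_stats` and the chosen statistics are hypothetical, so adapt them to the actual data and task.

```python
from collections import Counter
from statistics import mean, median


def collection_stats(samples):
    """Basic statistics for a list of (text, label) pairs (hypothetical helper)."""
    if not samples:
        raise ValueError("the collection is empty")
    # Whitespace tokenization is an assumption; swap in the tokenizer you use.
    lengths = [len(text.split()) for text, _ in samples]
    labels = Counter(label for _, label in samples)
    vocab = {tok.lower() for text, _ in samples for tok in text.split()}
    return {
        "num_samples": len(samples),
        "avg_tokens": mean(lengths),
        "median_tokens": median(lengths),
        "max_tokens": max(lengths),
        "vocab_size": len(vocab),
        "label_distribution": dict(labels),
    }


if __name__ == "__main__":
    toy = [
        ("The service was great", "positive"),
        ("Terrible delay and no updates", "negative"),
        ("Great food", "positive"),
    ]
    for key, value in collection_stats(toy).items():
        print(f"{key}: {value}")
```

The returned dictionary maps directly onto a small table in the dataset Markdown file.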

A video tutorial explaining how to create a Python package is available here: YouTube.

Thesis proposals

AL meets NLP

Active Learning is a subfield of machine learning that aims at reducing the number of labeled (supervised) samples needed to train a machine learning model. It includes a human-in-the-loop procedure aimed at selecting the most relevant examples for the model's learning.

The main objectives of this thesis are:

Explore the state of the art in Active Learning.
Define and propose an Active Learning approach for the fine-tuning of deep neural language models.
Simulate the process on existing benchmark datasets.
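As a rough illustration of the human-in-the-loop procedure and of how it can be simulated on a labeled benchmark, the sketch below implements pool-based active learning with least-confidence uncertainty sampling: at each round the model is retrained and the pool samples it is least sure about are sent to an oracle, here simulated by the benchmark's gold labels. This is only a sketch under stated assumptions: a tiny logistic regression written from scratch stands in for a fine-tuned neural language model, least confidence is just one of many possible query strategies, and all function names are hypothetical (libraries such as modAL provide ready-made strategies).

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(model, x):
    """P(y = 1 | x) under a linear model given as (weights, bias)."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit a tiny logistic regression with batch gradient descent.

    Assumption: stands in for fine-tuning a neural language model."""
    weights, bias = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((weights, bias), xi) - yi
            grad_w = [g + err * v for g, v in zip(grad_w, xi)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def least_confidence(p):
    """Binary least-confidence score: 0 when certain, 0.5 at p = 0.5."""
    return 1.0 - max(p, 1.0 - p)


def query(model, pool_X, k):
    """Positions of the k pool samples the model is least sure about."""
    scores = [least_confidence(predict_proba(model, x)) for x in pool_X]
    return sorted(range(len(pool_X)), key=lambda i: scores[i], reverse=True)[:k]


def simulate(X, y, seed_size=4, batch=2, rounds=5, rng_seed=0):
    """Simulated active learning: the gold labels y play the human annotator."""
    rng = random.Random(rng_seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    labeled, pool = order[:seed_size], order[seed_size:]
    for _ in range(rounds):
        model = train_logreg([X[i] for i in labeled], [y[i] for i in labeled])
        picked = query(model, [X[i] for i in pool], batch)
        # "Annotate" the queried samples: move them from the pool to the labeled set.
        for pos in sorted(picked, reverse=True):
            labeled.append(pool.pop(pos))
    model = train_logreg([X[i] for i in labeled], [y[i] for i in labeled])
    return model, labeled


if __name__ == "__main__":
    # Toy 1-D benchmark: the label is 1 when the feature is positive.
    X = [[(i - 10) / 2] for i in range(21) if i != 10]
    y = [1 if x[0] > 0 else 0 for x in X]
    model, labeled = simulate(X, y)
    acc = sum((predict_proba(model, x) > 0.5) == bool(t) for x, t in zip(X, y)) / len(X)
    print(f"labeled {len(labeled)} of {len(X)} samples, accuracy {acc:.2f}")
```

In a real thesis setting, `train_logreg` would be replaced by fine-tuning a language model and `query` by the strategy under study, while the simulation loop stays essentially the same.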

References 📚:

Peshterliev, S., Kearney, J., Jagannatha, A., Kiss, I., & Matsoukas, S. (2019, June). Active Learning for New Domains in Natural Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers) (pp. 90-96). article

Schröder, C., & Niekler, A. (2020). A Survey of Active Learning for Text Classification using Deep Neural Networks. arXiv preprint arXiv:2008.07267. article

Interesting projects 💻:

modAL
libact

Additional Material:

Active Learning: Algorithmically Selecting Training Data to Improve Alexa’s Natural-Language Understanding post

Active Learning for Natural Language Processing post

Temporal Summarization

News events such as protests, accidents or natural disasters represent a unique information access problem where traditional approaches fail. For example, immediately after an event, the corpus may be sparsely populated with relevant content. Even when relevant content becomes available a few hours later, it is often inaccurate or highly redundant. At the same time, crisis events are a scenario where users urgently need information, especially if they are directly affected by the event. The goal of the TREC Temporal Summarization Track is to develop systems for efficiently monitoring the information associated with an event over time. (Source: TREC Temporal Summarization Track.)

The main objectives of this thesis are:

Define and explore the state of the art in Temporal Summarization.
Design a new temporal summarization framework able to process data streams from different sources.
Simulate the process on existing benchmark datasets.
Create a web-based dashboard to present live updates about ongoing events.
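One simple way to frame the core of such a framework, assumed here purely for illustration and not prescribed by the track, is a streaming filter: each incoming sentence is published as an update only if it is relevant to the event query and not redundant with updates already published. The sketch below uses term overlap for relevance and Jaccard similarity for redundancy; the function names, the thresholds `max_similarity` and `min_query_overlap`, and the `Update` record are hypothetical.

```python
import re
from dataclasses import dataclass

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokens(text):
    """Lower-cased set of alphanumeric tokens."""
    return set(TOKEN_RE.findall(text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Update:
    timestamp: int
    text: str


def summarize_stream(stream, query, max_similarity=0.5, min_query_overlap=1):
    """Yield updates from a time-ordered stream of (timestamp, sentence) pairs.

    A sentence becomes an update only if it shares enough terms with the event
    query (relevance) and is not too similar to any earlier update (novelty).
    """
    query_terms = tokens(query)
    emitted = []  # token sets of the updates published so far
    for timestamp, sentence in stream:
        toks = tokens(sentence)
        if len(toks & query_terms) < min_query_overlap:
            continue  # off-topic for this event
        if any(jaccard(toks, prev) > max_similarity for prev in emitted):
            continue  # redundant with an update already published
        emitted.append(toks)
        yield Update(timestamp, sentence)


if __name__ == "__main__":
    stream = [
        (1, "Earthquake hits the city center"),
        (2, "An earthquake hits the city center"),
        (3, "Football match ends in a draw"),
        (4, "Earthquake death toll rises to 12"),
    ]
    for update in summarize_stream(stream, "earthquake city"):
        print(update.timestamp, update.text)
```

Because `summarize_stream` is a generator, it can consume a live feed and push each accepted update to a dashboard as soon as it is produced, instead of waiting for the whole stream.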

Generative Sentiment Analysis

Sentiment analysis is one of the most important tasks in many NLP pipelines. It consists of analyzing a text to classify its sentiment as either positive or negative. Generative models such as GPT-3 open up a wide range of possibilities in this scenario. This master thesis will cover both language generation and sentiment analysis.

The main objectives of this thesis are:

Define and explore the state of the art in Language Generation.
Analyze state-of-the-art methodologies in Sentiment Analysis.
Simulate an innovative pipeline on existing benchmark datasets.
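The thesis targets large generative language models such as GPT-3, but the generative view of sentiment analysis can be illustrated at a much smaller scale with multinomial Naive Bayes, a classic generative classifier that models P(label) and P(word | label) and predicts the label with the highest joint probability. The sketch below is a toy stand-in under that assumption, not the pipeline the thesis is expected to propose; the class and method names are hypothetical, and `sample` shows the "generative" direction by drawing words from the learned distribution of a label.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lower-cased word tokens (a deliberately simple assumption)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes: scores P(label) * prod P(word | label)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, texts, labels):
        self.priors = Counter(labels)
        self.word_counts = {label: Counter() for label in self.priors}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {l: sum(c.values()) for l, c in self.word_counts.items()}
        self.n = len(labels)
        return self

    def log_joint(self, text, label):
        """log P(label) + sum of log P(word | label); unseen words are ignored."""
        score = math.log(self.priors[label] / self.n)
        denom = self.totals[label] + self.alpha * len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, text):
        return max(self.priors, key=lambda label: self.log_joint(text, label))

    def sample(self, label, n_words, rng=None):
        """Draw words from the smoothed P(word | label) of the given label."""
        rng = rng or random.Random()
        vocab = sorted(self.vocab)
        weights = [self.word_counts[label][w] + self.alpha for w in vocab]
        return rng.choices(vocab, weights=weights, k=n_words)


if __name__ == "__main__":
    texts = ["great movie loved it", "wonderful acting great plot",
             "terrible movie hated it", "boring plot awful acting"]
    labels = ["positive", "positive", "negative", "negative"]
    model = NaiveBayesSentiment().fit(texts, labels)
    print(model.predict("great acting"), model.predict("awful boring movie"))
    print(model.sample("negative", 5, random.Random(0)))
```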

References 📚:

Wang, H., & Zhai, C. (2017). Generative models for sentiment analysis and opinion mining. In A practical guide to sentiment analysis (pp. 107-134). Springer, Cham. article

Gupta, R. (2019, May). Data augmentation for low resource sentiment analysis using generative adversarial networks. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 7380-7384). IEEE. article
