StfBlanchet / Agoraphon

A Flask application for analyzing activity on an online discussion forum, using scraping, indexing, analytics, relational graph and NLP.


Agor@phon

Objectives

The Agor@phon project aims to contribute to knowledge of natural language for machine learning purposes and to provide real-world material for studying phenomena such as disinformation, propaganda, and hate / extremist speech.

A tool developed by a researcher-programmer for scientific research, it takes the form of a web platform that collects and analyzes in depth the multimodal content published on an online discussion forum.

NLP / linguistic application

The collected content will make it possible to build real-world French text corpora, which are rare compared with English ones. This resource is all the more valuable because, on the forum studied here, the language is highly oral and slang-heavy, with idioms specific to its user community. This kind of communication is difficult for natural language processing systems, whose algorithms are mainly trained on texts written by professionals (e.g. press articles) and/or intended for the widest audience (e.g. Wikipedia).

Disinformation and hate speech investigation

The resulting datasets will also support the study of phenomena such as propaganda, fake news, trolling, and hate / extremist speech, for which any online communication platform may be fertile ground. What sets the type of forum studied here apart is that users can register truly anonymously, with no phone number or verified professional email to provide, which facilitates opportunistic or impulsive interventions, whether to start or to join a discussion. Also, building and feeding a community of followers is of no concern to most users. Unlike on other platforms, where family, friends and colleagues may identify them, they can post freely without worrying about their reputation or popularity. And when social desirability is not at stake, anything goes…

It should be noted that, on this research subject too, large French corpora are scarce or else concentrated on easily accessible sources (e.g. Twitter).

In addition, the forum is a point of convergence for different kinds of sources: social networks, micro-blogging platforms, video- or image-sharing platforms, news sites, and even messaging services such as Telegram or WhatsApp, whose screenshots are shared by users. The forum thus opens onto a wider spectrum than itself and offers material that makes it possible to capture societal trends.

How it works

[Nine images; no captions or alt text available]

Stack and Architecture

The application is built with the Flask 1.1 framework and written in Python 3.8.

The whole system is based on a distributed architecture. Three servers are involved: the first is dedicated to scraping; the second to data indexing and retrieval; and the third hosts the application, where the data mining, analytics and visualization tasks are performed.
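
To make the division of roles concrete, here is a minimal single-process sketch of the three stages. It is an illustration under stated assumptions, not the project's actual code: the `Post` schema, the hard-coded sample posts, the in-memory `Index` class and the `posts_per_author` metric are all hypothetical stand-ins, and the real system runs each role on its own server with tooling this README does not name.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Post:
    """One forum message as a scraper might emit it (hypothetical schema)."""
    post_id: int
    author: str
    text: str


def scrape() -> List[Post]:
    """Stand-in for server 1: returns hard-coded posts instead of fetching pages."""
    return [
        Post(1, "alice", "premier message sur le forum"),
        Post(2, "bob", "réponse au premier message"),
        Post(3, "alice", "encore un message"),
    ]


class Index:
    """Stand-in for server 2: a tiny in-memory inverted index (token -> post ids)."""

    def __init__(self) -> None:
        self._posts: Dict[int, Post] = {}
        self._tokens: Dict[str, Set[int]] = defaultdict(set)

    def add(self, post: Post) -> None:
        # Store the post and register each whitespace-separated token.
        self._posts[post.post_id] = post
        for token in post.text.lower().split():
            self._tokens[token].add(post.post_id)

    def search(self, token: str) -> List[Post]:
        # Return matching posts ordered by id; unknown tokens yield an empty list.
        ids = sorted(self._tokens.get(token.lower(), set()))
        return [self._posts[i] for i in ids]


def posts_per_author(posts: List[Post]) -> Counter:
    """Stand-in for server 3: one simple analytic computed on retrieved posts."""
    return Counter(p.author for p in posts)


# Wire the three stages together: scrape, index, then retrieve and analyze.
index = Index()
for post in scrape():
    index.add(post)

hits = index.search("message")
activity = posts_per_author(hits)
```

Keeping the stages behind narrow interfaces (a list of records, an `add`/`search` pair, a function over retrieved posts) is what lets each one live on a separate machine, as the architecture above describes.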

Status

This project is in progress.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Authors

  • Initial work: Stephanie BLANCHET, R&D Cognitician. Data Pythonist.
  • Contact: agoraphon@gmail.com

License

This project is licensed under the MIT License - see MIT for details.
