MarcosFP97 / eXtream

Big Data framework

Getting started

First, you need Docker installed on your computer. Then you can launch eXtream by running launch.sh.

Docs and technology

eXtream is a flexible, easy-to-use platform that facilitates the development of real-time Big Data applications that analyze dynamic streams of data. At the moment, the main use cases that we have developed on top of eXtream support real-time filtering and topic analysis on social media. eXtream’s design permits the rapid development of effective and efficient solutions for large-scale processing of massive volumes of data.

The term “Social Big Data” refers to the content generated by users on social networks. Twitter users, for example, generate more than 481,000 tweets per minute. Processing such an amount of information is a computational challenge in itself, and doing it in real time requires highly effective solutions. One possible use case of eXtream is to support reputation analysis in social networks (e.g., a company filtering and analyzing content related to its products or to the products of its competitors).

eXtream is derived from the Python library Catenae, developed at CiTIUS (Centro Singular de Investigación de Tecnologías Inteligentes de la USC). It has a number of advantages, including modularity, ease of scaling up and the possibility of making RPC calls between modules.

eXtream is composed of some pre-installed modules, and users can upload their own modules (e.g., a new filtering Python module). The pre-installed modules include a real-time text filter, a topic analysis module, a dynamic tag cloud generator and a batch processing module, among others.

The main aspects that differentiate eXtream from other Big Data solutions, such as Hadoop or Storm, are:

Framework comparison

eXtream combines streaming operations with batch processing, is written in native Python, allows cycles in consumer topologies, and uses Docker to simplify dependency management. A core strength is that eXtream lets users create consumer topologies visually (i.e., users can build their own processing pipeline by simply uploading new Python modules).

eXtream was developed at CiTIUS by a team of developers and researchers with expertise in multiple areas, including High Performance Computing, Big Data, Text Mining, Information Retrieval and Machine Learning. We envisage two main types of users: programmers who want to customize the platform to support new Big Data applications or tools, and end users who essentially interact with a dashboard that shows the real-time results and analyses produced by eXtream.

For more information, have a look at eXtream.pdf.

Web Interface

Some user interface images are shown below.

On the main page, users can configure their own topology visually.

Main page

Real-time filtering extracts texts from social networks that mention a given query.

Filter module
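The filtering idea described above can be illustrated with a minimal Python sketch. This is not eXtream's or Catenae's actual module code: the function name `filter_stream` and the case-insensitive substring match are assumptions chosen for illustration, and a real module would read from the platform's stream rather than from a list.

```python
from typing import Iterable, Iterator


def filter_stream(texts: Iterable[str], query: str) -> Iterator[str]:
    """Yield each text that mentions the query, ignoring case.

    Hypothetical helper, not part of eXtream: it processes items lazily,
    one at a time, so it also works on an unbounded stream of texts.
    """
    needle = query.casefold()
    for text in texts:
        if needle in text.casefold():
            yield text


if __name__ == "__main__":
    # Illustrative sample data only.
    stream = [
        "Loving the new Acme phone!",
        "Rainy day in Santiago",
        "acme support was quick to reply",
    ]
    for match in filter_stream(stream, "Acme"):
        print(match)
```

Because the function is a generator, each text is checked as it arrives and nothing is buffered, which is the property a real-time filter needs.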

For batch processing, we built a simple example: counting texts in a time range. The goal was to show eXtream's other mode of operation, alongside streaming.

Batch module
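As a rough sketch of what counting texts in a time range involves, the snippet below counts timestamped records inside a window. The record shape (a `(timestamp, text)` tuple), the function name `count_in_range` and the half-open interval are assumptions made for this example; they are not taken from eXtream's batch module.

```python
from datetime import datetime
from typing import Iterable, Tuple


def count_in_range(
    records: Iterable[Tuple[datetime, str]],
    start: datetime,
    end: datetime,
) -> int:
    """Count records whose timestamp lies in the half-open range [start, end).

    Hypothetical helper, not part of eXtream.
    """
    return sum(1 for timestamp, _text in records if start <= timestamp < end)


if __name__ == "__main__":
    # Illustrative sample data only.
    stored = [
        (datetime(2019, 5, 1, 9, 0), "first text"),
        (datetime(2019, 5, 1, 12, 30), "second text"),
        (datetime(2019, 5, 2, 8, 15), "third text"),
    ]
    total = count_in_range(stored, datetime(2019, 5, 1), datetime(2019, 5, 2))
    print(total)
```

Using a half-open range means consecutive windows (for example, one per day) never count the same text twice.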

The tag cloud module is a first approach to automatic summarization and could be extended into an NLG summarization module.

Tag module
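A tag cloud is typically driven by term frequencies, so the core computation can be sketched as follows. This is an assumption about the general technique, not a description of how eXtream's module is implemented: the tokenizer, the tiny stopword list and the name `tag_weights` are all hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Deliberately tiny stopword list, for illustration only.
STOPWORDS = frozenset({"the", "a", "an", "is", "to", "of", "and", "in"})


def tag_weights(texts: Iterable[str], top_n: int = 10) -> List[Tuple[str, int]]:
    """Return the top_n most frequent non-stopword terms with their counts.

    Hypothetical helper, not part of eXtream: the counts could be mapped
    to font sizes when rendering the cloud.
    """
    counts: Counter = Counter()
    for text in texts:
        for token in re.findall(r"[a-z0-9']+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Illustrative sample data only.
    sample = ["The phone is great", "Great phone, great camera"]
    for term, weight in tag_weights(sample, top_n=3):
        print(term, weight)
```

For a dynamic cloud, the same `Counter` could be kept alive and updated as new texts arrive instead of being rebuilt on every call.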

The topic analysis module helps users identify hidden topics in a dynamic corpus.

Topic module

The stats module returns platform statistics accumulated since the platform was put into production.

Stats module

About

License: GNU General Public License v3.0


Languages

JavaScript 53.7%, Python 30.7%, CSS 12.1%, Dockerfile 1.2%, HTML 1.2%, Shell 1.1%