revantkumar / BDS-Project

Comment Data Analysis

RCA Tool

Architecture

The RCA tool has three main components:

  • Data Dashboard and Control Panel
  • Data Collection and Parsing
  • Data Analysis

Each of the three is discussed below.

1. Data Dashboard and Control Panel

The data dashboard and control panel are built with the Bootstrap framework and JavaScript. On the backend, Python Flask handles computation and talks to the data collection middleware and the analysis engine.
Flask is a Python web framework that provides the tools to build a web application. It also provides user session management and helps structure the application along MVC lines. These features were used to develop the dashboard and control panel. SQLite is the database; it stores per-user data and logs of the analyses a user performs. In the future, the database layer could also store aggregated analysis results, and the dashboard could be enhanced with richer data visualization libraries such as D3.js, D3plus, or NVD3.
The dashboard and control panel module, along with the Flask framework code, is in the app/ directory. The module is started with:

python app.py

The above starts the server on port 5000. The dashboard can be viewed at http://localhost:5000.
The provided database file, appdb.db, contains a default user with username bds and password bds.
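
The role SQLite plays here (per-user data plus a log of each analysis a user runs) can be illustrated with a minimal sketch. The table and column names below are assumptions made for illustration only; they are not taken from the shipped appdb.db, whose actual schema may differ.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the layout of the appdb.db file shipped with the project.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id       INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL,
    password TEXT NOT NULL  -- a real deployment should store a hash
);
CREATE TABLE IF NOT EXISTS analysis_logs (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    docket_url TEXT NOT NULL,
    token      TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_analysis(conn, username, docket_url, token):
    """Record that `username` started an analysis of `docket_url`."""
    row = conn.execute(
        "SELECT id FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        raise KeyError(username)
    conn.execute(
        "INSERT INTO analysis_logs (user_id, docket_url, token) VALUES (?, ?, ?)",
        (row[0], docket_url, token),
    )
    conn.commit()


def logs_for(conn, username):
    """Return (docket_url, token) pairs for a user, oldest first."""
    return conn.execute(
        "SELECT l.docket_url, l.token FROM analysis_logs l "
        "JOIN users u ON u.id = l.user_id "
        "WHERE u.username = ? ORDER BY l.id",
        (username,),
    ).fetchall()
```

Keeping the token in the log row is one way the dashboard could later ask the data collection layer about a docket it triggered earlier.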

2. Data Collection and Parsing

The data collection and parsing engine is built using Phantom.js, a headless, scriptable WebKit library, and Python's Web.py framework. In the RCA tool, Web.py provides the REST endpoints through which the front end interacts with the data collection and parsing module.
This layer provides three main endpoints:

  • fetch_comments: Starts the data collection engine built with Phantom.js. The docket link supplied by the user is used to download a CSV file containing links to the docket's comments. The Phantom.js script then follows those links to download the comments and stores them on the file system. The endpoint is called with a token generated by the dashboard.
  • get_top: Returns the 100 most recent comments downloaded and parsed by the data collection engine. The token identifies the docket, so dockets triggered by the same user or by different users are kept apart. The endpoint could be extended to accept attributes such as sort order, date, or filter words.
  • get_count: Returns the total number of comments downloaded and parsed by the data collection engine. As with get_top, the token identifies the docket. The endpoint could be extended to report separate counts, such as processed and downloaded.
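
The token-based behaviour of get_top and get_count described above can be sketched with a small in-memory store. This is only an illustration of the semantics: the class name, method names, and in-memory storage are assumptions, while the actual module stores comments on the file system.

```python
from collections import defaultdict


class CommentStore:
    """Hypothetical in-memory stand-in for the parsed-comment storage."""

    def __init__(self):
        # Maps a dashboard-generated token to its comments, in arrival order.
        self._by_token = defaultdict(list)

    def add(self, token, comment):
        """Store one parsed comment under the docket's token."""
        self._by_token[token].append(comment)

    def get_count(self, token):
        """Total number of comments stored for this token."""
        return len(self._by_token.get(token, []))

    def get_top(self, token, limit=100):
        """Up to `limit` most recent comments for this token, newest first."""
        if limit <= 0:
            return []
        comments = self._by_token.get(token, [])
        return comments[-limit:][::-1]
```

Because every lookup is keyed by token, two dockets fetched at the same time never see each other's comments, which is the isolation the endpoints rely on.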

This module uses the headless browser library Phantom.js to get past session-based restrictions, such as cookies, that the data source websites enforce.

The code for the data collection and parsing engine, together with its REST endpoints, is in the dataservice/ directory. The module is started with:

python fetch_comments.py

The above starts the server on port 8080, through which the endpoints, and thus the module's features, can be accessed.
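
Once the server is running, a client could reach the endpoints roughly as follows. This is a sketch under stated assumptions: the README does not say how the token is passed or what the response format is, so the query parameter name `token` and the JSON response body are both guesses, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8080"  # port stated above


def endpoint_url(endpoint, token, base_url=BASE_URL):
    """Build the URL for an endpoint such as get_count or get_top.

    Assumption: the token travels as a query parameter named "token".
    """
    return f"{base_url}/{endpoint}?{urlencode({'token': token})}"


def call_endpoint(endpoint, token, timeout=10):
    """Call an endpoint and decode the response.

    Assumption: the service answers with a JSON body.
    """
    with urlopen(endpoint_url(endpoint, token), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the data service running locally, `call_endpoint("get_count", token)` would then return whatever the service reports for that token.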

3. Data Analysis Engine
