The RCA tool has three main components:
- Data Dashboard and Control Panel
- Data Collection and Parsing
- Data Analysis
Each of the three is discussed below.
The data dashboard and control panel are built using the Bootstrap framework and JavaScript. On the backend, Python Flask is used for computations and to interact with the data collection middleware and the analysis engine.
Flask is a Python web framework that provides the tools to build a web application. It also provides user session management and helps structure the web application along the MVC model. These features were used to develop the data dashboard and control panel. SQLite is used as the database to store per-user data and logs of the analyses a user performs. In the future, the system could make heavier use of the database layer by also storing aggregated analysis results. The data dashboard could be further enhanced with rich data visualization libraries such as D3.js, D3plus, NVD3, etc.
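As an illustration of how per-user data and analysis logs could be kept in SQLite, here is a minimal sketch using Python's standard `sqlite3` module. The table names, column names, and helper functions are hypothetical and chosen only for this example; they are not the schema of the project's own database file, and password handling is deliberately left out.

```python
import sqlite3

# Hypothetical schema for illustration only; the project's real tables
# and columns may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id       INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS analysis_logs (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    docket     TEXT NOT NULL,
    analysis   TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path=":memory:"):
    """Open (or create) a database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_user(conn, username):
    """Insert a user and return the new row id."""
    cur = conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
    conn.commit()
    return cur.lastrowid


def log_analysis(conn, user_id, docket, analysis):
    """Record one analysis run performed by a user."""
    conn.execute(
        "INSERT INTO analysis_logs (user_id, docket, analysis) VALUES (?, ?, ?)",
        (user_id, docket, analysis),
    )
    conn.commit()


def logs_for_user(conn, username):
    """Return (docket, analysis) pairs for a user, oldest first."""
    return conn.execute(
        "SELECT l.docket, l.analysis "
        "FROM analysis_logs AS l JOIN users AS u ON u.id = l.user_id "
        "WHERE u.username = ? ORDER BY l.id",
        (username,),
    ).fetchall()


if __name__ == "__main__":
    conn = open_db()
    uid = add_user(conn, "demo")
    log_analysis(conn, uid, "EXAMPLE-DOCKET", "word-frequency")
    print(logs_for_user(conn, "demo"))
```

Keeping logs in their own table keyed by user would also make it straightforward to add aggregated views later, for example with a `GROUP BY docket` query.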
The data dashboard and control panel module, along with the Flask framework code, is available under the app/ directory.
The module is executed using the following:
python app.py
The above starts the server on port 5000. The dashboard can be viewed by visiting http://localhost:5000.
The provided database file appdb.db contains a default user with username bds and password bds.
The data collection and parsing engine is built using a headless WebKit scriptable library, Phantom.js, and Python's Web.py framework. Web.py is used in the RCA tool to build and provide REST endpoints. The endpoints serve as the connection through which the front end interacts with the data collection and data parsing module.
The three main endpoints provided by this layer are:
- fetch_comments: This endpoint triggers the data collection engine built using Phantom.js. The docket link provided by the user is used to download a csv file that contains links to the docket comments. The Phantom.js script then uses these links to download the comments from the docket and stores them on the file system. The endpoint is called by passing a token generated by the dashboard.
- get_top: This endpoint returns the 100 most recent comments downloaded and parsed by the data collection engine. Different dockets, whether triggered by one user or by different users, are identified through a token. This could be extended to accept attributes such as sort, date, filter-words, etc.
- get_count: This endpoint returns the total count of comments downloaded and parsed by the data collection engine. Different dockets, whether triggered by one user or by different users, are identified through a token. This could be extended to report attributes such as processed, downloaded, etc.
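The token-based bookkeeping behind get_top and get_count can be sketched in a few lines of plain Python. This is only an illustrative in-memory model: the `CommentStore` class and its method names are assumptions made for this example, whereas the tool itself stores downloaded comments on the file system.

```python
from collections import defaultdict


class CommentStore:
    """Hypothetical in-memory stand-in for the parsed-comment storage."""

    def __init__(self):
        # Maps a dashboard-generated token to its comments, oldest first.
        self._by_token = defaultdict(list)

    def add(self, token, comment):
        """Record one downloaded and parsed comment for a token."""
        self._by_token[token].append(comment)

    def get_top(self, token, limit=100):
        """Return up to `limit` of the most recent comments, newest first."""
        if limit <= 0:
            return []
        comments = self._by_token.get(token, [])
        return comments[-limit:][::-1]

    def get_count(self, token):
        """Return how many comments are stored for a token."""
        return len(self._by_token.get(token, []))


if __name__ == "__main__":
    store = CommentStore()
    for i in range(3):
        store.add("token-a", f"comment {i}")
    store.add("token-b", "another docket")
    print(store.get_top("token-a"), store.get_count("token-b"))
```

Keying everything by token keeps comments from different dockets separate, whether one user or several triggered them.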
This module uses the headless browser library Phantom.js to get past several session-based restrictions, such as cookies, that the data source websites enforce.
The code for the data collection and parsing engine, together with its REST endpoints, is under the dataservice/ directory. The module is executed using the following:
python fetch_comments.py
The above starts the server on port 8080, through which the endpoints can be accessed, making the module's features available.
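Once the server is running, a client such as the dashboard needs to build request URLs for the endpoints. The sketch below shows one way this might look using only the standard library. It assumes, without confirmation from the source, that each endpoint is served at a path matching its name and that the token is passed as a query parameter called `token`; the `endpoint_url` helper is hypothetical.

```python
from urllib.parse import urlencode

# The port comes from the text above; paths and parameter names are assumptions.
BASE_URL = "http://localhost:8080"
ENDPOINTS = ("fetch_comments", "get_top", "get_count")


def endpoint_url(name, token, base_url=BASE_URL, **extra):
    """Build a request URL for one of the data service endpoints."""
    if name not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {name}")
    params = {"token": token}
    params.update(extra)  # any further query parameters, also hypothetical
    return f"{base_url}/{name}?{urlencode(params)}"


if __name__ == "__main__":
    for name in ENDPOINTS:
        print(endpoint_url(name, "example-token"))
```

The resulting URL could then be requested with `urllib.request.urlopen`; the response format is not described here, so any parsing would need to be checked against the code in dataservice/.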