jonatasgrosman / eras

Eras: Improving the quality control in the annotation process for Natural Language Processing tasks https://www.sciencedirect.com/science/article/abs/pii/S0306437920300521

INITIAL SETUP

Install dependencies

Config variables

This project is composed of four modules: (i) authentication, (ii) data, (iii) nlp, and (iv) client. The authentication module is a Python module that manages the encryption of keys and passwords across the modules. The data module is a Python module that manages the data generated by annotation. The nlp module is a Python module that tokenizes the plain texts to be annotated. The client module is a JavaScript module that generates the web interface. Each module has its own folder at the root of the project.

The authentication module has a configuration file located at authentication/module/__init__.py. Change the value of the SECRET_KEY variable from supersecretkey to a random value of your choice. That key is used to generate all the passwords. All variables that can be configured for this module are specified in the config dictionary. In that dictionary, modify the RECOVERY_PASSWORD_PAGE_URL variable, replacing PAGE_URL_OR_IP with the web page URL or IP, and PORT with the port number, if necessary.
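One way to produce a random value for SECRET_KEY is Python's standard secrets module. The sketch below is only an illustration: the function name generate_secret_key is hypothetical, and it assumes SECRET_KEY accepts an arbitrary string. If cryptUtil.py expects a specific key length or format, adjust accordingly.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random string built from num_bytes random bytes."""
    # token_urlsafe uses the operating system's cryptographic random source.
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Paste the printed value into SECRET_KEY in authentication/module/__init__.py.
    # Assumption: the module accepts any string as SECRET_KEY.
    print(generate_secret_key())
```

Keep the generated value private and reuse the same value as key in cryptUtil.py, since the encrypted passwords depend on it.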

After specifying a new value for SECRET_KEY, you need to recreate the default passwords for the data module, the nlp module, and the admin user, using the cryptUtil.py script located at authentication/module/util/. In that script, set the key variable to the value of SECRET_KEY and the decrypted_password variable to a password of your choice. The script outputs the password encrypted with the combination of decrypted_password and key. Take note of the encrypted passwords, since they are used in the Config DB section of this guide. The decrypted passwords, in turn, are used in the configuration files of the data module and the nlp module.

The data module has a configuration file located at data/module/__init__.py. All variables that can be configured for this module are specified in the config dictionary. In that dictionary, replace the value of AUTHENTICATION_PASSWORD with the (decrypted) password you chose for the data module.

The nlp module has a configuration file located at nlp/module/__init__.py. All variables that can be configured for this module are specified in the config dictionary. In that dictionary, replace the value of AUTHENTICATION_PASSWORD with the (decrypted) password you chose for the nlp module.

The client module has a configuration file located at client/js/config/constants.js. Replace AUTHENTICATION_SERVER_URL with the authentication module URL or IP, followed by the port number if necessary. Similarly, set DATA_SERVER_URL to the corresponding information for the data module. Remember that, since this is a JavaScript module, it is executed on the client computer. Therefore, the authentication module and the data module must be specified with an externally reachable IP address or URL so that they can be accessed over the web.

Config DB

  • open the MongoDB console

mongo

  • create the authentication DB

use authentication

  • create a unique index on the email field

db.users.createIndex( { "email": 1 }, { unique: true } )

  • add the default users (remember to replace ADMIN_ENCRYPTED_PASSWORD, NLP_ENCRYPTED_PASSWORD, and DATA_ENCRYPTED_PASSWORD with the encrypted passwords created above using the cryptUtil.py script)

db.users.insert({"firstName":"admin", "lastName":"", "email":"admin@eras", "password":"ADMIN_ENCRYPTED_PASSWORD", "role":"ADMIN"})

db.users.insert({"firstName":"", "lastName":"", "email":"nlp@eras", "password":"NLP_ENCRYPTED_PASSWORD", "role":"MODULE"})

db.users.insert({"firstName":"", "lastName":"", "email":"data@eras", "password":"DATA_ENCRYPTED_PASSWORD", "role":"MODULE"})
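Pasting encrypted passwords into the commands by hand can go wrong if a password contains quotes or backslashes. As a minimal sketch (the helper names build_default_users and to_shell_commands are hypothetical and not part of ERAS), the three insert commands above can be generated with the passwords safely escaped:

```python
import json


def build_default_users(admin_pw: str, nlp_pw: str, data_pw: str) -> list:
    """Build the three default user documents listed in this guide."""
    return [
        {"firstName": "admin", "lastName": "", "email": "admin@eras",
         "password": admin_pw, "role": "ADMIN"},
        {"firstName": "", "lastName": "", "email": "nlp@eras",
         "password": nlp_pw, "role": "MODULE"},
        {"firstName": "", "lastName": "", "email": "data@eras",
         "password": data_pw, "role": "MODULE"},
    ]


def to_shell_commands(users: list) -> list:
    """Render each document as a db.users.insert(...) line for the mongo console."""
    # json.dumps escapes quotes and backslashes inside the password strings.
    return ["db.users.insert(" + json.dumps(user) + ")" for user in users]


if __name__ == "__main__":
    # Placeholders: substitute the encrypted passwords printed by cryptUtil.py.
    users = build_default_users(
        "ADMIN_ENCRYPTED_PASSWORD",
        "NLP_ENCRYPTED_PASSWORD",
        "DATA_ENCRYPTED_PASSWORD",
    )
    print("\n".join(to_shell_commands(users)))
```

The printed lines can be pasted into the console after use authentication.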

EXECUTION

  • run Freeling instances (change the port parameter if necessary)

analyze -f pt.cfg --nonumb --noloc --noner --nodict --outlv tagged --server --flush --port 50040

analyze -f en.cfg --nonumb --noloc --noner --nodict --outlv tagged --server --flush --port 50050

analyze -f pt.cfg --outlv tagged --server --flush --port 50041

analyze -f en.cfg --outlv tagged --server --flush --port 50051

  • run ERAS API

gunicorn --chdir authentication -b 0.0.0.0:50000 --log-level debug run:app

gunicorn --chdir data -b 0.0.0.0:50001 --log-level debug run:app

gunicorn --chdir nlp -b 0.0.0.0:50002 --log-level debug run:app

  • run ERAS client

go to the /client folder and run python3 -m http.server 80

  • To check that everything is working properly, go to http://localhost, sign in with the admin credentials, and use the files in the /samples folder to create a project and make some annotations

  • Alternatively, if you want to run this project as a Linux service, you can use the erasd script, located at the root of the project. In that file, replace PROJECT_PATH with the ERAS path on your computer, and VIRTUAL_ENV_PATH with the path of the Python virtual environment on your computer. The script header contains the instructions for installing the script as a Linux service.
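Before signing in, it can help to confirm that every process started above is actually listening. The following is a rough sketch, not part of ERAS: the names DEFAULT_SERVICES, is_port_open, and check_services are hypothetical, and the ports are the ones used in the commands in this guide, so adjust them if you changed any.

```python
import socket

# Ports taken from the commands in this guide; change them if you used others.
DEFAULT_SERVICES = {
    "freeling pt.cfg (50040)": 50040,
    "freeling en.cfg (50050)": 50050,
    "freeling pt.cfg (50041)": 50041,
    "freeling en.cfg (50051)": 50051,
    "authentication API": 50000,
    "data API": 50001,
    "nlp API": 50002,
    "client": 80,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "localhost", services: dict = None) -> dict:
    """Map each service name to whether its port accepts connections."""
    if services is None:
        services = DEFAULT_SERVICES
    return {name: is_port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(("OK   " if up else "DOWN ") + name)
```

An open port only shows that something is listening there; it does not verify that the service answers requests correctly.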

About

License: MIT License

