This repo is used for AppAI system development and research purpose.
- Programming languange - Python 3.6
- Pip3.6
- Python virtual environment
- Ray
- Docker
- Images
- Redis
- RabbitMQ
- Haddop HDFS
- Images
$ sudo add-apt-repository ppa:jonathonf/python-3.6
$ sudo apt-get update
## install python
$ sudo apt-get install python3.6
# Below steps are optional, follow the steps below if you decide to use VS Code.
## install ubuntu make
$ sudo apt-get install ubuntu-make
## install Visual Studion Code
$ umake web visual-studio-code
$ sudo apt-get install python3.6-venv
$ python3.6 -m venv VIRTUAL-ENV-AppAI
Activate the python virtual environment
$ source VIRTUAL-ENV-AppAI/bin/activate
Once the python virtual environment is activated, the environment name should appears at the beginning of the terminal line in ().
Upgrade pip to latest
$ pip install -U pip
For first time setup, install the project dependency.
$ pip install -r requirements.txt
For saving project dependency
$ pip freeze > requirements.txt
Deactivate the python virtual environment
$ deactivate
Make sure to add both the project and the src/ folders in your PYTHONPATH variable.
Note: This step doesn't set the PYTHONPATH for the whole system. So, you will need to repeat the steps for new terminals.
If interested, you can add the export
line to ~/.bashrc
file in the Ubuntu, so it auto-loads when you opens a new terminal.
$ pwd src/
$ export PYTHONPATH=full/path/to/project/:full/path/to/project/src/
Docker installation: see link
Method 1: Start all the docker container at once: use the script at 10.10.10.111/data/demo-alpha-lite
$ ./run_docker_container.sh
Method 2: Start them individually, use the commands below.
- Start RabbitMQ in Docker
$ docker run -d --hostname rabbit --name docker-rabbit -p 5672:5672 -p 15672:15672 rabbitmq:3-management
- For the first time running the command, the machine will pull the rabbitmq image from DockerHub.
- Default credentials for UI: http://HOSTIP:15672
- Username: guest
- Password: guest
- Start Redis in Docker
docker run --name docker-redis -p 6379:6379 -d redis
- For the first time running the command, the machine will pull the redis image from DockerHub.
-
Setup Hadoop HDFS
3.1 Download our modified standalone Hadoop HDFS image file from 10.10.10.111, path: /data/demo-alpha-lite3.2 After downloaded the image, load image into the machine's docker image.
$ docker load < docker-hadoop-no-yarn-img.tar.gz
3.3 Start Hadoop HDFS
$ docker run --hostname hadoop --name docker-hadoop-no-yarn -p 50010:50010 -p 50020:50020 -p 50070:50070 -p 50075:50075 -p 50090:50090 -d docker-hadoop-no-yarn-img
Visit HDFS web UI to verify deployment: http://HOST-IP:50070
Important HDFS configuration:
- update the
/etc/hosts
file to associate a proper IP to the hostname hadoop
if using the HDFS on 10.10.10.111, append following line
10.10.10.111 hadoop
to file
/etc/hosts
.if using local machine, use your own IP instead of
10.10.10.111
- update the
The configuration has three sections: resources
, services
, streams
.
- To switch between
local
anddatabase
configuration, developer must change it in thelib.Config.ConfigUtil.py
- For Mongo database configuration, the Mongo server connection info is hardcoded in
lib.Common.Auxiliary.py
This is the section for third party services we used in our AppAI system.
- mysql used for simulation data storage
- redis used for system internal data caching
- hdfs used for storing data.
- mq used as message broker to exchange messages between AppAI and other outside system/devices
This is the section for the AppAI system services. The AppAI system is designed with MicroService architecture in mind. So, the processing logic is decomposed into services.
-
InputDataHandler is designed to receive data from outside of AppAI system.
-
StreamProcessor is designed to accept data from InputDataHandler and to perform realtime processing.
-
TrainingActivator is designed to activate training events, by schedule or user-triggered.
- the specific training schedule configuration is at
stream.XXX.pipelines.[#].training_activation
. The following configuration said to trigger a training eventeveryday at 16:38:00 localtime
{ "training_activation":{ "day": [], "weekday": [ 0,1,2,3,4,5,6 ], "time": "16:38:00" } }
day
is for specifying which day of the month should start the training. Empty for default. Valid inputs: 1 - 31.weekday
is for specifying which day of the week should start the training. Empty for default. Valid input: 0 - 6. (0 as Monday, 6 as Sunday)time
is for specifying which time of the day the training should start. It must be in following format "HH:MM:SS"Note:
Ifday
is set, whetherweekday
field is set or not, there will be no effect. Betweenday
andweekday
, only one field should be set.
- the specific training schedule configuration is at
-
ModelTrainer is designed to receive training event requests from TrainingActivator and perform the training.
-
DataRetriever is designed to query realtime processing results and model training related info through Web API.
This is the section for specifying the incoming data stream and its corresponding pipeline processing logic.
- store stream is for sales forecasting scenario
- meter stream is for abnormal energy consumption detection scenario
For each pipeline, the training dataset range is configured at stream.XXX.pipelines.[#].training_data_period
.
The following configuration said to use the past 1 month
data from the current activation time
to train the current model.
{
"training_data_period": {
"years": 0,
"months": 1,
"days": 0,
"hours": 0
}
}
- years is for the year interval to repeat the training
- months is for the month interval to repeat the training
- days is for the day interval to repeat the training
- hours is for the hour interval to repeat the training
Note:
default value for each field is 0. If all the fields are set to non-zero, the combination of all fields will be used to calculate the training dataset range
- Unit testing reference: https://realpython.com/python-testing/#automating-the-execution-of-your-tests