Activity prediction with the help of neural networks

Based on the InceptionTime model.

Link to repo. Link to paper.

Setup

env_files/ stores the Python environment files.
Create a conda environment from environment_cuda.yml and a Python virtual environment from requirements.txt.
- conda env create -f env_files/environment_cuda.yml (only necessary if you want to train on the GPU)
- pip install -r env_files/requirements.txt
Run nbdev_install_hooks to make sure the notebook metadata is cleaned before pushing to the repo!

Two different datasets:

input/separated contains the CSV files from Arman's student thesis. The dataset consists of 14 different activities, recorded by 30 persons.
- Data has 50 sensor features (sensor length: 25) and x, y, z acceleration.
The second dataset consists of the timeseries_data measurements saved directly to the cloud.
- either downloaded to the repo from AWS with aws_downloader.ipynb
  - the AWS command line tool needs to be installed via this Link.
  - then you need to log in to the MinkTec AWS with aws configure and the API keys received from the AWS Master.
- or used directly from the Nextcloud external data folder
- Data has potentially variable sensor features, x, y, z acceleration and gyroscope data.

aws_downloader.ipynb: Downloads the timeseries_data from AWS and saves it to disk at input/timeseries_data.
run_inception_tuner.ipynb: Tweak the model parameters, then tune and fit an InceptionTime model.
create_models/: Scripts for the inception_tuner model, Arman's models and the Autokeras models.
utils/: Helper scripts for data preprocessing, visualization, saving tflite models, etc.
input/: Datasets are stored here: Arman's old dataset and the new timeseries_data.
run_models.ipynb DEPRECATED: Runs Arman's old models and several Autokeras default models and compares them.
notebooks/ DEPRECATED: Old scripts.

In run_inception_tuner.ipynb, use the load_armans_dataset function and tweak the hyperparameters.

Run aws_downloader.ipynb or connect to the Nextcloud external data source.
In run_inception_tuner.ipynb, use the load_timeseries_dataset function and tweak the hyperparameters.
CURRENTLY THERE IS NOT ENOUGH DATA AND THERE ARE TOO FEW PARTICIPANTS TO TRAIN A SENSIBLE MODEL, BUT IT WORKS IN THEORY.

Prediction of 7 basic activities: ~93% accuracy and 92% F1 score.
The full sensor length is not needed. n_keep = 12 seems to be enough.
- Keep an eye on this parameter later; it might need to be increased if activities involve the upper back, neck or arms.
Scaling the data results in a worse model.
- Min-max scaling results in a garbage model.
- Standardization results in a slightly worse model (~2-5% lower accuracy).
- Z-score normalization (standardization) is still useful for more sensible feature importance.
The worst predictions are on the walkingUpstairs category.
- This might be because of the low amount of data available.
PCA with 10-15 components seems to work best (it might even improve the result a little).
FocalLoss (for skewed classes) performs a bit worse.
F1-Loss performs best, especially on the walkingUpstairs category (higher F1 score).
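
For reference, a differentiable F1 loss is often built from "soft" counts of true positives, false positives and false negatives computed on the predicted probabilities. The sketch below shows one common macro-averaged formulation in plain Python. It is an assumption about what F1-Loss means here, not the implementation used in this repo, and the name soft_f1_loss is hypothetical.

```python
def soft_f1_loss(y_true, y_prob, eps=1e-7):
    """Macro-averaged soft F1 loss (1 - soft F1). Hypothetical helper.

    y_true: list of one-hot rows, y_prob: list of predicted probability rows.
    """
    n_classes = len(y_true[0])
    f1_per_class = []
    for c in range(n_classes):
        # Soft counts use probabilities instead of hard 0/1 predictions.
        tp = sum(t[c] * p[c] for t, p in zip(y_true, y_prob))
        fp = sum((1 - t[c]) * p[c] for t, p in zip(y_true, y_prob))
        fn = sum(t[c] * (1 - p[c]) for t, p in zip(y_true, y_prob))
        f1_per_class.append(2 * tp / (2 * tp + fp + fn + eps))
    # Every class contributes equally, whatever its number of samples.
    return 1.0 - sum(f1_per_class) / n_classes
```

Because each class enters the average with the same weight, a rare class such as walkingUpstairs influences the loss as much as a frequent one, which may be one reason for the higher F1 score on that category.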

Create sensible train, validation and test sets. Which split?
- Smartphone model?
- Sensor model?
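
Whichever key is chosen, a split that keeps all recordings of one group (person, smartphone model or sensor model) in the same subset avoids leakage between the train, validation and test data. A minimal sketch in plain Python, assuming each sample is a dict that carries the grouping key; the field names, the split fractions and the function split_by_group are hypothetical and not part of this repo:

```python
import random

def split_by_group(samples, key, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split samples into train/val/test so that no group spans two subsets."""
    groups = sorted({s[key] for s in samples})
    random.Random(seed).shuffle(groups)
    n_train = round(fractions[0] * len(groups))
    n_val = round(fractions[1] * len(groups))
    # Assign whole groups, not single samples, to a subset.
    assignment = {}
    for i, group in enumerate(groups):
        if i < n_train:
            assignment[group] = "train"
        elif i < n_train + n_val:
            assignment[group] = "val"
        else:
            assignment[group] = "test"
    subsets = {"train": [], "val": [], "test": []}
    for s in samples:
        subsets[assignment[s[key]]].append(s)
    return subsets

# Toy data: 10 persons with 3 recordings each (made-up values).
samples = [{"person": p, "activity": "walking", "window": i}
           for p in range(10) for i in range(3)]
subsets = split_by_group(samples, key="person")
```

Passing a different key (for example a smartphone or sensor model field) would reuse the same logic for the other split candidates.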
Make sure the dataset does not get skewed (one activity or person dominating the data).
- Visualize every person's contribution for every activity.
Recognize movement and remove data recorded during standstill (e.g. waiting at traffic lights while cycling).
- Right now done by checking whether the standard deviation of acc x and z exceeds some threshold.
  - Good enough?
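
The thresholding idea can be sketched as follows: compute the standard deviation of acc x and acc z per window and keep only the windows where at least one of them exceeds a threshold. The window size, the threshold value, the "either axis" rule and the function name moving_windows are assumptions for illustration, not the values used in this repo.

```python
from statistics import pstdev

def moving_windows(acc_x, acc_z, window=50, threshold=0.05):
    """Return (start, end) index pairs of windows that show movement."""
    keep = []
    for start in range(0, len(acc_x) - window + 1, window):
        end = start + window
        std_x = pstdev(acc_x[start:end])
        std_z = pstdev(acc_z[start:end])
        # A window counts as movement if either axis varies enough.
        if std_x > threshold or std_z > threshold:
            keep.append((start, end))
    return keep

# Toy signal: 50 samples of standstill followed by 50 samples of movement.
acc_x = [0.0] * 50 + [(-1) ** i * 0.5 for i in range(50)]
acc_z = [1.0] * 50 + [1.0 + (-1) ** i * 0.3 for i in range(50)]
print(moving_windows(acc_x, acc_z))  # prints [(50, 100)]
```

A fixed threshold like this ignores slow drifts and sensor noise levels, so whether it is good enough probably depends on how the threshold is tuned per activity.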
Try different models with the help of fastai and tsai.
Create a sample dataset to iterate on more easily (~5-20% of the data).
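
One way to build such a sample is to draw the same fraction from every activity, so that the class balance of the full dataset is roughly preserved. A minimal sketch, assuming the samples are dicts with an activity field; the field name and the function sample_per_activity are hypothetical:

```python
import random
from collections import defaultdict

def sample_per_activity(samples, fraction=0.1, seed=0):
    """Draw `fraction` of the samples from each activity (at least one)."""
    by_activity = defaultdict(list)
    for s in samples:
        by_activity[s["activity"]].append(s)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    subset = []
    for activity in sorted(by_activity):
        items = by_activity[activity]
        n = max(1, round(fraction * len(items)))
        subset.extend(rng.sample(items, n))
    return subset

# Toy data: 100 walking and 20 cycling windows (made-up values).
samples = ([{"activity": "walking", "window": i} for i in range(100)]
           + [{"activity": "cycling", "window": i} for i in range(20)])
subset = sample_per_activity(samples, fraction=0.1)
```

Sampling at random from single windows mixes persons freely; if the sample is also used for evaluation, drawing whole persons instead may be the safer choice.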
Maybe recreate the dataset collection with 15 Hz.
Expand the hyperparameter search space.
Learn the differences between the tflite models.
- Which model improves performance?
- Is the model size essential?
- Inference speed
- The raw and quant models are the fastest right now.
Potential Cloud GPU Providers:
- Runpod
- VastAI
- Paperspace