HackSheffield project
19-20/3/2016
Sensor data from the real world can be extremely noisy. Making sense of tens of thousands of entries by hand is not an option. Our approach combines rule-based filtering with a simple neural network to map the entries to the proper activities.
The initial dataset has many mislabeled instances. We use a rule-based pre-processing routine to filter out the meaningless data points. A simple neural network is then trained to classify the entries into 6 categories: IN_VEHICLE, ON_BICYCLE, STILL, UNKNOWN, TILTING, WALKING. The final goal is to visualise this new, improved information on a front-end.
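To give an idea of what a rule-based filter over the CSV entries might look like, here is a minimal sketch. The column names (`timestamp`, `activity`, `confidence`), the `MIN_CONFIDENCE` threshold and the two rules are all assumptions made for illustration; the project's actual rules and data layout are not described here.

```python
import csv
import io

# The six categories named in the write-up.
ACTIVITIES = {"IN_VEHICLE", "ON_BICYCLE", "STILL", "UNKNOWN", "TILTING", "WALKING"}

# Hypothetical threshold, chosen only for this example.
MIN_CONFIDENCE = 50


def keep_row(row):
    """Return True if a row passes the two illustrative rules."""
    # Rule 1 (assumed): the label must be one of the known activities.
    if row.get("activity") not in ACTIVITIES:
        return False
    # Rule 2 (assumed): the confidence must be a number above the threshold.
    try:
        confidence = int(row.get("confidence"))
    except (TypeError, ValueError):
        return False
    return confidence >= MIN_CONFIDENCE


def filter_rows(csv_text):
    """Parse CSV text and keep only the rows that pass keep_row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if keep_row(row)]


# Made-up sample rows, not project data.
SAMPLE = """timestamp,activity,confidence
1,WALKING,92
2,STILL,12
3,FLYING,99
4,IN_VEHICLE,abc
5,ON_BICYCLE,75
"""

cleaned = filter_rows(SAMPLE)
print([row["activity"] for row in cleaned])
```

Keeping each rule as a plain check inside one function makes it easy to add or drop rules while exploring the data.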
The data is in CSV format, and the pre-processing is done using simple Python scripts and subroutines. The neural net is written in Python and consists of an input layer fed directly into an output layer. The activation function is a simple max function. Noise was introduced to try to reduce the effect of the skew in the classes composing the dataset.
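A network with an input layer fed straight into an output layer can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the "max" activation is read here as picking the class with the largest output score, the training rule is a perceptron-style update (the actual rule used in the project is not stated), and the two features and toy samples are made up.

```python
CLASSES = ["IN_VEHICLE", "ON_BICYCLE", "STILL", "UNKNOWN", "TILTING", "WALKING"]


def scores(weights, biases, x):
    """One output score per class: weighted sum of the inputs plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def predict(weights, biases, x):
    """'Max' activation, read here as: return the index of the largest score."""
    s = scores(weights, biases, x)
    return s.index(max(s))


def train(samples, n_features, epochs=20, lr=1.0):
    """Perceptron-style training (an assumption, not the project's rule)."""
    weights = [[0.0] * n_features for _ in CLASSES]
    biases = [0.0] * len(CLASSES)
    for _ in range(epochs):
        for x, target in samples:
            guess = predict(weights, biases, x)
            if guess != target:
                # Pull the correct class towards x, push the wrong guess away.
                for i, xi in enumerate(x):
                    weights[target][i] += lr * xi
                    weights[guess][i] -= lr * xi
                biases[target] += lr
                biases[guess] -= lr
    return weights, biases


# Made-up toy features; not project data.
toy = [
    ([9.0, 0.2], CLASSES.index("IN_VEHICLE")),
    ([0.0, 0.0], CLASSES.index("STILL")),
    ([1.0, 1.5], CLASSES.index("WALKING")),
]
weights, biases = train(toy, n_features=2)
predictions = [CLASSES[predict(weights, biases, x)] for x, _ in toy]
print(predictions)
```

With no hidden layer, the model can only draw linear boundaries between classes, which is one reason clean input data matters so much for this kind of network.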
The dataset, being real-world sensor data, was skewed towards certain activities. This means that some classes were overrepresented during the learning process and dominated the network.
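The skew can be measured by counting labels, and one possible way to use noise against it is to pad the smaller classes with slightly jittered copies of their own samples. The sketch below shows that idea only. How noise was actually applied in the project is not documented here, so `oversample_with_noise`, the `sigma` value and the toy data are all hypothetical.

```python
import random
from collections import Counter


def class_shares(labels):
    """Fraction of the dataset taken up by each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def oversample_with_noise(samples, sigma=0.05, seed=0):
    """Add jittered copies of minority samples until every class
    has as many samples as the largest one (illustrative only)."""
    rng = random.Random(seed)
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    target = max(len(rows) for rows in by_label.values())
    balanced = list(samples)
    for label, rows in by_label.items():
        for _ in range(target - len(rows)):
            base = rng.choice(rows)
            # Gaussian jitter so the copies are not exact duplicates.
            noisy = [value + rng.gauss(0.0, sigma) for value in base]
            balanced.append((noisy, label))
    return balanced


# Made-up toy data in which STILL dominates.
toy = [([0.0, 0.1], "STILL")] * 8 + [([1.2, 1.4], "WALKING")] * 2
before = class_shares([label for _, label in toy])
after = Counter(label for _, label in oversample_with_noise(toy))
print(before, dict(after))
```

Checking the class shares before training is a cheap way to spot which activities are likely to dominate the network.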
The network shows a certain degree of learning, and the visualisation shows that pre-processing is useful to a certain extent.
Expect some serious noise in real-world data. Sensors are cool, but take them with a pinch of salt. Unsupervised learning is hard but super-cool.
Next step: introducing techniques to reduce the effect of skew in the data.
[Data for this project is covered by non-disclosure agreement and cannot be distributed.]