Source: Utilizing the Kaggle Python Docker Container image
0. Create data folder
The Docker container will mount this folder as its working directory (/tmp/working).
mkdir data
1. Run the container based on the kaggle/python image:
docker run --restart always -v ${PWD}/data:/tmp/working -w=/tmp/working -p 8800:8888 --name kaggle \
-d kaggle/python jupyter notebook --no-browser --ip="0.0.0.0" --notebook-dir=/tmp/working --allow-root
2. Access the log to get the http token for accessing Jupyter:
docker logs kaggle
CURRENT TOKEN: 40119a2f87c125c72f7603945ca6b1561e0fb9ed45929234
For example:
http://640b804c545b:8888/?token=8e28bf1201d83f3f43521fba4b0cf382107781a4955ecf93&token=8e28bf1201d83f3f43521fba4b0cf382107781a4955ecf93
- Replace 640b804c545b with localhost or the IP of the machine where the Kaggle image is running.
- Replace port 8888 (container) with 8800 (host).
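The two replacements above can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name host_url is hypothetical, and collapsing the repeated token parameter into a single one is an assumption about what Jupyter accepts, not something the log output states.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs

def host_url(logged_url, host="localhost", host_port=8800):
    """Rewrite a URL printed in the container log so it can be opened from the host.

    host_url is a hypothetical helper; host_port=8800 matches the -p 8800:8888 mapping.
    """
    parts = urlsplit(logged_url)
    # The log may repeat the token parameter; keep only the first value (assumption).
    token = parse_qs(parts.query)["token"][0]
    return urlunsplit((parts.scheme, f"{host}:{host_port}", parts.path, f"token={token}", ""))

logged = (
    "http://640b804c545b:8888/?token=8e28bf1201d83f3f43521fba4b0cf382107781a4955ecf93"
    "&token=8e28bf1201d83f3f43521fba4b0cf382107781a4955ecf93"
)
print(host_url(logged))
```

Pass host="<machine IP>" instead of the default when Jupyter runs on a remote machine.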
All of the steps above can be run with the bash script ./kaggle.sh
Using the Jupyter token
In the http line above:
token=40119a2f87c125c72f7603945ca6b1561e0fb9ed45929234
Note: it is not clear why the following procedure does not set the password.
To set a password for accessing Jupyter, launch the container and then go to (using host port 8800 from step 1):
http://localhost:8800
Enter your token and choose a new password.
3. Open a shell in the container (via docker exec rather than SSH)
docker exec -it kaggle bash
4. ANOMALY DETECTION ANALYSIS
S1.A [./] Z-score for anomaly detection
Source tutorial: Z-score for anomaly detection
DATASET: Gearbox fault raw signals ./input/gearbox-fault-diagnosis/
Notebook:
Zscore.GearboxFault-anomaly_detection.ipynb
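As a rough illustration of the Z-score approach, the sketch below flags samples whose absolute z-score exceeds a threshold. It runs on synthetic data, not the gearbox signals; the function name zscore_outliers and the threshold of 3 are assumptions, not values taken from the notebook.

```python
import numpy as np

def zscore_outliers(signal, threshold=3.0):
    """Return a boolean mask marking samples whose |z-score| exceeds threshold."""
    signal = np.asarray(signal, dtype=float)
    z = (signal - signal.mean()) / signal.std()
    return np.abs(z) > threshold

# Synthetic stand-in for a vibration signal, with one injected spike.
rng = np.random.default_rng(0)
signal = rng.normal(loc=0.0, scale=1.0, size=1000)
signal[100] = 12.0

mask = zscore_outliers(signal)
print(np.flatnonzero(mask))  # indices of flagged samples
```

Because the mean and standard deviation are computed on the same data that contains the outliers, large faults inflate the standard deviation and can hide smaller ones.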
S1.B [./MultivariateGaussian] Multivariate Gaussian Analysis
Source tutorial: Wondering how to build an anomaly detection model?
- Gists here https://gist.github.com/abhishek-Kumar009
- Dataset from Github
- [WITH ERRORS, SOLVED IN S1.D]
DATASET: Servers' throughput (mb/s) & latency (ms): anomalyData.mat & anomalyDataTest.mat
Notebook:
MGD.server-anomaly_detection.ipynb
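The general idea of multivariate Gaussian anomaly detection can be sketched as follows: fit a mean and covariance to normal data, then flag points whose density falls below a threshold epsilon. The data here is synthetic (it does not come from anomalyData.mat), the function names are hypothetical, and taking epsilon as the 1% quantile of the training log-density is an assumption; the tutorial may select epsilon differently.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate the mean vector and covariance matrix of the rows of X."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def log_density(X, mu, sigma):
    """Log of the multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mu
    # Squared Mahalanobis distance, using solve() instead of an explicit inverse.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

# Synthetic stand-in for (throughput, latency) measurements.
rng = np.random.default_rng(0)
train = rng.multivariate_normal([15.0, 15.0], [[1.5, 0.3], [0.3, 1.2]], size=500)
mu, sigma = fit_gaussian(train)

# Assumed rule: anything less likely than 99% of the training points is an anomaly.
epsilon = np.quantile(log_density(train, mu, sigma), 0.01)

test = np.array([[15.2, 14.8], [25.0, 5.0]])
flags = log_density(test, mu, sigma) < epsilon
print(flags)
```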
S1.C [./MultivariateGaussian] Gaussian Mixture Models
Source tutorial: Anomaly Detection in Python with Gaussian Mixture Models
The results show that the Multivariate Gaussian model (S1.B) does not perform very well.
- Same dataset as above
Notebook:
GaussianMixtureModels.server-anomaly_detection.ipynb
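For orientation, a Gaussian mixture can be used for anomaly detection by fitting it to the data and flagging points with a low log-likelihood. The sketch below uses scikit-learn's GaussianMixture on synthetic two-cluster data; the number of components and the 1% quantile threshold are assumptions for illustration, not the settings used in the notebook.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data with two operating regimes (not the server dataset).
rng = np.random.default_rng(0)
cluster_a = rng.normal([5.0, 5.0], 0.5, size=(200, 2))
cluster_b = rng.normal([12.0, 14.0], 0.8, size=(200, 2))
X = np.vstack([cluster_a, cluster_b])

# n_components=2 is an assumption that matches the synthetic data above.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# score_samples returns the log-likelihood of each point under the mixture.
threshold = np.quantile(gmm.score_samples(X), 0.01)  # assumed cut-off

candidates = np.array([[5.1, 4.9], [8.5, 20.0]])
is_anomaly = gmm.score_samples(candidates) < threshold
print(is_anomaly)
```

Unlike a single Gaussian, the mixture can model several separate clusters of normal behaviour, which is one possible reason it fits this kind of data better.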
S1.D [./MultivariateGaussian] Gaussian Mixture Models
Same source tutorial as above
- This is the cleaned-up and improved version: the GMM is tested against a validation set
- Same dataset as above
Notebook:
server-gaussianmodel-anomalydetection.ipynb
S2.A Detecting outliers using KNN algorithm: Gearbox dataset
Source tutorial: the gearbox dataset requires computing the standard deviation over equal-size samples of the acceleration signal
- Using the Python pyod package for KNN analysis
Notebook with dummy data:
pyodKNN.DummyDataset-anomaly_detection.ipynb
DATASET: Gearbox fault, standard deviation of equal-size samples of the acceleration signal ./input/gearbox-fault-diagnosis/stdev/
This dataset is computed here:
dataset.GearboxFault-stdev.ipynb
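Computing the standard deviation over equal-size samples could look roughly like the sketch below. The function name windowed_std and the window size are hypothetical, and dropping the incomplete tail window is an assumption; the notebook may handle it differently.

```python
import numpy as np

def windowed_std(signal, window):
    """Split signal into consecutive windows of equal size and return each window's std."""
    signal = np.asarray(signal, dtype=float)
    n_windows = signal.size // window
    trimmed = signal[: n_windows * window]  # drop the incomplete tail window (assumption)
    return trimmed.reshape(n_windows, window).std(axis=1)

# Tiny synthetic signal: a flat window, an oscillating window, and a 3-sample tail.
signal = np.concatenate([np.ones(4), np.array([0.0, 2.0, 0.0, 2.0]), np.zeros(3)])
features = windowed_std(signal, window=4)
print(features)
```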
Notebook:
pyodKNN.GearboxFault-anomaly_detection.ipynb
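The notebooks rely on pyod for the KNN detector. To show the underlying idea without depending on pyod, the sketch below scores each point by the distance to its k-th nearest neighbour in plain NumPy. This is not pyod's API; the function name knn_scores, k=5, and the 99% quantile threshold are all assumptions for illustration.

```python
import numpy as np

def knn_scores(X, k=5):
    """Outlier score = distance from each point to its k-th nearest neighbour."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # After sorting each row, column 0 is the zero distance to the point itself.
    return np.sort(dist, axis=1)[:, k]

# Synthetic data with one injected outlier at index 200.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
X = np.vstack([X, [[8.0, 8.0]]])

scores = knn_scores(X, k=5)
threshold = np.quantile(scores, 0.99)  # assumed contamination of about 1%
outliers = np.flatnonzero(scores > threshold)
print(outliers)
```

The pairwise distance matrix needs memory quadratic in the number of samples, so this form only suits small datasets.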
S2.B: NASA Bearing Dataset: EDA & PCA analysis
Uses the dataset of experiment 2. The raw data is processed in Kaggle:
Notebook:
NASAbearingDataset-EDA_PCA.ipynb
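As a reminder of what the PCA step does, here is a minimal sketch of PCA via singular value decomposition on synthetic four-channel data. It does not use the NASA bearing data, the function name pca is hypothetical, and skipping feature scaling is an assumption; the notebook may standardize the channels first.

```python
import numpy as np

def pca(X, n_components=2):
    """Project X onto its first principal components using SVD."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:n_components]
    explained = (S ** 2) / (S ** 2).sum()  # explained variance ratio per component
    return centered @ components.T, components, explained[:n_components]

# Four synthetic channels driven by two latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing + 0.01 * rng.normal(size=(300, 4))

scores, components, ratio = pca(X)
print(scores.shape, ratio.sum())
```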
S2.C Detecting outliers using KNN algorithm: NASA bearing dataset
- Applying the same analysis as in S2.A
- Same dataset as in S2.B
- Analysis is based on the PCA dimensionality reduction computed in S2.B
Notebook:
NASAbearingDataset-pyodKNN.ipynb
S3 [./PCA-MDistance_outliers_detection] Detecting outliers using PCA plus Mahalanobis distance
One notebook for each experiment:
- nasabearingdataset-pca-outliers-detection_SetNo1.ipynb for Set No.1
- nasabearingdataset-pca-outliers-detection_SetNo2.ipynb for Set No.2
- nasabearingdataset-pca-outliers-detection_SetNo3.ipynb for Set No.3
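The Mahalanobis step can be sketched as follows: estimate the mean and covariance from data representing healthy operation, then flag new points whose Mahalanobis distance exceeds a threshold. The data below is a synthetic stand-in for PCA scores, and the threshold of three times the mean training distance is an assumption, not necessarily the rule used in the notebooks.

```python
import numpy as np

def mahalanobis(X, mean, cov):
    """Mahalanobis distance of each row of X from the given mean and covariance."""
    diff = X - mean
    sq = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return np.sqrt(sq)

# Synthetic stand-in for the 2-D PCA scores of healthy operation.
rng = np.random.default_rng(0)
healthy = rng.normal(size=(500, 2))
mean = healthy.mean(axis=0)
cov = np.cov(healthy, rowvar=False)

# Assumed rule: flag anything farther than 3x the mean training distance.
threshold = 3.0 * mahalanobis(healthy, mean, cov).mean()

new = np.array([[0.1, -0.2], [6.0, 6.0]])
flags = mahalanobis(new, mean, cov) > threshold
print(flags)
```

Unlike the Euclidean distance, the Mahalanobis distance accounts for the spread and correlation of the components.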