Action Detection

Using YOLO to detect an action from an image or video

Download the YOLOv3 weights from the YOLO website.
Convert the Darknet YOLO model to a Keras model.
Run YOLO detection.

git clone https://github.com/ashish-roopan/Action-detection-Stanford40-yolov3-.git
wget https://pjreddie.com/media/files/yolov3.weights
python convert.py yolov3.cfg yolov3.weights model_data/yolo_weights.h5
python yolo_video.py [OPTIONS...] --image, for image detection mode, OR
python yolo_video.py [video_path] [output_path (optional)]

Training for action detection on Stanford40 dataset

1. Download the dataset from here and extract it to the root directory.

2. Create the folders train, test, train_ann, and test_ann.

3. Split the images into train and test sets by running datasplit.py.
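
As a rough illustration of what a train/test split involves, here is a minimal sketch. It is not the repository's datasplit.py, whose contents are not shown here. The function name `split_images`, the 80/20 ratio, the fixed seed, and the accepted file extensions are all assumptions made for the example.

```python
import random
import shutil
from pathlib import Path

# Hypothetical helper: NOT the repository's datasplit.py, only a sketch.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image extensions


def split_images(image_dir, train_dir, test_dir, test_fraction=0.2, seed=0):
    """Randomly copy images from image_dir into train_dir and test_dir.

    Returns a (train_count, test_count) tuple.
    """
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(images)

    n_test = int(len(images) * test_fraction)
    groups = ((Path(train_dir), images[n_test:]), (Path(test_dir), images[:n_test]))

    for dest, group in groups:
        dest.mkdir(parents=True, exist_ok=True)
        for img in group:
            # str() keeps this compatible with older Python 3 releases.
            shutil.copy2(str(img), str(dest / img.name))

    return len(images) - n_test, n_test
```

With the dataset images in some source folder, a call such as `split_images(source_folder, "train", "test")` would fill the two folders created in step 2. The source folder name depends on how the archive was extracted.
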

Generate your own annotation file and class names file, and save each as a .txt file.
The annotation file has one row per image.
Row format: image_file_path box1 box2 ... boxN
Box format: x_min,y_min,x_max,y_max,class_id (no spaces)
Here is an example:
```
path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3
path/to/img2.jpg 120,300,250,600,2
...
```
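
To make the row and box format above concrete, the following sketch parses one annotation row and writes it back out. The function names `parse_annotation_line` and `format_annotation_line` are hypothetical and not part of the repository. The sketch assumes integer coordinates and image paths without spaces.

```python
def parse_annotation_line(line):
    """Parse 'image_path box1 box2 ...' into (image_path, list of boxes).

    Each box is a tuple (x_min, y_min, x_max, y_max, class_id).
    """
    parts = line.split()
    if not parts:
        raise ValueError("empty annotation row")

    image_path = parts[0]
    boxes = []
    for field in parts[1:]:
        values = [int(v) for v in field.split(",")]
        if len(values) != 5:
            raise ValueError("expected 5 values per box, got {!r}".format(field))
        x_min, y_min, x_max, y_max, _class_id = values
        # Sanity check added for this sketch: a box must have positive area.
        if x_min >= x_max or y_min >= y_max:
            raise ValueError("degenerate box {!r}".format(field))
        boxes.append(tuple(values))
    return image_path, boxes


def format_annotation_line(image_path, boxes):
    """Inverse of parse_annotation_line: build one annotation row."""
    fields = [",".join(str(v) for v in box) for box in boxes]
    return " ".join([image_path] + fields)
```

Running every row of a generated annotation file through `parse_annotation_line` before training is a cheap way to catch stray spaces inside a box or a missing class_id.
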
Run convert2text.py to generate the annotations in the required format.
Make sure you have run python convert.py -w yolov3.cfg yolov3.weights model_data/yolo_weights.h5
The file model_data/yolo_weights.h5 is used to load pretrained weights.
Create a folder named logs and start training.
python train.py
To use your trained weights or checkpoint weights with yolo_video.py, pass the command line option --model model_file. Remember to also set the class path or anchor path, with --classes class_file and --anchors anchor_file.
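
Before pointing --classes and --anchors at custom files, it can help to check that they load as expected. The sketch below assumes a common layout that this README does not confirm: one class name per line in the class file, and a single line of comma-separated width,height values in the anchor file. The function names `load_class_names` and `load_anchors` are hypothetical.

```python
def load_class_names(path):
    """Read class names, assuming one name per line (blank lines skipped)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def load_anchors(path):
    """Read anchors, assuming one line such as '10,13, 16,30, 33,23'.

    Returns a list of (width, height) tuples.
    """
    with open(path) as f:
        values = [float(v) for v in f.readline().split(",")]
    if len(values) % 2 != 0:
        raise ValueError("anchor file must hold an even number of values")
    # Pair consecutive values: (w0, h0), (w1, h1), ...
    return list(zip(values[0::2], values[1::2]))
```

One useful check is that every class_id in the annotation file is smaller than `len(load_class_names(class_file))`.
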

If you want to use the original pretrained weights for YOLOv3:
1. wget https://pjreddie.com/media/files/darknet53.conv.74
2. rename it as darknet53.weights
3. python convert.py -w darknet53.cfg darknet53.weights model_data/darknet53_weights.h5
4. use model_data/darknet53_weights.h5 in train.py

Some issues to know

The test environment is
- Python 3.5.2
- Keras 2.1.5
- tensorflow 1.6.0
Default anchors are used. If you use your own anchors, some changes are probably needed.
The inference result is not exactly the same as Darknet's, but the difference is small.
The speed is slower than Darknet. Replacing PIL with OpenCV may help a little.
Always load pretrained weights and freeze layers in the first stage of training, or try Darknet training. It's OK if there is a mismatch warning.
The training strategy is for reference only. Adjust it according to your dataset and your goal, and add further strategies if needed.
To speed up training with frozen layers, train_bottleneck.py can be used. It first computes the bottleneck features of the frozen model and then trains only the last layers. This makes training on a CPU possible in a reasonable time. See this for more information on bottleneck features.
