monoloxo / PyTorch_YOWO

YOWO

Many thanks to the YOWO authors for open-sourcing their code. I reimplemented YOWO and reproduced its performance. On the AVA dataset, my reimplementation outperforms the official YOWO (20.6 vs. 17.9 frame mAP with clip length 16). I hope that a real-time action detector with such a simple structure and strong performance will draw your interest to the task of spatio-temporal action detection.

Requirements

  • We recommend using Anaconda to create a conda environment:
conda create -n yowo python=3.6
  • Then activate the environment:
conda activate yowo
  • Install the requirements:
pip install -r requirements.txt

Dataset

You can download UCF101-24 and JHMDB21 from the following links:

UCF101-24:

  • Google drive

Link: https://drive.google.com/file/d/1Dwh90pRi7uGkH5qLRjQIFiEmMJrAog5J/view?usp=sharing

  • BaiduYun Disk

Link: https://pan.baidu.com/s/11GZvbV0oAzBhNDVKXsVGKg

Password: hmu6

JHMDB21:

  • Google drive

Link: https://drive.google.com/file/d/15nAIGrWPD4eH3y5OTWHiUbjwsr-9VFKT/view?usp=sharing

  • BaiduYun Disk

Link: https://pan.baidu.com/s/1HSDqKFWhx_vF_9x6-Hb8jA

Password: tcjd

AVA

Follow the instructions here to prepare the AVA dataset.

Experiment

  • UCF101-24

Model         Clip  GFLOPs  Frame mAP (%)  Video mAP (%)  FPS  Weight
YOWO          16    43.8    80.4           48.8           -    -
YOWO (Ours)   16    43.8    84.9           50.5           36   github
YOWO-Nano     16    6.0     81.0           49.7           91   github

  • AVA v2.2

Model         Clip  mAP (%)  FPS  Weight
YOWO          16    17.9     33   -
YOWO          32    19.1     -    -
YOWO (Ours)   16    20.6     33   github
YOWO (Ours)   32                  github
YOWO-Nano     16    18.4     100  github
YOWO-Nano     32                  github
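
The UCF101-24 numbers imply a few derived quantities. The sketch below is plain arithmetic on the reported values, assuming FPS means frames processed per second one at a time (the benchmark hardware is not stated), so the per-frame times are rough averages rather than measured latencies:

```python
# Values copied from the UCF101-24 table; the hardware behind the FPS column is not stated.
OFFICIAL_YOWO_FRAME_MAP = 80.4

MODELS = {
    "YOWO (Ours)": {"gflops": 43.8, "frame_map": 84.9, "fps": 36},
    "YOWO-Nano": {"gflops": 6.0, "frame_map": 81.0, "fps": 91},
}


def ms_per_frame(fps: float) -> float:
    """Average time per frame implied by a throughput in frames per second."""
    return 1000.0 / fps


for name, m in MODELS.items():
    delta = m["frame_map"] - OFFICIAL_YOWO_FRAME_MAP
    print(f"{name}: {ms_per_frame(m['fps']):.1f} ms/frame, "
          f"frame mAP {delta:+.1f} vs. official YOWO")

gflops_ratio = MODELS["YOWO (Ours)"]["gflops"] / MODELS["YOWO-Nano"]["gflops"]
print(f"YOWO (Ours) needs {gflops_ratio:.1f}x the GFLOPs of YOWO-Nano")
```

This prints about 27.8 ms/frame for YOWO (Ours) and 11.0 ms/frame for YOWO-Nano, with a 7.3x compute ratio between them for a 3.9-point frame mAP difference.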

Train YOWO

  • UCF101-24
python train.py --cuda -d ucf24 -v yowo --num_workers 4 --eval_epoch 1 --eval

or you can just run the script:

sh train_ucf.sh
  • AVA
python train.py --cuda -d ava_v2.2 -v yowo --num_workers 4 --eval_epoch 1 --eval

or you can just run the script:

sh train_ava.sh
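
If you launch runs from Python (for example, to loop over datasets), the commands above can be assembled programmatically. This is a minimal sketch: `build_train_cmd` is a hypothetical helper that uses only the flags shown in this README, and the flag meanings (`-d` dataset, `-v` model version) are inferred from the examples rather than taken from `train.py`:

```python
from typing import List


def build_train_cmd(dataset: str, version: str = "yowo", num_workers: int = 4,
                    eval_epoch: int = 1, cuda: bool = True,
                    evaluate: bool = True) -> List[str]:
    """Hypothetical helper: rebuild the train.py command line shown in this README."""
    cmd = ["python", "train.py"]
    if cuda:
        cmd.append("--cuda")
    cmd += ["-d", dataset, "-v", version,
            "--num_workers", str(num_workers),
            "--eval_epoch", str(eval_epoch)]
    if evaluate:
        cmd.append("--eval")
    return cmd


for ds in ("ucf24", "ava_v2.2"):
    # Only printed here; pass the list to subprocess.run(cmd, check=True) to launch a run.
    print(" ".join(build_train_cmd(ds)))
```

Returning an argument list rather than a single string lets `subprocess.run` execute it without going through a shell.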

Test YOWO

  • UCF101-24
python test.py --cuda -d ucf24 -v yowo --weight path/to/weight --show
  • AVA
python test.py --cuda -d ava_v2.2 -v yowo --weight path/to/weight --show

Evaluate YOWO

  • UCF101-24
# Frame mAP
python eval.py \
        --cuda \
        -d ucf24 \
        -v yowo \
        -bs 8 \
        -size 224 \
        --weight path/to/weight \
        --cal_frame_mAP
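
Frame mAP@0.5 is conventionally computed per class: rank all detections by score, count a detection as a true positive if its best-overlapping ground-truth box in the same frame has IoU ≥ 0.5 and is not already matched, then integrate the precision-recall curve; frame mAP is the mean over classes. The sketch below uses PASCAL VOC-style all-point interpolation. Whether `eval.py` uses exactly this variant is an assumption, and `frame_ap` and `box_iou` are illustrative names:

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Detection = Tuple[int, float, Box]        # (frame_id, score, box)


def box_iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def frame_ap(dets: List[Detection], gts: Dict[int, List[Box]], iou_thr: float = 0.5) -> float:
    """AP for one class; gts maps frame_id -> ground-truth boxes of that class."""
    num_gt = sum(len(boxes) for boxes in gts.values())
    if num_gt == 0:
        return 0.0
    matched = {f: [False] * len(boxes) for f, boxes in gts.items()}
    recalls, precisions, tp = [], [], 0
    for rank, (frame_id, _score, box) in enumerate(sorted(dets, key=lambda d: -d[1]), start=1):
        ious = [box_iou(box, g) for g in gts.get(frame_id, [])]
        best = max(range(len(ious)), key=ious.__getitem__, default=-1)
        if best >= 0 and ious[best] >= iou_thr and not matched[frame_id][best]:
            matched[frame_id][best] = True  # true positive
            tp += 1
        # otherwise a false positive: no overlap, or the best GT box is already taken
        recalls.append(tp / num_gt)
        precisions.append(tp / rank)
    # Precision envelope: best precision at any recall >= the current one.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


gts = {0: [(0.0, 0.0, 10.0, 10.0)]}
dets = [(0, 0.9, (20.0, 20.0, 30.0, 30.0)),  # false positive ranked first
        (0, 0.8, (0.0, 0.0, 10.0, 10.0))]    # true positive
print(frame_ap(dets, gts))  # 0.5
```

A high-scoring false positive ranked above the only true positive halves the AP here, which is why per-class APs on hard classes can drop sharply.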

Our YOWO's result of frame mAP@0.5 IoU on UCF101-24:

AP: 85.25% (1)
AP: 96.94% (10)
AP: 78.58% (11)
AP: 68.61% (12)
AP: 78.98% (13)
AP: 94.92% (14)
AP: 90.00% (15)
AP: 77.44% (16)
AP: 75.82% (17)
AP: 91.07% (18)
AP: 97.16% (19)
AP: 62.71% (2)
AP: 93.22% (20)
AP: 79.16% (21)
AP: 80.07% (22)
AP: 76.10% (23)
AP: 92.49% (24)
AP: 86.29% (3)
AP: 76.99% (4)
AP: 74.89% (5)
AP: 95.74% (6)
AP: 93.68% (7)
AP: 93.71% (8)
AP: 97.13% (9)
mAP: 84.87%
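
The reported mAP is the unweighted mean of the 24 per-class APs, and the log prints class indices in string order (1, 10, 11, ..., 2, 20, ...). A small sketch, with `parse_frame_ap` as a hypothetical helper, that parses such lines back into numeric class order and recomputes the mean:

```python
import re
from statistics import mean
from typing import Dict

# Matches log lines such as "AP: 85.25% (1)"; the final "mAP: ..." line is not matched.
_AP_LINE = re.compile(r"^AP:\s*([0-9.]+)%\s*\((\d+)\)$")


def parse_frame_ap(log_text: str) -> Dict[int, float]:
    """Return {class_index: AP in %} ordered by numeric class index."""
    aps = {}
    for line in log_text.splitlines():
        m = _AP_LINE.match(line.strip())
        if m:
            aps[int(m.group(2))] = float(m.group(1))
    return dict(sorted(aps.items()))


log = """AP: 85.25% (1)
AP: 96.94% (10)
AP: 62.71% (2)"""
aps = parse_frame_ap(log)
print(list(aps))  # [1, 2, 10]
print(round(mean(aps.values()), 2))
```

Applied to all 24 lines above, the recomputed mean rounds to the logged 84.87%.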

Our YOWO-Nano's result of frame mAP@0.5 IoU on UCF101-24:

AP: 65.53% (1)
AP: 97.19% (10)
AP: 78.60% (11)
AP: 66.09% (12)
AP: 70.95% (13)
AP: 87.57% (14)
AP: 84.48% (15)
AP: 89.19% (16)
AP: 77.62% (17)
AP: 89.35% (18)
AP: 94.54% (19)
AP: 34.73% (2)
AP: 93.34% (20)
AP: 82.73% (21)
AP: 80.11% (22)
AP: 70.74% (23)
AP: 88.19% (24)
AP: 85.56% (3)
AP: 66.48% (4)
AP: 71.48% (5)
AP: 94.33% (6)
AP: 93.09% (7)
AP: 90.36% (8)
AP: 90.75% (9)
mAP: 80.96%

# Video mAP
python eval.py \
        --cuda \
        -d ucf24 \
        -v yowo \
        -bs 8 \
        -size 224 \
        --weight path/to/weight \
        --cal_video_mAP

Our YOWO's video mAP on UCF101-24 at IoU thresholds from 0.05 to 0.75 (the 0.5 IoU value is the one reported in the table above):

-------------------------------
V-mAP @ 0.05 IoU:
--Per AP:  [94.1, 99.64, 68.62, 97.44, 87.21, 100.0, 82.72, 100.0, 99.87, 96.08, 44.8, 92.43, 91.76, 100.0, 24.29, 92.53, 90.23, 96.55, 94.24, 63.46, 73.44, 51.48, 82.85, 88.67]
--mAP:  83.85
-------------------------------
V-mAP @ 0.1 IoU:
--Per AP:  [94.1, 97.37, 67.16, 97.44, 85.2, 100.0, 82.72, 100.0, 99.87, 96.08, 44.8, 92.43, 91.76, 100.0, 24.29, 92.53, 90.23, 96.55, 94.24, 63.46, 70.75, 51.48, 79.44, 88.67]
--mAP:  83.36
-------------------------------
V-mAP @ 0.2 IoU:
--Per AP:  [70.0, 97.37, 62.86, 89.47, 59.5, 100.0, 78.04, 100.0, 90.74, 96.08, 44.8, 92.43, 91.76, 100.0, 22.29, 92.53, 90.23, 96.55, 94.24, 58.8, 42.35, 48.03, 53.41, 88.67]
--mAP:  77.51
-------------------------------
V-mAP @ 0.3 IoU:
--Per AP:  [14.33, 48.86, 61.27, 76.36, 12.58, 87.34, 78.04, 100.0, 90.74, 93.28, 44.8, 89.89, 91.76, 100.0, 15.41, 92.53, 88.99, 96.55, 94.24, 51.4, 24.52, 42.89, 5.63, 78.64]
--mAP:  65.84
-------------------------------
V-mAP @ 0.5 IoU:
--Per AP:  [0.18, 1.9, 58.16, 33.87, 1.31, 44.26, 49.09, 100.0, 61.3, 91.23, 44.8, 70.06, 59.22, 100.0, 3.73, 92.53, 87.71, 89.53, 91.29, 45.06, 0.97, 20.94, 0.0, 65.41]
--mAP:  50.52
-------------------------------
V-mAP @ 0.75 IoU:
--Per AP:  [0.0, 0.0, 27.05, 0.0, 0.0, 0.56, 9.81, 69.56, 14.42, 31.74, 3.43, 29.46, 0.93, 48.21, 0.71, 61.32, 45.81, 16.04, 84.41, 14.2, 0.06, 0.96, 0.0, 35.95]
--mAP:  20.61
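
Video mAP matches whole detected tubes rather than single boxes, which is why it falls so steeply as the threshold rises (83.85 at 0.05 IoU down to 20.61 at 0.75). A widely used definition of spatio-temporal tube IoU multiplies the temporal IoU of the two frame spans by the mean per-frame box IoU over their temporal overlap. The sketch below implements that definition; that this repository's `eval.py` uses the same one is an assumption, and `tube_iou` is an illustrative name:

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Tube = Dict[int, Box]                     # frame index -> box


def box_iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def tube_iou(t1: Tube, t2: Tube) -> float:
    """Temporal IoU of the frame spans times mean box IoU over shared frames."""
    if not t1 or not t2:
        return 0.0
    s1, e1, s2, e2 = min(t1), max(t1), min(t2), max(t2)
    start, end = max(s1, s2), min(e1, e2)
    if start > end:
        return 0.0  # no temporal overlap at all
    temporal = (end - start + 1) / (max(e1, e2) - min(s1, s2) + 1)
    shared = [f for f in range(start, end + 1) if f in t1 and f in t2]
    if not shared:
        return 0.0
    spatial = sum(box_iou(t1[f], t2[f]) for f in shared) / len(shared)
    return temporal * spatial


gt = {f: (0.0, 0.0, 10.0, 10.0) for f in range(0, 10)}    # frames 0-9
det = {f: (0.0, 0.0, 10.0, 10.0) for f in range(5, 15)}   # frames 5-14
print(tube_iou(gt, det))  # 5 shared of 15 spanned frames -> 0.333...
```

Even with pixel-perfect boxes, this detection counts as correct at 0.3 IoU but not at 0.5, so temporal localization errors alone can explain much of the drop between thresholds.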

Our YOWO-Nano's video mAP on UCF101-24 at IoU thresholds from 0.05 to 0.75 (the 0.5 IoU value is the one reported in the table above):

-------------------------------
V-mAP @ 0.05 IoU:
--Per AP:  [82.6, 99.22, 65.57, 96.8, 83.21, 100.0, 79.01, 100.0, 97.19, 96.08, 44.73, 93.47, 91.15, 98.48, 23.33, 95.97, 91.44, 96.55, 93.81, 63.46, 70.45, 51.44, 87.88, 87.19]
--mAP:  82.88
-------------------------------
V-mAP @ 0.1 IoU:
--Per AP:  [82.6, 95.29, 65.57, 94.81, 83.21, 100.0, 79.01, 100.0, 97.19, 96.08, 44.73, 93.47, 91.15, 98.48, 23.33, 95.97, 91.44, 96.55, 93.81, 63.46, 67.26, 51.44, 80.33, 87.19]
--mAP:  82.18
-------------------------------
V-mAP @ 0.2 IoU:
--Per AP:  [50.67, 78.87, 63.91, 82.36, 50.96, 100.0, 79.01, 100.0, 87.87, 96.08, 44.73, 90.49, 91.15, 98.48, 21.79, 95.97, 91.44, 96.55, 93.81, 63.46, 44.19, 48.75, 34.85, 87.19]
--mAP:  74.69
-------------------------------
V-mAP @ 0.3 IoU:
--Per AP:  [9.19, 29.82, 60.21, 68.02, 16.21, 86.67, 74.23, 100.0, 87.87, 92.76, 44.73, 80.86, 91.15, 98.48, 14.07, 95.97, 91.44, 96.55, 93.81, 52.13, 24.71, 43.26, 5.53, 77.27]
--mAP:  63.96
-------------------------------
V-mAP @ 0.5 IoU:
--Per AP:  [0.0, 0.0, 58.56, 26.91, 5.7, 40.87, 56.73, 91.42, 58.24, 90.68, 44.73, 66.93, 54.1, 98.48, 5.71, 95.97, 86.61, 89.4, 91.0, 46.61, 0.66, 18.85, 0.0, 65.44]
--mAP:  49.73
-------------------------------
V-mAP @ 0.75 IoU:
--Per AP:  [0.0, 0.0, 21.81, 0.0, 0.0, 1.11, 7.33, 56.58, 7.69, 39.05, 9.47, 20.53, 0.0, 36.57, 2.25, 66.92, 32.27, 12.78, 69.46, 10.47, 0.04, 0.34, 0.0, 29.66]
--mAP:  17.68
  • AVA

Run the following command to calculate frame mAP@0.5 IoU:

python eval.py \
        --cuda \
        -d ava_v2.2 \
        -v yowo \
        --weight path/to/weight

Our YOWO's result of frame mAP@0.5 IoU on AVA-v2.2:

AP@0.5IOU/answer phone: 0.6200712155913068,
AP@0.5IOU/bend/bow (at the waist): 0.3684199174015223,
AP@0.5IOU/carry/hold (an object): 0.4368366146575504,
AP@0.5IOU/climb (e.g., a mountain): 0.006524045204733175,
AP@0.5IOU/close (e.g., a door, a box): 0.10121428961033546,
AP@0.5IOU/crouch/kneel: 0.14271053289648555,
AP@0.5IOU/cut: 0.011371656268128742,
AP@0.5IOU/dance: 0.3472742170664651,
AP@0.5IOU/dress/put on clothing: 0.05568205010936085,
AP@0.5IOU/drink: 0.18867980887744548,
AP@0.5IOU/drive (e.g., a car, a truck): 0.5727336663149236,
AP@0.5IOU/eat: 0.2438949290288357,
AP@0.5IOU/enter: 0.03631300073681878,
AP@0.5IOU/fall down: 0.16097137034226533,
AP@0.5IOU/fight/hit (a person): 0.35295156111441717,
AP@0.5IOU/get up: 0.1661305661768072,
AP@0.5IOU/give/serve (an object) to (a person): 0.08171070895093906,
AP@0.5IOU/grab (a person): 0.04786212215222141,
AP@0.5IOU/hand clap: 0.16502425129399353,
AP@0.5IOU/hand shake: 0.05668297330776857,
AP@0.5IOU/hand wave: 0.0019633474257698715,
AP@0.5IOU/hit (an object): 0.004926567809641652,
AP@0.5IOU/hug (a person): 0.14948677865170307,
AP@0.5IOU/jump/leap: 0.11724856806405773,
AP@0.5IOU/kiss (a person): 0.18323100733498285,
AP@0.5IOU/lie/sleep: 0.5566160853381206,
AP@0.5IOU/lift (a person): 0.05071348972423068,
AP@0.5IOU/lift/pick up: 0.02400509697339648,
AP@0.5IOU/listen (e.g., to music): 0.008846030334678949,
AP@0.5IOU/listen to (a person): 0.6111863505487993,
AP@0.5IOU/martial art: 0.35494188472527066,
AP@0.5IOU/open (e.g., a window, a car door): 0.13838582757710105,
AP@0.5IOU/play musical instrument: 0.17637146118119046,
AP@0.5IOU/point to (an object): 0.0030957935199989314,
AP@0.5IOU/pull (an object): 0.006138508972102678,
AP@0.5IOU/push (an object): 0.008798412014783267,
AP@0.5IOU/push (another person): 0.06436728640658615,
AP@0.5IOU/put down: 0.011691087258412239,
AP@0.5IOU/read: 0.23947763826955498,
AP@0.5IOU/ride (e.g., a bike, a car, a horse): 0.3573836844473405,
AP@0.5IOU/run/jog: 0.3893352170239517,
AP@0.5IOU/sail boat: 0.09309936689447072,
AP@0.5IOU/shoot: 0.006834072970687,
AP@0.5IOU/sing to (e.g., self, a person, a group): 0.08181910176202781,
AP@0.5IOU/sit: 0.7709624420964878,
AP@0.5IOU/smoke: 0.05268953989999123,
AP@0.5IOU/stand: 0.7668298075740738,
AP@0.5IOU/swim: 0.17407407407407408,
AP@0.5IOU/take (an object) from (a person): 0.0383472793429592,
AP@0.5IOU/take a photo: 0.025915711741497306,
AP@0.5IOU/talk to (e.g., self, a person, a group): 0.7390988530695071,
AP@0.5IOU/text on/look at a cellphone: 0.009139739938803557,
AP@0.5IOU/throw: 0.015058496300738047,
AP@0.5IOU/touch (an object): 0.3090900998192289,
AP@0.5IOU/turn (e.g., a screwdriver): 0.01904009620734998,
AP@0.5IOU/walk: 0.6288594756415645,
AP@0.5IOU/watch (a person): 0.6489390785120175,
AP@0.5IOU/watch (e.g., TV): 0.11913599687628156,
AP@0.5IOU/work on a computer: 0.18941724461502552,
AP@0.5IOU/write: 0.022696113047944347,
mAP@0.5IOU: 0.20553860351814546

Our YOWO-Nano's result of frame mAP@0.5 IoU on AVA-v2.2:

AP@0.5IOU/answer phone: 0.5639651669314073,
AP@0.5IOU/bend/bow (at the waist): 0.33601517221666766,
AP@0.5IOU/carry/hold (an object): 0.4208577802547332,
AP@0.5IOU/climb (e.g., a mountain): 0.015362037830534558,
AP@0.5IOU/close (e.g., a door, a box): 0.05856722579699733,
AP@0.5IOU/crouch/kneel: 0.16270710742985536,
AP@0.5IOU/cut: 0.03259447757034726,
AP@0.5IOU/dance: 0.19936510569452462,
AP@0.5IOU/dress/put on clothing: 0.01974443432453662,
AP@0.5IOU/drink: 0.09356501752959727,
AP@0.5IOU/drive (e.g., a car, a truck): 0.5698893029493408,
AP@0.5IOU/eat: 0.19427064247923537,
AP@0.5IOU/enter: 0.022437662936697852,
AP@0.5IOU/fall down: 0.1913729400012108,
AP@0.5IOU/fight/hit (a person): 0.33869826417910914,
AP@0.5IOU/get up: 0.11046598370903302,
AP@0.5IOU/give/serve (an object) to (a person): 0.04165150003199611,
AP@0.5IOU/grab (a person): 0.039442366284766966,
AP@0.5IOU/hand clap: 0.0511105021063975,
AP@0.5IOU/hand shake: 0.010261407092347795,
AP@0.5IOU/hand wave: 0.004008741526772979,
AP@0.5IOU/hit (an object): 0.00635673102300397,
AP@0.5IOU/hug (a person): 0.12071949962695369,
AP@0.5IOU/jump/leap: 0.04288684128713736,
AP@0.5IOU/kiss (a person): 0.1509158942914109,
AP@0.5IOU/lie/sleep: 0.49796421561453186,
AP@0.5IOU/lift (a person): 0.048965276424816656,
AP@0.5IOU/lift/pick up: 0.021571795788197068,
AP@0.5IOU/listen (e.g., to music): 0.008597518435883253,
AP@0.5IOU/listen to (a person): 0.5717068364857729,
AP@0.5IOU/martial art: 0.30153108495935566,
AP@0.5IOU/open (e.g., a window, a car door): 0.13374910597196993,
AP@0.5IOU/play musical instrument: 0.06300166361621182,
AP@0.5IOU/point to (an object): 0.0009608316917870056,
AP@0.5IOU/pull (an object): 0.006314960498212668,
AP@0.5IOU/push (an object): 0.007886200720014886,
AP@0.5IOU/push (another person): 0.04178496002131167,
AP@0.5IOU/put down: 0.009678644121314455,
AP@0.5IOU/read: 0.12988728095972746,
AP@0.5IOU/ride (e.g., a bike, a car, a horse): 0.35723030069750433,
AP@0.5IOU/run/jog: 0.3304660793110652,
AP@0.5IOU/sail boat: 0.09961189675108656,
AP@0.5IOU/shoot: 0.002028200868641035,
AP@0.5IOU/sing to (e.g., self, a person, a group): 0.07922409715996187,
AP@0.5IOU/sit: 0.769997196390207,
AP@0.5IOU/smoke: 0.027182118963007835,
AP@0.5IOU/stand: 0.7644546148083041,
AP@0.5IOU/swim: 0.34791666666666665,
AP@0.5IOU/take (an object) from (a person): 0.026775853194284386,
AP@0.5IOU/take a photo: 0.02549066470092448,
AP@0.5IOU/talk to (e.g., self, a person, a group): 0.7072203473798517,
AP@0.5IOU/text on/look at a cellphone: 0.007649665742978625,
AP@0.5IOU/throw: 0.02350848266675922,
AP@0.5IOU/touch (an object): 0.3272209015074646,
AP@0.5IOU/turn (e.g., a screwdriver): 0.01293785657008335,
AP@0.5IOU/walk: 0.5949790093227657,
AP@0.5IOU/watch (a person): 0.624513189952497,
AP@0.5IOU/watch (e.g., TV): 0.0817558010886299,
AP@0.5IOU/work on a computer: 0.14103543044480588,
AP@0.5IOU/write: 0.04247217386708656,
mAP@0.5IOU: 0.18390837880780497
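
The AVA logs print one `AP@0.5IOU/<class>: <value>` line per class, and several class names contain a slash themselves (e.g. `bend/bow (at the waist)`), so naively splitting on `/` would break them. A sketch, with `parse_ava_log` and `top_k` as hypothetical helpers, that strips only the fixed prefix and can be used to rank classes or compare the two runs above:

```python
import re
from typing import Dict, List, Tuple

# "AP@0.5IOU/<class name>: <value>," ; the class name may itself contain "/".
_AP_LINE = re.compile(r"^AP@0\.5IOU/(?P<name>.+): (?P<ap>[0-9.eE+-]+),?$")
_MAP_LINE = re.compile(r"^mAP@0\.5IOU: (?P<map>[0-9.eE+-]+)$")


def parse_ava_log(text: str) -> Tuple[Dict[str, float], float]:
    """Return ({class name: AP}, reported mAP or NaN if absent)."""
    per_class: Dict[str, float] = {}
    reported = float("nan")
    for line in text.splitlines():
        line = line.strip()
        m = _AP_LINE.match(line)
        if m:
            per_class[m.group("name")] = float(m.group("ap"))
            continue
        m = _MAP_LINE.match(line)
        if m:
            reported = float(m.group("map"))
    return per_class, reported


def top_k(per_class: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Classes with the highest AP, best first."""
    return sorted(per_class.items(), key=lambda kv: kv[1], reverse=True)[:k]


# A three-class excerpt of the YOWO log above.
sample = """AP@0.5IOU/answer phone: 0.6200712155913068,
AP@0.5IOU/bend/bow (at the waist): 0.3684199174015223,
AP@0.5IOU/sit: 0.7709624420964878,"""
per_class, _ = parse_ava_log(sample)
print(top_k(per_class, 2))  # sit first, then answer phone
```

Parsing both runs the same way and subtracting per class shows where the two models differ most, e.g. `swim` is higher in the second run while `hand clap` is much lower.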