
EfficientDet

[1] Mingxing Tan, Ruoming Pang, Quoc V. Le. EfficientDet: Scalable and Efficient Object Detection. CVPR 2020. arXiv: https://arxiv.org/abs/1911.09070

Updates:

  • Apr 22: Sped up end-to-end latency: D0 now reaches more than 200 FPS throughput on a Tesla V100.
  • Apr 1: Updated results for test-dev and added EfficientDet-D7.
  • Mar 26: Fixed a few bugs and updated all checkpoints/results.
  • Mar 24: Added a tutorial with visualization and COCO eval.
  • Mar 13: Released the initial code and models.

Quick start tutorial: tutorial.ipynb

Quick install dependencies: pip install -r requirements.txt

1. About EfficientDet Models

EfficientDets are a family of object detection models that achieve state-of-the-art 52.6 mAP on COCO test-dev while being 4x to 9x smaller and using 13x to 42x fewer FLOPs than previous detectors. Our models also run 2x to 4x faster on GPU and 5x to 11x faster on CPU than other detectors.

EfficientDets build on three key components: an efficient backbone, a new BiFPN, and a new compound scaling technique:

  • Backbone: we employ EfficientNets as our backbone networks.
  • BiFPN: we propose BiFPN, a bi-directional feature network with fast normalized fusion, which enables easy and fast multi-scale feature fusion (see the sketch after this list).
  • Scaling: we use a single compound scaling factor to govern the depth, width, and resolution of the backbone, feature, and prediction networks.
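
A minimal sketch of the fast normalized fusion used inside BiFPN (this is the formula from the paper; in the real model the scalar fusion weights are learned per input):

import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    # O = sum_i (w_i / (eps + sum_j w_j)) * I_i, with w_i >= 0 enforced by ReLU.
    w = np.maximum(np.asarray(weights, dtype=np.float32), 0.0)
    w = w / (eps + w.sum())  # normalize without the cost of a softmax
    return sum(wi * f for wi, f in zip(w, features))

# Fuse two same-shape feature maps with scalar weights (learned in the real model).
f1, f2 = np.ones((8, 8, 64)), 2 * np.ones((8, 8, 64))
print(fast_normalized_fusion([f1, f2], [0.3, 0.7]).mean())  # ~1.7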

Our model family starts from EfficientDet-D0, which has accuracy comparable to YOLOv3. We then scale up this baseline with our compound scaling method to obtain a family of detection models, EfficientDet-D1 through D7, with different trade-offs between accuracy and model complexity.
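
As a concrete illustration, the compound scaling rules from the paper can be written down directly. This is a sketch only: the released configs round these values and deviate for the largest models (e.g. the D5-D7 input sizes):

def efficientdet_scaling(phi):
    # Compound scaling rules from the paper (a sketch, not the released configs).
    return {
        'bifpn_width': int(64 * (1.35 ** phi)),  # W_bifpn = 64 * 1.35^phi
        'bifpn_depth': 3 + phi,                  # D_bifpn = 3 + phi
        'head_depth': 3 + phi // 3,              # D_box = D_class = 3 + floor(phi/3)
        'image_size': 512 + 128 * phi,           # R_input = 512 + 128 * phi
    }

for phi in range(5):
    print(f'D{phi}:', efficientdet_scaling(phi))  # D0 -> width 64, depth 3, size 512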

** For simplicity, we compare whole detectors here. For more comparisons of FPN/NAS-FPN/BiFPN, please see Table 4 of our paper.

2. Pretrained EfficientDet Checkpoints

We have provided a list of EfficientDet checkpoints and results as follows:

| Model | APval | APtest | AP50 | AP75 | APS | APM | APL | #params | #FLOPs |
|-------|-------|--------|------|------|-----|-----|-----|---------|--------|
| EfficientDet-D0 (ckpt, val, test-dev) | 33.5 | 33.8 | 52.2 | 35.8 | 12.0 | 38.3 | 51.2 | 3.9M | 2.54B |
| EfficientDet-D1 (ckpt, val, test-dev) | 39.1 | 39.6 | 58.6 | 42.3 | 17.9 | 44.3 | 56.0 | 6.6M | 6.10B |
| EfficientDet-D2 (ckpt, val, test-dev) | 42.5 | 43.0 | 62.3 | 46.2 | 22.5 | 47.0 | 58.4 | 8.1M | 11.0B |
| EfficientDet-D3 (ckpt, val, test-dev) | 45.9 | 45.8 | 65.0 | 49.3 | 26.6 | 49.4 | 59.8 | 12.0M | 24.9B |
| EfficientDet-D4 (ckpt, val, test-dev) | 49.0 | 49.4 | 69.0 | 53.4 | 30.3 | 53.2 | 63.2 | 20.7M | 55.2B |
| EfficientDet-D5 (ckpt, val, test-dev) | 50.5 | 50.7 | 70.2 | 54.7 | 33.2 | 53.9 | 63.2 | 33.7M | 135.4B |
| EfficientDet-D6 (ckpt, val, test-dev) | 51.3 | 51.7 | 71.2 | 56.0 | 34.1 | 55.2 | 64.1 | 51.9M | 225.6B |
| EfficientDet-D7 (ckpt, val, test-dev) | 52.1 | 52.6 | 71.6 | 56.9 | 35.3 | 55.9 | 65.0 | 51.9M | 324.8B |

** val denotes COCO val2017 results; test-dev denotes COCO test-dev2017 results. APval is validation accuracy; all other AP results in the table are for COCO test-dev2017. All accuracy numbers are for single-model, single-scale inference without ensembling or test-time augmentation. All checkpoints are trained with baseline preprocessing (no AutoAugment).
** EfficientDet-D0 to D6 are trained for 300 epochs; EfficientDet-D7 is trained for 500 epochs.

3. Export SavedModel, frozen graph, TensorRT models, or TFLite.

Run the following command line to export models:

!rm  -rf savedmodeldir
!python model_inspect.py --runmode=saved_model --model_name=efficientdet-d0 \
  --ckpt_path=efficientdet-d0 --saved_model_dir=savedmodeldir \
  --tensorrt=FP16  --tflite_path=efficientdet-d0.tflite

Then you will get:

  • saved model under savedmodeldir/
  • frozen graph with name savedmodeldir/efficientdet-d0_frozen.pb
  • TensorRT saved model under savedmodeldir/tensorrt_fp16/ (matching the --tensorrt=FP16 flag above)
  • tflite file with name efficientdet-d0.tflite

Notably, --tflite_path only works with tf-nightly builds after 2.3.0-dev20200521.
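
Once exported, the TFLite file can be exercised with the standard TF Lite interpreter. A minimal sketch, assuming TF 2.x; the input shape and dtype are read from the model rather than assumed:

import numpy as np
import tensorflow as tf

# Load the exported TFLite model and run one inference on a dummy input.
interpreter = tf.lite.Interpreter(model_path='efficientdet-d0.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
outs = interpreter.get_output_details()
dummy = np.zeros(inp['shape'], dtype=inp['dtype'])  # replace with a real image batch
interpreter.set_tensor(inp['index'], dummy)
interpreter.invoke()
print([interpreter.get_tensor(o['index']).shape for o in outs])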

4. Benchmark model latency.

There are two types of latency: network latency and end-to-end latency.

(1) To measure the network latency (from the first conv to the last class/box prediction output), use the following command:

!python model_inspect.py --runmode=bm --model_name=efficientdet-d0

** add --hparams="precision=mixed-float16" if running on V100.

On a single Tesla V100 without TensorRT, our D0 network (no pre/post-processing) achieves 134 FPS (frames per second) for batch size 1, and 238 FPS for batch size 8.

(2) To measure the end-to-end latency (from the input image to the final rendered image, including image preprocessing, the network, postprocessing, and NMS), use the following command:

!rm  -rf /tmp/benchmark/
!python model_inspect.py --runmode=saved_model --model_name=efficientdet-d0 \
  --ckpt_path=efficientdet-d0 --saved_model_dir=/tmp/benchmark/

!python model_inspect.py --runmode=saved_model_benchmark \
  --saved_model_dir=/tmp/benchmark/efficientdet-d0_frozen.pb \
  --model_name=efficientdet-d0  --input_image=testdata/img1.jpg  \
  --output_image_dir=/tmp/

On a single Tesla V100 without TensorRT, our end-to-end latency and throughput are:

| Model | mAP | batch-1 latency | batch-1 throughput | batch-8 throughput |
|-------|-----|-----------------|--------------------|--------------------|
| EfficientDet-D0 | 33.8 | 10.2 ms | 97 fps | 209 fps |
| EfficientDet-D1 | 39.6 | 13.5 ms | 74 fps | 140 fps |
| EfficientDet-D2 | 43.0 | 17.7 ms | 57 fps | 97 fps |
| EfficientDet-D3 | 45.8 | 29.0 ms | 35 fps | 58 fps |
| EfficientDet-D4 | 49.4 | 42.8 ms | 23 fps | 35 fps |
| EfficientDet-D5 | 50.7 | 72.5 ms | 14 fps | 18 fps |

** FPS means frames per second (or images/second).
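
As a quick sanity check on the table, batch-1 throughput is approximately the inverse of batch-1 latency, while batching trades latency for throughput:

# Batch-1 throughput should be roughly 1000 ms / batch-1 latency.
latency_ms = {'D0': 10.2, 'D1': 13.5, 'D2': 17.7, 'D3': 29.0, 'D4': 42.8, 'D5': 72.5}
for model, ms in latency_ms.items():
    print(f'{model}: ~{1000 / ms:.0f} fps')  # D0: ~98 fps vs. 97 fps measured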

5. Inference for images.

# Step 0: download the model and a test image.
!export MODEL=efficientdet-d0
!export CKPT_PATH=efficientdet-d0
!wget https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco/${MODEL}.tar.gz
!wget https://user-images.githubusercontent.com/11736571/77320690-099af300-6d37-11ea-9d86-24f14dc2d540.png -O img.png
!tar xf ${MODEL}.tar.gz

# Step 1: export saved model.
!python model_inspect.py --runmode=saved_model \
  --model_name=efficientdet-d0 --ckpt_path=efficientdet-d0 \
  --hparams="image_size=1920x1280" \
  --saved_model_dir=/tmp/saved_model

# Step 2: do inference with saved model.
!python model_inspect.py --runmode=saved_model_infer \
  --model_name=efficientdet-d0  \
  --saved_model_dir=/tmp/saved_model  \
  --input_image=img.png --output_image_dir=/tmp/
# you can visualize the output /tmp/0.jpg
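
To view the rendered result programmatically (a minimal sketch using Pillow; the path follows from --output_image_dir above):

from PIL import Image

Image.open('/tmp/0.jpg').show()  # detections rendered by the previous step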

Alternatively, if you want to run inference with the frozen graph instead of the saved model, you can run:

# Steps 0 and 1 are the same as before.
# Step 2: do inference with frozen graph.
!python model_inspect.py --runmode=saved_model_infer \
  --model_name=efficientdet-d0  \
  --saved_model_dir=/tmp/saved_model/efficientdet-d0_frozen.pb  \
  --input_image=img.png --output_image_dir=/tmp/

Lastly, if you only have one image and just want to run a quick test, you can also run the following command (it is slow because it constructs the graph from scratch):

# Run inference for a single image.
!python model_inspect.py --runmode=infer --model_name=$MODEL \
  --hparams="image_size=1920x1280"  --max_boxes_to_draw=100   --min_score_thresh=0.4 \
  --ckpt_path=$CKPT_PATH --input_image=img.png --output_image_dir=/tmp
# you can visualize the output /tmp/0.jpg

Here is an example of an EfficientDet-D0 visualization; see the tutorial (tutorial.ipynb) for more examples.

6. Inference for videos.

You can run inference on a video and display the results in real time:

# step 0: download the example video.
!wget https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/data/video480p.mov -O input.mov

# step 1: export saved model.
!python model_inspect.py --runmode=saved_model \
  --model_name=efficientdet-d0 --ckpt_path=efficientdet-d0 \
  --saved_model_dir=/tmp/savedmodel

# step 2: inference video using saved_model_video.
!python model_inspect.py --runmode=saved_model_video \
  --model_name=efficientdet-d0 \
  --saved_model_dir=/tmp/savedmodel --input_video=input.mov

# alternative step 2: inference video and save the result.
!python model_inspect.py --runmode=saved_model_video \
  --model_name=efficientdet-d0   \
  --saved_model_dir=/tmp/savedmodel --input_video=input.mov  \
  --output_video=output.mov

7. Eval on COCO 2017 val or test-dev.

# Download COCO data.
!wget http://images.cocodataset.org/zips/val2017.zip
!wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
!unzip val2017.zip
!unzip annotations_trainval2017.zip

# Convert COCO data to tfrecord.
!mkdir tfrecord
!PYTHONPATH=".:$PYTHONPATH"  python dataset/create_coco_tfrecord.py \
    --image_dir=val2017 \
    --caption_annotations_file=annotations/captions_val2017.json \
    --output_file_prefix=tfrecord/val \
    --num_shards=32

# Run eval.
!python main.py --mode=eval  \
    --model_name=${MODEL}  --model_dir=${CKPT_PATH}  \
    --validation_file_pattern=tfrecord/val*  \
    --val_json_file=annotations/instances_val2017.json
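
The reported AP numbers follow the standard COCO protocol; for reference, the same metrics can be reproduced with pycocotools, given a detections file in COCO results format (detections.json is a hypothetical name):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('annotations/instances_val2017.json')  # ground truth
coco_dt = coco_gt.loadRes('detections.json')          # hypothetical detections file
coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints AP, AP50, AP75, APS/M/L as in the tables above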

You can also run eval on test-dev set with the following command:

!wget http://images.cocodataset.org/zips/test2017.zip
!unzip -q test2017.zip
!wget http://images.cocodataset.org/annotations/image_info_test2017.zip
!unzip image_info_test2017.zip

!mkdir -p tfrecord
!PYTHONPATH=".:$PYTHONPATH"  python dataset/create_coco_tfrecord.py \
      --image_dir=test2017 \
      --image_info_file=annotations/image_info_test-dev2017.json \
      --output_file_prefix=tfrecord/testdev \
      --num_shards=32

# Eval on test-dev: testdev_dir must be set.
# Note: test-dev has 20288 images, versus 5000 images in val.
!python main.py --mode=eval  \
    --model_name=${MODEL}  --model_dir=${CKPT_PATH}  \
    --validation_file_pattern=tfrecord/testdev*  \
    --testdev_dir='testdev_output' --eval_samples=20288
# Now you can submit testdev_output/detections_test-dev2017_test_results.json to
# coco server: https://competitions.codalab.org/competitions/20794#participate

8. Train on PASCAL VOC 2012 with an ImageNet backbone ckpt.

# Download and convert pascal data.
!wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
!tar xf VOCtrainval_11-May-2012.tar
!mkdir tfrecord
!PYTHONPATH=".:$PYTHONPATH"  python dataset/create_pascal_tfrecord.py  \
    --data_dir=VOCdevkit --year=VOC2012  --output_path=tfrecord/pascal

# Download backbone checkpoints.
!wget https://storage.googleapis.com/cloud-tpu-checkpoints/efficientnet/ckptsaug/efficientnet-b0.tar.gz
!tar xf efficientnet-b0.tar.gz 

!python main.py --mode=train_and_eval \
    --training_file_pattern=tfrecord/pascal*.tfrecord \
    --validation_file_pattern=tfrecord/pascal*.tfrecord \
    --model_name=efficientdet-d0 \
    --model_dir=/tmp/efficientdet-d0-scratch  \
    --backbone_ckpt=efficientnet-b0  \
    --train_batch_size=8 \
    --eval_batch_size=8 --eval_samples=512 \
    --num_examples_per_epoch=5717 --num_epochs=1  \
    --hparams="num_classes=20,moving_average_decay=0"

9. Finetune on PASCAL VOC 2012 with detector COCO ckpt.

Create a config file for the PASCAL VOC dataset called voc_config.yaml with the following content:

  num_classes: 20
  moving_average_decay: 0

Download the EfficientDet COCO checkpoint.

!wget https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco/efficientdet-d0.tar.gz
!tar xf efficientdet-d0.tar.gz

Finetuning uses --ckpt (a full detector checkpoint) rather than --backbone_ckpt (a backbone-only checkpoint).

!python main.py --mode=train_and_eval \
    --training_file_pattern=tfrecord/pascal*.tfrecord \
    --validation_file_pattern=tfrecord/pascal*.tfrecord \
    --model_name=efficientdet-d0 \
    --model_dir=/tmp/efficientdet-d0-finetune  \
    --ckpt=efficientdet-d0  \
    --train_batch_size=8 \
    --eval_batch_size=8 --eval_samples=1024 \
    --num_examples_per_epoch=5717 --num_epochs=1  \
    --hparams=voc_config.yaml

If you want to run inference on custom data, you can run:

# Setting the hparams flag is sometimes needed (e.g., when the model was trained
# with non-default hparams, such as this VOC config).
!python model_inspect.py --runmode=infer \
  --model_name=efficientdet-d0   --ckpt_path=efficientdet-d0 \
  --hparams=voc_config.yaml  \
  --input_image=img.png --output_image_dir=/tmp/

See the earlier sections for more details on the available model_inspect.py runmodes.

10. Train on multiple GPUs.

Install Horovod.

Then create voc_config.yaml and download the EfficientDet COCO checkpoint exactly as in Section 9. As before, finetuning uses --ckpt rather than --backbone_ckpt:

!horovodrun -np <num_gpus> -H localhost:<num_gpus> python main.py --mode=train \
    --training_file_pattern=tfrecord/pascal*.tfrecord \
    --validation_file_pattern=tfrecord/pascal*.tfrecord \
    --model_name=efficientdet-d0 \
    --model_dir=/tmp/efficientdet-d0-finetune  \
    --ckpt=efficientdet-d0  \
    --train_batch_size=8 \
    --eval_batch_size=8 --eval_samples=1024 \
    --num_examples_per_epoch=5717 --num_epochs=1  \
    --hparams=voc_config.yaml \
    --strategy=horovod
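
For example, on a machine with 4 GPUs, pass -np 4 -H localhost:4.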

For inference on custom data, use the same model_inspect.py command shown at the end of Section 9.

11. Training EfficientDets on TPUs.

To train this model on Cloud TPU, you will need:

  • A GCE VM instance with an associated Cloud TPU resource.
  • A GCS bucket to store your training checkpoints (the "model directory").
  • The latest TensorFlow installed on both the GCE VM and the Cloud TPU.

Then train the model:

!export PYTHONPATH="$PYTHONPATH:/path/to/models"
!python main.py --tpu=TPU_NAME --training_file_pattern=DATA_DIR/*.tfrecord --model_dir=MODEL_DIR --strategy=tpu

# TPU_NAME is the name of the TPU node, the same name that appears when you run gcloud compute tpus list (or ctpu ls).
# MODEL_DIR is a GCS location (a URL starting with gs://) to which both the GCE VM and the associated Cloud TPU have write access.
# DATA_DIR is a GCS location to which both the GCE VM and the associated Cloud TPU have read access.

For more instructions about training on TPUs, please refer to the official Cloud TPU tutorials.

NOTE: this is not an official Google product.
