This is a PyTorch implementation of TrackletNet (TNT) for multi-object tracking (MOT).
We use FCOS as an example detector to detect objects and their classes in images.
To train, evaluate, or test the detector, see detection/detect.sh.
Remember to set DATASET.ROOT in configs/fcos_detector.yaml to the root directory of your own data.
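FCOS is anchor-free: for every feature-map location, mapped back to image coordinates (x, y), it regresses the distances (l, t, r, b) from that point to the left, top, right and bottom sides of the object box. The minimal sketch below shows only that decoding step in plain Python; the function name decode_fcos_boxes is hypothetical and is not taken from the code under detection/.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # (x, y) location in image coordinates
LTRB = Tuple[float, float, float, float]  # distances to left, top, right, bottom sides
Box = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)


def decode_fcos_boxes(points: List[Point], ltrb: List[LTRB]) -> List[Box]:
    """Convert per-location FCOS regression outputs into corner-format boxes."""
    boxes = []
    for (x, y), (l, t, r, b) in zip(points, ltrb):
        # A positive location lies inside its box: subtract l/t, add r/b.
        boxes.append((x - l, y - t, x + r, y + b))
    return boxes


if __name__ == "__main__":
    # A location at (50, 40) that is 10 px from the left edge, 5 px from the top,
    # 20 px from the right edge and 15 px from the bottom of its box.
    print(decode_fcos_boxes([(50.0, 40.0)], [(10.0, 5.0, 20.0, 15.0)]))
    # → [(40.0, 35.0, 70.0, 55.0)]
```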
We use the tracking labels to crop object images from the frames. The destination directory follows the data/crop_data structure in tree.md. The cropped images are used to train both the appearance model and the tracklet connectivity model.
To crop:
python3 datasets/crop_gt_box.py --data_root data/ --dst_root data/crop_data
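Conceptually, cropping amounts to clipping each labelled box to the frame and deciding where each crop is written. The sketch below assumes labels shaped as {frame_id: {track_id: [xmin, ymin, xmax, ymax]}} and a hypothetical <dst_root>/<track_id>/<frame_id>.jpg layout; neither is taken from datasets/crop_gt_box.py, and the real layout is the one documented in tree.md.

```python
import os
from typing import Dict, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) in integer pixels


def clip_box(box: Sequence[float], width: int, height: int) -> Optional[Box]:
    """Clip a labelled box to the image bounds; return None if nothing is left."""
    xmin, ymin, xmax, ymax = (int(round(v)) for v in box)
    xmin, ymin = max(0, xmin), max(0, ymin)
    xmax, ymax = min(width, xmax), min(height, ymax)
    if xmax <= xmin or ymax <= ymin:
        return None
    return xmin, ymin, xmax, ymax


def plan_crops(
    labels: Dict[int, Dict[int, Sequence[float]]],
    width: int,
    height: int,
    dst_root: str,
) -> List[Tuple[int, int, Box, str]]:
    """List (frame_id, track_id, clipped box, destination path) for each usable label.

    The <dst_root>/<track_id>/<frame_id>.jpg layout is a hypothetical example.
    """
    plan = []
    for frame_id in sorted(labels):
        for track_id, box in sorted(labels[frame_id].items()):
            clipped = clip_box(box, width, height)
            if clipped is None:
                continue  # the box lies entirely outside the frame
            dst = os.path.join(dst_root, str(track_id), f"{frame_id:06d}.jpg")
            plan.append((frame_id, track_id, clipped, dst))
    return plan


if __name__ == "__main__":
    labels = {1: {7: [-5, 10, 50, 60], 8: [700, 10, 800, 60]}, 2: {7: [0, 0, 20, 30]}}
    for item in plan_crops(labels, 640, 480, "data/crop_data"):
        print(item)
```

With Pillow, each planned box can then be passed to Image.crop, which takes a (left, upper, right, lower) tuple, before saving the crop to its destination path.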
We use FaceNet with a triplet loss as an example appearance model, trained on the cropped images to generate feature embeddings.
To train:
python3 appearance/train_appearance.py --cfg configs/facenet_triplet_appearance.yaml
Remember to set DATASET.ROOT in configs/facenet_triplet_appearance.yaml to the root of your own cropped data, and set MODEL.FEATURE_PATH and MODEL.LOGITS_PATH to the paths of the pretrained InceptionResnetV1 weights (20180402-114759-vggface2-features.pt and 20180402-114759-vggface2-logits.pt).
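The triplet loss pulls crops of the same object together and pushes crops of different objects apart in embedding space. A minimal, framework-free sketch of the loss follows. It uses plain (non-squared) Euclidean distances, as torch.nn.TripletMarginLoss does, whereas the FaceNet paper uses squared distances; the margin of 0.2 is an assumption and may differ from whatever appearance/train_appearance.py uses.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def euclidean(a: Vec, b: Vec) -> float:
    """L2 distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vec, positive: Vec, negative: Vec, margin: float = 0.2) -> float:
    """Hinge loss for one triplet.

    anchor/positive are embeddings of crops of the same object, negative is an
    embedding of a different object. The loss reaches zero once the negative is
    at least `margin` farther from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


def batch_triplet_loss(triplets: List[Tuple[Vec, Vec, Vec]], margin: float = 0.2) -> float:
    """Mean triplet loss over a batch of (anchor, positive, negative) triplets."""
    losses = [triplet_loss(a, p, n, margin) for a, p, n in triplets]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    easy = ([0.0, 0.0], [3.0, 4.0], [6.0, 8.0])  # negative already far enough away
    hard = ([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])  # negative closer than the positive
    print(triplet_loss(*easy), triplet_loss(*hard), batch_triplet_loss([easy, hard]))
```

In PyTorch training code, torch.nn.TripletMarginLoss(margin=...) computes essentially the same hinge over batches of anchor, positive and negative embedding tensors.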
Use the feature embeddings inferred by the appearance model from step 3 to train the tracklet connectivity model (see the TNT paper).
To train:
python3 tracklets/train_trackletpair_fushion.py --cfg configs/tracklet_pair_connectivity.yaml
Remember to set DATASET.ROOT in configs/tracklet_pair_connectivity.yaml to the root of your own cropped data, and set MODEL.APPEARANCE.WEIGHTS to the path of your own trained appearance model .pt file (e.g. FaceNet). TRACKLET.WINDOW_LEN sets the length of the sliding window used to sample tracklet pairs.
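One plausible reading of sliding-window pair sampling is sketched below: slide a window of window_len frames over the video and pair up every two tracklets that both overlap the window. This is an assumption about what TRACKLET.WINDOW_LEN controls, not a description of tracklets/train_trackletpair_fushion.py; tracklets are reduced to hypothetical (first_frame, last_frame) spans.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Span = Tuple[int, int]  # (first_frame, last_frame), both inclusive


def sample_tracklet_pairs(
    tracklets: Dict[int, Span],
    num_frames: int,
    window_len: int,
    stride: int = 1,
) -> Set[Tuple[int, int]]:
    """Collect tracklet pairs that co-occur inside a sliding temporal window.

    For each window [start, start + window_len) the tracklets overlapping it
    are paired up. Each pair is stored once as (smaller_id, larger_id).
    """
    if window_len <= 0 or stride <= 0:
        raise ValueError("window_len and stride must be positive")
    pairs: Set[Tuple[int, int]] = set()
    last_start = max(num_frames - window_len, 0)
    for start in range(0, last_start + 1, stride):
        end = start + window_len  # exclusive
        inside = sorted(
            tid for tid, (first, last) in tracklets.items()
            if first < end and last >= start  # tracklet overlaps the window
        )
        pairs.update(combinations(inside, 2))
    return pairs


if __name__ == "__main__":
    spans = {1: (0, 9), 2: (12, 20), 3: (40, 50)}
    print(sorted(sample_tracklet_pairs(spans, num_frames=60, window_len=16)))
    print(sorted(sample_tracklet_pairs(spans, num_frames=60, window_len=40)))
```

A longer window lets the model see pairs separated by longer gaps (for example, after an occlusion), at the cost of sampling many more pairs.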
Generate tracklet clusters from the frame images, interpolating between the discrete tracklets within each cluster. The pipeline works as follows:
- First, we use the detector trained in step 1 to detect objects in each frame.
- Second, we apply the appearance model trained in step 3 to compute an appearance embedding for each detected object.
- Third, we use geometry constraints and location prediction to merge neighboring detections into coarse tracklets.
- Fourth, we analyze the temporal connections between coarse tracklet pairs, together with the object locations, to group the tracklets into coarse clusters. We then use the tracklet connectivity model trained in step 4 to update the coarse clusters and to compute the cost between tracklet pairs within each cluster.
- Fifth, we treat the cost between tracklet pairs as edge weights in a tracklet graph and apply a graph optimization method to obtain the optimal clusters.
- Last, we interpolate between the discrete tracklets within each cluster and save the results to data/visualize.json in the following format:
{
  frame_id: {
    track_id: {
      label: 0,
      loc: [xmin, ymin, xmax, ymax]
    }
  }
}
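The final step can be illustrated with linear interpolation of box corners between the observed frames of a cluster, followed by re-keying the results into the frame_id → track_id → {label, loc} layout shown above. Linear interpolation is an assumption (the pipeline description only says the tracklets are interpolated), and the fixed label=0 and both function names are hypothetical.

```python
import json
from typing import Dict, List

Box = List[float]  # [xmin, ymin, xmax, ymax]


def interpolate_cluster(detections: Dict[int, Box]) -> Dict[int, Box]:
    """Fill frame gaps in one cluster by linearly interpolating box corners.

    `detections` maps frame_id -> box for every frame in which the merged
    tracklets observed the object.
    """
    frames = sorted(detections)
    filled = {f: list(detections[f]) for f in frames}
    for f0, f1 in zip(frames, frames[1:]):
        gap = f1 - f0
        for f in range(f0 + 1, f1):
            w = (f - f0) / gap  # 0 at f0, 1 at f1
            filled[f] = [(1 - w) * a + w * b for a, b in zip(detections[f0], detections[f1])]
    return dict(sorted(filled.items()))


def to_visualize_dict(clusters: Dict[int, Dict[int, Box]], label: int = 0) -> Dict[int, dict]:
    """Turn {track_id: {frame_id: box}} into {frame_id: {track_id: {label, loc}}}."""
    out: Dict[int, dict] = {}
    for track_id, detections in clusters.items():
        for frame_id, box in interpolate_cluster(detections).items():
            out.setdefault(frame_id, {})[track_id] = {"label": label, "loc": box}
    return out


if __name__ == "__main__":
    # Track 3 is observed at frames 10 and 12; frame 11 is interpolated.
    clusters = {3: {10: [0, 0, 10, 10], 12: [4, 2, 14, 12]}}
    print(json.dumps(to_visualize_dict(clusters), indent=2))
```

Note that json.dumps turns the integer frame_id and track_id keys into strings.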
To run the pipeline:
python3 TNT/generate_clusters.py --cfg configs/cluster_generation.yaml --frame_dir [the root path for your own frame images of one video]
Remember to set DATASET.ROOT in configs/cluster_generation.yaml to the root of your own data, MODEL.DETECTION_WEIGHTS to the path of your trained detection model .pt file from step 1 (e.g. FCOS), MODEL.APPEARANCE.WEIGHTS to the path of your trained appearance model .pt file from step 3 (e.g. FaceNet), and MODEL.RESUME_PATH to the path of your trained tracklet connectivity model .pt file from step 4.
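The coarse-tracklet step of the pipeline (merging neighboring detections with geometry constraints and location prediction) can be sketched as greedy frame-to-frame matching. Here the geometry constraint is assumed to be an IoU threshold and the location predictor a constant-velocity model; both are illustrative substitutes chosen for the sketch, not necessarily what TNT/generate_clusters.py implements, and all names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def predict_next(track: List[Box]) -> Box:
    """Constant-velocity guess of the track's box in the next frame."""
    if len(track) < 2:
        return track[-1]
    prev, last = track[-2], track[-1]
    return tuple(2 * q - p for p, q in zip(prev, last))


def link_frames(frames: List[List[Box]], iou_thresh: float = 0.5) -> List[List[Box]]:
    """Greedily grow coarse tracklets frame by frame.

    Each detection is attached to the unmatched active tracklet whose predicted
    box overlaps it most (IoU >= iou_thresh); otherwise it starts a new
    tracklet. Tracklets that receive no detection in a frame are closed.
    """
    finished: List[List[Box]] = []
    active: List[List[Box]] = []
    for dets in frames:
        # Score every (tracklet, detection) pair, best overlap first.
        scored = sorted(
            ((iou(predict_next(t), d), ti, di)
             for ti, t in enumerate(active) for di, d in enumerate(dets)),
            reverse=True,
        )
        used_t, used_d, next_active = set(), set(), []
        for score, ti, di in scored:
            if score < iou_thresh:
                break
            if ti in used_t or di in used_d:
                continue
            used_t.add(ti)
            used_d.add(di)
            active[ti].append(dets[di])
            next_active.append(active[ti])
        finished.extend(t for ti, t in enumerate(active) if ti not in used_t)
        next_active.extend([d] for di, d in enumerate(dets) if di not in used_d)
        active = next_active
    return finished + active


if __name__ == "__main__":
    frames = [
        [(0, 0, 10, 10), (100, 100, 110, 110)],
        [(2, 0, 12, 10), (100, 100, 110, 110)],
        [(4, 0, 14, 10)],
    ]
    for tracklet in link_frames(frames):
        print(tracklet)
```

Greedy matching keeps the sketch short; an optimal assignment such as the Hungarian algorithm (scipy.optimize.linear_sum_assignment) is a common alternative.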
See tree.md for an example of the data directory structure used when running the project.
The required packages are listed in requirements.txt.
To install them:
pip install -r requirements.txt