BokyLiu / multitask_semantic_segmentation

Final Group Project: Multi-Task Segmentation and Depth Estimation

We introduce a multi-task learning setup that uses synthetic indoor scenes to learn semantic segmentation and depth estimation in a single combined training procedure. Using different encoder-decoder architectures and several multi-task loss functions, we learn a representation shared across the tasks, which lets them be learned in conjunction rather than separately. We trained a baseline Unet-inspired architecture we call Unet-Hydra, as well as several DeepLab- and SOSD-inspired architectures tailored to our tasks.

Evaluating with mIoU for semantic segmentation and RMSE for depth estimation, we achieved 0.60 mIoU and 4.96 RMSE for Unet-Hydra, 0.66 mIoU and 5.21 RMSE for the best DeepLab-inspired architecture, and 0.76 mIoU and 4.63 RMSE for the SOSD-inspired architecture after an extensive hyperparameter search. Our best model yields 0.77 mIoU and 4.28 RMSE when evaluated on the Hypersim dataset, and 0.58 mIoU and 0.59 RMSE when performing transfer learning to real-world indoor scenes from the NYU V2 dataset.

The main insights of this work are that multi-task learning consistently outperforms single-task learning across architectures, that shared-task decoders outperform single-decoder architectures, and that manually weighted task losses outperform learned weight parameters. Additionally, our work shows that multi-task learning on synthetic data transfers successfully to real-world data and can produce good results on the semantic segmentation task.
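The abstract contrasts manually weighted task losses with learned weight parameters. The sketch below is our own illustration, not code from this repository: it shows both weighting schemes for a segmentation-plus-depth setup, assuming a PyTorch implementation. The fixed combination uses hand-picked weights, while the learned variant follows the homoscedastic-uncertainty weighting of Kendall et al. (2018), where each task gets a trainable log-variance. All names and weight values (`ManualMultiTaskLoss`, `w_seg`, `w_depth`, etc.) are illustrative assumptions.

```python
# Minimal sketch of the two multi-task loss-weighting schemes compared in
# this project (manual weights vs. learned weights). Hypothetical code,
# assuming PyTorch; not taken from the repository.
import torch
import torch.nn as nn


class ManualMultiTaskLoss(nn.Module):
    """Fixed convex combination of segmentation and depth losses."""

    def __init__(self, w_seg: float = 0.7, w_depth: float = 0.3):
        super().__init__()
        self.w_seg, self.w_depth = w_seg, w_depth  # assumed example values
        self.seg_loss = nn.CrossEntropyLoss()

    def forward(self, seg_logits, seg_target, depth_pred, depth_target):
        l_seg = self.seg_loss(seg_logits, seg_target)
        # RMSE on depth, matching the evaluation metric reported above.
        l_depth = torch.sqrt(nn.functional.mse_loss(depth_pred, depth_target))
        return self.w_seg * l_seg + self.w_depth * l_depth


class UncertaintyMultiTaskLoss(nn.Module):
    """Learned weighting: each task i has a trainable log-variance s_i, and
    the total loss is sum_i exp(-s_i) * L_i + s_i (Kendall et al., 2018)."""

    def __init__(self):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(2))  # [segmentation, depth]
        self.seg_loss = nn.CrossEntropyLoss()

    def forward(self, seg_logits, seg_target, depth_pred, depth_target):
        l_seg = self.seg_loss(seg_logits, seg_target)
        l_depth = torch.sqrt(nn.functional.mse_loss(depth_pred, depth_target))
        losses = torch.stack([l_seg, l_depth])
        return (torch.exp(-self.log_vars) * losses + self.log_vars).sum()


if __name__ == "__main__":
    # Dummy shapes: seg_logits (B, C, H, W), seg_target (B, H, W) long,
    # depth tensors (B, 1, H, W).
    B, C, H, W = 2, 40, 32, 32
    seg_logits = torch.randn(B, C, H, W)
    seg_target = torch.randint(0, C, (B, H, W))
    depth_pred, depth_target = torch.rand(B, 1, H, W), torch.rand(B, 1, H, W)
    print(ManualMultiTaskLoss()(seg_logits, seg_target, depth_pred, depth_target))
    print(UncertaintyMultiTaskLoss()(seg_logits, seg_target, depth_pred, depth_target))
```

Since the paper's finding is that manual weighting performed better, a small grid search over the fixed weights would be the natural first thing to tune in a setup like this.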


Languages

Jupyter Notebook 94.8% · Python 5.0% · Shell 0.2%