The UAVid Dataset for Video Semantic Segmentation

Question

guanfuchen opened this issue 6 years ago · comments

related paper

Abstract
Video semantic segmentation has recently been one of the research focuses in computer vision. It serves as a perception foundation for many fields such as robotics and autonomous driving. The fast development of semantic segmentation owes much to large-scale datasets, especially for deep-learning-based methods. Several semantic segmentation datasets already exist for complex urban scenes, such as the Cityscapes and CamVid datasets, and they have become the standard datasets for comparing semantic segmentation methods. In this paper, we introduce a new high-resolution UAV video semantic segmentation dataset, UAVid, as a complement. Our UAV dataset consists of 30 video sequences capturing high-resolution images. In total, 300 images have been densely labelled with 8 classes for the urban scene understanding task. Our dataset brings new challenges. We provide several deep learning baseline methods, among which the proposed novel Multi-Scale-Dilation net performs best via multi-scale feature extraction. We have also explored the usability of sequence data by leveraging a CRF model in both the spatial and temporal domains.

guanfuchen · Answer 1 · Mon Nov 19 2018 11:14:32 GMT+0800 (China Standard Time)

At present, only a few public semantic segmentation datasets are available, and each focuses on certain applications. MS COCO [1] provides a semantic segmentation dataset of common objects in common scenes, and its semantic labelling task focuses on persons, cars, animals and various stuff classes. The Pascal VOC dataset [2] also provides objects such as bus, car, cow and dog for the semantic segmentation task. Other semantic segmentation datasets are designed for street scene object recognition. Their target objects include pedestrians, cars, roads, lanes, traffic lights, trees and other street-scene-related objects. In particular, CamVid [3] provides continuously labelled driving frames, which can be used for temporal consistency evaluation. The Highway Driving dataset [4] provides 30 Hz labels that are even denser in the temporal domain, and it is designed for semantic video segmentation of driving scenes. The Daimler Urban Segmentation dataset [5] is also a video dataset for street scene understanding, but its labels are sparser in the temporal domain. The Cityscapes dataset [6] focuses more on data variation: it has far more labelled frames, collected from 50 cities, which brings it closer to real-world complexity. Each of its frames is also much larger than those in CamVid. The newly published Berkeley Deep Drive dataset [7] has even more labelled images, of medium size, across multiple street scenes. The KITTI Vision Benchmark Suite [8] also provides medium-sized images for the task. To help learned models generalize well across different scenes, the ADE20K dataset [9] spans more diverse scenes and labels objects from many more categories, which brings more variability and complexity to general object representations in images. For the remote sensing community, an aerial image dataset is provided for the ISPRS 2D semantic labelling contest [10]. All of the datasets above have had a great impact on the development of current state-of-the-art semantic segmentation methods.

At present, most modern visual semantic segmentation tasks use information acquired on the ground. However, another data acquisition platform is increasingly used: the unmanned aerial vehicle (UAV). Compact, lightweight UAVs are a trend for future data acquisition. UAVs make image acquisition over large areas cheaper and more convenient, which allows quick access to useful information around a given area. Unlike satellite imaging, UAVs capture images from the sky with a flexible flying schedule and at higher resolution, making it possible to monitor and analyze the landscape swiftly at a specific location and time. These abilities make UAVs an effective means of data collection for various applications.

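At present, most visual semantic segmentation datasets are collected from the ground. As a data acquisition platform, the UAV is cheaper and more convenient, and it also offers higher resolution.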

guanfuchen · Answer 2 · Mon Nov 19 2018 11:16:13 GMT+0800 (China Standard Time)

Dataset examples

Examples of the annotated classes

Pixel histogram of the annotated classes
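
A per-class pixel histogram like this one can be computed directly from the label maps. The snippet below is only a minimal sketch, not the paper's own tooling. It assumes each label map is a 2-D integer array whose values are class indices in the range 0 to 7 (the abstract states 8 classes, but the index encoding is an assumption), and the name `class_pixel_histogram` is hypothetical.

```python
import numpy as np

NUM_CLASSES = 8  # 8 classes per the abstract; the 0..7 index encoding is an assumption


def class_pixel_histogram(label_maps, num_classes=NUM_CLASSES):
    """Count labelled pixels per class over an iterable of 2-D integer label maps."""
    counts = np.zeros(num_classes, dtype=np.int64)
    for labels in label_maps:
        labels = np.asarray(labels)
        # Reject values outside the assumed class-index range instead of dropping them silently.
        if labels.size and (labels.min() < 0 or labels.max() >= num_classes):
            raise ValueError("label map contains values outside [0, num_classes)")
        # bincount tallies how often each class index occurs in the flattened map.
        counts += np.bincount(labels.ravel(), minlength=num_classes)
    return counts


if __name__ == "__main__":
    # Two tiny toy label maps standing in for real annotations.
    toy = [
        np.array([[0, 0, 1], [2, 2, 2]]),
        np.array([[7, 7, 0], [1, 1, 1]]),
    ]
    counts = class_pixel_histogram(toy)
    print(counts)                 # absolute pixel counts per class
    print(counts / counts.sum())  # relative class frequencies
```

Dividing the counts by their sum gives relative class frequencies, which is the usual way to see how imbalanced the classes are.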

Dataset split

guanfuchen · Answer 3 · Mon Nov 19 2018 11:16:50 GMT+0800 (China Standard Time)

Network architecture

guanfuchen · Answer 4 · Mon Nov 19 2018 11:17:05 GMT+0800 (China Standard Time)

Experimental results

guanfuchen · Answer 5 · Mon Nov 19 2018 11:17:24 GMT+0800 (China Standard Time)

Summary and outlook