rstrudel / segmenter

[ICCV2021] Official PyTorch implementation of Segmenter: Transformer for Semantic Segmentation

Performance on Pascal Context with Seg-L-Mask/16

hardyho opened this issue · comments

Hi, thanks for the great work and the code!
I'm trying to reproduce the baseline based on mmsegmentation. While the baseline reproduces well on Cityscapes and ADE20K, I can only get 56.9 mIoU single-scale on Pascal Context (58.1 reported). Is there anything I've missed?
Below is the config I'm running based on mmsegmentation; is anything wrong in the settings?
Many thanks for your help!

_base_ = [
    # "./training_scheme.py",
    "../_base_/models/segmenter_vit-b16.py",
    "../_base_/datasets/pascal_context_meanstd0.5.py",
    "../_base_/default_runtime.py",
    "../_base_/schedules/schedule_80k.py",
]

model = dict(
    pretrained="pretrain/L_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.1-sd_0.1--imagenet2012-steps_20k-lr_0.01-res_384.npz",
    backbone=dict(
        type="VisionTransformer",
        img_size=(480, 480),
        patch_size=16,
        in_channels=3,
        embed_dims=1024,
        num_layers=24,
        num_heads=16,
        mlp_ratio=4,
        out_indices=(5, 11, 17, 23),
        qkv_bias=True,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.1,
        with_cls_token=True,
        final_norm=True,
        norm_cfg=dict(type="LN", eps=1e-6),
        act_cfg=dict(type="GELU"),
        norm_eval=False,
        interpolate_mode="bicubic",
    ),
    neck=dict(
        type="UseIndexSingleOutNeck",
        index=-1,
    ),
    decode_head=dict(
        n_cls=60,
        n_layers=2,
        d_encoder=1024,
        n_heads=16,
        d_model=1024,
        d_ff=4 * 1024,
    ),
    test_cfg=dict(mode="slide", crop_size=(480, 480), stride=(320, 320)),
)

optimizer = dict(
    _delete_=True,
    type="SGD",
    lr=0.001,
    weight_decay=0.0,
    momentum=0.9,
    paramwise_cfg=dict(
        custom_keys={
            "pos_embed": dict(decay_mult=0.0),
            "cls_token": dict(decay_mult=0.0),
            "norm": dict(decay_mult=0.0),
        }
    ),
)

lr_config = dict(
    _delete_=True,
    policy="poly",
    warmup_iters=0,
    power=0.9,
    min_lr=1e-5,
    by_epoch=False,
)

# By default, models are trained on 8 GPUs with 2 images per GPU
data = dict(samples_per_gpu=2)
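As a sanity check on the schedule above, the `poly` policy in mmcv/mmsegmentation decays the learning rate polynomially from the base value toward `min_lr` over the full 80k iterations. A minimal standalone sketch (the function name and free-standing form are mine, not mmsegmentation code):

```python
def poly_lr(base_lr, cur_iter, max_iters, power=0.9, min_lr=1e-5):
    # "poly" policy: (base_lr - min_lr) scaled by (1 - progress)^power,
    # offset by min_lr; warmup_iters=0 in the config means no warmup.
    coeff = (1 - cur_iter / max_iters) ** power
    return (base_lr - min_lr) * coeff + min_lr

# With the values from the config above (lr=0.001, schedule_80k):
print(poly_lr(0.001, 0, 80000))      # base lr at the first iteration
print(poly_lr(0.001, 80000, 80000))  # decays to min_lr by the last iteration
```

This matches the `power=0.9`, `min_lr=1e-5`, `by_epoch=False` settings in `lr_config`, so the schedule itself looks consistent with the paper's setup.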

@hardyho Hi, this is not an answer to your question but rather a related one.
How did you obtain the Pascal Context dataset for your experiments? I downloaded the annotations from the official site, but it's unclear which images belong to the training set and which to the validation set, and likewise for the 59-class segmentation version used in this paper.

Hi @hardyho,
Sorry for not answering sooner. We are currently working on a PR with the developers of mmsegmentation. The only implementation I support for now is the original one, so I cannot say offhand why your implementation misses 1 point. I checked the config quickly and it looks reasonable.

@erikdao

> How did you obtain the Pascal Context dataset for the experiment you used?
The script https://github.com/rstrudel/segmenter/blob/master/segm/scripts/prepare_pcontext.py downloads and prepares the dataset for you. Please check it out.

@rstrudel Thank you very much!