wzmsltw / BSN-boundary-sensitive-network

Code for our paper: "BSN: Boundary Sensitive Network for Temporal Action Proposal Generation"

Inconsistency in ground truth region generation

samxuxiang opened this issue

Hi, I think there is an inconsistency between the code and your paper in how the ground truth regions are generated. In the TEM_load_data.py file, you have:

gt_lens=gt_xmaxs-gt_xmins
gt_len_small=np.maximum(tgap,0.1*gt_lens)
gt_start_bboxs=np.stack((gt_xmins-gt_len_small/2,gt_xmins+gt_len_small/2),axis=1)
gt_end_bboxs=np.stack((gt_xmaxs-gt_len_small/2,gt_xmaxs+gt_len_small/2),axis=1)

I believe this corresponds to Section 3.4 of the paper, where you describe TEM training as:

For ground truth action instance φ_g = (t_s, t_e) in Ψ_ω, we denote its region
as action region r ... as r = [t_s − d_g/10, t_s + d_g/10]
and r = [t_e − d_g/10, t_e + d_g/10] separately, where d_g = t_e − t_s ...

So it seems that in your paper the start and end regions are generated by +/- one-tenth of the ground truth duration. But in your code it is +/- duration/20 (divide by 10 and then divide by 2 again).
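
To make the factor of two concrete, here is a small sketch that runs the quoted lines on one made-up ground truth instance and compares the result with the interval from the paper. The values of gt_xmins, gt_xmaxs, and tgap are illustrative assumptions, not taken from the dataset or the repository.

```python
import numpy as np

# Hypothetical normalized instance: starts at 0.2, ends at 0.6 (assumed values).
gt_xmins = np.array([0.2])
gt_xmaxs = np.array([0.6])
tgap = 0.01  # assumed temporal gap between snippets

# Region construction as quoted from TEM_load_data.py.
gt_lens = gt_xmaxs - gt_xmins
gt_len_small = np.maximum(tgap, 0.1 * gt_lens)
gt_start_bboxs = np.stack(
    (gt_xmins - gt_len_small / 2, gt_xmins + gt_len_small / 2), axis=1
)

# Start region as written in the paper: [t_s - d_g/10, t_s + d_g/10].
paper_start = np.stack((gt_xmins - gt_lens / 10, gt_xmins + gt_lens / 10), axis=1)

# Total width of each start region.
code_width = gt_start_bboxs[0, 1] - gt_start_bboxs[0, 0]
paper_width = paper_start[0, 1] - paper_start[0, 0]
print(code_width, paper_width)
```

For this instance the code's region spans one-tenth of the duration in total, while the paper's spans one-fifth. The np.maximum(tgap, ...) floor only takes over for instances shorter than 10 * tgap, which the paper's formula as quoted does not mention.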

It would be great if you could help clarify this part. Thank you.

Hi @samxuxiang, thanks for pointing this out. I think this is a typo in my paper. But actually, you can also try 0.2*gt_lens in experiments.
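
One way to try the suggested 0.2*gt_lens is to expose the factor as a parameter. The helper below is only a sketch: the name boundary_regions and the ratio argument are hypothetical and do not exist in the repository, and the input values are made up. The body otherwise follows the lines quoted from TEM_load_data.py.

```python
import numpy as np

def boundary_regions(gt_xmins, gt_xmaxs, tgap, ratio=0.1):
    """Build start and end regions around ground truth boundaries.

    Hypothetical helper (not part of the BSN code base). Each region has
    total width max(tgap, ratio * duration), centered on the boundary.
    Returns two arrays of shape (N, 2): start regions and end regions.
    """
    gt_xmins = np.asarray(gt_xmins, dtype=float)
    gt_xmaxs = np.asarray(gt_xmaxs, dtype=float)
    gt_lens = gt_xmaxs - gt_xmins
    gt_len_small = np.maximum(tgap, ratio * gt_lens)
    half = gt_len_small / 2
    start = np.stack((gt_xmins - half, gt_xmins + half), axis=1)
    end = np.stack((gt_xmaxs - half, gt_xmaxs + half), axis=1)
    return start, end

# ratio=0.1 reproduces the quoted code (+/- duration/20 around each boundary).
start_code, end_code = boundary_regions([0.2], [0.6], tgap=0.01, ratio=0.1)

# ratio=0.2 matches the paper's text (+/- duration/10 around each boundary).
start_paper, end_paper = boundary_regions([0.2], [0.6], tgap=0.01, ratio=0.2)
```

Keeping ratio as an argument makes it straightforward to run both settings and compare them, without asserting which one the reported results used.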