fuankarion / active-speakers-context

Code for the Active Speakers in Context Paper (CVPR2020)

Postprocessing of the labels

victorywys opened this issue

Hi,
Thanks for this fantastic work! I'm currently trying to replicate your results and build my own model. When I looked into how you handle the data, I found two functions in core/dataset.py, _postprocess_speech_label and _post_process, which seem to transform SPEAKING_NOT_AUDIBLE to NOT_SPEAKING. As far as I understand, this changes the original 3-category classification task into a 2-category one during training. Will that influence the results, and does it conform to the official guide? Maybe I'm misunderstanding something; please correct me if so. Thanks!

Hi, you are right: we turned the problem into a binary one, as the official evaluation is indeed binary (active speaker vs. anything else).
Additionally, less than 2% of the labels correspond to SPEAKING_NOT_AUDIBLE, so the current dataset is not the best option for evaluating the 3-category problem.
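
For anyone following along, here is a minimal sketch of the kind of label collapsing discussed above. It is not the repository's actual code: the helper name `to_binary_label` is hypothetical, and it assumes the three label strings are NOT_SPEAKING, SPEAKING_AUDIBLE and SPEAKING_NOT_AUDIBLE.

```python
# Illustrative sketch only; NOT the code from core/dataset.py.
# Assumption: labels arrive as the three strings below.
SPEAKING_AUDIBLE = "SPEAKING_AUDIBLE"
SPEAKING_NOT_AUDIBLE = "SPEAKING_NOT_AUDIBLE"
NOT_SPEAKING = "NOT_SPEAKING"

KNOWN_LABELS = {SPEAKING_AUDIBLE, SPEAKING_NOT_AUDIBLE, NOT_SPEAKING}


def to_binary_label(label: str) -> int:
    """Map a 3-category label to a binary target (hypothetical helper).

    Returns 1 for an audible active speaker and 0 for anything else,
    so SPEAKING_NOT_AUDIBLE is treated the same as NOT_SPEAKING.
    """
    if label not in KNOWN_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return 1 if label == SPEAKING_AUDIBLE else 0


def not_audible_fraction(labels: list[str]) -> float:
    """Fraction of labels equal to SPEAKING_NOT_AUDIBLE (0.0 if empty)."""
    if not labels:
        return 0.0
    return sum(l == SPEAKING_NOT_AUDIBLE for l in labels) / len(labels)


if __name__ == "__main__":
    # Made-up example labels, purely for demonstration.
    example = [NOT_SPEAKING, SPEAKING_AUDIBLE, SPEAKING_NOT_AUDIBLE, SPEAKING_AUDIBLE]
    print([to_binary_label(l) for l in example])
    print(not_audible_fraction(example))
```

Running `not_audible_fraction` over your own copy of the annotations is a quick way to check how rare the SPEAKING_NOT_AUDIBLE class is before deciding whether a 3-category setup is worth evaluating.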