MATeR: Multimodal Audio Text Regressor

This repository contains qualitative examples and additional details for our paper "Leveraging multimodal content for podcast summarization", presented at ACM SAC 2022.

Filtering advertising content from human descriptions

A model fine-tuned to classify whether a given sentence contains advertising content is available on 🤗 Hub. It is used as a pre-processing step to remove advertising content from text descriptions. The model was trained on a set of ~2200 manually annotated examples. The data used to train the model can be found here.
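The pre-processing step described above can be sketched as sentence-level filtering. This is a minimal illustration, not the authors' implementation: the function names (`strip_ads`, `split_sentences`, `keyword_is_ad`), the cue list, and the naive sentence splitter are all assumptions made for this example. The keyword heuristic only stands in for the fine-tuned classifier, whose Hub identifier is not given here; in practice `is_ad` would wrap a call to that model.

```python
import re
from typing import Callable, List

# Hypothetical cue phrases: a stand-in for the fine-tuned classifier,
# NOT the model described in this repository.
AD_CUES = ("sponsored by", "promo code", "support this podcast")


def keyword_is_ad(sentence: str) -> bool:
    """Placeholder classifier: flags a sentence if it contains a cue phrase."""
    lowered = sentence.lower()
    return any(cue in lowered for cue in AD_CUES)


def split_sentences(text: str) -> List[str]:
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def strip_ads(description: str,
              is_ad: Callable[[str], bool] = keyword_is_ad) -> str:
    """Return the description with every sentence flagged by `is_ad` removed."""
    kept = [s for s in split_sentences(description) if not is_ad(s)]
    return " ".join(kept)


if __name__ == "__main__":
    text = ("We talk about marathon training. "
            "This episode is sponsored by Acme. "
            "Use promo code RUN for 10% off!")
    print(strip_ads(text))
```

Passing the classifier as a callable keeps the filtering logic independent of how the prediction is obtained, so the keyword stand-in can be swapped for a real model without changing `strip_ads`.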

If you find the resources in this repository useful for your research, please cite the following paper:

@inproceedings{vaiani2022leveraging,
  title={Leveraging multimodal content for podcast summarization},
  author={Vaiani, Lorenzo and La Quatra, Moreno and Cagliero, Luca and Garza, Paolo},
  booktitle={Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing},
  pages={863--870},
  year={2022}
}

Qualitative examples

Demo for the paper "Leveraging multimodal content for podcast summarization" by Lorenzo Vaiani, Moreno La Quatra, Luca Cagliero, and Paolo Garza, published at ACM SAC 2022.

High score examples

Audio samples for cases where MATeR-generated summaries obtain a significantly higher ROUGE-2 F1 score than the best competitor.
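For readers unfamiliar with the metric behind this comparison, ROUGE-2 F1 is the harmonic mean of bigram precision and bigram recall between a candidate summary and a reference. The sketch below is a simplified illustration only: it assumes lowercased whitespace tokenization and no stemming, which may differ from the evaluation setup used in the paper, and the function names are chosen for this example.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Count adjacent token pairs after lowercasing and whitespace splitting."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-2 F1: harmonic mean of bigram precision and recall."""
    cand, ref = bigrams(candidate), bigrams(reference)
    # Counter intersection keeps the minimum count per bigram (clipped matches).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # One of two bigrams matches on each side: precision = recall = 0.5.
    print(rouge2_f1("a b c", "a b d"))
```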

Example 1 (7vXiAjVFnhNI3T9Gkw636a_3N0I2LsRLmalC25phOYJmh)

Audio Samples (Example 1)

MATeR

7vXiAjVFnhNI3T9Gkw636a_3N0I2LsRLmalC25phOYJmh_84.700s_114s.mov

HiBERT

7vXiAjVFnhNI3T9Gkw636a_3N0I2LsRLmalC25phOYJmh_233.800s_263.400s_audio.mov

MATeR-textonly

7vXiAjVFnhNI3T9Gkw636a_3N0I2LsRLmalC25phOYJmh_60s_83.200s_audio.mov

Example 2 (550ZI1sg75lbNv7TXoQbec_6VAekMPV1MjUP5k465g52D)

Audio Samples (Example 2)

MATeR

550ZI1sg75lbNv7TXoQbec_6VAekMPV1MjUP5k465g52D_83.300s_107.100s.mov

HiBERT

550ZI1sg75lbNv7TXoQbec_6VAekMPV1MjUP5k465g52D_30.200s_46.900s_audio.mov

MATeR-textonly

550ZI1sg75lbNv7TXoQbec_6VAekMPV1MjUP5k465g52D_3594.600s_3624.600s_audio.mov

Example 3 (0L0j6X6cf3DO1Bs0D0K4Ch_0HfelmyAjA2a6Z6r7UQ6pe)

Audio Samples (Example 3)

MATeR

0L0j6X6cf3DO1Bs0D0K4Ch_0HfelmyAjA2a6Z6r7UQ6pe_116s_145.900s.mov

HiBERT

0L0j6X6cf3DO1Bs0D0K4Ch_0HfelmyAjA2a6Z6r7UQ6pe_266s_295.800s_audio.mov

MATeR-textonly

0L0j6X6cf3DO1Bs0D0K4Ch_0HfelmyAjA2a6Z6r7UQ6pe_30.400s_50.500s_audio.mov

Low score examples

Audio samples for cases where MATeR-generated summaries obtain a significantly lower ROUGE-2 F1 score than the best competitor.

Example 1 (7lODos6uX9G0hRGtaBFgr6_5OmsBCCphxwDEpGudThv6s)

Audio Samples (Example 1)

MATeR

7lODos6uX9G0hRGtaBFgr6_5OmsBCCphxwDEpGudThv6s_215.800s_244.900s.mov

HiBERT

7lODos6uX9G0hRGtaBFgr6_5OmsBCCphxwDEpGudThv6s_72.700s_102.500s_audio.mov

MATeR-textonly

7lODos6uX9G0hRGtaBFgr6_5OmsBCCphxwDEpGudThv6s_3484s_3513.700s_audio.mov

Example 2 (2KBfl8eidzorW02RzQf9K8_51nmU0wf4wR6wVHACEagPs)

Audio Samples (Example 2)

MATeR

2KBfl8eidzorW02RzQf9K8_51nmU0wf4wR6wVHACEagPs_1647.900s_1676.900s.mov

HiBERT

2KBfl8eidzorW02RzQf9K8_51nmU0wf4wR6wVHACEagPs_13.300s_42.400s_audio.mov

MATeR-textonly

2KBfl8eidzorW02RzQf9K8_51nmU0wf4wR6wVHACEagPs_1010.500s_1040.300s_audio.mov

Example 3 (51Bg4WCSE54ldyM7K4Nzff_3GSeNWxX70abttVM99zWsy)

Audio Samples (Example 3)

MATeR

51Bg4WCSE54ldyM7K4Nzff_3GSeNWxX70abttVM99zWsy_109.400s_139.200s.mov

HiBERT

51Bg4WCSE54ldyM7K4Nzff_3GSeNWxX70abttVM99zWsy_13.400s_41.700s_audio.mov

MATeR-textonly

51Bg4WCSE54ldyM7K4Nzff_3GSeNWxX70abttVM99zWsy_404.700s_410.700s_audio.mov
