Marco Moldovan's repositories
hierarchical-language-modeling
We address the task of learning contextualized word, sentence, and document representations with a hierarchical language model: Transformer-based encoders are stacked first at the sentence level and then at the document level, and the model is trained with masked token prediction.
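A minimal sketch of the idea, assuming a PyTorch implementation (the class name, dimensions, and mean-pooling choice are illustrative, not taken from the repo): tokens are contextualized within each sentence, pooled into sentence vectors, contextualized across the document, and the document context is fed back to the token level for masked token prediction.

```python
import torch
import torch.nn as nn

class HierarchicalMLM(nn.Module):
    # Hypothetical sketch: sentence-level encoder contextualizes tokens
    # within each sentence; a document-level encoder then contextualizes
    # the pooled sentence representations.
    def __init__(self, vocab_size=100, d_model=32, nhead=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        sent_layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=64, batch_first=True)
        doc_layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=64, batch_first=True)
        self.sent_encoder = nn.TransformerEncoder(sent_layer, num_layers=1)
        self.doc_encoder = nn.TransformerEncoder(doc_layer, num_layers=1)
        self.mlm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        # token_ids: (docs, sentences, tokens)
        d, s, t = token_ids.shape
        x = self.embed(token_ids)                    # (d, s, t, h)
        x = self.sent_encoder(x.view(d * s, t, -1))  # within-sentence context
        sent_repr = x.mean(dim=1).view(d, s, -1)     # pool tokens -> sentence vectors
        doc_ctx = self.doc_encoder(sent_repr)        # cross-sentence context
        # broadcast document context back to every token position
        x = x.view(d, s, t, -1) + doc_ctx.unsqueeze(2)
        return self.mlm_head(x)                      # (d, s, t, vocab)

tokens = torch.randint(0, 100, (2, 3, 5))  # 2 docs, 3 sentences, 5 tokens each
logits = HierarchicalMLM()(tokens)
print(logits.shape)  # torch.Size([2, 3, 5, 100])
```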
multimodal-self-distillation
A generalized self-supervised training paradigm for unimodal and multimodal alignment and fusion.
3d-attention-video-understanding
Using a 3D Nearby Self-Attention Transformer to leverage the spatiotemporal nature of video for representation learning.
cross-modal-speech-segment-retrieval
Learning a common representation space from speech and text for cross-modal retrieval given textual queries and speech files.
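Once both modalities are mapped into a common space, retrieval reduces to nearest-neighbor search by similarity. A sketch under stated assumptions (the embeddings below are random stand-ins; in the repo, separate speech and text encoders would produce them):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for pre-computed speech-segment embeddings in the shared space.
speech_embeddings = rng.normal(size=(5, 16))  # 5 segments, 16-d each
# Stand-in for a text-query embedding that lies close to segment 3.
text_query = speech_embeddings[3] + 0.01 * rng.normal(size=16)

def retrieve(query, candidates):
    # Rank candidate speech segments by cosine similarity to the query.
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = c @ q
    return int(np.argmax(sims)), sims

best, sims = retrieve(text_query, speech_embeddings)
print(best)  # 3
```

In practice the encoders would be trained (e.g. with a contrastive objective) so that matching speech/text pairs land close together in this space.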
joint-nas-hpo
Automatically improving and analyzing the performance of a neural network on a fashion classification dataset. Instead of considering the architecture and hyperparameters separately, we build a system to jointly optimize them.
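The joint aspect can be sketched as sampling architecture choices and training hyperparameters from one combined configuration space rather than optimizing them in separate stages. The search space, objective, and random-search strategy below are illustrative stand-ins, not the repo's actual setup:

```python
import random

# Combined configuration space: architecture choices and training
# hyperparameters are searched together (values are illustrative).
SPACE = {
    "num_layers": [1, 2, 3],           # architecture
    "hidden_units": [64, 128, 256],    # architecture
    "learning_rate": [1e-3, 1e-2],     # hyperparameter
    "dropout": [0.0, 0.2, 0.5],        # hyperparameter
}

def mock_validation_error(cfg):
    # Stand-in for training on the fashion dataset and measuring
    # validation error; here a synthetic function with a known optimum.
    return (abs(cfg["num_layers"] - 2)
            + abs(cfg["learning_rate"] - 1e-2)
            + cfg["dropout"] * 0.1
            + abs(cfg["hidden_units"] - 128) / 1000)

def random_search(n_trials=50, seed=0):
    # Jointly sample a full configuration per trial and keep the best.
    rng = random.Random(seed)
    best_cfg, best_err = None, float("inf")
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in SPACE.items()}
        err = mock_validation_error(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err

cfg, err = random_search()
print(cfg, err)
```

Random search is the simplest joint strategy; the same combined-space idea carries over to Bayesian optimization or other NAS/HPO methods.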
kg-augmented-lm
Leveraging knowledge graphs to learn a more factually grounded language model for downstream retrieval and question-answering tasks.
seminar_multimodal_dl
https://slds-lmu.github.io/seminar_multimodal_dl/