Repositories under the mbert topic:
This repository contains the official release of the model "BanglaBERT" and the associated downstream fine-tuning code and datasets introduced in the paper "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla", accepted in Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: NAACL-2022.
Improving Word Translation via Two-Stage Contrastive Learning (ACL 2022). Keywords: Bilingual Lexicon Induction, Word Translation, Cross-Lingual Word Embeddings.
Official Repository for the paper titled "Meta-Learning for Effective Multi-task and Multilingual Modelling" accepted at EACL 2021
[EMNLP 2022] Discovering Language-neutral Sub-networks in Multilingual Language Models.
A Large-scale Multilingual Benchmark Dataset for Automated Translation of Bangla Regional Dialects to Bangla Language
This study introduces MultiBanFakeDetect, a novel multimodal dataset for Bangla fake news detection, combining textual and visual information. It features TextFakeNet for text analysis and MultiFusionFake for integrating multimodal data.
Zero-shot and Translation Experiments on XQuAD, MLQA and TyDiQA
ICEBERT: Interlingual-Clusters Enhanced BERT. A BERT-like model trained on clusters of monolingual subwords.
This research examines Large Language Models in Bengali Natural Language Inference, comparing them with state-of-the-art models using the XNLI dataset.
mBERT and XLM-R for encoding Scandinavian languages
HASOC2021, Subtask 2a (Codemix Challenge): contains baselines and a hierarchical approach that extracts the context relevant for classifying hostile tweets in English-Hindi code-mixed data obtained from Twitter.
Deployed model that summarizes Lithuanian text using artificial neural networks, Transformers, and mBERT.
Multilingual hate speech detection for German, Italian and Spanish Social Media Posts #machine learning #classifier
Using the hypothesis of historical linguistics, we found a way to improve the performance of multilingual transformers with a limited amount of data.
This is a project proposal to implement Yan et al.'s (2020) mBERT-Unaligned for cross-lingual RDs with Japanese, German and Italian untranslatable terms
Collection of scripts used to create SRL datasets for Galician and Spanish using a verbal indexing method, as well as fine-tuned BERT and XLM-R models for SRL on each language
Align parallel sentences in 104 languages using mBERT and LaBSE
Fine-tuned BERT, mBERT and XLM-RoBERTa for Abusive Comments Detection in Telugu, Code-Mixed Telugu and Telugu-English.
GPT-3.5 fine-tuning
Bengali Misogyny Identification with Deep Learning and LIME.
This study presents a novel multimodal fusion technique for disaster identification in Bangla, combining text and image data using the "BanglaCalamityMMD" dataset. Employing DisasterTextNet, DisasterImageNet, and DisasterMultFusionNet, the approach addresses a key gap in Bangla disaster research.
This study addresses the gap in translating Bangla regional dialects into standard Bangla by creating a large-scale multilingual benchmark dataset of 32,500 sentences in Bangla, Banglish, and English, representing five regional Bangla dialects: Sylheti, Chittagong, Mymensingh, Noakhali, and Barishal.
Slovenian Definition Extraction