There are 6 repositories under the code-switching topic.
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
A curated list of research papers and resources on code-switching
This tool helps automatically generate grammatically valid synthetic code-mixed data using linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.
This code provides a word-level language identification tool that identifies the language of individual words in code-mixed text, e.g. text that mixes words from two languages, such as Hindi written in Roman script mixed with English.
Implementation of meta-transfer-learning for ASR and LM (ACL 2020)
CodeSwitch is an NLP tool that can be used for language identification, POS tagging, named entity recognition, and sentiment analysis of code-mixed data.
Natural Language Processing
Multilingual Meta-Embeddings for Named Entity Recognition (RepL4NLP & EMNLP 2019)
PyTorch implementation of CS-Tacotron, an end-to-end generative TTS model for code-switching speech synthesis.
💬 MaskLID: Code-Switching Language Identification through Iterative Masking -- ACL 2024
Code-switching patterns can be an effective route to improve performance of downstream NLP applications: A case study of humour, sarcasm and hate speech detection
Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning (CALCS 2018, ACL)
Repository containing code for abusive tweet detection, location detection, and gender detection
A sequence tagging model with active learning
Code repository for the ACL 2020 paper "Multi-label and Multilingual News Framing Analysis"
Implementation of a deep learning model (BiLSTM) to detect code-switching
Jopara (Guarani-dominant mixed with Spanish) sentiment analysis corpus
POSIT aims to segment and tag mixed text that contains English and C-like code, so that the user knows both what each token is and what role, such as an AST tag or PoS tag, it serves within the language it is used in.
[EMNLP 2023] Official repository of paper titled "Detecting Propaganda Techniques in Code-Switched Social Media Text"
A package for determining the matrix language in bilingual sentences
Point of Interest Error Rate (PIER) Metric for Code-Switching ASR: A specialized evaluation metric designed to focus on critical points in multilingual speech recognition, providing a more accurate analysis of code-switched utterances.
A socket script to obtain the Chinese phone sequence for any English word
Code for "CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units" (Accepted at ACL-SRW 2024) 🇹🇭
Official repository for the paper titled "From Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text" accepted at ACL 2021
Chrome extension for translating highlighted English text into Chinglish (a Chinese + English hybrid)
A simple UI to translate text written in romanised Hindi into fully English sentences
A word-level Language Identification (LID) tool for Tagalog-English (Taglish) text
Japanese Speaking English Speech Dataset
This repository contains crowdsourced universal part-of-speech tags for the Miami Bangor corpus.
An English-Spanish code-switching dataset adapted from the Miami-Corpus
Code-Switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation. (Interspeech 2019)
ArEnAV: Arabic English Audio Visual Code-Switching Deepfake Dataset
Code-switched data generation based on part-of-speech tags and language modeling of the generated text.
A language detection model for code-switched texts in es/en/zh
This is a machine learning project focused on analysing and classifying sentiments in code-switched and code-mixed text, specifically targeting the unique linguistic characteristics found in Malaysian conversations.