There is 1 repository under the cross-lingual topic.
A curated list of pretrained sentence and word embedding models
BayLing (“百聆”) is an English/Chinese large language model based on LLaMA with enhanced language alignment. It shows superior capability in English/Chinese generation, instruction following, and multi-turn interaction, and reaches 90% of ChatGPT's performance on a range of multilingual and general-task evaluations.
AAAI-20 paper: Cross-Lingual Natural Language Generation via Pre-Training
Jointly training a speaker encoder with a consistency loss to achieve cross-lingual and expressive voice conversion
Cross-Lingual Machine Reading Comprehension (EMNLP 2019)
Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework
This repo contains the code for the paper Neural Factor Graph Models for Cross-lingual Morphological Tagging.
A diffusion-based cross-lingual voice conversion model, developed as my bachelor's thesis
SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 languages, generated using PaLM 2 and summarize-then-ask prompting.
Deep-learning system proposed by HFL for SemEval-2022 Task 8: Multilingual News Similarity
Python source code for EMNLP 2020 paper "Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT".
A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering
EMNLP 2022: ClidSum: A Benchmark Dataset for Cross-Lingual Dialogue Summarization
Attention-Informed Mixed-Language Training for Zero-shot Cross-lingual Task-oriented Dialogue Systems (AAAI-2020)
PyTorch implementation of ACL paper https://arxiv.org/abs/1906.02656
Code for InfoCTM: A Mutual Information Maximization Perspective of Cross-lingual Topic Modeling (AAAI2023)
Zero-shot Cross-lingual Task-Oriented Dialogue Systems (EMNLP 2019)
MT/IE: Cross-lingual Open Information Extraction with Neural Sequence-to-Sequence Models
EMNLP-2020: Cross-lingual Spoken Language Understanding with Regularized Representation Alignment
Few-Shot Cross-Lingual Stance Detection with Sentiment-Based Pre-Training
Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources (NAACL-2021).
Discovering Universal Geometry in Embeddings with ICA
Cross-lingual Normalized Pointwise Mutual Information for cross-lingual topic evaluation.
Python source code for EMNLP 2021 Findings paper: "Subword Mapping and Anchoring Across Languages".
Repository for the paper titled: "When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer"
Code for the paper "Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph" (Findings of EMNLP 2021).
Data and code for "Understanding Linearity of Cross-Lingual Word Embedding Mappings" (TMLR 2022)
Official code repo for paper: ACROSS: An Alignment-based Framework for Low-Resource Many-to-One Cross-Lingual Summarization
Code for the paper "Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation" (COLING 2020)