MissPenguin / unilm

UniLM AI - Large-scale Self-supervised Pre-training across Tasks, Languages, and Modalities

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool


Pre-trained (foundation) models across tasks (understanding, generation and translation), languages (100+ languages), and modalities (language, image, audio, vision + language, audio + language, etc.)

The family of UniLM AI:

UniLM (v1@NeurIPS'19 | v2@ICML'20 | v3@ACL'21): unified pre-training for language understanding and generation

InfoXLM (v1@NAACL'21 | v2@ACL'21): multilingual/cross-lingual pre-trained models for 100+ languages

DeltaLM (NEW): encoder-decoder pre-training for language generation and translation for 100+ languages

MiniLM (v1@NeurIPS'20 | v2@ACL'21): small and fast pre-trained models for language understanding and generation

AdaLM (v1@ACL'21): domain, language, and task adaptation of pre-trained models

LayoutLM (v1@KDD'20 | v2@ACL'21): multimodal (text + layout/format + image) pre-training for Document AI (e.g. scanned documents, PDF, etc.)

LayoutXLM (NEW): multimodal (text + layout/format + image) pre-training for multilingual document understanding

LayoutReader (EMNLP'21): Pre-training of text and layout for reading order detection

BEiT (NEW): BERT Pre-Training of Image Transformers

UniSpeech (v1@ICML'21): Speech Pre-Training for ASR and TTS

s2s-ft: sequence-to-sequence fine-tuning toolkit

XLM-T (NEW): Multilingual NMT w/ pretrained cross-lingual encoders

TrOCR (NEW): Transformer-based OCR w/ pre-trained models


  • September 28th, 2021: T-ULRv5 (aka XLM-E/InfoXLM) as the SOTA on the XTREME leaderboard. // Blog
  • [Model Release] September, 2021: LayoutLM-cased are on HuggingFace
  • [Model Release] September, 2021: TrOCR - Transformer-based OCR w/ pre-trained BEiT and RoBERTa models.
  • August 2021: LayoutLMv2 and LayoutXLM are on HuggingFace
  • [Model Release] August, 2021: LayoutReader - Built with LayoutLM to improve general reading order detection.
  • [Model Release] August, 2021: DeltaLM - Encoder-decoder pre-training for language generation and translation.
  • August 2021: BEiT is on HuggingFace
  • [Model Release] July, 2021: BEiT - Towards BERT moment for CV
  • [Model Release] June, 2021: LayoutLMv2, LayoutXLM, MiniLMv2, and AdaLM.
  • May, 2021: LayoutLMv2, InfoXLMv2, MiniLMv2, UniLMv3, and AdaLM were accepted by ACL 2021.
  • April, 2021: LayoutXLM is coming by extending the LayoutLM into multilingual support! A multilingual form understanding benchmark XFUND is also introduced, which includes forms with human labeled key-value pairs in 7 languages (Chinese, Japanese, Spanish, French, Italian, German, Portuguese).
  • March, 2021: InfoXLM was accepted by NAACL 2021.
  • December 29th, 2020: LayoutLMv2 is coming with the new SOTA on a wide varierty of document AI tasks, including DocVQA and SROIE leaderboard.
  • October 8th, 2020: T-ULRv2 (aka InfoXLM) as the SOTA on the XTREME leaderboard. // Blog
  • September, 2020: MiniLM was accepted by NeurIPS 2020.
  • July 16, 2020: InfoXLM (Multilingual UniLM) arXiv
  • June, 2020: UniLMv2 was accepted by ICML 2020; LayoutLM was accepted by KDD 2020.
  • April 5, 2020: Multilingual MiniLM released!
  • September, 2019: UniLMv1 was accepted by NeurIPS 2019.


***** New September, 2021: TrOCR release *****

  • TrOCR (September 22, 2021): Transformer-based OCR with pre-trained models, which leverages the Transformer architecture for both image understanding and bpe-level text generation. The TrOCR model is simple but effective (convolution free), and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets. "TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models"

***** August, 2021: LayoutReader release *****

***** August, 2021: DeltaLM release *****

***** July, 2021: BEiT release *****

***** June, 2021: LayoutXLM | AdaLM | MiniLMv2 release *****

***** May, 2021: LayoutLMv2 | LayoutXLM release *****

  • LayoutLM 2.0 (December 29, 2020): multimodal pre-training for visually-rich document understanding by leveraging text, layout and image information in a single framework. It is coming with new SOTA on a wide range of document understanding tasks, including FUNSD (0.7895 -> 0.8420), CORD (0.9493 -> 0.9601), SROIE (0.9524 -> 0.9781), Kleister-NDA (0.834 -> 0.852), RVL-CDIP (0.9443 -> 0.9564), and DocVQA (0.7295 -> 0.8672). "LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding ACL 2021"

***** February, 2020: UniLM v2 | MiniLM v1 | LayoutLM v1 | s2s-ft v1 release *****

***** October 1st, 2019: UniLM v1 release *****


This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the transformers project.

Microsoft Open Source Code of Conduct

Contact Information

For help or issues using UniLM AI models, please submit a GitHub issue.

For other communications related to UniLM AI, please contact Furu Wei (fuwei@microsoft.com).

ezoic increase your site revenue


UniLM AI - Large-scale Self-supervised Pre-training across Tasks, Languages, and Modalities

License:MIT License


Language:Python 97.4%Language:Shell 1.8%Language:Cuda 0.4%Language:C++ 0.2%Language:Cython 0.1%Language:Lua 0.1%Language:Makefile 0.0%Language:Batchfile 0.0%