- ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- Code: https://github.com/google-research/vision_transformer
- Paper: https://arxiv.org/pdf/2010.11929.pdf
- MLP-Mixer: An all-MLP Architecture for Vision
- Code: https://github.com/google-research/vision_transformer
- Paper: https://arxiv.org/pdf/2105.01601.pdf
- Learning to Perturb Word Embeddings for Out-of-distribution QA
- Code: ???
- Paper: https://arxiv.org/pdf/2105.02692v1.pdf
- Emerging Properties in Self-Supervised Vision Transformers (Dino)
- Code: https://github.com/facebookresearch/dino
- Paper: https://arxiv.org/pdf/2104.14294.pdf
- Supplementary: https://ai.facebook.com/blog/dino-paws-computer-vision-with-self-supervised-transformers-and-10x-more-efficient-training
- Pay Attention to MLPs
- Code:
- Paper: https://arxiv.org/pdf/2105.08050.pdf
- An overview of mixing augmentation methods and augmentation strategies (slow read)
  - Surveys data augmentation (DA) techniques from top-tier papers published in 2017 and later
- Check out https://www.deepspeed.ai/, a deep learning optimization library
- OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
- Deep Residual Learning for Image Recognition
- You Only Look Once: Unified, Real-Time Object Detection
- YOLO9000: Better, Faster, Stronger
- YOLOv3: An Incremental Improvement
- YOLOv4: Optimal Speed and Accuracy of Object Detection
- SSD: Single Shot Multibox Detector
- Focal Loss for Dense Object Detection
- Region-based Convolutional Networks for Accurate Object Detection & Segmentation
- Fast R-CNN
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
- Mask R-CNN
- U-Net: Convolutional Networks for Biomedical Image Segmentation
- CatBoost: Unbiased Boosting with Categorical Features
- LightGBM: A Highly Efficient Gradient Boosting Decision Tree
- XGBoost: A Scalable Tree Boosting System
- Efficient Estimation of Word Representations in Vector Space
- Distributed Representations of Words and Phrases and their Compositionality
- Enriching Word Vectors with Subword Information
- Bag of Tricks for Efficient Text Classification
- Convolutional Neural Networks for Sentence Classification
- Effective Approaches to Attention-based Neural Machine Translation
- Attention Is All You Need
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
- WinoGrande: An Adversarial Winograd Schema Challenge at Scale
- KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning
- Connecting the Dots: A Knowledgeable Path Generator for Commonsense Question Answering
- From Recognition to Cognition: Visual Commonsense Reasoning
- UNITER: UNiversal Image-TExt Representation Learning
- Connective Cognition Network for Directional Visual Commonsense Reasoning
- VisualCOMET: Reasoning about the Dynamic Context of a Still Image
- ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
- VL-BERT: Pre-training of Generic Visual-Linguistic Representations
- Language Models as Knowledge Bases?
- Commonsense Knowledge Base Completion with Structural and Semantic Context
- ConceptNet 5.5: An Open Multilingual Graph of General Knowledge
- ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning
- COMET: Commonsense Transformers for Automatic Knowledge Graph Construction
- Bag of Tricks for Efficient Text Classification
- Connecting the Dots: A Knowledgeable Path Generator for Commonsense Question Answering
- From Recognition to Cognition: Visual Commonsense Reasoning
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations
- Word2Vec
- Transformer
- Must-read GNN papers: https://github.com/thunlp/GNNPapers
- Must-read NLP papers: https://github.com/mhagiwara/100-nlp-papers
- Must-read reading comprehension (RC) papers: https://github.com/thunlp/RCPapers
- Must-read pre-trained language model (PLM) papers: https://github.com/thunlp/PLMpapers