This is a niche collection of research papers that have helped push forward the fields of Natural Language Processing, Deep Learning, and Artificial Intelligence.
- Lin, X. V., Chen, X., Chen, M., Shi, W., Lomeli, M., James, R., ... & Yih, S. (2023). "RA-DIT: Retrieval-augmented dual instruction tuning." arXiv preprint arXiv:2310.01352 [PDF].
- Wang, Dongsheng, et al. "DocLLM: A layout-aware generative language model for multimodal document understanding." arXiv preprint arXiv:2401.00908 (2023) [PDF].
- McIntosh, Timothy R., et al. "From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape." arXiv preprint arXiv:2312.10868 (2023) [PDF].
- Pelrine, Kellin, et al. "Exploiting Novel GPT-4 APIs." arXiv preprint arXiv:2312.14302 (2023) [PDF].
- Jiang, Albert Q., et al. "Mistral 7B." arXiv preprint arXiv:2310.06825 (2023) [PDF].
- Liu, Zhengzhong, et al. "LLM360: Towards Fully Transparent Open-Source LLMs." arXiv preprint arXiv:2312.06550 (2023) [PDF].
- Dao, Tri, et al. "FlashAttention: Fast and memory-efficient exact attention with IO-awareness." Advances in Neural Information Processing Systems 35 (2022): 16344-16359. [PDF]
- Hu, Edward J., et al. "LoRA: Low-rank adaptation of large language models." arXiv preprint arXiv:2106.09685 (2021). [PDF]
- Touvron, Hugo, et al. "LLaMA: Open and efficient foundation language models." arXiv preprint arXiv:2302.13971 (2023). [arXiv]
- Wu, Shijie, et al. "BloombergGPT: A large language model for finance." arXiv preprint arXiv:2303.17564 (2023) [PDF].
- Yang, H., Liu, X. Y., & Wang, C. D. (2023). "FinGPT: Open-Source Financial Large Language Models." arXiv preprint arXiv:2306.06031. [PDF]
- Wei, Jason, et al. "Chain-of-thought prompting elicits reasoning in large language models." Advances in Neural Information Processing Systems 35 (2022): 24824-24837. [PDF]
- UNILMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training, Hangbo Bao et al., 2020 [arXiv]
- MPNet: Masked and Permuted Pre-training for Language Understanding, Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu, NIPS 2020 [arXiv]
- ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding, Sun et al., AAAI 2020 [arXiv]
- StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding, Wang et al., ICLR 2020 [arXiv]
- PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization, Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu, Google Research, ICML 2020 [arXiv]
- GPT-3: Language Models are Few-Shot Learners, Brown et al., OpenAI, 2020 [arXiv]
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, Clark et al., Stanford and Google, 2020 [arXiv]
- XLNet: Generalized Autoregressive Pretraining for Language Understanding, Yang et al., Google AI Brain, NIPS 2019 [arXiv]
- ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations, Zhenzhong Lan, Mingda Chen, et al., 2020 [arXiv]
- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Raffel et al., Google, 2019 [arXiv]
- RoBERTa: A Robustly Optimized BERT Pretraining Approach, Liu et al., Facebook AI, 2019 [arXiv]
- SpanBERT: Improving Pre-training by Representing and Predicting Spans, Joshi, Chen, AllenAI, Facebook Research, 2020 [arXiv]
- UniLM: Unified Language Model Pre-training for Natural Language Understanding and Generation, Dong et al., Microsoft Research, 2019 [NIPS]
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf, 2020 [arXiv]
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, Lewis et al. [arXiv]
- MASS: Masked Sequence to Sequence Pre-training for Language Generation, Song et al. [arXiv]
- GPT-2: Language Models are Unsupervised Multitask Learners, Radford et al. 2019, OpenAI [OpenAI]
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al. 2018, Google AI Language [arXiv]
- GPT: Improving Language Understanding by Generative Pre-Training, Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever @ OpenAI, 2018 [OpenAI]
- Attention Is All You Need, Vaswani et al., 2017 [arXiv]
- Graph Attention Networks, Petar Veličković, Yoshua Bengio et al., ICLR 2018 [arXiv]
- He, Pengcheng, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. "DeBERTa: Decoding-enhanced BERT with Disentangled Attention." arXiv preprint arXiv:2006.03654 (2020). [arXiv]
- Lewis, Mike, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, and Luke Zettlemoyer. "MARGE: Pre-training via paraphrasing." arXiv preprint arXiv:2006.15020 (2020). [arXiv]
- Mao, Yuning, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, and Weizhu Chen. "Generation-augmented retrieval for open-domain question answering." arXiv preprint arXiv:2009.08553 (2020) [arXiv].
- Fedus, William, Barret Zoph, and Noam Shazeer. "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity." arXiv preprint arXiv:2101.03961 (2021) [arXiv].
- Bell, Sean, Yiqun Liu, Sami Alsheikh, Yina Tang, Edward Pizzi, M. Henning, Karun Singh, Omkar Parkhi, and Fedor Borisyuk. "Groknet: Unified computer vision model trunk and embeddings for commerce." In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2608-2616. 2020. [PDF]
- GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge, Huang et al., 2020 [arXiv]
- Syntax-guided Controlled Generation of Paraphrases, Ashutosh Kumar, Kabir Ahuja, Raghuram Vadapalli, Partha Talukdar, ACL 2020 [arXiv]
- Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension, Daniel Andor, Luheng He, Kenton Lee, Emily Pitler, ACL 2019 [arXiv]
- CTRL: A Conditional Transformer Language Model for Controllable Generation, Nitish Shirish Keskar, Richard Socher et al., 2019 [arXiv]
- Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks, Gururangan et al., ACL 2020 [arXiv]
- Unifying Question Answering, Text Classification, and Regression via Span Extraction, Nitish Keskar, Richard Socher et al. [arXiv]
- How to Fine-Tune BERT for Text Classification?, Sun et al., 2019 [arXiv]
- To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks, Peters, Ruder, Smith, AI2, ACL 2019 [arXiv]
- To Tune or Not To Tune? How About the Best of Both Worlds?, Wang et al., 2019 [arXiv]
- Leveraging Pre-trained Checkpoints for Sequence Generation Tasks, Sascha Rothe, Shashi Narayan, Aliaksei Severyn, ACL 2020 [arXiv]
- Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping, Dodge et al., 2020 [arXiv]
- If Beam Search is the Answer, What was the Question?, Clara Meister, Tim Vieira, Ryan Cotterell, EMNLP 2020 [arXiv]
- Xu, Benfeng, Licheng Zhang, Zhendong Mao, Quan Wang, Hongtao Xie, and Yongdong Zhang. "Curriculum learning for natural language understanding." In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 6095-6104. 2020. [ACLWeb]
- Lee-Thorp, James, Joshua Ainslie, Ilya Eckstein, and Santiago Ontanon. "FNet: Mixing Tokens with Fourier Transforms." arXiv preprint arXiv:2105.03824 (2021).
- MT-DNNKD: Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding, Xiaodong Liu et al. [arXiv]
- Multi-Task Deep Neural Networks for Natural Language Understanding, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, 2019 [arXiv]
- BAM! Born-Again Multi-Task Networks for Natural Language Understanding, Kevin Clark, Christopher D. Manning et al. [arXiv]
- Multitask Prompted Training Enables Zero-Shot Task Generalization, Victor Sanh, Albert Webson, Colin Raffel et al., 2021 [arXiv]
- ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, Robyn Speer et al., AAAI 2017 [arXiv]
- Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison, Alessandro Raganato, Jose Camacho-Collados and Roberto Navigli, ACL 2017 [arXiv]
- SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems, Wang et al., NIPS 2019 [arXiv]
- CHECKLIST: Beyond Accuracy: Behavioral Testing of NLP Models with CheckList, Ribeiro, Wu, Guestrin, Sameer Singh, 2020 [ACLWeb]
- HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering, Zhilin Yang et al., EMNLP 2018 [arXiv]
- LearningQ: A Large-scale Dataset for Educational Question Generation, Guanliang Chen et al., AAAI 2018 [PDF]
- Petroni, Fabio, Aleksandra Piktus, Angela Fan, Patrick Lewis, Majid Yazdani, Nicola De Cao, James Thorne et al. "KILT: a benchmark for knowledge intensive language tasks." arXiv preprint arXiv:2009.02252 (2020).
- Khot, Tushar, Peter Clark, Michal Guerquin, Peter Jansen, and Ashish Sabharwal. "QASC: A Dataset for Question Answering via Sentence Composition." In AAAI, pp. 8082-8090. 2020.
- Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge, Alon Talmor, Peter Clark et al., NIPS 2020 [arXiv]
- oLMpics - On what Language Model Pre-training Captures, Alon Talmor, Yoav Goldberg et al., The Allen Institute for AI, 2020 [arXiv]
- A Framework for Understanding Unintended Consequences of Machine Learning, Harini Suresh, John V. Guttag, 2020 [arXiv]
- How Can I Explain This to You? An Empirical Study of Deep Neural Network Explanation Methods, Jeya Vikranth Jeyakumar, Joseph Noor, Yu-Hsi Cheng, Luis Garcia, Mani Srivastava, NIPS 2020 [arXiv]
- Explaining Explanations: Axiomatic Feature Interactions for Deep Networks, Janizek, Sturmfels, Lee, 2020 [arXiv]
- Towards Interpretable Natural Language Understanding with Explanations as Latent Variables, Zhou, Hu, Zhang, Liang, Sun, Xiong, Tang et al. [arXiv]
- Principles and Practice of Explainable Machine Learning, Vaishak Belle, Ioannis Papantonis [arXiv]
- Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI, Arrieta et al., 2019 [arXiv]
- Jhamtani, Harsh, and Peter Clark. "Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multihop Question-Answering." arXiv preprint arXiv:2010.03274 (2020) [arXiv]
- Narang, Sharan, Colin Raffel, Katherine Lee, Adam Roberts, Noah Fiedel, and Karishma Malkan. "WT5?! Training Text-to-Text Models to Explain their Predictions." arXiv preprint arXiv:2004.14546 (2020). [arXiv]
- A Survey of the State of Explainable AI for Natural Language Processing, Danilevsky et al. [arXiv]
- Definitions, methods, and applications in interpretable machine learning, W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu, PNAS 2019 [PNAS]
- Towards Transparent and Explainable Attention Models, Mohankumar, Mitesh Khapra et al., ACL 2020 [arXiv]
- Revealing the Dark Secrets of BERT, Olga Kovaleva et al., EMNLP 2019 [arXiv]
- DeepLIFT: Learning Important Features Through Propagating Activation Differences, Avanti Shrikumar et al., Stanford University, ICML 2017 [arXiv]
- Analysis Methods in Neural Language Processing: A Survey, Belinkov, Glass, 2019 [arXiv]
- LIME: "Why Should I Trust You?": Explaining the Predictions of Any Classifier, Ribeiro, Sameer Singh, Guestrin, University of Washington, KDD 2016 [arXiv]
- Axiomatic Attribution for Deep Networks, Sundararajan, Taly, Yan, Google, ICML 2017 [arXiv]
- How Important Is a Neuron?, Kedar Dhamdhere, Mukund Sundararajan, Qiqi Yan, Google Research [arXiv]
- SHAP: A Unified Approach to Interpreting Model Predictions, Lundberg, Lee, University of Washington, NIPS 2017 [arXiv]
- Attention is not not Explanation, Sarah Wiegreffe, Yuval Pinter, EMNLP 2019 [arXiv]
- Attention is not Explanation, Sarthak Jain, Byron C. Wallace, NAACL 2019 [arXiv]
- What do you Learn from Context? Probing for Sentence Structure in Contextualized Word Representations, Tenney et al., ICLR 2019 [openreview]
- Are Sixteen Heads Really Better than One?, Paul Michel, Omer Levy, Graham Neubig, 2019 [arXiv]
- Fine-Grained Analysis of Sentence Embedding Using Auxiliary Prediction Tasks, Yossi Adi, Yoav Goldberg et al., ICLR 2017 [arXiv]
- Assessing BERT’s Syntactic Abilities, Yoav Goldberg, 2019 [arXiv]
- Generating Derivational Morphology with BERT, Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze, 2020 [arXiv]
- Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs, Warstadt et al. [arXiv]
- What Does BERT Look At? An Analysis of BERT’s Attention, Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning, 2019 [arXiv]
- BERT Rediscovers the Classical NLP Pipeline, Ian Tenney, Dipanjan Das, Ellie Pavlick, 2019 [arXiv]
- Visualizing and Measuring the Geometry of BERT, Andy Coenen, Martin Wattenberg et al., NIPS 2019 [arXiv]
- Designing and Interpreting Probes with Control Tasks, John Hewitt, Percy Liang, EMNLP 2019 [arXiv]
- Open Sesame: Getting Inside BERT’s Linguistic Knowledge, Yongjie Lin, Yi Chern Tan, Robert Frank, 2019 [arXiv]
- A Structural Probe for Finding Syntax in Word Representations, John Hewitt, Christopher D. Manning, NAACL 2019, Stanford [arXiv]
- On Identifiability in Transformers, Brunner, Liu, Pascual, Richter, Ciaramita, Wattenhofer, Google Research, ICLR 2020 [arXiv]
- NILE: Natural Language Inference with Faithful Natural Language Explanations, Sawan Kumar, Partha Talukdar, 2020 [arXiv]
- Quantifying Attention Flow in Transformers, Samira Abnar, Willem Zuidema, ACL 2020 [arXiv]
- Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words?, Sen et al., ACL 2020 [arXiv]
- Understanding Attention for Text Classification, Xiaobing Sun and Wei Lu, Singapore University of Technology and Design, ACL 2020 [arXiv]
- Schick, Timo, and Hinrich Schütze. "It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners." arXiv preprint [arXiv:2009.07118] (2020).
- Taming Pretrained Transformers for Extreme Multi-label Text Classification, Wei-Cheng Chang, Hsiang-Fu Yu, Kai Zhong, Yiming Yang, Inderjit S. Dhillon, 2020 [arXiv]
- Dahiya, Kunal, Deepak Saini, Anshul Mittal, Ankush Shaw, Kushal Dave, Akshay Soni, Himanshu Jain, Sumeet Agarwal, and Manik Varma. "DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents." In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pp. 31-39. 2021. [arXiv]
- Zero-Shot Transfer Learning with Synthesized Data for Multi-Domain Dialogue State Tracking, Giovanni Campagna Agata Foryciarz Mehrad Moradshahi Monica S. Lam, ACL 2020 [arXiv]
- Automated Essay Scoring with Discourse-Aware Neural Models, Farah Nadeem et al. ACL 2019 [arXiv]
- Get To The Point: Summarization with Pointer-Generator Networks Abigail See, Peter J. Liu, Christopher D. Manning, 2017 [arXiv]
- A Recurrent BERT-based Model for Question Generation, Ying-Hong Chan, Yao-Chung Fan, ACL 2019 workshop on Question Answering [arXiv]
- Improving Neural Question Generation using Answer Separation, Yanghoon Kim, Hwanhee Lee, Joongbo Shin and Kyomin Jung, AAAI 2018 [arXiv]
- Question Generation for Question Answering, Nan Duan, Duyu Tang, Peng Chen, Ming Zhou, EMNLP 2017 [arXiv]
- Toward Subgraph Guided Knowledge Graph Question Generation with Graph Neural Networks, Yu Chen, Lingfei Wu, Mohammed J. Zaki, 2020 [arXiv]
- Paragraph-level Neural Question Generation with Maxout Pointer and Gated Self-attention Networks, Yao Zhao, Xiaochuan Ni, Yuanyuan Ding, Qifa Ke, ACL 2018 [arXiv]
- CopyBERT: A Unified Approach to Question Generation with Self-Attention, Stalin Varanasi, Saadullah Amin, Gunter Neumann, ACL 2020 [arXiv]
- Generating Natural Language Question-Answer Pairs from a Knowledge Graph Using a RNN Based Question Generation Model, Sathish Indurthi, Dinesh Raghu, Mitesh M. Khapra and Sachindra Joshi, ACL 2017 [arXiv]
- Recent Advances in Neural Question Generation, Liangming Pan, Min-Yen Kan et al., 2019 [arXiv]
- Semantic Graphs for Generating Deep Questions, Liangming Pan, Min-Yen Kan et al., 2019 [arXiv]
- Learning to Ask: Neural Question Generation for Reading Comprehension, Xinya Du et al., ACL 2017 [arXiv]
- Neural Models for Key Phrase Extraction and Question Generation, Sandeep Subramanian et al., Machine Reading for Question Answering workshop at ACL 2018 [arXiv]
- Ko, Wei-Jen, Te-Yuan Chen, Yiyan Huang, Greg Durrett, and Junyi Jessy Li. "Inquisitive Question Generation for High Level Text Comprehension." arXiv preprint arXiv:2010.01657 (2020). [arXiv]
- Sultan, Md Arafat, Shubham Chandel, Ramón Fernandez Astudillo, and Vittorio Castelli. "On the importance of diversity in question generation for QA." In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5651-5656. 2020. [arXiv]
- Lopez, Luis Enrico, Diane Kathryn Cruz, Jan Christian Blaise Cruz, and Charibeth Cheng. "Simplifying Paragraph-level Question Generation via Transformer Language Models." arXiv preprint arXiv:2005.01107 (2020) [arXiv].
- Retrieve, Rerank, Read, then Iterate: Answering Open-Domain Questions of Arbitrary Complexity from Text, Peng Qi, Christopher Manning et al., 2020 [arXiv]
- Explain Yourself! Leveraging Language Models for Commonsense Reasoning, Rajani, McCann, Xiong, Richard Socher, 2019 [arXiv]
- Min, Sewon, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu et al. "NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned." arXiv preprint arXiv:2101.00133 (2021). [arXiv]
- Karpukhin, Vladimir, Barlas Oğuz, Sewon Min, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. "Dense Passage Retrieval for Open-Domain Question Answering." arXiv preprint arXiv:2004.04906 (2020). [arXiv]
- Cheng, Hao, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "Probabilistic Assumptions Matter: Improved Models for Distantly-Supervised Document-Level Question Answering." arXiv preprint arXiv:2005.01898 (2020). [arXiv]
- Ju, Ying, Fubang Zhao, Shijie Chen, Bowen Zheng, Xuefeng Yang, and Yunfeng Liu. "Technical report on conversational question answering." arXiv preprint arXiv:1909.10772 (2019). [arXiv]
- Khashabi, Daniel, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, and Hannaneh Hajishirzi. "UnifiedQA: Crossing format boundaries with a single QA system." arXiv preprint arXiv:2005.00700 (2020).
- Gomez-Perez, Jose Manuel, and Raul Ortega. "ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention." arXiv preprint arXiv:2010.00562 (2020).
- Wang, Peifeng, Nanyun Peng, Filip Ilievski, Pedro Szekely, and Xiang Ren. "Connecting the dots: A knowledgeable path generator for commonsense question answering." arXiv preprint arXiv:2005.00691 (2020).
- Saxena, Apoorv, Soumen Chakrabarti, and Partha Talukdar. "Question Answering Over Temporal Knowledge Graphs." arXiv preprint arXiv:2106.01515 (2021) [arXiv].
- DReCa: A General Task Augmentation Strategy for Few-Shot Natural Language Inference Shikhar Murty, Tatsunori B. Hashimoto, Christopher D. Manning, 2020 [arXiv]
- Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Prediction, Jinheon Baek, Dong Bok Lee, Sung Ju Hwang, NIPS 2020 [arXiv]
- Query2box: Reasoning over Knowledge Graphs in Vector Space using Box Embeddings, Hongyu Ren, Weihua Hu, Jure Leskovec, ICLR 2020 [arXiv]
- GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction, Oct 2020 [arXiv]
- SemBERT: Semantics-aware BERT for Language Understanding, Zhuosheng Zhang et al., AAAI 2020 [arXiv]
- SenseBERT: Driving Some Sense into BERT, Yoav Levine et al., AI21 Labs, May 2020 [arXiv]
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Lewis, Piktus et al., Facebook AI Research, NIPS 2020 [arXiv]
- Augmenting Neural Networks with First-order Logic, Tao Li, Vivek Srikumar, arXiv preprint, 2019 [arXiv]
- Towards Generalizable Neuro-Symbolic Systems for Commonsense Question Answering, Kaixin Ma et al., COIN workshop, ACL 2019 [arXiv]
- Neural Natural Language Inference Models Enhanced with External Knowledge, Qian Chen et al., ACL 2018 [arXiv]
- KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation, Wang et al., preprint, Feb 2020 [arXiv]
- KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning, Bill Yuchen Lin et al., EMNLP-IJCNLP 2019 [arXiv]
- Improving question answering with external knowledge, Xiaoman Pan et al., MRQA 2019 [arXiv]
- Improving Natural Language Inference Using External Knowledge in the Science Questions Domain, Wang et al., AAAI 2019 [arXiv]
- KG-BERT: BERT for Knowledge Graph Completion, Liang Yao, Chengsheng Mao, Yuan Luo, AAAI 2020 [arXiv]
- K-BERT: Enabling Language Representation with Knowledge Graph, Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng, Ping Wang, AAAI 2020 [arXiv]
- Structural Information Preserving for Graph-to-Text Generation, Linfeng Song et al., ACL 2020 [arXiv]
- Low-Dimensional Hyperbolic Knowledge Graph Embeddings, Chami et al., ACL 2020 [arXiv]
- K-Adapters: Infusing Knowledge into Pre-Trained Models with Adapters, Ruize Wang, Ming Zhou et al., ACL 2020 [arXiv]
- EWISE: Zero-shot Word Sense Disambiguation using Sense Definition Embeddings, Sawan Kumar, Partha Talukdar, ACL 2019 [arXiv]
- KnowBERT: Knowledge Enhanced Contextual Word Representations, Peters et al., ACL 2019 [arXiv]
- Sequential Latent Knowledge Selection For Knowledge-Grounded Dialogue, Byeongchang Kim, Jaewoo Ahn, Gunhee Kim, ICLR 2020 [arXiv]
- Knowledge-Augmented Language Model and Its Application to Unsupervised Named-Entity Recognition, Angli Liu, Jingfei Du, Veselin Stoyanov [arXiv]
- Barack’s Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling, Logan IV et al., ACL 2019 [arXiv]
- Knowledge Infused Learning (K-IL): Towards Deep Incorporation of Knowledge in Deep Learning, Kursuncu et al., AAAI 2020 [arXiv]
- COMET: Commonsense Transformers for Automatic Knowledge Graph Construction, Bosselut et al., ACL 2019 [arXiv]
- ERNIE: Enhanced Language Representation with Informative Entities, Zhang et al., ACL 2019 [arXiv]
- EmbedKGQA: Improving Multi-hop Question Answering over Knowledge Graphs using Knowledge Base Embeddings, Apoorv Saxena, Aditay Tripathi, Partha Talukdar, 2020 [ACLWeb]
- Feng, Yanlin, Xinyue Chen, Bill Yuchen Lin, Peifeng Wang, Jun Yan, and Xiang Ren. "Scalable multi-hop relational reasoning for knowledge-aware question answering." arXiv preprint arXiv:2005.00646 (2020).
- Sun, Yu, Shuohuan Wang, Shikun Feng, Siyu Ding, Chao Pang, Junyuan Shang, Jiaxiang Liu et al. "ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation." arXiv preprint arXiv:2107.02137 (2021).
- Convolutional 2D Knowledge Graph Embeddings, Dettmers et al., AAAI 2018 [arXiv]
- Translating Embeddings for Modeling Multi-relational Data, Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, NIPS 2013 [arXiv]
- InteractE: Improving Convolution-based Knowledge Graph Embeddings by Increasing Feature Interactions, Shikhar Vashishth, Soumya Sanyal, Vikram Nitin, Nilesh Agrawal, Partha Talukdar, AAAI 2020 [arXiv]
- Abboud, Ralph, Ismail Ceylan, Thomas Lukasiewicz, and Tommaso Salvatori. "BoxE: A Box Embedding Model for Knowledge Base Completion." Advances in Neural Information Processing Systems 33 (2020).
- Ji, Shaoxiong, Shirui Pan, Erik Cambria, Pekka Marttinen, and S. Yu Philip. "A survey on knowledge graphs: Representation, acquisition, and applications." IEEE Transactions on Neural Networks and Learning Systems (2021).
- Lecue, Freddy. "On the role of knowledge graphs in explainable AI." Semantic Web 11, no. 1 (2020): 41-51. [PDF]
- Tan, Minghuan, Lei Wang, Lingxiao Jiang, and Jing Jiang. "Investigating Math Word Problems using Pretrained Multilingual Language Models." arXiv preprint arXiv:2105.08928 (2021) [arXiv].
- Amini, Aida, Saadia Gabriel, Peter Lin, Rik Koncel-Kedziorski, Yejin Choi, and Hannaneh Hajishirzi. "MathQA: Towards interpretable math word problem solving with operation-based formalisms." arXiv preprint arXiv:1905.13319 (2019).
- Zhang, Jipeng, Lei Wang, Roy Ka-Wei Lee, Yi Bin, Yan Wang, Jie Shao, and Ee-Peng Lim. "Graph-to-tree learning for solving math word problems." Association for Computational Linguistics, ACL, 2020.
- Saxton, David, Edward Grefenstette, Felix Hill, and Pushmeet Kohli. "Analysing mathematical reasoning abilities of neural models." arXiv preprint arXiv:1904.01557 (2019).
- Ran, Qiu, Yankai Lin, Peng Li, Jie Zhou, and Zhiyuan Liu. "NumNet: Machine reading comprehension with numerical reasoning." arXiv preprint arXiv:1910.06701 (2019).
- Xie, Zhipeng, and Shichao Sun. "A Goal-Driven Tree-Structured Neural Model for Math Word Problems." In IJCAI, pp. 5299-5305. 2019.
- Wang, Yan, Xiaojiang Liu, and Shuming Shi. "Deep neural solver for math word problems." In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 845-854. 2017 [PDF].
- Li, Jierui, Lei Wang, Jipeng Zhang, Yan Wang, Bing Tian Dai, and Dongxiang Zhang. "Modeling intra-relation in math word problems with different functional multi-head attentions." In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 6162-6167. 2019.
- Liang, Zhenwen, Jipeng Zhang, Jie Shao, and Xiangliang Zhang. "MWP-BERT: A Strong Baseline for Math Word Problems." arXiv preprint arXiv:2107.13435 (2021).
- Liu, Qianying, Wenyu Guan, Sujian Li, Fei Cheng, Daisuke Kawahara, and Sadao Kurohashi. "Reverse Operation based Data Augmentation for Solving Math Word Problems." arXiv preprint arXiv:2010.01556 (2020).
- Hong, Yining, Qing Li, Daniel Ciao, Siyuan Huang, and Song-Chun Zhu. "Learning by Fixing: Solving Math Word Problems with Weak Supervision." In AAAI Conference on Artificial Intelligence, 2021. [PDF]
- Hong, Yining, Qing Li, Ran Gong, Daniel Ciao, Siyuan Huang, and Song-Chun Zhu. "SMART: A Situation Model for Algebra Story Problems via Attributed Grammar." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 14, pp. 13009-13017. 2021. [PDF]
- Qin, Jinghui, Xiaodan Liang, Yining Hong, Jianheng Tang, and Liang Lin. "Neural-Symbolic Solver for Math Word Problems with Auxiliary Tasks." arXiv preprint arXiv:2107.01431 (2021).
- Lample, Guillaume, and François Charton. "Deep learning for symbolic mathematics." arXiv preprint arXiv:1912.01412 (2019).
- Miao, Shen-Yun, Chao-Chun Liang, and Keh-Yih Su. "A diverse corpus for evaluating and developing English math word problem solvers." arXiv preprint arXiv:2106.15772 (2021).
- Qin, Jinghui, Lihui Lin, Xiaodan Liang, Rumin Zhang, and Liang Lin. "Semantically-aligned universal tree-structured solver for math word problems." In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3780–3789. 2020.
- Patel, Arkil, Satwik Bhattamishra, and Navin Goyal. "Are NLP Models really able to Solve Simple Math Word Problems?" arXiv preprint arXiv:2103.07191 (2021).
- Griffith, Kaden, and Jugal Kalita. "Solving Arithmetic Word Problems with Transformers and Preprocessing of Problem Text." arXiv preprint arXiv:2106.00893 (2021).
- Liu, Qianying, Wenyu Guan, Sujian Li, and Daisuke Kawahara. "Tree-structured decoding for solving math word problems." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 2370-2379. 2019. [PDF]
- Thawani, Avijit, Jay Pujara, Pedro A. Szekely, and Filip Ilievski. "Representing Numbers in NLP: a Survey and a Vision." arXiv preprint arXiv:2103.13136 (2021).
- Why Reinvent the Wheel – Let’s Build Question Answering Systems Together Kuldeep Singh et al., IW3C2 2018 [arXiv]
- jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models, Pruksachatkun et al., 2020 [arXiv]
- HuggingFace's Transformers: State-of-the-art Natural Language Processing, Wolf et al., 2020 [arXiv]
- AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models, Wallace et al., EMNLP 2019 [arXiv]
- Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data, Bender, Koller, ACL 2020 [arXiv]
- Thieves on Sesame Street! Model Extraction of BERT-based APIs, Krishna, Mohit Iyyer et al., ICLR 2020 [arXiv]
- What Can We Do to Improve Peer Review in NLP?, Anna Rogers et al., 2020 [arXiv]
- Optimal Subarchitecture Extraction For BERT Adrian de Wynter and Daniel J. Perry, 2020 [arXiv]
- A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay, Leslie N. Smith, 2018 [arXiv]
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Sergey Ioffe, Christian Szegedy, 2015 [arXiv]
- Kaiming Initialization: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun [arXiv]
- LAMB: Large Batch Optimization for Deep Learning: Training BERT in 76 minutes, Yang You, Jing Li et al. [arXiv]
- SentencePiece: Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates, Taku Kudo, Google, 2018 [arXiv]
- Self-Attention with Relative Position Representations, Peter Shaw, Jakob Uszkoreit, Ashish Vaswani [arXiv]
- Group Normalization, Yuxin Wu and Kaiming He, Proceedings of the European Conference on Computer Vision (ECCV), 2018 [arXiv]
- Cheng, Hao, Xiaodong Liu, Lis Pereira, Yaoliang Yu, and Jianfeng Gao. "Posterior Differential Regularization with f-divergence for Improving Model Robustness." arXiv preprint arXiv:2010.12638 (2020) [arXiv].
- Holtzman, Ari, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. "The curious case of neural text degeneration." arXiv preprint arXiv:1904.09751 (2019).
- Sablayrolles, Alexandre, Matthijs Douze, Cordelia Schmid, and Hervé Jégou. "Spreading vectors for similarity search." arXiv preprint arXiv:1806.03198 (2018) [PDF].
- Prerequisite-Driven Deep Knowledge Tracing, Penghe Chen, Yu Lu, Vincent W. Zheng, Yang Pian, ICDM 2018 [arXiv]
- Deep Knowledge Tracing, Piech et al. [NIPS 2015]
- Individualized Bayesian Knowledge Tracing Models, Michael V. Yudelson, Kenneth R. Koedinger, and Geoffrey J. Gordon, CMU [Springer 2013]
- Li, Naihan, Shujie Liu, Yanqing Liu, Sheng Zhao, and Ming Liu. "Neural speech synthesis with transformer network." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 6706-6713. 2019. [AAAI]
- Ren, Yi, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, and Tie-Yan Liu. "FastSpeech: Fast, robust and controllable text to speech." arXiv preprint arXiv:1905.09263 (2019).
- Łańcucki, Adrian. "FastPitch: Parallel text-to-speech with pitch prediction." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6588-6592. IEEE, 2021 [arXiv].
- Boersma, Paul. "Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound." In Proceedings of the Institute of Phonetic Sciences, vol. 17, no. 1193, pp. 97-110. 1993.
- Ren, Yi, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, and Tie-Yan Liu. "FastSpeech 2: Fast and high-quality end-to-end text to speech." arXiv preprint arXiv:2006.04558 (2020).
- McAuliffe, Michael, Michaela Socolof, Sarah Mihuc, Michael Wagner, and Morgan Sonderegger. "Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi." In Interspeech, vol. 2017, pp. 498-502. 2017. [PDF]
- Weiss, Ron J., R. J. Skerry-Ryan, Eric Battenberg, Soroosh Mariooryad, and Diederik P. Kingma. "Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5679-5683. [IEEE, 2021].
- Linear Regression
- Logistic Regression
- Normal Equation and Newton's Method
- Generalised Linear Models
- Generative Learning Algorithms
- Support Vector Machines
- Learning Theory
- Regularisation and Model Selection
- The perceptron and large margin classifier
- k-means and EM Algorithm
- Trees, Bagging and Boosting
- Hypothesis Testing
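
The list above names classic machine learning topics without explaining them. As a quick, illustrative orientation for the "Linear Regression" and "Normal Equation" items, here is a minimal NumPy sketch of ordinary least squares solved in closed form; the data and parameter names (`X`, `true_theta`) are synthetic and purely for demonstration, not drawn from any cited paper.

```python
# Minimal illustrative sketch (assumed setup, not from any cited work):
# ordinary least squares via the normal equation, theta = (X^T X)^{-1} X^T y.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # 100 samples, 3 features (synthetic)
true_theta = np.array([2.0, -1.0, 0.5])           # ground-truth weights for the demo
y = X @ true_theta + 0.1 * rng.normal(size=100)   # targets with a little noise

X_b = np.hstack([np.ones((X.shape[0], 1)), X])    # prepend a bias column
# Solve (X^T X) theta = X^T y; lstsq is numerically safer than an explicit inverse.
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print("estimated parameters:", theta)             # roughly [0, 2, -1, 0.5]
```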