Jianglin954 / Awesome-papers-in-the-fileds-of-CV-ML-PR-DM

Home Page: https://jianglin954.github.io/Awesome-papers-in-the-fileds-of-CV-ML-PR-DM/

This is a collection of awesome papers I have read (carefully or roughly) in the fields of computer vision, machine learning, pattern recognition, data mining, and natural language processing (where the notes only represent my personal views). The collection will be continuously updated, so stay tuned. Any suggestions and comments are welcome (jianglinlu@outlook.com).

Contents

Manifold Learning [Back to Top]

  1. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Joshua B. Tenenbaum et al, Science 2000. [PDF] [Author]
    Notes: This is a classical paper that proposes Isometric Feature Mapping (ISOMAP) for nonlinear dimensionality reduction, which consists of three steps: neighborhood graph construction, shortest-path computation of geodesic distances, and low-dimensional embedding via classical MDS (a minimal sketch of the pipeline follows).
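
A minimal, self-contained sketch of the three steps in NumPy/SciPy (my own illustration, not the authors' code; the neighborhood size and the use of Dijkstra are arbitrary choices, and the neighborhood graph is assumed to be connected):

```python
# Minimal ISOMAP sketch: kNN graph -> shortest paths (geodesics) -> classical MDS.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import shortest_path

def isomap(X, n_neighbors=10, n_components=2):
    n = X.shape[0]
    D = cdist(X, X)

    # Step 1: neighborhood graph (edge weights = Euclidean distances to k nearest neighbors).
    G = np.full((n, n), np.inf)                       # inf marks "no edge"
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        G[i, idx[i]] = D[i, idx[i]]
    G = np.minimum(G, G.T)                            # symmetrize

    # Step 2: geodesic distances via shortest paths on the graph (Dijkstra).
    DG = shortest_path(G, method='D', directed=False)

    # Step 3: classical MDS on the geodesic distance matrix.
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ (DG ** 2) @ H                      # double centering
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:n_components]        # top eigenpairs
    return V[:, order] * np.sqrt(np.maximum(w[order], 0))
```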

  2. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Sam T. Roweis et al, Science 2000. [PDF] [Author]
    Notes: This is a classical paper that proposes Locally Linear Embedding (LLE) for nonlinear dimensionality reduction, which, unlike ISOMAP, assumes that each data point and its neighbors lie on or close to a locally linear patch of the manifold. The local geometry of these patches is characterized by the linear coefficients that reconstruct each data point from its neighbors (a sketch of this weight-solving step follows).
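
A small sketch of the first stage, solving for the reconstruction weights of a single point from its k nearest neighbors (my own illustration with an assumed regularization constant, not the reference implementation):

```python
# Reconstruction weights for one point: solve the local Gram system and normalize to sum to one.
import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """x: (d,) data point; neighbors: (k, d) its k nearest neighbors."""
    Z = neighbors - x                                # shift neighbors so x sits at the origin
    C = Z @ Z.T                                      # local Gram (covariance) matrix, (k, k)
    C += reg * np.trace(C) * np.eye(len(neighbors))  # small regularizer for conditioning
    w = np.linalg.solve(C, np.ones(len(neighbors)))
    return w / w.sum()                               # weights constrained to sum to one
```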

  3. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Mikhail Belkin et al, Neural Computation 2003. [PDF] [Author]
    Notes: This is a classical paper that proposes Laplacian Eigenmaps (LE) for nonlinear dimensionality reduction and data representation, which uses the graph Laplacian to compute a low-dimensional representation of the data set that optimally preserves local neighborhood information in a certain sense (a minimal sketch follows).
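
A minimal sketch under simplifying assumptions (binary kNN weights instead of the heat kernel; not the authors' code):

```python
# Laplacian Eigenmaps sketch: kNN adjacency -> graph Laplacian -> generalized eigenproblem L y = lambda D y.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def laplacian_eigenmaps(X, n_neighbors=10, n_components=2):
    n = X.shape[0]
    D = cdist(X, X)
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    W = np.zeros((n, n))
    for i in range(n):
        W[i, idx[i]] = 1.0                  # binary weights for simplicity
    W = np.maximum(W, W.T)                  # symmetric adjacency
    Deg = np.diag(W.sum(axis=1))            # degree matrix
    L = Deg - W                             # unnormalized graph Laplacian
    vals, vecs = eigh(L, Deg)               # generalized eigenproblem, ascending eigenvalues
    return vecs[:, 1:n_components + 1]      # skip the trivial constant eigenvector
```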

  4. Locality Preserving Projections. Xiaofei He et al, NIPS 2003. [PDF] [Author]
    Notes: This paper proposes Locality Preserving Projections (LPP), which computes a linear projection matrix that maps the data points to a subspace. The linear transformation optimally preserves local neighborhood information in a certain sense. This work can be regarded as a linear extension of Laplacian Eigenmaps (LE).

  5. Neighborhood Preserving Embedding. Xiaofei He et al, ICCV 2005. [PDF] [Author]
    Notes: This paper proposes Neighborhood Preserving Embedding (NPE), which aims at preserving the local neighborhood structure on the data manifold. Here, locality (or local structure) means that each data point can be represented as a linear combination of its neighbors. This work can be regarded as a linear extension of Locally Linear Embedding (LLE).

  6. Graph Embedding and Extensions: A General Framework for Dimensionality Reduction. Shuicheng Yan et al, IEEE TPAMI 2007. [PDF] [Author]
    Notes: This paper proposes a general framework called Graph Embedding for linear dimensionality reduction, in which an intrinsic graph characterizes the intraclass compactness while a penalty graph characterizes the interclass separability.

  7. Learning Signal-Agnostic Manifolds of Neural Fields. Yilun Du et al, NeurIPS 2021. [PDF] [Author]

  8. Updating.... * et al, .* [PDF] [Author]

  1. Spectral Regression for Efficient Regularized Subspace Learning. Deng Cai et al, ICCV 2007. [PDF] [Author]
    Notes: This paper proposes Spectral Regression (SR) for subspace learning, which casts the problem of learning the projective functions into a regression framework and avoids the eigen-decomposition of dense matrices. It is worth noting that different kinds of regularizers can be naturally incorporated into SR such as L1 regularization.

  2. Updating.... * et al, .* [PDF] [Author]

  1. Graph Construction and $b$-Matching for Semi-supervised Learning. Tony Jebara et al, ICML 2009. [PDF] [Author]

  2. Influence of Graph Construction on Semi-supervised Learning. Celso Sousa et al, ECML PKDD 2013. [PDF] [Author]

  3. How to Learn a Graph from Smooth Signals. Vassilis Kalofolias et al, AISTATS 2016. [PDF] [Author]

  4. A Quest for Structure: Jointly Learning the Graph Structure and Semi-Supervised Classification. Xuan Wu et al, CIKM 2018. [PDF] [Author]
    Notes: This paper proposes Parallel Graph Learning (PG-Learn) for the graph construction step of semi-supervised learning. The two main ingredients are (a) a gradient-based optimization of the edge weights (different kernel bandwidths in each dimension) and (b) a parallel hyperparameter search algorithm. It adopts the LGC algorithm, and the corresponding solution can be found without explicitly computing any matrix inverse, using the power method instead.

  5. Updating.... * et al, .* [PDF] [Author]

Sparse Representation [Back to Top]

  1. Regression Shrinkage and Selection Via the Lasso. Rob Tibshirani, Journal of the Royal Statistical Society 1996. [PDF] [Author]
    Notes: This is a classical paper that proposes the Least Absolute Shrinkage and Selection Operator (LASSO) for linear regression, which minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant; this is also known as the L1 penalty (a tiny usage example follows).
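
A tiny illustration of the (equivalent) Lagrangian form, $\min_w \frac{1}{2n}\|y - Xw\|_2^2 + \alpha \|w\|_1$, using scikit-learn; the synthetic data and the value of alpha are arbitrary choices of mine:

```python
# LASSO in penalized form: most coefficients shrink exactly to zero (sparse solution).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.5, 1.0]                 # only 3 informative features
y = X @ true_w + 0.1 * rng.normal(size=100)

model = Lasso(alpha=0.1).fit(X, y)
print(np.round(model.coef_, 2))               # sparse coefficient vector
```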

  2. Regularization and Variable Selection via the Elastic Net. Hui Zou et al, Journal of the royal statistical society 2005. [PDF] [Author]
    Notes: This paper proposes Elastic Net for regularization and variable selection, which encourages a grouping effect where strongly correlated predictors tend to be in or out of the model together. The Elastic Net combines L2 regularization and L1 regularization together, and can be viewed as a generalization of LASSO.

  3. Sparse Principal Component Analysis. Hui Zou et al, Journal of Computational and Graphical Statistics 2006. [PDF] [Author]
    Notes: This paper proposes Sparse Principal Component Analysis (SPCA), which introduces the LASSO or Elastic Net into Principal Component Analysis (PCA) to produce modified principal components with sparse loadings. It formulates PCA as a regression-type optimization problem and then obtains sparse loadings by imposing the LASSO or Elastic Net constraint on the regression coefficients. Theorem 4 (Reduced Rank Procrustes Rotation) is particularly useful.

  4. Robust Face Recognition via Sparse Representation. John Wright et al, IEEE TPAMI 2009. [PDF] [Author]
    Notes:

  5. Robust principal component analysis?. Emmanuel J. Candès et al, Journal of the ACM 2011. [PDF] [Author]
    Notes:

  6. Updating.... * et al, .* [PDF] [Author]

Resources

  1. SLEP: Sparse Learning with Efficient Projections. Jun Liu et al, Arizona State University 2009. [PDF] [Resource] [Author]
    Notes: This paper develops a Sparse Learning with Efficient Projections (SLEP) package written in Matlab for sparse representation learning.

Low-Rank Representation [Back to Top]

  1. Robust Subspace Segmentation by Low-Rank Representation. Guangcan Liu et al, ICML 2010. [PDF] [Author]
    Notes:

  2. Robust Recovery of Subspace Structures by Low-Rank Representation. Guangcan Liu et al, IEEE TPAMI 2013. [PDF] [Author]
    Notes:

  3. The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices. Zhouchen Lin et al, arXiv 2013. [PDF] [Author]
    Notes:

  4. Updating.... * et al, .* [PDF] [Author]

Clustering [Back to Top]

  1. Large Scale Spectral Clustering with Landmark-Based Representation. Xinlei Chen et al, AAAI 2011. [PDF] [Author]
    Notes: This paper adopts an Anchor Graph for spectral clustering.

  2. Sparse Subspace Clustering: Algorithm, Theory, and Applications. Ehsan Elhamifar et al, IEEE TPAMI 2013. [PDF] [Author]
    Notes: This paper proposes Sparse Subspace Clustering (SSC), which introduces sparse representation into the subspace clustering problem and defines the self-expressiveness property: each data point in a union of subspaces can be efficiently reconstructed by a combination of other points in the dataset.

  3. Clustering and Projected Clustering with Adaptive Neighbors. Feiping Nie et al, KDD 2014. [PDF] [Author]
    Notes: This paper proposes Clustering with Adaptive Neighbors (CAN) to learn the data similarity matrix and clustering structure simultaneously. It is worth noting that they present an effective method to determine the regularization parameter considering the locality of the data.

  4. Updating.... * et al, .* [PDF] [Author]

  1. Deep Subspace Clustering Networks. Pan Ji et al, NIPS 2017. [PDF] [Author]
    Notes: This is the first deep subspace clustering network; however, it has since been shown to be ill-posed (see the critique in the next entry).

  2. A Critique of Self-Expressive Deep Subspace Clustering. Benjamin David Haeffele et al, ICLR 2021. [PDF] [Author]
    Notes: This paper shows that many previous deep subspace clustering networks are ill-posed, and their performance improvement is largely attributable to an ad-hoc post-processing step.

  3. Updating.... * et al, .* [PDF] [Author]

Learning to Hash [Back to Top]

  1. Locality-Sensitive Hashing Scheme based on p-Stable Distributions. Mayur Datar et al, SCG 2004. [PDF] [Author]
    Notes: This paper proposes a novel Locality-Sensitive Hashing (LSH) for the Approximate Nearest Neighbor Problem under Lp norm, based on p-stable distributions. The key idea is to hash the points using several hash functions so as to ensure that, for each function, the probability of collision is much higher for objects which are close to each other than for those which are far apart. Then, one can determine near neighbors by hashing the query point and retrieving elements stored in buckets containing that point.
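
A minimal sketch of a single hash table for the L2 case (my own illustration; the parameters k and w are arbitrary). Each hash function has the form h(v) = floor((a·v + b)/w), with a drawn from a 2-stable (Gaussian) distribution and b uniform in [0, w); k such functions are concatenated into a bucket key.

```python
# One LSH table under the L2 norm using 2-stable (Gaussian) projections.
import numpy as np
from collections import defaultdict

class L2LSHTable:
    def __init__(self, dim, k=4, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(size=(k, dim))          # 2-stable projection vectors
        self.b = rng.uniform(0, w, size=k)          # random offsets
        self.w = w
        self.buckets = defaultdict(list)

    def _key(self, v):
        return tuple(np.floor((self.A @ v + self.b) / self.w).astype(int))

    def insert(self, i, v):
        self.buckets[self._key(v)].append(i)

    def query(self, q):
        return self.buckets.get(self._key(q), [])   # candidate near neighbors
```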

  2. Spectral Hashing. Yair Weiss et al, NIPS 2008. [PDF] [Author]
    Notes: This paper proposes Spectral Hashing (SH) where the bits are calculated by thresholding a subset of eigenvectors of the Laplacian of the similarity graph. The basic idea is to embed the data in a Hamming space such that the neighbors in the original data space remain neighbors in the Hamming space.

  3. Hashing with Graphs. Wei Liu et al, ICML 2011. [PDF] [Author]
    Notes: This paper proposes Anchor Graph Hashing (AGH) which builds an approximate neighborhood graph using Anchor Graphs, resulting in O(n) time for graph construction. The graph is sufficiently sparse, with performance approaching that of the true kNN graph as the number of anchors increases.

  4. Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-Scale Image Retrieval. Yunchao Gong et al, IEEE TPAMI 2013. [PDF] [Author]
    Notes: This paper proposes Iterative Quantization (ITQ) that finds a rotation of zero-centered data so as to minimize the quantization error of mapping this data to the vertices of a zero-centered binary hypercube. The optimization problem is intrinsically an Orthogonal Procrustes problem.
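
A compact sketch of the alternating minimization (my paraphrase, not the released code); V is assumed to be the zero-centered data already projected to c dimensions (e.g. by PCA):

```python
# ITQ sketch: alternate between B = sign(V R) and an Orthogonal Procrustes update of R.
import numpy as np

def itq(V, n_iter=50, seed=0):
    c = V.shape[1]
    rng = np.random.default_rng(seed)
    R, _ = np.linalg.qr(rng.normal(size=(c, c)))   # random orthogonal initialization
    for _ in range(n_iter):
        B = np.sign(V @ R)                          # fix R, update binary codes
        U, _, Wt = np.linalg.svd(V.T @ B)           # fix B, update rotation (Procrustes solution)
        R = U @ Wt
    return np.sign(V @ R), R                        # final binary codes and rotation
```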

  5. The Power of Asymmetry in Binary Hashing. Behnam Neyshabur et al, NIPS 2013. [PDF] [Author]
    Notes: This paper proves that shorter and more accurate hash codes can be obtained by using two distinct code maps. The asymmetry here is defined in function space, i.e., adopting different hashing functions for similarity calculation.

  6. Sparse Projections for High-dimensional Binary Codes. Yan Xia et al, CVPR 2015. [PDF] [Author]
    Notes: This paper proposes Sparse Projections (SP) which imposes orthogonal and L0 constraints on the hashing projection matrix. For the resulting non-convex optimization problem, they adopt the variable-splitting and penalty techniques.

  7. Supervised Discrete Hashing. Fumin Shen et al, CVPR 2015. [PDF] [Author]
    Notes: This paper proposes Supervised Discrete Hashing (SDH) that solves the discrete optimization without any relaxations using a discrete cyclic coordinate descent (DCC) algorithm. The assumption is that good hash codes are optimal for linear classification.

  8. Fast Supervised Discrete Hashing. Jie Gui et al, TPAMI 2018. [PDF] [Author]
    Notes: This paper proposes Fast Supervised Discrete Hashing (FSDH), which regresses the class labels of the training data to the corresponding hash codes to accelerate SDH. It avoids the iterative hash-code-solving step of the DCC algorithm.

  1. Feature Learning based Deep Supervised Hashing with Pairwise Labels. Wu-Jun Li et al, IJCAI 2016. [PDF] [Author]
    Notes: This paper proposes the first deep hashing method called Deep Pairwise-Supervised Hashing (DPSH) for applications with pairwise labels, which can perform simultaneous feature learning and hash-code learning. This paper can be regarded as a deep learning extension of Latent Factor Hashing (LFH).

  2. Asymmetric Deep Supervised Hashing. Qing-Yuan Jiang et al, AAAI 2018. [PDF] [Author]
    Notes: This paper proposes the first asymmetric deep hashing method, called Asymmetric Deep Supervised Hashing (ADSH), which treats query points and database points in an asymmetric way. Specifically, ADSH learns a deep hash function only for query points, while the hash codes for database points are learned directly.

  3. Deep Supervised Hashing with Anchor Graph. Yudong Chen et al, ICCV 2019. [PDF] [Author]
    Notes: This paper proposes Deep Anchor Graph Hashing (DAGH), which adopts an Anchor Graph to directly learn the hash codes of the whole training set during training. Since different anchors are used in different epochs, all training samples will eventually be involved given enough epochs. This paper can also be regarded as an asymmetric deep hashing method.

  4. Deep Cross-Modal Hashing. Qing-Yuan Jiang et al, CVPR 2017. [PDF] [Author]
    Notes: This paper proposes the first deep cross-modal hashing called Deep Cross-Modal Hashing (DCMH) which can be regarded as a cross-modal extension of Deep Pairwise-Supervised Hashing (DPSH).

  5. Updating.... * et al, .* [PDF] [Author]

Survey

  1. A Survey on Learning to Hash. Jingdong Wang et al, IEEE TPAMI 2018. [PDF] [Author]

  2. A Survey on Deep Hashing Methods. Xiao Luo et al, ACM TKDD 2022. [PDF] [Author]

Domain Adaptation [Back to Top]

  1. My Personal Learning Notes on Domain Adaptation. Jianglin Lu. [PDF]
  1. Domain Adaptation under Target and Conditional Shift. Kun Zhang et al, ICML 2013. [PDF] [Author]
    Notes: This paper exploits importance reweighting or sample transformation to find a learning machine that works well on test data, and proposes to estimate the weights or transformations by reweighting or transforming the training data to reproduce the covariate distribution on the test domain.

  2. Domain Adaptation with Conditional Transferable Components. Mingming Gong et al, ICML 2016. [PDF] [Author]
    Notes:

  3. Structural Re-weighting Improves Graph Domain Adaptation. Shikun Liu et al, ICML 2023. [PDF] [Author]

  1. Universal Domain Adaptation. Kaichao You et al, CVPR 2019. [PDF] [Author]
    Notes: This paper introduces Universal Domain Adaptation (UDA) that requires no prior knowledge on the label sets of source and target domains.

  2. Updating.... * et al, .* [PDF] [Author]

Survey

  1. A Survey on Transfer Learning. Sinno Jialin Pan et al, IEEE TKDE 2010. [PDF] [Author]

  2. A Comprehensive Survey on Transfer Learning. Fuzhen Zhuang et al, Proceedings of the IEEE 2021. [PDF] [Author]

Neural Tangent Kernel [Back to Top]

  1. Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Arthur Jacot et al, NeurIPS 2018. [PDF] [Author]

  2. Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent. Jaehoon Lee et al, NeurIPS 2019. [PDF] [Author]

  3. Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels. Simon S. Du et al, NeurIPS 2019. [PDF] [Author]

  4. Gradient Descent Finds Global Minima of Deep Neural Networks. Simon S. Du et al, ICML 2019. [PDF] [Author]

  5. Updating.... * et al, .* [PDF] [Author]

Convolutional Neural Network [Back to Top]

  1. Going deeper with convolutions. Christian Szegedy et al, CVPR 2015. [PDF] [Author]

  2. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. Kaiming He et al, ICCV 2015. [PDF] [Author]

  3. Deep Residual Learning for Image Recognition. Kaiming He et al, CVPR 2016. [PDF] [Author]

  4. Searching for Activation Functions. Prajit Ramachandran et al, arXiv 2017. [PDF] [Author]

  5. Densely Connected Convolutional Networks. Gao Huang et al, CVPR 2017. [PDF] [Author]

  6. A ConvNet for the 2020s. Zhuang Liu et al, CVPR 2022. [PDF] [Author]

  7. Updating.... * et al, .* [PDF] [Author]

Transformers [Back to Top]

  1. Attention Is All You Need. Ashish Vaswani et al, NIPS 2017. [PDF] [Author]

  2. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Ze Liu et al, ICCV 2021. [PDF] [Author]

  3. Tokens-to-Token ViT: Training Vision Transformers From Scratch on ImageNet. Li Yuan et al, ICCV 2021. [PDF] [Author]

  4. Updating.... * et al, .* [PDF] [Author]

  1. Graph Transformer Networks. Seongjun Yun et al, NeurIPS 2019. [PDF] [Author]

  2. Updating.... * et al, .* [PDF] [Author]

  3. Meta-Transformer: A Unified Framework for Multimodal Learning. Yiyuan Zhang et al, arXiv 2023. [PDF] [Author]

Survey

  1. A Survey on Vision Transformer. Kai Han et al, IEEE TPAMI 2022. [PDF] [Author]

Graph Neural Network [Back to Top]

  1. My Personal Learning Notes on Graph Neural Network. Jianglin Lu. [PDF]
  1. Simplifying Graph Convolutional Networks. Felix Wu et al, ICML 2019. [PDF] [Author]

  2. Updating.... * et al, .* [PDF] [Author]

  3. Updating.... * et al, .* [PDF] [Author]

  1. Semi-Supervised Classification with Graph Convolutional Networks. Thomas N. Kipf et al, ICLR 2017. [PDF] [Author] [Code]

  2. Inductive Representation Learning on Large Graphs. William L. Hamilton et al, NeurIPS 2017. [PDF] [Author] [Code]
    Notes: Most existing approaches are inherently transductive, requiring all nodes in the graph to be present during training of the embeddings. This paper proposes SAmple and aggreGatE (GraphSAGE) for inductive node embedding, which generates embeddings by sampling and aggregating features from a node's local neighborhood (a minimal sketch of the mean aggregator follows).
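
A rough sketch of a single layer with the mean aggregator (my own simplification, not the reference implementation: the concatenate-then-transform step is written as a sum of two linear maps, and the L2 normalization of outputs is omitted):

```python
# One GraphSAGE-style layer: sample neighbors, average their features, combine with the node's own features.
import numpy as np

def sage_mean_layer(H, adj_list, W_self, W_neigh, n_samples=10, seed=0):
    """H: (n, d) node features; adj_list: list of neighbor index lists; W_self/W_neigh: (d, d_out)."""
    rng = np.random.default_rng(seed)
    out = np.zeros((H.shape[0], W_self.shape[1]))
    for v, neigh in enumerate(adj_list):
        if len(neigh) > 0:
            sampled = rng.choice(neigh, size=min(n_samples, len(neigh)), replace=False)
            h_neigh = H[sampled].mean(axis=0)        # mean aggregation over sampled neighbors
        else:
            h_neigh = np.zeros(H.shape[1])
        out[v] = H[v] @ W_self + h_neigh @ W_neigh   # "concat then transform" written as two maps
    return np.maximum(out, 0.0)                      # ReLU nonlinearity
```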

  3. Graph Attention Networks. Petar Veličković et al, ICLR 2018. [PDF] [Author]

  4. Pitfalls of Graph Neural Network Evaluation. Oleksandr Shchur et al, NeurIPS 2018. [PDF] [Author]

  5. Deeper Insights Into Graph Convolutional Networks for Semi-Supervised Learning. Qimai Li et al, AAAI 2018. [PDF] [Author]

  6. Adaptive Sampling Towards Fast Graph Representation Learning. Wenbing Huang et al, NeurIPS 2018. [PDF] [Author]

  7. LanczosNet: Multi-Scale Deep Graph Convolutional Networks. Renjie Liao et al, ICLR 2019. [PDF] [Author] [Code]

  8. Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition. Ziyu Liu et al, CVPR 2020. [PDF] [Author] [Code]

  9. Grale: Designing Networks for Graph Learning. Jonathan Halcrow et al, KDD 2020. [PDF] [Author]

  10. Graph Neural Networks with Adaptive Residual. Xiaorui Liu et al, NeurIPS 2021. [PDF] [Author]

  11. E(n) Equivariant Graph Neural Networks. Victor Garcia Satorras et al, ICML 2021. [PDF] [Author]

  12. Understanding over-squashing and bottlenecks on graphs via curvature. Jake Topping et al, ICLR 2022. [PDF] [Author]

  13. DropMessage: Unifying Random Dropping for Graph Neural Networks. Taoran Fang et al, AAAI 2023. [PDF] [Author] [Code]

  14. Graph Neural Networks can Recover the Hidden Features Solely from the Graph Structure. Ryoma Sato et al, ICML 2023. [PDF] [Author]

  15. Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs. Chenxiao Yang et al, ICLR 2023. [PDF] [Author] [Code]

  16. Updating.... * et al, .* [PDF] [Author]

  1. Hierarchical Graph Representation Learning with Differentiable Pooling. Rex Ying et al, NeurIPS 2018. [PDF] [Author]

  2. Updating.... * et al, .* [PDF] [Author]

  1. Deep Convolutional Networks on Graph-Structured Data. Mikael Henaff et al, arXiv 2015. [PDF] [Author]
    Notes:

  2. Learning Conditioned Graph Structures for Interpretable Visual Question Answering. Will Norcliffe-Brown et al, . [PDF] [Author]

  3. Adaptive Graph Convolutional Neural Networks. Ruoyu Li et al, AAAI 2018. [PDF] [Author]
    Notes: The bottlenecks of prior graph CNNs: (a) restricted graph degree; (b) the requirement of an identical graph structure shared among inputs; (c) a fixed graph constructed without training; (d) the inability to learn from topological structure. This paper proposes Adaptive Graph Convolution Network (AGCN) that feeds on original data of diverse graph structures. AGCN seems to be designed primarily for graph classification. Besides, AGCN needs an initial graph and suffers from the limitations of transductive models, as described in DGM.

  4. Topology Optimization based Graph Convolutional Network. Liang Yang et al, IJCAI 2019. [PDF] [Author]
    Notes: This paper proposes Topology Optimization based GCN (TO-GCN) to jointly learn the network topology and the parameters of the fully connected network. The refinement of the network topology is modeled as a label propagation process, where the network topology is modeled as the multiplication of the predicted label matrix with its transpose. TO-GCN also penalizes high similarities between nodes from different classes.

  5. Semi-Supervised Learning With Graph Learning-Convolutional Networks. Bo Jiang et al, CVPR 2019. [PDF] [Author] [Code]
    Notes: This paper proposes Graph Learning-Convolutional Network (GLCN) for the semi-supervised task, which integrates both graph learning and graph convolution in a unified network architecture such that both given and estimated labels are incorporated to provide weakly supervised information for graph structure refinement. The graph learning function is similar to GAT and the graph learning loss is similar to CAN. The learned graph, being probabilistic, is dense and lacks sparse structure.

  6. Large Scale Graph Learning from Smooth Signals. Vassilis Kalofolias et al, ICLR 2019. [PDF] [Author]
    Notes: This paper uses approximate nearest neighbor techniques for large-scale graph learning from smooth signals. Also refer to the linked paper.

  7. Learning Discrete Structures for Graph Neural Networks. Luca Franceschi et al, ICML 2019. [PDF] [Author] [Code]
    Notes: This paper proposes Learning Discrete Structures (LDS) to learn the graph structure and the parameters of GCNs by approximately solving a bilevel program that learns a discrete probability distribution of the edges of the graph. Given two objective functions $F$ and $L$, the outer and inner objectives, and two sets of variables, $\theta \in \mathcal{R}^{m}$ and $\omega \in \mathcal{R}^{d}$, the outer and inner variables, a Bilevel Program is given by: $\min_{\theta, \omega_{\theta}}F(\omega_{\theta}, \theta)$ such that $\omega_{\theta} \in \arg \min_{\omega} L(\omega, \theta)$. LDS only works in the transductive setting and the graph topology learned cannot be controlled due to the sampling strategy.

  8. Graph Structure Learning for Robust Graph Neural Networks. Wei Jin et al, KDD 2020. [PDF] [Author] [Code]
    Notes: This paper proposes Property GNN (Pro-GNN) that explores graph properties of sparsity, low rank and feature smoothness to defend adversarial attacks. Pro-GNN simultaneously learns the clean graph structure from perturbed graph and GNN parameters to defend against adversarial attacks. This paper assumes that the graph structure has already been perturbed before training GNNs while the node features are not changed.

  9. Iterative Deep Graph Learning for Graph Neural Networks: Better and Robust Node Embeddings. Yu Chen et al, NeurIPS 2020. [PDF] [Author] [Code]
    Notes: This paper proposes Iterative Deep Graph Learning (IDGL) that learns the graph structure and graph embedding simultaneously. The graph learning problem is cast as a similarity metric learning problem, and an adaptive graph regularization is leveraged (assuming that the optimized graph structure is potentially a shift from the initial graph structure). IDGL adopts multi-head self-attention with $\epsilon$-neighborhood sparsification for graph construction (a rough sketch follows). An Anchor Graph based version is also proposed, and the corresponding node-anchor message passing strategy is provided. IDGL works on (semi-)supervised tasks and needs an initial $k$NN graph construction.
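
A rough sketch of one plausible instantiation of the multi-head similarity metric with $\epsilon$-neighborhood sparsification, using a weighted cosine similarity per head (my reading of the paper, not the reference implementation; the shapes and threshold are assumptions):

```python
# Multi-head weighted-cosine similarity, averaged over heads, then epsilon-sparsified.
import numpy as np

def weighted_cosine_graph(X, W_heads, eps=0.1):
    """X: (n, d) node features; W_heads: (m, d) learnable per-head weight vectors."""
    sims = []
    for w in W_heads:
        Z = X * w                                              # elementwise reweighting per head
        Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)
        sims.append(Z @ Z.T)                                   # cosine similarity for this head
    S = np.mean(sims, axis=0)                                  # average over heads
    return np.where(S > eps, S, 0.0)                           # epsilon-neighborhood sparsification
```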

  10. Latent-Graph Learning for Disease Prediction. Luca Cosmo et al, MICCAI 2020. [PDF] [Author]
    Notes: This paper proposes an end-to-end trainable graph learning architecture that automatically learns to predict an underlying patients-graph. The edge weight is learned through a sigmoid-like function with two trainable parameters. This method can work in the inductive setting since it does not directly optimize a graph for a given population but rather learns a function that predicts the graph from input features. The learned graph is used directly only in a classification loss, without any regularization. Besides, the global threshold and the Euclidean embedding space may not necessarily be optimal.

  11. Graph-Revised Convolutional Network. Donghan Yu et al, ECML PKDD 2020. [PDF] [Author] [Code]
    Notes: This paper proposes Graph-Revised Convolutional Network (GRCN), where a GCN-based graph revision module is introduced for predicting missing edges and revising edge weights w.r.t. downstream tasks via joint optimization. The similarity graph is calculated from the node embeddings using a kernel function (specifically, the dot product in their implementation, for simplicity). The Representer Theorem is provided to show that, under certain conditions, the optimal regression function can be expressed as a linear combination of kernel functions defined on training samples. Compared with the graph revision in GAT and GLCN, which use the entrywise product, GRCN adopts the entrywise addition operator in order for new edges to be considered. A graph sparsification process is also proposed, and the gradients only backpropagate through the top-$K$ values. In GRCN, an initial graph is required.

  12. SLAPS: Self-Supervision Improves Structure Learning for Graph Neural Networks. Bahare Fatemi et al, NeurIPS 2021. [PDF] [Author] [Code]
    Notes: This paper proposes Simultaneous Learning of Adjacency and GNN Parameters with Self-supervision (SLAPS) for semi-supervised classification, which provides more supervision for inferring a graph structure through self-supervision. The authors also identify a Supervision Starvation problem in latent graph learning: the edges between pairs of nodes that are far from labeled nodes receive insufficient supervision. To solve this, a multi-task learning framework is designed by supplementing the classification task with a self-supervised task (based on the hypothesis that a graph structure that is suitable for predicting the node features is also suitable for predicting the node labels). Also refer to the linked paper.

  13. Graph Structure Estimation Neural Networks. Ruijia Wang et al, WWW 2021. [PDF] [Author] [Code]

  14. Graph Structure Learning with Variational Information Bottleneck. Qingyun Sun et al, AAAI 2022. [PDF] [Author]
    Notes: This paper proposes Variational Information Bottleneck guided Graph Structure Learning (VIB-GSL) that advances the Information Bottleneck principle for graph structure learning. Refer to the linked paper.

  15. Ada-NETS: Face Clustering via Adaptive Neighbour Discovery in the Structure Space. Yaohua Wang et al, ICLR 2022. [PDF] [Author]
    Notes: This paper proposes Ada-NETS for face clustering, in which each face is transformed to a new structure space and the similarity is calculated by a weighted combination between cosine similarity and Jaccard similarity. An adaptive neighbor discovery strategy based on $F_{\beta}$-score is also proposed to determine a proper number of edges connecting to each face image.

  16. Robust Graph Structure Learning via Multiple Statistical Tests. Yaohua Wang et al, NeurIPS 2022. [PDF] [Author]
    Notes: This paper views the feature vector of each node as an independent sample and decides whether to create an edge between two nodes based on the similarity of their feature representations via a single statistical test. A Transformer-based method is proposed, which exploits the fourth-order statistics of the features.

  17. pyGSL: A Graph Structure Learning Toolkit. Max Wasserman et al, NeurIPS Workshop 2022. [PDF] [Author] [Code]
    Notes: This paper introduces pyGSL, a Python library that provides efficient implementations of state-of-the-art graph structure learning models, along with diverse datasets to evaluate them on. The resource is limited and the code repository is not well developed.

  18. Learning Continuous Graph Structure with Bilevel Programming for Graph Neural Networks. Minyang Hu et al, IJCAI 2022. [PDF] [Author]
    Notes: This paper proposes to directly model the continuous graph structure with dual-normalization (using a symmetric normalization function), which implicitly imposes sparse constraint and reduces the influence of noisy edges. The whole learning process is formulated as a bilevel programming problem similar to Learning Discrete Structures but armed with an improved Neumann-IFT algorithm for optimization.

  19. GPN: A Joint Structural Learning Framework for Graph Neural Networks. Qianggang Ding et al, AAAI 2022. [PDF] [Author]
    Notes:

  20. Towards Unsupervised Deep Graph Structure Learning. Yixin Liu et al, WWW 2022. [PDF] [Author] [Code]
    Notes: This paper proposes an unsupervised learning paradigm for graph structure learning. The proposed method is highly related to SLAPS (listed above). In SLAPS, the authors find that "Although SLAPS2s does not use the node labels in learning an adjacency matrix, it outperforms kNN-GCN (8.4% improvement when using an FP generator). With an FP generator, SLAPS2s even achieves competitive performance with SLAPS; this is mainly because FP does not leverage the supervision provided by GCNC toward learning generalizable patterns that can be used for nodes other than those in the training set."

  21. Learning Graph Structure from Convolutional Mixtures. Max Wasserman et al, arXiv 2022. [PDF] [Author]

  22. Self-organization Preserved Graph Structure Learning with Principle of Relevant Information. Qingyun Sun et al, arXiv 2022. [PDF] [Author]

  23. Regularized Graph Structure Learning with Semantic Knowledge for Multi-variates Time-Series Forecasting. Hongyuan Yu et al, arXiv 2022. [PDF] [Author]

  24. DBGSL: Dynamic Brain Graph Structure Learning. Alexander Campbell et al, . [PDF] [Author]

  25. Position-aware Structure Learning for Graph Topology-imbalance by Relieving Under-reaching and Over-squashing. Qingyun Sun et al, CIKM 2022. [PDF] [Author]

  26. Semi-Supervised Clustering via Dynamic Graph Structure Learning. Huaming Ling et al, arXiv 2022. [PDF] [Author]

  27. Boosting Graph Structure Learning with Dummy Nodes. Xin Liu et al, ICML 2022. [PDF] [Author]

  28. NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification. Qitian Wu et al, NeurIPS 2022. [PDF] [Author] [Code]

  29. Multi-view graph structure learning using subspace merging on Grassmann manifold. Razieh Ghiasi et al, Multimedia Tools and Applications 2022. [PDF] [Author]

  30. ASGNN: Graph Neural Networks with Adaptive Structure. Zepeng Zhang et al, arXiv 2022. [PDF] [Author]

  31. Self-organization Preserved Graph Structure Learning with Principle of Relevant Information. Qingyun Sun et al, AAAI 2023. [PDF] [Author]

  32. Differentiable Graph Module (DGM) for Graph Convolutional Networks. Anees Kazi et al, IEEE TPAMI 2023. [PDF] [Author] [Code]
    Notes: Current GNNs are often restricted to the transductive setting and rely on the assumption that the underlying graph is known and fixed. This paper proposes the Differentiable Graph Module (DGM) that infers the graph directly from the data. Specifically, DGM is a learnable function that predicts edge probabilities in the graph which are optimal for the downstream task. For discrete DGM, the authors construct a sparse $k$-degree graph by using the Gumbel-Top-$k$ trick to sample edges from the probabilities (a small illustration follows). The sampling scheme, however, does not allow the gradient of the downstream classification loss function to flow through the graph prediction branch. To solve this issue, a compound loss is designed that rewards edges involved in a correct classification and penalizes edges that led to misclassification. Latent graph: the graph itself is not explicitly given.
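
A small illustration of the Gumbel-Top-$k$ trick for sampling a sparse $k$-degree graph from edge logits (my own sketch, not the DGM implementation; self-loops are not excluded here):

```python
# Perturb edge logits with Gumbel noise and keep the top-k per node (sampling without replacement).
import numpy as np

def gumbel_top_k_edges(logits, k, rng):
    """logits: (n, n) unnormalized log edge probabilities; returns indices of k sampled neighbors per node."""
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-20) + 1e-20)
    perturbed = logits + gumbel
    return np.argsort(-perturbed, axis=1)[:, :k]

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 5))
print(gumbel_top_k_edges(logits, k=2, rng=rng))   # 2 sampled neighbors per node
```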

  33. Latent Graph Inference using Product Manifolds. Haitz Sáez de Ocáriz Borde et al, ICLR 2023. [PDF] [Author]

  1. Efficient and effective training of language and graph neural network models * et al, .* [PDF] [Author]
  2. Beyond Graph Neural Networks: A New Frontier for Large Language Models * et al, .* [PDF] [Author]
  3. GNN-LM: Language Modeling based on Global Contexts via GNN * et al, .* [PDF] [Author]
  4. GPT4Graph: Can Large Language Models Understand Graph Structured Data? An Empirical Evaluation and Benchmarking * et al, .* [PDF] [Author]
  5. Disentangled Representation Learning with Large Language Models for Text-Attributed Graphs * et al, .* [PDF] [Author]
  6. Empower Text-Attributed Graphs Learning with Large Language Models (LLMs) * et al, .* [PDF] [Author]
  7. GraphGPT: Graph Instruction Tuning for Large Language Models * et al, .* [PDF] [Author]
  8. Can LLMs Effectively Leverage Graph Structural Information: When and Why * et al, .* [PDF] [Author]
  1. A Unified Lottery Ticket Hypothesis for Graph Neural Networks. Tianlong Chen et al, ICML 2021. [PDF] [Author]
    Notes:

  2. A Study on the Ramanujan Graph Property of Winning Lottery Tickets. Bithika Pal et al, ICML 2022. [PDF] [Author]

  3. Rethinking Graph Lottery Tickets: Graph Sparsity Matters. Bo Hui et al, ICLR 2023. [PDF] [Author]

  4. Searching Lottery Tickets in Graph Neural Networks: A Dual Perspective. Kun Wang et al, ICLR 2023. [PDF] [Author]

  5. You Can Have Better Graph Neural Networks by Not Training Weights at All: Finding Untrained GNNs Tickets. Tianjin Huang et al, LoG 2022. [PDF] [Author]

  6. Updating.... * et al, .* [PDF] [Author]

  1. Graph Contrastive Learning with Augmentations. Yuning You et al, NeurIPS 2020. [PDF] [Author]

  2. When Does Self-Supervision Help Graph Convolutional Networks? Yuning You et al, ICML 2020. [PDF] [Author]

  3. Self-Supervised Representation Learning via Latent Graph Prediction. Yaochen Xie et al, ICML 2022. [PDF] [Author]
    Notes: This paper proposes LaGraph, a predictive SSL framework for representation learning of graph data, based on self-supervised latent graph prediction. It makes two assumptions: a. the observed feature vector of each node in an observed graph is independently generated from a certain distribution conditioned on the corresponding latent graph; b. the conditional distribution of the observed graph is centered at the latent graph.

  4. Automated Self-Supervised Learning for Graphs. Wei Jin et al, ICLR 2022. [PDF] [Author]

  5. Uncovering the Structural Fairness in Graph Contrastive Learning. Ruijia Wang et al, NeurIPS 2022. [PDF] [Author]

  6. Updating.... * et al, .* [PDF] [Author]

  1. Strategies for Pre-training Graph Neural Networks. Weihua Hu et al, ICLR 2020. [PDF] [Author]

  2. GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training. Jiezhong Qiu et al, KDD 2020. [PDF] [Author]

  1. Topology Attack and Defense for Graph Neural Networks: An Optimization Perspective. Kaidi Xu et al, IJCAI 2019. [PDF] [Author]

  2. Empowering Graph Representation Learning with Test-Time Graph Transformation. Wei Jin et al, ICLR 2023. [PDF] [Author]

  3. Updating.... * et al, .* [PDF] [Author]

  1. Graph Domain Adaptation via Theory-Grounded Spectral Regularization. Yuning You et al, ICLR 2023. [PDF] [Author]

  2. Updating.... * et al, .* [PDF] [Author]

  1. Data Augmentation for Graph Neural Networks. Tong Zhao et al, AAAI 2021. [PDF] [Author]
    Notes:

  2. Local Augmentation for Graph Neural Networks. Songtao Liu et al, ICML 2022. [PDF] [Author]
    Notes:

  3. Graph Data Augmentation for Graph Machine Learning: A Survey. Tong Zhao et al, arXiv 2023. [PDF] [Author]

  4. Updating.... * et al, .* [PDF] [Author]

  1. Fast Graph Generation via Spectral Diffusion. Tianze Luo et al, arXiv 2022. [PDF] [Author]

  2. DiGress: Discrete Denoising diffusion for graph generation. Clement Vignac et al, ICLR 2023. [PDF] [Author]

  3. Equivariant Diffusion for Molecule Generation in 3D. Emiel Hoogeboom et al, ICML 2022. [PDF] [Author]

  4. Bipartite Graph Diffusion Model for Human Interaction Generation. Baptiste Chopin et al, arXiv 2023. [PDF] [Author]

  5. Diffusion Probabilistic Models for Graph-Structured Prediction. Hyosoon Jang et al, arXiv 2023. [PDF] [Author] [Code]

  6. Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions. Emiel Hoogeboom et al, NeurIPS 2021. [PDF] [Author] [Code]

  7. GMNN: Graph Markov Neural Networks. Meng Qu et al, ICML 2019. [PDF] [Author] [Code])

  8. Diffusion Probabilistic Models for Structured Node Classification. Hyosoon Jang et al, ICMLW 2023. [PDF] [Author] [Code]

  9. Structured Denoising Diffusion Models in Discrete State-Spaces. Jacob Austin et al, NeurIPS 2021. [PDF] [Author] [Code]
    Notes: While $q(x_t|x_{t-1})$ can in theory be arbitrary, efficient training of $p_{\theta}$ is possible when $q(x_t|x_{t-1})$:
    $\quad$ a. Permits efficient sampling of $x_t$ from $q(x_t|x_0)$ for an arbitrary time $t$.
    $\quad$ b. Has a tractable expression for the forward process posterior $q(x_{t-1}|x_{t}, x_0)$.
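
    For intuition, the familiar continuous (Gaussian) case with forward kernel $q(x_t|x_{t-1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I)$ satisfies both conditions in closed form (standard DDPM notation, not specific to the discrete state spaces of this paper): $$q(x_t|x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t) I\big), \quad \bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s),$$ and $$q(x_{t-1}|x_t, x_0) = \mathcal{N}\big(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t I\big), \quad \tilde{\mu}_t = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\, x_0 + \frac{\sqrt{1-\beta_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\, x_t, \quad \tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t.$$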

  10. DiGress: Discrete Denoising Diffusion for Graph Generation. Clement Vignac et al, ICLR 2023. [PDF] [Author] [Code]

  11. Efficient and Degree-Guided Graph Generation via Discrete Diffusion Modeling. Xiaohui Chen et al, ICML 2023. [PDF] [Author] [Code]

  12. Updating.... * et al, .* [PDF] [Author]

Survey

  1. A Survey on Deep Graph Generation: Methods and Applications. Yanqiao Zhu et al, LoG 2022. [PDF] [Author]

  2. Generative Diffusion Models on Graphs: Methods and Applications. Wenqi Fan et al, arXiv 2023. [PDF] [Author]

  3. A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material. Mengchun Zhang et al, arXiv 2023. [PDF] [Author]

  4. A Systematic Survey on Deep Generative Models for Graph Generation. Xiaojie Guo et al, TPAMI 2023. [PDF] [Author]

  5. Updating.... * et al, .* [PDF] [Author]

  1. Learning Causal Effects on Hypergraphs. Jing Ma et al, KDD 2022. [PDF] [Author]

  2. Learning Causality with Graphs. Jing Ma et al, AI Magazine, 2022. [PDF] [Author]

  3. CLEAR: Generative Counterfactual Explanations on Graphs. Jing Ma et al, NeurIPS 2022. [PDF] [Author]

  4. Updating.... * et al, .* [PDF] [Author]

  1. How Powerful are Graph Neural Networks? Keyulu Xu et al, ICLR 2019. [PDF] [Author]

  2. Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning. Pan Li et al, NeurIPS 2020. [PDF] [Author]

  3. Updating.... * et al, .* [PDF] [Author]

  1. Graph Information Bottleneck. Tailin Wu et al, NeurIPS 2020. [PDF] [Author] [Project]
    Notes: This paper proposes Graph Information Bottleneck (GIB), an information-theoretic principle inherited from Information Bottleneck (IB), adapted for representation learning on graph-structured data. IB provides a critical principle for representation learning: an optimal representation should contain the minimal sufficient information for the downstream task. Based on this, GIB aims to extract information from both the graph structure and node features and further encourages the information in the learned representation to be both minimal and sufficient. The authors further propose a variational upper bound for constraining the information from the node features and graph structure, and a variational lower bound for maximizing the information in the representation to predict the target. The i.i.d. assumption of data points is typically used to derive variational bounds and to accurately estimate those bounds when learning IB-based models. However, node features of graph-structured data may be correlated. Local-Dependence assumption: given the data related to the neighbors within a certain number of hops of a node $v$, the data in the rest of the graph will be independent of $v$.

  2. Recognizing Predictive Substructures with Subgraph Information Bottleneck. Junchi Yu et al, TPAMI 2021. [PDF] [Author]

  3. Graph Information Bottleneck for Subgraph Recognition. Junchi Yu et al, ICLR 2021. [PDF] [Author]

  4. Improving Subgraph Recognition with Variational Graph Information Bottleneck. Junchi Yu et al, CVPR 2022. [PDF] [Author]

  5. Heterogeneous Graph Information Bottleneck. Liang Yang et al, IJCAI 2021. [PDF] [Author]

  6. InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization. Fan-Yun Sun et al, ICLR 2020. [PDF] [Author]

  1. Towards Deeper Graph Neural Networks. Meng Liu et al, KDD 2020. [PDF] [Author]
    Notes:
  1. Mind the Label Shift of Augmentation-based Graph OOD Generalization. Junchi Yu et al, CVPR 2023. [PDF] [Author]
  1. Few-Shot Learning on Graphs. Chuxu Zhang et al, arXiv 2022. [PDF] [Author]

  2. Updating.... * et al, .* [PDF] [Author]

  1. Heterogeneous Graph Neural Network. Chuxu Zhang et al, KDD 2019. [PDF] [Author]

  2. Updating.... * et al, .* [PDF] [Author]

Survey

  1. Geometric Deep Learning: Going beyond Euclidean Data. Michael Bronstein et al, IEEE SPM 2017. [PDF] [Author]

  2. A Comprehensive Survey on Graph Neural Networks. Zonghan Wu et al, IEEE TNNLS 2021. [PDF] [Author]

  3. Self-Supervised Learning of Graph Neural Networks: A Unified Review. Yaochen Xie et al, IEEE TPAMI 2023. [PDF] [Author]

  4. A Survey on Graph Structure Learning: Progress and Opportunities. Yanqiao Zhu et al, arXiv 2022. [PDF] [Author]

Visual Question Answering [Back to Top]

  1. VQA: Visual Question Answering. Stanislaw Antol et al, ICCV 2015. [PDF] [Author]

  2. Hierarchical Question-Image Co-Attention for Visual Question Answering. Jiasen Lu et al, NIPS 2016. [PDF] [Author]

  3. Updating.... * et al, .* [PDF] [Author]

  4. Updating.... * et al, .* [PDF] [Author]

Implicit Neural Representations [Back to Top]

  1. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. Ben Mildenhall et al, ECCV 2020. [PDF] [Author]

  2. Implicit Neural Representations with Periodic Activation Functions. Vincent Sitzmann et al, NeurIPS 2020. [PDF] [Author] [Code] [Project]
    Notes: Most previous implicit neural representations built on ReLU-based MLPs lack the capacity to represent fine details in the underlying signals, and typically do not represent the derivatives of a target signal well. This is partly because ReLU networks are piecewise linear, their second derivative is zero everywhere, and they are thus incapable of modeling information contained in higher-order derivatives of natural signals. To tackle these problems, this paper proposes Sinusoidal Representation Networks (SIRENs), which leverage periodic activation functions for implicit neural representations (a minimal layer sketch follows). Consider a class of functions $\Phi$ that satisfy equations of the form $$\mathcal{C}(x, \Phi, \nabla_{x} \Phi, \nabla_{x}^{2} \Phi, \ldots) = 0.$$ In this implicit problem formulation, a functional $\mathcal{C}$ takes as input the spatio-temporal coordinates $x \in \mathbb{R}^m$ and, optionally, derivatives of $\Phi$ with respect to these coordinates. The goal is to learn a neural network that parameterizes $\Phi$ to map $x$ to some quantity of interest while satisfying the constraint presented in the above equation. $\Phi$ is implicitly defined by the relation modeled by $\mathcal{C}$, and such a $\Phi$ is referred to as an implicit neural representation.
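
A minimal sketch of a single SIREN layer with the uniform initialization scheme described in the paper (my paraphrase, not the released code; $\omega_0 = 30$ follows the paper's default):

```python
# SIREN layer: y = sin(omega_0 * (W x + b)), with the paper's uniform initialization bounds.
import numpy as np

def init_siren_layer(fan_in, fan_out, omega_0=30.0, first_layer=False, rng=None):
    rng = rng or np.random.default_rng(0)
    if first_layer:
        bound = 1.0 / fan_in                        # first layer: U(-1/n, 1/n)
    else:
        bound = np.sqrt(6.0 / fan_in) / omega_0     # later layers: U(-sqrt(6/n)/omega_0, sqrt(6/n)/omega_0)
    W = rng.uniform(-bound, bound, size=(fan_out, fan_in))
    b = rng.uniform(-bound, bound, size=fan_out)
    return W, b

def siren_layer(x, W, b, omega_0=30.0):
    return np.sin(omega_0 * (W @ x + b))            # periodic activation
```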

  3. Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans. Sida Peng et al, CVPR 2021. [PDF] [Author]

  4. Learning Continuous Image Representation with Local Implicit Image Function. Yinbo Chen et al, CVPR 2021. [PDF] [Author]

  5. Implicit Neural Representations with Structured Latent Codes for Human Body Modeling. Sida Peng et al, TPAMI 2023. [PDF] [Author]

  6. Generalised Implicit Neural Representations. Daniele Grattarola et al, NeurIPS 2022. [PDF] [Author]

  7. Updating.... * et al, .* [PDF] [Author]

Deep Generative Models [Back to Top]

Survey

  1. Deep Generative Modelling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models. Sam Bond-Taylor et al, TPAMI 2022. [PDF] [Author]
  1. Auto-Encoding Variational Bayes. Diederik P. Kingma et al, arXiv 2013. [PDF] [Author]

  2. An Introduction to Variational Autoencoders. Diederik P. Kingma et al, arXiv 2013. [PDF] [Author]

  3. Improved Variational Inference with Inverse Autoregressive Flows. Diederik P. Kingma et al, NIPS 2016. [PDF] [Author] [Code]

  4. Ladder Variational Autoencoders. Casper Kaae Sønderby et al, NIPS 2016. [PDF] [Author]

  5. Neural Discrete Representation Learning. Aaron van den Oord et al, NIPS 2017. [PDF] [Author]

  6. Updating.... * et al, .* [PDF] [Author]

  1. Generative Adversarial Nets. Ian J. Goodfellow et al, NeurIPS 2014. [PDF] [Author] [Code]

  2. Conditional Generative Adversarial Nets. Mehdi Mirza et al, arXiv 2014. [PDF] [Author]

  3. Coupled Generative Adversarial Networks. Ming-Yu Liu et al, NeurIPS 2016. [PDF] [Author] [Code]

  4. Alias-Free Generative Adversarial Networks. Tero Karras et al, NeurIPS 2021. [PDF] [Author]

  5. Image-to-Image Translation with Conditional Adversarial Networks. Phillip Isola et al, CVPR 2017. [PDF] [Author] [Code] [Website]

  6. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Jun-Yan Zhu et al, ICCV 2017. [PDF] [Author] [Code] [Website]

  7. Updating.... * et al, .* [PDF] [Author]

  1. Updating.... * et al, .* [PDF] [Author]
  1. Updating.... * et al, .* [PDF] [Author]
  1. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. Jascha Sohl-Dickstein et al, ICML 2015. [PDF] [Author]

  2. Denoising Diffusion Probabilistic Models. Jonathan Ho et al, NeurIPS 2020. [PDF] [Author] [Code]

  3. Variational Diffusion Models. Diederik P. Kingma et al, NeurIPS 2021. [PDF] [Author]

  4. Denoising Diffusion Implicit Models. Jiaming Song et al, ICLR 2021. [PDF] [Author] [Code]

  5. Improved Denoising Diffusion Probabilistic Models. Alex Nichol et al, ICML 2021. [PDF] [Author] [Code]

  6. Diffusion Models Beat GANs on Image Synthesis. Prafulla Dhariwal et al, NeurIPS 2021. [PDF] [Author]

  7. Classifier-Free Diffusion Guidance. Jonathan Ho et al, arXiv 2022. [PDF] [Author]

  8. Progressive Distillation for Fast Sampling of Diffusion Models. Tim Salimans et al, ICLR 2022. [PDF] [Author] [Code]

  9. GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation. Minkai Xu et al, ICLR 2022. [PDF] [Author]

  10. High-Resolution Image Synthesis with Latent Diffusion Models. Robin Rombach et al, CVPR 2022. [PDF] [Author] [Code1] [Code2]

  11. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. Chitwan Saharia et al, NeurIPS 2022. [PDF] [Author]

  12. Hierarchical Text-Conditional Image Generation with CLIP Latents. Aditya Ramesh et al, arXiv 2022. [PDF] [Author]

  13. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models. Alex Nichol et al, arXiv 2022. [PDF] [Author]

  14. Elucidating the Design Space of Diffusion-Based Generative Models. Tero Karras et al, NeurIPS 2022. [PDF] [Author]

  15. Understanding Diffusion Models: A Unified Perspective. Calvin Luo et al, arXiv 2022. [PDF] [Author]

  16. DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps. Cheng Lu et al, NeurIPS 2022. [PDF] [Author] [Code]

  17. Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models. Fan Bao et al, ICLR 2022. [PDF] [Author]

  18. On Distillation of Guided Diffusion Models. Chenlin Meng et al, CVPR 2023. [PDF] [Author]

  19. Updating.... * et al, .* [PDF] [Author]

Diffusion Pruning

  1. Structural Pruning for Diffusion Models. Gongfan Fang et al, arXiv 2023. [PDF] [Author] [Code]

Survey

  1. Diffusion Models in Vision: A Survey. Florinel-Alin Croitoru et al, TPAMI 2022. [PDF] [Author]
  1. Bayesian Learning via Stochastic Gradient Langevin Dynamics. Max Welling et al, ICML 2011. [PDF] [Author]

  2. Generative Modeling by Estimating Gradients of the Data Distribution. Yang Song et al, NeurIPS 2019. [PDF] [Author]

  3. Improved Techniques for Training Score-Based Generative Models. Yang Song et al, NeurIPS 2020. [PDF] [Author]

  4. Score-Based Generative Modeling through Stochastic Differential Equations. Yang Song et al, ICLR 2021. [PDF] [Author] [Code]

  5. How to Train Your Energy-Based Models. Yang Song et al, arXiv 2021. [PDF] [Author]

  6. Updating.... * et al, .* [PDF] [Author]

Network Compression [Back to Top]

  1. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. Jonathan Frankle et al, ICLR 2019. [PDF] [Author]
    Notes: This paper proposes the Lottery Ticket Hypothesis (LTH): a randomly-initialized, dense neural network contains a subnetwork that is initialized such that—when trained in isolation—it can match the test accuracy of the original network after training for at most the same number of iterations. Before this paper, the contemporary experience was that the architectures uncovered by pruning are harder to train from the start, reaching lower accuracy than the original networks. This paper finds that when the parameters of a winning ticket are randomly reinitialized, it no longer matches the performance of the original network, offering evidence that these smaller networks do not train effectively unless they are appropriately initialized. In other words, when randomly reinitialized, winning tickets perform far worse, meaning structure alone cannot explain a winning ticket's success. The authors identify a winning ticket by training a network and pruning its smallest-magnitude weights; each unpruned connection's value is then reset to its initialization from the original network before it was trained (a schematic of this procedure is sketched below). The pruning process can be one-shot, but iterative pruning shows better performance. The Lottery Ticket Conjecture is that dense, randomly-initialized networks are easier to train than the sparse networks that result from pruning because there are more possible subnetworks from which training might recover a winning ticket. A limitation is that on deeper networks (ResNet-18 and VGG-19), iterative pruning is unable to find winning tickets unless the networks are trained with learning-rate warmup.
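
A schematic of iterative magnitude pruning with rewinding to the original initialization (a pseudo-code style sketch; `model` and `train` are hypothetical placeholders, and the per-round pruning fraction is an arbitrary choice):

```python
# Iterative magnitude pruning: train, prune the smallest surviving weights, rewind to the original init, repeat.
import copy
import numpy as np

def find_winning_ticket(model, train, prune_fraction=0.2, rounds=5):
    init_weights = copy.deepcopy(model.weights)                # theta_0, saved before any training
    mask = {k: np.ones_like(w) for k, w in model.weights.items()}
    for _ in range(rounds):
        model.weights = {k: init_weights[k] * mask[k] for k in mask}   # rewind surviving weights to theta_0
        train(model, mask)                                     # train with pruned weights held at zero (hypothetical)
        for k, w in model.weights.items():                     # prune the smallest-magnitude surviving weights
            alive = np.abs(w[mask[k] == 1])
            thresh = np.quantile(alive, prune_fraction)
            mask[k] = np.where((np.abs(w) > thresh) & (mask[k] == 1), 1.0, 0.0)
    # Winning ticket = (mask, original initialization): reset once more and retrain in isolation.
    model.weights = {k: init_weights[k] * mask[k] for k in mask}
    return model, mask
```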

  2. Rethinking the Value of Network Pruning. Zhuang Liu et al, ICLR 2019. [PDF] [Author] [Code]

  3. Auto Graph Encoder-Decoder for Neural Network Pruning. Sixing Yu et al, ICCV 2021. [PDF] [Author]

  4. The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models. Tianlong Chen et al, CVPR 2021. [PDF] [Author]

  5. The Lottery Ticket Hypothesis for Vision Transformers. Xuan Shen et al, AAAI 2022. [PDF] [Author]

  6. Dual Lottery Ticket Hypothesis. Yue Bai et al, ICLR 2022. [PDF] [Author]
    Notes: This paper proposes the Dual Lottery Ticket Hypothesis (DLTH): a randomly selected subnetwork from a randomly initialized dense network can be transformed into a trainable condition, where the transformed subnetwork can be trained in isolation and achieve better, or at least comparable, performance to LTH and other strong baselines. LTH can be seen as finding structure according to weights, because it prunes the pretrained network to find a mask using weight-magnitude ranking; DLTH can be seen as finding weights based on a given structure, because it transforms the weights of a randomly selected sparse network. To substantiate DLTH, a Random Sparse Network Transformation (RST) is proposed, which adopts a regularization term to borrow learning capacity and realize information extrusion from the weights that will be masked.

  7. Topology-Aware Network Pruning using Multi-stage Graph Embedding and Reinforcement Learning. Sixing Yu et al, ICML 2022. [PDF] [Author] [Code]

  8. Updating.... * et al, .* [PDF] [Author]

  1. Updating.... * et al, .* [PDF] [Author]
  1. Training Binary Neural Networks through Learning with Noisy Supervision. Kai Han et al, ICML 2020. [PDF] [Author] [Code]

  2. Updating.... * et al, .* [PDF] [Author]

  1. Updating.... * et al, .* [PDF] [Author]

Survey

  1. Recent Advances on Neural Network Pruning at Initialization. Huan Wang et al, IJCAI 2022. [PDF] [Author]
    Notes: This is the first survey on pruning at initialization.

Dataset Distillation [Back to Top]

  1. Dataset Distillation. Tongzhou Wang et al, arXiv 2018. [PDF] [Author] [Code] [Website]

  2. Generalizing Dataset Distillation via Deep Generative Prior. George Cazenavette et al, CVPR 2023. [PDF] [Author]

Survey

  1. Dataset Distillation: A Comprehensive Review. Ruonan Yu et al, arXiv 2023. [PDF] [Author]

  2. Updating.... * et al, .* [PDF] [Author]

  3. Updating.... * et al, .* [PDF] [Author]

  4. Updating.... * et al, .* [PDF] [Author]

  5. Updating.... * et al, .* [PDF] [Author]

Learning with Label Noise [Back to Top]

  1. Classification with Noisy Labels by Importance Reweighting. Tongliang Liu et al, TPAMI 2015. [PDF] [Author]

  2. Are Anchor Points Really Indispensable in Label-Noise Learning?. Xiaobo Xia et al, NeurIPS 2019. [PDF] [Author] [Code]

  3. Part-dependent Label Noise: Towards Instance-dependent Label Noise. Xiaobo Xia et al, NeurIPS 2020. [PDF] [Author] [Code]

  4. Updating.... * et al, .* [PDF] [Author]

  1. Updating.... * et al, .* [PDF] [Author]

Contrastive Learning [Back to Top]

  1. A Simple Framework for Contrastive Learning of Visual Representations. Ting Chen et al, ICML 2020. [PDF] [Author]
    Notes:

  2. Updating.... * et al, .* [PDF] [Author]

Low-Level Vision [Back to Top]

  1. Ghost-free High Dynamic Range Imaging with Context-Aware Transformer. Zhen Liu et al, ECCV 2022. [PDF] [Author]
    Notes: This is the first work that introduces Transformer for HDR imaging.

  2. Updating.... * et al, .* [PDF] [Author]

Survey

  1. Deep Learning for HDR Imaging: State-of-the-Art and Future Trends. Lin Wang et al, IEEE TPAMI 2021. [PDF] [Author]
  1. Cross-Scale Internal Graph Neural Network for Image Super-Resolution. Shangchen Zhou et al, NeurIPS 2020. [PDF] [Author]
  1. Updating.... * et al, .* [PDF] [Author]

Vision Language Pretraining [Back to Top]

  1. Learning Transferable Visual Models From Natural Language Supervision. Alec Radford et al, ICML 2021. [PDF] [Author]

  2. Updating.... * et al, .* [PDF] [Author]

Point Cloud [Back to Top]

  1. Dynamic Graph CNN for Learning on Point Clouds. Yue Wang et al, ACM TOG 2019. [PDF] [Author]

  2. Modeling Point Clouds with Self-Attention and Gumbel Subset Sampling. Jiancheng Yang et al, CVPR 2019. [PDF] [Author]

  3. Updating.... * et al, .* [PDF] [Author]

Causal Inference [Back to Top]

  1. Advances in Variational Inference. Cheng Zhang et al, TPAMI 2019. [PDF] [Author]

  2. A Causal View on Robustness of Neural Networks. Cheng Zhang et al, NeurIPS 2020. [PDF] [Author]

  3. Relating Graph Neural Networks to Structural Causal Models. Matej Zečević et al, arXiv 2021. [PDF] [Author]

  4. Debiasing Graph Neural Networks via Learning Disentangled Causal Substructure. Shaohua Fan et al, NeurIPS 2022. [PDF] [Author]

  5. The Causal Structure of Domain Invariant Supervised Representation Learning. Zihao Wang et al, arXiv 2023. [PDF] [Author]

  6. Generative Causal Explanations for Graph Neural Networks. Wanyu Lin et al, ICML 2021. [PDF] [Author]

  7. Updating.... * et al, .* [PDF] [Author]

Natural Language Processing [Back to Top]

  1. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. Pengfei Liu et al, ACM Computing Surveys 2022. [PDF] [Author]

  2. Sparks of Artificial General Intelligence: Early experiments with GPT-4. Sébastien Bubeck et al, arXiv 2023. [PDF] [Author]

  3. Updating.... * et al, .* [PDF] [Author]

AI for Sciences [Back to Top]

  1. Scientific discovery in the age of artificial intelligence. Hanchen Wang et al, Nature 2023. [PDF] [Author]

  1. DeepDTA: deep drug–target binding affinity prediction. Hakime Öztürk et al, Bioinformatics 2018. [PDF] [Author] [Code]

  2. PADME: A Deep Learning-based Framework for Drug-Target Interaction Prediction. Qingyuan Feng et al, arXiv 2018. [PDF] [Author]

  3. WideDTA: prediction of drug-target binding affinity. Hakime Öztürk et al, arXiv 2019. [PDF] [Author]

  4. AttentionDTA: prediction of drug–target binding affinity using attention model. Qichang Zhao et al, BIBM 2019. [PDF] [Author]

  5. Comparison Study of Computational Prediction Tools for Drug-Target Binding Affinities. Maha A. Thafar et al, Frontiers in Chemistry 2019. [PDF] [Author]

  6. DeepAffinity: interpretable deep learning of compound–protein affinity through unified recurrent and convolutional neural networks. Mostafa Karimi et al, Bioinformatics 2019. [PDF] [Author]

  7. GANsDTA: Predicting Drug-Target Binding Affinity Using GANs. Lingling Zhao et al, Frontiers in Genetics 2020. [PDF] [Author]

  8. DeepPurpose: a deep learning library for drug–target interaction prediction. Kexin Huang et al, Bioinformatics 2020. [PDF] [Author] [Code]

  9. GraphDTA: predicting drug–target binding affinity with graph neural networks. Thin Nguyen et al, Bioinformatics 2021. [PDF] [Author] [Code]

  10. Prediction of drug–target binding affinity using similarity-based convolutional neural network. Jooyong Shim et al, Scientific Reports 2021. [PDF] [Author]

  11. Deep drug-target binding affinity prediction with multiple attention blocks. Yuni Zeng et al, Briefings in Bioinformatics 2021. [PDF] [Author]

  12. NerLTR-DTA: drug–target binding affinity prediction based on neighbor relationship and learning to rank. Xiaoqing Ru et al, Bioinformatics 2022. [PDF] [Author]

  13. Hierarchical graph representation learning for the prediction of drug-target binding affinity. Zhaoyang Chu et al, Information Sciences 2022. [PDF] [Author]

  14. Co-VAE: Drug-Target Binding Affinity Prediction by Co-Regularized Variational Autoencoders. Tianjiao Li et al, TPAMI 2022. [PDF] [Author]

  15. Affinity2Vec: drug-target binding affinity prediction through representation learning, graph mining, and machine learning. Maha A. Thafar et al, Scientific Reports 2022. [PDF] [Author]

  16. DTITR: End-to-end drug–target binding affinity prediction with transformers. Nelson R. C. Monteiro et al, Computers in Biology and Medicine 2022. [PDF] [Author]

  17. MGraphDTA: deep multiscale graph neural network for explainable drug–target binding affinity prediction. Ziduo Yang et al, 2022. [PDF] [Author]

  18. Improving the generalizability of protein-ligand binding predictions with AI-Bind. Ayan Chatterjee et al, Nature Communications 2023. [PDF] [Author] [Code]

  19. Updating.... * et al, .* [PDF] [Author]

  1. Variational Inference: A Review for Statisticians. David Blei et al, arXiv 2018. [PDF] [Author]

  1. A Generalized Solution of the Orthogonal Procrustes Problem. Peter H. Schönemann, Psychometrika 1966. [PDF] [Author]
    Notes: This is a classical paper that proposes a generalized solution to the Orthogonal Procrustes Problem, which remains applicable when the matrices involved have less than full column rank; a minimal NumPy sketch of the standard SVD-based solution is given after this list.

  2. Generalized Embedding Regression: A Framework for Supervised Feature Extraction. Jianglin Lu et al, IEEE TNNLS 2022. [PDF]
    Notes: My first-author paper attempts to unify previous hand-crafted feature extraction methods in a Generalized Embedding Regression (GER) framework. Based on GER, a new supervised feature extraction method is further proposed, which adopts the penalty graph Laplacian as the constraint matrix of a generalized orthogonal constraint. We theoretically show that the resulting optimization subproblem is intrinsically an unbalanced Procrustes problem, and design an iterative algorithm that solves it with a convergence guarantee. Although the topic is somewhat dated, the optimization still excites me.
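
For quick reference, here is a minimal NumPy sketch of the standard SVD recipe for the orthogonal Procrustes problem $\min_{R} \|AR - B\|_F$ s.t. $R^\top R = I$. Schönemann's generalized treatment additionally covers rank-deficient cases, which this sketch does not; function and variable names are mine.

```python
import numpy as np

def orthogonal_procrustes(A, B):
    """Classical SVD recipe for min_R ||A R - B||_F subject to R^T R = I."""
    U, _, Vt = np.linalg.svd(A.T @ B)   # SVD of the cross-product matrix
    return U @ Vt                       # optimal orthogonal map (may include a reflection)

# Toy check: recover a random orthogonal transform applied to random data.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 4))
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # a random orthogonal matrix
R_hat = orthogonal_procrustes(A, A @ Q)
print(np.allclose(R_hat, Q))                   # True up to numerical error
```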

  1. Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. Xiaojin Zhu et al, ICML 2003. [PDF] [Author]
    Notes: This paper formulates semi-supervised learning as a Gaussian random field on a weighted graph, where the labeled points act as boundary conditions and the predicted labels form a harmonic function; see the sketch after this list.

  2. Label Propagation Through Linear Neighborhoods. Fei Wang et al, ICML 2006. [PDF] [Author]
    Notes:

  3. Label Propagation with Weak Supervision. ICLR 2023. [PDF] [Author]
    Notes:
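
For quick reference, here is a minimal NumPy sketch of the closed-form harmonic solution used in the Gaussian-fields paper above, assuming a symmetric nonnegative affinity matrix; function and variable names are mine.

```python
import numpy as np

def harmonic_label_propagation(W, labeled_idx, Y_l):
    """Closed-form harmonic solution: labeled nodes act as boundary conditions.

    W: (n, n) symmetric nonnegative affinity matrix.
    labeled_idx: indices of the labeled nodes; Y_l: their (n_l, c) one-hot labels.
    Returns (n_u, c) soft label scores for the unlabeled nodes.
    """
    n = W.shape[0]
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    L = np.diag(W.sum(axis=1)) - W                       # unnormalized graph Laplacian
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]
    return np.linalg.solve(L_uu, W_ul @ Y_l)             # f_u = L_uu^{-1} W_ul f_l

# Toy usage: a 4-node chain with the two endpoints labeled.
W = np.array([[0., 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
Y_l = np.array([[1., 0],    # node 0 -> class 0
                [0, 1]])    # node 3 -> class 1
print(harmonic_label_propagation(W, np.array([0, 3]), Y_l))   # node 1 leans to class 0, node 2 to class 1
```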

  1. Optimal CUR Matrix Decompositions. Christos Boutsidis et al, SIAM 2017. [PDF] [Author]

  2. Joint Active Learning with Feature Selection via CUR Matrix Decomposition. Changsheng Li et al, IEEE TPAMI 2019. [PDF] [Author]
    Notes: This work performs sample selection and feature selection simultaneously based on CUR decomposition; a minimal sketch of a plain CUR factorization is given after this list.

  3. Robust CUR Decomposition: Theory and Imaging Applications. HanQin Cai et al, SIAM 2021. [PDF] [Author]
    Notes: This paper considers the use of Robust PCA in a CUR decomposition framework.
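
For quick reference, here is a minimal NumPy sketch of a plain CUR factorization. The papers above select columns/rows by leverage scores or joint optimization; this sketch simply takes the indices as given, and the names are mine.

```python
import numpy as np

def cur_decomposition(A, col_idx, row_idx):
    """CUR approximation A ≈ C @ U @ R for user-supplied column/row indices.

    C and R are actual columns/rows of A (hence interpretable as selected
    features/samples); U = C^+ A R^+ is the Frobenius-optimal core for this choice.
    """
    C = A[:, col_idx]
    R = A[row_idx, :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R

# Toy usage: a rank-2 matrix is recovered exactly from 2 columns and 2 rows.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))
C, U, R = cur_decomposition(A, col_idx=[0, 3], row_idx=[1, 4])
print(np.allclose(C @ U @ R, A))   # True when the selected columns/rows span A's column/row spaces
```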

  1. Speedup Matrix Completion with Side Information: Application to Multi-Label Learning. Miao Xu et al, NIPS 2013. [PDF] [Author]
    Notes: This paper explicitly explores the side information of data for matrix completion, with which the number of observed entries needed for a perfect recovery of the matrix M can be dramatically reduced from $O(n \ln^2 n)$ to $O(\ln n)$; a toy sketch of completion with side information follows this list.

  2. Matrix Completion on Graphs. Vassilis Kalofolias et al, arXiv 2014. [PDF] [Author]

  3. Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks. Federico Monti et al, NeurIPS 2017. [PDF] [Author]

  4. Graph Convolutional Matrix Completion. Rianne van den Berg et al, KDD 2018. [PDF] [Author]
    Notes: This paper considers matrix completion for recommender systems from the point of view of link prediction on graphs.

  5. Inductive Matrix Completion Based on Graph Neural Networks. Muhan Zhang et al, ICLR 2020. [PDF] [Author]

  6. Updating.... * et al, .* [PDF] [Author]
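
For quick reference, here is a toy NumPy sketch of matrix completion with side information, in the spirit of modeling $M \approx XZY^\top$. It only illustrates why side features shrink the number of unknowns (and hence the required observations); it is not the algorithm of any paper above (those use nuclear-norm or graph regularization, or GNNs), and all names are mine.

```python
import numpy as np

def complete_with_side_info(M_obs, mask, X, Y):
    """Toy completion with side information: model M ≈ X @ Z @ Y.T and fit the
    small core Z by least squares on the observed entries only.

    With row features X (n, a) and column features Y (m, b), only a*b unknowns
    are estimated instead of n*m, which is why far fewer observations suffice.
    """
    rows, cols = np.nonzero(mask)
    # Each observed entry gives one linear equation in vec(Z): (y_j ⊗ x_i)^T vec(Z) = M_ij.
    design = np.stack([np.kron(Y[j], X[i]) for i, j in zip(rows, cols)])
    z, *_ = np.linalg.lstsq(design, M_obs[rows, cols], rcond=None)
    Z = z.reshape(X.shape[1], Y.shape[1], order="F")     # undo the column-major vec()
    return X @ Z @ Y.T

# Toy usage: the ground truth is expressible through the side information.
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(30, 4)), rng.normal(size=(20, 3))
M_true = X @ rng.normal(size=(4, 3)) @ Y.T
mask = rng.random(M_true.shape) < 0.1                    # observe only ~10% of entries
M_hat = complete_with_side_info(M_true * mask, mask, X, Y)
print(np.allclose(M_hat, M_true))                        # True: 12 unknowns, ~60 equations
```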

  1. Optimization Methods for Large-Scale Machine Learning. Léon Bottou et al, SIAM 2018. [PDF] [Author]
    Notes: This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications.

  2. Updating.... * et al, .* [PDF] [Author]

  1. Smooth Bilevel Programming for Sparse Regularization. Clarice Poon et al, NeurIPS 2021. [PDF] [Author]

  2. Bilevel Optimization Portal. [Website]

  1. Deep Variational Information Bottleneck. Alexander A. Alemi et al, ICLR 2017. [PDF] [Author]

  2. Visual Information Theory. Christopher Olah et al, Blog 2015. [PDF] [Author]

  3. Updating.... * et al, .* [PDF] [Author]

  1. Probability Theory: The Logic of Science. E. T. Jaynes et al, Book. [PDF] [Author]

  1. A Review of the Gumbel-max Trick and its Extensions for Discrete Stochasticity in Machine Learning. Iris A. M. Huijben et al, TPAMI 2023. [PDF] [Author]

  2. Categorical Reparameterization with Gumbel-Softmax. Eric Jang et al, ICLR 2017. [PDF] [Author]

  3. Updating.... * et al, .* [PDF] [Author]
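
For quick reference, here is a minimal NumPy illustration of the Gumbel-max trick and the Gumbel-Softmax relaxation discussed in the entries above; variable names are mine.

```python
import numpy as np

def gumbel_max_sample(logits, rng):
    """Gumbel-max trick: argmax(logits + Gumbel noise) is an exact categorical sample."""
    return np.argmax(logits + rng.gumbel(size=logits.shape))

def gumbel_softmax_sample(logits, temperature, rng):
    """Gumbel-Softmax relaxation: replace the argmax with a temperature-scaled softmax."""
    y = (logits + rng.gumbel(size=logits.shape)) / temperature
    e = np.exp(y - y.max())                              # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
logits = np.log(np.array([0.1, 0.2, 0.7]))
# Empirical check: sample frequencies approximate the categorical probabilities.
samples = [gumbel_max_sample(logits, rng) for _ in range(10000)]
print(np.bincount(samples) / 10000)                      # roughly [0.1, 0.2, 0.7]
print(gumbel_softmax_sample(logits, temperature=0.5, rng=rng))   # soft sample; lower temperature -> closer to one-hot
```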

  1. An Intuitive Introduction For Understanding and Solving Stochastic Differential Equations. Chris Rackauckas et al, 2014. [PDF] [Author]

  1. My Personal Learning Notes on Quantum Computing. Jianglin Lu. [PDF]

Learning Sources [Back to Top]

  1. UvA Deep Learning Tutorials. [Website]

  2. PyTorch Image Models (timm) Documentation. [Website]

  3. PyTorch Geometric (PyG) Documentation. [Website]

  4. Deep Graph Library (DGL) Tutorials and Documentation. [Website]

  5. PyTorch Lightning Documentation. [Website]

  6. KeOps Documentation. [Website]

  7. Qiskit Machine Learning Documentation. [Website]

  8. Interpretable Machine Learning. [Website]
