Paper Highlights
ICML 2018
Orals
- Transfer Learning via Learning to Transfer
- Asynchronous Decentralized Parallel Stochastic Gradient Descent
- D2: Decentralized Training over Decentralized Data
- Stochastic Variance-Reduced Cubic Regularized Newton Methods
- Communication-Computation Efficient Gradient Coding
- Fast Variance Reduction Method with Stochastic Batch Size
- A Simple Stochastic Variance Reduced Algorithm with Fast Convergence Rates (a generic SVRG-style sketch follows this list)
- Compressing Neural Networks using the Variational Information Bottleneck
- Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization
- Improved Training of Generative Adversarial Networks Using Representative Features
- Knowledge Transfer with Jacobian Matching
- Differentially Private Identity and Equivalence Testing of Discrete Distributions
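Several of the orals above concern stochastic variance reduction. As a reminder of the basic idea, below is a minimal SVRG-style sketch on a least-squares finite sum; it shows only the generic control-variate update, not the specific algorithms from these papers, and the problem sizes and step size are arbitrary.

```python
# Minimal SVRG-style sketch on a least-squares finite sum (illustrative only;
# not the specific algorithms from the papers listed above).
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

def grad_i(x, i):
    # Gradient of 0.5 * (a_i^T x - b_i)^2 for a single sample i.
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    return A.T @ (A @ x - b) / n

x = np.zeros(d)
lr = 0.01
for epoch in range(30):
    x_snap = x.copy()          # snapshot point
    mu = full_grad(x_snap)     # full gradient at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        # Control-variate (variance-reduced) stochastic gradient.
        g = grad_i(x, i) - grad_i(x_snap, i) + mu
        x -= lr * g

print("residual:", np.linalg.norm(A @ x - b) / np.sqrt(n))
```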
Parallel and Distributed Learning Session
- The Hidden Vulnerability of Distributed Learning in Byzantium (a generic robust-aggregation sketch follows this list)
- Asynchronous Byzantine Machine Learning (the case of SGD)
- DRACO: Byzantine-resilient Distributed Training via Redundant Gradients
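The three session papers above deal with Byzantine workers in distributed training. For orientation, here is a minimal sketch of one generic robust aggregation rule (coordinate-wise median instead of averaging); it is not the specific defense or attack studied in any of these papers, and the worker count and fault model are made up.

```python
# Coordinate-wise median aggregation: one generic Byzantine-robust
# alternative to averaging worker gradients. Illustrative baseline only,
# not the specific methods from the papers above.
import numpy as np

def robust_aggregate(worker_grads):
    """Aggregate per-worker gradient vectors with a coordinate-wise median."""
    stacked = np.stack(worker_grads, axis=0)   # shape: (num_workers, dim)
    return np.median(stacked, axis=0)

rng = np.random.default_rng(0)
true_grad = rng.normal(size=5)
honest = [true_grad + 0.01 * rng.normal(size=5) for _ in range(7)]
byzantine = [np.full(5, 100.0) for _ in range(2)]   # adversarial outliers

print("mean  :", np.mean(np.stack(honest + byzantine), axis=0))
print("median:", robust_aggregate(honest + byzantine))
```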
ICLR 2017
Accepted Papers
Need more reading
Famous Attention Transfer
- Learning to Optimize
- Pruning Convolutional Neural Networks for Resource Efficient Inference
- A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks (see the sketch after this list)
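The detection baseline in the last item above is simply the maximum softmax probability: inputs whose top-class probability falls below a threshold are flagged as likely misclassified or out-of-distribution. A minimal sketch, with made-up logits and threshold:

```python
# Max-softmax-probability baseline for flagging likely misclassified or
# out-of-distribution inputs. Logits and threshold are made-up values;
# in practice they come from a trained classifier and held-out data.
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msp_score(logits):
    """Confidence score = maximum softmax probability."""
    return softmax(logits).max(axis=-1)

logits = np.array([[4.0, 0.5, 0.2],    # confident prediction
                   [1.1, 1.0, 0.9]])   # low-confidence prediction
threshold = 0.7                        # tuned on held-out data in practice
scores = msp_score(logits)
print(scores, scores < threshold)      # flag inputs below the threshold
```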
Recent reading list
- Online Bayesian Transfer Learning for Sequential Data Modeling
- Paleo: A Performance Model for Deep Neural Networks
- SGDR: Stochastic Gradient Descent with Warm Restarts (schedule sketched after this list)
- Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
- Do Deep Convolutional Nets Really Need to be Deep and Convolutional?
- Why Deep Neural Networks for Function Approximation?
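The SGDR schedule noted above is cosine annealing with warm restarts: within each period of length T_i the learning rate decays from eta_max to eta_min along a cosine, then resets to eta_max, and the period grows by a factor T_mult. A minimal sketch of the schedule (parameter values are arbitrary examples, not the paper's experimental settings):

```python
# Cosine annealing with warm restarts, as in SGDR. Parameter values below
# are arbitrary examples, not the paper's experimental settings.
import math

def sgdr_lr(step, eta_max=0.1, eta_min=0.001, t_0=10, t_mult=2):
    """Learning rate at a given step under cosine annealing with restarts."""
    t_i, t_cur = t_0, step
    while t_cur >= t_i:        # locate the current restart period
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))

for step in range(0, 40, 5):
    print(step, round(sgdr_lr(step), 4))
```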