YuheD / awesome-performance-evaluation


A collection of papers on performance evaluation.

Topics covered:

  • Transferability Estimation
  • Model/Dataset Vectorization
  • Model/Algorithm/Representation Evaluation
  • Generalization Gap Prediction
  • Out-of-distribution Error Prediction
  • Accuracy Prediction
  • Model Validation
  • Calibration Error Prediction
  • Confidence Calibration

Survey

  • A Survey on Evaluation of Out-of-Distribution Generalization [Paper]
  • Which Model to Transfer? A Survey on Transferability Estimation [Paper]
  • A Survey of Language Model Confidence Estimation and Calibration [Paper]
  • Calibration of Neural Networks [Paper]

2024

  • Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress [Paper]
  • Energy-based Automated Model Evaluation [Paper]
  • Rethinking The Uniformity Metric in Self-Supervised Learning [Paper]
  • Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning [Paper]
  • Revisiting Confidence Estimation: Towards Reliable Failure Prediction [TPAMI] [Code]
    • Conference version: Rethinking Confidence Calibration for Failure Prediction [ECCV22]
  • Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets [Paper]
  • Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks [Paper]
  • Online GNN Evaluation Under Test-time Graph Distribution Shifts [ICLR]

2023

  • Unsupervised Accuracy Estimation of Deep Visual Models using Domain-Adaptive Adversarial Perturbation without Source Samples [ArXiv]
  • K-Means Clustering Based Feature Consistency Alignment for Label-Free Model Evaluation [CVPR Workshop]
  • Predicting Out-of-Domain Generalization with Neighborhood Invariance [TMLR]
  • Test Accuracy vs. Generalization Gap: Model Selection in NLP without Accessing Training or Testing Data [SIGKDD]
  • Analysis of Task Transferability in Large Pre-trained Classifiers [Under Review]
  • A Bag-of-Prototypes Representation for Dataset-Level Applications [CVPR]
  • DataMap: Dataset transferability map for medical image classification [PR]
  • To transfer or not transfer: Unified transferability metric and analysis [ArXiv]
  • Quantifying the impact of data characteristics on the transferability of sleep stage scoring models [Artificial Intelligence in Medicine]
  • Identification of Negative Transfers in Multitask Learning Using Surrogate Models [TMLR]
  • Model selection, adaptation, and combination for transfer learning in wind and photovoltaic power forecasts [Energy and AI]
  • Identifying Useful Learnwares for Heterogeneous Label Spaces [ICML]
  • Transferability prediction among classification and regression tasks using optimal transport [Multimedia Tools and Applications]
  • Choosing public datasets for private machine learning via gradient subspace distance [Paper]
  • Learning to Predict Task Transferability via Soft Prompt [Paper]
  • Understanding Few-Shot Learning: Measuring Task Relatedness and Adaptation Difficulty via Attributes [Paper]
  • Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach [ICCV]
  • How to Estimate Model Transferability of Pre-Trained Speech Models? [InterSpeech]
  • TaskWeb: Selecting Better Source Tasks for Multi-task NLP [[Paper]](https://arxiv.org/abs/2305.13256)
  • Feasibility and Transferability of Transfer Learning: A Mathematical Framework [ArXiv]
  • Topological Vanilla Transfer Learning [Paper]
  • Model Spider: Learning to Rank Pre-Trained Models Efficiently [ArXiv]
  • Towards Estimating Transferability using Hard Subsets [ArXiv]
  • Pick the Best Pre-trained Model: Towards Transferability Estimation for Medical Image Segmentation [MICCAI]
  • Simple Transferability Estimation for Regression Tasks [UAI]
  • Transferability Metrics for Object Detection [ArXiv]
  • Fast and Accurate Transferability Measurement by Evaluating Intra-class Feature Variance [ArXiv]
  • ETran: Energy-Based Transferability Estimation [ICCV]
  • How Far Pre-trained Models Are from Neural Collapse on the Target Dataset Informs their Transferability [ICCV]
  • Exploring Model Transferability through the Lens of Potential Energy [ICCV]
  • Unleashing the power of Neural Collapse for Transferability Estimation [ArXiv]
  • Foundation Model is Efficient Multimodal Multitask Model Selector [ArXiv]
  • Towards Robust Multi-Modal Reasoning via Model Selection [ArXiv]
  • Graph-based fine-grained model selection for multi-source domain [PAA]
  • Guided Recommendation for Model Fine-Tuning [CVPR]
  • Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How [ArXiv]
  • Estimating the Transfer Learning Ability of a Deep Neural Network by Means of Representations [NCMLCR]
  • Source Selection based on Diversity for Machine Learning [Patent]
  • Efficient Prediction of Model Transferability in Semantic Segmentation Tasks [ICIP]
  • The Performance of Transferability Metrics Does Not Translate to Medical Tasks [MICCAI workshop]
  • How to Determine the Most Powerful Pre-trained Language Model without Brute Force Fine-tuning? An Empirical Survey [ArXiv]
  • LOVM: Language-Only Vision Model Selection [NeurIPSW]
  • RankMe: Assessing the Downstream Performance of Pretrained Self-Supervised Representations by Their Rank [ICML Oral]
  • T-Measure: A Measure for Model Transferability [Under Review]
  • Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [TMLR]
  • Domain Adaptation for Network Performance Modeling with and without Labeled Data [NOMS]
  • Content-Based Search for Deep Generative Models [ArXiv]
  • GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels [NeurIPS]
  • Learning inter-task transferability in the absence of target task samples [Paper]
  • Model selection for cross-lingual transfer [Paper]
  • ModelGiF: Gradient Fields for Model Functional Distance [Paper]
  • Predicting Out-of-Distribution Error with Confidence Optimal Transport [Paper]
  • CAME: Contrastive Automated Model Evaluation [ICCV]
  • On the Importance of Feature Separability in Predicting Out-Of-Distribution Error [NeurIPS]
  • Characterizing out-of-distribution error via optimal transport [NeurIPS]
  • What can we Learn by Predicting Accuracy? [WACV]
  • CIFAR-10-Warehouse: Broad and more realistic testbeds in model generalization analysis [Paper]

2022

  • On the Relationship Between Explanation and Prediction: A Causal View [ArXiv]
  • The Missing Margin: How Sample Corruption Affects Distance to the Boundary in ANNs [AIR]
  • Generalization Bounds for Deep Transfer Learning Using Majority Predictor Accuracy [ISITA]
  • Transferability Estimation Based On Principal Gradient Expectation [ArXiv]
  • Transferability-Guided Cross-Domain Cross-Task Transfer Learning [ArXiv]
  • Wasserstein Task Embedding for Measuring Task Similarities [ArXiv][Code]
  • Efficiently tuned parameters are task embeddings [Paper]
  • Fisher task distance and its application in neural architecture search [Paper]
  • Leveraging task transferability to meta-learning for clinical section classification with limited data [Paper]
  • Transferability Between Regression Tasks [Paper]
  • Dataset2vec: Learning dataset meta-features [Paper]
  • CogTaskonomy: Cognitively Inspired Task Taxonomy Is Beneficial to Transfer Learning in NLP [ACL]
  • Exploring the role of task transferability in large-scale multi-task learning [Paper]
  • Newer is not always better: Rethinking transferability metrics, their peculiarities, stability and performance [ECML PKDD]
  • Frustratingly Easy Transferability Estimation [ICML] [Slides]
  • Transferability Estimation Using Bhattacharyya Class Separability [CVPR]
  • Transferability Metrics for Selecting Source Model Ensembles [CVPR]
  • How stable are Transferability Metrics evaluations? [ECCV] [TensorFlow]
  • Pre-Trained Model Reusability Evaluation for Small-Data Transfer Learning [NeurIPS] [Codes]
  • Neural Transferability: Current Pitfalls and Striving for Optimal Scores [Paper]
  • Ranking and Tuning Pre-trained Models: A New Paradigm for Exploiting Model Hubs [JMLR]
  • PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks [ECCV] [Codes]
  • Which Model to Transfer? Finding the Needle in the Growing Haystack [CVPR]
  • Evidence > Intuition: Transferability Estimation for Encoder Selection [EMNLP]
  • Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space [ECCV]
  • Efficient Semantic Segmentation Backbone Evaluation for Unmanned Surface Vehicles based on Likelihood Distribution Estimation [MSN]
  • ZooD: Exploiting Model Zoo for Out-of-Distribution Generalization [NeurIPS]
  • Predicting Out-of-Distribution Error with the Projection Norm [Paper]
  • Agreement-on-the-line: Predicting the performance of neural networks under distribution shift [NeurIPS]
  • Leveraging unlabeled data to predict out-of-distribution performance [Paper]
  • Estimating and Explaining Model Performance When Both Covariates and Labels Shift [NeurIPS]
  • Unsupervised and semi-supervised bias benchmarking in face recognition [ECCV]
  • On the strong correlation between model invariance and generalization [NeurIPS]
  • Active surrogate estimators: An active learning approach to label-efficient model evaluation [NeurIPS]
  • Predicting out-of-domain generalization with local manifold smoothness [Paper]

2021

  • A Mathematical Framework for Quantifying Transferability in Multi-source Transfer Learning [NeurIPS]
  • Transferability Estimation for Semantic Segmentation Task
  • OTCE: A Transferability Metric for Cross-Domain Cross-Task Representations [CVPR] [Poster]
  • Practical Transferability Estimation for Image Classification Tasks [ArXiv]
  • What to pre-train on? efficient intermediate task selection [EMNLP]
  • Efficiently identifying task groupings for multi-task learning [NeurIPS]
  • The information complexity of learning tasks, their structure and their distance [Paper]
  • An information-geometric distance on the space of tasks [[Paper]](https://proceedings.mlr.press/v139/gao21a.html)
  • ImageDataset2Vec: An image dataset embedding for algorithm selection [Paper]
  • Similarity of classification tasks [Paper]
  • Cats, not CAT scans: a study of dataset similarity in transfer learning for 2D medical image classification [Paper]
  • Analysis and Prediction of NLP models via Task Embeddings [Paper]
  • Inter-task similarity measure for heterogeneous tasks [Paper]
  • Ranking Neural Checkpoints [CVPR]
  • LogME: Practical Assessment of Pre-trained Models for Transfer Learning [ICML] [PyTorch]
  • Scalable Diverse Model Selection for Accessible Transfer Learning [NeurIPS] [PyTorch]
  • A linearized framework and a new benchmark for model selection for fine-tuning [ArXiv]
  • Are Labels Always Necessary for Classifier Accuracy Evaluation? [ICCV]
  • Predicting With Confidence on Unseen Distributions [ICCV]
  • What does rotation prediction tell us about classifier accuracy under varying testing environments?[ICML]
  • Detecting errors and estimating accuracy on unlabeled data with self-training ensembles[NeurIPS]
  • Ranking models in unlabeled new environments [ICCV]

2020

  • Duality diagram similarity: a generic framework for initialization selection in task transfer learning [ECCV]
  • Exploring and Predicting Transferability across NLP Tasks [EMNLP]
  • Geometric Dataset Distances via Optimal Transport [NeurIPS]
  • Similarity of neural networks with gradients [Paper]
  • Measuring and Harnessing Transference in Multi-Task Learning [ArXiv]
  • LEEP: A New Measure to Evaluate Transferability of Learned Representations [ICML] [Slides] [PyTorch]
  • Source Model Selection for Deep Learning in the Time Series Domain [IEEE Access]
  • Ranking and rejecting of pre-trained deep neural networks in transfer learning based on separation index [ArXiv]
  • DEPARA: Deep Attribution Graph for Deep Knowledge Transferability [Paper]
  • Predicting neural network accuracy from weights [Paper]
  • Computing the testing error without a testing set [CVPR]
  • Fantastic generalization measures and where to find them [ICLR]

2019

  • TASK2VEC: Task Embedding for Meta-Learning [ICCV]
  • Finding the Most Transferable Tasks for Brain Image Segmentation [BIBM]
  • Zero-Shot Task Transfer
  • Transferability and Hardness of Supervised Classification Tasks [ICCV]
  • An information-theoretic approach to transferability in task transfer learning [ICIP] [Codes]
  • Model reuse with reduced kernel mean embedding specification [ArXiv]
  • Service Metric Prediction in Clouds using Transfer Learning [DiVA]
  • Predicting the Generalization Gap in Deep Networks with Margin Distributions [ICLR]

2018

  • Taskonomy: Disentangling Task Transfer Learning [CVPR Best Paper]
  • Dynamics and reachability of learning tasks [Paper]
  • Stronger generalization bounds for deep nets via a compression approach [ICML]

2017

  • Exploring generalization in deep learning [NeurIPS]
  • Estimating accuracy from unlabeled data: A probabilistic logic approach [NeurIPS]

2016

  • Learning to Select Pre-trained Deep Representations with Bayesian Evidence Framework [CVPR]
  • Learning with rejection [Paper]
  • Estimating accuracy from unlabeled data: A bayesian approach [ICML]

2004

  • Using model disagreement on unlabeled data to validate classification algorithms [NeurIPS]
