PhD study

This repository contains a set of toolboxes and code examples developed during my PhD study at Aalborg University. Its main purpose is to support my PhD thesis, but it also contains some tutorials developed for teaching on related topics. The links below display the tutorials via nbviewer to ensure proper rendering of the formulas.

Bayesian networks

The following toolboxes and tutorials are implemented in R:

Toolboxes

The toolboxes are used in, among others, Glavind and Faber (2018) and Glavind and Faber (2020).

  • Structure learning

  • Parameter learning

  • Inference

Linear regression

The following tutorials are implemented in Python:

  • Linear regression. This tutorial introduces linear regression, first from a maximum likelihood estimation (MLE) perspective and then from a Bayesian perspective. In both cases, the tutorial implements a selection of different learning algorithms; a minimal, illustrative sketch follows this list.

  • Linear regression - assumptions and interpretations. This notebook considers and assesses the underlying assumptions of linear regression in detail and discusses the interpretation of these models.

  • Bayesian linear regression with Stan. This tutorial shows how to implement Bayesian linear regression models using the probabilistic programming language Stan.

  • EM for Bayesian linear regression. This tutorial considers how the expectation-maximization (EM) algorithm can be used to learn the parameters of a Bayesian linear regression model.
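
The tutorials themselves live in the linked notebooks; as a minimal, illustrative sketch of the two perspectives for a linear-Gaussian model (the synthetic data, prior precision `alpha`, and noise precision `beta` below are assumptions, not the tutorials' choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 0.5 + 1.5 x + Gaussian noise
N = 50
X = np.column_stack([np.ones(N), rng.uniform(-1, 1, N)])  # design matrix with bias column
w_true = np.array([0.5, 1.5])
y = X @ w_true + rng.normal(0, 0.2, N)

# MLE / ordinary least squares: w = (X^T X)^{-1} X^T y
w_mle = np.linalg.solve(X.T @ X, X.T @ y)

# Bayesian: prior w ~ N(0, alpha^{-1} I) and known noise precision beta give
# posterior N(m, S) with S^{-1} = alpha I + beta X^T X and m = beta S X^T y
alpha, beta = 1.0, 1.0 / 0.2**2  # illustrative hyperparameters
S_inv = alpha * np.eye(2) + beta * X.T @ X
m = beta * np.linalg.solve(S_inv, X.T @ y)

print("MLE weights:   ", w_mle)
print("posterior mean:", m)
```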

Bayesian hierarchical models

The following tutorials are implemented in Python:

  • Bayesian hierarchical models with Stan. This tutorial shows how to implement Bayesian hierarchical regression models using the probabilistic programming language Stan by studying the fatigue data set in Glavind et al. (2020). Moreover, the concept of Bayesian model averaging is introduced as a means of making inferences about new, out-of-sample fatigue-sensitive details. A minimal sketch of a hierarchical Stan model follows.
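
As a hedged sketch of what such a model can look like with cmdstanpy (the normal-means model, the simulated data, and the file name `hier.stan` are illustrative stand-ins for the actual fatigue model of the tutorial):

```python
from pathlib import Path

import numpy as np
from cmdstanpy import CmdStanModel  # requires a working CmdStan installation

# Illustrative hierarchical normal-means model with partial pooling across J groups
stan_code = """
data {
  int<lower=1> N;
  int<lower=1> J;
  array[N] int<lower=1, upper=J> group;
  vector[N] y;
}
parameters {
  real mu;                  // population-level mean
  real<lower=0> tau;        // between-group standard deviation
  vector[J] theta;          // group-level means
  real<lower=0> sigma;      // observation noise
}
model {
  mu ~ normal(0, 10);
  tau ~ normal(0, 5);
  theta ~ normal(mu, tau);  // partial pooling of the group means
  sigma ~ normal(0, 5);
  y ~ normal(theta[group], sigma);
}
"""
Path("hier.stan").write_text(stan_code)

# Simulated grouped data standing in for the fatigue measurements
rng = np.random.default_rng(1)
J, n_per = 5, 20
theta_true = rng.normal(0.0, 1.0, J)
group = np.repeat(np.arange(1, J + 1), n_per)
y = theta_true[group - 1] + rng.normal(0.0, 0.5, J * n_per)

model = CmdStanModel(stan_file="hier.stan")
fit = model.sample(data={"N": len(y), "J": J, "group": group, "y": y}, chains=4)
print(fit.summary().loc[["mu", "tau"]])
```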

Gaussian processes

The following tutorials are implemented in Python:

Neural networks

The following tutorials are implemented in Python:

  • Neural network regression using Keras and TensorFlow. This tutorial introduces neural network regression with Keras and TensorFlow by considering the Boston housing data set, first in a single-output setting and then in a multi-output setting. Finally, the tutorial considers hyperparameter tuning for general models using random-search cross-validation; a minimal regression sketch follows this list.

  • Neural network classification using Keras and TensorFlow. This tutorial introduces neural network classification with Keras and TensorFlow by considering the Wine recognition data set. The tutorial first studies how a neural network is implemented for classification tasks and then considers how to tune hyperparameters for general models using random-search cross-validation.
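
As a hedged sketch of the single-output regression case (the network size, epochs, and other settings below are illustrative, not the tutorial's choices):

```python
from tensorflow import keras

# Boston housing data as shipped with Keras (13 features, prices in $1000s)
(x_tr, y_tr), (x_te, y_te) = keras.datasets.boston_housing.load_data()

# Standardize features using training statistics only
mean, std = x_tr.mean(axis=0), x_tr.std(axis=0)
x_tr, x_te = (x_tr - mean) / std, (x_te - mean) / std

# Small fully connected network with a linear output unit for regression
model = keras.Sequential([
    keras.Input(shape=(x_tr.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(x_tr, y_tr, epochs=100, batch_size=16, validation_split=0.2, verbose=0)

loss, mae = model.evaluate(x_te, y_te, verbose=0)
print(f"test MAE: {mae:.2f} (in $1000s)")
```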

Tree-based learners

The following tutorials are implemented in Python:

  • Gradient boosting regression using XGBoost. This tutorial introduces gradient boosting regression with XGBoost by considering the Boston housing data set. The tutorial first studies how gradient boosting is implemented in a single-output setting, as well as the effect of different data pre-processing steps. Then, it is shown how gradient boosting may be extended to a multi-output setting. Finally, the tutorial considers hyperparameter tuning for general models using Bayesian optimization with a Gaussian process prior, based on GPyOpt. A minimal regression sketch follows this list.

  • Gradient boosting classification using XGBoost. This tutorial introduces gradient boosting classification with XGBoost by considering the Wine recognition data set. The tutorial first studies how gradient boosting is implemented, as well as the effect of different data pre-processing steps. Then, the tutorial considers hyperparameter tuning for general models using Bayesian optimization with a Gaussian process prior, based on GPyOpt. Finally, the tutorial elaborates on the feature importance functionalities of XGBoost.
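
As a hedged regression sketch (the California housing data is used as an easily available stand-in, since `load_boston` has been removed from recent scikit-learn releases, and the hyperparameters are illustrative starting points):

```python
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X, y = fetch_california_housing(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Gradient-boosted regression trees via the scikit-learn-style XGBoost API
model = XGBRegressor(n_estimators=300, learning_rate=0.1, max_depth=4, subsample=0.8)
model.fit(X_tr, y_tr)

pred = model.predict(X_te)
print("test RMSE:", mean_squared_error(y_te, pred) ** 0.5)
print("feature importances:", model.feature_importances_.round(3))
```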

Gaussian mixture models

The following tutorials are implemented in Python:

  • EM for Gaussian mixtures. This tutorial considers how Gaussian mixture models may be used for cluster analysis; it implements the expectation-maximization (EM) learning algorithm and introduces the evidence lower bound, as well as the Bayesian information criterion (BIC) and the integrated complete-data likelihood (ICL), for model selection. A minimal EM sketch follows.
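
For a flavor of the algorithm, here is a minimal EM loop for a two-component 1-D mixture; the data, initialization, and iteration count are illustrative, and the tutorial's implementation is more complete:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Synthetic data from a two-component 1-D Gaussian mixture
x = np.concatenate([rng.normal(-2.0, 0.8, 150), rng.normal(1.5, 0.5, 100)])
N, K = len(x), 2

# Initialize mixing weights, means, and variances
pi = np.full(K, 1.0 / K)
mu = rng.choice(x, K, replace=False)
var = np.full(K, x.var())

for _ in range(100):
    # E-step: responsibilities r[n, k] proportional to pi_k * N(x_n | mu_k, var_k)
    dens = pi * norm.pdf(x[:, None], mu, np.sqrt(var))  # shape (N, K)
    r = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate the parameters from the weighted sufficient statistics
    Nk = r.sum(axis=0)
    pi = Nk / N
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print("weights:", pi.round(2), "means:", mu.round(2), "variances:", var.round(2))
```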

Algorithms for optimization

The following tutorials are implemented in Python:

  • Deterministic algorithms for unconstrained, continuous-valued optimization. This tutorial considers a set of local, derivative-based algorithms for unconstrained, continuous-valued optimization. The algorithms covered are first-order methods, i.e., gradient descent and its variations (e.g., conjugate gradient descent and Adam), and second-order methods, i.e., Newton's method and quasi-Newton methods (DFP and BFGS); a minimal sketch follows this list.

  • Stochastic algorithms for unconstrained, continuous-valued optimization. This tutorial considers a set of stochastic optimization algorithms, including population methods, for unconstrained, continuous-valued optimization. The algorithms covered are stochastic gradient descent, stochastic hill-climbing, simulated annealing, genetic algorithms, and particle swarm optimization.
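
As a minimal sketch contrasting a first-order method with a quasi-Newton method on the Rosenbrock function (the step size and iteration count are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

# Rosenbrock function, minimized at (1, 1), and its analytic gradient
def f(p):
    x, y = p
    return (1 - x) ** 2 + 100 * (y - x**2) ** 2

def grad(p):
    x, y = p
    return np.array([-2 * (1 - x) - 400 * x * (y - x**2), 200 * (y - x**2)])

# Plain gradient descent with a fixed step size; it converges slowly on this
# ill-conditioned problem, which is what motivates the more refined methods
p = np.array([-1.0, 1.0])
for _ in range(20000):
    p = p - 1e-3 * grad(p)
print("gradient descent:", p)

# Quasi-Newton (BFGS) via SciPy for comparison
res = minimize(f, np.array([-1.0, 1.0]), jac=grad, method="BFGS")
print("BFGS:            ", res.x)
```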

Sensitivity analysis and feature selection

The following tutorials are implemented in Python:

  • Variance-based sensitivity analysis for independent inputs. This tutorial implements a set of methods that are applicable when the inputs are independent. First, a surrogate-based method is considered that decomposes the variance based on linear regression considerations. Second, two simulation-based methods are introduced; the first performs conditional sampling by binning the input space, and the second performs efficient conditional sampling. A minimal sketch of the binning estimator follows this list.

  • Variance-based sensitivity analysis for correlated inputs. This tutorial implements a set of methods that are applicable when the inputs are correlated. First, two surrogate-based methods are considered; the first decomposes the variance based on (linear) regression considerations, and the second decomposes the variance based on a polynomial chaos expansion. Second, two simulation-based methods are introduced; the first performs conditional sampling by binning the input space, and the second performs conditional sampling for randomly sampled input realizations.
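
As a minimal sketch of the binning estimator for independent inputs (the Ishigami test function and the bin count are illustrative choices; its analytical first-order indices are roughly 0.31, 0.44, and 0):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ishigami test function with independent uniform inputs on [-pi, pi]
N = 100_000
X = rng.uniform(-np.pi, np.pi, (N, 3))
Y = np.sin(X[:, 0]) + 7 * np.sin(X[:, 1]) ** 2 + 0.1 * X[:, 2] ** 4 * np.sin(X[:, 0])

def first_order_index(xi, y, n_bins=50):
    """S_i = Var(E[Y | X_i]) / Var(Y), approximating the conditional
    expectation by binning X_i and averaging Y within each bin."""
    edges = np.quantile(xi, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.digitize(xi, edges[1:-1]), 0, n_bins - 1)
    cond_means = np.array([y[bins == b].mean() for b in range(n_bins)])
    weights = np.bincount(bins, minlength=n_bins) / len(y)
    overall = weights @ cond_means
    return weights @ (cond_means - overall) ** 2 / y.var()

for i in range(3):
    print(f"S_{i + 1} ~ {first_order_index(X[:, i], Y):.3f}")
```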

Hyperparameter tuning, model selection and automated machine learning (AutoML)

References


Sebastian T. Glavind and Michael H. Faber, “A framework for offshore load environment modeling”, in Proceedings of the ASME 2018 37th International Conference on Ocean, Offshore and Arctic Engineering (OMAE2018), OMAE2018-77674, 2018.

Sebastian T. Glavind and Michael H. Faber, “A framework for offshore load environment modeling”, Journal of Offshore Mechanics and Arctic Engineering, vol. 142, no. 2, article 021702, OMAE-19-1059, 2020.

Sebastian T. Glavind, Henning Brüske and Michael H. Faber, “On normalized fatigue crack growth modeling”, in Proceedings of the ASME 2020 39th International Conference on Ocean, Offshore and Arctic Engineering (OMAE2020), OMAE2020-18613, 2020.



License: Apache License 2.0
