# Data Analysis in Software Engineering (DASE)

Javier Dolado and Daniel Rodriguez

(DRAFT - Not ready yet `r Sys.Date()`)


This course covers several aspects of data analysis in Software Engineering (SE) and has been created by [Javier Dolado](www.sc.ehu.eus/jiwdocoj/) at the [University of the Basque Country](www.ehu.eus) and [Daniel Rodriguez](http://www.cc.uah.es/drg) at the [University of Alcala](http://www.uah.es/).

It is mainly based on [R](http://cran.r-project.org) and [RStudio](http://www.rstudio.com/), but we will also show some examples in [Weka](http://www.cs.waikato.ac.nz/ml/weka/) and other packages. [RMarkdown](http://rmarkdown.rstudio.com/) can be compiled to HTML (and other formats) with RStudio's Knit.
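
For readers working outside the RStudio GUI, a section can probably be compiled from the R console as well. The following is a minimal sketch, assuming the `rmarkdown` package and pandoc are installed; the helper name `render_section` is hypothetical and not part of the course material.

```r
# Hypothetical helper: render one .Rmd file to HTML from the console.
# Assumes the 'rmarkdown' package and pandoc are available.
render_section <- function(rmd_path, out_dir = tempdir()) {
  stopifnot(file.exists(rmd_path))
  rmarkdown::render(
    input = rmd_path,
    output_format = "html_document",  # other formats are possible, e.g. "pdf_document"
    output_dir = out_dir,
    quiet = TRUE
  )
}

# Example call (path taken from the outline below):
# render_section("sections/intro.Rmd")
```

The function returns the path of the generated file, so the result can be opened with `browseURL()` if desired.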

It is structured as follows:

1 [Introduction to data analysis and model building](./sections/intro.Rmd)

2 [Data Sources](./sections/dataSources.Rmd)

   - Sources of information
   - Public repositories in software engineering
   
3 [Preprocessing Techniques](./sections/basicPreprocessing.Rmd)

   + Data types
   + Data Cleaning (duplicates, imbalance, noise)
   + Data Discretisation
   + Data Normalisation
   
4 [Exploratory Data Analysis](./sections/exploratoryDataAnalysis.Rmd)

   + Visualisation
   
5 [Descriptive Statistics](./sections/descriptiveStatistics.Rmd)

6 [Basic Model Building](./sections/basicModelBuilding.Rmd) (Machine Learning Techniques)

   + Supervised
      + Regression and classification
      + Rules and Decision Trees
      + Nearest Neighbours (Lazy approaches)
      + Neural Networks
      + Probabilistic Classifiers
      
   + Unsupervised
      + Clustering
      + Association rules
      
   + Other approaches
      + Weak Classification, Semi-supervised learning
   
7 [Evaluation](./sections/evaluation.Rmd)

   - Descriptive statistics
   - Evaluation measures in machine learning
   - Graphical evaluation techniques (ROC and other visual evaluation techniques)
   - [Evaluation in Software Engineering](./sections/evaluationInSoftEng.Rmd)
   
8 [Advanced Model Building](./sections/advancedModelBuilding.Rmd) (Advanced algorithms)

  - Metalearners
  - Hybrid approaches
    
9 [Advanced Preprocessing Techniques](./sections/advancedPreprocessingTechniques.Rmd)

   + Noise
   + Feature Selection and Instance Selection
   + Imbalance
   + Missing values (Imputation methods)

10 [Classical Hypothesis Testing](./sections/classicalHypothesisTesting)

   + p-values
   + Equivalence Hypothesis Testing
   
11 [Time Series](./sections/timeSeries.Rmd)

12 [Social Network Analysis](./sections/SNAinSE.Rmd)

13 [Dealing with Large Volumes of Data](./sections/bigData.Rmd)

   + Apache Spark Introduction

Appendix A - [Introduction to R](./sections/rIntro.Rmd)

***

# Acknowledgements 

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 324356.

***

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
