rihemebh / Data-mining-Labs

Data-mining-Labs

Table of Contents
  1. About The Repository
  2. Getting Started
  3. Content

About The Repository

This repository contains four data mining labs. Each lab introduces new concepts on the subject.

Getting Started

To run these labs and extend them, follow these steps:

Prerequisites

  • Make sure that Weka is installed on your operating system.
  • Make sure that Python is installed on your operating system (download it from https://www.python.org/downloads/). Jupyter is an optional extra.

Installation

Clone the repo

git clone https://github.com/rihemebh/Data-mining-Labs.git

Content

Lab1: Weka Interface

An introduction to Weka (Waikato Environment for Knowledge Analysis). You will be able to:

  1. Explore some datasets (including the famous Iris dataset).
  2. Create classifiers (decision tree).
  3. Visualize and interpret data.
  4. Use feature filters.

Lab2: Weka Experimenter

An introduction to the Weka Experimenter interface. You will be able to:

  1. Generate CSV files containing experiment details.
  2. Interpret the different test results.
  3. Compare different algorithms using the Weka analyzer.

Lab3: Supervised Classification

An introduction to the scikit-learn Python library. You will be able to:

  1. Read and manipulate datasets (Iris dataset).
  2. Create and use a classifier with different algorithms (Naïve Bayes, decision trees).
  3. Evaluate classifier performance (calculating errors).
  4. Use cross-validation to evaluate the classifier.
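
The steps above can be sketched in a few lines of scikit-learn. This is a minimal illustration and not the lab's own notebook code: the 70/30 split, the 5 folds, the `random_state` values, and the `results` dictionary are assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Load the Iris dataset as a feature matrix X and a label vector y.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples for testing (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # Train on the training split and measure the error on the held-out split.
    model.fit(X_train, y_train)
    test_error = 1.0 - model.score(X_test, y_test)

    # 5-fold cross-validation on the full dataset (fold count is an assumption).
    cv_accuracy = cross_val_score(model, X, y, cv=5).mean()

    results[name] = (test_error, cv_accuracy)
    print(f"{name}: test error = {test_error:.3f}, CV accuracy = {cv_accuracy:.3f}")
```

The held-out error depends on a single split, while the cross-validated accuracy averages over several splits, which is why the two numbers can differ slightly.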

Lab4: Clustering

An introduction to the concept of unsupervised learning. You will be able to:

  1. Read and manipulate datasets.
  2. Use k-means for clustering.
  3. Learn how the silhouette coefficient is used.
  4. Use Agglomerative Hierarchical Clustering (CAH) and generate the dendrogram.
  5. Use principal component analysis (PCA).
  6. Implement the DIvisive ANAlysis (DIANA) algorithm.
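
A rough sketch of the k-means, silhouette, hierarchical clustering, and PCA steps is shown below, using scikit-learn and SciPy. The Iris dataset, the range of cluster counts, Ward linkage, and the variable names are assumptions made for the example. The lab may use different data and settings, and DIANA is not covered here.

```python
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Use only the features; clustering ignores the class labels.
X, _ = load_iris(return_X_y=True)

# Run k-means for several cluster counts and score each partition
# with the mean silhouette coefficient (higher is better).
silhouettes = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    silhouettes[k] = silhouette_score(X, labels)
best_k = max(silhouettes, key=silhouettes.get)
print("silhouette by k:", {k: round(s, 3) for k, s in silhouettes.items()})
print("best k by silhouette:", best_k)

# Agglomerative hierarchical clustering with Ward linkage.
# no_plot=True computes the dendrogram layout without needing matplotlib.
Z = linkage(X, method="ward")
tree = dendrogram(Z, no_plot=True)

# Project the data onto its first two principal components,
# for example to visualize the clusters in 2D.
X_2d = PCA(n_components=2).fit_transform(X)
print("projected shape:", X_2d.shape)
```

To draw the dendrogram instead of only computing it, call `dendrogram(Z)` with matplotlib installed and then display the figure.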
