Machine Learning and Statistics Assessment Repository

Overview
Repository Contents
Requirements
How to run

Overview

This repository contains two Jupyter notebooks and ancillary files demonstrating aspects of data analysis using the Python programming language and associated technologies, in fulfillment of the requirements for the Machine Learning and Statistics module of the HDipSc in Computing in Data Analytics at the Galway-Mayo Institute of Technology (GMIT). One notebook, scikit-learn.ipynb, focuses on the preparation and predictive analysis of data using the Python scikit-learn machine learning library. The second, scipy-stats.ipynb, focuses on performing an ANOVA using scipy.stats, the statistics module of the SciPy library (https://docs.scipy.org/doc/scipy/index.html).

Repository Contents

The repository contains two Jupyter notebooks, scikit-learn.ipynb and scipy-stats.ipynb, which are independent of one another.

The rest of the repository supports those notebooks in some way. It consists of:

- README.md: this file
- requirements.txt: a list of Python packages required to run the notebooks
- .gitignore: a git support file which may be safely ignored
- data/: a directory containing two subdirectories: penguins/, which contains the penguin data analysed in the scipy-stats notebook, and wine/, which contains the data used in the scikit-learn notebook
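To confirm that both data directories are present after obtaining the repository, something like the following could be used. It is only a sketch: the helper name list_data_files is hypothetical, and nothing is assumed about the names of the files inside penguins/ or wine/.

```python
from pathlib import Path


def list_data_files(data_dir="data"):
    """Map each subdirectory of data_dir to the sorted names of the files in it."""
    root = Path(data_dir)
    listing = {}
    # Only direct subdirectories are inspected (e.g. penguins/ and wine/).
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        listing[sub.name] = sorted(f.name for f in sub.iterdir() if f.is_file())
    return listing


if __name__ == "__main__":
    # Only meaningful when run from the repository root, where data/ lives.
    if Path("data").is_dir():
        for name, files in list_data_files().items():
            print(f"{name}/: {files}")
```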

Requirements

Nothing extra is required to view the contents of the repository on GitHub, nbviewer, or Binder. However, see below for a discussion of the limitations of these formats.

To run these notebooks locally, the minimum requirement is Python v3.9+ with pip or some other package manager. Cloning this repository, which is the easiest way to acquire the code, also requires Git v2+.
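The version requirements above can be checked before going any further. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes that git --version prints a line containing a dotted version number such as "git version 2.34.1".

```python
import importlib.util
import re
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 9)   # from the requirements stated above
MIN_GIT_MAJOR = 2     # from the requirements stated above


def parse_git_version(output):
    """Extract the numeric version components from `git --version` output."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"could not parse a git version from {output!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def check_prerequisites():
    """Return a list of human-readable problems; an empty list means all is well."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.9 or later is required")
    if importlib.util.find_spec("pip") is None:
        problems.append("pip is not available to this interpreter")
    git = shutil.which("git")
    if git is None:
        problems.append("git was not found on PATH")
    else:
        result = subprocess.run([git, "--version"], capture_output=True, text=True)
        try:
            if parse_git_version(result.stdout)[0] < MIN_GIT_MAJOR:
                problems.append("Git 2 or later is required")
        except ValueError as error:
            problems.append(str(error))
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

Running the script prints one line per missing prerequisite and nothing at all if everything is in place.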

Assuming Python is installed, the Python packages listed in requirements.txt are also required. These can usually be installed in one go by passing the requirements.txt file to pip or, in principle, to another Python package manager. See below for details.
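Once the packages have been installed, it can be useful to confirm that nothing listed in requirements.txt is missing. The sketch below uses the standard library's importlib.metadata. The helper names are hypothetical, and the parsing is deliberately simple: it assumes one plain requirement per line (a name optionally followed by a version specifier) and would not handle URL or editable requirements.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(text):
    """Return the package names found in the text of a requirements file."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    requirements = Path("requirements.txt")
    if requirements.is_file():
        for name in missing_packages(requirement_names(requirements.read_text())):
            print(f"not installed: {name}")
```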

How to run

There are three ways to consume the notebooks in this repository:

1. View here on GitHub by simply clicking on scikit-learn.ipynb or scipy-stats.ipynb, or on nbviewer by clicking on the appropriate button:
   - For the scikit-learn notebook:
   - For the scipy-stats notebook:

   This is fine if viewing is all that is required, but if interactivity is necessary or desirable then option 2 or 3 should be considered.
2. View and interact with the notebooks on Binder by clicking on the button below:

   This will give access to the entire repository via a JupyterLab session. The code in the notebooks can be changed and executed, or new notebooks can be started to experiment with the data, which is also accessible from the Binder session.
3. Clone the repository and run a Jupyter server locally by following these steps (tested on a Linux system; some details may differ on other operating systems):
   - Ensure that Python v3.9+, pip, and Git v2+ are all installed.
   - Clone the repository by typing git clone git@github.com:fod/machine-learning-statistics.git into a terminal. This is an SSH URL, so it requires an SSH key registered with GitHub.
   - When the download is complete, enter the machine-learning-statistics directory and create a Python virtual environment in a directory called .venv with python -m venv .venv
   - Activate the virtual environment with source .venv/bin/activate
   - Next, install the required packages with pip install -r requirements.txt
   - Finally, start a Jupyter server by typing jupyter-lab. A JupyterLab session should launch in a browser window. If it doesn't, a link that can be pasted into a browser address bar is printed in the terminal.
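The local setup steps above could also be scripted. The following is a minimal sketch rather than a tested installer: setup_steps and run_setup are hypothetical names, the .venv/bin layout assumes Linux or macOS, and jupyter-lab is assumed to be installed into the virtual environment by requirements.txt. Note that pip reads a requirements file through its -r option.

```python
import subprocess
import sys
from pathlib import Path

REPO_URL = "git@github.com:fod/machine-learning-statistics.git"
REPO_NAME = "machine-learning-statistics"


def setup_steps(parent_dir=".", python=sys.executable):
    """Return (command, working directory) pairs mirroring the manual steps."""
    parent = Path(parent_dir).resolve()
    repo = parent / REPO_NAME
    venv_bin = repo / ".venv" / "bin"  # assumption: Linux/macOS venv layout
    return [
        (["git", "clone", REPO_URL], parent),
        ([python, "-m", "venv", ".venv"], repo),
        ([str(venv_bin / "pip"), "install", "-r", "requirements.txt"], repo),
        ([str(venv_bin / "jupyter-lab")], repo),
    ]


def run_setup(parent_dir="."):
    """Run each step in order, stopping at the first failure."""
    for command, cwd in setup_steps(parent_dir):
        subprocess.run(command, cwd=cwd, check=True)
```

Calling run_setup() from the directory that should contain the clone performs all four steps. The sketch calls the virtual environment's pip and jupyter-lab by path instead of activating the environment, because activation only alters the current shell and has no effect on a child process.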
