- Install Docker.
- Clone this repo.
- `cd` into this repo.
- Run `docker-compose build`.
- To access the notebooks, run `docker-compose up` and go to the URL in the terminal output.

Docker Compose is automatically configured to launch JupyterLab. If you'd like to disable it, remove the following two lines from `docker-compose.yml`:

    environment:
      - JUPYTER_ENABLE_LAB=1

and repeat steps 4 and 5 (`docker-compose build` and `docker-compose up`).

- Install virtualenv using `pip3` (this workshop is specifically made for Python 3).
- Clone this repo and `cd` into it.
- Create a virtual environment with `python -m venv venvname`. Feel free to replace `venvname` with whatever you'd like to name the environment.
- Type `source venvname/bin/activate` to activate the environment.
- Run `pip install -r requirements.txt`.
- Run `jupyter notebook` to launch Jupyter Notebook, or `jupyter lab` to launch JupyterLab.
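
If you want to confirm that the virtual environment is active before launching Jupyter, a small check like the one below may help. It is only a sketch: the package names in `ASSUMED_PACKAGES` are hypothetical, since the contents of `requirements.txt` are not shown here, and the helper names are made up for illustration.

```python
import importlib.util
import sys

# Hypothetical package list: replace these names with the ones
# that requirements.txt actually lists.
ASSUMED_PACKAGES = ["notebook", "jupyterlab"]


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python 3:", sys.version_info.major == 3)
    print("Inside a virtual environment:", in_virtualenv())
    print("Missing packages:", missing_packages(ASSUMED_PACKAGES))
```

Run it with the environment activated. An empty "Missing packages" list suggests that `pip install -r requirements.txt` installed the packages you listed.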
Each bullet will include hands-on exercises.
Section 1: First steps of preprocessing
- Setup and introduction to preprocessing
- Dealing with missing data
- Exploring data types
- Class distribution and imbalance
Section 2: Standardizing data for machine learning
- What is standardization, and when should you standardize?
- Log normalization
- Scaling for feature comparison
- Standardization and modeling
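
As a rough preview of the topics in this section, the sketch below applies log normalization and then standardization to a single skewed feature. It uses only the Python standard library and made-up numbers; the function names are illustrative and it is not the workshop's own notebook code.

```python
import math
import statistics


def log_normalize(values):
    """Apply the natural log to each value; every value must be positive."""
    return [math.log(v) for v in values]


def standardize(values):
    """Rescale values to mean 0 and population standard deviation 1.

    Assumes the values are not all identical (otherwise the standard
    deviation is 0 and the division fails).
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Made-up, heavily skewed feature: one value dwarfs the rest.
incomes = [20_000.0, 25_000.0, 30_000.0, 1_000_000.0]

logged = log_normalize(incomes)
scaled = standardize(logged)

print("variance before log:", statistics.pvariance(incomes))
print("variance after log: ", statistics.pvariance(logged))
print("standardized:", [round(v, 3) for v in scaled])
```

The log transform shrinks the variance of the skewed column, and standardizing afterwards puts it on a scale comparable with other features.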
Section 1: Extracting information from features
- What is feature engineering?
- Extracting features using regular expressions
- Encoding variables
- Aggregate statistics
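
To give a flavor of two of the topics above, here is a minimal sketch of extracting a number from text with a regular expression and one-hot encoding a categorical column. The data and helper names are invented for illustration, and the code uses only the standard library rather than whatever the workshop notebooks use.

```python
import re

# Made-up records; the column meanings are assumptions for illustration.
descriptions = ["Trail length: 5.5 miles", "Trail length: 12 miles", "No length listed"]
colors = ["red", "blue", "red"]


def extract_number(text):
    """Return the first integer or decimal found in text as a float, or None."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def one_hot(values):
    """Encode categories as 0/1 indicator rows, one column per sorted category."""
    categories = sorted(set(values))
    rows = [[int(v == c) for c in categories] for v in values]
    return categories, rows


lengths = [extract_number(d) for d in descriptions]
categories, encoded = one_hot(colors)

print(lengths)
print(categories)
print(encoded)
```

Rows with no match yield `None`, which ties back to the missing-data handling covered earlier in the outline.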
Section 2: Feature selection
- What is feature selection, and when should you manually remove features?
- Removing correlated features
- Using dimensionality reduction for feature selection
- Using PCA to train a dataset
Section 3: UFO dataset (if we have time!)
- Apply various preprocessing techniques to a dataset of UFO sightings and discuss as a group.