Appsilon / datascience-python

Introduction to Data Science in Python by Appsilon

Introduction

Welcome to the course Introduction to Data Science in Python by Appsilon!

Target audience

This course aims to introduce people who already know how to code in Python to the world of Data Science. In particular, I show tips and tricks useful for STEM and economics students. One of the secondary goals is to show students how to use free, industry-standard tools instead of MATLAB, Statistica, SAS and similar software.

Covered topics

  1. The course starts by introducing what a Data Scientist does at work and why this job is so important in the 21st century. Then we move on to the technical part of the course.
  2. numpy - numbers and vectors, the foundation of numerical computing in Python
  3. pandas - data frames: SQL-like operations on in-memory tabular data, the foundation of data processing in Python
  4. matplotlib and plotly - plots, basics of data visualization
  5. scikit-learn - introduction to machine learning, with examples from the go-to machine learning library in Python
  6. streamlit, quarto, fastapi - simple, useful and creative ways to share your work in Python and to generate beautiful reports
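
To give a flavour of the first two technical topics, here is a minimal sketch (not taken from the course materials) contrasting numpy's vectorized arithmetic with a pandas group-by that mirrors a SQL GROUP BY. The sales numbers and column names are made-up toy data chosen for illustration only.

```python
import numpy as np
import pandas as pd

# numpy: arithmetic on whole arrays at once instead of a Python loop.
prices = np.array([10.0, 12.5, 9.0, 11.0])
quantities = np.array([3, 1, 4, 2])
revenue = prices * quantities        # element-wise product
total_revenue = revenue.sum()

# pandas: an in-memory table with a SQL-like aggregation, equivalent to
#   SELECT region, SUM(revenue) FROM sales GROUP BY region
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "price": prices,
    "quantity": quantities,
})
sales["revenue"] = sales["price"] * sales["quantity"]
by_region = sales.groupby("region", as_index=False)["revenue"].sum()

print(total_revenue)
print(by_region)
```

Here `total_revenue` is 100.5, and `by_region` holds 66.0 for "north" and 34.5 for "south".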

Apart from those libraries, I present and benchmark the polars library, a high-performance replacement for pandas when you work with datasets of 0.5 GB to 5 GB and pandas becomes too slow.

Course materials

All course materials are located either in this repository or on Google Drive. Code and small datasets are in the repository, while large datasets are on Google Drive.

I suggest using the HTML files generated from the .qmd and .ipynb sources with Quarto.

A guide to setting up the environment is included in the introduction presentation.

tl;dr: you can try:

conda create -n ds-course python=3.10
conda activate ds-course
pip install -r requirements.txt
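
After installing, a quick sanity check can confirm the main libraries are present. The script below is a hypothetical helper, not part of the repository (you could save it as, e.g., check_env.py); it uses only the standard library's `importlib.metadata`. The package list is an assumption based on the topics above; requirements.txt remains the authoritative list.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names for the libraries covered in the course;
# requirements.txt is the authoritative list.
COURSE_PACKAGES = [
    "numpy", "pandas", "matplotlib", "plotly", "scikit-learn",
    "polars", "streamlit", "fastapi",
]


def check_environment(packages=COURSE_PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in check_environment().items():
        print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```

Run it inside the activated ds-course environment; any line reading NOT INSTALLED points to a package that still needs to be installed.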

Homework

Each lecture also has a homework assignment. For every homework, a solution is provided in a separate directory. Note that the solutions are not necessarily the best possible, but they may present an interesting approach. Very often there are multiple ways to approach the same problem.

License

The course has been prepared by Piotr Pasza Storożenko from Appsilon. It is available under the CC BY 4.0 license. Feel free to use these materials; you just have to attribute the original author.

Some exercises were inspired by exercises the author had to solve while studying.
