Introduction to Data Science in Python by Appsilon
Introduction
Welcome to the course Introduction to Data Science in Python by Appsilon!
Target audience
This course introduces people who already know how to code in Python to the world of Data Science. In particular, I show tips and tricks useful for STEM and economics students. A secondary goal is to show students how to use free tools that are also industry standards, instead of Matlab, Statistica, SAS and so on.
Covered topics
- The course starts by introducing what a Data Scientist does at work and why this job is so important in the 21st century. Then we start the technical part of the course.
- numpy - numbers and vectors, the fundamentals of all calculations in Python
- pandas - data frames: SQL-like, in-memory data, the fundamentals of data processing in Python
- matplotlib and plotly - plots, the basics of data visualization
- scikit-learn - an introduction to machine learning, with examples from the go-to library in Python
- streamlit, quarto, fastapi - simple, useful and creative ways to share your work in Python and to generate beautiful reports
Apart from those libraries, I present and benchmark the polars library, a high-performance replacement for pandas when you work with datasets of roughly 0.5 GB to 5 GB and pandas starts to be too slow.
Course materials
All course materials are located either in this repository or on Google Drive. Code and small datasets are in the repository, while large datasets are on Google Drive.
I suggest using the html files generated from the qmd and ipynb files with quarto.
A guide to setting up an environment is included in the introduction presentation.
tl;dr: you can try
conda create -n ds-course python=3.10
conda activate ds-course
pip install -r requirements.txt
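After running the commands above, a quick sanity check can confirm that the environment looks as expected. The following is only a minimal sketch: the module list is taken from the libraries named in this README (with scikit-learn under its import name sklearn), and it is an assumption that requirements.txt installs exactly these. quarto is left out because it is typically used as a separate command-line tool. The helper names missing_modules and python_version_ok are hypothetical and not part of the course code.

```python
import sys
from importlib.util import find_spec

# Import names of the libraries mentioned in this README.
# Assumption: requirements.txt installs all of them.
COURSE_MODULES = [
    "numpy",
    "pandas",
    "matplotlib",
    "plotly",
    "sklearn",  # import name of scikit-learn
    "streamlit",
    "fastapi",
    "polars",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


def python_version_ok(expected=(3, 10)):
    """Check that the interpreter matches the version requested in `conda create`."""
    return sys.version_info[:2] == expected


if __name__ == "__main__":
    print("Python 3.10:", python_version_ok())
    missing = missing_modules(COURSE_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All course modules found.")
```

If the script reports missing modules, re-running pip install -r requirements.txt inside the activated ds-course environment is the first thing to try.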
Homeworks
Each lecture also has a homework assignment. For every homework, a solution is provided in a separate directory. Note that the solutions are not necessarily the best possible, but they may present an interesting approach. Very often there are multiple ways to approach the same problem.
License
The course has been prepared by Piotr Pasza Storożenko from Appsilon. It is available under the CC BY 4.0 license. Feel free to use these materials for your own purposes; you just have to attribute the original author.
Some exercises were inspired by exercises the author had to solve while studying.