This repo hosts my playground for Python data analysis libraries.
I'll be following Wes McKinney's Python for Data Analysis, 3rd Edition, which deals with data wrangling and analysis using standard Python ML libraries. The book deals with tabular data in different forms, multidimensional arrays, SQL tables, and time series.
Libraries used:
- NumPy
- pandas
- matplotlib
- IPython and Jupyter
- SciPy
- scikit-learn
- statsmodels
Environment setup with conda:

    conda config --add channels conda-forge
    conda config --set channel_priority strict
    conda create -y -n pydata-book python=3.10
    conda activate pydata-book

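Once the `pydata-book` environment is active, it can help to confirm which of the libraries listed above are actually installed. The following is a minimal sketch, not part of the book's materials: the helper name `installed_versions` is hypothetical, and the distribution names in `LIBRARIES` are assumptions based on how these packages are commonly published (some conda builds, such as the `jupyter` metapackage, may not expose metadata and would show up as missing).

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names for the libraries listed in this README.
LIBRARIES = [
    "numpy",
    "pandas",
    "matplotlib",
    "ipython",
    "jupyter",
    "scipy",
    "scikit-learn",
    "statsmodels",
]


def installed_versions(names):
    """Map each distribution name to its version string, or None if not found."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # No installed distribution metadata under this name.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(LIBRARIES).items():
        print(f"{name:<14} {ver or 'not installed'}")
```

The check reads package metadata instead of importing each library, so it stays fast and does not fail when something is missing.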
Materials and IPython notebooks for "Python for Data Analysis, 3rd Edition" by Wes McKinney, published by O'Reilly Media. Book content, including updates and errata fixes, can be found for free on the author's website.
If you are reading the 2nd Edition (published in 2017), please find the reorganized book materials on the 2nd-edition branch.
If you are reading the 1st Edition (published in 2012), please find the reorganized book materials on the 1st-edition branch.
- Chapter 2: Python Language Basics, IPython, and Jupyter Notebooks
- Chapter 3: Built-in Data Structures, Functions, and Files
- Chapter 4: NumPy Basics: Arrays and Vectorized Computation
- Chapter 5: Getting Started with pandas
- Chapter 6: Data Loading, Storage, and File Formats
- Chapter 7: Data Cleaning and Preparation
- Chapter 8: Data Wrangling: Join, Combine, and Reshape
- Chapter 9: Plotting and Visualization
- Chapter 10: Data Aggregation and Group Operations
- Chapter 11: Time Series
- Chapter 12: Introduction to Modeling Libraries in Python
- Chapter 13: Data Analysis Examples
- Appendix A: Advanced NumPy
The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.