This is the code repository for Python Data Cleaning and Preparation Best Practices, published by Packt.
A practical guide to organizing and handling data from various sources and formats using Python
This book provides up-to-date methods for data cleaning and preparation using Python. You’ll learn effective techniques for tackling common data challenges and explore how to create high-quality data products from well-formatted, clean data.
What you will learn:
- Ingest data from different sources and write it to the required sinks
- Profile and validate data pipelines for better quality control
- Get up to speed with grouping, merging, and joining structured data
- Handle missing values and outliers in structured datasets
- Implement techniques to manipulate and transform time series data
- Apply structure to text, image, voice, and other unstructured data
If you feel this book is for you, get your copy today!
All of the code is organized into chapter folders (for example, Chapter01).
The code will look like the following:
```python
def process_in_batches(data, batch_size):
    """Yield successive slices of `data`, each containing at most `batch_size` items."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]
```
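The generator above works on any sequence that supports `len()` and slicing, such as a list or a string. The sketch below repeats it so the block runs on its own and shows how it might be used; the sample data and batch sizes are illustrative assumptions. The second helper, `batched_iter`, is a hypothetical variant (not taken from the book) for iterables that cannot be sliced, such as generators or file objects. It uses the walrus operator, so it assumes Python 3.8 or later.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def process_in_batches(data: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield successive slices of `data`, each containing at most `batch_size` items."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]


def batched_iter(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Hypothetical variant: batch any iterable, without needing len() or slicing."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # Each round, islice pulls up to batch_size items from the shared iterator;
    # the loop stops when an empty batch signals the iterator is exhausted.
    while batch := list(islice(iterator, batch_size)):
        yield batch


records = list(range(1, 11))  # illustrative data: ten records
for batch in process_in_batches(records, 4):
    print(batch)
# [1, 2, 3, 4]
# [5, 6, 7, 8]
# [9, 10]

# The variant also accepts a generator, which has no len().
lazy_records = (n * n for n in range(5))
print(list(batched_iter(lazy_records, 2)))
# [[0, 1], [4, 9], [16]]
```

The slicing version is simple and returns batches of the same type as the input (list slices for a list, substrings for a string), while the iterator version trades that for the ability to stream data that does not fit in memory. On Python 3.12 and later, the standard library's `itertools.batched` offers similar streaming behavior, yielding tuples instead of lists.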
Who this book is for: whether you're a data analyst, data engineer, data scientist, or any other data professional responsible for data preparation and cleaning, this book is for you. A working knowledge of Python programming is needed to get the most out of this book.
With the following software and hardware list, you can run all the code files in the book (Chapters 1-13).
Chapter | Software required | OS required |
---|---|---|
1-13 | Python 3 | Windows, macOS, or Linux |
1-13 | Visual Studio Code (or your preferred IDE) | Windows, macOS, or Linux |
Maria Zervou is a Generative AI and machine learning expert, dedicated to making advanced technologies accessible. With over a decade of experience, she has led impactful AI projects across industries and mentored teams on cutting-edge advancements. As a machine learning specialist at Databricks, Maria drives innovative AI solutions and industry adoption. Beyond her role, she democratizes knowledge through her YouTube channel, featuring experts on AI topics. A recognized thought leader and finalist in the Women in Tech Excellence Awards, Maria advocates for responsible AI use and contributes to open source projects, fostering collaboration and empowering future AI leaders.