PacktPublishing / Data-Centric-Machine-Learning-with-Python

Data-Centric Machine Learning with Python, published by Packt

Data-Centric Machine Learning with Python

This is the code repository for Data-Centric Machine Learning with Python, published by Packt.

The ultimate guide to engineering and deploying high-quality models based on good data

What is this book about?

In the rapidly advancing data-driven world where data quality is pivotal to the success of machine learning and artificial intelligence projects, this critically timed guide provides a rare, end-to-end overview of data-centric machine learning (DCML), along with hands-on applications of technical and non-technical approaches to generating deeper and more accurate datasets.

This book covers the following exciting features:

  • Understand the impact of input data quality compared to model selection and tuning
  • Recognize the crucial role of subject-matter experts in effective model development
  • Implement data cleaning, labeling, and augmentation best practices
  • Explore common synthetic data generation techniques and their applications
  • Apply synthetic data generation techniques using common Python packages
  • Detect and mitigate bias in a dataset using best-practice techniques
  • Understand the importance of reliability, responsibility, and ethical considerations in ML/AI

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigation

All of the code is organized into folders, for example, Chapter05.

The code will look like the following:

import os
import pandas as pd

# Local path where the dataset is stored
FILENAME = "./loan_dataset.csv"
# UCI "default of credit card clients" spreadsheet
DATA_URL = "http://archive.ics.uci.edu/ml/machine-learning-databases/00350/default%20of%20credit%20card%20clients.xls"
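
The constants above suggest a download-once, then read-locally pattern. The following is a minimal sketch of how they might be used, not code taken from the book's notebooks: `load_dataset` and its `reader` parameter are hypothetical names introduced here for illustration. It assumes the remote file is an Excel spreadsheet that pandas can read (for .xls files, `pd.read_excel` needs the optional `xlrd` package) and that caching it as CSV is acceptable.

```python
import os
import tempfile

import pandas as pd


def load_dataset(filename, url, reader=pd.read_excel):
    """Return the dataset, downloading and caching it as CSV on first use.

    Hypothetical helper for illustration; `reader` can be swapped out,
    which makes the function easy to exercise without network access.
    """
    if os.path.exists(filename):
        # A cached copy exists: read it locally and skip the download.
        return pd.read_csv(filename)
    # First run: fetch the remote spreadsheet and cache it as CSV.
    df = reader(url)
    df.to_csv(filename, index=False)
    return df


# Offline demonstration with a stand-in reader and made-up rows,
# so nothing is downloaded here.
def fake_reader(url):
    return pd.DataFrame({"a": [1, 2], "b": [10, 20]})


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "loan_dataset.csv")
    first = load_dataset(path, "placeholder-url", reader=fake_reader)   # "downloads"
    second = load_dataset(path, "placeholder-url", reader=fake_reader)  # reads the cache
```

With the constants defined above, the real call would be `load_dataset(FILENAME, DATA_URL)`. Checking for the local file first means the download happens only once, at the cost of never picking up changes to the remote file.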

This book is for data science professionals and machine learning enthusiasts who want to understand the concept of data-centricity, its benefits over a model-centric approach, and the practical application of a best-practice data-centric approach in their work. It is also for other data professionals and senior leaders who want to explore the tools and techniques to improve data quality and create opportunities for small data ML/AI in their organizations.

With the following software and hardware, you can run all the code files in the book (Chapters 5-9).

Software and Hardware List

Chapter | Software required | OS required
5-9     | Python 3          | Windows, macOS, or Linux

Get to Know the Authors

Jonas Christensen has spent his career leading data science functions across multiple industries. He is an international keynote speaker, postgraduate educator, and advisor in the fields of data science, analytics leadership, and machine learning, and the host of the Leaders of Analytics podcast.

Nakul Bajaj is a data scientist, MLOps engineer, educator, and mentor, helping students and junior engineers navigate their data journey. He has a strong passion for MLOps, with a focus on reducing complexity and delivering value from machine learning use cases in business and healthcare.

Manmohan Gosada is a seasoned professional with a proven track record in the dynamic field of data science. With a comprehensive background spanning various data science functions and industries, Manmohan has emerged as a leader in driving innovation and delivering impactful solutions. He has successfully led large-scale data science projects, leveraging cutting-edge technologies to implement transformative products. With a postgraduate degree, he is not only well-versed in the theoretical foundations of data science but is also passionate about sharing insights and knowledge. A captivating speaker, he engages audiences with a blend of expertise and enthusiasm, demystifying complex concepts in the world of data science.

About

License: MIT License

Languages: Jupyter Notebook 100.0%