mickdelaney / companion

Code repository for the Practical Weak Supervision O'Reilly book.

Overview

Getting quality labeled data for supervised learning is an important step towards training performant machine learning models. In many real-world projects, obtaining that labeled data takes up a significant share of the time. Weak supervision is emerging as an important catalyst that enables data science teams to fuse insights from heuristics and crowd-sourcing into weakly labeled datasets, which can then be used as inputs for machine learning and deep learning tasks.
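To make the idea of fusing heuristics concrete, here is a minimal sketch in plain Python. It is not taken from the book and does not use Snorkel's API: the spam/ham task, the three heuristics, and the majority-vote rule are all assumptions chosen for illustration. Snorkel itself combines labeling functions with a learned label model rather than a simple vote.

```python
from collections import Counter

# Hypothetical label values for an illustrative spam/ham task.
ABSTAIN, HAM, SPAM = -1, 0, 1


# Each heuristic either votes for a label or abstains.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    lowered = text.lower()
    return SPAM if "prize" in lowered or "winner" in lowered else ABSTAIN


def lf_short_reply(text):
    return HAM if len(text.split()) <= 3 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_reply]


def weak_label(text):
    """Combine heuristic votes by simple majority; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # heuristics disagree evenly, so no label is assigned
    return counts[0][0]


texts = [
    "You are a winner! Claim your prize at http://example.com",
    "Sounds good, thanks",
    "Can we move the meeting to Thursday afternoon?",
]
weak_labels = [weak_label(t) for t in texts]
```

Examples that end up with `ABSTAIN` would typically be dropped before training a downstream classifier on the weakly labeled set.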

Who Should Read This Book

The primary audience of the book is professional and citizen data scientists who are already working on machine learning projects and face the typical challenge of getting good-quality labeled data for these projects. Readers are expected to have a working knowledge of Python and to be familiar with machine learning libraries and tools.

Navigating the Book and Code Samples

This book is organized roughly as follows:

  • Chapter 1 provides a basic introduction to the field of Weak Supervision, and how data scientists and machine learning engineers can use it as part of the data science process.
  • Chapter 2 discusses how to get started with using Snorkel for weak supervision and introduces concepts in using Snorkel for data programming.
  • Chapter 3 describes how to use Snorkel for labeling, and provides code examples showing how to label text and image datasets.
  • Chapters 4 and 5 give practitioners an end-to-end understanding of how to use a weakly labeled dataset for text and image classification.
  • Chapter 6 discusses the practical considerations for using Snorkel with large datasets, and how to use Spark clusters to scale labeling.

Languages

Language:Jupyter Notebook 100.0%