dave-malone / gt-custom-workflow

SageMaker Groundtruth custom workflow

Build your own custom labeling workflow using SageMaker Ground Truth

Successful machine learning models are built on large volumes of high-quality training data. But the process of creating that training data is often expensive, complicated, and time-consuming. Most models created today require a human to manually label data in a way that allows the model to learn how to make correct decisions. Amazon SageMaker Ground Truth significantly reduces the time and effort, and therefore the cost, required to create training datasets. These savings are achieved by using machine learning to label data automatically; the labeling model gets progressively better over time by continuously learning from labels created by human labelers.

Amazon SageMaker Ground Truth provides commonly used task HTML templates and workflows for many vision, NLP, and other use cases. It also provides enhanced HTML elements that make building custom templates easier and give workers a familiar UI. However, customers may need to build a custom workflow for reasons such as:

  • Branded styling and a user experience consistent with IT policies
  • Complex input consisting of multiple elements per task, such as images, text, and custom metadata
  • Dynamic decision making on task input to prevent certain items from going to workers
  • Consolidating labeling output for a subsequent training job

In this blog post, we demonstrate a custom text annotation labeling workflow that builds a labeled dataset for a natural language processing (NLP) problem.

Augmented Manifest

server/data/manifest.json server/data/mini_manifest.json

Script to extract text using Amazon Textract

server/prep/detect_lines.py
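
The contents of detect_lines.py are not reproduced here, so the following is only a minimal sketch of the kind of extraction such a script might perform, not the repository's actual code. It relies on the documented shape of a Textract DetectDocumentText response: a dictionary whose `Blocks` list holds entries with a `BlockType` of `PAGE`, `LINE`, or `WORD`, where `LINE` blocks carry the recognized text in `Text`. The helper name `extract_lines` and the sample response are assumptions made for illustration.

```python
def extract_lines(response):
    """Return the text of every LINE block in a DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


# Hand-written sample that mimics the response shape; not real Textract output.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Invoice 2019"},
        {"BlockType": "WORD", "Text": "Invoice"},
        {"BlockType": "WORD", "Text": "2019"},
        {"BlockType": "LINE", "Text": "Total due"},
    ]
}

if __name__ == "__main__":
    for line in extract_lines(sample_response):
        print(line)
```

In a real run, the response would presumably come from the boto3 Textract client's `detect_document_text` call against a document stored in S3; keeping the parsing in a separate function makes it testable without AWS credentials.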

Script to create Manifest

server/prep/prep_manifest.py
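
A Ground Truth input manifest is a JSON Lines file: each line is a standalone JSON object describing one data object, with inline text under the `source` key (or an S3 URI under `source-ref`). The sketch below shows one way a manifest could be assembled from extracted text lines. It is an illustration under stated assumptions, not the contents of prep_manifest.py; the function name `build_manifest` and the rule of skipping blank snippets are hypothetical.

```python
import json


def build_manifest(texts):
    """Serialize text snippets as JSON Lines, one data object per line."""
    lines = []
    for text in texts:
        text = text.strip()
        if not text:
            continue  # assumed rule: skip blank snippets so no empty task is created
        lines.append(json.dumps({"source": text}))
    # End the file with a newline when there is at least one record.
    return ("\n".join(lines) + "\n") if lines else ""


if __name__ == "__main__":
    print(build_manifest(["Invoice 2019", "", "Total due"]), end="")
```

The returned string can be written to a local file and uploaded to S3, where the labeling job reads it as its input dataset.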

CloudFormation script to deploy Lambda

server/processing/cfn-template.json server/processing/labeling_lambda.zip

Pre- and post-labeling Lambdas

server/processing/sagemaker-gt-postprocess.py server/processing/sagemaker-gt-preprocess.py
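
To make the roles of the two Lambdas concrete, here is a hedged sketch of the request and response shapes Ground Truth uses for custom workflows. It is not the code in the two files listed above. The pre-annotation handler receives one manifest entry under `dataObject` and returns the `taskInput` rendered by the template, plus `isHumanAnnotationRequired`, which can keep an item from reaching workers. The post-annotation step returns one `consolidatedAnnotation` per `datasetObjectId`. The `taskObject` key, the skip-empty-text rule, the `annotationsFromAllWorkers` key, and the pass-through consolidation strategy are all assumptions chosen for illustration.

```python
import json


def pre_annotation_handler(event, context):
    """Turn one manifest data object into the task input for the template."""
    text = event["dataObject"].get("source", "")
    return {
        # "taskObject" is a hypothetical key; it must match what the template reads.
        "taskInput": {"taskObject": text},
        # Assumed rule for illustration: items with no text skip human labeling.
        "isHumanAnnotationRequired": "true" if text.strip() else "false",
    }


def consolidate(payload, label_attribute_name):
    """Collect all worker answers into one consolidated record per data object."""
    results = []
    for item in payload:
        worker_answers = [
            {
                "workerId": annotation["workerId"],
                # Each worker's answer arrives as a JSON-encoded string.
                "annotationData": json.loads(annotation["annotationData"]["content"]),
            }
            for annotation in item["annotations"]
        ]
        results.append(
            {
                "datasetObjectId": item["datasetObjectId"],
                "consolidatedAnnotation": {
                    "content": {
                        label_attribute_name: {"annotationsFromAllWorkers": worker_answers}
                    }
                },
            }
        )
    return results
```

In a deployed post-annotation Lambda, `payload` would be the parsed JSON file that the event's `payload.s3Uri` points to, and `label_attribute_name` would come from the event's `labelAttributeName` field. A real workflow would likely replace the pass-through with a voting or merging rule suited to the task.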

React Components

web/README.md web/package.json web/src/App.css web/src/App.js web/src/App.test.js web/src/index.css web/src/index.js web/public/index.html web/public/manifest.json web/public/template.html
