aws-samples / amazon-sagemaker-groundtruth-and-amazon-comprehend-ner-examples

Example of Amazon SageMaker Ground Truth and Amazon Comprehend Custom NER

Companion code for the AWS blog post "Developing NER models with Amazon SageMaker Ground Truth and Amazon Comprehend".

Update October 2020: Amazon Comprehend now supports Amazon SageMaker Ground Truth to help label your datasets for Comprehend's custom model training. For the custom entity recognizer, check out the Annotations documentation for more details. For the custom multi-class and multi-label classifiers, check out the MultiClass and MultiLabel documentation, respectively.

This repository contains the source CloudFormation template that the blog post uses to set up the data conversion pipeline, and a sample corpus.

We recommend deploying the stack by following the instructions in the blog post. Once the stack is deployed, you can upload the sample corpus to your new bucket.
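As a rough illustration of the upload step, the sketch below walks a local directory and copies each file to the bucket with boto3. The directory name `sample-corpus`, the key prefix, and the helper names are placeholders chosen for this example, not values taken from this repository; the blog post describes where the pipeline actually expects the files.

```python
from pathlib import Path


def plan_uploads(local_dir, prefix=""):
    """Return (local_path, s3_key) pairs for every file under local_dir."""
    root = Path(local_dir)
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Use forward slashes in the key regardless of the local OS.
            rel = path.relative_to(root).as_posix()
            key = f"{prefix.rstrip('/')}/{rel}" if prefix else rel
            pairs.append((str(path), key))
    return pairs


def upload_corpus(local_dir, bucket, prefix=""):
    """Upload the planned files; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so plan_uploads works without boto3

    s3 = boto3.client("s3")
    for local_path, key in plan_uploads(local_dir, prefix):
        s3.upload_file(local_path, bucket, key)  # Filename, Bucket, Key


# Hypothetical usage (placeholder directory and bucket names):
# upload_corpus("sample-corpus", "<your_s3_bucket>")
```

Keeping the key planning separate from the upload makes it possible to inspect which objects would be created before anything is written to S3.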

We also recommend that you check out https://github.com/aws-samples/amazon-comprehend-examples/ which contains:

  1. the source module used by the Lambda function in this repository, and

  2. the convertGroundtruthToComprehendERFormat.sh script, which parses the augmented output manifest file and creates the annotations and documents files in the CSV format accepted by Amazon Comprehend.
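To give a feel for what such a conversion involves, here is a minimal Python sketch. It is not the implementation of convertGroundtruthToComprehendERFormat.sh. It assumes that each manifest line is a JSON object with a `source` text field and, under the labeling job's attribute name, an `annotations.entities` list of `label`, `startOffset`, and `endOffset` values, and that Comprehend expects an annotations CSV with the columns `File, Line, Begin Offset, End Offset, Type`. The function name, the `documents.txt` file name, and the choice to pass offsets through unchanged are assumptions made for illustration.

```python
import csv
import io
import json


def manifest_to_comprehend(manifest_lines, label_attribute, documents_file="documents.txt"):
    """Convert augmented-manifest JSON lines into (documents_text, annotations_csv)."""
    documents = []
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["File", "Line", "Begin Offset", "End Offset", "Type"])
    for raw in manifest_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines in the manifest
        record = json.loads(raw)
        line_no = len(documents)  # zero-based line index in the documents file
        # Replace newlines with spaces so each document stays on one line;
        # the replacement keeps the text length, so offsets remain valid.
        documents.append(record["source"].replace("\n", " "))
        entities = record.get(label_attribute, {}).get("annotations", {}).get("entities", [])
        for ent in entities:
            # Assumption: offsets are copied as-is from the manifest.
            writer.writerow([documents_file, line_no,
                             ent["startOffset"], ent["endOffset"], ent["label"]])
    return "\n".join(documents) + "\n", out.getvalue()
```

The function returns both outputs as strings, so the caller decides where to write them; verify the offset convention against the Comprehend documentation before training on the result.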

Deployment from source template

If you would rather deploy your stack from the source template in this repository than use the one-click deployment described in the blog post, follow these steps:

  1. Build the Lambda layer: on your EC2 instance, run bin/build-lambda-layer-s3fs-p38.sh. Upload the resulting zip file, lambda-layer-s3fs-p38.zip, to your S3 bucket.

  2. Update CloudFormation/template.yaml, specifically the resource S3fsP38Layer, so that it points to the layer file from step 1. After the modification, it should look like this:

     S3fsP38Layer:
       Type: AWS::Lambda::LayerVersion
       Properties:
         ...
         Content:
           S3Bucket: <your_s3_bucket_to_host_lambda_layer>
           S3Key: <your_s3_key_to_the_layer_zip>
         ...
  3. Package and upload the CloudFormation template to your S3 bucket, then deploy it. Make sure that you deploy to the same Region as the S3 bucket that holds the layer zip.

    cd CloudFormation/
    
    aws cloudformation package --template-file template.yaml --s3-bucket <value> --s3-prefix <value> --output-template-file cfn.yaml
    
    aws cloudformation deploy --template-file cfn.yaml --capabilities CAPABILITY_IAM --stack-name <value> --parameter-overrides BucketName=<your_s3_bucket>
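
Because the deployment Region must match the Region of the bucket holding the layer zip, it can help to check this before running the commands above. The following Python sketch is one possible way to do so with boto3's `get_bucket_location`; the helper names are made up for this example. It relies on the S3 behavior that `LocationConstraint` is empty for buckets in us-east-1 and may be the legacy value `EU` for some buckets in eu-west-1.

```python
def bucket_region(location_constraint):
    """Map an S3 GetBucketLocation LocationConstraint value to a Region name."""
    if not location_constraint:
        return "us-east-1"  # S3 reports no constraint for us-east-1
    if location_constraint == "EU":
        return "eu-west-1"  # legacy alias returned for some older buckets
    return location_constraint


def same_region(location_constraint, deploy_region):
    """Return True if the bucket's Region equals the intended deploy Region."""
    return bucket_region(location_constraint) == deploy_region


def check_layer_bucket(bucket, deploy_region):
    """Look up the bucket's Region; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the pure helpers work without boto3

    response = boto3.client("s3").get_bucket_location(Bucket=bucket)
    return same_region(response.get("LocationConstraint"), deploy_region)


# Hypothetical usage (placeholder bucket and Region):
# check_layer_bucket("<your_s3_bucket_to_host_lambda_layer>", "us-east-1")
```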

License Summary

This sample code is made available under the MIT-0 License. See the LICENSE file.
