
Working with AWS Comprehend and AWS SageMaker

Working with AWS Comprehend

1. Uploaded a file to a new S3 bucket.

2. Created a Lambda function with the default role, and attached a new policy granting access to Comprehend and CloudWatch Logs.

3. Increased the Lambda function timeout and memory size to 30 seconds and 256 MB respectively.

4. Added an S3 PUT trigger to the Lambda function, so the function is invoked whenever a file is uploaded to the bucket.

5. Called Comprehend to detect the sentiment of the file in the S3 bucket.

6. Since Comprehend doesn't accept more than 5000 bytes per call and our text file is quite large, we use nltk to sentence-tokenize the whole text and pass each sentence one by one.

7. The above approach worked fine, but sentiment is more meaningful over a paragraph than over a single sentence, so I switched to paragraphs. Looking at the file, all the tweets are separated by blank lines, so I split the text on '\n\n' and found the sentiment of each item in the resulting list.
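The paragraph-level flow from steps 5-7 can be sketched as below. The `detect_sentiment` call follows the boto3 Comprehend API; the function names and the skip-oversized-paragraph behavior are illustrative assumptions, not the exact code in this repo:

```python
def split_paragraphs(text):
    """Split the tweet file on blank lines ('\n\n'), dropping empty chunks."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def paragraph_sentiments(text, comprehend=None):
    """Detect sentiment per paragraph, skipping any chunk over
    Comprehend's 5000-byte limit for a single DetectSentiment call."""
    if comprehend is None:
        import boto3  # deferred so the splitter can be exercised offline
        comprehend = boto3.client("comprehend")
    results = []
    for para in split_paragraphs(text):
        if len(para.encode("utf-8")) > 5000:
            continue  # oversized paragraph would need a sentence-level fallback
        resp = comprehend.detect_sentiment(Text=para, LanguageCode="en")
        results.append((para, resp["Sentiment"]))
    return results
```

Passing a pre-built client (or a stub) into `paragraph_sentiments` keeps the splitting logic testable without AWS credentials.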

Working with AWS SageMaker

1. Uploaded all training files to a bucket; a Lambda function is invoked on the bucket's 'PUT' operation.

2. The Lambda function fetches the file, tokenizes it, computes the Levenshtein distance between words, and writes the result to a CSV file 'trainVector.csv', which is saved in the 'train-data' bucket.

3. If another training file is uploaded, its rows are appended to the same CSV file.

4. If the file name is in the range 000-299 it is a training file; 300-401 is a testing file. Testing files are written to 'testVector.csv' instead.

5. The training and testing data are now ready for SageMaker's K-means algorithm.
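The distance computation in step 2 can be sketched as follows. The edit-distance function is a plain dynamic-programming implementation equivalent to `nltk.edit_distance` (see the nltk metrics reference below), so it runs without nltk; the row layout (pairwise distances of consecutive words) is an assumption about 'trainVector.csv', not confirmed by this repo:

```python
def levenshtein(a, b):
    """Edit distance via the classic DP recurrence (like nltk.edit_distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def distance_row(words):
    """One CSV row: distances between consecutive words (assumed layout)."""
    return [levenshtein(w1, w2) for w1, w2 in zip(words, words[1:])]
```
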
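The train/test routing in steps 3-4 amounts to a small helper like this; the exact filename format (a zero-padded number plus an extension) is a hypothetical reading of the naming convention above:

```python
def target_csv(filename):
    """Route a numbered file like '042.txt' to the right vector CSV:
    000-299 -> training, 300-401 -> testing (per the convention above)."""
    n = int(filename.split(".")[0])
    if 0 <= n <= 299:
        return "trainVector.csv"
    if 300 <= n <= 401:
        return "testVector.csv"
    raise ValueError(f"unexpected file number: {n}")
```
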

References

https://aws.amazon.com/blogs/compute/using-aws-lambda-and-amazon-comprehend-for-sentiment-analysis/

http://www.nltk.org/howto/metrics.html

https://medium.com/@pekelny/how-to-unit-test-an-aws-lambda-524069d4fe06
