jacobeturpin / aws-nlp-data-pipeline

Pipeline for ingesting and leveraging text data using AWS services

AWS NLP Data Pipeline

Ingest real-time streaming text data with automatic appending of NLP metadata

[Figures: Architecture diagram; Kibana dashboard]

Overview

This project is a mostly serverless data engineering architecture that ingests real-time streaming data and automatically appends NLP metadata using managed AWS services. It can serve as a baseline for more complex ingestion pipelines that power NLP services.
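As a rough illustration of what "appending NLP metadata" could look like, the sketch below enriches a single record with sentiment and entity information. It assumes Amazon Comprehend as the NLP service and a record with a `text` field; neither is confirmed by this README, and the function name `append_nlp_metadata` and the `nlp` output key are hypothetical. The client is passed in, so any object exposing `detect_sentiment` and `detect_entities` with the boto3 Comprehend keyword arguments will work.

```python
from typing import Any, Dict


def append_nlp_metadata(
    record: Dict[str, Any],
    comprehend: Any,
    language_code: str = "en",
) -> Dict[str, Any]:
    """Return a copy of `record` with sentiment and entity metadata added.

    Assumptions (not taken from the project): the text lives under the
    "text" key, and `comprehend` behaves like boto3.client("comprehend").
    """
    text = record["text"]  # hypothetical field name

    # Both calls exist on the boto3 Comprehend client with these keywords.
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode=language_code)
    entities = comprehend.detect_entities(Text=text, LanguageCode=language_code)

    enriched = dict(record)  # shallow copy; the input record is left untouched
    enriched["nlp"] = {      # hypothetical output key
        "sentiment": sentiment["Sentiment"],
        "sentiment_scores": sentiment["SentimentScore"],
        "entities": [
            {"text": e["Text"], "type": e["Type"], "score": e["Score"]}
            for e in entities["Entities"]
        ],
    }
    return enriched
```

With real credentials, this could be called as `append_nlp_metadata({"text": "..."}, boto3.client("comprehend"))`. Injecting the client keeps the function easy to exercise locally with a stub.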

The following AWS services are leveraged:

Deployment

This project uses GitHub Actions for its CI/CD pipeline. If you fork the repository, you can deploy through your own Actions workflows by adding the following Secrets to your repository:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_REGION_ID
  • IP_ADDRESS
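Before triggering a deployment, it can help to confirm that all four values are present. The following is a minimal sketch of such a preflight check. It assumes the workflow exposes each secret as an environment variable of the same name, which this README does not state; the helper `missing_secrets` is hypothetical and not part of the project.

```python
import os
from typing import List, Mapping, Optional

# Names taken from the list of required repository secrets.
REQUIRED_SECRETS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION_ID",
    "IP_ADDRESS",
)


def missing_secrets(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required names that are unset or blank in `env`.

    Defaults to the process environment; pass a mapping to check
    something else (for example, values loaded from a file).
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]
```

A deployment step could call `missing_secrets()` and stop early with a readable message if the returned list is not empty.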

Example

A demonstration dataset is provided. Run the following script to send example data to the Ingest Lambda for processing.

python stream.py
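The contents of `stream.py` are not shown here. As a hedged sketch of how a script like it might push records to a Lambda function, the example below invokes the function asynchronously through boto3. The function names, the record shape, and the optional delay are all assumptions for illustration and may differ from the project's actual script.

```python
import json
import time
from typing import Any, Callable, Dict, Iterable


def make_lambda_sender(function_name: str) -> Callable[[Dict[str, Any]], None]:
    """Build a sender that invokes the named Lambda once per record.

    `function_name` is whatever the deployed Ingest Lambda is called;
    the README does not give that name, so it must be supplied.
    """
    import boto3  # imported lazily so the rest of the sketch runs without it

    client = boto3.client("lambda")

    def send(record: Dict[str, Any]) -> None:
        client.invoke(
            FunctionName=function_name,
            InvocationType="Event",  # asynchronous, fire-and-forget
            Payload=json.dumps(record).encode("utf-8"),
        )

    return send


def stream_records(
    records: Iterable[Dict[str, Any]],
    send: Callable[[Dict[str, Any]], None],
    delay_seconds: float = 0.0,
) -> int:
    """Send each record in order and return how many were sent."""
    sent = 0
    for record in records:
        send(record)
        sent += 1
        if delay_seconds > 0:
            time.sleep(delay_seconds)  # crude pacing to mimic a live stream
    return sent
```

Separating `stream_records` from the sender means the loop can be tried locally with any callable (for instance `print`) before pointing it at a deployed function.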

About

License: Apache License 2.0

Language: Python 100.0%