There are 9 repositories under the aws-glue topic.
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
(Unofficial) curated list of awesome workshops found around the internet. We have all been there: finding that workshop you just attended shouldn't be hard. The idea is to provide an easy central repository, built collaboratively.
Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub
The open-source Resource as Code framework for Apache Kafka. Jikkou helps you implement GitOps for Kafka at scale!
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
Data Lake as Code, featuring ChEMBL and OpenTargets
Glue scripts for converting AWS Service Logs for use in Athena
Automated data quality suggestions and analysis with Deequ on AWS Glue
Open innovation with 60-minute cloud experiments on AWS
Streamlit EDA Dashboard Powered by AWS Cloud
⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) exam by studying with our Questions & Answers (Q&A) practice tests.
Demo code to illustrate the execution of PyTest unit test cases for AWS Glue jobs in AWS CodePipeline using AWS CodeBuild projects
Learn how to use Kinesis Firehose, AWS Glue, S3, and Amazon Athena by streaming and analyzing Reddit comments in real time. 100-200 level tutorial.
Bring your own data Labs: Build a serverless data pipeline based on your own data
Terraform modules for provisioning and managing AWS Glue resources
Use the AWS Glue Schema Registry in Python projects.
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS
Example of AWS Glue jobs and workflow deployment with Terraform in monorepo style. The code supports a miniseries of articles about AWS Glue and Python.
🌉 Reference implementation for granting cross-account AWS Glue Data Catalog access from Amazon Athena
Sample code to collect Apache Iceberg metrics for table monitoring
Automate the daily partitioning of your CloudTrail bucket in Athena
Build and Deploy A Serverless Data Pipeline on AWS
Streaming ETL job cases in AWS Glue to integrate Iceberg and create an in-place updatable data lake on Amazon S3
End-to-end data engineer project
Terraform module which creates Glue resources on AWS
AWS Glue tutorial for data developers.
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.
🐋 Docker image for AWS Glue Spark/Python
This repository contains source code for the AWS Database Blog post "Reduce data archiving costs for compliance by automating RDS snapshot exports to Amazon S3"
This repository has a collection of utilities for Glue Crawlers. These utilities come in the form of AWS CloudFormation templates or AWS CDK applications.
Example of how to set SBT up for local development of AWS Glue Scripts
Proof of Value Terraform Scripts to utilize Amazon Web Services (AWS) Security, Identity & Compliance Services to Support your AWS Account Security Posture.
This repository contains a 10-step program to enter the world of Data Engineering