There are 9 repositories under the aws-glue topic.
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
(Unofficial) curated list of awesome workshops found around the internet. We have all been there: finding the workshop you just attended shouldn't be hard. The idea is to provide an easy, central, collaboratively maintained repository.
Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and DataHub.
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
Data Lake as Code, featuring ChEMBL and OpenTargets
Glue scripts for converting AWS Service Logs for use in Athena
Open innovation with 60-minute cloud experiments on AWS
Automated data quality suggestions and analysis with Deequ on AWS Glue
Streamlit EDA Dashboard Powered by AWS Cloud
Learn how to use Kinesis Firehose, AWS Glue, S3, and Amazon Athena by streaming and analyzing Reddit comments in real time. 100-200 level tutorial.
⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by studying with our Questions & Answers (Q&A) practice tests.
Bring your own data Labs: Build a serverless data pipeline based on your own data
Demo code to illustrate the execution of PyTest unit test cases for AWS Glue jobs in AWS CodePipeline using AWS CodeBuild projects
Use the AWS Glue Schema Registry in Python projects.
Example of AWS Glue jobs and workflow deployment with Terraform in monorepo style. The code here supports the miniseries of articles about AWS Glue and Python.
🌉 Reference implementation for granting cross-account AWS Glue Data Catalog access from Amazon Athena
Terraform modules for provisioning and managing AWS Glue resources
Automate the daily partitioning of your CloudTrail bucket in Athena
Build and Deploy A Serverless Data Pipeline on AWS
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS
Terraform module which creates Glue resources on AWS
AWS Glue tutorial for data developers.
A COVID-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.
🐋 Docker image for AWS Glue Spark/Python
Sample code to collect Apache Iceberg metrics for table monitoring
Streaming ETL job cases in AWS Glue that integrate Iceberg and create an in-place updatable data lake on Amazon S3
This repository has a collection of utilities for Glue Crawlers. These utilities come in the form of AWS CloudFormation templates or AWS CDK applications.
End-to-end data engineering project
Example of how to set SBT up for local development of AWS Glue Scripts
This repository contains source code for the AWS Database Blog post "Reduce data archiving costs for compliance by automating RDS snapshot exports to Amazon S3"
Proof of Value Terraform Scripts to utilize Amazon Web Services (AWS) Security, Identity & Compliance Services to Support your AWS Account Security Posture.
Understanding DevOps concepts, with hands-on practice and research using AWS developer tools
This repository contains a 10-step program to enter the world of Data Engineering