aws-glue

The repositories below are tagged with the aws-glue topic.

aws / aws-sdk-pandas
pandas on AWS: easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server, and S3 (Parquet, CSV, JSON, and Excel).
python aws pandas apache-arrow apache-parquet data-engineering etl data-science redshift athena lambda aws-lambda aws-glue emr amazon-athena glue-catalog mysql amazon-sagemaker-notebook modin ray
Language: Python · 4073 stars
dgomesbr / awesome-aws-workshops
An (unofficial) curated list of awesome workshops found around the internet. We have all been there: finding the workshop you just attended shouldn't be hard. The idea is to provide an easy, collaboratively maintained central repository.
aws-workshops serverless aws-iot application-modernization sql-server analytics amazon-textract amazon-sagemaker sagemaker-workshop aws aws-iam immersion-day amazon-sagemaker-workshop aws-glue amazon-eks-workshop iot-aws
Language: HTML · 413 stars
tokern / piicatcher
Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and DataHub.
aws-athena aws-glue aws-redshift catalog data data-catalog database phi pii python snowflake
Language: Python · 322 stars
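PII scanning of this kind generally combines two signals: column names that suggest personal data, and sample values that match known formats. The sketch below shows that idea with regular expressions. It is a minimal illustration under stated assumptions. `scan_columns`, the patterns, and the `min_ratio` threshold are hypothetical and are not piicatcher's API or rule set.

```python
import re
from typing import Dict, List

# Hypothetical name-based patterns (not piicatcher's actual rules).
NAME_PATTERNS = {
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
    "ssn": re.compile(r"ssn|social[-_]?security", re.IGNORECASE),
}
# Hypothetical value-based patterns applied to sampled column values.
VALUE_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def scan_columns(samples: Dict[str, List[str]], min_ratio: float = 0.5) -> Dict[str, str]:
    """Return {column: pii_type} for columns that look like PII.

    A column is flagged if its name matches a name pattern, or else if at
    least `min_ratio` of its non-empty sample values match a value pattern.
    """
    findings: Dict[str, str] = {}
    for column, values in samples.items():
        for pii_type, pattern in NAME_PATTERNS.items():
            if pattern.search(column):
                findings[column] = pii_type
                break
        else:  # no name match: fall back to inspecting sample values
            non_empty = [v for v in values if v]
            for pii_type, pattern in VALUE_PATTERNS.items():
                hits = sum(1 for v in non_empty if pattern.match(v))
                if non_empty and hits / len(non_empty) >= min_ratio:
                    findings[column] = pii_type
                    break
    return findings
```

For example, `scan_columns({"contact": ["a@example.com"], "user_email": [], "id": ["1"]})` flags `contact` from its values and `user_email` from its name. A real scanner would also sample rows from the database and write the tags back to the catalog.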
streamthoughts / jikkou
The open-source Resource-as-Code framework for Apache Kafka. Jikkou helps you implement GitOps for Kafka at scale!
apache-kafka automation aws-glue cluster-manager datamesh devops gitops hacktoberfest infrastructure-as-code java kafka kafka-cluster kafka-manager kafka-topic yaml
Language: Java · 253 stars
data-dot-all / dataall
A modern data marketplace that makes collaboration among diverse users (such as business users, analysts, and engineers) easier, increasing efficiency and agility in data projects on AWS.
aws aws-glue aws-lake-formation aws-s3 data data-science etl-framework lakeformation lakehouse redshift
Language: Python · 246 stars
aws-samples / data-lake-as-code
Data Lake as Code, featuring ChEMBL and OpenTargets
aws aws-cdk aws-cdk-constructs aws-glue aws-lake-formation
Language: TypeScript · 173 stars
awslabs / athena-glue-service-logs
Glue scripts for converting AWS service logs for use in Athena.
glue-scripts cloudtrail-logs elb-logs athena glue-job aws-glue s3-log-parser cloudfront-logs alb-logs vpc-flow-logs
Language: Python · 140 stars
aws-samples / amazon-deequ-glue
Automated data quality suggestions and analysis with Deequ on AWS Glue
deequ aws-glue aws data-quality
Language: Scala · 88 stars
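Deequ derives constraint suggestions by profiling a dataset, for example proposing a completeness check when a column has no nulls. The plain-Python sketch below illustrates that profiling-then-suggesting idea on a list of dicts, without Spark. `suggest_constraints` is hypothetical. The output strings only echo the names of Deequ check methods (`isComplete`, `hasCompleteness`, `isUnique`) and are not produced by Deequ itself.

```python
from typing import Any, Dict, List


def suggest_constraints(rows: List[Dict[str, Any]]) -> List[str]:
    """Profile rows and suggest simple constraints (hypothetical, Deequ-inspired).

    - completeness == 1.0       -> isComplete(col)
    - 0 < completeness < 1.0    -> hasCompleteness(col, >= observed ratio)
    - every row has a distinct non-null value -> isUnique(col)
    Values must be hashable for the uniqueness check.
    """
    suggestions: List[str] = []
    if not rows:
        return suggestions
    columns = sorted({key for row in rows for key in row})
    total = len(rows)
    for column in columns:
        # A missing key counts as null, like an absent field in a record.
        non_null = [row.get(column) for row in rows if row.get(column) is not None]
        completeness = len(non_null) / total
        if completeness == 1.0:
            suggestions.append(f"isComplete({column})")
        elif completeness > 0:
            suggestions.append(f"hasCompleteness({column}, >= {completeness:.2f})")
        if non_null and len(set(non_null)) == len(non_null) == total:
            suggestions.append(f"isUnique({column})")
    return suggestions
```

Deequ runs comparable profiling as distributed Spark jobs and offers many more rules, such as categorical ranges and non-negativity.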
aws-samples / cloud-experiments
Open innovation with 60-minute cloud experiments on AWS
aws-glue data-science amazon-athena amazon-sagemaker amazon-s3 machine-learning amazon-comprehend amazon-rekognition notebooks aws-cloud
Language: Jupyter Notebook · 87 stars
aws-samples / streamlit-application-deployment-on-aws
Streamlit EDA Dashboard Powered by AWS Cloud
streamlit-dashboard aws aws-cloudformation aws-glue aws-athena aws-cognito aws-sagemaker
Language: Python · 84 stars
Ditectrev / Amazon-Web-Services-Certified-AWS-Certified-Machine-Learning-MLS-C01-Practice-Tests-Exams-Question
⛳️ PASS the Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) exam by studying with our question-and-answer (Q&A) practice tests.
mls-c01 aws machine-learning apache-spark amazon-athena amazon-ec2 amazon-emr amazon-sagemaker aws-glue aws-certified aws-certified-machine-learning amazon-cloudwatch amazon-transcribe aws-batch amazon-comprehend aws-lambda amazon-s3 amazon-textract linear-regression neural-network
73 stars
aws-samples / aws-glue-jobs-unit-testing
Demo code illustrating how to run pytest unit tests for AWS Glue jobs in AWS CodePipeline using AWS CodeBuild projects.
automated-testing aws aws-glue pytest
Language: Python · 48 stars
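A common way to make Glue jobs unit-testable is to keep the transformation logic in pure functions. The job script then handles only `GlueContext` I/O, and pytest can exercise the logic without a Glue runtime. The sketch below shows that separation with plain Python dicts standing in for Spark rows. `normalize_records` and its cleaning rules are hypothetical and are not taken from the sample repository, which works with PySpark.

```python
from typing import Dict, List


def normalize_records(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Pure transform: trim ids, lower-case emails, drop rows without an id.

    It has no GlueContext/SparkContext dependency, so it can be unit tested
    directly; the Glue job script would only read input and write output.
    """
    cleaned = []
    for record in records:
        record_id = (record.get("id") or "").strip()
        if not record_id:
            continue  # rows without an id are discarded
        cleaned.append({
            "id": record_id,
            "email": (record.get("email") or "").strip().lower(),
        })
    return cleaned


# A pytest-style test: pytest collects functions named test_*.
def test_normalize_records_drops_missing_ids_and_cleans_email():
    raw = [
        {"id": " 1 ", "email": " Alice@Example.COM "},
        {"id": "", "email": "ghost@example.com"},
        {"email": "no-id@example.com"},
    ]
    assert normalize_records(raw) == [{"id": "1", "email": "alice@example.com"}]
```

In CI (for example, a CodeBuild project), running `pytest` would discover and execute the test function.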
aws-samples / analyzing-reddit-sentiment-with-aws
Learn how to use Kinesis Firehose, AWS Glue, S3, and Amazon Athena by streaming and analyzing Reddit comments in real time. A 100–200 level tutorial.
kinesis-firehose delivery-stream aws-glue data-stream amazon-athena data-lake reddit sentiment-analysis sentiment-classification real-time self-learning tutorials
Language: Python · 44 stars
aws-samples / bring-your-own-data-labs
Bring Your Own Data Labs: build a serverless data pipeline based on your own data
analytics aws labs hands-on-lab aws-glue aws-s3 aws-quicksight
Language: HTML · 44 stars
cloudposse / terraform-aws-glue
Terraform modules for provisioning and managing AWS Glue resources
aws aws-glue etl etl-job glue workflow
Language: HCL · 34 stars
DisasterAWARE / aws-glue-schema-registry-python
Use the AWS Glue Schema Registry in Python projects.
aws aws-glue schema-registry kafka avro python
Language: Python · 34 stars
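Messages produced with the AWS Glue Schema Registry serializers carry a small header in front of the serialized record so that consumers can look up the writer's schema version. The sketch below encodes and decodes that header. It assumes the layout as we understand it from the AWS serializer/deserializer library: a 1-byte header version (`3`), a 1-byte compression flag (`0` for none, `5` for zlib), a 16-byte schema version UUID, and then the payload. Verify these constants against the library before relying on them. `encode_gsr` and `decode_gsr` are hypothetical helpers, not functions of the listed package.

```python
import uuid
import zlib
from typing import NamedTuple

HEADER_VERSION = 3      # assumed header version byte
COMPRESSION_NONE = 0    # assumed "no compression" flag
COMPRESSION_ZLIB = 5    # assumed zlib compression flag
HEADER_LEN = 18         # 1 + 1 + 16 bytes


class GsrMessage(NamedTuple):
    schema_version_id: uuid.UUID
    payload: bytes      # still Avro/JSON/Protobuf-encoded; not decoded here


def encode_gsr(schema_version_id: uuid.UUID, payload: bytes, compress: bool = False) -> bytes:
    """Prefix a serialized record with the (assumed) schema registry header."""
    flag = COMPRESSION_ZLIB if compress else COMPRESSION_NONE
    body = zlib.compress(payload) if compress else payload
    return bytes([HEADER_VERSION, flag]) + schema_version_id.bytes + body


def decode_gsr(message: bytes) -> GsrMessage:
    """Split a message into its schema version id and (decompressed) payload."""
    if len(message) < HEADER_LEN:
        raise ValueError("message too short for a schema registry header")
    if message[0] != HEADER_VERSION:
        raise ValueError(f"unexpected header version {message[0]}")
    compression = message[1]
    schema_version_id = uuid.UUID(bytes=message[2:HEADER_LEN])
    body = message[HEADER_LEN:]
    if compression == COMPRESSION_ZLIB:
        body = zlib.decompress(body)
    elif compression != COMPRESSION_NONE:
        raise ValueError(f"unknown compression flag {compression}")
    return GsrMessage(schema_version_id, body)
```

A consumer would take `schema_version_id` from `decode_gsr`, fetch that schema version from the registry, and then deserialize `payload` with it.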
aws-samples / transactional-datalake-using-apache-iceberg-on-aws-glue
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS
apache-iceberg aws-glue aws-dms aws-athena apache-spark
Language: Python · 33 stars
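AWS DMS change files written to S3 mark each row with an `Op` column (`I` insert, `U` update, `D` delete). A transactional table such as Iceberg typically applies them with a MERGE that upserts or deletes by primary key. The sketch below reproduces only those semantics in memory, to show what the merge has to achieve. `apply_cdc` and the `id` key column are hypothetical, and the changes are assumed to arrive already in commit order.

```python
from typing import Any, Dict, Iterable


def apply_cdc(table: Dict[Any, dict], changes: Iterable[dict], key: str = "id") -> Dict[Any, dict]:
    """Apply DMS-style change records (Op = I/U/D) to a keyed snapshot.

    Returns a new snapshot; the input table is not modified.
    Insert/update -> upsert the row by key; delete -> remove the key.
    """
    result = {k: dict(v) for k, v in table.items()}
    for change in changes:
        op = change["Op"]
        row = {k: v for k, v in change.items() if k != "Op"}
        pk = row[key]
        if op in ("I", "U"):
            result[pk] = row          # upsert: last write wins
        elif op == "D":
            result.pop(pk, None)      # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown Op {op!r}")
    return result
```

Making every operation an idempotent upsert or delete means that replaying a batch leaves the snapshot unchanged. This is why MERGE-style writes suit CDC streams that may be redelivered.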
1oglop1 / aws-glue-monorepo-style
Example of deploying AWS Glue jobs and workflows with Terraform in a monorepo style. The code here accompanies a mini-series of articles about AWS Glue and Python.
python aws aws-glue serverless datascience terraform
Language: Python · 32 stars
awslabs / amazon-athena-cross-account-catalog
🌉 Reference implementation for granting cross-account AWS Glue Data Catalog access from Amazon Athena
amazon-athena aws-glue
Language: Python · 30 stars
aws-samples / monitoring-apache-iceberg-table-metadata-layer
Sample code to collect Apache Iceberg metrics for table monitoring
apache-iceberg aws aws-cloudwatch aws-glue aws-lambda data-quality monitoring sam-cli apache-spark pyiceberg
Language: Python · 28 stars
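Iceberg table metadata files record each snapshot with a string-valued `summary` map. Keys such as `added-records`, `total-records`, `total-data-files`, and `total-files-size` are enough to derive basic health metrics like average data file size. The sketch below computes such metrics directly from a `metadata.json` string. It is an illustration under the assumption that those summary keys are present. The sample repository itself uses PyIceberg and CloudWatch, and `snapshot_metrics` is a hypothetical helper.

```python
import json
from typing import List


def snapshot_metrics(metadata_json: str) -> List[dict]:
    """Derive per-snapshot metrics from an Iceberg table metadata JSON string.

    Summary values are strings in the metadata file, so they are cast to int;
    missing keys default to 0.
    """
    metadata = json.loads(metadata_json)
    metrics = []
    for snapshot in metadata.get("snapshots", []):
        summary = snapshot.get("summary", {})
        total_files = int(summary.get("total-data-files", "0"))
        total_size = int(summary.get("total-files-size", "0"))
        metrics.append({
            "snapshot_id": snapshot["snapshot-id"],
            "added_records": int(summary.get("added-records", "0")),
            "total_records": int(summary.get("total-records", "0")),
            # A falling average file size over time can signal a small-files problem.
            "avg_file_size_bytes": total_size / total_files if total_files else 0.0,
        })
    return metrics
```

Each resulting dict could then be published as a set of custom metrics, tagged with the snapshot id.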
SWO-GS / athena-cloudtrail-partitioner
Automate the daily partitioning of your CloudTrail bucket in Athena
athena aws aws-athena aws-glue cloudtrail cloudtrail-logs glue gorillastack partitioning
Language: JavaScript · 28 stars
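CloudTrail delivers logs under the S3 prefix `AWSLogs/<account-id>/CloudTrail/<region>/YYYY/MM/DD/`. Daily partitioning in Athena therefore amounts to registering one partition per region per day with `ALTER TABLE ... ADD IF NOT EXISTS PARTITION (...) LOCATION ...`. The sketch below generates those statements. The partition column names (`region`, `year`, `month`, `day`) depend on how the table was defined and are assumptions here. So are the table name, bucket, and account id in the usage example. The listed project is written in JavaScript, and `cloudtrail_partition_ddl` is a hypothetical Python helper.

```python
from datetime import date
from typing import List


def cloudtrail_partition_ddl(table: str, bucket: str, account_id: str,
                             regions: List[str], day: date) -> List[str]:
    """Build one Athena ADD PARTITION statement per region for a given day.

    Assumes the table is partitioned by (region, year, month, day) as strings.
    IF NOT EXISTS makes re-running the statements for the same day harmless.
    """
    statements = []
    for region in regions:
        location = (
            f"s3://{bucket}/AWSLogs/{account_id}/CloudTrail/{region}/"
            f"{day:%Y}/{day:%m}/{day:%d}/"
        )
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (region='{region}', year='{day:%Y}', month='{day:%m}', day='{day:%d}') "
            f"LOCATION '{location}'"
        )
    return statements
```

For example, `cloudtrail_partition_ddl("cloudtrail_logs", "my-trail-bucket", "123456789012", ["us-east-1"], date(2024, 1, 5))` yields a single statement pointing at `.../CloudTrail/us-east-1/2024/01/05/`. A scheduled job would submit the statements to Athena once per day.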
tokern / lakecli
A CLI to manage and monitor permissions in AWS Lake Formation
aws aws-glue aws-lake-formation permissions sql
Language: Python · 26 stars
vincentclaes / serverless_data_pipeline_example
Build and deploy a serverless data pipeline on AWS
aws aws-glue aws-lambda aws-s3 python serverless-framework
Language: Python · 26 stars
aws-samples / aws-glue-streaming-etl-with-apache-iceberg
Streaming ETL job examples in AWS Glue that integrate Apache Iceberg to create an in-place updatable data lake on Amazon S3
aws-glue apache-iceberg aws-athena apache-spark aws-glue-streaming
Language: Python · 25 stars
andreichiro / data_engineer_end2end
End-to-end data engineer project
ansible-playbook api-rest aws-api-gateway aws-ecs-fargate aws-glue aws-lambda aws-s3 databricks dbt-cloud docker pandas pyspark python sql terraform vitrinedev
Language: HTML · 23 stars
chgasparoto / terraform-aws-glue
Terraform module which creates Glue resources on AWS
aws aws-glue terraform terraform-modules
Language: HCL · 23 stars
mikaelahonen-solita / aws-glue-tutorial
AWS Glue tutorial for data developers.
spark pyspark aws-glue tutorial
Language: Python · 23 stars
moritzkoerber / covid-19-data-engineering-pipeline
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.
aws aws-ecr aws-glue aws-lambda aws-s3 docker great-expectations pyspark spark api aws-cdk aws-redshift aws-cloudformation apache-airflow apache-spark
Language: Python · 23 stars
webysther / aws-glue-docker
🐋 Docker image for AWS Glue Spark/Python
spark aws-glue development docker docker-image dockerfile aws cdk sam aws-cli python-poetry pytest python glue-pyspark data-engineering pandas apache-arrow glue-catalog etl aws-glue-docker
Language: Dockerfile · 23 stars
aws-samples / amazon-rds-export-to-s3-automation
This repository contains source code for the AWS Database Blog post "Reduce data archiving costs for compliance by automating RDS snapshot exports to Amazon S3".
amazon-athena amazon-eventbridge amazon-rds amazon-s3 amazon-sns aws-backup aws-cloudformation aws-glue aws-glue-crawler aws-kms aws-lambda
20 stars
aws-samples / aws-glue-crawler-utilities
This repository has a collection of utilities for Glue Crawlers. These utilities come in the form of AWS CloudFormation templates or AWS CDK applications.
aws-glue aws-glue-crawler
Language: Python · 19 stars
amzn / rheoceros
Cloud-based AI / ML workflow and data application development framework
bring-your-own-account aws ai flow cloud data-science event-based low-code-framework machine-learning feature-engineering aws-glue aws-emr sagemaker-notebook-instance sagemaker-notebook aws-lambda serverless spark pyspark scala-spark
Language: Python · 17 stars
jhole89 / aws-glue-sbt-quickstart
Example of how to set SBT up for local development of AWS Glue Scripts
sbt apache-spark aws-glue quickstart
Language: Scala · 16 stars
jonrau1 / AWS-ComplianceMachineDontStop
Proof-of-value Terraform scripts that use Amazon Web Services (AWS) Security, Identity & Compliance services to support your AWS account security posture.
terraform aws guardduty modules lambda python remediation automation kinesis-firehose waf security-hub aws-glue aws-config devops cloud-security compliance aws-cognito aws-xray devsecops secops
Language: HCL · 16 stars
vincentclaes / glue-devcontainer
Glue VSCode devcontainer setup
aws aws-glue glue vscode
Language: Python · 14 stars
wednesday-solutions / Data-Engineering-Onboarding-Starter
This repository contains a 10-step program to enter the world of data engineering.
aws aws-glue data data-engineering etl glue spark workflow data-template dataengg data-engg-learning data-engineering-starter dataengg-template
Language: Python · 14 stars