There are 0 repositories under the awsglue topic.
Tutorials and examples of how to deploy Presto and connect it to different data sources
This project uses the Trending YouTube Video Statistics dataset from Kaggle, analyzing it and preparing it for use.
This project focuses on real-time data streaming with Kinesis, using Flink for advanced processing and OpenSearch for analytics. The architecture handles the complete data lifecycle, from ingestion to actionable insights.
This project uses an ETL (Extract, Transform, Load) pipeline to extract data from Spotify through its API and load it into a data source (AWS Athena). The entire pipeline will be built using Amazon Web Services (AWS).
Projects on Big Data Using PySpark and AWS
This project demonstrates how to build a downstream data pipeline using dbt on Athena
Create a Glue table using CI/CD
Leverages Apache Kafka to stream real-time data generated by Python and uploads it to S3 using s3fs
Transformed YouTube’s raw JSON data to Parquet and loaded it into an S3 bucket, using the Glue Data Catalog to store metadata and Athena to query the cleaned data. Developed an ETL process using a Lambda job that is triggered when raw data is loaded into an S3 bucket; the data is then processed and stored in an S3 bucket for analytical purposes.
Big Data and Cloud Deployment
This project offers a robust data pipeline solution designed to efficiently extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. Leveraging a blend of industry-standard tools and services, the pipeline ensures seamless data processing and integration.