Dogukan Ulu's repositories
kafka_spark_structured_streaming
Get data from an API, run a scheduled script with Airflow, send the data to Kafka, consume it with Spark, then write it to Cassandra
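
A minimal PySpark sketch of the Kafka-to-Cassandra leg described above, assuming a JSON-encoded topic, a local broker, and the spark-cassandra-connector package on the classpath; the topic, keyspace, and table names are placeholders, not taken from the repository.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType

    spark = SparkSession.builder.appName("kafka_to_cassandra").getOrCreate()

    # Assumed message schema; adjust to the actual payload.
    schema = StructType([StructField("id", StringType()),
                         StructField("name", StringType())])

    stream_df = (spark.readStream
                 .format("kafka")
                 .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
                 .option("subscribe", "users_created")                 # assumed topic
                 .load()
                 .select(from_json(col("value").cast("string"), schema).alias("data"))
                 .select("data.*"))

    def write_to_cassandra(batch_df, batch_id):
        # Requires the spark-cassandra-connector; keyspace/table are placeholders.
        (batch_df.write
         .format("org.apache.spark.sql.cassandra")
         .mode("append")
         .options(keyspace="spark_streams", table="created_users")
         .save())

    (stream_df.writeStream
     .foreachBatch(write_to_cassandra)
     .option("checkpointLocation", "/tmp/checkpoints/kafka_to_cassandra")
     .start()
     .awaitTermination())
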
streaming_data_processing
Create streaming data, send it to Kafka, transform it with PySpark, and write it to Elasticsearch and MinIO
airflow_kafka_cassandra_mongodb
Produce Kafka messages, consume them, and load them into Cassandra and MongoDB.
csv_extract_airflow_docker
Write a CSV file to Postgres, read the table back and modify it, then write more tables to Postgres with Airflow.
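
A possible shape for the CSV-to-Postgres step as a daily Airflow DAG, loading the file with pandas; the connection string, file path, and table name are placeholders, not taken from the repository.

    from datetime import datetime

    import pandas as pd
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from sqlalchemy import create_engine

    def csv_to_postgres():
        # Assumed DSN, path, and table name.
        engine = create_engine("postgresql://airflow:airflow@postgres:5432/airflow")
        df = pd.read_csv("/opt/airflow/data/sample.csv")
        df.to_sql("sample_table", engine, if_exists="replace", index=False)

    with DAG(
        dag_id="csv_extract_airflow_docker",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="csv_to_postgres", python_callable=csv_to_postgres)
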
docker-airflow
Apache Airflow running in Docker
crypto_api_kafka_airflow_streaming
Get crypto data from an API and stream it to Kafka with Airflow, then write the data to MySQL and visualize it with Metabase
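
A small sketch of the API-to-Kafka step using requests and kafka-python; the endpoint URL, broker address, and topic name are placeholders, not taken from the repository.

    import json

    import requests
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",                        # assumed broker
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Placeholder endpoint; poll it on the Airflow schedule.
    response = requests.get("https://api.example.com/crypto/prices", timeout=10)
    producer.send("crypto_prices", value=response.json())          # assumed topic
    producer.flush()
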
aws_end_to_end_streaming_pipeline
An end-to-end AWS data engineering project (Glue, Lambda, Kinesis, Redshift, QuickSight, Athena, EC2, S3)
glue_etl_job_data_catalog_s3
A Glue ETL job (or EMR Spark job) that reads from the Glue Data Catalog, modifies the data, and uploads it to S3 and the Data Catalog
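
A skeleton of such a Glue ETL script using the standard awsglue job boilerplate; the database, table, column, and bucket names are placeholders, and the Data Catalog update step is omitted.

    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read from the Glue Data Catalog (placeholder database/table names).
    dyf = glue_context.create_dynamic_frame.from_catalog(
        database="example_db", table_name="example_table")

    # Placeholder transformation: drop an unused column.
    cleaned = dyf.drop_fields(["unused_column"])

    # Write back to S3 as Parquet; catalog registration (crawler or
    # catalog-enabled sink) is not shown here.
    glue_context.write_dynamic_frame.from_options(
        frame=cleaned,
        connection_type="s3",
        connection_options={"path": "s3://example-bucket/output/"},
        format="parquet",
    )
    job.commit()
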
parquet_gcs_bucket_to_bigquery_table
Parquet files are fetched regularly from a public GCS bucket and written to a BigQuery table
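
A minimal sketch of the GCS-to-BigQuery load using the google-cloud-bigquery client; the bucket URI and the project, dataset, and table names are placeholders, not taken from the repository.

    from google.cloud import bigquery

    client = bigquery.Client()

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    load_job = client.load_table_from_uri(
        "gs://public-example-bucket/data/*.parquet",  # placeholder GCS path
        "my-project.my_dataset.parquet_events",       # placeholder table id
        job_config=job_config,
    )
    load_job.result()  # wait for the load to finish
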
kaggle_projects
Machine learning models for various Kaggle competitions
s3_trigger_lambda_to_rds
Automatically send a dataframe to S3, trigger a Lambda function that modifies the dataframe, and upload the result to RDS
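
A hypothetical Lambda handler sketch for the S3-trigger step: read the object that fired the event, clean it with pandas, and write it to an RDS MySQL table. The connection string, table name, and transformation are placeholders (pandas and SQLAlchemy would need to be packaged as a layer).

    import io

    import boto3
    import pandas as pd
    from sqlalchemy import create_engine

    s3 = boto3.client("s3")
    # Assumed RDS DSN.
    engine = create_engine("mysql+pymysql://admin:password@my-rds-host:3306/mydb")

    def lambda_handler(event, context):
        record = event["Records"][0]["s3"]
        bucket, key = record["bucket"]["name"], record["object"]["key"]

        obj = s3.get_object(Bucket=bucket, Key=key)
        df = pd.read_csv(io.BytesIO(obj["Body"].read()))

        df = df.dropna()  # placeholder for the actual modifications
        df.to_sql("cleaned_data", engine, if_exists="replace", index=False)
        return {"rows_loaded": len(df)}
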
send_data_to_aws_services
Automates the process of sending remote data to AWS services such as Kinesis and S3
dogukannulu
My personal repo
csv_to_kinesis_streams
Writes a CSV file to Amazon Kinesis Data Streams
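
A minimal sketch of pushing CSV rows into a Kinesis data stream with boto3; the region, stream name, file path, and partition key column are placeholders, not taken from the repository.

    import csv
    import json

    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")  # assumed region

    with open("sample.csv", newline="") as f:                   # placeholder file
        for row in csv.DictReader(f):
            kinesis.put_record(
                StreamName="csv_stream",                        # assumed stream
                Data=json.dumps(row).encode("utf-8"),
                PartitionKey=str(row.get("id", "default")),     # assumed key column
            )
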
twitter_etl_s3
Get data via the Twitter API, orchestrate with Airflow, and store it in an S3 bucket
amazon_msk_kafka_streaming
Create a Kafka topic, stream data through a producer, and consume it on the console using Amazon MSK
data-generator
Generates data from an existing dataset into a file, or produces dataset rows as messages to Kafka in a streaming manner.
IBM-Data-Science-Capstone-Project
Created for the IBM Data Science Professional Certificate capstone project
read_from_s3_upload_to_rds
Upload remote data into Amazon S3, then read it and upload it to Amazon RDS MySQL
prefect-example-flows
Create sample Prefect flows, deploy them as Docker containers, and store them in GitHub
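
A tiny Prefect 2 flow of the kind such a repo might contain; the flow name, task names, and toy logic are placeholders, not taken from the repository.

    from prefect import flow, task

    @task
    def extract() -> list[int]:
        # Placeholder extract step.
        return [1, 2, 3]

    @task
    def load(values: list[int]) -> None:
        print(f"loaded {len(values)} values")

    @flow(name="example-etl")
    def example_etl() -> None:
        load(extract())

    if __name__ == "__main__":
        example_etl()
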
snowpipe-aws-stream-processing
Get streaming data from an S3 bucket via an SQS queue, load it into Snowflake with Snowpipe, and modify the data with a Snowflake task