Dogukan Ulu's repositories
kafka_spark_structured_streaming
Get data from an API, run a scheduled script with Airflow, send the data to Kafka, consume it with Spark, then write it to Cassandra
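
A minimal PySpark sketch of the Kafka-to-Cassandra leg described above, assuming a JSON-encoded topic, a local broker, and the spark-cassandra-connector package on the classpath; the topic, keyspace, and table names are placeholders, not taken from the repository.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType

    spark = SparkSession.builder.appName("kafka_to_cassandra").getOrCreate()

    # Assumed message schema; adjust to the actual payload.
    schema = StructType([StructField("id", StringType()),
                         StructField("name", StringType())])

    stream_df = (spark.readStream
                 .format("kafka")
                 .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
                 .option("subscribe", "users_created")                 # assumed topic
                 .load()
                 .select(from_json(col("value").cast("string"), schema).alias("data"))
                 .select("data.*"))

    def write_to_cassandra(batch_df, batch_id):
        # Requires the spark-cassandra-connector; keyspace/table are placeholders.
        (batch_df.write
         .format("org.apache.spark.sql.cassandra")
         .mode("append")
         .options(keyspace="spark_streams", table="created_users")
         .save())

    (stream_df.writeStream
     .foreachBatch(write_to_cassandra)
     .option("checkpointLocation", "/tmp/checkpoints/kafka_to_cassandra")
     .start()
     .awaitTermination())
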
streaming_data_processing
Create streaming data, send it to Kafka, transform it with PySpark, and write it to Elasticsearch and MinIO
airflow_kafka_cassandra_mongodb
Produce Kafka messages, consume them, and load them into Cassandra and MongoDB.
csv_extract_airflow_docker
Write a CSV file to Postgres, read the table back and modify it, then write more tables to Postgres with Airflow.
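
A possible shape for the CSV-to-Postgres step as a daily Airflow DAG, loading the file with pandas; the connection string, file path, and table name are placeholders, not taken from the repository.

    from datetime import datetime

    import pandas as pd
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from sqlalchemy import create_engine

    def csv_to_postgres():
        # Assumed DSN, path, and table name.
        engine = create_engine("postgresql://airflow:airflow@postgres:5432/airflow")
        df = pd.read_csv("/opt/airflow/data/sample.csv")
        df.to_sql("sample_table", engine, if_exists="replace", index=False)

    with DAG(
        dag_id="csv_extract_airflow_docker",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="csv_to_postgres", python_callable=csv_to_postgres)
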
docker-airflow
Apache Airflow running in Docker
crypto_api_kafka_airflow_streaming
Get crypto data from an API and stream it to Kafka with Airflow, then write the data to MySQL and visualize it with Metabase
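
A small sketch of the API-to-Kafka step using requests and kafka-python; the endpoint URL, broker address, and topic name are placeholders, not taken from the repository.

    import json

    import requests
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",                        # assumed broker
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Placeholder endpoint; poll it on the Airflow schedule.
    response = requests.get("https://api.example.com/crypto/prices", timeout=10)
    producer.send("crypto_prices", value=response.json())          # assumed topic
    producer.flush()
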
aws_end_to_end_streaming_pipeline
An end-to-end AWS data engineering project (Glue, Lambda, Kinesis, Redshift, QuickSight, Athena, EC2, S3)
glue_etl_job_data_catalog_s3
A Glue ETL job (or EMR Spark job) that reads from the Glue Data Catalog, modifies the data, and uploads it to S3 and the Data Catalog
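
A skeleton of such a Glue ETL script using the standard awsglue job boilerplate; the database, table, column, and bucket names are placeholders, and the Data Catalog update step is omitted.

    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read from the Glue Data Catalog (placeholder database/table names).
    dyf = glue_context.create_dynamic_frame.from_catalog(
        database="example_db", table_name="example_table")

    # Placeholder transformation: drop an unused column.
    cleaned = dyf.drop_fields(["unused_column"])

    # Write back to S3 as Parquet; catalog registration (crawler or
    # catalog-enabled sink) is not shown here.
    glue_context.write_dynamic_frame.from_options(
        frame=cleaned,
        connection_type="s3",
        connection_options={"path": "s3://example-bucket/output/"},
        format="parquet",
    )
    job.commit()
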
parquet_gcs_bucket_to_bigquery_table
Parquet files are fetched regularly from a public GCS bucket and written to a BigQuery table
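
A minimal sketch of the GCS-to-BigQuery load using the google-cloud-bigquery client; the bucket URI and the project, dataset, and table names are placeholders, not taken from the repository.

    from google.cloud import bigquery

    client = bigquery.Client()

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    load_job = client.load_table_from_uri(
        "gs://public-example-bucket/data/*.parquet",  # placeholder GCS path
        "my-project.my_dataset.parquet_events",       # placeholder table id
        job_config=job_config,
    )
    load_job.result()  # wait for the load to finish
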
kaggle_projects
Machine learning models for various Kaggle competitions
s3_trigger_lambda_to_rds
Automatically send a dataframe to S3, trigger a Lambda function that modifies the dataframe, and upload the result to RDS
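
A hypothetical Lambda handler sketch for the S3-trigger step: read the object that fired the event, clean it with pandas, and write it to an RDS MySQL table. The connection string, table name, and transformation are placeholders (pandas and SQLAlchemy would need to be packaged as a layer).

    import io

    import boto3
    import pandas as pd
    from sqlalchemy import create_engine

    s3 = boto3.client("s3")
    # Assumed RDS DSN.
    engine = create_engine("mysql+pymysql://admin:password@my-rds-host:3306/mydb")

    def lambda_handler(event, context):
        record = event["Records"][0]["s3"]
        bucket, key = record["bucket"]["name"], record["object"]["key"]

        obj = s3.get_object(Bucket=bucket, Key=key)
        df = pd.read_csv(io.BytesIO(obj["Body"].read()))

        df = df.dropna()  # placeholder for the actual modifications
        df.to_sql("cleaned_data", engine, if_exists="replace", index=False)
        return {"rows_loaded": len(df)}
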
send_data_to_aws_services
Automates the process of sending remote data to AWS services such as Kinesis and S3
dogukannulu
My personal repo
csv_to_kinesis_streams
Writes a CSV file to Amazon Kinesis Data Streams
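
A minimal sketch of pushing CSV rows into a Kinesis data stream with boto3; the region, stream name, file path, and partition key column are placeholders, not taken from the repository.

    import csv
    import json

    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")  # assumed region

    with open("sample.csv", newline="") as f:                   # placeholder file
        for row in csv.DictReader(f):
            kinesis.put_record(
                StreamName="csv_stream",                        # assumed stream
                Data=json.dumps(row).encode("utf-8"),
                PartitionKey=str(row.get("id", "default")),     # assumed key column
            )
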
twitter_etl_s3
Get data via the Twitter API, orchestrate with Airflow, and store it in an S3 bucket
amazon_msk_kafka_streaming
Create a Kafka topic, stream data through a producer, and consume it on the console using Amazon MSK
data-generator
Generates data from an existing dataset into a file, or produces dataset rows as messages to Kafka in a streaming manner.
IBM-Data-Science-Capstone-Project
Created for the IBM Data Science Professional Certificate capstone project
read_from_s3_upload_to_rds
Upload remote data into Amazon S3, then read it and upload it to Amazon RDS MySQL
prefect-example-flows
Create sample Prefect flows, deploy them as Docker containers, and store them in GitHub
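
A tiny Prefect 2 flow of the kind such a repo might contain; the flow name, task names, and toy logic are placeholders, not taken from the repository.

    from prefect import flow, task

    @task
    def extract() -> list[int]:
        # Placeholder extract step.
        return [1, 2, 3]

    @task
    def load(values: list[int]) -> None:
        print(f"loaded {len(values)} values")

    @flow(name="example-etl")
    def example_etl() -> None:
        load(extract())

    if __name__ == "__main__":
        example_etl()
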
snowpipe-aws-stream-processing
Get streaming data from an S3 bucket via an SQS queue, load it into Snowflake with Snowpipe, and modify the data with a Snowflake task