Salman AlSuwaina's repositories
Data_Lake
In this project, we use Spark to build an ETL pipeline for a data lake hosted on S3: load the data from S3, process it into analytics tables with Spark, and write those tables back to S3. The Spark job is deployed on a cluster using AWS.
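The load-process-write flow can be sketched without a cluster. The snippet below simulates the three ETL steps with only the standard library, using invented JSON song records; in the actual project each step would run through a `pyspark.sql.SparkSession` against S3, as noted in the comments.

```python
import json
import io

# Hypothetical raw records as they might arrive from S3 (assumption:
# one JSON object per line; the field names here are illustrative).
raw = io.StringIO(
    '{"song_id": "S1", "title": "A", "artist_id": "AR1", "year": 2001, "duration": 200.5}\n'
    '{"song_id": "S1", "title": "A", "artist_id": "AR1", "year": 2001, "duration": 200.5}\n'
    '{"song_id": "S2", "title": "B", "artist_id": "AR2", "year": 1999, "duration": 180.0}\n'
)

# Extract: parse each line into a record (Spark: spark.read.json("s3a://...")).
records = [json.loads(line) for line in raw]

# Transform: keep the songs-table columns and drop duplicate song_ids
# (Spark: df.select(...).dropDuplicates(["song_id"])).
seen, songs_table = set(), []
for r in records:
    if r["song_id"] not in seen:
        seen.add(r["song_id"])
        songs_table.append({k: r[k] for k in ("song_id", "title", "artist_id", "year", "duration")})

# Load: group rows by a partition key, mirroring parquet output partitioned
# by year and artist (Spark: df.write.partitionBy("year", "artist_id").parquet(...)).
partitions = {}
for row in songs_table:
    partitions.setdefault((row["year"], row["artist_id"]), []).append(row)
```

The stdlib version is only a stand-in for the logic; Spark performs the same select/dedupe/partition steps distributed across the cluster.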
Data_Modeling_Apache-cassandra
Applying data modeling to a NoSQL database with Apache Cassandra and building an ETL pipeline using Python. The data is modeled by creating tables in Apache Cassandra designed around the queries they need to serve.
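Cassandra modeling is query-first: each table is shaped around one query, with a partition key that localizes that query to a single partition. The sketch below shows a hypothetical table of that shape (the CQL string and a stdlib dict standing in for the Cassandra storage model; names and rows are invented for illustration).

```python
# Hypothetical target query: "all songs heard in a given session, in play
# order" -> partition key session_id, clustering column item_in_session.
create_cql = """
CREATE TABLE IF NOT EXISTS session_songs (
    session_id int,
    item_in_session int,
    artist text,
    song text,
    PRIMARY KEY (session_id, item_in_session)
)
"""

# Stdlib stand-in for the table: rows grouped by partition key and kept
# sorted by the clustering column, which is how Cassandra lays them out.
table = {}

def insert(session_id, item_in_session, artist, song):
    partition = table.setdefault(session_id, [])
    partition.append((item_in_session, artist, song))
    partition.sort()  # clustering order within the partition

insert(338, 1, "Faithless", "Music Matters")
insert(338, 0, "Des'ree", "You Gotta Be")
insert(42, 0, "The Cure", "Lovesong")

# The target query reads exactly one partition -- the access pattern
# the table was modeled for.
rows = table[338]
```

Because the table matches the query, there is no join or full scan: the session id selects one partition and the clustering column returns the rows already ordered.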
Data_Modeling_PostGres
Modeling the data with Postgres and building an ETL pipeline using Python. I define fact and dimension tables for a star schema around a particular analytic focus, and write an ETL pipeline that transfers data from files in two local directories into these tables in Postgres using Python and SQL.
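The fact/dimension split can be shown in a few lines. Below is a minimal star-schema sketch using `sqlite3` in place of Postgres so it runs self-contained; the table and column names are illustrative, not the project's exact schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables describe the who/what of each event.
cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT)")

# The fact table records the events themselves and points at the dimensions.
cur.execute("""
CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id),
    song_id TEXT REFERENCES songs(song_id)
)""")

cur.execute("INSERT INTO users VALUES (1, 'Lily')")
cur.execute("INSERT INTO songs VALUES ('S1', 'Blue Train')")
cur.execute("INSERT INTO songplays VALUES (100, 1, 'S1')")

# An analytic query joins the fact table out to its dimensions.
cur.execute("""
SELECT u.name, s.title
FROM songplays p
JOIN users u ON p.user_id = u.user_id
JOIN songs s ON p.song_id = s.song_id
""")
result = cur.fetchall()
```

The star shape keeps analytic queries simple: one central fact table, one join per dimension.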
Data_Warehouses_AWS
Applying data warehousing tools and AWS to build an ETL pipeline for a database hosted on Redshift: loading data from an AWS S3 bucket into staging tables on Redshift, then executing SQL statements that build the analytics tables from those staging tables.
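The staging-then-transform pattern can be sketched locally. Here `sqlite3` stands in for Redshift and plain INSERTs stand in for the `COPY ... FROM 's3://...'` load step; the `INSERT INTO ... SELECT` at the end is the same pattern the Redshift SQL would use. Table names and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Staging table: raw events land here first (on Redshift, via COPY from S3).
cur.execute("CREATE TABLE staging_events (user_id INTEGER, song TEXT, page TEXT)")
cur.executemany(
    "INSERT INTO staging_events VALUES (?, ?, ?)",
    [(1, "Blue Train", "NextSong"),
     (1, "Blue Train", "Home"),       # non-play event, filtered out below
     (2, "So What", "NextSong")],
)

# Analytics table built from staging with INSERT INTO ... SELECT --
# filtering and reshaping happen inside the warehouse, in SQL.
cur.execute("CREATE TABLE songplays (user_id INTEGER, song TEXT)")
cur.execute("""
INSERT INTO songplays (user_id, song)
SELECT user_id, song FROM staging_events WHERE page = 'NextSong'
""")

cur.execute("SELECT COUNT(*) FROM songplays")
plays = cur.fetchone()[0]
```

Splitting load and transform this way keeps the raw data queryable in staging while the analytics tables hold only the cleaned, play-event rows.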
Machine_Learning_Arrival_prediction
Analyzing a dataset of hospital patients that records whether each patient showed up for their booked appointment. The goal of this project is to build a machine learning model that predicts, from a patient's information, whether they will arrive.
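A toy version of the prediction task, using a hand-rolled logistic regression so the sketch stays dependency-free (the real project would more likely use a library such as scikit-learn). The features and data are invented: each row is `[days_waited, received_sms]` and the label is 1 if the patient arrived.

```python
import math

# Invented training data: short waits with an SMS reminder -> arrived (1),
# long waits without one -> no-show (0).
X = [[1, 1], [2, 1], [3, 1], [20, 0], [25, 0], [30, 0]]
y = [1, 1, 1, 0, 0, 0]

w, b, lr = [0.0, 0.0], 0.0, 0.1

def predict_prob(x):
    # Sigmoid of the linear score: probability the patient arrives.
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# Plain stochastic gradient descent on the log-loss.
for _ in range(2000):
    for xi, yi in zip(X, y):
        err = predict_prob(xi) - yi
        for j in range(len(w)):
            w[j] -= lr * err * xi[j]
        b -= lr * err

short_wait = predict_prob([2, 1])    # should lean toward "arrives"
long_wait = predict_prob([28, 0])    # should lean toward "no-show"
```

On the real dataset the same idea applies at scale: encode the patient's attributes as features, fit a classifier, and read the predicted probability of arrival.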