SrishtiSingh3895

Srishti Singh's repositories

USImmigrationDataLakeETL

This project creates a data lake for US immigration data and develops an ETL pipeline that builds the data lake from various data sources. It was completed as part of Udacity's Data Engineering Nanodegree program.

Language: Jupyter Notebook

DataLakeWithSpark_Udacity

This project creates a data lake using AWS S3, EMR and Spark to build an ETL pipeline for a music database.

Language: Python

DataModelingWithApacheCassandra_Udacity

Modeled song data using Apache Cassandra and designed an ETL pipeline.

Language: Jupyter Notebook

DataModelingWithPostgres_Udacity

This project was done as a part of Udacity's Data Engineering Nanodegree Program.

Language: Jupyter Notebook

DataPipelinesWithAirflow_Udacity

Created the DAGs and designed an ETL pipeline using custom operators to perform tasks such as staging the data, filling the data warehouse, and running checks on the data as the final step. This project was completed as a part of Udacity's Data Engineering Nanodegree.

Language: Python

DataWarehousingOnAWS_Udacity

This project builds a data warehouse using Amazon Redshift and S3, and was completed as a part of Udacity's Data Engineering Nanodegree program.

Language: Python

FrequentPatternMining

Comparative study of frequent pattern mining algorithms on the Adult Census dataset

Language: Jupyter Notebook

Hackathon-Summer-2020

Data and details for the University of Rochester Biomedical Data Science Hackathon

Language: HTML
License: GPL-3.0

Online-music-streaming-app

In this project we created a music database that could form part of a much larger online music streaming application. The database records all songs and their properties, as well as all artists and their details. It also keeps track of all users, their playlists, and the songs in those playlists. The idea is to capture each user's taste in music by storing song details such as song name, genre, artist, and the number of times the user listened to a song, so that analysts can use this data to design a recommender system and improve the song base. The amount of data that can be collected for a music library is quite large. For this project, we used two Kaggle datasets of Spotify top tracks and artists to create the database, and generated synthetic data for user details, playlists, and similar records. A database management system is necessary for this application because the data takes up a large amount of space and many users access it at the same time from various locations. The database is administered by admins, who can add or delete songs as well as users from their region. There are therefore two login pages, one for users and one for admins, with different functions associated with each account type. Admin functions include inserting new songs, deleting old songs, and deleting and monitoring users. User functions include viewing or searching songs and artists, creating playlists, and deleting playlists.
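The entities described above (songs, artists, users, playlists, and per-user listen counts) could be laid out roughly as in the following sketch. This is an illustrative assumption and not the repository's actual schema: the table names, column names, and the helper functions `create_schema`, `record_listen`, and `top_genres` are all hypothetical, and SQLite stands in for whichever database system the project used.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create hypothetical tables for artists, songs, users, playlists, and listens."""
    conn.executescript(
        """
        CREATE TABLE artists (
            artist_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE songs (
            song_id   INTEGER PRIMARY KEY,
            title     TEXT NOT NULL,
            genre     TEXT,
            artist_id INTEGER NOT NULL REFERENCES artists(artist_id)
        );
        CREATE TABLE users (
            user_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            region  TEXT
        );
        CREATE TABLE playlists (
            playlist_id INTEGER PRIMARY KEY,
            user_id     INTEGER NOT NULL REFERENCES users(user_id),
            name        TEXT NOT NULL
        );
        -- Many-to-many link between playlists and songs.
        CREATE TABLE playlist_songs (
            playlist_id INTEGER NOT NULL REFERENCES playlists(playlist_id),
            song_id     INTEGER NOT NULL REFERENCES songs(song_id),
            PRIMARY KEY (playlist_id, song_id)
        );
        -- Number of times each user listened to each song.
        CREATE TABLE listens (
            user_id    INTEGER NOT NULL REFERENCES users(user_id),
            song_id    INTEGER NOT NULL REFERENCES songs(song_id),
            play_count INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (user_id, song_id)
        );
        """
    )


def record_listen(conn: sqlite3.Connection, user_id: int, song_id: int) -> None:
    """Increment the play count for (user, song), creating the row if needed."""
    cur = conn.execute(
        "UPDATE listens SET play_count = play_count + 1 "
        "WHERE user_id = ? AND song_id = ?",
        (user_id, song_id),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO listens (user_id, song_id, play_count) VALUES (?, ?, 1)",
            (user_id, song_id),
        )


def top_genres(conn: sqlite3.Connection, user_id: int) -> list:
    """Return (genre, total plays) pairs for one user, most played first."""
    return conn.execute(
        "SELECT s.genre, SUM(l.play_count) AS plays "
        "FROM listens AS l JOIN songs AS s ON s.song_id = l.song_id "
        "WHERE l.user_id = ? "
        "GROUP BY s.genre ORDER BY plays DESC, s.genre",
        (user_id,),
    ).fetchall()
```

A query such as `top_genres` is the kind of aggregate an analyst might feed into a recommender system; the link table keeps playlists and songs in a many-to-many relationship without duplicating song details.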


SpotifyDataAnalysis

Analysis of the popularity of the top 100 Spotify songs based on their musical attributes

Language: Python