Nitish Bhardwaj's starred repositories
spark-big-data
Spark with Scala. Big data project to analyze 35 GB Parquet data (~400 GB as decompressed CSV) and extract business insights from it
aws-data-pipeline
A batch processing data pipeline using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform and orchestrated from locally hosted Airflow containers. The end product is a Superset dashboard and a Postgres database, hosted on an EC2 instance (powered down).
real-time-streaming-analytics-application-using-apache-kafka
Sample code repository to build a real-time streaming analytics application using Apache Kafka on AWS
prioritizing-event-processing-with-apache-kafka
Technical solution to implement event processing prioritization with Apache Kafka using the concept of buckets.
realtime-dynamodb-zero-etl-opensearch-visualization
In the fast-paced world of data-driven decision making, real-time insights are crucial for staying ahead of the competition. Amazon OpenSearch Service and Amazon DynamoDB offer a powerful combination that enables organizations to visualize and analyze data in near real-time, without the need for complex Extract, Transform, Load (ETL) processes
real-time-gaming-leaderboard-apache-flink
Example gaming leaderboard application covering streaming ingestion, CDC enrichment, processing and visualisation, including a demo of advanced real-time analytics concepts such as late data arrival, exactly-once, dynamic config, archival and on-demand replay
Azure-Databricks-NYC-Taxi-Workshop
An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset
flink-recommandSystem-demo
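:helicopter::rocket: A real-time product recommendation system built on Flink. Flink computes product popularity and stores it in a Redis cache, analyzes log data, and writes profile tags and real-time records to HBase. When a user issues a recommendation request, the popularity ranking is re-sorted according to the user's profile, and the collaborative filtering and tag-based recommendation modules attach related products to each product in the newly generated ranking. Finally, the new user list is returned.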
rust-data-engineering
Code for a Duke Coursera Rust-based data engineering course
kafka-python
Python client for Apache Kafka
Several-Coding-Patterns-for-Solving-Data-Structures-and-Algorithms-Problems-during-Interviews
Several Coding Patterns for Solving Data Structures and Algorithms Problems during Interviews
simplebank
Backend master class: build a simple bank service in Go
build-your-own-x
Master programming by recreating your favorite technologies from scratch.
data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
Udacity-Data-Engineering-Projects
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
HashtagCashtag
My Insight Data Engineering Fellowship project. I implemented a big data processing pipeline based on lambda architecture that aggregates Twitter and US stock market data for user sentiment analysis using open source tools: Apache Kafka for data ingestion, Apache Spark & Spark Streaming for batch & real-time processing, Apache Cassandra for storage, and Flask, Bootstrap and HighCharts for the frontend.
DataEngineeringProjects
Data Engineering Projects
covid19_23
Reporting of COVID-19 cases, deaths, hospital and ICU occupancies till week 47 of 2023 using a data engineering pipeline in Azure, visualized in PowerBI
book-product-data-pipeline-project
Automate ETL pipeline, build a data warehouse.
TTC-delay-analytics
Analyzing TTC bus, subway & streetcar delay data to identify delay hotspots, root causes, and find valuable insights by creating an ETL pipeline and a dashboard