Nitish Bhardwaj's starred repositories

spark-big-data

Spark with Scala. Big data project to analyze 35 GB Parquet data (~400 GB as decompressed CSV) and extract business insights from it

Language: Scala, Stargazers: 2, Issues: 0

aws-data-pipeline

A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from locally hosted Airflow containers. The end product is a Superset dashboard and a Postgres database, hosted on an EC2 instance at this address (powered down):

Language: Python, Stargazers: 20, Issues: 0

test_db

A sample MySQL database with an integrated test suite, used to test your applications and database servers

Language: Shell, Stargazers: 3943, Issues: 0

real-time-streaming-analytics-application-using-apache-kafka

Sample code repository to build a real-time streaming analytics application using Apache Kafka on AWS

Language: Java, License: MIT-0, Stargazers: 2, Issues: 0

prioritizing-event-processing-with-apache-kafka

Technical solution to implement event processing prioritization with Apache Kafka using the concept of buckets.

Language: Java, License: MIT-0, Stargazers: 19, Issues: 0
Language: Java, License: MIT-0, Stargazers: 1, Issues: 0

realtime-dynamodb-zero-etl-opensearch-visualization

In the fast-paced world of data-driven decision making, real-time insights are crucial for staying ahead of the competition. Amazon OpenSearch Service and Amazon DynamoDB offer a powerful combination that enables organizations to visualize and analyze data in near real-time, without the need for complex Extract, Transform, Load (ETL) processes

Language: Python, License: MIT-0, Stargazers: 3, Issues: 0

real-time-gaming-leaderboard-apache-flink

Example gaming leaderboard application covering streaming ingestion, CDC enrichment, processing, and visualisation, including a demo of advanced real-time analytics concepts such as late data arrival, exactly-once processing, dynamic config, archival, and on-demand replay

Language: TypeScript, License: MIT-0, Stargazers: 9, Issues: 0
Language: Shell, Stargazers: 1000, Issues: 0

nyc-taxi

Data Engineering Project (Live Dashboard) and On-the-fly ML Prediction using Kafka, Spark, ElasticSearch and HDFS

Language: Jupyter Notebook, Stargazers: 8, Issues: 0

Azure-Databricks-NYC-Taxi-Workshop

An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset

Language: Scala, License: MIT, Stargazers: 104, Issues: 0

flink-recommandSystem-demo

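:helicopter::rocket: A real-time product recommendation system built on Flink. Flink computes product popularity and caches it in Redis, analyzes log data, and stores profile tags and real-time records in HBase. When a user requests recommendations, the popularity ranking is re-sorted according to the user's profile, the collaborative-filtering and tag-based recommendation modules attach related products to every product in the newly generated ranking, and the new list is returned to the user.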

Language: Java, Stargazers: 4221, Issues: 0

rust-data-engineering

Code for a Duke Coursera Rust-based data engineering course

Language: Rust, License: NOASSERTION, Stargazers: 106, Issues: 0
Language: Java, License: Apache-2.0, Stargazers: 172, Issues: 0

kafka-python

Python client for Apache Kafka

Language: Python, License: Apache-2.0, Stargazers: 5526, Issues: 0

Several-Coding-Patterns-for-Solving-Data-Structures-and-Algorithms-Problems-during-Interviews

Several Coding Patterns for Solving Data Structures and Algorithms Problems during Interviews

Stargazers: 1324, Issues: 0

cheat.sh

the only cheat sheet you need

Language: Python, License: MIT, Stargazers: 37832, Issues: 0

simplebank

Backend master class: build a simple bank service in Go

Language: Go, License: MIT, Stargazers: 4735, Issues: 0

flink

Apache Flink

Language: Java, License: Apache-2.0, Stargazers: 23517, Issues: 0

dragonfly

A modern replacement for Redis and Memcached

Language: C++, License: NOASSERTION, Stargazers: 24538, Issues: 0

build-your-own-x

Master programming by recreating your favorite technologies from scratch.

Stargazers: 284365, Issues: 0

data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

Stargazers: 9806, Issues: 0

Udacity-Data-Engineering-Projects

A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing, and Data Lake development.

Language: Python, License: NOASSERTION, Stargazers: 1410, Issues: 0

F1-Racing

The project aims to process Formula 1 racing data, create an automated data pipeline, and make the data available for presentation and analysis purposes.

Language: Python, Stargazers: 5, Issues: 0

HashtagCashtag

My Insight Data Engineering Fellowship project. I implemented a big data processing pipeline based on the lambda architecture that aggregates Twitter and US stock market data for user sentiment analysis using open source tools: Apache Kafka for data ingestion, Apache Spark and Spark Streaming for batch and real-time processing, Apache Cassandra for storage, and Flask, Bootstrap, and HighCharts for the frontend.

Language: Scala, Stargazers: 461, Issues: 0

DataEngineeringProjects

Data Engineering Projects

Language: Jupyter Notebook, Stargazers: 2, Issues: 0

dp203

Exam DP-203: Data Engineering on Microsoft Azure Crash Course

Language: Jupyter Notebook, License: MIT, Stargazers: 70, Issues: 0

covid19_23

Reporting of COVID-19 cases, deaths, hospital and ICU occupancies till week 47 of 2023 using a data engineering pipeline in Azure, visualized in PowerBI

Language: Python, Stargazers: 1, Issues: 0

book-product-data-pipeline-project

Automate ETL pipeline, build a data warehouse.

Language: Python, Stargazers: 8, Issues: 0

TTC-delay-analytics

Analyzing TTC bus, subway & streetcar delay data to identify delay hotspots, root causes, and find valuable insights by creating an ETL pipeline and a dashboard

Language: Jupyter Notebook, Stargazers: 3, Issues: 0