Nitish3110

Nitish Bhardwaj's starred repositories

spark-big-data

Spark with Scala. Big data project to analyze 35 GB Parquet data (~400 GB as decompressed CSV) and extract business insights from it

Language:Scala200

A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from locally hosted Airflow containers. The end product is a Superset dashboard and a Postgres database, hosted on an EC2 instance at this address (powered down):

Language:Python2000

test_db

A sample MySQL database with an integrated test suite, used to test your applications and database servers

Language:Shell394300

real-time-streaming-analytics-application-using-apache-kafka

Sample code repository to build a real-time streaming analytics application using Apache Kafka on AWS

Language:JavaMIT-0200

prioritizing-event-processing-with-apache-kafka

Technical solution to implement event processing prioritization with Apache Kafka using the concept of buckets.

Language:JavaMIT-01900

understanding-apache-kafka-partitions

Language:JavaMIT-0100

realtime-dynamodb-zero-etl-opensearch-visualization

In the fast-paced world of data-driven decision making, real-time insights are crucial for staying ahead of the competition. Amazon OpenSearch Service and Amazon DynamoDB offer a powerful combination that enables organizations to visualize and analyze data in near real-time, without the need for complex Extract, Transform, Load (ETL) processes

Language:PythonMIT-0300

real-time-gaming-leaderboard-apache-flink

Example gaming leaderboard application covering streaming ingestion, CDC enrichment, processing, and visualisation, including a demo of advanced real-time analytics concepts such as late data arrival, exactly-once processing, dynamic config, archival, and on-demand replay

Language:TypeScriptMIT-0900

docker-hive

Language:Shell100000

nyc-taxi

Data Engineering Project (Live Dashboard) and On-the-fly ML Prediction using Kafka, Spark, ElasticSearch and HDFS

Language:Jupyter Notebook800

Azure-Databricks-NYC-Taxi-Workshop

An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset

Language:ScalaMIT10400

flink-recommandSystem-demo

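:helicopter::rocket: A real-time product recommendation system built on Flink. Flink computes product popularity and caches it in Redis, analyzes log data, and stores profile tags and real-time records in HBase. When a user issues a recommendation request, the popularity ranking is re-sorted according to the user's profile, and the collaborative filtering and tag-based recommendation modules attach related products to each product in the newly generated ranking, which is then returned as the new list for the user.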

Language:Java422100

rust-data-engineering

Code for a Duke Coursera Rust-based data engineering course

Language:RustNOASSERTION10600

flink-tutorials

Language:JavaApache-2.017200

kafka-python

Python client for Apache Kafka

Language:PythonApache-2.0552600

Several-Coding-Patterns-for-Solving-Data-Structures-and-Algorithms-Problems-during-Interviews

Several Coding Patterns for Solving Data Structures and Algorithms Problems during Interviews

132400

cheat.sh

the only cheat sheet you need

Language:PythonMIT3783200

simplebank

Backend master class: build a simple bank service in Go

Language:GoMIT473500

flink

Apache Flink

Language:JavaApache-2.02351700

dragonfly

A modern replacement for Redis and Memcached

Language:C++NOASSERTION2453800

build-your-own-x

Master programming by recreating your favorite technologies from scratch.

28436500

data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

980600

Udacity-Data-Engineering-Projects

A few projects related to data engineering, including data modeling, infrastructure setup on the cloud, data warehousing, and data lake development.

Language:PythonNOASSERTION141000

F1-Racing

The project aims to process Formula 1 racing data, create an automated data pipeline, and make the data available for presentation and analysis purposes.

Language:Python500

HashtagCashtag

My Insight Data Engineering Fellowship project. I implemented a big data processing pipeline based on the lambda architecture that aggregates Twitter and US stock market data for user sentiment analysis using open source tools - Apache Kafka for data ingestion, Apache Spark & Spark Streaming for batch & real-time processing, Apache Cassandra for storage, and Flask, Bootstrap and HighCharts for the frontend.

Language:Scala46100