banarasi04's repositories
awesome-etl
A curated list of awesome ETL frameworks, libraries and software.
data-engineering-zoomcamp
Free Data Engineering course!
Data-Science-with-Spark
Machine Learning and Data Analysis Case Studies using Spark.
DataStructureAndAlgorithmsMadeEasyInJava
Data Structure And Algorithms Made Easy In Java
datawarehouse
Solutions for the Datawarehouse course
drunken-data-quality
Spark package for checking data quality
Fake-Apache-Log-Generator
Generate a boatload of Fake Apache Log files very quickly
HDFSChecksumForLocalfile
This program (JAR) computes a checksum for a local file using the same algorithm Hadoop uses for files on HDFS, so file integrity can be verified on both the local and Hadoop systems. It can also check by checksum whether a file already exists before uploading, which avoids cluttering HDFS with duplicate files.
JustEnoughScalaForSpark
A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.
Machine-Learning-with-Python
Machine Learning Implementations in Python
mlops-zoomcamp
Free MLOps course from DataTalks.Club
Store
A sample online store web application built in Eclipse.
tpcc
Java implementation of TPC-C benchmark
tpcds
Port of TPC-DS data generator to Java
tpcds-gen
Wrap up TPC-DS dsgen into a map-reduce task
tsdb
The Prometheus time series database layer.