Mi ;'s repositories
awesome-python
A curated list of awesome Python frameworks, libraries, software and resources
aws-glue-samples
AWS Glue code samples
data-engineering-zoomcamp
Free Data Engineering course!
github-slideshow
A robot-powered training repository 🤖
interviews
Everything you need to know to get the job.
Projects-Solutions
📟 Links to others' solutions to Projects (https://github.com/karan/Projects/)
pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
sagemaker-python-sdk
A library for training and deploying machine learning models on Amazon SageMaker
spark-daria
Essential Spark extensions and helper methods ✨😲
atom
:atom: The hackable text editor
awesome-systematic-trading
A curated list of awesome libraries, packages, strategies, books, blogs, tutorials for systematic trading.
build-your-own-x
Master programming by recreating your favorite technologies from scratch.
Databricks-Certified-Data-Engineer-Professional
The resources of the preparation course for Databricks Data Engineer Professional certification exam
DataGristle
Tough and flexible tools for data analysis, transformation, validation and movement.
dbrx
Code examples and resources for DBRX, a large language model developed by Databricks
delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
doris
Apache Doris is an easy-to-use, high-performance, unified analytics database.
hadoop
Public Hadoop release repository
Hash-Buster
Crack hashes in seconds.
hops
Hops Hadoop is a distribution of Apache Hadoop with distributed metadata.
joblib-spark
Joblib Apache Spark Backend
LearningSparkV2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
llama3
The official Meta Llama 3 GitHub site
pyspark-tutoriallll
PySpark Tutorial for Beginners - Practical Examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers various topics like Spark Introduction, Spark Installation, Spark RDD Transformations and Actions, Spark DataFrame, Spark SQL, and more. It is completely free on YouTube and is beginner-friendly without any prerequisites.
spark-essentials
The official repository for the Rock the JVM Spark Essentials with Scala course
sparkMeasure
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
winutils
winutils.exe, hadoop.dll, and hdfs.dll binaries for Hadoop on Windows