There is 1 repository under the rdd topic.
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Spark RDD with Lucene's query and entity linkage capabilities
Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
PySpark in Google Colab: a simple machine learning (linear regression) model
InfluxDB connector to Apache Spark on top of Chronicler
Code/Notes for the Data Engineering Zoomcamp by DataTalksClub
Causal Inference Using Quasi-Experimental Methods
Spark access to Common Information Model (CIM) files
PySpark DataFrames made easy
Guide to Clojure REPL Driven Development with Emacs Doom
OpenMRS - MySQL - Debezium - Kafka - Spark - Scala
Sentiment Analysis and Data Visualization
Preview your Markdown locally as it would appear on GitHub, with live updating
SQLRDD for Harbour++ and Harbour
One Ring is a framework to unify, unite and bind Apache Spark-based computing modules, and run them in parametrized chains
A bunch of low-level basic methods for data processing and monitoring with Scala Spark
Apache Spark Basics - Java Examples
rddapp: Regression Discontinuity Design Application
PySpark is a distributed data processing library for Python that processes large volumes of data on clusters using the Apache Spark framework, offering high performance and an integrated set of tools for large-scale data analysis and handling.
A library of Java and Scala examples for Spark 2.x
A package providing a Java implementation of big-data genetic programming for Apache Spark
PySpark WordCount
Reading, writing and deleting from HBase with Spark RDD
MT4S - Multiple Tests 4 Spark - a simple JUnit/ScalaTest testing framework for Apache Spark
Replication files and simulations for Johansson et al. 2023, JHE
Replication of Lindo, Sanders & Oreopoulos (2010), Student Project
Apache Spark machine learning project using PySpark