There are 30 repositories under the apachespark topic.
This is a repo with links to everything you'd ever want to learn about data engineering.
This repository will help you learn Databricks concepts through examples. It covers the important topics we need in real-life work as data engineers. We will be using PySpark and Spark SQL for development, and at the end of the course we also cover a few case studies.
type-class based data cleansing library for Apache Spark SQL
Code for blog at: https://www.startdataengineering.com/post/docker-for-de/
SparkSQL.jl enables Julia programs to work with Apache Spark data using just SQL.
FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...
Repository for Lab “Distributed Big Data Analytics” (MA-INF 4223), University of Bonn
This repository contains all the projects and labs I worked on while pursuing professional certificate programs, specializations, and bootcamp. [Areas: Deep Learning, Machine Learning, Applied Data Science].
PySpark is a distributed data-processing library for Python that lets you process large volumes of data on clusters using the Apache Spark framework, offering high performance and a built-in set of tools for large-scale data analysis and handling.
Trigger spark-submit in Golang. A Go implementation of the well-known SparkLauncher.java.
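What a SparkLauncher-style wrapper essentially does is assemble a `spark-submit` command line and hand it to the OS. A hedged Python sketch of that idea (the paths, class name, and helper below are illustrative assumptions, not the repository's actual API):

```python
# Sketch: build a spark-submit command line, as a launcher wrapper would.
import subprocess

def build_spark_submit(app_jar, main_class, master="local[*]", app_args=()):
    """Assemble the argument list for launching a JVM Spark application."""
    cmd = ["spark-submit", "--master", master, "--class", main_class, app_jar]
    cmd.extend(app_args)
    return cmd

cmd = build_spark_submit("target/app.jar", "com.example.Main",
                         app_args=["--date", "2024-01-01"])
print(cmd)

# To actually launch (requires a Spark installation on PATH):
# subprocess.run(cmd, check=True)
```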
Connect to SQL Server using Apache Spark
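A sketch of the JDBC options Spark needs to read from SQL Server. The server, database, table, and credentials below are placeholder assumptions, and the Microsoft JDBC driver JAR must be on Spark's classpath:

```python
# Illustrative JDBC configuration for reading SQL Server from Spark.
jdbc_options = {
    "url": "jdbc:sqlserver://myserver.example.com:1433;databaseName=sales",
    "dbtable": "dbo.orders",
    "user": "spark_reader",
    "password": "***",  # placeholder; use a secret manager in practice
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

# With a live SparkSession this becomes:
# df = spark.read.format("jdbc").options(**jdbc_options).load()
print(sorted(jdbc_options))
```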
Example usages of the cleanframes library
Link prediction is about predicting future connections in a graph. In this project, the task is to predict whether two authors will collaborate on a future paper, given the graph of authors who have co-authored at least one paper together.
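One classic link-prediction signal is the common-neighbours count: two authors who share many collaborators are more likely to collaborate next. A minimal illustration (the toy graph and the choice of scoring function are assumptions; the project may use different features):

```python
# Score non-adjacent author pairs by how many co-authors they share.
from itertools import combinations

# Adjacency: author -> set of co-authors.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}

def common_neighbors(g, u, v):
    return len(g[u] & g[v])

# Higher score = more likely future link.
scores = {
    (u, v): common_neighbors(graph, u, v)
    for u, v in combinations(sorted(graph), 2)
    if v not in graph[u]  # only pairs not already connected
}
print(scores)  # {('a', 'd'): 1, ('c', 'd'): 1}
```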
This GitHub repository contains a detailed document on the basics of the Scala language.
A Capstone Project that covers several aspects of Data Engineering (Data Exploration, Cleaning, Modeling, Pipelining, Processing)
Here you will find the demo code for my Data+AI 2020 talk about customizing the Apache Spark state store.
Use this project to join data from multiple CSV files. Currently the project supports one-to-one and one-to-many joins. It also shows how to use a Kafka producer efficiently with Spark.
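The two join shapes mentioned above can be sketched in plain Python on in-memory CSV rows rather than Spark DataFrames (the column names are illustrative assumptions):

```python
# One-to-many inner join: each customer matches every order carrying its id.
import csv
import io

customers = "id,name\n1,alice\n2,bob\n"
orders = "order_id,customer_id,total\n10,1,99.5\n11,1,15.0\n12,2,42.0\n"

cust_rows = list(csv.DictReader(io.StringIO(customers)))
order_rows = list(csv.DictReader(io.StringIO(orders)))

joined = [
    {**c, **o}
    for c in cust_rows
    for o in order_rows
    if o["customer_id"] == c["id"]
]
print(len(joined))  # 3 rows: alice matches two orders, bob one
```

A one-to-one join is the special case where each key appears at most once on both sides; in Spark the same semantics come from `df1.join(df2, on=..., how="inner")`.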
This is a Jupyter Notebook to practice Apache Spark in Google Colab, especially for the exam CCA Spark and Hadoop Developer Exam (CCA175).
Implementation of GraphFrames using PySpark in Eclipse IDE
Data Analysis of bank transaction data
Working with Apache Spark: some small tutorials and, finally, the implementation of a small project
Apache Spark project for Advanced Topics on Databases course
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
This repository showcases IPL data analysis using Apache Spark. The project demonstrates the power of Spark for data transformation, cleaning, SQL queries, and visualization, all performed with PySpark to handle large-scale data efficiently.
This is a distributed system that utilizes Apache Spark through Dataproc. We use the Spotify API to send song data to Apache Spark, which then forwards the information to Google Cloud Services. The system processes this data to recommend songs based on the extracted information.
Projects completed as part of the CSE 6332 CCBD course at UTA, covering distributed computing, data processing frameworks, and cloud platforms.
This comprehensive course is designed for beginners and experienced developers alike, providing an in-depth exploration of Apache Spark.
ETL data pipeline that processes Washington's EV data using Apache Spark, Docker, Snowflake, Airflow, and AWS services, and visualizes the transformed Parquet data through Tableau dashboards.
Sample project to run a Databricks job using a Java JAR and utilising UDFs.
Analysis and visualization of open-source street-level police data from two areas, Leicestershire and Northumbria, to derive data-driven insights