big-data-processing

There are 10 repositories under big-data-processing topic.

drshahizan / BDM
Course covers big data fundamentals, processes, technologies, platform ecosystem, and management for practical application development.
big-data big-data-analytics big-data-architecture big-data-processing
Language:Jupyter Notebook 58
dlt-with-debug
souvik-databricks / dlt-with-debug
A lightweight helper utility which allows developers to do interactive pipeline development by having a unified source code for both DLT run and Non-DLT interactive notebook run.
big-data big-data-processing databricks delta-live-tables dlt etl etl-pipeline python3 spark
Language:Python 49
airscholar / FlinkCommerce
This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessary infrastructure components, including Apache Flink, Elasticsearch, and Postgres
apache-flink big-data big-data-processing python realtime-streaming
Language:Java 44
eskimo
eskimo-sh / eskimo
Eskimo is a state of the art Big Data Infrastructure and Management Web Console to build, manage and operate Big Data 2.0 Analytics clusters on Kubernetes. This is the git repository of Eskimo Community Edition.
bigdata big-data-platform big-data big-data-projects big-data-analytics big-data-processing webconsole cluster-management spark elasticsearch kafka gluster glusterfs zeppelin cerebro kibana flink kubernetes kubernetes-cluster kubernetes-setup
Language:Java 26
felipefrizzo / terraform-aws-kinesis-firehose
This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.
terraform terraform-aws terraform-provider kinesis-firehose cloudwatch-logs parquet big-data big-data-processing etl-job analytics
Language:HCL 26
StarPlatinumStudio / Flink-SQL-Practice
Flink SQL 实战 -中文博客专栏
apache-flink stream-processing big-data-processing sql
Language:Java 16
giucris / yasp
Yet Another SPark Framework
big-data big-data-processing elt etl etl-framework etl-pipeline framework scala spark sparksql
Language:Scala 10
pyajs / veronica
big data processing and machine learning platform，just like useing sql
big-data-processing machine-learning-platform pyspark python3 sql xql
Language:Python 10
impresso / impresso-text-acquisition
🛠️ Python library to import OCR data in various formats into the canonical JSON format defined by the Impresso project.
historical-newspapers big-data-processing impresso-project
Language:Jupyter Notebook 9
anjijava16 / GCP_Data_Enginner_Utils
GCP_Data_Enginner
bigquery big-data-processing gcp gcp-storage pubsub dataflow python scala dataproc dataproc-cluster shell-script pyspark notebook
Language:Shell 8
hope-data-science / R4BD
R for Big Data (Chinese Version)
big-data r big-data-analytics-techniques big-data-processing
Language:R 8
bdnf / BigData-Engineering-Projects
Data modeling with Cassandra, building Data Warehouse using Redshift and creation of Data Lake using Spark and Airflow
airflow big-data-analytics big-data-processing cassandra data-lake data-warehouse redshift spark
Language:Jupyter Notebook 6
tabletop-labs / tabletop
A curated selection of tools, libraries and services that help tame your dataflow to productively build ambitious, data driven & reactive applications on a streaming lakehouse
elasticscaling kafka stream-processing big-data big-data-analytics big-data-processing modern-data-stack real-time microservices timetravel semi-structured-cloud-warehouse
Language:Go 5
theGuyWithBlackTie / electricChargingStations
big-data electric-vehicles charging-stations spark-ml big-data-processing
Language:Jupyter Notebook 4
vvittis / FlinkSampling
Reservoir Sampling for Group-By Queries in Flink Platform. Answering effectively Single Aggregate.
stratum streaming-tuples reservoir-sampling topic java streaming-data big-data-analytics big-data-processing apache-flink group-by sampling
Language:Java 4
akardapolov / dimension-db
Hybrid time-series and block-column storage database engine written in Java
big-data-processing column-store dbms java sql time-series
Language:Java 3
chandnii7 / Big-Data-Processing-Pipeline
A pipeline that consumes twitter data to extract meaningful insights about a variety of topics using the following technologies: twitter API, Kafka, MongoDB, and Tableau.
twitter-api kafka kafka-producer kafka-consumer kafka-streaming zookeeper mongodb tableau big-data big-data-processing data-processing-pipelines data-analytics nosql-database data-visualization
Language:Python 3
Big-Data-Algorithms
kochlisGit / Big-Data-Algorithms
Implementation of algorithms for big data using python, numpy, pandas.
lsh lsh-algorithm min-hasing shingling python similar-items pcy a-priori multistage-pcy multihash-pcy frequent-itemsets frequent-itemset-mining bloom-filter stream-mining streams big-data-processing
Language:Python 3
mtumilowicz / big-data-scala-spark-batch-workshop
Introduction to Spark Batch processing.
spark spark-sql batch-processing big-data big-data-processing workshop workshop-materials
Language:Scala 3
software-competence-center-hagenberg / AVUBDI
Github Repository for a versatile usable Big Data infrastructure (AVUBDI)
kafka spark docker-compose docker big-data-processing big-data-platform docker-swarm template-project
Language:Shell 3
VladOnMyOwn / ctr-poisson-bootstrap
Here I demonstrate the performance difference between the Poisson and the classic bootstrap by estimating the confidence interval for the difference of CTRs of the two user groups
ab-test ab-testing ab-tests bootstrap statistical-tests statistics poisson-bootstrap big-data big-data-processing python click-through-rate
Language:Jupyter Notebook 3
Anirban166 / Big-Data-ft.-Genomics
Analysis, organization and querying of large genomic datasets using C++, Monsoon and various data structures.
big-data-processing bioinformatics data-structures-and-algorithms genomic-sequences
Language:C++ 2
chuanting / imbalance_index
全球电信资源分布不均衡指数刻画
6g big-data-processing digital-divide connect-the-unconnected
Language:HTML 2
IncredibleProgress / sweetheart.py
rock-solid pillars for enterprise-grade solutions
ubuntu rethinkdb rust-lang tailwindcss py-script jupyter big-data-processing python rhel vue nginx-unit
Language:Python 2
JamesHanZhang / table-data-format-transform-app
excel, markdown, csv, sql 数据源批量/单独格式互相转换
big-data-processing csv-to-excel csv-to-sql data-cleaning-pipeline data-preprocessing easy-to-use etl-framework excel-to-md multifileupload
Language:Python 2
zaid-24 / Crack-Detection-using-CNN
Crack Detection model using yolov7
big-data-processing cnn python pytorch yolov7
Language:Jupyter Notebook 2
adnanrahin / NFL-Big-Data-Bowl-2022
The 2022 Big Data Bowl data contains Next Gen Stats player tracking, play, game, player, and PFF scouting data for all 2018-2020 Special Teams play. Here, you'll find a summary of each data set in the 2022 Data Bowl, a list of key variables to join on, and a description of each variable.
spark scala big-data big-data-processing rdd spark-sql
Language:Scala 1
alessiococchieri / BDA-project-sparkify
This Git repo showcases my analysis of Sparkify dataset with PySpark on Apache Spark cluster mode and JupyterLab on Docker. The goal was to identify at-risk customers and develop retention strategies. The analysis tested multiple machine learning models and uncovered insights into customer behavior and churn patterns.
churn-analysis churn-prediction machine-learning pyspark spark sparkify apache-spark big-data big-data-analytics big-data-processing
Language:Jupyter Notebook 1
Ayoub-etoullali / Activites-Pratiques-BigData
MapReduce Job Development, RDDs Programming, Medical Data Management, Sales Analysis, And Efficient Data Integration For Big Data Analysis. Spark: Big Data Processing, SQOOP Integration, And Spark Structured Streaming For Real-Time Data.
apache-spark big-data-analysis big-data-processing data-integration mapreduce mapreduce-java mapreduce-jobs medical-data-management pyspark rdds real-time real-time-data real-time-database sales-analysis spark spark-structured-streaming sqoop sqoop-export sqoop-import
Language:Java 1
isandratskiy / awesome-bigdata-testing
A list of awesome big data testing frameworks, resources and other awesomeness.
testing awesome big-data data awesome-list big-data-processing big-data-testing automation automation-testing big-data-automation database
1
jamestiotio / dbsys
SUTD 2021 50.043 Database and Big Data Systems Code Dump
database database-management database-management-system database-systems database-design big-data big-data-processing big-data-analytics sql nosql etl spark pyspark hadoop hdfs mapreduce b-plus-tree zeppelin data-analysis data-analytics
Language:Java 1
levindoneto / pandas-simple-csv-parser
Simple CSV parser for huge volumes of data with the use of the library Pandas for Python for getting specific columns of a CSV file and putting the extracted data into one or more files (each column in a separated file or all of them in the same output) in a short amount of time.
csv parser pandas-datareader data-manipulation conda-environment big-data-processing pandas-dataframes
Language:Python 1
louiecai / Sentiment-Analysis-API
Sentiment-Analysis-API
big-data-processing deep-learning lstm-neural-networks machine-learning neural-network nlp rnn-pytorch sentiment-analysis
Language:Python 1
lucamoroz / BigDataComputing-UniPD
Collection of homework (mostly Spark-based) from the course "Big Data Computing" - University of Padua.
big-data-processing java spark
Language:Java 1
Mileristovski / DataEngineer-SparkStreaming
Track a Boat est un système de suivi maritime en temps réel utilisant Kafka, Spark Structured Streaming et WebSockets. Il permet de visualiser la position des navires, analyser leurs trajectoires et prévoir leurs destinations sur une carte interactive.
big-data-processing distributed-computing docker docker-compose kafka kafka-topics maritime-data pipeline python scala spark-structured-streaming websocket
Language:Jupyter Notebook 1
RghdE / CapstoneTwo_EducationalLandscape
Big Data and AI Engineering bootcamp 2nd capstone project. Using Big Data Tools to predict the probability of university enrollment for Egypt's High School students. :school: :books: :microscope:
apache-pig big-data big-data-analytics big-data-processing big-data-projects big-data-visualization data-science machine-learning pyspark
Language:Jupyter Notebook 1

big-data-processing

drshahizan / BDM

souvik-databricks / dlt-with-debug

airscholar / FlinkCommerce

eskimo-sh / eskimo

felipefrizzo / terraform-aws-kinesis-firehose

StarPlatinumStudio / Flink-SQL-Practice

giucris / yasp

pyajs / veronica

impresso / impresso-text-acquisition

anjijava16 / GCP_Data_Enginner_Utils

hope-data-science / R4BD

bdnf / BigData-Engineering-Projects

tabletop-labs / tabletop

theGuyWithBlackTie / electricChargingStations

vvittis / FlinkSampling

akardapolov / dimension-db

chandnii7 / Big-Data-Processing-Pipeline

kochlisGit / Big-Data-Algorithms

mtumilowicz / big-data-scala-spark-batch-workshop

software-competence-center-hagenberg / AVUBDI

VladOnMyOwn / ctr-poisson-bootstrap

Anirban166 / Big-Data-ft.-Genomics

chuanting / imbalance_index

IncredibleProgress / sweetheart.py

JamesHanZhang / table-data-format-transform-app

zaid-24 / Crack-Detection-using-CNN

adnanrahin / NFL-Big-Data-Bowl-2022

alessiococchieri / BDA-project-sparkify

Ayoub-etoullali / Activites-Pratiques-BigData

isandratskiy / awesome-bigdata-testing

jamestiotio / dbsys

levindoneto / pandas-simple-csv-parser

louiecai / Sentiment-Analysis-API

lucamoroz / BigDataComputing-UniPD

Mileristovski / DataEngineer-SparkStreaming

RghdE / CapstoneTwo_EducationalLandscape