There are 10 repositories under the big-data-processing topic.
Course covers big data fundamentals, processes, technologies, platform ecosystem, and management for practical application development.
A lightweight helper utility that lets developers build pipelines interactively from a single, unified source code for both DLT runs and non-DLT interactive notebook runs.
This repository contains an Apache Flink application for real-time sales analytics. It uses Docker Compose to orchestrate the necessary infrastructure components, including Apache Flink, Elasticsearch, and Postgres.
Eskimo is a state-of-the-art Big Data Infrastructure and Management Web Console to build, manage, and operate Big Data 2.0 Analytics clusters on Kubernetes. This is the Git repository of Eskimo Community Edition.
This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.
Flink SQL in Practice: a Chinese-language blog column
🛠️ Python library to import OCR data in various formats into the canonical JSON format defined by the Impresso project.
GCP_Data_Enginner
R for Big Data (Chinese Version)
Data modeling with Cassandra, building a data warehouse using Redshift, and creating a data lake using Spark and Airflow
A curated selection of tools, libraries, and services that help tame your dataflow to productively build ambitious, data-driven, and reactive applications on a streaming lakehouse
Reservoir sampling for group-by queries on the Flink platform, for effectively answering single-aggregate queries.
Hybrid time-series and block-column storage database engine written in Java
A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics using the following technologies: the Twitter API, Kafka, MongoDB, and Tableau.
Implementation of algorithms for big data using Python, NumPy, and pandas.
Introduction to Spark Batch processing.
GitHub repository for a versatile usable Big Data infrastructure (AVUBDI)
Here I demonstrate the performance difference between the Poisson and the classic bootstrap by estimating the confidence interval for the difference in CTRs between two user groups.
Analysis, organization and querying of large genomic datasets using C++, Monsoon and various data structures.
Characterizing an index of the uneven global distribution of telecommunications resources
rock-solid pillars for enterprise-grade solutions
Batch or individual format conversion between Excel, Markdown, CSV, and SQL data sources
Crack detection model using YOLOv7
The 2022 Big Data Bowl data contains Next Gen Stats player tracking, play, game, player, and PFF scouting data for all 2018-2020 Special Teams play. Here, you'll find a summary of each data set in the 2022 Data Bowl, a list of key variables to join on, and a description of each variable.
This Git repo showcases my analysis of the Sparkify dataset with PySpark on Apache Spark in cluster mode and JupyterLab on Docker. The goal was to identify at-risk customers and develop retention strategies. The analysis tested multiple machine learning models and uncovered insights into customer behavior and churn patterns.
MapReduce job development, RDD programming, medical data management, sales analysis, and efficient data integration for big data analysis. Spark: big data processing, Sqoop integration, and Spark Structured Streaming for real-time data.
A list of awesome big data testing frameworks, resources and other awesomeness.
SUTD 2021 50.043 Database and Big Data Systems Code Dump
A simple CSV parser for huge volumes of data. It uses the pandas library for Python to extract specific columns from a CSV file and quickly write the extracted data to one or more files (each column in a separate file, or all of them in the same output).
Sentiment-Analysis-API
Collection of homework (mostly Spark-based) from the course "Big Data Computing" - University of Padua.
Track a Boat is a real-time maritime tracking system using Kafka, Spark Structured Streaming, and WebSockets. It lets you view vessel positions, analyze their trajectories, and predict their destinations on an interactive map.
Second capstone project of the Big Data and AI Engineering bootcamp. Uses big data tools to predict the probability of university enrollment for Egypt's high school students. 🏫 📚 🔬