labrijisaad / Apache-beam-k-means

Implementing K-means clustering in sequential, streaming, and distributed formats using Apache Beam.

Apache Beam K-means: Big Data Class Project

Introduction 🌟

Explore and implement K-means clustering in sequential, streaming, and distributed modes using Apache Beam.

Project Objectives 🎯

  • Understanding K-means
  • Sequential K-means in Python: Building a single-machine implementation in Python.
  • Streaming K-means in Python: Adapting the algorithm to data that arrives incrementally.
  • Apache Beam for scalability: Processing large datasets with a distributed pipeline.
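
The first two objectives can be illustrated with a minimal, standard-library-only sketch. This is not the code from the project's notebooks: the function names (`sq_dist`, `nearest`, `kmeans_sequential`, `kmeans_streaming`) and parameters (`iters`, `seed`, `init`) are hypothetical, and the streaming variant assumes a simple online rule in which the first k points seed the centroids and each later point moves its nearest centroid by a step of 1/count.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans_sequential(points, k, iters=20, seed=0, init=None):
    """Batch K-means (Lloyd's algorithm): alternate assignment and update steps.

    `init` is an optional list of k starting centroids (an assumption made
    here for reproducibility); otherwise k distinct points are sampled.
    """
    rng = random.Random(seed)
    centroids = [tuple(c) for c in (init if init is not None else rng.sample(points, k))]
    for _ in range(iters):
        # Assignment step: group every point under its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments stopped changing
            break
        centroids = new_centroids
    return centroids


def kmeans_streaming(stream, k):
    """Online K-means over an iterable of points, processed one at a time."""
    centroids, counts = [], []
    for p in stream:
        if len(centroids) < k:
            # The first k points seed the centroids.
            centroids.append(list(p))
            counts.append(1)
            continue
        i = nearest(p, centroids)
        counts[i] += 1
        eta = 1.0 / counts[i]  # step size shrinks as the cluster grows
        centroids[i] = [c + eta * (x - c) for c, x in zip(centroids[i], p)]
    return [tuple(c) for c in centroids]
```

For example, `kmeans_streaming([(0.0, 0.0), (10.0, 10.0), (0.0, 2.0), (10.0, 12.0)], 2)` returns `[(0.0, 1.0), (10.0, 11.0)]`. With a 1/count step, each centroid is the running mean of the points assigned to it so far, so the streaming version never needs to store the full dataset. In a distributed setting the same two steps are commonly expressed as a per-point assignment followed by a per-cluster mean aggregation.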

Results 📈

See the results and insights in ./notebooks. The problem statement is in ./docs.

Run in Colab 💻

Access the notebook directly on Colab.

Project Architecture 🏗️

The project architecture is detailed in the Makefile.

Connect 🤝

Languages

Jupyter Notebook 98.8%, Makefile 1.2%