Mansoorinho / DDA2022

Distributed Data Analytics

Solution by Mansoor Nabawi.

  1. Linear Regression.
  2. Distributed Computing with Message Passing Interface (MPI). Exercise 1: Basic Parallel Vector Operations with MPI. Exercise 2: Parallel Matrix Vector Multiplication using MPI. Exercise 3: Parallel Matrix Operation using MPI.
  3. Complex Data Lab: Processing Text Data in a Distributed Setting. Exercise 1: Data cleaning and text tokenization. Exercise 2: Calculate Term Frequency (TF). Exercise 3: Calculate Inverse Document Frequency (IDF). Exercise 4: Calculate Term Frequency Inverse Document Frequency (TF-IDF) scores.
  4. Complex Data Lab: K-means clustering in a Distributed Setting. Distributed K-means Clustering.
  5. Distributed Machine Learning (Supervised)
  6. Preparing your Hadoop infrastructure. Setting up a Hadoop infrastructure.
  7. PyTorch Network Analysis.
  8. Image Classification, Normalization Effect, Network Regularization, Optimizers (CNN).
  9. Distributed Computing with Apache Spark.
  10. Implementing Parallel Stochastic Gradient Descent. PyTorch distributed execution.
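
As a rough illustration of the steps named in item 3 (tokenization, TF, IDF, TF-IDF), here is a minimal single-process sketch in plain Python. It is not the solution from this repository: the function names are hypothetical, the tokenizer is a simple regular expression, and the IDF variant `log(N / df)` is an assumption, since the exercise sheet's exact formula is not given here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequency(tokens):
    """TF(t, d) = occurrences of t in d / total number of tokens in d."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def inverse_document_frequency(tokenized_docs):
    """IDF(t) = log(N / df(t)); df(t) is the number of documents containing t.

    Assumed variant: no smoothing, natural logarithm.
    """
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(docs):
    """Return one {term: score} dictionary per input document."""
    tokenized = [tokenize(doc) for doc in docs]
    idf = inverse_document_frequency(tokenized)
    return [
        {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}
        for tokens in tokenized
    ]


if __name__ == "__main__":
    for scores in tf_idf(["The cat sat.", "The dog barked!"]):
        print(scores)
```

In a distributed version, one plausible split is for each worker to tokenize and count its own share of the documents, after which the per-term document frequencies are summed across workers before the final scores are computed. How the exercises here actually partition the work is not stated in this README.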

Languages

Jupyter Notebook 95.1%, Python 4.9%