annie0sc / practice-flink-wordcount


Hello! Welcome!

About

  • This is my experiment for a big-data assignment. I am working on a COVID dataset by running word-count algorithms written in Python and JSON.

  • In short, it is an analysis of the correlation between weekly COVID cases and weekly deaths, based on counting.
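The bullets above don't spell out the counting or the correlation step. As a rough plain-Python illustration (no Flink involved, and not necessarily how this repository does it), word counting can be sketched with `collections.Counter`, and one common way to correlate two weekly series is Pearson's r. The function names `word_count` and `pearson` are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Iterable, Sequence


def word_count(lines: Iterable[str]) -> Counter:
    """Count lower-cased, whitespace-separated words across all lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(word.lower() for word in line.split())
    return counts


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all left unnormalised since the n's cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    print(word_count(["to be or not", "to be"]).most_common(2))
    print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

A value of r near +1 would mean weekly deaths rise and fall together with weekly cases; correlation alone says nothing about lag or causation.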

Tech Used:

  • Apache-Flink
  • Apache-Flink-PyFlink
  • Python
  • Google Colab

Install & Execute PyFlink

Locally:

  1. Install Python 3.8.0 (on Windows, using Chocolatey):
choco install python --version=3.8.0
  2. Check the installed version in cmd:
python --version
  3. Install PyFlink (the apache-flink package) with pip:
python -m pip install apache-flink
  4. If a warning is displayed, upgrade pip:
python -m pip install --upgrade pip
  5. Save your Python code in a .py file and run it with the following command:
python filename.py
  6. To write the output to a file (this works only if your script accepts an --output argument):
python filename.py --output output.txt
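The `--output` flag in the last step is not an option of Python or PyFlink itself; the script has to parse it. Below is a minimal sketch of such a script, assuming plain Python with `argparse` rather than the Flink API. The `--input`/`--output` names mirror the command above, while `count_words`, `main`, and the built-in sample lines are hypothetical placeholders.

```python
import argparse
from collections import Counter
from typing import Iterable, List, Optional

# Hypothetical placeholder text used when no --input file is given.
SAMPLE_LINES = [
    "to be or not to be",
    "that is the question",
]


def count_words(lines: Iterable[str]) -> Counter:
    """Count lower-cased, whitespace-separated words across all lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def main(argv: Optional[List[str]] = None) -> Counter:
    parser = argparse.ArgumentParser(description="Minimal word count sketch")
    parser.add_argument("--input", help="text file to read (default: built-in sample)")
    parser.add_argument("--output", help="file to write results to (default: stdout)")
    args = parser.parse_args(argv)  # argv=None means "use sys.argv[1:]"

    if args.input:
        with open(args.input, encoding="utf-8") as f:
            counts = count_words(f)
    else:
        counts = count_words(SAMPLE_LINES)

    # One "word<TAB>count" row per word, most frequent first.
    rows = [f"{word}\t{n}" for word, n in counts.most_common()]
    if args.output:
        with open(args.output, "w", encoding="utf-8") as f:
            f.write("\n".join(rows) + "\n")
    else:
        print("\n".join(rows))
    return counts


if __name__ == "__main__":
    main()
```

Saved as, say, `wordcount.py`, it runs as `python wordcount.py` (sample data printed to the console) or as `python wordcount.py --input data.txt --output output.txt`.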

TIPS:

Make sure any older or newer versions of Python and Flink are removed from the system so that the installation runs smoothly.
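To follow the tip above, it helps to see which interpreter and which PyFlink release are actually being picked up. A small sketch using only the standard library (`importlib.metadata` needs Python 3.8 or later); `installed_versions` is a hypothetical helper name.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional


def installed_versions() -> Dict[str, Optional[str]]:
    """Report the running interpreter and the installed apache-flink release."""
    info: Dict[str, Optional[str]] = {
        # Path of the interpreter actually running this script; useful when
        # several Python installations are present on the system.
        "executable": sys.executable,
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
    }
    try:
        info["apache-flink"] = version("apache-flink")
    except PackageNotFoundError:
        info["apache-flink"] = None  # not installed for this interpreter
    return info


if __name__ == "__main__":
    for name, value in installed_versions().items():
        print(f"{name}: {value if value is not None else 'not installed'}")
```

If the reported executable is not the Python you just installed, `python` on your PATH is pointing at a different installation.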

Google Colab

  1. Install Colaboratory in Google Drive.
  2. Create a new Colab notebook.
  3. Install PyFlink:
!pip install apache-flink
  4. Connect to a hosted runtime (this provides the RAM and disk resources).
  5. Run a Python code cell by pressing:
Shift + Enter
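After the `!pip install` cell has run, a quick check in the next cell can confirm that PyFlink is importable in the connected runtime. Note that the pip distribution is named `apache-flink` while the import name is `pyflink`; `pyflink_available` is a hypothetical helper.

```python
import importlib.util


def pyflink_available() -> bool:
    """Return True if the `pyflink` package can be imported in this runtime."""
    # pip installs the distribution "apache-flink", but its top-level
    # import package is named "pyflink".
    return importlib.util.find_spec("pyflink") is not None


print("PyFlink available:", pyflink_available())
```

If it prints `False`, re-run the install cell before running your Flink code.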

Demonstrative Video

https://use.vg/hboCoj

Resources/References

  1. Kaggle Dataset on Covid-19
  2. Code Forked from uuboyscy
  3. Apache Flink Example
  4. Apache Flink Table Example
  5. Main Group Repository


Languages

Language: Python 100.0%