Avisakula123 / GroupIntoBatches

Apache_Beam_python-GroupIntoBatches

Rohith Avisakula

Sub-topic: GroupIntoBatches

Prerequisites

  • Python
  • Apache Beam
  • Google Colaboratory

Commands Used

  • Install Apache Beam with the command below.
!pip install apache-beam
  • Next, install the optional dependencies (extras) with the command below.
!pip install apache-beam[gcp,aws,test,docs]
  • List all files in the current directory.
! ls
  • Sign in to Google Drive and Google Colab with the same account, then upload the .csv file to Google Drive.
  • Import the .csv file into Google Colab.
# Code to read csv file into colaboratory:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate the Google account
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# Get the file from Drive using its file ID
downloaded = drive.CreateFile({'id': '1b73yN7MjGytqSP5wimYAQmtByOvGGe8Y'})  # replace the ID with the ID of the file you want to access
downloaded.GetContentF
  • Display the contents of the output file.
! cat results.txt-00000-of-00001

Screenshots for commands

  • Installing Apache Beam.

  • Installing the required dependencies and libraries.

  • Program for GroupIntoBatches.

  • Importing the file into Colaboratory.

  • Listing the files.

  • Output of the file.
