krishna-kimo / get_url_data

Get Url Data is a cloud function that serves as the preprocessing step for the Content Processing Pipeline.

FETCH URL DATA


  • This is the first step of the content processing pipeline.
  • The project is a cloud function that takes a URL as input from Pub/Sub.
  • The content for the URL is fetched and cleaned. If the result contains data, it is pushed to Pub/Sub to be picked up by the content processing Beam pipeline.
  • If the data is invalid or an error occurs, the data is logged with the corresponding reason for the error for further analysis.
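
The flow described above could look roughly like the following sketch. It is not the repository's actual code: the names `get_url_data`, `clean_html`, and `fetch_html` are hypothetical, the Pub/Sub message is assumed to carry the raw URL base64-encoded in `event["data"]` (as in Pub/Sub-triggered background functions), and `publish` is a stand-in for whatever Pub/Sub publisher client the project uses. Only the Python standard library is used so the example stays self-contained.

```python
import base64
import json
import logging
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def clean_html(html):
    """Reduce an HTML document to its visible text (hypothetical cleaning step)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def fetch_html(url, timeout=10):
    """Download the page body and decode it using the declared charset."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def get_url_data(event, context=None, fetch=fetch_html, publish=print):
    """Entry point sketch: decode the URL, fetch, clean, then publish or log.

    `fetch` and `publish` are injectable so the function can be exercised
    without network access or a real Pub/Sub client (an assumption of this sketch).
    """
    url = base64.b64decode(event["data"]).decode("utf-8").strip()
    try:
        text = clean_html(fetch(url))
    except Exception as exc:
        # Error path: log the URL together with the reason for later analysis.
        logging.error(json.dumps({"url": url, "reason": f"fetch failed: {exc}"}))
        return None
    if not text:
        # Invalid data path: nothing usable was left after cleaning.
        logging.error(json.dumps({"url": url, "reason": "no content after cleaning"}))
        return None
    message = json.dumps({"url": url, "content": text})
    publish(message)  # in a real deployment this would publish to the output topic
    return message
```

In a deployment, `publish` would presumably wrap a Pub/Sub publisher for the topic that the Beam pipeline reads from; passing it in as a parameter also keeps the function easy to test locally.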


Languages

  • Jupyter Notebook 97.5%
  • Python 2.0%
  • Shell 0.4%
  • Dockerfile 0.1%