MohitKumarMandhre / twitter-airflow-project

This is an end-to-end data engineering project using Airflow and Python. In this project, we extract data using the Twitter API, use Python to transform the data, deploy the code on Airflow/EC2, and save the final result to Amazon S3.

Home Page: https://github.com/MohitKumarMandhre/twitter-airflow-project

twitter-airflow-project

  • Extracting data from Twitter

Tweepy is an open-source Python package that gives you a convenient way to access the Twitter API from Python. It includes a set of classes and methods that represent Twitter's models and API endpoints, and it transparently handles implementation details such as data encoding and decoding. The Twitter API gives developers access to most of Twitter's functionality, such as tweets, retweets, and likes.
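A minimal sketch of the extraction step, assuming Tweepy 4.x with OAuth 1.0a user credentials and the v1.1 `user_timeline` endpoint. The function names `build_api` and `fetch_user_tweets` and the chosen output fields are illustrative, not taken from the repository, and the call only succeeds with credentials whose access level permits this endpoint.

```python
def build_api(consumer_key, consumer_secret, access_token, access_token_secret):
    """Create an authenticated Tweepy client (hypothetical helper)."""
    # Imported here so the rest of this sketch can be used without Tweepy installed.
    import tweepy

    auth = tweepy.OAuth1UserHandler(
        consumer_key, consumer_secret, access_token, access_token_secret
    )
    return tweepy.API(auth)


def fetch_user_tweets(api, screen_name, count=200):
    """Fetch recent tweets of one account and flatten them into plain dicts."""
    tweets = api.user_timeline(
        screen_name=screen_name,
        count=count,            # the endpoint returns at most 200 per request
        include_rts=False,      # skip retweets
        tweet_mode="extended",  # return the untruncated text as `full_text`
    )
    return [
        {
            "user": tweet.user.screen_name,
            "text": tweet.full_text,
            "favorite_count": tweet.favorite_count,
            "retweet_count": tweet.retweet_count,
            "created_at": tweet.created_at.isoformat(),
        }
        for tweet in tweets
    ]
```

Passing the `api` object in as an argument keeps the flattening logic separate from authentication, so it can be exercised with a stand-in object during development.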

  • Use Python to extract data from API
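Once the tweets have been pulled from the API, the transform step mentioned in the project description could look roughly like the sketch below. It uses only the standard library, and the column list `FIELDS` and the helper names are assumptions made for illustration; the repository may well use a different approach (for example pandas).

```python
import csv
import io

# Assumed output columns; adjust to whatever fields the extraction step produces.
FIELDS = ["user", "text", "favorite_count", "retweet_count", "created_at"]


def clean_text(text):
    """Collapse newlines and repeated whitespace so each tweet fits on one CSV row."""
    return " ".join(text.split())


def records_to_csv(records):
    """Serialize a list of tweet dicts to CSV text, ignoring unknown keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        row = dict(record)  # copy so the caller's data is not modified
        row["text"] = clean_text(row.get("text", ""))
        writer.writerow(row)
    return buffer.getvalue()
```

Returning the CSV as a string rather than writing to disk keeps the function easy to reuse, whether the result ends up in a local file or in an object store.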

  • Deploying code on EC2

Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the Amazon Web Services (AWS) Cloud. Using Amazon EC2 eliminates the need to invest in hardware up front, so you can develop and deploy applications faster. It provides a wide selection of instance types optimized for different use cases. Instance types comprise varying combinations of CPU, memory, storage, and networking capacity, giving you the flexibility to choose the appropriate mix of resources for your applications.

  • Use Airflow for workflow management

Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Airflow's extensible Python framework enables you to build workflows connecting with virtually any technology, and a web interface helps manage the state of your workflows. It allows you to take data from different sources, transform it into meaningful information, and load it to destinations like data lakes or data warehouses.
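As a rough illustration of how such a workflow is declared, the sketch below defines a one-task daily DAG. It assumes Airflow 2.x (`airflow.operators.python.PythonOperator` and the `schedule_interval` argument; newer releases prefer `schedule`). The DAG id, task id, start date, retry settings, and the placeholder `run_twitter_etl` are all hypothetical and not taken from the repository.

```python
from datetime import datetime, timedelta

# Assumed defaults applied to every task in the DAG.
DEFAULT_ARGS = {
    "owner": "airflow",
    "depends_on_past": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=1),
}


def run_twitter_etl():
    """Placeholder for the real extract/transform/load logic."""
    return "twitter_etl finished"


def build_dag():
    """Build the DAG object; Airflow is imported here so the sketch loads without it."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="twitter_dag",                 # hypothetical name
        default_args=DEFAULT_ARGS,
        description="Daily Twitter ETL",
        schedule_interval=timedelta(days=1),  # run once per day
        start_date=datetime(2023, 1, 1),      # placeholder start date
        catchup=False,                        # do not backfill missed runs
    ) as dag:
        PythonOperator(
            task_id="complete_twitter_etl",
            python_callable=run_twitter_etl,
        )
    return dag


# In a real DAG file placed in Airflow's dags/ folder, expose the DAG at module level
# so the scheduler can discover it:
# dag = build_dag()
```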

  • Store data into S3 bucket

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. You can use Amazon S3 to store and retrieve any amount of data at any time, from anywhere.
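Saving the final CSV to S3 might look like the following sketch. It assumes boto3 with AWS credentials available in the environment (for example through an IAM role attached to the EC2 instance). The bucket name, key layout, and helper names are placeholders chosen for illustration.

```python
from datetime import date


def build_object_key(prefix, run_date):
    """Build a date-partitioned object key such as 'twitter/2023-01-02/tweets.csv'."""
    return f"{prefix.strip('/')}/{run_date.isoformat()}/tweets.csv"


def upload_csv(s3_client, bucket, key, csv_text):
    """Upload CSV text as a single S3 object and return its s3:// URI."""
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=csv_text.encode("utf-8"),
        ContentType="text/csv",
    )
    return f"s3://{bucket}/{key}"


def make_s3_client():
    """Create a boto3 S3 client; imported lazily so the helpers above work without boto3."""
    import boto3

    return boto3.client("s3")
```

A call could then look like `upload_csv(make_s3_client(), "my-example-bucket", build_object_key("twitter", date.today()), csv_text)`, where `my-example-bucket` stands in for a bucket you own.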

  • Image gallery

In Airflow, a Directed Acyclic Graph (DAG) represents a workflow: each node is a task, and each directed edge is a dependency that determines the order in which tasks run. Because the graph has no cycles, every run has a well-defined start and end.
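In a workflow setting, the edges of a DAG are task dependencies, and a valid execution order is any topological ordering of the graph. The sketch below shows this with the standard-library `graphlib` module (Python 3.9+); the task names `extract`, `transform`, and `load` are illustrative stand-ins for the steps of this project.

```python
from graphlib import CycleError, TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
DEPENDENCIES = {
    "transform": {"extract"},
    "load": {"transform"},
}


def execution_order(dependencies):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def is_acyclic(dependencies):
    """Report whether the dependency graph is a valid DAG (contains no cycle)."""
    try:
        execution_order(dependencies)
    except CycleError:
        return False
    return True
```

For the linear chain above, `execution_order(DEPENDENCIES)` yields `extract`, then `transform`, then `load`. A graph in which two tasks depend on each other is rejected, which is why a scheduler can always find a starting task in a DAG.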

Made with 💖 & 🔥 by MKM.


Languages

Language:Python 100.0%