Lacerdash / WebScrapping-Flight-Data

This repository contains Jupyter Notebooks for web scraping, transforming and loading flight data from 2 online travel companies.

Web Scraping Flight Data

This is a personal project that extracts and compares flight prices from two websites: Decolar.com and Passagens Promo. The goal is to compare prices between the two sites after extracting, transforming, and loading the data. The project was inspired by a real business need I encountered, and it lets me practice the web scraping skills I have been studying on a real-world problem.
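As a rough illustration of the comparison step, the sketch below joins price tables from the two sites and computes the difference per flight search. The column names (`origin`, `destination`, `departure_date`, `price`), the sample rows, and the `compare_prices` helper are all assumptions made for this example; the notebooks may structure their data differently.

```python
import pandas as pd

# Hypothetical sample rows; the real notebooks' columns and values may differ.
decolar = pd.DataFrame({
    "origin": ["GRU", "GIG"],
    "destination": ["SSA", "BSB"],
    "departure_date": ["2023-05-10", "2023-05-12"],
    "price": [850.0, 620.0],
})
passagens_promo = pd.DataFrame({
    "origin": ["GRU", "GIG"],
    "destination": ["SSA", "BSB"],
    "departure_date": ["2023-05-10", "2023-05-12"],
    "price": [810.0, 655.0],
})


def compare_prices(left, right, left_name, right_name):
    """Join two price tables on the search keys and flag the cheaper site."""
    keys = ["origin", "destination", "departure_date"]
    merged = left.merge(
        right, on=keys, suffixes=(f"_{left_name}", f"_{right_name}")
    )
    # Positive difference means the right-hand site is cheaper.
    merged["price_diff"] = (
        merged[f"price_{left_name}"] - merged[f"price_{right_name}"]
    )
    merged["cheaper"] = merged["price_diff"].apply(
        lambda d: right_name if d > 0 else left_name if d < 0 else "tie"
    )
    return merged


comparison = compare_prices(decolar, passagens_promo, "decolar", "passagens_promo")
print(comparison)
```

An inner join keeps only the searches that returned a result on both sites, which is usually what a side-by-side comparison needs.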

Project Description

The project consists of 3 main .ipynb files:

Additionally, the repository includes the following files:

  • Fligh Data.xlsx: The final output data
  • Dim_iata.xlsx: A list of airport IATA codes that the .ipynb files use
  • search_parameters.xlsx: Randomly generated search parameters that the .ipynb files use as input
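One way randomly generated search parameters like these could be produced from a list of IATA codes is sketched below. The function name `generate_search_parameters`, its arguments, and the output fields are hypothetical; the actual contents of search_parameters.xlsx are not described here, so treat this only as an illustration of the idea.

```python
import random
from datetime import date, timedelta


def generate_search_parameters(iata_codes, n, start, max_days_ahead=90, seed=None):
    """Return n random flight searches (hypothetical field names)."""
    rng = random.Random(seed)  # a seed makes the output reproducible
    rows = []
    for _ in range(n):
        # sample() picks two distinct airports, so origin != destination
        origin, destination = rng.sample(iata_codes, 2)
        departure = start + timedelta(days=rng.randint(1, max_days_ahead))
        rows.append({
            "origin": origin,
            "destination": destination,
            "departure_date": departure.isoformat(),
        })
    return rows


params = generate_search_parameters(
    ["GRU", "GIG", "BSB", "SSA"], n=5, start=date(2023, 5, 1), seed=42
)
for row in params:
    print(row)
```

The resulting list of dictionaries can be passed to `pandas.DataFrame` and written to a spreadsheet if needed.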

Requirements

This project requires the following dependencies:

  • Python 3
  • Requests
  • Beautiful Soup 4
  • Pandas
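To show how these libraries typically fit together, here is a minimal sketch: Requests downloads a page, Beautiful Soup extracts fields from the HTML, and Pandas holds the result as a table. The HTML structure, the CSS classes (`flight`, `airline`, `price`), and the helper names are invented for this example; the markup of the real sites will differ, and the notebooks may take a different approach.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse_flights(html):
    """Extract airline and price from hypothetical <div class="flight"> cards."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.flight"):
        rows.append({
            "airline": card.select_one(".airline").get_text(strip=True),
            "price": float(card.select_one(".price").get_text(strip=True)),
        })
    return pd.DataFrame(rows)


# Static sample markup (an assumption) so the parser can run without network access.
SAMPLE_HTML = """
<div class="flight"><span class="airline">Airline A</span><span class="price">850.00</span></div>
<div class="flight"><span class="airline">Airline B</span><span class="price">620.50</span></div>
"""

flights = parse_flights(SAMPLE_HTML)
print(flights)
```

For a live page, `parse_flights(fetch_html(url))` would chain the two steps, with the selectors adjusted to that page's actual markup.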

Usage

To run this project, follow these steps:

  1. Clone the repository to your local machine:

    git clone https://github.com/Lacerdash/WebScrapping-Flight-Data.git

  2. Navigate to the repository directory:

    cd WebScrapping-Flight-Data

  3. Open the WebScrappingPassagens.ipynb file in a Jupyter Notebook environment or your preferred IDE, and run the cells to execute the code.

  4. The output files will be saved in the output directory.
