Omar ElMaria's repositories
skyscanner_crawler
This repo contains a Python script that crawls 5120 flight routes from the popular flight aggregator Skyscanner
google_trends_py_script
This repo contains a script that pulls Google Trends data using the Pytrends library and plots the popularity of several keywords
motorcycle_importing_cost_analysis
This repo contains a Python script that uses Scrapy to scrape motorcycle attributes from a Polish website and Selenium to enter them into an online importing cost estimation tool
scrapy_playwright_with_proxy_service
This repo contains the source code showing how to integrate a proxy service (ScraperAPI) with Scrapy Playwright. The repo has two spiders, one for quotes.toscrape.com and the other for httpbin.org/ip
switchback_test_dag
This repo contains a data pipeline composed of Python and BigQuery scripts that extract, clean, and aggregate data, as well as perform statistical significance tests. The code is fully orchestrated on Airflow and feeds a Tableau dashboard that displays the success metrics of surge pricing switchback experiments.
wafid_crawling_bot
This repo contains a Python script that tracks the availability of medical appointments on https://wafid.com/medical-status-search/ in the UAE
airflow_local
This repo contains the DAGs that run on my local Airflow environment. I use the local environment to test my DAGs before deploying them to virtual machines via Kubernetes
anmeldung_bot
This is a bot that notifies the user of available Anmeldung (i.e., residence registration) appointments in Berlin, Germany
bq_routine_asa_and_scheme_versioning
[Delivery Hero] This is a BigQuery routine to detect changes in the ASA or scheme configuration throughout a particular time period. It lets you know which changes occurred and how many times they took place. This helps us troubleshoot data problems in experiments.
dfp_spielerbeurteilung_dashboard
This repo contains a Jupyter notebook that cleans data in CSV files and publishes the data to BigQuery so it gets fed to a Looker Studio dashboard
elasticity_test_analysis
This repo contains a Jupyter notebook that analyzes the orders and conversion rate (CVR) of various price elasticity tests in Thailand
gpt_python_bootcamp
This repo contains the materials used in the GPT Python bootcamp
indeed_crawler_mod
This script scrapes job listings on Indeed, a popular job platform. The code was modified to work on a Windows VM
latam_airlines_scraper_selenium
This repo contains a Python script that opens the LATAM Airlines website, enters parameters into the flight search fields, and scrapes data from the page using Selenium
loved_brands_automation_local
This repo contains an algorithm that identifies vendors whose customers have a higher willingness to pay. The inherent inelasticity of these vendors is utilized as part of a price differentiation strategy called "Loved Brands"
loved_brands_match_percentage_analysis
This repo contains a Python script that computes the match percentage of Loved Brands and Non-Loved Brands between different pipeline run dates
natural-earth-vector
A global, public domain map dataset available at three scales and featuring tightly integrated vector and raster data.
permanent_residence_appointment_finder
This repo contains a Selenium script that automatically checks for consultation appointments on the Volkshochschule Berlin Mitte website (https://vhsmitte.flexappoint.de/#/). This website is used to book appointments for the "Leben in Deutschland" test, which is a prerequisite for obtaining permanent residence or citizenship in Germany
randomization_algorithm_analysis
A Python script to analyze if the variant allocation algorithm produces any bias in switchback tests
wine_and_real_estate_listings_r_scraper
This repo contains two Rmd files. The first file scrapes wine listings under the brand name "mövenpick" using the rvest package. The second scrapes JavaScript-rendered apartment listings on the Swiss real estate website (homegate.ch) using RSelenium
wolt_crawler
This repo contains a Python Selenium script that scrapes the restaurant name, subtitle, delivery fee, and promised order time from the restaurant listing page of Wolt (https://wolt.com/en/discovery/restaurants)