victor-soeiro / WebScraping-Projects

This repository contains web scraping projects using Python.

WebScraping-Projects

Introduction

Over the years, Web Scraping has become a personal hobby and a way to challenge and practice my skills. Most of the projects from this period were never released publicly, so I decided to organize them and publish the code here on GitHub and the data on Kaggle.

My interest in Data Science encouraged me to use Web Scraping to analyze data on topics I care about, such as games and anime.

This repository will contain the code used to produce the data distributed on Kaggle, along with a step-by-step explanation of the process. Have fun with me as I venture into various sites with unstructured data.

Disclaimer: This repository is a personal project, distributed under the MIT License, for practicing Web Scraping and sharing free data for exploratory data analysis. I do not recommend using it for other purposes. Use at your own risk.

Tools

I use Python exclusively, along with some of its packages, such as:

  • BeautifulSoup
  • Requests
  • CloudScraper

Remember to respect each site's request limit so that you do not cause any harm.
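
One way to respect a site's request limit is to enforce a minimum delay between consecutive requests. The sketch below is a minimal illustration, not the code used in these projects. The function name `fetch_politely`, its parameters, and the one-second default delay are all assumptions made for the example.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], T],
    min_interval: float = 1.0,  # assumed default; choose a delay that suits the site
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[Tuple[str, T]]:
    """Yield (url, result) pairs, leaving at least `min_interval` seconds between calls."""
    last_call = None
    for url in urls:
        if last_call is not None:
            # Sleep only for the part of the interval that has not already elapsed.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        yield url, fetch(url)
```

With Requests, the `fetch` argument could be something like `lambda url: requests.get(url, timeout=10)`. A suitable delay depends on the site's terms of use and its robots.txt.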

Projects Description

You can recommend any site for this project: just send me an email with the site and the reason it should be part of this repository.

Below are all the projects I have completed, with their links. I hope you have a lot of fun.

projects         category      github  kaggle
01 anime-planet  comics        Link    Link
02 tapas         comics        Link    Link
03 toomics       comics        Link    Link
04 jmlr          articles      Link    Link
05 webtoons      comics        Link    Link
06 afk-arena     games         Link
07 arknights     games         Link    Link
08 justwatch     streamings    Link    Multiple Links¹
09 funko pop     collectibles  Link    Link
10 a24           movies
11
12
13
14

Ref. 1: Each streaming service has its own link. Below is the full list of streaming links:

streamings    kaggle
hbo max       Link
hulu          Link
netflix       Link
amazon prime  Link
paramount     Link
disney+       Link
crunchyroll   Link
dark matter   Link
rakuten viki  Link

License

Copyright (c) 2022 Victor Soeiro

This project is licensed under the MIT License.

Contact

If you have any questions or suggestions, email me at victor.soeiro.araujo@gmail.com.

Languages

Language:Jupyter Notebook 100.0%