ScrapeIt This is a web scraper that collects all the details of the mobile phones listed on a mobile phone website.
The code has three parts:
- script
- spider
- convert
Script file This is the main file. It processes the list of phones and gets their respective links. Because the site is dynamic, the script checks whether a specific row number has loaded. If it has not, the script waits until that row is loaded, then scrapes the page for all links that lead to a phone's specification page. Finally, it saves the data as a JSON file.
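The wait-then-scrape flow described above can be sketched as follows. This is only an illustration, not the code in script.py: the browser automation is abstracted behind a hypothetical `count_rows` callable, and the helper names (`wait_for_row`, `extract_phone_links`, `save_links`), the timeout values, and the assumption that every phone link is a plain `<a href>` tag are all made up for the example.

```python
import json
import time
from html.parser import HTMLParser


def wait_for_row(count_rows, target_row, timeout=10.0, poll=0.5):
    """Poll count_rows() until at least target_row rows are loaded.

    count_rows is a hypothetical callable that reports how many rows the
    dynamic page currently shows. Returns False if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if count_rows() >= target_row:
            return True
        time.sleep(poll)
    # One last check after the deadline has passed.
    return count_rows() >= target_row


class PhoneLinkParser(HTMLParser):
    """Collect {link text: href} for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = {}
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            if name:
                self.links[name] = self._href
            self._href = None


def extract_phone_links(html):
    """Return a phone_name -> phone_relative_link mapping from page HTML."""
    parser = PhoneLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def save_links(links, path):
    """Write the mapping to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(links, fh, indent=2)
```

Polling with a deadline keeps the script from hanging forever if the row never appears. A real run would pass a function that asks the browser for the current row count.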
Spider This uses the saved JSON file, which has the format phone_name:phone_relative_link. The spider uses this data to crawl each phone's page and saves the scraped data as a dictionary. Finally, the dictionaries for all the phones are collected into a list and saved to a file.
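A minimal sketch of the spider step, under stated assumptions: the page fetching and parsing is passed in as a hypothetical `fetch_specs` callable, `BASE_URL` is a placeholder (the real URL is kept in secret.py), and the output is written as JSON, which the description above does not confirm. The function names are illustrative and are not taken from spider.py.

```python
import json
from urllib.parse import urljoin

# Placeholder only; the project stores the real URL in secret.py.
BASE_URL = "https://example.com/"


def crawl_phones(links, fetch_specs, base_url=BASE_URL):
    """Build one dictionary per phone from a name -> relative link mapping.

    fetch_specs is a hypothetical callable that takes an absolute URL and
    returns a dict of specifications scraped from that page.
    """
    phones = []
    for name, relative_link in links.items():
        url = urljoin(base_url, relative_link)
        specs = {"name": name, "url": url}
        specs.update(fetch_specs(url))
        phones.append(specs)
    return phones


def run_spider(links_path, output_path, fetch_specs, base_url=BASE_URL):
    """Load the saved links, crawl every phone, and save the list of dicts."""
    with open(links_path, encoding="utf-8") as fh:
        links = json.load(fh)
    phones = crawl_phones(links, fetch_specs, base_url)
    with open(output_path, "w", encoding="utf-8") as fh:
        json.dump(phones, fh, indent=2)
    return phones
```

Passing the fetcher in as an argument keeps the crawl loop independent of any particular HTTP or parsing library, so it can be exercised with a stub before touching the real site.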
Convert This converts the saved file into a CSV file so that the data is easier to work with.
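The conversion could look roughly like the sketch below. It assumes the spider's output is a JSON list of dictionaries and that different phones may carry different keys, so the CSV header is the union of all keys. The function names are hypothetical and do not come from convert.py.

```python
import csv
import io
import json


def phones_to_csv(phones, out):
    """Write a list of phone dictionaries to the open text stream out."""
    # Union of keys in first-seen order, since phones may list different specs.
    fieldnames = []
    for phone in phones:
        for key in phone:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills in an empty cell when a phone lacks a column.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(phones)
    return fieldnames


def convert(input_path, output_path):
    """Read the saved JSON list and write it out as a CSV file."""
    with open(input_path, encoding="utf-8") as fh:
        phones = json.load(fh)
    # newline="" is what the csv module expects for files it writes to.
    with open(output_path, "w", newline="", encoding="utf-8") as fh:
        phones_to_csv(phones, fh)


if __name__ == "__main__":
    # Small in-memory demo with made-up data.
    demo = io.StringIO()
    phones_to_csv([{"name": "Alpha 1", "ram": "8 GB"}], demo)
    print(demo.getvalue())
```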
Note: the URL used points to a public website, but it is stored as a variable in the secret.py file.
How to use:
- Run script.py
- Run spider.py
- Run convert.py
Contributing Help If you are interested in contributing to the project, please follow the steps and rules below.
- Fork the project (star the repo before that).
- Clone it: https://github.com/<username>/ScrapeIt.git
- Look for open issues under the Issues tab. Go through them and pick one. Make sure you get assigned, or at least comment that you are going to work on it.
- Always create a new branch for the feature or bug you are working on. If you are not familiar with branching, see Git Branching.
- If you use any additional module to implement a new feature, install it in the virtual environment and update requirements.txt with the command below:
pip freeze > requirements.txt
If you have any doubts or issues, let the maintainers know. They will be ready to help.