dasdachs / wd-1-scraping-exercise

Educational project

Intro to Python recap exercise

Important: this project is for educational purposes only. Any misuse or commercial use is forbidden.

NOTE:

  1. The docs assume that you use PyCharm as your editor/IDE of choice.
  2. The repository uses type hints. They are optional, so you can ignore them. They are there to hint at what each function expects and returns, and to give you better IDE support.
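To make the note on type hints concrete, here is a minimal sketch. The function names are made up for illustration and are not taken from the repository. It shows that the typed and untyped versions behave the same at runtime, because Python does not enforce the hints.

```python
def total_price(prices, quantity):
    """Untyped version: works fine, but the IDE cannot tell what to pass in."""
    return sum(prices) * quantity


def total_price_typed(prices: list[float], quantity: int) -> float:
    """Same logic with type hints; the behavior at runtime is identical."""
    return sum(prices) * quantity


# Both versions return the same result.
print(total_price([1.5, 2.5], 2))        # 8.0
print(total_price_typed([1.5, 2.5], 2))  # 8.0

# Hints are not enforced at runtime: passing ints instead of floats still works.
print(total_price_typed([1, 2], 2))      # 6
```

The hints only matter to tools such as the IDE, which can use them for autocompletion and warnings.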

What we will build: a CLI application to scrape, store, and update search items.

We will use some battle-tested third-party libraries:

  • requests: making HTTP requests from our code
  • beautifulsoup: parsing HTML files
  • click: parsing (getting) commands and arguments from the command line
  • rich: formatting the output to stdout (command line or shell)
  • black: automatic code formatter for Python
  • pytest: writing and running tests

Project setup

The url of the project is https://github.com/dasdachs/wd-1-scraping-exercise.git

Via PyCharm

(from the official docs)

  • From the main menu, choose VCS | Get from Version Control.
  • In the Get from Version Control dialog, choose GitHub on the left.
  • Specify the URL of the repository that you want to clone. You can select a repository from the list of all GitHub projects associated with your account and the organization that your account belongs to.
  • In the Directory field, enter the path to the folder where your local Git repository will be created.
  • Click Clone. If you want to create a project based on these sources, click Yes in the confirmation dialog. PyCharm will automatically set Git root mapping to the project root directory.

Via command line

git clone https://github.com/dasdachs/wd-1-scraping-exercise.git
cd wd-1-scraping-exercise
python -m venv venv
# Activate the virtual environment (OS-specific command)
venv\Scripts\activate.bat  # Windows
source venv/bin/activate   # macOS or Linux
# Install the dependencies into the active environment
pip install -r requirements.txt

Exercise

Once set up, these are the tasks:

  1. Go to bolha.com and run a few searches with the browser's developer tools open. Then answer the following questions:

    • What changes when you run a search?
    • Where in the page markup are the items?
    • What are the properties of the items (title, URL, price, etc.)?
    • (BONUS) Do the same analysis for results that span multiple pages (pagination).
    • (BONUS) Do the same analysis for filters, e.g. newest first.

    Write the answers on paper or in a text file.

  2. Open src/scaper.py and finish the functions

  3. Open src/models.py and finish the class

  4. Open cli.py and finish the functions
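The tasks above can be sketched roughly as follows. This is not the solution to the exercise: the base URL, the query parameter names (`keywords`, `page`), the CSS selectors, the `SearchItem` class, and both function names are assumptions made up for illustration. The real values are what you find in task 1, and the real function and class names are the ones already in the repository.

```python
from dataclasses import dataclass
from urllib.parse import urlencode, urljoin

from bs4 import BeautifulSoup

# Hypothetical base URL; replace it with what you see in the developer tools.
BASE_URL = "https://example.com/search/"


@dataclass
class SearchItem:
    """One search result; the fields mirror the properties noted in task 1."""
    title: str
    url: str
    price: str


def build_search_url(keywords: str, page: int = 1) -> str:
    """Encode the search term and page number into the query string."""
    return BASE_URL + "?" + urlencode({"keywords": keywords, "page": page})


def parse_items(html: str) -> list[SearchItem]:
    """Extract items from result markup (the selectors are assumptions)."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for node in soup.select("li.result"):
        link = node.select_one("a.title")
        price = node.select_one(".price")
        if link is None or price is None:
            continue  # skip entries that lack a title link or a price
        items.append(
            SearchItem(
                title=link.get_text(strip=True),
                url=urljoin(BASE_URL, link["href"]),  # make relative links absolute
                price=price.get_text(strip=True),
            )
        )
    return items


# Made-up markup standing in for a downloaded results page.
SAMPLE_HTML = """
<ul>
  <li class="result"><a class="title" href="/ad/1">Old bike</a><span class="price">50 EUR</span></li>
  <li class="result"><a class="title" href="/ad/2">Desk lamp</a><span class="price">12 EUR</span></li>
  <li class="banner">Sponsored</li>
</ul>
"""

print(build_search_url("old bike", page=2))
for item in parse_items(SAMPLE_HTML):
    print(item)
```

In the real scraper the HTML would come from `requests.get(url).text` instead of a hard-coded string. Parsing a fixed sample first makes the selectors easy to check without hitting the site on every run.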

What's next? Expand the CLI arguments, add more data, etc.
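As one possible direction for expanding the CLI, here is a minimal click sketch. The `search` command and its `--page` and `--newest-first` options are hypothetical and are not taken from the repository's cli.py. The command only echoes what it would do and makes no request. `CliRunner` from `click.testing` is used to invoke it without a real shell.

```python
import click
from click.testing import CliRunner


@click.command()
@click.argument("keywords")
@click.option("--page", default=1, show_default=True, type=int, help="Result page to fetch.")
@click.option("--newest-first", is_flag=True, help="Sort results by date.")
def search(keywords: str, page: int, newest_first: bool) -> None:
    """Search for KEYWORDS (hypothetical command, no request is made)."""
    order = "newest first" if newest_first else "default order"
    click.echo(f"Searching for '{keywords}' (page {page}, {order})")


# Invoke the command the way a user would from the shell.
runner = CliRunner()
result = runner.invoke(search, ["old bike", "--page", "2"])
print(result.output)
```

Each new option shows up as a keyword argument of the decorated function, so adding a filter is mostly a matter of adding one decorator and one parameter.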

Formatting

Black is used to ensure code style consistency.

You can run Black from the command line:

black -v .

Or integrate it with PyCharm using these steps.

Tests

When you are done with a task, run the test suite:

pytest -v

Or run the tests from PyCharm.
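If you want to add tests of your own, a pytest test is just a function whose name starts with `test_` and which uses plain `assert` statements. The sketch below is an illustration only: `parse_price` is a hypothetical helper, not part of the repository, and it assumes prices use a dot as the thousands separator and a comma as the decimal separator.

```python
def parse_price(text: str) -> float:
    """Turn a price string such as '1.250,50 EUR' into a float (hypothetical helper)."""
    # Keep only digits and separators, dropping the currency and spaces.
    cleaned = "".join(ch for ch in text if ch.isdigit() or ch in ",.")
    # Assumed format: '.' separates thousands, ',' separates decimals.
    return float(cleaned.replace(".", "").replace(",", "."))


def test_parse_price_plain():
    assert parse_price("50 EUR") == 50.0


def test_parse_price_with_thousands_separator():
    assert parse_price("1.250,50 EUR") == 1250.5
```

Saved in a file whose name starts with `test_`, these functions are picked up automatically by `pytest -v`.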

Misc

Contact the author to report bugs or other issues.

License: MIT License

