Problem when scraping data from theprotocol.it website

Question

SebastianSlezak opened this issue 4 months ago

Short Description
Job listings from theprotocol.it are not scraped.

Detailed Description
The problem occurs every time we want to scrape data from theprotocol.it.
The bug was tested on the links:

For both links the error is the same: no jobs are scraped, and the logs show an endless loop of messages like the following:

Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=1
Found 53 job offers
Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=2
Found 53 job offers
Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=3
Found 53 job offers
Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=4
Found 53 job offers
Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=5
Found 53 job offers
Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=6
Found 53 job offers

Steps to Reproduce

Add a theprotocol.it link to "websites", for example the theprotocol.it link given in the documentation.
Run the script.

Expected Behavior
Jobs from theprotocol.it are scraped.

Additional Information
I was scraping job listings into a Google Sheet.
Scraping from other sites worked fine; only theprotocol.it produced a loop, and its listings were not added to the Google Sheet.
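
The log above repeats "Found 53 job offers" on every page, which suggests each request may be returning the same listings. As an illustration only (this is not the project's actual code), here is a minimal sketch of a pagination loop that stops as soon as a page yields no new offers. The names `scrape_all_pages`, `fetch_page`, and `max_pages` are hypothetical.

```python
from typing import Callable, Iterable, List


def scrape_all_pages(
    fetch_page: Callable[[int], Iterable[str]],  # hypothetical: returns offer URLs for a page
    max_pages: int = 50,                         # hypothetical hard cap as a second safety net
) -> List[str]:
    """Collect offers page by page, stopping when a page adds nothing new."""
    seen = set()
    collected: List[str] = []
    for page in range(1, max_pages + 1):
        found_new = False
        for offer in fetch_page(page):
            if offer not in seen:
                seen.add(offer)
                collected.append(offer)
                found_new = True
        # A page made only of already-seen offers means pagination is not
        # advancing (or the listing has ended), so stop instead of looping.
        if not found_new:
            break
    return collected
```

With a guard like this, a site that keeps serving the first page would be requested twice and then abandoned, rather than being visited indefinitely.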

Kacper Włodarczyk · Answer 1 · Tue Apr 16 2024 01:10:19 GMT+0800 (China Standard Time)

This is a problem with pagination.

For example, go to this link https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw,
click on the first offer in the list, and then copy the URL.

The offers view should look like this
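
One observation that may be related, though it is not confirmed anywhere in this thread: the URLs in the log append `&pageNumber=N` directly to a path that contains no `?`, so the page number ends up as part of the path rather than as a query parameter. The sketch below assumes the site reads `pageNumber` from the query string and shows one way to set it whether or not the URL already has a query. The helper name `with_page_number` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page_number(url: str, page: int, param: str = "pageNumber") -> str:
    """Return url with the page parameter set in the query string."""
    parts = urlsplit(url)
    # Keep existing query parameters, dropping any previous page number.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, str(page)))
    # urlunsplit inserts the "?" separator when the query is non-empty.
    return urlunsplit(parts._replace(query=urlencode(query)))


base = "https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw"
print(with_page_number(base, 2))
```

For the filter URL from the log this yields a URL ending in `;rw?pageNumber=2`, while a URL that already carries query parameters keeps them and only has its page number replaced.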

Sebastian Ślęzak · Answer 2 · Tue Apr 16 2024 02:23:24 GMT+0800 (China Standard Time)

After following the steps you indicated, everything scrapes correctly.