Problem when scraping data from theprotocol.it website
SebastianSlezak opened this issue
Short Description
Job listings from theprotocol.it are not scraped.
Detailed Description
The problem occurs every time we want to scrape data from theprotocol.it.
The bug was tested on the links:
For both links the error is the same: the jobs are not scraped, and the logs show the following messages repeating in an endless loop:
Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=1
Found 53 job offers
Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=2
Found 53 job offers
Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=3
Found 53 job offers
Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=4
Found 53 job offers
Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=5
Found 53 job offers
Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=6
Found 53 job offers
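The identical "Found 53 job offers" count on every page suggests that each request may be returning the same listings. Below is a minimal sketch of a guard that stops paginating once a page contributes no new offers. It is not the project's actual code: `collect_offers`, `fetch_page`, and `max_pages` are hypothetical names, and `fetch_page` stands in for whatever function returns the offer links found on a given page.

```python
from typing import Callable, Iterable, List, Set


def collect_offers(
    fetch_page: Callable[[int], Iterable[str]],  # hypothetical: returns offer links for a page
    max_pages: int = 50,                         # hypothetical hard cap as a second safety net
) -> List[str]:
    """Collect offer links page by page, stopping when a page adds nothing new."""
    seen: Set[str] = set()
    offers: List[str] = []
    for page_number in range(1, max_pages + 1):
        added = 0
        for link in fetch_page(page_number):
            if link not in seen:
                seen.add(link)
                offers.append(link)
                added += 1
        # A page with no unseen links means we are past the last page,
        # or the site is serving the same page again.
        if added == 0:
            break
    return offers


if __name__ == "__main__":
    # Stub that ignores the page number, mimicking the looped log above.
    def same_page_every_time(page_number: int) -> List[str]:
        return [f"https://example.com/offer/{i}" for i in range(53)]

    print(len(collect_offers(same_page_every_time)))  # prints 53
```

With a guard like this, a site that keeps returning page 1 costs one extra request instead of an endless loop.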
Steps to Reproduce
- Add a link to "websites", for example a theprotocol.it link or the theprotocol.it link given in the documentation
- Run the script
Expected Behavior
Job listings from theprotocol.it are scraped.
Additional Information
I was scraping job listings into a Google Sheet.
Scraping from other sites worked fine; only theprotocol.it went into a loop, and its listings were not added to the Google Sheet.
This is a problem with pagination.
For example, go to this link: https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw
Click on the first offer in the list, and then copy the URL.
The offers view should look like this
After following the steps you indicated, everything scrapes correctly.
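One possible reading of the log, not confirmed in this thread: the logged URLs have `&pageNumber=N` appended directly to a URL that contains no `?`, so the page number may never reach the site as a query parameter. A minimal sketch of adding the parameter with the standard library instead of string concatenation follows. The helper name `with_page_number` is hypothetical, the parameter name `pageNumber` is taken from the log, and whether the site honors it on a given URL is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page_number(url: str, page_number: int) -> str:
    """Return url with its pageNumber query parameter set to page_number."""
    parts = urlsplit(url)
    # Keep existing query parameters, dropping any previous pageNumber.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "pageNumber"
    ]
    query.append(("pageNumber", str(page_number)))
    # urlunsplit inserts the "?" separator when the query is non-empty.
    return urlunsplit(parts._replace(query=urlencode(query)))
```

This works whether or not the input URL already has a query string, which avoids depending on the exact form of the link pasted into the configuration.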