DEENUU1 / job-scraper

🔎 Job offers scraper: bulldogjob.pl, indeed.com, it.pracuj.pl, jooble.org, justjoin.it, nofluffjob.com, olx.pl, pracuj.pl, theprotocol.it, useme.com

Problem when scraping data from theprotocol.it website

SebastianSlezak opened this issue

Short Description
Job listings from theprotocol.it are not scraped.

Detailed Description
The problem occurs every time we try to scrape data from theprotocol.it.
The bug was reproduced with the following links:

  1. theprotocol.it 1
  2. theprotocol.it 2

For both links the result is the same: no jobs are scraped, and the logs fill with endlessly repeating messages like the following:

Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=1
Found 53 job offers
Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=2
Found 53 job offers
Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=3
Found 53 job offers
Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=4
Found 53 job offers
Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=5
Found 53 job offers
Successfully visited https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=6
Found 53 job offers

Steps to Reproduce

  1. Add a link to "websites", for example theprotocol.it or the one given in the documentation (theprotocol.it from the documentation)
  2. Run the script

Expected Behavior
Jobs from theprotocol.it are scraped.

Additional Information
I was scraping job listings into a Google Sheet.
Scraping from other sites worked fine; only theprotocol.it got stuck in a loop, and its listings were never added to the Google Sheet.
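
The log above reports "Found 53 job offers" for every page number, which suggests that each request returns the same listings. As a minimal sketch (not the project's actual code), a pagination loop can protect itself by stopping once a page yields no links it has not already seen. The names `iter_new_offers`, `fetch_page`, and `max_pages` are hypothetical and introduced only for illustration.

```python
from typing import Callable, Iterable, Iterator


def iter_new_offers(
    fetch_page: Callable[[int], Iterable[str]],
    max_pages: int = 100,
) -> Iterator[str]:
    """Yield offer links page by page; stop when a page adds nothing new.

    `fetch_page` is a hypothetical callable that returns the offer links
    found on the given page number.
    """
    seen = set()
    for page_number in range(1, max_pages + 1):
        found_new = False
        for link in fetch_page(page_number):
            if link not in seen:
                seen.add(link)
                found_new = True
                yield link
        if not found_new:
            # Empty page or a repeat of earlier pages: stop paginating.
            break
```

With a guard like this, a site that keeps returning the first page would be requested twice and then abandoned, instead of being visited indefinitely. The `max_pages` cap is a second safety net for the case where every page contains at least one new link.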

This is a problem with pagination.

For example, go to this link: https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;wp/zdalna,stacjonarna,hybrydowa;rw
then click the first offer in the list and copy the URL.

image

The offers view should look like this:

image
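
One observation that may be related, although this thread does not confirm it as the cause: the URLs in the log earlier in this issue append `&pageNumber=N` to an address that contains no `?`. The sketch below uses only Python's standard `urllib.parse` module to show how such a URL is split. It is an illustration of URL parsing, not a description of how the scraper or the website handles the request.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied verbatim from the log in this issue.
logged_url = (
    "https://theprotocol.it/filtry/trainee,assistant,junior;p/krakow;"
    "wp/zdalna,stacjonarna,hybrydowa;rw&pageNumber=1"
)

parts = urlsplit(logged_url)

# There is no "?" in the URL, so the query string is empty and
# "pageNumber" is not parsed as a query parameter.
query_is_empty = parts.query == ""
page_params = parse_qs(parts.query).get("pageNumber", [])

# Instead, "&pageNumber=1" ends up as part of the last path segment.
path_tail = parts.path.rsplit(";", 1)[-1]

print(query_is_empty, page_params, path_tail)
```

Because the page number sits in the path rather than in the query string, a server could plausibly ignore it and return the first page each time. Whether the URL copied after opening an offer differs in this respect is an assumption that the thread does not spell out.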

After following the steps you indicated, everything scrapes correctly.