teamcapybara / capybara

Acceptance test framework for web applications

Home Page: http://teamcapybara.github.io/capybara/

Issue with Capybara Gem - Scraping Blocked on Indeed Site

ror-web-expert opened this issue

Problem:
I'm facing an issue with my Rails application that involves scraping data from different sites using the Capybara gem. Everything works fine for most sites, but I'm encountering a problem specifically with Indeed.

Description:
When I attempt to scrape data from Indeed with the headless option set to true, I get blocked. However, when I set the headless option to false, the scraping works fine. Upon inspecting the screenshot generated by @session.save_screenshot, it clearly indicates that I've been blocked.

[Screenshot: capybara-202401171752145506410800]

Steps to Reproduce:

1. Set headless: true in browser options.
2. Attempt to scrape data from Indeed.
3. Observe the blocking issue.

Expected Behavior:
Scraping should work seamlessly with headless mode enabled, just as it does for other sites.
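
The issue does not include the driver configuration, so the following is only a minimal sketch of what the headless toggle from the steps above might look like. It assumes the Selenium driver with Chrome, which the report does not confirm. The driver name :chrome_configurable and the HEADLESS constant are hypothetical names chosen for illustration.

```ruby
require "capybara"
require "selenium-webdriver"

# Hypothetical toggle standing in for the "headless" browser option
# mentioned in the steps to reproduce.
HEADLESS = true

# Assumption: Selenium + Chrome. The issue does not say which driver is used.
Capybara.register_driver :chrome_configurable do |app|
  options = Selenium::WebDriver::Chrome::Options.new
  # Chrome runs without a visible window only when this flag is passed.
  options.add_argument("--headless=new") if HEADLESS

  Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)
end
```

With a registration like this, the only difference between the two runs described in the report would be whether the --headless=new argument is passed to Chrome.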

Environment:

Rails Version: 7
Capybara Version: 3.39.2
Nokogiri Version: 1.15.4-x86_64-linux

Additional Information:

Adding a proxy service did not resolve the issue.
The problem seems specific to the interaction between Indeed and Capybara with headless mode.

Workaround:

Setting headless: false resolves the blocking issue, but this is not an ideal solution.

Request for Assistance:
I'm seeking guidance on potential solutions or workarounds to enable headless scraping for Indeed without being blocked.
Any insights or recommendations would be greatly appreciated.

Thank you for your assistance!

I fail to see how this is an issue with Capybara. Capybara is a tool for testing web apps, not a scraping tool that actively hides itself from sites. The fact that you're using it to abuse Indeed's terms of service is not something we can help you with.