How to parse pages with HTTP errors (403, 404)

Question


n-studio opened this issue 3 years ago

I'm using mechanize.

Some pages that return an error status (403, 404, ...) still contain valuable information, but the scraper just retries or skips them. Is there a way to treat error pages the same way as 200 pages?

n-studio · Answer 1 · Thu Sep 02 2021 03:33:05 GMT+0800 (China Standard Time)

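I fixed my issue by configuring Capybara so that it no longer raises on server errors:

# Stop Capybara from re-raising server errors,
# so error pages can be handled like any other page.
Capybara.configure do |config|
  config.raise_server_errors = false
end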
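
For comparison, here is a minimal sketch of the same idea outside Capybara, using only Ruby's standard library. `Net::HTTP.get_response` returns the response object for any status code, so the body of a 403 or 404 page can be read like any other. The helper names `fetch_any_status` and `start_demo_404_server`, and the local demo server itself, are hypothetical and exist only to make the sketch runnable offline. This is not the mechanize or Capybara API.

```ruby
require "net/http"
require "socket"

# Fetch a URL and return [status_code, body] without raising on 4xx/5xx.
# Net::HTTP.get_response hands back the response for any status code.
def fetch_any_status(uri)
  response = Net::HTTP.get_response(uri)
  [response.code.to_i, response.body]
end

# Hypothetical demo server (an assumption, for illustration only):
# answers every request with a 404 that still carries an HTML body.
# Returns the port it is listening on.
def start_demo_404_server
  server = TCPServer.new("127.0.0.1", 0) # port 0 = pick any free port
  body = "<html><body><h1>Not Found</h1><p>Try /archive</p></body></html>"
  Thread.new do
    loop do
      client = server.accept
      # Consume the request line and headers up to the blank line.
      loop do
        line = client.gets
        break if line.nil? || line == "\r\n"
      end
      client.write(
        "HTTP/1.1 404 Not Found\r\n" \
        "Content-Type: text/html\r\n" \
        "Content-Length: #{body.bytesize}\r\n" \
        "Connection: close\r\n\r\n" \
        "#{body}"
      )
      client.close
    end
  end
  server.addr[1]
end

port = start_demo_404_server
status, body = fetch_any_status(URI("http://127.0.0.1:#{port}/missing"))
puts status
puts body
```

The point of the sketch is only that an error status and a usable body are not mutually exclusive. Whether a given scraper exposes that body depends on how its driver reacts to non-2xx responses.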