Issues with using skip_request_errors
doutatsu opened this issue
I am trying to use the provided configuration to skip 404 errors, but instead a RuntimeError is raised. Perhaps this is the intended behaviour, but I was expecting to get false, an empty object, or something similar. Let me know if I misunderstood the functionality. Here is the configuration:
# frozen_string_literal: true
require 'kimurai'

module Spiders
  class Test < Kimurai::Base
    @name = 'test_spider'
    @disable_images = true
    @engine = :mechanize
    @skip_request_errors = [
      { error: RuntimeError }
    ]

    def parse(response, url:, data: {})
    end
  end
end
If I then run it with Spiders::Test.parse!(:parse, url: 'https://google.com/asdfsdf'), I get back this error:
BrowserBuilder (mechanize): created browser instance
Browser: started get request to: https://google.com/asdfsdf
Browser: driver mechanize has been destroyed
Traceback (most recent call last):
2: from (irb):2
1: from (irb):2:in `rescue in irb_binding'
RuntimeError (Received the following error for a GET request to https://google.com/asdfsdf: '404 => Net::HTTPNotFound for https://google.com/asdfsdf -- unhandled response')
Am I doing something wrong, or is that the expected behaviour? I also tried this for the configuration:
{ error: RuntimeError, message: '404 => Net::HTTPNotFound' }
Hi @doutatsu,
skip_request_errors is a key of the @config hash, not a separate instance variable. Here's how to write it the right way:
module Spiders
  class Test < Kimurai::Base
    @name = "test_spider"
    @engine = :mechanize
    @config = {
      disable_images: true,
      skip_request_errors: [{ error: RuntimeError, message: "404 => Net::HTTPNotFound" }]
    }

    def parse(response, url:, data: {})
    end
  end
end
As Mechanize raises a RuntimeError with a more generic message, you can also write it this way:
skip_request_errors: [{ error: RuntimeError, message: "Received the following error" }]
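To illustrate why both message values match the error from the traceback above, here is a minimal plain-Ruby sketch. It assumes a rule applies when the error class is equal to the configured one and, if a message is given, when the error message contains that string (or matches it, for a Regexp). The helper name skip_error? is hypothetical and this is not Kimurai's actual implementation; check the framework source for the exact rules.

```ruby
# Hypothetical helper (NOT Kimurai's API): decides whether an error
# would be skipped under a list of { error:, message: } rules.
# Assumption: exact class match, plus substring/Regexp match on the
# message when a :message key is present.
def skip_error?(error, rules)
  rules.any? do |rule|
    next false unless error.class == rule[:error]

    pattern = rule[:message]
    case pattern
    when nil    then true                              # class-only rule
    when Regexp then error.message.match?(pattern)
    else             error.message.include?(pattern.to_s)
    end
  end
end

# The error message reported in the traceback above.
err = RuntimeError.new(
  "Received the following error for a GET request to " \
  "https://google.com/asdfsdf: '404 => Net::HTTPNotFound for " \
  "https://google.com/asdfsdf -- unhandled response'"
)

specific = [{ error: RuntimeError, message: "404 => Net::HTTPNotFound" }]
generic  = [{ error: RuntimeError, message: "Received the following error" }]

skip_error?(err, specific) # => true
skip_error?(err, generic)  # => true
```

Under these assumptions, the narrower "404 => Net::HTTPNotFound" message skips only 404 responses, while the generic prefix would also skip other request errors reported with the same wording.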
I hope this helps!
I thought that because you can do @disable_images = true, you could set any of the configuration options that way, by specifying the option as an instance variable. I'll try it out and see if it works.
@doutatsu, please check the README at https://github.com/vifreefly/kimuraframework#all-available-config-options which is the reference for all config options. As you can see there, you cannot do @disable_images = true; it will not work.
Thanks, @vifreefly, I didn't realise that. I misunderstood how the configuration works. Sorry for the trouble.
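For readers who hit the same confusion, the following plain-Ruby sketch shows why a stray class-level instance variable raises no error yet has no effect. MiniBase and its setting method are hypothetical stand-ins, assuming a framework that reads options only from the @config hash; this is not Kimurai's real code.

```ruby
# Hypothetical stand-in for a spider base class that reads options
# ONLY from the class-level @config hash (an assumption for this sketch).
class MiniBase
  class << self
    attr_reader :config
  end

  # Look up one option; returns nil when it is not in @config.
  def self.setting(key)
    (config || {})[key]
  end
end

class WrongSpider < MiniBase
  # Valid Ruby, so no error is raised, but nothing ever reads it.
  @disable_images = true
end

class RightSpider < MiniBase
  @config = { disable_images: true }
end

WrongSpider.setting(:disable_images) # => nil
RightSpider.setting(:disable_images) # => true
```

Because Ruby accepts any instance variable assignment in a class body, a misplaced option fails silently, which is why the README's list of config options is the place to check.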