vifreefly / kimuraframework

Kimurai is a modern web scraping framework written in Ruby which works out of the box with Headless Chromium/Firefox, PhantomJS, or simple HTTP requests and allows you to scrape and interact with JavaScript-rendered websites.

request_to method throws argument error for Ruby 3.0

nick-symon opened this issue

Hello,

First, thank you for maintaining this fantastic framework.

I set up a spider almost identical to the one in the README. I also wrote a parse function with the same arguments as those specified in the README: (response, url:, data: {}).

In my first parse function, I used the request_to method to route URLs to a second parse function, which had the same arguments as the first.

I got the following error: wrong number of arguments (given 2, expected 1; required keyword: url) (ArgumentError)

I'm running Ruby 3.0.1.

I believe there may be an issue with the use of keyword arguments in the request_to method related to Ruby 3.0. The spider works fine when I visit the url using the browser object and call the second parse function directly.
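The keyword-argument hypothesis can be checked without Kimurai at all. This is a minimal sketch assuming Ruby 3.0 or later; parse only mirrors the README handler signature, and call_positional and call_with_splat are hypothetical helpers made up for illustration. Passing the hash positionally should raise the same ArgumentError reported above, while expanding it with ** should not.

```ruby
# Assumes Ruby 3.0+. `parse` mirrors the README handler signature;
# call_positional and call_with_splat are hypothetical helpers.
def parse(response, url:, data: {})
  "parsed #{url} (#{response})"
end

# Roughly what request_to does today: the hash arrives as a second positional argument.
def call_positional(request_data)
  parse(:response, request_data)
rescue ArgumentError => e
  "ArgumentError: #{e.message}"
end

# Expanding the hash into keyword arguments with the double splat.
def call_with_splat(request_data)
  parse(:response, **request_data)
end

request_data = { url: "https://example.com", data: {} }
puts call_positional(request_data) # on Ruby 3 this prints the ArgumentError message
puts call_with_splat(request_data) # the handler receives url: and data: as keywords
```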

This appears to be similar to the related issue with rbcat.

I'm relatively new to Ruby, so I apologize in advance for any inaccuracies!

Perhaps use the double splat operator to convert the hash that request_to passes to the handler into keyword arguments, since handlers typically accept keyword arguments according to the conventions in the README:

Current:

def request_to(handler, url:, data: {})
  request_data = { url: url, data: data }

  browser.visit(url)
  public_send(handler, browser.current_response, request_data)
end

With the double splat:

def request_to(handler, url:, data: {})
  request_data = { url: url, data: data }

  browser.visit(url)
  public_send(handler, browser.current_response, **request_data)
end
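To sanity-check the proposed change in isolation, here is a minimal sketch that copies the patched request_to into a stand-in class. StubBrowser and StubSpider are hypothetical stubs invented for this example, not Kimurai classes; they provide only visit and current_response so the method body can run unchanged.

```ruby
# Hypothetical stub: just enough of a "browser" for request_to to call.
class StubBrowser
  attr_reader :current_response

  def visit(url)
    @current_response = "<html>#{url}</html>"
  end
end

# Hypothetical stub spider holding the patched request_to from the comment above.
class StubSpider
  attr_reader :browser, :parsed

  def initialize
    @browser = StubBrowser.new
    @parsed = []
  end

  def request_to(handler, url:, data: {})
    request_data = { url: url, data: data }

    browser.visit(url)
    # ** expands request_data into url: and data: keyword arguments.
    public_send(handler, browser.current_response, **request_data)
  end

  # Handler with the README-style signature; records what it received.
  def parse(response, url:, data: {})
    @parsed << [response, url, data]
  end
end

spider = StubSpider.new
spider.request_to(:parse, url: "https://example.com/page", data: { page: 1 })
p spider.parsed
```

Passing **hash to a method that takes keyword arguments should also work on Ruby 2.x, so this change shouldn't break older Rubies.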

The reason this currently fails is that the parse method, as currently defined, has a required url keyword argument. When request_to is used, url is placed inside the request_data hash, and that hash is passed to parse as a positional argument. Under Ruby 3, a trailing positional hash is no longer converted into keyword arguments, so parse raises an ArgumentError because it never receives the required url keyword. Using the double splat to convert the hash into keyword arguments before passing it to parse fixes this under Ruby 3.

I keep getting errors of this kind from many of Kimurai's methods.

@MarcoNaik thanks for the comment. I may create a pull request for this. Could you include some examples of which methods are throwing these errors?

Just copying the example from the GitHub README, I got the same 'wrong number of arguments' error with the save_to method.

I also can't get attribute values with [], like a[:href]:

`[]': no implicit conversion of Symbol into Integer (TypeError)
  from bazaar-scrapper.rb:22:in `parse'
  from /home/marco/.local/share/gem/ruby/3.0.0/gems/kimurai-1.4.0/lib/kimurai/base.rb:204:in `public_send'
  from /home/marco/.local/share/gem/ruby/3.0.0/gems/kimurai-1.4.0/lib/kimurai/base.rb:204:in `request_to'
  from /home/marco/.local/share/gem/ruby/3.0.0/gems/kimurai-1.4.0/lib/kimurai/base.rb:128:in `block in crawl!'
  from /home/marco/.local/share/gem/ruby/3.0.0/gems/kimurai-1.4.0/lib/kimurai/base.rb:124:in `each'
  from /home/marco/.local/share/gem/ruby/3.0.0/gems/kimurai-1.4.0/lib/kimurai/base.rb:124:in `crawl!'

Sorry, I don't know how to post logs properly; I'm kind of new to GitHub issues.
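One possible explanation for the TypeError, offered as a guess that nobody in this thread has confirmed: Ruby raises exactly this message when [] is called with a Symbol on an Array-like collection, so a may hold several matched elements (for example, everything a CSS query returned) rather than a single element. This sketch uses plain hashes as hypothetical stand-ins for parsed elements; attribute_of is an illustrative helper, not part of Kimurai.

```ruby
# Hypothetical stand-ins for parsed link elements; in a real spider these
# would come from the response, not from literal hashes.
links = [{ href: "/first" }, { href: "/second" }]

# Illustrative helper: returns the attribute value, or the TypeError message.
def attribute_of(element_or_collection, name)
  element_or_collection[name]
rescue TypeError => e
  e.message
end

puts attribute_of(links, :href)       # Array#[] expects an Integer index, not a Symbol
puts attribute_of(links.first, :href) # looking up the attribute on one element works
p links.map { |a| a[:href] }          # read the attribute from each element in turn
```

If this is the cause, the error is unrelated to the Ruby 3 keyword-argument change discussed above.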

Any progress on this? I am currently running into the same error.

I'm running into the same issue.

I ran into the same issue as well. Worked around it by running my script on Ruby 2.7.5 instead of Ruby 3.

Likewise I'm getting the error wrong number of arguments (given 2, expected 1; required keyword: url) (ArgumentError) when trying to run the crawl/parse methods. I had issues with rbcat as well. Any plans to add Ruby 3.0 support or is there a fix I'm unaware of?

=== Full error ===

g:test_api/ $ ruby app/spiders/test_spider.rb
I, [2021-12-17 17:48:01 +0900#15507] [M: 280]  INFO -- test_spider: Spider: started: test_spider
D, [2021-12-17 17:48:02 +0900#15507] [M: 280] DEBUG -- test_spider: BrowserBuilder (mechanize): created browser instance
I, [2021-12-17 17:48:02 +0900#15507] [M: 280]  INFO -- test_spider: Browser: started get request to: http://www.google.com
I, [2021-12-17 17:48:03 +0900#15507] [M: 280]  INFO -- test_spider: Browser: finished get request to: http://www.google.com
I, [2021-12-17 17:48:03 +0900#15507] [M: 280]  INFO -- test_spider: Info: visits: requests: 1, responses: 1
I, [2021-12-17 17:48:03 +0900#15507] [M: 280]  INFO -- test_spider: Browser: driver mechanize has been destroyed
F, [2021-12-17 17:48:03 +0900#15507] [M: 280] FATAL -- test_spider: Spider: stopped: {:spider_name=>"test_spider", :status=>:failed, :error=>"#<ArgumentError: wrong number of arguments (given 2, expected 1; required keyword: url)>", :environment=>"development", :start_time=>2021-12-17 17:48:01.404891 +0900, :stop_time=>2021-12-17 17:48:03.8518 +0900, :running_time=>"2s", :visits=>{:requests=>1, :responses=>1}, :items=>{:sent=>0, :processed=>0}, :events=>{:requests_errors=>{}, :drop_items_errors=>{}, :custom=>{}}}
app/spiders/test_spider.rb:16:in `parse': wrong number of arguments (given 2, expected 1; required keyword: url) (ArgumentError)
  from /Users/g/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/kimurai-1.4.0/lib/kimurai/base.rb:204:in `public_send'
  from /Users/g/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/kimurai-1.4.0/lib/kimurai/base.rb:204:in `request_to'
  from /Users/g/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/kimurai-1.4.0/lib/kimurai/base.rb:128:in `block in crawl!'
  from /Users/g/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/kimurai-1.4.0/lib/kimurai/base.rb:124:in `each'
  from /Users/g/.rbenv/versions/3.0.2/lib/ruby/gems/3.0.0/gems/kimurai-1.4.0/lib/kimurai/base.rb:124:in `crawl!'
  from app/spiders/test_spider.rb:91:in `<main>'

I'm facing the same problem as well.