vifreefly / kimuraframework

Kimurai is a modern web scraping framework written in Ruby which works out of the box with Headless Chromium/Firefox, PhantomJS, or simple HTTP requests and allows you to scrape and interact with JavaScript-rendered websites

Why class instead of instances?

wasabigeek opened this issue

Genuinely curious: this seems a bit unusual, since it's not straightforward to change the start_urls at runtime. If I understood correctly, class instance variables are not thread-safe, so changing them at runtime might wreak havoc in something like Sidekiq?
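To make the concern concrete, here is a minimal sketch that does not use Kimurai at all. SharedSpider and demo_shared_state are hypothetical names invented for illustration, and the Queue handshakes exist only to force the unlucky interleaving deterministically. In a real multi-threaded worker the ordering would depend on thread scheduling.

```ruby
# Hypothetical stand-in for a spider configured through class-level state.
# This is NOT Kimurai's implementation, only an illustration of the concern.
class SharedSpider
  @start_urls = []

  class << self
    # Class-level instance variable: one copy shared by every thread.
    attr_accessor :start_urls

    # Pretend crawl: report which URLs this run would have visited.
    def crawl!
      start_urls.dup
    end
  end
end

# Runs two "jobs" in threads and returns what each one ended up crawling.
def demo_shared_state
  a_configured = Queue.new
  b_configured = Queue.new

  job_a = Thread.new do
    SharedSpider.start_urls = ["https://example.com/a"]
    a_configured << true   # hand control to job B
    b_configured.pop       # wait until job B has overwritten the URLs
    SharedSpider.crawl!
  end

  job_b = Thread.new do
    a_configured.pop       # wait until job A has configured the class
    SharedSpider.start_urls = ["https://example.com/b"]
    b_configured << true
    SharedSpider.crawl!
  end

  { a: job_a.value, b: job_b.value }
end
```

Calling demo_shared_state returns job B's URL for both jobs, because job A's configuration was overwritten before it got to crawl.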

Do you happen to have a link to the code in question?

I am a casual user myself, so I don't know much about the internals of Kimurai. It did what I needed, but there were a few strange warnings, so perhaps the API could be improved and the warnings reduced.

I agree, it makes much more sense to write:

class GithubSpider < Kimurai::Base
  def initialize
    @start_urls = ["https://github.com/search?q=Ruby%20Web%20Scraping"]
  end

  def parse(response, url:, data: {})
    ...
  end
end

GithubSpider.new.crawl!

It could enable things like GithubSpider.new(start_urls: ["https://github.com/search?q=Ruby%20Web%20Scraping"]).crawl!
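As a rough sketch of what that instance-based shape could look like, assuming nothing about Kimurai's internals: SketchSpider and GithubSketchSpider below are hypothetical classes, not Kimurai::Base, and parse here takes only a URL rather than a real response. The idea is that the class keeps a default list of start URLs while each instance gets its own frozen copy, which can be overridden per call.

```ruby
# Hypothetical base class, NOT Kimurai::Base: it only illustrates how
# per-instance start URLs could sit on top of a class-level default.
class SketchSpider
  class << self
    # With arguments: set the class-level default. Without: read it.
    def start_urls(*urls)
      return @start_urls || [] if urls.empty?

      @start_urls = urls.flatten.freeze
    end
  end

  attr_reader :start_urls

  # Each instance copies the URLs it was given (or the class default),
  # so changing one spider never affects another.
  def initialize(start_urls: self.class.start_urls)
    @start_urls = Array(start_urls).dup.freeze
  end

  # Pretend crawl: call parse once per start URL and collect the results.
  def crawl!
    start_urls.map { |url| parse(url) }
  end

  def parse(url)
    raise NotImplementedError, "#{self.class} must implement parse"
  end
end

class GithubSketchSpider < SketchSpider
  start_urls "https://github.com/search?q=Ruby%20Web%20Scraping"

  def parse(url)
    "visited #{url}"
  end
end
```

With this shape, GithubSketchSpider.new.crawl! uses the class default, while GithubSketchSpider.new(start_urls: ["https://example.com/x"]).crawl! uses only the URLs passed in, so two jobs running at once would not overwrite each other's configuration.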