vifreefly / kimuraframework

Kimurai is a modern web scraping framework written in Ruby which works out of the box with Headless Chromium/Firefox, PhantomJS, or simple HTTP requests and allows you to scrape and interact with JavaScript-rendered websites.

Setting cookies will request a page twice

n-studio opened this issue.

When I set cookies, the spider requests the page twice instead of once, and it also sleeps twice before requesting. Is this expected behavior or a bug?

Without cookies

DEBUG -- : BrowserBuilder (selenium_chrome): created browser instance
DEBUG -- : BrowserBuilder (selenium_chrome): enabled before_request.delay
DEBUG -- : Browser: sleep 2.93 seconds before request...
DEBUG -- : BrowserBuilder (selenium_chrome): enabled custom user_agent
DEBUG -- : BrowserBuilder (selenium_chrome): enabled native headless_mode
INFO -- : Browser: started get request to: https://www.google.com
[DEPRECATION] :driver_path is deprecated. Use :service with an instance of Selenium::WebDriver::Service instead.
INFO -- : Browser: finished get request to: https://www.google.com
INFO -- : Info: visits: requests: 1, responses: 1

With cookies

DEBUG -- : BrowserBuilder (selenium_chrome): created browser instance
DEBUG -- : BrowserBuilder (selenium_chrome): enabled custom cookies
DEBUG -- : BrowserBuilder (selenium_chrome): enabled before_request.delay
DEBUG -- : Browser: sleep 1.63 seconds before request...
DEBUG -- : BrowserBuilder (selenium_chrome): enabled custom user_agent
DEBUG -- : BrowserBuilder (selenium_chrome): enabled native headless_mode
DEBUG -- : Browser: sleep 3.03 seconds before request... // <-- sleeps a second time
INFO -- : Browser: started get request to: https://www.google.com
[DEPRECATION] :driver_path is deprecated. Use :service with an instance of Selenium::WebDriver::Service instead.
INFO -- : Browser: finished get request to: https://www.google.com
INFO -- : Info: visits: requests: 1, responses: 1
DEBUG -- : Browser: driver.current_memory: 333616
INFO -- : Browser: started get request to: https://www.google.com // <-- makes another request
INFO -- : Browser: finished get request to: https://www.google.com
INFO -- : Info: visits: requests: 2, responses: 2

lib/kimurai/capybara_ext/session.rb

      # cookies
      # (Selenium only) if config.cookies present and browser was just created,
      # visit url_to_visit first and only then set cookies:
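The comment above suggests why the log shows two requests: on a freshly created browser with cookies configured, the page is visited once just to have a page loaded, the cookies are set, and then the page is visited again. Below is a minimal, self-contained Ruby model of that order. `FakeBrowser` and `visit_with_cookies` are hypothetical names, not Kimurai's API; no real browser is launched, and the model ignores `before_request.delay`, so it only reproduces the request count.

```ruby
# Hypothetical model of the cookie-setting order described in session.rb.
# FakeBrowser and visit_with_cookies are illustrative names, not Kimurai's API.
class FakeBrowser
  attr_reader :requests, :cookies, :current_url

  def initialize
    @requests = 0
    @cookies = []
    @current_url = nil # a freshly created browser has no page loaded yet
  end

  # Each visit counts as one request, like the "visits: requests" log line.
  def visit(url)
    @requests += 1
    @current_url = url
  end

  # Assumption: as with Selenium, a cookie can only be added once a page is loaded.
  def add_cookie(cookie)
    raise "no page loaded, cannot set cookie" if @current_url.nil?
    @cookies << cookie
  end
end

def visit_with_cookies(browser, url, cookies, just_created:)
  if just_created && !cookies.empty?
    browser.visit(url)                         # extra request: only to load a page
    cookies.each { |c| browser.add_cookie(c) } # now the cookies can be set
  end
  browser.visit(url)                           # the actual visit, after cookies are in place
end

browser = FakeBrowser.new
visit_with_cookies(browser, "https://www.google.com",
                   [{ name: "session", value: "abc" }], just_created: true)
puts browser.requests # => 2
```

Calling the same method with an empty cookie list (or with `just_created: false`) performs a single visit, which matches the "Without cookies" log.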

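This seems to be expected behavior, probably due to a Selenium limitation: WebDriver only lets you add cookies for the domain of the page that is currently loaded, so the browser has to visit the site once before the cookies can be set.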
Closing now.
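Selenium requires the browser to already be on the domain a cookie is meant for before that cookie can be added, and a newly started Chrome session typically sits on a `data:,` page with no host at all. The sketch below is a simplified, hypothetical approximation of that domain check (`cookie_allowed?` is an illustrative name); it ignores public-suffix rules and the other details of the real WebDriver specification.

```ruby
require "uri"

# Simplified, hypothetical approximation of the WebDriver rule that a cookie
# may only be added for the domain of the page currently loaded.
# Ignores public-suffix rules and other details of the real spec.
def cookie_allowed?(current_url, cookie_domain)
  host = URI.parse(current_url).host
  return false if host.nil? # e.g. "data:," or "about:blank": no host to match

  host = host.downcase
  domain = cookie_domain.sub(/\A\./, "").downcase # ".google.com" -> "google.com"
  host == domain || host.end_with?(".#{domain}")
end

puts cookie_allowed?("data:,", "www.google.com")                  # => false
puts cookie_allowed?("https://www.google.com/", "www.google.com") # => true
puts cookie_allowed?("https://www.google.com/", ".google.com")    # => true
puts cookie_allowed?("https://example.com/", "www.google.com")    # => false
```

Under this model, the first `false` is exactly the situation of a freshly created browser, which is why the page must be visited once before the cookies can be applied.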