teampoltergeist / poltergeist

A PhantomJS driver for Capybara

Authz request header missing

andrijaperovic opened this issue

Meta

Poltergeist Version:
1.16.0

Expected Behavior

Expect to be able to see full request headers when calling network_traffic.

Actual Behavior

Authorization header including bearer token is missing.

Steps to reproduce

Define a class WebScraper that registers the Poltergeist driver (Capybara::Poltergeist::Driver) and yields the Capybara page to a block.

class WebScraper
  include Capybara::DSL
  Capybara.default_driver = :poltergeist
  Capybara.register_driver :poltergeist do |app|
    options = {
        :js_errors => true,
        :timeout => 120,
        :debug => false,
        :phantomjs_options => ['--load-images=no', '--disk-cache=false'],
        :inspector => true,
    }
    Capybara::Poltergeist::Driver.new(app, options)
  end
  Capybara.javascript_driver = :poltergeist

  def scrape
    yield page
  end

  def self.scrape(&block)
    new.scrape(&block)
  end
end

Pass a block to WebScraper.scrape to log in, request the page of interest, and print the matching network traffic:

WebScraper.scrape do |page|
  page.visit 'https://www.somehost.com/login'
  page.fill_in('email', :with => 'user@emaildomain.com')
  page.fill_in('password', :with => 'password')
  page.find('.login-button').click
  page.driver.clear_network_traffic
  page.driver.headers = { 'User-Agent' => 'A user agent' }
  page.visit URL
  # to_s guards against entries without a URL, which would otherwise raise NoMethodError on nil
  puts page.driver.network_traffic.as_json.select { |x| x.try(:[], "data").try(:[], "url").to_s.include?('https://www.somehost.com/api_request_we_are_interested_in') }
end

The output for this request looks something like this:

{"data"=>{"headers"=>[{"name"=>"Access-Control-Request-Method", "value"=>"GET"}, {"name"=>"Origin", "value"=>"https://www.host.com"}, {"name"=>"Referer", "value"=>"https://www.somehost.com/api_request_we_are_interested_in"}, {"name"=>"Access-Control-Request-Headers", "value"=>"api-version, accept, origin"}, {"name"=>"Accept", "value"=>"*/*"}, {"name"=>"Content-Length", "value"=>"0"}, {"name"=>"User-Agent", "value"=>"A user agent"}], "id"=>34, "method"=>"?", "time"=>"2017-10-19T20:12:05.016Z", "url"=>"https://www.somehost.com/api_request_we_are_interested_in"}, "response_parts"=>[], "error"=>nil}

If we observe the same request in the browser's developer console, we see a number of request headers that are missing from this output. Does the Poltergeist library filter anything at the header level?

Using phantomjs version 2.1.1.0.
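To check which headers a captured request actually carries, a nil-safe lookup over entries shaped like the output above can help. This is a minimal sketch that assumes plain hashes with string keys, as produced by as_json in the example. The helper names requests_matching and request_header are hypothetical and are not part of Poltergeist.

```ruby
# Assumption: each entry is a plain Hash shaped like the output above, e.g.
#   { "data" => { "url" => "...", "headers" => [{ "name" => "...", "value" => "..." }] } }
# Both helpers are hypothetical and not part of Poltergeist.

# Return the entries whose request URL contains url_fragment.
# Entries without "data" or "url" are skipped instead of raising.
def requests_matching(entries, url_fragment)
  entries.select do |entry|
    url = entry.is_a?(Hash) ? entry.dig("data", "url") : nil
    url.to_s.include?(url_fragment)
  end
end

# Look up a request header by name, ignoring case. Returns nil when absent.
def request_header(entry, name)
  headers = entry.dig("data", "headers") || []
  match = headers.find { |h| h["name"].to_s.casecmp?(name) }
  match && match["value"]
end

# Usage with a trimmed copy of the entry shown above.
sample = {
  "data" => {
    "headers" => [
      { "name" => "Accept", "value" => "*/*" },
      { "name" => "User-Agent", "value" => "A user agent" }
    ],
    "url" => "https://www.somehost.com/api_request_we_are_interested_in"
  },
  "response_parts" => [],
  "error" => nil
}

matches = requests_matching([sample, { "data" => nil }], "api_request_we_are_interested_in")
puts request_header(matches.first, "user-agent")            # prints "A user agent"
puts request_header(matches.first, "Authorization").inspect # prints "nil"
```

Header names are compared case-insensitively because HTTP header names are not case-sensitive.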

No, there is no filtering going on in Poltergeist. The request info is passed to Poltergeist from PhantomJS and stored here - https://github.com/teampoltergeist/poltergeist/blob/master/lib/capybara/poltergeist/client/web_page.coffee#L89 - returned here - https://github.com/teampoltergeist/poltergeist/blob/master/lib/capybara/poltergeist/client/web_page.coffee#L198 - and converted to a Ruby object here - https://github.com/teampoltergeist/poltergeist/blob/master/lib/capybara/poltergeist/browser.rb#L266. If you're not seeing something that you expect, then PhantomJS isn't providing it and there's nothing Poltergeist can do about that.

Additionally, click is not guaranteed to have completed the actions it triggers by the time it returns, so in your example there is no way to know that the login ever actually happened, since the second visit could occur before the form submission completes. You should add an expectation/assertion after the click to confirm that login has completed, based on a visible change on the page.
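The idea behind that advice is to poll for a condition until it holds or a timeout expires. In Capybara itself this is normally done with a built-in waiting assertion such as page.assert_selector rather than hand-written code. The sketch below only illustrates the underlying pattern in plain Ruby; wait_until and WaitTimeout are hypothetical names, not Capybara or Poltergeist API.

```ruby
# Hypothetical error raised when the condition never becomes truthy.
class WaitTimeout < StandardError; end

# Hypothetical helper: call the block repeatedly until it returns a truthy
# value, sleeping `interval` seconds between attempts. Raises WaitTimeout
# once `timeout` seconds have elapsed without success.
def wait_until(timeout: 5, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimeout, "condition not met within #{timeout}s"
    end
    sleep interval
  end
end

# Usage: simulate a login that only completes on the third check.
attempts = 0
wait_until(timeout: 1, interval: 0.01) do
  attempts += 1
  attempts >= 3
end
puts "login confirmed after #{attempts} checks"
```

In the scenario from this thread, the condition would be something visible only after login, checked before the second visit is issued.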

Thanks for the quick reply @twalpole. You are right, I should probably block and wait on an assertion for an element in the DOM after login completes. In this case, however, page.driver.network_traffic is in fact returning the network traffic which occurs after the login mechanism.

I ended up with a solution using BrowserMob::Proxy and Selenium::WebDriver, which correctly captures the headers as HAR:

server = BrowserMob::Proxy::Server.new("#{Dir.pwd}/tools/browsermob-proxy/bin/browsermob-proxy") #=> #<BrowserMob::Proxy::Server:0x000001022c6ea8 ...>
server.start

proxy = server.create_proxy #=> #<BrowserMob::Proxy::Client:0x0000010224bdc0 ...>

# WebDriver setup
caps = Selenium::WebDriver::Remote::Capabilities.chrome(:proxy => proxy.selenium_proxy(:ssl))
driver = Selenium::WebDriver.for(:chrome, :desired_capabilities => caps)

# Navigate to login url and populate user credentials, submit form
driver.navigate.to LOGIN_URL
user_credentials = YAML.load(File.read('user_credentials.yml'))
driver.find_element(:name, 'email').send_keys(user_credentials['username'] || ENV['USERNAME'])
driver.find_element(:name, 'password').send_keys(user_credentials['password'] || ENV['PASSWORD'])
driver.find_element(:css, "div.entrance-content > button").click

# Wait for login step to complete
wait = Selenium::WebDriver::Wait.new(:timeout => 10)
wait.until {
  driver.find_element(:css, "div.home-header")
}

# Capture network traffic
proxy.new_har('capture', :capture_headers => true)

driver.navigate.to URL

# Parse back Authorization bearer token
token = proxy.har.entries.map { |e| e.request.headers }.flatten.find { |h| h['name'] == 'Authorization' }['value']

proxy.close
driver.quit

token
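Note that the token lookup above raises NoMethodError when no Authorization header was captured, because find returns nil. A more defensive extraction might look like the following sketch. It assumes the flattened header list is an array of hashes with "name" and "value" string keys, as accessed above. The helper name bearer_token is hypothetical, and unlike the snippet above it returns only the token part, without the "Bearer" scheme.

```ruby
# Hypothetical helper, not part of BrowserMob or Selenium.
# Assumption: headers is an array of hashes with "name" and "value" keys.
# Returns the token from an "Authorization: Bearer <token>" header, or nil
# when the header is missing or uses a different scheme.
def bearer_token(headers)
  auth = headers.find { |h| h["name"].to_s.casecmp?("Authorization") }
  return nil unless auth

  scheme, token = auth["value"].to_s.split(" ", 2)
  scheme.to_s.casecmp?("Bearer") ? token : nil
end

# Usage with made-up header data.
headers = [
  { "name" => "Accept", "value" => "*/*" },
  { "name" => "Authorization", "value" => "Bearer abc123" }
]
puts bearer_token(headers)    # prints "abc123"
puts bearer_token([]).inspect # prints "nil"
```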

Hopefully PhantomJS can provide similar functionality in the future, and by way of PhantomJS, Poltergeist as well. Cheers.

@andrijaperovic PhantomJS has been abandoned - I wouldn't expect any new functionality from it in the future.