Monitoring 'Secondary' Network Traffic

Question

tomgallagher opened this issue 8 years ago · comments

Tom Gallagher commented 8 years ago

Hello again

I'm trying to get accurate network traffic stats when I visit a site.

My broad question is: when does the network traffic monitor start and stop? And what kind of requests is it monitoring?

If I do this,

@session.driver.network_traffic.each do |request|
  request.response_parts.uniq(&:url).each do |response|
    puts "\n Response URL #{response.url}: \n Status #{response.status}"
    @response_array.push(response)
  end
end

my understanding was that this would be able to collect all the activity on the page, a bit like the network tab in Chrome dev tools.

However, this command does not seem to capture any network traffic that occurs as a result of scripts on the page.

If I wait for the page load event and then count the images on the page using this script

wait_script =   "waitForComplete();
                            function waitForComplete() {
                            
                                switch (document.readyState) {
                                  case 'loading':
                                    setTimeout(function(){ waitForComplete(); }, 1000);
                                    console.log('Waiting....');
                                    break;
                                  case 'interactive':
                                    setTimeout(function(){ waitForComplete(); }, 1000);
                                    console.log('Waiting....');
                                    break;
                                  case 'complete':
                                    return true;
                                    break;
                                }
                            
                            }"
            
marker = @session.evaluate_script(wait_script)

then I get all the images on the page using this:

image_array_script = "  getImageSetString();
                                    function getImageSetString() {
                                    var elementSet = document.getElementsByTagName('img');
                                    var imgSrcs = [];
                                    for (var i = 0; i < elementSet.length; i++) {
                                        var checkString = elementSet[i].src.toString();
                                        if (checkString.indexOf('chrome-extension://') == -1) {
                                            imgSrcs.push(elementSet[i].src);
                                        }
                                    }
                                    var tags = document.getElementsByTagName('*');
                                    var numTags = tags.length;
                                    for (var i = 0; i < numTags; i++) {
                                        var tag = tags[i];
                                        var backgroundImageRef = window.getComputedStyle(tag, null).getPropertyValue('background-image');
                                        if (backgroundImageRef !== 'undefined' && backgroundImageRef !== 'none' && backgroundImageRef.indexOf('chrome-extension://') == -1 ) {
                                            var trimmedImageRef = backgroundImageRef.slice(4, -1).replace(/^[\"']|[\"']$/g, '');
                                            if (imgSrcs.indexOf(trimmedImageRef) === -1) {
                                                imgSrcs.push(trimmedImageRef);
                                            }
                                        }
                                    }
                                    var imageSetString = JSON.stringify(imgSrcs);
                                    return imageSetString; }   "
            
@image_array = JSON.parse(@session.evaluate_script(image_array_script))

Then the number of images is different from the number of image requests from network traffic.

Is there something obvious I'm doing wrong?

Thanks

Tom

Thomas Walpole · Answer 1 · Wed Feb 01 2017 02:22:40 GMT+0800 (China Standard Time)

Calling evaluate_script with wait_script doesn't actually wait for anything, since the JS returns immediately (evaluate_script won't wait for the setTimeouts to occur). The network traffic is filled in by callbacks from PhantomJS, so network_traffic is updated whenever PhantomJS starts loading a resource or receives its content.
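
Because evaluate_script returns as soon as the expression yields a value, the waiting has to happen on the Ruby side. A minimal sketch, assuming only that the session responds to evaluate_script (as a Capybara session does); the helper name wait_for_ready_state and its timeout and interval parameters are hypothetical, not part of Capybara or Poltergeist:

```ruby
# Hypothetical helper: polls document.readyState from Ruby until the page
# reports 'complete'. `session` is anything that responds to evaluate_script.
def wait_for_ready_state(session, timeout: 10, interval: 0.25)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    # Each call returns immediately with the current state; the retry loop
    # lives in Ruby instead of in a JavaScript setTimeout.
    return true if session.evaluate_script('document.readyState') == 'complete'

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "page not complete after #{timeout}s"
    end

    sleep interval
  end
end
```

It would be called as wait_for_ready_state(@session) before reading @session.driver.network_traffic. Note that a readyState of 'complete' does not by itself cover requests that scripts start afterwards.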

Tom Gallagher · Answer 2 · Wed Feb 01 2017 03:44:44 GMT+0800 (China Standard Time)

Hi Thomas

Thanks a lot for getting back.

So can I confirm: there's no reason for Poltergeist/PhantomJS to miss any resourceRequested events? Any kind of request is treated the same, whether it comes from a script or from the HTML?

So the obvious next question is: given that the traffic log is frozen at the point where I call

@session.driver.network_traffic.each do |request|

is there any way to get PhantomJS to wait until images are loaded or all the secondary network requests have finished? I have thought about passing the image src attributes and then getting Capybara to wait for them to appear, but this seemed like a long way around the problem.

Thanks again

Tom

Thomas Walpole · Answer 3 · Wed Feb 01 2017 06:08:56 GMT+0800 (China Standard Time)

@tomgallagher There is no reason for Poltergeist to miss any resourceRequested events reported by PhantomJS. I can't, however, state that PhantomJS will report all of them (I haven't dug deep enough into PhantomJS), and if you're using the PhantomJS 2.5 beta, it reports redirects differently, so some may get missed. Checking that all 'img' elements are loaded should be doable with something like

expect(page).not_to have_css('img') { |img| !img['complete'] } # caveat: I haven't tried that but it should be close

which should wait/retry until there are no 'img' elements that don't yet have the 'complete' property set. If you use this a lot, it would probably be better to implement it as a custom selector in Capybara.

For images used for backgrounds I'm not really sure, although it's probably possible to get the network_traffic and check that all responses have a stage of 'end'. If not, sleep a bit and then get network_traffic again.
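
The polling idea above can be sketched as follows. This is only a sketch under assumptions: the helper name wait_for_traffic_to_settle and its parameters are hypothetical, and how a response part exposes PhantomJS's stage ('start' or 'end') is not confirmed here, so the lookup is passed in as a lambda that can be swapped for whatever your Poltergeist version provides:

```ruby
# Hypothetical helper: re-reads driver.network_traffic until every request
# has a response part whose stage is 'end', or raises after `timeout` seconds.
# ASSUMPTION: response parts may not have a `stage` reader in your Poltergeist
# version; pass your own `stage_of` lambda if they do not.
def wait_for_traffic_to_settle(driver, timeout: 10, interval: 0.2,
                               stage_of: ->(part) { part.stage })
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    traffic = driver.network_traffic
    # A request counts as finished once one of its parts reports 'end'.
    # An empty traffic log counts as settled.
    settled = traffic.all? do |request|
      request.response_parts.any? { |part| stage_of.call(part) == 'end' }
    end
    return traffic if settled

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "network traffic still pending after #{timeout}s"
    end

    sleep interval
  end
end
```

A request that never receives a response (for example one that fails outright) would keep this loop waiting until the timeout, so the timeout should be treated as a normal outcome rather than a fatal one.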

Tom Gallagher · Answer 4 · Wed Feb 01 2017 18:37:10 GMT+0800 (China Standard Time)

OK, thanks for the feedback. Capybara's auto-wait function is the way forward then. I suppose I could use an XPath count as well, as I know how many images there should be.

Thomas Walpole · Answer 5 · Thu Feb 02 2017 02:47:31 GMT+0800 (China Standard Time)

@tomgallagher If you know how many images you're waiting for, something like the following should work. Note that you need to use the filter block because 'complete' is a property; there is no 'complete' attribute.

expect(page).to have_css('img', count: expected_count) { |img| img['complete'] }  # could also specify :minimum instead of :count
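
Once the waiting is in place, the original mismatch between the DOM image count and the traffic log can be inspected by diffing the two URL sets. A minimal sketch, assuming each collected response responds to url and content_type; the helper name diff_image_urls is hypothetical:

```ruby
require 'set'

# Hypothetical helper: compares image URLs collected from the DOM with the
# image URLs seen in the traffic log and reports what each side is missing.
# ASSUMPTION: each response object responds to #url and #content_type.
def diff_image_urls(dom_urls, responses)
  requested = responses
              .select { |r| r.content_type.to_s.start_with?('image/') }
              .map(&:url)
              .to_set
  in_dom = dom_urls.to_set
  {
    only_in_dom: (in_dom - requested).to_a,     # images the traffic log never recorded
    only_in_traffic: (requested - in_dom).to_a  # image responses with no matching element
  }
end
```

With the variables from the question this would be called as diff_image_urls(@image_array, @response_array).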