Corion / WWW-Mechanize-Chrome

automate the Chrome browser

Home Page: https://metacpan.org/release/WWW-Mechanize-Chrome

Add infinite scroll function?

sdondley opened this issue

I have created a wrapper for WMC (with Moose) that includes a method that scrolls to the bottom of an infinite-scroll page and then waits for more elements to load. I'm wondering whether it would be useful to improve it and incorporate it into the WMC module. One call to the function scrolls the browser to the bottom of the page (twice, just to make sure it registers) and returns once it detects that more elements have been loaded. Here it is, along with its helper functions:

use Time::HiRes qw(usleep);  # usleep is provided by Time::HiRes

sub infinite_scroll {
  my $s = shift;
  my $wait_time = shift || 120;

  my $current_element_count = $s->get_element_count;
  $s->scroll_to_bottom;

  # wait 1/10th sec for more of the page to load
  usleep 100000;

  my $new_element_count = $s->get_element_count;

  my $start_time = time();
  # keep polling until at least 10 new elements have appeared
  while (($new_element_count - $current_element_count) < 10) {

    # give up once the wait time has elapsed
    if (time() - $start_time > $wait_time) {
      return 0;
    }

    # wait 1/10th sec for more of the page to load
    usleep 100000;
    $new_element_count = $s->get_element_count;
  }
  usleep 100000;
  return 1;
}

sub scroll_to_bottom {
  my $s = shift;
  $s->eval( 'window.scroll(0,document.body.scrollHeight + 100)' );
  usleep 100000;
  $s->eval( 'window.scroll(0,document.body.scrollHeight + 200)' );
}

sub get_element_count {
  my $s = shift;
  my ($el_count) = $s->eval( 'document.getElementsByTagName("*").length' );
  return $el_count;
}

One more thing: you can put the infinite_scroll method in a while loop to scroll all the way to the bottom, like so:

while ($mech->infinite_scroll) {
  # you can put other tests here
  last if ...
}

This sounds very interesting and I think it should go into WWW::Mechanize::Chrome, or certainly at least the mechanics, like a ->scrollTo method (with "top", "bottom" as shorthands) and the element count functions.
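A rough sketch of what such a ->scrollTo helper could look like, written as plain subs so it can be tried outside the module. The names scroll_js and scroll_to are hypothetical and not part of WWW::Mechanize::Chrome; the only thing assumed of $mech is the eval method already used in the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: translate a target ("top", "bottom", or a pixel
# offset) into a window.scrollTo() call.
sub scroll_js {
    my ($target) = @_;
    my %shorthand = (
        top    => '0',
        bottom => 'document.body.scrollHeight',
    );
    my $y = exists $shorthand{$target} ? $shorthand{$target}
          : $target =~ /\A\d+\z/       ? $target
          : die "Unknown scroll target '$target'";
    return "window.scrollTo(0, $y)";
}

# Hypothetical wrapper: $mech only needs an eval() method that runs
# JavaScript in the page.
sub scroll_to {
    my ($mech, $target) = @_;
    return $mech->eval( scroll_js($target) );
}
```

With this in place, scroll_to($mech, 'bottom') would replace the hard-coded window.scroll calls, and scroll_to($mech, 'top') would return to the start of the page.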

I'm not sure whether there is a better way to test if any new elements have been added to the infinite scroll part. Maybe instead of ->get_element_count, there could be an arbitrary CSS / XPath selector whose matches are counted. But for a start, the simple-minded approach of counting all elements is certainly good enough.
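As an illustration of the selector idea, here is a minimal sketch that counts only the elements matching a CSS selector. The sub names are hypothetical; it assumes, as in get_element_count above, that $mech->eval returns the value of the JavaScript expression as its first return value.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical: build the JavaScript that counts matches for a CSS selector.
sub get_selector_count_js {
    my ($selector) = @_;
    # Encode the selector as a JSON string so quotes in it cannot break the JS.
    my $quoted = JSON::PP->new->allow_nonref->encode($selector);
    return "document.querySelectorAll($quoted).length";
}

# Hypothetical counterpart to get_element_count, limited to one selector.
sub get_selector_count {
    my ($mech, $selector) = @_;
    my ($count) = $mech->eval( get_selector_count_js($selector) );
    return $count;
}
```

Counting something like 'div.item' instead of every element would make the "at least 10 new elements" check less sensitive to unrelated DOM changes such as ads or spinners.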

The usleep calls would be better as calls to $mech->sleep(), as that allows other parts of your program (if any) to run while you wait for the browser.
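Roughly, the wait loop could then look like the sketch below. It assumes that $mech->sleep accepts fractional seconds and that $mech provides the get_element_count and scroll_to_bottom methods from the wrapper above; the sub name is made up for illustration.

```perl
use strict;
use warnings;

# Sketch of infinite_scroll using $mech->sleep instead of usleep.
# Assumption: $mech->sleep(0.1) waits about 1/10th of a second.
sub infinite_scroll_with_sleep {
    my ($mech, $wait_time) = @_;
    $wait_time ||= 120;

    my $before = $mech->get_element_count;
    $mech->scroll_to_bottom;

    my $start_time = time();
    # keep polling until at least 10 new elements have appeared
    while ( $mech->get_element_count - $before < 10 ) {
        # give up once the wait time has elapsed
        return 0 if time() - $start_time > $wait_time;
        $mech->sleep(0.1);
    }
    return 1;
}
```

The return values match the original: 1 when new elements showed up, 0 on timeout, so it can still drive a while loop.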

I have no idea yet how to write a good test suite for the infinite scroll functionality, but I would welcome it if you put that code into WMC, at least if you aren't too unhappy if I change the API around a bit...

Feel free to change it as you see fit. You've got more experience than me. I will post a patch later today.

I used usleep because I assumed $mech->sleep takes whole seconds as its argument. Please let me know if I'm wrong on that.

I can try to create a simple html page with the necessary javascript for simulating an infinite scroll. I'm sure there must be a simple library that can enable that.

That reminds me, I have also hacked together a function that will screenshot the entire page and stitch the images together into one giant png. It uses ImageMagick. I'll post that as a separate issue when I get some time.

Taking a complete screenshot should always work using ->content_as_png , but the documentation doesn't mention the word "screenshot" at all. If it doesn't work for you, then I consider that a bug in the module - I think stitching shouldn't be necessary...

Ah - I think stitching may still be necessary as ->content_as_png captures stuff according to the screen/window dimensions and not the page dimensions. But capturing a complete page should be possible through the API too, and still not need stitching. I hope. :)

Perhaps now that you fixed the ability to change the viewport size, the viewport can simply be changed to be at least the same height as the content and then a screenshot can be taken?
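A minimal sketch of that idea, under two assumptions that should be checked against the module documentation: that $mech->viewport_size takes a hashref with width and height, and that $mech->content_as_png returns the PNG data. The sub name full_page_png is made up for illustration.

```perl
use strict;
use warnings;

# Sketch only: grow the viewport to the full content height, then capture.
sub full_page_png {
    my ($mech) = @_;

    # Measure the current width and the full height of the document.
    my ($width)  = $mech->eval('document.documentElement.clientWidth');
    my ($height) = $mech->eval('document.documentElement.scrollHeight');

    # Assumed signature: hashref with width and height in pixels.
    $mech->viewport_size({ width => $width, height => $height });

    return $mech->content_as_png;
}
```

Note that the viewport stays enlarged after the call, so a caller that keeps using the page may want to set it back afterwards.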

OK, I've got a basic HTML page that does infinite scroll in the test directory. I'm now trying to create a test for it. I need a little hand holding to help save me some time. So, I've looked at the other tests and copied one of them to get me started. But I'm unsure how to run my individual test file from the command line.

I usually run the single tests as

perl -Ilib -w t/99-that-test.t

That way, I don't need to bother with make etc.

(this is to be run from the base git checkout directory)

ok, that worked. thanks.

And what does @instances do, exactly? Do I need an instance for each test? I'm a little unclear. Here is my test so far:

#!perl -w
use strict;
use Test::More;
use Log::Log4perl qw(:easy);

use WWW::Mechanize::Chrome;

use Test::HTTP::LocalServer;

use lib '.';
use t::helper;

Log::Log4perl->easy_init($ERROR);  # Set priority of root logger to ERROR

# What instances of Chrome will we try?
my $instance_port = 9222;
my @instances = t::helper::browser_instances();

if (my $err = t::helper::default_unavailable) {
    plan skip_all => "Couldn't connect to Chrome: $err";
    exit
} else {
    plan tests => 1*@instances;
};

sub new_mech {
    #use Mojolicious;
    WWW::Mechanize::Chrome->new(
        autodie => 1,
        @_,
    );
};

my $server = Test::HTTP::LocalServer->spawn(
    #debug => 1
);

t::helper::run_across_instances(\@instances, $instance_port, \&new_mech, 1, sub {
    my ($browser_instance, $mech) = @_;

    isa_ok $mech, 'WWW::Mechanize::Chrome';
    $mech->autodie(1);

    $mech->get_local('76-infinite_scroll.html');
    $mech->allow('javascript' => 1);


});

Ah - the @instances are the collection of Chrome versions the tests are run under. I have a directory chrome-versions/ in which I have installed the various versions of Chrome and Chromium I run the complete test suite against.

By default, the main version of Chrome installed for the user is also tested, so you don't need to set up anything special.

To see the browser window while the test runs, pass headless => 0 when creating a new Mechanize instance:

sub new_mech {
    WWW::Mechanize::Chrome->new(
        autodie => 1,
        headless => 0,
        @_,
    );
};

I think I see what's going on. It's running the browser in @instances before it gets to &new_mech.

No, I got the order wrong. helper.pm sets headless by default and passes it in through @_, so you need to override that by putting headless => 0 after @_:

sub new_mech {
    #use Mojolicious;
    WWW::Mechanize::Chrome->new(
        autodie => 1,
        @_,
        headless => 0,
    );
};
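The order matters because, when a list of key/value pairs is turned into a hash, a later pair replaces an earlier one with the same key. The stand-alone sketch below shows the effect; collect_options is a stand-in for the constructor, and headless => 1 is only an assumed example of what the helper passes.

```perl
use strict;
use warnings;

# Stand-in for a constructor that turns its argument list into a hash.
sub collect_options {
    my %options = @_;
    return \%options;
}

my @from_helper = ( headless => 1 );    # assumed value passed by the helper

# Our value first, helper's list last: the helper's value wins.
my $early = collect_options( headless => 0, @from_helper );

# Helper's list first, our value last: our value wins.
my $late = collect_options( @from_helper, headless => 0 );

print "early: $early->{headless}, late: $late->{headless}\n";
# prints "early: 1, late: 0"
```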

That did the trick. Thanks!