Add infinite scroll function?

sdondley opened this issue 6 years ago · comments

I have created a wrapper for WMC (with Moose) that includes a method for scrolling to the bottom of a page with infinite scroll and then waiting for more elements to load. I'm wondering if it might be useful to improve it and incorporate it into the WMC module. One call to the function causes the browser to scroll to the bottom of the page (twice, just to make sure it registers) and then returns once it detects that more elements have been loaded. Here it is along with its helper functions:

use Time::HiRes qw(usleep);    # usleep is not a Perl builtin

# Scroll to the bottom of the page and wait for at least 10 new elements.
# Returns 1 if they appeared, 0 if $wait_time seconds passed without them.
sub infinite_scroll {
  my $s = shift;
  my $wait_time = shift || 120;    # timeout in seconds

  my $current_element_count = $s->get_element_count;
  $s->scroll_to_bottom;

  # wait 1/10th sec for more of the page to load
  usleep 100000;

  my $new_element_count = $s->get_element_count;

  my $start_time = time();
  while (($new_element_count - $current_element_count) < 10) {

    # give up once the timeout has elapsed
    if (time() - $start_time > $wait_time) {
      return 0;
    }

    # wait 1/10th sec for more of the page to load
    usleep 100000;
    $new_element_count = $s->get_element_count;
  }
  usleep 100000;
  return 1;
}

sub scroll_to_bottom {
  my $s = shift;
  $s->eval( 'window.scroll(0,document.body.scrollHeight + 100)' );
  usleep 100000;
  $s->eval( 'window.scroll(0,document.body.scrollHeight + 200)' );
}

sub get_element_count {
  my $s = shift;
  my ($el_count) = $s->eval( 'document.getElementsByTagName("*").length' );
  return $el_count;
}

Steve Dondley commented 6 years ago

See #16

Steve Dondley · Answer 1 · Sat Jul 07 2018 01:27:16 GMT+0800 (China Standard Time)

One more thing, you can throw the infinite_scroll method into a while loop to scroll all the way to the bottom, like so:

while ($mech->infinite_scroll) {
  # you can put other tests here
  last if ...
}
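A loop like the one above can run for a long time on a page that never stops loading. As a rough sketch, the loop could be wrapped in a helper with an upper bound on the number of rounds. The name scroll_all, the default cap of 50, and the FakeScroller stub are hypothetical and only there to make the example runnable without a browser; the only thing taken from the code above is that infinite_scroll returns a true value when more content loaded.

```perl
use strict;
use warnings;

# Hypothetical helper: call $mech->infinite_scroll until it reports that
# no new content arrived, but never more than $max_rounds times.
sub scroll_all {
    my ($mech, $max_rounds) = @_;
    $max_rounds //= 50;    # assumed cap, not a value from the thread

    my $rounds = 0;
    while ($rounds < $max_rounds && $mech->infinite_scroll) {
        $rounds++;
    }
    return $rounds;    # number of scrolls that loaded more content
}

# Stub standing in for the real browser object, for illustration only.
# It pretends that $pages further chunks of content can be loaded.
package FakeScroller {
    sub new { my ($class, $pages) = @_; return bless { pages => $pages }, $class }
    sub infinite_scroll { my $s = shift; return $s->{pages}-- > 0 ? 1 : 0 }
}

print scroll_all(FakeScroller->new(3)), "\n";
```

With a real object, the call would be something like scroll_all($mech, 20), assuming $mech provides the infinite_scroll method shown earlier.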

Max Maischein · Answer 2 · Sat Jul 07 2018 01:38:50 GMT+0800 (China Standard Time)

This sounds very interesting and I think it should go into WWW::Mechanize::Chrome, or certainly at least the mechanics, like a ->scrollTo method (with "top", "bottom" as shorthands) and the element count functions.

I'm not sure if there is a better way to test whether any new elements have been added to the infinite scroll part - maybe instead of ->get_element_count, there should/could be an arbitrary CSS / XPath selector whose matches are counted. But for a start, the simple-minded approach of counting the overall number of elements is certainly good enough.
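To illustrate the selector idea, here is a minimal sketch that builds the JavaScript expression for counting the elements matching a CSS selector. The function name element_count_js is hypothetical, and the use of JSON::PP to quote the selector is a choice made for this sketch, not something from the module. XPath is left out.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helper: build a JavaScript expression that counts the
# elements matching a CSS selector. The selector is JSON-encoded so that
# quotes inside it cannot break out of the JavaScript string literal.
sub element_count_js {
    my ($selector) = @_;
    $selector //= '*';    # default: count every element
    my $quoted = JSON::PP->new->allow_nonref->encode($selector);
    return "document.querySelectorAll($quoted).length";
}

print element_count_js('div.item'), "\n";
```

The resulting string could then be passed to the same eval call that get_element_count uses, for example my ($n) = $s->eval( element_count_js('div.item') ), assuming eval accepts any JavaScript expression as it does in the original helper.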

The usleep calls would be better as calls to $mech->sleep(), since that allows other parts of your program (if any) to run while you wait for the browser.
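One way to make the sleep mechanism swappable is to separate the polling loop from the sleep call. The following is only a sketch: wait_until and its parameter names (check, timeout, interval, sleeper) are hypothetical, and the default sleeper uses Time::HiRes so that the example runs without a browser.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical polling helper: call $check repeatedly until it returns
# true (return 1) or until $timeout seconds have passed (return 0).
# The sleep between checks is a callback so it can be replaced.
sub wait_until {
    my (%args) = @_;
    my $check    = $args{check};
    my $timeout  = $args{timeout}  // 120;    # seconds
    my $interval = $args{interval} // 0.1;    # seconds between checks
    my $sleeper  = $args{sleeper}  // sub { Time::HiRes::sleep($_[0]) };

    my $deadline = Time::HiRes::time() + $timeout;
    while (1) {
        return 1 if $check->();
        return 0 if Time::HiRes::time() >= $deadline;
        $sleeper->($interval);
    }
}

# Example: the condition becomes true on the third check.
my $calls = 0;
my $ok = wait_until(
    check    => sub { ++$calls >= 3 },
    timeout  => 5,
    interval => 0.01,
);
print "ok=$ok after $calls checks\n";
```

To wait through the browser's event loop instead, the sleeper could be passed as sleeper => sub { $mech->sleep($_[0]) }. Whether $mech->sleep accepts a fractional interval such as 0.1 is an assumption here and should be checked against the module's documentation.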

I have no idea yet how to write a good test suite for the infinite scroll functionality, but I would welcome it if you put that code into WMC, at least if you aren't too unhappy if I change the API around a bit...

Steve Dondley · Answer 3 · Sat Jul 07 2018 01:49:53 GMT+0800 (China Standard Time)

Feel free to change it as you see fit. You've got more experience than me. I will post a patch later today.

I used usleep because I assumed that $mech->sleep takes whole seconds as its argument. Please let me know if I'm wrong about that.

I can try to create a simple html page with the necessary javascript for simulating an infinite scroll. I'm sure there must be a simple library that can enable that.

Steve Dondley · Answer 4 · Sat Jul 07 2018 01:52:57 GMT+0800 (China Standard Time)

That reminds me, I have also hacked together a function that will screenshot the entire page and stitch the images together into one giant png. It uses ImageMagick. I'll post that as a separate issue when I get some time.

Max Maischein · Answer 5 · Sat Jul 07 2018 01:55:40 GMT+0800 (China Standard Time)

Taking a complete screenshot should always work using ->content_as_png , but the documentation doesn't mention the word "screenshot" at all. If it doesn't work for you, then I consider that a bug in the module - I think stitching shouldn't be necessary...

Max Maischein · Answer 6 · Sat Jul 07 2018 01:57:22 GMT+0800 (China Standard Time)

Ah - I think stitching may still be necessary as ->content_as_png captures stuff according to the screen/window dimensions and not the page dimensions. But capturing a complete page should be possible through the API too, and still not need stitching. I hope. :)

Steve Dondley · Answer 7 · Sat Jul 07 2018 01:58:11 GMT+0800 (China Standard Time)

Does content_as_png screenshot everything outside of the viewport though?

Steve Dondley · Answer 8 · Sat Jul 07 2018 02:01:23 GMT+0800 (China Standard Time)

Perhaps now that you have fixed the ability to change the viewport size, the viewport can simply be set to be at least as tall as the content and then a screenshot can be taken?

Steve Dondley · Answer 9 · Sat Jul 07 2018 04:17:55 GMT+0800 (China Standard Time)

OK, I've got a basic HTML page in the test directory that does infinite scroll. I'm now trying to write a test for it, and I need a little hand-holding to save me some time. I've looked at the other tests and copied one of them to get started, but I'm unsure how to run my individual test file from the command line.

Max Maischein · Answer 10 · Sat Jul 07 2018 04:21:20 GMT+0800 (China Standard Time)

I usually run the single tests as

perl -Ilib -w t/99-that-test.t

That way, I don't need to bother with make etc.

Max Maischein · Answer 11 · Sat Jul 07 2018 04:21:48 GMT+0800 (China Standard Time)

(this is to be run from the base git checkout directory)

Steve Dondley · Answer 12 · Sat Jul 07 2018 04:22:48 GMT+0800 (China Standard Time)

ok, that worked. thanks.

Steve Dondley · Answer 13 · Sat Jul 07 2018 04:25:37 GMT+0800 (China Standard Time)

And what does @instances do, exactly? Do I need an instance for each test? I'm a little unclear. Here is my test so far:

#!perl -w
use strict;
use Test::More;
use Log::Log4perl qw(:easy);

use WWW::Mechanize::Chrome;

use Test::HTTP::LocalServer;

use lib '.';
use t::helper;

Log::Log4perl->easy_init($ERROR);  # Set priority of root logger to ERROR

# What instances of Chrome will we try?
my $instance_port = 9222;
my @instances = t::helper::browser_instances();

if (my $err = t::helper::default_unavailable) {
    # report the error returned by the helper, not $@
    plan skip_all => "Couldn't connect to Chrome: $err";
    exit;
} else {
    plan tests => 1*@instances;
};

sub new_mech {
    #use Mojolicious;
    WWW::Mechanize::Chrome->new(
        autodie => 1,
        @_,
    );
};

my $server = Test::HTTP::LocalServer->spawn(
    #debug => 1
);

t::helper::run_across_instances(\@instances, $instance_port, \&new_mech, 1, sub {
    my ($browser_instance, $mech) = @_;

    isa_ok $mech, 'WWW::Mechanize::Chrome';
    $mech->autodie(1);

    $mech->get_local('76-infinite_scroll.html');
    $mech->allow('javascript' => 1);


});

Max Maischein · Answer 14 · Sat Jul 07 2018 04:32:28 GMT+0800 (China Standard Time)

Ah - the @instances are the collection of Chrome versions the tests are run under. I have a directory chrome-versions/ in which I have installed the various versions of Chrome and Chromium I run the complete test suite against.

By default, the main version of Chrome installed for the user is also tested, so you don't need to set up anything special.

Steve Dondley · Answer 15 · Sat Jul 07 2018 04:48:20 GMT+0800 (China Standard Time)

Thanks. And how can I run the test in non-headless mode so I can see what the browser is doing?

Max Maischein · Answer 16 · Sat Jul 07 2018 04:51:14 GMT+0800 (China Standard Time)

When creating a new Mechanize instance, pass headless => 0:

sub new_mech {
    WWW::Mechanize::Chrome->new(
        autodie => 1,
        headless => 0,
        @_,
    );
};

Steve Dondley · Answer 17 · Sat Jul 07 2018 04:52:29 GMT+0800 (China Standard Time)

OK, yeah, was just trying that when your email came in. :)

Steve Dondley · Answer 18 · Sat Jul 07 2018 05:05:31 GMT+0800 (China Standard Time)

Hmm, that didn't work. debug output is telling me headless is still set to 1.

Steve Dondley · Answer 19 · Sat Jul 07 2018 05:12:43 GMT+0800 (China Standard Time)

I think I see what's going on. It's running the browser in @instances before it gets to &new_mech.

Max Maischein · Answer 20 · Sat Jul 07 2018 05:13:35 GMT+0800 (China Standard Time)

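No, I got the order wrong. helper.pm sets headless by default and passes it in through @_, so you need to override it by putting headless => 0 after @_ (when a key appears twice in the argument list, the later value wins):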

sub new_mech {
    #use Mojolicious;
    WWW::Mechanize::Chrome->new(
        autodie => 1,
        @_,
        headless => 0,
    );
};
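Why the position of headless => 0 matters can be shown without a browser. In the sketch below, collect_options is a hypothetical stand-in for a constructor that copies its argument list into a hash, and @from_helper is an assumed example of what the test helper passes in. When a list with a repeated key is assigned to a hash, the last value for that key is the one that is kept.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a constructor that collects its arguments
# into a hash, as many Perl constructors do.
sub collect_options {
    my %options = @_;
    return \%options;
}

# Assumed example of the defaults the test helper passes via @_.
my @from_helper = (headless => 1);

# headless => 0 before the helper's arguments: the helper's value wins.
my $early = collect_options(autodie => 1, headless => 0, @from_helper);

# headless => 0 after the helper's arguments: our value wins.
my $late = collect_options(autodie => 1, @from_helper, headless => 0);

print "headless given first: $early->{headless}\n";
print "headless given last:  $late->{headless}\n";
```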

Steve Dondley · Answer 21 · Sat Jul 07 2018 05:15:16 GMT+0800 (China Standard Time)

That did the trick. Thanks!