ArchiveBox / ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Home Page: https://archivebox.io

Autoscroll before archiving and take full-height screenshots

pirate opened this issue

I've submitted a feature request on the Chromium bug tracker for adding a --full-page flag: https://bugs.chromium.org/p/chromium/issues/detail?id=854013

Hopefully it gets merged. That would let us screenshot the full height of pages instead of being limited to the size set by the DIMENSIONS config option.
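For context, a viewport-capped screenshot with headless Chromium looks roughly like the sketch below. It assumes DIMENSIONS is a "width,height" string such as "1440,2000". `chromiumScreenshotArgs` is a hypothetical helper; only `--headless`, `--screenshot` and `--window-size` are real Chromium flags. Anything below the window height is simply cut off, which is the limitation a --full-page flag would remove.

```javascript
// Hypothetical helper: turn a DIMENSIONS-style "width,height" string into
// headless Chromium screenshot arguments. The capture is capped at the
// window size, so content below `height` pixels never makes it into the image.
function chromiumScreenshotArgs(dimensions, url) {
  const [width, height] = dimensions
    .split(',')
    .map((n) => Number.parseInt(n.trim(), 10));
  if (!Number.isInteger(width) || !Number.isInteger(height)) {
    throw new Error(`Expected "width,height", got "${dimensions}"`);
  }
  return [
    '--headless',
    '--screenshot', // writes screenshot.png to the current working directory
    `--window-size=${width},${height}`,
    url,
  ];
}
```

Usage would be something like `spawnSync('chromium', chromiumScreenshotArgs('1440,2000', 'https://example.com'))` via Node's `child_process`; the binary name (`chromium`, `chromium-browser`, `google-chrome`) varies by platform.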

This will be easy with user scripts as soon as pyppeteer is merged in #177. Alternatively, if we switch to Playwright, it's also easy using Playwright's --full-page flag (#51).
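As a rough sketch of the Playwright route, its CLI has a `screenshot` command that accepts `--full-page`. The helpers below are hypothetical and assume Node with `npx` on the PATH and Playwright's browsers already installed; they only assemble and run that command.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: argument list for `npx playwright screenshot`.
// With --full-page, Playwright captures the entire scrollable page
// rather than only the current viewport.
function playwrightScreenshotArgs(url, outFile, { fullPage = true } = {}) {
  const args = ['playwright', 'screenshot'];
  if (fullPage) args.push('--full-page');
  args.push(url, outFile);
  return args;
}

// Hypothetical wrapper that runs the CLI and reports success.
// Requires Playwright and its browsers (`npx playwright install`).
function takeScreenshot(url, outFile, options) {
  const result = spawnSync('npx', playwrightScreenshotArgs(url, outFile, options), {
    stdio: 'inherit',
  });
  return result.status === 0;
}
```

`takeScreenshot('https://example.com', 'example.png')` should write a full-height PNG; passing `{ fullPage: false }` gives a viewport-only capture for comparison.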

The code provided in this Playwright issue solves the full-page screenshot problem for me: microsoft/playwright#620

Here is the code I use to take a full-page screenshot with Playwright:

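```javascript
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({
    channel: 'chrome' // or 'msedge', 'chrome-beta', 'msedge-beta', 'msedge-dev', etc.
  });
  try {
    const context = await browser.newContext();
    const page = await context.newPage();

    await page.goto('https://apple.com/');
    // Scroll to the bottom first so lazy-loaded images and content get rendered.
    await scrollFullPage(page);

    // fullPage: true captures the whole scrollable page, not just the viewport.
    await page.screenshot({
      path: 'apple.png',
      fullPage: true
    });
  } finally {
    // Close the browser even if navigation or the screenshot fails.
    await browser.close();
  }
})();

// Scroll down 100px every 100ms until the distance scrolled reaches the
// page height. scrollHeight is re-read on every tick, so content that
// loads while scrolling extends the loop.
async function scrollFullPage(page) {
  await page.evaluate(async () => {
    await new Promise(resolve => {
      let totalHeight = 0;
      const distance = 100;
      const timer = setInterval(() => {
        const scrollHeight = document.body.scrollHeight;
        window.scrollBy(0, distance);
        totalHeight += distance;

        if (totalHeight >= scrollHeight) {
          clearInterval(timer);
          resolve();
        }
      }, 100);
    });
  });
}
```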
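One caveat with the `setInterval` approach above: on infinite-scroll pages `scrollHeight` keeps growing, so the loop may never finish. A bounded variant might look like this sketch. `scrollFullPageBounded` and its `step`/`delayMs`/`maxSteps` options are hypothetical names, and it relies only on Playwright's `page.evaluate(fn, arg)`. It stops at the actual bottom of the page or after `maxSteps` scrolls, then returns to the top before the screenshot is taken.

```javascript
// Hypothetical bounded variant of scrollFullPage: same 100px / 100ms defaults,
// but it stops at the real bottom of the page or after maxSteps scrolls,
// whichever comes first, and then scrolls back to the top.
async function scrollFullPageBounded(page, { step = 100, delayMs = 100, maxSteps = 500 } = {}) {
  // page.evaluate(fn, arg) runs fn inside the browser with one serializable argument.
  return page.evaluate(async ({ step, delayMs, maxSteps }) => {
    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
    let steps = 0;
    while (steps < maxSteps) {
      window.scrollBy(0, step);
      steps += 1;
      await sleep(delayMs); // give lazy-loaded content a moment to render
      // Re-read the height every step, since lazy loading can make the page taller.
      // The 1px tolerance covers fractional scroll positions on scaled displays.
      const bottom = window.scrollY + window.innerHeight;
      if (bottom >= document.body.scrollHeight - 1) break;
    }
    window.scrollTo(0, 0);
    return steps;
  }, { step, delayMs, maxSteps });
}
```

It would drop in where `scrollFullPage(page)` is called, e.g. `await scrollFullPageBounded(page, { maxSteps: 200 })`, and resolves to the number of scroll steps it took.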

Is this feature available natively now, or only by hacking it in with user scripts?

Not available natively yet; it's blocked on #51.

Ah, fair enough, thanks! It seems like #51 involves a huge amount of work to make this happen, so thanks and good luck!