mozilla / fathom

A framework for extracting meaning from web pages

Home Page: http://mozilla.github.io/fathom/

Improve `isVisible` for correctness and performance

biancadanforth opened this issue

Related to #91.

As mentioned in this Price Tracker issue, isVisible is by far the largest contributor to jank right around page load.

Since Fathom's isVisible method is largely based on Price Tracker's, it suffers from this same problem, and it would be a good idea to have a more performant implementation.

Additionally, the current implementation in Fathom has a bug in how it checks getComputedStyle().width and getComputedStyle().height, which should be fixed as well.

I have successfully implemented memoization using a Map[1] in an updated isVisible implementation in Price Tracker, but I run into infinite recursion when I try to use Fathom's note method instead.

Here is the experimental branch (built from Price Tracker PR #317 using Fathom 3.0) where I try to use note[2]. Only the last two commits are new. The offending line is the console.log(fnode.noteFor('image')) call in isVisible.

@erikrose, I see in the Fathom source code that you already try to account for infinite recursion.

  • What is happening here, and is this[2] the right way to use note?
  • How would we incorporate a note-based (or Map-based) solution into the Fathom library, since both require some code outside of the isVisible implementation itself?

[1]: Memoizing using a JavaScript Map

const visibleElements = new Map(); // HTML element => isVisible (Boolean) map

// …

  isVisible(fnode) {
    const element = fnode.element;
    const cachedResult = visibleElements.get(element);
    if (cachedResult !== undefined) {
      return cachedResult;
    }
    const rect = element.getBoundingClientRect();
    if (rect.width === 0 || rect.height === 0) {
      visibleElements.set(element, false);
      return false;
    }
    const style = getComputedStyle(element);
    if (style.opacity === '0') {
      visibleElements.set(element, false);
      return false;
    }
    // workaround for https://github.com/w3c/csswg-drafts/issues/4122
    const scrollX = window.pageXOffset;
    const scrollY = window.pageYOffset;
    const absX = rect.x + scrollX;
    const absY = rect.y + scrollY;
    window.scrollTo(absX, absY);
    const newX = absX - window.pageXOffset;
    const newY = absY - window.pageYOffset;
    const eles = document.elementsFromPoint(newX, newY);
    window.scrollTo(scrollX, scrollY);
    const result = eles.includes(element);
    visibleElements.set(element, result);
    return result;
  }
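
One way to keep the caching code out of isVisible itself is to wrap the predicate. This is only a sketch: memoizeByElement is a hypothetical helper, not part of Fathom or Price Tracker, and it assumes the argument has an element property, as fnodes do. It uses a WeakMap instead of a Map so cached entries do not keep detached elements alive.

```javascript
// Hypothetical helper (not a Fathom or Price Tracker API): caches a
// predicate's result per element so repeated calls skip the expensive work.
function memoizeByElement(predicate) {
  const cache = new WeakMap(); // element => Boolean result
  return function memoized(fnode) {
    const element = fnode.element;
    if (cache.has(element)) {
      return cache.get(element);
    }
    const result = predicate(fnode);
    cache.set(element, result);
    return result;
  };
}

// Usage with a stand-in predicate and a plain object in place of a DOM
// element; real code would pass the isVisible method instead.
let calls = 0;
const fakeElement = {};
const isVisibleOnce = memoizeByElement(fnode => {
  calls += 1;
  return true;
});
isVisibleOnce({element: fakeElement});
isVisibleOnce({element: fakeElement});
console.log(calls); // 1
```

Note that a cache like this never invalidates, so it is only safe if an element's visibility is not expected to change during a ruleset run.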

[2]: Memoizing using Fathom note (WIP)

diff --git a/src/extraction/fathom/ruleset_factory.js b/src/extraction/fathom/ruleset_factory.js
index 65fa7d4..1624126 100644
--- a/src/extraction/fathom/ruleset_factory.js
+++ b/src/extraction/fathom/ruleset_factory.js
@@ -2,7 +2,7 @@
  * License, v. 2.0. If a copy of the MPL was not distributed with this
  * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
 
-import {dom, out, rule, ruleset, score, type} from 'fathom-web';
+import {dom, out, rule, ruleset, score, type, note} from 'fathom-web';
 import {euclidean} from 'fathom-web/clusters';
 
 const TOP_BUFFER = 150;
@@ -87,6 +87,7 @@ export default class RulesetFactory {
 
   /** Scores fnode by its vertical location relative to the fold */
   isAboveTheFold(fnode) {
+    console.log(fnode.noteFor('image'));
     const viewportHeight = 950;
     const imageTop = fnode.element.getBoundingClientRect().top;
 
@@ -199,6 +200,7 @@ export default class RulesetFactory {
   }
 
   isVisible(fnode) {
+    console.log(fnode.noteFor('image'));
     const element = fnode.element;
     const rect = element.getBoundingClientRect();
     if (rect.width === 0 || rect.height === 0) {
@@ -258,7 +260,7 @@ export default class RulesetFactory {
        * Image rules
        */
       // consider all visible img elements
-      rule(dom('img').when(this.isVisible.bind(this)), type('image')),
+      rule(dom('img').when(this.isVisible.bind(this)), note(() => ({isVisible: true})).type('image')),
       // and divs, which sometimes have CSS background-images
       // TODO: Consider a bonus for <img> tags.
       rule(dom('div').when(fnode => this.isVisible(fnode) && this.hasBackgroundImage(fnode)), type('image')),

After talking with Erik, we are limiting this issue to improving isVisible's performance without memoizing, and to fixing the existing bug with getComputedStyle().width and getComputedStyle().height. Memoizing is handled by #121.
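
The thread does not show the buggy lines, so the following is only an illustration of a common pitfall that may be relevant here: getComputedStyle() reports width and height as strings such as "0px" or "auto", so comparing them directly against 0 or '0' never matches. A minimal sketch, where isZeroCssLength is a hypothetical helper and not taken from Fathom's source:

```javascript
// Hypothetical helper; not taken from Fathom's source.
// Returns true when a computed-style length string represents zero.
function isZeroCssLength(value) {
  // parseFloat('0px') is 0; parseFloat('auto') is NaN, and NaN === 0 is false.
  return parseFloat(value) === 0;
}

console.log('0px' === '0');              // false: a naive string comparison misses
console.log(isZeroCssLength('0px'));     // true
console.log(isZeroCssLength('auto'));    // false
console.log(isZeroCssLength('12.5px'));  // false

// In a browser, this would be called as, for example:
//   isZeroCssLength(getComputedStyle(element).width)
```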