ghostery / adblocker

Efficient embeddable adblocker library

Home Page: https://www.ghostery.com

YouTube ads on first load

skunkfox opened this issue

I am using the following video as a test: https://www.youtube.com/watch?v=-t9XZgk6kNY

While using the Vieb browser, I discovered that I was seeing ads in this video. I know that I don't see ads in Brave or in Firefox with uBlock Origin, and I know that Vieb uses Ghostery's adblocker. Curious, I fired up adblocker-electron-example and loaded the page. Sure enough, the same ads appeared.

However, after pausing the video, if I then search YouTube for the same video, it plays without ads! This doesn't happen in Vieb, so I opened the developer console in both Vieb and adblocker-electron-example. When I load the YouTube video for the first time in both, one of the requests that gets blocked is '56-y-ORG.js' (request URL: 'https://tpc.googlesyndication.com/sodar/56-y-0RG.js'). Maybe it has something to do with cross-origin requests? On subsequent Vieb loads that network request keeps appearing, but on subsequent adblocker-electron-example loads it does not.

I'd love to help debug this issue and improve adblocking, but I'm still very new to electron and node.js. I was wondering if there is someone I can contact on Telegram or Discord or Matrix to bounce questions off of, since I'd really like to help as much as I can.

Hi @skunkfox,

Thanks for reaching out. One reason the result with Vieb and adblocker-electron-example can differ is if they use different versions of the rules/resources. I see the rules are vendored in https://github.com/Jelmerro/Vieb/tree/master/app/blocklists and are manually updated. Maybe we could try with the latest version of all files.

The adblocker-electron-example project also loads more blocklists by default, so it might be worth loading the uBlock Origin filters in Vieb as well. You can check adsLists and adsAndTrackingLists here: https://github.com/ghostery/adblocker/blob/master/packages/adblocker/src/fetch.ts#L55

In general, even in a browser, blocking YouTube ads is tricky, and there is often a race between the ads loading and how fast the adblocker can inject scripts into the page. In particular, whenever there is a round-trip between the page content and the adblocker (e.g. using ipcMain.on("get-cosmetic-filters")), it might already be too late: sometimes the ads will be blocked and sometimes not, depending on whether injection is fast enough. I am not sure if that's possible in Electron, but in the browser we can listen to navigation events and proactively inject scripts into pages from the main process (which works better).

Cc @Jelmerro who might be able to provide more insights there.

I hope that helps,

@remusao The lists are included by default, but can also optionally be updated on startup, which to my knowledge most users are aware of and use (from what I read in comments and Reddit posts). I am looking for more lists to include by default, but I do see a lot of ads passing through with EasyList and EasyPrivacy that are not there when using uBlock Origin. Whether that is due to missing lists, better filtering by uBlock, or an incorrect implementation in Vieb, I don't know, but I strongly suspect it's a combination of missing lists and uBlock Origin having better adblocking. My main question is: what data or code can I provide that would help debug the cause of this issue, so that we can block ads on YouTube more reliably?

Thanks for the quick feedback @Jelmerro. I would probably start by trying to add the same lists as default uBlock Origin in Vieb and then check again if ads are still going through.

I think the following would be a good starting point (including adsLists and adsAndTrackingLists): https://github.com/ghostery/adblocker/blob/master/packages/adblocker/src/fetch.ts#L55

How often are these lists updated in the repo, and/or can we fetch these lists directly from easylist/ublock/peter instead? I can't find the urls for all of them, only a few. Besides that, my biggest worry is that it's not the lists, but the custom loading system I had to make due to Electron only allowing a single onBeforeRequest listener at a time.
When creating a new session newSess I do this:
[screenshot not preserved]
And also this:
[screenshot not preserved]
The adblockerPreload is the worst part of this: since it's not exported, I had to do this:
[screenshot not preserved]
I would like to avoid that, but for now I will try adding more lists, which is in line with what I recommend to users who complain about YouTube ads, though I'm skeptical that they can really be blocked properly at all. The code can be found in the index.js file of the Vieb repo, in case you are curious to check it out yourself. Thanks for looking into this.

The lists are updated continuously upstream, but the versions mirrored in this repository are updated at most once per day (and only if I merge the PR...). It's a good idea to pull from the source directly if you can. The URLs can be found here: https://github.com/ghostery/adblocker/blob/master/packages/adblocker/assets/update.js#L27

Why is it an issue for you to only register the listener once? I agree it's not pretty, but the way you're doing it seems fine? Do you need to listen to requests apart from the adblocking use-case?

I agree with you that blocking YouTube ads is particularly challenging (even in the browser, where all capabilities are available to the adblocker). And Electron is a bit more limited in some ways. But it can probably be made to work well enough. I don't have lots of time to dedicate to this issue at the moment unfortunately but happy to help however I can.

To get the preload script path, did you try something like that? https://github.com/ghostery/adblocker/blob/master/packages/adblocker-electron/adblocker.ts#L18

Thanks for the list of urls, I am now only missing the resources file still.

Yes, I need the listener myself for redirecting urls, but it seems to work fine indeed, so not something that really needs changing. I'm mostly concerned about the preload.
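Since, as noted above, Electron only allows a single onBeforeRequest listener per session, one generic way to share it between adblocking and your own redirects is to compose several handlers into that one listener. This is a minimal, dependency-free sketch: `RequestDetails` and `RequestResponse` are trimmed-down stand-ins for Electron's types, and `combineHandlers`, `blockAds` and `upgradeHttp` are hypothetical names, not part of Electron or the adblocker API.

```typescript
// Trimmed-down stand-ins for the details object and callback response used by
// Electron's webRequest.onBeforeRequest (only the fields this sketch needs).
interface RequestDetails {
  url: string;
  resourceType: string;
}

interface RequestResponse {
  cancel?: boolean;
  redirectURL?: string;
}

// A handler either returns a decision or returns undefined to pass.
type RequestHandler = (details: RequestDetails) => RequestResponse | undefined;

// Compose several handlers into one listener; the first decision wins.
function combineHandlers(...handlers: RequestHandler[]) {
  return (details: RequestDetails, callback: (response: RequestResponse) => void): void => {
    for (const handler of handlers) {
      const decision = handler(details);
      if (decision !== undefined) {
        callback(decision);
        return;
      }
    }
    callback({}); // no handler objected: let the request through unchanged
  };
}

// Hypothetical handlers, for illustration only.
const blockAds: RequestHandler = (d) =>
  d.url.includes("googlesyndication.com") ? { cancel: true } : undefined;

const upgradeHttp: RequestHandler = (d) =>
  d.url.startsWith("http://")
    ? { redirectURL: "https://" + d.url.slice("http://".length) }
    : undefined;

const listener = combineHandlers(blockAds, upgradeHttp);
```

The combined function would then be registered once, e.g. `session.defaultSession.webRequest.onBeforeRequest(listener)`. How the ElectronBlocker's own request logic would be wrapped as one of the handlers is not shown here; handler order decides which one wins when two would both act on a request.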

As for the preload path, yes I had that before, but since moving to webpack I had to do it as a string instead of using require.resolve, as that will not resolve correctly after building it with webpack. I ran out of ideas to try to get the path in a way that would satisfy webpack, so I had to use a fixed path, but I imagine that this path does not change often, and if it does, the build will fail anyway.
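If a fixed path is unavoidable under webpack, one option is to make a moved preload script fail loudly and early instead of silently breaking injection. A minimal sketch, assuming you maintain a short list of candidate locations yourself; `resolvePreloadPath` is a hypothetical helper, and the existence check is passed in so the sketch has no dependencies (in the Electron main process you might pass Node's `fs.existsSync`).

```typescript
// Hypothetical helper: return the first candidate path that exists,
// or throw an error listing every location that was tried.
function resolvePreloadPath(candidates: string[], exists: (path: string) => boolean): string {
  for (const candidate of candidates) {
    if (exists(candidate)) {
      return candidate;
    }
  }
  throw new Error(`Adblocker preload script not found. Tried:\n${candidates.join("\n")}`);
}
```

Calling it once at startup, before creating any session, surfaces a moved file immediately rather than as "ads are back" later.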

My bad, the resources file is in there above the line you linked, but it's fetched indirectly from a Cliqz CDN (by first fetching some metadata) instead of from the original location, if there is one. I would rather get it from the source if possible.

There is no source per se, because the resources file has to be transformed to be compatible with Cliqz' adblocker. So your best bet is to use the one from the CDN. On the other hand, it does not change very often, so it might not be a big issue.

I can see the ads on Youtube even when adding all the lists mentioned, which is in line with @skunkfox's test results, because they also had it with the adblocker-electron-example. Any chance it's not the fault of the lists, but missing support for some syntax in the ublock lists?

I don't think so, because @cliqz/adblocker is perfectly able to block YouTube ads in a browser context. I would suspect that Electron's API (or Electron wrapper) is not on par with the browser support and something is either missing, or behaving in a way that is not allowing YouTube ads to be blocked. We'd need to investigate to figure this out. One usual suspect (especially on Electron) is the injection of scriptlets in pages. There were issues in the past with sandboxing, etc. so we should make sure that the injection works as expected.

I'm so glad to see that there has been activity on this issue.

What can I, a programmer but JS and electron newb, do to help?

@skunkfox thanks for offering your help! I think one first step would be to verify that scripts are injected in pages by the ElectronBlocker. This happens here: https://github.com/ghostery/adblocker/blob/master/packages/adblocker-electron/adblocker.ts#L271

There were updates to Electron in the past which prevented the scripts from being injected into page content (sandboxing, etc.), so the first step would be to make sure that's not the case here. Philipp recently added some debugging code for the WebExtensionBlocker here:

We could do something similar in ElectronBlocker: add a console.error, visit YouTube, and expect to see these logs in the page's console. If they don't show up, then maybe the scripts are not being injected correctly.

Depending on the result of this investigation we can think of next steps (if the injection does not work, there might be options in Electron that we can tweak, etc.).

All of the above should be possible to test by running the adblocker-electron-example project locally.

Wow, thanks for the quick response!

So believe it or not I just got really sick and tested positive for COVID. So I'm mostly going to be sleeping for these next few days. However, I look forward to helping debug after I get better. Just a warning that it may be a few days.

Get well soon!

I am slowly recovering from COVID. I added some debug code.

  private injectScripts(sender: Electron.WebContents, script: string): void {
    sender.executeJavaScript(script);
    console.error("[ADBLOCKER-DEBUG] - [injectScripts] - ", JSON.stringify(script));
  }

  private injectStyles(sender: Electron.WebContents, styles: string): void {
    if (styles.length > 0) {
      console.error("[ADBLOCKER-DEBUG] - [injectStyles] - ", JSON.stringify(styles));
      sender.insertCSS(styles, {
        cssOrigin: 'user',
      });
    }
  }

It looks like these functions are being called when I visit the YouTube video and see ads. I can paste the console.error messages if you'd like, but they are enormous.

Great to hear that you're doing better. Thanks for looking into this issue, that's a great start. So at least we know that scriptlets and styles are correctly returned for youtube.com. I think the next step would be to check that the injection into the page happens as expected. One way would be to wrap the injected scripts with some debugging code to see their effect in the page. You could try something like this, maybe? https://github.com/ghostery/adblocker/blob/master/packages/adblocker-webextension/adblocker.ts#L631

I have made a similar modification to adblocker-electron/adblocker.ts:

  private injectScripts(sender: Electron.WebContents, script: string): void {
    // Returns a statement that logs a tagged message in the page console.
    const debugMarker = (text: string): string =>
      `console.log('[ADBLOCKER-DEBUG-TEST]:', ${JSON.stringify(text)});`;

    // The scriptlet itself, wrapped in an IIFE that logs a marker first.
    const codeRunningInPage = `(function(){
      ${debugMarker('run scriptlets (executing in "page world")')}
      ${script}
    })()`;

    // Wrapper that runs the URI-encoded scriptlet through a temporary
    // <script> element, then removes that element again.
    const codeRunningInContentScript = `
    (function(code) {
      ${debugMarker('run injection wrapper (executing in "content script world")')}
      var script;
      try {
        script = document.createElement('script');
        script.appendChild(document.createTextNode(decodeURIComponent(code)));
        (document.head || document.documentElement).appendChild(script);
      } catch (ex) {
        console.error('Failed to run script', ex);
      }
      if (script) {
        if (script.parentNode) {
          script.parentNode.removeChild(script);
        }
        script.textContent = '';
      }
    })(\`${encodeURIComponent(codeRunningInPage)}\`);`;

    // This runs in the main process, so it prints to the terminal, not to the page console.
    console.log("BEGIN SCRIPT CONTENT\n", script, "\nEND SCRIPT CONTENT");

    sender.executeJavaScript(codeRunningInContentScript);
  }

When I load the YouTube link (as the first page I load), I see the attached output in the browser debug console.
[Screenshot of the console output: image_2022-10-22_19-46-26]

I'm not sure how helpful that is to you. As for the contents of the injected scripts they seem rather obfuscated and difficult to debug directly (at least for a JavaScript / TypeScript / Electron newb like myself). If you are curious, here they are:

BEGIN SCRIPT CONTENT
 (function(){const e="[].playerResponse.adPlacements [].playerResponse.playerAds playerResponse.adPlacements playerResponse.playerAds adPlacements playerAds";const t="{{2}}";const n="{{1}}"!==e&&""!==e?e.split(/ +/):[];let o;let s,r;if(0!==n.length)o=0!==n.length&&"{{2}}"!==t&&""!==t?t.split(/ +/):[];else{s=console.log.bind(console);let e;if(""===t||"{{2}}"===t)e=".?";else if("/"===t.charAt(0)&&"/"===t.slice(-1))e=t.slice(1,-1);else e=t.replace(/[.*+?^${}()|[\]\\]/g,"\\$&");r=new RegExp(e)}const l=function(e,t,n=false){let o=e;let s=t;for(;;){if("object"!==typeof o||null===o)return false;const e=s.indexOf(".");if(-1===e){if(false===n)return o.hasOwnProperty(s);if("*"===s)for(const e in o){if(false===o.hasOwnProperty(e))continue;delete o[e]}else if(o.hasOwnProperty(s))delete o[s];return true}const t=s.slice(0,e);if("[]"===t&&Array.isArray(o)||"*"===t&&o instanceof Object){const t=s.slice(e+1);let r=false;for(const e of Object.keys(o))r=l(o[e],t,n)||r;return r}if(false===o.hasOwnProperty(t))return false;o=o[t];s=s.slice(e+1)}};const f=function(e){for(const t of o)if(false===l(e,t))return false;return true};const i=function(e){if(void 0!==s){const t=JSON.stringify(e,null,2);if(r.test(t))s("uBO:",location.hostname,t);return e}if(false===f(e))return e;for(const t of n)l(e,t,true);return e};JSON.parse=new Proxy(JSON.parse,{apply:function(){return i(Reflect.apply(...arguments))}});Response.prototype.json=new Proxy(Response.prototype.json,{apply:function(){return Reflect.apply(...arguments).then((e=>i(e)))}})})(); 
END SCRIPT CONTENT
BEGIN SCRIPT CONTENT
 (function(){const e="ytInitialPlayerResponse.adPlacements";let t="undefined";const n=document.currentScript;if("undefined"===t)t=void 0;else if("false"===t)t=false;else if("true"===t)t=true;else if("null"===t)t=null;else if("''"===t)t="";else if("[]"===t)t=[];else if("{}"===t)t={};else if("noopFunc"===t)t=function(){};else if("trueFunc"===t)t=function(){return true};else if("falseFunc"===t)t=function(){return false};else if(/^\d+$/.test(t)){t=parseFloat(t);if(isNaN(t))return;if(Math.abs(t)>32767)return}else return;let i=false;const r=function(e){if(i)return true;i=void 0!==e&&null!==e&&void 0!==t&&null!==t&&typeof e!==typeof t;return i};const f=function(e,n,i,r){if(false===r.init(e[n]))return;const f=Object.getOwnPropertyDescriptor(e,n);let u,s;if(f instanceof Object){e[n]=t;if(f.get instanceof Function)u=f.get;if(f.set instanceof Function)s=f.set}try{Object.defineProperty(e,n,{configurable:i,get(){if(void 0!==u)u();return r.getter()},set(e){if(void 0!==s)s(e);r.setter(e)}})}catch(e){}};const u=function(e,i){const s=i.indexOf(".");if(-1===s){f(e,i,false,{v:void 0,init:function(e){if(r(e))return false;this.v=e;return true},getter:function(){return document.currentScript===n?this.v:t},setter:function(e){if(false===r(e))return;t=e}});return}const c=i.slice(0,s);const o=e[c];i=i.slice(s+1);if(o instanceof Object||"object"===typeof o&&null!==o){u(o,i);return}f(e,c,true,{v:void 0,init:function(e){this.v=e;return true},getter:function(){return this.v},setter:function(e){this.v=e;if(e instanceof Object)u(e,i)}})};u(window,e)})(); 
END SCRIPT CONTENT
BEGIN SCRIPT CONTENT
 (function(){const e="playerResponse.adPlacements";let t="undefined";const n=document.currentScript;if("undefined"===t)t=void 0;else if("false"===t)t=false;else if("true"===t)t=true;else if("null"===t)t=null;else if("''"===t)t="";else if("[]"===t)t=[];else if("{}"===t)t={};else if("noopFunc"===t)t=function(){};else if("trueFunc"===t)t=function(){return true};else if("falseFunc"===t)t=function(){return false};else if(/^\d+$/.test(t)){t=parseFloat(t);if(isNaN(t))return;if(Math.abs(t)>32767)return}else return;let i=false;const r=function(e){if(i)return true;i=void 0!==e&&null!==e&&void 0!==t&&null!==t&&typeof e!==typeof t;return i};const f=function(e,n,i,r){if(false===r.init(e[n]))return;const f=Object.getOwnPropertyDescriptor(e,n);let u,s;if(f instanceof Object){e[n]=t;if(f.get instanceof Function)u=f.get;if(f.set instanceof Function)s=f.set}try{Object.defineProperty(e,n,{configurable:i,get(){if(void 0!==u)u();return r.getter()},set(e){if(void 0!==s)s(e);r.setter(e)}})}catch(e){}};const u=function(e,i){const s=i.indexOf(".");if(-1===s){f(e,i,false,{v:void 0,init:function(e){if(r(e))return false;this.v=e;return true},getter:function(){return document.currentScript===n?this.v:t},setter:function(e){if(false===r(e))return;t=e}});return}const c=i.slice(0,s);const o=e[c];i=i.slice(s+1);if(o instanceof Object||"object"===typeof o&&null!==o){u(o,i);return}f(e,c,true,{v:void 0,init:function(e){this.v=e;return true},getter:function(){return this.v},setter:function(e){this.v=e;if(e instanceof Object)u(e,i)}})};u(window,e)})(); 
END SCRIPT CONTENT
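For readers trying to decode these dumps: the first one appears to be a minified uBlock Origin json-prune scriptlet (it proxies JSON.parse and Response.prototype.json and deletes paths such as playerResponse.adPlacements from the result), and the other two appear to be set-constant scriptlets that force ytInitialPlayerResponse.adPlacements and playerResponse.adPlacements to undefined. Below is a readable, simplified sketch of the path-deletion part of json-prune only; it is not uBO's actual code, and `prunePath`/`parseAndPrune` are hypothetical names. It leaves out the scriptlet's other modes and the JSON.parse proxying itself.

```typescript
// Delete the value at a dot-separated path inside parsed JSON.
// "[]" as a segment means "every element of this array", "*" means "every own key".
// Returns true if at least one property was deleted.
function prunePath(node: unknown, path: string): boolean {
  if (typeof node !== "object" || node === null) return false;
  const obj = node as Record<string, unknown>;
  const dot = path.indexOf(".");
  if (dot === -1) {
    // Last segment: delete that key (or every key, for "*").
    if (path === "*") {
      const keys = Object.keys(obj);
      for (const key of keys) delete obj[key];
      return keys.length > 0;
    }
    if (!Object.prototype.hasOwnProperty.call(obj, path)) return false;
    delete obj[path];
    return true;
  }
  const head = path.slice(0, dot);
  const rest = path.slice(dot + 1);
  if ((head === "[]" && Array.isArray(node)) || head === "*") {
    // Wildcard: recurse into every child, remembering whether anything was pruned.
    let pruned = false;
    for (const key of Object.keys(obj)) pruned = prunePath(obj[key], rest) || pruned;
    return pruned;
  }
  if (!Object.prototype.hasOwnProperty.call(obj, head)) return false;
  return prunePath(obj[head], rest);
}

// Parse JSON, then prune every listed path (roughly what the proxied JSON.parse does).
function parseAndPrune(text: string, paths: string[]): unknown {
  const value: unknown = JSON.parse(text);
  for (const path of paths) prunePath(value, path);
  return value;
}

const paths = "[].playerResponse.adPlacements playerResponse.adPlacements adPlacements".split(/ +/);
const cleaned = parseAndPrune(
  '{"adPlacements":[1],"playerResponse":{"adPlacements":[2],"videoDetails":{"videoId":"x"}}}',
  paths,
);
console.log(JSON.stringify(cleaned)); // {"playerResponse":{"videoDetails":{"videoId":"x"}}}
```

Because the pruning happens when the page parses the player response, it only helps if the proxy is installed before YouTube's player code parses that response.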

I would like to reiterate my willingness to help in any way I can, after all I use Vieb as my main browser.

Where can I go from here?

Thanks @skunkfox, that is very useful. Now we know that the scripts that should be injected are injected. I checked with uBlock Origin's logger and it seems to match:

[Screenshot of uBlock Origin's logger]

Two things come to mind:

  1. These scripts need to be injected into every frame of the page, so one thing we could also check is that this happens here with Electron (i.e. injecting both into the main page document and into all iframes). One way could be to check that we see each of these injected multiple times (in the adblocker logs from Electron?), since that seems to be what to expect based on uBlock Origin's logger. After loading a YouTube page I can see (using document.querySelectorAll('iframe') in the page's dev tools console) that there are 3 iframes, so each of these should get the scripts.
  2. Another issue, which might be trickier to check, is that from what I remember, the timing of injection matters a great deal on YouTube. This means we should aim to inject these scripts as early as possible in the page. If the ad has time to start loading before injection, then it's too late to prevent it. Here we would need to see if we can perform the injection earlier. At the moment we have a round-trip to the page: https://github.com/ghostery/adblocker/blob/master/packages/adblocker-electron/adblocker.ts#L86
    a. The preload script is injected in the page (that might already be happening pretty late in the life-cycle of the frame; not entirely sure how Electron implements that)
    b. The preload script sends a message to main process to request injections https://github.com/ghostery/adblocker/blob/master/packages/adblocker-electron-preload/preload.ts#L15
    c. When this message is received by main process: https://github.com/ghostery/adblocker/blob/master/packages/adblocker-electron/adblocker.ts#L86 it then checks for things to inject and performs the injection here: https://github.com/ghostery/adblocker/blob/master/packages/adblocker-electron/adblocker.ts#L270
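Point 2 can be illustrated with a toy model in plain TypeScript (no Electron involved; `installAdPlacementsTrap` and `runPage` are hypothetical names). The trap is a much-simplified take on the set-constant scriptlets dumped earlier in this thread, which target `ytInitialPlayerResponse.adPlacements`: it only helps if it is in place before the page assigns and reads that object.

```typescript
// Toy stand-in for the page's global object.
type Win = Record<string, unknown>;

// Much-simplified set-constant-style trap: any object later assigned to
// win.ytInitialPlayerResponse has its adPlacements property removed.
function installAdPlacementsTrap(win: Win): void {
  let stored: unknown = win.ytInitialPlayerResponse;
  Object.defineProperty(win, "ytInitialPlayerResponse", {
    configurable: true,
    get: () => stored,
    set: (value: unknown) => {
      if (typeof value === "object" && value !== null) {
        delete (value as Record<string, unknown>).adPlacements;
      }
      stored = value;
    },
  });
}

// Toy "page": assigns the player response, and the player reads it right away.
// Returns how many ad placements the player would see.
function runPage(win: Win): number {
  win.ytInitialPlayerResponse = { adPlacements: [{ id: 1 }], videoDetails: {} };
  const response = win.ytInitialPlayerResponse as { adPlacements?: unknown[] };
  return response.adPlacements?.length ?? 0;
}

// Early injection: the trap exists before the page script runs.
const early: Win = {};
installAdPlacementsTrap(early);
const adsEarly = runPage(early); // 0

// Late injection (e.g. after an IPC round-trip): the page has already read the ads,
// so installing the trap afterwards cannot undo that.
const late: Win = {};
const adsLate = runPage(late); // 1
installAdPlacementsTrap(late);
```

The same scriptlet code gives opposite results depending only on when it runs, which is consistent with ads showing up on the first load but not always afterwards.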

I suspect this is too slow. In the webextension adblocker for browsers we now have a way to detect new frames using the webNavigation listener from the main process and inject scripts earlier; avoiding a round-trip with the page. We should look for alternatives in Electron APIs. If there is a way to listen to an event for new pages or frames from the main process, then we could react quicker.

I see there are a bunch of events here: https://www.electronjs.org/docs/latest/api/web-contents#instance-events
We can experiment and check in which order they trigger, and what is the earliest point where we could perform the injection successfully.
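To run that experiment, a small recorder can log which events fire and when. This is a sketch: it relies only on an EventEmitter-style `on` method (Electron's WebContents is a Node EventEmitter), and `recordEventOrder`, `EventSource` and `EventRecord` are hypothetical names.

```typescript
// Minimal structural type for anything with an EventEmitter-style `on` method.
interface EventSource {
  on(event: string, listener: (...args: unknown[]) => void): unknown;
}

interface EventRecord {
  event: string;
  elapsedMs: number;
}

// Subscribe to each named event and record the order in which they fire,
// with the time elapsed since recording started. The returned array fills up
// as events arrive.
function recordEventOrder(
  source: EventSource,
  eventNames: string[],
  now: () => number = Date.now,
): EventRecord[] {
  const start = now();
  const log: EventRecord[] = [];
  for (const name of eventNames) {
    source.on(name, () => {
      log.push({ event: name, elapsedMs: now() - start });
    });
  }
  return log;
}
```

In the main process one might pass `win.webContents` and event names from the linked page, such as `'did-start-navigation'`, `'did-frame-navigate'`, `'dom-ready'` and `'did-finish-load'`, then print the log once the YouTube page has loaded and compare it with when the preload's IPC message arrives.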

So I am somewhat skeptical that the scriptlets are being injected into all iframes. For one, I see that three scripts are getting injected and the content of each one is unique. I would expect to see multiple copies of each script getting injected, one for each iframe. Although I could well be wrong about this; maybe each iframe is supposed to receive a unique script?

I added some debug code to list additional iframe context:

    // Some added debugging
    let frames = event.sender.mainFrame.frames;
    let name = event.sender.mainFrame.name;
    let url_holder = event.sender.mainFrame.url;

    console.log("[BEGIN DEBUG DUMP]");
    console.log("Name: ", name);
    console.log("Frames: ", frames);
    console.log("URL: ", url_holder);
    console.log("[END DEBUG DUMP]");

    // Inject custom stylesheets
    this.injectStyles(event.sender, styles);

    // Inject scriptlets
    for (const script of scripts) {
      console.log("INJECTING SCRIPLETS");
      this.injectScripts(event.sender, script);
    }

I get the following output:

[BEGIN DEBUG DUMP]
Name:  
Frames:  []
URL:  https://www.youtube.com/watch?v=-t9XZgk6kNY
[END DEBUG DUMP]
INJECTING SCRIPLETS
SCRIPT INJECTED
INJECTING SCRIPLETS
SCRIPT INJECTED
INJECTING SCRIPLETS
SCRIPT INJECTED

The source code for WebFrame says:

    readonly frames: WebFrameMain[];
    /**
     * A `WebFrameMain[]` collection containing every frame in the subtree of `frame`,
     * including itself. This can be useful when traversing through all frames.
     *
     */

If the 'frames' attribute should include the main frame as well, I don't see why my console.log() output shows an empty list. I must be doing something wrong. Any tips?
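One note on the quoted declaration: in Electron's typings the JSDoc comment precedes the property it documents, so the comment shown above belongs to the next property, `framesInSubtree`. `frames` itself only holds a frame's direct children, so an empty list may simply mean no iframes existed yet when the IPC message arrived (an unverified guess). The sketch below rebuilds the subtree walk over a minimal `FrameLike` interface (a hypothetical stand-in, so it runs without Electron) and counts injections, which is the "each script N times" check from point 1.

```typescript
// Minimal stand-in for the parts of Electron's WebFrameMain used here.
interface FrameLike {
  url: string;
  frames: FrameLike[]; // direct children only, like WebFrameMain.frames
  executeJavaScript(code: string): Promise<unknown>;
}

// Rebuilds what WebFrameMain.framesInSubtree provides: the frame itself plus
// every descendant (visited depth-first; order may differ from document order).
function collectSubtree(root: FrameLike): FrameLike[] {
  const result: FrameLike[] = [];
  const stack: FrameLike[] = [root];
  while (stack.length > 0) {
    const frame = stack.pop()!;
    result.push(frame);
    stack.push(...frame.frames);
  }
  return result;
}

// Injects every script into every frame of the subtree and returns how many
// injections succeeded, so the count can be logged and checked.
async function injectIntoAllFrames(root: FrameLike, scripts: string[]): Promise<number> {
  let injected = 0;
  for (const frame of collectSubtree(root)) {
    for (const script of scripts) {
      try {
        await frame.executeJavaScript(script);
        injected++;
      } catch {
        // The frame may have navigated away or been destroyed; skip it.
      }
    }
  }
  return injected;
}
```

With Electron, `event.sender.mainFrame` should structurally fit `FrameLike`, though Electron's own `framesInSubtree` already gives the flattened list; with 3 iframes and 3 scriptlets, one would expect 12 injections in total.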

Hello guys, just came across the same problem in my Electron app: the first video loaded always gets ads. Have you had time to look into it? :)

This is driving me insane. I am going to look back into it in the next few days. I'll coerce my friend who is more experienced with Javascript to help me.

@michael-dm Do you want to help me debug this? You have a Telegram or Discord or IRC or Matrix handle I can use to get in touch?

Hello @skunkfox, I made a repro of this bug on Electron Fiddle, if it's helpful:
https://gist.github.com/bf00689b195a6464b2f21fd6e9671595

You can add me on Discord here : Michaël#3536

Great, I just started getting into this again this weekend.
Sent you a friend request. :)

This also happens on https://www.npmjs.com/package/@cliqz/adblocker-puppeteer

Just start a Puppeteer instance and open a tab or two on youtube.com, and you will 100% get an ad.
I've tested this dozens of times.

Hey Ataxeus, I'm starting a debugging branch to work on this. (Not that I'm making much progress so far, right now I'm still trying to determine if the scriptlets are running in the correct frames). Do you have a Discord or Slack or Telegram? I'd be interested in group chatting with you and michael-dm and finally figuring this out.

Okay, so strangely enough, ads stopped appearing on the old test video and I had to update it.

So, I've got this debug repo set up. You can clone it, then run 'yarn bootstrap', 'yarn build' and 'yarn watch' in the root directory to build everything and rebuild it on changes. Then cd to 'packages/adblocker-electron-example' and run 'npm start' to bring up an Electron window on a page where YouTube shows an ad.

In the browser debug window you'll see some debug output I added to './adblocker-electron/adblocker.ts'. Specifically I expect 'codeRunningInPage' and 'console.log('Current Frame:', sName);' to show me the name of the current frame. Nada. You'd think for how long I've had this bug open I would have figured this out by now but I have no experience with Electron and real life events keep pulling me away from this.

I suspect the problem has to do with what @remusao mentioned earlier, specifically point 1: maybe the scriptlets aren't being injected multiple times, because right now it looks like each frame gets a unique scriptlet (or at least, when I log the scriptlets, they are all unique). But it could be point 2 as well.

I'd appreciate any help with this, because I hate having to switch to LibreWolf each time I visit YouTube.

Any updates about this? I'm encountering the same issue with Ads not being blocked in YouTube videos in a project I am working on.

Sigh, no. I started a new job and the hours are crazy so I don't have much time to work on personal projects at the moment.