jsdom / jsdom

A JavaScript implementation of various web standards, for use with Node.js

JSDOM does not load <script> javascript code recursively

kelson42 opened this issue

If script1 creates a script2 tag, which in turn creates a script3 tag, then only script1 and script2 are listed in the DOM when using the following code:

var jsdom = require('jsdom');
// Fetch and execute external scripts referenced by the document.
jsdom.defaultDocumentFeatures = {
    FetchExternalResources   : ['script'],
    ProcessExternalResources : ['script'],
    MutationEvents           : '2.0',
};
// Single-quoted so the double quotes around the src attribute do not end the string.
var html = '<html><head><script src="script1.js"></script></head><body>test</body></html>';
var window = jsdom.jsdom( html ).createWindow();
window.addEventListener('load', function () {
    /* Only two script nodes in the DOM */
});

It looks like jsdom does not evaluate the code of script2, so it does not seem to load scripts recursively. This is a problem in my case. How can I fix that?

Can you show that this behavior is different in a browser? I am pretty sure asynchronously inserted script tags, e.g. ones inserted from inside another script tag, do not delay the window's load event. Note that they will still be inserted, but they will not delay load.

That sounds like a reasonable explanation to me, but then which event should be used, or how do I get the fully populated DOM?

I'm not sure; what event would you normally use in a browser?

To be clear, jsdom is happy to take bugs where its behavior demonstrably differs from that of a browser. So if you can create a test case to that effect, please let us know! But if there's no difference, it's not really our job to give you tech support on how the web platform works... StackOverflow might be a better place for that.

I'm also not sure whether this is an issue with jsdom, incompetence on my side (I don't know the event name), or simply a use case jsdom can't satisfy. I just know that at http://en.wikipedia.org there are 23 script nodes, most of them created dynamically. If you save the page statically with your browser, you can see them. Using jsdom's "load" event, only 11 of them are created, and it seems to me that no event is sent after all the JavaScript has executed, so I have no way to retrieve a DOM with all 23 nodes.

It seems to me this could be related to #380, because at http://en.wikipedia.org the JavaScript code is partly loaded dynamically using "ajax"-type requests. So, if XMLHttpRequest is "not working" in jsdom, then it is pretty clear why not all scripts are loaded...

It is true, dynamically inserted scripts load asynchronously. When the document fires the load event your dynamically injected script tags may not have finished executing but your hard coded scripts will have (by default). There are module loaders that use this method to shorten page load times. Basically, you'd write script tags in the page to start your loader and have it dynamically insert script tags so everything loaded asynchronously. The module loaders provide methods for defining dependencies, so if you absolutely have to wait for a certain script to load before beginning some dependent script, you can do that. The module loader will also provide some signal indicating when it is finished loading all of the scripts. So, unless you've rolled your own loader, you should be able to find documentation on what events fire and when. For your loader you'd just hook into whatever the "all scripts have fired" event is and proceed from there.
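To make the "all scripts have fired" idea concrete, here is a minimal sketch of how a loader could track pending scripts and signal when none remain. Everything in it (`createLoadTracker`, `begin`, `end`, the callback) is a hypothetical name for illustration, not part of jsdom or of any particular module loader; a real loader documents its own completion signal.

```javascript
// Hypothetical tracker, not a jsdom or module-loader API.
// A loader would call begin() when it inserts a script tag and
// end(name) when that script has finished executing.
function createLoadTracker(onAllLoaded) {
  let pending = 0;
  const loaded = [];
  return {
    begin() {
      pending += 1;
    },
    end(name) {
      loaded.push(name);
      pending -= 1;
      // Nothing is outstanding any more, so emit the completion signal.
      if (pending === 0) onAllLoaded(loaded.slice());
    },
  };
}

// Simulated chain: script1 inserts script2, which inserts script3.
const finished = [];
const tracker = createLoadTracker((names) => finished.push(...names));
tracker.begin();            // script1 requested
tracker.begin();            // script1 runs and requests script2
tracker.end('script1.js');  // script1 done, script2 still pending
tracker.begin();            // script2 runs and requests script3
tracker.end('script2.js');
tracker.end('script3.js');  // pending reaches zero, callback fires
```

The important detail is that each script registers the next one before reporting itself as done, so the count never drops to zero in the middle of the chain.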

There is no event defined to tell the browser when all dynamically generated script tags have loaded because there is no possible way for the browser to know how many script tags will be generated in the future.

Imagine you run a setInterval that generates a new script tag every 10 seconds. How is the browser expected to know that another script will be generated in ten seconds? When would it fire the "all dynamically generated and inserted scripts have executed" event? When some idle time came up between intervals? Never? After every dynamically inserted script?

The solution is either in the documentation of your module loader, or you will have to create your own signal letting you know that all the dynamic scripts have been loaded. Then, you'll wait for this signal before asking jsdom to do whatever it is that you want it to do after all dynamically injected scripts have been executed. The equivalent user action is navigating to a page, waiting for it to load completely, and then viewing the "live" source of the page with document.documentElement.innerHTML or tools available in the browser. If you want to go the quick and dirty route, you could defer your actions with a setTimeout that has some extremely long duration to ensure that a best effort has been made to execute the dynamically injected scripts before you take a snapshot or whatever.
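A rough sketch of the "create your own signal" approach, combined with the setTimeout fallback, might look like the following. The helper name `waitForSignal` and the hook name `allScriptsDone` are made up for illustration and are not jsdom APIs. The sketch assumes your page's last dynamically loaded script can call a function you placed on the window object.

```javascript
// Hypothetical helper, not part of jsdom.
// Installs a function on `target` (e.g. the jsdom window) that page code
// calls when it considers all dynamic scripts loaded. If nobody calls it
// within timeoutMs, the promise resolves anyway as a best effort.
function waitForSignal(target, name, timeoutMs) {
  return new Promise((resolve) => {
    // Fallback: stop waiting after timeoutMs and proceed anyway.
    const timer = setTimeout(() => resolve('timeout'), timeoutMs);
    // The page's own code is expected to call target[name]() when done.
    target[name] = () => {
      clearTimeout(timer);
      resolve('signal');
    };
  });
}

// Possible usage, assuming `window` is the jsdom window from the issue:
// waitForSignal(window, 'allScriptsDone', 10000).then(function (how) {
//   var scripts = window.document.getElementsByTagName('script');
//   console.log(how, scripts.length);
// });
```

The resolved value tells you whether the page actually signalled or the timeout expired, so a snapshot taken after a timeout can be treated as possibly incomplete.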

@domenic Maybe a wiki article about this would help? I understand not wanting to reiterate every tip and trick about dealing with the live dom and http protocols but this is something pretty basic that is often a cause of confusion for people first using a headless browser.

http://www.stevesouders.com/blog/2009/04/27/loading-scripts-without-blocking/
http://www.sitepoint.com/non-blocking-async-defer/

Hmm, I think that really nice treatise you just wrote, @matthewkastor, should be pretty helpful :). Yeah, maybe I'll close this, since it does seem nobody can demonstrate something different from a browser, and then open a new one to add a quick pointer in the readme of some sort.

Thanks.

Closed in favor of #675.