AngleSharp / AngleSharp.Js

:angel: Extends AngleSharp with a .NET-based JavaScript engine.

Home Page:https://anglesharp.github.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Loading jQuery 2.2.4 causes deadlock

doominator42 opened this issue Β· comments

Bug Report

Prerequisites

  • Can you reproduce the problem in a MWE?
  • Are you running the latest version of AngleSharp? AngleSharp v0.13.0
  • Did you check the FAQs to see if that helps you?
  • Are you reporting to the correct repository? (there are multiple AngleSharp libraries, e.g., AngleSharp.Css for CSS support)
  • Did you perform a search in the issues?

Description

As the title says, loading jQuery 2.2.4 causes a deadlock. I'm using the following source:
https://cdnjs.cloudflare.com/ajax/libs/jquery/2.2.4/jquery.js

Steps to Reproduce

var config = Configuration.Default
	.WithRequester(new HttpClientRequester())
	.WithDefaultLoader(new LoaderOptions()
	{
		IsResourceLoadingEnabled = true
	})
	.WithJs()
	.WithEventLoop());

var document = await BrowsingContext.New(config).OpenAsync(r => r.Content(@"
<!DOCTYPE html>
<html>
    <head>
        <script src=""https://cdnjs.cloudflare.com/ajax/libs/jquery/2.2.4/jquery.js""></script>
    </head>
    <body>
        The body
    </body>
</html>"));

After investigation, it turns out that the issue comes from setting innerHTML while the document is loading. Here is a simple repro:

var config = Configuration.Default
	.WithRequester(new HttpClientRequester())
	.WithDefaultLoader(new LoaderOptions()
	{
		IsResourceLoadingEnabled = true
	})
	.WithJs()
	.WithEventLoop());

var document = await BrowsingContext.New(config).OpenAsync(r => r.Content(@"
<!DOCTYPE html>
<html>
    <head>
        <script>
            var div = document.createElement('div');
            document.documentElement.appendChild(div).innerHTML = '<span></span>';
        </script>
    </head>
    <body>
        The body
    </body>
</html>"));

Expected behavior: OpenAsync should return the document with the script loaded

Actual behavior: OpenAsync is blocked and does not finish ever

Environment details: .Net Core 3.1

Possible Solution

So, after a bunch of debugging, I found the deadlock happens in AngleSharp.Html.Parser.Dom.HtmlDomBuilder.cs:L141.

After parsing the last html part of the fragment, the tokenizer returns an EndOfFile token and is passed through the following methods:

The End method then sets _waiting = _document.FinishLoadingAsync(); which is causing the deadlock. So, I think the End method should not be called when a fragment is being parsed, but I'm not quite sure where exactly to put the condition. Also, I don't know if there are other cases appart of document fragments which could cause this issue.

Here's what I changed and now jQuery loads correctly.
HtmlDomBuilder.cs:L1511

case HtmlTokenType.EndOfFile:
{
+	if (IsFragmentCase)
+		return;

	CheckBodyOnClosing(token);

	if (_templateModes.Count != 0)
	{
		InTemplate(token);
	}
	else
	{
		End();
	}

	return;
}

Is my thinking correct? Also the changes are to be done in the AngleSharp repo, but I'm posting here because I tought the issue is more related to scripting. Should I open an issue in the AngleSharp repo as well or just a PR is enough?

Awesome project btw πŸ‘

Awesome investigation πŸš€ !

I think indeed that something like that fix would be working, but I think we may rather need to look at the FinishLoadingAsync. In the fragment case there shouldn't be anything open so something is definitely fishy there.

I also fully agree that this is rather an AngleSharp issue, but I appreciate that you posted it here - since scripting is the only victim here.

I will try to spend some time tomorrow thinking about the potential root cause / a potential solution and I would post it here. I would then love to get your input or assertion / evaluation for this. Let's to fix this and make a PR for AngleSharp 0.14! 🍻

Thanks for your input!

You are absolutly right about the FinishLoadingAsync. At first, I tought it was using the same instance of HtmlDomBuilder for the fragment and the main document, so that the End call was closing the main document.

I looked further into FinishLoadingAsync. What happens is that it's queueing the DOMContentLoaded event and returning a task which would not be completed until the event is invoked on the event loop. The event then never fires because it's waiting for the current event to finish.
Here is where the event is queued
What causes FinishLoadingAsync to wait for the event to complete

I'm not sure how to fix this one. Maybe the event could be fired synchronously (direct call, without using the event loop)? I would be curious to see how actual browsers implemented this.

I'm not sure how to fix this one. Maybe the event could be fired synchronously (direct call, without using the event loop)? I would be curious to see how actual browsers implemented this.

Same here. I thought about over it. I tend to actually use the early escape of the end of file case as shown in the OP.

Haven't reached a conclusion yet (not sure about any side effects).

(not sure about any side effects).

I think that FinishLoadingAsync still needs to be called to wait for resources (attached references) to load.

I had some spare time and looked at chromium source and they fire the event synchronously, see source. Ironically, there is a comment right above that says it should be asynchronous but the linked bug ticket has been closed with status WontFix.

I then looked at firefox source to compare and they fire the event asynchronously. Also, I'm not 100% sure (c++ is not really my thing), but I think it is not awaited.

So I can see 3 possible solutions:

  • Fire the event synchronously
  • Fire the event asynchronously without await
  • Don't fire the event in case of a fragment

IMO, the first would be the best.

What do you think @FlorianRappl ?

Yeah I think it is either

  • Fire the event synchronously, or
  • Fire the event asynchronously without await.

Let's go with the synchronous one. That way we delay / block with the resources, but we don't run into a (hidden) deadlock. I think it should be best.

Should I make the PR in AngleSharp or do you want to make it (I'll approve)? It would go into the 0.14 pre version, which we could then test and (if it works) I would make sure 0.14 is released ~start of next week.

πŸ‘ I'll submit the PR in the next days.

I saw there's already some tests for jQuery, but they are not using the event loop. That causes the events to be fired synchronously, exactly as my changes does, so that's why it was passing.
I ran the tests with .WithEventLoop() and it passes when applying my changes.

Should I send another PR for the tests? Because the changes are in the other repo and tests would be in this repo, not sure if it would break things.

No its fine - send the PR for this repo, too. Since this is a AngleSharp.Js 0.14 pre-release you can reference the AngleSharp 0.14 pre-release, too.

Thanks - much appreciated! 🍻

Available in develop.