w3c / event-timing

A proposal for an Event Timing specification.

Home Page:https://w3c.github.io/event-timing/

First input should ignore events before First Contentful Paint?

patrickhulce opened this issue · comments

I wrote this issue up before testing it myself. Once I tested it myself (test page), I was never able to get the first-input performance event type to fire by clicking before FCP. If someone could confirm that FID ignores input prior to FCP (and maybe add that to the text in the README?) then no need to read the rest :)

If you've already had a discussion regarding this I missed, would also love to read the history here!
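For readers following along, here is a minimal sketch of how FID is derived from a `first-input` entry: the delay is `processingStart - startTime`. The `FirstInputLike` interface and the sample numbers are assumptions made up for illustration; only the two field names mirror the real entry.

```typescript
// Minimal shape of a `first-input` entry; only the two fields used here.
interface FirstInputLike {
  startTime: number;       // when the input occurred (ms since navigation start)
  processingStart: number; // when event handlers could begin running (ms)
}

// FID is the gap between the input and the start of its processing.
function firstInputDelay(entry: FirstInputLike): number {
  return entry.processingStart - entry.startTime;
}

// Hypothetical sample: a keypress at 120 ms whose processing started at 870 ms.
const sampleEntry: FirstInputLike = { startTime: 120, processingStart: 870 };
const sampleDelay = firstInputDelay(sampleEntry);
```

In a browser the entry would typically come from a `PerformanceObserver` observing `{ type: 'first-input', buffered: true }`; that wiring is left out here so the sketch runs anywhere.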

Read this if input before FCP counts

tl;dr - there's nothing for the page to react to before it's even painted anything, so the developer shouldn't focus on those delays via FID.

This was prompted by a request for Lighthouse to remove our lower bound on FCP for "Max Potential FID" metric (GoogleChrome/lighthouse#10760) citing that FID in the wild doesn't care about FCP. However, we feel that considering the page's visual state is important information that should guide the tracking of FID. I was even under the impression that this was already the case! (and lol, maybe it is? see testing note above)

I assert that the vast majority of input events prior to FCP will fall into one of two buckets:

  1. It occurs before any meaningful handlers have been registered resulting in very low FID values.
  2. It occurs during the setup of the page in JS resulting in very high FID values.

In either case, the FID value reported is not capturing how responsive the page was to input it is attempting to handle. Instead, it is just capturing whether the input happened to occur during the JS-phase of page setup or not. There is certainly some useful signal in that, but none that is not better captured by measuring FCP directly.
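The two buckets can be illustrated with a toy model, assuming a single stretch of blocking setup work on the main thread. `BusyWindow`, `modeledDelay`, and all the timings are hypothetical; real pages have many tasks and this is only a sketch of the reasoning, not how any browser computes FID.

```typescript
// Hypothetical model: the main thread is blocked during [busyStart, busyEnd).
interface BusyWindow {
  busyStart: number; // ms
  busyEnd: number;   // ms
}

// Delay an input at `inputTime` would see under this model: zero if the
// thread is idle, otherwise the time left until the blocking work finishes.
function modeledDelay(inputTime: number, w: BusyWindow): number {
  const blocked = inputTime >= w.busyStart && inputTime < w.busyEnd;
  return blocked ? w.busyEnd - inputTime : 0;
}

// Made-up page: JS setup blocks the main thread from 400 ms to 1500 ms.
const setupWork: BusyWindow = { busyStart: 400, busyEnd: 1500 };
const beforeSetup = modeledDelay(100, setupWork); // bucket 1: nothing running yet
const duringSetup = modeledDelay(600, setupWork); // bucket 2: input lands mid-setup
```

Under these assumptions the reported value depends only on where the input falls relative to the setup window, which is the point being made above.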

To take it to an extreme, we definitely shouldn't care about the input delay while the root document request is in flight, as it obviously has no benefit to measuring the interactivity of the page's frontend. -- this is when I went to test to make sure that was true and found I couldn't repro --

Interesting idea, as always :)

The answer is: we currently do count inputs occurring before FCP. I tried clicking on your test page before FCP and didn't get FID. I think Chrome currently ignores clicks that occur before the first paint, hence no FID is produced in this case simply because no input is passed to the renderer process. I do get FID from using my keyboard before FCP, and it results in a high FID value.

Regarding whether this is desirable or not, I'm not sure. I do hope that this is an edge case, i.e. first input should usually occur after FCP (I should look into data to verify this claim heh). But if a user does interact with a page when it's blank, then presumably they're expecting something to happen. Discarding inputs before FCP seems fine, but this also points to the general problem: we don't even know if the first input is a meaningful input for the user.

In case 1. (no handlers), I'll point out that this is a currently unsolved problem but not restricted to cases before FCP: a page could be all pretty and laid out yet not yet have handlers, and FID will not properly capture this problem. And in case 2., the fact that FID occurs means that the input is actually 'handled' (even if it ends up doing nothing), so technically having a high FID for this case is reasonable?

That said, I do agree with the concern here. More thought required :) Let me know if you are also able to reproduce with keyboard strokes!

I think Chrome currently ignores clicks that occur before the first paint
Let me know if you are also able to reproduce with keyboard strokes!

Ah thanks for the clarification @npm1 that certainly explains the behavior I was seeing. I can reproduce exactly what you're talking about w.r.t keypresses before FP 👍

first input should usually occur after FCP (I should look into data

I would be very curious how prevalent this problem is, do share if you end up collecting 😃 My hunch is that nearly all of this is not meaningful input and is hopefully a small percentage.

I'll point out that this is a currently unsolved problem but not restricted to cases before FCP: a page could be all pretty and laid out yet not yet have handlers, and FID will not properly capture this problem

Very true. I'm not attempting to solve all instances of this problem as we know from experience with TTI that figuring this out is incredibly difficult. Just trying to suggest that if the page hasn't reached FCP yet, that's an extremely high quality signal that the page isn't ready to handle input yet.
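As a rough sketch of what the suggestion would mean in practice, the first input could be chosen as the earliest one at or after FCP. `TimedInput`, `firstInputAfterFcp`, and the sample timings are hypothetical names and values for illustration; this is not how Chrome or the spec currently behaves.

```typescript
// Minimal shape of an input timing record (field names mirror `first-input`).
interface TimedInput {
  startTime: number;       // ms
  processingStart: number; // ms
}

// Returns the earliest input whose startTime is at or after FCP,
// or undefined if every input happened before FCP.
function firstInputAfterFcp(inputs: TimedInput[], fcpTime: number): TimedInput | undefined {
  const sorted = inputs.slice().sort((a, b) => a.startTime - b.startTime);
  for (let i = 0; i < sorted.length; i++) {
    if (sorted[i].startTime >= fcpTime) return sorted[i];
  }
  return undefined;
}

// Made-up session: a keypress at 800 ms (before an FCP at 1200 ms) and a
// click at 2100 ms (after it).
const observedInputs: TimedInput[] = [
  { startTime: 800, processingStart: 1400 },
  { startTime: 2100, processingStart: 2116 },
];
const keptInput = firstInputAfterFcp(observedInputs, 1200);
const keptDelay = keptInput ? keptInput.processingStart - keptInput.startTime : undefined;
```

With the pre-FCP keypress skipped, the reported delay in this made-up session comes from the later click rather than from the keypress that was queued behind page setup.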

the fact that FID occurs means that the input is actually 'handled' (even if it ends up doing nothing), so technically having a high FID for this case is reasonable?

Oh yes it's undoubtedly capturing the correct value in terms of the delay to input processing. No issue there 👍 Just that in this case all it's really doing is capturing a portion of FCP depending on their unprompted input time instead of the unique interaction quality we're trying to measure with FID. My concern is that it muddies the waters of trying to interpret FID data with values that weren't really delays in processing meaningful input.

I took a look at recent data we have on Chrome. On Android, the first input occurs before FCP in less than 1% of cases, and on Windows in less than 0.6% of cases. In addition, I looked into how the percentiles would be affected if we excluded inputs that occur before FCP. Based on the data, the percentiles would remain fairly similar, each either staying equal or decreasing. On Windows, only percentiles above the 90th see a difference greater than 1ms, and it's always less than 5% in those cases. On Android, we see differences greater than 1ms starting from around the 80th percentile, but the deltas are always less than 6%. So overall my conclusion here is that the data is not very different with or without inputs before FCP.
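For anyone wanting to run a similar comparison on their own field data, a minimal sketch follows. It uses a nearest-rank percentile, which is an assumption on my part and not necessarily the method used for the Chrome numbers; `DelaySample`, `percentileShift`, and the ten sample values are entirely made up.

```typescript
// One FID sample plus whether the input happened before FCP.
interface DelaySample {
  delay: number;      // ms
  beforeFcp: boolean;
}

// Nearest-rank percentile of a non-empty list; p in (0, 100].
function percentile(values: number[], p: number): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Compares a percentile with and without the pre-FCP samples.
function percentileShift(samples: DelaySample[], p: number) {
  const all = samples.map((s) => s.delay);
  const filtered = samples.filter((s) => !s.beforeFcp).map((s) => s.delay);
  const withAll = percentile(all, p);
  const withoutPreFcp = percentile(filtered, p);
  return { withAll, withoutPreFcp, relativeDrop: (withAll - withoutPreFcp) / withAll };
}

// Made-up data: nine ordinary samples and one slow pre-FCP keypress.
const fieldSamples: DelaySample[] = [4, 6, 8, 10, 12, 16, 20, 40, 90]
  .map((delay) => ({ delay, beforeFcp: false }))
  .concat([{ delay: 900, beforeFcp: true }]);

const medianShift = percentileShift(fieldSamples, 50);
const tailShift = percentileShift(fieldSamples, 95);
```

With only ten samples the tail moves far more than it would in a real population, so the toy numbers only show the direction of the effect: removing pre-FCP inputs can leave a percentile equal or lower, never higher.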

Our team also thought a bit about this and there are several downsides to excluding inputs occurring before FCP:

  • Added complexity in the metric. This also makes it harder to reason about the FID values, because FCP is a somewhat arbitrary threshold (flashback to TTI? :)).
  • There's not super clear separation between reasons to interact before FCP vs after (i.e. in both cases the interaction may be caused by frustration).

Based on the analysis of data plus the downsides, what do you think?

I concede that FCP might be an arbitrary point reminiscent of TTI, but I think excluding keypresses before any paint would at the very least be consistent with the current position of click being excluded. I imagine the reasons for originally excluding click from the renderer are very similar to my concerns here. It couldn't possibly be meaningful input because the user has no idea what they just did, nothing is there yet!

There's not super clear separation between reasons to interact before FCP vs after (i.e. in both cases the interaction may be caused by frustration).

While certainly true, I don't understand this one as something that should be encouraged. As I mentioned, slowness to reach the first paint is very well tracked by other factors. There's a clear opportunity to enhance the distinctness and clarity of the metric's meaning in ways that can simplify the interpretation.

Added complexity in the metric.

I sympathize with this concern if the cost to benefit ratio is low. If the consensus is that we would want to do this but it's too much work to change for little benefit, that's enough for me I suppose. I think that justifies the case I'm defending in the lab.

I concede that FCP might be an arbitrary point reminiscent of TTI, but I think excluding keypresses before any paint would at the very least be consistent with the current position of click being excluded. I imagine the reasons for originally excluding click from the renderer are very similar to my concerns here. It couldn't possibly be meaningful input because the user has no idea what they just did, nothing is there yet!

I think the reason for excluding clicks is that their result is highly dependent on where you click, so processing them may result in unexpected behavior for a user since they can't see what they're clicking. That's not really the case for keypresses. I could press keydown and that just means I want to see further down the page. So I think the reasons are not necessarily the same, but I wouldn't be very surprised if we also exclude those in the future.

There's not super clear separation between reasons to interact before FCP vs after (i.e. in both cases the interaction may be caused by frustration).

While certainly true, I don't understand this one as something that should be encouraged. As I mentioned, slowness to reach the first paint is very well tracked by other factors. There's a clear opportunity to enhance the distinctness and clarity of the metric's meaning in ways that can simplify the interpretation.

I don't know if I agree that the proposed change would introduce clarity. If we were to exclude inputs before first paint, why wouldn't we also want to exclude clicks that occur on a location without handlers and on a background-color pixel? And since including the early inputs results in higher delays in aggregate, it seems to correctly capture poor user experience. Sure, that makes it overlap with FCP, but only slightly as these cases are rare.

Added complexity in the metric.

I sympathize with this concern if the cost to benefit ratio is low. If the consensus is that we would want to do this but it's too much work to change for little benefit, that's enough for me I suppose. I think that justifies the case I'm defending in the lab.

Yea, there's a tradeoff here with gains vs simplicity and based on the data I think I'd rather keep it simple. Didn't quite get that last sentence. But anyways, thanks for the suggestion, and I hope that the explanation made sense :)

Closing for now