Latency issues when searching large files

Question

helmus opened this issue 6 years ago

Willem D'Haeseleer commented 6 years ago

Thank you so much for getting out 3.5, I really appreciate the work !
Is there an option to disable full file search ?

I'm asking because the latency is still multiple seconds when using a one-letter search in large files.

When entering multiple characters, the latency is reduced ( not gone ), but AceJump seems to always use at least 2 letters for the tag, even when there are only 1 or 2 tag matches, requiring seemingly unnecessary keystrokes.

When confirming the tag there is also considerable latency between keystrokes. In the above example you can see I'm confirming the JJ tag. Between the first J and the second J the UI freezes for about 100 to 200 ms before the cursor jumps. This is very noticeable and adds more latency to "jump time".

Given these two problems, searching with multiple characters is not a viable way to reduce latency / keystrokes.
It seems to me that both problems could be resolved if it were possible to disable searching the whole file and search only what's on screen.

I can only pay out the bounty if the performance regression introduced by 3.3.2 is fully alleviated.
Thank you again for working on this !

breandan · Answer 1 · Sun Mar 11 2018 07:32:27 GMT+0800 (China Standard Time)

As described in the Readme (perhaps we can make this clearer), you can switch to the new Word Mode (doing so will disable search entirely) by changing the default keyboard shortcut from Activate AceJump mode to Activate Word Mode or one of the Word Mode variants proposed by @svensp. We will continue to work on reducing search latency, but if you need instant tagging, I would suggest using Word Mode.

Willem D'Haeseleer · Answer 2 · Sun Mar 11 2018 07:51:37 GMT+0800 (China Standard Time)

@breandan thanks for the prompt response !
I tried word mode but it does not fit my use case.
When jumping to an exact place, my gaze is already focused on the exact letter I want to jump to and my mind is already editing there; I just need the cursor to follow. Any noise at that point is disturbing, so lighting up the whole screen with the new Word Mode is not helpful.

I will also frequently jump to the middle of the word, in which case word mode does not apply at all.

I'm adding an additional 15$ bounty ( 30$ total ) to have this issue resolved:

Both bounties ( 30$ total ) can be claimed if the latency for single character search in large files is back to the pre 3.3.2 performance.

breandan · Answer 3 · Sun Mar 11 2018 08:06:54 GMT+0800 (China Standard Time)

Thanks for your detailed feedback. If anyone feels inclined to submit a solution, AceJump gladly accepts Pull Requests! You can find our contributing guidelines here. Related: #207

Willem D'Haeseleer · Answer 4 · Sun Mar 11 2018 08:45:31 GMT+0800 (China Standard Time)

@breandan I like the gaze detection idea, but, if the goal of that feature is to reduce latency, wouldn't that be equally solved by just using the current visible text as a reference to where the user is looking ?
Gaze detection could be really useful to figure out what monitor / application window the user is looking at though. That would be a really cool application of it I think, but you will still have latency issues if the whole file is scanned instead of just the screen.

breandan · Answer 5 · Sun Mar 11 2018 09:22:56 GMT+0800 (China Standard Time)

#218 would probably not reduce search latency, although it could reduce user latency. By using gaze as an input method, we can improve how tags are assigned and predict where users might select a tag to provide shorter, or more reachable tags. It could help RSI or motor impaired users.

It also might be used as a novel interactive mode for caret movement. The trouble with using gaze for control directly is that it is very noisy. If implemented, we would need a soft version, or a mode where users explicitly activate the gaze detector (for example, press-and-hold).

Deleted user · Answer 6 · Wed Mar 21 2018 14:54:55 GMT+0800 (China Standard Time)

I'll see what I can do within the next few days

Carlo De Pieri · Answer 7 · Mon May 07 2018 02:14:48 GMT+0800 (China Standard Time)

Has any progress been made towards solving this particular issue?
Sadly, I personally found that Word Mode does not fit my workflow either.
Wouldn't it be possible to add an only-on-screen search mode beside the current full-text search mode?

breandan · Answer 8 · Sat May 26 2018 03:59:21 GMT+0800 (China Standard Time)

Has any progress been made towards solving this particular issue?

This one is still wide open AFAIK. @helmus Can you provide a specific latency that a solution needs to achieve in order to claim your bounty? Are you willing to accept 20ms? 30ms?

Thomas Nield · Answer 9 · Sat May 26 2018 22:52:43 GMT+0800 (China Standard Time)

Haven't dug into the source code in Solver.kt, but has concurrency been leveraged at all yet? Honestly, the way I would solve this type of problem is to bring in RxJava and then use its switchMap() operator to kill each previous search task and start a new one.

https://thomasnield.gitbooks.io/rxjavafx-guide/content/9.%20Switching,%20Throttling,%20and%20Buffering.html
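
A minimal sketch of the "kill each previous search task and start a new one" idea, using only the JDK rather than RxJava's switchMap() itself. The class LatestOnlySearch and its methods are hypothetical names for illustration and are not taken from AceJump's Solver.kt. Each new query takes a fresh ticket, and any older search notices it has been superseded and gives up early.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: only the most recently started search is allowed to finish.
final class LatestOnlySearch {
    private final AtomicLong generation = new AtomicLong();

    /** Starts a new search; every earlier ticket becomes stale. */
    long begin() {
        return generation.incrementAndGet();
    }

    /** True once a newer search has been started. */
    boolean isStale(long ticket) {
        return ticket != generation.get();
    }

    /**
     * Collects the offsets where query occurs in text.
     * Returns Optional.empty() if the search was superseded while running.
     */
    Optional<List<Integer>> search(long ticket, String text, String query) {
        List<Integer> hits = new ArrayList<>();
        if (query.isEmpty()) {
            return Optional.of(hits);
        }
        int from = 0;
        while (true) {
            if (isStale(ticket)) {
                return Optional.empty(); // a newer keystroke arrived, drop this work
            }
            int at = text.indexOf(query, from);
            if (at < 0) {
                return Optional.of(hits);
            }
            hits.add(at);
            from = at + 1;
        }
    }
}
```

In a real plugin the search would run on a background thread and begin() would be called from the keystroke handler; the staleness check inside the loop is what keeps an outdated scan of a large file from delaying the current one.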

breandan · Answer 10 · Sun May 27 2018 10:28:19 GMT+0800 (China Standard Time)

This is a great suggestion. We use a weird concurrency primitive to implement switchMap(), so it would be nice to remove this. If we integrated RxKotlin, we could paint tags more asynchronously and simplify the code path considerably. Opened #237. Thanks @thomasnield!

Willem D'Haeseleer · Answer 11 · Wed May 30 2018 00:32:37 GMT+0800 (China Standard Time)

@breandan It's a little difficult to measure exact performance, but generally this statement still stands:

Both bounties ( 30$ total ) can be claimed if the latency in a single character search in large files is back to the pre 3.3.2 performance.

Using the native file search as a benchmark is probably a good idea. It is instant regardless of file size, but again AceJump does not need to search the whole file, only what's on screen.

chylex · Answer 12 · Tue Nov 17 2020 19:07:44 GMT+0800 (China Standard Time)

I made a PR (#339) to address several performance issues and stuttering while scrolling.

Using the included latency unit test:

Before: 25.1s (avg 185ms)
After:   9.2s (avg  28ms)

Willem D'Haeseleer · Answer 13 · Fri Nov 20 2020 04:47:20 GMT+0800 (China Standard Time)

@chylex wonderful!
Did you test with larger files? I will absolutely pay out both bounties if it works with larger files ( per the bounties requirement ).
Large file = 1000 lines of lorem ipsum.

chylex · Answer 14 · Fri Nov 20 2020 04:56:47 GMT+0800 (China Standard Time)

Hopefully the PR gets pulled in soon so you can try it yourself. I'm not sure what exact format of lorem ipsum you're trying it on.

Willem D'Haeseleer · Answer 15 · Fri Nov 20 2020 05:11:24 GMT+0800 (China Standard Time)

you can use this generator:
https://www.lipsum.com/

Generate 100 paragraphs, then remove the empty lines with regex search and replace.

I'll point out that the version of AceJump before 3.3.2 would only search characters that were visible on the screen ( which makes sense since those are the only ones you can annotate).
The performance regression introduced by 3.3.2 is caused by searching all characters in the whole file, which of course becomes slower as the file grows.
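
A minimal sketch of the difference being described, assuming the editor can report the start and end offsets of the visible text. VisibleRangeSearch is a hypothetical name, not AceJump code. The scan cost depends on the size of the visible window rather than on the size of the file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: search only the offsets [visibleStart, visibleEnd).
final class VisibleRangeSearch {
    /** Returns every offset inside the visible range where query starts and fully fits. */
    static List<Integer> find(CharSequence text, String query, int visibleStart, int visibleEnd) {
        List<Integer> hits = new ArrayList<>();
        if (query.isEmpty()) {
            return hits;
        }
        // Clamp the range so a stale or oversized viewport cannot cause out-of-bounds reads.
        int start = Math.max(0, visibleStart);
        int end = Math.min(text.length(), visibleEnd);
        for (int i = start; i + query.length() <= end; i++) {
            if (matchesAt(text, query, i)) {
                hits.add(i);
            }
        }
        return hits;
    }

    private static boolean matchesAt(CharSequence text, String query, int at) {
        for (int j = 0; j < query.length(); j++) {
            if (text.charAt(at + j) != query.charAt(j)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, find("abcabcabc", "a", 3, 6) inspects three characters and reports a single hit at offset 3, while passing the whole document as the range reports all three occurrences.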

chylex · Answer 16 · Fri Nov 20 2020 05:25:15 GMT+0800 (China Standard Time)

Well, it's about 3-4 times faster, but if the gif you posted initially was 4 seconds then it's still going to take about 1 second to trigger, which is what I'm seeing with 100 paragraphs of lipsum. I'm trying to optimize it some more, but reverting it to only search the visible area would be the best solution. I suggested it in the PR but haven't tried it myself yet.

chylex · Answer 17 · Fri Nov 20 2020 07:01:16 GMT+0800 (China Standard Time)

PR has been merged. I did just happen to find some more optimizations with rendering. I'll continue and probably make another PR tomorrow, but in case the main dev wants to try this immediately: it's setting editor.document.isInBulkUpdate = true before text highlights are generated (and back to false afterwards), otherwise it re-renders the editor for every added highlight.
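
A minimal sketch of the batching pattern described above. FakeDocument is a hypothetical stand-in that only models the behaviour reported in this comment (one repaint per highlight outside a bulk update, one repaint when a bulk update ends). It is not the IntelliJ Platform document class, and the real API may behave differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an editor document; counts repaints for illustration only.
final class FakeDocument {
    private final List<Integer> highlights = new ArrayList<>();
    private boolean inBulkUpdate;
    private int repaints;

    void setInBulkUpdate(boolean value) {
        boolean leavingBulk = inBulkUpdate && !value;
        inBulkUpdate = value;
        if (leavingBulk) {
            repaints++; // a single repaint once the whole batch is in place
        }
    }

    void addHighlight(int offset) {
        highlights.add(offset);
        if (!inBulkUpdate) {
            repaints++; // outside a batch, every highlight triggers its own repaint
        }
    }

    int repaintCount() {
        return repaints;
    }

    /** Adds all highlights, optionally wrapped in a bulk update that is always reset. */
    static void addAll(FakeDocument doc, int[] offsets, boolean bulk) {
        if (bulk) {
            doc.setInBulkUpdate(true);
        }
        try {
            for (int offset : offsets) {
                doc.addHighlight(offset);
            }
        } finally {
            if (bulk) {
                doc.setInBulkUpdate(false); // reset even if adding a highlight throws
            }
        }
    }
}
```

With five highlights, the stand-in repaints five times without the flag and once with it. The try/finally is the part worth copying: leaving a document stuck in bulk mode after an exception would be worse than the original slowdown.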

Willem D'Haeseleer · Answer 18 · Fri Nov 20 2020 07:41:49 GMT+0800 (China Standard Time)

@chylex how much effort do you estimate it would take to search only the visible area ( lazy loading on scroll not needed )?
I'm willing to increase the bounty to something that is more worth your time.

chylex · Answer 19 · Sat Nov 21 2020 01:40:37 GMT+0800 (China Standard Time)

Assigning text by only considering the visible area has been suggested, although this would not be compatible with AceJump's current search functionality due to the tag assignment problem. In order to assign tags, we must ensure no tags could ever result in a collision, which is basically an ambiguous string elsewhere in the editor text. We do not currently test for tag collisions, and I did not check very carefully, but it looks like your PR does not break this assumption.
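
A minimal sketch of one way to read the collision constraint, under the assumption that a tag is ambiguous when its first character could also be the next character of a longer query somewhere in the text. TagCollision and its methods are hypothetical names and this is not AceJump's actual tag assignment code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: find tag start characters that cannot be mistaken for more query input.
final class TagCollision {
    /** Characters that directly follow some occurrence of query in text (lower-cased). */
    static Set<Character> charsFollowing(String text, String query) {
        Set<Character> next = new LinkedHashSet<>();
        if (query.isEmpty()) {
            return next;
        }
        int at = text.indexOf(query);
        while (at >= 0) {
            int after = at + query.length();
            if (after < text.length()) {
                next.add(Character.toLowerCase(text.charAt(after)));
            }
            at = text.indexOf(query, at + 1);
        }
        return next;
    }

    /** Alphabet characters that never continue the query, so they are safe to start a tag. */
    static Set<Character> safeTagStarts(String text, String query, String alphabet) {
        Set<Character> blocked = charsFollowing(text, query);
        Set<Character> safe = new LinkedHashSet<>();
        for (char c : alphabet.toCharArray()) {
            if (!blocked.contains(c)) {
                safe.add(c);
            }
        }
        return safe;
    }
}
```

For the text "cat car cab" and the query "ca", the characters t, r and b are blocked because typing any of them could extend the query. If charsFollowing only saw the visible part of the text, a continuation that exists off screen would go unnoticed, which appears to be the incompatibility with whole-file search described above.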

@breandan Would you accept a PR that adds an option to only consider the visible area for both marking and typing, and attempting to scroll cancels AceJump? I think the use case where a user triggers AceJump and starts scrolling would be pretty rare. If I already have my hand on my mouse to scroll, I can click where I want to go much faster.

chylex · Answer 20 · Sat Nov 21 2020 04:51:59 GMT+0800 (China Standard Time)

I just noticed, now that the optimizations are in, the bottleneck is the skimming feature used in long documents which increases delay by 400 ms. I'd recommend either removing it completely, or changing the limit to maybe 100 kB because anything below that is now running at acceptable speeds.

breandan · Answer 21 · Sat Nov 21 2020 10:39:18 GMT+0800 (China Standard Time)

Would you accept a PR that adds an option to only consider the visible area for both marking and typing, and attempting to scroll cancels AceJump?

Sure, I will accept that PR as long as it does not require adding too much code. This is partly implemented, although it could use a settings page or action binding. Since there is always a default maximum search length, maybe the simplest solution is to expose that buffer length as a configurable option or default to "screen mode" if the query takes longer than <configurable> ms to complete. If you can come up with a lazy loading solution that meets the latency requirements and does not violate the tag assignment problem, I will match @helmus' bounty, up to $100 USD. Or, if you have another idea, feel free to suggest.

edit: To clear up a possible misunderstanding, AceJump also modifies the scrollbar to bring the search results into view if they are not on screen. So we listen for updates to the scrollbar to settle to paint the tags. This could probably be handled in a synchronous way, but to simplify things, we listen for all changes (either manual or AceJump triggered) with the same listener.

chylex · Answer 22 · Sun Nov 22 2020 03:02:53 GMT+0800 (China Standard Time)

To clear up a possible misunderstanding, AceJump also modifies the scrollbar to bring the search results into view if they are not on screen.

Personally I find that very confusing, especially when I mistype the tag and it instantly scrolls me to a place I have no context for. I think I'll just add the option to search only what's visible and completely ignore the rest. I did notice, though, that if the lines are very long (such as in the lorem ipsum example), AceJump also generates tags for text that is past the right edge of the editor view, because it only considers the top and bottom offsets. I was looking for a way to fix that, but maybe it's an edge case not worth looking into.

chylex · Answer 23 · Sun Nov 22 2020 13:21:56 GMT+0800 (China Standard Time)

I have everything implemented, will you be able to review #343 first? All the features branch off of that so I want to avoid merge conflicts when I add options.

Willem D'Haeseleer · Answer 24 · Wed Nov 25 2020 01:08:19 GMT+0800 (China Standard Time)

@chylex Thank you so much Chylex, I will verify the changes asap and award the bounty to you if the latency has been resolved 🙇

chylex · Answer 25 · Wed Nov 25 2020 02:08:13 GMT+0800 (China Standard Time)

If there are still any latency issues, I figured out how to deal with extremely long lines, but that is part of a major refactoring so it won't be ready for a while.

Willem D'Haeseleer · Answer 26 · Wed Nov 25 2020 05:03:33 GMT+0800 (China Standard Time)

@chylex I tested the canary build, works perfectly, very smooth, thank you so much. Please claim both bounties here and I will unlock the funds:
https://www.bountysource.com/issues/56061152-latency-issues-when-searching-large-files
https://www.bountysource.com/issues/46543096-acejump-lagging-on-large-files

chylex · Answer 27 · Wed Nov 25 2020 10:28:40 GMT+0800 (China Standard Time)

Claimed, thanks!