constant state of 'Indexing'

Question

andyczerwonka opened this issue 5 months ago

Describe the bug

After the latest upgrade last night, my project is now in a constant state of indexing, with one of my cores pinned at 100%. The new version is unusable to me. I've attached the index error report.

reports.zip

Expected behavior

Indexing completes.

Operating system

Linux

Editor/Extension

VS Code

Version of Metals

v1.3.1

Extra context or search terms

I'm also noticing that many of my projects now report SemanticDB errors, but I'm assuming that's because the indexing is not completing.

Tomasz Godzik · Answer 1 · Sat May 18 2024 00:27:51 GMT+0800 (China Standard Time)

Thanks for reporting! Any chance to get a stack trace of the case where indexing hangs? Would probably make it much easier to figure out

Andy Czerwonka · Answer 2 · Sat May 18 2024 00:37:27 GMT+0800 (China Standard Time)

> Thanks for reporting! Any chance to get a stack trace of the case where indexing hangs? Would probably make it much easier to figure out

If you pull in the latest spark-core into any example, you will see that it hangs on the hadoop client dependency. I just used SizeEstimator.estimate from spark-core in the example to pull it in. You should see it fail for you.

Tomasz Godzik · Answer 3 · Sat May 18 2024 05:26:55 GMT+0800 (China Standard Time)

Ok, this was interesting. Our issue was caused by hadoop releasing broken sources with:
public static final String REFERRER_ORIGIN_HOST = "audit.example.org.apache.hadoop.shaded.org.;
And we couldn't find the end quote.

We assumed that released sources actually compile, which is a reasonable assumption, I thought. Added a workaround for that possibility.
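For readers following along, here is a minimal sketch of the failure mode described above: a scanner looking for the closing quote of a string literal, with a guard that gives up at the end of the line or the end of the input. This is not the Metals workaround itself. The class name `StringLiteralScanner` and the method `scanStringLiteral` are hypothetical and exist only for illustration.

```java
public class StringLiteralScanner {
    /**
     * Returns the index just past the closing quote of the string literal
     * that starts at `start`, or -1 if the literal is not closed before the
     * end of the line or the end of the input.
     * Hypothetical helper for illustration; not taken from Metals.
     */
    static int scanStringLiteral(String src, int start) {
        if (start < 0 || start >= src.length() || src.charAt(start) != '"') {
            throw new IllegalArgumentException("expected opening quote at " + start);
        }
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\') {
                i += 2; // skip the backslash and the escaped character
            } else if (c == '"') {
                return i + 1; // found the closing quote
            } else if (c == '\n' || c == '\r') {
                return -1; // plain (non text block) literals cannot span lines
            } else {
                i++;
            }
        }
        return -1; // reached end of input without a closing quote
    }

    public static void main(String[] args) {
        // The broken line quoted in this thread, with its missing end quote.
        String broken =
            "String REFERRER_ORIGIN_HOST = \"audit.example.org.apache.hadoop.shaded.org.;\n";
        String ok = "String s = \"fine\";\n";
        System.out.println(scanStringLiteral(broken, broken.indexOf('"')));
        System.out.println(scanStringLiteral(ok, ok.indexOf('"')));
    }
}
```

The point of the guard is that the scanner always terminates and reports the unterminated literal, so the caller can skip the file instead of assuming the source compiles.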

Andy Czerwonka · Answer 4 · Sat May 18 2024 05:54:37 GMT+0800 (China Standard Time)

> We assumed that sources released actually compile, which is a reasonable assumption

💯 There's no way sources should be broken. I'll report that to the project.

Andy Czerwonka · Answer 5 · Sat May 18 2024 22:19:11 GMT+0800 (China Standard Time)

Turns out that I was able to get rid of that dependency in my project, so there is no rush to release the fix, at least for me. I think it's a pretty rare occurrence. That said, it did come in via Spark, which I don't think is that rare.

Andy Czerwonka · Answer 6 · Tue May 21 2024 03:40:11 GMT+0800 (China Standard Time)

> I'll report that to the project.

I tried. They're making it too hard to do so.

Tomasz Godzik · Answer 7 · Tue May 21 2024 03:42:46 GMT+0800 (China Standard Time)

No worries, I will just merge the workaround, and hopefully this should not happen with the next release. Though I think it's worth investigating on their side why this happened. I am pretty sure we are not the only tooling that might use sources.

Jean-Luc Deprez · Answer 8 · Wed May 29 2024 16:48:56 GMT+0800 (China Standard Time)

Any chance that it's a little more complicated?

I observe indexing takes long for a fairly simple project: e.g. time: indexed workspace in 3m16s where it didn't before.

~~When looking at the library list I also see a couple very odd things, like the entire compilation stack is added to the library collection, while I don't see these pop up in e.g. the bloop files.~~

I seem to only observe this in a mono-module project, not in a multi module project.

(Note GraalVM JDK 21, Metals version: 1.3.1)

UPDATE: the zinc stuff could be there because my project dirs also generate .bloop folders.

Jean-Luc Deprez · Answer 9 · Wed May 29 2024 17:13:21 GMT+0800 (China Standard Time)

Still something is off:

Metals version: 1.2.2
time: indexed workspace in 29s

Metals version: 1.3.1
time: indexed workspace in 2m15s

For the same bloop files

Tomasz Godzik · Answer 10 · Wed May 29 2024 17:28:31 GMT+0800 (China Standard Time)

We started indexing Java jars, but that should not cause that much of an increase 🤔

Any chance to get this as a repro or at least a basic build.sbt with the deps? The issue you are experiencing is for sure something different.

Jean-Luc Deprez · Answer 11 · Wed May 29 2024 20:04:32 GMT+0800 (China Standard Time)

Sadly the dep tree starts with an internal library suite. So not easy to make that available, but I'll have a shot looking at the difference in file accesses between both versions using procmon.

But likely not before tomorrow.

Tomasz Godzik · Answer 12 · Wed May 29 2024 20:32:20 GMT+0800 (China Standard Time)

Is the project using a lot of Java deps? Could that explain a difference in indexing? Also this should only be at the start

Jean-Luc Deprez · Answer 13 · Tue Jun 11 2024 21:00:51 GMT+0800 (China Standard Time)

How do you define "Java deps" here? I would guess they're all jars.

But yes, it looks to be in that general area.

I monitored both Metals 1.2.2 and 1.3.1, specifically looking for file events in my Ivy and Maven repos.

| Version | File events |
| --- | --- |
| 1.2.2 | 38k |
| 1.3.1 | 1.5m |

What's more than a bit suspicious to me is that (starting 14:26:30) over a period of 15 seconds it seems to be opening and closing the same file 3k times, each time navigating the file tree in the process.

This is just one example; it seems to be a recurring phenomenon with (all) other deps. Looks like something is unnecessarily chatty with these files?

FYI post indexing it shows 180 deps, but that's including the SBT deps (from the other "issue").

Jean-Luc Deprez · Answer 14 · Tue Jun 11 2024 21:01:49 GMT+0800 (China Standard Time)

Sorry for the delayed response btw.

Tomasz Godzik · Answer 15 · Tue Jun 11 2024 21:27:08 GMT+0800 (China Standard Time)

No worries. We started to index Java dependencies to allow searching them, which is why I asked about it. We might have a bug there.

Jean-Luc Deprez · Answer 16 · Tue Jun 11 2024 21:50:26 GMT+0800 (China Standard Time)

While I can't make up my mind from browsing the sources of the different meta projects (on GH) and I know I'm freewheeling here, but...

It looks like somehow it's re-opening that jar file (multiple times even) for each source file it contains (in the case of this guava jar, 636 files).
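To make the hypothesis concrete, here is a minimal sketch of the alternative access pattern: open the archive once and read every source entry through that single handle, rather than re-opening the jar per entry. It uses only `java.util.zip` from the JDK and does not reflect how Metals actually reads jars. The class `JarReadOnce` and its methods are hypothetical, and the sample jar is generated test data sized to match the 636 files mentioned above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class JarReadOnce {
    /** Writes a small archive with `count` fake source entries (test data only). */
    static Path writeSampleJar(int count) throws IOException {
        Path jar = Files.createTempFile("sample-sources", ".jar");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(jar))) {
            for (int i = 0; i < count; i++) {
                out.putNextEntry(new ZipEntry("pkg/File" + i + ".java"));
                out.write(("class File" + i + " {}").getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return jar;
    }

    /** Opens the archive once and reads every .java entry through that handle. */
    static int readAllSources(Path jar) throws IOException {
        int read = 0;
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory() || !entry.getName().endsWith(".java")) {
                    continue;
                }
                try (InputStream in = zip.getInputStream(entry)) {
                    in.readAllBytes(); // a real indexer would parse these bytes
                    read++;
                }
            }
        }
        return read;
    }

    public static void main(String[] args) throws IOException {
        Path jar = writeSampleJar(636);
        try {
            System.out.println("entries read: " + readAllSources(jar));
        } finally {
            Files.delete(jar);
        }
    }
}
```

With this pattern a file monitor should see roughly one open and one close per jar, independent of how many source files it contains. If the opens scale with the entry count instead, that would be consistent with the per-file re-opening suspected above.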

Jean-Luc Deprez · Answer 17 · Thu Jun 13 2024 19:27:32 GMT+0800 (China Standard Time)

Should I log a new ticket?

Tomasz Godzik · Answer 18 · Thu Jun 13 2024 19:48:39 GMT+0800 (China Standard Time)

Sure! Makes sense. I haven't had the chance to look into it yet.