sourcegraph / scip-clang

Investigate performance improvements

varungandhi-src opened this issue

I did some rough benchmarking of scip-clang (based on Clang 16) performance, and dug through the code to see if there were any easy wins. It seems like the perf overhead compared to lsif-clang (based on Clang 11) is roughly 80%-90% on the larger runs (500 and 1000 TUs).

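|                       | 100 TUs | 500 TUs      | 1000 TUs       |
|-----------------------|---------|--------------|----------------|
| lsif-clang time       | 4.3     | 32.8         | 74.8           |
| scip-clang time       | 5.23    | 59.7 (+ 82%) | 144.33 (+ 93%) |
| lsif-clang index size | 80M     | 544M         | 1.2G           |
| scip-clang index size | 11M     | 80M          | 144M           |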
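
For reference, the overhead percentages can be recomputed from the timings in the table. This is a minimal sketch; the dictionary and function names are mine, and the times are copied as reported (the issue does not state the unit).

```python
# Wall-clock times copied from the table above (unit as reported in the issue).
lsif_clang = {100: 4.3, 500: 32.8, 1000: 74.8}
scip_clang = {100: 5.23, 500: 59.7, 1000: 144.33}


def overhead_percent(baseline: float, measured: float) -> float:
    """Relative slowdown of `measured` over `baseline`, in percent."""
    return (measured / baseline - 1.0) * 100.0


# Overhead of scip-clang over lsif-clang for each TU count.
overheads = {n: overhead_percent(lsif_clang[n], scip_clang[n]) for n in lsif_clang}

for n, pct in overheads.items():
    print(f"{n:>5} TUs: +{pct:.0f}%")
```

The 100 TU run comes out at about +22%, so the gap widens noticeably as the number of TUs grows.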

When comparing scip-clang to type-checking only with Clang 14 (the one I have on my system), it seems like the indexer adds about 30% overhead over type-checking. Ideally IMO this would be at most 10%-15%. However, it's still much smaller than the 80%-90% number.

[Figure: scip-clang overhead scaling vs TU size]

So somehow, lsif-clang is even faster than the baseline type-checking in Clang 14. Some potential hypotheses:

  • The clangd code in lsif-clang skips a bunch of declarations. We don't do any of that in scip-clang because we want to index everything. It might be possible to skip some things, but I'm not sure. I haven't looked at the code deeply, but if the traversal in lsif-clang is fused with type-checking (not unreasonable), then type-checking work itself may be getting skipped for a bunch of decls, which would explain the "negative overhead".
  • There is some serious type-checking perf regression in Clang itself between Clang 11 and Clang 14 which hasn't yet been fixed in Clang 16, explaining the missing 50%-60%. <- Seems implausible, but can't rule it out.
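
The "negative overhead" can be made concrete by combining the two measurements above. This is a rough sketch, assuming the ~30% figure (scip-clang vs type-checking with Clang 14) and the 82%/93% figures (scip-clang vs lsif-clang) are comparable even though they come from different setups; the variable names are mine.

```python
# Figures quoted in the issue text; both are rough measurements.
scip_vs_typecheck = 1.30     # scip-clang takes ~30% longer than type-checking only
scip_vs_lsif = (1.82, 1.93)  # scip-clang is +82% / +93% vs lsif-clang (500 / 1000 TUs)

# lsif / typecheck = (scip / typecheck) / (scip / lsif)
lsif_vs_typecheck = [scip_vs_typecheck / ratio for ratio in scip_vs_lsif]

for ratio in lsif_vs_typecheck:
    print(f"lsif-clang takes about {ratio:.2f}x the type-checking time")
```

Under these assumptions lsif-clang would need only around 0.67x-0.71x the time of plain type-checking, which is why one of the two hypotheses above (or something else entirely) has to account for the difference.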

I'm not going to spend more time on this right now, since there isn't any obvious low-hanging fruit. That said, I think that

  • Parallelizing index merging (#139) should help get a 5%-15% perf improvement, depending on the amount of code and core count.
  • We should get 10%-15% perf improvement using the techniques outlined in #27.
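
To illustrate why the gain from parallelizing index merging depends on the amount of code and core count, here is an Amdahl's-law style estimate. It is only a sketch: the fraction of runtime spent merging was not measured in this issue, so the inputs below are hypothetical, and perfect scaling of the merge phase is assumed.

```python
def runtime_reduction(merge_fraction: float, workers: int) -> float:
    """Fraction of total wall-clock time saved if only merging is parallelized.

    Amdahl-style estimate: the merge phase shrinks from `merge_fraction` to
    `merge_fraction / workers` (perfect scaling assumed); the rest is unchanged.
    """
    if not 0.0 <= merge_fraction <= 1.0:
        raise ValueError("merge_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return merge_fraction * (1.0 - 1.0 / workers)


# Hypothetical inputs, NOT measurements from this issue.
for merge_fraction in (0.10, 0.20):
    for workers in (2, 8):
        saved = runtime_reduction(merge_fraction, workers)
        print(f"merge share {merge_fraction:.0%}, {workers} workers: saves {saved:.1%}")
```

The saving can never exceed the merge phase's share of the total runtime, so the estimate is most useful once that share has actually been profiled.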