smallrye / jandex

Java Annotation Indexer

Index classes used in the constant pool

FroMage opened this issue

Issue #50 was flawed because we didn't want to index all method calls. I thought that clients could specify which calls to flag during indexing, but this doesn't work with packaged indexes.

Now, in Quarkus we have a double step to run transformers conditionally:

  • We have an APT plugin in some modules that adds a marker file to META-INF
  • If the marker is present, we register transformers for all class files of that jar, but only if their constant pool references class constants from a list Foo,Bar,Gee

This allows us to only transform classes that use a certain special feature.

Now, in quarkusio/quarkus#814 I want to add a transformer based on the usage of a Router class, and could take the same road, but it would still require scanning the constant pool of every class. I would either create a new extension just for this, or add it to the existing resteasy extension, but then every user would pay the cost of the ConstPool scanning.

And in quarkusio/quarkus#10929 I definitely don't want to create an extension for a single class, which should be provided by Quarkus Core. And I don't want to add it to Quarkus Core if it means scanning the ConstPool of every application class even if nobody uses this class. I can't use the marker file in this case.

Therefore, I think we can solve all issues if Jandex indexes CONSTANT_Class (7) entries. Ultimately, all field accesses and method calls go through Fieldref/Methodref/InterfaceMethodref entries (https://docs.oracle.com/javase/specs/jvms/se14/html/jvms-4.html#jvms-4.4.2), which in turn reference CONSTANT_Class_info entries (https://docs.oracle.com/javase/specs/jvms/se14/html/jvms-4.html#jvms-4.4.1).
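
To make the idea concrete, here is a minimal sketch of what "scanning the constant pool for CONSTANT_Class entries" involves. It is not Jandex's implementation; the class and method names (ConstantPoolClassScanner, referencedClasses, referencedClassesOf) are made up for illustration, and the tag layout follows the JVMS chapter linked above.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Jandex: lists the CONSTANT_Class entries of a class file.
class ConstantPoolClassScanner {

    /** Returns the internal names (e.g. "java/lang/Object") of all CONSTANT_Class entries. */
    static Set<String> referencedClasses(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        in.readUnsignedShort();             // minor_version
        in.readUnsignedShort();             // major_version
        int count = in.readUnsignedShort(); // constant_pool_count; valid indexes are 1..count-1
        String[] utf8 = new String[count];
        int[] classNameIndex = new int[count];
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: utf8[i] = in.readUTF(); break;                      // Utf8
                case 7: classNameIndex[i] = in.readUnsignedShort(); break;  // Class -> Utf8 index
                case 3: case 4: in.readInt(); break;                        // Integer, Float
                case 5: case 6: in.readLong(); i++; break;                  // Long, Double use two slots
                case 8: case 16: case 19: case 20:                          // String, MethodType, Module, Package
                    in.readUnsignedShort(); break;
                case 9: case 10: case 11: case 12: case 17: case 18:        // *ref, NameAndType, (Invoke)Dynamic
                    in.readInt(); break;
                case 15: in.readUnsignedByte(); in.readUnsignedShort(); break; // MethodHandle
                default: throw new IOException("unknown constant pool tag " + tag);
            }
        }
        Set<String> result = new TreeSet<>();
        for (int i = 1; i < count; i++) {
            if (classNameIndex[i] != 0) {
                // The name may also be an array descriptor such as "[Ljava/lang/Object;".
                result.add(utf8[classNameIndex[i]]);
            }
        }
        return result;
    }

    /** Convenience wrapper: loads the class file through the system class loader. */
    static Set<String> referencedClassesOf(String internalName) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(internalName + ".class")) {
            if (in == null) {
                throw new IllegalArgumentException("class file not found: " + internalName);
            }
            return referencedClasses(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        referencedClassesOf("java/util/ArrayList").forEach(System.out::println);
    }
}
```

Note that the result includes the class itself and its superclass, because this_class and super_class are also CONSTANT_Class entries.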

I don't think this would add a huge size to the index, and it would allow us to scan efficiently for classes which make calls to special classes. That would work for ORM with Panache, as well as for the Reverse-Router and Logging with Panache use-cases, without having to use marker files or even create extensions for single classes just to have the marker.
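
As a rough illustration of the lookup this would enable, the following sketch keeps a reverse map from a referenced class to the classes that reference it. ClassUsageIndex, record and usersOf are hypothetical names chosen for this example, as are the org.acme classes; this is not the API or storage layout of Jandex.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical reverse index: referenced class -> classes whose constant pool mentions it.
class ClassUsageIndex {
    private final Map<String, Set<String>> usersByClass = new HashMap<>();

    /** Records the class constants found in the constant pool of userClass. */
    void record(String userClass, Collection<String> referencedClasses) {
        for (String referenced : referencedClasses) {
            if (!referenced.equals(userClass)) { // every class references itself; skip that
                usersByClass.computeIfAbsent(referenced, k -> new TreeSet<>()).add(userClass);
            }
        }
    }

    /** Returns the known users of className, or an empty set if nobody references it. */
    Set<String> usersOf(String className) {
        return usersByClass.getOrDefault(className, Collections.emptySet());
    }

    public static void main(String[] args) {
        ClassUsageIndex index = new ClassUsageIndex();
        index.record("org.acme.Foo", Arrays.asList("org.acme.Foo", "org.acme.Router", "java.lang.Object"));
        index.record("org.acme.Bar", Arrays.asList("org.acme.Bar", "java.lang.Object"));
        // Only classes that actually use the special class need a transformer.
        System.out.println(index.usersOf("org.acme.Router")); // prints [org.acme.Foo]
    }
}
```

With such a map, deciding whether any transformer is needed becomes a single lookup instead of a scan over every application class.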

I can work on this if you think this is the right idea.

WDYT @n1hility @stuartwdouglas @mkouba ?

I think the idea is ok. My gut feeling is that it won't bloat the index size too much, as we should already have DotName objects for a lot of the classes.

Yeah, let's look at some numbers on this. I think it's a good idea.

@FroMage do you have the bandwidth to do a small prototype?

I have an implementation, I tried it on indexing Jandex itself, and got these index sizes:

Indexed 145 classes
V9 size: 774 bytes
V10 size: 830 bytes

I'll do a PR and we can take it from there to see if we merge or not.

> I tried it on indexing Jandex itself

You should probably try some bigger library with large classes too ;-)

PS. I like the idea...

I started looking into this, because I have a reasonable prototype of quarkusio/quarkus#10929 and promised @FroMage that I'll take this Jandex task over. I rebased his branch on top of current Jandex master, cleaned it up a little and pushed it here: https://github.com/Ladicek/jandex/commits/class-constant-indexing I'll refer to that version as 2.2.3.Final-SNAPSHOT.

Then, I did some measurements with hibernate-core-5.4.27.Final.jar. That JAR itself, FYI, is 7.1 MB.

First, I indexed that JAR 10 times with Jandex 2.2.2.Final:

$ for I in $(seq 10) ; do java -jar ~/.m2/repository/org/jboss/jandex/2.2.2.Final/jandex-2.2.2.Final.jar -o without-users$I.jandex ~/.m2/repository/org/hibernate/hibernate-core/5.4.27.Final/hibernate-core-5.4.27.Final.jar ; done
Wrote without-users1.jandex in 1,4640 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users2.jandex in 1,3370 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users3.jandex in 1,3580 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users4.jandex in 1,3270 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users5.jandex in 1,3620 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users6.jandex in 1,3820 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users7.jandex in 1,4450 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users8.jandex in 1,3710 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users9.jandex in 1,4090 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users10.jandex in 1,3930 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)

Then, I did the same with 2.2.3.Final-SNAPSHOT:

$ for I in $(seq 10) ; do java -jar target/jandex-2.2.3.Final-SNAPSHOT.jar -o with-users$I.jandex ~/.m2/repository/org/hibernate/hibernate-core/5.4.27.Final/hibernate-core-5.4.27.Final.jar ; done
Wrote with-users1.jandex in 2,2850 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users2.jandex in 2,3410 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users3.jandex in 2,3240 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users4.jandex in 2,2720 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users5.jandex in 2,2630 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users6.jandex in 2,2900 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users7.jandex in 2,3330 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users8.jandex in 2,3540 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users9.jandex in 2,2470 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users10.jandex in 2,3020 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)

So, just this change increased the indexing time by roughly 0.9 seconds, from about 1.4 s to about 2.3 s (an increase of about 66 %), for the Hibernate Core JAR, which has 4801 classes and 54513 class usages (that's the new thing). That's quite a lot I'd say, and I think I need to look into it more.

Just to be sure load performance isn't impacted as severely, I also ran my LoadIndexAndDumpHeap class 10 times for the index generated by 2.2.2.Final:

$ for I in $(seq 10) ; do java -cp target/classes/:target/test-classes/ org.jboss.jandex.test.util.LoadIndexAndDumpHeap without-users1.jandex without-users$I.hprof ; done
Reading org.jboss.jandex.Index@2077d4de took 67ms
Reading org.jboss.jandex.Index@2077d4de took 69ms
Reading org.jboss.jandex.Index@2077d4de took 72ms
Reading org.jboss.jandex.Index@2077d4de took 69ms
Reading org.jboss.jandex.Index@2077d4de took 80ms
Reading org.jboss.jandex.Index@2077d4de took 76ms
Reading org.jboss.jandex.Index@2077d4de took 70ms
Reading org.jboss.jandex.Index@2077d4de took 82ms
Reading org.jboss.jandex.Index@2077d4de took 82ms
Reading org.jboss.jandex.Index@2077d4de took 78ms

And for the index generated by 2.2.3.Final-SNAPSHOT:

$ for I in $(seq 10) ; do java -cp target/classes/:target/test-classes/ org.jboss.jandex.test.util.LoadIndexAndDumpHeap with-users1.jandex with-users$I.hprof ; done
Reading org.jboss.jandex.Index@2077d4de took 88ms
Reading org.jboss.jandex.Index@2077d4de took 112ms
Reading org.jboss.jandex.Index@2077d4de took 105ms
Reading org.jboss.jandex.Index@2077d4de took 106ms
Reading org.jboss.jandex.Index@2077d4de took 87ms
Reading org.jboss.jandex.Index@2077d4de took 109ms
Reading org.jboss.jandex.Index@2077d4de took 97ms
Reading org.jboss.jandex.Index@2077d4de took 107ms
Reading org.jboss.jandex.Index@2077d4de took 91ms
Reading org.jboss.jandex.Index@2077d4de took 89ms

Load time increased by roughly 25 ms, or about 33 %. That isn't great either, but probably acceptable (I guess?).
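
For reference, the load-time figures can be reproduced from the twenty measurements listed above. This is only a small arithmetic sketch; the class name LoadTimeSummary is made up for the example.

```java
import java.util.Arrays;
import java.util.Locale;

// Averages the load times (in ms) copied from the two runs above.
class LoadTimeSummary {
    static double mean(int... millis) {
        return Arrays.stream(millis).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        double before = mean(67, 69, 72, 69, 80, 76, 70, 82, 82, 78);     // index from 2.2.2.Final
        double after = mean(88, 112, 105, 106, 87, 109, 97, 107, 91, 89); // index from 2.2.3.Final-SNAPSHOT
        double delta = after - before;
        System.out.printf(Locale.ROOT, "before=%.1f ms, after=%.1f ms, delta=%.1f ms (+%.0f %%)%n",
                before, after, delta, 100.0 * delta / before);
        // prints: before=74.5 ms, after=99.1 ms, delta=24.6 ms (+33 %)
    }
}
```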

Finally, I used Eclipse MAT to figure out the retained size of the Index object. The heap dumps were generated by the previous test.

The index generated by Jandex 2.2.2.Final, loaded by 2.2.3.Final-SNAPSHOT, takes 5 499 040 bytes.

At the same time, the index generated by Jandex 2.2.3.Final-SNAPSHOT, loaded by the same version, takes 6 220 168 bytes.

That's an increase of about 720 kB, or roughly 13 %. That seems acceptable too, but I'd love to hear everyone's opinion.

The real test would be to index rt.jar :)

Btw it’s interesting that the name table is that much larger. I assume it’s likely references to the entire JDK library. However, I am then also surprised, because that should increase memory usage significantly.

I’ll have to look, but the names are sorted not for reproducibility but because the index is compressed in radix tree form (e.g. a prefix is stored once).
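
The following sketch illustrates why sorted order matters for prefix sharing. It is a simplified model, not Jandex's actual on-disk format: each entry stores only its last name component plus the position of an earlier entry that holds its prefix. The names PrefixSharedNames, Entry, encode and decode are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Simplified model of a prefix-sharing name table (not the real Jandex format).
class PrefixSharedNames {
    static final class Entry {
        final int parent;   // position of the entry holding the prefix, or -1 for a top-level component
        final String local; // last name component only
        Entry(int parent, String local) {
            this.parent = parent;
            this.local = local;
        }
    }

    static List<Entry> encode(Collection<String> names) {
        // Collect every name together with all of its dotted prefixes, in sorted order.
        TreeSet<String> sorted = new TreeSet<>();
        for (String name : names) {
            sorted.add(name);
            for (int dot = name.indexOf('.'); dot >= 0; dot = name.indexOf('.', dot + 1)) {
                sorted.add(name.substring(0, dot));
            }
        }
        // A proper prefix always sorts before the longer name, so the parent position is already known.
        Map<String, Integer> position = new HashMap<>();
        List<Entry> table = new ArrayList<>();
        for (String name : sorted) {
            int dot = name.lastIndexOf('.');
            int parent = dot < 0 ? -1 : position.get(name.substring(0, dot));
            position.put(name, table.size());
            table.add(new Entry(parent, name.substring(dot + 1)));
        }
        return table;
    }

    static String decode(List<Entry> table, int index) {
        Entry entry = table.get(index);
        return entry.parent < 0 ? entry.local : decode(table, entry.parent) + "." + entry.local;
    }

    public static void main(String[] args) {
        List<Entry> table = encode(Arrays.asList("org.jboss.jandex.Index", "org.jboss.jandex.DotName"));
        for (int i = 0; i < table.size(); i++) {
            System.out.println(i + ": parent=" + table.get(i).parent + " local=" + table.get(i).local
                    + " -> " + decode(table, i));
        }
    }
}
```

In this model a reader can only resolve an entry if its prefix was written earlier, which is why an order that is not a precise sort would force the writer to store full names instead.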

(FWIW I am not too concerned with index write perf unless we think it will hurt Quarkus dev mode perf in some way)

Ah, I did notice that the size increased slightly (5,100,112 vs 5,101,571). Maybe sorting by length then hash code will get the best of both worlds?

Just for completeness the class-const-indexing branch without my PR gives
Wrote with-users.jandex in 5.7040 seconds (19832 classes, 53 annotations, 2321 instances, 241792 class usages, 5746347 bytes)

If the sort isn’t precise, then the writing portion needs to change as well, because then you need to write the entire DotName for each entry. Otherwise, when you load the index, it will be corrupt.

Thanks @stuartwdouglas, that's an interesting finding. I thought: what if we only use a HashMap for building the tables in memory and, before we write them down to the index, build a TreeMap from the HashMap? That way, we should only pay the price of sorting once, which should be much better. And indeed it is:
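
A minimal sketch of that pattern, assuming a table that maps names to positions. The class and method names (NameTableBuilder, add, freeze) are hypothetical; the actual change is in the commit linked further down.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical table builder: cheap hashing while collecting, one sort right before writing.
class NameTableBuilder {
    private static final Integer UNASSIGNED = -1;

    // During indexing, names arrive in arbitrary order and mostly repeat; hashing keeps this cheap.
    private final Map<String, Integer> names = new HashMap<>();

    void add(String name) {
        names.putIfAbsent(name, UNASSIGNED);
    }

    /** Sorts the collected names once and assigns each its position in sorted order. */
    TreeMap<String, Integer> freeze() {
        TreeMap<String, Integer> sorted = new TreeMap<>(names);
        int position = 0;
        for (Map.Entry<String, Integer> entry : sorted.entrySet()) {
            entry.setValue(position++);
        }
        return sorted;
    }

    public static void main(String[] args) {
        NameTableBuilder builder = new NameTableBuilder();
        for (String name : new String[] {"java.util.Map", "java.lang.Object", "java.util.Map", "java.util.List"}) {
            builder.add(name);
        }
        System.out.println(builder.freeze());
        // prints {java.lang.Object=0, java.util.List=1, java.util.Map=2}
    }
}
```

The written output is the same as with a TreeMap used throughout, so the index structure is unaffected; only the cost of keeping the map ordered on every insertion goes away.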

$ for I in $(seq 10) ; do java -jar ~/.m2/repository/org/jboss/jandex/2.2.2.Final/jandex-2.2.2.Final.jar -o without-users$I.jandex ~/.m2/repository/org/hibernate/hibernate-core/5.4.27.Final/hibernate-core-5.4.27.Final.jar ; done
Wrote without-users1.jandex in 1,3390 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users2.jandex in 1,3540 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users3.jandex in 1,4340 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users4.jandex in 1,5810 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users5.jandex in 1,3240 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users6.jandex in 1,3940 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users7.jandex in 1,3730 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users8.jandex in 1,3760 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users9.jandex in 1,3210 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)
Wrote without-users10.jandex in 1,3860 seconds (4801 classes, 39 annotations, 2272 instances, 1140798 bytes)

And, with a new commit on my branch:

$ for I in $(seq 10) ; do java -jar target/jandex-2.2.3.Final-SNAPSHOT.jar -o with-users$I.jandex ~/.m2/repository/org/hibernate/hibernate-core/5.4.27.Final/hibernate-core-5.4.27.Final.jar ; done
Wrote with-users1.jandex in 1,2410 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users2.jandex in 1,1980 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users3.jandex in 1,1840 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users4.jandex in 1,2990 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users5.jandex in 1,2410 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users6.jandex in 1,1210 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users7.jandex in 1,2540 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users8.jandex in 1,2270 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users9.jandex in 1,2700 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)
Wrote with-users10.jandex in 1,2330 seconds (4801 classes, 39 annotations, 2272 instances, 54513 class usages, 1281212 bytes)

It's obviously not as good as @stuartwdouglas's results, but it doesn't affect the index structure in any way, and it's still an improvement while adding a new feature :-)

EDIT: the change I did is here: Ladicek@0c90ef9

And just for the fun of it, I also tried with rt.jar. (Everything I'm doing is with OpenJDK 8u275, FYI.)

$ java -jar ~/.m2/repository/org/jboss/jandex/2.2.2.Final/jandex-2.2.2.Final.jar -o without-users.jandex /usr/lib/jvm/java-1.8.0-openjdk-amd64/jre/lib/rt.jar 
Wrote without-users.jandex in 4,1380 seconds (19832 classes, 53 annotations, 2321 instances, 4746296 bytes)

And with my latest branch:

$ java -jar target/jandex-2.2.3.Final-SNAPSHOT.jar -o with-users.jandex /usr/lib/jvm/java-1.8.0-openjdk-amd64/jre/lib/rt.jar
Wrote with-users.jandex in 3,1840 seconds (19832 classes, 53 annotations, 2321 instances, 241784 class usages, 5392514 bytes)

So, not bad! :-)

I spent some time looking for possible performance improvements in the index reader, but couldn't find any. That doesn't mean there aren't any, I'm a performance newbie really :-)

I've seen that the code creates a bunch of HashMaps with initial capacity equal to the number of entries that will be inserted, so obviously there's gonna be resizing, but setting the capacity to 2 * expected number of entries (or 1.5 * expected number of entries, which I computed as n + (n >> 1)) leads to only minuscule improvements.
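
The sizing arithmetic can be sketched as follows. It assumes the default load factor of 0.75 and mirrors how OpenJDK's HashMap rounds the requested capacity up to a power of two, which is an implementation detail rather than a documented guarantee. HashMapSizing and its methods are names invented for this example.

```java
// Models when a HashMap created with new HashMap<>(initialCapacity) has to resize.
class HashMapSizing {
    /** Smallest power of two that is >= capacity (OpenJDK rounds the table size this way). */
    static int tableSizeFor(int capacity) {
        int size = 1;
        while (size < capacity) {
            size <<= 1;
        }
        return size;
    }

    /** True if inserting `entries` distinct keys exceeds the resize threshold at load factor 0.75. */
    static boolean resizesWhenFilled(int initialCapacity, int entries) {
        int threshold = (int) (tableSizeFor(initialCapacity) * 0.75f);
        return entries > threshold;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println(resizesWhenFilled(n, n));            // prints true:  table 1024, threshold 768
        System.out.println(resizesWhenFilled(n + (n >> 1), n)); // prints false: table 2048, threshold 1536
        System.out.println(resizesWhenFilled(600, 600));        // prints false: table 1024, threshold 768
    }
}
```

Under this model, capacity n causes at most one resize, and for some values of n none at all, which may be part of why the larger capacity made so little difference.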

Then I thought perhaps there's a lot of cache misses, as the code constantly accesses the name table and the other arrays, which can possibly be pretty big and the access is random, but I don't really know how much the JVM code itself skews perf stat output, and I certainly don't feel like refactoring the code to access data in a predictable manner.

So I'm thinking I'll just declare the existing code "good enough" and move on :-)

I'm in favor of including this capability (I haven't had a chance to review the change in more detail though). I aim to do a Jandex 3.0 adding this and order preservation support.

OK, thanks! I'll submit a new PR, because I don't think I can overwrite @FroMage's one.

Submitted #112.