nokeedev / gradle-native

The home of anything about Gradle support for natively compiled languages

Home Page: https://nokee.dev


Investigate performance improvement to balance off correctness

lacasseio opened this issue

As a Nokee developer, I want to ensure Gradle + Nokee achieves a 40% performance improvement over straight Xcode, so the case for Gradle + Nokee is compelling.

This issue is not about the fact that we cannot achieve a 40% performance improvement over Xcode; it's about finding the current inefficiencies in task execution that prevent us from reaching that target. So far, we have been chasing small inefficiencies, such as unrolling Java Streams or using arrays instead of Lists. However, in our latest profiling run, assembling the derived data directory shows up quite strongly. Despite our initial attempt to consolidate all our work into a single task, we may require access to some of Gradle's internal performance features (such as virtual file system watching). For this to happen using public APIs, we would need to split the Xcode target task into multiple independent Gradle tasks. This issue is about investigating what that split requires and determining the performance improvement it would bring.
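For context, here is a minimal, hypothetical sketch (not Nokee's actual code) of what splitting the monolithic Xcode target task into independent Gradle tasks could look like. The plugin, task, and property names (`XcodeTargetSplitPlugin`, `AssembleDerivedData`, `BuildXcodeTarget`, `assembleDerivedData`, `buildXcodeTarget`) are assumptions for illustration; the point is that wiring tasks through output/input properties gives each one its own up-to-date checking, caching, and file-system-watching scope.

```java
import org.gradle.api.DefaultTask;
import org.gradle.api.Plugin;
import org.gradle.api.Project;
import org.gradle.api.file.DirectoryProperty;
import org.gradle.api.tasks.InputDirectory;
import org.gradle.api.tasks.OutputDirectory;
import org.gradle.api.tasks.PathSensitive;
import org.gradle.api.tasks.PathSensitivity;
import org.gradle.api.tasks.TaskAction;
import org.gradle.api.tasks.TaskProvider;

// Hypothetical sketch: split the single "build Xcode target" task into an
// assemble-derived-data task and a build task, wired through their outputs.
public class XcodeTargetSplitPlugin implements Plugin<Project> {

    // Assembles the derived data directory (details elided).
    public static abstract class AssembleDerivedData extends DefaultTask {
        @OutputDirectory
        public abstract DirectoryProperty getDerivedDataDirectory();

        @TaskAction
        void assemble() {
            // Copy sources, headers, etc. into the derived data directory.
        }
    }

    // Invokes xcodebuild against the assembled derived data directory
    // (a real task would also declare its own outputs).
    public static abstract class BuildXcodeTarget extends DefaultTask {
        @PathSensitive(PathSensitivity.RELATIVE)
        @InputDirectory
        public abstract DirectoryProperty getDerivedDataDirectory();

        @TaskAction
        void build() {
            // Invoke xcodebuild with -derivedDataPath pointing at the input.
        }
    }

    @Override
    public void apply(Project project) {
        TaskProvider<AssembleDerivedData> assemble = project.getTasks()
            .register("assembleDerivedData", AssembleDerivedData.class, task -> {
                task.getDerivedDataDirectory()
                    .set(project.getLayout().getBuildDirectory().dir("derived-data"));
            });

        project.getTasks().register("buildXcodeTarget", BuildXcodeTarget.class, task -> {
            // Wiring through the provider carries an implicit task dependency,
            // so each task gets its own up-to-date check and can be cached or
            // watched independently.
            task.getDerivedDataDirectory()
                .set(assemble.flatMap(AssembleDerivedData::getDerivedDataDirectory));
        });
    }
}
```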

Acceptance Criteria

  • How much of an impact would a custom task for assembling the derived data directory bring? The leading idea is that, by using incremental changes, we could copy only the files that changed, reducing the number of files to copy by a sizable margin (see the sketch after this list). We could also better synchronize the modification times (required by Xcode's incremental detection). Finally, it would solve #774.
  • How can we avoid the final sync of the Products directory? To conclude the build, we sync the Products directory of the derived data directory into a new directory for consumption by the downstream tasks. We should look into avoiding this extra sync to reduce the work.
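As a sketch of the first bullet, here is what an incremental derived-data assembly task could look like using Gradle's public InputChanges API. This is illustrative only, not Nokee's implementation; the class and property names are made up. Only files reported as changed are copied, and the copy preserves modification times, which Xcode's incremental detection relies on.

```java
import org.gradle.api.DefaultTask;
import org.gradle.api.file.DirectoryProperty;
import org.gradle.api.file.FileType;
import org.gradle.api.tasks.InputDirectory;
import org.gradle.api.tasks.OutputDirectory;
import org.gradle.api.tasks.PathSensitive;
import org.gradle.api.tasks.PathSensitivity;
import org.gradle.api.tasks.TaskAction;
import org.gradle.work.ChangeType;
import org.gradle.work.Incremental;
import org.gradle.work.InputChanges;

import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of an incremental derived data assembly task: only
// files reported as changed by InputChanges are copied into the output.
public abstract class AssembleDerivedDataTask extends DefaultTask {
    @Incremental
    @PathSensitive(PathSensitivity.RELATIVE)
    @InputDirectory
    public abstract DirectoryProperty getSourceDirectory();

    @OutputDirectory
    public abstract DirectoryProperty getDerivedDataDirectory();

    @TaskAction
    void assemble(InputChanges changes) {
        // On a non-incremental run, all files are reported as ADDED; a real
        // task would clean the output directory first in that case.
        changes.getFileChanges(getSourceDirectory()).forEach(change -> {
            if (change.getFileType() == FileType.DIRECTORY) {
                return; // directories are handled implicitly via their files
            }
            File target = new File(getDerivedDataDirectory().get().getAsFile(),
                change.getNormalizedPath());
            try {
                if (change.getChangeType() == ChangeType.REMOVED) {
                    Files.deleteIfExists(target.toPath());
                } else {
                    Files.createDirectories(target.toPath().getParent());
                    // COPY_ATTRIBUTES keeps the last-modified time in sync,
                    // which Xcode relies on for its own incremental checks.
                    Files.copy(change.getFile().toPath(), target.toPath(),
                        StandardCopyOption.REPLACE_EXISTING,
                        StandardCopyOption.COPY_ATTRIBUTES);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```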

Here is a summary of our investigation. We explored extracting the code that 1) assembles the derived data directory, 2) synchronizes the resulting derived data Build/Products directory for exporting, and 3) isolates the Xcode project's targets. The first two items show up heavily in the profiler, while the last item was mostly for convenience.

All three items can benefit from InputChanges, which helps reduce the work required and keeps the output directory in sync for our needs. The Xcode project isolation groups all isolation work into a single task per project, which makes it more efficient on paper. As for the actual performance gain, we saw an improvement for the first item: we got down to about 1s faster than Xcode on average. However, the second item put us back on par with Xcode due to more tasks requiring up-to-date checking. Item 3 had minimal to no impact. Digging deeper, the reason we are not seeing as much of a performance improvement as expected is the lack of parallel input/output snapshotting. Up-to-date checking alone takes about 2.2s of the execution phase, and this value scales with the number of tasks in the graph because the work is sequential. If Gradle checked everything in parallel, we would likely see this bottleneck disappear entirely. Waiting for the Gradle team to comment.

Despite the small impact on performance, such a change would undoubtedly become more beneficial given some improvements to Gradle. We could also try to avoid the work in the second item.

  • Waiting for the Gradle team to respond on parallel task input/output snapshotting
  • Avoid synchronizing the resulting derived data by consuming the Build/Products directory in place.

Consuming the derived data in place does bring some performance improvement. It's safe to say that we should export the derived data directory as used by Xcode, leaving the consumer to cherry-pick what they want from the directory. This behaviour should most likely stay internal because, if the derived data is restored from the cache, it won't contain the same files (i.e., intermediate files are ignored).
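As an illustration of consuming the derived data in place, the sketch below exposes the assembled derived data directory through a consumable configuration so downstream projects resolve it directly, without syncing Build/Products into a separate export directory. The configuration name, usage attribute value, and the AssembleDerivedDataTask type (from the sketch above) are assumptions, not Nokee's actual wiring.

```java
import org.gradle.api.Plugin;
import org.gradle.api.Project;
import org.gradle.api.artifacts.Configuration;
import org.gradle.api.attributes.Usage;
import org.gradle.api.tasks.TaskProvider;

// Hypothetical sketch: publish the derived data directory in place through an
// outgoing configuration instead of syncing Build/Products elsewhere.
public class DerivedDataExportPlugin implements Plugin<Project> {
    @Override
    public void apply(Project project) {
        Configuration elements = project.getConfigurations().create("derivedDataElements", conf -> {
            conf.setCanBeConsumed(true);
            conf.setCanBeResolved(false);
            conf.attributes(attrs -> attrs.attribute(
                Usage.USAGE_ATTRIBUTE,
                project.getObjects().named(Usage.class, "xcode-derived-data")));
        });

        // Assumes the AssembleDerivedDataTask sketched earlier is registered
        // under the name "assembleDerivedData". Publishing the task's output
        // directory as the artifact carries the task dependency, so consumers
        // trigger the assembly and read the directory in place.
        TaskProvider<AssembleDerivedDataTask> assemble = project.getTasks()
            .named("assembleDerivedData", AssembleDerivedDataTask.class);
        project.getArtifacts().add(elements.getName(),
            assemble.flatMap(AssembleDerivedDataTask::getDerivedDataDirectory));
    }
}
```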

With the last change, we are ~1s faster than Xcode in terms of incremental build (up-to-date checking). We could be faster, but we are still waiting for a response from the Gradle team. We would still need to shave about 2-3s to reach our target of being 40% faster. Depending on the Gradle team's response, that may be the key to achieving this goal. There are probably 500ms we can shave off during the configuration phase.

That being said, Gradle would still be extremely fast when it comes to restoring from the cache, as there would be no need to execute a rebuild. So cases such as bouncing between branches or building with a warm cache from CI would all be faster than Xcode.

See this branch for the work done as part of this spike.

Still waiting for a response from the Gradle team regarding snapshotting task inputs/outputs in parallel. The closest response we got is that the Configuration Cache should allow this, but the Configuration Cache is broken because of gradle/gradle#22215. The only solution is to "erase" the nested provider during task dependency calculation, which is not trivial.