CesiumGS / cesium-native

Reduce network request gaps when loading tiles

csciguy8 opened this issue

When instrumenting cesium-native code, I recently discovered some opportunities to improve network performance by reducing some apparent "gaps" when loading tiles.

Background

Below is a simplified diagram of how a tile is loaded. A worker thread fetches the data needed for the tile, then processes it into a form usable by the native runtime that needs it (e.g., Unreal).
Load Gap - Diagram 1
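
To make this concrete, here is a minimal standalone sketch of that per-worker pattern in plain C++ (this is not cesium-native code; fetchTile and processTile are hypothetical stand-ins for the network fetch and the runtime-specific processing). The key point is that while a worker is processing, it is not downloading anything.

```cpp
#include <chrono>
#include <cstdio>
#include <thread>

// Hypothetical stand-ins for the two phases of loading one tile.
void fetchTile(int tile) {
  // Network I/O: the worker waits on the response here.
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
  std::printf("tile %d fetched\n", tile);
}

void processTile(int tile) {
  // CPU work (parsing, decoding, preparing data for the runtime).
  // While this runs, the worker issues no network requests.
  std::this_thread::sleep_for(std::chrono::milliseconds(30));
  std::printf("tile %d processed\n", tile);
}

int main() {
  // One worker loading tiles back to back: fetch, process, fetch, process, ...
  for (int tile = 0; tile < 3; ++tile) {
    fetchTile(tile);    // network phase
    processTile(tile);  // processing phase; the network is idle for this worker
  }
}
```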

We do this across multiple workers, in parallel, to achieve faster load times (configured with maximumSimultaneousTileLoads).
Here is an example of what multiple workers loading tiles could look like...
Load Gap - Diagram 2

While the workers are effectively busy 100% of the time, you may notice a gap between when a worker finishes downloading a tile and when it starts downloading the next one.
Load Gap - Diagram 3

Even though parallel fetches can help fill gaps in network utilization, the network can still be underutilized.

In the previous example, we configured 4 workers. You might expect that 4 network requests would always be in flight, but that's not the case. Notice the period of inactivity in the middle of the load.
Load Gap - Diagram 4

Ideally, we would batch the network requests as tightly as possible, to maximize network throughput.

Here is an alternate scheme where network requests are batched together as tightly as possible, with the processing work queued to different threads.

[image: Load Gap - alternate scheme with batched network requests and a separate processing pool]

Notice that the period of network inactivity is gone and all workers fetch for a longer, more contiguous block of time. Processing work is also more densely packed among the tile workers, which may open up more chances for memory cache hits or batching optimizations.
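
A standalone model of that alternate scheme might look like the sketch below (again, plain C++ rather than cesium-native code; TileQueue, the worker counts, and the simulated durations are all hypothetical). A small set of fetch workers do nothing but network I/O, pushing completed downloads into a queue that a separate pool of processing workers drains as results arrive.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

// Minimal thread-safe queue handing fetched tiles to processing workers.
class TileQueue {
public:
  void push(int tile) {
    { std::lock_guard<std::mutex> lock(mutex_); queue_.push(tile); }
    cv_.notify_one();
  }
  void close() {
    { std::lock_guard<std::mutex> lock(mutex_); closed_ = true; }
    cv_.notify_all();
  }
  // Returns std::nullopt once the queue is closed and fully drained.
  std::optional<int> pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [&] { return !queue_.empty() || closed_; });
    if (queue_.empty()) return std::nullopt;
    int tile = queue_.front();
    queue_.pop();
    return tile;
  }
private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<int> queue_;
  bool closed_ = false;
};

int main() {
  const int fetchWorkers = 4;    // parallel network fetches
  const int processWorkers = 4;  // separate pool for CPU-side processing
  const int tileCount = 12;

  TileQueue fetched;
  std::atomic<int> nextTile{0};

  // Fetch workers do nothing but network I/O, keeping requests tightly packed.
  std::vector<std::thread> fetchers;
  for (int i = 0; i < fetchWorkers; ++i) {
    fetchers.emplace_back([&] {
      while (true) {
        int tile = nextTile.fetch_add(1);
        if (tile >= tileCount) break;
        std::this_thread::sleep_for(std::chrono::milliseconds(50));  // simulated download
        fetched.push(tile);
      }
    });
  }

  // Processing workers consume completed downloads as they arrive.
  std::vector<std::thread> processors;
  for (int i = 0; i < processWorkers; ++i) {
    processors.emplace_back([&] {
      while (auto tile = fetched.pop()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(30));  // simulated processing
        std::printf("tile %d processed\n", *tile);
      }
    });
  }

  for (auto& fetcher : fetchers) fetcher.join();
  fetched.close();
  for (auto& processor : processors) processor.join();
}
```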

Proposed work

  • Start investigation at Tileset::_processWorkerThreadLoadQueue. This is where all potential tile work is known and parallel work is throttled with maximumSimultaneousTileLoads.
  • Refactor TilesetContentManager::loadTileContent to separate the network fetch (CachingAssetAccessor::get) from the data post processing work.
  • Queue network fetch work together, potentially reusing maximumSimultaneousTileLoads to configure our maximum parallel network fetches (a standalone throttling sketch follows this list).
  • Data post processing work should execute as network fetch work completes. The best way to achieve this can be decided later, although the previous diagram hints at a separate pool of tile processing workers.
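
As a rough illustration of the throttling idea in the third bullet, a counting semaphore sized from maximumSimultaneousTileLoads could cap how many fetches are on the wire while all known fetch work is dispatched together. This is only one way to express the idea in standalone C++20; cesium-native's real mechanism would live in its async task system, and the names and durations below are hypothetical.

```cpp
#include <chrono>
#include <cstdio>
#include <semaphore>
#include <thread>
#include <vector>

// Cap on in-flight network requests, analogous to reusing
// maximumSimultaneousTileLoads as the maximum number of parallel fetches.
constexpr int kMaxParallelFetches = 4;
std::counting_semaphore<> fetchSlots(kMaxParallelFetches);

void fetchTile(int tileId) {
  fetchSlots.acquire();  // wait for a free fetch slot
  std::this_thread::sleep_for(std::chrono::milliseconds(50));  // simulated download
  fetchSlots.release();  // free the slot as soon as the download completes
  std::printf("tile %d fetched; ready for post-processing\n", tileId);
}

int main() {
  // All known fetch work is queued up front; the semaphore alone decides
  // how many requests are actually on the wire at any moment.
  std::vector<std::thread> fetches;
  for (int tileId = 0; tileId < 12; ++tileId) {
    fetches.emplace_back(fetchTile, tileId);
  }
  for (auto& fetch : fetches) {
    fetch.join();
  }
}
```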

Benefits

  • Reduced total loading time
  • More consistent peak network usage during a loading event
  • More predictable scaling of maximumSimultaneousTileLoads, which would then correspond directly to the number of parallel network requests.

Reference

This work hints at moving parts of our code towards a more "Data Parallel" perspective, where parts of our tile loading can continue to be broken down into small parallelizable tasks, with an emphasis on batching and throughput.

Issue #473 is very similar, with more ideas related to short- vs. long-running tasks.

Here is data from the original investigation showing potential gaps in a Google 3D Tiles test (Chrysler Building, 828 tiles).
The highlighted row shows a tile that took 228 ms to complete, with a 26 ms gap where it was not fetching data from the network (gapUsecs).
Load Gap Analysis - Chrysler Release
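
For reference, one plausible way to compute a per-tile gap metric like gapUsecs from instrumentation timestamps is the tile's total load duration minus the time a network fetch was actually in flight. The struct, function, and numbers below are hypothetical and only roughly mirror the highlighted row; they are not the original analysis code.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical instrumentation record: when each network fetch for a tile
// started and finished, in microseconds since the tile's load began.
struct FetchInterval {
  int64_t startUsecs;
  int64_t endUsecs;
};

// Gap time = total load duration minus time spent actually fetching.
// Assumes the intervals do not overlap (one fetch at a time per tile).
int64_t computeGapUsecs(int64_t loadStartUsecs,
                        int64_t loadEndUsecs,
                        const std::vector<FetchInterval>& fetches) {
  int64_t fetching = 0;
  for (const FetchInterval& f : fetches) {
    fetching += f.endUsecs - f.startUsecs;
  }
  return (loadEndUsecs - loadStartUsecs) - fetching;
}

int main() {
  // Illustrative numbers: a 228 ms load with 202 ms spent fetching
  // leaves a 26 ms gap.
  std::vector<FetchInterval> fetches = {{0, 120000}, {146000, 228000}};
  std::printf("gapUsecs = %lld\n",
              static_cast<long long>(computeGapUsecs(0, 228000, fetches)));
}
```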

Preliminary exploration is encouraging...

I have a branch that isolates the "content fetch" part of tile loading work and dispatches all of this work together, as tightly as possible.

Testing with the Google Tiles test "LocaleChrysler" yields about a 15% reduction in total load time. Not all of the work is finished, so there may be more gains once it is complete.

https://github.com/CesiumGS/cesium-native/tree/network-work-refactor