cylc / cylc-ui

Web app for monitoring and controlling Cylc workflows

Home page: https://cylc.github.io


efficiency: investigate bottleneck

oliver-sanders opened this issue

See also: cylc/cylc-uiserver#547

This workflow has proven remarkably difficult for the UI Server (UIS) and the UI to handle:

#!Jinja2

{% set members = 10 %}
{% set hours = 100 %}

[scheduler]
    allow implicit tasks = True

[task parameters]
    member = 0..{{members}}
    fcsthr = 0..{{hours}}
  [[templates]]
    member = member%(member)03d
    fcsthr = _fcsthr%(fcsthr)03d

[scheduling]
  initial cycle point = 2000
  runahead limit = P3
  [[xtriggers]]
    start = wall_clock(offset=PT7H15M)
  [[graph]]
    T00,T06,T12,T18 = """
        @start & prune[-PT6H]:finish => prune & purge
        @start => sniffer:ready_<member,fcsthr> => <member,fcsthr>_process? => finish
        <member,fcsthr>_process:fail? => fault
      """

[runtime]
    [[sniffer]]
        [[[outputs]]]
{% for member in range(0, members + 1) %}
    {% for hour in range(0, hours + 1) %}
            ready_member{{ member | pad(3, 0) }}_fcsthr{{ hour | pad(3, 0) }} = {{ member }}{{ hour }}
    {% endfor %}
{% endfor %}

For more information see: https://cylc.discourse.group/t/slow-load-of-cylc-workflows-disconnects/823/19
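To see why the workflow is so heavy, it helps to count what a single cycle point expands to. The sketch below is a hypothetical helper, not Cylc's own code: it applies the `[task parameters]` templates from the workflow by hand (Cylc templates are Python %-format strings) to list the parameterised task names and the custom `sniffer` outputs for one cycle.

```python
# Hypothetical helper: mimics how the [task parameters] templates in the
# workflow above expand into task and output names. Not Cylc's own code.
MEMBERS = 10   # matches {% set members = 10 %}
HOURS = 100    # matches {% set hours = 100 %}

MEMBER_TEMPLATE = "member%(member)03d"
FCSTHR_TEMPLATE = "_fcsthr%(fcsthr)03d"


def param_suffix(member: int, fcsthr: int) -> str:
    """Expand <member,fcsthr> using the templates from the workflow."""
    return (MEMBER_TEMPLATE % {"member": member}
            + FCSTHR_TEMPLATE % {"fcsthr": fcsthr})


def expand_cycle(members: int = MEMBERS, hours: int = HOURS) -> dict:
    """Return the task names and sniffer outputs for one cycle point."""
    # Parameter ranges 0..N are inclusive, hence the + 1.
    pairs = [(m, h) for m in range(members + 1) for h in range(hours + 1)]
    process_tasks = [param_suffix(m, h) + "_process" for m, h in pairs]
    sniffer_outputs = ["ready_" + param_suffix(m, h) for m, h in pairs]
    # The non-parameterised tasks named in the graph.
    other_tasks = ["sniffer", "prune", "purge", "finish", "fault"]
    return {
        "tasks": other_tasks + process_tasks,
        "sniffer_outputs": sniffer_outputs,
    }


cycle = expand_cycle()
print(len(cycle["tasks"]))            # 1116 tasks per cycle point
print(len(cycle["sniffer_outputs"]))  # 1111 custom outputs on one task
print(cycle["tasks"][5])              # member000_fcsthr000_process
```

Each cycle point therefore carries 1,111 process tasks plus a single `sniffer` task with 1,111 custom outputs, and that is multiplied by the number of cycle points active within the runahead limit.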

Investigation so far has confirmed:

  • The scheduler is not a source of delay.
  • The UIS chokes on the update for several seconds.
    • During this time, updates to other workflows are suspended.
  • The UI chokes on the deltas for several seconds.
  • The browser takes a couple of seconds to update.
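The suspension of other workflows' updates is what you would expect if, as assumed here, the UIS handles every workflow on one single-threaded event loop: a long synchronous job holds the loop until it finishes. The following is a hypothetical illustration using `asyncio`, not UIS code; the function names are made up for the example.

```python
import asyncio

# Hypothetical illustration (not UI Server code): on a single-threaded event
# loop, one long synchronous job blocks every other coroutine until it ends.


async def apply_big_update(log: list, chunks: int, cooperative: bool) -> None:
    """Process a large update for workflow A in `chunks` pieces."""
    for i in range(chunks):
        log.append(f"A chunk {i}")  # stand-in for CPU-bound delta processing
        if cooperative:
            # Yield to the event loop so other coroutines get a turn.
            await asyncio.sleep(0)


async def apply_small_update(log: list) -> None:
    """A small update for workflow B that only needs a turn on the loop."""
    log.append("B update")


async def run(cooperative: bool) -> list:
    log: list = []
    await asyncio.gather(
        apply_big_update(log, chunks=3, cooperative=cooperative),
        apply_small_update(log),
    )
    return log


print(asyncio.run(run(cooperative=False)))
# ['A chunk 0', 'A chunk 1', 'A chunk 2', 'B update']
print(asyncio.run(run(cooperative=True)))
# ['A chunk 0', 'B update', 'A chunk 1', 'A chunk 2']
```

Without yielding, workflow B waits for the whole of A's update; with a yield between chunks, B's update is interleaved. Chunking does not reduce the total work, it only stops one workflow from starving the others.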

This issue focuses on the UI side of things.

Suggested remediation (UI only, please update with new suggestions):

  • #1581
    • We are requesting overlapping data between cycle points and families.
  • #347
    • Potentially up to a 50% speedup for the tree view.
    • We are requesting job info for all jobs when the tree view is loaded.
    • But we only need it when the node is expanded.
    • We could request it on demand, but if we did, it wouldn't be responsive.
    • Investigation required...
  • #1618
    • Don't auto-expand all cycle points in the tree view when first loaded if there are too many tasks to comfortably display.
  • #1617
    • Work out a more efficient solution to the expand/collapse buttons in the tree view.
  • #1632
    • Switch from arrays to a data structure with better insert/remove performance at arbitrary indices.
  • #1623
    • A store optimisation to reduce the impact of large families.
  • #1631
    • Remove an O(n^2) routine from the data store, reducing the cost of nodes with a high number of children.
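The specific O(n^2) routine is not named here, so the following is only a hypothetical illustration of the pattern #1631 and #1632 are aimed at: if adding a child first scans the children array to avoid duplicates, n additions cost O(n^2), and removing a child from the middle of an array is O(n). Keying children by ID avoids both. The classes below are made up for the example; in JavaScript, a `Map` offers the same insertion-ordered, keyed lookup.

```python
# Hypothetical sketch (not the cylc-ui store): two ways to maintain a node's
# children when nodes are added and removed by ID.


class ListNode:
    """Children kept in a plain list: membership tests and removals scan it."""

    def __init__(self) -> None:
        self.children: list = []

    def add_child(self, child_id: str) -> None:
        if child_id not in self.children:   # O(n) scan per insert
            self.children.append(child_id)  # -> O(n^2) for n inserts

    def remove_child(self, child_id: str) -> None:
        if child_id in self.children:
            self.children.remove(child_id)  # O(n) scan plus element shift


class DictNode:
    """Children kept in a dict keyed by ID (dicts preserve insertion order)."""

    def __init__(self) -> None:
        self.children: dict = {}

    def add_child(self, child_id: str) -> None:
        self.children.setdefault(child_id, None)  # O(1) average, no duplicates

    def remove_child(self, child_id: str) -> None:
        self.children.pop(child_id, None)         # O(1) average

    def ordered_ids(self) -> list:
        return list(self.children)


node = DictNode()
for i in range(5):
    node.add_child(f"task{i}")
node.add_child("task2")    # duplicate is ignored
node.remove_child("task1")
print(node.ordered_ids())  # ['task0', 'task2', 'task3', 'task4']
```

If a view needs children in sorted rather than insertion order, sorting once at render time keeps the per-delta cost of the store low.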

IMO, the UI side of this issue is more concerning than the UIS side because UIS delay loads the server, whereas UI delay hits the user's browser.

The bulk of the time is taken in the data store processing the deltas, so this should be the first target for improvement. Profiling is required to highlight problem areas. Given that the table view is only slightly faster to load than the tree view, family tree computation is unlikely to be the cause.
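As a baseline for what "processing the deltas" should cost, here is a minimal sketch of applying a delta to a flat, ID-keyed store. The delta shape (`added` / `updated` / `pruned`) and the function name are assumptions made for illustration, not the actual cylc-ui data store. Each step is a keyed lookup, so the cost is proportional to the size of the delta rather than the size of the store.

```python
from typing import Any, Dict, List, TypedDict


class Delta(TypedDict, total=False):
    # Hypothetical delta shape, loosely modelled on "added / updated / pruned".
    added: Dict[str, Dict[str, Any]]
    updated: Dict[str, Dict[str, Any]]
    pruned: List[str]


def apply_delta(store: Dict[str, Dict[str, Any]], delta: Delta) -> None:
    """Apply one delta to a flat, ID-keyed store in O(size of delta)."""
    for node_id, node in delta.get("added", {}).items():
        store[node_id] = dict(node)  # copy so the store owns its data
    for node_id, fields in delta.get("updated", {}).items():
        # Merge only the changed fields; unknown IDs are treated as adds.
        store.setdefault(node_id, {}).update(fields)
    for node_id in delta.get("pruned", []):
        store.pop(node_id, None)     # ignore IDs that are already gone


store: Dict[str, Dict[str, Any]] = {}
apply_delta(store, {"added": {"2000/sniffer": {"state": "waiting"}}})
apply_delta(store, {"updated": {"2000/sniffer": {"state": "running"}}})
print(store)  # {'2000/sniffer': {'state': 'running'}}
```

Any per-node work beyond this (for example, rebuilding inheritance or tree nodes on every delta) adds to the cost, and that extra work is what profiling should look for.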

Profiling Experiments

1 - JS Profiling

Profile the time it takes to load the tree view for the workflow in the OP, with hours turned down to 20. The workflow is started in paused mode.

Results:

  • 5.68 seconds of scripting time.
    • 1.5 seconds of which is accounted for by the UPDATE_DELTAS call chain (i.e. workflow data store stuff)
      • 0.61 seconds of which is applyInheritance.
      • 0.90 seconds of which is shared between createTreeNode and addChild.

The remainder appears to be Vue.js.
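Putting these timings in proportion (simple arithmetic on the figures reported above):

```latex
\frac{1.5\,\mathrm{s}}{5.68\,\mathrm{s}} \approx 26\%\ \text{(data store)},
\qquad
\frac{5.68\,\mathrm{s} - 1.5\,\mathrm{s}}{5.68\,\mathrm{s}} = \frac{4.18\,\mathrm{s}}{5.68\,\mathrm{s}} \approx 74\%\ \text{(remainder)},
\qquad
0.61\,\mathrm{s} + 0.90\,\mathrm{s} = 1.51\,\mathrm{s} \approx 1.5\,\mathrm{s}.
```

So applyInheritance plus createTreeNode/addChild account for essentially all of the UPDATE_DELTAS time, while roughly three quarters of the scripting time lies outside the store.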

2 - View Load Time

Open a view, then measure the time it takes to open the same view in a new workspace tab.

  • Because you've already opened the view, there's nothing for the data store to do.
  • This gives you an idea of mount/render time.

Manual timings to the nearest second:

  • Tree - 13s
  • Table - 2s
  • Simple Tree - 0s

3 - Component Loading

Start with the "simple tree" view and add in the components used by the regular "tree" view one by one, measuring the impact on load time for each.

  • Simple Tree ~0s
    • With <Task /> icons ~3s
      • With <Job /> icons ~4.5s
        • With <v-btn /> buttons (used for expand/collapse) ~8.5s
          • With <v-icon /> icon (used for expand/collapse icons) ~11.5s

Using these timings to extract the cost per component:

  • 4s <v-btn />
  • 3s <Task />
  • 3s <v-icon />
  • 1.5s <Job />

Note: these costs are for 1,000 tasks, e.g. 0.003s per <Task /> icon.
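The per-component figures are differences between consecutive cumulative timings. A small sketch that reproduces them from the timings listed above, assuming roughly 1,000 tasks as stated in the note:

```python
# Cumulative load times (seconds) from the component-loading experiment above.
cumulative = [
    ("Simple Tree", 0.0),
    ("<Task />", 3.0),
    ("<Job />", 4.5),
    ("<v-btn />", 8.5),
    ("<v-icon />", 11.5),
]

TASK_COUNT = 1000  # approximate number of tasks in the test workflow


def component_costs(steps):
    """Cost of each added component = difference between consecutive timings."""
    return {
        name: total - prev_total
        for (_, prev_total), (name, total) in zip(steps, steps[1:])
    }


costs = component_costs(cumulative)
for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
    per_task_ms = cost / TASK_COUNT * 1000
    print(f"{name}: {cost:.1f}s total, {per_task_ms:.1f}ms per task")

# Share of the full mount time spent on the expand/collapse button and icon.
expand_collapse = costs["<v-btn />"] + costs["<v-icon />"]
share = expand_collapse / cumulative[-1][1]
print(f"expand/collapse share of mount time: {share:.0%}")
```

The expand/collapse share comes out at about 61% (7s of 11.5s), which is consistent with the "up to 60%" estimate in the remediation list.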

Conclusions

  1. The store is a little sluggish; we should look into possible optimisations.
  2. The real killer is the component count in the tree view.
  3. There is potential for easy gains by simplifying the expand/collapse system.

Remediation:

  • Implement the expand/collapse icons natively rather than with Vue components (i.e. don't use vuetify).
    • Potential for up to a 60% reduction in mount time.
    • #1617
  • Use a virtual scroller tree.
    • Potential for a ~25x reduction in mount time on my screen.
    • Related to #1159 as Element provides a virtual tree implementation.
  • Don't auto-expand cycle points when the node count is too high.
    • Potential for a 5x reduction in mount time (5 cycles in this example).
    • #1618
  • Ideas?
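A virtual scroller mounts only the rows that intersect the viewport (plus a small margin) and represents the rest with spacer height, so mount cost scales with screen size rather than task count. The sketch below computes that window. It assumes fixed-height rows, and the names `Viewport`, `visible_range` and `overscan` are hypothetical, not Element's or Vuetify's API.

```python
from dataclasses import dataclass


@dataclass
class Viewport:
    scroll_top: int  # pixels scrolled from the top of the list
    height: int      # visible height in pixels
    row_height: int  # fixed height of one tree row in pixels (assumed)


def visible_range(total_rows: int, vp: Viewport, overscan: int = 5) -> range:
    """Indices of the rows to mount; all other rows stay unrendered.

    `overscan` extra rows above and below reduce flicker while scrolling.
    """
    first = vp.scroll_top // vp.row_height
    last = (vp.scroll_top + vp.height - 1) // vp.row_height  # inclusive
    start = max(0, first - overscan)
    stop = min(total_rows, last + 1 + overscan)
    return range(start, stop)


vp = Viewport(scroll_top=0, height=800, row_height=32)
rows = visible_range(1000, vp)
print(len(rows))  # 30 rows mounted instead of 1000
# A spacer of rows.start * row_height pixels above the mounted rows, and one
# of (1000 - rows.stop) * row_height pixels below, keeps the scrollbar correct.
```

The saving is roughly the ratio of total rows to rows that fit on screen, which is why the achievable reduction depends on screen size.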

The three optimisations proposed so far would make a reasonable dent in the CPU time.

The time is going into two places:

  • Unpacking deltas into the data store.
  • Iterating the store, mounting and rendering components in the views.

The data store time is more concerning than the view time. Views can be optimised (e.g. the table view reduces the number of nodes on screen through pagination, and the tree view could use a virtual scroller in the future to similar effect), but the data store time will always remain, so the store should be the main target of optimisation.