HSF / PyHEP.dev-workshops

PyHEP Developer workshops

Home Page: https://indico.cern.ch/e/PyHEP2023.dev

Full automatic differentiation applications and gradient sharing

matthewfeickert opened this issue

This topic doesn't seem to be fully represented in any of the other Issues yet, though there are connections between it and:

As far as I understand, in the ecosystem we don't really have the ability to fully use and share gradient information (with the notable exception of neos). pyhf uses automatic differentiation, but this information is all internal and not currently accessible from outside the calculation (again c.f. neos).
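
As one illustration of what exposing that information could look like, here is a minimal sketch (the model and numbers are made up, and it assumes pyhf's jax backend) of pulling a gradient out of a pyhf likelihood with jax.grad rather than leaving it internal to the fit:

```python
# Minimal sketch: differentiate a pyhf negative log-likelihood with JAX.
# The model and observed counts below are illustrative only.
import jax
import jax.numpy as jnp
import pyhf

pyhf.set_backend("jax")

model = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0], bkg=[50.0], bkg_uncertainty=[7.0]
)
data = jnp.asarray([55.0] + model.config.auxdata)
init = jnp.asarray(model.config.suggested_init())

def nll(pars):
    # logpdf returns a length-1 tensor, so index to get a scalar for jax.grad
    return -model.logpdf(pars, data)[0]

# The gradient is now a first-class object that could be handed to an
# external optimizer or to another differentiable component downstream.
grad_nll = jax.grad(nll)
print(grad_nll(init))
```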

I'm biased as this topic is something directly related to the IRIS-HEP Analysis Systems goals, but I would be very interested to know:

  • What are the plans moving forward across different tools to add full automatic differentiation support?
  • What are the barriers for this right now?
  • What coordination and planning needs to happen across the ecosystem for the exchange of gradients to be useful?

(Though he won't be able to attend in person (c.f. #5 (comment)), it would be useful to include @phinate in these discussions.)

+1 good point, a topic by itself! It may go together with backends in general (the API is one aspect, but also JIT compilation, for example).

+1 of course!

There's certainly some blue-sky thinking needed to determine what to do with the gradient information, assuming we have it accessible. Most work so far has focused on a neos-type workflow as a practical middle ground between a traditional analysis and fully learning the likelihood (recent example from @lukasheinrich) -- are there ways in which we can go further? What happens if we have a differentiable simulator in the mix, for instance? Perhaps loss functions could involve matrix elements, for example, but in what way?

Just stirring the pot a bit, I have no good answers yet :)
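
To make the neos-style direction a bit more concrete, here is a toy JAX sketch (everything in it, including the s/sqrt(s+b) figure of merit and the softened cut, is a stand-in rather than anything taken from neos itself) of differentiating an analysis objective end to end through the event selection:

```python
# Toy sketch of a differentiable analysis objective: a sigmoid replaces the
# hard selection cut so the figure of merit has a gradient with respect to it.
import jax
import jax.numpy as jnp

def soft_counts(x, cut, slope=20.0):
    # Differentiable stand-in for "number of events with x > cut"
    return jax.nn.sigmoid(slope * (x - cut)).sum()

def neg_significance(cut, sig, bkg):
    s = soft_counts(sig, cut)
    b = soft_counts(bkg, cut)
    return -s / jnp.sqrt(s + b + 1e-8)  # simple s/sqrt(s+b) proxy

key_s, key_b = jax.random.split(jax.random.PRNGKey(0))
sig = 1.0 + 0.3 * jax.random.normal(key_s, (1_000,))
bkg = 0.0 + 0.5 * jax.random.normal(key_b, (5_000,))

# Gradient of the physics objective with respect to the selection cut; in a
# neos-style pipeline this would instead flow into neural-network weights and
# through the full statistical model (and, one day, a differentiable simulator).
print(jax.grad(neg_significance)(0.5, sig, bkg))
```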

+1 I'm particularly interested in how to do this at scale: identifying where network or serialization bottlenecks may occur in such a system, where checkpointing might be needed, and whether the fitter (or some other consumer of gradients) could be a distributed object in the network/cluster.
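
As a purely hypothetical illustration of where serialization enters in such a setup, the sketch below (the transport is left as a placeholder, and the quadratic loss stands in for a per-worker likelihood term) shows a worker-side gradient being copied to the host and packed into a message:

```python
# Hypothetical sketch: a worker computes a gradient locally and ships it to a
# remote fitter as a plain array. The actual transport (Dask, gRPC, an object
# store, ...) is out of scope here.
import pickle
import jax
import jax.numpy as jnp
import numpy as np

def loss(pars):
    return jnp.sum(pars ** 2)  # stand-in for a per-worker likelihood term

pars = jnp.arange(4.0)
grad = jax.grad(loss)(pars)

# Device arrays have to be materialized on the host before serialization;
# this copy is one place a bottleneck could show up at scale.
payload = pickle.dumps(np.asarray(grad))
print(len(payload), "bytes to move per gradient message")
```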

+1 Interested in general for MC generator applications.