HSF / PyHEP.dev-workshops

PyHEP Developer workshops

Home Page: https://indico.cern.ch/e/PyHEP2023.dev

Full automatic differentiation applications and gradient sharing

matthewfeickert opened this issue

This topic doesn't seem to be fully represented in any of the other Issues yet, though there are connections between it and:

As far as I understand, in the ecosystem we don't really have the ability to fully use and share gradient information (with the notable exception of neos). pyhf uses automatic differentiation, but this information is all internal and not currently accessible from outside the calculation (again c.f. neos).
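
As one illustration of what exposing that information could look like, here is a minimal sketch (the model and numbers are made up, and it assumes pyhf's jax backend) of pulling a gradient out of a pyhf likelihood with jax.grad rather than leaving it internal to the fit:

```python
# Minimal sketch: differentiate a pyhf negative log-likelihood with JAX.
# The model and observed counts below are illustrative only.
import jax
import jax.numpy as jnp
import pyhf

pyhf.set_backend("jax")

model = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0], bkg=[50.0], bkg_uncertainty=[7.0]
)
data = jnp.asarray([55.0] + model.config.auxdata)
init = jnp.asarray(model.config.suggested_init())

def nll(pars):
    # logpdf returns a length-1 tensor, so index to get a scalar for jax.grad
    return -model.logpdf(pars, data)[0]

# The gradient is now a first-class object that could be handed to an
# external optimizer or to another differentiable component downstream.
grad_nll = jax.grad(nll)
print(grad_nll(init))
```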

I'm biased as this topic is something directly related to the IRIS-HEP Analysis Systems goals, but I would be very interested to know:

  • What are the plans moving forward across different tools to add full automatic differentiation support?
  • What are the barriers for this right now?
  • What coordination and planning needs to happen across the ecosystem for the exchange of gradients to be useful?

(Though he won't be able to attend in person (c.f. #5 (comment)), it would be useful to include @phinate in these discussions.)

+1 good point, a topic by itself! It may go together with backends in general (the API is one aspect, but also JIT compilation, for example).

+1 of course!

There's certainly some blue-sky thinking needed to determine what to do with the gradient information, assuming we have it accessible. Most work so far has focused on a neos-type workflow as a practical middle ground between a traditional analysis and fully learning the likelihood (recent example from @lukasheinrich) -- are there ways in which we can go further? What happens if we have a differentiable simulator in the mix, for instance? Perhaps loss functions could involve matrix elements, for example, but in what way?

Just stirring the pot a bit, I have no good answers yet :)
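
To make the neos-style direction a bit more concrete, here is a toy JAX sketch (everything in it, including the s/sqrt(s+b) figure of merit and the softened cut, is a stand-in rather than anything taken from neos itself) of differentiating an analysis objective end to end through the event selection:

```python
# Toy sketch of a differentiable analysis objective: a sigmoid replaces the
# hard selection cut so the figure of merit has a gradient with respect to it.
import jax
import jax.numpy as jnp

def soft_counts(x, cut, slope=20.0):
    # Differentiable stand-in for "number of events with x > cut"
    return jax.nn.sigmoid(slope * (x - cut)).sum()

def neg_significance(cut, sig, bkg):
    s = soft_counts(sig, cut)
    b = soft_counts(bkg, cut)
    return -s / jnp.sqrt(s + b + 1e-8)  # simple s/sqrt(s+b) proxy

key_s, key_b = jax.random.split(jax.random.PRNGKey(0))
sig = 1.0 + 0.3 * jax.random.normal(key_s, (1_000,))
bkg = 0.0 + 0.5 * jax.random.normal(key_b, (5_000,))

# Gradient of the physics objective with respect to the selection cut; in a
# neos-style pipeline this would instead flow into neural-network weights and
# through the full statistical model (and, one day, a differentiable simulator).
print(jax.grad(neg_significance)(0.5, sig, bkg))
```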

+1 I'm particularly interested in how to do this at scale: identifying where network or serialization bottlenecks may occur in such a system, where checkpointing might be needed, and whether the fitter (or some other consumer of gradients) could be a distributed object in the network/cluster.
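
As a purely hypothetical illustration of where serialization enters in such a setup, the sketch below (the transport is left as a placeholder, and the quadratic loss stands in for a per-worker likelihood term) shows a worker-side gradient being copied to the host and packed into a message:

```python
# Hypothetical sketch: a worker computes a gradient locally and ships it to a
# remote fitter as a plain array. The actual transport (Dask, gRPC, an object
# store, ...) is out of scope here.
import pickle
import jax
import jax.numpy as jnp
import numpy as np

def loss(pars):
    return jnp.sum(pars ** 2)  # stand-in for a per-worker likelihood term

pars = jnp.arange(4.0)
grad = jax.grad(loss)(pars)

# Device arrays have to be materialized on the host before serialization;
# this copy is one place a bottleneck could show up at scale.
payload = pickle.dumps(np.asarray(grad))
print(len(payload), "bytes to move per gradient message")
```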

+1 Interested in general for MC generator applications.