probabilists / zuko

Normalizing flows in PyTorch

Home Page: https://zuko.readthedocs.io

Generating samples with their log-probability

simonschnake opened this issue

Description

There are some use cases (at least I have one) where one also needs the ladj
while computing the inverse operation. In my use case, I am not only using the normalizing flow
to generate samples, but I also want to know the likelihood of the produced samples.

Implementation

The implementation could be somewhat hard, because in principle every transformation would need to include an additional
method inverse_and_ladj. I am willing to help and contribute pull requests for that. My main focus at the moment is neural spline flows.

I'm mainly opening this issue to figure out whether this feature is wanted; one also has to decide how to expose the functionality in the classes that consume the transformations.

This feature would be particularly important for applications like importance sampling, where samples and their probability should be computed in as few passes as possible.
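For illustration (this snippet is not from the issue itself), the current two-pass pattern for importance sampling looks roughly like the following; the choice of NSF and the toy target density are placeholders only:

import torch
import zuko

flow = zuko.flows.NSF(features=2)  # any Zuko flow would do here
dist = flow()                      # unconditional NormalizingFlow distribution

x = dist.rsample((1024,))          # pass 1: inverse transform of base samples
log_q = dist.log_prob(x)           # pass 2: forward transform again, only to score x

# importance weights with respect to a toy target density p
p = torch.distributions.Independent(
    torch.distributions.Normal(torch.zeros(2), torch.ones(2)), 1
)
w = torch.exp(p.log_prob(x) - log_q)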

I'm not sure whether I have understood the Zuko code well enough, but would it work to just change this line? https://github.com/francois-rozet/zuko/blob/master/zuko/distributions.py#L121 from

def rsample(self, shape: Size = ()) -> Tensor:
    if self.base.has_rsample:
        z = self.base.rsample(shape)
    else:
        z = self.base.sample(shape)

    return self.transform.inv(z)

to

def rsample(self, shape: Size = ()) -> Tensor:
    if self.base.has_rsample:
        z = self.base.rsample(shape)
    else:
        z = self.base.sample(shape)

    return self.transform.inv.call_and_ladj(z)  # with a proper inversion of the prob.

I would be happy to help adding this feature 👍

Hello @simonschnake and @valsdav, thank you for your interest in Zuko 🔥 This is a sensible request. I just have a few questions:

  1. Do you need the ladj of the inverse or the log_prob of your samples?

The former would require modifying all transformations, while the latter might only require modifying the NormalizingFlow class.

  2. Do you need this as a convenience enhancement or as a performance improvement?

This could dramatically change the way the feature is implemented.

I would need the log_prob of the produced samples. To my understanding, the correct relation is log_prob(y) = log_prob(x) + ladj(x), so calculating the ladj of the inverse transformation would be necessary.
If you look at other libraries such as nflows and Distrax, they compute the ladj in both directions.
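As a quick sanity check of that relation (plain PyTorch, independent of Zuko), here is a toy example where the sampling transformation is exp and the resulting density is known in closed form; note that the sign of the ladj flips depending on whether the Jacobian is taken in the data-to-base or the base-to-data direction:

import torch
from torch.distributions import Normal, LogNormal

base = Normal(0.0, 1.0)
x = base.sample((5,))
y = torch.exp(x)                      # sampling direction: base sample -> data
ladj = x                              # log|d exp(x) / dx| = x
log_prob_y = base.log_prob(x) - ladj  # change of variables with the ladj of the sampling map

# LogNormal(0, 1) is the exact density of y, so the two must agree
print(torch.allclose(log_prob_y, LogNormal(0.0, 1.0).log_prob(y)))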

I have roughly written an example using the NSF Flow. I could provide a pull request.

Thanks a lot for your help
Simon

Before submitting a PR, we need to decide how the feature should be implemented. It is a major API change so I don't want to rush it. IMHO, nflows does this in a bad way: transformations always compute their ladj, which is a waste of resources most of the time.

I like the idea of @valsdav to use the call_and_ladj of the inverse transform instead of adding a new method inverse_and_ladj. However, as is, it would not be more efficient, since the call_and_ladj function of Inverse first applies _inverse and then computes log_abs_det_jacobian. Hence, in terms of performance, it would be equivalent to a convenience method

def rsample_and_log_prob(self, shape: Size = ()) -> Tuple[Tensor, Tensor]:
    x = self.rsample(shape)
    log_p = self.log_prob(x)
    return x, log_p

My idea is to modify the inv property of ComposedTransform such that it does not return an Inverse transform, but a new ComposedTransform with its transformations reversed. With that modification, calling inv.call_and_ladj(y) should be more efficient. However, this "performance bump" will likely be negligible with respect to the cost of inverting autoregressive transformations anyway.
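To make that concrete, here is a minimal sketch (not Zuko's actual code) of how a composed transform can return the output and the accumulated ladj in a single pass, assuming each sub-transform exposes a call_and_ladj method:

def call_and_ladj_composed(transforms, x):
    # apply each sub-transform exactly once, summing the log-abs-det-Jacobians,
    # instead of applying the chain once for the value and once more for the ladj
    ladj = 0.0
    for t in transforms:
        x, l = t.call_and_ladj(x)
        ladj = ladj + l
    return x, ladj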

My proposition is something like

diff --git a/zuko/distributions.py b/zuko/distributions.py
index b975e76..82a02e1 100644
--- a/zuko/distributions.py
+++ b/zuko/distributions.py
@@ -120,6 +120,18 @@ class NormalizingFlow(Distribution):
 
         return self.transform.inv(z)
 
+    def rsample_and_log_prob(self, shape: Size = ()) -> Tuple[Tensor, Tensor]:
+        if self.base.has_rsample:
+            z = self.base.rsample(shape)
+        else:
+            z = self.base.sample(shape)
+
+        log_p = self.base.log_prob(z)
+        x, ladj = self.transform.inv.call_and_ladj(z)
+        ladj = _sum_rightmost(ladj, self.reinterpreted)
+
+        return x, log_p - ladj
+
 
 class Joint(Distribution):
     r"""Creates a distribution for a multivariate random variable :math:`X` which
diff --git a/zuko/transforms.py b/zuko/transforms.py
index e926ee6..e4c087b 100644
--- a/zuko/transforms.py
+++ b/zuko/transforms.py
@@ -107,6 +107,17 @@ class ComposedTransform(Transform):
             x = t(x)
         return x
 
+    @property
+    def inv(self):
+        new = self.__new__(ComposedTransform)
+        new.transforms = [t.inv for t in reversed(self.transforms)]
+        new.domain_dim = self.codomain_dim
+        new.codomain_dim = self.domain_dim
+
+        Transform.__init__(new)
+
+        return new
+
     def _inverse(self, y: Tensor) -> Tensor:
         for t in reversed(self.transforms):
             y = t.inv(y)

It provides a slight boost in performance, and I think it would make back-propagation slightly more stable.
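For what it's worth, once something like this lands, a hypothetical usage (the method name is taken from the diff above, everything else is illustrative) could be a reverse-KL fit, which needs differentiable samples together with their log-density in a single pass:

import torch
import zuko

flow = zuko.flows.NSF(features=2)
target = torch.distributions.Independent(
    torch.distributions.Normal(torch.zeros(2), torch.ones(2)), 1
)

x, log_q = flow().rsample_and_log_prob((256,))  # samples and their log-density in one pass
loss = (log_q - target.log_prob(x)).mean()      # reverse KL surrogate
loss.backward()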

That seems very elegant and nice, but I have the feeling that it can get very complicated.
We would have to design a second, inverse class for each transformation class.

I quickly drafted a fitting version of the MonotonicAffineTransform
here: simonschnake/zuko@master...simonschnake-patch-1

It could be that I misunderstand the architecture and it is actually a lot simpler.

Sorry, I was not clear enough. Only transformations that can profit from a faster .inv.call_and_ladj should have their inv property overridden. Currently, this is only the case for ComposedTransform and maybe FreeFormJacobianTransform. For univariate transformations it would be useless, as they are typically used within AutoregressiveTransform.

Thanks a lot @francois-rozet ! 💯