Need help on plotting model output against raw OD

Question

jnrsloth opened this issue 2 months ago · comments

Hey,
I'm trying to manually plot my pooled fitted data that is produced with

python3 amiga.py fit -i 'AMiGA_BY4741.txt' -o BY4741_YPD --interval 1200 --pool-by "Isolate,Substrate" --sample-posterior --save-cleaned-data --save-gp-data --verbose

However, when I plot the mu variable (which I believe is the fitted model's approximation of OD) against my raw OD values, the two look nothing alike.
When I don't pool my data, the OD_Growth_Fit overlays my raw data perfectly.
Is there a transformation I need to do to mu to make it align with my raw values for plotting? The worst MSE value I have across my different substrates is 0.06, which I would consider pretty good, so I imagine that I'm just missing some sort of trick.

Sorry if I missed something in the docs!

Firas Midani · Answer 1 · Thu May 30 2024 05:42:44 GMT+0800 (China Standard Time)

I think the key idea here is that AMiGA applies two basic manipulations to the growth curve before modeling it with Gaussian Process regression. These are highlighted in this page; search for the sub-heading "The log-based estimates of AUC, K, and Death are relative!". I am reproducing the relevant section here for clarity.

All growth curves should start with a non-zero OD, which indicates the starting size of the microbial population. To estimate exponential growth rates and other metrics such as the maximum specific growth rate, AMiGA must transform the OD data with a natural logarithm. Then, to account for variation in the starting OD, the measurement at the first time point is subtracted. In other words:

First, we apply natural log transformation
$$f(t) = \ln\text{OD}(t)$$

Second, we subtract first time point
$$f(t) = \ln\text{OD}(t) - \ln\text{OD}(0)$$

which is equivalent to
$$f(t) = \ln\left(\frac{\text{OD}(t)}{\text{OD}(0)}\right) $$

So at the first time point
$$f(0) = \ln\left(\frac{\text{OD}(0)}{\text{OD}(0)}\right) = \ln{1} = 0 $$

and all measurements of OD at other time points are thus relative to an arbitrary initial measurement OD(0). This affects several metrics, in particular AUC_log, K_log, and Death_log.
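As a rough illustration of the two manipulations (a NumPy sketch, not AMiGA's actual code), the transform can be written as follows. The function name `log_baseline_transform` and the OD readings are made up for the example.

```python
import numpy as np

def log_baseline_transform(od):
    """Return f(t) = ln(OD(t)) - ln(OD(0)) for a 1-D array of OD readings.

    Hypothetical helper for illustration; not part of AMiGA.
    """
    od = np.asarray(od, dtype=float)
    if np.any(od <= 0):
        raise ValueError("OD values must be positive before taking the log")
    log_od = np.log(od)          # step 1: natural log transformation
    return log_od - log_od[0]    # step 2: subtract the first time point

# Made-up readings for a single growth curve
raw_od = np.array([0.01, 0.02, 0.08, 0.40, 1.00])
f = log_baseline_transform(raw_od)
print(f)  # the first entry is 0 because ln(OD(0)/OD(0)) = ln(1)
```

Every value in `f` is relative to the first reading, which is why the log-based metrics are relative as well.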

When growth curves are modeled separately, one at a time, it is straightforward to convert the growth predicted by AMiGA back to the OD scale and check its fit against the raw growth curve. To do so, we simply exponentiate and then multiply by the baseline:
$$\exp{\left[f(t)\right]} \times \text{OD}(0) = \exp{\left[\ln\left(\frac{\text{OD}(t)}{\text{OD}(0)}\right)\right]} \times \text{OD}(0) = \text{OD}(t)$$
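A minimal sketch of this inverse step, assuming the transformed curve and its baseline OD(0) are available as NumPy values. The helper name `to_od_scale` and the readings are hypothetical.

```python
import numpy as np

def to_od_scale(f, od0):
    """Undo the log/baseline transform: exp(f(t)) * OD(0).

    Hypothetical helper for illustration; not part of AMiGA.
    """
    return np.exp(np.asarray(f, dtype=float)) * od0

# Made-up readings for a single growth curve
raw_od = np.array([0.01, 0.02, 0.08, 0.40, 1.00])
f = np.log(raw_od) - np.log(raw_od[0])   # what the model works with
recovered = to_od_scale(f, raw_od[0])    # back on the raw OD scale

print(np.allclose(recovered, raw_od))
```

For a single curve the round trip is exact up to floating-point error, because the baseline used to undo the transform is the same one that was subtracted.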

OD_Growth_Fit exponentiates the predicted growth curve but does not correct for the baseline, so it is not completely clear to me how it was able to overlay your raw data. Did you mean OD_Fit instead? Search for --save-gp-data in this page for more details.

In the case of pooled analysis, the mu prediction reverses neither of those manipulations. It basically models the baseline-corrected log-transformed data: $$f(t) = \ln\left(\frac{\text{OD}(t)}{\text{OD}(0)}\right) $$

In your case, you will have to reverse these steps manually. Do note: because you are modeling multiple growth curves, you don't have a single baseline. Instead, you have multiple baselines that differ from each other at least slightly. So you may never be able to perfectly recover your raw data from the predicted curve, because the GP regression model is inferring the mean baseline across all your growth curves. However, you can try to first exponentiate your mu values and then multiply by the baseline of each of your growth curves. Let's take the topmost green line in your plot, gp_mu_Y1. It looks like it reaches a maximum of about 5.02. If the baseline for its corresponding true data is about 0.0075, then:

$$\exp{\left(5.02\right)} \times 0.0075 \simeq 1.14$$

This is very close to the value of Y1 in your plot. Please try this out on your mu values and see whether you can overlay the un-transformed mu values on your raw data.
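The manual back-transformation for the pooled case could be sketched like this. It assumes you have already loaded the pooled mu values and the first OD reading of each replicate into arrays; the function name `pooled_mu_to_od` and all numbers except 5.02 and 0.0075 are made up for the example.

```python
import numpy as np

def pooled_mu_to_od(mu, baselines):
    """Scale one pooled mu curve by each replicate's own baseline OD(0).

    Hypothetical helper for illustration; not part of AMiGA.
    Returns an array of shape (n_curves, n_timepoints).
    """
    mu = np.asarray(mu, dtype=float)
    baselines = np.asarray(baselines, dtype=float)
    # exp(mu) is shared by all replicates; each row uses a different OD(0)
    return np.exp(mu)[np.newaxis, :] * baselines[:, np.newaxis]

# Placeholder pooled prediction (log scale, baseline-corrected)
mu = np.array([0.0, 1.5, 3.5, 5.02])
# Placeholder first OD reading for each replicate growth curve
baselines = np.array([0.0075, 0.0080, 0.0070])

od_pred = pooled_mu_to_od(mu, baselines)
print(od_pred[0, -1])  # exp(5.02) * 0.0075, roughly 1.136
```

Each row of `od_pred` can then be plotted over the matching raw curve. Because the pooled model shares one mu across replicates while the baselines differ, the overlay is expected to be close but not exact.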

jnrsloth · Answer 2 · Fri May 31 2024 15:51:00 GMT+0800 (China Standard Time)

Thank you!
I had tried the exponentiation step but forgot to multiply it by the baseline! The data fits much nicer now.

And you are right, I do mean OD_Fit. However, my growth curves were inoculated at T0 to just above the detection limit of my plate reader, so when the graph was small enough, OD_Growth_Fit looked like it was overlapping :)
Thanks again for your help!