LINCellularNeuroscience / VAME

Variational Animal Motion Embedding - A tool for time series embedding and clustering


Function for batch analysis of new files

drguggiana opened this issue · comments

Hihi!

First of all, thanks a ton for the work you have put into this; so far I'm super happy with VAME. Second, below I outline a sorta request/idea:

  • I already have a model trained on a very large portion of my data (~500k data points) that I'm very happy with
  • I'd like to incorporate new data (being produced daily) into the analysis, namely using the already trained encoder to obtain the latent space representation of the data, and also its motif structure (i.e. clustering)
  • This would ideally be a single function that I can incorporate into my pipeline and that lets me supply data independently of the VAME project folder, as I have my own data structure for other analyses that have nothing to do with VAME
  • I looked at pose_segmentation.py. If I get this right, it seems I would need to modify embedd_latent_vectors (as the data path is hardcoded from the VAME project folder structure) to get the latent representation, and also use parts of same_parameterization to get the motifs (see the sketch after this list)
  • This would also mean I'd have to save the kmeans object created from the trained model the first time I run vame.pose_segmentation() (i.e. I'd have to expose it in the code)
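To make the latent-extraction part concrete, here is a rough sketch of the function I have in mind. It mirrors embedd_latent_vectors but takes an arbitrary aligned array instead of reading from the project folder; the `encoder`/`lmbda` attribute names are copied from that function and may differ in other VAME versions:

```python
import numpy as np
import torch

def embed_new_data(model, data, time_window):
    """Embed egocentrically aligned data (n_features x n_timepoints) with a
    trained VAME model, returning one latent vector per sliding window.

    Mirrors embedd_latent_vectors, but takes the array directly instead of
    loading it from the VAME project folder. The `encoder` and `lmbda`
    attribute names are assumptions based on the current VAME code, and the
    model is assumed to live on the CPU (add .cuda() calls otherwise).
    """
    model.eval()
    latents = []
    with torch.no_grad():
        for i in range(data.shape[1] - time_window):
            # one window, shaped (1, time_window, n_features)
            window = np.reshape(data[:, i:i + time_window].T,
                                (1, time_window, -1)).astype(np.float32)
            h_n = model.encoder(torch.from_numpy(window))
            mu, _, _ = model.lmbda(h_n)      # use the latent mean as embedding
            latents.append(mu.cpu().numpy())
    return np.concatenate(latents, axis=0)   # (n_windows, zdims)
```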

The question is: is this something you are working on, have lying around, or is it already there and I missed it? Or should I just go ahead and try to code it myself (and send a pull request if this is of interest)? Hope this makes sense and I'm not completely off the mark. And thanks in advance for the help!


Hi!

Thank you for sharing your thoughts; you make some good points that would make VAME even more flexible.
We have already been discussing this idea but have not started working on it. The trained model, however, is saved, so you can always encode new data with it. The tricky part is the kmeans assignment, and I am not sure at the moment whether there is a way to save the sklearn kmeans object. But if that is possible, you would be able to assign the same motif numbers to new data.
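If joblib-style persistence works for the kmeans estimator (I have not tried it here), the usage would look roughly like this, with `kmeans` standing for the object fitted inside vame.pose_segmentation() once it is exposed, and the file name being just a placeholder:

```python
from joblib import dump, load

# persist the fitted estimator next to the other VAME results
dump(kmeans, "kmeans_motifs.joblib")

# later: reload it and give new latent vectors the same motif numbers
kmeans = load("kmeans_motifs.joblib")
new_motifs = kmeans.predict(new_latent_vectors)
```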

We might look into that as well, but if you need something like this soon, you can give it a shot, and if it works we would be happy to include it in the code!

Cheers,
Kevin

Hi Kevin,

Thanks for the reply. Indeed I kinda wanna use it soon, so I'll get to it and keep y'all posted :) (since the kmeans cluster centers are saved, I might be able to reconstruct the kmeans object with those, but let's see)
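If it does work out, the reconstruction should amount to a nearest-centre assignment; a minimal sketch (the .npy file names are placeholders for whatever VAME saved during pose segmentation):

```python
import numpy as np
from sklearn.metrics import pairwise_distances_argmin

# cluster centers saved by the original vame.pose_segmentation() run
# (placeholder file name; use whatever .npy VAME wrote the centers to)
centers = np.load("cluster_center_train.npy")

# latent vectors of the new data, e.g. produced with the trained encoder
# (also a placeholder file name)
new_latents = np.load("latent_vector_new_session.npy")

# nearest-centre assignment reproduces the k-means labels without refitting,
# so the new data gets the same motif numbering as the original run
motifs = pairwise_distances_argmin(new_latents, centers)
```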

Best,

Drago

Hi @drguggiana,

I'm also looking into having the same functionality - did you succeed in implementing it?

Hi @chesnov,

I did. The issue is that I have a slightly nightmarish setup, so some parts of my solution are very "me"-specific, hence I didn't go for a pull request (sadly I don't have the time to write a fully general solution at the moment).

That said, if you look in my branched VAME repo, the relevant changes are in pose_segmentation.py, where I wrote a function to do the batching (plus changes in the init files to expose it). The actual implementation is in my prey_capture repo (prey_capture/snakemake_scripts/run_latents.py), lines 65 to 92, where I grab trajectories, align them egocentrically, extract the latents with the aforementioned function, and finally reconstruct the kmeans object to get the motifs (roughly as in the sketch below). Heads up: the new version of VAME does clustering differently and I haven't updated mine yet, so this will probably need more changes directly in VAME.
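Heavily simplified, the flow of those lines is the following (here `load_trajectories`, `align_egocentric`, and `batch_embedd` are placeholders for my own helpers rather than VAME API, and the file names are made up):

```python
import numpy as np
from sklearn.metrics import pairwise_distances_argmin

# 1) grab the new trajectories and align them egocentrically
#    (load_trajectories / align_egocentric are placeholders for my helpers)
aligned = align_egocentric(load_trajectories("new_session.h5"))

# 2) push the aligned series through the trained encoder
#    (batch_embedd is the batching function added to pose_segmentation.py)
latents = batch_embedd(model, aligned, time_window)

# 3) rebuild the motif assignment from the cluster centers saved by the
#    original pose_segmentation run (placeholder file name)
centers = np.load("cluster_center_train.npy")
motifs = pairwise_distances_argmin(latents, centers)
```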

Not sure how useful this is for you, but feel free to hit me up if you have any questions.


Thank you again for your comments @drguggiana! I will close this issue for now, as we are preparing an update of VAME, hopefully within the next few months, and I will keep your ideas in mind for it.

Cheers,
Kevin