jeetsukumaran / DendroPy

A Python library for phylogenetic scripting, simulation, data processing and manipulation.

Home Page: https://pypi.org/project/DendroPy/

phylogenetic distances with gigantic tree

b-tierney opened this issue · comments

Hi,

I have a massive tree (upwards of 100K microbes) and am concerned about memory limitations if I were to try to extract a phylogenetic distance matrix for all taxa in it. I'm hoping to use a subsampling strategy, looking at only ~1000 taxa at a time. Do you have any advice for how to compute pairwise distances for just a subset like this?

Thanks so much in advance.

Wow thank you so much for the fast response.

Ah so I can prune by taxon id? That sounds like what I need.

Current plan is to generate 10-20 random subsets. You're right that it may work out if I just use a big enough VM, but I'm not running at scale yet (still writing unit tests), so the strategy can definitely change. At one point I was considering just building a brand new tree for each iteration. Being able to prune gives me a solid backup plan, though.

Let me know if you have any other ideas, I deeply appreciate it.

Spectacular, thank you so much