This package provides (or will provide) methods for microbial community analyses. For now, I'm adding stuff as I need it, but pull-requests are more than welcome.
The AbundanceTable
type is treated like a 2D array where columns are samples
and rows are features (eg species). Sample and feature names are also stored,
and there's a convenience function if you want to convert a DataFrame
to an
AbundanceTable
, assuming the first column contains feature names:
using Microbiome
using DataFrames
df = DataFrame(species=["E. coli", "B. fragilis", "L. casei"],
sample1=[0.1, 0.4, 0.5],
sample2=[0.3, 0.7, 0.0],
sample3=[0.0, 0.2, 0.8])
abund = AbundanceTable(df)
@show abund
@show abund.samples
@show abund.features
Note: I've used a relative abundance table here, but that need not be the case.
You can also use raw counts, RPK etc. If you want relative abundance, you can
do relativeabundance(abund)
You can also filter on the n
most abundant features accross the dataset. This
function automatically generates an n+1
row for other
containing the
remaining features. Note - these doesn't modify in-place, so you've gotta
reassign if you want to update:
abund2 = filterabund(abund, 1)
@show abund2
@show abund2.features
Quite often, it's useful to boil stuff down to distances between samples. For
this, I'm using an interface with Distances.jl
to generate a symetric
DistanceMatrix
, which also contains a vector for samples, and a field
specifying which type of distance was used to calulate it. You can load one
in manually, or generate it from an AbundanceTable
.
using Distances
dm = getdm(abund, BrayCurtis())
@show dm
@show dm.labels
I've also implemented a method to do a principle coordinates analysis. If
necessary, you can include correct_neg=true
to use the correction method
described in Lingoes (1971)
p = pcoa(dm)
@show eigenvalue(p, 2)
@show principalcoord(p, 1)
I've included some plotting recipes for convenience using RecipesBase
.
using StatPlots
abund = AbundanceTable(
rand(100, 10), ["sample_$x" for x in 1:10],
["feature_$x" for x in 1:100])
abund = relativeabundance(abund)
plot(abund, title="Random abundance")
dm = getdm(abund, BrayCurtis())
p = pcoa(dm, correct_neg=true)
plot(p, title="Random PCoA")