frederic-mahe / mumu

C++ implementation of lulu, a R package for post-clustering curation of metabarcoding data

Log file interpretation

CristinaZb opened this issue:

Dear @frederic-mahe,
While exploring the log file, I noticed that it has no column headers, and I do not know what each column refers to.
Also, why does the number of "accepted" matches not agree with the number of final ASVs?

The manpage describes the log file and its different columns:

       -l, --log filename
                Output file for OTU merging statistics (18 columns separated by tabulations). OTUs are
                processed in no specific order. For a given query OTU  with  potential  parents,  mumu
                will  order potential parents by decreasing similarity with the query OTU, then by de‐
                creasing abundance, then by decreasing incidence (or spread),  and  finally  by  names
                (increasing ASCIIbetical order). Each potential parent is tested, and the search stops
                if parenthood criteria are matched or if the list is exhausted. The different  columns
                correspond to:

                       1.  name of query OTU.

                       2.  name of potential parent OTU.

                       3.  percentage of similarity (float value ranging from 0 to 100).

                       4.  total  abundance  of the query OTU (sum through all samples, positive inte‐
                           ger).

                       5.  total abundance of the potential parent OTU (sum through all samples, posi‐
                           tive integer).

                       6.  overlap  abundance  of the query OTU (sum through all samples where the po‐
                           tential parent OTU is also present, positive integer).

                       7.  overlap abundance of the potential parent  OTU  (sum  through  all  samples
                           where the query OTU is also present, positive integer).

                       8.  incidence  of  the  query  OTU  (number  of  samples where the query OTU is
                           present, positive integer).

                       9.  incidence of the potential parent OTU (number of samples where  the  poten‐
                           tial parent OTU is present, positive integer).

                       10. incidence of the potential parent OTU (number of samples where both the po‐
                           tential parent OTU and the query OTU are present, positive integer).

                       11. smallest abundance ratio (for each sample, compute the abundance of the po‐
                           tential  parent  OTU  divided  by  the abundance of the query OTU, find the
                           smallest value, float).

                       12. sum of the abundance ratios (positive integer).

                       13. average value of abundance ratios (float).

                       14. smallest non-null abundance ratio (exclude ratios  for  samples  where  the
                           query OTU is present but not the potential parent OTU, float).

                       15. average  value  of  non-null  abundance  ratios (exclude ratios for samples
                           where the query OTU is present but not the potential parent OTU, float).

                       16. largest ratio value (float).

                       17. relative co-occurrence value (number of samples  where  both  the  potential
                           parent  OTU  and the query OTU are present divided by the number of samples
                           where the query OTU is present, float).

                       18. status: 'accepted' or 'rejected'. The potential parent OTU  is  either  ac‐
                           cepted as a parent, or rejected.

                Abundance  and incidence values in the log file correspond to the values in the origi‐
                nal input table. Abundance and incidence values can be updated  only  when  the  whole
                dataset has been processed and all potential parents are known.

                Also,  to avoid circular linking among OTUs with the same abundance values, merging is
                only possible with parent OTUs that are strictly more abundant than the query OTU. For
                instance, OTUs of abundance one can only be merged with OTUs of abundance > 1.
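
Since the log file itself has no header line, one option is to prepend one before loading it into a spreadsheet or a stats tool. The sketch below is only an illustration: the short column labels and the function name `add_log_header` are hypothetical and not produced by mumu, they simply follow the column order given in the manpage.

```shell
# Hypothetical short labels for the 18 log columns, in manpage order.
# mumu does not write these names; they are assumptions for readability.
log_columns="query parent similarity query_abundance parent_abundance \
query_overlap_abundance parent_overlap_abundance query_incidence \
parent_incidence overlap_incidence smallest_ratio sum_ratios average_ratio \
smallest_non_null_ratio average_non_null_ratio largest_ratio \
relative_cooccurrence status"

add_log_header() {
    # $1: mumu log file (tab-separated, no header line)
    # Print the labels as one tab-separated line, then the log unchanged.
    printf '%s\n' "${log_columns}" | tr ' ' '\t'
    cat "$1"
}
```

For instance, `add_log_header log_file > log_file_with_header.tsv` would write a copy of the log with a header (the output filename is arbitrary).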

Also, why does the number of "accepted" matches not agree with the number of final ASVs?

'accepted' means that the query ASV on that line will be merged with a parent ASV. So your initial number of ASVs, minus the number of accepted merges, should give you the final number of ASVs.

For example, with a test dataset:

mumu \
    --otu_table tmp.table \
    --match_list tmp.match_list \
    --log log_file \
    --new_otu_table tmp3 \
    --minimum_match 84 \
    --minimum_ratio_type min

# check results
grep -c "accepted$" log_file

Output (mumu messages, then the result of grep):

parse OTU table... done, 29199 entries
parse match list... done
sort lists of matches... done
search for potential parent OTUs... done
merge OTUs... done
update spread values... done
write new OTU table... done, 19897 entries

9302 # accepted in log file

19897 + 9302 = 29199, as expected.
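
The same check can be wrapped in two small helper functions. This is a minimal sketch: the function names `count_accepted` and `expected_final_count` are hypothetical, and it assumes the status is the last tab-separated column of the log, as described in the manpage.

```shell
count_accepted() {
    # $1: mumu log file
    # Print the number of lines whose last column (status) is 'accepted'.
    awk -F'\t' '$NF == "accepted" { n++ } END { print n + 0 }' "$1"
}

expected_final_count() {
    # $1: number of entries in the input OTU table
    # $2: mumu log file
    # Print the initial count minus the number of accepted merges.
    echo $(( $1 - $(count_accepted "$2") ))
}
```

With the figures above, `expected_final_count 29199 log_file` should print 19897, the number of entries reported for the new OTU table.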

Thank you for the information, what you explain about accepted combinations makes complete sense.

I've noticed a minor bug in the log file (column 6: incorrect computation of the query overlap abundance when there is no overlap). It has no effect on the merging results, but you might want to install the new mumu 1.0.1 if you want to use the log file for visualization or exploratory stats.

Sorry for my ignorance, but I'm not able to upgrade the package: the terminal says 'Unable to locate mumu package'.
Also, the mumu --help command works for me, but man mumu does not. Any idea what is going on?

No problem.

To install mumu for the first time (assuming git is installed on your Linux system, in a terminal):

git clone https://github.com/frederic-mahe/mumu.git
cd ./mumu/
make
make check
sudo make install  # to install the binary and the manpage

To read the man page:

man mumu
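
If man mumu still fails while mumu --help works, it can help to check where the binary and the manpage are actually found. The function below is a generic diagnostic sketch, not something shipped with mumu: the name `check_install` is hypothetical, and it assumes a `man` implementation that supports the `-w` option (print the location of the page instead of displaying it).

```shell
check_install() {
    # $1: command name, for example mumu
    # Report where the binary is found in PATH, if anywhere.
    if command -v "$1" > /dev/null 2>&1 ; then
        echo "binary: $(command -v "$1")"
    else
        echo "binary: not found in PATH"
    fi
    # Report where the manpage is found, if man can locate it.
    if man -w "$1" > /dev/null 2>&1 ; then
        echo "manpage: $(man -w "$1")"
    else
        echo "manpage: not found"
    fi
}
```

Running `check_install mumu` and seeing a binary path but no manpage would suggest that only the binary was installed or copied, in which case running the install step again should add the manpage.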

To upgrade mumu:

cd ./mumu/  # go back to your mumu folder
make dist-clean
git pull
make
make check
sudo make install  # to install the new binary and the new manpage