Module request: bamdst
hongtir opened this issue
Name of the tool
bamdst
Tool homepage
https://github.com/shiquan/bamdst
Tool description
Bamdst is a lightweight tool to stat the depth coverage of target regions of bam file(s).
Tool output
## The file was created by bamdst
## Version : 1.0.9
## Files : ST0217_Lg.bam
[Total] Raw Reads (All reads) 766026326
[Total] QC Fail reads 0
[Total] Raw Data(Mb) 113517.44
[Total] Paired Reads 766026326
[Total] Mapped Reads 750267810
[Total] Fraction of Mapped Reads 97.94%
[Total] Mapped Data(Mb) 111466.09
[Total] Fraction of Mapped Data(Mb) 98.19%
[Total] Properly paired 735084416
[Total] Fraction of Properly paired 95.96%
[Total] Read and mate paired 744300282
[Total] Fraction of Read and mate paired 97.16%
[Total] Singletons 5967528
[Total] Read and mate map to diff chr 5429012
[Total] Read1 383013163
[Total] Read2 383013163
[Total] Read1(rmdup) 359498611
[Total] Read2(rmdup) 359195220
[Total] forward strand reads 375142297
[Total] backward strand reads 375125513
[Total] PCR duplicate reads 31573979
[Total] Fraction of PCR duplicate reads 4.21%
[Total] Map quality cutoff value 20
[Total] MapQuality above cutoff reads 694167039
[Total] Fraction of MapQ reads in all reads 90.62%
[Total] Fraction of MapQ reads in mapped reads 92.52%
[Target] Target Reads 343938951
[Target] Fraction of Target Reads in all reads 44.90%
[Target] Fraction of Target Reads in mapped reads 45.84%
[Target] Target Data(Mb) 50324.92
[Target] Target Data Rmdup(Mb) 47435.18
[Target] Fraction of Target Data in all data 44.33%
[Target] Fraction of Target Data in mapped data 45.15%
[Target] Len of region 1442021955
[Target] Average depth 34.90
[Target] Average depth(rmdup) 32.89
[Target] Coverage (>0x) 93.55%
[Target] Coverage (>=4x) 93.52%
[Target] Coverage (>=10x) 93.33%
[Target] Coverage (>=30x) 74.65%
[Target] Coverage (>=100x) 0.07%
[Target] Target Region Count 24429
[Target] Region covered > 0x 21804
[Target] Fraction Region covered > 0x 89.25%
[Target] Fraction Region covered >= 4x 89.23%
[Target] Fraction Region covered >= 10x 89.18%
[Target] Fraction Region covered >= 30x 83.59%
[Target] Fraction Region covered >= 100x 0.34
[flank] flank size 200
[flank] Len of region (not include target region) 1451274993
[flank] Average depth 34.88
[flank] flank Reads 346031656
[flank] Fraction of flank Reads in all reads 45.17%
[flank] Fraction of flank Reads in mapped reads 46.12%
[flank] flank Data(Mb) 50621.18
[flank] Fraction of flank Data in all data 44.59%
[flank] Fraction of flank Data in mapped data 45.41%
[flank] Coverage (>0x) 94.56%
[flank] Coverage (>=4x) 94.52%
[flank] Coverage (>=10x) 94.32%
[flank] Coverage (>=30x) 75.40%
[flank] Coverage (>=100x) 0.07%
Log filename pattern
coverage.report
Data suitable for MultiQC plot(s)
- Fraction of Mapped Reads
- Average depth
Most interesting data for the General Stats table
- Fraction of Mapped Reads
- Average depth
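A minimal sketch of how the two metrics above could be read out of coverage.report. It assumes that each data line ends with its value as the last whitespace-separated token (as in the pasted output) and that header lines start with "##". The function name parse_coverage_report is hypothetical and is not taken from bamdst or MultiQC.

```python
def parse_coverage_report(text):
    """Parse the text of a bamdst coverage.report into (header, metrics) dicts."""
    header, metrics = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("##"):
            # Header lines look like "## Version : 1.0.9"; lines without ":" are skipped
            key, sep, value = line.lstrip("#").partition(":")
            if sep:
                header[key.strip()] = value.strip()
            continue
        # Assumption: the value is the last whitespace-separated token on the line
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue
        key, raw = parts
        try:
            # Percentages such as "97.94%" are stored as plain floats
            metrics[key.strip()] = float(raw.rstrip("%"))
        except ValueError:
            continue
    return header, metrics
```

Keys keep their section prefix, so the two values listed above would be looked up as metrics["[Total] Fraction of Mapped Reads"] and metrics["[Target] Average depth"]. Keeping the prefix avoids a clash with "[flank] Average depth".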
Before submitting
- I have included example data (zipped, not pasted) that can be used to write the module.
Thank you @hongtir for the test example!
One question: in the header, the word "Files" is plural: ## Files : ST0217_Lg.bam. Is it possible that the tool might be run on multiple BAM inputs? What would the output look like in this case?
Alright, seems like it supports multiple BAM inputs, but they can't come from different samples - the idea is that those BAMs would be split by chromosome, and have no overlap in positions. Added a test example for that: https://github.com/ewels/MultiQC_TestData/tree/master/data/modules/bamdst/multiple-inputs
Now, it's not clear what the best source is for the sample name in the report. The file name is coverage.report and cannot be overwritten, and we can't assume users would rename the file. The ## Files header can list multiple files.
I guess the best bet would be to take the first file name listed in ## Files. One problem is that it's not clear how to split them up, since file names can contain spaces:
## Files : exam ple/test1.bam exam ple/test2.bam
I'd just assume that the file names end in either .bam or .cram.
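A rough sketch of that splitting rule, assuming every input name ends in .bam or .cram and is followed by whitespace or the end of the line. The helper names split_files_header and first_sample_name are hypothetical, and this is not necessarily what the module does.

```python
import os
import re

# Assumption from the thread: every input ends in .bam or .cram.
# A name is taken lazily up to the first such extension that is
# followed by whitespace or the end of the header value.
_FILE_RE = re.compile(r"\s*(.+?\.(?:bam|cram))(?=\s|$)", re.IGNORECASE)


def split_files_header(value):
    """Split the value of a '## Files :' header into file names."""
    return [m.group(1) for m in _FILE_RE.finditer(value)]


def first_sample_name(value):
    """Base name of the first listed file, without the .bam/.cram extension."""
    files = split_files_header(value)
    if not files:
        return None
    name = os.path.basename(files[0])
    return re.sub(r"\.(?:bam|cram)$", "", name, flags=re.IGNORECASE)
```

For the header above, split_files_header("exam ple/test1.bam exam ple/test2.bam") gives the two paths with their spaces intact. The rule remains ambiguous for a single file whose name itself contains ".bam " in the middle, for example "my.bam file.bam", which would be split in two.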
Started a module here: #2161
Only table columns from coverage.report for now. Suggestions on what plots to add are welcome :)
> The file name is coverage.report and cannot be overwritten, and we can't assume users would rename the file.
I think users would have to rename this to sample_id.coverage.report, because if they don't, and bamdst is run on multiple samples, there might be unwanted consequences unless the files from different samples are staged carefully.
Yes, but people could put the files in different directories or all sorts. And it's good to never assume that users take care 😅
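One way to cover both situations is a fallback chain, sketched below. It is an illustration only: the function pick_sample_name, its arguments, and the order of the fallbacks are assumptions, not the behaviour of the module in the pull request.

```python
from pathlib import PurePosixPath

DEFAULT_NAME = "coverage.report"


def pick_sample_name(report_path, first_input=None):
    """Choose a sample name for one coverage.report (hypothetical helper).

    report_path: path of the report as found on disk.
    first_input: first file listed in the '## Files' header, if any.
    """
    path = PurePosixPath(report_path)
    suffix = "." + DEFAULT_NAME
    # 1. The user renamed the report, e.g. "sampleA.coverage.report"
    if path.name != DEFAULT_NAME and path.name.endswith(suffix):
        return path.name[: -len(suffix)]
    # 2. Otherwise use the first input file from the header, extension dropped
    if first_input:
        return PurePosixPath(first_input).stem
    # 3. Last resort: the containing directory, or the file name itself
    return path.parent.name or path.name
```

With this order, a renamed report wins over the header, and reports kept in per-sample directories still get distinct names when the header is missing.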