vgl-hub / gfastats

A single fast and exhaustive tool for summary statistics and simultaneous *fa* (fasta, fastq, gfa [.gz]) genome assembly file manipulation.

Null results for statistics parameters

CEPHAS-01 opened this issue

Hi,

I just ran gfastats on a gfa file produced by hifiasm in trio mode to obtain the summary assembly statistics on scaffold length, N and L statistics etc. The command is as follows:

gfastats -f $inFile -t --stats > $outFile

The output I got is 0 for most of the parameters reported.

scaffolds 0

Total scaffold length 0
Average scaffold length nan
Scaffold N50 0
Scaffold auN 0.00
Scaffold L50 0
Largest scaffold 0
Smallest scaffold 0

contigs 0

Total contig length 0
Average contig length nan
Contig N50 0
Contig auN 0.00
Contig L50 0
Largest contig 0
Smallest contig 0

gaps in scaffolds 0

Total gap length in scaffolds 0
Average gap length in scaffolds 0.00
Gap N50 in scaffolds 0
Gap auN in scaffolds 0.00
Gap L50 in scaffolds 0
Largest gap in scaffolds 0
Smallest gap in scaffolds 0
Base composition (A:C:G:T) 0:0:0:0
GC content % nan

soft-masked bases 0

segments 21136

Total segment length 6154411558
Average segment length 291181.47

gaps 0

paths 0

edges 59590

Average degree 2.82

connected components 126

Largest connected component length 1596008570

dead ends 716

disconnected components 167

Total length disconnected components 115218292

separated components 293

bubbles 1052

circular segments 2

I am using the latest release (v1.3.6) of gfastats, extracted and compiled from the "gfastats.v1.3.6.tar.gz" file.

Am I running the command correctly?

Hi @CEPHAS-01,

Thanks for reaching out, and sorry, I realize this is a bit confusing. Conceptually, we do not consider a 'contig' to be defined in the GFA unless there is an actual path (potentially involving multiple segments) that defines its sequence. This is an attempt to distinguish contigs from segments. Your output reports 21136 segments but 0 paths, which is why the scaffold and contig statistics are all 0.
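To check whether a GFA file carries any paths at all, one can count its records by type. The snippet below is a minimal sketch, not part of gfastats: the function name `count_gfa_records` and the tiny example graph are made up for illustration, and it only assumes the standard GFA1 layout, where the first tab-separated field is the record type (`S` segment, `L` link, `P` path).

```python
from collections import Counter


def count_gfa_records(gfa_text):
    """Count GFA records by their first tab-separated field (S, L, P, ...)."""
    counts = Counter()
    for line in gfa_text.splitlines():
        if line.strip():
            counts[line.split("\t", 1)[0]] += 1
    return counts


# Hypothetical two-segment graph with one link and no P (path) lines.
example = (
    "H\tVN:Z:1.0\n"
    "S\tutg1\tACGT\n"
    "S\tutg2\tGGCC\n"
    "L\tutg1\t+\tutg2\t+\t0M\n"
)

counts = count_gfa_records(example)
print("segments:", counts["S"], "paths:", counts["P"])
```

A file with many `S` records and no `P` records corresponds to the situation in this issue: segments are reported, but there are no paths to evaluate as contigs.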

In practice, just add the option --discover-paths and it will generate a path for each segment, thus generating contigs that can then be evaluated in the stats.
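The idea of "one path per segment" can be sketched in a few lines of Python. This is only an illustration of the concept under stated assumptions, not how gfastats implements --discover-paths: the helper names (`segment_lengths`, `one_path_per_segment`, `n50_l50`), the `_path` naming scheme, and the example graph are all hypothetical.

```python
def segment_lengths(gfa_text):
    """Map segment name -> length, using the sequence or the LN:i: tag."""
    lengths = {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "S" or len(fields) < 3:
            continue
        name, seq = fields[1], fields[2]
        if seq != "*":
            lengths[name] = len(seq)
        else:
            # Sequence omitted: fall back to the optional LN:i:<length> tag.
            for tag in fields[3:]:
                if tag.startswith("LN:i:"):
                    lengths[name] = int(tag[5:])
    return lengths


def one_path_per_segment(lengths):
    """Hypothetical stand-in: wrap each segment in its own single-segment path."""
    return {f"{name}_path": [name] for name in lengths}


def n50_l50(values):
    """Return (N50, L50) for a list of lengths, or (0, 0) if it is empty."""
    ordered = sorted(values, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for rank, value in enumerate(ordered, start=1):
        running += value
        if running >= half:
            return value, rank
    return 0, 0


# Hypothetical graph: four segments of lengths 5, 4, 3 and 2, no paths.
example = (
    "S\ts1\tACGTA\n"
    "S\ts2\tACGT\n"
    "S\ts3\t*\tLN:i:3\n"
    "S\ts4\tAC\n"
)

lengths = segment_lengths(example)
without_paths = n50_l50([])  # no paths, so nothing is counted as a contig
paths = one_path_per_segment(lengths)
contig_lengths = [sum(lengths[s] for s in segs) for segs in paths.values()]
with_paths = n50_l50(contig_lengths)
print(without_paths, with_paths)
```

With no paths there is nothing to measure, so the contig statistics stay at zero. Once every segment has its own path, the lengths 5, 4, 3 and 2 give an N50 of 4 and an L50 of 2.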

I'll add a comment in the readme as well.

Hi Giulio,
Thanks for the prompt response.
It sure works well now with the --discover-paths option.
Would be nice to have that as a comment in the README as you suggested, so that other users would be rightly guided.

done, thank you for the input!