vgl-hub / gfastats

A single fast and exhaustive tool for summary statistics and simultaneous *fa* (fasta, fastq, gfa [.gz]) genome assembly file manipulation.

fasta-to-gfa conversion

YanWangTF opened this issue

Thank you for making this tool available. When I converted a FASTA file to GFA, the number of sequences in the output file increased: a few scaffolds were split into multiple 'segments'. Is there a flag that keeps the scaffolds in the output file the same as in the input FASTA file?

Hi @YanWangTF

Thanks for reaching out. Can you provide an example? In GFA, scaffolds are supposed to be represented as multiple segments (S lines) connected by gaps (J lines). This is what the format specifies (http://gfa-spec.github.io/GFA-spec/GFA1.html), and it makes sense: segments should not contain N bases.

Best
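
To illustrate the point above about one scaffold becoming several segments, here is a minimal Python sketch. It is not gfastats' actual implementation; the function name `split_scaffold` and the toy sequence are hypothetical, and the sketch ignores gaps at the very start or end of a sequence.

```python
import re

GAP = re.compile(r"[Nn]+")  # assumption: a gap is modelled as a run of N bases

def split_scaffold(seq):
    """Return (segments, gap_lengths) for one scaffold sequence."""
    segments = [s for s in GAP.split(seq) if s]  # gap-free pieces, like S lines
    gaps = [len(g) for g in GAP.findall(seq)]    # one entry per N run, like J lines
    return segments, gaps

# Toy scaffold with two gaps (made-up data)
segments, gaps = split_scaffold("ACGTNNNNNGGCCNNTTA")
print(segments)  # ['ACGT', 'GGCC', 'TTA']
print(gaps)      # [5, 2]
```

In this model, a scaffold with k internal gaps yields k + 1 segments, which is why the segment count in the GFA can exceed the scaffold count in the FASTA.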

Hi @gf777, thank you for the clarification! I used gfastats to convert a genome assembly (scaffolded using Hi-C data) from FASTA to GFA, which worked smoothly. When I used Bandage to draw the graph, however, it showed more nodes than there are scaffolds: the assembly has 22 scaffolds, but the graph shows 27 nodes. Is there an alternative tool for visualizing the GFA that draws scaffolds instead of segments, or that also shows the links between segments? Thank you.
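
The gap between the scaffold count and the node count can be checked by tallying record types in the GFA file. Below is a rough Python sketch, assuming tab-separated GFA records; the helper name `count_records` and the toy GFA text are made up for illustration and are not real gfastats output.

```python
from collections import Counter

def count_records(gfa_text):
    """Count GFA records by their first tab-separated field (H, S, J, ...)."""
    return Counter(
        line.split("\t", 1)[0]
        for line in gfa_text.splitlines()
        if line.strip()  # skip blank lines
    )

# Toy GFA: one scaffold stored as three segments joined by two gaps (made-up data)
toy_gfa = "\n".join([
    "H\tVN:Z:1.2",
    "S\tseg1\tACGT",
    "S\tseg2\tGGCC",
    "S\tseg3\tTTA",
    "J\tseg1\t+\tseg2\t+\t5",
    "J\tseg2\t+\tseg3\t+\t2",
])

counts = count_records(toy_gfa)
print(counts["S"], counts["J"])  # 3 2
```

A viewer that draws one node per S line would show three nodes for this single scaffold. By the same reasoning, if every gap splits one segment into two, 22 scaffolds drawn as 27 nodes would correspond to 5 gaps in total.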

Hi @YanWangTF
Have a look at https://github.com/asl/BandageNG, the currently maintained "nextGen" version of Bandage. It also supports J lines.

Thank you @gf777! BandageNG worked perfectly with the output from gfastats this time, including the links and merging options. I much appreciate your help with this.