smarco / WFA2-lib

WFA-lib: Wavefront alignment algorithm library v2

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

wf_aligner->cigar->operations is blank

lpipes opened this issue · comments

I tried to align two sequences:

GGAGCATCAGTAGACCTAACCATTTTCTCCCTCCATCTAGCAGGAGTATCTTCAATCCTAGGAGCAATTAATTTCATCACTACAATTATTAACATAAA
CCTCAGAAGAAACGCTGCTCACGCCGGGTCTTCAGTTGATTTTGCCATTTTTTCTTTGCACTTGGCGGGAGTTTCTTCTCTATTAGGGGCCGTCAATTTTATTAGAACTTTAAGCAATTTGCGAATTTTTGGGATATTTTTGGATCAGATGCCCCTTTTTTGTTGAGCAGTATTAATTACGGCAGTTCTATTGCTTTTATCCTTACCCGTTCTCGCGGGGGCTATTACCATGTTGCTAACAGACCGCAATTTGAATTCCAGATTTTATGACGTTAGGGGAGGGGGGGATCCCGTGCTCTACCAGCATTTATTT

wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
attributes.distance_metric = gap_affine;
attributes.affine_penalties.mismatch =4;
attributes.affine_penalties.gap_opening = 6;
attributes.affine_penalties.gap_extension = 2;
// Initialize Wavefront Aligner
wavefront_aligner_t* const wf_aligner = wavefront_aligner_new(&attributes);
wavefront_align(wf_aligner,seq1,strlen(seq1),seq2,strlen(seq2)); //Align

But the cigar->operations doesn't contain anything. Isn't this supposed to contain the cigar string?

(gdb) p wf_aligner->cigar->begin_offset
$38 = 3686
(gdb) p wf_aligner->cigar->end_offset
$39 = 3999
(gdb) p wf_aligner->cigar->operations
$40 = 0x7ffff28e5b98 ""

Hi,

Ok. The point is that cigar->operations is a raw buffer storing the CIGAR starting at wf_aligner->cigar->begin_offset. Try with these simple modifications:

  char* pattern = "GGAGCATCAGTAGACCTAACCATTTTCTCCCTCCATCTAGCAGGAGTATCTTCAATCCTAGGAGCAATTAATTTCATCACTACAATTATTAACATAAA";
  char* text    = "CCTCAGAAGAAACGCTGCTCACGCCGGGTCTTCAGTTGATTTTGCCATTTTTTCTTTGCACTTGGCGGGAGTTTCTTCTCTATTAGGGGCCGTCAATT"
                  "TTATTAGAACTTTAAGCAATTTGCGAATTTTTGGGATATTTTTGGATCAGATGCCCCTTTTTTGTTGAGCAGTATTAATTACGGCAGTTCTATTGCTT"
                  "TTATCCTTACCCGTTCTCGCGGGGGCTATTACCATGTTGCTAACAGACCGCAATTTGAATTCCAGATTTTATGACGTTAGGGGAGGGGGGGATCCCGT"
                  "GCTCTACCAGCATTTATTT";

  wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
  attributes.distance_metric = gap_affine;
  attributes.affine_penalties.mismatch = 4;
  attributes.affine_penalties.gap_opening = 6;
  attributes.affine_penalties.gap_extension = 2;
  // Initialize Wavefront Aligner
  wavefront_aligner_t* const wf_aligner = wavefront_aligner_new(&attributes);
  int status = wavefront_align(wf_aligner,pattern,strlen(pattern),text,strlen(text)); //Align
  fprintf(stderr,"Status = %d\n",status);
  // Display result
  fprintf(stderr,"Offsets = [%d,%d)\n",wf_aligner->cigar->begin_offset,wf_aligner->cigar->end_offset);
  fprintf(stderr,"CIGAR = %.*s\n",
      wf_aligner->cigar->end_offset-wf_aligner->cigar->begin_offset,
      wf_aligner->cigar->operations+wf_aligner->cigar->begin_offset);

Ok thanks for the explanation! I am getting this to work now. However, I ran into another issue with cigar_print_pretty with a different set of sequences. My pattern is:
GCTAGCAGGTAACTTAGCCCATGCCGGAGCATCAGTAGATCTAGCTATTTTTTCCCCTACACCTAGGCAGGAGTCTCTTCAATTCTAGGAGCAATCAACTTCATCACAACTATTATTAACATGAAACCACCAGGAATATCACAAGACCAAATCCCCCTATTTGTATGATCTGTATTAATTACAGCAGTTCCCTACTACTCCTATCACTACCTGTCTTAGCAGGGGCCATCACAATGCTACTAACAGATCGTAACCTAAATACATCCTTTTTCGACCCAGCAGGAGGAGGAGACGACCCCCAATTCTATATCAACACTTTTTTTC
My text is:
CCTTAGGGGAAGAATCTTTCATTCTTCCCCTGCGGTCGATATTTCCATATTTTGTTATGAAAATATCGGCACCGGGTCCGTTATTTTAGCTATTAATATAGTGGTAACTATTATTAATATACGTTCACCACACATAACTCTTAGACGAATATCATTATTTCCATGAAGCCTATTGGTCGCCAGGATCCTTCTTCTACTGGCGATGCCGGTACTCATAGGGGCTATCTCAATGTTGGTACTGGATCGAAATTTCGACACTAGATTTTTCGATCCAACGGGGGGCGGAGACCCCATCTTGTACCAACACTTATTCTGATTCTTCGGGCATCCTGAGGTCTACATTCTGATTTTGCCCGGATTTGGTATCATCTCGCATATTATTGCCCACTACTCCGCAAAGCCGACAGTTTTTGGTTCGCTGGGTATAATTTATGCTATATTAGGAATTGGGGTACTAGGTTTCGTTGTCTGGGCTCACCATATGTTTACGGTTGGCATGGATATCGACACCCGTGCTTATTTCACCGCAGCTACTATAATTATCGCAGTCCCGACAGGCATTAAGATTTTCAGTTGATTAGCCACTATTTGAGGAACTAATATTCATTACGAGACGCCAATACTCTGGGCCTTAGGGTTTATTTTCATATTTACCTTAGGAGGGTTAACAGGGATCGTTCTTGCTAATTCGTCTTTAGACATCGTCCTCCATGATACGTTCTATGTCGTGGCTCATTTTCATTACGCTCTTTCTATGGGGACGGTTTTCTCTATTTTTGGGGGTGTTAATTACTGGTTCCCAGTGATAACCGGGCTCACCCTCCATAGTCGGTGAACGAAGATCCACTTCTTTATTACATTTATTGGGGTTAATTTAACCTTCTTCCCCCAACACTTTTTAGGGTTAGCAGGCATACCTCGCCGGTATTCCGATTATCCCGACGCTTATGCGATGTGAAATTCCCTTTCTTCAATTGGTTCAATAATTTCTTTTGTAGGGCTATTGTTTTTCATCTTTATTTTGTGGGAAAGTTTTGCTGCCCAGCGTGGGGTTATTAGTCCTACCGCTCAAGTAGCTTCAGCAGAGTGAGGGTTCATGCCCTTGAACCCAATGACTGAGTTGGAGCCTATTTCTTTTACTCGTTCAACCCCTTTCTTCGCCTTTATAACTCCTTAATGACACAATAGATATAAGAAATTTAGAGACCAGACTACACTGCAAAATGTAACACTTTTATTGTTTCTCTTACCCCCTAACCACAAAGCTTATACCTGTTACTCTGTAAATATACAAGATGCATCTCCACCTTCTCCCCGTTATACCTAGCCGTGTGTAGTTTCCTAGTGTCTAAATTTCCCCATAACCATATTAATATATACGTTGTTTCCTTAGGCCCTAATATTATATATTTGCACTACAATTACTTCCGGCTTTACGTTAATATGGTATTCACCTCTTTTGTGTAATGACTAAATTTGCGGTTCATATTTTGGTTTATGATGTCCTTACGCCCCTTATACTATAAGTTGTAACATTCAGAGCTTGATTACATGTCCTTAAATTTAACACTCTTACTTATGTTATAGTTTGAGAGAAATAGTCTAGTAGCACCTCCTTTAACCGCTTTTATTGAGGCCGCTGGCTATTTTCTACTAATCATAAAGATATTGGGACCTTATATTTTATTTTAGGTACTTGGGCAGGCCTCCTTGGGACCTCAATAAGATTACTTATTCGAGCTGAACTAGGACAACCAGGGTCTCTACTAGGTAGAGACCAGCTGTATAATACTATCGTAACTGCCCACGCCTTTTTAATAATTTTTTTCTTAGTTATGCCTATTATAATTGGGGGATTTGGTAACTGGTTAATCCCCCTTATGCTTGGGGCTCCAGACATAGCCTTTCCACGGTTAAACAATATGAGGTTTTGATTACTGCCCCCTTCATTGTTTCTACTTGTAATAAGGTCAGTAGTTGAGAAAGGGGTAGGCACTGGGTGAACAGTTTATCCTCCTCTATCTGGAAACATTGCACACGCCGGAGCCTCAGTTGACTTAGCGATTTTCAGTCTTCATCTAGCTGGAGTGTCCTCGATTTTAGGGGCCCTAAACTTTATTACTACGGTTATCAACATACGGCCAAATGCTCTACGGCTGGAACGAGTCCCTCTGTTCCCTTGAGCAGTTAAAATCACCGCTATTCTACTGTTGCTATCCCTACCGGTCTTAGCTGGAGCTATCACTATGCTTTTAACAGATCGTAACTTGAACACCGCTTTTTTCGACCCTGGAGGTGGTGGTGACCCTATTTTATATGAACACTTATTC

But when I use cigar_print_pretty the pattern_pos becomes longer than the length of the pattern and I get a Segfault.

446					if (pattern[pattern_pos] != text[text_pos]) {
(gdb) p pattern_pos
$68 = 480
(gdb) p text_pos
$69 = 385
(gdb) p pattern[pattern_pos]
$70 = 0 '\000'
(gdb) p (int)strlen(pattern)
$73 = 324
(gdb) p (int)strlen(text)
$74 = 2330

Ok, but I will need the code snippet you are using to tell you more.
AFAIK pattern_pos is never provided to the library, so I don't know how can it change.

Thanks,

I was able to figure it out. It was user error sorry. I have another question. With Glocal alignment, what should the values for alignment_endsfree, text_begin_free, and text_end_free be set to? Thank you btw for adding this in from the previous WFA version.

    wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
    attributes.alignment_form.span = alignment_endsfree;
    attributes.alignment_form.pattern_begin_free = 0;
    attributes.alignment_form.pattern_end_free = 0;
    attributes.alignment_form.text_begin_free = text_begin_free;
    attributes.alignment_form.text_end_free = text_end_free;

Ok, glad you solved it.

WRT the glocal alignment, all these values are positive integers or zero (to disable), and each one regulates how much begin/end can be free of penalty in the alignment. See the following figure.

image

Does this help? Can it be better understood now?

I sort of understand. But I haven't had much luck with adjusting the parameters. For example, I have the parameters set to:

 wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
 attributes.distance_metric = gap_affine;
 attributes.affine_penalties.mismatch =4;
 attributes.affine_penalties.gap_opening = 6;
 attributes.affine_penalties.gap_extension = 2;
 attributes.alignment_form.span = alignment_endsfree;
 attributes.alignment_form.pattern_begin_free = 0;
 attributes.alignment_form.pattern_end_free = 0;

But I still get this kind of alignment, where the beginning A is aligned to the first base of the text/reference.
Screen Shot 2022-10-07 at 10 29 29 AM

Ok, but you are not defining the text_end/begin, and for the pattern, you are setting both to zero (i.e., disabled). Try setting some leeway, like:

 attributes.alignment_form.pattern_begin_free = 50;
 attributes.alignment_form.pattern_end_free = 50;

I added those changes but still getting this:
Screen Shot 2022-10-07 at 11 44 51 AM

Hi,

If you can paste any of those ASCII sequences here, I can provide an example of how to use it.

>KM404175.1_3_2
ATTTGTGTGATCCGTACTTATCACTGCCGTCCTACTACTCCTATCAATTCCAGTTCTCGCTGCTGGTATCACCATACTACTAACAGACCGAAAACTAAATACCACATTCTTCGACCCCGCTGGAGGAGGCGACCCAGGCCTATACCAACA
>KM001333.1
AACCTAGCTGGCAACCTAGCTCATGCTGGAGCTTCAGTAGACCTAGCTATTTTCTCCCTTCACGTAGCAGGTGTATCCTCCATCCTAGGTGCAATCAACTTCATCACAACTGCCATCAACATAAAACCACCCGCCCTCTCACAATACCAAACACCTTTATTTGTATGATCCGTACTTATCACTGCTGTCTTACTGCTTCTATCACTTCCAGTTCTCGCCGCTGGCATCACCATACTACTAACAGACCGAAATCTAAACACCACATTCTTTGATCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Hi,
I'm also getting this type of alignment (see image) with these sequences. Can you suggest different parameters that I can try to get a better Glocal alignment?

>DQ433359.1
CCTAGCTGGTAACCTAGCCCATGCTGGAGCTTCAGTGGACCTAGCAATCTTCTCACTCCACCTAGCAGGTGTGTCCTCTATCTTAGGCGCTATCAACTTCATCACAACTGCTATCAACATAAAACCCCCAGCCCTATCACAATACCAAACTCCCTTATTCGTGTGATCAGTACTCATCACTGCCGTCCTACTACTTCTCTCACTTCCCGTGCTCGCTGCTGGCATCACTATACTACTAANAGACCGAAACCTAAACACAACATTCTTTNATCCAGCCGGAGGCNNNNACCCAGTACTATACCAGCACCTTTTCA
>GU571333.1_53
CACTTCACTTAGCCGGTGTCTCTTCTATCCTAGGCGCAATTAACTTTATCACAACAGCTATTAATATAAAACCACCTGCCCTATCACAATACCAAACTCCCCTATTCGTGTGATCAGTGCTCATCACTGCCGTCCTACTACTCCTATCAC

The parameters I used were:

        wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
        attributes.distance_metric = gap_affine;
        attributes.affine_penalties.mismatch =4;
        attributes.affine_penalties.gap_opening = 6;
        attributes.affine_penalties.gap_extension = 2;
        attributes.alignment_form.span = alignment_endsfree;
        attributes.alignment_form.pattern_begin_free = 50;
        attributes.alignment_form.pattern_end_free = 50;

Screen Shot 2022-10-11 at 10 56 00 PM

I'm sorry for not getting back to you sooner. I have been playing with the example, and I can make a suggestion:

(1) Use a match score different from 0.
(2) Use full ends-free.

Using the following code:

int main(int argc,char* argv[]) {
  char* pattern =
      "CACTTCACTTAGCCGGTGTCTCTTCTATCCTAGGCGCAATTAACTTTATCACAACAGCTATTAATATAAAACCACCTGCCCTATCACAATAC"
      "CAAACTCCCCTATTCGTGTGATCAGTGCTCATCACTGCCGTCCTACTACTCCTATCAC";
  char* text =
      "CCTAGCTGGTAACCTAGCCCATGCTGGAGCTTCAGTGGACCTAGCAATCTTCTCACTCCACCTAGCAGGTGTGTCCTCTATCTTAGGCGCTA"
      "TCAACTTCATCACAACTGCTATCAACATAAAACCCCCAGCCCTATCACAATACCAAACTCCCTTATTCGTGTGATCAGTACTCATCACTGCC"
      "GTCCTACTACTTCTCTCACTTCCCGTGCTCGCTGCTGGCATCACTATACTACTAANAGACCGAAACCTAAACACAACATTCTTTNATCCAGC"
      "CGGAGGCNNNNACCCAGTACTATACCAGCACCTTTTCA";

  // Configure alignment attributes
  wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
  attributes.distance_metric = gap_affine;
  attributes.affine_penalties.match = -1;
  attributes.affine_penalties.mismatch = 4;
  attributes.affine_penalties.gap_opening = 6;
  attributes.affine_penalties.gap_extension = 2;
  attributes.alignment_form.span = alignment_end2end;

  // Initialize Wavefront Aligner
  wavefront_aligner_t* const wf_aligner = wavefront_aligner_new(&attributes);
  // Align
  wavefront_align(wf_aligner,pattern,strlen(pattern),text,strlen(text));
  fprintf(stderr,"WFA-Alignment returns score %d\n",wf_aligner->cigar->score);
  // Display alignment
  fprintf(stderr,"  PATTERN  %s\n",pattern);
  fprintf(stderr,"  TEXT     %s\n",text);
  fprintf(stderr,"  SCORE (RE)COMPUTED %d\n",cigar_score_gap_affine(wf_aligner->cigar,&attributes.affine_penalties));
  cigar_print_pretty(stderr,wf_aligner->cigar,pattern,strlen(pattern),text,strlen(text));

  // Free
  wavefront_aligner_delete(wf_aligner);
}

Suggests that the alignment is found in the middle of the reference (semi-global or glocal).

      PATTERN    C-----------------------------------------------------ACTTCACTTAGCCGGTGTCTCTTCTATCCTAGGCGCAATTAACTTTATCACAACAGCTATTAATATAAAACCACCTGCCCTATCACAATACCAAACTCCCCTATTCGTGTGATCAGTGCTCATCACTGCCGTCCTACTACTCCTATCAC---------------------------------------------------------------------------------------------------------------
                 |                                                     ||| ||| |||| ||||| || |||||| ||||||| || ||||| |||||||| ||||| || |||||||| || |||||||||||||||||||||||| |||||||||||||||| ||||||||||||||||||||||| || ||||                                                                                                               
      TEXT       CCTAGCTGGTAACCTAGCCCATGCTGGAGCTTCAGTGGACCTAGCAATCTTCTCACTCCACCTAGCAGGTGTGTCCTCTATCTTAGGCGCTATCAACTTCATCACAACTGCTATCAACATAAAACCCCCAGCCCTATCACAATACCAAACTCCCTTATTCGTGTGATCAGTACTCATCACTGCCGTCCTACTACTTCTCTCACTTCCCGTGCTCGCTGCTGGCATCACTATACTACTAANAGACCGAAACCTAAACACAACATTCTTTNATCCAGCCGGAGGCNNNNACCCAGTACTATACCAGCACCTTTTCA

In that case, I would suggest using ends-free, so that:

  // Configure alignment attributes
  wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
  attributes.distance_metric = gap_affine;
  attributes.affine_penalties.match = -1;
  attributes.affine_penalties.mismatch = 4;
  attributes.affine_penalties.gap_opening = 6;
  attributes.affine_penalties.gap_extension = 2;

  attributes.alignment_form.span = alignment_endsfree;
  attributes.alignment_form.pattern_begin_free = 0;
  attributes.alignment_form.pattern_end_free = 0;
  attributes.alignment_form.text_begin_free = strlen(text);
  attributes.alignment_form.text_end_free = strlen(text);

Gives the following:

      PATTERN    -----------------------------------------------------CACTTCACTTAGCCGGTGTCTCTTCTATCCTAGGCGCAATTAACTTTATCACAACAGCTATTAATATAAAACCACCTGCCCTATCACAATACCAAACTCCCCTATTCGTGTGATCAGTGCTCATCACTGCCGTCCTACTACTCCTATCAC---------------------------------------------------------------------------------------------------------------
                                                                      |||| ||| |||| ||||| || |||||| ||||||| || ||||| |||||||| ||||| || |||||||| || |||||||||||||||||||||||| |||||||||||||||| ||||||||||||||||||||||| || ||||                                                                                                               
      TEXT       CCTAGCTGGTAACCTAGCCCATGCTGGAGCTTCAGTGGACCTAGCAATCTTCTCACTCCACCTAGCAGGTGTGTCCTCTATCTTAGGCGCTATCAACTTCATCACAACTGCTATCAACATAAAACCCCCAGCCCTATCACAATACCAAACTCCCTTATTCGTGTGATCAGTACTCATCACTGCCGTCCTACTACTTCTCTCACTTCCCGTGCTCGCTGCTGGCATCACTATACTACTAANAGACCGAAACCTAAACACAACATTCTTTNATCCAGCCGGAGGCNNNNACCCAGTACTATACCAGCACCTTTTCA