smarco / WFA2-lib

WFA-lib: Wavefront alignment algorithm library v2

Valgrind uninitialized value errors

ctsa opened this issue

I'm seeing a lot of Valgrind "Conditional jump or move depends on uninitialised value" errors coming from libwfa2 on a relatively simple example program. I noticed them while trying to track down some instability in a larger program that uses WFA2.

Here is the small example code:

test.c

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include "wavefront/wavefront_align.h"

int main(void) {
    char *pattern = "A";
    char *text    = "ABC";

    // Gap-linear scoring with the penalties used in this report
    wavefront_aligner_attr_t attr = wavefront_aligner_attr_default;
    attr.distance_metric = gap_linear;
    attr.linear_penalties.match = -1;
    attr.linear_penalties.mismatch = 1;
    attr.linear_penalties.indel = 1;
    attr.alignment_scope = compute_alignment;  // compute the full alignment (CIGAR)

    // Ends-free alignment: only the last 2 characters of the text are free
    attr.alignment_form.span = alignment_endsfree;
    attr.alignment_form.pattern_begin_free = 0;
    attr.alignment_form.pattern_end_free = 0;
    attr.alignment_form.text_begin_free = 0;
    attr.alignment_form.text_end_free = 2;

    // Align, then print the CIGAR and score
    wavefront_aligner_t* const wf_aligner = wavefront_aligner_new(&attr);
    wavefront_align(wf_aligner, pattern, strlen(pattern), text, strlen(text));
    cigar_print_pretty(stderr,
      wf_aligner->cigar,pattern,strlen(pattern),text,strlen(text));

    fprintf(stderr,"Alignment Score %d\n",wf_aligner->cigar->score);

    wavefront_aligner_delete(wf_aligner);
    return 0;
}

I cloned today's main branch (931181d) and compiled it in DEBUG mode, then compiled test.c and ran it under Valgrind as shown below. I only show the first few errors here; in total there were about 30 of these "Conditional jump or move depends on uninitialised value" errors:

$ gcc -g -I../ test.c ../build/libwfa2.a -lm
$ valgrind ./a.out
==6548== Memcheck, a memory error detector
==6548== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==6548== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==6548== Command: ./a.out
==6548==
==6548== Conditional jump or move depends on uninitialised value(s)
==6548==    at 0x4124B5: wavefront_extend_matches_packed_kernel (wavefront_extend_kernels.c:47)
==6548==    by 0x4124B5: wavefront_extend_matches_packed_endsfree (wavefront_extend_kernels.c:119)
==6548==    by 0x40CE7F: wavefront_extend_endsfree_dispatcher_seq (wavefront_extend.c:229)
==6548==    by 0x40CF04: wavefront_extend_endsfree_dispatcher_threads (wavefront_extend.c:246)
==6548==    by 0x40CFC7: wavefront_extend_endsfree (wavefront_extend.c:282)
==6548==    by 0x411996: wavefront_unialign (wavefront_unialign.c:251)
==6548==    by 0x401643: wavefront_align_unidirectional (wavefront_align.c:132)
==6548==    by 0x401930: wavefront_align (wavefront_align.c:228)
==6548==    by 0x4012A0: main (test.c:26)
==6548==
==6548== Conditional jump or move depends on uninitialised value(s)
==6548==    at 0x4120A6: wavefront_termination_endsfree (wavefront_termination.c:128)
==6548==    by 0x412513: wavefront_extend_matches_packed_endsfree (wavefront_extend_kernels.c:122)
==6548==    by 0x40CE7F: wavefront_extend_endsfree_dispatcher_seq (wavefront_extend.c:229)
==6548==    by 0x40CF04: wavefront_extend_endsfree_dispatcher_threads (wavefront_extend.c:246)
==6548==    by 0x40CFC7: wavefront_extend_endsfree (wavefront_extend.c:282)
==6548==    by 0x411996: wavefront_unialign (wavefront_unialign.c:251)
==6548==    by 0x401643: wavefront_align_unidirectional (wavefront_align.c:132)
==6548==    by 0x401930: wavefront_align (wavefront_align.c:228)
==6548==    by 0x4012A0: main (test.c:26)

Is there a way that this uninitialized state could be addressed?

Hi,

Of course, I should fix this. I haven't managed to reproduce that message with Valgrind yet, but I am on it...

Ok. Can you share details of your system's setup (e.g., Valgrind version, GCC version, ...)?
I cannot reproduce the error.

Thanks in advance,

Thanks for taking a look; interesting observation! I'll try some different configurations to see what's going on and whether I can give you a more specific description of the issue.

Hi @smarco,

I can point to the specific initialization problem now.

The problem starts with the bytes pointed to by wf_sequences->seq_buffer, which is allocated here:

wf_sequences->seq_buffer = malloc(proposed_size);

The pattern and text are then copied into seq_buffer here:

memcpy(buffer_dst,sequence,sequence_length);

In the example code above, the pattern has length 1, so only the first byte of the pattern in seq_buffer is initialized.

One place where these uninitialized bytes make their way into branching decisions is the packed match-extension kernel:

uint64_t* pattern_blocks = (uint64_t*)(wf_aligner->sequences.pattern+WAVEFRONT_V(k,offset));
uint64_t* text_blocks = (uint64_t*)(wf_aligner->sequences.text+WAVEFRONT_H(k,offset));
// Compare 64-bits blocks
uint64_t cmp = *pattern_blocks ^ *text_blocks;
while (__builtin_expect(cmp==0,0)) {

Here, even when WAVEFRONT_V(..) returns 0, the comparison reads the first 8 bytes of sequences.pattern, although only the first byte has been initialized. Furthermore, in the example code the loop appears to read even further, into 8-byte blocks that are entirely uninitialized, when WAVEFRONT_V(..) is non-zero. So padding the pattern and text to a multiple of 8 during initialization does not solve the issue either.

A simple fix is to replace the original malloc call for wf_sequences->seq_buffer with calloc; this resolves every issue Valgrind reports. I'm not sure whether you have reasons to avoid that, or would prefer to address the issue more precisely, but I submitted a PR with the simple fix in case it is a reasonable approach for you:

#77

As for reproducing this issue, I'm stumped. I tried a number of different OS versions, gcc versions, build settings, etc., and I see the same Valgrind errors in every case. With the Valgrind versions I tried that are older than 3.15 the error messages become messier, but the problems are still reported. To keep it as simple as possible, here is one configuration that worked:

  • Centos7
  • gcc 10.2
  • cmake 3.20.2
  • valgrind 3.15

Then:

  • build wfa2 with: cmake -DCMAKE_BUILD_TYPE=Debug ..
  • build the example code with: gcc -g -I../ test.c ../build/libwfa2.a -lm
  • run: valgrind ./a.out

Hi @smarco, any thoughts on the above? Is this a PR you can consider?

Sorry for the delay. I have been away for a few days.

I was not initializing the padding, hence the Valgrind error. Although the sentinels in between the padding guarantee that the algorithm is correct, I agree that a clean Valgrind report is the way to go. I also managed to reproduce the error on CentOS7 + gcc10.

Glad to merge the PR (thank you). Later, I will push another patch that restricts the initialization to the padding and avoids a potentially costly call to calloc().

Thank you so much,