shishenyxx / DeepMosaic

DeepMosaic is a deep-learning-based mosaic single-nucleotide variant classification tool that does not need matched control information.

Home Page: https://www.nature.com/articles/s41587-022-01559-w

Run errors

gevro opened this issue · comments

commented

Hi, I'm running into trouble and getting several errors:

  1. I'm using the latest 1.1.1 version.

  2. Input is a BAM file with config:
    #sample_name bam vcf depth sex
    60603-W1-B2 60603-W1-B2.cram 60603-W1-B2.filtered.vcf.gz 51.3408 M

  3. input.log shows this:

ANNOVAR Version:
        $Date: 2020-06-07 23:56:37 -0400 (Sun,  7 Jun 2020) $
ANNOVAR Information:
        For questions, comments, documentation, bug reports and program update, please visit http://www.openbioinformatics.org/annovar/
ANNOVAR Command:
        /bin/annovar/annotate_variation.pl -filter -build hg38 -dbtype gnomad_genome /tmp/tmpy7mejtlt /bin/annovar/humandb -outfile 60603-W1-B2/input
ANNOVAR Started:
        Fri Aug 25 18:38:21 2023
NOTICE: Output file with variants matching filtering criteria is written to 60603-W1-B2/input.hg38_gnomad_genome_dropped, and output file with other variants is written to 60603-W1-B2/input.hg38_gnomad_genome_filtered
NOTICE: Processing next batch with 3158595 unique variants in 3158595 input lines
NOTICE: Database index loaded. Total number of bins is 28084439 and the number of bins to be scanned is 2768371
NOTICE: Scanning filter database /bin/annovar/humandb/hg38_gnomad_genome.txt...Done
  4. Output:
  • input.hg38_gnomad_genome_filtered has 201,820 lines
  • features.txt is empty
  • images and matrices folders are empty
  5. In STDERR, I'm getting this WARNING when I run deepmosaic-draw:
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LC_CTYPE = "C.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
  6. After this step: "NOTICE: Scanning filter database /bin/annovar/humandb/hg38_gnomad_genome.txt...Done
    /DeepMosaic/deepmosaic/gnomadAnnotation.py:30: DtypeWarning: Columns (1) have mixed types. Specify dtype option on import or set low_memory=False.
    df = pd.read_csv(output_dir + "input." + build + "_" + dbtype + "_dropped", header=None, sep="\t")"

    I'm getting a bunch of these warnings (thousands of lines) right when the pipeline starts:

/reference-files/cram-cache/6a/ef/897c3d6ff0c78aff06ac189178dd.tmp_260008_1_2600980418: No such file or directory
/reference-files/cram-cache/6a/ef/897c3d6ff0c78aff06ac189178dd.tmp_260007_1_2601222400: No such file or directory
/reference-files/cram-cache/6a/ef/897c3d6ff0c78aff06ac189178dd.tmp_260012_1_2601368861: No such file or directory
/reference-files/cram-cache/6a/ef/897c3d6ff0c78aff06ac189178dd.tmp_260011_1_2601476278: No such file or directory
/reference-files/cram-cache/6a/ef/897c3d6ff0c78aff06ac189178dd.tmp_260010_1_2600633160: No such file or directory
/reference-files/cram-cache/6a/ef/897c3d6ff0c78aff06ac189178dd.tmp_260006_1_2601420330: No such file or directory
[E::cram_read_container] Container header CRC32 failure

It looks like there are CRAM warnings/errors, but there is no CRAM input into the pipeline. So I'm not sure what the issue is.

Can you please assist?

Thanks!

Looks like a cram input.
Please check and confirm.

Best,

Xiaoxu

commented

You're right - sorry I missed that bug! Even though I converted to BAM, I was still putting CRAM in the config.
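
For anyone who hits the same thing, here is a minimal sketch of a pre-flight check that would have caught the leftover .cram path before running the pipeline. It is not part of DeepMosaic. The function name `check_config_text` and the `allowed_suffixes` parameter are made up for illustration, and the column order is assumed from the header line in the config above (`#sample_name bam vcf depth sex`). It also assumes whitespace-separated fields and that you intend to feed BAM files only.

```python
from pathlib import Path

# Column order assumed from the header line quoted in this issue:
# #sample_name bam vcf depth sex
EXPECTED_COLUMNS = ["sample_name", "bam", "vcf", "depth", "sex"]


def check_config_text(text, allowed_suffixes=(".bam",)):
    """Return a list of problems found in the config text (hypothetical helper)."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header line
        fields = line.split()
        if len(fields) != len(EXPECTED_COLUMNS):
            problems.append(
                f"line {lineno}: expected {len(EXPECTED_COLUMNS)} columns, "
                f"got {len(fields)}"
            )
            continue
        row = dict(zip(EXPECTED_COLUMNS, fields))
        # Compare the extension of the alignment file with what we intend to use.
        suffix = Path(row["bam"]).suffix.lower()
        if suffix not in allowed_suffixes:
            problems.append(
                f"line {lineno}: sample {row['sample_name']} points to "
                f"'{row['bam']}', expected one of {list(allowed_suffixes)}"
            )
    return problems


if __name__ == "__main__":
    example = (
        "#sample_name bam vcf depth sex\n"
        "60603-W1-B2 60603-W1-B2.cram 60603-W1-B2.filtered.vcf.gz 51.3408 M\n"
    )
    for problem in check_config_text(example):
        print(problem)
```

Run on the config from this issue, it reports the line whose second column still ends in .cram. If you do mean to use CRAM input, pass `allowed_suffixes=(".bam", ".cram")` so the check only flags unexpected extensions.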