simseqgen provides wrapper scripts to simulate reference sequences and reads.
simseqgen subcommand
where subcommands are described below.
simseqgen is packaged with some demographic models specified following the demes specification.
The repo subcommand provides information about the models.
The make_ref subcommand takes as input a tree sequence that has been generated by simulations with e.g. msprime or SLiM. The output consists of a vcf file with variant sites and a fasta file with all sequences.
If a reference individual is specified, a reference fasta file is also generated, and the variant calls of the other individuals will be modified with respect to the reference sequence. This can be useful for simulating outgroups.
WIP: By default, the fasta sequences are generated by filling the monomorphic sites with DNA drawn uniformly from the four bases. A reference sequence can be supplied, in which case only the variant sites will be modified.
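The fill strategy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not simseqgen's actual code: the function name `fill_sequence`, its parameters, and the 0-based position convention are all hypothetical.

```python
import random

BASES = "ACGT"


def fill_sequence(length, variants, reference=None, seed=None):
    """Build one haplotype sequence of `length` bases (hypothetical helper).

    variants:  dict mapping 0-based position -> single-base allele
    reference: optional string of `length` bases; if given, only the
               variant positions are changed
    seed:      optional seed so the random background is reproducible
    """
    rng = random.Random(seed)
    if reference is None:
        # No reference supplied: draw every site uniformly from A, C, G, T.
        seq = [rng.choice(BASES) for _ in range(length)]
    else:
        if len(reference) != length:
            raise ValueError("reference length does not match sequence length")
        seq = list(reference)
    # Overwrite the variant sites with the alleles carried by this haplotype.
    for pos, allele in variants.items():
        seq[pos] = allele
    return "".join(seq)


if __name__ == "__main__":
    print(fill_sequence(20, {3: "A", 10: "T"}, seed=1))
    print(fill_sequence(20, {3: "A"}, reference="G" * 20))
```

To keep monomorphic sites identical across haplotypes, the random background would be drawn once and then passed in as `reference` for each haplotype.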
TODO.
This example simulates reference sequences based on an Out of Africa demes model with outgroups chimpanzee, gorilla, and orangutan.
simseqgen repo --ls
name uri
ooa_with_outgroup /path/to/simseqgen/data/ooa_with_outgroup.demes.yaml
The demes file is first used as input to msprime to simulate ancestry:
msp ancestry --demography /path/to/simseqgen/data/ooa_with_outgroup.demes.yaml CHB:6 YRI:6 CEU:7 chimpanzee:1 gorilla:1 orangutan:1 -o ooa.ts --recombination-rate 1e-8 --length 1e6 --random-seed 42
followed by the addition of mutations:
msp mutations --random-seed 42 1.25e-9 ooa.ts -o ooa.mut.ts
Run simseqgen to generate vcf and fasta sequences for the tree sequence:
simseqgen make_ref ooa.mut.ts --reference_chromosome CEU:6
The output vcf will contain 6 of the 7 CEU samples, as one sample has been chosen as reference. In addition, all derived alleles specific to the reference have been flipped. In this case three output files are generated: 1) simseqgen.reference.fasta, containing the first haplotype of the reference individual; 2) simseqgen.fasta, containing the chromosomes from the other individuals; and 3) simseqgen.vcf.gz, containing variants for the non-reference individuals.
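One way to picture the allele flipping is the following Python sketch for a single biallelic site. It assumes genotypes are coded 0 (ancestral) and 1 (derived), as in a tree sequence. The function name `polarize_to_reference` is hypothetical, and simseqgen's own implementation may differ.

```python
def polarize_to_reference(ancestral, derived, ref_genotype, genotypes):
    """Re-express one biallelic site relative to the reference haplotype.

    ancestral, derived: allele strings at the site
    ref_genotype:       0 or 1, the allele carried by the reference haplotype
    genotypes:          list of 0/1 calls for the non-reference haplotypes
    Returns (REF, ALT, genotypes) as they would appear in the output vcf.
    """
    if ref_genotype == 0:
        # The reference carries the ancestral allele: nothing to flip.
        return ancestral, derived, list(genotypes)
    # The reference carries the derived allele: swap REF/ALT and invert calls.
    return derived, ancestral, [1 - g for g in genotypes]


if __name__ == "__main__":
    print(polarize_to_reference("A", "G", 1, [0, 1, 1, 0]))
```

Under this assumption, a site where every other haplotype is ancestral and only the reference is derived ends up with all non-reference calls set to 1 after flipping.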