- data/rt: 200k tree
- data/aligned_all.filtered.fa.zip: backbone alignment
To speed up the process, split the sequence file into two files in realtively same size and each ideally contain the same number of genomes.
- to calculate placement error in edges run
bash select_copy/select_copy_split.sh tree_file seq_file output_error_file
- to calculate placement error in path length run
bash select_copy/dist_err_split.sh tree_file seq_file outdir
Run
bash udance/run.sh -b backbone_tree_file -s sequence_file -l 10000000 -o outdir -t cores -c 15000
conda env create -f depp_env.yml
Run bash scripts/prepare_depp.sh tree_file seq_file outdir
This will
- Trim the backbone tree and remove sequences in the backbone sequences so that backbone tree and backbone sequences have the same set of species.
- Make the aligned backbone sequence file to be in four types (full-length, V4, V4 100bp and V4 150bp) by trimming the sequences. Following is the position (index start from 0) we used in the previous run for trimming the squences. Those positions are found by aligning the fragments of 16S to the aligned full-length 16S. It would be good to double check that.
-
V4 100 start: 571 End: 669
-
V4 150 Start: 571 End: 719
-
V4 Start: 571 End: 819
The output will be stored in outdir
- backbone_tree: outdir/backbone.nwk
- backbone_seq: outdir/full_length/backbone.fa, outdir/v4_100/backbone.fa, outdir/v4_150/backbone.fa, outdir/v4/backbone.fa
For each of the backbone sequence type, train a model using the command:
train_cluster_depp.sh -t backbone_tree -s backbone_seq -g gpu_id -o outdir
Do the placement using the following command:
depp-place-rRNA.sh -q query_seq -a accessory_file -o ourdir -x cores -t mixed
.
Pretrain accessory_file can be accessed here
depp-place-rRNA.sh -q data/test_depp/query0.fa -a accessory_file -o ourdir -x cores -t mixed
pip install apples
run_apples.py -s backbone_seq_file -q query_seq_file -t backbone_tree -o outfile
pip install treecluster
TreeCluster.py -h
run_upp.py -s seq_to_be_aligned -a backbone_seq -t backbone_tree -A 200 -d otudir -x 36 -p tmp_dir
Note: backbone_tree can be obtained using FastTree: FastTree -nt backbone_seq > backbone_tree