whc2 / code_for_Coreopsideae

Introduction

These scripts were used in the genome project of three plants in the tribe Coreopsideae within the sunflower family (Asteraceae). All whole-genome sequencing data supporting this project have been deposited at the China National Genomics Data Center (https://ngdc.cncb.ac.cn) under Project ID PRJCA017572. The genome assemblies, gene annotations, and other resources are also available on Zenodo (https://doi.org/10.5281/zenodo.8296602).

Contig coverage

1. map reads to contigs

minimap2 -t 90 -ax map-hifi -I 16G ctgs.fa reads1.fq.gz | samtools view -@ 20 -bo reads1.fq.gz.bam -

minimap2 -t 90 -ax map-hifi -I 16G ctgs.fa reads2.fq.gz | samtools view -@ 20 -bo reads2.fq.gz.bam -

2. merge bam files, filter out unmapped and secondary alignments, and sort bam

samtools merge -@ 40 ccs_bam_merged.bam reads1.fq.gz.bam reads2.fq.gz.bam

samtools view -b -F 0x104 -@ 40 ccs_bam_merged.bam -o ccs_bam_merged.noSecond_noUnmap.bam

samtools sort -o ccs_bam_merged.noSecond_noUnmap.sort.bam -@ 80 ccs_bam_merged.noSecond_noUnmap.bam
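The -F 0x104 mask used in the filtering command is the bitwise OR of two SAM flag bits: 0x4 (read unmapped) and 0x100 (secondary alignment). Supplementary alignments (0x800) are not covered by this mask. A small sketch to check the arithmetic in the shell; the variable names are illustrative only.

```shell
# SAM flag bits as defined in the SAM specification
UNMAPPED=$((0x4))      # read unmapped
SECONDARY=$((0x100))   # secondary (not primary) alignment

# Combine both bits into the mask that is passed to `samtools view -F`
MASK=$((UNMAPPED | SECONDARY))
printf 'decimal=%d hex=0x%x\n' "$MASK" "$MASK"   # prints: decimal=260 hex=0x104
```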

3. calculate base-level coverage

bedtools genomecov -ibam ccs_bam_merged.noSecond_noUnmap.sort.bam -bg > ccs_bam.bedGraph

4. calculate contig-level coverage

perl contig_coverage.pl ctg.fa.len ccs_bam.bedGraph > ccs_bam.bedGraph.ctgCov
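contig_coverage.pl itself is not shown here, so the following is only a sketch of one way a contig-level mean depth could be derived from the two inputs, not the script's actual logic. It assumes the length file holds tab-separated contig and length columns (an assumption about its format) and uses the contig, start, end, depth intervals that bedtools genomecov -bg writes. The demo file names and values are made up.

```shell
# Toy inputs (hypothetical values, for illustration only)
printf 'ctg1\t100\nctg2\t50\n' > demo.len
printf 'ctg1\t0\t50\t10\nctg1\t50\t100\t20\nctg2\t0\t25\t8\n' > demo.bedGraph

# Mean depth per contig = sum((end - start) * depth) / contig length.
# Regions missing from the bedGraph (-bg omits zero-depth intervals) count as 0.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { len[$1] = $2; order[++n] = $1; next }   # first file: contig lengths
     { covered[$1] += ($3 - $2) * $4 }                    # second file: depth intervals
     END { for (i = 1; i <= n; i++) {
               c = order[i]
               printf "%s\t%d\t%.2f\n", c, len[c], covered[c] / len[c]
           } }' demo.len demo.bedGraph > demo.ctgCov

cat demo.ctgCov
```

With the toy data, ctg1 has (50 x 10 + 50 x 20) / 100 = 15.00 and ctg2 has (25 x 8) / 50 = 4.00.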

Insertion time estimation for intact LTRs

1. generate a bed file for each LTR

perl ltr_extract.pl genome.EDTA.intact.gff3

2. extract the fasta sequence of each LTR (bedtools v2.30.0); run the per-LTR jobs in parallel

bedtools getfasta -fi genome.fa -bed LTR_Copia_LTR_100000.bed -fo LTR_Copia_LTR_100000.bed.fa -s

3. run a multiple sequence alignment for each LTR; run the per-LTR jobs in parallel

muscle -in LTR_Copia_LTR_100000.bed.fa -out LTR_Copia_LTR_100000.bed.fa.muscle
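The extraction and alignment steps run once per LTR bed file, so they can be spread across cores. The sketch below is one possible way to do that, not necessarily how the project ran it: it writes one command per bed file into a list that a job runner can consume. The demo_ltr directory, the Gypsy file name, and the .cmds file names are hypothetical.

```shell
# Placeholder bed files so the loop has something to iterate over (demo only)
mkdir -p demo_ltr
touch demo_ltr/LTR_Copia_LTR_100000.bed demo_ltr/LTR_Gypsy_LTR_100001.bed

# Write one getfasta command and one muscle command per bed file
: > getfasta.cmds
: > muscle.cmds
for bed in demo_ltr/LTR_*.bed; do
    echo "bedtools getfasta -fi genome.fa -bed $bed -fo $bed.fa -s" >> getfasta.cmds
    echo "muscle -in $bed.fa -out $bed.fa.muscle" >> muscle.cmds
done

# Run each list several jobs at a time, for example (assuming an xargs
# that supports -P, which is a common extension rather than POSIX):
#   xargs -P 8 -I{} sh -c '{}' < getfasta.cmds
#   xargs -P 8 -I{} sh -c '{}' < muscle.cmds
```

All getfasta jobs need to finish before the muscle list is started, since each alignment reads the fasta file written by the matching extraction.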

4. move the different types of LTRs into separate directories

mv LTR_Copia*.muscle z.Copia/
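The same move can be repeated for each LTR type. A minimal sketch, assuming the three types are Copia, Gypsy, and unknown (inferred from the CSV names in the summary step) and that the other directories follow the z.Copia naming; the Gypsy and unknown file names are hypothetical placeholders.

```shell
# Empty placeholder alignments (demo only)
touch LTR_Copia_LTR_100000.bed.fa.muscle \
      LTR_Gypsy_LTR_100001.bed.fa.muscle \
      LTR_unknown_LTR_100002.bed.fa.muscle

for type in Copia Gypsy unknown; do
    mkdir -p "z.$type"                   # create the target directory if it is missing
    mv LTR_"$type"*.muscle "z.$type"/    # move the alignments of this type
done

ls z.Copia z.Gypsy z.unknown
```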

5. estimate insertion time

R CMD BATCH insertion_time.R insertion_time.Rout
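insertion_time.R is not reproduced here. Intact LTR elements are commonly dated with T = K / (2r), where K is the sequence divergence between the 5' and 3' LTRs of one element and r is the substitution rate per site per year. The sketch below only illustrates that arithmetic, assuming this is the formula in play; the helper name and the K and r values are placeholders, not values from this project.

```shell
# insertion_time K R: print T = K / (2 * R) in years (hypothetical helper)
insertion_time() {
    awk -v k="$1" -v r="$2" 'BEGIN { printf "%.0f\n", k / (2 * r) }'
}

# Placeholder inputs: K = 0.026 substitutions per site,
# r = 1.3e-8 substitutions per site per year
T=$(insertion_time 0.026 1.3e-8)
echo "$T years"
```

The factor of 2 reflects that both LTRs of an element accumulate substitutions independently after insertion.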

6. summarize insertion times

perl sum_times.pl LTR_Copia_Insert_time.csv LTR_Gypsy_Insert_time.csv LTR_unknown_Insert_time.csv > LTR_Insert_sum.tsv

7. draw the insertion time density plot

Rscript intact_LTR_density.R LTR_Insert_sum.tsv

About

License: GNU General Public License v3.0

