CSD_project

An attempt to find CSD genes in other hymenopterans using the only reference (Apis melifera).

Latin	Common	NCBI	From?
Apis mellifera	Honey Bee	Amel_HAv3.1	Uppsala University
Osmia bicornis	Red Mason Bee	iOsmBic2.1	DTOL
Nomada fabriciana	Fabricus' Nomad Bee	iyNomFabr1	DTOL
Apis laboriosa	Himalayan Honeybee	ASM1406632v1	Shangri-la
Exoneura robusta	Bee...	ASM1945341v1	Uni of new Hampsire
Apis florea	Little Honeybee	Aflo_1.1	Baylor college of Medicine

Apologies for the unordered table.

Starting with these 6 assemblies, I expect at least the two other Apis sp. will contain a close match to the A. mellifera csd gene. The others I hope are closely enough related that there would be some degree of a match.

Notes

It is noted that it seems as though a similar experiment was performed by Schmieder, Colinet and Poirie

The above note that the csd gene is a neofunctional duplication of the fem gene (feminiser). So these will both be used for comparisons.

Original Workflow

1 - Download the assemblies from NCBI

Self-explanatory

2 - Convert to protein and align

As a protein sequence would be more conserved than the DNA nucleotide sequence, thanks to the redundancy of codons, the above assemblies will be converted to protein sequence.

Using Prommer we can convert to protein and align, with the caveat that it is more computationally intensive than just using the DNA nucleotide sequence. But this is also not multi-threaded

csd

for i in ../fasta/*.fasta;
    do prefix="../fasta/";
    j=${i#"$prefix"};
    echo "Running for: CSD + ${j}";
    bsub -q long -o out.txt -e error.txt -M 12000 -R "select[mem > 12000] rusage[mem=12000]" -M12000 /software/grit/tools/nummer323/MUMmer3.23/promer --mum -p CSD-${j} ../fasta/apis-ref-csd.fasta ${i};
done

fem

for i in ../fasta/*.fasta;
    do prefix="../fasta/";
    j=${i#"$prefix"};
    echo "Running for: CSD + ${j}";
    bsub -q long -o out.txt -e error.txt -M 12000 -R "select[mem > 12000] rusage[mem=12000]" -M12000 /software/grit/tools/nummer323/MUMmer3.23/promer --mum -p FEM-${j} ../fasta/apis-ref-fem.fasta ${i};
done

Each job took ~80 seconds so not too intensive.

3 - Convert output to Dot style

suffix=".delta"
for i in *.delta;
	do echo "Running for: ${i}";
	j=${i#"$suffix"}
	/software/grit/bin/DotPrep.py --delta ${i} --out ../dot/${j}
done

suffix=".delta"
for i in *.delta;
	do echo "Running for: ${i}";
	j=${i#"$suffix"}
	/software/grit/bin/DotPrep.py --delta ${i} --out ../fem-dot/${j}
done

4 - Visualise in Dot

This resulted in minimal alignments. The best being between the fem and csd gene which makes sense as csd is a neofunctional dupe of fem, but even this alignment was very minimal.

Oddly enough, the reference csd taken from A. mellifera didn't align at all. Perhaps Dot is too simple a tool?

New Workflow

This is now version 4 of my original plan.

Using Biopython, i've converted the genomic fasta to pep. This will not be perfect but should help indicate whether there is merit to the project.

for i in fasta/*.fasta; do 
    python3 scripts/convert.py ${i} ./pep/;
done

Take genomic protein fasta and use with a BLASTP search.

Tiphia sp. will act as a Negative control as csd is expected to only be in the Apodiae, which Tiphia sp. is not. It is likely that Tiphia sp use a ml-CSD.

Apis mellifera will be the positive control, as it contains the reference CSD

Bombus sp., Apis sp. will be screened

Further investigations

Using HMM to identify the domains in the Apodiae assemblies

DLBPointon / CSD_project