An attempt to find CSD (complementary sex determiner) genes in other hymenopterans, using the only available reference (Apis mellifera).
Species | Common name | NCBI assembly | Source |
---|---|---|---|
Apis mellifera | Honey bee | Amel_HAv3.1 | Uppsala University |
Osmia bicornis | Red mason bee | iOsmBic2.1 | DToL |
Nomada fabriciana | Fabricius' nomad bee | iyNomFabr1 | DToL |
Apis laboriosa | Himalayan honey bee | ASM1406632v1 | Shangri-la |
Exoneura robusta | Bee... | ASM1945341v1 | University of New Hampshire |
Apis florea | Little honey bee | Aflo_1.1 | Baylor College of Medicine |
Apologies for the unordered table.
Of these six assemblies, I expect at least the two other Apis species (A. laboriosa and A. florea) to contain a close match to the A. mellifera csd gene. I hope the others are closely enough related to give at least a partial match.
A similar analysis appears to have been carried out by Schmieder, Colinet and Poirie.
They note that csd is a neofunctional duplicate of the fem (feminiser) gene, so both genes will be used for comparisons.
Self-explanatory
Protein sequence is more conserved than the underlying nucleotide sequence because of codon redundancy: many nucleotide substitutions are synonymous and leave the encoded amino acid unchanged. The assemblies will therefore be compared as translated protein sequence.
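The following stdlib-only illustration makes the codon-redundancy argument concrete. It is not part of the pipeline. Two made-up coding sequences that differ only at synonymous positions share just 6 of 9 nucleotides, yet they translate to the same peptide. The codon table is the standard genetic code (NCBI table 1), and the sequences are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG x TCAG x TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; unknown codons become 'X', a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Hypothetical sequences: both encode Met-Ala-Arg using different codons.
    seq_a = "ATGGCTCGT"
    seq_b = "ATGGCCAGA"
    print(f"DNA identity:     {identity(seq_a, seq_b):.2f}")                        # → 0.67
    print(f"Protein identity: {identity(translate(seq_a), translate(seq_b)):.2f}")  # → 1.00
```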
PROmer (from MUMmer 3.23) translates both sequences in all six reading frames and aligns them at the amino-acid level. The caveat is that this is more computationally intensive than aligning the nucleotide sequences directly, and PROmer is not multi-threaded.
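PROmer does considerably more than this, because it finds and extends matches in translated space. The six-frame translation it starts from can be sketched as below. This is an illustration only. It assumes the standard genetic code and plain ACGT/N input, and the function names are my own rather than anything from MUMmer.

```python
from itertools import product
from typing import Dict

# Standard genetic code (NCBI table 1); codons enumerated in TCAG x TCAG x TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an ACGT/N sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate from the first base; codons containing N (or other symbols) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame(dna: str) -> Dict[str, str]:
    """Translate the forward strand (+1..+3) and reverse complement (-1..-3)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical 12 bp sequence; every frame is printed for inspection.
    for frame, peptide in six_frame("ATGGCTCGTTAA").items():
        print(frame, peptide)
```

A real genome translates to six very long strings per contig. That is one reason translated alignment costs more than the nucleotide equivalent.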
# Align the A. mellifera csd reference against every assembly with PROmer (reference first, query second)
prefix="../fasta/"
for i in ../fasta/*.fasta; do
    j=${i#"$prefix"}    # strip the directory, leaving the file name
    echo "Running for: CSD + ${j}"
    bsub -q long -o out.txt -e error.txt -M 12000 -R "select[mem > 12000] rusage[mem=12000]" \
        /software/grit/tools/nummer323/MUMmer3.23/promer --mum -p "CSD-${j}" ../fasta/apis-ref-csd.fasta "${i}"
done
# Same again with the A. mellifera fem reference
prefix="../fasta/"
for i in ../fasta/*.fasta; do
    j=${i#"$prefix"}    # strip the directory, leaving the file name
    echo "Running for: FEM + ${j}"
    bsub -q long -o out.txt -e error.txt -M 12000 -R "select[mem > 12000] rusage[mem=12000]" \
        /software/grit/tools/nummer323/MUMmer3.23/promer --mum -p "FEM-${j}" ../fasta/apis-ref-fem.fasta "${i}"
done
Each job took about 80 seconds, so the extra cost of PROmer was not a problem.
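# Convert each PROmer .delta file into Dot's input format
suffix=".delta"
for i in *.delta; do
    echo "Running for: ${i}"
    j=${i%"$suffix"}    # % strips a suffix (# would strip a prefix, leaving the name unchanged)
    /software/grit/bin/DotPrep.py --delta "${i}" --out "../dot/${j}"
done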
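# As above, writing the fem comparisons to a separate directory
suffix=".delta"
for i in *.delta; do
    echo "Running for: ${i}"
    j=${i%"$suffix"}    # % strips a suffix (# would strip a prefix, leaving the name unchanged)
    /software/grit/bin/DotPrep.py --delta "${i}" --out "../fem-dot/${j}"
done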
This produced very few alignments. The best was between fem and csd, which makes sense because csd is a neofunctional duplicate of fem. Even that alignment was very limited, though.
Oddly, the A. mellifera csd reference did not align at all. Perhaps Dot is too simple a tool, although Dot only visualises the PROmer output, so the problem may lie upstream in the alignment itself.
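Dot only visualises PROmer's output, so a quick check is whether the .delta files contain any alignments at all. MUMmer's own `show-coords` reports them. The sketch below is a minimal stdlib alternative that counts alignment records per reference/query pair. It assumes the standard delta layout: two header lines, then `>` sequence-pair headers, with a seven-integer line opening each alignment.

```python
from collections import Counter
from pathlib import Path


def count_alignments(delta_text: str) -> Counter:
    """Count alignment blocks per (reference, query) pair in MUMmer .delta text."""
    counts: Counter = Counter()
    pair = None
    # Line 1 lists the two input FASTA paths; line 2 names the program (NUCMER or PROMER).
    for line in delta_text.splitlines()[2:]:
        fields = line.split()
        if not fields:
            continue
        if line.startswith(">"):
            # Header: >ref_id query_id ref_length query_length
            pair = (fields[0][1:], fields[1])
        elif len(fields) == 7 and pair is not None:
            # Seven integers open each alignment: start1 end1 start2 end2 errors sim_errors stops
            counts[pair] += 1
        # Single-integer lines are indel offsets (0 ends an alignment) and are skipped.
    return counts


if __name__ == "__main__":
    # Summarise every .delta file in the current directory; 0 means PROmer found nothing.
    for path in sorted(Path(".").glob("*.delta")):
        counts = count_alignments(path.read_text())
        print(f"{path.name}\t{sum(counts.values())} alignments across {len(counts)} sequence pairs")
```

If the csd files report zero alignments here, a different viewer would not help.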
This is now version 4 of my original plan.
Using Biopython, I've translated the genomic FASTA files into peptide (pep) FASTA. This will not be perfect, but it should indicate whether the project has merit.
# Translate each genomic FASTA into ./pep/ with the Biopython-based convert.py
for i in fasta/*.fasta; do
    python3 scripts/convert.py "${i}" ./pep/
done
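convert.py itself isn't reproduced here. Translating a whole contig gives a long string punctuated by stop codons (written `*` by Biopython's `translate` by default), and most of that string is non-coding. One hedged way to make the output more useful as a BLASTP database is to keep only long stop-free stretches. This is a crude open-reading-frame filter, similar in spirit to EMBOSS `getorf`. The 50-residue minimum and the FASTA naming scheme below are hypothetical choices.

```python
import re
from typing import List, Tuple


def stop_free_fragments(peptide: str, min_len: int = 50) -> List[Tuple[int, str]]:
    """Return (start, fragment) for stretches between stops ('*') of at least min_len residues.

    start is the 0-based offset of the fragment within the translated frame.
    """
    return [(m.start(), m.group()) for m in re.finditer(r"[^*]+", peptide)
            if len(m.group()) >= min_len]


def fragments_to_fasta(record_id: str, frame: str, fragments: List[Tuple[int, str]]) -> str:
    """Format fragments as FASTA named >{record_id}_frame{frame}_{start} (hypothetical scheme)."""
    lines = []
    for start, fragment in fragments:
        lines.append(f">{record_id}_frame{frame}_{start}")
        lines.append(fragment)
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Toy translated frame: only the two long stop-free stretches pass the 50-residue cutoff.
    frame_seq = "MAR*" + "A" * 60 + "*KK*" + "C" * 55
    print(fragments_to_fasta("contig1", "+1", stop_free_fragments(frame_seq)), end="")
```

Keeping the frame and offset in each header means a BLASTP hit can be traced back to its position on the contig.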
The translated genome FASTA files are then searched with BLASTP.
Tiphia sp. will act as a negative control: csd is expected only in the Apoidea, and Tiphia is not a member of that group. Tiphia sp. probably uses multi-locus CSD (ml-CSD) instead.
Apis mellifera will be the positive control, as it contains the reference csd gene.
Bombus sp. and Apis sp. will be screened.
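BLASTP can write tabular output (`-outfmt 6`, whose 12 default columns are listed in the sketch), which makes the controls quick to read off. This sketch assumes one tabular file per translated assembly. It keeps the best-scoring hit for a query in each assembly and drops anything above an e-value cutoff. The 1e-5 cutoff, the assembly labels and the hit identifiers are hypothetical. If the setup is working, A. mellifera should return a hit and Tiphia sp. ideally none.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Default columns of BLAST+ tabular output (-outfmt 6), in order.
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    qseqid: str
    sseqid: str
    pident: float
    evalue: float
    bitscore: float


def parse_outfmt6(text: str) -> List[Hit]:
    """Parse tab-separated -outfmt 6 text, skipping blank and '#' comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        hits.append(Hit(row["qseqid"], row["sseqid"], float(row["pident"]),
                        float(row["evalue"]), float(row["bitscore"])))
    return hits


def best_hits(per_assembly: Dict[str, str], query: str,
              max_evalue: float = 1e-5) -> Dict[str, Optional[Hit]]:
    """Per assembly, the highest-bitscore hit for `query` within the e-value cutoff, else None."""
    summary: Dict[str, Optional[Hit]] = {}
    for assembly, table in per_assembly.items():
        passing = [h for h in parse_outfmt6(table)
                   if h.qseqid == query and h.evalue <= max_evalue]
        summary[assembly] = max(passing, key=lambda h: h.bitscore, default=None)
    return summary


if __name__ == "__main__":
    # Hypothetical tabular output for the positive and negative controls.
    tables = {
        "Apis_mellifera": "csd\tfragA_12\t100.00\t400\t0\t0\t1\t400\t1\t400\t0.0\t820\n",
        "Tiphia_sp": "csd\tfragT_3\t28.50\t60\t40\t2\t150\t209\t10\t69\t0.8\t25.4\n",
    }
    for assembly, hit in best_hits(tables, "csd").items():
        print(assembly, hit.sseqid if hit else "no hit")
```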
- Use HMMs to identify the domains in the Apoidea assemblies
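If the HMM route is taken, HMMER's `hmmsearch --domtblout` writes one line per domain hit. Below is a minimal parser sketch that filters on the independent E-value (column 13). It assumes a profile has already been built, for example with `hmmbuild` from an alignment of csd/fem proteins, which is my assumption rather than a settled plan. The 1e-3 threshold is arbitrary.

```python
from typing import List, NamedTuple


class DomainHit(NamedTuple):
    target: str
    query: str
    i_evalue: float
    score: float
    env_from: int
    env_to: int


def parse_domtblout(text: str, max_i_evalue: float = 1e-3) -> List[DomainHit]:
    """Parse hmmsearch --domtblout text, keeping domains whose independent E-value passes the cutoff."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # 22 whitespace-separated columns, then a free-text description that may contain spaces.
        cols = line.split(maxsplit=22)
        hit = DomainHit(
            target=cols[0],            # column 1: target (sequence) name
            query=cols[3],             # column 4: query (profile) name
            i_evalue=float(cols[12]),  # column 13: independent E-value of this domain
            score=float(cols[13]),     # column 14: domain bit score
            env_from=int(cols[19]),    # column 20: envelope start on the target
            env_to=int(cols[20]),      # column 21: envelope end on the target
        )
        if hit.i_evalue <= max_i_evalue:
            hits.append(hit)
    return hits
```

The envelope coordinates refer to positions on the translated target sequence. With fragment names like those produced during translation, they can be mapped back to a contig.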