These scripts are designed to retrieve taxonomic information for protein sequences using NCBI's Entrez Utilities.
Before running the scripts, ensure that you have the following dependencies installed:
The getTax.sh script retrieves taxonomic information for a given protein sequence accession number.
./getTax.sh [ACCESSION_NUMBER]
Replace [ACCESSION_NUMBER] with the desired protein sequence accession number.
Example:
./getTax.sh AAA46477.1
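This README does not show how getTax.sh works internally. The following is an illustration only. A lookup of this kind with NCBI's E-utilities can be done in two steps. The first is an esummary request against the protein database, whose summary record includes the organism's taxonomy ID. The second is an efetch request against the taxonomy database for that ID. The sketch below only builds those two request URLs. The function names are hypothetical, and the two-step approach is an assumption, not a description of what getTax.sh actually does.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (NOT part of getTax.sh): build NCBI E-utilities URLs
# for a two-step "protein accession -> taxonomy" lookup.

EUTILS_BASE="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Step 1: summary of a protein record; the JSON result includes a taxid field.
# Assumes the accession needs no URL encoding (letters, digits, '.', '_').
protein_summary_url() {
  local accession=$1
  printf '%s/esummary.fcgi?db=protein&id=%s&retmode=json\n' "$EUTILS_BASE" "$accession"
}

# Step 2: full taxonomy record (including the lineage) for a taxonomy ID, as XML.
taxonomy_fetch_url() {
  local taxid=$1
  printf '%s/efetch.fcgi?db=taxonomy&id=%s&retmode=xml\n' "$EUTILS_BASE" "$taxid"
}
```

For example, curl -s "$(protein_summary_url AAA46477.1)" fetches the JSON summary. You can then pass the taxid it reports to taxonomy_fetch_url.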
An easy way to get taxonomy information for multiple accession numbers is a run.sh script. It runs getTax.sh once for each protein sequence accession number in a list:
- Update run.sh with the desired accession numbers.
- Execute run.sh.
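#!/usr/bin/env bash
# Start one getTax.sh job per accession number in the background (&),
# so all lookups run in parallel.
./getTax.sh AAA46477.1 &
./getTax.sh AAC55975.1 &
./getTax.sh AAD02414.1 &
./getTax.sh AAD47817.1 &
./getTax.sh AAF29594.1 &
./getTax.sh AAF29595.1 &
./getTax.sh AAF80604.1 &
./getTax.sh AAH69831.1 &
# Wait for all background jobs to finish before the script exits.
wait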
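run.sh launches every getTax.sh job at the same moment. NCBI asks E-utilities clients without an API key to send no more than three requests per second. A long list launched this way may therefore run into errors, depending on how many requests each getTax.sh call makes. The following is a minimal sketch of a gentler alternative, assuming bash 4.3 or later (for wait -n). It reads accessions from a file and keeps at most a fixed number of jobs running. The function process_accessions is hypothetical. Capping concurrent jobs only reduces bursts; it does not strictly enforce a requests-per-second limit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command once per accession listed in a file,
# with at most MAX_JOBS copies running at the same time.
# Usage: process_accessions FILE MAX_JOBS COMMAND [ARGS...]
# Requires bash >= 4.3 for `wait -n`.
process_accessions() {
  local file=$1 max_jobs=$2
  shift 2
  local acc running=0
  # `|| [[ -n $acc ]]` also handles a last line without a trailing newline.
  while IFS= read -r acc || [[ -n $acc ]]; do
    acc=${acc%$'\r'}                  # tolerate Windows (CRLF) line endings
    if [[ -z $acc || $acc == '#'* ]]; then
      continue                        # skip blank lines and '#' comments
    fi
    if (( running >= max_jobs )); then
      wait -n || true                 # block until any one job finishes
      running=$((running - 1))
    fi
    # Run e.g. ./getTax.sh ACCESSION in the background; stdin comes from
    # /dev/null so the job cannot consume the accession list.
    "$@" "$acc" < /dev/null &
    running=$((running + 1))
  done < "$file"
  wait                                # wait for the remaining jobs
}
```

For example, process_accessions accession_numbers.txt 3 ./getTax.sh runs getTax.sh on every accession in the file, at most three at a time.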
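The run.sh script can be generated from a file containing one accession number per line. Alternatively, you can loop over that file directly. Note that this loop processes the accessions one at a time rather than in parallel:
while IFS= read -r acc; do ./getTax.sh "$acc"; done < accession_numbers.txt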
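The following sketch generates run.sh from a file of accession numbers. The function make_run_sh is hypothetical. It writes one backgrounded ./getTax.sh line per accession, in the same layout as the run.sh example, followed by wait. It skips blank lines and lines starting with #, and marks the result executable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a run.sh with one "./getTax.sh ACCESSION &"
# line per accession in INPUT, followed by "wait".
# Usage: make_run_sh accession_numbers.txt run.sh
make_run_sh() {
  local input=$1 output=$2 acc
  {
    printf '#!/usr/bin/env bash\n'
    while IFS= read -r acc || [[ -n $acc ]]; do
      acc=${acc%$'\r'}                    # tolerate CRLF line endings
      if [[ -z $acc || $acc == '#'* ]]; then
        continue                          # skip blank lines and comments
      fi
      printf './getTax.sh %q &\n' "$acc"  # %q shell-quotes unusual characters
    done < "$input"
    printf 'wait\n'
  } > "$output"
  chmod +x "$output"
}
```

For example, run make_run_sh accession_numbers.txt run.sh and then ./run.sh. Like the hand-written version, the generated script starts every job at once.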
Amanda Araújo Serrão de Andrade <aandradebio@gmail.com>
Feel free to contact me, open an issue, or a pull request.