rna-seq-accession-management

Scripts and data for managing the large number of RNA-Seq accessions we want to use

How to use

Additional lists of RNA-Seq experiments should be generated by doing the following:

1. Run a command similar to ./fetch_new_accessions.bash human_taxon_id.txt 200_human_SRA_20190812_9606.txt, adjusting the organism name, taxon id file, date, and so on as needed.
2. Move the output file (200_human_SRA_20190812_9606.txt for the command above) into a directory inside the previous_accessions directory.
3. Add it to the repo with git add and git commit, then push the changes with git push.
4. You've done it! You can now upload the file to S3 and kick off a surveyor dispatcher job for it.
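The naming in step 1 can be sketched as a small Python helper. This is a hypothetical illustration, not part of the repo: the pattern <count>_<organism>_SRA_<YYYYMMDD>_<taxon_id>.txt and the <organism>_taxon_id.txt input name are both inferred from the single example command, and the function names are made up for this sketch.

```python
from datetime import date


def output_filename(count: int, organism: str, run_date: date, taxon_id: int) -> str:
    # Hypothetical helper. Pattern inferred from the README example:
    # <count>_<organism>_SRA_<YYYYMMDD>_<taxon_id>.txt
    return f"{count}_{organism}_SRA_{run_date:%Y%m%d}_{taxon_id}.txt"


def fetch_command(organism: str, taxon_id: int, run_date: date, count: int = 200) -> list[str]:
    """Build an argument list for fetch_new_accessions.bash (hypothetical helper)."""
    # Assumes the taxon id file is named '<organism>_taxon_id.txt', as in the example.
    # `count` only affects the output name; the script itself hardcodes 200.
    return [
        "./fetch_new_accessions.bash",
        f"{organism}_taxon_id.txt",
        output_filename(count, organism, run_date, taxon_id),
    ]
```

From the repo root, something like subprocess.run(fetch_command("human", 9606, date.today()), check=True) would then run the fetch for today's date; moving, committing, and pushing the result are still manual steps.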

The scripts automatically filter out accessions for the same taxon id that already appear in lists under the previous_accessions directory. The number of accessions to output is not currently a parameter; it is hardcoded as 200 in fetch_new_accessions.bash. If you'd like to parameterize it, a PR would be welcome; otherwise, it's easy to find and replace.
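The filtering described above, with the count exposed as a parameter, might look roughly like the Python sketch below. It is not the repo's implementation: the function names are hypothetical, and it assumes each earlier list sits somewhere under previous_accessions, ends in _<taxon_id>.txt, and holds one accession per line.

```python
from pathlib import Path
from typing import Iterable


def load_previous_accessions(previous_dir: Path, taxon_id: int) -> set[str]:
    """Collect accessions from earlier lists for one taxon id (hypothetical)."""
    seen: set[str] = set()
    # rglob also searches subdirectories, since lists are moved into
    # directories inside previous_accessions.
    for path in previous_dir.rglob(f"*_{taxon_id}.txt"):
        for line in path.read_text().splitlines():
            accession = line.strip()
            if accession:
                seen.add(accession)
    return seen


def select_new_accessions(
    candidates: Iterable[str],
    previous_dir: Path,
    taxon_id: int,
    limit: int = 200,
) -> list[str]:
    """Return up to `limit` candidates not already used for this taxon id."""
    seen = load_previous_accessions(previous_dir, taxon_id)
    selected: list[str] = []
    for accession in candidates:
        if len(selected) >= limit:
            break
        if accession in seen:
            continue
        seen.add(accession)  # also drops duplicates within the candidates
        selected.append(accession)
    return selected
```

Keeping limit as a keyword argument with a default of 200 preserves the current behavior while letting callers request a different number of accessions.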

License

BSD 3-Clause "New" or "Revised" License
