Scripts and data for managing the large number of RNA-Seq accessions we want to use.
Additional lists of RNA-Seq experiments should be generated by doing the following:
- Run a command similar to `./fetch_new_accessions.bash human_taxon_id.txt 200_human_SRA_20190812_9606.txt`. You may need to change some things, such as the organism name, taxon ID file, and date.
- Move the output file (which would be `200_human_SRA_20190812_9606.txt` after the previous command) to a directory inside the previous_accessions directory.
- Add it to the repo using `git add` and `git commit`, and push the changes with `git push`.
- You've done it! You can now upload the file to S3 and kick off a surveyor dispatcher job for it!
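The steps above can be sketched as a small shell helper. This is a minimal sketch and not part of the repository: the function names `accession_list_name` and `add_accession_list`, the commit message, and the destination directory are hypothetical, and the file name pattern is only inferred from the example file name above.

```shell
# Hypothetical helpers; only fetch_new_accessions.bash and the
# previous_accessions directory come from this repository.

# Build an output file name shaped like the example
# (count_organism_SRA_date_taxonid.txt). The pattern is inferred, not documented.
accession_list_name() {
    local count="$1" organism="$2" date="$3" taxon_id="$4"
    printf '%s_%s_SRA_%s_%s.txt\n' "$count" "$organism" "$date" "$taxon_id"
}

# Fetch a new accession list, move it under the given directory,
# then commit and push it.
add_accession_list() {
    local taxon_file="$1" output_file="$2" dest_dir="$3"
    ./fetch_new_accessions.bash "$taxon_file" "$output_file" || return 1
    mkdir -p "$dest_dir" || return 1
    mv "$output_file" "$dest_dir/" || return 1
    git add "$dest_dir/$(basename "$output_file")" || return 1
    git commit -m "Add accession list $(basename "$output_file")" || return 1
    git push
}
```

For the example above, this could be called as `add_accession_list human_taxon_id.txt "$(accession_list_name 200 human 20190812 9606)" previous_accessions/human`, where `previous_accessions/human` is an assumed subdirectory name. Uploading to S3 and starting the surveyor dispatcher job are left as manual steps.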
The scripts are configured to automatically filter out accessions for the same taxon ID that are already contained in the previous_accessions directory.
The number of accessions to output is not currently a parameter. Instead, it is hardcoded as 200 in `fetch_new_accessions.bash`.
If you'd like to parameterize that value, a PR would be welcome. Otherwise, it's not too hard to find and replace it.
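As a rough illustration of what such a parameter might look like, the sketch below filters previously used accessions and caps the output at a configurable count that defaults to 200. It does not reflect the actual contents of `fetch_new_accessions.bash`: the function name `select_new_accessions`, its arguments, and the assumption that previous accessions are available as a single file with one accession per line are all hypothetical.

```shell
# Hypothetical sketch, not the real script's logic.
# Reads candidate accessions on stdin (one per line), drops any that appear
# in the previous-accessions file, and prints at most COUNT of the rest.
select_new_accessions() {
    local previous_file="$1"
    local count="${2:-200}"   # default matches the currently hardcoded value
    # -F: fixed strings, -x: whole-line match, -v: invert, -f: patterns from file
    grep -vxFf "$previous_file" | head -n "$count"
}
```

With a parameter like this, the existing behaviour is preserved when the count is omitted, and a caller could pass a different number as the second argument.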