krabberod / OperonOrchestrator

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

OperonOrchestrator

This is code for handling sequences for the ribosomal operon database. Sequence input is from Mycocosm and NCBI.

Genomes in NCBI can be found:

rsync -t -v rsync://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_genbank.txt .

Overview as of Oct. 16, 2023

##  See ftp://ftp.ncbi.nlm.nih.gov/genomes/README_assembly_summary.txt for a description of the columns in this file.
## This is from column 25: 
  16538 archaea
1779617 bacteria
  16572 fungi
   6249 invertebrate
   7232 metagenomes
   7593 other (???)
   3604 plant
   1978 protozoa
   2854 vertebrate_mammalian
   3746 vertebrate_other
  80657 viral

TODO: Gather stats from Vsearch aligning on Saga for each genome TODO: Select the reference sequences.

  • should be longer than 4000 bp.
  • If there are several copies they should be most similar to the others

About


Languages

Language:R 100.0%