sequences dropped from the index
fklirono opened this issue
Hello,
kallisto (0.44.0) seems to be silently dropping sequences from the index.
Working example:
- download circBase hg19 circRNA sequences.
- count them (140790)
- index them and count again how many targets are included in the index (92509)
Is there a reason why some sequences are not indexed?
Code to reproduce example:
wget -qO- 'http://www.circbase.org/download/human_hg19_circRNAs_putative_spliced_sequence.fa.gz' | gzip -d -c > human_hg19_circRNAs_putative_spliced_sequence.fa
sed -n '/^>/p' human_hg19_circRNAs_putative_spliced_sequence.fa | wc -l
kallisto index -i human_hg19_circRNAs_putative_spliced_sequence.fa.fai human_hg19_circRNAs_putative_spliced_sequence.fa
kallisto inspect human_hg19_circRNAs_putative_spliced_sequence.fa.fai
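One way to narrow this down is to check the FASTA itself for properties that could plausibly explain a lower target count. The sketch below uses plain awk to count total records, distinct header IDs, distinct sequences, and records shorter than k. `fasta_summary` is a hypothetical helper, not part of kallisto, and whether kallisto 0.44.0 actually collapses duplicates or skips short sequences is an assumption to test, not something established in this thread. The fallback of 31 mirrors kallisto's default k-mer length.

```shell
# Hypothetical helper (not part of kallisto): summarise a FASTA file.
# $1: path to the FASTA file
# $2: k-mer length to compare against (falls back to 31, kallisto's default k)
fasta_summary() {
    awk -v k="${2:-31}" '
        # Account for the record collected so far, if there is one.
        function record_done() {
            if (!have) return
            records++
            if (!(name in ids))  { ids[name] = 1;  n_ids++ }
            if (!(seq in seqs))  { seqs[seq] = 1;  n_seqs++ }
            if (length(seq) < k) shorter++
        }
        # Header line: close the previous record and start a new one.
        # The ID is the first whitespace-delimited token after ">".
        /^>/ { record_done(); have = 1; name = substr($1, 2); seq = ""; next }
        # Sequence line: append, ignoring case, so wrapped records are joined.
        { seq = seq toupper($0) }
        END {
            record_done()
            printf "records=%d unique_ids=%d unique_seqs=%d shorter_than_k=%d\n",
                   records, n_ids, n_seqs, shorter
        }
    ' "$1"
}
```

Running `fasta_summary human_hg19_circRNAs_putative_spliced_sequence.fa` and comparing `unique_ids` and `unique_seqs` against the 92509 targets reported by `kallisto inspect` would show whether duplicates alone could account for the gap.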
Sorry for opening this issue here. I should have opened it directly on the kallisto GitHub repository. I had not had enough coffee yet to wake up.