nanxstats / protr

🧬 Toolkit for generating various numerical features of protein sequences

Home page: https://nanx.me/protr/

extractCTDT returns wrong results when a transition has a count of zero

koefoed opened this issue

> packageVersion('protr')
[1] ‘0.5.1’
> extractCTDT("NAKADQASSDAQTANAKADQASNDANAARSDAQAAKDDAARANQRADNAA")
prop1.Tr1221 prop1.Tr1331 prop1.Tr2332 prop2.Tr1221 prop2.Tr1331 prop2.Tr2332 
  0.67346939           NA           NA   0.36734694   0.22448980           NA 
prop3.Tr1221 prop3.Tr1331 prop3.Tr2332 prop4.Tr1221 prop4.Tr1331 prop4.Tr2332 
          NA           NA   0.67346939   0.36734694   0.22448980           NA 
prop5.Tr1221 prop5.Tr1331 prop5.Tr2332 prop6.Tr1221 prop6.Tr1331 prop6.Tr2332 
  0.22448980           NA   0.26530612   0.04081633   0.42857143           NA 
prop7.Tr1221 prop7.Tr1331 prop7.Tr2332 
  0.57142857   0.06122449   0.10204082 

Note the NA value for prop2.Tr2332.

For property 2, this example sequence has one G2G3 transition but no G3G2 transition.
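Assuming each transition descriptor is the number of adjacent residue pairs that switch between the two groups (in either direction) divided by the N - 1 adjacent pairs in a sequence of length N, the expected value for this 50-residue sequence works out as follows. This assumption is consistent with the other values reported above, e.g. prop2.Tr1221 = 0.36734694 = 18/49.

$$
\text{prop2.Tr2332} = \frac{n_{G2G3} + n_{G3G2}}{N - 1} = \frac{1 + 0}{50 - 1} \approx 0.02040816
$$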

Because the sequence has no G3G2 transition, GSummary[[2]]['G3G2'] is NA (line 45 in the source).
This NA propagates through the sum and wipes out the correct result for G2G3, which in this case is 0.02040816.
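A minimal R illustration of how a missing transition can turn into NA, assuming the counts are looked up by name in a table() built without explicit levels. The pair labels and variable names below are made up for this sketch and are not protr's internals. Looking up an absent name returns NA, which then poisons any sum it enters; declaring every transition as a factor level makes the absent one count as 0 instead.

```r
# Toy example (hypothetical data, not protr's internals): group labels of
# adjacent residue pairs, containing one "23" transition and no "32".
pairs <- c("12", "23", "31", "12")

# Without explicit levels, table() only knows the pairs it has seen,
# so looking up the absent "32" returns NA ...
counts <- table(pairs)
counts["32"]                       # NA
sum(counts[c("23", "32")])         # NA: the real count of 1 is lost

# ... while declaring every transition as a factor level counts it as 0.
lv <- c("12", "21", "13", "31", "23", "32")
counts_fixed <- table(factor(pairs, levels = lv))
counts_fixed["32"]                 # 0
sum(counts_fixed[c("23", "32")])   # 1
```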

Also, most of the other NAs should be zero, since zero is the correct percentage when a transition never occurs.
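A sketch of a per-property transition calculation that cannot produce NA for a missing transition. ctd_transition() is a hypothetical helper, not protr's implementation, and the property-2 grouping used below (GASTPDC / NVEQIL / MHKFRYW, normalized van der Waals volume) is an assumption. With that grouping it reproduces the prop2.Tr1221 and prop2.Tr1331 values reported above, returns 1/49 for Tr2332, and returns 0 rather than NA when a sequence has no transitions at all.

```r
# Hypothetical helper (not part of protr): transition descriptors for ONE
# property, given that property's three amino-acid groups.
ctd_transition <- function(seq, g1, g2, g3) {
  aa <- strsplit(seq, "")[[1]]
  n <- length(aa)
  # Label each residue with its group: "1", "2" or "3"
  grp <- rep(NA_character_, n)
  grp[aa %in% g1] <- "1"
  grp[aa %in% g2] <- "2"
  grp[aa %in% g3] <- "3"
  # Adjacent pairs, e.g. "23" = group-2 residue followed by group-3 residue
  pairs <- paste0(grp[-n], grp[-1])
  # Count each transition explicitly, so an absent one is 0 rather than NA
  lv <- c("12", "21", "13", "31", "23", "32")
  cnt <- vapply(lv, function(p) sum(pairs == p), integer(1))
  c(Tr1221 = cnt[["12"]] + cnt[["21"]],
    Tr1331 = cnt[["13"]] + cnt[["31"]],
    Tr2332 = cnt[["23"]] + cnt[["32"]]) / (n - 1)
}

# Assumed grouping for property 2 (normalized van der Waals volume)
p2 <- list(c("G", "A", "S", "T", "P", "D", "C"),
           c("N", "V", "E", "Q", "I", "L"),
           c("M", "H", "K", "F", "R", "Y", "W"))
x <- "NAKADQASSDAQTANAKADQASNDANAARSDAQAAKDDAARANQRADNAA"
r <- ctd_transition(x, p2[[1]], p2[[2]], p2[[3]])
r  # Tr1221 = 18/49, Tr1331 = 11/49, Tr2332 = 1/49

# A sequence with no transitions gives zeros, not NAs
ctd_transition("AAAA", p2[[1]], p2[[2]], p2[[3]])
```

Counting each transition with sum(pairs == p) instead of looking names up in an unlevelled table() is what guarantees that an absent transition contributes 0 to the sum rather than NA.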