nanxstats / protr

🧬 Toolkit for generating various numerical features of protein sequences

Home Page: https://nanx.me/protr/

error reading FASTA files

kbarylyuk opened this issue

Hi @road2stat ,

I would like to extract some amino acid sequence properties for a set of 3699 proteins. I discovered the protr package, and it seems to offer the functionality I am looking for. Unfortunately, the readFASTA function throws an error even when I try to read the example.fasta file available on the ProtrWeb server:

> AAseq <- readFASTA(file = system.file("Data/example.fasta", package = "protr"))
Error in readFASTA(file = system.file("Data/example.fasta", package = "protr")) : 
  no line starting with a > character found
In addition: Warning message:
In file(con, "r") :
  file("") only supports open = "w+" and open = "w+b": using the former

Any ideas why this fails? Thank you!

My session info:

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Biostrings_2.48.0    XVector_0.20.0       BiocInstaller_1.30.0 protr_1.5-1          dbscan_1.1-2         FactoMineR_1.41      Rtsne_0.13          
 [8] ggplot2_3.0.0        pRolocGUI_1.14.0     pRoloc_1.21.8        MLInterfaces_1.60.1  cluster_2.0.7-1      annotate_1.58.0      XML_3.98-1.16       
[15] AnnotationDbi_1.42.1 IRanges_2.14.11      S4Vectors_0.18.3     MSnbase_2.6.3        ProtGenerics_1.12.0  BiocParallel_1.14.2  mzR_2.14.0          
[22] Rcpp_0.12.18         Biobase_2.40.0       BiocGenerics_0.26.0 

loaded via a namespace (and not attached):
  [1] tidyselect_0.2.4      RSQLite_2.1.1         htmlwidgets_1.2       grid_3.5.1            trimcluster_0.1-2.1   lpSolve_5.6.13        rda_1.0.2-2.1        
  [8] munsell_0.5.0         codetools_0.2-15      preprocessCore_1.42.0 DT_0.4                withr_2.1.2           colorspace_1.3-2      knitr_1.20           
 [15] rstudioapi_0.7        leaps_3.0             geometry_0.3-6        robustbase_0.93-2     dimRed_0.1.0          mzID_1.18.0           labeling_0.3         
 [22] hwriter_1.3.2         bit64_0.9-7           ggvis_0.4.3           rprojroot_1.3-2       coda_0.19-1           ipred_0.9-7           randomForest_4.6-14  
 [29] diptest_0.75-7        R6_2.2.2              doParallel_1.0.11     flexmix_2.3-14        DRR_0.0.3             bitops_1.0-6          assertthat_0.2.0     
 [36] promises_1.0.1        scales_1.0.0          nnet_7.3-12           gtable_0.2.0          affy_1.58.0           ddalpha_1.3.4         timeDate_3043.102    
 [43] rlang_0.2.2           CVST_0.2-2            genefilter_1.62.0     scatterplot3d_0.3-41  RcppRoll_0.3.0        splines_3.5.1         lazyeval_0.2.1       
 [50] ModelMetrics_1.2.0    impute_1.54.0         hexbin_1.27.2         broom_0.5.0           yaml_2.2.0            reshape2_1.4.3        abind_1.4-5          
 [57] threejs_0.3.1         crosstalk_1.0.0       backports_1.1.2       httpuv_1.4.5          caret_6.0-80          tools_3.5.1           lava_1.6.3           
 [64] affyio_1.50.0         RColorBrewer_1.1-2    proxy_0.4-22          plyr_1.8.4            base64enc_0.1-3       progress_1.2.0        zlibbioc_1.26.0      
 [71] purrr_0.2.5           RCurl_1.95-4.11       prettyunits_1.0.2     rpart_4.1-13          viridis_0.5.1         sampling_2.8          sfsmisc_1.1-2        
 [78] LaplacesDemon_16.1.1  magrittr_1.5          data.table_1.11.4     pcaMethods_1.72.0     mvtnorm_1.0-8         whisker_0.3-2         randomcoloR_1.1.0    
 [85] hms_0.4.2             mime_0.5              evaluate_0.11         xtable_1.8-3          mclust_5.4.1          gridExtra_2.3         compiler_3.5.1       
 [92] biomaRt_2.36.1        tibble_1.4.2          V8_1.5                crayon_1.3.4          htmltools_0.3.6       segmented_0.5-3.0     later_0.7.4          
 [99] tidyr_0.8.1           lubridate_1.7.4       DBI_1.0.0             magic_1.5-8           MASS_7.3-50           fpc_2.1-11.1          Matrix_1.2-14        
[106] vsn_3.48.1            gdata_2.18.0          mlbench_2.1-1         bindr_0.1.1           gower_0.1.2           igraph_1.2.2          pkgconfig_2.0.2      
[113] flashClust_1.01-2     recipes_0.1.3         MALDIquant_1.18       foreach_1.4.4         prodlim_2018.04.18    stringr_1.3.1         digest_0.6.16        
[120] pls_2.7-0             rmarkdown_1.10        dendextend_1.8.0      curl_3.2              kernlab_0.9-27        shiny_1.1.0           gtools_3.8.1         
[127] modeltools_0.2-22     nlme_3.1-137          jsonlite_1.5          bindrcpp_0.2.2        viridisLite_0.3.0     limma_3.36.3          pillar_1.3.0         
[134] lattice_0.20-35       httr_1.3.1            DEoptimR_1.0-8        survival_2.42-6       glue_1.3.0            FNN_1.1.2.1           gbm_2.1.3            
[141] prabclus_2.2-6        iterators_1.0.10      bit_1.1-14            class_7.3-14          stringi_1.2.4         mixtools_1.1.0        blob_1.1.1           
[148] memoise_1.1.0         dplyr_0.7.6           e1071_1.7-0       

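@kbarylyuk - system.file() is a base R function that builds the path to a file shipped inside an installed package. Data/example.fasta is not part of the installed protr package, so system.file() returns an empty string, which is why the warning mentions file(""). Please replace it with the path to your own file, for example: readFASTA("Data/example.fasta").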
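
For anyone hitting the same error, here is a minimal sketch of what is going on and one way to guard against it. It uses only base R, so it runs without protr installed. The helper name `resolve_fasta_path` and the sample sequence are made up for illustration and are not part of protr.

```r
# system.file() returns an empty string ("") when the requested file
# is not found inside the named package.
missing_path <- system.file("no/such/file.fasta", package = "base")

# Hypothetical helper (not part of protr): fail early with a clear
# message instead of handing an empty or wrong path to a reader.
resolve_fasta_path <- function(path) {
  if (!nzchar(path) || !file.exists(path)) {
    stop("FASTA file not found: '", path, "'", call. = FALSE)
  }
  normalizePath(path)
}

# Write a small, made-up FASTA file to a temporary location so the
# sketch is self-contained.
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">example_protein", "MKTAYIAKQRQISFVKSHFSRQ"), tmp)

# A valid path passes the check; FASTA header lines start with ">".
fasta_path <- resolve_fasta_path(tmp)
first_line <- readLines(fasta_path, n = 1)

# With protr installed, your own file can then be read by passing
# its path directly.
if (requireNamespace("protr", quietly = TRUE)) {
  seqs <- protr::readFASTA(fasta_path)
}
```

Calling `resolve_fasta_path(missing_path)` stops with a "FASTA file not found" message, which is easier to diagnose than the "no line starting with a > character found" error above.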

Thank you very much, @road2stat ! This is very useful, everything is working now.

I just stumbled on the exact same thing. Maybe add a comment in the instruction code on https://cran.r-project.org/web/packages/protr/vignettes/protr.html? It is not so easy for a bear of very little brain...

@jonalv - good point, could you send in a PR? Thanks!

oki :)