biodavidjm / artMS

Analytical R Tools for Mass Spectrometry

Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated

jfertaj opened this issue

Dear David,

I have installed the new version of artMS, which includes some nice features. However, I am having issues running analyses that ran successfully with artMS 1.9.4.

I get an error during the MSstats step, right after the fractionation-handling step (fractions are not enabled in my experiments).

Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__,  : 
  Join results in 7002761 rows; more than 584171 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.

When I run the same files with artMS 1.9.4, the analysis completes without errors.

This is my YAML configuration file:

files:
  evidence: evidence_LS.txt
  keys: keys_LS.txt
  contrasts: contrasts_LS.txt
  summary: summary_LS.txt
  output: results_LS/results_LS.txt
qc:
  basic: 0
  extended: 0
  extendedSummary: 0
data:
  enabled: 1
  silac:
    enabled: 0
  filters:
    enabled: 1
    contaminants: 1
    protein_groups: keep
    modifications: AB
  sample_plots: 1
msstats:
  enabled: 1
  msstats_input: ~
  profilePlots: none
  normalization_method: equalizeMedians
  normalization_reference: ~
  summaryMethod: TMP
  MBimpute: 1
  censoredInt: NA
  feature_subset: all
  n_top_feature: 3
  logTrans: 2
  remove_uninformative_feature_outlier: no
  min_feature_count: 2
  equalFeatureVar: yes
  remove50missing: no
  fix_missing: ~
  maxQuantileforCensored: 0.999
  use_log_file: no
  append: no
  log_file_path: ~
output_extras:
  enabled: 1
  annotate:
    enabled: 1
    species: HUMAN
  plots:
    volcano: 1
    heatmap: 1
    LFC: -0.58 0.58
    FDR: 0.05
    heatmap_cluster_cols: 0
    heatmap_display: log2FC

Any help would be appreciated.
Thanks,

Juan

Hi Juan, thanks for reporting this.

We need a bit more information to debug this issue.

  • Could you please re-run the artmsQuantification() function with the parameter display_msstats = TRUE and provide the full output displayed in the console?
  • Could you also please copy and paste the content of artms_sessionInfo_quantification.log?

Thanks!

Hi, I ran into the same problem.

My error is:

artMS: Relative Quantification using MSstats

Reading the configuration file
LOADING DATA
MERGING FILES
CONVERT Intensity values < 1 to NA
FILTERING
-- Contaminants CON__|REV__ removed
-- Removing protein groups
-- Use <Leading.razor.protein> as Protein ID
-- PROCESSING AB
CONVERTING THE DATA TO MSSTATS FORMAT
-- Selecting Sequence Type: MaxQuant 'Modified.sequence' column
(+) column added (with value 1, MSstats requirement)
-- Adding NA values for missing values (required by MSstats)
-- Write out the MSstats input file (-mss.txt)
RUNNING MSstats (it usually takes a 'long' time: please, be patient)
-- Normalization method: equalizeMedians
INFO [2021-08-10 00:23:44] ** Features with one or two measurements across runs are removed.
INFO [2021-08-10 00:23:44] ** Fractionation handled.
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, :
Join results in 37881 rows; more than 4218 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.

Thanks!

Thanks,

To debug the issue, we also need the following information:

  • Please run the following commands and provide the output:
# R version
version

# artMS version
packageVersion("artMS")
  • Could you also please copy and paste the content of artms_sessionInfo_quantification.log?

Thanks

version
               _
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          4
minor          1.0
year           2021
month          05
day            18
svn rev        80317
language       R
version.string R version 4.1.0 (2021-05-18)
nickname       Camp Pontanezen

artMS version

packageVersion("artMS")
[1] ‘1.10.2’

And the log is:
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936 LC_CTYPE=Chinese (Simplified)_China.936 LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C LC_TIME=Chinese (Simplified)_China.936

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] artMS_1.10.2

loaded via a namespace (and not attached):
[1] nlme_3.1-152 bitops_1.0-7 bit64_4.0.5 RColorBrewer_1.1-2 httr_1.4.2
[6] GenomeInfoDb_1.28.1 UpSetR_1.4.0 tools_4.1.0 backports_1.2.1 utf8_1.2.2
[11] R6_2.5.0 KernSmooth_2.23-20 lazyeval_0.2.2 DBI_1.1.1 BiocGenerics_0.38.0
[16] colorspace_2.0-2 ade4_1.7-17 tidyselect_1.1.1 gridExtra_2.3 bit_4.0.4
[21] compiler_4.1.0 VennDiagram_1.6.20 preprocessCore_1.54.0 Biobase_2.52.0 formatR_1.11
[26] plotly_4.9.4.1 ggdendro_0.1.22 caTools_1.18.2 scales_1.1.1 checkmate_2.0.0
[31] stringr_1.4.0 digest_0.6.27 minqa_1.2.4 XVector_0.32.0 pkgconfig_2.0.3
[36] htmltools_0.5.1.1 lme4_1.1-27.1 fastmap_1.1.0 limma_3.48.1 htmlwidgets_1.5.3
[41] rlang_0.4.11 GlobalOptions_0.1.2 RSQLite_2.2.7 shape_1.4.6 generics_0.1.0
[46] jsonlite_1.7.2 gtools_3.9.2 dplyr_1.0.7 zip_2.2.0 RCurl_1.98-1.3
[51] magrittr_2.0.1 GenomeInfoDbData_1.2.6 futile.logger_1.4.3 Matrix_1.3-3 Rcpp_1.0.7
[56] munsell_0.5.0 S4Vectors_0.30.0 fansi_0.5.0 lifecycle_1.0.0 yaml_2.2.1
[61] stringi_1.7.3 MASS_7.3-54 zlibbioc_1.38.0 org.Hs.eg.db_3.13.0 gplots_3.1.1
[66] plyr_1.8.6 grid_4.1.0 blob_1.2.2 parallel_4.1.0 MSstatsConvert_1.2.2
[71] ggrepel_0.9.1 crayon_1.4.1 MSstats_4.0.1 lattice_0.20-44 Biostrings_2.60.2
[76] splines_4.1.0 circlize_0.4.13 KEGGREST_1.32.0 pillar_1.6.2 boot_1.3-28
[81] log4r_0.3.2 seqinr_4.2-8 marray_1.70.0 stats4_4.1.0 futile.options_1.0.1
[86] glue_1.4.2 lambda.r_1.2.4 data.table_1.14.0 png_0.1-7 vctrs_0.3.8
[91] nloptr_1.2.2.2 tidyr_1.1.3 gtable_0.3.0 getopt_1.20.3 purrr_0.3.4
[96] cachem_1.0.5 ggplot2_3.3.5 openxlsx_4.2.4 viridisLite_0.4.0 survival_3.2-11
[101] tibble_3.1.3 pheatmap_1.0.12 AnnotationDbi_1.54.1 memoise_2.0.0 IRanges_2.26.0
[106] corrplot_0.90 cluster_2.1.2 ellipsis_0.3.2

Thanks!
You are using the right version. The issue might be the keys.txt file. Could you please paste the content of the keys file here? Alternatively, you could send it by email to artms.help@gmail.com

My keys file is:

Raw.file  Condition  BioReplicate  Run  IsotopeLabelType
A1.raw    a          a_1           1    L
A2.raw    a          a_2           2    L
A3.raw    a          a_3           3    L
B_1.raw   b          b_1           1    L
B_2.raw   b          b_2           2    L
B_3.raw   b          b_3           3    L
C_1.raw   c          c_1           1    L
C_2.raw   c          c_2           2    L
C_3.raw   c          c_3           3    L

OK, we found it: the problem is your keys file. Please check the documentation to find out more about it (Content > Input files > keys.txt):

  • Condition: The condition names must follow these rules:
    • Use only letters (A-Z, both uppercase and
      lowercase) and numbers (0-9). The only special character allowed
      is underscore (_).
    • Very important: a condition name cannot begin with a number
      (R limitation).
  • BioReplicate: biological replicate number. It is based on the condition
    name: use the corresponding Condition name as the prefix, followed by a
    dash (-) and the biological replicate number.
    For example, if condition H1N1_06H has two biological replicates,
    name them H1N1_06H-1 and H1N1_06H-2.

In other words, you are using _ instead of - in the BioReplicate column. Change that (a-1 instead of a_1, etc.) and re-run artmsQuantification().

We definitely need to add a function that checks for this and stops the analysis if the keys file does not follow these rules. We'll do it in the next version of artMS.
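A check of this kind could look roughly like the minimal sketch below. `check_keys()` is a hypothetical helper, not part of artMS; it only encodes the two naming rules quoted above and assumes the keys file has already been read into a data frame (for example with `read.delim("keys.txt")`) with `Condition` and `BioReplicate` columns.

```r
# Hypothetical helper, not part of artMS: returns a character vector
# describing every naming-rule violation (empty if the keys look fine).
check_keys <- function(keys) {
  condition <- as.character(keys$Condition)
  bioreplicate <- as.character(keys$BioReplicate)
  problems <- character(0)

  # Condition rules: only letters, digits and "_", and no leading digit.
  bad_cond <- !grepl("^[A-Za-z0-9_]+$", condition) | grepl("^[0-9]", condition)
  if (any(bad_cond)) {
    problems <- c(problems,
                  paste("Invalid Condition name:", unique(condition[bad_cond])))
  }

  # BioReplicate rule: <Condition>-<replicate number>, e.g. H1N1_06H-1.
  prefix <- paste0(condition, "-")
  suffix <- substring(bioreplicate, nchar(prefix) + 1)
  bad_rep <- !(startsWith(bioreplicate, prefix) & grepl("^[0-9]+$", suffix))
  if (any(bad_rep)) {
    problems <- c(problems,
                  paste("Invalid BioReplicate name:", bioreplicate[bad_rep]))
  }
  problems
}

# First row of the keys posted above (uses "_"), plus one corrected row.
keys <- data.frame(
  Raw.file = c("A1.raw", "A2.raw"),
  Condition = c("a", "a"),
  BioReplicate = c("a_1", "a-2"),
  stringsAsFactors = FALSE
)
check_keys(keys)
# [1] "Invalid BioReplicate name: a_1"
```

Returning a list of problems instead of stopping immediately lets every bad row be reported in one pass; a caller could then `stop()` if the result is non-empty.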

Thanks

I replaced a_1 with a-1, but I got the same error.
By the way, I ran the same file with MSstats directly and it finished without any errors.

OK, I forgot to mention: number the Run column from 1 to 9, so that each raw file has its own Run number, and please try again.
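The join error reported in this thread complains about duplicate key values. A plausible reading, consistent with this fix, is that Run restarted at 1 for every condition, so a join on Run matched each row with three rows instead of one. The toy sketch below illustrates that effect with base R `merge()` (not the data.table join that MSstats runs internally), using the keys posted above with BioReplicate already renamed; `renumber_runs()` is a hypothetical helper, not part of artMS.

```r
# Keys as posted above (BioReplicate already renamed): Run restarts at 1
# for every condition.
keys <- data.frame(
  Raw.file = c("A1.raw", "A2.raw", "A3.raw", "B_1.raw", "B_2.raw", "B_3.raw",
               "C_1.raw", "C_2.raw", "C_3.raw"),
  Condition = rep(c("a", "b", "c"), each = 3),
  BioReplicate = paste0(rep(c("a", "b", "c"), each = 3), "-", rep(1:3, times = 3)),
  Run = rep(1:3, times = 3),
  IsotopeLabelType = "L",
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of artMS: one Run number per raw file (1..n).
renumber_runs <- function(keys) {
  keys$Run <- match(keys$Raw.file, unique(keys$Raw.file))
  keys
}

# Joining two tables on a non-unique Run column multiplies rows:
# each Run value occurs 3 times on both sides, so 3 x 3 x 3 = 27 rows.
runs <- keys[, c("Run", "Raw.file")]
nrow(merge(keys, runs, by = "Run"))   # 27 rows instead of 9

fixed <- renumber_runs(keys)
anyDuplicated(fixed$Run)              # 0: every Run value is now unique
nrow(merge(fixed, fixed[, c("Run", "Raw.file")], by = "Run"))  # 9 rows
```

Using `match()` against the unique raw-file names keeps the numbering stable if a raw file appears on more than one row.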

It finished! Thank you!

Hi,
I want to analyze methylation in my data, so I set a user-defined PTM.

In my config file, I wrote:

data:
  enabled: 1
  silac:
    enabled: 0
  filters:
    enabled: 1
    contaminants: 1
    protein_groups: remove
    modifications: PTM:KR:methyl

But I got this error:

Error in .artms_filterData(x = x, config = config, verbose = verbose) :
The config > data > filters > modification PTM:KR:METHYL is not valid option

Glad to hear that the issue was solved.
With respect to the other question, could you please start a new github issue?

Thanks for your patience in helping me!

Hi David,

Sorry for reopening this issue. I ran my data through MSstats using the example time-course experiment template from the MSstats manual, and it ran without any warnings. I wonder whether the problem is that my data is a time-course experiment, with the same sample measured at two different time points, and that this causes artMS to fail.
I don't know how to translate the annotation file required by MSstats into a keys file for artMS, but I attached the file here in case you want to have a look.

Thanks
Juan
annotation2.txt

Hi Juan,

It looks like you have 6 different conditions (Time1_N, Time1_P, etc.), with 15 bioreplicates each (Sample_10N, Sample_11N, etc.). Is this correct?

If that is the case, you are not following the naming rules explained above and in the documentation.

This is easy to solve: name your bioreplicates Time1_N-1, Time1_N-2, Time1_N-3, ..., Time1_N-15, and so on for the other conditions.
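With 15 bioreplicates per condition, generating these names is less error-prone than typing them. The sketch below is one way to do it; `make_bioreplicates()` is a hypothetical helper (not part of artMS), the raw-file names are placeholders, and the columns simply mirror the keys file posted earlier in this thread (Raw.file, Condition, BioReplicate, Run, IsotopeLabelType).

```r
# Hypothetical helper, not part of artMS: number the samples within each
# Condition (in order of appearance) and build <Condition>-<n> names.
make_bioreplicates <- function(condition) {
  condition <- as.character(condition)
  replicate_number <- ave(seq_along(condition), condition, FUN = seq_along)
  paste0(condition, "-", replicate_number)
}

# Toy example: two of the conditions mentioned above, three samples each.
conditions <- rep(c("Time1_N", "Time1_P"), times = 3)
keys <- data.frame(
  Raw.file = paste0("sample_", seq_along(conditions), ".raw"),  # placeholder names
  Condition = conditions,
  BioReplicate = make_bioreplicates(conditions),
  Run = seq_along(conditions),  # one Run number per raw file
  IsotopeLabelType = "L",
  stringsAsFactors = FALSE
)
keys$BioReplicate
# [1] "Time1_N-1" "Time1_P-1" "Time1_N-2" "Time1_P-2" "Time1_N-3" "Time1_P-3"

# To save as a tab-separated keys file:
# write.table(keys, "keys.txt", sep = "\t", quote = FALSE, row.names = FALSE)
```

`ave()` with `FUN = seq_along` restarts the counter for each condition, so the numbering does not depend on how the rows are sorted.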