cumc / pecotmr

Pair-wise enrichment, colocalization, TWAS and Mendelian Randomization to integrate molecular QTL and GWAS.

Code improvements

danielnachun opened this issue

Now that our package has draft implementations of most major steps, we should start polishing the code to emphasize simplicity, readability and consistency. Broadly speaking, I think we should prefer functions from packages in the following order of sources:

  • tidyverse packages from CRAN
  • other packages from CRAN
  • Bioconductor packages
  • Packages only available on GitHub

The following packages that we currently depend on can probably be removed:

  • Rlab - we use its message2 function, which is not even in the documentation. Can we just use the base message function?
  • doMC, doParallel and foreach - all of these can be replaced with a combination of future and the tidyverse packages dplyr and purrr.
  • data.table - the fread function can be replaced with the vroom function from the vroom package.
  • plink2R - this can be replaced with snpStats for loading bed/bim/fam files and pgenlibr for loading pgen files.
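
As a rough illustration of the foreach and message2 replacements above, here is a minimal sketch. The function `compute_score` and the gene names are hypothetical and are not taken from the package. For parallel execution, the furrr package (an assumption, since only future is named above) provides `future_map_dbl()` with the same call shape once a `future::plan()` has been set.

```r
library(purrr)

# Hypothetical per-gene computation standing in for a real analysis step.
compute_score <- function(n_variants) sqrt(n_variants)

variant_counts <- c(geneA = 4, geneB = 9, geneC = 16)

# foreach-style version being replaced (shown for comparison only):
#   scores <- foreach(n = variant_counts, .combine = c) %dopar% compute_score(n)

# purrr version: map_dbl() returns a numeric vector and keeps the input names.
scores <- map_dbl(variant_counts, compute_score)

# Base message() in place of Rlab's message2().
message("Computed scores for ", length(scores), " genes")
```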

The following packages are currently only available on GitHub. We cannot submit our package to CRAN or Bioconductor until these packages have been made available on one of those two resources:

  • Gao's version of susieR
  • mr.mash.alpha
  • mr.ash.alpha

Stylistically, I would like to also strictly enforce several concepts from functional programming:

  • no loops - replace these with group_modify or group_map from dplyr when working with data frames, and map (or its derivatives) from purrr when working with lists
  • immutable variables - once a variable is declared it should be modified. The only exception to this is to save memory
  • descriptive variable names - variables which are not part of algebraic operations with formal notation in the manuscript should have descriptive variable names.
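
To make the "no loops" point concrete, here is a minimal sketch of both cases. The `sumstats` table and its column names are made up for illustration and do not reflect the package's actual data structures.

```r
library(dplyr)
library(purrr)

# Hypothetical QTL summary statistics; column names are illustrative only.
sumstats <- tibble(
  gene = c("geneA", "geneA", "geneB", "geneB"),
  variant = c("rs1", "rs2", "rs3", "rs4"),
  z = c(1.5, -3.0, 0.5, 2.0)
)

# Data frame case: instead of looping over genes, apply a function to each
# group and keep the variant with the largest absolute z-score per gene.
top_variants <- sumstats %>%
  group_by(gene) %>%
  group_modify(~ slice_max(.x, abs(z), n = 1)) %>%
  ungroup()

# List case: instead of looping over a list, map a function over it.
z_by_gene <- split(sumstats$z, sumstats$gene)
max_abs_z <- map_dbl(z_by_gene, ~ max(abs(.x)))
```

Each result gets its own descriptive name rather than overwriting `sumstats`, so the input stays available for later steps.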
commented

Thank you @danielnachun, everything you said sounds good to me.

immutable variables - once a variable is declared it should be modified.

should, or should not?

until these packages have been made available on one of those two resources

My version of susieR should be merged into stephenslab susieR, hopefully soon, depending on the progress of the other project. However, I can also merge the prototype as is without loudly advertising the feature, just to save us some logistical headaches. I can push through on getting mr.ash and mr.mash on CRAN before our official release.

The following packages that we currently depend on can probably be removed

Sounds good. I wonder who could take care of this -- Travyse and/or Tiffany?

no loops - replace these with group_modify or group_map from dplyr when working with data frames, and map (or its derivatives) from purrr when working with lists

I'm not particularly familiar with these, tbh, but it sounds doable. The question is how to implement it. Perhaps we should wait until we have enough unit tests; then someone (who?) would have to identify each loop, rewrite it, and run through the tests?