hughesevoanth / moosefun

a repo full of random useful functions for myself and colleagues

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Random useful functions for Moose's work

Authors: David Hughes

Date started: 31st Jan. 2020

About:

This repo contains a variety of helpful functions that I come back to time and time again. Rather than adding them into a variety of different packages I will try to maintain this catch-all that I and my colleagues can source when necessary.

Installation instructions of the moosefun package

1. insure that the devtools library is installed on your local machine
	 > ifelse("devtools" %in% rownames(installed.packages()), 
	 NA, 
	 install.packages("devtools"))
	 
2. install moosefun
	> devtools::install_github("hughesevoanth/moosefun")

iPVs functions have moved to their own package

Date moved: March 6th 2020
For functions to aid in the identification of principal variables in inter-correlated data sets please follow the path below.

https://github.com/hughesevoanth/iPVs

coverting rsid to SNPID and SNPID to rsid

the useful functions are: rsid_2_chrbp() & chrbp_2_rsid()

  • Do note that this function is only good for a handful (1 to a few hundred) of SNPs that need an id conversion.
  • The reason for this is because for each ID to convert a query is sent over the web and a built in sleep step needed to be implemented to avoid query crashes, which can still occur.
  • If you can I would avoid trying to run these functions for more than ~100 SNPs at a time.
  • If you need to convert thousands of IDs look into using ANNOVAR (https://annovar.openbioinformatics.org/en/latest/) or some other tool.

EXAMPLE:

#########################
## install biomaRt
#########################
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("biomaRt")

#########################
## 	load libraries
#########################
library(moosefun)
library(biomaRt)

#########################
## convert rsids to snpids
#########################
## read in the SNP list
snps = c("rs4987657","rs4987667","rs4987682")
## Extract mapping cooridinates
map = rsid_2_chrbp(snps)
o = order(map$chr, map$bp)
map = map[o,]

#########################
## convert snpids to rsids
#########################
## read in the SNP list
snps = c("7:142569596:A_G","7:142572908:T_C","7:142574913:A_G")

## NOTE: the function splits snpids on ":" and only uses the first two 
## strings as chr and bp, all else is ignored

## Extract mapping cooridinates
map2 = chrbp_2_rsid(snps)
o = order(map2$chr, map2$bp)
map2 = map2[o,]

--

DESCRIPTION OF OTHER FUNCTIONS

ztransform()

  • taken from the GenABEL package to perform z-transformations. This function is necessary to run my edited version of the rank normal transformation function.

rntransform()

  • taken from the GenABEL package to perform rank normal transformations, but edited to randomly split tied values.

moose_biplot()

  • a function to plot a biplot or a PCA with a loadings plot on top of it.

  • however, the uniqueness here is that we are not plotting the loading from variables used in the construction of the PCA.

  • rather, we are passing a novel set of variables | traits | phenotypes that will be correlated to the PC-axis (1 and 2) and then plotted.

  • as an example:

    1. generate a prcomp object:

       pca = prcomp( iris[, 1:4] )
      
    2. or you can generate a probabilistic pca (pcaRes) object

       pca = ppca( as.matrix( iris[, 1:4] ), nPCs = 4)
      
    3. run the function

       moose_biplot(PCA = pca, dataframe_of_phenotypes = iris[, 1:4], 
        plot_top_N_phenotypes = 3, 
        grouping1 = iris$Species, grouping1NAME = "species",
        grouping2 = iris$Species, grouping2NAME =  "species",
        scalearrows = FALSE )
      
    • the dataframe_of_phenotypes can be any matrix of quantitative trait with the same number of row as passed to the prcomp() or ppca() functions.
    • grouping1 dictates the color scheme and the ellipses to be drawn Currently limited to 9 groups
    • grouping2 dictates the plot shapes to be used. Currently limited to 5 groups
    • scalearrows allows you to scale the largest correlated trait to a rho of 1, and all other arrows in correspoinding manner. This can be done to aid in visualization. Note: if scalearrows is set to TRUE, the relative length of the arrows remain informative.

About

a repo full of random useful functions for myself and colleagues

License:MIT License


Languages

Language:R 100.0%