RajLabMSSM / echodata

echoverse module: Example data.

Home Page:https://rajlabmssm.github.io/echodata

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Get topSS from input GWAS using import_topSNPs

AMCalejandro opened this issue · comments

I find a bit painful having to generate the Gene Locus columns from the input GWAS every time.
I would like to see import_topSNPs being able to generate that for me rather than passing Gene Locus, or a function called make_topSNPs that does it for you.

I came up with a function to do this, basically wrapping the function here https://github.com/oyhel/vautils

This is the approach I used

make_topSNPs = function(data,
                        build = "hg19",
                        write.out = T,
                        custom_gene = NULL,
                        .metadata_file = metadata) {
  # I need to do this in advance to enter vautils function
  data = data %>%
    dplyr::rename(rsid = `#SNP`,
                  chromosome = CHR,
                  position = POS)

  snps_mapped = vautils::find_nearest_gene(as.data.frame(data),
                                           build=build,
                                           collapse=F,
                                           snp = "rsid",
                                           flanking=1000)


  if(any(snps_mapped$distance == "intergenic")) {
    snps_mapped = snps_mapped %>%
      mutate(distance = recode(distance, "intergenic" = "0"))
  }

  snps_mapped_filt = snps_mapped %>%
    mutate(distance = abs(as.numeric(distance))) %>%
    arrange(distance) %>%
    group_by(rsid) %>%
    filter(row_number() == 1) %>%
    ungroup()

  if (!is.null(custom_gene)) {
     custom_gene_filt = snps_mapped %>%
       dplyr::filter(GENE %in% custom_gene)
     # Remove the picked gene from vautils
     snps_mapped_filt = snps_mapped_filt %>%
       dplyr::filter(!((chromosome == custom_gene_filt$chromosome) & (position == custom_gene_filt$position)))

     snps_mapped_filt = rbind(snps_mapped_filt, custom_gene_filt)
  }

  snps_mapped_filt = data.frame(Locus = snps_mapped_filt$GENE,
                     Gene = snps_mapped_filt$GENE,
                     CHR = snps_mapped_filt$chromosome,
                     POS = snps_mapped_filt$position,
                     SNP = snps_mapped_filt$rsid)


  tmp_2 = data %>%
    dplyr::filter(rsid %in% snps_mapped_filt$SNP) %>%
    dplyr::select(SNP = rsid, Effect, P = Pval)
  mydf = snps_mapped_filt %>% inner_join(tmp_2)


  if (write.out) {
    fwrite(mydf, paste(.metadata_file[1],"top_SNPs.txt", sep = "/"),
           col.names = T,
           row.names = F,
           sep ="\t",
           quote = F)
  }
  return(mydf)
}