bschilder / ThreeWayTest

Summary statistics-based association test for identifying the pleiotropic effects of a set of genetic variants

Provide full data

bschilder opened this issue

From your notes:

Note that the full final_1kg_genotype_correct.Rdat is not uploaded because of the size. Here we upload one part-version of this data final_1kg_genotype_correct_sub.Rdata and a document on how to access the full data.

There are several ways to work around this issue. They essentially all involve a function that downloads the large data file and then caches it locally on the user's computer.

With piggyback

I personally find it easiest to use piggyback, which has the added benefit of keeping your data in the same GitHub repo as your package (this makes it easier to keep track of).
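As a rough illustration of the piggyback route, here is a minimal sketch that downloads a release asset only when it is not already present locally. The helper name `download_genotype_pb`, the `"latest"` tag, and the use of `tempdir()` as the default destination are assumptions made for the example; only `piggyback::pb_download()` and the repo name come from the real package and this thread.

```r
# Hypothetical helper (not part of ThreeWayTest): fetch a release asset with
# piggyback, skipping the download if the file already exists in `dest`.
download_genotype_pb <- function(file,
                                 repo = "bschilder/ThreeWayTest",
                                 tag = "latest",   # assumed release tag
                                 dest = tempdir()) {
  path <- file.path(dest, file)
  if (!file.exists(path)) {
    if (!requireNamespace("piggyback", quietly = TRUE)) {
      stop("Please install piggyback first: install.packages('piggyback')")
    }
    # Downloads `file` from the GitHub release `tag` of `repo` into `dest`.
    piggyback::pb_download(file = file, repo = repo, tag = tag, dest = dest)
  }
  path
}
```

Once the full file is attached to a release, something like `download_genotype_pb("final_1kg_genotype_correct.Rdata")` would return the local path, and a second call would reuse the existing copy.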

Other approaches

Data hosting

You can also host the data elsewhere and write your own downloader function. Here are a couple of free hosting options for scientific data:

Data caching

Here are several ways/locations you can cache data in R:

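  • tools::R_user_dir("ThreeWayTest", which = "cache"): Points to a package-specific cache directory (requires R >= 4.0.0; without which = "cache" it returns the package's data directory).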
  • BiocFileCache: Bioconductor's dedicated package for caching files.
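The download-then-cache pattern itself can be sketched in base R. This is only an illustration under stated assumptions: `cache_file` and its `download_fun` argument are hypothetical names, and the downloader is left to the caller (it could wrap piggyback, `utils::download.file()`, or anything else that writes to the given path).

```r
# Hypothetical helper: return the path to a cached file, calling
# `download_fun(path)` only when the file is not in the cache yet.
cache_file <- function(filename,
                       download_fun,
                       cache_dir = tools::R_user_dir("ThreeWayTest",
                                                     which = "cache")) {
  # Create the cache directory on first use.
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(cache_dir, filename)
  if (!file.exists(path)) {
    download_fun(path)  # expected to write the file to `path`
  }
  path
}
```

With this shape, the first call pays the download cost and later calls simply return the existing path, so the large file is fetched at most once per machine.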

@ftdbdl has uploaded the full data to GitHub Releases using piggyback:
final_1kg_genotype_correct.Rdata

https://github.com/bschilder/ThreeWayTest/releases

I'll add a convenience function for downloading and caching the data locally.

Thank you so much! Please @ me if there are any further changes you need me to make!

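I've just reuploaded the genotype data in .rds format, as this is more flexible: load() restores .rda objects into the calling environment under the names they were saved with, so you have no control over the variable name, whereas readRDS() returns the object and lets you assign it to any name you like.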
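The difference between the two formats can be seen with a small toy object. The matrix below is just a stand-in for illustration, not the real genotype data.

```r
geno <- matrix(0:5, nrow = 2)        # toy stand-in object

rda_path <- tempfile(fileext = ".rda")
rds_path <- tempfile(fileext = ".rds")

save(geno, file = rda_path)          # stores the object together with its name
saveRDS(geno, file = rds_path)       # stores only the object

rm(geno)

# load() recreates `geno` under its saved name and returns that name.
loaded_names <- load(rda_path)

# readRDS() returns the object, so the caller picks the variable name.
my_genotypes <- readRDS(rds_path)
```

After running this, `geno` exists again because `load()` restored it by name, while `my_genotypes` holds the same data under a name chosen by the caller.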

The full genotype data (final_1kg_genotype_correct.rds) can now be downloaded and cached with the function:

ThreeWayTest::get_full_genotype()