The reGSusie Nextflow pipeline is designed for conducting Genome-Wide Association Studies (GWAS) using a combination of regenie, LDstore2, and susieR. This pipeline offers a comprehensive and customizable solution for processing genetic association data and fine-mapping causal variants.
- Conduct association testing with regenie.
- Compute in-sample LD matrix with LDstore2.
- Perform statistical fine-mapping with susieR.
Before running the pipeline, ensure you have the following dependencies installed:
- Clone the reGSusie GitHub repository and navigate to the directory:

      git clone https://github.com/idarahu/reGSusie.git
      cd reGSusie
- Customize the `nextflow.config` file based on your analysis requirements (see Pipeline Configuration below).
- To execute the pipeline, use the `submit_slurm.sh` script, which is configured to submit a Slurm job for pipeline execution. Run it with: `sbatch submit_slurm.sh`
For specific parameters, file types, and other details needed for regenie, LDstore2, and susieR, please refer to their respective documentation.
- `full_pipeline`: If set to `false`, only the `LDstore` and `SuSiE` components of the pipeline are executed.
- `regenie_outputs`: A text file (`.txt`) in which each row names a `regenie` output file to be used as input for `LDstore` and `SuSiE` when `full_pipeline` is set to `false`.
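As a minimal sketch of the list file that `regenie_outputs` points to, the helper below writes one path per row and reads it back. The function names and the example paths are hypothetical and are not part of the pipeline; the only thing taken from the description above is the one-file-per-row layout.

```python
from pathlib import Path


def write_regenie_outputs(result_files, list_path):
    """Write one regenie output path per row (the layout described above)."""
    list_path = Path(list_path)
    list_path.write_text("".join(f"{path}\n" for path in result_files))
    return list_path


def read_regenie_outputs(list_path):
    """Return the non-empty rows of the list file, in order."""
    rows = Path(list_path).read_text().splitlines()
    return [row.strip() for row in rows if row.strip()]


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    files = ["results/regenie_height.regenie", "results/regenie_bmi.regenie"]
    listing = write_regenie_outputs(files, "regenie_outputs.txt")
    print(read_regenie_outputs(listing))
```

The resulting text file would then be the value given to `regenie_outputs` when `full_pipeline` is `false`.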
- `step1_input_format`: Select the input format for `regenie` step 1: `bed`, `pgen`, or `bgen`.
- `step1_bed`: When `bed` is selected as the input format, the pipeline assumes that the input folder contains the required files with the extensions `.bed`, `.bim`, and `.fam`.
- `step1_pgen`: When `pgen` is selected as the input format, the pipeline assumes that the input folder contains the required files with the extensions `.pgen`, `.pvar`, and `.psam`.
- `step1_bgen`: When `bgen` is selected as the input format, the pipeline assumes that the input folder contains the required files with the extensions `.bgen`, `.sample`, and `.bgi`.
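Before launching a run, it can help to confirm that the step 1 input folder actually holds the extensions listed above. The following is a minimal sketch of such a check; `missing_step1_extensions` is a hypothetical helper, not something the pipeline provides, and it only looks at file extensions, not at file contents.

```python
from pathlib import Path

# Extensions expected for each step 1 input format, as listed above.
STEP1_EXTENSIONS = {
    "bed": (".bed", ".bim", ".fam"),
    "pgen": (".pgen", ".pvar", ".psam"),
    "bgen": (".bgen", ".sample", ".bgi"),
}


def missing_step1_extensions(folder, input_format):
    """Return the expected extensions that no file in `folder` carries."""
    if input_format not in STEP1_EXTENSIONS:
        raise ValueError(f"unknown step 1 input format: {input_format!r}")
    present = {p.suffix for p in Path(folder).iterdir() if p.is_file()}
    return [ext for ext in STEP1_EXTENSIONS[input_format] if ext not in present]
```

An empty list means every expected extension was found; anything else names what is still missing from the folder.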
- `phenotype_file`: Path to the phenotype file.
- `phenotype_list`: Path to a text file containing a comma-separated list of phenotypes.
- `covariate_file`: Path to the covariate file.
- `covariate_list`: A comma-separated list of covariates.
- `bsize`: Size of the genotype blocks. The default value is 4000.
- `samples_to_keep`: Path to a file specifying samples to keep.
- `minINFO`: Minimum imputation info score. The default value is 0.6.
All the following files for regenie step 2 must adhere to the format `chr*.bgen`/`chr*.sample`/`chr*.bgen.bgi`, where `*` represents the chromosome number.
- `step2_bgen`: Path to BGEN files for regenie step 2.
- `step2_sample`: Path to sample files for regenie step 2.
- `step2_bgi`: Path to BGI files for regenie step 2.
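The naming rule for the step 2 files can be sketched as a small check. This is an illustration only: it assumes that the three step 2 paths are directories, that autosomes 1 to 22 are analysed, and that the chromosome number in the file name is not zero-padded. None of these assumptions is confirmed by the text above, and `step2_files` and `missing_step2_files` are hypothetical helpers.

```python
from pathlib import Path


def step2_files(bgen_dir, sample_dir, bgi_dir, chrom):
    """Build the chr*.bgen / chr*.sample / chr*.bgen.bgi paths for one chromosome."""
    return {
        "bgen": Path(bgen_dir) / f"chr{chrom}.bgen",
        "sample": Path(sample_dir) / f"chr{chrom}.sample",
        "bgi": Path(bgi_dir) / f"chr{chrom}.bgen.bgi",
    }


def missing_step2_files(bgen_dir, sample_dir, bgi_dir, chromosomes=range(1, 23)):
    """List every expected step 2 file that does not exist on disk."""
    missing = []
    for chrom in chromosomes:
        for path in step2_files(bgen_dir, sample_dir, bgi_dir, chrom).values():
            if not path.is_file():
                missing.append(path)
    return missing
```

Running `missing_step2_files` on the configured locations before submitting the job gives an early warning about misnamed files.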
- `p_value`: p-value threshold (default is 5e-08) for defining a genome-wide significant locus.
- `window_size`: Half of the size of the fine-mapping region. The default value is 1500000, which corresponds to a 3 Mb total window around a lead variant.
- `max_region_width`: Maximal fine-mapping region width (default is 6 Mb).
- `window_shrink_ratio`: Shrink ratio (default is 0.95) applied when regions exceed the maximal width (original regions are reduced recursively).
- `bgen_chr_has_zero`: Should be set to `T` (true) if chromosomes `1..9` are given as `01..09` in the BGEN files.
- `remove_MHC`: If set to `T` (true), the major histocompatibility complex (MHC) region is excluded from the analysis.
- `GRCh`: Genome build; choose either `37` or `38`.
- `MHC_start` and `MHC_end`: Coordinates of the MHC region in the selected build (GRCh37 or GRCh38).
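To make the interplay of `window_size`, `max_region_width`, and `window_shrink_ratio` concrete, here is a rough sketch of one way such region building could work. It is an assumption about the logic, not the pipeline's actual implementation: it places a window of `window_size` on each side of every lead variant, merges overlapping windows, and, if any merged region is wider than `max_region_width`, shrinks the window by the ratio and repeats. It uses a loop rather than recursion, and the helper names are hypothetical. `chrom_label` illustrates the zero-padded chromosome labels that `bgen_chr_has_zero` refers to.

```python
def chrom_label(chrom, bgen_chr_has_zero=False):
    """Format a chromosome number as '01'..'09' when the BGEN files use leading zeros."""
    return f"{int(chrom):02d}" if bgen_chr_has_zero else str(int(chrom))


def merge_windows(lead_positions, window):
    """Place +/- `window` around each lead position and merge overlapping windows."""
    regions = []
    for pos in sorted(lead_positions):
        start, end = max(1, pos - window), pos + window
        if regions and start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(region) for region in regions]


def build_regions(lead_positions, window_size=1_500_000,
                  max_region_width=6_000_000, window_shrink_ratio=0.95):
    """Shrink the window until no merged region exceeds `max_region_width`."""
    if not 0 < window_shrink_ratio < 1:
        raise ValueError("window_shrink_ratio must be between 0 and 1")
    window = window_size
    regions = merge_windows(lead_positions, window)
    while window > 0 and any(end - start > max_region_width for start, end in regions):
        window = int(window * window_shrink_ratio)  # assumed shrinking rule
        regions = merge_windows(lead_positions, window)
    return regions


if __name__ == "__main__":
    # A single lead variant at 10 Mb gets the default 3 Mb window.
    print(build_regions([10_000_000]))
    # Three leads 2 Mb apart would merge into a 7 Mb region, so windows shrink.
    print(build_regions([10_000_000, 12_000_000, 14_000_000]))
    print(chrom_label(6, bgen_chr_has_zero=True))
```

With the default values, a single lead variant yields one region of exactly 3 Mb, which matches the description of `window_size` above. Whether the pipeline shrinks every window or only those in the oversized region is not stated here, so treat that part of the sketch as a guess.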
- `samples_ld_incl`: Path to the text file with samples that should be included.
- `n_covariates`: Number of covariates used in the analysis.
- `max_causal_SNPs`: Maximum number of causal variants (default is 10).
- `outdir`: Directory where results are stored (default is 'results').
- `prefix1`: Prefix for regenie results (default is 'regenie').
- `prefix2`: Prefix for fine-mapping results (default is 'finemapping').
- `executor`: Process executor.
- `queue`: Process queue.
- Include the `base.config` file.
- Singularity is enabled.
- Auto-mounts are enabled.
- Cache directory: `"$baseDir/singularity_img/"`
- `check_max`: A function to check and validate max memory, time, and CPU parameters.