strug-hub / UKBB_spirometry_on_COPD

Colocalization analysis of the Meconium Ileus genome-wide association study (GWAS) signal around SLC26A9 (Gong et al., 2019) with spirometry association analyses of the same region in the UK Biobank and the SpiroMeta Consortium (Shrine et al., 2019), to elucidate whether there is a shared genetic contribution to disease.

Introduction

This project contains the steps undertaken to perform association analyses of spirometry measures in the UK Biobank (UKBB) with variants around the SLC26A9 gene in participants with spirometry-defined COPD, using modified GOLD 2-4 criteria (moderate to very severe airflow limitation). The definition is relaxed in that measurements are not required to be post-bronchodilator. According to Mannino and Buist (2007), relying on pre-bronchodilator lung function may overestimate airflow obstruction.
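The spirometric case definition can be sketched in a few lines. This is a minimal illustration in Python, assuming the thresholds listed under step 2 of the Analysis section (FEV1/FVC ratio < 0.7 and FEV1pp < 80%). The function name and the example participants are hypothetical; the project's own implementation lives in 02-subset_qc_copd_individuals.R.

```python
def is_gold2_4(fev1_l, fvc_l, fev1pp):
    """Return True if lung function meets the modified GOLD 2-4 definition.

    fev1_l, fvc_l: best FEV1 and best FVC in litres (not necessarily
                   post-bronchodilator, matching the relaxed definition).
    fev1pp:        FEV1 as a percentage of its predicted value.
    """
    if fvc_l <= 0:
        raise ValueError("FVC must be positive")
    return (fev1_l / fvc_l) < 0.7 and fev1pp < 80.0


# Hypothetical participants: (FEV1 in L, FVC in L, FEV1 percent predicted)
participants = {
    "A": (2.0, 3.5, 65.0),  # ratio ~0.57 and FEV1pp < 80: counted as a case
    "B": (3.2, 4.0, 95.0),  # ratio 0.80: not a case
    "C": (2.4, 3.6, 85.0),  # ratio ~0.67 but FEV1pp >= 80: not a case
}
cases = [pid for pid, values in participants.items() if is_gold2_4(*values)]
print(cases)
```

Both conditions must hold, so participant C, who has an obstructive ratio but a preserved FEV1pp, is excluded from the case set.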

Analysis

  1. 01-extract_phenos_of_interest.sh

    • input:
      • ukb24727.tab, which contains all phenotypic information from UKBB
    • output:
      • ukb24727_spirometry.tab, a smaller file containing the required variables only
  2. 02-subset_qc_copd_individuals.R

    • input:
      • ukb24727_spirometry.tab
    • output:
      • ukbb_spiro_and_geno_qc_v2.csv, which contains all the individuals passing spirometry and genotyping QC, and their spirometry measures (best FEV1, best FVC, and FEV1pp)
      • GOLD2-4_copd_ukbb_spirodata.csv, which is the subset of individuals from ukbb_spiro_and_geno_qc.csv that fit GOLD class 2-4 criteria for lung function (i.e. FEV1/FVC ratio < 0.7 and FEV1pp < 80%)

    This step removes individuals who did not pass spirometry and genotyping QC, removes related and non-European individuals, and calculates FEV1pp using the GLI calculator (Global Lung Function Initiative 2021, version 2.0). The procedure is similar to that of Shrine et al. (2019), with two differences. First, related individuals were removed using KING's --unrelated option (v2.0), which removes 36,004 participants versus 1,165 in Shrine et al. Second, non-Europeans were removed using UKBB's VariableID 22006, whereas Shrine et al. opted for a K-means clustering method. Taken together, Shrine et al.'s method yields 321,047 participants, whereas our method yields 263,461.

  3. 03-pca.sh

    • input: genotype array data for:
      • All individuals defined in GOLD2-4_copd_ukbb_spirodata.csv
      • All individuals defined in ukbb_spiro_and_geno_qc.csv
    • output:
      • 15-ukbb_copd_pcair_eigenvectors.txt for individuals corresponding to GOLD2-4_copd_ukbb_spirodata.csv
      • 18-ukbb_ukbbspiro_flashpca2_eigenvectors.txt for individuals corresponding to ukbb_spiro_and_geno_qc.csv

    FlashPCA v2.1 was used to calculate the principal components.

  4. 04-assoc_spirometry.R

    • inputs:
      • ukbb_spiro_and_geno_qc.csv
      • 15-ukbb_copd_pcair_eigenvectors.txt, derived from genotyped array data from UKBB using flashPCA2 (v2.1)
      • 18-ukbb_ukbbspiro_flashpca2_eigenvectors.txt, derived from genotyped array data from UKBB using flashPCA2 (v2.1)
      • ukb_imp_chr1_v3.bgen, which is the imputation dataset provided by UKBB
    • output:
      • ratio.irnt.assoc_v3.csv, association results for FEV1/FVC ratio among all 263,461 UKBB participants
      • fev1pp.irnt.assoc_v3.csv, association results for FEV1pp among all 263,461 UKBB participants
      • pef.irnt.assoc_v3.csv, association results for PEF among all 263,461 UKBB participants
      • ratio.irnt.assoc_copd_only.csv, association results for FEV1/FVC ratio among all UKBB participants with spirometrically-defined COPD as per GOLD2-4 (N=22,071)
      • fev1pp.irnt.assoc_copd_only.csv, association results for FEV1pp among all UKBB participants with spirometrically-defined COPD as per GOLD2-4 (N=22,071)
      • pef.irnt.assoc_copd_only.csv, association results for PEF among all UKBB participants with spirometrically-defined COPD as per GOLD2-4 (N=22,071)
      • hasCOPD.assoc.csv, case-control association analysis of spirometrically-defined COPD cases (GOLD2-4), against those with healthy lung function (22,071 cases versus 242,097 controls)
  5. merge_and_convert_to_html.py

    • inputs: the association files:
      • ratio.irnt.assoc.tsv
      • fev1pp.irnt.assoc.tsv
      • ratio.irnt.assoc_copd_only.tsv
      • fev1pp.irnt.assoc_copd_only.tsv
      • hasCOPD.assoc.tsv
    • output:

This step prepares the association results for loading as secondary datasets into LocusFocus.
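As a rough illustration of what such a merge step might look like, the sketch below combines several tab-separated association tables into a single HTML document with one titled table per dataset. The layout (an `<h3>` title followed by a `<table>`), the column names, and the function names are assumptions made for illustration only. They are not taken from merge_and_convert_to_html.py, and the format LocusFocus actually expects should be checked against its documentation.

```python
import csv
import html
import io


def tsv_to_html_table(tsv_text):
    """Convert tab-separated text (first row = header) into an HTML table."""
    rows = [row for row in csv.reader(io.StringIO(tsv_text), delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    lines = ["<table>"]
    lines.append("  <tr>" + "".join(f"<th>{html.escape(c)}</th>" for c in header) + "</tr>")
    for row in body:
        lines.append("  <tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


def merge_to_html(datasets):
    """Merge {title: tsv_text} into one HTML page, one titled table per dataset."""
    parts = ["<html>", "<body>"]
    for title, tsv_text in datasets.items():
        parts.append(f"<h3>{html.escape(title)}</h3>")
        parts.append(tsv_to_html_table(tsv_text))
    parts += ["</body>", "</html>"]
    return "\n".join(parts)


# Hypothetical miniature inputs; real files would be read from disk instead.
datasets = {
    "ratio.irnt.assoc": "chr\tpos\tp\n1\t205900000\t1e-8\n",
    "fev1pp.irnt.assoc": "chr\tpos\tp\n1\t205900000\t0.2\n",
}
page = merge_to_html(datasets)
print(page)
```

Keeping each dataset in its own titled table means the traits stay distinguishable after merging, which matters when several secondary datasets are tested against the same primary signal.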

Results

The GWAS of Meconium Ileus (MI) at chr1:205,780,000-205,940,000 was tested for colocalization against the lung function phenotypic associations derived above, to test for the pleiotropic effects of this modifier locus of Cystic Fibrosis (CF) on lung function.

Colocalization was observed when the genome-wide associated peak was tested:

LocusFocus plot testing colocalization of PEF (peak expiratory flow) from Shrine et al. (2019) (shown as points, with the corresponding left y-axis) against the GTEx V7 lung eQTL for the SLC26A9 gene, the Meconium Ileus GWAS, FEV1pp calculated from UKBB spirometry measures in all participants after QC as explained above, the FEV1/FVC ratio also from Shrine et al. (2019), and the COPD case/control study calculated in this project after QC (cases=22,071; controls=241,390). All secondary datasets are shown as lines traversing the lowest p-values per window (window=6.67 kbp), with the corresponding right y-axis. The Simple Sum colocalization region tested (gray area) was selected to match the observed peak at chr1:205,899,000-205,925,000. A total of 85 SNPs in this region were used to test for colocalization using the Simple Sum method.
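The "lowest p-value per window" lines in the plot can be approximated as follows. This is a minimal sketch, not LocusFocus's own code: it assumes fixed-width, non-overlapping windows anchored at the start of the plotted region, and the function name and example SNPs are hypothetical.

```python
import math
from collections import defaultdict


def lowest_p_per_window(snps, start, window_bp):
    """Bin (position, p-value) pairs into fixed-width windows and keep the
    smallest p-value in each window.

    Returns a list of (window_start, -log10(min p)) tuples sorted by position.
    Windows containing no SNPs are omitted.
    """
    best = defaultdict(lambda: 1.0)
    for pos, p in snps:
        if pos < start:
            continue  # ignore SNPs upstream of the plotted region
        idx = (pos - start) // window_bp
        best[idx] = min(best[idx], p)
    return [(start + idx * window_bp, -math.log10(p)) for idx, p in sorted(best.items())]


# Hypothetical SNPs within the region plotted above
snps = [
    (205_780_100, 1e-3),
    (205_781_000, 1e-5),  # smallest p-value in the first window
    (205_790_000, 1e-2),  # falls in the second window
]
line = lowest_p_per_window(snps, start=205_780_000, window_bp=6_670)
for window_start, neglog10p in line:
    print(window_start, round(neglog10p, 2))
```

Plotting one point per window keeps the secondary datasets legible as lines while preserving the strongest signal in each stretch of the region.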

Colocalization results obtained are summarized in UKBB_spirometry_SS_pvalues.csv. In short, the MI GWAS colocalizes with FEV1/FVC ratio and PEF (peak expiratory flow) association studies, and this colocalization is found to be statistically significant after multiple testing correction (-log10P > 1.78).

| Dataset | Simple Sum colocalization -log10(P-value) |
| --- | --- |
| Meconium Ileus GWAS (N=6,770) | 8.1 |
| FEV1/FVC in Shrine et al. (N~396,686) | 8.65 |
| FEV1pp in UKBB (N=263,461) | Did not pass first stage test |
| COPD case-control (N=263,461) | Did not pass first stage test |
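The significance threshold quoted above (-log10P > 1.78) is numerically consistent with a Bonferroni correction at alpha = 0.05 across three tests, since -log10(0.05/3) ≈ 1.78. The number of tests is inferred from the threshold rather than stated in the results, so the following is a sketch under that assumption; the function name is hypothetical.

```python
import math


def bonferroni_neglog10_threshold(alpha, n_tests):
    """Return the -log10(P) value a test must exceed after Bonferroni correction."""
    return -math.log10(alpha / n_tests)


# Assumption: alpha = 0.05 corrected for 3 tests, which reproduces 1.78
threshold = bonferroni_neglog10_threshold(0.05, 3)
print(round(threshold, 2))  # 1.78

# Simple Sum -log10(P-values) reported in the table above
reported = {
    "Meconium Ileus GWAS": 8.1,
    "FEV1/FVC in Shrine et al.": 8.65,
}
significant = [name for name, value in reported.items() if value > threshold]
print(significant)
```

Both reported values exceed the threshold by a wide margin, so the conclusion does not hinge on the exact number of tests assumed.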

License: MIT License

