miltondp / DETECT

DETECT: DiseasE Trajectory fEature extraCTion method

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Set-Up:
    - your own Snakefile with the following first line:

    include: "/path/to/DETECT/Snakefile" # this imports the rules from this module

    - config file
    - outcome code lists
    - input date matrices indicated by config file


Input (Date of Onset) Matrix Format:
    - .csv
    - column names = IID, code1, code2, etc...
    - row names = person1, person2, etc...
    - cell values - decimal value with unit YEARS relative to some date (we used the linux epoch by default)

Running:
    - in a python 3.8 environment with pandas, numpy, scipy, matplotlib, and snakemake>=7.8.3
    - for dry-run > snakemake -n {OUTPUT} # {OUTPUT} is the target file you're asking for
    - example: lsf queueing/cluster system profile > snakemake --profile lsf {OUTPUT} # {OUTPUT} is the target file you're asking for
    - A shortcut to asking for multiple output files would be to add a custom all rule to your Snakefile - example:

rule all:
    input:
        'Summary/positive_phecode_tests.csv',
        'Summary/positive_icd_tests.csv,
        expand('Trajectories/trajectories_{index}_{outcome}_icd_{dataset}.csv', outcome=[c.replace('*', '') for c in open(config['icd_code_list']).read().splitlines()], dataset=[k for k, v in config['icd_date_matrices'].items()], index=config['icd_index_code_list'])),
        'Trajectory_Times/E11_I21._icd_4dig.all_signpost_times.csv.gz' # particular interest in diabetes -> MI

    - to use your custom all rule > snakemake all --profile lsf

Suggested Target outputs:
    - All phecode tests with RR > 1 = 'Summary/positive_phecode_tests.csv'
    - All ICD tests with RR > 1 = 'Summary/positive_icd_tests.csv'
    - Individual-Level Trajectories for one Phecode test = 'Trajectories/trajectories_{index}_{outcome}_phecode_{dataset}.csv'
    - Individual-Level Trajectories for one ICD Code test = 'Trajectories/trajectories_{index}_{outcome}_icd_{dataset}.csv'

Aggregated Target outputs:
    - All Individual-Level Trajectories for Phecodes = expand('Trajectories/trajectories_{index}_{outcome}_phecode_{dataset}.csv', outcome=[c.replace('*', '') for c in open(config['phecode_list']).read().splitlines()], dataset=[k for k, v in config['phecode_date_matrices'].items()], index=config['phecode_index_code_list'])
    - All Individual-Level Trajectories for ICD codes = expand('Trajectories/trajectories_{index}_{outcome}_icd_{dataset}.csv', outcome=[c.replace('*', '') for c in open(config['icd_code_list']).read().splitlines()], dataset=[k for k, v in config['icd_date_matrices'].items()], index=config['icd_index_code_list'])

Plotting/Detailed outputs:
    - Trajectory timing information for selected index/outcome combinations = 'Trajectory_Times/{index}_{outcome}_icd_4dig.all_signpost_times.csv.gz'

About

DETECT: DiseasE Trajectory fEature extraCTion method


Languages

Language:Python 100.0%