mandawilson / CMOMerge

Merge two CMO projects from their portal repository folders

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

CMOMerge

Merge multiple CMO projects from their portal repository folders

Usage

CMOMerge/App.sh
usage: MergePortal [-h] [--tumorType TUMORTYPE] [--labName LABNAME]
                   [--projectNumber PROJECTNUMBER]
                   [--institutionName INSTITUTIONNAME]
                   [--mergeBatches MERGEBATCHES]
                   [--cnaGeneList CNAGENELIST]
                   [--project PROJECTPATH:CLINICALFILE] (:CLINICALFILE is optional)

For projects with no merging issues you should simply be able to do

./CMOMerge/App.sh \
    --project /path/to/portal/repo/batch1 \
    --project /path/to/portal/repo/batch2

and this will create a merged project in the _merge subfolder. E.g.:

./CMOMerge/App.sh
    --projectNumber 5529_mi \
    --project bic-mskcc/cecc/cmo/levine2/5529_b \
    --project bic-mskcc/cecc/mskcc/levine2/5529

will create:

_merge/cecc/cbe/levine2/5529_mi

  • --projectNumber is mandatory

  • For --tumorType, --labName and --institutionName; if all batches in merge have consistent data for these the program will select automatically. If there are discrepencies then they need to be specified explicitly. The program will print an error in these cases and exit.

Copy Number Gene Lists

If there is an incompatibility in the genes in the copy number data you will get an:

Inconsistent gene sets

The program will do a union merge which means the union of genes in all batches will be used in the merge and any batch that is missing a gene will get NA in the CNA table.

If you do not want this to happen, for example you prefer the intersection merge to be done, then you will then you need to specify a common gene set to use. This is done with the

--cnaGeneList <GENELISTNAME>

option and currently the following gene lists are available:

  • impact341.genes: The genes in IMPACT341
  • hemepactA.genes: The intersection of hemepactV2 and hemepactV3

About

Merge two CMO projects from their portal repository folders


Languages

Language:Python 92.0%Language:Shell 8.0%