sushilashenoy / annotate-family-pairs

R script which can label relationships in a pedigree

annotate-family-pairs

Basic usage

To list all pairs of individuals within each family of a PLINK .fam file (pedigree.fam) and label each pair with its relationship, run:

./annotate_family_pairs.R -i pedigree.fam

This will generate a tab-delimited text file called pedigree.pairs with the following columns:

  • fam.id - FID from input file
  • id1 - IID from input file
  • id2 - another IID from the input file with the same FID as id1
  • degree - degree of relationship
  • relationship - text description of relationship (e.g. "full sibling", "parent/offspring", "great-grand avuncular", "half fifth cousin thrice removed")
  • shortrel - abbreviated description of relationship (e.g. "FS", "PO", "GGAV", "H5C3R")
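The .pairs file can be loaded back into R for downstream analysis. The sketch below is a minimal example, assuming the file has a header row with the column names listed above. summarise_pairs() is a hypothetical helper, not part of the script. It counts how many pairs carry each relationship label.

```r
# Hypothetical helper (not part of annotate_family_pairs.R).
# Assumes the .pairs file is tab-delimited with a header row naming the
# columns fam.id, id1, id2, degree, relationship, shortrel.
summarise_pairs <- function(path) {
  pairs <- read.delim(path, stringsAsFactors = FALSE)
  counts <- table(pairs$relationship)
  # Most common relationship first; ties broken alphabetically
  counts <- counts[order(-counts, names(counts))]
  data.frame(relationship = names(counts), n = as.integer(counts),
             stringsAsFactors = FALSE)
}
```

For example, summarise_pairs("pedigree.pairs") returns one row per relationship label with its pair count.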

Advanced usage

Usage: ./annotate_family_pairs.R [-[-help|h]] [-[-input|i] <character>] [-[-output|o] [<character>]]
       [-[-nthreads|n] [<integer>]] [-[-multi|m]] [-[-extra|x]] [-[-redo-fam|r]] [-[-parents|p]]
    -h|--help        print usage information
    -i|--input       input file (.fam) or prefix
    -o|--output      output file (default = ./basename(input).pairs)
    -n|--nthreads    number of threads to use for calculation
    -m|--multi       Speeds up computation when the input .fam contains multiple families with the same family
                     structure (e.g. output from [ped-sim](https://github.com/williamslab/ped-sim))
    -x|--extra       Include extra columns (d1/d2/a) in output
    -r|--redo-fam    Ignore the existing fam IDs & redefine families based on individuals connected via
                     paternal or maternal ID
    -p|--parents     Include individuals who only appear as mothers or fathers in the .fam file

Additional details

  • -i|--input <character> (Required) The script will load pedigree information from this file. The file is expected to have a .fam suffix; if the provided name does not include it, the suffix will be added.
  • -o|--output <character> If supplied, the script will write output to this file. If not supplied, it will use <input prefix>.pairs.
  • -n|--nthreads <integer> The script will attempt to analyse multiple families in parallel using this number of workers. This uses the mclapply() function from the parallel R package, which parallelises by forking processes; forking is not available on Windows.
  • -m|--multi This option is mainly useful for pedigrees generated by ped-sim, in which multiple families of identical structure can be simulated, with names that are identical except for an integer suffix.
  • -x|--extra This option adds three columns to the output. These numbers are used internally to label relationships correctly and may be of interest if you need to determine which individual is which in an asymmetric relationship, or if the pedigree includes an unusual relationship that cannot be labeled, such as one of non-integer degree.
    • d1 - meiotic distance between id1 and the most recent common ancestor(s) shared with id2
    • d2 - meiotic distance between id2 and the most recent common ancestor(s) shared with id1
    • a - number of most recent common ancestors shared by id1 and id2
  • -r|--redo-fam When this option is used, the FIDs in the input file are ignored and new family IDs are inferred from maternal (MID) and paternal (PID) relationships alone. This can be useful if there are individuals with the same FID who are not actually connected within the pedigree.
  • -p|--parents By default, only IDs which appear in the IID column of the input file are included. With this option, individuals who only appear as parents (MID/PID) in the pedigree are also included in the list of relationships.
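The input handling described for -i|--input can be sketched as follows. This is a minimal illustration, not the script's own code: read_fam() is a hypothetical helper that appends .fam when only a prefix is given and reads the six standard PLINK .fam columns (family ID, individual ID, father ID, mother ID, sex, phenotype), keeping IDs as character so that the missing-parent code 0 is compared as a string.

```r
# Hypothetical helper (not the script's own code): read a PLINK .fam file.
# .fam files have no header and six whitespace-separated columns:
# family ID, individual ID, father ID, mother ID, sex, phenotype.
read_fam <- function(input) {
  # Append the .fam suffix if only a prefix was given
  path <- if (grepl("\\.fam$", input)) input else paste0(input, ".fam")
  # Keep every column as character so IDs (and the missing-parent code "0")
  # are compared as strings
  read.table(path, header = FALSE, colClasses = "character",
             col.names = c("FID", "IID", "PID", "MID", "SEX", "PHENO"))
}
```

Both read_fam("pedigree") and read_fam("pedigree.fam") read the same file.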
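Per-family parallelism of the kind described for -n|--nthreads can be sketched with parallel::mclapply(). The helper run_by_family() and its per_family argument are hypothetical; the sketch assumes each family can be processed independently and that per_family returns a data frame. Because mclapply() forks, the sketch falls back to a single worker on Windows.

```r
library(parallel)

# Hypothetical helper: apply `per_family` to each family (rows sharing an FID)
# of a .fam-style data frame, using up to `nthreads` forked workers.
# Assumes families are independent and `per_family` returns a data frame.
run_by_family <- function(fam, per_family, nthreads = 1L) {
  families <- split(fam, fam$FID)
  # mclapply() forks; mc.cores > 1 is not supported on Windows
  if (.Platform$OS.type == "windows") nthreads <- 1L
  results <- mclapply(families, per_family, mc.cores = nthreads)
  # Stack the per-family results into one data frame
  out <- do.call(rbind, unname(results))
  rownames(out) <- NULL
  out
}
```

For example, run_by_family(fam, function(f) data.frame(FID = f$FID[1], n = nrow(f)), nthreads = 4) counts individuals per family using four workers.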
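The d1/d2/a columns are enough to recover a degree with the standard kinship calculation: assuming no inbreeding, relatives connected through a most recent common ancestors at meiotic distances d1 and d2 have kinship coefficient a * (1/2)^(d1 + d2 + 1), and degree-k relatives have kinship 2^-(k + 1), so the degree is d1 + d2 - log2(a). This is a sketch under that assumption, not necessarily how the script computes its degree column; describe_pair() is a hypothetical helper. It also shows where non-integer degrees come from (a not a power of two) and derives cousin number and generations removed from the same values.

```r
# Hypothetical helper: interpret the d1/d2/a columns from --extra output.
# Assumes no inbreeding, so kinship = a * (1/2)^(d1 + d2 + 1); a degree-k
# relative has kinship 2^-(k + 1), giving degree = d1 + d2 - log2(a).
# This may differ from how annotate_family_pairs.R derives its degree column.
describe_pair <- function(d1, d2, a) {
  degree <- d1 + d2 - log2(a)
  data.frame(
    degree = degree,
    # FALSE when a is not a power of two, e.g. a = 3
    integer_degree = abs(degree - round(degree)) < 1e-9,
    # Cousin number: distance of the closer individual to the common
    # ancestors, minus one (NA unless both are at least 2 meioses away)
    cousin = ifelse(pmin(d1, d2) >= 2, pmin(d1, d2) - 1, NA),
    # Generations removed: difference between the two distances
    removed = abs(d1 - d2)
  )
}
```

For instance, describe_pair(d1 = 6, d2 = 9, a = 1) gives degree 15, cousin 5 and removed 3, consistent with the "half fifth cousin thrice removed" (H5C3R) example above, while full siblings (d1 = 1, d2 = 1, a = 2) and parent/offspring (d1 = 0, d2 = 1, a = 1) both come out as degree 1.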
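Redefining families from parental links, as -r|--redo-fam does, amounts to finding connected components of the pedigree graph. The script uses igraph for its pedigree graphs; the sketch below instead uses a small base-R label propagation so it has no dependencies. infer_families() is hypothetical and assumes IIDs are unique across the whole file and that 0 marks a missing parent.

```r
# Hypothetical helper: assign new family numbers from PID/MID links alone,
# ignoring the FID column. Assumes IIDs are unique across the whole file and
# that "0" marks a missing parent. Uses base-R label propagation instead of igraph.
infer_families <- function(fam) {
  ids <- unique(c(fam$IID, fam$PID, fam$MID))
  ids <- ids[ids != "0"]
  # Start with every individual in its own component
  label <- setNames(seq_along(ids), ids)
  # Child-father and child-mother edges, skipping missing parents
  edges <- rbind(cbind(fam$IID, fam$PID), cbind(fam$IID, fam$MID))
  edges <- edges[edges[, 2] != "0", , drop = FALSE]
  if (nrow(edges) > 0) repeat {
    # Smallest label on each edge, then the smallest label each node sees
    m <- pmin(label[edges[, 1]], label[edges[, 2]])
    best <- tapply(c(m, m), c(edges[, 1], edges[, 2]), min)
    new <- label
    new[names(best)] <- pmin(label[names(best)], best)
    if (identical(new, label)) break  # no label changed: components found
    label <- new
  }
  # Renumber components 1, 2, ... in order of first appearance
  match(label[fam$IID], unique(label[fam$IID]))
}
```

Individuals sharing an FID but with no parental path between them receive different numbers, which is the situation -r|--redo-fam is meant to catch.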
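The effect of -p|--parents on which IDs are considered can be sketched as below. all_individuals() is a hypothetical helper, assuming character ID columns and 0 for missing parents; with include_parents = TRUE it adds IDs that appear only in the PID or MID columns, giving each such parent the FID of the row in which it appears.

```r
# Hypothetical helper: list the (FID, IID) pairs to consider.
# Assumes character ID columns and "0" for a missing parent.
all_individuals <- function(fam, include_parents = FALSE) {
  ids <- fam[, c("FID", "IID")]
  if (include_parents) {
    # Fathers and mothers, tagged with the FID of the child's row
    parents <- rbind(
      data.frame(FID = fam$FID, IID = fam$PID, stringsAsFactors = FALSE),
      data.frame(FID = fam$FID, IID = fam$MID, stringsAsFactors = FALSE))
    parents <- parents[parents$IID != "0", ]
    # Parents already listed in the IID column are not duplicated
    ids <- unique(rbind(ids, parents))
  }
  rownames(ids) <- NULL
  ids
}
```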

Requirements

The script requires R to be installed as well as several R packages:

  • getopt - Used to parse command line options
  • igraph - Used to generate graph of pedigrees
  • (optional) parallel - Used to run on multiple CPUs on platforms that support forking (i.e. not Windows). This package ships with base R.
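A quick way to check which of these packages are missing before running the script is sketched below; missing_packages() is a hypothetical helper built on base R's requireNamespace().

```r
# Hypothetical helper: return the names of required packages that cannot be loaded.
missing_packages <- function(pkgs = c("getopt", "igraph", "parallel")) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}
```

If it reports anything, those packages can be installed with install.packages(), e.g. install.packages(c("getopt", "igraph")).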

To run the script directly from the command line as described above, it needs executable permission (chmod +x annotate_family_pairs.R). Alternatively, it can be run with Rscript, e.g. Rscript annotate_family_pairs.R -i pedigree.fam.

About

License: GNU General Public License v3.0


Languages

Language: R (100.0%)