R script which can label relationships in a pedigree
To list every pair of individuals in a PLINK .fam format file `pedigree.fam` and label each pair with its relationship, the following is sufficient:
./annotate_family_pairs.R -i pedigree.fam
This will generate a tab-delimited text file called pedigree.pairs
with the following columns:
- `fam.id` - FID from input file
- `id1` - IID from input file
- `id2` - another IID from the input file with the same FID as `id1`
- `degree` - degree of relationship
- `relationship` - text description of relationship (e.g. "full sibling", "parent/offspring", "great-grand avuncular", "half fifth cousin thrice removed")
- `shortrel` - abbreviated description of relationship (e.g. "FS", "PO", "GGAV", "H5C3R")
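Because the output is tab-delimited, standard command-line tools can filter it. The sketch below is illustrative only: it writes a small hand-made stand-in for `pedigree.pairs` (the rows are invented, not real script output) and assumes the columns appear in the order listed above, with `shortrel` sixth. Whether the real file starts with a header row is not stated here; the filter works either way, since a header line would not match.

```shell
# Hand-written stand-in for pedigree.pairs (hypothetical rows, not real output).
# Assumed column order: fam.id, id1, id2, degree, relationship, shortrel.
printf 'fam1\tdad\tkid1\t1\tparent/offspring\tPO\n'  > example.pairs
printf 'fam1\tkid1\tkid2\t1\tfull sibling\tFS\n'    >> example.pairs

# Keep only full-sibling pairs by matching the sixth (shortrel) column.
awk -F'\t' '$6 == "FS"' example.pairs > full_sibs.tsv
cat full_sibs.tsv
```

Matching on `shortrel` rather than `relationship` avoids having to quote labels that contain spaces.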
Usage: ./annotate_family_pairs.R [-[-help|h]] [-[-input|i] <character>] [-[-output|o] [<character>]]
[-[-nthreads|n] [<integer>]] [-[-multi|m]] [-[-extra|x]] [-[-redo-fam|r]] [-[-parents|p]]
-h|--help print usage information
-i|--input input file (.fam) or prefix
-o|--output output file (default = ./basename(input).pairs)
-n|--nthreads number of threads to use for calculation
-m|--multi Speeds up computation when the input .fam contains multiple families with the same family
structure (e.g. output from [ped-sim](https://github.com/williamslab/ped-sim) to
speed up calculations
-x|--extra Include extra columns (d1/d2/a) in output
-r|--redo-fam Ignore the existing fam IDs & redefine families based on individuals connected via
paternal or maternal ID
-p|--parents Include individuals who only appear as mothers or fathers in the .fam file
-i|--input <character>
(Required) The script loads pedigree information from this file. The file is expected to have a `.fam` suffix; if the provided name does not include it, the suffix is added.
-o|--output <character>
If supplied, the script writes output to this file. If not supplied, it uses `<input prefix>.pairs`.
-n|--nthreads <integer>
The script will attempt to analyse multiple families in parallel using this number of threads. This uses the `parallel` R package, specifically the `mclapply()` function.
-m|--multi
This option is mainly useful for pedigrees generated by ped-sim, in which multiple families of identical structure can be simulated, with names that are identical except for an integer suffix.
-x|--extra
This option adds three columns to the output. These numbers are used internally to correctly label relationships and may be of interest if you need to determine which individual is which in an asymmetric relationship, or if the pedigree includes an unusual relationship that cannot be labeled, such as one of non-integer degree.
- `d1` - meiotic distance between `id1` and most recent common ancestors with `id2`
- `d2` - meiotic distance between `id2` and most recent common ancestors with `id1`
- `a` - number of most recent common ancestors
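How these three numbers relate to a degree can be illustrated with the textbook kinship arithmetic degree = d1 + d2 - log2(a). This relation is an assumption offered for orientation; nothing here confirms that the script computes degree this way. The helper name `degree` and the example values are hypothetical, and the parent/offspring line assumes that an individual who is the other's ancestor counts as the common ancestor at distance 0.

```shell
# degree = d1 + d2 - log2(a); assumed textbook relation, not taken from the script.
degree() {
    awk -v d1="$1" -v d2="$2" -v a="$3" \
        'BEGIN { printf "%.3f\n", d1 + d2 - log(a) / log(2) }'
}

degree 1 1 2   # full siblings: two shared parents
degree 0 1 1   # parent/offspring: the parent is the common ancestor
degree 2 2 2   # first cousins: two shared grandparents
degree 2 2 3   # hypothetical pair with three common ancestors
```

When `a` is not a power of two, as in the last call, the result is not a whole number, which is one way a relationship of non-integer degree can arise.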
-r|--redo-fam
When this option is used, the FIDs in the input file are ignored and new family IDs are inferred from maternal (MID) and paternal (PID) relationships alone. This can be useful if there are individuals with the same FID who are not actually connected within the pedigree.
-p|--parents
By default, only IDs which appear in the IID column of the input file are included. With this option, individuals who only appear as parents (MID/PID) in the pedigree are also included in the list of relationships.
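To see which individuals the `-p` option would bring in, one can list the IDs that occur in the parent columns but never in the IID column. The sketch below assumes the standard PLINK .fam column order (FID, IID, father ID, mother ID, sex, phenotype) with `0` marking an unknown parent; the file name and all IDs are made up for illustration.

```shell
# Hypothetical pedigree: gp1 and gp2 are named as parents but have no row of their own.
cat > parents_example.fam <<'EOF'
fam1 mum gp1 gp2 2 -9
fam1 dad 0 0 1 -9
fam1 kid dad mum 1 -9
EOF

# First pass records every FID/IID pair; the second pass prints each
# parent ID (columns 3 and 4) that never appeared as an IID in its family.
awk 'NR == FNR { iid[$1 SUBSEP $2] = 1; next }
     {
         for (col = 3; col <= 4; col++) {
             key = $1 SUBSEP $col
             if ($col != "0" && !(key in iid) && !(key in seen)) {
                 seen[key] = 1
                 print $1, $col
             }
         }
     }' parents_example.fam parents_example.fam > parent_only.txt
cat parent_only.txt
```

Reading the file twice keeps the check independent of row order, so a parent listed after their child is still recognised as having a row.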
The script requires R to be installed as well as several R packages:
- `getopt` - Used to parse command line options
- `igraph` - Used to generate graph of pedigrees
- (optional) `parallel` - Used to run on multiple CPUs on some platforms.
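A quick way to check the prerequisites is to ask R whether each package can be loaded. The sketch below assumes `Rscript` is on the `PATH` and reports otherwise; the variable name `required_pkgs` is just for illustration. `parallel` is left out of the check because it ships with base R.

```shell
# Packages the script needs beyond base R.
required_pkgs="getopt igraph"

if command -v Rscript >/dev/null 2>&1; then
    for pkg in $required_pkgs; do
        # requireNamespace() returns FALSE instead of raising an error when a package is absent.
        Rscript -e "if (!requireNamespace('$pkg', quietly = TRUE)) quit(status = 1)" \
            && echo "$pkg: installed" || echo "$pkg: missing"
    done
else
    echo "Rscript not found on PATH"
fi

# Missing packages can be installed from within R, for example:
#   install.packages(c("getopt", "igraph"))
```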
To use the script on the command line as described above, it will be necessary to give it executable privileges (i.e. `chmod +x annotate_family_pairs.R`).