Pedigree File

Family metadata file. It records important metrics for tracking samples and for finding biases in the isolation of DNA or in subsequent sequence analysis.

Among other things, the file enables:

  1. Automatic coverage specification (correct target file(s))
  2. Application of Mendelian filtering models, e.g. autosomal dominant, based on pedigree, sex and disease status
  3. Collection of analysis info for the sequence analysis pipeline

MIP supports two file formats for pedigree metadata: PLINK and YAML.

YAML

An example pedigree file with additional metadata can be found at metadata.yaml.

PLINK

This is the pedigree file format defined by PLINK, although we currently only support tab-separated pedigree files.

The first row should start with a “#” (hash) and contain relevant headers separated by tabs describing each column. The first six columns are mandatory. The name and order of the headers should follow:

Mandatory Columns
ColumnName  Type    Summary
familyID    String  Family identification number (mandatory)
sampleID    String  Sample identification number (mandatory)
father      String  Father identification number (mandatory)
mother      String  Mother identification number (mandatory)
sex         String  ‘1’=male ‘2’=female ‘other’=unknown (mandatory)
phenotype   String  ‘-9’=missing ‘0’=missing ‘1’=unaffected ‘2’=affected (mandatory)
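The sex and phenotype codes in the table above can be decoded with a small lookup. The following is a minimal Python sketch, not MIP's own implementation; the function names `decode_sex` and `decode_phenotype` are hypothetical, and raising an error on an unlisted phenotype code is an assumption of this example.

```python
# Hypothetical helpers for illustration; not part of MIP.
PHENOTYPE_CODES = {
    "-9": "missing",
    "0": "missing",
    "1": "unaffected",
    "2": "affected",
}


def decode_sex(code):
    """Return 'male' for '1', 'female' for '2' and 'unknown' for anything else."""
    return {"1": "male", "2": "female"}.get(code.strip(), "unknown")


def decode_phenotype(code):
    """Return the phenotype label for a code listed in the table."""
    code = code.strip()
    if code not in PHENOTYPE_CODES:
        # Assumption of this sketch: codes outside the table are rejected.
        raise ValueError(f"unexpected phenotype code: {code!r}")
    return PHENOTYPE_CODES[code]
```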

In addition to these mandatory columns, we use the pedigree file to record metadata on each individual. Each individual recorded in the pedigree file is written on one line, with a tab separating each column. Multiple entries within a column should be separated with “;” (semicolon) and entered in consecutive order. No individual should be recorded twice. The order of individuals below the header line does not matter.

If there is no information on the parents or the grandparents they should be encoded as “0”.
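The layout rules above (a “#” header row, six mandatory columns in fixed order, one tab-separated line per individual, “;” between entries within a column, no duplicated individuals) can be checked with a short parser. This is a minimal Python sketch under stated assumptions, not MIP's own parser: `parse_pedigree` and the sample IDs in `EXAMPLE` are hypothetical, and treating `sampleID` as the key for detecting a duplicated individual is an assumption.

```python
MANDATORY = ["familyID", "sampleID", "father", "mother", "sex", "phenotype"]

# Hypothetical family used only for illustration; unknown parents are "0".
EXAMPLE = (
    "#familyID\tsampleID\tfather\tmother\tsex\tphenotype\tCapture_kit\n"
    "fam1\tchild_1\tdad_1\tmom_1\t1\t2\tAgilent_SureSelect.V4;Agilent_SureSelect.V5\n"
    "fam1\tdad_1\t0\t0\t1\t1\tAgilent_SureSelect.V5\n"
    "fam1\tmom_1\t0\t0\t2\t1\tAgilent_SureSelect.V5\n"
)


def parse_pedigree(text):
    """Parse tab-separated pedigree text into one dict per individual."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("#"):
        raise ValueError("first row must be a header starting with '#'")
    header = [name.strip() for name in lines[0][1:].split("\t")]
    if header[:6] != MANDATORY:
        raise ValueError(f"first six headers must be {MANDATORY}")

    individuals = []
    seen = set()
    for line in lines[1:]:
        fields = line.split("\t")
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(fields)}")
        record = dict(zip(header, fields))
        # Assumption: sampleID identifies an individual uniquely.
        if record["sampleID"] in seen:
            raise ValueError(f"individual recorded twice: {record['sampleID']}")
        seen.add(record["sampleID"])
        # Additional columns may hold several ';'-separated entries.
        for name in header[6:]:
            record[name] = record[name].split(";")
        individuals.append(record)
    return individuals
```

Calling `parse_pedigree(EXAMPLE)` returns three records. The mandatory columns stay plain strings, while each additional column becomes a list of its “;”-separated entries.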

An example pedigree file can be found here.

The pedigree file should be named: <FDN>_pedigree.txt.

Additional columns in the pedigree file
ColumnName                   Default Value  Type     Summary
CMMSID                       Na             String   The clinic's identification number for the individual
Tissue_origin                Na             String   Tissue of isolation (DNA/RNA)
Isolation_kit                Na             String   Kit used to isolate nucleic acids
Isolation_date               Na             Integer  Date of performing isolation of nucleic acids
Isolation_personnel          Na             String   Personnel performing isolation of nucleic acids
Medical_doctor               Na             String   Responsible clinician(s)
Inheritance_model            Na             String   Probable genetic inheritance model of the disease within the pedigree
Phenotype_terms              Na             String   Phenotypic terms associated with the disorder
CMMS_seqID                   Na             String   Batch identification
SciLifeID                    Na             String   SciLifeLab identification
Capture_kit                  Na             String   Capture kit used in library preparation
Capture_date                 Na             Integer  Date of performing capture procedure
Capture_personnel            Na             String   Personnel performing capture procedure
Clustering_date              Na             Integer  Date of clustering
Sequencing_kit               Na             String   Sequencing kit
Clinical_db                  dbCMMS         String   The clinical database
Clinical_db_gene_annotation  IEM            String   Genes associated with a disease group within the clinical database
Sequencing_type              Na             String   Type of sequencing performed
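The default values in the table above can be applied to a record that lacks some of the additional columns. This is a minimal Python sketch; `with_defaults` is a hypothetical helper, and treating a missing or empty entry as "use the default" is an assumption of this example rather than documented MIP behaviour.

```python
ADDITIONAL_COLUMNS = [
    "CMMSID", "Tissue_origin", "Isolation_kit", "Isolation_date",
    "Isolation_personnel", "Medical_doctor", "Inheritance_model",
    "Phenotype_terms", "CMMS_seqID", "SciLifeID", "Capture_kit",
    "Capture_date", "Capture_personnel", "Clustering_date",
    "Sequencing_kit", "Clinical_db", "Clinical_db_gene_annotation",
    "Sequencing_type",
]

# Only two columns have a default other than "Na" in the table.
SPECIAL_DEFAULTS = {
    "Clinical_db": "dbCMMS",
    "Clinical_db_gene_annotation": "IEM",
}


def with_defaults(record):
    """Return a copy of record with every additional column filled in."""
    filled = dict(record)
    for column in ADDITIONAL_COLUMNS:
        # Assumption: a missing or empty entry falls back to the default.
        if not filled.get(column):
            filled[column] = SPECIAL_DEFAULTS.get(column, "Na")
    return filled
```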

Pedigree capture kit aliases

  • Agilent Sure Select
    • Agilent_SureSelect.V2 => Agilent_SureSelect.V2.GenomeReferenceSourceVersion_targets.bed
    • Agilent_SureSelect.V3 => Agilent_SureSelect.V3.GenomeReferenceSourceVersion_targets.bed
    • Agilent_SureSelect.V4 => Agilent_SureSelect.V4.GenomeReferenceSourceVersion_targets.bed
    • Agilent_SureSelect.V5 => Agilent_SureSelect.V5.GenomeReferenceSourceVersion_targets.bed
    • Agilent_SureSelectCRE.V1 => Agilent_SureSelectCRE.V1.GenomeReferenceSourceVersion_targets.bed
    • Agilent_SureSelectFocusedExome.V1 => Agilent_SureSelectFocusedExome.V1.GenomeReferenceSourceVersion_targets.bed
    • Latest => Agilent_SureSelectCRE.V1.GenomeReferenceSourceVersion_targets.bed
  • NimbleGen
    • Nimblegen_SeqCapEZExome.V2 => Nimblegen_SeqCapEZExome.V2.GenomeReferenceSourceVersion_targets.bed
    • Nimblegen_SeqCapEZExome.V3 => Nimblegen_SeqCapEZExome.V3.GenomeReferenceSourceVersion_targets.bed

Note

You can use other target region files with MIP, but then you have to supply the complete filename, including the “.bed” ending.
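The alias list and the note above amount to a simple lookup: a known alias expands to its target file name, “Latest” points to the Agilent_SureSelectCRE.V1 file, and any name ending in “.bed” is used as given. The Python sketch below illustrates this; `resolve_target_file` is a hypothetical function, not MIP's API, and how MIP substitutes the `GenomeReferenceSourceVersion` part is not described here, so the sketch keeps it as a literal placeholder by default.

```python
KNOWN_KITS = {
    "Agilent_SureSelect.V2",
    "Agilent_SureSelect.V3",
    "Agilent_SureSelect.V4",
    "Agilent_SureSelect.V5",
    "Agilent_SureSelectCRE.V1",
    "Agilent_SureSelectFocusedExome.V1",
    "Nimblegen_SeqCapEZExome.V2",
    "Nimblegen_SeqCapEZExome.V3",
}

LATEST = "Agilent_SureSelectCRE.V1"


def resolve_target_file(capture_kit, reference="GenomeReferenceSourceVersion"):
    """Expand a capture kit alias to a target file name.

    `reference` stands in for the GenomeReferenceSourceVersion part of the
    file name; its real value is an assumption left to the caller.
    """
    if capture_kit.endswith(".bed"):
        # A complete file name is passed through unchanged.
        return capture_kit
    if capture_kit == "Latest":
        capture_kit = LATEST
    if capture_kit not in KNOWN_KITS:
        raise ValueError(f"unknown capture kit alias: {capture_kit!r}")
    return f"{capture_kit}.{reference}_targets.bed"
```

For example, `resolve_target_file("Latest")` gives `Agilent_SureSelectCRE.V1.GenomeReferenceSourceVersion_targets.bed`, matching the alias list, while a custom name such as `my_targets.bed` (hypothetical) is returned as is.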

Abbreviations

Abbreviation  Explanation
FDN           Family ID
CMMSID        The CMMS sampleID
CMMS SeqID    BatchID, e.g. WES8
SciLifeID     The ID tag provided by Science for Life Laboratory
AR            Autosomal recessive
AD            Autosomal dominant