Pedigree File

A family metadata file. It records key metrics for tracking samples and for finding biases introduced during DNA isolation or subsequent sequence analysis.

Among other things, the file enables:

- Automatic coverage specification (selection of the correct target file(s))
- Application of Mendelian filtering models, e.g. autosomal dominant, based on pedigree, sex and disease status
- Collection of analysis information for the sequence analysis pipeline

MIP supports two file formats for pedigree metadata, PLINK and YAML:

YAML

An example pedigree file with additional metadata can be found at metadata.yaml.

PLINK

MIP follows the pedigree file format defined by PLINK, although currently only tab-separated pedigree files are supported.

The first row must start with a “#” (hash) and contain tab-separated headers describing each column. The first six columns are mandatory, and their header names and order must be as follows:

Mandatory Columns
ColumnName	Type	Summary
familyID	`String`	Family identification number (mandatory)
sampleID	`String`	Sample identification number (mandatory)
father	`String`	Father identification number (mandatory)
mother	`String`	Mother identification number (mandatory)
sex	`String`	‘1’=male ‘2’=female ‘other’=unknown (mandatory)
phenotype	`String`	‘-9’=missing ‘0’=missing ‘1’=unaffected ‘2’=affected (mandatory)
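As an illustration of the header rule and the sex/phenotype codes in the table above, here is a minimal Python sketch. It is not MIP's own implementation: the helper names (`check_header`, `decode_sex`, `decode_phenotype`) are hypothetical, and it assumes the header is written with the hash directly in front of the first column name (e.g. `#familyID`).

```python
from __future__ import annotations

# Mandatory header names, in the required order (from the table above).
MANDATORY_HEADERS = ["familyID", "sampleID", "father", "mother", "sex", "phenotype"]

SEX_CODES = {"1": "male", "2": "female"}  # any other value means unknown
PHENOTYPE_CODES = {"-9": "missing", "0": "missing", "1": "unaffected", "2": "affected"}


def check_header(line: str) -> list[str]:
    """Validate the '#'-prefixed, tab-separated header row and return its column names.

    Assumption: the '#' is attached to the first column name, e.g. '#familyID'.
    """
    if not line.startswith("#"):
        raise ValueError("header row must start with '#'")
    columns = line[1:].rstrip("\n").split("\t")
    if columns[: len(MANDATORY_HEADERS)] != MANDATORY_HEADERS:
        raise ValueError(f"first six columns must be {MANDATORY_HEADERS}, got {columns[:6]}")
    return columns


def decode_sex(code: str) -> str:
    """'1' = male, '2' = female, anything else = unknown."""
    return SEX_CODES.get(code, "unknown")


def decode_phenotype(code: str) -> str:
    """'-9'/'0' = missing, '1' = unaffected, '2' = affected; other values are rejected."""
    if code not in PHENOTYPE_CODES:
        raise ValueError(f"invalid phenotype code: {code!r}")
    return PHENOTYPE_CODES[code]
```

Columns after the first six are returned unchanged, so additional metadata columns pass through the header check.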

In addition to these mandatory columns, we use the pedigree file to record metadata on each individual. Multiple entries within a single column should be separated with “;” (semicolon) and entered in consecutive order. Each individual is written on one line, with a tab separating each entry. No individual should be recorded twice. The order of individuals below the header line does not matter.

If there is no information on the parents or grandparents, they should be encoded as “0”.
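The row rules described above (one tab-separated line per individual, “;” between multiple entries, no duplicates, “0” for an unknown parent) can be sketched as a small parser. This is a hypothetical Python example, not MIP's parser; it assumes that blank lines may be skipped and that every column after the six mandatory ones may hold “;”-separated entries.

```python
from __future__ import annotations

MANDATORY_HEADERS = ["familyID", "sampleID", "father", "mother", "sex", "phenotype"]


def parse_pedigree(text: str) -> dict[str, dict]:
    """Parse a tab-separated pedigree into {sampleID: record} (illustrative sketch).

    Assumptions: blank lines are skipped, each additional (non-mandatory) column
    is split on ';', and '0' in father/mother is stored as None.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("#"):
        raise ValueError("first row must be a '#'-prefixed header")
    columns = lines[0][1:].split("\t")
    if columns[:6] != MANDATORY_HEADERS:
        raise ValueError(f"first six columns must be {MANDATORY_HEADERS}")

    samples: dict[str, dict] = {}
    for line in lines[1:]:  # the order of individuals does not matter
        fields = line.split("\t")
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} tab-separated fields: {line!r}")
        record: dict = dict(zip(columns, fields))
        sample_id = record["sampleID"]
        if sample_id in samples:
            raise ValueError(f"individual recorded twice: {sample_id}")
        for parent in ("father", "mother"):
            if record[parent] == "0":  # no information on this parent
                record[parent] = None
        for column in columns[6:]:  # metadata columns may hold several ';'-separated entries
            record[column] = record[column].split(";")
        samples[sample_id] = record
    return samples
```

For a trio, the parents' rows carry “0” in the father and mother columns, so they come back with `None` parents, while the child's record points at their sample IDs.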

An example pedigree file can be found here.

The pedigree file should be named <FDN>_pedigree.txt, where <FDN> is the family ID.
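A minimal sketch of the naming convention: recover <FDN> from the file name so it could be compared with the familyID column. The helper name `family_id_from_filename` is hypothetical and not part of MIP.

```python
from pathlib import Path

SUFFIX = "_pedigree.txt"


def family_id_from_filename(path: str) -> str:
    """Return <FDN> from a file named <FDN>_pedigree.txt (hypothetical helper)."""
    name = Path(path).name
    # Reject names without the suffix, or with nothing in front of it.
    if not name.endswith(SUFFIX) or name == SUFFIX:
        raise ValueError(f"pedigree file should be named <FDN>{SUFFIX}, got {name!r}")
    return name[: -len(SUFFIX)]
```

For example, `/data/1234_pedigree.txt` yields `1234`, which a caller could then check against the familyID value on every row.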

Additional columns in the pedigree file
ColumnName	Default Value	Type	Summary
CMMSID	Na	`String`	The clinic's identification number for the individual
Tissue_origin	Na	`String`	Tissue of isolation (DNA/RNA)
Isolation_kit	Na	`String`	Kit used to isolate nucleic acids
Isolation_date	Na	`Integer`	Date of performing isolation of nucleic acids
Isolation_personnel	Na	`String`	Personnel performing isolation of nucleic acids
Medical_doctor	Na	`String`	Responsible clinician(s)
Inheritance_model	Na	`String`	Probable genetic inheritance model of the disease within the pedigree
Phenotype_terms	Na	`String`	Phenotypic terms associated with the disorder
CMMS_seqID	Na	`String`	Batch identification
SciLifeID	Na	`String`	SciLifeLab identification
Capture_kit	Na	`String`	Capture kit used in library preparation
Capture_date	Na	`Integer`	Date of performing capture procedure
Capture_personnel	Na	`String`	Personnel performing capture procedure
Clustering_date	Na	`Integer`	Date of clustering
Sequencing_kit	Na	`String`	Sequencing kit
Clinical_db	dbCMMS	`String`	The clinical database
Clinical_db_gene_annotation	IEM	`String`	Genes associated with a disease group within the clinical database
Sequencing_type	Na	`String`	Type of sequencing performed
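The default values in the table above can be applied to an individual's record as sketched below. This is a hypothetical Python example, not MIP's code; it assumes that a column that is absent or empty should take its default. Only Clinical_db and Clinical_db_gene_annotation have non-"Na" defaults.

```python
from __future__ import annotations

# Additional column names, in the order listed in the table above.
ADDITIONAL_COLUMNS = [
    "CMMSID", "Tissue_origin", "Isolation_kit", "Isolation_date",
    "Isolation_personnel", "Medical_doctor", "Inheritance_model",
    "Phenotype_terms", "CMMS_seqID", "SciLifeID", "Capture_kit",
    "Capture_date", "Capture_personnel", "Clustering_date", "Sequencing_kit",
    "Clinical_db", "Clinical_db_gene_annotation", "Sequencing_type",
]

# Non-"Na" defaults from the table; every other column defaults to "Na".
ADDITIONAL_COLUMN_DEFAULTS = {
    "Clinical_db": "dbCMMS",
    "Clinical_db_gene_annotation": "IEM",
}


def with_defaults(record: dict[str, str]) -> dict[str, str]:
    """Return a copy of record with missing or empty additional columns set to their defaults."""
    filled = dict(record)
    for column in ADDITIONAL_COLUMNS:
        if not filled.get(column):  # assumption: absent or "" counts as not supplied
            filled[column] = ADDITIONAL_COLUMN_DEFAULTS.get(column, "Na")
    return filled
```

Values that are supplied, including the mandatory columns, are left untouched.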

Pedigree capture kit aliases

Agilent SureSelect
- Agilent_SureSelect.V2 => Agilent_SureSelect.V2.GenomeReferenceSourceVersion_targets.bed
- Agilent_SureSelect.V3 => Agilent_SureSelect.V3.GenomeReferenceSourceVersion_targets.bed
- Agilent_SureSelect.V4 => Agilent_SureSelect.V4.GenomeReferenceSourceVersion_targets.bed
- Agilent_SureSelect.V5 => Agilent_SureSelect.V5.GenomeReferenceSourceVersion_targets.bed
- Agilent_SureSelectCRE.V1 => Agilent_SureSelectCRE.V1.GenomeReferenceSourceVersion_targets.bed
- Agilent_SureSelectFocusedExome.V1 => Agilent_SureSelectFocusedExome.V1.GenomeReferenceSourceVersion_targets.bed
- Latest => Agilent_SureSelectCRE.V1.GenomeReferenceSourceVersion_targets.bed
NimbleGen
- Nimblegen_SeqCapEZExome.V2 => Nimblegen_SeqCapEZExome.V2.GenomeReferenceSourceVersion_targets.bed
- Nimblegen_SeqCapEZExome.V3 => Nimblegen_SeqCapEZExome.V3.GenomeReferenceSourceVersion_targets.bed

Note

You can use other target region files with MIP, but you then have to supply the complete filename, including the “.bed” ending.
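The alias table and the note above amount to a lookup with a pass-through for full “.bed” filenames. The Python sketch below is hypothetical and not MIP's code. It treats `GenomeReferenceSourceVersion` as a placeholder that is replaced by whatever reference source/version string the analysis uses; the value `"GRCh37"` in the usage example is an assumption for illustration only.

```python
# Alias -> target file template, copied from the alias list above.
CAPTURE_KIT_ALIASES = {
    "Agilent_SureSelect.V2": "Agilent_SureSelect.V2.GenomeReferenceSourceVersion_targets.bed",
    "Agilent_SureSelect.V3": "Agilent_SureSelect.V3.GenomeReferenceSourceVersion_targets.bed",
    "Agilent_SureSelect.V4": "Agilent_SureSelect.V4.GenomeReferenceSourceVersion_targets.bed",
    "Agilent_SureSelect.V5": "Agilent_SureSelect.V5.GenomeReferenceSourceVersion_targets.bed",
    "Agilent_SureSelectCRE.V1": "Agilent_SureSelectCRE.V1.GenomeReferenceSourceVersion_targets.bed",
    "Agilent_SureSelectFocusedExome.V1": "Agilent_SureSelectFocusedExome.V1.GenomeReferenceSourceVersion_targets.bed",
    "Latest": "Agilent_SureSelectCRE.V1.GenomeReferenceSourceVersion_targets.bed",
    "Nimblegen_SeqCapEZExome.V2": "Nimblegen_SeqCapEZExome.V2.GenomeReferenceSourceVersion_targets.bed",
    "Nimblegen_SeqCapEZExome.V3": "Nimblegen_SeqCapEZExome.V3.GenomeReferenceSourceVersion_targets.bed",
}
PLACEHOLDER = "GenomeReferenceSourceVersion"


def resolve_capture_kit(kit: str, genome_reference_source_version: str) -> str:
    """Map a Capture_kit value to a target file name (illustrative sketch)."""
    if kit.endswith(".bed"):
        return kit  # complete target file name supplied: use it as-is
    if kit not in CAPTURE_KIT_ALIASES:
        raise ValueError(f"unknown capture kit alias {kit!r}; supply a full '.bed' file name")
    return CAPTURE_KIT_ALIASES[kit].replace(PLACEHOLDER, genome_reference_source_version)
```

With `"GRCh37"` as the assumed reference string, `resolve_capture_kit("Latest", "GRCh37")` returns `Agilent_SureSelectCRE.V1.GRCh37_targets.bed`, while `resolve_capture_kit("my_panel.bed", "GRCh37")` returns `my_panel.bed` unchanged.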

Abbreviations

Abbreviation	Explanation
FDN	Family ID
CMMSID	The CMMS sampleID
CMMS SeqID	Batch ID, e.g. WES8
SciLifeID	The ID tag provided by Science for Life Laboratory
AR	Autosomal recessive
AD	Autosomal dominant
