Pedigree File
Family metadata file. It records important metrics for tracking samples and finding biases in DNA isolation or subsequent sequence analysis.
Among other things, the file enables:
- Automatic coverage specification (correct target file(s))
- Application of Mendelian filtering models, e.g. autosomal dominant, based on pedigree, sex and disease status
- Collection of analysis info for the sequence analysis pipeline
MIP supports two file formats for pedigree metadata, PLINK and YAML:
YAML
An example pedigree file with additional metadata can be found at metadata.yaml.
PLINK
This is the pedigree file format defined by PLINK, although we currently only support tab-separated pedigree files.
The first row should start with a “#” (hash) and contain relevant headers separated by tabs describing each column. The first six columns are mandatory. The name and order of the headers should follow:
ColumnName | Type | Summary |
---|---|---|
familyID | String | Family identification number (mandatory) |
sampleID | String | Sample identification number (mandatory) |
father | String | Father identification number (mandatory) |
mother | String | Mother identification number (mandatory) |
sex | String | ‘1’=male, ‘2’=female, ‘other’=unknown (mandatory) |
phenotype | String | ‘-9’=missing, ‘0’=missing, ‘1’=unaffected, ‘2’=affected (mandatory) |
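The sex and phenotype codes in the table can be decoded with a small lookup. The following is a minimal sketch, not MIP's own implementation: the function names `decode_sex` and `decode_phenotype` are hypothetical, and raising an error on an unlisted phenotype code is an assumption, since the table does not say how such values are handled.

```python
# Codes taken from the mandatory-column table above.
SEX_CODES = {"1": "male", "2": "female"}
PHENOTYPE_CODES = {
    "-9": "missing",
    "0": "missing",
    "1": "unaffected",
    "2": "affected",
}


def decode_sex(code: str) -> str:
    """Return the sex label; any value other than '1' or '2' is unknown."""
    return SEX_CODES.get(code, "unknown")


def decode_phenotype(code: str) -> str:
    """Return the phenotype label for a code listed in the table.

    Assumption: codes outside the table are rejected rather than guessed.
    """
    if code not in PHENOTYPE_CODES:
        raise ValueError(f"unexpected phenotype code: {code!r}")
    return PHENOTYPE_CODES[code]
```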
In addition to these mandatory columns, we use the pedigree file to record metadata on each individual. Multiple entries within a column should be separated with “;” (semicolon) and entered in consecutive order. Each individual recorded in the pedigree file is written on one line, with a tab separating each column. No individual should be recorded twice. The order of individuals below the header line does not matter.
If there is no information on the parents or the grandparents they should be encoded as “0”.
An example pedigree file can be found here.
The pedigree file should be named: <FDN>_pedigree.txt.
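The layout rules above (a header row starting with “#”, six mandatory columns in fixed order, tab-separated columns, “;”-separated entries, no duplicated individuals) can be sketched as a small reader. This is an illustrative sketch only, not MIP's parser: the function name `parse_pedigree` and the sample IDs in the usage example are hypothetical, and it assumes the hash directly precedes the first header name.

```python
MANDATORY_COLUMNS = ["familyID", "sampleID", "father", "mother", "sex", "phenotype"]


def parse_pedigree(text: str) -> list:
    """Parse tab-separated pedigree text into a list of per-individual dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("#"):
        raise ValueError("first row must be a header starting with '#'")

    # Drop the leading hash and split the header on tabs.
    header = lines[0].lstrip("#").strip().split("\t")
    if header[:6] != MANDATORY_COLUMNS:
        raise ValueError("the first six headers must be: " + ", ".join(MANDATORY_COLUMNS))

    individuals = []
    seen_samples = set()
    for line in lines[1:]:
        fields = line.split("\t")
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(fields)}")
        record = dict(zip(header, fields))

        # No individual should be recorded twice.
        if record["sampleID"] in seen_samples:
            raise ValueError(f"duplicate sampleID: {record['sampleID']}")
        seen_samples.add(record["sampleID"])

        # Metadata columns may hold several ';'-separated entries.
        for name in header[6:]:
            record[name] = record[name].split(";")
        individuals.append(record)
    return individuals
```

A usage example with made-up sample IDs, where the parents of the second individual are unknown and therefore encoded as “0”:

```python
example = (
    "#familyID\tsampleID\tfather\tmother\tsex\tphenotype\tCapture_kit\n"
    "1\tchild\tdad\tmom\t1\t2\tAgilent_SureSelect.V4;Agilent_SureSelect.V5\n"
    "1\tdad\t0\t0\t1\t1\tLatest\n"
)
individuals = parse_pedigree(example)
```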
ColumnName | Default Value | Type | Summary |
---|---|---|---|
CMMSID | Na | String | The clinic's identification number for the individual |
Tissue_origin | Na | String | Tissue of isolation (DNA/RNA) |
Isolation_kit | Na | String | Kit used to isolate nucleic acids |
Isolation_date | Na | Integer | Date of performing isolation of nucleic acids |
Isolation_personnel | Na | String | Personnel performing isolation of nucleic acids |
Medical_doctor | Na | String | Responsible clinician(s) |
Inheritance_model | Na | String | Probable genetic inheritance model of the disease within the pedigree |
Phenotype_terms | Na | String | Phenotypic terms associated with the disorder |
CMMS_seqID | Na | String | Batch identification |
SciLifeID | Na | String | SciLifeLab identification |
Capture_kit | Na | String | Capture kit used in library preparation |
Capture_date | Na | Integer | Date of performing capture procedure |
Capture_personnel | Na | String | Personnel performing capture procedure |
Clustering_date | Na | Integer | Date of clustering |
Sequencing_kit | Na | String | Sequencing kit |
Clinical_db | dbCMMS | String | The clinical database |
Clinical_db_gene_annotation | IEM | String | Genes associated with a disease group within the clinical database |
Sequencing_type | Na | String | Type of sequencing performed |
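As an illustration of the default values in the table above, the sketch below fills in any metadata column that is absent or empty in a record. How MIP itself applies these defaults is not described here, so treat this as an assumption; the function name `with_defaults` and the sample record are hypothetical.

```python
# Default values copied from the metadata table above.
METADATA_DEFAULTS = {
    "CMMSID": "Na",
    "Tissue_origin": "Na",
    "Isolation_kit": "Na",
    "Isolation_date": "Na",
    "Isolation_personnel": "Na",
    "Medical_doctor": "Na",
    "Inheritance_model": "Na",
    "Phenotype_terms": "Na",
    "CMMS_seqID": "Na",
    "SciLifeID": "Na",
    "Capture_kit": "Na",
    "Capture_date": "Na",
    "Capture_personnel": "Na",
    "Clustering_date": "Na",
    "Sequencing_kit": "Na",
    "Clinical_db": "dbCMMS",
    "Clinical_db_gene_annotation": "IEM",
    "Sequencing_type": "Na",
}


def with_defaults(record: dict) -> dict:
    """Return a copy of record with missing or empty metadata set to defaults."""
    filled = dict(METADATA_DEFAULTS)
    # Only values that are actually present override a default.
    filled.update({key: value for key, value in record.items() if value not in ("", None)})
    return filled


# Hypothetical record: only the capture kit was recorded.
record = with_defaults({"sampleID": "sample_1", "Capture_kit": "Latest", "Clinical_db": ""})
```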
Pedigree capture kit aliases
- Agilent Sure Select
  - Agilent_SureSelect.V2 => Agilent_SureSelect.V2.GenomeReferenceSourceVersion_targets.bed
  - Agilent_SureSelect.V3 => Agilent_SureSelect.V3.GenomeReferenceSourceVersion_targets.bed
  - Agilent_SureSelect.V4 => Agilent_SureSelect.V4.GenomeReferenceSourceVersion_targets.bed
  - Agilent_SureSelect.V5 => Agilent_SureSelect.V5.GenomeReferenceSourceVersion_targets.bed
  - Agilent_SureSelectCRE.V1 => Agilent_SureSelectCRE.V1.GenomeReferenceSourceVersion_targets.bed
  - Agilent_SureSelectFocusedExome.V1 => Agilent_SureSelectFocusedExome.V1.GenomeReferenceSourceVersion_targets.bed
  - Latest => Agilent_SureSelectCRE.V1.GenomeReferenceSourceVersion_targets.bed
- NimbleGen
  - Nimblegen_SeqCapEZExome.V2 => Nimblegen_SeqCapEZExome.V2.GenomeReferenceSourceVersion_targets.bed
  - Nimblegen_SeqCapEZExome.V3 => Nimblegen_SeqCapEZExome.V3.GenomeReferenceSourceVersion_targets.bed
Note
You can use other target region files with MIP, but then you have to supply the complete filename, including the “.bed” ending.
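The alias list and the note above suggest a simple lookup: a known alias maps to its target file, and any other value must already be a complete “.bed” filename. The sketch below illustrates that rule and is not MIP's implementation. The function name `resolve_capture_kit` is hypothetical, and `GenomeReferenceSourceVersion` is kept as the literal placeholder used in the list, since what it expands to is not specified here.

```python
# Literal suffix from the alias list; the placeholder is left unexpanded.
TARGET_SUFFIX = ".GenomeReferenceSourceVersion_targets.bed"

KNOWN_KITS = [
    "Agilent_SureSelect.V2",
    "Agilent_SureSelect.V3",
    "Agilent_SureSelect.V4",
    "Agilent_SureSelect.V5",
    "Agilent_SureSelectCRE.V1",
    "Agilent_SureSelectFocusedExome.V1",
    "Nimblegen_SeqCapEZExome.V2",
    "Nimblegen_SeqCapEZExome.V3",
]

CAPTURE_KIT_ALIASES = {kit: kit + TARGET_SUFFIX for kit in KNOWN_KITS}
# "Latest" points at the Agilent SureSelect CRE V1 target file.
CAPTURE_KIT_ALIASES["Latest"] = CAPTURE_KIT_ALIASES["Agilent_SureSelectCRE.V1"]


def resolve_capture_kit(value: str) -> str:
    """Map a Capture_kit entry to a target region filename."""
    if value in CAPTURE_KIT_ALIASES:
        return CAPTURE_KIT_ALIASES[value]
    # Other target files are allowed if the complete filename is supplied.
    if value.endswith(".bed"):
        return value
    raise ValueError(f"unknown capture kit alias and not a .bed file: {value!r}")
```

For example, `resolve_capture_kit("Latest")` returns the Agilent_SureSelectCRE.V1 target filename, while a custom name such as `my_targets.bed` is passed through unchanged.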
Abbreviations
Abbreviation | Explanation |
---|---|
FDN | Family ID |
CMMSID | The CMMS sampleID |
CMMS SeqID | BatchID e.g. WES8 |
SciLifeID | The id tag provided by Science for Life Laboratory |
AR | Autosomal recessive |
AD | Autosomal dominant |