Setup
Filename convention
The permanent filename should have the following format:
{LANE}_{DATE}_{FLOW CELL}_{IDN}_{BARCODE SEQ}_{DIRECTION 1/2}.fastq.gz
Note
The familyID and sampleID(s) need to be unique, and the sampleID supplied should be equal to the {IDN} in the filename.
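The convention above can be checked before starting an analysis. The following is a minimal sketch, not part of MIP: it assumes that no field contains an underscore, that the direction is 1 or 2, and it leaves any compression suffix after ".fastq" unconstrained. The function names and the example filename are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumption: fields are separated by single underscores and contain none themselves.
# Assumption: the direction field is exactly "1" or "2".
FASTQ_NAME = re.compile(
    r"^(?P<lane>[^_]+)_(?P<date>[^_]+)_(?P<flow_cell>[^_]+)_"
    r"(?P<idn>[^_]+)_(?P<barcode>[^_]+)_(?P<direction>[12])"
    r"\.fastq(?:\.[A-Za-z0-9]+)?$"
)


def parse_fastq_name(filename: str) -> Optional[Dict[str, str]]:
    """Split a filename into its named fields, or return None if it does not fit."""
    match = FASTQ_NAME.match(filename)
    return match.groupdict() if match else None


def idn_matches_sample(filename: str, sample_id: str) -> bool:
    """Check that the {IDN} field of the filename equals the supplied sampleID."""
    fields = parse_fastq_name(filename)
    return fields is not None and fields["idn"] == sample_id


if __name__ == "__main__":
    # Hypothetical filename, used only for illustration.
    print(parse_fastq_name("1_140128_H8AHNADXX_1-1-2A_GATCAG_1.fastq.gz"))
```

If sample identifiers can contain underscores in your setup, the pattern needs to be adapted accordingly.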
Dependencies
Make sure you have loaded/installed all dependencies and that they are in your $PATH.
You only need to load the dependencies required by the modules that you want to run. If a dependency for a module is missing, MIP will tell you which dependencies you need to install (or add to your $PATH) and exit. The versions listed after each software name have been tested for compatibility with MIP.
Program/Modules
- Perl modules YAML.pm and Log::Log4perl.pm, since these are not included in the Perl standard distribution
- Simple Linux Utility for Resource Management (SLURM)
- FastQC (version: 0.11.2)
- Mosaik (version: 2.2.24)
- BWA (version: 0.7.10)
- SAMTools (version: 1.1)
- BedTools (version: 2.20.1)
- PicardTools (version: 1.125)
- Chanjo (version: 2.3.0)
- GATK (version: 3.3-0)
- VEP (version: 76)
- vcfParser.pl (Supplied with MIP; see vcfParser)
- SnpEff (version: 4.0)
- ANNOVAR (version: 2013-08-23)
- GENMOD (version: 1.7.7)
- Score_mip_variants (version: 0.5.4)
- VcfTools (version: 0.1.12b)
- PLINK (version: 1.07)
Depending on which programs you include in the MIP analysis, you also need to add these programs to your $PATH:
- FastQC
- Mosaik
- BWA
- SAMTools
- Tabix
- BedTools
- VcfTools
- PLINK
and these to your Python virtual environment:
- Chanjo
- GENMOD
- Score_mip_variants
- Cosmid (version: 0.4.9.1) for automatic download
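Before launching MIP, it can be useful to confirm that the command-line tools are actually reachable. The sketch below uses Python's standard `shutil.which` for this. The executable names in `PATH_TOOLS` are assumptions about a typical installation (Mosaik is left out because it ships several executables), and `find_missing` is a hypothetical helper, not something provided by MIP.

```python
import shutil
from typing import Iterable, List, Optional

# Assumed executable names; adjust them to match your local installation.
PATH_TOOLS = ["fastqc", "bwa", "samtools", "tabix", "bedtools", "vcftools", "plink"]


def find_missing(tools: Iterable[str], path: Optional[str] = None) -> List[str]:
    """Return the tools that cannot be found on `path` (default: the current $PATH)."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = find_missing(PATH_TOOLS)
    if missing:
        print("Not found on $PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on $PATH.")
```

This only checks that an executable exists on the search path; it does not verify that the installed version is one of the tested versions.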
To make sure that you use the same commands to work on the virtual environment, you need to install a virtual environment wrapper. We recommend pyenv and pyenv-virtualenvwrapper.
To enable virtualenvwrapper, add the line
pyenv virtualenvwrapper
to your ~/.bash_profile.
Databases/References
Please check out Cosmid to download references and/or databases on your own or via MIP.
MIP can build/download many program prerequisites automatically:
Note
Download is only enabled when using the default parameters of MIP and requires a Cosmid installation in your Python virtual environment.
Automatic Download:
- Human Decoy Genome Reference (1000G)
- The Consensus Coding Sequence project database (CCDS)
- Relevant references from the 1000G FTP Bundle (Mills, Omni, dbSNP, etc.)
Automatic Build:
- Human Genome Reference Meta Files:
  - The sequence dictionary (".dict")
  - The ".fasta.fai" file
- Mosaik:
  - The Mosaik align format of the human genome {mosaikAlignReference}.
  - The Mosaik align jump database {mosaikJumpDbStub}.
  - The Mosaik align network files {mosaikAlignNeuralNetworkPeFile} and {mosaikAlignNeuralNetworkSeFile}. These will be copied from your MOSAIK installation to the MIP reference directory.
- BWA:
  - The BWA index of the human genome.
Note
If you do not supply these parameters (Mosaik/BWA), MIP will create them from scratch using the supplied human reference genome as template.
- Capture target files:
  - The ".infile_list" and ".pad100.infile_list" files used in {pPicardToolsCalculateHSMetrics}
  - The ".pad100.interval_list" file used by some GATK modules.
Note
If you do not supply these parameters, MIP will create them from scratch using the supplied ".bed" file of the latest supported capture kit and the supplied human reference genome as template.
ANNOVAR: The chosen ANNOVAR databases are downloaded before use, if they are missing from the annovar/humandb directory, using ANNOVAR's built-in download function.
Note
This applies only to the supported ANNOVAR databases. Supply the flag "--annovarSupportedTableNames 1" to list the databases supported by MIP.
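To see which of the human genome reference meta files from the "Automatic Build" list are already present, and which would have to be built, a small check such as the following can help. It is a sketch under stated assumptions: the sequence dictionary is assumed to sit next to the reference with the ".fasta" extension replaced by ".dict", and the index is assumed to be the reference name with ".fai" appended. The function names are hypothetical and not part of MIP.

```python
from pathlib import Path
from typing import Dict, List


def reference_meta_files(reference: str) -> Dict[str, Path]:
    """Map each meta file kind to the path expected next to the reference.

    Assumption: the reference is an uncompressed file ending in ".fasta".
    """
    ref = Path(reference)
    return {
        "dict": ref.with_suffix(".dict"),          # genome.fasta -> genome.dict
        "fai": ref.with_name(ref.name + ".fai"),   # genome.fasta -> genome.fasta.fai
    }


def missing_meta_files(reference: str) -> List[str]:
    """Return the kinds of meta files that do not yet exist on disk."""
    return [
        kind
        for kind, path in reference_meta_files(reference).items()
        if not path.exists()
    ]
```

Calling `missing_meta_files` with the path to your reference returns a list such as `["dict", "fai"]` when neither file exists yet.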