Setup
Filename convention
Permanent filenames should use the following format:
{LANE}_{DATE}_{FLOW CELL}_{IDN}_{BARCODE SEQ}_{DIRECTION 1/2}.fastq.gz
Note
The familyID and sampleID(s) need to be unique, and each sampleID supplied should be equal to the {IDN} in the filename.
However, MIP will accept filenames in other formats as long as the filename contains the sampleID and the mandatory information can be collected from the fastq header.
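As an illustration of the convention above, the following sketch splits a filename into its fields with a regular expression. It is not MIP's own parser: the function name `parse_fastq_name`, the example filename, and the assumptions that lane and date are numeric, that no field contains an underscore, and that the compression suffix is optional are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical pattern for
# {LANE}_{DATE}_{FLOW CELL}_{IDN}_{BARCODE SEQ}_{DIRECTION 1/2}.fastq[.gz]
# Assumptions (not taken from MIP): lane and date are digits, and no field
# contains an underscore.
FASTQ_NAME = re.compile(
    r"^(?P<lane>\d+)_"
    r"(?P<date>\d+)_"
    r"(?P<flow_cell>[^_]+)_"
    r"(?P<sample_id>[^_]+)_"
    r"(?P<barcode>[^_]+)_"
    r"(?P<direction>[12])\.fastq(?:\.gz)?$"
)


def parse_fastq_name(filename: str) -> Optional[dict]:
    """Return the filename fields as a dict, or None if the name does not match."""
    match = FASTQ_NAME.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Made-up filename used only to exercise the pattern.
    print(parse_fastq_name("1_140128_H8AHNADXX_1-1-1A_GATCAG_1.fastq.gz"))
```

A check like this could be used before starting an analysis to confirm that the `sample_id` field equals the sampleID you intend to supply.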
Dependencies
Make sure you have loaded/installed all dependencies and that they are in your $PATH.
You only need to load the dependencies required for the modules that you want to run. If a dependency for a module is missing, MIP will tell you which dependencies you need to install (or add to your $PATH) and exit. MIP comes with an install script, install.pl, which will install all programs needed to execute modules in MIP via bioconda and/or $SHELL.
The versions listed after each software name have been tested for compatibility with MIP.
Program/Modules
- Perl modules: YAML.pm, Log4perl.pm, List::MoreUtils, DateTime, DateTime::Format::ISO8601, DateTime::Format::HTTP, DateTime::Format::Mail, Set::IntervalTree from CPAN, since these are not included in the perl standard distribution
- Simple Linux Utility for Resource Management (SLURM)
- FastQC (version: 0.11.5)
- Mosaik (version: 2.2.24)
- BWA (version: 0.7.15)
- BWAKit (version: 0.7.12)
- Sambamba (version: 0.6.3)
- SAMTools (version: 1.3.1)
- BedTools (version: 2.26.0)
- PicardTools (version: 2.5.0)
- Chanjo (version: 3.4.1)
- Manta (version: 1.0.0)
- GATK (version: 3.6)
- freebayes (version: 1.0.2)
- VT (version: 20151110)
- VEP (version: 84) with plugin “UpDownDistance, LoFtool, LoF”
- vcfParser.pl (Supplied with MIP; see vcfParser)
- SnpEff (version: 4.2)
- ANNOVAR (version: 2013-08-23)
- GENMOD (version: 3.5.6)
- variant_integrity (version: 0.0.4)
- VcfTools (version: 0.1.0)
- BcfTools (version: 1.3.1)
- Htslib (version: 1.3.1)
- PLINK2 (version: 1.90b3x35)
- MultiQC (version: 0.8dev0)
Depending on which programs you include in the MIP analysis, you also need to add these programs to your $PATH:
- FastQC
- Mosaik
- BWA
- SAMTools
- Tabix
- BedTools
- VcfTools
- PLINK
and these to your Python virtual environment:
- Chanjo
- GENMOD
- Cosmid (version: 0.4.9.1) for automatic download
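The $PATH requirement above can be checked before launching an analysis. The sketch below uses Python's standard `shutil.which` to report executables that cannot be found. The executable names in `REQUIRED_EXECUTABLES` are assumptions based on how these tools are commonly installed, not names taken from MIP; Mosaik is left out because its executable names are not stated here. Adjust the list to the modules you plan to run.

```python
import shutil

# Assumed executable names for some of the programs listed above.
# These are illustrative and may differ on your system.
REQUIRED_EXECUTABLES = [
    "fastqc",
    "bwa",
    "samtools",
    "tabix",
    "bedtools",
    "vcftools",
    "plink",
]


def missing_executables(names, path=None):
    """Return the names that cannot be found on the search path.

    With path=None, shutil.which searches the current $PATH.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    missing = missing_executables(REQUIRED_EXECUTABLES)
    if missing:
        print("Not found in $PATH: " + ", ".join(missing))
    else:
        print("All listed executables were found in $PATH.")
```

This only confirms that an executable is reachable; it does not check that the installed version matches the tested versions listed above.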
To make sure that you use the same commands to work with the virtual environment, install a virtual environment wrapper. We recommend pyenv and pyenv-virtualenvwrapper.
To enable virtualenvwrapper, add: pyenv virtualenvwrapper to your ~/.bash_profile.
Databases/References
Please check out Cosmid to download references and/or databases on your own or via MIP.
MIP can build/download many program prerequisites automatically:
Note
Download is only enabled when using the default parameters of MIP and requires a Cosmid installation in your Python virtual environment.
Automatic Download:
- Human Decoy Genome Reference (1000G)
- The Consensus Coding Sequence project database (CCDS)
- Relevant references from the 1000G FTP Bundle (mills, omni, dbsnp etc)
Automatic Build:
- Human Genome Reference Meta Files:
- The sequence dictionary (".dict")
- The ".fasta.fai" file
- Mosaik:
- The Mosaik align format of the human genome {mosaikAlignReference}.
- The Mosaik align jump database {mosaikJumpDbStub}.
- The Mosaik align network files {mosaikAlignNeuralNetworkPeFile} and {mosaikAlignNeuralNetworkSeFile}. These will be copied from your MOSAIK installation to the MIP reference directory.
- BWA:
- The BWA index of the human genome.
Note
If you do not supply these parameters (Mosaik/BWA), MIP will create these files from scratch using the supplied human reference genome as template.
- Capture target files:
- The ".infile_list" and ".pad100.infile_list" files used in {pPicardToolsCalculateHSMetrics}
- The ".pad100.interval_list" file used by some GATK modules.
Note
If you do not supply these parameters, MIP will create these files from scratch using the supplied ".bed" file of the latest supported capture kit and the supplied human reference genome as templates.
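To see in advance which of the human genome meta files and BWA index files from the Automatic Build list are already present, a small check such as the following can help. This is a sketch under stated assumptions, not MIP's own logic: it assumes an uncompressed reference ending in ".fasta", the ".dict" file named by replacing that extension (a common convention), the ".fasta.fai" file next to the reference, and the usual BWA index suffixes. The function names and the example path are hypothetical.

```python
from pathlib import Path

# File suffixes normally written by "bwa index" next to the reference.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def expected_reference_files(reference: Path) -> list:
    """List the derived files assumed to sit beside the reference FASTA."""
    fai = Path(str(reference) + ".fai")        # genome.fasta -> genome.fasta.fai
    seq_dict = reference.with_suffix(".dict")  # genome.fasta -> genome.dict (assumed naming)
    bwa_index = [Path(str(reference) + suffix) for suffix in BWA_INDEX_SUFFIXES]
    return [fai, seq_dict, *bwa_index]


def missing_reference_files(reference: Path) -> list:
    """Return the expected derived files that do not exist on disk."""
    return [path for path in expected_reference_files(reference) if not path.exists()]


if __name__ == "__main__":
    # Hypothetical location of the reference genome.
    for path in missing_reference_files(Path("references/genome.fasta")):
        print(f"missing: {path}")
```

Files reported as missing here are the ones that would have to be built before the corresponding modules can run.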
ANNOVAR: The chosen ANNOVAR databases are downloaded before use, if they are missing from the annovar/humandb directory, using ANNOVAR's built-in download function.
Note
This applies only to the supported ANNOVAR databases. Supply the flag "--annovarSupportedTableNames" to list the databases supported by MIP.