Setup

Filename convention

The permanent filename should follow this format:

{LANE}_{DATE}_{FLOW CELL}_{IDN}_{BARCODE SEQ}_{DIRECTION 1/2}.fastq.gz

Note

The familyID and sampleID(s) need to be unique, and the sampleID supplied should be equal to the {IDN} in the filename.

However, MIP will accept filenames in other formats as long as the filename contains the sampleID and the mandatory information can be collected from the fastq header.
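As an illustration of the convention above, the following Python sketch splits a filename into its fields and checks that the {IDN} field equals a given sampleID. It is not part of MIP. The pattern assumes gzip-compressed files ending in .fastq.gz, numeric lane and date fields, and no underscores inside any field. The function names and the example filename are hypothetical.

```python
import re

# Pattern for {LANE}_{DATE}_{FLOW CELL}_{IDN}_{BARCODE SEQ}_{DIRECTION 1/2}.fastq.gz
# Assumptions (not taken from MIP): lane and date are digits, and no field
# contains an underscore.
FASTQ_NAME = re.compile(
    r"^(?P<lane>\d+)_(?P<date>\d+)_(?P<flow_cell>[^_]+)_(?P<idn>[^_]+)"
    r"_(?P<barcode>[^_]+)_(?P<direction>[12])\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Return the filename fields as a dict, or None if the name does not match."""
    match = FASTQ_NAME.match(filename)
    return match.groupdict() if match else None


def matches_sample_id(filename, sample_id):
    """Check that the {IDN} field of the filename equals the supplied sampleID."""
    fields = parse_fastq_name(filename)
    return fields is not None and fields["idn"] == sample_id


if __name__ == "__main__":
    # Hypothetical filename, used only for illustration
    print(parse_fastq_name("1_140128_H8AHNADXX_1-1-2A_GATCAG_1.fastq.gz"))
```

A filename that returns None here does not follow the permanent format, but MIP may still accept it under the exception described above.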

Dependencies

Make sure you have loaded/installed all dependencies and that they are in your $PATH. You only need to load the dependencies required by the modules that you want to run. If a dependency for a module is missing, MIP will tell you which dependencies you need to install (or add to your $PATH) and exit. MIP comes with an install script, install.pl, which installs all programs needed to execute the modules in MIP via bioconda and/or $SHELL. The versions listed after each software name have been tested for compatibility with MIP.
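Before starting an analysis, you can run a quick check of your own to see which programs are visible on your $PATH. The Python sketch below is not part of MIP and does not reproduce MIP's own dependency check. The executable names in the example list are assumptions and may differ in your installation.

```python
import shutil


def find_missing(programs):
    """Return the programs that cannot be found on $PATH."""
    return [name for name in programs if shutil.which(name) is None]


if __name__ == "__main__":
    # Executable names are assumptions; adjust them to your installation
    wanted = ["fastqc", "bwa", "samtools", "bedtools", "vcftools", "tabix"]
    missing = find_missing(wanted)
    if missing:
        print("Not found on $PATH: " + ", ".join(missing))
    else:
        print("All requested programs were found on $PATH")
```

Only list the programs used by the modules you intend to run, since the other dependencies do not need to be installed.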

Program/Modules

  • Perl modules: YAML.pm, Log4perl.pm, List::MoreUtils, DateTime, DateTime::Format::ISO8601, DateTime::Format::HTTP, DateTime::Format::Mail, Set::IntervalTree from CPAN, since these are not included in the standard Perl distribution
  • Simple Linux Utility for Resource Management (SLURM)
  • FastQC (version: 0.11.5)
  • Mosaik (version: 2.2.24)
  • BWA (version: 0.7.15)
  • BWAKit (version: 0.7.12)
  • Sambamba (version: 0.6.3)
  • SAMTools (version: 1.3.1)
  • BedTools (version: 2.26.0)
  • PicardTools (version: 2.5.0)
  • Chanjo (version: 3.4.1)
  • Manta (version: 1.0.0)
  • GATK (version: 3.6)
  • freebayes (version: 1.0.2)
  • VT (version: 20151110)
  • VEP (version: 84) with plugins “UpDownDistance, LoFtool, LoF”
  • vcfParser.pl (Supplied with MIP; see vcfParser)
  • SnpEff (version: 4.2)
  • ANNOVAR (version: 2013-08-23)
  • GENMOD (version: 3.5.6)
  • variant_integrity (version: 0.0.4)
  • VcfTools (version: 0.1.0)
  • BcfTools (version: 1.3.1)
  • Htslib (version: 1.3.1)
  • PLINK2 (version: 1.90b3x35)
  • MultiQC (version: 0.8dev0)

Depending on which programs you include in the MIP analysis, you also need to add these programs to your $PATH:

  • FastQC
  • Mosaik
  • BWA
  • SAMTools
  • Tabix
  • BedTools
  • VcfTools
  • PLINK

and these to your Python virtual environment:

  • Chanjo
  • GENMOD
  • Cosmid (version: 0.4.9.1) for automatic download

To work with the virtual environment using a consistent set of commands, install a virtual environment wrapper. We recommend pyenv and pyenv-virtualenvwrapper. To enable virtualenvwrapper, add the line pyenv virtualenvwrapper to your ~/.bash_profile.

Databases/References

Please check out Cosmid to download references and/or databases on your own or via MIP.

MIP can build/download many program prerequisites automatically:

Note

Download is only enabled when using the default parameters of MIP and requires a Cosmid installation in your Python virtual environment.

Automatic Download:

  1. Human Decoy Genome Reference (1000G)
  2. The Consensus Coding Sequence project database (CCDS)
  3. Relevant references from the 1000G FTP Bundle (Mills, Omni, dbSNP, etc.)

Automatic Build:

Human Genome Reference Meta Files:
  1. The sequence dictionary (“.dict”)
  2. The “.fasta.fai” file
Mosaik:
  1. The Mosaik align format of the human genome {mosaikAlignReference}.
  2. The Mosaik align jump database {mosaikJumpDbStub}.
  3. The Mosaik align neural network files {mosaikAlignNeuralNetworkPeFile} and {mosaikAlignNeuralNetworkSeFile}. These will be copied from your MOSAIK installation to the MIP reference directory.
BWA:
  1. The BWA index of the human genome.

Note

If you do not supply these parameters (Mosaik/BWA), MIP will create them from scratch using the supplied human reference genome as template.

Capture target files:
  1. The “infile_list” and “.pad100.infile_list” files used in {pPicardToolsCalculateHSMetrics}
  2. The “.pad100.interval_list” file used by some GATK modules.

Note

If you do not supply these parameters, MIP will create them from scratch using the supplied “.bed” file of the latest supported capture kit and the supplied human reference genome as templates.
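To give a rough idea of what a padded target file contains, the Python sketch below extends each interval of a tab-separated “.bed” line by a fixed number of bases on both sides. It assumes that “pad100” stands for 100 bases of padding, which is not stated above. It does not reproduce how MIP builds these files, and it does not clip intervals at the end of a chromosome. The function name is hypothetical.

```python
def pad_bed_line(line, padding=100):
    """Extend one tab-separated BED interval by `padding` bases on each side.

    Assumption: the default of 100 mirrors the "pad100" file names; this is
    an illustration, not MIP's build procedure.
    """
    chrom, start, end, *rest = line.rstrip("\n").split("\t")
    new_start = max(0, int(start) - padding)  # BED starts cannot be negative
    new_end = int(end) + padding  # not clipped to the chromosome length
    return "\t".join([chrom, str(new_start), str(new_end), *rest])


if __name__ == "__main__":
    # Hypothetical interval, used only for illustration
    print(pad_bed_line("1\t150\t300\tgeneA"))
```

Columns after the end coordinate are passed through unchanged, so annotation fields in the capture kit file are kept.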

ANNOVAR: The chosen ANNOVAR databases are downloaded before use, if they are missing from the annovar/humandb directory, using ANNOVAR's built-in download function.

Note

This applies only to the supported ANNOVAR databases. Supply the flag “--annovarSupportedTableNames” to list the databases supported by MIP.