Configuration¶
Example files¶
Basic options¶
input¶
The path to a file specifying the input data sets. The software comes with an example file named “input.yaml”.
Example: “data/input.yml”.
genome¶
The complete genome in a SINGLE plain FASTA file. Genomes can be downloaded from Gencode (human and mouse), UCSC, or ENSEMBL. Example link for human: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz. Important: remember to decompress the file using “gzip -d hg38.fa.gz”! Tip: to make a single genome file from multiple FASTA files, execute: cat chr1.fa chr2.fa chr3.fa > genome.fa
Example: “/home/kai/genome/GRCh38/genome.fa”.
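The decompress-and-concatenate tip above can also be done in one step from Python. This is a minimal sketch, not part of the software itself: the helper name merge_fasta and the file names in the usage line are hypothetical. Like “cat”, it copies bytes unchanged, so every input file is assumed to end with a newline.

```python
import gzip
import shutil
from pathlib import Path


def merge_fasta(parts, output):
    """Concatenate FASTA files (plain or gzip-compressed) into one plain FASTA file.

    Hypothetical helper for illustration; inputs are written in the order given.
    """
    with open(output, "wb") as out:
        for part in parts:
            part = Path(part)
            # Files ending in .gz are decompressed on the fly; others are copied as-is.
            opener = gzip.open if part.suffix == ".gz" else open
            with opener(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return output
```

Usage, assuming per-chromosome files in the current directory: merge_fasta(["chr1.fa", "chr2.fa.gz", "chr3.fa"], "genome.fa").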
annotation¶
Genome annotation in GTF format. For human and mouse, Gencode annotations are available at http://www.gencodegenes.org/. Very important: chromosome names in the annotation GTF file have to match chromosome names in the FASTA genome sequence file. For example, one can use ENSEMBL FASTA files with ENSEMBL GTF files, and UCSC FASTA files with UCSC GTF files. However, since UCSC uses the chr1, chr2, ... naming convention and ENSEMBL uses 1, 2, ..., ENSEMBL and UCSC FASTA and GTF files cannot be mixed together.
Example: “/home/kai/genome/GRCh38/gencode.v25.annotation.gtf”.
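A quick way to catch a naming mismatch before running anything is to compare the sequence names in the two files. The following is a rough sketch under stated assumptions: the function names are hypothetical, both files are uncompressed text, and the chromosome name is taken to be the first whitespace-delimited token of each FASTA header and the first tab-separated column of each GTF record.

```python
def fasta_chromosomes(fasta_path):
    """Collect sequence names from FASTA header lines (those starting with '>')."""
    names = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    # Only the first token is the name; the rest is a description.
                    names.add(fields[0])
    return names


def gtf_chromosomes(gtf_path):
    """Collect the values of the first GTF column, skipping comments and blank lines."""
    names = set()
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def missing_from_genome(fasta_path, gtf_path):
    """Return chromosome names used in the GTF that do not occur in the FASTA."""
    return gtf_chromosomes(gtf_path) - fasta_chromosomes(fasta_path)
```

An empty result means every annotated chromosome exists in the genome file. A result such as {"1", "2"} against a genome with chr1 and chr2 points to mixed ENSEMBL and UCSC sources.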
Genome Indices¶
seq_index¶
This is the FILE containing the GENOME SEQUENCE INDEX.
Example: “/home/kai/genome/GRCh38/GRCh38.index”.
bwa_index¶
This is the DIRECTORY containing BWA INDICES.
Example: “/home/kai/genome/GRCh38/BWAIndex/”.
star_index¶
This is the DIRECTORY containing STAR INDICES.
Example: “/home/kai/genome/GRCh38/STAR_index/”.
rsem_index¶
This is the DIRECTORY containing RSEM INDICES.
Example: “/home/kai/genome/GRCh38/RSEM_index/”.
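Because seq_index must point to a file while the other three options must point to directories, a small pre-flight check can catch swapped or mistyped paths. This sketch assumes the configuration has already been loaded into a flat dictionary mapping option names to paths; how the software itself parses its configuration is not described here, and check_index_paths is a hypothetical helper.

```python
import os

# Option name -> expected kind of path, as described in this section.
EXPECTED_KIND = {
    "seq_index": "file",
    "bwa_index": "dir",
    "star_index": "dir",
    "rsem_index": "dir",
}


def check_index_paths(config):
    """Return a list of human-readable problems; an empty list means all paths look right."""
    problems = []
    for option, kind in EXPECTED_KIND.items():
        path = config.get(option)
        if path is None:
            problems.append(f"{option}: not set")
        elif kind == "file" and not os.path.isfile(path):
            problems.append(f"{option}: expected an existing file, got {path}")
        elif kind == "dir" and not os.path.isdir(path):
            problems.append(f"{option}: expected an existing directory, got {path}")
    return problems
```

The check only confirms that each path exists and is of the right kind. It does not verify that the index contents are valid for BWA, STAR, or RSEM.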