(Requires version v1.3 or above)
The analysis may need the following additional software:
- samtools >= v1.9.
- BWA >= v0.7.17.
- taiji-utils: install with pip install taiji-utils.
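As a quick sanity check before running the pipeline, the sketch below reports which of the external tools cannot be found on PATH. The executable names samtools and bwa are assumptions (the usual command names), and the script does not verify the minimum versions listed above.

```python
import shutil

# Executable names are assumptions: "samtools" and "bwa" are the usual
# command names, but an installation may expose them differently.
REQUIRED_TOOLS = ["samtools", "bwa"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

Versions still need to be checked by hand, for example by running each tool and reading the version it reports.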
Preparing input and configuration files
To start, create a TAB-delimited file named input.tsv and a YAML file named config.yml. Taiji accepts multiple types of input. Below are example input and configuration files for common input types. Detailed documentation about the input format can be found here.
File 1: input.tsv
type id rep path
scRNA-seq pbmc 1 pbmc_granulocyte_sorted_10k_RNA_R1.fastq.gz,pbmc_granulocyte_sorted_10k_RNA_R2.fastq.gz
In this example input file, we analyze the raw fastq files produced by the 10x Genomics platform.
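Because the columns of input.tsv must be separated by TAB characters, which are easy to lose when copying from a web page, it can help to generate the file programmatically. The following is a minimal sketch using Python's standard csv module; the helper name write_input_tsv is hypothetical, and the column values are copied from the example above.

```python
import csv

# Header and row values are taken from the example input file above.
HEADER = ["type", "id", "rep", "path"]
ROWS = [
    [
        "scRNA-seq",
        "pbmc",
        "1",
        # The two FASTQ files of the pair are separated by a comma.
        "pbmc_granulocyte_sorted_10k_RNA_R1.fastq.gz,"
        "pbmc_granulocyte_sorted_10k_RNA_R2.fastq.gz",
    ],
]


def write_input_tsv(path, header, rows):
    """Write a TAB-delimited table with a single header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    write_input_tsv("input.tsv", HEADER, ROWS)
```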
File 2: config.yml
output_dir: "output"
input: input.tsv
genome: "path-to-genome/GRCh38.fa"
star_index: "path-to-STAR/STAR_index" # optional
genome_index: "path-to-genome/GRCh38.index" # optional
annotation: "path-to-annotation/gencode.v31.annotation.gtf"
scrna_options:
  cell_barcode_length: 16 # specific to 10x Genomics
  umi_length: 12 # specific to 10x Genomics
See here for all available options and their roles.
File 1: input.tsv
type id group rep path tags
scRNA-seq forebrain_E11.5 forebrain_E11.5 1 E11.5_R1.fastq.gz,E11.5_R2.fastq.gz Demultiplexed
scRNA-seq forebrain_P0 forebrain_P0 1 P0_R1.fastq.gz,P0_R2.fastq.gz Demultiplexed
In this example we use demultiplexed FASTQ files, in which each record has been demultiplexed by adding the barcode to the beginning of its read name in the following format: “@” + “barcode” + “:” + “read_name”. Below is an example of a demultiplexed FASTQ file:
$ zcat CEMBA180306_2B.demultiplexed.R1.fastq.gz | head
@AGACGGAGACGAATCTAGGCTGGTTGCCTTAC:7001113:920:HJ55CBCX2:1:1108:1121:1892 1:N:0:0
ATCCTGGCATGAAAGGATTTTTTTTTTAGAAAATGAAATATATTTTAAAG
+
DDDDDIIIIHIIGHHHIIIHIIIIIIHHIIIIIIIIIIIIIIIIIIIIII
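The header format described above can be illustrated with a short Python sketch. Both function names are hypothetical, and how the barcode is obtained for each read depends on the sequencing protocol, which is outside the scope of this example.

```python
def demultiplexed_header(barcode, header):
    """Prepend the barcode to a FASTQ header: '@' + barcode + ':' + read_name."""
    if not header.startswith("@"):
        raise ValueError("a FASTQ header line must start with '@'")
    # header[1:] is the original read name without the leading '@'.
    return "@" + barcode + ":" + header[1:]


def split_demultiplexed_header(header):
    """Recover (barcode, read_name) from a demultiplexed FASTQ header."""
    if not header.startswith("@"):
        raise ValueError("a FASTQ header line must start with '@'")
    # Split only on the first ':' because read names contain ':' themselves.
    barcode, read_name = header[1:].split(":", 1)
    return barcode, read_name


if __name__ == "__main__":
    original = "@7001113:920:HJ55CBCX2:1:1108:1121:1892 1:N:0:0"
    tagged = demultiplexed_header("AGACGGAGACGAATCTAGGCTGGTTGCCTTAC", original)
    print(tagged)
    print(split_demultiplexed_header(tagged))
```

Splitting on the first colon only is what makes the round trip safe, since Illumina-style read names contain colons of their own.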
File 2: config.yml
input: "input.tsv"
output_dir: "output/"
bwa_index: "path-to-BWAIndex/genome.fa"
genome: "path-to-genome-fasta/genome.fa"
If you don’t have the genome FASTA file or the BWA index on your computer, Taiji can download them for you automatically when you specify the genome assembly:
input: "input.tsv"
output_dir: "output/"
assembly: "mm10"
Quality control
taiji run --config config.yml --select SCRNA_QC
More analyses
Use taiji view taiji.html to see what is available!