(Require version v1.3 or above)
The analysis may need following additional software:
- samtools >= v1.9.
- BWA >= v0.7.17.
- taiji-utils: install using pip install taiji-utils.
Preparing input and configuration files
To start, create a TAB-delimited file named input.tsv and a YAML file named config.yml. Taiji accepts multiple types of input. Below are example input and configuration files for common input types. Detailed documentation about the input format can be found here.
File 1: input.tsv
type	id	rep	path
scRNA-seq	pbmc	1	pbmc_granulocyte_sorted_10k_RNA_R1.fastq.gz,pbmc_granulocyte_sorted_10k_RNA_R2.fastq.gzIn this example input file, we analyze the raw fastq files produced by the 10x Genomics platform.
File 2: config.yml
output_dir: "output"
input: input.tsv
genome: "path-to-genome/GRCh38.fa"
star_index: "path-to-STAR/STAR_index"  # optional
genome_index: "path-to-genome/GRCh38.index"  # optional
annotation: "path-to-annotation/gencode.v31.annotation.gtf"
scrna_options:
  cell_barcode_length: 16   # specific to 10X genomics
  umi_length: 12   # specific to 10X genomicsSee here for all available options and their roles.
File 1: input.tsv
type	id	group	rep	path	tags
scRNA-seq	forebrain_E11.5	forebrain_E11.5	1	E11.5_R1.fastq.gz,E11.5_R2.fastq.gz	Demultiplexed
scRNA-seq	forebrain_P0	forebrain_P0	1	P0_R1.fastq.gz,P0_R2.fastq.gz	DemultiplexedIn this example we used demultiplexed FASTQ files, in which each fastq record were demultiplexed by adding the barcode to the beginning of each read in the following format: “@” + “barcode” + “:” + “read_name”. Below is one example of demultiplexed fastq file:
$ zcat CEMBA180306_2B.demultiplexed.R1.fastq.gz | head 
@AGACGGAGACGAATCTAGGCTGGTTGCCTTAC:7001113:920:HJ55CBCX2:1:1108:1121:1892 1:N:0:0
ATCCTGGCATGAAAGGATTTTTTTTTTAGAAAATGAAATATATTTTAAAG
+
DDDDDIIIIHIIGHHHIIIHIIIIIIHHIIIIIIIIIIIIIIIIIIIIIIFile 2: config.yml
input: "input.tsv"
output_dir: "output/"
bwa_index: "path-to-BWAIndex/genome.fa"
genome: "path-to-genome-fasta/genome.fa"If you don’t have genome fasta file or BWA index on your computer, you can tell Taiji to automatically download that for you by specifying the genome assembly:
input: "input.tsv"
output_dir: "output/"
assembly: "mm10"Quality control
taiji run --config config.yml --select SCRNA_QCMore analyses
Use taiji view taiji.html to see what are availiable!