Nextflow RNAseq

RNA-seq

  1. Merge re-sequenced FastQ files (cat)
  2. Sub-sample FastQ files and auto-infer strandedness (fq, Salmon)
  3. Read QC (FastQC)
  4. UMI extraction (UMI-tools)
  5. Adapter and quality trimming (Trim Galore!)
  6. Removal of genome contaminants (BBSplit)
  7. Removal of ribosomal RNA (SortMeRNA)
  8. Choice of multiple alignment and quantification routes: STAR -> Salmon; STAR -> RSEM; HISAT2 -> no quantification
  9. Sort and index alignments (SAMtools)
  10. UMI-based deduplication (UMI-tools)
  11. Duplicate read marking (Picard MarkDuplicates)
  12. Transcript assembly and quantification (StringTie)
  13. Create bigWig coverage files (BEDTools, bedGraphToBigWig)
  14. Extensive quality control: RSeQC, Qualimap, dupRadar, Preseq, DESeq2
  15. Pseudo-alignment and quantification (Salmon; optional)
  16. Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks (MultiQC, R)

umi-tools extract:

Flexible removal of UMI sequences from FASTQ reads.
UMIs are removed from the read sequence and appended to the read name. Any other barcode, for example a library barcode, is left on the read. The command can also filter reads by quality or against a whitelist.
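
To make the extraction step concrete, here is a minimal Python sketch of moving a UMI from the read sequence into the read name. It is an illustration only, not the UMI-tools implementation: the function name `extract_umi`, the 6-base UMI at the 5' end of the read, and the example read are all assumptions made for this sketch.

```python
def extract_umi(name, seq, qual, umi_len=6, sep="_"):
    """Move the first `umi_len` bases of a read into its name.

    Assumes the UMI sits at the 5' end of the read (hypothetical layout).
    Returns the new (name, sequence, quality) triple.
    """
    if len(seq) < umi_len:
        raise ValueError("read is shorter than the UMI")
    umi = seq[:umi_len]
    # Append the UMI to the read identifier, keeping any comment after the space.
    ident, _, comment = name.partition(" ")
    new_name = f"{ident}{sep}{umi}"
    if comment:
        new_name += " " + comment
    # Trim the UMI bases and their quality scores from the read.
    return new_name, seq[umi_len:], qual[umi_len:]


record = extract_umi("@read1 1:N:0", "ACGTTAGGGCATTT", "IIIIIIFFFFFFFF")
print(record)
```

Any bases after the UMI, such as a library barcode, stay on the read because only the first `umi_len` positions are trimmed.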

The remaining commands, group, dedup and count/count_tab, are used to identify PCR duplicates using the UMIs and perform different levels of analysis depending on the needs of the user. Several UMI deduplication schemes are available; the recommended method is directional.
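
The directional scheme can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the UMI-tools code: it handles the UMIs observed at a single mapping position, assumes all UMIs have equal length, and links UMI A to UMI B when they differ by one base and count(A) >= 2 * count(B) - 1. The function names and the example counts are hypothetical.

```python
from collections import deque


def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def directional_groups(umi_counts, max_dist=1):
    """Group UMIs seen at one position; each group is led by its top UMI."""
    # Visit UMIs from most to least abundant (ties broken alphabetically).
    order = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
    assigned = set()
    groups = []
    for seed in order:
        if seed in assigned:
            continue
        group = [seed]
        assigned.add(seed)
        queue = deque([seed])
        while queue:
            parent = queue.popleft()
            for child in order:
                if child in assigned:
                    continue
                # Directed edge: parent must be clearly more abundant than child.
                if (hamming(parent, child) <= max_dist
                        and umi_counts[parent] >= 2 * umi_counts[child] - 1):
                    assigned.add(child)
                    group.append(child)
                    queue.append(child)
        groups.append(group)
    return groups


counts = {"ACGT": 456, "TCGT": 2, "CCGT": 2, "ACAT": 72, "ACAG": 1, "AAAT": 90}
groups = directional_groups(counts)
for g in groups:
    print(g[0], "represents", g)
```

In this example AAAT stays in its own group: it is one mismatch away from ACAT, but its count is too high to be explained as a sequencing error of ACAT.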

umi-tools dedup:

Groups PCR duplicates and deduplicates reads to yield one read per group.
Use this when you want to remove the PCR duplicates prior to any downstream analysis.

BBSplit:

Read binning tool for metagenomes and contaminated libraries.
Used in the pipeline for removal of genome contaminants. Removal of ribosomal RNA is a separate step (SortMeRNA).

StringTie for transcript assembly and quantification.

Extensive quality control:

The preseq package is aimed at predicting and estimating the complexity of a genomic sequencing library.

Salmon uses an expectation-maximization (EM) algorithm to assign reads probabilistically when they map to two or more copies of a gene.
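
The idea behind EM read assignment can be shown with a toy Python sketch. It is not Salmon's implementation: it ignores transcript length, bias models and every other refinement, and the function name `em_abundance` and the example reads are assumptions made for illustration. Each read lists the transcripts it is compatible with. The E-step splits each read across those transcripts in proportion to the current abundance estimates, and the M-step re-estimates abundances from the resulting fractional counts.

```python
def em_abundance(read_compat, n_transcripts, n_iter=100):
    """Estimate relative abundances from reads with ambiguous origins.

    read_compat: list of tuples, each holding the transcript indices
                 a read is compatible with.
    Returns (abundance fractions, expected read counts per transcript).
    """
    theta = [1.0 / n_transcripts] * n_transcripts  # uniform starting point
    counts = [0.0] * n_transcripts
    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        # E-step: split each read according to the current abundances.
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: abundances are the normalized expected counts.
        n = sum(counts)
        theta = [c / n for c in counts]
    return theta, counts


# Hypothetical data: 6 reads unique to copy 0, 2 unique to copy 1,
# and 4 reads that match both copies equally well.
reads = [(0,)] * 6 + [(1,)] * 2 + [(0, 1)] * 4
theta, counts = em_abundance(reads, n_transcripts=2)
print([round(x, 3) for x in theta], [round(c, 2) for c in counts])
```

The ambiguous reads end up shared 3:1 in favor of copy 0, because the uniquely mapping reads indicate that copy 0 is the more abundant one.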

