Comprehensive smallRNA-7 profiling using exceRpt pipeline with full reference databases

Tags: pipeline

TODO_1: Update the image

mapping_heatmap3

  1. Input data

    1. mkdir -p ~/DATA/Data_Ute/Data_Ute_smallRNA_7/raw_data
    2. cd ~/DATA/Data_Ute/Data_Ute_smallRNA_7/raw_data
    3. cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf930/01_0505_WaGa_wt_EV_RNA_S1_R1_001.fastq.gz 0505_WaGa_wt.fastq.gz
    4. cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf931/02_0505_WaGa_sT_DMSO_EV_RNA_S2_R1_001.fastq.gz 0505_WaGa_sT_DMSO.fastq.gz
    5. cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf932/03_0505_WaGa_sT_Dox_EV_RNA_S3_R1_001.fastq.gz 0505_WaGa_sT_Dox.fastq.gz
    6. cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf933/04_0505_WaGa_scr_DMSO_EV_RNA_S4_R1_001.fastq.gz 0505_WaGa_scr_DMSO.fastq.gz
    7. cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf934/05_0505_WaGa_scr_Dox_EV_RNA_S5_R1_001.fastq.gz 0505_WaGa_scr_Dox.fastq.gz
    8. cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf935/06_1905_WaGa_wt_EV_RNA_S6_R1_001.fastq.gz 1905_WaGa_wt.fastq.gz
    9. cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf936/07_1905_WaGa_sT_DMSO_EV_RNA_S7_R1_001.fastq.gz 1905_WaGa_sT_DMSO.fastq.gz
    10. cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf937/08_1905_WaGa_sT_Dox_EV_RNA_S8_R1_001.fastq.gz 1905_WaGa_sT_Dox.fastq.gz
    11. cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf938/09_1905_WaGa_scr_DMSO_EV_RNA_S9_R1_001.fastq.gz 1905_WaGa_scr_DMSO.fastq.gz
    12. cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf939/10_1905_WaGa_scr_Dox_EV_RNA_S10_R1_001.fastq.gz 1905_WaGa_scr_Dox.fastq.gz
    13. cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf940/11_control_MKL1_S11_R1_001.fastq.gz control_MKL1.fastq.gz
    14. cp ~/DATA/Data_Ute/Data_Ute_smallRNA_7/231016_NB501882_0435_AHG7HMBGXV/nf941/12_control_WaGa_S12_R1_001.fastq.gz control_WaGa.fastq.gz
    15. #END
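    The cp commands above all apply the same renaming rule: drop the leading run index and the trailing _EV_RNA / _S<n>_R1_001 part. A minimal Python sketch of that rule, assuming every raw file name follows the pattern seen above; the helper name short_name and the regular expression are illustrative and not part of any tool:

```python
import re

# Assumed pattern: <index>_<sample>[_EV_RNA]_S<n>_R1_001.fastq.gz
RAW_NAME = re.compile(r"^\d+_(.+?)(?:_EV_RNA)?_S\d+_R1_001\.fastq\.gz$")


def short_name(raw_name: str) -> str:
    """Return the shortened file name used in raw_data/ (hypothetical helper)."""
    match = RAW_NAME.match(raw_name)
    if match is None:
        raise ValueError(f"unexpected file name: {raw_name}")
    return match.group(1) + ".fastq.gz"


if __name__ == "__main__":
    for raw in (
        "01_0505_WaGa_wt_EV_RNA_S1_R1_001.fastq.gz",
        "11_control_MKL1_S11_R1_001.fastq.gz",
    ):
        print(raw, "->", short_name(raw))
```

    Raising on an unexpected name is deliberate: a silently wrong sample label is harder to spot later than a failed copy.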
  2. Run cutadapt

    1. # Some common adapter sequences from different kits, for reference (* marks the adapter used in the cutadapt commands below):
    2. # - TruSeq Small RNA (Illumina): TGGAATTCTCGGGTGCCAAGG
    3. # - Small RNA Kits V1 (Illumina): TCGTATGCCGTCTTCTGCTTGT
    4. # - Small RNA Kits V1.5 (Illumina): ATCTCGTATGCCGTCTTCTGCTTG
    5. # - NEXTflex Small RNA Sequencing Kit v3 for Illumina Platforms (Bioo Scientific): TGGAATTCTCGGGTGCCAAGG
    6. # - LEXOGEN Small RNA-Seq Library Prep Kit (Illumina): TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC *
    7. mkdir -p ~/DATA/Data_Ute/Data_Ute_smallRNA_7/trimmed; cd ~/DATA/Data_Ute/Data_Ute_smallRNA_7/trimmed
    8. for sample in 0505_WaGa_wt 0505_WaGa_sT_DMSO 0505_WaGa_sT_Dox 0505_WaGa_scr_DMSO 0505_WaGa_scr_Dox 1905_WaGa_wt 1905_WaGa_sT_DMSO 1905_WaGa_sT_Dox 1905_WaGa_scr_DMSO 1905_WaGa_scr_Dox control_MKL1 control_WaGa; do
    9. cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 -o ${sample}_cutadapted.fastq.gz --minimum-length 5 --trim-n ../raw_data/${sample}.fastq.gz >> LOG
    10. done
    11. # -- check if it is necessary to remove adapter from 5'-end --
    12. (Option 1) cutadapt -g TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -o /dev/null --report=minimal 0505_WaGa_wt_cutadapted.fastq.gz   # The trimming statistics in the output show how often 5'-end adapters were removed.
    13. (Option 2) zcat your_sample.fastq.gz | grep 'TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC' | head -n 20
    14. (Option 3) fastqc your_sample.fastq.gz
    15. #Open the generated HTML report and check:
    16. # The "Overrepresented sequences" section for adapter sequences.
    17. # The "Per base sequence content" plot to see if there are unexpected sequences at the start of reads.
    18. #(If the checks show adapters at both ends) cutadapt -g TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC -q 20 --minimum-length 10 -o ${sample}_trimmed.fastq.gz ${sample}.fastq.gz >> LOG2
    19. # -g → Trims 5'-end adapters
    20. # -a → Trims 3'-end adapters; -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC → Specifies the adapter sequence to be removed from the 3' end of the reads. The sequence provided is common in RNA-seq libraries (e.g., Illumina small RNA sequencing).
    21. # -q 20 → Performs quality trimming at the 3' end only, removing low-quality bases (Phred score below 20) from that end; use -q 20,20 to trim the 5' end as well.
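    To make the -q note concrete, here is a small sketch of 3'-end quality trimming in the style described in the cutadapt documentation: subtract the cutoff from each quality, sum from the 3' end, and cut where the running sum is lowest. It is a simplified illustration written for these notes, not cutadapt's actual code, and the function name is hypothetical.

```python
def quality_trim_3prime(qualities, cutoff):
    """Return the number of bases to keep after 3'-end quality trimming.

    Simplified sketch: walk from the 3' end, accumulate (quality - cutoff),
    stop once good bases outweigh the bad tail, and cut where the running
    sum was lowest.
    """
    running = 0
    lowest = 0
    keep = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running > 0:
            break  # the remaining 5' part is not examined
        if running < lowest:
            lowest = running
            keep = i
    return keep


if __name__ == "__main__":
    quals = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
    print(quality_trim_3prime(quals, cutoff=10))      # bad 3' tail is removed
    print(quality_trim_3prime([2, 2, 30, 30], 20))    # bad 5' start is left alone
```

    The second call shows why a single -q value does not clean up the 5' end: the scan starts at the 3' end and stops as soon as it meets good bases.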
  3. Install exceRpt (https://github.gersteinlab.org/exceRpt/)

    1. docker pull rkitchen/excerpt
    2. mkdir -p /mnt/nvme0n1p1/MyexceRptDatabase
    3. cd /mnt/nvme0n1p1/MyexceRptDatabase
    4. wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg38_lowmem.tgz
    5. tar -xvf exceRptDB_v4_hg38_lowmem.tgz
    6. #http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg19_lowmem.tgz
    7. #http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_hg38_lowmem.tgz
    8. #http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_mm10_lowmem.tgz
    9. wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_EXOmiRNArRNA.tgz
    10. tar -xvf exceRptDB_v4_EXOmiRNArRNA.tgz
    11. wget http://org.gersteinlab.excerpt.s3-website-us-east-1.amazonaws.com/exceRptDB_v4_EXOGenomes.tgz
    12. tar -xvf exceRptDB_v4_EXOGenomes.tgz
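    After unpacking the archives it can help to confirm that the database directory has the sub-directories that the docker -v mounts in these notes refer to. A minimal sketch, assuming those directory names are what the archives unpack to (not verified against the archives themselves); the helper missing_db_dirs is hypothetical:

```python
from pathlib import Path

# Sub-directory names taken from the docker -v mounts used in these notes
# (assumption: the downloaded archives unpack to exactly these names).
EXPECTED_DIRS = (
    "hg38",
    "miRBase",
    "NCBI_taxonomy_taxdump",
    "Genomes_BacteriaFungiMammalPlantProtistVirus",
    "ribosomeDatabase",
)


def missing_db_dirs(db_root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that are absent under db_root."""
    root = Path(db_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_db_dirs("/mnt/nvme0n1p1/MyexceRptDatabase")
    print("missing:", missing if missing else "none")
```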
  4. Run exceRpt

    1. #[COMPLETE_DB]
    2. docker run -v /mnt/nvme0n1p1/MyInputSample:/exceRptInput \
    3. -v /mnt/nvme0n1p1/MyResults:/exceRptOutput \
    4. -v /mnt/nvme0n1p1/MyexceRptDatabase:/exceRpt_DB \
    5. -t rkitchen/excerpt \
    6. INPUT_FILE_PATH=/exceRptInput/0505_WaGa_wt_cutadapted.fastq.gz \
    7. MAIN_ORGANISM_GENOME_ID=hg38 \
    8. N_THREADS=50 \
    9. JAVA_RAM='800G'
    10. #[SMALL_DB]
    11. docker run -v /mnt/nvme0n1p1/MyInputSample:/exceRptInput \
    12. -v /mnt/nvme0n1p1/MyResults:/exceRptOutput \
    13. -v /mnt/nvme0n1p1/MyexceRptDatabase/hg38:/exceRpt_DB/hg38 \
    14. -t rkitchen/excerpt \
    15. INPUT_FILE_PATH=/exceRptInput/${sample}_cutadapted.fastq.gz \
    16. N_THREADS=50 \
    17. JAVA_RAM='800G'
    18. #[REAL_RUNNING_SMALL_DB]
    19. mkdir results
    20. for sample in 0505_WaGa_wt 0505_WaGa_sT_DMSO 0505_WaGa_sT_Dox 0505_WaGa_scr_DMSO 0505_WaGa_scr_Dox 1905_WaGa_wt 1905_WaGa_sT_DMSO 1905_WaGa_sT_Dox 1905_WaGa_scr_DMSO 1905_WaGa_scr_Dox control_MKL1 control_WaGa; do
    21. docker run -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/trimmed:/exceRptInput \
    22. -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/results:/exceRptOutput \
    23. -v /mnt/nvme0n1p1/MyexceRptDatabase/hg38:/exceRpt_DB/hg38 \
    24. -t rkitchen/excerpt \
    25. INPUT_FILE_PATH=/exceRptInput/${sample}_cutadapted.fastq.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM='200G'
    26. done
    27. mkdir results_exo2
    28. #for sample in 0505_WaGa_wt; do   # alternative: test on a single sample first (use either this line or the next, not both)
    29. for sample in 0505_WaGa_sT_DMSO 0505_WaGa_sT_Dox 0505_WaGa_scr_DMSO 0505_WaGa_scr_Dox 1905_WaGa_wt 1905_WaGa_sT_DMSO 1905_WaGa_sT_Dox 1905_WaGa_scr_DMSO 1905_WaGa_scr_Dox control_MKL1 control_WaGa; do
    30. docker run -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/trimmed:/exceRptInput \
    31. -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/results_exo2:/exceRptOutput \
    32. -v /mnt/nvme0n1p1/MyexceRptDatabase/hg38:/exceRpt_DB/hg38 \
    33. -v /mnt/nvme0n1p1/MyexceRptDatabase/miRBase:/exceRpt_DB/miRBase \
    34. -v /mnt/nvme0n1p1/MyexceRptDatabase/NCBI_taxonomy_taxdump:/exceRpt_DB/NCBI_taxonomy_taxdump \
    35. -v /mnt/nvme0n1p1/MyexceRptDatabase/Genomes_BacteriaFungiMammalPlantProtistVirus:/exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus \
    36. -v /mnt/nvme0n1p1/MyexceRptDatabase/ribosomeDatabase:/exceRpt_DB/ribosomeDatabase \
    37. -t rkitchen/excerpt \
    38. INPUT_FILE_PATH=/exceRptInput/${sample}_cutadapted.fastq.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM='200G' MAP_EXOGENOUS=on
    39. done
    40. #DEBUG_1 for ERROR: could not find adapters at path /exceRpt_DB/adapters/adapters.fa
    41. #The /exceRpt_DB/adapters/adapters.fa shipped inside the Docker image is hidden when a host directory is mounted at /exceRpt_DB. Therefore the core database files (including adapters/adapters.fa) have to be copied into the new database directory:
    42. jhuang@WS-2290C:/mnt/nvme0n1p1/MyexceRptDatabase$ cp -r ../exceRpt/exceRpt_coreDB/* ./
    43. #DEBUG_2 for EXITING because of fatal input ERROR: could not open user-defined parameters file /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in
    44. #jhuang@WS-2290C:/mnt/nvme0n1p1/MyexceRptDatabase$ cp STAR_Parameters_Exogenous.in Genomes_BacteriaFungiMammalPlantProtistVirus/
    45. #Debugging Tips
    46. # Verify Database Structure and Ensure your mounted /exceRpt_DB contains:
    47. # /exceRpt_DB
    48. # ├── hg38/ # Endogenous
    49. # ├── NCBI_taxonomy_taxdump/ # Taxonomy
    50. # └── Genomes_BacteriaFungi.../ # Exogenous references
    51. # Check Intermediate Files
    52. # Confirm that the endogenous step generates the expected input for exogenous processing (e.g., exogenous_alignments.sam).
    53. mkdir results_g results_exo4 results_exo5
    54. docker run -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/results_exo4:/exceRptOutput \
    55. -v /mnt/nvme0n1p1/MyexceRptDatabase:/exceRpt_DB \
    56. -t rkitchen/excerpt \
    57. INPUT_FILE_PATH=/exceRptInput/testData_human.fastq.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM='200G' MAP_EXOGENOUS=on
    58. #NOTE that rkitchen/excerpt runs exceRpt_smallRNA (a makefile, see the Entrypoint below): the extra-cellular RNA processing toolkit (exceRpt) optimised for smallRNA analysis. This pipeline processes a single smallRNA sequence file from a single sample.
    59. #TODO_3: how to call exceRpt_longRNA: The extra-cellular RNA processing toolkit (exceRpt) optimised for longRNA analysis; This pipeline processes a single longRNA sequence file from a single sample.
    60. # docker inspect rkitchen/excerpt:latest; docker history rkitchen/excerpt:latest; docker history --no-trunc rkitchen/excerpt:latest
    61. # "Entrypoint": [
    62. # "make",
    63. # "-f",
    64. # "/exceRpt_bin/exceRpt_smallRNA",
    65. # "EXE_DIR=/exceRpt_bin",
    66. # "DATABASE_PATH=/exceRpt_DB",
    67. # "JAVA_EXE=java",
    68. # "OUTPUT_DIR=/exceRptOutput",
    69. # "MAP_EXOGENOUS=off",
    70. # "N_THREADS=4"
    71. # ]
    72. #[REAL_RUNNING_COMPLETE_DB]
    73. #NOTE: if the input files are not renamed before the run, all files in _CORE_RESULTS_v4.6.3.tgz have to be renamed recursively by removing "_cutadapted.fastq" from their names (unzip, rename, re-zip, then mv to ../results_g).
    74. cd trimmed
    75. for file in *_cutadapted.fastq.gz; do
    76. echo "mv \"$file\" \"${file/_cutadapted.fastq/}\""   # dry run: only prints the mv commands; pipe the loop output to bash to execute them
    77. done
    78. cd ..; mkdir -p results_exo5
    79. for sample in 0505_WaGa_wt 0505_WaGa_sT_DMSO 0505_WaGa_sT_Dox 0505_WaGa_scr_DMSO 0505_WaGa_scr_Dox 1905_WaGa_wt 1905_WaGa_sT_DMSO 1905_WaGa_sT_Dox 1905_WaGa_scr_DMSO 1905_WaGa_scr_Dox control_MKL1 control_WaGa; do
    80. docker run -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/trimmed:/exceRptInput \
    81. -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/results_exo5:/exceRptOutput \
    82. -v /mnt/nvme0n1p1/MyexceRptDatabase:/exceRpt_DB \
    83. -t rkitchen/excerpt \
    84. INPUT_FILE_PATH=/exceRptInput/${sample}.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM='200G' MAP_EXOGENOUS=on
    85. done
    86. #The running process: the makefile https://github.com/gersteinlab/exceRpt/blob/master/exceRpt_smallRNA is run inside Docker and calls the Java classes https://github.com/gersteinlab/exceRpt/blob/master/exceRpt_Tools/main/ExceRpt_Tools.java, ProcessEndogenousAlignments.java and ProcessExogenousAlignments.java.
    87. #NOTE that in exceRpt_smallRNA:
    88. ## Choose what kind of EXOGENOUS alignments to attempt:
    89. ## - off : none
    90. ## - miRNA : map only to exogenous miRNAs in miRbase
    91. ## - on : map to exogenous miRNAs in miRbase AND the genomes of all sequenced species in ensembl/NCBI
    92. #Most of the Docker command is loading directories on your machine (the -v parameters) so that exceRpt can read from or write to them. The directory to the left of each : can obviously be whatever you want, but it is important to make sure the right side of each : is written as above or exceRpt will not be able to find/write the data it needs.
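    The mount rule described above (free choice on the left of each colon, fixed container path on the right) can be captured in a small helper that assembles the docker run arguments for one sample. This is a sketch under the assumption that the complete database is mounted at /exceRpt_DB as in the REAL_RUNNING_COMPLETE_DB run; the function name and the /data/... paths in the demo are placeholders:

```python
import shlex


def excerpt_command(sample_file, input_dir, output_dir, db_dir,
                    genome="hg38", threads=50, java_ram="200G", exogenous="on"):
    """Build the argument list for one exceRpt run (hypothetical helper).

    Host directories (left of ':') are caller-supplied; the container paths
    (right of ':') are fixed because exceRpt expects them.
    """
    return [
        "docker", "run",
        "-v", f"{input_dir}:/exceRptInput",
        "-v", f"{output_dir}:/exceRptOutput",
        "-v", f"{db_dir}:/exceRpt_DB",
        "-t", "rkitchen/excerpt",
        f"INPUT_FILE_PATH=/exceRptInput/{sample_file}",
        f"MAIN_ORGANISM_GENOME_ID={genome}",
        f"N_THREADS={threads}",
        f"JAVA_RAM={java_ram}",
        f"MAP_EXOGENOUS={exogenous}",
    ]


if __name__ == "__main__":
    cmd = excerpt_command("0505_WaGa_wt.gz", "/data/trimmed",
                          "/data/results_exo5", "/data/MyexceRptDatabase")
    print(shlex.join(cmd))  # inspect before running
```

    The list can be passed to subprocess.run(cmd, check=True) once the printed command looks right; building a list rather than a string avoids shell quoting problems.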
  5. Processing exceRpt output from multiple samples

    Also provided is a script to combine output from multiple samples run through the exceRpt pipeline. The script (mergePipelineRuns.R) will take as input a directory containing 1 or more subdirectories or zipfiles containing output from the makefile above. In this way, results from 1 or more smallRNA-seq samples can be combined, several QC plots are generated, and the read-counts are normalised ready for downstream analysis by clustering and/or differential expression.

    Installation

    1. This script is comparatively much simpler to install. Once the R software (http://cran.r-project.org/) is set up on your system the script should automatically identify and install all required dependencies. Again, this script is available on the Genboree Workbench (www.genboree.org) and is also free for academic use.

    Using the script: On the command line

    1. mamba activate r_env
    2. jhuang@WS-2290C:/mnt/nvme0n1p1/exceRpt-master$ Rscript mergePipelineRuns.R /home/jhuang/DATA/Data_Ute/Data_Ute_smallRNA_7/MyResults/
    3. #OBSERVE the env of R: ~/mambaforge/envs/r_env/lib/R/library
    4. #which R: /home/jhuang/mambaforge/envs/r_env/bin/R
    5. #This env has nothing to do with "sudo chmod -R 777 /usr/lib/R/site-library"
    6. #ERROR: MyResults is not writable --> DEBUG: sudo chown -R jhuang:jhuang MyResults MyResults2 results results2

    -- CONTINUE HERE after the docker runs have finished --> Using the script: Interactively in R

    1. #Alternatively in an interactive R session, the merge can be performed using the following two commands:
    2. mkdir summaries_g summaries_exo4 summaries_exo5
    3. (r_env) jhuang@WS-2290C:~/DATA/Data_Ute/Data_Ute_smallRNA_7/exceRpt-master$ R
    4. #WARNING: need to reload the R-script after each change of the script.
    5. source("mergePipelineRuns_functions.R")
    6. # -- DEBUG freetype-error --
    7. # #sudo apt-get install libfreetype6-dev
    8. # mamba activate r_env
    9. # mamba install -c conda-forge --force-reinstall freetype fontconfig pkg-config
    10. # library(systemfonts)
    11. # system_fonts() # Should return font list without errors
    12. getwd()
    13. [1] "/media/jhuang/Elements/Data_Ute/Data_Ute_smallRNA_7/exceRpt-master"
    14. processSamplesInDir("../results_g/", "../summaries_g")
    15. processSamplesInDir("../results_exo4/", "../summaries_exo4")
    16. processSamplesInDir("../results_exo5/", "../summaries_exo5")
    17. #~/Tools/csv2xls-0.4/csv_to_xls.py exceRpt_miRNA_ReadsPerMillion.txt exceRpt_tRNA_ReadsPerMillion.txt exceRpt_piRNA_ReadsPerMillion.txt -d$'\t' -o exceRpt_results_detailed.xls

    Script output

    1. Several files are output by the script in the location of the input exceRpt results (or somewhere else if explicitly specified). All output files are prefixed with exceRpt_ and contain a variety of information regarding all samples input:
    2. File Name                                      Description
    3. QC data:
    4. exceRpt_DiagnosticPlots.pdf                    All diagnostic plots automatically generated by the merge script
    5. exceRpt_readMappingSummary.txt                 Read-alignment summary including total counts for each library
    6. exceRpt_ReadLengths.txt                        Read-lengths (after 3' adapters/barcodes are removed)
    7. Raw transcriptome quantifications:
    8. exceRpt_miRNA_ReadCounts.txt                   miRNA read-count quantifications
    9. exceRpt_tRNA_ReadCounts.txt                    tRNA read-count quantifications
    10. exceRpt_piRNA_ReadCounts.txt                  piRNA read-count quantifications
    11. exceRpt_gencode_ReadCounts.txt                gencode read-count quantifications
    12. exceRpt_circularRNA_ReadCounts.txt            circularRNA read-count quantifications
    13. Normalised transcriptome quantifications:
    14. exceRpt_miRNA_ReadsPerMillion.txt             miRNA RPM quantifications
    15. exceRpt_tRNA_ReadsPerMillion.txt              tRNA RPM quantifications
    16. exceRpt_piRNA_ReadsPerMillion.txt             piRNA RPM quantifications
    17. exceRpt_gencode_ReadsPerMillion.txt           gencode RPM quantifications
    18. exceRpt_circularRNA_ReadsPerMillion.txt       circularRNA RPM quantifications
    19. R objects:
    20. exceRpt_smallRNAQuants_ReadCounts.RData       All raw data (binary R object)
    21. exceRpt_smallRNAQuants_ReadsPerMillion.RData  All normalised data (binary R object)
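    The ReadsPerMillion tables are the ReadCounts tables rescaled per library. A minimal sketch of reads-per-million scaling follows, assuming the denominator is a total read count supplied by the caller; which total mergePipelineRuns.R actually uses is not stated in these notes, so treat that choice as an assumption. The feature names and counts are made-up toy values:

```python
def reads_per_million(counts, total_reads):
    """Scale raw read counts to reads per million (RPM).

    counts:      dict mapping feature name -> raw read count
    total_reads: library size used as denominator (assumption: chosen by caller)
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    scale = 1_000_000 / total_reads
    return {feature: n * scale for feature, n in counts.items()}


if __name__ == "__main__":
    toy_counts = {"feature_A": 500, "feature_B": 1500}  # toy numbers, not real data
    print(reads_per_million(toy_counts, total_reads=2_000_000))
```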
  6. Re-draw the heatmap plots

    1. #genome 97.9% 98.3% 21.3% 44.9% 81.4% 78.3% 78.5% 79.3% 73.3% 69.2% 65.6% 71.9%
    2. #miRNA_sense 84.7% 85.6% 3.5% 7.1% 16.2% 14.7% 15.8% 15.3% 7.5% 7.0% 12.9% 14.6%
    3. #miRNA_antisense 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
    4. #
    5. #miRNAprecursor_sense 0.1% 0.1% 0.0% 0.0% 0.1% 0.1% 0.0% 0.1% 0.0% 0.0% 0.0% 0.0%
    6. #miRNAprecursor_antisense 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
    7. #
    8. #tRNA_sense 3.4% 1.8% 8.4% 25.3% 45.3% 41.4% 48.8% 47.3% 52.1% 49.0% 41.2% 33.9%
    9. #tRNA_antisense 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
    10. #
    11. #piRNA_sense 0.6% 0.5% 0.1% 0.4% 0.3% 0.4% 0.5% 0.4% 0.4% 0.5% 0.4% 0.6%
    12. #piRNA_antisense 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
    13. #
    14. #gencode_sense 7.0% 8.5% 6.7% 8.6% 15.7% 16.6% 10.8% 12.9% 11.2% 10.8% 8.5% 18.3%
    15. #gencode_antisense 0.1% 0.1% 0.7% 0.3% 0.2% 0.3% 0.2% 0.2% 0.2% 0.2% 0.2% 0.3%
    16. #gencode 7.10% 8.60% 7.40% 8.90% 15.90% 16.90% 11.00% 13.10% 11.40% 11.00% 8.70% 18.60%
    17. #
    18. #circularRNA_sense 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
    19. #circularRNA_antisense 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
    20. #
    21. #not_mapped_to_genome_or_libs 2.1% 1.7% 78.7% 55.1% 18.6% 21.7% 21.5% 20.7% 26.7% 30.8% 34.4% 28.1%
    22. import pandas as pd
    23. import numpy as np
    24. import seaborn as sns
    25. import matplotlib.pyplot as plt
    26. # Define data
    27. samples = [
    28. "control MKL1", "control WaGa", "WaGa wildtype 0505", "WaGa wildtype 1905",
    29. "WaGa sT DMSO 0505", "WaGa sT DMSO 1905", "WaGa sT Dox 0505", "WaGa sT Dox 1905",
    30. "WaGa scr DMSO 0505", "WaGa scr DMSO 1905", "WaGa scr Dox 0505", "WaGa scr Dox 1905"
    31. ]
    32. #TODO_2: genome --> human_genome, not_mapped_to_genome_or_libs --> not_mapped_to_human_genome
    33. # send the new results including exogenous alignments to Ute!
    34. #categories = [
    35. # "reads_used_for_alignment", "genome", "miRNA", "miRNAprecursor", "tRNA", "piRNA",
    36. # "gencode", "circularRNA", "not_mapped_to_genome_or_libs"
    37. #]
    38. categories = [
    39. "reads_used_for_alignment", "human_genome", "miRNA", "miRNAprecursor", "tRNA", "piRNA",
    40. "gencode", "circularRNA", "not_mapped_to_human_genome"
    41. ]
    42. data = np.array([
    43. [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0],
    44. [97.9, 98.3, 44.9, 21.3, 65.6, 71.9, 78.5, 81.4, 73.3, 79.3, 69.2, 78.3],
    45. [84.7, 85.6, 7.1, 3.5, 12.9, 14.6, 15.8, 16.2, 7.5, 15.3, 7.0, 14.7],
    46. [0.1, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.1, 0.0, 0.1],
    47. [3.4, 1.8, 25.3, 8.4, 41.2, 33.9, 48.8, 45.3, 52.1, 47.3, 49.0, 41.4],
    48. [0.6, 0.5, 0.4, 0.1, 0.4, 0.6, 0.5, 0.3, 0.4, 0.4, 0.5, 0.4],
    49. [7.1, 8.6, 8.9, 7.4, 8.7, 18.6, 11.0, 15.9, 11.4, 13.1, 11.0, 16.9],
    50. [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    51. [2.1, 1.7, 55.1, 78.7, 34.4, 28.1, 21.5, 18.6, 26.7, 20.7, 30.8, 21.7]
    52. ])
    53. ## Load data from Excel file
    54. #file_path = "mapping_heatmap.xlsx"
    55. #
    56. ## Read Excel file, assuming first column is index (row labels)
    57. #df = pd.read_excel(file_path, index_col=0)
    58. # Convert percentages to decimals
    59. data = data / 100.0
    60. # Create DataFrame
    61. df = pd.DataFrame(data, index=categories, columns=samples)
    62. # Plot heatmap
    63. plt.figure(figsize=(14, 6))
    64. sns.heatmap(df, annot=True, cmap="coolwarm", fmt=".3f", linewidths=0.5, cbar_kws={'label': 'Fraction Aligned Reads'})
    65. # Improve layout
    66. plt.title("Heatmap of Read Alignments by Category and Sample", fontsize=14)
    67. plt.xlabel("Sample", fontsize=12)
    68. plt.ylabel("Read Category", fontsize=12)
    69. plt.xticks(rotation=15, ha="right", fontsize=10)
    70. plt.yticks(rotation=0, fontsize=10)
    71. plt.tight_layout()
    72. # Save as PNG
    73. plt.savefig("mapping_heatmap.png", dpi=300, bbox_inches="tight")
    74. # Show plot
    75. plt.show()
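    Because the data rows were typed in by hand, in a different sample order than the commented table, a quick consistency check can catch transposition errors. A minimal sketch in plain Python, assuming that human_genome and not_mapped_to_human_genome should add up to 100% for each sample (which holds for the values above); the helper name is illustrative:

```python
# Rows copied from the data array above (percent of reads used for alignment).
human_genome = [97.9, 98.3, 44.9, 21.3, 65.6, 71.9, 78.5, 81.4, 73.3, 79.3, 69.2, 78.3]
not_mapped = [2.1, 1.7, 55.1, 78.7, 34.4, 28.1, 21.5, 18.6, 26.7, 20.7, 30.8, 21.7]


def inconsistent_columns(mapped, unmapped, tolerance=0.05):
    """Return the column indices where mapped + unmapped is not ~100%."""
    return [
        i for i, (m, u) in enumerate(zip(mapped, unmapped))
        if abs(m + u - 100.0) > tolerance
    ]


if __name__ == "__main__":
    bad = inconsistent_columns(human_genome, not_mapped)
    print("inconsistent sample columns:", bad)
```

    The tolerance absorbs rounding to one decimal place; an index in the result points at a sample column worth re-checking against the exceRpt summary.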
  7. Key steps of log: This log details the execution of a small RNA sequencing data analysis pipeline using the exceRpt tool (version 4.6.3) in a Docker container. The pipeline processes a human small RNA-seq dataset (testData_human.fastq.gz) with the following key steps:

    • Initial Setup

      • Docker container launched with mounted volumes for input/output and reference databases.
      • Parameters: hg38 reference genome, 50 threads, 200GB Java memory, exogenous mapping enabled.
    • Preprocessing

      • Input FASTQ file decompressed (testData_human.fastq.gz).
      • Adapter sequences identified using the known adapters in adapters.fa.
      • Quality encoding determined (Phred+33 or Phred+64).
      • Adapter clipping performed (TCGTATGCCGTCTTCTGCTTG), keeping reads of length ≥18 nt.
      • Quality filtering: reads kept only if at least 80% of bases have a Phred score ≥20 (-q 20 -p 80).
      • Homopolymer-rich reads removed (threshold -m 0.66).
    • Contaminant Filtering

      • Alignment against UniVec contaminants and ribosomal RNA (rRNA) databases.
      • 322 reads processed, with statistics tracked at each step.
    • Endogenous RNA Analysis

      • Alignment to human genome (hg38) and transcriptome.
      • Quantification of small RNA types:
        • miRNA (mature/precursor): Sense strands detected (antisense absent).
        • tRNA, piRNA, gencode transcripts: Only sense strands reported.
        • circRNA: Not detected in this dataset.
      • Coverage and complexity metrics calculated.
    • Exogenous RNA Analysis

      • Screened for microbial/viral RNAs:
        • miRNA databases (miRBase).
        • Ribosomal RNA databases.
        • Comprehensive genomic databases (bacteria, plants, metazoa, fungi, viruses).
      • Taxonomic classification of exogenous hits performed.
    • QC & Results

      • QC Result: PASS (based on transcriptome/genome ratio >0.5 and >100k transcriptome reads).
      • Key Metrics:
        • Input Reads: ~1.5 million (exact count not shown in log).
        • Genome Mapped: Majority of reads.
        • Transcriptome Complexity: Calculated ratio.
      • Core results compressed into testData_human.fastq_CORE_RESULTS_v4.6.3.tgz.
    • Notable Observations:

      • Antisense Reads: Absent for miRNA, tRNA, and piRNA (common in small RNA-seq).
      • Potential Issues: Some files (e.g., antisense counts) were missing but did not disrupt pipeline.
      • Resource Usage: High RAM (200GB) and multi-threading (50 cores) employed for efficiency.
    • Output Files:

      • Quantified counts for endogenous RNAs (miRNA, tRNA, etc.).
      • Exogenous RNA alignments with taxonomic annotations.
      • QC report, adapter sequences, and alignment statistics.
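    The two read filters in the preprocessing step can be sketched in a few lines. The quality filter mirrors the -q 20 -p 80 options of fastq_quality_filter (keep a read if at least 80% of its bases reach quality 20). The homopolymer filter is an assumption about what -m 0.66 means in RemoveHomopolymerRepeats (a read is dropped when a single nucleotide exceeds that fraction); neither function is taken from the tools' source code, and both names are hypothetical:

```python
def passes_quality_filter(qualities, min_quality=20, min_percent=80):
    """True if at least min_percent of the bases have quality >= min_quality."""
    if not qualities:
        return False
    good = sum(1 for q in qualities if q >= min_quality)
    return good * 100 >= min_percent * len(qualities)


def is_homopolymer_rich(sequence, max_fraction=0.66):
    """True if one nucleotide makes up more than max_fraction of the read.

    Assumed interpretation of the -m 0.66 option seen in the log.
    """
    if not sequence:
        return False
    most_common = max(sequence.count(base) for base in set(sequence))
    return most_common / len(sequence) > max_fraction


if __name__ == "__main__":
    print(passes_quality_filter([30] * 8 + [10] * 2))  # exactly 80% good bases
    print(is_homopolymer_rich("AAAAAAAAGC"))           # 8 of 10 bases are A
```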
  8. Raw LOG of the pipeline providing a comprehensive small RNA profile, distinguishing host transcripts from contaminants and exogenous RNAs.

    1. jhuang@WS-2290C:/media/jhuang/Elements/Data_Ute/Data_Ute_smallRNA_7$ docker run -v ~/DATA/Data_Ute/Data_Ute_smallRNA_7/results_exo4:/exceRptOutput -v /mnt/nvme0n1p1/MyexceRptDatabase:/exceRpt_DB -t rkitchen/excerpt INPUT_FILE_PATH=/exceRptInput/testData_human.fastq.gz MAIN_ORGANISM_GENOME_ID=hg38 N_THREADS=50 JAVA_RAM='200G' MAP_EXOGENOUS=on
    2. #
    3. mkdir -p /exceRptOutput/testData_human.fastq
    4. #
    5. gunzip -c /exceRptInput/testData_human.fastq.gz 2>> /exceRptOutput/testData_human.fastq.err | java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar FindAdapter -n 10000 -m 1000000 -s 4 -a /exceRpt_DB/adapters/adapters.fa - > /exceRptOutput/testData_human.fastq/testData_human.fastq.adapterSeq 2>> /exceRptOutput/testData_human.fastq.log
    6. #
    7. ## ASCII 84 is equal to Q20 (p<0.01) in Phred+64, so any file with max quals greater than this can reasonably assumed to be Phred+64
    8. gunzip -c /exceRptInput/testData_human.fastq.gz | head -n 40000 | awk '{if(NR%4==0) printf("%s",$0);}' | od -A n -t u1 | grep -v "^\*" | awk 'BEGIN{min=100;max=0;}{for(i=1;i<=NF;i++) {if($i>max) max=$i; if($i<min) min=$i;}}END{if(max<84) print "33"; else print "64";}' > /exceRptOutput/testData_human.fastq/testData_human.fastq.qualityEncoding
    9. cat: /exceRptOutput/testData_human.fastq/testData_human.fastq.knownAdapterSeq: No such file or directory
    10. ## Run the SW alignment of known adapters regardless of user preference
    11. gunzip -c /exceRptInput/testData_human.fastq.gz 2>> /exceRptOutput/testData_human.fastq.err | java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar FindAdapter -n 1000 -m 100000 -s 4 -a /exceRpt_DB/adapters/adapters.fa - > /exceRptOutput/testData_human.fastq/testData_human.fastq.knownAdapterSeq 2>> /exceRptOutput/testData_human.fastq.log
    12. #@echo -e "`/bin/date "+%Y-%m-%d--%H:%M:%S"` exceRpt_smallRNA: Known adapter sequence: \n" >> /exceRptOutput/testData_human.fastq.log
    13. ## Carry on with the adapter provided / guessed
    14. gunzip -c /exceRptInput/testData_human.fastq.gz > /exceRptOutput/testData_human.fastq/testData_human.fastq.preClipped.fastq.tmp; /exceRpt_bin/fastx_0.0.14/bin/fastx_clipper -Q33 -a TCGTATGCCGTCTTCTGCTTG -l 18 -v -n -M 7 -i /exceRptOutput/testData_human.fastq/testData_human.fastq.preClipped.fastq.tmp -z -o /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.fastq.tmp.gz >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err; rm /exceRptOutput/testData_human.fastq/testData_human.fastq.preClipped.fastq.tmp
    15. ## Count reads input to adapter clipping
    16. grep "Input: " /exceRptOutput/testData_human.fastq.log | awk '{print "input\t"$2}' >> /exceRptOutput/testData_human.fastq.stats
    17. ## Count reads output following adapter clipping
    18. grep "Output: " /exceRptOutput/testData_human.fastq.log | awk '{print "successfully_clipped\t"$2}' >> /exceRptOutput/testData_human.fastq.stats
    19. ## Remove random barcodes if there are any
    20. mv /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.fastq.tmp.gz /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.fastq.gz
    21. gunzip -c /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.fastq.gz | java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar TrimFastq -5p 0 -3p 0 | gzip -c > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.fastq.gz 2>>/exceRptOutput/testData_human.fastq.log
    22. gunzip -c /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.fastq.gz | /exceRpt_bin/fastx_0.0.14/bin/fastq_quality_filter -v -Q33 -p 80 -q 20 > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.tmp 2>>/exceRptOutput/testData_human.fastq.log
    23. ## Count reads that failed the quality filter
    24. grep "low-quality reads" /exceRptOutput/testData_human.fastq.log | awk '{print "failed_quality_filter\t"$2}' >> /exceRptOutput/testData_human.fastq.stats
    25. #
    26. # Filter homopolymer reads (those that have too many single nt repeats)
    27. java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar RemoveHomopolymerRepeats --verbose -m 0.66 -i /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.tmp -o /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.fastq >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.REMOVEDRepeatReads.fastq
    28. gzip /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.REMOVEDRepeatReads.fastq
    29. gzip /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.fastq
    30. rm /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.tmp
    31. ## Count homopolymer repeat reads that failed the quality filter
    32. grep "Done. Sequences removed" /exceRptOutput/testData_human.fastq.log | awk -F "=" '{print "failed_homopolymer_filter\t"$2}' >> /exceRptOutput/testData_human.fastq.stats
    33. gunzip -c /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.fastq.gz > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.fastq
    34. java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar GetSequenceLengths /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.fastq > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.readLengths.txt 2>> /exceRptOutput/testData_human.fastq.err
    35. rm /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.fastq
    36. java -classpath /exceRpt_bin/FastQC_0.11.7:/exceRpt_bin/FastQC_0.11.7/sam-1.103.jar:/exceRpt_bin/FastQC_0.11.7/jbzip2-0.9.jar -Xmx200G -Dfastqc.threads=50 -Dfastqc.unzip=false -Dfastqc.output_dir=/exceRptOutput/testData_human.fastq/ uk/ac/babraham/FastQC/FastQCApplication /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.fastq.gz >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err
    37. ## Count calibrator oligo reads
    38. echo -e "calibrator\tNA" >> /exceRptOutput/testData_human.fastq.stats
    39. #
    40. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/filteringAlignments_UniVec_ --genomeDir /exceRpt_DB/UniVec/STAR_INDEX_UniVec --readFilesIn /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.fastq.gz --outReadsUnmapped Fastx --parametersFiles /exceRpt_DB/STAR_Parameters_Endogenous_smallRNA.in --alignEndsType Local --outFilterMatchNmin 18 --outFilterMatchNminOverLread 0.9 --outFilterMismatchNmax 1 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err; /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/filteringAlignments_UniVec_Aligned.out.bam | awk '{print $3}' | sort -k 2,2 2>> /exceRptOutput/testData_human.fastq.err | uniq --count > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.uniVecContaminants.counts 2>> /exceRptOutput/testData_human.fastq.err; /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/filteringAlignments_UniVec_Aligned.out.bam | awk '{print $1}' | sort 2>> /exceRptOutput/testData_human.fastq.err | uniq -c | wc -l > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.uniVecContaminants.readCount 2>> /exceRptOutput/testData_human.fastq.err; gzip -c /exceRptOutput/testData_human.fastq/filteringAlignments_UniVec_Unmapped.out.mate1 > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.noUniVecContaminants.fastq.gz; rm /exceRptOutput/testData_human.fastq/filteringAlignments_UniVec_Unmapped.out.mate1
    41. ## Count UniVec contaminant reads
    42. cat /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.uniVecContaminants.readCount | awk '{print "UniVec_contaminants\t"$1}' >> /exceRptOutput/testData_human.fastq.stats
    43. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_ --genomeDir /exceRpt_DB/hg38/STAR_INDEX_rRNA --readFilesIn /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.noUniVecContaminants.fastq.gz --outReadsUnmapped Fastx --parametersFiles /exceRpt_DB/STAR_Parameters_Endogenous_smallRNA.in --alignEndsType Local --outFilterMatchNmin 18 --outFilterMatchNminOverLread 0.9 --outFilterMismatchNmax 1 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err; /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_Aligned.out.bam | awk '{print $3}' | sort -k 2,2 2>> /exceRptOutput/testData_human.fastq.err | uniq -c > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.rRNA.counts 2>> /exceRptOutput/testData_human.fastq.err; /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_Aligned.out.bam | awk '{print $1}' | sort 2>> /exceRptOutput/testData_human.fastq.err | uniq -c | wc -l > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.rRNA.readCount 2>> /exceRptOutput/testData_human.fastq.err; gzip -c /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_Unmapped.out.mate1 > /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.noRiboRNA.fastq.gz; rm /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_Unmapped.out.mate1
    44. ## Count rRNA reads
    45. cat /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.rRNA.readCount | awk ' {print "rRNA\t"$1}' >> /exceRptOutput/testData_human.fastq.stats
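The UniVec and rRNA filtering steps above count hits in two ways: alignments per reference sequence (SAM column 3, via `uniq -c`) and distinct read IDs (SAM column 1, via `uniq | wc -l`). A minimal sketch of both idioms on made-up SAM-like records, so it runs without STAR or samtools; the three records and the variable names are invented for illustration:

```shell
# Synthetic SAM-like records (made-up): read r1 aligns twice, read r2 once.
alignments=$(printf 'r1\t0\tchrA\nr1\t256\tchrB\nr2\t0\tchrA\n')

# Distinct read IDs (column 1): a multi-mapped read is counted once.
unique_reads=$(printf '%s\n' "$alignments" | awk '{print $1}' | sort | uniq | wc -l | tr -d ' ')

# Alignments per reference (column 3), printed as "reference<TAB>count".
per_reference=$(printf '%s\n' "$alignments" | awk '{print $3}' | sort | uniq -c | awk '{print $2 "\t" $1}')

echo "unique reads: $unique_reads"
echo "$per_reference"
```

The read count can therefore be smaller than the sum of the per-reference counts whenever reads map to more than one reference.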
    46. #
    47. /exceRpt_bin/samtools-1.7/samtools sort -@ 50 -m 2G -O bam -T /exceRptOutput/testData_human.fastq/tmp /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_Aligned.out.bam > /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_Aligned.out.sorted.bam
    48. /exceRpt_bin/samtools-1.7/samtools index /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_Aligned.out.sorted.bam
    49. rm /exceRptOutput/testData_human.fastq/filteringAlignments_rRNA_Aligned.out.bam
    50. java -classpath /exceRpt_bin/FastQC_0.11.7:/exceRpt_bin/FastQC_0.11.7/sam-1.103.jar:/exceRpt_bin/FastQC_0.11.7/jbzip2-0.9.jar -Xmx200G -Dfastqc.threads=50 -Dfastqc.unzip=false -Dfastqc.output_dir=/exceRptOutput/testData_human.fastq/ uk/ac/babraham/FastQC/FastQCApplication /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.noRiboRNA.fastq.gz >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err
    51. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_ --genomeDir /exceRpt_DB/hg38/STAR_INDEX_genome --readFilesIn /exceRptOutput/testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.noRiboRNA.fastq.gz --outReadsUnmapped Fastx --parametersFiles /exceRpt_DB/STAR_Parameters_Endogenous_smallRNA.in --alignEndsType Local --outFilterMatchNmin 18 --outFilterMatchNminOverLread 0.9 --outFilterMismatchNmax 1 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err
    52. #
    53. ## sort the alignments by ReadID just in case these are paired end reads in a single file? -- no, better to flag that this is an invalid file (ToDo)
    54. #
    55. ## v use this line when we start dealing with paired-end reads
    56. #/exceRpt_bin/samtools-1.7/samtools fastq -1 /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Mapped.out.mate1 -2 /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Mapped.out.mate2 /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Aligned.out.bam
    57. /exceRpt_bin/samtools-1.7/samtools fastq /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Aligned.out.bam > /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Mapped.out.mate1
    58. [M::bam2fq_mainloop] discarded 0 singletons
    59. [M::bam2fq_mainloop] processed 322 reads
    60. #
    61. ## map ALL READS to the TRANSCRIPTOME (STAR ungapped)
    62. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_ --readFilesIn /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Mapped.out.mate1 --genomeDir /exceRpt_DB/hg38/STAR_INDEX_transcriptome --parametersFiles /exceRpt_DB/STAR_Parameters_Endogenous_smallRNA.in --alignEndsType Local --outFilterMatchNmin 18 --outFilterMatchNminOverLread 0.9 --outFilterMismatchNmax 1 --outFilterMismatchNoverLmax 0.3 --readFilesCommand - >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err
    63. gzip -c /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Unmapped.out.mate1 > /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Unmapped.R1.fastq.gz
    64. #
    65. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_ --readFilesIn /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Unmapped.out.mate1 --outReadsUnmapped Fastx --genomeDir /exceRpt_DB/hg38/STAR_INDEX_transcriptome --parametersFiles /exceRpt_DB/STAR_Parameters_Endogenous_smallRNA.in --alignEndsType Local --outFilterMatchNmin 18 --outFilterMatchNminOverLread 0.9 --outFilterMismatchNmax 1 --outFilterMismatchNoverLmax 0.3 --readFilesCommand - >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err
    66. gzip -c /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_Unmapped.out.mate1 > /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_Unmapped.R1.fastq.gz
    67. #
    68. ## Count # mapped reads
    69. cat /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Log.final.out | grep "Number of input reads" | awk -F "|\t" '{print "reads_used_for_alignment\t"$2}' >> /exceRptOutput/testData_human.fastq.stats
    70. cat /exceRptOutput/testData_human.fastq/endogenousAlignments_genome*apped_transcriptome_Log.final.out | grep "Number of input reads\|Uniquely mapped reads number\|Number of reads mapped to multiple loci" | sed '2,4d' | awk -F "|\t" '{SUM+=$2}END{print "genome\t"SUM}' >> /exceRptOutput/testData_human.fastq.stats
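The `genome` line above is easy to misread. The glob expands to the genomeMapped log first and the genomeUnmapped log second, `grep` keeps three lines from each, and `sed '2,4d'` drops lines 2 to 4. What remains is the input count of the genomeMapped run (the reads that aligned to the genome) plus the unique and multi-mapped counts of the genomeUnmapped run (reads found only by the transcriptome alignment). A minimal sketch with made-up numbers, assuming the `label |<TAB>value` layout of `Log.final.out`; it swaps in `grep -E` and `-F '\t'` for portability:

```shell
# Synthetic stand-ins for the two STAR Log.final.out files (all values are made up).
tmp=$(mktemp -d)
printf '  Number of input reads |\t300\n  Uniquely mapped reads number |\t250\n  Number of reads mapped to multiple loci |\t30\n' \
  > "$tmp/endogenousAlignments_genomeMapped_transcriptome_Log.final.out"
printf '  Number of input reads |\t100\n  Uniquely mapped reads number |\t12\n  Number of reads mapped to multiple loci |\t3\n' \
  > "$tmp/endogenousAlignments_genomeUnmapped_transcriptome_Log.final.out"

# Lines kept after sed: 1 (genome-mapped input), 5 and 6 (unique + multi of the genomeUnmapped run).
genome_total=$(cat "$tmp"/endogenousAlignments_genome*apped_transcriptome_Log.final.out \
  | grep -E 'Number of input reads|Uniquely mapped reads number|Number of reads mapped to multiple loci' \
  | sed '2,4d' \
  | awk -F '\t' '{SUM+=$2} END {print SUM}')

printf 'genome\t%s\n' "$genome_total"
rm -r "$tmp"
```

With these invented values the total is 300 + 12 + 3.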
    71. #
    72. ## Compress STAR logs
    73. gzip /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Log.out
    74. gzip /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_Log.out
    75. #
    76. ## Tidy up
    77. rm /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_SJ.out.tab
    78. rm /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_SJ.out.tab
    79. rm /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Mapped.out.mate1
    80. rm /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Unmapped.out.mate1
    81. rm /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Unmapped.out.mate1
    82. rm /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_Unmapped.out.mate1
    83. #
    84. java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar CIGAR_2_PWM -f /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Aligned.out.bam > /exceRptOutput/testData_human.fastq/endogenousAlignments_genome_Aligned.out.bam.CIGARstats.txt 2>> /exceRptOutput/testData_human.fastq.log
    85. #
    86. /exceRpt_bin/samtools-1.7/samtools sort -n -@ 50 -m 2G -O bam -T /exceRptOutput/testData_human.fastq/tmp /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Aligned.out.bam > /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Aligned.out.sorted.bam 2>> /exceRptOutput/testData_human.fastq.log
    87. #
    88. java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar ReadCoverage -exceRpt -a /exceRpt_DB/hg38/gencodeAnnotation.gtf -f /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Aligned.out.sorted.bam 2>> /exceRptOutput/testData_human.fastq.log
    89. #
    90. rm /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Aligned.out.sorted.bam
    91. #
    92. ## Assign reads
    93. java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar ProcessEndogenousAlignments --libPriority miRNA,tRNA,piRNA,gencode,circRNA --genomeMappedReads /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Aligned.out.bam --transcriptomeMappedReads /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_Aligned.out.bam --hairpin2genome /exceRpt_DB/hg38/miRNA_precursor2genome.sam --mature2hairpin /exceRpt_DB/hg38/miRNA_mature2precursor.sam --dict /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.dict 2>> /exceRptOutput/testData_human.fastq.log | sort -k 2,2 -k 1,1 > /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.txt
    94. #
    95. ## Do we want to downsample?
    96. #
    97. ## Quantify all annotated RNAs
    98. java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar QuantifyEndogenousAlignments --dict /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.dict --acceptedAlignments /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.txt --outputPath /exceRptOutput/testData_human.fastq 2>> /exceRptOutput/testData_human.fastq.log
    99. #
    100. ## Summarise alignment statistics
    101. cat /exceRptOutput/testData_human.fastq/readCounts_miRNAmature_sense.txt | awk '{SUM+=$4}END{printf "miRNA_sense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats
    102. cat /exceRptOutput/testData_human.fastq/readCounts_miRNAmature_antisense.txt | awk '{SUM+=$4}END{printf "miRNA_antisense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats
    103. cat: /exceRptOutput/testData_human.fastq/readCounts_miRNAmature_antisense.txt: No such file or directory
    104. cat /exceRptOutput/testData_human.fastq/readCounts_miRNAprecursor_sense.txt | awk '{SUM+=$4}END{printf "miRNAprecursor_sense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats
    105. cat /exceRptOutput/testData_human.fastq/readCounts_miRNAprecursor_antisense.txt | awk '{SUM+=$4}END{printf "miRNAprecursor_antisense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats
    106. cat: /exceRptOutput/testData_human.fastq/readCounts_miRNAprecursor_antisense.txt: No such file or directory
    107. cat /exceRptOutput/testData_human.fastq/readCounts_tRNA_sense.txt | awk '{SUM+=$4}END{printf "tRNA_sense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats
    108. cat /exceRptOutput/testData_human.fastq/readCounts_tRNA_antisense.txt | awk '{SUM+=$4}END{printf "tRNA_antisense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats
    109. cat: /exceRptOutput/testData_human.fastq/readCounts_tRNA_antisense.txt: No such file or directory
    110. cat /exceRptOutput/testData_human.fastq/readCounts_piRNA_sense.txt | awk '{SUM+=$4}END{printf "piRNA_sense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats
    111. cat /exceRptOutput/testData_human.fastq/readCounts_piRNA_antisense.txt | awk '{SUM+=$4}END{printf "piRNA_antisense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats
    112. cat: /exceRptOutput/testData_human.fastq/readCounts_piRNA_antisense.txt: No such file or directory
    113. cat /exceRptOutput/testData_human.fastq/readCounts_gencode_sense.txt | awk '{SUM+=$4}END{printf "gencode_sense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats
    114. cat /exceRptOutput/testData_human.fastq/readCounts_gencode_antisense.txt | awk '{SUM+=$4}END{printf "gencode_antisense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats
    115. cat /exceRptOutput/testData_human.fastq/readCounts_circRNA_sense.txt | awk '{SUM+=$4}END{printf "circularRNA_sense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats
    116. cat: /exceRptOutput/testData_human.fastq/readCounts_circRNA_sense.txt: No such file or directory
    117. cat /exceRptOutput/testData_human.fastq/readCounts_circRNA_antisense.txt | awk '{SUM+=$4}END{printf "circularRNA_antisense\t%.0f\n",SUM}' >> /exceRptOutput/testData_human.fastq.stats
    118. cat: /exceRptOutput/testData_human.fastq/readCounts_circRNA_antisense.txt: No such file or directory
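The `No such file or directory` messages above are harmless: `awk` still receives empty input and writes a count of 0 to the stats file. If the noise is unwanted, the summation could be wrapped in a guard. A minimal sketch, where `sum_counts` is a hypothetical helper (not part of exceRpt) and the file contents are invented; the only assumption carried over from the log is that column 4 holds the count:

```shell
# Hypothetical helper: sum column 4 of a readCounts file, or report 0 if the file is absent.
sum_counts() {
  label=$1
  file=$2
  if [ -f "$file" ]; then
    awk -v label="$label" '{SUM+=$4} END {printf "%s\t%.0f\n", label, SUM}' "$file"
  else
    printf '%s\t0\n' "$label"
  fi
}

# Made-up readCounts file: a header row (non-numeric, adds 0) and two entries.
tmp=$(mktemp -d)
printf 'id\ta\tb\tcount\nmiR-a\t2\t5\t4\nmiR-b\t1\t3\t3\n' > "$tmp/readCounts_miRNAmature_sense.txt"

sense=$(sum_counts miRNA_sense "$tmp/readCounts_miRNAmature_sense.txt")
antisense=$(sum_counts miRNA_antisense "$tmp/readCounts_miRNAmature_antisense.txt")  # file does not exist

echo "$sense"
echo "$antisense"
rm -r "$tmp"
```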
    119. ## Count reads not mapping to the genome or to the libraries
    120. gunzip -c /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_Unmapped.R1.fastq.gz | wc -l | awk '{print "not_mapped_to_genome_or_libs\t"($1/4)}' >> /exceRptOutput/testData_human.fastq.stats
    121. #
    122. ## Tidy up
    123. gzip -c /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.txt > /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.txt.gz
    124. rm /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.txt
    125. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/endogenousAlignments_repetitiveElements_ --genomeDir /exceRpt_DB/hg38/STAR_INDEX_repetitiveElements --readFilesIn /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeUnmapped_transcriptome_Unmapped.R1.fastq.gz --outReadsUnmapped Fastx --parametersFiles /exceRpt_DB/STAR_Parameters_Endogenous_smallRNA.in --alignEndsType Local --outFilterMatchNmin 18 --outFilterMatchNminOverLread 0.9 --outFilterMismatchNmax 1 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err
    126. ## Assigned non-redundantly to annotated REs
    127. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/endogenousAlignments_repetitiveElements_Aligned.out.bam | grep -v "^@" | awk '{print $1}' | sort | uniq | wc -l | awk '{print "repetitiveElements\t"$0}' >> /exceRptOutput/testData_human.fastq.stats
    128. gzip -c /exceRptOutput/testData_human.fastq/endogenousAlignments_repetitiveElements_Unmapped.out.mate1 > /exceRptOutput/testData_human.fastq/endogenousAlignments_repetitiveElements_Unmapped.R1.fastq.gz
    129. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeGapped_ --alignIntronMax 0 --alignIntronMin 21 --genomeDir /exceRpt_DB/hg38/STAR_INDEX_genome --readFilesIn /exceRptOutput/testData_human.fastq/endogenousAlignments_repetitiveElements_Unmapped.R1.fastq.gz --outReadsUnmapped Fastx --parametersFiles /exceRpt_DB/STAR_Parameters_Endogenous_smallRNA.in --alignEndsType Local --outFilterMatchNmin 18 --outFilterMatchNminOverLread 0.9 --outFilterMismatchNmax 1 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err
    130. ## mapped to the genome with gaps
    131. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeGapped_Aligned.out.bam | grep -v "^@" | awk '{print $1}' | sort | uniq | wc -l | awk '{print "endogenous_gapped\t"$0}' >> /exceRptOutput/testData_human.fastq.stats
    132. gzip -c /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeGapped_Unmapped.out.mate1 > /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeGapped_Unmapped.R1.fastq.gz
    133. mkdir -p /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA
    134. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenous_miRBase_ --genomeDir /exceRpt_DB/miRBase/STAR_INDEX_miRBaseAll --readFilesIn /exceRptOutput/testData_human.fastq/endogenousAlignments_genomeGapped_Unmapped.R1.fastq.gz --outReadsUnmapped Fastx --parametersFiles /exceRpt_DB/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log 2>> /exceRptOutput/testData_human.fastq.err
    135. gzip -c /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenous_miRBase_Unmapped.out.mate1 > /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenous_miRBase_Unmapped.R1.fastq.gz
    136. rm /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenous_miRBase_Unmapped.out.mate1
    137. #
    138. ## quantify read alignments using a slight hack of the endogenous alignment engine
    139. java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar ProcessEndogenousAlignments --forceLib miRNA --transcriptomeMappedReads /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenous_miRBase_Aligned.out.bam --hairpin2genome /exceRpt_DB/miRBase/miRNA_precursor2genome.sam --mature2hairpin /exceRpt_DB/miRBase/miRNA_mature2precursor.sam --dict /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenousAlignments_Accepted.dict 2>> /exceRptOutput/testData_human.fastq.log | sort -k 2,2 -k 1,1 > /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenousAlignments_Accepted.txt
    140. #
    141. java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar QuantifyEndogenousAlignments --dict /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenousAlignments_Accepted.dict --acceptedAlignments /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenousAlignments_Accepted.txt --outputPath /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA 2>> /exceRptOutput/testData_human.fastq.log
    142. #
    143. ## Tidy up:
    144. gzip -c /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenousAlignments_Accepted.txt > /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenousMiRNAAlignments_Accepted.txt.gz
    145. rm /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenousAlignments_Accepted.txt
    146. #
    147. ## Stats
    148. cat /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenous_miRBase_Log.final.out | grep "Number of input reads" | awk -F "|\t" '{print "input_to_exogenous_miRNA\t"$2}' >> /exceRptOutput/testData_human.fastq.stats
    149. cat /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenous_miRBase_Log.final.out | grep "Uniquely mapped reads number\|Number of reads mapped to multiple loci" | awk -F "|\t" '{SUM+=$2}END{print "exogenous_miRNA\t"SUM}' >> /exceRptOutput/testData_human.fastq.stats
    150. mkdir -p /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA
    151. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/exogenous_rRNA_ --genomeDir /exceRpt_DB/ribosomeDatabase/exogenous_rRNAs --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA/exogenous_miRBase_Unmapped.R1.fastq.gz --outReadsUnmapped Fastx --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    152. ## Input to exogenous rRNA alignment
    153. grep "Number of input reads" /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/exogenous_rRNA_Log.final.out | tr '[:blank:]' ' ' | awk -F " \\\| " '{print "input_to_exogenous_rRNA\t"$2}' >> /exceRptOutput/testData_human.fastq.stats
    154. ## Assigned non-redundantly to annotated exogenous rRNAs
    155. cat /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/exogenous_rRNA_Log.final.out | grep "Uniquely mapped reads number\|Number of reads mapped to multiple loci" | awk -F "|\t" '{SUM+=$2}END{print "exogenous_rRNA\t"SUM}' >> /exceRptOutput/testData_human.fastq.stats
    156. #/exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/exogenous_rRNA_Aligned.out.bam | awk '{print $1}' | sort | uniq | wc -l | awk '{print "exogenous_rRNA\t"$0}' >> /exceRptOutput/testData_human.fastq.stats
    157. ## compress and tidy up
    158. gzip -c /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/exogenous_rRNA_Unmapped.out.mate1 > /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz
    159. rm /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/exogenous_rRNA_Unmapped.out.mate1
    160. rm /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/exogenous_rRNA_Log.out
    161. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/exogenous_rRNA_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | sort -k 1,1 -k 2,2 | uniq > /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/ExogenousRibosomalAlignments.txt 2>> /exceRptOutput/testData_human.fastq.log
    162. #
    163. java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar ProcessExogenousAlignments -taxonomyPath /exceRpt_DB/NCBI_taxonomy_taxdump -min 0.001 -frac 0.95 --minReads 3 -batchSize 20000 -alignments /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/ExogenousRibosomalAlignments.txt --rdp > /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/ExogenousRibosomalAlignments.tmp 2>> /exceRptOutput/testData_human.fastq.log
    164. #
    165. # Tidy up
    166. mv /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/ExogenousRibosomalAlignments.tmp /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/ExogenousRibosomalAlignments.result.taxaAnnotated.txt
    167. rm /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/ExogenousRibosomalAlignments.txt
    168. mkdir -p /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes
    169. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria1_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA1 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    170. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria2_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA2 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    171. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria3_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA3 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    172. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria4_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA4 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    173. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria5_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA5 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    174. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria6_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA6 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    175. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria7_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA7 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    176. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria8_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA8 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    177. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria9_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA9 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    178. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria10_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_BACTERIA10 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
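The ten bacterial alignments above differ only in the index number. To re-run them by hand, a loop is easier to read than ten near-identical lines. A dry-run sketch that only prints the commands; the paths and flags are copied from the log, while the loop and the variable names `STAR`, `DB` and `OUT` are assumptions for illustration, not how the pipeline itself generates them:

```shell
# Dry run: build one STAR command per bacterial index and print it, without executing anything.
STAR=/exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR
DB=/exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus
OUT=/exceRptOutput/testData_human.fastq

commands=$(
  for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "$STAR --runThreadN 50 --outFileNamePrefix $OUT/EXOGENOUS_genomes/Bacteria${i}_ --genomeDir $DB/STAR_GENOME_BACTERIA${i} --readFilesIn $OUT/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles $DB/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> $OUT.log"
  done
)

printf '%s\n' "$commands"
```

Every run reads the same `unaligned.fq.gz`, so the indices are searched independently rather than as a chain of unmapped reads.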
    179. mkdir -p /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes
    180. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants1_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_PLANTS1 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    181. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants2_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_PLANTS2 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    182. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants3_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_PLANTS3 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    183. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants4_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_PLANTS4 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    184. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants5_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_PLANTS5 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    185. mkdir -p /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes
    186. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa1_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_METAZOA1 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    187. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa2_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_METAZOA2 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    188. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa3_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_METAZOA3 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    189. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa4_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_METAZOA4 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    190. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa5_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_METAZOA5 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    191. mkdir -p /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes
    192. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/FungiProtistVirus_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_FUNGI_PROTIST_VIRUS --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    193. mkdir -p /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes
    194. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Vertebrate1_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_VERTEBRATE1 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    195. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Vertebrate2_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_VERTEBRATE2 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    196. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Vertebrate3_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_VERTEBRATE3 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    197. /exceRpt_bin/STAR-2.5.4b/bin/Linux_x86_64/STAR --runThreadN 50 --outFileNamePrefix /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Vertebrate4_ --genomeDir /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_GENOME_VERTEBRATE4 --readFilesIn /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz --parametersFiles /exceRpt_DB/Genomes_BacteriaFungiMammalPlantProtistVirus/STAR_Parameters_Exogenous.in --outSAMtype BAM Unsorted --outSAMattributes Standard --alignEndsType EndToEnd --outFilterMatchNmin 18 --outFilterMatchNminOverLread 1.0 --outFilterMismatchNmax 0 --outFilterMismatchNoverLmax 0.3 >> /exceRptOutput/testData_human.fastq.log
    198. #
    199. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria1_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' > /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    200. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria2_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    201. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria3_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    202. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria4_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    203. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria5_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    204. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria6_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    205. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria7_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    206. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria8_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    207. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria9_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    208. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Bacteria10_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[$:]");print $1"\tBacteria\t"a[1]"\t"a[7]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    209. #
    210. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants1_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    211. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants2_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    212. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants3_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    213. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants4_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    214. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Plants5_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    215. #
    216. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa1_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    217. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa2_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    218. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa3_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    219. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa4_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    220. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Metazoa5_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    221. #
    222. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/FungiProtistVirus_Aligned.out.bam | grep "Virus:" | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,"[:|]");print $1"\t"a[1]"\t"a[3]"\t"a[5]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    223. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/FungiProtistVirus_Aligned.out.bam | grep "Fungi:" | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    224. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/FungiProtistVirus_Aligned.out.bam | grep "Protist:" | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    225. #
    226. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Vertebrate1_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    227. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Vertebrate2_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    228. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Vertebrate3_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    229. /exceRpt_bin/samtools-1.7/samtools view /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/Vertebrate4_Aligned.out.bam | awk '{print $1,$3,$4,$6,$10}' | uniq | awk '{split($2,a,":");print $1"\t"a[1]"\t"a[2]"\t"a[3]"\t"$3"\t"$4"\t"$5}' >> /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    230. #
    231. cat /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt | sort -k 1,1 | gzip -c > /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.sorted.txt.gz
    232. rm /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.txt
    233. #
    234. ## Input to exogenous genome alignment
    235. gunzip -c /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA/unaligned.fq.gz | wc -l | awk '{print "input_to_exogenous_genomes\t"$1/4}' >> /exceRptOutput/testData_human.fastq.stats
    236. ## Count reads mapped to exogenous genomes:
    237. gunzip -c /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.sorted.txt.gz | awk '{print $1}' | uniq | wc -l | awk '{print "exogenous_genomes\t"$1}' >> /exceRptOutput/testData_human.fastq.stats
    238. #
    239. gunzip -c /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.sorted.txt.gz > /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.sorted.txt
    240. java -Xmx200G -jar /exceRpt_bin/exceRpt_Tools.jar ProcessExogenousAlignments -taxonomyPath /exceRpt_DB/NCBI_taxonomy_taxdump -min 0.001 -frac 0.95 -batchSize 500000 -minReads 3 -alignments /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.sorted.txt > /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.tmp 2>> /exceRptOutput/testData_human.fastq.log
    241. # Tidy up
    242. mv /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.tmp /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.result.taxaAnnotated.txt
    243. rm /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.sorted.txt
    244. ## Wrap up logging and stats files
    245. #
    246. ## Adapter confidence
    247. echo -e "known: " >> /exceRptOutput/testData_human.fastq.qctmp
    248. cat /exceRptOutput/testData_human.fastq/testData_human.fastq.knownAdapterSeq >> /exceRptOutput/testData_human.fastq.qctmp
    249. echo -e "used: " >> /exceRptOutput/testData_human.fastq.qctmp
    250. cat /exceRptOutput/testData_human.fastq/testData_human.fastq.adapterSeq >> /exceRptOutput/testData_human.fastq.qctmp
    251. cat /exceRptOutput/testData_human.fastq.qctmp | tr '\n' ' ' | awk -F ' ' '{if($2=="used:"){ if(NF==2){print "Adapter_confidence: LOW"}else{print "Adapter_confidence: WARN_unableToGuessAdapter_usingProvided("$3")"}}else{if($2==$4){print "Adapter_confidence: HIGH"}else{print "Adapter_confidence: WARN_providedAdapter("$4")DisagreesWithGuessed("$2")"}}}' > /exceRptOutput/testData_human.fastq.qcResult
    252. #
    253. ## Calculate QC result
    254. cat /exceRptOutput/testData_human.fastq.stats | grep "^input" | head -n 1 | awk '{print $2}' > /exceRptOutput/testData_human.fastq.qctmp
    255. cat /exceRptOutput/testData_human.fastq.stats | grep "^genome" | awk '{print $2}' >> /exceRptOutput/testData_human.fastq.qctmp
    256. cat /exceRptOutput/testData_human.fastq.stats | grep "sense" | awk '{SUM+=$2}END{print SUM}' >> /exceRptOutput/testData_human.fastq.qctmp
    257. cat /exceRptOutput/testData_human.fastq.qctmp | tr '\n' '\t' | awk '{result="FAIL"; ratio=0; if($2>0){ratio=$3/$2}; if(ratio>0.5 && $3>100000)result="PASS"}END{print "QC_result: "result"\nInputReads: "$1"\nGenomeReads: "$2"\nTranscriptomeReads: "$3"\nTranscriptomeGenomeRatio: "ratio}' >> /exceRptOutput/testData_human.fastq.qcResult
    258. gunzip -c /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.txt.gz | wc -l > /exceRptOutput/testData_human.fastq.qctmp
    259. gunzip -c /exceRptOutput/testData_human.fastq/endogenousAlignments_Accepted.txt.gz | awk '{print $2}' | uniq | wc -l >> /exceRptOutput/testData_human.fastq.qctmp
    260. #
    261. cat /exceRptOutput/testData_human.fastq.qctmp | tr '\n' '\t' | awk '{if($1>0){print "TranscriptomeComplexity: "($2/$1)}else{print "TranscriptomeComplexity: 0"}}' >> /exceRptOutput/testData_human.fastq.qcResult
    262. rm /exceRptOutput/testData_human.fastq.qctmp
    263. #
    264. ## Compress core results files automatically
    265. ls -lh /exceRptOutput/testData_human.fastq | awk '{print $9}' | grep "readCounts_\|.readLengths.txt\|_fastqc.zip\|.counts\|.knownAdapterSeq\|.adapterSeq\|.qualityEncoding\|.CIGARstats.txt\|.coverage.txt" | awk '{print "testData_human.fastq/"$1}' > /exceRptOutput/testData_human.fastq_filesToCompress.txt; echo testData_human.fastq.log >> /exceRptOutput/testData_human.fastq_filesToCompress.txt; echo testData_human.fastq.stats >> /exceRptOutput/testData_human.fastq_filesToCompress.txt; echo testData_human.fastq.qcResult >> /exceRptOutput/testData_human.fastq_filesToCompress.txt; ls -lh /exceRptOutput/testData_human.fastq | awk '{print $9}' | grep "calibratormapped.counts" | awk '{print "testData_human.fastq/"$1}' >> /exceRptOutput/testData_human.fastq_filesToCompress.txt; ls -lh /exceRptOutput/testData_human.fastq/EXOGENOUS_miRNA | awk '{print $9}' | grep "readCounts_" | awk '{print "testData_human.fastq/EXOGENOUS_miRNA/"$1}' >> /exceRptOutput/testData_human.fastq_filesToCompress.txt; ls -lh /exceRptOutput/testData_human.fastq/EXOGENOUS_rRNA | awk '{print $9}' | grep "ExogenousRibosomalAlignments.result.taxaAnnotated.txt" | awk '{print "testData_human.fastq/EXOGENOUS_rRNA/"$1}' >> /exceRptOutput/testData_human.fastq_filesToCompress.txt; ls -lh /exceRptOutput/testData_human.fastq/EXOGENOUS_genomes | awk '{print $9}' | grep "ExogenousGenomicAlignments.result.taxaAnnotated.txt" | awk '{print "testData_human.fastq/EXOGENOUS_genomes/"$1}' >> /exceRptOutput/testData_human.fastq_filesToCompress.txt
    266. tar -cvz -C /exceRptOutput -T /exceRptOutput/testData_human.fastq_filesToCompress.txt -f /exceRptOutput/testData_human.fastq_CORE_RESULTS_v4.6.3.tgz 2> /dev/null
    267. testData_human.fastq/endogenousAlignments_genomeMapped_transcriptome_Aligned.out.sorted.bam.coverage.txt
    268. testData_human.fastq/endogenousAlignments_genome_Aligned.out.bam.CIGARstats.txt
    269. testData_human.fastq/readCounts_gencode_antisense.txt
    270. testData_human.fastq/readCounts_gencode_antisense_geneLevel.txt
    271. testData_human.fastq/readCounts_gencode_sense.txt
    272. testData_human.fastq/readCounts_gencode_sense_geneLevel.txt
    273. testData_human.fastq/readCounts_miRNAmature_sense.txt
    274. testData_human.fastq/readCounts_miRNAprecursor_sense.txt
    275. testData_human.fastq/readCounts_piRNA_sense.txt
    276. testData_human.fastq/readCounts_tRNA_sense.txt
    277. testData_human.fastq/testData_human.fastq.adapterSeq
    278. testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.noRiboRNA_fastqc.zip
    279. testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.rRNA.counts
    280. testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.readLengths.txt
    281. testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered.uniVecContaminants.counts
    282. testData_human.fastq/testData_human.fastq.clipped.trimmed.filtered_fastqc.zip
    283. testData_human.fastq/testData_human.fastq.knownAdapterSeq
    284. testData_human.fastq/testData_human.fastq.qualityEncoding
    285. testData_human.fastq.log
    286. testData_human.fastq.stats
    287. testData_human.fastq.qcResult
    288. testData_human.fastq/EXOGENOUS_rRNA/ExogenousRibosomalAlignments.result.taxaAnnotated.txt
    289. testData_human.fastq/EXOGENOUS_genomes/ExogenousGenomicAlignments.result.taxaAnnotated.txt
    290. rm /exceRptOutput/testData_human.fastq_filesToCompress.txt
    291. ## END PIPELINE
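
The QC decision in steps 254 to 257 is spread over several `cat | grep | awk` calls and a temporary `.qctmp` file. As a rough sketch only (not part of exceRpt), the same rule can be re-applied to an existing `.stats` file in a single awk pass. The function name `qc_from_stats` is hypothetical, and the sketch assumes the stats file holds tab-separated `name<TAB>count` lines, as written by steps 235 and 237. The thresholds (ratio > 0.5 and more than 100000 transcriptome reads) are copied from step 257. Unlike the original, it keeps only the last line starting with `genome` if there are several.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with exceRpt: recompute the QC verdict
# of steps 254-257 from a <sample>.stats file.
# Assumption: the file contains tab-separated "name<TAB>count" lines.
qc_from_stats() {
  awk -F '\t' '
    # first line starting with "input" (mirrors: grep "^input" | head -n 1)
    /^input/ && !seen_input { input = $2; seen_input = 1 }
    # line starting with "genome" (mirrors: grep "^genome")
    /^genome/               { genome = $2 }
    # every line containing "sense", which includes antisense (mirrors: grep "sense")
    /sense/                 { tx += $2 }
    END {
      ratio = (genome > 0) ? tx / genome : 0
      result = (ratio > 0.5 && tx > 100000) ? "PASS" : "FAIL"
      printf "QC_result: %s\n", result
      printf "InputReads: %s\n", input
      printf "GenomeReads: %s\n", genome
      printf "TranscriptomeReads: %d\n", tx
      printf "TranscriptomeGenomeRatio: %s\n", ratio
    }' "$1"
}
```

It could be called as `qc_from_stats /exceRptOutput/testData_human.fastq.stats` after a run, for example to check how far a failing sample is from the two thresholds without rerunning the pipeline.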

© 2023 XGenes.com Impressum