RNA-seq data analysis of Yersinia on GRCh38

gene_x

Tags: processing, bash, pipeline, RNA-seq

YopM is a systematic antagonist of YopP in suppressing inflammatory gene expression in Yersinia-infected human macrophages

  1. goal

    # Heatmap 0: see manuscript: The 
    # Heatmap 1: mock, WAC, WAP at 1.5 h and 6 h (WAC and WA314 compared to mock)
    # Heatmap 2: deltaYopM, deltaYopP, deltaYopQ compared to WAP at 6 h (deltaM, deltaP, deltaQ vs. WA314). The deltaYopMP sample is dropped completely.
    
  2. download the batch 1 data

    #-- PRJEB10086 --
    
    mock_6h_a
    mock_6h_b
    mock_90min_a
    mock_90min_b
    WA314_6h_a
    WA314_6h_b
    WA314_90min_a
    WA314_90min_b
    WA314dYopM_6h_a
    WA314dYopM_6h_b
    WA314dYopM_90min_a
    WA314dYopM_90min_b
    
    #-- PRJEB45780 --
    
    wget ftp.sra.ebi.ac.uk/vol1/run/ERR592/ERR5924425/mock_DoA.fastq.gz
    wget ftp.sra.ebi.ac.uk/vol1/run/ERR592/ERR5924426/mock_DoII.fastq.gz
    #mv mock_DoA.fastq.gz mock_6h_DoA.fastq.gz
    #mv mock_DoII.fastq.gz mock_6h_DoII.fastq.gz
    
    ftp.sra.ebi.ac.uk/vol1/run/ERR592/ERR5924427/WAC1_5h_DoA.fastq.gz
    ftp.sra.ebi.ac.uk/vol1/run/ERR592/ERR5924428/WAC1_5h_DoII.fastq.gz
    ftp.sra.ebi.ac.uk/vol1/run/ERR592/ERR5924429/WAC6h_DoA.fastq.gz
    ftp.sra.ebi.ac.uk/vol1/run/ERR592/ERR5924430/WAC6h_DoII.fastq.gz
    ftp.sra.ebi.ac.uk/vol1/run/ERR592/ERR5924431/WAP1_5h_DoA.fastq.gz
    ftp.sra.ebi.ac.uk/vol1/run/ERR592/ERR5924421/WAP1_5h_DoII.fastq.gz
    ftp.sra.ebi.ac.uk/vol1/run/ERR592/ERR5924422/WAP6h_DoA.fastq.gz
    ftp.sra.ebi.ac.uk/vol1/run/ERR592/ERR5924423/WAP6h_DoII.fastq.gz
    
    ftp.sra.ebi.ac.uk/vol1/run/ERR592/ERR5924420/dP6h_DoII.fastq.gz
    ftp.sra.ebi.ac.uk/vol1/run/ERR592/ERR5924424/dP6h_DoIII.fastq.gz
    
    #-- PRJEB44958 --
    
    ftp.sra.ebi.ac.uk/vol1/run/ERR608/ERR6087955/dYopP_1.5h_DonorII.fastq.gz
    ftp.sra.ebi.ac.uk/vol1/run/ERR608/ERR6087956/dYopP_1.5h_DonorIII.fastq.gz
    
    wget ftp.sra.ebi.ac.uk/vol1/run/ERR608/ERR6087951/dYopMP_1.5h_DonorII.fastq.gz
    wget ftp.sra.ebi.ac.uk/vol1/run/ERR608/ERR6087952/dYopMP_1.5h_DonorIII.fastq.gz
    wget ftp.sra.ebi.ac.uk/vol1/run/ERR608/ERR6087953/dYopMP_6h_DonorII.fastq.gz
    wget ftp.sra.ebi.ac.uk/vol1/run/ERR608/ERR6087954/dYopMP_6h_DonorIII.fastq.gz
    
    # ---- 1st. batch: 1.5h + 6h ---- 
    mock
    WAC
    WA314 == WAP
    dYopM,dYopP
    
    E-MTAB-10602
    E-MTAB-10473
    
    Project: PRJEB45780
    Pathogenic bacteria Yersinia enterocolitica injects virulence plasmid-encoded effectors through the type three secretion system into macrophages to modulate gene expression. Here we analyzed the effect on gene expression in primary human macrophages of Y. enterocolitica strains lacking effector YopP (1.5 h infection) or effectors YopP and YopM (1.5 h or 6 h infection) simultaneously using RNA-seq. This is part of a larger sequencing experiment for which other samples can be found in EMBL-EBI (www.ebi.ac.uk/arrayexpress) under accession number E-MTAB-10473 and European Nucleotide Archive (ENA) at http://www.ebi.ac.uk/ena/data/view/PRJEB10086.
    
    ./ena-file-download-read_run-PRJEB10086-submitted_ftp-20231220-1619_sorted.sh
    #? check WA314_90min_a.fastq.gz == WAP1_5h_DoA.fastq.gz
    #? check WA314_90min_b.fastq.gz == WAP1_5h_DoII.fastq.gz
    #? check WA314_6h_a.fastq.gz == WAP6h_DoA.fastq.gz
    #? check WA314_6h_b.fastq.gz == WAP6h_DoII.fastq.gz
    #TODO: if they are not the same, take WA314_* as the WAP-samples in the 1st batch!
    
    cat mock_90min_a_L004_R1_001.fastq.gz mock_90min_a_L005_R1_001.fastq.gz mock_90min_a_L006_R1_001.fastq.gz mock_90min_a_L007_R1_001.fastq.gz mock_90min_a_L008_R1_001.fastq.gz > mock_90min_a.fastq.gz
    cat mock_90min_b_L004_R1_001.fastq.gz mock_90min_b_L005_R1_001.fastq.gz mock_90min_b_L006_R1_001.fastq.gz mock_90min_b_L007_R1_001.fastq.gz mock_90min_b_L008_R1_001.fastq.gz > mock_90min_b.fastq.gz
    cat mock_6h_a_L004_R1_001.fastq.gz mock_6h_a_L005_R1_001.fastq.gz mock_6h_a_L006_R1_001.fastq.gz mock_6h_a_L007_R1_001.fastq.gz mock_6h_a_L008_R1_001.fastq.gz > mock_6h_a.fastq.gz
    cat mock_6h_b_L004_R1_001.fastq.gz mock_6h_b_L005_R1_001.fastq.gz mock_6h_b_L006_R1_001.fastq.gz mock_6h_b_L007_R1_001.fastq.gz mock_6h_b_L008_R1_001.fastq.gz > mock_6h_b.fastq.gz
    
    cat WA314_90min_a_L004_R1_001.fastq.gz WA314_90min_a_L005_R1_001.fastq.gz WA314_90min_a_L006_R1_001.fastq.gz WA314_90min_a_L007_R1_001.fastq.gz WA314_90min_a_L008_R1_001.fastq.gz > WA314_90min_a.fastq.gz
    cat WA314_90min_b_L004_R1_001.fastq.gz WA314_90min_b_L005_R1_001.fastq.gz WA314_90min_b_L006_R1_001.fastq.gz WA314_90min_b_L007_R1_001.fastq.gz WA314_90min_b_L008_R1_001.fastq.gz > WA314_90min_b.fastq.gz
    cat WA314_6h_a_L004_R1_001.fastq.gz WA314_6h_a_L005_R1_001.fastq.gz WA314_6h_a_L006_R1_001.fastq.gz WA314_6h_a_L007_R1_001.fastq.gz WA314_6h_a_L008_R1_001.fastq.gz > WA314_6h_a.fastq.gz
    cat WA314_6h_b_L004_R1_001.fastq.gz WA314_6h_b_L005_R1_001.fastq.gz WA314_6h_b_L006_R1_001.fastq.gz WA314_6h_b_L007_R1_001.fastq.gz WA314_6h_b_L008_R1_001.fastq.gz > WA314_6h_b.fastq.gz
    
    cat WA314dYopM_90min_a_L004_R1_001.fastq.gz WA314dYopM_90min_a_L005_R1_001.fastq.gz WA314dYopM_90min_a_L006_R1_001.fastq.gz WA314dYopM_90min_a_L007_R1_001.fastq.gz WA314dYopM_90min_a_L008_R1_001.fastq.gz > WA314dYopM_90min_a.fastq.gz
    cat WA314dYopM_90min_b_L004_R1_001.fastq.gz WA314dYopM_90min_b_L005_R1_001.fastq.gz WA314dYopM_90min_b_L006_R1_001.fastq.gz WA314dYopM_90min_b_L007_R1_001.fastq.gz WA314dYopM_90min_b_L008_R1_001.fastq.gz > WA314dYopM_90min_b.fastq.gz
    cat WA314dYopM_6h_a_L004_R1_001.fastq.gz WA314dYopM_6h_a_L005_R1_001.fastq.gz WA314dYopM_6h_a_L006_R1_001.fastq.gz WA314dYopM_6h_a_L007_R1_001.fastq.gz WA314dYopM_6h_a_L008_R1_001.fastq.gz > WA314dYopM_6h_a.fastq.gz
    cat WA314dYopM_6h_b_L004_R1_001.fastq.gz WA314dYopM_6h_b_L005_R1_001.fastq.gz WA314dYopM_6h_b_L006_R1_001.fastq.gz WA314dYopM_6h_b_L007_R1_001.fastq.gz WA314dYopM_6h_b_L008_R1_001.fastq.gz > WA314dYopM_6h_b.fastq.gz
    
    #END
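
The twelve `cat` commands above all follow one pattern, so they could also be written with a small helper. The following is a minimal sketch, assuming the per-lane files sit in the current directory and are named `<sample>_L00N_R1_001.fastq.gz` as in the commands above; the function name `merge_lanes` is made up for illustration. Concatenating gzip files directly works because gzip readers treat the result as one multi-member stream.

```shell
# Hypothetical helper (not part of the original notes).
# merge_lanes SAMPLE: concatenate SAMPLE_L00?_R1_001.fastq.gz into SAMPLE.fastq.gz
merge_lanes() {
    sample=$1
    # The glob expands in lexical order, i.e. L004, L005, ... L008.
    set -- "${sample}"_L00[0-9]_R1_001.fastq.gz
    # If nothing matched, the pattern is left unexpanded and does not exist.
    if [ ! -e "$1" ]; then
        echo "no lane files found for ${sample}" >&2
        return 1
    fi
    cat "$@" > "${sample}.fastq.gz"
}
```

To cover all twelve samples, the helper can be called in a nested loop over strain (mock, WA314, WA314dYopM), time point (90min, 6h) and replicate (a, b), for example `merge_lanes mock_90min_a`.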
    
    ./WAC1_5h_DoA.fastq.gz
    ./WAC1_5h_DoII.fastq.gz
    ./WAC6h_DoA.fastq.gz
    ./WAC6h_DoII.fastq.gz
    
    ./WAP1_5h_DoA.fastq.gz
    ./WAP1_5h_DoII.fastq.gz
    ./WAP6h_DoA.fastq.gz
    ./WAP6h_DoII.fastq.gz
    
    ./dYopP_1.5h_DonorII.fastq.gz
    ./dYopP_1.5h_DonorIII.fastq.gz
    ./dP6h_DoII.fastq.gz
    ./dP6h_DoIII.fastq.gz
    
    mv dP6h_DoII.fastq.gz dYopP_6h_DoII.fastq.gz
    mv dP6h_DoIII.fastq.gz dYopP_6h_DoIII.fastq.gz
    
    mv WAC1_5h_DoA.fastq.gz   WAC_1.5h_DoA.fastq.gz
    mv WAC1_5h_DoII.fastq.gz  WAC_1.5h_DoII.fastq.gz
    mv WAC6h_DoA.fastq.gz     WAC_6h_DoA.fastq.gz
    mv WAC6h_DoII.fastq.gz    WAC_6h_DoII.fastq.gz
    
    mv WAP1_5h_DoA.fastq.gz   WAP_1.5h_DoA.fastq.gz
    mv WAP1_5h_DoII.fastq.gz  WAP_1.5h_DoII.fastq.gz
    mv WAP6h_DoA.fastq.gz     WAP_6h_DoA.fastq.gz
    mv WAP6h_DoII.fastq.gz    WAP_6h_DoII.fastq.gz
    
    mv WA314dYopM_90min_a.fastq.gz dYopM_90min_a.fastq.gz
    mv WA314dYopM_90min_b.fastq.gz dYopM_90min_b.fastq.gz
    mv WA314dYopM_6h_a.fastq.gz    dYopM_6h_a.fastq.gz
    mv WA314dYopM_6h_b.fastq.gz    dYopM_6h_b.fastq.gz
    
    #Deprecated!
    #wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR592/ERR5924425/mock_DoA.fastq.gz
    #wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR592/ERR5924426/mock_DoII.fastq.gz
    #wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR608/ERR6087953/dYopMP_6h_DonorII.fastq.gz
    #wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR608/ERR6087954/dYopMP_6h_DonorIII.fastq.gz
    #wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR608/ERR6087951/dYopMP_1.5h_DonorII.fastq.gz
    #wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR608/ERR6087952/dYopMP_1.5h_DonorIII.fastq.gz
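
The `#? check` notes above leave open whether the WA314 files from PRJEB10086 and the WAP files from PRJEB45780 contain the same reads. One way to test this is to compare checksums of the decompressed content, since two gzip files can hold identical data yet differ byte for byte. A minimal sketch, assuming `md5sum` (GNU coreutils) is available; the function name `same_reads` is made up for illustration.

```shell
# Hypothetical helper (not part of the original notes).
# same_reads A.fastq.gz B.fastq.gz
# Exit status: 0 if both files decompress to identical content,
#              1 if the content differs,
#              2 if a file is missing or unreadable.
same_reads() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    sum_a=$(gzip -dc "$1" | md5sum | cut -d' ' -f1)
    sum_b=$(gzip -dc "$2" | md5sum | cut -d' ' -f1)
    [ "$sum_a" = "$sum_b" ]
}
```

For example, `same_reads WA314_90min_a.fastq.gz WAP_1.5h_DoA.fastq.gz && echo same || echo different` (using the file names after the renaming above). A mismatch only shows that the decompressed streams differ; the same reads in a different order would also be reported as different, so comparing read counts would be a reasonable follow-up in that case.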
    
  3. download the public data and prepare the batch 2 data

    # ---- public data (the naive and LPS samples are public data) ----
    jhuang@hamburg:/media/jhuang/Elements2/Data_Indra_RNASeq_GSM2262901/raw_data
    
    #Dataset name, GEO ID, Series, Experiment (each run line below: run accession, spots, bases, size, date)
    
    #- naive macrophages GSM2679941 GSE100382 RNA-seq (RNA_N_rep1)
    #    SRR5746425 4,000,000   204M    136Mb   2017-06-25
    #    SRR5746426 4,000,000   204M    137Mb   2017-06-25
    #    SRR5746427 4,000,000   204M    137.2Mb 2017-06-25
    #    SRR5746428 4,000,000   204M    135.6Mb 2017-06-25
    #    SRR5746429 4,000,000   204M    134.5Mb 2017-06-25
    #    SRR5746430 4,000,000   204M    136.2Mb 2017-06-25
    #    SRR5746431 903,661 46.1M   32.7Mb  2017-06-25
    cp RNA_N_rep1.fastq.gz ../raw_data/naive_rep1.fastq.gz
    
    #- naive macrophages GSM2679942 GSE100382 RNA-seq (RNA_N_rep2)
    #    SRR5746432 14,176,210  723M    350.2Mb 2017-06-25
    #    SRR5746433 14,157,173  722M    353.4Mb 2017-06-25
    cp RNA_N_rep2.fastq.gz ../raw_data/naive_rep2.fastq.gz
    
    #- LPS stimulated macrophages GSM2679944 GSE100382 RNA-seq (RNA_L_rep1)
    #    SRR5746436 4,000,000   204M    135.8Mb 2017-06-25
    #    SRR5746437 4,000,000   204M    137.1Mb 2017-06-25
    #    SRR5746438 4,000,000   204M    136.6Mb 2017-06-25
    #    SRR5746439 4,000,000   204M    135.4Mb 2017-06-25
    #    SRR5746440 4,000,000   204M    135.3Mb 2017-06-25
    #    SRR5746441 2,178,850   111.1M  76.7Mb  2017-06-25
    cp RNA_L_rep1.fastq.gz ../raw_data/LPS_rep1.fastq.gz
    
    #- LPS stimulated macrophages GSM2679945 GSE100382 RNA-seq (RNA_L_rep2)
    #    SRR5746442 17,351,249  884.9M  428Mb   2017-06-25
    #    SRR5746443 17,318,750  883.3M  431.8Mb 2017-06-25
    cp RNA_L_rep2.fastq.gz ../raw_data/LPS_rep2.fastq.gz
    
    #- naive macrophages GSM2262901 GSE85243 RNA-seq (RPMI_d6_6095)
    #    SRR4004025 11,559,434  485.5M  338.3Mb 2016-11-21
    cp RPMI_d6_6095.fastq.gz ../raw_data/naive_rep3.fastq.gz
    
    #- naive macrophages GSM2262902 GSE85243 RNA-seq (RPMI_d6_6718)
    #    SRR4004026 22,911,696  962.3M  635Mb   2016-11-21
    cp RPMI_d6_6718.fastq.gz ../raw_data/naive_rep4.fastq.gz
    
    #- LPS stimulated macrophages GSM2262906 GSE85243 RNA-seq (RPMI_d6_Restim_8749)
    #    SRR4004030 26,566,129  1.1G    681.3Mb 2016-11-21
    cp RPMI_d6_Restim_8749.fastq.gz ../raw_data/LPS_rep3.fastq.gz
    
    #- LPS stimulated macrophages GSM2262907 GSE85243 RNA-seq (RPMI_d6_Restim_8754)
    #    SRR4004031 28,719,690  1.2G    731.2Mb 2016-11-21
    cp RPMI_d6_Restim_8754.fastq.gz ../raw_data/LPS_rep4.fastq.gz
    
    #END
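
Each public sample above is listed with its SRA runs and their spot counts, which gives a simple sanity check on the combined FASTQ files. If a file is the concatenation of all runs listed for it and the data are single-end (both are assumptions, the notes do not say so), its read count should equal the sum of the spots. For `RNA_N_rep1` that would be 6 × 4,000,000 + 903,661 = 24,903,661 reads. A minimal sketch; the function name `count_reads` is made up for illustration.

```shell
# Hypothetical helper (not part of the original notes).
# count_reads FILE.fastq.gz: print the number of reads in a gzipped FASTQ file.
# Assumes the usual four-line FASTQ records (no wrapped sequence lines).
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}
```

Under those assumptions, `count_reads ../raw_data/naive_rep1.fastq.gz` would be expected to report 24903661.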
    
    # ---- 2nd. batch: 6h         jhuang@hamburg:~/DATA/Data_Soeren_RNA-seq_2022/Raw_Data ---- 
    mock
    WAC
    WAP
    deltaYopQ --> comparison to WAP
    
    mv MOCK_A.fastq.gz ../raw_data/
    mv MOCK_B.fastq.gz ../raw_data/
    mv WAC_A.fastq.gz ../raw_data/
    mv WAC_B.fastq.gz ../raw_data/
    mv WAP_A.fastq.gz ../raw_data/
    mv WAP_B.fastq.gz ../raw_data/
    mv deltaQ_A.fastq.gz ../raw_data/
    mv deltaQ_B.fastq.gz ../raw_data/
    
  4. run nextflow

        (rnaseq) [jhuang@sage DATA]$ pwd
        /home/jhuang/DATA
        /home/jhuang/REFs/Homo_sapiens/Ensembl
        #"/home/jhuang/REFs/Homo_sapiens/UCSC/hg38/blacklists/hg38-blacklist.bed"
    
        nextflow run rnaseq/main.nf --input samplesheet.csv    --outdir results_GRCh38 --genome GRCh38   -profile test_full -resume --max_memory 300.GB --max_time 2400.h --save_reference --aligner star_salmon  --skip_deseq2_qc --skip_fastqc
    
        ln -s ~/Tools/rnaseq/assets/multiqc_config.yaml multiqc_config.yaml
        multiqc -f --config multiqc_config.yaml . 2>&1
        rm multiqc_config.yaml
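
The `samplesheet.csv` passed to `--input` is not shown in these notes. Assuming the checkout in `rnaseq/` is a 3.x release of nf-core/rnaseq, the pipeline expects a CSV with the columns `sample,fastq_1,fastq_2,strandedness`, with `fastq_2` left empty for single-end reads. A minimal sketch of generating one from a directory of FASTQ files follows. The function name `make_samplesheet` is made up, the sample names are simply derived from the file names, and the strandedness value is a placeholder, since the library type is not stated here.

```shell
# Hypothetical helper (not part of the original notes).
# make_samplesheet DIR STRANDEDNESS: write a single-end samplesheet to stdout.
# STRANDEDNESS is an assumption to be supplied by the user
# (e.g. unstranded, forward or reverse).
make_samplesheet() {
    dir=$1
    strandedness=$2
    echo "sample,fastq_1,fastq_2,strandedness"
    for fq in "$dir"/*.fastq.gz; do
        # Skip the unexpanded pattern if the directory has no FASTQ files.
        [ -e "$fq" ] || continue
        sample=$(basename "$fq" .fastq.gz)
        # fastq_2 stays empty because the reads are single-end.
        echo "${sample},${fq},,${strandedness}"
    done
}
```

A call such as `make_samplesheet "$PWD/raw_data" unstranded > samplesheet.csv` would then produce one row per file; the result should be checked by hand before starting the pipeline.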
    
  5. summary:

        #ENSG00000000419 DPM1    385.616
        #grep "ENSG00000000419" quant.genes.sf
        #ENSG00000000419 1159.45 909.453 15.8259 385.616
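
The commented lines above record a spot check: the value 385.616 for DPM1 apparently comes from the merged table and matches the last column (`NumReads`) of a per-sample `quant.genes.sf`. The same comparison can be scripted. A minimal sketch, assuming tab-separated files, `NumReads` in the fifth column of `quant.genes.sf`, and a merged table whose header holds `gene_id`, `gene_name` and then one column per sample; both function names are made up for illustration.

```shell
# Hypothetical helpers (not part of the original notes).

# gene_numreads QUANT_GENES_SF GENE_ID: print NumReads (5th column) for a gene.
gene_numreads() {
    awk -F'\t' -v id="$2" '$1 == id { print $5 }' "$1"
}

# merged_count MERGED_TSV GENE_ID SAMPLE: print the gene's value in the
# column whose header equals SAMPLE (prints nothing if there is no such column).
merged_count() {
    awk -F'\t' -v id="$2" -v s="$3" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == s) col = i; next }
        $1 == id && col { print $col }' "$1"
}
```

For a sample column named `mock_6h_a` (a hypothetical name), `gene_numreads mock_6h_a/quant.genes.sf ENSG00000000419` and `merged_count salmon.merged.gene_counts.tsv ENSG00000000419 mock_6h_a` would be expected to print the same number.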
    
        Attached are the results of the RNA-seq analysis discussed in our last online meeting.
    
        The raw counts are in Yersinia_RNAseq_results_GRCh38/star_salmon/salmon.merged.gene_counts.tsv. This table was generated with the STAR-Salmon strategy: STAR aligns the reads and Salmon quantifies transcript abundances. Salmon's probabilistic model splits a read that is compatible with several similar transcripts fractionally among them, which is why the table contains non-integer values.
    
        The samples included in the analysis are as follows:
    
        batch 1:
    
        ./mock_90min_a.fastq.gz
        ./mock_90min_b.fastq.gz
        ./mock_6h_a.fastq.gz
        ./mock_6h_b.fastq.gz
        ./WAC_1.5h_DoA.fastq.gz
        ./WAC_1.5h_DoII.fastq.gz
        ./WAC_6h_DoA.fastq.gz
        ./WAC_6h_DoII.fastq.gz
        ./WAP_1.5h_DoA.fastq.gz
        ./WAP_1.5h_DoII.fastq.gz
        ./WAP_6h_DoA.fastq.gz
        ./WAP_6h_DoII.fastq.gz
        ./WA314_90min_a.fastq.gz
        ./WA314_90min_b.fastq.gz
        ./WA314_6h_a.fastq.gz
        ./WA314_6h_b.fastq.gz
        ./dYopM_90min_a.fastq.gz
        ./dYopM_90min_b.fastq.gz
        ./dYopM_6h_a.fastq.gz
        ./dYopM_6h_b.fastq.gz
        ./dYopP_1.5h_DonorII.fastq.gz
        ./dYopP_1.5h_DonorIII.fastq.gz
        ./dYopP_6h_DoII.fastq.gz
        ./dYopP_6h_DoIII.fastq.gz
        ./mock_6h_DoA.fastq.gz
        ./mock_6h_DoII.fastq.gz
        ./dYopMP_1.5h_DonorII.fastq.gz
        ./dYopMP_1.5h_DonorIII.fastq.gz
        ./dYopMP_6h_DonorII.fastq.gz
        ./dYopMP_6h_DonorIII.fastq.gz
    
        batch public:
    
        naive_rep1.fastq.gz
        naive_rep2.fastq.gz
        LPS_rep1.fastq.gz
        LPS_rep2.fastq.gz
        naive_rep3.fastq.gz
        naive_rep4.fastq.gz
        LPS_rep3.fastq.gz
        LPS_rep4.fastq.gz
    
        batch 2:
    
        MOCK_A.fastq.gz (6h)
        MOCK_B.fastq.gz (6h)
        WAC_A.fastq.gz  (6h)
        WAC_B.fastq.gz  (6h)
        WAP_A.fastq.gz  (6h)
        WAP_B.fastq.gz  (6h)
        deltaQ_A.fastq.gz (6h)
        deltaQ_B.fastq.gz (6h)
    

© 2023 XGenes.com Impressum