Mapping reads to a reference genome

huang 0 like s 607 view s

Tags: Methods

    conda activate viral3

    for sample in CSF_EH_RNA_S1; do 
    java -jar /home/jhuang/Tools/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 ./raw_data/${sample}_R1_001.fastq.gz ./raw_data/${sample}_R2_001.fastq.gz trimmed/${sample}_R1.fastq.gz trimmed/${sample}_unpaired_R1.fastq.gz trimmed/${sample}_R2.fastq.gz trimmed/${sample}_unpaired_R2.fastq.gz ILLUMINACLIP:/home/jhuang/Tools/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 AVGQUAL:20;
    done


    #The header should be 'ON340918'
    ref_fa="ON340918.fa";
    #coverage plots
    for sample in CSF_EH_RNA_S1; do 
      #bowtie2-build ${ref_fa} ${ref_fa}
      #bowtie2 -1 trimmed/${sample}_R1.fastq.gz -2 trimmed/${sample}_R2.fastq.gz -x ${ref_fa}.fasta --fast --threads 16 -S bams/${sample}.sam
      #samtools view -h -Sb bams/${sample}.sam > bams/${sample}.bam
      bwa index ${ref_fa};
      bwa mem -M -t 14 ${ref_fa} trimmed/${sample}_R1.fastq.gz trimmed/${sample}_R2.fastq.gz | samtools view -bS - > bams/${sample}.bam;
      bwa mem -M -t 14 ${ref_fa} trimmed/${sample}_R1.fastq.gz trimmed/${sample}_R2.fastq.gz | samtools view -bS -F 256 - > bams/${sample}_uniqmap.bam;
      samtools sort bams/${sample}.bam > bams/${sample}_sorted.bam
      samtools index bams/${sample}_sorted.bam  
      samtools depth -d 1000000 bams/${sample}_sorted.bam | gzip > ${sample}.cov.gz
      ~/Tools/damian_extended/lib_py/vipr_cov_pdf_sample.py -c ${sample}.cov.gz -p ${sample}.pdf
    done

    AF_VS_COV_PDF = "#{BASEPATH}/lib_py/vipr_af_vs_cov_pdf_1sample.py"
    AF_VS_COV_HTML = "#{BASEPATH}/3rd_party/vipr-tools/src/vipr_af_vs_cov_html.py"
    JOINER = "#{BASEPATH}/3rd_party/vipr-tools/src/simple_contig_joiner.py"
    FILL_DP_ON_VPHASER = "#{BASEPATH}/lib_py/fill_DP_on_vphaser2.py"
    POLISHER = "#{BASEPATH}/3rd_party/vipr-tools/src/polish_viral_ref.sh"
    GAPPER = "#{BASEPATH}/3rd_party/vipr-tools/src/vipr_gaps_to_n.py"

    #downsample 
    seqtk sample -s100 CSF_EH_RNA_S1_R1.fastq.gz 1000000 > CSF_EH_RNA_downsampled_R1.fastq.gz
    seqtk sample -s100 CSF_EH_RNA_S1_R2.fastq.gz 1000000 > CSF_EH_RNA_downsampled_R2.fastq.gz

    ~/Tools/damian_extended/3rd_party/vipr-tools/src/polish_viral_ref.sh -1 trimmed/CSF_EH_RNA_downsampled_R1.fastq.gz -2 trimmed/CSF_EH_RNA_downsampled_R2.fastq.gz -r ON340918.fa -o polished.fa -t 8
    ~/Tools/damian_extended/3rd_party/vipr-tools/src/vipr_gaps_to_n.py -i polished.fa -c CSF_EH_RNA_S1.cov.gz > gapped_contig.fa

    ~/Tools/csv2xls-0.4/csv_to_xls.py CSF_RNA_on_ON340918.cov  -d$'\t' -o  CSF_RNA_on_ON340918_coverage.xls

Mapping reads to a reference genome is a common task in bioinformatics analysis. This process involves aligning short reads from a sequencing experiment to a known reference genome, allowing for downstream analysis such as variant calling and differential expression analysis.

There are several tools available for mapping reads to a reference genome, including:

  1. Bowtie/Bowtie2: These are fast and memory-efficient alignment tools that use the Burrows-Wheeler transform (BWT) algorithm to align reads to a reference genome.

  2. BWA: This is another widely used alignment tool that uses the BWT algorithm to align reads to a reference genome.

  3. STAR: This is a tool specifically designed for RNA sequencing data that can align reads to a reference genome and identify splice junctions.

  4. HISAT2: This is another RNA sequencing alignment tool that uses a hierarchical indexing strategy to align reads to a reference genome.

To map reads to a reference genome using any of these tools, you will typically need the following inputs:

  1. A reference genome in FASTA format.
  2. Short reads in FASTQ format.
  3. The specific software tool you want to use for alignment.

like unlike

点赞本文的读者

还没有人对此文章表态


本文有评论

没有评论

看文章,发评论,不要沉默


© 2023 XGenes.com Impressum