Transposable element (TE) and structural variant (SV) detection in bacterial genomes

gene_x

Tags: bash, DNA-seq

For a bacterial genome such as Acinetobacter baumannii, the pipeline differs slightly from the one used for human genomes because of the simpler genome structure, the absence of introns, and the different nature of repetitive elements compared to eukaryotes. Here is a pipeline tailored to bacterial genomes:

  1. Quality Control of Sequencing Reads: Use FastQC for quality control checks on raw sequence data.

    fastqc your_reads_R1.fastq.gz your_reads_R2.fastq.gz
  2. Read Trimming: Trim adapters and low-quality bases using Trimmomatic.

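    # adapters.fa and the thresholds below are example values; adjust them to your library prep
    trimmomatic PE your_reads_R1.fastq.gz your_reads_R2.fastq.gz \
      your_reads_R1_paired.fastq.gz your_reads_R1_unpaired.fastq.gz \
      your_reads_R2_paired.fastq.gz your_reads_R2_unpaired.fastq.gz \
      ILLUMINACLIP:adapters.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:36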
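When several isolates are processed, the R1/R2 files have to be paired up before they are passed to a paired-end command like the one above. The following is a minimal sketch, not part of the original pipeline: `list_read_pairs` is a hypothetical helper, and the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming scheme is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "sample<TAB>R1<TAB>R2" for every complete read pair
# in a directory. Assumes files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
list_read_pairs() {
    local dir=$1 r1 r2 sample
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue                 # the glob matched nothing
        r2=${r1%_R1.fastq.gz}_R2.fastq.gz        # expected mate file
        sample=$(basename "$r1" _R1.fastq.gz)
        if [ -e "$r2" ]; then
            printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
        else
            echo "warning: no R2 file for $sample, skipping" >&2
        fi
    done
}
```

The tab-separated output can be read line by line in a `while IFS=$'\t' read -r sample r1 r2` loop that calls the trimming command once per sample.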
  3. De Novo Assembly or Reference Alignment: If a closely related reference genome is available:

    bwa index reference_genome.fasta
    bwa mem reference_genome.fasta your_reads_R1_paired.fastq.gz your_reads_R2_paired.fastq.gz > aligned_reads.sam

    Or for de novo assembly of the bacterial genome:

    spades.py -1 your_reads_R1_paired.fastq.gz -2 your_reads_R2_paired.fastq.gz --careful -o spades_output

    Then, continue with the contigs (if de novo assembly was performed):

    bwa mem reference_genome.fasta spades_output/contigs.fasta > aligned_contigs.sam
  4. SAM/BAM Conversion and Sorting: Use SAMtools to convert and sort the alignment files.

    samtools view -bS aligned_reads.sam > aligned_reads.bam
    samtools sort aligned_reads.bam -o aligned_reads_sorted.bam
    samtools index aligned_reads_sorted.bam
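Mapping and sorting can also be chained in one pipe, which avoids writing the intermediate SAM file. This is a sketch under assumptions rather than the original author's method: `map_and_sort` and the `DRY_RUN` switch are hypothetical names, and the reference is assumed to be indexed with `bwa index` already.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: map one paired-end sample and write a sorted, indexed BAM.
# With DRY_RUN=1 the commands are only printed, which is handy for checking paths.
map_and_sort() {
    local ref=$1 r1=$2 r2=$3 out=$4
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "bwa mem $ref $r1 $r2 | samtools sort -o $out -"
        echo "samtools index $out"
        return 0
    fi
    # Stream the alignments straight into the sorter ("-" means read from stdin)
    bwa mem "$ref" "$r1" "$r2" | samtools sort -o "$out" -
    samtools index "$out"
}
```

For example, `DRY_RUN=1 map_and_sort reference_genome.fasta your_reads_R1_paired.fastq.gz your_reads_R2_paired.fastq.gz aligned_reads_sorted.bam` prints the two commands without running them.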
  5. Detection of Transposable Elements: For bacterial genomes, tools like ISfinder, TnpPred, or custom scripts utilizing BLAST can be used to identify Insertion Sequences (IS) and other mobile elements.

    # ISfinder is a web-based database and BLAST search, so there is no command to run locally.
    # For TnpPred (the executable name depends on your installation): -i spades_output/contigs.fasta -o TnpPred_output
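The "custom scripts utilizing BLAST" option mentioned in this step could look roughly like the sketch below. It assumes you have searched known IS sequences (query) against your contigs (subject) with `blastn -outfmt 6`, and it converts the strong hits into BED intervals. `blast_hits_to_bed` and the default thresholds (90% identity, 500 bp) are illustrative assumptions, not values from the original post.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read BLAST tabular output (-outfmt 6) on stdin and print
# BED-like lines (contig, start, end, IS name, identity, strand) for hits that
# pass the identity and length thresholds.
# outfmt 6 columns: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
blast_hits_to_bed() {
    local min_ident=${1:-90} min_len=${2:-500}
    awk -F '\t' -v OFS='\t' -v id="$min_ident" -v len="$min_len" '
        $3 >= id && $4 >= len {
            # sstart > send means the hit lies on the minus strand of the contig
            if ($9 <= $10) { start = $9 - 1;  end = $10; strand = "+" }
            else           { start = $10 - 1; end = $9;  strand = "-" }
            print $2, start, end, $1, $3, strand   # BED starts are 0-based
        }'
}
```

A possible invocation, with `is_elements.fasta` standing in for whatever IS reference sequences you have: `blastn -query is_elements.fasta -subject spades_output/contigs.fasta -outfmt 6 | blast_hits_to_bed 90 500 > is_hits.bed`.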
  6. Annotation of Assembled Genome or Contigs: Use Prokka to annotate the assembled genome or contigs.

    prokka --outdir prokka_output --prefix ab_contigs spades_output/contigs.fasta
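The Prokka annotation is itself a quick first look at mobile elements, because transposase genes are usually labelled as such in the `product` attribute. The sketch below pulls those CDS features out of the GFF file; `list_transposases` is a hypothetical helper, and matching on the word "transposase" is a simplifying assumption that will miss differently named products.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a Prokka GFF3 file on stdin and print
# contig, start, end, strand and product for every CDS annotated as a transposase.
list_transposases() {
    awk -F '\t' -v OFS='\t' '
        /^##FASTA/ { exit }                 # Prokka appends the sequences after this marker
        /^#/ || $3 != "CDS" { next }        # skip comments and non-CDS features
        {
            product = ""
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^product=/) product = substr(attrs[i], 9)
            if (tolower(product) ~ /transposase/) print $1, $4, $5, $7, product
        }'
}
```

With the command above, the input would be `prokka_output/ab_contigs.gff`, for example `list_transposases < prokka_output/ab_contigs.gff`.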
  7. Structural Variant Detection: For structural variants including transposable elements, you can use tools like MUMmer for comparing the assembled contigs to the reference genome.

    nucmer --mum -p output_prefix reference_genome.fasta spades_output/contigs.fasta
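One simple way to turn the nucmer alignments into candidate SV regions is to look for stretches of the reference that no contig aligns to. The sketch below assumes tab-delimited, headerless coordinates as produced by `show-coords -T -H` (columns: S1, E1, S2, E2, LEN1, LEN2, %IDY, reference ID, query ID). `uncovered_ref_regions` and the 500 bp default are hypothetical, and gaps at the very start or end of a reference sequence are not reported because sequence lengths are not known here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `show-coords -T -H` output on stdin and print reference
# intervals (ref ID, start, end, length) that lie between alignments and are at
# least min_gap bp long.
uncovered_ref_regions() {
    local min_gap=${1:-500}
    sort -t "$(printf '\t')" -k8,8 -k1,1n |
    awk -F '\t' -v OFS='\t' -v min="$min_gap" '
        $8 != ref { ref = $8; covered_to = 0 }      # new reference sequence
        {
            s = $1 + 0; e = $2 + 0
            if (s > e) { t = s; s = e; e = t }      # defensive: keep start <= end
            gap = s - covered_to - 1
            if (covered_to > 0 && gap >= min) print ref, covered_to + 1, s - 1, gap
            if (e > covered_to) covered_to = e      # extend the covered region
        }'
}
```

Possible usage: `show-coords -T -H output_prefix.delta | uncovered_ref_regions 500`. Regions reported this way are only candidates; they can reflect deletions in the isolate, but also assembly gaps.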
  8. Visualization: Tools like Artemis or IGV can be used to visualize the annotated genome and identify regions with transposable elements.

    # Launch Artemis or IGV and load your BAM files and annotations for visualization.
  9. Tools for detecting transposable elements (step 5 above):

    • ISfinder:

      # ISfinder does not come with a command-line tool. It's an online resource.
      # You would download your sequence's IS annotations from ISfinder after submitting your sequences on their website.
    • RepeatMasker:

      RepeatMasker -species bacteria -pa 4 -xsmall your_genome_sequence.fasta
    • TnpPred:

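      # First, download the TnpPred tool, then run its script.
      # TNPPRED_SCRIPT is a placeholder: set it to the path of the script you downloaded.
      perl "$TNPPRED_SCRIPT" -i your_genome_sequence.fasta -o output_directory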
    • MobileElementFinder (MEF):

      # Assuming you have installed MobileElementFinder, the basic command would be
      # (check `mefinder find --help` for the options in your version):
      mefinder find --contig your_contigs.fasta mef_output
    • OASIS:

      # OASIS is an online service, so you would use the web interface to submit your sequence data.
    • Meta-Mobilome:

      # Similarly, Meta-Mobilome is an online tool, you would need to upload your data through their web portal.
    • ICEberg:

      # ICEberg is also used via a web interface for annotation and detection of ICEs.
    • MobilomeFINDER:

      # This service is web-based as well. You will need to interact with the MobilomeFINDER platform through the browser.
  10. For structural variant (SV) detection in bacterial genomes (step 7 above), consider tools designed to detect large genomic rearrangements such as insertions, deletions, inversions, and translocations. Most SV callers need alignment files (e.g., BAM) as input, which you create by mapping your reads to a reference genome with BWA or Bowtie2. Several of these tools were designed with eukaryotic genomes in mind, so their default settings may not be optimal for bacteria and you may need to adjust parameters. Command-line examples for some of the tools:

    • MUMmer (particularly nucmer and show-diff for comparing assemblies):

      nucmer --mum -p output_prefix reference.fasta query.fasta
      show-diff -r output_prefix.delta > output_prefix.diffs
    • DELLY (originally designed for human genomes, but can be adapted for bacteria; the command below is its short-read mode):

      delly call -g reference.fasta -o output.bcf input.bam
    • Pindel (can detect large deletions and insertions):

      pindel -f reference.fasta -i config.txt -c ALL -o output_prefix
    • LUMPY (a probabilistic framework for SV discovery):

      lumpyexpress -B input.bam -S input.splitters.bam -D input.discordants.bam -o output.vcf
    • BreakDancer (can predict various types of SVs):

      breakdancer-max config.txt > output.ctx
    • breseq (especially good for short-read data in bacteria):

      breseq -r reference.gbk input_reads.fastq -o output_directory
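Several of the callers above (DELLY, LUMPY) report their calls in VCF/BCF, where the standard `SVTYPE` key in the INFO column names the kind of variant. As a rough first summary, the calls can be counted per type. The sketch below is an illustration under that assumption; `count_sv_types` is a hypothetical helper, not part of any of these tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read VCF text on stdin and print "SVTYPE<TAB>count",
# sorted by type. Records without an SVTYPE entry in INFO are ignored.
count_sv_types() {
    awk -F '\t' '
        /^#/ { next }                               # skip header lines
        {
            n = split($8, info, ";")                # column 8 is INFO
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^SVTYPE=/) counts[substr(info[i], 8)]++
        }
        END { for (t in counts) printf "%s\t%d\n", t, counts[t] }' | sort
}
```

For the LUMPY output this would be `count_sv_types < output.vcf`; for a BCF file from DELLY, convert it to text first, for example with `bcftools view output.bcf | count_sv_types`.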
