Short-Read Sequencing vs Long-Read Sequencing

gene_x

Tags: sequencing

Short-read (e.g., Illumina) and long-read (e.g., Nanopore or PacBio) sequencing technologies differ in their error profiles, and those differences determine how the resulting data should be processed and analyzed. The sections below summarize the error rates of each technology and outline practical steps for managing and analyzing the data.

Error Rates in Sequencing Technologies

* Short-Read Sequencing (e.g., Illumina):
   - Error Rates: Generally low, around 0.1% to 1%.
   - Advantages: High accuracy, high throughput, and good for variant detection.
   - Disadvantages: Short read lengths, which can make it challenging to resolve repetitive regions and complex structural variations.

* Long-Read Sequencing (e.g., Nanopore, PacBio):
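   - Error Rates: Higher for raw reads, historically ranging from 5% to 20% for individual reads; consensus-based reads such as PacBio HiFi are far more accurate (above 99%).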
   - Advantages: Long reads, which can span entire genes or large structural variations, making assembly and complex variant detection easier.
   - Disadvantages: Higher error rates and lower throughput compared to short-read technologies.
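Error rates like those above are usually reported as Phred quality scores in FASTQ files, where Q = -10 * log10(p) and p is the probability that a base call is wrong. The following is a minimal sketch of that conversion; the function names are illustrative and not taken from any particular library. Under this definition, a 0.1% error rate corresponds to Q30 and a 1% error rate to Q20.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability p (0 < p <= 1) to a Phred score Q = -10*log10(p)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Error rates quoted in the list above, expressed as Phred scores.
    examples = [
        ("short-read, low end", 0.001),
        ("short-read, high end", 0.01),
        ("long-read raw, low end", 0.05),
        ("long-read raw, high end", 0.20),
    ]
    for label, rate in examples:
        print(f"{label}: error {rate:.1%} ~ Q{error_prob_to_phred(rate):.1f}")
```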

Practical Steps for Data Processing

* Data Preprocessing:
   - Quality Control: Use tools like FastQC to assess the quality of sequencing data.
   - Trimming: Remove low-quality bases and adapters using tools like Trimmomatic (short-read) or Porechop (long-read).

* Assembly and Alignment:
   - Short-Read Assembly: Use assemblers like SPAdes or Velvet.
   - Long-Read Assembly: Use assemblers like Canu, Flye, or Shasta.
   - Hybrid Assembly: Combine both short and long reads using tools like Unicycler or MaSuRCA.

* Error Correction:
   - Short-Read Correction: Generally not needed due to low error rates.
   - Long-Read Correction: Use hybrid tools like FMLRC or LoRDEC to correct long reads using short reads; self-correction tools such as Nanocorrect rely on the long reads alone.

* Variant Calling:
   - Short-Read Variant Calling: Use tools like GATK or FreeBayes.
   - Long-Read Variant Calling: Use tools like Medaka (Nanopore) or Longshot (PacBio and Nanopore).
   - Integrative Analysis: Combine data using WhatsHap for phasing and DeepVariant for accurate variant calling.
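To make the trimming step above more concrete, here is a simplified sketch of sliding-window quality trimming on a single read. It is only an illustration of the idea and does not reproduce the exact behavior of Trimmomatic or any other tool. The window size, the quality threshold, and the Phred+33 quality encoding are assumptions chosen for the example.

```python
def phred_scores(quality_string: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def sliding_window_trim(
    seq: str, qual: str, window: int = 4, min_mean_q: float = 20.0
) -> tuple[str, str]:
    """Cut the read at the start of the first window whose mean quality is below min_mean_q.

    Reads shorter than the window are returned unchanged.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have the same length")
    scores = phred_scores(qual)
    for start in range(max(len(scores) - window + 1, 0)):
        window_scores = scores[start:start + window]
        if sum(window_scores) / window < min_mean_q:
            return seq[:start], qual[:start]
    return seq, qual


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    trimmed_seq, trimmed_qual = sliding_window_trim("ACGTACGTAC", "IIIIII####")
    print(trimmed_seq, trimmed_qual)
```

In practice, a dedicated trimmer should be used on real data, since it also handles adapters, paired reads, and minimum-length filtering.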

PacBio Sequel 20Kb (Microorganism)

PacBio Sequel 10Kb (Microorganism)

<=800bp

Nanopore (Microorganism)

PacBio barcode library (Microorganism)

PacBio Revio library

Cyclone normal long library

Sequencing services for microorganisms:

  1. PacBio Sequel 20Kb (Microorganism)

    - PacBio Sequel: A sequencing platform developed by Pacific Biosciences, known for generating long reads.
    - 20Kb: Refers to the average length of the DNA fragments (about 20,000 base pairs) that are sequenced. Longer reads are particularly useful for de novo assembly, resolving complex regions, and identifying structural variations.
    - Microorganism: Indicates that this service is intended for microbial genomes, which are small but can contain repeats, plasmids, and mobile elements that short reads struggle to resolve.

  2. PacBio Sequel 10Kb (Microorganism)

    - PacBio Sequel: Same platform as above.
    - 10Kb: Refers to a shorter average read length of about 10,000 base pairs. These reads are still long compared to short-read technologies and suit similar applications; this option may be chosen when a project calls for a different balance of throughput and read length.
    - Microorganism: Again, intended for microbial genomes.

  3. <=800bp

    <=800bp: Likely refers to a service that generates reads of up to 800 base pairs. This would be consistent with Sanger sequencing or with targeted applications where short reads are sufficient and high accuracy is required.

  4. Nanopore (Microorganism)

    - Nanopore: Refers to Oxford Nanopore Technologies (ONT) sequencing, which can produce very long reads (up to several megabases) but with higher raw error rates than short-read technologies.
    - Microorganism: Tailored for microbial genome sequencing. ONT is useful because it sequences long stretches of DNA, which helps resolve genome structure.

  5. PacBio barcode library (Microorganism)

    - PacBio barcode library: A library preparation method that adds a unique barcode sequence to the DNA fragments of each sample. This allows several samples to be pooled (multiplexed) in a single sequencing run and separated bioinformatically afterward.
    - Microorganism: Intended for microbial samples. Barcoding is particularly useful in high-throughput studies where multiple microbial genomes are sequenced simultaneously.

  6. PacBio Revio library

    - PacBio Revio: Revio is PacBio's high-throughput long-read sequencer, announced in 2022, which produces high-accuracy HiFi reads.
    - Library: Refers to the DNA prepared for sequencing on that instrument.

  7. Cyclone normal long library

    - Cyclone: Not a widely recognized term in the sequencing literature, which suggests a proprietary method or service offered by BGI. It possibly refers to CycloneSEQ, BGI's nanopore-based long-read platform, though the service listing itself does not say so.
    - Normal long library: Likely indicates a long-read sequencing library (similar to those used for PacBio or Nanopore sequencing) prepared with a standard, default protocol for general long-read projects.
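The barcoding described in item 5 can be illustrated with a small demultiplexing sketch: each read is assigned to the sample whose barcode best matches the start of the read. The barcode sequences, the sample names, and the one-mismatch tolerance below are hypothetical values chosen for illustration. Production tools (for PacBio data, the vendor's `lima` demultiplexer) use alignment-based scoring and handle barcodes at both ends of a read, which this sketch does not attempt.

```python
from collections import defaultdict

# Hypothetical barcodes for illustration only; real kits define their own sequences.
BARCODES = {"sample_A": "ACGTAC", "sample_B": "TGCATG"}


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read: str, barcodes: dict[str, str], max_mismatches: int = 1):
    """Return the sample whose barcode best matches the read prefix.

    Returns None when no barcode is within max_mismatches or the best match is tied.
    """
    best_sample, best_dist, tied = None, max_mismatches + 1, False
    for sample, barcode in barcodes.items():
        prefix = read[:len(barcode)]
        if len(prefix) < len(barcode):
            continue  # read too short to contain this barcode
        dist = hamming(prefix, barcode)
        if dist < best_dist:
            best_sample, best_dist, tied = sample, dist, False
        elif dist == best_dist and dist <= max_mismatches:
            tied = True
    return None if tied else best_sample


def demultiplex(reads, barcodes, max_mismatches: int = 1) -> dict[str, list[str]]:
    """Group reads by sample and strip the barcode from assigned reads."""
    bins = defaultdict(list)
    for read in reads:
        sample = assign_sample(read, barcodes, max_mismatches)
        if sample is None:
            bins["unassigned"].append(read)
        else:
            bins[sample].append(read[len(barcodes[sample]):])
    return dict(bins)


if __name__ == "__main__":
    reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "ACGTAAGGGTTT", "GGGGGGGGGGGG"]
    for sample, sample_reads in demultiplex(reads, BARCODES).items():
        print(sample, sample_reads)
```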

Summary

* PacBio Sequel 20Kb and 10Kb: Long-read sequencing options for microbial genomes, with average read lengths of 20Kb and 10Kb, respectively.
* <=800bp: Short-read or targeted sequencing, possibly chosen for high accuracy in specific applications.
* Nanopore (Microorganism): Long-read sequencing from Oxford Nanopore, tailored for microbial genomes.
* PacBio barcode library (Microorganism): Barcoded sequencing library preparation for multiplexing microbial samples.
* PacBio Revio library: A library prepared for sequencing on PacBio's Revio system.
* Cyclone normal long library: Likely a BGI-specific or proprietary long-read sequencing library preparation method.
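When choosing between the 20Kb and 10Kb options, a rough depth estimate can help: expected depth is the total number of sequenced bases divided by the genome size. The sketch below applies that formula; the 5 Mb genome size and the 50x target depth are assumed values for illustration, not figures from any service description.

```python
import math


def expected_depth(num_reads: int, mean_read_length: float, genome_size: int) -> float:
    """Average sequencing depth = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * mean_read_length / genome_size


def reads_needed(target_depth: float, mean_read_length: float, genome_size: int) -> int:
    """Number of reads required to reach target_depth, rounded up."""
    if mean_read_length <= 0:
        raise ValueError("mean_read_length must be positive")
    return math.ceil(target_depth * genome_size / mean_read_length)


if __name__ == "__main__":
    genome_size = 5_000_000  # assumed microbial genome size in base pairs
    target_depth = 50        # assumed target depth
    for mean_length in (20_000, 10_000):
        n = reads_needed(target_depth, mean_length, genome_size)
        print(f"mean read length {mean_length} bp: about {n} reads for {target_depth}x")
```

Halving the mean read length doubles the number of reads needed for the same depth, which is one way to see the balance between throughput and read length.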

Comparison of the accuracy of three popular sequencing technologies: PacBio, Nanopore, and Illumina.

  • PacBio (Pacific Biosciences)

    • Technology: Single Molecule, Real-Time (SMRT) Sequencing
    • Read Length: Long reads, often exceeding 10,000 base pairs, with some reads over 100,000 base pairs.
    • Accuracy:
      • Raw Read Accuracy: Approximately 85-90% for single-pass reads
      • Consensus Accuracy: Greater than 99.9% after error correction through multiple reads of the same region
    • Strengths:
      • Excellent for detecting structural variants and large insertions/deletions.
      • High-quality assembly of genomes with complex regions.
    • Limitations:
      • Higher error rates in raw reads compared to Illumina.
      • More expensive per base compared to other technologies.
  • Nanopore (Oxford Nanopore Technologies)

    • Technology: Nanopore Sequencing
    • Read Length: Ultra-long reads, theoretically limited only by the length of the DNA molecule, with some reads over 2 million base pairs.
    • Accuracy:
      • Raw Read Accuracy: Approximately 90-95%
      • Consensus Accuracy: Greater than 99% with sufficient coverage and error correction
    • Strengths:
      • Ability to produce very long reads.
      • Devices ranging from the portable MinION to the high-throughput PromethION.
    • Limitations:
      • Higher raw read error rate compared to Illumina.
      • Requires high coverage for accurate consensus sequences.
  • Illumina

    • Technology: Sequencing by Synthesis (SBS)
    • Read Length: Short reads, typically 150-300 base pairs.
    • Accuracy:
      • Raw Read Accuracy: Approximately 99% to 99.9%
      • Consensus Accuracy: Very high due to low error rates in raw reads
    • Strengths:
      • Extremely high accuracy and throughput.
      • Cost-effective for large-scale projects.
    • Limitations:
      • Short read length can make it challenging to resolve complex regions of the genome.
      • Limited ability to detect large structural variants.
  • Summary

    • PacBio: Best for long-read sequencing with high consensus accuracy after error correction, ideal for complex genomes and structural variant analysis.
    • Nanopore: Offers ultra-long reads and portable sequencing options, with improving accuracy, making it versatile for various applications.
    • Illumina: Provides the highest raw read accuracy and throughput, well suited to applications requiring short reads and high precision.

Each technology has its unique strengths and is chosen based on the specific requirements of the sequencing project.
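The gap between raw and consensus accuracy in the comparison above can be illustrated with a toy model. Assume each of n independent observations of a base is wrong with probability p, and the consensus is wrong only when a strict majority of observations is wrong. This is a deliberately simplified, pessimistic sketch: real consensus callers use probabilistic models, and real errors are not fully independent, so it should not be read as how any specific tool works. Under these assumptions, with p = 0.10, three observations already reduce the error to 0.028.

```python
from math import comb


def majority_error(p: float, n: int) -> float:
    """Probability that more than half of n independent observations are wrong.

    Each observation is wrong with probability p. Use odd n to avoid ties.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    k_min = n // 2 + 1  # smallest number of wrong observations that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


if __name__ == "__main__":
    raw_error = 0.10  # corresponds to the ~90% raw read accuracy quoted above
    for n in (1, 3, 5, 9, 15):
        err = majority_error(raw_error, n)
        print(f"{n:2d} observations: consensus error about {err:.2e}")
```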

© 2023 XGenes.com