Designing Specific and Sensitive Nucleic Acid Probes for Bioinformatics Applications

gene_x

Tags: processing, scripts

Designing a probe typically means creating DNA or RNA sequences that hybridize specifically to target nucleic acids in a sample. Such probes are used in applications like DNA microarrays, fluorescence in situ hybridization (FISH), and qPCR. The general steps are:

  1. Define the target sequence: Identify the specific nucleic acid sequence (DNA or RNA) that you want to detect or analyze.

  2. Obtain reference sequences: Collect reference sequences related to the target sequence from databases like GenBank, RefSeq, or Ensembl.

  3. Select target regions: Choose specific regions within the target sequence that will allow for specific and sensitive detection. These regions should have unique characteristics, such as a conserved sequence or a single nucleotide polymorphism (SNP).

  4. Design probe sequence: Generate the complementary probe sequence based on the selected target region. The length of the probe sequence should be appropriate for the intended application (e.g., 18-30 nucleotides for qPCR, 50-70 nucleotides for FISH).
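
A probe that hybridizes to a given strand is the reverse complement of the selected region. The following is a minimal sketch of that step, assuming plain A/C/G/T input; the helper name `revcomp` and the example sequence are made up for illustration and are not part of any tool used here.

```shell
# Reverse-complement a DNA sequence (A<->T, C<->G); case is preserved.
# Hypothetical helper for illustration, not part of BaitsTools or BLAST.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Example target region (made-up 25-mer) and its complementary probe.
target="ATGGCCATTGTAATGGGCCGCTGAA"
probe=$(revcomp "$target")
echo "$probe"
```

Characters other than A, C, G, and T (for example N) pass through unchanged, so ambiguous bases would need separate handling.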

  5. Evaluate probe properties: Analyze the probe sequence for properties like melting temperature (Tm), GC content, and potential secondary structures. Ensure that these properties are within acceptable ranges for the intended application.

    # checkbaits filters candidate baits: -L 25 sets the bait length, -n/-x the GC% range (0-100, i.e. no GC filtering),
    # and -q/-z the melting-temperature range (67-76 °C) for DNA-DNA hybridization (-T).
    ./BaitsTools/baitstools.rb checkbaits -i BLAST_2_NCCRmiRNA.csv -L 25 -c -N -C -n0.0 -x100.0 -q67.0 -z76.0 -T DNA-DNA > BLAST_2_NCCRmiRNA_filtered_by_Tm.txt
    ../BaitsTools/baitstools.rb checkbaits -i out-baits.fa -L 25 -c -N -C -n0.0 -x100.0 -q67.0 -z76.0 -T DNA-DNA > probes_filtered_by_Tm.txt
    # BLAST_2_NCCRmiRNA_filtered_by_Tm.txt contains 200 records.
    # The output file out-filtered-baits.fa also contains 200 records.
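
For a quick sanity check of a single probe, GC content and a rough Tm can be estimated with awk. This is only a sketch: it uses the basic GC-based approximation Tm = 64.9 + 41 * (G + C - 16.4) / N, which ignores salt and probe concentration and is not the method BaitsTools applies, so the values will differ from the filter output above. The function names and the example sequence are hypothetical.

```shell
# Hypothetical helpers for a quick sanity check; not the BaitsTools algorithm.

# GC percentage of a sequence, printed with one decimal place.
gc_content() {
    printf '%s\n' "$1" | awk '{
        seq = toupper($0)
        n = length(seq)
        gc = gsub(/[GC]/, "", seq)      # gsub returns the number of G/C bases removed
        if (n > 0) printf "%.1f\n", 100 * gc / n
    }'
}

# Rough Tm in degrees Celsius from the basic GC-based approximation
# Tm = 64.9 + 41 * (G + C - 16.4) / N.
basic_tm() {
    printf '%s\n' "$1" | awk '{
        seq = toupper($0)
        n = length(seq)
        gc = gsub(/[GC]/, "", seq)
        if (n > 0) printf "%.1f\n", 64.9 + 41 * (gc - 16.4) / n
    }'
}

probe="ACGTACGTACGTACGTACGT"   # made-up 20-mer with 10 G/C bases
echo "GC%: $(gc_content "$probe")  Tm: $(basic_tm "$probe")"
```

Nearest-neighbor models, as implemented in dedicated tools, are generally preferred for final probe selection; the approximation here is only meant to catch obvious outliers.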
    
  6. Cluster similar probes: Group probes with high sequence similarity to reduce redundancy and save costs. This can be achieved with hierarchical clustering or with greedy centroid-based clustering, as in the USEARCH command below, which clusters at 90% identity (-id 0.90) and keeps one centroid sequence per cluster. Collapsing highly similar probes streamlines the probe set without sacrificing coverage of the target region.

    usearch -cluster_smallmem out-filtered-baits.fa -id 0.90 -centroids out-filtered-baits_clustered.fa
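
To see how much the clustering reduced the probe set, the FASTA records can be counted before and after. A small sketch, assuming the file names from the command above and one header line per record; `count_fasta` is a hypothetical helper.

```shell
# Count the records in a FASTA file by counting header lines.
count_fasta() {
    grep -c '^>' "$1"
}

# File names follow the usearch command above; the check is skipped if they are absent.
before=out-filtered-baits.fa
after=out-filtered-baits_clustered.fa
if [ -f "$before" ] && [ -f "$after" ]; then
    echo "probes before clustering: $(count_fasta "$before")"
    echo "centroids after clustering: $(count_fasta "$after")"
fi
```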
    
  7. Check for specificity: Perform in silico analyses (e.g., BLAST search) to ensure that the designed probe does not have significant homology to other sequences in the genome or transcriptome, which could lead to false-positive results.

    # -- simple version, it works well! --
    #------------------------------------------------------------------------------
    #Program                              Word size  Filter      Expect value
    #Standard Nucleotide BLAST            11         On (DUST)   10
    #Search for short/near exact matches  7          Off         1000
    blastn -task blastn-short -db /ref/Homo_sapiens/Ensembl/GRCh38/Sequence/WholeGenomeFasta/genome.fa -query 2_clustered_with_0.90.fasta -out 2_clustered_with_0.90_on-human.blastn -evalue 0.1 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1
    blastn -task blastn-short -db /ref/Homo_sapiens/Ensembl/GRCh38/Sequence/WholeGenomeFasta/genome.fa -query 1_filtered_by_Tm.fasta -out 1_filtered_by_Tm_on-human.blastn -evalue 0.1 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1
    blastn -task blastn-short -db /ref/Homo_sapiens/Ensembl/GRCh38/Sequence/WholeGenomeFasta/genome.fa -query BLAST_2_NCCRmiRNA.csv -out BLAST_2_NCCRmiRNA_on-human_evalue0.1.blastn -evalue 0.1 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1
    blastn -task blastn-short -db /ref/Homo_sapiens/Ensembl/GRCh38/Sequence/WholeGenomeFasta/genome.fa -query BLAST_2_NCCRmiRNA.csv -out BLAST_2_NCCRmiRNA_on-human_evalue1.blastn -evalue 1 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1
    #qseqid chr pident  length  mismatch    gapopen qstart  qend    sstart  send    evalue  bitscore
    
    # -- complicated version, it does not work well! --
    blastn -db /ref/Homo_sapiens/Ensembl/GRCh38/Sequence/WholeGenomeFasta/genome.fa -query out-filtered-baits_clustered.fa -out blast_result_human.txt -evalue 1 -num_threads 15 -outfmt 6 -strand both
    paste f1 f2 f3_4 f5 f6_7 f8 > input_baitfilter.txt
    BaitFilter -m blast-a --ref-blast-db /ref/Homo_sapiens/Ensembl/GRCh38/Sequence/WholeGenomeFasta/genome.fa -i input_baitfilter.txt --blast-first-hit-evalue 0.00000001 --blast-second-hit-evalue 0.00001 -o baits_excluding_human.out
    mv blast_result.txt blast_result_human.txt
    cut -f1 blast_result_human.txt > blast_result_human_f1.txt
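    # Count hits per bait with 'sort | uniq -c'; baits with many hits are candidates for exclusion.
    # The first command prints the number of distinct baits with at least one hit.
    sort blast_result_human_f1.txt | uniq -c | wc -l
    sort blast_result_human_f1.txt | uniq -c > blast_result_human_counts.txt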
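
The tabular output (-outfmt 6) can also be screened by identity and alignment length rather than by hit count alone. The sketch below assumes the default 12 columns listed in the header comment above; the helper name `offtarget_probes` and the thresholds (at least 90% identity over at least 20 bases) are illustrative assumptions, not values from the original workflow.

```shell
# Hypothetical helper: list probes that have at least one hit with
# identity >= min_pident (column 3) over an alignment of >= min_len bases (column 4).
offtarget_probes() {
    # $1 = BLAST tabular file, $2 = min percent identity, $3 = min alignment length
    awk -F '\t' -v p="$2" -v l="$3" '$3 >= p && $4 >= l { print $1 }' "$1" | sort -u
}

# File name follows the blastn commands above; thresholds are illustrative only.
hits=2_clustered_with_0.90_on-human.blastn
if [ -f "$hits" ]; then
    offtarget_probes "$hits" 90 20 > probes_with_human_hits.txt
fi
```

Appropriate thresholds depend on probe length and hybridization conditions, so they should be tuned rather than taken from this example.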
    
  8. Check for self-matching: Assess the designed probe for self-complementarity, which could lead to the formation of hairpins or dimers that may decrease the probe's specificity and sensitivity. Use tools like Primer3, OligoAnalyzer, or UNAFold to identify and minimize self-complementarity in the probe sequence.

    # Build a BLAST database from each probe set so the probes can be searched against themselves.
    makeblastdb -in 2_clustered_with_0.90.fasta -dbtype nucl
    makeblastdb -in 1_filtered_by_Tm.fasta -dbtype nucl
    # NOTE: '-strand minus' searches only the reverse complement of each query, so every hit marks a complementary stretch (potential hairpin or dimer).
    # With '-strand both', each probe also matches itself on the plus strand. Options: -strand <String, `both', `minus', `plus'>
    #qseqid sseqid  pident  length  mismatch    gapopen qstart  qend    sstart  send    evalue  bitscore
    blastn -task blastn-short -db 2_clustered_with_0.90.fasta -query 2_clustered_with_0.90.fasta -out 2_clustered_with_0.90_self-comp.blastn -evalue 0.1 -num_threads 15 -outfmt 6 -strand minus -max_target_seqs 1
    blastn -task blastn-short -db 1_filtered_by_Tm.fasta -query 1_filtered_by_Tm.fasta -out 1_filtered_by_Tm_self-comp.blastn -evalue 0.1 -num_threads 15 -outfmt 6 -strand minus -max_target_seqs 1
    # Stricter variant with the default task and -evalue 1e-10; note that the first command overwrites the blastn-short output written above.
    blastn -db 2_clustered_with_0.90.fasta -query 2_clustered_with_0.90.fasta -out 2_clustered_with_0.90_self-comp.blastn -evalue 1e-10 -num_threads 15 -outfmt 6 -strand minus  # -max_target_seqs 1
    blastn -db ligated_probes_clustered.fasta -query ligated_probes_clustered.fasta -out probes_clustered_on_human_self-comp.blastn -evalue 1e-10 -num_threads 15 -outfmt 6 -strand both
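
When probes are searched against their own set, it helps to separate hits of a probe against itself from hits between two different probes. A sketch under the assumption that the file is -outfmt 6 output with qseqid in column 1 and sseqid in column 2; the helper names are hypothetical.

```shell
# Hypothetical helpers: split probe-vs-probe hits (BLAST -outfmt 6) into
# self hits (query == subject, possible hairpin or self-dimer) and
# cross hits (query != subject, possible dimer between two probes).
self_hits()  { awk -F '\t' '$1 == $2' "$1"; }
cross_hits() { awk -F '\t' '$1 != $2' "$1"; }

# File name follows the blastn commands above; skipped if the file is absent.
f=2_clustered_with_0.90_self-comp.blastn
if [ -f "$f" ]; then
    echo "self-complementary hits: $(self_hits "$f" | wc -l)"
    echo "cross-probe hits: $(cross_hits "$f" | wc -l)"
fi
```

This interpretation only holds for searches restricted to the reverse-complement strand; in a search of both strands, each probe trivially matches itself.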
    
  9. Validate probe performance: Test the designed probe experimentally to confirm its specificity, sensitivity, and overall performance in the intended application.

  10. Analyze and interpret data: Use the designed probe in the bioinformatics application and analyze the resulting data to draw conclusions about the target sequence or biological system under investigation.

    ~/Tools/csv2xls-0.4/csv_to_xls.py 1_filtered_by_Tm_on-human_re.blastn 1_filtered_by_Tm_self-comp.blastn -d$'\t' -o 1_filtered_qualitycontrol.xls
    ~/Tools/csv2xls-0.4/csv_to_xls.py 2_clustered_with_0.90_on-human.fasta.blastn 2_clustered_with_0.90_self-comp.blastn -d$'\t' -o 2_clustered_with_0.90_qualitycontrol.xls
    ~/Tools/csv2xls-0.4/csv_to_xls.py 1_filtered_by_Tm.txt 2_clustered_with_0.90.txt -d$'\t' -o probes.xls
    ~/Tools/csv2xls-0.4/csv_to_xls.py BLAST_2_NCCRmiRNA_on-human_evalue0.1.blastn -d$'\t' -o BLAST_2_NCCRmiRNA_homology_to_human.xls
    # For most applications, an e-value cutoff of 1e-5 or 1e-6 is considered appropriate: it balances specificity against tolerance for some divergence between the query sequence and the target genome. For higher stringency, use a lower cutoff such as 1e-10 or 1e-20. E-values depend on query length and database size, so for short probes such strict cutoffs retain only near-perfect matches, which is consistent with the 0.1 and 1 cutoffs used in the short-probe searches above.
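
Tabular BLAST output has no header row, which makes the converted spreadsheets harder to read. The sketch below prepends the 12 default -outfmt 6 column names before conversion; `add_blast_header` and the output file name are hypothetical, and whether the spreadsheet converter treats the first row specially is not something this example relies on.

```shell
# Hypothetical helper: prepend the 12 default -outfmt 6 column names to a
# BLAST tabular file so the spreadsheet has a header row.
add_blast_header() {
    # $1 = BLAST tabular input, $2 = output file with header
    {
        printf 'qseqid\tsseqid\tpident\tlength\tmismatch\tgapopen\tqstart\tqend\tsstart\tsend\tevalue\tbitscore\n'
        cat "$1"
    } > "$2"
}

# File name follows the commands above; skipped if the file is absent.
blast_tsv=1_filtered_by_Tm_self-comp.blastn
if [ -f "$blast_tsv" ]; then
    add_blast_header "$blast_tsv" 1_filtered_by_Tm_self-comp_with_header.blastn
fi
```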
    


© 2023 XGenes.com Impressum