Tags: processing, scripts
Designing a probe for bioinformatics applications typically involves the creation of DNA or RNA sequences that can specifically hybridize to target nucleic acids in a sample. This process is commonly used for applications such as DNA microarrays, fluorescence in situ hybridization (FISH), and qPCR. Here are the general steps involved in designing a probe for bioinformatics:
Define the target sequence: Identify the specific nucleic acid sequence (DNA or RNA) that you want to detect or analyze.
Obtain reference sequences: Collect reference sequences related to the target sequence from databases like GenBank, RefSeq, or Ensembl.
Select target regions: Choose specific regions within the target sequence that will allow for specific and sensitive detection. These regions should have unique characteristics, such as a conserved sequence or a single nucleotide polymorphism (SNP).
Design probe sequence: Generate the complementary probe sequence based on the selected target region. The length of the probe sequence should be appropriate for the intended application (e.g., 18-30 nucleotides for qPCR, 50-70 nucleotides for FISH).
Evaluate probe properties: Analyze the probe sequence for properties like melting temperature (Tm), GC content, and potential secondary structures. Ensure that these properties are within acceptable ranges for the intended application.
./BaitsTools/baitstools.rb checkbaits -i BLAST_2_NCCRmiRNA.csv -L 25 -c -N -C -n0.0 -x100.0 -q67.0 -z76.0 -T DNA-DNA > BLAST_2_NCCRmiRNA_filtered_by_Tm.txt
../BaitsTools/baitstools.rb checkbaits -i out-baits.fa -L 25 -c -N -C -n0.0 -x100.0 -q67.0 -z76.0 -T DNA-DNA > probes_filtered_by_Tm.txt
# BLAST_2_NCCRmiRNA_filtered_by_Tm.txt contains 200 records.
# The output file out-filtered-baits.fa also contains 200 records.
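The properties named in this step can be spot-checked independently of BaitsTools. The following is a minimal sketch, not the BaitsTools calculation: the function name probe_stats is hypothetical, and the Tm column uses the basic GC-based approximation Tm = 64.9 + 41 * (G + C - 16.4) / N, which ignores salt and formamide and is therefore not expected to reproduce the values that the -q/-z filter sees.

```shell
# Hypothetical helper: read FASTA on stdin and print one line per record:
#   id <TAB> length <TAB> GC percent <TAB> approximate Tm (basic GC formula, an assumption)
probe_stats() {
  awk '
    function report(   n, gc, s) {
      if (id == "") return
      s = toupper(seq)
      n = length(s)
      gc = gsub(/[GC]/, "", s)   # gsub returns the number of G/C characters removed
      if (n > 0)
        printf "%s\t%d\t%.1f\t%.1f\n", id, n, 100 * gc / n, 64.9 + 41 * (gc - 16.4) / n
    }
    /^>/ { report(); id = substr($1, 2); seq = ""; next }   # new record: flush the previous one
    { gsub(/[ \t\r]/, ""); seq = seq $0 }                   # sequence lines may be wrapped
    END { report() }
  '
}
```

It could be run as `probe_stats < out-filtered-baits.fa` to eyeball length and GC content before moving on.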
Cluster similar probes: Group probes with high sequence similarity to reduce redundancy and save costs. General-purpose algorithms such as hierarchical or k-means clustering can be used; the command below instead uses USEARCH's greedy centroid-based clustering at a 90% identity threshold (-id 0.90) and keeps one representative (centroid) per cluster. Collapsing highly similar probes this way streamlines the probe set without sacrificing coverage of the target region.
usearch -cluster_smallmem out-filtered-baits.fa -id 0.90 -centroids out-filtered-baits_clustered.fa
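To see how much the clustering step reduced the probe set, the record counts before and after can be compared. This is a small sketch with hypothetical helper names (count_fasta_records, clustering_summary); it only counts FASTA header lines and makes no assumption about USEARCH itself.

```shell
# Hypothetical helper: count FASTA records by counting header lines.
count_fasta_records() {
  # grep -c exits non-zero when the count is 0, so "|| true" keeps "set -e" scripts alive
  grep -c '^>' "$1" || true
}

# Hypothetical helper: report how many probes the clustering step removed.
clustering_summary() {
  before=$(count_fasta_records "$1")
  after=$(count_fasta_records "$2")
  echo "before=$before after=$after removed=$((before - after))"
}
```

For the files above this would be called as `clustering_summary out-filtered-baits.fa out-filtered-baits_clustered.fa`.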
Check for specificity: Perform in silico analyses (e.g., BLAST search) to ensure that the designed probe does not have significant homology to other sequences in the genome or transcriptome, which could lead to false-positive results.
# -- simple version, it works well! --
#------------------------------------------------------------------------------
# Program                               Word size   Low-complexity filter   Expect value
# Standard Nucleotide BLAST             11          On (DUST)               10
# Search for short/near exact matches   7           Off                     1000
blastn -task blastn-short -db /ref/Homo_sapiens/Ensembl/GRCh38/Sequence/WholeGenomeFasta/genome.fa -query 2_clustered_with_0.90.fasta -out 2_clustered_with_0.90_on-human.blastn -evalue 0.1 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1
blastn -task blastn-short -db /ref/Homo_sapiens/Ensembl/GRCh38/Sequence/WholeGenomeFasta/genome.fa -query 1_filtered_by_Tm.fasta -out 1_filtered_by_Tm_on-human.blastn -evalue 0.1 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1
blastn -task blastn-short -db /ref/Homo_sapiens/Ensembl/GRCh38/Sequence/WholeGenomeFasta/genome.fa -query BLAST_2_NCCRmiRNA.csv -out BLAST_2_NCCRmiRNA_on-human_evalue0.1.blastn -evalue 0.1 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1
blastn -task blastn-short -db /ref/Homo_sapiens/Ensembl/GRCh38/Sequence/WholeGenomeFasta/genome.fa -query BLAST_2_NCCRmiRNA.csv -out BLAST_2_NCCRmiRNA_on-human_evalue1.blastn -evalue 1 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1
#qseqid chr pident length mismatch gapopen qstart qend sstart send evalue bitscore
# -- complicated version, it does not work well! --
blastn -db /ref/Homo_sapiens/Ensembl/GRCh38/Sequence/WholeGenomeFasta/genome.fa -query out-filtered-baits_clustered.fa -out blast_result_human.txt -evalue 1 -num_threads 15 -outfmt 6 -strand both
paste f1 f2 f3_4 f5 f6_7 f8 > input_baitfilter.txt
BaitFilter -m blast-a --ref-blast-db /ref/Homo_sapiens/Ensembl/GRCh38/Sequence/WholeGenomeFasta/genome.fa -i input_baitfilter.txt --blast-first-hit-evalue 0.00000001 --blast-second-hit-evalue 0.00001 -o baits_excluding_human.out
mv blast_result.txt blast_result_human.txt
cut -f1 blast_result_human.txt > blast_result_human_f1.txt
# Count the hits per bait with 'sort | uniq -c' to decide which baits to exclude.
# Number of distinct baits that have at least one hit:
sort -u blast_result_human_f1.txt | wc -l
# Hit count per bait:
sort blast_result_human_f1.txt | uniq -c > blast_result_human_counts.txt
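Once the hits are counted, the complementary question is which baits have no hit against the human genome at all. The sketch below assumes that "exclude the baits" means dropping every bait whose ID appears in column 1 of the tabular BLAST output; the function name baits_without_hits is hypothetical.

```shell
# Hypothetical helper: print the IDs of baits that have no hit in a BLAST tabular file.
#   $1: BLAST output in -outfmt 6 (query ID in column 1)
#   $2: FASTA file with the baits that were used as the query
baits_without_hits() {
  awk '
    FILENAME == ARGV[1] { hit[$1] = 1; next }   # first file: remember every query ID with a hit
    /^>/ {                                      # second file: check each FASTA header
      id = substr($1, 2)                        # ID = header up to the first whitespace, without ">"
      if (!(id in hit)) print id
    }
  ' "$1" "$2"
}
```

A call such as `baits_without_hits blast_result_human.txt out-filtered-baits_clustered.fa > baits_no_human_hit.txt` would write the surviving IDs to a file (the output file name here is only an example).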
Check for self-matching: Assess the designed probe for self-complementarity, which could lead to the formation of hairpins or dimers that may decrease the probe's specificity and sensitivity. Use tools like Primer3, OligoAnalyzer, or UNAFold to identify and minimize self-complementarity in the probe sequence.
makeblastdb -in 2_clustered_with_0.90.fasta -dbtype nucl
makeblastdb -in 1_filtered_by_Tm.fasta -dbtype nucl
# NOTE: '-strand minus' may be wrong here and may need to be 'both'. Allowed values for -strand: both, minus, plus.
#qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
blastn -task blastn-short -db 2_clustered_with_0.90.fasta -query 2_clustered_with_0.90.fasta -out 2_clustered_with_0.90_self-comp.blastn -evalue 0.1 -num_threads 15 -outfmt 6 -strand minus -max_target_seqs 1
blastn -task blastn-short -db 1_filtered_by_Tm.fasta -query 1_filtered_by_Tm.fasta -out 1_filtered_by_Tm_self-comp.blastn -evalue 0.1 -num_threads 15 -outfmt 6 -strand minus -max_target_seqs 1
makeblastdb -in 2_clustered_with_0.90.fasta -dbtype nucl
blastn -db 2_clustered_with_0.90.fasta -query 2_clustered_with_0.90.fasta -out 2_clustered_with_0.90_self-comp.blastn -evalue 1e-10 -num_threads 15 -outfmt 6 -strand minus # -max_target_seqs 1
blastn -db ligated_probes_clustered.fasta -query ligated_probes_clustered.fasta -out probes_clustered_on_human_self-comp.blastn -evalue 1e-10 -num_threads 15 -outfmt 6 -strand both
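When the probes-versus-probes search is run with -strand both, every probe trivially hits itself on the plus strand, so only minus-strand hits say anything about self-complementarity. The sketch below relies on the fact that tabular BLAST output reports minus-strand hits with sstart greater than send; the function name and the two labels are hypothetical.

```shell
# Hypothetical helper: keep only minus-strand hits from a probes-vs-probes BLAST
# (-outfmt 6) and label them. Columns used: 1 qseqid, 2 sseqid, 4 length,
# 9 sstart, 10 send, 11 evalue.
selfcomp_hits() {
  awk -F '\t' -v OFS='\t' '
    ($9 + 0) > ($10 + 0) {                       # sstart > send: hit is on the minus strand
      kind = ($1 == $2) ? "self_complementary" : "cross_probe"
      print kind, $1, $2, $4, $11
    }
  ' "$1"
}
```

Rows labelled self_complementary point to a probe that matches its own reverse complement (possible hairpin or self-dimer); rows labelled cross_probe point to two different probes that may pair with each other.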
Validate probe performance: Test the designed probe experimentally to confirm its specificity, sensitivity, and overall performance in the intended application.
Analyze and interpret data: Use the designed probe in the bioinformatics application and analyze the resulting data to draw conclusions about the target sequence or biological system under investigation.
~/Tools/csv2xls-0.4/csv_to_xls.py 1_filtered_by_Tm_on-human_re.blastn 1_filtered_by_Tm_self-comp.blastn -d$'\t' -o 1_filtered_qualitycontrol.xls
~/Tools/csv2xls-0.4/csv_to_xls.py 2_clustered_with_0.90_on-human.fasta.blastn 2_clustered_with_0.90_self-comp.blastn -d$'\t' -o 2_clustered_with_0.90_qualitycontrol.xls
~/Tools/csv2xls-0.4/csv_to_xls.py 1_filtered_by_Tm.txt 2_clustered_with_0.90.txt -d$'\t' -o probes.xls
~/Tools/csv2xls-0.4/csv_to_xls.py BLAST_2_NCCRmiRNA_on-human_evalue0.1.blastn -d$'\t' -o BLAST_2_NCCRmiRNA_homology_to_human.xls
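# For most applications, an e-value cutoff of 1e-5 or 1e-6 is considered appropriate. This threshold balances specificity against tolerance for some similarity between the query sequence and the target genome. If you require higher stringency, use a lower e-value cutoff, such as 1e-10 or 1e-20. Because the e-value depends on query length and database size, very short probes may never reach such low e-values against a whole genome, which is presumably why the short-probe searches above use -evalue 0.1 or 1.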
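An e-value cutoff can also be applied after the search, which makes it easy to compare several thresholds on one BLAST run. This is a minimal sketch assuming the standard 12-column -outfmt 6 layout (e-value in column 11); the function name filter_by_evalue and the default of 1e-5 are illustrative choices, not part of BLAST.

```shell
# Hypothetical helper: keep only BLAST tabular rows whose e-value is at or below a cutoff.
#   $1: BLAST output in -outfmt 6 (e-value in column 11)
#   $2: maximum e-value to keep (defaults to 1e-5 if omitted)
filter_by_evalue() {
  # "+ 0" forces a numeric comparison, so values such as 1e-10 are not compared as strings
  awk -F '\t' -v max="${2:-1e-5}" '($11 + 0) <= (max + 0)' "$1"
}
```

For example, `filter_by_evalue BLAST_2_NCCRmiRNA_on-human_evalue1.blastn 0.1` would keep the rows of the -evalue 1 run that also pass a cutoff of 0.1.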
© 2023 XGenes.com Impressum