gene_x 0 like s 20 view s
Tags: pipeline, DNA-seq
Could you please help me to process these data (Project: X101SC25015922-Z01-J001)?
For you information,
1. Please compare the data with the AYE strain (CU459141) across the following conditions:
a) AYE-S
b) AYE-Q
c) AYE-WT on Tig4
d) AYE-craA on Tig4
e) AYE-craA-1 on Cm200
f) AYE-craA-2 on Cm200
2. The "clinical" sample refers to a clinical isolate of Acinetobacter baumannii. I’m unsure which reference genome would be most appropriate for comparison in this case. Can we use lab strains (CP059040, CU459141, and CP079931) as reference genome for comparison?
Processed the genome sequence for project X101SC24115801-Z01-J001?
1. Kindly compare the data with the ATCC 19606 strain (CP059040) under the following conditions:
a) adeABadeIJ (knockout of adeA, adeB, adeI, and adeJ, please confirm whether these genes are successfully knocked out.)
b) adeIJK (knockout of adeI, adeJ, and adeK, please confirm whether these genes are successfully knocked out.)
c) CM1
d) CM2
The "HF" sample may also refer to a clinical isolate of Enterobacter hormaechei.
2. The "HF" sample refers to a clinical isolate of Acinetobacter baumannii. I’m unsure which reference genome would be most appropriate for comparison in this case. Can we use lab strains (CP059040, CU459141, and CP079931) as reference genome for comparison?
Project Data_Tam_DNAseq_2025_AYE
86e4016c902a1cd23a2190415425e641 01.RawData/AYE-WTonTig4/AYE-WTonTig4_1.fq.gz
554eb44ae261312039929f0991582111 01.RawData/AYE-WTonTig4/AYE-WTonTig4_2.fq.gz
ce004b0d7135bce80f34bd6bac3e89e7 01.RawData/AYE-Q/AYE-Q_1.fq.gz
bddc7ced051a2167a5a8341332d7423a 01.RawData/AYE-Q/AYE-Q_2.fq.gz
227d93b8a762185d5dcd1e4975041491 01.RawData/AYE-S/AYE-S_1.fq.gz
f098c9a8579bf5729427dc871225a290 01.RawData/AYE-S/AYE-S_2.fq.gz
78e08dd090d89330b1021ce42fb09baa 01.RawData/clinical/clinical_1.fq.gz
2346fef1d896ef0924d2ec88db51cade 01.RawData/clinical/clinical_2.fq.gz
4c07494505caf22f70edb54692bcaca2 01.RawData/AYE-craA-1onCm200/AYE-craA-1onCm200_1.fq.gz
52944e395004dc11758d422690bda168 01.RawData/AYE-craA-1onCm200/AYE-craA-1onCm200_2.fq.gz
92b498ed7465645ca00bbc945c514fe2 01.RawData/AYE-craAonTig4/AYE-craAonTig4_1.fq.gz
fd9d670942973e6760d6dd78f4ee852a 01.RawData/AYE-craAonTig4/AYE-craAonTig4_2.fq.gz
375f1e3efb60571ffd457b3cb1e64a84 01.RawData/AYE-craA-2onCm200/AYE-craA-2onCm200_1.fq.gz
041c08f4c45f1fabd129fc10500c6582 01.RawData/AYE-craA-2onCm200/AYE-craA-2onCm200_2.fq.gz
c129aa9a208ca47db10bb04e54c096d7 02.Report_X101SC25015922-Z01-J001.zip
md5sum 01.RawData/AYE-WTonTig4/AYE-WTonTig4_1.fq.gz > MD5.txt_
md5sum 01.RawData/AYE-WTonTig4/AYE-WTonTig4_2.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-Q/AYE-Q_1.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-Q/AYE-Q_2.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-S/AYE-S_1.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-S/AYE-S_2.fq.gz >> MD5.txt_
md5sum 01.RawData/clinical/clinical_1.fq.gz >> MD5.txt_
md5sum 01.RawData/clinical/clinical_2.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-craA-1onCm200/AYE-craA-1onCm200_1.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-craA-1onCm200/AYE-craA-1onCm200_2.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-craAonTig4/AYE-craAonTig4_1.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-craAonTig4/AYE-craAonTig4_2.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-craA-2onCm200/AYE-craA-2onCm200_1.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-craA-2onCm200/AYE-craA-2onCm200_2.fq.gz >> MD5.txt_
md5sum 02.Report_X101SC25015922-Z01-J001.zip >> MD5.txt_
ce004b0d7135bce80f34bd6bac3e89e7 AYE-Q_1.fq.gz
bddc7ced051a2167a5a8341332d7423a AYE-Q_2.fq.gz
Data process according to http://xgenes.com/article/article-content/325/analysis-of-snps-indels-transposons-and-is-elements-in-5-a-baumannii-strains/
Call variant calling using snippy
ln -s ~/Tools/bacto/db/ .;
ln -s ~/Tools/bacto/envs/ .;
ln -s ~/Tools/bacto/local/ .;
cp ~/Tools/bacto/Snakefile .;
cp ~/Tools/bacto/bacto-0.1.json .;
cp ~/Tools/bacto/cluster.json .;
mkdir raw_data; cd raw_data;
# Note that the names must be ending with fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-S/AYE-S_1.fq.gz AYE-S_R1.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-S/AYE-S_2.fq.gz AYE-S_R2.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-Q/AYE-Q_1.fq.gz AYE-Q_R1.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-Q/AYE-Q_2.fq.gz AYE-Q_R2.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-WTonTig4/AYE-WTonTig4_1.fq.gz AYE-WT_on_Tig4_R1.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-WTonTig4/AYE-WTonTig4_2.fq.gz AYE-WT_on_Tig4_R2.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-craAonTig4/AYE-craAonTig4_1.fq.gz AYE-craA_on_Tig4_R1.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-craAonTig4/AYE-craAonTig4_2.fq.gz AYE-craA_on_Tig4_R2.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-craA-1onCm200/AYE-craA-1onCm200_1.fq.gz AYE-craA-1_on_Cm200_R1.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-craA-1onCm200/AYE-craA-1onCm200_2.fq.gz AYE-craA-1_on_Cm200_R2.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-craA-2onCm200/AYE-craA-2onCm200_1.fq.gz AYE-craA-2_on_Cm200_R1.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-craA-2onCm200/AYE-craA-2onCm200_2.fq.gz AYE-craA-2_on_Cm200_R2.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/clinical/clinical_1.fq.gz clinical_R1.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/clinical/clinical_2.fq.gz clinical_R2.fastq.gz
#download CU459141.gb from GenBank
mv ~/Downloads/sequence\(1\).gb db/CU459141.gb
#setting the following in bacto-0.1.json
"fastqc": false,
"taxonomic_classifier": false,
"assembly": true,
"typing_ariba": false,
"typing_mlst": true,
"pangenome": true,
"variants_calling": true,
"phylogeny_fasttree": true,
"phylogeny_raxml": true,
"recombination": false, (due to gubbins-error set false)
"genus": "Acinetobacter",
"kingdom": "Bacteria",
"species": "baumannii", (in both prokka and mykrobe)
"reference": "db/CU459141.gb"
mamba activate /home/jhuang/miniconda3/envs/bengal3_ac3
(bengal3_ac3) /home/jhuang/miniconda3/envs/snakemake_4_3_1/bin/snakemake --printshellcmds
#check if we need big calculation for including the clinical sample by checking mlst. TODO: send the mlst results to Tam. Next step by check vrap which complete isolate?
Download all S epidermidis genomes and identified all ST2 isolates from them!
#Acinetobacter baumannii Taxonomy ID: 470
#esearch -db nucleotide -query "txid470[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_470_ncbi.fasta
#python ~/Scripts/filter_fasta.py genome_470_ncbi.fasta complete_genome_470_ncbi.fasta #
# ---- Download related genomes from ENA ----
https://www.ebi.ac.uk/ena/browser/view/470
#Click "Sequence" and download "Counts" (13059) and "Taxon descendants count" (16091) if there is enough time! Downloading time points is 28.02.2025.
python ~/Scripts/filter_fasta.py ena_470_sequence.fasta complete_genome_470_ena_taxon_descendants_count.fasta #16091-->920
#python ~/Scripts/filter_fasta.py ena_470_sequence_Counts.fasta complete_genome_470_ena_Counts.fasta #xxx, 5.8G
Run vrap
#replace --virus to the specific taxonomy (e.g. Acinetobacter baumannii) --> change virus_user_db --> specific_bacteria_user_db
mv trimmed/clinical_trimmed*.fastq .
gzip *.fastq
ln -s ~/Tools/vrap/ .
mamba activate /home/jhuang/miniconda3/envs/vrap
vrap/vrap.py -1 clinical_trimmed_P_1.fastq.gz -2 clinical_trimmed_P_2.fastq.gz -o vrap_clinical --bt2idx=/home/jhuang/REFs/genome --host=/home/jhuang/REFs/genome.fa --virus=/home/jhuang/DATA/Data_Tam_DNAseq_2025_AYE/complete_genome_470_ena_taxon_descendants_count.fasta --nt=/mnt/nvme1n1p1/blast/nt --nr=/mnt/nvme1n1p1/blast/nr -t 100 -l 200 -g
Run second run of bengal3_ac3 without the clinical sample
mkdir results_with_clinical
mv variants results_with_clinical
mv roary results_with_clinical
mv fasttree results_with_clinical
mv raxml-ng results_with_clinical
mv snippy/clinical/ results_with_clinical/clinical_under_snippy
rm raw_data/clinical_*.fastq.gz
rm fastq/clinical_*.fastq
(bengal3_ac3) /home/jhuang/miniconda3/envs/snakemake_4_3_1/bin/snakemake --printshellcmds
Variant calling against AYE
Using spandx calling variants (almost the same results to the one from viral-ngs!)
mkdir ~/miniconda3/envs/spandx/share/snpeff-5.1-2/data/CU459141
cp db/CU459141.gb ~/miniconda3/envs/spandx/share/snpeff-5.1-2/data/CU459141/genes.gbk
vim ~/miniconda3/envs/spandx/share/snpeff-5.1-2/snpEff.config
#CU459141.chromosome : CU459141
/home/jhuang/miniconda3/envs/spandx/bin/snpEff build -c ~/miniconda3/envs/spandx/share/snpeff-5.1-2/snpEff.config -genbank CU459141 -v #-d
#00:00:02 Saving sequences for chromosome 'CU459141.1 ...
mamba activate /home/jhuang/miniconda3/envs/bengal3_ac3 /home/jhuang/miniconda3/envs/bengal3_ac3/bin/snpEff build -c ~/miniconda3/envs/spandx/share/snpeff-5.1-2/snpEff.config -genbank CU459141 -v
~/Scripts/genbank2fasta.py CU459141.gb
mv CU459141.gb_converted.fna CU459141.fasta #rename "CU459141.1 xxxxx" to "CU459141" in the fasta-file
ln -s /home/jhuang/Tools/spandx/ spandx
mamba activate /home/jhuang/miniconda3/envs/spandx
(spandx) nextflow run spandx/main.nf --fastq "trimmed/*_P_{1,2}.fastq" --ref CU459141.fasta --annotation --database CU459141 -resume
#DEBUG "ERROR_CHROMOSOME_NOT_FOUND"
cd Outputs/Master_vcf
sed 's/^CU459141/CU459141.1/' out.filtered.vcf > modified_input.vcf
/home/jhuang/miniconda3/envs/spandx/bin/snpEff ann -chr CU459141.1 out.filtered.vcf > annotated_variants.vcf
/home/jhuang/miniconda3/envs/spandx/bin/snpEff databases
Summarize all SNPs and Indels from the snippy result directory.
#Output: snippy_CP059040/snippy/summary_snps_indels.csv
# IMPORTANT_ADAPT the array isolates = ["AYE-S", "AYE-Q", "AYE-WT on Tig4", "AYE-craA on Tig4", "AYE-craA-1 on Cm200", "AYE-craA-2 on Cm200"]
python3 ~/Scripts/summarize_snippy_res.py snippy
grep -v "None,,,,,,None,None" summary_snps_indels.csv > summary_snps_indels_.csv
Additonal steps for DEBUG #ERROR_CHROMOSOME_NOT_FOUND in the generated SPANDx result: Rerun SNP_matrix.sh with /home/jhuang/miniconda3/envs/bengal3_ac3/bin/snpEff, rather than the snpEff from its own environment spandx!
# -- The syntax of /home/jhuang/miniconda3/envs/bengal3_ac3/bin/snpEff is different (see below) --
#cd ~/DATA/Data_Tam_DNAseq_2025_AYE/snippy/AYE-Q/reference
### snpEff build -c reference/snpeff.config -dataDir . -gff3 ref
#ref.chromosome : CU459141
### snpEff ann -noLog -noStats -no-downstream -no-upstream -no-utr -c reference/snpeff.config -dataDir . ref AYE-Q.filt.vcf > AYE-Q.vcf
#(OUTDATED!) #under Outputs
# (bengal3_ac3) cp -r ~/DATA/Data_Tam_DNAseq_2025_AYE/snippy/AYE-Q/reference .
# for sample in AYE-craA-1_on_Cm200_trimmed
# #(bengal3_ac3) snpEff ann -noLog -noStats -no-downstream -no-upstream -no-utr -c reference/snpeff.config -dataDir . ref out.filtered.vcf > out.filtered_.vcf
# for sample in AYE-S_trimmed.PASS.indels AYE-S_trimmed.PASS.snps AYE-Q_trimmed.PASS.indels AYE-Q_trimmed.PASS.snps AYE-WT_on_Tig4_trimmed.PASS.indels AYE-WT_on_Tig4_trimmed.PASS.snps AYE-craA_on_Tig4_trimmed.PASS.indels AYE-craA_on_Tig4_trimmed.PASS.snps AYE-craA-1_on_Cm200_trimmed.PASS.indels AYE-craA-1_on_Cm200_trimmed.PASS.snps AYE-craA-2_on_Cm200_trimmed.PASS.indels AYE-craA-2_on_Cm200_trimmed.PASS.snps; do
# snpEff ann -noLog -noStats -no-downstream -no-upstream -no-utr -c reference/snpeff.config -dataDir . ref Variants/VCFs/${sample}.vcf > Variants/Annotated/${sample}.annotated.vcf
# done
#(WORKS!) Under Outputs/Master_vcf and (spandx)
(spandx) cp -r ../../snippy/AYE-Q/reference .
(spandx) cp ../../spandx/bin/SNP_matrix.sh ./
#Note that ${variant_genome_path}=CU459141
#Adapt "snpEff eff -no-downstream -no-intergenic -ud 100 -formatEff -v ${variant_genome_path} out.vcf > out.annotated.vcf" to
"/home/jhuang/miniconda3/envs/bengal3_ac3/bin/snpEff eff -no-downstream -no-intergenic -ud 100 -formatEff -c reference/snpeff.config -dataDir . ref out.vcf > out.annotated.vcf" in SNP_matrix.sh
(spandx) bash SNP_matrix.sh CU459141 .
Merge the two files
#grep -v "/" All_SNPs_indels_annotated.txt > All_SNPs_indels_annotated_.txt
cut -d$'\t' -f2 All_SNPs_indels_annotated.txt > ../../id1
cut -d',' -f2 summary_snps_indels_.csv > ../id2
diff id1 id2
#(OUTDATED!) Merging summary_snps_indels_.csv (Second half part) and All_SNPs_indels_annotated_.txt (First half part) by pasting the results from two variant calling methods snippy and spandx
# cp Outputs/Phylogeny_and_annotation/All_SNPs_indels_annotated_.txt merged_variants.csv
# #The common positons are "169174" and "3199991"
# grep "CHROM" All_SNPs_indels_annotated_.txt > All_SNPs_indels_annotated__.txt
# grep "169174" All_SNPs_indels_annotated_.txt >> All_SNPs_indels_annotated__.txt
# grep "3199991" All_SNPs_indels_annotated_.txt >> All_SNPs_indels_annotated__.txt
# cut -d$'\t' -f1-11 All_SNPs_indels_annotated__.txt > All_SNPs_indels_annotated___.txt
# sed -i '1s/_trimmed//g' All_SNPs_indels_annotated___.txt
# cp All_SNPs_indels_annotated___.txt ../../
# grep "CHROM" summary_snps_indels_.csv > summary_snps_indels__.csv
# grep "169174" summary_snps_indels_.csv >> summary_snps_indels__.csv
# grep "3199991" summary_snps_indels_.csv >> summary_snps_indels__.csv
# cut -d$',' -f11- summary_snps_indels__.csv > summary_snps_indels___.csv
# sed -i 's/,/\t/g' summary_snps_indels___.csv
# sed -i 's/ /\t/g' summary_snps_indels___.csv
# paste Outputs/Phylogeny_and_annotation/All_SNPs_indels_annotated___.txt snippy/summary_snps_indels___.csv > merged_variants.csv
#The common positons are "169174" and "3199991"
grep "CHROM" All_SNPs_indels_annotated.txt > All_SNPs_indels_annotated__.txt
grep "169174" All_SNPs_indels_annotated.txt >> All_SNPs_indels_annotated__.txt
grep "3199991" All_SNPs_indels_annotated.txt >> All_SNPs_indels_annotated__.txt
#DEBUG
CU459141 3199991 CTGT C INDEL CTGT CTGT CTGT CTGT CTGT C disruptive_inframe_deletion MODERATE gttgct/gct p.Val8del/c.23_25delTTG 86 ABAYE3157 protein_coding
-->
CU459141 3199991 GCTGTT GCT INDEL GCTGTT GCTGTT GCTGTT GCTGTT GCTGTT GCT disruptive_inframe_deletion MODERATE gctgtt/gct p.Val8del/c.23_25delTGT 86 ABAYE3157 protein_coding
gene 3199972..3200232
/locus_tag="ABAYE3157"
CDS 3199972..3200232
/locus_tag="ABAYE3157"
/inference="ab initio prediction:AMIGene:2.0"
/note="Evidence 4 : Homologs of previously reported genes
of unknown function"
/codon_start=1
/transl_table=11
/product="conserved hypothetical protein; putative
exported protein"
/protein_id="CAM87966.1"
/db_xref="EnsemblGenomes-Gn:ABAYE3157"
/db_xref="EnsemblGenomes-Tr:CAM87966"
/db_xref="UniProtKB/TrEMBL:B0V5Q3"
/translation="MLFKKFAVAAAVAATLAFVGCSKKEEAPAANAASEATAAASEAT
AAASEAATAASQAVDAAASEAAASTAAASAPAASEAAASAAH"
>CU459141:3199972-3200232
ATGTTATTCAAAAAATTC GCTGTT GCTGCTGCTGTTGCTGCTACTTTAGCTTTCGTTGGTTGTTCTAAGA
AAGAAGAAGCTCCTGCTGCTAACGCTGCATCTGAAGCTACTGCTGCTGCATCTGAAGCTACTGCTGCTGC
ATCTGAAGCTGCTACTGCTGCTTCTCAAGCTGTTGATGCTGCTGCATCTGAAGCTGCTGCTTCTACTGCT
GCCGCTTCTGCACCTGCCGCTTCTGAAGCTGCTGCTTCTGCTGCACACTAA
(Optional-->may error-->ignoring the step! since it should not happed) filter rows without change (between REF and the isolates) in snippy/summary_snps_indels_.csv (94)
awk -F, '
NR == 1 {print; next} # Print the header line
{
ref = $3; # Reference base (assuming REF is in the 3rd column)
same = 1; # Flag to check if all bases are the same as reference
samples = ""; # Initialize variable to hold sample data
for (i = 6; i <= NF - 8; i++) { # Loop through all sample columns, adjusting for the number of fixed columns
samples = samples $i " "; # Collect sample data
if ($i != ref) {
same = 0;
}
}
if (!same) {
print $0; # Print the entire line if not all bases are the same as the reference
#print "Samples: " samples; # Print all sample data for checking
}
}
' merged_variants.csv > merged_variants_.csv
#Explanation:
# -F, sets the field separator to a comma.
# NR == 1 {print; next} prints the header line and skips to the next line.
# ref = $3 sets the reference base (assumed to be in the 3rd column).
# same = 1 initializes a flag to check if all sample bases are the same as the reference.
# samples = ""; initializes a string to collect sample data.
# for (i = 6; i <= NF - 8; i++) loops through the sample columns. This assumes the first 5 columns are fixed (CHROM, POS, REF, ALT, TYPE), and the last 8 columns are fixed (Effect, Impact, Functional_Class, Codon_change, Protein_and_nucleotide_change, Amino_Acid_Length, Gene_name, Biotype).
# samples = samples $i " "; collects sample data.
# if ($i != ref) { same = 0; } checks if any sample base is different from the reference base. If found, it sets same to 0.
# if (!same) { print $0; print "Samples: " samples; } prints the entire line and the sample data if not all sample bases are the same as the reference.
(Optional) Draw local genetic environments of △adeIJ
see http://xgenes.com/article/article-content/325/analysis-of-snps-indels-transposons-and-is-elements-in-5-a-baumannii-strains/
(Optional) Show difference between the genome using BRIG (show no differences by images!)
This is an issue with java version. I was running Java 1.8 version and had same problem as the OP's question. Then I followed the answers and finally changing the java version worked for me.
I ran the following steps:
1) Go to link: https://www.oracle.com/java/technologies/javase-java-archive-javase6-downloads.html
2) mkdir Java1.6
3) Download "jdk-6u45-linux-x64.bin" and save in Java1.6
3) cd Java1.6 && chmod +x jdk-6u45-linux-x64.bin && ./jdk-6u45-linux-x64.bin
4) You should see java file in Java1.6/jdk1.6.0_45/bin
5) Java1.6/jdk1.6.0_45/bin/java -version - should give "1.6.0_45"
6) (base) conda install bioconda::blast-legacy # install blastall 2.2.26
7) #set lastLocation="/home/jhuang/miniconda3/bin/" in default-BRIG.xml --> /home/jhuang/miniconda3/bin/blastall is used for blast!
8) Open BRIG using Java1.6 as follows: ~/Tools/Java1.6/jdk1.6.0_45/bin/java -Xmx15000M -jar BRIG.jar
Note: I did not uninstall my original Java1.8 rather used the downloded java1.6 execulatable file using path. That way I retain both versions without any problem.
This gave me all the rings without any issue!! Hope this helps someone!
Users who wish to run BRIG from the command-line need to:
* Navigate to the unpacked BRIG folder in a command-line interface (terminal, console, command prompt).
* Run 'java -Xmx1500M -jar BRIG.jar'. Where -Xmx specifies the amount of memory allocated to BRIG.
Note:
* BLAST legacy comes as a compressed package, which will unzip the BLAST binaries where ever the package is. We advise users to first create a BLAST directory (in either the home or applications directory), copy the downloaded BLAST package to that directory and unzip the package.
* BRIG supports both BLAST+ & BLAST Legacy. If BRIG cannot find BLAST it will prompt users at runtime. Users can specify the location of their BLAST installation in the BRIG options menu which is: Main window > Preferences > BRIG options.
* N.B: If BOTH BLAST+ and legacy versions are in the same location, BRIG will prefer BLAST+
convert CP059040.png -crop 4000x2780+3840+0 CP059040_.png
TODO: Variant calling against xxxx (found in vrap results)!
Project Data_Tam_DNAseq_2025_adeABadeIJ_adeIJK_CM1_CM2
#ln -s ../X101SC24115801-Z01-J001/01.RawData/HF/HF_1.fq.gz HF_R1.fastq.gz
#ln -s ../X101SC24115801-Z01-J001/01.RawData/HF/HF_2.fq.gz HF_R2.fastq.gz
(bengal3_ac3) /home/jhuang/miniconda3/envs/snakemake_4_3_1/bin/snakemake --printshellcmds
#HF is Enterobacter cloacae (550) or Enterobacter hormaechei (158836)
# ---- Download related genomes from ENA ----
https://www.ebi.ac.uk/ena/browser/view/550
#Click "Sequence" and download "Counts" (7263) and "Taxon descendants count" (8004) if there is enough time! Downloading time points is 28.02.2025.
python ~/Scripts/filter_fasta.py ena_550_sequence.fasta complete_genome_550_ena_taxon_descendants_count.fasta #8004-->100
https://www.ebi.ac.uk/ena/browser/view/158836
#Click "Sequence" and download "Counts" (3763) and "Taxon descendants count" (4846) if there is enough time! Downloading time points is 28.02.2025.
python ~/Scripts/filter_fasta.py ena_158836_sequence.fasta complete_genome_158836_ena_taxon_descendants_count.fasta #4846-->540
cat complete_genome_158836_ena_taxon_descendants_count.fasta complete_genome_550_ena_taxon_descendants_count.fasta > complete_genome_158836_550.fasta
#check if the flanking line is correct
grep "ENA|AP022130|AP022130.1" complete_genome_158836_550.fasta
#>ENA|AP022130|AP022130.1 Enterobacter cloacae plasmid pWP5-S18-CRE-02_4 DNA, complete genome, strain: WP5-S18-CRE-02.
mv trimmed/HF_trimmed*.fastq .
gzip *.fastq
ln -s ~/Tools/vrap/ .
mamba activate /home/jhuang/miniconda3/envs/vrap
vrap/vrap.py -1 HF_trimmed_P_1.fastq.gz -2 HF_trimmed_P_2.fastq.gz -o vrap_HF --bt2idx=/home/jhuang/REFs/genome --host=/home/jhuang/REFs/genome.fa --virus=/home/jhuang/DATA/Data_Tam_DNAseq_2025_adeABadeIJ_adeIJK_CM1_CM2/complete_genome_158836_550.fasta --nt=/mnt/nvme1n1p1/blast/nt --nr=/mnt/nvme1n1p1/blast/nr -t 100 -l 200 -g
# Variant calling against ATCC 19606
mkdir results_with_HF
mv variants results_with_HF
mv roary results_with_HF
mv fasttree results_with_HF
mv raxml-ng results_with_HF
mv snippy/HF/ results_with_HF/HF_under_snippy
rm raw_data/HF_*.fastq.gz
rm fastq/HF_*.fastq
(bengal3_ac3) /home/jhuang/miniconda3/envs/snakemake_4_3_1/bin/snakemake --printshellcmds
mkdir ~/miniconda3/envs/spandx/share/snpeff-5.1-2/data/CP059040
cp db/CP059040.gb ~/miniconda3/envs/spandx/share/snpeff-5.1-2/data/CP059040/genes.gbk
vim ~/miniconda3/envs/spandx/share/snpeff-5.1-2/snpEff.config
/home/jhuang/miniconda3/envs/spandx/bin/snpEff build CP059040 #-d
~/Scripts/genbank2fasta.py CP059040.gb
mv CP059040.gb_converted.fna CP059040.fasta #rename "CP059040.1 xxxxx" to "CP059040" in the fasta-file
ln -s /home/jhuang/Tools/spandx/ spandx
mamba activate /home/jhuang/miniconda3/envs/spandx
(spandx) nextflow run spandx/main.nf --fastq "trimmed/*_P_{1,2}.fastq" --ref CP059040.fasta --annotation --database CP059040 -resume
python3 ~/Scripts/summarize_snippy_res.py snippy
grep -v "None,,,,,,None,None" summary_snps_indels.csv > summary_snps_indels_.csv
cd Outputs/Master_vcf
(spandx) cp -r ../../snippy/CM1/reference .
(spandx) cp ../../spandx/bin/SNP_matrix.sh ./
#Adapt "snpEff eff -no-downstream -no-intergenic -ud 100 -formatEff -v ${variant_genome_path} out.vcf > out.annotated.vcf" to
"/home/jhuang/miniconda3/envs/bengal3_ac3/bin/snpEff eff -no-downstream -no-intergenic -ud 100 -formatEff -c reference/snpeff.config -dataDir . ref out.vcf > out.annotated.vcf" in SNP_matrix.sh
(spandx) bash SNP_matrix.sh CP059040 .
Supplementary: Enterobacter cloacae (taxid550) vs. Enterobacter hormaechei (taxid158836)
🔬 介绍
阴沟肠杆菌(Enterobacter cloacae) 和 霍尔马氏肠杆菌(Enterobacter hormaechei) 都属于 肠杆菌科(Enterobacteriaceae),是革兰氏阴性、兼性厌氧的杆状细菌。它们广泛存在于 环境中(如水、土壤、植物) 以及 人类和动物的肠道 中。
Report
Please find attached the SNP and Indel calling results for the project DNAseq_2025_AYE against CU459141, as well as for the project DNAseq_2025_adeABadeIJ_adeIJK_CM1_CM2 against CP059040.
Additionally, I have attached the analysis for the two clinical samples: clinical (in DNAseq_2025_AYE) and HF (in DNAseq_2025_adeABadeIJ_adeIJK_CM1_CM2). You can find the details in the files clinical_summary.xlsx and HF_summary.xlsx. The closest species found in the database are "Acinetobacter baumannii strain 2024CK-00130" and "Enterobacter hormaechei strain K528". Would you like to have the SNP and Indel analysis for the clinical samples compared to these closely related species as the reference?
点赞本文的读者
还没有人对此文章表态
没有评论
Typing of S. epidermidis samples (MD P.B.)
Using vrap and bacto to process Acinetobacter baumannii isolates
Processing for Data_Tam_DNAseq_2025
Presence-Absence Table and Graphics for Selected Genes in Data_Patricia_Sepi_7samples
© 2023 XGenes.com Impressum