DNAseq 2025_AYE and DNAseq 2025_adeABadeIJ_adeIJK_CM1_CM2

gene_x 0 like s 762 view s

Tags: pipeline, DNA-seq

Targets

Could you please help me to process these data (Project: X101SC25015922-Z01-J001)?
For you information,
1. Please compare the data with the AYE strain (CU459141) across the following conditions:
a) AYE-S
b) AYE-Q
c) AYE-WT on Tig4
d) AYE-craA on Tig4
e) AYE-craA-1 on Cm200
f) AYE-craA-2 on Cm200
2. The "clinical" sample refers to a clinical isolate of Acinetobacter baumannii. I’m unsure which reference genome would be most appropriate for comparison in this case. Can we use lab strains (CP059040, CU459141, and CP079931) as reference genome for comparison?

Processed the genome sequence for project X101SC24115801-Z01-J001?
1. Kindly compare the data with the ATCC 19606 strain (CP059040) under the following conditions:
a) adeABadeIJ (knockout of adeA, adeB, adeI, and adeJ, please confirm whether these genes are successfully knocked out.)
b) adeIJK (knockout of adeI, adeJ, and adeK, please confirm whether these genes are successfully knocked out.)
c) CM1
d) CM2
The "HF" sample may also refer to a clinical isolate of Enterobacter hormaechei.
2. The "HF" sample refers to a clinical isolate of Acinetobacter baumannii. I’m unsure which reference genome would be most appropriate for comparison in this case. Can we use lab strains (CP059040, CU459141, and CP079931) as reference genome for comparison?

Project Data_Tam_DNAseq_2025_AYE

Download the raw data.

86e4016c902a1cd23a2190415425e641  01.RawData/AYE-WTonTig4/AYE-WTonTig4_1.fq.gz
554eb44ae261312039929f0991582111  01.RawData/AYE-WTonTig4/AYE-WTonTig4_2.fq.gz
ce004b0d7135bce80f34bd6bac3e89e7  01.RawData/AYE-Q/AYE-Q_1.fq.gz
bddc7ced051a2167a5a8341332d7423a  01.RawData/AYE-Q/AYE-Q_2.fq.gz
227d93b8a762185d5dcd1e4975041491  01.RawData/AYE-S/AYE-S_1.fq.gz
f098c9a8579bf5729427dc871225a290  01.RawData/AYE-S/AYE-S_2.fq.gz
78e08dd090d89330b1021ce42fb09baa  01.RawData/clinical/clinical_1.fq.gz
2346fef1d896ef0924d2ec88db51cade  01.RawData/clinical/clinical_2.fq.gz
4c07494505caf22f70edb54692bcaca2  01.RawData/AYE-craA-1onCm200/AYE-craA-1onCm200_1.fq.gz
52944e395004dc11758d422690bda168  01.RawData/AYE-craA-1onCm200/AYE-craA-1onCm200_2.fq.gz
92b498ed7465645ca00bbc945c514fe2  01.RawData/AYE-craAonTig4/AYE-craAonTig4_1.fq.gz
fd9d670942973e6760d6dd78f4ee852a  01.RawData/AYE-craAonTig4/AYE-craAonTig4_2.fq.gz
375f1e3efb60571ffd457b3cb1e64a84  01.RawData/AYE-craA-2onCm200/AYE-craA-2onCm200_1.fq.gz
041c08f4c45f1fabd129fc10500c6582  01.RawData/AYE-craA-2onCm200/AYE-craA-2onCm200_2.fq.gz
c129aa9a208ca47db10bb04e54c096d7  02.Report_X101SC25015922-Z01-J001.zip

md5sum 01.RawData/AYE-WTonTig4/AYE-WTonTig4_1.fq.gz > MD5.txt_
md5sum 01.RawData/AYE-WTonTig4/AYE-WTonTig4_2.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-Q/AYE-Q_1.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-Q/AYE-Q_2.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-S/AYE-S_1.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-S/AYE-S_2.fq.gz >> MD5.txt_
md5sum 01.RawData/clinical/clinical_1.fq.gz >> MD5.txt_
md5sum 01.RawData/clinical/clinical_2.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-craA-1onCm200/AYE-craA-1onCm200_1.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-craA-1onCm200/AYE-craA-1onCm200_2.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-craAonTig4/AYE-craAonTig4_1.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-craAonTig4/AYE-craAonTig4_2.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-craA-2onCm200/AYE-craA-2onCm200_1.fq.gz >> MD5.txt_
md5sum 01.RawData/AYE-craA-2onCm200/AYE-craA-2onCm200_2.fq.gz >> MD5.txt_
md5sum 02.Report_X101SC25015922-Z01-J001.zip >> MD5.txt_

ce004b0d7135bce80f34bd6bac3e89e7  AYE-Q_1.fq.gz
bddc7ced051a2167a5a8341332d7423a  AYE-Q_2.fq.gz

Data process according to http://xgenes.com/article/article-content/325/analysis-of-snps-indels-transposons-and-is-elements-in-5-a-baumannii-strains/

Call variant calling using snippy

ln -s ~/Tools/bacto/db/ .;
ln -s ~/Tools/bacto/envs/ .;
ln -s ~/Tools/bacto/local/ .;
cp ~/Tools/bacto/Snakefile .;
cp ~/Tools/bacto/bacto-0.1.json .;
cp ~/Tools/bacto/cluster.json .;

mkdir raw_data; cd raw_data;

# Note that the names must be ending with fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-S/AYE-S_1.fq.gz AYE-S_R1.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-S/AYE-S_2.fq.gz AYE-S_R2.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-Q/AYE-Q_1.fq.gz AYE-Q_R1.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-Q/AYE-Q_2.fq.gz AYE-Q_R2.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-WTonTig4/AYE-WTonTig4_1.fq.gz AYE-WT_on_Tig4_R1.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-WTonTig4/AYE-WTonTig4_2.fq.gz AYE-WT_on_Tig4_R2.fastq.gz

ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-craAonTig4/AYE-craAonTig4_1.fq.gz AYE-craA_on_Tig4_R1.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-craAonTig4/AYE-craAonTig4_2.fq.gz AYE-craA_on_Tig4_R2.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-craA-1onCm200/AYE-craA-1onCm200_1.fq.gz AYE-craA-1_on_Cm200_R1.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-craA-1onCm200/AYE-craA-1onCm200_2.fq.gz AYE-craA-1_on_Cm200_R2.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-craA-2onCm200/AYE-craA-2onCm200_1.fq.gz AYE-craA-2_on_Cm200_R1.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/AYE-craA-2onCm200/AYE-craA-2onCm200_2.fq.gz AYE-craA-2_on_Cm200_R2.fastq.gz

ln -s ../X101SC25015922-Z01-J001/01.RawData/clinical/clinical_1.fq.gz clinical_R1.fastq.gz
ln -s ../X101SC25015922-Z01-J001/01.RawData/clinical/clinical_2.fq.gz clinical_R2.fastq.gz

#download CU459141.gb from GenBank
mv ~/Downloads/sequence\(1\).gb db/CU459141.gb
#setting the following in bacto-0.1.json

    "fastqc": false,
    "taxonomic_classifier": false,
    "assembly": true,
    "typing_ariba": false,
    "typing_mlst": true,
    "pangenome": true,
    "variants_calling": true,
    "phylogeny_fasttree": true,
    "phylogeny_raxml": true,
    "recombination": false, (due to gubbins-error set false)

    "genus": "Acinetobacter",
    "kingdom": "Bacteria",
    "species": "baumannii",  (in both prokka and mykrobe)
    "reference": "db/CU459141.gb"

mamba activate /home/jhuang/miniconda3/envs/bengal3_ac3
(bengal3_ac3) /home/jhuang/miniconda3/envs/snakemake_4_3_1/bin/snakemake --printshellcmds

#check if we need big calculation for including the clinical sample by checking mlst. TODO: send the mlst results to Tam. Next step by check vrap which complete isolate?

Download all S epidermidis genomes and identified all ST2 isolates from them!

#Acinetobacter baumannii Taxonomy ID: 470
#esearch -db nucleotide -query "txid470[Organism:exp]" | efetch -format fasta -email j.huang@uke.de > genome_470_ncbi.fasta
#python ~/Scripts/filter_fasta.py genome_470_ncbi.fasta complete_genome_470_ncbi.fasta  #

# ---- Download related genomes from ENA ----
https://www.ebi.ac.uk/ena/browser/view/470
#Click "Sequence" and download "Counts" (13059) and "Taxon descendants count" (16091) if there is enough time! Downloading time points is 28.02.2025.
python ~/Scripts/filter_fasta.py  ena_470_sequence.fasta complete_genome_470_ena_taxon_descendants_count.fasta  #16091-->920
#python ~/Scripts/filter_fasta.py ena_470_sequence_Counts.fasta complete_genome_470_ena_Counts.fasta  #xxx, 5.8G

Run vrap

#replace --virus to the specific taxonomy (e.g. Acinetobacter baumannii) --> change virus_user_db --> specific_bacteria_user_db
mv trimmed/clinical_trimmed*.fastq .
gzip *.fastq
ln -s ~/Tools/vrap/ .
mamba activate /home/jhuang/miniconda3/envs/vrap
vrap/vrap.py  -1 clinical_trimmed_P_1.fastq.gz -2 clinical_trimmed_P_2.fastq.gz -o vrap_clinical --bt2idx=/home/jhuang/REFs/genome --host=/home/jhuang/REFs/genome.fa --virus=/home/jhuang/DATA/Data_Tam_DNAseq_2025_AYE/complete_genome_470_ena_taxon_descendants_count.fasta --nt=/mnt/nvme1n1p1/blast/nt --nr=/mnt/nvme1n1p1/blast/nr  -t 100 -l 200  -g

Run second run of bengal3_ac3 without the clinical sample

mkdir results_with_clinical
mv variants results_with_clinical
mv roary results_with_clinical
mv fasttree results_with_clinical
mv raxml-ng results_with_clinical
mv snippy/clinical/ results_with_clinical/clinical_under_snippy
rm raw_data/clinical_*.fastq.gz
rm fastq/clinical_*.fastq
(bengal3_ac3) /home/jhuang/miniconda3/envs/snakemake_4_3_1/bin/snakemake --printshellcmds

Variant calling against AYE

Using spandx calling variants (almost the same results to the one from viral-ngs!)

mkdir ~/miniconda3/envs/spandx/share/snpeff-5.1-2/data/CU459141
cp db/CU459141.gb  ~/miniconda3/envs/spandx/share/snpeff-5.1-2/data/CU459141/genes.gbk
vim ~/miniconda3/envs/spandx/share/snpeff-5.1-2/snpEff.config
#CU459141.chromosome : CU459141
/home/jhuang/miniconda3/envs/spandx/bin/snpEff build -c ~/miniconda3/envs/spandx/share/snpeff-5.1-2/snpEff.config -genbank CU459141 -v   #-d
#00:00:02 Saving sequences for chromosome 'CU459141.1 ...

mamba activate /home/jhuang/miniconda3/envs/bengal3_ac3 /home/jhuang/miniconda3/envs/bengal3_ac3/bin/snpEff build -c ~/miniconda3/envs/spandx/share/snpeff-5.1-2/snpEff.config -genbank CU459141 -v

~/Scripts/genbank2fasta.py CU459141.gb
mv CU459141.gb_converted.fna CU459141.fasta    #rename "CU459141.1 xxxxx" to "CU459141" in the fasta-file

ln -s /home/jhuang/Tools/spandx/ spandx
mamba activate /home/jhuang/miniconda3/envs/spandx
(spandx) nextflow run spandx/main.nf --fastq "trimmed/*_P_{1,2}.fastq" --ref CU459141.fasta --annotation --database CU459141 -resume

#DEBUG "ERROR_CHROMOSOME_NOT_FOUND"
cd Outputs/Master_vcf
sed 's/^CU459141/CU459141.1/' out.filtered.vcf > modified_input.vcf
/home/jhuang/miniconda3/envs/spandx/bin/snpEff  ann -chr CU459141.1 out.filtered.vcf > annotated_variants.vcf
/home/jhuang/miniconda3/envs/spandx/bin/snpEff databases

Summarize all SNPs and Indels from the snippy result directory.

#Output: snippy_CP059040/snippy/summary_snps_indels.csv
# IMPORTANT_ADAPT the array isolates = ["AYE-S", "AYE-Q", "AYE-WT on Tig4", "AYE-craA on Tig4", "AYE-craA-1 on Cm200", "AYE-craA-2 on Cm200"]
python3 ~/Scripts/summarize_snippy_res.py snippy
grep -v "None,,,,,,None,None" summary_snps_indels.csv > summary_snps_indels_.csv

Additonal steps for DEBUG #ERROR_CHROMOSOME_NOT_FOUND in the generated SPANDx result: Rerun SNP_matrix.sh with /home/jhuang/miniconda3/envs/bengal3_ac3/bin/snpEff, rather than the snpEff from its own environment spandx!

# -- The syntax of /home/jhuang/miniconda3/envs/bengal3_ac3/bin/snpEff is different (see below) --
#cd ~/DATA/Data_Tam_DNAseq_2025_AYE/snippy/AYE-Q/reference
### snpEff build -c reference/snpeff.config -dataDir . -gff3 ref
#ref.chromosome : CU459141
### snpEff ann -noLog -noStats -no-downstream -no-upstream -no-utr -c reference/snpeff.config -dataDir . ref AYE-Q.filt.vcf > AYE-Q.vcf

#(OUTDATED!)  #under Outputs
#  (bengal3_ac3) cp -r ~/DATA/Data_Tam_DNAseq_2025_AYE/snippy/AYE-Q/reference .
#  for sample in AYE-craA-1_on_Cm200_trimmed
#  #(bengal3_ac3) snpEff ann -noLog -noStats -no-downstream -no-upstream -no-utr -c reference/snpeff.config -dataDir . ref out.filtered.vcf > out.filtered_.vcf
#  for sample in AYE-S_trimmed.PASS.indels AYE-S_trimmed.PASS.snps  AYE-Q_trimmed.PASS.indels AYE-Q_trimmed.PASS.snps  AYE-WT_on_Tig4_trimmed.PASS.indels AYE-WT_on_Tig4_trimmed.PASS.snps  AYE-craA_on_Tig4_trimmed.PASS.indels AYE-craA_on_Tig4_trimmed.PASS.snps  AYE-craA-1_on_Cm200_trimmed.PASS.indels AYE-craA-1_on_Cm200_trimmed.PASS.snps AYE-craA-2_on_Cm200_trimmed.PASS.indels AYE-craA-2_on_Cm200_trimmed.PASS.snps; do
#          snpEff ann -noLog -noStats -no-downstream -no-upstream -no-utr -c reference/snpeff.config -dataDir . ref Variants/VCFs/${sample}.vcf > Variants/Annotated/${sample}.annotated.vcf
#  done

#(WORKS!) Under Outputs/Master_vcf and (spandx)
(spandx) cp -r ../../snippy/AYE-Q/reference .
(spandx) cp ../../spandx/bin/SNP_matrix.sh ./
#Note that ${variant_genome_path}=CU459141
#Adapt "snpEff eff -no-downstream -no-intergenic -ud 100 -formatEff -v ${variant_genome_path} out.vcf > out.annotated.vcf" to
"/home/jhuang/miniconda3/envs/bengal3_ac3/bin/snpEff eff -no-downstream -no-intergenic -ud 100 -formatEff -c reference/snpeff.config -dataDir . ref out.vcf > out.annotated.vcf" in SNP_matrix.sh
(spandx) bash SNP_matrix.sh CU459141 .

Merge the two files

#grep -v "/" All_SNPs_indels_annotated.txt > All_SNPs_indels_annotated_.txt
cut -d$'\t' -f2 All_SNPs_indels_annotated.txt > ../../id1
cut -d',' -f2 summary_snps_indels_.csv > ../id2
diff id1 id2

#(OUTDATED!)  Merging summary_snps_indels_.csv (Second half part) and All_SNPs_indels_annotated_.txt (First half part) by pasting the results from two variant calling methods snippy and spandx
#  cp Outputs/Phylogeny_and_annotation/All_SNPs_indels_annotated_.txt merged_variants.csv
#  #The common positons are "169174" and "3199991"
#  grep "CHROM" All_SNPs_indels_annotated_.txt > All_SNPs_indels_annotated__.txt
#  grep "169174" All_SNPs_indels_annotated_.txt >> All_SNPs_indels_annotated__.txt
#  grep "3199991" All_SNPs_indels_annotated_.txt >> All_SNPs_indels_annotated__.txt
#  cut -d$'\t' -f1-11 All_SNPs_indels_annotated__.txt > All_SNPs_indels_annotated___.txt
#  sed -i '1s/_trimmed//g' All_SNPs_indels_annotated___.txt
#  cp All_SNPs_indels_annotated___.txt ../../
#  grep "CHROM" summary_snps_indels_.csv > summary_snps_indels__.csv
#  grep "169174" summary_snps_indels_.csv >> summary_snps_indels__.csv
#  grep "3199991" summary_snps_indels_.csv >> summary_snps_indels__.csv
#  cut -d$',' -f11- summary_snps_indels__.csv > summary_snps_indels___.csv
#  sed -i 's/,/\t/g' summary_snps_indels___.csv
#  sed -i 's/ /\t/g' summary_snps_indels___.csv
#  paste Outputs/Phylogeny_and_annotation/All_SNPs_indels_annotated___.txt snippy/summary_snps_indels___.csv > merged_variants.csv

#The common positons are "169174" and "3199991"
grep "CHROM" All_SNPs_indels_annotated.txt > All_SNPs_indels_annotated__.txt
grep "169174" All_SNPs_indels_annotated.txt >> All_SNPs_indels_annotated__.txt
grep "3199991" All_SNPs_indels_annotated.txt >> All_SNPs_indels_annotated__.txt

#DEBUG
CU459141        3199991 CTGT    C       INDEL   CTGT    CTGT    CTGT    CTGT    CTGT    C       disruptive_inframe_deletion     MODERATE                gttgct/gct      p.Val8del/c.23_25delTTG 86      ABAYE3157       protein_coding
-->
CU459141        3199991 GCTGTT    GCT       INDEL   GCTGTT    GCTGTT    GCTGTT    GCTGTT    GCTGTT    GCT       disruptive_inframe_deletion     MODERATE                gctgtt/gct      p.Val8del/c.23_25delTGT 86      ABAYE3157       protein_coding

gene            3199972..3200232
                /locus_tag="ABAYE3157"
CDS             3199972..3200232
                /locus_tag="ABAYE3157"
                /inference="ab initio prediction:AMIGene:2.0"
                /note="Evidence 4 : Homologs of previously reported genes
                of unknown function"
                /codon_start=1
                /transl_table=11
                /product="conserved hypothetical protein; putative
                exported protein"
                /protein_id="CAM87966.1"
                /db_xref="EnsemblGenomes-Gn:ABAYE3157"
                /db_xref="EnsemblGenomes-Tr:CAM87966"
                /db_xref="UniProtKB/TrEMBL:B0V5Q3"
                /translation="MLFKKFAVAAAVAATLAFVGCSKKEEAPAANAASEATAAASEAT
                AAASEAATAASQAVDAAASEAAASTAAASAPAASEAAASAAH"

>CU459141:3199972-3200232
ATGTTATTCAAAAAATTC GCTGTT GCTGCTGCTGTTGCTGCTACTTTAGCTTTCGTTGGTTGTTCTAAGA
AAGAAGAAGCTCCTGCTGCTAACGCTGCATCTGAAGCTACTGCTGCTGCATCTGAAGCTACTGCTGCTGC
ATCTGAAGCTGCTACTGCTGCTTCTCAAGCTGTTGATGCTGCTGCATCTGAAGCTGCTGCTTCTACTGCT
GCCGCTTCTGCACCTGCCGCTTCTGAAGCTGCTGCTTCTGCTGCACACTAA

(Optional-->may error-->ignoring the step! since it should not happed) filter rows without change (between REF and the isolates) in snippy/summary_snps_indels_.csv (94)

awk -F, '
NR == 1 {print; next}  # Print the header line
{
    ref = $3;  # Reference base (assuming REF is in the 3rd column)
    same = 1;  # Flag to check if all bases are the same as reference
    samples = "";  # Initialize variable to hold sample data
    for (i = 6; i <= NF - 8; i++) {  # Loop through all sample columns, adjusting for the number of fixed columns
        samples = samples $i " ";  # Collect sample data
        if ($i != ref) {
            same = 0;
        }
    }
    if (!same) {
        print $0;  # Print the entire line if not all bases are the same as the reference
        #print "Samples: " samples;  # Print all sample data for checking
    }
}
' merged_variants.csv > merged_variants_.csv
#Explanation:
#    -F, sets the field separator to a comma.
#    NR == 1 {print; next} prints the header line and skips to the next line.
#    ref = $3 sets the reference base (assumed to be in the 3rd column).
#    same = 1 initializes a flag to check if all sample bases are the same as the reference.
#    samples = ""; initializes a string to collect sample data.
#    for (i = 6; i <= NF - 8; i++) loops through the sample columns. This assumes the first 5 columns are fixed (CHROM, POS, REF, ALT, TYPE), and the last 8 columns are fixed (Effect, Impact, Functional_Class, Codon_change, Protein_and_nucleotide_change, Amino_Acid_Length, Gene_name, Biotype).
#    samples = samples $i " "; collects sample data.
#    if ($i != ref) { same = 0; } checks if any sample base is different from the reference base. If found, it sets same to 0.
#    if (!same) { print $0; print "Samples: " samples; } prints the entire line and the sample data if not all sample bases are the same as the reference.

(Optional) Draw local genetic environments of △adeIJ

see http://xgenes.com/article/article-content/325/analysis-of-snps-indels-transposons-and-is-elements-in-5-a-baumannii-strains/

(Optional) Show difference between the genome using BRIG (show no differences by images!)

This is an issue with java version. I was running Java 1.8 version and had same problem as the OP's question. Then I followed the answers and finally changing the java version worked for me.
I ran the following steps:
1) Go to link: https://www.oracle.com/java/technologies/javase-java-archive-javase6-downloads.html
2) mkdir Java1.6
3) Download "jdk-6u45-linux-x64.bin" and save in Java1.6
3) cd Java1.6 && chmod +x jdk-6u45-linux-x64.bin && ./jdk-6u45-linux-x64.bin
4) You should see java file in Java1.6/jdk1.6.0_45/bin
5) Java1.6/jdk1.6.0_45/bin/java -version - should give "1.6.0_45"
6) (base) conda install bioconda::blast-legacy  # install blastall 2.2.26
7) #set lastLocation="/home/jhuang/miniconda3/bin/" in default-BRIG.xml --> /home/jhuang/miniconda3/bin/blastall is used for blast!
8) Open BRIG using Java1.6 as follows: ~/Tools/Java1.6/jdk1.6.0_45/bin/java -Xmx15000M -jar BRIG.jar
Note: I did not uninstall my original Java1.8 rather used the downloded java1.6 execulatable file using path. That way I retain both versions without any problem.
This gave me all the rings without any issue!! Hope this helps someone!
Users who wish to run BRIG from the command-line need to:
* Navigate to the unpacked BRIG folder in a command-line interface (terminal, console, command prompt).
* Run 'java -Xmx1500M -jar BRIG.jar'. Where -Xmx specifies the amount of memory allocated to BRIG.
Note:
* BLAST legacy comes as a compressed package, which will unzip the BLAST binaries where ever the package is. We advise users to first create a BLAST directory (in either the home or applications directory), copy the downloaded BLAST package to that directory and unzip the package.
* BRIG supports both BLAST+ & BLAST Legacy. If BRIG cannot find BLAST it will prompt users at runtime. Users can specify the location of their BLAST installation in the BRIG options menu which is: Main window > Preferences > BRIG options.
* N.B: If BOTH BLAST+ and legacy versions are in the same location, BRIG will prefer BLAST+
convert CP059040.png -crop 4000x2780+3840+0 CP059040_.png

TODO: Variant calling against xxxx (found in vrap results)!

Project Data_Tam_DNAseq_2025_adeABadeIJ_adeIJK_CM1_CM2

    #ln -s ../X101SC24115801-Z01-J001/01.RawData/HF/HF_1.fq.gz HF_R1.fastq.gz
    #ln -s ../X101SC24115801-Z01-J001/01.RawData/HF/HF_2.fq.gz HF_R2.fastq.gz

    (bengal3_ac3) /home/jhuang/miniconda3/envs/snakemake_4_3_1/bin/snakemake --printshellcmds
    #HF is Enterobacter cloacae (550) or Enterobacter hormaechei (158836)

    # ---- Download related genomes from ENA ----
    https://www.ebi.ac.uk/ena/browser/view/550
    #Click "Sequence" and download "Counts" (7263) and "Taxon descendants count" (8004) if there is enough time! Downloading time points is 28.02.2025.
    python ~/Scripts/filter_fasta.py  ena_550_sequence.fasta complete_genome_550_ena_taxon_descendants_count.fasta  #8004-->100
    https://www.ebi.ac.uk/ena/browser/view/158836
    #Click "Sequence" and download "Counts" (3763) and "Taxon descendants count" (4846) if there is enough time! Downloading time points is 28.02.2025.
    python ~/Scripts/filter_fasta.py  ena_158836_sequence.fasta complete_genome_158836_ena_taxon_descendants_count.fasta  #4846-->540
    cat complete_genome_158836_ena_taxon_descendants_count.fasta complete_genome_550_ena_taxon_descendants_count.fasta > complete_genome_158836_550.fasta
    #check if the flanking line is correct
    grep "ENA|AP022130|AP022130.1" complete_genome_158836_550.fasta
    #>ENA|AP022130|AP022130.1 Enterobacter cloacae plasmid pWP5-S18-CRE-02_4 DNA, complete genome, strain: WP5-S18-CRE-02.

    mv trimmed/HF_trimmed*.fastq .
    gzip *.fastq
    ln -s ~/Tools/vrap/ .
    mamba activate /home/jhuang/miniconda3/envs/vrap
    vrap/vrap.py  -1 HF_trimmed_P_1.fastq.gz -2 HF_trimmed_P_2.fastq.gz -o vrap_HF --bt2idx=/home/jhuang/REFs/genome --host=/home/jhuang/REFs/genome.fa --virus=/home/jhuang/DATA/Data_Tam_DNAseq_2025_adeABadeIJ_adeIJK_CM1_CM2/complete_genome_158836_550.fasta --nt=/mnt/nvme1n1p1/blast/nt --nr=/mnt/nvme1n1p1/blast/nr  -t 100 -l 200  -g

    # Variant calling against ATCC 19606
    mkdir results_with_HF
    mv variants results_with_HF
    mv roary results_with_HF
    mv fasttree results_with_HF
    mv raxml-ng results_with_HF
    mv snippy/HF/ results_with_HF/HF_under_snippy
    rm raw_data/HF_*.fastq.gz
    rm fastq/HF_*.fastq
    (bengal3_ac3) /home/jhuang/miniconda3/envs/snakemake_4_3_1/bin/snakemake --printshellcmds

    mkdir ~/miniconda3/envs/spandx/share/snpeff-5.1-2/data/CP059040
    cp db/CP059040.gb  ~/miniconda3/envs/spandx/share/snpeff-5.1-2/data/CP059040/genes.gbk
    vim ~/miniconda3/envs/spandx/share/snpeff-5.1-2/snpEff.config
    /home/jhuang/miniconda3/envs/spandx/bin/snpEff build CP059040    #-d
    ~/Scripts/genbank2fasta.py CP059040.gb
    mv CP059040.gb_converted.fna CP059040.fasta    #rename "CP059040.1 xxxxx" to "CP059040" in the fasta-file

    ln -s /home/jhuang/Tools/spandx/ spandx
    mamba activate /home/jhuang/miniconda3/envs/spandx
    (spandx) nextflow run spandx/main.nf --fastq "trimmed/*_P_{1,2}.fastq" --ref CP059040.fasta --annotation --database CP059040 -resume

    python3 ~/Scripts/summarize_snippy_res.py snippy
    grep -v "None,,,,,,None,None" summary_snps_indels.csv > summary_snps_indels_.csv

    cd Outputs/Master_vcf
    (spandx) cp -r ../../snippy/CM1/reference .
    (spandx) cp ../../spandx/bin/SNP_matrix.sh ./
    #Adapt "snpEff eff -no-downstream -no-intergenic -ud 100 -formatEff -v ${variant_genome_path} out.vcf > out.annotated.vcf" to
    "/home/jhuang/miniconda3/envs/bengal3_ac3/bin/snpEff eff -no-downstream -no-intergenic -ud 100 -formatEff -c reference/snpeff.config -dataDir . ref out.vcf > out.annotated.vcf" in SNP_matrix.sh
    (spandx) bash SNP_matrix.sh CP059040 .

Supplementary: Enterobacter cloacae (taxid550) vs. Enterobacter hormaechei (taxid158836)

    🔬 介绍
    阴沟肠杆菌（Enterobacter cloacae） 和 霍尔马氏肠杆菌（Enterobacter hormaechei） 都属于 肠杆菌科（Enterobacteriaceae），是革兰氏阴性、兼性厌氧的杆状细菌。它们广泛存在于 环境中（如水、土壤、植物） 以及 人类和动物的肠道 中。

Report

    Please find attached the SNP and Indel calling results for the project DNAseq_2025_AYE against CU459141, as well as for the project DNAseq_2025_adeABadeIJ_adeIJK_CM1_CM2 against CP059040.

    Additionally, I have attached the analysis for the two clinical samples: clinical (in DNAseq_2025_AYE) and HF (in DNAseq_2025_adeABadeIJ_adeIJK_CM1_CM2). You can find the details in the files clinical_summary.xlsx and HF_summary.xlsx. The closest species found in the database are "Acinetobacter baumannii strain 2024CK-00130" and "Enterobacter hormaechei strain K528". Would you like to have the SNP and Indel analysis for the clinical samples compared to these closely related species as the reference?

like unlike

点赞本文的读者

还没有人对此文章表态

本文有评论

没有评论

DNAseq 2025_AYE and DNAseq 2025_adeABadeIJ_adeIJK_CM1_CM2

本文有评论

看文章，发评论，不要沉默

最受欢迎文章

最新文章

最多评论文章

推荐相似文章