Unicycler vs. Trycycler

Tags: bacterium, genome, pipeline

  1. prepare the input sequencing data

    NGS.id  Sample.name  ONT_barcode
    jk3332  5179R1       Native Barcode NB01
    jk3333  1585         Native Barcode NB02
    jk3334  1585V        Native Barcode NB03
    jk3335  5179         Native Barcode NB04
    jk3336  HD_05_2      Native Barcode NB05
    jk3337  HD_05_2_K5   Native Barcode NB06
    jk3338  HD_05_2_K6   Native Barcode NB07
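
The sample sheet above can double as a lookup from sample name to the barcode directory used in the long-read paths later on. A minimal sketch, assuming the sheet is kept as whitespace-separated text with the barcode shortened to its NB number; the function names `sample_sheet` and `barcode_dir_for` are hypothetical helpers, not part of any tool.

```shell
# Sample sheet copied from the table above (NGS.id, Sample.name, barcode number).
sample_sheet() {
cat <<'EOF'
jk3332 5179R1 NB01
jk3333 1585 NB02
jk3334 1585V NB03
jk3335 5179 NB04
jk3336 HD_05_2 NB05
jk3337 HD_05_2_K5 NB06
jk3338 HD_05_2_K6 NB07
EOF
}

# Print the fastq_pass subdirectory name (e.g. barcode04) for a given sample name.
barcode_dir_for() {
    sample_sheet | awk -v s="$1" '$2 == s { sub(/^NB/, "barcode", $3); print $3 }'
}

barcode_dir_for 5179   # prints barcode04
```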
    
  2. assembly using unicycler

    cat FAN41335_pass_barcode01_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode01_46d24d87_69a75752_1.fastq.gz > FAN41335_pass_barcode01.fastq.gz
    cat FAN41335_pass_barcode03_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode03_46d24d87_69a75752_1.fastq.gz > FAN41335_pass_barcode03.fastq.gz
    cat FAN41335_pass_barcode04_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode04_46d24d87_69a75752_1.fastq.gz FAN41335_pass_barcode04_46d24d87_69a75752_2.fastq.gz > FAN41335_pass_barcode04.fastq.gz
    cat FAN41335_pass_barcode05_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode05_46d24d87_69a75752_1.fastq.gz > FAN41335_pass_barcode05.fastq.gz
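
Concatenating gzip files with `cat` yields a valid multi-member gzip stream, which is why the chunks above can be merged without decompressing them. The sketch below wraps that in a small function and checks the result on two tiny made-up chunks; the function name `merge_chunks` and the `demo_pass_...` file names are assumptions for illustration.

```shell
# Demo in a throwaway directory with two tiny hypothetical FASTQ chunks.
workdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' | gzip > "$workdir/demo_pass_barcode01_0.fastq.gz"
printf '@r2\nTTGA\n+\nIIII\n' | gzip > "$workdir/demo_pass_barcode01_1.fastq.gz"

# Merge every chunk of one barcode; the shell expands the glob in sorted order.
merge_chunks() {  # usage: merge_chunks <dir> <barcode> <output>
    cat "$1"/*_pass_"$2"_*.fastq.gz > "$3"
}

merge_chunks "$workdir" barcode01 "$workdir/merged.fastq.gz"

# Count reads in the merged file (every fourth line starting at line 1 is a header).
merged_reads=$(gzip -dc "$workdir/merged.fastq.gz" | awk 'NR % 4 == 1' | wc -l | tr -d ' ')
echo "reads after merging: $merged_reads"
rm -rf "$workdir"
```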
    
    unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode normal -t 55 -o 5179R1_normal
    unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode normal -t 55 -o 1585_normal
    # sample 3 (1585V, barcode03): no short reads available, so it is skipped here
    unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode normal -t 55 -o 5179_normal
    unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode normal -t 55 -o HD05_2_normal
    unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode normal -t 55 -o HD05_2_K5_normal
    unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode normal -t 55 -o HD05_2_K6_normal
    
    unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode bold -t 55 -o 5179R1_bold
    unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode bold -t 55 -o 1585_bold
    # sample 3 (1585V, barcode03): no short reads available, so it is skipped here
    unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode bold -t 55 -o 5179_bold
    unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode bold -t 55 -o HD05_2_bold
    unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode bold -t 55 -o HD05_2_K5_bold
    unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode bold -t 55 -o HD05_2_K6_bold
    
    unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode conservative -t 55 -o 5179R1_conservative
    unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode conservative -t 55 -o 1585_conservative
    # sample 3 (1585V, barcode03): no short reads available, so it is skipped here
    unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode conservative -t 55 -o 5179_conservative
    
    unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode conservative -t 55 -o HD05_2_conservative
    unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode conservative -t 55 -o HD05_2_K5_conservative
    unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode conservative -t 55 -o HD05_2_K6_conservative
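
The Unicycler calls above differ only in sample, read files and `--mode`, so they can be generated from a small table. The following is a dry-run sketch that only prints the commands; the helper names `unicycler_inputs` and `print_unicycler_commands` are hypothetical, and the file names are the ones used above.

```shell
# Columns: output prefix, short-read R1, short-read R2, barcode directory.
unicycler_inputs() {
cat <<'EOF'
5179R1 s-epidermidis-5179-r1_R1.fastq.gz s-epidermidis-5179-r1_R2.fastq.gz barcode01
1585 s-epidermidis-1585_R1.fastq.gz s-epidermidis-1585_R2.fastq.gz barcode02
5179 s-epidermidis-5179_R1.fastq.gz s-epidermidis-5179_R2.fastq.gz barcode04
HD05_2 HD5_2_S38_R1_001.fastq.gz HD5_2_S38_R2_001.fastq.gz barcode05
HD05_2_K5 275_K5_Holger_S92_R1_001.fastq.gz 275_K5_Holger_S92_R2_001.fastq.gz barcode06
HD05_2_K6 276_K6_Holger_S95_R1_001.fastq.gz 276_K6_Holger_S95_R2_001.fastq.gz barcode07
EOF
}

long_dir=long_reads/240109_FAN41335_S_epidermidis/fastq_pass

# Dry run: print one unicycler command per sample and mode instead of executing it.
print_unicycler_commands() {
    for mode in normal bold conservative; do
        unicycler_inputs | while read -r name r1 r2 barcode; do
            echo "unicycler -1 $r1 -2 $r2 -l $long_dir/$barcode/FAN41335_pass_$barcode.fastq.gz --mode $mode -t 55 -o ${name}_$mode"
        done
    done
}

print_unicycler_commands
```

Piping the output to `sh` would run the assemblies one after another; printing first makes it easy to compare the generated lines against the hand-written ones.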
    
    ragtag.py scaffold  ../assembly_flye_HD05_2/assembly.fasta assembly.fasta
    ragtag.py patch  ragtag.scaffold.fasta ../../assembly_flye_HD05_2/assembly.fasta
    grep -o 'N' ragtag.patch.fasta | wc -l
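
Note that `grep -o 'N'` also counts any `N` that appears in a FASTA header line and ignores lowercase `n`. A small sketch that restricts the count to sequence lines is shown here on a made-up FASTA file; the function name `count_gap_bases` is an assumption.

```shell
# Count N/n bases in sequence lines only (header lines starting with ">" are skipped).
count_gap_bases() {
    grep -v '^>' "$1" | tr -cd 'Nn' | wc -c | tr -d ' '
}

# Tiny hypothetical FASTA: the header contains an "N" that must not be counted.
demo_fasta=$(mktemp)
printf '>contig_N1\nACGTNNNNACGT\nNNacgtn\n' > "$demo_fasta"
gap_bases=$(count_gap_bases "$demo_fasta")
echo "gap bases: $gap_bases"
rm -f "$demo_fasta"
```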
    
    makeblastdb -in ../assembly_flye_HD05_2/assembly.fasta -dbtype nucl
    blastn -db ../assembly_flye_HD05_2/assembly.fasta -query 1-4.fasta -out assmbly_vs_flye.blastn -evalue 0.00000000001 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1
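
With `-outfmt 6` BLAST writes twelve tab-separated columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). One way to summarise such a table is to keep the highest-scoring hit per query. The sketch below does this with awk on made-up rows; the contig names and numbers are hypothetical, and `best_hits` is an illustrative helper.

```shell
# Hypothetical rows in -outfmt 6 layout (tab-separated):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
demo_hits=$(mktemp)
printf 'contig_1\tflye_1\t99.95\t20000\t10\t0\t1\t20000\t1\t20000\t0.0\t36800\n'  > "$demo_hits"
printf 'contig_1\tflye_2\t91.20\t1500\t130\t2\t1\t1500\t1\t1500\t0.0\t1900\n'     >> "$demo_hits"
printf 'contig_2\tflye_3\t100.00\t9014\t0\t0\t1\t9014\t1\t9014\t0.0\t16646\n'     >> "$demo_hits"

# Keep the highest-bitscore hit per query and report subject, identity and length.
best_hits() {
    awk -F'\t' '$12 + 0 > best[$1] + 0 { best[$1] = $12; row[$1] = $1 "\t" $2 "\t" $3 "\t" $4 }
                END { for (q in row) print row[q] }' "$1" | sort
}

best_hit_table=$(best_hits "$demo_hits")
echo "$best_hit_table"
rm -f "$demo_hits"
```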
    
  3. install the trycycler environment

    nextdenovo_dir="/path/to/NextDenovo"
    nextpolish_dir="/path/to/NextPolish"
    genome_size="2500000" #2 503 927
    /home/jhuang/Tools/canu/build/bin/canu    -p canu -d canu_temp -fast genomeSize="$genome_size" useGrid=false maxThreads="$threads" -nanopore read_subsets/sample_"$i".fastq
    /home/jhuang/Tools/Trycycler/scripts/canu_trim.py    canu_temp/canu.contigs.fasta > assemblies/assembly_"$i".fasta
    /home/jhuang/Tools/Minipolish/miniasm_and_minipolish.sh    read_subsets/sample_"$i".fastq "$threads" > assemblies/assembly_"$i".gfa
    /home/jhuang/Tools/NECAT/Linux-amd64/bin/necat.pl    config config.txt
    /home/jhuang/Tools/NECAT/Linux-amd64/bin/necat.pl    bridge config.txt
    /home/jhuang/Tools/raven/build/bin/raven    --threads "$threads" --disable-checkpoints --graphical-fragment-assembly assemblies/assembly_"$i".gfa read_subsets/sample_"$i".fastq > assemblies/assembly_"$i".fasta
    
    # Tools by rrwick (https://github.com/rrwick): Bandage, Unicycler, Filtlong, Trycycler, Polypolish
    # install canu, flye, raven, miniasm, minipolish and any2fasta via 'mamba install'
    # install fastp, medaka, polypolish and masurca (provides POLCA) via 'mamba install'
    
    # install NextDenovo and NextPolish from https://github.com/Nextomics
    wget https://github.com/Nextomics/NextDenovo/releases/latest/download/NextDenovo.tgz
    tar -vxzf NextDenovo.tgz && cd NextDenovo
    #cd NextDenovo && make
    wget https://github.com/Nextomics/NextPolish/releases/download/v1.4.1/NextPolish.tgz
    pip install paralleltask
    tar -vxzf NextPolish.tgz && cd NextPolish   #&& make
    
    git clone https://github.com/rrwick/Minipolish.git
    
    wget https://github.com/xiaochuanle/NECAT/releases/download/v0.0.1_update20200803/necat_20200803_Linux-amd64.tar.gz
    tar xzvf necat_20200803_Linux-amd64.tar.gz
    cd NECAT/Linux-amd64/bin
    export PATH=$PATH:$(pwd)
    
    # Install canu and raven under ~/Tools/
    git clone https://github.com/marbl/canu.git
    cd canu/src
    make -j 50  #<number of threads>
    
    git clone https://github.com/lbcb-sci/raven && cd raven
    cmake -S ./ -B./build -DRAVEN_BUILD_EXE=1 -DCMAKE_BUILD_TYPE=Release
    cmake --build build
    
    # Adapt the script trycycler_assembly_extra-thorough.sh with the following complete paths.
    /home/jhuang/Tools/NECAT/Linux-amd64/bin/necat.pl
    /home/jhuang/Tools/Minipolish/miniasm_and_minipolish.sh
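
Before launching the assembly script it can help to confirm that every assembler is actually reachable. A minimal sketch, assuming the tools are expected on `PATH` under the names listed (adjust for tools that are called by full path, as above); `missing_tools` is a hypothetical helper.

```shell
# Report which of the given executables cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tool names as used in this section; an empty result means all were found.
missing_tools canu flye raven miniasm minipolish any2fasta necat.pl trycycler
```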
    
  4. assembly using trycycler

    TODO (IMPORTANT): assemble all genomes using the following methods and compare them to the Unicycler results.
    
    (trycycler) jhuang@hamm:~/DATA/Data_Holger_S.epidermidis_1585_5179_HD05$ ./trycycler_assembly_extra-thorough.sh
    
    #In the HD05 project, we use the following strategies!
    
    I. First, construct the genome with Trycycler alone (Trycycler: a consensus long-read assembly tool).
    
    cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz trycycler_5179R1/reads.fastq.gz
    cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz trycycler_1585/reads.fastq.gz
    cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode03/FAN41335_pass_barcode03.fastq.gz trycycler_1585v/reads.fastq.gz
    cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz trycycler_5179/reads.fastq.gz
    cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz trycycler_HD05_2/reads.fastq.gz
    cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz trycycler_HD05_2_K5/reads.fastq.gz
    cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz trycycler_HD05_2_K6/reads.fastq.gz
    
    for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
    cd ${sample};
    ../trycycler_assembly_extra-thorough.sh;
    cd ..;
    done
    #TODO: further steps (see https://github.com/rrwick/Trycycler/wiki)
    
    for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
    cd ${sample};
    ../trycycler_assembly_extra-thorough_raven.sh;
    cd ..;
    done
    #TODO: further steps (see https://github.com/rrwick/Trycycler/wiki)
    
    for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
    cd ${sample};
    ../trycycler_assembly_extra-thorough_canu.sh;
    cd ..;
    done
    #TODO: further steps (see https://github.com/rrwick/Trycycler/wiki)
    
    for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
    cd ${sample};
    trycycler cluster --threads 55 --assemblies assemblies/*.fasta --reads reads.fastq --out_dir trycycler;
    cd ..;
    done
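
The `cd ${sample}; ...; cd ..;` loops above keep going even if a `cd` fails, in which case the command runs in the wrong directory. A hedged alternative is to run each step in a subshell, sketched here with throwaway directories; `run_in_each` is a hypothetical helper and `pwd` stands in for the real script.

```shell
# Run a command inside each directory. The subshell restores the working
# directory automatically, and a failed cd skips the command for that sample.
run_in_each() {  # usage: run_in_each "<command>" <dir>...
    cmd=$1; shift
    for sample in "$@"; do
        ( cd "$sample" 2>/dev/null && $cmd ) || echo "failed or skipped: $sample" >&2
    done
}

# Demo with throwaway directories; "missing_dir" does not exist on purpose.
base=$(mktemp -d)
mkdir "$base/trycycler_1585" "$base/trycycler_5179"
visited=$(cd "$base" && run_in_each pwd trycycler_1585 missing_dir trycycler_5179 2>/dev/null)
echo "$visited"
rm -rf "$base"
```

In the real run the first argument would be something like `../trycycler_assembly_extra-thorough.sh`, followed by the sample directories.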
    
    trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
    #Error: failed to circularise sequence D_bctg00000000 because its start could not be found in other sequences. You can either trim some sequence off the start of D_bctg00000000 or exclude the sequence altogether and try again.
    #Error: failed to circularise sequence E_ctg000010 for multiple reasons. You must either repair this sequence or exclude it and then try running trycycler reconcile again.
    #Error: failed to circularise sequence W_ctg000000 because its start could not be found in other sequences. You can either trim some sequence off the start of W_ctg000000 or exclude the sequence altogether and try again.
    
    trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
    #Error: failed to circularise sequence K_ctg000000 because its end could not be found in other sequences. You can either trim some sequence off the end of K_ctg000000 or exclude the sequence altogether and try
    #Worst-1kbp: W_Utg714
    
    trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
    #Error: failed to circularise sequence T_contig_1 because its end could not be found in other sequences. You can either trim some sequence off the end of T_contig_1 or exclude the sequence altogether and try again.
    # Worst-1kbp: D_bctg00000000, J_bctg00000000, P_bctg00000000
    
    trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
    #Error: failed to circularise sequence A_tig00000003 because its start could not be found in other sequences. You can either trim some sequence off the start of A_tig00000003 or exclude the sequence altogether and try again.
    #Error: failed to circularise sequence E_ctg000000 because its start could not be found in other sequences. You can either trim some sequence off the start of E_ctg000000 or exclude the sequence altogether and try again.
    #Error: failed to circularise sequence Q_ctg000000 because its end could not be found in other sequences. You can either trim some sequence off the end of Q_ctg000000 or exclude the sequence altogether and try again.
    # Worst-1kbp: L_Utg716, X_Utg654
    
    trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
    
    trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
    
    trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
    
    trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_002
    #M_tig00000002, S_tig00000003, A_tig00000003, C_utg000003l, G_tig00000002, I_utg000002l
    #E_ctg000000, K_ctg000000, Q_ctg000000
    trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_003
    trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_004
    trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_005
    trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_006
    #
    #--> When finished, Trycycler reconcile will make 2_all_seqs.fasta in the cluster directory, a multi-FASTA file containing each of the contigs ready for multiple sequence alignment.
    
    trycycler msa --threads 55 --cluster_dir trycycler/cluster_001
    trycycler msa --threads 55 --cluster_dir trycycler/cluster_002
    trycycler msa --threads 55 --cluster_dir trycycler/cluster_003
    trycycler msa --threads 55 --cluster_dir trycycler/cluster_004
    trycycler msa --threads 55 --cluster_dir trycycler/cluster_005
    #--> When finished, Trycycler msa will make a 3_msa.fasta file in the cluster directory
    
    #generate 4_reads.fastq for each contig!
    trycycler partition --threads 55 --reads reads.fastq --cluster_dirs trycycler/cluster_*
    #trycycler partition --threads 55 --reads reads.fastq --cluster_dirs trycycler/cluster_001 trycycler/cluster_002 trycycler/cluster_003
    
    trycycler consensus --threads 55 --cluster_dir trycycler/cluster_001
    trycycler consensus --threads 55 --cluster_dir trycycler/cluster_002
    trycycler consensus --threads 55 --cluster_dir trycycler/cluster_003
    trycycler consensus --threads 55 --cluster_dir trycycler/cluster_004
    trycycler consensus --threads 55 --cluster_dir trycycler/cluster_005
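
The `msa` and `consensus` steps are run once per cluster with otherwise identical arguments, so they lend themselves to a loop (unlike `reconcile`, which usually needs manual fixes between attempts). The sketch below only prints the commands; `print_cluster_step` is a hypothetical helper and the cluster list mirrors the one above.

```shell
# Dry run: print the per-cluster commands for one Trycycler step.
print_cluster_step() {  # usage: print_cluster_step <step> <cluster_dir>...
    step=$1; shift
    for cluster_dir in "$@"; do
        echo "trycycler $step --threads 55 --cluster_dir $cluster_dir"
    done
}

# Cluster names as in the commands above; in practice a glob such as
# trycycler/cluster_* would list whichever clusters were kept.
clusters="trycycler/cluster_001 trycycler/cluster_002 trycycler/cluster_003 trycycler/cluster_004 trycycler/cluster_005"
print_cluster_step msa $clusters
print_cluster_step consensus $clusters
```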
    
    #!!NOTE that we take the isolates of HD05_2_K5 and HD05_2_K6 assembled by Unicycler instead of Trycycler!!
    
    # TODO (TODAY): generate the 3 datasets below!
    # TODO (IMPORTANT): write an email to Holger saying that the short-read sequencing of HD5_2 is not correct (see the 3 datasets)! However, MTxxxxxxx is confirmed to be absent from K5 and K6!
    # TODO: variant calling needs the short reads; it is not doable without the correct short reads. Resequencing? Calling variants from long reads alone is difficult because long reads contain too many errors.
    #TODO: check whether the MT sequence is present in the isolates; more detailed annotations come later!
    #II. Compare the results of Trycycler with those of Unicycler.
    #III. Finally, add the plasmids assembled by Unicycler to the final results, e.g. add the 4 plasmids to K5 and K6.
    
  5. Polishing after Trycycler

    #1. Long-read (Oxford Nanopore) polishing with Medaka (skipped due to a samtools version incompatibility!)
    # for c in trycycler/cluster_*; do
    #     medaka_consensus -i "$c"/4_reads.fastq -d "$c"/7_final_consensus.fasta -o "$c"/medaka -m r941_min_sup_g507 -t 12
    #     mv "$c"/medaka/consensus.fasta "$c"/8_medaka.fasta
    #     rm -r "$c"/medaka "$c"/*.fai "$c"/*.mmi  # clean up
    # done
    # cat trycycler/cluster_*/8_medaka.fasta > trycycler/consensus.fasta
    
    #2. Short-read polishing
    
    #---- 5179_R1 (2) ----
    #  mean read depth: 205.8x
    #  188 bp have a depth of zero (99.9924% coverage)
    #  355 positions changed (0.0144% of total positions)
    #  estimated pre-polishing sequence accuracy: 99.9856% (Q38.42)
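
The Q value in parentheses is the Phred-scaled form of the accuracy, Q = -10 * log10(1 - accuracy). A short sketch that reproduces the conversion with awk; the function name `accuracy_to_q` is an assumption.

```shell
# Phred-scaled accuracy: Q = -10 * log10(1 - accuracy), accuracy given in percent.
accuracy_to_q() {
    awk -v acc="$1" 'BEGIN { printf "Q%.2f\n", -10 * log(1 - acc / 100) / log(10) }'
}

accuracy_to_q 99.9856   # prints Q38.42, matching the value reported above
```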
    
    #Step 1: read QC
    fastp --in1 ../../s-epidermidis-5179-r1_R1.fastq.gz --in2 ../../s-epidermidis-5179-r1_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    
    #Step 2: Polypolish
    for cluster in cluster_001 cluster_002; do
    bwa index ${cluster}/7_final_consensus.fasta
    bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
    bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
    polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
    done
    
    #Step 3: POLCA
    for cluster in cluster_001 cluster_002; do
    cd ${cluster}
    polca.sh -a polypolish.fasta -r "../../../s-epidermidis-5179-r1_R1.fastq.gz ../../../s-epidermidis-5179-r1_R2.fastq.gz" -t 55 -m 120G
    cd ..
    done
    
    #Substitution Errors: 37
    #Insertion/Deletion Errors: 2
    #Assembly Size: 2470001
    #Consensus Quality: 99.9984
    
    #Substitution Errors: 4
    #Insertion/Deletion Errors: 0
    #Assembly Size: 17748
    #Consensus Quality: 99.9775
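
The consensus quality values reported here are consistent with 100 * (1 - (substitutions + indels) / assembly size). This relation is inferred from the numbers in this section rather than taken from the POLCA documentation, so treat the sketch below as an assumption; `consensus_quality` is a hypothetical helper.

```shell
# Consensus quality in percent from POLCA-style error counts and assembly size.
consensus_quality() {  # usage: consensus_quality <substitutions> <indels> <assembly_size>
    awk -v s="$1" -v i="$2" -v n="$3" 'BEGIN { printf "%.4f\n", 100 * (1 - (s + i) / n) }'
}

consensus_quality 37 2 2470001   # chromosome cluster reported above: 99.9984
consensus_quality 4 0 17748      # plasmid cluster reported above: 99.9775
```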
    
    #Step 4: (optional) more rounds and/or other polishers
    #After one round of Polypolish and one round of POLCA, your assembly should be in very good shape! 
    #However, there may still be a few lingering errors. You can try running additional rounds of Polypolish or POLCA to see if they make any more changes.
    
    for cluster in cluster_001 cluster_002; do
    cd ${cluster}
    polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-5179-r1_R1.fastq.gz ../../../s-epidermidis-5179-r1_R2.fastq.gz" -t 55 -m 120G
    cd ..
    done
    
    #Substitution Errors: 13
    #Insertion/Deletion Errors: 0
    #Assembly Size: 2470004
    #Consensus Quality: 99.9995
    
    #Substitution Errors: 0
    #Insertion/Deletion Errors: 0
    #Assembly Size: 17748
    #Consensus Quality: 100
    
    for cluster in cluster_001; do
    cd ${cluster}
    polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../s-epidermidis-5179-r1_R1.fastq.gz ../../../s-epidermidis-5179-r1_R2.fastq.gz" -t 55 -m 120G
    cd ..
    done
    
    #Substitution Errors: 0
    #Insertion/Deletion Errors: 0
    #Assembly Size: 2470004
    #Consensus Quality: 100
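
The extra POLCA rounds above are repeated until a round reports no remaining substitution or indel errors. Whether another round is needed can be decided by summing the two error lines of a report. The sketch below parses report text from standard input, assuming the four-line layout recorded in this section (where POLCA stores that text on disk is not shown here and is left open); `total_polca_errors` is a hypothetical helper.

```shell
# Sum substitution and indel errors from POLCA-style report text on stdin.
total_polca_errors() {
    awk -F': *' '/^#?(Substitution|Insertion\/Deletion) Errors:/ { total += $2 } END { print total + 0 }'
}

# Report lines as recorded above for the first POLCA round of cluster_001.
remaining=$(printf 'Substitution Errors: 37\nInsertion/Deletion Errors: 2\nAssembly Size: 2470001\nConsensus Quality: 99.9984\n' | total_polca_errors)
if [ "$remaining" -gt 0 ]; then
    echo "$remaining corrections made; another POLCA round may be worthwhile"
else
    echo "no corrections made; polishing has converged"
fi
```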
    
    #---- 1585 (4) ----
    #  mean read depth: 174.7x
    #  8,297 bp have a depth of zero (99.6604% coverage)
    #  271 positions changed (0.0111% of total positions)
    #  estimated pre-polishing sequence accuracy: 99.9889% (Q39.55)
    
    #Step 1: read QC
    fastp --in1 ../../s-epidermidis-1585_R1.fastq.gz --in2 ../../s-epidermidis-1585_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    
    #Step 2: Polypolish
    for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
      bwa index ${cluster}/7_final_consensus.fasta
      bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
      bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
      polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
    done
    
    #Step 3: POLCA
    for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
      cd ${cluster}
      polca.sh -a polypolish.fasta -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
      cd ..
    done
    
    #Substitution Errors: 7
    #Insertion/Deletion Errors: 4
    #Assembly Size: 2443174
    #Consensus Quality: 99.9995
    
    #Substitution Errors: 0
    #Insertion/Deletion Errors: 0
    #Assembly Size: 9014
    #Consensus Quality: 100
    
    #Substitution Errors: 0
    #Insertion/Deletion Errors: 0
    #Assembly Size: 9014
    #Consensus Quality: 100
    
    #Substitution Errors: 0
    #Insertion/Deletion Errors: 0
    #Assembly Size: 2344
    #Consensus Quality: 100
    
    #Step 4: (optional) more rounds and/or other polishers
    
    for cluster in cluster_001; do
      cd ${cluster}
      polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
      cd ..
    done
    
    #Substitution Errors: 0
    #Insertion/Deletion Errors: 0
    #Assembly Size: 2443176
    #Consensus Quality: 100
    
    #---- 1585 derived from unicycler, under 1585_normal/unicycler (4) ----
    #Step 0: copy chrom and plasmid1, plasmid2, plasmid3 to cluster_001/7_final_consensus.fasta, ...
    
    #Step 1: read QC
    fastp --in1 ../../s-epidermidis-1585_R1.fastq.gz --in2 ../../s-epidermidis-1585_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    
    #Step 2: Polypolish
    for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
      bwa index ${cluster}/7_final_consensus.fasta
      bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
      bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
      polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
    done
    #Polishing 1 (2,443,574 bp):
    #mean read depth: 174.7x
    #8,298 bp have a depth of zero (99.6604% coverage)
    #52 positions changed (0.0021% of total positions)
    #estimated pre-polishing sequence accuracy: 99.9979% (Q46.72)
    #Polishing 2 (9,014 bp):
    #mean read depth: 766.5x
    #3 bp have a depth of zero (99.9667% coverage)
    #0 positions changed (0.0000% of total positions)
    #estimated pre-polishing sequence accuracy: 100.0000% (Q∞)
    #Polishing 7 (2,344 bp):
    #mean read depth: 2893.0x
    #4 bp have a depth of zero (99.8294% coverage)
    #0 positions changed (0.0000% of total positions)
    #estimated pre-polishing sequence accuracy: 100.0000% (Q∞)
    #Polishing 8 (2,255 bp):
    #mean read depth: 2719.6x
    #4 bp have a depth of zero (99.8226% coverage)
    #0 positions changed (0.0000% of total positions)
    #estimated pre-polishing sequence accuracy: 100.0000% (Q∞)
    
    #Step 3: POLCA
    for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
      cd ${cluster}
      polca.sh -a polypolish.fasta -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
      cd ..
    done
    
    #Substitution Errors: 7
    #Insertion/Deletion Errors: 4
    #Assembly Size: 2443598
    #Consensus Quality: 99.9995
    
    #Substitution Errors: 0
    #Insertion/Deletion Errors: 0
    #Assembly Size: 9014
    #Consensus Quality: 100
    
    #Substitution Errors: 0
    #Insertion/Deletion Errors: 0
    #Assembly Size: 2344
    #Consensus Quality: 100
    
    #Substitution Errors: 0
    #Insertion/Deletion Errors: 0
    #Assembly Size: 2255
    #Consensus Quality: 100
    
    #Step 4: (optional) more rounds and/or other polishers
    
    for cluster in cluster_001; do
      cd ${cluster}
      polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
      cd ..
    done
    
    #Substitution Errors: 0
    #Insertion/Deletion Errors: 0
    #Assembly Size: 2443600
    #Consensus Quality: 100
    
    #-- 1585v (1, no short reads, waiting) --
    # TODO!
    
    #-- 5179 (2) --
    #mean read depth: 120.7x
    #7,547 bp have a depth of zero (99.6946% coverage)
    #356 positions changed (0.0144% of total positions)
    #estimated pre-polishing sequence accuracy: 99.9856% (Q38.41)
    
    #Step 1: read QC
    fastp --in1 ../../s-epidermidis-5179_R1.fastq.gz --in2 ../../s-epidermidis-5179_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    
    #Step 2: Polypolish
    for cluster in cluster_001 cluster_002; do
    bwa index ${cluster}/7_final_consensus.fasta
    bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
    bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
    polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
    done
    
    #Step 3: POLCA
    for cluster in cluster_001 cluster_002; do
    cd ${cluster}
    polca.sh -a polypolish.fasta -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
    cd ..
    done
    
    #Substitution Errors: 49
    #Insertion/Deletion Errors: 23
    #Assembly Size: 2471418
    #Consensus Quality: 99.9971
    
    #Substitution Errors: 0
    #Insertion/Deletion Errors: 0
    #Assembly Size: 17748
    #Consensus Quality: 100
    
    #Step 4: (optional) more rounds POLCA
    for cluster in cluster_001; do
    cd ${cluster}
    polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
    cd ..
    done
    #Substitution Errors: 10
    #Insertion/Deletion Errors: 5
    #Assembly Size: 2471442
    #Consensus Quality: 99.9994
    
    for cluster in cluster_001; do
    cd ${cluster}
    polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
    cd ..
    done
    #Substitution Errors: 6
    #Insertion/Deletion Errors: 0
    #Assembly Size: 2471445
    #Consensus Quality: 99.9998
    
    for cluster in cluster_001; do
    cd ${cluster}
    polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
    cd ..
    done
    #Substitution Errors: 0
    #Insertion/Deletion Errors: 0
    #Assembly Size: 2471445
    #Consensus Quality: 100
    
    #-- HD5_2 (2): without matching short reads we cannot correct the base-calling errors! --
    # !ERROR to be REPORTED, the 
    #Polishing cluster_001_consensus (2,504,140 bp):
    #mean read depth: 94.4x
    #240,420 bp have a depth of zero (90.3991% coverage)
    #56,894 positions changed (2.2720% of total positions)
    #estimated pre-polishing sequence accuracy: 97.7280% (Q16.44)
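
The low coverage and high change rate above are what suggest that these short reads do not belong to this isolate. A rough sanity check can be scripted from the zero-depth count and the assembly length. In the sketch below the 99% cutoff is an arbitrary example value, not a recommendation from Polypolish, and both function names are hypothetical.

```shell
# Percent of assembly positions covered by at least one short read.
coverage_percent() {  # usage: coverage_percent <zero_depth_bp> <assembly_bp>
    awk -v z="$1" -v n="$2" 'BEGIN { printf "%.4f\n", 100 * (1 - z / n) }'
}

# Flag a read set when coverage drops below a cutoff (99 is an arbitrary example).
check_read_set() {  # usage: check_read_set <zero_depth_bp> <assembly_bp> [min_percent]
    awk -v c="$(coverage_percent "$1" "$2")" -v m="${3:-99}" \
        'BEGIN { if (c + 0 < m + 0) print "suspicious"; else print "ok" }'
}

coverage_percent 240420 2504140   # HD5_2 numbers from above: 90.3991
check_read_set 240420 2504140     # prints suspicious
```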
    
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_1_S37_R1_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_1_S37_R2_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_2_S38_R1_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_2_S38_R2_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_3_S39_R1_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_3_S39_R2_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_4_S40_R1_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_4_S40_R2_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_5_S41_R1_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_5_S41_R2_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_6_S42_R1_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_6_S42_R2_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_7_S43_R1_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_7_S43_R2_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_8_S44_R1_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_8_S44_R2_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_9_S45_R1_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_9_S45_R2_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_10_S46_R1_001.fastq
    /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_10_S46_R2_001.fastq
    #Step 1: read QC
    fastp --in1 ../../HD5_2_S38_R1_001.fastq.gz --in2 ../../HD5_2_S38_R2_001.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    
    # NOTE that the following steps are not run since the short-reads are not correct!
    # #Step 2: Polypolish
    # for cluster in cluster_001 cluster_005; do
    #   bwa index ${cluster}/7_final_consensus.fasta
    #   bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
    #   bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
    #   polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
    # done
    
    # #Step 3: POLCA
    # for cluster in cluster_001 cluster_005; do
    #   cd ${cluster}
    #   polca.sh -a polypolish.fasta -r "../../../HD5_2_S38_R1_001.fastq.gz ../../../HD5_2_S38_R2_001.fastq.gz" -t 55 -m 120G
    #   cd ..
    # done
    
    # #Step 4: (optional) more rounds POLCA
    # for cluster in cluster_001; do
    #   cd ${cluster}
    #   polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../HD5_2_S38_R1_001.fastq.gz ../../../HD5_2_S38_R2_001.fastq.gz" -t 55 -m 120G
    #   cd ..
    # done
    
    # NOTE: the plasmids of HD5_2_K5 and HD5_2_K6 were copied from the Unicycler assemblies!
    #-- HD5_2_K5 (4) --
    #mean read depth: 87.1x
    #25 bp have a depth of zero (99.9990% coverage)
    #1,085 positions changed (0.0433% of total positions)
    #estimated pre-polishing sequence accuracy: 99.9567% (Q33.63)
    
    #Step 1: read QC
    fastp --in1 ../../275_K5_Holger_S92_R1_001.fastq.gz --in2 ../../275_K5_Holger_S92_R2_001.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    
    #Step 2: Polypolish
    for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
      bwa index ${cluster}/7_final_consensus.fasta
      bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
      bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
      polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
    done
    
    #Step 3: POLCA
    for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
      cd ${cluster}
      polca.sh -a polypolish.fasta -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
      cd ..
    done
    #Substitution Errors: 146
    #Insertion/Deletion Errors: 2
    #Assembly Size: 2504401
    #Consensus Quality: 99.9941
    
    #Substitution Errors: 41
    #Insertion/Deletion Errors: 0
    #Assembly Size: 41288
    #Consensus Quality: 99.9007
    
    #Substitution Errors: 0
    #Insertion/Deletion Errors: 0
    #Assembly Size: 9191
    #Consensus Quality: 100
    
    #Substitution Errors: 0
    #Insertion/Deletion Errors: 0
    #Assembly Size: 2767
    #Consensus Quality: 100
    
    #Step 4: (optional) more rounds POLCA
    for cluster in cluster_001 cluster_002; do
      cd ${cluster}
      polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
      cd ..
    done
    #Substitution Errors: 41
    #Insertion/Deletion Errors: 0
    #Assembly Size: 2504401
    #Consensus Quality: 99.9984
    
    #Substitution Errors: 8
    #Insertion/Deletion Errors: 0
    #Assembly Size: 41288
    #Consensus Quality: 99.9806
    
    for cluster in cluster_001 cluster_002; do
      cd ${cluster}
      polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
      cd ..
    done
    #Substitution Errors: 8
    #Insertion/Deletion Errors: 0
    #Assembly Size: 2504401
    #Consensus Quality: 99.9997
    
    #Substitution Errors: 4
    #Insertion/Deletion Errors: 0
    #Assembly Size: 41288
    #Consensus Quality: 99.9903
    
    for cluster in cluster_001 cluster_002; do
      cd ${cluster}
      polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
      cd ..
    done
    #Substitution Errors: 8
    #Insertion/Deletion Errors: 0
    #Assembly Size: 2504401
    #Consensus Quality: 99.9997
    
    #Substitution Errors: 4
    #Insertion/Deletion Errors: 0
    #Assembly Size: 41288
    #Consensus Quality: 99.9903
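
The repeated POLCA rounds above differ only in the input name, which gains one `.PolcaCorrected.fa` suffix per round. A minimal sketch of that naming rule follows; `polca_input` and `print_polca_rounds` are hypothetical helpers, not part of POLCA, and the loop only prints the commands (dry run) instead of executing them.

```shell
# Hypothetical helper: name of the POLCA input for a given round (1-based).
# Round 1 reads polypolish.fasta; each later round appends one
# ".PolcaCorrected.fa" suffix, matching the file names used above.
polca_input() {
  round=$1
  name=polypolish.fasta
  i=1
  while [ "$i" -lt "$round" ]; do
    name="${name}.PolcaCorrected.fa"
    i=$((i + 1))
  done
  printf '%s\n' "$name"
}

# Dry run: print (do not execute) the polca.sh command for each round.
# The second and third arguments are the short-read files of the sample.
print_polca_rounds() {
  rounds=$1; r1=$2; r2=$3
  n=1
  while [ "$n" -le "$rounds" ]; do
    printf 'polca.sh -a %s -r "%s %s" -t 55 -m 120G\n' "$(polca_input "$n")" "$r1" "$r2"
    n=$((n + 1))
  done
}

print_polca_rounds 3 ../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz
```

The printed commands are meant to be run from inside a cluster directory, as in the loops above. In the HD5_2_K5 results listed above, the last round reports the same error counts as the round before it, so further rounds were not useful there.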
    
    #-- HD5_2_K6 (4) --
    #mean read depth: 116.7x
    #4 bp have a depth of zero (99.9998% coverage)
    #1,022 positions changed (0.0408% of total positions)
    #estimated pre-polishing sequence accuracy: 99.9592% (Q33.89)
    
    #Step 1: read QC
    fastp --in1 ../../276_K6_Holger_S95_R1_001.fastq.gz --in2 ../../276_K6_Holger_S95_R2_001.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    
    #Step 2: Polypolish
    for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
      bwa index ${cluster}/7_final_consensus.fasta
      bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
      bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
      polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
    done
    
    #Step 3: POLCA
    for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
      cd ${cluster}
      polca.sh -a polypolish.fasta -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G
      cd ..
    done
    #Substitution Errors: 164
    #Insertion/Deletion Errors: 2
    #Assembly Size: 2504398
    #Consensus Quality: 99.9934
    
    #Substitution Errors: 22
    #Insertion/Deletion Errors: 0
    #Assembly Size: 41288
    #Consensus Quality: 99.9467
    
    #Substitution Errors: 0
    #Insertion/Deletion Errors: 0
    #Assembly Size: 9191
    #Consensus Quality: 100
    
    #Substitution Errors: 0
    #Insertion/Deletion Errors: 0
    #Assembly Size: 2767
    #Consensus Quality: 100
    
    #Step 4: (optional) more rounds POLCA
    for cluster in cluster_001 cluster_002; do
      cd ${cluster}
      polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G
      cd ..
    done
    #Substitution Errors: 32
    #Insertion/Deletion Errors: 0
    #Assembly Size: 2504400
    #Consensus Quality: 99.9987
    
    #Substitution Errors: 0
    #Insertion/Deletion Errors: 0
    #Assembly Size: 41288
    #Consensus Quality: 100
    
    for cluster in cluster_001; do
      cd ${cluster}
      polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G
      cd ..
    done
    #Substitution Errors: 4
    #Insertion/Deletion Errors: 0
    #Assembly Size: 2504400
    #Consensus Quality: 99.9998
    
    for cluster in cluster_001; do
      cd ${cluster}
      polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G
      cd ..
    done
    #Substitution Errors: 2
    #Insertion/Deletion Errors: 0
    #Assembly Size: 2504400
    #Consensus Quality: 99.9999
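
The POLCA summaries in this step are consistent with a simple relation: consensus quality = 100 - 100 * (substitutions + indels) / assembly size. The sketch below recomputes that value and converts a percent accuracy to a Phred-style Q score, as Polypolish does in its "estimated pre-polishing sequence accuracy" line. The formula for the POLCA value is an assumption inferred from the numbers in this post, and both function names are made up for illustration.

```shell
# Consensus quality (%) from POLCA's error counts and assembly size.
# Assumption: quality = 100 - 100 * (substitutions + indels) / size,
# which reproduces the values listed in this post.
consensus_quality() {
  awk -v s="$1" -v i="$2" -v n="$3" 'BEGIN { printf "%.4f\n", 100 - 100 * (s + i) / n }'
}

# Phred-scaled quality from a percent accuracy: Q = -10 * log10(1 - acc/100).
# An accuracy of 100 has no finite Q, so "inf" is printed instead.
phred_q() {
  awk -v a="$1" 'BEGIN {
    if (a >= 100) { print "inf"; exit }
    printf "%.2f\n", -10 * log(1 - a / 100) / log(10)
  }'
}

consensus_quality 146 2 2504401   # HD5_2_K5 chromosome, first POLCA round
consensus_quality 41 0 41288      # HD5_2_K5 41 kb plasmid, first POLCA round
phred_q 99.9592                   # Polypolish estimate for HD5_2_K6
```

The first two calls give 99.9941 and 99.9007, the values reported for HD5_2_K5 after the first POLCA round, and the last call gives 33.89, matching the Q33.89 reported for HD5_2_K6.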
    
  6. Results by directly using Unicycler

    #----------------------- 5179R1_normal -----------------------
    
    >1 length=2468563 depth=1.00x circular=true
    >2 length=17748 depth=1.42x circular=true
    
    Component   Segments   Links   Length      N50         Longest segment   Status
        total          2       2   2,486,311   2,468,563         2,468,563
            1          1       1   2,468,563   2,468,563         2,468,563   complete
            2          1       1      17,748      17,748            17,748   complete
    
    Segment   Length      Depth   Starting gene         Position    Strand    Identity   Coverage
        1   2,468,563   1.00x   UniRef90_Q5HJZ9       1,212,460   forward     100.0%     100.0%
        2      17,748   1.42x   UniRef90_A0A0H2VIR3       4,804   reverse      93.2%      99.7%
    
    # ---- 5179_bold ----
    
    Segment   Length      Depth    Starting gene         Position    Strand    Identity   Coverage
        1   2,469,173    1.00x   UniRef90_Q5HJZ9       1,901,872   reverse     100.0%     100.0%
        2      17,749    2.27x   UniRef90_A0A0H2VIR3       4,771   forward      93.2%      99.7%
        4       4,595   10.19x   none found
        8       2,449   17.14x   none found
    
    >1 length=2469173 depth=1.00x circular=true
    >2 length=17749 depth=2.27x circular=true
    >3 length=4761 depth=0.44x
    >4 length=4595 depth=10.19x circular=true
    >5 length=3735 depth=0.29x
    >6 length=3718 depth=0.42x
    >7 length=3573 depth=0.52x
    >8 length=2449 depth=17.14x circular=true
    >9 length=2411 depth=0.35x
    >10 length=2371 depth=0.32x
    >11 length=2365 depth=0.43x
    >12 length=1637 depth=0.44x
    >13 length=1568 depth=0.66x
    >14 length=1505 depth=0.65x
    >15 length=1403 depth=0.93x
    >16 length=1329 depth=0.55x
    
    makeblastdb -in assembly.fasta -dbtype nucl
    blastn -task blastn-short -db ../HD05_2_K5_conservative/assembly.fasta -query assembly.fasta -out 2-16_vs_1.blastn -evalue 0.00000000001 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1
    
    #TODO: manually fill the gap in the HD05_2 genome!
    
    5       1       99.946  3728    1       1       1       3728    1535666 1539392 0.0     7366
    6       1       99.973  3718    0       1       1       3718    702963  706679  0.0     7355
    7       1       99.888  3573    1       3       1       3573    1764622 1768191 0.0     7027
    9       1       100.000 2411    0       0       1       2411    1060914 1063324 0.0     4779
    10      1       100.000 2371    0       0       1       2371    615275  612905  0.0     4700
    11      1       99.958  2365    0       1       1       2365    1088713 1086350 0.0     4672
    12      1       100.000 1637    0       0       1       1637    146635  144999  0.0     3245
    13      1       99.936  1568    0       1       1       1568    2024197 2025763 0.0     3092
    14      1       100.000 1505    0       0       1       1505    2445480 2443976 0.0     2983
    15      1       100.000 1403    0       0       1       1403    197723  196321  0.0     2781
    16      1       99.925  1329    1       0       1       1329    49854   48526   0.0     2627
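
To judge whether the low-depth contigs are already contained in the chromosome, the BLAST rows can be combined with the contig lengths from the Unicycler headers. This is a rough sketch only: `contig_coverage` is a hypothetical helper, it assumes the standard 12-column `-outfmt 6` layout, and it expects the header lines to come before the BLAST rows on stdin.

```shell
# Report, per BLAST hit, the percent identity and the share of the query
# contig covered by the alignment.
# Input (stdin): Unicycler FASTA headers first, then -outfmt 6 rows.
# Output columns: query id, % identity, % of query length aligned.
contig_coverage() {
  awk '
    /^>/ {                       # header such as ">5 length=3735 depth=0.29x"
      id = substr($1, 2)
      split($2, kv, "=")
      len[id] = kv[2]
      next
    }
    ($1 in len) {                # BLAST row for a contig whose length is known
      span = $8 - $7 + 1         # qend - qstart + 1
      printf "%s\t%s\t%.1f\n", $1, $3, 100 * span / len[$1]
    }'
}

# Headers and rows copied from the listings above (contigs 5 and 9).
printf '%s\n' \
  '>5 length=3735 depth=0.29x' \
  '>9 length=2411 depth=0.35x' \
  '5 1 99.946 3728 1 1 1 3728 1535666 1539392 0.0 7366' \
  '9 1 100.000 2411 0 0 1 2411 1060914 1063324 0.0 4779' |
  contig_coverage
```

With real files this would be something like `grep '>' assembly.fasta | cat - 2-16_vs_1.blastn | contig_coverage`. Contigs that align at close to 100% identity over nearly their full length are candidates for being redundant with the chromosome, but that call still needs a manual check.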
    
    # -------------------- 1585_normal --------------------
    >1 length=2443574 depth=1.00x circular=true       #contig_1        2442282 10      60      61
    >2 length=9014 depth=3.72x circular=true
    >3 length=4388 depth=0.89x
    >4 length=3443 depth=0.48x
    >5 length=3338 depth=0.48x
    >6 length=3336 depth=0.45x
    >7 length=2344 depth=11.44x circular=true
    >8 length=2255 depth=9.81x circular=true
    >9 length=1929 depth=0.37x
    >10 length=1703 depth=1.67x
    >11 length=1605 depth=0.26x
    >12 length=1381 depth=0.56x
    >13 length=1360 depth=0.39x
    >14 length=1281 depth=0.41x
    >15 length=1163 depth=0.51x
    >16 length=1088 depth=0.24x
    
    2594107
    
    ragtag.py scaffold  ../HD05_2_K5_normal/assembly.fasta assembly.fasta
    ragtag.py patch    ragtag.scaffold.fasta ../../HD05_2_K5_normal/assembly.fasta
    grep -o 'N' ragtag.patch.fasta | wc -l
    
    3       1       99.977  4388    0       1       1       4388    2410738 2406352 0.0     8683
    4       1       99.942  3443    0       2       1       3443    2222741 2219301 0.0     6794
    5       1       99.970  3338    0       1       1       3338    455636  452300  0.0     6601
    6       1       99.940  3336    0       2       1       3336    1617740 1614407 0.0     6581
    9       1       99.948  1929    0       1       1       1929    1321522 1319595 0.0     3808
    10      1       99.941  1703    1       0       1       1703    90503   88801   0.0     3368
    11      1       99.938  1605    0       1       1       1605    2361795 2363398 0.0     3166
    12      1       99.928  1381    0       1       1       1381    241092  242471  0.0     2722
    13      1       100.000 1360    0       0       1       1360    1157897 1159256 0.0     2696
    14      1       100.000 1281    0       0       1       1281    218323  219603  0.0     2539
    15      1       100.000 1163    0       0       1       1163    2077536 2078698 0.0     2305
    16      1       100.000 1088    0       0       1       1088    283284  284371  0.0     2157
    
    >1 length=2503585 depth=1.00x circular=true
    >2 length=41288 depth=3.32x circular=true
    >3 length=9191 depth=8.29x circular=true
    >4 length=2767 depth=9.36x circular=true
    
    >1 length=2503927 depth=1.00x circular=true
    >2 length=41288 depth=3.77x circular=true
    >3 length=9191 depth=7.83x circular=true
    >4 length=2767 depth=10.11x circular=true
    
    #--------------------------
    1585V
    #[2024-01-17 13:42:28] INFO: Assembly statistics:
    
        Total length:   2438882 vs 2443574
        Fragments:  1
        Fragments N50:  2438882
        Largest frg:    2438882
        Scaffolds:  0
        Mean coverage:  47
    
    unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode conservative -t 55 -o 5179R1_conservative
    unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode conservative -t 55 -o 1585_conservative
    # Sample 3 (1585V): no short-read data, so no hybrid assembly
    unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode conservative -t 55 -o 5179_conservative
    
    unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode conservative -t 55 -o HD05_2_conservative
    unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode conservative -t 55 -o HD05_2_K5_conservative
    unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode conservative -t 55 -o HD05_2_K6_conservative
    
    # ---- 1  5179R1  2469692 ----
    >1 length=2468563 depth=1.00x circular=true
    >2 length=17748 depth=1.42x circular=true
    
    # ---- 2  1585    2442282 ---- (for comparison, the Trycycler chromosome is 2443176 nt)
    >1 length=2443574 depth=1.00x circular=true
    >2 length=9014 depth=3.72x circular=true
    >7 length=2344 depth=11.44x circular=true
    >8 length=2255 depth=9.81x circular=true
    
    # ---- 3  1585v   2438882 ----
    # assembled from long reads only (1 contig)
    
    # ---- 4  5179_bold    2471107+17740 ----
    >1 length=2469173 depth=1.00x circular=true
    >2 length=17749 depth=2.27x circular=true
    >4 length=4595 depth=10.19x circular=true
    >8 length=2449 depth=17.14x circular=true
    
    # ---- 5  HD05_2  2504622 ----
    # Note: the HD05_2_bold_hq_lq run includes the bad long reads.
    >1 length=965875 depth=0.95x
    >2 length=855325 depth=1.00x
    >3 length=582944 depth=1.02x
    >4 length=183656 depth=1.02x
    >5 length=13570 depth=4.73x circular=true
    >6 length=1503 depth=4.85x
    >7 length=1271 depth=5.06x
    >8 length=227 depth=2.03x
    >9 length=153 depth=0.93x
    >10 length=152 depth=1.09x
    
    # trycycler: 2503231 (yes), 9183 (yes), 22394, 18541 --
    
    # ---- 6  HD05_2_K5  2504656+41290+9191 ----
    # conservative mode
    >1 length=2503585 depth=1.00x circular=true
    >2 length=41288 depth=3.32x circular=true
    >3 length=9191 depth=8.29x circular=true
    >4 length=2767 depth=9.36x circular=true
    
    # ---- 7  HD05_2_K6  2504588+41285+9192 ----
    # conservative mode
    >1 length=2503927 depth=1.00x circular=true
    >2 length=41288 depth=3.77x circular=true
    >3 length=9191 depth=7.83x circular=true
    >4 length=2767 depth=10.11x circular=true
    
    ragtag.py scaffold  ../assembly_flye_HD05_2/assembly.fasta assembly.fasta
    ragtag.py patch  ragtag.scaffold.fasta ../../assembly_flye_HD05_2/assembly.fasta
    grep -o 'N' ragtag.patch.fasta | wc -l
    
    makeblastdb -in ../assembly_flye_HD05_2/assembly.fasta -dbtype nucl
    blastn -db ../assembly_flye_HD05_2/assembly.fasta -query 1-4.fasta -out assmbly_vs_flye.blastn -evalue 0.000001 -num_threads 15 -outfmt 6 -strand minus -max_target_seqs 1
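
The per-sample summaries in this step keep only the contigs that Unicycler marked `circular=true`. A small sketch of that selection is given here; `circular_contigs` is a hypothetical helper, and it assumes the header layout shown above (`>ID length=... depth=... circular=true`).

```shell
# List contigs that Unicycler flagged as circular, with length and depth.
# Reads a FASTA file (or just its header lines) on stdin.
# Output columns: contig id, length, depth.
circular_contigs() {
  awk '/^>/ && /circular=true/ {
    id = substr($1, 2)
    sub(/^length=/, "", $2)
    sub(/^depth=/, "", $3)
    printf "%s\t%s\t%s\n", id, $2, $3
  }'
}

# Header lines copied from the 1585_normal listing above.
printf '%s\n' \
  '>1 length=2443574 depth=1.00x circular=true' \
  '>2 length=9014 depth=3.72x circular=true' \
  '>3 length=4388 depth=0.89x' \
  '>7 length=2344 depth=11.44x circular=true' |
  circular_contigs
```

On a real assembly the call would be `circular_contigs < assembly.fasta`. Linear low-depth contigs such as contig 3 are dropped, which matches the replicon lists in the summaries above.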
    
  7. Submit all genomes to NCBI

    TODO: clarify how to submit 1585V, whose genome is assembled from the long reads only!
    BioSample accession
            BioProject: PRJNA1038700
            Staphylococcus epidermidis strain:1585v | isolate:1585v Genome sequencing 
                SAMN38198576
                Pathogen: clinical or host-associated sample from Staphylococcus epidermidis 
                    0 SRAs
    
    Status
        To be released
    
    Release date
        2027-11-10
    
    Created
        2023-11-10 15:24
    
    Updated
        2023-11-17 16:57
    
    Sample name
        1585v
    
    Package
        Pathogen: clinical or host-associated; version 1.0
    
    Organism
    
            Name:
                Staphylococcus epidermidis
    
            Taxonomy ID:
                1282
    
    Attributes
        Attribute name  Attribute value
    
            collected by
            H R
    
            collection date
            2004
    
            geographic location
            Germany: Hamburg
    
            host
            Homo sapiens
    
            host disease
            port-catheter infection
    
            isolation source
            port-catheter
    
            isolate
            missing
    
            strain
            1585v
    
            latitude and longitude
            53.551672 N 9.955081 E
    
            https://www.ncbi.nlm.nih.gov/genbank/genomesubmit/#run_pgap
    
  8. Background of 1585v and 1585

    S. epidermidis 1585 is known to be biofilm-negative in laboratory media but to form biofilm in the presence of human serum (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8346721/).

In contrast, S. epidermidis 1585v is a variant derived from strain 1585 in which, due to a chromosomal rearrangement, a 460 kDa isoform of Embp is overexpressed even in TSB. Mutant M135 is an isogenic mutant of 1585v in which expression of Embp is interrupted by insertion of transposon Tn917.

Staphylococcus epidermidis (S. epidermidis) 1585 is a specific strain of S. epidermidis that is classified as a wild-type strain. Wild-type in bacterial terminology refers to the strain of an organism that is found in nature, as opposed to those that have been modified or mutated in a laboratory setting. Here are some key points about the 1585 wild-type strain of S. epidermidis:

  1. No Embp Production in TSB: One notable characteristic of the S. epidermidis 1585 strain is that it does not produce Embp (extracellular matrix binding protein) when grown in TSB (Tryptic Soy Broth). Embp is a protein that plays a crucial role in the biofilm formation and adherence of bacteria to surfaces. The absence of Embp production in this strain could impact its ability to form biofilms, a common virulence factor in Staphylococcus infections.

  2. Biofilm Formation: S. epidermidis is known for its ability to form biofilms, especially on medical devices, leading to infections that are difficult to treat. The fact that the 1585 strain doesn't produce Embp in TSB suggests it may have a reduced capacity for biofilm formation under these conditions, which could be significant in understanding and managing such infections.

  3. Research and Clinical Implications: Studying wild-type strains like S. epidermidis 1585 is important for understanding the natural behavior and characteristics of the species. Since this strain behaves differently from other strains in terms of Embp production and possibly biofilm formation, it can provide insights into the mechanisms and genetic factors that control these processes. This knowledge is valuable for developing strategies to prevent and treat infections, especially in hospital and healthcare settings where S. epidermidis infections are common.

  4. Genetic Studies: The 1585 strain can also serve as a baseline or control in genetic studies. By comparing the genome and behavior of 1585 with other strains of S. epidermidis, researchers can identify genetic variations and mutations that may be responsible for different phenotypes, such as increased virulence or antibiotic resistance.

  5. Model Organism for Understanding Staphylococcal Behavior: As a wild-type strain, 1585 offers a model for studying the natural state of S. epidermidis. This is crucial for understanding the fundamental biology of the bacterium, which can help in the development of treatments and interventions against infections caused by more virulent or drug-resistant strains.

In summary, the S. epidermidis 1585 wild-type strain is significant in microbiological research due to its natural characteristics, particularly its behavior in biofilm formation and Embp production. Understanding these aspects can contribute to better insights into the pathogenicity and treatment of Staphylococcus infections, particularly in clinical settings where these bacteria are a common source of nosocomial infections.

