Unicycler vs. Trycycler

Tags: bacterium, genome, pipeline

  1. prepare the input sequencing data

    1. NGS.id Sample.name ONT_barcode
    2. jk3332 5179R1 Native Barcode NB01
    3. jk3333 1585 Native Barcode NB02
    4. jk3334 1585V Native Barcode NB03
    5. jk3335 5179 Native Barcode NB04
    6. jk3336 HD_05_2 Native Barcode NB05
    7. jk3337 HD_05_2_K5 Native Barcode NB06
    8. jk3338 HD_05_2_K6 Native Barcode NB07
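The sample sheet above can also be kept in machine-readable form so that later steps can look up a sample by its barcode. The following is only a minimal sketch: the function names `sample_sheet`, `sample_for_barcode` and `barcode_dir` are hypothetical helpers that are not part of the original workflow, and the `barcodeNN` directory naming is taken from the `fastq_pass` paths used later in these notes.

```shell
#!/usr/bin/env bash
# Sample sheet from the table above as tab-separated text:
# NGS id, sample name, ONT native barcode.
sample_sheet() {
  printf '%s\t%s\t%s\n' \
    jk3332 5179R1     NB01 \
    jk3333 1585       NB02 \
    jk3334 1585V      NB03 \
    jk3335 5179       NB04 \
    jk3336 HD_05_2    NB05 \
    jk3337 HD_05_2_K5 NB06 \
    jk3338 HD_05_2_K6 NB07
}

# Print the sample name that belongs to a barcode (hypothetical helper).
sample_for_barcode() {
  sample_sheet | awk -F '\t' -v bc="$1" '$3 == bc { print $2 }'
}

# Print the fastq_pass sub-directory name for a barcode, e.g. NB04 -> barcode04.
barcode_dir() {
  printf 'barcode%s\n' "${1#NB}"
}
```

For example, `sample_for_barcode NB04` prints the sample name of the fourth row, and `barcode_dir NB04` prints the matching directory name.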
  2. assembly using unicycler

    1. cat FAN41335_pass_barcode01_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode01_46d24d87_69a75752_1.fastq.gz > FAN41335_pass_barcode01.fastq.gz
    2. cat FAN41335_pass_barcode03_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode03_46d24d87_69a75752_1.fastq.gz > FAN41335_pass_barcode03.fastq.gz
    3. cat FAN41335_pass_barcode04_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode04_46d24d87_69a75752_1.fastq.gz FAN41335_pass_barcode04_46d24d87_69a75752_2.fastq.gz > FAN41335_pass_barcode04.fastq.gz
    4. cat FAN41335_pass_barcode05_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode05_46d24d87_69a75752_1.fastq.gz > FAN41335_pass_barcode05.fastq.gz
    5. unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode normal -t 55 -o 5179R1_normal
    6. unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode normal -t 55 -o 1585_normal
    7. # sample 3 (1585V) has no short-read sequencing data
    8. unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode normal -t 55 -o 5179_normal
    9. unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode normal -t 55 -o HD05_2_normal
    10. unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode normal -t 55 -o HD05_2_K5_normal
    11. unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode normal -t 55 -o HD05_2_K6_normal
    12. unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode bold -t 55 -o 5179R1_bold
    13. unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode bold -t 55 -o 1585_bold
    14. # sample 3 (1585V) has no short-read sequencing data
    15. unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode bold -t 55 -o 5179_bold
    16. unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode bold -t 55 -o HD05_2_bold
    17. unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode bold -t 55 -o HD05_2_K5_bold
    18. unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode bold -t 55 -o HD05_2_K6_bold
    26. unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode conservative -t 55 -o 5179R1_conservative
    27. unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode conservative -t 55 -o 1585_conservative
    28. # sample 3 (1585V) has no short-read sequencing data
    29. unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode conservative -t 55 -o 5179_conservative
    30. unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode conservative -t 55 -o HD05_2_conservative
    31. unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode conservative -t 55 -o HD05_2_K5_conservative
    32. unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode conservative -t 55 -o HD05_2_K6_conservative
    33. ragtag.py scaffold ../assembly_flye_HD05_2/assembly.fasta assembly.fasta
    34. ragtag.py patch ragtag.scaffold.fasta ../../assembly_flye_HD05_2/assembly.fasta
    35. grep -o 'N' ragtag.patch.fasta | wc -l
    36. makeblastdb -in ../assembly_flye_HD05_2/assembly.fasta -dbtype nucl
    37. blastn -db ../assembly_flye_HD05_2/assembly.fasta -query 1-4.fasta -out assmbly_vs_flye.blastn -evalue 0.00000000001 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1
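The Unicycler calls above repeat the same command for six samples and three modes. As a hedged sketch, the commands can be generated from one small table instead of being typed out. The helper names `samples`, `unicycler_cmd` and `all_unicycler_cmds` are hypothetical; the file names, barcodes, thread count and output names are copied from the commands above. The functions only print the commands and do not run Unicycler.

```shell
#!/usr/bin/env bash
long_dir="long_reads/240109_FAN41335_S_epidermidis/fastq_pass"

# One row per sample with short reads: output prefix, R1, R2, barcode number.
# Sample 3 (1585V) is left out because it has no short reads.
samples() {
  cat <<'EOF'
5179R1 s-epidermidis-5179-r1_R1.fastq.gz s-epidermidis-5179-r1_R2.fastq.gz 01
1585 s-epidermidis-1585_R1.fastq.gz s-epidermidis-1585_R2.fastq.gz 02
5179 s-epidermidis-5179_R1.fastq.gz s-epidermidis-5179_R2.fastq.gz 04
HD05_2 HD5_2_S38_R1_001.fastq.gz HD5_2_S38_R2_001.fastq.gz 05
HD05_2_K5 275_K5_Holger_S92_R1_001.fastq.gz 275_K5_Holger_S92_R2_001.fastq.gz 06
HD05_2_K6 276_K6_Holger_S95_R1_001.fastq.gz 276_K6_Holger_S95_R2_001.fastq.gz 07
EOF
}

# Print one hybrid-assembly command; nothing is executed here.
unicycler_cmd() {
  local out="$1" r1="$2" r2="$3" bc="$4" mode="$5"
  printf 'unicycler -1 %s -2 %s -l %s/barcode%s/FAN41335_pass_barcode%s.fastq.gz --mode %s -t 55 -o %s_%s\n' \
    "$r1" "$r2" "$long_dir" "$bc" "$bc" "$mode" "$out" "$mode"
}

# Print the commands for all samples in all three bridging modes.
all_unicycler_cmds() {
  local mode out r1 r2 bc
  for mode in normal bold conservative; do
    samples | while read -r out r1 r2 bc; do
      unicycler_cmd "$out" "$r1" "$r2" "$bc" "$mode"
    done
  done
}
```

Running `all_unicycler_cmds` lists the 18 commands for review; piping that output to `bash` would execute them one after another.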
  3. install the trycycler environment

    1. nextdenovo_dir="/path/to/NextDenovo"
    2. nextpolish_dir="/path/to/NextPolish"
    3. genome_size="2500000" #2 503 927
    4. /home/jhuang/Tools/canu/build/bin/canu -p canu -d canu_temp -fast genomeSize="$genome_size" useGrid=false maxThreads="$threads" -nanopore read_subsets/sample_"$i".fastq
    5. /home/jhuang/Tools/Trycycler/scripts/canu_trim.py canu_temp/canu.contigs.fasta > assemblies/assembly_"$i".fasta
    6. /home/jhuang/Tools/Minipolish/miniasm_and_minipolish.sh read_subsets/sample_"$i".fastq "$threads" > assemblies/assembly_"$i".gfa
    7. /home/jhuang/Tools/NECAT/Linux-amd64/bin/necat.pl config config.txt
    8. /home/jhuang/Tools/NECAT/Linux-amd64/bin/necat.pl bridge config.txt
    9. /home/jhuang/Tools/raven/build/bin/raven --threads "$threads" --disable-checkpoints --graphical-fragment-assembly assemblies/assembly_"$i".gfa read_subsets/sample_"$i".fastq > assemblies/assembly_"$i".fasta
    10. #https://github.com/rrwick (Bandage, Unicycler, Filtlong, Trycycler, Polypolish)
    11. install canu, flye, raven, miniasm, minipolish, any2fasta via 'mamba install'
    12. #install fastp, medaka, polypolish, masurca (which provides POLCA) with 'mamba install'
    13. install NextDenovo and NextPolish from https://github.com/Nextomics
    14. wget https://github.com/Nextomics/NextDenovo/releases/latest/download/NextDenovo.tgz
    15. tar -vxzf NextDenovo.tgz && cd NextDenovo
    16. #cd NextDenovo && make
    17. wget https://github.com/Nextomics/NextPolish/releases/download/v1.4.1/NextPolish.tgz
    18. pip install paralleltask
    19. tar -vxzf NextPolish.tgz && cd NextPolish #&& make
    20. git clone https://github.com/rrwick/Minipolish.git
    21. wget https://github.com/xiaochuanle/NECAT/releases/download/v0.0.1_update20200803/necat_20200803_Linux-amd64.tar.gz
    22. tar xzvf necat_20200803_Linux-amd64.tar.gz
    23. cd NECAT/Linux-amd64/bin
    24. export PATH=$PATH:$(pwd)
    25. # Install canu and raven under ~/Tools/
    26. git clone https://github.com/marbl/canu.git
    27. cd canu/src
    28. make -j 50 #<number of threads>
    29. git clone https://github.com/lbcb-sci/raven && cd raven
    30. cmake -S ./ -B./build -DRAVEN_BUILD_EXE=1 -DCMAKE_BUILD_TYPE=Release
    31. cmake --build build
    32. # Adapt the script trycycler_assembly_extra-thorough.sh with the following complete paths.
    33. /home/jhuang/Tools/NECAT/Linux-amd64/bin/necat.pl
    34. /home/jhuang/Tools/Minipolish/miniasm_and_minipolish.sh
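Because the tools come from several sources (mamba, release tarballs, source builds), it can help to check that they are all reachable before starting a long run. The following is a minimal sketch; `missing_tools` is a hypothetical helper, and the executable names in the usage note are the ones called elsewhere in these notes.

```shell
#!/usr/bin/env bash
# Print every given tool name that cannot be found on PATH, one per line.
# An empty output means that all tools were found.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

A possible call inside the trycycler environment is `missing_tools trycycler canu flye raven miniasm minipolish any2fasta fastp bwa polypolish polca.sh`. Tools installed under `/home/jhuang/Tools/` are only found this way if their directories were added to `PATH`.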
  4. assembly using trycycler

    1. TODO (IMPORTANT): assemble all genomes using the following methods and compare them to the Unicycler results.
    2. (trycycler) jhuang@hamm:~/DATA/Data_Holger_S.epidermidis_1585_5179_HD05$ ./trycycler_assembly_extra-thorough.sh
    3. #In the HD05 project, we use the following strategies!
    4. I. First, construct the genome with Trycycler only (Trycycler: a consensus long-read assembly tool).
    5. cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz trycycler_5179R1/reads.fastq.gz
    6. cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz trycycler_1585/reads.fastq.gz
    7. cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode03/FAN41335_pass_barcode03.fastq.gz trycycler_1585v/reads.fastq.gz
    8. cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz trycycler_5179/reads.fastq.gz
    9. cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz trycycler_HD05_2/reads.fastq.gz
    10. cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz trycycler_HD05_2_K5/reads.fastq.gz
    11. cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz trycycler_HD05_2_K6/reads.fastq.gz
    12. for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
    13. cd ${sample};
    14. ../trycycler_assembly_extra-thorough.sh;
    15. cd ..;
    16. done
    17. #TODO: further steps (see https://github.com/rrwick/Trycycler/wiki)
    18. for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
    19. cd ${sample};
    20. ../trycycler_assembly_extra-thorough_raven.sh;
    21. cd ..;
    22. done
    23. #TODO: further steps (see https://github.com/rrwick/Trycycler/wiki)
    24. for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
    25. cd ${sample};
    26. ../trycycler_assembly_extra-thorough_canu.sh;
    27. cd ..;
    28. done
    29. #TODO: further steps (see https://github.com/rrwick/Trycycler/wiki)
    30. for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
    31. cd ${sample};
    32. trycycler cluster --threads 55 --assemblies assemblies/*.fasta --reads reads.fastq --out_dir trycycler;
    33. cd ..;
    34. done
    35. trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
    36. #Error: failed to circularise sequence D_bctg00000000 because its start could not be found in other sequences. You can either trim some sequence off the start of D_bctg00000000 or exclude the sequence altogether and try again.
    38. #Error: failed to circularise sequence E_ctg000010 for multiple reasons. You must either repair this sequence or exclude it and then try running trycycler reconcile again.
    39. #Error: failed to circularise sequence W_ctg000000 because its start could not be found in other sequences. You can either trim some sequence off the start of W_ctg000000 or exclude the sequence altogether and try again.
    40. trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
    41. #Error: failed to circularise sequence K_ctg000000 because its end could not be found in other sequences. You can either trim some sequence off the end of K_ctg000000 or exclude the sequence altogether and try again.
    42. #Worst-1kbp: W_Utg714
    43. trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
    44. #Error: failed to circularise sequence T_contig_1 because its end could not be found in other sequences. You can either trim some sequence off the end of T_contig_1 or exclude the sequence altogether and try again.
    45. # Worst-1kbp: D_bctg00000000, J_bctg00000000, P_bctg00000000
    46. trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
    47. #Error: failed to circularise sequence A_tig00000003 because its start could not be found in other sequences. You can either trim some sequence off the start of A_tig00000003 or exclude the sequence altogether and try again.
    48. #Error: failed to circularise sequence E_ctg000000 because its start could not be found in other sequences. You can either trim some sequence off the start of E_ctg000000 or exclude the sequence altogether and try again.
    49. #Error: failed to circularise sequence Q_ctg000000 because its end could not be found in other sequences. You can either trim some sequence off the end of Q_ctg000000 or exclude the sequence altogether and try again.
    50. # Worst-1kbp: L_Utg716, X_Utg654
    51. trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
    52. trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
    53. trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
    54. trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_002
    55. #M_tig00000002, S_tig00000003, A_tig00000003, C_utg000003l, G_tig00000002, I_utg000002l
    56. #E_ctg000000, K_ctg000000, Q_ctg000000
    57. trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_003
    58. trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_004
    59. trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_005
    60. trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_006
    61. #
    62. #--> When finished, Trycycler reconcile will make 2_all_seqs.fasta in the cluster directory, a multi-FASTA file containing each of the contigs ready for multiple sequence alignment.
    63. trycycler msa --threads 55 --cluster_dir trycycler/cluster_001
    64. trycycler msa --threads 55 --cluster_dir trycycler/cluster_002
    65. trycycler msa --threads 55 --cluster_dir trycycler/cluster_003
    66. trycycler msa --threads 55 --cluster_dir trycycler/cluster_004
    67. trycycler msa --threads 55 --cluster_dir trycycler/cluster_005
    68. #--> When finished, Trycycler msa will make a 3_msa.fasta file in the cluster directory
    69. #generate 4_reads.fastq for each contig!
    70. trycycler partition --threads 55 --reads reads.fastq --cluster_dirs trycycler/cluster_*
    71. #trycycler partition --threads 55 --reads reads.fastq --cluster_dirs trycycler/cluster_001 trycycler/cluster_002 trycycler/cluster_003
    72. trycycler consensus --threads 55 --cluster_dir trycycler/cluster_001
    73. trycycler consensus --threads 55 --cluster_dir trycycler/cluster_002
    74. trycycler consensus --threads 55 --cluster_dir trycycler/cluster_003
    75. trycycler consensus --threads 55 --cluster_dir trycycler/cluster_004
    76. trycycler consensus --threads 55 --cluster_dir trycycler/cluster_005
    77. #!!NOTE that we take the isolates of HD05_2_K5 and HD05_2_K6 assembled by Unicycler instead of Trycycler!!
    78. # TODO (TODAY), generate the 3 datasets below!
    79. # TODO (IMPORTANT): write an email to Holger saying that the short-read sequencing of HD5_2 is not correct (see the 3 datasets)! However, MTxxxxxxx is confirmed to be absent from K5 and K6!
    80. # TODO: variant calling needs the short reads; it is not doable without the correct short reads! Resequencing? It is difficult to call variants from long reads alone since long reads contain too many errors!
    81. #TODO: check whether the MT sequence is in the isolates; more detailed annotations will come later!
    82. #II. Comparing the results of Trycycler with Unicycler.
    83. #III. Finally, add the plasmids assembled by Unicycler to the final results, e.g. add the 4 plasmids to K5 and K6.
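Once `trycycler reconcile` has succeeded for every cluster that is kept, the remaining steps (msa, partition, consensus) follow the same pattern per cluster. As a hedged sketch, the commands can be generated for any list of cluster directories. `trycycler_post_reconcile_cmds` is a hypothetical helper; the sub-commands, options and the thread count of 55 are copied from the commands above. The function only prints the commands.

```shell
#!/usr/bin/env bash
# Print the Trycycler commands that follow a successful reconcile step.
# Arguments: the cluster directories to keep. Nothing is executed.
trycycler_post_reconcile_cmds() {
  local threads=55 c
  # Multiple sequence alignment, one run per cluster (writes 3_msa.fasta).
  for c in "$@"; do
    printf 'trycycler msa --threads %s --cluster_dir %s\n' "$threads" "$c"
  done
  # Read partitioning, one run over all clusters (writes 4_reads.fastq).
  printf 'trycycler partition --threads %s --reads reads.fastq --cluster_dirs %s\n' "$threads" "$*"
  # Consensus, one run per cluster (writes 7_final_consensus.fasta).
  for c in "$@"; do
    printf 'trycycler consensus --threads %s --cluster_dir %s\n' "$threads" "$c"
  done
}
```

For a sample with two good clusters, `trycycler_post_reconcile_cmds trycycler/cluster_001 trycycler/cluster_002` prints five commands, which can be reviewed and then piped to `bash`.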
  5. Polishing after Trycycler

    1. #1. Oxford Nanopore sequencer (Ignored due to the samtools version incompatibility!)
    2. # for c in trycycler/cluster_*; do
    3. # medaka_consensus -i "$c"/4_reads.fastq -d "$c"/7_final_consensus.fasta -o "$c"/medaka -m r941_min_sup_g507 -t 12
    4. # mv "$c"/medaka/consensus.fasta "$c"/8_medaka.fasta
    5. # rm -r "$c"/medaka "$c"/*.fai "$c"/*.mmi # clean up
    6. # done
    7. # cat trycycler/cluster_*/8_medaka.fasta > trycycler/consensus.fasta
    8. #2. Short-read polishing
    9. #---- 5179_R1 (2) ----
    10. # mean read depth: 205.8x
    11. # 188 bp have a depth of zero (99.9924% coverage)
    12. # 355 positions changed (0.0144% of total positions)
    13. # estimated pre-polishing sequence accuracy: 99.9856% (Q38.42)
    14. #Step 1: read QC
    15. fastp --in1 ../../s-epidermidis-5179-r1_R1.fastq.gz --in2 ../../s-epidermidis-5179-r1_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    16. #Step 2: Polypolish
    17. for cluster in cluster_001 cluster_002; do
    18. bwa index ${cluster}/7_final_consensus.fasta
    19. bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
    20. bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
    21. polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
    22. done
    23. #Step 3: POLCA
    24. for cluster in cluster_001 cluster_002; do
    25. cd ${cluster}
    26. polca.sh -a polypolish.fasta -r "../../../s-epidermidis-5179-r1_R1.fastq.gz ../../../s-epidermidis-5179-r1_R2.fastq.gz" -t 55 -m 120G
    27. cd ..
    28. done
    29. #Substitution Errors: 37
    30. #Insertion/Deletion Errors: 2
    31. #Assembly Size: 2470001
    32. #Consensus Quality: 99.9984
    33. #Substitution Errors: 4
    34. #Insertion/Deletion Errors: 0
    35. #Assembly Size: 17748
    36. #Consensus Quality: 99.9775
    37. #Step 4: (optional) more rounds and/or other polishers
    38. #After one round of Polypolish and one round of POLCA, your assembly should be in very good shape!
    39. #However, there may still be a few lingering errors. You can try running additional rounds of Polypolish or POLCA to see if they make any more changes.
    40. for cluster in cluster_001 cluster_002; do
    41. cd ${cluster}
    42. polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-5179-r1_R1.fastq.gz ../../../s-epidermidis-5179-r1_R2.fastq.gz" -t 55 -m 120G
    43. cd ..
    44. done
    45. #Substitution Errors: 13
    46. #Insertion/Deletion Errors: 0
    47. #Assembly Size: 2470004
    48. #Consensus Quality: 99.9995
    49. #Substitution Errors: 0
    50. #Insertion/Deletion Errors: 0
    51. #Assembly Size: 17748
    52. #Consensus Quality: 100
    53. for cluster in cluster_001; do
    54. cd ${cluster}
    55. polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../s-epidermidis-5179-r1_R1.fastq.gz ../../../s-epidermidis-5179-r1_R2.fastq.gz" -t 55 -m 120G
    56. cd ..
    57. done
    58. #Substitution Errors: 0
    59. #Insertion/Deletion Errors: 0
    60. #Assembly Size: 2470004
    61. #Consensus Quality: 100
    62. #---- 1585 (4) ----
    63. # mean read depth: 174.7x
    64. # 8,297 bp have a depth of zero (99.6604% coverage)
    65. # 271 positions changed (0.0111% of total positions)
    66. # estimated pre-polishing sequence accuracy: 99.9889% (Q39.55)
    67. #Step 1: read QC
    68. fastp --in1 ../../s-epidermidis-1585_R1.fastq.gz --in2 ../../s-epidermidis-1585_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    69. #Step 2: Polypolish
    70. for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
    71. bwa index ${cluster}/7_final_consensus.fasta
    72. bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
    73. bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
    74. polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
    75. done
    76. #Step 3: POLCA
    77. for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
    78. cd ${cluster}
    79. polca.sh -a polypolish.fasta -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
    80. cd ..
    81. done
    82. #Substitution Errors: 7
    83. #Insertion/Deletion Errors: 4
    84. #Assembly Size: 2443174
    85. #Consensus Quality: 99.9995
    86. #Substitution Errors: 0
    87. #Insertion/Deletion Errors: 0
    88. #Assembly Size: 9014
    89. #Consensus Quality: 100
    90. #Substitution Errors: 0
    91. #Insertion/Deletion Errors: 0
    92. #Assembly Size: 9014
    93. #Consensus Quality: 100
    94. #Substitution Errors: 0
    95. #Insertion/Deletion Errors: 0
    96. #Assembly Size: 2344
    97. #Consensus Quality: 100
    98. #Step 4: (optional) more rounds and/or other polishers
    99. #After one round of Polypolish and one round of POLCA, your assembly should be in very good shape!
    100. #However, there may still be a few lingering errors. You can try running additional rounds of Polypolish or POLCA to see if they make any more changes.
    101. for cluster in cluster_001; do
    102. cd ${cluster}
    103. polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
    104. cd ..
    105. done
    106. #Substitution Errors: 0
    107. #Insertion/Deletion Errors: 0
    108. #Assembly Size: 2443176
    109. #Consensus Quality: 100
    110. #---- 1585 derived from unicycler, under 1585_normal/unicycler (4) ----
    111. #Step 0: copy chrom and plasmid1, plasmid2, plasmid3 to cluster_001/7_final_consensus.fasta, ...
    112. #Step 1: read QC
    113. fastp --in1 ../../s-epidermidis-1585_R1.fastq.gz --in2 ../../s-epidermidis-1585_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    114. #Step 2: Polypolish
    115. for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
    116. bwa index ${cluster}/7_final_consensus.fasta
    117. bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
    118. bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
    119. polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
    120. done
    121. #Polishing 1 (2,443,574 bp):
    122. #mean read depth: 174.7x
    123. #8,298 bp have a depth of zero (99.6604% coverage)
    124. #52 positions changed (0.0021% of total positions)
    125. #estimated pre-polishing sequence accuracy: 99.9979% (Q46.72)
    126. #Polishing 2 (9,014 bp):
    127. #mean read depth: 766.5x
    128. #3 bp have a depth of zero (99.9667% coverage)
    129. #0 positions changed (0.0000% of total positions)
    130. #estimated pre-polishing sequence accuracy: 100.0000% (Q∞)
    131. #Polishing 7 (2,344 bp):
    132. #mean read depth: 2893.0x
    133. #4 bp have a depth of zero (99.8294% coverage)
    134. #0 positions changed (0.0000% of total positions)
    135. #estimated pre-polishing sequence accuracy: 100.0000% (Q∞)
    136. #Polishing 8 (2,255 bp):
    137. #mean read depth: 2719.6x
    138. #4 bp have a depth of zero (99.8226% coverage)
    139. #0 positions changed (0.0000% of total positions)
    140. #estimated pre-polishing sequence accuracy: 100.0000% (Q∞)
    141. #Step 3: POLCA
    142. for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
    143. cd ${cluster}
    144. polca.sh -a polypolish.fasta -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
    145. cd ..
    146. done
    147. #Substitution Errors: 7
    148. #Insertion/Deletion Errors: 4
    149. #Assembly Size: 2443598
    150. #Consensus Quality: 99.9995
    151. #Substitution Errors: 0
    152. #Insertion/Deletion Errors: 0
    153. #Assembly Size: 9014
    154. #Consensus Quality: 100
    155. #Substitution Errors: 0
    156. #Insertion/Deletion Errors: 0
    157. #Assembly Size: 2344
    158. #Consensus Quality: 100
    159. #Substitution Errors: 0
    160. #Insertion/Deletion Errors: 0
    161. #Assembly Size: 2255
    162. #Consensus Quality: 100
    163. #Step 4: (optional) more rounds and/or other polishers
    164. #After one round of Polypolish and one round of POLCA, your assembly should be in very good shape!
    165. #However, there may still be a few lingering errors. You can try running additional rounds of Polypolish or POLCA to see if they make any more changes.
    166. for cluster in cluster_001; do
    167. cd ${cluster}
    168. polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
    169. cd ..
    170. done
    171. #Substitution Errors: 0
    172. #Insertion/Deletion Errors: 0
    173. #Assembly Size: 2443600
    174. #Consensus Quality: 100
    175. #-- 1585v (1, no short reads, waiting) --
    176. # TODO!
    177. #-- 5179 (2) --
    178. #mean read depth: 120.7x
    179. #7,547 bp have a depth of zero (99.6946% coverage)
    180. #356 positions changed (0.0144% of total positions)
    181. #estimated pre-polishing sequence accuracy: 99.9856% (Q38.41)
    182. #Step 1: read QC
    183. fastp --in1 ../../s-epidermidis-5179_R1.fastq.gz --in2 ../../s-epidermidis-5179_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    184. #Step 2: Polypolish
    185. for cluster in cluster_001 cluster_002; do
    186. bwa index ${cluster}/7_final_consensus.fasta
    187. bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
    188. bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
    189. polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
    190. done
    191. #Step 3: POLCA
    192. for cluster in cluster_001 cluster_002; do
    193. cd ${cluster}
    194. polca.sh -a polypolish.fasta -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
    195. cd ..
    196. done
    197. #Substitution Errors: 49
    198. #Insertion/Deletion Errors: 23
    199. #Assembly Size: 2471418
    200. #Consensus Quality: 99.9971
    201. #Substitution Errors: 0
    202. #Insertion/Deletion Errors: 0
    203. #Assembly Size: 17748
    204. #Consensus Quality: 100
    205. #Step 4: (optional) more rounds of POLCA
    206. for cluster in cluster_001; do
    207. cd ${cluster}
    208. polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
    209. cd ..
    210. done
    211. #Substitution Errors: 10
    212. #Insertion/Deletion Errors: 5
    213. #Assembly Size: 2471442
    214. #Consensus Quality: 99.9994
    215. for cluster in cluster_001; do
    216. cd ${cluster}
    217. polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
    218. cd ..
    219. done
    220. #Substitution Errors: 6
    221. #Insertion/Deletion Errors: 0
    222. #Assembly Size: 2471445
    223. #Consensus Quality: 99.9998
    224. for cluster in cluster_001; do
    225. cd ${cluster}
    226. polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
    227. cd ..
    228. done
    229. #Substitution Errors: 0
    230. #Insertion/Deletion Errors: 0
    231. #Assembly Size: 2471445
    232. #Consensus Quality: 100
    233. #-- HD5_2 (2): without the short-sequencing we cannot correct the base-calling! --
    234. # !ERROR to be REPORTED, the
    235. #Polishing cluster_001_consensus (2,504,140 bp):
    236. #mean read depth: 94.4x
    237. #240,420 bp have a depth of zero (90.3991% coverage)
    238. #56,894 positions changed (2.2720% of total positions)
    239. #estimated pre-polishing sequence accuracy: 97.7280% (Q16.44)
    240. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_1_S37_R1_001.fastq
    241. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_1_S37_R2_001.fastq
    242. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_2_S38_R1_001.fastq
    243. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_2_S38_R2_001.fastq
    244. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_3_S39_R1_001.fastq
    245. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_3_S39_R2_001.fastq
    246. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_4_S40_R1_001.fastq
    247. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_4_S40_R2_001.fastq
    248. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_5_S41_R1_001.fastq
    249. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_5_S41_R2_001.fastq
    250. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_6_S42_R1_001.fastq
    251. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_6_S42_R2_001.fastq
    252. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_7_S43_R1_001.fastq
    253. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_7_S43_R2_001.fastq
    254. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_8_S44_R1_001.fastq
    255. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_8_S44_R2_001.fastq
    256. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_9_S45_R1_001.fastq
    257. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_9_S45_R2_001.fastq
    258. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_10_S46_R1_001.fastq
    259. /media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_10_S46_R2_001.fastq
    260. #Step 1: read QC
    261. fastp --in1 ../../HD5_2_S38_R1_001.fastq.gz --in2 ../../HD5_2_S38_R2_001.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    262. # NOTE: the following steps were not run because the short reads are not correct!
    263. # #Step 2: Polypolish
    264. # for cluster in cluster_001 cluster_005; do
    265. # bwa index ${cluster}/7_final_consensus.fasta
    266. # bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
    267. # bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
    268. # polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
    269. # done
    270. # #Step 3: POLCA
    271. # for cluster in cluster_001 cluster_005; do
    272. # cd ${cluster}
    273. # polca.sh -a polypolish.fasta -r "../../../HD5_2_S38_R1_001.fastq.gz ../../../HD5_2_S38_R2_001.fastq.gz" -t 55 -m 120G
    274. # cd ..
    275. # done
    276. # #Step 4: (optional) more rounds POLCA
    277. # for cluster in cluster_001; do
    278. # cd ${cluster}
    279. # polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../HD5_2_S38_R1_001.fastq.gz ../../../HD5_2_S38_R2_001.fastq.gz" -t 55 -m 120G
    280. # cd ..
    281. # done
    282. # NOTE that the plasmids of HD5_2_K5 and HD5_2_K6 were copied from Unicycler!
    283. #-- HD5_2_K5 (4) --
    284. #mean read depth: 87.1x
    285. #25 bp have a depth of zero (99.9990% coverage)
    286. #1,085 positions changed (0.0433% of total positions)
    287. #estimated pre-polishing sequence accuracy: 99.9567% (Q33.63)
    288. #Step 1: read QC
    289. fastp --in1 ../../275_K5_Holger_S92_R1_001.fastq.gz --in2 ../../275_K5_Holger_S92_R2_001.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    290. #Step 2: Polypolish
    291. for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
    292. bwa index ${cluster}/7_final_consensus.fasta
    293. bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
    294. bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
    295. polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
    296. done
    297. #Step 3: POLCA
    298. for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
    299. cd ${cluster}
    300. polca.sh -a polypolish.fasta -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
    301. cd ..
    302. done
    303. #Substitution Errors: 146
    304. #Insertion/Deletion Errors: 2
    305. #Assembly Size: 2504401
    306. #Consensus Quality: 99.9941
    307. #Substitution Errors: 41
    308. #Insertion/Deletion Errors: 0
    309. #Assembly Size: 41288
    310. #Consensus Quality: 99.9007
    311. #Substitution Errors: 0
    312. #Insertion/Deletion Errors: 0
    313. #Assembly Size: 9191
    314. #Consensus Quality: 100
    315. #Substitution Errors: 0
    316. #Insertion/Deletion Errors: 0
    317. #Assembly Size: 2767
    318. #Consensus Quality: 100
    319. #Step 4: (optional) more rounds POLCA
    320. for cluster in cluster_001 cluster_002; do
    321. cd ${cluster}
    322. polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
    323. cd ..
    324. done
    325. #Substitution Errors: 41
    326. #Insertion/Deletion Errors: 0
    327. #Assembly Size: 2504401
    328. #Consensus Quality: 99.9984
    329. #Substitution Errors: 8
    330. #Insertion/Deletion Errors: 0
    331. #Assembly Size: 41288
    332. #Consensus Quality: 99.9806
    333. for cluster in cluster_001 cluster_002; do
    334. cd ${cluster}
    335. polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
    336. cd ..
    337. done
    338. #Substitution Errors: 8
    339. #Insertion/Deletion Errors: 0
    340. #Assembly Size: 2504401
    341. #Consensus Quality: 99.9997
    342. #Substitution Errors: 4
    343. #Insertion/Deletion Errors: 0
    344. #Assembly Size: 41288
    345. #Consensus Quality: 99.9903
    346. for cluster in cluster_001 cluster_002; do
    347. cd ${cluster}
    348. polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
    349. cd ..
    350. done
    351. #Substitution Errors: 8
    352. #Insertion/Deletion Errors: 0
    353. #Assembly Size: 2504401
    354. #Consensus Quality: 99.9997
    355. #Substitution Errors: 4
    356. #Insertion/Deletion Errors: 0
    357. #Assembly Size: 41288
    358. #Consensus Quality: 99.9903
    359. #-- HD5_2_K6 (4) --
    360. #mean read depth: 116.7x
    361. #4 bp have a depth of zero (99.9998% coverage)
    362. #1,022 positions changed (0.0408% of total positions)
    363. #estimated pre-polishing sequence accuracy: 99.9592% (Q33.89)
    364. #Step 1: read QC
    365. fastp --in1 ../../276_K6_Holger_S95_R1_001.fastq.gz --in2 ../../276_K6_Holger_S95_R2_001.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
    366. #Step 2: Polypolish
    367. for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
    368. bwa index ${cluster}/7_final_consensus.fasta
    369. bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
    370. bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
    371. polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
    372. done
    373. #Step 3: POLCA
    374. for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
    375. cd ${cluster}
    376. polca.sh -a polypolish.fasta -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G
    377. cd ..
    378. done
    379. #Substitution Errors: 164
    380. #Insertion/Deletion Errors: 2
    381. #Assembly Size: 2504398
    382. #Consensus Quality: 99.9934
    383. #Substitution Errors: 22
    384. #Insertion/Deletion Errors: 0
    385. #Assembly Size: 41288
    386. #Consensus Quality: 99.9467
    387. #Substitution Errors: 0
    388. #Insertion/Deletion Errors: 0
    389. #Assembly Size: 9191
    390. #Consensus Quality: 100
    391. #Substitution Errors: 0
    392. #Insertion/Deletion Errors: 0
    393. #Assembly Size: 2767
    394. #Consensus Quality: 100
    395. #Step 4: (optional) more rounds POLCA
    396. for cluster in cluster_001 cluster_002; do
    397. cd ${cluster}
    398. polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G
    399. cd ..
    400. done
    401. #Substitution Errors: 32
    402. #Insertion/Deletion Errors: 0
    403. #Assembly Size: 2504400
    404. #Consensus Quality: 99.9987
    405. #Substitution Errors: 0
    406. #Insertion/Deletion Errors: 0
    407. #Assembly Size: 41288
    408. #Consensus Quality: 100
    409. for cluster in cluster_001; do
    410. cd ${cluster}
    411. polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G
    412. cd ..
    413. done
    414. #Substitution Errors: 4
    415. #Insertion/Deletion Errors: 0
    416. #Assembly Size: 2504400
    417. #Consensus Quality: 99.9998
    418. for cluster in cluster_001; do
    419. cd ${cluster}
    420. polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G
    421. cd ..
    422. done
    423. #Substitution Errors: 2
    424. #Insertion/Deletion Errors: 0
    425. #Assembly Size: 2504400
    426. #Consensus Quality: 99.9999
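The polishing logs above report accuracy in two forms: Polypolish prints a percentage followed by a Phred-style Q value (for example 97.7280% and Q16.44), while POLCA prints only a "Consensus Quality" percentage. Both can be put on the same scale with Q = -10 * log10(1 - accuracy). The following is a minimal sketch of that conversion; `acc_to_q` is a hypothetical helper name and is not part of Polypolish or POLCA.

```shell
#!/usr/bin/env bash
# acc_to_q: convert an accuracy given in percent to a Phred-scaled Q value.
# Hypothetical helper for comparing the Polypolish and POLCA figures;
# it only needs awk.
acc_to_q() {
    awk -v a="$1" 'BEGIN {
        # An accuracy of 100% has no finite Phred score.
        if (a + 0 >= 100) { print "inf"; exit }
        # Q = -10 * log10(error rate); awk log() is the natural logarithm.
        printf "%.2f\n", -10 * log(1 - a / 100) / log(10)
    }'
}
```

For example, `acc_to_q 97.7280` reproduces the Q16.44 printed for the HD5_2 short reads, and a POLCA consensus quality of 99.9999 corresponds to roughly Q60. Because the logged percentages are already rounded, the last digit can differ slightly from the Q value that the tool itself prints.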
  6. Results by directly using Unicycler

    1. #----------------------- 5179R1_normal -----------------------
    2. >1 length=2468563 depth=1.00x circular=true
    3. >2 length=17748 depth=1.42x circular=true
    4. Component Segments Links Length N50 Longest segment Status
    5. total 2 2 2,486,311 2,468,563 2,468,563
    6. 1 1 1 2,468,563 2,468,563 2,468,563 complete
    7. 2 1 1 17,748 17,748 17,748 complete
    8. Segment Length Depth Starting gene Position Strand Identity Coverage
    9. 1 2,468,563 1.00x UniRef90_Q5HJZ9 1,212,460 forward 100.0% 100.0%
    10. 2 17,748 1.42x UniRef90_A0A0H2VIR3 4,804 reverse 93.2% 99.7%
    11. # ---- 5179_bold ----
    12. Segment Length Depth Starting gene Position Strand Identity Coverage
    13. 1 2,469,173 1.00x UniRef90_Q5HJZ9 1,901,872 reverse 100.0% 100.0%
    14. 2 17,749 2.27x UniRef90_A0A0H2VIR3 4,771 forward 93.2% 99.7%
    15. 4 4,595 10.19x none found
    16. 8 2,449 17.14x none found
    17. >1 length=2469173 depth=1.00x circular=true
    18. >2 length=17749 depth=2.27x circular=true
    19. >3 length=4761 depth=0.44x
    20. >4 length=4595 depth=10.19x circular=true
    21. >5 length=3735 depth=0.29x
    22. >6 length=3718 depth=0.42x
    23. >7 length=3573 depth=0.52x
    24. >8 length=2449 depth=17.14x circular=true
    25. >9 length=2411 depth=0.35x
    26. >10 length=2371 depth=0.32x
    27. >11 length=2365 depth=0.43x
    28. >12 length=1637 depth=0.44x
    29. >13 length=1568 depth=0.66x
    30. >14 length=1505 depth=0.65x
    31. >15 length=1403 depth=0.93x
    32. >16 length=1329 depth=0.55x
    33. makeblastdb -in assembly.fasta -dbtype nucl
    34. blastn -task blastn-short -db ../HD05_2_K5_conservative/assembly.fasta -query assembly.fasta -out 2-16_vs_1.blastn -evalue 0.00000000001 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1
    35. #TODO: manually fill the gap in the HD05_2 genome!
    36. 5 1 99.946 3728 1 1 1 3728 1535666 1539392 0.0 7366
    37. 6 1 99.973 3718 0 1 1 3718 702963 706679 0.0 7355
    38. 7 1 99.888 3573 1 3 1 3573 1764622 1768191 0.0 7027
    39. 9 1 100.000 2411 0 0 1 2411 1060914 1063324 0.0 4779
    40. 10 1 100.000 2371 0 0 1 2371 615275 612905 0.0 4700
    41. 11 1 99.958 2365 0 1 1 2365 1088713 1086350 0.0 4672
    42. 12 1 100.000 1637 0 0 1 1637 146635 144999 0.0 3245
    43. 13 1 99.936 1568 0 1 1 1568 2024197 2025763 0.0 3092
    44. 14 1 100.000 1505 0 0 1 1505 2445480 2443976 0.0 2983
    45. 15 1 100.000 1403 0 0 1 1403 197723 196321 0.0 2781
    46. 16 1 99.925 1329 1 0 1 1329 49854 48526 0.0 2627
    47. # -------------------- 1585_normal --------------------
    48. >1 length=2443574 depth=1.00x circular=true #contig_1 2442282 10 60 61
    49. >2 length=9014 depth=3.72x circular=true
    50. >3 length=4388 depth=0.89x
    51. >4 length=3443 depth=0.48x
    52. >5 length=3338 depth=0.48x
    53. >6 length=3336 depth=0.45x
    54. >7 length=2344 depth=11.44x circular=true
    55. >8 length=2255 depth=9.81x circular=true
    56. >9 length=1929 depth=0.37x
    57. >10 length=1703 depth=1.67x
    58. >11 length=1605 depth=0.26x
    59. >12 length=1381 depth=0.56x
    60. >13 length=1360 depth=0.39x
    61. >14 length=1281 depth=0.41x
    62. >15 length=1163 depth=0.51x
    63. >16 length=1088 depth=0.24x
    64. 2594107
    65. ragtag.py scaffold ../HD05_2_K5_normal/assembly.fasta assembly.fasta
    66. ragtag.py patch ragtag.scaffold.fasta ../../HD05_2_K5_normal/assembly.fasta
    67. grep -o 'N' ragtag.patch.fasta | wc -l
    68. 3 1 99.977 4388 0 1 1 4388 2410738 2406352 0.0 8683
    69. 4 1 99.942 3443 0 2 1 3443 2222741 2219301 0.0 6794
    70. 5 1 99.970 3338 0 1 1 3338 455636 452300 0.0 6601
    71. 6 1 99.940 3336 0 2 1 3336 1617740 1614407 0.0 6581
    72. 9 1 99.948 1929 0 1 1 1929 1321522 1319595 0.0 3808
    73. 10 1 99.941 1703 1 0 1 1703 90503 88801 0.0 3368
    74. 11 1 99.938 1605 0 1 1 1605 2361795 2363398 0.0 3166
    75. 12 1 99.928 1381 0 1 1 1381 241092 242471 0.0 2722
    76. 13 1 100.000 1360 0 0 1 1360 1157897 1159256 0.0 2696
    77. 14 1 100.000 1281 0 0 1 1281 218323 219603 0.0 2539
    78. 15 1 100.000 1163 0 0 1 1163 2077536 2078698 0.0 2305
    79. 16 1 100.000 1088 0 0 1 1088 283284 284371 0.0 2157
    80. >1 length=2503585 depth=1.00x circular=true
    81. >2 length=41288 depth=3.32x circular=true
    82. >3 length=9191 depth=8.29x circular=true
    83. >4 length=2767 depth=9.36x circular=true
    84. >1 length=2503927 depth=1.00x circular=true
    85. >2 length=41288 depth=3.77x circular=true
    86. >3 length=9191 depth=7.83x circular=true
    87. >4 length=2767 depth=10.11x circular=true
    88. #--------------------------
    89. 1585V
    90. #[2024-01-17 13:42:28] INFO: Assembly statistics:
    91. Total length: 2438882 vs 2443574
    92. Fragments: 1
    93. Fragments N50: 2438882
    94. Largest frg: 2438882
    95. Scaffolds: 0
    96. Mean coverage: 47
    97. unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode conservative -t 55 -o 5179R1_conservative
    98. unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode conservative -t 55 -o 1585_conservative
    99. #3 no short sequencing
    100. unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode conservative -t 55 -o 5179_conservative
    101. unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode conservative -t 55 -o HD05_2_conservative
    102. unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode conservative -t 55 -o HD05_2_K5_conservative
    103. unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode conservative -t 55 -o HD05_2_K6_conservative
    104. # ---- 1 5179R1 2469692 ----
    105. >1 length=2468563 depth=1.00x circular=true
    106. >2 length=17748 depth=1.42x circular=true
    107. # ---- 2 1585 2442282 ---- (for comparison, the Trycycler chromosome is 2443176 nt)
    108. >1 length=2443574 depth=1.00x circular=true
    109. >2 length=9014 depth=3.72x circular=true
    110. >7 length=2344 depth=11.44x circular=true
    111. >8 length=2255 depth=9.81x circular=true
    112. # ---- 3 1585v 2438882 ----
    113. #using long-read sequencing only: 1
    114. # ---- 4 5179_bold 2471107+17740 ----
    115. >1 length=2469173 depth=1.00x circular=true
    116. >2 length=17749 depth=2.27x circular=true
    117. >4 length=4595 depth=10.19x circular=true
    118. >8 length=2449 depth=17.14x circular=true
    119. # ---- 5 HD05_2 2504622 ----
    120. # Note: HD05_2_bold_hq_lq includes the bad long reads.
    121. >1 length=965875 depth=0.95x
    122. >2 length=855325 depth=1.00x
    123. >3 length=582944 depth=1.02x
    124. >4 length=183656 depth=1.02x
    125. >5 length=13570 depth=4.73x circular=true
    126. >6 length=1503 depth=4.85x
    127. >7 length=1271 depth=5.06x
    128. >8 length=227 depth=2.03x
    129. >9 length=153 depth=0.93x
    130. >10 length=152 depth=1.09x
    131. # trycycler: 2503231 (yes), 9183 (yes), 22394, 18541 --
    132. # ---- 6 HD05_2_K5 2504656+41290+9191 ----
    133. conservative
    134. >1 length=2503585 depth=1.00x circular=true
    135. >2 length=41288 depth=3.32x circular=true
    136. >3 length=9191 depth=8.29x circular=true
    137. >4 length=2767 depth=9.36x circular=true
    138. # ---- 7 HD05_2_K6 2504588+41285+9192 ----
    139. conservative
    140. >1 length=2503927 depth=1.00x circular=true
    141. >2 length=41288 depth=3.77x circular=true
    142. >3 length=9191 depth=7.83x circular=true
    143. >4 length=2767 depth=10.11x circular=true
    144. ragtag.py scaffold ../assembly_flye_HD05_2/assembly.fasta assembly.fasta
    145. ragtag.py patch ragtag.scaffold.fasta ../../assembly_flye_HD05_2/assembly.fasta
    146. grep -o 'N' ragtag.patch.fasta | wc -l
    147. makeblastdb -in ../assembly_flye_HD05_2/assembly.fasta -dbtype nucl
    148. blastn -db ../assembly_flye_HD05_2/assembly.fasta -query 1-4.fasta -out assmbly_vs_flye.blastn -evalue 0.000001 -num_threads 15 -outfmt 6 -strand minus -max_target_seqs 1
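The results above were read by eye in two ways: keeping only the contigs that Unicycler flagged as `circular=true`, and checking where the short, low-depth contigs hit contig 1 in the tabular BLAST output. The sketch below shows one way to script both checks. It assumes Unicycler-style FASTA headers and the default 12-column `-outfmt 6` layout; the function names `circular_contigs` and `summarize_hits` are hypothetical and not part of either tool.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; neither is part of Unicycler or BLAST.

# Print only the FASTA headers that are marked as circular.
# Assumes headers such as ">1 length=2469173 depth=1.00x circular=true".
circular_contigs() {
    awk '/^>/ && /circular=true/' "$@"
}

# Summarize BLAST tabular hits (-outfmt 6), one line per hit:
#   query <TAB> subject:start-end <TAB> strand <TAB> percent identity
# Assumed column order (BLAST default for -outfmt 6):
#   qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
# The subject interval is normalized so that start <= end; a hit with
# sstart > send is reported on the minus strand.
summarize_hits() {
    awk '!/^#/ && NF >= 12 {
        lo = $9; hi = $10; strand = "+"
        if ($9 + 0 > $10 + 0) { lo = $10; hi = $9; strand = "-" }
        printf "%s\t%s:%d-%d\t%s\t%s\n", $1, $2, lo, hi, strand, $3
    }' "$@"
}
```

Typical calls would be `circular_contigs assembly.fasta` and `summarize_hits 2-16_vs_1.blastn`. Both functions also read from standard input when no file is given.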
  7. Submit all genomes to NCBI

    1. TODO: For 1585V, use only the long reads to assemble the genome!
    2. BioSample accession
    3. BioProject: PRJNA1038700
    4. Staphylococcus epidermidis strain:1585v | isolate:1585v Genome sequencing
    5. SAMN38198576
    6. Pathogen: clinical or host-associated sample from Staphylococcus epidermidis
    7. 0 SRAs
    8. Status
    9. To be released
    10. Release date
    11. 2027-11-10
    12. Created
    13. 2023-11-10 15:24
    14. Updated
    15. 2023-11-17 16:57
    16. Sample name
    17. 1585v
    18. Package
    19. Pathogen: clinical or host-associated; version 1.0
    20. Organism
    21. Name:
    22. Staphylococcus epidermidis
    23. Taxonomy ID:
    24. 1282
    25. Attributes
    26. Attribute name Attribute value
    27. collected by
    28. H R
    29. collection date
    30. 2004
    31. geographic location
    32. Germany: Hamburg
    33. host
    34. Homo sapiens
    35. host disease
    36. port-catheter infection
    37. isolation source
    38. port-catheter
    39. isolate
    40. missing
    41. strain
    42. 1585v
    43. latitude and longitude
    44. 53.551672 N 9.955081 E
    45. https://www.ncbi.nlm.nih.gov/genbank/genomesubmit/#run_pgap
  8. Background of 1585v and 1585

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8346721/

S. epidermidis 1585 is known to be biofilm-negative in laboratory media, but to form biofilm in the presence of human serum.

In contrast, S. epidermidis 1585v is a variant derived from strain 1585 in which, due to a chromosomal rearrangement, a 460 kDa isoform of Embp is overexpressed even in TSB, while mutant M135 is an isogenic mutant of 1585v in which expression of Embp is interrupted by insertion of transposon Tn917.

Staphylococcus epidermidis (S. epidermidis) 1585 is a specific strain of S. epidermidis that is classified as a wild-type strain. Wild-type in bacterial terminology refers to the strain of an organism that is found in nature, as opposed to those that have been modified or mutated in a laboratory setting. Here are some key points about the 1585 wild-type strain of S. epidermidis:

  1. No Embp Production in TSB: One notable characteristic of the S. epidermidis 1585 strain is that it does not produce Embp (extracellular matrix binding protein) when grown in TSB (Tryptic Soy Broth). Embp is a protein that plays a crucial role in the biofilm formation and adherence of bacteria to surfaces. The absence of Embp production in this strain could impact its ability to form biofilms, a common virulence factor in Staphylococcus infections.

  2. Biofilm Formation: S. epidermidis is known for its ability to form biofilms, especially on medical devices, leading to infections that are difficult to treat. The fact that the 1585 strain doesn't produce Embp in TSB suggests it may have a reduced capacity for biofilm formation under these conditions, which could be significant in understanding and managing such infections.

  3. Research and Clinical Implications: Studying wild-type strains like S. epidermidis 1585 is important for understanding the natural behavior and characteristics of the species. Since this strain behaves differently from other strains in terms of Embp production and possibly biofilm formation, it can provide insights into the mechanisms and genetic factors that control these processes. This knowledge is valuable for developing strategies to prevent and treat infections, especially in hospital and healthcare settings where S. epidermidis infections are common.

  4. Genetic Studies: The 1585 strain can also serve as a baseline or control in genetic studies. By comparing the genome and behavior of 1585 with other strains of S. epidermidis, researchers can identify genetic variations and mutations that may be responsible for different phenotypes, such as increased virulence or antibiotic resistance.

  5. Model Organism for Understanding Staphylococcal Behavior: As a wild-type strain, 1585 offers a model for studying the natural state of S. epidermidis. This is crucial for understanding the fundamental biology of the bacterium, which can help in the development of treatments and interventions against infections caused by more virulent or drug-resistant strains.

In summary, the S. epidermidis 1585 wild-type strain is significant in microbiological research due to its natural characteristics, particularly its behavior in biofilm formation and Embp production. Understanding these aspects can contribute to better insights into the pathogenicity and treatment of Staphylococcus infections, particularly in clinical settings where these bacteria are a common source of nosocomial infections.


© 2023 XGenes.com Impressum