Tags: bacterium, genome, pipeline
Prepare the input sequencing data
NGS.id Sample.name ONT_barcode
jk3332 5179R1 Native Barcode NB01
jk3333 1585 Native Barcode NB02
jk3334 1585V Native Barcode NB03
jk3335 5179 Native Barcode NB04
jk3336 HD_05_2 Native Barcode NB05
jk3337 HD_05_2_K5 Native Barcode NB06
jk3338 HD_05_2_K6 Native Barcode NB07
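The barcode column maps onto the fastq_pass/barcodeNN directories used in the commands further down. A minimal sketch of that lookup, assuming the table is saved as a whitespace-separated text file (the file and the function name barcode_dir_for are hypothetical) with the sample name in column 2 and the NBxx label in the last column:

```shell
# barcode_dir_for SAMPLE TABLE
# Prints the fastq_pass sub-directory name (e.g. barcode01) for one sample.
# Assumption: column 2 is the sample name, the last column looks like NB01.
barcode_dir_for() {
    awk -v s="$1" '$2 == s { b = $NF; sub(/^NB/, "", b); print "barcode" b }' "$2"
}

# Demo on a temporary copy of a few rows of the table
tmp_table=$(mktemp)
printf '%s\n' \
    'NGS.id Sample.name ONT_barcode' \
    'jk3332 5179R1 Native Barcode NB01' \
    'jk3336 HD_05_2 Native Barcode NB05' > "$tmp_table"
barcode_dir_for HD_05_2 "$tmp_table"
```

An unknown sample name prints nothing, which makes a typo in the sample list easy to spot before any assembly is started.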
assembly using unicycler
cat FAN41335_pass_barcode01_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode01_46d24d87_69a75752_1.fastq.gz > FAN41335_pass_barcode01.fastq.gz
cat FAN41335_pass_barcode03_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode03_46d24d87_69a75752_1.fastq.gz > FAN41335_pass_barcode03.fastq.gz
cat FAN41335_pass_barcode04_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode04_46d24d87_69a75752_1.fastq.gz FAN41335_pass_barcode04_46d24d87_69a75752_2.fastq.gz > FAN41335_pass_barcode04.fastq.gz
cat FAN41335_pass_barcode05_46d24d87_69a75752_0.fastq.gz FAN41335_pass_barcode05_46d24d87_69a75752_1.fastq.gz > FAN41335_pass_barcode05.fastq.gz
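The four cat commands above differ only in the barcode and the number of chunks. A sketch of a helper that merges whatever chunks exist for one barcode; the function name merge_chunks is hypothetical, and it assumes chunk files follow the pattern PREFIX_pass_BARCODE_..._N.fastq.gz seen above. Concatenated gzip streams are themselves a valid gzip file, so plain cat is sufficient.

```shell
# merge_chunks RUN_PREFIX BARCODE [DIR]
# Concatenates all chunks of one barcode into DIR/PREFIX_pass_BARCODE.fastq.gz.
merge_chunks() {
    local prefix="$1" barcode="$2" dir="${3:-.}"
    local out="$dir/${prefix}_pass_${barcode}.fastq.gz"
    # The glob needs an underscore after the barcode, so the merged
    # output file itself never matches on a second run.
    set -- "$dir/${prefix}_pass_${barcode}"_*_[0-9]*.fastq.gz
    [ -e "$1" ] || { echo "no chunks found for $barcode in $dir" >&2; return 1; }
    cat "$@" > "$out"
}

# Example call (run inside the project directory):
# merge_chunks FAN41335 barcode01 long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01
```

Chunks are merged in glob (lexicographic) order, which does not matter for assembly input.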
unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode normal -t 55 -o 5179R1_normal
unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode normal -t 55 -o 1585_normal
#3 (1585V, barcode03): no short reads available, so no hybrid assembly
unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode normal -t 55 -o 5179_normal
unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode normal -t 55 -o HD05_2_normal
unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode normal -t 55 -o HD05_2_K5_normal
unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode normal -t 55 -o HD05_2_K6_normal
unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode bold -t 55 -o 5179R1_bold
unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode bold -t 55 -o 1585_bold
#3 (1585V, barcode03): no short reads available, so no hybrid assembly
unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode bold -t 55 -o 5179_bold
unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode bold -t 55 -o HD05_2_bold
unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode bold -t 55 -o HD05_2_K5_bold
unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode bold -t 55 -o HD05_2_K6_bold
unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode conservative -t 55 -o 5179R1_conservative
unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode conservative -t 55 -o 1585_conservative
#3 (1585V, barcode03): no short reads available, so no hybrid assembly
unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode conservative -t 55 -o 5179_conservative
unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode conservative -t 55 -o HD05_2_conservative
unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode conservative -t 55 -o HD05_2_K5_conservative
unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode conservative -t 55 -o HD05_2_K6_conservative
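The Unicycler calls above repeat the same command for each sample and each of the three bridging modes. A sketch of a generator that only prints the commands for review (nothing is executed); the function name print_unicycler_cmds is hypothetical, and the flags are the ones already used above.

```shell
# print_unicycler_cmds NAME R1 R2 LONG_READS
# Prints (does not run) one Unicycler command per bridging mode.
print_unicycler_cmds() {
    local name="$1" r1="$2" r2="$3" long="$4" mode
    for mode in conservative normal bold; do
        printf 'unicycler -1 %s -2 %s -l %s --mode %s -t 55 -o %s_%s\n' \
            "$r1" "$r2" "$long" "$mode" "$name" "$mode"
    done
}

lr=long_reads/240109_FAN41335_S_epidermidis/fastq_pass
print_unicycler_cmds 1585 s-epidermidis-1585_R1.fastq.gz s-epidermidis-1585_R2.fastq.gz \
    "$lr/barcode02/FAN41335_pass_barcode02.fastq.gz"
```

Printing first and piping the reviewed output to a shell afterwards keeps a mistyped read file from starting a long run.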
ragtag.py scaffold ../assembly_flye_HD05_2/assembly.fasta assembly.fasta
ragtag.py patch ragtag.scaffold.fasta ../../assembly_flye_HD05_2/assembly.fasta
grep -o 'N' ragtag.patch.fasta | wc -l
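Note that grep -o 'N' on the whole file also counts any N that happens to occur in a FASTA header line. A sketch that counts gap bases in sequence lines only (the function name count_gap_bases is hypothetical; lowercase n is included on the assumption that soft-masked gaps should count too):

```shell
# count_gap_bases FASTA
# Counts N/n characters in sequence lines; lines starting with '>' are skipped.
count_gap_bases() {
    grep -v '^>' "$1" | tr -cd 'Nn' | wc -c | tr -d ' '
}

# Example call:
# count_gap_bases ragtag.patch.fasta
```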
makeblastdb -in ../assembly_flye_HD05_2/assembly.fasta -dbtype nucl
blastn -db ../assembly_flye_HD05_2/assembly.fasta -query 1-4.fasta -out assmbly_vs_flye.blastn -evalue 0.00000000001 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1
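The tabular output (-outfmt 6) has twelve tab-separated columns, starting with qseqid, sseqid, pident and alignment length. A sketch for keeping only strong hits; the function name filter_blast_hits and the thresholds in the example call are illustrative assumptions, not values used in this project.

```shell
# filter_blast_hits FILE MIN_IDENTITY MIN_LENGTH
# Keeps hits from BLAST tabular output (-outfmt 6) that pass both thresholds.
# Columns used: 1 qseqid, 2 sseqid, 3 pident, 4 alignment length.
filter_blast_hits() {
    awk -F'\t' -v id="$2" -v len="$3" \
        '$3 >= id && $4 >= len { print $1, $2, $3, $4 }' OFS='\t' "$1"
}

# Example call (thresholds are placeholders):
# filter_blast_hits assmbly_vs_flye.blastn 95 1000
```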
install the trycycler environment
nextdenovo_dir="/path/to/NextDenovo"
nextpolish_dir="/path/to/NextPolish"
genome_size="2500000" #2 503 927
/home/jhuang/Tools/canu/build/bin/canu -p canu -d canu_temp -fast genomeSize="$genome_size" useGrid=false maxThreads="$threads" -nanopore read_subsets/sample_"$i".fastq
/home/jhuang/Tools/Trycycler/scripts/canu_trim.py canu_temp/canu.contigs.fasta > assemblies/assembly_"$i".fasta
/home/jhuang/Tools/Minipolish/miniasm_and_minipolish.sh read_subsets/sample_"$i".fastq "$threads" > assemblies/assembly_"$i".gfa
/home/jhuang/Tools/NECAT/Linux-amd64/bin/necat.pl config config.txt
/home/jhuang/Tools/NECAT/Linux-amd64/bin/necat.pl bridge config.txt
/home/jhuang/Tools/raven/build/bin/raven --threads "$threads" --disable-checkpoints --graphical-fragment-assembly assemblies/assembly_"$i".gfa read_subsets/sample_"$i".fastq > assemblies/assembly_"$i".fasta
#https://github.com/rrwick (Bandage, Unicycler, Filtlong, Trycycler, Polypolish)
install canu, flye, raven, miniasm, minipolish, any2fasta via 'mamba install'
#install fastp, medaka, polypolish, masurca (install Polca) with 'mamba install'
install NextDenovo and NextPolish from https://github.com/Nextomics
wget https://github.com/Nextomics/NextDenovo/releases/latest/download/NextDenovo.tgz
tar -vxzf NextDenovo.tgz && cd NextDenovo
#cd NextDenovo && make
wget https://github.com/Nextomics/NextPolish/releases/download/v1.4.1/NextPolish.tgz
pip install paralleltask
tar -vxzf NextPolish.tgz && cd NextPolish #&& make
git clone https://github.com/rrwick/Minipolish.git
wget https://github.com/xiaochuanle/NECAT/releases/download/v0.0.1_update20200803/necat_20200803_Linux-amd64.tar.gz
tar xzvf necat_20200803_Linux-amd64.tar.gz
cd NECAT/Linux-amd64/bin
export PATH=$PATH:$(pwd)
# Install canu and raven under ~/Tools/
git clone https://github.com/marbl/canu.git
cd canu/src
make -j 50 #<number of threads>
git clone https://github.com/lbcb-sci/raven && cd raven
cmake -S ./ -B./build -DRAVEN_BUILD_EXE=1 -DCMAKE_BUILD_TYPE=Release
cmake --build build
# Adapt the script trycycler_assembly_extra-thorough.sh with the following complete paths.
/home/jhuang/Tools/NECAT/Linux-amd64/bin/necat.pl
/home/jhuang/Tools/Minipolish/miniasm_and_minipolish.sh
assembly using trycycler
TODO (IMPORTANT): assemble all genomes using the following methods and compare them to the Unicycler results.
(trycycler) jhuang@hamm:~/DATA/Data_Holger_S.epidermidis_1585_5179_HD05$ ./trycycler_assembly_extra-thorough.sh
#In the HD05 project, we use the following strategies!
I. First, construct the genome with Trycycler alone (Trycycler: a consensus long-read assembly tool).
cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz trycycler_5179R1/reads.fastq.gz
cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz trycycler_1585/reads.fastq.gz
cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode03/FAN41335_pass_barcode03.fastq.gz trycycler_1585v/reads.fastq.gz
cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz trycycler_5179/reads.fastq.gz
cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz trycycler_HD05_2/reads.fastq.gz
cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz trycycler_HD05_2_K5/reads.fastq.gz
cp long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz trycycler_HD05_2_K6/reads.fastq.gz
for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
cd ${sample};
../trycycler_assembly_extra-thorough.sh;
cd ..;
done
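In a cd / run / cd .. loop, a failed cd (for example a missing sample directory) leaves every later iteration in the wrong place. A sketch of the same loop body using a subshell, so the working directory of the calling shell never changes; the function name run_in_dir is hypothetical.

```shell
# run_in_dir DIR COMMAND [ARGS...]
# Runs COMMAND inside DIR in a subshell; fails without running it if DIR is missing.
run_in_dir() {
    local dir="$1"
    shift
    ( cd "$dir" && "$@" )
}

# Example loop (same script as above, failures are reported per sample):
# for sample in trycycler_5179R1 trycycler_1585; do
#     run_in_dir "$sample" ../trycycler_assembly_extra-thorough.sh || echo "FAILED: $sample" >&2
# done
```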
#TODO: further steps (see https://github.com/rrwick/Trycycler/wiki)
for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
cd ${sample};
../trycycler_assembly_extra-thorough_raven.sh;
cd ..;
done
#TODO: further steps (see https://github.com/rrwick/Trycycler/wiki)
for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
cd ${sample};
../trycycler_assembly_extra-thorough_canu.sh;
cd ..;
done
#TODO: further steps (see https://github.com/rrwick/Trycycler/wiki)
for sample in trycycler_5179R1 trycycler_1585 trycycler_1585v trycycler_5179 trycycler_HD05_2 trycycler_HD05_2_K5 trycycler_HD05_2_K6; do
cd ${sample};
trycycler cluster --threads 55 --assemblies assemblies/*.fasta --reads reads.fastq --out_dir trycycler;
cd ..;
done
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
#Error: failed to circularise sequence D_bctg00000000 because its start could not be found in other sequences. You can either trim some sequence off the start of D_bctg00000000 or exclude the sequence altogether and try again.
#Error: failed to circularise sequence E_ctg000010 for multiple reasons. You must either repair this sequence or exclude it and then try running trycycler reconcile again.
#Error: failed to circularise sequence W_ctg000000 because its start could not be found in other sequences. You can either trim some sequence off the start of W_ctg000000 or exclude the sequence altogether and try again.
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
#Error: failed to circularise sequence K_ctg000000 because its end could not be found in other sequences. You can either trim some sequence off the end of K_ctg000000 or exclude the sequence altogether and try
#Worst-1kbp: W_Utg714
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
#Error: failed to circularise sequence T_contig_1 because its end could not be found in other sequences. You can either trim some sequence off the end of T_contig_1 or exclude the sequence altogether and try again.
# Worst-1kbp: D_bctg00000000, J_bctg00000000, P_bctg00000000
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
#Error: failed to circularise sequence A_tig00000003 because its start could not be found in other sequences. You can either trim some sequence off the start of A_tig00000003 or exclude the sequence altogether and try again.
#Error: failed to circularise sequence E_ctg000000 because its start could not be found in other sequences. You can either trim some sequence off the start of E_ctg000000 or exclude the sequence altogether and try again.
#Error: failed to circularise sequence Q_ctg000000 because its end could not be found in other sequences. You can either trim some sequence off the end of Q_ctg000000 or exclude the sequence altogether and try again.
# Worst-1kbp: L_Utg716, X_Utg654
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_001
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_002
#M_tig00000002, S_tig00000003, A_tig00000003, C_utg000003l, G_tig00000002, I_utg000002l
#E_ctg000000, K_ctg000000, Q_ctg000000
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_003
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_004
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_005
trycycler reconcile --threads 55 --reads reads.fastq --cluster_dir trycycler/cluster_006
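The reconcile errors above offer two choices: trim the contig or exclude it. A sketch of the exclusion route, assuming each contig of a cluster is stored as 1_contigs/NAME.fasta and that renaming the file is enough for the next reconcile run to skip it (the function name exclude_contig and the .bad suffix are assumptions):

```shell
# exclude_contig CLUSTER_DIR CONTIG_NAME
# Renames CLUSTER_DIR/1_contigs/CONTIG_NAME.fasta to *.fasta.bad so it is
# kept for reference but no longer ends in .fasta.
exclude_contig() {
    local f="$1/1_contigs/$2.fasta"
    [ -f "$f" ] || { echo "not found: $f" >&2; return 1; }
    mv "$f" "$f.bad"
}

# Example call, then re-run reconcile for that cluster:
# exclude_contig trycycler/cluster_001 D_bctg00000000
```

Renaming rather than deleting keeps a record of which contigs were dropped from each cluster.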
#
#--> When finished, Trycycler reconcile will make 2_all_seqs.fasta in the cluster directory, a multi-FASTA file containing each of the contigs ready for multiple sequence alignment.
trycycler msa --threads 55 --cluster_dir trycycler/cluster_001
trycycler msa --threads 55 --cluster_dir trycycler/cluster_002
trycycler msa --threads 55 --cluster_dir trycycler/cluster_003
trycycler msa --threads 55 --cluster_dir trycycler/cluster_004
trycycler msa --threads 55 --cluster_dir trycycler/cluster_005
#--> When finished, Trycycler msa will make a 3_msa.fasta file in the cluster directory
#generate 4_reads.fastq for each contig!
trycycler partition --threads 55 --reads reads.fastq --cluster_dirs trycycler/cluster_*
#trycycler partition --threads 55 --reads reads.fastq --cluster_dirs trycycler/cluster_001 trycycler/cluster_002 trycycler/cluster_003
trycycler consensus --threads 55 --cluster_dir trycycler/cluster_001
trycycler consensus --threads 55 --cluster_dir trycycler/cluster_002
trycycler consensus --threads 55 --cluster_dir trycycler/cluster_003
trycycler consensus --threads 55 --cluster_dir trycycler/cluster_004
trycycler consensus --threads 55 --cluster_dir trycycler/cluster_005
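Each consensus run writes a 7_final_consensus.fasta into its cluster directory. A sketch for combining them into one assembly file while flagging clusters that have no consensus yet; the function name combine_consensus and the output file name are assumptions.

```shell
# combine_consensus TRYCYCLER_DIR OUT_FASTA
# Concatenates the per-cluster 7_final_consensus.fasta files into OUT_FASTA
# and reports (on stderr) every cluster directory without one.
combine_consensus() {
    local base="$1" out="$2" c
    : > "$out"
    for c in "$base"/cluster_*; do
        [ -d "$c" ] || continue
        if [ -s "$c/7_final_consensus.fasta" ]; then
            cat "$c/7_final_consensus.fasta" >> "$out"
        else
            echo "missing consensus: $c" >&2
        fi
    done
}

# Example call:
# combine_consensus trycycler trycycler/consensus.fasta
```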
#!!NOTE that we take the isolates of HD05_2_K5 and HD05_2_K6 assembled by Unicycler instead of Trycycler!!
# TODO (TODAY), generate the 3 datasets below!
# TODO (IMPORTANT): write an email to Holger saying that the short-read sequencing of HD5_2 is not correct, as shown by the 3 datasets! However, MTxxxxxxx is confirmed to be absent from K5 and K6!
#TODO: variant calling needs the short reads; it is not doable without the correct short reads. Resequencing? It is difficult to call variants from long reads alone because they contain too many errors!
#TODO: check whether the MT sequence is present in the isolates; more detailed annotations come later!
#II. Comparing the results of Trycycler with Unicycler.
#III. Eventually add the plasmids assembled from unicycler to the final results. E.g. add the 4 plasmids to K5 and K6
Polishing after Trycycler
#1. Long-read (Oxford Nanopore) polishing with Medaka (skipped because of a samtools version incompatibility!)
# for c in trycycler/cluster_*; do
# medaka_consensus -i "$c"/4_reads.fastq -d "$c"/7_final_consensus.fasta -o "$c"/medaka -m r941_min_sup_g507 -t 12
# mv "$c"/medaka/consensus.fasta "$c"/8_medaka.fasta
# rm -r "$c"/medaka "$c"/*.fai "$c"/*.mmi # clean up
# done
# cat trycycler/cluster_*/8_medaka.fasta > trycycler/consensus.fasta
#2. Short-read polishing
#---- 5179_R1 (2) ----
# mean read depth: 205.8x
# 188 bp have a depth of zero (99.9924% coverage)
# 355 positions changed (0.0144% of total positions)
# estimated pre-polishing sequence accuracy: 99.9856% (Q38.42)
#Step 1: read QC
fastp --in1 ../../s-epidermidis-5179-r1_R1.fastq.gz --in2 ../../s-epidermidis-5179-r1_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
#Step 2: Polypolish
for cluster in cluster_001 cluster_002; do
bwa index ${cluster}/7_final_consensus.fasta
bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
done
#Step 3: POLCA
for cluster in cluster_001 cluster_002; do
cd ${cluster}
polca.sh -a polypolish.fasta -r "../../../s-epidermidis-5179-r1_R1.fastq.gz ../../../s-epidermidis-5179-r1_R2.fastq.gz" -t 55 -m 120G
cd ..
done
#Substitution Errors: 37
#Insertion/Deletion Errors: 2
#Assembly Size: 2470001
#Consensus Quality: 99.9984
#Substitution Errors: 4
#Insertion/Deletion Errors: 0
#Assembly Size: 17748
#Consensus Quality: 99.9775
#Step 4: (optional) more rounds and/or other polishers
#After one round of Polypolish and one round of POLCA, your assembly should be in very good shape!
#However, there may still be a few lingering errors. You can try running additional rounds of Polypolish or POLCA to see if they make any more changes.
for cluster in cluster_001 cluster_002; do
cd ${cluster}
polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-5179-r1_R1.fastq.gz ../../../s-epidermidis-5179-r1_R2.fastq.gz" -t 55 -m 120G
cd ..
done
#Substitution Errors: 13
#Insertion/Deletion Errors: 0
#Assembly Size: 2470004
#Consensus Quality: 99.9995
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 17748
#Consensus Quality: 100
for cluster in cluster_001; do
cd ${cluster}
polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../s-epidermidis-5179-r1_R1.fastq.gz ../../../s-epidermidis-5179-r1_R2.fastq.gz" -t 55 -m 120G
cd ..
done
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2470004
#Consensus Quality: 100
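The POLCA rounds above were repeated until no substitution or indel corrections were reported. A sketch of that stopping check, reading summary lines in the format quoted above from standard input; the function name polca_errors is hypothetical, and where POLCA stores these lines (typically a .report file next to the assembly) should be treated as an assumption.

```shell
# polca_errors
# Reads POLCA summary text on stdin and prints substitutions + indels.
# Lines may carry a leading '#', as in the notes above.
polca_errors() {
    awk -F': *' '
        /^#?Substitution Errors/        { total += $2 }
        /^#?Insertion\/Deletion Errors/ { total += $2 }
        END { print total + 0 }'
}

# First round for the 5179_R1 chromosome, as recorded above
printf '%s\n' 'Substitution Errors: 37' 'Insertion/Deletion Errors: 2' \
    'Assembly Size: 2470001' 'Consensus Quality: 99.9984' | polca_errors
```

A result of 0 means another round would not change the assembly; anything larger suggests running POLCA once more on the corrected FASTA.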
#---- 1585 (4) ----
# mean read depth: 174.7x
# 8,297 bp have a depth of zero (99.6604% coverage)
# 271 positions changed (0.0111% of total positions)
# estimated pre-polishing sequence accuracy: 99.9889% (Q39.55)
#Step 1: read QC
fastp --in1 ../../s-epidermidis-1585_R1.fastq.gz --in2 ../../s-epidermidis-1585_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
#Step 2: Polypolish
for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
bwa index ${cluster}/7_final_consensus.fasta
bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
done
#Step 3: POLCA
for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
cd ${cluster}
polca.sh -a polypolish.fasta -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
cd ..
done
#Substitution Errors: 7
#Insertion/Deletion Errors: 4
#Assembly Size: 2443174
#Consensus Quality: 99.9995
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 9014
#Consensus Quality: 100
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 9014
#Consensus Quality: 100
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2344
#Consensus Quality: 100
#Step 4: (optional) more rounds and/or other polishers
#After one round of Polypolish and one round of POLCA, your assembly should be in very good shape!
#However, there may still be a few lingering errors. You can try running additional rounds of Polypolish or POLCA to see if they make any more changes.
for cluster in cluster_001; do
cd ${cluster}
polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
cd ..
done
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2443176
#Consensus Quality: 100
#---- 1585 derived from unicycler, under 1585_normal/unicycler (4) ----
#Step 0: copy chrom and plasmid1, plasmid2, plasmid3 to cluster_001/7_final_consensus.fasta, ...
#Step 1: read QC
fastp --in1 ../../s-epidermidis-1585_R1.fastq.gz --in2 ../../s-epidermidis-1585_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
#Step 2: Polypolish
for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
bwa index ${cluster}/7_final_consensus.fasta
bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
done
#Polishing 1 (2,443,574 bp):
#mean read depth: 174.7x
#8,298 bp have a depth of zero (99.6604% coverage)
#52 positions changed (0.0021% of total positions)
#estimated pre-polishing sequence accuracy: 99.9979% (Q46.72)
#Polishing 2 (9,014 bp):
#mean read depth: 766.5x
#3 bp have a depth of zero (99.9667% coverage)
#0 positions changed (0.0000% of total positions)
#estimated pre-polishing sequence accuracy: 100.0000% (Q∞)
#Polishing 7 (2,344 bp):
#mean read depth: 2893.0x
#4 bp have a depth of zero (99.8294% coverage)
#0 positions changed (0.0000% of total positions)
#estimated pre-polishing sequence accuracy: 100.0000% (Q∞)
#Polishing 8 (2,255 bp):
#mean read depth: 2719.6x
#4 bp have a depth of zero (99.8226% coverage)
#0 positions changed (0.0000% of total positions)
#estimated pre-polishing sequence accuracy: 100.0000% (Q∞)
#Step 3: POLCA
for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
cd ${cluster}
polca.sh -a polypolish.fasta -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
cd ..
done
#Substitution Errors: 7
#Insertion/Deletion Errors: 4
#Assembly Size: 2443598
#Consensus Quality: 99.9995
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 9014
#Consensus Quality: 100
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2344
#Consensus Quality: 100
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2255
#Consensus Quality: 100
#Step 4: (optional) more rounds and/or other polishers
#After one round of Polypolish and one round of POLCA, your assembly should be in very good shape!
#However, there may still be a few lingering errors. You can try running additional rounds of Polypolish or POLCA to see if they make any more changes.
for cluster in cluster_001; do
cd ${cluster}
polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-1585_R1.fastq.gz ../../../s-epidermidis-1585_R2.fastq.gz" -t 55 -m 120G
cd ..
done
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2443600
#Consensus Quality: 100
#-- 1585v (1, no short reads, waiting) --
# TODO!
#-- 5179 (2) --
#mean read depth: 120.7x
#7,547 bp have a depth of zero (99.6946% coverage)
#356 positions changed (0.0144% of total positions)
#estimated pre-polishing sequence accuracy: 99.9856% (Q38.41)
#Step 1: read QC
fastp --in1 ../../s-epidermidis-5179_R1.fastq.gz --in2 ../../s-epidermidis-5179_R2.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
#Step 2: Polypolish
for cluster in cluster_001 cluster_002; do
bwa index ${cluster}/7_final_consensus.fasta
bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
done
#Step 3: POLCA
for cluster in cluster_001 cluster_002; do
cd ${cluster}
polca.sh -a polypolish.fasta -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
cd ..
done
#Substitution Errors: 49
#Insertion/Deletion Errors: 23
#Assembly Size: 2471418
#Consensus Quality: 99.9971
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 17748
#Consensus Quality: 100
#Step 4: (optional) more rounds POLCA
for cluster in cluster_001; do
cd ${cluster}
polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
cd ..
done
#Substitution Errors: 10
#Insertion/Deletion Errors: 5
#Assembly Size: 2471442
#Consensus Quality: 99.9994
for cluster in cluster_001; do
cd ${cluster}
polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
cd ..
done
#Substitution Errors: 6
#Insertion/Deletion Errors: 0
#Assembly Size: 2471445
#Consensus Quality: 99.9998
for cluster in cluster_001; do
cd ${cluster}
polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../s-epidermidis-5179_R1.fastq.gz ../../../s-epidermidis-5179_R2.fastq.gz" -t 55 -m 120G
cd ..
done
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2471445
#Consensus Quality: 100
#-- HD5_2 (2): without the short-sequencing we cannot correct the base-calling! --
# !ERROR to be REPORTED, the
#Polishing cluster_001_consensus (2,504,140 bp):
#mean read depth: 94.4x
#240,420 bp have a depth of zero (90.3991% coverage)
#56,894 positions changed (2.2720% of total positions)
#estimated pre-polishing sequence accuracy: 97.7280% (Q16.44)
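The Q value in the summary above is a Phred-scaled version of the accuracy: Q = -10 * log10(1 - accuracy). A sketch of that conversion (the function name phred_from_accuracy is hypothetical), which makes the gap between this sample (about Q16) and the other isolates (Q33 to Q39 before polishing) easier to compare:

```shell
# phred_from_accuracy PERCENT
# Converts a percent accuracy into a Phred-scaled quality value.
phred_from_accuracy() {
    awk -v acc="$1" 'BEGIN {
        err = 1 - acc / 100
        if (err <= 0) { print "inf"; exit }   # 100% accuracy has no finite Q
        printf "%.2f\n", -10 * log(err) / log(10)
    }'
}

phred_from_accuracy 97.7280
```

Because the input here is an already rounded percentage, the last digit can differ slightly from the value Polypolish prints for the same run.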
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_1_S37_R1_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_1_S37_R2_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_2_S38_R1_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_2_S38_R2_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_3_S39_R1_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_3_S39_R2_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_4_S40_R1_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_4_S40_R2_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_5_S41_R1_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_5_S41_R2_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_6_S42_R1_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_6_S42_R2_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_7_S43_R1_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_7_S43_R2_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_8_S44_R1_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_8_S44_R2_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_9_S45_R1_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_9_S45_R2_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_10_S46_R1_001.fastq
/media/jhuang/Elements2/Data_Anna12_HAPDICS_HyAsP/180821_rohde/HD5_10_S46_R2_001.fastq
#Step 1: read QC
fastp --in1 ../../HD5_2_S38_R1_001.fastq.gz --in2 ../../HD5_2_S38_R2_001.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
# NOTE that the following steps are not run since the short-reads are not correct!
# #Step 2: Polypolish
# for cluster in cluster_001 cluster_005; do
# bwa index ${cluster}/7_final_consensus.fasta
# bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
# bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
# polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
# done
# #Step 3: POLCA
# for cluster in cluster_001 cluster_005; do
# cd ${cluster}
# polca.sh -a polypolish.fasta -r "../../../HD5_2_S38_R1_001.fastq.gz ../../../HD5_2_S38_R2_001.fastq.gz" -t 55 -m 120G
# cd ..
# done
# #Step 4: (optional) more rounds POLCA
# for cluster in cluster_001; do
# cd ${cluster}
# polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../HD5_2_S38_R1_001.fastq.gz ../../../HD5_2_S38_R2_001.fastq.gz" -t 55 -m 120G
# cd ..
# done
# NOTE that the plasmids of HD5_2_K5 and HD5_2_K6 were copied from Unicycler!
#-- HD5_2_K5 (4) --
#mean read depth: 87.1x
#25 bp have a depth of zero (99.9990% coverage)
#1,085 positions changed (0.0433% of total positions)
#estimated pre-polishing sequence accuracy: 99.9567% (Q33.63)
#Step 1: read QC
fastp --in1 ../../275_K5_Holger_S92_R1_001.fastq.gz --in2 ../../275_K5_Holger_S92_R2_001.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
#Step 2: Polypolish
for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
bwa index ${cluster}/7_final_consensus.fasta
bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
done
#Step 3: POLCA
for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
cd ${cluster}
polca.sh -a polypolish.fasta -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
cd ..
done
#Substitution Errors: 146
#Insertion/Deletion Errors: 2
#Assembly Size: 2504401
#Consensus Quality: 99.9941
#Substitution Errors: 41
#Insertion/Deletion Errors: 0
#Assembly Size: 41288
#Consensus Quality: 99.9007
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 9191
#Consensus Quality: 100
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2767
#Consensus Quality: 100
#Step 4: (optional) more rounds POLCA
for cluster in cluster_001 cluster_002; do
cd ${cluster}
polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
cd ..
done
#Substitution Errors: 41
#Insertion/Deletion Errors: 0
#Assembly Size: 2504401
#Consensus Quality: 99.9984
#Substitution Errors: 8
#Insertion/Deletion Errors: 0
#Assembly Size: 41288
#Consensus Quality: 99.9806
for cluster in cluster_001 cluster_002; do
cd ${cluster}
polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
cd ..
done
#cluster_001:
#Substitution Errors: 8
#Insertion/Deletion Errors: 0
#Assembly Size: 2504401
#Consensus Quality: 99.9997
#cluster_002:
#Substitution Errors: 4
#Insertion/Deletion Errors: 0
#Assembly Size: 41288
#Consensus Quality: 99.9903
for cluster in cluster_001 cluster_002; do
cd ${cluster}
polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz" -t 55 -m 120G
cd ..
done
#cluster_001:
#Substitution Errors: 8
#Insertion/Deletion Errors: 0
#Assembly Size: 2504401
#Consensus Quality: 99.9997
#cluster_002:
#Substitution Errors: 4
#Insertion/Deletion Errors: 0
#Assembly Size: 41288
#Consensus Quality: 99.9903
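The extra POLCA rounds above chain the `.PolcaCorrected.fa` names by hand and effectively stop once the error counts no longer drop. A minimal sketch of automating that for one cluster directory. Assumptions: polca.sh writes `<assembly>.PolcaCorrected.fa` and a summary `<assembly>.report` into the current directory; `READS`, `POLISH`, `run_polca`, `count_errors` and `polca_rounds` are hypothetical names introduced here.

```shell
# Assumption: both read files are passed to polca.sh as one quoted string.
READS="../../../275_K5_Holger_S92_R1_001.fastq.gz ../../../275_K5_Holger_S92_R2_001.fastq.gz"

# One POLCA round on the given assembly (same options as in the notes).
run_polca() {
    polca.sh -a "$1" -r "$READS" -t 55 -m 120G
}

# Sum of substitution and indel errors in a POLCA-style summary file.
count_errors() {
    awk -F': *' '/^#?(Substitution|Insertion\/Deletion) Errors/ { n += $2 } END { print n + 0 }' "$1"
}

# Polish up to $2 rounds, starting from assembly $1. Stops when a round
# reports zero errors or the same count as the previous round, and prints
# the input of that last round (the last file that still improved).
# POLISH can be set to another command or function, e.g. for a dry run.
polca_rounds() (
    asm=$1; max=$2; prev=-1; round=1
    while [ "$round" -le "$max" ]; do
        "${POLISH:-run_polca}" "$asm" || exit 1
        cur=$(count_errors "$asm.report")
        echo "round $round: $cur errors reported for $asm" >&2
        if [ "$cur" -eq 0 ] || [ "$cur" -eq "$prev" ]; then break; fi
        prev=$cur
        asm="$asm.PolcaCorrected.fa"
        round=$((round + 1))
    done
    echo "$asm"
)

# Usage, inside a cluster directory:
# final=$(polca_rounds polypolish.fasta 5)
```

Comparing total error counts between rounds is a simple stopping rule chosen for this sketch; it does not check whether the same positions keep flipping back and forth.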
#-- HD5_2_K6 (4) --
#mean read depth: 116.7x
#4 bp have a depth of zero (99.9998% coverage)
#1,022 positions changed (0.0408% of total positions)
#estimated pre-polishing sequence accuracy: 99.9592% (Q33.89)
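The Q value in the Polypolish summary follows from the accuracy via the standard Phred definition, Q = -10 * log10(1 - accuracy). A small sketch to reproduce it from the figure quoted above; `phred_from_accuracy` is a hypothetical helper name.

```shell
# Phred-scaled quality from a percent accuracy: Q = -10 * log10(1 - acc/100).
# awk's log() is the natural logarithm, hence the division by log(10).
phred_from_accuracy() {
    awk -v acc="$1" 'BEGIN { printf "%.2f\n", -10 * log(1 - acc / 100) / log(10) }'
}

phred_from_accuracy 99.9592   # prints 33.89, matching the Polypolish line
```

The same helper can be pointed at the POLCA "Consensus Quality" percentages, as long as the value is below 100 (an accuracy of exactly 100 has no finite Q).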
#Step 1: read QC
fastp --in1 ../../276_K6_Holger_S95_R1_001.fastq.gz --in2 ../../276_K6_Holger_S95_R2_001.fastq.gz --out1 1.fastq.gz --out2 2.fastq.gz --unpaired1 u.fastq.gz --unpaired2 u.fastq.gz
#Step 2: Polypolish
for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
bwa index ${cluster}/7_final_consensus.fasta
bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 1.fastq.gz > ${cluster}/alignments_1.sam
bwa mem -t 55 -a ${cluster}/7_final_consensus.fasta 2.fastq.gz > ${cluster}/alignments_2.sam
polypolish polish ${cluster}/7_final_consensus.fasta ${cluster}/alignments_1.sam ${cluster}/alignments_2.sam > ${cluster}/polypolish.fasta
done
#Step 3: POLCA
for cluster in cluster_001 cluster_002 cluster_003 cluster_004; do
cd ${cluster}
polca.sh -a polypolish.fasta -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G
cd ..
done
#cluster_001:
#Substitution Errors: 164
#Insertion/Deletion Errors: 2
#Assembly Size: 2504398
#Consensus Quality: 99.9934
#cluster_002:
#Substitution Errors: 22
#Insertion/Deletion Errors: 0
#Assembly Size: 41288
#Consensus Quality: 99.9467
#cluster_003:
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 9191
#Consensus Quality: 100
#cluster_004:
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 2767
#Consensus Quality: 100
#Step 4: (optional) more rounds POLCA
for cluster in cluster_001 cluster_002; do
cd ${cluster}
polca.sh -a polypolish.fasta.PolcaCorrected.fa -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G
cd ..
done
#cluster_001:
#Substitution Errors: 32
#Insertion/Deletion Errors: 0
#Assembly Size: 2504400
#Consensus Quality: 99.9987
#cluster_002:
#Substitution Errors: 0
#Insertion/Deletion Errors: 0
#Assembly Size: 41288
#Consensus Quality: 100
for cluster in cluster_001; do
cd ${cluster}
polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G
cd ..
done
#cluster_001:
#Substitution Errors: 4
#Insertion/Deletion Errors: 0
#Assembly Size: 2504400
#Consensus Quality: 99.9998
for cluster in cluster_001; do
cd ${cluster}
polca.sh -a polypolish.fasta.PolcaCorrected.fa.PolcaCorrected.fa.PolcaCorrected.fa -r "../../../276_K6_Holger_S95_R1_001.fastq.gz ../../../276_K6_Holger_S95_R2_001.fastq.gz" -t 55 -m 120G
cd ..
done
#cluster_001:
#Substitution Errors: 2
#Insertion/Deletion Errors: 0
#Assembly Size: 2504400
#Consensus Quality: 99.9999
Results by directly using Unicycler
#----------------------- 5179R1_normal -----------------------
>1 length=2468563 depth=1.00x circular=true
>2 length=17748 depth=1.42x circular=true
Component Segments Links Length N50 Longest segment Status
total 2 2 2,486,311 2,468,563 2,468,563
1 1 1 2,468,563 2,468,563 2,468,563 complete
2 1 1 17,748 17,748 17,748 complete
Segment Length Depth Starting gene Position Strand Identity Coverage
1 2,468,563 1.00x UniRef90_Q5HJZ9 1,212,460 forward 100.0% 100.0%
2 17,748 1.42x UniRef90_A0A0H2VIR3 4,804 reverse 93.2% 99.7%
# ---- 5179_bold ----
Segment Length Depth Starting gene Position Strand Identity Coverage
1 2,469,173 1.00x UniRef90_Q5HJZ9 1,901,872 reverse 100.0% 100.0%
2 17,749 2.27x UniRef90_A0A0H2VIR3 4,771 forward 93.2% 99.7%
4 4,595 10.19x none found
8 2,449 17.14x none found
>1 length=2469173 depth=1.00x circular=true
>2 length=17749 depth=2.27x circular=true
>3 length=4761 depth=0.44x
>4 length=4595 depth=10.19x circular=true
>5 length=3735 depth=0.29x
>6 length=3718 depth=0.42x
>7 length=3573 depth=0.52x
>8 length=2449 depth=17.14x circular=true
>9 length=2411 depth=0.35x
>10 length=2371 depth=0.32x
>11 length=2365 depth=0.43x
>12 length=1637 depth=0.44x
>13 length=1568 depth=0.66x
>14 length=1505 depth=0.65x
>15 length=1403 depth=0.93x
>16 length=1329 depth=0.55x
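In the contig list above, the circular contigs are the chromosome and plasmids, while the short linear contigs all sit below 1x depth. A minimal sketch for pulling those candidates out of the Unicycler headers before checking them with BLAST; `low_depth_linear` is a hypothetical helper name, and the 1.0x default cutoff is an assumption for illustration, not a Unicycler recommendation.

```shell
# List IDs of contigs that are not circular and have depth below a cutoff.
# $1: FASTA with Unicycler-style headers (">ID length=N depth=Dx [circular=true]")
# $2: optional depth cutoff (default 1.0)
low_depth_linear() {
    awk -v min="${2:-1.0}" '
        /^>/ {
            depth = ""; circ = 0
            for (k = 2; k <= NF; k++) {
                if ($k ~ /^depth=/) { depth = $k; sub(/^depth=/, "", depth); sub(/x$/, "", depth) }
                if ($k == "circular=true") circ = 1
            }
            if (!circ && depth + 0 < min) print substr($1, 2)
        }
    ' "$1"
}

# Usage:
# low_depth_linear assembly.fasta        # default cutoff 1.0x
# low_depth_linear assembly.fasta 0.5    # stricter cutoff
```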
makeblastdb -in assembly.fasta -dbtype nucl
blastn -task blastn-short -db ../HD05_2_K5_conservative/assembly.fasta -query assembly.fasta -out 2-16_vs_1.blastn -evalue 0.00000000001 -num_threads 15 -outfmt 6 -strand both -max_target_seqs 1
#TODO: manually fill the gap in the HD05_2 genome!
#qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
5 1 99.946 3728 1 1 1 3728 1535666 1539392 0.0 7366
6 1 99.973 3718 0 1 1 3718 702963 706679 0.0 7355
7 1 99.888 3573 1 3 1 3573 1764622 1768191 0.0 7027
9 1 100.000 2411 0 0 1 2411 1060914 1063324 0.0 4779
10 1 100.000 2371 0 0 1 2371 615275 612905 0.0 4700
11 1 99.958 2365 0 1 1 2365 1088713 1086350 0.0 4672
12 1 100.000 1637 0 0 1 1637 146635 144999 0.0 3245
13 1 99.936 1568 0 1 1 1568 2024197 2025763 0.0 3092
14 1 100.000 1505 0 0 1 1505 2445480 2443976 0.0 2983
15 1 100.000 1403 0 0 1 1403 197723 196321 0.0 2781
16 1 99.925 1329 1 0 1 1329 49854 48526 0.0 2627
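Whether a small contig is fully contained in contig 1 is easier to judge from its query coverage than from the raw coordinates. A minimal sketch that combines the contig lengths from the Unicycler headers with the `-outfmt 6` hits; `blast_query_coverage` is a hypothetical helper name, and it assumes `length=` is the second header field, as in the headers listed in these notes.

```shell
# Print: query ID, subject ID, query coverage (%), identity (%).
# $1: FASTA with Unicycler-style headers (">ID length=N ...")
# $2: blastn tabular output (-outfmt 6, default 12 columns)
blast_query_coverage() {
    awk '
        FNR == NR {
            # First file: remember the length of every contig.
            if ($0 ~ /^>/) { id = substr($1, 2); len = $2; sub(/^length=/, "", len); qlen[id] = len }
            next
        }
        # Second file: coverage = aligned query span / query length.
        ($1 in qlen) { printf "%s\t%s\t%.2f\t%s\n", $1, $2, 100 * ($8 - $7 + 1) / qlen[$1], $3 }
    ' "$1" "$2"
}

# Usage:
# blast_query_coverage assembly.fasta 2-16_vs_1.blastn
```

For contig 5 above (length 3735, aligned positions 1 to 3728) this gives a coverage of 99.81%.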
# -------------------- 1585_normal --------------------
>1 length=2443574 depth=1.00x circular=true #contig_1 2442282 10 60 61
>2 length=9014 depth=3.72x circular=true
>3 length=4388 depth=0.89x
>4 length=3443 depth=0.48x
>5 length=3338 depth=0.48x
>6 length=3336 depth=0.45x
>7 length=2344 depth=11.44x circular=true
>8 length=2255 depth=9.81x circular=true
>9 length=1929 depth=0.37x
>10 length=1703 depth=1.67x
>11 length=1605 depth=0.26x
>12 length=1381 depth=0.56x
>13 length=1360 depth=0.39x
>14 length=1281 depth=0.41x
>15 length=1163 depth=0.51x
>16 length=1088 depth=0.24x
2594107
ragtag.py scaffold ../HD05_2_K5_normal/assembly.fasta assembly.fasta
ragtag.py patch ragtag.scaffold.fasta ../../HD05_2_K5_normal/assembly.fasta
grep -v '^>' ragtag.patch.fasta | grep -o 'N' | wc -l
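Besides the total number of N bases, the number of separate N runs shows how many gaps are still open after patching. A minimal sketch that counts both, per sequence so that runs are never merged across record boundaries; `gap_summary` is a hypothetical helper name.

```shell
# Print "<total N bases> <number of N runs>" for a FASTA file.
# Header lines are skipped, so an 'N' in a sequence name is not counted.
gap_summary() {
    awk '
        function tally(s) {
            total += gsub(/[Nn]/, "N", s)   # count N bases (case-insensitive)
            runs  += gsub(/N+/, "", s)      # count contiguous N runs
        }
        /^>/ { if (seq != "") tally(seq); seq = ""; next }
        { seq = seq $0 }
        END { if (seq != "") tally(seq); print total + 0, runs + 0 }
    ' "$1"
}

# Usage:
# gap_summary ragtag.patch.fasta
```

An output of `0 0` means the patched assembly contains no gaps.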
#qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
3 1 99.977 4388 0 1 1 4388 2410738 2406352 0.0 8683
4 1 99.942 3443 0 2 1 3443 2222741 2219301 0.0 6794
5 1 99.970 3338 0 1 1 3338 455636 452300 0.0 6601
6 1 99.940 3336 0 2 1 3336 1617740 1614407 0.0 6581
9 1 99.948 1929 0 1 1 1929 1321522 1319595 0.0 3808
10 1 99.941 1703 1 0 1 1703 90503 88801 0.0 3368
11 1 99.938 1605 0 1 1 1605 2361795 2363398 0.0 3166
12 1 99.928 1381 0 1 1 1381 241092 242471 0.0 2722
13 1 100.000 1360 0 0 1 1360 1157897 1159256 0.0 2696
14 1 100.000 1281 0 0 1 1281 218323 219603 0.0 2539
15 1 100.000 1163 0 0 1 1163 2077536 2078698 0.0 2305
16 1 100.000 1088 0 0 1 1088 283284 284371 0.0 2157
# ---- HD05_2_K5 ----
>1 length=2503585 depth=1.00x circular=true
>2 length=41288 depth=3.32x circular=true
>3 length=9191 depth=8.29x circular=true
>4 length=2767 depth=9.36x circular=true
# ---- HD05_2_K6 ----
>1 length=2503927 depth=1.00x circular=true
>2 length=41288 depth=3.77x circular=true
>3 length=9191 depth=7.83x circular=true
>4 length=2767 depth=10.11x circular=true
#--------------------------
1585V
#[2024-01-17 13:42:28] INFO: Assembly statistics:
Total length: 2438882 vs 2443574
Fragments: 1
Fragments N50: 2438882
Largest frg: 2438882
Scaffolds: 0
Mean coverage: 47
unicycler -1 s-epidermidis-5179-r1_R1.fastq.gz -2 s-epidermidis-5179-r1_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode01/FAN41335_pass_barcode01.fastq.gz --mode conservative -t 55 -o 5179R1_conservative
unicycler -1 s-epidermidis-1585_R1.fastq.gz -2 s-epidermidis-1585_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode02/FAN41335_pass_barcode02.fastq.gz --mode conservative -t 55 -o 1585_conservative
#3 no short sequencing
unicycler -1 s-epidermidis-5179_R1.fastq.gz -2 s-epidermidis-5179_R2.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode04/FAN41335_pass_barcode04.fastq.gz --mode conservative -t 55 -o 5179_conservative
unicycler -1 HD5_2_S38_R1_001.fastq.gz -2 HD5_2_S38_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode05/FAN41335_pass_barcode05.fastq.gz --mode conservative -t 55 -o HD05_2_conservative
unicycler -1 275_K5_Holger_S92_R1_001.fastq.gz -2 275_K5_Holger_S92_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode06/FAN41335_pass_barcode06.fastq.gz --mode conservative -t 55 -o HD05_2_K5_conservative
unicycler -1 276_K6_Holger_S95_R1_001.fastq.gz -2 276_K6_Holger_S95_R2_001.fastq.gz -l long_reads/240109_FAN41335_S_epidermidis/fastq_pass/barcode07/FAN41335_pass_barcode07.fastq.gz --mode conservative -t 55 -o HD05_2_K6_conservative
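The Unicycler calls differ only in sample name, short-read files, barcode and mode, so they can be generated from a small table. A minimal sketch that prints the commands for inspection instead of running them; `samples` and `unicycler_cmds` are hypothetical names, and the table is copied from the commands in these notes (1585V, barcode 03, has no short reads and is left out).

```shell
# Columns: sample name, R1 file, R2 file, ONT barcode number.
samples='5179R1 s-epidermidis-5179-r1_R1.fastq.gz s-epidermidis-5179-r1_R2.fastq.gz 01
1585 s-epidermidis-1585_R1.fastq.gz s-epidermidis-1585_R2.fastq.gz 02
5179 s-epidermidis-5179_R1.fastq.gz s-epidermidis-5179_R2.fastq.gz 04
HD05_2 HD5_2_S38_R1_001.fastq.gz HD5_2_S38_R2_001.fastq.gz 05
HD05_2_K5 275_K5_Holger_S92_R1_001.fastq.gz 275_K5_Holger_S92_R2_001.fastq.gz 06
HD05_2_K6 276_K6_Holger_S95_R1_001.fastq.gz 276_K6_Holger_S95_R2_001.fastq.gz 07'

# Print one unicycler command per sample for the given mode
# (normal, conservative or bold). Nothing is executed here.
unicycler_cmds() (
    mode=$1
    lr=long_reads/240109_FAN41335_S_epidermidis/fastq_pass
    printf '%s\n' "$samples" | while read -r name r1 r2 bc; do
        echo "unicycler -1 $r1 -2 $r2 -l $lr/barcode$bc/FAN41335_pass_barcode$bc.fastq.gz --mode $mode -t 55 -o ${name}_$mode"
    done
)

unicycler_cmds conservative            # inspect the commands first
# unicycler_cmds conservative | sh     # then run them one after another
```

Printing first and piping to `sh` afterwards keeps a typo in the table from starting six long assemblies.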
# ---- 1 5179R1 2469692 ----
>1 length=2468563 depth=1.00x circular=true
>2 length=17748 depth=1.42x circular=true
# ---- 2 1585 2442282 ---- (for comparison, the Trycycler chromosome is 2443176 nt)
>1 length=2443574 depth=1.00x circular=true
>2 length=9014 depth=3.72x circular=true
>7 length=2344 depth=11.44x circular=true
>8 length=2255 depth=9.81x circular=true
# ---- 3 1585v 2438882 ----
#assembled from long reads only (1 fragment)
# ---- 4 5179_bold 2471107+17740 ----
>1 length=2469173 depth=1.00x circular=true
>2 length=17749 depth=2.27x circular=true
>4 length=4595 depth=10.19x circular=true
>8 length=2449 depth=17.14x circular=true
# ---- 5 HD05_2 2504622 ----
# Note: HD05_2_bold_hq_lq includes the bad long reads.
>1 length=965875 depth=0.95x
>2 length=855325 depth=1.00x
>3 length=582944 depth=1.02x
>4 length=183656 depth=1.02x
>5 length=13570 depth=4.73x circular=true
>6 length=1503 depth=4.85x
>7 length=1271 depth=5.06x
>8 length=227 depth=2.03x
>9 length=153 depth=0.93x
>10 length=152 depth=1.09x
# trycycler: 2503231 (yes), 9183 (yes), 22394, 18541 --
# ---- 6 HD05_2_K5 2504656+41290+9191 ----
conservative
>1 length=2503585 depth=1.00x circular=true
>2 length=41288 depth=3.32x circular=true
>3 length=9191 depth=8.29x circular=true
>4 length=2767 depth=9.36x circular=true
# ---- 7 HD05_2_K6 2504588+41285+9192 ----
conservative
>1 length=2503927 depth=1.00x circular=true
>2 length=41288 depth=3.77x circular=true
>3 length=9191 depth=7.83x circular=true
>4 length=2767 depth=10.11x circular=true
ragtag.py scaffold ../assembly_flye_HD05_2/assembly.fasta assembly.fasta
ragtag.py patch ragtag.scaffold.fasta ../../assembly_flye_HD05_2/assembly.fasta
grep -v '^>' ragtag.patch.fasta | grep -o 'N' | wc -l
makeblastdb -in ../assembly_flye_HD05_2/assembly.fasta -dbtype nucl
blastn -db ../assembly_flye_HD05_2/assembly.fasta -query 1-4.fasta -out assmbly_vs_flye.blastn -evalue 0.000001 -num_threads 15 -outfmt 6 -strand minus -max_target_seqs 1
Submit all genomes to NCBI
TODO: check whether 1585V should be assembled using only the long reads!
BioSample accession
BioProject: PRJNA1038700 (Staphylococcus epidermidis strain:1585v | isolate:1585v Genome sequencing)
BioSample: SAMN38198576 (Pathogen: clinical or host-associated sample from Staphylococcus epidermidis)
SRAs: 0
Status: To be released
Release date: 2027-11-10
Created: 2023-11-10 15:24
Updated: 2023-11-17 16:57
Sample name: 1585v
Package: Pathogen: clinical or host-associated; version 1.0
Organism: Staphylococcus epidermidis (Taxonomy ID: 1282)
Attributes
Attribute name: Attribute value
collected by: H R
collection date: 2004
geographic location: Germany: Hamburg
host: Homo sapiens
host disease: port-catheter infection
isolation source: port-catheter
isolate: missing
strain: 1585v
latitude and longitude: 53.551672 N 9.955081 E
https://www.ncbi.nlm.nih.gov/genbank/genomesubmit/#run_pgap
Background of 1585v and 1585
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8346721/
S. epidermidis 1585 is known to be biofilm-negative in laboratory media but forms biofilm in the presence of human serum.
In contrast, S. epidermidis 1585v is a variant derived from strain 1585 in which, due to a chromosomal rearrangement, a 460 kDa isoform of Embp is overexpressed even in TSB. Mutant M135 is an isogenic mutant of 1585v in which expression of Embp is interrupted by insertion of transposon Tn917.
Staphylococcus epidermidis (S. epidermidis) 1585 is a specific strain of S. epidermidis that is classified as a wild-type strain. Wild-type in bacterial terminology refers to the strain of an organism that is found in nature, as opposed to those that have been modified or mutated in a laboratory setting. Here are some key points about the 1585 wild-type strain of S. epidermidis:
No Embp Production in TSB: One notable characteristic of the S. epidermidis 1585 strain is that it does not produce Embp (extracellular matrix binding protein) when grown in TSB (Tryptic Soy Broth). Embp is a protein that plays a crucial role in the biofilm formation and adherence of bacteria to surfaces. The absence of Embp production in this strain could impact its ability to form biofilms, a common virulence factor in Staphylococcus infections.
Biofilm Formation: S. epidermidis is known for its ability to form biofilms, especially on medical devices, leading to infections that are difficult to treat. The fact that the 1585 strain doesn't produce Embp in TSB suggests it may have a reduced capacity for biofilm formation under these conditions, which could be significant in understanding and managing such infections.
Research and Clinical Implications: Studying wild-type strains like S. epidermidis 1585 is important for understanding the natural behavior and characteristics of the species. Since this strain behaves differently from other strains in terms of Embp production and possibly biofilm formation, it can provide insights into the mechanisms and genetic factors that control these processes. This knowledge is valuable for developing strategies to prevent and treat infections, especially in hospital and healthcare settings where S. epidermidis infections are common.
Genetic Studies: The 1585 strain can also serve as a baseline or control in genetic studies. By comparing the genome and behavior of 1585 with other strains of S. epidermidis, researchers can identify genetic variations and mutations that may be responsible for different phenotypes, such as increased virulence or antibiotic resistance.
Model Organism for Understanding Staphylococcal Behavior: As a wild-type strain, 1585 offers a model for studying the natural state of S. epidermidis. This is crucial for understanding the fundamental biology of the bacterium, which can help in the development of treatments and interventions against infections caused by more virulent or drug-resistant strains.
In summary, the S. epidermidis 1585 wild-type strain is significant in microbiological research due to its natural characteristics, particularly its behavior in biofilm formation and Embp production. Understanding these aspects can contribute to better insights into the pathogenicity and treatment of Staphylococcus infections, particularly in clinical settings where these bacteria are a common source of nosocomial infections.