Plot phylogenetic tree_heatmap and MSA on yopBDJTEMKOH[NR]

gene_x 0 like s 1098 view s

Tags: plot, python, R, processing, scripts

| x | y | | ------------ | ------------ | | x | y | | x | y |

|Isolate|yopJ|yopB|yopT|yopE|yopD|yopM|yopK|yopO|yopH| | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | |Yersinia_enterocolitica_1055Rr|1|1|1|1|1|1|1|1|1| |Yersinia_enterocolitica_2516-87|0|1|1|1|1|1|1|1|1| |Yersinia_enterocolitica_8081|1|1|1|1|1|1|1|1|1| |Yersinia_enterocolitica_8081_bis|1|1|1|1|1|1|1|1|1| |Yersinia_enterocolitica_KNG22703|1|1|1|1|1|1|1|1|1| |Yersinia_enterocolitica_WA|1|1|1|1|1|0|1|1|1| |Yersinia_enterocolitica_Y1|1|1|1|1|1|1|1|1|1| |Yersinia_enterocolitica_Y11|1|1|1|1|1|1|1|1|1| |Yersinia_enterocolitica_YE1|1|0|1|1|1|1|1|1|1| |Yersinia_enterocolitica_YE165|1|1|1|1|0|1|1|0|1| |Yersinia_enterocolitica_YE3|1|0|1|1|1|1|1|0|1| |Yersinia_enterocolitica_YE5|1|0|1|1|1|1|1|1|1| |Yersinia_enterocolitica_YE6|1|1|1|1|1|1|1|0|1| |Yersinia_enterocolitica_YE7|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_1045|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_1412|1|1|0|1|1|1|1|1|1| |Yersinia_pestis_1413|1|1|0|1|1|1|1|1|1| |Yersinia_pestis_1522|1|1|0|0|1|1|1|1|1| |Yersinia_pestis_20|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_2944|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_3067|1|1|0|1|1|1|1|1|1| |Yersinia_pestis_3770|1|1|0|1|1|1|1|1|1| |Yersinia_pestis_790|0|1|1|1|1|1|1|0|1| |Yersinia_pestis_8787|1|1|0|1|1|1|1|1|1| |Yersinia_pestis_91001|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_94|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_Angola|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_Angola_bis|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_Antiqua|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_Antiqua_bis|0|1|1|1|1|1|1|1|1| |Yersinia_pestis_CO92|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_CO92_pgm-_pPCP1-|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_Cadman|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_D106004|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_D182038|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_Dodson|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_EV76-CN|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_El_Dorado|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_FDAARGOS_601|1|1|1|1|1|1|1|0|1| |Yersinia_pestis_FDAARGOS_602|0|1|1|1|1|0|1|1|1| |Yersinia_pestis_FDAARGOS_603|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_Harbin_35|1|1|1|1|1|1|1|0|1| |Yersinia_pestis_Harbin_35_bis|1|0|1|1|1|1|1|0|1| |Yersinia_pestis_Java9|1|1|1|1|1|1|1|0|1| |Yersinia_pestis_KIM5|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_Nicholisk_41|1|1|1|0|1|1|1|0|1| |Yersinia_pestis_PBM19|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_Pestoides_B|0|1|1|1|1|1|1|1|1| |Yersinia_pestis_Pestoides_F|1|1|0|1|1|1|1|1|1| |Yersinia_pestis_Pestoides_F_bis|1|1|0|1|1|1|1|1|1| |Yersinia_pestis_Pestoides_G|1|1|0|1|1|1|1|1|1| |Yersinia_pestis_R|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_S19960127|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_SCPM-O-B-5935_I-1996|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_SCPM-O-B-5942_I-2638|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_SCPM-O-B-6530|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_SCPM-O-B-6899_231|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_SCPM-O-DNA-18_I-3113|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_Shasta|1|1|1|1|1|1|1|1|1| |Yersinia_pestis_Z176003|1|1|1|1|1|1|1|1|1| |Yersinia_pseudotuberculosis_598|1|1|1|1|1|1|1|1|1| |Yersinia_pseudotuberculosis_EP2+|0|1|1|1|1|1|1|1|1| |Yersinia_pseudotuberculosis_FDAARGOS_579|1|1|1|1|1|1|1|1|1| |Yersinia_pseudotuberculosis_FDAARGOS_580|1|1|1|1|1|1|1|1|1| |Yersinia_pseudotuberculosis_FDAARGOS_581|1|1|1|0|1|1|1|1|1| |Yersinia_pseudotuberculosis_FDAARGOS_582|1|1|1|1|1|1|1|1|1| |Yersinia_pseudotuberculosis_FDAARGOS_583|1|1|1|1|1|1|1|1|1| |Yersinia_pseudotuberculosis_IP2666pIB1|1|1|1|1|1|1|1|1|1| |Yersinia_pseudotuberculosis_IP32953|1|1|1|1|1|1|1|1|1| |Yersinia_pseudotuberculosis_IP32953_bis|1|1|1|1|0|1|1|1|1| |Yersinia_pseudotuberculosis_NZYP4713|1|1|1|1|1|1|1|1|1| |Yersinia_pseudotuberculosis_PA3606|1|1|1|1|1|1|1|1|1| |Yersinia_pseudotuberculosis_PB1+_bis|1|1|1|1|1|0|1|1|1|

ggtree_and_gheatmap_yopK

alignment_yopK

  1. This step uses rsync to download data from the NCBI server to a local directory, save all gff-files in the directory prokka.

    1. rsync --copy-links --recursive --times --verbose rsync://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/696/305/GCF_001696305.1_UCN72.1 Yersinia_pestis_1045
  2. The step processes GFF files containing gene annotations for a set of samples in the directory prokka. The primary goal is to modify the GFF files and create new ones with specific changes and to save them in the directory prokka_plus. The script operates on each sample one by one, and for each sample, it performs the following steps:

    • Replace all occurrences of \tCDS\t with CDS in the original GFF file.
    • Extract all lines containing CDS and save them in a new file with the suffix _CDS.gff.
    • Replace all occurrences of ID= with ID_old= in the new _CDS.gff file.
    • Cut the second field (delimited by ;) from the _CDS.gff file and save it in a new file with the suffix _CDS_f2.
    • Replace all occurrences of Parent=gene- with ID= in the _CDS_f2 file.
    • Paste the contents of the CDS.gff and _CDS_f2 files side by side, with a ; delimiter, and save the result in a new file with the suffix _CDS.gff.
    • Run the enum.py script on the CDS.gff file to add line numbers at the end, and save the result in a new file with the suffix _CDS__.gff. import sys

      1. if len(sys.argv) < 2:
      2. print("Please provide a filename as an argument.")
      3. sys.exit(1)
      4. filename = sys.argv[1]
      5. try:
      6. with open(filename) as f:
      7. for i, line in enumerate(f):
      8. print(f"{line.strip()}_{i+1}")
      9. except FileNotFoundError:
      10. print(f"File {filename} not found.")
    • Extract all lines from the original GFF file that do not contain CDS and save them in a new file with the suffix _nonCDS.gff.

    • Remove all lines containing ### from the nonCDS.gff file and save the result in a new file with the suffix _nonCDS.gff.
    • Concatenate the contents of the nonCDS.gff and _CDS__.gff files and save the result in a new file with the suffix _nonCDS_CDS.gff.
    • Replace all occurrences of CDS with \tCDS\t in the _nonCDS_CDS.gff file.
    • Append the string ##FASTA to the end of the _nonCDS_CDS.gff file.
    • Modify the FASTA file associated with the sample by replacing the first field (delimited by a space) with the corresponding sample name.
    • Concatenate the modified GFF file (_nonCDS_CDS.gff) and the modified FASTA file, and save the result in the ../prokka_plus/ directory with a new name based on the sample name.
    • After processing all samples, the script removes intermediate files generated during the process.
      1. for sample in Yersinia_pestis_1045 Yersinia_pestis_SCPM-O-B-6291_C-25 Yersinia_pestis_2944 Yersinia_pestis_KIM10+ Yersinia_pestis_M-1482; do
      2. sed -i 's/\tCDS\t/_CDS_/g' ${sample}.gff
      3. grep "_CDS_" ${sample}.gff > ${sample}_CDS.gff
      4. sed -i 's/ID=/ID_old=/g' ${sample}_CDS.gff
      5. cut -d';' -f2 ${sample}_CDS.gff > ${sample}_CDS_f2
      6. sed -i 's/Parent=gene-/ID=/g' ${sample}_CDS_f2
      7. paste -d';' ${sample}_CDS.gff ${sample}_CDS_f2 > ${sample}_CDS_.gff
      8. python enum.py ${sample}_CDS_.gff > ${sample}_CDS__.gff # add a line number to end to avoid the sameple Gene_ID
      9. grep -v "_CDS_" ${sample}.gff > ${sample}_nonCDS.gff
      10. grep -v "###" ${sample}_nonCDS.gff > ${sample}_nonCDS_.gff
      11. cat ${sample}_nonCDS_.gff ${sample}_CDS__.gff > ${sample}_nonCDS_CDS.gff
      12. sed -i 's/_CDS_/\tCDS\t/g' ${sample}_nonCDS_CDS.gff
      13. echo "##FASTA" >> ${sample}_nonCDS_CDS.gff
      14. cut -d' ' -f1 ../assembly/${sample}.fna > ../assembly/${sample}.fasta;
      15. cat ${sample}_nonCDS_CDS.gff ../assembly/${sample}.fasta > ../prokka_plus/$(echo $sample | cut -d'_' -f3- | tr " " "_").gff;
      16. done
      17. rm *_CDS.gff *_CDS_f2 *_CDS_.gff *_CDS__.gff *_nonCDS.gff *_nonCDS_.gff
  3. This step runs Roary, a tool for pan-genome analysis. It takes annotated bacterial genomes in GFF3 format as input and clusters the genes based on sequence similarity.

    1. roary -p 4 -f ./roary -i 95 -cd 99 -s -e -n -v prokka_plus/1045.gff prokka_plus/SCPM-O-B-6291_C-25.gff prokka_plus/2944.gff prokka_plus/KIM10+.gff
  4. This step extracts the coding sequences (CDS) of specific genes from multiple genome files and saves them to an output file. Start-files: roary/pan_genome_reference.fa and roary/gene_presence_absence.csv

    1. #grep "yopT" roary/gene_presence_absence.csv
    2. > yopT_seq.txt
    3. for gene_id in M486_RS20945_3996 YE105_RS20560_4012; do
    4. for gbff in Yersinia_massiliensis_2011N-4075/GCF_013282765.1_ASM1328276v1/GCF_013282765.1_ASM1328276v1_genomic.gbff.gz Yersinia_pestis_EV_NIIEG/GCF_000590535.2_ASM59053v2/GCF_000590535.2_ASM59053v2_genomic.gbff.gz Yersinia_pestis_Shasta/GCF_000834335.1_ASM83433v1/GCF_000834335.1_ASM83433v1_genomic.gbff.gz Yersinia_ruckeri_NVI-492/GCF_023212565.2_ASM2321256v2/GCF_023212565.2_ASM2321256v2_genomic.gbff.gz Yersinia_pestis_Pestoides_G/GCF_000834985.1_ASM83498v1/GCF_000834985.1_ASM83498v1_genomic.gbff.gz; do
    5. output=$(python3 extract_CDS_of_a_locus_tag.py ${gbff} $(echo "${gene_id}" | cut -d '_' -f 1-2))
    6. if [[ ! -z "${output}" ]]; then
    7. gbff_short=$(echo "${gbff}" | cut -d '/' -f 1)
    8. printf "%s\t%s\n" "${gbff_short}" "${output}" >> yopT_seq.txt
    9. fi
    10. done
    11. done
    12. #-- code snippet of extract_CDS_of_a_locus_tag.py --
    13. from Bio import SeqIO
    14. import sys
    15. import gzip
    16. #python3 extract_CDS_of_a_locus_tag.py GCF_001188735.1_ASM118873v1_genomic.gbff.gz M486_RS20950
    17. # Get the file name from the command-line argument
    18. if len(sys.argv) == 3:
    19. filename = sys.argv[1]
    20. locus_tag = sys.argv[2]
    21. else:
    22. print(sys.argv)
    23. print("Error: no file name provided")
    24. sys.exit(1)
    25. # Open the compressed file and read the sequences
    26. with gzip.open(filename, "rt") as f:
    27. # Open the GenBank file and read the first record
    28. # record = SeqIO.read("GCF_001188735.1_ASM118873v1_genomic.gbff", "genbank")
    29. for record in SeqIO.parse(f, "genbank"):
    30. #print("%s %i" % (record.id, len(record)))
    31. # Define the locus_tag you want to extract
    32. #locus_tag = "M486_RS20950"
    33. # Loop through the features and extract the CDS with the specified locus_tag
    34. for feature in record.features:
    35. if feature.type == "CDS" and "locus_tag" in feature.qualifiers and feature.qualifiers["locus_tag"][0] == locus_tag:
    36. # Extract the CDS location information and sequence
    37. location = feature.location
    38. seq = location.extract(record).seq
    39. #print(f">{locus_tag}")
    40. print(f"{seq}")
    41. # Translate the nucleotide sequence to protein sequence
    42. #protein_seq = seq.translate()
    43. #print(f"Locus tag: {locus_tag}\nProtein sequence: {protein_seq}")
  5. The given code converts a DNA sequence file to a protein sequence file, aligns the protein sequences using MAFFT or MUSCLE, and constructs a phylogenetic tree using FastTree.

    1. #mafft --clustalout --adjustdirection yopK_seqs.fasta > yopK_seqs.output
    2. #fasttree -gtr -gamma -nt yopK_alignment.fasta > yopK_alignment.tree
    3. #raxml-ng --all --model GTR+G+ASC_LEWIS --prefix core_gene_raxml --threads 6 --msa yopK_alignment_.fasta --bs-trees 1000 -slow
    4. #fasttree <input_file> > <output_file>
    5. for yop in yopJ yopB yopT yopE yopD yopM yopK yopO yopH; do
    6. python3 ../../../plotTreeHeatmap/dna_to_protein.py ${yop}_seq.txt ${yop}_protein.fasta
    7. python3 ../../../plotTreeHeatmap/protein_alignment.py ${yop}_protein.fasta ${yop}_aligned_protein.fasta mafft
    8. awk -F '_' '/^>/ { printf(">%s", $3); for (i = 4; i <= NF; ++i) printf("_%s", $i); printf("\n"); next } { print }' ${yop}_aligned_protein.fasta > ${yop}_aligned_protein_.fasta
    9. done
    10. fasttree yopE_aligned_protein_.fasta > yopE_aligned_protein.tree
    11. fasttree yopO_aligned_protein_.fasta > yopO_aligned_protein.tree
    12. fasttree yopT_aligned_protein_.fasta > yopT_aligned_protein.tree
    13. grep ">" yopE_aligned_protein.fasta > typing_yopE.csv
    14. grep ">" yopO_aligned_protein.fasta > typing_yopO.csv
    15. grep ">" yopT_aligned_protein.fasta > typing_yopT.csv
    16. #-- construct typing_for yopE --
    17. cut -f1 -d'_' typing_yopE.csv > f1
    18. cut -f2 -d'_' typing_yopE.csv > f2
    19. cut -f3- -d'_' typing_yopE.csv > f3_
    20. paste f3_ typing_yopE.csv > temp1
    21. paste f1 f2 > temp2
    22. paste temp1 temp2 > typing_yopE_.csv
    23. #Isolate,Name,Genus,Species,yopE
    24. #R,Yersinia_pestis_R,Yersinia,pestis,Yes
    25. #20,Yersinia_pestis_20,Yersinia,pestis,Yes
    26. #...
    27. #-- for yopO --
    28. cut -f1 -d'_' typing_yopO.csv > f1
    29. cut -f2 -d'_' typing_yopO.csv > f2
    30. cut -f3- -d'_' typing_yopO.csv > f3_
    31. paste f3_ typing_yopO.csv > temp1
    32. paste f1 f2 > temp2
    33. paste temp1 temp2 > typing_yopO_.csv
    34. #-- for yopT --
    35. cut -f1 -d'_' typing_yopT.csv > f1
    36. cut -f2 -d'_' typing_yopT.csv > f2
    37. cut -f3- -d'_' typing_yopT.csv > f3_
    38. paste f3_ typing_yopT.csv > temp1
    39. paste f1 f2 > temp2
    40. paste temp1 temp2 > typing_yopT_.csv
    41. # code snippet of dna_to_protein.py
    42. from Bio import SeqIO
    43. from Bio.Seq import Seq
    44. from Bio.SeqRecord import SeqRecord
    45. import sys
    46. def translate_dna_to_protein(dna_fasta_file, protein_fasta_file):
    47. protein_records = []
    48. for record in SeqIO.parse(dna_fasta_file, "fasta"):
    49. protein_seq = record.seq.translate(to_stop=True)
    50. protein_record = SeqRecord(protein_seq, id=record.id, description=record.description)
    51. protein_records.append(protein_record)
    52. SeqIO.write(protein_records, protein_fasta_file, "fasta")
    53. if __name__ == "__main__":
    54. input_file = sys.argv[1]
    55. output_file = sys.argv[2]
    56. translate_dna_to_protein(input_file, output_file)
    57. # code snippet of protein_alignment.py
    58. import sys
    59. from Bio.Align.Applications import MafftCommandline, MuscleCommandline
    60. from Bio import SeqIO
    61. from Bio import AlignIO
    62. def run_alignment(input_file, output_file, aligner):
    63. if aligner.lower() == "mafft":
    64. mafft_cline = MafftCommandline(input=input_file)
    65. stdout, stderr = mafft_cline()
    66. elif aligner.lower() == "muscle":
    67. muscle_cline = MuscleCommandline(input=input_file, out=output_file)
    68. stdout, stderr = muscle_cline()
    69. else:
    70. print("Invalid aligner. Please choose 'mafft' or 'muscle'.")
    71. sys.exit(1)
    72. with open(output_file, "w") as aligned_fasta:
    73. aligned_fasta.write(stdout)
    74. if __name__ == "__main__":
    75. if len(sys.argv) < 4:
    76. print("Usage: python protein_alignment.py input_fasta output_fasta aligner")
    77. print("Example: python protein_alignment.py input_protein.fasta aligned_protein.fasta mafft")
    78. sys.exit(1)
    79. else:
    80. input_fasta = sys.argv[1]
    81. output_fasta = sys.argv[2]
    82. aligner_choice = sys.argv[3]
    83. run_alignment(input_fasta, output_fasta, aligner_choice)
  6. This step generates a circular phylogenetic tree, a heatmap, and a multiple alignment sequence in a single figure. The tree is constructed from the aligned protein sequence "yopE_aligned_protein_.fasta" and the heatmap is based on the "typing_yopE_.csv" data. The multiple alignment sequence is added into the final figure by extracting specific sequences from the aligned protein sequence file.

    1. library(ggtree)
    2. library(ggplot2)
    3. library(Biostrings)
    4. library(stringr)
    5. library(dplyr)
    6. library(ape)
    7. library(ggmsa)
    8. #install.packages("cowplot")
    9. library(cowplot)
    10. setwd("/home/jhuang/DATA/Data_Gunnar_Yersiniomics/plotTreeHeatmap/")
    11. # -- edit tree --
    12. #https://icytree.org/
    13. #0.000780
    14. #info <- read.csv("typing_198.csv")
    15. info <- read.csv("typing_yopE_.csv")
    16. info$name <- info$Isolate
    17. #rownames(info) <- info$name
    18. info$yopE <- factor(info$yopE)
    19. #tree <- read.tree("core_gene_alignment_fasttree_directly_from_186isoaltes.tree") --> NOT GOOD!
    20. #tree <- read.tree("raxml.tree")
    21. #tree <- read.tree("yopK_alignment_modified.tree")
    22. tree <- read.tree("yopE_aligned_protein.tree")
    23. #tree <- read.tree("core_gene_raxml.raxml.bestTree")
    24. cols <- c(Yes='purple2', No='skyblue2')
    25. # -- tree --
    26. tree_plot <- ggtree(tree, layout='circular', branch.length='none') %<+% info + scale_color_manual(values=cols) + geom_tiplab2(aes(label=name), offset=1) + geom_tippoint(aes(color=yopE))
    27. #+ scale_color_manual(values=cols) + geom_tiplab2(aes(label=name), offset=1)
    28. #, geom='text', align=TRUE, linetype=NA, hjust=1.8,check.overlap=TRUE, size=3.3
    29. #difference between geom_tiplab and geom_tiplab2?
    30. #+ theme(axis.text.x = element_text(angle = 30, vjust = 0.5)) + theme(axis.text = element_text(size = 20)) + scale_size(range = c(1, 20))
    31. #font.size=10,
    32. png("ggtree2.png", width=1260, height=1260)
    33. #png("ggtree.png", width=1000, height=1000)
    34. #svg("ggtree.svg", width=1260, height=1260)
    35. tree_plot
    36. dev.off()
    37. # -- heatmap --
    38. #heatmapData2 <- info %>% select(Isolate, cgMLST, Lineage)
    39. heatmapData2 <- info %>% select(Isolate, Species)
    40. rn <- heatmapData2$Isolate
    41. heatmapData2$Isolate <- NULL
    42. heatmapData2 <- as.data.frame(sapply(heatmapData2, as.character))
    43. rownames(heatmapData2) <- rn
    44. #"1","2","3","4","9","12","13","14","16","18", "19","30","32","36","39","41","42","43","44","53", "64","79","83","84","92","140","148","154","171","172", "173","194","215","217","252","277","279","282","290","312", "335","NA"
    45. #https://bookdown.org/hneth/ds4psy/D-3-apx-colors-basics.html
    46. #"blue","cyan", "skyblue2", "azure3","blueviolet","darkgoldenrod", "tomato","mediumpurple4","indianred",
    47. #"cornflowerblue","darkgreen","seagreen3","tan","red","green","orange","pink","brown","magenta", "cornflowerblue","darkgreen","red","tan","brown",
    48. #heatmap.colours <- c("cornflowerblue","darkgreen","seagreen3","tan","red", "navyblue", "gold", "lightcyan3","green","orange","pink","purple","magenta","brown", "darksalmon","chocolate4","darkkhaki", "maroon","lightgreen", "darkgreen","seagreen3","tan","red", "navyblue", "gold", "green","orange","pink","purple","magenta","brown", "darksalmon","chocolate4","darkkhaki", "lightcyan3", "maroon","lightgreen", "darkgrey")
    49. #names(heatmap.colours) <- c("pestis","pseudotuberculosis","similis","enterocolitica","frederiksenii","kristensenii","occitanica","intermedia","hibernica","canariae","alsatica","rohdei","massiliensis","bercovieri","aleksiciae","mollaretii","aldovae","ruckeri","entomophaga", "1","1Aa","1B","2","2/3-9a","2/3-9b","4","5","6","8","10","11","12","13","14","16","17","29","NA")
    50. heatmap.colours <- c("cornflowerblue","darkgreen","seagreen3","tan","red", "navyblue", "gold", "lightcyan3","green","orange","pink","purple","magenta","brown", "darksalmon","chocolate4","darkkhaki", "maroon","lightgreen")
    51. names(heatmap.colours) <- c("pestis","pseudotuberculosis","similis","enterocolitica","frederiksenii","kristensenii","occitanica","intermedia","hibernica","canariae","alsatica","rohdei","massiliensis","bercovieri","aleksiciae","mollaretii","aldovae","ruckeri","entomophaga")
    52. #"cornflowerblue","darkgreen","seagreen3","tan","red","green","orange","pink","brown","magenta","cornflowerblue",
    53. #"2.MED","4.ANT","2.ANT","1.ORI","1.IN","1.ANT","0.ANT3","0.PE4","0.PE3","0.PE2","1a",
    54. #mydat$Regulation <- factor(mydat$Regulation, levels=c("up","down"))
    55. #rochesterensis-->occitanica
    56. png("ggtree_and_gheatmap_yopE.png", width=1000, height=800)
    57. #png("ggtree_and_gheatmap.png", width=1290, height=1000)
    58. #png("ggtree_and_gheatmap.png", width=1690, height=1400)
    59. #svg("ggtree_and_gheatmap.svg", width=1290, height=1000)
    60. #svg("ggtree_and_gheatmap.svg", width=17, height=15)
    61. tree_heatmap_plot <- gheatmap(tree_plot, heatmapData2, width=0.2,colnames_position="top", colnames_angle=90, colnames_offset_y = 0.1, hjust=0.5, font.size=4, offset = 8) + scale_fill_manual(values=heatmap.colours) + theme(legend.text = element_text(size = 14)) + theme(legend.title = element_text(size = 14)) + guides(fill=guide_legend(title=""), color = guide_legend(override.aes = list(size = 5)))
    62. tree_heatmap_plot
    63. dev.off()
    64. #samtools faidx yopE_aligned_protein_.fasta "1522" > yopE_aligned_protein_sorted.fasta
    65. #samtools faidx yopE_aligned_protein_.fasta "Pestoides_F_bis" >> yopE_aligned_protein_sorted.fasta
    66. #...
    67. #samtools faidx yopE_aligned_protein_.fasta "8081" >> yopE_aligned_protein_sorted.fasta
    68. #samtools faidx yopE_aligned_protein_.fasta "WA" >> yopE_aligned_protein_sorted.fasta
    69. #https://bioconductor.org/packages/devel/bioc/vignettes/ggtreeExtra/inst/doc/ggtreeExtra.html
    70. #https://github.com/YuLab-SMU/supplemental-ggmsa
    71. #https://github.com/YuLab-SMU/ggmsa
    72. library(ggmsa)
    73. library(ggplot2)
    74. library(ggtree)
    75. #library(gggenes)
    76. library(ape)
    77. library(Biostrings)
    78. library(ggnewscale)
    79. library(dplyr)
    80. library(ggtreeExtra)
    81. library(phangorn)
    82. library(RColorBrewer)
    83. library(patchwork)
    84. library(ggplotify)
    85. library(aplot)
    86. library(magick)
    87. library(treeio)
    88. #data <- "../supplemental-ggmsa-main/data/s_RBD.fasta"
    89. data <- "yopE_aligned_protein_.fasta"
    90. #data <- "yopE_aligned_protein_sorted.fasta"
    91. #dms <- read.csv("data/DMS.csv")
    92. #del <- c("expr_lib1", "expr_lib2","expr_avg","bind_lib1","bind_lib2")
    93. #dms <- dms[,!colnames(dms) %in% del]
    94. tidymsa <- tidy_msa(data)
    95. #tidymsa <- assign_dms(tidymsa, dms)
    96. #Mapping the position to the protein-protein interaction plot
    97. #tidymsa$position <- tidymsa$position + 330
    98. #dms = TRUE,
    99. png("alignment_yopE.png", width=1100, height=5400)
    100. msa_plot <- ggplot() +
    101. geom_msa(data = tidymsa, char_width = 0.5, seq_name = TRUE, show.legend = TRUE) + theme_msa() + facet_msa(50)
    102. #scale_fill_gradientn(name = "ACE2 binding", colors = c(colRD(75),colBU(25)))
    103. msa_plot
    104. dev.off()
    105. # #Combine the ggtree and ggmsa plots using the cowplot package:
    106. # combined_plot <- cowplot::plot_grid(msa_plot, tree_heatmap_plot, ncol = 1, align = "v", axis = "l", rel_heights = c(3, 2))
    107. # ggsave("combined_plot.png", combined_plot, width = 10, height = 10, dpi = 300)

like unlike

点赞本文的读者

还没有人对此文章表态


本文有评论

没有评论

看文章,发评论,不要沉默


© 2023 XGenes.com Impressum