Yops (Yersinia outer proteins) analysis 2

gene_x

Tags: python, processing, bash

http://xgenes.com/article/article-content/162/yersinia-outer-proteins-yops-analysis/

  1. Extract all plasmids of the 50 isolates that carry plasmids but lack yopK

    python3 extract_plasmids_from_gff.py ../prokka_plus/1045.gff  #reference
    for sample in SCPM-O-B-6291_C-25 KIM10+ 195P Nepal516 A1122 A1122_bis Nairobi IP31758 228 NW57 NW117 NW56 NW115 FORC_002 FORC_002_bis Gp200 NW116 Gp169 Y225 ATCC_BAA-2637 CFS1934 LC20 GTA 2011N-4075 ATCC_43970 NHV_3758 NVI-10705 NVI-1292 NVI-4570 NVI-6614 NVI-11267 NVI-11294 NVI-10571 NVI-8524 NVI-1176 NVI-701 17Y0412 17Y0414 NVI-492 NVI-9681 SC09 17Y0189 17Y0153 17Y0155 KMM821 16Y0180 NVI-5089 NVI-10587 NVI-4840 17Y0159; do
        python3 extract_plasmids_from_gff.py ../prokka_plus/${sample}.gff
    done
    
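    # search the extracted plasmid annotations for yop genes (grep is case-sensitive, hence two searches)
    grep "yop" *.gff3
    grep "Yop" *.gff3
    # hits for "yop": yopH, yopO, yopE, yopT, yopM, yopD, yopB, yopN
    # hits for "Yop": YopH, YopJ, YopO, YopE, YopT, YopM, YopD, YopB, YopN, YopR
    # both sets of hits are in 195P_NZ_CP019710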
    
    # code of extract_plasmids_from_gff.py
    import io
    import os
    import sys
    
    from Bio import SeqIO
    
    # Note: Bio.Alphabet was removed in Biopython 1.78, so it is not imported here.
    
    if len(sys.argv) != 2:
        print("Usage: python3 extract_plasmids_from_gff.py your_input.gff")
        sys.exit(1)
    
    input_gff = sys.argv[1]
    base_filename = os.path.splitext(os.path.basename(input_gff))[0]
    
    # Split the Prokka GFF file into annotations and sequences
    with open(input_gff, 'r') as f:
        lines = f.readlines()
    fasta_start = lines.index("##FASTA\n")
    gff_lines = lines[:fasta_start]
    fasta_lines = lines[fasta_start + 1:]
    
    # Group the annotation lines by sequence ID (chromosome or plasmid)
    gff_dict = {}
    for line in gff_lines:
        if not line.startswith("#"):
            record_id = line.split("\t")[0]
            gff_dict.setdefault(record_id, []).append(line)
    
    # Parse the embedded FASTA section in memory (no temporary file needed)
    records = list(SeqIO.parse(io.StringIO("".join(fasta_lines)), "fasta"))
    
    for idx, rec in enumerate(records):
        # Skip the chromosome (assumed to be the first record)
        if idx == 0:
            continue
    
        # Write GFF3: annotations of this plasmid followed by its sequence
        with open(f"{base_filename}_{rec.id}.gff3", "w") as output_handle:
            output_handle.writelines(gff_dict.get(rec.id, []))
            output_handle.write("##FASTA\n")
            SeqIO.write(rec, output_handle, "fasta")
    
        ## Write GenBank (without annotations)
        #with open(f"plasmid_{rec.id}.gbk", "w") as output_handle:
        #    rec.annotations["molecule_type"] = "DNA"  # required by Biopython >= 1.78
        #    SeqIO.write(rec, output_handle, "genbank")
    
        # Write FASTA
        with open(f"{base_filename}_{rec.id}.fasta", "w") as output_handle:
            SeqIO.write(rec, output_handle, "fasta")
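
The two grep calls above can also be written as a short Python sketch that reports, for each extracted plasmid file, which yop/Yop names occur in the annotation lines. This is only an illustration: the helper name find_yop_names and the regular expression are assumptions, not part of the original workflow, and it relies on the gene and product names being stored in the ninth GFF column.

```python
import re

# Hypothetical helper (not part of the original workflow): collect every
# yop*/Yop* name found in the attribute column of GFF3 annotation lines.
# No trailing word boundary, so suffixed names such as yopH_1 still yield yopH.
YOP_PATTERN = re.compile(r"\b[Yy]op[A-Z]")


def find_yop_names(gff_text):
    """Return the sorted unique yop/Yop names in the annotation part of a GFF3 text."""
    names = set()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # the sequence section starts here; nothing left to scan
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        columns = line.split("\t")
        if len(columns) < 9:
            continue  # not a complete feature line
        names.update(YOP_PATTERN.findall(columns[8]))
    return sorted(names)


if __name__ == "__main__":
    import glob

    # Scan the per-plasmid files written by extract_plasmids_from_gff.py
    for path in sorted(glob.glob("*.gff3")):
        with open(path) as handle:
            hits = find_yop_names(handle.read())
        print(path, ", ".join(hits) or "no yop hits")
```

Unlike grep, this prints one deduplicated list per file, which makes it easier to see which plasmid carries which yop names.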
    
  2. (Optional) Compare all plasmids against the reference plasmid using FastANI

    git clone https://github.com/ParBLiSS/FastANI.git
    cd FastANI
    ./bootstrap.sh
    ./configure
    make
    # plasmids.txt lists the plasmid FASTA files to compare against, one path per line
    ~/Tools/FastANI/fastANI -q 1045_NZ_CP006795.1.fasta --rl plasmids.txt -o output_ani.txt
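
The command above reads its reference list from plasmids.txt, which these notes do not show being created. A minimal sketch of one way to build it, assuming the per-plasmid FASTA files from step 1 sit in a single directory and that the reference plasmid itself should be left out of the list; the function name write_reference_list is hypothetical:

```python
import glob
import os


def write_reference_list(fasta_dir, query_name, list_path):
    """Write one FASTA path per line to list_path, skipping the query file.

    Hypothetical helper: fasta_dir is searched for *.fasta files, and the
    file whose basename equals query_name is excluded from the list.
    """
    paths = sorted(glob.glob(os.path.join(fasta_dir, "*.fasta")))
    references = [p for p in paths if os.path.basename(p) != query_name]
    with open(list_path, "w") as handle:
        for path in references:
            handle.write(path + "\n")
    return references


if __name__ == "__main__":
    refs = write_reference_list(".", "1045_NZ_CP006795.1.fasta", "plasmids.txt")
    print(f"{len(refs)} reference plasmids written to plasmids.txt")
```

Leaving the query out of the list only avoids a trivial self-comparison in the output; whether the original run excluded it is not stated.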
    

The output of FastANI is a tab-separated file with five columns:

  • Query genome path: The path (or filename) of the query genome.
  • Reference genome path: The path (or filename) of the reference genome.
  • ANI percentage: The average nucleotide identity (ANI) between the query and the reference, expressed as a percentage. FastANI estimates it from the query fragments that map to the reference, not from annotated gene pairs.
  • Mapped fragment count: The number of bidirectional fragment mappings between the two genomes. FastANI splits the query into fixed-size fragments (3 kb by default) and uses only the fragments with a reciprocal mapping to compute ANI.
  • Total query fragments: The number of fragments the query sequence was split into.

    1045_NZ_CP006795.1.fasta     ./SCPM-O-B-6291_C-25_NZ_CP045165.1.fasta   94.3585 1       23
    
  • Query: 1045_NZ_CP006795.1.fasta
  • Reference: ./SCPM-O-B-6291_C-25_NZ_CP045165.1.fasta
  • ANI: 94.3585%
  • Mapped fragments: 1 out of 23
  • Here, the plasmid 1045_NZ_CP006795.1.fasta is compared to SCPM-O-B-6291_C-25_NZ_CP045165.1.fasta. The ANI is approximately 94.36%, but only 1 of the 23 query fragments mapped to the reference plasmid, so the value reflects a single region of roughly 3 kb and not the similarity of the two plasmids as a whole.
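
To make the fragment ratio explicit, a small sketch can parse each FastANI output line and report the fraction of query fragments that mapped. The five-field layout follows the example line above; the names AniHit and parse_fastani_line are illustrative only and are not part of FastANI.

```python
from typing import NamedTuple


class AniHit(NamedTuple):
    """One row of FastANI output (illustrative container, not a FastANI API)."""
    query: str
    reference: str
    ani: float
    mapped_fragments: int
    total_fragments: int

    @property
    def mapped_fraction(self):
        """Share of query fragments that contributed to the ANI estimate."""
        return self.mapped_fragments / self.total_fragments


def parse_fastani_line(line):
    """Parse one FastANI output line into an AniHit.

    Splits on any whitespace, so paths containing spaces are not supported.
    """
    query, reference, ani, mapped, total = line.split()
    return AniHit(query, reference, float(ani), int(mapped), int(total))


if __name__ == "__main__":
    example = ("1045_NZ_CP006795.1.fasta\t./SCPM-O-B-6291_C-25_NZ_CP045165.1.fasta"
               "\t94.3585\t1\t23")
    hit = parse_fastani_line(example)
    print(f"ANI {hit.ani:.2f}% from {hit.mapped_fragments}/{hit.total_fragments} "
          f"fragments ({hit.mapped_fraction:.1%} of the query)")
```

Applied to every line of output_ani.txt, the mapped fraction gives a quick way to separate plasmids that resemble the reference over most of their length from those that share only a short region.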

  • The 50 isolates with plasmids but no yopK are as follows.

    SCPM-O-B-6291_C-25  2   Yersinia pestis SCPM-O-B-6291 C-25  aarF(39)    dfp(37) galR(48)    glnS(41)    hemA(44)    rfaE(38)    speA(35)    pestis  pestis  79  pestis  2.MED   2   _plasmid    No  NA
    KIM10+  4   Yersinia pestis KIM10+  aarF(39)    dfp(37) galR(48)    glnS(41)    hemA(44)    rfaE(38)    speA(35)    pestis  pestis  79  pestis  2.MED   1   _plasmid    No  NA
    195P    19  Yersinia pestis 195P    aarF(39)    dfp(37) galR(48)    glnS(41)    hemA(44)    rfaE(38)    speA(35)    pestis  pestis  79  pestis  2.ANT   3   _plasmid    No  NA
    Nepal516    20  Yersinia pestis Nepal516    aarF(39)    dfp(37) galR(48)    glnS(41)    hemA(44)    rfaE(38)    speA(35)    pestis  pestis  79  pestis  2.ANT   2   _plasmid    No  NA
    A1122   24  Yersinia pestis A1122   aarF(39)    dfp(37) galR(48)    glnS(41)    hemA(44)    rfaE(38)    speA(35)    pestis  pestis  79  pestis  1.ORI   2   _plasmid    No  NA
    A1122_bis   26  Yersinia pestis A1122 bis   aarF(39)    dfp(37) galR(48)    glnS(41)    hemA(44)    rfaE(38)    speA(35)    pestis  pestis  79  pestis  1.ORI   2   _plasmid    No  NA
    Nairobi 43  Yersinia pestis Nairobi aarF(39)    dfp(37) galR(48)    glnS(41)    hemA(44)    rfaE(38)    speA(35)    pestis  pestis  79  pestis  1.ANT   1   _plasmid    No  NA
    IP31758 82  Yersinia pseudotuberculosis IP31758 adk(1)  argA(2) aroA(1) glnA(6) thrA(8) tmk(3)  trpE(2) pseudotuberculosis  pseudotuberculosis  2   8       2   _plasmid    No  NA
    228 94  Yersinia similis 228    adk(5)  argA(4) aroA(12)    glnA(12)    thrA(15)    tmk(9)  trpE(9) similis similis 92          1   _plasmid    No  NA
    NW57    115 Yersinia enterocolitica NW57    adk(20) argA(85)    aroA(21)    glnA(22)    thrA(21)    tmk(28) trpE(81)    enterocolitica  enterocolitica  312 1Aa     2   _plasmid    No  NA
    NW117   116 Yersinia enterocolitica NW117   adk(20) argA(85)    aroA(21)    glnA(22)    thrA(21)    tmk(28) trpE(81)    enterocolitica  enterocolitica  312 1Aa     2   _plasmid    No  NA
    NW56    118 Yersinia enterocolitica NW56    adk(20) argA(85)    aroA(21)    glnA(22)    thrA(21)    tmk(28) trpE(81)    enterocolitica  enterocolitica  312 1Aa     2   _plasmid    No  NA
    NW115   119 Yersinia enterocolitica NW115   adk(20) argA(85)    aroA(21)    glnA(22)    thrA(21)    tmk(28) trpE(81)    enterocolitica  enterocolitica  312 1Aa     2   _plasmid    No  NA
    FORC_002    121 Yersinia enterocolitica FORC_002    adk(12) argA(19)    aroA(21)    glnA(22)    thrA(25)    tmk(24) trpE(19)    enterocolitica  enterocolitica  252 1Aa     1   _plasmid    No  NA
    FORC_002_bis    122 Yersinia enterocolitica FORC_002 bis    adk(12) argA(19)    aroA(21)    glnA(22)    thrA(25)    tmk(24) trpE(19)    enterocolitica  enterocolitica  252 1Aa     1   _plasmid    No  NA
    Gp200   129 Yersinia enterocolitica Gp200   adk(20) argA(21)    aroA(85)    glnA(32)    thrA(25)    tmk(~71)    trpE(19)    enterocolitica  enterocolitica      1Aa     1   _plasmid    No  NA
    NW116   130 Yersinia enterocolitica NW116   adk(86) argA(41)    aroA(31)    glnA(83)    thrA(31)    tmk(104)    trpE(16)    enterocolitica  enterocolitica  335 1Aa     1   _plasmid    No  NA
    Gp169   131 Yersinia enterocolitica Gp169   adk(86) argA(41)    aroA(31)    glnA(83)    thrA(31)    tmk(104)    trpE(16)    enterocolitica  enterocolitica  335 1Aa     1   _plasmid    No  NA
    Y225    134 Yersinia frederiksenii Y225 aarF(43)    dfp(41) galR(50)    glnS(47)    hemA(48)    rfaE(41)    speA(39)    frederiksenii   occitanica  83          1   _plasmid    No  NA
    ATCC_BAA-2637   137 Yersinia rochesterensis ATCC BAA-2637   aarF(43)    dfp(41) galR(50)    glnS(10)    hemA(58)    rfaE(41)    speA(39)    rochesterensis  occitanica  84          2   _plasmid    No  NA
    CFS1934 140 Yersinia hibernica CFS1934                              hibernica   hibernica               1   _plasmid    No  NA
    LC20    141 Yersinia hibernica LC20 adk(-)  argA(66)    aroA(-) glnA(68)    thrA(78)    tmk(85) trpE(76)    hibernica   hibernica               2   _plasmid    No  NA
    GTA 146 Yersinia massiliensis GTA   aarF(15)    dfp(~31)    galR(32)    glnS(15)    hemA(30)    rfaE(32)    speA(16)    massiliensis    massiliensis        2       2   _plasmid    No  NA
    2011N-4075  147 Yersinia massiliensis 2011N-4075    aarF(15)    dfp(~31)    galR(~32)   glnS(15)    hemA(30)    rfaE(32)    speA(16)    massiliensis    massiliensis        2       2   _plasmid    No  NA
    ATCC_43970  151 Yersinia bercovieri ATCC 43970  aarF(47)    dfp(45) galR(54)    glnS(61)    hemA(63)    rfaE(45)    speA(9) bercovieri  bercovieri  30          1   _plasmid    No  NA
    NHV_3758    163 Yersinia ruckeri NHV_3758   adk(64) argA(76)    aroA(78)    glnA(78)    thrA(90)    tmk(86) trpE(77)    ruckeri ruckeri             1   _plasmid    No  NA
    NVI-10705   164 Yersinia ruckeri NVI-10705  adk(64) argA(76)    aroA(78)    glnA(78)    thrA(90)    tmk(86) trpE(77)    ruckeri ruckeri             2   _plasmid    No  NA
    NVI-1292    165 Yersinia ruckeri NVI-1292   adk(64) argA(76)    aroA(78)    glnA(78)    thrA(90)    tmk(86) trpE(77)    ruckeri ruckeri             2   _plasmid    No  NA
    NVI-4570    166 Yersinia ruckeri NVI-4570   adk(64) argA(76)    aroA(78)    glnA(78)    thrA(90)    tmk(86) trpE(77)    ruckeri ruckeri             3   _plasmid    No  NA
    NVI-6614    167 Yersinia ruckeri NVI-6614   adk(64) argA(76)    aroA(78)    glnA(78)    thrA(90)    tmk(86) trpE(77)    ruckeri ruckeri             3   _plasmid    No  NA
    NVI-11267   168 Yersinia ruckeri NVI-11267  adk(64) argA(76)    aroA(78)    glnA(78)    thrA(90)    tmk(86) trpE(77)    ruckeri ruckeri             2   _plasmid    No  NA
    NVI-11294   169 Yersinia ruckeri NVI-11294  adk(64) argA(76)    aroA(78)    glnA(78)    thrA(90)    tmk(86) trpE(77)    ruckeri ruckeri             2   _plasmid    No  NA
    NVI-10571   170 Yersinia ruckeri NVI-10571  adk(64) argA(76)    aroA(78)    glnA(78)    thrA(90)    tmk(86) trpE(77)    ruckeri ruckeri             2   _plasmid    No  NA
    NVI-8524    171 Yersinia ruckeri NVI-8524   adk(64) argA(76)    aroA(78)    glnA(78)    thrA(90)    tmk(86) trpE(77)    ruckeri ruckeri             2   _plasmid    No  NA
    NVI-1176    172 Yersinia ruckeri NVI-1176   adk(64) argA(76)    aroA(78)    glnA(78)    thrA(90)    tmk(86) trpE(77)    ruckeri ruckeri             1   _plasmid    No  NA
    NVI-701 173 Yersinia ruckeri NVI-701    adk(64) argA(76)    aroA(78)    glnA(78)    thrA(90)    tmk(86) trpE(77)    ruckeri ruckeri             1   _plasmid    No  NA
    17Y0412 174 Yersinia ruckeri 17Y0412    adk(64) argA(76)    aroA(78)    glnA(78)    thrA(90)    tmk(86) trpE(77)    ruckeri ruckeri             1   _plasmid    No  NA
    17Y0414 175 Yersinia ruckeri 17Y0414    adk(64) argA(76)    aroA(78)    glnA(78)    thrA(90)    tmk(86) trpE(77)    ruckeri ruckeri             1   _plasmid    No  NA
    NVI-492 176 Yersinia ruckeri NVI-492    aarF(76)    dfp(40) galR(14)    glnS(13)    hemA(15)    rfaE(14)    speA(14)    ruckeri ruckeri             1   _plasmid    No  NA
    NVI-9681    177 Yersinia ruckeri NVI-9681   adk(64) argA(67)    aroA(72)    glnA(78)    thrA(90)    tmk(86) trpE(89)    ruckeri ruckeri             1   _plasmid    No  NA
    SC09    178 Yersinia ruckeri SC09   aarF(76)    dfp(40) galR(14)    glnS(13)    hemA(15)    rfaE(14)    speA(14)    ruckeri ruckeri             2   _plasmid    No  NA
    17Y0189 180 Yersinia ruckeri 17Y0189    aarF(13)    dfp(10) galR(14)    glnS(13)    hemA(15)    rfaE(14)    speA(14)    ruckeri ruckeri 44          1   _plasmid    No  NA
    17Y0153 181 Yersinia ruckeri 17Y0153    aarF(13)    dfp(10) galR(14)    glnS(13)    hemA(15)    rfaE(14)    speA(14)    ruckeri ruckeri 44          1   _plasmid    No  NA
    17Y0155 182 Yersinia ruckeri 17Y0155    aarF(13)    dfp(10) galR(14)    glnS(13)    hemA(15)    rfaE(14)    speA(14)    ruckeri ruckeri 44          1   _plasmid    No  NA
    KMM821  183 Yersinia ruckeri KMM821 aarF(13)    dfp(10) galR(14)    glnS(13)    hemA(15)    rfaE(14)    speA(14)    ruckeri ruckeri 44          2   _plasmid    No  NA
    16Y0180 184 Yersinia ruckeri 16Y0180    aarF(13)    dfp(10) galR(14)    glnS(13)    hemA(15)    rfaE(14)    speA(14)    ruckeri ruckeri 44          1   _plasmid    No  NA
    NVI-5089    189 Yersinia ruckeri NVI-5089   adk(75) argA(76)    aroA(72)    glnA(78)    thrA(90)    tmk(86) trpE(89)    ruckeri ruckeri             1   _plasmid    No  NA
    NVI-10587   190 Yersinia ruckeri NVI-10587  adk(75) argA(76)    aroA(72)    glnA(78)    thrA(90)    tmk(86) trpE(89)    ruckeri ruckeri             1   _plasmid    No  NA
    NVI-4840    191 Yersinia ruckeri NVI-4840   adk(75) argA(76)    aroA(72)    glnA(79)    thrA(90)    tmk(96) trpE(89)    ruckeri ruckeri             2   _plasmid    No  NA
    17Y0159 197 Yersinia ruckeri 17Y0159    aarF(76)    dfp(40) galR(14)    glnS(13)    hemA(15)    rfaE(14)    speA(14)    ruckeri ruckeri             3   _plasmid    No  NA
    


© 2023 XGenes.com Impressum