gene_x
Tags: python, tool, genomics
If you have genomic coordinates (such as gene or SNP positions) in hg19 and want to convert them to hg38, you need what is known as a "liftover". The UCSC Genome Browser provides a tool specifically for this purpose, the UCSC LiftOver tool.
Here is a brief step-by-step process to use the UCSC LiftOver tool:
1. Format your data: Make sure your data is in BED format (a text file whose first three columns are chromosome, start position, and end position; BED start positions are 0-based and end positions are exclusive).
2. Go to the LiftOver tool: Visit the UCSC LiftOver tool at https://genome.ucsc.edu/cgi-bin/hgLiftOver.
3. Upload your file and select the appropriate parameters: Upload your BED file, choose "hg19" as the original genome, and "hg38" as the genome to convert to.
4. Submit and download the result: Click "submit" and download the resulting BED file, which will contain your original genomic coordinates converted from hg19 to hg38.
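The formatting step is where off-by-one errors tend to creep in, because BED intervals are 0-based and half-open while positions are usually reported 1-based. Here is a minimal sketch of that conversion for single-base positions. The function name positions_to_bed and the example positions are made up for illustration.

```python
def positions_to_bed(positions):
    """Turn (chrom, 1-based position) pairs into BED lines.

    BED intervals are 0-based and half-open, so a single base at 1-based
    position p becomes start = p - 1, end = p.
    """
    lines = []
    for chrom, pos in positions:
        if pos < 1:
            raise ValueError(f"1-based positions must be >= 1, got {pos}")
        lines.append(f"{chrom}\t{pos - 1}\t{pos}")
    return lines


# Hypothetical example positions (1-based), for illustration only.
snps = [("chr1", 1000000), ("chr14", 483848)]
bed_lines = positions_to_bed(snps)
print("\n".join(bed_lines))
```

Writing these lines to a text file gives something you can upload to the LiftOver web form.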
It's important to note that not all genomic regions will have a direct equivalent between versions, so some data may be lost in the conversion.
If you need to do this conversion frequently or with large datasets, there are also command-line tools and scripts available to perform the liftover locally, such as the "CrossMap" tool or the "liftOver" utility provided by UCSC, which can be run from a Unix-like (Linux/Mac) command line.
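If you want to drive the local liftOver utility from a script, a small wrapper that assembles the command is one way to do it. The sketch below assumes liftOver is installed and on your PATH, and that it is called as liftOver oldFile map.chain newFile unMapped with the optional -minMatch flag. All file names are placeholders, and build_liftover_command is a made-up helper name.

```python
import shlex


def build_liftover_command(bed_in, chain_file, bed_out, unmapped_out, min_match=None):
    """Assemble the argument list for UCSC's liftOver utility.

    Usage of the tool is: liftOver oldFile map.chain newFile unMapped
    """
    cmd = ["liftOver"]
    if min_match is not None:
        # -minMatch is the minimum ratio of bases that must remap.
        cmd.append(f"-minMatch={min_match}")
    cmd += [bed_in, chain_file, bed_out, unmapped_out]
    return cmd


cmd = build_liftover_command(
    "input_hg19.bed",            # hypothetical input file name
    "hg19ToHg38.over.chain.gz",  # chain file obtained from UCSC
    "output_hg38.bed",           # hypothetical output file name
    "unmapped.bed",              # hypothetical file for regions that fail
    min_match=0.95,
)
print(shlex.join(cmd))

# To actually run the conversion (requires liftOver on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.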
If you wish to perform a liftover in Python, you can use the package pyliftover. This package provides a Python interface to use the precomputed chain files available from the UCSC Genome Browser. Here is a simple example of how to use this package:
First, you need to install the pyliftover package. You can do this with pip:
pip install pyliftover
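Then, download the chain file for the conversion. This step is optional, because pyliftover downloads and caches the chain file itself when it is given assembly names, but a local copy is useful offline and for the command-line tools. For hg19 to hg38, you can download it from the UCSC Genome Browser:
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz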
Now, you can use the package in Python:
from pyliftover import LiftOver
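# Initialize the LiftOver object; given assembly names, pyliftover
# downloads the matching chain file from UCSC and caches it
lo = LiftOver('hg19', 'hg38')
# To use a locally downloaded chain file instead:
# lo = LiftOver('hg19ToHg38.over.chain.gz')
# Convert a position; pyliftover coordinates are 0-based, so 1000000
# here is the base shown as chr1:1000001 in the genome browser
new_pos = lo.convert_coordinate('chr1', 1000000)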
print(new_pos)
In this code, the convert_coordinate method returns a list of conversions because a position could potentially map to multiple positions in the new genome. Each conversion is a tuple, where the first element is the name of the new chromosome, the second element is the new position, the third element is the strand ('+' or '-'), and the fourth element is the score of the alignment chain used for the conversion.
Also, keep in mind that not all positions can be converted. If a position is deleted in the new assembly, convert_coordinate returns an empty list; if the chromosome name does not appear in the chain file at all, it returns None.
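Because the result can be empty or contain several hits, it helps to classify it before using it. The following is a rough sketch, assuming (as the pyliftover documentation describes) that an unknown chromosome yields None and a deleted position yields an empty list. The helper name lift_position and its status labels are made up for illustration.

```python
def lift_position(lo, chrom, pos):
    """Classify the result of lo.convert_coordinate(chrom, pos).

    `lo` is expected to be a pyliftover LiftOver object (anything with a
    compatible convert_coordinate method works). Returns (status, hits),
    where each hit is a (chromosome, position, strand) tuple.
    """
    hits = lo.convert_coordinate(chrom, pos)
    if hits is None:
        # Chromosome name not present in the chain file.
        return "unknown_chromosome", []
    if len(hits) == 0:
        # Position has no counterpart in the target assembly.
        return "unmapped", []
    simplified = [(h[0], h[1], h[2]) for h in hits]
    status = "unique" if len(simplified) == 1 else "multiple"
    return status, simplified


# Usage with pyliftover (not executed here):
# from pyliftover import LiftOver
# lo = LiftOver('hg19', 'hg38')
# status, hits = lift_position(lo, 'chr1', 1000000)
```

Positions that come back as "multiple" usually deserve a manual look rather than silently taking the first hit.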
In the process of lifting over, some regions might not get mapped from the old assembly to the new one. There are several reasons why this could happen:
Assembly differences: The two assemblies (source and target) might have different representations of certain genomic regions. For example, some regions might have been rearranged, added, or removed in the newer assembly.
Unmapped regions: Some regions in the older assembly might not have a clear equivalent in the newer assembly, and vice versa. This could happen if, for example, a region is represented differently in the two assemblies due to updates in the understanding of the genome structure.
Quality of mapping: LiftOver uses chain files that represent alignments of large blocks of the genome. If a region in the old assembly does not align well with any region in the new assembly, it might not get mapped.
LiftOver parameters: LiftOver has some parameters (like the minimum ratio of bases that must remap) that can affect which regions get mapped. If you set these parameters to be more stringent, some regions might not get mapped.
When using LiftOver, it's important to check the unmapped regions (the tool usually provides a file with these regions) to see if they are important for your analysis. If they are, you might need to consider other strategies to include them, such as manual inspection or alternative mapping tools.
For example, an entry in the unmapped file looks like this:
#Deleted in new
chr14 483848 483848
The failure messages have the following meanings:
Deleted in new: Sequence intersects no chains
Partially deleted in new: Sequence insufficiently intersects one chain
Split in new: Sequence insufficiently intersects multiple chains
Duplicated in new: Sequence sufficiently intersects multiple chains
Boundary problem: Missing start or end base in an exon
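To check how many regions failed and why, you can tally the reasons in the unmapped file. This is a minimal sketch, assuming the file follows the layout seen in the chr14 entry, where a line starting with "#" gives the failure reason for the BED record on the next line. The function name summarize_unmapped is made up for illustration.

```python
from collections import Counter


def summarize_unmapped(lines):
    """Pair each '#reason' line with the BED record that follows it.

    Returns (records, counts): records is a list of (reason, fields)
    tuples and counts maps each reason to its number of records.
    """
    records = []
    reason = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            reason = line.lstrip("#").strip()
        else:
            records.append((reason, line.split()))
    counts = Counter(r for r, _ in records)
    return records, counts


example = [
    "#Deleted in new",
    "chr14\t483848\t483848",
]
records, counts = summarize_unmapped(example)
print(counts)
```

For a real file, pass the open file handle instead of the example list, for instance with open("unmapped.bed") as fh: summarize_unmapped(fh), where unmapped.bed is a placeholder name.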
https://epd.expasy.org/epd/human/human_database.php?db=human
© 2023 XGenes.com Impressum