Using the latest human data from Ensembl release and the Perl API to convert coordinates on chromosome (e.g chromosome 10 from 25000 to 30000 ) to the same region in GRCh37. Enable the script to be as generic as possible to be run as a command-line program.
I have created a python script specific_cooordinate_conversion.py
which uses Ensembl REST API Endpoint to convert a specific coordinate from GRCh38 to GRCh37 assembly.
Input for the python script is sequence_region_name:start..end:strand(optional)
.
Here, arguments - "Sequence Region Name, Start Position and End Position" are mandatory whereas "Strand" is optional.
For example,
X:1000000..1000100:1
or X:1000000..1000100
Input the arguments as necessary
python3 specific_cooordinate_conversion.py X:1000000..1000100
Output will be of JSON format containing mapping of the the chromosome region in GRCh38 to that of the coordinates in GRCh37
{'mappings':
[
{'original': {'end': 1000100, 'assembly': 'GRCh38', 'strand': 1, 'coord_system': 'chromosome', 'start': 1000000, 'seq_region_name': 'X'},
'mapped': {'end': 960835, 'assembly': 'GRCh37', 'strand': 1, 'coord_system': 'chromosome', 'seq_region_name': 'X', 'start': 960735}
},
{'original': {'assembly': 'GRCh38', 'end': 1000100, 'start': 1000000, 'seq_region_name': 'X', 'strand': 1, 'coord_system': 'chromosome'},
'mapped': {'coord_system': 'chromosome', 'strand': 1, 'seq_region_name': 'HG480_HG481_PATCH', 'start': 960735, 'end': 960835, 'assembly': 'GRCh37'}
}
]
}
I have created a perl script convert_coordinates_GRCh38_GRCh37.pl
which uses Ensembl REST API Endpoint to convert all the chromosome sequence regions from GRCh38 to that of the coordinates in GRCh37 assembly.
This script will convert all the coordinates of the chromosome and hence, there is no specific input.
perl convert_coordinates_GRCh38_GRCh37.pl
Output will be a JSON format file data_out.json
containing all the mappings of the the chromosome regions in GRCh38 to that of the coordinates in GRCh37