
MeToken: Uniform Micro-Environment Token Boosts Post-Translational Modification Prediction

This repository contains the open-source implementation of the paper "MeToken: Uniform Micro-Environment Token Boosts Post-Translational Modification Prediction." The MeToken model leverages both sequence and structural information to accurately predict post-translational modification (PTM) types at specific sites on proteins. By tokenizing the micro-environment of each amino acid, MeToken captures the complex factors influencing PTMs, addressing limitations of sequence-only models and improving prediction performance, especially for rare PTM types.

Table of Contents

  • Introduction
  • Features
  • Installation
  • Usage
  • References
  • Contact
  • License

Introduction

Post-translational modifications (PTMs) are crucial for regulating protein function and interactions. Accurately predicting PTM sites and their types helps understand biological processes and disease mechanisms. Traditional computational approaches mainly focus on sequence motifs for PTM prediction, often neglecting the role of protein structure.

MeToken addresses these limitations by integrating both sequence and structural information into unified tokens that represent the micro-environment of each amino acid. The model leverages a large-scale sequence-structure PTM dataset and uses uniform sub-codebooks to handle the long-tail distribution of PTM types, ensuring robust performance even for rare PTMs.
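
As a rough illustration of the idea (a minimal conceptual sketch in NumPy, not the repository's implementation; all dimensions and codebook sizes below are arbitrary assumptions), each residue's micro-environment embedding is mapped to the nearest entry of a codebook that is split into equally sized sub-codebooks:

    import numpy as np

    # Conceptual sketch of micro-environment tokenization with uniform sub-codebooks.
    # All sizes are illustrative assumptions, not values used by MeToken.
    embed_dim = 64            # dimension of a residue's micro-environment embedding
    num_sub_codebooks = 4     # e.g., one sub-codebook per group of PTM types
    codes_per_sub = 16        # every sub-codebook gets the same (uniform) size

    rng = np.random.default_rng(0)
    # Full codebook: rows 0..15 form sub-codebook 0, rows 16..31 sub-codebook 1, etc.
    codebook = rng.normal(size=(num_sub_codebooks * codes_per_sub, embed_dim))

    def tokenize(micro_env: np.ndarray) -> np.ndarray:
        """Assign each micro-environment embedding to its nearest code (one token per residue)."""
        dists = ((micro_env[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        return dists.argmin(axis=1)

    # Toy example: a protein with 10 residues, each with a random embedding.
    tokens = tokenize(rng.normal(size=(10, embed_dim)))
    print(tokens)             # discrete token ids in [0, num_sub_codebooks * codes_per_sub)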

Features

  • 🚀 Integration of Sequence and Structure: MeToken tokenizes the local micro-environment of amino acids, combining sequence motifs and 3D structural information.
  • ⚡ Support for Multiple PTM Types: The model is designed to predict a wide range of PTM types, including rare modifications.

Installation

  1. Clone the repository:

    git clone https://github.com/A4Bio/MeToken.git
    cd MeToken
  2. Install dependencies:

    conda env create -f environment.yml
    conda activate metoken
  3. Download the pretrained model:

    We provide a pretrained model for MeToken. Download it here and place it in the pretrained_models directory.

Usage

Inference

To perform PTM prediction on a single PDB file, follow these steps:

  1. Run the inference script:

    python inference.py --pdb_file_path examples/Q16613.pdb --predict_indices 31 79 114

  • --pdb_file_path: Path to the input PDB file (e.g., examples/Q16613.pdb).
  • --predict_indices: A list of residue indices for which PTM predictions should be made.
  2. Optional arguments:
  • --checkpoint_path: Path to the model checkpoint (default: pretrained_model/checkpoint.ckpt).
  • --output_json_path: Path to save prediction results in JSON format (default: output/predict.json); a sketch for reading this file back follows below.
  • --output_hdf5_path: Path to save prediction results in HDF5 format (default: output/predict.hdf5).
  3. Example output: The script prints its prediction for each specified position, for example:

    PTM type at position 31 is phosphorylation.
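
The results written to --output_json_path can also be inspected programmatically. The following is a minimal sketch for loading output/predict.json; the file's internal schema is not documented here, so the snippet simply pretty-prints whatever it contains:

    import json

    # Load the JSON predictions written by inference.py.
    with open("output/predict.json") as f:
        predictions = json.load(f)

    # Pretty-print the contents; adapt once the actual schema is known
    # (e.g., iterate over position -> PTM type pairs).
    print(json.dumps(predictions, indent=2))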

Testing

You can evaluate the model using predefined test datasets.

  1. Set the test dataset path in args within quick_test.ipynb. Available test sets (a quick path check is sketched at the end of this section):
  • ./data_test/large_scale_dataset/
  • ./data_test/generalization/PTMint_dataset/
  • ./data_test/generalization/qPTM_dataset/
  2. Run the test notebook:

    jupyter notebook quick_test.ipynb

This will provide performance metrics and model evaluation results.
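
Before launching the notebook, it can help to verify that the test datasets are present at the expected locations. This small sketch only checks the paths listed above:

    from pathlib import Path

    # Predefined test set locations listed above.
    test_sets = [
        "./data_test/large_scale_dataset/",
        "./data_test/generalization/PTMint_dataset/",
        "./data_test/generalization/qPTM_dataset/",
    ]

    for path in test_sets:
        print(f"{path}: {'found' if Path(path).is_dir() else 'missing'}")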

References

For a complete description of the method, see:

TBD

Contact

Please submit any bug reports, feature requests, or general usage feedback as a GitHub issue or discussion.

License

This project is licensed under the MIT License. See the LICENSE file for more details.