esm
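Introduction¶
The ESM family of models focuses on protein sequences. By learning evolutionary information from sequence data, these models can predict protein function and structure, and are suited to protein design and function prediction. The family includes ESM, ESM2, ESM3, ESMFold, and other models.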
ESM2¶
https://github.com/facebookresearch/esm
Model information
Shorthand: Release Notes
ESM-1: Released with Rives et al. 2019 (Aug 2020 update).
ESM-1b: Released with Rives et al. 2019 (Dec 2020 update). See Appendix B.
ESM-MSA-1: Released with Rao et al. 2021 (Preprint v1).
ESM-MSA-1b: Released with Rao et al. 2021 (ICML'21 version, June 2021).
ESM-1v: Released with Meier et al. 2021.
ESM-IF1: Released with Hsu et al. 2022.
ESM-2: Released with Lin et al. 2022.
ESMFold¶
Predicting a protein structure with ESMFold
# Requires a GPU
$ singularity exec --nv /share/Singularity/fair-esm_2.0.0.sif conda run -n py39-esmfold esm-fold -i ghd7.fa -o ghd7
25/06/03 17:18:41 | INFO | root | Reading sequences from ghd7.fa
25/06/03 17:18:41 | INFO | root | Loaded 1 sequences from ghd7.fa
25/06/03 17:18:41 | INFO | root | Loading model
25/06/03 17:19:31 | INFO | root | Starting Predictions
25/06/03 17:19:38 | INFO | root | Predicted structure for ghd7 with length 257, pLDDT 46.9, pTM 0.166 in 7.1s. 1 / 1 completed.
>ghd7
MSMGPAAGEGCGLCGADGGGCCSRHRHDDDGFPFVFPPSACQGIGAPAPPVHEFQFFGNDGGGDDGESVAWLFDDYPPPSPVAAAAGMHHRQPPYDGVVAPPSLFRRNTGAGGLTFDVSLGERPDLDAGLGLGGGGGRHAEAAASATIMSYCGSTFTDAASSMPKEMVAAMADDGESLNPNTVVGAMVEREAKLMRYKEKRKKRCYEKQIRYASRKAYAEMRPRVRGRFAKEPDQEAVAPPSTYVDPSRLELGQWFR
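The log above reports a single mean pLDDT for the whole chain. To see where the confidence comes from, it can help to read the per-residue values back from the predicted PDB file. The following is a minimal sketch, assuming that the B-factor column of the output PDB holds pLDDT on a 0-100 scale and that the result is written to `ghd7/ghd7.pdb`. Both are assumptions about the output layout, and `mean_plddt` is a hypothetical helper, not part of the `esm` package.

```python
from pathlib import Path
from statistics import mean


def mean_plddt(pdb_text: str) -> float:
    """Average the B-factor column over the CA atoms of a PDB string.

    Assumes the B-factor column stores pLDDT (an assumption about the
    ESMFold output, not something stated in the log above).
    """
    scores = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name is 13-16, B-factor is 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no CA atoms found in the PDB text")
    return mean(scores)


if __name__ == "__main__":
    # Hypothetical output path; adjust to wherever the prediction was written.
    pdb_path = Path("ghd7/ghd7.pdb")
    if pdb_path.exists():
        print(f"mean CA pLDDT: {mean_plddt(pdb_path.read_text()):.1f}")
```

The value computed this way averages over CA atoms only, so it may differ slightly from the figure printed in the log.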
References
https://github.com/biochunan/esmfold-docker-image
ESMFold: conda installation, usage, and a brief comparison with AlphaFold
ESM3¶
https://github.com/evolutionaryscale/esm
Install the required packages
pip install --prefix=/public/home/software/opt/bio/software/esm/3.2.0/ esm huggingface_hub
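Download the model
# Download the model through a Hugging Face mirror in China; on first use the model is downloaded to ~/.cache/huggingface/hub/
# esm3-sm-open-v1 is gated: request access on its Hugging Face page, then download with your own key (access token)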
$ export HF_ENDPOINT=https://hf-mirror.com
$ huggingface-cli download EvolutionaryScale/esm3-sm-open-v1
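Before pointing `HF_HOME` at a shared directory, it can be useful to confirm that the download actually landed in the cache. The sketch below relies on the Hugging Face hub cache layout (`hub/models--<org>--<name>/snapshots/<revision>/` under the Hugging Face home directory). `cached_snapshots` is a hypothetical helper written for this page, and it does not account for overrides such as a custom hub cache directory.

```python
import os
from pathlib import Path
from typing import List, Optional


def cached_snapshots(repo_id: str, hf_home: Optional[str] = None) -> List[Path]:
    """List locally cached snapshot directories for a Hugging Face model repo."""
    # Fall back to HF_HOME, then to the default ~/.cache/huggingface location.
    home = Path(hf_home or os.environ.get("HF_HOME", "~/.cache/huggingface")).expanduser()
    # The hub cache stores each model repo as models--<org>--<name>.
    repo_dir = home / "hub" / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return []
    return sorted(p for p in snapshots.iterdir() if p.is_dir())


if __name__ == "__main__":
    found = cached_snapshots("EvolutionaryScale/esm3-sm-open-v1")
    print("cached revisions:", [p.name for p in found] or "none")
```

An empty result suggests the model has not been downloaded into that cache yet, or that it was downloaded under a different home directory.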
Test run
# Use the model that has already been downloaded
$ export HF_HOME=/public/home/software/opt/models/huggingface/
$ module load esm/3.2.0-py3.11
$ python esm3-test.py
Fetching 22 files: 100%|██████████████████████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 43505.27it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:45<00:00, 13.24s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:44<00:00, 13.06s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:48<00:00, 13.55s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:40<00:00, 12.58s/it]
esm3-test.py
from esm.models.esm3 import ESM3
from esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig
# This will download the model weights and instantiate the model on your machine.
# Choose whether to run on GPU or CPU; CPU is used here
model: ESM3InferenceClient = ESM3.from_pretrained("esm3-open").to("cpu") # "cuda" or "cpu"
# Generate a completion for a partial Carbonic Anhydrase (2vvb)
prompt = "___________________________________________________DQATSLRILNNGHAFNVEFDDSQDKAVLKGGPLDGTYRLIQFHFHWGSLDGQGSEHTVDKKKYAAELHLVHWNTKYGDFGKAVQQPDGLAVLGIFLKVGSAKPGLQKVVDVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSLTTPP___________________________________________________________"
protein = ESMProtein(sequence=prompt)
# Generate the sequence, then the structure. This will iteratively unmask the sequence track.
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8, temperature=0.7))
# We can show the predicted structure for the generated sequence.
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
protein.to_pdb("./generation.pdb")
# Then we can do a round trip design by inverse folding the sequence and recomputing the structure
protein.sequence = None
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8))
protein.coordinates = None
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
protein.to_pdb("./round_tripped.pdb")
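The prompt in the script marks every position to be generated with `_` and leaves the known residues in place. A small helper makes it easier to build such prompts from a full sequence and to see how much of the chain the model is asked to fill in. This is only an illustrative sketch: `mask_outside` and `masked_fraction` are hypothetical names, not functions from the `esm` package.

```python
def mask_outside(sequence: str, start: int, end: int, mask: str = "_") -> str:
    """Keep sequence[start:end] and replace every other position with the mask character."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("start/end must satisfy 0 <= start <= end <= len(sequence)")
    return mask * start + sequence[start:end] + mask * (len(sequence) - end)


def masked_fraction(prompt: str, mask: str = "_") -> float:
    """Fraction of positions in the prompt that are still masked."""
    if not prompt:
        raise ValueError("prompt must not be empty")
    return prompt.count(mask) / len(prompt)


if __name__ == "__main__":
    # Toy sequence for illustration only; keep residues 2..4 and mask the rest.
    prompt = mask_outside("ABCDEFGH", 2, 5)
    print(prompt, masked_fraction(prompt))
```

The resulting string can be passed as the `sequence` argument of `ESMProtein`, in the same way as the hand-written prompt in the script.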