
esm

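Introduction

The ESM family of models focuses on protein sequences. By learning evolutionary information, these models can predict protein function and structure, and are suited to protein design and function prediction. The family includes ESM, ESM2, ESM3, and ESMFold, among others.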

ESM2

https://github.com/facebookresearch/esm

Model information
| Shorthand | esm.pretrained. | #layers | #params | Dataset | Embedding Dim | Model URL (automatically downloaded to ~/.cache/torch/hub/checkpoints) |
| --- | --- | --- | --- | --- | --- | --- |
| ESM-2 | esm2_t48_15B_UR50D | 48 | 15B | UR50/D 2021_04 | 5120 | https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t48_15B_UR50D.pt |
| | esm2_t36_3B_UR50D | 36 | 3B | UR50/D 2021_04 | 2560 | https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t36_3B_UR50D.pt |
| | esm2_t33_650M_UR50D | 33 | 650M | UR50/D 2021_04 | 1280 | https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t33_650M_UR50D.pt |
| | esm2_t30_150M_UR50D | 30 | 150M | UR50/D 2021_04 | 640 | https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t30_150M_UR50D.pt |
| | esm2_t12_35M_UR50D | 12 | 35M | UR50/D 2021_04 | 480 | https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t12_35M_UR50D.pt |
| | esm2_t6_8M_UR50D | 6 | 8M | UR50/D 2021_04 | 320 | https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t6_8M_UR50D.pt |
| ESMFold | esmfold_v1 | 48 (+36) | 690M (+3B) | UR50/D 2021_04 | - | https://dl.fbaipublicfiles.com/fair-esm/models/esmfold_3B_v1.pt |
| | esmfold_v0 | 48 (+36) | 690M (+3B) | UR50/D 2021_04 | - | https://dl.fbaipublicfiles.com/fair-esm/models/esmfold_3B_v0.pt |
| | esmfold_structure_module_only_* | 0 (+various) | various | UR50/D 2021_04 | - | https://dl.fbaipublicfiles.com/fair-esm/models/esmfold_structure_module_only_* |
| ESM-IF1 | esm_if1_gvp4_t16_142M_UR50 | 20 | 124M | CATH 4.3 + predicted structures for UR50 | 512 | https://dl.fbaipublicfiles.com/fair-esm/models/esm_if1_gvp4_t16_142M_UR50.pt |
| ESM-1v | esm1v_t33_650M_UR90S_[1-5] | 33 | 650M | UR90/S 2020_03 | 1280 | https://dl.fbaipublicfiles.com/fair-esm/models/esm1v_t33_650M_UR90S_1.pt |
| ESM-MSA-1b | esm_msa1b_t12_100M_UR50S | 12 | 100M | UR50/S + MSA 2018_03 | 768 | https://dl.fbaipublicfiles.com/fair-esm/models/esm_msa1b_t12_100M_UR50S.pt |
| ESM-MSA-1 | esm_msa1_t12_100M_UR50S | 12 | 100M | UR50/S + MSA 2018_03 | 768 | https://dl.fbaipublicfiles.com/fair-esm/models/esm_msa1_t12_100M_UR50S.pt |
| ESM-1b | esm1b_t33_650M_UR50S | 33 | 650M | UR50/S 2018_03 | 1280 | https://dl.fbaipublicfiles.com/fair-esm/models/esm1b_t33_650M_UR50S.pt |
| ESM-1 | esm1_t34_670M_UR50S | 34 | 670M | UR50/S 2018_03 | 1280 | https://dl.fbaipublicfiles.com/fair-esm/models/esm1_t34_670M_UR50S.pt |
| | esm1_t34_670M_UR50D | 34 | 670M | UR50/D 2018_03 | 1280 | https://dl.fbaipublicfiles.com/fair-esm/models/esm1_t34_670M_UR50D.pt |
| | esm1_t34_670M_UR100 | 34 | 670M | UR100 2018_03 | 1280 | https://dl.fbaipublicfiles.com/fair-esm/models/esm1_t34_670M_UR100.pt |
| | esm1_t12_85M_UR50S | 12 | 85M | UR50/S 2018_03 | 768 | https://dl.fbaipublicfiles.com/fair-esm/models/esm1_t12_85M_UR50S.pt |
| | esm1_t6_43M_UR50S | 6 | 43M | UR50/S 2018_03 | 768 | https://dl.fbaipublicfiles.com/fair-esm/models/esm1_t6_43M_UR50S.pt |


| Shorthand | Release Notes |
| --- | --- |
| ESM-1 | Released with Rives et al. 2019 (Aug 2020 update). |
| ESM-1b | Released with Rives et al. 2019 (Dec 2020 update). See Appendix B. |
| ESM-MSA-1 | Released with Rao et al. 2021 (Preprint v1). |
| ESM-MSA-1b | Released with Rao et al. 2021 (ICML'21 version, June 2021). |
| ESM-1v | Released with Meier et al. 2021. |
| ESM-IF1 | Released with Hsu et al. 2022. |
| ESM-2 | Released with Lin et al. 2022. |
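
The ESM-2 entries in the model table follow a regular naming pattern, `esm2_t<layers>_<params>_UR50D`, and each checkpoint URL is the model name plus `.pt` under a common prefix. The sketch below assumes only that pattern as it appears in the table. The helper names (`parse_esm2_name`, `checkpoint_url`, `load_esm2`) are illustrative and not part of the `esm` package. `load_esm2` additionally assumes `fair-esm` is installed and that `esm.pretrained` exposes one loader function per model name, as the `esm.pretrained.` column suggests.

```python
import re
from typing import NamedTuple

# URL prefix copied from the "Model URL" column of the table.
CHECKPOINT_PREFIX = "https://dl.fbaipublicfiles.com/fair-esm/models/"

# Embedding dimensions copied from the ESM-2 rows of the table.
ESM2_EMBED_DIM = {
    "esm2_t48_15B_UR50D": 5120,
    "esm2_t36_3B_UR50D": 2560,
    "esm2_t33_650M_UR50D": 1280,
    "esm2_t30_150M_UR50D": 640,
    "esm2_t12_35M_UR50D": 480,
    "esm2_t6_8M_UR50D": 320,
}


class Esm2Name(NamedTuple):
    layers: int   # number of transformer layers, e.g. 33
    params: str   # parameter count as written in the name, e.g. "650M"
    dataset: str  # training set tag, e.g. "UR50D"


def parse_esm2_name(name: str) -> Esm2Name:
    """Split an ESM-2 model name into its layer count, size, and dataset tag."""
    match = re.fullmatch(r"esm2_t(\d+)_(\d+[MB])_(UR\d+[DS]?)", name)
    if match is None:
        raise ValueError(f"not an ESM-2 model name: {name!r}")
    return Esm2Name(int(match.group(1)), match.group(2), match.group(3))


def checkpoint_url(name: str) -> str:
    """Build the checkpoint URL for an ESM-2 model name (ESM-2 rows only)."""
    parse_esm2_name(name)  # raises ValueError for names outside the pattern
    return f"{CHECKPOINT_PREFIX}{name}.pt"


def load_esm2(name: str):
    """Hypothetical wrapper: load a model by name via esm.pretrained.

    Requires fair-esm; the weights are downloaded on first use.
    """
    import esm  # imported lazily so the helpers above work without fair-esm

    parse_esm2_name(name)
    return getattr(esm.pretrained, name)()  # returns (model, alphabet)
```

The URL helper is deliberately restricted to ESM-2: other rows, such as `esmfold_v1`, use checkpoint file names that differ from the loader name.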

ESMFold

Predicting protein structure with ESMFold

# Requires a GPU
$ singularity exec --nv /share/Singularity/fair-esm_2.0.0.sif conda run -n py39-esmfold esm-fold -i ghd7.fa -o ghd7

25/06/03 17:18:41 | INFO | root | Reading sequences from ghd7.fa
25/06/03 17:18:41 | INFO | root | Loaded 1 sequences from ghd7.fa
25/06/03 17:18:41 | INFO | root | Loading model
25/06/03 17:19:31 | INFO | root | Starting Predictions
25/06/03 17:19:38 | INFO | root | Predicted structure for ghd7 with length 257, pLDDT 46.9, pTM 0.166 in 7.1s. 1 / 1 completed.
Test sequence
>ghd7
MSMGPAAGEGCGLCGADGGGCCSRHRHDDDGFPFVFPPSACQGIGAPAPPVHEFQFFGNDGGGDDGESVAWLFDDYPPPSPVAAAAGMHHRQPPYDGVVAPPSLFRRNTGAGGLTFDVSLGERPDLDAGLGLGGGGGRHAEAAASATIMSYCGSTFTDAASSMPKEMVAAMADDGESLNPNTVVGAMVEREAKLMRYKEKRKKRCYEKQIRYASRKAYAEMRPRVRGRFAKEPDQEAVAPPSTYVDPSRLELGQWFR
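
When many sequences are predicted in one run, it can help to collect the per-sequence confidence scores from the log. The sketch below parses the "Predicted structure for ..." line in the format shown in the log output of this section; the `Prediction` type and `parse_prediction` function are hypothetical names, and the pattern is an assumption based on that single log line rather than on a documented log format.

```python
import re
from typing import NamedTuple, Optional


class Prediction(NamedTuple):
    name: str
    length: int
    plddt: float    # mean pLDDT as reported in the log
    ptm: float      # predicted TM-score as reported in the log
    seconds: float  # prediction time in seconds


# Pattern inferred from the single log line shown in this section.
_PREDICTED = re.compile(
    r"Predicted structure for (?P<name>\S+) with length (?P<length>\d+), "
    r"pLDDT (?P<plddt>[\d.]+), pTM (?P<ptm>[\d.]+) in (?P<seconds>[\d.]+)s"
)


def parse_prediction(line: str) -> Optional[Prediction]:
    """Return the scores from one log line, or None if the line does not match."""
    match = _PREDICTED.search(line)
    if match is None:
        return None
    return Prediction(
        name=match["name"],
        length=int(match["length"]),
        plddt=float(match["plddt"]),
        ptm=float(match["ptm"]),
        seconds=float(match["seconds"]),
    )


line = (
    "25/06/03 17:19:38 | INFO | root | Predicted structure for ghd7 "
    "with length 257, pLDDT 46.9, pTM 0.166 in 7.1s. 1 / 1 completed."
)
result = parse_prediction(line)
print(result)
```

For the ghd7 test sequence this yields a pLDDT of 46.9 and a pTM of 0.166, both low, so the predicted structure should be interpreted with caution.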

References

https://github.com/biochunan/esmfold-docker-image

ESMFold conda安装、使用及与AlphaFold的简单比较 (ESMFold: conda installation, usage, and a brief comparison with AlphaFold)

ESM3

https://github.com/evolutionaryscale/esm

Installing the required packages

pip install --prefix=/public/home/software/opt/bio/software/esm/3.2.0/ esm huggingface_hub

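Downloading the model

# Download through a Hugging Face mirror in mainland China; on first use the model is downloaded to ~/.cache/huggingface/hub/
# esm3-sm-open-v1 is gated: request access on its Hugging Face page, then download with your own access token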
$ export HF_ENDPOINT=https://hf-mirror.com
$ huggingface-cli download EvolutionaryScale/esm3-sm-open-v1
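
The same download can be scripted from Python. This is a minimal sketch, assuming `huggingface_hub` is installed and that access to the gated repository has already been granted; `configure_hf` and `download_model` are hypothetical helper names. The environment variables are set before `huggingface_hub` is imported because the library reads `HF_ENDPOINT` at import time.

```python
import os
from typing import Dict, Optional


def configure_hf(endpoint: str, home: Optional[str] = None) -> Dict[str, str]:
    """Set the Hugging Face environment variables and return what was set."""
    settings = {"HF_ENDPOINT": endpoint}
    if home is not None:
        settings["HF_HOME"] = home  # cache root; models go under <home>/hub
    os.environ.update(settings)
    return settings


def download_model(repo_id: str, token: Optional[str] = None) -> str:
    """Download a full model repository and return its local path."""
    # Imported here so the environment variables above are already in place.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, token=token)


# Example usage (needs network access and an approved access token):
# configure_hf("https://hf-mirror.com")
# path = download_model("EvolutionaryScale/esm3-sm-open-v1",
#                       token=os.environ.get("HF_TOKEN"))
```

Passing `token=None` lets `huggingface_hub` fall back to a token stored by a previous login, if there is one.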

Test run

# Use the already-downloaded model
$ export HF_HOME=/public/home/software/opt/models/huggingface/
$ module load esm/3.2.0-py3.11
$ python esm3-test.py
Fetching 22 files: 100%|██████████████████████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 43505.27it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:45<00:00, 13.24s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:44<00:00, 13.06s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:48<00:00, 13.55s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:40<00:00, 12.58s/it]
esm3-test.py
from esm.models.esm3 import ESM3
from esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig

# This will download the model weights and instantiate the model on your machine.
# Choose whether to run on GPU or CPU; CPU is used here
model: ESM3InferenceClient = ESM3.from_pretrained("esm3-open").to("cpu")  # "cuda" or "cpu"

# Generate a completion for a partial Carbonic Anhydrase (2vvb)
prompt = "___________________________________________________DQATSLRILNNGHAFNVEFDDSQDKAVLKGGPLDGTYRLIQFHFHWGSLDGQGSEHTVDKKKYAAELHLVHWNTKYGDFGKAVQQPDGLAVLGIFLKVGSAKPGLQKVVDVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSLTTPP___________________________________________________________"
protein = ESMProtein(sequence=prompt)
# Generate the sequence, then the structure. This will iteratively unmask the sequence track.
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8, temperature=0.7))
# We can show the predicted structure for the generated sequence.
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
protein.to_pdb("./generation.pdb")
# Then we can do a round trip design by inverse folding the sequence and recomputing the structure
protein.sequence = None
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8))
protein.coordinates = None
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
protein.to_pdb("./round_tripped.pdb")
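
The prompt in esm3-test.py marks unknown positions with `_` and leaves the known residues in place. The helpers below are a plain-Python sketch for building such a prompt and for checking a generated sequence against it; all function names are hypothetical, and the assumption that generation leaves the unmasked residues unchanged is something to verify on your own output rather than a documented guarantee.

```python
MASK = "_"  # mask character used in the prompt of esm3-test.py


def build_prompt(known: str, n_left: int, n_right: int) -> str:
    """Flank a known fragment with masked positions on both sides."""
    return MASK * n_left + known + MASK * n_right


def masked_fraction(prompt: str) -> float:
    """Fraction of prompt positions that the model is asked to fill in."""
    return prompt.count(MASK) / len(prompt)


def keeps_fixed_residues(prompt: str, generated: str) -> bool:
    """True if every unmasked prompt residue is unchanged in the generated sequence."""
    if len(prompt) != len(generated):
        return False
    return all(p == MASK or p == g for p, g in zip(prompt, generated))


def sequence_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


prompt = build_prompt("ACD", n_left=2, n_right=1)
print(prompt)
print(keeps_fixed_residues(prompt, "MKACDE"))
print(sequence_identity("ACDE", "ACDF"))
```

In the round-trip design, `sequence_identity` could be applied to the sequence before `protein.sequence = None` and the sequence recovered by inverse folding, provided a copy of the first sequence is kept.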