
esm

Introduction

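The ESM family of models works on protein sequences. By learning the evolutionary information encoded in those sequences, the models can predict protein function and structure, which makes them useful for protein design and function prediction. The family includes ESM, ESM2, ESM3, ESMFold and others.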

ESM2

https://github.com/facebookresearch/esm

Model information
| Shorthand | esm.pretrained. | #layers | #params | Dataset | Embedding Dim | Model URL (automatically downloaded to ~/.cache/torch/hub/checkpoints) |
|---|---|---|---|---|---|---|
| ESM-2 | esm2_t48_15B_UR50D | 48 | 15B | UR50/D 2021_04 | 5120 | https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t48_15B_UR50D.pt |
| | esm2_t36_3B_UR50D | 36 | 3B | UR50/D 2021_04 | 2560 | https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t36_3B_UR50D.pt |
| | esm2_t33_650M_UR50D | 33 | 650M | UR50/D 2021_04 | 1280 | https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t33_650M_UR50D.pt |
| | esm2_t30_150M_UR50D | 30 | 150M | UR50/D 2021_04 | 640 | https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t30_150M_UR50D.pt |
| | esm2_t12_35M_UR50D | 12 | 35M | UR50/D 2021_04 | 480 | https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t12_35M_UR50D.pt |
| | esm2_t6_8M_UR50D | 6 | 8M | UR50/D 2021_04 | 320 | https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t6_8M_UR50D.pt |
| ESMFold | esmfold_v1 | 48 (+36) | 690M (+3B) | UR50/D 2021_04 | - | https://dl.fbaipublicfiles.com/fair-esm/models/esmfold_3B_v1.pt |
| | esmfold_v0 | 48 (+36) | 690M (+3B) | UR50/D 2021_04 | - | https://dl.fbaipublicfiles.com/fair-esm/models/esmfold_3B_v0.pt |
| | esmfold_structure_module_only_* | 0 (+various) | various | UR50/D 2021_04 | - | https://dl.fbaipublicfiles.com/fair-esm/models/esmfold_structure_module_only_* |
| ESM-IF1 | esm_if1_gvp4_t16_142M_UR50 | 20 | 124M | CATH 4.3 + predicted structures for UR50 | 512 | https://dl.fbaipublicfiles.com/fair-esm/models/esm_if1_gvp4_t16_142M_UR50.pt |
| ESM-1v | esm1v_t33_650M_UR90S_[1-5] | 33 | 650M | UR90/S 2020_03 | 1280 | https://dl.fbaipublicfiles.com/fair-esm/models/esm1v_t33_650M_UR90S_1.pt |
| ESM-MSA-1b | esm_msa1b_t12_100M_UR50S | 12 | 100M | UR50/S + MSA 2018_03 | 768 | https://dl.fbaipublicfiles.com/fair-esm/models/esm_msa1b_t12_100M_UR50S.pt |
| ESM-MSA-1 | esm_msa1_t12_100M_UR50S | 12 | 100M | UR50/S + MSA 2018_03 | 768 | https://dl.fbaipublicfiles.com/fair-esm/models/esm_msa1_t12_100M_UR50S.pt |
| ESM-1b | esm1b_t33_650M_UR50S | 33 | 650M | UR50/S 2018_03 | 1280 | https://dl.fbaipublicfiles.com/fair-esm/models/esm1b_t33_650M_UR50S.pt |
| ESM-1 | esm1_t34_670M_UR50S | 34 | 670M | UR50/S 2018_03 | 1280 | https://dl.fbaipublicfiles.com/fair-esm/models/esm1_t34_670M_UR50S.pt |
| | esm1_t34_670M_UR50D | 34 | 670M | UR50/D 2018_03 | 1280 | https://dl.fbaipublicfiles.com/fair-esm/models/esm1_t34_670M_UR50D.pt |
| | esm1_t34_670M_UR100 | 34 | 670M | UR100 2018_03 | 1280 | https://dl.fbaipublicfiles.com/fair-esm/models/esm1_t34_670M_UR100.pt |
| | esm1_t12_85M_UR50S | 12 | 85M | UR50/S 2018_03 | 768 | https://dl.fbaipublicfiles.com/fair-esm/models/esm1_t12_85M_UR50S.pt |
| | esm1_t6_43M_UR50S | 6 | 43M | UR50/S 2018_03 | 768 | https://dl.fbaipublicfiles.com/fair-esm/models/esm1_t6_43M_UR50S.pt |
| Shorthand | Release Notes |
|---|---|
| ESM-1 | Released with Rives et al. 2019 (Aug 2020 update). |
| ESM-1b | Released with Rives et al. 2019 (Dec 2020 update). See Appendix B. |
| ESM-MSA-1 | Released with Rao et al. 2021 (Preprint v1). |
| ESM-MSA-1b | Released with Rao et al. 2021 (ICML'21 version, June 2021). |
| ESM-1v | Released with Meier et al. 2021. |
| ESM-IF1 | Released with Hsu et al. 2022. |
| ESM-2 | Released with Lin et al. 2022. |
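The shorthand names in the model table follow a visible pattern: model family, t plus a layer count, the parameter count, and the training dataset. For the ESM-2, ESM-1 and ESM-1b rows, the download URL is simply the shorthand with .pt appended. The sketch below is a hypothetical helper (parse_esm_name and checkpoint_url are illustrative names, not part of fair-esm) that decodes a name and builds its URL under that assumption. Treat the parsed numbers as a hint only: the ESM-IF1 name says t16 while the table lists 20 layers, and the ESMFold file names do not match their shorthands.

```python
import re
from dataclasses import dataclass

# Base URL shared by the checkpoint links in the model table
BASE_URL = "https://dl.fbaipublicfiles.com/fair-esm/models"

# Assumed naming pattern, e.g. "esm2_t33_650M_UR50D":
# <family>_t<layers>_<params>_<dataset>
NAME_RE = re.compile(
    r"^(?P<family>[a-z0-9_]+?)_t(?P<layers>\d+)_(?P<params>\d+[MB])_(?P<dataset>\w+)$"
)


@dataclass
class EsmName:
    family: str
    layers: int
    params: str
    dataset: str


def parse_esm_name(name: str) -> EsmName:
    """Split a shorthand name into its parts; raise ValueError if it does not fit."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised ESM model name: {name!r}")
    return EsmName(m["family"], int(m["layers"]), m["params"], m["dataset"])


def checkpoint_url(name: str) -> str:
    """URL of the .pt file; only valid for rows whose file name equals the shorthand."""
    return f"{BASE_URL}/{name}.pt"
```

In practice fair-esm downloads the weights itself: esm.pretrained.esm2_t33_650M_UR50D() returns a (model, alphabet) pair and caches the checkpoint in ~/.cache/torch/hub/checkpoints, so a helper like this is mainly useful for pre-fetching files on a cluster without internet access on the compute nodes.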

ESMFold

Predicting protein structure with ESMFold

# Requires a GPU
$ singularity exec --nv /share/Singularity/fair-esm_2.0.0.sif conda run -n py39-esmfold esm-fold -i ghd7.fa -o ghd7

25/06/03 17:18:41 | INFO | root | Reading sequences from ghd7.fa
25/06/03 17:18:41 | INFO | root | Loaded 1 sequences from ghd7.fa
25/06/03 17:18:41 | INFO | root | Loading model
25/06/03 17:19:31 | INFO | root | Starting Predictions
25/06/03 17:19:38 | INFO | root | Predicted structure for ghd7 with length 257, pLDDT 46.9, pTM 0.166 in 7.1s. 1 / 1 completed.
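When many sequences are folded in one run, the per-sequence summary line in the log above is the quickest place to collect the confidence scores. Below is a minimal sketch that extracts name, length, pLDDT, pTM and runtime from lines in exactly that format. The regular expression is written against the single log line shown here; whether other esm-fold versions phrase the line the same way is an assumption, and parse_prediction / parse_log are hypothetical names.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Matches the summary line esm-fold printed in the run above, e.g.
# "Predicted structure for ghd7 with length 257, pLDDT 46.9, pTM 0.166 in 7.1s."
PRED_RE = re.compile(
    r"Predicted structure for (?P<name>\S+) with length (?P<length>\d+), "
    r"pLDDT (?P<plddt>[\d.]+), pTM (?P<ptm>[\d.]+) in (?P<secs>[\d.]+)s"
)


class Prediction(NamedTuple):
    name: str
    length: int
    plddt: float
    ptm: float
    seconds: float


def parse_prediction(line: str) -> Optional[Prediction]:
    """Return the scores from one summary line, or None for any other log line."""
    m = PRED_RE.search(line)
    if m is None:
        return None
    return Prediction(m["name"], int(m["length"]), float(m["plddt"]),
                      float(m["ptm"]), float(m["secs"]))


def parse_log(lines: Iterable[str]) -> List[Prediction]:
    """Collect every prediction summary found in a log."""
    return [p for p in map(parse_prediction, lines) if p is not None]
```

For example, capture the run with > esmfold.log 2>&1 appended to the esm-fold command, then pass open("esmfold.log") to parse_log and sort the results by pLDDT or pTM.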
Test sequence (ghd7.fa)
>ghd7
MSMGPAAGEGCGLCGADGGGCCSRHRHDDDGFPFVFPPSACQGIGAPAPPVHEFQFFGNDGGGDDGESVAWLFDDYPPPSPVAAAAGMHHRQPPYDGVVAPPSLFRRNTGAGGLTFDVSLGERPDLDAGLGLGGGGGRHAEAAASATIMSYCGSTFTDAASSMPKEMVAAMADDGESLNPNTVVGAMVEREAKLMRYKEKRKKRCYEKQIRYASRKAYAEMRPRVRGRFAKEPDQEAVAPPSTYVDPSRLELGQWFR
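esm-fold takes a FASTA file as input (-i ghd7.fa), and the log reports how many sequences it loaded, so one file can hold several records. A small hypothetical writer (write_fasta is an illustrative name) that produces the input file in the same layout as the test sequence, with one header line and one sequence line per record:

```python
from pathlib import Path
from typing import Mapping


def write_fasta(records: Mapping[str, str], path: str) -> None:
    """Write {record_id: sequence} pairs to a FASTA file, one sequence line per record."""
    lines = []
    for record_id, seq in records.items():
        # Drop stray whitespace/newlines and normalise residues to upper case
        clean = "".join(seq.split()).upper()
        lines.append(f">{record_id}")
        lines.append(clean)
    Path(path).write_text("\n".join(lines) + "\n")
```

Calling write_fasta({"ghd7": seq}, "ghd7.fa"), with seq set to the ghd7 sequence above, recreates the input used in the run.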

References

https://github.com/biochunan/esmfold-docker-image

ESMFold: conda installation, usage, and a brief comparison with AlphaFold

ESM3

https://github.com/evolutionaryscale/esm

Install the required packages

pip install --prefix=/public/home/software/opt/bio/software/esm/3.2.0/ esm huggingface_hub
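Installing with --prefix puts the packages under the given directory instead of the interpreter's default site-packages, so Python will not find them until that directory is on PYTHONPATH. Presumably the esm/3.2.0-py3.11 module loaded in the test step does this, although its modulefile is not shown here. The sketch below computes the directory to add, assuming the standard posix_prefix install scheme; distribution-patched Pythons (for example Debian's posix_local) may lay files out differently. prefix_site_packages is a hypothetical name.

```python
import sysconfig


def prefix_site_packages(prefix: str) -> str:
    """Pure-Python site-packages directory under an install prefix.

    Assumes pip used the standard 'posix_prefix' scheme for --prefix installs.
    """
    return sysconfig.get_path(
        "purelib",
        scheme="posix_prefix",
        vars={"base": prefix, "platbase": prefix},
    )
```

With Python 3.11 and the prefix above this should give /public/home/software/opt/bio/software/esm/3.2.0/lib/python3.11/site-packages. Console scripts such as huggingface-cli are installed to the prefix's bin/ directory, which also needs to be on PATH.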

Download the model

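# Download through hf-mirror.com, a Hugging Face mirror reachable from mainland China; on first use the model is saved to ~/.cache/huggingface/hub/
# esm3-sm-open-v1 is a gated model: request access on its Hugging Face page, then download it with your own access token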
$ export HF_ENDPOINT=https://hf-mirror.com
$ huggingface-cli download EvolutionaryScale/esm3-sm-open-v1

Test run

# Use the pre-downloaded model
$ export HF_HOME=/public/home/software/opt/models/huggingface/
$ module load esm/3.2.0-py3.11
$ python esm3-test.py
Fetching 22 files: 100%|██████████████████████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 43505.27it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:45<00:00, 13.24s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:44<00:00, 13.06s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:48<00:00, 13.55s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:40<00:00, 12.58s/it]
esm3-test.py
from esm.models.esm3 import ESM3
from esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig

# This will download the model weights and instantiate the model on your machine.
# Choose the device: "cuda" for a GPU or "cpu"; the CPU is used here
model: ESM3InferenceClient = ESM3.from_pretrained("esm3-open").to("cpu")  # "cuda" or "cpu"

# Generate a completion for a partial Carbonic Anhydrase (2vvb)
prompt = "___________________________________________________DQATSLRILNNGHAFNVEFDDSQDKAVLKGGPLDGTYRLIQFHFHWGSLDGQGSEHTVDKKKYAAELHLVHWNTKYGDFGKAVQQPDGLAVLGIFLKVGSAKPGLQKVVDVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSLTTPP___________________________________________________________"
protein = ESMProtein(sequence=prompt)
# Generate the sequence, then the structure. This will iteratively unmask the sequence track.
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8, temperature=0.7))
# We can show the predicted structure for the generated sequence.
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
protein.to_pdb("./generation.pdb")
# Then we can do a round trip design by inverse folding the sequence and recomputing the structure
protein.sequence = None
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8))
protein.coordinates = None
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
protein.to_pdb("./round_tripped.pdb")
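In the prompt above, each underscore marks a sequence position for ESM3 to fill in, while the Carbonic Anhydrase (2vvb) fragment in the middle is kept fixed. As far as I can tell the generated sequence keeps the prompt's length, so the number of underscores on each side sets how much sequence is designed around the fragment. Two hypothetical helpers (make_prompt and masked_positions are illustrative, not part of the esm package) for building such a prompt and listing its masked positions:

```python
from typing import List

MASK = "_"  # mask character used in the ESM3 sequence prompt above


def make_prompt(core: str, n_before: int, n_after: int) -> str:
    """Pad a fixed sequence fragment with masked positions on both sides."""
    if n_before < 0 or n_after < 0:
        raise ValueError("mask counts must be non-negative")
    return MASK * n_before + core + MASK * n_after


def masked_positions(prompt: str) -> List[int]:
    """0-based indices of the positions the model is asked to generate."""
    return [i for i, ch in enumerate(prompt) if ch == MASK]
```

A prompt built this way can be passed straight to ESMProtein(sequence=...) in place of the hand-written string.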

ESMFold2

Set up the environment

$ module load Python/3.11.4
$ python3.11 -m venv esmfold2
$ source esmfold2/bin/activate
$ pip install -U pip setuptools wheel
Install dependencies
$ pip install torch==2.6.0 numpy==2.1.3 scipy==1.15.3 pandas==2.2.3 scikit-learn==1.5.2 
$ pip install accelerate attrs boto3 brotli cloudpathlib dna_features_viewer einops huggingface_hub ipywidgets msgpack-numpy py3dmol pydssp pygtrie tenacity zstd rdkit biopython biotite
$ pip install "git+https://github.com/Biohub/transformers.git@main"
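Because the first pip command pins exact versions, it can be worth confirming afterwards that the later installs did not upgrade any of them. A minimal checker using importlib.metadata; check_pins is a hypothetical helper, and stripping a local version suffix such as +cu124 before comparing is an assumption about how CUDA builds may be labeled.

```python
from importlib import metadata
from typing import Callable, Dict

# Versions pinned in the first pip command above
PINS = {
    "torch": "2.6.0",
    "numpy": "2.1.3",
    "scipy": "1.15.3",
    "pandas": "2.2.3",
    "scikit-learn": "1.5.2",
}


def check_pins(pins: Dict[str, str],
               get_version: Callable[[str], str] = metadata.version) -> Dict[str, str]:
    """Return {package: problem} for every pin that is missing or mismatched."""
    problems = {}
    for pkg, wanted in pins.items():
        try:
            found = get_version(pkg)
        except metadata.PackageNotFoundError:
            problems[pkg] = "not installed"
            continue
        # Ignore a local version label such as "+cu124" when comparing
        if found.split("+")[0] != wanted:
            problems[pkg] = f"found {found}, expected {wanted}"
    return problems
```

Inside the activated esmfold2 environment, print(check_pins(PINS)) should print an empty dict if every pin survived.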
Install the inference framework
$ git clone https://github.com/Biohub/esm.git
$ cd esm
# In pyproject.toml, change requires-python = ">=3.12,<3.13" to requires-python = ">=3.11,<3.13" so that pip accepts Python 3.11

# Install
$ pip install --no-deps -e .

# Check that the package can be imported
$ python -c "import esm;print('esm ok')"
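The pyproject.toml edit in the steps above can be scripted so it is repeatable after a fresh git clone. A minimal sketch (relax_requires_python is a hypothetical name); it only loosens the version check that pip enforces, and whether the code itself runs on 3.11 is what the import test above confirms.

```python
import re
from pathlib import Path


def relax_requires_python(pyproject: str, old: str = ">=3.12,<3.13",
                          new: str = ">=3.11,<3.13") -> bool:
    """Rewrite the requires-python constraint in place; return True if it changed."""
    path = Path(pyproject)
    text = path.read_text()
    pattern = re.compile(
        r'^(\s*requires-python\s*=\s*)"' + re.escape(old) + '"', re.MULTILINE
    )
    # Use a function replacement so the new constraint is inserted literally
    new_text, count = pattern.subn(lambda m: m.group(1) + '"' + new + '"', text)
    if count:
        path.write_text(new_text)
    return bool(count)
```

Run it as relax_requires_python("pyproject.toml") from inside the esm checkout before pip install --no-deps -e .; a second call returns False because the file is already patched.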
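Download the models (or use the copies already downloaded on the cluster)
# Download through the hf-mirror.com mirror; models go to ~/.cache/huggingface/hub/ by default, or to $HF_HOME/hub/ if HF_HOME is set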
$ export HF_ENDPOINT=https://hf-mirror.com
$ hf download biohub/ESMFold2 
$ hf download biohub/ESMC-6B
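The test code below hardcodes the snapshot commit hash in MODEL_DIR, which breaks as soon as a newer revision is downloaded. A small sketch that looks the snapshot up instead, assuming the standard Hugging Face cache layout (a refs/main file holding the commit hash, with the files under snapshots/<hash>/) and ignoring the separate HF_HUB_CACHE override; resolve_snapshot is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_snapshot(repo_id: str, hf_home: Optional[str] = None,
                     revision: str = "main") -> Path:
    """Locate a downloaded model snapshot in the Hugging Face cache."""
    home = Path(hf_home or os.environ.get("HF_HOME",
                                          Path.home() / ".cache" / "huggingface"))
    # e.g. biohub/ESMFold2 -> hub/models--biohub--ESMFold2
    repo_dir = home / "hub" / ("models--" + repo_id.replace("/", "--"))
    commit = (repo_dir / "refs" / revision).read_text().strip()
    snapshot = repo_dir / "snapshots" / commit
    if not snapshot.is_dir():
        raise FileNotFoundError(snapshot)
    return snapshot
```

With HF_HOME=/public/home/software/opt/models/huggingface/, MODEL_DIR = str(resolve_snapshot("biohub/ESMFold2")) should point at the same directory as the hardcoded path. If huggingface_hub is installed, snapshot_download("biohub/ESMFold2", local_files_only=True) is an alternative that also returns the local snapshot path.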
Test code (esmfold2_local.py), using the pre-downloaded model
from esm.models.esmfold2 import (
    ESMFold2InputBuilder,
    ProteinInput,
    StructurePredictionInput,
)
from transformers.models.esmfold2.modeling_esmfold2 import ESMFold2Model

MODEL_DIR = "/public/home/software/opt/models/huggingface/hub/models--biohub--ESMFold2/snapshots/e1e189d0f5fb70c2693da2332eca4443c0ccccd6/"

seq = "MSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRILNNGHAFNVEFDD"

model = ESMFold2Model.from_pretrained(
    MODEL_DIR,
    local_files_only=True,
).cuda().eval()

spi = StructurePredictionInput(
    sequences=[
        ProteinInput(id="A", sequence=seq)
    ]
)

result = ESMFold2InputBuilder().fold(
    model,
    spi,
    num_loops=3,
    num_sampling_steps=32,
    num_diffusion_samples=1,
    seed=0,
)

print(f"pLDDT mean: {float(result.plddt.mean()):.3f}")
print(f"pTM: {float(result.ptm):.3f}")
print(f"ipTM: {float(result.iptm):.3f}")

with open("result.cif", "w") as f:
    f.write(result.complex.to_mmcif())
Run the test code
# Set environment variables
$ export HF_HUB_OFFLINE=1
$ export TRANSFORMERS_OFFLINE=1
$ export HF_HOME=/public/home/software/opt/models/huggingface/

# Run
$ python esmfold2_local.py
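If exporting the variables in every shell is inconvenient, the same settings can be applied at the top of esmfold2_local.py. To my understanding huggingface_hub and transformers read these variables when they are imported, so this has to run before the esm and transformers imports. A minimal sketch (force_offline is a hypothetical name) that only fills in values the caller has not already set:

```python
import os
from typing import MutableMapping

# Same values as the exports above
OFFLINE_VARS = {
    "HF_HUB_OFFLINE": "1",
    "TRANSFORMERS_OFFLINE": "1",
    "HF_HOME": "/public/home/software/opt/models/huggingface/",
}


def force_offline(env: MutableMapping[str, str] = os.environ,
                  overrides: MutableMapping[str, str] = OFFLINE_VARS) -> None:
    """Set the offline variables unless they are already defined."""
    for key, value in overrides.items():
        env.setdefault(key, value)
```

Using setdefault keeps any value exported in the shell, so the script still honours a different HF_HOME when one is given.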