esm
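Introduction¶
The ESM family of models focuses on protein sequences. By learning evolutionary information, these models can predict protein function and structure, which makes them suitable for protein design and function prediction. The family includes ESM, ESM2, ESM3, ESMFold and other models.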
ESM2¶
https://github.com/facebookresearch/esm
Model information
| Shorthand | Release Notes |
|---|---|
| ESM-1 | Released with Rives et al. 2019 (Aug 2020 update). |
| ESM-1b | Released with Rives et al. 2019 (Dec 2020 update). See Appendix B. |
| ESM-MSA-1 | Released with Rao et al. 2021 (Preprint v1). |
| ESM-MSA-1b | Released with Rao et al. 2021 (ICML'21 version, June 2021). |
| ESM-1v | Released with Meier et al. 2021. |
| ESM-IF1 | Released with Hsu et al. 2022. |
| ESM-2 | Released with Lin et al. 2022. |
ESMFold¶
Predicting a protein structure with ESMFold
# A GPU is required
$ singularity exec --nv /share/Singularity/fair-esm_2.0.0.sif conda run -n py39-esmfold esm-fold -i ghd7.fa -o ghd7
25/06/03 17:18:41 | INFO | root | Reading sequences from ghd7.fa
25/06/03 17:18:41 | INFO | root | Loaded 1 sequences from ghd7.fa
25/06/03 17:18:41 | INFO | root | Loading model
25/06/03 17:19:31 | INFO | root | Starting Predictions
25/06/03 17:19:38 | INFO | root | Predicted structure for ghd7 with length 257, pLDDT 46.9, pTM 0.166 in 7.1s. 1 / 1 completed.
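The summary line at the end of the log above carries the sequence name, length, mean pLDDT, pTM and runtime. When many sequences are folded in one job, it can be handy to pull those numbers out of the log. The sketch below is a minimal parser; the regular expression is inferred from the single log line shown here, so it is an assumption rather than a documented format, and `parse_summary` is a hypothetical helper name.

```python
import re

# Pattern inferred from the one summary line in the log above (an assumption,
# not a documented log format).
SUMMARY_RE = re.compile(
    r"Predicted structure for (?P<name>\S+) with length (?P<length>\d+), "
    r"pLDDT (?P<plddt>[\d.]+), pTM (?P<ptm>[\d.]+) in (?P<seconds>[\d.]+)s"
)


def parse_summary(line):
    """Return the per-sequence metrics from one log line, or None if absent."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return {
        "name": match["name"],
        "length": int(match["length"]),
        "plddt": float(match["plddt"]),
        "ptm": float(match["ptm"]),
        "seconds": float(match["seconds"]),
    }


line = (
    "25/06/03 17:19:38 | INFO | root | Predicted structure for ghd7 "
    "with length 257, pLDDT 46.9, pTM 0.166 in 7.1s. 1 / 1 completed."
)
print(parse_summary(line))
```

Lines that do not match, such as "Loading model", return `None`, so the function can be applied to every line of a log file and the results filtered.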
>ghd7
MSMGPAAGEGCGLCGADGGGCCSRHRHDDDGFPFVFPPSACQGIGAPAPPVHEFQFFGNDGGGDDGESVAWLFDDYPPPSPVAAAAGMHHRQPPYDGVVAPPSLFRRNTGAGGLTFDVSLGERPDLDAGLGLGGGGGRHAEAAASATIMSYCGSTFTDAASSMPKEMVAAMADDGESLNPNTVVGAMVEREAKLMRYKEKRKKRCYEKQIRYASRKAYAEMRPRVRGRFAKEPDQEAVAPPSTYVDPSRLELGQWFR
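Before submitting a GPU job it may be worth checking the input FASTA for empty records or unexpected characters. The following is a minimal sketch using only the standard library; `read_fasta` and `nonstandard_residues` are hypothetical helper names, and the check against the 20 standard amino acids is a convention chosen for this example, not a requirement stated by ESMFold.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def read_fasta(text):
    """Parse FASTA text into a {record_id: sequence} dict."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the record id.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


def nonstandard_residues(seq):
    """Return the sorted characters in seq outside the 20 standard residues."""
    return sorted(set(seq.upper()) - STANDARD_AA)


# Toy record for illustration only (not the full ghd7 sequence).
example = ">demo toy record\nMSMGPAAGEG\nCGLCGADGGG\n"
for record_id, seq in read_fasta(example).items():
    print(record_id, len(seq), nonstandard_residues(seq))
```

To check a real file, pass `open("ghd7.fa").read()` to `read_fasta` and compare the reported length with the one in the ESMFold log.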
References
https://github.com/biochunan/esmfold-docker-image
ESMFold: conda installation, usage, and a brief comparison with AlphaFold
ESM3¶
https://github.com/evolutionaryscale/esm
Install the required packages
pip install --prefix=/public/home/software/opt/bio/software/esm/3.2.0/ esm huggingface_hub
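Download the model
# Download the model through a Hugging Face mirror hosted in mainland China; on first use the model is downloaded to ~/.cache/huggingface/hub/
# esm3-sm-open-v1 is gated: request access on its Hugging Face page, then download with your own access token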
$ export HF_ENDPOINT=https://hf-mirror.com
$ huggingface-cli download EvolutionaryScale/esm3-sm-open-v1
Test run
# Use the already-downloaded model
$ export HF_HOME=/public/home/software/opt/models/huggingface/
$ module load esm/3.2.0-py3.11
$ python esm3-test.py
Fetching 22 files: 100%|██████████████████████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 43505.27it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:45<00:00, 13.24s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:44<00:00, 13.06s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:48<00:00, 13.55s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [01:40<00:00, 12.58s/it]
esm3-test.py
from esm.models.esm3 import ESM3
from esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig
# This will download the model weights and instantiate the model on your machine.
# Choose whether to run on GPU or CPU; CPU is used here
model: ESM3InferenceClient = ESM3.from_pretrained("esm3-open").to("cpu") # "cuda" or "cpu"
# Generate a completion for a partial Carbonic Anhydrase (2vvb)
prompt = "___________________________________________________DQATSLRILNNGHAFNVEFDDSQDKAVLKGGPLDGTYRLIQFHFHWGSLDGQGSEHTVDKKKYAAELHLVHWNTKYGDFGKAVQQPDGLAVLGIFLKVGSAKPGLQKVVDVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSLTTPP___________________________________________________________"
protein = ESMProtein(sequence=prompt)
# Generate the sequence, then the structure. This will iteratively unmask the sequence track.
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8, temperature=0.7))
# We can show the predicted structure for the generated sequence.
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
protein.to_pdb("./generation.pdb")
# Then we can do a round trip design by inverse folding the sequence and recomputing the structure
protein.sequence = None
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8))
protein.coordinates = None
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
protein.to_pdb("./round_tripped.pdb")
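The prompt in esm3-test.py marks the positions to be generated with `_` and leaves the known residues in place. Writing long runs of underscores by hand is error-prone, so the sketch below builds such a prompt from a core sequence and reports how many positions are masked. `mask_flanks` and `mask_summary` are hypothetical helpers written for this page; they only manipulate strings and do not call the ESM3 API.

```python
MASK = "_"  # mask character used in the sequence prompt of esm3-test.py


def mask_flanks(core, n_left, n_right, mask=MASK):
    """Pad a known core sequence with masked positions on both sides."""
    if n_left < 0 or n_right < 0:
        raise ValueError("mask counts must be non-negative")
    return mask * n_left + core + mask * n_right


def mask_summary(prompt, mask=MASK):
    """Count total and masked positions in a prompt string."""
    masked = prompt.count(mask)
    fraction = masked / len(prompt) if prompt else 0.0
    return {"length": len(prompt), "masked": masked, "fraction_masked": fraction}


# Short illustrative core taken from the start of the prompt in esm3-test.py.
prompt = mask_flanks("DQATSLRILNNGHAFNVEFDD", n_left=5, n_right=4)
print(prompt)
print(mask_summary(prompt))
```

The resulting string can be passed as `ESMProtein(sequence=prompt)` in the same way as the hard-coded prompt.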
ESMFold2¶
Set up the environment
$ module load Python/3.11.4
$ python3.11 -m venv esmfold2
$ source esmfold2/bin/activate
$ pip install -U pip setuptools wheel
$ pip install torch==2.6.0 numpy==2.1.3 scipy==1.15.3 pandas==2.2.3 scikit-learn==1.5.2
$ pip install accelerate attrs boto3 brotli cloudpathlib dna_features_viewer einops huggingface_hub ipywidgets msgpack-numpy py3dmol pydssp pygtrie tenacity zstd rdkit biopython biotite
$ pip install "git+https://github.com/Biohub/transformers.git@main"
$ git clone https://github.com/Biohub/esm.git
$ cd esm
# In pyproject.toml, change requires-python = ">=3.12,<3.13" to requires-python = ">=3.11,<3.13"
# Install
$ pip install --no-deps -e .
# Test that the module imports
$ python -c "import esm;print('esm ok')"
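# Download the models through a Hugging Face mirror hosted in mainland China. The default download path is ~/.cache/huggingface/hub/; set the HF_HOME variable to use a custom location (models are then stored under $HF_HOME/hub/)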
$ export HF_ENDPOINT=https://hf-mirror.com
$ hf download biohub/ESMFold2
$ hf download biohub/ESMC-6B
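The `requires-python` edit in the steps above is done by hand. If the installation is scripted, the same change could be applied programmatically. The sketch below is one possible way, assuming `pyproject.toml` contains exactly one `requires-python = "..."` line as described above; `relax_requires_python` is a hypothetical helper, and the sample text is made up for illustration.

```python
import re

# Matches a line of the form: requires-python = "<spec>"
REQUIRES_RE = re.compile(r'^([ \t]*requires-python[ \t]*=[ \t]*)"[^"]*"', re.MULTILINE)


def relax_requires_python(text, new_spec=">=3.11,<3.13"):
    """Return pyproject.toml text with the requires-python value replaced."""
    new_text, count = REQUIRES_RE.subn(
        lambda m: f'{m.group(1)}"{new_spec}"', text
    )
    if count != 1:
        raise ValueError(f"expected one requires-python line, found {count}")
    return new_text


# Illustrative input; the real file has more fields.
sample = '[project]\nname = "esm"\nrequires-python = ">=3.12,<3.13"\n'
print(relax_requires_python(sample))

# To patch the cloned repository in place (run inside the esm directory):
#   from pathlib import Path
#   path = Path("pyproject.toml")
#   path.write_text(relax_requires_python(path.read_text()))
```

Raising an error when the line is missing or duplicated avoids silently installing with an unmodified file.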
esmfold2_local.py
from esm.models.esmfold2 import (
    ESMFold2InputBuilder,
    ProteinInput,
    StructurePredictionInput,
)
from transformers.models.esmfold2.modeling_esmfold2 import ESMFold2Model
# Local snapshot directory of the downloaded ESMFold2 model
MODEL_DIR = "/public/home/software/opt/models/huggingface/hub/models--biohub--ESMFold2/snapshots/e1e189d0f5fb70c2693da2332eca4443c0ccccd6/"
seq = "MSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRILNNGHAFNVEFDD"
# Load the weights from disk only, move the model to the GPU, and switch to eval mode
model = ESMFold2Model.from_pretrained(
    MODEL_DIR,
    local_files_only=True,
).cuda().eval()
# A single protein chain with id "A"
spi = StructurePredictionInput(
    sequences=[
        ProteinInput(id="A", sequence=seq)
    ]
)
result = ESMFold2InputBuilder().fold(
    model,
    spi,
    num_loops=3,
    num_sampling_steps=32,
    num_diffusion_samples=1,
    seed=0,
)
print(f"pLDDT mean: {float(result.plddt.mean()):.3f}")
print(f"pTM: {float(result.ptm):.3f}")
print(f"ipTM: {float(result.iptm):.3f}")
# Write the predicted structure in mmCIF format
with open("result.cif", "w") as f:
    f.write(result.complex.to_mmcif())
# Set environment variables
$ export HF_HUB_OFFLINE=1
$ export TRANSFORMERS_OFFLINE=1
$ export HF_HOME=/public/home/software/opt/models/huggingface/
# Run
$ python esmfold2_local.py
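The script hard-codes the snapshot directory, including the commit hash, which changes whenever the model is re-downloaded at a newer revision. The Hugging Face cache stores each repository as `models--<org>--<name>/` with a `refs/<revision>` file that names the commit and a `snapshots/<commit>/` directory holding the files. The sketch below uses that layout to look up the path; `hub_cache_dir` and `resolve_snapshot` are hypothetical helpers, and only the `HF_HOME` variable is consulted (other cache overrides are ignored for brevity).

```python
import os
from pathlib import Path


def hub_cache_dir():
    """Return $HF_HOME/hub, falling back to ~/.cache/huggingface/hub."""
    default_home = Path.home() / ".cache" / "huggingface"
    return Path(os.environ.get("HF_HOME", str(default_home))) / "hub"


def resolve_snapshot(repo_id, cache_dir, revision="main"):
    """Locate the local snapshot directory of a cached model repository."""
    repo_dir = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    ref_file = repo_dir / "refs" / revision
    if ref_file.is_file():
        # refs/<revision> holds the commit hash the revision points to.
        commit = ref_file.read_text().strip()
    else:
        # Otherwise treat the given revision as a commit hash.
        commit = revision
    snapshot = repo_dir / "snapshots" / commit
    if not snapshot.is_dir():
        raise FileNotFoundError(f"no cached snapshot at {snapshot}")
    return snapshot


if __name__ == "__main__":
    try:
        print(resolve_snapshot("biohub/ESMFold2", hub_cache_dir()))
    except FileNotFoundError as err:
        print(f"model not cached yet: {err}")
```

With `HF_HOME` exported as above, the returned path could replace the literal `MODEL_DIR` string in esmfold2_local.py via `str(resolve_snapshot("biohub/ESMFold2", hub_cache_dir()))`.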