
open-omics-alphafold

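open-omics-alphafold is Intel's official build of AlphaFold optimized for Intel Xeon CPUs. Inference relies on AVX512 FP32 and AMX-BF16 on 4th Gen Intel Xeon Scalable processors (Sapphire Rapids), and the reported speedup is comparable to an NVIDIA A100.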

Official repository: https://github.com/IntelLabs/open-omics-alphafold

Official benchmark data: Intel Xeon is all you need for AI inference: Performance Leadership on Real World Applications

Installation and configuration

Note

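During installation, code has to be downloaded from GitHub several times. If network problems prevent a download, a GitHub mirror can be used; see the GitHub mirror page for details.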

  1. Install and configure the oneAPI BaseKit; see the oneAPI page for details.

  2. Download the repository

    The installation directory is ~/opt/alphafold/

    $ cd ~/opt/alphafold/
    $ git clone https://github.com/IntelLabs/open-omics-alphafold.git
    
  3. Install dependencies with micromamba

    $ cd ~/opt/alphafold/open-omics-alphafold/
    
    # Because of dependency conflicts during installation, first delete jax==0.4.8 and jaxlib==0.4.7 from conda_requirements.yml
    $ micromamba env create -f conda_requirements.yml
    $ micromamba activate iaf2
    (iaf2) $
    # update submodules
    (iaf2) $ git submodule update --init --recursive
    

  4. Load the oneAPI environment, a recent GCC (GCC >= 9.4.0), and CMake. The versions already installed on the cluster are used here.
    (iaf2) $ source  /public/home/software/opt/bio/software/oneAPI_BaseKit/2024.0.1.46/setvars.sh intel64
    (iaf2) $ source  /public/home/software/opt/bio/software/oneAPI_BaseKit/2024.0.1.46/tbb/latest/env/vars.sh
    (iaf2) $ module load GCC/9.4.0 CMake/3.19.8 
    (iaf2) $ export CC=/public/home/software/opt/bio/software/GCC/9.4.0/bin/gcc
    
  5. Install the AVX-512-optimized hh-suite
    (iaf2) $ cd ~/opt/alphafold/open-omics-alphafold/
    
    (iaf2) $ git clone --recursive https://github.com/IntelLabs/hh-suite.git
    (iaf2) $ cd hh-suite
    (iaf2) $ mkdir build && cd build
    
    # The cluster CPUs are Skylake, so icelake-server is replaced with skylake-avx512 to avoid an error; for other CPU models, substitute the matching -march target name
    (iaf2) $ cmake -DCMAKE_INSTALL_PREFIX=`pwd`/release -DCMAKE_CXX_COMPILER="icpx" -DCMAKE_CXX_FLAGS_RELEASE="-O3 -march=skylake-avx512" ..
    (iaf2) $ make -j 4 && make install
    (iaf2) $ ./release/bin/hhblits -h
    
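    Test the installed hh-suite; ./input//example.fa is the input file. If the command below runs without problems, hh-suite is installed correctly; otherwise it fails at run time with Illegal instruction.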
    (iaf2) $ cd ~/opt/alphafold/open-omics-alphafold/
    (iaf2) $ ./hh-suite/build/release/bin/hhblits -i ./input//example.fa -cpu 20 -oa3m /tmp/tmpthcmjqar_output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /public/home/software/opt/alphafold_data//bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /public/home/software/opt/alphafold_data//uniref30/UniRef30_2021_03
    
  6. Install the AVX-512-optimized hmmer
    (iaf2) $ cd ~/opt/alphafold/open-omics-alphafold/
    (iaf2) $ git clone --recursive https://github.com/IntelLabs/hmmer.git
    (iaf2) $ cd hmmer
    
    # Compared with the official command, `make clean` is omitted here; including it causes an error
    (iaf2) $ cd easel && autoconf && ./configure --prefix=`pwd` && cd ..
    (iaf2) $ autoconf && CC=icx CFLAGS="-O3 -march=icelake-server -fPIC" ./configure --prefix=`pwd`/release
    (iaf2) $ make -j 4 && make install
    (iaf2) $ ./release/bin/jackhmmer -h
    
  7. Install the TPP-optimized AlphaFold2 attention module
    (iaf2) $ cd ~/opt/alphafold/open-omics-alphafold/
    
    (iaf2) $ git clone https://github.com/libxsmm/tpp-pytorch-extension
    (iaf2) $ cd tpp-pytorch-extension
    (iaf2) $ git submodule update --init
    (iaf2) $ python setup.py install
    (iaf2) $ python -c "from tpp_pytorch_extension.alphafold.Alpha_Attention import GatingAttentionOpti_forward"
    
  8. Extract weights
    (iaf2) $ cd ~/opt/alphafold/open-omics-alphafold/
    (iaf2) $ mkdir weights && mkdir weights/extracted
    
    # The AlphaFold2 database was downloaded to /public/home/software/opt/alphafold_data/
    (iaf2) $ python extract_params.py --input /public/home/software/opt/alphafold_data/params/params_model_1.npz --output_dir ./weights/extracted/model_1
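
Once the steps above are finished, it can help to confirm that the expected artifacts exist before moving on to testing. The following is a minimal sketch, not part of the official repository: `check_install` is a hypothetical helper, and the three relative paths are simply the ones produced by steps 5, 6, and 8 above.

```shell
# check_install ROOT: report which artifacts built in the steps above exist under ROOT.
# Hypothetical helper; the relative paths are taken from steps 5, 6 and 8.
check_install() {
    local root missing path
    root=${1%/}          # drop a trailing slash, if any
    missing=0
    for path in \
        hh-suite/build/release/bin/hhblits \
        hmmer/release/bin/jackhmmer \
        weights/extracted/model_1
    do
        if [ -e "$root/$path" ]; then
            echo "ok       $path"
        else
            echo "MISSING  $path"
            missing=1
        fi
    done
    return "$missing"    # 0 if everything was found, 1 otherwise
}

# Example (run after installation):
# check_install ~/opt/alphafold/open-omics-alphafold
```

This only checks that the files are present; whether the binaries run on a given CPU is still verified by the `-h` and hhblits test commands in the steps themselves.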
    

Testing the installation

Create the test directory and test data

(iaf2) $ cd ~/opt/alphafold/open-omics-alphafold/

(iaf2) $ mkdir test
(iaf2) $ mkdir test/{input,output}
# Test data
(iaf2) $ cat test/input/example.fa
>example file
ATGCCGCATGGTCGTC
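
Since preprocessing can run for a long time, a quick format check on the input file before starting may save a failed run. The sketch below is an illustration only: `check_fasta` is a hypothetical helper, and the rules it applies (first line is a `>` header, sequence lines contain letters only, at least one residue) are assumptions about what a well-formed input looks like, not checks taken from open-omics-alphafold itself.

```shell
# check_fasta FILE: succeed only if FILE looks like a well-formed FASTA file.
# Hypothetical helper; a format check only, it does not validate the biology.
check_fasta() {
    awk '
        NR == 1 && !/^>/   { bad = 1 }        # the first line must be a header
        /^>/               { headers++; next }
        /^[[:space:]]*$/   { next }           # ignore blank lines
        !/^[A-Za-z]+$/     { bad = 1 }        # sequence lines: letters only
                           { residues += length($0) }
        END { exit (bad || headers < 1 || residues == 0) ? 1 : 0 }
    ' "$1"
}

# Example:
# check_fasta test/input/example.fa && echo "input looks fine"
```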
Sequence preprocessing (multiple sequence alignment and template search)
(iaf2) $ cd ~/opt/alphafold/open-omics-alphafold/

# Official usage: bash online_preproc_baremetal.sh <root_home> <data-dir> <input-dir> <output-dir>
(iaf2) $ bash online_preproc_baremetal.sh ~/opt/alphafold/open-omics-alphafold/  /public/home/software/opt/alphafold_data/ test/input/ test/output/
Run inference to predict the unrelaxed protein structure
(iaf2) $ cd ~/opt/alphafold/open-omics-alphafold/

# Official usage:
# bash online_inference_baremetal.sh <conda_env_path> <root_home> <data-dir> <input-dir> <output-dir> <model_name>
(iaf2) $ bash online_inference_baremetal.sh ~/micromamba/envs/iaf2 ~/opt/alphafold/open-omics-alphafold/  /public/home/software/opt/alphafold_data/ test/input/ test/output/ model_1
Relax the structure with Amber-Relax
(iaf2) $ cd ~/opt/alphafold/open-omics-alphafold/
# Download the stereo_chemical_props.txt file
(iaf2) $ wget -q -P ./alphafold/common/ https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt --no-check-certificate

# Official usage; it lists one extra argument, <conda_env_path>, which the command actually run below does not pass
# bash one_amber.sh <conda_env_path> <root_home> <data-dir> <input-dir> <output-dir> <model_name>
(iaf2) $ bash one_amber.sh  ~/opt/alphafold/open-omics-alphafold/  /public/home/software/opt/alphafold_data/ test/input/ test/output/ model_1

Running on the cluster

#BSUB -J alphafold
#BSUB -n 4
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q normal

module load micromamba/1.3.0

eval "$(micromamba shell hook --shell=bash)"
micromamba activate iaf2

source  /public/home/software/opt/bio/software/oneAPI_BaseKit/2024.0.1.46/setvars.sh intel64
module load GCC/9.4.0

# Must be run from the open-omics-alphafold installation directory
cd ~/opt/alphafold/open-omics-alphafold/
echo "alphafold start"
date

echo "preproc"

time bash online_preproc_baremetal.sh ~/opt/alphafold/open-omics-alphafold/  /public/home/software/opt/alphafold_data/ test/input/ test/output/

echo "inference"

time bash online_inference_baremetal.sh ~/micromamba/envs/iaf2 ~/opt/alphafold/open-omics-alphafold/  /public/home/software/opt/alphafold_data/ test/input/ test/output/ model_1

echo "amber relax"

time bash one_amber.sh  ~/opt/alphafold/open-omics-alphafold/  /public/home/software/opt/alphafold_data/ test/input/ test/output/ model_1

date
echo "alphafold end"
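
In the job script above the three stages run one after another regardless of whether the previous one succeeded. A fail-fast variant can be sketched as below; `run_stage` is a hypothetical helper introduced here for illustration, not something shipped with open-omics-alphafold.

```shell
# run_stage NAME CMD [ARGS...]: run one pipeline stage, print the elapsed
# wall-clock seconds, and return the command's exit status to the caller.
# Hypothetical helper for illustration.
run_stage() {
    local name start rc
    name=$1; shift
    start=$(date +%s)
    rc=0
    echo "[$name] start"
    "$@" || rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "[$name] done in $(( $(date +%s) - start ))s"
    else
        echo "[$name] FAILED with exit status $rc" >&2
    fi
    return "$rc"
}

# Example use inside the job script (same commands as above, stopping at the first failure):
# run_stage preproc   bash online_preproc_baremetal.sh ~/opt/alphafold/open-omics-alphafold/ /public/home/software/opt/alphafold_data/ test/input/ test/output/ || exit 1
# run_stage inference bash online_inference_baremetal.sh ~/micromamba/envs/iaf2 ~/opt/alphafold/open-omics-alphafold/ /public/home/software/opt/alphafold_data/ test/input/ test/output/ model_1 || exit 1
```

Stopping early avoids spending queue time on inference when the MSA step has already failed.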

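Using the open-omics-alphafold installed on the cluster

Installing open-omics-alphafold is fairly involved and downloading code from GitHub is unreliable, so the version already configured on the cluster can be used directly.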

Warning

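Because of CPU instruction set restrictions, open-omics-alphafold can only run on the normal queue, the high queue, and nodes s001-s004 of the smp queue on this cluster; elsewhere it fails with an error or runs very slowly.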
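
To see whether the node a job landed on exposes the required instructions, the CPU flags reported by Linux can be inspected. This is a minimal sketch under the assumption that the node runs Linux and exposes `/proc/cpuinfo`; `has_cpu_flag` is a hypothetical helper, and `avx512f` and `amx_bf16` are the flag names the Linux kernel uses for AVX-512 Foundation and AMX-BF16.

```shell
# has_cpu_flag FLAG [FLAGS_STRING]: succeed if FLAG is listed among the CPU flags.
# FLAGS_STRING defaults to the first "flags" line of /proc/cpuinfo (Linux only).
# Hypothetical helper for illustration.
has_cpu_flag() {
    local flag flags
    flag=$1
    flags=${2-$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null)}
    case " $flags " in
        *" $flag "*) return 0 ;;
        *)           return 1 ;;
    esac
}

if has_cpu_flag avx512f; then
    echo "AVX-512 is available on $(uname -n)"
else
    echo "no AVX-512 on $(uname -n)"
fi
```

Placing such a check at the top of a job script makes an unsuitable node fail with a clear message rather than with Illegal instruction partway through.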

Preparing the environment

# Working directory for IAF2
$ mkdir IAF2_work
$ export IAF2WORK=${PWD}/IAF2_work/
$ cd IAF2_work

# Directories for IAF2 input and output files
$ mkdir data
$ mkdir data/input
$ mkdir data/output
# example.fa is the input file
$ tree data/
data/
├── input
│   └── example.fa
└── output

$ module load open-omics-alphafold/1.0
$ ln -s ${IAF2ROOT}/iaf2 .
$ ln -s ${IAF2ROOT}/open-omics-alphafold .

Run

#BSUB -J alphafold
#BSUB -n 4
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q normal

cd $IAF2WORK

module load open-omics-alphafold/1.0
module load micromamba/1.3.0

eval "$(micromamba shell hook --shell=bash)"
micromamba activate $IAF2ROOT/iaf2

source  /public/home/software/opt/bio/software/oneAPI_BaseKit/2024.0.1.46/setvars.sh intel64

# Must be run from the open-omics-alphafold installation directory
cd ${IAF2ROOT}/open-omics-alphafold

echo "alphafold start"
date

echo "preproc"

time bash online_preproc_baremetal.sh ${IAF2ROOT}/open-omics-alphafold  /public/home/software/opt/alphafold_data/ $IAF2WORK/data/input/ $IAF2WORK/data/output/

echo "inference"

time bash online_inference_baremetal.sh ${IAF2ROOT}/iaf2 ${IAF2ROOT}/open-omics-alphafold  /public/home/software/opt/alphafold_data/ $IAF2WORK/data/input/ $IAF2WORK/data/output/ model_1

echo "amber relax"

time bash one_amber.sh  ${IAF2ROOT}/open-omics-alphafold  /public/home/software/opt/alphafold_data/ $IAF2WORK/data/input/ $IAF2WORK/data/output/ model_1

date
echo "alphafold end"

Benchmark

The protein sequence is the same one used for the alphafold benchmark, about 250 aa long.

>ghd7
MSMGPAAGEGCGLCGADGGGCCSRHRHDDDGFPFVFPPSACQGIGAPAPPVHEFQFFGNDGGGDDGESVAWLFDDYPPPSPVAAAAGMHHRQPPYDGVVAPPSLFRRNTGAGGLTFDVSLGERPDLDAGLGLGGGGGRHAEAAASATIMSYCGSTFTDAASSMPKEMVAAMADDGESLNPNTVVGAMVEREAKLMRYKEKRKKRCYEKQIRYASRKAYAEMRPRVRGRFAKEPDQEAVAPPSTYVDPSRLELGQWFR
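
The length of a benchmark sequence can be checked directly from its FASTA file. The snippet below is a small sketch; `fasta_length` is a hypothetical helper, and `ghd7.fa` in the usage line is an assumed file name for the sequence above.

```shell
# fasta_length FILE: print the number of residues in FILE,
# ignoring header lines and all whitespace. Hypothetical helper.
fasta_length() {
    grep -v '^>' "$1" | tr -d '[:space:]' | wc -c | tr -d ' '
}

# Example (assumed file name):
# fasta_length ghd7.fa
```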
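| Software | Hardware | Run time (s) | Peak memory (MB) | Speedup |
|---|---|---|---|---|
| alphafold2 | CPU (Intel 6150) | 232305 | 15024 | 1 |
| open-omics-alphafold | CPU (Intel 6150) | 4996 | 21668 | 46 |
| alphafold2 | GPU (P100) | 4523 | 13711 | 51 |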
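
The speedup figures are the CPU alphafold2 run time divided by each run time. As a quick check of that arithmetic, assuming the values are truncated to integers, a hypothetical `speedup` helper reproduces them:

```shell
# speedup BASELINE_SECONDS RUNTIME_SECONDS: print baseline/runtime truncated to an integer.
# Hypothetical helper for illustration.
speedup() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%d\n", base / t }'
}

speedup 232305 4996   # open-omics-alphafold on CPU, prints 46
speedup 232305 4523   # alphafold2 on P100 GPU, prints 51
```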