
Parabricks is a software suite for secondary analysis of next-generation sequencing (NGS) DNA and RNA data. It delivers results quickly and at low cost: Clara Parabricks can analyze 30x whole-genome sequencing (WGS) data from a human sample in about 25 minutes, compared with about 30 hours for other methods. Its output matches that of commonly used software, so the accuracy of the results is straightforward to verify. Parabricks achieves this performance through tight integration with GPUs, which handle data-parallel computation far more efficiently than traditional CPU-based solutions.

Introduction

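Clara Parabricks is NVIDIA's GPU-accelerated toolkit for variant calling. It supports two variant-calling approaches, GATK HaplotypeCaller and DeepVariant, and runs substantially faster than the original tools.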

Starting with v4.0, academic users can use it free of charge.

Website: https://www.nvidia.com/en-us/clara/genomics/

Documentation: https://docs.nvidia.com/clara/parabricks/

Forum: https://forums.developer.nvidia.com/c/healthcare/parabricks/290

Container image: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/containers/clara-parabricks

Supported tools

| Tool | Details |
| --- | --- |
| applybqsr | Apply a BQSR report to a BAM file and generate a new BAM file |
| bam2fq | Convert a BAM file to FASTQ |
| bammetrics | Collect WGS metrics on a BAM file |
| bamsort | Sort a BAM file |
| bqsr | Collect a BQSR report on a BAM file |
| collectmultiplemetrics | Collect multiple classes of metrics on a BAM file |
| dbsnp | Annotate variants based on a dbSNP database |
| deepvariant | Run GPU-DeepVariant for calling germline variants |
| fq2bam | Run bwa mem, coordinate sorting, duplicate marking, and Base Quality Score Recalibration (BQSR) |
| genotypegvcf | Convert a GVCF to VCF |
| haplotypecaller | Run GPU-HaplotypeCaller for calling germline variants |
| indexgvcf | Index a GVCF file |
| mutectcaller | Run GPU-Mutect2 for tumor-normal analysis |
| postpon | Generate the final VCF output of a Mutect panel-of-normals (PON) run |
| prepon | Build an index for a PON file; this is a prerequisite for running Mutect with a PON |
| rna_fq2bam | Run RNA-seq data through the fq2bam pipeline |
| starfusion | Identify candidate fusion transcripts supported by Illumina reads |

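Examples

Basic usage

fq2bam takes FASTQ files as input and outputs a coordinate-sorted BAM file with duplicates marked.

haplotypecaller takes a BAM file as input and outputs a VCF or GVCF file.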

#BSUB -J parabricks
#BSUB -n 5
#BSUB -R span[hosts=1]
#BSUB -gpu "num=1:gmem=12G"
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q gpu
#BSUB -m gpu01

module load Singularity/3.7.3

# fq2bam: bwa mem -> sort -> mark duplicates; outputs a sorted, duplicate-marked BAM
singularity exec --nv $IMAGE/clara-parabricks/4.0.1-1.sif pbrun fq2bam --ref hg38.fa --in-fq ../ERR194146_1.fastq.gz ../ERR194146_2.fastq.gz --out-bam ERR194146.deduped.bam

# haplotypecaller: BAM -> VCF
singularity exec --nv $IMAGE/clara-parabricks/4.0.1-1.sif pbrun haplotypecaller --ref hg38.fa --in-bam ERR194146.deduped.bam --out-variants ERR194146.vcf.gz --tmp-dir pbruntmp --logfile pbrun_ERR194146.log

# For multi-sample (joint) calling, produce one GVCF per sample
# haplotypecaller: BAM -> GVCF
singularity exec --nv $IMAGE/clara-parabricks/4.0.1-1.sif pbrun haplotypecaller --ref hg38.fa --in-bam ERR194146.deduped.bam --out-variants ERR194146.g.vcf.gz --gvcf --tmp-dir pbruntmp --logfile pbrun_ERR194146.log
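For a cohort, the GVCF command above is simply repeated once per sample, and the per-sample GVCFs are merged afterwards. A minimal dry-run sketch, assuming every sample already has a sample.deduped.bam produced by fq2bam; the function name print_gvcf_cmds and the sample IDs sampleA/sampleB are hypothetical. It only prints the commands (with $IMAGE left unexpanded) so they can be reviewed before being pasted into job scripts, and it uses only the flags shown above.

```bash
# Hypothetical helper: print one GPU haplotypecaller --gvcf command per sample.
# Assumes <sample>.deduped.bam already exists for every sample (fq2bam output).
print_gvcf_cmds() {
    local ref=$1; shift
    local sample
    for sample in "$@"; do
        # \$IMAGE is printed literally so it is expanded later, inside the job script
        echo "singularity exec --nv \$IMAGE/clara-parabricks/4.0.1-1.sif pbrun haplotypecaller" \
             "--ref ${ref} --in-bam ${sample}.deduped.bam" \
             "--out-variants ${sample}.g.vcf.gz --gvcf" \
             "--tmp-dir pbruntmp --logfile pbrun_${sample}.log"
    done
}

# Example with two hypothetical sample IDs
print_gvcf_cmds hg38.fa sampleA sampleB
```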

Splitting by chromosome

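For large genomes or high-depth data, haplotypecaller may fail with an Out of memory error because the GPU runs out of memory. In that case, run it separately for each chromosome and merge the results at the end. Using a human sample as an example: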

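# Create a BED file that covers every contig in the reference
$ awk '{print $1"\t0\t"$2}' hg38.fa.fai > hg38_all.bed
# Put each autosome (chr1-chr22) into its own BED file, and all remaining sequences
# (chrX, chrY, chrM, and the small alt/random/unplaced contigs) into a single BED file
$ for i in {1..22}; do awk -v c="chr${i}" '$1 == c' hg38_all.bed > hg38_chr${i}.bed; done
$ awk '$1 !~ /^chr([1-9]|1[0-9]|2[0-2])$/' hg38_all.bed > hg38_other.bed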
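Before submitting the per-chromosome jobs, it is worth checking that the BED files together cover every contig in the .fai exactly once; a contig missing from all of them is silently never called, and one listed twice is called twice. A minimal bash + awk sketch; the function name check_bed_coverage is hypothetical, and it compares contig names (column 1) only, not coordinates.

```bash
# Hypothetical check: every contig in the .fai must appear in exactly one BED line.
# Usage: check_bed_coverage genome.fa.fai part1.bed part2.bed ...
# Prints the offending contigs and returns 1 on any problem, 0 otherwise.
check_bed_coverage() {
    local fai=$1; shift
    awk '
        FNR == NR { want[$1] = 1; next }   # first file: contig names from the .fai
        { seen[$1]++ }                     # remaining files: count contigs in the BEDs
        END {
            bad = 0
            for (c in want) {
                if (!(c in seen))     { print "missing: " c;    bad = 1 }
                else if (seen[c] > 1) { print "duplicated: " c; bad = 1 }
            }
            for (c in seen) if (!(c in want)) { print "unknown: " c; bad = 1 }
            exit bad
        }' "$fai" "$@"
}

# Example with the files created above:
# check_bed_coverage hg38.fa.fai hg38_chr{1..22}.bed hg38_other.bed && echo "BED files OK"
```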
LSF script:

```bash
#BSUB -J parabricks
#BSUB -n 5
#BSUB -R span[hosts=1]
#BSUB -gpu "num=1:gmem=12G"
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q gpu
#BSUB -m gpu01

module load Singularity/3.7.3
module load GATK/4.5.0.0

# Call a GVCF for each interval file (chr1-chr22, then "other"), one after another.
# --gvcf makes each output a GVCF so that CombineGVCFs can merge them below.
for i in chr{1..22} other; do
    singularity exec --nv $IMAGE/clara-parabricks/4.0.1-1.sif pbrun haplotypecaller \
        --interval-file hg38_${i}.bed \
        --ref hg38.fa \
        --in-bam ERR194146.deduped.bam \
        --out-variants ERR194146_${i}.g.vcf.gz \
        --gvcf \
        --tmp-dir pbruntmp \
        --logfile pbrun_ERR194146_${i}.log
done

# Merge all per-interval GVCFs into one GVCF
gatk CombineGVCFs -R hg38.fa $(ls ERR194146_*.g.vcf.gz | xargs -I {} echo "--variant {}") -O ERR194146.g.vcf.gz
```
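Because each interval is called in a separate run, a failed or killed run leaves a missing or empty GVCF, and the merge step would then produce an incomplete genome without complaint. A minimal sketch of a guard to place before the merge; the function name check_gvcf_parts is hypothetical, and it only checks that each expected file exists and is non-empty, not that it is a valid GVCF.

```bash
# Hypothetical guard: verify that every per-interval GVCF exists and is non-empty.
# Usage: check_gvcf_parts <sample> <interval names...>
# Reports problems on stderr and returns 1 if any part is missing or empty.
check_gvcf_parts() {
    local sample=$1; shift
    local i missing=0
    for i in "$@"; do
        if [[ ! -s "${sample}_${i}.g.vcf.gz" ]]; then
            echo "missing or empty: ${sample}_${i}.g.vcf.gz" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: stop the job before merging if any of the 23 parts is absent
# check_gvcf_parts ERR194146 chr{1..22} other || exit 1
```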

Benchmarks

Test data: a 30x human WGS sample. Hardware: a single node with 36 CPU cores and two NVIDIA P100 GPUs.

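| Software | Wall time (h) | Peak memory (GB) | Speedup |
| --- | --- | --- | --- |
| bwa + GATK | 32 | 124 | 1 |
| Sentieon | 4.1 | 32 | 7.8 |
| gtxcat | 3.1 | 103 | 10.3 |
| Parabricks (fq2bam + haplotypecaller) | 2.9 | 93 | 11 |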
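The speedup column is the bwa + GATK wall time (32 h) divided by each tool's wall time. A small awk sketch that recomputes it from the table values and rounds to one decimal place; the function name speedup_table is hypothetical.

```bash
# Recompute speedup = baseline hours / tool hours from the wall times in the table.
# The first row is taken as the baseline.
speedup_table() {
    printf '%s\n' \
        "bwa+GATK 32" \
        "sentieon 4.1" \
        "gtxcat 3.1" \
        "parabricks 2.9" |
    awk 'NR == 1 { base = $2 } { printf "%s %.1f\n", $1, base / $2 }'
}

speedup_table
# bwa+GATK 1.0
# sentieon 7.8
# gtxcat 10.3
# parabricks 11.0
```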

haplotypecaller variant-calling time on different GPU hardware, also using 30x human WGS data:

| Hardware | Time | Peak memory (GB) |
| --- | --- | --- |
| 2 × P100 | 24 min | 32 |
| 1 × RTX 4090 | 19 min | - |
| 2 × RTX 3090 | 21 min | - |

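Best practices

Because the cluster has only a few GPUs, running the whole variant-calling pipeline on GPUs (about 3 h per sample) is not recommended for large population cohorts. Instead, run the alignment (bwa-mem2) and duplicate-marking steps on regular CPU nodes and run only the variant-calling step on the GPU nodes (about 20 min per sample); this raises overall throughput.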
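A rough way to see the benefit is to count how many samples one GPU node could finish in 24 hours under each strategy, using the approximate per-sample times quoted above (about 180 min vs about 20 min). This back-of-the-envelope sketch assumes the GPU node is busy around the clock and ignores queueing, I/O, and the CPU-side alignment time; the function name samples_per_day is hypothetical.

```bash
# Samples per GPU node per 24 h = (24 * 60) / minutes per sample (integer division).
samples_per_day() {
    local minutes_per_sample=$1
    echo $(( 24 * 60 / minutes_per_sample ))
}

echo "full pipeline on GPU: $(samples_per_day 180) samples/day"   # 8
echo "variant calling only: $(samples_per_day 20) samples/day"    # 72
```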

Parabricks has no function for merging GVCFs, so for cohort GVCF data you can use GLnexus to merge them; see glnexus for usage details.
