跳转至

parabricks

Parabricks is a software suite for performing secondary analysis of next generation sequencing (NGS) DNA and RNA data. It delivers results at blazing fast speeds and low cost. Clara Parabricks can analyze 30x WGS (whole human genome) data in about 25 minutes, instead of 30 hours for other methods. Its output matches commonly used software, making it fairly simple to verify the accuracy of the output.Parabricks achieves this performance through tight integration with GPUs, which excel at performing data-parallel computation much more effectively than traditional CPU-based solutions.

介绍

Clara Parabricks是英伟达基于GPU卡开发用于加速序列比对、变异检测等生物信息上游分析流程的工具套件,支持GATK haplotypecaller 和 deepvariant 2 种变异检测方式,相比原版速度有大幅提升。

从v4.0开始,学术机构用户可免费使用。

官网:https://www.nvidia.com/en-us/clara/genomics/

官方文档:https://docs.nvidia.com/clara/parabricks/

官方论坛:https://forums.developer.nvidia.com/c/healthcare/parabricks/290

镜像地址:https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/containers/clara-parabricks

支持的工具组件

ToolDetails
applybqsrApply BQSR report to a BAM file and generate new BAM file
bam2fqConvert a BAM file to FASTQ
bammetricsCollect WGS Metrics on a BAM file
bamsortSort a BAM file
bqsrCollect BQSR report on a BAM file
collectmultiplemetricsCollect multiple classes of metrics on a BAM file
dbsnpAnnotate variants based on a dbSNP
deepvariantRun GPU-DeepVariant for calling germline variants
fq2bamRun bwa mem, coordinate sorting, marking duplicates, and Base Quality Score Recalibration
genotypegvcfConvert a GVCF to VCF
haplotypecallerRun GPU-HaplotypeCaller for calling germline variants
indexgvcfIndex a GVCF file
mutectcallerRun GPU-Mutect2 for tumor-normal analysis
postponGenerate the final VCF output of doing mutect PON
preponBuild an index for a PON file, which is the prerequisite to performing mutect PON
rna_fq2bamRun RNA-seq data through the fq2bam pipeline
starfusionIdentify candidate fusion transcripts supported by Illumina reads

版本说明

4.6.0-1

  • New variant caller pangenome_aware_deepvariant for GPU-accelerated Pangenome-aware DeepVariant variant calling. 支持泛基因组变异检测

  • Performance improvement in giraffe, rna_fq2bam, minimap2, fq2bam, and fq2bam_meth.

  • New features added in mutectcaller: mitochondrial mode, pileup detection and so on.

  • Updated deepvariant and deepsomatic to match with baseline Google version 1.9.

  • Added support for --mode ont and --mode pacbio in deepsomatic.

  • Added WES support in deepsomatic.

4.5.1-1

  • New features added in rna_fq2bam: single-cell mode, quant mode and batch mode. rna_fq2bam 新增单细胞模式

  • --mutect-f1r2-tar-gz support for mutectcaller.

  • Added support for haplotype sampling in giraffe, enabling personalized reference * pangenome creation per sample.

  • vcf.gz output support for all Parabricks variant callers.

  • New options for RNA-seq data in minimap2.

  • Updated base container to CUDA 12.9.

4.5.0-1

  • Support for the following NVIDIA Blackwell GPU architectures: SM_100 and SM_120.

  • As of v4.5 we are dropping support for the Volta (V100) line of GPUs.

  • Splice-aware support and performance improvements for minimap2.

  • Performance improvements for giraffe.

  • Marking duplicates is now on by default for giraffe. Use --no-markdups to turn off marking duplicates.

  • Faster rna_fq2bam with --num-streams-per-gpu.

  • Performance improvements for force-calling mode in haplotypecaller and mutectcaller.

4.4.0-1

  • We are introducing a new tool, the giraffe pangenome graph-based aligner for short reads. 新增泛基因组比对工具 giraffe

  • Unified 'multi-architecture' Docker containers for both Grace Hopper and AMD64 architectures

4.3.2-1

  • New pipeline, ont_germline for long read ONT variant calling. 支持 ont 数据变异检测

  • minimap2 bug fixes and new features.

  • Further optimizations and improvements for the Grace Hopper architecture.

  • Introduced GPU acceleration for CRAM file write operations.

4.3.1-1

  • New tool, deepsomatic for somatic variant calling.

  • The latest Parabricks toolkit (v4.3.1) is now fully supported on Grace Hopper.

  • deepvariant version 1.6.1 updated, new option added.

  • fq2bamfast and fq2bam: Bug fixes and performance improvements. Additional option to monitor approximate CPU utilization and host memory usage during execution (--monitor-usage).

4.3.0-1

  • New tool, fq2bam_meth for accelerated DNA methylation analysis. 支持甲基化数据比对 fq2bam_meth (基于bwa-meth)

  • Germline resource mode and force calling mode are supported in mutectcaller.

  • Support for writing CRAM files using queryname-based sorting has been added into bamsort.

  • Parabricks toolkit (v4.2) is now fully supported on Grace Hopper, see NVIDIA Grace CPU Platforms.

  • deepvariant version 1.6 updated.

4.2.1-1

  • Two new beta tools have been released: fq2bamfast and minimap2. 新增minimap2

  • A new beta pipeline has been released: pacbio_germline. 新增 pacbio 数据变异检测

  • Performance improvements include fq2bamfast, a faster version of the existing fq2bam and the ability to run the variant caller in germline and deepvariant_germline in parallel with BAM generation.

4.2.0-1

  • Several packages have been updated to GATK 4.3. Support for read lengths of up to 500 base pairs has been added to haplotypecaller and mutectcaller.

  • Added markdup, a tool to locate and tag duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

4.1.1-1

4.1.0-1

4.0.0-1

使用举例

Note

群体数据比对和变异检测,为了方便后续做 joint call,建议 使用 --in-fq ../ERR194146_1.fastq.gz ../ERR194146_2.fastq.gz "@RG\tID:ERR194146\tSM:ERR194146\tPL:ILLUMINA\tPU:unit1" 代替 --in-fq ../ERR194146_1.fastq.gz ../ERR194146_2.fastq.gz,即加入样本信息,否则使用 glnexus 时会出现报错 sample already exists;

对于已有的、没有样本信息的 gvcf 文件,可以参考这个文档修改 gvcf 文件,glnexus 错误处理

Note

参考基因组解压后再使用。使用压缩的参考基因组在变异检测时会报错退出([-unknown-:0] Received signal: 11)。

比对

fq2bam

Parabricks fq2bam 可以一行命令完成比对(bwa mem)、排序(gatk SortSam)、去重(gatk MarkDuplicates,可选项,默认开启)并输出排序去重后的 bam 文件。Parabricks fq2bam 利用 GPU 加速软件运行,相比开源版本速度有非常显著的提升,在大多数场景下可以替代 bwa mem

#BSUB -J parabricks
#BSUB -n 16
#BSUB -R span[hosts=1]
#BSUB -gpu "num=1:gmem=11G:j_exclusive=yes"
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q gpu

module load Singularity/3.7.3

# fq2bam, bwa mem->sort->dep, 输出排序去重后的bam文件
singularity exec --nv $IMAGE/clara-parabricks/4.4.0-1.sif   pbrun fq2bam  --low-memory --ref hg38.fa --in-fq ../ERR194146_1.fastq.gz  ../ERR194146_2.fastq.gz "@RG\tID:ERR194146\tSM:ERR194146\tPL:ILLUMINA\tPU:unit1" --out-bam ERR194146.deduped.bam

rna_fq2bam

Parabricks rna_fq2bam 可完成 RNA-Seq 数据的比对排序(STAR)、去重(gatk MarkDuplicates,可选项,默认开启)并输出排序去重后的 bam 文件。

fq2bam_meth

Parabricks fq2bam_meth 可完成 甲基化数据(BS-Seq)的比对(bwa-meth)、排序(gatk SortSam)、去重(gatk MarkDuplicates,可选项,默认开启)并输出排序去重后的 bam 文件。

minimap2 (Beta)

Parabricks minimap2 (Beta) 是一款 GPU 加速版的minimap2,可用于三代数据(pacbio、ont)数据的比对(minimap2)、排序(gatk SortSam)并输出排序后的 bam 文件。

比对统计

bammetrics

collectmultiplemetrics

二代数据变异检测

一步运行

输入 fq.gz 文件,一行命令完成比对(bwa mem)、排序(gatk SortSam)、去重(gatk MarkDuplicates)、变异检测(gatk HaplotypeCaller)全流程。

Note

如果没有其它用途,为了节省存储空间,可以去掉 --out-bam ERR194146.deduped.bam 选项,即不保存比对后的 bam 文件,只保留 gvcf 文件。如果需要保留 bam 文件,建议将比对文件的输出格式指定为 cram,即 --out-bam ERR194146.deduped.cram,可以节省大约 ⅓-½ 的存储空间。

#BSUB -J parabricks
#BSUB -n 16
#BSUB -R span[hosts=1]
#BSUB -gpu "num=1:gmem=16G:j_exclusive=yes"
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q gpu


module load Singularity/3.7.3

# call vcf
# germline, bwa mem->sort->dep->vcf, 输出排序去重后的bam文件、vcf文件
singularity exec --nv  $IMAGE/clara-parabricks/4.0.1-1.sif  pbrun germline --ref hg38.fa --in-fq ../ERR194146_1.fastq.gz  ../ERR194146_2.fastq.gz "@RG\tID:ERR194146\tSM:ERR194146\tPL:ILLUMINA\tPU:unit1" --out-bam ERR194146.deduped.bam --out-variants ERR194146.vcf.gz --tmp-dir pbruntmp --logfile pbrun_ERR194146.log

# 多样本call gvcf
# germline, bwa mem->sort->dep->gvcf, 输出排序去重后的bam文件、gvcf文件
singularity exec --nv  $IMAGE/clara-parabricks/4.0.1-1.sif  pbrun germline --ref hg38.fa --in-fq ../ERR194146_1.fastq.gz  ../ERR194146_2.fastq.gz "@RG\tID:ERR194146\tSM:ERR194146\tPL:ILLUMINA\tPU:unit1" --out-bam ERR194146.deduped.bam --out-variants ERR194146.g.vcf.gz --gvcf --tmp-dir pbruntmp --logfile pbrun_ERR194146.log

分步运行

第一步完成原始数据的比对(bwa mem)、排序(gatk SortSam)、去重(gatk MarkDuplicates),第二步利用第一个生成的 bam 文件完成变异检测(gatk HaplotypeCaller)。

不同版本软件,命令行选项有少量差别。实际测试,4.4.0 全流程速度比 4.0.1 快 10% 左右,4.4.0 消耗的显存更多,需要使用 --low-memory 选项,以降低显存使用,否则容易出现 Out of memory 的显存溢出报错。

fq2bam 输入fq文件、输出排序去重后的bam文件。

haplotypecaller 输入bam文件、输出vcf或gvcf文件。

#BSUB -J parabricks
#BSUB -n 16
#BSUB -R span[hosts=1]
#BSUB -gpu "num=1:gmem=16G:j_exclusive=yes"
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q gpu


module load Singularity/3.7.3
# fq2bam, bwa mem->sort->dep, 输出排序去重后的bam文件
singularity exec --nv $IMAGE/clara-parabricks/4.0.1-1.sif   pbrun fq2bam  --ref hg38.fa --in-fq ../ERR194146_1.fastq.gz  ../ERR194146_2.fastq.gz "@RG\tID:ERR194146\tSM:ERR194146\tPL:ILLUMINA\tPU:unit1"  --out-bam ERR194146.deduped.bam

# haplotypecaller, bam->vcf
singularity exec --nv  $IMAGE/clara-parabricks/4.0.1-1.sif  pbrun haplotypecaller  --ref hg38.fa --in-bam ERR194146.deduped.bam --out-variants ERR194146.vcf.gz --tmp-dir pbruntmp --logfile pbrun_ERR194146.log

# 多样本call gvcf
# haplotypecaller, bam->gvcf
singularity exec --nv  $IMAGE/clara-parabricks/4.0.1-1.sif  pbrun haplotypecaller  --ref hg38.fa --in-bam ERR194146.deduped.bam --out-variants ERR194146.g.vcf.gz --gvcf --tmp-dir pbruntmp --logfile pbrun_ERR194146.log

fq2bam 输入fq文件、输出排序去重后的bam文件。

haplotypecaller 输入bam文件、输出vcf或gvcf文件。

#BSUB -J parabricks
#BSUB -n 16
#BSUB -R span[hosts=1]
#BSUB -gpu "num=1:gmem=11G"
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q gpu


module load Singularity/3.7.3

# fq2bam, bwa mem->sort->dep, 输出排序去重后的bam文件
singularity exec --nv $IMAGE/clara-parabricks/4.4.0-1.sif   pbrun fq2bam  --low-memory --ref hg38.fa --in-fq ../ERR194146_1.fastq.gz  ../ERR194146_2.fastq.gz "@RG\tID:ERR194146\tSM:ERR194146\tPL:ILLUMINA\tPU:unit1"  --out-bam ERR194146.deduped.bam

# haplotypecaller, bam->vcf
singularity exec --nv  $IMAGE/clara-parabricks/4.4.0-1.sif  pbrun haplotypecaller --low-memory --ref hg38.fa --in-bam ERR194146.deduped.bam --out-variants ERR194146.vcf.gz --tmp-dir pbruntmp --logfile pbrun_ERR194146.log

# 多样本call gvcf
# haplotypecaller, bam->gvcf
singularity exec --nv  $IMAGE/clara-parabricks/4.4.0-1.sif  pbrun haplotypecaller --low-memory --ref hg38.fa --in-bam ERR194146.deduped.bam --out-variants ERR194146.g.vcf.gz --gvcf --tmp-dir pbruntmp --logfile pbrun_ERR194146.log

分染色体运行

部分基因组较大或深度较深的数据,运行 pbrun haplotypecaller 时可能会出现显存不够的报错 Out of memory,对这种情况,可以有多种方式解决。

  • 分染色体来跑,最后再合并。以人的样本为例:
# 制作bed文件
$ cat  hg38.fa.fai |awk '{print $1"\t0\t"$2}' > hg38_all.bed
# 将大染色体分别分到单独的bed文件中,零碎的contig分到一个bed文件中
$ for i in {1..22};do cat hg38_all.bed |grep  -w chr${i} > hg38_chr${i}.bed ;done
$ cat hg38_all.bed |grep -e _  -e chrX -e chrY > hg38_other.bed

lsf脚本

#BSUB -J parabricks
#BSUB -n 16
#BSUB -R span[hosts=1]
#BSUB -gpu "num=1:gmem=12G"
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q gpu

module load Singularity/3.7.3
module load GATK/4.5.0.0

# 分染色体运行
# walltime 7151s, mem 20G
for i in {chr{1..22},other};do
singularity exec --nv  $IMAGE/clara-parabricks/4.0.1-1.sif  pbrun haplotypecaller --interval-file hg38_${i}.bed --ref hg38.fa --in-bam ERR194146.deduped.bam  --out-variants ERR194146_${i}.g.vcf.gz --tmp-dir pbruntmp --logfile pbrun_ERR194146_${i}.log
done

# 将各个染色体的vcf合并
# walltime 1442s,mem 1.2G
gatk CombineGVCFs -R hg38.fa $(ls ERR194146*.g.vcf*gz|xargs -i echo "--variant {}") -O ERR194146.g.vcf.gz

deepvariant

使用 google deepvariant 完成变异检测。

#BSUB -J parabricks
#BSUB -n 16
#BSUB -R span[hosts=1]
#BSUB -gpu "num=1:gmem=11G:j_exclusive=yes"
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q gpu


module load Singularity/3.7.3

# fq2bam, bwa mem->sort->dep, 输出排序去重后的bam文件
singularity exec --nv $IMAGE/clara-parabricks/4.0.1-1.sif   pbrun fq2bam  --ref hg38.fa --in-fq ../ERR194146_1.fastq.gz  ../ERR194146_2.fastq.gz "@RG\tID:ERR194146\tSM:ERR194146\tPL:ILLUMINA\tPU:unit1"  --out-bam ERR194146.deduped.bam

# deepvariant, bam->vcf
singularity exec --nv  $IMAGE/clara-parabricks/4.0.1-1.sif pbrun deepvariant --ref hg38.fa --in-bam ERR194146.deduped.bam    --out-variants ERR194146.vcf.gz --tmp-dir pbruntmp --logfile pbrun_ERR194146.log

三代数据变异检测

deepvariant 官方不推荐在多倍体物种上使用 deepvariant 进行变异检测。

Parabricks 三代数据变异检测主要利用 deepvariant 来实现。

pacbio 的数据使用选项 --mode pacbio,ont 的数据使用选项 --mode ont

#BSUB -J parabricks
#BSUB -n 16
#BSUB -R span[hosts=1]
#BSUB -gpu "num=1:gmem=11G:j_exclusive=yes"
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q gpu


module load Singularity/3.7.3

# deepvariant, bam->vcf
singularity exec --nv  $IMAGE/clara-parabricks/4.0.1-1.sif pbrun deepvariant --mode pacbio --ref hg38.fa --in-bam ERR194146.deduped.bam    --out-variants ERR194146.vcf.gz --tmp-dir pbruntmp --logfile pbrun_ERR194146.log

# 多样本call gvcf
# deepvariant, bam->gvcf
singularity exec --nv  $IMAGE/clara-parabricks/4.0.1-1.sif pbrun deepvariant --mode pacbio --gvcf --ref hg38.fa --in-bam ERR194146.deduped.bam    --out-variants ERR194146.g.vcf.gz --tmp-dir pbruntmp --logfile pbrun_ERR194146.log 

批量运行

如果有大量的群体数据需要运行,可以参考以下实现,每个样本一个作业,独立运行,提高计算效率。

运行 sh submit_parabricks.sh 即可实现作业批量提交。

submit_parabricks.sh
#BSUB -J submit_parabricks
#BSUB -n 1
#BSUB -o population.out
#BSUB -e population.err

for sample in /path/to/*.clean.R1.fq.gz;do
        sh run_parabricks.sh ${sample} 
        sleep 10
done
wait
run_parabricks.sh
#!/bin/sh

ref=/path/to/ref.fa
sample=$1

index=$(basename $sample |sed 's/.clean.R1.fq.gz//')
input=$(dirname $sample)
work=/path/to/work/
platform=ILLUMINA
RG="@RG\tID:${index}\tSM:${index}\tPL:${platform}\tPU:unit1"

## fq -> gvcf
bsub -J ${index}.parabricks \
        -n 16 -o ${index}.parabricks.out -e ${index}.parabricks.err \
        -gpu "num=1:gmem=18G" -q gpu \
        <<EOF
            module purge ;\\
            module load Singularity/3.7.3; \\
            singularity exec --nv /share/Singularity/clara-parabricks/4.0.1-1.sif \\
                    pbrun germline  \\
                    --ref $ref \\
                    --in-fq $input/${index}.trim_1.fq.gz $input/${index}.trim_2.fq.gz $RG \\
                    --out-bam $work/parabricks/${index}.deduped.bam \\
                    --out-variants $work/parabricks/${index}.g.vcf.gz \\
                    --gvcf --tmp-dir pbruntmp  && echo "$index success" || echo "$index failed"  
EOF

错误处理

  • gatk CombineGVCFs 出现报错 Key END found in VariantContext field INFO at Chr1:5436 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default

    parabricks 运行时输出了 vcf 文件,没有使用 --gvcf 选项,输出 gvcf文件。

  • Could not run fq2bam as part of germline pipeline

    可能参考基因组异常,更换参考基因组后恢复正常。

性能测试

使用数据为人30x WGS样本数据,单节点36核,2张P100 GPU卡。

不同软件

软件时间(h)最大内存(GB)加速倍数
bwa+GATK321241
sentieon4.1327.8
gtxcat3.110310.3
parabricks(BWA-MEM+haplotypecaller)2.993G11

不同硬件

不同硬件,call变异时间(haplotypecaller),使用数据为人30x WGS样本数据。

硬件时间最大内存(GB)
2张P10024min32G
1张409019min-
2张309021min-

不同版本

一张 40490D

命令

##### 4.0.1 #####
# fq2bam, bwa mem->sort->dep, 输出排序去重后的bam文件
singularity exec --nv $IMAGE/clara-parabricks/4.0.1-1.sif   pbrun fq2bam  --ref hg38.fa --in-fq ../ERR194146_1.fastq.gz    ../ERR194146_2.fastq.gz  --out-bam ERR194146.deduped.bam
# haplotypecaller, bam->gvcf
singularity exec --nv  $IMAGE/clara-parabricks/4.0.1-1.sif  pbrun haplotypecaller  --ref hg38.fa --in-bam ERR194146.deduped.bam --out-variants ERR194146.g.vcf.gz --gvcf --tmp-dir pbruntmp --logfile pbrun_ERR194146.log


##### 4.4.0 #####
# fq2bam, bwa mem->sort->dep, 输出排序去重后的bam文件
singularity exec --nv $IMAGE/clara-parabricks/4.4.0-1.sif   pbrun fq2bam  --low-memory --ref hg38.fa --in-fq ../ERR194146_1.fastq.gz    ../ERR194146_2.fastq.gz  --out-bam ERR194146.deduped.bam
# haplotypecaller, bam->gvcf
singularity exec --nv  $IMAGE/clara-parabricks/4.4.0-1.sif  pbrun haplotypecaller --low-memory --ref hg38.fa --in-bam ERR194146.deduped.bam --out-variants ERR194146.g.vcf.gz --gvcf --tmp-dir pbruntmp --logfile pbrun_ERR194146.log
版本BWA-MEM 时间(s)haplotypecaller 时间(s)总时间(s)
4.0.1530741709477
4.4.0482534108235

使用 germline 一步运行,内存使用差异巨大。

版本时间(s)最大内存(GB)
4.0.1974767
4.4.09454449

最佳实践

鉴于计算平台GPU数量较少,大批量的群体数据,不建议跑call变异的全流程都在GPU上跑运行(约3h),建议前面比对(bwa-mem2)和去重部分在普通节点上进行,最后call变异的步骤在GPU上运行(约20min),可以提高计算通量。

集群 GPU 相对充足,建议全流程都在 GPU 上运行。

由于parabricks没有对群体 gvcf 进行 joint call 的功能,因此对于群体 gvcf 数据,可以使用glnexus 来进行 joint call,具体使用见 glnexus

本文阅读量  次
本站总访问量  次