canu
canu's different stages consume very different amounts of resources during a run, as its scheduler report shows:
-- On LSF detected memory is requested in MB
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl    31 GB    6 CPUs  (k-mer counting)
-- Grid:  hap      16 GB   18 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap  19 GB    4 CPUs  (overlap detection with mhap)
-- Grid:  obtovl   24 GB    6 CPUs  (overlap detection)
-- Grid:  utgovl   24 GB    6 CPUs  (overlap detection)
-- Grid:  cor      24 GB    4 CPUs  (read correction)
-- Grid:  ovb       4 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs      32 GB    1 CPU   (overlap store sorting)
-- Grid:  red      38 GB    6 CPUs  (read error detection)
-- Grid:  oea       8 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat     150 GB   20 CPUs  (contig construction with bogart)
-- Grid:  cns     --- GB    8 CPUs  (consensus)
-- Grid:  gfa      64 GB   20 CPUs  (GFA alignment and processing)
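If a stage's default request does not fit your nodes, the request can also be capped per stage with canu's `<stage>Memory` / `<stage>Threads` options (e.g. `ovsMemory`, `batMemory`, `batThreads`). A sketch only; the values below are illustrative, not recommendations, and the exact option names accepted by your version can be listed with `canu -options`:

```shell
# Cap overlap-store sorting at 16 GB and bogart at 120 GB / 16 CPUs.
# Match these numbers to the memory and core counts of your own nodes.
canu -p project -d project_out genomeSize=2000m \
     ovsMemory=16 batMemory=120 batThreads=16 \
     -pacbio subreads.fq.gz
```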
When canu runs in grid mode, unreasonable resource settings can leave large numbers of jobs suspended, or even hang a node. Based on experience on this cluster, a few useful canu resource settings are described below.

The ovs stage submits many single-core jobs that each need a large amount of memory. Several of them easily land on the same node, quickly exhaust its memory, and get suspended (or hang the node). Using LSF's rusage memory reservation keeps such jobs from piling up on one node. The specific settings:
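canu substitutes the THREADS and MEMORY placeholders in gridEngineResourceOption before submitting each job. As a sketch of what a rusage-based template would expand to for one ovs job (1 CPU, 32 GB; note that LSF here takes memory in MB, per the log line above):

```shell
# Illustrative only: mimic canu's placeholder substitution with sed.
# A rusage-based template reserves memory, so LSF spreads jobs across nodes
# instead of packing many large-memory single-core jobs onto one host.
template='-n THREADS -R rusage[mem=MEMORY]'
threads=1
mem_mb=32768            # 32 GB expressed in MB
expanded=$(echo "$template" | sed "s/THREADS/$threads/; s/MEMORY/$mem_mb/")
echo "bsub $expanded ..."   # -> bsub -n 1 -R rusage[mem=32768] ...
```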
gridEngineResourceOption="-n THREADS " (canu 1.8 and later)
or change the corresponding line in canu's source from
setGlobalIfUndef("gridEngineResourceOption", "-n THREADS -M MEMORY");
to
setGlobalIfUndef("gridEngineResourceOption", " -n THREADS ");
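The source edit can be scripted with sed. A sketch that demonstrates it on a throwaway stand-in copy, so nothing real is modified; in an actual installation, first locate the file that contains the line (e.g. `grep -rl gridEngineResourceOption <canu-dir>`):

```shell
# Create a stand-in copy of the line to be patched.
mkdir -p /tmp/canu-patch-demo
cat > /tmp/canu-patch-demo/Defaults.pm <<'EOF'
setGlobalIfUndef("gridEngineResourceOption", "-n THREADS -M MEMORY");
EOF

# Drop "-M MEMORY" so LSF no longer applies the per-job memory limit.
sed -i 's/"-n THREADS -M MEMORY"/" -n THREADS "/' /tmp/canu-patch-demo/Defaults.pm

cat /tmp/canu-patch-demo/Defaults.pm
# -> setGlobalIfUndef("gridEngineResourceOption", " -n THREADS ");
```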
The canu versions provided by the cluster's module system (v1.7.1, v1.8, v1.9, 2.0, and 2.2) have all been patched this way, so the option above is unnecessary with them. If you run your own canu installation, add the option or patch the code as needed.
The bat stage uses a lot of memory; it can be sent to the high queue with:
gridOptionsBAT="-q high"
The minThreads option makes every job request at least that many cores; with fewer jobs per node, each job has more memory headroom and is less likely to be killed for exceeding it. A complete submission script using the cluster's canu module:

#BSUB -J canu
#BSUB -n 1
#BSUB -R span[hosts=1]
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q normal
module load canu/2.2
canu -p project -d project_out genomeSize=2000m minThreads=6 gridOptionsBAT="-q high" -pacbio subreads.fq.gz
If you run your own (unpatched) canu installation, pass gridEngineResourceOption explicitly:
#BSUB -J canu
#BSUB -n 1
#BSUB -R span[hosts=1]
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q normal
canu -p project -d project_out genomeSize=2000m minThreads=6 gridOptionsBAT="-q high" gridEngineResourceOption="-n THREADS " -pacbio subreads.fq.gz