Bismark
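When running Bismark, request four times as many cores as the thread count you pass to the program. Bismark launches 4 bowtie processes in parallel for the alignment, so requesting fewer cores overloads the node and slows the run down.
For example, --multicore 4 actually runs 16 bowtie threads in total, so the number of cores requested from LSF must be 4 times the value given to --multicore.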
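The arithmetic above can be sketched as a small bash helper that derives the LSF core request from the --multicore value. This is only a sketch: the function name lsf_cores, the genome path, and the read file names are placeholders invented for illustration, and the factor of 4 is the rule of thumb from this note. The command is printed rather than submitted.

```shell
#!/usr/bin/env bash
# Sketch: compute the LSF core request for a Bismark run.
# Assumption (from this note): each --multicore instance needs 4 cores.

lsf_cores() {
    # $1 = value passed to bismark --multicore
    echo $(( $1 * 4 ))
}

MULTICORE=4
CORES=$(lsf_cores "$MULTICORE")

# Placeholder inputs, replace with real paths.
GENOME_DIR="/path/to/genome"
READS_1="sample_R1.fastq.gz"
READS_2="sample_R2.fastq.gz"

# Build the submission command and print it (dry run, nothing is submitted).
# bsub -n requests job slots; span[hosts=1] keeps them on a single node.
CMD="bsub -n ${CORES} -R \"span[hosts=1]\" bismark --multicore ${MULTICORE} --genome ${GENOME_DIR} -1 ${READS_1} -2 ${READS_2}"
echo "${CMD}"
```

Printing the command first makes it easy to check that the value after -n is 4 times the --multicore value before the job is actually submitted.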
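bismark2bedGraph calls the system sort command. On CentOS 7, sort can run multithreaded, but the sort calls in bismark2bedGraph do not pass a multithreading option. You can add --parallel=8 to the code by hand to make sort use 8 threads. Multithreaded sorting uses more memory, so use the --buffer_size option of bismark2bedGraph to limit memory use. The relevant part of the code after the change: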
else{
    my $sort_dir = './'; # there has been a cd into the output_directory already
    # my $sort_dir = $output_dir;
    # if ($sort_dir eq ''){
    #     $sort_dir = './';
    # }
    if ($gazillion){
        if ($in =~ /gz$/){
            open $ifh, "gunzip -c $in | sort --parallel=8 -S $sort_size -T $sort_dir -k3,3V -k4,4n |" or die "Input file could not be sorted. $!\n";
        }
        else{
            open $ifh, "sort --parallel=8 -S $sort_size -T $sort_dir -k3,3V -k4,4n $in |" or die "Input file could not be sorted. $!\n";
        }
        ### Comment by Volker Brendel, Indiana University
        ### "The -k3,3V sort option is critical when the sequence names are numbered scaffolds (without left-buffering of zeros). Omit the V, and things go very wrong in the tallying of reads."
    }
    else{
        ### this sort command was used previously and sorts according to chromosome in addition to position. Since the files are being sorted according to chromosomes anyway,
        ### we may drop the -k3,3V option. It has been reported that this will result in a dramatic speed increase
        if ($in =~ /gz$/){
            open $ifh, "gunzip -c $in | sort --parallel=8 -S $sort_size -T $sort_dir -k4,4n |" or die "Input file could not be sorted. $!\n";
        }
        else{
            open $ifh, "sort --parallel=8 -S $sort_size -T $sort_dir -k4,4n $in |" or die "Input file could not be sorted. $!\n";
        }
    }
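To see what the sort options in the excerpt do, the same keys can be tried on a few toy records outside Bismark. This is a minimal sketch that assumes GNU coreutils sort (the --parallel option needs version 8.6 or later). The three records are made up for illustration, and the thread count and buffer size are kept small on purpose.

```shell
#!/usr/bin/env bash
# Sketch: the sort keys from the patched bismark2bedGraph, applied to toy data.
# Assumes GNU coreutils sort; the records below are invented for illustration.
export LC_ALL=C

tmp=$(mktemp)
# Columns: read id, strand, sequence name, position, methylation call.
printf 'r1\t+\tscaffold10\t5\tZ\n'  >  "$tmp"
printf 'r2\t+\tscaffold2\t30\tZ\n'  >> "$tmp"
printf 'r3\t-\tscaffold2\t4\tz\n'   >> "$tmp"

# -k3,3V     version sort on the sequence name (scaffold2 before scaffold10)
# -k4,4n     numeric sort on the position within each sequence
# -S         size of the in-memory sort buffer
# -T         directory for temporary files
# --parallel number of sort threads
sorted=$(sort --parallel=2 -S 10M -T "${TMPDIR:-/tmp}" -k3,3V -k4,4n "$tmp")
rm -f "$tmp"

echo "$sorted"
```

With these keys the scaffold2 records come first, ordered by position (4, then 30), followed by scaffold10. In the excerpt, the value after -S is $sort_size, which appears to be what --buffer_size sets, so a larger thread count is best paired with an explicit buffer size such as --buffer_size 10G.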