Bismark

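When running Bismark, request four times as many cores as the thread count you set for the program, because Bismark starts 4 bowtie alignments at once for each configured thread. If fewer cores are reserved, the node load climbs too high and the run slows down.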

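For example, with --multicore 4 the run actually executes 16 bowtie alignment threads in total, so the number of cores requested from LSF must be 4 times the value given to --multicore.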
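The 4x rule can be sketched as a small submission helper. This is only an illustration: the genome directory, read file names, and log file name are hypothetical placeholders, and the command is printed rather than submitted so it can be checked first.

```shell
# Sketch: derive the LSF core request from the Bismark --multicore value.
# All paths and file names below are hypothetical placeholders.
MULTICORE=4                  # value passed to bismark --multicore
CORES=$((MULTICORE * 4))     # 4 bowtie threads per --multicore unit, per the rule above

GENOME_DIR=/path/to/genome   # hypothetical
READS_1=sample_R1.fq.gz      # hypothetical
READS_2=sample_R2.fq.gz      # hypothetical

# Build the submission command; span[hosts=1] keeps all slots on one node.
CMD="bsub -n ${CORES} -R 'span[hosts=1]' -o bismark.%J.log \
bismark --genome ${GENOME_DIR} --multicore ${MULTICORE} -1 ${READS_1} -2 ${READS_2}"

# Print instead of submitting, so the request can be reviewed.
echo "${CMD}"
```

Changing MULTICORE in one place keeps the bsub -n request and the --multicore value in step.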

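bismark2bedGraph calls the system sort command. On CentOS 7, sort can run multithreaded, but the sort calls in bismark2bedGraph do not pass a multithreading option. You can add --parallel=8 to the code by hand so that sort uses 8 threads. Because multithreaded sort uses more memory, use the --buffer_size option of bismark2bedGraph to limit memory usage.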
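To see what these sort options do before touching the script, the same kind of call can be tried on a toy file. This is a minimal sketch, assuming GNU sort with --parallel support (coreutils 8.6 or later). The input rows are made up, and BUFFER stands in for the value that --buffer_size is assumed to feed into $sort_size.

```shell
# Sketch: the kind of sort call the patched script issues, run on a toy file.
# Requires GNU sort with --parallel (coreutils 8.6 or later).
WORKDIR=$(mktemp -d)

# Made-up rows: read ID, methylation state (+/-), chromosome, position, call.
printf 'r1\t+\tscaffold10\t5\tZ\nr2\t-\tscaffold2\t30\tz\nr3\t+\tscaffold2\t4\tZ\n' \
    > "${WORKDIR}/calls.txt"

THREADS=2      # the note uses 8; keep this at or below the cores you requested
BUFFER=64M     # stands in for $sort_size (assumed to be set via --buffer_size)

# -k3,3V orders scaffold2 before scaffold10; -k4,4n orders positions numerically.
SORTED=$(sort --parallel="${THREADS}" -S "${BUFFER}" -T "${WORKDIR}" \
    -k3,3V -k4,4n "${WORKDIR}/calls.txt")
printf '%s\n' "${SORTED}"

rm -rf "${WORKDIR}"
```

The -S value caps the in-memory buffer, and -T decides where sort spills temporary files once that buffer is full.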

# Excerpt from bismark2bedGraph with --parallel=8 added by hand to each sort call
else{
    my $sort_dir = './'; # there has been a cd into the output_directory already
    # my $sort_dir = $output_dir;
    # if ($sort_dir eq ''){
    #   $sort_dir = './';
    # }
    if ($gazillion){
        if ($in =~ /gz$/){
            open $ifh, "gunzip -c $in | sort --parallel=8 -S $sort_size -T $sort_dir -k3,3V -k4,4n |" or die "Input file could not be sorted. $!\n";
        }
        else{
            open $ifh, "sort --parallel=8 -S $sort_size -T $sort_dir -k3,3V -k4,4n $in |" or die "Input file could not be sorted. $!\n";
        }
        ### Comment by Volker Brendel, Indiana University
        ### "The -k3,3V sort option is critical when the sequence names are numbered scaffolds (without left-buffering of zeros).  Omit the V, and things go very wrong in the tallying of reads."
    }
    else{
        ### this sort command was used previously and sorts according to chromosome in addition to position. Since the files are being sorted according to chromosomes anyway,
        ### we may drop the -k3,3V option. It has been reported that this will result in a dramatic speed increase
        if ($in =~ /gz$/){
            open $ifh, "gunzip -c $in | sort --parallel=8 -S $sort_size -T $sort_dir -k4,4n |" or die "Input file could not be sorted. $!\n";
        }
        else{
            open $ifh, "sort --parallel=8 -S $sort_size -T $sort_dir -k4,4n $in |" or die "Input file could not be sorted. $!\n";
        }
    }
}
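Instead of editing each line by hand, the same change could be applied with sed. The sketch below works on a small stand-in file, not the real script. It assumes the unmodified lines contain the literal text `sort -S $sort_size`, which should be checked against your installed copy first. The file name and the shortened sample lines are hypothetical.

```shell
# Sketch: insert --parallel=8 into every sort call of a script copy.
# Assumption: the unmodified lines contain the literal text 'sort -S $sort_size'.
WORKDIR=$(mktemp -d)
SCRIPT="${WORKDIR}/bismark2bedGraph.sample"   # hypothetical stand-in for the real script

# Two shortened sample lines in the assumed original form.
cat > "${SCRIPT}" <<'EOF'
open $ifh, "gunzip -c $in | sort -S $sort_size -T $sort_dir -k4,4n |" or die;
open $ifh, "sort -S $sort_size -T $sort_dir -k4,4n $in |" or die;
EOF

# Keep a .bak copy of the original, then rewrite the file in place.
sed -i.bak 's/sort -S \$sort_size/sort --parallel=8 -S $sort_size/' "${SCRIPT}"

# Count how many lines now carry the option.
PATCHED=$(grep -c -- '--parallel=8' "${SCRIPT}")
echo "patched lines: ${PATCHED}"

rm -rf "${WORKDIR}"
```

Running a diff between the .bak copy and the patched file before removing anything is a cheap way to confirm that only the sort calls changed.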