IGV
IGV (Integrative Genomics Viewer) visualizes many kinds of genomic data. It can run locally on Windows or on a server.
https://github.com/igvteam/igv
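Using IGV offline on the cluster¶
IGV fetches data over the network at run time. Because none of the cluster nodes has internet access, this fails with a connection timeout error (java.net.SocketTimeoutException: Connect timed out). The workaround is to point IGV at local data; the example here uses the C. elegans genome ce11.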
$ mkdir ~/igv
# Create prefs.properties and write the following content to it
$ cat > ~/igv/prefs.properties << EOF
IGV.genome.sequence.dir=/public/home/software/opt/bio/software/IGV/2.17.4/genomes.tsv
IGV.Bounds=390,175,1150,800
DEFAULT_GENOME_KEY=ce11
##RNA
##THIRD_GEN
EOF
$ bsub -q interactive -XF -Is bash
Job <19526598> is submitted to queue <interactive>.
<<ssh X11 forwarding job>>
<<Waiting for dispatch ...>>
<<Starting on sg58>>
$ module load IGV/2.17.4
To execute IGV run igv.sh
$ igv.sh
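IGV is a graphical program, so the interactive job above depends on the X11 forwarding requested with -XF. As a minimal sketch, the hypothetical helper check_display below (not part of IGV or LSF) tests whether the DISPLAY variable is set before igv.sh is started. A set DISPLAY does not prove that forwarding works; it only catches the common case where it is missing entirely.

```shell
# Hypothetical helper (not part of IGV or LSF): report whether an X11 display
# is configured in the current shell.
check_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "DISPLAY is set to ${DISPLAY}"
        return 0
    fi
    echo "DISPLAY is not set; X11 forwarding is probably not active" >&2
    return 1
}

# Possible usage on the compute node: only launch IGV when a display is present.
# check_display && igv.sh
```

If the check fails, resubmitting the interactive job from a login session that itself has X11 forwarding enabled is the first thing to try.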
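Custom reference genomes¶
Loading a genome directly through the IGV interface has not worked reliably. Instead, prepare the genome files yourself, describe them in a JSON file, and load that file in IGV.
Arabidopsis thaliana¶
With this setup, IGV loads the Arabidopsis genome automatically at startup.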
$ mkdir -p ~/igv/tair10/data/
$ cd ~/igv/tair10/data/
# Download the data: the genome FASTA file and the GFF3 annotation file
$ wget https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-58/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz
$ wget https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-58/gff3/arabidopsis_thaliana/Arabidopsis_thaliana.TAIR10.58.gff3.gz
$ gzip -d *gz
$ module load BEDTools/2.27 SAMtools/1.17 HTSlib/1.18
# Index the FASTA file
$ samtools faidx Arabidopsis_thaliana.TAIR10.dna.toplevel.fa
# Sort, compress, and index the GFF3 file
$ bedtools sort -i Arabidopsis_thaliana.TAIR10.58.gff3 > Arabidopsis_thaliana.TAIR10.58_srt.gff3
$ bgzip Arabidopsis_thaliana.TAIR10.58_srt.gff3
$ tabix Arabidopsis_thaliana.TAIR10.58_srt.gff3.gz
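tabix expects a position-sorted, bgzip-compressed file, which is why the GFF3 is sorted before compression. Where BEDTools is not available, a plain sort in the spirit of the example in the tabix manual is one possible substitute. The sketch below shows that swapped-in technique, not the method used above; sort_gff3 is a hypothetical helper name.

```shell
# Hypothetical helper: print a GFF3 file with all "#" lines first, followed by
# the feature lines sorted by sequence name (column 1) and start (column 4).
sort_gff3() {
    local gff=$1
    local tab
    tab=$(printf '\t')
    # Header and directive lines; "|| true" tolerates files without any.
    grep '^#' "$gff" || true
    # Feature lines, sorted bytewise by sequence name, then numerically by start.
    grep -v '^#' "$gff" | LC_ALL=C sort -t "$tab" -k1,1 -k4,4n
}

# Possible usage (file names are placeholders):
# sort_gff3 annotation.gff3 > annotation_srt.gff3
```

Note that this moves every line starting with "#" to the top, including the "###" separators found in Ensembl GFF3 files. The output still needs bgzip and tabix as in the steps above.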
# Write the JSON file (replace /public/home/user with the absolute path of your own home directory)
$ cd ~/igv/tair10/
$ cat > tair10.json << EOF
{
"id": "tair10",
"name": "A. thaliana (TAIR 10)",
"fastaURL": "/public/home/user/igv/tair10/data/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa",
"indexURL": "/public/home/user/igv/tair10/data/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.fai",
"tracks": [
{
"name": "Genes",
"format": "gff3",
"url": "/public/home/user/igv/tair10/data/Arabidopsis_thaliana.TAIR10.58_srt.gff3.gz",
"indexed": true,
"removable": false,
"order": 1000000,
"searchable": true
}
]
}
EOF
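The JSON above hard-codes absolute paths under /public/home/user, so each user has to edit them by hand. As a rough sketch, the hypothetical helper write_genome_json below prints the same JSON layout from arguments, so the paths can be built from $HOME. The field names are copied from the file above; the helper itself is an illustration, not an IGV tool.

```shell
# Hypothetical helper: print an IGV genome JSON with the same fields as the
# hand-written file above. Arguments: genome ID, display name, absolute path of
# the FASTA file, absolute path of the bgzip-compressed GFF3 file.
write_genome_json() {
    local id=$1 name=$2 fasta=$3 gff=$4
    cat << EOF
{
  "id": "${id}",
  "name": "${name}",
  "fastaURL": "${fasta}",
  "indexURL": "${fasta}.fai",
  "tracks": [
    {
      "name": "Genes",
      "format": "gff3",
      "url": "${gff}",
      "indexed": true,
      "removable": false,
      "order": 1000000,
      "searchable": true
    }
  ]
}
EOF
}

# Possible usage for the Arabidopsis files prepared above:
# data="$HOME/igv/tair10/data"
# write_genome_json tair10 "A. thaliana (TAIR 10)" \
#     "$data/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa" \
#     "$data/Arabidopsis_thaliana.TAIR10.58_srt.gff3.gz" > "$HOME/igv/tair10/tair10.json"
```

The helper assumes that the FASTA index sits next to the FASTA file with a .fai suffix, which is where samtools faidx writes it.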
# Create genomes.tsv. The 3 columns must be separated by tabs, not spaces; otherwise IGV reports an error at startup
$ cat > ~/igv/genomes.tsv << EOF
# Hosted genomes list. Tab delimited file with 3 columns Name (for menu), url to .genome or .json file, and genome ID.
A. thaliana (TAIR 10)	/public/home/user/igv/tair10/tair10.json	tair10
EOF
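Since tabs and spaces look identical in a terminal, it can help to check the column separators before starting IGV. The following is a minimal sketch, assuming only a POSIX awk; check_genomes_tsv is a hypothetical helper name, not an IGV command.

```shell
# Hypothetical helper: succeed only if every non-comment, non-empty line of the
# given file has exactly three tab-separated columns. Offending lines are reported.
check_genomes_tsv() {
    awk -F'\t' '
        !/^#/ && NF > 0 && NF != 3 {
            bad = 1
            printf "line %d: %d tab-separated column(s), expected 3\n", NR, NF
        }
        END { exit bad + 0 }
    ' "$1"
}

# Possible usage:
# check_genomes_tsv ~/igv/genomes.tsv && echo "genomes.tsv looks fine"
```

A row typed with spaces instead of tabs is seen as a single column and is therefore reported. Writing rows with printf '%s\t%s\t%s\n' instead of typing them is one way to avoid the problem altogether.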
# Update ~/igv/prefs.properties
$ cat > ~/igv/prefs.properties << EOF
IGV.genome.sequence.dir=/public/home/user/igv/genomes.tsv
IGV.Bounds=2166,231,1150,800
DEFAULT_GENOME_KEY=tair10
##RNA
##THIRD_GEN
EOF
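If IGV does not show the custom genome, a first thing to check is whether the path stored in IGV.genome.sequence.dir points at an existing file. The sketch below reads a key from prefs.properties with awk; get_pref is a hypothetical helper, and it assumes simple key=value lines as in the file above.

```shell
# Hypothetical helper: print the value of one key from a key=value properties
# file. Only the first "=" separates key and value, so values may contain "=".
get_pref() {
    local file=$1 key=$2
    awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$file"
}

# Possible usage:
# tsv=$(get_pref ~/igv/prefs.properties IGV.genome.sequence.dir)
# [ -f "$tsv" ] || echo "genome list not found: $tsv" >&2
```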
Rice¶
Load the rice reference genome and view an aligned BAM file.
$ mkdir -p ~/igv/Oryza/data/
$ cd ~/igv/Oryza/data/
# Download the reference genome and the annotation file (either GFF3 or GTF works)
$ wget https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/fasta/oryza_sativa/dna/Oryza_sativa.IRGSP-1.0.dna.toplevel.fa.gz
$ wget https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/gff3/oryza_sativa/Oryza_sativa.IRGSP-1.0.59.gff3.gz
$ gzip -d *gz
$ module load BEDTools/2.27 SAMtools/1.17 HTSlib/1.18
# Index the FASTA file
$ samtools faidx Oryza_sativa.IRGSP-1.0.dna.toplevel.fa
# Sort, compress, and index the GFF3 file
$ bedtools sort -i Oryza_sativa.IRGSP-1.0.59.gff3 > Oryza_sativa.IRGSP-1.0.59_srt.gff3
$ bgzip Oryza_sativa.IRGSP-1.0.59_srt.gff3
$ tabix Oryza_sativa.IRGSP-1.0.59_srt.gff3.gz
# Write the JSON file (replace /public/home/user with the absolute path of your own home directory)
$ cd ~/igv/Oryza/
$ cat > Oryza.json << EOF
{
"id": "Oryza",
"name": "Oryza (IRGSP-1.0)",
"fastaURL": "/public/home/user/igv/Oryza/data/Oryza_sativa.IRGSP-1.0.dna.toplevel.fa",
"indexURL": "/public/home/user/igv/Oryza/data/Oryza_sativa.IRGSP-1.0.dna.toplevel.fa.fai",
"tracks": [
{
"name": "Genes",
"format": "gff3",
"url": "/public/home/user/igv/Oryza/data/Oryza_sativa.IRGSP-1.0.59_srt.gff3.gz",
"indexed": true,
"removable": false,
"order": 1000000,
"searchable": true
}
]
}
EOF
# Write genomes.tsv. The 3 columns must be separated by tabs, not spaces; otherwise IGV reports an error at startup
# This file can list multiple genomes
$ cat > ~/igv/genomes.tsv << EOF
# Hosted genomes list. Tab delimited file with 3 columns Name (for menu), url to .genome or .json file, and genome ID.
A. thaliana (TAIR 10)	/public/home/user/igv/tair10/tair10.json	tair10
Oryza (IRGSP-1.0)	/public/home/user/igv/Oryza/Oryza.json	Oryza
EOF
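The heredoc above rewrites genomes.tsv from scratch each time a genome is added. As an alternative sketch, the hypothetical helper append_genome below adds one tab-separated row to an existing list and skips the write when the genome ID is already present. It is an illustration only and assumes the three-column layout described in the file header.

```shell
# Hypothetical helper: append "name<TAB>json path<TAB>genome ID" to a genome
# list, unless a row with the same ID (column 3) already exists.
append_genome() {
    local tsv=$1 name=$2 json=$3 id=$4
    if [ -f "$tsv" ] && awk -F'\t' -v id="$id" '$3 == id { found = 1 } END { exit !found }' "$tsv"; then
        echo "genome ID '$id' is already listed in $tsv" >&2
        return 0
    fi
    # printf emits real tab characters, so the separators cannot end up as spaces.
    printf '%s\t%s\t%s\n' "$name" "$json" "$id" >> "$tsv"
}

# Possible usage:
# append_genome ~/igv/genomes.tsv "Oryza (IRGSP-1.0)" "$HOME/igv/Oryza/Oryza.json" Oryza
```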
# Update ~/igv/prefs.properties
$ cat > ~/igv/prefs.properties << EOF
IGV.genome.sequence.dir=/public/home/user/igv/genomes.tsv
IGV.Bounds=2166,231,1150,800
##RNA
##THIRD_GEN
EOF
# Prepare the BAM file: sort it and build an index
$ samtools sort -@20 sample.bam > sample_srt.bam
$ samtools index -@20 sample_srt.bam
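IGV needs the index file next to the sorted BAM, and an index left over from an earlier version of the BAM can cause confusing behavior. The sketch below checks that an index exists and is not older than the BAM. bam_index_ok is a hypothetical helper; it only looks for the .bai and .csi files that samtools index writes beside the BAM.

```shell
# Hypothetical helper: succeed if an index file (<bam>.bai or <bam>.csi) exists
# and is not older than the BAM file itself.
bam_index_ok() {
    local bam=$1
    local idx
    for idx in "${bam}.bai" "${bam}.csi"; do
        if [ -f "$idx" ] && [ ! "$idx" -ot "$bam" ]; then
            return 0
        fi
    done
    return 1
}

# Possible usage:
# bam_index_ok sample_srt.bam || samtools index -@20 sample_srt.bam
```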
# Start IGV in the interactive queue and load the Oryza genome
$ bsub -q interactive -XF -Is bash
Job <19557888> is submitted to queue <interactive>.
<<ssh X11 forwarding job>>
<<Waiting for dispatch ...>>
<<Starting on sg60>>
$ module load IGV/2.17.4
To execute IGV run igv.sh
$ igv.sh -g Oryza
In IGV, choose File -> Load from File and load the corresponding BAM file.