Experiment Log | 6/4

First, let's resolve yesterday's error.
From the error message I found two links:
(1) A forum thread explaining the error: http://gatkforums.broadinstitute.org/discussion/63/input-files-have-incompatible-contigs
(2) The official solution page: https://www.broadinstitute.org/gatk/guide/article?id=1328
However, that second page doesn't seem to open today.
Although the thread doesn't spell out a fix, I think I can use samtools to re-sort the BAM file.
The .bam file output by the previous step is ./output/tumor/dupmark.bam.
So the task is to pull out this intermediate file dupmark.bam, check its contig order, and then re-sort it to match the order of our reference.
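Before re-sorting anything, it is worth confirming how the contig order in dupmark.bam actually differs from the reference. A minimal check sketch, assuming samtools is on the PATH and the reference has already been indexed with samtools faidx (the .fai path is my assumption):

# show the contig (@SQ) order recorded in the BAM header
samtools view -H ./output/tumor/dupmark.bam | grep "^@SQ" | head
# compare with the contig order of the reference index
cut -f1 ./geneome/hg19/hg19.fa.fai | head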


First, I went back to the Windows side to take a look at the alignment.sam file produced by the earlier alignment step, or alternatively to temporarily change that step's output path to the output folder at the workplace level.

  • Modify somatic.pl so that dupmark.bam is written to the output folder under workplace.
  • Retrieve the intermediate file alignment.sam and check its index.
  • Check the result of samtools sort; if it meets our needs, add it to our command (right after the duplicate-marking code block); see the sketch after this list.
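The samtools sort call referred to in the last item would look roughly like this; the sorted output name and thread count are my assumptions, not part of the pipeline:

# coordinate-sort the duplicate-marked BAM and index it
samtools sort -@ 8 -o ./output/tumor/dupmark.sorted.bam ./output/tumor/dupmark.bam
samtools index ./output/tumor/dupmark.sorted.bam

Note that samtools sort orders reads by coordinate within the header's existing contig order; it does not by itself rewrite the header to match a different reference, which is why ReorderSam comes up below.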

Switching to the Windows side.


Found the bam file, but it is unsorted, so it isn't of much use as a reference for this step.
Our dupmark.bam file has already been generated; let's pull it out and take a look.

These attempts of mine turned out to be useless; what matters most is the official documentation. In a nutshell, it says to use the command below to reorder the .bam file.

java -jar picard.jar ReorderSam I=original.bam O=reordered.bam R=reference.fasta CREATE_INDEX=TRUE

We can adapt our original command along these lines and perhaps get somewhere.

/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java -Djava.io.tmpdir=./output/tumor/tmp -jar /home/xxzhang/workplace/QBRC//somatic_script/picard.jar ReorderSam I=./output/tumor/dupmark.bam O=./output/tumor/dupmark.bam R=./geneome/hg19/hg19.faCREATE_INDEX=TRUE

I clumsily patched the code in the file; running it produced the following error:

INFO 2021-06-04 14:22:34 ReorderSam
********** NOTE: Picard’s command line syntax is changing.
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
********** The command line looks like this in the new syntax:
********** ReorderSam -I ./output/tumor/dupmark.bam -O ./output/tumor/dupmark.bam -R ./geneome/hg19/hg19.faCREATE_INDEX=TRUE
ERROR: Option ‘SEQUENCE_DICTIONARY’ is required.
USAGE: ReorderSam [options]
Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#ReorderSam
Not to be confused with SortSam which sorts a SAM or BAM file with a valid sequence dictionary, ReorderSam reorders
reads in a SAM/BAM file to match the contig ordering in a provided reference file, as determined by exact name matching
of contigs. Reads mapped to contigs absent in the new reference are unmapped. Runs substantially faster if the input is
an indexed BAM file.
Version: 2.25.4-SNAPSHOT
Options:
--help
-h Displays options specific to this tool.
--stdhelp
-H Displays options specific to this tool AND options common to all Picard command line
tools.
--version Displays program version.
INPUT=File
I=File Input file (SAM or BAM) to extract reads from. Required.
OUTPUT=File
O=File Output file (SAM or BAM) to write extracted reads to. Required.
SEQUENCE_DICTIONARY=File
SD=File A Sequence Dictionary for the OUTPUT file (can be read from one of the following file
types (SAM, BAM, VCF, BCF, Interval List, Fasta, or Dict) Required.
ALLOW_INCOMPLETE_DICT_CONCORDANCE=Boolean
S=Boolean If true, allows only a partial overlap of the original contigs with the new reference
sequence contigs. By default, this tool requires a corresponding contig in the new
reference for each read contig Default value: false. This option can be set to ‘null’ to
clear the default value. Possible values: {true, false}
ALLOW_CONTIG_LENGTH_DISCORDANCE=Boolean
U=Boolean If true, then permits mapping from a read contig to a new reference contig with the same
name but a different length. Highly dangerous, only use if you know what you are doing.
Default value: false. This option can be set to ‘null’ to clear the default value.
Possible values: {true, false}

The cause of my error is that I did not supply the SEQUENCE_DICTIONARY option. The usage text gives an example:

java -jar picard.jar ReorderSam \
    INPUT=sample.bam \
    OUTPUT=reordered.bam \
    SEQUENCE_DICTIONARY=reference_with_different_order.dict

Continuing to imitate that, I modified the command as follows:

/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java -Djava.io.tmpdir=./output/tumor/tmp -jar /home/xxzhang/workplace/QBRC//somatic_script/picard.jar ReorderSam INPUT=./output/tumor/dupmark.bam OUTPUT=./output/tumor/dupmark.bam SEQUENCE_DICTIONARY=./geneome/hg19/hg19.dict (this part needs some thought) CREATE_INDEX=TRUE

OK, back to the Windows side to revise the command.

I finally get it: the problem is not with my .bam file but with the contig order of the .vcf files I downloaded from the database. So picard.jar is still the right tool for the fix, only the target is the .vcf file, and the command to use is SortVcf.

Let's check the error log to see which file is involved.

/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java -Djava.io.tmpdir=./output/tumor/tmp -jar /home/xxzhang/workplace/QBRC//somatic_script/GenomeAnalysisTK.jar -T RealignerTargetCreator -R ./geneome/hg19/hg19.fa --num_threads 32 -known ./geneome/hg19/hg19.fa_resource/Mills_and_1000G_gold_standard.indels.hg19.vcf -known ./geneome/hg19/hg19.fa_resource/1000G_phase1.snps.high_confidence.hg19.vcf -o ./output/tumor/tumor_intervals.list -I ./output/tumor/dupmark.bam > ./output/tumor/index.out

ERROR MESSAGE: Input files known2 and reference have incompatible contigs.

/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java -Djava.io.tmpdir=./output/tumor/tmp -jar /home/xxzhang/workplace/QBRC//somatic_script/GenomeAnalysisTK.jar -T IndelRealigner --filter_bases_not_stored --disable_auto_index_creation_and_locking_when_reading_rods -R ./geneome/hg19/hg19.fa -known ./geneome/hg19/hg19.fa_resource/Mills_and_1000G_gold_standard.indels.hg19.vcf -known ./geneome/hg19/hg19.fa_resource/1000G_phase1.snps.high_confidence.hg19.vcf -targetIntervals ./output/tumor/tumor_intervals.list -I ./output/tumor/dupmark.bam -o ./output/tumor/realigned.bam >./output/tumor/tumor_realign.out

ERROR MESSAGE: Input files knownAlleles2 and reference have incompatible contigs.

So fundamentally the problem lies with the file ./geneome/hg19/hg19.fa_resource/1000G_phase1.snps.high_confidence.hg19.vcf.

Let me open part of this file first to see whether that is the case. What I expect, as the error output suggests, is that the first chromosome listed will be chrM.
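A quick way to do this check is to pull out just the contig lines from the VCF header and set them beside the reference dictionary (paths taken from the commands above); the lines below are the kind of output the first command returns:

# list the contig order declared in the VCF header
grep "^##contig" ./geneome/hg19/hg19.fa_resource/1000G_phase1.snps.high_confidence.hg19.vcf | head
# for comparison, the contig order of the reference dictionary
grep "^@SQ" ./geneome/hg19/hg19.dict | head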

##contig=<ID=chrM,length=16571,assembly=hg19>
##contig=<ID=chr1,length=249250621,assembly=hg19>
##contig=<ID=chr2,length=243199373,assembly=hg19>
##contig=<ID=chr3,length=198022430,assembly=hg19>
##contig=<ID=chr4,length=191154276,assembly=hg19>
##contig=<ID=chr5,length=180915260,assembly=hg19>
##contig=<ID=chr6,length=171115067,assembly=hg19>
##contig=<ID=chr7,length=159138663,assembly=hg19>
##contig=<ID=chr8,length=146364022,assembly=hg19>
##contig=<ID=chr9,length=141213431,assembly=hg19>
##contig=<ID=chr10,length=135534747,assembly=hg19>

Indeed it is.
Let's check whether the other files have the same problem.
All three reference .vcf files turn out to have the same issue, so we will re-sort all three of them (a quick check is sketched below).
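The quick check mentioned above, looped over the three reference VCFs; I'm assuming the dbsnp file sits in the same resource directory as the other two:

# print the first contig header line of each reference VCF
cd ./geneome/hg19/hg19.fa_resource
for f in Mills_and_1000G_gold_standard.indels.hg19.vcf 1000G_phase1.snps.high_confidence.hg19.vcf dbsnp.hg19.vcf; do
    echo "== $f"; grep -m1 "^##contig" "$f"
done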
The reference command on the official site looks like this:

java -jar picard.jar SortVcf I=original.vcf O=sorted.vcf SEQUENCE_DICTIONARY=reference.dict

Back to the Windows system again to revise the command. (I believe I can solve this problem; it is only a matter of time.)

Another clumsy edit, and an error appeared:

Scalar found where operator expected at somatic.pl line 411, near "" OUTPUT="$resource""
	(Missing operator before $resource?)
Scalar found where operator expected at somatic.pl line 412, near "" OUTPUT="$resource""
	(Missing operator before $resource?)
syntax error at somatic.pl line 411, near "" OUTPUT="$resource""
syntax error at somatic.pl line 412, near "" OUTPUT="$resource""
Execution of somatic.pl aborted due to compilation errors.

I went back to the original file to check. After editing it and rerunning, it still failed, even though the code itself looks fine.
Running this one command on its own turned out to have no effect as far as the program is concerned: even with the result file given a new name, no new file was produced.
My preliminary conclusion is that this line of code did not run successfully.
The error printed on screen is as follows:

To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread “main” java.lang.IllegalArgumentException: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=chrM,length=16571,dict_index=0,assembly=hg19,alternate_names=[]) was found when SAMSequenceRecord(name=chr1,length=249250621,dict_index=0,assembly=null,alternate_names=[]) was expected.
at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:123)
at picard.vcf.SortVcf.doWork(SortVcf.java:92)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=chrM,length=16571,dict_index=0,assembly=hg19,alternate_names=[]) was found when SAMSequenceRecord(name=chr1,length=249250621,dict_index=0,assembly=null,alternate_names=[]) was expected.
at htsjdk.samtools.SAMSequenceDictionary.assertSameDictionary(SAMSequenceDictionary.java:154)
at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:121)

So how do I solve this next? (My task for the whole day is to solve this one problem; if I manage it, the day counts as a success.)

Found a link: https://www.cnblogs.com/chenwenyan/p/7040852.html
It has been reposted in several places; let's try this method.
Update the sequence dictionary:

java -jar picard.jar UpdateVcfSequenceDictionary \
    I=1000G_phase1.indels.hg19.sites.vcf \
    O=1000G_phase1.indels.hg19.sites.dic.vcf \
    SEQUENCE_DICTIONARY=hg19.dict

The dictionary update is running; I'm not yet sure whether it will work.
After the update, it seems to have gotten stuck.

/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java  -jar /home/xxzhang/workplace/QBRC/somatic_script/picard.jar SortVcf --INPUT 1000G_phase1.snps.high_confidence.hg19.vcf --OUTPUT  1000G_phase1.snps.high_confidencehg19_2.vcf  -SEQUENCE_DICTIONARY  /home/xxzhang/workplace/QBRC/geneome/hg19/hg19.dict

In the end it proved effective: the output file we generated does have a consistent contig order now.
The complete set of commands used in this process:

/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java  -jar /home/xxzhang/workplace/QBRC/somatic_script/picard.jar UpdateVcfSequenceDictionary --INPUT 1000G_phase1.snps.high_confidence.hg19.vcf --OUTPUT  1000G_phase1.snps.high_confidence.hg19_2.vcf  -SEQUENCE_DICTIONARY  /home/xxzhang/workplace/QBRC/geneome/hg19/hg19.dict

/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java  -jar /home/xxzhang/workplace/QBRC/somatic_script/picard.jar SortVcf --INPUT 1000G_phase1.snps.high_confidence.hg19.vcf --OUTPUT  1000G_phase1.snps.high_confidence.hg19_2.vcf  -SEQUENCE_DICTIONARY  /home/xxzhang/workplace/QBRC/geneome/hg19/hg19.dict

/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java  -jar /home/xxzhang/workplace/QBRC/somatic_script/picard.jar  UpdateVcfSequenceDictionary --INPUT Mills_and_1000G_gold_standard.indels.hg19.vcf --OUTPUT  Mills_and_1000G_gold_standard.indels.hg19_2.vcf  -SEQUENCE_DICTIONARY  /home/xxzhang/workplace/QBRC/geneome/hg19/hg19.dict

/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java  -jar /home/xxzhang/workplace/QBRC/somatic_script/picard.jar SortVcf --INPUT Mills_and_1000G_gold_standard.indels.hg19.vcf --OUTPUT  Mills_and_1000G_gold_standard.indels.hg19_2.vcf  -SEQUENCE_DICTIONARY  /home/xxzhang/workplace/QBRC/geneome/hg19/hg19.dict

Back to the Windows side to revise the code again.
It ran successfully, but the dbsnp file also needs the same fix.

/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java  -jar /home/xxzhang/workplace/QBRC/somatic_script/picard.jar  UpdateVcfSequenceDictionary --INPUT dbsnp.hg19.vcf --OUTPUT  dbsnp.hg19_2.vcf  -SEQUENCE_DICTIONARY  /home/xxzhang/workplace/QBRC/geneome/hg19/hg19.dict

/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java  -jar /home/xxzhang/workplace/QBRC/somatic_script/picard.jar  SortVcf --INPUT dbsnp.hg19.vcf --OUTPUT  dbsnp.hg19_2.vcf  -SEQUENCE_DICTIONARY  /home/xxzhang/workplace/QBRC/geneome/hg19/hg19.dict
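To be sure a corrected file really follows the reference order, here is a small comparison sketch (file names as used above; it assumes the corrected VCF carries the same contig set as the dictionary):

# extract contig names from the corrected VCF header and from the reference dictionary, then compare
grep "^##contig" dbsnp.hg19_2.vcf | sed 's/^##contig=<ID=\([^,]*\).*/\1/' > vcf_order.txt
grep "^@SQ" /home/xxzhang/workplace/QBRC/geneome/hg19/hg19.dict | cut -f2 | sed 's/^SN://' > dict_order.txt
diff vcf_order.txt dict_order.txt && echo "contig order matches the reference"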

With those edits done, rerun the pipeline:

perl somatic.pl NA NA ./example/example_dataset/sequencing/SRR7246238_1.fastq.gz ./example/example_dataset/sequencing/SRR7246238_2.fastq.gz 32 hg19 ./geneome/hg19/hg19.fa /home/xxzhang/workplace/software/java/jdk1.7.0_80/bin/java ./output human 1 ./disambiguate_pipeline

The pipeline is running now; the earlier error should not reappear, though whether new errors will crop up remains to be seen. Let me see whether it can keep going.
The goal for this stretch of time is simply to run the whole pipeline through to the end and compare its results against those from the author's code. If the differences are small, the pipeline can be considered to have run through smoothly; then, as long as the environment is set up and the required software and files are all in place, the rest should not be especially hard.

An odd problem appeared in the vcf folder: the file dbsnp.hg19.vcf was not fully corrected.

##variationPropertyDocumentationUrl=ftp://ftp.ncbi.nlm.nih.gov/snp/specs/dbSNP_BitField_latest.pdf
#CHROM POS ID REF ALT QUAL FILTER INFO
chrM 64 rs3883917 C T . . ASP;CAF=[0.9373,0.06268];COMMON=1;OTHERKG;R5;RS=3883917;RSPOS=64;SAO=0;SSR=0;VC=SNV;VP=0x050000020005000002000100;WGT=1;dbSNPBuildID=108
chrM 72 rs370271105 T C . . ASP;CAF=[0.9878,0.01123];COMMON=1;HD;OTHERKG;R5;RS=370271105;RSPOS=72;SAO=0;SSR=0;VC=SNV;VP=0x050000020005000402000100;WGT=1;dbSNPBuildID=138
chrM 93 rs369034419 A G . . ASP;CAF=[0.9663,0.03368];COMMON=1;HD;OTHERKG;R5;RS=369034419;RSPOS=93;SAO=0;SSR=0;VC=SNV;VP=0x050000020005000402000100;WGT=1;dbSNPBuildID=138
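One way to see how far the correction actually got is to look at the order in which contigs appear in the data lines rather than in the header (the file name is taken from the commands above):

# contig order of the records themselves
grep -v "^#" dbsnp.hg19_2.vcf | cut -f1 | uniq | head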

Waiting, waiting, waiting. This week I absolutely must get this program to run all the way through. Meanwhile, I need to sort out my direction for what comes next. The learning atmosphere in the group does not feel especially strong; at group meetings the interaction is one-sided between the advisor and the presenting student, and some of the other students are not even listening. Overall, people do not seem particularly engaged.
As for me, I hope to read more of the literature so that next time my senior labmate presents I can raise more substantive questions. Tomorrow I will review and organize what I have been learning, mainly along the following lines:
(1) Logic (the advisor brings up logical concepts at group meetings, such as circular reasoning)
(2) Read more literature in the field
(3) Summarize more: after each group meeting, note which ways of thinking I have picked up
I want to improve, I want to improve; I do not want to drift through life half-asleep.
I feel that, having just joined the group, I still have some novelty-driven energy and am fairly active; can I keep that quality all the way until I finish my master's?
These past few days I have gradually come across people in very different states, and I increasingly feel that if you stop moving forward and stop making demands of yourself, you stop growing.
I want to stay here and learn as much as I can from Prof. Zhu; she is the mentor I want to follow. That is what pursuing an education is about, and I need to work hard. Right now this is only an opportunity; whether I can seize it and earn the advisor's recognition is another matter, so there is plenty of interesting work worth putting effort into.
(1) I hope to read more papers, think more, and learn to ask good questions.
——Ways to work on this

  • Fengbian course: the communication/questioning module, on communicating with one's supervisor
  • Books on the network drive: how to read papers
  • Practice: buy a Kindle, collect papers relevant to my field, and read and annotate them during the commute
  • Group-meeting review: after each meeting, write down what I learned (I want to learn from people stronger than me)
    (2) Improve my professional skills.
    ——Ways to work on this
  • Put real time and effort into the tasks the advisor assigns and do them properly.
  • Read the basics in the Neurobiology textbook (sometimes I cannot immediately follow the technical terms discussed at group meetings)

Those are my goals for this stage: to improve my professional skills and my ability to communicate with others, and to earn the recognition of my advisor and colleagues.

coverage.txt was produced (which means the earlier problem is solved), hooray~ But another error has appeared.
Pasted below; I'll tidy it up when I get back from my walk. (17:54)
(Deleted; I'm back from my walk.)
Now, back to organizing. The goal for the next stretch of time is to keep fixing the places where it errors out.
(18:36-21:30, 3 hours)

Error 1:

/home/xxzhang/miniconda3/bin/samtools mpileup -d 7000 -I -f ./geneome/hg19/hg19.fa ./output/tumor/tumor.bam > ./output/tumor/tumor.mpileup

samtools mpileup option I is functional, but deprecated. Please switch to using bcftools mpileup in future.
1 samples in 1 input files (this command is deprecated; we should switch to bcftools)
Error: mpileup has failed!

Error 2:

configureStrelkaGermlineWorkflow.py --bam /home/xxzhang/workplace/QBRC/output/tumor/tumor.bam --referenceFasta ./geneome/hg19/hg19.fa --runDir ./output/strelka/ --exome

Can’t exec “configureStrelkaGermlineWorkflow.py”: No such file or directory at somatic.pl line 523.

Note: this one is easy; I probably forgot to give the absolute path to the command here. Let me find where configureStrelkaGermlineWorkflow.py actually lives. (Found it: in the strelka folder.)
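A quick way to locate it; the search root is my guess, based on where the other tools are installed:

find /home/xxzhang/workplace/software -name "configureStrelkaGermlineWorkflow.py" 2>/dev/null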

Error 3:

Rscript /home/xxzhang/workplace/QBRC//somatic_script/filter_vcf.R ./output NA

Installing package into ‘/usr/lib64/R/library’
(as ‘lib’ is unspecified)
Warning in install.packages(name_pkg[bool_nopkg]) :
‘lib = “/usr/lib64/R/library”’ is not writable
Error in install.packages(name_pkg[bool_nopkg]) :
unable to install packages
Execution halted

Note: I need to look in the source file to see which package it is trying to install.
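A one-liner to see which packages the script tries to load or install (script path from the command above):

grep -n "install.packages\|library(\|require(" /home/xxzhang/workplace/QBRC/somatic_script/filter_vcf.R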

Error 4:

/home/xxzhang/workplace/QBRC/annovar/table_annovar.pl ./output/germline_mutations.txt /humandb/ -buildver hg19 -out ./output/germline_mutations -remove  -protocol refGene,ljb26_all,cosmic70,esp6500siv2_all,exac03,1000g2015aug_all -operation g,f,f,f,f,f  -nastring .

which: no table_annovar.pl in (/opt/perl5/bin:/opt/curl/bin:/opt/jdk/bin:/opt/go/bin:/opt/SimpleQueue:/opt/tsce4/maui/sbin:/opt/tsce4/maui/bin:/opt/tsce4/torque6/sbin:/opt/tsce4/torque6/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/bin)
Use of uninitialized value $annovar_path in substitution (s///) at somatic.pl line 178.
Use of uninitialized value $annovar_path in concatenation (.) or string at somatic.pl line 192.
Can’t exec “/home/xxzhang/workplace/QBRC/annovar/table_annovar.pl”: Permission denied at somatic.pl line 523.

Note: this is about the path where the annovar annotation scripts live. In the source code, remove the which lookup step and reference our own annovar path directly below it; also change the permissions of the annovar folder to readable, writable, and executable.

OK, so basically these four problems. I'll keep working through them one by one.


Solutions and the process:
Error 1:
The samtools mpileup usage here is deprecated and should no longer be used; switch to bcftools instead.
I checked, and there is no bcftools on the path, so let's install it now.
Reference link: https://anaconda.org/bioconda/bcftools
Command entered: ./conda install -c bioconda bcftools
(I find that while working I slip easily into a state of flow, and a kind of joy actually wells up.)
(So much for the sentiment: it looks like installing bcftools threw an error.)

It clearly says the installation is already complete, so why can't it be loaded?

All requested packages already installed.

Now I see: a shared library it needs is missing at load time.

./bcftools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory

It is missing libcrypto.so.1.0.0. Following this online tutorial: https://blog.csdn.net/u012000056/article/details/52671474
I'm waiting for a senior labmate to reply (not having your own server really is a hassle).
The suggestion was that an ordinary user can also do this under their own account:

export LD_LIBRARY_PATH=///:$LD_LIBRARY_PATH

I didn't quite figure that out.
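For the record, I think the suggestion amounts to something like the sketch below; the directory is my assumption, and it would only help if the needed .so were actually there, which, as shown further down, is not the case for libcrypto.so.1.0.0:

# prepend a system library directory so the dynamic loader searches it first
export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH
./bcftools --version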
Creating a symlink according to that tutorial did not seem to help at first; it still errored out.
Link: https://www.cnblogs.com/huanping/p/13786701.html
I tried creating the symlink again. My earlier failure was that I pointed the link into the /bin/ folder, whereas the correct approach is to place the link under the /lib/ folder.
I'm still not familiar enough with the system, am I?
ln -s /usr/lib64/libcrypto.so.1.0.2k /home/xxzhang/miniconda3/lib/libcrypto.so.1.0.0
Also, before asking for help, first check whether the file already exists somewhere on the system rather than blindly installing it.
How to check whether the system has this library:
whereis libcrypto

libcrypto: /usr/lib64/libcrypto.so

Check the version number:
ll /usr/lib64/libcrypto.so

lrwxrwxrwx. 1 root root 19 3月 3 20:27 /usr/lib64/libcrypto.so -> libcrypto.so.1.0.2k

Then it is just a matter of linking the system's copy into our lib folder, so that it gets picked up directly when the conda-installed tool runs. (Side gripe: the loader is pretty rigid; the slightest version-number mismatch and it refuses to recognize the library.)
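A quick sanity check that the link is actually being picked up (the bcftools path is the one recorded below):

# confirm which libcrypto the binary resolves at run time
ldd /home/xxzhang/miniconda3/bin/bcftools | grep libcrypto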

Finally, a look at the fruits of victory, hooray:

-bash-4.2$ ./bcftools
Program: bcftools (Tools for variant calling and manipulating VCFs and BCFs)
Version: 1.9 (using htslib 1.9)
Usage: bcftools [--version|--version-only] [--help]
Commands:
-- Indexing
index index VCF/BCF files
-- VCF/BCF manipulation
annotate annotate and edit VCF/BCF files
concat concatenate VCF/BCF files from the same set of samples
convert convert VCF/BCF files to different formats and back
isec intersections of VCF/BCF files
merge merge VCF/BCF files files from non-overlapping sample sets
norm left-align and normalize indels
plugin user-defined plugins
query transform VCF/BCF into user-defined formats
reheader modify VCF/BCF header, change sample names
sort sort VCF/BCF file
view VCF/BCF conversion, view, subset and filter VCF/BCF files
-- VCF/BCF analysis
call SNP/indel calling
consensus create consensus sequence by applying VCF variants
cnv HMM CNV calling
csq call variation consequences
filter filter VCF/BCF files using fixed thresholds
gtcheck check sample concordance, detect sample swaps and contamination
mpileup multi-way pileup producing genotype likelihoods
roh identify runs of autozygosity (HMM)
stats produce VCF/BCF stats

Now we can swap samtools for bcftools to do the mpileup step.
Good.
The bcftools usage for the pileup command:
./bcftools mpileup

Usage: bcftools mpileup [options] in1.bam [in2.bam […]]
Input options:
-6, --illumina1.3+ quality is in the Illumina-1.3+ encoding
-A, --count-orphans do not discard anomalous read pairs
-b, --bam-list FILE list of input BAM filenames, one per line
-B, --no-BAQ disable BAQ (per-Base Alignment Quality)
-C, --adjust-MQ INT adjust mapping quality; recommended:50, disable:0 [0]
-d, --max-depth INT max per-file depth; avoids excessive memory usage [250]
-E, --redo-BAQ recalculate BAQ on the fly, ignore existing BQs
-f, --fasta-ref FILE faidx indexed reference sequence file
--no-reference do not require fasta reference file
-G, --read-groups FILE select or exclude read groups listed in the file
-q, --min-MQ INT skip alignments with mapQ smaller than INT [0]
-Q, --min-BQ INT skip bases with baseQ/BAQ smaller than INT [13]
-r, --regions REG[,…] comma separated list of regions in which pileup is generated
-R, --regions-file FILE restrict to regions listed in a file
--ignore-RG ignore RG tags (one BAM = one sample)
--rf, --incl-flags STR|INT required flags: skip reads with mask bits unset []
--ff, --excl-flags STR|INT filter flags: skip reads with mask bits set
[UNMAP,SECONDARY,QCFAIL,DUP]
-s, --samples LIST comma separated list of samples to include
-S, --samples-file FILE file of samples to include
-t, --targets REG[,…] similar to -r but streams rather than index-jumps
-T, --targets-file FILE similar to -R but streams rather than index-jumps
-x, --ignore-overlaps disable read-pair overlap detection
Output options:
-a, --annotate LIST optional tags to output; ‘?’ to list []
-g, --gvcf INT[,…] group non-variant sites into gVCF blocks according
to minimum per-sample DP
--no-version do not append version and command line to the header
-o, --output FILE write output to FILE [standard output]
-O, --output-type TYPE ‘b’ compressed BCF; ‘u’ uncompressed BCF;
‘z’ compressed VCF; ‘v’ uncompressed VCF [v]
--threads INT number of extra output compression threads [0]
SNP/INDEL genotype likelihoods options:
-e, --ext-prob INT Phred-scaled gap extension seq error probability [20]
-F, --gap-frac FLOAT minimum fraction of gapped reads [0.002]
-h, --tandem-qual INT coefficient for homopolymer errors [100]
-I, --skip-indels do not perform indel calling
-L, --max-idepth INT maximum per-file depth for INDEL calling [250]
-m, --min-ireads INT minimum number gapped reads for indel candidates [1]
-o, --open-prob INT Phred-scaled gap open seq error probability [40]
-p, --per-sample-mF apply -m and -F per-sample for increased sensitivity
-P, --platforms STR comma separated list of platforms for indels [all]

Absolute path of bcftools:

/home/xxzhang/miniconda3/bin/bcftools

Modified command:

/home/xxzhang/miniconda3/bin/bcftools mpileup -d 7000  -f ./geneome/hg19/hg19.fa ./output/tumor/tumor.bam > ./output/tumor/tumor.mpileup

Error 2:
The .py file lives in its original install folder, so we need to reference that path:

/home/xxzhang/workplace/software/strelka/bin

Modified command:

/home/xxzhang/workplace/software/strelka/bin/configureStrelkaGermlineWorkflow.py --bam /home/xxzhang/workplace/QBRC/output/tumor/tumor.bam --referenceFasta ./geneome/hg19/hg19.fa --runDir ./output/strelka/ --exome

Error 3:
The main cause here is that the script wants to install an R package, but the library directory is not writable. So one option is to install R on my side and then point the pipeline's Rscript call at that installation.
Start with the simplest step: install R first.

It just occurred to me that I can install it directly with conda; no need to compile anything.
./conda install R
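Once conda finishes installing R, the plan above amounts to calling filter_vcf.R with the conda Rscript instead of the system one. A sketch, assuming R lands in the miniconda3 base environment (arguments copied from the failing call above):

/home/xxzhang/miniconda3/bin/Rscript /home/xxzhang/workplace/QBRC/somatic_script/filter_vcf.R ./output NA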

Error 4:
Change the permissions of the annovar directory.
chmod -R 777 annovar/
cd annovar/
ll

total 512
-rwxrwxrwx. 1 xxzhang Zhu_lab 226259 6月 8 2020 annotate_variation.pl
-rwxrwxrwx. 1 xxzhang Zhu_lab 42478 6月 8 2020 coding_change.pl
-rwxrwxrwx. 1 xxzhang Zhu_lab 170199 6月 8 2020 convert2annovar.pl
drwxrwxrwx. 2 xxzhang Zhu_lab 9 6月 2 12:08 example
drwxrwxrwx. 3 xxzhang Zhu_lab 34 6月 2 13:15 humandb
-rwxrwxrwx. 1 xxzhang Zhu_lab 19407 6月 8 2020 retrieve_seq_from_fasta.pl
-rwxrwxrwx. 1 xxzhang Zhu_lab 42469 6月 8 2020 table_annovar.pl
-rwxrwxrwx. 1 xxzhang Zhu_lab 21291 6月 8 2020 variants_reduction.pl

(21:20) I can leave the computer running here and head back to the dorm.
Let it keep running and come back tomorrow morning to collect the results, then run the pipeline through once more. I hope it goes smoothly.
The goal for this week is to get somatic.pl to run all the way to the end.
