实验记录 | 6/10

(8:54)早上的时候,来这边发现,gtf转换,但是并没有转换出来(847MB==>43.5GB)。所以,昨天的尝试是失败的。
又及:另外一个平台上尝试下载的gtf文件,也下载失败。

gencode.v19.annotation.gtf.gz 63%[++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++====> ] 22.89M 11.5KB/s in 3m 19s
2021-06-09 21:19:36 (7.90 KB/s) - Connection closed at byte 24001167. Giving up.

所以,整体上,卡在了gtf与基因组序列的不匹配上。之前,我记得我也遇到类似的问题,整体上应该解决起来并不难的。
(9:04)我现在应该怎么办呢?
我找一下,之前我的基因组的参考序列是在哪个平台上下载的:

(5月13日)
mkdir genome/hg19
cd genome/hg19
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
tar zvfx chromFa.tar.gz

我的参考基因组序列是在UCSC平台上获取的,所以,我的gtf文档,也应该在这个平台上获取(这样才能最大可能的一一对应起来)。
(记得到时候学姐讲的时候,给学姐拍照)
这个界面的打开,是比较慢的,但是不要着急。

Feb. 2009 (GRCh37/hg19)
Genome sequence files and select annotations (2bit, GTF, GC-content, etc)
Sequence data by chromosome
Annotations
GC percent data
Protein database for hg19
SNP-masked fasta files
LiftOver files
Pairwise alignments (primates)
Pairwise alignments (other mammals)
Pairwise alignments (other vertebrates)
Multiple alignments
Patches

终于找到了UCSC上,hg19的gtf注释文档。
参考链接:https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/
目录下面一共有四个选择:

hg19.ensGene.gtf.gz 2020-01-10 09:45 26M
hg19.knownGene.gtf.gz 2020-01-10 09:45 17M
hg19.ncbiRefSeq.gtf.gz 2021-05-17 10:35 19M
hg19.refGene.gtf.gz 2020-01-10 09:45 21M
(可以看到,整体上并不是很大,哪里有那么夸张呢!)

我们选择,下载refGene这个版本。
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.refGene.gtf.gz

–2021-06-10 09:29:31-- https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.refGene.gtf.gz
Resolving hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)… 128.114.119.163
Connecting to hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)|128.114.119.163|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 21846291 (21M) [application/x-gzip]
Saving to: ‘hg19.refGene.gtf.gz’
hg19.refGene.gtf.gz 38%[======> ] 8.02M 115KB/s eta 8hg19.hg19.refGene.gh

下载完成之后,我们将其解压,最后查看文件内容。

chr1 refGene transcript 11869 14362 . + . gene_id “LOC102725121”; transcript_id “NR_148357”; gene_name “LOC102725121”;
chr1 refGene exon 11869 12227 . + . gene_id “LOC102725121”; transcript_id “NR_148357”; exon_number “1”; exon_id “NR_148357.1”; gene_name “LOC102725121”;
chr1 refGene exon 12613 12721 . + . gene_id “LOC102725121”; transcript_id “NR_148357”; exon_number “2”; exon_id “NR_148357.2”; gene_name “LOC102725121”;
chr1 refGene exon 13221 14362 . + . gene_id “LOC102725121”; transcript_id “NR_148357”; exon_number “3”; exon_id “NR_148357.3”; gene_name “LOC102725121”;
chr1 refGene transcript 11874 14409 . + . gene_id “DDX11L1”; transcript_id “NR_046018”; gene_name “DDX11L1”;
chr1 refGene exon 11874 12227 . + . gene_id “DDX11L1”; transcript_id “NR_046018”; exon_number “1”; exon_id “NR_046018.1”; gene_name “DDX11L1”;
chr1 refGene exon 12613 12721 . + . gene_id “DDX11L1”; transcript_id “NR_046018”; exon_number “2”; exon_id “NR_046018.2”; gene_name “DDX11L1”;
chr1 refGene exon 13221 14409 . + . gene_id “DDX11L1”; transcript_id “NR_046018”; exon_number “3”; exon_id “NR_046018.3”; gene_name “DDX11L1”;
chr22 refGene transcript 24666799 24813706 . + . gene_id “SPECC1L”; transcript_id “NM_015330”; gene_name “SPECC1L”;
chr22 refGene exon 24666799 24666951 . + . gene_id “SPECC1L”; transcript_id “NM_015330”; exon_number “1”; exon_id “NM_015330.1”; gene_name “SPECC1L”;

感觉,有点奇怪,这个文件中的染色体号,好像是乱的。
有点意思,我们沿用昨天的方法,把他的前面取出来。

chr17
chr12
chr2
chr6_mcf_hap5
chr2
chr6_mcf_hap5
chr7
chr5
chr6_mcf_hap5
chrX

全部都是乱的(并不是我们理想的状态)。
我想再去看看其他的文件。
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.ensGene.gtf.gz

–2021-06-10 09:39:08-- https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.ensGene.gtf.gz
Resolving hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)… 128.114.119.163
Connecting to hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)|128.114.119.163|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 27763607 (26M) [application/x-gzip]
Saving to: ‘hg19.ensGene.gtf.gz’
hg19.ensGene.gtf.gz 100%[================================================================================================================>] 26.48M 101KB/s in 3m 11s
2021-06-10 09:42:20 (142 KB/s) - ‘hg19.ensGene.gtf.gz’ saved [27763607/27763607]

gunzip -d hg19.ensGene.gtf.gz
grep -o -E "^\w+([-+.]\w+)*" hg19.ensGene.gtf|uniq

chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr17_gl000204_random
chr17_gl000205_random
chr18
chr19

这个我挺满意的,只是接下来,要把这个文件按照字典排序(chr1,chr2,chr3……)。
可以使用bedtools来对序列进行排序。
参考链接:https://anaconda.org/bioconda/bedtools
conda install -c bioconda bedtools
安装成功,可以继续使用这个工具了。
后来尝试后发现,其实并不需要bedtools这个工具,直接使用sort即可(我对于一些shell指令还是不太熟悉)。
sort -k1,1V -k3,3n -k4,4n hg19.ensGene.gtf >hg19.ensGene_sorted.gtf
grep -o -E "^\w+([-+.]\w+)*" hg19.ensGene_sorted.gtf|uniq
最后的sorted结果,是我们想要的结果:

chr1
chr1_gl000191_random
chr1_gl000192_random
chr2
chr3
chr4
chr4_gl000193_random
chr4_gl000194_random
chr5
chr6
chr6_apd_hap1
chr6_cox_hap2
chr6_dbb_hap3
chr6_mann_hap4
chr6_mcf_hap5
chr6_qbl_hap6
chr6_ssto_hap7
chr7
chr7_gl000195_random
chr8
chr8_gl000196_random
chr9
chr9_gl000199_random
chr9_gl000201_random
chr10
chr11
……
chrUn_gl000213
chrUn_gl000215
chrUn_gl000216
chrUn_gl000218
chrUn_gl000219
……
chrX
chrY

我们可以使用这个gtf注释文档,来进行接下来的star比对工作。
我们把sorted后的文档,上传到服务器中。


上传完成之后,开始运行比对,产生索引。
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --runThreadN 32 --runMode genomeGenerate --genomeDir /home/xxzhang/workplace/QBRC/geneome/hg19/STAR --genomeFastaFiles hg19.fa --sjdbGTFfile hg19.gtf --sjdbOverhang 149

STAR version: 2.7.9a compiled: 2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
Jun 10 10:51:47 … started STAR run
Jun 10 10:51:47 … starting to generate Genome files
Jun 10 10:52:57 … processing annotations GTF(这一步终于过去了,不再卡在原来的地方,说明我的努力是有价值的)
Jun 10 10:53:28 … starting to sort Suffix Array. This may take a long time…
Jun 10 10:53:47 … sorting Suffix Array chunks and saving them to disk…
Jun 10 11:05:51 … loading chunks from disk, packing SA…
Jun 10 11:07:14 … finished generating suffix array
Jun 10 11:07:14 … generating Suffix Array index
Jun 10 11:10:41 … completed Suffix Array index
Jun 10 11:10:42 … inserting junctions into the genome indices
Jun 10 11:14:59 … writing Genome to disk …
Jun 10 11:15:00 … writing Suffix Array to disk …
Jun 10 11:15:10 … writing SAindex to disk
Jun 10 11:15:10 … finished successfully

所以创建成功!比我想象的要快很多!
最后在路径下生成的文件有:

chrLength.txt exonInfo.tab hg19.gtf sjdbList.fromGTF.out.tab
chrNameLength.txt geneInfo.tab Log.out sjdbList.out.tab
chrName.txt Genome SA transcriptInfo.tab
chrStart.txt genomeParameters.txt SAindex
exonGeTrInfo.tab hg19.fa sjdbInfo.txt

索引,创建完成之后,继续运行somatic.pl程序。

perl somatic.pl  RNA:./data/SRR3052083.fastq.gz  NA  RNA:./data/SRR5297065.fastq.gz NA  32 hg19 ./geneome/hg19/hg19.fa /home/xxzhang/workplace/software/java/jdk1.7.0_80/bin/java  ./output_RNA  human 1 ./disambiguate_pipeline

在比对的过程中,出现错误。

EXITING because of INPUT ERROR: could not open genomeFastaFile: ./geneome/hg19/hg19.fa
SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsissions

我们来看一下,在STAR比对那一步,它在代码中是如何写的。

$star_index=$index;
$star_index=~s/\/[^\/]*?$/\/STAR/;
……
system_call("STAR --genomeDir ".$star_index." --readFilesIn ".${$fastq{$type}}[0]." ".${$fastq{$type}}[1]." --runThreadN ".$thread.
          " ".$zcat." --outFileNamePrefix ".$type_output."/tmp/pass1 --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4");

system_call("cd ".$type_output.";
STAR --runMode genomeGenerate --genomeDir ".$type_output."/tmp/pass2 --genomeFastaFiles ".$index." --sjdbFileChrStartEnd ".
          $type_output."/tmp/pass1SJ.out.tab --sjdbOverhang 100 --runThreadN ".$thread);

system_call("STAR --genomeDir ".$type_output."/tmp/pass2 --readFilesIn ".${$fastq{$type}}[0]." ".${$fastq{$type}}[1]." --runThreadN ".$thread.
          " ".$zcat." --outFileNamePrefix ".$type_output."/ --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4");

通过这段代码,我们可以猜测,主要原因是权限不够。
修改hg19文件夹的权限。
chmod -R 777 geneome/hg19/
重新运行指令。

perl somatic.pl  RNA:./data/SRR3052083.fastq.gz  NA  RNA:./data/SRR5297065.fastq.gz NA  32 hg19 ./geneome/hg19/hg19.fa /home/xxzhang/workplace/software/java/jdk1.7.0_80/bin/java  ./output_RNA  human 1 ./disambiguate_pipeline

同样的错误,并没有得到很好的解决:

EXITING because of FATAL ERROR: could not open genome file ./output_RNA/tumor/tmp/pass2//genomeParameters.txt
SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsissions
Jun 10 11:37:34 … FATAL ERROR, exiting
mv ./output_RNA/tumor/Aligned.out.sam ./output_RNA/tumor/alignment.sam
./output_RNA/tumor/SJ.out.tab doesn’t exist!
Jun 10 11:37:46 … finished mapping
Jun 10 11:37:48 … finished successfully
cd ./output_RNA/normal;/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --runMode genomeGenerate --genomeDir ./output_RNA/normal/tmp/pass2 --genomeFastaFiles ./geneome/hg19/hg19.fa --sjdbFileChrStartEnd ./output_RNA/normal/tmp/pass1SJ.out.tab --sjdbOverhang 100 --runThreadN 32
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --runMode genomeGenerate --genomeDir ./output_RNA/normal/tmp/pass2 --genomeFastaFiles ./geneome/hg19/hg19.fa --sjdbFileChrStartEnd ./output_RNA/normal/tmp/pass1SJ.out.tab --sjdbOverhang 100 --runThreadN 32
STAR version: 2.7.9a compiled: 2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
Jun 10 11:37:48 … started STAR run
!!! WARNING: Could not move Log.out file from ./Log.out into ./output_RNA/normal/tmp/pass2/Log.out. Will keep ./Log.out
Jun 10 11:37:48 … starting to generate Genome files
EXITING because of INPUT ERROR: could not open genomeFastaFile: ./geneome/hg19/hg19.fa
Jun 10 11:37:48 … FATAL ERROR, exiting
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --genomeDir ./output_RNA/normal/tmp/pass2 --readFilesIn ./output_RNA/normal/fastq1.fastq ./output_RNA/normal/fastq2.fastq --runThreadN 32 --outFileNamePrefix ./output_RNA/normal/ --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --genomeDir ./output_RNA/normal/tmp/pass2 --readFilesIn ./output_RNA/normal/fastq1.fastq ./output_RNA/normal/fastq2.fastq --runThreadN 32 --outFileNamePrefix ./output_RNA/normal/ --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4
STAR version: 2.7.9a compiled: 2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
Jun 10 11:37:48 … started STAR run
Jun 10 11:37:48 … loading genome
EXITING because of FATAL ERROR: could not open genome file ./output_RNA/normal/tmp/pass2//genomeParameters.txt
SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsissions
Jun 10 11:37:48 … FATAL ERROR, exiting
mv ./output_RNA/normal/Aligned.out.sam ./output_RNA/normal/alignment.sam
./output_RNA/normal/SJ.out.tab doesn’t exist!
Error: At least one bam file doesn’t exist!

所以,问题出在哪里呢?
我们来一行行的分析:
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --genomeDir ./geneome/hg19/STAR --readFilesIn ./output_RNA/normal/fastq1.fastq ./output_RNA/normal/fastq2.fastq --runThreadN 32 --outFileNamePrefix ./output_RNA/normal/tmp/pass1 --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4

输入文件:
./output_RNA/normal/fastq1.fastq
./output_RNA/normal/fastq2.fastq
输出文件:
./output_RNA/normal/tmp/pass1

genomeDir ./GenomeDir/
string: path to the directory where genome files are stored (for --runMode alignReads) or will be generated (for --runMode generateGenome)
readFilesIn Read1 Read2
string(s): paths to files that contain input read1 (and, if needed, read2)

所以,这行代码的功能是比对吗?
不是,我觉得它没有任何意义。

进入到tumor文件夹;
cd ./output_RNA/tumor
这里怎么还创建基因组索引了呢(匪夷所思)?这是不对的。
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --runMode genomeGenerate --genomeDir ./output_RNA/tumor/tmp/pass2 --genomeFastaFiles ./geneome/hg19/hg19.fa --sjdbFileChrStartEnd ./output_RNA/tumor/tmp/pass1SJ.out.tab --sjdbOverhang 100 --runThreadN 32

接下来
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --genomeDir ./output_RNA/tumor/tmp/pass2 --readFilesIn ./output_RNA/tumor/fastq1.fastq ./output_RNA/tumor/fastq2.fastq --runThreadN 32 --outFileNamePrefix ./output_RNA/tumor/ --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4

所以报出了错误(哪里可能对呢?这个程序文件就是不合理的):

EXITING because of FATAL ERROR: could not open genome file ./output_RNA/tumor/tmp/pass2//genomeParameters.txt
SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsissions
Jun 10 11:37:34 … FATAL ERROR, exiting

……
诸如此类,在结构上挺乱的。作者为什么要做这些操作呢?为什么不直接比对?

我们需要理解,这三行STAR比对代码的意思。
所以,我觉得建库,这些操作可以都不需要,直接就是将normal/tumor序列数据比对到参考基因组上一行指令即可(这块是作者忘记注释掉了吗?)
需要查看一下网上的比对指令,我们这个文件应该怎样比对比较好?

STAR --twopassMode Basic 
--quantMode TranscriptomeSAM GeneCounts 
--runThreadN 6 
--genomeDir index_dir 
--alignIntronMin 20 
--alignIntronMax 50000 
--outSAMtype BAM SortedByCoordinate 
--sjdbOverhang 149 
--outSAMattrRGline ID:sample SM:sample PL:ILLUMINA 
--outFilterMismatchNmax 2 
--outSJfilterReads Unique 
--outSAMmultNmax 1 
--outFileNamePrefix out_prefix 
--outSAMmapqUnique 60 
--readFilesCommand gunzip -c 
--readFilesIn seq1.fq.gz seq2.fq.gz

修改代码:

system_call("STAR --genomeDir ".$star_index." --readFilesIn ".${$fastq{$type}}[0]." ".${$fastq{$type}}[1]." --runThreadN ".$thread.
        " ".$zcat." --outFileNamePrefix ".$type_output." --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4");

#system_call("cd ".$type_output.";
#STAR --runMode genomeGenerate --genomeDir ".$type_output."/tmp/pass2 --genomeFastaFiles ".$index." --sjdbFileChrStartEnd ".
#          $type_output."/tmp/pass1SJ.out.tab --sjdbOverhang 100 --runThreadN ".$thread);

#system_call("STAR --genomeDir ".$type_output."/tmp/pass2 --readFilesIn ".${$fastq{$type}}[0]." ".${$fastq{$type}}[1]." --runThreadN ".$thread.
 #         " ".$zcat." --outFileNamePrefix ".$type_output."/ --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4");

(以上思考的很乱……,划掉)


我们来,运行一下,正常的比对,看看行不行?
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --genomeDir ./geneome/hg19/STAR --readFilesIn ./output_RNA/tumor/fastq1.fastq ./output_RNA/tumor/fastq2.fastq --runThreadN 32 --outFileNamePrefix ./output_RNA/tumor/ --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4

自动生成了两个文件:

  • SJ.out.tab
  • Aligned.out.sam

接下来,
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --runMode genomeGenerate --genomeDir ./output_RNA/tumor/tmp/pass2 --genomeFastaFiles ./geneome/hg19/hg19.fa --sjdbFileChrStartEnd ./output_RNA/tumor/SJ.out.tab --sjdbOverhang 100 --runThreadN 32

这个步骤也运行成功了。

STAR version: 2.7.9a compiled: 2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
Jun 10 13:19:50 … started STAR run
!!! WARNING: Could not move Log.out file from ./Log.out into ./output_RNA/tumor/tmp/pass2/Log.out. Will keep ./Log.out
Jun 10 13:19:50 … starting to generate Genome files
Jun 10 13:21:18 … starting to sort Suffix Array. This may take a long time…
Jun 10 13:21:37 … sorting Suffix Array chunks and saving them to disk…
Jun 10 13:33:35 … loading chunks from disk, packing SA…
Jun 10 13:34:59 … finished generating suffix array
Jun 10 13:34:59 … generating Suffix Array index
Jun 10 13:39:05 … completed Suffix Array index
Jun 10 13:39:05 … inserting junctions into the genome indices
Jun 10 13:40:43 … writing Genome to disk …
Jun 10 13:40:44 … writing Suffix Array to disk …
Jun 10 13:40:53 … writing SAindex to disk
Jun 10 13:40:54 … finished successfully

最后,运行最后一个步骤:
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --genomeDir ./output_RNA/tumor/tmp/pass2 --readFilesIn ./output_RNA/tumor/fastq1.fastq ./output_RNA/tumor/fastq2.fastq --runThreadN 32 --outFileNamePrefix ./output_RNA/tumor/ --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4

同样的默认生成的文件,文件名为:

  • SJ.out.tab
  • Aligned.out.sam

所以,代码,整体上,没有什么问题,只需要把Aligned.out.sam文件的默认名称修改为alignment.sam即可。

另外,转换到计算节点的方法就是:qsub -I

我仿佛是弄明白了整体的思路,那么接下来,再看一下,哪里出现了新的问题。


切换一下,回到Linux界面下。
最终找到了问题的核心。
问题出在,原始代码中,错误的切换路径:

system_call("cd ".$type_output.";/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --runMode genomeGenerate --genomeDir ".$type_output."/tmp/pass2 --genomeFastaFiles ".$index." --sjdbFileChrStartEnd ".
         $type_output."/tmp/pass1SJ.out.tab --sjdbOverhang 100 --runThreadN ".$thread);

我们需要删去cd ".$type_output.";这行代码,因为并不需要。进入到这个路径之后,后面的相对路径就找不到了。相对应地,里面的文件也会检索不到。

修改代码后,重新回到服务器中运行(这次记得在计算节点上去运行)。

perl somatic.pl  RNA:./data/SRR3052083.fastq.gz  NA  RNA:./data/SRR5297065.fastq.gz NA  32 hg19 ./geneome/hg19/hg19.fa /home/xxzhang/workplace/software/java/jdk1.7.0_80/bin/java  ./output_RNA  human 1 ./disambiguate_pipeline>test.txt

(16:41)程序运行完成。
最后生成了两个结果文件(缺少somatic_mutations_hg19.txt):

  • germline_mutations_hg19.txt
  • coverage.txt

说明其他部分的运行都是没有问题的。
但是,在生成somatic_mutations_hg19.txt这个文档的时候,却出现了问题。这是我们接下来,要去解决的问题。

这里是一个细节的地方,我的CosmicCodingMuts.hg19.vcf 文件的原来的名字是,CosmicCodingMuts_hg19.vcf,因此,系统检测不到。所以,我需要对它的名字进行修改。

ERROR MESSAGE: Couldn’t read file /home/xxzhang/workplace/QBRC/./geneome/hg19/hg19.fa_resource/CosmicCodingMuts.hg19.vcf because file does not exist

解决方法:
mv CosmicCodingMuts_hg19.vcf CosmicCodingMuts.hg19.vcf

三个mutation calling的过程,也都没有成功。

Can’t exec “/home/xxzhang/workplace/software/manta/bin/configManta.py”: Permission denied at somatic.pl line 528.

Can’t exec “/home/xxzhang/workplace/software/Shimmer/shimmer.pl”: Permission denied at somatic.pl line 528.

解决方法:
修改文件夹的权限。

chmod -R 777 workplace/software/

/home/xxzhang/workplace/software/speedseq/bin/speedseq.config somatic -F 0.01 -q 10 -t 32 -T ./output_RNA/sptmp -o ./output_RNA/speedseq ./geneome/hg19/hg19.fa /home/xxzhang/workplace/QBRC/output_RNA/normal/normal.bam /home/xxzhang/workplace/QBRC/output_RNA/tumor/tumor.bam

这个没有报错,但是,就是没有结果。不知道为啥?
还是觉得应该修改为:
/home/xxzhang/workplace/software/speedseq/bin/speedseq somatic -F 0.01 -q 10 -t 32 -T ./output_RNA/sptmp -o ./output_RNA/speedseq ./geneome/hg19/hg19.fa /home/xxzhang/workplace/QBRC/output_RNA/normal/normal.bam /home/xxzhang/workplace/QBRC/output_RNA/tumor/tumor.bam

speedseq.config ===>speedseq,因为speedseq.config 不是一个具体的指令,显示不出内容的。
正确的是,speedseq

Program: speedseq
Version: 0.1.2
Author: Colby Chiang (cc2qe@virginia.edu)
usage: speedseq [options]
command: align align FASTQ files with BWA-MEM
var call SNV and indel variants with FreeBayes
somatic call somatic SNV and indel variants in a tumor/normal pair with FreeBayes
sv call SVs with LUMPY
realign realign from a coordinate sorted BAM file
options: -h show this message

/home/xxzhang/miniconda3/bin/lofreq call-parallel --pp-threads 32 -s --sig 1 --bonf 1 -C 7 -f ./geneome/hg19/hg19.fa -S ./geneome/hg19/hg19.fa_resource/dbsnp.hg19.vcf --call-indels -l ./geneome/hg19/hg19.fa.exon.bed -o ./output_RNA/lofreq_n.vcf /home/xxzhang/workplace/QBRC/output_RNA/normal/normal.bam

Calling external LoFreq script via execvp failed: No such file or directory

这个问题,应该如何解决呢?
后来查看lofreq的路径,发现call-parallel指令,以lofreq2_call_pparallel.py作为呈现方式。
修改指令为:
/home/xxzhang/miniconda3/bin/lofreq2_call_pparallel.py --pp-threads 32 -s --sig 1 --bonf 1 -C 7 -f ./geneome/hg19/hg19.fa -S ./geneome/hg19/hg19.fa_resource/dbsnp.hg19.vcf --call-indels -l ./geneome/hg19/hg19.fa.exon.bed -o ./output_RNA/lofreq_n.vcf /home/xxzhang/workplace/QBRC/output_RNA/normal/normal.bam

与之对应的,就是下游找不到该文件。

ERROR MESSAGE: Could not read file /home/xxzhang/workplace/QBRC/./output_RNA/somatic_diffs.readct.vcf because file ‘./output_RNA/somatic_diffs.readct.vcf’ does not exist

ERROR MESSAGE: Could not read file /home/xxzhang/workplace/QBRC/./output_RNA/speedseq2.vcf because file ‘./output_RNA/speedseq2.vcf’ does not exist

以及,下游进行筛选的过程中,遇到问题:

报错如下:
/home/xxzhang/workplace/software/R/R-3.6.3/bin/Rscript /home/xxzhang/workplace/QBRC//somatic_script/filter_vcf2.R ./output_RNA/somatic_mutations.txt ./output_RNA/somatic_mutations.hg19_multianno.txt ./output_RNA/somatic_mutations_hg19.txt somatic

Error in read.table(args[2], stringsAsFactors = F, sep = “\t”, header = T) :
no lines available in input
Execution halted

(这里就是说,没有输入文件,归根究底还是前面的过程出了问题)

(18:20)根据上面的出错,修改somatic.pl的代码。
回到window平台中。


(18:30)开始运行程序,等待结果。
我觉得我今天有点不舒服,不舒服的原因在于,社交障碍(不愿意说话),觉得自己并不属于这里(踏踏实实的把自己的事情做好,就非常足够了),还有今天好大的雨啊。
究竟什么是做科研呢?为什么文献我一点都看不进去。
现在我的程序,正在服务器的后台运行中,我这段时间干什么呢(文献我看不进去)?
我知道自己要提升,但是眼下,让我有些乱了手脚,觉得自己心里乱糟糟的,我想回去整理一下思路。
亲爱的,你究竟想要什么?
(18:58)我害怕你陷入庸庸碌碌的麻木的忙碌之中,却忘记了自己想要什么?这是最可怕的。
我先回宿舍,把程序挂在这里运行。明天早上过来看结果。

上一篇:【水果识别分类】基于matlab形态学水果识别分类【含Matlab源码 1132期】


下一篇:人脑全映射网络