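1. Data processing record
[1] Save the data files to the local computer with the browser's "Save As" function, then upload them to the server.
[2] Use the fastq-dump command from the SRA Toolkit to convert the SRA data to FASTQ format.
# The command is as follows: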
(base) zexing@DNA:~/projects/sunxiaoyu/RNA_seq/2021_03_15/clean_data$ fastq-dump --split-e SRR11906971 SRR11906972 SRR11906973 SRR11906974
Read 27386449 spots for SRR11906971
Written 27386449 spots for SRR11906971
Read 40512990 spots for SRR11906972
Written 40512990 spots for SRR11906972
Read 35874167 spots for SRR11906973
Written 35874167 spots for SRR11906973
Read 33734879 spots for SRR11906974
Written 33734879 spots for SRR11906974
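A quick sanity check on the conversion is to count the records in each FASTQ file and compare the result with the spot counts reported above. The sketch below assumes uncompressed FASTQ files with exactly four lines per record; `count_fastq_records` is a hypothetical helper name, not part of the SRA Toolkit.

```shell
# Hypothetical helper: count the records in an uncompressed FASTQ file.
# Assumes each record occupies exactly four lines (no wrapped sequence lines).
count_fastq_records() {
    local fastq="$1"
    awk 'END { printf "%d\n", NR / 4 }' "$fastq"
}

# Example usage (file name follows the ${i}_1.fastq pattern used later in this record):
# count_fastq_records SRR11906971_1.fastq
```

For paired-end runs, the two mate files would be expected to report the same count. That count matches the spot count only if no unpaired reads were split off into a separate file.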
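[3] Process the RNA-seq data on the Linux server
Create RNA_seq_script with vim to combine data quality control, alignment, format conversion, sorting, assembly, and quantification into a single script: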
#!/bin/bash
# The line above declares that this script uses bash syntax, so it is interpreted by bash when executed.
# Program
# This program is used for RNA-seq data analysis.
# History
# 2021/03/15 zexing First release
# Set ${dir} to the project working directory
dir=/f/xudonglab/zexing/projects/sunxiaoyu/RNA_seq/2021_03_15
# Loop over the samples and run the downstream steps for each one
for i in SRR11906971 SRR11906972 SRR11906973 SRR11906974
do
# Align the reads to the reference
hisat2 -t -p 16 -x /f/xudonglab/zexing/reference/UCSC_mm10/hisat2_index/hisat2_index_mm10 \
-1 ${dir}/clean_data/${i}_1.fastq \
-2 ${dir}/clean_data/${i}_2.fastq \
-S ${dir}/sam/${i}.sam
# Convert the alignments from SAM to BAM
samtools view -@ 16 -S ${dir}/sam/${i}.sam -1b -o ${dir}/bam/${i}.bam
# Sort the BAM file
samtools sort -@ 16 -l 5 -o ${dir}/bam_sort/${i}.bam.sort ${dir}/bam/${i}.bam
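# Assemble and quantify the data
# (with -e, StringTie quantifies only the reference transcripts supplied via -G)
mkdir -p ${dir}/ballgown/"$i"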
stringtie ${dir}/bam_sort/"$i".bam.sort -o ${dir}/ballgown/"$i"/"$i".gtf \
-p 16 -G /f/xudonglab/zexing/reference/UCSC_mm10/mm10_genes.gtf -e -B \
-A ${dir}/ballgown/"$i"/"$i".gene.tab
done
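The loop above writes into sam/, bam/, bam_sort/, and ballgown/ under ${dir}, so those directories need to exist before the first sample is processed. A minimal sketch of a setup step follows, assuming the same directory layout. `prepare_output_dirs` is a hypothetical helper name. It uses `mkdir -p` so that re-running the script does not fail on directories that already exist.

```shell
# Hypothetical helper: create the output directories the pipeline writes to.
# Arguments: base directory, followed by the sample (run) accessions.
prepare_output_dirs() {
    local base="$1"
    shift
    # -p creates missing parent directories and is a no-op for existing ones
    mkdir -p "$base/sam" "$base/bam" "$base/bam_sort" || return 1
    local sample
    for sample in "$@"; do
        mkdir -p "$base/ballgown/$sample" || return 1
    done
}

# Example usage, placed before the for loop:
# prepare_output_dirs "$dir" SRR11906971 SRR11906972 SRR11906973 SRR11906974
```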
Run RNA_seq_script in the background:
nohup bash RNA_seq_script > RNA_seq_script_log &
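Because the job runs detached from the terminal, it can help to record its process ID so that progress can be checked later. A minimal sketch follows, assuming the script and log names used above. `run_detached` and the `.pid` file are hypothetical additions, not part of the original workflow.

```shell
# Hypothetical helper: start a script with nohup and record its PID.
# Arguments: script path, log path. The PID is written to "<log>.pid".
run_detached() {
    local script="$1" log="$2"
    # Send both stdout and stderr to the log so all tool messages end up in one file
    nohup bash "$script" > "$log" 2>&1 &
    echo "$!" > "$log.pid"
}

# Example usage:
# run_detached RNA_seq_script RNA_seq_script_log
# kill -0 "$(cat RNA_seq_script_log.pid)" && echo "still running"
# tail -n 20 RNA_seq_script_log
```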