Quantitative single-cell RNA-seq with unique molecular identifiers
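This paper argues that using UMIs to measure gene expression in scRNA-seq is both sound and advantageous.
These notes focus on how to analyze scRNA-seq data and how to handle ERCC spike-ins and UMIs.
Background: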
However, losses in cDNA synthesis and bias in cDNA amplification lead to severe quantitative errors.
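In single-cell sequencing each cell contains very little mRNA, so cDNA synthesis is inevitably lossy and the cDNA has to be amplified; without amplification the mRNA could not be detected at all. Amplification, however, creates a quantification problem: the degree of amplification cannot be controlled precisely, so expression values computed from the amplified material are inaccurate.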
The two main challenges in single-cell RNA-seq are the efficiency of cDNA synthesis (which sets the limit of detection) and the amplification bias (which reduces quantitative accuracy).
The main problems in single-cell sequencing are: 1. cDNA synthesis efficiency; 2. amplification bias.
In contrast, standard RNA-seq uses relative measures such as reads per kilobase per million reads
(RPKM), which mask differences in total mRNA content.
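Conventional RNA-seq quantifies expression with RPKM or FPKM, which are relative measures; UMI counting, by contrast, is absolute quantification (a count of molecules).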
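To make the "relative measure" point concrete, here is a minimal sketch of the standard RPKM formula applied to two toy cells. The gene names, lengths, and read counts are made up for illustration; cell B is assumed to yield exactly twice the reads of cell A for every gene, standing in for a cell with twice the mRNA content.

```python
def rpkm(read_counts, gene_lengths_bp):
    """RPKM = reads * 1e9 / (gene length in bp * total mapped reads)."""
    total_reads = sum(read_counts.values())
    return {
        gene: count * 1e9 / (gene_lengths_bp[gene] * total_reads)
        for gene, count in read_counts.items()
    }

# Hypothetical toy data, not taken from the paper.
gene_lengths_bp = {"geneX": 1000, "geneY": 2000}
cell_a = {"geneX": 100, "geneY": 300}
cell_b = {"geneX": 200, "geneY": 600}  # twice the reads of cell A for every gene

rpkm_a = rpkm(cell_a, gene_lengths_bp)
rpkm_b = rpkm(cell_b, gene_lengths_bp)

for gene in gene_lengths_bp:
    print(gene, rpkm_a[gene], rpkm_b[gene])
```

Because each count is divided by the cell's own total, the two cells end up with the same RPKM values even though cell B has twice as many reads, which is the sense in which RPKM masks differences in total mRNA content.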
so that the number of identical mRNA molecules is expected to be <100 for most genes.
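How UMIs work: a UMI is typically only 5 bp long, so there are only 4^5 = 1024 possible sequences. A UMI is a random sequence, not a specific label; its purpose is not to give every mRNA molecule in the cell its own dedicated tag. It only needs to distinguish the copies of the same mRNA from one another, and since most genes have fewer than 100 copies, 1024 UMIs are enough.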
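The counting idea can be sketched as follows. This is a simplified illustration, not the paper's pipeline: it assumes reads have already been reduced to hypothetical (cell, gene, UMI) tuples, that distinct UMIs are counted per gene, and that UMIs attach uniformly at random. The second function estimates how many distinct UMIs are expected when a given number of molecules of one gene draw from the 1024 possible sequences.

```python
from collections import defaultdict

UMI_LENGTH = 5
N_POSSIBLE_UMIS = 4 ** UMI_LENGTH  # 4 bases at 5 positions = 1024

def count_molecules(reads):
    """Count distinct UMIs per (cell, gene).

    reads: iterable of (cell, gene, umi) tuples, one per aligned read.
    Reads sharing cell, gene, and UMI are treated as amplification
    copies of a single original molecule.
    """
    umis_seen = defaultdict(set)
    for cell, gene, umi in reads:
        umis_seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}

def expected_distinct_umis(n_molecules, n_umis=N_POSSIBLE_UMIS):
    """Expected number of distinct UMIs among n_molecules random draws."""
    return n_umis * (1 - (1 - 1 / n_umis) ** n_molecules)

# Hypothetical toy reads: geneX has 4 reads but only 2 distinct UMIs.
reads = [
    ("cell1", "geneX", "ACGTA"),
    ("cell1", "geneX", "ACGTA"),  # amplification duplicate
    ("cell1", "geneX", "ACGTA"),  # amplification duplicate
    ("cell1", "geneX", "TTGCA"),
    ("cell1", "geneY", "ACGTA"),  # same UMI, different gene: separate molecule
]

counts = count_molecules(reads)
print(counts)
print(expected_distinct_umis(100))
```

Under these assumptions, 100 molecules of one gene are expected to carry about 95 distinct UMIs, so random collisions cause only a small undercount at the copy numbers quoted above.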
To ensure that successfully generated cDNA molecules are sequenced, it is crucial to sequence to a sufficient depth after amplification.
Single-cell libraries must therefore be sequenced to sufficient depth.
Pairwise correlation coefficients calculated