A single-molecule long-read survey of the human transcriptome

  • An Erratum to this article was published on 10 March 2014

Abstract

Global RNA studies have become central to understanding biological processes, but methods such as microarrays and short-read sequencing are unable to describe an entire RNA molecule from 5′ to 3′ end. Here we use single-molecule long-read sequencing technology from Pacific Biosciences to sequence the polyadenylated RNA complement of a pooled set of 20 human organs and tissues without the need for fragmentation or amplification. We show that full-length RNA molecules of up to 1.5 kb can readily be monitored with little sequence loss at the 5′ ends. For longer RNA molecules more 5′ nucleotides are missing, but complete intron structures are often preserved. In total, we identify ∼14,000 spliced GENCODE genes. High-confidence mappings are consistent with GENCODE annotations, but >10% of the alignments represent intron structures that were not previously annotated. As a group, transcripts mapping to unannotated regions have features of long, noncoding RNAs. Our results show the feasibility of deep sequencing full-length RNA from complex eukaryotic transcriptomes on a single-molecule level.

