What is long-read sequencing?
October 2018
Emma Johnson
emma.johnson@phgfoundation.org
Sobia Raza
sobia.raza@phgfoundation.org
DNA sequencing, the process of reading part or all of the DNA of an organism, is helping to improve clinical care across different areas of medicine, from rare diseases and cancers to the management of infectious diseases.
Progress has been accelerated by the advancement of high-throughput next-generation sequencing (NGS) technologies, which are capable of reading the code of millions of small fragments of DNA in parallel. These have enabled faster sequencing with increased throughput, at falling costs. In recent years, new technologies that are capable of sequencing longer strands of DNA by reading single DNA molecules have advanced and become more prominent. This briefing explains what long-read sequencing (LRS) is, and how it differs from established short-read sequencing (SRS). The second, accompanying briefing, Long-Read Sequencing: Ready for the Clinic? describes the potential of these technologies for diagnostic sequencing in a clinical setting, and the challenges of implementing them in that context.
The essentials
Single-molecule, true long-read sequencers enable the production of reads that are considerably longer than those resulting from SRS. This has several inherent advantages
LRS can sequence parts of the genome that cannot easily be sequenced by short-read sequencing. Longer reads are more likely to look distinct compared to shorter reads, allowing them to be assembled together with less ambiguity
The two dominant producers of true long-read sequencing technologies are Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (Nanopore)
What is long-read sequencing?
The genome of most organisms (including humans) is too long to be sequenced as one continuous string.
Using next-generation ‘short-read’ sequencing, DNA is broken into short fragments that are amplified (copied) and then sequenced to produce ‘reads’.
Bioinformatic techniques are then used to piece together the reads like a jigsaw, into a continuous genomic sequence.
True LRS technologies – sometimes referred to as third generation sequencers – directly sequence single molecules of DNA in real time, often without the need for amplification.
This direct sequencing approach enables the production of reads that are considerably longer than those resulting from SRS.
Other, ‘synthetic’ long-read sequencing approaches utilise modified sample processing and conventional SRS to computationally reconstruct long reads from shorter sequencing reads.
True LRS represents the greatest departure from widely used short-read systems.
Currently, the two dominant producers of ‘true’ long-read sequencing technologies are Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (Nanopore).
Both have developed platforms for ‘real-time’ sequencing of nucleic acids (DNA and RNA) that is faster than current short-read technologies.
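The jigsaw problem described in this section can be illustrated with a small sketch. It is not a real assembler: the toy genome, the repeat and the function name below are all hypothetical, and reads are assumed to be error-free. The sketch simply counts how many reads of a given length have a sequence that occurs more than once in the genome, and so could not be placed unambiguously on sequence alone.

```python
from collections import Counter

# Hypothetical toy "genome": unique flanking sequence around two copies
# of the same 12-base repeat (CAGCAGCAGCAG). This is not real genomic data.
REPEAT = "CAG" * 4
TOY_GENOME = "GATTACA" + REPEAT + "TTGACC" + REPEAT + "AGGTCA"


def count_ambiguous_reads(genome: str, read_length: int) -> int:
    """Count error-free reads of a fixed length whose sequence
    occurs at more than one position in the genome."""
    # Take one read starting at every possible position.
    reads = [genome[i:i + read_length]
             for i in range(len(genome) - read_length + 1)]
    occurrences = Counter(reads)
    # A read is ambiguous if an identical read starts somewhere else.
    return sum(1 for read in reads if occurrences[read] > 1)


if __name__ == "__main__":
    for length in (4, 20):
        ambiguous = count_ambiguous_reads(TOY_GENOME, length)
        print(f"read length {length}: {ambiguous} ambiguous reads")
```

In this toy genome, some 4-base reads occur in several places, whereas every 20-base read is unique because it is longer than the repeat and so always includes some distinguishing flanking sequence. Real genomes contain repeats far longer than this, which is why read lengths of thousands of bases can matter.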
The benefits
There are several inherent benefits in using longer reads for the examination of genomic data; these can have advantages for clinical genome analysis.
• Genome assembly:
The human genome is over 3 billion DNA base pairs in length and contains many repetitive stretches of genetic code.
Like a complex jigsaw, reassembling the genome from short reads can be challenging, as many fragments look highly similar without additional context.
Long-read data can make this task simpler as the reads are more likely to look distinct, allowing them to be assembled together with less ambiguity and error.
Improvements in genome assembly are helping to close gaps in our knowledge of the genome and allow for a better understanding of the genetic causes of disease.
• Variant detection:
Some features of individual genomes are particularly difficult to detect and quantify with SRS technologies, for example:
large and complex rearrangements, large insertions or deletions of DNA, repetitive regions, highly polymorphic regions, or regions with low DNA nucleotide diversity.
Long reads can span across larger parts of these regions, so are able to detect more of these variants, which may be clinically relevant.
LRS may also enhance the ‘genome-wide’ detection of certain variants.
• Haplotype phasing:
In areas such as reproductive medicine it can be useful to know whether genetic variants exist on the same copy of the chromosome.
This can be determined using a process known as haplotype phasing.
Long reads are able to provide the long-range information for resolving haplotypes without additional statistical inference, maternal/paternal sequencing,
or sample preparation, as is required for an approximation of phasing using SRS.
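The idea behind haplotype phasing with long reads can be sketched as follows. This is a simplified illustration only, assuming two heterozygous sites, error-free reads and known read start positions; the haplotypes, site positions and function name are hypothetical and do not correspond to any real phasing tool. A read reveals which alleles sit on the same chromosome copy only if it covers both sites.

```python
from collections import Counter

# Hypothetical example: two copies of a 30-base region that differ
# at two variant sites (0-based positions 5 and 25).
SITE_A, SITE_B = 5, 25


def make_haplotype(allele_a: str, allele_b: str, length: int = 30) -> str:
    """Build a toy haplotype: a background of 'T' with one allele at each site."""
    bases = ["T"] * length
    bases[SITE_A] = allele_a
    bases[SITE_B] = allele_b
    return "".join(bases)


def observed_allele_pairs(reads, site_a, site_b):
    """Count allele pairs seen on reads that cover both sites.

    Each read is a (start_position, sequence) tuple. Reads that cover
    only one site carry no direct information about phase and are skipped.
    """
    pairs = Counter()
    for start, seq in reads:
        end = start + len(seq)
        if start <= site_a < end and start <= site_b < end:
            pairs[(seq[site_a - start], seq[site_b - start])] += 1
    return pairs


hap1 = make_haplotype("G", "C")
hap2 = make_haplotype("A", "G")

# Short reads (10 bases) each cover at most one of the two sites.
short_reads = [(0, hap1[0:10]), (20, hap1[20:30]),
               (0, hap2[0:10]), (20, hap2[20:30])]

# Long reads cover both sites in a single molecule.
long_reads = [(0, hap1[0:30]), (2, hap1[2:28]), (0, hap2[0:30])]

if __name__ == "__main__":
    print("short reads:", dict(observed_allele_pairs(short_reads, SITE_A, SITE_B)))
    print("long reads: ", dict(observed_allele_pairs(long_reads, SITE_A, SITE_B)))
```

Here the short reads yield no allele pairs at all, while the long reads directly show that G and C occur together on one copy and A and G on the other. Real data would also need to account for sequencing errors and many more variant sites.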
Beyond producing long reads, true LRS technologies have other features that present new opportunities.
Amongst these are:
• Portability:
In contrast to other sequencing platforms, Nanopore’s devices rely on detecting electronic rather than optical signals.
This allows the company to design devices as small as a USB memory stick, making them highly portable.
Many other sequencers, including the vast majority of SRS systems, are large desktop or free-standing machines.
Nanopore’s MinION device has been used to sequence samples in the field during the Ebola and Zika virus outbreaks and has even been used in space.
• Real-time sequencing and speed:
Compared to the fixed run times of SRS systems, both PacBio and Oxford Nanopore offer faster sequencing runs.
PacBio provides options for rapid sequencing that can be completed in less than 24 hours, from sample preparation to analysis.
Nanopore technologies permit real-time analyses and allow experimental run time to be determined by the user, giving the user the ability to track data collection and begin analyses as desired.
This provides additional flexibility and speed, and removes the need for batch sequencing of multiple samples, which is currently required for cost-effective SRS.
It is particularly useful when examining small genomes (such as those of many pathogens) or specific genomic regions.
• Other ‘omics:
Long-read technologies have been used to directly sequence RNA.
They may also allow simultaneous detection of epigenetic modifications (chemical modifications to DNA/RNA that affect how genes are expressed), although additional bioinformatic interpretation is required.
Separate sequencing runs need to be performed to retrieve this information using current SRS systems.
Conclusion
The inherent benefits of utilising longer reads for genome reconstruction and analysis, alongside the additional potential advantages true LRS systems present for genome analysis, could be beneficial for the diagnosis of several diseases and disorders.
However, LRS systems also present their own challenges and come with some limitations;
these, and their potential for use in clinical sequencing, are discussed in the accompanying briefing.