Comparison of long-read sequencing technologies in the hybrid assembly of complex bacterial genomes

Abstract
Illumina sequencing allows rapid, cheap and accurate whole-genome bacterial analyses, but short reads (<300 bp) do not usually enable complete genome assembly. Long-read sequencing greatly assists with resolving complex bacterial genomes, particularly when combined with short-read Illumina data (hybrid assembly). However, it is not clear how different long-read sequencing methods affect hybrid assembly accuracy. Relative automation of the assembly process is also crucial to facilitating high-throughput complete bacterial genome reconstruction, avoiding multiple bespoke filtering and data manipulation steps. In this study, we compared hybrid assemblies for 20 bacterial isolates, including two reference strains, using Illumina sequencing and long reads from either the Oxford Nanopore Technologies (ONT) or the Pacific Biosciences (PacBio) SMRT sequencing platform. We chose isolates from the family Enterobacteriaceae, as these frequently have highly plastic, repetitive genetic structures, and complete genome reconstruction for these species is relevant for a precise understanding of the epidemiology of antimicrobial resistance. We assembled genomes de novo using the hybrid assembler Unicycler and compared different read processing strategies; we also compared hybrid assembly with long-read-only assembly using Flye followed by short-read polishing with Pilon. Hybrid assembly with either PacBio or ONT reads facilitated high-quality genome reconstruction and was superior to the evaluated long-read assembly and polishing approach in both accuracy and completeness. Combining ONT and Illumina reads fully resolved most genomes without additional manual steps, at a lower consumables cost per isolate in our setting. Automated hybrid assembly is a powerful tool for complete and accurate bacterial genome assembly.

Keywords:
hybrid assembly, bacterial genomics, long-read sequencing, Enterobacteriaceae, plasmid assembly
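The two assembly routes compared in the abstract differ mainly in how many separate steps they need. The minimal Python sketch below only builds the command lines for each route; it does not run anything. It uses the documented entry points of Unicycler, Flye, BWA, SAMtools and Pilon, but the file names, the use of Flye's raw-read modes, and a single round of Pilon polishing are illustrative assumptions, not the parameters used in the study. The helper names `unicycler_hybrid` and `flye_then_pilon` are hypothetical.

```python
import shlex
from typing import List


def unicycler_hybrid(r1: str, r2: str, long_reads: str, out_dir: str) -> List[str]:
    """Route 1: a single Unicycler run on paired Illumina reads plus long reads."""
    return [shlex.join(["unicycler", "-1", r1, "-2", r2, "-l", long_reads, "-o", out_dir])]


def flye_then_pilon(r1: str, r2: str, long_reads: str, platform: str,
                    out_dir: str, pilon_jar: str = "pilon.jar") -> List[str]:
    """Route 2: long-read-only Flye assembly, then ONE round of Pilon polishing (assumption)."""
    # Flye must be told which raw-read error profile to expect.
    read_flag = {"ont": "--nano-raw", "pacbio": "--pacbio-raw"}[platform]
    draft = f"{out_dir}/assembly.fasta"        # Flye writes its final assembly here
    bam = f"{out_dir}/illumina_vs_draft.bam"   # hypothetical file name
    # Map the Illumina pairs to the draft and sort the alignments from stdin ("-").
    map_and_sort = (
        shlex.join(["bwa", "mem", draft, r1, r2])
        + " | "
        + shlex.join(["samtools", "sort", "-o", bam, "-"])
    )
    return [
        shlex.join(["flye", read_flag, long_reads, "--out-dir", out_dir]),
        shlex.join(["bwa", "index", draft]),
        map_and_sort,
        shlex.join(["samtools", "index", bam]),
        shlex.join(["java", "-jar", pilon_jar, "--genome", draft,
                    "--frags", bam, "--outdir", f"{out_dir}/pilon"]),
    ]


if __name__ == "__main__":
    for cmd in unicycler_hybrid("R1.fq.gz", "R2.fq.gz", "ont.fq.gz", "uni_out"):
        print(cmd)
    for cmd in flye_then_pilon("R1.fq.gz", "R2.fq.gz", "ont.fq.gz", "ont", "flye_out"):
        print(cmd)
```

Executing the returned strings (for example with `subprocess.run(cmd, shell=True, check=True)`) is deliberately left out; the sketch only illustrates that the hybrid route is one command while the long-read-then-polish route chains five.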
