Jabba: hybrid error correction for long sequencing reads using maximal exact matches
Third generation sequencing platforms produce longer reads with higher error rates than second generation sequencing technologies. While the improved read length can provide useful information for downstream analysis, the underlying algorithms are challenged by the high error rate. Error correction methods that use accurate short reads to correct noisy long reads are an attractive way to generate high-quality long reads. Methods that align short reads to long reads do not make optimal use of the information contained in the second generation data, and they suffer from long runtimes. Recently, a new hybrid error correction method has been proposed in which the second generation data is first assembled into a de Bruijn graph, onto which the long reads are then aligned. In this context we present Jabba, a hybrid method that corrects long third generation reads by mapping them onto a corrected de Bruijn graph constructed from second generation data. Unique to our method is that this mapping is constructed with a seed-and-extend methodology, using maximal exact matches as seeds. In addition to benchmark results, we present theoretical results on the possibilities and limitations of using maximal exact matches in the context of third generation reads.
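To make the notion of a maximal exact match (MEM) seed concrete, the following is a minimal sketch and not Jabba's implementation. It assumes a plain reference string stands in for a sequence spelled by the de Bruijn graph, and it uses a brute-force scan of every diagonal rather than an index structure. The function name `maximal_exact_matches` and the `min_len` parameter are hypothetical names chosen for illustration.

```python
def maximal_exact_matches(read, ref, min_len=4):
    """Return sorted (read_pos, ref_pos, length) tuples for every exact match
    of at least min_len characters that cannot be extended left or right."""
    mems = []
    n, m = len(read), len(ref)
    # Each diagonal fixes offset = ref_pos - read_pos; scan it once.
    for offset in range(-(n - 1), m):
        i = max(0, -offset)   # start position in the read
        j = i + offset        # matching start position in the reference
        run = 0               # length of the current run of equal characters
        while i < n and j < m:
            if read[i] == ref[j]:
                run += 1
            else:
                # A mismatch ends the run, so the run is maximal on both sides.
                if run >= min_len:
                    mems.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        # A run that reaches the end of either string is also maximal.
        if run >= min_len:
            mems.append((i - run, j - run, run))
    return sorted(mems)
```

For example, calling `maximal_exact_matches("GATTTCAGGC", "GATTACAGGC", min_len=4)` yields two seeds, `(0, 0, 4)` and `(5, 5, 5)`, one on each side of the single substituted base. A seed-and-extend mapper would then chain such seeds and align only the short gap between them. The scan is quadratic in the sequence lengths, so it is suitable for illustration only.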