Comparative assessment of long-read error-correction software applied to RNA-sequencing data

Leandro Lima, Camille Marchet, Ségolène Caboche, Corinne Da Silva, Benjamin Istace, Jean-Marc Aury, Hélène Touzet, Rayan Chikhi
 

Abstract

Motivation: Long-read sequencing technologies offer promising alternatives to high-throughput short-read sequencing, especially in the context of RNA-sequencing. However, these technologies are currently hindered by high error rates in the output data, which affect analyses such as the identification of isoforms, exon boundaries, and open reading frames, and the creation of gene catalogues. Because such data are still new, computational methods are actively being developed, and options for the error correction of RNA-sequencing long reads remain limited.

Results: In this article, we evaluate the extent to which existing long-read DNA error-correction methods are capable of correcting cDNA Nanopore reads. We provide an automatic and extensive benchmark tool that reports not only classical error-correction metrics but also the effect of correction on gene families, isoform diversity, bias towards the major isoform, and splice site detection. We find that long-read error-correction tools that were originally developed for DNA are also suitable for the correction of RNA-sequencing data, especially in terms of increasing base-pair accuracy. Yet investigators should be warned that the correction process perturbs gene family sizes and isoform diversity. This work provides guidelines on which (or whether) error-correction tools should be used, depending on the application type.

Benchmarking software: https://gitlab.com/leoisl/LR_EC_analyser
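
To make "base-pair accuracy" concrete, here is a minimal sketch of one common way to estimate a per-read error rate from a spliced alignment. It is an illustration only and is not taken from LR_EC_analyser: the function name `error_rate_from_cigar`, the choice of denominator, and the example CIGAR strings are all assumptions. It expects an extended CIGAR string that distinguishes matches (`=`) from mismatches (`X`), and it ignores `N` operations, which represent introns skipped in RNA-to-genome alignments, as well as clipped bases.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def error_rate_from_cigar(cigar: str) -> float:
    """Estimate a read's error rate from an extended CIGAR string.

    Assumed definition (hypothetical, for illustration):
        (mismatches + inserted bases + deleted bases)
        / (matches + mismatches + inserted bases + deleted bases)
    Introns (N), clipping (S, H) and padding (P) are not counted.
    """
    counts = {"=": 0, "X": 0, "I": 0, "D": 0}
    for length, op in _CIGAR_OP.findall(cigar):
        if op == "M":
            # 'M' hides whether a column is a match or a mismatch.
            raise ValueError("need extended CIGAR with '=' and 'X', not 'M'")
        if op in counts:
            counts[op] += int(length)
    aligned = sum(counts.values())
    if aligned == 0:
        raise ValueError("CIGAR string contains no aligned columns")
    errors = counts["X"] + counts["I"] + counts["D"]
    return errors / aligned


# Made-up example: the same read before and after a hypothetical correction.
raw = error_rate_from_cigar("5S40=2X3I100N50=5D")
corrected = error_rate_from_cigar("5S45=100N55=")
```

Comparing `raw` and `corrected` for each read gives a simple view of how much a correction tool improves base-level accuracy. It says nothing about the isoform-level effects the abstract warns about, which need separate metrics.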
