Combination of short-read, long-read, and optical mapping assemblies reveals large-scale tandem repeat arrays with population genetic implications

  1. Matthias H. Weissensteiner1,2
  2. Andy W.C. Pang3
  3. Ignas Bunikis4
  4. Ida Höijer4
  5. Olga Vinnere-Petterson4
  6. Alexander Suh1,5 and 
  7. Jochen B.W. Wolf1,2,5

Author Affiliations

  1. 1Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala, Sweden;
  2. 2Division of Evolutionary Biology, Faculty of Biology, Ludwig-Maximilian University of Munich, 82152 Planegg-Martinsried, Germany;
  3. 3BioNano Genomics, San Diego, California 91121, USA;
  4. 4SciLife Lab Uppsala, Uppsala University, SE-751 85 Uppsala, Sweden
  Corresponding authors: matthias.weissensteiner@ebc.uu.se, alexander.suh@ebc.uu.se
  5 These authors contributed equally to this work.

Abstract

Accurate and contiguous genome assembly is key to a comprehensive understanding of the processes shaping genomic diversity and evolution. Yet, it is frequently constrained by constitutive heterochromatin, usually characterized by highly repetitive DNA. As a key feature of genome architecture associated with centromeric and subtelomeric regions, it locally influences meiotic recombination. In this study, we assess the impact of large tandem repeat arrays on the recombination rate landscape in an avian speciation model, the Eurasian crow. We assembled two high-quality genome references using single-molecule real-time sequencing (long-read assembly [LR]) and single-molecule optical maps (optical map assembly [OM]). A three-way comparison including the published short-read assembly (SR) constructed for the same individual allowed assessing assembly properties and pinpointing misassemblies. By combining information from all three assemblies, we characterized 36 previously unidentified large repetitive regions in the proximity of sequence assembly breakpoints, the majority of which contained complex arrays of a 14-kb satellite repeat or its 1.2-kb subunit. Using whole-genome population resequencing data, we estimated the population-scaled recombination rate (ρ) and found it to be significantly reduced in these regions. These findings are consistent with an effect of low recombination in regions adjacent to centromeric or subtelomeric heterochromatin and add to our understanding of the processes generating widespread heterogeneity in genetic diversity and differentiation along the genome. By combining three different technologies, our results highlight the importance of adding a layer of information on genome structure that is inaccessible to each approach independently.
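For readers unfamiliar with the statistic: the population-scaled recombination rate is conventionally defined for diploids as ρ = 4N_e·r, where N_e is the effective population size and r is the per-generation recombination rate. The abstract does not say how the reduction near the repeat regions was tested. As a minimal sketch only, and not the authors' procedure, one way to ask whether ρ is lower in a set of flagged windows than elsewhere is a one-sided permutation test. The function name, the window values, and the number of permutations below are all hypothetical and chosen purely for illustration.

```python
import random
from statistics import mean


def permutation_test(flagged, background, n_perm=10_000, seed=0):
    """One-sided permutation test: is mean(flagged) lower than mean(background)?

    Returns the observed difference in means and an empirical p-value.
    This is an illustrative sketch, not the method used in the study.
    """
    rng = random.Random(seed)
    observed = mean(flagged) - mean(background)
    pooled = list(flagged) + list(background)
    k = len(flagged)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign window labels and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if diff <= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical per-window rho estimates (made-up numbers, not data from the paper).
near_repeats = [0.2, 0.1, 0.3, 0.15, 0.25]
elsewhere = [1.1, 0.9, 1.4, 0.8, 1.2, 1.0, 1.3, 0.7]

diff, p = permutation_test(near_repeats, elsewhere)
print(f"difference in mean rho: {diff:.3f}, empirical p-value: {p:.4f}")
```

A permutation test is used here because it makes no distributional assumption about ρ. A real analysis would also need to account for autocorrelation between neighboring windows, which this sketch ignores.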

Footnotes

  • [Supplemental material is available for this article.]
