This is a master's thesis from Memorial University of Newfoundland, Canada (author: Songyuan Ji), 109 pages in total.
The study of single nucleotide polymorphisms (SNPs) associated with human diseases is important for identifying pathogenic genetic variants and illuminating the genetic architecture of complex diseases. A genome-wide association study (GWAS) examines genetic variation across individuals and detects disease-related SNPs. Traditional machine learning methods often treat SNP data as a sequence to be analyzed and processed, and may therefore overlook the complex interactions among multiple genetic factors. In this thesis, we propose a new hybrid deep learning approach to identify susceptibility SNPs associated with colorectal cancer. A set of SNP variants was first selected by a hybrid feature selection algorithm and then organized as 3D images using a selection of space-filling curve models. A multi-layer deep convolutional neural network was constructed and trained on those images. We found that images generated with the space-filling curve model that preserves the original SNP locations in the genome yield the best classification performance. We also report a set of high-risk SNPs associated with colorectal cancer as the result of the deep neural network model.
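The step of arranging selected SNPs as an image along a space-filling curve can be illustrated with a minimal sketch. The abstract does not say which curves or which genotype encoding the thesis uses, so everything here is an assumption: the Hilbert curve stands in as one common space-filling curve, genotypes are encoded as 0/1/2 (minor-allele counts), and the names `hilbert_d2xy` and `snps_to_image` are hypothetical. The sketch produces a single-channel 2D grid rather than the 3D images mentioned above.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order by 2**order grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so consecutive indices stay adjacent.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def snps_to_image(genotypes, order, pad=0):
    """Place a genotype sequence on a square grid in Hilbert-curve order.

    Assumption: genotypes are already ordered by genomic position and
    encoded as small integers (e.g. 0/1/2). Unused cells keep `pad`.
    """
    side = 1 << order
    if len(genotypes) > side * side:
        raise ValueError("grid too small for the number of SNPs")
    image = [[pad] * side for _ in range(side)]
    for d, g in enumerate(genotypes):
        x, y = hilbert_d2xy(order, d)
        image[y][x] = g
    return image


if __name__ == "__main__":
    # Four hypothetical SNP genotypes on a 2 x 2 grid.
    for row in snps_to_image([0, 1, 2, 1], order=1):
        print(row)
```

Because consecutive indices on a Hilbert curve always land in neighbouring cells, SNPs that are close in the input ordering stay close in the image, which is the kind of locality a convolutional network can exploit.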
- Introduction
- Project background
- Research methods
- Results
- Discussion and conclusions
Appendix A.1 Python code for the discriminator
Appendix A.2 Genomic information of the final results