[Information Technology] [2002.06] Manipulation, Analysis and Retrieval Systems for Audio Signals

This is a doctoral dissertation by George Tzanetakis of Princeton University (USA), 198 pages in total.

Digital audio, and especially music collections, are becoming a major part of the average computer user experience. Large digital audio collections of sound effects are also used by the movie and animation industry. Research areas that utilize large audio collections include Auditory Display, Bioacoustics, Computer Music, Forensics, and Music Cognition. In order to develop more sophisticated tools for interacting with large digital audio collections, research in Computer Audition algorithms and user interfaces is required.

In this work, a series of systems for manipulating, retrieving from, and analysing large collections of audio signals is described. The foundation of these systems is the design of new algorithms, and the application of existing ones, for automatic audio content analysis. The results of the analysis are used to build novel 2D and 3D graphical user interfaces for browsing and interacting with audio signals and collections. The proposed systems are based on techniques from the fields of Signal Processing, Pattern Recognition, Information Retrieval, Visualization, and Human-Computer Interaction. All the proposed algorithms and interfaces are integrated under MARSYAS, a free software framework designed for rapid prototyping of computer audition research. In most cases the proposed algorithms have been evaluated and informed by conducting user studies. New contributions of this work to the area of Computer Audition include a general multifeature audio texture segmentation methodology, feature extraction from MP3 compressed data, automatic beat detection and analysis based on the Discrete Wavelet Transform, and musical genre classification combining timbral, rhythmic, and harmonic features. The novel graphical user interfaces developed in this work are various tools for browsing and visualizing large audio collections, such as the Timbregram, TimbreSpace, GenreGram, and Enhanced Sound Editor.

  1. Introduction
  2. Representation
  3. Analysis
  4. Interaction
  5. Evaluation
  6. Implementation
  7. Conclusions
