Summary of Singing Voice Synthesis Methods and Tools (1)

2023-12-05 11:20:10

Mainstream methods

Tacotron + WaveNet = Tacotron 2
1. Demo: https://colab.research.google.com/github/r9y9/Colaboratory/blob/master/Tacotron2_and_WaveNet_text_to_speech_demo.ipynb

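5. Vocoders
1. WORLD
1. GitHub: https://github.com/r9y9/wavenet_vocoder
2. WORLD mainly extracts three features: pitch (the fundamental frequency, F0), the harmonic spectral envelope, and the aperiodic spectral envelope (aperiodicity).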

librosa, an audio feature extraction toolkit

Music information retrieval (MIR)
Applications

Current commercial applications of MIR mainly include:

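1. Recommender systems
Music recommendation is widely deployed, but few systems are built on MIR techniques. The mainstream approach classifies music from indirect data such as manual tags, user comments, and listening history, and recommends on that basis, even though the audio itself carries a great deal of similarity information between pieces.
2. Track separation and instrument recognition
Separate a piece of music into its tracks and identify which instruments are playing.
3. Automatic transcription
Automatically convert music into a MIDI file or a score.
4. Music classification
Classify music with machine learning methods, using features such as place of origin, artist identity, and rhythm.
5. Automatic music generation
Train models on a music database so that a machine can compose music on its own.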

librosa core functions (see the official librosa documentation)
3.1 Audio signal processing

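load(path[, sr, mono, offset, duration, …]): load an audio file as a time series
to_mono(y): convert a signal to mono
resample(y, orig_sr, target_sr[, res_type, …]): resample a time series from orig_sr to target_sr
get_duration([y, sr, S, n_fft, hop_length, …]): compute the duration of an audio signal or file
autocorrelate(y[, max_size, axis]): bounded auto-correlation
zero_crossings(y[, threshold, …]): find the zero crossings of a signal
tone(frequency[, sr, length, duration, phi]): return a pure tone signal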
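To make a few of these operations concrete, here is a minimal NumPy-only sketch of what tone generation, duration, zero-crossing counting, and mono conversion compute. It is an illustration under assumed values (the sampling rate, frequency, and variable names are all my own), not librosa's implementation, and it leaves out the options the real functions offer.

```python
import numpy as np

sr = 22050          # sampling rate in Hz (assumed value)
frequency = 440.0   # tone frequency in Hz (assumed value)
duration = 1.0      # tone length in seconds (assumed value)

# Pure cosine tone, similar in spirit to what tone() returns
t = np.arange(int(duration * sr)) / sr
y = np.cos(2 * np.pi * frequency * t)

# Duration in seconds = number of samples / sampling rate
duration_seconds = len(y) / sr

# A zero crossing occurs wherever two consecutive samples differ in sign
signs = np.signbit(y)
zero_crossing_count = int(np.sum(signs[1:] != signs[:-1]))

# Mono conversion: average the channels of a (channels, samples) array
stereo = np.stack([y, 0.5 * y])
mono = stereo.mean(axis=0)
```

A 440 Hz tone crosses zero twice per cycle, so the count over one second should come out close to 880.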
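3.2 Spectral representations
stft(y[, n_fft, hop_length, win_length, …]): short-time Fourier transform (STFT)
istft(stft_matrix[, hop_length, win_length, …]): inverse short-time Fourier transform
ifgram(y[, sr, n_fft, hop_length, …]): compute the instantaneous-frequency spectrogram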
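The idea behind the STFT can be sketched in a few lines of NumPy: cut the signal into overlapping frames, window each frame, and take an FFT per frame. The function name `simple_stft` and its defaults are hypothetical; this naive version does no centering or padding and uses a symmetric Hann window, so its output will not match librosa's `stft` exactly. It also assumes the signal is at least `n_fft` samples long.

```python
import numpy as np

def simple_stft(y, n_fft=2048, hop_length=512):
    """Naive STFT sketch: Hann-windowed frames, one FFT per frame, no padding."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack(
        [y[i * hop_length : i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    # Result has shape (1 + n_fft // 2, n_frames): frequency bins by frames
    return np.fft.rfft(frames, axis=0)

# Usage: the strongest bin of a 440 Hz sine should sit near 440 Hz
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
D = simple_stft(y)
magnitude = np.abs(D)
peak_bin = int(np.argmax(magnitude.mean(axis=1)))
peak_hz = peak_bin * sr / 2048  # bin index times the bin spacing sr / n_fft
```

The frequency resolution is sr / n_fft (about 10.8 Hz here), so the peak can only be located to the nearest bin.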
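3.3 Magnitude scaling
amplitude_to_db(S[, ref, amin, top_db]): convert an amplitude spectrogram to dB units
db_to_amplitude(S_db[, ref]): convert a dB-scaled spectrogram to an amplitude spectrogram
power_to_db(S[, ref, amin, top_db]): convert a power spectrogram to dB units
db_to_power(S_db[, ref]): convert a dB-scaled spectrogram to a power spectrogram
perceptual_weighting(S, frequencies, **kwargs): perceptually weighted power spectrogram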
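The dB conversions are just logarithms with different factors: 20·log10 for amplitude and 10·log10 for power. The sketch below shows the arithmetic with NumPy. The `_sketch` function names are hypothetical, the `amin` defaults are assumptions, and the `top_db` clipping of the real functions is deliberately left out.

```python
import numpy as np

def amplitude_to_db_sketch(S, ref=1.0, amin=1e-5):
    # Amplitude ratio to dB: 20 * log10(|S| / ref), floored at amin to avoid log(0)
    return 20.0 * np.log10(np.maximum(np.abs(S), amin) / ref)

def db_to_amplitude_sketch(S_db, ref=1.0):
    # Inverse of the above: ref * 10^(dB / 20)
    return ref * 10.0 ** (np.asarray(S_db) / 20.0)

def power_to_db_sketch(S, ref=1.0, amin=1e-10):
    # Power ratio to dB: 10 * log10(S / ref)
    return 10.0 * np.log10(np.maximum(S, amin) / ref)

def db_to_power_sketch(S_db, ref=1.0):
    # Inverse of the above: ref * 10^(dB / 10)
    return ref * 10.0 ** (np.asarray(S_db) / 10.0)

amplitude = np.array([0.1, 1.0, 10.0])
amplitude_db = amplitude_to_db_sketch(amplitude)   # -20, 0, 20 dB
power_db = power_to_db_sketch(amplitude ** 2)      # same dB values via power
```

Because power is amplitude squared, both routes give the same dB value for the same signal.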

3.4 Time and frequency conversion

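frames_to_samples(frames[, hop_length, n_fft]): convert frame indices to audio sample indices
frames_to_time(frames[, sr, hop_length, n_fft]): convert frame indices to time (seconds)
samples_to_frames(samples[, hop_length, n_fft]): convert sample indices to STFT frame indices
samples_to_time(samples[, sr]): convert sample indices to time (seconds)
time_to_frames(times[, sr, hop_length, n_fft]): convert time to STFT frame indices
time_to_samples(times[, sr]): convert time to sample indices
hz_to_note(frequencies, **kwargs): convert frequencies to note names
hz_to_midi(frequencies): get the MIDI note numbers for given frequencies
midi_to_hz(notes): get the frequencies of MIDI note numbers
midi_to_note(midi[, octave, cents]): convert MIDI numbers to note names
note_to_midi(note[, round_midi]): convert note names to MIDI numbers
hz_to_mel(frequencies[, htk]): convert Hz to mels
hz_to_octs(frequencies[, A440]): convert frequencies (Hz) to octave numbers
mel_to_hz(mels[, htk]): convert mels to Hz
octs_to_hz(octs[, A440]): convert octave numbers to frequencies (Hz)
fft_frequencies([sr, n_fft]): center frequencies of the FFT bins
mel_frequencies([n_mels, fmin, fmax, htk]): compute an array of frequencies spaced on the mel scale
tempo_frequencies(n_bins[, hop_length, sr]): compute the frequencies (in beats per minute) corresponding to tempogram bins
samples_like(X[, hop_length, n_fft, axis]): return an array of sample indices matching the time axis of a feature matrix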
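Most of these conversions are one-line formulas. The sketch below writes a few of them out with the standard library only. The `_sketch` names are hypothetical, the defaults (sr = 22050, hop_length = 512) are assumptions, and the mel formula shown is the HTK variant, which corresponds to htk=True rather than librosa's default mel scale.

```python
import math

def frames_to_time_sketch(frame, sr=22050, hop_length=512):
    # Frame index -> seconds: each frame advances by hop_length samples
    return frame * hop_length / sr

def time_to_frames_sketch(time, sr=22050, hop_length=512):
    # Seconds -> frame index, rounded down
    return int(math.floor(time * sr / hop_length))

def hz_to_midi_sketch(frequency):
    # A4 = 440 Hz is MIDI note 69; each semitone is a factor of 2^(1/12)
    return 69.0 + 12.0 * math.log2(frequency / 440.0)

def midi_to_hz_sketch(note):
    return 440.0 * 2.0 ** ((note - 69.0) / 12.0)

def hz_to_mel_htk_sketch(frequency):
    # HTK mel formula (assumption: not the default mel scale in librosa)
    return 2595.0 * math.log10(1.0 + frequency / 700.0)

def mel_to_hz_htk_sketch(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

middle_c_hz = midi_to_hz_sketch(60)          # about 261.63 Hz
frame_at_one_second = time_to_frames_sketch(1.0)
```

With these defaults one frame covers 512 / 22050 ≈ 23.2 ms, so one second of audio spans 43 full hops.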

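3.5 Pitch and tuning

estimate_tuning([y, sr, S, n_fft, …]): estimate the tuning of an input audio signal or spectrogram
pitch_tuning(frequencies[, resolution, …]): estimate the tuning offset from a collection of pitches
3.6 Beat and tempo
beat_track([y, sr, onset_envelope, …]): track beats
tempo([y, sr, onset_envelope, hop_length, …]): estimate the tempo (beats per minute)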
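3.7 Display
specshow(data[, x_coords, y_coords, x_axis, …]): display a spectrogram
waveplot(y[, sr, max_points, x_axis, …]): plot the amplitude envelope of a waveform
cmap(data[, robust, cmap_seq, cmap_bool, …]): get a default colormap from the given data
3.8 Spectral features

3.9 Rhythm features

tempogram([y, sr, onset_envelope, …]): compute the tempogram, i.e. the local autocorrelation of the onset strength envelope.
3.10 Spectrogram decomposition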
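The link between autocorrelation of the onset strength envelope and tempo can be shown on synthetic data. The sketch below builds an artificial onset envelope with one impulse every 20 frames, autocorrelates it over the whole signal, and converts the strongest lag to beats per minute. This is a global autocorrelation under assumed values (sr, hop_length, the lag search range, and the envelope itself are all made up for illustration); a real tempogram applies the same idea locally inside a sliding window.

```python
import numpy as np

sr = 22050            # sampling rate in Hz (assumed value)
hop_length = 512      # samples between frames (assumed value)
frames_per_beat = 20  # synthetic beat period in frames

# Synthetic onset strength envelope: an impulse on every beat
onset_envelope = np.zeros(400)
onset_envelope[::frames_per_beat] = 1.0

# Autocorrelation for non-negative lags (index 0 is lag 0)
full = np.correlate(onset_envelope, onset_envelope, mode="full")
autocorr = full[len(onset_envelope) - 1:]

# Search a plausible lag range, skipping lag 0, and take the strongest lag
min_lag, max_lag = 5, 100
best_lag = min_lag + int(np.argmax(autocorr[min_lag:max_lag]))

# A lag of best_lag frames lasts best_lag * hop_length / sr seconds per beat
tempo_bpm = 60.0 * sr / (hop_length * best_lag)
```

Multiples of the true beat period also produce autocorrelation peaks, which is one reason real tempo estimators add a prior over plausible tempi.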

WaveNet vocoder

    1. GitHub repository
    2. Blog: https://r9y9.github.io/wavenet_vocoder/
