A Roundup of Open-Source Speech Recognition Projects

2023-11-08 21:21:40

1. Mainstream speech recognition toolkits

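(1) ESPNET

Recommendation: ★★★★★

Stars: 4.4k

Features: Supports multiple speech tasks and multiple end-to-end ASR systems. It is currently the most active open-source speech community and a typical representative of third-generation, end-to-end ASR systems.

Link: https://github.com/espnet/espnet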

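(2) kaldi

Recommendation: ★★★★☆

Stars: 11k

Features: Written in C++ with a rich set of tools. It was the most active open-source community from 2012 to 2018 and is a typical representative of second-generation, neural-network-based ASR systems.

Link: https://github.com/kaldi-asr/kaldi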

(3) wenet

Recommendation: ★★★★☆

Stars: 1.5k

Features: Built on PyTorch, with relatively concise code and runtime support for multiple platforms.

Link: https://github.com/wenet-e2e/wenet

(4) speechbrain

Recommendation: ★★★★☆

Stars: 3.3k

Features: Pure Python, with a design that emphasizes ease of use.

Link: https://github.com/speechbrain/speechbrain

(5) ASRT

Recommendation: ★★★★☆

Stars: 4.9k

Features: End-to-end training.

Link: https://github.com/nl8590687/ASRT_SpeechRecognition

(6) openasr

Recommendation: ★★☆☆☆

Stars: fewer than 100

Link: https://github.com/by2101/OpenASR

(7) openspeech

Recommendation: ★★☆☆☆

Stars: 300+

Link: https://github.com/openspeech-team/openspeech

(8) lingvo

Recommendation: ★★★☆☆

Stars: 2.3k

Features: A neural network toolkit developed by Google on top of TensorFlow; it covers multiple tasks, including ASR.

Link: https://github.com/tensorflow/lingvo

(9) fairseq

Recommendation: ★★★☆☆

Stars: 14.4k

Features: A sequence-to-sequence modeling toolkit developed by Meta on top of PyTorch; it covers multiple tasks, including ASR.

Link: https://github.com/pytorch/fairseq

(10) athena

Stars: 700+

Features: An end-to-end speech processing toolkit that likewise covers multiple tasks, including ASR.

Link: https://github.com/athena-team/athena

(11) deepspeech

Stars: 18.5k

Link: https://github.com/mozilla/DeepSpeech

(12) wav2letter

Stars: 5.9k

Link: https://github.com/flashlight/wav2letter

(13) CAT

Stars: 100+

Features: An ASR system based on CTC-CRF.

Link: https://github.com/thu-spmi/CAT

(14) torchaudio

Stars: 1.5k

Features: The audio library for PyTorch.

Link: https://github.com/pytorch/audio

(15) htk

Recommendation: ★★☆☆☆

Features: Written in C; a typical representative of first-generation, HMM-based ASR systems.

Link: https://htk.eng.cam.ac.uk/

2. Other utility toolkits and libraries

(1) kaldiio

Link: https://github.com/nttcslab-sp/kaldiio

(2) librosa

Link: https://github.com/librosa/librosa

(3) warp-ctc

Link: https://github.com/baidu-research/warp-ctc

(4) warp-transducer

Link: https://github.com/HawkAaron/warp-transducer

(5) k2

Link: https://github.com/k2-fsa/k2

(6) sctk

Link: https://github.com/usnistgov/SCTK
