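HVite is the decoding tool: given an input speech signal, together with the dictionary, the acoustic models and the language model, it outputs the corresponding transcription.
First, the structure of the dictionary (Vocab) is as follows: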
typedef struct {
   int nwords;          /* total number of words */
   int nprons;          /* total number of prons */
   Word nullWord;       /* dummy null word/node */
   Word subLatWord;     /* special word for HNet subLats */
   Word *wtab;          /* hash table for DictEntry's */
   MemHeap heap;        /* storage for dictionary */
   MemHeap wordHeap;    /* for DictEntry structs */
   MemHeap pronHeap;    /* for WordPron structs */
   MemHeap phonesHeap;  /* for arrays of phones */
} Vocab;
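It holds the number of words, the number of pronunciations, and a hash table of dictionary entries (DictEntry), in which each slot is a pointer to a DictEntry (the Word type). The DictEntry structure is as follows: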
typedef struct _DictEntry{
   LabId wordName;  /* word identifier */
   Pron pron;       /* first pronunciation */
   int nprons;      /* number of prons for this word */
   Word next;       /* next word in hash table chain */
   void *aux;       /* hook used by HTK library modules for temp info */
} DictEntry;
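It records the name of the current word, its first pronunciation, how many pronunciations the word has (more than one when nprons is greater than 1), and the next entry in the same hash chain. After that come steps such as initializing the HMMSet and loading the model parameters, which were already analyzed for the earlier tools.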
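To make the hash-table layout concrete, here is a minimal sketch of a chained word table. It is not HTK code: the types SketchEntry and SketchVocab, the table size, the string hash, and the functions sketch_get_word and sketch_add_pron are all hypothetical stand-ins, and a plain C string replaces LabId. It only illustrates how a slot in wtab leads to a chain of entries linked through next.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_HASH_SIZE 8            /* assumption: tiny table, for illustration only */

typedef struct SketchEntry {
    const char *wordName;             /* stand-in for LabId wordName */
    int nprons;                       /* number of prons for this word */
    struct SketchEntry *next;         /* next word in the same hash chain */
} SketchEntry;

typedef struct {
    int nwords;                       /* total number of words */
    int nprons;                       /* total number of prons */
    SketchEntry *wtab[SKETCH_HASH_SIZE];
} SketchVocab;

/* Assumption: a simple multiplicative string hash, not the one HTK uses. */
static unsigned sketch_hash(const char *s)
{
    unsigned h = 0;
    while (*s) h = h * 31u + (unsigned char)*s++;
    return h % SKETCH_HASH_SIZE;
}

/* Look a word up; if it is missing and insert is non-zero, add it to the chain. */
SketchEntry *sketch_get_word(SketchVocab *v, const char *name, int insert)
{
    unsigned h;
    SketchEntry *e;

    assert(v != NULL && name != NULL);
    h = sketch_hash(name);
    for (e = v->wtab[h]; e != NULL; e = e->next)
        if (strcmp(e->wordName, name) == 0) return e;
    if (!insert) return NULL;

    e = calloc(1, sizeof *e);
    if (e == NULL) return NULL;
    e->wordName = name;               /* the caller keeps the string alive */
    e->next = v->wtab[h];             /* push onto the front of the chain */
    v->wtab[h] = e;
    v->nwords++;
    return e;
}

/* Register one more pronunciation for a word (counters only in this sketch). */
void sketch_add_pron(SketchVocab *v, SketchEntry *e)
{
    assert(v != NULL && e != NULL);
    e->nprons++;
    v->nprons++;
}
```

Looking up a word with insert set to 0 is a pure query. With insert non-zero the same call creates the entry on first use, so a second call with the same name returns the same pointer and nwords does not grow.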
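During decoding, the important data structure that has not been covered before is the Lattice. Once the structure and role of the Lattice are understood, the Viterbi algorithm is mostly understood as well, so the Lattice can be seen as the key to mastering Viterbi decoding.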
typedef struct lattice
{
   MemHeap *heap;          /* Heap lattice uses */
   LatFormat format;       /* indicate which fields are valid */
   Vocab *voc;             /* Dictionary lattice based on */
   int nn;                 /* Number of nodes */
   int na;                 /* Number of arcs */
   LNode *lnodes;          /* Array of lattice nodes */
   LArc *larcs;            /* Array of lattice arcs */
   LabId subLatId;         /* Lattice Identifier (for SubLats only) */
   SubLatDef *subList;     /* List of sublats in this lattice level */
   SubLatDef *refList;     /* List of all SubLats referring to this lat */
   struct lattice *chain;  /* Linked list used for various jobs */
   char *utterance;        /* Utterance file name (NULL==unknown) */
   char *vocab;            /* Dictionary file name (NULL==unknown) */
   char *hmms;             /* MMF file name (NULL==unknown) */
   char *net;              /* Network file name (NULL==unknown) */
   float acscale;          /* Acoustic scale factor */
   float lmscale;          /* LM scale factor */
   LogFloat wdpenalty;     /* Word insertion penalty */
   float prscale;          /* Pronunciation scale factor */
   HTime framedur;         /* Frame duration in 100ns units */
   float logbase;          /* base of logarithm for likelihoods in lattice files
                              (1.0 = default (e), 0.0 = no logs) */
   float tscale;           /* time scale factor (default: 1, i.e. seconds) */
   Ptr hook;               /* User definable hook */
} Lattice;
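First, let us get an overall picture of what a Lattice is, and then refine it step by step. The Lattice structure records how many nodes and how many arcs there are, the arrays holding those nodes and arcs, its sub-lattices and the list of sub-lattices that refer to it, and a set of scale factors and other parameters.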
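To show why a node array, an arc array and a few scale factors are enough to support a Viterbi-style search, here is a minimal sketch. It is not HTK code: SketchNode, SketchArc, SketchLattice and both functions are hypothetical, the pronunciation score and prscale are left out, and the way the scale factors are combined into one arc score is an assumption based on how the fields are described in the struct. It also assumes every arc runs from a lower-numbered node to a higher-numbered one, so visiting nodes in index order is a valid topological order.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_LZERO (-1.0e10)   /* assumption: stand-in for "log of zero" */

typedef struct { double time; } SketchNode;       /* node = a point in time */

typedef struct {
    int start, end;              /* indices into the node array */
    double aclike;               /* acoustic log likelihood of the arc */
    double lmlike;               /* language model log likelihood of the arc */
} SketchArc;

typedef struct {
    int nn, na;                  /* number of nodes / arcs */
    SketchNode *lnodes;          /* array of nodes */
    SketchArc *larcs;            /* array of arcs */
    double acscale, lmscale;     /* acoustic and LM scale factors */
    double wdpenalty;            /* word insertion penalty (log domain) */
} SketchLattice;

/* Combined log score of one arc (assumed weighting, see lead-in). */
double sketch_arc_total(const SketchLattice *lat, const SketchArc *a)
{
    assert(lat != NULL && a != NULL);
    return a->aclike * lat->acscale + a->lmlike * lat->lmscale + lat->wdpenalty;
}

/* Best log score of any path from node 0 to node nn-1 (max over paths). */
double sketch_best_path(const SketchLattice *lat)
{
    double *best, result;
    int n, i;

    assert(lat != NULL && lat->nn > 0);
    best = malloc((size_t)lat->nn * sizeof *best);
    if (best == NULL) return SKETCH_LZERO;

    for (n = 0; n < lat->nn; n++) best[n] = SKETCH_LZERO;
    best[0] = 0.0;                                /* start node costs nothing */

    for (n = 0; n < lat->nn; n++) {               /* nodes in topological order */
        if (best[n] <= SKETCH_LZERO) continue;    /* node not reachable */
        for (i = 0; i < lat->na; i++) {           /* relax arcs leaving node n */
            const SketchArc *a = &lat->larcs[i];
            double s;
            if (a->start != n) continue;
            s = best[n] + sketch_arc_total(lat, a);
            if (s > best[a->end]) best[a->end] = s;
        }
    }
    result = best[lat->nn - 1];
    free(best);
    return result;
}
```

The inner loop scans all arcs for every node, which keeps the sketch short. A real implementation would more likely keep per-node arc lists and also store back-pointers so the best word sequence can be recovered, not just its score.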