HVite Source Code Analysis

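HVite is the decoding tool: given an input speech signal, together with the dictionary, the acoustic model, and the language model, it outputs the corresponding transcription.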

First, the dictionary (Vocab) structure is as follows:

typedef struct {
   int nwords;          /* total number of words */
   int nprons;          /* total number of prons */
   Word nullWord;       /* dummy null word/node */
   Word subLatWord;     /* special word for HNet subLats */
   Word *wtab;          /* hash table for DictEntry's */
   MemHeap heap;        /* storage for dictionary */
   MemHeap wordHeap;    /* for DictEntry structs  */
   MemHeap pronHeap;    /* for WordPron structs   */
   MemHeap phonesHeap;  /* for arrays of phones   */
} Vocab;

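It holds the number of words, the number of pronunciations, and a hash table of dictionary entries (DictEntry) in which each slot is a pointer to a DictEntry (the Word type). The DictEntry structure is as follows: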

typedef struct _DictEntry{
   LabId wordName;  /* word identifier */
   Pron pron;       /* first pronunciation */
   int nprons;      /* number of prons for this word */
   Word next;       /* next word in hash table chain */
   void *aux;       /* hook used by HTK library modules for temp info */
} DictEntry;

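It records the current word's name, its first pronunciation, whether the word has multiple pronunciations (nprons greater than 1), and so on. After that come steps such as initializing the HMMSet and loading the model parameters, which were already analyzed for the earlier tools.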
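The chained hash table described above can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not HTK's implementation: SketchEntry, SketchVocab, TAB_SIZE, sketch_hash and the other sketch_ names are hypothetical stand-ins, the word name is a plain C string instead of a LabId, the Pron list is reduced to a counter, and malloc replaces the MemHeap allocators.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TAB_SIZE 31   /* hypothetical table size, not HTK's value */

/* Simplified stand-in for DictEntry: name, pron count, chain link. */
typedef struct SketchEntry *SketchWord;
typedef struct SketchEntry {
   const char *wordName;   /* word identifier (a LabId in HTK) */
   int nprons;             /* number of prons for this word */
   SketchWord next;        /* next word in hash table chain */
} SketchEntry;

/* Simplified stand-in for Vocab: counters plus the hash table. */
typedef struct {
   int nwords;                  /* total number of words */
   int nprons;                  /* total number of prons */
   SketchWord wtab[TAB_SIZE];   /* one chain head per slot */
} SketchVocab;

/* Illustrative string hash; HTK's own hash function is not reproduced. */
unsigned sketch_hash(const char *s)
{
   unsigned h = 0;
   while (*s != '\0')
      h = h * 31u + (unsigned char)*s++;
   return h % TAB_SIZE;
}

/* Walk the chain in the word's slot; return NULL if it is absent. */
SketchWord sketch_find(SketchVocab *v, const char *name)
{
   SketchWord w;
   assert(v != NULL && name != NULL);
   for (w = v->wtab[sketch_hash(name)]; w != NULL; w = w->next)
      if (strcmp(w->wordName, name) == 0)
         return w;
   return NULL;
}

/* Return the existing entry, or link a new one at the head of its chain.
   The caller must keep the name string alive; entries are never freed here. */
SketchWord sketch_get(SketchVocab *v, const char *name)
{
   SketchWord w = sketch_find(v, name);
   unsigned h;
   if (w != NULL)
      return w;
   w = malloc(sizeof *w);
   if (w == NULL)
      return NULL;
   h = sketch_hash(name);
   w->wordName = name;
   w->nprons = 0;
   w->next = v->wtab[h];
   v->wtab[h] = w;
   v->nwords++;
   return w;
}

/* Count one more pronunciation for the word and for the whole vocabulary. */
void sketch_add_pron(SketchVocab *v, SketchWord w)
{
   w->nprons++;
   v->nprons++;
}
```

With a zero-initialized SketchVocab, calling sketch_get for "READ" and adding two pronunciations leaves that entry with nprons greater than 1, which is the multiple-pronunciation case mentioned above.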

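In decoding, the important data structure that has not been covered before is the Lattice. Once the structure and role of the Lattice are understood, the Viterbi algorithm is mostly understood as well; the Lattice can be regarded as the key to mastering the Viterbi algorithm.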

typedef struct lattice
{
   MemHeap *heap;               /* Heap lattice uses */
   LatFormat format;            /* indicate which fields are valid */
   Vocab *voc;                  /* Dictionary lattice based on */

   int nn;                      /* Number of nodes */
   int na;                      /* Number of arcs */
   LNode *lnodes;               /* Array of lattice nodes */
   LArc *larcs;                 /* Array of lattice arcs */

   LabId subLatId;              /* Lattice Identifier (for SubLats only) */
   SubLatDef *subList;          /* List of sublats in this lattice level */
   SubLatDef *refList;          /* List of all SubLats referring to this lat */
   struct lattice *chain;       /* Linked list used for various jobs */

   char *utterance;             /* Utterance file name (NULL==unknown) */
   char *vocab;                 /* Dictionary file name (NULL==unknown) */
   char *hmms;                  /* MMF file name (NULL==unknown) */
   char *net;                   /* Network file name (NULL==unknown) */

   float acscale;               /* Acoustic scale factor */
   float lmscale;               /* LM scale factor */
   LogFloat wdpenalty;          /* Word insertion penalty */
   float prscale;               /* Pronunciation scale factor */
   HTime framedur;              /* Frame duration in 100ns units */
   float logbase;               /* base of logarithm for likelihoods in lattice files
                                   (1.0 = default (e), 0.0 = no logs) */
   float tscale;                /* time scale factor (default: 1, i.e. seconds) */

   Ptr hook;                    /* User definable hook */

} Lattice;

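First, let us get an overall picture of what a Lattice is, and then refine it step by step. The Lattice structure records how many nodes (nn) and how many arcs (na) there are, the arrays that hold those nodes and arcs (lnodes and larcs), its sub-lattices (subList) along with the list of all sub-lattices that refer to it (refList), and a set of scaling parameters (acscale, lmscale, wdpenalty, prscale, and so on).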
