VC-Dimension and Rademacher Complexity-based bounds

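VC dimension and Rademacher complexity are two measures of complexity that come up often in machine learning. I had only ever looked at them from a distance without digging in, so today I am writing up notes as I learn them.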

VC-Dimension

The full name is Vapnik–Chervonenkis dimension. Here is the definition from Wikipedia:

In Vapnik–Chervonenkis theory, the Vapnik–Chervonenkis (VC) dimension is a measure of the capacity (complexity, expressive power, richness, or flexibility) of a space of functions that can be learned by a statistical classification algorithm. It is defined as the cardinality of the largest set of points that the algorithm can shatter. It was originally defined by Vladimir Vapnik and Alexey Chervonenkis.[1]

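In short: the VC dimension measures the capacity of the function class that a classification algorithm can learn, and it equals the size of the largest point set that the class can shatter.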

VC dimension of a set family

Let $H$ be a set family (a set of sets) and let $C$ be a set. Their intersection is defined as the set family

$$H \cap C := \{\, h \cap C \mid h \in H \,\}.$$

We say that the set $C$ is shattered by $H$ if $H \cap C$ contains all subsets of $C$, i.e.:

$$|H \cap C| = 2^{|C|}.$$

The VC dimension $D$ of $H$ is the largest cardinality of a set shattered by $H$. If arbitrarily large sets can be shattered, the VC dimension is $\infty$.
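The set-family definition can be checked by brute force on small finite examples. The sketch below is only an illustration: the function names (`restrict`, `is_shattered`, `vc_dimension`) and the interval family are my own choices, and the family is assumed to be finite and non-empty.

```python
from itertools import combinations


def restrict(family, C):
    """Return the restricted family {h ∩ C : h in family} as a set of frozensets."""
    C = frozenset(C)
    return {frozenset(h) & C for h in family}


def is_shattered(family, C):
    """C is shattered when the restriction contains all 2^|C| subsets of C."""
    return len(restrict(family, C)) == 2 ** len(set(C))


def vc_dimension(family, ground_set):
    """Largest size of a subset of ground_set shattered by a non-empty family.

    Any subset of a shattered set is itself shattered, so the search can
    stop at the first size k for which no subset is shattered.
    """
    ground = list(set(ground_set))
    best = 0
    for k in range(1, len(ground) + 1):
        if any(is_shattered(family, C) for C in combinations(ground, k)):
            best = k
        else:
            break
    return best


# Hypothetical example: all contiguous intervals of {1, 2, 3, 4}, plus the empty set.
intervals = [set(range(i, j)) for i in range(1, 6) for j in range(i, 6)]
interval_vc = vc_dimension(intervals, range(1, 5))
```

For this interval family the search stops at size 3: a set such as {1, 2, 3} cannot be shattered, because no interval contains 1 and 3 without also containing 2.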

VC dimension of a classification model

A classification model $f$ with parameter vector $\theta$ is said to shatter a set of data points $(x_1, x_2, \ldots, x_n)$ if, for every assignment of labels to those points, there exists a $\theta$ such that $f$ makes no errors when evaluated on that set of points.
The VC dimension of the model $f$ is the largest number of points that $f$ can shatter.

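Example

Take a straight line as a classifier in the plane. Three points in general position (not collinear) can be labeled in all $2^3 = 8$ ways by some line, but no set of four points can be shattered, so the VC dimension is 3.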
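The example can be explored numerically on specific point sets. The following is a minimal sketch, not a proof: it assumes labels in {+1, -1} and tests strict linear separability by scanning candidate normal directions. The set of valid normals is an open arc of angles whose endpoints are perpendicular to some difference between a positive and a negative point, so testing the midpoints between consecutive critical angles is enough. All function names are hypothetical.

```python
import math
from itertools import product


def linearly_separable(points, labels, eps=1e-9):
    """True if some line strictly separates the +1 points from the -1 points."""
    pos = [p for p, y in zip(points, labels) if y == 1]
    neg = [p for p, y in zip(points, labels) if y == -1]
    if not pos or not neg:
        return True  # a line far away from all points puts them on one side
    # We need a normal w with w . (p - n) > 0 for every positive p, negative n.
    diffs = [(p[0] - n[0], p[1] - n[1]) for p in pos for n in neg]
    # Critical angles: directions perpendicular to some difference vector.
    angles = []
    for dx, dy in diffs:
        base = math.atan2(dy, dx)
        angles.append((base + math.pi / 2) % (2 * math.pi))
        angles.append((base - math.pi / 2) % (2 * math.pi))
    angles.sort()
    for k, a in enumerate(angles):
        # Next critical angle, wrapping around the circle after the last one.
        b = angles[k + 1] if k + 1 < len(angles) else angles[0] + 2 * math.pi
        mid = (a + b) / 2
        w = (math.cos(mid), math.sin(mid))
        if all(w[0] * dx + w[1] * dy > eps for dx, dy in diffs):
            return True
    return False


def shattered_by_lines(points):
    """True if every +1/-1 labeling of the points is linearly separable."""
    return all(
        linearly_separable(points, labels)
        for labels in product((1, -1), repeat=len(points))
    )


triangle_ok = shattered_by_lines([(0, 0), (1, 0), (0, 1)])
square_ok = shattered_by_lines([(0, 0), (1, 1), (1, 0), (0, 1)])
```

Here `triangle_ok` is true, while `square_ok` is false because the XOR labeling of the unit square has no separating line. Three collinear points also fail the check, which is why general position matters in the statement.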

Rademacher complexity

In computational learning theory (machine learning and theory of computation), Rademacher complexity, named after Hans Rademacher, measures richness of a class of real-valued functions with respect to a probability distribution.

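That is, in computational learning theory, Rademacher complexity (named after Hans Rademacher) measures how rich a class of real-valued functions is with respect to a probability distribution.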

Rademacher complexity of a set

Given a set $A \subseteq \mathbb{R}^m$, the Rademacher complexity of $A$ is defined as follows:

$$\operatorname{Rad}(A) := \frac{1}{m}\, \mathbb{E}_{\sigma}\left[ \sup_{a \in A} \sum_{i=1}^{m} \sigma_i a_i \right]$$

where $\sigma_1, \sigma_2, \ldots, \sigma_m$ are independent random variables drawn from the Rademacher distribution, i.e.

$$\Pr(\sigma_i = +1) = \Pr(\sigma_i = -1) = \frac{1}{2}$$

for $i = 1, 2, \ldots, m$, and $a = (a_1, \ldots, a_m)$.

Many authors take the absolute value of the sum before taking the supremum, but if $A$ is symmetric (that is, $A = -A$) this makes no difference.
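For a small finite set, the expectation over the Rademacher variables can be computed exactly by enumerating all $2^m$ sign vectors. This is a minimal sketch under that assumption (finite, non-empty $A$, small $m$). The function name and the example sets are illustrative only.

```python
from itertools import product


def rademacher_complexity(A):
    """Exact Rademacher complexity of a finite, non-empty set A of vectors in R^m."""
    A = [tuple(a) for a in A]
    m = len(A[0])
    total = 0.0
    # Each sign vector has probability 2^-m, so the expectation is a plain average.
    for sigma in product((-1, 1), repeat=m):
        total += max(sum(s * x for s, x in zip(sigma, a)) for a in A)
    return total / (2 ** m) / m


single = rademacher_complexity([(1, 2)])
pair = rademacher_complexity([(1, 1), (1, 2)])
```

A singleton set has complexity 0, since the signed sums cancel in expectation. For the two-vector set the four maxima are 3, 0, 1 and -2, which average to 1/2, so the complexity is 1/4.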

Rademacher complexity of a function class

Given a sample $S = (z_1, z_2, \ldots, z_m) \in Z^m$ and a class $F$ of real-valued functions defined on the space $Z$, the empirical Rademacher complexity of $F$ given $S$ is defined as:

$$\operatorname{Rad}_S(F) = \frac{1}{m}\, \mathbb{E}_{\sigma}\left[ \sup_{f \in F} \sum_{i=1}^{m} \sigma_i f(z_i) \right]$$

This can also be written using the previous definition for sets:

$$\operatorname{Rad}_S(F) = \operatorname{Rad}(F \circ S)$$

where $F \circ S$ denotes function composition, i.e.:

$$F \circ S := \{\, (f(z_1), \ldots, f(z_m)) \mid f \in F \,\}$$

Let $P$ be a probability distribution over $Z$. The Rademacher complexity of the function class $F$ with respect to $P$ for sample size $m$ is:

$$\operatorname{Rad}_{P,m}(F) := \mathbb{E}_{S \sim P^m}\left[ \operatorname{Rad}_S(F) \right]$$

where the expectation is taken over an independent and identically distributed (i.i.d.) sample $S = (z_1, z_2, \ldots, z_m)$ drawn from $P$.
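Both quantities can be sketched in code for a toy function class. The example below is an assumption-laden illustration: the class is a hypothetical finite grid of threshold functions on $[0, 1]$, the empirical complexity is computed exactly by enumerating sign vectors (feasible only for small $m$), and the expectation over samples is approximated by Monte Carlo with a uniform distribution standing in for $P$.

```python
import random
from itertools import product


def empirical_rademacher(functions, sample):
    """Exact empirical Rademacher complexity of a finite function class on a sample."""
    m = len(sample)
    # F∘S: the distinct value vectors (f(z_1), ..., f(z_m)) over all f in the class.
    vectors = {tuple(f(z) for z in sample) for f in functions}
    total = 0.0
    for sigma in product((-1, 1), repeat=m):
        total += max(sum(s * v for s, v in zip(sigma, vec)) for vec in vectors)
    return total / (2 ** m) / m


def expected_rademacher(functions, draw, m, n_samples=200, seed=0):
    """Monte Carlo estimate of the average empirical complexity over i.i.d. samples.

    `draw(rng)` is an assumed user-supplied sampler returning one point from P.
    """
    rng = random.Random(seed)
    values = [
        empirical_rademacher(functions, [draw(rng) for _ in range(m)])
        for _ in range(n_samples)
    ]
    return sum(values) / len(values)


def make_threshold(t):
    """Threshold function: +1 when z >= t, otherwise -1."""
    return lambda z: 1 if z >= t else -1


# Hypothetical class: thresholds at 0.0, 0.1, ..., 1.0.
thresholds = [make_threshold(k / 10) for k in range(11)]

on_fixed_sample = empirical_rademacher(thresholds, [0.15, 0.55, 0.95])
estimate = expected_rademacher(thresholds, lambda rng: rng.random(), m=5, n_samples=50)
```

On the fixed three-point sample the class produces only four distinct value vectors, which is why `empirical_rademacher` works with $F \circ S$ rather than with the functions themselves.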

 

 

 
