Label Smoothing

 

An overconfident model is poorly calibrated: its predicted probabilities are consistently higher than its actual accuracy.

For example, it may predict a probability of 0.9 on inputs where its accuracy is only 0.6.

Note that a model with a small test error can still be overconfident, and can therefore still benefit from label smoothing.
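
To make the confidence/accuracy gap concrete, here is a minimal NumPy sketch; the array values are made up purely for illustration:

```
import numpy as np

# Hypothetical confidences (probability of the predicted class) and
# whether each prediction was correct; values are made up for illustration.
confidences = np.array([0.9, 0.92, 0.88, 0.95, 0.91])
correct     = np.array([1,   0,    1,    0,    1])  # 1 = correct, 0 = wrong

print("mean confidence:", confidences.mean())  # ~0.91
print("accuracy:       ", correct.mean())      # 0.6 -> overconfident
```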

 

Label smoothing replaces the one-hot encoded label vector y_hot with a mixture of y_hot and the uniform distribution:

 

  y_ls = (1 - α) * y_hot + α / K

 

where K is the number of label classes, and α is a hyperparameter that determines the amount of smoothing.

If α = 0, we obtain the original one-hot encoded y_hot. If α = 1, we get the uniform distribution.
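
As a quick sanity check, here is a minimal NumPy sketch of the formula for K = 3 and α = 0.1 (the same settings the TensorFlow snippet below uses):

```
import numpy as np

K = 3        # number of label classes
alpha = 0.1  # smoothing hyperparameter

y_hot = np.array([0.0, 0.0, 1.0])       # one-hot label
y_ls = (1 - alpha) * y_hot + alpha / K  # smoothed label

print(y_ls)  # [0.03333333 0.03333333 0.93333333]
```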

 

ref:

Wanshun Wong, "What is Label Smoothing?", Towards Data Science.

 

def label_smoothing(inputs, epsilon=0.1):
    '''Applies label smoothing. See 5.4 and https://arxiv.org/abs/1512.00567.
    inputs: 3-d tensor of shape [N, T, V], where V is the vocabulary size.
    epsilon: smoothing rate (the α in the formula above).

    For example,

    ```
    import tensorflow as tf

    inputs = tf.convert_to_tensor([[[0, 0, 1],
                                    [0, 1, 0],
                                    [1, 0, 0]],

                                   [[1, 0, 0],
                                    [1, 0, 0],
                                    [0, 1, 0]]], tf.float32)

    outputs = label_smoothing(inputs)

    with tf.Session() as sess:  # TensorFlow 1.x
        print(sess.run([outputs]))

    >>
    [array([[[ 0.03333334,  0.03333334,  0.93333334],
             [ 0.03333334,  0.93333334,  0.03333334],
             [ 0.93333334,  0.03333334,  0.03333334]],

            [[ 0.93333334,  0.03333334,  0.03333334],
             [ 0.93333334,  0.03333334,  0.03333334],
             [ 0.03333334,  0.93333334,  0.03333334]]], dtype=float32)]
    ```
    '''
    V = inputs.get_shape().as_list()[-1]  # number of classes (K in the formula)
    return ((1 - epsilon) * inputs) + (epsilon / V)
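
As a usage sketch (assuming TensorFlow 1.x to match the snippet above, with hypothetical `logits` and `labels` tensors), the smoothed labels would typically replace the hard one-hot targets in a softmax cross-entropy loss:

```
import tensorflow as tf

# Hypothetical placeholders for model outputs and one-hot targets;
# shapes follow the [N, T, V] convention used above, with V = 3.
logits = tf.placeholder(tf.float32, [None, None, 3])  # unnormalized scores
labels = tf.placeholder(tf.float32, [None, None, 3])  # one-hot targets

smoothed = label_smoothing(labels, epsilon=0.1)

# Cross-entropy against the smoothed distribution instead of the hard one-hot.
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=smoothed, logits=logits))
```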

 
