Reference:
https://zhuanlan.zhihu.com/p/20918580
https://zhuanlan.zhihu.com/p/20945670
https://zhuanlan.zhihu.com/p/21102293
Linear Classification
score function
Maps the raw image pixels to a score for each class; the higher a class's score, the more likely the image belongs to that class
Each image is a [D x 1] column vector, where D = width x height x 3 (RGB)
Parameters
Weights
W (weight matrix) = [k x D], where k is the number of classes
Geometrically, W performs a rotation/linear transform in the D-dimensional pixel space
Bias vector
b (bias vector) = [k x 1]
Geometrically, b performs a translation (shift) that does not interact with the data
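A minimal NumPy sketch of the score function f(x) = Wx + b; the dimensions below (D = 4, k = 3) are made up purely for illustration:

import numpy as np

D, k = 4, 3                   # D = flattened pixel dimension, k = number of classes
x = np.random.randn(D, 1)     # one image, flattened into a [D x 1] column vector
W = np.random.randn(k, D)     # weight matrix, [k x D]
b = np.random.randn(k, 1)     # bias vector, [k x 1]

scores = W.dot(x) + b         # [k x 1] class scores; the highest entry is the predicted class
print(scores.ravel())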
Image data preprocessing
The most common step is normalization
Zero-mean centering: shift pixel values from [0, 255] to roughly [-128, 127], then optionally scale them into [-1, 1]
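A small sketch of this centering and scaling, assuming raw pixel values in [0, 255] (subtracting 128 is one common choice):

import numpy as np

pixels = np.array([0, 64, 128, 255], dtype=np.float64)  # raw pixel values in [0, 255]
centered = pixels - 128.0                                # zero-centered, roughly [-128, 127]
scaled = centered / 128.0                                # squashed into roughly [-1, 1]
print(centered, scaled)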
Loss Function
Quantifies how well the predicted class scores agree with the ground-truth labels; also called the Cost Function or the Objective
The more the output of the score function deviates from the true labels, the larger the loss
Multiclass Support Vector Machine Loss
e.g. with three classes the scores are s = [13, -7, 11], Δ = 10, and the first class is the correct one
The first term contributes zero because the correct score 13 exceeds -7 by 20, which is more than the margin Δ = 10 (max(0, -7 - 13 + 10) = 0); the second term gives max(0, 11 - 13 + 10) = 8, so the total loss is 8
The SVM loss for the linear score function: $$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta)$$
hinge loss: the max(0, -) form above
squared hinge loss SVM (i.e. L2-SVM): max(0, -)^2, which penalizes violated margins quadratically
The goal is for the correct class's score to stay above every other class's score by at least the margin Δ (the red region in the original figure); whenever another class enters that region or goes higher, loss is accumulated. What we are ultimately solving for is the weights W.
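A quick NumPy check of the numeric example above (correct class at index 0, Δ = 10):

import numpy as np

s = np.array([13.0, -7.0, 11.0])   # class scores; index 0 is the correct class
delta = 10.0
margins = np.maximum(0, s - s[0] + delta)
margins[0] = 0                     # the correct class contributes no loss
print(margins.sum())               # 0 + 8 = 8, matching the hand calculation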
Regularization
Suppose we have a dataset and a weight set W that classifies every example correctly; there are typically many similar W (e.g. scaled versions of it) that also classify all the data correctly
In other words, rescaling W changes the loss scores, so we want to encode a preference for a particular set of weights
This is done by adding a regularization penalty to the loss function
Regularization penalty R(W), most commonly the L2 penalty: the sum of the squares of all elements of W
The full Multiclass SVM loss L = data loss (the average loss L_i over all examples) + regularization loss
Expanded formula:
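Written out in the standard form (N is the number of training examples and λ the regularization strength):

$$
L = \frac{1}{N}\sum_i \sum_{j \neq y_i} \max\left(0,\; f(x_i; W)_j - f(x_i; W)_{y_i} + \Delta\right) + \lambda \sum_k \sum_l W_{k,l}^2
$$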
Code:
import numpy as np

def L_i(x, y, W):
  """
  unvectorized version. Compute the multiclass svm loss for a single example (x,y)
  - x is a column vector representing an image (e.g. 3073 x 1 in CIFAR-10)
    with an appended bias dimension in the 3073-rd position (i.e. bias trick)
  - y is an integer giving index of correct class (e.g. between 0 and 9 in CIFAR-10)
  - W is the weight matrix (e.g. 10 x 3073 in CIFAR-10)
  """
  delta = 1.0 # see notes about delta later in this section
  scores = W.dot(x) # scores becomes of size 10 x 1, the scores for each class
  correct_class_score = scores[y]
  D = W.shape[0] # number of classes, e.g. 10
  loss_i = 0.0
  for j in range(D): # iterate over all wrong classes
    if j == y:
      # skip for the true class to only loop over incorrect classes
      continue
    # accumulate loss for the i-th example
    loss_i += max(0, scores[j] - correct_class_score + delta)
  return loss_i
def L_i_vectorized(x, y, W):
  """
  A faster half-vectorized implementation. half-vectorized
  refers to the fact that for a single example the implementation contains
  no for loops, but there is still one loop over the examples (outside this function)
  """
  delta = 1.0
  scores = W.dot(x)
  # compute the margins for all classes in one vector operation
  margins = np.maximum(0, scores - scores[y] + delta)
  # on y-th position scores[y] - scores[y] canceled and gave delta. We want
  # to ignore the y-th position and only consider margin on max wrong class
  margins[y] = 0
  loss_i = np.sum(margins)
  return loss_i
def L(X, y, W):
  """
  fully-vectorized implementation :
  - X holds all the training examples as columns (e.g. 3073 x 50,000 in CIFAR-10)
  - y is array of integers specifying correct class (e.g. 50,000-D array)
  - W are weights (e.g. 10 x 3073)
  """
  # evaluate loss over all examples in X without using any for loops
  delta = 1.0
  num_train = X.shape[1]
  scores = W.dot(X)                                  # 10 x 50,000 matrix of class scores
  correct_scores = scores[y, np.arange(num_train)]   # the correct-class score for every example
  margins = np.maximum(0, scores - correct_scores + delta)  # broadcast over all classes at once
  margins[y, np.arange(num_train)] = 0               # the correct class contributes no loss
  return np.sum(margins) / num_train                 # average data loss over all examples
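A hypothetical usage sketch with tiny random data (the shapes just mirror the comments above, scaled down to run quickly):

num_classes, dim, num_train = 10, 3073, 5       # small stand-in for the CIFAR-10 shapes
W = np.random.randn(num_classes, dim) * 0.001   # small random weights
X = np.random.randn(dim, num_train)             # examples as columns, bias trick already applied
y = np.random.randint(num_classes, size=num_train)

# averaging the per-example losses should match the fully-vectorized version
per_example = [L_i_vectorized(X[:, i], y[i], W) for i in range(num_train)]
print(np.mean(per_example), L(X, y, W))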
Softmax classifier
The generalization of the binary Logistic Regression classifier to multiple classes
Formula: $$L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right)$$
or, equivalently, $$L_i = -f_{y_i} + \log \sum_j e^{f_j}$$
Every score is first mapped through $$e^z$$ (exponentiated) before normalization
score function => unchanged, f(x_i; W) = W x_i, but the scores are now interpreted as unnormalized log probabilities for each class
softmax function: squashes the scores so that every element lies between 0 and 1 and they all sum to 1
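Written out (with j indexing the classes), the softmax function is:

$$
\sigma(f)_j = \frac{e^{f_j}}{\sum_k e^{f_k}}
$$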
Probabilistic interpretation:
We are minimizing the negative log probability of the correct class, which can be viewed as performing Maximum Likelihood Estimation (MLE)
A practical trick for numerical stability when implementing softmax is to multiply numerator and denominator by a constant C; a common choice is $$\log C = -\max_j f_j$$
f = np.array([123, 456, 789]) # example with 3 classes, each with a very large score
p = np.exp(f) / np.sum(np.exp(f)) # Bad: numeric range issue, the exponentials may overflow
# instead, shift the values in f so that the highest value is 0:
f -= np.max(f) # f becomes [-666, -333, 0]
p = np.exp(f) / np.sum(np.exp(f)) # safe now, gives the correct result
The Softmax classifier replaces the hinge loss with the cross-entropy loss
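A minimal sketch of the cross-entropy loss for a single example, combining the softmax with the stability shift above; the scores and label here are made up:

import numpy as np

f = np.array([3.2, 5.1, -1.7])        # hypothetical class scores for one example
y = 0                                 # index of the correct class

f = f - np.max(f)                     # shift for numerical stability
p = np.exp(f) / np.sum(np.exp(f))     # softmax probabilities, sum to 1
loss = -np.log(p[y])                  # cross-entropy loss L_i = -log(p_correct)
print(p, loss)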
Comparison between SVM and Softmax
SVM treats the scores as class margins to be satisfied (hinge loss), while Softmax interprets them as unnormalized log probabilities and minimizes the cross-entropy loss
Summary
Both SVM and Softmax compute class scores from the weights W and the bias b
Each defines a Loss Function that measures how well the model predicts, which is then used to find better prediction models
Testing MathJax, though the formulas are rather complex.
$$
L_i = -f_{y_i} + \log \sum_j e^{f_j}
$$
$$
L_i = -\log \frac{e^{f_{y_i}}}{\sum_j e^{f_j}}
$$
TODO