CS231N Lec. 3 | Loss Functions and Optimization

Please find the lecture reference here.

Linear classification

  1. A loss function quantifies how good the scores produced by the current W are.

  2. Optimization minimizes the loss function to find the best parameters.

How do we calculate the loss function?

  • s : predicted class scores
  • y_i : ground-truth category
  • s_yi : score of the true class
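The loss the questions below refer to is the multiclass SVM (hinge) loss, L_i = Σ_{j ≠ yi} max(0, s_j - s_yi + 1). A minimal NumPy sketch; the example scores are the cat/car/frog values from the lecture slides, and `svm_loss_i` is just an illustrative helper name:

```python
import numpy as np

def svm_loss_i(scores, y_i, margin=1.0):
    """Multiclass SVM (hinge) loss for one example.

    scores : 1-D array of class scores s
    y_i    : index of the true class
    """
    margins = np.maximum(0, scores - scores[y_i] + margin)
    margins[y_i] = 0  # do not sum over the correct class
    return margins.sum()

# Example scores (cat / car / frog) as on the lecture slides
print(svm_loss_i(np.array([3.2, 5.1, -1.7]), y_i=0))  # cat image -> 2.9
print(svm_loss_i(np.array([1.3, 4.9, 2.0]), y_i=1))   # car image -> 0.0
```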

Q1. What happens to the loss if the car scores change a bit? -> It doesn't change, because the car score (4.9) is already far above the margin (1),
so the result stays the same.
Q2. What are the min/max possible losses?
Min is 0, max is infinite (see the hinge-loss graph above).
Q3. At initialization W is small, so all s ≈ 0. What is the loss? It will be "C (number of classes) - 1".
s_j and s_yi will both be ≈ 0, so every term is max(0, 0 - 0 + 1) = 1,
and summing over the C - 1 incorrect classes gives C - 1. This is quite a useful debugging check in practice (see the sanity check below).
Q4. ???
Q5. The result will be the same; using the mean instead of the sum is just a rescaling (the average divides the sum by the number of terms).
Q6. This will be different.
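As a follow-up to Q3, here is a quick sanity check one might run at initialization, reusing the `svm_loss_i` sketch above; the class count of 10 is an assumed CIFAR-10-style example:

```python
import numpy as np

# At init W is small, so all scores are roughly 0.
# Each of the C - 1 incorrect classes then contributes max(0, 0 - 0 + 1) = 1,
# so the per-example SVM loss should be about C - 1.
num_classes = 10                      # assumed, e.g. CIFAR-10
scores = np.zeros(num_classes)        # approximate scores at initialization
loss = svm_loss_i(scores, y_i=0)      # svm_loss_i from the sketch above
print(loss, "== expected", num_classes - 1)   # 9.0 == expected 9
```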

e.g.) 2W also achieves L = 0, so the W with zero loss is not unique.

Regularization

To penalize complex models and make the model simpler (refer to Occam's razor).
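A minimal sketch of how a regularization term enters the objective; the L2 choice and the `lam` value are illustrative assumptions, not from these notes:

```python
import numpy as np

def l2_reg(W, lam=1e-3):
    # L2 penalty: lam * sum of squared weights.
    # It also breaks ties such as W vs. 2W from the example above,
    # since ||2W||^2 = 4 * ||W||^2 is penalized more heavily.
    return lam * np.sum(W * W)

def full_loss(data_loss, W, lam=1e-3):
    # Total objective = data loss + regularization penalty.
    return data_loss + l2_reg(W, lam)
```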

This part needs a bit more elaboration: additional regularization methods..

  • Multinomial Logistic Regression (w/ Softmax function)

    Unlike the SVM loss, the scores themselves are treated as meaningful numbers: the unnormalized log probabilities of the classes (see the sketch below).
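A minimal NumPy sketch of the softmax cross-entropy loss for one example; the example scores are the cat values from the slides, and the max-shift for numerical stability is a standard trick rather than something from these notes:

```python
import numpy as np

def softmax_loss_i(scores, y_i):
    # Scores are unnormalized log probabilities; softmax normalizes them.
    shifted = scores - scores.max()                    # numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted))  # class probabilities
    return -np.log(probs[y_i])                         # -log P(true class)

print(softmax_loss_i(np.array([3.2, 5.1, -1.7]), y_i=0))  # ~2.04 (natural log)
```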

L1/L2 regularization, Elastic net, Softmax

Optimization

How do we find an optimized W?

  1. Random search: a random W gives about 15.5% accuracy (while the state of the art is ~95%). Just for reference.

  2. Follow the slope: methods for computing the gradient..

The learning rate (step size) is an important hyperparameter and one of the first things to check (a minimal sketch follows).
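A minimal sketch of vanilla gradient descent showing where the step size enters; `grad_fn` is a hypothetical placeholder for the analytic gradient of the loss:

```python
def gradient_descent(W, grad_fn, step_size=1e-3, num_steps=100):
    # Repeatedly step in the direction of the negative gradient.
    for _ in range(num_steps):
        dW = grad_fn(W)             # analytic gradient of the loss at W
        W = W - step_size * dW      # step size (learning rate) scales the update
    return W
```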

Stochastic gradient descent: the gradient is estimated on a mini-batch, which acts as a Monte Carlo estimate of the full gradient (see the sketch below).
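A minimal sketch of mini-batch SGD, assuming a hypothetical `grad_fn(W, X_batch, y_batch)` that returns the gradient on a batch; the batch size is an illustrative choice:

```python
import numpy as np

def sgd(W, X, y, grad_fn, step_size=1e-3, batch_size=256, num_steps=1000):
    # Estimate the gradient from a small random sample of the data
    # (a Monte Carlo estimate of the full-batch gradient).
    N = X.shape[0]
    for _ in range(num_steps):
        idx = np.random.choice(N, batch_size, replace=False)
        dW = grad_fn(W, X[idx], y[idx])   # gradient on the mini-batch
        W = W - step_size * dW
    return W
```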

This does not apply only to linear models.

Using polar coordinates, a circular data set can also be handled linearly; this is an example of a feature transform (feature vector). See the sketch below.
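A minimal sketch of the polar-coordinate feature transform mentioned above; the function name and layout are illustrative:

```python
import numpy as np

def polar_features(X):
    # Map 2-D points (x, y) to polar features (r, theta).
    # Concentric circular classes become separable by a linear classifier
    # after this transform.
    r = np.sqrt(X[:, 0] ** 2 + X[:, 1] ** 2)
    theta = np.arctan2(X[:, 1], X[:, 0])
    return np.stack([r, theta], axis=1)
```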

keywords:

  • feature vector, mini-batch, Monte Carlo, stochastic gradient, SOTA, SVM loss, L1 regularization, L2 regularization, Elastic net

© 2020. All rights reserved.