deep learning #2
3.3.4 Python Namespaces
Tutorial code often uses the following namespaces:
import theano
import theano.tensor as T
import numpy
3.4 A Primer on Supervised Optimization for Deep Learning
What’s exciting about Deep Learning is largely the use of unsupervised learning of deep networks. But supervised learning also plays an important role. The utility of unsupervised pre-training is often evaluated on the basis of what performance can be achieved after supervised fine-tuning. This chapter reviews the basics of supervised learning for classification models, and covers the minibatch stochastic gradient descent algorithm that is used to fine-tune many of the models in the Deep Learning Tutorials. Have a look at these introductory course notes on gradient-based learning for more basics on the notion of optimizing a training criterion using the gradient.
3.4.1 Learning a Classifier
Zero-One Loss
The models presented in these deep learning tutorials are mostly used for classification. The objective in training a classifier is to minimize the number of errors (zero-one loss) on unseen examples. If f : \mathbb{R}^D \to \{0, \ldots, L\} is the prediction function, then this loss can be written as:

\ell_{0,1} = \sum_{i=0}^{|\mathcal{D}|} I_{f(x^{(i)}) \neq y^{(i)}}

where \mathcal{D} is the set of examples and I is the indicator function, equal to 1 when the prediction f(x^{(i)}) differs from the label y^{(i)} and 0 otherwise.
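As a concrete illustration, the zero-one loss can be expressed symbolically in Theano roughly as follows; p_y_given_x (the matrix of per-example class probabilities) and y (the vector of correct labels) are assumed names for symbolic variables defined elsewhere in the model:

# Sketch only: zero_one_loss is a symbolic expression; to obtain an actual
# value it has to be compiled into a Theano function (see the Theano
# tutorial for details).
zero_one_loss = T.sum(T.neq(T.argmax(p_y_given_x, axis=1), y))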
Negative Log-Likelihood Loss
Since the zero-one loss is not differentiable, optimizing it for large models (thousands or millions of parameters) is prohibitively expensive (computationally). We thus maximize the log-likelihood of our classifier given all the labels in a training set.
The likelihood of the correct class is not the same as the number of right predictions, but from the point of view of a randomly initialized classifier they are pretty similar. Remember that likelihood and zero-one loss are different objectives; you should see that they are correlated on the validation set, but sometimes one will rise while the other falls, or vice versa. Since we usually speak in terms of minimizing a loss function, learning will thus attempt to minimize the negative log-likelihood (NLL), defined as:

NLL(\theta, \mathcal{D}) = - \sum_{i=0}^{|\mathcal{D}|} \log P(Y = y^{(i)} \mid x^{(i)}, \theta)
The NLL of our classifier is a differentiable surrogate for the zero-one loss, and we use the gradient of this function over our training data as a supervised learning signal for deep learning of a classifier. In Theano this can be computed with a single line, sketched below under the assumption that p_y_given_x is the symbolic matrix of per-example class probabilities and y is the vector of correct labels:
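# A sketch, not a definitive implementation: p_y_given_x and y are assumed
# symbolic variables defined elsewhere in the model.
# T.arange(y.shape[0]) selects one row per example and y selects the column
# of the correct class, giving the log-probability of the right label for
# each example; the NLL is the negated sum of these.
NLL = -T.sum(T.log(p_y_given_x)[T.arange(y.shape[0]), y])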
3.4.2 Stochastic Gradient Descent
Ordinary gradient descent repeatedly takes small steps downhill on the error surface defined by the loss as a function of the parameters; in pseudocode:

while True:
    loss = f(params)
    d_loss_wrt_params = ... # compute gradient
    params -= learning_rate * d_loss_wrt_params
    if <stopping condition is met>:
        return params
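Stochastic (minibatch) gradient descent works on the same principle, but estimates the gradient from a small batch of examples at a time instead of the full training set, so the parameters are updated many times per pass over the data. The following is a minimal, self-contained NumPy sketch of minibatch SGD on a softmax classifier trained with the NLL; the synthetic data, batch_size and learning_rate are illustrative assumptions, not the tutorials' Theano code:

import numpy

rng = numpy.random.RandomState(0)
n_samples, n_features, n_classes = 500, 20, 5
X = rng.randn(n_samples, n_features)          # toy inputs
y = rng.randint(n_classes, size=n_samples)    # toy labels

W = numpy.zeros((n_features, n_classes))
b = numpy.zeros(n_classes)
learning_rate, batch_size = 0.1, 50

for epoch in range(10):
    for start in range(0, n_samples, batch_size):
        xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        # forward pass: numerically stable softmax probabilities
        logits = xb.dot(W) + b
        logits -= logits.max(axis=1, keepdims=True)
        p = numpy.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        # gradient of the minibatch mean NLL with respect to the logits
        grad_logits = p.copy()
        grad_logits[numpy.arange(len(yb)), yb] -= 1.0
        grad_logits /= len(yb)
        # SGD update of the parameters
        W -= learning_rate * xb.T.dot(grad_logits)
        b -= learning_rate * grad_logits.sum(axis=0)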