1. Convert the discrepancy between each output and its target value into an error derivative.
E = 1 / 2 * Sigma_{j in output} (Tj - Yj)^2
dE / dYj = - (Tj - Yj)
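As a quick numerical illustration of this step (my own addition, assuming the squared-error output layer above; the variable names are made up), a small numpy sketch:

import numpy as np

# Hypothetical outputs Yj and targets Tj for a 3-unit output layer.
y = np.array([0.2, 0.7, 0.9])
t = np.array([0.0, 1.0, 1.0])

# E = 1 / 2 * Sigma_j (Tj - Yj)^2
E = 0.5 * np.sum((t - y) ** 2)

# dE / dYj = -(Tj - Yj) = Yj - Tj, one derivative per output unit.
dE_dY = y - t

print(E)       # ~0.07
print(dE_dY)   # [ 0.2 -0.3 -0.1]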
2. Compute the error derivative of each hidden layer from the error derivatives in the layer above.
dE / dZj = (dYj / dZj) * (dE / dYj)
, where Zj is the total input to unit j, i.e. the weighted sum Sigma_i Wij * Yi of the outputs of the units i in the layer below.
, where Yi is the output of hidden unit i.
, where Yj is the output of unit j.
dE / dZj = Yj (1 - Yj) * (dE / dYj)
, where Yj (1 - Yj) is dY / dZ of the nonlinear logistic unit y = 1 / (1 + e^-Z).
, where dY / dZ = y (1 - y) (proved below).
dE / dYi = Sigma_j (dZj / dYi) * (dE / dZj)
dE / dYi = Sigma_j Wij * (dE / dZj)
, where dE / dZj has already been computed in the layer above.
thus,
dE / dWij = (dZj / dWij) * (dE / dZj)
dE / dWij = Yi * (dE / dZj)
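Putting these equations together, here is a minimal numpy sketch (my own illustration, not code from the paper) of one backward step through a single logistic layer; the function name backward_layer and all variable names are made up for this example:

import numpy as np

def backward_layer(y_below, w, y, dE_dY):
    """One backward step through a logistic layer.

    y_below : outputs Yi of the layer below, shape (n_below,)
    w       : weights Wij from unit i (below) to unit j (this layer), shape (n_below, n_this)
    y       : outputs Yj of this layer, Yj = 1 / (1 + exp(-Zj)), shape (n_this,)
    dE_dY   : dE / dYj coming from the layer above, shape (n_this,)
    """
    # dE / dZj = Yj (1 - Yj) * (dE / dYj)
    dE_dZ = y * (1.0 - y) * dE_dY
    # dE / dWij = Yi * (dE / dZj)
    dE_dW = np.outer(y_below, dE_dZ)
    # dE / dYi = Sigma_j Wij * (dE / dZj)
    dE_dY_below = w @ dE_dZ
    return dE_dW, dE_dY_below

# Tiny usage example with made-up numbers.
y_below = np.array([0.5, 0.1])
w = np.array([[0.3, -0.2, 0.4],
              [0.1,  0.8, -0.5]])
z = y_below @ w
y = 1.0 / (1.0 + np.exp(-z))
dE_dY = np.array([0.2, -0.3, -0.1])   # e.g. from the output-layer step above
dE_dW, dE_dY_below = backward_layer(y_below, w, y, dE_dY)

Repeating this step layer by layer, from the output toward the input, gives dE / dWij for every weight in the network.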
Proof:
y = 1 / (1 + e^-Z) = (1 + e^-Z)^-1
thus, dy/dz = -(1 + e^-z)^-2 * (-e^-z) = e^-z / (1 + e^-z)^2
dy/dz = 1 / (1 + e^-z) * (e^-z / (1 + e^-z) ) = y ( 1 - y)
because, (e^-z) / (1 + e^-z) = ((1 + e^-z) - 1) / (1 + e^-z)
= (1 + e^-z) / (1 + e^-z) - 1 / (1 + e^-z) = 1 - y
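As a sanity check of this identity (my own addition, not part of the original note), one can compare y (1 - y) against a central-difference estimate of dy/dz:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-4.0, 4.0, 9)
y = sigmoid(z)

analytic = y * (1.0 - y)                                      # y (1 - y)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)   # central difference

print(np.max(np.abs(analytic - numeric)))  # tiny, so the two agree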
Reference:
"Learning representations by back-propagating errors" by Geoffrey Hinton (October 1986)
https://www.iro.umontreal.ca/~vincentp/ift3395/lectures/backprop_old.pdf