
Backpropagating dE/dy by Geoffrey Hinton

1. Convert the discrepancy between each output and its target value into an error derivative.


E = 1/2 * Sigma(j in output) (Tj - Yj)^2

dE / dYj = -(Tj - Yj)
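As a minimal sketch in plain Python (the target and output values here are hypothetical), step 1 computes the squared error and one derivative per output unit:

```python
# Targets Tj and actual outputs Yj of the output layer (hypothetical values).
targets = [1.0, 0.0]
outputs = [0.8, 0.3]

# E = 1/2 * sum_j (Tj - Yj)^2
E = 0.5 * sum((t - y) ** 2 for t, y in zip(targets, outputs))

# dE/dYj = -(Tj - Yj), one error derivative per output unit
dE_dY = [-(t - y) for t, y in zip(targets, outputs)]
```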

 

 

2. Compute an error derivative in each hidden layer from error derivatives in the layer above.


dE / dZj = (dYj / dZj) * (dE / dYj)

, where Zj = Sigma(i) Wij * Yi is the total input to unit j (the weighted sum of the outputs of the units i in the layer below),

, where Yi is the output of hidden unit i,

, and where Yj is the output of unit j.


dE / dZj = Yj (1 - Yj) * (dE / dYj)

, where Yj (1 - Yj) is dYj / dZj for a logistic unit y = 1 / (1 + e^-z), whose derivative is dy/dz = y (1 - y) (proved below).
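A quick sketch of this step for a single logistic unit (the input z and the incoming derivative dE/dYj are hypothetical values):

```python
import math

def sigmoid(z):
    """Logistic unit: y = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical total input to unit j and error derivative w.r.t. its output.
y_j = sigmoid(0.5)
dE_dY_j = 0.3

# dE/dZj = Yj * (1 - Yj) * dE/dYj
dE_dZ_j = y_j * (1.0 - y_j) * dE_dY_j
```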


dE / dYi = Sigma(j) (dZj / dYi) * (dE / dZj)

dE / dYi = Sigma(j) Wij * (dE / dZj)

, where dE / dZj has already been computed for the layer above.
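This sum over the layer above can be sketched in plain Python (the weights and derivatives are hypothetical values):

```python
# W[i][j] is the weight from unit i (lower layer) to unit j (upper layer);
# dE_dZ[j] is dE/dZj for each upper-layer unit, computed in the step above.
W = [[0.1, -0.2],
     [0.4,  0.3]]
dE_dZ = [0.05, -0.1]

# dE/dYi = sum_j Wij * dE/dZj
dE_dY = [sum(W[i][j] * dE_dZ[j] for j in range(len(dE_dZ)))
         for i in range(len(W))]
```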


Thus,

dE / dWij = (dZj / dWij) * (dE / dZj)

dE / dWij = Yi * (dE / dZj)
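The weight gradient then needs only an outer product of the lower-layer outputs and the upper-layer dE/dZj terms; a minimal sketch with hypothetical values:

```python
# Yi: outputs of the lower layer; dE_dZ[j]: dE/dZj for the upper layer.
Y = [0.5, 0.9]
dE_dZ = [0.05, -0.1]

# dE/dWij = Yi * dE/dZj for every connection i -> j
dE_dW = [[Y[i] * dE_dZ[j] for j in range(len(dE_dZ))]
         for i in range(len(Y))]
```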

 



Proof:

y = 1 / (1 + e^-z) = (1 + e^-z)^-1

Thus, dy/dz = -1 * (1 + e^-z)^-2 * (-e^-z) = e^-z / (1 + e^-z)^2

dy/dz = (1 / (1 + e^-z)) * (e^-z / (1 + e^-z)) = y (1 - y)

because e^-z / (1 + e^-z) = ((1 + e^-z) - 1) / (1 + e^-z)

= (1 + e^-z) / (1 + e^-z) - 1 / (1 + e^-z) = 1 - y
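The identity dy/dz = y (1 - y) can also be checked numerically against a finite-difference estimate (a quick sketch, not part of the original derivation):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z = 0.7
h = 1e-6

# Central finite-difference estimate of dy/dz
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

# Analytic form y * (1 - y)
y = sigmoid(z)
analytic = y * (1.0 - y)
```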


Reference:

"Learning representations by back-propagating errors" by David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams, Nature 323 (October 1986)

https://www.iro.umontreal.ca/~vincentp/ift3395/lectures/backprop_old.pdf


