A feature locator is essential in every computer vision (CV) domain. It is the basis for everything from geometric transformations and epipolar geometry to 3D mesh reconstruction.
Many techniques are available, from SIFT to the feature trackers used in SLAM, but they need near-ideal environments to work well.
Their shortcomings include:
- sensitivity to low-texture environments
- sensitivity to low-light environments
- sensitivity to bright environments (such as outdoor daylight above 20k lux)
- and many other issues
To address these, I propose a CNN-based neural network that detects 4 correspondences between an image A and an image B.
Since it is tricky to get a neural network to predict a 4x4 affine matrix of rotation and translation directly, I separated the translation vector from the rotation vector.
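A minimal sketch of that split, assuming NumPy and a 4x4 homogeneous transform whose upper-left 3x3 block is the rotation and whose last column is the translation. The helper names `split_transform` and `join_transform` are made up for illustration, and the rotation is kept as a 3x3 matrix here; the encoding actually fed to the network may differ.

```python
import numpy as np


def split_transform(T):
    """Split a 4x4 homogeneous transform into (rotation 3x3, translation 3)."""
    T = np.asarray(T, dtype=float)
    if T.shape != (4, 4):
        raise ValueError("expected a 4x4 matrix")
    R = T[:3, :3].copy()  # upper-left block: rotation
    t = T[:3, 3].copy()   # last column: translation
    return R, t


def join_transform(R, t):
    """Rebuild the 4x4 homogeneous transform from its two parts."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(R, dtype=float)
    T[:3, 3] = np.asarray(t, dtype=float)
    return T


# Example: 90 degree rotation about the z axis plus a translation of (1, 2, 3).
R_z = np.array([[0.0, -1.0, 0.0],
                [1.0,  0.0, 0.0],
                [0.0,  0.0, 1.0]])
T = join_transform(R_z, [1.0, 2.0, 3.0])
R, t = split_transform(T)
```

Keeping the two parts separate lets each one be predicted and scored on its own scale, instead of mixing unit-length rotation entries with translation values in one output.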
Basically, the ground truth data is precalculated with a generic SIFT plus RANSAC pipeline, which yields the correspondence sets P and P'.
The loss is the L2 (Euclidean) distance between the predicted P' and the ground-truth P'. Since each set has 4 points, the 4 distances are averaged to get the delta between the predicted P' and the ground-truth P'.
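The averaged distance can be sketched in a few lines of NumPy. This assumes each point is a 2D pixel coordinate, so a point set is a 4x2 array; the function name `mean_point_distance` is made up for this example, and the Theano version used for training is not shown here.

```python
import numpy as np


def mean_point_distance(pred, truth):
    """Average Euclidean distance between matching points of two point sets."""
    pred = np.asarray(pred, dtype=float).reshape(-1, 2)
    truth = np.asarray(truth, dtype=float).reshape(-1, 2)
    if pred.shape != truth.shape:
        raise ValueError("point sets must have the same number of points")
    per_point = np.linalg.norm(pred - truth, axis=1)  # one L2 distance per point
    return float(per_point.mean())


# Example: every predicted point is off by (3, 4) pixels, so each distance is 5.
truth_pts = np.array([[10.0, 10.0], [100.0, 10.0], [100.0, 80.0], [10.0, 80.0]])
pred_pts = truth_pts + np.array([3.0, 4.0])
print(mean_point_distance(pred_pts, truth_pts))  # prints 5.0
```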
Using Theano, I built a neural network and trained it over a few weeks.
The prediction errors were within 25% of the ground truth.
Further work:
I didn't calculate a confidence value, but I would like to add one to the prediction graph. This means we should be using cross entropy instead of regression here.
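As a rough idea of what that could look like, here is a binary cross-entropy loss in NumPy. It assumes the network would emit one confidence value in (0, 1) per correspondence, with a 0/1 label saying whether that correspondence is acceptable; both the labeling scheme and the name `binary_cross_entropy` are hypothetical, since none of this has been built yet.

```python
import numpy as np


def binary_cross_entropy(confidence, is_correct, eps=1e-7):
    """Mean binary cross entropy between confidences and 0/1 labels."""
    c = np.clip(np.asarray(confidence, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    y = np.asarray(is_correct, dtype=float)
    return float(np.mean(-(y * np.log(c) + (1.0 - y) * np.log(1.0 - c))))


# Example: 4 correspondences, the last one labeled as wrong.
labels = np.array([1.0, 1.0, 1.0, 0.0])
good_conf = np.array([0.9, 0.9, 0.9, 0.1])  # confident and right
bad_conf = np.array([0.1, 0.1, 0.1, 0.9])   # confident and wrong
loss_good = binary_cross_entropy(good_conf, labels)
loss_bad = binary_cross_entropy(bad_conf, labels)
```

A network that is confidently wrong gets a much larger loss than one that is confidently right, which is the behavior a confidence output needs.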
Hardware:
- CPU: Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
- Memory: 2GB RAM
- GPU: GeForce GTX 285
- BLAS: Intel Math Kernel Library, version 10.2.4.032
- Compute:
  - CPU: double precision
  - GPU: single precision