How to use a Convolutional Neural Network to predict SIFT features



A feature locator is essential in every computer vision (CV) domain. It is the basis of geometric transformations, epipolar geometry, and 3D mesh reconstruction.

Many techniques are available, including SIFT and those used in SLAM systems, but they need near-ideal environments to work well.

To address these shortcomings:

- sensitivity to low-texture environments
- sensitivity to low-light environments
- sensitivity to high-light environments (such as outdoor daylight above 20k lux)
- and many other issues

I propose a CNN-based neural network that detects 4 correspondences between an image A and an image B.

Since it is tricky to have a neural network predict a 4x4 affine matrix of rotation and translation directly, I separated the translation vector from the rotation vector.
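The separation described above can be sketched as follows. This is only an illustration in NumPy, not the code used for this post: the helper names `split_transform` and `join_transform` are hypothetical, and the rotation is kept as a 3x3 matrix here because the post does not say which rotation parameterization was used.

```python
import numpy as np

def split_transform(T):
    """Split a 4x4 homogeneous transform into a 3x3 rotation and a 3-vector translation."""
    T = np.asarray(T, dtype=np.float64)
    if T.shape != (4, 4):
        raise ValueError("expected a 4x4 homogeneous transform")
    R = T[:3, :3].copy()  # upper-left 3x3 block: rotation
    t = T[:3, 3].copy()   # last column, first three rows: translation
    return R, t

def join_transform(R, t):
    """Rebuild the 4x4 homogeneous transform from its two parts."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical example: a 90 degree rotation about the z axis plus a translation.
theta = np.pi / 2
R_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
T = join_transform(R_z, [1.0, 2.0, 3.0])
R, t = split_transform(T)
```

With the two parts split like this, a network could in principle predict each with its own output head, and the full matrix can be reassembled afterwards.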

Basically, the ground truth data is precalculated with a generic SIFT detector plus RANSAC, which yields the correspondence sets P and P'.

The L2 (Euclidean) distance is used to compare a predicted P' with the ground-truth P'. Since there are 4 points, the distances are averaged to get the delta between the predicted P' and the true P'.
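Here is a minimal sketch of that averaged distance in NumPy. It assumes each set of points is stored as a (4, 2) array of (x, y) pixel coordinates; the function name `mean_point_distance` and the sample values are made up for illustration.

```python
import numpy as np

def mean_point_distance(pred, target):
    """Average Euclidean (L2) distance between matching points.

    pred, target: arrays of shape (4, 2), one (x, y) row per point.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    per_point = np.linalg.norm(pred - target, axis=1)  # one distance per point
    return float(per_point.mean())

# Hypothetical values, for illustration only.
p_true = np.array([[10.0, 10.0], [50.0, 10.0], [50.0, 40.0], [10.0, 40.0]])
offsets = np.array([[3.0, 4.0], [0.0, 0.0], [0.0, 5.0], [6.0, 8.0]])
p_pred = p_true + offsets

loss = mean_point_distance(p_pred, p_true)
```

The four per-point distances here are 5, 0, 5, and 10 pixels, so the averaged delta is 5.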

Using Theano, I built a neural network and trained it over a few weeks.

The prediction errors were within 25% of the ground truth.

Further work:

I didn't calculate a confidence value, but I would like to add one to the prediction graph. This means we should be using cross-entropy instead of regression here.
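As a rough idea of what a confidence term could look like, here is a sketch of a binary cross-entropy loss in NumPy. Everything in it is an assumption rather than part of the trained network: it supposes the network would output one confidence value in [0, 1] per correspondence, with a label of 1 when the correspondence is valid and 0 otherwise.

```python
import numpy as np

def binary_cross_entropy(conf, label, eps=1e-12):
    """Mean binary cross-entropy between predicted confidences and 0/1 labels."""
    conf = np.clip(np.asarray(conf, dtype=np.float64), eps, 1.0 - eps)  # avoid log(0)
    label = np.asarray(label, dtype=np.float64)
    losses = -(label * np.log(conf) + (1.0 - label) * np.log(1.0 - conf))
    return float(losses.mean())

# Hypothetical labels: the first three correspondences are valid, the last is not.
labels = np.array([1.0, 1.0, 1.0, 0.0])
good_conf = np.array([0.9, 0.8, 0.95, 0.1])  # confident and mostly right
bad_conf = np.array([0.2, 0.3, 0.1, 0.9])    # confident and mostly wrong

good_loss = binary_cross_entropy(good_conf, labels)
bad_loss = binary_cross_entropy(bad_conf, labels)
```

The loss is low when the confidences agree with the labels and grows quickly when the network is confidently wrong, which is the behavior you would want from a confidence output.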

Hardware:

- CPU:
  - Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
- Memory:
  - 2GB RAM
- GPU:
  - GeForce GTX 285
- BLAS:
  - Intel Math Kernel Library, version 10.2.4.032
- Compute:
  - CPU: double precision
  - GPU: single precision




