How to reduce ToF errors in AR glasses

In this post, I describe how we reduced the noise of the time-of-flight (ToF) sensor in our AR glasses prototype.

Types of noise
- systematic noise
   note: caused by imperfect sinusoidal modulation
- random noise
   note: caused by shot noise. Use bilateral filtering to reduce it.
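
As a rough illustration of the bilateral filtering step, here is a minimal sketch in plain Python. It is not our production filter: the function name bilateral_filter, the window radius, and the two sigmas (sigma_s in pixels, sigma_r in meters) are assumptions chosen for the example.

```python
import math


def bilateral_filter(depth, radius=2, sigma_s=1.5, sigma_r=0.05):
    """Edge-preserving smoothing of a 2-D depth map (list of rows, in meters).

    radius, sigma_s and sigma_r are illustrative values, not tuned settings.
    """
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = depth[y][x]
            num = 0.0
            den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if not (0 <= yy < h and 0 <= xx < w):
                        continue  # skip neighbours outside the image
                    d = depth[yy][xx]
                    # spatial weight: nearby pixels count more
                    ws = math.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                    # range weight: pixels at a similar depth count more
                    wr = math.exp(-((d - center) ** 2) / (2 * sigma_r ** 2))
                    num += ws * wr * d
                    den += ws * wr
            out[y][x] = num / den  # den >= 1 because the center pixel has weight 1
    return out
```

The range weight is what keeps depth edges sharp: a neighbour that sits 1 m behind the center pixel gets a weight close to zero, so foreground and background are not averaged together.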

Motion artifact reduction
note: when the target object moves, the ToF sensor shows motion artifacts. This happens because the raw ToF measurements are recorded sequentially, so the scene changes between captures. Motion can also introduce Doppler effects.

fix:
- use Plus and Minus rules
   -- reference:
       1) "Time of flight motion compensation revisited"  (2014)
       2) "Time of flight cameras: Principles, Methods and Applications" (2012)
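
The references above define the Plus and Minus rules precisely. The sketch below is not those rules verbatim. It only shows a simple consistency check of a similar flavor, assuming an ideal sinusoidal correlation model with four sequential phase samples C_i = B + A*cos(phi + i*pi/2). Under that assumption, C0 + C2 and C1 + C3 both equal 2B for a static pixel, so a large mismatch hints that the scene changed between captures. All function names, the sign convention, and the 2% tolerance are hypothetical.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def four_phase_samples(depth_m, amplitude, offset, f_mod):
    """Ideal correlation samples at phase steps 0, 90, 180 and 270 degrees."""
    phi = 4 * math.pi * f_mod * depth_m / C_LIGHT
    return [offset + amplitude * math.cos(phi + i * math.pi / 2) for i in range(4)]


def depth_from_samples(c, f_mod):
    """Standard four-phase depth estimate for the sample model above."""
    phi = math.atan2(c[3] - c[1], c[0] - c[2]) % (2 * math.pi)
    return C_LIGHT * phi / (4 * math.pi * f_mod)


def motion_flag(c, rel_tol=0.02):
    """Flag a pixel when C0 + C2 and C1 + C3 disagree by more than rel_tol."""
    residual = abs((c[0] + c[2]) - (c[1] + c[3]))
    return residual > rel_tol * sum(c) / 2


f_mod = 20e6  # assumed modulation frequency
static = four_phase_samples(1.5, 100.0, 200.0, f_mod)
# Simulate motion: the last two samples see a different, darker surface.
moved = static[:2] + four_phase_samples(3.0, 40.0, 120.0, f_mod)[2:]
```

Here motion_flag(static) is False and motion_flag(moved) is True. A flagged pixel can then be corrected or discarded, depending on the method used.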


Physics-based multipath interference (MPI) reduction

fix:
- use 2K+1 frequency measurements to resolve K interfering paths (this count holds in the absence of noise).
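
To see why a single frequency is not enough, here is a small sketch of the usual multipath model, where each pixel measures the sum of K complex phasors, one per path. The helper names tof_phasor and naive_depth, the amplitudes, and the 20 MHz modulation frequency are assumptions for illustration. Each path adds two unknowns (amplitude and depth), which is why more frequency measurements are needed as K grows.

```python
import cmath
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def tof_phasor(paths, f_mod):
    """Complex measurement for a list of (amplitude, depth_m) paths."""
    return sum(
        a * cmath.exp(-1j * 4 * math.pi * f_mod * d / C_LIGHT) for a, d in paths
    )


def naive_depth(phasor, f_mod):
    """Single-frequency depth estimate that assumes only one path exists."""
    phi = (-cmath.phase(phasor)) % (2 * math.pi)
    return C_LIGHT * phi / (4 * math.pi * f_mod)


f_mod = 20e6
direct_only = naive_depth(tof_phasor([(1.0, 1.5)], f_mod), f_mod)
with_mpi = naive_depth(tof_phasor([(1.0, 1.5), (0.4, 2.5)], f_mod), f_mod)
```

With only the direct path, the estimate matches the true 1.5 m. Adding a weaker indirect path at 2.5 m pulls the single-frequency estimate away from the direct surface, toward the longer path.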


Per-pixel temporal processing of raw ToF measurements

fix:
- matrix pencil method
- Prony's method
- orthogonal matching pursuit (OMP)
- ESPRIT / MUSIC
- atomic norm regularization
- light transport model with sparse and low-rank components
- phasor imaging
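
As an example of one method from this list, here is a minimal sketch of Prony's method for K = 2 paths using 2K+1 = 5 equally spaced modulation frequencies. It assumes noise-free measurements and the phasor-sum model m(f) = sum_k a_k * exp(-j*4*pi*f*d_k/c). The function names, the frequency grid (20 MHz start, 10 MHz step), and the path values are hypothetical.

```python
import cmath
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def simulate(paths, f0, df, n_freq):
    """Noise-free per-pixel measurements at f0, f0 + df, f0 + 2*df, ..."""
    out = []
    for n in range(n_freq):
        f = f0 + n * df
        out.append(sum(
            a * cmath.exp(-1j * 4 * math.pi * f * d / C_LIGHT) for a, d in paths
        ))
    return out


def prony_two_paths(m, df):
    """Recover two path depths from at least 5 equally spaced frequency samples."""
    # Linear prediction: m[n] + a1*m[n-1] + a2*m[n-2] = 0 for n >= 2.
    rows = [(m[n - 1], m[n - 2], -m[n]) for n in range(2, len(m))]
    # Least squares via the 2x2 normal equations, solved with Cramer's rule.
    g11 = sum(abs(r[0]) ** 2 for r in rows)
    g22 = sum(abs(r[1]) ** 2 for r in rows)
    g12 = sum(r[0].conjugate() * r[1] for r in rows)
    h1 = sum(r[0].conjugate() * r[2] for r in rows)
    h2 = sum(r[1].conjugate() * r[2] for r in rows)
    det = g11 * g22 - g12 * g12.conjugate()
    a1 = (h1 * g22 - g12 * h2) / det
    a2 = (g11 * h2 - g12.conjugate() * h1) / det
    # Roots of z^2 + a1*z + a2 are z_k = exp(-j*4*pi*df*d_k/c).
    disc = cmath.sqrt(a1 * a1 - 4 * a2)
    roots = [(-a1 + disc) / 2, (-a1 - disc) / 2]
    scale = C_LIGHT / (4 * math.pi * df)
    return sorted(((-cmath.phase(z)) % (2 * math.pi)) * scale for z in roots)


measurements = simulate([(1.0, 1.5), (0.4, 2.5)], f0=20e6, df=10e6, n_freq=5)
depths = prony_two_paths(measurements, df=10e6)
```

In this noise-free setting, depths comes back as the two path depths, 1.5 m and 2.5 m, up to rounding. With real sensor noise the root finding becomes less stable, which is one reason the other methods in the list exist.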

reference:
- "Signal processing for time-of-flight imaging sensors: An introduction to inverse problems in computational 3-D imaging" (2016)
- "Resolving multipath interference in Kinect: An inverse problem approach" (2016)
- "Recent advances in transient imaging: A computer graphics and vision perspective" (2017)
- "SRA: Fast removal of general multipath for ToF sensors" (2014)
- "Phasor imaging: A generalization of correlation-based time-of-flight imaging" (2015)

Learning-based MPI reduction

fix:
- use an encoder to learn a mapping from captured ToF measurements to a feature representation of the MPI-corrupted path.
- combine it with simulated, direct-only ToF measurements to train a decoder, so that it produces MPI-corrected depth maps.

- use a KUKA robot and structured light to capture ToF measurements with registered ground-truth (GT) depth.
- then train two neural networks to correct depth and refine edges using geodesic filtering.

- use transient rendering to synthesize a training dataset with realistic shot noise.
- then generate measurements from ToF sensors with random modulation path.
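
To make the "realistic shot noise" part concrete, here is a minimal sketch of adding shot noise to synthetic correlation samples. It is not the pipeline from the references: the function name add_shot_noise and the photons_per_unit scale are hypothetical, and the Poisson statistics are approximated by a Gaussian with variance equal to the mean, which is only reasonable for large photon counts.

```python
import math
import random


def add_shot_noise(samples, photons_per_unit=1.0, rng=None):
    """Add approximate shot noise to clean, non-negative ToF samples.

    Assumption: Poisson(expected) is approximated by a Gaussian with
    mean = variance = expected (fine for counts in the hundreds or more).
    """
    rng = rng or random.Random()
    noisy = []
    for s in samples:
        expected = max(s * photons_per_unit, 0.0)
        count = rng.gauss(expected, math.sqrt(expected))
        # Photon counts are non-negative integers.
        noisy.append(max(round(count), 0) / photons_per_unit)
    return noisy


rng = random.Random(0)  # fixed seed so the synthetic data is reproducible
clean = [1000.0] * 20000
noisy = add_shot_noise(clean, rng=rng)
```

The noise level scales with the square root of the signal, so dark or distant surfaces end up with a worse signal-to-noise ratio than bright ones, which is the behaviour a training set should reproduce.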

reference:
- "DeepToF: Off-the-shelf real-time correction of multipath interference in time-of-flight imaging" (2017)
- "Automatic learning to remove multipath distortions in time-of-flight range images for a robotic arm setup" (2016)
- "Recent advances in transient imaging: A computer graphics and vision perspective" (2017)
- "A framework for transient rendering" (2014)


I will add more notes here as we make more progress.




