What are the depth sensors?

How to Create a holographic display and camcorder

In the last part of the series, I talked about why depth sensors may not be ideal for a consumer-grade camcorder.

These depth sensors have several drawbacks:
  • No miniaturized form factor
  • Poor cost effectiveness
  • Poor weather handling
  • Noticeable noise errors

Due to these limitations, the holographic display and camcorder will use alternatives to depth sensors.

What are the depth sensor alternatives?

Cameras


We can use one or more cameras and retrieve the depth information from their images.

The possible camera configurations are
  • Monocular Camera
  • Stereoscopic Cameras
  • N-View Cameras


For the first prototype, we will limit our use case to indoors.

I haven't decided whether to use a monocular camera, stereoscopic cameras, or n-view cameras.  This may largely be decided by how much time I have available.  Likely, I will use all of these camera configurations to compare and contrast the results, as well as the design and the ease of use.

The camcorder should record
  • A person
  • Indoors

What are the depth sensors?

A camera records a scene.
The scene is recorded in 2D.  That is, it has width and height.
A camera does not record any depth distance.


A depth sensor records the z-axis distance to every depth point.

The z-axis distance is the depth distance.


  • It is the distance between the depth sensor emitter and the surface point of an object in the scene.

For example, imagine you shoot a single laser beam from the depth sensor emitter at some object. Let's say it's a small cube-shaped box in the scene.

When the laser beam hits some point on the surface of the small box, you should see only one laser beam point reflected on the surface of the box.  

This reflected point is the depth point. 

The light from this depth point is reflected back to the depth sensor.  The time of flight, from emission to the return of the reflection, is measured to calculate the distance between the depth sensor emitter and the reflected surface point.
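As a rough illustration of that calculation, here is a minimal sketch in Python. It assumes an idealized sensor that reports the round-trip time of a single beam directly. The function name `tof_to_distance` is made up for this example and does not come from any particular sensor SDK.

```python
# Speed of light in vacuum, in metres per second (exact by SI definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_to_distance(round_trip_seconds: float) -> float:
    """Convert a round-trip time of flight to a one-way distance in metres.

    Hypothetical helper for illustration only.
    """
    if round_trip_seconds < 0:
        raise ValueError("time of flight cannot be negative")
    # The beam travels to the surface and back, so halve the path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


# A 20 nanosecond round trip corresponds to a surface roughly 3 m away.
distance_m = tof_to_distance(20e-9)
print(f"{distance_m:.3f} m")
```

The division by two is the important part: the measured time covers the trip to the surface and back, while the depth distance is only one way.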

Now, imagine multiple laser beams hitting the surface of the box.   This means we can sample the distance from the laser beam emitters to the surface at each point.

That is just one object.

What if we shoot many laser beams to all objects in the scene?

With this, we can sample the time of flight distances between the laser beam emitters and the reflected depth points on all visible objects.
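To make the idea of many beams concrete, here is a small sketch that turns a grid of round-trip times into a grid of distances, one value per beam. The grid layout, the use of `None` for a beam with no detected return, and the name `depth_map_from_tof` are all assumptions made for this illustration.

```python
# Speed of light in vacuum, in metres per second (exact by SI definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def depth_map_from_tof(round_trip_times):
    """Convert a 2D grid of round-trip times (seconds) to distances (metres).

    Hypothetical helper: None marks a beam whose reflection never came back.
    """
    return [
        [None if t is None else SPEED_OF_LIGHT_M_PER_S * t / 2.0 for t in row]
        for row in round_trip_times
    ]


# A made-up 2 x 3 grid of beams: a far wall at about 3 m, a box at about
# 1.5 m in the lower left, and one beam with no return.
times = [
    [20e-9, 20e-9, None],
    [10e-9, 10e-9, 20e-9],
]
depth_map = depth_map_from_tof(times)
for row in depth_map:
    print(row)
```

Each entry in the result is the depth distance for one beam, so the whole grid is a coarse depth picture of the scene.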



In the next part, I'll talk about how to use the cameras to retrieve the depth information.
After that, we can use the depth distance points to reconstruct a scene.


