How to create a holographic camcorder

Since the camcorder was invented, the video camcorder has not advanced much.

Sure, there are a few interesting new features, like capturing 360-degree video or recording high-resolution 4K content. But the content is still 2D, and we still watch it on a 2D display.

Have you seen the movie Minority Report (2002)? There is a scene where Tom Cruise is watching a video recording of his lost son in 3D or holographically.

Here is a video clip of this scene.

I have been waiting for the technological advancement to do this, but it's not here yet. So I decided to build one myself.

In order to build a holographic video camcorder, we need two devices.

1) a video recorder

- a recorder which captures the video content in 3D or holographically.

2) a video display

- a display device which shows the recorded holographic content in 3D or holographically.

Do we have a technology to record a video holographically? Yes, we can now do it, and I'll explain below.

And do we have a technology to display a video holographically? Yes and no, and I'll explain below.

The Minority Report video clip shows a set of four external projection devices. They work together to create a holographic illusion when the captured video content is played back. Unfortunately, such technologies are still being worked on, so we cannot yet make an external holographic projection. However, we can create an internal holographic display. I'll explain below.

How to create a holographic camcorder?

In essence, a holographic camcorder is a camcorder which records a scene in 3D. That means that if you record a person with a holographic camcorder, the person should show up in 3D, and the viewer of the video content should be able to walk to the side of the recorded person, who is displayed in front of the viewer, to see that person's side view.

How to record video content holographically?

Before we jump into holographic video content, let's talk about the basics of the 2D video content we have all seen many times.

A typical 2D video clip recorded by a regular camcorder is a series of photos. Essentially, a video is about 30 photos (frames) per second, so one second of recording captures 30 photos. For a one-minute video, we have 30 photos * 60 seconds, or 1800 photos. So a 2D video is a series of many 2D photos. They are just played back fast, so we see an illusion of motion.
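The frame arithmetic above can be written as a small sketch. The 30 frames per second figure comes from the text; the function name `frame_count` is just an illustrative choice.

```python
def frame_count(duration_seconds, frames_per_second=30):
    """Return how many still photos (frames) make up a clip of the given length."""
    return int(duration_seconds * frames_per_second)


one_second = frame_count(1)    # 30 photos
one_minute = frame_count(60)   # 30 photos * 60 seconds = 1800 photos
print(one_second, one_minute)
```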

Let's talk about a holographic photo.

A typical 2D photo is a 2D image. A color photo is three images combined: an image in red only, an image in green only, and an image in blue only. So, let's talk about the image in red only and call it a light intensity photo. We could also calculate a single 2D light intensity photo another way, for example by converting the original photo from RGB to HSV.

For this discussion, a photo in the red channel will do.

It is a 2D image. It has two dimensions: width and height.

It records the intensity of light in pixels defined by the dimension of a photo. So, a 2D photo is made up of Width * Height pixels.

Each pixel has information about the intensity of light, but it has no depth information. Simply, the depth information is not recorded by the camera sensor.
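As a rough sketch of the idea, the snippet below builds a tiny synthetic color photo with NumPy and pulls out two candidate light intensity photos: the red channel, and the V (value) channel of HSV, which is the per-pixel maximum of the three color channels. The 2 x 3 image and the variable names are made up for illustration. Note that neither result carries any depth.

```python
import numpy as np

# A made-up color photo: height 2, width 3, and 3 color channels (R, G, B).
photo = np.array(
    [[[255, 0, 0], [0, 255, 0], [0, 0, 255]],
     [[10, 20, 30], [200, 100, 50], [0, 0, 0]]],
    dtype=np.uint8,
)

height, width, channels = photo.shape

# Option 1: use the red channel alone as the light intensity photo.
red_intensity = photo[:, :, 0]

# Option 2: the V channel of HSV is the largest of R, G and B at each pixel.
hsv_value = photo.max(axis=2)

print(red_intensity.shape)  # (height, width): one intensity per pixel, no depth
print(height * width)       # number of pixels in the photo
```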

Getting the depth information.

With the advancements in robotics, computer vision and machine learning, we can now use additional sensors alongside the camera sensor to retrieve the depth information.

LIDAR

The laser-based 1D or 3D scanning sensor is great, but it is susceptible to weather conditions like rain or snow, which cause high error rates in estimating the depth information about a scene.

It would be an ideal sensor for retrieving the most precise depth information, if we can figure out how to minimize the error rate in environmental conditions like snow, fog and rain.

RADAR

[Fill in the pros and cons of using the sensor in a consumer grade device.]

SONAR

The sound-based 3D scanning sensor.

[Fill in the pros and cons of using the sensor in a consumer grade device.]

Stereo Cameras

Using two cameras, we can estimate the depth information about a scene. This may be ideal for indoor situations, where the distance to capture doesn't go beyond 10m (about 33ft). We may be able to estimate the depth information up to 40m (about 131ft), but the estimate may cause a shivering effect on 3D content captured that far away.
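To see why far-away depth gets shaky, here is a minimal sketch using the standard relation for a rectified stereo pair: depth = focal length * baseline / disparity. The focal length (700 pixels), the baseline (0.12 m) and the half-pixel matching error are hypothetical values picked for illustration, not measurements from a real device.

```python
FOCAL_LENGTH_PX = 700.0  # hypothetical focal length, in pixels
BASELINE_M = 0.12        # hypothetical distance between the two cameras, in meters


def depth_from_disparity(disparity_px):
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    return FOCAL_LENGTH_PX * BASELINE_M / disparity_px


def disparity_from_depth(depth_m):
    """Inverse relation: the disparity (in pixels) produced by a point at this depth."""
    return FOCAL_LENGTH_PX * BASELINE_M / depth_m


def depth_error(depth_m, disparity_error_px=0.5):
    """How far the depth estimate moves if the disparity is underestimated slightly."""
    true_disparity = disparity_from_depth(depth_m)
    return depth_from_disparity(true_disparity - disparity_error_px) - depth_m


for distance in (10.0, 40.0):
    print(distance, round(depth_error(distance), 2))
```

With these assumed numbers, the same half-pixel error moves the estimate by well under a meter at 10m, but by more than 10 meters at 40m. Small frame-to-frame changes in the matched disparity therefore show up as large depth swings on distant content.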

The reason is that estimating the depth information hinges upon an error term, epsilon, which accumulates as we move about a scene with the holographic camcorder.

To further discuss the epsilon e, we need to find the inliers among the nearest-neighbor feature matches. To find the features, we need to calculate descriptors based on SIFT (Scale-Invariant Feature Transform), for example.

Finding the matching features is based on solving an SVD (Singular Value Decomposition) over any number of feature correspondences, using the Direct Linear Transformation (DLT).
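As an illustration of DLT solved with SVD, the sketch below estimates a 3x3 homography from point correspondences using NumPy. Choosing a homography as the model is an assumption made for this example (the same pattern is also used for other models); the function names and the sample points are hypothetical, and the correspondences here are noise-free, with no outlier rejection and no coordinate normalization.

```python
import numpy as np


def estimate_homography_dlt(src, dst):
    """Estimate H (3x3) such that dst ~ H * src, from 4 or more correspondences."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows, dtype=float)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale (assumes H[2, 2] is not zero)


def apply_homography(H, points):
    """Map 2D points through H using homogeneous coordinates."""
    points = np.asarray(points, dtype=float)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    mapped = homogeneous @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


# Made-up ground truth, used only to generate matching points for the demo.
H_true = np.array([[1.2, 0.1, 5.0], [-0.2, 0.9, 3.0], [0.001, 0.002, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 100], [0, 100], [50, 30]], dtype=float)
dst = apply_homography(H_true, src)

H_est = estimate_homography_dlt(src, dst)
print(np.round(H_est, 3))
```

With real feature matches the correspondences are noisy and contain outliers, which is where the inliers and the epsilon discussed above come in.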

Finding the descriptors will lead to the features, and with the matching features, we can estimate the depth structure by ...

[Insert Essential Matrix, Fundamental Matrix, Epipolar Lines math formulas]

Shivering Effect / Jitter Effect

These are all calculated estimates, not precisely matching inliers, and their errors accumulate over time.

This causes a shivering effect, or jitter. A shivering effect means the estimated depth information is not consistent: the calculated values shift back and forth between two or more sets of estimates.

With externally calibrated stereo cameras, we can simply calculate the disparity between the two views.
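As a minimal sketch of a disparity calculation, assume the two images are already rectified, so that a point appears on the same row in both views. The function below slides a small window along one row and keeps the shift with the smallest sum of absolute differences. The function name, the window size and the sample rows are hypothetical.

```python
import numpy as np


def scanline_disparity(left_row, right_row, x, half_window=2, max_disparity=8):
    """Find how far the patch around left_row[x] is shifted in right_row."""
    left_row = np.asarray(left_row, dtype=float)
    right_row = np.asarray(right_row, dtype=float)
    patch = left_row[x - half_window : x + half_window + 1]

    best_disparity, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        xr = x - d  # in a rectified pair, the match sits d pixels to the left
        if xr - half_window < 0:
            break  # the window would run off the edge of the row
        candidate = right_row[xr - half_window : xr + half_window + 1]
        cost = np.abs(patch - candidate).sum()  # sum of absolute differences
        if cost < best_cost:
            best_disparity, best_cost = d, cost
    return best_disparity


# Made-up row of intensities; the left view is the right view shifted by 3 pixels.
right_row = [3, 9, 1, 7, 4, 12, 0, 8, 15, 2, 11, 5, 14, 6, 10, 13, 16, 20, 18, 17]
left_row = np.roll(right_row, 3)

print(scanline_disparity(left_row, right_row, x=10))
```

A real pipeline would repeat this for every pixel and handle textureless or occluded regions, which this sketch ignores.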

Even with just one internally calibrated monocular camera, we can derive depth structure by applying SfM (Structure from Motion).

For this project, we can experiment with a monocular camera, stereo cameras, or more than 4 cameras (Multi-view Geometry) to estimate the depth information. Or, simply experiment with an infrared sensor for the first prototype.

Structure from Motion (SfM*)

Camera and Infrared Sensor Fusion

How to project a camera plane A to a camera plane B

How to Create a Holographic Display and Camcorder

In the last part of the series "How to Create a Holographic Display and Camcorder", I talked about what interest points, descriptors, and features are, and how they help find the same object in two photos. In this part of the series, I'll talk about how to extract the depth of the object in two photos by calculating the disparity between the photos.

In order to do that, we need to construct a triangle mesh between correspondences. To construct the mesh, we will use Delaunay triangulation.

Delaunay triangulation: it maximizes the minimum angle over all triangles, which avoids thin sliver triangles.

The reason for the triangulation is to do a piecewise affine transformation for each triangle mapped from a projective plane A to a projective plane B. Projective plane A is the camera's projective view at time t, while projective plane B is the camera's projective view at time t+1 (or at t-1; it really doesn't matter)...
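One step of the piecewise affine mapping can be sketched as follows: given one triangle in plane A and its matching triangle in plane B, solve for the 2x3 affine matrix that carries one onto the other. The triangles are assumed to be given already (for example, from a Delaunay triangulation of the correspondences); the function names and the coordinates are hypothetical.

```python
import numpy as np


def affine_from_triangle(tri_a, tri_b):
    """Return the 2x3 affine matrix M that maps the vertices of tri_a onto tri_b."""
    tri_a = np.asarray(tri_a, dtype=float)
    tri_b = np.asarray(tri_b, dtype=float)
    # One homogeneous vertex [x, y, 1] per row.
    src = np.hstack([tri_a, np.ones((3, 1))])
    # Solve src @ X = tri_b for X (3x2); its transpose is the affine matrix.
    # This fails if the three vertices of tri_a lie on one line.
    return np.linalg.solve(src, tri_b).T


def apply_affine(M, point):
    """Map one (x, y) point from plane A to plane B."""
    x, y = point
    return M @ np.array([x, y, 1.0])


# Made-up matching triangles in plane A (time t) and plane B (time t+1).
triangle_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
triangle_b = [(2.0, 3.0), (4.0, 3.0), (2.0, 6.0)]

M = affine_from_triangle(triangle_a, triangle_b)
centroid_a = np.mean(triangle_a, axis=0)
print(apply_affine(M, centroid_a))  # lands on the centroid of triangle_b
```

Repeating this for every triangle in the mesh gives a different affine map per triangle, which is what makes the overall transformation piecewise.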