
Posts

Creating an optical computer

Notes on creating an optical computer.

What is an optical computer? A laptop is a microchip-based computer that uses electricity and transistors to compute. An optical computer uses photons to compute.

How does it compare to a typical laptop? A modern desktop computer delivers about 5 TFLOPS (5 x 10^12 floating-point operations per second). An optical computer is not bound by the same transistor switching limits on calculations per second.

Is an optical computer faster than a quantum computer? In 2016, the largest known quantum computer had 2000 qubits, claimed to be roughly 1000x faster than the earlier 512-qubit model. An optical computer has no artificial limitation like 2000 or 512 qubits.

What's the theoretical compute limit on an optical computer? The speed of light. For now, the only practical limitation is how we design the first prototype.

How much electrical energy does it require? The first proof of concept should draw less than 1000 W.

Have there been any prior inventions or work…
Recent posts

How to reduce TOF errors in AR glasses

In this blog, I will describe how we reduced the noise of the Time-of-Flight (ToF) sensor in our AR glasses prototype.

Types of noise
- systematic noise: caused by imperfect sinusoidal modulation
- random noise: caused by shot noise; use bilateral filtering (a sketch follows below)

Motion artifact reduction
Note: when a target object moves, motion artifacts appear in the ToF sensor. This happens because ToF measurements are recorded sequentially, which also causes Doppler effects.
Fix:
- use the Plus and Minus rules
  - references:
    1) "Time of flight motion compensation revisited" (2014)
    2) "Time-of-Flight Cameras: Principles, Methods and Applications" (2012)

Physics-based MPI (multi-path interference) reduction
Fix:
- use 2K+1 frequency measurements for K interfering paths in the absence of noise

Per-pixel temporal processing of raw ToF measurements
Fix:
- matrix pencil method
- Prony's method
- orthogonal matching pursuit
- ESPRIT / MUSIC
- atomi…
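For the random (shot) noise item above, here is a minimal sketch of the edge-preserving bilateral-filtering step on a depth frame. It assumes OpenCV and a float32 depth image in meters; the parameter values are illustrative, not the ones from our prototype.

```python
import numpy as np
import cv2

def denoise_depth(depth_m: np.ndarray) -> np.ndarray:
    """Suppress shot noise in a ToF depth frame with a bilateral filter:
    flat regions are smoothed while depth discontinuities are preserved."""
    depth32 = depth_m.astype(np.float32)
    # args: neighborhood diameter (px), sigmaColor (how far apart two depth
    # values in meters may be and still get averaged), sigmaSpace (spatial
    # falloff in px). Illustrative values; tune per sensor.
    return cv2.bilateralFilter(depth32, 5, 0.05, 3)

# usage: clean = denoise_depth(raw_depth)   # raw_depth: HxW float32, meters
```

These parameters trade smoothing strength against edge preservation, so they should be tuned against the sensor's actual noise floor.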

How to train a neural network to retrieve 3D maps from videos

This blog is about how to train a neural network to extract depth maps from videos of moving people captured with a monocular camera.

Note: with a monocular camera, extracting the depth map of moving people is difficult, largely because of motion blur and the rolling shutter of the image. However, we can overcome these limitations by predicting depth maps with a model trained on a dataset generated with SfM and MVS from the normalized videos. This normalized dataset can serve as the basis of the training set for a neural network that automatically extracts accurate depth maps from typical video footage, without any further assistance from an MVS.

To start this project with SfM and MVS, we will use the TUM dataset. So, the basic idea is to use SfM and multi-view stereo to estimate depth, which then serves as supervision during training (see the sketch below). The RGB-D SLAM reference implementations from these papers are used:
- RGB-D SLAM (ROS)
- Real-time 3D Visual SLAM with a ha…
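To make the supervision idea concrete, here is a minimal PyTorch sketch of one training step. It assumes RGB frames and their SfM/MVS depth maps have already been exported as tensors; `model`, `si_depth_loss`, and the scale-invariant loss choice are my illustration, not necessarily the exact setup used here.

```python
import torch

def si_depth_loss(pred_log_depth, target_log_depth, valid):
    """Scale-invariant log-depth loss: SfM/MVS depth is known only up to
    scale, so penalize differences after removing the mean log offset."""
    diff = (pred_log_depth - target_log_depth)[valid]
    return (diff ** 2).mean() - diff.mean() ** 2

def train_step(model, optimizer, rgb, mvs_depth):
    """One supervised step: the network predicts log-depth for an RGB frame,
    and the SfM/MVS depth acts as the (masked) training signal."""
    valid = mvs_depth > 0                              # MVS leaves holes
    pred = model(rgb)                                  # (B,1,H,W) log-depth
    target = torch.log(mvs_depth.clamp(min=1e-6))
    loss = si_depth_loss(pred, target, valid)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```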

State of the Art SLAM techniques

The best stereo SLAM systems of 2017 are reviewed, namely (in arbitrary order): EKF-SLAM based, keyframe based, joint BA optimization based, RSLAM, S-PTAM, and LSD-SLAM. The best RGB-D SLAM systems of 2017 are also reviewed: KinectFusion, Kintinuous, DVO-SLAM, ElasticFusion, and RGB-D SLAM.

Here are my key points on the best stereo SLAM systems.

Stereo SLAM
- Conditionally Independent Divide and Conquer EKF-SLAM [5]
  - operates in larger environments than other approaches at the time
  - uses both close and far points (far points are those whose depth cannot be reliably estimated due to little disparity in the stereo camera)
  - uses an inverse depth parametrization [6]
  - shows empirically that points can be triangulated reliably if their depth is less than about 40 times the stereo baseline (see the sketch below)
- Keyframe-based stereo SLAM
  - uses BA optimization in a local area to achieve scalability
  - [8]: joint optimization of BA (point-pose constraints) in an inner window
    - pose-graph (pose-p…
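The 40x-baseline rule from [5] is easy to apply in code. Here is an illustrative helper (the function name and `ratio` parameter are mine) that splits triangulated stereo points into close and far sets:

```python
def split_close_far(depths_m, baseline_m, ratio=40.0):
    """Split stereo points into 'close' (reliably triangulated, depth under
    ~40x the baseline) and 'far' (too little disparity) index lists."""
    threshold = ratio * baseline_m
    close = [i for i, d in enumerate(depths_m) if d < threshold]
    far = [i for i, d in enumerate(depths_m) if d >= threshold]
    return close, far

# usage: with a 12 cm baseline, points beyond ~4.8 m count as far
# close, far = split_close_far(depths, baseline_m=0.12)
```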

Finding a better local minimum in Deep Learning

Training a DL model to find a local minimum in n dimensions can be a challenge. Often, data scientists and ML engineers use gradient descent to optimize the path. The initial learning rate is typically somewhere between 1e-4 and 1e-3, and a constant learning rate does not approach a local minimum quickly. There are a few issues with this approach.

1) The first local minimum found may not be the best minimum. The optimizer can get stuck in a sharp valley, where any change in the derivative pushes the error rate above 50%.
2) The first local minimum found may not be a minimum at all but a saddle point, as shown in the saddle point graph below.

When optimizing over the n-dimensional parameter space of a DL model, the best approach is to find a flat valley, where SGD can settle on stable ground and error rates stay low, or close to the best value reached during optimization. However, there is a better way than this. Instead of manually entering an initial learning rate and updating it every epoch o…
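The excerpt cuts off before naming the alternative, so as one common instance of automating the learning-rate schedule rather than hand-editing it every epoch, here is a minimal PyTorch sketch using cosine annealing; the schedule choice and the toy model are my illustration, not necessarily what the post goes on to describe.

```python
import torch

model = torch.nn.Linear(10, 1)                 # toy stand-in for a DL model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... run the training batches for this epoch, calling optimizer.step() ...
    scheduler.step()   # decay the LR smoothly instead of editing it by hand
```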

Review of ORB-SLAM: a monocular SLAM system

ORB-SLAM
- Uses
  - bundle adjustment
  - ORB features [9]
  - a pose graph
    - essential graph
      - a spanning tree
      - loop closure links
      - strong edges from the covisibility graph
  - covisibility graph
    - local covisible area
    - tracking and mapping
- Map point and keyframe selection
  - generous spawning
  - restrictive culling
  - identifies redundant keyframes
  - improves robustness and lifelong operation
- Stores map points:
  - the 3D position X(w,i) in the world coordinate system
  - the viewing direction n(i)
    - the mean unit vector of all its viewing directions
    - each viewing direction is the ray that joins the point with the optical center of the keyframe
  - a representative ORB descriptor D(i)
    - the associated ORB descriptor whose Hamming distance is minimum with respect to all other associated descriptors in the keyframes in which the point is observed (see the sketch below)
  - the maximum d(max) and minimum d(min) distance
    - a…
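For the representative descriptor D(i), here is a minimal sketch of one way to realize the min-Hamming-distance rule, assuming the point's observations are stacked as 32-byte ORB descriptors in a NumPy array (the function name is mine):

```python
import numpy as np

def representative_descriptor(descs: np.ndarray) -> np.ndarray:
    """Return the ORB descriptor whose total Hamming distance to all other
    observations of the same map point is minimal.

    descs: (N, 32) uint8 array, one 256-bit ORB descriptor per keyframe
    in which the point is observed.
    """
    xor = descs[:, None, :] ^ descs[None, :, :]       # pairwise XOR (N,N,32)
    dist = np.unpackbits(xor, axis=-1).sum(axis=-1)   # popcount -> (N,N)
    return descs[dist.sum(axis=1).argmin()]           # medoid descriptor
```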

Public Datasets for SLAM

TUM RGB-D benchmark [38]
- an excellent dataset for evaluating the accuracy of camera localization
- several sequences with accurate ground truth obtained with an external motion capture system

KITTI
- extracted 2000 corners (5 corners per cell) at resolutions of 512x384, 752x480, and 1241x376
- compute orientation & ORB descriptors
- novel, direct, semi-dense LSD-SLAM [10]
  - takes time for the depth values to converge
- PTAM benchmarks [4]
  - manually selected two keyframes for initialization
- align the keyframe trajectories using the similarity transformation (scale is unknown)
- measure the absolute trajectory error (ATE) [38] (see the sketch below)

RGB-D SLAM [43]
- trajectories
- use the similarity transform to check whether the scale is well recovered
- align the trajectories with a rigid body transformation
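For the ATE [38] measurement, here is a minimal sketch of the rigid-body variant used for RGB-D trajectories (array names are mine; the monocular case would use a similarity transform instead, since scale is unknown):

```python
import numpy as np

def ate_rmse(est: np.ndarray, gt: np.ndarray) -> float:
    """Absolute trajectory error (RMSE) after rigidly aligning the estimated
    trajectory to ground truth via the Kabsch/SVD method.

    est, gt: (N, 3) arrays of time-associated camera positions.
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    U, _, Vt = np.linalg.svd((est - mu_e).T @ (gt - mu_g))
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # no reflections
    R = (U @ S @ Vt).T                 # rotation taking est into gt's frame
    t = mu_g - R @ mu_e                # matching translation
    err = est @ R.T + t - gt           # per-pose residuals after alignment
    return float(np.sqrt((err ** 2).sum(axis=1).mean()))
```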