
State of the Art SLAM techniques

This post reviews the best stereo SLAM systems as of 2017.

Namely, (in arbitrary order)

  • EKF-SLAM based, 
  • Keyframe based, 
  • Joint BA optimization based, 
  • RSLAM, 
  • S-PTAM, 
  • LSD-SLAM,  


The best RGB-D SLAM systems as of 2017 are also reviewed.

  • KinectFusion, 
  • Kintinuous, 
  • DVO-SLAM, 
  • ElasticFusion, 
  • RGB-D SLAM,  


Below are my key points on each of these systems.


Stereo SLAM

Conditionally Independent Divide and Conquer EKF-SLAM [5] 

  • operates in larger environments than other approaches of its time
  • uses both close and far points
  • far points are those whose depth cannot be reliably estimated due to little disparity in the stereo camera
  • uses an inverse depth parametrization [6] for the far points
  • shows empirically that points can be triangulated reliably if their depth is less than about 40 times the stereo baseline.
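
The 40-times-baseline rule can be turned into a quick check. The sketch below is only an illustration, not code from [5]: it assumes a rectified stereo pair, where depth is z = f * b / d, and the function names and the camera numbers (700 px focal length, 0.12 m baseline) are made up for the example.

```python
def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point from a rectified stereo pair: z = f * b / d."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity: the point is effectively at infinity
    return focal_px * baseline_m / disparity_px


def is_close_point(depth_m: float, baseline_m: float, ratio: float = 40.0) -> bool:
    """True if the depth is below `ratio` times the baseline (the ~40x rule of thumb)."""
    return depth_m < ratio * baseline_m


# Hypothetical rig: 700 px focal length, 0.12 m baseline, so the threshold is 4.8 m.
FOCAL_PX, BASELINE_M = 700.0, 0.12
for disparity in (40.0, 10.0, 1.0):
    z = stereo_depth(FOCAL_PX, BASELINE_M, disparity)
    label = "close" if is_close_point(z, BASELINE_M) else "far"
    print(f"disparity={disparity:5.1f} px  depth={z:6.2f} m  {label}")
```

Under this model the depth threshold is equivalent to a disparity threshold of f / 40 pixels, which is 17.5 px for the rig assumed here. Points with less disparity than that would be treated as far points.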

   
- Keyframe-based Stereo SLAM
  - uses BA optimization in a local area to achieve scalability.
  - [8]: joint optimization of BA (point-pose constraints) in an inner window
     -  pose-graph (pose-pose constraints) in an outer window of keyframes
     - achieves the constant time complexity by limiting the size of these windows
     - at the expense of not guaranteeing global consistency.

  - [9]: RSLAM uses a relative representation of landmarks and poses
    - performs relative BA in an active area, which can be constrained for constant-time operation
    - able to close loops
    - closing a loop allows the active areas to expand on both sides of the loop
    - does not enforce global consistency

  - [10]: S-PTAM
    - performs local BA
    - lacks large loop closing


 - [11]: LSD-SLAM
   - a semi-dense direct approach
   - minimizes photometric error in image regions with high gradient.
   - expected to be more robust than feature-based methods to
     - motion blur
     - low-textured environments
   - performance can be severely degraded by unmodeled effects such as
     - rolling shutter
     - non-Lambertian reflectance.
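
To make "photometric error in image regions with high gradient" concrete, here is a toy sketch. It is not the LSD-SLAM implementation: a real direct method warps pixels using the estimated pose and depth before comparing intensities, and that step is left out here. The function names and the gradient threshold are assumptions chosen for illustration.

```python
import numpy as np


def high_gradient_mask(image: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask of pixels whose intensity-gradient magnitude exceeds `threshold`."""
    gy, gx = np.gradient(image.astype(np.float64))
    return np.hypot(gx, gy) > threshold


def photometric_error(ref: np.ndarray, cur: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared intensity difference, evaluated over the masked pixels only."""
    diff = ref.astype(np.float64)[mask] - cur.astype(np.float64)[mask]
    return float(np.mean(diff ** 2)) if diff.size else 0.0


# Toy image: flat background with a vertical step edge in the middle.
ref = np.zeros((8, 8))
ref[:, 4:] = 100.0
cur = np.roll(ref, 1, axis=1)  # the same scene shifted by one pixel

mask = high_gradient_mask(ref, threshold=10.0)
print(int(mask.sum()), "of", mask.size, "pixels selected")
print("error (shifted):", photometric_error(ref, cur, mask))
print("error (aligned):", photometric_error(ref, ref, mask))
```

Only the pixels around the edge are selected, which is what makes the approach semi-dense. The error is zero when the two images are aligned and grows when they are not, so a tracker can minimize it over the camera pose.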


RGB-D SLAM

- KinectFusion [4]
   - fuses all depth data from the sensor into a volumetric dense model
   - uses ICP against the model to track the camera pose.
   - limited to small workspaces due to its volumetric representation
   - lacks loop closing
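
A rough calculation shows why a dense volumetric model limits the workspace. The numbers below are assumptions for illustration only (a cubic grid, 1 cm voxels, 4 bytes per voxel) and are not taken from [4].

```python
def voxel_grid_bytes(side_m: float, voxel_m: float, bytes_per_voxel: int = 4) -> int:
    """Memory for a dense cubic voxel grid with the given side length and voxel size."""
    voxels_per_side = round(side_m / voxel_m)
    return voxels_per_side ** 3 * bytes_per_voxel


# Illustrative sizes only; bytes_per_voxel=4 is an assumed storage cost.
for side_m, voxel_m in ((3.0, 0.01), (6.0, 0.01), (12.0, 0.01)):
    gib = voxel_grid_bytes(side_m, voxel_m) / 2 ** 30
    print(f"{side_m:5.1f} m cube at {voxel_m * 100:.0f} cm voxels: {gib:8.2f} GiB")
```

Memory grows with the cube of the side length, so doubling the workspace multiplies the storage by eight at a fixed resolution.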


- Kintinuous [12]:
  - operates in large environments
  - uses a rolling cyclical buffer
  - does loop closing with place recognition and pose graph optimization

- RGB-D SLAM [13]:
  - feature-based system
  - front-end computes frame-to-frame motion by feature matching and ICP.
  - back-end performs pose-graph optimization with loop closure constraints from a heuristic search.
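
Pose-graph optimization with a loop closure constraint can be illustrated with a one-dimensional toy problem. This is a sketch under simplifying assumptions, not the back-end of [13]: real systems optimize full 6-DOF poses with a nonlinear solver and weight each constraint by its uncertainty. The function name and the measurements are hypothetical.

```python
import numpy as np


def optimize_pose_graph_1d(num_poses, edges):
    """Least-squares 1D pose graph.

    edges: list of (i, j, measured), meaning x[j] - x[i] should equal `measured`.
    Pose 0 is anchored at 0 to remove the gauge freedom.
    """
    rows, rhs = [], []
    for i, j, measured in edges:
        row = np.zeros(num_poses)
        row[i], row[j] = -1.0, 1.0
        rows.append(row)
        rhs.append(measured)
    # Anchor the first pose at the origin.
    anchor = np.zeros(num_poses)
    anchor[0] = 1.0
    rows.append(anchor)
    rhs.append(0.0)
    solution, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return solution


# Drifting odometry: each step is measured as 1.1 m.
odometry = [(0, 1, 1.1), (1, 2, 1.1), (2, 3, 1.1)]
# A loop closure constraint says pose 3 is 3.0 m from pose 0.
loop_closure = [(0, 3, 3.0)]

print("odometry only:    ", optimize_pose_graph_1d(4, odometry))
print("with loop closure:", optimize_pose_graph_1d(4, odometry + loop_closure))
```

With odometry alone the poses simply accumulate the measured steps. Adding the loop closure edge spreads the disagreement across all edges, pulling the last pose back toward 3.0 m.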

- DVO-SLAM [14]:
  - optimizes a pose-graph
  - computes keyframe-to-keyframe constraints from visual odometry
  - the visual odometry minimizes both photometric and depth error.
  - searches for loop candidates in a heuristic manner over all previous frames
  - does not rely on place recognition.

- ElasticFusion [15]:
  - builds a surfel-based map of the environment.
  - a map-centric approach that forgets poses
  - performs loop closing by applying a non-rigid deformation to the map
  - does not use the standard pose-graph optimization
  - impressive detail reconstruction
  - impressive localization accuracy
  - implementation limited to room-size maps because complexity scales with the number of surfels in the map.




References
[5] L. M. Paz, P. Piniés, J. D. Tardós, and J. Neira, “Large-scale 6-DOF SLAM with stereo-in-hand,” IEEE Trans. Robot., vol. 24, no. 5, pp. 946–957, 2008.


[6] J. Civera, A. J. Davison, and J. M. M. Montiel, “Inverse depth parametrization for monocular SLAM,” IEEE Trans. Robot., vol. 24, no. 5, pp. 932–945, 2008.


[7] H. Strasdat, J. M. M. Montiel, and A. J. Davison, “Visual SLAM: Why filter?” Image and Vision Computing, vol. 30, no. 2, pp. 65–77, 2012.


[8] H. Strasdat, A. J. Davison, J. M. M. Montiel, and K. Konolige, “Double window optimisation for constant time visual SLAM,” in IEEE Int. Conf. Comput. Vision (ICCV), 2011, pp. 2352–2359.


[9] C. Mei, G. Sibley, M. Cummins, P. Newman, and I. Reid, “RSLAM: A system for large-scale mapping in constant-time using stereo,” Int. J. Comput. Vision, vol. 94, no. 2, pp. 198–214, 2011.


[10] T. Pire, T. Fischer, J. Civera, P. De Cristóforis, and J. J. Berlles, “Stereo parallel tracking and mapping for robot localization,” in IEEE/RSJ Int. Conf. Intell. Robots and Syst. (IROS), 2015, pp. 1373–1378.


[11] J. Engel, J. Stueckler, and D. Cremers, “Large-scale direct SLAM with stereo cameras,” in IEEE/RSJ Int. Conf. Intell. Robots and Syst. (IROS), 2015.


[12] T. Whelan, M. Kaess, H. Johannsson, M. Fallon, J. J. Leonard, and J. McDonald, “Real-time large-scale dense RGB-D SLAM with volumetric fusion,” Int. J. Robot. Res., vol. 34, no. 4-5, pp. 598–626, 2015.

[13] F. Endres, J. Hess, J. Sturm, D. Cremers, and W. Burgard, “3-D mapping with an RGB-D camera,” IEEE Trans. Robot., vol. 30, no. 1, pp. 177–187, 2014.

[14] C. Kerl, J. Sturm, and D. Cremers, “Dense visual SLAM for RGB-D cameras,” in IEEE/RSJ Int. Conf. Intell. Robots and Syst. (IROS), 2013.

[15] T. Whelan, R. F. Salas-Moreno, B. Glocker, A. J. Davison, and S. Leutenegger, “ElasticFusion: Real-time dense SLAM and light source estimation,” Int. J. Robot. Res., vol. 35, no. 14, pp. 1697–1716, 2016.
