
State of the Art SLAM techniques

Best Stereo SLAMs in 2017 are reviewed.

Namely, (in arbitrary order)

  • EKF-SLAM based, 
  • Keyframe based, 
  • Joint BA optimization based, 
  • RSLAM, 
  • S-PTAM, 
  • LSD-SLAM,  


Best RGB-D SLAMs in 2017 are also reviewed.

  • KinectFusion, 
  • Kintinuous, 
  • DVO-SLAM, 
  • ElasticFusion, 
  • RGB-D SLAM,  


My key points on the best stereo and RGB-D SLAM systems follow.


Stereo SLAM

Conditionally Independent Divide and Conquer EKF-SLAM [5] 

  • operates in larger environments than other approaches at that time
  • uses both close and far points
  • far points are those whose depth cannot be reliably estimated because of their small disparity in the stereo camera
  • represents far points with an inverse depth parametrization [6]
  • shows empirically that points can be triangulated reliably if their depth is less than about 40 times the stereo baseline.
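
The close/far distinction can be sketched in a few lines. For a rectified stereo pair, depth follows from disparity as z = f * b / d, and a point counts as close when z is below roughly 40 baselines. The function names, the focal length and the baseline used here are illustrative assumptions, not values taken from [5].

```python
import numpy as np

def stereo_depth(disparity_px, focal_px, baseline_m):
    """Depth of rectified stereo matches: z = f * b / d (inf when d <= 0)."""
    d = np.atleast_1d(np.asarray(disparity_px, dtype=float))
    z = np.full(d.shape, np.inf)
    valid = d > 0
    z[valid] = focal_px * baseline_m / d[valid]
    return z

def is_close(depth_m, baseline_m, ratio=40.0):
    """True where depth is below `ratio` baselines (the ~40x rule of thumb)."""
    return np.asarray(depth_m) < ratio * baseline_m

def inverse_depth(depth_m):
    """Inverse depth stays finite (goes to 0) as depth goes to infinity."""
    return 1.0 / np.asarray(depth_m, dtype=float)

# Hypothetical camera: 500 px focal length, 10 cm baseline.
z = stereo_depth([25.0, 5.0, 0.0], focal_px=500.0, baseline_m=0.1)
print(z, is_close(z, 0.1), inverse_depth(z))
```

With a 10 cm baseline the threshold sits at about 4 m, so only the first point is treated as close; the other two are the kind of point that an inverse depth representation handles more gracefully.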

   
- Keyframe-based Stereo SLAM
  - uses BA optimization in a local area to achieve scalability.
  - [8]: joint optimization of BA (point-pose constraints) in an inner window
     - and a pose-graph (pose-pose constraints) in an outer window of keyframes
     - achieves constant time complexity by limiting the size of these windows
     - at the expense of not guaranteeing global consistency.

  - [9]: RSLAM uses a relative representation of landmarks and poses
    - performs relative BA in an active area, which can be constrained to run in constant time
    - able to close loops
    - loop closing allows active areas to expand at both sides of a loop
    - does not enforce global consistency

  - [10]: S-PTAM
    - performs local BA
    - lacks large loop closing


 - [11]: LSD-SLAM
   - a semi-dense direct approach
   - minimizes photometric error in image regions with high gradient.
   - more robust than feature-based methods to
     - motion blur
     - low-textured environments
   - performance can be severely degraded by unmodeled effects such as
     - rolling shutter
     - non-Lambertian reflectance.
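
The semi-dense idea, evaluating photometric error only where the image gradient is strong, can be sketched as follows. This is a minimal illustration, not the LSD-SLAM implementation: the function names and the threshold are assumptions, and the second image is assumed to be already warped into the reference frame, whereas a real direct method would warp it using the current pose and depth estimates.

```python
import numpy as np

def high_gradient_mask(image, threshold):
    """Boolean mask of pixels whose gradient magnitude exceeds `threshold`."""
    gy, gx = np.gradient(image.astype(float))
    return np.hypot(gx, gy) > threshold

def photometric_error(ref, cur, mask):
    """Mean squared intensity difference over the masked pixels only."""
    r = ref.astype(float)[mask] - cur.astype(float)[mask]
    return float(np.mean(r ** 2)) if r.size else 0.0

# Toy image with a single vertical edge; flat regions carry no gradient.
ref = np.zeros((4, 6))
ref[:, 3:] = 100.0
mask = high_gradient_mask(ref, threshold=10.0)
print(mask.sum(), photometric_error(ref, ref + 2.0, mask))
```

Only the pixels next to the edge enter the error, which is why such methods can still work in scenes with few corner-like features. A uniform brightness offset changes the error even though nothing moved, which hints at why unmodeled photometric effects hurt direct methods.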


RGB-D SLAM

- KinectFusion [4]
   - fuses all depth data from the sensor into a volumetric dense model
   - uses ICP against the model to track the camera pose.
   - limited to small workspaces due to its volumetric representation
   - lacks loop closing
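
Volumetric fusion of this kind is commonly implemented as a per-voxel weighted running average of truncated signed distances (a TSDF). The sketch below shows only that update rule under simplifying assumptions: the signed distances for the new frame are given directly, every observation has weight 1, and the function name and `max_weight` value are hypothetical.

```python
import numpy as np

def fuse_tsdf(tsdf, weight, new_sdf, trunc, max_weight=64.0):
    """Fuse one frame of signed distances into a TSDF by running average."""
    d = np.clip(new_sdf / trunc, -1.0, 1.0)   # normalized, truncated distance
    observed = new_sdf > -trunc               # skip voxels far behind the surface
    w_new = observed.astype(float)            # assumed unit weight per observation
    total = weight + w_new
    fused = np.where(total > 0,
                     (weight * tsdf + w_new * d) / np.maximum(total, 1e-12),
                     tsdf)
    return fused, np.minimum(total, max_weight)

# Three voxels along a ray, observed in two consecutive frames.
tsdf, weight = np.zeros(3), np.zeros(3)
tsdf, weight = fuse_tsdf(tsdf, weight, np.array([0.05, -0.02, -0.5]), trunc=0.1)
tsdf, weight = fuse_tsdf(tsdf, weight, np.array([0.01, -0.02, -0.5]), trunc=0.1)
print(tsdf, weight)
```

Because every voxel of a dense grid is stored, memory grows cubically with the side length of the workspace at a fixed resolution, which is consistent with the small-workspace limitation noted above.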


- Kintinuous [12]:
  - operates in large environments
  - uses a rolling cyclical buffer
  - does loop closing with place recognition and pose graph optimization

- RGB-D SLAM [13]:
  - feature-based system
  - front-end computes frame-to-frame motion by feature matching and ICP.
  - back-end performs pose-graph optimization with loop closure constraints from a heuristic search.

- DVO-SLAM [14]:
  - optimizes a pose-graph
  - computes keyframe-to-keyframe constraints from visual odometry
  - the visual odometry minimizes both photometric and depth error.
  - searches for loop candidates in a heuristic manner over all previous frames
  - does not rely on place recognition.

- ElasticFusion [15]:
  - builds a surfel-based map of the environment.
  - a map-centric approach that forgets poses
  - performs loop closing by applying a non-rigid deformation to the map
  - does not use the standard pose-graph optimization
  - impressive detail reconstruction
  - impressive localization accuracy
  - implementation limited to room-size maps because complexity scales with the number of surfels in the map.
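
Several of the systems above (Kintinuous, RGB-D SLAM, DVO-SLAM) rely on pose-graph optimization with loop-closure constraints. A toy one-dimensional version shows what that step does: odometry edges drift, and a loop-closure edge pulls the trajectory back into agreement. This is only a sketch under strong assumptions (scalar poses, equal edge weights, a linear solve); real systems optimize 6-DoF poses with nonlinear solvers, and the function name is hypothetical.

```python
import numpy as np

def optimize_pose_graph_1d(num_poses, edges):
    """Least-squares 1D pose graph.

    edges: list of (i, j, measured) meaning x_j - x_i should equal `measured`.
    Pose 0 is anchored at 0 to remove the gauge freedom.
    """
    rows = len(edges) + 1
    A = np.zeros((rows, num_poses))
    b = np.zeros(rows)
    for k, (i, j, meas) in enumerate(edges):
        A[k, i] = -1.0
        A[k, j] = 1.0
        b[k] = meas
    A[-1, 0] = 1.0  # anchor row: x_0 = 0
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Three odometry steps that each overestimate the motion, plus one
# loop-closure edge saying pose 3 is 3.0 units from pose 0.
edges = [(0, 1, 1.1), (1, 2, 1.1), (2, 3, 1.1), (0, 3, 3.0)]
print(optimize_pose_graph_1d(4, edges))
```

The optimized steps shrink from 1.1 to 1.025 each, spreading the loop-closure correction evenly along the trajectory, since all edges are weighted equally here.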




References
[5] L. M. Paz, P. Piniés, J. D. Tardós, and J. Neira, “Large-scale 6-DOF SLAM with stereo-in-hand,” IEEE Trans. Robot., vol. 24, no. 5, pp. 946–957, 2008.


[6] J. Civera, A. J. Davison, and J. M. M. Montiel, “Inverse depth parametrization for monocular SLAM,” IEEE Trans. Robot., vol. 24, no. 5, pp. 932–945, 2008.


[7] H. Strasdat, J. M. M. Montiel, and A. J. Davison, “Visual SLAM: Why filter?” Image and Vision Computing, vol. 30, no. 2, pp. 65–77, 2012.


[8] H. Strasdat, A. J. Davison, J. M. M. Montiel, and K. Konolige, “Double window optimisation for constant time visual SLAM,” in IEEE Int. Conf. Comput. Vision (ICCV), 2011, pp. 2352–2359.


[9] C. Mei, G. Sibley, M. Cummins, P. Newman, and I. Reid, “RSLAM: A system for large-scale mapping in constant-time using stereo,” Int. J. Comput. Vision, vol. 94, no. 2, pp. 198–214, 2011.


[10] T. Pire, T. Fischer, J. Civera, P. De Cristóforis, and J. J. Berlles, “Stereo parallel tracking and mapping for robot localization,” in IEEE/RSJ Int. Conf. Intell. Robots and Syst. (IROS), 2015, pp. 1373–1378.


[11] J. Engel, J. Stueckler, and D. Cremers, “Large-scale direct SLAM with stereo cameras,” in IEEE/RSJ Int. Conf. Intell. Robots and Syst. (IROS), 2015.


[12] T. Whelan, M. Kaess, H. Johannsson, M. Fallon, J. J. Leonard, and J. McDonald, “Real-time large-scale dense RGB-D SLAM with volumetric fusion,” Int. J. Robot. Res., vol. 34, no. 4-5, pp. 598–626, 2015.

[13] F. Endres, J. Hess, J. Sturm, D. Cremers, and W. Burgard, “3-D mapping with an RGB-D camera,” IEEE Trans. Robot., vol. 30, no. 1, pp. 177–187, 2014.

[14] C. Kerl, J. Sturm, and D. Cremers, “Dense visual SLAM for RGB-D cameras,” in IEEE/RSJ Int. Conf. Intell. Robots and Syst. (IROS), 2013.

[15] T. Whelan, R. F. Salas-Moreno, B. Glocker, A. J. Davison, and S. Leutenegger, “ElasticFusion: Real-time dense SLAM and light source estimation,” Int. J. Robot. Res., vol. 35, no. 14, pp. 1697–1716, 2016.
