Abstract
Monocular SLAM is a well-studied problem that has shown significant progress in recent years, but challenges remain in creating a rich semantic description of the scene. Feature-based visual SLAM systems are vulnerable to erroneous pose estimates due to insufficient tracking of mapped points or motion-induced errors, such as those arising in large or in-place rotations. We present a new SLAM framework that uses monocular edge-based SLAM [1], along with category-level models, to localize objects in the scene as well as to improve the camera trajectory. In monocular SLAM systems, camera tracking tends to fail under abrupt motion, which reduces the number of 2D point correspondences. To tackle this problem, we propose the first principled formulation of its kind that integrates object category models into the core SLAM back-end to jointly optimize the camera trajectory, object poses, and object shapes together with the 3D structure of the scene. We show that our joint optimization recovers a better camera trajectory in such cases compared to Edge SLAM. Moreover, our method produces a richer visualization by incorporating object representations into the scene alongside the 3D structure of the base SLAM system, which makes it useful for augmented reality (AR) applications.