Abstract
Existing visual servoing approaches are instance-specific, i.e., they control camera motion between two views of the same object. In practical scenarios where a robot must handle various instances of a category, such classical visual servoing techniques are therefore less suitable. We formulate across-instance visual servoing as a pose induction and pose alignment problem. First, the desired pose, given for any known instance, is transferred to the novel instance through pose induction. The pose alignment problem is then solved by estimating the current pose using part-aware keypoint reconstruction, followed by pose-based visual servoing (PBVS) iterations. To tackle large variations in appearance across object instances in a category, we employ visual features that uniquely correspond to the locations of object parts in images. These part-aware keypoints are learned from annotated images using a convolutional neural network (CNN). The advantages of such part-aware semantics are two-fold. First, they conceal illumination and textural variations from the visual servoing algorithm. Second, semantic keypoints enable us to match descriptors across instances accurately. We validate the efficacy of our approach through experiments in simulation as well as on a quadcopter. Our approach attains the desired camera pose with acceptable accuracy and a smooth velocity profile. We also show results for large camera transformations with no overlap between the current and desired views of 3D objects, which is desirable in a servoing context.
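The PBVS iteration referenced above follows the classical pose-based control law. The sketch below is a minimal illustration of one such step, assuming the current-to-desired camera transform has already been estimated (e.g., from the reconstructed part-aware keypoints); the function name, gain, and pose conventions are illustrative assumptions, not details from the paper.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def pbvs_velocity(R_cd: np.ndarray, t_cd: np.ndarray, lam: float = 0.5):
    """One classical PBVS step (Chaumette-Hutchinson style).

    R_cd, t_cd: rotation (3x3) and translation (3,) of the current camera
    frame expressed in the desired camera frame.
    Returns linear and angular camera velocities that drive the pose
    error exponentially to zero.
    """
    theta_u = Rotation.from_matrix(R_cd).as_rotvec()  # axis-angle rotation error
    v = -lam * R_cd.T @ t_cd                          # linear camera velocity
    omega = -lam * theta_u                            # angular camera velocity
    return v, omega


# Example: a small rotation about z combined with an offset along x.
R = Rotation.from_rotvec([0.0, 0.0, 0.1]).as_matrix()
t = np.array([0.2, 0.0, 0.0])
v, omega = pbvs_velocity(R, t)
print(v, omega)
```

With a constant gain `lam`, this law yields the exponentially decaying error and hence the smooth velocity profile that the abstract reports.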