Abstract
With the online market booming, managing warehouse inventory has become one of the most
essential and challenging tasks. Inventory management becomes more efficient when it is
automated using robots: robots can work faster than humans, operate at a constant speed
without breaks, and repeat tasks such as fetching inventory from the warehouse more often
than humans can. However, to perform tasks such as placing objects on racks, fetching
objects from racks, and reorganising racks to free up space, robots need an
understanding of the warehouse environment.
To understand the warehouse environment, a robot needs to create a 3D map of the
warehouse that captures the free space in which it can move and identifies the racks and
boxes, so that it is able to plan the execution of tasks. As a step towards solving this
problem, this thesis addresses the problem of free-space estimation for rack shelves. Given
a monocular RGB image captured from a camera mounted on a robotic arm, we aim to predict
the Top-view and Front-view layouts so as to create a 3D reconstruction of the rack and the
objects present in the image.
We propose a simple yet effective network architecture, RackLay, which takes a monocular
RGB image as input and outputs the Top-view and Front-view layouts of all the shelves
comprising the rack visible in the image. The network can learn two kinds of layout
representations: one in the canonical frame centered on the shelf, called the shelf-centric
layout, and the other in a frame defined with respect to the camera, called the ego-centric
layout. Apart from demonstrating the versatility of the network, these representations lend
themselves to various useful applications.
Since there are very few publicly available datasets for warehouse settings, we also
introduce a synthetic data generation pipeline, termed WareSynth, which can be used to
generate 3D warehouse scenes, automate the process of data capture, and generate the
corresponding annotations. WareSynth can be used for various tasks such as 2D/3D object
detection, semantic/instance segmentation, layout estimation, 3D scene navigation and
mapping, and 3D reconstruction. The same pipeline can also be adapted to other kinds of
scenes, such as supermarkets and greenhouses, by changing the database of objects and the
placement parameters; hence this pipeline opens the gates for further research in similar
environments.