Abstract
In Indian urban and rural driving scenarios, small objects are pervasive and often crucial for safe
navigation. These objects can include pedestrians crossing roads, children playing near streets, cyclists,
stray animals, as well as small vehicles like scooters and motorbikes. Additionally, traffic signs, signal
lights, potholes, and road markings (such as lane dividers or zebra crossings) are often small in size but
essential for driving decisions. In such contexts, missing or inaccurately segmenting these small objects
can lead to critical errors in detection, causing accidents or delays in the vehicle’s decision-making
process. Automated understanding of such objects requires detection and segmentation as a first step.
Semantic segmentation is a critical task in computer vision with a wide range of applications. The
objective is to partition an image (a collection of pixels) into distinct labeled regions, each corresponding
to specific objects or parts of the scene. This process is crucial for scene understanding and enables
the localization of objects within the image. Over time, significant progress has been made in semantic
segmentation, especially with the advent of deep learning. The advances in this area have revolutionized computer vision, pushing beyond traditional methods and achieving remarkable improvements in
performance.
When discussing semantic segmentation, we often focus on datasets, the objects within those datasets,
and their corresponding segmentations. While many datasets exist for road scenarios, particularly those
representing Western road conditions, there is relatively little research on road conditions specific to
India. One notable exception is the Indian Driving Dataset (IDD), a dataset specifically designed for
semantic segmentation of Indian road scenarios.
Road and driving datasets typically contain objects of varying sizes within each class label. These
objects can be broadly categorized into three types: small, medium, and large. The importance of
segmentation is well understood across several domains such as medical imaging, autonomous vehicles,
aerial imagery, robotics, surveillance, and industrial automation. However, one of the most challenging
problems in segmentation is the segmentation of small objects. Small object segmentation is particularly
difficult due to factors such as (i) the limited number of pixels representing small objects, (ii) class
imbalance during training, and (iii) the inherent challenges posed by small object representations. These
factors hinder the performance of deep learning architectures, making it harder for modern techniques
to accurately handle small objects.
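
Factors (i) and (ii) can be made concrete with a minimal Python sketch. It assumes a toy 8x8 label map with two hypothetical class ids (0 for road, 1 for a small object); the helper name class_pixel_fractions and the toy data are illustrative only and are not taken from IDD.

```python
import numpy as np

def class_pixel_fractions(label_map, num_classes):
    """Return the fraction of pixels assigned to each class id."""
    counts = np.bincount(label_map.ravel(), minlength=num_classes)
    return counts / counts.sum()

# Toy 8x8 label map (an assumption for illustration, not IDD data):
# class 0 = road, class 1 = a small 2x2 object.
label_map = np.zeros((8, 8), dtype=np.int64)
label_map[3:5, 3:5] = 1

fractions = class_pixel_fractions(label_map, num_classes=2)
print(fractions)  # per-class share of all pixels
```

In this toy case the small object covers only 4 of 64 pixels, so a model that labels every pixel as road is still correct on most of the image while missing the object entirely.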
This issue becomes evident when evaluating deep learning models, as performance metrics tend to
be significantly worse for small objects. In this study, we delve into the challenges of segmenting small
objects and explore potential strategies for improving algorithm design for this task. It is widely acknowledged that loss functions play a key role in segmentation performance, alongside the architecture
of deep learning models. Indeed, the segmentation of small objects is highly influenced by the choice
of loss function during model training.
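
As an illustration of why the loss function matters, the sketch below compares two common losses on a toy prediction that misses a small object completely. The choice of pixel-averaged binary cross-entropy and soft Dice loss is an assumption made for this example, not necessarily the set of losses evaluated in this study, and the function names and toy data are hypothetical.

```python
import numpy as np

def mean_cross_entropy(prob_fg, target, eps=1e-7):
    """Binary cross-entropy averaged over all pixels."""
    p = np.clip(prob_fg, eps, 1 - eps)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

def soft_dice_loss(prob_fg, target, eps=1e-7):
    """Soft Dice loss for the foreground class (1 - Dice coefficient)."""
    intersection = np.sum(prob_fg * target)
    denominator = np.sum(prob_fg) + np.sum(target)
    return float(1 - (2 * intersection + eps) / (denominator + eps))

# Toy ground truth (illustrative only): a 2x2 object in an 8x8 image.
target = np.zeros((8, 8))
target[3:5, 3:5] = 1

# A prediction that misses the object: low foreground probability everywhere.
missed = np.full((8, 8), 0.01)

ce = mean_cross_entropy(missed, target)
dice = soft_dice_loss(missed, target)
print(ce, dice)
```

On this toy input the pixel-averaged cross-entropy stays comparatively low (roughly 0.3) because the 60 correctly labeled background pixels dominate the average, whereas the Dice loss is close to its maximum of 1. This is one way in which the choice of loss can change how strongly a missed small object is penalized during training.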
This study highlights various small objects commonly encountered in road scenarios and emphasizes
their importance in Indian road environments. Objects such as pedestrians, traffic signs, and
traffic lights, despite their small size, are crucial for tasks like autonomous navigation and driver
assistance systems. IDD offers a valuable resource, containing a wide range
of small objects across different class labels, some of which dominate specific categories. These objects
are captured and annotated in real-world, unstructured environments, making this dataset particularly
relevant for segmentation tasks in Indian road scenarios. Therefore, this study focuses primarily
on the detection and segmentation of small objects within IDD and explores the impact of loss
functions on their segmentation.