Abstract
Mangoes hold significant economic and cultural importance in India and globally, making accurate yield prediction crucial for optimizing domestic consumption, enhancing international trade, and supporting farmers' decision-making. Traditional yield estimation methods, such as manual counting, are labor-intensive, error-prone, and impractical for large-scale orchards, necessitating automated solutions. This study presents a novel time-series vision dataset for mango yield prediction, capturing images of 12 trees from eight spatial orientations (SW, S, SE, E, NE, N, NW, W) over 2.5 months to analyze fruit growth and flowering patterns. The high-resolution images were annotated using a semi-automatic pipeline that integrates YOLO for object detection and SAM for precise segmentation, significantly reducing manual annotation effort. Benchmarking was conducted with state-of-the-art deep learning models for segmentation (DeepLabV3+, PSPNet, SegFormer, Swin-S, YOLO+SAM) and detection (Mask R-CNN, Faster R-CNN, DETR, YOLO). For flower segmentation, Swin-S achieved the best result with an IoU of 67.35, followed closely by SegFormer at 66.44, while for fruitlet segmentation, YOLO+SAM obtained the highest IoU of 78.97. In detection, YOLO performed best for both flowers and fruitlets, with mAP50 scores of 63.8 and 80.5, respectively. Segmentation is further identified as a suitable approach for extracting flower features in yield prediction models, with SegFormer emerging as a strong choice due to its lower computational cost. Finally, this study examines the relationship between wind direction patterns and the flower-to-fruit conversion ratio, challenging previous research that attributed yield variations solely to orientation-dependent differences in canopy structure.
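The segmentation results above are reported in IoU (intersection-over-union) between predicted and ground-truth masks. As a point of reference, the metric can be computed from boolean masks with a short NumPy sketch (the function name is illustrative, not from the paper):

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two binary segmentation masks of the same shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()  # pixels in both masks
    union = np.logical_or(pred, gt).sum()   # pixels in either mask
    # Empty union (both masks blank) is conventionally scored 0 here.
    return float(inter) / float(union) if union else 0.0
```

For example, a prediction covering three pixels that overlaps a two-pixel ground-truth mask in two pixels scores 2/3. Dataset-level IoU figures such as those quoted above are typically averages of per-image (or per-class) values of this quantity.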