Tomato harvesting in intelligent greenhouses is crucial for reducing costs and optimizing management. Agricultural robots, as an automated solution, require advanced visual perception. This study proposes a tomato detection and counting algorithm based on YOLOv8 (TCAttn-YOLOv8). To handle small, occluded tomato targets in images, a new detection layer (NDL) is added to the decoupled Neck and Head structure, improving small-object recognition. The ColBlock, a dual-branch structure that leverages the strengths of Transformers, enhances feature extraction and fusion, focusing on regions with densely packed targets and reducing the loss of small-object features in complex backgrounds. C2fGhost and GhostConv modules are integrated into the Neck network to reduce model parameters and floating-point operations while improving feature representation. The WIoU (Wise-IoU) loss function is adopted to accelerate convergence and improve regression accuracy. Experimental results show that TCAttn-YOLOv8 achieves an mAP@0.5 of 96.31%, with an FPS of 95 and a parameter size of 2.7 M, outperforming seven lightweight YOLO algorithms. For automated tomato counting, the R² between predicted and actual counts is 0.9282, indicating that the algorithm is suitable for replacing manual counting. This method effectively supports tomato detection and counting in intelligent greenhouses, offering valuable insights for robotic harvesting and yield estimation research.
Keywords: CloFormer; object detection; attention convolution; tomato detection and counting.
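To illustrate the Ghost convolution idea referenced in the abstract, the following is a minimal PyTorch-style sketch assuming the standard GhostNet formulation (a primary convolution produces half the output channels and a cheap depthwise convolution generates the rest); the class name, kernel sizes, and activation here are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class GhostConv(nn.Module):
    """Ghost convolution sketch: a standard conv produces half the output
    channels, and a cheap depthwise conv derives the remaining half from
    them, cutting parameters and FLOPs versus a full convolution."""

    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        # Primary convolution: produces the first half of the channels.
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )
        # Cheap operation: 5x5 depthwise conv generating "ghost" features.
        self.cheap = nn.Sequential(
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)


if __name__ == "__main__":
    # Quick shape check on a feature map typical of a YOLO Neck stage.
    x = torch.randn(1, 64, 80, 80)
    print(GhostConv(64, 128)(x).shape)  # torch.Size([1, 128, 80, 80])
```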