Tomato harvesting in intelligent greenhouses is crucial for reducing costs and optimizing management. Agricultural robots, as an automated solution, require advanced visual perception. This study proposes a tomato detection and counting algorithm based on YOLOv8 (TCAttn-YOLOv8). To handle small, occluded tomato targets in images, a new detection layer (NDL) is added to the decoupled Neck and Head structure, improving small-object recognition. The ColBlock, a dual-branch structure that leverages the strengths of Transformers, enhances feature extraction and fusion, focusing on regions with densely packed targets and reducing the loss of small-object features in complex backgrounds. C2fGhost and GhostConv modules are integrated into the Neck network to reduce model parameters and floating-point operations while improving feature representation. The WIoU (Wise-IoU) loss function is adopted to accelerate convergence and improve regression accuracy. Experimental results show that TCAttn-YOLOv8 achieves an mAP@0.5 of 96.31%, with an FPS of 95 and a parameter size of 2.7 M, outperforming seven lightweight YOLO algorithms. For automated tomato counting, the R² between predicted and actual counts is 0.9282, indicating that the algorithm is suitable for replacing manual counting. This method effectively supports tomato detection and counting in intelligent greenhouses, offering valuable insights for robotic harvesting and yield estimation research.
Keywords: CloFormer; object detection; attention convolution; tomato detection and counting.
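To illustrate the Ghost convolution idea referenced in the abstract, the following is a minimal PyTorch-style sketch assuming the standard GhostNet formulation (a primary convolution produces half the output channels and a cheap depthwise convolution generates the rest); the class name, kernel sizes, and activation here are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class GhostConv(nn.Module):
    """Ghost convolution sketch: a standard conv produces half the output
    channels, and a cheap depthwise conv derives the remaining half from
    them, cutting parameters and FLOPs versus a full convolution."""

    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        # Primary convolution: produces the first half of the channels.
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )
        # Cheap operation: 5x5 depthwise conv generating "ghost" features.
        self.cheap = nn.Sequential(
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)


if __name__ == "__main__":
    # Quick shape check on a feature map typical of a YOLO Neck stage.
    x = torch.randn(1, 64, 80, 80)
    print(GhostConv(64, 128)(x).shape)  # torch.Size([1, 128, 80, 80])
```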