Study on the Tea Pest Classification Model Using a Convolutional and Embedded Iterative Region of Interest Encoding Transformer

Biology (Basel). 2023 Jul 17;12(7):1017. doi: 10.3390/biology12071017.

Abstract

Tea diseases are one of the main causes of tea yield reduction, and the use of computer vision for classification and diagnosis is an effective means of tea disease management. However, the random location of lesions, high symptom similarity, and complex background make the recognition and classification of tea images difficult. Therefore, this paper proposes a tea disease IterationVIT diagnosis model that integrates a convolution and iterative transformer. The convolution consists of a superimposed bottleneck layer for extracting the local features of tea leaves. The iterative algorithm incorporates the attention mechanism and bilinear interpolation operation to obtain disease location information by continuously updating the region of interest in location information. The transformer module uses a multi-head attention mechanism for global feature extraction. A total of 3544 images of red leaf spot, algal leaf spot, bird's eye disease, gray wilt, white spot, anthracnose, brown wilt, and healthy tea leaves collected under natural light were used as samples and input into the IterationVIT model for training. The results show that when the patch size is 16, the model performed better with an IterationVIT classification accuracy of 98% and F1 measure of 96.5%, which is superior to mainstream methods such as VIT, Efficient, Shuffle, Mobile, Vgg, etc. In order to verify the robustness of the model, the original images of the test set were blurred, noise- was added and highlighted, and then the images were input into the IterationVIT model. The classification accuracy still reached over 80%. When 60% of the training set was randomly selected, the classification accuracy of the IterationVIT model test set was 8% higher than that of mainstream models, with the ability to analyze fewer samples. Model generalizability was performed using three sets of plant leaf public datasets, and the experimental results were all able to achieve comparable levels of generalizability to the data in this paper. Finally, this paper visualized and interpreted the model using the CAM method to obtain the pixel-level thermal map of tea diseases, and the results show that the established IterationVIT model can accurately capture the location of diseases, which further verifies the effectiveness of the model.

Keywords: ViT; disease diagnosis; image classification; tea leaf.

Grants and funding

This work was partly supported by the National Natural Science Foundation of China (62265007 and 32260622).