Vahagn: VisuAl Haptic Attention Gate Net for slip detection

Front Neurorobot. 2024 Nov 6;18:1484751. doi: 10.3389/fnbot.2024.1484751. eCollection 2024.

Abstract

Introduction: Slip detection is crucial for achieving stable grasping and subsequent manipulation tasks. Grasping is a continuous process that draws on information from multiple sources, and the success of a specific grasp depends on two factors: the spatial accuracy of the contact and the stability of the grasp throughout that process.

Methods: In this paper, for the task of perceiving grasp outcomes from visual-haptic information, we propose a new slip-detection method that fuses visual and haptic information across both spatial and temporal dimensions. Specifically, the method takes as input a sequence of visual images from a first-person perspective and a sequence of haptic images from a gripper. It then extracts time-dependent features over the whole grasping process and spatial features weighted by the importance of different regions using dedicated attention mechanisms. Inspired by neurological studies, the fusion stage combines temporal and spatial information from the visual and haptic modalities through a two-step fusion scheme with gate units.
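To make the gated fusion idea concrete, the following is a minimal PyTorch sketch, not the paper's released implementation: it assumes pre-extracted per-frame visual and haptic features, illustrative dimensions, LSTMs standing in for the temporal feature extractors, and a single learned gate weighting the two modalities before classification.

import torch
import torch.nn as nn

class GatedVisualHapticFusion(nn.Module):
    """Illustrative gated fusion of visual and haptic feature sequences."""
    def __init__(self, vis_dim=256, hap_dim=256, hidden=128, num_classes=2):
        super().__init__()
        # Temporal encoders, one per modality (hypothetical stand-ins for
        # the paper's time-dependent feature extractors).
        self.vis_rnn = nn.LSTM(vis_dim, hidden, batch_first=True)
        self.hap_rnn = nn.LSTM(hap_dim, hidden, batch_first=True)
        # Gate unit: produces per-feature weights for the visual branch.
        self.gate = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Sigmoid())
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, vis_seq, hap_seq):
        # vis_seq: (B, T, vis_dim), hap_seq: (B, T, hap_dim)
        _, (v_h, _) = self.vis_rnn(vis_seq)   # final hidden state, visual
        _, (h_h, _) = self.hap_rnn(hap_seq)   # final hidden state, haptic
        v, h = v_h[-1], h_h[-1]               # (B, hidden) each
        g = self.gate(torch.cat([v, h], dim=-1))
        fused = g * v + (1.0 - g) * h         # gated convex combination
        return self.classifier(fused)         # slip / stable logits

# Usage: random tensors standing in for 16-step visual and haptic clips.
model = GatedVisualHapticFusion()
logits = model(torch.randn(4, 16, 256), torch.randn(4, 16, 256))
print(logits.shape)  # torch.Size([4, 2])

The gate lets the network lean on whichever modality is more informative at a given moment (e.g., haptic cues during incipient slip); the actual VAHAGN architecture additionally applies attention over space and time before this fusion step.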

Results and discussion: To validate the effectiveness of the method, we compared it with traditional CNN-based networks and with attention-based models. Our method achieves a classification accuracy of 93.59%, higher than that of previous works. Attention visualizations are further presented to support the validity of the approach.

Keywords: attention mechanism; haptic; multimodal deep learning; multimodal perception; robot perception.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was funded by the 2035 Innovation Pilot Program of Sichuan University, China.