For an artificial agent to be perceived as having intentions, it can be argued that the agent must have resolved a form of the symbol grounding problem. Here, symbol grounding is understood as the state in which the language used by the agent is endowed with quantitative meaning extracted from the physical world. To achieve this type of symbol grounding, we adopt a method that characterizes robot gestures with quantitative meaning calculated from word-distributed representations constructed from a large text corpus. In this method, a "size image" of a word is generated by defining an axis (index) in the word-distributed vector space that discriminates the "size" of the word. The generated size images are then converted into gestures performed by a physical artificial agent (robot). The robot's gesture can be set to reflect the size of the word either through the amount of movement or through its posture. To examine whether communicative intention is perceived in a robot performing these gestures, we analyze human ratings of "naturalness" obtained through an online survey; the results partially validate the proposed method. Based on these results, we argue for the possibility of developing advanced artifacts that achieve human-like symbol grounding.
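The idea of a "size" axis in a word vector space can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the axis is defined, so the sketch assumes one common construction, the normalized difference between the mean vectors of "large" and "small" seed words. The toy vectors, the seed words, and all function names are hypothetical.

```python
import math

# Toy 3-dimensional "embeddings" (hypothetical values, not derived from any corpus).
TOY_VECTORS = {
    "huge":     [0.9, 0.1, 0.3],
    "large":    [0.8, 0.2, 0.1],
    "tiny":     [-0.9, 0.1, 0.2],
    "small":    [-0.7, 0.3, 0.1],
    "elephant": [0.6, 0.5, 0.4],
    "ant":      [-0.6, 0.4, 0.5],
}


def mean_vector(words, vectors):
    """Component-wise mean of the vectors of the given words."""
    dims = len(next(iter(vectors.values())))
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dims)]


def size_axis(large_seeds, small_seeds, vectors):
    """Assumed construction: unit vector pointing from 'small' seeds to 'large' seeds."""
    big = mean_vector(large_seeds, vectors)
    small = mean_vector(small_seeds, vectors)
    diff = [b - s for b, s in zip(big, small)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def size_score(word, axis, vectors):
    """Projection (dot product) of a word vector onto the size axis."""
    return sum(v * a for v, a in zip(vectors[word], axis))


def gesture_amplitude(score, lo, hi):
    """Rescale a size score to [0, 1], e.g. as a fraction of a maximum movement range."""
    if hi == lo:
        return 0.5
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


axis = size_axis(["huge", "large"], ["tiny", "small"], TOY_VECTORS)
scores = {w: size_score(w, axis, TOY_VECTORS) for w in TOY_VECTORS}
lo, hi = min(scores.values()), max(scores.values())
amplitudes = {w: gesture_amplitude(s, lo, hi) for w, s in scores.items()}

for word, amp in sorted(amplitudes.items(), key=lambda kv: kv[1]):
    print(f"{word:10s} amplitude={amp:.2f}")
```

In this sketch the normalized score stands in for the "size image" and could drive either the amount of movement or a target posture; with real embeddings, the choice of seed words and the scaling range would need to be tuned and validated.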
Keywords: co-speech iconic gesture; human-robot interaction (HRI); natural language processing (NLP); robotics; word-distributed representation.
Copyright © 2024 Sasaki, Nishikawa and Morita.