We present the design of SCHEMA, a robotic platform for multiparty conversation facilitation. To participate in multiparty conversational situations and be recognized as a ratified participant, a robot needs the capability to exchange conversational protocols, which include organizing the participation structure, transmitting messages, and turn-taking. Such protocols fundamentally rely on a robot’s embodied functions, including facial expressions, head gestures, and directional control of the torso. Based on our studies, SCHEMA has 22 degrees of freedom. It was also designed with a user-friendly styling suitable for all generations, from children to elderly people.
In recent years, the development of humanoid robots designed for social interaction has attracted growing worldwide attention, and such robots have been investigated in many fields, including human-robot interaction, developmental robotics, and embodied conversational agents. In these fields, robotic platforms have been regarded both as tools to model human cognitive functions from a scientific perspective and as optimized human interfaces from an engineering perspective. While robotic hardware has been designed from both perspectives, recent robotic platforms share a common architectural structure, which can be defined in terms of protocols. Figure 1 shows a diagram of one of the common structures. The highest abstraction level, at the top, is a cognitive architecture such as a conversational system framework, whose modules may run on multiple computer nodes located remotely. At this level, higher-level protocols are defined to connect the cognitive modules. Connections among the modules of a cognitive architecture are supported by networking middleware, which manages modularity by abstracting both algorithms and hardware interfaces. The protocols of networking middleware address inter-process communication requirements, and the abstraction is usually implemented as port connections. Some types of middleware follow the observer pattern, decoupling producers from consumers; examples include “yet another robot platform” (YARP) and the virtual human messaging (VHMsg) library. Such middleware can deliver messages of any size across a network using a number of protocols and shared memory. Other middleware, such as the robot operating system (ROS) and the message-oriented networked-robot architecture (MONEA), use peer-to-peer models. The lowest abstraction level concerns hardware devices: it defines the interfaces for devices via their native APIs, which cleanly encapsulates hardware dependencies.
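The observer-pattern decoupling mentioned above, in which producers publish to named ports and consumers subscribe without knowing who produces the data, can be sketched as follows. This is a minimal illustrative model, not the actual API of YARP or VHMsg; the port name and message shown are hypothetical.

```python
from collections import defaultdict

class MessageBus:
    """Minimal observer-pattern middleware sketch: producers publish to
    named ports; consumers subscribe to a port by name only, so the two
    sides remain decoupled, as in YARP-style port connections."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, port, callback):
        # A consumer registers interest in a port without knowing producers.
        self._subscribers[port].append(callback)

    def publish(self, port, message):
        # A producer sends a message; the bus fans it out to all observers.
        for callback in self._subscribers[port]:
            callback(message)

# Hypothetical usage: a speech-recognition module publishes a result,
# and a dialogue module receives it without a direct connection.
bus = MessageBus()
received = []
bus.subscribe("speech/recognized", received.append)
bus.publish("speech/recognized", {"text": "hello", "speaker": 1})
```

In a real middleware, the callback dispatch would cross process and machine boundaries over sockets or shared memory rather than a local function call.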
We present a robotic platform for conversational robots participating in multiparty conversations. In conversations among humans, we use social cues expressed through the body, such as facial expressions, head gestures, and body orientation. For example, head gestures such as nodding or shaking the head can express positive or negative attitudes, and eye gaze explicitly expresses the participant’s interest. ROBITA had the capability to recognize the facial direction of the speaker (Matsusaka et al., 2003). When ROBITA detects the end of the speaker’s utterance and that the speaker is facing the robot, it assumes that the turn is being handed over to it. It thus takes the turn and begins to speak. If it detects that the speaker is facing another hearer, it assumes that the turn is being handed over to that hearer. It regards that hearer as the expected speaker and gazes at that person (even if he or she does not begin to speak). This function achieves not only smooth turn-taking but also a feeling of unity. In addition, a user-friendly exterior is necessary in the daily-life situations where humans live with robots. Considering these perspectives, we developed SCHEMA, a robotic platform for physically situated conversational agents.
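The ROBITA turn-taking rule described above can be summarized as a small decision function. This is a sketch of the described behavior only; the function name and input encoding are hypothetical, and the real system would derive its inputs from speech and face-tracking modules.

```python
def next_action(utterance_ended, speaker_gaze_target, robot_id):
    """Turn-taking rule sketch: at the end of the speaker's utterance,
    the speaker's facial direction indicates to whom the turn is handed.

    utterance_ended    -- True once the end of the utterance is detected
    speaker_gaze_target -- id of the participant the speaker is facing
    robot_id           -- the robot's own participant id
    """
    if not utterance_ended:
        return "listen"
    if speaker_gaze_target == robot_id:
        # Speaker is facing the robot: the turn is handed to the robot.
        return "take_turn"
    # Speaker is facing another hearer: treat that hearer as the expected
    # speaker and gaze at them, even if they do not begin to speak.
    return ("gaze_at", speaker_gaze_target)
```

The third branch is what produces the "feeling of unity": the robot redirects its gaze exactly as the human hearers do.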
The platform was named SCHEMA ([she:ma]) after the following quotation by the Japanese anatomist Shigeo Miki in his book “Introduction to Life Morphology”: “Anatomists draw these fundamental forms as ‘schema’ in their minds. Human anatomy, therefore, draws clearly such schemas of human bodies.” Generally, the word “schema” comes from the Greek word “σχημα,” which means shape or plan. As our platform is developed to investigate the general framework and process many aspects of conversations, we named it SCHEMA. SCHEMA is a successor to the previously developed “Waseda Robots” named ROBISUKE and KOBIAN. ROBISUKE is a one-meter-tall conversational robot developed by the Perceptual Computing Group in 2002, and numerous conversational systems have been developed based on it (Fujie et al., 2008). KOBIAN (Endo et al., 2008) is a full-body bipedal walking humanoid with a face that enables it to express emotions; it was developed by combining and redesigning WABIAN, a bipedal robot (Ogura et al., 2006), and WE-4, an expressive-face robot (Miwa et al., 2002). Drawing on these concepts and mechanical designs, we redesigned a humanoid robot for conversational purposes.
In our study, taking into consideration both standing and sitting situations, we designed the height of the robot to be 120 cm, slightly below human eye level. Ideally, all embodied capabilities would be implemented in full human proportion to express social cues. Nevertheless, not all degrees of freedom of the human body are needed to express social cues, so we designed the degrees of freedom according to the priority of essential social-cue capabilities. As for the styling (cover) design, we considered the following three functions:
- Abstracted user-friendly forms.
- Cover joints that detach in an emergency, such as breakage caused by high wire tension.
- Covers easily detachable by hand for maintenance.
To ensure a user-friendly design, all parts have rounded, streamlined shapes. To realize the free-form curves, we used fiber-reinforced plastic (FRP). Because the shoulders require higher-output motors, they could appear threatening to users. As a solution, we curved the neck and applied a surface treatment with free-form curves from head to shoulders to make the robot appealing to users. Considering safety and maintenance aspects, all cover joints are attached to the mechanical parts by strong magnets. Hence, screw holes are not required to attach the parts, resulting in clean styling.
For a socially situated conversational robot, head-gesture and facial-expression features are necessary. For example, head gestures such as nodding or shaking the head are needed to express positive or negative attitudes, and eye gaze explicitly expresses the participant’s interest in the conversation. Lips that move when the robot speaks and eyebrows that express emotions such as confusion and surprise are also needed. Drawing on empirical practice in character animation, eye blinks are used to convey that the agent is “breathing.” To realize these functions, the degrees of freedom of SCHEMA’s head are designed as listed in Table 5.1.
Arm movements are used for pointing at objects of interest or for communicating symbolic or linguistic information. To realize these functions, the degrees of freedom of SCHEMA’s arms are designed as listed in Table 5.2.
The body orientation of a robot also generates social cues that elicit recognition by the participants in a group. Generally, participants place themselves where they can see each other and position their bodies so that they are oriented toward the group centroid. An overhearer willing to participate in the conversation typically gazes at the current speaker and expresses his or her intention to participate by certain actions, such as raising a hand or saying a typical phrase such as “excuse me…” If the overhearer is recognized by the current speaker, the other participants may change their positions so that they are oriented toward the centroid of the enlarged group. We implemented this function as the rotation of the mobile turret: when the robot cannot track an object by eye gaze alone, the turret rotation helps cover it. Motor drivers, laptop PCs, a speaker, a speaker amplifier, and batteries are located inside the robot’s body. Table 3 lists the mechanical parts.
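The coordination between eye gaze and turret rotation described above can be sketched as follows: the robot orients toward the group centroid, and the turret rotates only when the target falls outside the eyes’ gaze range. This is an illustrative sketch, not SCHEMA’s actual controller; the function name, coordinate convention, and the `gaze_limit` parameter are assumptions.

```python
import math

def turret_rotation(robot_pos, robot_heading, participant_positions,
                    gaze_limit=math.radians(40)):
    """Return the turret rotation (radians) needed to keep the group
    centroid within gaze range; 0.0 means eye gaze alone suffices.

    robot_pos             -- (x, y) position of the robot
    robot_heading         -- current body heading in radians
    participant_positions -- list of (x, y) participant positions
    gaze_limit            -- hypothetical half-angle the eyes can cover
    """
    # Centroid of the participants' positions.
    cx = sum(p[0] for p in participant_positions) / len(participant_positions)
    cy = sum(p[1] for p in participant_positions) / len(participant_positions)
    target = math.atan2(cy - robot_pos[1], cx - robot_pos[0])
    # Signed angular error, wrapped to (-pi, pi].
    error = (target - robot_heading + math.pi) % (2 * math.pi) - math.pi
    if abs(error) <= gaze_limit:
        return 0.0  # eye gaze alone can cover the target
    # Rotate the turret just enough to bring the target into gaze range.
    return error - math.copysign(gaze_limit, error)
```

Splitting the motion this way keeps the body still for small gaze shifts and reserves the slower turret rotation for large reorientations, such as when the group centroid moves after a new participant joins.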