Regulating A Four-Participant Conversation
We present a framework for facilitation robots that, acting as the fourth participant in a four-participant conversation, regulate imbalanced engagement density while following proper procedures for obtaining the initiative. Four is a special number in multiparty conversation. A three-participant conversation is the minimum unit in which participants autonomously organize a multiparty conversational situation, and the fourth participant is the first who can observe that situation objectively. In three-participant conversations, social imbalance, in which one participant is left behind in the current conversation, sometimes occurs. In such scenarios, a conversational robot has the potential to objectively observe and control the situation as the fourth participant. A four-participant conversational situation, with three participants and a facilitator, is therefore the minimum unit of the facilitation process model.
Extended Participation Structure
To formalize the procedural steps for obtaining the initiative to control a situation, we begin by extending the participation structure model for multiparty conversations. The participation structure model was presented by Clark (1996), drawing on Goffman’s work (1981). In this model, each participant is assigned a participation role from the point of view of the current speaker. The speaker, addressees, and side-participants are “ratified participants”; a side-participant is taking part in the conversation but is not currently being addressed. All other listeners, whom we refer to as over-hearers, have no rights or responsibilities within the structure. Over-hearers come in two main types: bystanders, who are openly present but not part of the conversation, and eavesdroppers, who listen in without the speaker’s awareness. The speaker must pay close attention to these distinctions when speaking. For example, the speaker must distinguish addressees from side-participants. When asking an addressee a question, the speaker must make sure that it is the addressee, and not a side-participant, who is expected to answer. However, the speaker must also ensure that the side-participants understand the question directed at the addressee. In addition, the speaker must consider the over-hearers; because they have no rights or responsibilities in the current conversation, however, the speaker can treat them as he or she pleases.
In this paper, we extend Clark’s model with the concept of engagement. Based on previous studies, we define engagement as the process of establishing connections among participants through dialogue actions so that they can represent their own positions properly. In the extended participation structure model, suppose participant C has been assigned the role of side-participant and has not engaged with the other participants for a significant time. Participant C’s amount of communication traffic with the other participants is then significantly less than that of the others. We define “engagement density” as a measure of this amount of communication traffic. As a related measurement, Katzenmaier et al. produced a measure of “utterance density,” which takes the ratio of speech to non-speech behavior per utterance (“a speech activity per a certain unit of time by dividing each utterance duration by the sum of previous and following pause durations”) (Campbell and Scherer, 2010). While utterance density depends directly on speech activity, engagement density measures the amount of communication between interlocutors. Therefore, a high utterance density does not imply a high engagement density. Jokinen (2011) also noted that one of the participants may be less active in turn-taking (engagement) even if the speaking activity in the conversation as a whole is high. Three-participant conversations are likely to produce such differences in density. We define a “harmonized” participant as a participant with high engagement density, and an “un-harmonized” participant as a participant with low engagement density. Consequently, the speaker and addressee are always harmonized participants, and side-participants can be divided into two types in terms of engagement density: harmonized side-participants and un-harmonized side-participants. Fig. 3 shows the extended participation structure based on Clark’s model. Although all side-participants are ratified, an un-harmonized side-participant, who is recognized only by the speaker, can sometimes emerge in four-participant situations.
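The difference between the two measures can be made concrete with a minimal Python sketch. The `utterance_density` function follows the quoted definition (utterance duration divided by the sum of the previous and following pause durations). The `engagement_share` function is only a hypothetical proxy introduced here for illustration, since the text does not give a formula for engagement density; the `Utterance` class, the timings, and the exchange list are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    start: float  # onset in seconds
    end: float    # offset in seconds


def utterance_density(utterances, index):
    """Utterance duration divided by the sum of the previous and following pauses."""
    u = utterances[index]
    prev_pause = max(0.0, u.start - utterances[index - 1].end) if index > 0 else 0.0
    next_pause = (
        max(0.0, utterances[index + 1].start - u.end)
        if index + 1 < len(utterances) else 0.0
    )
    pauses = prev_pause + next_pause
    # Assumption: with no surrounding pause the ratio is undefined, so return infinity.
    return (u.end - u.start) / pauses if pauses > 0 else float("inf")


def engagement_share(exchanges, participant):
    """Hypothetical proxy: fraction of (speaker, addressee) exchanges involving `participant`."""
    if not exchanges:
        return 0.0
    involved = sum(
        1 for speaker, addressee in exchanges if participant in (speaker, addressee)
    )
    return involved / len(exchanges)


# Illustrative data only: one speaker's utterances and a short exchange log.
utterances = [Utterance(0.0, 2.0), Utterance(3.0, 5.0), Utterance(6.0, 7.0)]
exchanges = [("A", "B"), ("B", "A"), ("A", "B"), ("C", "A")]
print(utterance_density(utterances, 1))
print(engagement_share(exchanges, "C"))
```

In this toy data, participant C takes part in only one of four exchanges, so C would count as un-harmonized under the proxy regardless of how dense C's own speech is.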
Procedures for Obtaining Initiatives to Control Engagement Density
Regarding how engagement is controlled, Whittaker and Stenton (1988) analyzed two-participant dialogues to investigate how speakers signal control and how control affects discourse structure at three levels: the lower control level, the topic level, and the global organization level. At the control level, they found that three types of utterances (prompts, repetitions, and summaries) were consistently used to signal control. At the topic level, they found that interruptions introduce a new topic. The global organization is likewise structured by topic initiation. Their study argued that not only signal utterances but also topic shifting and initiation play an important role in engagement control. On the basis of this discussion, we define the following constraints on harmonized and un-harmonized participants when they address a next speaker and shift the current topic:
- Constraint of addressing: An un-harmonized participant must not address the other un-harmonized participants directly.
- Constraint of topic shifting: A harmonized participant must not shift the current topic when he/she addresses un-harmonized participants.
The relationships between subject and target participants that the two constraints permit are shown in Tables 2 and 3. For example, while a harmonized participant (speaker, addressee, or harmonized side-participant) can address both harmonized participants (addressee and harmonized side-participant) and un-harmonized participants (un-harmonized side-participant), an un-harmonized participant cannot address another un-harmonized participant. In the following sections, we describe a computational model that has the group maintenance functions discussed above.
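The two constraints can be sketched as a single permission check. This is a minimal illustration, not the authors' implementation: the function name and its boolean parameters are hypothetical, and because Tables 2 and 3 are not reproduced here, the sketch assumes that every combination not explicitly forbidden by the two constraints is permitted.

```python
def is_move_permitted(subject_harmonized: bool,
                      target_harmonized: bool,
                      shifts_topic: bool) -> bool:
    """Check a proposed addressing move against the two stated constraints."""
    # Constraint of addressing: an un-harmonized participant must not
    # address another un-harmonized participant directly.
    if not subject_harmonized and not target_harmonized:
        return False
    # Constraint of topic shifting: a harmonized participant must not shift
    # the current topic when addressing an un-harmonized participant.
    if subject_harmonized and not target_harmonized and shifts_topic:
        return False
    # Assumption: anything the two constraints do not forbid is allowed.
    return True


# A harmonized speaker addresses an un-harmonized side-participant on the current topic.
print(is_move_permitted(True, False, shifts_topic=False))
# An un-harmonized participant tries to address another un-harmonized participant.
print(is_move_permitted(False, False, shifts_topic=False))
```

Read this way, a facilitator that is itself un-harmonized would first need to become harmonized before it could address a left-behind participant, which is one way to interpret the need for a procedure to obtain the initiative.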
Computational Model of Engagement Density Control
We model engagement density control as a partially observable Markov decision process (POMDP). In the model diagram, circles represent random variables, squares represent decision nodes, and diamonds represent utility nodes. Shaded circles indicate hidden random variables, while unshaded circles represent observed variables. Solid directed arcs indicate causal effect, while dashed directed arcs indicate that a distribution is used. We assume that the set of states S can be factored into three components:
- Harmony (engagement) states Se,
- Participants’ motivation states Sm,
- Participants’ actions Ap.
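Because the state is only partially observable, a POMDP-based controller maintains a belief (a probability distribution) over states and updates it after each action and observation. The sketch below shows the standard belief update, b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s), restricted for brevity to a single participant's harmony state. The state names, the action `wait`, the observations, and all probabilities are illustrative assumptions and do not come from the model described here.

```python
def belief_update(belief, action, observation, transition, observation_model):
    """Standard POMDP belief update: b'(s') ∝ O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    unnormalized = {}
    for s_next in belief:
        # Prediction step: probability of reaching s_next under the transition model.
        predicted = sum(transition[action][s][s_next] * belief[s] for s in belief)
        # Correction step: weight by the likelihood of the observation.
        unnormalized[s_next] = observation_model[action][s_next][observation] * predicted
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical numbers for one participant's harmony state.
transition = {
    "wait": {
        "harmonized": {"harmonized": 0.8, "unharmonized": 0.2},
        "unharmonized": {"harmonized": 0.1, "unharmonized": 0.9},
    }
}
observation_model = {
    "wait": {
        "harmonized": {"speaks": 0.7, "silent": 0.3},
        "unharmonized": {"speaks": 0.2, "silent": 0.8},
    }
}
belief = {"harmonized": 0.5, "unharmonized": 0.5}
updated = belief_update(belief, "wait", "silent", transition, observation_model)
print(updated)
```

With these made-up numbers, observing silence moves probability mass toward the un-harmonized state. A full model would maintain a belief over the factored state rather than over the harmony component alone.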
Adjacency Pairs: Timing of Initializing a Procedure
To detect when to initiate a procedure, a facilitator should attend to units of consecutive sequence so as to avoid breaking up the current conversation. An adjacency pair is the minimal unit of conversational sequence organization (Schegloff and Sacks, 1973), so it is a reasonable unit to employ here. An adjacency pair is characterized by the following features (Schegloff, 2007): it is (a) composed of two turns, (b) by different speakers, (c) adjacently placed, (d) relatively ordered, that is, differentiated into “first pair parts” and “second pair parts”, and (e) pair-type related, that is, not every second pair part can properly follow any first pair part. First pair parts are utterance types that initiate some exchange, such as question, request, offer, invitation, and announcement. Second pair parts are utterance types that respond to the action of the prior turn, such as answer, grant, reject, accept, decline, agree/disagree, and acknowledgement. Adjacency pairs compose pair types, which are exchanges such as greeting-greeting, question-answer, and offer-accept/decline. To compose an adjacency pair, the first and second pair parts must come from the same pair type.
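The features above can be illustrated with a small check on two adjacent turns. This is a sketch under simplifying assumptions: each turn is a hypothetical `(speaker, utterance_type)` tuple, adjacency is taken for granted because the caller passes consecutive turns, and the pair-type table contains only the three example pair types named in the text.

```python
# Pair types taken from the examples in the text; a real table would be larger.
PAIR_TYPES = {
    "greeting": {"greeting"},
    "question": {"answer"},
    "offer": {"accept", "decline"},
}


def forms_adjacency_pair(first_turn, second_turn):
    """Return True if two adjacent turns form a minimal adjacency pair."""
    first_speaker, first_type = first_turn
    second_speaker, second_type = second_turn
    # Feature (b): the two turns come from different speakers.
    if first_speaker == second_speaker:
        return False
    # Features (d) and (e): the second turn must be a second pair part
    # belonging to the same pair type as the first pair part.
    return second_type in PAIR_TYPES.get(first_type, set())


print(forms_adjacency_pair(("A", "question"), ("B", "answer")))
print(forms_adjacency_pair(("A", "question"), ("B", "accept")))
```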
The basic practice or rule of operation by which the minimal form of the adjacency pair is produced is then: (1) given the recognizable production of a first pair part, (2) on its first possible completion its speaker should stop, (3) a next speaker should start (often someone selected as next speaker by the first pair part), and (4) should produce a second pair part of the same pair type. Adjacency pair-based sequences can come to have more than two turns. Schegloff discussed expansions of adjacency pairs, including pre-expansion, insert expansion, and post-expansion. These features of adjacency pairs may be represented schematically in a very simple transcript diagram as follows:
<- Pre-Expansion
A: First Pair Part
<- Insert Expansion
B: Second Pair Part
<- Post-Expansion (sequence-closing third)
So, which points in a sequence are candidates for a facilitator to initiate a procedure? Because a facilitator should keep its procedures economically short when helping a participant who has been left behind, in this paper we assume that every second pair part or third part is a candidate point for initiation. We also assume that an un-harmonized participant needs to be approved by a speaker’s second pair part in order to become harmonized.
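The timing assumption can be sketched as a scan over a labeled turn sequence. The labels `"first"`, `"second"`, and `"third"` are hypothetical names for first pair parts, second pair parts, and sequence-closing thirds; how turns receive these labels (the recognition step) is outside the scope of this sketch.

```python
# Hypothetical labels for the parts after which a facilitator may step in.
CANDIDATE_PARTS = {"second", "third"}


def initiation_candidates(turn_parts):
    """Return the indices of turns after which a procedure could be initiated."""
    return [i for i, part in enumerate(turn_parts) if part in CANDIDATE_PARTS]


# Two consecutive sequences: the first closed by a third part, the second not.
turn_parts = ["first", "second", "third", "first", "second"]
print(initiation_candidates(turn_parts))
```

Restricting candidates to these points means the facilitator never cuts in between a first pair part and its expected response.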
Architecture for Group Process
Based on the requirements and elements of the facilitation model, as well as the general concepts of cognitive architectures reviewed above, we propose a computational architecture for multiparty conversation facilitation robots, namely the SCHEMA Framework. The SCHEMA Framework mainly consists of the following processes: the Perception Process, the Procedural Production Process, and the Language Generation Process. The Perception Process interprets situations based on visual and auditory information. It includes Adjacency Recognition, Participation Recognition, Topic Recognition, and Question Analysis. Each time the system detects the endpoint of a participant’s speech from the automatic speech recognition (ASR) module, it interprets the current situation. The Procedural Production Process produces procedural actions to manage the group by referring to the Goal Management Module. This module is modeled in a reinforcement learning framework as a partially observable Markov decision process (POMDP). The Language Generation Process is divided into factoid and non-factoid answer generation modules. The factoid answer generation module refers to structured knowledge databases organized using Semantic Web techniques. The non-factoid answer generation module generates the system’s own opinions, which are automatically extracted from a large, indefinite number of reviews on the Web. It also has an utterance combination mechanism that combines factoid and non-factoid responses to realize the additional phrasing function.
Experiments
In this video, participant C is left behind in the current conversation. In such moments, SCHEMA tries to approach participant C to give him the floor while also attending to the interaction between A and B. First, SCHEMA tries to obtain the initiative, and then it approaches C to give him the floor.
Related Papers
- Four-Participant Group Conversation: A Facilitation Robot Controlling Engagement Density As the Fourth Participant
Journal of Computer Speech and Language, 2015. (DOI:10.1016/j.csl.2014.12.001)
Yoichi Matsuyama, Iwao Akiba, Shinya Fujie and Tetsunori Kobayashi