"Continuous interaction for ECAs"
Project Team
Dennis Reidsma (Project leader)
Herwin van Welbergen
Khiet Truong
University of Twente
Human Media Interaction
Senior Advisors
Prof.dr.ir Anton Nijholt (University of Twente)
Dr. Dirk Heylen (University of Twente)
Dr.-Ing. Stefan Kopp (Bielefeld University)
11 December 2009
Abstract
The main objective of this project is to develop an Embodied Conversational Agent able to
receive and to handle certain kinds of feedback, backchannel and interruptions from the user. We
plan on modeling and implementing the sensing, interaction and generation for what we call
continuous interaction. A continuous interactive ECA will be able to perceive and generate
conversational (non-)verbal behavior fully in parallel, and can coordinate this behavior to
perception continuously -- a capability which is not present in most state-of-the-art ECAs.
We propose to do this specifically by looking at feedback, backchannel and interruption behavior
from the human user who is listening to the ECA that serves as a virtual route guide. The ECA
will present information to the user in a multi-modal way. Actively dealing with and responding to
the above mentioned behaviors from the user requires the ECA to be able to handle overlap,
replanning and re-timing of expressions, ignoring attempts by the user to interrupt, and
abandoning of planned utterances (letting itself in effect be interrupted). An evaluation study will
show how the ECA developed is perceived by human users in terms of politeness and certain
personality traits.
1. Project objectives
The design of interactive Embodied Conversational Agents (ECAs, see e.g. Figure 1) has mainly
focused on the combination of speech with gestures in conversational settings (Mancini et al.,
2008; Kopp et al., 2004; Thiebaux et al., 2008). They tend to be developed using a turn-based
interaction paradigm in which the user and the system take turns to talk. If the interaction
capabilities of ECAs are to become more human-like and they are to function in social settings,
their design should shift from this turn-based paradigm to one of continuous interaction in which
all partners perceive each other, express themselves, and coordinate their behavior to each
other, continually and in parallel (Thorisson et al., 2002; Nijholt et al., 2008).
In human dialogs, both verbal and nonverbal behaviors contribute to the perceived quality of the
dialog, e.g. in terms of naturalness, politeness and effectiveness. Apart from the choice of words,
the way they are spoken (e.g. pitch, volume and timing) is informative of the user's mental state,
attitude and intentions. In addition, short vocal utterances such as "yeah" and "hmm" are often
used to encourage the other to continue. Many of these cues and patterns can also be found in
nonverbal behavior. Gaze, head movements (e.g. nodding), posture shifts and hand gestures
help to coordinate the "flow" of the conversation. Although on average, people tend to speak and
listen in turns, when we take this wider interpretation, people "talk" and "listen" (display and
perceive) simultaneously all the time. Moreover, people coordinate their behavior to that of the
conversation partner continuously.
The main objective of this project is to explore this kind of coordination behavior in ECAs,
modeling and implementing the sensing, interaction and generation for what we call continuous
interaction. A continuous interactive ECA will be able to perceive the user and generate
conversational behavior fully in parallel, and can coordinate behavior to perception continuously --
a capability which is not yet present in most state-of-the-art ECAs.
We propose to do this specifically by looking at multi-modal feedback, backchannel and
interruption behavior from the human user who is listening to the ECA that serves as a virtual
route guide. The ECA will present information to the user in a multi-modal way. At the same time,
it will perceive the user, through cameras and microphones. Actively dealing with and responding
to the above mentioned behaviors from the user requires the ECA to be able to handle overlap,
replanning and re-timing of expressions, assessing attempts by the user to interrupt, and
abandoning of planned utterances (letting itself in effect be interrupted).
Three important aspects of this project are the knowledge about human-human interaction that is
used to determine what to develop for the ECA, the scenarios that are used to lead the user into
actually displaying the feedback and interruption behaviors, and the evaluation of the resulting
behavior as for how it is perceived by the user along several dimensions such as politeness and
certain personality traits (e.g., dominance).
Figure 1 Example of earlier ECA application capable of continuous interaction and parallel
perception and generation
Figure 2 ECA that attempts to establish eye contact with museum visitor
2. Background information
In human-human communication, nonverbal signals play an important role to maintain the 'flow' in
the conversation. For instance, humans often show that they are listening and following the
conversation by nodding or saying 'hmm' and 'yeah' at appropriate moments: listeners give
feedback (Allwood et al., 1992) to signal that they are engaged in the conversation. Feedback
can be given explicitly by words, e.g., by officially taking the floor and expressing his/her feelings
in one turn, but it can also be given via nonverbal signals, e.g., head nods. Backchannels can be
considered a subset of feedback behavior: backchannel signals are given by the listener to show
that he/she is listening and that the speaker can continue with his/her turn, without having the
intention to take a turn on his/her own. Typical backchannel behavior can be expressed via
(short) vocal interjections, gaze behavior and head gestures. Feedback and backchannel
behavior of the listener do not occur randomly, these are coordinated to the (non-)verbal actions
of the speaker (e.g., Nijholt et al., 2008). Interruptions are also signalled in a multimodal way by
the listener, often simultaneously with the speech from the speaker. However, in contrast to many
feedback behaviors, interruptions are intended to take the turn from the speaker. The two kinds of
behaviors (backchannel and feedback vs interruptions) require different kinds of reactions from
the speaker.
In aiming for a human-like continuous interaction between a virtual human and a user, we need to
study how these types of behaviors are coordinated between speaker and listener, and we need
to implement this coordinated behavior in a virtual human. For example, by explicitly eliciting
feedback from users, verbally or non-verbally, and by having the virtual human react
appropriately to this feedback, the user will feel understood and will feel an active participant in the
conversation. If the user interrupts the ECA, the ECA should react adequately -- either releasing
the floor, or talking right through the interruption, forcing the user to wait for his/her turn.
The above goals require a multimodal behavior realizer to be capable of immediate adaptation --
in content and in timing -- to the dynamics of the environment and the user. So far, projects that
explore related themes have focused on generation of backchannel at appropriate moments (e.g.,
Maatman et al., 2005; Ward and Tsukahara, 1999), and avoiding speech overlap (Jonsdottir and
Thorisson, 2009) rather than dealing with backchannel reception and overlap. Elckerlyc
(http://hmi.ewi.utwente.nl/showcases/Elckerlyc) offers functionality to allow this immediate
adaptation. The backchannel awareness and handling in the virtual guide proposed here is one
example of the use of such continuous interaction mechanisms.
A virtual human can be designed to have different strategies to deal with feedback and
backchannel behavior. The strategy of turn-taking (Sacks et al., 1974) is an important process in
coordinating the communicative behavior between speaker and listener. Sacks et al. (1974)
studied turn-taking in casual conversations and stated that the "turn allocation" (who is the next
speaker?) is governed by a set of rules that are designed to aim at as little simultaneous talk as
possible. For example, the virtual human can be designed to wait for the user to finish his/her turn
before the virtual human starts its own turn. But what happens when the turn-taking goes wrong,
for example, when both the user and the virtual human start speaking at the same time, or when
the user starts speaking before the virtual human has finished its turn? Maat and Heylen (2009)
have shown through artificial simulations of face-to-face conversations that variations of turn-taking
strategies can lead to different human perceptions of an agent on personality scales,
interpersonal scales and emotional scales. In particular, we want to focus on designing overlap
resolution strategies for the virtual human. For example, what should the virtual human do when it
is being interrupted intrusively: should it raise its voice and keep talking? One possible way to
determine whether an interruption was intrusive (or merely a vocal backchannel) is to
use an automatic detector that can discriminate between intrusive and non-intrusive interruptions
(French and Local, 1983; Lee et al., 2008). The design of these turn-taking and overlap resolution
strategies depends on the personality behavior that we want to give the virtual human. A polite
virtual human (Hofs et al., 2009) will have a different turn-taking and overlap resolution strategy
than an impolite one, and will exhibit different coordinated behavior towards the listener. After
implementing the different strategies and nonverbal behavior actions, the virtual human needs to
be evaluated. Our plan is to evaluate the virtual human along several scales, e.g., personality,
interpersonal, and emotional scales (see also Maat and Heylen, 2009), with a specific interest in
politeness (Brown and Levinson, 1987).
3. Detailed technical description
3.1 GLOBAL SYSTEM ARCHITECTURE
Figure 3: Envisioned system architecture
Fig. 3 gives an overview of the architecture of the ECA that will be developed. We intend to detect
the occurrence of feedback and interruptions using non-verbal vocalization analysis (e.g., "uh
huh", "mmm"), facial feature detection (e.g., raised eyebrows, head nods), keyword spotting (e.g.,
"yeah", "ok"), and gaze tracking. The behavior planner specifies the behavior to be realized on
the basis of politeness and social strategies (can be kept fixed, or modulated by interpreted input)
and conversation content (a specification of the route to explain). If feedback or backchannels
occur, Elckerlyc is instructed to gracefully interrupt the currently running behavior or to retime or
re-parameterize (speak louder, increase the amplitude of gestures etc.) its behavior. New
behavior can be constructed by selecting and inserting new BML fragments in order to react to
interruptions. The exact method of feedback handling is influenced by turn-taking strategies and
politeness/social strategies.
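As a rough illustration of the decision step described above, the following minimal Python sketch maps a detected user event and a politeness strategy to an instruction for the realizer. All names (UserEvent, Strategy, RealizerCommand, plan_reaction) and the rules themselves are hypothetical placeholders chosen for this sketch; they are not part of Elckerlyc or BML, and the actual mapping is a design question for the project.

```python
from dataclasses import dataclass
from enum import Enum


class UserEvent(Enum):
    BACKCHANNEL = "backchannel"      # e.g. "uh huh" or a head nod
    INTERRUPTION = "interruption"    # an attempt to take the turn


class Strategy(Enum):
    POLITE = "polite"
    IMPOLITE = "impolite"


@dataclass(frozen=True)
class RealizerCommand:
    """Hypothetical instruction sent to the behavior realizer."""
    action: str       # "continue", "retime", "interrupt" or "reparameterize"
    detail: str = ""


def plan_reaction(event: UserEvent, strategy: Strategy) -> RealizerCommand:
    """Map a detected user event to a realizer command (illustrative rules only)."""
    if event is UserEvent.BACKCHANNEL:
        if strategy is Strategy.POLITE:
            # Make room for the listener's feedback instead of talking over it.
            return RealizerCommand("retime", "insert a short pause")
        return RealizerCommand("continue")
    # The remaining case is an interruption attempt.
    if strategy is Strategy.POLITE:
        return RealizerCommand("interrupt", "gracefully abandon the utterance")
    return RealizerCommand("reparameterize", "speak louder, larger gestures")
```

Keeping the rules in one small function makes it easy to swap in a different politeness or social strategy without touching sensing or realization.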
3.2 Detailed tasks and research themes
Sensing
Non-verbal aspects of human behavior often reveal part of a user's mental state, intentions or
attitude towards another (virtual) human. Taking into account non-verbal behavior can, when
coupled with the generation of believable reactions, lead to more human-like interactions with
virtual humans. This requires the detection and recognition of non-verbal cues that are
informative in the context of the conversation. These include eye gaze (looking at, looking away),
head movements (nodding, turning away), posture (open/closed, shifts) and non-verbal
vocalizations (laughter, sighs, "hmm").
We propose to use cameras, computer vision and vocal analysis techniques to observe the user.
Given the interactive scenario, processing power is limited, which dictates a more opportunistic
approach to detection and recognition. However, (dialog) context can be used to
focus on those moments in the continuous interaction that are most informative.
More specifically, the following non-verbal cues are on the wish list to be detected:
- Detection of overlapping speech and interruptions. Ideally, we want to develop a recognizer that
can discriminate cooperative (e.g., backchannels) from competitive (e.g., intrusive turn-stealing
attempts) interruptions, primarily based on vocal behavior (French and Local, 1983; Lee et al.,
2008).
- Detection of head gestures, such as head nods (which are typical examples of backchannels).
- Detection of verbalized backchannels, such as "okay", "yeah" etc.
- Detection of the politeness of the user (could be determined from the turn-taking behavior of the
user).
- Detection of eye gaze behavior.
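To make the first item on this list concrete, here is a minimal Python sketch of overlap detection, assuming that speech activity of the agent and of the user is already available as lists of (start, end) intervals in seconds. The function names and the duration threshold are hypothetical; in particular, the duration-based labeling is only a stand-in for a real recognizer based on vocal features and is not the method of French and Local (1983) or Lee et al. (2008).

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def find_overlaps(agent_speech: List[Interval],
                  user_speech: List[Interval]) -> List[Interval]:
    """Return the time spans in which agent and user speak simultaneously."""
    overlaps = []
    for a_start, a_end in agent_speech:
        for u_start, u_end in user_speech:
            start, end = max(a_start, u_start), min(a_end, u_end)
            if start < end:  # the two intervals share some time
                overlaps.append((start, end))
    return sorted(overlaps)


def label_overlap(overlap: Interval, max_backchannel_s: float = 1.0) -> str:
    """Toy rule (assumption): short overlaps are cooperative, long ones competitive."""
    duration = overlap[1] - overlap[0]
    return "cooperative" if duration <= max_backchannel_s else "competitive"
```

A real detector would replace label_overlap with a classifier, but the interval bookkeeping in find_overlaps would probably stay much the same.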
Scenarios
A concrete example of nonverbal signals which can be used to elicit feedback from the user is
eye gaze. Speakers tend to look away from the listener when they are in the middle of their turn,
but nearing the end of their turn they will look at the listener to see whether what they have just
said was understood (van Es et al., 2002; Heylen et al., 2005). If our virtual human displays this
behavior as well, users will feel the need to respond either through giving a backchannel or more
elaborately with feedback, especially if this gaze is accompanied by a head nod and/or posture
shift. To make sure users understand that the virtual human is aware of backchannel behavior
and feedback, it can explicitly ask for a backchannel in the first place. An example of explicit
feedback elicitation (VH= Virtual Human):
1    VH:    And then you turn right at the... uh...
2a   User:  the church?    VH: yes...
2b   User:  <silence>      VH: the church...
2c   User:  the school?    VH: no, the church...
3    VH:    and then immediately left.
Another way to elicit feedback from the user is to design task-based dialogues. For instance, the
user is initially shown a map, which gives the possibility to prime the user with some concepts such as
"school", "church", "river", "bridge" etc. Subsequently, the map is put away, the user interacts
with the ECA to find the route, the map is shown again and the user is asked to draw the route.
In addition, the adaptive behavior of the virtual human needs to be designed. For example, if the
user is behaving impolitely, the virtual human can decide to be a bit more impolite too, which
can be reflected in the non-verbal behavior generated: for instance, trying to keep the turn by speaking louder
and faster. Depending on the type of behavior that we would like to realize in concordance with a
politeness strategy and certain personality traits (e.g., dominance or impatience), different
strategies can be designed to deal with backchannels, overlap and interruptions.
More specifically, the following activities are needed to model the virtual human's (adaptive)
behavior:
- Design scenarios that elicit feedback, either explicitly or implicitly.
- Study what kind of overlap resolution behavior humans use.
- Determine what kind of overlap resolution behavior the VH should use in the polite and the
impolite cases.
- Model these behaviors so they can be generated and used.
- Implement algorithms that give the VH the possibility of performing overlap resolution behavior
based on the politeness of the user.
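The last activity could start from something as simple as the following Python sketch. It assumes, purely for illustration, that the user's impoliteness is estimated as the fraction of overlaps that were competitive, and that the VH responds by scaling its speech volume and rate. The names SpeechSettings, user_impoliteness and adapt_settings, the linear scaling and the max_boost value are all hypothetical choices, not results from the literature.

```python
from dataclasses import dataclass


@dataclass
class SpeechSettings:
    volume: float = 1.0  # relative loudness, 1.0 = neutral
    rate: float = 1.0    # relative speaking rate, 1.0 = neutral


def user_impoliteness(n_competitive: int, n_overlaps: int) -> float:
    """Fraction of the user's overlaps that were competitive (0.0 if none)."""
    if n_overlaps == 0:
        return 0.0
    return n_competitive / n_overlaps


def adapt_settings(impoliteness: float, max_boost: float = 0.5) -> SpeechSettings:
    """Raise volume and rate linearly with the (clamped) impoliteness estimate."""
    level = min(max(impoliteness, 0.0), 1.0)
    return SpeechSettings(volume=1.0 + max_boost * level,
                          rate=1.0 + 0.5 * max_boost * level)
```

With these assumptions a fully polite user leaves the VH at neutral settings, while a user whose overlaps are all competitive pushes volume and rate to their maximum boost.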
Generation
The behavior of the ECA will be generated using Elckerlyc (see also below, under Resources).
This platform was developed to support continuous interaction. Among other things, it supports
flexible real-time adaptation of the timing of planned behavior -- both for the speech and the
nonverbal behavior.
In the context of this project, several things need to be done regarding the behavior generation,
among which:
- Given the content of what the ECA will present, appropriate utterances (verbal and nonverbal)
need to be defined
- In situations of overlap and interruption attempts, the ECA needs to generate the right reaction:
speaking louder or with more emphasis, gracefully abandoning the utterance, retiming part of the
utterance to make room for the feedback from the user, etc.
- As gaze behavior is an important part of the kind of interaction studied in this project, good
algorithms for natural gaze need to be implemented
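The retiming mentioned in the second item can be sketched as follows. This is a simplified Python illustration that assumes a plan is a flat list of behaviors with absolute start and end times; PlannedBehavior and make_room are hypothetical names and do not reflect the Elckerlyc API, which works with BML blocks and synchronization constraints rather than a flat list.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class PlannedBehavior:
    name: str
    start: float  # seconds from the start of the plan
    end: float


def make_room(plan: List[PlannedBehavior], at: float,
              pause: float) -> List[PlannedBehavior]:
    """Delay every behavior that has not started by time `at` by `pause` seconds.

    Behaviors that are already running at `at` are left untouched in this sketch.
    """
    retimed = []
    for behavior in plan:
        if behavior.start >= at:
            behavior = replace(behavior,
                               start=behavior.start + pause,
                               end=behavior.end + pause)
        retimed.append(behavior)
    return retimed
```

Shifting whole behaviors keeps speech and its accompanying gestures aligned with each other, at the price of not being able to stretch a behavior that is already in progress.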
Evaluation
Evaluation can be done by analyzing interactions with an agent with and without a specific type of
behavior. Different types of nonverbal behavior can be defined for different levels of politeness,
and the influences of these politeness settings can also be analyzed. Different strategies of
overlap resolution can be evaluated. "Do users provide more feedback and backchannels under
certain conditions, do users feel more engaged in the conversation when certain strategies are
applied, how are the different overlap and politeness strategies perceived by human users in
terms of personality?" are some of the questions that can be answered through a perception
study. In addition, we also want to measure whether we succeed in eliciting feedback via the
expression of non-verbal cues.
More specifically, in the evaluation process we aim to:
- Design a perception study to evaluate the virtual route guide that exhibits different types of
communicative behavior that is implemented by varying overlap resolution and politeness
strategies.
- Develop (quantitative) measures to assess the effect of the different behaviors exhibited by the
virtual human on the perception of human users.
- Define scales/dimensions along which the virtual human is to be evaluated.
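One simple candidate for such a quantitative measure is the number of user backchannels per minute in each experimental condition. The Python sketch below assumes each session is logged as a (condition, backchannel count, duration in seconds) tuple; the record layout and the function name are hypothetical, and the measure is only one of many that could be defined.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One session: (condition label, number of user backchannels, duration in seconds)
Session = Tuple[str, int, float]


def backchannel_rate_per_minute(sessions: Iterable[Session]) -> Dict[str, float]:
    """Pool the sessions of each condition and return backchannels per minute."""
    counts = defaultdict(int)
    seconds = defaultdict(float)
    for condition, n_backchannels, duration_s in sessions:
        counts[condition] += n_backchannels
        seconds[condition] += duration_s
    # Conditions without any recorded time are skipped to avoid division by zero.
    return {c: 60.0 * counts[c] / seconds[c] for c in counts if seconds[c] > 0}
```

Pooling counts and durations before dividing weights long sessions more heavily than short ones; averaging per-session rates would be an equally defensible alternative.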
3.3 RESOURCES
To ensure rapid progress, we intend to use a number of off-the-shelf components. Members of our
team are frequent users or developers of many of these components, so we expect to deploy and
combine them quickly and smoothly. Most components are open source, which allows us to
easily make modifications that might be needed for the project.
Multimodal Behavior Specification: BML
The emerging Behavior Markup Language standard (BML) (Kopp et al., 2006,
http://wiki.mindmakers.org/projects:bml:draft1.0) is a markup language that allows one to specify
the different behaviors that a VH should execute (such as speech, gestures, poses, and gaze),
together with their synchronization. BMLT (http://wiki.mindmakers.org/projects:bml:bmlt) extends
this standard by allowing the specification of synchronization to predicted events (for example:
synchronize the stroke of a gesture to the predicted time of a head-nod of the human listener).
Elckerlyc
Elckerlyc (Van Welbergen et al., submitted; http://hmi.ewi.utwente.nl/showcases/Elckerlyc) is a
BML compliant behavior realizer for generating multimodal verbal and nonverbal behavior for
VHs. It is designed specifically for continuous (as opposed to turn-based) interaction with tight
temporal coordination between the behavior of a VH and its interaction partners.
Animation assets
Elckerlyc contains several frequently used speech-accompanying gestures (beats, head nods,
pointing gestures etc), imported from Greta (Mancini et al., 2008). We also plan to incorporate
the procedural gestures used in MURML (Kopp and Wachsmuth, 2004). All procedural
animations are parameterized: it is possible to adapt their use of power, spatial extent, fluidity
and temporal extent on the fly. New animation assets can easily be incorporated in Elckerlyc: it
features a flexible procedural animation framework, which can also be used to annotate and
retime motion captured animation. Motion capture animation can be recorded during the
preparation phase before the workshop starts, or can possibly be obtained from existing motion
capture databases. Procedural animation can be authored using the custom procedural animation
editor for Elckerlyc, or using the Greta procedural animation editor and Elckerlyc's Greta gesture
importer.
Feature extraction and sensing technology
The feature extraction and multimodal detection/interpretation is a major aspect of the project.
Besides algorithms and modules that may be developed specifically during the workshop, we can
use existing tools such as the following:
- the Semaine API for easily connecting input modules to the rest of the system
- the perception modules that will be available already from the Semaine API, such as the face
tracker from Imperial College, or openSMILE for getting speech features
- the Xuuk EyeBox for unconstrained eye contact detection
- OpenCV or EyesWeb
Equipment
The project does not need very special hardware. However, machines for the participants to work
on are useful (although many participants will probably also bring laptops); in addition, a few
slightly heavier machines, connected to a network, for running the distributed modules for sensing
and generation are useful, too. We may be able to bring a few desktop machines ourselves as
well.
3.4 Project Management
The project management tasks will be shared by the three core members of the team. We plan to
have a thorough preparation before the workshop takes place, communicating with the
participants through email, so the participants do not enter the actual workshop completely
unprepared. One member of the team of project leaders will take responsibility for integrative activities
at an early stage during the workshop, to ensure that at the start of the evaluation there is a full system to
evaluate. Advisors from the senior staff at HMI, but also from institutes such as DFKI who
collaborate with the team members in other projects such as Semaine and SSPNet, will help
keep the project on track when needed. For example, the following three persons have agreed to
be involved as senior advisor: Prof.dr.ir. Anton Nijholt (head of the HMI Group at the University of
Twente), Dr. Dirk Heylen (HMI, will also act as liaison with a number of European partners and
projects such as Semaine and SSPNet that will play a role in the tutorial programme) and Dr.-Ing.
Stefan Kopp from Bielefeld (leader of a research group specialized in generation of natural
behavior for ECAs).
4. Work plan and implementation schedule
WP1: Sensing
In this work package, we analyze vocal and visual behavior of the user with the aim to detect
relevant non-verbal signals like backchannels and interruptions. Dependent on the expertise
available the focus may vary between vocal and visual analysis. (week 1 - 2)
WP2: Scenario and strategy design
In this work package, we aim to design strategies to deal with backchannels, overlap and
interruptions. These designs depend on the type of behavior that we would like to realize in
concordance with a politeness strategy and certain personality traits (e.g., dominance or
impatience). This also involves identifying good scenarios and conversational situations that will
actually elicit the desired behavior from the user. (week 1 - 3)
WP3: Behavior planning and adaptation
In this work package, the behavior planning is implemented, i.e. given the dynamic
communicative intent in a certain situation, appropriate behaviors are chosen and scheduled to express
that intent. This includes developing the timing control and (temporal) adaptation of behavior needed to
achieve real continuous interaction. It also covers generation for graceful and natural interruption,
e.g. deciding whether or not to stop in the middle of a word. (week 2 - 4)
WP4: Evaluation
Finally, a perception study will be designed in order to evaluate the virtual human developed.
The evaluation will focus on the following questions: Does continuous interaction work at all? Do people use
feedback behavior with such an ECA? How do users perceive the personality and attitude of the ECA given
different strategies for dealing with overlap, interruptions and feedback? (week 3 - 4)
WP5: End report
All members of the project team will contribute to writing an end report.
During the four-week workshop, we also plan to invite speakers to give tutorials on:
- the Semaine API
- expressive speech synthesis
5. Benefits of the research
We insist that all the software components used for the project, and all the software built during
the project should be free for use, and available as such to all participants (after the workshop
too).
In the four weeks, we will have explored design and implementation issues of a continuous
interactive and feedback aware ECA. Moreover, we will have achieved and implemented a
special, novel kind of interaction that is continuous and sensitive to social conversational signals
of the speaker such as backchannels and interruptions. This working demo installation can serve
as an experimental platform that can be useful for future evaluation experiments.
Through a perception study, the ECA will be evaluated and the results will give an indication as
to how the scripted behavior is perceived by humans along several social and personality scales.
This gives us insight into how to design conversational strategies to build social robots and
ECAs. All the behavior of the ECA will be incorporated in the BML realizer platform which is
publicly available (naturally, the BML extensions made during this project will also be publicly
available).
6. Profile of team:
Dennis Reidsma
Dennis Reidsma is a PostDoc at the Human Media Interaction group. He did his PhD working on
different aspects of natural interaction systems. He worked, among other things, on problems of
annotation and reliability in large multimodal annotated corpora, in the context of the EU FP6 AMI
and AMIDA projects. In addition, he worked on research and development of new interactive
systems with virtual humans. The interactive virtual dancer attempts to invite a human to engage
with her, using computer vision, music analysis, and patterns of leading and following behavior.
The interactive virtual orchestra conductor leads an ensemble of human musicians through a
musical performance using advanced interactive graphics developed at HMI and advanced music
processing algorithms. His current interests are in exploring continuous interaction with virtual
humans in conversational settings. He is one of the Elckerlyc developers.
Herwin van Welbergen
Herwin van Welbergen received his MSc in Human Media Interaction from the University of
Twente's Department of Computer Science. Currently, he is a PhD candidate at the Human
Media Interaction group. His research activities focus on real-time multimodal behavior
generation for virtual humans, using real-time procedural animation, real-time physical simulation
and speech, especially for applications that allow continuous interaction with a virtual human.
Herwin is the main developer of the Elckerlyc framework.
Khiet Truong
Khiet Truong is a postdoctoral researcher at the University of Twente in the Human Media
Interaction group. She has a background in computational linguistics and speech technology. As
a Master student, she carried out research on automatic pronunciation error detection in speech
of second-language learners. In 2009, she successfully defended her PhD thesis on automatic
emotion recognition in speech based on work carried out at TNO. Currently, she is working on
social signal processing in the SSPNet-project.
Other researchers needed
There is plenty of room for participants with expertise in one or more of a number of topics.
Depending on the expertise available, elements of the project may be developed more extensively, or we
might opt for opportunistic fallbacks (see technical section) where expertise is missing.
Several areas of expertise are already covered by the core team.
- Speech: nonverbal detection / recognition of intent and attitude in several dimensions is very
important in this project
- Vision: online detection of head gestures and gaze behavior, and possibly detection of whole
body movements and posture shifts
- Motion synthesis: focus on pose shifts and other 'whole body movements', especially in
coordination with conversation partners' movements
- Evaluation: people with expertise in user experiments and evaluating multimodal / ECA
interfaces
- Otherwise: any related expertise, especially in ECAs and / or multimodal interaction is welcome
Publications by proposing team that are relevant for this project:
- D. Reidsma, Z. M. Ruttkay, and A. Nijholt, "Challenges for Virtual Humans in Human Computing," in
Artificial Intelligence for Human Computing, ser. LNAI: State of the Art Surveys, T. S. Huang, A. Nijholt, M.
Pantic, and A. Pentland, Eds. Berlin/Heidelberg: Springer Verlag, 2007, pp. 316-338.
- D. Reidsma, A. Nijholt, and P. Bos, "Temporal Interaction Between an Artificial Orchestra Conductor and
Human Musicians," Computers in Entertainment, vol. 6, iss. 4, pp. 1-22, 2008.
- A. Nijholt, D. Reidsma, Z. M. Ruttkay, H. van Welbergen, and P. Bos, "Non-verbal and Bodily Interaction in
Ambient Entertainment," in Proc. Proceedings workshop on The Fundamentals of Verbal and Non-verbal
Communication and the Biometrical Issue, Amsterdam, The Netherlands, 2007, pp. 343-348.
- D. Reidsma, H. van Welbergen, R. W. Poppe, P. Bos, and A. Nijholt, "Towards Bi-directional Dancing
Interaction," in R. Harper, M. Rauterberg, and M. Combetto, Eds., Proceedings of International Conference
on Entertainment Computing (ICEC'06), 2006, pp. 1-12.
- S.E.M. Jansen and H. van Welbergen, "Methodologies for the User Evaluation of the Motion of Virtual
Humans," in Intelligent Virtual Agents, 9th International Conference, Lecture Notes in Computer Science,
volume 5773, Springer Berlin / Heidelberg, Berlin, ISBN 978-3-642-04379-6, pp. 125-131, 2009
- H. van Welbergen, D. Reidsma, J. Zwiers, Z.M. Ruttkay and M. ter Maat, "An Animation Framework for
Continuous Interaction with Reactive Virtual Humans," in Short Paper and Poster Proceedings of The
Twenty-Second Annual Conference on Computer Animation and Social Agents, A. Nijholt, A. Egges, H. van
Welbergen and G.H.W. Hondorp (eds), CTIT Workshop Proceedings Series, Centre for Telematics and
Information Technology, University of Twente, Enschede, ISSN 0929-0672, pp. 69-72, 2009
- H. van Welbergen, B. J. H. van Basten, A. Egges, Zs. Ruttkay and M. H. Overmars, "Real Time Character
Animation: A Trade-off Between Naturalness and Control," in State-of-the-Art-Report proceedings of
Eurographics, Mark Pauly and Guenther Greiner (eds), Eurographics Association, Munich, Germany, ISSN
1017-4656, pp.45-72, 2009
- A. Nijholt, D. Reidsma, H. van Welbergen, H.J.A. op den Akker and Z.M. Ruttkay, "Mutually Coordinated
Anticipatory Multimodal Interaction," in Nonverbal Features of Human-Human and Human-Machine
Interaction, A. Esposito, N.G. Bourbakis, N. Avouris and I. Hatzilygeroudis (eds), Lecture Notes in Computer
Science, volume 5042, Springer Verlag, Berlin, ISBN 978-3-540-70871-1, pp. 70-89, 2008
- H. Vilhjalmsson, N. Cantelmo, J. Cassell, N.E. Chafai, M. Kipp, S. Kopp, M. Mancini, S. Marsella, A.N.
Marshall, C. Pelachaud, Z.M. Ruttkay, K. Thórisson, H. van Welbergen and R.J. van der Werf, "The
Behavior Markup Language: Recent Developments and Challenges," in Proceedings of the 7th International
Conference on Intelligent Virtual Agents, C. Pelachaud, J-C. Martin, E. André, G. Collet, K. Karpouzis and D.
Pelé (eds), Electronic Notes in Artificial Intelligence, volume 4722, Springer, Berlin, ISSN 0302-9743, pp. 90-111, 2007
- H. van Welbergen, A. Nijholt, D. Reidsma and J. Zwiers, "Presenting in Virtual Worlds: Towards an
Architecture for a 3D Presenter explaining 2D-Presented Information," IEEE Intelligent Systems, 21(5):47-99,
ISSN 1541-1672, 2006
- W. Lewis Johnson, Paola Rizzo, Wauter Bosma, Sander Kole, Mattijs Ghijsen and Herwin van Welbergen,
"Generating socially appropriate tutorial dialog," ISCA Workshop on Affective Dialogue Systems, Kloster
Irsee, Germany, June 2004, Lecture Notes in Computer Science 3068, E. André, L. Dybkjaer, W. Minker &
P. Heisterkamp (Eds.), ISBN 3-540-22143-3, Springer-Verlag, Berlin Heidelberg New York, 254-264.
- K.P. Truong and D.A. van Leeuwen, "Automatic discrimination between laughter and speech," Speech
Communication, Vol. 49, 144-158, 2007.
- W.A. Melder, K.P. Truong, M.A. Neerincx, D.A. van Leeuwen, M. Den Uyl, L.R. Loos, and B. Plum,
"Affective Multimodal Mirror: Sensing and Eliciting Laughter," In Proceedings of Workshop on Human-Centered Multimedia, Augsburg, Germany, 2007.
- K.P. Truong, M.A. Neerincx, and D.A. van Leeuwen, "Assessing agreement of observer- and self-annotations in spontaneous multimodal emotion data," In Proceedings of Interspeech, Brisbane, Australia,
2008.
- K.P. Truong, D.A. van Leeuwen, M.A. Neerincx, and F.M.G. de Jong, "Arousal and Valence prediction in
spontaneous emotional speech: felt versus perceived emotion," Proceedings of Interspeech, 2009.
References
- Allwood, J. and Nivre, J. and Ahlsen, E. (1992) On the semantics and pragmatics of linguistic feedback.
Journal of Semantics 9(1), pp. 1-26.
- Brown, P. and Levinson, S.C. (1987) Politeness: Some universals in language usage. Cambridge:
Cambridge University Press.
- van Es, I. and Heylen, D.K.J. and van Dijk, E.M.A.G. and Nijholt, A. (2002) Making Agents Gaze Naturally.
Does it work? In: Proceedings AVI 2002: Advanced Visual Interfaces, 22-24 May 2002, Trento, Italy,
pp. 357-358.
- French, P. and Local, J. (1983) Turn-Competitive Incomings. In: Journal of Pragmatics 7, 17-38.
- Heylen, D.K.J. and van Es, I. and Nijholt, A. and van Dijk, E.M.A.G. (2005) Controlling the Gaze of
Conversational Agents. In: Natural, Intelligent and Effective Interaction in Multimodal Dialogue Systems.
Kluwer Academic Publishers, pp. 245-262.
- Hofs, D. and Theune, M. and Op den Akker, R. (2009) Natural interaction with a virtual guide in a virtual
environment. In: Journal on Multimodal User Interfaces.
- Jonsdottir, G.R. and Thorisson, K.R. (2009) Teaching Computers to Conduct Spoken Interviews: Breaking
the Realtime Barrier with Learning. In: Proceedings of IVA 2009, pp. 446-459.
- Kopp, S. and Wachsmuth, I. (2004) Synthesizing Multimodal Utterances for Conversational Agents. The
Journal of Computer Animation and Virtual Worlds, 15(1).
- Kopp, S. and Krenn, B. and Marsella, S. and Marshall, A.N. and Pelachaud, C. and Pirker, H. and
Thorisson, K.R. and Vilhjalmsson, H.H. (2006) Towards a common framework for multimodal generation:
The behavior markup language. In Intelligent Virtual Agents, volume 4133 of LNCS, pages 205-217.
- Lee, C.-C. and Lee, S. and Narayanan, S.S. (2008) An analysis of multimodal cues of interruption in
dyadic spoken interactions. In: Proceedings of Interspeech 2008.
- Ter Maat, M. and Heylen, D. (2009) Turn management or Impression Management? In: Proceedings of
Intelligent Virtual Agents, 2009, pp. 467-473.
- Maatman, R.M. and Gratch, J. and Marsella, S. (2005) Natural Behavior of a Listening Agent. In T.
Panayiotopoulos, J. Gratch, R. Aylett, D. Ballin, P. Olivier, and T. Rist, Eds., Intelligent Virtual Agents, 5th
International Working Conference, IVA 2005, Kos, Greece, September 12-14, 2005, Proceedings, 2005, pp.
25-36.
- Mancini, M. and Niewiadomski, R. and Bevacqua, E. and Pelachaud, C. (2008) Greta: a SAIBA compliant
ECA system. In: 3e Workshop sur les Agents Conversationnels Animées, 2008.
- Nijholt, A. and Reidsma, D. and van Welbergen, H. and op den Akker, H.J.A. and Ruttkay, Z.M. (2008)
Mutually coordinated anticipatory multimodal interaction. In: Verbal and Nonverbal Features of Human-Human and Human-Machine Interaction, 29-31 October 2007, Patras, Greece. pp. 70-89. Lecture Notes in
Computer Science 5042. Springer Verlag. ISSN 0302-9743 ISBN 978-3-540-70871-1.
- Sacks, H. and Schegloff, E.A. and Jefferson, G. (1974) A simplest systematics for the organization of turn-taking for conversation. Language, 50, pp. 696-735.
- Thiebaux, M. and Marshall, A. N. and Marsella, S. and Kallmann, M. (2008) SmartBody: Behavior
Realization for Embodied Conversational Agents. In: Autonomous Agents and Multiagent Systems, 2008,
pp. 151-158.
- Thorisson, K.R. (2002) Natural Turn-Taking Needs No Manual: Computational Theory and Model, from
Perception to Action. In: Multimodality in Language and Speech Systems, pp.173-207.
- Ward, N. and Tsukahara, W. (1999) A Responsive Dialog System. In Machine Conversations, Y. Wilks, Ed.
Kluwer, 1999, pp. 169-174.