Experimental Evaluation of Multimodal Interfaces based on Multimodal Corpora Analysis
Stéphanie BUISINE, Jean-Claude MARTIN
LIMSI-CNRS, France
{buisine,martin}@limsi.fr
www.limsi.fr/Individu/buisine
www.limsi.fr/Individu/martin
LIMSI’s ECA Team
- LIMSI-CNRS (repr. P. Le Quéré)
- HCI dept (repr. P. Tarroux)
- AMI group: Architectures and Models for Interaction (repr. J.P. Sansonnet)
- ECA Team created in 2003 (repr. Jean-Claude MARTIN)
  - Sarkis ABRILIAN
  - Stéphanie BUISINE
  - Jean-Pierre FOURNIER
  - Ouriel GRYNSZPAN
  - Guillaume PITEL
  - Jean-Paul SANSONNET
Research Topic
- How to achieve intuitiveness of multimodal input & output in HCI?
  - Collection of multimodal corpora
  - Design of evaluation protocols
Overview
- Three Experimental Studies:
  - Multimodal Input with Conversational Agents
  - Multimodal Input with classical GUI
  - Multimodal behavior of Conversational Agents
- Conclusion
Multimodal Input with Conversational Agents
Experimental Study (Wizard of Oz) on Multimodal Input
Proceedings of Interact 2003
www.niceproject.com
Why Evaluate Input with Agents?
- Evaluations focused on output features:
  - See Dehn & Van Mulken, 2000; McBreen & Jack, 2001; Moreno et al., 2001; Craig et al., 2002; Baylor, 2003…
- Evaluations focused on input modes:
  - ?
[Diagram: Output and Input between Agent and User]
Input with Embodied Agents
- Keyboard and/or mouse input
  - E.g. Pelachaud et al., 2002
- Speech input
  - E.g. McBreen & Jack, 2001
- Multimodal input
  - Oviatt & Adams, 2000
  - Cassell & Thorisson, 1999
Goals of this study
- Test the usefulness of multimodal input with a simulated system (Wizard of Oz).
- Collect behavioral data to build a functional system.
- Compare two populations: adults vs. children.
Method
- Conversational game. Input modes (counterbalanced order):
  - Monomodal: speech-only
  - Multimodal: speech + pen on the screen
- Users: 7 adults (age 26), 10 children (age 11).
- 2D agents: 4 cartoonish characters
Catalogue of 2D images …
Can you bring me the red lamp?
Game scenario
There is nothing to eat here.
Fetch the blue flower, it must be watered.
Can I help you?
See you later!
Wizard of Oz device
- 83 pre-encoded utterances and associated combinations of nonverbal behaviors
[Diagram labels: Experimenter, Video monitor, Wizard Interface, PC, Loudspeakers; Subject, Interactive Pen Display, PC, Loudspeakers, Video Camera + microphone]
Video
Analysis process
- Video Corpus (3h30)
- Annotation Process: PRAAT and ANVIL (Kipp, 2001), with a Coding Scheme
- Annotations
- JAVA JAXP
- Metrics
- Questionnaire
- SPSS Statistics: Uni- and Multidimensional Analyses
Annotation Process
- Output file from PRAAT (time in seconds):
  282.81034178522418 283.64115146193643 "bonjour"
  283.64115146193643 284.13634989722596 ""
  284.13634989722596 284.40148593975829 "voilà"
  284.40148593975829 284.83569424129678 "votre"
  284.83569424129678 285.39286418574881 "fleur"
- « Hello » « this is » « your » « flower »
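The PRAAT output is a sequence of (start, end, label) triples, with an empty label for silence. A minimal Python sketch of how such a listing could be read into intervals follows; the whitespace-separated layout and the names `Interval`, `parse_intervals` and `speech_duration` are assumptions made for illustration, not the tooling used in the study.

```python
import re
from typing import List, NamedTuple


class Interval(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    label: str    # transcribed word; "" marks a silence


# Assumed layout: start time, end time, quoted label, repeated.
_TRIPLE = re.compile(r'(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+"([^"]*)"')


def parse_intervals(text: str) -> List[Interval]:
    """Extract every (start, end, label) triple found in the text."""
    return [Interval(float(s), float(e), lab) for s, e, lab in _TRIPLE.findall(text)]


def speech_duration(intervals: List[Interval]) -> float:
    """Total duration of the labelled (non-silent) intervals."""
    return sum(iv.end - iv.start for iv in intervals if iv.label)


sample = '''282.81034178522418 283.64115146193643 "bonjour"
283.64115146193643 284.13634989722596 ""
284.13634989722596 284.40148593975829 "voilà"
284.40148593975829 284.83569424129678 "votre"
284.83569424129678 285.39286418574881 "fleur"'''

intervals = parse_intervals(sample)
print([iv.label for iv in intervals])
print(round(speech_duration(intervals), 3))
```

Keeping silences as intervals with an empty label preserves the timeline, so pauses can be measured later if needed.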
Annotation Process
[Figure: gesture phases on a timeline (scale: 1 sec): preparation; stroke, pointing; retraction]
Annotation Process
- Coding Scheme for ANVIL:
  - Words and categories: ... bonjour (locution), voilà (locution), votre (adjective), fleur (substantive) ...
  - Gesture: ... preparation, stroke, pointing ...
  - Shape of movement: ... circling, exploration ...
  - Actions: ... get in, ask wish, take object ...
Annotation Process
- Output file from ANVIL:
  - Results displayed track by track
  - Need for a program to extract metrics and the connections between them.
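Since annotations are stored track by track, a program has to relate intervals across tracks. The Python sketch below computes the duration of use per modality and the speech/pen overlap from two lists of (start, end) intervals. The interval values and function names are hypothetical; the slides name Java and JAXP for this step, so this is an illustration of the idea rather than the original program.

```python
def total_duration(track):
    """Sum of interval lengths in one annotation track."""
    return sum(end - start for start, end in track)


def overlap_duration(track_a, track_b):
    """Total time during which an interval of track_a and one of track_b co-occur.

    Assumes intervals within a single track do not overlap each other.
    """
    total = 0.0
    for a_start, a_end in track_a:
        for b_start, b_end in track_b:
            shared = min(a_end, b_end) - max(a_start, b_start)
            if shared > 0:
                total += shared
    return total


# Hypothetical annotation tracks (seconds), not data from the study.
speech = [(0.0, 2.0), (5.0, 6.5)]
pen = [(1.5, 3.0), (6.0, 7.0)]

metrics = {
    "speech_duration": total_duration(speech),
    "pen_duration": total_duration(pen),
    "overlap": overlap_duration(speech, pen),
}
print(metrics)
```

The pairwise comparison is quadratic in the number of intervals, which is acceptable for a corpus of a few hours.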
Variables collected
- Behavioral metrics:
  - Duration of use for each modality
  - Characteristics of use (syntactic categories of words, shape of movements…)
- Subjective data (questionnaire):
  - Easiness, pleasantness…
→ 11 dependent variables
Statistics
- One-dimensional:
  - ANOVA
  - Wilcoxon-Mann-Whitney
- Multidimensional:
  - Factorial Analysis
  - Multiple Regressions
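For comparing two independent groups, such as adults and children, the Wilcoxon-Mann-Whitney test relies on the U statistic. A minimal Python sketch of U by pairwise comparison is given below. The ratings are made-up values for illustration, and the significance step is omitted (the slides name SPSS for the statistics).

```python
def mann_whitney_u(a, b):
    """Return (U_a, U_b) for two independent samples.

    U_a counts the pairs (x in a, y in b) with x > y; ties count 0.5.
    """
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


# Hypothetical easiness ratings, not data from the study.
adults = [4, 5, 3, 5]
children = [2, 3, 4]

u_adults, u_children = mann_whitney_u(adults, children)
print(u_adults, u_children)
```

The smaller of the two U values is the one usually compared against a critical value or converted to a p-value.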
Results
- Descriptive data about the use of each modality:

Morphosyntactic category    Percentage
Locutions                   21.9 %
Verbs                       19.3 %
Substantives                16.2 %
Pronouns                    15.4 %
Adjectives                  11.4 %
Articles                    10.3 %
Conjunctions                 3.1 %
Adverbs                      2.3 %

Shape of movement           Percentage
Pointing                    66 %
Circling                    18.1 %
Exploration                  8.5 %
Line                         5.4 %
Arrow                        2.1 %
Results
- Multimodal scenarios are significantly shorter and yield higher and more homogeneous ratings of easiness.
[Chart: Easiness (scale 0 to 5) for Children and Adults, Multimodal Input vs. Speech-only Input]
Results
- Use of pen by children: extra-interaction time.
[Chart: use duration (sec), scale 0 to 100, for Children and Adults across Speech Input, Pen, and Overlap]
Results
- Principal Component Analysis
- Correlations between the variables and the 3 components:

Variable           Component 1   Component 2   Component 3
Total duration        -0.912        -0.065        -0.104
Speech duration        0.017        -0.816        -0.264
Pen duration          -0.424         0.682         0.135
Age                    0.520        -0.582         0.398
Easiness               0.828         0.116        -0.270
Effectiveness          0.172         0.171         0.906
Pleasantness           0.434         0.503        -0.411
Learning               0.848         0.239         0.062
Results
- Modalities used as a function of actions performed:
  - Conversational goals (asking questions): by speech.
  - Taking objects: by pen.
  - Exit a room:
[Chart: use number (scale 0 to 14) for Children and Adults across Speech Input, Pen, and Multimodal input]
Conclusions on this Study
- Usefulness of multimodal input
  - Especially for children
- Behavioral data used for implementing a functional system:
  - Parameterization of the recognition system
  - Modality specialization for identified actions
I give it to you.
+
Perspectives on this Study
- Analyses in progress: model of multimodal behavior
- Further analyses of spoken utterances
  - Politeness, language style, vocabulary scope, disfluencies, emotion…
- Test of the functional system with the same scenario
  - Validation of the experimental platform
Multimodal Input with a classical GUI
Second Wizard-of-Oz Study on Multimodal Input
cpn.paris.ensam.fr/tvi
Context and Goals
- Design of an interface for interactive TV.
- Comparison of different input modes: vocal, gestural, multimodal.
- Collection of behavioral data to develop a functional system.
- Addition of multimodality to an existing interface.
Method
- TV program search scenarios
- Input modes (counterbalanced order):
  - Speech-only
  - Pen-only
  - Speech + Pen
- Users: 6 adults
Interface
- Program Guide on the web
  - Texts
  - Graphics
  - Auditory error messages
Wizard of Oz device
[Diagram labels: Experimenter, Video Monitor, PC; Subject, Interactive Pen Display, PC, Loudspeakers, Video Camera + Microphone; Internet]
Results
- Input mode preferred:
  - Speech for 2 users
  - Pen for 2 users
  - Multimodal for 2 users
→ Multimodal input is likely to satisfy everybody!
Results
- Syntax of verbal commands:
  - No sentence structure
  - Use of interface labels (70% of words used):
    - Ex: “Date, Tuesday 26 November, channel, Encyclopedia”
    - “Category, Film, Subcategory, Science-Fiction”
  - Use of names of commands:
    - “Select” “OK” “Cancel” “Back” “Scroll”…
Conclusions on this Study
- Ensure equivalence between modalities: users can choose their preferred input mode.
- Spontaneous use of labels: facilitation of vocal recognition.
- Syntax:
  - Due to the application?
  - To the interface?
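Because users spoke interface labels followed by values, a transcribed utterance can be mapped to field/value pairs with very little grammar. The Python sketch below illustrates this under the assumption of a comma-separated transcript and a hypothetical label set (`LABELS`); it is not the recognizer built for the project.

```python
# Hypothetical set of interface labels (lower-cased); only the four
# labels appearing in the example utterances are listed.
LABELS = {"date", "channel", "category", "subcategory"}


def parse_command(utterance):
    """Map a 'Label, value, Label, value' transcript to a dict."""
    fields = {}
    current = None
    for token in (t.strip() for t in utterance.split(",")):
        if not token:
            continue
        if token.lower() in LABELS:
            # A known label opens a new field.
            current = token.lower()
            fields[current] = ""
        elif current is not None:
            # Anything else is (part of) the value of the open field.
            fields[current] = f"{fields[current]}, {token}" if fields[current] else token
    return fields


result = parse_command("Date, Tuesday 26 November, channel, Encyclopedia")
print(result)
```

Tokens that appear before any known label are ignored, which is one possible policy among several.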
Perspectives on this Study
- Insert an Embodied Agent in the interface.
- Test the same scenarios with an agent:
  - Comparison of user’s behavior with the web site / with an agent?
  - Comparison of user’s behavior with agents as a function of the application.
Multimodal Behavior of Conversational Agents
Experimental Study on Output Features
Proceedings of Workshop AAMAS 2003
Giving Agents a Personality
- Emotional personality (extraversion, friendliness…):
  - See Ball and Breese (2000)…
- Rhetorical personality:
  - Variation of verbal / nonverbal behavior in the discourse.
Cooperation between Speech and Gesture
- Survey of multimodal video corpora:
  - Redundancy
  - Complementarity
  - Specialization
  - Concurrency …
  (Martin et al., 2001)
Redundancy
You have to use the big round button in the center.
Complementarity
You have to use this button.
Speech-Specialization
You have to use the big round button in the center.
Combinations with Agent’s Look
- 3 strategies, 3 agents, 3 objects to present.
- Latin-square combinations.
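With 3 strategies, 3 agents and 3 objects, a Latin square pairs each strategy exactly once with each agent and once with each object. A Python sketch of one cyclic 3×3 Latin square is shown below. The agent and object names are placeholders, and the actual assignment used in the study is not given in the slides.

```python
strategies = ["Redundant", "Complementary", "Specialized"]
agents = ["agent_1", "agent_2", "agent_3"]      # placeholder names
objects = ["object_1", "object_2", "object_3"]  # placeholder names


def latin_square(n):
    """Cyclic Latin square: cell (row, col) holds (row + col) mod n."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]


square = latin_square(len(strategies))

# Rows index agents, columns index objects, cells give the strategy.
design = [
    (agents[r], objects[c], strategies[square[r][c]])
    for r in range(len(agents))
    for c in range(len(objects))
]

for agent, obj, strategy in design:
    print(agent, obj, strategy)
```

Every row and every column of `square` contains each strategy index once, which is the defining property of a Latin square.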
Method
- 18 users (9 men + 9 women).
- Listen to 3 short technical presentations.
- No interaction with agents.
- Recall of information + Questionnaire.
Results
- Quality of explanation
  - No effect of appearance
  - Effect of multimodal behavior:
[Chart: Quality of explanation (scale 1 to 3) for Redundant, Complementary, and Specialized]
Results
- Interaction with user’s gender:
[Chart: Quality of explanation (scale 1 to 3) for Males and Females across Redundant, Complementary, and Specialized]
Results
- Likeability
  - No effect of multimodal behavior
  - Effect of appearance: [agent image] > [agent image] > [agent image]
- Performance: same effect of appearance.
  - But no correlation.
Conclusions on this Study
- Effects of cooperation between speech and gesture:
  - Unconscious effect
  - Gender differences to be confirmed
- Effects of agent’s appearance on likeability:
  - Additional comments of users: glasses, white coat…
- Effects of appearance on performance:
  - Moreno et al. 2002
General Conclusion
Design of Multimodal Interfaces based on Corpora Analysis and Experimental Evaluation
- Multimodal Corpora Analysis
  - Studies on INPUT modes (multimodal interfaces)
  - Studies on OUTPUT modes (agents)
- Applied Results:
  - Validating concepts
  - Implementing functional systems
- General Results about humans:
  - Multimodal behavior
  - Age, gender differences…
Thank you! Any questions?
More questions? buisine@limsi.fr
www.limsi.fr/Individu/buisine
www.niceproject.com
cpn.paris.ensam.fr/tvi
www.limsi.fr
preparation
stroke, pointing
retraction
Overlaps