LIMSI's ECA Team - Stéphanie Buisine

Experimental Evaluation of Multimodal Interfaces based on Multimodal Corpora Analysis
Stéphanie BUISINE, Jean-Claude MARTIN
LIMSI-CNRS, France
{buisine,martin}@limsi.fr
www.limsi.fr/Individu/buisine
www.limsi.fr/Individu/martin

LIMSI’s ECA Team
- LIMSI-CNRS (repr. P. Le Quéré)
- HCI dept (repr. P. Tarroux)
- AMI group: Architectures and Models for Interaction (repr. J.P. Sansonnet)
- ECA Team created in 2003 (repr. Jean-Claude MARTIN)
  - Sarkis ABRILIAN
  - Stéphanie BUISINE
  - Jean-Pierre FOURNIER
  - Ouriel GRYNSZPAN
  - Guillaume PITEL
  - Jean-Paul SANSONNET

Research Topic
- How to achieve intuitiveness of multimodal input & output in HCI?
  - Collection of multimodal corpora
  - Design of evaluation protocols

Overview
- Three Experimental Studies:
  - Multimodal Input with Conversational Agents
  - Multimodal Input with classical GUI
  - Multimodal behavior of Conversational Agents
- Conclusion

Multimodal Input with Conversational Agents
Experimental Study (Wizard of Oz) on Multimodal Input
Proceedings of Interact 2003
www.niceproject.com

Why Evaluate Input with Agents?
- Evaluations focused on output features
  - See Dehn & Van Mulken, 2000; McBreen & Jack, 2001; Moreno et al., 2001; Craig et al., 2002; Baylor, 2003…
- Evaluations focused on input modes
  - ?
[Diagram: Agent and User, linked by Output and Input]

Input with Embodied Agents
- Keyboard and/or mouse input
  - E.g. Pelachaud et al., 2002
- Speech input
  - E.g. McBreen & Jack, 2001
- Multimodal input
  - Oviatt & Adams, 2000
  - Cassell & Thorisson, 1999

Goals of this study
- Test the usefulness of multimodal input with a simulated system (Wizard of Oz).
- Collect behavioral data to build a functional system.
- Compare two populations: adults vs. children.

Method
- Conversational game. Input modes (counterbalanced order):
  - Monomodal: speech-only
  - Multimodal: speech + pen on the screen
- Users: 7 adults (age 26), 10 children (age 11).
- 2D agents: 4 cartoonish characters

Catalogue of 2D images …


Can you bring me the red lamp?


Game scenario

There is nothing to eat here.

Fetch the blue flower, it must be watered.

Can I help you?

See you later!


Wizard of Oz device
- 83 pre-encoded utterances and associated combinations of nonverbal behaviors
[Diagram components: Experimenter, Wizard Interface, Video monitor, PC (x2), Loudspeakers (x2), Video Camera + microphone, Interactive Pen Display, Subject]

Video


Analysis process
[Diagram stages:]
- Video Corpus (3h30)
- Annotation Process: PRAAT, ANVIL (Kipp, 2001), Coding Scheme
- Annotations
- JAVA JAXP
- Metrics
- Questionnaire
- SPSS
- Statistics: Uni- and Multidimensional Analyses

Annotation Process

« Hello » « this is » « your » « flower »

Annotation Process

Output file from PRAAT (time in seconds):

282.81034178522418 283.64115146193643 "bonjour"
283.64115146193643 284.13634989722596 ""
284.13634989722596 284.40148593975829 "voilà"
284.40148593975829 284.83569424129678 "votre"
284.83569424129678 285.39286418574881 "fleur"

« Hello »

« this is » « your » « flower »
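
The PRAAT output above is a sequence of (start time, end time, quoted label) triples, with an empty label marking a silence. A minimal Python sketch of how such lines could be read back into intervals follows; the names `Interval`, `parse_intervals` and `speech_duration` are illustrative assumptions, not part of PRAAT or of the authors' tool chain, and only the flat layout shown on the slide is handled.

```python
import re
from typing import List, NamedTuple


class Interval(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    label: str    # transcribed word, "" for a silence


# Sample copied from the PRAAT output shown on the slide.
SAMPLE = (
    '282.81034178522418 283.64115146193643 "bonjour" '
    '283.64115146193643 284.13634989722596 "" '
    '284.13634989722596 284.40148593975829 "voilà" '
    '284.40148593975829 284.83569424129678 "votre" '
    '284.83569424129678 285.39286418574881 "fleur"'
)

# One interval = start time, end time, label between double quotes.
INTERVAL_RE = re.compile(r'(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+"([^"]*)"')


def parse_intervals(text: str) -> List[Interval]:
    """Extract every (start, end, label) triple found in the text."""
    return [Interval(float(s), float(e), label)
            for s, e, label in INTERVAL_RE.findall(text)]


def speech_duration(intervals: List[Interval]) -> float:
    """Sum the duration of labelled intervals, skipping silences."""
    return sum(iv.end - iv.start for iv in intervals if iv.label)


intervals = parse_intervals(SAMPLE)
words = [iv.label for iv in intervals if iv.label]
print(words)
print(round(speech_duration(intervals), 3))
```

Skipping empty labels gives a speech duration that excludes pauses, which is one plausible reading of "duration of use" for the speech modality.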

Annotation Process

[Timeline figure, scale 1 sec: preparation; stroke, pointing; retraction]

Annotation Process

Coding Scheme for ANVIL (values shown on the slide):
- Speech: bonjour (locution), voilà (locution), votre (adjective), fleur (substantive), ...
- Pen gesture: preparation, stroke, pointing, circling, exploration, ...
- Actions: get in, ask wish, take object, ...

Annotation Process

- Output file from ANVIL:
  - Results displayed track by track
  - A program is needed to extract metrics and the connections between them.
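
As an illustration of what such an extraction program computes, here is a minimal Python sketch (not the authors' Java/JAXP program). It assumes the annotation file has already been read into one list of (start, end) spans per track; the track contents below are hypothetical values, and `total_duration` and `overlap_duration` are illustrative names.

```python
from typing import Dict, List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def total_duration(spans: List[Span]) -> float:
    """Total time covered by one track."""
    return sum(end - start for start, end in spans)


def overlap_duration(a: List[Span], b: List[Span]) -> float:
    """Time during which a span of track a and a span of track b co-occur.

    Assumes spans within a single track do not overlap each other.
    """
    total = 0.0
    for a_start, a_end in a:
        for b_start, b_end in b:
            total += max(0.0, min(a_end, b_end) - max(a_start, b_start))
    return total


# Hypothetical annotations, for illustration only.
speech: List[Span] = [(0.0, 2.0), (5.0, 7.5)]
pen: List[Span] = [(1.5, 3.0), (7.0, 8.0)]

metrics: Dict[str, float] = {
    "speech": total_duration(speech),
    "pen": total_duration(pen),
    "overlap": overlap_duration(speech, pen),
}
print(metrics)
```

The overlap value is the kind of cross-track "connection" that a track-by-track display does not give directly.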

Variables collected
- Behavioral metrics:
  - Duration of use for each modality
  - Characteristics of use (syntactic categories of words, shape of movements…)
- Subjective data (questionnaire):
  - Easiness, pleasantness…
→ 11 dependent variables

Statistics
- One-dimensional:
  - ANOVA
  - Wilcoxon-Mann-Whitney
- Multidimensional:
  - Factorial Analysis
  - Multiple Regressions
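
For readers unfamiliar with the Wilcoxon-Mann-Whitney test, the sketch below computes its U statistic for two independent groups in plain Python. It covers the statistic only, not the p-value, which a statistics package such as the SPSS mentioned on the slides would report. The ratings are hypothetical values, not data from the study, and `mann_whitney_u` is an illustrative name.

```python
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic of sample x against sample y.

    Counts the (xi, yj) pairs where xi > yj; a tie counts for one half.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical easiness ratings, for illustration only.
children = [4, 5, 5, 3]
adults = [2, 3, 4]

u_children = mann_whitney_u(children, adults)
u_adults = mann_whitney_u(adults, children)
print(u_children, u_adults)
```

The two U values always sum to the number of pairs (here 4 × 3), which is a quick sanity check on any implementation.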


Results
- Descriptive data about the use of each modality:

Morphosyntactic category   Percentage
Locutions                  21.9 %
Verbs                      19.3 %
Substantives               16.2 %
Pronouns                   15.4 %
Adjectives                 11.4 %
Articles                   10.3 %
Conjunctions                3.1 %
Adverbs                     2.3 %

Shape of movement          Percentage
Pointing                   66 %
Circling                   18.1 %
Exploration                 8.5 %
Line                        5.4 %
Arrow                       2.1 %

Results
- Multimodal scenarios are significantly shorter and yield higher, more homogeneous ratings of easiness.
[Bar chart: Easiness (scale 0 to 5) for Children and Adults, Multimodal Input vs. Speech-only Input]

Results
- Use of pen by children: extra-interaction time.
[Bar chart: use duration (sec, scale 0 to 100) of Speech Input, Pen, and Overlap for Children and Adults]

Results
- Principal Component Analysis
- Correlations between the variables and the 3 components:

Variable          Component 1   Component 2   Component 3
Total duration      -0.912        -0.065        -0.104
Speech duration      0.017        -0.816        -0.264
Pen duration        -0.424         0.682         0.135
Age                  0.520        -0.582         0.398
Easiness             0.828         0.116        -0.270
Effectiveness        0.172         0.171         0.906
Pleasantness         0.434         0.503        -0.411
Learning             0.848         0.239         0.062

Results
- Modalities used as a function of actions performed:
  - Conversational goals (asking questions): by speech.
  - Taking objects: by pen.
  - Exit a room:
[Bar chart: use number (scale 0 to 14) of Speech Input, Pen, and Multimodal input for Children and Adults]

Conclusions on this Study
- Usefulness of multimodal input
  - Especially for children
- Behavioral data used for implementing a functional system:
  - Parameterization of the recognition system
  - Modality specialization for identified actions

I give it to you.

+


Perspectives on this Study
- Analyses in progress: model of multimodal behavior
- Further analyses of spoken utterances
  - Politeness, language style, vocabulary scope, disfluencies, emotion…
- Test of the functional system with the same scenario
  - Validation of the experimental platform

Multimodal Input with a classical GUI
Second Wizard-of-Oz Study on Multimodal Input
cpn.paris.ensam.fr/tvi

Context and Goals
- Design of an interface for interactive TV.
- Comparison of different input modes: vocal, gestural, multimodal.
- Collection of behavioral data to develop a functional system.
- Addition of multimodality to an existing interface.

Method
- TV program search scenarios
- Input modes (counterbalanced order):
  - Speech-only
  - Pen-only
  - Speech + Pen
- Users: 6 adults

Interface
- Program Guide on the web
  - Texts
  - Graphics
  - Auditory error messages

Wizard of Oz device
[Diagram components: Experimenter, Video Monitor, PC (x2), Loudspeakers, Internet, Video Camera + Microphone, Interactive Pen Display, Subject]

Results
- Input mode preferred:
  - Speech for 2 users
  - Pen for 2 users
  - Multimodal for 2 users
→ Multimodal input is likely to satisfy everybody!

Results
- Syntax of verbal commands:
  - No sentence structure
  - Use of interface labels (70% of words used):
    - Ex: “Date, Tuesday 26 November, channel, Encyclopedia”
    - “Category, Film, Subcategory, Science-Fiction”
  - Use of names of commands:
    - “Select” “OK” “Cancel” “Back” “Scroll”…

Conclusions on this Study
- Ensure equivalence between modalities: users can choose their preferred input mode.
- Spontaneous use of labels: facilitation of vocal recognition.
- Syntax:
  - Due to the application?
  - To the interface?

Perspectives on this Study
- Insert an Embodied Agent in the interface.
- Test the same scenarios with an agent:
  - Comparison of user’s behavior with the web site / with an agent?
  - Comparison of user’s behavior with agents as a function of the application.

Multimodal Behavior of Conversational Agents
Experimental Study on Output Features
Proceedings of Workshop AAMAS 2003

Giving Agents a Personality
- Emotional personality (extraversion, friendliness…):
  - See Ball and Breese (2000)…
- Rhetorical personality:
  - Variation of verbal / nonverbal behavior in the discourse.

Cooperation between Speech and Gesture
- Survey of multimodal video corpora (Martin et al., 2001):
  - Redundancy
  - Complementarity
  - Specialization
  - Concurrency …

Redundancy
“You have to use the big round button in the center.”

Complementarity
“You have to use this button.”

Speech-Specialization
“You have to use the big round button in the center.”

Combinations with Agent’s Look
- 3 strategies,
- 3 agents,
- 3 objects to present.
- Latin-square combinations.
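
To make the Latin-square idea concrete, the following Python sketch builds a cyclic 3 × 3 square in which each strategy appears exactly once per agent and once per object. The slides do not give the actual assignment used in the study, so this is one possible arrangement; the agent and object labels are hypothetical placeholders and `latin_square` is an illustrative name.

```python
from typing import List


def latin_square(symbols: List[str]) -> List[List[str]]:
    """Cyclic Latin square: each row is the previous one shifted by one."""
    n = len(symbols)
    return [[symbols[(row + col) % n] for col in range(n)] for row in range(n)]


strategies = ["Redundant", "Complementary", "Specialized"]
agents = ["Agent A", "Agent B", "Agent C"]        # hypothetical labels
objects = ["Object 1", "Object 2", "Object 3"]    # hypothetical labels

square = latin_square(strategies)
for agent, row in zip(agents, square):
    for obj, strategy in zip(objects, row):
        print(f"{agent} presents {obj} with the {strategy} strategy")
```

Because every strategy meets every agent and every object once, an effect of strategy cannot be explained by one particular agent or object alone.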

Method
- 18 users (9 men + 9 women).
- Listen to 3 short technical presentations.
- No interaction with agents.
- Recall of information + Questionnaire.

Results
- Quality of explanation
  - No effect of appearance
  - Effect of multimodal behavior:
[Chart: Quality of explanation (scale 1 to 3) for the Redundant, Complementary, and Specialized strategies]

Results
- Interaction with user’s gender:
[Chart: Quality of explanation (scale 1 to 3) for Males and Females across the Redundant, Complementary, and Specialized strategies]

Results
- Likeability
  - No effect of multimodal behavior
  - Effect of appearance: [images of the three agents, ordered with “>” between them]
- Performance: same effect of appearance.
  - But no correlation.

Conclusions on this Study
- Effects of cooperation between speech and gesture:
  - Unconscious effect
  - Gender differences to be confirmed
- Effects of agent’s appearance on likeability:
  - Additional comments of users: glasses, white coat…
- Effects of appearance on performance:
  - Moreno et al. 2002

General Conclusion

Design of Multimodal Interfaces based on Corpora Analysis and Experimental Evaluation

- Multimodal Corpora Analysis
  - Studies on INPUT modes (multimodal interfaces)
  - Studies on OUTPUT modes (agents)
- Applied Results:
  - Validating concepts
  - Implementing functional systems
- General Results about humans:
  - Multimodal behavior
  - Age, gender differences…

Thank you! Any questions?
More questions? [email protected] www.limsi.fr/Individu/buisine
www.niceproject.com
cpn.paris.ensam.fr/tvi
www.limsi.fr

preparation

stroke, pointing

retraction

Overlaps
