Automatic Measures to Characterise Verbal Alignment in Human-Agent Interaction

G. Dubuisson Duplessis (1), C. Clavel (2), F. Landragin (3)
August 15th, 2017

(1) Sorbonne Universités, UPMC Univ Paris 06, CNRS, ISIR, Paris, France
(2) LTCI, Télécom ParisTech, Université Paris-Saclay, Paris, France
(3) Lattice Laboratory, CNRS, ENS, Université de Paris 3, Université Sorbonne Paris Cité, PSL Research University, Paris/Montrouge, France

[email protected]
http://www.dubuissonduplessis.fr

Context

Background

Model

Experimentation

Conclusion

References

Outline

1. Context: H2020 ARIA VALUSPA Project
2. Background: Convergence and Verbal Alignment
3. Proposition: Automatic Measures to Characterise Verbal Alignment in H-A Interaction
4. Experimentation and Results
5. Conclusion and Perspectives

SIGDIAL 2017, 08/15/2017


H2020 European Project: ARIA VALUSPA

Main features of the Artificial Retrieval of Information Assistant:
- Virtual agent
- Multimodal interaction (verbal/non-verbal behaviour)
- Adaptation to unexpected situations and to the socio-emotional state of the user

URL: http://aria-agent.eu/


Convergence and Verbal Alignment

Convergence and interactive alignment:
- Communication Accommodation Theory [Gallois et al., 2005]: convergence of behaviour occurs both at low level (e.g., postures, accent and speech rate) and at high level (e.g., mental, emotional, cognitive)
- Interactive Alignment Theory [Pickering and Garrod, 2004]: alignment at the lexical, syntactic and semantic levels; repetitiveness, routinization and dialogue routines

Table 1: Corpus H-A 311 neg1

Loc. | Utterance
S1   | hi i'm sam , nice to meet you what is your name ?
H2   | alex
S3   | how are you doing ?
H4   | i am great
S5   | i really appreciate going fifty fifty with you on clearing out this locker.

Table 2: Corpus H-A 376 neg1

Loc. | Utterance
H1   | hi
S2   | hi i'm sam , nice to meet you
H3   | nice to meet you i'm erica
S4   | how are you doing ?
H5   | i'm doing good how are you
S6   | pretty good
H7   | good
S8   | i really appreciate going fifty fifty with you on clearing out this locker.


Why study verbal alignment?

Lessons from H-H interaction:
- A subconscious phenomenon that naturally occurs in H-H dialogues [Pickering and Garrod, 2004]: speakers reuse lexical as well as syntactic structures from previous utterances [Reitter et al., 2006, Ward and Litman, 2007]
- It facilitates successful task-oriented conversations [Nenkova et al., 2008, Friedberg et al., 2012]

...and what about H-M interaction?
- Linguistic alignment occurs: users adopt lexical items and syntactic structures used by a system [Brennan and Clark, 1996, Stoyanchev and Stent, 2009, Parent and Eskenazi, 2010, Branigan et al., 2010]
- ...but it is only one-way!


Research Direction

Goal: provide a virtual agent with the ability to
- detect the alignment behaviour of its human interlocutor
- align (or not) with the user

Motivation:
- A natural source of variation in dialogue
- Taking into account the socio-emotional behaviour of the user ("social glue")
- Adaptation without the need for extensive user profiling

Expected outcomes:
- Enhancing the agent's believability, likeability and friendliness
- Increasing interaction naturalness
- Maintaining and fostering user engagement [Clavel et al., 2016]
- Improving collaboration in task-oriented dialogue



Proposition

Approach: providing measures characterising verbal alignment processes based on
- the transcript of the dialogue, and
- the shared expressions at the lexical level

Figure 1: Proposed framework: automatic building of the shared expression lexicon (expression, frequency, turns) from the dialogue transcript, from which verbal alignment measures are derived.


Automatic Building of the Expression Lexicon

Shared expression: a surface text pattern at the utterance level that has been produced by both speakers in a dialogue.

Loc. | Utterance
A1   | well, that's an interesting idea. but no, that's not gonna work for me.
B2   | what will work for you?
A3   | what do you think about me getting two chairs and one plate and you getting one chair, one plate, and the clock?
B4   | that's not gonna work for me

Resulting lexicon (excerpt):

Expr.                        | Freq. | Init.
that's not gonna work for me | 2     | A
work for me                  | 3     | A
what                         | 3     | A
you                          | 2     | B
...                          | 2     | B

Sequential pattern mining, cast as a multiple common subsequence problem.

Figure 2: Main steps to build the dialogue lexicon (inspired from [Dubuisson Duplessis et al., 2017]): Dialogue → Building of a Generalised Suffix Tree → Filtering → Dialogue Lexicon
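The shared-expression idea above can be sketched with a naive n-gram intersection. This is only an illustration: the authors build a generalised suffix tree for efficiency, and all function and variable names below are illustrative, not from their implementation.

```python
from collections import defaultdict

def shared_expressions(turns):
    """Naive sketch: a shared expression is a maximal token n-gram that
    both speakers have produced. `turns` is a list of (speaker, utterance)
    pairs. Quadratic in utterance length; the paper's generalised suffix
    tree solves the same multiple common subsequence problem efficiently.
    """
    produced = defaultdict(lambda: defaultdict(int))  # ngram -> speaker -> count
    first_producer = {}                               # ngram -> initiating speaker
    for speaker, utterance in turns:
        tokens = utterance.split()
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens) + 1):
                ngram = tuple(tokens[i:j])
                produced[ngram][speaker] += 1
                first_producer.setdefault(ngram, speaker)

    # Keep patterns produced by both speakers...
    shared = {g for g, by in produced.items() if len(by) == 2}

    # ...and filter out those contained in a longer shared pattern.
    def contains(h, g):
        return any(h[k:k + len(g)] == g for k in range(len(h) - len(g) + 1))

    maximal = {g for g in shared
               if not any(len(h) > len(g) and contains(h, g) for h in shared)}
    return {" ".join(g): (sum(produced[g].values()), first_producer[g])
            for g in maximal}

# Toy dialogue in the spirit of the slide's example:
dialogue = [
    ("A", "well that's not gonna work for me"),
    ("B", "what will work for you"),
    ("B", "that's not gonna work for me"),
]
lex = shared_expressions(dialogue)
# lex maps each maximal shared expression to (frequency, initiator)
```

Here the only maximal shared pattern is "that's not gonna work for me" (produced once by each speaker, initiated by A); shorter shared fragments such as "work for" are absorbed by it.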


Measures Derived from the Expression Lexicon

From the dialogue transcript and the expression lexicon, we derive:

- Expression Lexicon Size (ELS): number of unique shared expressions in the lexicon
- Expression Variety: EV = ELS / #Tokens
- Expression Repetition of speaker S: ER_S = (#Tokens from S in an established expression) / (#Tokens from S), with ER_S ∈ [0, 1] for all S
- Initiated Expressions of speaker S: IE_S = (#Expressions initiated by S) / ELS, with IE_S ∈ [0, 1] for all S
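A minimal sketch of these four measures, assuming the lexicon and the per-speaker token counts have already been computed upstream (all names are illustrative, not from the authors' code):

```python
def verbal_alignment_measures(lexicon, tokens_by_speaker, tokens_in_expr_by_speaker):
    """Compute ELS, EV, ER_S and IE_S as defined on the slide.

    lexicon: dict expression -> initiating speaker
    tokens_by_speaker: dict speaker -> total tokens produced
    tokens_in_expr_by_speaker: dict speaker -> tokens the speaker spent
        inside established (already shared) expressions
    """
    els = len(lexicon)                            # Expression Lexicon Size
    total_tokens = sum(tokens_by_speaker.values())
    ev = els / total_tokens                       # Expression Variety
    er = {s: tokens_in_expr_by_speaker[s] / tokens_by_speaker[s]
          for s in tokens_by_speaker}             # Expression Repetition, in [0, 1]
    ie = {s: sum(1 for init in lexicon.values() if init == s) / els
          for s in tokens_by_speaker}             # Initiated Expressions, in [0, 1]
    return els, ev, er, ie

# Toy lexicon: 4 shared expressions, 3 of them initiated by speaker A.
lexicon = {"that's not gonna work for me": "A", "work for me": "A",
           "what": "A", "you": "B"}
els, ev, er, ie = verbal_alignment_measures(
    lexicon,
    tokens_by_speaker={"A": 100, "B": 100},
    tokens_in_expr_by_speaker={"A": 20, "B": 30},
)
```

Since IE_A + IE_B = 1 by construction, the pair directly exposes the orientation of alignment: the asymmetry reported later (humans adopting more agent-initiated expressions) shows up as IE values far from 0.5.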


Experimentation Protocol

Corpus-based contrastive study to assess the proposed framework and measures:
- H-H/A corpora VS surrogate corpora
- H-H corpus VS H-A corpus
- Conditions within the H-A corpus: negotiation type (cooperative/competitive), framing ("human operator"/"AI"), gender (male/female agent)

Real interaction corpora: H-H, H-A. Artificial corpora: surrogate H-H, surrogate H-A.


Negotiation Corpora

H-H/A corpora:
- Negotiation task: integrative (win-win) or distributive (competitive)
- 2 settings: H-H and H-A (Woz)
- From [DeVault et al., 2015, Gratch et al., 2016]

The Woz system [DeVault et al., 2015]:
- Designed to be as natural as possible
- More than 11000 possible utterances

                          | H-H          | H-A (Woz)
Dialogues                 | 84           | 154
Utterances                | 10319        | 17125
Utt. per dial., avg (std) | 122.8 (84.1) | 111.2 (57.5)
Tokens                    | 79396        | 90479


Surrogate Corpora

Surrogate corpora:
- Break the dynamics of the IAP
- Break the coupling between utterances: one speaker's turns are kept in place, the other speaker's turns are replaced by utterances randomly drawn from the rest of the corpus

Loc. | Real Utterance                | Randomised Utterance
H1   | hi                            | i'm most interested in the chairs
S2   | hi i'm sam , nice to meet you | hi i'm sam , nice to meet you
H3   | nice to meet you i'm erica    | yeah since you won't budge at all i'd rather do this
S4   | how are you doing ?           | how are you doing ?
H5   | i'm doing good how are you    | why do you want the chairs more than the other items
[...]| [...]                         | [...]
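The randomisation step can be sketched as follows, assuming the surrogate corpus keeps one speaker's turns and redistributes the other speaker's utterances across dialogues. The exact shuffling scheme used in the paper may differ; function and variable names are illustrative.

```python
import random

def surrogate_corpus(dialogues, replaced_role="H", seed=0):
    """Build a surrogate corpus that breaks utterance coupling.

    `dialogues` is a list of dialogues, each a list of (role, utterance)
    pairs. Turns by `replaced_role` are replaced with utterances of the
    same role drawn at random (without replacement) from the whole
    corpus; the other speaker's turns are left untouched.
    """
    rng = random.Random(seed)
    pool = [utt for dia in dialogues for role, utt in dia if role == replaced_role]
    rng.shuffle(pool)
    out, k = [], 0
    for dia in dialogues:
        new_dia = []
        for role, utt in dia:
            if role == replaced_role:
                new_dia.append((role, pool[k]))  # random human utterance
                k += 1
            else:
                new_dia.append((role, utt))      # system turn kept as-is
        out.append(new_dia)
    return out

corpus = [
    [("H", "hi"), ("S", "hi i'm sam , nice to meet you"),
     ("H", "nice to meet you i'm erica")],
    [("H", "i'm most interested in the chairs"), ("S", "how are you doing ?")],
]
surrogate = surrogate_corpus(corpus)
```

By construction the surrogate preserves the turn structure, the role sequence and the overall vocabulary of the replaced speaker, so any drop in the alignment measures relative to the real corpora can be attributed to the broken utterance-level coupling.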


Results: H-H/A VS Surrogate Corpora

Hypothesis: dialogue participants should constitute a richer expression lexicon in the H-H/A corpora than what would incidentally happen in the surrogate corpora.

Figure 3: H-H VS surrogate. Expression Variety. Difference is significant (p < 0.001).

Figure 4: H-A VS surrogate. Expression Variety. Difference is significant (p < 0.001).


Results: H-H/A VS Surrogate Corpora

Hypothesis: dialogue participants should constitute a richer expression lexicon in the H-H/A corpora than what would incidentally happen in the surrogate corpora.

Results: richer expression lexicons are indeed observed in the H-H/A corpora than in the surrogate corpora.


Results: H-H VS H-A Corpora

Hypothesis (following [Branigan et al., 2010]): verbal alignment differs between H-H and H-A interactions; we expect more verbal alignment from the human than from the agent (influenced by beliefs about the limitations of the agent).

Figure 3: Initiated Expressions (IE_S). Difference is significant for H-A (p < 0.001), not significant for H-H.

Figure 4: Expression Repetition (ER_S). Difference is significant for H-A (p < 0.001), not significant for H-H.


Results: H-H VS H-A Corpora

Hypothesis (following [Branigan et al., 2010]): verbal alignment differs between H-H and H-A interactions; we expect more verbal alignment from the human than from the agent (influenced by beliefs about the limitations of the agent).

Results: verbal alignment is
- symmetrical in the H-H corpus,
- asymmetrical in the H-A corpus: the human participant adopts more Woz-initiated expressions and dedicates more tokens to the repetition of expressions; this asymmetry does not appear when considering the number of tokens produced by each speaker or the proportion of shared vocabulary.


Results: H-A Corpus > Negotiation type

Study: impact of the negotiation type on verbal alignment indicators
- integrative (win-win)
- distributive (competitive)

Figure 3: Expression Variety (EV). Difference is not significant.

Figure 4: Expression Repetition (ER). Difference is significant (p < 0.001).


Results: H-A Corpus > Negotiation type

Study: impact of the negotiation type on verbal alignment indicators
- integrative (win-win)
- distributive (competitive)

Results: competitive negotiation leads to
- longer dialogues,
- more verbal alignment (a need to verbally align more on (counter-)propositions?)



Conclusion and Perspectives

- Automatic and generic measures of verbal alignment, based on sequential pattern mining at the surface level of text utterances, characterising: the routinization process; the degree of repetition between dialogue participants; the orientation of verbal alignment.
- Contrasting H-H and H-A verbal alignment (symmetry VS asymmetry): quantitative confirmation of predictions from previous literature regarding the strength and orientation of verbal alignment in Human-Machine interaction [Branigan et al., 2010].

Perspectives:
- Online usage in a dialogue system (the measures are based on efficient algorithms)
- Qualitative analysis of verbal alignment differences
- Confirming results on other comparable H-H/H-A corpora


References I

Branigan, H. P., Pickering, M. J., Pearson, J., and McLean, J. F. (2010). Linguistic alignment between people and computers. Journal of Pragmatics, 42(9):2355–2368.

Brennan, S. E. and Clark, H. H. (1996). Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22(6):1482.

Clavel, C., Cafaro, A., Campano, S., and Pelachaud, C. (2016). Fostering user engagement in face-to-face human-agent interactions: a survey. In Toward Robotic Socially Believable Behaving Systems-Volume II, pages 93–120. Springer.

DeVault, D., Mell, J., and Gratch, J. (2015). Toward natural turn-taking in a virtual human negotiation agent. In AAAI Spring Symposium on Turn-taking and Coordination in Human-Machine Interaction. AAAI Press, Stanford, CA.

Dubuisson Duplessis, G., Charras, F., Letard, V., Ligozat, A.-L., and Rosset, S. (2017). Utterance Retrieval based on Recurrent Surface Text Patterns. In 39th European Conference on Information Retrieval (ECIR), pages 199–211, Aberdeen, United Kingdom.

Friedberg, H., Litman, D., and Paletz, S. B. (2012). Lexical entrainment and success in student engineering groups. In Spoken Language Technology Workshop (SLT), pages 404–409. IEEE.



References II

Gallois, C., Ogay, T., and Giles, H. (2005). Communication accommodation theory: A look back and a look ahead. In W. Gudykunst (Ed.), Theorizing about intercultural communication, pages 121–148. Thousand Oaks, CA: Sage.

Gratch, J., DeVault, D., and Lucas, G. (2016). The benefits of virtual humans for teaching negotiation. In International Conference on Intelligent Virtual Agents (IVA), pages 283–294. Springer.

Nenkova, A., Gravano, A., and Hirschberg, J. (2008). High frequency word entrainment in spoken dialogue. In Proceedings of the 46th annual meeting of the association for computational linguistics on human language technologies (ACL-HLT): Short papers, pages 169–172. Association for Computational Linguistics.

Parent, G. and Eskenazi, M. (2010). Lexical entrainment of real users in the let’s go spoken dialog system. In INTERSPEECH, pages 3018–3021.

Pickering, M. J. and Garrod, S. (2004). Toward a mechanistic psychology of dialogue. Behavioral and brain sciences, 27(02):169–190.



References III

Reitter, D., Keller, F., and Moore, J. D. (2006). Computational modelling of structural priming in dialogue. In Proceedings of the Human Language Technology Conference of the North American Chapter of the ACL (NAACL-HLT): Short Papers, pages 121–124. Association for Computational Linguistics.

Stoyanchev, S. and Stent, A. (2009). Lexical and syntactic priming and their impact in deployed spoken dialog systems. In Proceedings of the Human Language Technology Conference of the North American Chapter of the ACL (NAACL-HLT): Short Papers, pages 189–192. Association for Computational Linguistics.

Ward, A. and Litman, D. J. (2007). Automatically measuring lexical and acoustic/prosodic convergence in tutorial dialog corpora. In Speech and Language Technology in Education (SLaTE2007), pages 57–60.



Results: H-H VS H-A Corpora

Figure 3: Token ratio for S1 and S2 per dialogue (H-Woz VS H-H): amount of tokens produced by each speaker. Difference is not significant.

Figure 4: Token overlap for S1 and S2 per dialogue (H-Woz VS H-H): shared vocabulary. Difference is not significant.


Convergence and Verbal Alignment

Table 3: Corpus H-A 302 neg1

Loc. | Utterance
     | [. . . ]
S1   | deal
H2   | deal
S3   | thank you
H4   | thank you
S5   | nice doing business with you
H6   | it's a pleasure
S7   | until next time
H8   | have a good day
S9   | goodbye
H10  | bye

Table 4: Corpus H-A 352 neg1

Loc. | Utterance
     | [. . . ]
S1   | deal
H2   | deal
S3   | thank you
H4   | thank you
S5   | it's a pleasure doing business with you
H6   | it's a pleasure doing business with you too
S7   | goodbye
H8   | goodbye


Excerpts of the Negotiation Corpora

Loc. | Utterance
A1   | well, that's an interesting idea. but no, that's not gonna work for me.
B2   | what will work for you?
A3   | what do you think about me getting two chairs and one plate and you getting one chair, one plate, and the clock?
B4   | that's not gonna work for me
A5   | well which of these items would be your first choice?
B6   | well i don't want the clock
A7   | oh really?

Table 5: Excerpt of dialogue extracted from the H-A corpus. Expressions are coloured; established expressions are in italic. (H-A 329 neg2)


The Woz System

Figure 5: The Woz system [DeVault et al., 2015]


Perspectives: NLG and Evaluation

Verbal alignment strategy: enabling verbal alignment in the NLG model of the agent. The generation process would take the dialogue act, the dialogue history and the expression lexicon as input, and produce the system utterance given the discourse context.

Automatic evaluation: studying the contribution of verbal alignment metrics to automatic evaluation procedures, computing evaluation metrics directly from the dialogue transcript.