Meeting understanding based on surface annotations

Andrei Popescu-Belis
ISSCO / TIM / ETI, University of Geneva, Switzerland

Joint work with Susan Armstrong, Alexander Clark, Maria Georgescul, Denis Lalanne, Agnes Lisowska, Sandrine Zufferey, et al.

Institutional support:
- (IM)2 project, a Swiss NCCR (Interactive Multimodal Information Management)
- University of Geneva, ETI (School of translation and interpreting)

17 February 2005

Research on meeting processing (ISC Lyon)

Meeting processing and retrieval in (IM)2: understanding human dialogues in meetings

Dialogue "understanding" by computers has promising applications:
- enriched meeting transcription
- meeting summarization
- intelligent meeting browsing
- digital assistants for meeting rooms
- applications to human-computer dialogue

Desirable: a fully automated minute-writing application.
Reasonable hope: answering "Were there any questions about section 2 of the report?"

Dialogue annotation modules:
- Shallow Dialogue Analysis: segmentation, keywords, dialogue acts, co-reference, discourse markers
- Argumentative Analysis
Processing of the dialogues, then storage of the processed & annotated meetings.

Interfaces to meeting databases:
- transcript-based interface (TQB): queries to the DB, multimedia output
- multimodal interface "Archivus": speech, text, pointer; dialogue models


Plan of the talk

- Introduction
- Shallow Dialogue Annotation (SDA)
  - segmentation into episodes
  - recognition of dialogue acts
  - resolution of references to documents
  - detection of discourse markers
- Use of SDA in a meeting browser
- Discussion
  - machine learning (or not) for SDA
  - the cycle of evaluation-driven language processing

Constraints on our study of dialogue processing

- Theoretical grounding: availability of models of the phenomenon; domains: semantics + discourse studies + pragmatics
- Application requirements: what users want to retrieve (analysis of user queries); relevance to other applications in the field
- Empirical validity: definitions based on examples occurring in a given corpus; human annotators find consistent results
- Availability of data
- Apparent feasibility

Selected phenomena: SDA (Shallow Dialogue Annotation)

SDA overview:
- input data: transcribed speech, timing, multimodal events
- segmentation: utterances, episodes
- dialogue structure: dialogue acts, links between acts
- references to documents, coreference
- discourse markers: detection

Input data: a timed transcript for each speaker (i.e. per channel), e.g. in French: "Est-ce que vous ne poussez pas un peu loin le bouchon ? Disons, vous avez mis en exergue..." ("Aren't you pushing it a bit far? Let's say, you highlighted...")

Name | Annotation | Type of annotation | Scope
EP | episodes (1) | temporal boundaries | cross-channel
TO | topics/keywords | labels on EP (open set) | same as EP
UT | utterances | temporal boundaries | intra-channel
DA | dialogue acts (2) | labels on UT (DA tagset) | same as UT
RE | referring expressions | temporal boundaries | intra-channel
DE | references to documents (3) | pointers from REs to documents | cross-modal
DM | discourse markers (4) | word classification | intra-channel

Processing pipeline: Channel_1 ... Channel_N → low-level linguistic processing → XML annotations → (via XSLT) → database and HTML browser.

Available data

Corpus  | Nb. × time | Media      | Lg. | Annotation
ICSI-MR | 75 × 60'   | A, T       | EN  | utterances, dialogue acts, discourse markers, episodes (30%)
IDIAP   | 60 × 5'    | A, V, T    | EN  | utterances, episodes
ISSCO   | 8 × 30'    | A, V, T, D | EN  | ongoing: all
UniFr   | 22 × 15'   | A, V, T, D | FR  | utterances, references to documents

Methods for topic boundary detection:
- use word distribution to identify cohesive units
- latent semantic analysis (LSA, PLSA)
- integrate multi-word expressions
- use discourse features (with SVM)

Difficulty: no large dataset is available yet with all SDA annotations.

1. Thematic episodes: topic boundary detection  [M. Georgescul]

Goal: segment each meeting into coherent blocks defined by a common topic.

Results on topic boundary detection (Pk score, ~ error rate):

Algorithm | "Artificial" data | "Real" data
Baseline  | 38%               | 47%
LSA       | 35%               | 34%
C99       | 10%               | 43%

Results on artificial data (merged articles) are not correlated with real meeting data.

Next: topic characterization experiments with keyword extraction vs. concept identification (EDR).

2. DA recognition  [Clark & Popescu-Belis]

Dialogue act: the function of an utterance in a dialogue; there are many competing theories about "function". Cues: syntactic cues, speaker change, discourse markers (e.g., well, now), silences.

DA annotation:
- presupposes segmentation of the channels into utterances
- some state-of-the-art statistical recognition methods exist
- depends on the DA tagset
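The Pk score used for the topic boundary results can be implemented in a few lines. Below is a minimal pure-Python sketch, assuming each segmentation is given as one episode label per utterance; the data is an invented toy meeting, not the IM2 corpora:

```python
def pk(ref, hyp, k=None):
    """Pk segmentation error: probability that a probe window of width k
    straddles an episode boundary in one segmentation but not the other.
    Lower is better; a random segmentation scores around 0.5."""
    assert len(ref) == len(hyp)
    if k is None:
        # conventional choice: half of the mean reference segment length
        k = max(1, round(len(ref) / (2 * len(set(ref)))))
    n = len(ref) - k
    disagreements = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k]) for i in range(n)
    )
    return disagreements / n

# toy meeting of 10 utterances with two topic episodes
ref = [0] * 5 + [1] * 5
print(pk(ref, ref))         # identical segmentations -> 0.0
print(pk(ref, [0] * 10))    # "everything is one episode" -> 0.25
```

Unlike strict boundary matching, Pk gives partial credit to near-miss boundaries, which is why it is read as an approximate error rate.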

Choosing the right DA tagset

- DAMSL: independent dimensions: Communicative Status, Information Level, Forward Looking Function, Backward Looking Function
- SWBD-DAMSL: 220 observed DAMSL labels clustered into 42 mutually-exclusive tags; observed distribution: Statement 36%, Acknowledgement/Backchannel 19%, Opinion 13%, Agree/Accept 5%
- ICSI-MRDA: combines (again) the SWBD-DAMSL tags: ca. 7 million possible labels

MALTUS: an IM2 proposal

Multidimensional Abstract Layered Tagset for UtteranceS, designed to reduce the dimensionality of ICSI-MRDA.

Structure of a MALTUS label:
- main function: statement, question, backchannel, floor holder/grabber
- secondary functions: response (positive, negative or undecided), attention-related, command (performative), politeness mark, restated information
- number of possible labels: 770

Conversion of the ICSI-MR tags to MALTUS:
- 113,000 utterances → 50 MALTUS tags (without D)
- more analysis and data are needed to find which tags are mutually exclusive

DA tagging in IM2  [Alex Clark]

Objectives:
- find the dimensions of MALTUS that are most easily predictable from the data
- find dependencies among tags

Features: lexical (words) + contextual (surrounding tags)

Results:
- four-way classifier (S | Q | B | H): 84.9% accuracy vs. 64.1% baseline
- full MALTUS classifier (without "disruptions"): 73.2% accuracy vs. 41.9% baseline (S tag)
- MALTUS with six classifiers trained separately (primary classifier S | H | Q | B, plus 5 secondary classifiers: PO | not PO, AT | not AT, etc.): only 70.5% accuracy

Conclusion: separate classifiers < combined classifier, because of the dependencies between DAs.

3. References to documents  [Lalanne & Popescu-Belis]

Cross-media link between:
- what is said: the referring expressions (REs)
- the documents and document elements to which the REs refer
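To illustrate the lexical side of the four-way S | Q | B | H classification, here is a toy tagger. Naive Bayes stands in for the MaxEnt model actually used, and the training utterances are invented, so this is a sketch of the task rather than the system described:

```python
from collections import Counter, defaultdict
import math

# invented toy training data: (utterance, primary MALTUS-style tag)
# S = statement, Q = question, B = backchannel, H = floor holder
TRAIN = [
    ("we should ship the report on friday", "S"),
    ("the budget covers two more meetings", "S"),
    ("did you read section two", "Q"),
    ("what do you think about the schedule", "Q"),
    ("uh huh", "B"),
    ("yeah right", "B"),
    ("so um well", "H"),
    ("okay so", "H"),
]

class NaiveBayesDA:
    """Unigram Naive Bayes with add-one smoothing over the words of an
    utterance; a minimal stand-in for a lexical DA classifier."""

    def __init__(self, data):
        self.tag_counts = Counter(tag for _, tag in data)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for utt, tag in data:
            for w in utt.split():
                self.word_counts[tag][w] += 1
                self.vocab.add(w)

    def classify(self, utt):
        def logprob(tag):
            lp = math.log(self.tag_counts[tag])
            denom = sum(self.word_counts[tag].values()) + len(self.vocab)
            for w in utt.split():
                lp += math.log((self.word_counts[tag][w] + 1) / denom)
            return lp
        return max(self.tag_counts, key=logprob)

clf = NaiveBayesDA(TRAIN)
print(clf.classify("did you think about the report"))  # -> Q
print(clf.classify("uh huh yeah"))                     # -> B
```

A real system would add the contextual features (surrounding tags) that the slides report as part of the feature set.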

Ref2doc annotation

Ground-truth annotation for training and evaluation:
- DIVA / University of Fribourg: press-review meetings (~15' each); 22 meetings, 30 documents
- dialogue transcription, document structuring (XML)
- RE annotation: 427 REs; ref2doc annotation

Inter-annotator agreement (3 annotators on 1/3 of the data), before vs. after discussion:
- document assignment: 96% → 100% (3 → 0 errors)
- document element assignment: 90% → 97% (9 → 3 errors)

Ref2doc algorithm based on anaphora tracking

Loop through the REs in chronological order, storing <current document> and <current document element>.

Document assignment:
- if the RE includes a newspaper name, it refers to that newspaper; <current document> is set to that newspaper
- otherwise (anaphor), it refers to <current document>

Document element assignment:
- if the RE is anaphoric, it refers to <current document element>
- otherwise, find the best matching document element (the words of the RE + its context matched against the words of the document); <current document element> is set to that element
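The document-assignment half of the algorithm can be sketched directly. The newspaper names and REs below are invented, and the document-element matching step is omitted, so this shows only the anaphora-tracking idea:

```python
NEWSPAPERS = {"Le Monde", "Le Temps"}  # hypothetical document names

def ref2doc(res, documents):
    """Resolve each referring expression (RE) to a document, tracking
    <current document> as in the anaphora-tracking algorithm."""
    current_doc = None
    resolved = []
    for re_text in res:
        named = next(
            (d for d in documents if d.lower() in re_text.lower()), None
        )
        if named:              # the RE includes a document name
            current_doc = named
        # otherwise (anaphor): keep <current document>
        resolved.append((re_text, current_doc))
    return resolved

meeting_res = [
    "the front page of Le Monde",  # names a document
    "this article",                # anaphor -> current document
    "the editorial in Le Temps",   # names another document
    "it",                          # anaphor -> current document
]
for re_text, doc in ref2doc(meeting_res, NEWSPAPERS):
    print(f"{re_text!r} -> {doc}")
```

The real system also weights matches (title vs. content words) and uses the right context of the RE, as described in the optimization results.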

Results and optimization (ref2doc)

Best results (322 REs):
- document assignment: 93% accuracy vs. 50% baseline (most frequent document)
- document element assignment: 73% accuracy vs. 18% baseline (main article)

Optimization of the features and their relevance:
- contextual features: only the right context of the RE must be considered for matching; optimal context size: ~10 words; when removed, accuracy drops to ~40%
- (locally) optimal weights for matching: RE word vs. article title: 15; right-context word vs. title: 10; any word vs. article content word: 1
- anaphora tracking: when removed, accuracy drops to ~65%

4. Discourse markers (DM)  [Zufferey & Popescu-Belis]

Two markers were studied:
- "like": signals approximation
- "well": marks a topic shift or a correction

Importance of DM identification:
- increases the accuracy of POS tagging
- prelude to syntactic analysis
- indicates global discourse structure
- indicates coherence relations (à la RST) between utterances
- DMs serve as features for the automatic detection of dialogue acts

Problem: both lexical items are ambiguous: they can function as a discourse marker or as something else (e.g., a verb or an adverb), so occurrences must be disambiguated, DM vs. non-DM, i.e. only the "pragmatic" uses, (b) vs. (a), must be detected.

Examples:
1a. It allows you to enter things well.
1b. So they'll say well these are the things I want to do.
2a. Did you like the movie?
2b. Most of our meetings are uh meetings currently with like, five, six, seven, or eight people.

Disambiguation of the DM like by humans using prosodic cues

Annotators had to classify each occurrence of like as DM or non-DM:
- 1st experiment: with the transcript only
- 2nd experiment: with the transcript linked to the audio

Inter-annotator agreement: κ = 0.74 (> 0.67), so the task is reliable; prosodic cues are crucial.

Statistical training of DM classifiers

- Sets of positive and negative examples from ICSI-MR: ~4500 for like and ~4100 for well
- Features characterizing DM vs. non-DM uses: "negative" (excluding) collocations; duration of the item; duration of the pause before like; duration of the pause after like
- Decision trees + C4.5 training (Quinlan / WEKA): a binary decision-tree classifier (DM / non-DM); discrimination power measured by 10-fold cross-validation

Results for DM classification

Scores for like, best classifier: r = 0.95 / p = 0.68 / κ = 0.65

Conclusions:
1. Importance of collocation filters
2. A pause before like indicates a DM in 91% of the remaining cases
3. Other factors are relevant too, but quite redundant: prosody
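The two main cues the trained trees rely on (collocation filters first, then a pause before like) can be sketched as a hand-written classifier. The collocation list and the 0.1 s pause threshold are illustrative assumptions, not the learned tree:

```python
def is_dm_like(prev_word, next_word, pause_before):
    """Classify one occurrence of 'like' as discourse marker (DM) or not,
    mirroring the two cues found by the C4.5 experiments:
    1. excluding collocations filter out grammatical uses;
    2. a pause before 'like' signals a DM in most remaining cases."""
    # illustrative "negative" collocations (grammatical uses of 'like')
    if prev_word in {"would", "i", "looks", "sounds"} or next_word == "to":
        return False
    # illustrative pause threshold, in seconds
    return pause_before > 0.1

print(is_dm_like("would", "to", 0.0))   # "would like to" -> False
print(is_dm_like("with", ",", 0.3))     # "with like , five" -> True
```

The slides' conclusion that temporal features are redundant once collocations are available corresponds here to the pause test only being reached after the collocation filter.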

Scores for well: best classifier r = 0.97 / p = 0.91 / κ = 0.81
- without collocation filters: r = 0.35 / p = 0.60 / κ = 0.23
- use of collocations only: r = 0.98 / p = 0.89 / κ = 0.78
- use of "pause after" only: r = 0.96 / p = 0.77 / κ = 0.45

Conclusions:
1. Importance of collocations
2. A pause after well indicates the presence of a DM
3. Other features are relevant too; the best temporal feature is a pause before or after the item, but temporal features are redundant when collocations can be used

Relevance of other features? Prosody is relevant to human annotators, so try to find other relevant prosodic features.

Use of SDA in a meeting browser

TQB: transcript-based query & browsing interface (snapshot / demo):
1. Parameters of the query
2. Results of the query
3. Rich transcript
4. Links to the sound file
5. Documents
6. References to documents

Summary: machine learning techniques and their scores

   | Tagset      | Method               | Baseline               | Accuracy
DA | MALTUS      | MaxEnt               | ~40%                   | 70-73%
EP | Boundaries  | LSA / C99            | 67%                    | 60-(90)%
DE | RE → DE     | Rule-based           | ~20%                   | 73%
DM | DM / non-DM | Decision trees, C4.5 | 36% (like), 66% (well) | 81% (like), 91% (well)

SDA: machine learning or not?

Use machine learning when:
- there is enough annotated data for training
- there are enough low-level relevant features
- the optimal relations between features and annotations are unknown
→ DA, EP, (TO), DM; with the possibility to add some obvious hand-crafted rules

Use hand-crafted rules or classifiers when:
- there is not enough data to learn the relations between features and annotations
→ (UT), (RE), RE → DE; with the possibility to optimize the hand-crafted rules automatically

It is also possible to use a mix of the two. Machine learning appears to be relevant to semantic/pragmatic annotations, with more or less transparent statistical models.

Conclusion: the basis of evaluation-driven language processing

1. Define an observable linguistic phenomenon.
2. Ask human judges to annotate it on data. Is the inter-annotator agreement (IAA) acceptable? If NO, return to step 1; if YES, continue.
3. Prepare ground-truth (GT) annotated data (annotators agree, or instances are removed); separate training and test data.
4. Define features that help to detect the phenomenon.
5. Design a system that detects the phenomenon (statistical, rule-based, hybrid, etc.).
6. Adapt the system to the training data.
7. Evaluate the system on the test data: compare its output (R) to the GT. Is distance(GT, R) close to the IAA?
8. Go on to another phenomenon; integrate several recognizers.

Future work

Integration: a "multi-agent dialogue parser":
- each module generates annotations
- loop through the modules until no annotation can be added

Extensions:
- add new modules and improve existing ones: TO, RE, ...
- use multimodal features: prosody, facial expression, ...

Relevance of SDA annotations to meeting browsing:
- design interfaces to the annotated database
- test them with and without access to the annotations
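The IAA check in step 2 is typically quantified with Cohen's κ, as in the κ = 0.74 result for like. A minimal sketch for two annotators, on invented toy annotations:

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two annotators: observed agreement corrected for
    the agreement expected by chance given each annotator's own label
    distribution."""
    assert len(a) == len(b)
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# toy annotations of 8 occurrences of 'like' as DM / non-DM
ann1 = ["DM", "DM", "DM", "non", "non", "non", "DM", "non"]
ann2 = ["DM", "DM", "non", "non", "non", "non", "DM", "DM"]
print(round(cohen_kappa(ann1, ann2), 2))  # -> 0.5
```

Values above roughly 0.67 are conventionally taken as acceptable reliability, which is the threshold the slides apply.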

References

- Clark A. & Popescu-Belis A. (2004). Multi-level Dialogue Act Tags. In Proc. SIGDIAL'04, Cambridge, MA, p. 163-170.
- Lisowska A., Popescu-Belis A. & Armstrong S. (2004). User Query Analysis for the Specification and Evaluation of a Dialogue Processing and Retrieval System. In Proc. LREC 2004, Lisbon, p. 993-996.
- Popescu-Belis A., Clark A., Georgescul M., Zufferey S. & Lalanne D. (2005). Shallow Dialogue Processing Using Machine Learning Algorithms (or not). In Bengio S. & Bourlard H., eds., Machine Learning for Multimodal Interaction, LNCS 3361, Springer-Verlag, Berlin, p. 277-290.
- Popescu-Belis A. & Lalanne D. (2004). Ref2doc: Reference Resolution over a Restricted Domain. In Proc. ACL 2004 Workshop on Reference Resolution and its Applications, Barcelona.
- Zufferey S. & Popescu-Belis A. (2004). Towards Automatic Disambiguation of Discourse Markers: the Case of 'Like'. In Proc. SIGDIAL'04, Cambridge, MA, p. 63-71.