Meeting understanding based on surface annotations
(IM)2 project – a Swiss NCCR
(IM)2: Interactive Multimodal Information Management

Andrei Popescu-Belis
ISSCO / TIM / ETI, University of Geneva, Switzerland
joint work with Susan Armstrong, Alexander Clark, Maria Georgescul, Denis Lalanne, Agnes Lisowska, Sandrine Zufferey, et al.

Institutional support: University of Geneva, ETI (School of translation and interpreting)

Research on meeting processing
ISC Lyon, 17 February 2005
Meeting processing and retrieval in (IM)2

Understanding human dialogues in meetings: dialogue "understanding" by computers has promising applications:
- enriched meeting transcription
- meeting summarization
- intelligent meeting browsing
- digital assistants for meeting rooms
- applications to human-computer dialogue

Dialogue annotation modules: Shallow Dialogue Analysis (segmentation, keywords; dialogue acts; co-reference; discourse markers) and Argumentative Analysis.
Processing of dialogues → storage of processed and annotated meetings.
Desirable: a fully automated minute-writing application. Reasonable hope: answering "Were there any questions about section 2 of the report?"
Interfaces to meeting databases

Retrieving meeting data: two interfaces:
- Transcript-based interface (TQB): queries to a database; multimedia output
- Multimodal interface "Archivus": speech, text, pointer; dialogue models
Plan of the talk
- Introduction
- Shallow Dialogue Annotation (SDA):
  1. segmentation into episodes
  2. recognition of dialogue acts
  3. resolution of references to documents
  4. detection of discourse markers
- Use of SDA in a meeting browser
- Discussion: machine learning (or not) for SDA; the cycle of evaluation-driven language processing

Constraints on our study of dialogue processing
- Theoretical grounding: availability of models of the phenomenon; domains: semantics + discourse studies + pragmatics.
- Application requirements: what users want to retrieve (analysis of user queries); relevance to other applications in the field.
- Empirical validity: definitions based on examples occurring in a given corpus; human annotators find consistent results.
- Availability of data.
- Apparent feasibility.
Selected phenomena: SDA (Shallow Dialogue Annotation)

SDA overview. Segmentation: utterances, episodes.
Input data: a timed transcript for each speaker (i.e. channel).

Name   Type of annotation      Form of annotation          Scope
EP     episodes (1)            temporal boundaries         cross-channel
TO     topics/keywords         labels on EP (open set)     same as EP
UT     utterances              temporal boundaries         intra-channel
DA     dialogue acts (2)       labels on UT (DA tagset)    same as UT
RE     referring expressions   temporal boundaries         intra-channel
DE     ref. to documents (3)   pointers from RE            cross-modal
DM     discourse markers (4)   word classification         intra-channel

(The numbers (1)-(4) refer to the four SDA tasks detailed below.)
SDA modules:
- Input data: transcribed speech, timing, multimodal events
- Dialogue structure: dialogue acts, links between acts
- Reference: to documents, coreference
- Discourse markers: detection
Processing pipeline: Channel_1 … Channel_N → XML annotations → XSLT → database / HTML browser.
Low-level linguistic processing produces the per-channel XML annotations.
Example utterance (French): "Est-ce que vous ne poussez pas un peu loin le bouchon ? Disons, vous avez mis en exergue..." ("Aren't you pushing things a bit far? Let's say, you have highlighted...")
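The per-channel XML annotation step of the pipeline can be sketched as follows. This is a minimal illustration: the element and attribute names (`channel`, `UT`, `start`, `end`, `da`) are assumptions, not the project's actual schema.

```python
import xml.etree.ElementTree as ET

def channel_to_xml(channel_id, utterances):
    """Wrap one speaker channel's timed utterances in XML.
    `utterances` holds (start, end, text, dialogue_act) tuples; the element
    and attribute names are illustrative, not the project's actual schema."""
    root = ET.Element("channel", id=channel_id)
    for start, end, text, da in utterances:
        ut = ET.SubElement(root, "UT", start=str(start), end=str(end), da=da)
        ut.text = text
    return ET.tostring(root, encoding="unicode")

xml_out = channel_to_xml("Channel_1",
                         [(0.0, 2.1, "Were there any questions?", "Q")])
```

An XSLT stylesheet can then render such files as HTML or load them into the database, as in the diagram.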
Available data

Corpus     Nb. x time   Media        Lg.   Annotation
ICSI-MR    75 x 60'     A, T         EN    utterances, dialogue acts, discourse markers, episodes (30%)
IDIAP      60 x 5'      A, V, T      EN    utterances, episodes
ISSCO      8 x 30'      A, V, T, D   EN    ongoing: all
UniFr      22 x 15'     A, V, T, D   FR    utterances, references to documents

Difficulty: no large dataset is available yet with all the SDA annotations.

1. Thematic episodes: topic boundary detection [M. Georgescul]

Goal: segment each meeting into coherent blocks defined by a common topic.
Methods:
- use word distribution to identify cohesive units: latent semantic analysis (LSA, PLSA)
- integrate multi-word expressions
- use discourse features (with SVM): syntactic cues, speaker change, discourse markers (e.g., well, now), silences

Results on topic boundary detection (Pk score, ~ error rate):

Algorithm   "Real" data   "Artificial" data
Baseline    47%           38%
LSA         35%           34%
C99         43%           10%

Results on artificial data (merged articles) are not correlated with results on real meeting data.
Next: topic characterization: experiments with keyword extraction vs. concept identification (EDR).

2. DA recognition [Clark & Popescu-Belis]

Dialogue act: the function of an utterance in dialogue; there are many competing theories about "function".
DA annotation:
- presupposes the segmentation of channels into utterances
- some state-of-the-art statistical recognition methods
- dependence on the DA tagset
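The Pk score used above can be computed as follows: a minimal sketch of the Beeferman-style metric over boundary strings. The window-size convention varies across implementations; half the mean reference segment length is one common choice.

```python
def pk_score(ref, hyp, k=None):
    """Pk segmentation error: the probability that a probe window of
    width k straddles an episode boundary in one segmentation but not
    in the other.  `ref`/`hyp` are strings such as "0010010", where
    '1' marks a boundary after that utterance."""
    if k is None:
        # a common convention: half the mean reference segment length
        k = max(1, round(len(ref) / (ref.count("1") + 1) / 2))
    n = len(ref) - k
    errors = sum(("1" in ref[i:i + k]) != ("1" in hyp[i:i + k])
                 for i in range(n))
    return errors / n

perfect = pk_score("0010010", "0010010")   # 0.0
missed  = pk_score("0010010", "0000000")   # > 0
```

Lower is better: identical segmentations score 0, so Pk behaves roughly like an error rate.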
Choosing the right DA tagset
- DAMSL: independent dimensions (Communicative Status, Information Level, Forward-Looking Function, Backward-Looking Function).
- SWBD-DAMSL: combines the DAMSL dimensions; ca. 220 observed labels, clustered into 42 mutually-exclusive tags; distribution: Statement 36%, Acknowledgement/Backchannel 19%, Opinion 13%, Agree/Accept 5%.
- ICSI-MRDA: combines (again) SWBD-DAMSL tags; ca. 7 million possible labels.

MALTUS: an IM2 proposal
- MALTUS: Multidimensional Abstract Layered Tagset for UtteranceS; goal: reduce the dimensionality of ICSI-MRDA.
- Structure of a MALTUS label: a main function (statement, question, backchannel, floor holder/grabber) plus secondary functions (response: positive, negative or undecided; attention-related; command (performative); politeness mark; restated information).
- Number of possible labels: 770.
- Conversion of the ICSI-MR tags to MALTUS: 113,000 utterances, 50 MALTUS tags (without D); more analysis and data are needed to find which tags are mutually exclusive.

DA tagging in IM2 [Alex Clark]
- Objectives: find the dimensions of MALTUS that are most easily predictable from data; find dependencies among tags.
- Features: lexical (words) + contextual (surrounding tags).
- Results:
  - four-way classifier (S | Q | B | H): 84.9% accuracy vs. 64.1% baseline
  - full MALTUS classifier (without "disruptions"): 73.2% accuracy vs. 41.9% baseline (S tag)
  - MALTUS with six classifiers trained separately (primary classifier S | H | Q | B, plus 5 secondary classifiers: PO | not PO, AT | not AT, etc.): only 70.5% accuracy
- Conclusion: separate classifiers < combined classifier, i.e. there are dependencies between DAs.

3. References to documents [Lalanne & Popescu-Belis]

Objective: a cross-media link between what is said (referring expressions) and the documents and elements to which the REs refer.
Ground-truth annotation for training and evaluation:
- DIVA / University of Fribourg press-review meetings (~15' each): 22 meetings, 30 documents
- dialogue transcription, document structuring (XML), RE annotation (427 REs), ref2doc annotation
Inter-annotator agreement (3 annotators on 1/3 of the data, before → after discussion):
- document assignment: 96% → 100% (3 → 0 errors)
- document element assignment: 90% → 97% (9 → 3 errors)

Ref2doc algorithm, based on anaphora tracking

Loop through the REs in chronological order:
- Document assignment: if the RE includes a newspaper name, it refers to that newspaper and <current document> is set to that newspaper; otherwise (anaphor), it refers to <current document>.
- Document element assignment: if the RE is anaphoric, it refers to <current document element>; otherwise, it refers to the best-matching document element (the words of the RE and of its context are matched against the words of the document), and <current document element> is set to that element.
- Store <current document> and <current document element>.
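The anaphora-tracking loop above can be sketched in Python. This is a simplified rendering of the slide's pseudocode: matching is plain word overlap here, and the slide's weighted matching (title words, right context) is omitted; all names and the toy data are illustrative.

```python
def ref2doc(res, newspapers, elements):
    """Resolve referring expressions (REs) to documents by anaphora tracking.
    res: (re_text, is_anaphoric) pairs in chronological order;
    newspapers: known document names; elements: {doc: {elem_id: word_set}}.
    Matching is plain word overlap; the slide's weights are omitted."""
    current_doc, current_elem, links = None, None, []
    for re_text, is_anaphoric in res:
        lowered = re_text.lower()
        # document assignment: an explicit newspaper name resets <current document>
        for name in newspapers:
            if name.lower() in lowered:
                current_doc = name
                break            # otherwise (anaphor): keep <current document>
        # document element assignment by best word overlap
        if not is_anaphoric and current_doc:
            words = set(lowered.split())
            current_elem = max(elements[current_doc],
                               key=lambda e: len(words & elements[current_doc][e]))
        # an anaphoric RE keeps <current document element>
        links.append((re_text, current_doc, current_elem))
    return links

docs = {"Le Temps": {"title": {"economy", "vote"}, "article2": {"sports"}}}
links = ref2doc([("the Le Temps article on the vote", False), ("it", True)],
                ["Le Temps"], docs)
```

The second RE ("it") is resolved by the tracked state alone, which is what makes the algorithm robust to anaphors.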
Ref2doc results and optimization

Best results (322 REs):
- document: 93% vs. 50% baseline (most frequent document)
- document element: 73% vs. 18% baseline (main article)

Optimization of the features and their relevance:
- contextual features: only the right context of the RE must be considered for matching; optimal context size: ~10 words; when removed, only ~40% accuracy
- (local) optimal weights for matching: word of the RE in the title of an article: 15; right-context word in a title: 10; content word of an article: 1
- anaphora tracking: when removed, only ~65% accuracy

4. Discourse markers (DM) [Zufferey & Popescu-Belis]

Two markers were studied:
- "like": signals approximation
- "well": marks a topic shift or a correction

Problem: both lexical items are ambiguous: they can function as a discourse marker or as something else (e.g., a verb or an adverb), so occurrences need to be disambiguated: DM vs. non-DM.

Examples (how to detect only the "pragmatic" uses, (b) vs. (a)?):
1a. It allows you to enter things well.
1b. So they'll say well these are the things I want to do.
2a. Did you like the movie?
2b. Most of our meetings are uh meetings currently with like, five, six, seven, or eight people.

Importance of DM identification:
- increase the accuracy of POS tagging
- prelude to syntactic analysis
- indicate global discourse structure
- indicate coherence relations (à la RST) between utterances
- serve as features for the automatic detection of dialogue acts

Disambiguation of the DM "like" by humans using prosodic cues

Annotators had to classify each occurrence of like as DM or non-DM:
- 1st experiment: with the transcript only
- 2nd experiment: transcript linked to the audio
Inter-annotator agreement: κ = 0.74 (> 0.67), so the task is reliable; prosodic cues are crucial: prosody is relevant to human annotators.

Statistical training of DM classifiers

Set of positive and negative examples from ICSI-MR: ~4500 for like and ~4100 for well.
Features characterizing DM vs. non-DM uses:
- "negative" or excluding collocations
- duration of the item
- duration of the pause before like
- duration of the pause after like
Training: decision trees with C4.5 (Quinlan / WEKA); binary classifier (DM / non-DM); discrimination power measured by 10-fold cross-validation.

Results of the training

Scores for like (best classifier): r = 0.95 / p = 0.68 / κ = 0.65; without the collocation filters: r = 0.35 / p = 0.6 / κ = 0.23.
Conclusions:
1. Importance of the collocation filters.
2. A pause before like indicates a DM in 91% of the remaining cases.
3. Other factors (e.g., prosody) are relevant too, but quite redundant.

Scores for well (best classifier): r = 0.97 / p = 0.91 / κ = 0.81; collocations only: r = 0.98 / p = 0.89 / κ = 0.78; "pause after" only: r = 0.96 / p = 0.77 / κ = 0.45.
Conclusions:
1. Importance of collocations.
2. A pause after well indicates the presence of a DM.
3. Other features are relevant too; the best temporal feature is a pause before or after like, but temporal features are redundant when collocations can be used.

Next: try to find other relevant prosodic features.
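The two findings for "like" (collocation filters first, then a pause-before cue) can be sketched as a two-step classifier. The collocation list and the 0.1 s threshold below are assumptions for illustration, not the trained C4.5 tree.

```python
# Illustrative "negative" collocations: contexts where 'like' is rarely a DM.
NON_DM_COLLOCATIONS = ("would like", "like to", "looks like", "something like")

def classify_like(left_context, right_context, pause_before):
    """Two-step DM / non-DM decision for one occurrence of 'like':
    collocation filters first, then the pause-before cue.
    The collocation list and the 0.1 s threshold are assumptions,
    not the trained C4.5 tree."""
    window = f"{left_context} like {right_context}".lower()
    if any(c in window for c in NON_DM_COLLOCATIONS):
        return "non-DM"          # excluded by a negative collocation
    if pause_before > 0.1:       # pause duration in seconds
        return "DM"              # a pause before 'like' signals a DM
    return "non-DM"

label = classify_like("meetings currently with", ", five, six people", 0.25)
```

On example 2b above, the collocation filters do not fire and the pause cue labels the occurrence as a DM.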
Use of SDA in a meeting browser

TQB: transcript-based query and browsing interface (snapshot / demo):
1. parameters of the query
2. results of the query
3. rich transcript
4. links to the sound file
5. documents
6. references to documents

SDA: machine learning or not?

Summary: machine learning techniques and their scores

     Tag set      Method                 Baseline                  Accuracy
DA   MALTUS       MaxEnt                 ~40%                      70-73%
EP   boundaries   LSA / C99              67%                       60-(90)%
DE   RE → DE      rule-based             ~20%                      73%
DM   DM/non-DM    decision trees, C4.5   36% (like), 66% (well)    81% (like), 91% (well)

Use machine learning when:
- there is enough annotated data for training
- there are enough low-level relevant features
- the optimal relations between features and annotations are unknown → DA, EP, (TO), DM
- some obvious hand-crafted rules can still be added

Use hand-crafted rules or classifiers when:
- there is not enough data to learn the relations between features and annotations → (UT), (RE), RE → DE
- the hand-crafted rules can be optimized automatically

The two approaches can also be mixed.
Machine learning appears to be relevant to semantic/pragmatic annotations, with more or less transparent statistical models.
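TQB's "queries to DB" step can be illustrated with a toy relational sketch. The schema and column names below are invented for illustration; TQB's actual database is not shown in the slides.

```python
import sqlite3

# A toy stand-in for the annotated-meeting database behind TQB;
# the schema and column names are invented for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE utterance (meeting TEXT, speaker TEXT, start REAL,"
           " text TEXT, da TEXT, episode TEXT)")
db.executemany("INSERT INTO utterance VALUES (?, ?, ?, ?, ?, ?)", [
    ("m1", "A", 12.0, "Were there any questions about section 2?", "Q", "report"),
    ("m1", "B", 15.5, "Yes, one about the budget.", "S", "report"),
])
# e.g. "show the questions asked during the 'report' episode"
rows = db.execute("SELECT speaker, text FROM utterance"
                  " WHERE da = 'Q' AND episode = 'report'").fetchall()
```

Because the SDA layers (DA, EP, DE, …) are stored as columns or linked tables, a browser can combine them freely in one query.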
Conclusion: the basis of evaluation-driven language processing

1. Define an observable linguistic phenomenon.
2. Ask human judges to annotate it on data. Is the inter-annotator agreement (IAA) acceptable? (YES / NO)
3. Prepare ground-truth (GT) annotated data (annotators agree, or remove instances); separate training and test data.
4. Define features that help to detect the phenomenon.
5. Design a system that detects the phenomenon (statistical, rule-based, hybrid, etc.).
6. Adapt the system to the training data.
7. Evaluate the system on the test data: compare its output (R) to GT. Is distance(GT, R) close to IAA?
8. Go on to another phenomenon, or integrate several recognizers.

Future work

Integration: a "multi-agent dialogue parser":
- each module generates annotations
- loop through the modules until no annotation can be added
Extensions:
- add new modules and improve existing ones: TO, RE, …
- use multimodal features: prosody, facial expression, …
Relevance of the SDA annotations to meeting browsing:
- design interfaces to the annotated database
- test them with and without access to the annotations
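The "multi-agent dialogue parser" loop (run the annotation modules repeatedly until no annotation can be added) can be sketched as a fixed-point iteration. The module interface and the two toy modules are assumptions for illustration.

```python
def multi_agent_parse(data, modules):
    """Run annotation modules repeatedly until none adds anything.
    `data` is a dict of annotation layers; each module returns the number
    of annotations it added (this module interface is an assumption)."""
    while True:
        added = sum(module(data) for module in modules)
        if added == 0:          # fixed point: no annotation can be added
            return data

# Two toy modules: utterance segmentation, then DA tagging on top of it.
def segmenter(data):
    if "UT" not in data:
        data["UT"] = ["hello", "any questions?"]
        return len(data["UT"])
    return 0

def da_tagger(data):            # depends on the segmenter's output
    if "UT" in data and "DA" not in data:
        data["DA"] = ["Q" if u.endswith("?") else "S" for u in data["UT"]]
        return len(data["DA"])
    return 0

result = multi_agent_parse({}, [da_tagger, segmenter])
```

Because the loop runs to a fixed point, module order does not matter: the DA tagger simply waits until the segmenter has produced the UT layer.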
References
- Clark A. & Popescu-Belis A. (2004) - Multi-level Dialogue Act Tags. In Proc. SIGDIAL'04, Cambridge, MA, p. 163-170.
- Lisowska A., Popescu-Belis A. & Armstrong S. (2004) - User Query Analysis for the Specification and Evaluation of a Dialogue Processing and Retrieval System. In Proc. LREC 2004, Lisbon, p. 993-996.
- Popescu-Belis A., Clark A., Georgescul M., Zufferey S. & Lalanne D. (2005) - Shallow Dialogue Processing Using Machine Learning Algorithms (or not). In Bengio S. & Bourlard H., eds., Machine Learning for Multimodal Interaction, LNCS 3361, Springer-Verlag, Berlin, p. 277-290.
- Popescu-Belis A. & Lalanne D. (2004) - Ref2doc: Reference Resolution over a Restricted Domain. In Proc. ACL 2004 Workshop on Reference Resolution and its Applications, Barcelona.
- Zufferey S. & Popescu-Belis A. (2004) - Towards Automatic Disambiguation of Discourse Markers: the Case of 'Like'. In Proc. SIGDIAL'04, Cambridge, MA, p. 63-71.