
(Prolegomena to a theory of) Completions, Continuations, and Coordination in Dialogue

Massimo Poesio, University of Essex
Hannes Rieser, University of Bielefeld

November 14th, 2005

1

Introduction

Utterances such as 1.2, 1.3, and 2.2 in the following fragment of a transcript from the Bielefeld Toy Plane Corpus of task-oriented dialogues (Skuplik, 1999) are examples of COMPLETIONS (1.2, 2.2) and CONTINUATIONS (1.3).

(1)
1.1  Inst:  So, jetzt nimmst Du [pause]  /  Well, now you take
1.2  Cnst:  eine Schraube  /  a screw
1.3  Inst:  eine orangene mit einem Schlitz.  /  an orange one with a slit
1.4  Cnst:  Ja  /  Yes
2.1  Inst:  Und steckst sie dadurch, also  /  And you put it through there, let's see
2.2  Cnst:  von oben  /  from the top
2.3  Inst:  von oben, daß also die drei festgeschraubt werden dann  /  from the top, so that the three bars get fixed then
2.4  Cnst:  Ja  /  Yes

These two forms of SENTENCE CONTINUATIONS (Clark, 1996) are among the evidence that led Clark (1996) and other theoreticians, such as Garrod and Anderson (1987) or, more recently, Pickering and Garrod (2004), to argue that dialogue requires coordination even at the utterance level. More generally, we feel that the study of sentence continuations can shed light on a number of central issues for models of the semantics and pragmatics of dialogue, both at the `macro' level of dialogue management and at the `micro' level of the semantic interpretation of utterances. At the macro level, this type of data may be relevant for comparing competing claims about coordination, i.e., whether it is best explained with an intentional model or with a model like Pickering and Garrod's, based on simpler alignment mechanisms. At the micro level, sentence continuations are clear evidence that intention recognition in dialogue proceeds incrementally, and they may provide insights into semantic composition.


In this paper, we propose a treatment of sentence continuations in PTT (Poesio and Traum, 1997, 1998; Poesio and Muskens, 1997; Matheson, Poesio and Traum, 2000), a theory of interpretation in dialogue incorporating ideas from Lexicalized Tree Adjoining Grammar (LTAG; Abeillé and Rambow, 2000; Joshi, 2004) and (Compositional) DRT (Muskens, 1996) that, we believe, provides some of the necessary ingredients for an account of sentence continuations, such as an account of interpretation with fragments and an account of grounding moves. Our initial motivation for this work was our belief that ideas from PTT, together with Tuomela's theory of intentions (Tuomela, 2000), provide much of the machinery needed for at least a preliminary formal account of sentence continuations. In the time it took us to write this paper, at least one additional account has appeared (Purver et al., 2005), but we believe that a number of central questions are still outstanding. One issue explored in some detail here is the debate between `intention-based' and `alignment-based' models of dialogue: in addition to a more traditional intentional account, we also outline an alignment-based treatment of the phenomenon. The structure of the paper is as follows. In Section 2 we review the data on completions and continuations in the Bielefeld Toy Plane Corpus. In Section 3 we introduce PTT. In Section 4 we introduce a treatment of intentions and shared plans incorporating a number of recent ideas in the area. In Section 5, we use the notions introduced in the previous two sections to provide a preliminary account of sentence continuations. In Section 6, we provide a preliminary treatment of continuations in terms of the Pickering and Garrod model. In Section 7 we compare our analysis with related work, especially that of Purver et al.

2

Completions and Continuations in the Bielefeld Toy Plane Corpus

2.1

The Bielefeld Toy Plane Corpus

The Bielefeld Toy Plane Corpus (BTPC) is a collection of 22 filmed, audio-recorded, and transcribed construction dialogues between two agents, the Instructor and the Constructor, whose task is to interactively construct a “Baufix” toy airplane (see Fig. 2.1). The Instructor (Inst) explains to the Constructor (Cnst) how to assemble the airplane. Instructor and Constructor sit at separate tables, have the same collection of “Baufix” components, and can communicate freely. The dialogues were recorded under different conditions so as to set up different contexts for the production and understanding of referring expressions. One dimension of variation was the sight condition: total screen, face to face, and half-screen allowing eye contact. Different conditions concerning the instructions to be followed by the Instructor were also tested. (In some cases the Instructor had to direct the Constructor on the basis of a building plan, in others using an already completed model.)

Figure 2.1: Instructor's and Constructor's respective situations at the beginning of 1.1

The corpus consists of 3675 contributions by the agents, counting everything except nonverbal events like groans or laughter.

2.2

Completions and Continuations in the BTPC

[TO BE REVISED] Skuplik (1999) classified 126 cooperative contributions from the BTPC along the following dimensions: (1.) producer, (2.) structural units, (3.) grammatical function, (4.) syntactic category, (5.) resulting syntactic construction, (6.) gapping construction, (7.) acceptance or denial with respect to the part added by the other agent, (8.) wording of acceptance or denial, (9.) indications for change of speaker. Skuplik used the following heuristic definitions to identify completions and continuations. She treated a contribution as a SENTENCE COOPERATION if at least two dialogue participants contributed to its production. In total, 160 contributions (4.34%) were classified as cooperatively produced turns, most of them other-initiated (95%). Following Wilkes-Gibbs (1995) and Clark (1996, pp. 230-235), she further distinguished two types of sentence extension: a COMPLETION arises if a sub-sentential structure is filled up by obligatory constituents; a CONTINUATION adds material to an already existing sentence. For our purposes, parameters 1, 2, 4, 5, 7, and 9 are of primary interest.

(1.) Producer. There were 54 completions (43%) and 72 continuations (57%). In 79% of the cases Cnst produced the completing or continuing part; in 21% Inst provided the expansion.

(2.) Structural Units. 61% of sentence cooperations are complete phrases (German “Satzglieder”).

(4.) Syntactic Category. Prepositional phrases (37%) are preferred, followed by noun phrases (24%), adverbial phrases (7%), nouns (6%), and sentence parts with finite verbs (5%).

(5.) Resulting Syntactic Construction. The most frequent case (27%) is a sentence resulting from cooperation that shows no extraposition to the right. Furthermore, we find facultative extraposition (25%), first-order extraposition of obligatory constituents (19%), gapping (14%), recursive (second-order) extraposition of an obligatory constituent (4%), and subordinate clauses (3%).

(7.) Acceptance or Denial of Added Part by Other Agent. 84% of the completing or continuing parts were accepted by the previous speaker. 41% of the completing or continuing parts are neither explicitly accepted nor explicitly denied. Explicit acceptance is indicated by, e.g., ja/yes (28%) and other affirmative particles. In addition, acceptance can be indicated by various forms of resumption or by paraphrase.

(9.) Structural Indications for Change of Speaker. Only in 31% of the sentence cooperations is the change of speaker indicated by prosodic or other means, such as various forms of hesitation.

2.3

The example dialogue in the light of Skuplik’s study

[TO BE REVISED] We now briefly describe the status of our example with respect to the statistical evidence presented (percentages given in brackets indicate the overall values for the 126 examples of sentence cooperations). In (1a), i.e., turns 1.1-1.4, we have a completion by Cnst (79%), a screw, an obligatory NP (30%) yielding a single complete sentence unit (46%), which makes up a sentence (50%) if merged with Inst's production, with an indication of a change of speaker by prosodic means (lengthening of German du and level tone, 31%). Here the evidence is not conclusive that the completing part is not accepted by Inst (4%), who extends it with an orange one with a slit (9%); Inst's contribution is in turn accepted by Cnst. (1a) thus contains a completion as well as a continuation. (1b), i.e., turns 2.1-2.4, behaves in a similar way. The German particle “also” (let's see) may indicate that the current speaker will accept a change of speaker role; “also” frequently marks a planning pause as well. However, this time we have a continuation by Cnst, forming a single complete sentence unit of category AdvP, finally adding up to a sentence without extraposition. Inst acknowledges by resuming the phrase and extending it with a subordinate clause. In (1a) Inst's extension acts as a repair; in (1b) it provides the description of a causal consequence in the domain.
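The shares reported for parameter (1.) in Section 2.2 follow directly from the raw counts. As a small arithmetic check, here is a Python sketch that recomputes them; the constant names are ours, and the only data used are the counts given in the text.

```python
# Counts reported in Section 2.2 (Skuplik, 1999); the names are illustrative.
COMPLETIONS = 54
CONTINUATIONS = 72
CORPUS_CONTRIBUTIONS = 3675   # all agent contributions in the BTPC
COOPERATIVE_TURNS = 160       # contributions classified as cooperatively produced


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


sample_size = COMPLETIONS + CONTINUATIONS  # the 126 classified sentence cooperations
completion_share = round(percent(COMPLETIONS, sample_size))
continuation_share = round(percent(CONTINUATIONS, sample_size))
cooperative_share = percent(COOPERATIVE_TURNS, CORPUS_CONTRIBUTIONS)

print(sample_size, completion_share, continuation_share)  # 126 43 57
print(f"{cooperative_share:.2f}%")                          # 4.35%
```

Rounded to two decimals, 160/3675 gives 4.35%; the 4.34% in the text corresponds to truncating rather than rounding.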


3

Incremental Meaning Composition in Dialogue: The PTT Approach

An essential prerequisite of any account of sentence cooperations is a theory of semantic interpretation in dialogue explaining how the meanings of fragmentary utterances performed by different speakers can be incrementally combined to derive more complex interpretations. Our proposal is based on PTT (Poesio, 1994; Poesio and Traum, 1997; Poesio and Muskens, 1997; Poesio and Traum, 1998; Matheson, Poesio, and Traum, 2000), a theory of dialogue semantics and dialogue interpretation that originated from work on the TRAINS93 system (Allen et al., 1995). PTT was developed to explain how utterances are incrementally interpreted in dialogue, crucially considering both their semantic impact (e.g., how the occurrence of the pronoun “Sie” in utterance 2.1 of our example dialogue is interpreted) and their impact on other aspects of dialogue interaction traditionally considered outside the scope of semantic theory (e.g., the role of the two “Ja”s in 1.4 and 2.4). The first key characteristic of the theory is the assumption, derived from ideas developed in Situation Semantics (Barwise and Perry, 1983; Cooper YYYY; Ginzburg XXXX), that the common ground does not simply record the propositions asserted or the questions raised, but also, at least for a certain period, the fact that certain utterances were performed in a certain order and that they generated certain dialogue actions. Furthermore, the theory assumes that the occurrence of these so-called micro-conversational events (sub-sentential utterances) also leads to immediate updates of the common ground, which in turn trigger semantic and pragmatic interpretation processes. A second crucial feature of the theory is that it aims at providing an explicit account of the process by which the common ground is established, or grounding (Clark and Schaefer, 1990; Traum, 1994).
It is assumed in PTT that new utterances result in the introduction of new Discourse Units (DUs), which only become part of the common ground (G) as a result of explicit or implicit Acknowledgments, and may be cooperatively repaired or revised (as in 1.3 and 2.3). The third relevant feature of the theory, inherited from Traum (1994), is the hypothesis that conversational participants perform different types of dialogue acts, not all of which have to do with performing the task: some, including Acknowledgments, DU initiations, and Repairs, have to do with grounding; others have to do with turn-taking. All of these dialogue acts are recorded in the common ground when their occurrence is recognized; it is this recognition that drives a number of dialogue interpretation processes. Finally, and crucially for our purposes, PTT is based on DRT, specifically on Muskens' Compositional DRT (Muskens, 1996), to which it adds axioms specifying the anaphoric behavior of dialogue acts and a simple formalization of events based on Muskens (1995).

3.1

Conversational Events and Discourse Situations

The shared `conversational score' in a conversation does not consist only of information about the propositional content of utterances. Not all utterances are assertions; and the participants in a conversation also share information about whose turn it is to speak, how what is being said fits in with the structure of the rest of the conversation, and whether what has been said needs acknowledging (Clark, 1996). As a result, an ordinary conversation does not consist only of utterances performed to assert or query a proposition, but also of utterances whose role is to acquire, keep, or release a turn, to signal how the current utterance relates to what has been said before, or to acknowledge what has just been uttered (Clark, XXXX; Poesio and Traum, 1997; Ginzburg, 1997). The linguistic tools used for these purposes include CUE PHRASES such as so or (one sense of) okay; KEEP-TURN SIGNALS such as filled pauses (umm) or wait; and grounding signals such as okay (again), right, or uh-huh. Bunt (1995) proposed for these utterances the term DIALOGUE CONTROL ACTS. The context update potential of non-assertional speech acts and of dialogue control acts can be formalized rather naturally in terms of a speech act-based theory (Bunt, 1995; Traum and Hinkelman, 1992). Poesio and Traum (1997) addressed the problem of specifying the meaning of the expressions used to perform dialogue control acts by proposing that the conversational score consists of a record of the speech acts performed during the conversation, i.e., something similar to what in Situation Semantics (Barwise and Perry, 1983; Cooper, 1992; Ginzburg and Sag, 2000) is called the DISCOURSE SITUATION. However, Poesio and Traum also argued that this view of the conversational score as a record of the discourse situation could be formalized using the tools introduced in DRT (Kamp and Reyle, 1993), because speech acts, or CONVERSATIONAL EVENTS, are in many ways just like any other events, and because conversational events and their propositional contents can serve as the antecedents of anaphoric expressions, just like ordinary events. So, whereas the ordinary DRT construction algorithm would assign to the text in (3.1.1) an interpretation along the lines of (3.1.2) (using the syntax of Muskens (1996), with his operator is for equality in DRSs), i.e., a single DRS containing the merged propositional content of both assertions, Poesio and Traum hypothesized that upon hearing these utterances the common ground in a conversation would be more like (3.1.3).

(3.1.1)
a. A: There is an engine at Avon.
b. B: It is hooked to a boxcar.

(3.1.2) [x,w,y,z,s,s'| engine(x), Avon(w), s: at(x,w), boxcar(y), s': hooked-to(z,y), z is x]

(3.1.3) [ce1,ce2| ce1: assert(A,B,[x,w,s| engine(x), Avon(w), s: at(x,w)]),
                  ce2: assert(B,A,[y,z,s'| boxcar(y), s': hooked-to(z,y), z is x])]

(3.1.3) records the occurrence of two conversational events, ce1 and ce2, both of type assert (Poesio and Traum, 1998; Matheson, Poesio, and Traum, 2000), whose propositional contents are separate DRSs specifying the interpretation of the two utterances in (3.1.1). The discourse entities ce1 and ce2 can serve as antecedents both of implicit anaphoric references (e.g., in the case of `backward' acts like answers to questions) and of explicit ones. As in Kamp and Reyle (1993) and Muskens (1995), a Davidsonian treatment of events is assumed, in which each event- or state-describing predicate p, such as hooked-to or assert, has an additional argument for the event (or state). (We follow Kamp and Reyle's (1993) notation and write e: p(x,y) rather than p(x,y,e) for these predicates.) One immediate advantage of this view is that it can be used to explain the updates to the common ground resulting from conversational events other than assertions, as well as the all-too-common situation in which an utterance performs more than one type of conversational event (Traum and Hinkelman, 1992). In this paper we will, for simplicity, assume that all contents of conversational events are propositional, as done, e.g., in SDRT (Asher and Lascarides, 2003), thus ignoring the fact that interrogatives and imperatives have non-propositional contents (Poesio and Traum, 1997; Ginzburg and Sag, 2000). Even so, clearly not all such contents should be viewed as adding to the same proposition.
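To make the contrast between (3.1.2) and (3.1.3) concrete, here is a minimal Python sketch under a toy encoding of our own: the classes Cond, DRS, and ConvEvent are illustrative and are not part of PTT or of any existing library. It shows that once each assertion has its own content box, the referent x used in the content of ce2 is no longer declared in that box.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cond:
    """Atomic DRS condition, e.g. engine(x), or s: at(x,w) with state argument s."""
    pred: str
    args: tuple[str, ...]
    event: str | None = None  # Davidsonian event/state argument, if any

    def referents(self) -> set[str]:
        refs = set(self.args)
        if self.event is not None:
            refs.add(self.event)
        return refs


@dataclass
class DRS:
    """A box [universe | conditions]."""
    universe: list[str]
    conditions: list[Cond] = field(default_factory=list)

    def free_referents(self) -> set[str]:
        """Referents used in the conditions but not introduced in this box."""
        used = set().union(*(c.referents() for c in self.conditions))
        return used - set(self.universe)


@dataclass
class ConvEvent:
    """A conversational event, e.g. ce1: assert(A,B,K)."""
    label: str
    act: str
    speaker: str
    addressee: str
    content: DRS


# (3.1.2): ordinary DRT merges both assertions into a single box.
merged = DRS(
    ["x", "w", "y", "z", "s", "s'"],
    [Cond("engine", ("x",)), Cond("Avon", ("w",)), Cond("at", ("x", "w"), "s"),
     Cond("boxcar", ("y",)), Cond("hooked-to", ("z", "y"), "s'"), Cond("is", ("z", "x"))],
)

# (3.1.3): the discourse situation records two assert events with separate contents.
ce1 = ConvEvent("ce1", "assert", "A", "B", DRS(
    ["x", "w", "s"],
    [Cond("engine", ("x",)), Cond("Avon", ("w",)), Cond("at", ("x", "w"), "s")]))
ce2 = ConvEvent("ce2", "assert", "B", "A", DRS(
    ["y", "z", "s'"],
    [Cond("boxcar", ("y",)), Cond("hooked-to", ("z", "y"), "s'"), Cond("is", ("z", "x"))]))
situation = [ce1, ce2]

print(merged.free_referents())       # set()
print(ce2.content.free_referents())  # {'x'}
```

The free referent x in the content of ce2 is exactly what an accessibility mechanism for contents of distinct conversational events (whether based on rhetorical relations, dom, or prev) has to bind.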
[USE EXAMPLE DIALOGUE HERE] In (3.1.3'), for example, neither the content of the open-option conversational event generated by the first utterance in (3.1.1') nor the content of the info-request resulting from (3.1.1'b) expresses a statement about the current state of the world, and therefore neither should be evaluated at that index, as they would be using standard DRT semantics if the dialogue were assigned the interpretation in (3.1.2'). Notice that (3.1.1'b) may be viewed as serving at least two purposes here: accepting the option proposed in ce1, and performing an info-request. Notice also that backward-looking acts such as Accept are all implicitly anaphoric to a previous conversational event (ce1 in this case), hence the assumption that conversational events introduce discourse markers just like ordinary events do.

(3.1.1')
a. A: We should send an engine to Avon.
b. B: Shall we use engine E3?

(3.1.2') [x,w,e,y,e'| engine(x), Avon(w), e: send({A,B},x,w), engine(y), E3(y), e': use({A,B},y)]

(3.1.3') [ce1,ce2,ce3| ce1: open-option(A,B,[x,w,e| engine(x), Avon(w), e: send({A,B},x,w)]),
                       ce2: accept(B,ce1),
                       ce3: info-request(B,A,[y,e'| engine(y), E3(y), e': use({A,B},y)])]

Assert, Open-option, and Info-request in the examples above are all examples of CORE SPEECH ACTS: conversational events that express the primary, domain-oriented intention the participant intends to convey. The repertoire of core speech acts assumed in PTT has been changing over the years; the later versions of the theory have been based on the DAMSL repertoire of dialogue acts (Allen and Core, 1997), and the core speech acts for which formalizations are provided include, in addition to Assert and Open-option, the forward-looking acts Statement, Influencing-addressee-future-action, Directive, Committing-speaker-future-action, Commit, and Offer, and the backward-looking acts Agreement, Accept, Answer, and Reject (Matheson, Poesio and Traum, 2000). In addition to core speech acts, utterances may also be used to perform dialogue control acts and grounding acts; we will see examples of these below. 2

In subsequent work (e.g., Poesio and Muskens, 1997), a revised view of the interpretation of speech acts was introduced, in which the contents of conversational events are associated with discourse referents whose values are DRSs, as done, e.g., in SDRT (Asher, 1993; Asher and Lascarides, 2003) and by Geurts (1994). According to this new view, the discourse situation resulting from the two utterances in (3.1.1) being interpreted as performing assertions (and nothing else) would be as in (3.1.4), where the propositional contents of ce1 and ce2 also become available for subsequent anaphoric reference.

(3.1.4) [ce1,ce2,K1,K2| K1=[x,w,s| engine(x), Avon(w), s: at(x,w)],
                       K2=[y,u,s'| boxcar(y), s': hooked-to(u,y), u=x],
                       ce1: assert(A,B,K1),
                       ce2: assert(B,A,K2)]

This type of representation requires complicating the semantics in order to ensure well-foundedness, but, at least in principle, there are a number of ways of doing this (Asher, 1993; Geurts, 1994; Poesio and Muskens, 1997); and this representation has the advantage of providing antecedents for references to the contents of conversational events: e.g., B could follow ce1 with a denial like That's not true, which in PTT would be interpreted as a reference to proposition K1. We will assume this type of interpretation here. 3

2

The version of PTT discussed in (Poesio and Traum, 1997) also assumes a class of speech acts called argumentation acts (Traum, 1994) that capture the information expressed in SDRT by rhetorical relations. See below.

3

It could be argued that the contents of conversational events are not introduced immediately, but only become available when referenced, by processes of `common ground augmentation' similar to those proposed by Kamp and Reyle (1993) for plurals, and already argued for by Haviland and Clark (1978) for bridging references. We adopt the simpler view here.

Finally, it is assumed in PTT that dialogue acts are generated (Goldman, ZZZ; Pollack, 1986) by locutionary acts (REFS: AUSTIN, SEARLE, ..), which we view as events of type utter. These events are assumed to become part of the discourse situation as well, at least for a time; 4 indeed, they play an important role in our account of incrementality, as discussed below. If we take this additional assumption into account, the information in the discourse situation after the second assertion in (3.1.1) would be as follows:

(3.1.5) [u1,u2,ce1,ce2,K1,K2| K1=[x,w,s| engine(x), Avon(w), s: at(x,w)],
                             K2=[y,u,s'| boxcar(y), s': hooked-to(u,y), u=x],
                             u1: utter(A,"there is an engine at Avon"),
                             ce1: assert(A,B,K1),
                             generate(u1,ce1),
                             u2: utter(B,"it is hooked to a boxcar"),
                             ce2: assert(B,A,K2),
                             generate(u2,ce2)]

4

An important simplification made in PTT is to ignore the issue of forgetting, i.e., the fact that information, particularly linguistic information, only remains `activated' for a relatively short period. There is quite a lot of evidence that at least certain types of information disappear after a period (Sacks, 1967), although the speed at which this happens is unclear (Fletcher, 1994). A number of utterances in dialogue, the so-called informationally redundant utterances (Walker, 1994), are also planned with the goal of preventing important information from being forgotten.

3.2

Intentions and Intentional Structure

In PTT, dialogue acts are performed to achieve intentions or to satisfy certain obligations. Both the fact that one or more agents have a certain (possibly collective) intention and the fact that they are under certain obligations may become part of the discourse situation, e.g., as a result of the recognition that a particular core speech act has been performed (Matheson, Poesio, and Traum, 2000). In previous work on PTT, only a partial formalization of obligations and intentions was given. In this earlier work, both obligations and intentions were viewed as relations between agents and action types: for example, the fact that agent A has the intention to perform a particular core speech act is captured by the presence in the discourse situation of intention (3.2.1).

(3.2.1) i: intend(A, λce.ce: assert(A,B,K1))

We will adopt here a more standard view of intentions and obligations as predicates indexed by the agent(s) holding them (REFERENCE). We will also simplify matters concerning the dynamics of conversational events by assuming that intentions and obligations have as their contents propositions describing the state of affairs to be achieved (i.e., DRSs) rather than action types, as in (3.2.2), which shows the intention of agent A to perform an assertion with content K1.

(3.2.2) i: Int_A([ce| ce: assert(A,B,K1)])

A partial formalization of obligations was provided in (Matheson, Poesio and Traum, 2000). As far as intentions are concerned, previous work only discussed the assumption, inherited from Grosz and Sidner (1986), that intentions may be related to each other in two ways: by a relation of dominance, when satisfying a certain intention is part of the satisfaction of another, more complex, intention (more formally: intention i dominates intention i' if achieving i' is part of achieving i), and by a relation called satisfaction-precedes, when satisfying one intention is a prerequisite for satisfying another (intention i satisfaction-precedes intention i' if achieving i is a necessary prerequisite of achieving i'). As each intention is (directly) dominated by at most one other intention, Poesio and Traum (1997) formalized dominance with a partial function dom mapping an intention i2 to the intention to which it is subordinated, if any. More controversially, satisfaction-precedence was also formalized as a partial function sp mapping intention i2 to the intention i1 that must be achieved for i2 to be achievable:

sp(i2) = i1

No axioms for intentions or intentional structure were proposed by Poesio and Traum; one of the goals of this paper is to use completions and continuations as a source of additional evidence on the properties of intentions and on how they affect interpretation; we discuss a few possibilities and our assumptions in Section 5. In (Poesio, 1994), a theory of discourse structure based on that of Grosz and Sidner (1986) was assumed: utterances were viewed as having, among other effects on the information state shared between agents, that of inducing an intentional structure, organized by the same two relations, dominance and satisfaction-precedes, assumed by Grosz and Sidner. In addition, Poesio argued that the conversational events belonging to a discourse segment form a sequence, or conversational thread, and assumed a function prec from a conversational event to the previous event in the thread. In (Poesio and Traum, 1997), core speech acts were viewed as `compiled' intentions, so that the functions dom and sp would also apply to core speech acts.
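Since each intention is directly dominated by at most one other intention, dom can be represented as a finite partial function, and so can sp. The following Python sketch assumes dictionaries as the encoding of partial functions and uses hypothetical intention names i1 to i4; the helpers dominators and achievable are ours, not part of PTT, and achievable simply restates the definition of sp given above.

```python
from __future__ import annotations

# Hypothetical intentions i1..i4, for illustration only.
# dom: partial function from an intention to the intention that directly dominates it.
dom: dict[str, str] = {"i2": "i1", "i3": "i1", "i4": "i3"}
# sp: partial function from an intention to the intention that must be achieved first.
sp: dict[str, str] = {"i3": "i2"}


def dominators(i: str, dom: dict[str, str]) -> list[str]:
    """Return the chain of intentions dominating i, nearest first."""
    chain: list[str] = []
    while i in dom:
        i = dom[i]
        if i in chain:
            # Our assumption: dominance chains are finite, so a cycle is an error.
            raise ValueError(f"dominance cycle through {i}")
        chain.append(i)
    return chain


def achievable(i: str, sp: dict[str, str], achieved: set[str]) -> bool:
    """An intention is achievable once its sp-prerequisite (if any) has been achieved."""
    return i not in sp or sp[i] in achieved


print(dominators("i4", dom))          # ['i3', 'i1']
print(achievable("i3", sp, set()))    # False
print(achievable("i3", sp, {"i2"}))   # True
```

Using a dictionary for dom enforces the uniqueness of the dominating intention by construction; the cycle check encodes our own assumption that dominance chains end in a root intention.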
dom maps a core speech act ce1 to the core speech act ce2 to which the intention expressed with ce1 is subordinate; and sp maps core speech act ce1 to core speech act ce2 if satisfaction of the intention associated with ce2 is a prerequisite for satisfying the intention associated with ce1. We maintain this extension of the use of dom and sp; in addition, we will also use a function prev, like the prec function proposed in (Poesio, 1994), which maps a core speech act to the preceding core speech act in the same discourse segment. 5 The simplest examples of uses of dom are cases like the (made-up) dialogue:

(3.2.3)

u1  M: We need to send a tanker car full of oranges to Avon.
u2     We should begin by sending a boxcar to Bath to pick up oranges.

in which u1 is an explicit expression of the overall intention (DSP) (“developing a plan to send a tanker car full of oranges to Avon”) to which the intention associated with u2 is subordinate. Under this interpretation, the dialogue would result in both utterances generating core speech acts (ce1 and ce2) of type assert, such that dom(ce2) = ce1. Often, however, the DSP for the current segment will only be introduced implicitly; this is the case in the example dialogue. Both (3.1.1) and the example dialogue contain core speech acts representing two successive steps of the conversational thread of the same discourse segment. The two assertions in (3.1.1) are dominated by the same DSP dsp1 (explicitly or implicitly realized), which we assume has been introduced earlier in the discourse situation, and ce1 is the previous element of the conversational thread to which ce2 belongs. When we include the information about intentional structure, the full interpretation of (3.1.1) becomes:

(3.2.4) […, u1,u2,ce1,ce2,K1,K2| …
         K1=[x,w,s| engine(x), Avon(w), s: at(x,w)],
         K2=[y,u,s'| boxcar(y), s': hooked-to(u,y), u=x],
         u1: utter(A,"there is an engine at Avon"),
         ce1: assert(A,B,K1),
         generate(u1,ce1),
         dom(ce1) = dsp1,
         u2: utter(B,"it is hooked to a boxcar"),
         ce2: assert(B,A,K2),
         generate(u2,ce2),
         dom(ce2) = dsp1,
         prev(ce2) = ce1]

5

In (Poesio and Traum, 1997) a more complex structure was assumed, specified by Argumentation Acts encoding the type of information expressed by Rhetorical Relations in RST (Mann and Thompson, 1987) and SDRT (Asher and Lascarides, 2003), but no formalization was provided. We will only need the simpler set of two relations here.

In addition to intentions and obligations whose existence is part of the common ground, agents can also have private attitudes. We return to this issue later.

3.3

The Dynamics of Discourse Situations, Part I: The Dynamics of Core Speech Acts

The interpretation in (3.2.4) resembles in some respects the kind of interpretation that would be assigned to such dialogues in SDRT, which means that PTT shares with that theory an apparent problem: the explanation of intersentential accessibility given in DRT is not always available. The reader may have noticed that discourse entity x in DRS K1, the propositional content of conversational event ce1 in (3.2.4), appears to be inaccessible from within DRS K2, the propositional content of ce2. As in all theories of the common ground attempting to model conversational events other than simple assertions, the PTT explanation of what makes discourse referents accessible is more complex than simply assuming that antecedents and anaphoric expressions are all part of a single DRS. Our example dialogue provides a real-life illustration of this problem: “Sie” and “dadurch” in utterance 2.1 both refer to antecedents introduced as part of the content of previous speech acts. If we leave aside for the moment incremental interpretation and grounding, the PTT interpretation of the example dialogue would involve the core speech acts in (3.3.1): a first directive, ce1, to grasp the relevant screw, resulting from 1.1-1.3 and accepted in 1.4 (ce2), followed by a second directive, ce3, to put this screw through the aligned fuselage and wing (see following sections). The screw x is introduced in K1, the propositional content of ce1, and referred to using “Sie” in K2. (The issue becomes even more complex when we consider that the pronoun is interpreted before the content of the speech act is complete; we return to this problem below.) In addition, as we saw in the previous section, both ce1 and ce3 are part of the same discourse segment (they are both dominated by the same DSP, dsp1), and they form a conversational thread in which ce1 precedes ce3.
(3.3.1) [ce1,ce2,ce3,K1,K2| K1=[x,z,e| screw(x), orange(x), slit(z), has(x,z), e: grasp(Cnst,x)],
                            ce1: directive(Inst,Cnst,K1),
                            dom(ce1) = dsp1,
                            ce2: accept(Cnst,ce1),
                            K2=[e'| e': put-through(Cnst,x,y)],
                            ce3: directive(Inst,Cnst,K2),
                            dom(ce3) = dsp1,
                            prev(ce3) = ce1]

The PTT explanation for the accessibility of x derives from the assumption, common to most work on discourse, that accessibility depends on discourse structure, i.e., that the propositional contents of speech acts that are part of the same `discourse segment' are implicitly related (e.g., Grosz and Sidner, 1986). Asher and Lascarides (2003), as well, adopt an explanation of this type; they argue that whether a discourse entity x introduced as part of the content K1 of speech act ce1 is accessible while performing speech act ce2 with content K2 depends on whether the conversational acts ce1 and ce2 are rhetorically related. 6 The Veridicality axiom of SDRT ensures that if R(K1,K2) is the case, and R is a veridical rhetorical relation, K2 has to be interpreted in the context provided by K1, which, informally, can be written as follows:

[VERIDICALITY-INFORMAL] R(K1,K2) ↔ (K1 & K2)

(where & is dynamic conjunction, not CDRT's static conjunction). More precisely:

[VERIDICALITY] (Asher and Lascarides, 2003, p. 157). Let R be one of Explain, Narrate, Parallel, Contrast, Background, Result, and Evidence. Let π1, π2 be speech act labels with contents Kπ1, Kπ2. Then,

$$(w,f)\,[\![R(\pi_1,\pi_2)]\!]\,(w',g) \quad\text{iff}\quad (w,f)\,[\![K_{\pi_1} \,\&\, K_{\pi_2} \,\&\, \varphi_{R(\pi_1,\pi_2)}]\!]\,(w',g)$$
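The informal schema R(K1,K2) ↔ (K1 & K2) hinges on dynamic conjunction, which in Compositional DRT can be understood as composing DRSs viewed as relations between assignments (CDRT's K1;K2). A minimal Python sketch of that idea, under assumptions of our own: a toy extensional model with made-up entities e1, e2, avon, bath, and b; DRSs encoded as functions from an input assignment to the list of their output assignments; and the Davidsonian state arguments of (3.1.4) left out. The helper names atom, eq, drs, and seq are hypothetical.

```python
from __future__ import annotations

from itertools import product
from typing import Callable

Assignment = dict  # maps discourse referent names to entities
Model = dict       # maps predicate names (and "DOMAIN") to sets of entity tuples
Condition = Callable[[Model, Assignment], bool]
Box = Callable[[Model, Assignment], list]


def atom(pred: str, *refs: str) -> Condition:
    """Condition pred(r1,...,rn): the tuple of the referents' values is in the model."""
    return lambda m, g: tuple(g[r] for r in refs) in m[pred]


def eq(r1: str, r2: str) -> Condition:
    """Equality condition r1 = r2 (Muskens' 'is')."""
    return lambda m, g: g[r1] == g[r2]


def drs(universe: list[str], conditions: list[Condition]) -> Box:
    """A DRS as a relation between assignments, given as input -> list of outputs.

    Each output g may differ from the input f only on the universe referents,
    and must satisfy every condition.
    """
    def run(m: Model, f: Assignment) -> list:
        outputs = []
        for values in product(sorted(m["DOMAIN"]), repeat=len(universe)):
            g = {**f, **dict(zip(universe, values))}
            if all(c(m, g) for c in conditions):
                outputs.append(g)
        return outputs
    return run


def seq(k1: Box, k2: Box) -> Box:
    """Dynamic conjunction K1;K2 as relational composition."""
    return lambda m, f: [g for h in k1(m, f) for g in k2(m, h)]


# Contents of ce1 and ce2 in (3.1.4), with the state arguments s, s' omitted.
K1 = drs(["x", "w"], [atom("engine", "x"), atom("Avon", "w"), atom("at", "x", "w")])
K2 = drs(["y", "u"], [atom("boxcar", "y"), atom("hooked-to", "u", "y"), eq("u", "x")])

BASE = {
    "DOMAIN": {"e1", "e2", "avon", "bath", "b"},
    "engine": {("e1",), ("e2",)},
    "Avon": {("avon",)},
    "at": {("e1", "avon"), ("e2", "bath")},
    "boxcar": {("b",)},
}
M1 = {**BASE, "hooked-to": {("e2", "b")}}  # the boxcar is hooked to the other engine
M2 = {**BASE, "hooked-to": {("e1", "b")}}  # the boxcar is hooked to the engine at Avon

print(seq(K1, K2)(M1, {}))        # []
print(len(seq(K1, K2)(M2, {})))   # 1
print(K2(M1, {"x": "e2"}))        # K2 alone succeeds if x is not bound by K1
```

In M1 the only object hooked to the boxcar is an engine that is not at Avon, so K1;K2 has no output assignment, even though K2 on its own is satisfied from an input that maps x to that other engine. This is the effect that AX-PREV-VERIDICALITY, discussed below, is meant to secure for contents linked by prev.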

But unfortunately the solution based on Veridicality is not applicable to examples like the one we are discussing, as π1 and π2 are directives, whose content almost by definition does not hold at the world w where Inst is giving instructions. A number of solutions to the accessibility problem for core speech acts were considered during the development of PTT (Poesio and Muskens, 1997). The solution adopted in (Poesio and Traum, 1997) was derived in part from the proposals of Grosz and Sidner (1986) concerning the effect of discourse structure on accessibility, and in part from the proposals made in the original version of SDRT (Asher, 1993) concerning the same issue, adapted, however, to a framework in which, as in Grosz and Sidner, only two relations between intentions are assumed: dominance and satisfaction-precedes. According to Grosz and Sidner, intentional structure affects accessibility 7 in that the focus space associated with an intention i that dominates intention i' is on the stack while the utterances related to intention i' are processed. Poesio and Traum were primarily concerned with formalizing accessibility between successive core speech acts belonging to the same discourse segment, which they assumed to be in a satisfaction-precedence relation. Their main hypothesis concerning accessibility was that discourse entities introduced as part of the propositional content K1 of a core speech act ce1 are accessible from within the propositional content K2 of a core speech act ce2 satisfaction-preceded by ce1. This hypothesis was formalized using Compositional DRT, in which assignments are part of the object language, as shown in AX-SP-PT97.
This axiom uses Compositional DRT's assignments ('states') to do the work of Grosz and Sidner's focus spaces. It states that if assignment i satisfies three conditions (that two conversational events ce1 and ce2, of type p with content K1 and of type r with content K2 respectively, took place, and that ce1 satisfaction-precedes ce2, written sp(ce2) = ce1), then for every input assignment j of K2 there must be an assignment l such that (l,j) satisfies K1.

[AX-SP-PT97] Let p, r be core speech acts; K1 and K2 DRSs. Then
∀ ce1,ce2,A,B,C,D,K1,K2,i [ | ce1: p(A,B,K1)](i) ∧ [ | ce2: r(C,D,K2)](i) ∧ [ | sp(ce2)=ce1](i) → (∀ j,k K2(j,k) → ∃ l K1(l,j))

There are several problems with this axiom. First of all, notice that this axiom is fairly weak: it only requires every input assignment j of K2 to be an output assignment of K1; it does not also require every output assignment of K1 to be an input assignment of K2, although it would be easy to 'strengthen' it this way. Secondly, although Grosz and Sidner are not very explicit about this, it is unlikely that they would assume two successive contributions to the same segment to be related by satisfaction-precedence.

In this paper we will maintain Poesio and Traum's use of assignments to do the job of focus spaces, but return to a form closer to the original proposal in (Poesio, 1994), in which prev rather than sp is what ensures accessibility within discourse segments (together with dom). We provide a formalization of the effect on accessibility of both prev and dom, leaving the study of sp for another time (it is not clear to us what G&S claim about it either). The basic idea in both cases is as follows: discourse referents in the content K2 of a conversational event ce2 are constrained to get the same value as the discourse referents in the content K1 of a conversational event ce1 that precedes or dominates ce2. The 'pragmatic' influence of prev on accessibility is similar to that achieved via veridical relations such as Explain or Narrate in SDRT, so we could use an axiom similar to VERIDICALITY. This could be done by requiring that any pair of assignments (f,g) that supports the occurrence of ce1 and ce2, with prev(ce2) = ce1, also verifies K1;K2, where ';' is CDRT's concatenation operator:

[AX-PREV-VERIDICALITY] Let p, r be core speech acts; K1 and K2 DRSs. Then
∀ ce1,ce2,A,B,C,D,K1,K2,f,g [ | ce1: p(A,B,K1), ce2: r(C,D,K2), prev(ce2)=ce1](f,g) ↔ [K1;K2](f,g)

Consider again the interpretation of (3.1.1) provided by (3.2.4). The requirement here is for x in the scope of K2 to receive the same interpretation as x in the scope of K1: e.g., (3.2.4) should not be verified in a situation in which the object hooked to a boxcar is not the engine at Avon.⁸ AX-PREV-VERIDICALITY ensures that this is the case: a pair of assignments (f,g) satisfies (3.2.4) iff [K1;K2](f,g), i.e., iff the same object verifies both the conditions in K1 and those in K2. There is one complication, however.

⁶ Actually, in SDRT it is the contents K1 and K2 that are related, from a formal point of view.

⁷ In Grosz and Sidner's theory, the attentional state is viewed as a stack; accessible discourse entities are those associated with focus spaces on the stack.
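Before turning to that complication, the way concatenation enforces co-reference can be illustrated. The following is a minimal sketch, not part of PTT itself, assuming a toy extensional model and partial assignments encoded as Python dicts: a DRS denotes a relation between input and output assignments, and K1;K2 is relational composition, so a referent set by K1 is visible to the conditions of K2. The model, the predicate names, and the helpers drs, seq and model are hypothetical.

```python
from itertools import product

# Hypothetical encoding: a DRS [refs | conds] denotes a relation between an input
# assignment f and output assignments g (partial assignments are dicts).
def drs(refs, conds):
    def rel(model, f):
        # Try every value for the new referents; keep outputs meeting all conditions.
        for vals in product(sorted(model["domain"]), repeat=len(refs)):
            g = {**f, **dict(zip(refs, vals))}
            if all(cond(model, g) for cond in conds):
                yield g
    return rel

# CDRT-style concatenation K1;K2 as relational composition.
def seq(k1, k2):
    def rel(model, f):
        for h in k1(model, f):
            yield from k2(model, h)
    return rel

# K1 = [x | engine(x), at-Avon(x)];  K2 = [y, u | boxcar(y), hooked-to(u,y), u = x]
K1 = drs(["x"], [lambda m, g: g["x"] in m["engine"],
                 lambda m, g: g["x"] in m["at_avon"]])
K2 = drs(["y", "u"], [lambda m, g: g["y"] in m["boxcar"],
                      lambda m, g: (g["u"], g["y"]) in m["hooked_to"],
                      lambda m, g: g["u"] == g["x"]])  # x must come from K1's output

def model(hooked):
    return {"domain": {"e1", "e2", "b1"}, "engine": {"e1", "e2"},
            "at_avon": {"e1"}, "boxcar": {"b1"}, "hooked_to": hooked}

print(list(seq(K1, K2)(model({("e1", "b1")}), {})))  # the engine at Avon is hooked
print(list(seq(K1, K2)(model({("e2", "b1")}), {})))  # only a different engine is hooked
```

The first call yields the single output assignment [{'x': 'e1', 'y': 'b1', 'u': 'e1'}] and the second yields none, mirroring the requirement that the same object verify the conditions of both K1 and K2. In the first model, evaluating K2 on its own, list(K2(model({("e1", "b1")}), {})), raises a KeyError on g["x"]: a crude stand-in for x not being accessible outside K1;K2.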
Consider the interpretation of the example dialogue in (3.3.1). The problem here is that the actions under discussion are only planned actions. AX-PREV-VERIDICALITY will not work for directives for the same reason that VERIDICALITY will not: directives are not evaluated with respect to the same possible world as the discourse situation. It will only work if we allow the world at which K1 and K2 are evaluated to be different from the one at which the discourse situation is evaluated. In PTT as defined by Poesio and Traum (1997), Muskens' approach from Tense and the Logic of Change is adopted, in which a distinguished variable v indicates the world at which evaluation is to be done. We can fix AX-PREV-VERIDICALITY by allowing this discourse referent to be different (while everything else stays the same):

[AX-PREV] Let p, r be core speech acts; K1 and K2 DRSs. Then
∀ ce1,ce2,A,B,C,D,K1,K2,f,g [ | ce1: p(A,B,K1), ce2: r(C,D,K2), prev(ce2)=ce1](f,g) ↔ (∃ w',w'',f',g' f[v]f' ∧ g[v]g' ∧ v(f') = w' ∧ v(g') = w'' ∧ [K1;K2](f',g'))

Poesio and Traum (1997) did not specify the accessibility behavior of dom, Grosz and Sidner's dominance relation, mostly because the empirical evidence about it is not very clear (Poesio, Patel, and Di Eugenio, submitted). The main requirement from Grosz and Sidner is that discourse referents introduced by core speech acts subordinated to the discourse segment purpose dsp1 of a certain discourse segment ds1 are not accessible once the discourse segment is closed. In addition, we also require that the discourse referents introduced by a core speech act explicitly associated with dsp1, as in (3.2.3), be accessible to the core speech acts dominated by dsp1. In other words, what seems to be required is something like the formalization of complex DRSs in DRT. A tentative formulation would require that an input/output assignment pair (f,g) satisfies a discourse situation containing conversational event ce1 with content K1 and conversational event ce2 with content K2, in which ce2 is dominated by ce1, iff (f,g) satisfies K1 and there are assignments j and k extending g (we write g ⊆ j ⊆ k) such that (j,k) satisfies K2:⁹

[AX-DOM] For p, r core speech acts; K1 and K2 DRSs,
∀ ce1,ce2,A,B,C,D,K1,K2,f,g [ | ce1: p(A,B,K1), ce2: r(C,D,K2), dom(ce2)=ce1](f,g) ↔ K1(f,g) ∧ (∃ j,k g ⊆ j ⊆ k ∧ K2(j,k))

However, both this solution and the solution proposed in SDRT entail that antecedents are only predicted to be accessible after the interpretation of a speech act's content has been derived. This is consistent with the spirit of the Asher and Lascarides framework, which is predicated on a separation between the construction of these representations and their interpretation; however, the key property of completions is that the interruption takes place before the other conversational participant has completed her contribution. This is what happens in the case of sie in 2.1: at the point at which Cnst produces von oben, indicating that she has interpreted the pronominal reference, Inst has not yet completed his utterance; yet neither VERIDICALITY nor the axioms AX-PREV and AX-DOM explain why the screw is accessible at this point. The solution we adopt in this paper is to introduce additional axioms specifying constraints on the dynamics of sub-sentential objects. We will discuss this solution after introducing the PTT account of incremental interpretation.

⁸ In PTT, assertions are not required to be valid with respect to the 'real world'; it is only required that both conversational participants commit to the belief that the state of affairs holds for the purposes of the conversation (Matheson, Poesio, and Traum 2000). We will ignore this complication here.

3.4

Micro Conversational Events

What we have seen so far of PTT is not that different from what one finds in other theories of dialogue based on DRT, especially SDRT. The first truly distinctive feature of PTT is that it takes as central two facts about dialogue: that utterances are interpreted incrementally, as suggested by most psychological work (Frazier, 1987; Seidenberg, 1979; Tanenhaus et al., 1995), and that many, if not most, contributions to dialogue are non-sentential, as shown by corpus evidence (Poesio, 1995; Ginzburg, XXXX?; Purver et al., YYYY). One of the fundamental hypotheses underlying PTT is that the discourse situation is updated not just when a complete sentence has been observed, but whenever a new event is observed, including sub-sentential or even sub-word utterances and non-verbal events such as gestures. Psychological research suggests that such updates can take place every few milliseconds, so that observing the utterance of a phoneme is sufficient to cause an update; here, however, we will simply assume that they take place at least after every word. We will also only discuss updates caused by linguistic events, although other types of updates (e.g., to the focus of visual attention) have been examined in past work (Poesio, 1993). (See also (Rieser, 2004) for an account of the semantic contribution of pointing gestures.)

The incremental update hypothesis is motivated not just by psychological findings about incremental interpretation in sentential utterances, but also by the fact that in dialogue many types of conversational acts are hardly ever performed with full sentences. A class of non-sentential utterances that quite clearly lead to immediate updates of the discourse situation are those used to perform DIALOGUE CONTROL ACTS such as take-turn, keep-turn and release-turn, actions whose function is to synchronize the two participants in the conversation as to who is holding the floor (Schegloff ZZZZ; Traum and Hinkelman, 1992; Bunt 1992), and GROUNDING ACTS, i.e., acts whose function is to keep the common ground synchronized between the two participants (grounding acts are discussed in greater detail below). These conversational actions are sometimes performed by sentential utterances that also generate a core speech act (e.g., the second utterance in (3.1.1)), but more commonly they are generated by single-word discourse markers, like okay, well, and now, all of which may be used to perform a keep-turn dialogue control act, or indeed jetzt in 1.1 of the example dialogue (1), which arguably performs a keep-turn function as well as (possibly) a temporal sequencing one; or even by non-words, such as filled pauses (umms and the like). Assuming that non-sentential utterances can lead to updates of the discourse situation results in a theory of update in which the effects of these utterances can be modelled.

⁹ This axiom is easiest to formulate with partial assignments, as done here, but total assignments are used in Compositional DRT. With total assignments we would have to require l to assign the same value as j to every discourse marker except possibly those in the domain of K2, and m to be identical with l for every discourse marker contained in K2 and in every DRS in which K2 is embedded.
In PTT, it is hypothesized that observing such utterances may result in the addition of keep-turn events (as well as other possible events) to the discourse situation, via updates like those in (3.4.1):

(3.4.1)
well → [u,ce | u: utter(A,"well"), ce: keep-turn(A), generate(u,ce)]
now → [u,ce | u: utter(A,"now"), ce: keep-turn(A), generate(u,ce)]
umm → [u,ce | u: utter(A,"umm"), ce: keep-turn(A), generate(u,ce)]
okay → [u,ce | u: utter(A,"okay"), ce: keep-turn(A), generate(u,ce)]

For the purposes of this paper, we will assume that the utterances of so and jetzt at the beginning of 1.1 in the sample dialogue (1) have primarily a dialogue control function, so that observing their occurrence results in the following updates to the discourse situation:¹⁰

(3.4.2)
so → [u,ce | u: utter(Inst,"so"), ce: take-turn(Inst), generate(u,ce)]
jetzt → [u,ce | u: utter(Inst,"jetzt"), ce: keep-turn(Inst), generate(u,ce)]

One of the advantages of adopting the view of semantic composition found in Compositional DRT (as opposed to the one found in standard DRT) is that it allows us to explain how utterances whose main function is to express part of the content of a core speech act, like the rest of the utterances in 1.1 (nimmst, Du, etc.), do so incrementally as well. We need not assume that (3.2.4) or (3.3.1) are derived all at once after the entire mini-dialogue in (3.1.1) or the example dialogue has been syntactically analyzed, as would be necessary with the construction algorithm of (Kamp and Reyle, 1993). As suggested by Muskens (1996), using only technical tools present in Compositional DRT, we could already hypothesize, more plausibly, that (3.2.4) is derived compositionally and incrementally by concatenating separately produced interpretations for the two utterances in (3.1.1), as in (3.4.3):

(3.4.3) [ce1,K1 | K1=[x,w,s | engine(x), Avon(w), s: at(x,w)], ce1: assert(A,B,K1)]; [ce2,K2 | K2=[y,u,s' | boxcar(y), s': hooked-to(u,y), u=x], ce2: assert(B,A,K2), sp(ce2)=ce1]

But once we make the two assumptions that locutionary acts are recorded in the discourse situation and that single-word utterances can lead to updates as well, we can exploit the possibility of specifying the meaning of single words, afforded by Compositional DRT, to go one step further and hypothesize that the utterances of nimmst and Du result in updates of the discourse situation as well. Psychological research on priming at different levels (as summarized, e.g., in Pickering and Garrod, 2004), as well as work on clarification questions such as (Ginzburg and Cooper, 2004; Purver and Ginzburg, 2003), provides some evidence concerning these updates. The hypothesis adopted in PTT is that the update to the discourse situation caused by observing the utterance of a word records the fact that this utterance just occurred, as well as the results of lexical access: i.e., the utterance's syntactic classification and its conventional meaning, which in PTT is identified with the compositional meaning as specified in Compositional DRT (Poesio, 1995; Poesio and Traum, 1997; Poesio and Muskens, 1997; Poesio, To appear). Observing an utterance of the noun boxcar, for example, results in the update of the discourse situation in (3.4.4). This update records the utterance of a new locutionary act u (we use the predicate utter to characterize locutionary acts), syntactically classified as a noun, and with semantic content λx[ | boxcar(x)].

¹⁰ Of course, we are simplifying matters considerably here. First of all, we are assuming the same turn-taking translation for well, now, and umm. Secondly, and most importantly, these discourse markers are (a) extremely ambiguous, particularly so if prosodic information is ignored (in which case competing updates would be possible), and (b) often play more than one function. These two issues are particularly evident in the case of okay, an utterance of which, depending on the context, may result in one or all of the updates below (see below for a discussion of acknowledge; note also that 'backward-looking' interpretations of utterances like okay include implicit references to a previous speech act ce', one of the arguments supporting our hypothesis that conversational events introduce discourse markers in the common ground).
okay → [ce | ce: keep-turn(A)]
okay → [ce | ce: accept(A,ce')]
okay → [ce | ce: acknowledge(A,ce')]
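The word-by-word update hypothesis can be sketched as follows, assuming a simple string encoding of DRS conditions; the names DiscourseSituation, observe, CONTROL_ACTS and LEXICON are hypothetical and not part of PTT. Each observed word adds a locutionary act; the discourse markers of (3.4.1) and (3.4.2) also add the dialogue-control event they generate, while content words add the results of lexical access, as in (3.4.4).

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical lexicon, simplified as in (3.4.1)/(3.4.2): each discourse marker is mapped
# to a single dialogue-control act (the ambiguity noted in footnote 10 is ignored here).
CONTROL_ACTS = {"well": "keep-turn", "now": "keep-turn", "umm": "keep-turn",
                "okay": "keep-turn", "so": "take-turn", "jetzt": "keep-turn"}
# Content words: results of lexical access (syntactic category, semantic content).
LEXICON = {"boxcar": ("Noun", "λx[ | boxcar(x)]"),
           "Schraube": ("Noun", "λv([ | screw(v)])")}

@dataclass
class DiscourseSituation:
    conditions: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))  # shared id counter

    def observe(self, speaker, word):
        """Update after a single word: record the locutionary act u, then either the
        dialogue-control event it generates or the results of lexical access."""
        u = f"u{next(self._ids)}"
        self.conditions.append(f'{u}: utter({speaker},"{word}")')
        if word in CONTROL_ACTS:
            ce = f"ce{next(self._ids)}"
            self.conditions += [f"{ce}: {CONTROL_ACTS[word]}({speaker})",
                                f"generate({u},{ce})"]
        elif word in LEXICON:
            category, sem = LEXICON[word]
            self.conditions += [f"{category}({u})", f"sem({u}) = {sem}"]
        return u

ds = DiscourseSituation()
for word in ["so", "jetzt"]:
    ds.observe("Inst", word)
ds.observe("Cnst", "Schraube")
print("\n".join(ds.conditions))
```

Using one counter for both utterances and conversational events simply keeps identifiers distinct; competing updates for ambiguous markers such as okay would require recording alternatives rather than a single list.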
(3.4.4) [u | u: utter(A,"boxcar"), Noun(u), sem(u) = λx[ | boxcar(x)]]

(We will often use the abbreviated notation u:"boxcar":Noun to indicate the information added to the discourse situation by the utterance of a word, omitting its lexical semantics, and u:"boxcar":λx[ | boxcar(x)] to specify its lexical semantics while omitting its syntactic interpretation.) We use the term MICRO CONVERSATIONAL EVENTS (MCEs) to refer to the events of uttering sub-sentential constituents (Poesio, 1995). The PTT view of the interpretive processes that follow the initial observation of word utterances and lexical access, i.e., syntactic interpretation (parsing) and semantic composition, is very much inspired by current work on grammar in frameworks like Tree Adjoining Grammar and Categorial Grammar, the only difference being that in PTT these interpretive processes are viewed as inferential processes resulting in updates of the discourse situation. Parsing is viewed in PTT as a process by which the hypotheses resulting from lexical access combine into phrasal hypotheses, and these in turn into larger phrasal hypotheses. As in (Poesio, 1995; Poesio, 2001; Poesio, to appear), we assume here the syntactic framework of Lexicalized Tree Adjoining Grammar (LTAG) (Schabes, 1990 (CHECK STURT AND CROCKER); MORE REFS), since it lends itself to a very natural account of the process by which syntactic interpretations are constructed incrementally (Sturt and Crocker, 1996).¹¹ In LTAG, the lexical interpretations of words are elementary trees. In the case of sortal nouns like boxcar or Schraube these trees are atomic; but in the case of words whose semantic interpretation takes arguments, such as verbs and determiners, the elementary trees are more complex and already contain 'attachment points' for such arguments. For example, Figure 3.4.1 illustrates the lexical interpretation of the determiner eine according to LTAG.
In PTT, the utterance of this determiner results in the discourse situation being updated not just by the observation that an utterance uspec occurred, but also by the expectation that uspec is going to be part of the performance of an utterance u of an NP, in which uspec will occupy the specifier position, as well as by the expectation of an utterance u of type N and (possibly) of a complement ucompl.

¹¹ The LTAG framework has also been adopted in other modern frameworks concerned with semantic interpretation, such as Muskens' Logical Description Grammar (Muskens, 2001).

[Figure 3.4.1: the update to the discourse situation resulting from an observation of eine. Elementary tree rooted in u:NP, with daughters uspec:Det "eine", sem(uspec) = λP'λP([y| ]; P'(y); P(y)); u:N (the expected nominal head); and ucompl (an optional complement).]

The elementary trees introduced into the discourse situation by the utterances of single words are combined by means of the two basic TAG operations: substitution and adjunction. Substitution is the operation by which 'expected' components of the syntactic structure of an utterance, such as the nominal head of the NP expected as a result of observing eine, are 'slotted into' non-atomic elementary trees. For example, assuming an LTAG translation for Schraube analogous to that for boxcar, observing an utterance of Schraube and then substituting the associated elementary tree into the interpretation in Figure 3.4.1 results in the interpretation in Figure 3.4.2.

[Figure 3.4.2: the updates resulting from the observation of eine Schraube, after substitution of the elementary tree for Schraube into Figure 3.4.1. As in Figure 3.4.1, but the u:N node is now filled by "Schraube", with sem(u) = λv([ | screw(v)]).]
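Substitution can be sketched with a bare-bones tree encoding that ignores the semantic annotations and the discourse-situation bookkeeping; Node, substitute and frontier are hypothetical names. The NP tree anchored by eine carries an open N site, and substituting the atomic tree for Schraube fills it, as in Figures 3.4.1 and 3.4.2.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    cat: str                       # syntactic category, e.g. "NP", "Det", "N"
    word: Optional[str] = None     # lexical anchor, if any
    children: List["Node"] = field(default_factory=list)
    subst: bool = False            # open substitution site (an 'expected' constituent)

def substitute(tree: Node, initial: Node) -> Node:
    """Return a copy of tree in which the leftmost open substitution site whose
    category matches the root of initial is replaced by initial."""
    done = False

    def walk(n: Node) -> Node:
        nonlocal done
        if not done and n.subst and n.cat == initial.cat:
            done = True
            return initial
        return Node(n.cat, n.word, [walk(c) for c in n.children], n.subst)

    result = walk(tree)
    if not done:
        raise ValueError(f"no open {initial.cat} substitution site")
    return result

def frontier(n: Node) -> List[str]:
    """Leaves of the tree, left to right; open sites are shown as CAT↓."""
    if not n.children:
        return [n.word if n.word else f"{n.cat}↓"]
    return [leaf for c in n.children for leaf in frontier(c)]

# Elementary tree for "eine" (the optional complement of Figure 3.4.1 is omitted).
eine = Node("NP", children=[Node("Det", "eine"), Node("N", subst=True)])
schraube = Node("N", "Schraube")

print(frontier(eine))                        # ['eine', 'N↓']
print(frontier(substitute(eine, schraube)))  # ['eine', 'Schraube']
```

Because substitute copies the tree, the elementary tree for eine remains available as a separate hypothesis, which fits the view of parsing as adding hypotheses to the discourse situation rather than destructively rewriting it.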

The missing ingredient is an account of how phrasal utterances receive an interpretation. Clearly, semantic composition cannot be specified in terms of operations that manipulate semantic objects before these objects are added to the discourse situation; instead, we are led towards an 'inferential' characterization of semantic composition, like that adopted in Categorial Grammar and in the related work on 'Parsing as inference' (Pereira, 1990; Carpenter, XXXX), where the combination of utterances into larger utterances and the specification of the meaning of these larger utterances are provided by inference rules. The particular approach adopted here involves defeasible inferences over the DRS obtained by concatenating the updates resulting from the utterances of single words (Poesio, To Appear). These default inference rules have the effect of the semantic composition rules introduced by Muskens (1996) for Compositional DRT. For example, the rule BINARY SEMANTIC COMPOSITION below specifies that if u1 and u2 are the (only) two constituents of u3, one of them (say, u1) has a semantic interpretation of type 〈α,β〉, and the other has a semantic interpretation of type α, then the semantic interpretation of u3 is derived by applying the semantic interpretation of u1 to that of u2 (cf. Muskens' Application rule (Muskens, 1996, p. 166)). We also assume a Unary Semantic Composition inference rule achieving the effect of Muskens' Copying rule.

BINARY SEMANTIC COMPOSITION (BSC):

$$\frac{u_1 \uparrow u_3,\quad u_2 \uparrow u_3,\quad \mathit{sem}(u_1) = \varphi_{\langle\alpha,\beta\rangle},\quad \mathit{sem}(u_2) = \psi_{\alpha}}{\mathit{sem}(u_3) = \varphi(\psi)}$$

For a fuller example, let us now return to the beginning of our example dialogue (1) and consider a slightly simplified version of the series of utterances that results in the first directive: Nimmst Du eine orangene Schraube mit einem Schlitz. (We are ignoring for the moment the treatment of appositions; see Section 5. As a consequence, we are limiting Cnst's contribution to the utterance of a single noun, "Schraube".) Each of these utterances results in an incremental update of the discourse situation; concatenated, these updates yield the interpretation in (3.4.5), where we use ↑ to indicate dominance in linear notation (REFERENCES).¹²

(3.4.5)
[mce1, udb1, usp1, ub1, uc1 | mce1: utter(Inst,"nimmst"), Verb(mce1), sem(mce1) = λQλx(Q(λx'[e | e: grasp(x,x')])), mce1 ↑ ub1, uc1 ↑ ub1, uc2 ↑ ub1, ub1 ↑ udb1, VP(udb1), VBar(ub1), NP(uc1), NP(uc2)];
[mce2, udb2 | mce2: utter(Inst,"Du"), Pro(mce2), sem(mce2) = λP.P(you), mce2 ↑ udb2, NP(udb2)];
[mce3, udb3, un3, ub3, uc3 | mce3: utter(Inst,"eine"), Det(mce3), sem(mce3) = λP'λP([y1| ]; P'(y1); P(y1)), mce3 ↑ udb3, un3 ↑ ub3, uc3 ↑ ub3, ub3 ↑ udb3, NP(udb3), Nbar(ub3), Noun(un3)];
[mce4, ub4, uc4 | mce4: utter(Inst,"orangene"), Adj(mce4), sem(mce4) = λPλz([ | orange(z)]; P(z)), mce4 ↑ ub4, uc4 ↑ ub4, Nbar(uc4), Nbar(ub4)];
[mce5 | mce5: utter(Cnst,"Schraube"), Noun(mce5), sem(mce5) = λv([ | screw(v)])];
[mce6, udb6, ub6, uc6 | mce6: utter(Inst,"mit"), Prep(mce6), sem(mce6) = λPλy(P(λx[ | with(x,y)])), mce6 ↑ ub6, ub6 ↑ udb6, uc6 ↑ ub6, PP(udb6), Pbar(ub6), NP(uc6)];
[mce7, udb7, un7, ub7, uc7 | mce7: utter(Inst,"einem"), Det(mce7), sem(mce7) = λP'λP([y2| ]; P'(y2); P(y2)), mce7 ↑ udb7, un7 ↑ ub7, uc7 ↑ ub7, ub7 ↑ udb7, NP(udb7), Nbar(ub7), Noun(un7)];
[mce8 | mce8: utter(Inst,"Schlitz"), Noun(mce8), sem(mce8) = λv([ | slit(v)])]

For example, let us consider how the updates due to the observation of "einem" and "Schlitz" result in the hypothesis that an utterance of the NP "einem Schlitz" was observed. The hypothesis that mce7 and mce8 in (3.4.5) are sub-utterances of a larger event of syntactic type NP leads to a substitution inference: the hypothesis that un7 (expected as the result of observing an utterance of einem) is the same as mce8. Formulating this hypothesis results in the following update of the discourse situation:

[ | un7 is mce8, udb7: utter(Inst,"einem Schlitz"), NP(udb7)]

Applying BSC to the discourse situation thus updated results in the assignment of the conventional meaning λP.[y2| ]; [ | slit(y2)]; P(y2) to udb7, which results in the following update:

[ | sem(udb7) = λP.[y2| ]; [ | slit(y2)]; P(y2)]

¹² The complete lexical and grammatical rules for the fragment of German we are considering, broadly based on Muskens (1996), are given in Appendix B.
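The BSC step for udb7 can be reproduced with a small sketch. It assumes a simplified type encoding ("e", "t", and pairs for 〈α,β〉, with DRS-valued expressions given type "t" instead of CDRT's state-transition types) and DRSs as tuples of referents and condition strings; DRS, seq, Sem and bsc are hypothetical names. The function picks whichever daughter has type 〈α,β〉 matching the other's type α and applies it.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical type encoding: "e", "t", or a pair (a, b) standing for <a,b>.
E, T = "e", "t"
PROP = (E, T)          # <e,t>
NP = (PROP, T)         # <<e,t>,t>
DET = (PROP, NP)       # <<e,t>,<<e,t>,t>>

@dataclass(frozen=True)
class DRS:
    refs: tuple
    conds: tuple

def seq(*drss: DRS) -> DRS:
    """Concatenation ';' of DRSs (no variable renaming is needed in this example)."""
    return DRS(tuple(r for d in drss for r in d.refs),
               tuple(c for d in drss for c in d.conds))

@dataclass
class Sem:
    type: Any
    value: Any

def bsc(s1: Sem, s2: Sem) -> Sem:
    """BINARY SEMANTIC COMPOSITION: the daughter of type <a,b> is applied to the
    daughter of type a, whatever their linear order."""
    for f, a in ((s1, s2), (s2, s1)):
        if isinstance(f.type, tuple) and f.type[0] == a.type:
            return Sem(f.type[1], f.value(a.value))
    raise TypeError("neither daughter can apply to the other")

# sem(mce7) for "einem" and sem(mce8) for "Schlitz", as in (3.4.5).
einem = Sem(DET, lambda P1: lambda P: seq(DRS(("y2",), ()), P1("y2"), P("y2")))
schlitz = Sem(PROP, lambda v: DRS((), (f"slit({v})",)))

udb7 = bsc(einem, schlitz)
print(udb7.type == NP)                            # True
print(udb7.value(lambda x: DRS((), (f"P({x})",))))
# DRS(refs=('y2',), conds=('slit(y2)', 'P(y2)'))
```

Applying sem(udb7) to a placeholder property P makes the result λP.[y2| ]; [ | slit(y2)]; P(y2) visible as a concrete DRS. Calling bsc(schlitz, einem) gives the same result, since the rule is stated over constituency rather than linear order.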

A similar process leads to hypothesizing the rest of the structure; the only difference is that including orangene and mit einem Schlitz requires adjoining the PP resulting from processing mit einem Schlitz into the NBar resulting from eine orangene. The result is shown in Figure 3.4.3.

[Figure 3.4.3: the update resulting from the adjunction of "mit einem Schlitz" to "orangene". Nodes shown: u:NP with uspec; an NBar node dominating u:NBar "orangene", with sem(u) = λz([ | orange(z)]), and the adjoined ucompl:PP "mit einem Schlitz", with semantics λx([y| ]; [ | slit(y)]; [ | with(x,y)]).]
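Adjunction can be sketched in a similar hypothetical tree encoding (Node, adjoin, plug_foot and frontier are illustrative names; the semantic annotations of Figure 3.4.3 are omitted, and for simplicity the tree already contains Schraube). The PP contributes an auxiliary NBar tree whose foot node NBar* receives the NBar it adjoins to. As a further simplification, the sketch adjoins at the first matching node in pre-order, whereas in LTAG the choice of adjunction site is itself a hypothesis.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    cat: str
    word: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    foot: bool = False   # foot node (e.g. NBar*) of an auxiliary tree

def plug_foot(aux: Node, subtree: Node) -> Node:
    """Copy the auxiliary tree, replacing its foot node with subtree."""
    if aux.foot:
        return subtree
    return Node(aux.cat, aux.word, [plug_foot(c, subtree) for c in aux.children])

def adjoin(tree: Node, aux: Node) -> Node:
    """Adjoin aux at the first node (pre-order) whose category equals aux's root:
    the auxiliary tree takes that node's place and the node reappears at the foot."""
    done = False

    def walk(n: Node) -> Node:
        nonlocal done
        if not done and n.cat == aux.cat:
            done = True
            return plug_foot(aux, n)
        return Node(n.cat, n.word, [walk(c) for c in n.children], n.foot)

    result = walk(tree)
    if not done:
        raise ValueError(f"no {aux.cat} node to adjoin to")
    return result

def frontier(n: Node) -> List[str]:
    if not n.children:
        return [n.word or n.cat]
    return [leaf for c in n.children for leaf in frontier(c)]

# "eine orangene Schraube", and an auxiliary NBar tree contributed by the PP.
np = Node("NP", children=[Node("Det", "eine"),
                          Node("NBar", children=[Node("Adj", "orangene"),
                                                 Node("N", "Schraube")])])
pp_aux = Node("NBar", children=[Node("NBar", foot=True),
                                Node("PP", "mit einem Schlitz")])

result = adjoin(np, pp_aux)
print(frontier(result))  # ['eine', 'orangene', 'Schraube', 'mit einem Schlitz']
```

The NBar for orangene Schraube survives intact under the new NBar node, which is what lets the PP's semantics combine with it as a modifier.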

Deriving an interpretation for a sequence of utterances may involve a number of inferential steps: for example, resolving an anaphoric expression, as discussed in the next subsection. In PTT, it is assumed that such steps may result in additional 'utterances' being added to the discourse situation, besides those representing actual word productions, like mce7, or inferred syntactic constituents, like un7. The resulting 'utterance structure' therefore resembles a derivation in Montague Grammar, or what is known as LF in generative frameworks, more than a traditional s-structure (cf. Minimalism, Dynamic Syntax). This assumption is particularly important for the treatment of scope interpretation (Poesio, To appear). We will see a few examples below.


3.5

The Dynamics of Discourse Situations, Part II: Micro Dynamics

In most work on dynamic semantics, including (Muskens, 1996), the only concern is to explain accessibility, so that pronouns are assigned a semantic interpretation only after resolution has taken place, i.e., their interpretation is read off an 'indexed' syntactic structure. Any real explanation of how an anaphoric expression like "Sie" in turn 2.1 of the example dialogue can be resolved incrementally must involve two ingredients: an account of what makes the actual antecedent accessible, and a theory of the lexical semantics of pronouns. We discuss each in turn, beginning with the second. We assume throughout that the word utterances in 1.4 and 2.1 are recorded as the following MCEs: mce9:"Ja", mce10:"Und", mce11:"steckst", mce12:"Sie".

3.5.1

An incremental lexical semantics for pronouns

As we said, most work on semantics assumes that the meaning of pronouns can be read directly off the indexed syntactic representation of a complete sentence. In order to introduce our proposals for a lexical semantics of pronouns that may account for the way they are incrementally interpreted in examples like our example dialogue, we will begin with a translation for pronouns that is a bit of a strawman, and that we will quickly reject, but that allows us to introduce a few basic ideas about pronoun interpretation in a simple way. Assuming for the moment a uniform semantics for NPs of type 〈〈e,t〉,t〉, our first assumption concerns the type of pronouns. The first idea we are going to pursue is that pronouns semantically function like determiners, i.e., have a translation of type 〈〈e,t〉,〈〈e,t〉,t〉〉, and that what pronoun resolution does is to turn functions of this type into functions of the type of NPs by supplying an identity property, i.e., a property of the form λx.[ | x = z] that restricts the interpretation of the pronoun to be the same as that of discourse entity z. A first possibility, then, would be that semantically pronouns simply introduce a new discourse entity without specifying any properties of it, as in (3.5.1).

(3.5.1) "sie" → [mce12, udb12 | mce12: utter(A,"Sie"), Pro(mce12), sem(mce12) = λP.λP'.([y| ]; P(y); P'(y)), mce12 ↑ udb12, NP(udb12)]

(We assume that the values of sem are restricted by axioms requiring them to be of a certain type; to this end we assume a number of predicates of the form Ty_τ, as in (3.5.2).)

(3.5.2) ∀u NP(u) → Ty_〈〈e,t〉,t〉(sem(u))

In order to assign an interpretation of this type to udb12, it is necessary first to find an appropriate antecedent. The model of pronoun resolution we assume here is based on ideas developed in (Poesio, To appear). The starting point for that proposal is that all evidence about pronoun resolution suggests that the primary factors in determining the antecedent of a pronoun are 'surface' factors, including distance, agreement features, and syntactic properties such as grammatical function (Garnham, 2000). This suggests that at least the commonest cases of pronoun resolution are best viewed as 'surface' inference processes using the information present in the discourse situation (Cloitre and Bever, 1988; Filik et al., 1994). In (Poesio, To Appear) it is proposed that this common form of anaphora resolution results in hypotheses establishing 'coindexing' links between NP utterances. We assume a coindex predicate between micro conversational events to represent coindexing. For example, hypothesizing that utterance mce12 of the pronoun Sie, part of the utterance of NP udb12, refers to the 'orange screw' mentioned in utterance udb3 would be tantamount to updating the discourse situation with the information in (3.5.3):


(3.5.3) [ | coindex(udb12, udb3)]

Inferring such links licenses a second type of semantic operation, PRO-RES, that determines sem(udb12) on the basis of sem(mce12) and the discourse entity introduced by udb3, which, again following (Poesio, To Appear), we assume to be specified by a function called δ:¹³

(3.5.4) [ | δ(udb3) = y1]

(Remember that in Compositional DRT, discourse referents are constants.) We can now specify PRO-RES. This semantic operation hypothesizes that sem(u1), where u1 is an NP utterance whose specifier is a pronoun utterance u3, is obtained by applying sem(u3) to a property restricting the interpretation of y to be identical with that of the discourse entity z introduced by the antecedent u2 of u1:

[PRO-RES (Version 1)]
$$\frac{\mathit{NP}(u_1),\ \mathit{NP}(u_2),\ \mathit{coindex}(u_1,u_2),\ \delta(u_2) = z,\ \mathit{spec}(u_1) = u_3,\ \mathit{Pro}(u_3),\ \mathit{sem}(u_3) = \lambda P.\lambda P'.([y \mid\ ];\, P(y);\, P'(y))}{\mathit{sem}(u_1) = \mathit{sem}(u_3)(\lambda x.[\ \mid x = z])}$$

In the example we are discussing, assuming that pronoun resolution identifies udb3 (the NP utterance containing mce3, the utterance of eine in (3.4.5)) as the antecedent of udb12, PRO-RES assigns to udb12 the result of applying sem(mce12) to a property restricting the interpretation of y to be identical with that of the discourse entity y1 introduced by mce3:

(3.5.5) [ | sem(udb12) = (sem(mce12))(λx.[ | x = y1])] = (after beta conversion) [ | sem(udb12) = λP'.([y| ]; [ | y = y1]; P'(y))]

We will stick to this formalization of pronoun resolution in the rest of the paper; however, there are many problems with the translation of pronouns in (3.5.1). The main problem is that the semantic value of Sie in (3.5.1) is the same as the translation of indefinite articles like eine or einem discussed in the previous section (e.g., Figure 3.4.1), which seems clearly undesirable: for example, such a translation would not explain why pronouns, but not indefinites, are subject to definiteness constraints (cf. ??There was him in the garden (Reuland & ter Meulen, 1987)).
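The effect of PRO-RES (Version 1), including the beta conversion in (3.5.5), can be checked with a small sketch. It assumes, hypothetically, that the relevant fragment of the discourse situation is a Python dict of relations and that DRSs are tuples of referents and condition strings; pro_res_v1 and the dict keys are illustrative names only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DRS:
    refs: tuple
    conds: tuple

def seq(*drss):
    """Concatenation ';' of DRSs (variables are plain strings; no renaming)."""
    return DRS(tuple(r for d in drss for r in d.refs),
               tuple(c for d in drss for c in d.conds))

# (3.5.1): the strawman translation of "Sie", λP.λP'.([y| ]; P(y); P'(y)).
sie = lambda P: lambda P2: seq(DRS(("y",), ()), P("y"), P2("y"))

def pro_res_v1(ds, u1):
    """PRO-RES (Version 1): if NP u1 is coindexed with NP u2, δ(u2) = z, and the
    specifier u3 of u1 is a pronoun, then sem(u1) = sem(u3)(λx.[ | x = z])."""
    u2 = ds["coindex"][u1]
    z = ds["delta"][u2]
    u3 = ds["spec"][u1]
    if not ({u1, u2} <= ds["NP"] and u3 in ds["Pro"]):
        raise ValueError("premises of PRO-RES not satisfied")
    return ds["sem"][u3](lambda x: DRS((), (f"{x} = {z}",)))

# The fragment of the discourse situation after (3.5.3) and (3.5.4).
ds = {"NP": {"udb12", "udb3"}, "coindex": {"udb12": "udb3"},
      "delta": {"udb3": "y1"}, "spec": {"udb12": "mce12"},
      "Pro": {"mce12"}, "sem": {"mce12": sie}}

sem_udb12 = pro_res_v1(ds, "udb12")
print(sem_udb12(lambda x: DRS((), (f"P'({x})",))))
# DRS(refs=('y',), conds=('y = y1', "P'(y)"))
```

Note that sie here is, term for term, the translation of eine in Figure 3.4.1 (with y in place of y1), which is precisely the problem with (3.5.1) just discussed.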
A translation that would make pronouns semantically definite can be derived from the proposals of Loebner (1987). According to Loebner, what all definites have in common is that they are terms, i.e., functions that may take different numbers of arguments but all have a value of type e. Thus, for example, the proper name Jack would have as translation the (0-argument) function ιx.(x = j), whereas the definite description the pope would have as translation the 1-argument function λs.ιx.(x = pope(s)(x)), taking a situational or temporal argument s.¹⁴ A functional translation for Sie along those lines is shown in (3.5.6) (we assume that the other properties of utterance mce12 are just as in (3.5.1)):

(3.5.6) sem(mce12) = λP.λP'.([y | y = ιx.Q(x)]; P(y); P'(y))

where Q is some contextual property restricting the interpretation of y. The problem is: what is the restriction Q? When translations of this kind are suggested for pronouns (e.g., SCHUBERT, ?? OTHER REFS ??), two types of predicates are generally used: a predicate requiring an entity with a particular gender (e.g., female(x) for Sie), or a predicate requiring a 'most salient' entity. The problem is that empirical evidence shows that pronouns are often used in contexts in which there is more than one entity of a particular gender, as in

'Blair is the only British politician who looks like a world-class leader', said Mike from Liverpool. Well, maybe he does from a distance. (The Guardian, April 26th, 2005.)

where both Blair and Mike are male. Pronouns can also be used in contexts where no entity is more salient than the others, or to refer to an entity which is not the most salient (e.g., Poesio et al., 2004). One type of property that does restrict the interpretation is an identity property of the type we used to formalize the results of pronoun resolution while discussing the first possible type of translation. This suggests that an appropriate translation for Sie might be as follows, in which the pronoun is again treated as a determiner (which syntactically is probably appropriate), but the property P obtained as the result of resolution is now used to specify the functional interpretation of y:

(3.5.7) sem(mce12) = λP.λP'.([y | y = ιx.P(x)]; P'(y))

With a semantic translation of this type, there is no need to change the proposal sketched above concerning the way pronoun resolution affects the interpretation of udb12. After pronoun resolution has determined that the antecedent of NP utterance udb12 is mce3 (discourse entity y1), the interpretation of udb12 is obtained by means of PRO-RES, which again assigns to udb12 the result of applying sem(mce12) to a property restricting the interpretation of y to be identical with that of y1:

(3.5.8) [ | sem(udb12) = (sem(mce12))(λx.[ | x = y1])] = (after beta conversion) [ | sem(udb12) = λP'.([y | y = ιx.[ | x = y1]]; P'(y))]

¹³ A very similar function to δ is called dr in Muskens (1996).

¹⁴ We could either assign to definites a translation of type e and assume that type raising takes place when needed, or a determiner translation; as it does not matter for our purposes, we will follow the second route here. We could also use axioms like (3.5.2) to specify a Definiteness constraint on utterances of definite NPs.

The last two types of translation we will consider are based on the assumption that the meaning of pronouns is underspecified. A translation of this type has been proposed by Kamp and Reyle (1993), whose translation involves 'underspecified identity conditions' for the pronoun, of the form [ | x = ?].
Kamp and Reyle do not provide a semantics for such conditions, but one could be provided either in terms of a genuine ‘logic of ambiguity’, as done in (Poesio, 1996), or, more simply, by abstracting over possible values or properties, in which case this type of translation would reduce more or less to the translation proposed above. A ‘radically underspecified’ interpretation for pronouns, based on an examination of the psychological evidence concerning pronoun interpretation, was proposed in (Poesio, 2001). Psychological evidence such as (Corbett and Chang, 1983; Gernsbacher and Hargreaves, 1989) suggests that pronoun interpretation involves the generation of multiple hypotheses in parallel. Given Frazier and Rayner’s results on the differences between the interpretation of polysemy and homonymy in lexical access, these findings suggest that pronoun interpretation is more similar to the interpretation of homonyms than to the interpretation of lexically polysemous expressions: i.e., that the aim of pronoun interpretation is not to refine an initial interpretation such as those discussed above, but to provide one. In the case of pronouns, this can only mean that the lexical interpretation of pronouns is empty, and that the interpretation comes entirely from contextual reasoning. In other words, operations like PRO-RES do not use the semantic translation of pronoun utterances like sem(mce12) to determine the meaning of NP utterances like udb12; this meaning comes entirely from coindexing links between MCEs. The simplest way to formalize what such a version of PRO-RES would do is to assume a type e translation for pronouns:

[PRO-RES (Version 2)]: NP(u1), NP(u2), coindex(u1,u2), δ(u2) = z, spec(u1) = u3, Pro(u3) ⇒ [ | sem(u1) = z]

We will provisionally assume this type of interpretation in what follows.
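To make the determiner-type translation (3.5.7) and the beta conversion in (3.5.8) concrete, here is a minimal Python sketch. It assumes a toy CDRT-style encoding that is ours, not the paper’s: an assignment is a dict from discourse referents to entities, a DRS is a function from an input assignment to the list of its output assignments, and ιx. P(x) returns the unique entity satisfying P (or None). The domain, the entity names, and the helpers box, iota, same_as and is_screw are all hypothetical.

```python
from itertools import product

# Hypothetical toy domain (not from the paper).
DOMAIN = ["screw1", "screw2", "bar1"]


def box(new_refs, *conds):
    """DRS [new_refs | conds]: extend the input assignment with every choice
    of values for new_refs and keep the extensions satisfying all conditions."""
    def drs(i):
        outputs = []
        for values in product(DOMAIN, repeat=len(new_refs)):
            j = {**i, **dict(zip(new_refs, values))}
            if all(cond(j) for cond in conds):
                outputs.append(j)
        return outputs
    return drs


def iota(P, i):
    """ιx. P(x) at assignment i: the unique entity d such that the DRS P(d)
    has an output from i; None if there is no such entity or several."""
    witnesses = [d for d in DOMAIN if P(d)(i)]
    return witnesses[0] if len(witnesses) == 1 else None


def sem_sie(P):
    """(3.5.7): λP. λP'. ([y | y = ιx. P(x)]; P'(y))."""
    def take_scope(P_prime):
        intro_y = box(["y"], lambda j: j["y"] == iota(P, j))
        # Sequencing: run intro_y, then P'(y) on each of its outputs.
        return lambda i: [k for j in intro_y(i) for k in P_prime(j["y"])(j)]
    return take_scope


def same_as(ref):
    """Identity property λx. [ | x = ref] supplied by pronoun resolution."""
    return lambda x: box([], lambda j: j[ref] == x)


def is_screw(y):
    """Hypothetical scope property P'(y), used only for illustration."""
    return box([], lambda j: y.startswith("screw"))


# Resolution has linked the pronoun to y1, which the context maps to screw1.
context = {"y1": "screw1"}
result = sem_sie(same_as("y1"))(is_screw)(context)
print(result)  # [{'y1': 'screw1', 'y': 'screw1'}]
```

Because the identity property holds of exactly one entity, ι succeeds and y is constrained to the value of y1, which is what (3.5.8) states after beta conversion. If the context mapped y1 to bar1 instead, is_screw would filter out the only output and the result would be empty.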


3.5.2

Incremental accessibility

The account of pragmatic accessibility implemented by the axioms AX-PREV and AX-DOM discussed above explains what makes the discourse entities introduced by a speech act accessible from within the content of a speech act related to the first by prev or dom. However, it does not explain what makes such discourse entities accessible from an anaphoric expression like “sie” in utterance 2.1 of the example dialogue at the moment the expression is encountered, when the relation of that anaphoric expression to the existing intentional structure may not yet have been determined. The explanation of the ‘incremental’ accessibility of discourse referents (i.e., of what makes discourse referents accessible as the interpretation of an anaphoric expression before the core speech act whose content includes the interpretation of that anaphoric expression has been fully interpreted) relies on three hypotheses.

The first hypothesis is that a participant in a conversation starts hypothesizing that the utterances being produced by the other conversational participant are part of a new contribution to the conversation in Clark and Schaefer’s sense (i.e., part of the realization of a core speech act) as soon as they are perceived, even before the type of the new core speech act has been identified. In our example dialogue, for instance, Cnst immediately realizes that the micro conversational events in 2.1 are part of the realization of a new core speech act, even though she does not immediately know what type of core speech act it is.

The second hypothesis is that the conventional meaning K of the utterance u that generates a core speech act ce with content K′ has to be satisfied by all and only the assignments that satisfy K′. We call this second hypothesis AX-GEN:

[AX-GEN] Let p be a core speech act type. Then
∀ce, u, A, B, K, K′, f, g [ | sem(u) = K, ce: p(A,B,K′), generate(u,ce)](f,g) → (∀j,k K(j,k) ↔ K′(j,k))

Our third hypothesis is that the constituency relation ↑ and the linear precedence relation between constituents of a phrasal conversational event affect accessibility just as dom and prev do, although not in quite the same way. Our hypotheses about the way accessibility ‘flows’ through the structure of micro conversational events are fairly standard in dynamic semantics (REFERENCES) and are illustrated in Figure 3.5.1. Let us assume that u′ and u″ are the two subconstituents of u (u′ ↑ u and u″ ↑ u), and that the conventional meanings of u, u′ and u″ are α〈… π〉, β〈… π〉, and γ〈… π〉. None of these needs to be of type π, the type of propositions, but all of them are of a type 〈… π〉 that will yield an object α′π, β′π, and γ′π of type π after zero or more applications (Muskens, 1996). Let us further assume that β′π updates discourse referents x1 … xn, whereas γ′π updates discourse referents y1 … yn. The dynamic update effects of u, u′, and u″ are then as follows: (i) the dynamic update effect of u is the union of the dynamic update effects of u′ and u″ (e.g., the dynamic update effects of an utterance of A friend published her latest article in Science, [f,a,s], are the union of the updates from A friend, [f], and from published her latest article in Science, [a,s]); and (ii) the dynamic update effects of the left utterance constituent u′ are available while processing the right utterance constituent u″ (e.g., the update resulting from A friend, [f], is available while processing published her latest article in Science).


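[Figure 3.5.1: micro-dynamics. The phrasal utterance u (sem(u) = α) takes input assignment i to output i″; its left constituent u′ (sem(u′) = β) takes i to i′, introducing [x1 … xn]; its right constituent u″ (sem(u″) = γ) takes i′ to i″, introducing [y1 … yn].]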
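The two update effects (i) and (ii) depicted in Figure 3.5.1, and the decomposition that AX-DYN-MCE (stated next) requires of them, can be illustrated with the toy encoding below. This is a Python sketch under our own assumptions: assignments are dicts, DRSs are functions from an input assignment to a list of outputs, and i ⊆ k means that k extends i. The predicates and entity names modelling A friend published her latest article in Science are hypothetical, and the axiom is only checked for one input assignment over a finite domain.

```python
from itertools import product

# Hypothetical toy model for "A friend published her latest article in Science".
DOMAIN = ["ann", "art1", "science"]
FRIEND = {"ann"}
ARTICLE = {"art1"}
JOURNAL = {"science"}
PUBLISHED = {("ann", "art1", "science")}


def box(new_refs, *conds):
    """DRS [new_refs | conds] as a function from an input assignment to outputs."""
    def drs(i):
        outputs = []
        for values in product(DOMAIN, repeat=len(new_refs)):
            j = {**i, **dict(zip(new_refs, values))}
            if all(cond(j) for cond in conds):
                outputs.append(j)
        return outputs
    return drs


def seq(k1, k2):
    """Dynamic sequencing K1; K2: feed every output of K1 into K2."""
    return lambda i: [k for j in k1(i) for k in k2(j)]


def extends(small, big):
    """small ⊆ big: big agrees with small on every referent small defines."""
    return all(ref in big and big[ref] == val for ref, val in small.items())


def as_set(assignments):
    """Make assignments hashable so that sets of them can be compared."""
    return {frozenset(a.items()) for a in assignments}


def decomposition(beta, gamma, i):
    """Right-hand side of AX-DYN-MCE for input i: all j such that some k has
    i ⊆ k ⊆ j, <i,k> satisfying beta and <k,j> satisfying gamma."""
    return as_set(j for k in beta(i) if extends(i, k)
                  for j in gamma(k) if extends(k, j))


# u' = "A friend" introduces f.
beta = box(["f"], lambda j: j["f"] in FRIEND)
# u'' = "published her latest article in Science" introduces a and s; its
# condition reads j["f"], which only works because u' has already set f
# (effect (ii): the left constituent's updates are available to the right one).
gamma = box(["a", "s"],
            lambda j: j["a"] in ARTICLE,
            lambda j: j["s"] in JOURNAL,
            lambda j: (j["f"], j["a"], j["s"]) in PUBLISHED)
# u: in this toy case alpha is simply beta; gamma.
alpha = seq(beta, gamma)

outputs = alpha({})
print(outputs)  # [{'f': 'ann', 'a': 'art1', 's': 'science'}]
# Effect (i): the referents updated by u are the union of those of u' and u''.
print(set(outputs[0]) == {"f"} | {"a", "s"})  # True
# AX-DYN-MCE, checked for the empty input assignment only.
print(as_set(outputs) == decomposition(beta, gamma, {}))  # True
```

Sequencing is used here only because it is the simplest way of combining β and γ; as the text notes, the axiom deliberately leaves room for more complex operations relating a mother node’s meaning to those of its constituents.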

The constraints on the input/output assignments of MCEs are expressed by the following axiom. It requires that the assignments satisfying the propositional objects α′π, β′π, and γ′π be such that, if α is derived from β and γ, then for every pair of assignments 〈i,j〉 that satisfies α′π there is an assignment k such that i ⊆ k and k ⊆ j, 〈i,k〉 satisfies β′π, and 〈k,j〉 satisfies γ′π. We call this axiom, which formalizes our third hypothesis, AX-DYN-MCE:

[AX-DYN-MCE] ∀u, u′, u″, f, g [ | u′ ↑ u, u″ ↑ u, prev(u″) = u′, sem(u) = α〈τ1…τm,π〉, sem(u′) = β〈τ′1…τ′n,π〉, sem(u″) = γ〈τ″1…τ″p,π〉](f,g) ↔
(∀i, j, x1 … xm: α(x1)(x2)…(xm)(i,j) ↔ (∃k, y1 … yn, z1 … zp: i ⊆ k ∧ k ⊆ j ∧ β(y1)…(yn)(i,k) ∧ γ(z1)…(zp)(k,j)))

(The reason why we are not simply assuming that the type of sem(u) is derived by application from those of u′ and u″ is that, in general, more complex operations may relate the meaning of mother nodes to the meaning of their subconstituents (Poesio, To appear), even though this will not be the case in this paper.) This axiom can easily be generalized to the case in which u has a single constituent, or more than two constituents.

Let us now go back to the anaphoric expression in the example dialogue, “Sie” (udb12), referring back to “eine orangene Schraube mit einem Schlitz” (udb3). All of the analyses of the semantics of pronouns we considered require the discourse referent y1 introduced by udb3 to be accessible from udb12. Let us see how our three hypotheses ensure that this is the case. By our first hypothesis, at some point while processing the MCEs that are part of 2.1, mce10 … mce12 (Und steckst Sie …), Cnst hypothesizes that Inst is starting a new contribution (performing a core speech act), without necessarily recognizing immediately what type of speech act this is (see (3.3.1); the inference process will be analyzed in detail in Section 5):

(3.5.9) [ce3 | prev(ce3) = ce1]

AX-PREV ensures that the content of the directive ce1, jointly performed by Inst and Cnst via utterances 1.1-1.3, is accessible while performing this second core speech act ce3, which is preceded by ce1. AX-PREV constrains the interpretation of the content K′ of ce3, forcing the interpretation of y1 to be the one assigned while interpreting the content of ce1. Cnst will also hypothesize that the three MCEs in question are part of the performance of a locutionary act generating ce3:

(3.5.10) [u3 | generate(u3,ce3)]

As a result, Cnst will be able to conclude that sem(u3) is constrained by AX-GEN; and, because of AX-DYN-MCE, the sem value of any utterance which is part of the performance of u3 will be constrained to assign to the relevant discourse entity (namely, y1) the same value assigned while interpreting ce1.

3.5.3

An alternative explanation of incremental accessibility: resource situations

A different solution to the problem of the accessibility of discourse referents during the incremental interpretation of anaphoric expressions can be developed on the basis of the hypothesis, from (Poesio, 1993; Poesio, 1994a, 1994b), that anaphoric expressions receive their interpretation from a resource situation (Barwise and Perry, 1982; Cooper, 1996). In those early versions of the proposal, the idea of resource situation was implemented using Episodic Logic (Hwang and Schubert, 1993), a version of Situation Theory with many similarities to that proposed by Kratzer (1989). Poesio and Muskens (1997) proposed a revised implementation of this idea, using CDRT’s assignments instead of situations. We develop here an even simpler version of the idea, based on the assumption that all anaphoric expressions (in fact, potentially all expressions) contain an implicit variable over DRSs.¹⁶ This solution involves a small revision of the hypothesis about the semantics of pronouns discussed above, which is based on Loebner’s (1987) ‘functional’ interpretation of definite NPs and gives pronouns the type of determiners. Instead of the lexical semantics seen in (3.5.7), repeated here for convenience:

(3.5.7) sem(“Sie”) = λP. λP′. ([y | y = ιx. P(x)]; P′(y))

we could hypothesize that all anaphoric expressions contain an implicit variable over contexts, where contexts are viewed here as DRSs, and that it is this variable that supplies the value for the discourse referent. (Cf. Ginzburg and Cooper for a different but related formalization of the idea of ‘context’.)

(3.5.8′) sem(“Sie”) = λK. λP. λP′. ([y | y = ιx. K; P(x)]; P′(y))

Notice how the DRS K provides the context in which to interpret y. As before, we assume that P gets replaced by λx.[ | x = z] when y is ‘resolved’ to z, but now ‘resolution’ involves identifying both the resource situation and the discourse referent (or the property that makes the referent of the definite unique; Poesio and Vieira, 1998; Abbott, XXXX). Note also that K is used here presuppositionally: crucially, the translation is not of the form

(3.5.8″) sem(“Sie”) = λK. λP. λP′. (K; [y | y = ιx. P(x)]; P′(y))

which would make K part of the content of the core speech act to whose realization the utterance of “Sie” belongs. If we adopt this hypothesis, two interpretive steps are required for “Sie” in 2.1 of the example dialogue. First, the resource situation has to be identified; looking at (3.3.1), the only candidate is K1. (See (Poesio, 1993; Poesio, 1994) for default inference rules for identifying resource situations.) In the framework proposed in (Poesio, to appear), interpretive steps of this sort result in new phrasal utterances being introduced: i.e., a new NP utterance udb12′ is introduced, of which udb12 in (3.5.1) is a constituent, and whose sem value is obtained by applying sem(udb12) to K1:

(3.5.11) [udb12′ | sem(udb12′) = λP. λP′. ([y | y = ιx. K1; P(x)]; P′(y)), udb12 ↑ udb12′, NP(udb12′)]

16

This version of the resource situation idea is very similar to that being developed in parallel by Ginzburg and Cooper (2004).
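Returning to the two candidate translations of “Sie”, the difference between the presuppositional use of K in (3.5.8′) and the rejected alternative (3.5.8″) can be seen in the same kind of toy encoding. Again this is only a Python sketch under our own assumptions (assignments as dicts, DRSs as functions from an input assignment to a list of outputs; all entity names and helpers are hypothetical). In both versions ιx. K1; P(x) picks out the orange screw introduced by K1, but only in the (3.5.8″) version do K1’s discourse referents end up in the output, i.e., in the content.

```python
from itertools import product

# Hypothetical toy domain: two screws, only one of them orange.
DOMAIN = ["screw1", "screw2", "bar1"]
SCREW = {"screw1", "screw2"}
ORANGE = {"screw1"}


def box(new_refs, *conds):
    """DRS [new_refs | conds] as a function from an input assignment to outputs."""
    def drs(i):
        outputs = []
        for values in product(DOMAIN, repeat=len(new_refs)):
            j = {**i, **dict(zip(new_refs, values))}
            if all(cond(j) for cond in conds):
                outputs.append(j)
        return outputs
    return drs


def seq(k1, k2):
    """Dynamic sequencing K1; K2."""
    return lambda i: [k for j in k1(i) for k in k2(j)]


def iota(P, i):
    """The unique d such that the DRS P(d) has an output from i, else None."""
    witnesses = [d for d in DOMAIN if P(d)(i)]
    return witnesses[0] if len(witnesses) == 1 else None


def sem_sie_presup(K):
    """(3.5.8'): λK λP λP'. ([y | y = ιx. K; P(x)]; P'(y)).
    K only restricts the description; its referents are not passed on."""
    def take_P(P):
        def take_scope(P_prime):
            intro_y = box(["y"], lambda j: j["y"] == iota(lambda x: seq(K, P(x)), j))
            return seq(intro_y, lambda j: P_prime(j["y"])(j))
        return take_scope
    return take_P


def sem_sie_content(K):
    """(3.5.8''): λK λP λP'. (K; [y | y = ιx. P(x)]; P'(y)).
    K is merged into the content, so its referents appear in the output."""
    def take_P(P):
        def take_scope(P_prime):
            intro_y = box(["y"], lambda j: j["y"] == iota(P, j))
            return seq(K, seq(intro_y, lambda j: P_prime(j["y"])(j)))
        return take_scope
    return take_P


def same_as(ref):
    """Identity property λx. [ | x = ref] obtained by resolution."""
    return lambda x: box([], lambda j: j[ref] == x)


def is_screw(y):
    """Hypothetical scope property P'(y)."""
    return box([], lambda j: y in SCREW)


# Hypothetical resource situation K1: it introduces y1, the orange screw.
K1 = box(["y1"], lambda j: j["y1"] in SCREW, lambda j: j["y1"] in ORANGE)

print(sem_sie_presup(K1)(same_as("y1"))(is_screw)({}))   # [{'y': 'screw1'}]
print(sem_sie_content(K1)(same_as("y1"))(is_screw)({}))  # [{'y1': 'screw1', 'y': 'screw1'}]
```

In the presuppositional version the input assignment does not even need to contain y1: the referent is recovered from the resource situation rather than from the intentional structure, which is, as we read it, what makes this alternative attractive for incremental interpretation.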


While this proposal has a number of advantages, it also has a few problems: it assumes a p-underspecified interpretation of anaphoric expressions, and it uses a more complicated type for definites. For this paper, therefore, we stick to the underspecified interpretation discussed above.

3.6

Grounding and Discourse Units

A second distinguishing feature of PTT is that, unlike other theories of interpretation in context, it does not rely on the assumption that every utterance automatically becomes part of the common ground; instead, it includes an explicit formalization of the GROUNDING process (Clark and Schaefer, 1989; Traum and Hinkelman, 1992; Traum, 1994). Many of the utterances in the example dialogue were most likely intended to play a role in this process; indeed, completions themselves can be viewed as a particularly explicit form of acknowledgment, as we will see. The formalization of grounding developed in PTT is based on Clark and Schaefer’s proposal (as modified by Traum (1994)) that a conversation consists of a series of CONTRIBUTIONS, which may then be ACKNOWLEDGED, possibly implicitly (thus becoming part of the common ground), or may need further CLARIFICATIONS and REPAIRS. Poesio and Traum assume that the participants in a conversation maintain an INFORMATION STATE that consists of three main parts:
i. A private part, with information available to the participant but not introduced in the dialogue. This part includes the private beliefs and intentions of that participant, as well as hypotheses about the beliefs and intentions of other agents.
ii. A public part, called G, with the information that is assumed to be part of the common ground.
iii. A semi-public part, consisting of the information introduced by contributions which have not yet been acknowledged. This information is not yet grounded, but it is accessible.
Following (Traum, 1994), in PTT we use the term DISCOURSE UNIT (DU) to refer to a contribution; hence, the semi-public part of the information state consists of a set of discourse units.
Again as in (Traum, 1994), PTT assumes that the grounding process is controlled by a particular type of dialogue control act, called GROUNDING ACTS: every utterance is interpreted as either Initiating a new DU, Continuing a DU, Acknowledging a DU, or performing a Repair. Briefly, grounding works as follows. At any point in a conversation, a conversational participant may begin a new contribution, i.e., initiate a new DU; this new DU gets added to the semi-public part of the information state. This is the case, for example, with 1.1 and 2.1 in the example dialogue. In 1.2, Cnst simultaneously acknowledges the part of the new contribution which has already been introduced, grounding it (i.e., adding its contents to G), and adds new material to the contribution. (Notice that the acknowledgment here is implicit.) In 1.3, Inst performs what Clark and Wilkes-Gibbs called a Refashioning of the contribution: adding new material, which may also lead to a revision (e.g., to the choice of a new screw). (In PTT, this type of operation is viewed as a type of Repair.) Finally, in 1.4, Cnst Acknowledges the remaining part of the contribution, which is then fully grounded, and accepts the directive specified by the full contribution.¹⁷ The accept itself is viewed in PTT as a second contribution. In 2.1, Inst (implicitly) grounds the accept by initiating a new contribution (a new directive).

17

Notice that in PTT Acknowledgments (a type of grounding act) and Accepts (a backward-looking act resulting in the speaker assuming the obligation to perform a certain action) are distinct.


As in (Poesio and Traum, 1997; Matheson, Poesio, and Traum, 2000), we assume that discourse units are dynamic propositions about the discourse situation, i.e., DRSs containing the type of information about conversational events (micro conversational events, core speech acts, and other types of dialogue acts) that we have discussed in the rest of this section. In fact, in recent work we adopted the more radical position that each of the updates to a discourse situation discussed earlier in this section constitutes a discourse unit; here, however, we stick with the position adopted in (Poesio and Traum, 1997), in which all the updates of the discourse situation related to the process of grounding a particular core speech act are part of the same discourse unit. In Poesio and Traum (1998), it was also proposed that both G and the information state itself are DRSs. For example, in the version of the theory developed in (Poesio and Traum, 1997; Matheson, Poesio and Traum, 2000), the information state after interpreting and grounding (i.e., adding to G) the first contribution in (3.1.1), and after interpreting the second sentence (i.e., creating a DU, DU2, for it) but before grounding it, would be as follows:

(3.6.1) [DU1, DU2 |
  DU1 = [ce1, K1 | K1 = [x,w,s | engine(x), Avon(w), s: at(x,w)],
    u1: utter(A, “there is an engine at Avon”),
    ce1: assert(A,B,K1), generate(u1,ce1), dom(ce1) = dsp1],
  G = [ce1, K1 | K1 = [x,w,s | engine(x), Avon(w), s: at(x,w)],
    u1: utter(A, “there is an engine at Avon”),
    ce1: assert(A,B,K1), generate(u1,ce1), dom(ce1) = dsp1],
  DU2 = [ce2, K2 | K2 = [y,u,s′ | boxcar(y), s′: hooked-to(u,y), u = x],
    u2: utter(B, “it is hooked to a boxcar”),
    ce2: assert(B,A,K2), generate(u2,ce2), dom(ce2) = dsp1, prev(ce2) = ce1]]

There are two main reasons for claiming that the information state is itself a DRS. First, all grounding acts are implicitly anaphoric, in that they refer to particular DUs, as we will see below.
Second, in Compositional DRT the modifications to G and to the DUs resulting from grounding acts can be modeled quite simply as updates to the values of discourse markers. For example, Poesio and Traum (1998) formulate the effect of an Acknowledgment on the information state as replacing the previous value of G with a new DRS which is the merge of G and DU1:

(3.6.2) G += DU1

Here, however, we will use a modal operator G to assert that a particular DU is grounded,¹⁸ for a number of reasons, one of which is that this formulation is closer to that adopted in more recent work by Traum (REFS), in which grounding is not viewed as an all-or-nothing affair but as a matter of degree. This type of theory is more easily modeled using one or more modal operators to specify grounding. We therefore propose to formalize the state of the discourse situation resulting from the first contribution in (3.1.1) being grounded, while the second one still is not, as follows (ignoring micro conversational events). (The reader should compare this version with (3.2.4) and (3.6.1).)

(3.6.3) [DU1, DU2 |
  DU1 = [ce1, K1 | K1 = [x,w,s | engine(x), Avon(w), s: at(x,w)],
    u1: utter(A, “there is an engine at Avon”),
    ce1: assert(A,B,K1), generate(u1,ce1), dom(ce1) = dsp1],
  G(DU1),
  DU2 = [ce2, K2 | K2 = [y,u,s′ | boxcar(y), s′: hooked-to(u,y), u = x],
    u2: utter(B, “it is hooked to a boxcar”),
    ce2: assert(B,A,K2), generate(u2,ce2), dom(ce2) = dsp1, prev(ce2) = ce1]]

18

G is related to MK, discussed in Section 4.

If we ignore grounding at the level of micro conversational events, we obtain the following information state for the example dialogue after the first contribution (the directive jointly produced by Inst and Cnst in 1.1-1.3) and the second (the acceptance produced by Cnst in 1.4) have been grounded, and the directive in 2.1 has been interpreted as DU3 but not yet grounded:

(3.6.4) [DU1, DU2, DU3 |
  DU1 = [ce1, K1 | K1 = [x,z,e | screw(x), orange(x), slit(z), has(x,z), e: grasp(Cnst,x)],
    ce1: directive(Inst,Cnst,K1), dom(ce1) = dsp1],
  G(DU1),
  DU2 = [ce2 | ce2: accept(Cnst,ce1)],
  G(DU2),
  DU3 = [ce3, K2 | K2 = [e′ | e′: put-through(Cnst,x,y)],
    ce3: directive(Inst,Cnst,K2), dom(ce3) = dsp1, prev(ce3) = ce1]]

The modal operator G expresses a stronger form of mutual knowledge. We will not provide a full axiomatization here, but we will assume the following:

[AX-G-1] ∀DU G(DU) → MK(DU)

In previous work on PTT, grounding acts were recorded in the discourse situation just like other types of dialogue acts, which, however, led to an interesting problem: explaining why information about the occurrence of grounding acts, unlike other information in the discourse situation, never seemed to require grounding. The problem, already present in Clark and Schaefer’s work, was resolved in Traum (1994) and Poesio and Traum (1998) by introducing a simple fix: grounding acts were themselves automatically grounded. Here, we adopt a different solution: ‘grounding acts’ are simply operations on the information state. Specifically:
i. performing an Init(DU) simply means introducing a new DU in the information state;
ii. performing a Cont(DU) means adding to an existing DU;
iii. Acknowledging a DU means asserting G(DU);
iv. Repairing a DU means replacing that DU with a second one. (The first DU gets removed from the list of DUs to be grounded, and replaced by the second.)
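The four operations i-iv can be read as a small state machine. The Python sketch below is only an illustration under our own simplifying assumptions: DUs are named lists of condition strings, G(DU) is recorded as membership in a set, and acknowledgments apply to whole DUs (so the partial acknowledgment in 1.2 is not modelled). The class and method names, and the name DU1a for the contribution before its refashioning, are hypothetical. The trace replays the grounding history of the example dialogue and ends with DU1 and DU2 grounded and DU3 pending, as in (3.6.4).

```python
from dataclasses import dataclass, field


@dataclass
class InformationState:
    """Toy information state: DUs (name -> conditions) plus the set of
    DUs for which G(DU) has been asserted."""
    dus: dict[str, list[str]] = field(default_factory=dict)
    grounded: set[str] = field(default_factory=set)

    def init(self, du: str, conditions: list[str]) -> None:
        """Init(DU): introduce a new, still ungrounded DU."""
        if du in self.dus:
            raise ValueError(f"{du} already exists")
        self.dus[du] = list(conditions)

    def cont(self, du: str, conditions: list[str]) -> None:
        """Cont(DU): add material to an existing DU."""
        self.dus[du].extend(conditions)

    def ack(self, du: str) -> None:
        """Acknowledge(DU): assert G(DU)."""
        if du not in self.dus:
            raise KeyError(du)
        self.grounded.add(du)

    def repair(self, old: str, new: str, conditions: list[str]) -> None:
        """Repair: replace `old` by `new` among the DUs still to be grounded."""
        if old in self.grounded:
            raise ValueError(f"{old} is already grounded")
        del self.dus[old]
        self.dus[new] = list(conditions)

    def pending(self) -> list[str]:
        """DUs introduced but not yet grounded (the semi-public part)."""
        return sorted(du for du in self.dus if du not in self.grounded)


state = InformationState()
state.init("DU1a", ["ce1: directive(Inst,Cnst,K1)"])              # 1.1 Inst
state.cont("DU1a", ["screw(x)"])                                  # 1.2 Cnst
state.repair("DU1a", "DU1", ["ce1: directive(Inst,Cnst,K1)",      # 1.3 Inst
                             "screw(x)", "orange(x)", "slit(z)", "has(x,z)"])
state.ack("DU1")                                                  # 1.4 Cnst
state.init("DU2", ["ce2: accept(Cnst,ce1)"])                      # 1.4 Cnst
state.ack("DU2")                                                  # 2.1 Inst
state.init("DU3", ["ce3: directive(Inst,Cnst,K2)", "prev(ce3) = ce1"])  # 2.1 Inst

print(sorted(state.grounded))  # ['DU1', 'DU2']
print(state.pending())         # ['DU3']
```

Because grounding here is just membership in a set, nothing about the grounding acts themselves is stored inside the DUs, which mirrors the point of treating them as operations on the information state rather than as dialogue acts that would in turn need grounding.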


3

Completions and shared plans

We have developed two analyses of the example dialogue. We begin by presenting what we think of as a more mainstream, ‘intentional’ analysis, based on hypotheses about the role of intentions and cooperation in communication developed in the AI, linguistics, philosophical, and psychological literature over the past thirty years. We then discuss a second analysis based on the more recent proposals developed by Garrod and Pickering, which build on recent psychological results about interpretation and production in dialogue (Pickering and Garrod, 2004). In this section and the next we discuss the intentional analysis; the alignment analysis is presented in Section 6.

3.1

Coordination, shared intentions, partial shared plans: a look at existing paradigms

3.1.1

Assumptions: intention, cooperation, coordination, discourse plans

Most work on dialogue in Artificial Intelligence and Philosophy over the last thirty or so years has been based on three main assumptions. The first assumption is that communication involves a great deal of intention recognition: the production of utterances is motivated by (implicit and explicit) intentions, and in order to communicate felicitously agents must recognize other agents’ intentions, even when they do not intend to help these agents achieve them. This hypothesis, generally associated with Grice (Grice 1969, Grice 1975, later revised in Grice 1991), has been the foundation of most theories of dialogue in Artificial Intelligence (e.g., the work of Allen and Perrault (1980), Cohen and Levesque (1990a, 1990b), Grosz and colleagues (Grosz and Sidner, 1986; Grosz and Kraus, 1990) and Sadek (Sadek et al., 1994)), in Philosophy (e.g., work by Bratman (1992) or Tuomela (2000)) and in Psychology (Clark, 1992, 1996; Bara, XXXX), and was thoroughly examined in the seminal book Intentions in Communication (Cohen, Morgan and Pollack, 1990).¹⁹ The theories of intentions developed in AI are typically also concerned with how such intentions can be achieved: hence many such theories, and particularly theories of intention recognition, are formulated in terms of plans to achieve a particular goal. Our own account will be formulated in this way as well. The hypothesis that intention recognition is central to communication is usually supplemented by two further hypotheses. Much work on dialogue in Artificial Intelligence makes the further assumption that, at least in some contexts, communicating agents do not simply recognize other agents’ intentions; they are also cooperative, in the sense that they attempt to help other agents achieve their intentions even when these are not explicitly expressed.
For example, a genuinely helpful clerk at a ticket counter in a train station will not simply answer the question “From which platform does the train to Montreal leave?” by naming the platform if he or she knows that the train has been cancelled. This view is central both to Herbert Clark’s theory of dialogue and to Allen’s and Sadek’s theories of intention recognition in dialogue systems (Allen and Perrault, 1980; Sadek, 1992). The theory of dialogue developed by Herbert Clark and summarized in (Clark, 1992, 1996) is based on the view that conversation is a form of joint activity, just like playing football or playing in an orchestra: i.e., an activity driven by joint intentions, in which agents need to coordinate. Clark uses “joint activities” as a foundational stratum on which he then bases coordination on content or process, the setting up of common ground, signalling, the establishment of (complex) joint projects, communication using parallel tracks, and so on. Clark defines a joint project quite simply as “… a joint action, projected by one of its participants and taken up by the others” (Clark, 1996, p. 191). The notion of joint project or shared plan has been further developed in Artificial Intelligence, especially by Grosz and Sidner (1990) and by Grosz and Kraus (1996).

19

In recent years, the role of intentions in communication has also become of interest to developmental psychologists (REFS) and neuroscientists (Bara et al., 2004).

Taken together, these three assumptions lead to the view on which our first analysis is based: that what happens in dialogue, particularly task-oriented dialogue, can be explained in terms of the joint intentions of dialogue participants and the shared plans developed to achieve them. In the case of the construction dialogues in the BTPC, this view can be summarized as follows. Inst and Cnst have both shared and private domain plans. Inst’s and Cnst’s private domain plans overlap to a certain degree with the shared plan; the differences between the shared plan and the private plans may lead to discrepancies and negotiations (e.g., when Inst adds more details to Cnst’s proposal in 1.3). Inst’s private domain plan is a fully specified plan for building the toy airplane (either instructions or a model). The shared domain plan is a partial plan to build a toy airplane, which at the beginning of a dialogue is virtually empty but gets progressively refined through the construction dialogue. (We give more details about this below.) Cnst’s private plan is a refinement of the shared domain plan, likely to include at least local further specifications based on expectations. Crucially, dialogue involves shared plans both at the domain level (how to build a toy airplane) and at the discourse level (how to convey a particular intention or plan) (Litman and Allen, 1990). For example, Inst and Cnst also share a discourse plan: that the conversation will consist of a series of instructions by Inst to Cnst aimed at building a toy airplane according to Inst’s model. This view is clearly formulated, e.g., in the following passage from Grosz and Sidner (1990, p. 418):

“Discourses may exhibit two types of collaborative behaviour: collaboration in the domain of discourse [...] and collaboration with respect to the discourse itself. Although we cannot yet define [...] ‘collaboration with respect to a discourse’, it includes not only surface collaboration (such as coordinating turns in a dialogue) or use of appropriate referring expressions ([...]) but also collaborations related to the discourse purpose. For example, the participants collaborate to ensure that the utterances of the discourse itself provide sufficient information to make possible the satisfaction of the discourse purpose.”

Completions and continuations are viewed, e.g., by Clark as some of the best evidence for cooperative behavior in dialogue (Clark, 1996, p. 238). In terms of shared plans, Cnst’s completion in 1.2 indicates that Cnst has recognized Inst’s intentions both at the domain level and at the discourse level, and one possible explanation for her decision to complete (although not the only one, as we will see below) is that she is being cooperative. Our first, ‘intentional’ account of Completions and Continuations is therefore based on the Shared Plans hypothesis. We follow here what is perhaps the best-known development of this idea, due to Bratman in his work on SHARED COOPERATIVE ACTIVITY (SCA) (Bratman, 1992) and further refined by Tuomela (2000). We briefly introduce these two theories next.

3.1.2

Bratman

Arguably the most influential modern account of intentions (private and shared) is that proposed by Bratman (1992). Bratman’s theory aims at providing a formalization of the notion of “shared cooperative activity” (SCA, p. 327) that underlies actions characterised as “being done together”, like singing a duet, painting a house or starting an attack in a basketball game. As we will see in the rest of this section and the next, we think that a strong case can be made that completions and continuations are also examples of shared cooperative activity. Bratman identifies three main characteristics of SCAs: mutual responsiveness, commitment to the joint activity, and commitment to mutual support. The conditions that, according to Bratman, must hold in order for us to be said to perform a shared cooperative activity J are listed in (4.1.1). First, we need individual intentions to J together ((1)(a)(i) and (1)(b)(i)). Second, you and I must have these intentions because of the other’s intentions ((1)(a)(ii), (1)(b)(ii)). Third, (1)(c) is intended to exclude forced cooperation, e.g., the “Mafia sense” (1992, p. 333) of doing something together. Fourth, the intentions must hold for the matching subplans involved, and they must be stable. Finally, we must mutually know (1).

(4.1.1) Our J-ing is a SCA only if
(1)(a)(i) I intend that we J.
(1)(a)(ii) I intend that we J in accordance with and because of meshing subplans of (1)(a)(i) and (1)(b)(i).
(1)(b)(i) You intend that we J.
(1)(b)(ii) You intend that we J in accordance with and because of meshing subplans of (1)(a)(i) and (1)(b)(i).
(1)(c) The intentions in (1)(a) and in (1)(b) are not coerced by the other participant.
(1)(d) The intentions in (1)(a) and (1)(b) are minimally cooperatively stable.
(2) It is common knowledge between us that (1).

Notice that this first part of Bratman’s characterization of an SCA already captures some key elements of the ‘intentional’ view of dialogue discussed in the previous section: commitment to joint activity, meshing subplans, interdependent intentions, mutual support, and connecting attitudes. Parts (ii) of (1a) and (1b) mean that, for an action to be an SCA, it is not sufficient for it to be independently intended as a joint action by me and by you; these intentions have to be causally connected. Parts (ii) also make explicit reference to the shared plans required to achieve a joint intention, via the notion of meshing subplans, described by Bratman as follows: “[...] our individual sub-plans concerning J-ing mesh just in case there is some way we could J that would not violate either of our subplans” (p. 332). An intention is minimally cooperatively stable if “there are cooperatively relevant circumstances in which the agent would retain that intention” (p. 338).
Finally, common knowledge is used in the “fixed point” sense (Fagin et al., 1995, p. 402; Tuomela, 2000, p. 78). (4.1.1) is not yet, however, a definition of SCA. According to Bratman, a further ingredient is needed to obtain such a definition: mutual responsiveness. Mutual responsiveness amounts to the following: “In an SCA each participating agent attempts to be responsive to the intentions and actions of the other, knowing that the other is attempting to be similarly responsive” (p. 328). The complete definition of SCA is provided in (4.1.2).

(4.1.2) For cooperatively neutral J, our J-ing is a SCA if and only if
(a) we J,
(b) we have the attitudes specified in (1) and (2), and
(c) (b) leads to (a) by way of mutual responsiveness (in the pursuit of our J-ing) of intention and in action.

The idea of mutual responsiveness, or more precisely its development by Tuomela into the hypothesis that participants in an SCA may perform ‘unrequited contributory actions’, provides an explanation for the performance of completions. The role played by Bratman’s notion of SCA in an intentional treatment of completions and continuations in the example dialogue can be illustrated as follows. The relevant J-ing at the stage of example dialogue (1) just preceding the completion in 1.2 is that Inst and Cnst we-intend that Cnst fix the 7-holes-bar to the fuselage construction according to Inst’s directives. The intended and mutually known meshing subplans are the domain plan for building the toy plane in front of Inst, and the discourse plan that Inst give the directives and Cnst carry out the actions. (Both plans are discussed in more detail in Section 4.2.) The necessary conditions for the action of joining wing and fuselage to count as an SCA are the following:


(4.1.3) Inst and Cnst joining wing&fuselage is a SCA only if:
(1)(a) (i) Inst intends that Inst&Cnst join wing&fuselage.
(ii) Inst intends that Inst&Cnst join wing&fuselage in accordance with and because of meshing subplans of (1)(a)(i) and (1)(b)(i).
(b) (i) Cnst intends that Inst&Cnst join wing&fuselage.
(ii) Cnst intends that Inst&Cnst join wing&fuselage in accordance with and because of meshing subplans of (1)(a)(i) and (1)(b)(i).
(c) The intentions in (1)(a) and in (1)(b) are not coerced by the other participant.
(d) The intentions in (1)(a) and (1)(b) are minimally cooperatively stable.
(2) It is common knowledge between Inst & Cnst that (1).

The definition in (4.1.4) specifies the necessary and sufficient conditions for the action to be an SCA by adding the mutual responsiveness requirement:

(4.1.4) For cooperatively neutral joining of wing&fuselage, Inst&Cnst joining wing&fuselage is a SCA if and only if:
(a) Inst&Cnst join wing&fuselage,
(b) Inst and Cnst have the attitudes specified in (1) and (2), and
(c) (b) leads to (a) by way of mutual responsiveness (in the pursuit of Inst&Cnst’s joining of wing&fuselage) of intention and in action.

We’ll discuss the meshing subplans at stake in some detail in Section 4.2.

3.1.3

Tuomela

Bratman’s (1992) concept of shared cooperative activity is not formulated in terms of a logic. Several such formalizations have appeared, most famously by Cohen and Levesque (1990a, 1990b); our own use of we-intention (IntInst&Cnst) in the rest of the paper will be based on the reconstruction of Bratman’s theory developed by Tuomela (2000, p. 385). According to Tuomela, there are several problems with Bratman’s (1992) account of plan-based joint action. One problem Tuomela identifies is that the meshing requirement should not be integrated into the content of we-intention. Rather, it should be taken as an entailed conceptual presupposition along the lines of ‘If we have a joint intention to see to it that p, then as a rule we also have meshing subplans concerning p’. In addition, according to Tuomela, condition (1)(c) is too strong: coercion should be admitted as long as a subject’s intentional agency is not completely bypassed. His stripped-down characterization of the we-intention going into the construction of the wing&fuselage join at the point at which the first completion takes place can be informally characterized, as a first approximation, in the following way:

(4.1.5) Inst and Cnst we-intend that Cnst join wing and fuselage
is equivalent to:
It is Inst’s and Cnst’s mutual knowledge 20 that Inst intends that Cnst join wing&fuselage because Cnst intends that Cnst join wing&fuselage; and Cnst intends that Cnst join wing&fuselage because Inst intends that Cnst join wing&fuselage.

The two clauses cover much of clauses (1a) and (1b) of (4.1.3), the because providing the ‘meshing’ required by Bratman, but without explicit mention of plans and of coercion.

20

In many contexts mutual knowledge will be too strong. We will however stick to mutual knowledge for the sake of this paper.

In Tuomela’s notation, (4.1.5) is expressed as follows, where ‘because’ gets translated by the reason relation /r, which is factual in the sense that X /r Y → X & Y:

(4.1.6) IntInst&Cnst(join(Cnst, W&F)) ↔ MK((IntInst(join(Cnst, W&F)) /r IntCnst(join(Cnst, W&F))) and (IntCnst(join(Cnst, W&F)) /r IntInst(join(Cnst, W&F))))

According to (4.1.6), the we-intention IntInst&Cnst that Cnst join W&F amounts to mutual knowledge that each agent’s intention that Cnst join W&F is caused by the other agent’s intention that this be done. We said above that agents’ disposition to help, exemplified by completions, is meant to be covered in Bratman’s proposal by the notion of mutual responsiveness, which, however, is not fully developed. Tuomela (2000, p. 107) provided a more explicit explanation of the fact that Inst gets extra help from Cnst in formulating a directive for which there cannot yet be a SCA. To do this, Tuomela developed an alternative to Bratman’s definition of IntInst&Cnst. This new definition is extremely complex and runs to several pages, so we cannot discuss it fully here (see Appendix B); however, we note that it contains the following condition (2000, p. 95): “[...] the participants are also assumed to be disposed willingly to perform unrequired contributory actions [our italics], thus being disposed to incur extra costs (this being rational as long as the costs of performing them are less than the gross gains accruing from their performance)”. The notion of unrequired contributory action underlies our ‘intentional’ account of completions. Briefly, the idea is that if Cnst and Inst have we-intentions and the associated shared plans, Cnst can infer that a screw is needed at the point of the dialogue we are analyzing. She is therefore able to produce an unrequired contributory action by producing a completion, if circumstances demand it. We will return to this shortly.
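As a rough illustration of how (4.1.6) can be checked mechanically, the Python sketch below encodes intentions and the factual reason relation /r, and models mutual knowledge crudely as a set of facts taken to be known by both agents. All names (Int, Reason, we_intend, worth_unrequired_contribution) are hypothetical, and the sketch is neither Tuomela’s own formalism nor a model of mutual knowledge proper; it also encodes the cost/gain proviso on unrequired contributory actions quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int:
    """An individual intention: `agent` intends that `action` be done."""
    agent: str
    action: str


@dataclass(frozen=True)
class Reason:
    """The reason relation X /r Y: intention x is held because of intention y."""
    x: Int
    y: Int


def reason_holds(r, intentions):
    # /r is factual: X /r Y implies that both X and Y actually hold.
    return r.x in intentions and r.y in intentions


def we_intend(a, b, action, intentions, mutually_known):
    """Sketch of (4.1.6): a and b we-intend `action` iff it is mutually known
    (here: listed in `mutually_known`) that each one's intention is held
    because of the other's, and both reason facts actually hold."""
    ia, ib = Int(a, action), Int(b, action)
    required = [Reason(ia, ib), Reason(ib, ia)]
    return all(r in mutually_known and reason_holds(r, intentions) for r in required)


def worth_unrequired_contribution(cost, gross_gain):
    # Rationality proviso on unrequired contributory actions:
    # rational as long as the cost is less than the gross gain.
    return cost < gross_gain


if __name__ == "__main__":
    join = "join(Cnst, W&F)"
    intentions = {Int("Inst", join), Int("Cnst", join)}
    both_ways = {Reason(Int("Inst", join), Int("Cnst", join)),
                 Reason(Int("Cnst", join), Int("Inst", join))}
    one_way = {Reason(Int("Inst", join), Int("Cnst", join))}
    print(we_intend("Inst", "Cnst", join, intentions, both_ways))  # True
    print(we_intend("Inst", "Cnst", join, intentions, one_way))    # False
```

With both reason facts mutually known the check succeeds; dropping either direction, or either agent’s underlying intention, makes it fail, mirroring the reciprocity built into (4.1.6).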
Tuomela’s formalization of helping is of course related to the traditional notion of cooperativeness, as formalized, e.g., in Cohen and Levesque’s ‘Cooperative’ axiom (Cohen and Levesque, 1990b) or in Sadek’s theory (Sadek, 1992). The distinctive aspect of Tuomela’s notion of help is that it is embedded in joint intentions and actions. Tuomela argues that participants acting on the basis of a cooperative attitude must be disposed to “strong” helping (p. 249). Helping in the “full” sense means ‘helping in all circumstances in which help is contributive to the others’ part-performances’. It is exactly this reliance on part-performances (eventually making up a SCA) that distinguishes Tuomela’s notion of help from Cohen and Levesque’s (1990b) and Sadek’s (1992), which capture the taking over of some agent’s intention. In the rest of the paper, we will use Tuomela’s IntInst&Cnst predicate to express we-intentions, adopting the formalization in (4.1.6).


3.2

An intentional analysis of the example dialogue: first pass

In order to keep the presentation manageable, we present our intentional analysis of the example dialogue in two phases. In this section we make a first pass, concentrating on the role played by we-intentions, shared plans, and private intentions, according to the Bratman/Tuomela framework just introduced, but ignoring how utterances are incrementally interpreted and how propositional content is assembled. In the next section we present a fuller account considering issues such as semantic interpretation and grounding. As said above, our intentional analysis is based on the assumption that at the beginning of the dialogue Instructor and Constructor have a we-intention that Constructor assemble a toy airplane identical to the one that the Instructor has:

(4.2.1) IntInst&Cnst([x | toy-airplane(x), assemble(Cnst, x)])

The Instructor (henceforth, Inst) has a complete plan for assembling the toy airplane, shown in Figure 4.2.1. The dialogue is driven by the goal of making this plan a shared plan by discussing it with the Constructor (henceforth, Cnst), so that Cnst can then execute the relevant actions. For the purposes of this paper, we’ll assume that the case in which Inst receives no instructions, just a completed model of the airplane, can be subsumed under this case as well. 21

Fig 4.2.1: Inst’s private plan before the conversation (plan tree rooted in ‘Assemble toy airplane’, with subplans ‘Assemble fuselage’, ‘Assemble wing’, and ‘Join wing and fuselage’; the legible leaves include grasp(5h-bar), grasp(3h-bar), align wing and fuselage, join(5h-bar, 3h-bar, screw1), grasp(screw1), and grasp(orange-screw-with-slit)).

A second assumption is that Inst and Cnst develop a shared plan to achieve this we-intention of assembling the toy airplane. This shared plan is initially highly underspecified and simply specifies that Inst and Cnst are going to assemble a toy airplane; but as the conversation progresses, the shared plan gets progressively more refined as each subplan becomes an SCA as well. (Note that the agreed-upon actions in the shared plan are immediately executed, mixing planning and execution, unlike, for example, in the TRAINS conversations.) At the point at which Inst begins utterance 1.1 in (1), the state of Cnst’s assembly is as shown in Figure 2.1, repeated here for convenience as Figure 4.2.1.

21

An arguably more plausible view of what is happening in these conversations is that Inst doesn’t start by looking at the model and deriving a complete plan for its construction: arguably, he/she develops his/her private plan incrementally as well. We have not seen any evidence showing that adopting this more plausible, but also much more complex, model would require changing our account of completions.

Figure 4.2.1: Instructor's and Constructor's respective situations at the beginning of 1.1

Cnst has built the tail and the rear part of the fuselage. She has already picked up a 7-hole bar which is to become the ‘wing’, has laid it across the ‘fuselage’, and has aligned the wing’s and the fuselage’s holes, which will make it possible for a fixing mechanism to join wing and fuselage. Each of these steps has been acknowledged by Inst and Cnst. Given this state of assembly, the partial shared plan at this point is presumably similar to that shown in sketchy form in Figure 4.2.2. The parts of Inst’s private plan devoted to the assembly of fuselage and wing are now shared (in fact, they have already been executed). Now Inst and Cnst have a we-intention that Cnst join wing and fuselage; this we-intention is indicated in italics as it is currently at the top of the agenda. (More details on the shared plan in a moment.)

Fig 4.2.2: The shared plan at the point the completion takes place (plan tree rooted in ‘Assemble toy airplane’, with subplans ‘Assemble fuselage’, ‘Assemble wing’, and ‘Join wing and fuselage’; the legible leaves include grasp(5h-bar), grasp(3h-bar), align wing and fuselage, join(5h-bar, 3h-bar, screw1), and grasp(screw1)).

At this point in the dialogue Cnst has a private plan as well, a refinement of the partial shared plan in Fig. 4.2.2. This private plan probably already contains the information that she will need to get a screw in order to join wing and fuselage, as sketchily shown in Figure 4.2.3. However, Cnst has lots of spare parts left, above all nine screws which she can in principle use for the join, as shown in Figure 4.2.4; hence, she cannot refine her own private plan any further.

Figure 4.2.3: Cnst’s private plan (nodes: Assemble toy airplane; Assemble fuselage; Assemble wing; Join wing and fuselage).

Figure 4.2.4: Screws available to Cnst at the point when 1.1 is uttered. The intended screw is the only orange screw with a slit.

In the Tuomela-derived logical notation introduced earlier, the currently active we-intention to join W&F can be expressed as follows:

(4.2.2) IntInst&Cnst(join(Cnst, W, F))

As shown in (4.1.6), repeated below for convenience, Inst and Cnst having a we-intention to join wing and fuselage is, in Tuomela’s framework, equivalent to their mutually knowing that each of them has an intention to do this action because of the other agent’s intention to do it:

(4.1.6) IntInst&Cnst(join(Cnst, W&F)) ↔ MK((IntInst(join(Cnst, W&F)) /r IntCnst(join(Cnst, W&F))) and (IntCnst(join(Cnst, W&F)) /r IntInst(join(Cnst, W&F))))

In order to join two objects, Cnst needs, first of all, to find a screw and a nut. She must then stick the screw through the holes and screw it into the nut in order to produce a stable join of three bars (what we will call here the Wing&Fuselage-join). This partial domain plan, which we assume to be shared, can be summarized as follows: 23

(4.2.3) join(Agent, Bar1, Bar2, Bar3)
step: align(Agent, Bar1, Bar2, Bar3, Hole1, Hole2, Hole3)
step: grasp(Agent, Screw)
step: grasp(Agent, Nut)
step: put-through(Agent, Screw, Hole1, Hole2, Hole3)
step: screw-into(Agent, Screw, Nut)

23

We are using a very simplified view of plans here, ignoring the distinction between preconditions (conditions that have to hold in order for the plan to be executable) and steps of the plan proper. Normally, having the Nut and the Screw would be considered to be a precondition. (This difference is not unlike that between presupposition and assertion.)

As in (Asher and Lascarides, 2003), we assume that the relation between intentions and actions (here formalized in terms of plans) is non-monotonic: i.e., that a plan such as (4.2.3) is a default way of executing a particular action. (Asher and Lascarides’s formalization of intentions and actions does not, however, involve the notion of plans.) This can be formalized as follows, where we use an unspecified normally involves operator to state that the conclusions are non-monotonic: 24

(4.2.4) join(Agent, Bar1, Bar2, Bar3) normally involves b & c & d & e & f, where
(b) align(Agent, Bar1, Bar2, Bar3, Hole1, Hole2, Hole3),
(c) grasp(Agent, Screw),
(d) grasp(Agent, Nut),
(e) put-through(Agent, Screw, Hole1, Hole2, Hole3),
(f) screw-into(Agent, Screw, Nut)

24

(4.2.4) is meant to be a neutral notation which could be reconstructed in different nonmonotonic formalisms. In SDRT, the relation between intentions and plans would be expressed by a conditional, and normally involves would be Asher and Morreau’s > operator. In PTT, defaults are inference rules, formulated using Brewka’s (1990) prioritized version of Reiter’s (1980) Default Logic.

In the conversations of the Bielefeld corpus, things are a bit more complex, in two respects: (i) actions are performed by Cnst under Inst’s instruction; and (ii) actions are immediately executed. As it is not our goal here to provide an account either of the interleaving of planning and execution, or of the interleaving of discourse plans and domain plans, we simplify matters by assuming that in this domain, shared plans combine discourse actions, domain actions, and execution. For the particular example under discussion, the resulting plan is assumed to be as shown in (4.2.5). According to this plan for Cnst to join wing and fuselage, in order to jointly perform the action currently we-intended, Inst has to produce the directives triggering these actions; each of these results in actions by Cnst; and at the end, Cnst must communicate that she completed the actions demanded.

(4.2.5) join(Cnst, W, F) normally involves b & c & d & e & f & g, where
(b) 1. directive(Inst, Cnst, align(Cnst, W, F, Hole1, Hole2)), 2. align(Cnst, W, F, Hole1, Hole2), 3. tell(Cnst, Inst, aligned(Cnst, W, F, Hole1, Hole2))
(c) 1. directive(Inst, Cnst, grasp(Cnst, Screw)), 2. grasp(Cnst, Screw), 3. tell(Cnst, Inst, grasped(Cnst, Screw))
(d) 1. directive(Inst, Cnst, grasp(Cnst, Nut)), 2. grasp(Cnst, Nut), 3. tell(Cnst, Inst, grasped(Cnst, Nut))
(e) 1. directive(Inst, Cnst, put-through(Cnst, Screw, Hole1, Hole2)), 2. put-through(Cnst, Screw, Hole1, Hole2), 3. tell(Cnst, Inst, put-through(Cnst, Screw, Hole1, Hole2))
(f) 1. directive(Inst, Cnst, screw-into(Cnst, Screw, Nut)), 2. screw-into(Cnst, Screw, Nut), 3. tell(Cnst, Inst, screwed-into(Cnst, Screw, Nut))
(g) tell(Cnst, Inst, joined(Cnst, W, F))
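A minimal Python sketch, assuming plan steps can be represented as plain strings, of how the interleaved plan in (4.2.5) can be generated from the domain steps of (4.2.4): each domain step is wrapped in a directive by Inst, its execution by Cnst, and a report by Cnst, and a final report (g) closes the plan. The names DOMAIN_STEPS, term and interleaved_plan are hypothetical, and the defeasible character of normally involves is not modeled; the decomposition is simply taken as given.

```python
# (verb, arguments after the agent, verb used in Cnst's report)
DOMAIN_STEPS = [
    ("align", ("W", "F", "Hole1", "Hole2"), "aligned"),
    ("grasp", ("Screw",), "grasped"),
    ("grasp", ("Nut",), "grasped"),
    ("put-through", ("Screw", "Hole1", "Hole2"), "put-through"),
    ("screw-into", ("Screw", "Nut"), "screwed-into"),
]


def term(functor, *args):
    """Render a first-order term such as grasp(Cnst, Screw)."""
    return f"{functor}({', '.join(args)})"


def interleaved_plan(inst, cnst, steps):
    """Return (label, substep, action) triples in the order of (4.2.5)."""
    plan = []
    for label, (verb, args, report) in zip("bcdef", steps):
        act = term(verb, cnst, *args)
        plan.append((label, 1, term("directive", inst, cnst, act)))  # Inst directs
        plan.append((label, 2, act))                                   # Cnst executes
        plan.append((label, 3, term("tell", cnst, inst, term(report, cnst, *args))))
    # (g): Cnst reports that the whole join has been carried out.
    plan.append(("g", 1, term("tell", cnst, inst, term("joined", cnst, "W", "F"))))
    return plan


if __name__ == "__main__":
    for step in interleaved_plan("Inst", "Cnst", DOMAIN_STEPS):
        print(step)
```

Keeping the domain steps as data and generating the directive/execution/report triples makes explicit that (4.2.5) adds the same discourse wrapper to every step of (4.2.4).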

We will assume throughout these two sections that intention recognition for Cnst amounts to recognizing the directives and performing the required actions. 25 Bratman’s and Tuomela’s theories of intention make the stronger assumption that a we-intention to bring about a W&F join distributes over the conjuncts: i.e., having a we-intention to bring about an action entails we-intentions for all parts of the plan. With this assumption, we would then also derive from (4.2.1) and (4.2.5)(c) the following:

(h) IntInst&Cnst(directive(Inst, Cnst, grasp(Cnst, Screw)))
(i) IntInst&Cnst(grasp(Cnst, Screw))
(j) IntInst&Cnst(tell(Cnst, Inst, grasped(Cnst, Screw)))
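The distributivity assumption can be pictured as a recursive walk down the plan: a we-intention towards an action is inherited by every sub-action that achieves it. The Python sketch below is hypothetical (function and table names are ours); it starts from the we-intention to join wing and fuselage in (4.2.2) and, for brevity, attaches only the three sub-actions of step (c) of (4.2.5) directly to it, so it yields (h), (i) and (j) plus the starting we-intention.

```python
def distribute(group, action, plan):
    """All we-intentions entailed by Int_group(action) under distributivity.

    `plan` maps an action to the list of its sub-actions; actions with no
    entry are treated as primitive.
    """
    result = [f"Int{group}({action})"]
    for sub in plan.get(action, []):
        result.extend(distribute(group, sub, plan))
    return result


# Only step (c) of (4.2.5) is included, for brevity.
PLAN = {
    "join(Cnst, W, F)": [
        "directive(Inst, Cnst, grasp(Cnst, Screw))",
        "grasp(Cnst, Screw)",
        "tell(Cnst, Inst, grasped(Cnst, Screw))",
    ],
}

if __name__ == "__main__":
    for we_intention in distribute("Inst&Cnst", "join(Cnst, W, F)", PLAN):
        print(we_intention)
```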

Notice that even with this assumption, Inst and Cnst would still have different private plans and different private intentions: Inst, seeing the fully built airplane model on his side, subscribes to

(4.2.6) IntInst(directive(Inst, Cnst, grasp(Cnst, orange slit-screw))) 26

whereas for Cnst we have

(4.2.7) IntCnst(grasp(Cnst, Screw)).

(The difference between (4.2.6) and (4.2.7) is due to the fact that grasp(Cnst, Screw) can be satisfied in Cnst’s situation in various ways, given the nine screws she has.) The point is that if we make the further assumption that IntInst&Cnst is distributive, we get (4.2.7) irrespective of any inferences due to the verbal exchange: in other words, with this assumption, Cnst already wants to grasp a screw before Inst starts his directive. (Cnst’s intention can be satisfied because there are screws on her side of the screen.) The only difference between the two plans would then be that Cnst doesn’t know which particular elements have to be used. We will remain non-committal on whether Cnst derives (4.2.7) from distributivity of we-intentions or from the directive and cooperativity. We will now consider how an intentional account explains what may have caused Cnst to produce a completion in 1.2, focusing at this stage exclusively on the motivations for the interventions rather than on the interpretation. Given that IntInst(directive(Inst, Cnst, grasp(Cnst, Screw))) is the next step in the interleaved discourse/domain plan in (4.2.5) to achieve (4.2.2), we may ceteris paribus assume that by uttering Well, now you take Inst has started the production of the next directive in this shared plan. What about Cnst’s contribution? According to the mutually intended step (c) of (4.2.5), Cnst should by default wait until Inst has fully produced his directive, and only then do the grasping. But she intervenes. So we must explain:
a. the cause for the completion, including the information used for the intervention;
b. how the old mutual plan changes, and what the new mutual plan looks like;
c. the relation between old and new mutual plan.
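To make concrete why (4.2.7) underdetermines the screw while (4.2.6) does not, the Python sketch below counts the screws on Cnst’s side that would satisfy each content. The inventory is hypothetical: Figure 4.2.4 only fixes that there are nine screws and that exactly one of them is orange with a slit, so all other colors and slits are invented for illustration, as are the function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Screw:
    ident: str
    color: str
    slit: bool


# Hypothetical inventory: nine screws, exactly one orange with a slit;
# every other attribute value is invented.
SCREWS = [Screw("s1", "orange", True)] + [
    Screw(f"s{i}", color, slit)
    for i, (color, slit) in enumerate(
        [("orange", False), ("red", True), ("red", False), ("blue", True),
         ("blue", False), ("yellow", True), ("yellow", False), ("green", False)],
        start=2,
    )
]


def any_screw(s):
    # Content of (4.2.7): grasp(Cnst, Screw), so any screw will do.
    return True


def orange_slit_screw(s):
    # Content of (4.2.6), unpacked via the meaning postulate
    # orange slit-screw(x) := screw(x) ∧ orange(x) ∧ ∃y(slit(y) ∧ with(y,x)).
    return s.color == "orange" and s.slit


def satisfiers(description, inventory):
    """Screws on Cnst's side that would satisfy an intention with this content."""
    return [s.ident for s in inventory if description(s)]


if __name__ == "__main__":
    print(len(satisfiers(any_screw, SCREWS)))      # 9
    print(satisfiers(orange_slit_screw, SCREWS))   # ['s1']
```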

25

Intention recognition could also be formalized in such a way as to bypass the directive recognition stage, simply assuming that Cnst recognizes Inst’s intention that Cnst perform a particular action. An account along these lines would not, however, differ from the one we are adopting for our purposes.

26

We are not being very precise concerning notation here. Quantificational formulae are sometimes represented as constants for convenience’s sake.

Let us start with the cause for the completion and the information used for the intervention.

1: The step in the plan which is we-intended at this point in the conversation is join(Cnst, W, F).

2: Inst begins to perform directive(Inst, Cnst, grasp(Cnst, orange-slit-screw)).

3: Now, before the directive is completed, and possibly even without being prompted by some sort of request for help, Cnst ‘jumps in’. (As we saw in Section 2, there is some evidence that in this particular example a request for completion may have been made using prosody, but the matter is not entirely clear, and such signals are not encountered with all completions.)

4: Clearly, Cnst has been trying to compare her expectations (based on the plan she assumes to be shared) with the actions she observes (as proposed, e.g., in the MOSAIC model (Wolpert et al., 2003)), and she has been doing this incrementally. As a result, even before Inst’s utterance of the directive is completed, Cnst has already related the incomplete utterance she is observing to the next step in the shared plan, directive(Inst, Cnst, grasp(Cnst, Screw)). We will discuss in more detail how this can happen in the next section, in which the PTT account of incremental semantic interpretation in this example is presented. As we said above, in a theory in which we-intentions are assumed to distribute over the subactions of plans, recognizing this directive is predicted to be fairly straightforward, as intention (4.2.6) would already be near the top of Cnst’s agenda. Otherwise, some sort of search through the space of possible plans to achieve (4.2.2) must be assumed.

5: By comparing the part of the directive that has already been produced by Inst with the directive she expects, Cnst can hypothesize that what’s missing so far is an utterance of the fact that the object to be grasped is a screw.
(There is no more information about this in the shared plan, and as there are nine screws, there are nine ways to make the plan more specific, so there is no way for Cnst to make the recognition more precise, except by guessing.) We see at least four ways of explaining why Cnst follows up this recognition with a decision to utter “a screw”:
a. Responding to a Request: Cnst may have interpreted 1.1 (more precisely, its prosodic lengthening) either as a Request to perform a Continue (DU), or as a Request to Acknowledge the DU. In either case, in PTT it is assumed that the result is an obligation: obl(Cnst, cont(DU)) or obl(Cnst, ack(DU)).
b. Voluntary coordination-level control: Even without having interpreted Inst’s utterance as a request, Cnst may nevertheless intend to signal her understanding of the directive, i.e., to acknowledge Inst’s directive. Acknowledging a directive cannot be done simply by repeating the part already uttered: this might be interpreted simply as a partial acknowledgment of only the part of the directive already performed. Hence, Cnst acquires the (private) intention to perform the missing part of the directive (as above). However, the directive is still considered by Cnst as performed by Inst only.
c. Cooperativeness: Cnst intends to help Inst to perform the directive: directive(Inst, Cnst, grasp(Cnst, Screw)). We can formalize this as Cnst making performing the directive into a shared plan: 27

IntCnst&Inst(directive(Inst&Cnst, Cnst, grasp(Cnst, Screw))).
This leads to Cnst performing an “unrequired contributory action”: performing the part of the directive that is missing. As a result, Cnst assumes the private intention: IntCnst(utter(“a screw”)).
d. Blurting out: Cnst feels that Inst is taking too long to perform a simple directive. As soon as Cnst recognizes the directive that Inst is trying to perform, the plan (in terms of utterance actions) Inst is using to do so, and what is still missing for the completion of this plan, Cnst acquires the intention to utter the missing part, without necessarily making the directive into a joint action, and without necessarily intending to acknowledge Inst’s contribution. (Although both intentions could be there.)

We feel all of these are valid explanations of what may have happened, but we do not have enough space to pursue all of them in detail. In a way, explanation d. is the most interesting, but also the most difficult to formalize, whereas explanations a. and b. are examples of the type of interaction most closely studied in previous work on PTT, particularly in (Matheson et al., 2000). In the next section we will therefore concentrate on providing a detailed account of the Cooperativeness explanation, (c).

6: After the completion, as a result of Inst and Cnst’s coordinated action, we get a complete directive: Well, now you take a screw. However, there is a difference between this directive and the version in Inst’s private plan: IntInst(directive(Inst, Cnst, grasp(Cnst, orange slit-screw))). Inst may react to Cnst’s contribution by
a. accepting the completion (either because the complete speech act reflects Inst’s initial intention, or because Inst considers the new directive a valid alternative);
b. rejecting it;
c. performing a literal resumption;
d. paraphrasing;
e. refashioning it, by modifying some aspects of it.
In this particular case, Inst seems to be performing a refashioning (Clark and Wilkes-Gibbs XXXX). [DEFINITION NEEDED – BUT DO THIS AFTER WE’RE DONE WITH SECTION 5. IN PTT, REFASHIONING = REPAIR] Inst knows that orange slit-screw ⊂ screw; this can be captured by the following meaning postulate: orange slit-screw(x) := screw(x) ∧ orange(x) ∧ ∃y(slit(y) ∧ with(y,x)). So he accepts some aspects of the contribution, but adds more material to ensure the right screw is identified. [I AM NOT SURE THAT WHAT FOLLOWS IS STILL CONSISTENT WITH WHAT WE SAY IN SECTION 5, WE’LL HAVE TO CHECK ONCE WE’RE DONE WRITING THAT UP.]

7: Inst’s intention is to refashion the directive currently under elaboration into the more specific directive:
IntInst(directive(Inst, Cnst, grasp(Cnst, screw(x) ∧ orange(x) ∧ ∃y(slit(y) ∧ with(y,x))))).
[E.G. HERE THE PTT ASSUMPTION WOULD BE THAT WHAT GETS COMPARED IS THE UTTERANCE PLAN] Again, Inst compares the directive already produced with his intended directive. A screw has already been produced; the difference between Cnst’s ∃x(screw(x)) and Inst’s ∃x(screw(x) ∧ orange(x) ∧ ∃y(slit(y) ∧ with(y,x))) is ∃x(orange(x) ∧ ∃y(slit(y) ∧ with(y,x))). We get the more economical IntInst(directive(Inst, Cnst, grasp(Cnst, orange(x) ∧ ∃y(slit(y) ∧ with(y,x))))), yielding an orange one with a slit.

27

This notion of cooperativeness is different from Cohen and Levesque’s in two respects. First of all, Cnst is deriving a joint intention from a shared plan, instead of deriving a private intention from Inst’s plan. Secondly, as we will see in Section 5, Cnst is adopting an intention to perform part of a plan whose execution would satisfy an obligation of Inst, instead of performing an action from scratch. (See discussion of Tuomela’s strong notion of helping in the Appendix.)

[THIS IS PROBABLY UNNECESSARY] As for the other types of reaction, Accepting is what happens when IntInst(p) = IntCnst(q). In this case, Yes, Mhm, OK or a literal resumption are produced. If IntInst(p) ≠ IntCnst(q) and not(p |= q), then No or other signals of denial are produced.
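The choice among accepting, refashioning and rejecting discussed above can be sketched as follows, under two simplifying assumptions of ours: contents are flattened into sets of atomic conditions (dropping the quantifier structure of ∃x and ∃y), and p |= q is approximated by p containing every condition of q. The function name react is hypothetical, and treating refashioning as the remaining case (p ≠ q but p |= q) is our reading of the discussion of 1.3, not a rule stated in PTT.

```python
def react(intended, produced):
    """Classify Inst's reaction; contents are sets of atomic conditions.

    Entailment p |= q is approximated by p containing all conditions of q.
    Returns the reaction and the conditions Inst still has to add.
    """
    p, q = frozenset(intended), frozenset(produced)
    if p == q:
        return "accept", frozenset()
    if p >= q:
        # Inst keeps what was produced and adds only the missing material.
        return "refashion", p - q
    return "reject", frozenset()


INST = {"screw(x)", "orange(x)", "slit(y)", "with(y,x)"}  # orange slit-screw
CNST = {"screw(x)"}                                      # "eine Schraube"

if __name__ == "__main__":
    kind, extra = react(INST, CNST)
    print(kind, sorted(extra))  # refashion ['orange(x)', 'slit(y)', 'with(y,x)']
```

The added conditions correspond to the content realized by an orange one with a slit in 1.3.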

Notice how performing the type of reasoning discussed here requires the ability to talk about incomplete speech acts, i.e., a theory such as PTT. We next show how PTT allows us to make the account just presented more precise.


4

An Intentional Analysis of the Example Dialogue

In this section we give a partial analysis of the example dialogue, utterance unit by utterance unit, 28 on the basis of the grammar given in the CDRT fragment in Appendix B. Unlike SDRT, PTT does not come with a complete theory of the inferential processes performed in response to a particular unit. Hence, in the analysis of the example, we will distinguish between the minimal set of inferences that are clearly predicted by the theory, and a second set of inferences that arguably should take place.

4.1

Discourse situation updates up to the point just before 1.1

We discussed in Section 4 how, according to the intentional analysis, a conversation like the one from which fragment (1) is extracted is driven by Inst and Cnst’s we-intention to build a toy plane according to the model given to Inst, which in the simplified view adopted here becomes the private plan in Figure 4.2.1. By way of his instructions, Inst turns the private plan into a progressively more specified shared plan interleaving discourse and domain actions, as in the example in (4.2.5). As a result of Cnst’s performing the domain actions in this shared plan, Inst and Cnst reach the situation whose relevant aspects are depicted in Figure 4.2.1 (state of assembly) and Figure 4.2.4 (leftover screws). In PTT, performing a directive amounts to performing a contribution which, ultimately, amounts to a series of micro-updates as in (3.4.5), each of which has to be properly grounded. The discourse situation just before 1.1 is a record of both core speech acts and micro conversational events performed up to that point.

4.2

Production of 1.1: So, jetzt nimmst Du ….

We saw in Section 4.2 that at this point in the dialogue, Inst and Cnst we-intend to join two specific objects, which we will call here wing1 and fuselage1. 29 In PTT notation, and using DRSs as intentional contents, the presence of (4.2.2) in the common ground is expressed as the inclusion among the conditions characterizing the discourse situation of the following:

(5.2.1) i1.1a:IntInst&Cnst([e1 | e1:join(Cnst, wing1, fuselage1)])

We also saw that in the BTPC conversations, such goals are achieved by following plans along the lines of (4.2.5), whose next step (step (c)) is a directive by Inst to Cnst to grasp a screw. In PTT terms, performing such a directive amounts to planning (i.e., intending):

(5.2.2) i1.1b:IntInst([ce1.1, K1.1 | K1.1 = [e2 | e2:grasp(Cnst, orange-slit-screw)], ce1.1:directive(Inst, Cnst, K1.1)])

A couple of remarks about (5.2.2). Notice, first, that embedded DRSs are essential to formulate this intention. Secondly, notice that Inst’s intention is very likely to be about a specific screw (represented here as the constant orange-slit-screw), instead of the more general intention to grasp an x which is a screw (in (5.2.2’)), which can be expected to be in the shared plan. (That the intention is about a specific screw is suggested, e.g., by the subsequent repair.)

(5.2.2’) i1.1b’:IntInst([ce1.1, K1.1a | K1.1a = [x, e | screw(x), e:grasp(Cnst, x)], ce1.1:directive(Inst, Cnst, K1.1a)])
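The relation between the specific content in (5.2.2) and the non-specific one in (5.2.2’) can be illustrated with a brute-force embedding check over flat DRSs, written in Python. This is a simplified sketch rather than CDRT: the DRS class and subsumes function are hypothetical, conditions are atomic tuples, event conditions such as e:grasp(Cnst, x) are written with the event as first argument, and we add, as an assumed background fact following from the meaning postulate for orange slit-screw, that the constant orange-slit-screw denotes a screw.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class DRS:
    universe: tuple     # discourse referents introduced by the box
    conditions: tuple   # atomic conditions as (predicate, arg1, arg2, ...)


def subsumes(general, specific, background=()):
    """True if some assignment of `general`'s referents to terms occurring in
    `specific` (or the background facts) turns every condition of `general`
    into a condition of `specific` or a background fact."""
    facts = set(specific.conditions) | set(background)
    terms = sorted({arg for cond in facts for arg in cond[1:]})
    for values in product(terms, repeat=len(general.universe)):
        g = dict(zip(general.universe, values))
        if all((c[0],) + tuple(g.get(a, a) for a in c[1:]) in facts
               for c in general.conditions):
            return True
    return False


# Embedded contents of (5.2.2) and (5.2.2'); e:grasp(Cnst, x) is written
# as the condition ("grasp", e, "Cnst", x).
K1_1 = DRS(("e2",), (("grasp", "e2", "Cnst", "orange-slit-screw"),))
K1_1a = DRS(("x", "e"), (("screw", "x"), ("grasp", "e", "Cnst", "x")))
# Assumed consequence of the meaning postulate for orange slit-screw.
POSTULATE = (("screw", "orange-slit-screw"),)

if __name__ == "__main__":
    print(subsumes(K1_1a, K1_1, POSTULATE))  # True
    print(subsumes(K1_1a, K1_1))             # False
```

With the background fact, the non-specific content embeds into the specific one (x ↦ orange-slit-screw, e ↦ e2); without it, or in the opposite direction, it does not.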

28

As in (Poesio and Traum, 1997; Poesio and Traum, 1998), we use the term utterance unit to refer to basic units of dialogue processing, corresponding roughly to prosodic units.

29

We remind the reader that in CDRT, unlike standard DRT, there are regular constants; wing1 and fuselage1 are such constants, and so is orange-slit-screw.

Note also that if we assume that we-intentions distribute over the plan in (4.2.5), Cnst could derive (5.2.2’) from the non-specific joint intention in the shared plan:

(5.2.2’’) i1.1b’’:IntInst&Cnst([ce1.1, K1.1a | K1.1a = [x, e | screw(x), e:grasp(Cnst, x)], ce1.1:directive(Inst, Cnst, K1.1a)])

Assuming that intentions to perform domain actions lead to the adoption of domain plans to achieve such intentions via von Wright’s (REFS) Practical Syllogism, 30 (5.2.2) leads to Inst’s plan of performing an utterance generating the directive in the sense discussed in Section 3.1, example (3.1.5). To simplify matters, we assume here, as in (Poesio and Traum, 1997), that this amounts to intending to perform an utterance with the content of the directive as conventional meaning. (This intention is dominated by i1.1b, hence ce1.1 is accessible via AX-DOM, as explained in Section 3.3.)

(5.2.3) i1.1c:IntInst([u1.1 | utterance(u1.1), sem(u1.1) = K1.1, generate(u1.1, ce1.1)])

(5.2.3) is the starting point of the generation process. Up until now the only result of using PTT seems to have been an unnecessary complication of the representation; but at this point, PTT’s hypothesis that the discourse situation contains micro conversational events (events of uttering sub-sentential constituents) begins to do some work. This is because the MCE hypothesis gives us a way of representing the plan that Inst develops to achieve (5.2.3). First of all, if we view utterances as actions, we can view utterance planning as the same type of activity involved in other forms of planning (the assumption being, of course, that specialized planners, called SURFACE REALIZERS in natural language generation, would be responsible for this task). In PTT, performing an utterance with the conventional meaning represented by K1.1 is viewed as an action.
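As a hypothetical illustration of the dominance relations mentioned here (i1.1c dominated by i1.1b, and, in the plan discussed next, i1.1d dominated by i1.1c), the Python sketch below treats a referent introduced in an intention as accessible from every intention that intention transitively dominates. It is only in the spirit of AX-DOM; the actual axiom is the one given in Section 3.3, and the table and function names are ours.

```python
# name: (referents introduced, dominating intention or None)
INTENTIONS = {
    "i1.1b": (("ce1.1", "K1.1"), None),
    "i1.1c": (("u1.1",), "i1.1b"),
    "i1.1d": (("u1.2", "u1.3", "u1.1'", "mce1", "udb1", "udb2", "udb3"), "i1.1c"),
}


def accessible(referent, intention, intentions=INTENTIONS):
    """Walk up the dominance chain from `intention` looking for `referent`."""
    current = intention
    while current is not None:
        referents, dominator = intentions[current]
        if referent in referents:
            return True
        current = dominator
    return False


if __name__ == "__main__":
    print(accessible("ce1.1", "i1.1c"))  # True: i1.1c is dominated by i1.1b
    print(accessible("u1.1", "i1.1b"))   # False: accessibility does not run downwards
```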
As shown in (3.4.5), the plan chosen by Inst for producing conventional meaning K1.1 on this particular occasion involves performing an utterance of type S, here called u1.1’. In turn, this action was achieved by performing an utterance udb1 of type VP, which in turn involved three subactions: an utterance mce1 of type V (“nimmst”), a subject utterance udb2 of type NP (“Du”), and a complement utterance udb3 of type NP (“eine orangene Schraube mit einem Schlitz”). In addition to the MCEs seen in (3.4.5), Inst also planned two further utterances. Another advantage of the micro-conversational events approach is that it does not require assuming that everything that gets uttered was planned together (i.e., as part of a big ‘sign’ in the HPSG sense). In the case of 1.1, for example, it is plausible to view at least the first “So,” and possibly even “jetzt”, as being intended to achieve goals distinct from the directive: specifically, taking the turn and keeping it, i.e., as turn-taking actions (Traum and Hinkelman, 1992; Traum, 1994; Poesio and Traum, 1997). 31 The hesitation that may or may not have occurred at the end of 1.1 would serve a turn-taking goal as well. Mainly to show how such an analysis would work, we illustrate this interpretation here for “So,” whereas we interpret the utterance of “jetzt” as an utterance of type Advp with a sentence-adjunct interpretation. Omitting many of the details already shown in (3.4.5), including lexical interpretations and many sub-utterances, and using the notation u:“word”:Cat to stand for u:utter(A, “word”), Cat(u), Inst’s plan to satisfy i1.1c in (5.2.3) becomes a plan of performing the following actions: 32

(5.2.4) i1.1d:IntInst([u1.2, u1.3, u1.1’, mce1, udb1, udb2, udb3 |
u1.2:“So”:take-turn, S(u1.1), sem(u1.1) = K1.1,
u1.3:“jetzt”:Advp, S(u1.1’), u1.3 ↑ u1.1, u1.1’ ↑ u1.1,
mce1:“nimmst”:V, VP(udb1), udb1 ↑ u1.1’, mce1 ↑ udb1,
udb2:“Du”:NP, udb3:“eine orangene Schraube mit einem Schlitz”:NP, udb2 ↑ udb1, udb3 ↑ udb1])

(Again, this intention is dominated by i1.1c.) Intentions such as i1.1d in (5.2.4) lead to Inst performing the planned action (BY WAY OF THE PRACTICAL SYLLOGISM??). This in turn is interpreted by Cnst as a contribution, as we will see in the next subsection. Inst starts executing this intention, succeeding in performing u1.2, u1.3, mce1 and udb2. It’s not clear to us whether the lengthening observed at this point is meant to indicate a problem, e.g., that Inst has forgotten which screws are unused, and therefore does not know what description would be adequate (Dale and Reiter, 1995).

30

Intend(ϕ) ∧ Bel(χ → ϕ) → Do(χ) [HANNES TO ELABORATE]

31

“So” could also be interpreted as a READY signal as in the MapTask scheme (Carletta et al., 1997).

32

We have assumed for simplicity that the entire utterance is planned at once. In a more plausible theory, the generation process would be incremental as well.

4.3

The completion: 1.2, “eine Schraube”

The process that leads Cnst to produce the completion begins when she observes Inst performing the first four micro-conversational events of his planned contribution. In the PTT account, this leads Cnst to create a number of new Discourse Units, each of which represents “material to be grounded”; one ‘micro-DU’ per utterance event is assumed. For simplicity, we represent here Cnst’s interpretation of the utterance events in 1.1 as a single new Discourse Unit, which we will call CurrDU (this is similar to what is done in (Matheson et al., 2000)), whereas the previous DU is called PrevDU. The update to the discourse situation resulting from the observation of CurrDU is shown in (5.3.1). This DU minimally contains a record of the occurrence of each utterance, together with the results of lexical access and (incremental) parsing (see (Poesio, 1999; Poesio, to appear) for details). Assuming that there are no interpretation problems, CurrDU would be a partial representation of the DU planned by Inst and shown in (5.2.4). The difference between the two is that in CurrDU as produced by Cnst, udb3 has not yet been observed, but is already expected:³³

(5.3.1) [ | CurrDU = [u1.2, u1.3, u1.1’, mce1, udb1, udb2, udb3 |
    u1.2:“so”:take-turn, S(u1.1),
    u1.3:“jetzt”:Advp, S(u1.1’), u1.3 ↑ u1.1, u1.1’ ↑ u1.1,
    mce1:“nimmst”:V, VP(udb1), udb1 ↑ u1.1’, mce1 ↑ udb1,
    udb2:“Du”:NP, NP(udb3), udb2 ↑ udb1, udb3 ↑ udb1]]

(5.3.1) is the starting point for Cnst’s semantic interpretation process. The extent to which semantic construction and speech act interpretation take place immediately is still an open question (our views are discussed in more detail in (Poesio, 1999; Poesio, to appear)). The interpretation in (5.3.1) simply encodes the results of lexical access and preliminary syntactic interpretation, both of which are known to at least begin very early (Swinney, 1979; Simpson, 1994; Frazier, 1987; OTHER REFS).
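
The relation between Inst’s planned DU (5.2.4) and Cnst’s partial CurrDU (5.3.1) can be made concrete by comparing the two event lists. This is a minimal Python sketch under simplifying assumptions, not part of PTT: the `MCE` record, the listing in `PLAN`, and the helpers `hearer_view` and `expected_but_unobserved` are hypothetical; dominance (↑) is encoded as child-to-parent pairs, and only the four lexical events named in the text are treated as observed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MCE:
    """A micro-conversational event: an utterance of category `cat`.

    `words` is None for phrasal events (whose content comes from the events
    they dominate) and for lexical events the hearer has not yet heard.
    """
    name: str
    cat: str
    words: Optional[str] = None


# Inst's plan, simplified from (5.2.4).
PLAN = [
    MCE("u1.2", "take-turn", "so"),
    MCE("u1.3", "Advp", "jetzt"),
    MCE("u1.1'", "S"),
    MCE("udb1", "VP"),
    MCE("mce1", "V", "nimmst"),
    MCE("udb2", "NP", "Du"),
    MCE("udb3", "NP", "eine orangene Schraube mit einem Schlitz"),
]
# Dominance relations u ↑ u' from (5.2.4), as (child, parent) pairs.
DOMINANCE = [("u1.3", "u1.1"), ("u1.1'", "u1.1"), ("udb1", "u1.1'"),
             ("mce1", "udb1"), ("udb2", "udb1"), ("udb3", "udb1")]


def hearer_view(plan, observed):
    """CurrDU as in (5.3.1): unheard lexical events survive only as a category."""
    return [MCE(e.name, e.cat) if e.words and e.name not in observed else e
            for e in plan]


def expected_but_unobserved(plan, observed, dominance):
    """Lexical events that are expected but not yet heard, with their parent."""
    parent = dict(dominance)
    return [(e.name, e.cat, parent.get(e.name))
            for e in plan if e.words and e.name not in observed]


OBSERVED = {"u1.2", "u1.3", "mce1", "udb2"}

if __name__ == "__main__":
    for event in hearer_view(PLAN, OBSERVED):
        print(event)
    print(expected_but_unobserved(PLAN, OBSERVED, DOMINANCE))
```

The only gap is udb3, an NP dominated by the VP udb1, which corresponds to NP(udb3) in (5.3.1). Substituting Cnst’s own plan (5.3.5) for `PLAN` gives one way of reading the structural alignment discussed later in this section.
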
There is also evidence that observing the occurrence of an event of uttering an anaphoric NP such as a pronoun is sufficient to start the processes by which such expressions are interpreted, as discussed in (Poesio, 1999; Poesio, 2001). How much more interpretation takes place? As discussed in Section 4, under the intentional account the fact that a completion takes place is taken as an indication that Cnst somehow manages to recognize Inst’s intention to perform a directive, which we represented in that section in the simplified form IntInst(directive(Inst, Cnst, grasp(Cnst, Screw))); this is what prompts her to utter “a screw.” A more complete PTT representation of the result of the first of these inferential processes is shown in (1), whereas the intention to utter “a screw” would be represented as in (2).

(1) IntInst([ce1.1, K1.1a | K1.1a=[x,e|screw(x), e:grasp(Cnst,x)], ce1.1:directive(Inst,Cnst,K1.1a)])
(2) IntCnst([u | u:utter(Cnst,“a screw”)])

Note that it is hard to tell from the dialogue whether Cnst has already identified a particular screw, so we will not assume anything in this respect here. (There is an additional complication, due to the hypothesis that what gets added to the information state are contributions, i.e., Discourse Units, and that inference itself is modeled in terms of DUs, as shown in the rest of this section.) We discuss two hypotheses concerning the way Cnst reaches conclusion (1). As for the way recognition of (1) leads to the adoption of intention (2), we already saw in Section 4 that this can be explained in a number of ways, and several of these alternative hypotheses seem equally plausible, so that choosing among them would amount to mind reading. We will nevertheless attempt to show that a number of such hypotheses can be expressed in terms of PTT, whereas they would be difficult or perhaps impossible to formulate in alternative frameworks.

One hypothesis concerning the way Cnst may reach conclusion (1) is based on the operation of EXISTENTIAL CLOSURE proposed by Pickering, Chater and Milward (1995). According to Pickering et al., existential closure takes place every time a new input is perceived; its purpose is to produce propositions out of partial syntactic interpretations, so that the new input can be immediately evaluated against the current situation.

33 We assume here that Inst and Cnst have the same mental grammar.
In PTT terms, the existential closure hypothesis amounts to hypothesizing that Cnst attempts to derive a proposition as the conventional meaning of u1.1 even when the only information at her disposal is what is presented in (5.3.1), by existentially closing the missing argument. More precisely, the hypothesis is that as a result of observing the contribution represented in (5.3.1) as CurrDU, Cnst hypothesizes a new DRS K1.1b as the argument of an intention attributed to Inst: K1.1b is the proposition that an event of the Constructor taking an object x1, yet to be determined, takes place. In PTT, the result of all inferences performed by Cnst on the basis of (5.3.1), such as existential closure, is viewed as material to be grounded separately, i.e., as a separate DU; for simplicity, however, we will assume here that all such inferences are added to the same DU, CurrDU. We use the notation K += K’ to indicate this, defined as follows:

K += K’: Let K be a proposition-valued discourse referent, and K’ be a proposition (DRS) or a proposition-valued discourse referent. Then
    K += K’ =def [TmpK | TmpK = K]; [ | K = TmpK; K’]
(where TmpK is an unused proposition-valued discourse referent).

Using this notation, the result of existential closure is the update of Cnst’s view of the discourse situation shown in (5.3.2).

(5.3.2) CurrDU += [K1.1b | K1.1b = [e1,x1 | e1:grasp(Cnst,x1)]]

The recognition by Cnst that an action of grasping something is being discussed is the basis for her next inference, the recognition that the grasping action in DU1.1e is the first step in performing the action currently we-intended. Recall that under the intentional account we are discussing, Inst and Cnst are operating under the shared plan in (4.2.5). Under the current account of the way Cnst concludes (1), inferring (5.3.2) leads Cnst to infer that Inst is performing the second directive (c.1) in the shared plan in (4.2.5) for bringing about the required joint.
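
The K += K’ update and the existential closure step in (5.3.2) can be sketched operationally. This is a minimal Python sketch, assuming that a DRS is reduced to a tuple of referents and a tuple of condition strings, and that sequencing (;) is approximated by a merge that presupposes no referent clashes; TmpK is modeled by a local variable. `DRS`, `DiscourseSituation`, `plus_eq` and `existentially_close` are hypothetical names, and CurrDU is abbreviated to two of the events in (5.3.1).

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class DRS:
    refs: tuple = ()
    conds: tuple = ()

    def seq(self, other):
        """K ; K' approximated as a merge: union of referents and conditions."""
        refs = self.refs + tuple(r for r in other.refs if r not in self.refs)
        conds = self.conds + tuple(c for c in other.conds if c not in self.conds)
        return DRS(refs, conds)

    def __str__(self):
        return f"[{','.join(self.refs)}|{', '.join(self.conds)}]"


class DiscourseSituation:
    """Values of proposition-valued discourse referents such as CurrDU."""

    def __init__(self):
        self.value = {}

    def plus_eq(self, k, k_prime):
        """K += K' =def [TmpK | TmpK = K] ; [ | K = TmpK ; K']."""
        tmp_k = self.value.get(k, DRS())      # TmpK = K (a fresh local)
        if isinstance(k_prime, str):          # K' may itself be a referent
            k_prime = self.value[k_prime]
        self.value[k] = tmp_k.seq(k_prime)    # K = TmpK ; K'


def existentially_close(event, pred, args):
    """Close missing (None) arguments of pred with fresh referents x1, x2, ..."""
    fresh = (f"x{i}" for i in itertools.count(1))
    filled = [a if a is not None else next(fresh) for a in args]
    new_refs = [f for f, a in zip(filled, args) if a is None]
    return DRS((event, *new_refs), (f"{event}:{pred}({','.join(filled)})",))


ds = DiscourseSituation()
ds.value["CurrDU"] = DRS(("mce1", "udb2"), ("mce1:'nimmst':V", "udb2:'Du':NP"))

# (5.3.2): CurrDU += [K1.1b | K1.1b = [e1,x1 | e1:grasp(Cnst,x1)]]
k11b = existentially_close("e1", "grasp", ["Cnst", None])
ds.plus_eq("CurrDU", DRS(("K1.1b",), (f"K1.1b={k11b}",)))

if __name__ == "__main__":
    print(ds.value["CurrDU"])
```

The partial VP “nimmst Du …” supplies grasp with its agent but not its object, so closure introduces x1, and the resulting K1.1b is recorded in CurrDU rather than in a separate DU, matching the simplification adopted in the text.
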
(The acceptance by Cnst of directive (c.1) will result in Cnst’s adopting the intention of performing the action indicated by the directive, (c.2) in (4.2.5).)³⁴ This inference leads Cnst to update her view of CurrDU with (1), i.e., with the attribution to Inst of a directive whose content is the DRS K1.1a, which is the sum of K1.1b and of [ | screw(x)], as shown in (5.3.3). (Notice that this intention is the intention discussed above in (5.2.2’).)

(5.3.3) CurrDU += [i1.1b’ | i1.1b’:IntInst([ce1.1, K1.1a | K1.1a=[e,x|screw(x), e:grasp(Cnst,x)], ce1.1:directive(Inst,Cnst,K1.1a)])]

(As discussed in Section 4, we leave undetermined here whether this private intention attributed to Inst is derived directly from we-intention (h), itself derived from (4.2.5) via distributivity of we-intention.) An alternative analysis of the inference process that leads Cnst to conclude (5.3.3) is that Cnst is able to recognize that Inst is performing directive (c.1) in (4.2.5) directly from the lexical meaning of mce1 in (5.3.1), which is a mention of a grasping action. In other words, the mention of a grasping action alone is sufficient to make step (c.1) of the shared plan in (4.2.5) active, without the need to go through the existential closure in (5.3.2). Although the end result would again be (5.3.3), this second hypothesis has the advantage of avoiding the need to stipulate that existential closure takes place after each and every utterance; and there is increasing evidence for this type of ‘surface’ semantic reasoning [REFS? THE PAPER BY FILIK ET AL?]. As discussed in Section 3, there are at least four, and probably more, ways of explaining what prompted Cnst to perform a completion: responding to a request, voluntary coordination control, cooperativeness, and ‘blurting out’. All four explanations we mention in Section 3 assume that Cnst recognizes (5.3.1) as a partial plan for performing the contribution in (5.3.3).

Cooperativity. As seen in Section 4, the cooperativity explanation is that Cnst has decided to turn the directive into a joint action.
34 In the ‘standard’ version of PTT, as presented, e.g., in (Matheson, Poesio, and Traum, 2000), adopting this intention would be a way of addressing the obligation raised by the directive. As already said above, we are not concerned with obligations here.

Revising a contribution in this way is a sort of refashioning (Clark and Wilkes-Gibbs, 1990, pp. 481-486) of the DU that represented the contribution. Refashioning can be formalized in terms of the system of grounding actions proposed in (Traum, 1994) as a repair grounding act: Cnst proposes to replace CurrDU, according to which Inst alone performs the directive, with NewDU (shown in (5.3.4)), in which the directive is a joint action of Inst and Cnst:

(5.3.4) [i1.2a | i1.2a:IntCnst([NewDU, ce1.2 | NewDU = [ce1.1, K1.1a | K1.1a=[e,x|screw(x), e:grasp(Cnst,x)], ce1.1:directive(Inst&Cnst,Cnst,K1.1a)], ce1.2:repair(Cnst,CurrDU,NewDU)])]

After deciding to turn the directive into a joint action, Cnst has to decide how best to complete it, i.e., which additional utterance actions to perform in order to generate ce1.1 as described in (5.3.4). Again, Cnst may have determined which utterances are missing from (5.2.4) in a number of ways. The simplest explanation is that a sort of structural alignment is taking place: Cnst makes her own plan for performing the directive in (5.3.4), and then compares this plan with the actions she observed. Cnst’s plan to perform the directive is a series of utterances very much like those in (5.2.4), except that it includes the generic “eine Schraube” instead of the more specific NP “eine orangene Schraube mit einem Schlitz”, so the conventional meaning of u1.1 is K1.1a, not K1.1. Only part of this series of utterances was executed; we assume that the plan does not include a take-turn utterance, although it does include an utterance of an adverbial. (Other alternatives are of course conceivable.) This plan is shown in (5.3.5).

(5.3.5) [u1.3, u1.1’, mce1, udb1, udb2, udb3 |
    S(u1.1), sem(u1.1) = K1.1a,
    u1.3:“jetzt”:Advp, S(u1.1’), u1.3 ↑ u1.1, u1.1’ ↑ u1.1,
    mce1:“nimmst”:V, VP(udb1), udb1 ↑ u1.1’, mce1 ↑ udb1,
    udb2:“Du”:NP, udb3:“eine Schraube”:NP, udb2 ↑ udb1, udb3 ↑ udb1]

Having produced this plan / syntactic structure, Cnst would compare it with the actions she has observed Inst performing. (The assumption that participants in a conversation always compare their expectations with what they observe is at the basis of Wolpert, Doya and Kawato’s (2003) ‘MOSAIC’ model of language comprehension, and also appears to be broadly consistent with Pickering and Garrod’s (2004) ‘alignment’ model; more on this model below.) Cnst then decides to include the missing NP in the new contribution she is planning, NewDU (cf. Wolpert; also this is a bit like Pickering and Garrod):

(5.3.6) NewDU += [ | udb3:“eine Schraube”:NP]

(The fact that Cnst decides to utter an NP instead of producing a complete utterance such as, say, “Ich nehme eine Schraube” (‘I take a screw’) might also be explained in terms of cooperativity: reusing as much as possible of Inst’s output would make ce1.1 a genuinely joint action.) If existential closure were assumed, alignment at the domain level (rather than at the syntactic one) could also be used to explain how Cnst plans her contribution to the joint directive. That is, Cnst might have decided to utter an NP by comparing K1.1a, the content of the jointly intended directive in (5.3.4), with K1.1b, the proposition derived by means of existential closure from the partial directive that Inst has performed (see (5.3.2)). Cnst would then conclude that K1.1b should be augmented with K1.1d:

(5.3.7) K1.1d = [x | screw(x)]

Cnst could then decide to produce an utterance with K1.1d as content, i.e., an indefinite NP, very much as in (5.3.6).

Voluntary coordination control. We will only briefly discuss the alternative explanations for the completion mentioned in Section 3.
The second hypothesis raised there is that Cnst’s uttering of “eine Schraube” is an acknowledgment of (the complete version of) Inst’s directive, as in (5.3.8), instead of a repair as in (5.3.4). The content of intention i1.2a in (5.3.8) is the proposition resulting from the concatenation of CurrDU += [ | udb3:“eine Schraube”:NP] with an update with an acknowledgment of CurrDU (we remind the reader that K += K’ is short for [TmpK | TmpK = K]; [ | K = TmpK; K’]).

(5.3.8) [i1.2a | i1.2a:IntCnst(CurrDU += [ | udb3:“eine Schraube”:NP]; [ce1.2 | ce1.2:ack(Cnst,CurrDU)])]

(Notice that CurrDU here includes the result of several inferences, as in (5.3.3).) ALTERNATIVE: acknowledge a new DU obtained by concatenating CurrDU with the completion:

(5.3.9) [i1.2a | i1.2a:IntCnst([NewDU | NewDU = CurrDU; [ | udb3:“eine Schraube”:NP]]; [ce1.2 | ce1.2:ack(Cnst,NewDU)])]

In either case, the main difference from (5.3.4) is that the directive would still be considered Inst’s action. Also, it would be the complete directive that is acknowledged, with K1.1a as content, rather than its partial form with content K1.1b (see (5.3.2)). Cnst would then have to decide how best to perform the acknowledgment. While a repetition of the partial directive would be appropriate as an acknowledgment of that part only (i.e., without making a completion), we hypothesize that it is not possible to acknowledge an entire directive merely by repeating what has already been uttered. (It would be interesting to test this experimentally.) As in the case of the cooperativity explanation, Cnst would then have to determine what is missing from Inst’s directive, either by a structural or by a domain-level match. Either way, as a result of this, Cnst would acquire an intention along the lines of (5.3.6) or (5.3.8).

Responding to a Request. As we pointed out before, it is possible that Cnst performed the completion because she interpreted the pause in 1.1 as a request. The system of grounding acts developed by Traum (1994) and adopted in PTT (see Section 3.6) includes three types of Requests that Cnst may take Inst to be performing: Request for Acknowledgment, Request for Continuation, and Request for Repair. The consequence of this interpretation is that Cnst will add to her information state an obligation to address the request; it will be this obligation, rather than a desire to be cooperative, that leads her to acquire the intentions discussed above. Apart from that, this explanation is not different from the explanations already discussed.

Blurting Out. Finally, it is possible that Cnst’s action is motivated neither by a decision to be cooperative at the domain level (i.e., without acquiring intention (5.3.4)) nor by a decision to acknowledge (as in (5.3.8)), but by the judgment that Inst is taking too long to perform the action. According to this explanation, Cnst quickly recognizes that Inst is attempting to perform (5.3.3) and then identifies what is missing by a structural or domain-level match, just as explained above. However, Cnst is also moved by other goals, such as completing the task quickly, and it is as a result of these goals that she acquires intentions (5.3.4) or (5.3.8). We do not have a full account of what these other goals might be and how they lead to the acquisition of intentions, but we do expect these intentions to be expressible in terms of the formalism given above.

4.4

The repair: 1.3, “eine orangene mit einem Schlitz”

At this point, Inst performs a continuation: he ‘accepts’ Cnst’s completion, but also augments Cnst’s description of the screw to make sure Cnst chooses the correct screw, which we have called orange-slit-screw. This expansion of the object’s description is also viewed in PTT as a refashioning, this time of the DU produced by Cnst in 1.2 (which by now has become the CurrDU). In this case, Inst suggests replacing CurrDU with a new DU in which the argument of the directive is the expanded proposition K1.3, the concatenation of K1.1a from (5.3.4) with the new information about the screw x, represented in (5.4.1) as K1.3a:

(5.4.1) [i1.3a | i1.3a:IntInst([NewDU, ce1.2 | NewDU = [ce1.1, K1.3, K1.3a | K1.3a = [ | orange(x)]; [x3 | slit(x3), has(x,x3)], K1.3 = K1.1a; K1.3a, ce1.1:directive(Inst&Cnst,Cnst,K1.3)], ce1.2:repair(Inst,CurrDU,NewDU)])]

In the PTT framework there are at least two ways to explain how Inst reaches intention i1.3a in (5.4.1). One explanation is that it is the result of plan matching: matching the directive in (5.3.4) with the directive in (5.2.2). Inst compares K1.1a (the content of directive ce1.1 as proposed by Cnst, shown in (5.3.4)) with K1.1 (the content of ce1.1 according to Inst’s original intention, shown in (5.2.2)). As a result, Inst identifies the need to produce the additional information in K1.3a. A second explanation is that Inst reaches this conclusion as the result of situation matching: Inst considers the situations that may result from the execution of the content K1.1a of the directive proposed by Cnst, and realizes that there are 9 such situations, one for each of the screws in Figure 4.2.4. (Such simple planning can be done quite efficiently, as shown, e.g., by the work on planning in the TRAINS project (Allen and Ferguson, 1994, XXXX).) As an aside, the fact that utterance unit 1.3 was actually produced in two installments [TWO PHRASE BOUNDARIES?]
suggests that Inst identifies the properties to add to the description of the screw in two steps. First, Inst realizes that the property [ | orange(x)] is required; then, that [x3 | slit(x3), has(x,x3)] is also needed. (This incremental decision process may explain why appositions are very common in dialogue, and why a treatment of appositions is essential for a theory of incremental dialogue processing.) In PTT, this two-stage process would be modeled by attributing to Inst two distinct intentions: an intention to enrich K1.1a with the information that x is orange (shown in (5.4.2)) and an intention to enrich this new description with the additional information that x has a slit, shown in (5.4.3).

(5.4.2) [i1.3a’ | i1.3a’:IntInst([NewDU, ce1.2 | NewDU = [ce1.1, K1.3, K1.3a’ | K1.3a’ = [ | orange(x)], K1.3 = K1.1a; K1.3a’, ce1.1:directive(Inst&Cnst,Cnst,K1.3)], ce1.2:repair(Inst,CurrDU,NewDU)])]

(5.4.3) [i1.3a’’ | i1.3a’’:IntInst(NewDU += [K1.3a’’ | K1.3a’’ = [x3 | slit(x3), has(x,x3)], K1.3 = K1.1a; K1.3a’; K1.3a’’])]

We will however gloss over this issue in what follows and simply treat the whole of 1.3 as a single apposition. (ELABORATED IN RIESER AND POESIO 2006?)

Dealing with this utterance requires an account of apposition. The treatment of German appositional constructions assumed here is based on the intuition that, semantically, appositions are predicates of type ⟨e,t⟩, as illustrated by the contrast between the examples in (5.4.4a) and (5.4.4b).

(5.4.4) a. Kim Smith, orphan / policeman / from Manningtree / age 35
        b. Kim Smith, *rich / *the man left without an explanation / …
        c. Kim Smith, the major / a policeman from Manningtree

What has to be explained semantically is the fact that certain types of NPs, particularly indefinites and definite NPs, can also serve as appositions, as shown by (5.4.4c). We take these to be cases of NPs whose type has been lowered to that of predicates (de Swart and Farkas, 2003), although a few constraints determined by the syntactic formalism have to be taken into account.
Syntactically, we need to explain how appositions can be integrated in the parse tree; this is one of the main reasons for assuming a syntactic formalism such as LTAG, which allows for adjunction operations. LTAG, however, restricts the possible semantic analyses of NPs used predicatively, in that only lexical categories are allowed, and the only allowed semantic operation is application. Such a framework therefore rules out solutions that derive the predicative meaning of NPs in appositions syncategorematically, say, by stipulating a structure as in Figure 5.3.1, with a semantic operation associated with NP_app that ‘lowers’ the NP-type interpretation of eine orangene, mit einem Schlitz to a predicative interpretation.

Figure 5.3.1: A syncategorematic treatment of appositions (tree with nodes NP, NBar, NBar, NP_app and NP; the NP_app node carries the type-lowering operation λPλu P(λx[ | u = x])).
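
To see what the lowering operation λPλu P(λx[ | u = x]) in Figure 5.3.1 computes, here is a minimal extensional Python sketch. It illustrates the operation under simplifying assumptions and is not the paper’s DRT-based semantics: NP meanings are modeled as generalized quantifiers (functions from predicates to truth values) over a small hypothetical domain, and the names `DOMAIN`, `POLICEMAN`, `indefinite` and `lower` are invented for the example.

```python
from typing import Callable, Hashable

Entity = Hashable
Pred = Callable[[Entity], bool]
GQ = Callable[[Pred], bool]  # an NP meaning: a set of properties

# Hypothetical toy model for (5.4.4c) "Kim Smith, a policeman ...".
DOMAIN = {"kim", "lee", "rex"}
POLICEMAN = {"kim"}


def indefinite(noun: Pred) -> GQ:
    """'a N' = λQ.∃x[N(x) ∧ Q(x)]"""
    return lambda q: any(noun(x) and q(x) for x in DOMAIN)


def lower(np: GQ) -> Pred:
    """λPλu.P(λx[u = x]): u satisfies the result iff the NP holds of 'being u'."""
    return lambda u: np(lambda x: u == x)


a_policeman = indefinite(lambda x: x in POLICEMAN)
as_apposition = lower(a_policeman)  # now a predicate, type ⟨e,t⟩

if __name__ == "__main__":
    print({u: as_apposition(u) for u in sorted(DOMAIN)})
```

For an indefinite, lowering just returns the noun’s predicate (here, being a policeman), which is what lets the NP combine with its head as an apposition. The point made in the text is that LTAG offers no non-lexical node to host this operation, which motivates the lexical treatment of eine that follows.
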

Instead, we have to stipulate that German eine is ambiguous between a normal determiner interpretation, shown in (5.4.5a), and an interpretation used in appositions, of type ⟨⟨e,t⟩,⟨⟨e,t⟩,⟨e,t⟩⟩⟩, shown in (5.4.5b):

(5.4.5) a. eine: λP’λP([u| ]; P’(u); P(u))
        b. eine_app: λP’λPλu([v| ]; P(u); P’(v); [ | u = v])

The two interpretations are also associated with the distinct elementary trees shown in (5.4.6a) and (5.4.6b):

(5.4.6) a. Elementary tree for eine: root NP, dominating Det (anchored by eine) and an N node; semantics eine: λP’λP([u| ]; P’(u); P(u))
        b. Elementary tree for eine_app: nodes N, NP_app, Det_app (anchored by eine_app), NAdj and N; semantics eine_app: λP’λPλu([v| ]; P(u); P’(v); [ | u = v])

With these assumptions, and assuming standard elementary trees and semantic interpretations for the other words in utterance 1.3, we get the following LTAG analysis for eine Schraube, eine orangene mit einem Schlitz:

(5.4.7) [NP [Det eine] [N’ [N Schraube] [NP_app [Det_app eine_app] [NAdj [NAdj orangene] [PP_Dat [P_Dat mit] [NP_Dat [Det einem] [N Schlitz]]]]]]]

whose interpretation is arrived at through the following derivation:

(5.4.8)
eine_app: λP’λPλu([v| ]; P(u); P’(v); u = v)
orangene: λPλv([ |orange’(v)]; P(v))
einem: λP’λP([u| ]; P’(u); P(u))
Schlitz: λv([ |schlitz’(v)])
einem(Schlitz): λP’λP([u| ]; P’(u); P(u))(λv([ |schlitz’(v)]))
    = λP([u| ]; [ |schlitz’(u)]; P(u))
mit: λPλx(P(λy[ |mit’(x,y)]))
mit(einem(Schlitz)): λPλx(P(λy[ |mit’(x,y)]))(λP([u| ]; [ |schlitz’(u)]; P(u)))
    = λx(λP([u| ]; [ |schlitz’(u)]; P(u))(λy[ |mit’(x,y)]))
    = λx([u| ]; [ |schlitz’(u)]; [ |mit’(x,u)])
    = λx([y| ]; [ |schlitz’(y)]; [ |mit’(x,y)])    (renaming of variables)
orangene(mit(einem(Schlitz))): λPλv([ |orange’(v)]; P(v))(λx([y| ]; [ |schlitz’(y)]; [ |mit’(x,y)]))
    = λv([ |orange’(v)]; ([y| ]; [ |schlitz’(y)]; [ |mit’(v,y)]))
eine_app(orangene(mit(einem(Schlitz)))): λPλu([v| ]; P(u); [ |orange’(v)]; ([y| ]; [ |schlitz’(y)]; [ |mit’(v,y)]); u = v)
Schraube(eine_app(orangene(mit(einem(Schlitz))))): λu([v| ]; [ |schraube’(u)]; [ |orange’(v)]; u = v; [y| ]; [ |schlitz’(y)]; [ |mit’(u,y)])
eine: λP’λP([u| ]; P’(u); P(u))
eine(Schraube(eine_app(orangene(mit(einem(Schlitz)))))):
    λP’λP([x| ]; P’(x); P(x))(λu([v| ]; [ |schraube’(u)]; [ |orange’(v)]; u = v; [y| ]; [ |schlitz’(y)]; [ |mit’(u,y)]))
    = λP([x| ]; [v| ]; [ |schraube’(x)]; [ |orange’(v)]; x = v; [y| ]; [ |schlitz’(y)]; [ |mit’(x,y)]; P(x))

Apart from the syntactic and semantic issues, the planning of (5.4.7), or, more precisely, of the appositional part of (5.4.7), proceeds just as the planning of the completion discussed in Section 5.2.
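
The derivation in (5.4.8) can be checked mechanically by encoding the lexical entries as Python closures. This is a minimal sketch under stated assumptions, not the authors’ CDRT fragment: a DRS is a pair of referent and condition tuples, ‘;’ is plain concatenation, fresh referents come from a global counter (so names like x0, v1, y2 stand in for the paper’s x, v, y), and the paper’s Schraube(eine_app(…)) is read as the NP_app meaning applied to the meaning of Schraube. The helpers `fresh`, `box`, `seq`, `indef`, `derive` and `render` are hypothetical; the lexical entries follow (5.4.5) and (5.4.8), with einem reusing the entry for eine.

```python
import itertools

_counter = itertools.count()


def fresh(prefix):
    """A new discourse referent name, e.g. x0, v1, y2."""
    return f"{prefix}{next(_counter)}"


def box(refs=(), conds=()):
    return (tuple(refs), tuple(conds))


def seq(*drss):
    """K1 ; K2 ; ... as concatenation of referents and conditions."""
    refs = tuple(r for d in drss for r in d[0])
    conds = tuple(c for d in drss for c in d[1])
    return (refs, conds)


def indef(prefix):
    """λP'λP([u| ]; P'(u); P(u)), used for both eine and einem."""
    def det(p1):
        def noun_phrase(p):
            u = fresh(prefix)
            return seq(box([u]), p1(u), p(u))
        return noun_phrase
    return det


def eine_app(p1):
    """λP'λPλu([v| ]; P(u); P'(v); [ | u = v]), as in (5.4.5b)."""
    def with_head(p):
        def pred(u):
            v = fresh("v")
            return seq(box([v]), p(u), p1(v), box(conds=[f"{u}={v}"]))
        return pred
    return with_head


eine, einem = indef("x"), indef("y")
schraube = lambda u: box(conds=[f"schraube'({u})"])
schlitz = lambda v: box(conds=[f"schlitz'({v})"])
orangene = lambda p: lambda v: seq(box(conds=[f"orange'({v})"]), p(v))
mit = lambda obj: lambda x: obj(lambda y: box(conds=[f"mit'({x},{y})"]))


def derive():
    """eine(Schraube(eine_app(orangene(mit(einem(Schlitz)))))) applied to P."""
    np_app = eine_app(orangene(mit(einem(schlitz))))
    noun_phrase = eine(np_app(schraube))
    return noun_phrase(lambda x: box(conds=[f"P({x})"]))


def render(drs):
    refs, conds = drs
    return f"[{','.join(refs)} | {', '.join(conds)}]"


if __name__ == "__main__":
    print(render(derive()))
```

Up to the order of conditions, the result matches the last line of (5.4.8): the sketch keeps mit’(v,y) together with x = v, where the paper substitutes x for v and writes mit’(x,y).
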

4.5

The acknowledgment: 1.4, “Ja” [STILL TO BE REVISED]

Ack; also accept? If accept, need different formalization from previous work (conditional): Accept(i) → intend(I&C). Action has to be performed. Modification to described situation.

4.6

2.1-2.4 briefly [STILL TO BE REVISED]

Very similar to 1.1-1.4. A crucial issue for our discussion is the fact that the pronoun sie (‘it’) in the second directive refers back to an antecedent generated in a side sequence following the pattern of Clark's “presentation” and “acceptance” phases (see Clark, 1996: 227ff), which is run through twice here. The first presentation is a screw, which is extended to an orange one with a slit. This “expansion” is in turn accepted by Cnst. The cooperatively produced antecedent is a screw, an orange one with a slit.

5

A non-intentional analysis [STILL TO BE REVISED] [AS IN BIELEFELD TALK]

5.1

The Pickering and Garrod alignment model

What would a Pickering/Garrod analysis look like? Simplest analysis: domain reasoning. But this cannot really work: too many possible actions. At the very least, must use plans to cut down on the space of states. Other possible difference: no mutual plans?? (more like Kautz?)

5.2

An analysis of the example dialogue based on alignment

[How a `Pickering and Garrod’ `simple’ analysis might work ]

6

Related literature [TO BE WRITTEN]

6.1

Dynamic Syntax (Purver, Cann and Kempson) [HANNES]

6.2

KOS (Ginzburg) [MASSIMO]

6.3

SDRT [HANNES, You can make comparisons to your heart’s content here …]

7

Discussion [STILL TO BE WRITTEN]

8

References

Bratman, M., 1992. Shared Cooperative Activity. The Philosophical Review 101: 327-341.
Bratman, M., 1993. Shared Intention. Ethics 104: 97-113.
Clark, Herbert H., 1996. Using Language. Cambridge: Cambridge University Press.
Clark, Herbert H. and Deanna Wilkes-Gibbs, 1990. Referring as a collaborative process. In Cohen, Ph. R., J. Morgan, and M. E. Pollack, eds., Intentions in Communication. MIT Press, 461-493.
Cohen, Ph. R. and H. J. Levesque, 1990a. Persistence, Intention, and Commitment. In Cohen, Ph. R. et al., eds., Intentions in Communication, 34-69.
Cohen, Ph. R. and H. J. Levesque, 1990b. Rational Interaction as a Basis for Communication. In Cohen, Ph. R. et al., eds., Intentions in Communication, 221-225.
Fagin, R., J. Y. Halpern, Y. Moses, and M. Y. Vardi, 1995. Reasoning about Knowledge. Cambridge, MA: MIT Press.
Grice, H. P., 1991. Utterer’s Meaning, Sentence-Meaning, and Word-Meaning. In Davis, S., ed., Pragmatics. New York, Oxford: Oxford University Press, 65-76.
Groenendijk, J. and M. Stokhof, 1991. Dynamic Predicate Logic. Linguistics and Philosophy 14: 39-100.
Grosz, Barbara J. and Candace L. Sidner, 1986. Attention, intentions, and the structure of discourse. Computational Linguistics 12: 175-204.
Grosz, Barbara J. and Candace L. Sidner, 1990. Plans for Discourse. In Cohen, Ph. R. et al., eds., Intentions in Communication, 417-445.
Grosz, B. and S. Kraus, 1996. Collaborative plans for complex group action. Artificial Intelligence 86: 269-357.
Kamp, Hans and Uwe Reyle, 1993. From Discourse to Logic. Dordrecht: Kluwer.
Levin, James A. and James A. Moore, 1977. Dialogue Games: Meta-communication Structures for Natural Language Interaction. ISI/RR-77-53, Information Sciences Institute, University of Southern California.
Litman, Diane J. and James F. Allen, 1987. A plan recognition model for subdialogues in conversation. Cognitive Science 11: 163-200.
Litman, Diane J. and James F. Allen, 1990. Discourse Processing and Commonsense Plans. In Cohen, Ph. R. et al., eds., Intentions in Communication, 365-389.
Mann, William C., 1988. Dialogue Games: Conventions of Human Interaction. Argumentation 2: 511-532.
Matheson, Colin, Massimo Poesio, and David Traum, 2000. Modeling grounding and discourse obligations using update rules. Proceedings of NAACL 2000.
Pickering, Martin J. and Simon Garrod, 2004. Toward a Mechanistic Psychology of Dialogue. Behavioral and Brain Sciences.
Poesio, Massimo and Reinhard Muskens, 1997. The Dynamics of Discourse Situations. Proceedings of the 11th Amsterdam Colloquium, 247-252.
Poesio, Massimo and David Traum, 1997. Conversational Actions and Discourse Situations. Computational Intelligence 13(3).
Poesio, Massimo and David Traum, 1998. Towards an Axiomatization of Dialogue Acts. Proceedings of TWENDIAL.
Poesio, Massimo and David R. Traum, eds., 2000. Proceedings of GÖTALOG 2000, Fourth Workshop on the Semantics and Pragmatics of Dialogue. Göteborg University.
Rieser, Hannes and Kristina Skuplik, 2000. Multi-speaker Utterances and Coordination in Task-oriented Dialogue. In Poesio, M. and D. Traum, eds., 143-151.
Sadek, M. D., 1992. A Study in the Logic of Intention. In Nebel, B., Ch. Rich, and W. Swartout, eds., Principles of Knowledge Representation and Reasoning: Proceedings of the Third International Conference. San Mateo, CA: Morgan Kaufmann, 462-475.
Searle, John R. and Daniel Vanderveken, 1985. Foundations of Illocutionary Logic. Cambridge: Cambridge University Press.
Traum, David R., 1999. 20 Questions on Dialogue Act Taxonomies. Amstelogue ’99.
Tuomela, R., 2000. Cooperation. Kluwer Academic Publishers.

APPENDIX A: THE ENTIRE EXAMPLE DIALOGUE
APPENDIX B: A CDRT FRAGMENT FOR THE EXAMPLE DIALOGUE [Latest version: CDRTFragment-rev4]
APPENDIX C: TUOMELA’S COMPLETE DEFINITION OF SCA