
Pacific Association for Computational Linguistics

UNIFIED REPRESENTATION OF TYPOLOGICAL, ABSOLUTE AND RELATIONAL PREDICATES

GUILLAUME PITEL, JEAN-PAUL SANSONNET
LIMSI-CNRS, BP133 F-91403 Orsay, [email protected]

In this paper, we show how to use a functional representation to implement different kinds of referential predicates (relational, absolute and typological, according to the typology proposed by Dale and Reiter). This implementation is intended to resolve extensional referring expressions in a practical dialogue system. We first show that all kinds of referential predicates are context-dependent and essentially relative, whether they rely on the type of entities, on intrinsic properties or on relational properties. We then argue that a functional approach is necessary to overcome the lack of expressiveness of predicative approaches. Indeed, phenomena like vague predicates or relational properties are very expensive to handle within a predicative approach, since the latter does not account for the matter of degree and the dynamicity that these phenomena introduce in a dialogue context. Designers of dialogue systems are thus often led to use specific heuristics for resolving relational referring expressions, and to adopt a simplified view of typological and absolute predicates. As we are interested in providing a unified framework for extensional reference resolution in a human-computer dialogue system, we detail how to use our functional representation model for referential predicates in a generic resolution algorithm.

Key words: extensional reference resolution, functional approach, practical dialogue

1. INTRODUCTION

This paper addresses the issue of extensional resolution of referring expressions in the context of a practical dialogue system. Extensional resolution of reference is defined in [Byron and Allen2002] as the search for the objects of the mediated software that are referred to by a referring expression in a user’s utterance. In existing practical dialogue systems, the process of finding the objects denoted by a referring expression is often either reduced to repeated selection based on first-order predicates, or built around specific heuristics when the former method fails, typically for relational predicates [Winograd1973]; [Dzikovska and Byron2000]. These approaches do not allow the definition of a generic framework for extensional reference resolution in practical dialogue systems. In [Pitel and Sansonnet2003], we proposed a functional representation of referential predicates together with a resolution algorithm that uses this representation. This extensional reference resolution model is an extension of the frame defined in [Salmon-Alt2001] and, more generally, is inspired by the work of [Reboul1999] and by the theory of cognitive grammar [Langacker1987]. The model aims to provide a way to describe detailed referential predicates while staying at an abstract level, thus giving access to deep structures of the mediated software within a formalism that is a direct extension of a natural language processing system.

In Salmon-Alt’s words, parts of a referring expression are considered to act as several differentiation criteria in the element selection process. These criteria are used to organize sets of entities called reference domains. Reference domains store the possible candidates for a reference at each step of the resolution process; they also express a point of view about the situation, thus defining a reference frame or specifying a differentiation criterion for future resolutions.
Contrary to Salmon-Alt’s position about the representation of differentiation criteria, we adopt in [Pitel and Sansonnet2003] a procedural approach to the meaning of referential predicates, implemented with referential extractors. In section 2 of the paper, we present several experiments and situations in order to show that all kinds of referential predicates are essentially relative (or differential) and context-sensitive. This opinion is shared by [Kyburg and Morreau2000] about vague predicates (such as tall), but whereas these authors only consider that vague references are defined relative to a comparison class, we consider that referential predicates are essentially represented by a binary function that compares two entities along a given dimension. Section 2 thus argues for using our representation model. In section 3, we propose several implementations based upon this model, supported by binary comparison functions, and show how to use it for several kinds of referential predicates. A practical situation illustrates our approach and shows how the algorithm is used to resolve a referring expression.

© 2003 Pacific Association for Computational Linguistics
PACLING'03, HALIFAX, CANADA

2. REFERENTIAL PREDICATES

2.1 Dale and Reiter typology of referential predicates

Dale and Reiter [Dale and Reiter1995] have proposed to distinguish three kinds of referential predicates depending on the kind of properties they apply to. While their proposal was originally made to deal with the generation of referring expressions, it is commonly used for interpretation too. In decreasing order of importance in a referring expression, the three categories are: typological properties (e.g. triangle, square), absolute properties (e.g. size, color) and relational properties (e.g. spatial position). While this classification is close to the syntactic categories of natural language (typological properties are nouns and intrinsic properties are adjectives, for instance), we believe that it does not justify a different account in the processing of each category of predicates.

In dialogue systems, the issue of representing objects in the mediated software and their link with referential expressions has not been widely addressed. Most dialogue models assume that they have access to an adequate representation, coherent with the way the user will refer to an object using natural language. This can be observed even in recent research, for instance [Salmon-Alt2001:149], [Byron and Allen2002] or [Dale and Reiter1995]. Typically, [Byron and Allen2002] expresses the restrictions due to the interpretation of a referring expression with a form of first-order predicate logic like (color x RED). There is however no evidence that the natural way to refer to entities is close to a particular representation of entities. Specifically, it is very hard to find the right representation for a given concept. For instance, in order to represent geometrical forms, one can either choose to represent figures as entities of type square, triangle or circle, with the properties of color and size, or to represent all figures with the same type figure, and the properties color, size and shape.
We consider that a given entity can be considered through several points of view, and thus that one cannot base an extensional reference resolution approach upon an arbitrarily chosen point of view. We thus have to consider by default that the inner representation is different from the natural language way of referring to entities’ properties. Our approach leads us to separate the representation of entities from the representation of the access to their properties and of the selection of the right entities. We thus introduce a new object, a sub-process that takes part in the process of extensional resolution of reference. We call referential extractor the dynamic entity corresponding to a referential predicate that plays a role in determining the extension of a referring expression (e.g. in “Put the red square on the right”). We have defined this category of predicate role in order to distinguish it from the roles of referential producer and referential verifier (illustrated respectively in “Create a big yellow square” and “Is this yellow?”).

2.2 Confusion between predicates and properties

The incremental generation algorithm of [Dale and Reiter1995] is the state of the art for the generation of referring expressions. They generalize the view of [Grice1975] in order to choose incrementally the best properties to use in a referring expression, so that the expression is explicit enough to be unambiguous without giving too much information. What is of interest for the interpretation of referring expressions is the distinction between type properties, absolute properties and relative properties made by Dale & Reiter, and conserved in more recent extended versions of the algorithm (for instance [Krahmer and Theune1999]). This distinction is widely adopted in the interpretation of referring expressions because it fits well with a frame-based or object-oriented representation of entities. In such an approach, typological properties define the category of an element. The category of an element is constant, and determines the intrinsic properties held by an element. The intrinsic (or absolute) properties of elements are attribute/value pairs, specifying completely an element compared to elements of the same type. Intrinsic properties are characterized by their stability (they are always present), and are context-independent.

We particularly aim to point out that confounding properties with the predicates that rely on them is a mistake that should be avoided. Indeed, while size is an absolute property held by the entity itself (e.g. 15cm), the referential predicates that rely on size are relational (e.g. big, small). We will demonstrate that, in general, whereas the characteristics needed to calculate the right referent are intrinsic to the entities and do not depend on context, referential extractors relying on intrinsic properties actually need extrinsic information in order to resolve the referring expression. In the following section, we will try to show that all kinds of predicates are sensitive to the context.
This will prove that whatever the referential extractors rely on, their action always depends on the context, and thus that a unified representation of referential extractors is possible. We will designate by operating context the current state of the mediated software, while ontological context represents the state of the graph of relations between categories and the knowledge that is attached to them for a particular domain.

2.3 Size, color and shape

In practical dialogue systems, the semantics of size adjectives like big or small are often reduced to predicates. However, as [Dale and Reiter1995] notice, it seems obvious that the element referred to by an expression containing such an adjective is not supposed to own a property stating whether it is big or small in the absolute. The size of an element is an intrinsic characteristic, but the way natural language is used to refer to it makes use of extra information. Thus, referential extractors relying on the size of elements depend on the operating context. We argue that the general case of reference resolution is context-dependent, and that the rare cases of almost context-independent reference are actual exceptions to the rule.

In order to support this hypothesis, we will show from experiments based on figure 1 that color adjectives are context-dependent too. In case 1 of this figure, “the blue leaf” refers to the leaf on the left. On the other hand, in case 2, if one asks a user to select the non-blue squares, she will choose the three squares at the bottom of the figure. Finally, if one asks for the color of the car in case 3, users will preferably answer “black”, even if they answer yes to the question “Is the car blue?” In these three cases, however, the leaf, the bottom-left square and the car are of the same color. We explain this behavior by the fact that a leaf has a typical color of green, yellow, red or brown, and thus can be categorized as blue as soon as its color departs from these typical colors and tends toward blue. On the other hand, cars like the one shown in case 3 are likely to be of a particular color, and a dark color leads the user to see the car as black. This phenomenon is exactly what we call dependence on the ontological context. In case 2, we illustrate the fact that the verbal categorization of colors depends on the environment.
Thus, even if the bottom-left square would be considered blue when seen in isolation from the others, it is categorized as non-blue when surrounded by bluer elements. This is what we call dependence on the operating context.

[Figure 1: three panels, CASE 1, CASE 2 and CASE 3. Color labels appearing in the panels: blue, pale blue, dark blue, blue gray (three times), sky blue, violet.]

FIGURE 1. Situation for demonstrating that categorization of colors depends on the environment and the type of elements.

Concerning typological predicates, we base our demonstration of their relativity upon the situation sketched in figure 2. In case 1, “the white square” refers to the square on the left, with sharp edges, while “the white squares” refers to the figures on the right and on the left. On the other hand, in case 2, “the white squares” does not include the rounded square. This is evidence of the influence of context on the meaning of typological referential predicates. Moreover, one can obviously say “This one is more a square than this one”, and thus such predicates should be considered to have a relational meaning. Finally, categorization depends on the ontological context, since talking about a square in the context of a game is not the same as talking about it in geometry learning software.

[Figure 2: two panels, Case 1 and Case 2.]

FIGURE 2. Situations for demonstrating the context-dependence of type predicates

These observations lead us to make two propositions. The first is that the reference resolution process should rely on functions, not on predicates. This hypothesis is supported by the conclusions of [Pateras et al.1995], a paper dealing with dialogue cases where database access is not sufficient to handle reference resolution. Pateras et al. propose to use functions based on fuzzy sets to choose the right referent. However, these functions are defined relative to a particular task, and are not designed to take the ontological or operating context into account, because context has no influence in that very task. The same situation can be found in [Lammens and Shapiro1993], where the authors build color categorization functions with a learning algorithm. The second of our propositions is that the functions we use to represent the semantics of referential extractors should be able to take the context as one of their arguments.

3. FUNCTIONAL DIFFERENTIATION MODEL AND APPLICATION

In [Pitel and Sansonnet2003], we have proposed a functional representation for referential extractors. In the following, we show how to use this representation and detail possible implementations for three extractors, one for each kind of the Dale and Reiter typology: a typological extractor (square), an absolute extractor (color) and a relational extractor (size).

3.1 Simplified general resolution algorithm

The resolution algorithm is planned to be used in a resolution system following the guidelines defined in [Salmon-Alt2001], so that the type of the referring expression (definite, indefinite, demonstrative, pronominal) determines the way the initial reference domain is built. The extractors are used to return the final result of the reference resolution in the restructuring phase of the resolution process. We consider a simplified version of the algorithm proposed in [Pitel and Sansonnet2003], where each referential extractor is used in a sub-process of the following form:

1. Transform the original reference domain, sorted and partitioned, through a point of view that would make the extractor able to process the domain.

2. Use the fSimil function of the extractor to sort the domain.

3. Use the fExcl function to partition the domain among possible and impossible objects.

4. Use the fPref function to partition the possible part among preferred and acceptable objects.
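
The four steps above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of [Pitel and Sansonnet2003]: the names `apply_extractor`, `big_simil`, `big_excl` and `big_pref` are hypothetical, entities are reduced to bare sizes, step 1 (the point-of-view transformation) is reduced to the identity, and the undefined value ⊥ is not handled.

```python
from functools import cmp_to_key


def apply_extractor(domain, f_simil, f_excl, f_pref):
    """Apply one referential extractor to a reference domain (a list).

    Returns the ternary partition (preferred, acceptable, impossible).
    """
    # Step 1 (assumed to be the identity here): view the domain in a form
    # the extractor can process.
    view = list(domain)

    # Step 2: sort with the binary comparison function. A ratio above 1
    # means the first entity fits the predicate better than the second.
    def compare(a, b):
        ratio = f_simil(a, b, view)
        return -1 if ratio > 1 else (1 if ratio < 1 else 0)

    ranked = sorted(view, key=cmp_to_key(compare))

    # Step 3: f_excl returns the objects to exclude (impossible candidates).
    impossible = f_excl(ranked)
    possible = [e for e in ranked if e not in impossible]

    # Step 4: f_pref returns the preferred objects among the possible ones.
    preferred = f_pref(possible)
    acceptable = [e for e in possible if e not in preferred]
    return preferred, acceptable, impossible


# Toy "big" extractor over bare sizes (hypothetical data and rules).
def big_simil(a, b, domain):
    return a / b


def big_excl(ranked):
    # Exclude the smallest object and everything tied with it.
    return [e for e in ranked if e == ranked[-1]] if len(ranked) > 1 else []


def big_pref(possible):
    # Prefer the biggest object and everything tied with it.
    return [e for e in possible if e == possible[0]] if possible else []


preferred, acceptable, impossible = apply_extractor(
    [4.0, 9.0, 1.0, 9.0, 2.0], big_simil, big_excl, big_pref
)
print(preferred, acceptable, impossible)
```

The acceptable list is what a following extractor would still receive as possible candidates, together with the preferred ones.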

Compared to Salmon-Alt’s differentiation criteria, our algorithm makes use of two different partition functions (fExcl and fPref) in order to produce a ternary partition instead of a binary one. Indeed, as we introduce a total order relation in the differentiation process, we are faced with an issue that was hidden when using simple predicates. When referring to several elements, as in “remove blue squares”, the user refers to the elements at the top of the list sorted by blueness. However, squares that are less blue but still blue must be distinguished from non-blue squares: if the user asks to “remove big blue squares” and the big squares are not at the top of the list sorted by blueness, they must still be passed from the blue extractor to the big extractor in the list of possible candidates for the referring expression. It is thus necessary to distinguish between best candidates, needed to answer group references, and possible and impossible candidates, needed in order to respond “no match” when the referring expression is ambiguous.

3.2 General representation of referential extractor

ReferentialExtractor
• fSimil (entityType X a, entityType X b, RefDomain d) → (0, ∞) ∪ ⊥
• fExcl (RefDomain d) → RefDomain
• fPref (RefDomain d) → RefDomain

Figure 3. The three functions used to represent a referential extractor.

Figure 3 shows the three functions used to represent a referential extractor. A detailed description of these functions can be found in [Pitel and Sansonnet2003]. Here, we just outline their main characteristics. The fSimil function is used to sort a set of elements along a given characteristic (say, blueness or bigness). For instance, the fSimil function of the referential extractor “blue” should give the ratio between the colors of two elements projected along the blue dimension. The fExcl function serves to select, in the sorted list produced by the resolution algorithm using the fSimil function, the objects that will be excluded from the candidates list submitted to the next extractor. The fPref function selects the preferred objects for a given extractor. Here again, there is no general rule for the function.

3.3 Specific implementations of referential extractors

As referential extractors are not directly linked with a particular representation choice, they are polymorphic, and thus a referential extractor has to be implemented for each kind of representation used in the software implementation. For illustration purposes, we consider that all objects are made of three attributes: shape, size parameters (for instance edge length for a square, diameter for a circle) and color.

Within this configuration, a typological extractor such as the extractor for the predicate “square” provides the following fSimil functions:

• fSimil(square, round-square) = 1.2 (should be determined experimentally for each level of “roundness” of a round square)
• fSimil(square, circle) = ∞ (idem for triangle, ellipse, …)
• fSimil(square, rectangle) ∈ (1, ∞), depending on the ratio between height and width of the rectangle
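
The three cases listed above can be gathered into one comparison function, as in the sketch below. Everything beyond the three quoted values is an assumption made for illustration: the `Figure` class, the `square_deviation` helper, the use of the aspect ratio as the rectangle score, and the choice of returning `None` (standing for ⊥) when neither argument is comparable to a square.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Figure:
    shape: str           # "square", "round-square", "rectangle", "circle", ...
    width: float = 1.0
    height: float = 1.0


def square_deviation(fig):
    """Hypothetical score: 1.0 for a perfect square, larger when less square."""
    if fig.shape == "square":
        return 1.0
    if fig.shape == "round-square":
        return 1.2  # value quoted in the text; to be determined experimentally
    if fig.shape == "rectangle":
        # Assumption: the aspect ratio (long side / short side) is used as is.
        return max(fig.width, fig.height) / min(fig.width, fig.height)
    return math.inf  # circle and other shapes: infinitely far from a square


def square_simil(a, b, domain=None):
    """Ratio above 1 when a is more of a square than b; None stands for ⊥."""
    da, db = square_deviation(a), square_deviation(b)
    if math.isinf(da) and math.isinf(db):
        return None  # assumption: two non-squares are left unordered
    return db / da


square = Figure("square")
print(square_simil(square, Figure("round-square")))
print(square_simil(square, Figure("circle")))
print(square_simil(square, Figure("rectangle", width=2.0, height=1.0)))
```

The `domain` argument is unused in this sketch; it is kept only to mirror the signature of Figure 3, where the reference domain is available to the comparison.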

For the “big” predicate, the extractor’s fSimil function should simply give the ratio between the calculated sizes of two objects. The size can be calculated from the area of the figures, or from the area of their bounding rectangles. In this respect, the extractor for “small” differs from “big” only in that it returns a similarity value above 1 when the size of the first argument of the fSimil function is less than the size of the second one. Note that an extractor for “medium” should first compute the medium size of the objects from the reference domain containing all the objects to be processed. For a predicate like “dark”, the fSimil function returns the ratio between the darkness of two objects, calculated from their color.

The partitioning functions for the extractors square and dark are close to each other, since both involve a highly prototypical concept. In both cases, it is necessary to introduce one or more prototypical objects in the reference domain in order to compare the processed objects with the prototypical ones. The prototypes are themselves created either from a static knowledge base, or from the operating context of the application. For instance, if there are only very light objects in the situation, the prototypical dark object would be between black and gray. Once the prototypes are introduced, the partitioning function searches for a discontinuity in the chain of similarity ratios, and “cuts” the chain at a discontinuity near a prototype, or when the cumulated similarity ratio between a prototype and one of the objects is over a given threshold.

For the extractor of the predicate “big”, the fExcl function selects the last object in the chain (and all objects with which it has a similarity ratio of 1). The fPref function selects all objects from the beginning of the chain to the first discontinuity located around an arbitrary similarity threshold.
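
The prototype-based cut described above for “dark” can be illustrated with a short sketch. The function name `cut_chain`, the two threshold values, the darkness scores and the object labels are all hypothetical; the text only states that the chain is cut at a discontinuity near a prototype or when the cumulated ratio from the prototype exceeds a threshold.

```python
def cut_chain(chain, f_simil, jump=1.5, max_cumulated=3.0):
    """Split a sorted chain, prototype first, at the first discontinuity.

    Returns (kept, excluded). The prototype itself is not returned.
    """
    cumulated = 1.0
    for i in range(len(chain) - 1):
        ratio = f_simil(chain[i], chain[i + 1])  # ratio between neighbours
        cumulated *= ratio                       # ratio to the prototype
        if ratio >= jump or cumulated > max_cumulated:
            return chain[1:i + 1], chain[i + 1:]
    return chain[1:], []


def dark_simil(a, b):
    # Entities are (label, darkness) pairs with darkness in (0, 1].
    return a[1] / b[1]


prototype = ("proto-black", 1.0)
objects = [("c", 0.45), ("a", 0.9), ("d", 0.4), ("b", 0.8)]

# The prototype is inserted in the domain and the chain is sorted by darkness.
chain = sorted([prototype] + objects, key=lambda e: e[1], reverse=True)
kept, excluded = cut_chain(chain, dark_simil)
print([label for label, _ in kept], [label for label, _ in excluded])
```

With these values the neighbouring ratios stay close to 1 until the link between b and c, where the jump of about 1.78 cuts the chain.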

3.4 Practical example of reference processing

[Figure 4: a scene containing ten figures labeled 1 to 10.]

Figure 4. Situation for the practical example of reference processing

We illustrate how to use the referential extractors model in the case of resolving the referring sub-expression “big dark squares” in the situation described in figure 4. First we apply the referential extractor for square, following Dale and Reiter’s arbitrary order, then dark, and we finish with big. Using the fSimil function, we construct an ordered list whose links between elements are annotated with similarity ratios, and partition it with the two functions fExcl and fPref. The whole resolution process of “big dark squares” is sketched in figure 5. Each predicate’s extractor is processed sequentially from the top of the figure to the bottom. After an extractor has been applied, the following extractor works only on the remaining possible candidates. The sketch shows how prototypes are introduced artificially in the chain of objects and used for partitioning. In this example, no action is performed by an fPref function, since it is only useful for the last extractor, and in this case fPref selects all objects.

Similarity ratios are calculated from the area of the objects and from the gray level of the objects’ color. It is obviously necessary to check these arbitrary choices against a psychological experiment. We believe that there may be some coherence between cognitive processes and our model based on differentiation functions; however, we cannot give any evidence for that. Even if it is possible to tune the differentiation functions in order to mimic human behavior, this will only be a clue that our model is a good imitation of the cognitive process, but nothing more. At best, this would prove that our model is a good reference for building the reference resolution of high-quality dialogue systems.

[Figure 5: three reference domains, one per extractor (“square”, “dark”, “big”), with prototypes (Proto. SQUARE, Proto. BLACK, Proto. GRAY) inserted in the chains of objects.]

Figure 5. Sketch of a referring expression resolution process: “big dark squares”. Each line represents a reference domain containing the objects that are candidates for the reference. Numbers inside the boxes represent the labels of candidate figures in the situation described in figure 4. Candidates are sorted in decreasing order from left to right. When several labels are in the same box, they have a similarity ratio of 1. Double bars represent the limit between possible and impossible candidates for the predicate. Numbers outside the boxes are similarity ratios.
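
The sequential narrowing sketched in figure 5 (square, then dark, then big, each extractor seeing only the candidates left possible by the previous one) can be imitated on made-up data. The objects, scores and exclusion bounds below are hypothetical and are not those of figures 4 and 5; `resolve`, `rank`, `excl_beyond` and `excl_last` are illustrative names, and fPref is omitted because in the example of the text it selects all remaining objects.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass(frozen=True)
class Obj:
    label: int
    squareness: float  # hypothetical scores in (0, 1]
    darkness: float
    area: float


def ratio_of(attr):
    # Binary comparison: ratio of one attribute between two objects.
    return lambda a, b: getattr(a, attr) / getattr(b, attr)


def rank(domain, simil):
    def cmp(a, b):
        r = simil(a, b)
        return -1 if r > 1 else (1 if r < 1 else 0)
    return sorted(domain, key=cmp_to_key(cmp))


def excl_beyond(bound):
    # Exclude objects too far (in ratio) from the best-ranked object.
    def excl(ranked, simil):
        return [o for o in ranked if simil(ranked[0], o) > bound]
    return excl


def excl_last(ranked, simil):
    # Rule given in the text for "big": exclude the last object and its ties.
    if len(ranked) <= 1:
        return []
    return [o for o in ranked if simil(o, ranked[-1]) == 1]


def resolve(domain, extractors):
    """Apply extractors in order; each sees only the remaining candidates."""
    candidates = list(domain)
    for simil, excl in extractors:
        ranked = rank(candidates, simil)
        excluded = excl(ranked, simil)
        candidates = [o for o in ranked if o not in excluded]
    return candidates


scene = [
    Obj(1, 1.0, 0.90, 4.0),
    Obj(2, 1.0, 0.30, 9.0),
    Obj(3, 0.1, 0.90, 9.0),
    Obj(4, 1.0, 0.80, 9.0),
    Obj(5, 0.9, 0.85, 1.0),
]
big_dark_squares = resolve(scene, [
    (ratio_of("squareness"), excl_beyond(1.5)),  # "square"
    (ratio_of("darkness"), excl_beyond(1.5)),    # "dark"
    (ratio_of("area"), excl_last),               # "big"
])
print([o.label for o in big_dark_squares])
```

In this run object 3 is dropped by “square”, object 2 by “dark” and object 5 by “big”, leaving objects 4 and 1 in decreasing order of size.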

4. CONCLUSIONS AND FUTURE WORK

We have argued for the need of a procedural account of referential predicates, showing that in several situations and for all kinds of predicates there is evidence for their context-dependence and relativity. This leads to the necessary use of a functional representation of referential extractors, in a formalism that could integrate resolution models like that of Salmon-Alt. This approach aims to cover relational, absolute and typological extractors alike, and we propose a possible implementation for them.

One of the issues left for further research is to study a mechanism for measuring the distance between two elements in any dimension. For the typological dimension in particular, this implies performing some kind of metaphorical transformation of reference domains from one type to another. Using this model of predicate representation to justify the order of processing proposed by [Dale and Reiter1995] has been addressed in [Pitel and Sansonnet2003], but some experimental validation is still needed to verify the hypothesis.

The AMI team at LIMSI-CNRS is currently working on this mechanism within the INTERVIEWS project [Sansonnet et al.2002]. The formalism presented in this paper, as well as the overall model of natural language analysis integrating it, is under development in this framework. There is currently no evaluation of this system, but we plan to reach a working platform in order to be able to compare results from our reference resolution method with the results of psychological experiments.


REFERENCES

[Byron and Allen2002] Byron D.K., Allen J.F. 2002. What's a Reference Resolution Module to do? Redefining the Role of Reference in Language Understanding Systems. Proc. DAARC2002.
[Dale and Reiter1995] Dale R., Reiter E. 1995. Computational Interpretation of the Gricean Maxims in the Generation of Referring Expressions. Cognitive Science.
[Dzikovska and Byron2000] Dzikovska M.O., Byron D.K. 2000. When is a union really an intersection? Problems interpreting reference to locations in a dialogue system. Proc. GOTALOG'2000.
[Grice1975] Grice H.P. 1975. Logic and conversation. In P. Cole and J. Morgan (eds.), Syntax and Semantics: Vol 3, Speech Acts, pages 43-58. New York, Academic Press.
[Krahmer and Theune1999] Krahmer E., Theune M. 1999. Efficient generation of Descriptions in Context. Proceedings of the ESSLLI Workshop on the Generation of Nominals, R. Kibble and K. van Deemter (eds.), Utrecht, The Netherlands.
[Kyburg and Morreau2000] Kyburg A., Morreau M. 2000. Fitting words: Vague language in context. Linguistics and Philosophy 23:577-597.
[Lammens and Shapiro1993] Lammens J.M., Shapiro S.C. 1993. Learning Symbolic names for Perceived Colors. In Machine Learning in Computer Vision: What, Why and How? AAAI-TR FSS93-04.
[Langacker1987] Langacker R. 1987. Foundations of cognitive grammar I: Theoretical prerequisites. Stanford, Stanford University Press.
[Pateras et al.1995] Pateras C., Dudek G., Mori R. D. 1995. Understanding Referring Expressions in a Person-Machine Spoken Dialogue. Proc. ICASSP'95, Detroit, MI.
[Pitel and Sansonnet2003] Pitel G., Sansonnet J.P. 2003. A Differential Representation of Predicates for Resolution of Extensional Reference in Practical Dialogue. In Proc. of ARQAS, Venice.
[Reboul1999] Reboul A. 1999. Reference, agreement, evolving reference and the theory of mental representation. In Coene M., De Mulder W., Dendale P. & D’Hulst Y. (eds), Studia Linguisticae in honorem Lilianae Tasmowski, Padova, Unipress, 601-616.
[Sansonnet et al.2002] Sansonnet J.P., Sabouret N., Pitel G. 2002. An Agent Design and Query Language dedicated to Natural Language Interaction. Poster, AAMAS 2002.
[Salmon-Alt2001] Salmon-Alt S. 2001. Référence et dialogue finalisé : de la linguistique à un modèle opérationnel. PhD Thesis, Université H.Poincaré - Nancy 1, France. Mai 2001.
[Schang1995] Schang D. 1995. Application de la notion de cadre aux énoncés de positionnement et de référence. Research Report n° 2529, Unité de Recherche INRIA Lorraine.
[Winograd1973] Winograd T. 1973. A Procedural Model of Language Understanding. In Computer Models of Thought and Language, Roger Schank & Kenneth Colby (eds.), W. H. Freeman Press.