The Parallel Replacement Operation in Finite State Calculus

André Kempe, Lauri Karttunen
Rank Xerox Research Centre – Grenoble Laboratory
August 18, 2009

Abstract

This paper introduces the operators of unconditional and conditional parallel replacement into the calculus of regular expressions, as an enhancement of the existing replace operator (Karttunen 1995). It defines a set of replacement expressions that concisely encode several alternate variations of the operation. Replace expressions denote regular relations, defined in terms of other regular-expression operators. The basic case is unconditional obligatory replacement. Different versions of conditional replacement allow the operation to be constrained by context. All described replacement operations are now included in the Xerox finite-state calculus.

1

Introduction

Linguistic descriptions in phonology, morphology and syntax typically make use of an operation that replaces some symbol, or sequence of symbols, by another sequence or symbol. We consider here the replacement operation in the context of finite-state grammars. This includes many frameworks for phonological, morphological and syntactic description. Kaplan and Kay (1981, 1994) demonstrate that classical phonological rewrite rules can be implemented as finite-state transducers. The two-level model of Koskenniemi (1983) presents another finite-state formalism for constraining symbol-by-symbol replacements in morphology. The constraint grammar of Karlsson et al. (1994) has its own replacement formalism designed for morphological and syntactic disambiguation. It employs constraint rules that delete given morphological or syntactic tags in specified contexts. Finite-state syntax, proposed by Koskenniemi (1990), Koskenniemi, Tapanainen, and Voutilainen (1992), and Voutilainen (1994), accomplishes the same task by adapting the two-level formalism to express syntactic constraints. Brill (1992) introduces replacement rules to improve the accuracy of part of speech disambiguation. Each of these frameworks has its own rule formalism for replacement operations.

This paper has a twofold purpose. It first defines replacement in a more general way than is done in most of these formalisms, explicitly allowing replacement to be constrained by input and output contexts, as in two-level rules, but with the following enhancements:

• All arguments of the operation, i.e. the substrings to be replaced, those to be generated, as well as left and right contexts, can be described by regular expressions of any complexity under the condition that they denote a regular language, not a regular relation.

• A set of replacements can be made in parallel, i.e. without having any influence on each other. In some cases this effect cannot be obtained by the composition of the single replacements.


Second, the paper defines replacement within a general calculus of regular expressions so that replacements can be conveniently combined with other kinds of operations, such as composition and union, to form complex expressions.

The notion “replacement” might suggest a process that (physically) replaces substrings of an input string, i.e. it overwrites them with other substrings. This is a misleading metaphor. Actually, we deal with regular relations that relate the strings of one language to strings of another language. For better understanding we will use expressions like “to replace a symbol”, “to insert an auxiliary symbol” or “to remove symbols”. In these cases we never speak about (physical) modification of a string, but always about the mapping of one string to another one, i.e. no symbol (physically) emerges, changes or disappears.

We have already incorporated the new replacement expressions into our implementation of the finite-state calculus. Thus, we can construct transducers directly from replacement expressions as part of the general calculus, without invoking any special rule compiler.

We start with a standard kind of regular-expression language and add to it some new operators. These new operators can be used to describe regular relations which relate the strings of one regular language to the strings of another regular language that contain the specified replacement. The replacement may be unconditional or it may be restricted by left and right contexts. The -> operator makes the replacement obligatory, (->) makes the replacement optional. For the sake of completeness, we also define the inverse operators, ...

    { U1 -> L1 , ... , Un -> Ln }    [6]

It is different from the composition of the n single replacement relations, written as

    { U1 -> L1 } .o. ... .o. { Un -> Ln }    [7]

We will later illustrate this difference with an example. The relation [6], defined in accordance with [16], [17] and [18] below, contains all and only pairs of strings that are related to one another in the manner sketched below.

    x   uji   y   uki   z      upper string
    x   lji   y   lki   z      lower string    [8]

We use uji and uki to represent instances of Ui (with i ∈ [1, n]) and lji and lki to represent instances of Li. The upper string consists of zero or more instances of Ui, possibly interspersed with other material (denoted here by x, y, and z). In the corresponding lower string the sections corresponding to Ui are instances of Li, and the intervening material remains the same (Karttunen, 1995). The relationship is exhaustive in two ways: First, it includes all possible ways of partitioning a given upper string to U- and non-U-sections. Second, for any Ui-segment, it includes the corresponding lower string for each instance of Li.

2.1

Examples

We illustrate parallel replacement with some examples. The regular expression

    { a -> b b , b b -> a }    [9]

describes a relation mapping in parallel every occurrence of a to bb and every occurrence of bb to a. The result of one replacement does not undergo the other one. The resulting transducer maps the string xaybbz to xbbyaz, while a transducer resulting from the expression

    { a -> b b } .o. { b b -> a }    [10]

would map xaybbz to xayaz because it is equivalent to the sequential application of two replacements. The first maps xaybbz to xbbybbz and the second xbbybbz to xayaz. A transducer resulting from

    { b b -> a } .o. { a -> b b }    [11]

would map xaybbz to xbbybbz.
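The contrast between [9], [10] and [11] can be imitated on plain strings. The following Python sketch is only an illustration under simplifying assumptions: the rules are literal strings, the input is scanned once from left to right, and the helper names parallel_replace and sequential_replace are hypothetical. It does not build a transducer and does not enumerate alternative partitionings of the input.

```python
import re

def parallel_replace(text, rules):
    """Apply all literal rules in one left-to-right pass over the input."""
    # Longer UPPER strings come first so that "bb" is tried before "b".
    pattern = "|".join(re.escape(u) for u in sorted(rules, key=len, reverse=True))
    return re.sub(pattern, lambda m: rules[m.group(0)], text)

def sequential_replace(text, steps):
    """Apply the rules one after another, each on the previous output."""
    for upper, lower in steps:
        text = text.replace(upper, lower)
    return text

print(parallel_replace("xaybbz", {"a": "bb", "bb": "a"}))        # xbbyaz
print(sequential_replace("xaybbz", [("a", "bb"), ("bb", "a")]))  # xayaz
print(sequential_replace("xaybbz", [("bb", "a"), ("a", "bb")]))  # xbbybbz
```

The single pass never re-reads its own output, which is why the parallel version keeps the two replacements from feeding each other.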

Figure 1: Transducer network encoding the expression { a -> b b , b b -> a } (states 0 to 3; arc labels ?, b, a:b, b:a, 0:b and b:0)

Figure 1 shows the state diagram of a transducer encoding expression [9]. The transducer consists of a set of states and arcs that indicate a transition from state to state over a given pair of symbols. Identical pairs that map a particular symbol to itself are represented by a single symbol; e.g. we write a:a as a. The symbol ? represents here any pair of identical symbols that are not explicitly present in the network. Transitions that differ only with respect to the label are collapsed into a single multiple-labelled arc. The start state is labelled with 0. Double circles mark final states. Every pair of strings in the relation corresponds to a path from the initial state 0 of the transducer to any final state. Mapping xaybbz to xbbyaz follows the path 0-0-2-0-0-1-0-0.

2.2

Replacement of the Empty String

The language described by the UPPER part of a replacement expression¹

    UPPER -> LOWER    [12]

can contain the empty string ε. In this case, every string that is in the upper-side language of the relation is mapped to an infinite set of strings in the lower-side language, as the upper-side string can be considered as a concatenation of empty and non-empty substrings, with ε at any position and in any number. The relation

    a* -> x ;    [13]

maps e.g. the string bb to the infinite set of strings bb, xbb, xbxb, xbxbx, xxbb, etc., since the language described by a* contains ε, and the string bb can be considered as a result of any one of the concatenations b_b, ε_b_b, ε_b_ε_b, ε_b_ε_b_ε, ε_ε_b_b, etc.

For many practical purposes it is convenient to construct a version of empty-string replacement that allows only one application between any two adjacent symbols (Karttunen, 1995). In order not to confuse the notation by a non-standard interpretation of the notion of empty string, we introduce a special pair of brackets, [. .], placed around the upper side of a replacement expression that presupposes a strict alternation of empty substrings and non-empty substrings of exactly one symbol:

    ε_x_ε_y_ε_z_ε_ ...    [14]

In applying this to the above example, we obtain the relation described by

    [. a* .] -> x ;    [15]

that maps the string bb only to the string xbxbx since bb is here considered exclusively as a result of the concatenation ε_b_ε_b_ε.

¹ We describe this topic only for uni-directional replacement from the upper to the lower side of a regular relation, but analogous statements can be made for all other types of replacement mentioned in section 4.
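The strict alternation in [14] can be pictured with a few lines of Python. This is a minimal sketch, assuming that the input contains no symbol matched by the non-empty part of UPPER (for [15], no a), so that only the empty-string case applies; the function name insert_between is hypothetical.

```python
def insert_between(text, lower):
    """Return the single output of a [. .] -> lower replacement without contexts."""
    # Exactly one empty string sits before, between and after the symbols.
    return lower.join(["", *text, ""])

print(insert_between("bb", "x"))  # xbxbx
```

For the empty input the sketch returns a single copy of lower, in line with the reading of [. .] as one empty string per position.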

2.3

The Algorithm

We define unconditional parallel replacement similarly to simple unconditional replacement (Karttunen 1995, eq. 7), as

    [ N R ]* N    [16]

where N denotes the language of strings that do not contain any Ui

    N = ~$[ [ U1 | ... | Un ] - [ ] ]    [17]

and R stands for the relation that pairs every Ui with the corresponding Li

    R = [ [ U1 .x. L1 ] | ... | [ Un .x. Ln ] ]    [18]

If at least one of the Ui contains [. and .] to express a (single) empty string, instead of using the above algorithm we use the algorithm for conditional parallel replacement without specified contexts (sec. 3.3).
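To make the shape of [16] concrete, the following Python sketch enumerates, for one input string, every lower string obtained by cutting the input into N stretches and Ui instances. It is an illustration only, assuming that every Ui and Li is a single literal string and that empty Ui are ignored (as the subtraction of [ ] in [17] suggests); the name parallel_images is hypothetical.

```python
def parallel_images(text, rules):
    """Enumerate the lower strings that [N R]* N pairs with text."""
    uppers = [u for u in rules if u]  # N ignores the empty string, as in [17]

    def in_n(segment):
        # N: strings that contain no instance of any U_i.
        return not any(u in segment for u in uppers)

    results = set()

    def walk(pos, out):
        # Closing N: the remainder must not contain any U_i.
        if in_n(text[pos:]):
            results.add(out + text[pos:])
        # Another [N R] round: an N stretch text[pos:j], then one U_i at j.
        for j in range(pos, len(text) + 1):
            if not in_n(text[pos:j]):
                break
            for u in uppers:
                if text.startswith(u, j):
                    walk(j + len(u), out + text[pos:j] + rules[u])

    walk(0, "")
    return results

print(parallel_images("xaybbz", {"a": "bb", "bb": "a"}))  # {'xbbyaz'}
print(sorted(parallel_images("ab", {"ab": "x", "b": "y"})))  # ['ay', 'x']
```

The second call shows the exhaustive partitioning: ab is either one instance of the first UPPER, or the N stretch a followed by an instance of the second.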

3

Conditional Parallel Replacement

Conditional parallel replacement denotes a relation which maps a set of n expressions Ui (with i ∈ [1, n]) in the upper language into a set of corresponding n expressions Li in the lower language if, and only if, they occur between a left and a right context (li, ri).

    { U1 -> L1 || l1 _ r1 } , ... , { Un -> Ln || ln _ rn }    [19]

Conditional parallel replacement corresponds to what Kaplan and Kay (1994) call “batch rules” where a set of rules (replacements) is collected together in a batch and performed in parallel, at the same time, in a way that all of them work on the same input, i.e. none of them applies to the output of another replacement. The idea of conditional parallel replacement can be seen as derived from both unconditional parallel replacement (cf. sec. 2), and single conditional replacement as described by Karttunen (1995). Actually, in our implementation, we combine parts of both algorithms.


3.1

Examples

Regular expressions based on formula [19] can be abbreviated if some of the UPPER-LOWER pairs, and/or some of the LEFT-RIGHT pairs, are equivalent. The complex expression

    { a -> b , b -> c || x _ y } ;    [20]

which contains multiple replacements in one left and right context, can be written in a more elementary way as two parallel replacements:

    { a -> b || x _ y } , { b -> c || x _ y } ;    [21]

Figure 2: Transducer network encoding the expression { a -> b , b -> c || x _ y } (Every arc with more than one label actually stands for a set of arcs with one label each.)

Figure 2 shows the state diagram of a transducer resulting from expression [20] or [21]. The transducer maps the string xaxayby to xaxbyby following the path 0-1-2-1-3-0-0-0 and the string xbybyxa to xcybyxa following the path 0-1-3-0-0-0-1-2. The regular expression

    { e -> f || x _ y , v _ w } ;    [22]

which contains one replacement in multiple contexts, can be written in a more elementary way as

    { e -> f || x _ y } , { e -> f || v _ w } ;    [23]

The complex expression

    { a -> b , b -> c , e -> f || x _ y , v _ w } , { a -> c || p _ q } ;    [24]

contains seven single parallel replacements:

    { a -> b || x _ y } ,
    { a -> b || v _ w } ,
    { b -> c || x _ y } ,
    { b -> c || v _ w } ,
    { e -> f || x _ y } ,
    { e -> f || v _ w } ,
    { a -> c || p _ q } ;    [25]
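The expansion from [24] to [25] is a plain cross product of UPPER-LOWER pairs and LEFT-RIGHT pairs within each group. The Python sketch below illustrates this under the assumption that a complex expression is given as a list of (pairs, contexts) groups; the data layout and the name expand are hypothetical and not part of the calculus.

```python
from itertools import product

def expand(groups):
    """Turn grouped replacements into elementary (upper, lower, left, right) tuples."""
    elementary = []
    for pairs, contexts in groups:
        # Every UPPER-LOWER pair is combined with every LEFT-RIGHT pair.
        for (upper, lower), (left, right) in product(pairs, contexts):
            elementary.append((upper, lower, left, right))
    return elementary

# Expression [24] as two groups.
expr_24 = [
    ([("a", "b"), ("b", "c"), ("e", "f")], [("x", "y"), ("v", "w")]),
    ([("a", "c")], [("p", "q")]),
]
for upper, lower, left, right in expand(expr_24):
    print(f"{{ {upper} -> {lower} || {left} _ {right} }}")
```

The loop prints the seven elementary replacements in the order in which they are listed in [25].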


Contexts can be unspecified as in

    { a -> b || x _ y , v _ , _ w } ;    [26]

where a is replaced by b only when occurring between x and y, or after v, or before w, and equally when the last two conditions are met, i.e. between v and w. An unspecified context is equivalent to ?*, the universal (sigma-star) language. Similarly, a specified context, such as x _ y, is actually interpreted as ?* x _ y ?*, that is, implicitly extending the context to infinity on both sides of the replacement. This is a useful convention, but we also need to be able to refer explicitly to the beginning or the end of a string. For this purpose, we introduce a special symbol, .#. (Kaplan and Kay, 1994, p. 349). In the example

    { a -> b || .#. _ , v _ ? ? .#. } ;    [27]

a is replaced by b only when it is at the beginning of a string or between v and the two final symbols of a string. Note that .#. denotes the beginning or the end of a string depending on whether it occurs in the left or the right context.
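For single-symbol replacements with literal contexts, the effect of expressions such as [20] and [27] on a given string can be approximated with regular-expression lookarounds. The sketch below is an analogy in Python rather than the construction used in the calculus: replace_in_context is a hypothetical helper, contexts are literal strings, and ^ and \Z stand in for .#. on the left and on the right.

```python
import re

def replace_in_context(text, rules, left, right):
    """Replace single symbols that occur between the literal contexts left and right."""
    symbols = "".join(re.escape(u) for u in rules)
    # Lookbehind and lookahead test the contexts without consuming them.
    pattern = f"(?<={re.escape(left)})[{symbols}](?={re.escape(right)})"
    return re.sub(pattern, lambda m: rules[m.group(0)], text)

# Expression [20]: { a -> b , b -> c || x _ y }
print(replace_in_context("xaxayby", {"a": "b", "b": "c"}, "x", "y"))  # xaxbyby
print(replace_in_context("xbybyxa", {"a": "b", "b": "c"}, "x", "y"))  # xcybyxa

# Expression [27]: a at the beginning, or between v and the two final symbols.
boundary = re.compile(r"^a|(?<=v)a(?=..\Z)", re.DOTALL)
print(boundary.sub("b", "axvaxy"))  # bxvbxy
```

The first two calls reproduce the mappings given for Figure 2.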

3.2

Replacement of the Empty String

The special brackets [. and .] introduced in 2.2 can also be used with the conditional replacement. Here they have the same effect as previously described on the replacement of empty strings, but the additional context constraints are also taken into account.

3.3

The Algorithm

The replacement of one substring by another one inside a context requires the introduction of auxiliary symbols (brackets). Kaplan and Kay (1994) motivate this step. If we simply added the regular expressions of the context constraints li and ri (with i ∈ [1, n]) to formula [18] in the following way,

    R = [ l1 [ U1 .x. L1 ] r1 | ... | ln [ Un .x. Ln ] rn ]    [28]

then every li and ri would map the substring adjacent to what is mapped by Ui. However, this approach is impossible for the following reason (Kaplan and Kay, 1994): In an example like

    { a -> b || x _ x } ;    [29]

where we expect xaxax to be replaced by xbxbx, the middle x serves as a context for both a’s. A relation described by the formulas [16], [17] and [28] could not accomplish this. The middle x would be mapped either by an ri or by an li but not by both at the same time. That is why only one a could be replaced and we would get two alternative lower strings, xbxax and xaxbx. Therefore, we have to use the context expressions without mapping them. For this purpose we introduce auxiliary brackets, <i after every left context li and >i before every right context ri. The replacement maps those brackets without looking at the actual contexts. A regular relation describing replacement in context (and a transducer that represents it) is defined by the composition of a set of “simpler” auxiliary relations. Context brackets occur only in intermediate relations and are not present in the final result.
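The problem with [28] has a close analogue in ordinary string substitution, shown here in Python as an illustration only. A pattern that includes its contexts consumes them, so the middle x of xaxax is used up by the first match; a pattern that only tests the contexts leaves them available. Note that re.sub returns just the leftmost result, whereas the relation discussed above contains both alternatives.

```python
import re

# Contexts written as part of the match are consumed: the middle x
# cannot serve both as right context of the first a and as left
# context of the second one.
print(re.sub("xax", "xbx", "xaxax"))         # xbxax

# Lookarounds test the contexts without mapping them.
print(re.sub("(?<=x)a(?=x)", "b", "xaxax"))  # xbxbx
```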


3.3.1

Preparatory Steps

Before the replacement we make the following three transformations:

(1) Complex regular expressions like [20] are transformed into elementary ones like expression [21], where every single replacement consists of only one UPPER, one LOWER, one LEFT and one RIGHT expression. This facilitates the computation of the replacement since not only every context pair li _ ri but also every UPPER-LOWER pair Ui -> Li has its own pair of brackets <i and >i.

(2) The set of replacements to be made in parallel is split into two groups, one containing only replacements where UPPER does not contain the empty string and the other one containing only replacements where UPPER is identical with the empty string (of type [ ] or [. .] ). If an UPPER contains the empty string but is not identical with it, then this replacement will be added to both groups but with a different UPPER. This means that e.g. the expression

    { [.(a).] -> b || x _ y } , { [ ] -> c , e -> f || v _ w } ;    [30]

would first be expanded to

    { [.(a).] -> b || x _ y } , { [ ] -> c || v _ w } , { e -> f || v _ w } ;    [31]

Then these three single parallel replacements would be put into the two groups, i.e. the first and last replacement

    { a -> b || x _ y } , { e -> f || v _ w } ;    [32]

would be put into the group of non-empty UPPER and the first and the second replacement

    { [. .] -> b || x _ y } , { [ ] -> c || v _ w } ;    [33]

would be put into the group of empty UPPER. The reason for separating the replacement of empty UPPER from those of non-empty UPPER will be explained later (cf. sec. 3.3.3).

(3) All empty UPPER of type [ ] are transformed into type [. .] and the corresponding LOWER are replaced by their Kleene star. E.g., expression [33] would be transformed into

    { [. .] -> b || x _ y } , { [. .] -> c* || v _ w } ;    [34]

The following algorithm of conditional parallel replacement will consider all empty UPPER as being of type [. .], i.e. as not being adjacent to another empty string.
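Steps (2) and (3) can be traced on expression [31] with a small Python sketch. It rests on several assumptions made only for illustration: UPPER is modelled as a finite set of strings with "" for the empty string, LOWER is kept as an expression string consisting of a single symbol, and the names Rule, split_groups and normalise_empty are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Rule:
    upper: frozenset      # finite UPPER language; "" stands for the empty string
    lower: str            # LOWER, kept as an expression string
    left: str
    right: str
    dotted: bool = False  # True when UPPER was written inside [. .]

def split_groups(rules):
    """Step (2): separate non-empty UPPER from empty UPPER."""
    non_empty, empty = [], []
    for rule in rules:
        rest = rule.upper - {""}
        if rest:
            non_empty.append(replace(rule, upper=rest, dotted=False))
        if "" in rule.upper:
            empty.append(replace(rule, upper=frozenset({""})))
    return non_empty, empty

def normalise_empty(empty):
    """Step (3): turn [ ] into [. .] and LOWER into LOWER*."""
    return [r if r.dotted else replace(r, lower=r.lower + "*", dotted=True)
            for r in empty]

# Expression [31], i.e. [30] after step (1).
rules = [
    Rule(frozenset({"", "a"}), "b", "x", "y", dotted=True),  # [.(a).] -> b
    Rule(frozenset({""}), "c", "v", "w"),                    # [ ] -> c
    Rule(frozenset({"e"}), "f", "v", "w"),                   # e -> f
]
non_empty, empty = split_groups(rules)
empty = normalise_empty(empty)
```

After these calls non_empty corresponds to [32] and empty to [34].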


3.3.2

The Replacement itself

Apart from the previously explained symbols, we will make use of the following symbols in the next regular expressions: allE > Li || li ri } we introduce a separate pair of brackets <i and >i, with i ∈ [1E...mE] if UPPER is identical with the empty string and i ∈ [1...n] if UPPER does not contain the empty string. The distribution of these auxiliary brackets is controlled by the steps (2), (3) and (4). Step (2) constrains them with respect to each other, (3) and (4) with respect to left and right contexts. A left bracket <i marks the end of a complete left context, and a right bracket >i marks the beginning of a complete right context. The replacement expression in step (5) includes auxiliary brackets on the upper and on the lower side of the relation. The final result of the composition does not contain any brackets. Step (1) removes them from the upper side and step (6) from the lower side. Since in some cases we first make the replacement and only then check for the brackets, it is necessary that every replacement of a substring preserves at least the brackets that it uses, and those which might be used for the replacement of adjacent substrings.

For computational reasons, we try to limit the number of different brackets. Therefore, when the contexts of two (or more) of the replacements are identical, li = lk and ri = rk, we always use identical pairs of brackets to mark them. Note that in the same way as li, lk, ri and rk are not contexts that literally occur in a replacement expression, but only placeholders for contexts that might be identical, the bracket symbols indexed by i and k do not occur literally in an expression and are only placeholders for real (literal) bracket symbols that might also be identical.

We define the component relations in the following way. Note that UPPER, LOWER, LEFT and RIGHT (Ui, Li, li and ri) stand for regular expressions of any complexity but restricted to denote regular languages. Consequently, they are represented by networks that contain no fst pairs.

(1) InsertBrackets [ ]

allE [ >allN E ] ] & ˜$[ allN E | >allE ] ] & ˜$[ allN E | >allE | allN E * >allE * 1 >1E 1E 1E 1E 1 >1E

>2E 1 c || x

;

[55]

we would have two pairs of brackets, 1 for (a) -> b and 2 for b -> c . Here the string xa would be incorrectly replaced by xcb in the following way: >2 >1 x >2 >1 1 >2 >1 x >2 >1 1 1 >2 >1 x >2 >1 1 1

[56]

The correct replacement, however, is xbb. If we first replaced the non-empty UPPER, in an example like (a) -> b // x

;

[57]

a string like xa would be replaced as follows: We would first check for the right contexts and brackets. Then we would make the replacement: [58]

>1 1 1 1 1 1 L1 // l1 (3) Left-oriented: { U1 -> L1 \\ l1 (4) Downward-oriented: { U1 -> L1 \/ l1

All four kinds of replacement expressions describe a relation that maps an UPPER to a LOWER between LEFT and RIGHT leaving everything else unchanged. The difference between them is where the left and the right contexts are expected, on the upper or on the lower side of the relation, i.e. LEFT and RIGHT contexts can be checked before or after the replacement. We obtain a relation that describes the upward-oriented replacement by composing the auxiliary relations (steps) described in section 3.3.2 in the order:

    InsertBrackets .o. ConstrainBrackets .o. LeftContext .o. RightContext .o. Replace .o. RemoveBrackets    [64]

This means that before the replacement step, we check the left and the right contexts. Brackets which were previously present in an optional way will now be confirmed (made obligatory) or forbidden (deleted) according to the context, i.e. no optional brackets will remain. The replacement will then be made by using those brackets.

With the other modes of context constraint, we vary the order of the three middle steps, which means we check the left and/or right context after the replacement. This can be thought of as using optional brackets (still not checked) for the replacement and checking them afterwards. If such a bracket is then confirmed, the replacement is accepted. Otherwise, if the bracket is forbidden, the replacement will be cancelled a posteriori.

In the case of the right-oriented version (//) the left context is checked on the lower side of the replacement (afterwards):

    ... RightContext .o. Replace .o. LeftContext ...    [65]

The left-oriented version (\\) applies the constraint in the opposite order:

    ... LeftContext .o. Replace .o. RightContext ...    [66]

The first three versions roughly correspond to the three alternative interpretations of phonological rewrite rules discussed in Kaplan and Kay (1994). The upward-oriented version corresponds to the simultaneous rule application; the right- and left-oriented versions can model rightward or leftward iterating processes, such as vowel harmony and assimilation.

In the fourth version (\/) the replacement operation is constrained by the lower (left and right) context:

    ... Replace .o. LeftContext .o. RightContext ...    [67]

Here the Ui (with 1 ≤ i ≤ n) gets mapped to the corresponding Li just in case they end up between li and ri in the output string. As in the upward-oriented case, the relative order of the two context constraints, with respect to each other, is irrelevant.
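The four orderings [64] to [67] differ only in the three middle steps. The Python sketch below simply tabulates them; it manipulates step names, not transducers, and the names MIDDLE_STEPS and composition_order are hypothetical. The operator || is assumed here to be the upward-oriented one, as in expression [19].

```python
MIDDLE_STEPS = {
    "||":   ["LeftContext", "RightContext", "Replace"],   # upward-oriented   [64]
    "//":   ["RightContext", "Replace", "LeftContext"],   # right-oriented    [65]
    "\\\\": ["LeftContext", "Replace", "RightContext"],   # left-oriented     [66]
    "\\/":  ["Replace", "LeftContext", "RightContext"],   # downward-oriented [67]
}

def composition_order(operator):
    """Return the six step names in the order in which they are composed."""
    return ["InsertBrackets", "ConstrainBrackets", *MIDDLE_STEPS[operator], "RemoveBrackets"]

print(" .o. ".join(composition_order("//")))
```

A context step that comes after Replace is the one checked on the lower side of the replacement.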

4.2

Inverse, bidirectional and optional replacement

This section applies to both the unconditional and conditional parallel replacement. A non-inverse replacement, ->, maps every Ui on the upper side to the corresponding Li on the lower side while an Li on the upper side gets mapped onto itself. This means that if we have an Li on the lower side and look upwards in the relation, we obtain two alternative results on the upper side, Ui and Li. With the inverse replacement, ...

... Li } this means that a Ui on the upper side is mapped either to a corresponding Li or to itself (no replacement) on the lower side. We define such a relation by changing N (the part not containing any UPPER) in the expressions [16] and [41] into ?* that accepts every substring:

    [ ?* R ]* ?*    [68]

In this expression an Ui is either mapped by the corresponding Ri contained in R (cf. expression 45) and therefore replaced by Li , or it is mapped by ?* and not replaced.
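The effect of replacing N by ?* in [68] can be seen by enumerating the outputs for one input string. The following Python sketch is an illustration under the assumption that every Ui and Li is a literal string and that empty Ui are skipped (otherwise the set would be infinite); the name optional_images is hypothetical.

```python
def optional_images(text, rules):
    """Enumerate the lower strings that [?* R]* ?* pairs with text."""
    results = set()

    def walk(pos, out):
        if pos == len(text):
            results.add(out)
            return
        # ?* : copy the next symbol unchanged.
        walk(pos + 1, out + text[pos])
        # R : replace some non-empty U_i that starts here.
        for upper, lower in rules.items():
            if upper and text.startswith(upper, pos):
                walk(pos + len(upper), out + lower)

    walk(0, "")
    return results

print(sorted(optional_images("aa", {"a": "b"})))  # ['aa', 'ab', 'ba', 'bb']
```

Each a is independently either replaced or copied, which gives the four outputs.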

5

Generation of the French Subjunctive as a Practical Application of Replacement

In this section we illustrate the usefulness of the replace operator using a practical example, and compare an expression from the example with rules of the same content written in the two-level formalism (Koskenniemi, 1983). We show how a lexicon of French verbs ending in -ir, inflected in the present tense subjunctive mood, can be derived from a lexicon containing the corresponding present indicative verb forms. We assume here that irregular verbs are encoded separately. It is often proposed that the present subjunctive of -ir verbs be derived, for the most basic case, from a stem in -iss- (e.g.: finir / finiss) rather than from a more general root (e.g.: fin(i)) because once this stem is assumed, the subjunctive ending itself becomes completely regular:

    que je finiss-e (that I finish)     que je cour-e (that I run)
    que tu finiss-es                    que tu cour-es
    qu’il finiss-e                      qu’il cour-e
    que nous finiss-ions                que nous cour-ions
    que vous finiss-iez                 que vous cour-iez
    qu’ils finiss-ent                   qu’ils cour-ent


The algorithm we propose here is straightforward: We first derive the present subjunctive stem from the third person plural present indicative (e.g.: finiss, cour), then append the suffix corresponding to the given person and number. The first step can be described as follows:

    define LETTER : a | b | c | d | ... ;    [69]

    define TAG : "+IndP" | "+SubjP" | ... | "+SG" | "+PL" | ... | "+P1" | "+P2" | "+P3" | ... | "+Verb" | ... ;    [70]

    define StemRegular :
      [ [. .] -> "+IndP" "+PL" "+P3" "+Verb" || LETTER _ TAG ]
      .o.
      [ LexInd TAG+ ]
      .o.
      [ e n t -> "+SUFF" || _ TAG ] ;    [71]

The transducer in the first line of expression [71] inserts the tags of the third person plural present indicative between the word and the tags of the actually required subjunctive form. The next transducer (eq. [71], line 3), which is an indicative lexicon of -ir verbs concatenated with a sequence of at least one tag, provides the indicative form and keeps the initial subjunctive tags. The last transducer (eq. [71], line 5) replaces the suffix -ent by the symbol "+SUFF". Example:

    finir__________________+SubjP+PL+P2+Verb
    finir_+IndP+PL+P3+Verb_+SubjP+PL+P2+Verb
    finissent______________+SubjP+PL+P2+Verb
    finiss+SUFF____________+SubjP+PL+P2+Verb

In some cases the stem has to be modified in the following way:

    define StemModif : i -> y || [ o | u ] _ "+SUFF" TAG* "+PL" [ "+P1" | "+P2" ] ;    [72]

To append the appropriate suffix to the subjunctive stem, we use the following transducer which maps the symbol "+SUFF" to a suffix and deletes all tags:

    define Suffix :
      [ "+SUFF" -> e || _ TAG* "+SG" [ "+P1" | "+P3" ] ]
      .o. [ "+SUFF" -> e s || _ TAG* "+SG" "+P2" ]
      .o. [ "+SUFF" -> i o n s || _ TAG* "+PL" "+P1" ]
      .o. [ "+SUFF" -> i e z || _ TAG* "+PL" "+P2" ]
      .o. [ "+SUFF" -> e n t || _ TAG* "+PL" "+P3" ]
      .o. [ TAG -> [ ] ] ;    [73]


The complete generation of subjunctive forms can be described by the composition:

    define LexSubjP : StemRegular .o. StemModif .o. Suffix ;    [74]

The resulting (single) transducer LexSubjP represents a lexicon of present subjunctive forms of French verbs ending in -ir. It maps the infinitive of those verbs, followed by a sequence of subjunctive tags, to the corresponding inflected surface form and vice versa. All intermediate transducers mentioned in this section contribute to this final transducer but themselves disappear. For all irregular verbs in -ir, we construct a separate transducer, details of which we omit for simplicity’s sake. Figure 3 gives an overview of the whole operation.
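The three stages of [74] can be followed step by step in a small Python sketch. This is a one-directional imitation on strings, not a transducer: the dictionary LEX_IND_3PL is a toy stand-in for LexInd, tags are passed as separate arguments instead of being part of the string, and all function names are hypothetical.

```python
import re

# Toy stand-in for LexInd: infinitive -> third person plural present indicative.
LEX_IND_3PL = {"finir": "finissent", "courir": "courent", "voir": "voient"}

SUFFIXES = {
    ("+SG", "+P1"): "e", ("+SG", "+P2"): "es", ("+SG", "+P3"): "e",
    ("+PL", "+P1"): "ions", ("+PL", "+P2"): "iez", ("+PL", "+P3"): "ent",
}

def stem_regular(infinitive):
    """Look up the 3rd person plural indicative and replace -ent by +SUFF."""
    return re.sub(r"ent$", "+SUFF", LEX_IND_3PL[infinitive])

def stem_modif(stem, number, person):
    """i -> y after o or u, before +SUFF, in the 1st and 2nd person plural."""
    if number == "+PL" and person in ("+P1", "+P2"):
        return re.sub(r"(?<=[ou])i(?=\+SUFF)", "y", stem)
    return stem

def suffix(stem, number, person):
    """Map +SUFF to the ending selected by the number and person tags."""
    return stem.replace("+SUFF", SUFFIXES[(number, person)])

def subjunctive(infinitive, number, person):
    return suffix(stem_modif(stem_regular(infinitive), number, person), number, person)

print(subjunctive("finir", "+SG", "+P2"))  # finisses
print(subjunctive("voir", "+PL", "+P1"))   # voyions
```

Unlike LexSubjP, the sketch cannot be run in the opposite direction, from surface form to infinitive and tags.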

    finir________________+SubjP+SG+P2+Verb      voir________________+SubjP+PL+P1+Verb
        StemRegular
    finir+IndP+PL+P3+Verb+SubjP+SG+P2+Verb      voir+IndP+PL+P3+Verb+SubjP+PL+P1+Verb
    finissent____________+SubjP+SG+P2+Verb      voient______________+SubjP+PL+P1+Verb
    finiss+SUFF__________+SubjP+SG+P2+Verb      voi+SUFF____________+SubjP+PL+P1+Verb
        StemModif
    finiss+SUFF__________+SubjP+SG+P2+Verb      voy+SUFF____________+SubjP+PL+P1+Verb
        Suffix
    finisses                                    voyions

Transducers: StemRegular, StemModif, Suffix. Input and output strings (examples): savoir+SubjP+PL+P2+Verb, etc.

Figure 3: Overview of the subjunctive transduction, with intermediate steps

The regular expressions in this section could also be written in the two-level formalism (Koskenniemi, 1983). However, some of them can be expressed more conveniently in the above way, especially when the replace operator is used. E.g., the first line of expression [71], written above as:

    [ [. .] -> "+IndP" "+PL" "+P3" "+Verb" || LETTER _ TAG ] ;    [75]

would have to be expressed in the two-level formalism by four rules:

    0:%+IndP <=> LETTER _ (:%+PL) (:%+P3) (:%+Verb) TAG ;
    0:%+PL   <=> LETTER (:%+IndP) _ (:%+P3) (:%+Verb) TAG ;
    0:%+P3   <=> LETTER (:%+IndP) (:%+PL) _ (:%+Verb) TAG ;
    0:%+Verb <=> LETTER (:%+IndP) (:%+PL) (:%+P3) _ TAG ;    [76]

Here, the difficulty comes not only from the large number of rules we would have to write in the above example, but also from the fact that writing any one of these rules requires keeping all the others in mind, to avoid inconsistencies between them.

Acknowledgements

This work builds on the research by Ronald Kaplan and Martin Kay on the finite-state calculus and the implementation of phonological rewrite rules (1994). Many thanks to all our colleagues at PARC and RXRC Grenoble who helped us in whatever respect. We are grateful to Annie Zaenen for important suggestions on the practical application in section 5. Particular thanks are due to Jean-Pierre Chanod and Marc Dymetman for extensive and helpful discussion on French morphology and empty-string replacement respectively, as well as to Kenneth Beesley and Anne Schiller for discussion on two-level rules. Many thanks also to Irene Maxwell for correcting various versions of the paper.

References

Brill, Eric (1992). A Simple Rule-Based Part of Speech Tagger. In the Proceedings of the 3rd conference on Applied Natural Language Processing. Trento, Italy, pp. 152-155.

Kaplan, Ronald M., and Kay, Martin (1981). Phonological Rules and Finite-State Transducers. Paper presented at the Annual Meeting of the Linguistic Society of America. New York.

Kaplan, Ronald M., and Kay, Martin (1994). Regular Models of Phonological Rule Systems. In Computational Linguistics. 20:3, pp. 331-378.

Karlsson, Fred, Voutilainen, Atro, Heikkilä, Juha, and Anttila, Arto (1994). Constraint Grammar: a Language-Independent System for Parsing Unrestricted Text. Mouton de Gruyter, Berlin.

Karttunen, Lauri (1995). The Replace Operator. Technical Report MLTT-017. Rank Xerox Research Centre, Grenoble Laboratory. February 6, 1994.

Koskenniemi, Kimmo (1983). Two-Level Morphology: A General Computational Model for Word-Form Recognition and Production. Department of General Linguistics. University of Helsinki.

Koskenniemi, Kimmo (1990). Finite-State Parsing and Disambiguation. In the Proceedings of Coling-90. Helsinki, Finland.

Koskenniemi, Kimmo, Tapanainen, Pasi, and Voutilainen, Atro (1992). Compiling and using finite-state syntactic rules. In the Proceedings of Coling-92. Nantes, France.

Voutilainen, Atro (1994). Three Studies of Grammar-Based Surface Parsing of Unrestricted English Text. The University of Helsinki.
