
August 18, 2009 16:30 WSPC/INSTRUCTION FILE

cgkn08a

Algorithms for the Join and Auto-Intersection of Multi-Tape Weighted Finite-State Machines∗

Jean-Marc Champarnaud (1), Franck Guingne (2,3), André Kempe† (2) and Florent Nicart (2,4)

(1) LITIS, Université de Rouen, 76800 Saint-Etienne du Rouvray, France. [email protected] – http://www.litislab.eu

(2) Xerox Research Centre Europe – Grenoble Laboratory, 6 chemin de Maupertuis, 38240 Meylan, France. [email protected] – http://www.xrce.xerox.com

(3) I3S (Université de Nice–Sophia Antipolis, CNRS), 06903 Sophia Antipolis Cedex, France. [email protected] – http://www.i3s.unice.fr/

(4) Department of Engineering Mathematics, University of Bristol, Queen's Building, University Walk, Clifton, Bristol, BS8 1TR. [email protected] – http://patterns.enm.bris.ac.uk/

A weighted finite-state machine with n tapes describes a rational relation on n strings. We recall some basic operations on n-ary rational relations, recast the important join operation in terms of "auto-intersection", and propose restricted algorithms for both operations. Joining two rational relations on more than one tape can unfortunately yield non-rational relations with undecidable properties. Consequently, there cannot be a fully general algorithm able to compile any rational join or auto-intersection. We define a class of triples $\langle A, i, j \rangle$ for which we are able to compile the auto-intersection of the machine A w.r.t. tapes i and j. We hope that this class is sufficient for many practical applications.

1. Introduction

Multi-tape finite-state machines (FSMs) [24, 5, 9, 8] are a natural generalization of the familiar one- and two-tape cases, known respectively as finite-state acceptors and transducers. An n-tape FSM characterizes n-tuples of strings. The set of tuples that it accepts is called an n-ary relation. If the FSM is weighted, it defines a weighted n-ary relation that assigns each n-tuple a weight (in some semiring), such as a probability.

In natural language processing, n-tape machines have recently been used to represent lattices of $\langle$speech, gesture, interpretation$\rangle$ triples for processing multimodal input [2]. In the morphological analysis of Semitic languages, multiple tapes have been used to synchronize the vowels, consonants, and templatic pattern into a surface form [9, 14].

∗ Sections 1-3 of this submission are adapted from a workshop paper by Kempe, Champarnaud, and Eisner (2004). We thank Jason Eisner for allowing us to include his contributions to those sections, notably the connection to relational databases and associated notation.

† This co-author changed affiliation after completing his contribution to this paper and can now be reached at Yahoo! Search Technologies, 17 rue Guillaume Tell, 75017 Paris, [email protected]


The relations defined by FSMs are known as rational relations. Unlike a classical database relation, however, the relation defined by a multi-tape finite-state machine may contain infinitely many tuples.

In this paper, we focus on the multi-tape join and the auto-intersection operations. The join operation is fundamental since some applications cannot be performed without it. We show how multi-tape join can be compiled through a sequence of more basic operations, such as single-tape (equi-)join and auto-intersection, for which we provide algorithms. However, generalizing weighted transducers to weighted multi-tape machines comes at a price: cyclic FSMs are closed under the rational operations, but not under all of the relational operations, for example not under intersection [24]. Similarly, there is no fully general algorithm for auto-intersection [12], as a consequence of the undecidability of the Post Correspondence Problem [23]. We therefore define a class Θ of weighted multi-tape machines that is closed with regard to auto-intersection, and provide algorithms to perform the membership test and the auto-intersection.

This paper is structured as follows. Basic definitions of weighted n-ary relations and n-tape weighted finite-state machines (n-WFSMs) are recalled in Section 2.2, and related operations, in particular join and auto-intersection, are presented in Section 3. The importance of multi-tape join in practice is emphasized by examples of applications that cannot be performed without this operation. Section 4 describes a class Θ of multi-tape machines for which the auto-intersection is computable and defines it by using the notion of delay between tapes. Section 5 gives definitions and algorithms for auto-intersection with a single pair of tapes and with multiple pairs of tapes, and also describes an iterative approach. Section 6 discusses an extension of the class Θ.

2. Definitions

We recall basic definitions about monoids, semirings and n-ary weighted relations. The definition of an n-WFSM follows the usual definitions for multi-tape finite-state automata [5, 3], with semiring weights added just as for acceptors and transducers [16, 21].

2.1. Semirings

A monoid is a structure $\langle M, \circ, \bar{1} \rangle$ consisting of a set $M$, an associative binary operation $\circ$ on $M$, and a neutral element $\bar{1}$ such that $\bar{1} \circ a = a \circ \bar{1} = a$ for all $a \in M$. A monoid is called commutative iff $a \circ b = b \circ a$ for all $a, b \in M$.

A semiring is a structure $\mathcal{K} = \langle K, \oplus, \otimes, \bar{0}, \bar{1} \rangle$ consisting of a set $K$, two binary operations, $\oplus$ (collection) and $\otimes$ (extension), and two neutral elements, $\bar{0}$ and $\bar{1}$, that satisfies the following properties:

• $\langle K, \oplus, \bar{0} \rangle$ is a commutative monoid, and $\langle K, \otimes, \bar{1} \rangle$ is a monoid,
• extension is left- and right-distributive over collection: $a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c)$, $(a \oplus b) \otimes c = (a \otimes c) \oplus (b \otimes c)$, $\forall a, b, c \in K$,
• $\bar{0}$ is an annihilator for extension: $\bar{0} \otimes a = a \otimes \bar{0} = \bar{0}$, $\forall a \in K$.

A semiring is called commutative if extension is commutative: $a \otimes b = b \otimes a$, $\forall a, b \in K$. Examples of commutative semirings are:

(1) $\langle \{\mathrm{FALSE}, \mathrm{TRUE}\}, \vee, \wedge, \mathrm{FALSE}, \mathrm{TRUE} \rangle$: the boolean semiring,


(2) $\langle \mathbb{N}, +, \times, 0, 1 \rangle$: a non-negative integer semiring,
(3) $\langle \mathbb{R}_{\ge 0}, +, \times, 0, 1 \rangle$: a non-negative real semiring that can be used to model probabilities,
(4) $\langle \mathbb{R}_{\ge 0} \cup \{\infty\}, \min, +, \infty, 0 \rangle$: a "tropical" semiring, sometimes used to model negative logarithms of probabilities.

2.2. Weighted n-ary Relations and Multi-Tape Weighted Finite-State Machines

A weighted n-ary relation is a function from $(\Sigma^*)^n$ to $K$, for a given finite alphabet $\Sigma$ and a given weight semiring $\mathcal{K} = \langle K, \oplus, \otimes, \bar{0}, \bar{1} \rangle$. A relation assigns a weight to any n-tuple of strings. A weight of $\bar{0}$ can be interpreted as meaning that the tuple is not in the relation. We are especially interested in rational (or regular) n-ary relations, i.e. relations that can be encoded by n-tape weighted finite-state machines, which we now define.

We adopt the convention that variable names referring to n-tuples of strings include a superscript $(n)$. Thus we write $s^{(n)}$ rather than $\vec{s}$ for a tuple of strings $\langle s_1, \dots, s_n \rangle$. We also use this convention for the names of more complex objects that contain n-tuples of strings, such as n-tape automata and their transitions and paths. In the following, we will use a pairing operator, denoted by a colon: $s^{(m+n)} = t^{(m)} : u^{(n)}$.

An n-tape weighted finite-state machine (WFSM or n-WFSM)^a $A^{(n)}$ is defined by a six-tuple $A^{(n)} = \langle \Sigma, Q, \mathcal{K}, E^{(n)}, \lambda, \varrho \rangle$, with $\Sigma$ being a finite alphabet, $Q$ a finite set of states, $\mathcal{K} = \langle K, \oplus, \otimes, \bar{0}, \bar{1} \rangle$ the semiring of weights, $E^{(n)} \subseteq (Q \times (\Sigma^*)^n \times K \times Q)$ a finite set of weighted n-tape transitions, $\lambda : Q \to K$ a function that assigns initial weights to states, and $\varrho : Q \to K$ a function that assigns final weights to states.

Any transition $e^{(n)} \in E^{(n)}$ has the form $e^{(n)} = \langle p, \ell^{(n)}, w, t \rangle$. We refer to these four components as the transition's source state $p(e^{(n)}) \in Q$, its label $\ell(e^{(n)}) \in (\Sigma^*)^n$, its weight $w(e^{(n)}) \in K$, and its target state $t(e^{(n)}) \in Q$.
We refer to the set of out-going transitions of a state $q \in Q$ by $E(q)$, and to the set of its in-coming transitions by $E^R(q)$ (with $E(q) \subseteq E^{(n)}$ and $E^R(q) \subseteq E^{(n)}$).

A path $\gamma^{(n)}$ of length $k \ge 0$ is a sequence of transitions $e_1^{(n)} e_2^{(n)} \cdots e_k^{(n)}$ such that $t(e_i^{(n)}) = p(e_{i+1}^{(n)})$ for each $i \in [\![1, k-1]\!]$. The label of a path is the element-wise concatenation of the labels of its transitions: $\ell(\gamma^{(n)}) =_{\mathrm{def}} \ell(e_1^{(n)}) \cdot \ell(e_2^{(n)}) \cdots \ell(e_k^{(n)})$. The weight of a path is defined to be

$$ w(\gamma^{(n)}) =_{\mathrm{def}} \lambda\big(p(e_1^{(n)})\big) \otimes \Big( \bigotimes_{j \in [\![1,k]\!]} w\big(e_j^{(n)}\big) \Big) \otimes \varrho\big(t(e_k^{(n)})\big) \tag{1} $$

The path is said to be successful, and to accept its label, if $w(\gamma^{(n)}) \neq \bar{0}$. We denote by $\Gamma_{A^{(n)}}$ the set of all successful paths of $A^{(n)}$, and by $\Gamma_{A^{(n)}}(s^{(n)})$ the set of successful paths (if any) that accept the n-tuple of strings $s^{(n)}$:

$$ \Gamma_{A^{(n)}}(s^{(n)}) = \{\, \gamma^{(n)} \in \Gamma_{A^{(n)}} \mid s^{(n)} = \ell(\gamma^{(n)}) \,\} \tag{2} $$

Now, the machine $A^{(n)}$ defines a weighted n-ary relation $R(A^{(n)}) : (\Sigma^*)^n \to K$ that assigns to each n-tuple $s^{(n)}$ the total weight of all paths accepting it:

$$ R_{A^{(n)}}(s^{(n)}) =_{\mathrm{def}} \bigoplus_{\gamma^{(n)} \in \Gamma_{A^{(n)}}(s^{(n)})} w(\gamma^{(n)}) \tag{3} $$

a We follow some recent literature in using the term "machine" rather than "automaton." The acronym to refer to the general n-tape case is then FSM or n-FSM, which leaves the acronym FSA available to refer to the special case of a finite-state acceptor (n = 1). The special case of a finite-state transducer (n = 2) is referred to by FST.
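Equations (1) to (3) can be illustrated on a hand-listed, finite set of paths. The following is only a minimal sketch under stated assumptions: the names `Semiring`, `path_label`, `path_weight` and `relation_weight` are hypothetical, transitions are plain tuples (source, label, weight, target), paths are non-empty, and the set of paths is enumerated by hand rather than extracted from a machine.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # collection
    times: Callable[[Any, Any], Any]  # extension
    zero: Any
    one: Any


# Two of the example semirings of Section 2.1 (names are assumptions of this sketch).
REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0.0)


def path_label(path):
    """Element-wise concatenation of the transition labels of a non-empty path."""
    n = len(path[0][1])
    return tuple("".join(e[1][tape] for e in path) for tape in range(n))


def path_weight(sr, initial, final, path):
    """Equation (1): initial weight, times transition weights, times final weight."""
    w = initial.get(path[0][0], sr.zero)
    for _, _, weight, _ in path:
        w = sr.times(w, weight)
    return sr.times(w, final.get(path[-1][3], sr.zero))


def relation_weight(sr, initial, final, paths, s):
    """Equation (3) over a finite list of paths; zero-weight paths add nothing."""
    total = sr.zero
    for path in paths:
        if path_label(path) == s:
            total = sr.plus(total, path_weight(sr, initial, final, path))
    return total


# Two paths of a 2-tape machine that both accept <ab, x>.
path_1 = [(0, ("a", ""), 0.5, 1), (1, ("b", "x"), 0.25, 2)]
path_2 = [(0, ("ab", "x"), 0.125, 2)]
paths = [path_1, path_2]
```

With the real semiring the two path weights are added, with the tropical semiring the smaller one is kept, which is the only difference between the two calls to `relation_weight`.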

It is convenient to define the support of an arbitrary weighted relation $R^{(n)}$, meaning the set of tuples to which the relation gives non-$\bar{0}$ weight:

$$ \mathrm{support}(R^{(n)}) =_{\mathrm{def}} \{\, s^{(n)} \in (\Sigma^*)^n \mid R^{(n)}(s^{(n)}) \neq \bar{0} \,\} \tag{4} $$

This support set can be regarded as an ordinary unweighted relation obtained from $R^{(n)}$. A different perspective on unweighted relations is that they are weighted relations over the boolean semiring, i.e. functions from $(\Sigma^*)^n$ to $\{\mathrm{FALSE}, \mathrm{TRUE}\}$.

2.3. Infinite Sums

In defining $R(A^{(n)})$, we glossed over one point for simplicity's sake. A sum over finitely many weights can be computed by repeated application of $\oplus$. But (3) may sometimes call for an infinite sum, whose meaning has not been defined. This case arises if $A^{(n)}$ contains any cyclic paths with the label $\langle \varepsilon, \varepsilon, \dots, \varepsilon \rangle$. Cyclic paths of this sort cannot simply be disallowed in a natural way, since they can be re-introduced by the closure and projection operations discussed below.

Briefly, the solution is to pre-compute the geometric sum $k^* = \bigoplus_{i=0}^{\infty} k^i \in K$ for each $k \in K$.^b In practice, one simply defines a closure operator $*$ that satisfies certain axioms, obtaining a so-called closed semiring. This allows infinite sums over any regular set of paths, as required by (3) and by Section 3's Equations (6), (7) and (8). One constructs a WFSM containing just those paths (i.e. $\Gamma_{A^{(n)}}(s^{(n)})$), and then sums their weights with an algorithm that generalizes the Kleene-Floyd-Warshall technique to closed semirings [18].

b Divergent sums can be represented by $k^* = \infty$, where $\infty \in K$ is a distinguished value.

3. Operations

We now describe some central operations on n-ary weighted relations and their n-tape WFSMs, focusing on operations that affect the number of tapes (see [13]). In particular, we introduce an "auto-intersection" operation that will simplify the discussion and compilation of multi-tape join. Our notation is chosen throughout to highlight the connection to relational databases.

3.1. Rational Operations

The basic rational operations of union, concatenation and closure can be used to construct any n-ary weighted rational relation.^c Thus, the rational operations can be used to write regular expressions that specify particular relations. From the database perspective, such expressions are useful for specifying both actual databases (i.e. finite relations) and particular queries (typically infinite relations, i.e. the set of all tuples with a given property).

c By combining the "atomic" weighted relations, namely those whose support is a single tuple from the finite set $\{(s_1, s_2, \dots, s_n) : |s_1 s_2 \cdots s_n| \le 1\}$.

The union and concatenation of two weighted n-ary relations, $R_1^{(n)}$ and $R_2^{(n)}$, are the relations $R_1^{(n)} \cup R_2^{(n)}$ and $R_1^{(n)} \cdot R_2^{(n)}$ defined by

$$ \big(R_1^{(n)} \cup R_2^{(n)}\big)(s^{(n)}) =_{\mathrm{def}} R_1^{(n)}(s^{(n)}) \oplus R_2^{(n)}(s^{(n)}) \tag{5} $$

$$ \big(R_1^{(n)} \cdot R_2^{(n)}\big)(s^{(n)}) =_{\mathrm{def}} \bigoplus_{u^{(n)}, v^{(n)} \,:\, (\forall i \in [\![1,n]\!])\, s_i = u_i \cdot v_i} R_1^{(n)}(u^{(n)}) \otimes R_2^{(n)}(v^{(n)}) \tag{6} $$
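For finite relations, (5) and (6) can be checked directly. The sketch below is an illustration only and assumes a representation that is not the paper's: a weighted relation is a Python dict from tuples of strings to weights, absent tuples carry weight $\bar{0}$, and the function names `union` and `concatenate` are hypothetical. The semiring operations default to ordinary addition and multiplication.

```python
import operator


def union(r1, r2, plus=operator.add):
    """Equation (5): weights of tuples present in both relations are collected."""
    out = dict(r1)
    for s, w in r2.items():
        out[s] = plus(out[s], w) if s in out else w
    return out


def concatenate(r1, r2, plus=operator.add, times=operator.mul):
    """Equation (6): sum over all ways of splitting s into u followed by v."""
    out = {}
    for u, wu in r1.items():
        for v, wv in r2.items():
            s = tuple(ui + vi for ui, vi in zip(u, v))
            w = times(wu, wv)
            out[s] = plus(out[s], w) if s in out else w
    return out


r1 = {("a", "x"): 2, ("", ""): 1}
r2 = {("a", "x"): 3, ("", ""): 1}
```

In `concatenate(r1, r2)` the tuple ("a", "x") is reached by two different factorizations, so its weight collects both products, as the sum in (6) requires.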

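The closure of $R^{(n)}$ is the relation

$$ (R^{(n)})^* =_{\mathrm{def}} \bigcup_{k=0}^{\infty} \underbrace{R^{(n)} \cdot R^{(n)} \cdots R^{(n)}}_{k \text{ times}} $$

implying that

$$ \big((R^{(n)})^*\big)(\langle s_1, \dots, s_n \rangle) = \bigoplus_{k=0}^{\infty} \;\; \bigoplus_{u_1^{(n)}, \dots, u_k^{(n)} \,:\, (\forall i \in [\![1,n]\!])\, s_i = (u_1)_i \cdot (u_2)_i \cdots (u_k)_i} \;\; \bigotimes_{j=1}^{k} R^{(n)}(u_j^{(n)}) \tag{7} $$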

These operations can be implemented by simple constructions on the corresponding nondeterministic n-tape WFSMs [25]. These n-tape constructions and their semiring-weighted versions are exactly the same as for acceptors (n = 1) and transducers (n = 2), as they are indifferent to the n-tuple transition labels.

3.2. Projection and Complementary Projection

Projection keeps certain columns of a database relation and discards the others. In the case of a rational relation implemented by an n-WFSM, it can be implemented by discarding the corresponding tapes of the n-WFSM, yielding an m-WFSM. Notice that m is not necessarily smaller than n since tapes that are kept can be duplicated.

Projection may map several distinct n-tuples onto the same m-tuple. In this case, we define the weight of the m-tuple by summing the several n-tuples' weights using $\oplus$. This resembles aggregation in databases, but note that only weights can be aggregated across n-tuples, not the (string) data in the n-tuples themselves.

For any $j_1, \dots, j_m \in [\![1, n]\!]$, we formally define a projection operator $\pi_{\langle j_1, \dots, j_m \rangle}$ that maps n-ary relations to m-ary relations:

$$ \big(\pi_{\langle j_1, \dots, j_m \rangle}(R_1^{(n)})\big)(s^{(m)}) =_{\mathrm{def}} \bigoplus_{u^{(n)} \,:\, (\forall i \in [\![1,m]\!])\, s_i = u_{j_i}} R_1^{(n)}(u^{(n)}) \tag{8} $$

It retains only those component strings (i.e. tapes) of each tuple that are specified by the indices $j_1, \dots, j_m$, and places them in the specified order. Notice that our definition allows projection indices to occur in any order, possibly with repeats. Thus the tapes of $s^{(n)}$ can be permuted or duplicated. For example, $\pi_{\langle 2,1 \rangle}$ will invert a 2-ary relation.

As a convenience, we also define the complementary projection of a relation. For any $j_1, \dots, j_m \in [\![1, n]\!]$, we define an operator $\overline{\pi}_{\{j_1, \dots, j_m\}}$ that removes the tapes $j_1, \dots, j_m$ and preserves all other tapes in their original order. Without loss of generality we may assume that $j_1 < j_2 < \cdots < j_m$; then we can define $\overline{\pi}_{\{j_1, \dots, j_m\}}$ as equivalent to $\pi_{\langle 1, \dots, j_1 - 1,\, j_1 + 1, \dots, j_m - 1,\, j_m + 1, \dots, n \rangle}$, which maps n-ary relations to (n − m)-ary relations.
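Both projection operators are easy to try out on finite relations. The following sketch again assumes the hypothetical dict representation (tuples of strings mapped to weights) and hypothetical function names; tape numbers are 1-based as in the text, and weights are collected with ordinary addition by default.

```python
import operator


def project(rel, tapes, plus=operator.add):
    """Equation (8): keep the listed tapes, in the listed order, possibly repeated."""
    out = {}
    for u, w in rel.items():
        s = tuple(u[j - 1] for j in tapes)
        out[s] = plus(out[s], w) if s in out else w
    return out


def complementary_project(rel, removed, n, plus=operator.add):
    """Remove the tapes in `removed` from an n-ary relation, keeping the others in order."""
    kept = [j for j in range(1, n + 1) if j not in removed]
    return project(rel, kept, plus)


rel = {("a", "b", "c"): 2, ("a", "x", "c"): 3}
```

Projecting `rel` onto tapes 1 and 3 maps both tuples onto ("a", "c"), whose weight is therefore the collected weight of the two originals.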


3.3. Join

Definition: The reader may already be familiar with the notion of natural join on databases. Our presentation differs from the standard database treatment in that our tapes are numbered, whereas the columns of a database are typically named. So our join operators, unlike a database join, must explicitly select tapes by number, and as a result are neither associative nor commutative.

A join of two relations is formed by finding "matching" pairs of tuples. For example, $\langle abc, def, \varepsilon \rangle$ and $\langle def, ghi, \varepsilon, jkl \rangle$ match on two of their tapes. We denote the matching of tapes in this case by $\{2 = 1, 3 = 3\}$. They combine to yield a tuple $\langle abc, def, \varepsilon, ghi, jkl \rangle$, whose weight in the joined relation is the $\otimes$-product of the two original tuples' weights.

More precisely, for any distinct $i_1, \dots, i_r \in [\![1, n]\!]$ and any distinct $j_1, \dots, j_r \in [\![1, m]\!]$, we define a join operator $\bowtie_{\{i_1 = j_1, \dots, i_r = j_r\}}$. It combines an n-ary and an m-ary relation into an (n + m − r)-ary relation defined as follows:

$$ \big(R_1^{(n)} \bowtie_{\{i_1 = j_1, \dots, i_r = j_r\}} R_2^{(m)}\big)(\langle u_1, \dots, u_n, s_1, \dots, s_{m-r} \rangle) =_{\mathrm{def}} R_1^{(n)}(u^{(n)}) \otimes R_2^{(m)}(v^{(m)}) \tag{9} $$

where $v^{(m)}$ is the unique tuple such that $\overline{\pi}_{\{j_1, \dots, j_r\}}(v^{(m)}) = s^{(m-r)}$ and $(\forall k \in [\![1, r]\!])\; v_{j_k} = u_{i_k}$.
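On finite relations, definition (9) amounts to a nested loop over matching pairs of tuples. The sketch below assumes the hypothetical dict representation (tuples of strings mapped to weights) and a hypothetical `join` function taking the tape pairs as 1-based `(i, j)` tuples; it reproduces the matching example given above with assumed weights 2 and 3.

```python
import operator


def join(r1, r2, pairs, times=operator.mul):
    """Equation (9): tape i of r1 must equal tape j of r2 for every (i, j) in pairs.

    The result keeps all tapes of the r1 tuple, followed by the tapes of the
    r2 tuple that are not mentioned in `pairs`.
    """
    joined_tapes = {j for _, j in pairs}
    out = {}
    for u, wu in r1.items():
        for v, wv in r2.items():
            if all(u[i - 1] == v[j - 1] for i, j in pairs):
                rest = tuple(x for tape, x in enumerate(v, start=1)
                             if tape not in joined_tapes)
                out[u + rest] = times(wu, wv)
    return out


r1 = {("abc", "def", ""): 2}
r2 = {("def", "ghi", "", "jkl"): 3, ("xyz", "ghi", "", "jkl"): 5}
result = join(r1, r2, [(2, 1), (3, 3)])
```

No summation is needed here because, as stated after (9), the r2 tuple is uniquely determined by the result tuple.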

Applications: Our version of the join operation is quite powerful. It can be used to join two "databases" (finite relations), to conjoin two "queries" (typically infinite relations), or to select those database tuples that match a query, reweighting them if the query is weighted.

Another family of uses is inspired by natural language processing, where WFSTs (n = 2) are commonly used to construct noisy channel models [15]. Using n > 2 tapes allows us to generalize naturally to constraint programming or graphical modeling over string-valued variables. Given variables $V_1, \dots, V_n$ with unknown values in the infinite domain $\Sigma^*$, one can specify a (weighted) m-ary relation to express a (soft) constraint over some m ≤ n of the variables. All known constraint relations can be systematically joined together, along tapes that correspond to common variables. This yields a (weighted) n-ary relation that evaluates which n-tuples are appropriate as joint values of the n variables. If this n-ary relation specifies a probability distribution over n-tuples, one can intersect it with another n-ary relation describing incomplete data, in order to compute the probability of the data for purposes of parameter training or statistical inference [4].

Many practical applications could not be performed without the multi-tape join operation. Examples include applying an n-WFSM to an r-tuple of input strings (transduction), conditional and joint probabilistic normalization [4], preservation of intermediate results in transducer cascades [1, 22, 10, 17], and searching for cognates [11].

Unfortunately, rational relations are not closed under arbitrary joins [12]. Nonetheless, we can mathematically define the possibly non-rational result of a join. The operation appears so useful that it is helpful to have a partial algorithm (Section 5).


Relation to Cross Product: Taking r = 0 gives an important special case. The cross product operator $\times$, equivalent to $\bowtie_{\emptyset}$, combines an n-ary and an m-ary relation into an (n + m)-ary relation:

$$ R_1^{(n)} \times R_2^{(m)} =_{\mathrm{def}} R_1^{(n)} \bowtie_{\emptyset} R_2^{(m)} \tag{10} $$

with the result that

$$ \big(R_1^{(n)} \times R_2^{(m)}\big)(\langle u_1, \dots, u_n, v_1, \dots, v_m \rangle) = R_1^{(n)}(u^{(n)}) \otimes R_2^{(m)}(v^{(m)}) \tag{11} $$

A WFSM for $R_1^{(n)} \times R_2^{(m)}$ can easily be constructed from WFSMs for $R_1^{(n)}$ and $R_2^{(m)}$, by concatenating them after appropriately "padding" their transition labels into (n + m)-tuples via extra epsilons. Thus, the cross product of weighted rational relations is always rational.

Relation to Intersection: Taking n = r = m gives another important special case. The intersection of two n-ary relations is the n-ary relation defined by:

$$ R_1^{(n)} \cap R_2^{(n)} =_{\mathrm{def}} R_1^{(n)} \bowtie_{\{1=1,\, 2=2,\, \dots,\, n=n\}} R_2^{(n)} \tag{12} $$

with the result that

$$ \big(R_1^{(n)} \cap R_2^{(n)}\big)(s^{(n)}) = R_1^{(n)}(s^{(n)}) \otimes R_2^{(n)}(s^{(n)}) \tag{13} $$

It is known that the intersection of two transducers (n = 2) is not necessarily rational [24]: $\{\langle a^j b^*, c^j \rangle \mid j \in \mathbb{N}\} \cap \{\langle a^* b^j, c^j \rangle \mid j \in \mathbb{N}\} = \{\langle a^j b^j, c^j \rangle \mid j \in \mathbb{N}\}$. Thus rational relations are not closed under the more general join operation, either.

Single-Tape Join: We speak of a single-tape join if only one tape is used in each relation (r = 1). Two well-known special cases are the join $\bowtie_{\{1=1\}}$ used to intersect two acceptors and the join $\bowtie_{\{2=1\}}$ used during classical composition of two transducers.

The single-tape join of weighted multi-tape rational relations is rational as long as the weights fall in a commutative weight semiring. One can construct a WFSM for the resulting relation, using a standard "cross-product of states" construction. The commutativity of the weights is crucial to this construction. (The constructed WFSM's paths interleave weights from paths in the two input WFSMs.) No such construction is possible if the weight semiring is not commutative. For example, it is shown in [20] that, in the case of a non-commutative semiring, the composition of two weighted transducers that are not both acyclic cannot be represented by a weighted finite-state transducer.

3.4. Auto-Intersection

Our discussion of join will be simplified by reducing it to a simpler problem. For any distinct $i_1, j_1, \dots, i_r, j_r \in [\![1, n]\!]$, we define an auto-intersection operator $\sigma_{\{i_1 = j_1,\, i_2 = j_2,\, \dots,\, i_r = j_r\}}$ that maps a relation $R^{(n)}$ to a "subset" of that relation, preserving tuples $s^{(n)}$ whose elements are equal in pairs as specified, but removing all other tuples from the support of the relation.^d The formal definition is the following:

$$ \big(\sigma_{\{i_1 = j_1, \dots, i_r = j_r\}}(R^{(n)})\big)(\langle s_1, \dots, s_n \rangle) =_{\mathrm{def}} \begin{cases} R^{(n)}(\langle s_1, \dots, s_n \rangle) & \text{if } (\forall k \in [\![1, r]\!])\; s_{i_k} = s_{j_k} \\ \bar{0} & \text{otherwise} \end{cases} \tag{14} $$

Note that auto-intersecting a relation is different from joining the relation with its own projections. For example, if $R^{(2)}$ is supported by $\{\langle a, b \rangle, \langle b, a \rangle\}$, then $\sigma_{\{1=2\}}(R^{(2)})$ has empty support. By contrast, both $R^{(2)} \bowtie_{\{1=1\}} \pi_{\langle 2 \rangle}(R^{(2)})$ and $R^{(2)} \bowtie_{\{2=1\}} \pi_{\langle 1 \rangle}(R^{(2)})$ have the same support as $R^{(2)}$.

It is possible to reduce a join to an auto-intersection using only rational operations (namely cross product and complementary projection). An arbitrary join can be implemented as

$$ R_1^{(n)} \bowtie_{\{i_1 = j_1, \dots, i_r = j_r\}} R_2^{(m)} = \overline{\pi}_{\{n + j_1, \dots, n + j_r\}}\Big( \sigma_{\{i_1 = n + j_1, \dots, i_r = n + j_r\}}\big( R_1^{(n)} \times R_2^{(m)} \big) \Big) \tag{15} $$

Conversely, any auto-intersection can be reduced to a single join with a rational relation:

$$ \sigma_{\{i_1 = j_1, \dots, i_r = j_r\}}(R^{(n)}) = R^{(n)} \bowtie_{\{i_1 = 1,\, j_1 = 2,\, \dots,\, i_r = 2r - 1,\, j_r = 2r\}} \Big( \underbrace{\pi_{\langle 1,1 \rangle}(\Sigma^*) \times \cdots \times \pi_{\langle 1,1 \rangle}(\Sigma^*)}_{r \text{ times}} \Big) \tag{16} $$

Thus, for any class of "difficult" join instances whose results are non-rational or have undecidable emptiness [12], there is a corresponding class of difficult auto-intersection instances, and vice versa. Conversely, a partial solution to one problem would yield a partial solution to the other.

Note that an auto-intersection on multiple pairs of tapes can be defined in terms of multiple auto-intersections on a single pair of tapes each:

$$ \sigma_{\{i_1 = j_1, \dots, i_r = j_r\}}(R^{(n)}) =_{\mathrm{def}} \sigma_{\{i_r = j_r\}}\big( \cdots \sigma_{\{i_1 = j_1\}}(R^{(n)}) \cdots \big) \tag{17} $$
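The reduction (15) can be tried out on finite relations, where every step is trivially computable. The sketch below is an illustration under assumptions: relations are dicts from tuples of strings to weights, and `cross`, `auto_intersect`, `drop_tapes` and `join` are hypothetical names. It says nothing about the hard case of infinite rational relations that the rest of the paper addresses.

```python
import operator


def cross(r1, r2, times=operator.mul):
    """Cross product (11): concatenate the tuples, multiply the weights."""
    return {u + v: times(wu, wv) for u, wu in r1.items() for v, wv in r2.items()}


def auto_intersect(rel, pairs):
    """Auto-intersection (14): keep tuples whose tapes i and j agree for each pair."""
    return {s: w for s, w in rel.items()
            if all(s[i - 1] == s[j - 1] for i, j in pairs)}


def drop_tapes(rel, removed, plus=operator.add):
    """Complementary projection: remove the given 1-based tapes."""
    out = {}
    for s, w in rel.items():
        kept = tuple(x for tape, x in enumerate(s, start=1) if tape not in removed)
        out[kept] = plus(out[kept], w) if kept in out else w
    return out


def join(r1, r2, pairs, n):
    """Join of an n-ary r1 with r2, computed as in (15)."""
    shifted = [(i, n + j) for i, j in pairs]
    return drop_tapes(auto_intersect(cross(r1, r2), shifted),
                      {n + j for _, j in pairs})


r1 = {("abc", "def", ""): 2}
r2 = {("def", "ghi", "", "jkl"): 3, ("xyz", "ghi", "", "jkl"): 5}
swap = {("a", "b"): 1, ("b", "a"): 1}
```

The relation `swap` reproduces the example above: its auto-intersection on tapes 1 and 2 is empty, whereas joining it with its own projection onto tape 2 gives back the same support.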

Nonetheless, we caution that the general case might benefit from a more direct treatment. It may be wise to compute $\sigma_{\{i_1 = j_1, \dots, i_r = j_r\}}$ "all at once" rather than one tape pair at a time. The reason is that even when $\sigma_{\{i_1 = j_1, \dots, i_r = j_r\}}$ is rational, a finite-state strategy for computing it via (17) could "fail" by encountering non-rational intermediate results. For example, consider applying $\sigma_{\{2=3,\, 4=5\}}$ to the rational 5-ary relation $\{\langle a^i b^j, c^i, c^j, d, e \rangle \mid i, j \in \mathbb{N}\}$. The final result is rational (the empty relation), but the intermediate result after applying just $\sigma_{\{2=3\}}$ would be the non-rational relation $\{\langle a^i b^i, c^i, c^i, d, e \rangle \mid i \in \mathbb{N}\}$.

d The requirement that the 2r indices be distinct mirrors the similar requirement on join and is needed in (16). But it can be evaded by duplicating tapes: an illegal auto-intersection such as $\sigma_{\{1=2,\, 2=3\}}(R)$ can be computed as $\overline{\pi}_{\{3\}}(\sigma_{\{1=2,\, 3=4\}}(\pi_{\langle 1,2,2,3 \rangle}(R)))$.

4. A class of computable auto-intersections

In this section we define a class of computable single-pair auto-intersections, based on the notion of delay between two tapes.

We will adopt the following conventions: we refer by $A^{(n)} = \langle \Sigma, Q, \mathcal{K}, E^{(n)}, \lambda, \varrho \rangle$ to a new n-WFSM that is the result of a construction, and by $A_i^{(n)} = \langle \Sigma_i, Q_i, \mathcal{K}_i, E_i^{(n)}, \lambda_i, \varrho_i \rangle$ to original n-WFSMs that are used in a construction. For example, we write $A^{(n)} = \sigma_{\{\dots\}}(A_1^{(n)})$ or $A^{(n+m-r)} = A_1^{(n)} \bowtie_{\{\dots\}} A_2^{(m)}$. Notice that in the following sections several different n-WFSMs have the same name $A_1^{(n)}$. There is no ambiguity since their definition is related to a figure.

4.1. Delay between two tapes

4.1.1. Definition of delay

We will use the notion of delay between two tapes, similarly as in the synchronization of transducers [6, 19]. By delay we understand the difference of length of two strings of an n-tuple: $\delta_{\langle i,j \rangle}(s^{(n)}) = |s_i| - |s_j|$ with $i, j \in [\![1, n]\!]$. The delay of a path $\gamma = \gamma_1 \gamma_2 \cdots \gamma_r$, or of any of its factors $\gamma_h$, results from its respective labels on tapes i and j: $\delta_{\langle i,j \rangle}(\gamma) = |\pi_{\langle i \rangle}(\ell(\gamma))| - |\pi_{\langle j \rangle}(\ell(\gamma))|$ with $i, j \in [\![1, n]\!]$. We will use $\ell_i(\,)$ as a shorthand for $\pi_{\langle i \rangle}(\ell(\,))$, yielding: $\delta_{\langle i,j \rangle}(\gamma) = |\ell_i(\gamma)| - |\ell_j(\gamma)|$.

We call the delay bounded if its absolute value does not exceed a limit $\delta^{\max}_{\langle i,j \rangle}$. We say that a path has bounded delay if all its prefixes have bounded delay, and that an n-WFSM has bounded delay if all its successful paths $\gamma \in \Gamma_{A^{(n)}}$ have bounded delay. The delay of the n-WFSM is then bounded by a single constant that applies to all its successful paths.

4.1.2. Delay of paths in auto-intersection

The relation $R^{(n)} = \sigma_{\{i=j\}}(R_1^{(n)})$ assigns a weight $\bar{0}$ to each string tuple $s^{(n)}$ such that $s_i \neq s_j$. For the sake of simplicity, we chose to construct the auto-intersection $A^{(n)} = \sigma_{\{i=j\}}(A_1^{(n)})$ without creating paths such that $\ell_i(\gamma) \neq \ell_j(\gamma)$, which is equivalent to creating such paths with $w(\gamma) = \bar{0}$. Thus, all successful paths of $A^{(n)}$ have a delay equal to 0: $\forall \gamma \in \Gamma_{A^{(n)}}$, $(\ell_i(\gamma) = \ell_j(\gamma)) \Longrightarrow (|\ell_i(\gamma)| = |\ell_j(\gamma)|) \Longrightarrow \delta_{\langle i,j \rangle}(\gamma) = 0$.

We denote by $\Gamma^{=}_{A_1^{(n)}}$ the set of paths of $A_1^{(n)}$ that have equal strings on tapes i and j, and by $\Gamma^{0}_{A_1^{(n)}}$ the set of paths of $A_1^{(n)}$ that have a 0-delay w.r.t. tapes i and j. Then it holds

$$ \Gamma^{=}_{A_1^{(n)}} \subseteq \Gamma^{0}_{A_1^{(n)}} \subseteq \Gamma_{A_1^{(n)}} \tag{18} $$

$$ R(\Gamma_{A^{(n)}}) = R(\Gamma^{=}_{A_1^{(n)}}) \tag{19} $$

Note that $\Gamma_{A^{(n)}} \neq \Gamma^{=}_{A_1^{(n)}}$, despite (19), because the two sets are in different n-WFSMs.

If the auto-intersection $R^{(n)}$ is rational then $A^{(n)}$ exists and $\Gamma_{A^{(n)}}$ has bounded delay since all its paths have 0-delay. This does, however, not imply a bounded delay for $\Gamma^{=}_{A_1^{(n)}}$. For example, $(a{:}\varepsilon)^* \; a{:}a \; (\varepsilon{:}a)^*$ has a $\Gamma^{=}_{A_1^{(n)}}$ with unbounded delay, but a rational auto-intersection that can be defined by $a{:}a \; (a{:}a)^*$ with bounded delay. Note that boundedness is a property of path sets or n-WFSMs, and rationality a property of relations.

We can consider any path as a factorization, in which the sum of the delays of the factors is equal to the delay of the path. Therefore it holds: $\forall \gamma \in \Gamma^{0}_{A_1^{(n)}}$, $\gamma = \gamma_1 \gamma_2 \cdots \gamma_r$: $\delta_{\langle i,j \rangle}(\gamma) = \sum_{h=1}^{r} \delta_{\langle i,j \rangle}(\gamma_h) = 0$. If we denote acyclic factors by $a_h$ and cyclic factors by $c_h$, we can consider any path as a factorization of the form $\gamma = a_1 c_1^{k_1} a_2 c_2^{k_2} \cdots a_r c_r^{k_r}$ (where any $a_h$ or $c_h$ can be of length 0). It holds

$$ \forall \gamma \in \Gamma^{0}_{A_1^{(n)}},\; \gamma = a_1 c_1^{k_1} a_2 c_2^{k_2} \cdots a_r c_r^{k_r} : \quad \sum_{h=1}^{r} \delta_{\langle i,j \rangle}(a_h) + \sum_{h=1}^{r} \delta_{\langle i,j \rangle}(c_h^{k_h}) = 0 \tag{20} $$

$$ \sum_{h=1}^{r} \delta_{\langle i,j \rangle}(a_h) = \mathrm{const} \quad \Longrightarrow \quad \sum_{h=1}^{r} \delta_{\langle i,j \rangle}(c_h^{k_h}) = \sum_{h=1}^{r} k_h \cdot \delta_{\langle i,j \rangle}(c_h) = \mathrm{const} \tag{21} $$
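The delay definitions of Section 4.1 can be made concrete with a few lines of code. In this sketch a path is represented only by the list of its transition labels (tuples of strings), tape numbers are 1-based, and all function names are hypothetical. The example family is the one mentioned above, $(a{:}\varepsilon)^k \; a{:}a \; (\varepsilon{:}a)^k$, whose paths have total delay 0 but a largest prefix delay that grows with k.

```python
def delay(s, i, j):
    """Delay of a string tuple: length on tape i minus length on tape j."""
    return len(s[i - 1]) - len(s[j - 1])


def path_delay(labels, i, j):
    """Delay of a path, as the sum of the delays of its transition labels."""
    return sum(delay(label, i, j) for label in labels)


def max_abs_prefix_delay(labels, i, j):
    """Largest absolute delay reached by any prefix of the path."""
    current, worst = 0, 0
    for label in labels:
        current += delay(label, i, j)
        worst = max(worst, abs(current))
    return worst


def example_path(k):
    """Labels of the path (a:eps)^k a:a (eps:a)^k of a 2-tape machine."""
    return [("a", "")] * k + [("a", "a")] + [("", "a")] * k
```

Since `max_abs_prefix_delay(example_path(k), 1, 2)` equals k, no single constant bounds the delay of all these paths, which is the sense in which this path set has unbounded delay.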

4.2. A simple case of bounded delay

4.2.1. Cycles with positive delay

Before defining the class of triples $\langle A_1^{(n)}, i, j \rangle$ that our construction can cope with, we discuss a simple example. Consider the n-WFSM $A_1^{(n)}$, in Figure 1, with acyclic factors denoted by $a_h$ and cyclic factors denoted by $c_h$. If all cycles have positive delay, $\delta_{\langle i,j \rangle}(c_h) > 0$, then there is only a finite set of solutions to (21), corresponding to a finite set of tuples of coefficients $\langle k_1, \dots, k_r \rangle$, and to a finite set of paths $\Gamma^{0}_{A_1^{(n)}}$. Due to (18), the set of paths of the auto-intersection $\Gamma_{A^{(n)}}$ is finite, too. This means that $A^{(n)}$ is acyclic and hence has bounded delay (not exceeding a limit that we will compute below).

Fig. 1. The n-WFSM $A_1^{(n)}$ has four acyclic factors $a_h$ and three cycles $c_h$ with positive delay. (Figure not reproduced; it shows the acyclic factors $a_1, a_2, a_3, a_4$ and the cycles $c_1, c_2, c_3$.)
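The finiteness argument can be illustrated by enumerating the coefficient tuples $\langle k_1, \dots, k_r \rangle$ that satisfy (20) and (21) when every cycle delay is positive. This is a sketch only: the function name `cycle_counts` is hypothetical, and the concrete delays used below are assumed values, not taken from Figure 1.

```python
from itertools import product


def cycle_counts(cycle_delays, acyclic_delay_sum):
    """All tuples (k_1, ..., k_r) with sum(k_h * delay(c_h)) == -acyclic_delay_sum.

    Every cycle delay must be positive, which bounds each k_h and therefore
    makes the search space finite.
    """
    if any(d <= 0 for d in cycle_delays):
        raise ValueError("this sketch only covers cycles with positive delay")
    target = -acyclic_delay_sum
    if target < 0:
        return []
    ranges = [range(target // d + 1) for d in cycle_delays]
    return [ks for ks in product(*ranges)
            if sum(k * d for k, d in zip(ks, cycle_delays)) == target]


# Assumed values: three cycles with delays 1, 2, 3; acyclic factors summing to -4.
solutions = cycle_counts((1, 2, 3), -4)
```

With these assumed values only four coefficient tuples remain, so only finitely many paths can have 0-delay.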

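4.2.2. Limit of the delay

If we monitored the delay along any path $\gamma \in \Gamma^{0}_{A_1^{(n)}}$ of the n-WFSM in Figure 1, we would obtain a curve such as in Figure 2, that begins and ends with $\delta_{\langle i,j \rangle} = 0$. What is the limit for the (marked) global maximum $\hat{\delta}_{\langle i,j \rangle}(\gamma)$ and global minimum $\check{\delta}_{\langle i,j \rangle}(\gamma)$ of such a curve, which, in fact, is not recorded?

Fig. 2. Hypothetical monitoring of the delay of one path $\gamma$ of $A_1^{(n)}$, with $\delta_{\langle i,j \rangle}(\gamma) = 0$ (global extrema marked). (Plot not reproduced; it shows the delay $\delta$ along $\gamma$ through the factors $a_1, \dots, a_4$ and repeated traversals of $c_1, c_2, c_3$.)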

Actually, we traverse $A_1^{(n)}$ in-depth^e both left-to-right and right-to-left, and obtain the curves in Figure 3, with the (marked) global maxima $\hat{\delta}^{LR}_{\langle i,j \rangle}(A_1^{(n)})$ and $\hat{\delta}^{RL}_{\langle i,j \rangle}(A_1^{(n)})$, and global minima $\check{\delta}^{LR}_{\langle i,j \rangle}(A_1^{(n)})$ and $\check{\delta}^{RL}_{\langle i,j \rangle}(A_1^{(n)})$. Note that in the right-to-left traversal we have to invert the sign of the delay. We could monitor a different curve in Figure 3a. For example, after $a_1$, we could first traverse $a_2$ rather than $c_1$. This, however, would have no impact on the vertical positioning of any part of the curve, and therefore would not influence the observed global extrema.

e In an in-depth traversal of an n-WFSM, all outgoing transitions of a state are followed (in arbitrary order). If we reach a state that is already on the currently visited path, which indicates that we are in a cycle, or if we reach a state that has no more outgoing transitions that we did not follow yet, then we back-track. This resembles the traversal of a tree. An in-depth traversal is repeated for each initial state (in arbitrary order).

Fig. 3. Possible actual monitoring of the delay on all paths of $A_1^{(n)}$, (a) on a left-to-right and (b) on a right-to-left in-depth traversal (global extrema marked). (Plots not reproduced; panel (a) shows $\delta^{LR}$ along $\gamma^{LR}$ and panel (b) shows $\delta^{RL}$ along $\gamma^{RL}$.)

Since all cycles have positive delay, traversing a cycle always raises the remainder of the $\delta_{\langle i,j \rangle}$-curve (Figure 2). It does, however, not influence the remainder of the $\delta^{LR}_{\langle i,j \rangle}$-curve (Figure 3a). This means that parts of the $\delta_{\langle i,j \rangle}$-curve can be higher than the corresponding parts of the $\delta^{LR}_{\langle i,j \rangle}$-curve, but not vice versa. Furthermore, since the $\delta_{\langle i,j \rangle}$-curve begins and ends in 0, its global minimum cannot be higher than 0. For the global minimum of all paths with 0-delay in $A_1^{(n)}$, this means

$$ \forall \gamma \in \Gamma^{0}_{A_1^{(n)}} : \quad \check{\delta}^{LR}_{\langle i,j \rangle}(A_1^{(n)}) \le \check{\delta}_{\langle i,j \rangle}(\gamma) \le 0 \quad \Longrightarrow \quad |\check{\delta}_{\langle i,j \rangle}(\gamma)| \le |\check{\delta}^{LR}_{\langle i,j \rangle}(A_1^{(n)})| \tag{22} $$

For the same reason, parts of the $\delta_{\langle i,j \rangle}$-curve can be lower than the corresponding parts of the $\delta^{RL}_{\langle i,j \rangle}$-curve (Figure 3b), but not vice versa. Since the $\delta_{\langle i,j \rangle}$-curve begins and ends in 0, its global maximum cannot be lower than 0. For the global maximum of all paths with 0-delay, this means

$$ \forall \gamma \in \Gamma^{0}_{A_1^{(n)}} : \quad \hat{\delta}^{RL}_{\langle i,j \rangle}(A_1^{(n)}) \ge \hat{\delta}_{\langle i,j \rangle}(\gamma) \ge 0 \quad \Longrightarrow \quad |\hat{\delta}_{\langle i,j \rangle}(\gamma)| \le |\hat{\delta}^{RL}_{\langle i,j \rangle}(A_1^{(n)})| \tag{23} $$

Note that (22) and (23) hold for any path $\gamma \in \Gamma^{0}_{A_1^{(n)}}$ and any (arbitrary) order in which parts of $A_1^{(n)}$ can be traversed during the in-depth traversals.

4.3. A class of triples $\langle A_1^{(n)}, i, j \rangle$ yielding a compilable auto-intersection

Proposition 1. Let Θ be the class of all the triples $\langle A_1^{(n)}, i, j \rangle$ such that $A_1^{(n)}$ does not contain a path traversing both a cycle with positive delay and a cycle with negative delay (w.r.t. tapes i and j). Then for all paths $\gamma \in \Gamma_{A^{(n)}}$ of $A^{(n)} = \sigma_{\{i=j\}}(A_1^{(n)})$, the delay is bounded by

$$ \delta^{\max}_{\langle i,j \rangle} = \max\big(\, |\hat{\delta}^{LR}_{\langle i,j \rangle}(A_1^{(n)})| ,\; |\hat{\delta}^{RL}_{\langle i,j \rangle}(A_1^{(n)})| ,\; |\check{\delta}^{LR}_{\langle i,j \rangle}(A_1^{(n)})| ,\; |\check{\delta}^{RL}_{\langle i,j \rangle}(A_1^{(n)})| \,\big) \tag{24} $$


Proof. It follows from (22) and (23) that if a path $\gamma \in \Gamma^{0}_{A_1^{(n)}}$ has only cycles with positive delay, the absolute value of the delay of $\gamma$ is bounded by $\max(\, |\check{\delta}^{LR}_{\langle i,j \rangle}(A_1^{(n)})|, |\hat{\delta}^{RL}_{\langle i,j \rangle}(A_1^{(n)})| \,)$. This still holds if we also admit cycles with 0-delay on $\gamma$, since traversing such a cycle has no impact on the delay outside of it. If all cycles of $\gamma$ had negative or 0-delay instead, the absolute value of the delay of $\gamma$ would be bounded symmetrically by $\max(\, |\check{\delta}^{RL}_{\langle i,j \rangle}(A_1^{(n)})|, |\hat{\delta}^{LR}_{\langle i,j \rangle}(A_1^{(n)})| \,)$. Therefore, Proposition 1 holds for all paths $\gamma \in \Gamma^{0}_{A_1^{(n)}}$. Due to (18), it also holds for all paths $\gamma \in \Gamma_{A^{(n)}}$.
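The four extrema entering (24) come from the two in-depth traversals described in footnote e. The sketch below is one possible reading of that description, not the authors' implementation: transitions are hypothetical triples (source, label, target) with weights omitted, every cycle is followed once before back-tracking, and the right-to-left pass is simulated by reversing the transitions, starting from the final states and negating each transition's delay.

```python
def step_delay(label, i, j):
    """Delay contributed by one transition label (1-based tapes i and j)."""
    return len(label[i - 1]) - len(label[j - 1])


def traversal_extrema(transitions, start_states, i, j, sign=1):
    """Global maximum and minimum of the delay seen during an in-depth traversal."""
    out = {}
    for p, label, t in transitions:
        out.setdefault(p, []).append((sign * step_delay(label, i, j), t))
    hi = lo = 0

    def visit(q, d, on_path):
        nonlocal hi, lo
        for step, t in out.get(q, []):
            nd = d + step
            hi, lo = max(hi, nd), min(lo, nd)
            if t not in on_path:          # back-track when a cycle is closed
                visit(t, nd, on_path | {t})

    for q in start_states:
        visit(q, 0, {q})
    return hi, lo


def delay_bound(transitions, initial, final, i, j):
    """The maximum in (24), from a left-to-right and a right-to-left traversal."""
    hi_lr, lo_lr = traversal_extrema(transitions, initial, i, j)
    backwards = [(t, label, p) for p, label, t in transitions]
    hi_rl, lo_rl = traversal_extrema(backwards, final, i, j, sign=-1)
    return max(abs(hi_lr), abs(hi_rl), abs(lo_lr), abs(lo_rl))


# Assumed toy machine: a cycle with delay +1 on state 0, then a transition with delay -2.
toy = [(0, ("a", ""), 0), (0, ("", "aa"), 1)]
```

For the toy machine, the only 0-delay path takes the cycle twice and reaches a delay of 2 before returning to 0, which matches the bound returned by `delay_bound(toy, [0], [1], 1, 2)`.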

(n)

Note that even if A1 has cycles with delays of different sign, hA1 , i, ji can still be in Θ, if those cycles are not on the same path. We will see two methods (Sections 5.3 and 6.2) that are expected to extend the class Θ. Both methods attempt to remove cycles that are in conflict with the class definition. How(n) ever, even then we can obviously only deal with a subset of all triples hA1 , i, ji that lead to a rational auto-intersection, but we hope that it is sufficient for many practical applications. 4.4. Membership test 4.4.1. Example (n)

Let $A_1^{(n)}$ be the n-WFSM of Figure 4. To decide whether a triple $\langle A_1^{(n)}, i, j\rangle$ is in the class $\Theta$ we have to analyze the delays of its elementary cycles,$^{f}$ $c_0$, $c_1$, $c_2$, and $c_3$. We must compare $c_0$ with all other cycles, and $c_1$ with $c_2$. We must, however, not compare $c_1$ or $c_2$ with $c_3$ because no path traverses both $c_1$ and $c_3$, or $c_2$ and $c_3$.

It is sufficient to analyze elementary cycles. For example, if the delays of $c_1$ and $c_2$ have equal sign, then any cycle $c'$ consisting of (one or more times) $c_1$ and $c_2$ would have a delay of the same sign. Comparing $c_0$ with $c_1$ and $c_2$ is then sufficient, because comparing $c_0$ with $c'$ would not lead to a different result. If, however, the delays of $c_1$ and $c_2$ have different sign, then the comparison of $c_1$ with $c_2$ already reveals a conflict, and no further analysis is required. Hence, the cycle $c'$ does not need to be considered in any case.

Fig. 4. Analysis of the elementary cycles $c_0$, $c_1$, $c_2$ and $c_3$ of an n-WFSM (labels and weights omitted).
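The comparison of cycles discussed in this example can be sketched in Python as follows. This is only an illustration under several assumptions that are not taken from the text: the delay of a cycle is taken to be the difference of the label lengths on tapes $i$ and $j$, the cycles are enumerated by a naive depth-first search rather than by the method of [7], and the machine built by the hypothetical helper `fig4_like` merely imitates the shape of Figure 4 (whose labels are omitted) with invented labels.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    src: int
    li: str   # label on tape i ("" encodes epsilon)
    lj: str   # label on tape j
    dst: int


def outgoing(arcs):
    """Index the arcs by their source state."""
    out = {}
    for a in arcs:
        out.setdefault(a.src, []).append(a)
    return out


def elementary_cycles(arcs):
    """Yield every elementary cycle once, rooted at its smallest state.

    Naive depth-first enumeration (an assumption of this sketch, not the method of [7]).
    """
    out = outgoing(arcs)

    def dfs(start, state, path, seen):
        for a in out.get(state, []):
            if a.dst == start:
                yield path + [a]
            elif a.dst > start and a.dst not in seen:
                yield from dfs(start, a.dst, path + [a], seen | {a.dst})

    for s in sorted(out):
        yield from dfs(s, s, [], {s})


def in_theta(arcs, initial, final):
    """True if no successful acyclic path visits states marked with both signs."""
    out = outgoing(arcs)
    marks = {}  # state -> signs (+1 / -1) of the non-zero-delay cycles through it
    for cycle in elementary_cycles(arcs):
        delay = sum(len(a.li) - len(a.lj) for a in cycle)  # assumed delay measure
        if delay:
            for a in cycle:
                marks.setdefault(a.src, set()).add(1 if delay > 0 else -1)

    def conflict(state, signs, visited):
        signs = signs | marks.get(state, set())
        if state in final and len(signs) == 2:
            return True
        return any(conflict(a.dst, signs, visited | {a.dst})
                   for a in out.get(state, []) if a.dst not in visited)

    return not any(conflict(q, frozenset(), {q}) for q in initial)


def fig4_like(c2_label):
    """Hypothetical machine: c0 on state 0, c1 on 1, c2 on 2, c3 on 3."""
    return [Arc(0, "a", "a", 0),           # c0, delay 0
            Arc(0, "", "", 1),
            Arc(1, "a", "", 1),            # c1, delay +1
            Arc(1, "", "", 2),
            Arc(2, *c2_label, 2),          # c2, delay chosen by the caller
            Arc(0, "", "", 3),
            Arc(3, "", "b", 3)]            # c3, delay -1, on another branch


print(in_theta(fig4_like(("a", "")), {0}, {2, 3}))  # True: c1 and c2 agree
print(in_theta(fig4_like(("", "a")), {0}, {2, 3}))  # False: c1 and c2 conflict
```

In the first call the negative cycle $c_3$ does not matter because it lies on a different branch than $c_1$ and $c_2$; in the second call the conflict between $c_1$ and $c_2$ is found on the path through states 0, 1 and 2.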

4.4.2. Algorithm

We proceed in three steps to test whether a given triple $\langle A^{(n)}, i, j\rangle$ is a member of the class $\Theta$. First, we find all elementary cycles by means of a standard method [7]. Then, we mark on each state of a cycle the sign of the cycle's delay. Finally, we perform an in-depth traversal in order to search for a successful acyclic path that traverses two states marked by different sign. The worst-case time complexity of the membership test is exponential w.r.t. the number of states because it is exponential both for finding all elementary cycles and for the traversal of all acyclic paths.

f By an elementary cycle on a state $q$ we understand a cycle that starts and ends in $q$ and does not traverse any state more than once.

5. Algorithms based on delay

We present an algorithm for computing the auto-intersection of an n-WFSM $A_1^{(n)}$ w.r.t. tapes $i$ and $j$, when the triple $\langle A_1^{(n)}, i, j\rangle$ is in the class $\Theta$. In this case, the algorithm copies $\Gamma^{=}_{A_1^{(n)}}$ of $A_1^{(n)}$ into the new n-WFSM $A^{(n)}$. Actually, it is applicable whenever $\Gamma^{0}_{A_1^{(n)}}$ has bounded delay and the bound is known, which admits a larger class than $\Theta$. This, however, requires a different membership test and a different compilation of the bound.

We also show how this algorithm can be extended for handling a multi-pair auto-intersection.

5.1. Auto-intersection on a single pair of tapes

According to Equation 14, computing the auto-intersection on tapes $i$ and $j$ of an n-WFSM $A_1^{(n)} = \langle \Sigma_1, Q_1, K_1, E_1^{(n)}, \lambda_1, \varrho_1\rangle$ such that the triple $\langle A_1^{(n)}, i, j\rangle$ is in the class $\Theta$ consists in computing an automaton $A^{(n)}$ that keeps all the successful paths of $A_1^{(n)}$ whose labels on tapes $i$ and $j$ are equal. However, this does not amount to making $A^{(n)}$ a subautomaton of $A_1^{(n)}$. Indeed, due to $\varepsilon$ labels, the property of equality cannot be kept for the prefixes of paths of $A^{(n)}$. Let $\gamma$ be a successful path of $A^{(n)}$ with $\ell_i(\gamma) = \ell_j(\gamma) = w$ and $\gamma'$ a prefix of $\gamma$. Then there exist $z$ and $t$ in $\Sigma^*$ such that $\ell_i(\gamma') \cdot z = \ell_j(\gamma') \cdot t$. Hence one of the two labels $\ell_i(\gamma')$ or $\ell_j(\gamma')$ is a prefix of the other one. Let $x = \mathrm{lcp}(\ell_i(\gamma'), \ell_j(\gamma'))$. Then there exist two words $s$ and $u$ in $\Sigma^*$, called leftover strings, such that $\ell_i(\gamma') = x \cdot s$ and $\ell_j(\gamma') = x \cdot u$, where $s$ or $u$ is necessarily the empty word. Therefore, a state in $Q$ can be described as a tuple $\langle q_1, s, u\rangle$ in $Q_1 \times (\{\varepsilon\} \times \Sigma_1^{\le \delta^{\max}_{\langle i,j\rangle}} \cup \Sigma_1^{\le \delta^{\max}_{\langle i,j\rangle}} \times \{\varepsilon\})$, where $\delta^{\max}_{\langle i,j\rangle}$ is a bound of the delay on tapes $i$ and $j$ of $A_1^{(n)}$.

According to these properties the following proposition holds.

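Proposition 2. For any n-WFSM $A_1^{(n)}$ such that the triple $\langle A_1^{(n)}, i, j\rangle$ is in the class $\Theta$, the auto-intersection $\sigma_{\{i=j\}}(A_1^{(n)})$ can be represented by the n-WFSM $A^{(n)} = \langle \Sigma_1, Q, K_1, E^{(n)}, \lambda, \varrho\rangle$ defined as follows:

$$Q \subseteq Q_1 \times (\{\varepsilon\} \times \Sigma_1^{\le \delta^{\max}_{\langle i,j\rangle}} \cup \Sigma_1^{\le \delta^{\max}_{\langle i,j\rangle}} \times \{\varepsilon\}) \qquad (25)$$

$$E^{(n)} = \{ \langle \langle q_1, s, u\rangle, l_1^{(n)}, w_1, \langle q_1', s', u'\rangle \rangle \mid \langle q_1, s, u\rangle \in Q \wedge \langle q_1', s', u'\rangle \in Q \wedge \exists e_1 = \langle q_1, l_1^{(n)}, w_1, q_1'\rangle \in E_1^{(n)} \wedge s' = x^{-1} \cdot s \cdot \ell_i(e_1) \wedge u' = x^{-1} \cdot u \cdot \ell_j(e_1), \text{ where } x = \mathrm{lcp}(s \cdot \ell_i(e_1), u \cdot \ell_j(e_1)) \} \qquad (26)$$

$$\lambda(\langle q_1, s, u\rangle) = \begin{cases} \lambda_1(q_1) & \text{if } s = \varepsilon \wedge u = \varepsilon \\ \bar 0 & \text{otherwise} \end{cases} \qquad \forall \langle q_1, s, u\rangle \in Q \qquad (27)$$

$$\varrho(\langle q_1, s, u\rangle) = \begin{cases} \varrho_1(q_1) & \text{if } s = \varepsilon \wedge u = \varepsilon \\ \bar 0 & \text{otherwise} \end{cases} \qquad \forall \langle q_1, s, u\rangle \in Q \qquad (28)$$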
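The construction of Proposition 2 can be sketched as a worklist procedure that only creates the states reachable from the initial ones. The sketch below is a simplified illustration, not the authors' implementation: it handles a 2-tape machine whose two tapes are the intersected ones, keeps weights as opaque symbols, and represents initial and final weights by plain sets of states. The function names and the encoding `FIG_5A` (read off the support of the machine of Figure 5a as stated in Section 5.1.1) are assumptions of this sketch.

```python
from collections import deque


def lcp(a, b):
    """Longest common prefix of two strings."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def auto_intersect(arcs, initial, final, delta_max):
    """Build the reachable states <q1, s, u> and their arcs, cf. (25)-(28).

    arcs: iterable of (src, label_i, label_j, weight, dst); "" encodes epsilon.
    """
    out = {}
    for arc in arcs:
        out.setdefault(arc[0], []).append(arc)

    start = [(q, "", "") for q in initial]
    states, new_arcs, queue = set(start), [], deque(start)
    while queue:
        q1, s, u = queue.popleft()
        for _, li, lj, w, q2 in out.get(q1, []):
            s2, u2 = s + li, u + lj
            x = len(lcp(s2, u2))
            s2, u2 = s2[x:], u2[x:]                 # remove the common prefix
            if s2 and u2:                           # the two tapes disagree
                continue
            if max(len(s2), len(u2)) > delta_max:   # delay beyond the bound
                continue
            target = (q2, s2, u2)
            if target not in states:
                states.add(target)
                queue.append(target)
            new_arcs.append(((q1, s, u), li, lj, w, target))

    # Only states with empty leftover strings keep their final weight, cf. (28).
    new_final = {st for st in states if st[0] in final and st[1:] == ("", "")}
    return states, new_arcs, new_final


# Assumed encoding of a machine with support (a:a U a:eps)* (ba:ab)* eps:a.
FIG_5A = [(0, "a", "a", "w0", 0), (0, "a", "", "w1", 0), (0, "", "", "w2", 1),
          (1, "ba", "ab", "w3", 1), (1, "", "a", "w4", 2)]
states, arcs, finals = auto_intersect(FIG_5A, initial={0}, final={2}, delta_max=1)
print(len(states), sorted(finals))  # 6 [(2, '', '')]
```

With the bound $\delta^{\max} = 1$, the state with leftover strings $\langle aa, \varepsilon\rangle$ is never created, and the state $\langle 2, \varepsilon, a\rangle$ is created but not final.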


Equation 26 states that a transition between two states of $Q$ in $A^{(n)}$ exists in $E^{(n)}$ if the corresponding transition exists in $A_1^{(n)}$ between the two associated states and if the concatenation of the labels of this transition on the tapes $i$ and $j$ with their respective leftover strings on the source state yields the leftover strings of the target state once the common prefix is removed.

The complexity of the construction of $A^{(n)} = \sigma_{\{i=j\}}(A_1^{(n)})$ depends on the number of states of $A^{(n)}$, which is such that $|Q| \le |Q_1| \cdot (|\Sigma_1|^0 + 2|\Sigma_1|^1 + \ldots + 2|\Sigma_1|^{\delta^{\max}_{\langle i,j\rangle}})$.

5.1.1. Example

Let $A_1^{(n)}$ be the n-WFSM of Figure 5a. Since $A_1^{(n)}$ contains no path that traverses both a cycle with positive and one with negative delay (w.r.t. tapes 1 and 2), the triple $\langle A_1^{(2)}, 1, 2\rangle$ is in the class $\Theta$. The delay of $A^{(2)} = \sigma_{\{1=2\}}(A_1^{(2)})$ is bounded by $\delta^{\max}_{\langle 1,2\rangle} = 1$. The support of $A_1^{(2)}$ is

$$\mathrm{support}(A_1^{(2)}) = (a{:}a \cup a{:}\varepsilon)^* (ba{:}ab)^* \, \varepsilon{:}a = \{ \langle a^{i+j}(ba)^h, a^i(ab)^h a\rangle \mid i, j, h \in \mathbb{N} \} \qquad (29)$$

To construct the auto-intersection, we copy states and transitions one by one from $A_1^{(2)}$ (Figure 5a) to $A^{(2)}$ (Figure 5b), starting with the initial state $\langle q_1, \varepsilon, \varepsilon\rangle$. Then, we copy the three outgoing transitions of $q_1 = 0$, with their original labels and weights, as well as their respective target states. If a target state $q = \langle q_1', s', u'\rangle$ already exists in $Q$, no state is created and we use the existing state instead. The $\langle s', u'\rangle$ component of the target state of a transition $e$ results from the $\langle s, u\rangle$ component of its source state, concatenated with the relevant components of its label $\ell(e)$. The longest common prefix of $s'$ and $u'$ is removed. For example, for the cyclic transition $e$ on $q = 5$ (Figure 5b), the leftover strings of the target are $\langle s', u'\rangle = \langle ab, ab\rangle^{-1} (\langle a, \varepsilon\rangle \langle ba, ab\rangle) = \langle a, \varepsilon\rangle$, which implies $t(e) = p(e)$.

In Figure 5b, state $q = 2$ and its incoming transition are not created because here the delay, that can be compiled from $\langle aa, \varepsilon\rangle$, exceeds $\delta^{\max}_{\langle 1,2\rangle} = 1$, which means that any path traversing $q = 2$ cannot be in $\sigma_{\{1=2\}}(A_1^{(2)})$. State $q = 4$ and its incoming transitions are not created because both leftover strings in $\langle 2, ba, ab\rangle$ are non-empty, which means that any path traversing $q = 4$ has different strings on tape 1 and 2 and can therefore not be in $\sigma_{\{1=2\}}(A_1^{(2)})$. State $q = 6$ ($\langle 2, \varepsilon, a\rangle$) is non-final, although $q_1 = 2$ is final (in $A_1^{(2)}$), because the leftover strings $\langle s, u\rangle$ are not $\langle \varepsilon, \varepsilon\rangle$, which means that any path ending in $q = 6$ has different strings on tape 1 and 2.

Fig. 5. (a) An n-WFSM $A_1^{(2)}$ and (b) its auto-intersection $A^{(2)} = \sigma_{\{1=2\}}(A_1^{(2)})$ (dashed parts are not constructed).

The support of the constructed auto-intersection is (as expected)

$$\mathrm{support}(\sigma_{\{1=2\}}(A_1^{(2)})) = (a{:}a)^* \, a{:}\varepsilon \, (a{:}a)^* (ba{:}ab)^* \, \varepsilon{:}a = \{ \langle a^{i+1+j}(ba)^h, a^{i+j}(ab)^h a\rangle \mid i, j, h \in \mathbb{N} \} = \{ \langle a^{i+j+1}(ba)^h, a^{i+j+1}(ba)^h\rangle \mid i, j, h \in \mathbb{N} \} \qquad (30)$$
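The relationship between the supports (29) and (30) can be checked by brute force on a finite truncation. The following lines are only a sanity check on strings, not part of the construction: they enumerate the tuples of (29) for small exponents, keep those whose two tapes are equal, and compare the result with the set form of (30). The truncation bound `N` and all variable names are arbitrary choices of this sketch.

```python
from itertools import product

N = 4  # truncation of the exponents i, j, h (arbitrary)

# Tuples <a^(i+j) (ba)^h, a^i (ab)^h a> of the support (29).
support_29 = {("a" * (i + j) + "ba" * h, "a" * i + "ab" * h + "a")
              for i, j, h in product(range(N), repeat=3)}

# Auto-intersection on the level of strings: keep tuples with equal tapes.
equal_tapes = {t for t in support_29 if t[0] == t[1]}

# Tuples <a^(k+1) (ba)^h, a^(k+1) (ba)^h> as in (30), within the same truncation.
expected_30 = {("a" * (k + 1) + "ba" * h,) * 2
               for k, h in product(range(N), repeat=2)}

print(equal_tapes == expected_30)  # True
```

The check reflects that $a^i (ab)^h a = a^{i+1} (ba)^h$, so the two tapes of a tuple in (29) agree exactly when $j = 1$.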

5.2. Basic multi-pair auto-intersection

Auto-intersection on multiple pairs of tapes follows basically the same principles as on a single pair of tapes. We now consider an n-WFSM $A_1^{(n)}$ and $r$ pairs of tapes $\langle i_h, j_h\rangle$, and we suppose that the triple $\langle A_1^{(n)}, i_h, j_h\rangle$ is in the class $\Theta$, for all $1 \le h \le r$, and that $\delta^{\max}_{\langle i_h, j_h\rangle}$ is a bound of the delay on the tapes $i_h$ and $j_h$. The multi-pair auto-intersection $\sigma_{\{i_1=j_1,\ldots,i_r=j_r\}}(A_1^{(n)})$ is represented by the n-WFSM $A^{(n)}$. Each state $q$ of $A^{(n)}$ is associated with a state $q_1 \in Q_1$ and with $r$ tuples of strings $\langle s_h, u_h\rangle$ such that $s_h$ or $u_h$ is the empty word, for all $1 \le h \le r$. Hence $q$ is a triple $\langle q_1, \langle s, u\rangle^{(r)}\rangle$ and the set of states $Q$ of $A^{(n)}$ is such that $Q \subset Q_1 \times \prod_{h=1}^{r} (\{\varepsilon\} \times \Sigma_1^{\le \delta^{\max}_{\langle i_h,j_h\rangle}} \cup \Sigma_1^{\le \delta^{\max}_{\langle i_h,j_h\rangle}} \times \{\varepsilon\})$. Notice that a state can be equivalently seen as a triple $\langle q_1, \langle s^{(r)}, u^{(r)}\rangle\rangle$. According to these properties the following proposition holds.

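Proposition 3. Let $A_1^{(n)}$ be an n-WFSM and $\langle i_h, j_h\rangle$ for $1 \le h \le r$ be $r$ pairs of tapes. We suppose that $\langle A_1^{(n)}, i_h, j_h\rangle$ is in the class $\Theta$, for all $1 \le h \le r$, and that $\delta^{\max}_{\langle i_h,j_h\rangle}$ is a bound of the delay on the tapes $i_h$ and $j_h$. The multi-pair auto-intersection $\sigma_{\{i_1=j_1,\ldots,i_r=j_r\}}(A_1^{(n)})$ can be represented by the n-WFSM $A^{(n)} = \langle \Sigma_1, Q, K_1, E^{(n)}, \lambda, \varrho\rangle$ defined as follows:

$$Q \subseteq Q_1 \times \{ \langle s^{(r)}, u^{(r)}\rangle \in (\Sigma_1^{\le \delta^{\max}_{\langle i_h,j_h\rangle}})^2 \mid \forall i \in [\![1, r]\!],\ \pi_i(s^{(r)}) = \varepsilon \vee \pi_i(u^{(r)}) = \varepsilon \} \qquad (31)$$

$$E^{(n)} = \{ \langle \langle q_1, s^{(r)}, u^{(r)}\rangle, l_1^{(n)}, w_1, \langle q_1', s'^{(r)}, u'^{(r)}\rangle \rangle \mid \langle q_1, s^{(r)}, u^{(r)}\rangle \in Q \wedge \langle q_1', s'^{(r)}, u'^{(r)}\rangle \in Q \wedge \exists e_1 = \langle q_1, l_1^{(n)}, w_1, q_1'\rangle \in E_1^{(n)} \text{ such that } s'^{(r)} = (x^{(r)})^{-1} \cdot s^{(r)} \cdot \pi_{i^{(r)}}(l_1^{(n)}) \wedge u'^{(r)} = (x^{(r)})^{-1} \cdot u^{(r)} \cdot \pi_{j^{(r)}}(l_1^{(n)}) \} \qquad (32)$$

where $x^{(r)} = \mathrm{lcp}(s^{(r)} \cdot \pi_{i^{(r)}}(l_1^{(n)}), u^{(r)} \cdot \pi_{j^{(r)}}(l_1^{(n)}))$ and $\pi_{i^{(r)}}(l_1^{(n)})$ (resp. $\pi_{j^{(r)}}(l_1^{(n)})$) is the projection of $l_1^{(n)}$ on the tuple of tapes $\langle i_1, \ldots, i_r\rangle$ (resp. $\langle j_1, \ldots, j_r\rangle$).$^{g}$

$$\lambda(\langle q_1, s^{(r)}, u^{(r)}\rangle) = \begin{cases} \lambda_1(q_1) & \text{if } \langle s_h, u_h\rangle = \langle \varepsilon, \varepsilon\rangle \ \ \forall 1 \le h \le r \\ \bar 0 & \text{otherwise} \end{cases} \qquad \forall \langle q_1, s^{(r)}, u^{(r)}\rangle \in Q \qquad (33)$$

$$\varrho(\langle q_1, s^{(r)}, u^{(r)}\rangle) = \begin{cases} \varrho_1(q_1) & \text{if } \langle s_h, u_h\rangle = \langle \varepsilon, \varepsilon\rangle \ \ \forall 1 \le h \le r \\ \bar 0 & \text{otherwise} \end{cases} \qquad \forall \langle q_1, s^{(r)}, u^{(r)}\rangle \in Q \qquad (34)$$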

The complexity of the construction of $A^{(n)} = \sigma_{\{i_1=j_1,\ldots,i_r=j_r\}}(A_1^{(n)})$ depends on the number of states of $A^{(n)}$, which is such that

$$|Q| < |Q_1| \prod_{h=1}^{r} \left( 2\, \frac{|\Sigma_1|^{\delta^{\max}_{\langle i_h,j_h\rangle}+1} - 1}{|\Sigma_1| - 1} - 1 \right).$$

5.2.1. Example

The n-WFSM $A_1^{(4)}$ in Figure 6a is such that the triples $\langle A_1^{(4)}, 1, 2\rangle$ and $\langle A_1^{(4)}, 3, 4\rangle$ are in the class $\Theta$. The delay of $\sigma_{\{1=2\}}(A_1^{(4)})$ is bounded by $\delta^{\max}_{\langle 1,2\rangle} = 1$ and that of $\sigma_{\{3=4\}}(A_1^{(4)})$ by $\delta^{\max}_{\langle 3,4\rangle} = 2$. The support of $A_1^{(4)}$ is$^{h}$

$$\mathrm{support}(A_1^{(4)}) = (a{:}a{:}dc{:}cd \cup a{:}\varepsilon{:}c{:}\varepsilon)^* (ba{:}ab{:}c{:}\varepsilon)^* \, \varepsilon{:}a{:}\varepsilon{:}cc = \{ \langle a^{i+j}(ba)^h, a^i(ab)^h a, ([dc]^i \sqcup\!\sqcup\, c^j) c^h, (cd)^i c^2\rangle \mid i, j, h \in \mathbb{N} \} \qquad (35)$$

Fig. 6. (a) An n-WFSM $A_1^{(4)}$ and (b) its auto-intersection $A^{(4)} = \sigma_{\{1=2,3=4\}}(A_1^{(4)})$ (dashed parts are not constructed).

g Note that the common prefix of two tuples $s^{(r)}$ and $u^{(r)}$ is compiled element-wise: $\mathrm{lcp}(s^{(r)}, u^{(r)}) = \langle \mathrm{lcp}(s_1, u_1), \ldots, \mathrm{lcp}(s_r, u_r)\rangle$.

h We include (informally) a (sub-)string into square brackets to express that it is not split by shuffle (cf. (35)). For example, in $([ab]^i \sqcup\!\sqcup\, [cd]^j)$ any number of $cd$ can intervene between two occurrences of $ab$, but not inside one $ab$, and any number of $ab$ can occur between two $cd$, but not inside one $cd$.

To construct simultaneously the two auto-intersections $\sigma_{\{1=2\}}$ and $\sigma_{\{3=4\}}$ (with $\sigma_{\{3=4\}}(\sigma_{\{1=2\}}) = \sigma_{\{1=2\}}(\sigma_{\{3=4\}}) = \sigma_{\{1=2,3=4\}}$), we proceed similarly as in the above construction for a single pair of tapes. We copy states and transitions one by one from $A_1^{(4)}$ (Figure 6a) to $A^{(4)}$ (Figure 6b), starting with the initial state $q_1 = 0$. Each state $q = \langle q_1, s^{(r)}, u^{(r)}\rangle$ of $A^{(4)}$ is composed of its corresponding state $q_1$ in $A_1^{(4)}$, the leftover string tuple $s^{(r)}$ from the tapes $\langle i_1, \ldots, i_r\rangle$ (yet unmatched on the tapes $\langle j_1, \ldots, j_r\rangle$) and the leftover string tuple $u^{(r)}$ from the tapes $\langle j_1, \ldots, j_r\rangle$ (yet unmatched on the tapes $\langle i_1, \ldots, i_r\rangle$). Hence, for the initial state $q = 0$ of $A^{(4)}$ it is $\langle 0, \langle \varepsilon, \varepsilon\rangle, \langle \varepsilon, \varepsilon\rangle\rangle$. Then, we attempt to copy the three outgoing transitions of $q_1 = 0$ with their original labels and weights, as well as their respective target states.

For every transition $e$, the leftover string tuples $s^{(r)}$ and $u^{(r)}$ of its source are concatenated with the relevant components of its label $\ell(e)$ (respectively $\pi_{i^{(r)}}(\ell(e))$ and $\pi_{j^{(r)}}(\ell(e))$) and the common prefix is removed to produce the leftover string tuples $s'^{(r)}$ and $u'^{(r)}$ of the target state. If the target state $\langle t(e), s'^{(r)}, u'^{(r)}\rangle$ already exists, then it is used, otherwise it is created. For example, for the cyclic transition $e$ on $q = 2$ (Figure 6b), the leftover tuples of the source, $(\langle a, c\rangle, \langle \varepsilon, \varepsilon\rangle)$, are concatenated with the relevant projections of the label, $\pi_{\langle 1,3\rangle}(\ell(e)) = \langle a, dc\rangle$ and $\pi_{\langle 2,4\rangle}(\ell(e)) = \langle a, cd\rangle$. From the result, $(\langle aa, cdc\rangle, \langle a, cd\rangle)$, we remove the longest common prefix, $\langle a, cd\rangle$, of the two tuples, and obtain finally the leftover tuples of the target, $(\langle a, c\rangle, \langle \varepsilon, \varepsilon\rangle)$, which implies that $p(e) = t(e)$.

In Figure 6b, state $q = 3$ and its incoming transition are not created because here the delay w.r.t. $\sigma_{\{1=2\}}$ exceeds $\delta^{\max}_{\langle 1,2\rangle} = 1$. State $q = 1$ and its incoming transition are not created because, w.r.t. $\sigma_{\{3=4\}}$, the leftover strings $dc$ and $cd$ are incompatible. State $q = 9$ ($\langle 2, \langle \varepsilon, \varepsilon\rangle, \langle a, cc\rangle\rangle$) is non-final, although $q_1 = 2$ is final in $A_1^{(4)}$, because the leftover tuples are not $(\langle \varepsilon, \varepsilon\rangle, \langle \varepsilon, \varepsilon\rangle)$. The support of the constructed auto-intersection is (as expected)

$$\mathrm{support}(\sigma_{\{1=2,3=4\}}(A_1^{(4)})) = a{:}\varepsilon{:}c{:}\varepsilon \, (a{:}a{:}dc{:}cd)^* \, ba{:}ab{:}c{:}\varepsilon \ \varepsilon{:}a{:}\varepsilon{:}cc = \{ \langle a a^i ba, a^i aba, c(dc)^i c, (cd)^i c^2\rangle \mid i \in \mathbb{N} \} = \{ \langle a^{i+1}ba, a^{i+1}ba, (cd)^i c^2, (cd)^i c^2\rangle \mid i \in \mathbb{N} \} \qquad (36)$$
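The update of the leftover tuples described in this example can be sketched as a single step function. The sketch below is an illustration under assumptions: labels are given as tuples of Python strings with `""` for $\varepsilon$, tapes are numbered from 1 as in the text, and the name `step` and its signature are hypothetical. The common prefix is removed element-wise, as stated in footnote g.

```python
def lcp(a, b):
    """Longest common prefix of two strings."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def step(s, u, label, pairs, bounds):
    """Leftover tuples of the target state, or None if the target is not created.

    s, u:   leftover string tuples of the source state, one entry per tape pair
    label:  n-tuple of strings of the transition ("" encodes epsilon)
    pairs:  tape pairs (i_h, j_h), numbered from 1
    bounds: delay bound for each pair
    """
    new_s, new_u = [], []
    for s_h, u_h, (i, j), bound in zip(s, u, pairs, bounds):
        s_h, u_h = s_h + label[i - 1], u_h + label[j - 1]
        k = len(lcp(s_h, u_h))
        s_h, u_h = s_h[k:], u_h[k:]              # element-wise prefix removal
        if (s_h and u_h) or max(len(s_h), len(u_h)) > bound:
            return None                           # mismatch or bound exceeded
        new_s.append(s_h)
        new_u.append(u_h)
    return tuple(new_s), tuple(new_u)


PAIRS, BOUNDS = [(1, 2), (3, 4)], [1, 2]

# Cyclic transition a:a:dc:cd on the state with leftovers (<a,c>, <eps,eps>).
print(step(("a", "c"), ("", ""), ("a", "a", "dc", "cd"), PAIRS, BOUNDS))
# (('a', 'c'), ('', ''))

# Same label from the initial state: dc and cd are incompatible.
print(step(("", ""), ("", ""), ("a", "a", "dc", "cd"), PAIRS, BOUNDS))  # None

# a:eps:c:eps from (<a,c>, <eps,eps>): the delay on tapes 1 and 2 exceeds 1.
print(step(("a", "c"), ("", ""), ("a", "", "c", ""), PAIRS, BOUNDS))    # None
```

The first call reproduces the worked computation for the cyclic transition; the other two correspond to the two reasons given above for not creating a state.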

5.3. Iterative multi-pair auto-intersection

We now describe an alternative approach that attempts to construct iteratively an auto-intersection on multiple pairs of tapes.

As an example, suppose that we have to compile an auto-intersection on multiple pairs of tapes, $\sigma_{\{1=2,3=4\}}(A_1^{(4)})$, of the n-WFSM in Figure 7a. The support of $A_1^{(4)}$ is

$$\mathrm{support}(A_1^{(4)}) = (a{:}a{:}dc{:}cd \cup a{:}\varepsilon{:}c{:}\varepsilon)^* (ba{:}ab{:}\varepsilon{:}c)^* \, \varepsilon{:}a{:}\varepsilon{:}c = \{ \langle a^{i+j}(ba)^h, a^i(ab)^h a, ([dc]^i \sqcup\!\sqcup\, c^j), (cd)^i c^h c\rangle \mid i, j, h \in \mathbb{N} \} \qquad (37)$$

The triple $\langle A_1^{(4)}, 1, 2\rangle$ is in the class $\Theta$, with $\delta^{\max}_{\langle 1,2\rangle} = 1$. The triple $\langle A_1^{(4)}, 3, 4\rangle$ is not in the class $\Theta$. Therefore, we compile first only $\sigma_{\{1=2\}}$ and obtain the n-WFSM in Figure 7b, with the support

$$\mathrm{support}(\sigma_{\{1=2\}}(A_1^{(4)})) = (a{:}a{:}dc{:}cd)^* \, a{:}\varepsilon{:}c{:}\varepsilon \, (a{:}a{:}dc{:}cd)^* (ba{:}ab{:}\varepsilon{:}c)^* \, \varepsilon{:}a{:}\varepsilon{:}c = \{ \langle a^i a a^j (ba)^h, a^i a^j (ab)^h a, (dc)^i c (dc)^j, (cd)^i (cd)^j c^h c\rangle \mid i, j, h \in \mathbb{N} \} = \{ \langle a^{i+j+1}(ba)^h, a^{i+j+1}(ba)^h, (dc)^i (cd)^j c, (cd)^i (cd)^j c^{h+1}\rangle \mid i, j, h \in \mathbb{N} \} \qquad (38)$$

Fig. 7. Iterative compilation of auto-intersection: (a) an n-WFSM $A_1^{(4)}$, (b) its auto-intersection $\sigma_{\{1=2\}}(A_1^{(4)})$, and (c) a second auto-intersection $\sigma_{\{3=4\}}(\sigma_{\{1=2\}}(A_1^{(4)}))$.

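The triple $\langle \sigma_{\{1=2\}}(A_1^{(4)}), 3, 4\rangle$ is in the class $\Theta$, with $\delta^{\max}_{\langle 3,4\rangle} = 2$. Now we can compile the second auto-intersection, and obtain the n-WFSM in Figure 7c, with the support

$$\mathrm{support}(\sigma_{\{3=4\}}(\sigma_{\{1=2\}}(A_1^{(4)}))) = a{:}\varepsilon{:}c{:}\varepsilon \, (a{:}a{:}dc{:}cd)^* \, \varepsilon{:}a{:}\varepsilon{:}c = \{ \langle a a^i, a^i a, c(dc)^i, (cd)^i c\rangle \mid i \in \mathbb{N} \} = \{ \langle a^{i+1}, a^{i+1}, (cd)^i c, (cd)^i c\rangle \mid i \in \mathbb{N} \} \qquad (39)$$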

6. An extension of the class $\Theta$

In this section, we introduce the filtering of n-WFSMs that, to some extent, leads to an extension of the class $\Theta$. Since it makes use of a property of the single-pair join operation, we first provide a definition of this operation that is not concerned with the class $\Theta$ or any delay boundedness.

6.1. Join on a single pair of tapes

Single-pair join can be compiled similarly to the composition of weighted transducers, except that one of the intersected tapes is kept. Therefore, we can avoid auto-intersecting the cross-product of the operand machines and give a definition for a direct construction regardless of the class $\Theta$. Since the machines are weighted, we have to deal with commutative semirings and take care of the alignment of $\varepsilon$-transitions: a basic composition would generate all possible $\varepsilon$-alignments, which would yield incorrect weights. This is why the following construction of the join of two weighted multi-tape machines involves a simulation of the three-state filter described in [21]. Notice that in the following the component $q_f \in \{0, 1, 2\}$ represents a state of the filter, as shown in Figure 8.

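Proposition 4. For any n-WFSM $A_1^{(n)} = \langle \Sigma_1, Q_1, K, E_1^{(n)}, \lambda_1, \varrho_1\rangle$ and $A_2^{(m)} = \langle \Sigma_2, Q_2, K, E_2^{(m)}, \lambda_2, \varrho_2\rangle$, the join $A_1^{(n)} \bowtie_{\{i=j\}} A_2^{(m)}$ can be represented by the n-WFSM $A^{(m+n-1)} = \langle \Sigma, Q, K, E^{(n+m-1)}, \lambda, \varrho\rangle$ where:

$$\Sigma = \Sigma_1 \cup \Sigma_2 \qquad (40)$$

$$Q = Q_1 \times Q_2 \times \{0, 1, 2\} \qquad (41)$$

$$\lambda(\langle q_1, q_2, q_f\rangle) = \lambda_1(q_1) \otimes \lambda_2(q_2), \quad \forall \langle q_1, q_2, q_f\rangle \in Q \qquad (42)$$

$$\varrho(\langle q_1, q_2, q_f\rangle) = \varrho_1(q_1) \otimes \varrho_2(q_2), \quad \forall \langle q_1, q_2, q_f\rangle \in Q \qquad (43)$$

$$E = \{ \langle \langle q_1, q_2, q_f\rangle, l_1 : \overline{\pi}_{\{j\}}(l_2), w_1 \otimes w_2, \langle q_1', q_2', 0\rangle \rangle \mid \exists \langle q_1, l_1, w_1, q_1'\rangle \in E_1, \exists \langle q_2, l_2, w_2, q_2'\rangle \in E_2, \text{ s.t. } \pi_{\langle i\rangle}(l_1) = \pi_{\langle j\rangle}(l_2) \wedge (q_f = 0 \vee \pi_{\langle i\rangle}(l_1) \neq \varepsilon) \} \qquad (44)$$

$$\cup\ \{ \langle \langle q_1, q_2, q_f\rangle, l_1 : \varepsilon^{(m-1)}, w_1, \langle q_1', q_2, 1\rangle \rangle \mid q_f \in \{0, 1\} \wedge \exists \langle q_1, l_1, w_1, q_1'\rangle \in E_1, \text{ s.t. } \pi_{\langle i\rangle}(l_1) = \varepsilon \} \qquad (45)$$

$$\cup\ \{ \langle \langle q_1, q_2, q_f\rangle, \varepsilon^{(n)} : \overline{\pi}_{\{j\}}(l_2), w_2, \langle q_1, q_2', 2\rangle \rangle \mid q_f \in \{0, 2\} \wedge \exists \langle q_2, l_2, w_2, q_2'\rangle \in E_2, \text{ s.t. } \pi_{\langle j\rangle}(l_2) = \varepsilon \} \qquad (46)$$

Fig. 8. Weighted transducer composition filter.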

The proof of this proposition is based on the fact that the single-pair join of two n-WFSMs $A_1^{(n)}$ and $A_2^{(m)}$ is constructed as a weighted composition [21]. We just recall that the aim of the filter is to avoid redundant path generation, leading to incorrect weights, by performing $\varepsilon$-alignment on the joined tapes. This filter (see Figure 8) aligns $\varepsilon$-transitions from $A_1$ and $A_2$ and considers separately $\varepsilon$-paths from $A_1$ (resp. $A_2$), looping on filter state 1 (resp. 2), that would lead to the composition of a transition from both machines (returning to filter state 0). Sub-equation 44 (resp. Sub-equation 45 and Sub-equation 46) stands for all transitions whose target is the state 0 (resp. 1 and 2) of the filter.

6.2. Filtering

Let $R_1^{(n)}$ be a relation and $A_1^{(n)}$ be an n-WFSM that realizes it. We suppose that the triple $\langle A_1^{(n)}, i, j\rangle$ is not in the class $\Theta$.

The filtering technique attempts to substitute for the relation $R_1^{(n)}$ a relation $R_2^{(n)}$ (realized by the n-WFSM $A_2^{(n)}$) satisfying the conditions $P_1$: $\sigma_{\{i=j\}}(R_2^{(n)}) = \sigma_{\{i=j\}}(R_1^{(n)})$ and $P_2$: $\langle A_2^{(n)}, i, j\rangle \in \Theta$.

Given a relation $R_1^{(n)}$, it is generally possible to construct different relations $R_2^{(n)}$ satisfying the condition $P_1$. Before presenting such a construction we need the following definition.


Definition 5. A 1-ary relation $R^{(1)}$ is said to be neutrally weighted if and only if the weight of each word in its support is equal to $\bar 1$.

An n-WFSM $A^{(1)}$ realizing a 1-ary relation $R^{(1)}$ can be made neutrally weighted as follows. First, all weights of $A^{(1)}$ are replaced by boolean weights: $\bar 0$ by FALSE and non-$\bar 0$ by TRUE. Then, $A^{(1)}$ is determinized (and possibly minimized), which is guaranteed to succeed due to the boolean weights. Finally, the boolean weights are again replaced by weights over the original semiring of $A^{(1)}$: FALSE by $\bar 0$ and TRUE by $\bar 1$. We will denote by $\mathrm{neutral}(A^{(1)})$ the result of this procedure and by $\mathrm{neutral}(R^{(1)})$ the relation realized by $\mathrm{neutral}(A^{(1)})$.

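Proposition 6. Let $R_1^{(n)}$ be a relation and $i$ and $j$ two integers such that $1 \le i < j \le n$. Let us consider the relation $\mathrm{filter}_{\{i=j\}}(R_1^{(n)})$ defined by:

$$\mathrm{filter}_{\{i=j\}}(R_1^{(n)}) =_{\mathrm{def}} \left( R_1^{(n)} \bowtie_{\{i=1\}} \mathrm{neutral}(\pi_{\langle j\rangle}(R_1^{(n)})) \right) \bowtie_{\{j=2\}} \mathrm{neutral}(\pi_{\langle i\rangle}(R_1^{(n)})) \qquad (47)$$

Then it holds: $\sigma_{\{i=j\}}(\mathrm{filter}_{\{i=j\}}(R_1^{(n)})) = \sigma_{\{i=j\}}(R_1^{(n)})$.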

Proof. First, we exploit the fact that the join of a relation $R_1^{(n)}$ with its own projections generates a superset of its auto-intersection, with regard to the support sets:

$$\mathrm{support}\left( (R_1^{(n)} \bowtie_{\{i=1\}} \pi_{\langle j\rangle}(R_1^{(n)})) \bowtie_{\{j=2\}} \pi_{\langle i\rangle}(R_1^{(n)}) \right) \supseteq \mathrm{support}\left( \sigma_{\{i=j\}}(R_1^{(n)}) \right) \qquad (48)$$

This is due to the fact that, given an n-tuple $s^{(n)}$ of $R_1^{(n)}$, if $s_i \notin \pi_{\langle j\rangle}(R_1^{(n)})$ (or if $s_j \notin \pi_{\langle i\rangle}(R_1^{(n)})$), then we have $s_i \neq s_j$, and hence $s^{(n)} \notin \mathrm{support}(\sigma_{\{i=j\}}(R_1^{(n)}))$. This means that an auto-intersection is not affected by a preceding join with projections, as long as the weights of the string tuples that remain in the support are not altered. This is the case, since by definition the relations $\mathrm{neutral}(\pi_{\langle j\rangle}(R_1^{(n)}))$ and $\mathrm{neutral}(\pi_{\langle i\rangle}(R_1^{(n)}))$ have a neutral weight of $\bar 1$ for each string tuple in their supports.

Finally, the computation of the relation $\mathrm{filter}_{\{i=j\}}(R_1^{(n)})$ is guaranteed to succeed since it is achieved by means of two single-tape joins.
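The statement of Proposition 6 can be illustrated on a finite weighted relation, where every operation reduces to dictionary manipulation. This is only a sketch on the level of relations, not of machines, and it rests on assumptions that are not in the text: the semiring is taken to be the reals with ordinary addition and multiplication, a relation is a Python dict from string tuples to weights, and all function names and the sample relation `R` are invented for illustration.

```python
def project(rel, tape):
    """Projection on one tape (numbered from 1); weights of equal strings are summed."""
    out = {}
    for strings, w in rel.items():
        key = strings[tape - 1]
        out[key] = out.get(key, 0.0) + w
    return out


def neutral(unary):
    """Neutrally weighted version: weight 1 for every word of the support."""
    return {word: 1.0 for word, w in unary.items() if w != 0.0}


def join_unary(rel, tape, unary):
    """Join of a relation with a 1-ary relation on the given tape."""
    return {t: w * unary[t[tape - 1]]
            for t, w in rel.items() if t[tape - 1] in unary}


def auto_intersection(rel, i, j):
    """Keep the tuples whose strings on tapes i and j are equal."""
    return {t: w for t, w in rel.items() if t[i - 1] == t[j - 1]}


def filter_relation(rel, i, j):
    """Two successive joins with neutrally weighted projections, cf. (47)."""
    first = join_unary(rel, i, neutral(project(rel, j)))
    return join_unary(first, j, neutral(project(rel, i)))


R = {("ab", "ab"): 0.5, ("ab", "b"): 0.2, ("b", "ab"): 0.3,
     ("c", "d"): 0.1, ("d", "d"): 0.4}
F = filter_relation(R, 1, 2)
print(("c", "d") in F)                                           # False
print(auto_intersection(F, 1, 2) == auto_intersection(R, 1, 2))  # True
```

Filtering removes the tuple $\langle c, d\rangle$, whose first string never occurs on tape 2, while the auto-intersection and its weights stay unchanged. On finite relations the procedure always stabilizes, so this sketch cannot show the non-convergent behavior discussed for infinite relations.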

In each of the three following examples the n-WFSM $A_1^{(2)}$ is defined from its support (with arbitrarily chosen non-zero weights). In the following example the triple $\langle A_1^{(2)}, 1, 2\rangle$ is originally not in the class $\Theta$ due to the cycles $(a{:}\varepsilon)^*$ and $(\varepsilon{:}ac)^*$. However, condition $P_2$ is reached after a single application of filtering, making the auto-intersection $\sigma_{\{1=2\}}(A_1^{(2)})$ computable.

$$\mathrm{support}(A_1^{(2)}) = (a{:}a \cup a{:}\varepsilon \cup \varepsilon{:}ac)^* (ba{:}ab)^* \, \varepsilon{:}a \qquad (49)$$

$$\mathrm{support}(\mathrm{filter}_{\{1=2\}}(A_1^{(2)})) = (a{:}a \cup a{:}\varepsilon)^+ (ba{:}ab)^* \, \varepsilon{:}a \qquad (50)$$

In some cases, filtering converges only after multiple iterations; in the following example two iterations are sufficient:


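$$\mathrm{support}(A_1^{(2)}) = (\varepsilon{:}c)^* (c{:}d)^* (d{:}\varepsilon)^2 \qquad (51)$$

$$\mathrm{support}(\mathrm{filter}_{\{1=2\}}(A_1^{(2)})) = (\varepsilon{:}c)^* (c{:}d)^2 (d{:}\varepsilon)^2 \qquad (52)$$

$$\mathrm{support}(\mathrm{filter}_{\{1=2\}}(\mathrm{filter}_{\{1=2\}}(A_1^{(2)}))) = (\varepsilon{:}c)^2 (c{:}d)^2 (d{:}\varepsilon)^2 \qquad (53)$$

Unfortunately, filtering does not always converge, as, e.g., in

$$\mathrm{support}(A_1^{(2)}) = (a{:}a)^* (\varepsilon{:}a) \qquad (54)$$

$$\mathrm{support}(\underbrace{\mathrm{filter}_{\{1=2\}}(\cdots \mathrm{filter}_{\{1=2\}}}_{r \text{ times}}(A_1^{(2)}) \cdots)) = (a{:}a)^{\ge r} (\varepsilon{:}a) \qquad (55)$$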

In general, we cannot predict whether condition $P_2$ can be reached by this type of filtering, how many iterations are needed, and whether iterative filtering converges. In other terms, if after several iterations neither convergence nor condition $P_2$ is reached, we cannot decide whether it is meaningful to continue the iteration. Therefore, filtering requires an additional fail-safe halting criterion such as a limit on the number of iterations.

7. Conclusion

We conclude by considering possible extensions to the class $\Theta$.

Let us first notice that both non-rationality and undecidability of rationality of an auto-intersection $A^{(n)} = \sigma_{\{i=j\}}(A_1^{(n)})$ can only occur if $A_1^{(n)}$ contains "matching cycles" on the same path, such as in $(a{:}\varepsilon)^* a (\varepsilon{:}a)^*$. More generally, it concerns one set of cycles, traversed in a particular order, that matches with another (or the same) set of cycles, traversed in some particular order.$^{i}$ One possibility to avoid matching cycles is to require that cycles on the same path must not have delays of different sign, which actually defines the class $\Theta$.

If a cycle $c$ contains a symbol $x$ on tape $i$, which does not occur on tape $j$ of any cycle (including $c$ itself) on the same path $\gamma$, then $c$ cannot match with any cycles on $\gamma$, and can be ignored in the membership test as far as $\gamma$ is concerned. Extending this consideration from symbols to strings, without creating a case of Post's Correspondence Problem (PCP), is not trivial. Filtering would remove such a cycle in an n-WFSM as shown in Figure 1, but not necessarily in the union of several such n-WFSMs, because it treats all paths at once rather than separately.

Consider now the following case as a motivation for future work. Each of the two 2-WFSMs, $A_1^{(2)}$ and $A_2^{(2)}$, in Figure 9 is auto-intersectable on its own, but their concatenation (Figure 10), $A_3^{(2)} = A_1^{(2)} A_2^{(2)}$, is not since $\langle A_3^{(2)}, 1, 2\rangle \notin \Theta$. Filtering of $A_3^{(2)}$ converges already after one iteration, but without placing $\langle A_3^{(2)}, 1, 2\rangle$ into $\Theta$:

$$\mathrm{support}(A_3^{(2)}) = (a{:}a \cup a{:}\varepsilon)^* \, b{:}a \, (b{:}b \cup \varepsilon{:}b)^* \qquad (56)$$

$$\mathrm{support}(\mathrm{filter}_{\{1=2\}}(A_3^{(2)})) = (a{:}a \cup a{:}\varepsilon)^+ \, b{:}a \, (b{:}b \cup \varepsilon{:}b)^+ \qquad (57)$$

i Examples of difficult cases resulting from "matching cycles" can be found in [12].

Fig. 9. Two n-WFSMs, (a) $A_1^{(2)}$ and (b) $A_2^{(2)}$, and their auto-intersections, (c) $\sigma_{\{1=2\}}(A_1^{(2)})$ and (d) $\sigma_{\{1=2\}}(A_2^{(2)})$.

Fig. 10. (a) An n-WFSM $A_3^{(2)} = A_1^{(2)} A_2^{(2)}$, and (b) its "intuitive" auto-intersection $\sigma_{\{1=2\}}(A_3^{(2)})$.

Furthermore, $A_3^{(2)}$ contains no cycles that, according to the above criterion, can be ignored in the membership test. In this simple case, the two conflicting cycles (Figure 10a, $a{:}\varepsilon$ and $\varepsilon{:}b$) are labeled over two disjoint alphabets, which we can see as a criterion that these cycles are actually not in conflict with each other. The auto-intersection remains bounded. In more complex cases, such cycles could have simply different labels (e.g., $abc{:}\varepsilon$ and $b{:}bc$), and could be reducible to PCP [12].

Future work could include the search for cases that contain certain types of conflicting cycles, without being reducible to PCP. In such cases the auto-intersection remains bounded, although it may require a different method for calculating the bound.

Acknowledgments

We wish to thank Jason Eisner for allowing us to use a bulk of relevant notation that he elaborated and for his advice, Mark-Jan Nederhof for pointing out the relationship between auto-intersection and Post's Correspondence Problem (personal communication), and the anonymous reviewers of our paper for their advice.

References

[1] Salah Aït-Mokhtar and Jean-Pierre Chanod. Incremental finite-state parsing. In Proc. 5th Int. Conf. ANLP, pages 72–79, Washington, DC, USA, 1997.
[2] Srinivas Bangalore and Michael Johnston. Finite-state multimodal parsing and understanding. In Proc. of the 17th COLING, pages 369–375, Saarbrücken, Germany, August 2000.
[3] Samuel Eilenberg. Automata, Languages, and Machines, volume A. Academic Press, San Diego, 1974.
[4] Jason Eisner. Parameter estimation for probabilistic finite-state transducers. In Proc. of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, July 2002.
[5] Calvin C. Elgot and Jorge E. Mezei. On relations defined by generalized finite automata. IBM Journal of Research and Development, 9(1):47–68, 1965.
[6] Christiane Frougny and Jacques Sakarovitch. Synchronized rational relations of finite and infinite words. Theoretical Computer Science, 108(1):45–82, 1993.
[7] Donald B. Johnson. Finding all the elementary circuits of a directed graph. SIAM Journal on Computing, 4(1):77–84, 1975.
[8] Ronald M. Kaplan and Martin Kay. Regular models of phonological rule systems. Computational Linguistics, 20(3):331–378, 1994.
[9] Martin Kay. Nonconcatenative finite-state morphology. In Proc. 3rd Int. Conf. EACL, pages 2–10, Copenhagen, Denmark, 1987.
[10] André Kempe. Reduction of intermediate alphabets in finite-state transducer cascades. In Proc. 7th Conf. TALN, pages 207–215, Lausanne, Switzerland, October 2000. ATALA.
[11] André Kempe. NLP applications based on weighted multi-tape automata. In Proc. 11th Conf. TALN, pages 253–258, Fes, Morocco, April 2004.
[12] André Kempe, Jean-Marc Champarnaud, and Jason Eisner. A note on join and auto-intersection of n-ary rational relations. In B. Watson and L. Cleophas, editors, Proc. Eindhoven FASTAR Days, number 04–40 in TU/e CS TR, pages 64–78, Eindhoven, Netherlands, 2004.
[13] André Kempe, Franck Guingne, and Florent Nicart. Algorithms for weighted multi-tape automata. Research report 2004/031, Xerox Research Centre Europe, Meylan, France, 2004.
[14] George Anton Kiraz. Multitiered nonlinear morphology using multitape finite automata: a case study on Syriac and Arabic. Computational Linguistics, 26(1):77–105, March 2000.
[15] Kevin Knight and Jonathan Graehl. Machine transliteration. Computational Linguistics, 24(4), 1998.
[16] Werner Kuich and Arto Salomaa. Semirings, Automata, Languages. Number 5 in EATCS Monographs on Theoretical Computer Science. Springer Verlag, Berlin, Germany, 1986.
[17] Shankar Kumar and William Byrne. A weighted finite state transducer implementation of the alignment template model for statistical machine translation. In Proc. Int. Conf. HLT-NAACL, pages 63–70, Edmonton, Canada, 2003.
[18] Daniel J. Lehmann. Algebraic structures for transitive closure. Theoretical Computer Science, 4(1):59–76, 1977.
[19] Mehryar Mohri. Edit-distance of weighted automata. In Proc. 7th Int. Conf. CIAA (2002), volume 2608 of Lecture Notes in Computer Science, pages 1–23, Tours, France, 2003. Springer Verlag, Berlin, Germany.
[20] Mehryar Mohri. Weighted finite-state transducer algorithms: An overview. In Carlos Martín-Vide, Victor Mitrana, and Gheorghe Paun, editors, Formal Languages and Applications, volume 148, pages 551–564. Springer, Berlin, Germany, 2004.
[21] Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. A rational design for a weighted finite-state transducer library. In J.-M. Champarnaud, D. Maurel, and D. Ziadi, editors, Proceedings of the Second International Workshop on Implementing Automata, WIA (1997), volume 1436 of Lecture Notes in Computer Science, pages 144–158. Springer Verlag, Berlin, Germany, 1998.
[22] Fernando C. N. Pereira and Michael D. Riley. Speech recognition by composition of weighted finite automata. In Emmanuel Roche and Yves Schabes, editors, Finite-State Language Processing, pages 431–453. MIT Press, Cambridge, MA, USA, 1997.
[23] Emil Post. A variant of a recursively unsolvable problem. Bulletin of the American Mathematical Society, 52:264–268, 1946.
[24] Michael O. Rabin and Dana Scott. Finite automata and their decision problems. IBM Journal of Research and Development, 3(2):114–125, 1959.
[25] Arnold L. Rosenberg. On n-tape finite state acceptors. In IEEE Symposium on Foundations of Computer Science (FOCS), pages 76–81, 1964.