Proof-Guided Testing: Towards Complementarity of Verification Techniques

Guillaume Lussier, Hélène Waeselynck, Karim Guennoun
LAAS-CNRS, 7 Avenue du Colonel Roche, 31077 Toulouse Cedex 4 - France
Email: {glussier,waeselyn,kguennou}@laas.fr

Abstract

Proof-guided testing is intended to combine correctness proof and testing. The target application field is the verification of fault-tolerance algorithms where a complete formal proof is not available. The method aims at enhancing the testing process with information extracted from the algorithm’s proof: ideally, testing should be focused on the weak parts of the proof. In the case of paper proofs, the identification of weak parts proceeds by restructuring the informal discourse as a semi-formal proof tree and analyzing it step by step. Building on preliminary results obtained for a fault-tolerant scheduling algorithm, the method is now consolidated using the example of a group membership protocol (GMP). This GMP was developed for the Time-Triggered Protocol, and was found flawed after publication. Results are quite promising: (1) compared to crude random testing, the proof-guided method allowed us to significantly improve the fault revealing power of test data; (2) the overall method also provided useful feedback on the informal proof and potential flaw(s).

1 Introduction

Fault-Tolerance (FT) mechanisms are critical components for building dependable architectures. While strong evidence for correctness of the underlying algorithms is desirable, it is often the case that the demonstration is a paper proof based on informal discourse. As a result, there are examples of “proofs” of incorrect FT algorithms in the literature: the case study that will be used in this paper is one of them. Attempts at formal reworking may fail in the sense that the proofs remain inconclusive, or partial, due to pending lemmas that could not be discharged. Moreover, formal reworking is usually very costly.

Testing can be seen as a pragmatic approach to obtain further evidence in the case of informal proof results or formal, but partial, ones. Note that, ultimately, testing is always required to validate the implemented FT architecture. This is not the issue discussed in this paper. Rather, the concern is verification of the key properties of the core algorithms. The tested artefact is possibly a prototype of the core algorithm, or a specification that can (in some way) be executed.

Testing should ideally be focused on the weak parts of the supplied proof. In this way, the effort that was put into the proof development is not completely lost, and testing effort is focused on revealing faults in the algorithm that could not be caught by the flawed proof. For example, identification of the most complex and least convincing parts of the informal proof might suggest that some input subspaces be sampled more stringently than others. A proof by cases might suggest test cases that would be potentially significant to the correctness of the algorithm.

How these general ideas can be put into practice, and whether they do allow the test efficiency to be improved, was first investigated in [1]. We used the Fault-Tolerant Rate Monotonic Scheduling (FT-RMS) algorithm [2] as an example of a flawed algorithm “proved” by informal demonstration (the flaw was revealed in [3]). We proposed to restructure its demonstration in a semi-formal proof tree based on natural deduction. From this proof tree, we extracted several functional cases and used them for testing a prototype of the algorithm. The case study yielded mixed results. On the one hand, we found that the proof tree was a useful guide to question the soundness of the proof in a systematic manner. We identified major problems of imprecise definitions and doubtful reasoning steps, affecting the entire proof structure. On the other hand, we found that an informal proof is not necessarily helpful for the design of testing, as the test cases extracted from the FT-RMS proof were too loosely connected with the flaw residing in the algorithm. Our conclusion was that the viability of the approach would have to be studied on more convincing proof examples.

The group membership protocol proposed by [4] provides such an example. Its informal proof is much better crafted than the FT-RMS one, as it was consolidated by using model-checking to analyze instances of the protocol. Still, the protocol was found to be flawed after publication. In this paper, we investigate whether this example better supports the idea of testing being guided by the argument for correctness. Building on our previous results, the principle of proof tree analysis is also integrated into a general method called proof-guided testing, whose steps are illustrated on the group membership protocol.

Section 2 briefly introduces the background of the studied protocol. Section 3 gives an overview of the work detailed in the subsequent sections. The process of extracting information from the informal proof is illustrated in Sections 4 (preliminary analysis) and 5 (proof restructuration), which lead to the identification of a few weaknesses. Experimental results are discussed in Section 6, where proof-guided testing is compared to blind testing, allowing us to assess the difficulty of revealing the flaw in the protocol. In Section 7, the results obtained from both proof analysis and protocol testing provide feedback on the original demonstration.

2

model-checking was used to debug the algorithm and check candidate invariants for the induction proof. Unfortunately, the analysis missed a flaw that manifests itself only when there are three processors in the membership (see footnote 1). This experience led one of the authors to investigate formal reworking of a simplified version of the problem [6]. After failed attempts to build an inductive proof, he proposed a different approach that has successfully been re-used by others to formally prove the current TTP membership service [7]. In this paper, we focus on the flawed protocol version and its inductive proof published in [4].

3 Overview of the Work

Figure 1 provides an overview of the work reported in the paper. Our method is based on four steps.

[Figure 1. Overview of the Proof-Guided Testing Approach. Boxes: Preliminary Analysis (yielding the input domain and the oracle), Proof Restructuring, Proof-guided testing, Feedback to the proof.]

• A Preliminary Analysis aims at gaining an understanding of the algorithm and its requirements: under certain assumptions, some key properties are to be fulfilled. The assumptions include a model of the faults to be tolerated, as well as other environmental assumptions. From their identification, a definition of the algorithm’s test input domain is derived. The key properties yield a specification of the test oracle checking acceptance or rejection of the test results. The understanding of the algorithm must be sufficient to initiate the development of a prototype that will be the support of the test experiments.

Background of the Studied Group Membership Protocol (GMP)

In a distributed system, a group membership service allows non-faulty processors to agree on their membership and to exclude faulty ones. The studied algorithm is a variant of the membership service offered by the Time-Triggered Protocol (TTP). TTP [5] has been developed over the past twenty years at the University of Vienna, and is now commercially promoted by TTTech. It is an integrated communication protocol for TT architectures, typically used for automotive functions (brake-by-wire, steer-by-wire), or avionics ones (the communication system of the Airbus A380 cabin pressure control system will be based on TTP). Derived from the TTP membership service, the variant proposed in [4] was intended to minimize overhead for bandwidth-constrained networks: it requires only one acknowledgment bit per broadcast. The authors developed an (informal) inductive proof of some key properties of the service. The paper proof was consolidated by formal analysis of instances of the algorithm:

• Proof Restructuring is the cornerstone of the method. It consists of rewriting the informal discourse as a semi-formal proof tree. The aim is to obtain a clear view of the proof structure by explicitly expressing the inference steps. To do so, we stick to the inference rules of a target calculus: for example, natural deduction was used in [1], and sequent calculus will be used in this paper. The tree reformulation is a useful guide to question the soundness of the proof in a systematic manner, step by step, and to assign a subjective assessment to the various proof branches. Note that the analysis approach is much lighter than complete formal reworking, and would not allow us to establish the correctness of the algorithm. Still, the approach should be sufficient to make a quick assessment of the degree of rigorousness of the informal proof, and to point out its less convincing parts. If we find major problems affecting the entire proof structure, our previous experience with FT-RMS [1] suggests that it may be wiser not to further consider the proof for the design of testing: the paper proof is likely not to carry relevant information for testing. In cases where the assessment results are not so severe, proof-guided testing may go on. The identified weaknesses are used to define test criteria guiding the selection of input sequences. Also, there may be an impact on the test oracle: for example, the GMP proof restructuration yielded additional oracle checks to be specified, so as to observe the validity of an intermediate lemma.

Footnote 1: See the corrected version of [4], downloadable at http://www.csl.sri.com/papers/wdag97

hints on how to rework the original demonstration. The subsequent sections describe the application of the proposed method to the GMP example, and the experimental results we obtained.

4 Preliminary Analysis

The studied GMP involves n processors (numbered 0, ..., n-1) attached to a broadcast bus. Execution is synchronous, with a global time t increased by one at each step. At time t, processor t mod n is the only one that can broadcast messages. This defines broadcast slots owned by this processor. The goal is to maintain a consistent record of non-faulty processors. Only two types of faults are considered.

• Send fault: a processor fails to broadcast when its slot is reached.
• Receive fault: a processor fails to receive a broadcast.

Faults can be intermittent: a faulty processor may operate correctly in some steps and manifest its fault in others. Only one non-faulty processor can become faulty in any n+1 consecutive steps, and there are always at least two non-faulty processors in the system.

According to the proposed GMP, each processor maintains a local view of the membership set. This view is updated after every expected broadcast slot, that is, after every slot owned by a member of the local group (processors already removed from the local membership set are not listened to). According to the update, a local Boolean variable, ack, indicates whether or not the previous expected broadcaster is still a member of the group. When it is the turn of a processor to broadcast, it appends its current ack to whatever data constitutes the broadcast. By observing the presence or absence of an expected broadcast, and by comparing the broadcast ack with their local ack, the other processors may diagnose a fault and update their local view.

We do not reproduce here the update rules. Rather, we illustrate the principle of GMP by means of two scenarios (Fig. 2). Both scenarios start in a state where all non-faulty processors have the same membership set, containing all of them and only them. The local acks are true. Then, a distinguished processor p becomes faulty. At least two other processors qi remain non-faulty. In the first scenario, p suffers a send fault. None of the qi receive the broadcast, and p is removed from their membership set. Accordingly, their ack is

• Accordingly, Proof-Guided Testing experiments are conducted. We use statistical testing, as defined by [8], to automate the generation of input sequences. Statistical testing is a probabilistic approach for generating test data based on selection criteria. The retained criterion is used as a basis to determine: (i) the test profile from which the inputs are randomly drawn and, (ii) the size of the random sample. Generally speaking, statistical testing aims to compensate for the imperfect connection of common test criteria with the flaws to be revealed: the cases identified by a criterion have to be exercised several times with different random test data. As regards proof-guided testing, it seems reasonable not to expect a perfect match between doubtful proof steps and revealing inputs (the easy flaws would not have escaped the informal proof). Using a statistical testing approach, we make a weaker assumption: the information extracted from the proof is sufficiently relevant to increase the program failure probability. An oracle implementing the specified checks automatically analyzes the test results.

• In addition to providing experimental evidence on the algorithm’s behavior, the method provides some Feedback to the proof. The results of proof restructuration and program testing may provide


the test input domain and oracle checks, some interpretation of the informal discourse was required. We chose to define a test sequence by:


• The number n of processors, with n > 2.
• The list of faults to be injected, each fault being characterized by an occurrence step and affected processor.


Note that the type of the fault may be deduced from this information: a send fault occurs at a slot owned by the affected processor, otherwise it is a receive fault.

The informal discourse leads us to distinguish two categories of faults, involving different preconditions. Initial faults are those that affect processors that have been non-faulty so far. There can be no more than n-2 such faults, and any two faults are at least n+1 steps apart. Their occurrence steps must correspond to slots owned by processors that have been non-faulty so far: either the slot of the newly faulty processor (send fault) or the slot of a non-faulty processor (receive fault). Subsequent faults are those that affect processors that have already manifested a fault, and relate to the simulation of an intermittent faulty behavior. Note that, in our definition of test sequences, the subsequent faults are only potential: for example, a send fault will be effective only if the processor tries to broadcast, which depends on whether or not it has removed itself from its membership set. It was not clear from the informal discourse whether a subsequent fault has to be of the same type as the initial fault, e.g., whether a send-faulty processor cannot suffer subsequent receive faults. We decided to adopt the least restrictive interpretation: after an initial fault, a processor may arbitrarily suffer any type of fault, or operate correctly in some steps.

As regards the oracle checks, we had to interpret the formulation of Theorem 3: when the slots of at most two non-faulty processors have been passed. For an initial receive fault, does its occurrence step count as the first slot of a non-faulty processor? From the informal proof, it seems that the answer is no. Having resolved the ambiguity, the verification of the three properties can be readily implemented, provided that the local membership sets are observed at each step. The observation time window corresponding to a test sequence is set to [0, tlastf + n + 1], where tlastf is the occurrence step of the latest initial fault.
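The definition of a test sequence given above can be made concrete with a small sketch. It is only an illustration under our reading of the text, not the prototype used in the experiments: the names `Fault`, `fault_type`, `initial_faults`, `is_valid_sequence` and `observation_window` are hypothetical, and we assume that the initial fault of a processor is simply its earliest fault in the list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Fault:
    step: int       # occurrence step of the fault
    processor: int  # affected processor, in 0..n-1


def slot_owner(step: int, n: int) -> int:
    """At time t, processor t mod n owns the broadcast slot."""
    return step % n


def fault_type(fault: Fault, n: int) -> str:
    """A fault in the affected processor's own slot is a send fault, otherwise a receive fault."""
    return "send" if slot_owner(fault.step, n) == fault.processor else "receive"


def initial_faults(faults: List[Fault]) -> List[Fault]:
    """Assumption: the initial fault of a processor is its earliest fault in the list."""
    seen, firsts = set(), []
    for f in sorted(faults, key=lambda f: f.step):
        if f.processor not in seen:
            seen.add(f.processor)
            firsts.append(f)
    return firsts


def is_valid_sequence(n: int, faults: List[Fault]) -> bool:
    """Check the preconditions on initial faults stated in the text."""
    if n <= 2 or any(f.step < 0 or not 0 <= f.processor < n for f in faults):
        return False
    firsts = initial_faults(faults)
    if len(firsts) > n - 2:                 # at most n-2 initial faults
        return False
    already_faulty, last_step = set(), None
    for f in firsts:
        if last_step is not None and f.step - last_step < n + 1:
            return False                    # initial faults at least n+1 steps apart
        if slot_owner(f.step, n) in already_faulty:
            return False                    # slot must belong to a so-far non-faulty processor
        already_faulty.add(f.processor)
        last_step = f.step
    return True


def observation_window(n: int, faults: List[Fault]) -> Tuple[int, int]:
    """[0, tlastf + n + 1]; with no fault at all we take tlastf = 0 (our assumption)."""
    t_lastf = max((f.step for f in initial_faults(faults)), default=0)
    return (0, t_lastf + n + 1)
```

For instance, with n = 4, a sequence made of a fault of processor 1 at step 1, a fault of processor 3 at step 6 and a later fault of processor 1 at step 9 contains two initial faults (a send fault and a receive fault) followed by one subsequent, potential send fault.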

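[Figure 2. Group Membership Algorithm. Two scenarios, Send Fault and Receive Fault, showing the successive ack and membership values (memb, memb − {p}, memb − {q1}, memb − {q1, p}) in the states of p and of the qi after each emission.]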

set to false. Since a broadcaster always assumes that its broadcast was correct, p keeps its membership unchanged. Accordingly, its ack is true. Note that p and the qi still agree on the identity of the next expected broadcaster, say q1. A false ack is broadcast, leading the other non-faulty processors to notice that they took the same decision as q1. Their ack is set to true. Processor p disagrees with q1 and excludes it from its membership. After a second disagreement in the slot of q2, p will diagnose its fault and remove itself from its membership. In the second scenario, p suffers a receive fault and removes q1 from its membership. Then, noticing that the next expected broadcaster q2 did not exclude q1, it will diagnose its fault and remove itself from its membership. When its slot is reached, it will remain silent in order to inform other processors of this fact.

The GMP is intended to fulfill three key properties, yielding the three theorems of the informal proof.

• Theorem 1: The local membership sets of all non-faulty processors are always identical and contain all non-faulty processors.
• Theorem 2: A faulty processor is removed from the membership sets of non-faulty processors in the step following its first broadcast while faulty.
• Theorem 3: A newly faulty processor will remove itself from its local membership set (and thereby diagnose itself) when the slots of at most two non-faulty processors have been passed.

After this overview of the GMP, we give some outcomes of the preliminary analysis. The detailed description of the local update rules in [4] made it straightforward to implement a GMP prototype. However, understanding the resulting global behavior was far from trivial. We found it useful to draw an abstract state automaton capturing our understanding of the GMP, and to explore the consequences of a few fault scenarios. As regards the definitions of

5 Restructuration of the Informal Proof

After this preliminary analysis, work is focused on the informal proof that the algorithm fulfils the three required properties. We first give a general overview of the proof structure and its main intermediate lemmas (Section 5.1). Then, for each main lemma, semi-formal proof restructuration is carried out (Section 5.2). The analysis of the resulting proof trees yields the identification of weak parts (Sections 5.3 and 5.4), allowing us to get hints on test data selection (Section 5.5).

5.1

Rule ⊢∧ is used when conjoined subgoals have to be proved. The proof is then split into several branches, one for each subgoal. The cut rule is used to introduce new lemmas, i.e., the proof of A is split into two branches, the proof of B and the proof of A assuming B. Rule ∨⊢ is used for case splitting: the proof of C is split into two branches, depending on whether A, or B, is assumed. Note that a proof by cases can be represented by the successive application of cut (to introduce the appropriate disjunction of cases in the hypotheses) and ∨⊢ (to perform the splitting according to those cases).

A proof tree is complete when all its proof branches end with axiom rules. For example, there is an axiom stating that a formula is true when the goal to be proved is one of the hypotheses. In most cases, the informal discourse is not low-level enough to explicitly refer to axioms. Hence we use labels to denote our subjective assessment of the validity of the terminal steps of the branches. For parts of the proof that we consider conclusive, we use labels denoting the class of hypotheses from which the pending goal can be easily derived: for instance, fault_model refers to hypotheses on the faults to be tolerated (e.g., any two initial faults are at least n+1 steps apart), algo refers to the update rules of the local membership sets and ack variables, Inv(Th1) refers to the invariance of the 8 conjuncts (established by the inductive proof of Theorem 1, and assumed in the proofs of Theorems 2 and 3). Pending proof branches that we consider missing, or too complex to be easily discharged, are labeled assumed. This would also be the default label for obscure proof fragments that would defeat the analysis. Label false is used in case we are able to establish that the current goal does not hold under the considered hypotheses. An example proof tree is provided in Figure 3, and will be commented on in the next section.
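The labeling scheme just described lends itself to a simple data structure. The sketch below is purely illustrative: the class `Node`, the functions `assess` and `weak_leaves`, and the rule that a subtree is as weak as its weakest leaf are our own assumptions for the example, not a tool used in the study. The small tree at the end is a made-up proof by cases (cut followed by ∨⊢), not a fragment of the GMP proof.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

# Labels that we take to denote conclusive terminal steps.
CONCLUSIVE = {"algo", "fault_model", "Inv(Th1)"}
SEVERITY = {"ok": 0, "assumed": 1, "false": 2}


@dataclass
class Node:
    sequent: str                        # e.g. "Γ, A ⊢ C"
    rule: str = ""                      # rule applied at this node; unused on leaves
    labels: frozenset = frozenset()     # assessment labels; only meaningful on leaves
    children: List["Node"] = field(default_factory=list)


def leaf_status(node: Node) -> str:
    """Map the labels of a terminal step to 'ok', 'assumed' or 'false'."""
    if "false" in node.labels:
        return "false"
    if "assumed" in node.labels or not node.labels:
        return "assumed"                # unlabeled fragments default to assumed
    if node.labels <= CONCLUSIVE:
        return "ok"
    raise ValueError(f"unknown label(s): {set(node.labels) - CONCLUSIVE}")


def assess(node: Node) -> str:
    """Assumed propagation rule: a subtree is as weak as its weakest leaf."""
    if not node.children:
        return leaf_status(node)
    return max((assess(c) for c in node.children), key=SEVERITY.__getitem__)


def weak_leaves(node: Node) -> Iterator[Node]:
    """Yield the terminal steps that are not conclusive, from left to right."""
    if not node.children:
        if leaf_status(node) != "ok":
            yield node
        return
    for child in node.children:
        yield from weak_leaves(child)


# Made-up example: proof of A by cases on C ∨ D.
tree = Node("Γ ⊢ A", rule="cut", children=[
    Node("Γ ⊢ C ∨ D", labels=frozenset({"fault_model", "algo"})),
    Node("Γ, C ∨ D ⊢ A", rule="∨⊢", children=[
        Node("Γ, C ⊢ A", labels=frozenset({"Inv(Th1)"})),
        Node("Γ, D ⊢ A", labels=frozenset({"assumed"})),
    ]),
])
```

Calling `assess(tree)` on this example reports the whole proof as resting on an assumed step, and `weak_leaves(tree)` points to the case D as the branch worth targeting.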
Complete restructuration of the GMP did not exceed one week of work. This is due to the fact that the approach is much lighter than formal reworking. The proof trees were a convenient support to analyze the inductive proofs of the 8 conjuncts, as well as the proofs of Theorems 2 and 3. We identified several problems. A general problem, affecting all the proofs, is that the authors did not properly account for all possible fault patterns: in the argumentation, the occurrence of subsequent faults following an initial one is not considered. For example, after an initial receive fault, the faulty processor is always assumed to be able to receive (or broadcast) subsequent messages if it decides to do so (except in the proof of Conjunct 6, where the possibility of not receiving two consecutive expected broadcasts is mentioned). As no justification is pro-

First overview of the proof

Theorem 1 is proved by induction on time. In order to establish the induction step, model checking was used to conduct a repeated strengthening of the invariant until an inductive one was reached. The final invariant contains 8 conjuncts. The first step of the proof is to show that the invariant holds in the initial state (time t = 0). Then the authors assume the validity of the 8 conjuncts until time t and prove each conjunct for time t + 1. In the case of Conjunct (6), this induction step also needs Conjunct (5) to hold in t + 1. Theorem 1 follows trivially from the invariance of Conjuncts (1) and (2). The proofs of Theorem 2 and Theorem 3 are based on the assumption that the 8 conjuncts are invariant. Hence, the construction of this invariant was the foundation of the whole proof. Quoting from [6]: “The informal proof of inductiveness of the conjoined invariants is long and arduous”. For such intricate paper proofs, careful reading is likely to be insufficient to identify the weak parts: this is the rationale for proof restructuration.

5.2 Informal proof restructuration in sequent calculus

The goal of restructuration is not to establish the correctness of the algorithm, but to make a quick assessment of the degree of rigorousness of the proof, and to point out its less convincing parts. The proof trees obtained are semi-formal, using some notations which are not formally defined (e.g. Theorem 3, Sfault). We used sequent calculus to restructure the informal proof in the form of semi-formal proof trees, and analyze them step by step. A sequent is written in the form Γ ⊢ P, where Γ is a list of hypotheses, and P is a conjecture to be proved under these hypotheses. An intuitive interpretation is that the conjunction of the hypotheses should imply P. A proof is then a tree of sequents: the main goal is placed at the root (bottom) of the tree, and the proof tree is constructed upwards from the root by applying inference rules of the form:

$$
\frac{\Gamma \vdash A \qquad \Gamma \vdash B}{\Gamma \vdash A \wedge B}\ (\vdash\wedge)
\qquad
\frac{\Gamma \vdash B \qquad \Gamma, B \vdash A}{\Gamma \vdash A}\ (\mathrm{cut})
\qquad
\frac{\Gamma, A \vdash C \qquad \Gamma, B \vdash C}{\Gamma, A \vee B \vdash C}\ (\vee\vdash)
$$
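For readers who prefer a machine-checked reading of these three rules, they can be restated in a proof assistant. The Lean 4 sketch below is only an illustration: the theorem names are hypothetical, the context Γ is left implicit, and each rule is read top-down, with its premises as hypotheses and its conclusion as the statement. The study itself did not use Lean.

```lean
-- ⊢∧ : from a proof of A and a proof of B, conclude A ∧ B.
theorem and_right_rule (A B : Prop) (hA : A) (hB : B) : A ∧ B :=
  ⟨hA, hB⟩

-- cut : prove the lemma B, then prove A assuming B.
theorem cut_rule (A B : Prop) (hB : B) (hBA : B → A) : A :=
  hBA hB

-- ∨⊢ : to prove C from A ∨ B, prove C in each of the two cases.
theorem or_left_rule (A B C : Prop) (hAC : A → C) (hBC : B → C) (h : A ∨ B) : C :=
  h.elim hAC hBC
```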


vided for ignoring subsequent faults, we consider it as a flaw in the proofs. Taking this flaw into account would lead potentially many proof branches to be labeled assumed. Hence, we decided to put the problem aside, and performed the assessment of the proofs under the assumption that subsequent faults do not occur (We will come back to the problem in Section 5.5). Under this assumption, we found two additional problems affecting respectively Theorem 3 and Conjunct (5). Both of them illustrate the intricacy of the reasoning on time, which is error prone without the support of automated tools. As regards Theorem 3, the authors of [4] made the following diagnosis in the corrected version of the paper (see their footnote 5): “This argument is incorrect when there are only three processors”. In our analysis, we tried not to take advantage of any knowledge of the GMP bug. Indeed, the problem we identified cannot easily be linked to a 3-processors configuration (the link will become apparent when we revisit the proof based on the test results, in Section 7).

5.3

the right part they intend to prove the second disagreement (ack≠(tok2)), putting the first one in the hypotheses.

The informal discourse corresponding to these proofs is quite complicated, making the logical structure difficult to extract. Part of the problem comes from the fact that different time steps intervene in the argumentation. We distinguished four of them in the semi-formal proof tree, and assigned them the following identifiers:

• tf is the step corresponding to the arrival of the fault,
• tp is p’s first broadcast slot while faulty (tp = tf for an Sfault, tp > tf for an Rfault-b),
• tok1, tok2 are the first and second non-faulty slots after p’s fault (tok2 > tok1 > tp ≥ tf).

Now, it turns out that the authors confuse tf and tp in the proof of ack≠(tok1): the text fragment “the next expected slot” may sometimes denote the next expected slot after tf, and sometimes the next expected slot after tp. But tf and tp are not identical in the case of an Rfault-b.

Let us see how the confusion becomes manifest in the proof tree. To prove ack≠(tok1), the authors introduce new lemmas: after tp, the local ack values of p and non-faulty processors will differ (¬ack(q,tp), ack(p,tp)) but they still have the same next expected broadcaster (common_next_b(tp)). The latter lemma ensures that p will be listening in tok1, which is the next expected slot after tp, so as to notice a first disagreement. The proof of common_next_b(tp) is supposed to be discharged in the left part of Branch B (Figure 3 (c)). But due to the aforementioned confusion, what is actually proved is common_next_b(tf). Hence we have an assumed label. Note that, by propagation, ack≠(tok1) is doubtful in the case of an Rfault-b: it is not guaranteed that p will be listening during tok1.

As regards the second disagreement (ack≠(tok2), right part of Branch A), the proof is only sketched: the authors probably considered that they could reproduce the same reasoning as in the left part. In our opinion, developing the pending branch with the same level of detail as for ack≠(tok1) would not be a trivial task. Hence we put an assumed label. In particular, it would be necessary to prove that p and the non-faulty processors still agree on the next expected broadcaster after tok1. Since the proof is missing, the lemma has to be considered as doubtful for both Sfault and Rfault-b.
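The difference between tf and tp can be illustrated numerically. The following sketch is an assumption-laden illustration, not part of the proof: it relies only on the slot ownership rule of Section 4 (processor t mod n owns slot t), and the function name `first_owned_slot` is hypothetical. It computes the first slot owned by p at or after tf, which coincides with tp in the Sfault and Rfault-b cases, where p does broadcast in that slot.

```python
def first_owned_slot(p: int, tf: int, n: int) -> int:
    """Earliest step >= tf whose broadcast slot is owned by processor p."""
    # Python's % returns a value in 0..n-1 for n > 0, even when p - tf is negative.
    return tf + (p - tf) % n


n, p = 4, 1

# Sfault: the fault arrives in p's own slot, so tp = tf.
tf_send = 5                      # 5 mod 4 == 1, slot owned by p
tp_send = first_owned_slot(p, tf_send, n)

# Rfault-b: the fault arrives in the slot just before p's, so tp > tf.
tf_recv = 4                      # 4 mod 4 == 0, slot owned by processor 0
tp_recv = first_owned_slot(p, tf_recv, n)
```

In the second case “the next expected slot after tf” is p’s own slot, whereas “the next expected slot after tp” is the first non-faulty slot tok1, which is exactly the distinction blurred in the informal discourse.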

Restructuration of Theorem 3

Let us recall that Theorem 3 corresponds to a self-diagnosis property of the algorithm: a newly faulty processor will diagnose itself as faulty, at most two non-faulty slots after its fault (see Section 4). The proof tree is given in Figure 3. The first proof steps (see Figure 3 (a)) introduce a proof by cases based on the type of (initial) fault.

The first case, Rfault-not-b, is when p (the faulty processor) has to remain silent in its first broadcast slot after the fault. This case arises when p suffers a receive fault and is not the next expected broadcaster: by the algorithm, it should identify its fault and remove itself from its membership set before its next broadcast slot (footnote 2: it will diagnose itself as faulty one non-faulty slot after the fault). This case is discharged in the left upper part of the proof tree of Figure 3 (a). As we consider it satisfactory, we do not detail the proof steps here.

The two other cases, Sfault and Rfault-b, are when p broadcasts while faulty: this arises when p suffers a send fault, or a receive fault in the last expected slot before its broadcast (and is then the next expected broadcaster). These cases are addressed by Branch A of the proof (Figure 3 (b)). According to the algorithm, a processor must disagree with the broadcast acks twice consecutively to remove itself from its membership. Hence the proof structure chosen by the authors: in the left part of Branch A, they try to prove the first disagreement (ack≠(tok1)) between p and non-faulty processors q; in


[Figure 3. Semi-formal Proof Tree of Theorem 3. Panels: (a) Main Proof tree, (b) Branch A, (c) Branch B. The tree layout could not be recovered and is not reproduced here; the notations used in the figure are listed below.]

Proof notations:

• time slots: tf: fault arrival instant; tp: first broadcast slot of p when faulty (tp ≥ tf); tok1: first non-faulty broadcast slot after tf (tok1 > tf); tok2: second non-faulty broadcast slot after tf (tok2 > tok1).
• processors: p: faulty processor; q: any non-faulty processor.
• faults: Sfault: p suffers a send fault in tf; Rfault-b: p suffers a receive fault in tf and is the next expected broadcaster; Rfault-not-b: p suffers a receive fault in tf and is not the next expected broadcaster.
• processors states: ack(i,t): value of the ack bit of processor i, after the broadcast of slot t; ack≠(t): the values of the ack bits of p and all q differ after the broadcast of slot t; agree(t): the local membership sets of p and every q are the same at the beginning of slot t; remove_q(t): processor q removed itself from its membership during slot t.
• properties of the next expected broadcaster: common_next_b(t): the next expected broadcaster in t is the same for p and every q; ok_next_b(t): the next common expected broadcaster is non-faulty during its slot.
• hypotheses: Γ ≡ algo, fault_model, Inv(Th1); Γ′ ≡ Γ, Sfault ∨ Rfault-b; Γ″ ≡ Γ′, ¬ack(q,tp), ack(p,tp), common_next_b(t).

5.4 Conjunct (5) of the Invariant

The second general problem concerns the false lemma used in the proof of Conjunct (5). The lemma appears at several places in this proof, and our assessment is that the invariance of Conjunct (5) is dubious whatever the fault configuration: it is not possible to identify safe subsets of the GMP input domain. Now, let us recall that the other conjuncts are inductively proved under the assumption that Conjunct (5) holds until time t. The resulting invariance statements are used in the proofs of the three theorems. Thus, by propagation, no part of the GMP proof structure can be safely considered to be conclusive. All the functional cases that can be extracted from the proof trees are candidate cases to be covered during testing, as they are potentially significant to the correctness of the GMP. These cases are:

Any flaw affecting the proof of one conjunct is potentially serious, because it may propagate to Theorems 1, 2 and 3, and jeopardize the whole GMP proof. Conjunct (5) corresponds to the following property: “ If a processor p became faulty less than n steps ago and q is a nonfaulty processor, either p is the present broadcaster or the present broadcaster is in p’s local membership set iff it is in q’s. ” This formulation is clearly not easy to handle in order to establish the induction step from t to t + 1. In the informal demonstration, text fragments such as “n steps ago”, “since then and up until the next step” have to be interpreted. They define time windows whose bounds depend on whether the reference step is t or t + 1. Unsurprisingly, we found problems related to the treatment of the boundary cases. Three of the proof branches were labeled false. They all correspond to the same flaw. This flaw is the introduction of an intermediate lemma stating that some processor r is never the broadcaster in the window [t − (n − 1), t], under the assumption that r will be broadcasting in t + 1. The lemma does not hold because broadcasting in t + 1 implies having broadcasted in t − (n − 1). We also noticed an implicit use of Conjunct (2) in t+1 (this conjunct states that all non-faulty processors are in their own local membership sets). After analysis, we concluded that this was not a problem because no circular dependency is induced, i.e., the induction step of Conjunct (2) does not need Conjunct (5) to hold in t + 1.

5.5

• the activation conditions of the different update rules of the algorithm (algo labels appearing in the various proof trees), • cases Sfault / Rfault-b / Rfault-not-b (proof tree of Theorem 3), • initial fault not followed by any subsequent fault (all the proof trees), • two consecutive receive faults (proof tree of Conjunct (6)). These general problems mean that it is necessary to perform a global test, verifying the behavior of the GMP on a sample drawn from the whole input domain. The sampling profile has to account for the combination of transient, intermittent and permanent fault patterns with the other cases extracted from the proof analysis. We decided to refine the state automaton built during the preliminary analysis, so as to incorporate all relevant information for defining the profile. The states of the automaton are characterized by the local ack and membership values of the processors, distinguishing between faulty and non-faulty processors, but abstracting from their number and identity. Transitions correspond to broadcast slots (both faulty and non-faulty), and are tagged by the update rules to be activated. The initial state of our automaton is named stable: as long as there is no fault, the system stays in this stable state. Upon the arrival of a new (i.e. initial) fault, the system leaves the stable state, until the faulty processor is removed from its own membership and from the membership of the non-faulty processors, hence going back to the stable state. We identified 20 classes of paths from stable to stable (versus 3 in the original automaton). Covering these classes of paths is the retained test criterion, since it ensures combined coverage of the cases extracted from the proof analysis.

Feedback from the proof analysis

The results obtained from proof restructuration now have to be analyzed from the perspective of testing. Although some weaknesses have been found, our overall judgment is that the proof is of good quality, and is worth being considered for the design of test data. Hence, proof-guided testing may go on. Three problems have been identified. Two of them may potentially affect the whole GMP proof, while the third one is localized in the proof of Theorem 3. The first general problem is that the authors did not account for all fault patterns: after an initial fault, subsequent faults are not considered, except once in the proof of Conjunct (6). Hence, the GMP behavior has to be tested in case of intermittent or permanent faults. 8

We also decided to strengthen the test oracle so as to check the validity of Conjunct (5). Adding this check does not require increasing observability: as for the other checks, it is sufficient that the local membership sets of the processors are observed at each step. Concerning Theorem 3, the identified problem suggests two input subdomains to be tested more thoroughly than others: Sfault with no subsequent fault, Rfault-b with no subsequent fault. Hence, the global test will be supplemented by specific tests directed towards these subdomains. Note that each subdomain corresponds to one class of paths in the refined automaton.
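Since the oracle only needs the local membership sets observed at each step, the check of Conjunct (5) can be sketched as a direct transcription of the property quoted in Section 5.4. This is a minimal sketch under stated assumptions: the function name `conjunct5_holds`, the dictionary layout of `membership` and `fault_step`, and the encoding of “less than n steps ago” as `step - since < n` are illustrative choices, not taken from the C prototype.

```python
def conjunct5_holds(n, step, broadcaster, membership, fault_step):
    """Check Conjunct (5) at one step.

    membership: dict mapping each processor to its local membership set.
    fault_step: dict mapping each faulty processor to the step at which
                it became faulty (processors absent from it are nonfaulty).
    """
    nonfaulty = [q for q in membership if q not in fault_step]
    for p, since in fault_step.items():
        if step - since >= n:
            continue                      # precondition "less than n steps ago" fails
        if broadcaster == p:
            continue                      # first disjunct: p is the present broadcaster
        for q in nonfaulty:
            # second disjunct: broadcaster in p's set iff it is in q's set
            if (broadcaster in membership[p]) != (broadcaster in membership[q]):
                return False
    return True


# Three processors, processor 2 became faulty at step 3, processor 1 broadcasts.
views = {0: {0, 1, 2}, 1: {0, 1, 2}, 2: {0, 2}}
violated = not conjunct5_holds(3, 4, 1, views, {2: 3})
```

In this example the faulty processor 2 has dropped the broadcaster from its view while the nonfaulty processors have not, so the check reports a violation.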

6 Test Experiments: Proof guided testing

In order to perform the test experiments, we implemented a prototype of the GMP algorithm in the C language. We first used a crude random testing approach to assess the difficulty of revealing the known flaw (and possibly new ones), and then studied whether the proof-guided approach could allow us to enhance the revealing power of testing.

6.1 Crude random testing

In this crude approach, the random sampling profile is based on the definition of the input domain given by the preliminary analysis. It does not make use of the additional information we obtained from proof restructuration. In the same way, the test oracle is based on the core properties of the GMP (Theorems 1 to 3) and does not include a check for Conjunct (5).

Recall that a test sequence is defined by the number n > 2 of processors, and a list of faults characterized by their occurrence step and affected processor. The number n of processors is uniformly selected over the range [3 . . . 20], a range corresponding to the systems targeted by the TTA architecture. As regards the faults, we distinguish between initial and subsequent faults, as they involve different preconditions. The total number of initial faults (i.e. affecting a previously nonfaulty processor) is uniformly selected between 1 and n − 2 (there are always at least two nonfaulty processors). The first initial fault is generated to occur in the first round: [0 . . . n − 1]. The fault instant and the affected processor are chosen randomly; the combination of these two data items gives us the fault type (i.e. send fault or receive fault). Then, from an initial fault instant t, we randomly choose the next initial fault instant, which is at least n + 1 steps later in time, and must correspond to the slot of a nonfaulty processor (or the broadcaster would not be listened to, and hence such a slot would not affect the system). After an initial fault, the affected processor has a probability 0.5 to manifest a fault in any subsequent time slot. In this way, all subsequent fault configurations are made equally likely.

Table 1 shows the experimental results supplied by a sample of $10^4$ random test sequences. We observed no failure of Theorem 1 or Theorem 2. However, 6.15% of the sequences revealed a flaw affecting Theorem 3. This flaw is the one already reported on. The flaw was not so hard to reveal with the crude test profile based on the preliminary analysis. We will now investigate whether the knowledge extracted from the proof analysis allows us to further improve the revealing power of testing.

6.2 Proof-guided testing

From Section 5.5, two categories of tests have to be performed. We will first consider the global test of the GMP, and then the specific tests directed towards the doubtful cases in the proof of Theorem 3.

Global test of the GMP. Like the crude test, the global test consists in sampling over the whole input domain. However, the sampling profile is now designed to ensure balanced coverage of the 20 classes of paths of the automaton. The number of processors and the number of initial faults are generated as in the crude profile. The difference resides in the generation of faults: probabilities governing the type of the initial fault, as well as the occurrence of subsequent faults, are determined so as to make each class equally likely. This is in conformity with the principles of statistical testing designed from coverage criteria [8].

A sample of 45 test sequences was generated under this profile. Considering that each class of paths has a probability of at least 1/20 of being selected in one sequence, each class is activated at least once during testing with a probability of at least $1 - (1 - 1/20)^{45} \approx 0.9$. Note that the estimation is very pessimistic, since a test sequence may contain several initial faults (5.25 on average, but the variance is high due to dependency on the number of processors). In practice, the sample activates each class several times with different random subsequences.

The test results are given in Table 2. Note that Conjunct (5) has been added to the oracle checks. The 45 test sequences yielded 4 failures for Theorem 3. For the sake of our comparison with crude random testing, we also conducted a much longer test set with this profile, and achieved an average failure rate of 7.65%. All failures are due to the known flaw.

Table 1. Results for Crude Random Testing

Number of Test Sequences                 $10^4$
Failure Detection Rate for Theorem 1     0%
Failure Detection Rate for Theorem 2     0%
Failure Detection Rate for Theorem 3     6.15%
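The crude sampling profile behind Table 1 can be sketched as a small generator. This is only an illustration in Python, not the C prototype: the round-robin slot ownership (`step % n`), the rule deriving the fault type from the broadcaster of the slot, and the `max_gap` bound on the delay between initial faults are assumptions introduced to make the sketch runnable.

```python
import random


def crude_sequence(rng: random.Random, max_gap: int = 5):
    """Draw one test sequence: (n, [(step, processor, fault_type), ...])."""
    n = rng.randint(3, 20)                 # number of processors, uniform over [3..20]
    n_initial = rng.randint(1, n - 2)      # at least two processors stay nonfaulty
    faulty, faults = set(), []
    step = rng.randint(0, n - 1)           # first initial fault occurs in the first round
    for _ in range(n_initial):
        # Assumption: slot s belongs to processor s % n (round-robin schedule).
        while step % n in faulty:          # the slot must belong to a nonfaulty processor
            step += 1
        proc = rng.choice([i for i in range(n) if i not in faulty])
        # Assumption: a fault hitting the broadcaster of the slot is a send fault.
        kind = "send" if proc == step % n else "receive"
        faults.append((step, proc, kind))
        faulty.add(proc)
        # Next initial fault at least n + 1 steps later; max_gap is hypothetical.
        step += n + 1 + rng.randint(0, max_gap)
    return n, faults


def manifests_again(rng: random.Random) -> bool:
    """A faulty processor manifests a fault in a later slot with probability 0.5."""
    return rng.random() < 0.5
```

Passing an explicit `random.Random` instance keeps each generated sequence reproducible from its seed, which helps when a failing sequence has to be replayed.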

Table 2. Results for Global Proof Guided Testing

Number of Test Sequences              45     $10^4$
Failures Detected for Theorem 1       0      0%
Failures Detected for Theorem 2       0      0%
Failures Detected for Theorem 3       4      7.65%
Failures Detected for Conjunct 5      0      0%
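The 0.9 activation probability quoted for the sample of 45 sequences follows from treating the sequences as independent draws in which a given class of paths is selected with probability at least 1/20. The sketch below reproduces that arithmetic; the helper names are hypothetical, and the inverse computation in `sequences_needed` is the usual way of sizing a sample for such a bound, which we assume (without confirmation from the text) is how the figure of 45 was obtained.

```python
import math


def activation_probability(n_sequences: int, class_probability: float) -> float:
    """Lower bound on the probability that a class is activated at least once."""
    return 1.0 - (1.0 - class_probability) ** n_sequences


def sequences_needed(target: float, class_probability: float) -> int:
    """Smallest sample size for which the bound reaches the target probability."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - class_probability))


bound = activation_probability(45, 1 / 20)   # roughly 0.90
size = sequences_needed(0.9, 1 / 20)         # smallest size reaching 0.9
```

Under these assumptions 45 sequences is the smallest sample for which the bound reaches 0.9.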

The global test was intended to reveal flaws related to potentially invalid assumptions that Conjunct (5) holds, or to missing proof cases for intermittent or permanent fault patterns. Our strongest result is that Conjunct (5) never failed during our tests, even with the longer sample. This is in accordance with the fact that Theorem 1 is also never contradicted, and supports the claim that Conjunct (5) could be valid, despite the identified weak parts. We also did not identify problems specifically related to the occurrence of subsequent faults. As regards the flaw in Theorem 3, the global profile is clearly not much more stressing than the crude one (7.65% against 6.15%). Let us now investigate the results of testing when directed towards its specific proof weaknesses.

Directed specific tests. Theorem 3 has to be tested with two fault configurations, so we decided to conduct two test experiments: one with each of the configurations, using 10 test sequences for each of them. The send fault configuration (only send faults were allowed, with no subsequent fault) did not detect any failure, and we confirmed this result with longer test sequences. These tests tend to indicate that Theorem 3 should be valid for send faults, restricting the flawed part of the proof to the second studied fault configuration.

The results of the tests with receive faults on the next expected broadcaster are given in Table 3. With only 10 test sequences, we detected 2 failures of Theorem 3 (Theorem 1, Theorem 2 and Conjunct (5) were also included in the oracle but did not detect any failure). We verified this result with a sample of $10^4$ test sequences, reaching an average detection rate of 19% for Theorem 3 (once again, the other checks did not fail). This is three times the detection rate of the crude random testing and clearly indicates that Theorem 3 is flawed for the Rfault-b fault configuration.

By closely analyzing the test results, we also noted that every failure was triggered by a fault injected in a 3-processor membership (whether it is the initial full membership or a membership obtained after several injected faults). This observation will be confirmed by the reworking of the proof of Theorem 3.
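The comparison between the three profiles reduces to ratios of the reported detection rates for Theorem 3. The short computation below only restates that arithmetic from the figures given in Tables 1 to 3; the dictionary keys are labels chosen for this illustration.

```python
# Detection rates for Theorem 3 reported over 10^4 sequences.
rates = {"crude": 0.0615, "global": 0.0765, "Rfault-b": 0.19}

# Ratio of each profile's rate to the crude profile's rate.
ratios = {name: rate / rates["crude"] for name, rate in rates.items()}
```

The ratio for the Rfault-b profile is slightly above 3, while the ratio for the global profile stays close to 1.2, which matches the qualitative statements made about the two proof-guided profiles.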

Table 3. Results for Specific Proof Guided Testing (Rfault-b configuration, no subsequent faults)

Number of Test Sequences                                       10
Failures Detected for Theorem 3                                2
Failure Detection Rate for Theorem 3 ($10^4$ sequences)        19%

7 The Proof Revisited

We presented in Section 6 how to use the results given by a restructuration of the informal proof to enhance testing effectiveness. We will now use these testing results to gain deeper understanding of the proof. In Section 5 we rebuilt the informal proof, retaining the authors’ structure. The test results give us more confidence in our analysis and will guide our reworking of the identified weak parts.

Regarding Conjunct (5) of the invariant, we found several weaknesses in its informal proof (as described in Section 5). However, testing this conjunct specifically, by adding it to the test oracle, never led to failure (see Section 6.2). If the conjunct is true, then we should be able to redo the proof and discharge the pending weak parts. In Conjunct (5), the authors use properties over time windows whose bounds depend on whether the reference step is t or t + 1. The weaknesses we found are related to the treatment of the boundary cases (see Section 5.4). By treating those boundary cases separately, we managed to prove the relevant properties and so finish the proof of Conjunct (5).

The flaw which was revealed concerning Theorem 3 was mentioned by the authors in the revised version of [4]. This flaw was addressed with a patch to the algorithm which changes one of the update rules. Our tests specifically directed towards the proof of Theorem 3 never reported a failure with the “Sfault” configuration. We therefore decided to split the proof of Branch A of Theorem 3 into two sub-proofs, one for the “Sfault” configuration and one for the “Rfault-b” configuration (see Section 5, Figure 3).

For an “Sfault” configuration, the assumed part of Branch B is no longer a problem, as $t_f$, the fault arrival instant, and $t_p$, the first broadcast of p when faulty, are the same because we have a send fault. The other assumed branch (corresponding to the proof of $\mathit{ack}_{\neq}(t_{ok2})$) is linked to the second nonfaulty broadcast after p’s fault. Under an “Sfault” hypothesis, we were able to finish this proof too, using the same structure as for $\mathit{ack}_{\neq}(t_{ok1})$.

The “Rfault-b” sub-tree was much more complicated. We were able to prove the assumed part of Branch B, concerning the first nonfaulty broadcast after p’s fault ($\mathit{ack}_{\neq}(t_{ok1})$). However, it was much more arduous to work on the second nonfaulty broadcast after p’s fault ($\mathit{ack}_{\neq}(t_{ok2})$). Here we had to use Conjunct (5), and so had to prove that its precondition was verified: “p became faulty less than n steps ago”. This precondition is false when only 3 nonfaulty processors remain in the group. By reworking the informal proof, we were thus able to confirm the results of our tests.

8 Summary and Conclusion

Practical ways to combine testing and theorem proving have been little explored in the literature. Work on testing has primarily been concerned with the generation of test data from formal specification, for conformance testing. It has been less concerned with developing tight interaction between testing and proving (with a few exceptions, e.g. [9], [3] and [10]). Yet, the idea has been expressed since the 70s: “A judicious combination of direct program proving and empirical judgment can reduce size of a complete test.” [11]. Our approach to the problem is an experimental one: taking examples of published (flawed) paper proofs, we investigate whether their analysis may supply useful information for the design of testing.

The proposed method for extracting information involves two steps. During preliminary analysis, we extract the information needed to define the test input domain and oracle. We also gain a basic understanding of the algorithm’s behavior. In a second step, we conduct proof restructuration. The informal discourse is rewritten as a semi-formal proof tree, to find potential weaknesses towards which we should direct testing. At this step, we are able to make a quick assessment of the degree of rigor of an informal proof, and stop the process if it is not judged strong enough. Otherwise, proof-guided test experiments are performed.

Proof-guided testing was effective in the GMP example. Its proof was more convincing than the one we studied in previous work on FTRMS [1]. During restructuration, we were not stopped by imprecise definitions and meta-level flaws. As a result, we were able to point at several weaknesses in the proof, one of them being linked to the flaw in the algorithm. This encouraging result suggests that an informal proof may indeed carry relevant information for testing, provided it passes the restructuration step.

Note that, in the GMP example, the extracted information was not sufficient to identify a priori the precise 3-processor configuration. We claim that its identification would take very deep insight into the proof. The proposed light analysis approach, coupled with a statistical testing approach, seems more cost-effective. It allowed us to identify an input subspace to be sampled more thoroughly than others, and to obtain a failure rate that was three times the rate under the crude profile. Once the revealing configuration is found by testing, it becomes easier to revisit the proof so as to diagnose the flaw.

We now intend to investigate testing from formal (but partial) proofs. A candidate case study is the formal PVS model of the GMP presented in [7]. As its proof is complete, it may be considered flawless with respect to its key properties. We will insert flaws into the algorithm and then study whether the accordingly modified, and now partial, formal proof could be helpful to guide the design of testing.

References

[1] Guillaume Lussier and Hélène Waeselynck, “Informal Proof Analysis Towards Testing Enhancement”, in 13th Int. Symposium on Software Reliability Engineering (ISSRE’02). Nov. 2002, pp. 27–38, IEEE Computer Society, Annapolis, MD, USA.

[2] Sunondo Ghosh, Rami Melhem, Daniel Mossé, and Joydeep Sen Sarma, “Fault-tolerant rate monotonic scheduling”, Real-Time Systems, vol. 15, no. 2, pp. 149–181, 1998.

[3] Purnendu Sinha and Neeraj Suri, “On the use of formal techniques for analysing dependable real-time protocols”, in 21st IEEE Real-Time Systems Symposium (RTSS’00). 1999, pp. 126–135, IEEE Computer Society, Phoenix, AZ.

[4] Shmuel Katz, Pat Lincoln, and John Rushby, “Low-overhead time-triggered group membership”, in 11th Int. Workshop on Distributed Algorithms (WDAG’97), Marios Mavronicolas and Philippas Tsigas, Eds., Saarbrücken, Germany, Sept. 1997, pp. 155–169, Springer-Verlag, LNCS 1320.

[5] Hermann Kopetz and Günter Grünsteidl, “TTP – a time-triggered protocol for fault-tolerant real-time systems”, IEEE Computer, vol. 27, no. 1, pp. 14–23, Jan. 1994.

[6] John Rushby, “Verification diagrams revisited: Disjunctive invariants for easy verification”, in Computer-Aided Verification (CAV’00), E. A. Emerson and A. P. Sistla, Eds., Chicago, IL, July 2000, pp. 508–520, Springer-Verlag, LNCS 1855.

[7] Holger Pfeifer, “Formal verification of the TTP group membership algorithm”, in Formal Methods for Distributed System Development Proceedings of FORTE XIII / PSTV XX 2000, Tommaso Bolognesi and Diego Latella, Eds., Pisa, Italy, Oct. 2000, pp. 3–18, Kluwer Academic Publishers.

[8] P. Thévenod-Fosse, H. Waeselynck, and Y. Crouzet, “Software statistical testing”, in Predictably Dependable Computing Systems, H. Kopetz, B. Randell, J.-C. Laprie and B. Littlewood, Eds. 1995, pp. 253–272, Springer Verlag.

[9] B. Cukic, “Combining testing and correctness verification in software reliability assessment”, in 2nd IEEE High-Assurance Systems Engineering Workshop (HASE’97). Aug. 1997, pp. 182–187, IEEE Computer Society, Washington, DC.

[10] V. Rusu, “Verification using test generation techniques”, in Formal Methods Europe (FME’02), Copenhagen, Denmark, July 2002, pp. 252–271, Springer-Verlag, LNCS 2391.

[11] John B. Goodenough and Susan L. Gerhart, “Towards a theory of test data selection”, IEEE Trans. on Software Engineering, vol. SE-1, no. 2, pp. 156–173, 1975.
