Noname manuscript No. (will be inserted by the editor)

Nicolas Bousquet

Eliciting vague but proper maximum entropy priors in Bayesian experiments

Received: date / Accepted: date

Abstract Priors elicited according to maximal entropy rules have been used for years in objective and subjective Bayesian analysis. However, when the prior knowledge remains fuzzy or dubious, they often suffer from impropriety, which can make them uncomfortable to use. In this article we suggest the formal elicitation of an encompassing family for the standard maximal entropy (ME) priors and the maximal data information (MDI) priors, which can yield proper families. An interpretation is given in the objective framework of channel coding. In a subjective framework, the performance of the method is shown in a reliability context where flat but proper priors are elicited for Weibull lifetime distributions. Such priors appear as practical tools for sensitivity studies.

Keywords Bayesian inference · expert opinion · Kullback-Leibler distance · Shannon's entropy · noninformative priors · channel coding · sensitivity study · Weibull

N. Bousquet
Laval University, Department of Mathematics and Statistics, 1045 av. de la Médecine, Québec (QC) G1V 0A6, Canada
Tel.: +1-418-656-2131, Fax: +1-418-656-2817
E-mail: [email protected]

1 Introduction

With the development of Bayesian techniques in numerous applied domains, the requirement of justifiable prior elicitation methods by decision-makers increases. For years, systematic methods have been established by statisticians to propose prior distributions in a conservative way. In objective frameworks, Kass and Wasserman (1996) summarized the various ways of defining the "noninformativeness" of a prior. In subjective frameworks where expert information is available, conservative methods try to avoid arbitrary choices among all possible prior probabilistic distributions whose features can fit the expert requirements while, as much as possible, keeping the data information more conclusive a posteriori than the subjective information. See Robert (2001, chap. 3) and Press (2003, § 5.4) for an overview of the weighty work dedicated to this subject. Besides, sensitivity studies must be done in any subjective Bayesian inference, focusing on particular aspects of the expert opinion (Press 2003). Thus prior elicitations should adapt to a fluctuating number of constraints translated from expert opinions (Hill and Spall 1994). In addition, elicited priors require a certain versatility as a function of the expert's self-confidence: a Bayesian analyst often needs to moderate the subjective odds around the central value of a quantity of interest. In various areas, this common overoptimism of expert opinions and the need for expert feedback in understandable terms have been pointed out by Cooke (1991), Press (2003), Garthwaite et al. (2005), O'Hagan (1998, 2006), Forrester (2005), Goldstein (2006), Goossens and Cooke (2006), O'Hagan et al. (2006) or Oakley and O'Hagan (2007).

More generally, there is a deep need for systematic elicitation methods that are common to objective and subjective frameworks, allowing one to progress carefully from noninformativeness to informativeness. The most current approaches to do so are based on the maximization of a specific criterion functional under constraints (see Venegaz-Martinez 2004 for a review). In the larger context of probabilistic modeling, the most famous method is probably the elicitation of ME distributions, which has been mainly popularized by Jaynes (1957, 1982, 1988, 2003). In the Bayesian framework, ME and related MDI priors have been studied by Zellner (1977, 1991, 1996, 2000), Berger (1985) and Soofi (1992), among many others. Objective linear constraints for the prior provide a general form. When expert opinion can be translated into additional linear constraints, the ME and MDI elicitations appear as ideal formal rules to build priors respecting the previous requirements. The use of ME and MDI priors is quite broad and applications have been studied in many works; see Skilling (1989), Smith et al. (1992), Zellner (1996), Le Besnerais et al. (1999), Soofi (2000) and Miller and Yan (2000) for examples and more references. Some authors have proposed other elicitations based on similar ME approaches in various cases of prior knowledge (Friedlander and Gupta 2006).

An issue which is frequently encountered using the ME or MDI priors is their impropriety (namely, the prior measures are not real probabilistic densities) when there is no prior information or when only little expert knowledge is considered. This defect makes the analysis difficult, especially in Bayesian selection (Andrieu et al. 2001), and does not comply with the requirements of most subjective Bayesian analyses. Truncation can help to avoid improper priors, but the choice of truncating values can be difficult in high-dimensional problems. Besides, Natarajan and McCulloch (1998), Linnemann (2000) and Berger (2006), among others, highlighted the unpredictable consequences of an arbitrary choice of truncation in estimation and model selection.

Our aim in this article is then to provide an encompassing family (called MEH priors in the following) for the ME and the MDI priors, in order to increase their chances of being proper. The formal elicitation method is still based on linear constraints, but we emphasize a constraint that is logically implied by the propriety requirement: in substance, we set the prior mean of the entropy of the sampling model to a variable but finite value.

The remainder of this paper is organized as follows. In Section 2, we first recall the general elicitation principles of the ME and MDI priors. Then, observing the implication of propriety, we define the MEH priors as the solution of a ME problem under the alternative propriety constraint above, or alternatively as the solution of a weighted MDI elicitation. The MEH family then appears as encompassing both. Two aspects of MEH priors in Bayesian analysis are then highlighted, in objective and subjective frameworks. In Section 3, we provide an asymptotic interpretation of these priors in source coding and channel transmission. They imply that the channel capacity is driven by the data entropy and a linear function of the message length, whose coefficient can be modulated to increase or decrease the transmission of data information.
Finally, Section 4 is dedicated to the incorporation of subjective expert information. The interest of MEH priors for informative Bayesian studies is first briefly discussed. In particular, MEH priors can incorporate qualitative expert information that is not taken into account in the ME and MDI prior elicitations. Besides, MEH priors are shown to be of possible use as tools for sensitivity studies and expert-analyst dialogs. Then a case-study is presented in the area of reliability and risk assessment. Flat but proper, semi-explicit priors are elicited for the Weibull distribution when an estimator of the mean time to failure (MTTF) of a device is known a priori without credible precision, the expert information being mainly qualitative. To our knowledge, this is a new result in this area. The posterior results are shown to be of value for correcting overoptimistic expert opinions.

2 Formal aspects

Let X be a random variable defined in a separable sample space S. X is supposed to follow a decision-making parametric model M(Θ) with probability density function p(x|θ), whose random parameter vector Θ is valued in the metric space Ω, and for which a prior density π(θ) is assumed. Both densities (p(.|θ), π) are supposed to be absolutely continuous with respect to (unexplicit) dominating measures (µ(x), ν(θ)). Thus the complete Bayesian model has joint density f(x, θ) = p(x|θ)π(θ) with respect to µ(x) ⊗ ν(θ). One requires to elicit a prior positive measure π(θ), possibly from expert knowledge, to implement the Bayes paradigm and update the information on Θ knowing some observed data x_n.

2.1 ME and MDI priors

The maximal entropy (ME) prior π^{ME} is defined as the maximizer of Shannon's (1948) discrete entropy or its continuous differential analog relative to π^J, namely (assuming a sum rather than an integral in the discrete case),

\[
\pi^{ME}(\theta) \;=\; \arg\max_{\pi(\theta)\geq 0}\; -\int_\Omega \pi(\theta)\,\log\frac{\pi(\theta)}{\pi^J(\theta)}\, d\theta, \tag{1}
\]

or, equivalently, as the minimizer of the Kullback-Leibler divergence between all possible priors π and π^J. Details and explanations about the links between information theory and statistics can be found in the seminal book by Cover and Thomas (1991). In the most common definition of ME priors, π^J is chosen uniform or can be confounded with the dominating measure ν (see for instance Druilhet and Marin (2007) for a choice of the Jeffreys measure) and is not written. It has been introduced here for the sake of generality: indeed, many studies have elicited and classified (usually improper) benchmark priors π^J appearing as formal representations of ignorance in statistical problems (Kass and Wasserman 1996, Shulman and Feder 2004).

Zellner's maximal data information (MDI) prior (Zellner 1977, 1991, 1996, Zellner and Sinha 1990, Soofi 1992) is among the most important alternatives to the ME priors. It is chosen so as to maximize the average information G(Θ) in the data density relative to that in the prior: denoting H(X|θ) and H^J(Θ) the entropy of the sampling model and the (relative) entropy of the prior,

\[
G(\Theta) \;=\; E_\theta\!\left[ H^J(\Theta) - H(X|\theta) \right].
\]

Thus MDI priors are defined by

\[
\pi^{MDI}(\theta) \;=\; \arg\max_{\pi(\theta)\geq 0}\; \int_\Omega \pi(\theta) Z(\theta)\, d\theta \;-\; \int_\Omega \pi(\theta)\,\log\frac{\pi(\theta)}{\pi^J(\theta)}\, d\theta, \tag{2}
\]

where Z(θ) is Shannon's information (or negative differential entropy) of the sampling distribution,

\[
Z(\theta) \;=\; \int_S p(x|\theta)\,\log p(x|\theta)\, dx.
\]

Zellner (1997) noticed that the quantity G(Θ) gives "the total information provided by an experiment over and above the prior". Thus maximizing the gain G(Θ) implies minimizing the information carried by π(θ) through the inference (Soofi 2000). See Soofi (1992, 1994) for more detailed and didactic presentations of the information-theoretic arguments.

In a purely objective framework, the ME and MDI maximization problems (1) and (2) are to be solved under a propriety constraint which commonly takes a linear form:

\[
\int_\Omega \pi(\theta)\, d\theta \;=\; 1. \tag{3}
\]

Besides, it is assumed that some prior information can be summarized, if available, in linear equations

\[
\int_\Omega f_i(\theta)\,\pi(\theta)\, d\theta \;=\; c_i, \qquad i = 1, \dots, p. \tag{4}
\]


The general solutions of problems (1) and (2) under constraints (3) and (4), obtained using the Lagrange multipliers method (Press 2003, § 5.4), are the only positive density measures

\[
\pi^{ME}(\theta) \;\propto\; \pi^J(\theta)\,\exp\left( -\sum_{i=1}^p \gamma_i f_i(\theta) \right), \tag{5}
\]

\[
\pi^{MDI}(\theta) \;\propto\; \pi^J(\theta)\,\exp\left( Z(\theta) - \sum_{i=1}^p \gamma'_i f_i(\theta) \right), \tag{6}
\]

where (γ_1, ..., γ_p) ≠ (0, ..., 0) and (γ'_1, ..., γ'_p) ≠ (0, ..., 0) are to be found in ℝ^p such that π^{ME} and π^{MDI} respect (3) and (4). However, it is to be noted that (3) remains inoperative in this resolution, since the associated Lagrange multipliers disappear in the proportionality constants left unwritten in (5) and (6). Those unknown constants depend only on constraints (4) to be finite. That remains true with any other indicative measure perceived as a constraint on π(θ) (for instance, a percentile vector on θ). Therefore, when the number of constraints (4) is small, the ME and MDI priors often remain improper.
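As a simple illustration of (5) (not treated in the paper; the flat benchmark π^J ∝ 1 on Ω = ℝ⁺ and the prior-mean constraint are assumptions made here for the example), consider the single constraint f_1(θ) = θ with prescribed prior mean c_1 = m. Then (5) gives an exponential form whose multiplier is fixed by (3) and (4):

\[
\pi^{ME}(\theta) \;\propto\; e^{-\gamma_1\theta}, \qquad \int_0^\infty \theta\,\pi^{ME}(\theta)\, d\theta = m \;\Rightarrow\; \gamma_1 = \frac{1}{m}, \qquad \pi^{ME}(\theta) = \frac{1}{m}\, e^{-\theta/m}.
\]

Here the mean constraint happens to control the tail, so the ME prior is proper; with no constraint (p = 0), the solution degenerates to the improper π^{ME} ∝ π^J, which illustrates why a small number of constraints (4) often leaves ME and MDI priors improper.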

2.2 MEH priors

Notice that

\[
E_\theta[Z] \;=\; \int_\Omega \pi(\theta) Z(\theta)\, d\theta \;=\; H(\Theta) - H(X, \Theta),
\]

where H(X, Θ) is the entropy of the joint distribution with density f. Eliciting a proper π means ∫_Ω π(θ) dθ = 1 ⇒ H(Θ) < ∞. Since, obviously, p(x|θ) is chosen proper (namely ∫_S p(x|θ) dx < ∞ ∀θ ∈ Ω), then ∫_Ω π(θ) dθ < ∞ ⇒ H(X, Θ) < ∞. Finally, a proper π implies E_θ[Z] < ∞. This simple result suggests that the entropy maximization (1) can be formally done under the constraint E_θ[Z] = c < ∞. Since this constraint takes a linear form, the Lagrange multipliers method can easily be used to obtain an explicit form of the family of solutions. However, since the value c of E_θ[Z] remains unknown, this propriety constraint is said to be "hidden" in the following. The next definition and proposition summarize these points.

Definition 1 (MEH priors) The family of maximal entropy priors elicited under the hidden propriety constraint is defined by the maximization problem

\[
\pi^{MEH}(\theta) \;=\; \arg\max_{\pi(\theta)\geq 0}\; -\int_\Omega \pi(\theta)\,\log\frac{\pi(\theta)}{\pi^J(\theta)}\, d\theta \tag{7}
\]

under the linear constraint ∫_Ω Z(θ)π(θ) dθ = c, where c < ∞.

Proposition 1 Let (γ_0, ..., γ_p) ∈ ℝ^{p+1}. Under the constraints (4), the solution of problem (7) takes the form

\[
\pi^{MEH}(\theta) \;\propto\; \pi^J(\theta)\,\exp\left( -\gamma_0 Z(\theta) - \sum_{i=1}^p \gamma_i f_i(\theta) \right). \tag{8}
\]

In the Appendix, a sketch of the proof of Proposition 1 is given. ME and MDI priors appear as special cases of MEH priors: π^{MEH}(θ) = π^{ME}(θ) when γ_0 = 0 and π^{MEH}(θ) = π^{MDI}(θ) when γ_0 = −1. The MEH encompassing family could thus be of value to modify improper ME or MDI priors by modulating the parameter γ_0. Equations (4) produce relationships between the hyperparameters (γ_1, ..., γ_p), and one more constraint is necessary to assess all the γ_i. This constraint can take any form, especially other than a Lagrange linear constraint. For instance, it can emanate from a qualitative understanding of the model. A discussion of this point follows in Section 4.

Definition 1 takes the form of a ME definition, but it is to be noted that an alternative definition can be proposed. Indeed, MEH priors can also be defined as the maximizers of the weighted prior average information in the data density minus the (relative) information in the prior density:

\[
\pi^{MEH} \;=\; \arg\max_{\pi(\theta)\geq 0}\; \int_\Omega \pi(\theta)(-\gamma_0) Z(\theta)\, d\theta \;-\; \int_\Omega \pi(\theta)\,\log\frac{\pi(\theta)}{\pi^J(\theta)}\, d\theta. \tag{9}
\]

This observation indicates that, in substance, the MEH elicitation tries to reach a compromise, modulated by γ_0, between the maximization of a prior entropy and the need to keep true a relevant characteristic of the prior joint information (a finite average information in the sampling distribution). One gives more importance to the entropy maximization if γ_0 → 0 and more importance to the inferential weight of the data if γ_0 → −1.

3 Application in objective Bayesian analysis

The ME, MDI and MEH elicitations have interpretations in channel coding. Without loss of generality, assume π^J(θ) ∝ 1. Denote I(Θ, X_n) Shannon's mutual information, providing the average information common to the parameter Θ and a sample X_n of length n. I(Θ, X_n) reflects the transinformation (Luttrell 1985) of a channel in which information on Θ is sent to a receiver who has to decode it from X_n, and the maximal value of this information (possibly infinite) is defined as the channel's capacity. For more precisions, see Cover and Thomas (1991, chap. 5, 8 and 10). It is well known that I(Θ, X_n) can be defined as the Bayes risk

\[
I(\Theta, X_n) \;=\; \int_\Omega \pi(\theta) \int_{S^n} p(x_n|\theta)\,\log\frac{p(x_n|\theta)}{m(x_n)}\, dx_1 \dots dx_n\, d\theta,
\]

where m(x_n) = ∫_Ω π(θ) p(x_n|θ) dθ. Provided the integrals are interchangeable (and assuming i.i.d. data x_n), we obtain an alternative writing:

\[
\begin{aligned}
I(\Theta, X_n) \;&=\; \int_\Omega \pi(\theta) \int_{S^n} \prod_{i=1}^n p(x_i|\theta) \left( \sum_{j=1}^n \log p(x_j|\theta) \right) dx_1 \dots dx_n\, d\theta
\;-\; \int_\Omega \pi(\theta) \int_{S^n} p(x_n|\theta)\,\log m(x_n)\, dx_1 \dots dx_n\, d\theta \\
&=\; \sum_{i=1}^n \int_\Omega \pi(\theta) \int_S p(x_i|\theta)\,\log p(x_i|\theta)\, dx_i\, d\theta
\;-\; \int_{S^n} m(x_n)\,\log m(x_n)\, dx_1 \dots dx_n \\
&=\; n E_\theta[Z] + H(X_n). \tag{10}
\end{aligned}
\]

Since G(Θ) = E_θ[Z] + H(Θ), it is easy to see that

\[
G(\Theta) \;=\; \frac{1}{n}\left( I(\Theta, X_n) - H(X_n) \right) + H(\Theta).
\]

While the ME and MEH approaches maximize H(Θ), the MDI approach maximizes I(Θ, X_n)/n + H(Θ). As noticed by Soofi (1992), G(Θ) appears as a "broader" information criterion than a prior entropy. The typical order of divergence will be (d log n)/2 when θ is a continuous parameter and d = dim Ω. Indeed, Clarke and Barron (1994) gave an asymptotic expression of I(Θ, X_n) under mild conditions, assuming Ω restricted to a compact set K:

\[
I(\Theta, X_n) \;=\; \frac{d}{2}\,\log\frac{n}{2\pi e} \;+\; \frac{1}{2}\int_K \pi(\theta)\,\log\det I(\theta)\, d\theta \;+\; H(\Theta) \;+\; o(1), \tag{11}
\]

where I(θ) is the Fisher information of M(θ).


Comparing (11) with (10) and setting E_θ[Z] to a finite value c, an asymptotic interpretation of the MEH elicitation emerges. Indeed, it can be understood as the maximization of the prior entropy under the constraint that the data entropy behaves asymptotically as H(X_n) ∼ (d log n)/2 − cn. In other terms, when the length of a message increases, the mean data information can be strongly modulated by the linear coefficient c. Choosing c < 0 will make the data little informative and will have the effect of slowing the growth of the transinformation. Thus, the prior will have more chances to be informative. Conversely, a positive value of c will give more information to the data and decrease the chances of eliciting a proper prior. Note that the ME elicitation, focusing only on H(Θ), does not take into account the issues raised by the asymptotic form of the channel transinformation and, again, is more likely to provide improper priors.

Thus, assuming an unknown but finite value for E_θ[Z] = c (typically expressed in bits) forces the channel's capacity to depend mainly on the information yielded by the data when |nc/H(X_n)| ∼ 0. Since I(Θ, X_n) ≥ 0 in informative contexts (i.e., when π is proper), not all values of c ∈ ℝ lead to proper elicitations, but this construction can avoid automatically obtaining improper priors, providing a range of possible values of c keeping π proper. Thus, in a Bayesian study, one can imagine obtaining from a former experiment a nonparametric estimate Ĥ_0 of H(X_{n_0}) (Beirlant et al. 1997), where n_0 is a small number of data such that, for n ≤ n_0, the prior is wanted strongly informative (namely, reducing I(Θ, X_n) close to 0) but less and less informative when n increases. Values c ≤ −Ĥ_0/n_0 can provide such priors. Notice finally that c can be related to the covariance matrix K of the random variables X_n if it is defined. From Cover and Thomas (1991, chap. 9-10), we obtain (in bits)

\[
I(\Theta, X_n) \;\leq\; n \left( c + \frac{1}{2}\,\log\left( 2\pi e\, |K|^{1/n} \right) \right),
\]

where |K| is the determinant of K. Again, such an upper bound could be used to select c as a function of an ideal channel capacity and a fixed number of data.
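To make identity (10) concrete, here is a small numerical sketch (not from the paper; the Gaussian location channel and the values of σ², τ² and n are illustrative assumptions) in which E_θ[Z], H(X_n) and I(Θ, X_n) all have closed forms, so that (10) can be checked directly (in nats):

```python
import numpy as np

# Sketch (assumption: Gaussian location channel, not treated in the paper).
# X_i | theta ~ N(theta, sigma2) i.i.d., theta ~ N(0, tau2).
sigma2, tau2, n = 1.5, 4.0, 20

Z = -0.5 * np.log(2 * np.pi * np.e * sigma2)   # Z(theta) is constant here
I = 0.5 * np.log1p(n * tau2 / sigma2)          # mutual information I(Theta, X_n)
# Marginal of X_n is N(0, sigma2*I + tau2*11'); det = sigma2^(n-1)*(sigma2 + n*tau2)
H_Xn = 0.5 * n * np.log(2 * np.pi * np.e * sigma2) + 0.5 * np.log1p(n * tau2 / sigma2)

# Identity (10): I(Theta, X_n) = n * E_theta[Z] + H(X_n)
print(I, n * Z + H_Xn)  # both ~ 2.0, equal up to floating point
```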

4 Application in subjective Bayesian analysis

In subjective Bayesian analysis, the main gain of using maximal entropy priors is that they explicitly incorporate linear constraints that can be elicited from expert knowledge. Thus, through sensitivity studies, they can be easily modulated according to the confidence placed in them. Usually, the Bayesian analyst relies on the most conservative pieces of knowledge, namely the constraints (f_i, c_i) that keep the posterior distribution robust when c_i is slightly modified. Selecting the most informative constraints is another goal of the ME analysis: which constraints can be removed without really modifying the posterior distribution? The aim here is to reduce the number of calls to subjectivity and to refocus the questioning on the most important constraints. For more details, see Press (2003). In the next paragraphs, another gain specifically provided by the MEH priors is highlighted: the possibility of a prior calibration incorporating and modulating some qualitative knowledge.

4.1 Modeling partial expert opinions

Assume that when solicited by a Bayesian analyst, experts can provide quantitative and qualitative information (see Craig et al. 1998 for a specification of these notions) about the behavior of a system Σ from which performance data x_n ∈ S^n are observed. Recently, O'Hagan et al. (2006) gave a review of the main questioning methods used by analysts to obtain coherent and useful answers.

On the one hand, quantitative information usually refers to observable or intuitive magnitudes and can often be translated into estimators of the mean or median of X. For the analyst, the perception of X by non-statistician experts is often marginal (Daneshkhah 2004). In other terms, the quantitative information characterizes the unconditional distribution of X with density

\[
m(x) \;=\; \int_\Omega p(x|\theta)\,\pi(\theta)\, d\theta. \tag{12}
\]


Besides, the marginal distribution works as a practical questioning and feedback tool: experts can assimilate m(x) to a crude histogram which summarizes the main features of Σ (van Noortwijk et al. 1992, O'Hagan et al. 2006). Since standard uncertainty measures like variances are often little familiar to them (Lannoy and Procaccia 2003), experts express their self-confidence in indirect ways, for instance through percentiles (Gelfand et al. 1995). In a reliability framework, these percentiles can correspond to extreme values of the lifetime X (worst and best lifetimes ever registered or prospected on replicates of Σ; see Billy et al. 2006). The situation of a value x_{e,i} considered as the marginal α_i-order percentile of X will be summarized by

\[
\int_S g_i(x)\, m(x)\, dx \;=\; c_i,
\]

where g_i(x) = 𝟙_{\{x \le x_{e,i}\}} and c_i = 1 − α_i, and will be introduced among the constraints (4) as the following linear constraint: f_i(θ) = ∫_S g_i(x)\, p(x|θ)\, dx. Thus, quantitative information can usually be incorporated into the MEH elicitation; see the sketch below.

On the other hand, qualitative information can be more precise and often characterizes the predictive behavior of Σ; it deals with a particular subset of θ rather than with X. This is particularly the case with shape parameters, whose variations are deeply linked to the rationale of the model M(θ). An example is given in the next paragraph about Weibull's shape parameter. Such qualitative constraints are difficult to translate into linear constraints on π. For instance, a percentile value on θ will produce an ineffective indicative function f(θ) included in the proportionality constants of (5), (6) and (8) if Θ is unbounded. ME formal rules are practical but often cannot incorporate, in the elicitation process, a part of the information which can be of primary relevance in a given experiment. We then suggest that the parameter γ_0 of the MEH priors can be assessed to respect a qualitative constraint.
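As an illustration of the quantitative side above, here is a minimal sketch (the numbers are invented for illustration, and the Weibull sampling model of Section 4.2 is assumed) of how a marginal expert percentile becomes the linear constraint f_i(θ):

```python
import numpy as np

# Sketch: with a Weibull sampling model, the expert percentile
# P(X <= x_e) = c_i on the marginal of X enters (4) through
# f_i(theta) = P(X <= x_e | theta), the Weibull CDF, known in closed form.
def f_i(eta, beta, x_e):
    """f_i(theta) = 1 - exp(-(x_e/eta)^beta), the Weibull CDF at x_e."""
    return 1.0 - np.exp(-(x_e / eta) ** beta)

# Hypothetical constraint: an expert states P(X <= 400 months) = 0.9, so the
# pair (f_i, c_i) with c_i = 0.9 is appended to the linear constraints (4).
print(f_i(eta=250.0, beta=1.5, x_e=400.0))  # ~0.87 at this particular (eta, beta)
```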

4.2 A case-study in reliability and risk assessment

The two-dimensional versatile Weibull distribution with density

\[
p(x|\eta, \beta) \;=\; \frac{\beta}{\eta}\left(\frac{x}{\eta}\right)^{\beta-1} \exp\left\{ -\left(\frac{x}{\eta}\right)^{\beta} \right\}
\]

includes the exponential distribution (β = 1) and the Rayleigh distribution (β = 2). It is especially used for modeling a lifetime X ∼ M(θ = {η, β}) (Lawless 2003). A large number of Bayesian studies have focused on Weibull and related distributions (Guida et al. 1989, Berger and Sun 1993, 1994, Kaminskiy and Krivtsov 2005, Singpurwalla 2006), and prior elicitation has been the subject of so many applied studies that, in numerous cases, even non-statistician experts can directly give parametric information rather than marginal or purely qualitative information (Lannoy and Procaccia 2003). Questioning and feedback methods have been developed in the literature so that experts can learn how to react and become more conscious of the features of the model (Bacha et al. 1998, Bousquet 2006a). In such cases, constraints emanating from expert knowledge should be enough to obtain proper ME or MDI priors without calling upon an encompassing family. Dai et al. (2007) recently gave a didactic, referenced and exemplified elicitation of such ME priors in a reliability context. However, because of such training, or possibly for self-interest reasons, expert opinions can suffer from a "modeling bias" which unduly favors a subset of Θ, despite questioning care (Daneshkhah 2004, Bousquet 2006b). This is the case in this application. The expert knowledge must then be partially used for the defence of posterior management decisions before control authorities.

Quantitative information. Consider the case, taken from Bousquet and Celeux (2006), where some right-censored lifetime data x_n from a component Σ of a nuclear device are collected in French plants (cf. Table 1 in the Appendix). Independently from the data-gathering, two independent experts E1 and E2 are questioned about the lifetime X. They agree with each other on a unique value x_e = 250 months, which is assumed to be a reasonable estimator of the unconditional MTTF of X. This is very optimistic with respect to the maximum likelihood estimator (MLE) of the MTTF, which gives 128 months. Besides, no credible uncertainty on this value is simple to assess, since the self-credibility intervals provided by the experts are suspected to be biased. Indeed, E1 is a device manufacturer, who has an interest in providing a large and optimistic interval; E2 is an exploiter whose opinion has been formed from the knowledge of a small number of identical devices in similar running conditions. Since this subjective information should be carefully taken into account, the only Lagrange constraint

\[
\int_S x\, m_\pi(x)\, dx \;=\; \int_\Omega E_\theta[X]\,\pi(\theta)\, d\theta \;=\; x_e \tag{13}
\]

is used in this conservative prior elicitation.

Qualitative information. The shape parameter β indicates qualitatively the nature of the degradation process (Lawless 2003, Bacha et al. 1998). One assumes β > 1 for aging systems, β = 1 for systems that are only submitted to accidental failures, and β < 1 for systems subject to infant mortality defects (Bertholon et al. 2006). The case β = 2 means a constant acceleration of aging (Bousquet 2006a). Usual estimates of β lie in [0.5, 4] (Bacha et al. 1998). As noticed by Lannoy and Procaccia (2003), experts can give direct or indirect values of β. Experts E1 and E2 indicated such values under the form of percentiles:

\[
P(\beta < \beta_i) \;=\; 1 - \alpha_i \tag{14}
\]

for i = 1, ..., q. Agreed values β_i with modulated orders α_i are considered in Table 2 (in the Appendix) for a brief sensitivity study.

Prior elicitation. We choose π^J as the Berger-Bernardo reference prior (Berger and Bernardo 1992, Sun 1997), π^J(η, β) ∝ (ηβ)^{-1}, and Z(η, β) takes the expression

\[
Z(\eta, \beta) \;=\; \log\beta - \log\eta + \gamma/\beta - (1 + \gamma), \tag{15}
\]

where γ is the Euler constant (γ ≈ 0.57721). Details about the calculus can be found in Zellner and Sinha (1990). Applying (8) with (13) and (15), we obtain for the Weibull distribution a family of proper MEH priors on the joint parameters, described in the next theorem (proof given in the Appendix).

Theorem 1 Denote γ_1 = γ_0 x_e^{-1} and let Γ(.) be the gamma function. Denote G(γ_0, γ_1) the gamma distribution with mean γ_0/γ_1 and variance γ_0/γ_1². The MEH prior (8) defined under the constraint (13) takes the conditional form

\[
\eta\,|\,\beta \;\sim\; G\!\left(\gamma_0,\; \gamma_1\,\Gamma(1 + 1/\beta)\right), \qquad
\pi^{MEH}(\beta) \;\propto\; \frac{\beta^{-1}\,\exp(-\gamma_0\gamma/\beta)}{\Gamma^{\gamma_0}(1/\beta)}.
\]

It is proper iff γ_0 > 0.

Since Var[η|β] ∼ γ_0^{-1}, the MEH prior becomes less and less informative when γ_0 → 0⁺. Note this result is counter-intuitive since, in the exponential case (β = 1), the gamma distribution is not conjugate for η (but the inverse gamma distribution is). For the same constraint and benchmark prior, notice that the usual ME prior

\[
\pi^{ME}(\eta, \beta) \;\propto\; (\eta\beta)^{-1}\,\exp\left( -\gamma_1\,\eta\,\Gamma(1 + 1/\beta) \right)
\]

remains improper for both parameters, for any choice of γ_1. The MDI elicitation induces the choice γ_0 = −1, which leaves the prior improper too.
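A minimal numerical sketch of Theorem 1 follows (not the authors' code; γ_0 = 0.5 is an illustrative value and x_e = 250 months is taken from the case-study). It evaluates the unnormalized marginal prior of β, checks its propriety by quadrature, and samples (η, β) using the conditional gamma form:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln
from scipy.stats import gamma as gamma_dist

EULER = 0.57721566490153286  # Euler's constant gamma in (15)

def log_f_beta(beta, g0):
    """Unnormalized log-density of pi^MEH(beta) from Theorem 1."""
    return -np.log(beta) - g0 * EULER / beta - g0 * gammaln(1.0 / beta)

# Propriety check (Theorem 1: proper iff gamma_0 > 0): the normalizing
# constant is finite for the illustrative value gamma_0 = 0.5.
A, _ = quad(lambda b: np.exp(log_f_beta(b, 0.5)), 0.0, np.inf)
print(A)  # finite

def sample_meh(n, g0, x_e, seed=0):
    """Draw (eta, beta): beta by discretized inversion on a truncated grid,
    then eta | beta ~ G(gamma_0, gamma_1 * Gamma(1 + 1/beta))."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(1e-3, 20.0, 20000)           # truncation is a sampling shortcut
    w = np.exp(log_f_beta(grid, g0))
    beta = rng.choice(grid, size=n, p=w / w.sum())
    g1 = g0 / x_e                                    # gamma_1 = gamma_0 / x_e (Theorem 1)
    rate = g1 * np.exp(gammaln(1.0 + 1.0 / beta))    # conditional gamma rate
    eta = gamma_dist.rvs(a=g0, scale=1.0 / rate, random_state=rng)
    return eta, beta

eta, beta = sample_meh(10000, g0=0.5, x_e=250.0)
```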


Combining descent methods (such as Newton-Raphson) and Monte Carlo methods, one can elicit numerous MEH priors, assessing a range of values (γ_0^i)_{i=1,...,q} according to Equation (14):

\[
P(\beta > \beta_i) \;=\;
\frac{\displaystyle\int_{\beta_i}^{\infty} \frac{\beta^{-1}}{\Gamma^{\gamma_0^i}(1/\beta)}\,\exp\left( -\frac{\gamma\,\gamma_0^i}{\beta} \right) d\beta}
{\displaystyle\int_{0}^{\infty} \frac{\beta^{-1}}{\Gamma^{\gamma_0^i}(1/\beta)}\,\exp\left( -\frac{\gamma\,\gamma_0^i}{\beta} \right) d\beta}
\;=\; 1 - \alpha_i.
\]

For instance, the default case P(β > 1) = 1/2 (when no expert opinion is assumed about the aging of the component Σ) implies γ_0 ≈ 10^{-10}, which induces an extremely flat but proper prior. In Table 2 (in the Appendix), we summarize the posterior estimation results of the MTTF as a function of the β percentiles given by the experts E1 and E2. Posterior distributions have been estimated by adaptive importance sampling (Celeux et al. 2006). Two main conclusions can be drawn. First, with minimal information, the MEH prior can give weight to the prior MTTF: the posterior mean of the MTTF patently differs from the MLE, but comes with a wide credibility interval. This wide posterior interval can be, for the Bayesian analyst, a tool for correcting the histograms that emerge from discussions with the overoptimistic experts. Second, assuming good qualitative prior information on β, one can significantly reduce the uncertainty of the posterior MTTF, even though the pointwise estimate of the MTTF seems little dependent on the variation of constraint (14). Thus, MEH priors can be used to indicate to the Bayesian analyst the need for more qualitative information, or as preliminary tools for prior feedback.
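The calibration of γ_0 can be sketched as follows (a hypothetical reimplementation, with quadrature and bracketing root-finding in place of the Newton-Raphson/Monte Carlo scheme mentioned above; the target P(β < 1) = 0.1 is the first percentile constraint of Table 2 and the bracket was found by trial):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.special import gammaln

EULER = 0.57721566490153286

def f(beta, g0):
    """Unnormalized marginal MEH prior density of beta (Theorem 1)."""
    return np.exp(-np.log(beta) - g0 * EULER / beta - g0 * gammaln(1.0 / beta))

def prior_cdf(g0, b_i):
    """P(beta < b_i) under the MEH prior with hyperparameter gamma_0 = g0."""
    num, _ = quad(f, 0.0, b_i, args=(g0,), limit=200)
    den, _ = quad(f, 0.0, np.inf, args=(g0,), limit=200)
    return num / den

# Solve P(beta < 1) = 0.1 for gamma_0 over a bracket chosen by trial.
g0_star = brentq(lambda g0: prior_cdf(g0, 1.0) - 0.1, 0.01, 10.0)
print(g0_star)
```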

5 Conclusion

In this article, we introduced a prior elicitation method using the maximal entropy principle, replacing the traditional Lagrangian normalizing constraint by another linear "constraint", which is formally implied by the propriety of the prior. It led to the definition of an encompassing family for the standard maximal entropy (ME) priors and the maximal data information (MDI) priors. Interpreted in channel coding, such priors emanate from an assumption made on the channel's transinformation: its value in bits can be modulated through an arbitrary linear coefficient of the length of the message. In industrial Bayesian frameworks, this prior elicitation can help to obtain very vague but proper priors when the expert knowledge is fuzzy or dubious, or weakly informative priors for sensitivity studies. Beyond the scope of this paper, a catalog of proper families should be the next step of a more developed study.

Acknowledgements The author would like to thank two anonymous reviewers for their comments, questions and advice, which greatly helped to improve the paper. Special thanks to Prof. Christian Genest for the time spent discussing this work.

Appendix

A. Links between MEH, ME and MDI priors

We give here a sketch of the proof of Proposition 1 using the Lagrange multipliers method. See for instance Zellner (2002) for more precisions about the method and an accurate approach of the Lagrangian derivation. The Lagrange multipliers method is not the only way to obtain ME distributions, but it is one of the most general procedures and probably, in Bayesian statistics, the most used (Press 2003). The MEH elicitation process can be written as follows:

\[
\pi^{MEH}(\theta) \;=\; \arg\max_{\pi(\theta)\geq 0}\; -\int_\Omega \pi(\theta)\,\log\frac{\pi(\theta)}{\pi^J(\theta)}\, d\theta
\]

under constraints (4) and ∫_Ω Z(θ)π^{MEH}(θ) dθ = c < ∞. Denoting ∇_π the differential with respect to π, the solution of the maximization problem satisfies

\[
\nabla_\pi \int_\Omega \pi(\theta)\,\log\frac{\pi(\theta)}{\pi^J(\theta)}\, d\theta
\;+\; \gamma_0\, \nabla_\pi \left( \int_\Omega Z(\theta)\,\pi(\theta)\, d\theta - c \right)
\;+\; \sum_{i=1}^p \gamma_i\, \nabla_\pi \left( \int_\Omega f_i(\theta)\,\pi(\theta)\, d\theta - c_i \right) \;=\; 0, \tag{16}
\]

where the γ_i are finite Lagrange multipliers. Since all constraints are linear, (16) is equivalent to

\[
1 + \log\frac{\pi(\theta)}{\pi^J(\theta)} + \gamma_0 Z(\theta) + \sum_{i=1}^p \gamma_i f_i(\theta) \;=\; 0,
\]

which gives (8). π^J and the exponential term ensure the positivity of π. The solution of the maximization problem is reached when (γ_0, ..., γ_p) ≠ (0, ..., 0). Indeed, on the contrary, π ∝ π^J.

B. Proof of Theorem 1

From (8), (13) and (15), we have

\[
\pi^{MEH}(\eta, \beta) \;\propto\; \frac{1}{\beta\eta}\left(\frac{\eta}{\beta}\right)^{\gamma_0} \exp\left( -\gamma_1\,\Gamma(1+1/\beta)\,\eta - \gamma_0\gamma/\beta \right)
\;\propto\; \frac{\exp(-\gamma_0\gamma/\beta)}{\beta^{\gamma_0+1}\,\left(\Gamma(1+1/\beta)\right)^{\gamma_0}}\; G\!\left(\gamma_0,\; \gamma_1\,\Gamma(1+1/\beta)\right).
\]

The underlying propriety constraint insures (γ_0, γ_1) > 0. Moreover, Γ(1+1/β) = β^{-1}Γ(1/β) ∈ ]√π/2, +∞[ ∀β ∈ ℝ^{+*}, so, denoting

\[
f_{\gamma_0}(\beta) \;=\; \frac{\beta^{-1}\,\exp(-\gamma_0\gamma/\beta)}{\Gamma^{\gamma_0}(1/\beta)},
\]

we obtain

\[
0 \;\leq\; f_{\gamma_0}(\beta) \;\leq\; \left(\frac{2}{\sqrt{\pi}}\right)^{\gamma_0} \frac{1}{\beta^{\gamma_0+1}}\,\exp(-\gamma_0\gamma/\beta),
\]

which is integrable on ℝ^+. Then π^{MEH}(β) is proper and the joint distribution too. Notice the proximity of π^{MEH}(β) with an inverse gamma density with scale parameter γγ_0 < γ_0. Finally, denote A(γ_0) the integration constant ∫_{ℝ^+} f_{γ_0}(β) dβ. Constraint (13) insures that

\[
x_e \;=\; A^{-1}(\gamma_0) \int_{\mathbb{R}^+} \Gamma(1+1/\beta)\, f_{\gamma_0}(\beta)\, E[\eta|\beta]\, d\beta
\;=\; A^{-1}(\gamma_0) \int_{\mathbb{R}^+} \Gamma(1+1/\beta)\, f_{\gamma_0}(\beta)\, \frac{\gamma_0}{\gamma_1\,\Gamma(1+1/\beta)}\, d\beta
\;\Rightarrow\; \gamma_0/\gamma_1 = x_e.
\]

C. Tables

Table 1 Lifetime data x_n (in months) from a nuclear device component Σ (a component of a water pipe from the secondary circuit).

Real failure times:     134.9, 152.1, 133.7, 114.8, 110.0, 129.0, 78.7, 72.8, 132.2, 91.8
Right-censored times:   70.0, 159.5, 98.5, 167.2, 66.8, 95.3, 80.9, 83.2
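For reproducibility, here is a sketch (assumed, not the authors' code) of how the frequentist MLE of the MTTF quoted in Section 4.2 could be recomputed from Table 1, using the censored Weibull log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gamma as Gamma

# Data of Table 1 (months)
fail = np.array([134.9, 152.1, 133.7, 114.8, 110.0, 129.0, 78.7, 72.8, 132.2, 91.8])
cens = np.array([70.0, 159.5, 98.5, 167.2, 66.8, 95.3, 80.9, 83.2])

def neg_loglik(par):
    """Censored Weibull log-likelihood: log-pdf for failures, log-survival for censored."""
    log_eta, log_beta = par                      # unconstrained parameterization
    eta, beta = np.exp(log_eta), np.exp(log_beta)
    ll = np.sum(np.log(beta / eta) + (beta - 1) * np.log(fail / eta) - (fail / eta) ** beta)
    ll += np.sum(-(cens / eta) ** beta)          # log S(t) = -(t/eta)^beta
    return -ll

res = minimize(neg_loglik, x0=[np.log(120.0), 0.0])
eta_hat, beta_hat = np.exp(res.x)
print(eta_hat * Gamma(1.0 + 1.0 / beta_hat))     # MTTF = eta*Gamma(1+1/beta); the text reports 128
```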


Table 2 Posterior means and credibility intervals for the MTTF (E[X]) of component Σ, in months, as a function of prior percentiles for β. Percentile values emanate from an agreement between the qualitative knowledge of experts E1 and E2, but their orders are fluctuating. The frequentist maximum likelihood estimate of the MTTF gives 128 months with a [110, 151] (5%-95%) confidence interval.

infant defects: P(β < 1) =        0.1           0.05          0.025
  MTTF                            163           171           179
  (5%-95%) interval               [115, 221]    [121, 210]    [125, 199]

decelerated aging: P(β < 2) =     0.7           0.75          0.8
  MTTF                            152           144           141
  (5%-95%) interval               [110, 207]    [111, 195]    [113, 190]

References

1. Andrieu, C., Doucet, A., Fitzgerald, W.J. and Pérez, J.M., Bayesian Computational Approaches to Model Selection. In: Nonlinear and Non Gaussian Signal Processing, Smith, R.L., Young, P.C. and Walkden, A. (Eds), Cambridge University Press (2001).
2. Bacha, M., Celeux, G., Idée, E., Lannoy, A., Vasseur, D., Estimation de modèles de durées de vie fortement censurées. Eyrolles (1998).
3. Beirlant, J., Dudewicz, E., Gyorfi, L. and van der Meulen, E., Nonparametric entropy estimation: An overview, Int. J. of Math. and Stat. Sci., 6, 17-39 (1997).
4. Berger, J.O., Statistical Decision Theory and Bayesian Analysis. Springer-Verlag, New York (1985).
5. Berger, J., The Case for Objective Bayesian Analysis, Bayesian Analysis, 1, 385-402 (2006).
6. Berger, J.O., Bernardo, J.M., On the development of reference priors (with discussion). In: Bernardo, J.M., Berger, J.O., Dawid, A.P. and Smith, A.F.M. (Eds), Bayesian Statistics 4, 35-60. Oxford University Press (1992).
7. Berger, J.O., Sun, D., Bayesian analysis for the Poly-Weibull distribution, J. Amer. Stat. Assoc., 88, 1412-1418 (1993).
8. Berger, J.O., Sun, D., Bayesian sequential reliability for Weibull and related distributions, Ann. Inst. Statist. Math., 46, 221-249 (1994).
9. Bernardo, J.M., Noninformative Priors Do Not Exist: A Discussion, J. Statist. Planning and Inference, 65, 159-189 (1997).
10. Billy, F., Bousquet, N., Celeux, G., Modelling and eliciting expert knowledge with fictitious data. In: Proceedings of the Workshop on the use of Expert Judgement for decision-making, CEA Cadarache (2006).
11. Bousquet, N., A Bayesian analysis of industrial lifetime data with Weibull distributions, HAL-INRIA research report RR-6025 (2006a).
12. Bousquet, N., Analyse bayésienne de la durée de vie de composants industriels (Elements of Bayesian analysis for the prediction of the lifetime of industrial components). Ph.D. thesis, Paris XI University (2006b).
13. Bousquet, N., Celeux, G., Bayesian agreement between prior and data. In: Proceedings of the ISBA congress, Benidorm, Spain (2006).
14. Celeux, G., Marin, J.-M., Robert, C.P., Iterated importance sampling in missing data problems, Comput. Stat. Data An., 12, 3386-3404 (2006).
15. Clarke, B., Barron, A.R., Jeffreys' prior is asymptotically least favorable under entropy risk, J. Statist. Plan. Infer., 41, 37-60 (1994).
16. Cooke, R., Experts in Uncertainty: Opinion and Subjective Probability in Science. Oxford University Press, New York (1991).
17. Cover, T.M., Thomas, J.A., Elements of Information Theory. Wiley, New York (1991).
18. Craig, P.S., Goldstein, M., Seheult, A.H., Smith, J.A., Constructing partial prior specifications for models of complex physical systems, The Statistician, 47, 37-53 (1998).
19. Daneshkhah, A.R., Psychological Aspects Influencing Elicitation of Subjective Probability, Research report, University of Sheffield (2004).
20. Druilhet, P., Marin, J.-M., Invariant HPD credible sets and MAP estimators, Bayesian Analysis, 2, 1-11 (2007).
21. Forrester, Y., The quality of expert judgement: an interdisciplinary investigation. Ph.D. thesis, University of Maryland (2005).
22. Friedlander, M.P., Gupta, M.R., On minimizing distortion and relative entropy, IEEE Trans. Inform. Theory, 52, 238-245 (2006).
23. Dai, Y.-S., Xie, M., Long, Q., Ng, S.-H., Uncertainty Analysis in Software Reliability Modeling by Bayesian Analysis with Maximum-Entropy Principle, IEEE Trans. Soft. Eng., 33, 781-795 (2007).
24. Garthwaite, P.H., Kadane, J.B. and O'Hagan, A., Statistical methods for eliciting probability distributions, J. Amer. Stat. Assoc., 100, 680-701 (2005).
25. Gelfand, A.E., Mallick, B.K., Dey, D.K., Modeling expert opinion arising as a partial probabilistic specification, J. Amer. Stat. Assoc., 90, 598-604 (1995).
26. Goldstein, M., Subjective Bayesian analysis: principles and practice, Bayesian Analysis, 1, 403-420 (2006).
27. Goossens, L.H.J., Cooke, R.M., Expert judgement - calibration and combination. In: Proceedings of the Workshop on the use of Expert Judgement for decision-making, CEA Cadarache (2006).
28. Guida, M., Calabria, R., Pulcini, G., Bayes inference for a non-homogeneous Poisson process with power intensity law, IEEE Trans. Reliab., 5, 603-609 (1989).
29. Hill, S.D., Spall, J.C., Sensitivity of a Bayesian analysis to the prior distribution, IEEE Trans. Syst. Man Cyb., 24, 216-221 (1994).
30. Jaynes, E.T., Information Theory and Statistical Mechanics, Phys. Rev., 106, 620-630, and 108, 171-190 (1957).
31. Jaynes, E.T., On the Rationale of Maximum Entropy Methods, Proceedings of the IEEE, 70, 939-952 (1982).
32. Jaynes, E.T., The Relation of Bayesian and Maximum Entropy Methods. In: Maximum-Entropy and Bayesian Methods in Science and Engineering, 1, Erickson, G.J. and Smith, C.R. (Eds), Kluwer, Dordrecht, 25-29 (1988).
33. Jaynes, E.T., Probability Theory: The Logic of Science. Cambridge University Press (2003).
34. Kadane, J.B., Wolfson, J.A., Experiences in elicitation, The Statistician, 47, 3-19 (1998).
35. Kaminskiy, M.P., Krivtsov, V.V., A simple procedure for Bayesian estimation of the Weibull distribution, IEEE Trans. Reliab., 54, 612-616 (2005).
36. Kass, R.E., Wasserman, L., The selection of prior distributions by formal rules, J. Amer. Stat. Assoc., 91, 1343-1370 (1996).
37. Lannoy, A., Procaccia, H., L'utilisation du jugement d'expert en sûreté de fonctionnement. Tec & Doc (2003).
38. Lawless, J.F., Statistical Models and Methods for Lifetime Data, second edition. Wiley (2003).
39. Le Besnerais, G., Bercher, J.-F., Demoment, G., A new look at entropy for solving linear inverse problems, IEEE Trans. Inform. Theory, 45, 1565-1578 (1999).
40. Linnemann, J.T., Upper limits and priors. FNAL CL Workshop (with notes added in January 2003) (2000).
41. Luttrell, S.P., The use of transinformation in the design of data sampling schemes for inverse problems, Inverse Problems, 1, 199-218 (1985).
42. Miller, D.J., Yan, L., Approximate maximum entropy joint feature inference consistent with arbitrary lower order probability constraints: application to statistical classification, Neural Computation, 12, 2175-2208 (2000).
43. Natarajan, R., McCulloch, C.E., Gibbs sampling with diffuse proper priors: a valid approach to data-driven inference?, J. Comput. Graph. Stat., 7, 267-277 (1998).
44. Oakley, J.E., O'Hagan, A., Uncertainty in prior elicitations: a nonparametric approach, Biometrika, 94, 427-441 (2007).
45. O'Hagan, A., Eliciting expert beliefs in substantial practical applications, The Statistician, 47, 21-35 (1998).
46. O'Hagan, A., Research in elicitation. In: Bayesian Statistics and its Applications, Upadhyay, S.K., Singh, U. and Dey, D.K. (Eds), 375-382. Anamaya, New Delhi (2006).
47. O'Hagan, A., Buck, C.E., Daneshkhah, A., Eiser, J.R., Garthwaite, P.H., Jenkinson, D.J., Oakley, J.E., Rakow, T., Uncertain Judgements: Eliciting Expert Probabilities. John Wiley and Sons, Chichester (2006).
48. Press, S.J., Subjective and Objective Bayesian Statistics (second edition). Wiley, New York (2003).
49. Robert, C.P., The Bayesian Choice. A Decision-Theoretic Motivation (second edition). Springer (2001).
50. Shannon, C.E., A Mathematical Theory of Communication, Bell Syst. Techn. J., 27, 379-423, 623-656 (1948).
51. Shulman, N., Feder, M., The uniform distribution as a universal prior, IEEE Trans. Inform. Theory, 50, 1356-1362 (2004).
52. Singpurwalla, N.D., Reliability and Risk: a Bayesian Perspective. Wiley (2006).
53. Skilling, J., Maximum Entropy and Bayesian Methods. Kluwer (1989).
54. Smith, C.R., Erickson, G., Neudorfer, P.O. (Eds), Maximum Entropy and Bayesian Methods (Fundamental Theories of Physics). Kluwer Academic Publishers (1992).
55. Soofi, E.S., Information theory and Bayesian statistics. In: Berry, D.A., Chaloner, K.M. and Geweke, J.K. (Eds), Bayesian Analysis in Statistics and Econometrics in Honor of Arnold Zellner, Wiley, New York, 179-189 (1992).
56. Soofi, E.S., Capturing the Intangible Concept of Information, J. Amer. Stat. Assoc., 89, 1243-1254 (1994).
57. Soofi, E.S., Principal information theoretic approaches, J. Amer. Stat. Assoc., 95, 1349-1353 (2000).
58. Sun, D., A note on noninformative priors for Weibull distributions, J. Statist. Planning and Inference, 61, 319-338 (1997).
59. van Noortwijk, J.M., Dekker, R., Cooke, R.M., Mazzuchi, T.A., Expert judgment in maintenance optimization, IEEE Trans. Reliab., 41, 427-431 (1992).
60. Venegaz-Martinez, F., On information measures and prior distributions: a synthesis, Morfismos, 8, 27-50 (2004).
61. Zellner, A., Maximal data information prior distributions. In: Aykaç, A. and Brumat, C. (Eds), New Developments in the Applications of Bayesian Methods, Amsterdam (1977).
62. Zellner, A., Sinha, S.K., A Note on the Prior Distributions of Weibull Parameters, SCIMA, 19, 5-13 (1990).
63. Zellner, A., Bayesian Methods and Entropy in Economics and Econometrics. In: Grandy, W.T., Jr. and Schick, L.H. (Eds), Maximum Entropy and Bayesian Methods, Kluwer, Boston, 17-31 (1991).
64. Zellner, A., Models, prior information and Bayesian analysis, Journal of Econometrics, 75, 51-68 (1996).
65. Zellner, A., Bayesian Analysis in Econometrics and Statistics: The Zellner View and Papers. Edward Elgar, Cheltenham, U.K. (1997).
66. Zellner, A., Information Processing and Bayesian Analysis, Journal of Econometrics, 107, 41-50 (2002).
Singpurwalla, N.D., Reliability and Risk: a Bayesian perspective. Wiley (2006). 53. Skilling, J., Maximum entropy and Bayesian methods. Kluwer (1989). 54. Smith, C.R., Erickson, G., Neudorfer, P.O. (Eds), Maximum Entropy and Bayesian Methods (Fundamental Theories of Physics). Kluwer Academic Publisher (1992). 55. Soofi, E.S., Information theory and Bayesian statistics. In: D.A. Berry, K.M. Chaloner, and J.K. Geweke (eds.), Bayesian analysis in statistics and econometrics in honor of Arnold Zellner, Wiley: New York: 179-189 (1992). 56. Soofi, E.S., Capturing the Intangible Concept of Information, J. Amer. Stat. Assoc., 89, 1243-1254 (1994). 57. Soofi, E.S., Principal information theoretic approaches, J. Amer. Stat. Assoc., 95, 1349-1353 (2000). 58. Sun, D., A note on noninformative priors for Weibull distributions, J. Statist. Planning and Inference, 61, 319-338 (1997). 59. van Noortwijk, J.M., Dekker, R., Cooke, R.M. and Mazzuchi, T.A., Expert judgment in Maintenance Optimization, IEEE Trans. Reliab., 41, 427-431 (1992). 60. Venegaz-Martinez, F., On information measures and prior distributions: a synthesis, Morfismos, 8, 27-50 (2004). 61. Zellner, A., Maximal data information prior distributions, in A. Aykae and C. Brumat. (eds.): New developments in the applications of Bayesian methods, Amsterdam (1977). 62. Zellner, A., Sinha, S.K., A Note on the Prior Distributions of Weibull Parameters, SCIMA, 19, 5-13 (1990). 63. Zellner, A., Bayesian methods and Entropy in Economics and Econometrics. In: Maximum entropy and Bayesian Methods, eds. W.T. Grandy, Jr. and L.H. Schick, Boston: Kluwer, 17-31 (1991). 64. Zellner, A., Models, prior information and Bayesian analysis, Journal of Econometrics, 75, 51-68 (1996). 65. Zellner, A., Bayesian Analysis in Econometrics and Statistics: The Zellner View and Papers. Cheltenham, U.K.: Edward Elgar (1997). 66. Zellner, A., Information Processing and Bayesian Analysis, Journal of Econometrics, 107, 41-50 (2002).