ADMISSIBLE TREATMENT RULES FOR A RISK-AVERSE PLANNER WITH EXPERIMENTAL DATA ON AN INNOVATION

Charles F. Manski
Department of Economics and Institute for Policy Research, Northwestern University

and

Alexei Tetenov
Department of Economics, Northwestern University

September 2005

The research presented here was supported in part by National Science Foundation grant SES-0314312.

1. Introduction

Problems of choice between a status quo treatment and an innovation occur often in practice. In the medical arena, the status quo may be a standard treatment for a health condition and the innovation may be a new treatment proposed by researchers. Historical experience administering the status quo treatment to populations of patients may have made its properties well understood. In contrast, the properties of the innovation may be uncertain, the only available information deriving from a randomized clinical trial. Then choice between the status quo treatment and the innovation is a statistical decision problem.

This paper studies the admissibility of treatment rules when the decision maker is a planner (e.g., a physician) who must choose treatments for a population of persons who are observationally identical but who may vary in their response to treatment. We focus on the relatively simple case where treatments have binary outcomes, which we label success and failure. Then the feasible treatment rules are the functions that map the number of experimental successes into a treatment allocation specifying the fraction of the population who receive each treatment.

First suppose that the objective of the planner is to maximize the population rate of treatment success. Then a theorem of Karlin and Rubin (1956) shows that the admissible rules are ones that assign all members of the population to the status quo treatment if the number of experimental successes is below a specified threshold and all to the innovation if the number of successes is above the threshold. An interior fractional allocation of the population is admissible only when the number of experimental successes exactly equals the threshold. Karlin and Rubin called this class of treatment rules monotone, but we will refer to them as KR-monotone.

Now suppose that the objective of the planner is to maximize a concave-monotone function of the rate of treatment success. We show that this seemingly modest generalization of the welfare function is consequential. Now admissible treatment rules need not be KR-monotone; in fact, KR-monotone rules may be inadmissible. However, a weaker notion of monotonicity remains relevant. Define a fractional monotone rule to be one in which the fraction of the population assigned to the innovation weakly increases with the experimental success rate. We show that the class of fractional monotone rules is essentially complete. That is, given any rule which is not fractional monotone, there exists a fractional monotone rule that performs at least as well in all feasible states of nature. Investigating particular treatment rules, we find that Bayes rules and the minimax-regret rule depend on the curvature of the welfare function. These rules are KR-monotone if the curvature is sufficiently weak. However, they deliver interior fractional treatment allocations if the curvature is sufficiently strong.

Section 2 formalizes the planner's problem and reviews the case where welfare is the rate of treatment success. Section 3 takes welfare to be a concave-monotone function of the success rate and proves that the fractional monotone rules form an essentially complete class. We also show that KR-monotone rules can be inadmissible if the welfare function has sufficient curvature. Section 4 reports analytical and numerical findings for Bayes and minimax-regret treatment rules. Section 5 concludes.

Our consideration of planning problems where welfare is a nonlinear function of the rate of treatment success appears to be new to research studying treatment choice using experimental data. Previous research has examined planning problems in which experimental findings are used to inform treatment choice; see, for example, Canner (1970), Cheng, Su, and Berry (2003), and Manski (2004, 2005). However, these and (as far as we are aware) other studies have invariably assumed without comment that welfare is the rate of treatment success.

From a substantive perspective, consideration of concave-monotone functions of the success rate is interesting because, in expected utility theory, such functions imply distaste for mean-preserving spreads of gambles and thus express risk aversion. Public discourse on health matters, although not entirely coherent, suggests strong risk aversion. This is evident in the ancient admonition of the Hippocratic Oath that a physician should "First, do no harm." It is also evident in the drug approval process of the U.S. Food and Drug Administration, which requires that the manufacturers of pharmaceuticals demonstrate "substantial evidence of effect" for their products (see Gould, 2002).

From a decision theoretic perspective, concave-monotone welfare functions are intriguing because they sometimes yield the conclusion that planners should fractionally allocate observationally identical persons across different treatments. It has been common to presume that a planner should treat observationally identical persons identically. The analysis in this paper shows that this presumption sometimes is inappropriate when a risk-averse planner uses experimental data to inform treatment choice.

2. Background

The Planning Problem

The basic concepts are as in Manski (2004, 2005). The planner's problem is to choose treatments from a finite set T of mutually exclusive and exhaustive treatments. Each member j of the treatment population J has a response function y_j(·): T → Y mapping treatments t ∈ T into outcomes y_j(t) ∈ Y. The population is a probability space (J, Ω, P), and the probability distribution P[y(·)] of the random function y(·): T → Y describes treatment response across the population. The population is "large," in the sense that J is uncountable and P(j) = 0 for each j ∈ J.

In this paper, outcomes are binary, with y_j(t) = 1 denoting success and y_j(t) = 0 denoting failure should person j receive treatment t. There are two treatments, t = a denoting the status quo and t = b the innovation. The population success rates if everyone were to receive the same treatment are α ≡ P[y(a) = 1] and β ≡ P[y(b) = 1], respectively.

Consider a rule that assigns a fraction δ of the population to treatment b and the remaining 1 − δ to treatment a. The population success rate under this fractional rule is

(1)  α·(1 − δ) + β·δ = α + (β − α)·δ.

Welfare is f[α + (β − α)·δ], where f(·) is an increasing, concave transformation of the success rate. The optimal treatment rule is obvious if (α, β) are known. The planner should choose δ = 1 if β > α and δ = 0 if β < α; all values of δ yield the same welfare if β = α. The problem of interest is treatment choice when (α, β) are only partially known.
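These objects are simple to compute. A minimal Python sketch (ours, for illustration; the welfare function and parameter values are arbitrary choices) evaluates the welfare of a fractional allocation and shows that, with (α, β) known, welfare is monotone in δ, so the optimum lies at a corner:

    import numpy as np

    def welfare(alpha, beta, delta, f=np.log):
        """Welfare f[alpha + (beta - alpha)*delta] when a fraction delta
        of the population receives the innovation b."""
        return f(alpha + (beta - alpha) * delta)

    # With beta > alpha, welfare increases in delta, so delta = 1 is optimal;
    # with beta < alpha, delta = 0 is optimal.
    alpha, beta = 0.5, 0.6
    for delta in np.linspace(0.0, 1.0, 5):
        print(round(delta, 2), round(welfare(alpha, beta, delta), 4))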

The Empirical Evidence and Admissible Treatment Rules

Suppose that historical experience reveals α but not β. The available evidence on β comes from a randomized experiment in which N subjects are drawn at random and assigned to treatment b. Of these subjects, a number n experience outcome y(b) = 1 and the remaining N − n experience y(b) = 0. The outcomes of all subjects are observed. In this setting, the sample size N indexes the sampling process and the number n of experimental successes is a sufficient statistic for the sample data.

The feasible statistical treatment rules are the functions z(·): {0, 1, . . . , N} → [0, 1] that map the number of experimental successes into a treatment allocation. Thus, for each value of n, rule z allocates a fraction z(n) of the population to treatment b and the remaining 1 − z(n) to treatment a.

Following Wald (1950), we evaluate a statistical treatment rule by its expected performance across repeated samples. Let p(n; β) ≡ N!·[n!·(N − n)!]^{−1}·β^n·(1 − β)^{N−n} denote the binomial probability of n successes in N trials. Then the expected welfare yielded by rule z(·) across repeated samples is

(2)  W(z; β) ≡ Σ_{n=0}^{N} p(n; β)·f[α + (β − α)·z(n)].

Expected welfare is a function of β, which is unknown. Let B denote the set of values of β that the planner deems feasible. Rule z is admissible if there exists no other rule z′ such that W(z; β) ≤ W(z′; β) for all β ∈ B and W(z; β) < W(z′; β) for some β ∈ B.
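Expected welfare (2) is straightforward to evaluate numerically. The following sketch (the implementation choices, NumPy and a rule stored as a vector of length N + 1, are ours) computes W(z; β) for a given rule:

    import numpy as np
    from scipy.stats import binom

    def expected_welfare(z, alpha, beta, f):
        """W(z; beta) = sum over n of p(n; beta)*f[alpha + (beta - alpha)*z(n)],
        where z[n] is the fraction assigned to treatment b after n successes."""
        N = len(z) - 1
        n = np.arange(N + 1)
        p = binom.pmf(n, N, beta)          # binomial probability p(n; beta)
        return np.sum(p * f(alpha + (beta - alpha) * z))

    # Example: for N = 4, a rule that assigns everyone to b when n >= 3.
    z = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
    print(expected_welfare(z, alpha=0.5, beta=0.6, f=np.log))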

Admissible Rules for a Risk-Neutral Planner

Manski (2005, Chapter 3) considers the case in which welfare is the population rate of treatment success; thus, f(·) is the identity function. Then the expected welfare of rule z is

(3)  W(z; β) = α + (β − α)·E_β[z(n)],

where E_β[z(n)] = Σ_{n=0}^{N} p(n; β)·z(n). Rule z is admissible if there exists no z′ such that (β − α)·E_β[z(n) − z′(n)] ≤ 0 for all β ∈ B and (β − α)·E_β[z(n) − z′(n)] < 0 for some β ∈ B.

A KR-monotone treatment rule, defined in Karlin and Rubin (1956), has the form

(4)  z(n) = 0 for n < k,
     z(n) = λ for n = k,
     z(n) = 1 for n > k,

where 0 ≤ k ≤ N and 0 ≤ λ ≤ 1. Thus, a KR-monotone rule allocates all persons to treatment a if n is smaller than the specified threshold k, a fraction λ to treatment b if n = k, and all persons to treatment b if n is larger than k.

Manski (2005, Proposition 3.1) applies Karlin and Rubin (1956, Theorem 4) to show that the admissible rules and the KR-monotone rules coincide if B excludes the extreme values 0 and 1. A weaker version of this result holds if α or β can take the value 0 or 1. Then Karlin and Rubin (1956, Theorem 1) shows that the collection of KR-monotone treatment rules is essentially complete. That is, given any non-KR-monotone rule z′, there exists a KR-monotone rule z such that W(z; β) ≥ W(z′; β) for all β ∈ B.

It is important to understand that these are exact finite-sample results, not asymptotic findings that hold only approximately as the sample size increases. Let B exclude the extreme values {0, 1} and consider samples of size 0 and 1. If N = 0, then all treatment rules are trivially KR-monotone, so all rules are admissible. If N = 1, there are two possible values for the threshold k and, hence, two types of KR-monotone rule. Setting k = 0 yields rules in which z(0) can take any value and z(1) = 1. Setting k = 1 yields rules in which z(0) = 0 and z(1) can take any value. Thus, everywhere-fractional treatment rules are inadmissible even when the only empirical evidence about the innovation is the outcome of a single experiment.
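Since a KR-monotone rule is determined by the pair (k, λ), such rules are easy to enumerate computationally. A small helper (ours, written to be used with the expected-welfare sketch above) constructs rule (4) and displays the two KR-monotone families for N = 1:

    import numpy as np

    def kr_monotone(N, k, lam):
        """Rule (4): z(n) = 0 for n < k, lam for n = k, 1 for n > k."""
        n = np.arange(N + 1)
        return np.where(n < k, 0.0, np.where(n > k, 1.0, lam))

    # For N = 1: k = 0 leaves z(0) = lam free and sets z(1) = 1;
    # k = 1 sets z(0) = 0 and leaves z(1) = lam free.
    print(kr_monotone(1, 0, 0.3))   # [0.3 1. ]
    print(kr_monotone(1, 1, 0.7))   # [0.  0.7]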

3. Admissible Treatment Rules for Risk-Averse Planners

Determination of the admissible treatment rules when the function f(·) is nontrivially concave is a formidable problem. However, there are ways to make partial progress. This section presents two findings that shed some light on the matter. Section 3.1 shows that the class of fractional monotone rules, which encompasses the KR-monotone rules, is essentially complete for all concave-monotone f(·). Section 3.2 considers the special case in which there are only two states of nature and shows that one-step KR-monotone rules are not admissible if f(·) is sufficiently curved.

3.1. The Fractional Monotone Rules Form an Essentially Complete Class

The binomial density function possesses the strict form of the monotone-likelihood-ratio property: (n > n′, β > β′) ⇒ p(n; β)/p(n; β′) > p(n′; β)/p(n′; β′). Thus, larger values of n are unambiguously evidence for larger values of β. It is therefore reasonable to conjecture that good treatment rules are ones that make the fraction of the population allocated to treatment b increase with n. The results of Karlin and Rubin (1956) show that a strong form of this conjecture is correct if f(·) is linear in the population success rate. The Karlin-Rubin theorems do not apply to nonlinear f(·). Nevertheless, the conjecture remains correct in the weaker sense that the class of fractional monotone treatment rules is essentially complete. Formally, we say that a treatment rule z is fractional monotone if n > n′ ⇒ z(n) ≥ z(n′). Proposition 1 proves the result.

Proposition 1: Let f(·) be weakly increasing and concave. If treatment rule z′ is not fractional monotone, then there exists a fractional monotone rule z such that W(z; β) ≥ W(z′; β) for all β ∈ B.

Proof: Suppose that z* is not fractional monotone, so z*(n) < z*(n′) for some n > n′. Consider replacing z* with the following treatment rule z**:

z**(n) = z**(n′) ≡ [p(n; α)·z*(n) + p(n′; α)·z*(n′)]/[p(n; α) + p(n′; α)],

z**(m) ≡ z*(m) for all m ∉ {n, n′}.

We will show that W(z**; β) ≥ W(z*; β) for all β ∈ B. For any value of β,

(5)  W(z**; β) − W(z*; β) = p(n; β)·{f[α + (β − α)·z**(n)] − f[α + (β − α)·z*(n)]}
       + p(n′; β)·{f[α + (β − α)·z**(n′)] − f[α + (β − α)·z*(n′)]}.

The function f is concave and z**(n) is a convex combination of z*(n) and z*(n′). Hence,

f[α + (β − α)·z**(n)] ≥ {p(n; α)·f[α + (β − α)·z*(n)] + p(n′; α)·f[α + (β − α)·z*(n′)]}/[p(n; α) + p(n′; α)].

The same inequality holds for f[α + (β − α)·z**(n′)]. Substituting these inequalities into (5) and rearranging terms yields

W(z**; β) − W(z*; β) ≥ {[p(n; β)·p(n′; α) − p(n′; β)·p(n; α)]/[p(n; α) + p(n′; α)]}·{f[α + (β − α)·z*(n′)] − f[α + (β − α)·z*(n)]}.

The following inequalities use the monotone-likelihood-ratio property of the binomial density function and the fact that z*(n) < z*(n′):

β < α ⇒ p(n; β)·p(n′; α) − p(n′; β)·p(n; α) ≤ 0 and f[α + (β − α)·z*(n′)] − f[α + (β − α)·z*(n)] ≤ 0;

β > α ⇒ p(n; β)·p(n′; α) − p(n′; β)·p(n; α) ≥ 0 and f[α + (β − α)·z*(n′)] − f[α + (β − α)·z*(n)] ≥ 0.

In either case the right-hand side is a product of two terms of the same sign. Thus, W(z**; β) ≥ W(z*; β) for all β ∈ B. Given any rule z* that is not fractional monotone, we can iteratively apply the transformation described above to all pairs (n′, n) for which z*(n′) > z*(n), in the following order: (n′, n) = (1, 2), (1, 3), . . . , (1, N), (2, 3), (2, 4), . . . , (N − 1, N). The result is a fractional monotone treatment rule that is weakly better than z* for all values of β. Q.E.D.
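The transformation used in the proof is constructive, so its conclusion can be checked numerically. A sketch (ours; the starting rule, welfare function, and parameter values are arbitrary) applies the pairwise averaging step and verifies that expected welfare weakly rises at every β on a grid:

    import numpy as np
    from scipy.stats import binom

    def expected_welfare(z, alpha, beta, f):
        n = np.arange(len(z))
        p = binom.pmf(n, len(z) - 1, beta)
        return np.sum(p * f(alpha + (beta - alpha) * z))

    def monotonize(z, alpha):
        """Whenever z(n') > z(n) for n' < n, replace both entries by their
        p(.; alpha)-weighted average, as in the proof of Proposition 1."""
        z = np.array(z, dtype=float)
        N = len(z) - 1
        p = binom.pmf(np.arange(N + 1), N, alpha)
        for lo in range(N + 1):
            for hi in range(lo + 1, N + 1):
                if z[lo] > z[hi]:
                    z[lo] = z[hi] = (p[hi] * z[hi] + p[lo] * z[lo]) / (p[hi] + p[lo])
        return z

    alpha = 0.5
    z_star = np.array([0.0, 0.9, 0.2, 1.0])          # not fractional monotone
    z_mono = monotonize(z_star, alpha)               # [0.  0.55 0.55 1. ]
    assert np.all(np.diff(z_mono) >= 0)
    for beta in np.linspace(0.05, 0.95, 19):         # weak improvement everywhere
        assert (expected_welfare(z_mono, alpha, beta, np.log)
                >= expected_welfare(z_star, alpha, beta, np.log) - 1e-12)
    print(z_mono)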

Proposition 1 implies that a risk-neutral or risk-averse planner can restrict attention to fractional monotone treatment rules; there is no reason to contemplate other rules. The proposition does not imply that all fractional monotone rules are worthy of consideration. Indeed, we already know that a risk-neutral planner can restrict attention to rules that are KR-monotone. However, it appears that no stronger result can be proved without placing restrictions on the shape of f(·) beyond weak monotonicity and concavity.

3.2. Admissibility of One-Step KR-Monotone Rules When There Are Two States of Nature

To provide a sense of how the shape of f(·) affects the decision problem, we now consider a simple but instructive setting. Let f(·) be strictly increasing and differentiable, with g(·) denoting the derivative function. Let the space B contain only two values, one lower than α and the other higher; thus, B = {β_L, β_H}, where β_L < α < β_H. For a specified k with 0 ≤ k ≤ N and a specified pair (v, w) with 0 ≤ v ≤ w ≤ 1, define the one-step monotone treatment rule

(6)  z_vw(n) = v for n ≤ k,
     z_vw(n) = w for n > k.

A special case is the one-step KR-monotone rule z_01. Proposition 2 compares rule z_01 with a non-extreme fractional rule z_vw, that is, one with 0 < v ≤ w < 1. We find that rule z_01 strictly dominates z_vw if the derivative function g(·) decreases sufficiently slowly, and vice versa if g(·) decreases sufficiently rapidly.

Proposition 2: Let f(·) be strictly increasing and differentiable. Fix k. Let d_L ≡ Σ_{n>k} p(n; β_L) and d_H ≡ Σ_{n>k} p(n; β_H). Let 0 < v ≤ w < 1. Rule z_01 strictly dominates z_vw if

(7a)  g(α)/g(β_L) > [d_L/(1 − d_L)]·[(1 − w)/v],

(7b)  g(α)/g(β_H) < [d_H/(1 − d_H)]·[(1 − w)/v].

Rule z_vw strictly dominates z_01 if

(8a)  g[(1 − v)α + vβ_L]/g[(1 − w)α + wβ_L] < [d_L/(1 − d_L)]·[(1 − w)/v],

(8b)  g[(1 − v)α + vβ_H]/g[(1 − w)α + wβ_H] > [d_H/(1 − d_H)]·[(1 − w)/v].

Proof: By (2), the expected welfare of rules z_01 and z_vw in the two feasible states of nature is as follows:

(9a)  W(z_01; β_L) = (1 − d_L)·f(α) + d_L·f(β_L),

(9b)  W(z_01; β_H) = (1 − d_H)·f(α) + d_H·f(β_H),

(9c)  W(z_vw; β_L) = (1 − d_L)·f[(1 − v)α + vβ_L] + d_L·f[(1 − w)α + wβ_L],

(9d)  W(z_vw; β_H) = (1 − d_H)·f[(1 − v)α + vβ_H] + d_H·f[(1 − w)α + wβ_H].

Rule z_01 strictly dominates z_vw if W(z_01; β_L) > W(z_vw; β_L) and W(z_01; β_H) > W(z_vw; β_H). Rule z_vw strictly dominates z_01 if these inequalities are reversed. Ceteris paribus, the direction of the inequalities depends on the curvature of f(·). By assumption, f(·) is concave and strictly increasing. Hence, its derivative g(·) is weakly decreasing and everywhere positive. Use the mean-value theorem to rewrite (9c) and (9d) as

(9c′)  W(z_vw; β_L) = (1 − d_L)·f(α) + d_L·f(β_L) + (1 − d_L)·(β_L − α)·g[(1 − v_L)α + v_Lβ_L]·v + d_L·(β_L − α)·g[(1 − w_L)α + w_Lβ_L]·(w − 1),

(9d′)  W(z_vw; β_H) = (1 − d_H)·f(α) + d_H·f(β_H) + (1 − d_H)·(β_H − α)·g[(1 − v_H)α + v_Hβ_H]·v + d_H·(β_H − α)·g[(1 − w_H)α + w_Hβ_H]·(w − 1),

where v_L ∈ [0, v], w_L ∈ [w, 1], v_H ∈ [0, v], and w_H ∈ [w, 1]. Recall that β_L < α < β_H. Comparison of (9a) and (9b) with (9c′) and (9d′) shows that rule z_01 strictly dominates z_vw if and only if

(10a)  (1 − d_L)·g[(1 − v_L)α + v_Lβ_L]·v + d_L·g[(1 − w_L)α + w_Lβ_L]·(w − 1) > 0

and

(10b)  (1 − d_H)·g[(1 − v_H)α + v_Hβ_H]·v + d_H·g[(1 − w_H)α + w_Hβ_H]·(w − 1) < 0.

Rule z_vw strictly dominates z_01 if and only if these inequalities are reversed. Whether (10a)-(10b) hold, or the reverse inequalities, depends on how rapidly the derivative function g(·) decreases with its argument. Direct analysis of the inequalities is complicated by the fact that the intermediate values (v_L, w_L, v_H, w_H) used in the mean-value theorem are themselves determined by g(·). However, the fact that g(·) is a decreasing function implies that simpler sufficient conditions for dominance can be obtained by letting the intermediate values vary over their feasible ranges. Inequalities (7a)-(7b) are the sufficient condition for rule z_01 to strictly dominate z_vw, and inequalities (8a)-(8b) are the sufficient condition for z_vw to strictly dominate z_01. Q.E.D.
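The sufficient conditions of Proposition 2 can be checked directly. The following sketch (ours; the welfare function f(x) = −exp(1/x) and the values of α, β_L, β_H, N, k, v, w are arbitrary choices for illustration) verifies (8a)-(8b) for one configuration and confirms by direct computation of (9a)-(9d) that the fractional rule z_vw strictly dominates z_01 in both states:

    import numpy as np
    from scipy.stats import binom

    f = lambda x: -np.exp(1.0 / x)           # strongly curved welfare function
    g = lambda x: np.exp(1.0 / x) / x ** 2   # its derivative

    alpha, bL, bH = 0.25, 0.15, 0.50         # two states, bL < alpha < bH
    N, k, v, w = 5, 1, 0.2, 0.8              # one-step rules z_01 and z_vw

    dL = 1 - binom.cdf(k, N, bL)             # d_L = sum over n > k of p(n; bL)
    dH = 1 - binom.cdf(k, N, bH)
    bound = lambda d: d / (1 - d) * (1 - w) / v

    cond_8a = g((1-v)*alpha + v*bL) / g((1-w)*alpha + w*bL) < bound(dL)
    cond_8b = g((1-v)*alpha + v*bH) / g((1-w)*alpha + w*bH) > bound(dH)

    def W(lo, hi, beta):                     # expected welfare of a one-step rule
        d = 1 - binom.cdf(k, N, beta)
        return ((1 - d) * f(alpha + (beta - alpha) * lo)
                + d * f(alpha + (beta - alpha) * hi))

    print(cond_8a, cond_8b)                                      # True True
    print(W(v, w, bL) > W(0, 1, bL), W(v, w, bH) > W(0, 1, bH))  # True True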

4. Bayes and Minimax-Regret Rules

To learn more about how the shape of the welfare function affects treatment choice, we next study the behavior of Bayes rules and the minimax-regret rule. These treatment rules are generically admissible. Hence, the results reported in this section demonstrate properties that admissible rules can have, even if they do not reveal what properties such rules must have. Sections 4.1 and 4.2 present some broad analytical findings for Bayes rules and the minimax-regret rule, respectively. Section 4.3 gives numerical results for specific welfare functions and sample sizes.

4.1. Bayesian Planning

A Bayesian planner places a prior subjective probability distribution, say π, on the set B. Observing the number n of experimental successes in the randomized trial, he forms a posterior distribution, say π_n. Treating β as a random variable with distribution π_n, the planner then solves the problem

(11)  max_{δ ∈ [0, 1]} ∫ f[α + (β − α)δ] dπ_n.

Proposition 3 shows that, given a regularity condition, the Bayes rule assigns the entire population to treatment a (δ = 0) if the subjective mean of β does not exceed α and assigns a positive fraction to treatment b (δ > 0) otherwise. The proposition also gives a sufficient condition for the Bayes rule to be fractional (0 < δ < 1).

Proposition 3: Consider problem (11). Let π_n be non-degenerate. Let β̄(n) ≡ ∫ β dπ_n denote the subjective mean of β.

(a) Let f(·) be strictly concave. Then the Bayes rule is unique. The solution is δ = 0 if β̄(n) ≤ α.

(b) Let f(·) be continuously differentiable. Let f(·) and π_n be sufficiently regular that

∂{∫ f[α + (β − α)δ] dπ_n}/∂δ = ∫ {∂f[α + (β − α)δ]/∂δ} dπ_n

in a neighborhood of δ = 0. Then all solutions satisfy δ > 0 if β̄(n) > α. All solutions satisfy δ ∈ (0, 1) if β̄(n) > α and ∫ f(β) dπ_n < f(α).

Proof: (a) Strict concavity of f(·) implies that ∫ f[α + (β − α)δ] dπ_n is strictly concave in δ. Hence, problem (11) has a unique solution. If δ = 0, then ∫ f[α + (β − α)δ] dπ_n = f(α). For each δ > 0, f[α + (β − α)δ] is strictly concave as a function of β. Hence, ∫ f[α + (β − α)δ] dπ_n < f[α + (β̄(n) − α)δ]. Hence, β̄(n) ≤ α ⇒ ∫ f[α + (β − α)δ] dπ_n < f(α) for every δ > 0, so δ = 0 is the solution.

(b) {∂f[α + (β − α)δ]/∂δ}_{δ=0} = (β − α)·[df(x)/dx]_{x=α}. Hence, {∂[∫ f[α + (β − α)δ] dπ_n]/∂δ}_{δ=0} = (β̄(n) − α)·[df(x)/dx]_{x=α}. By assumption, β̄(n) > α and [df(x)/dx]_{x=α} > 0. Hence, ∫ f[α + (β − α)δ] dπ_n strictly increases with δ in a neighborhood of δ = 0, implying that solutions to (11) are positive. If ∫ f(β) dπ_n < f(α), then δ = 1 does not solve (11). Hence, solutions are fractional. Q.E.D.

Observe that the concavity and differentiability restrictions placed on f(·) are used in different parts of the proposition. The proof of part (a) uses only the assumption that f(·) is strictly concave. The proof of part (b) uses only the assumption that f(·) is continuously differentiable and the stated regularity condition.

A simple special case sheds further light on the Bayes rule. Let f(·) be the log function and let B contain the two elements {β_L, β_H}, with β_L < α < β_H. Define δ̄(n) ≡ α·(β̄(n) − α)/|(β_L − α)·(β_H − α)|. Then direct computation shows that the fraction of the population allocated to treatment b is max{0, min[δ̄(n), 1]}. Thus, the Bayes rule is fractional if and only if 0 < δ̄(n) < 1.
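This closed form is easy to verify numerically. A sketch (ours; the prior, data, and parameter values are arbitrary) computes the Bayes allocation for a two-point prior by direct maximization of (11) and compares it with the formula above:

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.stats import binom

    alpha, bL, bH = 0.5, 0.3, 0.8       # B = {bL, bH} with bL < alpha < bH
    N, n, prior_H = 5, 3, 0.5           # sample size, successes, prior mass on bH

    # Posterior over {bL, bH} after observing n successes in N trials.
    likeL, likeH = binom.pmf(n, N, bL), binom.pmf(n, N, bH)
    qH = prior_H * likeH / (prior_H * likeH + (1 - prior_H) * likeL)
    mean_n = qH * bH + (1 - qH) * bL    # subjective mean of beta

    # Numerical Bayes rule: maximize posterior expected log welfare over delta.
    obj = lambda d: -(qH * np.log(alpha + (bH - alpha) * d)
                      + (1 - qH) * np.log(alpha + (bL - alpha) * d))
    numeric = minimize_scalar(obj, bounds=(0.0, 1.0), method="bounded").x

    # Closed form from the text for f = log with a two-point posterior.
    closed = alpha * (mean_n - alpha) / abs((bL - alpha) * (bH - alpha))
    print(numeric, float(np.clip(closed, 0.0, 1.0)))   # the two agree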

4.2. Minimax-Regret Planning

The minimax-regret criterion for treatment choice uses no information beyond the planner's knowledge that β lies in the set B. Let Z denote the space of all functions that map {0, 1, . . . , N} → [0, 1]. The planner solves the problem

(12)  inf_{z ∈ Z} sup_{β ∈ B} {max[f(α), f(β)] − W(z; β)}.

For each β ∈ B, max[f(α), f(β)] is the maximum welfare achievable given knowledge of β, W(z; β) is the expected welfare achieved by rule z(·), and the difference between these quantities is regret. (The maximin rule also uses no information beyond knowledge of B. We do not consider it because it is ultra-conservative, entirely ignoring the sample data. If B contains any value smaller than α, the maximin rule assigns the entire population to the status quo.)

The minimax-regret rule, denoted z_mmr, is known to have a simple form when f(·) is linear and B is the open interval (0, 1). In this case, Canner (1970) and Stoye (2005) show that the solution to problem (12) is the KR-monotone rule setting k = αN and λ = ½. No explicit characterization of the minimax-regret rule is available when f(·) is nonlinear. However, we can show that if f(·) has sufficient curvature and B contains positive values arbitrarily close to 0, the minimax-regret rule never assigns the entire population to treatment b. Proposition 4 gives the result.

Proposition 4: Let α > 0. Let B contain a sequence of positive values that converges to zero. Let f(·) be such that

(13)  lim_{β→0⁺} β^N·f(β) = −∞.

Then z_mmr(n) < 1 for all values of n.

Proof: Let z_0 denote the treatment rule that always assigns everyone to treatment a; thus, z_0(n) = 0 for all values of n. This rule has maximum regret sup_{β > α} [f(β) − f(α)]. This quantity is finite, being bounded from above by f(1) − f(α). Hence, the minimax-regret rule has finite maximum regret, and any treatment rule with infinite maximum regret cannot be minimax regret.

By Proposition 1, the minimax-regret rule is fractional monotone. Hence, it suffices to show that z_mmr(N) < 1. Let z be any treatment rule such that z(N) = 1. Recalling the definition (2) of W(z; β), the maximum regret of this rule is no smaller than sup_{β < α} β^N·[f(α) − f(β)]. This quantity is infinite by (13). Hence, the minimax-regret rule must have z_mmr(N) < 1. Q.E.D.

To illustrate, consider the welfare function f(x) = −x^{−K}, where K > 1. Then (13) holds for N < K. Consider the function f(x) = −exp(1/x). Then (13) holds for all values of N.
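The divergence in (13) is visible numerically. A short sketch (ours; α, N, and the grid of β values are arbitrary) evaluates the proof's lower bound β^N·[f(α) − f(β)] on the regret of any rule with z(N) = 1 under f(x) = −exp(1/x):

    import numpy as np

    f = lambda x: -np.exp(1.0 / x)
    alpha, N = 0.5, 5
    # The bound grows without limit as beta -> 0+, so a rule with z(N) = 1
    # cannot be minimax regret when B contains values arbitrarily close to zero.
    for beta in [0.2, 0.1, 0.05, 0.02]:
        print(beta, beta ** N * (f(alpha) - f(beta)))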

4.3. Numerical Evaluation of Bayes and Minimax-Regret Rules

Table 1 compares Bayes and minimax-regret rules for four welfare functions: f(x) = x, f(x) = log(x), f(x) = −x^{−2}, and f(x) = −exp(1/x). We compute Bayes rules for the Uniform(0, 1) prior and minimax-regret rules for B = (0, 1). The success probability of the status quo treatment takes three values: α ∈ {0.25, 0.5, 0.75}. The sample size ranges from N = 1 to N = 10.

Computation of a Bayes rule is straightforward. Given a sample size N and success frequency n, we need only compute the posterior distribution π_n and maximize the concave function ∫ f[α + (β − α)δ] dπ_n over δ ∈ [0, 1]. Computation of minimax-regret rules is difficult when f(·) is nonlinear. Proposition 1 enables us to restrict attention to the space of fractional monotone rules, but this space is still very large. We use a numerical optimization algorithm to search for treatment rules that minimize maximum regret. The algorithm does not guarantee a global minimum. However, in most cases we are able to use an observation of Vapnik (1998) to confirm whether a trial solution is really minimax-regret, as described below.
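One way to implement such a search is sketched below (our implementation; the monotone parametrization, the grid proxy for B, and the Nelder-Mead optimizer are our choices, and the paper's actual algorithm is described only at the level of the preceding paragraph):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import binom

    alpha, N = 0.5, 3
    f = np.log
    beta_grid = np.linspace(0.01, 0.99, 197)       # finite proxy for B = (0, 1)

    def rule(theta):
        """Map an unconstrained vector theta to a fractional monotone rule
        via cumulative sums of squares, capped at 1."""
        return np.minimum(np.cumsum(theta ** 2), 1.0)

    def max_regret(theta):
        z = rule(theta)
        n = np.arange(N + 1)
        regret = [max(f(alpha), f(b))
                  - np.sum(binom.pmf(n, N, b) * f(alpha + (b - alpha) * z))
                  for b in beta_grid]
        return max(regret)

    # Multi-start Nelder-Mead search; a global minimum is not guaranteed.
    starts = [np.full(N + 1, 0.3), np.full(N + 1, 0.6)]
    best = min((minimize(max_regret, x0, method="Nelder-Mead") for x0 in starts),
               key=lambda res: res.fun)
    print(rule(best.x), best.fun)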

Confirmation of Trial Solutions

Let R_M(z) ≡ sup_{β ∈ B} {max[f(α), f(β)] − W(z; β)} denote the maximum regret of treatment rule z. Let R_π(z) ≡ ∫ {max[f(α), f(β)] − W(z; β)} dπ denote the expected regret of rule z under probability distribution π. Let z_π denote the Bayes rule using π. Then

(14)  R_M(z_mmr) ≥ R_π(z_mmr) ≥ R_π(z_π).

The first inequality holds because ∫ h(β) dπ ≤ sup_{β ∈ B} h(β) for any function h(·) on B. The second one holds because the Bayes rule minimizes expected regret. Vapnik observed that, by (14), calculation of max_{π ∈ P} R_π(z_π) over any collection P of distributions on B gives a lower bound on R_M(z_mmr). Suppose that the maximum regret of a given treatment rule z equals this lower bound. Then z is the minimax-regret rule.

Applying this idea, we let z be our trial minimax-regret rule obtained by numerical optimization and let P be the collection of distributions on B that place all mass on two points (β_L, β_H), where β_L < α < β_H. We considered the collection of two-point distributions for two reasons. First, this is the simplest non-trivial collection of distributions on B. Second, Canner (1970) and Stoye (2005) have shown analytically that, when f(·) is linear, there always exists a two-point distribution such that minimax regret and minimum expected regret coincide. We find that, for all N = 1, . . . , 10, consideration of this collection of distributions is sufficient to confirm our trial solutions when f(x) = log(x) and f(x) = −x^{−2}. However, there are significant differences between the maximum regret of our trial solutions and the lower bounds obtained using two-point distributions when f(x) = −exp(1/x) and N > 6. Hence, Table 1 does not report minimax-regret rules for these cases.
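A sketch of this confirmation step (ours; the crude grid over two-point priors illustrates the idea and is not the paper's exact procedure):

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.stats import binom

    alpha, N = 0.5, 3
    f = np.log

    def bayes_regret(bL, bH, q):
        """R_pi(z_pi): minimum expected regret under the two-point prior
        placing mass q on bH and 1 - q on bL, where bL < alpha < bH."""
        n = np.arange(N + 1)
        pL, pH = binom.pmf(n, N, bL), binom.pmf(n, N, bH)
        welfare = 0.0
        for i in n:                      # the Bayes rule solves (11) at each n
            qH = q * pH[i] / (q * pH[i] + (1 - q) * pL[i])
            post = lambda d: -(qH * f(alpha + (bH - alpha) * d)
                               + (1 - qH) * f(alpha + (bL - alpha) * d))
            opt = minimize_scalar(post, bounds=(0.0, 1.0), method="bounded")
            welfare += (q * pH[i] + (1 - q) * pL[i]) * (-opt.fun)
        oracle = q * max(f(alpha), f(bH)) + (1 - q) * max(f(alpha), f(bL))
        return oracle - welfare

    # Maximizing over a grid of two-point priors bounds R_M(z_mmr) from below, by (14).
    grid = np.linspace(0.05, 0.95, 19)
    lower = max(bayes_regret(bL, bH, q)
                for bL in grid if bL < alpha
                for bH in grid if bH > alpha
                for q in np.linspace(0.1, 0.9, 9))
    print("lower bound on minimax regret:", lower)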

Findings

The top row of each part of the table gives the Bayes and minimax-regret rules for f(x) = x. These rules are necessarily KR-monotone. The first strictly concave welfare function is f(x) = log(x). We find that the Bayes rule and the minimax-regret rule are KR-monotone for every value of α and N shown in Table 1. Indeed, both rules remain KR-monotone when we consider values of α, N, and π not shown in the table. This suggests, but of course does not prove, that the log function has insufficient curvature to produce admissible treatment rules that are not KR-monotone.

The welfare function f(x) = −x^{−2} has enough curvature to produce non-KR-monotone rules, but only in some cases. Non-KR-monotone fractional rules are commonplace when α = 0.25, occur in a few instances when α = 0.5, and do not occur at all when α = 0.75. The welfare function f(x) = −exp(1/x) has more curvature, and non-KR-monotone rules are the norm.

The table shows no case in which the Bayes or minimax-regret rule assigns the entire population to treatment b. Proposition 4 proved that the minimax-regret rule must have this property for all values of (α, N, n).

5. Conclusion

In the Wald (1950) abstract development of statistical decision theory, admissibility is the most basic feature of a reasonable statistical treatment rule. Yet characterization of the admissible rules can be quite difficult in practice. The results of Karlin and Rubin (1956) for statistical decision problems with the monotone likelihood ratio property are among the most striking in the literature. Their work shows that, when outcomes are binary, a risk-neutral planner who uses experimental data to choose between a status quo treatment and an innovation should restrict attention to KR-monotone treatment rules.

This paper has shown that the assumption of risk neutrality plays an important role in the Karlin-Rubin analysis. When a planner is sufficiently risk averse, admissible treatment rules need not be KR-monotone and KR-monotone rules need not be admissible. We have not produced a complete characterization of admissibility under risk aversion, but Propositions 1 through 4 and our numerical findings make some progress. We hope that our work will stimulate further research on the subject.

References

Canner, P. (1970), "Selecting One of Two Treatments When the Responses Are Dichotomous," Journal of the American Statistical Association, 65, 293-306.

Cheng, Y., F. Su, and D. Berry (2003), "Choosing Sample Size for a Clinical Trial Using Decision Analysis," Biometrika, 90, 923-936.

Gould, A. (2002), "Substantial Evidence of Effect," Journal of Biopharmaceutical Statistics, 12, 53-77.

Karlin, S. and H. Rubin (1956), "The Theory of Decision Procedures for Distributions with Monotone Likelihood Ratio," Annals of Mathematical Statistics, 27, 272-299.

Manski, C. (2004), "Statistical Treatment Rules for Heterogeneous Populations," Econometrica, 72, 1221-1246.

Manski, C. (2005), Social Choice with Partial Knowledge of Treatment Response, Princeton: Princeton University Press.

Stoye, J. (2005), "Minimax-Regret Treatment Choice with Finite Samples," Department of Economics, New York University.

Vapnik, V. (1998), Statistical Learning Theory, New York: Wiley.

Wald, A. (1950), Statistical Decision Functions, New York: Wiley.

Table 1: Bayes Rules for the Uniform(0, 1) Prior and Minimax-Regret Rules for B = (0, 1)

[Table 1 reports, for each sample size N = 1, . . . , 10 and each status quo success probability α ∈ {0.25, 0.5, 0.75}, the allocation z(n) for every success count n = 0, . . . , N. Within each (N, α) block, rows give the Bayes rule for the Uniform(0, 1) prior and the minimax-regret rule for each welfare function f(x) = x (risk-neutral), log(x), −x^{−2}, and −exp(1/x); each entry is the fraction of the population assigned to treatment b. An entry of [0, 1] indicates that any allocation in the unit interval is optimal. Minimax-regret entries for f(x) = −exp(1/x) are reported only for N ≤ 6, the cases in which the trial solutions could be confirmed.]