consistent tests of conditional moment restrictions - Pascal Lavergne

test statistics that have case dependent limiting distributions, but do not depend upon the choice of a smoothing .... functional forms of the demand and supply curves are invalid. As seen .... The resulting test is a one-sided normal test and is consistent in the direction of any nonparametric ..... ni = ψ (Zi,θn)ζi and the ζi's satisfy.
296KB taille 1 téléchargements 197 vues
CONSISTENT TESTS OF CONDITIONAL MOMENT RESTRICTIONS Miguel A. Delgado Universidad Carlos III de Madrid, Spain

Manuel A. Dom´ınguez ITAM, Mexico

Pascal Lavergne INRA-ESR Toulouse, France

Abstract We address the issue of building consistent specification tests in econometric models defined through multiple conditional moment restrictions. In this aim, we extend the two methodologies developed for testing the parametric specification of a regression function to testing general conditional moment restrictions. Two classes of tests are proposed, which can be both interpreted as M-tests based on integrated conditional moment restrictions. The first class depends upon nonparametric functions that are estimated by kernel smoothers. The second type of test is built as a functional of a marked empirical process. For both tests, a simulation procedure for obtaining critical values is shown to be asymptotically valid. Comparison of finite sample performances of the tests are investigated by means of several Monte-Carlo experiments. JEL Classification: Primary C52; Secondary C14. Key words: Hypothesis testing, Nonlinear models, Nonparametric methods, Empirical processes.

Corresponding author: Pascal Lavergne, INRA-ESR, BP 27, 31326 CASTANET-TOLOSAN Cedex, FRANCE. Tel: (0)5 61 26 50 96. Email: [email protected]

1

CONSISTENT TESTS OF CONDITIONAL MOMENT RESTRICTIONS 1

Introduction

Econometric models are frequently defined through conditional moment restrictions. This is the case for instance for models parameterizing different conditional moments (e.g. conditional mean and conditional variance) without specific distributional assumptions, transformation models, models defined by means of instrumental variables, and nonlinear-in-variables simultaneous equation models. Efficient estimation under conditional moment restrictions is considered by Chamberlain (1987), Newey (1990) and Robinson (1991). The so-called M-tests, as proposed by Newey (1985a, 1985b), Tauchen (1985) and Wooldridge (1990) among others, take as the null hypothesis a finite number of arbitrary unconditional moment restrictions implied by the conditional moment restrictions. These tests are therefore unable to detect all departures from the null hypothesis. Tests that are consistent against nonparametric alternatives can be obtained by considering some unconditional moment restrictions that uniquely characterize the conditional moment restrictions. Two competing methodologies have been developed in the recent literature for specification testing of univariate regression models. The first approach compares the parametric estimated function with a semiparametric or nonparametric function is estimated using smoothers. This leads to asymptotically pivotal tests statistics, see Eubank and Spiegelman (1990), Kozek (1991), H¨ ardle and Mammen (1993), Horowitz and H¨ardle (1993), Hong and White (1995), Fan and Li (1996), Zheng (1996) among others. Hart’s monograph (1997) surveys part of the statistical literature on the topic. The second approach compares integral transforms of the competing regression curves rather than the curves themselves. Indeed, a function can be uniquely characterized by an integral transform, see Apostol (1957, Chap 11). For instance, a probability density function is uniquely characterized by two integral transforms: the distribution function and the characteristic function. The integral regression function generalizes the distribution function concept to the regression case, see Prakasa Rao (1983 pp. 256-258), and is used for testing purposes by Buckley (1991), HongZhy and Bin (1991), Delgado (1993), Stute (1997), Koul and Stute (1999) and Whang (2000), to mention just a few. Bierens and coauthors use the generalization of the characteristic function to the regression case and propose a specification test, see Bierens (1982, 1990), de Jong and Bierens

1

(1994) and Bierens and Ploberger (1997). The methodology based on integral transforms yields test statistics that have case dependent limiting distributions, but do not depend upon the choice of a smoothing parameter. Surprisingly, work on consistent tests in econometric models mainly focuses on regression models, with few exceptions.1 Stichcombe and White (1998) and Whang (2001) propose tests for an univariate conditional moment restriction that extends Bierens’ work. Using the same methodology, Chen and Fan (1999) provide consistent procedures for testing parametric restrictions in semiparametric models and nonparametric restrictions in nonparametric models, e.g., testing for omitted variables. However, they do not allow the null hypothesis to depend upon unknown parameters. Using a smoothing approach, Li (1999) considers testing portfolio conditional mean-variance efficiency, an hypothesis also studied by Chen and Fan (1999). The aim of this paper is to propose tests for multiple conditional moment restrictions with unknown parameters and thus to provide procedures that can prove useful in a variety of econometric models. The characteristic feature of our study with respect to previous work is to simultaneously follow and generalize the two methodologies developed for specification testing of regression models. We restrict ourselves to an iid context. Extension to a time-series context should follow along the lines of de Jong (1996), who considers Bierens’ (1990) test under data dependence, and Li (1999), who generalizes Fan and Li’s (1996) results. Here, we will concentrate on the particular features arising when the conditional moment restrictions are multidimensional and possibly nonlinear in the endogenous variables. This allows us to point out the inherent problems of the generalization and to raise some open questions. From a practical viewpoint, we look throughtout our paper at some examples of applications and we show how to implement each type of tests in practice. Finally, we compare the behavior of the two types of tests by means of several Monte-Carlo simulations, as there exists few such studies in the econometric literature. The paper is organized as follows. In the next section, we detail our general testing framework and we discuss examples of applications. In Section 3, we study the asymptotics of some tests statitics based upon either nonparametric kernel estimation or integral-transform regression estimation. In Section 4, we propose to approximate critical values of each type of test by simulation procedures. Section 5 reports the results of the Monte-Carlo study. Section 6 gives some directions for further research. Technical proofs are confined to the last section. 1

See Bierens and Ginther (1999) and Zheng (1998) for quantile regression models.

2

2

Testing Framework

Let Zn = {Zi , i = 1, ..., n} be a sample of independent observations, identically distributed as the random variable Z with values in Rs , and let X be a subvector of Z with dimension q. As done by Newey (1985), Chamberlain (1987), Wooldridge (1990) and White (1994), we consider a particular parametric model indexed by θ ∈ Θ, a compact subset of Rp , and defined through conditional moment restrictions of the form H0 : ∃ θ0 ∈ Θ, E [ψ (Z, θ0 ) | X] = 0

a.s.,

(1)

where ψ(·, ·) : Rs ×Rp → Rm is a vector of known functions and 0 is the null vector of Rm . We call ψ(·, ·) a generalized residual vector, as Wooldridge (1990) does by analogy with regression models, because the null hypothesis specifies that its conditional expectation given X is zero. Our general framework allows to deal with a wide range of parametric models considered in the econometric literature. Example 1

Our framework includes specification testing of models that jointly parameterize the

conditional mean and the conditional variance of a dependent variable. Such models are defined as Y = µ(X, θ0 ) + U,

E [U |X] = 0,

a.s.

  E U 2 |X = ω 2 (X, θ0 ) > 0

a.s.,

where µ(·, ·) and ω 2 (·, ·) are known functions up to the value θ0 , see e.g. Wooldridge (1990). The parametric model is completely defined through restrictions (1), where Z = (Y 0 , X 0 )0 and ! Y − µ(X, θ) ψ(Z, θ) = . [Y − µ(X, θ)]2 − ω 2 (X, θ) Testing H0 allows to check the full specification of the model. We may also be interested in testing only a subset of these restrictions. If we consider only restrictions relative to conditional mean, we deal with specification testing of a standard regression model, as studied by many authors. If we consider only the second set of restrictions, e.g. if we are sure about the functional form of the conditional mean, we entertain a test about the functional form of the conditional variance, as studied by Hong (1993). Finally, a particular application of our framework allows to test the null hypothesis of homoskedasticity by considering the specific restriction h i E (Y − µ(X, θ0 ))2 − E (Y − µ(X, θ0 ))2 |X = 0

3

a.s.

Example 2 Consider the model τ (Y, λ) = µ(X, β) + U,

E [U |X] = 0

a.s.,

where µ(·, ·) and τ (·, ·) are known functions. When λ is unknown and τ (·, ·) is a nonlinear transform, this model is not a regression model. Choices for τ (·, ·) include the popular Box-Cox transformation, and the family of transformations defined by τ (Y, λ) = κ(λY )/λ,

κ(0) = 0, κ(0) ˙ = 1,

where κ˙ (·) denotes the first derivative of κ (·), see MacKinnon and Magee (1990). Simple instances of such transformations are κ(y) = y 2 + y and κ(y) = arcsinh(y), which have several advantages over the Box-Cox transform, see Burbidge, Magee and Robb (1988). When the residual’s distribution is unknown, the parametric model is simply defined through E [τ (Y, λ0 ) − µ(X, β 0 )|X] = 0

a.s.,

that is, through restrictions of the type (1). This allows to use any function of X as an instrumental variable (IV) for estimation purposes. Rejecting the validity of the conditional moment restrictions indicates that the functional form of the model is inadequate and would then invalidate the IV estimation method. Example 3 Our framework further includes models defined through conditional moment restrictions given a set of instrumental variables, as considered by Newey (1990). An important example of such a model is one where ψ(·, ·) is a vector (or subvector) of residuals from a (possibly nonlinear) simultaneous equations system. As a benchmark, consider the simple equilibrium model Q =

a0 P + b0 I + U,

a0 < 0 (Demand)

Q = α0 P + β 0 W + V, α0 > 0

(Supply),

where Q and P respectively denote quantity and price, I and W are equation-specific exogenous variables, and the error terms U and V are uncorrelated. Here Z = (Q, P, I, W )0 is the vector of all variables entering the model and X = (I, W )0 is the vector of exogenous variables. The model assumes that the error terms are unpredictable given the exogenous variables, i.e. E [Q − a0 P − b0 I | X] = 0

a.s.

and

E [Q − α0 P − β 0 W | X] = 0

a.s.

Besides its economic meaning, this restriction ensures identification of the coefficients and is the basis for IV estimation methods either in a parametric context, see e.g. Davidson and MacKinnon 4

(1993), or in a nonparametric setup, see Darolles, Florens and Renault (1999). Our framework allows to test jointly the restrictions related to the demand and supply equations. Under the assumption that X is exogenous, rejecting this conditional moment restriction means that the postulated functional forms of the demand and supply curves are invalid. As seen from the above examples, our framework goes far beyond testing the parametric specification of an univariate regression function and applies to testing econometric models defined by several conditional moment restrictions, which can be tested simultaneously or separately. Other examples can be considered, such as testing that the conditional score function is identically zero conditional in a fully parametric model, or that the information matrix equality holds conditionally upon the independent variables. We now introduce some conditions upon the considered econometric model. To keep a great level of applicability, we formulate general assumptions that can accommodate various models and estimation methods. For the sake of simplicity, we focus on the case of cross-section data. 2.1 Zn is an i.i.d. random sample from a random variable Z on Rs . The subvector X admits a probability distribution function F (·) and a density function f (·) . 2.2 Θ is compact in Rp . There is an estimator θn that admits the first order asymptotic expansion  θn = θ∗ + Op n−1/2 , for some interior point θ∗ of .Θ. Under H0 , θ∗ = θ0 . (k) 2.3 (i) E kψ(Z, θ∗ )k2 < ∞. For each component ψ (k) (·, ·) of ψ(·, ·), k = 1, ..., m, ψ˙ (·, ·) = 0 2 (k) ¨ (k) ∂ψ (k) (·, ·)/∂θ and in an open neighborhood

ψ (·, ·) =

∂ ψ (·, ·)/∂θ∂θ exist

almost surely

˙ (k)

¨ (k)

∗ ∗ ∗ ∗ N (θ ) of θ , E ψ (Z, θ ) < ∞ and supθ∈N (θ∗ ) ψ (·, θ ) < S (·) , with E [S (Z)] < ∞,

¨ (k) (·, ·) are continuous in where k·k denotes the Euclidean norm . (ii) For each k = 1, ..., m, ψ θ for θ ∈ N (θ∗ ) and uniformly in Z. Assumption 2.1 restricts our analysis to an iid context where the exogenous variables are continuous. Allowing some discrete components in X is not difficult, but would involve cumbersome √ notations.2 Assumption 2.2 says that we have at hand an estimator θn that is n-consistent for a pseudo-true value θ∗ of the parameter, which equals θ0 under H0 . This allows for several estimation methods, such as nonlinear least-squares, instrumental variables, generalized method of moments, or pseudo-maximum likelihood, see e.g. White (1994) and Gourieroux and Monfort (1995) for a general theory. For instance, given a suitable (p × m) matrix of instruments M , the pseudo-true value can be defined as the solution to the equation system E [M (X) ψ(Z, θ∗ )] = 0, 2

See Zheng (1996) or Su and Wei (1991) for details.

5

(2)

and θn is the solution to the sample analog of (2), that is, the solution to n

1X M (X) ψ(Z, θ) = 0. n

(3)

i=1

Assumption 2.3 imposes some regularity on the functions entering the restrictions, see e.g. Newey (1985) or Robinson (1991) for similar conditions.

3 3.1

The two methodologies: rationale and asymptotics Smooth tests

To test for the null hypothesis, we formulate the conditional moment restrictions as a completely equivalent unconditional moment restriction. Specifically, H0 is equivalent to   H0 : E ψ 0 (Z, θ0 ) E (ψ (Z, θ0 ) | X) wS (X) = 0

a.s.,

(4)

for any weight function wS (·) is strictly positive almost surely onto the support of f (X). This states the orthogonality between the generalized residuals ψ (·, θ0 ) and their conditional expectation E (ψ (Z, θ0 ) | X = ·) . Alternatively, one could weight each of the generalized residuals differently. A sample analog of the expectation in (4) is given by the infeasible estimator n

1X 0 ψ (Zi , θ0 )E (ψ (Zi , θ0 ) | Xi ) wS (Xi ) , n i=1

A feasible estimator thus requires on the one hand a consistent estimator of θ0 , as given by θn , and on the other hand a nonparametric estimator of E(ψ (Z, θ0 ) | X), such as a kernel estimator. A possible and natural choice for wS (·) is the density f (·) itself. This is technically convenient in problems involving sums of kernel estimators, see e.g. Powell, Stock and Stoker (1989), Fan and Li (1996), Zheng (1996) and Lavergne and Vuong (2000) for an analogous device. Estimating E (ψ (Z, θ0 ) | X = ·) f (·) by kernel estimators leads to consider the statistic Tn =

n n X X 1 ψ 0 (Zi , θn ) ψ (Zj , θn ) Kij , n (n − 1) hq

(5)

i=1 j=1,j6=i

where Kij = K ((Xi − Xj ) /h) , K(·) : Rq → R is a kernel function and h = h (n) is a positive bandwidth number. This statistic is easily computed from the estimated generalized residuals ψ(Zi , θn ). It resembles the statistic proposed in Fan and Li (1996) and Zheng (1996) for testing the specification of regression functions and it is constructed similarly, with the generalized vector residuals ψ(Zi , θn ) in place of standard regression residuals. 6

The asymptotic behavior of Tn depends on whether H0 holds. Under misspecification, Tn is √ asymptotically consistent and efficient for the unconditional moment in (4) with the usual n-rate √ of convergence. But, under the null hypothesis, the n-asymptotic distribution of Tn is degenerate. This leads us to consider higher-order terms in its asymptotic expansion under H0 . To state our formal result, we introduce some standard assumptions on the kernel function and the bandwidth parameter. We also impose the supplementary smoothness Assumption 2.6 stated in the proofs’ section. 2.4 K(·) is even and bounded, integrates to one and limkuk→∞ kukq |K (u)| = 0. n o 2.5 limn→∞ h + (nhq )−1 = 0. Theorem 1 Under H0 and Assumptions 2.1 to 2.6, nh

q/2

d

Tn → N (0, V ) , where V = 2

m X m X

  E σ 2kl (X) f (X)

k=1 l=1

Z

K 2 (u) du,

Rq

h i with σ kl (X) = E ψ (k) (Z, θ∗ ) ψ (l) (Z, θ∗ ) |X . To build a test statistic in practice, we need an estimator of the asymptotic variance of Tn under the null hypothesis. For instance, it can be estimated by Vn =

m m n n 2 1 XXX X 2 ψ (k) (Zi , θn ) ψ (l) (Zi , θn ) ψ (k) (Zj , θn ) ψ (l) (Zj , θn ) Kij . n (n − 1) hq k=1 l=1 i=1 j=1,j6=i

The test statistic is defined as tn = nhq/2 Tn /Vn1/2 . Corollary 2 Under Assumptions 2.1 to 2.6, tn →d N (0, 1) under H0 and otherwise nhq/2   E ψ 0 (Z, θ∗ ) E [ψ (Z, θ∗ ) |X] f (X) /V 1/2 > 0.

−1

t n →p

The resulting test is a one-sided normal test and is consistent in the direction of any nonparametric alternative to H0 . In particular, negative values of the test statistic can occur under the null hypothesis only. Example 1 (continued) Given a consistent estimator θn of θ∗ , e.g. a generalized nonlinear least-squares estimator, the generalized residuals are ψ(Zi , θn ) =

Yi − µ(Xi , θn ) [Yi − µ(Xi , θn )]2 − ω 2 (Xi , θn ) 7

! =

Uni 2 − ω 2 (X , θ ) Uni i n

! .

The statistic Tn writes Tn = T1n + T2n X    1 = Uni Unj + Un 2i − ω 2 (Xi , θn ) Un 2j − ω 2 (Xj , θn ) Kij . q n(n − 1)h i6=j

The asymptotic variance estimator is computed as Xn  2 2  2 2 2 2 2 Vn = Uni Unj + Uni − ω 2 (Xi , θn ) Unj − ω 2 (Xj , θn ) q n (n − 1) h i6=j     2 +2Uni Uni − ω 2 (Xi , θn ) Unj Unj − ω 2 (Xj , θn ) Kij . Testing only the specification of the conditional mean is based upon T1n , as proposed by Zheng (1996) and Li and Wang (1998). Testing only the specification of the conditional variance can be based upon T2n . Our joint specification test of conditional mean and variance is simply the addition of the two latter statistics. Among the problems related to the practical implementation of our test is the choice of the bandwidth parameter. First, though our general theory is developed for a generic bandwidth, our test can be readily extended to vanishing individual bandwidths hj for each conditioning variable Q 1/2 X (j) . The rate of convergence of Tn is then n qj=1 hj . Second, a data-dependent bandwidth is often used in practice. This allows to adapt smoothing to the variability of X (j) , by considering for instance hj = h × sj where s2j is the empirical variance of X (j) . The following theorem shows that this can be done without affecting the properties of our asymptotic test. Theorem 3 If K(·) is differentiable, with bounded partial derivatives on its support, Corollary 1 b extends to the case of random  h, if there exists a deterministic h that fulfills the assumptions of Corollary 1 such that b h − h /h = op (1) .

3.2

Integral-transform tests

Another way to construct a test for H0 is to replace the conditional moment restrictions by an equivalent infinite set of unconditional moment restrictions that are easier to deal with than the original ones. Indeed, the null hypothesis of interest is equivalent to H0 : E [ψ (Z, θ0 ) wP (X, x)] = 0,

∀x ∈ Rq ,

(6)

where wP (·, ·) is a properly chosen weight function. Stinchcombe and White (1998) show that there exists a wide class of functions that generate consistent tests, including among others the logistic cumulative distribution function or the exponential weight function employed by Bierens 8

(1982). Here, we use the computationally convenient step function wP (X, x) = 1 (X ≤ x) =  Qq (j) ≤ x(j) , where 1(·) is the indicator function. Previous work where a similar choice is j=1 1 X made includes Andrews (1997), Stute (1997), Koul and Stute (1999) and Whang (2000). It is convenient to consider the expectations in (6) because they are easily estimated by sample analogs. In the case where wP (X, x) = 1 (X ≤ x), the estimator is n

1X ψ (Zi , θn ) 1 (Xi ≤ x) , n

Rn (x) =

(7)

i=1

which is an empirical process of dimension m marked by the generalized residuals ψ(Zi , θn ). A test statistic for (6) can then be any well-chosen continuous functional of Rn (·) . For instance, one may consider a statistic of the form

Z n Rq

Rn0 (x)Rn (x) dν (x) .

where ν (·) is an arbitrary probability measure, as done by Bierens and Ploberger (1997). In our formal analysis, we specifically choose a simple and natural measure as the empirical distribution function of the Xi ’s. Hence, we consider the Cramer-von-Mises type statistic Z cn = n Rq

Rn0 (x)Rn (x) dFn (x)

=

n X

Rn (Xi )0 Rn (Xi ) ,

i=1

where Fn (·) is the empirical distribution function of X. A Kolmogorov-Smirnov type test can also be constructed. We focus here on the Cramer-von-Mises test, as Whang (2000) reports better performances for this type of test in a regression context. We first provide a functional central limit theorem for Rn (·) under the following assumption. 2.7 θn = θ∗ + n1

Pn

i=1 ` (Zi , θ ∗ 0 ∗



 ) + op n−1/2 , where the function ` (·, ·) is such that E [` (Z, θ∗ )] = 0

and E[` (Z, θ ) ` (Z, θ )] exists.  h i−1 When θn is defined as the solution to (3), then ` (Z, θ∗ ) = E M (X) ψ˙ (Z, θ∗ ) M (X) ψ (Z, θ∗ ) ,  (1)0  (m)0 0 where ψ˙ = ψ˙ , ..., ψ˙ . Define h i ri (x, θ) = ψ (Zi , θ) 1 (Xi ≤ x) + E ψ˙ (Z, θ) 1 (X ≤ x) ` (Zi , θ) .

(8)

Theorem 4 Under H0 and Assumptions 2.1, 2.3(i) and 2.7, n1/2 Rn ⇒ R∞ , q where ⇒ denotes weak convergence in the Skorokhod space ×m k=1 D [−∞, ∞] and R∞ is a Gaussian

process centered at zero with covariance structure Ω (x, s) = E [r1 (x, θ0 ) r10 (s, θ0 )] , 9

∀ x, s ∈ Rq .

The following corollary gives the asymptotic behavior of our statistic cn . Corollary 5 Under Assumptions 2.1, 2.3(i) and 2.7, Z d 0 cn → c∞ = R∞ (x) R∞ (x) dF (x) under H0 , Rq

n−1 cn →p

Z

E [ψ (Z, θ0 ) 1 (X ≤ x)]0 E [ψ (Z, θ0 ) 1 (X ≤ x)] dF (x) > 0,

otherwise.

Rq

Therefore, an asymptotic test can be based on the statistic cn , as soon as one can compute critical values. Because the asymptotic distribution of cn is known only in special cases, see e.g. Delgado (1993), we will propose in the next section a simulation method to approximate it. The resulting test is then a one-sided normal test and is consistent against any nonparametric alternative to H0 . Example 1 (continued) Recall that Uni = Yi − µ(Xi , θn ). The statistic cn is then equal to cn = c1n + c2n  2  2 n n n n X X X X   2 1 1 Unj 1 (Xj ≤ Xi ) + Unj − ω 2 (Xj , θn ) 1 (Xj ≤ Xi ) . = n n i=1

j=1

i=1

j=1

Testing only the specification of the conditional mean leads to consider c1n , which is exactly the statistic proposed by Stute (1997) for univariate regression models and extended by Whang (2000) to a multivariate context. Testing only the specification of the conditional variance is based upon c2n . Our test statistic for testing the specification of both conditional mean and variance is the sum of the two latter statistics.

3.3

Variations on two themes

It should be noted that a number of different valid test statistics can be built. First, many alternative equivalent formulations of the null hypothesis could be considered by replacing each of the generalized residuals ψ (k) (·, ·) by ak ψ (k) (·, ·), for given constants ak , k = 1, . . . m. The null hypothesis (4) for smooth tests would then write m X

h   i a2k E ψ (k) (Z, θ0 ) E ψ (k) (Z, θ0 ) | X wS (X) = 0

a.s.

k=1

Thus, the sum of squares generalized residuals ψ 0 (·, θn ) ψ (·, θn ) = placed in Tn by a weighted sum of squares

m P

m P

[ψ (k) (·, θn )]2 would be re-

k=1

[ak ψ

k=1

10

(k)

(·, θn

)]2 ,

and similarly, each of the ψ (k) (·, θn )

would be changed into ak ψ (k) (·, θn ) in the expressions of V and Vn . The null hypothesis (6) for integral-transform statistics would write h i ak E ψ (k) (Z, θ0 ) wP (X, x) = 0,

∀x ∈ Rq , ∀k = 1, . . . m

and the statistic cn would be changed accordingly. More generally, one could adopt different weighting functions wS (X) or wP (X, x) in the formulation of the null hypothesis. Whether there exists some optimal weighting is an open question and has not been investigated even for testing regression models. Second, as easily seen from our Example 1, one could consider testing (4) by smooth tests by standardazing first each of the components of Tn , T1n and T2n in our example, and then determining the asymptotic distribution of the sum of the standardized individual statistics. Yet another approach could be to show that the joint distribution of (T1n , T2n ) is asymptotically normal and to construct a normalized statistic with an asymptotic χ2 distribution. This approach is adopted in Li (1999) when testing portfolio mean-variance efficiency. However, such a test ignores the one-sided nature of the testing problem, and thus has undesirable properties. Indeed, it rejects the null hypothesis whenever T1n and T2n take large negative values, which can occur under H0 only. As a consequence, it is also expected to have inferior power properties. While these variants lead to statistics with analogous properties than the ones that we have studied, they should be compared on their respective power. This is a quite intricate question, because one would like to consider nonparametric alternatives of various forms, and the power of each form of test can depend on the specific form of the alternatives considered. Moreover, smooth tests and integral-transform tests have their own strengths and drawbacks, depending upon on the theoretical approach for comparing them, see Fan and Li (2000) and Guerre and Lavergne (2002). Hence, we leave this issue for future research.

4

Simulation-based critical values

On the one hand, the practical implementation of the asymptotic test from Section 3.1 involves some difficulties, as the asymptotic approximation of the null distribution can be slow, depending upon the chosen bandwidth and the number of exogenous variables in the model, see e.g. H¨ ardle and Mammen (1993). On the other hand, a procedure for computing critical values of the cn statistic is needed, because its asymptotic null distribution is case-dependent. In the following, we first briefly review the methods proposed for computing critical values for specification tests of regression functions, and explain the difficulties that arise for extending these methods to testing general conditional moment restrictions. We then propose a simple simulation approach that yields 11

asymptotically valid critical values for our tests.

4.1

Methods for approximating critical values

The integral-transform statistic raises the fundamental problem that its limiting distribution under the null hypothesis depends on the unknown data-generating process, and therefore cannot not be tabulated in general. One way of solving this problem is to derive case-independent upperbounds of the asymptotic critical values, as proposed by Bierens and Ploberger (1997). Another possibility consists of transforming the marked empirical process Rn (·) , as suggested by Stute, Thies and Zhu (1998) and Koul and Stute (1999). Indeed, in a regression context with q = 1, if the errors are independent of X, it is possible to perform a transformation Sn of Rn depending on the data, such that Sn ◦ Rn ⇒ B, where B denotes the standard Brownian motion. As Stute, Thies and Zhu (1998) show, this transformation is also feasible when the errors are conditionally heteroskedastic, using nonparametric estimators of the conditional variance. However, when q > 1, the proposed transformation is no longer asymptotically pivotal, even when the parameters are known. Hence, the implementation of such transformations in our context would require too restrictive assumptions and would be rather involved for practical purposes. For smooth specification tests of regression models, wild bootstrap procedures have been proposed to compute accurate small sample critical values. When ψ (Z, θ) = Y − m (X, θ) , with a scalar Y and a known function m (·, θ), H¨ardle and Mammen (1993) propose to generate external bootstrap resamples from a model that fulfills the null hypothesis, which is necessary to obtain valid critical values. A typical resample Zn∗ is thus obtained as {(Yi∗ , Xi ) , i = 1, ..., n}, where ∗ , with U ∗ = U ζ and Yi∗ = m (Xi , θn ) + Uni ni i ni

B1 The ζ i ’s are iids, independent of Zn , with zero mean and unit variance. Bootstrap critical values at level α are obtained by (i) generating several resamples Zn∗ and computing the corresponding test statistic t∗n for each resample, (ii) computing the empirical quantile of order (1 − α) of the statistics t∗n . This gives a critical value to be compared to the original test statistic tn . For a given α, the estimation accuracy of the critical value depends upon the number of simulated statistics. Such a wild bootstrap procedure is also applicable to statistics based on integral transforms of a regression, see Stute, Gonz´alez-Manteiga and Presedo (1997) and Whang (2000). However, it can be extremely difficult to apply in our context for practical and theoretical reasons. In practice,

12

the number of generalized residuals may be different than the number of endogenous variables. For instance, in Example 1, when one wants to jointly test for the parametric specification of the conditional mean and the conditional variance, there are two generalized residuals for only one endogenous variable. On the other hand, if one wants to consider a single equation coming from a simultaneous equation system, as in Example 3, there is less generalized residuals than endogenous variables. Moreover, the model can be nonlinear in the endogenous variables, as in a the transformation model of Example 2 or a nonlinear simultaneous equation model, so that a simple reduced form for the endogenous variables may not be available. From a theoretical viewpoint, nonlinearity of the model in endogenous variables complicates a lot the technical analysis. Furthermore, it may not be easy in that case to find an external distribution which mimics what would happen under the null hypothesis. Similar difficulties are also encountered when boostrapping parametric tests based on Generalized Method of Moments estimators, see Hall and Horowitz (1996). Because it seems difficult to propose a generalization of the wild bootstrap that would be valid in any case, we adapt a simple simulation technique proposed by Hansen (1996) and also used by de Jong (1996) and Chen and Fan (1999) for integral-transform type statistics. This method is simple and always applicable, but is likely not to be as good as one tailored for a specific model.

4.2

Smooth tests

Because the statistic Tn is a function of Yn = {(ψ (Zi , θn ) , Xi ) , i = 1, ..., n} only, we directly simulate resamples as Yn∗ = {(ψ ∗ni , Xi ) , i = 1, ..., n} , where ψ ∗ni = ψ (Zi , θn ) ζ i and the ζ i ’s satisfy Assumption B1. This gives us statistics of the form Tn∗ =

X X 1 1 ∗ ψ ∗0 ψ 0 (Zi , θn ) ψ (Zj , θn ) ζ i ζ j Kij . ni ψ nj Kij = q q n (n − 1) h n (n − 1) h i6=j

i6=j

The simulated version of the test statistic is then t∗n =

nhq/2 Tn∗ 1/2

.

Vn

Critical values are computed as the corresponding empirical quantiles for the set of simulated statistics. This gives a critical value to be compared to the original test statistic tn . The next theorem justifies the validity of this procedure. Theorem 6 Under the assumptions of Theorem 1 and Assumption B1, sup |Pr ( t∗n ≤ t| Zn ) − Φ (t)| = op (1) , t

where Zn = {Zi , i = 1, . . . , n} and Φ(·) is the standard normal distribution. 13

Whether much accuracy is gained with respect to the asymptotic approximation depends upon 0

several features. For the linear regression model, i.e. ψ (Zi , θn ) = Yi − Xi θ, and when θn is the ordinary least squares (OLS) estimator, this simulation method is equivalent to a wild boostrap procedure, and then has the same properties. In that case, Li and Wang (1998) show that moments up to order four are accurately matched by the wild bootstrap under the supplementary assumption  E ζ 31 = 1.3 Fan and Linton (1999) provide Edgeworth’s expansions for the distribution of a similar test statistic in a regression context, assuming symmetric errors. In general, however, the performances of our simulation technique may depend upon the form of the moment restrictions, the estimation technique and the distribution of the ζ i ’s.

4.3

Integral-transform tests

The procedure used for the smooth test cannot be directly applied to the statistic cn . This is because a version of Rn (·) based upon Yn∗ would not mimic its covariance structure under the null hypothesis. However, Theorem 3 shows that Rn (·) asymptotically depends only upon the ri (x, θ). This suggests to consider the statistic c∗n =

n X

n

¯ n∗ (Xi )0 R ¯ n∗ (Xi ) , R

i=1

where

X ∗ ¯ n∗ (x) = 1 R rni (x) n

∗ and rni (x) = ri (x, θn ) ζ i .

i=1

¯ n∗ (·) is, conditional on Zn , centered at zero and with covariance structure It is easy to see that n1/2 R P Ωn (x, s) = n1 ni=1 ri (x, θn ) ri0 (s, θn ) , which converges to the covariance structure of R∞ (·) under H0 . In practice, computation of ri (x, θn ) requires to evaluate `(Zi , θn ), which is unknown in general, but can be adequately approximated for usual estimators. For instance, if θn is the solution to (3) Pn −1 ˙ then `(Zi , θn ) can be replaced by L−1 n M (Xi ) ψ (Zi , θ n ) , where Ln = n i=1 M (Xi ) ψ (Zi , θ n ). Critical values for testing H0 at level α are obtained as for the smooth test. Note that, when ψ (Z, θ) = Y − X 0 θ and θn is the OLS estimator, c∗n can be simply computed from a resample Zn∗ = {(Yi∗ , Xi ) , i = 1, .., n} , where Yi∗ = Xi0 θn + ψ ∗ni , which is the method proposed by Stute et al. (1998) and Whang (2000). Theorem 7 Under the assumptions of Theorem 3 and 2.8 (see Proof Section), if |ζ 1 | ≤ c < ∞ for some c > 0, sup |Pr ( c∗n ≤ t| Zn ) − Pr (c∞ ≤ t)| = op (1) , t

where c∞ is as in Corollary 2 under H0 , and otherwise similarly defined but depending upon θ∗ in place of θ0 . 3

This is to match the skewness terms in the Edgeworth’s expansions of the statistic distribution and the conditional

bootstrap statistic distribution.

14

5

Monte Carlo Results

We investigate the small sample behavior of the tests considered in this paper within the setup of our Examples 1 and 2. Note that, even in the case of testing a regression function, there is little evidence in the econometric litterature on the comparative performances of the different approaches. A notable exception is Whang (2000), who compares Kolmogorov-Smirnov and Cramer-von-Mises tests to the tests of H¨ardle and Mammen (1993) and Bierens and Plobreger’s (1997). Testing for linearity of a regression function

We first consider testing for the linear spec-

ification of a regression function. In this case, the smooth test is identical to the test proposed by Zheng (1996) and Li and Wang (1996), while the integral-transform test is the one studied by Stute (1997). The null hypothesis of interest is H0 : ∃ (α0 , β 0 ) : E [Y − α0 − β 0 X | X] = 0

a.s.

The data generating process is chosen as Yi = 1 + 2Xi + sin (δπXi ) + ui , i = 1, ..., n, where the ui ’s are iids N (0, 1) and independently distributed of the Xi ’s which are iids N (0, 1) . The null hypothesis corresponds to δ = 0 and is denoted by DGP0 . We investigate three alternatives denoted as DGPδ for δ = 1, 2 and 3. By increasing δ, we obtain higher frequency alternatives that are more difficult to distinguish from pure noise. This allows us to observe large variations in the behavior of our tests. In each case, two different sample sizes, n = 50 and 100, are considered. For each experiment, i.e. each data generating process and sample size, we run 2000 simulations. For each simulation, the critical value is estimated using 500 bootstrap replications. The parameter vector (α0 , β 0 ) is estimated by ordinary least-squares. For the test based on smoothers, we choose the bandwidth following the rule-of-thumb h = dn−1/5 , with d varying in the grid {0.025, 0.05, 0.1, 0.5, 1, 1.5, 2}. The results of each experiment are reported on a graph that show the empirical rejection probabilities for the three tests at nominal level 5% with respect to d. The solid line corresponds to the rejection probability for the test based upon cn (which does not depend upon d), the grey solid line is the rejection probability for the smooth test based upon tn using bootstrap critical values, and the dashed line is the rejection probability for the smooth test using asymptotic critical values. Figures 1 and 2 report the result of DGP0 for the two sample sizes n = 50 and 100 respectively. In both cases, the test based on nonparametric estimation is too conservative. This is because the test statistic is negatively biased in small samples, as already noted by Li and Wang (1998). 15

However, the empirical size of the test becomes closer to its nominal size when the sample size increases. It is also seen that use of bootstrap critical values does improve upon the asymptotic critical values, but generally not as much as required. The empirical level for Stute’s test is very close to its nominal level even for a small sample size of 50. Results for alternative hypotheses DGP1 to DGP3 are reported in Figures 3 to 8. For the low frequency alternative DGP1 , Stute’s test has high power even for a sample size of 50 observations, and its power is one when n=100. When the frequency of the alternative increases, the rejection probability for Stute’s test rapidly decreases and is less than 20% for DGP3 for a sample size of 100. As expected, power of the smooth test varies with the frequency of the alternative and the bandwidth. The first phenomenon has already been noted in some other contexts, see e.g. Lavergne and Vuong (2000). A small bandwidth always corresponds a low rejection frequency, while a large one may yield low or high power depending upon the alternatives. Moreover, the smooth test has high power for a range of bandwidths that narrows as δ increases. However, a bandwidth close to 0.5 gives high power for each experiment in our study. As is the case under the null hypothesis, using bootstrap critical values does lead to some but limited improvement in most cases. Compared to Stute’s test, the smooth test has higher or equivalent power for a large range of bandwidths. Testing jointly for linearity of the regression function and for homoscedasticity We now consider a similar regression model, where one aims to test jointly for the specification of the regression function and for homoskedasticity. The null hypothesis writes   E [Y − α0 − β 0 X |X ]  i = H0 : ∃ α0 , β 0 , σ 20 :  h E (Y − α0 − β 0 X)2 − σ 20 |X

0 0

! a.s.

We consider the design √ Yi = α0 + β 0 Xi + δui (1 + Xi2 )/ 5,

i = 1, ..., n,

where the ui ’s and the Xi ’s are generated as before. The null hypothesis corresponds to δ = 0 and is denoted DGP00 . The alternative DGP10 corresponds to δ = 1. Two sample sizes are considered, n = 100 and 250. Other details are otherwise similar than in the previous Monte-Carlo study. The results are reported in Figures 9 to 12. Under the null hypothesis, the tests essentially exhibits the same features as when testing for a linear regression only. That is, the smooth test is undersized and it makes little difference to use bootstrap critical values, while the test based upon cn has an empirical size close to the nominal one. Under the alternative hypothesis, there is much more difference between the two tests. The smooth test is more powerful than the integral-transform test for all bandwidths, except a very small one (d = 0.025). However, power is larger than 70% for the integral-transform based test when the sample size equals 250. 16

Testing the transformation model

We now consider a well-studied model with a nonlinear

transformation in the endogenous variable, namely the arcsinh transformation. The hypothesis of interest here writes  arcsinh(λ0 Y ) − α0 − β 0 X | X = 0 H0 : ∃ (λ0 , α0 , β 0 ) : E λ0 

a.s.

We consider the design arcsinh(2Yi ) = 1 + 2Xi + sin (δπXi ) + ui , i = 1, ..., n, 2 where the ui ’s and the Xi ’s are generated as before, except that the variance of the error term is chosen as 0.5. The parameters are estimated by a GM M estimator, with vector of instruments 0 M (X) = 1, X, X 2 , where the identity matrix has been chosen to weight the discrepancy conditions in the estimating objective function. Two sample sizes are considered, n = 100 and 250. The 00

notation DGPδ denotes the model with δ = 0, 1, 2, 3. Other details are otherwise similar than in the previous studies. The results are reported in Figures 13 to 20. Under the null hypothesis (Figures 13 and 14), the only novel feature compared to the previous cases is that the integral-transform test is now undersized for the smaller sample size. Under the 00

alternative hypothesis DGP1 (Figures 15 and 16), the power of the smooth test is essentially one over the whole range of considered bandwidths, while the power of its competitor is around 20% 00

for n = 100 and increases to 50% when n = 250. Under the alternative hypothesis DGP2 (Figures 17 and 18), the smooth test behaves quite similarly, while the power of its competitor increases 00

up to 95% for the larger sample size. For the higher frequency alternative DGP3 (Figures 19 and 20), the test based on cn has essentially no power for n = 100, but its rejection probability reaches 70% when n = 250. The smooth test’s power is not very sensitive to the bandwidth choice as soon as it is not too large, i.e. for d less than one. Our limited set of simulations sheds light on their comparative behavior of the tests. For testing the specification of an univariate regression function, the integral-transform test has good size. Its power is quite high against an alternative of low frequency even for a small sample size, but it deteriorates against an alternative of high frequency. The smooth test is undersized and can be outperformed by its competitor, but seems well adapted to alternatives of varying frequencies. In other testing situations, the differences can be more marked. While the smooth test remains undersized, it seems that it can deal more easily with nonlinearity in the endogenous variable.

6

Conclusion

We have shown in this paper how the two methodologies used for testing the specification of regression function can be extended to testing a general set of conditional moment restrictions, 17

which can prove useful for many econometric models. Several problems warrant further research. First, it seems important to determine whether there is some optimal and feasible way to combine the conditional moment restrictions, as discussed in Section 3.3. Second, for the smooth test, it would be helpful to have some data-driven methods for bandwidth’s choice, as investigated by Horowitz and Spokoiny (2001) and Guerre and Lavergne (2002) in the regression context. Third, it is clear that procedures for computing critical values are crucial for practical implementation. We have explained the difficulties that arise in generalizing bootstrap methods proposed in a regression context. Some unreported simulations results suggest that more sophisticated methods can substantially improve upon the simple simulation technique we use here. However, a general formal theory is still required in this respect.

7

Proofs

Definition 1 Gα , α > 0, is the class of functions g(·) : IRq → IR such that ∃ρ > 0 such that for all z ∈ IRq , supky−zk≤ρ |g (y) − g (z)| / ky − zk ≤ G (z) and g(·) and G(·) have finite α-th moments (or are bounded if α = +∞). h i h (k) i Let αk (x) = E ψ (k) (Z, θ∗ ) | X = x , γ k (x) = E ψ˙ (Z, θ∗ ) | X = x ,   4 h i σ 4k (x) = E ψ (k) (Z, θ∗ ) | X = x , σ kl (x) = E ψ (k) (Z, θ∗ )ψ (l) (Z, θ∗ ) | X = x for k, l = 1, . . . , m, and let λ(x) = E [S(Z) | X = x]. The next assumption summarizes the smoothness conditions on the different nonparametric functions. 4 8/3 4 2.6 f (·) ∈ G∞ . For all k, l = 1, G  ..., m, σ kl (·) ∈  , each element of γ k (·) belong to G , σ k (·), αk (·) and 8/3 (k)

λ(·) belong to G2 and E ψ˙ (Z, θ∗ ) < ∞.

(k) b (k) = ψ (k) (Zi , θn ) , ψ˙ (k) = ψ˙ (k) (Zi , θ0 ), Proof of Theorem 1 Henceforth, ψ i = ψ (k) (Zi , θ0 ) , ψ i i P Pn Pn (k) (k) ¨ ¨ ψ i (θ) = ψ (Zi , θ), for i = 1, 2, ...n and k = 1, . . . , m, and i6=j stand for i=1 j=1,j6=i . We have m

Tn =

 X X  (k) 1 (k) Dn (Zi , Zj ) + 2T1n + T2n , n (n − 1) i6=j

with Dn (Zi , Zj ) =

(k)

Pm

k=1

(k)

ψ i ψ j h−q Kij ,

(k) T1n

=

(k)

=

T2n

k=1

 X (k)  (k) 1 (k) b ψj − ψj Kij , ψi n (n − 1) hq i6=j   X  (k) (k) 1 (k) (k) b b ψi − ψi ψj − ψj Kij . n (n − 1) hq i6=j

18

(9)

  (k) (k) We prove in a first step that T1n = Op n−1 and T2n = Op n−1 for all k = 1, ..., m. Using a mean value theorem argument,

S1n

=

0

=

(θn − θ) S1n + (θn − θ0 ) S2n (θn − θ0 ) ,

(k)

=

(θn − θ0 ) S3n (θn − θ0 ) ,

T2n

where

0

(k)

T1n

0

X (k) (k) 1 ψ i ψ˙ j Kij , n (n − 1) hq i6=j

S2n

=

S3n

=

X (k) (k)  1 ¨ ¯θn Kij , ψi ψ j n (n − 1) hq i6=j i0   ih X h (k)  1 ˙ ¨ (k) ˜θn (θn − θ0 ) ψ˙ (k) + ψ ¨ (k) ¯θn (θn − θ0 ) Kij , ψ + ψ i i j j n (n − 1) hq i6=j





¯ with θn − θ0 ≤ kθn − θ0 k and ˜θn − θ0 ≤ kθn − θ0 k .

To study S1n , we use the following lemma.

Lemma 8 (Powell, Stock and Stoker, 1989) Let Un =

 n −1 2

Pn−1 Pn i=1

j=i+1

Hn (Zi , Zj ) be a U -

statistic with symmetric kernel Hn (Zi , Zj ) and let the Zi ’s be iid. Let qn (Zi ) = E [Hn (Zi , Zj ) | Zi ], q¯n =    Pn 2 E (qn (Zi )). If E kHn (Z1 , Z2 )k = o (n) , then Un = q¯n + (2/n) i=1 [qn (Zi ) − q¯n ] + op n−1/2 .   (k) (k) (k) (k) The quantity S1n is a U -statistic with kernel Hn (Z1 , Z2 ) = h−q ψ 1 ψ˙ 2 + ψ 2 ψ˙ 1 K12 /2 and   2 E kHn (Z1 , Z2 )k       Z (k) 2 (k) 2 1 1 2 2 ˙ ˙ = E ψ1 σ kk (X2 )K12 = q E ψ 1 σ kk (X1 + hu) f (X1 + hu) K (u) du h2q h       Z  (k) 2 1 1 1 2 ˙ = E ψ1 σ kk (X1 ) f (X1 ) K (u) du + o =O = o (n) hq hq hq by Assumptions 2.4 to 2.6 together with H¨older’s inequality. As E [Hn (Zi , Zj )] = 0, Lemma 8 implies that  Pn S1n = (2/n) i=1 qn (Zi ) + op n−1/2 , where qn (Zi ) = E [Hn (Zi , Zj )|Zi ]. Moreover     E qn2 (Z) = E σ kk (X) γ 2k (X) f (X)

2

Z K (u) du

+ o(1) = O(1),

 so that S1n = Op n−1/2 . For S2n , we have 1 E |S2n | ≤ q E {S (Z2 ) E [αk (X1 ) |K12 | | Z2 ]} = E {S (Z2 ) αk (X2 ) f (X2 )} h

Z |K (u)| du + o (1) = O (1) . (k)

(k)

Hence, S2n = Op (1). Similarly, one can show that S3n = Op (1). These results imply that T1n and T2n are  both Op n−1 , as θn − θ∗ = Op (n−1/2 ) by Assumption 2.2. We now determine the asymptotic distribution of the first term in (9). We use of the following result for degenerate U -statistics.

19

Lemma 9 (Hall, 1984) Let Un be as in Lemma 8, with E [Hn (Zi , Zj ) | Zi ] = 0 a.s. Let Gn (Z1 , Z2 ) = E [Hn (Z3 , Z1 ) Hn (Z3 , Z2 ) | Z1 , Z2 ]. EG2n (Z1 , Z2 ) + n−1 EHn4 (Z1 , Z2 ) = 0, n→∞ E 2 Hn2 (Z1 , Z2 )

If lim

then

nUn d −→N (0, 1) . (Z1 , Z2 )

2E 1/2 Hn2

The first degenerate U -statistic with kernel Dn (·, ·), and the corresponding Gn (·, ·) is such h term of (9) iis aP Pm 2 m Pm Pm that E Gn (Z1 , Z2 ) = k=1 l=1 k0 =1 l0 =1 λkk0 ll0 , where λkk0 ll0

= h−4q E {σ kk0 (X1 ) σ ll0 (X2 ) E [K13 K23 σ kl (X3 ) | X1 , X2 ] E [K13 K23 σ k0 l0 (X3 ) | X1 , X2 ]}  Z    X2 − X1 1 0 0 E σ kk (X1 ) σ ll (X2 ) K (u) K u + σ kl (X1 + hu) f (X1 + hu) du = h2q h Z    X2 − X1 0 0 0 0 0 0 0 K (u ) K u + σ k l (X1 + hu ) f (X1 + hu ) du h  Z  1 0 0 = E σ kk (X1 ) σ ll (X1 + hv) K (u) K (u + v) σ kl (X1 + hu) f (X1 + hu) du hq Z   0 0 0 0 0 0 0 K (u ) K (u + v) σ k l (X1 + hu ) f (X1 + hu ) du f (X1 + hv) dv =

1  E σ kk0 (X1 ) σ ll0 (X1 ) σ kl (X1 ) σ k0 l0 (X1 ) f 3 (X1 ) q h Z Z Z 

 K (u) K (u + v) K (u0 ) K (u0 + v) du du0 dv + o h−q = O



1 hq

 ,

by Assumptions 2.4–2.6. Moreover, h

2

E Hn (Z1 , Z2 )

i

    m m X X X1 − X2 1 2 E K σ kl (X1 ) σ kl (X2 ) = h2q h k=1 l=1   Z m X m X 1 2 = E σ (X ) K (u) σ (X + hu) f (X + hu) du kl 1 kl 1 1 hq k=1 l=1   Z m X m X   1  2 1 2 −q = E σ (X ) f (X ) K (u) du + o h = O , 1 1 kl hq hq k=1 l=1

h

4

E Hn (Z1 , Z2 )

i

= ≤

  m X m X m X m 0 0 0 0 X 1 (k) (k) (k ) (k ) (l) (l) (l ) (l ) 4 E K12 ψ 1 ψ 2 ψ 1 ψ 2 ψ 1 ψ 2 ψ 1 ψ 2 h4q 0 0

k=1 l=1 k =1 l =1 m X m X m X m X

k=1 l=1 k0 =1 l0 =1



m X m X m X

m X

k=1 l=1 k0 =1 l0 =1

 ≤ O

1 h3q

1 h3q

Y

  −q 16 4  1/4 E h K12 σ p (X1 ) σ 4p (X2 )

p∈{k,l,k0 ,l0 }

O(1) h3q

Y

  4  1/4  E σ p (X1 ) σ 4p (X1 ) f (X1 ) + o h−3q

p∈{k,l,k0 ,l0 }

 .

Assumption 2.5 ensures that the conditions of Lemma 9 are fulfilled, and Theorem 1 follows.

20

Proof of Corollary 2 Let us first consider the properties of Tn when H0 does not hold. Notice that (9) holds with θ∗ in place of θ0 . By a weak law of large numbers, it is straightforward to check that the   (k) (k) corresponding S1n , S2n and S3n are all Op (1), so that T1n = Op n−1/2 and T2n = Op n−1 , for all k = 1, ..., m, using Assumption 2.2. Similarly, we get m h h i i X X 1 Dn (Zi , Zj ) →p E [D (Z1 , Z2 )] = E E 2 ψ (k) (Z, θ∗ ) |X f (X) + o (1) , n(n − 1) i6=j

k=1

so that Tn converges to a strictly positive limit when H0 does not hold. By a similar reasoning, it is easily shown that Vn →p V , whether H0 holds or not. Proof of Theorem 3 Under the null hypothesis, we shall show that nb hq/2 Tn (b h) − nhq/2 Tn (h) = op (1), where the dependence of Tn on the bandwidth is made explicit. For this equality to hold, we need to show tightness of the process n(νh)q/2 Tn (νh) for ν ∈ [B1 , B2 ], with 0 < B1 < 1 < B2 < ∞. It can be seen that the second and third term in (9) are both Op (n−1 ) uniformly for ν ∈ [B1 , B2 ]. Let T˜n (h) be the first term in (9). It is asymptotically normal at a fixed point and converges to the same limit for any ν. Moreover, for ν 1 , ν 2 ∈ [B1 , B2 ], h i2 E n(ν 1 h)q/2 T˜n (ν 1 h) − n(ν 2 h)q/2 T˜n (ν 2 h) (     2 ) m m 4n X X X1 − X2 X1 − X2 −q/2 −q/2 = E σ kl (X1 ) σ kl (X2 ) (ν 1 h) K − (ν 2 h) K n−1 ν1h ν2h k=1 l=1

  −q/2 is O (ν 1 − ν 2 )2 by a Taylor expansion of ν 2 K [x/ν 2 ] around ν 1 , using Assumption 2.6 and H¨older’s p/2 inequality. Hence, n (νh) T˜n is tight for ν ∈ [B1 , B2 ], see Billingsley (1968). Under the alternative hypothesis, it is sufficient to show that Tn (νh) is tight for ν 1 , ν 2 ∈ [B1 , B2 ], which is shown similarly. An analogous result for Vn then implies the desired result. ∗(k)

Proof of Theorem 6 Henceforth, ψ i

b ∗(k) = ψ (k) (Zi , θn ) ζ , ψ˙ ∗(k) = ψ˙ (k) (Zi , θ∗ ) ζ = ψ (k) (Zi , θ∗ ) ζ i and ψ i i i i

and E ∗ [.] ≡ E [.|Zn ]. We have a decomposition similar to (9), that is,  −1 X m X n ∗(k) ∗(k) ∗ ∗(k) Tn = Dn (Zi , Zj ) + 2T1n + T2n , 2 i6=j

∗(k)

where Dn

(k) ∗(k) ∗(k) (k) (k) b ∗(k) in and T2n are defined similarly to Dn , T1n and T2n in (9), with ψ i and ψ i  b (k) . We now show that T ∗(k) and T ∗(k) are both op n−1 h−q/2 for all k = 1, . . . , m. and ψ i 1n 2n ∗(k)

, T1n

(k)

place of ψ i

(10)

k=1

Using a mean value theorem argument, ∗(k)

T1n where

∗ S1n

0

∗ ∗ = (θn − θ∗ ) S1n + (θn − θ∗ )0 S2n (θn − θ∗ ) ,

=

X ∗(k) ∗(k) 1 ψ i ψ˙ j Kij , q n (n − 1) h

=

1 n (n − 1) hq

i6=j

∗ S2n

X

∗(k) ¨ ∗(k) ψj

ψi

i6=j

21

 ¯θn Kij ,



¯θn − θ∗ ≤ kθn − θ∗ k

∗(k)

¨ and ψ j

¨ (θ) = ψ

(k)

 ∗ ∗ (Zj , θ) ζ i . Since E ∗ ζ i ζ j = 0 for j 6= i, S1n and S2n are degenerate U -statistics. Hence, 2

∗ E ∗ [S1n ] = 0 and as E ∗ (ζ i ) = 1 for all i,

 ∗2  E ∗ S1n =



1 n (n − 1) hq

2 X n X n 

(k)

ψi

2 

(k) ψ˙ j

2

2 Kij .

i=1 j6=i

Using similar arguments as in the proof of Theorem 1,  2   ∗2  1 n(n − 1)O(hq ) = O(n−2 h−q ). E E ∗ S1n = n (n − 1) hq  ∗2   = O(1). As (θn − θ∗ ) = Op n−1/2 , we get that nhq/2 T ∗(k) = Similarly, we can prove that E E ∗ S2n 1n ∗(k)

op (1) and nT2n

= op (1). These terms are then negligible conditional upon the initial sample.

Let us now determine the asymptotic distribution of the first term in (10). For the sake of simplicity, we treat the case where m = 1. We then consider X X −q 1 1 Ten∗ = Dn∗ (Zi , Zj ) = hj ψ ∗i ψ ∗j Kij , n(n − 1) n(n − 1) i6=j

i6=j

e∗ where E ∗ [Dn∗ (Zi , Zj ) |ξ i ] = 0, for all i. By Proposition 3.2 in De Jong (1987), σ −1 n Tn →d∗ N (0, 1) in  2 probability if G1 , G2 and G3 are of lower order in probability than σ 2n , where σ 2n

≡ E 



h

i  2 e Tn =

1 n(n − 1)

1 n(n − 1)

4 X

2 X n−1 n X X

E ∗ [Dn∗ (Zi , Zj ) Dn∗ (Zk , Zl )] =

i6=j k=1 l=k+1



   E Dn∗4 (Zi , Zj ) = E 2 V14



1 n(n − 1)

G1



G2

 −4 n−2 n X n−1 X X   n ≡ E ∗ Dn∗2 (Zi , Zj ) Dn∗2 (Zi , Zk ) 2 i=1 j=i+1

i6=j

2 X

1 Vn , n(n − 1)hq

4 h−4q ψ 4i ψ 4j Kij , j

i6=j

k=j+1

  X n−1 n X X  n −4 n−2 4 2 2 = E V1 h−4q ψ 4i ψ 2j ψ 2k Kij Kik , j 2 i=1 j=i+1 k=j+1

G3

 −4 n−3 X n−2 X n−1 X n ≡ 2 i=1 j=i+1

n X

 −4 n−3 X n−2 X n−1 X n = 2 i=1 j=i+1

n X

E ∗ [Dn∗ (Zi , Zj ) Dn∗ (Zi , Zk ) Dn∗ (Zl , Zj ) Dn∗ (Zl , Zk )]

k=j+1 l=k+1

h−4q ψ 2i ψ 2j ψ 2k ψ 2l Kij Kik Klj Klk . j

k=j+1 l=k+1

Now, G1 , G2 and σ 2n

E

h

 i 2 2

σn

2

are positive and it is easily checked, as in the proof of Theorem 1, that

 2  −2 X n X n   n −2q 2 =E hj ψ 2i ψ 2j Kij = n−8 O n2 h−3q + n3 h−2q + n4 h−2q = O(n−4 h−2q ), 2 i=1 j6=i

   and that similarly E [G1 ] = O n−6 h−3q , E [G2 ] = O n−5 h−2q and E |G3 | = O n−4 h−q . The convergence of the distribution function is then uniform by Polya’s theorem.

22

Proof of Theorem 4 Write, uniformly in x n1/2 Rn (x)

=

n 1 X

n1/2

ψ (Zi , θ∗ ) 1 (Xi ≤ x) + G (x, θ∗ )

i=1

n 1 X

n1/2

l (Zi , θ∗ ) + op (1)

i=1

= Rn0 (x) + Rn1 (x) + op (1) , h i where G(x, θ) = E ψ˙ (Z, θ) 1 (X ≤ x) . The limit process is identified by the convergence of the finite dimensional distributions. Choose (x1 , ..., xp ) ∈ Rq and normalized vectors (a1 , ..., ap ) ∈ Rm . Then apply a Central Limit Theorem to obtain that

n1/2

p X

p X   a0j R∞ (xj ) . a0j Rn0 (xj ) + Rn1 (xj ) →d

j=1

j=1

We now show tightness of the process. Note that the index parameter in Rn1 is included in a deterministic continuous bounded function. Therefore, Rn1 is tight. For Rn0 , tightness will be proved when the marginal distributions of each component of X are uniform in [0, 1] . The general case is delat with applying the usual quantile transformation coordinate by coordinate. When the marginal distributions of X are uniform in the 0 q 0 0 0 interval [0, 1], Rn0 takes its values in ×m k=1 D [0, 1] . Let Rn (x) = Rn1 (x) , ..., Rnm (x) . Since we endow q m ×m k=1 D [0, 1] with the product topology (which is generated by the metric d (f, g) = max{d (fk , gk ) : k = 1, ..., m}, where fk , gk are the k-th coordinate of f, g respectively and d is the metric in the Skorokhod q

0 Space D [0, 1] ), tightness follows if each coordinate is tight. The increment of the process Rnk around

B = (s, t] = ×m j=1 (sj , tj ) is defined in Bickel and Wichura (1971) as 0 Rnk (B) =

1 X e1 =0

···

1 X

P

(−1)q−

p

ep

0 Rnk (s1 + e1 (t1 − s1 ), . . . , sq + eq (tq − sq )) .

eq =0

Then, it suffices to check the tightness condition in Bickel and Wichura (1971). That is, we have to show that, for any two neighbor intervals B and B 0 = (s0 , t0 ], i.e. they abut and for some j ∈ {1, ..., q} , they have the same j-th face ×k6=j (sk , tk ) = ×k6=j (s0k , t0k ) ,   2 2 0 0 E Rnk (B) Rnk (B 0 ) ≤ 3µ (B) µ (B 0 ) , for all k = 1, ..., m, where µ is an arbitrary measure. Notice that, applying Stute’s (1997) Lemma 1,      2 2 0 0 E Rnk (B) Rnk (B 0 ) ≤ nE α21 β 21 + 3n (n − 1) E α21 E β 21 , (k)

(k)

(k)

with αi = n−1/2 ψ i 1 (Xi ∈ B) and β i = n−1/2 ψ i 1 (Xi ∈ B 0 ) with ψ i = ψ (k) (Zi , θ∗ ) . Since αi β i = 0,       2 2 2 (k) (k) 2 0 0 E Rnk (B) Rnk (B 0 ) ≤ 3E ψ 1 1 (X1 ∈ B) E ψ 1 1 (X1 ∈ B 0 ) .

q

Thus, tightness in ×m k=1 D [0, 1] is proved when the marginal distributions of each component of X are uniform in [0, 1] . The general case is proved applying the usual quantile transformation coordinate by coordinate.


Proof of Corollary 5  By a lemma in Kiefer (1959, p. 424),
\[
\int_{\mathbb{R}^{q}}R_{n}(x)'R_{n}(x)\left[dF_{n}(x)-dF(x)\right]=o_{p}(1).
\]
The result is then an immediate consequence of the continuous mapping theorem.
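Since $F_{n}$ assigns mass $n^{-1}$ to each sample point, integrals of the above form taken with respect to $F_{n}$ reduce to sample averages,
\[
\int_{\mathbb{R}^{q}}R_{n}(x)'R_{n}(x)\,dF_{n}(x)=\frac{1}{n}\sum_{i=1}^{n}R_{n}(X_{i})'R_{n}(X_{i}),
\]
which is the form in which such a Cramér-von Mises type functional is computed in practice; the displayed $o_{p}(1)$ result ensures that, after appropriate scaling, this sample average and the corresponding integral with respect to $F$ share the same limiting distribution.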

Proof of Theorem 7  By Assumption 2.8, (i) the function $\ell\left(\cdot,\cdot\right)$ is such that $E\left[\ell\left(Z,\theta^{*}\right)\right]=0$ and $E\left[\ell\left(Z,\theta^{*}\right)\ell\left(Z,\theta^{*}\right)'\right]$ is finite, and (ii) $\dot{\ell}\left(\cdot,\cdot\right)=\partial\ell(\cdot,\cdot)/\partial\theta$ exists almost surely in an open neighborhood $N\left(\theta^{*}\right)$ of $\theta^{*}$ with $\sup_{\theta\in N\left(\theta^{*}\right)}\left\Vert \dot{\ell}(\cdot,\theta)\right\Vert \leq L\left(\cdot\right)$ and $E\left[L\left(Z\right)\right]<\infty$. It is immediate to show that $\sup_{x,s}\left\Vert \Omega_{n}\left(x,s\right)-\Omega\left(x,s\right)\right\Vert =o_{p}(1)$. It is also straightforward to obtain that $\sup_{x}\left\Vert \bar{R}_{n}(x)-\bar{R}_{n}^{0*}(x)\right\Vert =o_{p}\left(n^{-1/2}\right)$, with $\bar{R}_{n}^{0*}(x)=n^{-1}\sum_{i=1}^{n}r_{i}\left(x,\theta^{*}\right)\zeta_{i}$, where $r_{i}\left(x,\theta\right)$ is defined in (8). Fix some $x_{1},\ldots,x_{p}\in\mathbb{R}^{q}$ and normalized vectors $a_{1},\ldots,a_{p}\in\mathbb{R}^{m}$, and define $\psi_{i}^{*}=\psi_{i}\zeta_{i}$ with $\psi_{i}=\psi\left(Z_{i},\theta^{*}\right)$. By the Cramér-Wold device, the convergence of the finite dimensional distributions follows from the convergence of
\[
n^{1/2}\sum_{j=1}^{p}a_{j}'\bar{R}_{n}^{0*}(x_{j})
=\frac{1}{n^{1/2}}\sum_{i=1}^{n}\left[\sum_{j=1}^{p}a_{j}'\left(\psi_{i}1\left(X_{i}\leq x_{j}\right)+G\left(x_{j},\theta^{*}\right)\ell\left(Z_{i},\theta^{*}\right)\right)\right]\zeta_{i}
=\frac{1}{n^{1/2}}\sum_{i=1}^{n}W_{i}\zeta_{i}.
\]
Asymptotic normality is proved by checking Lindeberg's condition, that is, for each $\delta>0$,
\[
L_{n}(\delta)=\frac{1}{n}\sum_{i=1}^{n}W_{i}^{2}\,E\left[\zeta_{i}^{2}1\left(\left|W_{i}\zeta_{i}\right|>n^{1/2}\delta\right)\mid Z_{n}\right]=o_{p}(1).
\]
Since $\left|\zeta_{i}\right|\leq c$ for all $i$ and some $c>0$,
\[
L_{n}(\delta)\leq\frac{c^{2}}{n}\sum_{i=1}^{n}W_{i}^{2}1\left(\left|W_{i}\right|\geq\frac{n^{1/2}\delta}{c}\right)=o_{p}(1),
\]
using the fact that the $W_{i}^{2}$ are i.i.d. with finite first moment. Then, Theorem 7 will follow from the tightness of $n^{1/2}\bar{R}_{n}^{0*}$. As in the proof of Theorem 3, assume without loss of generality that each coordinate of $X$ is uniform on the interval $[0,1]$. We have
\[
n^{1/2}\bar{R}_{n}^{0*}(x)
=\frac{1}{n^{1/2}}\sum_{i=1}^{n}\psi_{i}1\left(X_{i}\leq x\right)\zeta_{i}
+G\left(x,\theta^{*}\right)\frac{1}{n^{1/2}}\sum_{i=1}^{n}\ell\left(Z_{i},\theta^{*}\right)\zeta_{i}
=R_{n}^{00*}(x)+R_{n}^{01*}(x).
\]
The tightness of $R_{n}^{01*}$ follows from the continuity of $G$ and from applying a Central Limit Theorem to the random sum, which does not depend on $x$. Consider the increment $R_{nk}^{00*}(B)$ around the interval $B$, as in Theorem 2. Bickel and Wichura's (1971) tightness condition is satisfied, since for two neighbor intervals $B$ and $B'$,
\[
E\left[R_{nk}^{00*}(B)^{2}R_{nk}^{00*}(B')^{2}\right]\leq3\mu_{n}(B)\mu_{n}(B'),
\]
where $\mu_{n}(B)=n^{-1}\sum_{i=1}^{n}\psi_{i}^{2}1\left(X_{i}\in B\right)$. Applying the uniform LLN, we have that
\[
\sup_{B}\left|\mu_{n}(B)-\mu(B)\right|=o_{p}(1),
\]
where $\mu(B)=E\left[\psi_{1}^{2}1\left(X\in B\right)\right]$ is a continuous measure.
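Theorem 7 is what justifies simulating critical values by redrawing only the multipliers $\zeta_{i}$ while holding the sample fixed. The following sketch illustrates that simulation for a Cramér-von Mises type functional of the perturbed process. It is illustrative only: the function names, the use of raw marks $\psi_{i}$ in place of $r_{i}\left(x,\theta_{n}\right)$, and the choice of Mammen's two-point distribution for the bounded multipliers are assumptions made for the example, not the paper's prescriptions.

import numpy as np

def cvm_of_marks(marks, X):
    # Cramer-von Mises functional of the marked empirical process
    # x -> n^{-1/2} sum_i marks_i 1(X_i <= x), integrated against the
    # empirical distribution of X.
    n = marks.shape[0]
    ind = np.all(X[:, None, :] <= X[None, :, :], axis=2)    # ind[i, j] = 1(X_i <= X_j)
    R = ind.astype(float).T @ marks / np.sqrt(n)             # R[j] = n^{-1/2} sum_i marks_i 1(X_i <= X_j)
    return float(np.mean(R ** 2))

def simulated_critical_value(psi, X, level=0.05, B=499, rng=None):
    # Redraw bounded multipliers zeta_i (here Mammen's two-point distribution,
    # which has mean zero and variance one), keep the data fixed, and return
    # the (1 - level) quantile of the simulated statistics.
    rng = np.random.default_rng() if rng is None else rng
    golden = (np.sqrt(5.0) + 1.0) / 2.0
    values = np.array([1.0 - golden, golden])                # approx. -0.618 and 1.618
    probs = np.array([golden / np.sqrt(5.0), 1.0 - golden / np.sqrt(5.0)])
    stats = np.empty(B)
    for b in range(B):
        zeta = rng.choice(values, size=psi.shape[0], p=probs)
        stats[b] = cvm_of_marks(psi * zeta, X)
    return float(np.quantile(stats, 1.0 - level))

# Example usage with simulated marks.
rng = np.random.default_rng(1)
n, q = 100, 2
X = rng.uniform(size=(n, q))
psi = rng.normal(size=n)                                     # stand-in for psi(Z_i, theta_n)
print(cvm_of_marks(psi, X), simulated_critical_value(psi, X, rng=rng))

In an application, one would reject the null hypothesis at level $\alpha$ whenever the statistic computed from the original marks exceeds the simulated critical value.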


REFERENCES
Andrews, D. W. K. (1997): “A Conditional Kolmogorov Test,” Econometrica, 65, 1097–1128.
Apostol, T. M. (1957): Mathematical Analysis. Reading: Addison-Wesley.
Bickel, P. J. and M. J. Wichura (1971): “Convergence Criteria for Multiparameter Stochastic Processes and Some Applications,” Annals of Mathematical Statistics, 42, 1656–1670.
Bierens, H. (1982): “Consistent Model Specification Tests,” Journal of Econometrics, 20, 105–134.
Bierens, H. (1990): “A Consistent Conditional Moment Test of Functional Form,” Econometrica, 58, 1443–1458.
Bierens, H. and D. Ginther (2001): “Integrated Conditional Moment Testing of Quantile Regression Models,” Empirical Economics, 26, 307–324.
Bierens, H. and W. Ploberger (1997): “Asymptotic Theory of Integrated Conditional Moment Tests,” Econometrica, 65, 1129–1151.
Buckley, M. J. (1991): “Detecting a Smooth Signal: Optimality of Cusum Based Procedures,” Biometrika, 78, 253–262.
Burbidge, J. B., L. Magee and A. L. Robb (1988): “Alternative Transformations to Handle Extreme Values of the Dependent Variable,” Journal of the American Statistical Association, 83, 123–127.
Chamberlain, G. (1987): “Asymptotic Efficiency in Estimation With Conditional Moment Restrictions,” Journal of Econometrics, 34, 305–334.
Chen, X. and Y. Fan (1999): “Consistent Hypothesis Testing in Semiparametric and Nonparametric Models for Econometric Time Series,” Journal of Econometrics, 91, 373–401.
Darolles, S., J.-P. Florens and E. Renault (1999): “Nonparametric Instrumental Regression,” Université Toulouse 1.
Davidson, R. and J. G. MacKinnon (1993): Estimation and Inference in Econometrics. New York: Oxford University Press.
De Jong, P. (1987): “A Central Limit Theorem for Generalized Quadratic Forms,” Probability Theory and Related Fields, 75, 261–277.
De Jong, R. M. (1996): “The Bierens Test Under Data Dependence,” Journal of Econometrics, 72, 1–32.
De Jong, R. M. and H. J. Bierens (1994): “On Limit Behavior of a Chi-Square Type Test if the Number of Conditional Moments Tested Approaches Infinity,” Econometric Theory, 9, 70–90.
Delgado, M. (1993): “Testing the Equality of Nonparametric Regression Curves,” Statistics and Probability Letters, 17, 199–204.
Eubank, R. L. and C. H. Spiegelman (1990): “Testing the Goodness of Fit of a Linear Model via Nonparametric Regression Techniques,” Journal of the American Statistical Association, 85, 387–392.


Fan, Y. and Q. Li (1996): “Consistent Model Specification Tests: Omitted Variables, Parametric and Semiparametric Functional Forms,” Econometrica, 64, 865–890.
Fan, Y. and Q. Li (2000): “Consistent Model Specification Tests: Nonparametric Versus Bierens' Tests,” Econometric Theory, 16(6), 1016–1041.
Fan, Y. and O. Linton (1999): “Some Higher Order Theory for a Consistent Nonparametric Specification Test,” forthcoming in Journal of Statistical Planning and Inference.
Gourieroux, C. and A. Monfort (1995): Statistics and Econometric Models. Cambridge: Cambridge University Press.
Guerre, E. and P. Lavergne (2001): “Rate-Optimal Data-Driven Specification Testing in Regression Models,” Université Paris 6.
Guerre, E. and P. Lavergne (2002): “Optimal Minimax Rates for Nonparametric Specification Testing in Regression Models,” Econometric Theory, 18(5), 1139–1171.
Hall, P. (1984): “Central Limit Theorem for Integrated Square Error of Multivariate Nonparametric Density Estimators,” Journal of Multivariate Analysis, 14, 1–16.
Hall, P. and J. L. Horowitz (1996): “Bootstrap Critical Values for Tests Based on Generalized-Method-of-Moments Estimators,” Econometrica, 64, 891–916.
Hansen, B. (1996): “Inference When a Nuisance Parameter is Not Identified Under the Null Hypothesis,” Econometrica, 64, 413–430.
Härdle, W. and E. Mammen (1993): “Comparing Nonparametric versus Parametric Regression Fits,” Annals of Statistics, 21, 1926–1947.
Hart, J. D. (1997): Nonparametric Smoothing and Lack-of-Fit Tests. New York: Springer-Verlag.
Hong, Y. (1993): “Consistent Testing for Heteroscedasticity of Unknown Form,” U.C. San Diego.
Hong, Y. and H. White (1995): “Consistent Specification Testing via Nonparametric Series Regressions,” Econometrica, 63, 1133–1160.
Hong-Zhy, A. and C. Bin (1991): “A Kolmogorov-Smirnov Type Statistic with Application to Test for Nonlinearity in Time Series,” International Statistical Review, 59, 287–307.
Horowitz, J. and W. Härdle (1994): “Testing a Parametric Model Against a Semiparametric Alternative,” Econometric Theory, 10, 821–848.
Horowitz, J. L. and V. G. Spokoiny (2001): “An Adaptive, Rate-Optimal Test of a Parametric Model Against a Nonparametric Alternative,” Econometrica, 69(3), 599–631.


Kiefer, J. (1959): “K-Sample Analogues of the Kolmogorov-Smirnov and Cramér-von Mises Tests,” Annals of Mathematical Statistics, 30, 420–447.
Koul, H. L. and W. Stute (1999): “Nonparametric Model Checks for Time Series,” Annals of Statistics, 27, 204–236.
Kozek, A. S. (1991): “A Nonparametric Test of Fit of a Parametric Model,” Journal of Multivariate Analysis, 37, 66–75.
Lavergne, P. and Q. H. Vuong (2000): “Nonparametric Significance Testing,” Econometric Theory, 16, 576–601.
Li, Q. (1999): “Consistent Model Specification Tests for Time Series Econometric Models,” Journal of Econometrics, 92, 101–147.
Li, Q. and S. Wang (1998): “A Simple Consistent Bootstrap Test for a Parametric Regression Function,” Journal of Econometrics, 87, 145–165.
MacKinnon, J. G. and L. Magee (1990): “Transforming the Dependent Variable in Regression Models,” International Economic Review, 31, 315–339.
Newey, W. K. (1985a): “Maximum Likelihood Specification Testing and Conditional Moment Tests,” Econometrica, 53, 1047–1070.
Newey, W. K. (1985b): “Generalized Method of Moments Specification Testing,” Journal of Econometrics, 29, 229–256.
Newey, W. K. (1990): “Efficient Instrumental Variables Estimation of Nonlinear Models,” Econometrica, 58, 809–837.
Powell, J. L., J. H. Stock and T. M. Stoker (1989): “Semiparametric Estimation of Index Coefficients,” Econometrica, 57, 1403–1430.
Prakasa Rao, B. L. S. (1983): Nonparametric Functional Estimation. Orlando: Academic Press.
Robinson, P. M. (1989): “Hypothesis Testing in Semiparametric and Nonparametric Models for Econometric Time Series,” Review of Economic Studies, 56, 511–534.
Stinchcombe, M. B. and H. White (1998): “Consistent Specification Testing With Nuisance Parameters Present Only Under the Alternative,” Econometric Theory, 14, 295–325.
Stute, W. (1997): “Nonparametric Model Checks for Regression,” Annals of Statistics, 25, 613–641.
Stute, W., W. Gonzalez-Manteiga and M. Presedo (1998): “Bootstrap Approximations in Model Checks for Regression,” Journal of the American Statistical Association, 93, 141–149.
Stute, W., S. Thies and L.-X. Zhu (1998): “Model Checks for Regression: An Innovation Process Approach,” Annals of Statistics, 26, 1916–1934.
Su, J. Q. and L. J. Wei (1991): “A Lack of Fit Test for the Mean Function in a Generalized Linear Model,” Journal of the American Statistical Association, 86, 420–426.


Tauchen, G. (1985): “Diagnostic Testing and Evaluation of Maximum Likelihood Models,” Journal of Econometrics, 30, 415–443.
Whang, Y. J. (2000): “Consistent Bootstrap Tests of Parametric Regression Functions,” Journal of Econometrics, 98, 27–46.
Whang, Y. J. (2001): “Consistent Specification Testing for Conditional Moment Restrictions,” Economics Letters, 71, 299–306.
White, H. (1994): Estimation, Inference and Specification Analysis. New York: Cambridge University Press.
Wooldridge, J. M. (1990): “A Unified Approach to Robust, Regression-Based Specification Tests,” Econometric Theory, 6, 17–43.
Zheng, X. (1996): “A Consistent Test of Functional Form via Nonparametric Estimation Techniques,” Journal of Econometrics, 75, 263–289.
Zheng, X. (1998): “A Consistent Nonparametric Test of Parametric Regression Models under Conditional Quantile Restrictions,” Econometric Theory, 14, 123–138.
