Riemannian Consumers

François Gardes
Paris School of Economics, Université Paris I Panthéon-Sorbonne¹

Very preliminary version, February 2008

Abstract

The difference observed between the social distribution of consumer expenditures and their change over time is modelled using Riemannian geometry. Social distribution is measured along the geodesics of Riemannian surfaces, while changes over time correspond to movements along the tangents of these Riemannian surfaces. The Riemannian curvature of the consumption space is shown to be non-null for the Polish consumers surveyed in a four-year Polish panel. This implies that the usual econometric methods, which rely on a single metric over the whole consumption space, are inadequate to estimate geodesics on the Riemannian surface. In order to propose an alternative, we define a synthetic time axis in the space of the variables which are observed in cross-section. Considering the relative position of two individuals along this time dimension allows us to estimate equations of geodesics. Also, an instrumentation using this synthetic time axis proves to be very efficient compared with the usual instrumentation for dynamic models on panel data.

¹ Maison des Sciences Economiques, 106-112 Boulevard de l’Hôpital, 75647 Paris cedex 13, France. This paper has benefited from remarks by seminar participants at the University of Québec in Montréal and at Caen, Paris X and Strasbourg Universities, and by S. Bazen, F. Laisney, P. Merrigan and C. Villa. The author thanks Professor Gorecki and Christophe Starzec for allowing him to use the Polish panel data set.

Introduction

Economic relationships do not always appear to be the same in time-series and in cross-section data, or in the short run compared with the long run. This last difference is generally related either to biases in the estimation of long-term relationships, or to specification biases due to the dynamic structure of the model. The difference between cross-section and time-series estimates, recognized early in the literature (see for example the discussion in Malinvaud’s Econometric Methods), is less well accepted by the profession, as it implies abstaining from making dynamic inferences from cross-section estimation (differences between agents observed in the same period provide different information from changes over time for the same individual or the same population). This difference was first attributed to aggregation biases, at a time when panels of individuals were not available. Specification biases (whenever the estimates performed on surveys do not take into account dynamic behavior, such as habits or addiction in consumption functions) and different effects of errors in variables in the two dimensions have also been advanced as potential explanations. With panel data, it is possible to take these difficulties into account and to show that differences still appear between estimates in the temporal and spatial dimensions.

In this paper, we consider that spatial relationships correspond to geodesics in a Riemannian space, while temporal relationships are modelled as movements along tangents to these surfaces. Surprisingly, Riemannian geometry has seldom been applied to economic problems (it has, however, recently appeared in theoretical statistics). We show that tensor algebra allows us to analyze the difference in estimations in both dimensions, and to compute the curvature of the Riemannian space. When this curvature is non-zero (i.e. when the integrability conditions, which make it possible to define a common Euclidean metric for all points of the space, do not hold), the space is no longer Euclidean, and the shortest routes between two points are not straight lines but geodesics, which may bring about new econometric problems. Indeed, agents situated on a Riemannian space are supposed to follow optimal paths (along their life cycle) which are geodesics over the surface. Thus observing the geodesics gives some information on the time structure of cross-sectional data. Geodesics are defined by a system of differential equations whose estimation makes it possible to recover some of the geometric properties of the Riemannian space. Therefore, considering these equations may make it possible to estimate dynamic models using only cross-sectional data.

The logic of the first part of the article is as follows. First, empirical differences between cross-section and time-series estimates are discussed and used to demonstrate the dependency of shadow prices (corresponding to non-monetary resources and constraints concerning the agents, which are not observed in the data) on the agents’ permanent characteristics. Second, this relationship is unlikely to be constant across the population, so that a Riemannian framework is required to model differences between cross-section and time-series relationships. Third, the Riemannian scalar curvature of the consumption space is shown to be non-null, so that a unique metric cannot be defined over the Riemannian space, which leads us to discuss the application of the usual econometric methods to cross-sections.


Section 1 presents empirical evidence of biases in the estimation of economic relationships on cross-section data, compared to time-series. Section 2 proposes a Riemannian framework to model consumer behavior. Section 3 computes the Riemannian curvature of the Polish food expenditures space. Section 3 discusses the choice of an optimal linear combination of time indicators on cross-sections and defines the method to correct for generation effects, while section 4 compares this estimation to the usual time-series estimation of dynamic models using a four-year Polish panel. All of these models are derived for household expenditure data but can easily be applied to other types of statistics.

Section I: Cross-section versus time-series estimation

1.1. Empirical evidence

The recent literature on relative income effects and social interactions can be related to the old problem of the difference between cross-section and time-series estimation. The social distribution of consumption can be estimated by comparing individuals who, ceteris paribus, are at different positions in the income distribution (i.e. have different relative incomes) in the same survey. By contrast, the change in expenditure due to income changes can be measured for the same agent (or the same type of agent) between two periods, thus using individual time-series (or pseudo-panel) data. Any discrepancy between the estimated income elasticities in cross-section and time-series means that similar agents (with respect to all characteristics except income) with different relative incomes are not identical, as the income position generates relative income effects which are due either to social interactions (as supposed by Duesenberry), or to latent variables related to the income position (for instance parents’ characteristics or liquidity constraints, which are not observed in family expenditure surveys).

Such differences between cross-section and time-series estimates of demand functions have been observed in recent empirical work: for instance, Gardes et al. (2002) analyse the bias in income and total expenditure food elasticities estimated on panel or pseudo-panel data caused by measurement error and unobserved heterogeneity. Our results (see Table 1) suggest that unobserved heterogeneity imparts a downward bias to cross-section estimates of income elasticities of at-home food expenditure and an upward bias to estimates of income elasticities of away-from-home food expenditure. Moreover, the magnitude of the differences in elasticity estimates across methods of estimation is roughly similar in U.S. and Polish expenditure data: for instance, despite some differences between the estimations (see Table 1), the relative income elasticity of food at home is around 0.2 in cross-section, based on a number of different methods with PSID data (1984-1987), while the time-series estimates (within or first differences) are around 0.4. A Hausman test strongly rejects the equality of these cross-section and time-series estimates. In a Polish panel (1987-1990), the total expenditure elasticities for at-home food are much larger than those based on PSID data. Higher elasticities are to be expected for a country in which food’s share in total expenditures is three times higher than in the U.S. Cross-section elasticities are estimated to be around 0.5, while the time-series estimate is around 0.8. By contrast, the cross-section elasticity for food away from home is estimated to be around 1 in the U.S. while the time-series elasticity is around 0.4 (similar results are obtained for Poland, although the estimates are less accurate, due to the absence of food away from home for almost all Polish households during this period).

Similar results have been obtained on pseudo-panels of French and Canadian surveys (Cardoso et al., Gardes et al., 1996). This research has shown that endogeneity biases exist in the cross-section estimates for half of the commodities. For example, the cross-section income effect is significantly greater for most expenditures on services, while changes in expenditure on housing over time are more strongly related to income changes than are the differences between two households in the same survey. One way to explain these differences is to consider the relationship between the relative position of the agent in the income distribution and his non-monetary resources (such as time) or the presence of constraints (such as subsistence constraints) on choice.

Table 1. Relative Income Elasticity of Food Expenditures

                                      PSID (U.S.)       Polish panel
Period                                1984-87           1987-90
                                      CS      TS        CS      TS
Income elasticity
  (i)  Food at home                   0.19    0.38      0.49    0.76
  (ii) Food away from home            1.00    0.39      1.22    0.36
Direct price elasticity               -0.19             -0.38
Elasticity of the shadow price
relative to income
  (i)  F.H.                           1.00              0.71
  (ii) F.A.                           -3.13             -4.78
Population size                       2430              3630
Prices                                no                by region and social category

Reference: Gardes, Duncan, Gaubert and Starzec (2005), Tables 1 and 2. Price elasticities are calibrated, following Frisch’s proposal, as minus half of the corresponding T.S. income elasticities.

1.2 Measuring Shadow Prices

Suppose that the monetary price $p^m$ and a shadow price $\pi$, corresponding to non-monetary resources and to constraints faced by the households, are combined into a complete price. Expressed in logarithmic form, we have:

$$p^c = p^m + \pi.$$

Consider two estimations of the same equation:

$$x_{iht} = Z_{ht}\beta_i + p^c_{ht}\gamma_i + u_{iht} \qquad (1)$$

for good $i$ ($i = 1$ to $n$), individual $h$ ($h = 1$ to $H$) in period $t$ ($t = 1$ to $T$), with $Z_{ht} = (Z_{1ht}, Z_{2ht})$. These estimations are carried out on cross-section and time-series data using the same dataset. Set $u_{iht} = \alpha_{ih} + \varepsilon_{iht}$, where $\alpha_{ih}$ is the specific effect which contains all of the permanent components of the residual for individual $h$ and good $i$. As discussed by Mundlak (1970), the cross-section estimates can be biased by a correlation between the explanatory variables $Z_{ht}$ and the specific effect. This can result from latent permanent variables (such as an event during childhood, parents’ characteristics, or permanent wealth) which are related to some of the explanatory cross-section variables $Z_{ht}$: for instance, the relative income position of the household can be related to its wealth or its genetic inheritance.

Consider the time average of the vector of explanatory variables $Z_{ht} = (z_{kht})_{k=1,\dots,K_1}$, obtained by applying the Between matrix: $BZ_{ht} = \{(1/T)\sum_t z_{kht}\}_{k=1,\dots,K_1}$. If the specific effect is correlated with this time average, $\alpha_{ih} = BZ_{ht}\delta_i + \eta_{ih}$, the correlation $\delta_i$ is added to the parameter $\beta_i$ of these variables in the time-average estimation:

$$Bx_{iht} = BZ_{ht}(\beta_i + \delta_i) + \eta_{ih} + B\varepsilon_{iht},$$

so that the between estimates are biased. The difference between the cross-section and the time-series estimates amounts to $\delta_i$.

Let us now assume that the shadow price $\pi_{iht}$ of good $i$ for household $h$ in period $t$ depends on variables $Z_{1ht}$, which also appear in the consumption function for good $i$: $x_{iht} = g_i(p_{ht}, Z_{ht}, S_{ht}) + u_{iht}$, with $p_{ht}$ the vector of prices $p_{jht}$ containing (if it exists) an unknown shadow component $\pi_{jht}$, and $S_{ht}$ the vector of all other determinants. We now assume that only the monetary component of prices changes over time (the shadow component being related to permanent variables), while the different agents observed in the cross-section survey are characterized by different unobserved shadow prices (corresponding to individual non-monetary resources and constraints).
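The between-estimator bias described above can be illustrated with a small simulation. This is only a sketch on synthetic data: the sample sizes, the values of `beta` and `delta`, the error variances and the function name are assumptions chosen for illustration, and the price term of equation (1) is left out to keep the example short. The specific effect is generated as $\alpha_{ih} = BZ_{ht}\delta_i + \eta_{ih}$, so the between slope should be close to $\beta_i + \delta_i$ and the within slope close to $\beta_i$.

```python
import numpy as np

def between_within_slopes(H=2000, T=4, beta=0.5, delta=0.3, seed=0):
    """Simulate x_ht = beta*z_ht + alpha_h + eps_ht, alpha_h = delta*mean_t(z_ht) + eta_h.

    All parameter values are illustrative assumptions, not estimates from the paper.
    """
    rng = np.random.default_rng(seed)
    # Explanatory variable: permanent component plus transitory component.
    z = rng.normal(size=(H, 1)) + rng.normal(size=(H, T))
    z_bar = z.mean(axis=1)                                  # Between transform B z
    alpha = delta * z_bar + rng.normal(scale=0.5, size=H)   # correlated specific effect
    x = beta * z + alpha[:, None] + rng.normal(scale=0.5, size=(H, T))
    x_bar = x.mean(axis=1)

    # Between estimator: OLS on individual time averages.
    zb = z_bar - z_bar.mean()
    xb = x_bar - x_bar.mean()
    b_between = (zb @ xb) / (zb @ zb)

    # Within estimator: OLS on deviations from individual time averages.
    zw = (z - z_bar[:, None]).ravel()
    xw = (x - x_bar[:, None]).ravel()
    b_within = (zw @ xw) / (zw @ zw)
    return b_between, b_within

b_between, b_within = between_within_slopes()
print(f"between slope: {b_between:.3f}   within slope: {b_within:.3f}")
```

With these assumed values the gap between the two slopes is an estimate of `delta`, which is the quantity the text interprets through shadow prices.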
On time-series (for instance in first differences between periods), equation (1) becomes:

$$x_{iht} = Z_{ht}\beta_i + p^m_{iht}\gamma_i + u_{iht},$$

while on cross-section, supposing that the price effect $\gamma_i$ and the monetary prices are the same in both dimensions, it is:

$$x_{iht} = Z_{ht}\beta'_i + u'_{iht} = Z_{ht}\beta_i + \pi_{iht}\gamma_i + u_{iht}$$

with obvious notations. Thus, the difference between the two estimations is:

$$Z_{ht}\delta_{1i} = Z_{ht}\theta_1\gamma'_i + (S_{ht}\theta_2 + \lambda_{ih} + \mu_{iht})\gamma_i,$$

which makes it possible to calculate the set of parameters $\theta_1$ after calibrating the price effect measured by $\gamma_i$. The marginal propensity to consume with respect to $Z_{1ht}$, when considering the effect of the shadow prices $\pi_{jht}$ on consumption, can be written as:

$$\frac{dx_{iht}}{dZ_{ht}} = \frac{dg_i}{dZ_{ht}} + \sum_j \frac{dg_i}{d\pi_{jht}}\,\frac{d\pi_{jht}}{dZ_{ht}}.$$

The second term will differ between cross-section and time-series because of the correlation of the shadow price with the endogenous variables $Z_{ht}$. So, comparing two different households surveyed in the same period, this bias adds to the direct unbiased consumption propensity with respect to $Z_{ht}$, as estimated on time-series data. For instance, the influence of the household head’s age cohort or income may differ in cross-section and time-series estimations if the shadow prices depend on cohort effects or on the relative income position of the agent (note that the same effect may occur with respect to monetary prices). The term $\sum_j (dg_i/d\pi_{jht})(d\pi_{jht}/dZ_{ht})$ above can be used to reveal the variation of shadow prices over $Z_{1ht}$, $d\pi_{jht}/dZ_{ht}$, since it can be computed by solving a system of $n$ linear equations after having independently estimated the price marginal propensities $dg_i/d\pi_j = \gamma_{ij}$. We can also consider only the direct effect of the variables $Z_{ht}$ through the price of good $i$, $\gamma_{ii}\,d\pi_i/dZ$, so that:

$$\frac{d\pi_i}{dZ} = \frac{\beta_i^{(c.s.)} - \beta_i^{(t.s.)}}{\gamma_{ii}}. \qquad (2)$$

The price effect $\gamma_{ii}$ is supposed to be the same for monetary and shadow prices. Thus, the change in the shadow price between two periods can be written as: $d\ln\pi_{iht} = \sum_k (d\pi_i/dz_k)\,dz_{kht}$. Under homogeneity (of degree $m$) of shadow prices over variables $Z_{1ht}$, the shadow prices can be computed as $\ln\pi_{ih} = m\sum_k (d\pi_i/dz_k)\,z_{kht}$. However, this homogeneity assumption is quite strong, and we will prefer to compute only the change in the log shadow prices².

² This model is presented more thoroughly and applied to rationality tests in Diaye et al. (2001).

The income elasticities of the shadow prices of food at home and food away from home expenditures are computed in Table 1 for the PSID and the Polish panel (using equation 2, and assuming that direct price elasticities are minus one half of the corresponding income elasticities). These estimated parameters are remarkably similar in both countries: positive and at most one for food at home, so that the full price of food at home is greater for richer than for poorer households. One interpretation is that rich people are time constrained and have a larger opportunity cost for the additional time spent on food at home compared to food away. This difference may be thought to be greater in the USA than in Poland. By contrast, the income elasticity of the full price for food away from home is negative, and of the same magnitude in both countries.

In our analysis, the relation between cross-section and time-series estimates, modelled by shadow prices, is supposed to be linear over the whole distribution. It is however likely that the derivatives $d\pi_{jht}/dZ_{1ht}$ also depend on individual characteristics. This is for instance the case in Barten’s model (1964) where relative prices depend on the family structure. This local dependency requires a geometric characterization of the consumer space.
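The shadow-price elasticities reported in Table 1 can be recomputed directly from equation (2) and the calibration of the direct price elasticity as minus one half of the time-series income elasticity. The sketch below uses only the CS and TS figures printed in Table 1; the function and dictionary names are illustrative.

```python
def shadow_price_elasticity(beta_cs, beta_ts):
    """Equation (2), with the calibration gamma_ii = -beta_ts / 2 stated under Table 1."""
    gamma_ii = -0.5 * beta_ts
    return (beta_cs - beta_ts) / gamma_ii

# (cross-section, time-series) income elasticities taken from Table 1.
table1 = {
    ("PSID", "food at home"): (0.19, 0.38),
    ("PSID", "food away from home"): (1.00, 0.39),
    ("Polish panel", "food at home"): (0.49, 0.76),
    ("Polish panel", "food away from home"): (1.22, 0.36),
}

results = {
    key: round(shadow_price_elasticity(cs, ts), 2)
    for key, (cs, ts) in table1.items()
}
for (panel, good), value in results.items():
    print(f"{panel:13s} {good:20s} {value:6.2f}")
```

The four values obtained this way match the "Elasticity of the shadow price relative to income" rows of Table 1 (1.00 and 0.71 for food at home, -3.13 and -4.78 for food away from home).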


Section 2. The Riemannian Geometry of consumption

2.1. Consider the set of $n$ variables consisting of the $n_1$ expenditures $x_{ih}$ and the $n_2$ characteristics $z_{kh}$ of an economic agent $h$. This set is at least of dimension $n_2$ whenever the $x_i$ depend on the $z_k$. The variables $x$ and $z$ define a point $M$ in $R^n$, with $n = n_1 + n_2$. A cross-section consists of $N$ such points, one for each agent surveyed; a time-series consists of the dynamic configuration of $T$ points for each agent. We suppose that $x_i$ is independent of $x_j$ for all $j \neq i$ and that the characteristics are exogenous ($\partial z_k/\partial z_{k'} = 0$, $k \neq k'$).

Suppose $N$ is large and the observations $(x, z)$ describe a smooth surface over $R^n$. At each point, we define a tangent space by means of the gradient of $x$ at this point. As this gradient changes from one point $M$ to another infinitely close point $M + dM$, the derivative corresponding to two close points on the surface will differ from the gradient at each point. Thus, the gradient at a point is related to the time variations of the variables, while the derivative over the surface (named the absolute derivative) describes the cross-section variations. Riemannian geometry consists in connecting together the tangent surfaces corresponding to close points by linear transformations, so that the Riemannian surface can be locally represented by a Euclidean space, the metric of this space defining the scalar product and the distance between two points on the tangent surface. If the integrability conditions are satisfied, these Euclidean metrics can be embedded in a Euclidean metric for the whole Riemannian space, which becomes a Euclidean space. In this case the Riemannian curvature is zero. Conversely, if the curvature differs from zero, the space cannot be considered as Euclidean and one cannot define a Euclidean metric over the whole space.

We consider that $(x, z)$ pertain to an $n$-dimensional manifold $V_n$, associated with a domain $E_n^*$ of $R^n$ (the atlas at point $(x, z)$) by a $C^p$ diffeomorphism ($p \geq 2$): $m = (y^1, \dots, y^n) \in E_n^* \subset R^n \rightarrow M = f(m) \in V_n$. Here the $y^i$ are the co-ordinates of $M$ on the chart $E_n^*$. This means that every neighbourhood of a point $M_0$ in $V_n$ can be represented by the system of $n$ co-ordinates $(y^i)$. $V_n$ is said to be continuously differentiable at order $p \geq 2$ if other systems of co-ordinates, $(u)$, are related to $(y)$ by a $C^2$ diffeomorphism $u^i = g(y)$. A natural basis $(e_i)$ corresponds to these co-ordinates for point $M_0$ such that $OM_0 = \sum_i y^i e_i = y^i e_i$, using the Einstein convention (summation runs over indexes which appear as both a subscript and a superscript). So $\partial M/\partial y^i = e_i$. As $E_n^*$ is Euclidean, we can define its metric at each point $M_0$ by the quadratic form (2): $ds^2 = g_{ij}\,dy^i dy^j$ with $g_{ij} = e_i \cdot e_j$. The $g_{ij}$ are arbitrary continuously differentiable (at order $p$) functions of the $y^i$. So far, $V_n$ is a topological manifold covered by compatible $C^p$ co-ordinate charts, with a “Riemannian metric” $g$ at each point which is any smooth positive definite matrix³.

³ Note that this is not really a more general setting than $R^n$, as proved by the Nash theorem (every abstract Riemannian manifold can be isometrically embedded in some $R^m$), but the calculations are easier in $V_n$.

We can now define the tangent surface at $m_0$ (the point of $E_n^*$ corresponding to $M_0 \in V_n$) by the equation:

$$m_0 m = [(y^i - y^i_0) + \Psi^i(y^i - y^i_0)]\,e_i$$

with $\Psi^i$ a second-order function with respect to $(y^i - y^i_0)$. This equation defines the first-order representation in the neighbourhood of $M_0$. Hence the system of co-ordinates $y^i$ also applies to point $m$ in the neighbourhood of $M_0$. It can be shown (Lichnerowicz, § 72) that any change in the system of co-ordinates from $(y)$ to another system $(y')$, with $A^k_i = \partial y^k/\partial y'^i$, changes the metric by the formula $g'_{ij} = A^k_i A^l_j\, g_{kl}$, so that the metric does not truly depend on the system of co-ordinates: it is an intrinsic notion. A second-order representation of the tangent surface can also be defined in $M_0$ so that (Lichnerowicz, § 74) the Euclidean metric on $E_n^*$ and the Riemannian metric on $V_n$ have the same coefficients with the same derivatives (they are said to be connected metrics for $y^i = y^i_0$).

From point $M$ in $V_n$ to another point $M + dM$ (with $dM = (dy^i)$) which is infinitely close to $M$, the basis $(e_i)$ changes according to the formula:

$$e_i + de_i = e_i + \omega^j_i e_j \qquad (3)$$

with $\omega^j_i$ defined as a linear function of the changes $dy^k$ of the co-ordinates of $M$:

$$\omega^j_i = \Gamma^j_{ki}\,dy^k \qquad (4)$$

This change is related to the change of the metric, since $g_{ij} = e_i \cdot e_j$. Thus, the metric at each point can be written: $g_{ij} = (\partial M/\partial y^i) \cdot (\partial M/\partial y^j)$. If the metric $g$ satisfies the integrability conditions, there exists a system of co-ordinates in $E_n^*$ such that the metric in $E_n^*$ takes the form of equation (3) on all points of $E_n^*$. This means that $g$ is a continuous function on $E_n^*$. These conditions, which apply to the differential equations relating the two points and the associated basis ($dM = dy^i e_i$ and $de_i = \omega^j_i e_j$: Lichnerowicz, equations 60.2 and 60.3), require the symmetry of the second derivatives of $M$ and $e_i$, and can be written: $\Gamma^j_{ki} = \Gamma^j_{ik}$ for all $i, k$ on all points $(y^i)$. They allow us to calculate at each point the $n^3$ scalars $\Gamma^j_{ki}$ knowing the $n(n+1)/2$ values $g_{ij}$ at each point.

2.2. Consider now the change from $M$ to $M + dM$. The natural basis $(e_i)$ assigned to point $M$ changes to $(e_i + de_i)$ at point $M + dM$. Thus, any vector $v = v^i e_i$ is differentiated as

$$dv = dv^i e_i + v^i de_i = dv^i e_i + v^h \omega^i_h e_i.$$

So, the components of the vector $dv$ can be written $\nabla v^i = dv^i + \omega^i_h v^h$ and are called the absolute differential of $v^i$. The corresponding partial derivatives⁴ are:

$$\nabla_k v^i = \partial_k v^i + \Gamma^i_{kh} v^h \qquad (5)$$

This defines the Levi-Civita connection between the tangent spaces. We identify the absolute derivative, which includes the change of co-ordinates along the tangent surface and the change of the natural basis, with the cross-section marginal propensities on the Riemannian surface. These marginal propensities correspond to the overall change between two points $M$ and $M + dM$ on the surface, while the derivatives $\partial_k v^i$ are associated with the change of the co-ordinate $v^i$ along the tangent surface (for a constant basis). The estimation of the cross-section and time-series parameters for each point of the surface thus allows us to calculate $\Gamma^i_{kh} v^h$. Supposing that these parameters are constant over a neighbourhood of $M_0$ allows us to calculate the coefficients $\Gamma^i_{kh}$ as the parameters of the co-ordinates $v^h(M)$ for $M$ in this neighbourhood. In the application, we consider $v = (x, z)$ as containing the expenditures $x_i$ of households on $n$ commodities, and the characteristics $z_k$ of the household, such as its income (or total expenditure), family size (measured in units of consumption), location and so on.

⁴ The absolute (or covariant) derivatives are the components of a tensor, i.e. they change according to certain formulae when the basis changes, while the $\partial_k$ cannot be considered as tensors. Thus, only the $\nabla_k$ correspond to the change from $M$ to $M + dM$ on the Riemannian surface.

Consider now Cartan’s quasi-parallelogram: an initial change from $M_0$ to $M_1 = M_0 + dM_0$ is followed by a second change from $M_0 + dM_0$ to $M'_1 = M_0 + dM_0 + \delta(M_0 + dM_0)$. The two derivatives $d$ and $\delta$ can be commuted: $M_0 \rightarrow M_2 = M_0 + \delta M_0 \rightarrow M'_2 = M_0 + \delta M_0 + d(M_0 + \delta M_0)$. Cartan shows that points $M'_1$ and $M'_2$ coincide if the Christoffel symbols are symmetrical:

$$d\delta m_0 - \delta d m_0 = (\Gamma^h_{ki} - \Gamma^h_{ik})\,dy^k \delta y^i e_h = 0 \iff \Gamma^h_{ki} = \Gamma^h_{ik},$$

which is the integrability condition for points, with $m_0$ being the point corresponding to $M_0$ on the related Euclidean surface. As $M$ varies on the Riemannian surface, the basis also changes and undergoes a rotation which is measured by the Riemann-Christoffel Tensor of Curvature. Suppose the two differentiations defined by Cartan are applied one after another. The difference in the change of the basis according to the order of differentiation can be computed as $d\delta e_i - \delta d e_i = R^h_{irs}\,dy^r \delta y^s e_h$, and this difference is null if the Tensor of Curvature vanishes:

$$d\delta e_i - \delta d e_i = R^h_{irs}\,dy^r \delta y^s e_h = 0 \iff R^h_{irs} = \partial_r\Gamma^h_{is} + \Gamma^h_{rl}\Gamma^l_{is} - \partial_s\Gamma^h_{ir} - \Gamma^h_{sl}\Gamma^l_{ir} = 0 \qquad (6)$$

$R^h_{irs}$ is called the Riemann-Christoffel Tensor of Curvature. This condition defines integrability for vectors, i.e. for the basis, and the absence of torsion in the Riemannian space. The Riemann-Christoffel Tensor of Curvature can also be defined in terms of the covariant derivatives of the Christoffel symbols using the Ricci identities, $R^h_{irs} v^i = \nabla_r\nabla_s v^h - \nabla_s\nabla_r v^h$, so that:

$$R^h_{irs} = \nabla_r\Gamma^h_{is} - \nabla_s\Gamma^h_{ir} \qquad (7)$$

The test of the Riemannian versus the Euclidean structure of the space relies on the nullity of the scalar Riemann curvature (which is obtained by contraction⁵ of the curvature tensor): the space is Euclidean only if the scalar Riemann curvature is zero at each point.

⁵ Contraction consists in summing a tensor over an index which appears in both upper and lower position: $R_{ij} = R^h_{ihj}$.

2.3. We can equate the cross-section marginal propensities with the covariant derivatives $\nabla_k v^i$ and the between-period propensities with the time derivatives $\partial_k v^i$; we therefore compute the Christoffel symbols from $\nabla_k v^i - \partial_k v^i = \Gamma^i_{kh} v^h$. The covariant derivatives of these symbols, $\nabla_s\Gamma^i_{kh} = \Gamma^i_{kh;s}$, can be obtained by estimating the equation: $\Gamma^i_{kh} = \Gamma^i_{kh;s} v^s + \gamma^i_{kh} + \varepsilon^i_{kh}$. The system of these two equations reduces to:

$$\nabla_k v^i - \partial_k v^i = \Gamma^i_{kh;s}\,v^h v^s + \gamma^i_{kh} v^h + \varepsilon^i_k \qquad (8)$$
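As a rough illustration of the first step behind equation (8), the sketch below recovers constant Christoffel coefficients from the gap between cross-section and time-series propensities by least squares, using the relation "gap = Γ · v" for one good and one derivative index. Everything here is synthetic and assumed for illustration: the 60 cells mimic the number of groups mentioned later in the paper, while the cell co-ordinates, the "true" coefficients and the noise level are invented, and the quadratic terms of equation (8) are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

n_cells = 60  # e.g. 3 education levels x 4 locations x 5 age groups
# Hypothetical cell co-ordinates v = (ln y, ln uc); values are invented.
v = np.column_stack([
    rng.normal(8.0, 0.5, n_cells),   # ln y: log total expenditure
    rng.normal(0.8, 0.3, n_cells),   # ln uc: log consumption units
])

# Assumed (not estimated) Christoffel coefficients for one good i and one index k.
gamma_true = np.array([0.04, -0.10])

# Gap between cross-section and time-series propensities in each cell,
# generated as Gamma . v plus noise.
gap = v @ gamma_true + rng.normal(scale=0.005, size=n_cells)

# Least-squares estimate of the Christoffel coefficients (no intercept,
# as in the relation  grad_k v^i - d_k v^i = Gamma^i_{kh} v^h).
gamma_hat, *_ = np.linalg.lstsq(v, gap, rcond=None)
print("assumed:  ", gamma_true)
print("recovered:", np.round(gamma_hat, 4))
```

Adding the products of co-ordinates as extra regressors would give the quadratic form of equation (8); the linear version is kept here only because it is the shortest instance of the idea.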

Another method of estimating the differences between the absolute and partial differentials of $v^i$ would be to use the formulae defining geodesics in Riemannian space: along geodesics acceleration is null, so that (Morgan, p. 59; Lichnerowicz, equation 76.2):

$$\frac{d^2 v^i}{dt^2} + \Gamma^i_{kh}\,\frac{dv^h}{dt}\,\frac{dv^k}{dt} = 0 \qquad (9)$$
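To make the geodesic equation (9) concrete, the following sketch integrates it numerically on a textbook surface, the unit sphere with co-ordinates (theta, phi), rather than on the consumption space. The choice of surface, the Runge-Kutta scheme, the step size and the starting values are all assumptions made for illustration. Because acceleration is null along a geodesic, the squared speed $g_{ij}\,\dot v^i \dot v^j$ should stay constant, which the code checks.

```python
import math

def geodesic_rhs(state):
    """Right-hand side of equation (9) on the unit sphere, as a first-order system."""
    theta, phi, dtheta, dphi = state
    gamma_theta_phiphi = -math.sin(theta) * math.cos(theta)  # Gamma^theta_{phi phi}
    gamma_phi_thetaphi = math.cos(theta) / math.sin(theta)   # Gamma^phi_{theta phi}
    return (
        dtheta,
        dphi,
        -gamma_theta_phiphi * dphi * dphi,
        -2.0 * gamma_phi_thetaphi * dtheta * dphi,  # factor 2: symmetric lower indices
    )

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, factor):
        return tuple(b + factor * s for b, s in zip(base, slope))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(state, k1, dt / 2))
    k3 = geodesic_rhs(shifted(state, k2, dt / 2))
    k4 = geodesic_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def integrate(state, dt=0.001, steps=1000):
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state

def speed_squared(state):
    """g_ij dv^i/dt dv^j/dt with the sphere metric diag(1, sin(theta)^2)."""
    theta, _, dtheta, dphi = state
    return dtheta ** 2 + math.sin(theta) ** 2 * dphi ** 2

start = (math.pi / 3, 0.0, 0.2, 1.0)          # arbitrary starting point and velocity
end = integrate(start)
equator_end = integrate((math.pi / 2, 0.0, 0.0, 1.0))  # start along the equator

print("speed^2 at start:", speed_squared(start))
print("speed^2 at end:  ", speed_squared(end))
print("equator run ends at theta =", equator_end[0])
```

A path started along the equator stays on it, since the equator is a great circle and therefore a geodesic of the sphere.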

2.4. The Riemannian structure of the consumption space can then be described as follows:

(i) $\Gamma^i_{kh} \neq 0$ ⇔ estimated cross-section consumption behavior differs from time-series behavior⁶.
(ii) $\Gamma^i_{kh} = \Gamma^i_{hk}$ ⇔ path independence of consumption.
(iii) $R^h_{irs} = 0$ ⇔ the differential equations $de_i = \omega^j_i e_j$ are integrable, i.e. there is no torsion for the basis attached to the different points in the space.
(iv) The sign of $R$ indicates the sign of the Riemannian scalar curvature at each point.

In Riemannian space, the metric is attached to each point by the equation which defines locally the distance between two arbitrarily close points $M$ and $M' = M + dM$: $d(M, M')^2 = g_{ij}\,dv^i dv^j$. Christoffel symbols are related to the metric $g$ by the formulas (Delachet, II, chap. 1 and Lichnerowicz, chap. 4, III):

$$g_{ih}\Gamma^h_{kj} + g_{jh}\Gamma^h_{ki} = \partial_k g_{ij}, \quad \text{with } \partial_k g_{ij} = \partial g_{ij}/\partial v^k.$$

Christoffel symbols thus indicate second-order derivatives and can be used to calculate the curvature of the space. These formulae allow us to calculate $\Gamma$ in terms of the metric:

$$\Gamma_{kji} = g_{jh}\Gamma^h_{ki} = \tfrac{1}{2}\,(\partial_k g_{ij} + \partial_i g_{jk} - \partial_j g_{ki}).$$

Therefore, the curvature of the space is defined by the metric which is attached to each point, and which can be represented as a square matrix over the indices $(i, j)$ of the different variables. This metric describes in a sense how a particular situation $M$ is related to its close neighbours. It defines the situations, in terms of variables $v$, towards which the individual will move when the determinants of its consumption change. The set of these situations is the geodesic passing through $M$. Thus, the metric, which plays the role of gravitation in physics (describing the properties of the space by the gravity attached to each of its points), can be interpreted as the full cost associated with each situation, and in this sense it generalizes the full price defined in Section 1.2.
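The formula giving the Christoffel symbols from the metric, and the curvature test of Section 2.2, can be checked numerically on two standard examples that are not taken from the paper: the Euclidean plane in polar co-ordinates and the unit sphere. The sketch below is a finite-difference illustration under these assumptions (function names, step sizes and evaluation points are arbitrary). It shows that non-zero Christoffel symbols do not by themselves imply curvature: the polar plane has non-zero symbols but zero scalar curvature, while the unit sphere has scalar curvature 2 under the sign convention used in the code.

```python
import numpy as np

def christoffel(metric, y, h=1e-5):
    """Return gam[h, k, i] = Gamma^h_{ki} from central differences of the metric."""
    y = np.asarray(y, dtype=float)
    n = y.size
    dg = np.empty((n, n, n))          # dg[k, i, j] = d g_ij / d y^k
    for k in range(n):
        step = np.zeros(n)
        step[k] = h
        dg[k] = (metric(y + step) - metric(y - step)) / (2.0 * h)
    # First kind: first[k, j, i] = 1/2 (d_k g_ij + d_i g_jk - d_j g_ki)
    first = 0.5 * (dg + dg.transpose(2, 1, 0) - dg.transpose(1, 0, 2))
    return np.einsum("hj,kji->hki", np.linalg.inv(metric(y)), first)

def scalar_curvature(metric, y, h=1e-3):
    """Contract R^h_{irs} = d_r G^h_{is} - d_s G^h_{ir} + G^h_{rl} G^l_{is} - G^h_{sl} G^l_{ir}."""
    y = np.asarray(y, dtype=float)
    n = y.size
    gam = christoffel(metric, y)
    dgam = np.empty((n, n, n, n))     # dgam[r, h, k, i] = d_r Gamma^h_{ki}
    for r in range(n):
        step = np.zeros(n)
        step[r] = h
        dgam[r] = (christoffel(metric, y + step) - christoffel(metric, y - step)) / (2.0 * h)
    riemann = (
        np.einsum("rhis->hirs", dgam)
        - np.einsum("shir->hirs", dgam)
        + np.einsum("hrl,lis->hirs", gam, gam)
        - np.einsum("hsl,lir->hirs", gam, gam)
    )
    ricci = np.einsum("hihs->is", riemann)          # contraction R_is = R^h_{ihs}
    return float(np.einsum("is,is->", np.linalg.inv(metric(y)), ricci))

def polar_plane(y):
    r, _ = y
    return np.array([[1.0, 0.0], [0.0, r * r]])     # ds^2 = dr^2 + r^2 dtheta^2

def unit_sphere(y):
    theta, _ = y
    return np.array([[1.0, 0.0], [0.0, np.sin(theta) ** 2]])

gam_polar = christoffel(polar_plane, [2.0, 0.3])
curv_polar = scalar_curvature(polar_plane, [2.0, 0.3])
curv_sphere = scalar_curvature(unit_sphere, [1.0, 0.3])
print("Gamma^r_{theta theta} at r=2:", gam_polar[0, 1, 1])   # analytic value is -r
print("scalar curvature, polar plane:", curv_polar)
print("scalar curvature, unit sphere:", curv_sphere)
```

In the terms of the list in Section 2.4, the polar plane is a case where item (i) holds without the space being curved, which is why the test of the Euclidean structure has to rely on the curvature and not on the Christoffel symbols alone.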

⁶ Note that the difference between cross-section and time-series parameters depends on the agent’s location on the Riemannian surface, i.e. that the shadow prices are not the same for all agents.

2.5. The Riemannian curvature of the consumption space

The derivatives $\Gamma^i_{kh;s}$ of the Christoffel symbols are computed as indicated in Section 2.3 by estimating equation (8). $\nabla_k v^i$ and $\partial_k v^i$ have been estimated on the Polish panel (1987.I-1990.IV, see the Appendix for a description of the data) containing data on 3707 households. $\nabla_k v^i$ is the parameter of log income in a linear Almost Ideal Demand equation estimated in the Between dimension; $\partial_k v^i$ is the corresponding parameter after a within transformation. These parameters differ according to three education levels, four locations and five age groups, so that the variability of the cross-section and time-series parameters is not unmanageable (60 different values over 3630 households). The Riemann curvature is computed by equation (7):

$$R^h_{irs} = \partial_r\Gamma^h_{is} + \Gamma^h_{rl}\Gamma^l_{is} - \partial_s\Gamma^h_{ir} - \Gamma^h_{sl}\Gamma^l_{ir} = \nabla_r\Gamma^h_{is} - \nabla_s\Gamma^h_{ir}$$

$$= (\gamma^i_{ks;h}\nabla_r v^h + \gamma^i_{ks;r}) - (\gamma^i_{kr;h}\nabla_s v^h + \gamma^i_{kr;s}) = \gamma^i_{ks;h}\nabla_r v^h - \gamma^i_{kr;h}\nabla_s v^h$$

for $h \neq s$ and $r$, as the $\gamma$ are symmetrical in $(r, s)$. The indices $k$, $r$, $s$ are defined according to the scheme:

k       r       s       R
ln y    ln y    ln y    0
ln y    ln y    lnuc    R12
ln y    lnuc    ln y    -R12
ln y    lnuc    lnuc    0
lnuc    ln y    ln y    0
lnuc    ln y    lnuc    -R12'
lnuc    lnuc    ln y    R12'
lnuc    lnuc    lnuc    0

with ln y = logarithm of total expenditures and lnuc = logarithm of the number of consumption units. The elasticity of family size with respect to income is calibrated at the value obtained in the estimations, 0.1, and the elasticity of income with respect to family size at -0.05 (some endogeneity of labour supply with respect to family size has been observed in Poland, which may compensate for the under-indexation of social transfers on the size of the family, see Gardes-Starzec, 2003).

Table 2. Estimation Results for food at home expenditures

         All households   1st level       2nd level       3rd level
                          of education    of education    of education
R12      -0.00035         -0.00585        -0.00345        -0.00135
         (0.28)           (3.21)          (1.98)          (0.42)
R12'     -0.00525         -0.01271        -0.00663        -0.98·10⁻⁴
         (4.00)           (6.22)          (3.46)          (0.04)
N        14520            8244            5268            1008

Note: t-ratios in parentheses. Specification: $w_{iht} = \alpha_i + \sum_j \gamma_{ij}\ln p_{jt} + \beta_i \ln[m_{ht}/a(p_t)] + W_{ht}\gamma_i + u_{iht}$. The logarithm of total expenditures is instrumented. Other explanatory variables: log of household head’s age, number of consumption units (in log), relative logarithmic prices, education and location dummies, quarter dummies for each year. True prices are approximated by a Stone price index⁷: $\ln a(p_t) = \sum_i w_i \ln p_{it}$, with $w_i$ the average budget share over the population. Data: the Polish panel covering 3630 households for the period 1987.I-1990.IV. 77 households are dropped due to missing data.
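The Stone price index used in the note to Table 2 is a budget-share weighted sum of log prices. The sketch below shows the computation and the resulting log real expenditure term $\ln[m_{ht}/a(p_t)]$; the shares, prices and expenditure level are invented numbers used only to exercise the formula.

```python
import math

def stone_log_price_index(budget_shares, prices):
    """ln a(p_t) = sum_i w_i ln p_it, with w_i the average budget shares."""
    if len(budget_shares) != len(prices):
        raise ValueError("one price per budget share is required")
    if abs(sum(budget_shares) - 1.0) > 1e-9:
        raise ValueError("budget shares must sum to one")
    return sum(w * math.log(p) for w, p in zip(budget_shares, prices))

def log_real_expenditure(total_expenditure, budget_shares, prices):
    """ln[m_ht / a(p_t)], the income term of the demand equation."""
    return math.log(total_expenditure) - stone_log_price_index(budget_shares, prices)

shares = [0.5, 0.3, 0.2]     # hypothetical average budget shares
prices = [2.0, 2.0, 2.0]     # hypothetical prices, all doubled relative to a base of 1
index = stone_log_price_index(shares, prices)
real = log_real_expenditure(100.0, shares, prices)
print("ln a(p):", index)     # equals ln 2 when every price doubles
print("ln(m/a):", real)
```

When every price is multiplied by the same factor, the index moves by the log of that factor, so real expenditure is deflated proportionally.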

Estimation over the whole population in all four waves (Table 2) generates a Riemannian curvature significantly different from zero for the $k$-derivative with respect to family size. It is also significant for the $k$-derivative with respect to total expenditure in some estimations, for instance for the whole population in year 1987. When the population is split by education, the curvature is significantly different from zero for lower levels of education, and decreases as education rises. Thus, despite the similarity between the estimated parameters for income and family size (and despite the fact that these are the only two variables considered for $k$-derivatives, and that only food at home expenditures are analyzed), the Polish consumer space is not Euclidean.

The consequence is that the metric changes from one point to another: on a Riemannian space, the differential metric is $ds^2 = g_{ij}\,dv^i dv^j$, with $dv^i$, $dv^j$ the changes in variables $v^i$, $v^j$. Indeed, minimising the Riemannian distance over the whole space amounts to computing the usual Maximum Likelihood or least squares estimate, while constraining the optimization by the change of the shadow prices from one point to another: the metric at each point changes according to the non-monetary resources and the constraints which characterize this location in Riemannian space. The curvature of the Riemannian space thus indicates the heterogeneity of these conditions of choice over the population. Therefore it is not possible to recover consumption functions by minimising a unique distance over the whole space: the distance between some point $M$ and its estimate $M_1$ depends on the coefficients $g_{ij}$ which are attached to this point, if the two points are sufficiently close so that the same metric applies to them. For another point $M'$, it will be necessary to take into account the change of the metric when computing the distance from its estimate $M'_1$.
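Because the metric changes from point to point, the length of a path has to be accumulated segment by segment from $ds^2 = g_{ij}\,dv^i dv^j$. The sketch below does this on the unit sphere, which is used here only as a familiar curved surface and is not part of the paper. It compares two paths between the same end points: a path that keeps one co-ordinate fixed, and the great-circle arc, which is the geodesic. The end points, the number of segments and the function names are assumptions.

```python
import math

THETA0 = math.pi / 4      # common colatitude of the two end points
DPHI = math.pi / 2        # longitude gap between them

def to_xyz(theta, phi):
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )

A = to_xyz(THETA0, 0.0)
B = to_xyz(THETA0, DPHI)
OMEGA = math.acos(sum(a * b for a, b in zip(A, B)))   # angle between the end points

def parallel_path(t):
    """Path at constant colatitude, t in [0, 1]."""
    return (THETA0, t * DPHI)

def great_circle_path(t):
    """Great-circle arc from A to B (spherical interpolation), in (theta, phi)."""
    wa = math.sin((1.0 - t) * OMEGA) / math.sin(OMEGA)
    wb = math.sin(t * OMEGA) / math.sin(OMEGA)
    x, y, z = (wa * a + wb * b for a, b in zip(A, B))
    return (math.acos(z), math.atan2(y, x))

def path_length(path, steps=2000):
    """Sum of sqrt(g_ij dv^i dv^j), with the metric taken at each segment midpoint."""
    total = 0.0
    previous = path(0.0)
    for s in range(1, steps + 1):
        current = path(s / steps)
        dtheta = current[0] - previous[0]
        dphi = current[1] - previous[1]
        theta_mid = 0.5 * (current[0] + previous[0])
        total += math.sqrt(dtheta ** 2 + math.sin(theta_mid) ** 2 * dphi ** 2)
        previous = current
    return total

length_parallel = path_length(parallel_path)
length_geodesic = path_length(great_circle_path)
print("constant-colatitude path:", round(length_parallel, 4))
print("great-circle path:       ", round(length_geodesic, 4))
```

The path that looks straight in the co-ordinates is longer than the geodesic, which is the sense in which a single Euclidean distance in the co-ordinates is misleading on a curved space.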
Moreover, if points M and M' are not sufficiently close, the metric changes continuously along every path between them, so that it is necessary to determine the minimum path between them in the Riemannian space, i.e. a geodesic between the two points, by integrating ds between the two points. A solution would be to project the Riemannian space onto a Euclidean space, where a unique distance is defined. It is well known that every Riemannian manifold of dimension n can be embedded in a Euclidean space of greater dimension. The first embedding theorems were proved by Janet and Cartan, the Euclidean space requiring at least (n/2)(n+1) dimensions (see Ivey, Landsberg, 2000, 5.10), but the general theorem was proved by extraordinary methods by John Nash (1956): his theorem 3 realizes non-compact n-manifolds in (n/2)(n+1)(3n+11) dimensions. We propose a simple projection of the Riemannian consumption space into a Euclidean space by defining a unique metric determined by the position of the household in its life cycle. Indeed, another way of estimating the curvature of the space would be to estimate the equation of geodesics on cross-sections: whenever the curvature is non-null, the geodesics are not straight lines, because the basis changes along the geodesic. Thus, a new econometrics on Riemannian space must be defined by considering geodesics. In cross-sections, geodesics represent the shortest paths from one point to another. So, an agent can be supposed to follow a geodesic over her life cycle. The definition of a time dimension on cross-sections, discussed in the subsequent section, may permit the estimation of Christoffel symbols along geodesics. Considering households, one solution consists in taking the age of the family head (or, for firms, the number of years the firm has been active in the market) as measuring the passage of time in the survey. Thus, the priority of one variable over another is defined by the priority of the agents over which they are measured. For instance, the past and future consumption of some household h in a dynamic model (representing habit and addiction effects) can be instrumented by the expenditures of similar households (by age, education, location, family structure) one year younger or older in the survey8. One drawback is that age is correlated with cohort effects in cross-sections. We propose a formula to correct this cohort effect. We find that such instrumentation is very effective compared to usual instruments such as past and future prices.

7 We are interested in income parameters, so that this approximation is sufficient, and no correction is needed for computed price parameters (see Diaye et al., 2002, for details).

8 Note that it is generally necessary when estimating dynamic models to instrument past values of the variables, even when these past values are observed.

Section 4. The time structure of cross-sections

Causality in economics supposes the anteriority of causes over effects9 (see, for example, Hicks, 1979), which gives rise to Granger-Sims tests for the determination of one variable by another or vice-versa. These tests are usually performed on aggregate time-series, since individual panel data are generally short, due to attrition bias. Even panels of countries or industrial sectors suffer from structural changes or composition effects which make it difficult to maintain the stationarity hypothesis for all variables. This is a serious drawback, since, as Angrist and Krueger (1998) remark in the context of labor economics, the vast majority (up to 90%) of empirical estimation is now carried out using individual data. It may therefore be useful to propose causality tests and dynamic estimation methods using cross-section data, such as the Family Expenditures or Labor Force surveys. In this context, we could analyse the causality between income and family structure, between "structuring consumption", such as housing or transport expenditures, and other expenditures, or between the labor market situation of the household head and that of other household members. In surveys of firms we could consider the causality between investment and profit, or between labor conditions and union membership. Such methods would also be of use for more general economic questions, such as the modelling of preference interdependence using data from General Social Surveys. Considering households, one solution consists in taking the age of the family head (or, for firms, the number of years the firm has been active in the market) as measuring the passage of time in the survey. Thus, the priority of one variable over another is defined by the priority of the agents over which they are measured.
For instance, the past and future consumption of some household h in a dynamic model (representing habit and addiction effects) can be instrumented by the expenditures of similar households (by age, education, location, family structure) one year younger or older in the survey10. We find that such instrumentation is very effective compared to usual instruments such as past and future prices. One drawback is that age is correlated with cohort effects in cross-sections. To correct this cohort effect, we use the fact that in theories such as the life cycle, both the dependent variable and its determinants are related to the time dimension. For instance, considering savings, income per unit of consumption increases early in the life cycle, then falls as household size increases, and finally increases to a constant level at the end of the life cycle, while the savings rate also varies systematically over the life cycle. Therefore, income and age can substitute locally as concerns their influence on savings. Some types of expenditure may also be related to both the time dimension and family structure. Relations similar to those considered with the head's age can be found for each of these variables, so that a combination of all of them may date each household on the time axis more efficiently. Section 4.1 discusses the choice of an optimal linear combination of time indicators on cross-sections and defines the method to correct for generation effects.

9 Note that Heckman (2000, p.2) prefers to stress the importance of modelling: "The economists are the people of the model". Within a model, causes are easily defined when they are independent from one another. Whenever they are interdependent, defining causality relies on identification: "The econometric analysis of the identification problem clarifies the limits of purely empirical knowledge. It makes precise the idea that correlation is not causation by using fully specified economic models as devices for measuring and interpreting causal parameters".

10 Note that it is generally necessary when estimating dynamic models to instrument past values of the variables, even when these past values are observed.

4.1. Substitution between cross-section variables in the time dimension Suppose an agent h is characterised by a vector of socio-economic variables Zh = zkh (some being time invariant, noted Z’, such as the highest diploma obtained, some varying through time, including age, noted Z”), which influence some explained variable xh, for instance savings. Under appropriate hypotheses, maximising a utility function depending on x and other determinants of the agent’s satisfaction, yields x as a function of individual characteristics and other economic variables such as prices: xh=f(Zh, W). In the space of the determinants zk, a compensating substitution can be effected between two characteristics to maintain the agent in the same savings position (whenever the derivatives of x over these characteristics are non-zero). For example, a rich household aged 35 may have the same saving behavior as a poor household aged 50, so that income compensates for age in determining savings. Whenever such compensation occurs, it is as if, under the life cycle saving theory, the two households were at the same point in their life cycle. Income therefore operates some transformation of the time axis such that, considering an individual h with characteristics zkh (including age), we can define a hypothetical individual h’ with the same characteristics as h, but lower relative income, so that h’ behaves (as concerns savings) as if she were one year younger than h. More generally, as long as the Hessian matrix for f is regular, a combination of some of the variables zk may define a hypothetical time dimension in the cross-section, along which dynamic relations and causality can be defined11. This time axis can be interpreted as a life cycle dimension, the position on this axis depending not only on age, but on all variables which influence savings, such as income, education or family structure. 
The vertical life-cycle axis can also be interpreted as representing the age of a typical individual, as defined by a constant vector of characteristics (other than age, noted Z’’age) and other variables: (Z’’age, W) = A0. Consider the (K+1) linear space where the vertical axis corresponds to this time dimension τ and other axes correspond to characteristics zk (all other determinants W being fixed). Consider now the plane Πτ0 intersecting the time axis at τ0. An isoquant is associated with this time position, since τ corresponds to the age of the typical agent (at point=A0). The isoquant is defined by the substitution between characteristics zk such that x is constant along it. The ratios of the derivatives of f over zk and age measure substitution between the agent’s age and other characteristics so that synthetic time remains equal to τ0. The isoquants along the time axis sum up into a cone in (time, Z) space (this cone depends also on variables W, supposed to be fixed). Each isoquant, corresponding to a definite level on the vertical axis, can be projected onto a plane of characteristics Z, including age, defined by a time position τ0 (see Figure 1). For some level of all characteristics except age, the age differs for these different isoquants, so that they do not intersect. On the plane Πτ0, the projected isoquants corresponding to higher τ are posited at higher levels of age and all characteristics positively related to age (i.e. such that the marginal substitution rate between the two, (∂x/∂zk)/ (∂x/∂age), is negative).

11 If the relationship between the K variables zk and time is linear, it defines a hyperplane of dimension K. A non-linear continuous relation would define a differential manifold.

[Figure 1 not reproduced. Vertical axis: synthetic time τ (levels τ1, τ2); horizontal survey plane with axes z1 and z2; isoquants I(τ1), I(τ1+1) and I(τ2); points M1, m1, m2, m12, m'1 and m''1. I(τi): isoquant corresponding to synthetic period τi.]

Figure 1: Substitution between individual characteristics and time

To make this clear, suppose that a household h=1 is located at point $\tau_1$ of the time axis, which corresponds to its life-cycle position. This point belongs to the isoquant $I\{Z_h\} = I_1$ defined by the constant life-cycle position $\tau = \tau_1$, so that the gradient of $I_1$ is equal to the substitution rate between the characteristics. Suppose that another agent h=2 is located at point $M_2$, for a life-cycle position $\tau = \tau_2$ on the plane corresponding to $\tau_2$, which differs by k units of time on the life-cycle axis ($\tau_2 = \tau_1 + k$). $M_2$ is on the isoquant $I\{Z(h)\} = I_2$. Projecting $M_1$ and $M_2$ onto points $m_1$ and $m_2$ in the characteristics plane corresponding to some time $\tau_0$, $m_1$ and $m_2$ are on distinct isoquants12 (as long as $\tau_2 \neq \tau_1$). The line between points $m_1$ and $m_2$ intersects the intermediate isoquant corresponding to time $\tau = \tau_1 + 1$ at a point m which gives the best prediction of the future position of household 1 (compared to agent 2) or of the past position of household 2, (k-1) periods before (compared to agent 1)13. The calculated position of household h one period later or before depends on the agent h' who is compared to h. This predicted position will be used as a proxy for agent h one year before or later. This instrumented value for $f(Z_h, \tau - 1)$ plays the same role as the instruments for past values of the endogenous variable when estimating dynamic models14. Suppose that agent h's level of saving $x_h = f(Z_h, U_h)$ depends on variables $z_{kh}$. The change in characteristics $dZ = Z_h - Z_{h'}$ between two cohorts corresponding to points $m_h$ and $m_{h'}$ leads to a first-order change in saving of $\sum_k (\partial f/\partial z_k)\,dz_k = dZ\cdot\beta$, with $\beta_k = \partial f/\partial z_k$. Thus, the point on isoquant $I(\tau_h + 1)$ which is paired with household h on isoquant $I(\tau_h)$ can be taken on the line passing through $M_h$ with a direction defined by the $\beta_k$ (which is $Om_h$ only if $\beta_{k_i}/\beta_{k_j} = z_{k_i}(h)/z_{k_j}(h)$ for all $k_i$, $k_j$). To define the agent who is paired with h, we first have to define the changes in characteristics dZ between ages $\tau_h$ and $\tau_h + 1$. These can be estimated in cross-section data between the averages of each variable for different cohorts in the survey (whenever there is no cohort effect with respect to these characteristics). The calibration of the coefficients $\beta_k$ must correct for any cohort effect arising when comparing two agents aged $\tau_h$ and $\tau_h + 1$ in the same cross-section. Considering the variable $z_k$, we first estimate $\beta_k$ in the cross-section ($\beta^{cs}$, computed between similar agents aged $\tau_h$ and $\tau_h + 1$ in the same survey) and in time-series ($\beta^{ts}$, for the individual in the same cohort aged $\tau_h$ in the first wave and $\tau_h + 1$ in the second). The generation effect is thus $\beta = \beta^{cs} - \beta^{ts}$, as a change in $z_k$ over time changes x by $\beta_k^{ts}\,dz_k$ as opposed to the estimated cross-section effect $\beta_k^{cs}\,dz_k$. Taking the example of income, the estimated coefficient in a cross-section regression reveals the correlation between savings and the individual's relative income (i.e. their position in the distribution of income at the time of the survey), controlling for their age, education etc.
However, age at the time of the survey may be correlated with relative income, giving rise to the usual bias in the estimated coefficients.

12 Which are the projections of the isoquants on the cone corresponding to τ1 and τ2.

13 Such a construction does not suppose that time on the life cycle axis is a linear function of the (z1, z2), which may require additional hypotheses to estimate position m, such as: the number of periods on the life cycle axis between two vectors of characteristics (z1, z2) and (z1, z2') on the characteristics plane is a linear function of the difference between the two characteristics vectors.

14 Other definitions are possible: for instance, the closest point on I(Z(h)) if the isoquant curve is known. The position of some household h one year later may also be defined by projecting it (in the space of characteristics Z) onto the point on the isoquant of its synthetic age plus one, using the ray Omh to define its future position in the next period as the intersection between the isoquant of period τh+1 and Omh. This supposes that the change in all characteristics between two age groups is homothetic.

[Figure 2 not reproduced. Horizontal axis: life cycle time (ages 29 and 30); vertical axis: x = saving rate. Curves: survey in period 1, survey in period 2, panel for Z1, panel for Z2. Marked values: xt0(Z1, age 30), xt0(Z2, age 30), xt0(Z2, age 29), xt0-1(Z2, age 29), x(z1), x(z2), CE, DZ.βcs.]

Legend: Z: vector of individual characteristics. xt0(Zj, age=a): observation at t0 of the saving rate for agents characterized by Zj, aged a, as observed in the panel. x(Zj): observation in the survey for individuals with characteristics Zj. CE: cohort effect. DZ.βcs: change in the saving rate due to the change dZ in the explanatory variables.

Figure 2: Survey and panel observations over the life cycle

4.2. Estimating dynamic models on cross-sectional data

The first method consists in defining, for each agent h in a cohort $C_h$, an agent S(h) in the same survey with the same observed permanent characteristics Z' but one year younger. We then correct for the generation effect associated with these characteristics by computing, for each variable of interest x, its estimated value for an agent in the same cohort $C_h$, i.e. having characteristics $Z_h$ in the previous year. Suppose that savings x depend on variables Z, so that, as a first-order approximation:

(i) between two periods for individual h: $x(Z_{h,t}) - x(Z_{h,t-1}) = (Z_{h,t} - Z_{h,t-1})\,\beta^{ts} + \varepsilon_{h,t} - \varepsilon_{h,t-1}$;

(ii) between S(h) and h in period t: $x(Z_{h,t}) - x(Z_{S(h),t}) = (Z_{h,t} - Z_{S(h),t})\,\beta^{cs} + \varepsilon_{h,t} - \varepsilon_{S(h),t}$.

Now suppose that $Z_{h,t-1}$ is equal to $Z_{S(h),t}$. Subtracting (i) from (ii) and setting the residuals to zero, saving by the similar individual S(h) in t can be converted into an estimate of saving by h in t-1:

$E\,x(Z_{h,t-1}) = x(Z_{S(h),t}) + (Z_{S(h),t} - Z_{h,t})\,(\beta^{ts} - \beta^{cs})$    (10)
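The correction in formula (10) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name, the two-variable characteristics vector and all numerical values are hypothetical and are not taken from the Polish data.

```python
def corrected_lagged_value(x_similar, z_similar, z_own, beta_ts, beta_cs):
    """Formula (10): E x(Z_{h,t-1}) = x(Z_{S(h),t}) + (Z_{S(h),t} - Z_{h,t}).(beta_ts - beta_cs).

    x_similar : observed x for the similar, one-year-younger agent S(h) at t
    z_similar : characteristics of S(h) at t
    z_own     : characteristics of h at t
    beta_ts, beta_cs : time-series and cross-section coefficients
    """
    correction = sum(
        (zs - zo) * (bts - bcs)
        for zs, zo, bts, bcs in zip(z_similar, z_own, beta_ts, beta_cs)
    )
    return x_similar + correction

# Hypothetical numbers: characteristics are (log income, log age).
x_lag = corrected_lagged_value(
    x_similar=0.10,
    z_similar=[9.00, 3.367],
    z_own=[9.05, 3.401],
    beta_ts=[0.08, 0.02],
    beta_cs=[0.05, 0.02],
)
```

When the cross-section and time-series coefficients coincide, the correction vanishes and the similar agent's value is used as it stands.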

The coefficients $\beta^{ts}$ can be estimated on aggregate time-series or on a panel or pseudo-panel containing at least two periods15. $Z_{S(h),t}$ can be computed as the average over households having the same permanent characteristics as household h. A second method consists in estimating the distance on the time axis between h and each other household of the survey, and pairing h with another household, or with the average of all households, one period away. The simplest way to define the time distance between two households relies on their age, but this implies, as noted above, cohort effects. Consider the cross-section difference in some variable x between two households h, h'. This is related to the change of the vector of all the explanatory variables $z_k$ by the cross-section estimates of the parameters β, and also (through the time-series estimate of β) to their variations between the two positions of agents h and h' on the synthetic time axis:

$x(Z_{h',t}) - x(Z_{h,t}) = (Z_{h',t} - Z_{h,t})\,\beta^{cs} + \varepsilon_{h'} - \varepsilon_h = dZ_t\,\beta^{ts} + d\varepsilon$,

where $dZ_t = dZ_1\,(\tau_{h'} - \tau_h)$, $dZ_1$ being the change in explanatory variables for one period along the line defined by Z(h) and Z(h') in the K-dimensional space16. This allows us to compute the difference in the positions of h and h' on the time axis:

$d\tau_{hh'} = \tau_{h'} - \tau_h = \dfrac{\Delta_{c.s.}Z\cdot\beta^{cs}}{dZ_1\cdot\beta^{ts}}$    (11)

with $\Delta_{c.s.}Z = Z_{h'} - Z_h$. As $dZ_1\cdot\beta^{ts}$ is a first-order measure of the variation in x over one period, $\Delta_{c.s.}Z\cdot\beta^{ts}/(dZ_1\cdot\beta^{ts})$ measures the time $d\tau'$ necessary to change x from f(Z) to $f(Z + \Delta_{c.s.}Z)$. The difference $(d\tau - d\tau')$ indicates the additional time for the cross-section comparison between agents differing by $\Delta_{c.s.}Z$, corresponding to the effect of all non-monetary resources (information, time budget etc.) and constraints (such as the liquidity constraints correlated with Z in cross-sections) which are, in the cross-section dimension, related to this difference in characteristics. This may also be interpreted as the influence of the change in the shadow prices $\pi_v$ corresponding to these resources and constraints: $(d\tau - d\tau') = e_p\cdot\Delta\pi_v$, with $e_p$ the vector of direct and cross-price propensities. So the distortion of the synthetic price axis depends on the price effect related to the positions of agents in the characteristics space17. Formula (10) shows corrected savings for a similar agent observed in the same survey, while formula (11) allows us to calculate (under a hypothesis defining $dZ_1$) the movement on the time axis between the two agents and to pair agents according to their time position, for instance such that $d\tau_{hh'} = 1$18.

15 Note that the estimation of dynamic models on time-series requires at least four periods to instrument the lagged endogenous variable when some endogeneity is suspected. Whenever the coefficients β are used to define the endogenous variable, they can be calibrated over another data-set.

16 dZ1 can be calibrated on aggregate time-series or between averages of reference populations using two surveys. For instance, income growth can be calibrated over the whole population (on aggregate time series) or between two surveys for some sub-population. For age, dln(age) = ln (ageh/ageh-1). For the proportion of children in the family, one can calculate dpr = pr(ageh)- pr(ageh-1)+dp, where the first term is computed on the cross-section and the second, dp, is the average variation between t and t-1, computed over the whole population or for the household's reference population.

17 Note that equation (1) can be interpreted along the same lines.

18 These pairings may be compared to simple pairing by age.
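Formula (11) amounts to a ratio of two inner products, which the following Python sketch makes explicit. The function name and the numerical inputs are hypothetical, chosen only to show the mechanics; the calibration of the one-period change in characteristics is assumed to be given.

```python
def time_distance(z_h, z_h_prime, dz_one_period, beta_cs, beta_ts):
    """Formula (11): d_tau = (Z_h' - Z_h).beta_cs / (dZ_1.beta_ts).

    dz_one_period is the (calibrated) change in each characteristic
    over one period of synthetic time.
    """
    numerator = sum((zp - z) * b for z, zp, b in zip(z_h, z_h_prime, beta_cs))
    denominator = sum(dz * b for dz, b in zip(dz_one_period, beta_ts))
    if denominator == 0:
        raise ValueError("dZ_1.beta_ts is zero: the time distance is undefined")
    return numerator / denominator

# Hypothetical case: only age differs (30 vs 34), age moves by one unit
# per period, and the coefficient is the same in both dimensions.
d_tau_age_only = time_distance([30.0], [34.0], [1.0], [0.02], [0.02])

# Same households, but with a larger cross-section coefficient:
# the synthetic time distance then exceeds the age difference.
d_tau_cohort = time_distance([30.0], [34.0], [1.0], [0.03], [0.02])
```

Households can then be paired by selecting, for each h, the h' whose computed distance is closest to one.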

The time scale is independent of the agents h and h' who are being compared: first, the time lag $d\tau_{h,h'}$ is symmetric up to sign ($d\tau_{h',h} = -d\tau_{h,h'}$), as is clear from the way $\Delta_{c.s.}Z$ enters formula (11). Second, it is additive, $d\tau_{h,h''} = d\tau_{h,h'} + d\tau_{h',h''}$, as is also clear from the linearity of (11). These properties are sufficient to define a time scale uniquely, up to the choice of the origin. Suppose for example that only the age of the head changes between two periods or two households, with the same coefficient in the two dimensions: $\beta^{cs}(\text{age}) = \beta^{ts}(\text{age})$. In this case, $E(d\tau_{hh'}) = (Z_{h'} - Z_h)/dZ_1 = \text{Age}_{h'} - \text{Age}_h$. If $\beta^{cs}(\text{age}) > \beta^{ts}(\text{age})$, the cohort effect is positive and the difference between h and h' on the time axis is greater than their age difference because of this cohort effect. The effect of a difference in income between two households on the time axis can be analyzed similarly. For example, for food at home and considering only income elasticities (which can, for Poland, be calibrated at 0.5 in cross-sections and 0.8 in time-series, see Gardes et al., 2002), $d\tau$ = 6 years when comparing h aged 30 with income $y_h$ and other characteristics $Z'_h$ and household h' aged 30 with income $y_{h'} = 2y_h$ and the same characteristics: $d\tau_{hh'} = (-0.1\,\Delta_{cs}y)/(-0.25\,g)$ (we suppose that income increases by g = 5% each year at this age). Thus, the time distance between households increases when g decreases, because it will take longer for h to attain the income position of h'. Note that, due to the correction by the cross-section and time-series elasticities, 6 years is less than the 14 years necessary to double income with an increase of 5% per year (i.e. for the same income elasticity in cross-section and time-series). The synthetic time scale depends on the endogenous variable being analyzed. Nevertheless, we can imagine relationships between the time scales corresponding to different expenditures because of the additivity (or other types of) constraint.
When considering for instance different expenditures i = 1 to n, with coefficients $\beta_i$ estimated under the additivity constraint (for instance $\sum_i \beta_i = 0$ for all variables $z_k$ except income in the Almost Ideal Demand System), one obtains from equation (1), if only $z_k$ changes or if all variables change proportionally: $dZ_1\sum_i \beta_i^{ts}\,d\tau_i = \Delta_{c.s.}Z\sum_i \beta_i^{cs} = 0 \Rightarrow \sum_i \beta_i^{ts}\,d\tau_i = 0$, so that for n = 2: $\beta_1 = -\beta_2 \Rightarrow d\tau_1 = d\tau_2$, and for n = 3: $d\tau_3 = d\tau_1\,\beta_1/(\beta_1 + \beta_2) + d\tau_2\,\beta_2/(\beta_1 + \beta_2)$. Finally, the first method can be applied to all similar agents aged one year less than household h, correcting the cohort effect by (10), then estimating a dynamic model by instrumenting past values of the variable to which (10) applies, either by the average corrected x for similar agents or by one of the set of similar agents chosen by minimising some distance. The second method consists in estimating the time distance between agents, thus pairing agent h with some h' (or all h') at unit time distance. A dynamic relation can also be estimated over all agents ordered along the synthetic time dimension (with appropriate modelling of the partial adjustment according to the time distance between two consecutive agents).
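The income example above can be checked numerically. The sketch below assumes that the income gap $\Delta_{cs}y$ is measured as a log difference (ln 2 for a doubling of income) and that the one-period change is the growth rate g; both are assumptions made here for illustration, as are the variable names. The coefficients -0.1 and -0.25 are those appearing in the expression for $d\tau_{hh'}$.

```python
import math

growth = 0.05                   # assumed yearly income growth g
log_income_gap = math.log(2.0)  # assumed measure of the gap when y_h' = 2 y_h

# Time needed to double income at 5% per year, ignoring any difference
# between cross-section and time-series responses.
years_to_double = log_income_gap / math.log(1.0 + growth)

# Synthetic time distance with the two coefficients of the text.
coef_cs = -0.1
coef_ts = -0.25
d_tau = (coef_cs * log_income_gap) / (coef_ts * growth)
```

Under these assumptions the doubling time is a little over 14 years, while the synthetic time distance comes out at about 5.5 years, of the same order as the 6 years quoted in the text.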

4.3. Defining a global time index over the whole space

As the cross-section and time-series propensities change from one point to another over the consumption space, the time index defined by equation (11) is in fact attached to a particular point or to a sub-space where the propensities are constant. It is thus necessary to examine how the point-time measures can be joined together over the whole space or a subset of the space. A natural idea consists in associating the time indices along a continuous curve corresponding to a geodesic, since in a Riemannian space, metrics can be continuously associated along each continuous curve in the space: in this way, each situation can be time related to those which correspond to other periods in the agent's life cycle. This is left for future work.

Section 5. Empirical applications

In this section, we use formula (10) to correct for cohort effects between households pertaining to different generations: dynamic models are estimated on the Polish surveys in section 5.1, and the equations of geodesics in section 5.2. In Gardes (2005), we use the second formula (11) defining the synthetic cohorts in order to get similar estimates of both models.

5.1. Estimating addiction effects on cross-sections

In the simple version of the Becker-Murphy model, the consumer is supposed to maximize the present value of her intertemporally additive utility $\sum_t \beta^{t-1}U(C_t, C_{t-1}, Y_t, e_t)$ under the inter-temporal budget constraint $\sum_t \beta^{t-1}(Y_t + P_tC_t) = A_0$, with C fixed at t=0. This yields, for quadratic utility, a dependency of current expenditures on past and future consumption, as well as on current prices and current and future values of the variables $e_t$ entering directly in the utility function (and not through their effects on current consumption):

$C_t = \theta C_{t-1} + \beta\theta C_{t+1} + \theta_1 P_t + \theta_2 e_t + \theta_3 e_{t+1} + u_t$    (Becker, 1996, p.89, equation 5.4)

In this specification, past and future expenditures must be instrumented (because of the autocorrelation of the residual). The instruments used (past income and prices) are typically very inefficient in applications on aggregate time-series. With individual-level data, three waves are required to estimate the reduced equation, with the usual difficulties of estimating dynamic models on panel data. In previous estimations, either on aggregate or individual data, an important parameter, the inter-temporal substitution rate, is poorly estimated, with highly implausible values (such as 300% per year for Chalupka's 1998 estimation on individual data). We have proposed (Gardes and Starzec, 2002) the use of a new type of instrumentation, considering the next or previous cohorts relative to individual h observed in the same period t, defined by the same permanent characteristics as h and denoted $H^-_{h,t}$ and $H^+_{h,t}$ (i.e. all households with the head one year younger or older than h). Agent h's past and future expenditures $C_{h,t-1}$ and $C_{h,t+1}$ are instrumented by the average expenditures at t of $H^-_{h,t}$ and $H^+_{h,t}$:

$E_t(C_{h,t-1}) = \dfrac{1}{\mathrm{Card}(H^-_{h,t})}\sum_{h' \in H^-_{h,t}} C(h', t)$,

with the same holding for future consumption. We then correct for the generation effect associated with these characteristics by means of equation (1)19. This estimation is carried out on Polish panel data for tobacco expenditures (see the Appendix for a description of the data-set, and Gardes and Starzec, 2002, for details of the estimation strategy). The first, classic instrumentation uses past, present and future prices (Ia), possibly combined with household income (Ib), and is estimated on first differences to cancel out the individual fixed effects. No selection bias appears. The coefficients (Table 3) of past and future tobacco expenditures are positive and very significant, as rational addiction would imply. By contrast, the rate of time preference β is the same for estimations in levels and first differences. The second instrumentation, based on cohorts observed in the same year and corresponding to the household (aged one year less or more), gives similar but much more precise results. The implied yearly rate of time preference β is around 32%. This is a very encouraging result, as this instrumentation allows us to estimate dynamic models on cross-sections without any retrospective questions (often imprecisely recorded, as shown for instance in PSID data).

19 This correction amounts to 8% of expenditures.
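The cohort-based instruments described above (average expenditure, in the same survey, of households with the same permanent characteristics and a head one year younger or older) can be sketched as follows. The record layout, the key names and the sample values are hypothetical; the cohort-effect correction is deliberately left out of this sketch.

```python
from collections import defaultdict

def cohort_instruments(households):
    """Return, for each household id, the mean spending of the adjacent cohorts.

    households: list of dicts with keys 'id', 'age', 'group', 'spend', where
    'group' stands for the permanent characteristics (education, location...).
    The result maps id -> (mean for age - 1, mean for age + 1); an entry is
    None when the adjacent cell is empty.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for hh in households:
        cell = totals[(hh["group"], hh["age"])]
        cell[0] += hh["spend"]
        cell[1] += 1

    def cell_mean(group, age):
        total, count = totals.get((group, age), (0.0, 0))
        return total / count if count else None

    return {
        hh["id"]: (
            cell_mean(hh["group"], hh["age"] - 1),
            cell_mean(hh["group"], hh["age"] + 1),
        )
        for hh in households
    }

# Hypothetical survey extract.
sample = [
    {"id": "a", "age": 29, "group": "g1", "spend": 10.0},
    {"id": "b", "age": 29, "group": "g1", "spend": 14.0},
    {"id": "c", "age": 30, "group": "g1", "spend": 20.0},
    {"id": "d", "age": 31, "group": "g1", "spend": 30.0},
]
instruments = cohort_instruments(sample)
```

In an actual application the two averages would then be adjusted for the generation effect before entering the first stage of the estimation.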

Table 1 Estimation Results for per U.C. tobacco expenditures

Model (first differences)    Ia                Ib                II
C t-1                        0.239 (0.085)     0.211 (0.080)     0.323 (0.021)
C t+1                        0.102 (0.021)     0.127 (0.071)     0.245 (0.019)
βt                           1.352 (2.052)     0.659 (1.224)     0.318 (0.195)
Mills Ratio                  -0.318 (2.336)    -0.505 (2.338)    -2.420 (1.078)
IV                           prices            prices, income    cohort age

Population: Households with head aged 23 to 81, with positive expenditure on food at home and tobacco for one of the 4 years.
Model instrumentation: Ia: past, present, future prices; Ib: past, present, future prices, log income; II: by generation (age, education, income quartile). Cohort effects corrected by formula (3).
Surveys: IIc: 1988 and 1989 waves of the 1987-1990 panel (3630 households); IId: 1988 survey (30 000 households).
Other explanatory variables: I and IIc: log of age and its square, proportion of children, year dummies; IId: age, region, education, social group, family type, income quartile; consumption and income deflated by an equivalence scale.
Standard errors in parentheses.
Remark: The variance needs to be corrected due to the use of aggregate explanatory variables (see Moulton, 1986). This correction may increase the variances of all parameters.
Source: Gardes and Starzec, 2002.
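The link between the reported coefficients and the rate of time preference can be checked with a short calculation. In the Becker-Murphy equation the coefficient on $C_{t+1}$ is β times the coefficient on $C_{t-1}$, so the discount factor is their ratio and the implied rate is its inverse minus one. Reading the β row of Table 1 as this implied rate is an inference from the reported numbers, not something stated in the table notes; the dictionary layout and function name are hypothetical.

```python
# Point estimates from Table 1: (coefficient on C_{t-1}, coefficient on C_{t+1}).
coefficients = {
    "Ia": (0.239, 0.102),
    "Ib": (0.211, 0.127),
    "II": (0.323, 0.245),
}

def implied_rate(coef_lag, coef_lead):
    """Rate of time preference implied by the two coefficients."""
    discount_factor = coef_lead / coef_lag  # beta in the structural equation
    return 1.0 / discount_factor - 1.0

rates = {model: implied_rate(*pair) for model, pair in coefficients.items()}
```

The values obtained this way are close to the entries of the β row (small gaps are consistent with rounding of the published coefficients), including the rate of about 32% for the cohort instrumentation.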

5.2. Estimation of the Christoffel symbols along geodesics

24

Consider the curve in the consumption space described by the successive positions of a household over its life cycle (as indicated by the curves xt0(zk) in figure 2). We can identify this curve with a geodesic between its extreme points, as the household is supposed to maximize its satisfaction over the life cycle. Indeed, this maximization is made under all the constraints which apply to the household at each period of its life cycle, and considering both the monetary and the non-monetary resources of the household. These constraints and non-monetary resources are represented by the shadow prices at each point of this geodesic. Therefore, optimizing its behavior under these constraints defines the optimal path as the geodesic over the life cycle of the household.

A typical household can be defined by a set of characteristics, some of which are permanent while others change according to the normal evolution along the life cycle. In a survey, the different positions of such a typical household can be observed as the average of all households at each point of the life-cycle path. Between two periods along this life-cycle path, the changes in the bundle of goods are biased by generation effects, which can be removed by the formula relating the covariant derivative (estimated in cross-section) to the ordinary partial derivative (estimated in time series):

$$\nabla_k v^i = \partial_k v^i + \Gamma^i_{kh} v^h \qquad (4)$$

whenever we have some information on the cross-section and time-series consumption laws. If we estimate on a single survey, we do not have information on time-series parameters, but we can estimate the Christoffel symbols $\Gamma^i_{kh}$ using the equation of geodesics:

$$\frac{d^2 v^i}{dt^2} + \Gamma^i_{kh} \frac{dv^h}{dt} \frac{dv^k}{dt} = 0 \qquad (5)$$

with $v^i$ the budget share and $h$ indexing the logarithm of total expenditure per Unit of Consumption (U.C.) and the logarithm of the number of U.C. Thus, we can compute time-series estimates of the derivatives $\partial g_{ij}/\partial v^k$ using the cross-section estimates of the covariant derivatives and equation (4), which relates these two types of partial derivatives. A convergence process on the $\Gamma^i_{kh}$ thus allows us to compute the time derivatives $\partial_k v^i$, knowing the cross-section derivatives and using the geometric properties of the consumption space described by the Christoffel symbols.

The derivatives $dv^k/dt$ can be estimated between two close points $M$ and $M+dM$ if an index of time is defined between these points. Suppose the differences on the time scale discussed in section 1 between points $M_0$, $M_1$ and $M_2$ are denoted $\Delta\tau(M_0,M_1) = \Delta\tau_1$ and $\Delta\tau(M_0,M_2) = \Delta\tau_2$. The time index being additive, we have $\Delta\tau(M_0,M_i) + \Delta\tau(M_i,M'_i) = \Delta\tau(M_0,M'_i)$, $i=1,2$. So, with $v_j = v(M_j)$ and $v'_j = v(M'_j)$, the quantities $(v'_2 - v_1)/\Delta\tau(M_1,M'_2) - (v_2 - v_0)/\Delta\tau(M_0,M'_2)$ and $(v'_1 - v_2)/\Delta\tau(M_2,M'_1) - (v_1 - v_0)/\Delta\tau(M_0,M'_1)$ are discrete measures of $d^2 v^i/dt^2$ which can be used to estimate $\Gamma^i_{kh}$ in equation (5)20.

20 Note that this method also uses the true time dimension (between periods), as the time structure of the cross-section is defined according to the true time gradient of the variables $v^k$.
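As a rough illustration of how the equation of geodesics (5) could be taken to data, the sketch below estimates the Christoffel symbols for one budget share by least squares on finite-difference derivatives computed along a series of points ordered by a time index. It is not the estimation procedure used in the paper. The function names, the restriction to two coordinates (log expenditure per U.C. and log number of U.C.), the symmetry of the symbols in their lower indices, and the synthetic series are all assumptions made for the example.

```python
# Illustrative sketch only: names, the symmetry assumption and the data are hypothetical.

def derivatives(tau, y):
    """First and second derivatives of y with respect to the time index tau,
    at interior points, by finite differences on a possibly uneven grid."""
    d1, d2 = [], []
    for j in range(1, len(y) - 1):
        h0, h1 = tau[j] - tau[j - 1], tau[j + 1] - tau[j]
        s0 = (y[j] - y[j - 1]) / h0          # backward slope
        s1 = (y[j + 1] - y[j]) / h1          # forward slope
        d1.append((h1 * s0 + h0 * s1) / (h0 + h1))
        d2.append(2.0 * (s1 - s0) / (h0 + h1))
    return d1, d2


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def estimate_christoffel(tau, share, log_exp, log_size):
    """Least-squares estimate of (G11, G12, G22) for one budget share from
    d2v/dt2 + G11*u1**2 + 2*G12*u1*u2 + G22*u2**2 = 0,
    where u1 and u2 are the time derivatives of the two coordinates."""
    _, accel = derivatives(tau, share)
    u1, _ = derivatives(tau, log_exp)
    u2, _ = derivatives(tau, log_size)
    rows = [[a * a, 2.0 * a * b, b * b] for a, b in zip(u1, u2)]
    target = [-acc for acc in accel]
    # Normal equations (X'X) g = X'y of the least-squares problem.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(3)]
    return solve_linear(xtx, xty)


# Synthetic series for 25 cells, built so that (G11, G12, G22) = (0, 0.5, -0.3).
tau = [1.0 + 0.1 * j for j in range(25)]
log_exp = [t * t for t in tau]
log_size = [t for t in tau]
share = [-(t ** 3) / 3.0 + 0.15 * t * t for t in tau]
g11, g12, g22 = estimate_christoffel(tau, share, log_exp, log_size)
print(g11, g12, g22)
```

With real survey cells the three regressors can be close to collinear, so in practice one would want to inspect the conditioning of the system before interpreting the estimates.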

Households are grouped by age classes of two years in the 1990 Polish survey. Each cell contains 140 households on average. The series of these 25 cells is considered as describing a geodesic for the average Polish household. The derivatives dvk/dt and the accelerations d2vk/dt2 are computed between consecutive points on the average geodesic, which allows us to estimate the system of equations (5). The cross-section income elasticities for Food at home expenditures and other expenditures are thus corrected using the estimated Γkih, and compared to time-series parameters computed on the four-year panel. For food at home, which represents one half of total expenditures, the indices T1 (calculated for all variables) and T123 (calculated only for the term corresponding to the product of income and family size) have the same sign as the difference between the cross-section and time-series elasticities (Table 2). Their magnitude is similar to this difference, which is curious as this difference must be compared to T1/w and T123/w, with w the average budget share. For the other expenditures, there is a strong correlation between the difference of the elasticities and the indices, especially index T1: for five items out of seven (for which the difference between the elasticities is significant) the signs are the same, and the two expenditures with different signs are Food away (an expenditure which is very particular in Poland) and alcohol and tobacco expenditures (which are not well measured in household surveys). Moreover, the order of the eleven expenditures according to the difference between the elasticities is very similar to the order according to the index T1: the average difference of the rank is 1.24, compared to a random difference which would be 3.75.
The average levels of T1 and the elasticity difference are also similar (and for six cases, the elasticity difference is very close to T1), which is not expected since the elasticity difference corresponds to the ratio of T1 and the budget shares.

Table 2 Income Elasticities and Correction Factors

| | (1) Ec.s.-Et.s. | (2) T1 | First Correction Factor | (3) T123 | Second Correction Factor | Sign (1)/(2) | Sign (1)/(3) |
|---|---|---|---|---|---|---|---|
| Food at Home | -0.35 | -0.550 | -1.135 | -0.241 | -0.498 | Yes | Yes |
| Food Away | (+0.11) | -0.026 | -6.50 | 0.111 | 27.765 | No | Yes |
| Alcohol-Tobacco | +0.42 | -0.049 | -1.55 | 0.123 | 3.878 | No | Yes |
| Housing | (+0.06) | -0.240 | -2.44 | 0.477 | 4.851 | No | Yes |
| Energy | -0.58 | -0.112 | -2.82 | -0.095 | -2.389 | Yes | Yes |
| Clothing | (+0.01) | -0.071 | -0.76 | 0.112 | 1.194 | No | Yes |
| Transportation | +0.97 | 1.211 | 18.69 | 0.427 | 6.594 | Yes | Yes |
| Health | (-0.07) | -0.144 | -5.48 | 0.029 | 1.099 | Yes | No |
| Culture | +0.35 | 0.518 | 6.82 | -0.307 | -4.046 | Yes | No |
| Various expenditures | (+0.20) | 0.044 | 1.41 | 0.066 | 2.120 | Yes | Yes |
| Financial Goods | +0.74 | 1.006 | 19.88 | -1.034 | -20.425 | Yes | No |


(1) Estimation on the Polish Panel, 1987-1989. Between parentheses: the difference between elasticities is not well estimated.
T1 = $\Gamma^i_{kh} v^h$ with i = budget share, k = logarithmic total expenditure and h = logarithmic total expenditure per Unit of Consumption, logarithmic number of U.C. and the budget share.
T123 = $\Gamma^i_{kh} v^h$ with i = budget share, k = logarithmic total expenditure and h = logarithmic number of U.C.
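The sign and rank comparisons between the elasticity difference and the indices T1 and T123 can be re-traced from the figures printed in Table 2. The sketch below is only an illustration: the helper names are hypothetical, ranks are taken in ascending order (an assumption), and the inputs are the rounded values of the table, so the rank statistic it prints need not coincide exactly with the value quoted in the text.

```python
# Figures copied from Table 2; helper names are illustrative, not from the paper.
ITEMS = ["Food at Home", "Food Away", "Alcohol-Tobacco", "Housing", "Energy",
         "Clothing", "Transportation", "Health", "Culture",
         "Various expenditures", "Financial Goods"]
ELASTICITY_GAP = [-0.35, 0.11, 0.42, 0.06, -0.58, 0.01, 0.97, -0.07, 0.35, 0.20, 0.74]
T1 = [-0.550, -0.026, -0.049, -0.240, -0.112, -0.071, 1.211, -0.144, 0.518, 0.044, 1.006]
T123 = [-0.241, 0.111, 0.123, 0.477, -0.095, 0.112, 0.427, 0.029, -0.307, 0.066, -1.034]


def same_sign(a, b):
    """True where the two series have the same (non-zero) sign."""
    return [x * y > 0 for x, y in zip(a, b)]


def ranks(values):
    """Ascending ranks starting at 1 (the data contain no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def mean_abs_rank_difference(a, b):
    """Average absolute difference between the ranks of two series."""
    ra, rb = ranks(a), ranks(b)
    return sum(abs(x - y) for x, y in zip(ra, rb)) / len(a)


signs_t1 = same_sign(ELASTICITY_GAP, T1)
signs_t123 = same_sign(ELASTICITY_GAP, T123)
rank_gap = mean_abs_rank_difference(ELASTICITY_GAP, T1)

for item, s1, s123 in zip(ITEMS, signs_t1, signs_t123):
    print(f"{item:22s} sign (1)/(2): {s1}   sign (1)/(3): {s123}")
print("mean absolute rank difference, (1) versus T1:", round(rank_gap, 2))
```

The two boolean lists reproduce the last two columns of Table 2, which gives a quick check that the table has been read correctly.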

Considering only food at home and all other expenditures, Table 3 shows that the cross-section elasticities corrected by T123 are close to those which have been estimated on the panel.

Table 3 Cross-section and time-series income elasticities

| | Food at Home | Other expenditures |
|---|---|---|
| Cross-section Elasticity* | 0.451 (0.) | 1.475 () |
| Correction Factor** | 0.498 (0.079) | -0.470 (0.079) |
| Corrected elasticity | 0.949 () | 1.005 () |
| Time-series Elasticity* | 0.805 () | 1.169 () |

Population: Households with head aged more than 20, with positive expenditure on food at home for all four years.
Surveys: * four waves of the 1987-1990 Polish panel (3630 households): Between and Within estimations (instrumented total expenditures). ** 1990 survey.
Equation of the geodesic: $d^2 v^i/dt^2 + \Gamma^i_{kh} (dv^h/dt)(dv^k/dt) = 0$, with $v^i$ = budget share and h = logarithmic total expenditure per Unit of Consumption and logarithmic number of U.C.
Correction of cross-section parameters: $E_{cs} - \Gamma^i_{kh} v^h$, with h = total expenditures.
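The arithmetic behind Table 3 can be checked in a few lines. The sketch below simply adds the correction factor, with the sign printed in the table, to the cross-section elasticity and compares the result with the time-series elasticity. The dictionary layout and function name are illustrative assumptions; the figures are those of Table 3.

```python
# Figures copied from Table 3; the layout and names are illustrative.
TABLE_3 = {
    "Food at Home": {"cross_section": 0.451, "correction": 0.498, "time_series": 0.805},
    "Other expenditures": {"cross_section": 1.475, "correction": -0.470, "time_series": 1.169},
}


def corrected_elasticity(cross_section, correction):
    """Cross-section elasticity plus the correction factor as signed in Table 3."""
    return cross_section + correction


for item, row in TABLE_3.items():
    corrected = corrected_elasticity(row["cross_section"], row["correction"])
    gap_before = abs(row["cross_section"] - row["time_series"])
    gap_after = abs(corrected - row["time_series"])
    print(f"{item}: corrected = {corrected:.3f}, "
          f"gap to time-series elasticity {gap_before:.3f} -> {gap_after:.3f}")
```

With the printed figures, the corrected elasticities come out at 0.949 and 1.005, as in the table, and the distance to the time-series elasticity shrinks for both groups of expenditures.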

Finally, the estimation of geodesics along the life cycle, although it has been performed very roughly, has allowed us to classify the expenditures according to the endogeneity bias, and to predict the sign of this bias, using only the information given by one survey. For expenditures characterized by high values of this bias (very different cross-section and time-series parameters), the magnitude of the bias is close to the index T1. Therefore, the estimation of the Christoffel symbols along an average geodesic on one survey gives information on the difference between cross-section and time-series food consumption laws and may allow us to estimate unbiased parameters using only one survey.


Conclusion

We have shown that the consumption space has a Riemannian structure, which allows us to connect the time dimension of consumption laws to the social distribution of consumption expenditure in the population. Riemannian curvature of consumption means that there exists, for the social distribution of consumption, path-dependency with respect to the order of the changes in the different variables influencing consumption choices. For instance, comparing a couple of two adults to a family with children in cross-section allows us to compute an equivalence scale which may depend not only on income levels, but also on income changes, in the sense that an increase in family size ds for a family of size s and income y1= y0+dy may not give rise to the same levels of expenditure as an increase in income dy for a family with size s+ds and income y0 (whenever the condition for point integrability, i.e. the symmetry of Christoffel symbols, does not hold). Thus, considering the Riemannian structure of the expenditure space may be useful for Lewbel’s problem of identification in equivalence scale models21. Second, the impossibility of defining a unique metric for the whole population means that usual econometric estimations of consumption laws on cross-section data are misleading. This is due to the fact that local conditions of choice, which correspond to local shadow prices, are not taken into account in the estimation. In a sense, the Riemannian curvature of the consumption space can be related to social heterogeneity, and the change of the basis from one point to another in this space indicates the variation in the various constraints and nonmonetary resources which influence consumer choice by means of the associated shadow prices. The conditions of choice depend on the situation of the agent, i.e. his or her location in Riemannian space. 
Barten’s (1964) discussion of the change in relative monetary prices due to changes in family composition can be considered as an example of this relationship. We have shown that the substitution between time and the socio-economic determinants of household behavior makes it possible to estimate dynamic models on cross-sectional individual data. The position of individuals on the synthetic time axis also provides a natural distance between them, which can be related to the differences which are observed between their expenditure patterns. Third, preference interdependencies are generally estimated by adding the average quantities consumed by similar households xH,t (see Gardes and Montmarquette, 2003, for a discussion of this specification). This model is highly implausible, though it is generally estimated as significant. In the model we propose in section 3, this variable corresponds to the instrumented past or future expenditures of the household, so that the so-called interdependence effect can be interpreted as a habit or addiction effect of past and future expenditures. These effects may be inversely related to the distance between similar consumers, which is measured by the time distance between the household and its past and future expenditures22.

21 I am grateful to Alain Trognon for this suggestion.

22 A test of this interpretation would be to estimate the parameters of the autoregressive operator A over the synthetic time scale, in a model: xh,t = A(L) xH,t + Wh,t.β + εh,t.

References

My articles can be consulted on my personal page: membres.lycos.fr/fgardes

Angrist, J.D., Krueger, A.B., 2000, Empirical Strategies in Labor Economics, Handbook of Labor Economics, North Holland.
Barten, A.P., 1964, Family Composition, Prices and Expenditure Patterns, in P. Hart, G. Mill and J. Whittaker (ed.), Economic Analysis for National Income Planning, 16th Meeting of the Colson Society, London, Butterworth.
Bettendorf, L., Barten, A.P., 1995, Rationnement dans les Systèmes de demande: Calcul des Prix Virtuels, Economie et Prévision, n° 121, 101-108.
Cardoso, N., Gardes, F., 1996, Estimation de lois de consommation sur un pseudo-panel d’enquêtes de l’Insee (1979, 1984, 1989), Economie et Prévision 5, 111-125.
Delachet, A., 1969, Le Calcul Tensoriel, Presses Universitaires de France.
Diaye, M.A., Gardes, F., Starzec, C., 2001, The World According to Garp: Non-parametric Tests of Demand Theory and Rational Behavior, w.p. Crest 2001-27.
Gardes, F., 2005, The Time Structure of Cross-Sections, w.p. University Paris I-Cermsem.
Gardes, F., Duncan, G., Gaubert, P., Starzec, C., 2005, Panel and Pseudo-Panel Estimation of Cross-Sectional and Time-Series Elasticities of Food Consumption: The Case of U.S. and Polish data, Journal of Business and Economic Statistics, vol. 23, n° 2, April, 242-253.
Gardes, F., Langlois, S., Richaudeau, D., 1996, Cross-section versus time-series income elasticities, Economics Letters 51, 169-175.
Gardes, F., Montmarquette, C., 2002, How Large is your Reference Group?, w.p. Cirano, December 2002 and Crest 2003.
Gardes, F., Starzec, C., 2002, Evidence on Addiction Effects from Households Expenditure Surveys: the Case of the Polish Panel, Econometric Society European Meeting, Venice, August 2002.
Gardes, F., Starzec, C., 2003, Estimating Equivalence Scales on a Panel, w.p. Crest-Insee.
Granger, C.W.J., 1969, Investigating Causal Relations by Econometric Models and Cross-Spectral Methods, Econometrica, 37, 3, July, pp. 424-439.
Gray, 2001, Modern Differential Geometry, second edition.
Heckman, J.J., 2000, Causal Parameters and Policy Analysis in Economics: A Twentieth Century Retrospective, Quarterly Journal of Economics, February, pp. 45-97.
Hicks, J., 1979, Causality in Economics, Basil Blackwell, Oxford.
Kordos, J., Kubiczek, A., 1991, Methodological Problems in the Household Budget Surveys in Poland, GUS, Warsaw.
Lewbel, A., 1989, Household Equivalence Scales and Welfare Comparisons, Journal of Public Economics, vol. 39, 377-391.
Lichnerowicz, A., 1946, Eléments de Calcul Tensoriel, Gauthier-Villars, new edition, Gabay.
Malinvaud, E., 1966, Statistical Methods of Econometrics, North Holland, Amsterdam.
Marriot, P.K., Salmon, M., 2000, Applications of Differential Geometry in Econometrics, Cambridge University Press, Cambridge.
Morgan, F., 1998, Riemannian Geometry, A.K. Peters, Wellesley, Massachusetts.
Mundlak, Y., 1978, On the Pooling of Time Series and Cross Section Data, Econometrica 46, 483-509.
Neary, J.P., Roberts, K.W.S., 1980, The Theory of Household Behaviour under Rationing, European Economic Review, 19, pp. 25-42.


Appendix: The Polish data

Household budget surveys have been conducted in Poland for many years. In the period analyzed, the annual total sample size was about 30 thousand households, which represent approximately 0.3% of all households in Poland. The data were collected by a rotation method on a quarterly basis. The master sample consists of households and persons living in randomly selected dwellings. It was generated by a two-stage sampling procedure, with two phases in the second stage. The full description of the master sample generating procedure is given by Kordos and Kubiczek (1991).

Master samples for each year contain data from four different sub-samples. Two sub-samples started to be surveyed in 1986 and finished the four-year survey period in 1989. They were replaced by new sub-samples in 1990. Another two sub-samples of the same size were started in 1987 and followed through 1990. Over this four-year period, it is possible to identify in every annual sub-sample the households participating in the surveys during all four years. The checked and tested number of households is 3736. However, 3630 households remain in the data set after deleting households with missing values. The available information is as detailed as in the cross-section surveys: the usual socio-economic characteristics of households and individuals, as well as information on income and expenditures. A large part of this panel containing demographic and income variables is included in the comparable international data base of panels in the framework of the PACO project (Luxembourg) and is publicly available.

Prices and price indices are those reported by the Polish Statistical Office (GUS) for main expenditure items. They are observed quarterly and differentiated by 4 social categories: workers, retired, farmers, and dual activity persons (farmers and workers). This distinction implicitly covers the geographical distribution: workers and the retired live mostly in large and average size cities, farmers live in the countryside and dual activity persons live mostly in the countryside and in small towns. For food, price variations are taken into account at the individual observation level.

The period 1987-1990 covered by the Polish panel is unusual even in Polish economic history. It represents the shift from a centrally planned, rationed economy (1987) to a relatively unconstrained fully liberal market economy (1990). GDP grew by 4.1% between 1987 and 1988, but fell by 0.2% between 1988 and 1989 and by 11.6% between 1989 and 1990. Price increases across these pairs of years were 60.2%, 251.1% and 585.7%, respectively. Thus, the transitory years 1988 and 1989 produced a period of very high inflation and a mixture of a free-market, shadow and administered economy.
