Multi Factors Model

Daniel Herlemont

March 31, 2009

Contents

1 Introduction
2 Estimating using Ordinary Least Square regression
3 Multicollinearity
4 Estimating Fundamental Factor Models by Orthogonal Regression
5 References

1 Introduction

The objective of this practical work is to provide an empirical case study of factor decomposition using historical prices of two stocks (Nokia and Vodafone) and four fundamental factors:

- a broad market index: the New York Stock Exchange (NYSE) composite index,
- an industry factor: a Mutual Communication fund,
- a growth style factor: the Riverside growth fund, and
- a large cap factor: the AFBA Five Star Large Cap fund.

Source: Carol Alexander, see [1], case study II.1.4.
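For reference, the factor decomposition estimated in this case study is the standard linear multi-factor model (the notation below is assumed here for illustration, it is not taken from [1]):

    R_{i,t} = \alpha_i + \sum_{j=1}^{4} \beta_{i,j} F_{j,t} + \varepsilon_{i,t}

where R_{i,t} is the return of stock i (Nokia or Vodafone), F_{j,t} the return of factor j, \beta_{i,j} the sensitivity (beta) of stock i to factor j, and \varepsilon_{i,t} the stock-specific return.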

Download the data at /downloads/alexander-case-study-II-1-4.csv, save the file to your working directory and read it with the command

quotes = read.csv("alexander-case-study-II-1-4.csv")

This work can also be performed in Excel (download the package /downloads/matrix.zip).

Use the following code to prepare the data and plot the prices:

> dates = as.Date(quotes[, 1], "%d/%m/%Y")
> prices = quotes[, -1]
> prices = apply(prices, 2, function(p) p/p[1])
> n = ncol(prices)
> matplot(dates, prices, type = "l", col = 1:n, lty = 1:n, xaxt = "n")
> axis.Date(1, dates)
> legend(min(dates), max(prices), colnames(prices), col = 1:n, lty = 1:n,
+     cex = 0.7)

[Figure: normalized price series of Vodafone, Nokia, NYSE.Index, Communications, Growth and Large.Cap, 2001 to 2006.]

Using regression to build a multi-factor model with these factors gives rise to some econometric problems. The main problem is multicollinearity. The proposed solution is to use orthogonal regression.

2 Estimating using Ordinary Least Square regression

The following commands compute the returns and transform them into a data frame to facilitate the regressions in R:

> r = apply(prices, 2, function(p) diff(p)/p[-length(p)])
> r = data.frame(r)

Then we can perform a regression of each stock's returns against the risk factors:

> reg.Vodafone = lm(Vodafone ~ NYSE.Index + Communications + Growth +
+     Large.Cap, data = r)
> summary(reg.Vodafone)

Call:
lm(formula = Vodafone ~ NYSE.Index + Communications + Growth +
    Large.Cap, data = r)

Residuals:
      Min        1Q    Median        3Q       Max
-0.110331 -0.009820 -0.000308  0.009155  0.131810

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    -7.16e-05   5.32e-04   -0.13   0.8930
NYSE.Index      8.69e-01   1.47e-01    5.91  4.4e-09 ***
Communications  1.44e-01   5.14e-02    2.81   0.0051 **
Growth          2.04e-01   1.19e-01    1.71   0.0869 .
Large.Cap       1.01e-02   1.35e-01    0.07   0.9403
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0194 on 1326 degrees of freedom
Multiple R-squared: 0.348,     Adjusted R-squared: 0.346
F-statistic: 177 on 4 and 1326 DF,  p-value: < 2.2e-16

> reg.Nokia = lm(Nokia ~ NYSE.Index + Communications + Growth +
+     Large.Cap, data = r)
> summary(reg.Nokia)

Call:
lm(formula = Nokia ~ NYSE.Index + Communications + Growth +
    Large.Cap, data = r)

Residuals:
      Min        1Q    Median        3Q       Max
-0.175062 -0.009665 -0.000142  0.008843  0.217256

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)     0.000217   0.000620    0.35     0.73
NYSE.Index     -0.260330   0.171240   -1.52     0.13
Communications  0.265789   0.059836    4.44  9.7e-06 ***
Growth          0.209248   0.138489    1.51     0.13
Large.Cap       1.142582   0.157037    7.28  5.9e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0226 on 1326 degrees of freedom
Multiple R-squared: 0.468,     Adjusted R-squared: 0.467
F-statistic: 292 on 4 and 1326 DF,  p-value: < 2.2e-16

Now consider a portfolio invested 75% in Nokia and 25% in Vodafone. The following commands compute the portfolio returns, the portfolio factor betas and the variance explained by the factors:

> w = c(0.25, 0.75)
> rptf = 0.75 * r[, "Nokia"] + 0.25 * r[, "Vodafone"]
> # factor covariance matrix, columns in the same order as the regression coefficients
> covFactors = cov(r[, c("NYSE.Index", "Communications", "Growth", "Large.Cap")])
> beta = 0.75 * reg.Nokia$coef[-1] + 0.25 * reg.Vodafone$coef[-1]
> var.explained = t(beta) %*% covFactors %*% beta
> var.total = sd(rptf)^2
> sigma.total = sd(rptf) * sqrt(252) * 100
> sigma.explained = sqrt(var.explained) * sqrt(252) * 100
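In matrix form, these commands implement the usual formulas (a sketch; the notation is assumed here):

    \beta_p = 0.75\,\hat\beta_{Nokia} + 0.25\,\hat\beta_{Vodafone},
    \mathrm{Var}_{explained} = \beta_p^{\top} \Omega \beta_p,

where \Omega = covFactors is the covariance matrix of the daily factor returns, and annualized volatilities (in percent) are obtained as 100\,\sqrt{252 \cdot \mathrm{Var}}.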


The results are:

- the total variance of the portfolio is 0.00072 and the total (yearly) volatility is 42.6%;
- the portfolio betas are

      NYSE.Index Communications         Growth      Large.Cap
          0.0220         0.2354         0.2079         0.8595

- the variance explained by the factors is 0.000375, i.e. an explained (yearly) volatility of 30.7%.

Comments?
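As a possible starting point for the comments, here is a minimal sketch (using the objects computed above) of the part of the portfolio risk that the factor model does not explain:

# Specific (residual) portfolio risk: the part of the total variance that is
# not explained by the factors (var.total and var.explained as computed above).
var.specific   = var.total - var.explained
sigma.specific = sqrt(var.specific) * sqrt(252) * 100   # yearly specific volatility, in %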

3 Multicollinearity

Multicollinearity refers to correlation between the explanatory variables in a regression model: if one or more explanatory variables are highly correlated, then it is difficult to estimate their regression coefficients. The multicollinearity problem becomes apparent when the estimated coefficients change considerably when another (collinear) variable is added to the regression. When high multicollinearity is present, confidence intervals for the coefficients tend to be very wide and t-statistics tend to be very small. Coefficients will have to be larger in order to be statistically significant, i.e. it will be harder to reject the null hypothesis when multicollinearity is present. There is no formal statistical test for multicollinearity, but a useful rule of thumb is that a model will suffer from it if the square of the pairwise correlation between two explanatory variables is greater than the multiple R-squared of the regression.

Todo: perform the regression of the Nokia and Vodafone returns using

- one factor: NYSE.Index;
- 2 factors: NYSE.Index and Communications;
- 3 factors: NYSE.Index, Communications and Growth;
- 4 factors: NYSE.Index, Communications, Growth and Large.Cap.
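A minimal sketch of these nested regressions for Nokia, assuming the data frame r built in section 2 (the same calls apply to Vodafone):

# Regress Nokia returns on an increasing number of factors and compare how the
# estimated coefficients and their t-statistics change across the four fits.
summary(lm(Nokia ~ NYSE.Index, data = r))
summary(lm(Nokia ~ NYSE.Index + Communications, data = r))
summary(lm(Nokia ~ NYSE.Index + Communications + Growth, data = r))
summary(lm(Nokia ~ NYSE.Index + Communications + Growth + Large.Cap, data = r))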

Explain the results, using the correlation matrix of the factors:

> r.factors = r[, c("NYSE.Index", "Communications", "Growth", "Large.Cap")]
> cor.factors = cor(r.factors)
> cor.factors


               NYSE.Index Communications Growth Large.Cap
NYSE.Index          1.000          0.689  0.844     0.909
Communications      0.689          1.000  0.880     0.834
Growth              0.844          0.880  1.000     0.892
Large.Cap           0.909          0.834  0.892     1.000
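To apply the rule of thumb stated above, one can compare the squared pairwise factor correlations with the multiple R-squared of the four-factor regressions. A short sketch, using the objects reg.Vodafone and reg.Nokia fitted in section 2:

# Squared pairwise correlations between the factors ...
cor.factors^2
# ... versus the multiple R-squared of the four-factor fits
summary(reg.Vodafone)$r.squared
summary(reg.Nokia)$r.squared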

4 Estimating Fundamental Factor Models by Orthogonal Regression

The best solution to a multicollinearity problem is to apply principal component analysis and then use the principal components as explanatory variables. We apply principal component analysis to the covariance matrix of the factor returns:

> pca = prcomp(r.factors)
> pca
Standard deviations:
[1] 0.031355 0.008992 0.004167 0.002782

Rotation:
                  PC1     PC2     PC3     PC4
NYSE.Index     0.2588 -0.6099 -0.0966  0.7427
Communications 0.7963  0.5640 -0.1407  0.1674
Growth         0.3915 -0.2687  0.8447 -0.2472
Large.Cap      0.3817 -0.4875 -0.5074 -0.5993

> summary(pca)
Importance of components:
                         PC1   PC2   PC3   PC4
Standard deviation      0.03 0.009 0.004 0.003
Proportion of Variance  0.90 0.074 0.016 0.007
Cumulative Proportion   0.90 0.977 0.993 1.000

> plot(pca)


[Figure: scree plot of the principal component variances produced by plot(pca).]

Alternatively, we can use eigen(cov(r.factors)). Todo: using the first principal component (and possibly the first two components), compute the variance explained by the components. Conclusions?
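A minimal sketch of that computation via the eigenvalue route (assuming r.factors from above):

# The eigenvalues of the factor covariance matrix are the variances of the
# principal components; their cumulative share gives the proportion of
# variance explained by the first k components.
ev = eigen(cov(r.factors))$values
cumsum(ev) / sum(ev)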


Solutions:

> pc1 = pca$rotation[, 1]
> pc2 = pca$rotation[, 2]
> pc3 = pca$rotation[, 3]
> pc4 = pca$rotation[, 4]
> pc1r = apply(r.factors, 1, function(x) sum(x * pc1))
> pc2r = apply(r.factors, 1, function(x) sum(x * pc2))
> pc3r = apply(r.factors, 1, function(x) sum(x * pc3))
> pc4r = apply(r.factors, 1, function(x) sum(x * pc4))
> summary(lm(r[, "Nokia"] ~ pc1r))

Call:
lm(formula = r[, "Nokia"] ~ pc1r)

Residuals:
      Min        1Q    Median        3Q       Max
-0.182175 -0.009307 -0.000295  0.008892  0.201183

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.000275   0.000628    0.44     0.66
pc1r        0.662287   0.020043   33.04   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[...]

> summary(lm(r[, "Nokia"] ~ pc1r + pc2r))
[...]
Coefficients:
            t value Pr(>|t|)
(Intercept)    0.28     0.78
pc1r          33.27   <2e-16 ***
pc2r          -4.39  1.2e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0228 on 1328 degrees of freedom
Multiple R-squared: 0.459,     Adjusted R-squared: 0.458
F-statistic: 563 on 2 and 1328 DF,  p-value: < 2.2e-16

> summary(lm(r[, "Vodafone"] ~ pc1r))

Call:
lm(formula = r[, "Vodafone"] ~ pc1r)

Residuals:
      Min        1Q    Median        3Q       Max
-0.112669 -0.010215 -0.000164  0.009569  0.126809

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.000140   0.000548    0.26      0.8
pc1r        0.423424   0.017470   24.24   <2e-16 ***
[...]

> summary(lm(r[, "Vodafone"] ~ pc1r + pc2r))
[...]
Coefficients:
            t value Pr(>|t|)
(Intercept)   -0.04     0.97
pc1r          24.89