Multi Factors Model ∗
Daniel Herlemont
March 31, 2009
Contents
1 Introduction
2 Estimating using Ordinary Least Squares regression
3 Multicollinearity
4 Estimating Fundamental Factor Models by Orthogonal Regression
5 References

1 Introduction
The objective of this practical work is to provide an empirical case study of factor decomposition using historical prices of two stocks (Nokia and Vodafone) and four fundamental factors: a broad market index, the New York Stock Exchange (NYSE) composite index; an industry factor, a Mutual Communication fund; a growth style factor, the Riverside growth fund; and a large caps factor, the AFBA Five Star Large Cap fund.
∗ Source: Carol Alexander, see [1], case study II.1.4
Download the data from /downloads/alexander-case-study-II-1-4.csv to your working directory and read them with the command
> quotes = read.csv("alexander-case-study-II-1-4.csv")
This work can also be performed in Excel (download the package /downloads/matrix.zip).
Use the following code to read the data and plot the prices:
> # first column holds the dates, the remaining columns hold the prices
> dates = as.Date(quotes[, 1], "%d/%m/%Y")
> prices = quotes[, -1]
> # normalize each series so that it starts at 1
> prices = apply(prices, 2, function(p) p/p[1])
> n = ncol(prices)
> matplot(dates, prices, type = "l", col = 1:n, lty = 1:n, xaxt = "n")
> axis.Date(1, dates)
> legend(min(dates), max(prices), colnames(prices), col = 1:n, lty = 1:n,
+     cex = 0.7)
[Figure: normalized prices (y-axis: prices) against dates (x-axis: 2001 to 2006) for Vodafone, Nokia, NYSE.Index, Communications, Growth and Large.Cap]
Using regression to build a multi-factor model with these factors gives rise to some econometric problems, the main one being multicollinearity. The proposed solution is to use orthogonal regression.
2 Estimating using Ordinary Least Squares regression
The following commands compute the returns and convert them to a data frame to facilitate regression in R:
> r = apply(prices, 2, function(p) diff(p)/p[-length(p)])
> r = data.frame(r)
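As a quick sanity check of the return formula, the sketch below applies the same expression to a small made-up price series and compares it with the equivalent ratio form p[t]/p[t-1] - 1. The vector p and the names ret and ret.alt are illustrative assumptions, not part of the case-study data.

```r
# Hypothetical price series, for illustration only (not from the data set)
p = c(100, 110, 99, 99)

# Simple returns as computed in the text: (p[t] - p[t-1]) / p[t-1]
ret = diff(p) / p[-length(p)]

# Equivalent formulation: p[t] / p[t-1] - 1
ret.alt = p[-1] / p[-length(p)] - 1

print(ret)
print(all.equal(ret, ret.alt))
```

Note that the return series is one observation shorter than the price series, since the first price has no predecessor.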
Then we can perform a regression of the stocks against the risk factors:
> reg.Vodafone = lm(Vodafone ~ NYSE.Index + Communications + Growth +
+     Large.Cap, data = r)
> summary(reg.Vodafone)

Call:
lm(formula = Vodafone ~ NYSE.Index + Communications + Growth +
    Large.Cap, data = r)

Residuals:
      Min        1Q    Median        3Q       Max
-0.110331 -0.009820 -0.000308  0.009155  0.131810

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    -7.16e-05   5.32e-04   -0.13   0.8930
NYSE.Index      8.69e-01   1.47e-01    5.91  4.4e-09 ***
Communications  1.44e-01   5.14e-02    2.81   0.0051 **
Growth          2.04e-01   1.19e-01    1.71   0.0869 .
Large.Cap       1.01e-02   1.35e-01    0.07   0.9403
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0194 on 1326 degrees of freedom
Multiple R-squared: 0.348, Adjusted R-squared: 0.346
F-statistic: 177 on 4 and 1326 DF, p-value: < 2e-16

> reg.Nokia = lm(Nokia ~ NYSE.Index + Communications + Growth +
+     Large.Cap, data = r)
> summary(reg.Nokia)

Call:
lm(formula = Nokia ~ NYSE.Index + Communications + Growth +
    Large.Cap, data = r)

Residuals:
      Min        1Q    Median        3Q       Max
-0.175062 -0.009665 -0.000142  0.008843  0.217256

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)     0.000217   0.000620    0.35     0.73
NYSE.Index     -0.260330   0.171240   -1.52     0.13
Communications  0.265789   0.059836    4.44  9.7e-06 ***
Growth          0.209248   0.138489    1.51     0.13
Large.Cap       1.142582   0.157037    7.28  5.9e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0226 on 1326 degrees of freedom
Multiple R-squared: 0.468, Adjusted R-squared: 0.467
F-statistic: 292 on 4 and 1326 DF, p-value: < 2e-16
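Next, consider a portfolio invested 25% in Vodafone and 75% in Nokia, and decompose its variance into the part explained by the factors and the total:
> # portfolio weights: 25% Vodafone, 75% Nokia
> w = c(0.25, 0.75)
> rptf = 0.75 * r[, "Nokia"] + 0.25 * r[, "Vodafone"]
> # factor covariance matrix; the columns must be in the same order as
> # the regression coefficients, since %*% does not match by name
> covFactors = cov(r[, c("NYSE.Index", "Communications", "Growth",
+     "Large.Cap")])
> # portfolio betas: weighted average of the stock betas (intercept dropped)
> beta = 0.75 * reg.Nokia$coef[-1] + 0.25 * reg.Vodafone$coef[-1]
> var.explained = t(beta) %*% covFactors %*% beta
> var.total = sd(rptf)^2
> # annualized volatilities in percent, assuming 252 trading days per year
> sigma.total = sd(rptf) * sqrt(252) * 100
> sigma.explained = sqrt(var.explained) * sqrt(252) * 100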
The total variance of the portfolio is 0.00072 and the total volatility (yearly) is 42.6%. The portfolio betas (beta) are:
    NYSE.Index Communications     Growth  Large.Cap
        0.0220         0.2354     0.2079     0.8595
The variance explained by the factors is 0.000375 and the explained volatility (yearly) is 30.7%.
Comments ?
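The computation above follows the usual factor decomposition of portfolio variance. In the sketch below, the symbols w (portfolio weights), β (vectors of factor betas) and Ω (covariance matrix of the factor returns) are notation introduced here to mirror the R code; they are not defined in the original text. The numerical checks use only the figures reported above.

```latex
\begin{aligned}
\beta_P &= w_{\mathrm{Nokia}}\,\beta_{\mathrm{Nokia}}
         + w_{\mathrm{Vodafone}}\,\beta_{\mathrm{Vodafone}},
  \qquad w_{\mathrm{Nokia}} = 0.75,\quad w_{\mathrm{Vodafone}} = 0.25 \\[4pt]
\beta_{P,\mathrm{NYSE}} &= 0.75 \times (-0.260330) + 0.25 \times 0.869 \approx 0.0220 \\[4pt]
\sigma^2_{\mathrm{explained}} &= \beta_P^{\top}\,\Omega\,\beta_P \\[4pt]
\sigma_{\mathrm{total}}^{\mathrm{yearly}} &= 100\,\sqrt{252 \times 0.00072} \approx 42.6\% \\[4pt]
\sigma_{\mathrm{explained}}^{\mathrm{yearly}} &= 100\,\sqrt{252 \times 0.000375} \approx 30.7\% \\[4pt]
\frac{\sigma^2_{\mathrm{explained}}}{\sigma^2_{\mathrm{total}}} &= \frac{0.000375}{0.00072} \approx 0.52
\end{aligned}
```

Under these figures, the four factors account for roughly half of the portfolio variance; the remainder is specific risk not captured by the model.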
3 Multicollinearity
Multicollinearity refers to the correlation between the explanatory variables in a regression model: if two or more explanatory variables are highly correlated, it is difficult to estimate their regression coefficients separately. The multicollinearity problem becomes apparent when the estimated coefficients change considerably after another (collinear) variable is added to the regression. When high multicollinearity is present, confidence intervals for coefficients tend to be very wide and t-statistics tend to be very small. Coefficients will have to be larger in order to be statistically significant, i.e. it will be harder to reject the null hypothesis when multicollinearity is present. There is no statistical test for multicollinearity, but a useful rule of thumb is that a model will suffer from it if the square of the pairwise correlation between explanatory variables is greater than the multiple R² of the regression.
Todo: perform regressions of Nokia and Vodafone using
    1 factor: NYSE.Index
    2 factors: NYSE.Index and Communications
    3 factors: NYSE.Index, Communications and Growth
    4 factors: NYSE.Index, Communications, Growth and Large.Cap
Explain the results, using the correlation matrix of the factors:
> r.factors = r[, c("NYSE.Index", "Communications", "Growth", "Large.Cap")]
> cor.factors = cor(r.factors)
> cor.factors
               NYSE.Index Communications Growth Large.Cap
NYSE.Index          1.000          0.689  0.844     0.909
Communications      0.689          1.000  0.880     0.834
Growth              0.844          0.880  1.000     0.892
Large.Cap           0.909          0.834  0.892     1.000
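The rule of thumb stated in this section can be checked directly against the reported numbers. The sketch below is only an illustration: the correlation matrix and the two multiple R² values (0.348 for Vodafone, 0.468 for Nokia) are typed in by hand from the outputs shown earlier, and the names sq.cor and flagged are introduced here for the example.

```r
# Factor correlation matrix and R-squared values copied from the outputs in the text
factors = c("NYSE.Index", "Communications", "Growth", "Large.Cap")
cor.factors = matrix(c(1.000, 0.689, 0.844, 0.909,
                       0.689, 1.000, 0.880, 0.834,
                       0.844, 0.880, 1.000, 0.892,
                       0.909, 0.834, 0.892, 1.000),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(factors, factors))
r2 = c(Vodafone = 0.348, Nokia = 0.468)

# Squared pairwise correlations (upper triangle only, diagonal excluded)
sq.cor = cor.factors[upper.tri(cor.factors)]^2

# Rule of thumb: a pair is flagged when its squared correlation
# exceeds the multiple R-squared of the regression
flagged = sapply(r2, function(x) sum(sq.cor > x))

print(round(sq.cor, 3))
print(flagged)   # number of factor pairs (out of 6) flagged for each stock
```

With these figures, even the smallest squared correlation (0.689² ≈ 0.475, between NYSE.Index and Communications) exceeds both R² values, so the rule of thumb flags every pair of factors.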
4 Estimating Fundamental Factor Models by Orthogonal Regression
The best solution to a multicollinearity problem is to apply principal component analysis and then use the principal components as explanatory variables. We apply principal component analysis to the covariance matrix of the factors:
> pca = prcomp(r.factors)
> pca
Standard deviations:
[1] 0.031355 0.008992 0.004167 0.002782

Rotation:
                  PC1     PC2     PC3     PC4
NYSE.Index     0.2588 -0.6099 -0.0966  0.7427
Communications 0.7963  0.5640 -0.1407  0.1674
Growth         0.3915 -0.2687  0.8447 -0.2472
Large.Cap      0.3817 -0.4875 -0.5074 -0.5993
> summary(pca)
Importance of components:
                        PC1   PC2   PC3   PC4
Standard deviation     0.03 0.009 0.004 0.003
Proportion of Variance 0.90 0.074 0.016 0.007
Cumulative Proportion  0.90 0.977 0.993 1.000
> plot(pca)
[Figure: plot of pca, showing the variance of each principal component (y-axis: Variances, from 0e+00 to 8e-04)]
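The "Proportion of Variance" row printed by summary(pca) can be reproduced by hand from the component standard deviations. The sketch below is an illustration that uses only the four standard deviations reported in the prcomp output; the names sdev, pc.var, prop and cum.prop are introduced here for the example.

```r
# Standard deviations of the principal components, copied from the prcomp output
sdev = c(0.031355, 0.008992, 0.004167, 0.002782)

# The variance of each component is the square of its standard deviation
pc.var = sdev^2

# Share of the total factor variance carried by each component
prop = pc.var / sum(pc.var)
cum.prop = cumsum(prop)

print(round(prop, 3))
print(round(cum.prop, 3))
```

The first component alone carries about 90% of the total variance of the four factors, and the first two together about 97.7%, in line with the summary table.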
Alternatively, we can use eigen(cov(r.factors)).
Todo: using the first component (or possibly the first two components), compute the variance explained by the components. Conclusions ?
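The equivalence between prcomp and eigen(cov(...)) can be checked on a small example. The sketch below uses synthetic, deliberately collinear series rather than the case-study data; the matrix X, its column names and the simulation parameters are assumptions made purely for illustration.

```r
set.seed(1)

# Synthetic, highly correlated "factor returns" (hypothetical data)
n = 500
common = rnorm(n, sd = 0.01)
X = cbind(F1 = common + rnorm(n, sd = 0.002),
          F2 = common + rnorm(n, sd = 0.003),
          F3 = common + rnorm(n, sd = 0.004))

pca = prcomp(X)        # PCA on the covariance matrix (centered, not scaled)
eig = eigen(cov(X))    # eigen-decomposition of the same covariance matrix

# Eigenvalues are the component variances, i.e. the squared standard deviations
print(all.equal(pca$sdev^2, eig$values))

# Eigenvectors match the rotation matrix up to the sign of each column
print(all.equal(abs(unname(pca$rotation)), abs(unname(eig$vectors))))

# Component scores: project the factor returns onto the eigenvectors
scores = X %*% eig$vectors

# The scores are mutually uncorrelated, unlike the original series
print(round(cor(X), 3))
print(round(cor(scores), 6))
```

Because the projected series are uncorrelated, regressing a stock on them avoids the unstable coefficient estimates described in the multicollinearity section. The sign of each eigenvector is arbitrary, which is why the comparison is made on absolute values.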
Solutions:
> # loadings of each principal component on the four factors
> pc1 = pca$rotation[, 1]
> pc2 = pca$rotation[, 2]
> pc3 = pca$rotation[, 3]
> pc4 = pca$rotation[, 4]
> # project the factor returns onto each component
> pc1r = apply(r.factors, 1, function(x) sum(x * pc1))
> pc2r = apply(r.factors, 1, function(x) sum(x * pc2))
> pc3r = apply(r.factors, 1, function(x) sum(x * pc3))
> pc4r = apply(r.factors, 1, function(x) sum(x * pc4))
> summary(lm(r[, "Nokia"] ~ pc1r))

Call:
lm(formula = r[, "Nokia"] ~ pc1r)

Residuals:
      Min        1Q    Median        3Q       Max
-0.182175 -0.009307 -0.000295  0.008892  0.201183

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.000275   0.000628    0.44     0.66
pc1r        0.662287   0.020043   33.04  < 2e-16 ***

[...]

            t value Pr(>|t|)
               0.28     0.78
              33.27  < 2e-16 ***
              -4.39  1.2e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0228 on 1328 degrees of freedom
Multiple R-squared: 0.459, Adjusted R-squared: 0.458
F-statistic: 563 on 2 and 1328 DF, p-value: < 2e-16

> summary(lm(r[, "Vodafone"] ~ pc1r))

Call:
lm(formula = r[, "Vodafone"] ~ pc1r)

Residuals:
      Min        1Q    Median        3Q       Max
-0.112669 -0.010215 -0.000164  0.009569  0.126809

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.000140   0.000548    0.26      0.8
pc1r        0.423424   0.017470   24.24  < 2e-16 ***

[...]

            t value Pr(>|t|)
              -0.04     0.97
              24.89