Comparing State-of-the-Art Collaborative Filtering Systems
Laurent Candillier, Frank Meyer, Marc Boullé (France Telecom R&D Lannion), MLDM 2007
1 Introduction
2 Collaborative approaches
3 Experiments
4 Conclusions
Recommender systems
Help users find items they should appreciate in huge catalogues [Adomavicius and Tuzhilin, 2005]
⇒ Collaborative filtering: based on the user-to-item rating matrix
[Table: example user-to-item rating matrix, users u1-u7 by items i1-i5, with sparse ratings on a 1-5 scale and a missing entry "?" to predict]
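In practice such a sparse matrix is often stored as a map from each user to their observed ratings only; a minimal Python sketch (user, item and rating values here are made up for illustration):

```python
# A sparse user-to-item rating matrix as a dict of dicts: only the observed
# ratings are kept, and the goal is to predict the missing entries.
ratings = {
    "u1": {"i1": 4, "i3": 5},
    "u2": {"i1": 4, "i2": 4},
    "u3": {"i2": 3, "i3": 4},   # u3 has not rated i1: a "?" cell to predict
}

# The item catalogue is the union of all rated items.
items = sorted(set().union(*ratings.values()))
print(items)  # ['i1', 'i2', 'i3']
```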
User-based approaches

Recommend items appreciated by users whose tastes are similar to those of the given user [Resnick et al., 1994]
⇒ need a similarity measure between users
ex: Pearson similarity, the cosine of the deviations from the mean:

w(a, u) = \frac{\sum_{i \in S_a \cap S_u} (v_{ai} - \bar{v}_a)(v_{ui} - \bar{v}_u)}{\sqrt{\sum_{i \in S_a \cap S_u} (v_{ai} - \bar{v}_a)^2 \sum_{i \in S_a \cap S_u} (v_{ui} - \bar{v}_u)^2}}

where v_{ui} is the rating of user u on item i, S_u the set of items rated by user u, and \bar{v}_u = \frac{\sum_{i \in S_u} v_{ui}}{|S_u|} the mean rating of user u.
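A direct Python sketch of this similarity, with the rating matrix stored as a dict mapping each user to their observed ratings (names hypothetical):

```python
from math import sqrt

def pearson(ratings, a, u):
    """Pearson similarity between users a and u over their co-rated items.

    `ratings` maps user -> {item: rating}.
    """
    common = set(ratings[a]) & set(ratings[u])
    if not common:
        return 0.0
    # Means are taken over each user's full rating profile, as on the slide.
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    mean_u = sum(ratings[u].values()) / len(ratings[u])
    num = sum((ratings[a][i] - mean_a) * (ratings[u][i] - mean_u) for i in common)
    den = sqrt(sum((ratings[a][i] - mean_a) ** 2 for i in common)
               * sum((ratings[u][i] - mean_u) ** 2 for i in common))
    return num / den if den else 0.0
```

Two users whose deviations from their own means move together get a similarity close to 1, even if their absolute rating scales differ.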
User-based approaches

Which rating for user a (the active user) on item i?
Prediction using a weighted sum:

p_{ai} = \frac{\sum_{\{u \mid i \in S_u\}} w(a, u) \, v_{ui}}{\sum_{\{u \mid i \in S_u\}} |w(a, u)|}

Prediction using a weighted sum of deviations from the mean:

p_{ai} = \bar{v}_a + \frac{\sum_{\{u \mid i \in S_u\}} w(a, u) (v_{ui} - \bar{v}_u)}{\sum_{\{u \mid i \in S_u\}} |w(a, u)|}

How many neighbors should be considered?
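The deviation-from-the-mean scheme can be sketched as follows, with `w` any user-user similarity function (names hypothetical; neighborhood truncation to the top K users is omitted for brevity):

```python
def predict_deviation(ratings, w, a, i):
    """p_ai = mean(a) + sum_u w(a,u)*(v_ui - mean(u)) / sum_u |w(a,u)|,
    over users u who rated item i."""
    mean = lambda u: sum(ratings[u].values()) / len(ratings[u])
    neighbors = [u for u in ratings if u != a and i in ratings[u]]
    den = sum(abs(w(a, u)) for u in neighbors)
    if den == 0:
        return mean(a)  # fall back to the active user's mean rating
    num = sum(w(a, u) * (ratings[u][i] - mean(u)) for u in neighbors)
    return mean(a) + num / den
```

Predicting deviations rather than raw ratings compensates for neighbors who rate systematically higher or lower than the active user.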
Cluster-based approaches

Recommend items appreciated by users who belong to the same group as the given user [Breese et al., 1998]
⇒ need:
- a clustering method, ex: K-means
- a distance measure, ex: Euclidean distance
Then the rating of a user on an item is the mean rating given by the users who belong to the same cluster.
How many clusters should be considered?
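Once cluster assignments are available (e.g. from a K-means run over the users' rating vectors with Euclidean distance), the prediction step is just a within-cluster mean; a minimal sketch (names hypothetical):

```python
def cluster_predict(ratings, cluster_of, a, i):
    """Mean rating given to item i by users in the same cluster as a.

    `cluster_of` maps user -> cluster id, assumed precomputed by K-means.
    """
    peers = [u for u in ratings
             if cluster_of[u] == cluster_of[a] and i in ratings[u]]
    if not peers:
        return None  # nobody in a's cluster has rated item i
    return sum(ratings[u][i] for u in peers) / len(peers)
```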
Item-based approaches

Recommend items similar to those appreciated by the given user [Karypis, 2001]
⇒ the dual of the user-based approach:

p_{ai} = \bar{v}_i + \frac{\sum_{\{j \in S_a \mid j \neq i\}} sim(i, j) (v_{aj} - \bar{v}_j)}{\sum_{\{j \in S_a \mid j \neq i\}} |sim(i, j)|}

where sim(i, j) is a similarity measure between items i and j, S_a the set of items rated by user a, and \bar{v}_i the mean rating on item i.
How many neighbors should be considered?
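The formula above translates directly, with `sim` any item-item similarity and `item_mean` a precomputed map of per-item mean ratings (names hypothetical):

```python
def item_predict(ratings, sim, item_mean, a, i):
    """p_ai = mean(i) + sum_j sim(i,j)*(v_aj - mean(j)) / sum_j |sim(i,j)|,
    over items j != i already rated by user a."""
    rated = [j for j in ratings[a] if j != i]
    den = sum(abs(sim(i, j)) for j in rated)
    if den == 0:
        return item_mean[i]  # fall back to the item's mean rating
    num = sum(sim(i, j) * (ratings[a][j] - item_mean[j]) for j in rated)
    return item_mean[i] + num / den
```

Note the symmetry with the user-based scheme: the sum now runs over the active user's own rated items rather than over neighboring users.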
Experiments

For user- and item-based approaches, choose:
- the similarity measure
- the prediction scheme
- the neighborhood size K
For cluster-based approaches, choose:
- the distance measure
- the prediction scheme
- the number of clusters
Evaluation protocol [Herlocker et al., 2004]:
- movie rating dataset: MovieLens (6040 users × 3706 items)
- 10-fold cross validation (10 × 9/10th for learning)
- Mean Absolute Error on the test set T = {(u, i, r)}:

MAE = \frac{1}{|T|} \sum_{(u, i, r) \in T} |p_{ui} - r|
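The evaluation metric is a one-liner given any prediction function (a sketch; names hypothetical):

```python
def mae(predict, test_set):
    """Mean Absolute Error of a prediction function over a test set
    of (user, item, rating) triples."""
    return sum(abs(predict(u, i) - r) for u, i, r in test_set) / len(test_set)
```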
User-based approaches, similarity measures

[Figure: MAE (0.68-0.8) as a function of neighborhood size K (0-2500) for five similarity measures: Pearson, Constraint, Cosine, Adjusted, Proba]
User-based approaches, prediction schemes

[Figure: MAE (0.68-0.8) as a function of neighborhood size K (0-2500) for four prediction schemes: PearsonWeighted, PearsonDeviation, ProbaWeighted, ProbaDeviation]
Item-based approaches, similarity measures

[Figure: MAE (0.64-0.76) as a function of neighborhood size K (0-1400) for five similarity measures: Pearson, Constraint, Cosine, Adjusted, Proba]
Summary of experiments

                                 BestDefault  BestUser  BestItem  BestCluster
model construction time (sec.)             1       730       170          254
prediction time (sec.)                     1        31         3            1
MAE                                   0.6829    0.6688    0.6382       0.6736

- BestDefault: Bayes minimizing MAE
- BestUser: Pearson similarity, 1500 neighbors, prediction using deviation from the mean
- BestItem: probabilistic similarity, 400 neighbors, prediction using deviation from the mean
- BestCluster: K-means, Euclidean distance, 4 clusters, prediction using Bayes minimizing MAE
Conclusions

- All approaches, and all their possible options, are tested under exactly the same conditions
- Bayes is a good compromise: low error rate, low execution time, incremental
- Deviation from the mean: better results, new for item-based approaches
- Similarity measures: Pearson for user-based, probabilistic for item-based
Conclusions

The item-based approach:
- gets the best performance in the experiments
- seems to need fewer neighbors than the user-based approach
- is also appropriate for navigating item catalogues, even with no user information
- may naturally use content data about items to improve its results (likewise for the user-based approach with demographic data)
- do its results depend on the number of items compared to the number of users?
Next

Need to scale well even when faced with huge datasets
ex: Netflix Prize: 100,480,507 ratings from 480,189 users on 17,770 movies
- select the most relevant users [Yu et al., 2002]
- reduce dimensionality with PCA or SVD [Goldberg et al., 2001, Vozalis and Margaritis, 2005]
- create a set of super-users [Rashid et al., 2006]
- sampling? stochastic? bagging?
Combine approaches ⇒ ensemble methods [Polikar, 2006]
References

P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl (1994). GroupLens: an open architecture for collaborative filtering of netnews. In Conference on Computer Supported Cooperative Work, pages 175-186. ACM.
J. Breese, D. Heckerman and C. Kadie (1998). Empirical analysis of predictive algorithms for collaborative filtering. In 14th Conference on Uncertainty in Artificial Intelligence, pages 43-52. Morgan Kaufmann.
G. Karypis (2001). Evaluation of item-based top-N recommendation algorithms. In 10th International Conference on Information and Knowledge Management, pages 247-254.
K. Goldberg, T. Roeder, D. Gupta and C. Perkins (2001). Eigentaste: a constant time collaborative filtering algorithm. Information Retrieval, 4(2):133-151.
K. Yu, X. Xu, J. Tao, M. Ester and H. Kriegel (2002). Instance selection techniques for memory-based collaborative filtering. In SIAM Data Mining.
J. Herlocker, J. Konstan, L. Terveen and J. Riedl (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1):5-53.
G. Adomavicius and A. Tuzhilin (2005). Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734-749.
M. Vozalis and K. Margaritis (2005). Applying SVD on item-based filtering. In 5th International Conference on Intelligent Systems Design and Applications, pages 464-469.
A.M. Rashid, S.K. Lam, G. Karypis and J. Riedl (2006). ClustKNN: a highly scalable hybrid model- & memory-based CF algorithm. In KDD Workshop on Web Mining and Web Usage Analysis.
R. Polikar (2006). Ensemble systems in decision making. IEEE Circuits & Systems Magazine, 6(3):21-45.