Comparing State-of-the-Art Collaborative Filtering Systems


Laurent Candillier, Frank Meyer, Marc Boullé
France Telecom R&D, Lannion
MLDM 2007


1 Introduction
2 Collaborative approaches
3 Experiments
4 Conclusions

Recommender systems


Help users find items they should appreciate from huge catalogues [Adomavicius and Tuzhilin, 2005]
⇒ Collaborative filtering: based on the user-to-item rating matrix


[Example user-item rating matrix with users u1–u7 as columns and items i1–i5 as rows; most cells are empty, ratings range from 1 to 5, and "?" marks the rating to be predicted.]

User-based approaches

Recommend items appreciated by users whose tastes are similar to those of the given user [Resnick et al., 1994]

⇒ need a similarity measure between users
ex: Pearson similarity, i.e. the cosine of the deviations from the mean


w(a, u) = \frac{\sum_{i \in S_a \cap S_u} (v_{ai} - \bar{v}_a)(v_{ui} - \bar{v}_u)}{\sqrt{\sum_{i \in S_a \cap S_u} (v_{ai} - \bar{v}_a)^2} \, \sqrt{\sum_{i \in S_a \cap S_u} (v_{ui} - \bar{v}_u)^2}}

where
v_{ui}: rating of user u on item i
S_u: set of items rated by user u
\bar{v}_u: mean rating of user u, \bar{v}_u = \frac{1}{|S_u|} \sum_{i \in S_u} v_{ui}
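A minimal Python sketch of this weight, assuming each user's ratings are kept in a plain {item: rating} dict (a hypothetical layout, not from the slides):

import math

def pearson_weight(ratings_a, ratings_u):
    # S_a ∩ S_u: items rated by both users
    common = set(ratings_a) & set(ratings_u)
    if not common:
        return 0.0
    # Means are taken over each user's own rated items, as in the formula above
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    num = sum((ratings_a[i] - mean_a) * (ratings_u[i] - mean_u) for i in common)
    den = math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common)) \
        * math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    return num / den if den else 0.0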

User-based approaches


Which rating for user a (active) on item i?

Prediction using a weighted sum:
p_{ai} = \frac{\sum_{\{u \mid i \in S_u\}} w(a, u) \, v_{ui}}{\sum_{\{u \mid i \in S_u\}} |w(a, u)|}

Prediction using a weighted sum of deviations from the mean:
p_{ai} = \bar{v}_a + \frac{\sum_{\{u \mid i \in S_u\}} w(a, u) \, (v_{ui} - \bar{v}_u)}{\sum_{\{u \mid i \in S_u\}} |w(a, u)|}

How many neighbors should be considered?
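As an illustration only, a sketch of the deviation-from-the-mean scheme, reusing the hypothetical {item: rating} dicts and a precomputed dict of weights w(a, u) (both assumptions, not the authors' code):

def predict_user_based(active, item, ratings, weights):
    # ratings: {user: {item: rating}}; weights: {user: w(active, user)}
    mean_a = sum(ratings[active].values()) / len(ratings[active])
    num = den = 0.0
    for u, user_ratings in ratings.items():
        if u == active or item not in user_ratings:
            continue
        mean_u = sum(user_ratings.values()) / len(user_ratings)
        w = weights.get(u, 0.0)
        num += w * (user_ratings[item] - mean_u)
        den += abs(w)
    # Fall back to the active user's mean when no neighbor rated the item
    return mean_a + (num / den if den else 0.0)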

Cluster-based approaches


Recommend items appreciated by users that belong to the same group as the given user [Breese et al., 1998]
⇒ need
  a clustering method, e.g. K-means
  a distance measure, e.g. Euclidean distance
Then the rating of a user on an item is predicted as the mean rating given by the users that belong to the same cluster.
How many clusters should be considered?
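A rough sketch under stated assumptions: ratings sit in a dense users × items numpy array with NaN for missing values, and scikit-learn's KMeans stands in for the clustering step (neither is specified on the slide):

import numpy as np
from sklearn.cluster import KMeans

def cluster_based_predictions(R, n_clusters=4):
    # R: users x items matrix, np.nan where a rating is missing
    global_mean = np.nanmean(R)
    filled = np.where(np.isnan(R), global_mean, R)  # crude imputation, used only for clustering
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(filled)
    preds = np.empty_like(R, dtype=float)
    for c in range(n_clusters):
        cluster_rows = R[labels == c]
        item_means = np.nanmean(cluster_rows, axis=0)  # mean rating per item within cluster c
        item_means = np.where(np.isnan(item_means), global_mean, item_means)
        preds[labels == c] = item_means
    return preds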

Item-based approaches


Recommend items similar to those appreciated by the given user [Karypis, 2001]
⇒ dual of the user-based approach

p_{ai} = \bar{v}_i + \frac{\sum_{\{j \in S_a \mid j \neq i\}} sim(i, j) \, (v_{aj} - \bar{v}_j)}{\sum_{\{j \in S_a \mid j \neq i\}} |sim(i, j)|}

sim(i, j): similarity measure between items i and j
S_a: set of items rated by user a
\bar{v}_i: mean rating on item i
How many neighbors should be considered?
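Again a sketch only, assuming a precomputed item-item similarity lookup and per-item mean ratings (hypothetical structures, not from the paper):

def predict_item_based(active_ratings, target_item, item_sim, item_means):
    # active_ratings: {item: rating} of the active user (S_a)
    # item_sim: {(i, j): sim(i, j)}; item_means: {i: mean rating on item i}
    num = den = 0.0
    for j, r_aj in active_ratings.items():
        if j == target_item:
            continue
        s = item_sim.get((target_item, j), 0.0)
        num += s * (r_aj - item_means[j])
        den += abs(s)
    return item_means[target_item] + (num / den if den else 0.0)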

Experiments


For user- and item-based approaches, choose
  a similarity measure
  a prediction scheme
  a neighborhood size K
For cluster-based approaches, choose
  a distance measure
  a prediction scheme
  a number of clusters
Evaluation protocol [Herlocker et al., 2004]
  movie rating dataset: MovieLens (6040 users × 3706 items)
  10-fold cross-validation (10 times, 9/10th for learning)
  Mean Absolute Error on the test set T = {(u, i, r)}:
  MAE = \frac{1}{|T|} \sum_{(u, i, r) \in T} |p_{ui} - r|
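For concreteness, a small sketch of the error measure, where predict(u, i) stands for any of the predictors sketched above (names are illustrative):

def mean_absolute_error(predict, test_set):
    # test_set: iterable of (user, item, true_rating) triples
    errors = [abs(predict(u, i) - r) for (u, i, r) in test_set]
    return sum(errors) / len(errors)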

User-based approaches, similarity measures

[Figure: MAE as a function of the neighborhood size K (0 to 2500) for user-based approaches with similarity measures Pearson, Constraint, Cosine, Adjusted, Proba; MAE axis from 0.68 to 0.8.]

User-based approaches, prediction schemes

[Figure: MAE as a function of the neighborhood size K (0 to 2500) for user-based prediction schemes PearsonWeighted, PearsonDeviation, ProbaWeighted, ProbaDeviation; MAE axis from 0.68 to 0.8.]

Item-based approaches, similarity measures

[Figure: MAE as a function of the neighborhood size K (0 to 1400) for item-based approaches with similarity measures Pearson, Constraint, Cosine, Adjusted, Proba; MAE axis from 0.64 to 0.76.]

Summary of experiments

                                   BestDefault   BestUser   BestItem   BestCluster
model construction time (in sec.)            1        730        170           254
prediction time (in sec.)                    1         31          3             1
MAE                                     0.6829     0.6688     0.6382        0.6736

BestDefault: Bayes minimizing MAE
BestUser: Pearson similarity, 1500 neighbors, prediction using deviation from the mean
BestItem: probabilistic similarity, 400 neighbors, prediction using deviation from the mean
BestCluster: K-means, Euclidean distance, 4 clusters, prediction using Bayes minimizing MAE

Conclusions


All approaches, and all their possible options, are tested under exactly the same conditions.
Bayes is a good compromise: low error rate, low execution time, incremental.
Deviation from the mean: better results, and new for item-based approaches.
Similarity measures: Pearson for user-based, probabilistic for item-based.

Conclusions

The item-based approach
  gets the best performance in the experiments
  seems to need fewer neighbors than the user-based approach
  is also appropriate for navigating item catalogues, even with no user information
  may naturally use content data about items to improve its results (likewise for the user-based approach with demographic data)
Do the results depend on the number of items compared to the number of users?

Next


Need to scale well even when faced with huge datasets
  ex: Netflix Prize: 100,480,507 ratings from 480,189 users on 17,770 movies
  select the most relevant users [Yu et al., 2002]
  reduce dimensionality with PCA or SVD [Goldberg et al., 2001, Vozalis and Margaritis, 2005]
  create a set of super-users [Rashid et al., 2006]
  sampling? stochastic? bagging?
Combine approaches ⇒ ensemble methods [Polikar, 2006]
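To illustrate the dimensionality-reduction idea only (a sketch, not the method evaluated in the talk), a truncated SVD of a mean-filled rating matrix with numpy:

import numpy as np

def low_rank_approximation(R, k=20):
    # R: users x items matrix, np.nan for missing ratings; k: number of latent factors kept
    global_mean = np.nanmean(R)
    filled = np.where(np.isnan(R), global_mean, R)
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    # Rebuild the matrix from the k strongest singular vectors
    return (U[:, :k] * s[:k]) @ Vt[:k, :]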


References

P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl (1994). GroupLens: an open architecture for collaborative filtering of netnews. In Conference on Computer Supported Cooperative Work, pages 175–186. ACM.

J. Breese, D. Heckerman and C. Kadie (1998). Empirical analysis of predictive algorithms for collaborative filtering. In 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52. Morgan Kaufmann.

G. Karypis (2001). Evaluation of item-based top-N recommendation algorithms. In 10th International Conference on Information and Knowledge Management, pages 247–254.

K. Goldberg, T. Roeder, D. Gupta and C. Perkins (2001). Eigentaste: a constant time collaborative filtering algorithm. Information Retrieval, 4(2):133–151.

K. Yu, X. Xu, J. Tao, M. Ester and H. Kriegel (2002). Instance selection techniques for memory-based collaborative filtering. In SIAM Data Mining.

J. Herlocker, J. Konstan, L. Terveen and J. Riedl (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1):5–53.

G. Adomavicius and A. Tuzhilin (2005). Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749.

M. Vozalis and K. Margaritis (2005). Applying SVD on item-based filtering. In 5th International Conference on Intelligent Systems Design and Applications, pages 464–469.

A.M. Rashid, S.K. Lam, G. Karypis and J. Riedl (2006). ClustKNN: a highly scalable hybrid model- & memory-based CF algorithm. In KDD Workshop on Web Mining and Web Usage Analysis.

R. Polikar (2006). Ensemble systems in decision making. IEEE Circuits & Systems Magazine, 6(3):21–45.