UNIT III: User-Based Collaborative Filtering,
Similarity Function Variants, Variants of the
Prediction Function, Item-Based Collaborative
Filtering, Comparing User-Based and Item-Based
Methods, Strengths and Weaknesses of
Neighborhood-Based Methods
Neighborhood-Based Collaborative Filtering
Introduction
• Neighborhood-based collaborative filtering (also
called memory-based filtering) relies on user and
item similarity.
• Two main types:
– User-based collaborative filtering: Predicts ratings
based on similar users' ratings.
– Item-based collaborative filtering: Predicts ratings
based on a user's ratings of similar items.
Key Properties of Ratings Matrices
1. Definition and Structure of Ratings Matrices
- The ratings matrix R is an m × n matrix with one row for each of the m users and one column for each of the n items.
- Ratings are typically sparse, with only a small subset of the entries specified.
- Specified entries = training data; unspecified entries = test data.
- Recommendation is a generalization of classification and regression problems: a missing value can occur in any entry of the matrix, not only in a single target column.
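A minimal sketch of the structure described above, assuming NumPy and using `np.nan` to mark unspecified entries. The matrix values and the names `observed`, `train`, and `to_predict` are illustrative and do not come from the slides.

```python
import numpy as np

# Hypothetical 4-user x 5-item ratings matrix; np.nan marks an unspecified entry.
R = np.array([
    [5.0, np.nan, 3.0, np.nan, np.nan],
    [np.nan, 4.0, np.nan, np.nan, 1.0],
    [2.0, np.nan, np.nan, 5.0, np.nan],
    [np.nan, np.nan, 4.0, np.nan, np.nan],
])

m, n = R.shape                       # m users (rows), n items (columns)
observed = ~np.isnan(R)              # True where a rating is specified
sparsity = 1.0 - observed.sum() / (m * n)

# Specified entries play the role of training data: (user, item, rating) triples.
train = [(int(u), int(j), float(R[u, j])) for u, j in zip(*np.nonzero(observed))]

# Unspecified entries are the (user, item) pairs a recommender has to predict.
to_predict = [(int(u), int(j)) for u, j in zip(*np.nonzero(~observed))]

print(f"{m} users, {n} items, sparsity = {sparsity:.2f}")
```

Storing a dense array with NaN is only practical for toy data; a real system would more likely keep the triples or a sparse matrix.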
2. Types of Ratings
Continuous Ratings
- Ratings can take any value within a range (e.g., Jester joke system: -10 to 10).
- Drawback: Users find it difficult to choose from an infinite set of values.
Interval-Based Ratings
- Ratings are selected from a fixed scale (e.g., 1-5, -2 to 2, 1-7).
- Assumes equal distance between rating levels.
Ordinal Ratings
- Categorical but ordered values (e.g., “Strongly Disagree” to “Strongly Agree”).
- No assumption that differences between categories are equal.
3. Implicit Feedback & Unary Ratings
- Implicit feedback: User actions (e.g., purchases, clicks) are interpreted as preferences.
- More common than explicit ratings, as users interact more frequently than they rate.
- Can be seen as a positive-unlabeled (PU) learning problem in classification.
4. The Long-Tail Property in Ratings Distribution
- Observation: A small fraction of items are rated frequently (popular items), while the majority have few ratings (long-tail items).
- Graph representation:
  - X-axis: Items ranked in decreasing order of rating frequency.
  - Y-axis: Number of ratings per item.
- Results in a skewed distribution.
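The curve described above can be computed directly from a ratings matrix. The following is a sketch assuming NumPy and a NaN-marked matrix; the data and the function names `ratings_per_item` and `long_tail_curve` are hypothetical.

```python
import numpy as np

def ratings_per_item(R):
    """Count the specified (non-NaN) ratings in each column of R."""
    return (~np.isnan(R)).sum(axis=0)

def long_tail_curve(R):
    """Return item indices ordered from most to least rated, with their counts."""
    counts = ratings_per_item(R)
    order = np.argsort(-counts, kind="stable")    # decreasing rating frequency
    return order, counts[order]

# Hypothetical data: 6 users x 4 items.
R = np.array([
    [5, 3, np.nan, np.nan],
    [4, np.nan, np.nan, np.nan],
    [5, 2, np.nan, 1],
    [3, np.nan, np.nan, np.nan],
    [4, 4, 2, np.nan],
    [5, np.nan, np.nan, np.nan],
], dtype=float)

order, counts = long_tail_curve(R)
print(order, counts)   # x-axis: rank position in `order`; y-axis: `counts`
```

Plotting `counts` against rank position gives the skewed shape: a few items at the head with many ratings, followed by a long tail of rarely rated items.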
5. Implications of the Long-Tail Property
- Merchant Profitability
  - Popular items are competitive but low-profit.
  - Less popular items (long-tail) often have higher profit margins (e.g., Amazon’s strategy).
Predicting Ratings with Neighborhood-Based
Methods
1. Concept of Neighborhood-Based Methods
- Uses user-user similarity or item-item similarity to make recommendations.
- Relies on the principle that similar users or similar items have similar ratings.
2. Two Basic Principles
- User-Based Models
  - Users with similar rating patterns tend to rate items similarly.
  - Example: If Alice and Bob have rated movies similarly in the past, Alice’s rating for "Terminator" can predict Bob’s rating for the same movie.
- Item-Based Models
  - Similar items receive similar ratings from the same user.
  - Example: Bob's ratings for "Alien" and "Predator" can predict his rating for "Terminator."
3. Connection to Nearest Neighbor Classification
- Collaborative filtering is a generalization of classification/regression modeling.
- Neighborhood-based models are similar to nearest neighbor classifiers in machine learning.
- Unlike classification, where nearest neighbors are found only among rows, collaborative filtering can determine nearest neighbors among either rows (users) or columns (items).
5. Item-Item Similarity Computation (Example from Table 2.2)
- Adjusted cosine similarity is used for item similarity calculations.
- Items are compared after mean-centering the ratings (subtracting each user's mean rating) to eliminate user bias.
- A higher adjusted cosine score between two items indicates that users tend to rate them alike.
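As a sketch of the computation described above, the function below mean-centers each user's row and then takes the cosine between two item columns over the users who rated both. It assumes NumPy and a NaN-marked matrix. The data is hypothetical and is not the slide's Table 2.2; returning 0.0 when no users co-rated the items is an assumed convention.

```python
import numpy as np

def adjusted_cosine(R, i, j):
    """Adjusted cosine similarity between item columns i and j of R.

    Each user's mean (over that user's specified ratings) is subtracted
    first; the cosine is then taken over users who rated both items.
    """
    user_means = np.nanmean(R, axis=1, keepdims=True)
    S = R - user_means                         # mean-centered ratings
    both = ~np.isnan(S[:, i]) & ~np.isnan(S[:, j])
    if not both.any():
        return 0.0                             # assumed fallback: no co-rating users
    a, b = S[both, i], S[both, j]
    denom = np.sqrt((a ** 2).sum()) * np.sqrt((b ** 2).sum())
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical ratings: 4 users x 3 items.
R = np.array([
    [5.0, 3.0, 1.0],
    [4.0, 4.0, 1.0],
    [1.0, np.nan, 3.0],
    [np.nan, 5.0, 1.0],
])
sim_01 = adjusted_cosine(R, 0, 1)   # mildly positive
sim_02 = adjusted_cosine(R, 0, 2)   # strongly negative
print(round(sim_01, 3), round(sim_02, 3))
```

Centering by the user's mean rather than the item's mean is what distinguishes adjusted cosine from a plain cosine between columns.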
User-Based Neighborhood Models
1. Concept of User-Based Neighborhoods
- Defines user neighborhoods by identifying users similar to the target user.
- Uses these similar users' ratings to predict missing ratings for the target user.
- A similarity function is required, and it must account for different rating scales among users.
2. Key Challenges in User-Based Similarity Computation
- Different rating scales: Some users consistently give higher or lower ratings than others.
- Sparse ratings: Many users rate only a small subset of items, making similarity computation challenging.
- Mutual rating sets: Similarity is computed only over the items that both users have rated.
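One similarity function that addresses these challenges is the Pearson correlation, restricted to the items both users rated. The sketch below assumes NumPy and a NaN-marked matrix. Taking each user's mean over all of that user's ratings and returning 0.0 when fewer than two items overlap are assumed conventions; the data is hypothetical.

```python
import numpy as np

def pearson_sim(R, u, v):
    """Pearson correlation between users u and v over their co-rated items.

    Subtracting each user's mean removes differences in rating scale.
    """
    mu_u = np.nanmean(R[u])                       # mean over all of u's ratings
    mu_v = np.nanmean(R[v])
    common = ~np.isnan(R[u]) & ~np.isnan(R[v])    # mutually rated items
    if common.sum() < 2:
        return 0.0                                # assumed fallback: too little overlap
    du = R[u, common] - mu_u
    dv = R[v, common] - mu_v
    denom = np.sqrt((du ** 2).sum() * (dv ** 2).sum())
    return float(du @ dv / denom) if denom > 0 else 0.0

# Hypothetical ratings: 3 users x 4 items.
R = np.array([
    [5.0, 4.0, 3.0, np.nan],   # user 0: generous rater
    [3.0, 2.0, 1.0, np.nan],   # user 1: same pattern on a lower scale
    [1.0, 2.0, 3.0, 2.0],      # user 2: opposite pattern
])
print(pearson_sim(R, 0, 1), pearson_sim(R, 0, 2))
```

Users 0 and 1 never give the same rating to an item, yet their similarity is maximal because their deviations from their own means agree.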
5. Variations & Enhancements
- Some implementations compute each user's mean rating separately for every user pair, over only the items both users have rated, rather than once over all of the user's ratings.
- Heuristic filtering removes users with low or negative similarity to improve accuracy.
- The method allows for different similarity measures and weighting strategies to fine-tune recommendations.
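The filtering heuristic can be sketched as part of a prediction step. The prediction formula does not appear in this part of the slides, so the code assumes the commonly used mean-centered weighted average: the target user's mean plus the similarity-weighted average of each neighbor's deviation from their own mean. The function name, the parameters `k` and `min_sim`, the similarity values, and the data are all hypothetical, and each user's mean is taken over all of that user's ratings rather than over overlapping items.

```python
import numpy as np

def predict_rating(R, sims, u, j, k=2, min_sim=0.0):
    """Mean-centered, similarity-weighted prediction of user u's rating for item j.

    sims[v] is the similarity of user v to user u, computed elsewhere.
    Users with sims[v] <= min_sim are filtered out; at most k of the
    remaining users who rated item j form the neighborhood.
    """
    mu = np.nanmean(R, axis=1)                    # each user's mean rating
    candidates = [v for v in range(R.shape[0])
                  if v != u and not np.isnan(R[v, j]) and sims[v] > min_sim]
    peers = sorted(candidates, key=lambda v: sims[v], reverse=True)[:k]
    if not peers:
        return float(mu[u])                       # assumed fallback: user's own mean
    num = sum(sims[v] * (R[v, j] - mu[v]) for v in peers)
    den = sum(abs(sims[v]) for v in peers)
    return float(mu[u] + num / den)

# Hypothetical ratings: user 0 has not rated item 2.
R = np.array([
    [4.0, 2.0, np.nan],
    [6.0, 2.0, 7.0],
    [2.0, 1.0, 3.0],
    [1.0, 5.0, 6.0],
])
sims = np.array([1.0, 0.75, 0.25, -0.9])   # assumed similarities to user 0

pred = predict_rating(R, sims, u=0, j=2)   # user 3 is dropped (negative similarity)
print(pred)
```

Raising `min_sim` shrinks the neighborhood to more trustworthy peers, at the cost of falling back to the user's mean more often when no neighbor qualifies.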