Options are good, but if there are too many options, we need help. It is increasingly the case that our interaction with content is mediated by recommendation systems. There are two main approaches taken in recommendation systems: content filtering and collaborative filtering. Content filtering makes use of features associated with items and users (e.g., age, location, gender of users and genre, actors, director of movies). In contrast, collaborative filtering is based on observed user preferences. Thus, two users are thought of as similar if they have revealed similar preferences, irrespective of their profile. Likewise, two items are thought of as similar if most users have similar preferences for them. More generally, collaborative filtering (CF) makes use of structure in the matrix of preferences, as in low-rank matrix formulations [1, 2, 3, 4, 5, 6, 7, 8]. In this paper, since our model has no item and user features, all algorithms must do collaborative filtering.
An important aspect of most recommendation systems is that each recommendation influences what is learned about the users and items, which in turn determines the possible accuracy of future recommendations. This introduces a tension between exploring to obtain information and exploiting existing knowledge to make good recommendations. The tension between exploring and exploiting is exactly the phenomenon of interest in the substantial literature on the multi-armed bandit (MAB) problem and its variants [9, 10, 11]. In the multi-armed bandit setup, optimal algorithms necessarily converge to repeated play of the same arm; in contrast, a recommendation system that repeatedly recommends the same movie, even if it is a very good movie, is surely problematic! For this reason we will allow each item to be recommended at most once to each user (as done in [12, 13]).
It is common to think of recommendation systems as a matrix completion problem. Given a subset of observed entries, the matrix completion problem is to estimate the rest of matrix, where it is assumed that the matrix satisfies some properties. This criterion does not capture the experience of users in a recommendation system: a more appropriate measure of performance is the proportion of good recommendations made by the algorithm.
With the aforementioned issues in mind, we work within a mathematical framework for evaluating the performance of various recommendation system algorithms, related to the models studied in [12, 13]. The framework is detailed in Section 2, but in brief, at each time-step each user in the system is given a recommendation and then provides binary feedback in the form of ’like’ or ’dislike’. The user preferences are described by a latent variable model in which each user is associated with a user type and each item is associated with an item type. Users who belong to the same user type have identical preferences for all items and items belonging to the same type have identical ratings from all users.The basic measure of performance is expected regret, defined as the expected number of bad recommendations made per user over a time horizon of interest. A second performance criterion is the cold start time, the first time at which recommendations become nontrivial in quality. Our goal is to understand the dependence of these quantities on system parameters and we will therefore seek bounds accurate only to within constant or logarithmic factors.
In the literature there are two categories of collaborative filtering (CF) algorithms. Useruser algorithms [16, 12, 17] use structure in the user space to predict user preferences. Here, the preference of user u for item i is estimated from the preference of other users believed to be similar to u based on their previous ratings. Alternatively, item-item algorithms [13, 18, 19] use structure in the item space. This time, the preference of user u for item i is estimated from the preference of the same user u for other items
believed to be similar to i based on previous ratings from users that have rated both
. In Sections 4 and 6 we develop versions of user-user and item-item CF algorithms tailored to our online recommendation system model and prove performance guarantees. In order to achieve good performance, these algorithms must carefully explore the a priori unknown relationships between users and items. One of the unexpected insights that emerge from the analysis is that the item-item algorithm must limit the exploration to only a subset of the items types, where the size of this subset depends on the system parameters and time-horizon. The straightforward approach to Item-Item CF algorithms is to learn the whole preference matrix, and as described in Section 3 this results in a qualitatively suboptimal cold-start time that can be arbitrarily worse than the one obtained by our algorithm.
In order to focus on the information structure of the recommendation problem, and the associated exploration-exploitation tradeoff, the majority of the paper assumes that user feedback is noiseless. We generalize our user-user algorithm to handle noisy feedback and also describe how one would similarly accommodate noisy feedback in the item-item algorithm. In essence, estimation of similarity between users (or items) requires some redundancy in the information collected in order to average out the noise.
We prove nearly tight lower bounds on regret for two parameter regimes of interest, identifying settings in which the proposed algorithms cannot be significantly improved. In the user structure only scenario, the model parameters are such that there is no structure in the item space. Analogously, in the item structure only scenario, the parameters are such that there is no structure in the user space. We prove information-theoretic lower bounds for the performance of any algorithm in the user-structure only and item-structure only models, which match to within a logarithmic factor the performance obtained by our proposed user-user and item-item CF algorithms. These results are outlined in Section 3.
One of this paper’s main contributions is the development of techniques for proving lower bounds on the performance of online recommendation algorithms. Our lower bounds depend crucially on the inability to repeatedly recommend the same item to a given user, and for this reason are completely different from lower bounds for multi-armed bandit problems [10, 9]. At a high level, however, the basic challenge is the same as when proving lower bounds for bandits: one must connect the information obtained by the algorithm to the regret incurred. This allows to reason that subsequent recommendations will have low regret only if prior recommendations yielded significant information, which in turn necessitated exploratory recommendations with correspondingly substantial regret. Thus, regret is a conserved quantity and cannot be avoided by employing complicated adaptive algorithms.
The methods used for the lower bounds are elementary in nature. For example, in the user structure only model, the arguments in Section 7 are based on two observations. First, one cannot be confident in recommending any item to user u at time t if there is no user that has rated enough items in common, and in agreement, with user
In this situation, the similarity of u to any other user is uncertain and so too is the outcome of any recommendation. Second, the outcome of recommending item i to user u at time t is also uncertain if none of the users that actually are similar to u have rated item
1. These observations imply a lower bound on the necessary number of exploratory recommendations before it is possible to recommend with much better likelihood of success than chance. Similar reasoning leads to lower bounds in Section 5 for the model with only item-structure.
A few papers including [12, 13, 20] have theoretical analyses for online collaborative filtering. The paper [12] analyzes a user-user CF algorithm in a similar setting to ours and [13] analyzes an item-item CF algorithm in a somewhat different and more flexible model. Relative to these, our main distinction is obtaining nearly matching lower bounds showing optimality of our algorithms and analysis. The model studied by Dabeer and coauthors [20, 1, 22] is also quite similar to our setup, but their objective is different: they seek an algorithm that exploits in a provably optimal fashion asymptotically in time, but their approach does not reveal how to explore. In a different direction, Kerenidis and Prakash [21] seek to achieve low computational complexity for recommendation in a similar setup as ours. What they show is that reconstructing the preference matrix only partially, which is what our item-item CF algorithm does, is useful also with regards to computation.
Hybrid algorithms exploiting both structure in user space and item space have been studied before in [23, 24, 25]. Both Song et al. [23] and Borgs et al. [26] study a more flexible latent variable model in the offline (matrix completion style) setting and propose collaborative filtering algorithms using both item and user space. In a forthcoming paper we analyze a hybrid algorithm within the same framework studied here.
1.1 Outline
The model and performance metric are described in Section 2. Section 3 overviews the main results of this paper and includes numerical simulations to complement the theoretical analyses. Our version of user-user CF is introduced and analyzed in Section 4. In Section 5 we prove that the proposed algorithm is almost information-theoretically optimal in the setup with user structure only. Our version of item-item CF is described and analyzed in Section 6, and the corresponding lower bound in the setting with item structure only is given in Section 7. Appendix A contains a few basic probabilistic lemmas, and Appendix B relates so-called anytime regret (unknown time horizon) to known time horizon.
1.2 Notation
For an integer a we write [and for real-valued
. All logarithms are to the base of 2. The set of natural numbers (positive integers) is denoted by N. We note here that variables or parameters in Figure 1 have the same meaning throughout the paper, but any others may take different values in each section. For real-valued
denotes the greatest integer less than or equal to
denotes the smallest integer greater than or equal to x. Numerical constants (
and so forth) may take different values in different theorem statements unless explicitly stated otherwise.
2.1 Problem setup
There is a fixed set of users {1, . . . , N}. At each time t = 1, 2, 3, . . . the algorithm recommends an item to each user u and receives feedback
(‘like’ or ‘dislike’). For the reasons stated in the introduction, we impose the condition that each item may be recommended at most once to each user. In order that the algorithm never run out of items to recommend, we suppose there are infinitely many items to draw from and identify them with the natural numbers.
The history is the collection of actions and feedback up to time t. We are interested in online learning algorithms, in which the action
is a (possibly random) function of the history up through the end of the previous time-step
. This additional randomness is encoded in a random variable
, assumed to be independent of all other variables. In this way,
), for some deterministic function
Algorithm performance will be evaluated after some arbitrary number of time-steps T. The performance metric we use is expected regret (simply called regret in what follows), defined as the expected number of disliked items recommended per user:
Here the expectation is with respect to the randomness in both the model and the algorithm. The algorithms we describe depend on knowing the time-horizon T, but by a standard doubling trick (explained in Appendix B) it is possible to convert these to algorithms achieving the same (up to constant factors) regret without this knowledge (see, e.g., [27]). This latter notion of regret, where the algorithm does not know the time-horizon of interest and must achieve good performance across all time-scales, is called anytime regret in the literature.
The time at which point recommendations become nontrivial in quality is another important performance criterion, because until that point users invest effort but get little in return. In the recommendation systems literature the notion of cold start describes the difficulty of providing useful recommendations when insufficient information is available about user preferences. We define the cold start time to be the first time at which the slope of regret as a function of T is bounded by some value
This is similar to (but somewhat simpler than) the definition in [13].
2.2 User preferences
We study a latent-variable model for the preferences (‘like’ or ‘dislike’) of the users for the items, based on the idea that there are relatively few types of users and/or few types of items. Each user ] has a user type
) i.i.d. uniform on [
is the number of user types.
Figure 1: Notation for the recommendation system model.
We assume that , because if
then most users have their own type and all of the results remain unchanged upon replacing
. Similarly, each item
has a random item type
) i.i.d. uniform on [
is the number of item types
. The random variables
are assumed to be jointly independent.
All users of a given type have identical preferences for all the items, and similarly all items of a given type are rated in the same way by any particular user. The entire collection of user preferences (is therefore encoded into a much smaller
which specifies the preference of each user type for each item type. The preference
is the preference
of the associated user type
) for the item type
) in the matrix Ξ, i.e.,
We assume that the entries of Ξ are i.i.d., 2. Generalizing our results to i.i.d. entries with bias p is straightforward. However, the independence assumption is quite strong and an important future research direction is to obtain results for more realistic preference matrices. We also consider a noisy model with
are i.i.d. random variables with
2.3 Two regimes of interest
Two specific parameter regimes play a central role in this paper, capturing settings with structure only in user space or only in item space. As described in Section 3, each of user-user or item-item CF is almost optimal in the corresponding regime.
Definition 2.1 (User structure only (refers to the case that there is no structure in the item space. To simplify matters, we assume that the preference matrix Ξ
is deterministic and has columns consisting of all sequences in
Essentially the same preference matrix would arise (with high probability) if
is much larger than
2(when the entries are i.i.d. as specified above in Subsection 2.2).
Definition 2.2 (Item structure only (refers to the case that there is no structure in the user space. This happens when
is much larger than N, since then most user types have no more than one user. For the purpose of proving near-optimality of item-item CF, it suffices to take
(and we do so).
We will analyze a version of each of user-user and item-item CF within the general setup described in Section 2. The resulting regret bounds appear in Theorems 4.1 and 6.1. These theorems are complemented by information theoretic-lower bounds, Theorems 5.1 and 7.1, showing that no other algorithm can achieve much better regret (up to multiplicative logarithmic factors) in the specific extreme parameter regimes with user-structure only and item-structure only. The simplified versions of these theorems appear in this section. Towards the end of this section we present simulation results supporting the theorems.
3.1 User-user collaborative filtering
User-user CF exploits structure in the user space: the basic idea is to recommend items to a user that are liked by similar users. We analyze an instance of user-user CF described in detail in Section 4.1, obtaining the regret bound given in Theorem 4.1 below. Essentially, the algorithm clusters users according to type by recommending random items for an initial phase, and then uses this knowledge to efficiently explore the preferences of each user type (as opposed to each user individually). The subsequent savings is due to the fact that the cost of exploration can be shared amongst users of the same type.
The random recommendations made during the initial phase incur regret with slope 1/2, because a random recommendation is disliked with probability half. Afterward, the users are clustered according to type. Recommending an item to users, one from each type, gives us the preferences of all N users for the item, and each such recommendation is disliked with probability 1/2. This results in a slope of
for regret in the second phase of the algorithm.
Theorem 4.1 (Regret upper bound in user-user CF, simplified version). Consider the recommendation system model described in Section 2 with user types, and
types. There exists numerical constants c, C so that Algorithm 1 achieves regret
The cold-start time, the time until the slope of the regret drops below , is evidently Θ(log N) for any
). It follows from the next theorem that if there is no structure in the item space and the number of user types is
1, then the user-user CF algorithm achieves both regret and cold start time that are optimal up to multiplicative constants.
Theorem 5.1 (Regret lower bound with user structure only, simplified version). There exist a numerical constant c such that in the user structure model (Defn 2.1) with types and
users, any recommendation algorithm must incur regret
The reasoning for the first part of the lower bound is as follows. If a user has been recommended fewer than log items, then its similarity with respect to other users cannot be determined. This implies that any recommendation made to this user has uncertain outcome. The second part of the lower bound is obtained by showing that when an item is recommended for the first time to a user from a given user type the outcome of that recommendation is uncertain, and lower bounding the number of such recommendations. This is where we use the condition that each item is recommended at most once to each user.
The lower bound shows that the poor initial performance of user-user CF, as bad as simply recommending random items, is unavoidable in the setting with only user structure and that its duration depends on the number of user types. In [13] it was shown that a version of item-item CF obtains much smaller cold start time than user-user CF in a model with item structure only. Our results on item-item CF, described next, corroborate this.
3.2 Item-item collaborative filtering
Item-item CF exploits structure in the item space: users are recommended items similar to those they have liked. We analyze an instance of item-item CF in Section 6.1, obtaining the regret bound given in Theorem 6.1 below. The algorithm creates several clusters of items, as well as a set of unclustered items. Similarity of two items is estimated by having random users rate both items. Users then explore a single item from each cluster and liked clusters are subsequently recommended. The effort of clustering is shared amongst all the users, and the savings is due to liked explorations yielding an entire cluster of items to recommend.
Crucially, this version of item-item CF has the feature that only a subset of the item space is explored (i.e., only a subset of the item types are clustered, with the others cast aside). To the best of our knowledge, the benefit of limiting the scope of item exploration has not been made explicit before; this only became evident to us in seeking to match the lower bound. The total number of items compared and the number of clusters are chosen depending on the system parameters to give the best regret bound.
Theorem 6.1 (Regret upper bound in item-item CF, simplified version). Consider the recommendation system model described in Section 2 with item types, and
user types. There are numerical constants
such that Algorithm 3 obtains regret per user at time T upper bounded as
If there is no structure in the user space and the number of item types is 0, then the item-item CF algorithm is optimal up to a logarithmic factor.
Theorem 7.1 (Regret lower bound for item structure only, simplified version). In the item-structure model (Defn. 2.2) with item types and N > 32 users, there exist numerical constants
such that any recommendation algorithm must incur regret
It follows that the cold start time in the item-structure only regime is lower bounded as
), while the upper bound based on our proposed algorithm is
Note that the cold start time with
). The gap in the upper and lower bounds on cold start time is a consequence of the existence of the second regime in the lower bound given above, and appears to be an artifact of our proof.
The proof of the lower bound is based on two main observations. First, if an item has been recommended to fewer than log users, then its similarity with respect to other items cannot be determined; this implies that recommending this item to any user has uncertain outcome. Second, when a user is recommended an item from a given item type for the first time, the outcome of that recommendation is uncertain since this reveals a new variable in the preference matrix. Lower bounding the number of such uncertain recommendations gives the lower bound for regret.
If there is structure in the item space it is possible to avoid the long cold-start time of algorithms using only user structure: even for a very short time horizon, they can guarantee nontrivial bounds on regret. In particular, the near-optimal algorithm proposed here suffers from a constant value of regret for an initial period. Note that as N increases, the regret upper bound (given in Theorem 6.1) in the initial phase (constant ) does not change, but the length of the initial phase increases. Thus increasing N makes it easier to make meaningful recommendations. The same phenomenon is true more generally: the upper bound on regret at any time T is a decreasing function of N.
3.3 Numerical Simulations
We simulated our versions of User-User and Item-Item Algorithms (As described in Sections 4 and 6). In Figure 2, we plot the regret as a function of time for the User-User Algorithm (Alg. 1 in Section 4). We observe that the slope of regret in the asymptotic regime increases by increasing . We also observe that increasing N decreases the asymptotic slope but does not decrease the cold start time of the algorithm.
Figure 2: Simulated performance for Algorithm User-User. System parameters are (a) N = 400 and
In Figure 3, we plot the regret as a function of time for the Item-Item Algorithm (Alg. 3 in Section 6). We observe that with fixed N, increasing increases the cold-start time. But with fixed
, the cold-start time shrinks linearly in N. We also observe that the slope of regret after the cold start time increases with increasing
and decreasing N, consistent with the statement of Theorem 6.1.
In this section, we describe a version of user-user CF and then analyze it within the latent variable model introduced in Section 2.
Figure 3: Simulated performance for Algorithm Item-Item. System parameters are (a) N = 600 and
4.1 Algorithm
Pseudocode for algorithm User-User appears as Algorithm 1. In Step 1, random items are recommended to all of the users. The ratings of these items are used to construct a partition of users that recovers the user types correctly with high probability. In Step 2, users are recommended new random items (exploration) until an item is liked. If the user is in group
the partition, the item is added to a set
of items to be recommended to all other users in the same partition (exploitation). Step 2 (find and recommend items) is repeated indefinitely.
Remark 4.1. Our model assumes that users of the same type have identical ratings. Hence, users of the same type are always in the same group after partitioning. However, due to random sampling of the items in exploration, users from different types can have identical ratings for the items recommended in Step 1, in which case they will end up in the same partition. It follows that the total number of groups in the user partition is at most
We make a few additional remarks regarding the algorithm:
• The labeling of user groups in the partitioning step is arbitrary (and may be different from the similarly arbitrary labeling of user types).
• In Step 2, the sets of items at each time contain the items exploitable by users in the k-th group in the partition. The algorithm predicts that all users in the k-th group like items in
• The algorithm takes as input. As mentioned in Section 2, a doubling trick described in Appendix B converts the algorithm to one oblivious to T. It is also fairly straightforward to modify the algorithm to be adaptive to
The adaptive algorithm initializes with a trivial partition placing all users in one group. The algorithm subsequently refines
the partition whenever a user’s feedback indicates that they have been grouped incorrectly. We chose not to do so since it complicates the analysis.
Theorem 4.1. Consider the model introduced in Section 2 with user types and
item types. Let
achieves regret
The assumption ensures that with probability 1
) for each user type, there is at least one item type that is liked. This assumption also ensures that with probability 1
any pair of user types, there is at least one item type which is rated differently by them. If there is no such item type, then the two user types rate everything similarly and are indistinguishable.
The theorem indicates that up until time r, the algorithm is making meaningless (randomly chosen independent of feedback) recommendations. Random recommendations have probability half of being liked, hence incur regret with slope 1/2. After that, the algorithm achieves the asymptotic slope indicating that on average recommendations out of N are random. The simplified version of this theorem in Section 3 is obtained using 2 log N < r < 7 log N (since
). We also pick the constant C large enough so that
T > r.
4.2 Proof of Theorem 4.1
We first bound the probability that the partition created by the algorithm is correct in Lemma 4.2. Next, to prove the theorem we will show that conditioned on the partition being correct, the number of exploratory recommendations (and hence the regret) is upper bounded.
be the event that users u and v are partitioned correctly with respect to each other in Step 1 of
be as defined there. If
. It follows that if
is the event that all users are partitioned correctly, then
Proof. As observed in Remark 4.1, users from the same partition rate items identically. Therefore the only way an error in partitioning occurs is if users of different types are grouped together. This happens when two users rate all exploratory items identically in Step 1. In Step 1, the first r items recommended to all users are chosen uniformly at random independent of feedback, so the types of these items are uniformly distributed on []. Let s be the number of items with distinct item types among the r exploratory items from Step 1. This is a balls and bins scenario with r balls into
bins, and Lemma A.3 states that if
. By symmetry, each of the types of the s items with distinct types is uniformly distributed on [
Since all users rate the same items and users of the same type have identical preferences, as far as the lemma is concerned we only consider how the user types themselves rate items in Step 1. Two user types rate s independently chosen items of distinct types in the same way with probability 2
. On the event s
2, we have 2
The above two statements show that for users
The second statement in the lemma follows by union bounding over2 pairs of user types.
, the algorithm recommends random items chosen independently of feedback to all users. So at these times
2 for all users
]. It follows that, for
Now consider the case T > r. At t = r, by Lemma 4.2, the partitioning step recovers the user types correctly with probability at least 1 all users in a partition have the same type, so by construction of the sets
in Line 15 of
liked by at least one user of the same type as u and therefore also by u, and
Because there are TN terms in the sum and , it follows that
Now, we need to find an upper bound for the expected number of disliked exploration recommendations in Step 2 of the algorithm,
It will be useful to relate the expected number of liked and disliked explorations. To this end, we consider the event that every user type likes at least 1/3 of the item types: define the event
A Chernoff bound (Lemma A.1) applied to the i.i.d. variables gives
1/N, where the last inequality due to
. Conditioning on event C, we get
To obtain an upper bound for the first term, in Claim 4.3 below we will upper bound the expected number of exploration recommendations that were liked, and on event C this will provide also an upper bound for the expected number of exploration recommendations that were disliked. The number of liked explorations is easier to deal with, because of a self-limiting effect: these result in items added to sets for exploitation, and exploration only happens when there are not enough items to be exploited.
We now relate the expected number of liked and disliked explorations. At then it means the item is an exploratory recommendation and thus
is an independent new random item with uniformly random type
]. Hence, using the definition of event C,
and 1 . It follows that
This means that to bound the first term in (5) it suffices to bound the contribution from the sum with
Claim 4.3. On event C, the number of liked ‘explore’ recommendations (line 13 of Algorithm 1) by time T can be bounded as
Proof. For user partition to be the set of items denoted by
in the algorithm at time t, after making the time-step t recommendations. Item
is added to
precisely on the event
. Therefore, dropping C from the indicator,
If . Meanwhile, at time t, the exploration event (recommending
in line 13) happens only if there are no items left in
to exploit, i.e.,
. In this way,
guarantees that there is an exploitable item at time t for each user in
. Consequently,
The bound is due to the sum having
terms, each upper bounded by 1.
Let be the last time for which we are not guaranteed (based on the reasoning before the last displayed eqn.) to have an exploitable item. Note that the set over which we take the maximum is nonempty if
Since for , by (9) and (10) for these times we have
. This gives (a) in the above display. By definition,
. Inequality (b) uses (9) and (10) to bound
Note that . Using (8) and summing the last displayed inequality over the (at most)
partition indices proves the claim.
We can now complete the proof of Theorem 4.1. By the preceding claim and Equations (4), (5), and (7) we get
For T > r , we can now bound the regret by combining Equation (2) with the previous display:
4.3 User-User Algorithm with Noisy Preferences
We generalize the result to the scenario in which the feedback to the recommendation system is noisy. In this case, the preference of user u for item i is
where are i.i.d. random variables with
(we assume 0
2). With probability
, the preference of user u for item i is flipped relative to the preference of user type
) for item type
) in the preference matrix Ξ.
To accommodate the noisy feedback, we modify the partitioning subroutine in Step 1 of UserUser algorithm with NoisyUserPartition given in Algorithm 2. The main modification is that in Lines 5 and 6 of User-User algorithm, users are placed in the same partition if they rate all of the first r items similarly. Instead, users are now placed in the same partition if they rate the majority of the first r items similarly. The parameters are chosen to guarantee that the partitioning over users is consistent with their type with probability greater than 1
Remark 4.2. In Line 8, the algorithm checks whether there is a partitioning over the users consistent with variables . This is true precisely when the graph with edge set
is a disjoint union of cliques.
Remark 4.3. The noisy feedback decreases performance in two ways: partitioning users correctly requires more exploration recommendations, resulting in a larger cold-start time. Additionally, in Step 2, even good exploitation recommendations can be disliked due to noise. The next theorem quantifies these observations.
Theorem 4.4. Consider the model introduced in Section 2 with user types and
item types. Let
achieves regret
Proof. The proof of this theorem is very similar to the proof of Theorem 4.1. Lemma 4.5 replaces Lemma 4.2 to show that with the given choice of parameters in Algorithm 2, the partitioning is the same as partitioning over the users by their types with probability greater than 1
Additionally, Equation (3) in the proof of Theorem 4.1 changes as follows to be consistent as a result of noisy feedback modeled in (11):
Equation (6) is replaced with
and since . It follows that
Claim 4.3 bounds the right-hand side. Plugging in these equations in the subsequent part of the proof of Theorem 4.1 gives the statement of the theorem.
Lemma 4.5. Consider the user similarities computed in Step 7 of NoisyUserPartition (Algorithm 2). Define the event that these similarities coincide with the underlying user types. If
. It follows that if
the event that all users are partitioned correctly, then
The proof of this lemma is similar to the proof of Lemma 4.2 and is deferred to Appendix C.
In this section we prove a lower bound on the regret of any online recommendation system in the regime with user structure only where as described in Definition 2.1.
. In the user structure model with
user types, any recommendation algorithm must incur regret
Remark 5.1. The lower bound depends on a parameter that has two effects: (1) the slope of the regret curve during the cold start grows (approaching 1/2) as the chosen parameter
shrinks to 0; (2) the cold start time r is upper bounded as
Additionally, if the number of user types satisfies , the slope of regret after the cold start time (the asymptotic rate of regret) approaches
. This is expected since each item can be recommended at most once to each user. Hence, even if the structure in the user space is known, the algorithm should explore new items. On average,
explorations (about half of which are disliked) are necessary for every N recommendations.
The simplified version of this theorem in Section 3 is obtained using for a constant
5.1 Proof strategy
At a high level, the lower bound is based on two observations:
• A good estimate of user types is necessary to make meaningful recommendations. Notably, estimating similarity between users requires approximately log items rated in common.
Suppose that the preference matrix Ξ (with elements , the preference of user types for item types) is known (which is the case in user-structure only model). Also, suppose that we have obtained feedback from some user u for t items. Relative to the total number of types, user u must belong to a restricted set of user types consistent with this feedback. If t is small, the set of consistent types is large (for instance, if a user has rated only one item, there are roughly
2 candidate user types for this user). At this point, user u likes some item i with probability proportional to the number of consistent types liking the item. Control of this
count amounts to a property of the matrix we call ()-column regularity in Definition 5.1, which holds with high probability.
• Even if we know the user types (i.e., clustering of users), the first time a given item is recommended to a user from a given type, the outcome is uniformly random.
Since there is no structure in item space (), learning the preference of a user type for an item is only achieved by recommending the item to one user from the user type. This is for the reason that the random variable
in the preference matrix is independent of all previous history in the situation described.
5.2 Proof of Theorem 5.1
We separately prove the two lower bounds in the statement of the theorem, starting with the first. The following regularity property in submatrices of the preference matrix allows us to control the posterior probability for an item being liked.
)-column regularity)
. For ordered tuple of distinct (column) indices
be the matrix formed from the columns of A indexed by w. For given row vector
the set of rows in M that are identical to b and denote its cardinality by
). The matrix A is said to be (
)-column regular
regular for all s < r.
Proof. Suppose that )-column regular. By induction it suffices to show that
column regular. We will check that (1
for all size
vectors b. For any given
be obtained from b by appending +1. Similarly
is obtained from b by appending
any
(
)-column regular, (1
, hence (1
Lemma 5.3. Let matrix have i.i.d. Bern(1/2) entries. If
(
-column regular (i.e.,
) with probability at least
Proof. For given column tuple w and row vector b, the expected number of times the row vector b appears is . A Chernoff bound (Lemma A.1) with
There are no more than possible choices of column tuple
possible choices of row vector b; the union bound yields the proof.
Proposition 5.4. Consider the user structure only model in Definition 2.1. Let , the regret is lower bounded by
Proof. We will show that for preference matrices Ξ satisfying column regularity, at any time most users have probability roughly half of liking any particular item given the feedback obtained thus far, even if the preference matrix is known. (Recall that the preference matrix contains the preference of each user type for each item type; there is still uncertainty in the actual type of each user or item).
At time t, suppose that n items in total have been recommended by the algorithm (each of the N users rates one item per time-step). We label the set of items by [n] = {1, . . . , n}. Let
matrix indicating the preference of each user type for these n items. Each item
]) and because the set of columns of the preference matrix Ξ is precisely
, the columns of A are independent and uniformly distributed in
according to this model.
We now focus on a particular user be the items recommended to user u up to time
1, and let
be the vector of feedback for these items. We claim that conditional on the matrix
the end of time instant
1 is uniformly distributed over the set of user types
) consistent with this data (
) is defined in Definition 5.1).
Let be obtained from b by appending +1. Then
when
), which in words reads “user u is among those types that are consistent with the first
1 ratings of u and have preference vector with ’+1’ for the item recommended to u at time t”. It follows that for any matrix A corresponding to items [n],
The second equality is due to: i) are functions of
; ii) for fixed w and b the set
) is determined by
) is uniformly distributed on
) conditional on A, b, w. Recall that Ω
was defined as the set of (
)- column regular matrices. It now follows by the
tower property of conditional expectation that
The last two equalities are justified as follows: if then by Claim 5.2,
Definition 5.1, this means that
2 to get the last inequality.
Fix 0 and define
Lemma 5.3 shows that at time , for this choice of
with probability 1
. We get the bound
It follows from the above display and the definition of
where (a) uses the definition of . (b) uses the summation of a Geometric series and
log
. (c) uses the definition of
We now proceed to the proof of the second lower bound in Theorem 5.1.
The main ingredient in the proof of the proposition is definition of an event that implies that the outcome of the associated recommendation is uniformly random. Let be the event that some user of same type
has rated item
Claim 5.6. If no user with the same type as u has rated item , the probability that user u likes item i conditional on any history consistent with this is
Proof. According to Definition 2.1, in the user-structure only model, the matrix Ξ is deterministic and has columns consisting of all sequences in We will show that
A priori ) is uniform on [
]. Given the sequence
), the matrix Ξ and the feedback
up to time
1, the posterior distribution of
) is uniform over the set of all item types j which are consistent with the outcome of recommending i to users of various types. We call this set
Since on the event (, no user with the same type as
since the matrix Ξ has columns consisting of all sequences in
, half of the item types in set
) are liked by user u and half of them are disliked.
The final ingredient in the proof of Proposition 5.5 is a lower bound on the number of items recommended for which Claim 5.6 applies.
Claim 5.7. The expected number of times a new item is recommended to a user type by time T is lower bounded as
Proof. At the end of time-step T each user has been recommended T items, hence each user type has been recommended at least T items. Let be the number of user types in which there is at least one user. The total number of times an item is recommended to a user type for the first time is at least
. Applying Lemma A.5 shows that
We now complete the proof of Proposition 5.5.
Proof of Prop 5.5. Partitioning recommendations according to
where all the summations are over ]. Rearranging shows that
This section describes a version of item-item CF with explicit exploration steps and analyzes its performance within the setup specified in Section 2. The algorithm is quite different from the user-user algorithm. This is due to the inherent asymmetry between users and items: multiple users can rate a given item simultaneously but each user can rate only one item at each time-step.
6.1 Algorithm
Algorithm Item-Item performs the following steps (see Algorithm 3). First, items are partitioned according to type; next, each user’s preference for each item type is determined; finally, items from liked partitions are recommended. These steps are now described in more detail.
Two sets , each containing M random items, are selected. In the exploration step, each item is recommended to
random users. The feedback from these recommendations later helps to partition the items according to type. The parameter r is chosen large enough to guarantee small probability of error in partitioning (
for each item). The use of two sets
, as opposed to just one, is to simplify the analysis; as described next, item type representatives are selected from
, and are used to represent clusters of items from
An item from each of is recommended to all users, where
is a parameter determined by the algorithm. It turns out that it is often beneficial (depending on system parameters) to learn user preferences for only a subset of the types, in which case
is strictly less than
of the
items chosen from
is thought of as a representative of its type. For each
all items in
that appear to be of the same type as the representative item i
are stored in a set
and then removed from
. This guarantees that at each time
does not contain items with the same type as any of the previously selected representative items. For each
all items in
that appear to be of the same type as i
are stored in a set
and then removed from
For each user u, we add the items in the groups whose representative i
were liked by u to the set of exploitable items
. Finally, in the exploitation phase, each user is recommended items from
. We choose the number M of items in each of
as a function of
to ensure that there are enough exploitable items in
for all users u for the entire length-T time-horizon. Then,
is chosen to minimize regret.
The algorithm description uses the following notation. For an item i and time t > 0,
is the set of users that have rated item i before time t. The time t is implicit in the algorithm description, with rated(i) used to represent ) at the time of its appearance.
Remark 6.1. The set of items are updated throughout algorithm ItemExplore. In the proof, we use the notation
to refer to the set of items
at the beginning of the algorithm Item-Item.
Remark 6.2. Assignment of users to items for Line 3 of ItemExplore: This is done over time-steps, with additional recommendations being random new items. The main requirement in assigning users to items is to make Claim 6.3 (which bounds the probability of mis-classification of each item) hold. Since the claim addresses each item separately, we may introduce dependencies between sets of users for different items. What is important is that the set of users assigned to each specific item is uniform at random among all sets of users (or at least contains a random subset with size r). For example, one can choose a random permutation over the users, repeat this list
times, and then assign each item to a block of r users.
(Algorithm 3) obtains regret per user at time T upper bounded as
The simplified version of this theorem in Section 3 is obtained 2 log(N > 5.
Remark 6.3. The regret bound we get for the algorithm is actually obtain T/2 by a trivial algorithm which recommends random items independent of feedback to all users. This trivial algorithm improves on our bound for the parameter range in which T/2 < Y (T). Thus in our analysis we focus on the parameter range where
; one consequence is that
as chosen in Algorithm 3.
6.2 Proof of Theorem 6.1
We prove Theorem 6.1, deferring several lemmas and claims to the next subsection.
The basic error event is misclassification of an item. In of items in
that the algorithm posits are of the same type as the j-th representative i
be the event that item i was mis-classified,
Let be the number of time-steps spent making recommendations in ItemExplore. Recall that
is the set of items to be recommended to user u in the exploit phase. We partition the recommendations made by Algorithm Item-Item to decompose the regret as follows:
The first term, A1, is the regret from early time-steps up to . The second term, A2, is the regret due to not having enough items available for the exploitation phase, which is proved to be small with high probability for sufficiently large M. The third term, A3, is the regret due to exploiting the misclassified items. It is small since few items are misclassified with the proper choice of
r. The fourth term, A4, is the regret due to exploiting the correctly classified items. It is intuitively clear and will be checked later that A4 = 0 .
Bounding A1. Line 3 of ItemExplore takes at mostunits of time to rate every item in
users each, since N users provide feedback at each time-step. Remark 6.2 above discusses the assignment of users to items in this phase. After this,
representative items are rated by every user in the for loop (lines 4 through 10 of ItemExplore), which takes
time-steps. This gives
For from the perspective of any user the items recommended to it are of random type and hence
are devoted to exploitation as described in ItemExploit. During this phase, a random item
is recommended to user u only when there are no items to exploit because all items in
have already been recommended to u. So, the total number of times an item
is recommended in the time interval
is at most (
and
where the last inequality is from Lemma 6.4 in Section 6.3.
Bounding A3. Term A3 in (15) is the expected number of mistakes made by the Algorithm ItemExploit as a result of misclassification. Claim 6.2 below upper bounds the expected number of “potential misclassifications” (defined in Equation (19)) in the algorithm to provide an upper bound for this quantity:
Bounding A4. By definition of the mis-classification event, , given in Equation (14), if an item
is correctly classified, then
, all users rate i the same
as i. By construction of the sets
in Line 11 of ItemExploit, for any item
some
] such that
likes item i
It follows that A4 = 0.
Combining all the bounds. Plugging in Equations (16), (17), (18) and A4 = 0 into Equation (15) gives
Setting
where (a) holds for 13 (imposed by the algorithm for
2) which gives
(which holds if
, we have 3
(b) is obtained by choosing the parameter
6.3 Lemmas used in the proof
In the remainder of this section we state and prove the lemmas used in the analysis above.
Claim 6.2. Suppose that . The expected total number of times a misclassified item is recommended in the exploitation step is upper bounded as
, defined in (14) to be the misclassification of item
, occurs if the preferences of random users classifying item i is the same as a previous representative item with a different type. Given matrix Ξ, this event is a function of the order of choosing the representative items and the choice of random users in Line 3 of ItemExplore. Instead of directly analyzing
an event called “potential error event”
, which will be shown to satisfy
bound for
] is given in Claim 6.3.
For item i and subset of users
to be the event that the ratings of users in U for item i agree with some other item type. For item is the time i is added to
in the exploration phase, let
the set of witness users whose ratings were used to conclude that
are of the same type. Item i is added to
only if all users in
, so misclassification
Equation (14)) implies
. We can now deduce the inequalities, justified below:
Inequality (a) holds because according to Line 11 of ItemExplore for every user a subset of
, and also the containment
; (b) follows since each item i is recommended at most N times; (c) uses
together with Claim 6.3 which shows that
Claim 6.3. Suppose that . Consider the “potential error” event
the set of users
defined immediately after. Then
Proof. Each representative item iis rated by all of the users. Other items in
are rated by at least r users in the exploration phase. Remark 6.2 describes how Line 3 in ItemExplore is performed to guarantee that the set of r users assigned to each specific item is uniformly at random among all subset of users of size r. By Lemma A.3, if
with probability at least 1
2 users with distinct user types in a specific
. It follows that the r/2 users with distinct types (chosen independently of the feedback) that rate item
) also have the same ratings for type
) with probability at most 2
. (Any two item types
have jointly independent columns in the preference matrix.) The choice
) and a union bound over item types j completes the proof.
be defined as in Line 10 of algorithm ItemExplore. Then
Proof. We begin with a sketch. Any user u likes roughly half of the item types that have representatives
. The total number of items in
, so there are about
from each of the item types in
. Adding over the
types, the set
of items in
u likes will typically have size at least T.
Making this argument rigorous requires some care for the following reasons: 1) is the union of the
can be missing items due to misclassification (there may be items of the same type as i
that have been classified as being of the same type as some i
which
1). 2) The distribution of the type of each representative item depends on the number of remaining items of each type in
when the representative is chosen. Again, due to misclassification, this can be different from the actual number of items of each type initially present in
. Moreover, because misclassification of an item depends on the ratings of users for the item, the choice of next item type to be represented is therefore dependent on ratings of users for other types. The effect of this dependence is addressed in Claim 6.5 below.
We now proceed with the proof, bounding the size of by introducing a different set. For
be the items in whose types are the same as one of the representatives i
that are liked by u. Note that if an item
is correctly classified by the algorithm (i.e. (
occurs, where
defined in Equation (14)), then
It follows that
To bound the second term we use Claim 6.3 above, which gives
and by Markov’s Inequality
We now bound the first term on the right-hand side of (22). To this end, let
be the type of item representatives that are liked by user u. By definition of
Now, we claim that the set is independent of all types and preferences for items in
this, note that
is determined by row u of the type matrix Ξ and the items in
, their types, and randomness in the algorithm, which determines the choice of i
); these together determine
). So, conditioning on
having cardinality ˜
is the sum of
i.i.d. Bernoulli variables with parameter
Conditioning on ˜
, by a Chernoff bound (Lemma A.1) and stochastic domination of binomials with increasing number of trials we obtain
In Lemma 6.5 below, we will lower bound the probability that ˜20. Combining the last displayed inequality with Lemma 6.5, we get
Plugging this and (24) into (22) gives
and since
The bound on ] is an immediate consequence.
Proof. For a given user is the number of item type of representatives liked by user
be the item types in [] that are liked by user u:
The variables are i.i.d. Bern(1/2), so a Chernoff bound (Lemma A.1) gives
We will show that
which will prove the lemma by combining with (27).
We now work towards defining a certain error event in Equation (31) below. Let the sequence of random variables ) denote the types of the item representatives chosen by the algorithm, so that ˜
defined in (25)). Let ¯
) be the set of items in
Later we will use the notation ¯for the collection of these sets. Now let the event
denote misclassification of an item
(similar to event
Let Err be the event that for some item type j, more than a fraction 1/10 of the items in type j are misclassified:
Conditioning on Err in the left-hand side of (28) gives
Bound on second term of (32) We will use Markov’s inequality to bound
We need to consider the effect of matrix Ξ (and in particular its )th row, which determines
on the probability of error in categorizing items. Notably, if users rate two item types similarly, the probability of misclassifying items of these types is higher. As a result,
contains some information about discriminability of distinct item types and we need to control this dependence.
Claim 6.2 shows that for , the probability of
(the event that item i is miscategorized) is
. The same proof gives
defined in (30) for
that, let the potential error event
, be the event that there exists an item type
that for all the users
the time the algorithm added
. Then the containment
holds. We will later use the inequality
] and focus on the set of users
using
As specified in Algorithm of at least r users, chosen independently of feedback, rate item i. By Lemma A.3, if
, then there are users of at least r/2 distinct user types in
with probability at least 1
Conditional on
having at least r/2 distinct user types, there are at least
1 users of distinct types in
—also distinct from type of u—whose preferences for item type
) are independent of
and ¯
Conditional on , any two item types
have jointly independent user preferences by
1 users with distinct types in
; they are rated in the same way by these users with probability at most 2
. A union bound over item types
. The choice
and the con- tainment
. Knowing ¯
determines
and Markov’s inequality gives
Union bounding over ] and tower property gives the desired bound.
Bound on first term of (32). We start by showing that conditioned on the event conditional on
and variables
), the type of the m-th representative item,
, is almost uniform over all the item types not learned yet, [
Concretely, we will find upper and lower bounds on
for any . Later, we will focus on
Lower bound for (33). Let t be the time the m-th item type representative is chosen by the algorithm. For the probability of choosing representative of type
equal to the proportion of items of type j in the remaining items in
The number of items of type
1 is at least
The number of items removed from is at least
(there could be
additional items with types not in that were removed from
due to misclassification). Hence, the total number of remaining items in
is at most
be the collection of indicator variables
On the event (defined in (31)),
fore,
Upper bound for (33). This time we lower bound the number of items remaining in number of items in types [
minus the number of possible mistakes which removed some items from them. This gives
By definition of Err (in (31)), we have
Combining lower and upper bounds. The expected value of conditional on variables
is invariant to choice of
. So, the conditional expectation
is independent of . Note that in the above display, the conditional expectation is with respect to ¯
1. Hence, using tower property of expectation on (34) and (35) to remove the conditioning on
along with the definition of C above gives
Since there are , summing over j in the second inequality of the last display gives
. Plugging this into the first inequality of the last display
gives for all
Conditional on
So, on the event , the random variable
] conditional on
and events
, stochastically dominates a Binomial random variable with mean
12. Hence, by a Chernoff bound,
In this section we prove a lower bound on the regret of any online recommendation system in the regime with item structure only where as described in Definition 2.2. Throughout this section, we will assume N > 32.
. In the item structure model with
item types, any recommendation algorithm must incur regret
The assumption N > 32 implies 5. Note that the function Z(T) is continuous up to a multiplicative constant factor
To get the simplified version in Section 3, we bounded the value of Z(T) in the second (in which . Also, the assumption
guarantees that
7.1 Proof strategy
We call a recommendation (or uncertain) recommendation when the probability of
2. Conversely, recommendations for which the probability of
1/2) are considered good recommendations. Good and bad refers only to the confidence that the recommendation is liked given the history at the moment the recommendation is made: a good recommendation is not always liked and a bad recommendation is not always disliked.
We identify two scenarios in which recommendations are necessarily bad in the item structure only model introduced in Definition 2.2: (i) A good estimate of item types is necessary in order to make meaningful recommendations. To determine whether or not two items are of the same type, approximately log users should rate both of them. To formalize this, similar to the lower bound for the model with user structure only in Section 5, we use the concept of (
)-row regularity (Definition 7.1). Lemma 7.3 shows that for a (
)-row regular preference matrix, items with fewer than r ratings are liked by any user with probability roughly half, even if the preference matrix is known. (ii) Even when we know the item types (i.e., clustering of items), if a given user u has not rated any item with the same type as item i before, the probability that user u likes item i is 1/2. Lemma 7.4 shows this property.
In Lemma 7.5 we bound regret in terms of the number of good recommendations and in Lemma 7.6 we upper bound the number of good recommendations, which entails a counting argument. The theorem follows immediately from Lemmas 7.5 and 7.6.
7.2 Proof of Theorem 7.1
Definition 7.1 (Row-regularity). The matrix is said to be (
its transpose is (
)-column regular (Definition 5.1). We write
is the set of (
)-column regular matrices.
The following lemma is an immediate corollary of Lemma 5.3.
Lemma 7.2. Let matrix have i.i.d. Bern(1/2) entries. If
(
-row regular with probability at least
Throughout this section we will fix
Applying Lemma 7.2 with these choices of , we obtain that so long as N > 32 the preference matrix Ξ is (
) row-regular (
) with probability
The following lemma shows that if is small and the preference matrix is row-regular, then the outcome of recommending item i to any user at time t is uncertain.
The next lemma shows that if a user u has not rated any item with the same type as item i before, the probability that u likes item i is 1/2. Let denote the event that user u has rated an item of the same type as item
Lemmas 7.3 and 7.4 (proved in Section 7.3) identify scenarios in which recommendations are bad. In the complementary scenario, recommendations are not necessarily bad (and may be good). We denote the number of such recommendations by
The proof of the following lemma uses Lemmas 7.3 and 7.4 to lower bound regret in terms of expectation of good(T).
Proof. We partition the liked recommendations based on , and row regularity of Ξ:
The proof is obtained by plugging in the four bounds below and simplifying.
Bounding A1. Plugging in the probability of Ξ being row regular from (38) gives
Bounding A2. Multiplying the statement of Lemma 7.3 by summing over i gives
Bounding A3. Lemma 7.4 gives
Bounding A4. We bound by one the probability that a good recommendation is liked to obtain
Next, we upper bound the expected number of good recommendations made by the algorithm in terms of parameters of the model.
Lemma 7.6 (Upper Bound for expected number of good recommendations). For any algorithm,
We prove this lemma in the next subsection. Theorem 7.1 is an immediate consequence of Lemmas 7.5 and 7.6.
7.3 Proofs of lemmas
7.3.1 Proof of Lemma 7.3
We show that if an item has been rated by fewer than r users, its type is uncertain, because many item types are consistent with the history even if the preference matrix is known. For a row-regular preference matrix, uncertainty in the type of an item makes it impossible to accurately predict user preferences for that item.
Consider item be the ordered tuple corresponding to the set of users that were recommended item i up to time
vector of feedback from users in w about item
We re-introduce the notation from Definition 5.1: if M is the matrix obtained by concatenating the rows of Ξ indexed by
) is the set of columns of M (corresponding to the item types) equal to b. This is the set of item types consistent with the ratings b of users w for item i.
Conditional on Ξ, at the end of time
1 is uniformly distributed over the set of item types
). This allows us to relate the posterior probability of i being liked to row regularity of Ξ as follows. Let
be obtained from b by appending +1 . For a given user
which in words reads “item i is among those types that are consistent with the ratings of i up to time
1 and have preference vector with ‘+1’ for user u”. It follows that for any preference matrix Ξ and any user u which has not rated i up to time
The second equality is due to: (i) w and b are functions of the history up to time (ii) for fixed
) is determined by Ξ
) is uniformly distributed on
Recall that Ξif the preference matrix Ξ is (
)-row regular. We have
and this proves the lemma. It remain to justify the steps above. (a) follows since conditional on , the random variable
is independent of all other random variables. (b) uses (42) and the fact that
] is nonzero only if u has not rated item i, so we may add
justified as follows: if Ξ
, then by Claim 5.2, Ξ
By Definition 5.1, this means that
is less than 1/ log 32 and (1 +
7.3.2 Proof of Lemma 7.4
At a high level, we make two observations: (i) if user u has not rated any item with type the feedback in the history
is independent of the value of
given all other elements of matrix Ξ and the item types; (ii) the types of items (function
)) are independent of matrix Ξ, and the elements of Ξ are independent. Hence, conditional on (
, the posterior distribution at time
is uniform on
Concretely, we may think of revealing the entries of Ξ on a “need-to-know” basis. Conditional on (has not yet been touched, so
The lemma now follows by the tower property.
7.3.3 Proof of Lemma 7.6
Lemma 7.6 upper bounds the expected number of good recommendations. To prepare for the proof of this lemma we introduce some notation. Recall that (defined in (39)) is the number of users who have rated item i before time
be the set of items with at least r ratings before time
their number,
Let be the set of items that have been rated by at least one and fewer than r users by time
their number,
In the following claim, we bound the number of good recommendations up to time T in terms of
Claim 7.7. The number of good recommendations good(T), defined in (41), satisfies
is recommended to at least r users by the end of time T . So, for any
there are r recommendations in which i has been rated fewer than r previous times:
Any has been recommended at least once, so for these items
All recommended items are either in . So, the total number of recommendations satisfies
where we used
The rest of the section contains the proof of Lemma 7.6.
Proof of Lemma 7.6. The lemma consists of three bounds on good(T). . This bound is the easiest, so we start with this.
(a) uses the definition of good(T). For any item i, the sequence is nondecreasing in t. This shows that if for an item
, then item
. (b) is derived by changing the order of summations, the definition of
and equality
(c) Because each item is recommended at most once to each user, each item (and specifically items in
) are recommended at most N times. This gives for
bound for the first term in (c). The second term in (c) is bounded by Equation (43).
Plugging this into the statement of Claim 7.7 gives for any values of
We now prove the other two bounds on good(T), this time in terms of the number of item types each user has rated by time T. Let Γbe the set of item types that are recommended to user u up to time T,
Proof. For user u, the number of times an item type is rated for the first time by this user is equal to the number of item types rated by user . Summing over
the claim.
. This follows from Claim 7.8 since
This last inequality is more involved than the others. Let
be the number of items with type j that have been recommended (to any user) by time t:
We get a lower bound on minvia an upper bound on max
, which is in turn obtained via martingale concentration bounds for each
. This is essentially just a question of bounding the fullest bin in a balls and bins scenario, with the added complication that the time-steps in which balls are thrown is random. Thus the number of balls (i.e.,
) is random, and the decision to throw a ball at a given time may depend on the configuration of balls in bins.
2 ,
3 log
8 log
Proof. First, we define a useful martinagle. Let is defined in (46). Note that
is the total number of recommended items at the end of time t. Any new item has type uniformly distributed on [
]; as a consequence, the sequence
martingale with respect to filtration
is incremented whenever a new item is recommended and each new item increases
by one with probability 1
It turns out to be easier to work with a different martingale that considers recommendations to each user separately, so that the item counts are incremented by at most one at each step. Consider the lexicographical ordering on pairs () if either
(such that the recommendation to user v at time s occurred before that of user u at time t). For
Let ) and define
to be the total number of items recommended by (
. We now define a sequence of stopping times
where (t, 0) is interpreted as (is the first (t, u) such that a new item is recommended by the algorithm for the k-th time, so
are stopping times with respect to (
), and observe that
total number of items recommended by the algorithm by the end of time
and
Fix item type ]. The sequence
is a martingale with respect to the filtration
. It follows that
is martingale as well, this time with respect to
. We will use this notation to prove statement of the claim in three different regimes. First, we would like to apply martingale concentration (Lemma A.2) to
, and to this end observe that Var(
and
1 almost surely.
, Lemma A.2 gives
This gives
where (a) uses . (b) uses a union bound and the inequality in the last display. (c) uses definition of
(which is derived using
This gives
(a) uses . (b) uses the inequality in the above display.
2, corresponds to bounding the number of balls in the fullest bin when the number of balls, k, is sublinear in the number of bins,
). We will show that in this regime, the number of balls in the fullest bin is bounded by 1
. For given
(a) is a union bound over ]. (b) uses the fact that
+ 1 with probability 1
with probability 1
independently of
. (c) holds for every
is due to
(a) uses . (b) uses a union bound and (51). Last inequality uses
Step 4 This step uses a variation of the Birthday Paradox to bound max
(a) and (b) use the fact that is a nondecreasing function of
of the (k + 1)-th drawn item is independent of the type of the previous k drawn items. Hence, conditional on
, the random variable
is independent of
. This gives equality (c). (d) uses exp
for
We put it all together,
(a) uses the definition of (b) uses (49), (50), (52) and (53). Last inequality uses
This statement and Claim 7.8 imply that with probability at least 1/2 we have
Combining this bound and Claim 7.7 gives that with probability at least 1/2,
Using the definition of the function Θ) in Claim 7.9 and some algebra, one can show that for any k > 0,
or alternatively, max) is defined in (36). To prove this, we show that if
for each regime of parameter T. Note that the above bound is not tight, but it is chosen such that this lower bound (and consequently the function Z(T) which is a scaling of the above lower bound as defined in (36)) is continuous up to a multiplicative constant factor.
Equation (41) shows that with probability one. Hence,
This completes the proof of the lemma.
In this paper, we analyzed the performance of online collaborative filtering within a latent variable model for the preferences of users for items. We proposed variants of user-user CF and item-item CF that explicitly explore the preference space. We also proved lower bounds for regret in the extreme regimes of parameters corresponding to user-structure only (no structure in item space) and item-structure only (no structure in user space). The lower bounds showed that the proposed algorithms are almost information-theoretically optimal in these parameter regimes.
Adaptivity to unknown time time horizon T, as required to bound the anytime regret, is achieved via a doubling trick whereby the algorithm is run afresh at a growing sequence of epochs. In practice one would surely benefit from using knowledge gained from exploration in earlier epochs instead of starting from scratch at each epoch. We mentioned how the user-user algorithm can be modified to be adaptive to the number of user types, but it is less obvious how to make the item-item algorithm adaptive without resorting to an impractical trick analogous to the doubling trick used for T.
It is possible to modify all of the proposed algorithms to handle i.i.d. noise in the user feedback. We did this only for the user-user algorithm, but it is straightforward to do so also for the item-item algorithm. A hybrid algorithm, exploiting structure in both user space and item space, that is nearly information-theoretically optimal in all regimes appears in a forthcoming paper.
While various insights were obtained through the analysis carried out in the paper, the assumed randomly generated user preference matrix is unrealistic. A reasonable next objective is to perform a similar analysis with a more flexible model for user preferences, perhaps described by a low-rank matrix or a graphical model.
We are indebted to Devavrat Shah, George Chen, and Luis Voloch for many inspiring discussions and thank Alexander Rakhlin for suggesting several references. This work was supported in part by grants NSF CCF-1565516, ONR N00014-17-1-2147, and DARPA W911NF-16-1-0551.
The following lemma is derived by application of Chernoff bound to Binomial variables [28].
Lemma A.1 (Chernoff bound)be independent random variables. Let
. Then, for any
Lemma A.3 (Balls and bins: tail bound for number of nonempty bins)balls are placed into n bins each independently and uniformly at random, then with probability at least 1
bins are nonempty.
Proof. Any configuration with at most m/2 nonempty bins has at least 2 empty bins. Thus we may bound the probability of having
2 bins be empty. There are
which has probability [(Thus, using union bound, the probability of at most m/2 nonempty bins is bounded by
where we used
The following generalization of the above lemma is used in the analysis of the recommendation system in noisy setup.
Lemma A.4 (Generalized Balls and bins: tail bound for number of nonempty bins). Fix 0 < a < 1 and balls are placed into n bins each independently and uniformly at random, then with probability at least 1
bins are nonempty.
Proof. Any configuration with at most m/2 nonempty bins has at least 2 empty bins. Thus we may bound the probability of having
2 bins be empty. There are
which has probability [(. Thus, the probability of at most m/2 nonempty bins is bounded by
where we used
The following lemma records a simple consequence of linearity of expectation.
Lemma A.5 (Balls and bins: bound for the expected number of nonempty bins). If we throw m balls into n bins independently uniformly at random, then, the expected number of nonempty bins is
The doubling trick converts an online algorithm designed for a finite known time horizon to an algorithm that does not require knowledge of the time horizon and yet achieves the same regret (up to multiplicative constant) at any time [27] (i.e., anytime regret).
The trick is to divide time into intervals and restart algorithm at the beginning of each interval. Let A(T) be an online algorithm taking the known time horizon as input and achieving regret R(T) at time T. There are two regret scalings of interest. (1) If R() for some 0
then to achieve anytime regret, the doubling trick uses time intervals of length 2
achieves regret of at most R(
. (2) Alternatively, if R(T) = O(log T), then using intervals of length 2
achieves regret of at most 4R(T) at time T for any T.
Clearly, different scalings can be used before and after if the algorithm achieves regret
, as is the case for the proposed item-item CF algorithm.
We show that ) for any pair of users
Using a union bound over the
pairs of users gives
According to Line 7 of the Algorithm 4.5,
, the same item is recommended to all users:
Hence,
. First, we look at the users of the same type:
where (a) uses the definition of according to the Line 7 of Algorithm 2. (b) uses the noise model (11). If
which gives (c). The variables
are independent of other variables in the model which implies (d). For users
and any item i,
. Lemma A.1 and the choice of
in Algorithm 2 gives (e). The choice of r gives the result (f).
Now, we look at the pair of users to be the set of item types on which user types
) disagree.
(a) uses the definition of in Line 7 of Algorithm 2. Total probability lemma gives (b). Conditional on
), the variables
are independently uniformly distributed. Using the definition on
, this implies that
is the sum of
i.i.d. uniform Bernouli random variables. Hence, Chernoff bound in Lemma A.1 gives the bound on the first term in (c). To bound the second term, note that the items
are chosen independently of feedback for
conditional on
12, the variables
1 with probability at least 5/12. The variables
are independent of other parameters of the model and algorithm and
1 with probability 2
probability at least [5 + 4
12. Using Chernoff bound in Lemma A.1 gives the second term in (c). The assumption
) and the choice of r gives the result.
[1] S. Aditya, O. Dabeer, and B. K. Dey, “A channel coding perspective of collaborative filtering,” IEEE Transactions on Information Theory, vol. 57, no. 4, pp. 2327–2341, 2011.
[2] G. Biau, B. Cadre, and L. Rouviere, “Statistical analysis of k-nearest neighbor collaborative recommendation,” The Annals of Statistics, vol. 38, no. 3, pp. 1568–1592, 2010.
[3] E. J. Cand`es and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational Mathematics, vol. 9, no. 6, p. 717, 2009.
[4] P. Jain, P. Netrapalli, and S. Sanghavi, “Low-rank matrix completion using alternating minimization,” in Proceedings of the forty-fifth annual ACM Symposium on Theory of Computing. ACM, 2013, pp. 665–674.
[5] R. H. Keshavan, A. Montanari, and S. Oh, “Matrix completion from a few entries,” IEEE Transactions on Information Theory, vol. 56, no. 6, pp. 2980–2998, 2010.
[6] S. Negahban and M. J. Wainwright, “Restricted strong convexity and weighted matrix completion: Optimal bounds with noise,” Journal of Machine Learning Research, vol. 13, no. May, pp. 1665–1697, 2012.
[7] A. Rohde and A. B. Tsybakov, “Estimation of high-dimensional low-rank matrices,” The Annals of Statistics, vol. 39, no. 2, pp. 887–930, 2011.
[8] N. Srebro, N. Alon, and T. S. Jaakkola, “Generalization error bounds for collaborative prediction with low-rank matrices,” in Advances In Neural Information Processing Systems, 2005, pp. 1321–1328.
[9] S. Bubeck and N. Cesa-Bianchi, “Regret analysis of stochastic and nonstochastic multi-armed bandit problems,” Foundations and Trends in Machine Learning, vol. 5, no. 1, pp. 1–122, 2012.
[10] T. L. Lai and H. Robbins, “Asymptotically efficient adaptive allocation rules,” Advances in applied mathematics, vol. 6, no. 1, pp. 4–22, 1985.
[11] D. Russo and B. Van Roy, “Learning to optimize via information-directed sampling,” in Advances in Neural Information Processing Systems, 2014, pp. 1583–1591.
[12] G. Bresler, G. H. Chen, and D. Shah, “A latent source model for online collaborative filtering,” in Advances in Neural Information Processing Systems, 2014, pp. 3347–3355.
[13] G. Bresler, D. Shah, and L. F. Voloch, “Collaborative filtering with low regret,” in SIGMETRICS Performance Evaluation Review, vol. 44, no. 1. New York, NY, USA: ACM, Jun. 2016, pp. 207–220. [Online]. Available: http://doi.acm.org/10.1145/2964791.2901469
[14] B.-H. Shen, S. Ji, and J. Ye, “Mining discrete patterns via binary matrix factorization,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009, pp. 757–766.
[15] J. Xu, R. Wu, K. Zhu, B. Hajek, R. Srikant, and L. Ying, “Jointly clustering rows and columns of binary matrices: Algorithms and trade-offs,” in ACM SIGMETRICS Performance Evaluation Review, vol. 42, no. 1. ACM, 2014, pp. 29–41.
[16] A. Bellogin and J. Parapar, “Using graph partitioning techniques for neighbour selection in user-based collaborative filtering,” in Proceedings of the sixth ACM Conference on Recommender Systems. ACM, 2012, pp. 213–216.
[17] A. S. Das, M. Datar, A. Garg, and S. Rajaram, “Google news personalization: scalable online collaborative filtering,” in Proceedings of the 16th international conference on World Wide Web. ACM, 2007, pp. 271–280.
[18] G. Linden, B. Smith, and J. York, “Amazon.com recommendations: Item-to-item collaborative filtering,” IEEE Internet computing, vol. 7, no. 1, pp. 76–80, 2003.
[19] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item-based collaborative filtering recommendation algorithms,” in Proceedings of the 10th international conference on World Wide Web. ACM, 2001, pp. 285–295.
[20] O. Dabeer, “Adaptive collaborating filtering: The low noise regime,” in International Symposium on Information Theory Proceedings (ISIT). IEEE, 2013, pp. 1197–1201.
[21] I. Kerenidis and A. Prakash, “Quantum recommendation systems,” arXiv preprint arXiv:1603.08675, 2016.
[22] K. Barman and O. Dabeer, “Analysis of a collaborative filter based on popularity amongst neighbors,” IEEE Transactions on Information Theory, vol. 58, no. 12, pp. 7110–7134, 2012.
[23] D. Song, C. E. Lee, Y. Li, and D. Shah, “Blind regression: Nonparametric regression for latent variable models via collaborative filtering,” in Advances in Neural Information Processing Systems, 2016, pp. 2155–2163.
[24] J. Wang, A. P. De Vries, and M. J. Reinders, “Unifying user-based and item-based collaborative filtering approaches by similarity fusion,” in Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 2006, pp. 501–508.
[25] B.-H. Kim, A. Yedla, and H. D. Pfister, “Imp: A message-passing algorithm for matrix com- pletion,” in Turbo Codes and Iterative Information Processing (ISTC), 2010 6th International Symposium on. IEEE, 2010, pp. 462–466.
[26] C. Borgs, J. Chayes, C. E. Lee, and D. Shah, “Thy friend is my friend: Iterative collaborative filtering for sparse matrix estimation,” in Advances in Neural Information Processing Systems, 2017, pp. 4715–4726.
[27] N. Cesa-Bianchi and G. Lugosi, Prediction, learning, and games. Cambridge University Press, 2006.
[28] F. Chung and L. Lu, “Concentration inequalities and martingale inequalities: a survey,” Internet Mathematics, vol. 3, no. 1, pp. 79–127, 2006.
[29] C. McDiarmid, “Concentration,” in Probabilistic methods for algorithmic discrete mathematics. Springer, 1998, pp. 195–248.