Posts [PR2] Deep content-based musicc recommendation
Post
Cancel

[PR2] Deep content-based musicc recommendation

Motivation

Most recommender systems rely on CF However,
CF suffers from the cold start problem:
fails when no usage data is available, not effective for recommending new and unpopular songs.

Contribution

  1. propose to use a latent factor model for recommendation
  2. predict the latent factors from music audio when they cannot be obtained from usage data.

Methods

Compare a traditional approach using a bag-of-words representation of the audio signals with deep convolutional neural networks,

Dataset

evaluate the predictions quantitatively and qualitatively on the Million Song Dataset.

Introduction

As the power of streaming services gain its strength, music industry has shifted towards digital distribution.

automatic music recommendation

  • it allows listeners to discover new music that matches their tastes,
  • enables online music stores to target their wares to the right audience.

aspects that effect music recommendation towards users

  • different styles and genres
  • social and geographic factors that influence listener preferences

recommender system

  1. Collaborative Filtering approach - rely on usage patterns: the combinations of items that users have consumed or rated
  2. Content-Based approach - predict user preferences from item content and metadata

prior consensus

CF generally outperform CB recommendation
ONLY usage data is available.

CF suffers from the cold start problem

cold start problem is new items that have not been consumed before cannot be recommended.
niche audience are more difficult to recommend because usage data is scarce.

Collaborative filtering suffers from the cold start problem: new items that have not been consumed before cannot be recommended. Additionally, items that are only of interest to a niche audience are more difficult to recommend because usage data is scarce.

In many domains, and especially in music, they comprise the majority of the available items, because the users’ consumption patterns follow a power law [2].

power law distributions in recommendation system power law is commonly used to describe the Pareto Principle or the 80/20 rule. CB approach does not affected by these issues.

Weighted matrix factorization

The Taste Profile Subset

this contains play counts per song and play counts per user -> but not rated them

traditional method

traditional matrix factorization algorithm is focused on predicting ratings problem is

  • If a user has never listened to a song algorithm goes to no usage

??????? 여기 약간 이해 안됨

revised method

WMF algorith is to learn latent factor representations of all users and items in the Taste Profile Subset

~~ 수식 쓰는것 부터~~

We used the weighted matrix factorization (WMF) algorithm, proposed by Hu et al. [16], to learn latent factor representations of all users and items in the Taste Profile Subset.

This is a modified matrix factorization algorithm aimed at implicit feedback datasets. Let rui be the play count for user u and song i. For each user-item pair, we define a preference variable pui and a confidence variable cui (I(x) is the indicator function, ↵ and ✏ are hyperparameters):

Related work

Many researchers have attempted to mitigate the cold start problem in collaborative filtering by incorporating content-based features. We review some recent work in this area of research. 7 Wang et al. [28] extend probabilistic matrix factorization (PMF) [29] with a topic model prior on the latent factor vectors of the items, and apply this model to scientific article recommendation. Topic proportions obtained from the content of the articles are used instead of latent factors when no usage data is available. The entire system is trained jointly, allowing the topic model and the latent space learned by matrix factorization to adapt to each other. Our approach is sequential instead: we first obtain latent factor vectors for songs for which usage data is available, and use these to train a regression model. Because we reduce the incorporation of content information to a regression problem, we are able to use a deep convolutional network. McFee et al. [5] define an artist-level content-based similarity measure for music learned from a sample of collaborative filter data using metric learning to rank [21]. They use a variation on the typical bag-of-words approach for audio feature extraction (see section 4.1). Their results corroborate that relying on usage data to train the model improves content-based recommendations. For audio data they used the CAL10K dataset, which consists of 10,832 songs, so it is comparable in size to the subset of the MSD that we used for our initial experiments. Weston et al. [17] investigate the problem of recommending items to a user given another item as a query, which they call ‘collaborative retrieval’. They optimize an item scoring function using a ranking loss and describe a variant of their method that allows for content features to be incorporated. They also use the bag-of-words approach to extract audio features and evaluate this method on a large proprietary dataset. They find that combining collaborative filtering and content-based information does not improve the accuracy of the recommendations over collaborative filtering alone. Both McFee et al. and Weston et al. optimized their models using a ranking loss. We have opted to use quadratic loss functions instead, because we found their optimization to be more easily scalable. Using a ranking loss instead is an interesting direction of future research, although we suspect that this approach may suffer from the same problems as the WPE objective (i.e. popular songs will have an unfair advantage).

Conclusion

In this paper, we have investigated the use of deep convolutional neural networks to predict latent factors from music audio when they cannot be obtained from usage data. We evaluated the predictions by using them for music recommendation on an industrial-scale dataset. Even though a lot of characteristics of songs that affect user preference cannot be predicted from audio signals, the resulting recommendations seem to be sensible. We can conclude that predicting latent factors from music audio is a viable method for recommending new and unpopular music. We also showed that recent advances in deep learning translate very well to the music recommendation setting in combination with this approach, with deep convolutional neural networks significantly outperforming a more traditional approach using bag-of-words representations of audio signals. This bag-of-words representation is used very often in MIR, and our results indicate that a lot of research in this domain could benefit significantly from using deep neural networks

This post is licensed under CC BY 4.0 by the author.