Why is the YouTube video recommendation system so strong? You will know after reading this article.

- youtube
  • Selected from Medium; author: Tim Elfrink; compiled by Machine Heart; contributor: Zhang Qian
    As a leading global video platform, Google's YouTube owes much of its success to its accurate video recommendation system
    What are the highlights of YouTube's recommendation system? What problems does it solve? In a RecSys 2019 paper, Google researchers addressed these questions
    A data scientist from the Netherlands summarized the paper
    Paper address: https://dl.acm.org/citation.cfm?id=3346997
    What problem does YouTube's recommendation system solve?
    When a user watches a video on YouTube, the page displays a list of recommended videos that the user might like
    The paper focuses on two major goals:
    1) Optimizing several different objectives: the exact objective function is not defined; instead, the objectives are divided into "engagement" objectives (clicks, time spent) and "satisfaction" objectives (likes, dismissals)
    2) Reducing the "selection bias" introduced by the system: users are usually more likely to click on the recommended videos that rank first, even though lower-ranked videos may offer higher engagement and satisfaction
    How to reduce this bias effectively is an important problem to solve
    How is the problem solved?
    Figure 1: the complete architecture of the model
    The model introduced in this paper focuses on two main objectives
    The researchers used the Wide & Deep model framework
    The wide model has strong memorization ability, the deep neural network has generalization ability, and the Wide & Deep model combines the advantages of both
    The Wide & Deep model generates a prediction for each defined objective (both engagement and satisfaction)
    These objectives fall into two categories: binary classification (for example, whether the user likes a video) and regression (for example, the rating given to a video)
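The two categories of objectives are typically trained with different loss functions. The sketch below is only an illustration under stated assumptions: the objective names (`like`, `rating`), the choice of cross-entropy and squared error, and the equal-weight sum of the two losses are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical objective names and types, chosen only for illustration.
OBJECTIVES = {"like": "binary", "rating": "regression"}


def binary_cross_entropy(label: int, prob: float, eps: float = 1e-12) -> float:
    """Loss for a binary objective, e.g. whether the user liked the video."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def squared_error(target: float, prediction: float) -> float:
    """Loss for a regression objective, e.g. a predicted rating."""
    return (target - prediction) ** 2


def multitask_loss(labels: dict, predictions: dict) -> float:
    """Sum the per-objective losses for one example.

    Weighting every objective equally is an assumption of this sketch.
    """
    total = 0.0
    for name, kind in OBJECTIVES.items():
        if kind == "binary":
            total += binary_cross_entropy(labels[name], predictions[name])
        else:
            total += squared_error(labels[name], predictions[name])
    return total


loss = multitask_loss({"like": 1, "rating": 4.0}, {"like": 0.5, "rating": 3.5})
```

Keeping one loss per objective means each prediction head can be evaluated on its own, while the summed value is what a shared model would minimize.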
    A separate ranking model sits on top of this model
    It is simply a weighted combination of the output vectors, which are the predictions for the different objectives
    These weights are tuned manually to achieve the best performance across the different objectives
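The weighted combination can be sketched in a few lines. This is a minimal illustration, not the production ranking function: the objective names, the weight values, and the candidate predictions are invented for the example.

```python
# Hand-tuned weights per objective (hypothetical values for illustration).
WEIGHTS = {"click": 1.0, "watch_time": 0.5, "like": 2.0}


def combined_score(predictions: dict, weights: dict) -> float:
    """Weighted sum of the per-objective predictions for one candidate."""
    return sum(weights[name] * predictions[name] for name in weights)


def rank(candidates: list, weights: dict) -> list:
    """Sort candidates by combined score, highest first."""
    return sorted(
        candidates,
        key=lambda c: combined_score(c["predictions"], weights),
        reverse=True,
    )


candidates = [
    {"id": "a", "predictions": {"click": 0.9, "watch_time": 0.2, "like": 0.1}},
    {"id": "b", "predictions": {"click": 0.5, "watch_time": 0.8, "like": 0.4}},
    {"id": "c", "predictions": {"click": 0.3, "watch_time": 0.3, "like": 0.1}},
]
ranked_ids = [c["id"] for c in rank(candidates, WEIGHTS)]
```

With these made-up numbers, candidate `b` ranks above `a` even though `a` has the higher click probability, because the like and watch-time predictions carry weight as well. Changing the weights changes the ranking without retraining the model, which is why they can be tuned by hand.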
    More advanced approaches, such as pairwise and listwise methods, have also been proposed to improve the model's performance, but because of the increased computing time they have not been put into production
    Figure 2: replacing the shared-bottom layers with MMoE
    In the deep part of the Wide & Deep model, the researchers used a multi-task learning model, Multi-gate Mixture-of-Experts (MMoE)
    Features of the current video (content, title, topic, upload time, etc.) and information about the user who is watching (time, user profile, etc.) are used as input
    The MMoE model can efficiently share weights among different goals
    The shared bottom layer is divided into several expert layers, all of which are used to predict the different objectives
    Each objective has its own gating function
    The gating function is a softmax function that takes input from the original shared layer and the different expert layers
    This softmax function determines which expert layers are important for each objective
    As shown in Figure 3 below, different expert layers differ in importance for different objectives
    Compared with models that use a shared-bottom architecture, training an MMoE model is less affected when the different objectives have low correlation
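The gating mechanism can be illustrated with a small forward pass. This is a simplified sketch under several assumptions: each expert is a single ReLU layer, each gate computes its softmax weights from the shared input and applies them to the expert outputs, and all sizes, task names, and random weights are hypothetical.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(xs):
    return [max(0.0, x) for x in xs]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def mmoe_forward(x, expert_weights, gate_weights):
    """Return, per task, the gate weights and the gated mix of expert outputs."""
    expert_outputs = [relu(matvec(w, x)) for w in expert_weights]
    hidden = len(expert_outputs[0])
    gates, mixed = {}, {}
    for task, gw in gate_weights.items():
        gate = softmax(matvec(gw, x))  # one weight per expert, summing to 1
        gates[task] = gate
        mixed[task] = [
            sum(g * out[i] for g, out in zip(gate, expert_outputs))
            for i in range(hidden)
        ]
    return gates, mixed


rng = random.Random(0)
input_dim, hidden_dim, num_experts = 4, 3, 2
experts = [random_matrix(hidden_dim, input_dim, rng) for _ in range(num_experts)]
task_gates = {
    "engagement": random_matrix(num_experts, input_dim, rng),
    "satisfaction": random_matrix(num_experts, input_dim, rng),
}
gates, mixed = mmoe_forward([0.5, -0.2, 0.1, 0.9], experts, task_gates)
```

Because each task has its own gate, two weakly related objectives can lean on different experts instead of competing for one shared representation. In a full model, each entry of `mixed` would feed a task-specific output layer.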
    Figure 3: expert utilization for multiple tasks on YouTube
    The wide part of the model aims to reduce the selection bias caused by the position of a video in the recommendation list
    The researchers call this part the "shallow tower"; it can be a simple linear model that uses simple features, such as the position at which the video was clicked and the device on which the user watches the video
    The output of the "shallow tower" is combined with the output of the MMoE model, which is a key component of the Wide & Deep model architecture
    In this way, the model pays more attention to the position of the video
    During training, a 10% dropout rate is applied to prevent the position feature from becoming too important in the model
    If you do not use the Wide & Deep architecture and simply add the position as a feature, the model may not pay attention to this feature at all
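The idea of combining a position-only "shallow tower" with the main model can be sketched as follows. This is an illustrative simplification, not the paper's implementation: the per-position weights are made up, the two outputs are combined by adding logits, and the position is treated as missing (`None`) when the feature is dropped during training or when no position is supplied at prediction time.

```python
import math
import random

# Hypothetical learned bias per display position (0 = top of the list).
POSITION_WEIGHTS = {0: 1.0, 1: 0.6, 2: 0.3, 3: 0.1}


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def shallow_tower_logit(position, position_weights) -> float:
    """Linear model over the position feature; None means the feature is missing."""
    if position is None:
        return 0.0
    return position_weights[position]


def predict_click(main_logit, position, position_weights,
                  training=False, drop_rate=0.1, rng=random) -> float:
    """Combine the main model's logit with the shallow tower's logit."""
    if training and rng.random() < drop_rate:
        position = None  # feature dropout: hide the position for this example
    return sigmoid(main_logit + shallow_tower_logit(position, position_weights))


p_top = predict_click(0.0, 0, POSITION_WEIGHTS)          # shown at the top
p_no_position = predict_click(0.0, None, POSITION_WEIGHTS)  # position missing
```

Separating the position into its own tower lets the bias term absorb the effect of where a video was shown, so the main model's logit reflects the video itself. The dropout step keeps the combined prediction from depending on the position feature alone.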
    Results
    The results in the paper show that replacing the shared-bottom layers with MMoE improves the model's performance on both engagement (time spent watching recommended videos) and satisfaction (survey responses)
    Increasing the number of experts and multiplications in MMoE further improves the performance of the model
    However, because of computational constraints, this is not feasible in real deployment
    Table 1: YouTube live experiment results for the MMoE model
    Further results show that using the "shallow tower" to reduce selection bias improves the engagement metric
    This is a significant improvement over simply adding the position as a feature to the MMoE model
    Table 2: YouTube live experiment results for modeling video position bias
    Interesting takeaways:
    Although Google has a strong computing infrastructure, it is still very careful about training and serving costs
    By using the Wide & Deep model, you can predefine some important features when designing the network
    The MMoE model can be very effective when you need a model with multiple objectives
    Even with a large and complex model architecture, people still manually tune the weights of the last layer, which determine the actual ranking based on the different objective predictions
    Original link: https://medium.com/vantageai/how-youtube-is-recommending-your-next-video-7e5f1a6bd6d9