Why is the YouTube video recommendation system so strong? You will know after reading this article.

2021-10-10 - youtube

Selected from Medium. Author: Tim Elfrink. Compiled by Machine Heart; contributor: Zhang Qian.
As a leading global video platform, Google's YouTube owes much of its success to its accurate video recommendation system.
What are the highlights of YouTube's recommendation system? What problems has it solved? In a RecSys 2019 paper, Google researchers addressed these questions.
A data scientist from the Netherlands summarized the contents of the paper.
Paper address: dl.acm.org/citation.cfm?id=3346997
What problem does YouTube's recommendation system solve?
When a user watches a video on YouTube, the page displays a list of recommended videos that the user might like.
The paper focuses on the following two major goals:
1) Optimize for different objectives. The exact objective function is not defined; instead, the objectives are divided into "engagement" objectives (clicks, time spent) and "satisfaction" objectives (likes, dislikes).
2) Reduce the "selection bias" introduced by the system: users are more likely to click on the recommended videos that rank first, even though lower-ranked videos may yield higher engagement and satisfaction.
How to effectively reduce this bias is a pressing problem.
How is the problem solved?
Figure 1: the complete architecture of the model.
The model introduced in the paper addresses these two main goals.
The researchers used a Wide & Deep model framework.
The wide model has strong memorization ability, the deep neural network has generalization ability, and the Wide & Deep model combines the advantages of both.
The Wide & Deep model generates a prediction for each defined objective (both engagement and satisfaction).
These objectives fall into two categories: binary classification (for example, whether a user likes a video or not) and regression (for example, the rating a user gives a video).
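To make the two categories concrete, here is a minimal sketch of how a multi-task model might score one training example. It assumes cross-entropy loss for the binary objectives and squared error for the regression objectives, which are common choices; the function names, the example labels, and the unweighted sum are illustrative assumptions, not details taken from the article.

```python
import math


def binary_cross_entropy(label, prob, eps=1e-12):
    """Loss for a binary objective, e.g. 'did the user like the video?'."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def squared_error(target, prediction):
    """Loss for a regression objective, e.g. the rating given to a video."""
    return (target - prediction) ** 2


# Hypothetical example: one binary label and one regression target.
like_loss = binary_cross_entropy(label=1, prob=0.5)
rating_loss = squared_error(target=4.0, prediction=3.5)

# Assumption: the per-objective losses are simply summed for training.
total_loss = like_loss + rating_loss
```

Each objective gets its own prediction and its own loss, so a single network can be trained on both kinds of signal at once.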
A separate ranking model sits on top of this model.
It is simply a weighted combination of the output vector, whose entries are the predictions for the different objectives.
These weights are tuned manually to achieve the best performance across the different objectives.
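The weighted combination can be sketched in a few lines. The objective names, the weight values, and the candidate predictions below are all hypothetical; the article only states that the weights are tuned by hand.

```python
def combine_scores(predictions, weights):
    """Weighted sum of the per-objective predictions for one candidate video."""
    return sum(weights[name] * predictions[name] for name in weights)


# Hypothetical hand-tuned weights, one per objective.
WEIGHTS = {"click": 1.0, "watch_time": 0.5, "like": 2.0}

# Hypothetical model outputs for two candidate videos.
candidates = {
    "video_a": {"click": 0.30, "watch_time": 0.8, "like": 0.05},
    "video_b": {"click": 0.20, "watch_time": 1.5, "like": 0.10},
}

# Rank candidates by their combined score, highest first.
ranked = sorted(
    candidates,
    key=lambda video: combine_scores(candidates[video], WEIGHTS),
    reverse=True,
)
```

With these made-up numbers, video_b ranks first even though video_a has the higher click probability, because its predicted watch time and like probability outweigh the difference.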
In addition, the researchers mention more advanced approaches, such as pairwise and listwise methods, that could improve model performance, but because they increase computation time, these methods have not been applied in production.
Figure 2: replacing the shared-bottom layer with MMoE.
In the deep part of the Wide & Deep model, the researchers used a multi-task learning model, MMoE (Multi-gate Mixture-of-Experts).
Features of the video currently being watched (content, title, topic, upload time, etc.) and information about the user who is watching (time, user profile, etc.) are used as input.
The MMoE model can efficiently share weights among different objectives.
The shared bottom layer is split into several expert layers, which are used to predict the different objectives.
Each objective has its own gate function.
This gate function is a softmax function that receives input from the original shared layer and the different expert layers.
The softmax determines which expert layers are important for each objective.
As shown in Figure 3 below, different expert layers have different degrees of importance for different objectives.
Compared with a shared-bottom architecture, training in the MMoE model is less affected when the objectives are only weakly correlated.
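The gating mechanism can be illustrated with a small forward pass in plain Python. This is a sketch of the generic MMoE idea under assumed settings, not the production model: the dimensions, the random weights, the single-layer ReLU experts, and all function names are illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, v) for v in vector]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def mmoe_forward(shared, expert_weights, gate_weights):
    """Return (per-task mixed expert outputs, per-task gate weights)."""
    # Every expert sees the same shared-layer representation.
    expert_outputs = [relu(matvec(w, shared)) for w in expert_weights]
    task_outputs, task_gates = [], []
    for gate_w in gate_weights:
        # One softmax gate per task: one importance weight per expert.
        gate = softmax(matvec(gate_w, shared))
        mixed = [
            sum(g * out[i] for g, out in zip(gate, expert_outputs))
            for i in range(len(expert_outputs[0]))
        ]
        task_outputs.append(mixed)
        task_gates.append(gate)
    return task_outputs, task_gates


# Assumed toy sizes: 4 shared features, 2 experts of width 3, 2 tasks.
rng = random.Random(0)
INPUT_DIM, EXPERT_DIM, NUM_EXPERTS, NUM_TASKS = 4, 3, 2, 2
expert_weights = [random_matrix(EXPERT_DIM, INPUT_DIM, rng) for _ in range(NUM_EXPERTS)]
gate_weights = [random_matrix(NUM_EXPERTS, INPUT_DIM, rng) for _ in range(NUM_TASKS)]

shared = [0.2, -0.1, 0.4, 0.7]  # hypothetical shared-layer output
task_outputs, task_gates = mmoe_forward(shared, expert_weights, gate_weights)
```

Because each task has its own gate, two weakly related objectives can lean on different experts instead of competing for one shared representation.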
Figure 3: utilization of the expert layers across multiple tasks on YouTube.
The wide part of the model aims to solve the selection bias caused by a video's position in the recommendation list.
The researchers call this part the "shallow tower"; it can be a simple linear model using simple features such as the position at which a video was clicked and the device on which the user watches the video.
The output of the "shallow tower" is combined with the output of the MMoE model, which is a key component of the Wide & Deep model architecture.
In this way, the model pays more attention to the position of the video.
During training, a dropout rate of 10% is applied to prevent the position feature from becoming too important in the model.
If you do not use the Wide & Deep architecture and simply add the position as a feature, the model may not pay attention to this feature at all.
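One way to picture the shallow tower is as a per-position bias added to the main model's logit. The sketch below assumes that the two outputs are summed before a sigmoid and that a dropped or unknown position contributes nothing; the position weights, the function names, and these combination details are illustrative assumptions rather than specifics given in the article.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def shallow_tower_logit(position, position_weights):
    """Linear bias term for a display position; None means 'missing'."""
    if position is None:
        return 0.0
    return position_weights[position]


def predict_click(main_logit, position, position_weights,
                  training=False, drop_rate=0.10, rng=random):
    """Combine the main model's logit with the shallow tower's bias logit."""
    # During training, drop the position feature with probability drop_rate
    # so the model does not lean on it too heavily.
    if training and rng.random() < drop_rate:
        position = None
    return sigmoid(main_logit + shallow_tower_logit(position, position_weights))


# Hypothetical learned biases: top positions attract more clicks.
POSITION_WEIGHTS = [1.2, 0.6, 0.1, -0.4]

top_slot = predict_click(0.0, 0, POSITION_WEIGHTS)
bottom_slot = predict_click(0.0, 3, POSITION_WEIGHTS)
no_position = predict_click(0.0, None, POSITION_WEIGHTS)
```

For the same main logit, the predicted click probability is higher in the top slot than in the bottom one, so the bias is absorbed by the shallow tower. Scoring with the position treated as missing then yields a prediction that does not depend on where the video happened to be shown.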
Results
The results of the paper show that replacing the shared-bottom layer with MMoE improves the model's performance on both engagement (time spent watching recommended videos) and satisfaction (survey feedback).
Increasing the number of experts and the number of multiplications in MMoE can improve performance further.
However, due to computational limits, this is not feasible in a real deployment.
Table 1: YouTube live experiment results for the MMoE model.
Further results show that the engagement metric can be improved by using the "shallow tower" to reduce selection bias.
This is a significant improvement over just adding the position as a feature to the MMoE model.
Table 2: YouTube live experiment results for modeling video position bias.
Interesting takeaways:
Although Google has a powerful computing infrastructure, it still has to be careful about training and serving costs.
By using the Wide & Deep model, you can design the network around features you already know to be important.
The MMoE model can be very effective when you need a multi-objective model.
Even with a large and complex model architecture, people still manually tune the weights of the last layer, which determine the actual ranking from the different objective predictions.
Original link: medium.com/vantageai/how-youtube-is-recommending-your-next-video-7e5f1a6bd6d9