Improving video segment prefetching with machine learning
With the massive increase in video traffic over the Internet, HTTP adaptive streaming has become the main way to deliver infotainment. Achieving a good user experience is more important than ever, and although many solutions have been proposed, Adaptive Bitrate Streaming (ABR) has proven to be the most effective technique for video delivery over the Internet.
In this context, many bandwidth adaptation algorithms have emerged, each aiming to improve user experience using different session information, e.g. TCP throughput, buffer occupancy, download time, etc. Notwithstanding the difference in their implementation, they mostly use the same inputs to adapt to the varying conditions of a media session.
In this article, we share results from the doctoral research of Hiba Yousef, conducted at the École doctorale de l’Institut polytechnique de Paris (supervised by Jean-Claude Dufourd and Jean Le Feuvre) in partnership with Lumen, and defended in 2021.
Yousef shows that machine learning techniques, and supervised classification in particular, can predict the class of an ABR algorithm by identifying its most relevant features (feature importance based on mean decrease in impurity). Moreover, these techniques can predict the bitrate decisions of any ABR algorithm, given a set of input features observable at the application level.
Introduction to ABR
In ABR, media (video, audio) is divided into small parts of roughly constant duration, called segments, and each segment is encoded into different qualities. A client-side adaptive bitrate algorithm is then used to decide on the most convenient quality given the network conditions.
Many ABR algorithms have been proposed in literature and in real open or closed video player deployments. Even if they differ in their decision logic, most state-of-the-art ABR algorithms rely on heuristic observations as inputs to optimize the bitrate selection of the next segments. These inputs are usually the bandwidth measurements and the maximum buffer size, segment characteristics (e.g. size, duration, and encoding bitrate) and in some cases device capabilities (e.g. CPU usage, memory, playback speed).
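To make the bitrate-selection step concrete, here is a minimal sketch of a purely throughput-based rule: pick the highest encoded quality that fits within a safety fraction of the measured bandwidth. The bitrate ladder and the 0.8 safety factor are illustrative assumptions, not values from any of the algorithms studied here.

```python
# Hypothetical bitrate ladder (bps); real ladders come from the manifest.
BITRATE_LADDER_BPS = [400_000, 800_000, 1_500_000, 3_000_000, 6_000_000]

def select_bitrate(measured_bandwidth_bps: float, safety: float = 0.8) -> int:
    """Return the highest ladder bitrate not exceeding safety * bandwidth."""
    budget = measured_bandwidth_bps * safety
    candidates = [b for b in BITRATE_LADDER_BPS if b <= budget]
    # Fall back to the lowest quality when even that does not fit the budget.
    return candidates[-1] if candidates else BITRATE_LADDER_BPS[0]
```

Buffer-based and hybrid algorithms replace or combine the bandwidth input with buffer occupancy, but the output is the same kind of decision: one quality level per upcoming segment.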
Machine learning (ML) and other control techniques have shown promise in many research fields, and video streaming is no exception. Supervised learning (SL) consists in learning the relation between an input and an output based on previous examples of input-output pairs. SL enables learning from past observations to predict future events, which is the main task of ABR. Therefore, mixing these two approaches is a logical choice.
Predicting ABR behavior is important for pre-fetching and caching-based systems. In this direction, there have been attempts to improve CDN pre-fetching by informing the CDN of the next segment to be requested, as described in the emerging Common Media Client Data (CMCD) specification. However, in current ABR implementations, the video player requests segments sequentially, and CMCD does not state whether prefetching the next segment follows the last ABR decision, or whether the ABR will be modified to select the bitrate of the next segments to be prefetched from the CDN.
Additionally, in Lumen’s particular use case — hybrid HTTP / Peer-to-Peer video (P2P) streaming — a pre-fetcher usually downloads video segments ahead of the player requests. Current prefetching techniques rely on the last requested quality to fetch upcoming segments; therefore, in the case of quality switches, pre-fetched segments on the previous quality are disregarded and new segments are requested.
Knowing ABR decisions in advance would help anticipate track switches and allow us to fetch the next segments at the predicted quality. In current P2P streaming implementations, the video player is integrated on top of the P2P stack, which replaces HTTP with P2P as the transport layer. The ABR in web and native players is completely opaque to the P2P stack, which makes P2P-ABR integration more problematic.
For this reason, in this blog, we deliberately try to predict the decisions of any ABR algorithm using only the most frequently used input features, regardless of the specific ABR implementation. To this end, we choose several well-known supervised classifiers to predict the bitrate decision of the ABR algorithm. To demonstrate that our work is ABR agnostic, we first test the models on six well-known state-of-the-art ABR algorithms. Then, we predict the bitrate decisions of three commercial, completely unknown, closed-source algorithms.
ML-based classification in adaptive streaming
As mentioned, research in adaptive streaming is shifting towards machine learning and optimization control. ML and ABR intersect at the point of learning from heuristic observations and predicting future decisions.
Adaptive bitrate algorithms are generally classified by their required inputs into three main classes: buffer-based (BBA and BOLA), throughput-based (PANDA and CONVENTIONAL), and hybrid buffer-throughput-based. Recently, machine learning and control techniques have been widely applied to ABR, leading to an emerging ML-based and control-based class (e.g. RobustMPC).
Models in SL are trained on labeled datasets with the goal of applying them to unseen data. SL covers classification, regression, and structured-output prediction. In this study, we focus on classification and use well-known classifiers: Naive Bayes, Decision Trees, Random Forest, AdaBoost and Gradient Boost, K-Nearest Neighbors, and Support Vector Machine (SVM).
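The classifier families listed above map directly onto scikit-learn estimators. The sketch below instantiates one of each with default hyper-parameters; these defaults are an assumption, since the study's exact tuning is not reproduced here.

```python
# One scikit-learn estimator per classifier family named in the text.
# Hyper-parameters are library defaults, used here only for illustration.
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (
    RandomForestClassifier,
    AdaBoostClassifier,
    GradientBoostingClassifier,
)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

CLASSIFIERS = {
    "NaiveBayes": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(),
    "RandomForest": RandomForestClassifier(n_estimators=100),
    "AdaBoost": AdaBoostClassifier(),
    "GradientBoost": GradientBoostingClassifier(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
}
```

Every estimator shares the same `fit(X, y)` / `predict(X)` interface, which makes it straightforward to benchmark all seven on the same dataset.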
Bitrate selection classification problem
ABR bitrate prediction is formulated as a multiclass classification problem in this article. The first step in solving this ML problem is to define the features needed to predict the ABR bitrate. Most ABR algorithms use a set or a subset of network, buffer, or segment variables. In this study, to keep the model generic, we use the following features:
- Buffer level(s): the buffer occupancy when requesting the segment, which is available via most video players’ public APIs.
- Bandwidth (bps): the TCP throughput as seen by the application layer after downloading the segment, measured simply by dividing the data downloaded (the segment size here) by the download time.
- Previous bandwidth (bps): the throughput measured when downloading the previous segment; this information is used for smoothing the bandwidth estimation, or when the current estimation is not available, as with some buffer-based algorithms.
- Download time(s): the segment download time.
- Previous bitrate (bps): the bitrate of the previously selected segment, as used by the majority of ABR algorithms to take bitrate smoothness into account.
As these features have a wide range of values, we rescale them with a Min-Max scaler so that all features fall in the range [0, 1]. We then build our dataset matrix of M inputs (features) and one output (label) over N instances (video segments in our problem). For training and testing, we perform stratified K-fold cross-validation with K = 10, where the data is split into K subsets (folds) that each preserve the percentage of samples per class. In each round, K − 1 folds are used to train the model while the remaining fold is held out as test data; this is repeated K times so that every fold serves as the test set once, and the results are averaged across the folds.
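The preprocessing and evaluation steps above translate into a short scikit-learn pipeline: Min-Max scaling into [0, 1] followed by stratified 10-fold cross-validation. The synthetic data below only illustrates the shapes involved; the real features come from media session logs.

```python
# Sketch of the evaluation pipeline: Min-Max scaling + stratified 10-fold CV.
# Synthetic data stands in for the N x M feature matrix described in the text.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 5))           # N instances x M features (buffer, bandwidth, ...)
y = rng.integers(0, 4, size=200)   # bitrate class label per segment

X_scaled = MinMaxScaler().fit_transform(X)  # every feature now in [0, 1]

# Folds preserve the per-class sample percentages, as described above.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X_scaled, y, cv=cv)
mean_accuracy = scores.mean()      # averaged over the 10 folds
```

On real session data, `y` would be the bitrate chosen by the ABR under study and the averaged score is the prediction accuracy reported in the results below.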
Results and discussion
For any ML problem, it is important to evaluate the usefulness of the features, i.e. Feature Importance. In our ABR-prediction problem, Feature Importance is even more critical because it carries information about the nature or the class of the ABR used.
Looking at Fig. 1, we see that the previous bitrate noticeably dominates the decisions of the selected ABR algorithms, confirming the importance of smoothness in these algorithms. Additionally, the buffer-based algorithms (BBA and BOLA) assign high importance to the buffer level over the network features, while the throughput-based algorithms (CONVENTIONAL, PANDA, and FESTIVE) give higher importance to the bandwidth and download time features. Finally, R-MPC, a hybrid control-based algorithm, relies on all selected features (buffer and network) with different but closer percentages.
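The importance values behind a figure like Fig. 1 can be read directly off a fitted Random Forest, whose `feature_importances_` attribute implements exactly the mean-decrease-in-impurity measure mentioned earlier. In this toy sketch the label depends only on the first feature, so its importance should dominate, mimicking how the previous bitrate dominates for the studied ABRs.

```python
# Mean-decrease-in-impurity feature importance from a fitted Random Forest.
# Toy data: the label is driven by feature 0 alone, so it should dominate.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((300, 5))
y = (X[:, 0] > 0.5).astype(int)    # label determined by feature 0 only

model = RandomForestClassifier(random_state=0).fit(X, y)
importances = model.feature_importances_   # one value per input feature, sums to 1
```

With the real feature set (buffer level, bandwidth, previous bandwidth, download time, previous bitrate), the same vector is what gets plotted per ABR algorithm.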
Now let’s jump to some metrics. In classification problems, it is common to use known metrics (e.g. accuracy, precision, recall) to evaluate models.
- Accuracy: Prediction accuracy reports how many classes are predicted correctly out of the total prediction samples.
- Precision: This metric looks, for each class, at the proportion of samples that actually belong to this class out of the total samples that are predicted as belonging to this class.
- Recall: Recall describes, for each class, the proportion of samples that are correctly assigned to a class out of all the samples actually belonging to this class.
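The three metrics defined above are available directly in scikit-learn. The toy multiclass example below computes them; averaging the per-class precision and recall with `average="macro"` is an illustrative choice, since the study's exact averaging scheme is not restated here.

```python
# Accuracy, precision, and recall on a toy 3-class prediction example.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [0, 0, 1, 1, 2, 2]   # actual bitrate classes
y_pred = [0, 0, 1, 2, 2, 2]   # predicted classes (one class-1 sample missed)

acc = accuracy_score(y_true, y_pred)                      # 5 of 6 correct
prec = precision_score(y_true, y_pred, average="macro")   # per-class, averaged
rec = recall_score(y_true, y_pred, average="macro")       # per-class, averaged
```

Here class 1 has perfect precision (its one prediction is correct) but only 50% recall (one of its two actual samples was missed), which is exactly the distinction the two definitions above capture.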
The prediction accuracy of the different machine learning classifiers is shown in Table I. Among the selected ML classifiers, Random Forest (RF) and Gradient Boost (GrdBst) achieve the highest prediction accuracy for all the selected ABR algorithms with overall accuracy of more than 96%.
Many works advise that accuracy alone is not enough; depending on the application, one metric may be favored over another. In our bitrate selection problem, if the purpose is to check, for any given bitrate class, how often a prediction of that class is actually correct, then precision is the metric to consider.
On the other hand, if the goal is to ensure that the actual classes are correctly identified, then recall is the preferred metric. Table II below shows that RF and GrdBst achieve high precision and recall, both above 96%.
Analysis of black box commercial ABRs
Three commercial service providers, referred to as S1, S2, and S3, were chosen randomly, and for each, data was collected from 5k to 10k concurrent peers. The service providers offered different live and VoD streams, so we built two datasets, one for VoD and one for live sessions, each with up to 40,000 instances. We ran our experiments with the ML algorithms presented above.
The feature importance shown in the figure above indicates that these ABR algorithms are most sensitive to the previous bitrate, which reveals their preference for smoothness. Also, apart from the VOD scenario for S1, all the ABR algorithms rely on the throughput measurements for the current and the last downloaded segment. Interestingly, the ABR of S1 seems to give higher importance to the buffer level feature in the VOD scenario.
The table above shows that RF, DT, and GrdBst perform best among the tested ML algorithms. For S1 (both Live and VOD), these three algorithms achieve very high accuracy, slightly above 99%. For S2, RF and GrdBst outperform DT, with accuracy around 98% and 99% for Live and VOD respectively. Finally, for S3, which has the highest number of quality levels (seven), RF and GrdBst again achieve high accuracy on Live, with GrdBst slightly ahead. For VoD, however, RF performs best with an accuracy of 98.5%, nearly 1% better than DT and GrdBst.
The table below shows that RF, DT, and GrdBst each achieve high precision and recall, with RF and GrdBst slightly better than DT in some scenarios.
Conclusion
In this article, we showed that the behavior of ABR logic can be learned through supervised learning.
We performed our study over both classic ABR algorithms and real-world closed-source deployments. From the results gained, we found that Random Forest and Gradient Boost algorithms are able to achieve very high prediction accuracy, using only the basic information provided as input to the application layer.
This work serves the prefetching-based P2P streaming use case and satisfies the functional requirement that the peer-to-peer and ABR layers remain isolated. It can also extend to traditional CDN applications, which may likewise leverage prefetching to improve QoE. This optimization is only possible if prefetches account for track switches and minimize wasted downloads, hence the advantage of being able to predict ABR decisions ahead of time.
This model is portable to real-world use cases for more reasons than just prediction accuracy. On one hand, for the training phase, many users can participate by sending their buffer and bandwidth measurements to a central server, which reduces the time needed for data collection. On the other hand, for the prediction phase, the input features for upcoming segments are either already known (e.g. segment size, download time, and measured bandwidth) or easy to predict (e.g. the buffer level, given the download time and the previous buffer level). It is also worth noting that prediction latency is negligible compared to the segment duration.
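The "easy to predict" buffer feature follows from simple playback accounting: while a segment downloads, the buffer drains in real time, and once the segment arrives, the buffer gains one segment duration. The formula below is our assumed model of that bookkeeping, not a player-specific API.

```python
# Predicting the next buffer level from the previous level and download time.
# All values are in seconds; this simple drain-then-fill model is an assumption.

def next_buffer_level(prev_buffer_s: float, download_time_s: float,
                      segment_duration_s: float) -> float:
    """Buffer after downloading one segment: drain during download, then refill."""
    drained = max(0.0, prev_buffer_s - download_time_s)  # cannot go below empty
    return drained + segment_duration_s
```

For example, with a 10 s buffer, a 2 s download, and 4 s segments, the next buffer level is 12 s; this is the value the classifier can be fed as the buffer feature for the upcoming decision.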
These qualities make our approach a practical way to advance smarter prefetching techniques using machine learning.
For more information on video delivery improvements and how Lumen can help enable them, reach out to firstname.lastname@example.org
This content is provided for informational purposes only and may require additional research and substantiation by the end user. In addition, the information is provided “as is” without any warranty or condition of any kind, either express or implied. Use of this information is at the end user’s own risk. Lumen does not warrant that the information will meet the end user’s requirements or that the implementation or usage of this information will result in the desired outcome of the end user. All third-party company and product or service names referenced in this article are for identification purposes only and do not imply endorsement or affiliation with Lumen. This document represents Lumen’s products and offerings as of the date of issue. Services not available everywhere. Business customers only. Lumen may change or cancel products and services or substitute similar products and services at its sole discretion without notice. ©2021 Lumen Technologies. All Rights Reserved.