Perceptual Adaptive Quantization Parameter Selection Using Deep Convolutional Features for HEVC Encoder

Ismail Marzuki; Donggyu Sim

doi:10.1109/ACCESS.2020.2976142

IEEE Access

14 pages

Abstract

In this paper, we propose a perceptual adaptive quantization based on a deep neural network on high efficiency video coding (HEVC) for bitrate reduction while maintaining subjective visual quality. The proposed algorithm adaptively determines frame-level QP values for different picture types of the hierarchical coding structure in HEVC by taking into account the high-level features extracted from the original and previously reconstructed pictures. A predefined model based on the visual geometry group (VGG-16) network is exploited to extract the high-level features for subjective visual characteristics. Furthermore, the Lagrange multiplier for each frame is also adaptively determined by involving the proposed features for deciding the appropriate parameter of the Lagrange multiplier that can be used for rate-distortion optimization during the encoding process. Experimental results reveal that the proposed perceptual adaptive QP selection can facilitate bitrate savings up to 65.73% and 47.68% and improve the BD-rate based on SSIM by approximately 20.68% and 14.27% under low-delay-P and random-access coding structures, respectively, with very minimal visual quality degradation when compared to HM-16.20 without adaptive QP selection.

INDEX TERMS Adaptive quantization parameter, deep neural network, high efficiency video coding (HEVC), perceptual quantization parameter, VGG-16 network, video coding.

Received January 15, 2020, accepted February 13, 2020, date of publication February 24, 2020, date of current version March 2, 2020.

ISMAIL MARZUKI AND DONGGYU SIM
Department of Computer Engineering, Kwangwoon University, Seoul 139701, South Korea
Corresponding author: Donggyu Sim ([email protected])

This work was supported in part by the Ministry of Science and ICT (MSIT), South Korea, under the Information Technology Research Center (ITRC) supervised by the Institute for Information & Communications Technology Planning & Evaluation (IITP), under Grant IITP-2019-2016-0-00288, and in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) through the Ministry of Science, ICT & Future Planning under Grant NRF-2018R1A2B2008238.

The associate editor coordinating the review of this manuscript and approving it for publication was Shiqi Wang.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/

I. INTRODUCTION

The High Efficiency Video Coding (HEVC) standard has been widely accepted for achieving better compression performance than H.264/Advanced Video Coding (AVC) while maintaining similar visual quality [1]. It has encompassed various video media services and applies not only to full high definition (FHD) but also to 4K/8K ultra-HD (UHD) [2]–[4]. Since the standard was released, many studies have addressed visual quality improvement [5]–[7], computational complexity reduction [8]–[16], bitrate reduction [17], [18], and prospects as a future video coding standard [19]–[26]. Among many coding tools, rate-distortion optimization (RDO) in the HEVC software model (HM) [26]–[28] is used to improve its coding efficiency [30], [31]. It is based on optimization using the global Lagrange multiplier and determines the quantization parameter (QP) value using a QP-λ model. The Lagrange multiplier λ can be termed as a function of the quantization step size, which is closely related to the QP value. It is used for the coding efficiency of each basic unit by selecting the best coding mode under a given QP value, where the basic unit can be a frame, slice, or coding unit (CU). The common test condition (CTC) designed by the Joint Video Experts Team (JVET) employs static quantization parameters for fair comparison in standardization [32]. However, an adaptive QP selection is known to be effective in improving subjective visual quality for practical applications. The adaptive QP should be designed to be harmonized within the RDO process. It can adjust the QP value for a distinctive frame or slice according to different spatial, temporal, or visual aspects. Some studies have discovered approaches to improve the compression rates [33]–[37] or visual quality [38]–[44] with various adaptive QP techniques. Typically, these studies prioritize the determination of optimum QPs for the RDO process to produce better encoding parameters by analyzing the QP-λ relationship or by observing the effectiveness of spatial-temporal dependencies among the basic units. Generally, these studies take into consideration the essential role of λ in the RDO process. Thus, it will be interesting to consider a deep neural network (DNN) for more varied QPs in HEVC. Studies have demonstrated the benefits of DNNs for video coding [45]–[49]. However, there is no existing effective DNN-based algorithm for perceptual adaptive QP purposes.

This study presents a DNN-based QP selection method that adaptively determines a frame-level perceptual QP for HEVC to achieve bitrate reduction without inducing visual quality degradation. The proposed algorithm is embedded in HM-16.20 and generates QP values adaptively for different picture types and coding structures in HEVC. The proposed algorithm first determines a QP for the first frame in a sequence by averaging the standard deviation value of the original blocks (StD). Then, the proposed algorithm obtains high-level features from the original and reconstructed frames using a pretrained visual geometry group (VGG-16) network model [50]. Based on the extracted high-level features, a more visual-friendly QP is then distributed to the next consecutive frames in the encoding order. The algorithm also determines the Lagrange multiplier adaptively for each frame based on the proposed model, which can be used for RDO in the encoding process. As a result, the proposed algorithm demonstrates significant coding gain with minimal visual degradation against HM-16.20 and other existing adaptive QP algorithms.

The rest of this paper is organized as follows. In section 2, we briefly present an overview of the QP decision in HM and related works. In section 3, we discuss the proposed perceptual adaptive QP for HM. In section 4, we review several performance evaluations of the proposed algorithm, and finally, we draw the conclusions and suggest further research directions in section 5.

II. CURRENT STATE OF QP SELECTION AND RELATED STUDIES OF PERCEPTUAL ADAPTIVE QP IN HEVC

The current QP selection within the RDO process in HEVC is not optimal. Many studies have revealed several weaknesses of the QP selection technique in the HEVC encoder. In this section, several adaptive QP techniques for HEVC are discussed as follows.

A. GENERAL QP SELECTION CONCEPT IN HM

QP selection in video coding can be mathematically described as an RDO problem [35], [36] that minimizes the total coding distortion D at a given bitrate $R_T$ as:

$$
QP^{*} = \left(QP^{*}_{1}, \cdots, QP^{*}_{N}\right) = \arg\min_{(QP)} \sum_{i=1}^{N} D_i, \quad \text{s.t.} \quad \sum_{i=1}^{N} R_i \le R_T \tag{1}
$$

where N denotes the number of basic units, $D_i$ is the coding distortion, and $R_i$ is the coding bitrate of the i-th basic unit. Note that the basic unit in HEVC may be a frame, slice, or CU. $D_i$ and $R_i$ in (1) depend on $QP = (QP_1, \cdots, QP_N)$. $QP_i$ refers to the QP value for the i-th basic unit and $QP^{*} = (QP^{*}_{1}, \cdots, QP^{*}_{N})$ represents the optimal QP set for the N basic units. Applying the λ method [29] into the following unconstrained form, equation (1) can be rewritten as:

$$
QP^{*} = \arg\min_{(QP)} \{J\}, \qquad J = \sum_{i=1}^{N} D_i + \lambda \sum_{i=1}^{N} R_i \tag{2}
$$

where J stands for the total rate-distortion (RD) cost function, and λ represents the trade-off parameter between $D_i$ and $R_i$. Along with the RDO process, λ in HEVC can be obtained as

$$
\lambda = QP_{factor} \cdot 2^{QP/3} \tag{3}
$$

where QP denotes the quantization parameter, and $QP_{factor}$ is a constant parameter related to coding configurations. The QP value in (3) is an integer introduced to represent an actual quantization step size by an exponential mapping function. However, the quantization step size in HEVC tends to be static for complexity reduction in the RDO process. Applying a fixed or predefined QP scheme may cause the compression rate to drop significantly, since HEVC has different coding configurations. Hence, this becomes a major challenge for any QP method design in HEVC.

Many QP adjustment methods have been studied for better coding gain. For example, a QP-λ relationship is used to determine the λ value according to an initial QP, and subsequently, the new QP value is recalculated [30], [31]. This algorithm is widely known as a straightforward algorithm for the RDO scheme in HEVC. Wang et al. [33] introduced an improved block-level adaptive QP value that considers previously coded block information. Zhao et al. [34] proposed a QP cascading scheme that assigns QP values to different hierarchical temporal picture layers. Similar algorithms were also introduced by Li et al. [35] and He et al. [36], which presented only an inter-frame dependency technique. As far as we know, these last two algorithms can provide better coding gain for an HEVC encoder. Extensive use of spatial-temporal predictions in HEVC is important for adaptive QP selection in RDO. Although the integration of such propagation effects is desirable, there are not many such studies.

B. EXISTING METHODS OF PERCEPTUAL ADAPTIVE QP SELECTION FOR HM

Determining the QP value for video encoders also affects the overall visual quality of a video sequence. To improve the subjective quality of adaptive QP, the spatial and temporal features, or a combination of them, may be designed empirically. The open-source x265 software [38] is one of several implementations that developed a perceptual adaptive QP method with spatial and temporal features. However, it still fails to give promising outcomes if a reference frame has characteristics different from the current coding frame. Test Model 5 (TM5 Model) of the MPEG-2 software [39] also uses a method that scales a quantization step according to the spatial activity of one CU relative to a frame-level average of the spatial activity. This method fails when the size of a large CU block needs to be estimated, thus limiting its performance [37]. Similarly, Yeo et al. [40] also introduced a block-level adaptive QP selection algorithm. It observes the spatial and temporal pixel characteristics of CU blocks. However, it needs a higher encoding time. Prangnell et al. [41] used transform coefficients based on a soft thresholding method. However, the proposed soft thresholding method may still cause fluctuations of the visible quality, resulting in severe visual distortion.

An alternative algorithm was proposed by determining a QP offset based on a QP-λ relationship. Yeo et al. [40] have also studied related topics. However, their method utilized only the spatial variance of a block, which is limited for videos with large homogeneous areas [42]. Xiang et al. [43] proposed a perceptual motion estimation method using a spatial-temporal just-noticeable-distortion (JND) model for a QP offset design. Rouis et al. [44] generated perceptual features temporally as well as CTU visual sensitivity for spatial features. However, both features considered in this algorithm are provided only for an adaptive λ in RDO. In conclusion, spatial and temporal perceptual features for an adaptive QP decision can provide a better trade-off [43], [44].

C. DNN APPROACH TO PERCEPTUAL ADAPTIVE QP SELECTION FOR HM

The use of DNNs for video coding has now become possible for the video coding community. Liu et al. [45] and Ma et al. [46] have presented case studies on deep learning-based video coding. Several researchers such as Choi and Bajic [47] studied a deep learning-based frame prediction using decoded frames to predict the textures of a block. It performs both uni- and bi-directional predictions at various distances from a target frame. Ki et al. [48] developed a JND model based on deep learning for the assessment of perceptual distortion in HEVC. Li et al. [49] proposed a DNN-based rate control for intra coded pictures in HEVC that is designed to predict the parameters of the R-λ rate control model. Other studies have successfully revealed the benefits of deep learning for video encoding. However, it is still difficult to find one specific deep learning method for a perceptual adaptive QP. In this paper, we present a perceptual adaptive QP based on a predefined VGG network for HEVC.

III. PROPOSED ALGORITHM FOR PERCEPTUAL ADAPTIVE QP SELECTION FOR HEVC ENCODER

The main objective of the proposed algorithm is to achieve significant bitrate savings without inducing noticeable visual distortions in reconstructed video frames. We first observed the current setting of the QP-λ relationship in HEVC, as shown in (3). The two main factors involved are the $QP_{factor}$ and the QP value. The frame-level QP decision in HM-16.20 is determined with the same QP offset for multiple frames in the same temporal ID layer, while $QP_{factor}$, the coding structure parameter, is always fixed at 0.57, regardless of frame or slice types and coding structures. In HEVC, the different frames form a set of hierarchical structures within a group of pictures (GOP). For example, frames at a higher temporal layer in the same GOP can be predicted from one or more frames at the lower temporal layers. Therefore, giving only the default value of QP offset and $QP_{factor}$ to generalize different frames and coding structures is not perceptually appropriate for HEVC encoders. Both spatial and temporal features could be sufficient to resolve the issues. However, most of the existing adaptive QP methods concentrate on only one of the two. In this paper, the proposed algorithm demonstrates visual feature extraction based on a particular convolutional layer of a DNN model for a frame-level adaptive QP. We consider both the spatial and temporal features to generate the adaptive QP and $QP_{factor}$ decision for the proposed algorithm.

Fig. 1 depicts the whole process of the proposed algorithm. As shown in Fig. 1, the proposed algorithm is embedded in the HEVC encoder. The proposed algorithm is processed during the slice initialization. Depending on the slice or frame types, the QP value and $QP_{factor}$ are determined adaptively.

FIGURE 1. Block diagram of the proposed perceptual adaptive QP.

Fig. 2 shows the detailed process of the proposed algorithm. For the first frame in a sequence, the proposed algorithm is designed in a straightforward manner by considering the standard deviation values of the original frame to decide upon a QP value and set $QP_{factor}$ as its default value. Then, a pretrained VGG-16 model is employed to extract visual features from the original and reconstructed frames to predict the QP and $QP_{factor}$ for consecutive frames. The designed visual features result in a perceptual loss value based on the Euclidean distance measure, $VGG_{feature}$. The QP and Lagrange multiplier values based on $VGG_{feature}$ are then adaptively estimated by considering the picture types and coding configurations in HEVC. A detailed discussion of this section is divided into several sub-categories as follows. Symbols and descriptions used in the proposed algorithm of the adaptive frame-level perceptual QP for HEVC are tabulated in Table 1.

FIGURE 2. Overall flowchart of the proposed perceptual adaptive QP.

TABLE 1. Symbols and descriptions used in the proposed perceptual adaptive QP selection.

A. GENERATION OF VISUAL FEATURES FOR THE PROPOSED PERCEPTUAL ADAPTIVE QP ALGORITHM

We propose to adaptively adjust a perceptual QP value per frame by employing a deep learning network, namely, the VGG-16 network [50]. The proposed algorithm employs a pretrained VGG-16 model to construct high-level feature descriptors using a specific convolutional layer. We select VGG-16 for this study due to some of its desirable characteristics. VGG-16 is widely recognized for its remarkable performance on image classification, where it classifies over 14 million images into 1000 categories. It has a better image classification accuracy than the AlexNet model [51]. It has a straightforward architecture that is constructed simply by stacking convolution, pooling, and fully connected layers without branches or shortcut connections to reinforce gradient flow. Such a design is versatile and adaptable for different practical purposes. Besides, the VGG-16 has an extremely deep convolutional layer design trained on an enormous and diverse image dataset, which results in convolution filters that are well suited to search universal patterns and generalize them. It is also widely applied as a feature extraction technique in many computer vision solutions [52], [53]. For the same reason, the proposed algorithm also takes advantage of the VGG-16 convolution layers only for visual feature extraction.

In this paper, a simplified VGG-16 network is employed by removing the last pooling and fully connected layers, as depicted in Fig. 3. In the figure, h and w represent the height and width of the input 64 × 64 CTU block, respectively. Fortunately, the VGG network can handle any input block size, as long as h and w are multiples of 32. Hence, the CTU block size can be used directly without prior processing. By examining the visualization of convolution filters and trial-and-error experiments, we selected 'block5conv1', which is the first convolution layer of the fifth block, to build general features for the proposed algorithm. The 'pool5' layer is initially included in the network. However, it is neither considered for the algorithm nor included in the figure. The 'pool5' layer is commonly affected by specific classification objects, which is not favorable for the detection of general features. We mainly consider the generalizability of the VGG network, and thereby, the proposed feature descriptors can search for common and universal patterns.

FIGURE 3. Proposed double-simplified VGG-16 network architecture.

For better features with HVS consideration, we introduce a perceptual loss function with a full-reference visual quality measure that uses the Euclidean distance. It is based on a comparison of different feature maps extracted from original and reconstructed blocks, as depicted in Fig. 4. The reconstructed block fed to the network is derived after the in-loop filter process. The figure shows that the same model of the VGG-16 network is utilized for extracting those high-level features. The Euclidean distance is preferred owing to its simplicity in expressing $VGG_{feature}$ as a perceptual loss value. To do this, we first convert the color format of both the original and the reconstructed CTU blocks to the RGB color format. This process is suggested as a requirement of the VGG-16 architecture. Then, the network can operate adequately to obtain visual features from both input blocks. Once a $VGG_{feature}$ is generated, we then use it to determine the QP value and $QP_{factor}$ adaptively for the Lagrange multiplier decision.

FIGURE 4. Proposed double-simplified VGG-16 network architecture.

B. PERCEPTUAL ADAPTIVE QP DETERMINATION WITH QP-λ RELATIONSHIP

From the formula in (3), the QP value per frame can be derived. However, the λ value in HM-16.20, which represents the Lagrange multiplier, is decided only after the QP decision is made, while the QP value per frame is decided empirically based on the HM configuration. Therefore, finding a proper parameter for predicting a frame-level perceptual adaptive QP is a challenging issue.

Generally, coding errors may propagate from the previous frame to subsequent frames because of the prediction coding scheme in video coding standards. In this study, the proposed algorithm determines the frame-level QP for different picture types by obtaining a perceptual loss value based on high-level features from the original and previously reconstructed pictures. With regard to the first frame in a sequence, the determination of a proper QP value is crucial as it will determine the overall coding performance. However, having only an original picture is not enough to provide a perceptual loss value before the encoding. Hence, we examine whether the standard deviation values (StD) of the original blocks can demonstrate the characteristics of a complete picture for frame-level QP decision. We activated rate control to observe the different QP values of every CTU within the intra frame using the 'BasketballPass' test sequence with QP 22, 27, 32, and 37. Subsequently, a relationship between QP and StD is presented in Fig. 5. A lower StD, which reflects a flat region, tends to have a higher QP, and vice versa. Therefore, we can expect some coding gain with less visual quality degradation in this area. However, applying the StD value directly to vary λ over the $QP_{factor}$ may lead to a large coding loss. Therefore, the QP decision in this algorithm is adjusted by first normalizing the pixel values of every CTU block in a frame before calculating StD, and by disregarding λ and $QP_{factor}$ for the QP decision. Then, the QP of the first frame can be set in a more visual-friendly way and can be expressed as:

$$
QP_0 = QP_{init} - 3 \log_2\left(StD_{intra}\right) \tag{4}
$$

$$
StD_{intra} = \frac{1}{N} \sum_{i=1}^{N} \sigma_i \tag{5}
$$

$$
\sigma_i = \sqrt{\frac{1}{M} \sum_{j=1}^{M} \left(x_j - \mu_i\right)^2} \tag{6}
$$

where $QP_0$ denotes the QP value of the first frame in a sequence, and $QP_{init}$ represents the initial QP value set by the encoder. Since the proposed algorithm operates CTU-wise, the final picture characteristic of the first frame is decided based on the $StD_{intra}$ value, which is the average StD of the total number N of the original CTU blocks in an intra frame. Thus, the symbols $\sigma_i$ and $\mu_i$ are the StD and mean values of the original i-th CTU block, respectively. M denotes the total number of pixel values $x_j$.

FIGURE 5. Correlation between StD value of original blocks and QP values.

For the rest of the frames, the quality of the reconstructed frames is generally influenced by a previously coded frame with a certain QP value. In this study, instead of analyzing the distortion of two consecutive frames, we investigate the distortion of VGG features for determining a proper QP value perceptually. Note that the proposed VGG features are extracted from the original and reconstructed frames based on the VGG-16 model. Therefore, the distortion of VGG features of two consecutive frames can be expressed as

$$
D_{VGG_{pre}} = f\left(D_{VGG_{ref}}\right) \tag{7}
$$

where $D_{VGG_{pre}}$ is the VGG feature distortion of a predicted frame, $D_{VGG_{ref}}$ denotes the VGG feature distortion of a reference frame, and $f(\cdot)$ is the relationship between $D_{VGG_{ref}}$ and $D_{VGG_{pre}}$.

Fig. 6(a) shows the VGG feature distortion relationship between two consecutive frames of the 'BasketballPass' test sequence. The sequence is encoded under the LDP configuration with the coding structure of I-P-P-P-P. Each P frame uses only its previous coded frame as a reference. We set the predicted frame with a fixed QP value of 32 and encoded the first 15 frames. It can be seen that $D_{VGG_{ref}}$ influences $D_{VGG_{pre}}$. A further experiment was also conducted with rate control enabled to support the observations. Fig. 6(b) shows a high correlation between the VGG feature and QP selection per frame. Accordingly, the QP decision for the rest of the frames can be determined by considering the picture types as in (8).

FIGURE 6. Relationship of: (a) VGG feature distortion between reference and predicted frames, and (b) VGG feature and QP selection.

The QP decision for a future intra picture can be determined by using the $VGG_{feature}$ from a previously intra coded picture. With regard to the QP decision for P- and B-frames, we control $QP_{init}$ with $\Delta pQP_{Fid_i}$ and $\Delta bQP_{Fid_i}$ depending on the hierarchical frame index i ($Fid_i$) as shown in Table 2. The values of $\Delta pQP_{Fid_i}$ and $\Delta bQP_{Fid_i}$ are derived empirically, which also corresponds to the coding structure under the LDP and RA configurations, respectively. To avoid large fluctuations in quality between neighboring frames, both $\Delta pQP_{Fid_i}$ and $\Delta bQP_{Fid_i}$ values for different temporal levels should satisfy the conditions described in (9)–(11), where $QP_{OffsetModelScale_i}$ and $QP_{OffsetModelOffset_i}$ are derived as the default settings as in HEVC encoder configurations organized depending on the frame index i. Values of both $QP_{OffsetModelScale_i}$ and $QP_{OffsetModelOffset_i}$ parameters can be found in Table 3.

$$
QP_{perceptual} =
\begin{cases}
QP_0, & \text{if I frame or slice, } POC = 0 \\
QP_{init} - 3 \log_2\left(VGG_{feature}\right), & \text{if I frame or slice, } POC \ne 0 \\
QP_{init} + \Delta pQP_{Fid_i}, & \text{if P frame or slice} \\
QP_{init} + \Delta bQP_{Fid_i}, & \text{if B frame or slice}
\end{cases} \tag{8}
$$

$$
\Delta pQP_{Offset} = \mathrm{Clip}\left(0.0, 3.0, \Delta QP_{Offset_i}\right) \tag{9}
$$

$$
\Delta bQP_{Offset} =
\begin{cases}
\mathrm{Clip}\left(0.0, 3.0, \Delta QP_{Offset_i}\right), & \text{if } Fid = 0 \\
\mathrm{Clip}\left(0.0, 3.0, \Delta QP_{Offset_i}\right), & \text{if } Fid = 1 \\
\mathrm{Clip}\left(0.0, 6.0, \Delta QP_{Offset_i}\right), & \text{if } Fid = 2 \\
\mathrm{Clip}\left(0.0, 7.0, \Delta QP_{Offset_i}\right), & \text{if } Fid = 3 \\
\mathrm{Clip}\left(0.0, 9.0, \Delta QP_{Offset_i}\right), & \text{if } Fid = 4
\end{cases} \tag{10}
$$

$$
\Delta QP_{Offset_i} = QP_{perceptual} \times QP_{OffsetModelScale_i} + QP_{OffsetModelOffset_i} + VGG_{feature} \tag{11}
$$

TABLE 2. Initial values of $\Delta pQP_{Fid_i}$ and $\Delta bQP_{Fid_i}$ for different $Fid_i$.

TABLE 3. Default values of $QP_{OffsetModelScale_i}$ and $QP_{OffsetModelOffset_i}$ for different $Fid_i$.

C. PERCEPTUAL ADAPTIVE LAGRANGE MULTIPLIER DETERMINATION WITH QP-λ RELATIONSHIP

For increased bitrate savings while maintaining the visual quality of the proposed adaptive QP decision algorithm, we also aim to determine the Lagrange multiplier by involving the proposed $VGG_{feature}$. Note that the Lagrange multiplier in HM-16.20 is assigned a static $QP_{factor}$ value. Hence, it is essential to provide an adaptive $QP_{factor}$ designed for different picture types and coding structures in HEVC.

1) QPfactor DECISION FOR I-FRAMES

First, we searched for the best $QP_{factor}$ of intra coded frames by assigning several constant values in equation (3) through experiments using HM-16.20 under All Intra configurations. 'BasketballPass', 'BQSquare', 'BlowingBubbles', and 'RaceHorses' were used with all the QP settings for the experiment. Fig. 7 depicts the BD-rate based on SSIM performance with the corresponding $QP_{factor}$ values. It shows an approximation of the optimum $QP_{factor}$ for intra frames, which lies in the range of 0.60 to 0.80 with a minimal BD-BR-SSIM gain of approximately −0.2%, while the highest coding gain is approximately −0.5%, given by a $QP_{factor}$ of 0.65. Accordingly, the $QP_{factor}$ for intra pictures can be determined as

$$
IQP_{factor} =
\begin{cases}
0.57, & POC = 0 \\
\dfrac{StD_{intra} + VGG_{feature}}{2}, & POC \ne 0
\end{cases} \tag{12}
$$

where $IQP_{factor}$ must satisfy $0.57 \le IQP_{factor} \le 0.80$, POC denotes the picture order count, and $VGG_{feature}$ is a perceptual loss value from the original and previously intra coded pictures based on the VGG-16 model.

FIGURE 7. QPfactor decision and BD-rate-SSIM of intra coded frames.

2) QPfactor DECISION FOR P-FRAMES

In the inter picture coding framework under the LDP configuration, the quality of the reconstructed frames is generally influenced by the coding structure factor (or $QP_{factor}$ as previously mentioned). As a result, the distortion of one frame with a certain QP value may affect both the visual quality and RD performance of future frames in encoding order according to the given $QP_{factor}$. Based on the previous observation illustrated in Fig. 6(a), the VGG feature of a predicted frame $D_{VGG_{pre}}$ increases linearly with the VGG feature of a reference frame $D_{VGG_{ref}}$. Note that the λ values among different frames in the same GOP should be set differently, although they are coded with the same QP value. Hence, deciding the $QP_{factor}$ for different frames in a different temporal layer is desirable, and the relationship in (7) can be approximated as

$$
PQP_{factor} = D_{VGG_{pre}} \approx c \times D_{VGG_{ref}} + D^{(I)}_{ref} \tag{13}
$$

where $PQP_{factor}$ stands for the $QP_{factor}$ of a P-frame, and c is the linear coefficient, i.e., the slope of the approximated linear distortion relationship between $D_{VGG_{pre}}$ and $D_{VGG_{ref}}$. $D^{(I)}_{ref}$ is added to the linear relationship to represent the feature extraction of the reference frame coded under all intra mode. The $D^{(I)}_{ref}$ value in the proposed algorithm is used to maintain gaps of bit distributions among inter-coded pictures in the same GOP and is determined as

$$
D^{(I)}_{ref} = \frac{StD_{intra}}{GOP_{size} - Fid_i} \tag{14}
$$

where $GOP_{size}$ and $Fid_i$ denote the GOP size for LDP, which is set to 4, and the frame index listed in the same GOP, respectively. An illustration of how $PQP_{factor}$ is provided for P-frames under the LDP coding structure can be seen in Fig. 8. Then, the combination of (13) and (14) can be expressed as

$$
PQP_{factor} = D_{VGG_{pre}} \approx c \times D_{VGG_{ref}} + \frac{StD_{intra}}{GOP_{size} - Fid_i} \tag{15}
$$

Since $D_{VGG_{ref}}$ is the same as $VGG_{feature}$ for the perceptual retention purposes in $PQP_{factor}$, (15) can be further adjusted as in (16), where the parameter c is empirically set as 0.45 in this study.

$$
PQP_{factor} = c \times VGG_{feature} + \frac{StD_{intra}}{GOP_{size} - Fid_i} \tag{16}
$$

FIGURE 8. Example of the proposed adaptive QPfactor for LDP case.

3) QPfactor DECISION FOR B-FRAMES

For the RA configuration, the $QP_{factor}$ decision uses a similar concept as in the LDP case with further adjustments. We first analyzed the hierarchical B coding structure under the RA configuration in HEVC depicted in Fig. 9. Both the coding distortion and visual quality of the higher temporal layers are affected by those of the lower temporal levels. For the first frame in a GOP coded as an I-frame, its coding distortion and visual quality will depend only on the spatial operation. However, those pictures coded as B-frames, including a frame with temporal ID = 0 that is not an I-frame, need to be treated in inter-frame fashion with their corresponding reference frames. Table 4 shows the POC difference between the current POC and its reference pictures according to their temporal ID. This algorithm is designed to enable proper feature extraction for the coding frames. However, we used only the reference frame nearest to the current coded picture in the RA

TABLE 4. Pattern of POC difference between the current POC and its reference POCs.

expressed as

$$
BQP_{factor} = c_i \times VGG_{feature} + \frac{StD_{intra}}{GOP_{size} - Tid_i} \tag{17}
$$

where $BQP_{factor}$ represents the $QP_{factor}$ for the B-frame, and $VGG_{feature}$ denotes the VGG feature extraction of the reference frames. $StD_{intra}$ is given from the I-frame depending on the intra period of each sequence configuration. $GOP_{size}$ is the GOP size of the RA case, which is set to 16, and $Tid_i$ is the temporal ID of frames in the same GOP. Parameter $c_i$ is a constant value of the i-th temporal ID that determines the $BQP_{factor}$ of each frame in different temporal IDs. We first searched for the best c per $Tid_i$ empirically with the default QP setting as in HM-16.20. Fig. 10 depicts the results of the BD-BR-SSIM with the selected c values for different temporal IDs. The 'BasketballPass' and 'RaceHorses' test sequences are used for testing all the QP settings. According to Fig. 10, the optimum c value for temporal ID 1 (T_1) is 0.20, and T_2 to T_4 have the best c values of 0.30, 0.40, and 0.42, respectively.
In this test, the ci values coding structure. increase with the temporal IDs; hence, we set the c values As we follow a similar concept in LDP configura- as 0.12 for the Interframe having temporal ID = 0. Accord- tion, thus, the formula in (17) for the RA case can be ingly, the c values for different temporal ID in (17) can be FIGURE 9. Hierarchical B coding structure under RA configuration. 37060 VOLUME 8, 2020 I. Marzuki, D. Sim: Perceptual Adaptive QP Selection Using Deep Convolutional Features for HEVC Encoder proposed algorithm is worse than that of HM-16.20. We also evaluate the bitrate reduction, 1Bitrate towards the anchor software, which can be denoted by RPRO − RHM 1Bitrate = × 100% (20) RHM where RPRO and RHM represent the output bitrate of the proposed and anchor algorithms, respectively. The proposed algorithm is also evaluated against the anchor in BD-BR with the SSIM metric (BD-BR-SSIM) [54], [55]. For bitrate reduc- tion and BD-BR-SSIM measures, a negative value indicates gains over the anchor. We used HEVC video test sequences FIGURE 10. c parameter decision for each Tid i under RA configuration. with the LDP and RA configurations for several QPs: 22, 27, 32, 37. As shown in Table 6, the proposed algorithm demon- strates a very negligible SSIM degradation of approximately expressed as: −0.00541 and −0.00656 on average against HM-16.20 with-  out a perceptual adaptive QP method, respectively. In terms    0.12, if Tid = 0 and POC 6 = 0 of bitrate reduction, the proposed algorithm increases bitrate   0.20, if Tid =1 saving, on average, by approximately −42.67% for LDP and   ci = 0.30, if Tid =2 (18) −33.93% for RA configurations over the HM-16.20. For the    0.40, if Tid =3 ‘BQTerrace’ test sequence, the proposed algorithm achieves     0.42, the highest bitrate reduction of −66% for the LDP case and if Tid =4 −48% for the RA case. Note that the sequence has large flat regions over its frames that benefit the proposed algorithm IV. 
EXPERIMENTAL RESULTS both spatially and temporally. In terms of the coding effi- The test configuration used for evaluating the proposed algo- ciency, the proposed algorithm yields better BD-BR-SSIM rithm is listed in Table 5. Coding efficiency evaluation was scores than the anchor about −20.68% and −14.27% for LDP performed under a common test condition for HEVC [32] and RA configurations, respectively. The proposed algorithm with the SSIM term [54]. In addition, subjective evaluation can also simulate better performance for test sequences with was done using the difference mean opinion scores (DMOS). higher resolutions. In the case of LDP, Class B and Class E The assessments were conducted by comparing the proposed provide an average coding gain of approximately −21% and algorithm against HM-16.20 as an anchor software and also −28%, respectively. In the case of RA, Class A also gives a against other existing works [40], [42]. coding gain of approximately −15%. According to Table 6, the proposed algorithm can achieve TABLE 5. Experimental environment. better objective performances under the LDP configuration than RA. For the sake of visual quality, the number of intra coded pictures in the LDP case indicates that the proposed algorithm has an essential role in maintaining the quality of the reconstructed frames. Better quality of the reconstructed frames can provide better prediction modes for the future inter coded frames, as well as better visual features for the proposed QP and Lagrange multiplier selections. Considering both spatial and temporal visual features for the proposed algorithm results in significant bitrate reduction while retain- ing the visual quality of the test videos. For test sequences that A. 
CODING PERFORMANCE EVALUATIONS have many homogeneous regions, slow motions, and larger We conducted several evaluations of the coding performance background areas than the moving objects in a frame, the pro- to assess the objective quality of the proposed algorithm. posed algorithm can play a prominent role in obtaining higher All the objective quality measures are tabulated in Table 6. objective measures. The visual characteristics of such test First, we checked the SSIM difference, 1SSIM between the sequences can be seen in ‘BQTerrace’, ‘Johnny’, ‘FourPeo- proposed algorithm and the anchor. It is defined by ple’, ‘Cactus’, ‘KristenAndSarra’ videos, etc., in which the most significant coding gains are obtained in perceptual 1SSIM = SSIM PRO − SSIM HM (19) terms. On the other hand, the proposed algorithm can con- where SSIM PRO and SSIM HM denote the luma SSIM qual- tribute only moderate coding improvements for ‘Kimono’ ity of the proposed algorithm and the anchor, respectively. and ‘RaceHorses’ that have more textures and fast or more For (19), a negative value means that the SSIM quality of the motions. VOLUME 8, 2020 37061 I. Marzuki, D. Sim: Perceptual Adaptive QP Selection Using Deep Convolutional Features for HEVC Encoder TABLE 6. Objective quality comparisons between the proposed algorithm and HM-16.20. TABLE 7. DMOS comparisons between the proposed algorithm and HM-16.20. TABLE 8. Average of DMOS comparisons. process. For each participant, the reconstructed frames from the proposed algorithm and HM-16.20 were randomly shown twice with all the QP values. Then, the observers were asked to provide MOS values in the continuous scale ranging from 1 to 5. Finally, we processed the MOS values to produce the DMOS scores between MOS PRO and MOS HM , which denotes the luma MOS quality of the proposed algorithm and the anchor, respectively. DMOS scores are defined by B. 
SUBJECTIVE PERFORMANCE EVALUATIONS Subjective quality assessment was performed to compare DMOS = MOSpro − MOSHM (21) the proposed algorithm and HM-16.20 for all the test sequences by following the double stimulus continuous Table 7 shows the DMOS of all the test sequences under quality scale (DSCQS) method [55]. There are 18 observers LDP and RA configurations. For convenience, we introduced among which 11 are in the relative field, and the rest are naïve the average of DMOS per each sequence for all the QP values in image processing. Before the test, we conducted simple to see visual quality judgments of the generated reconstruc- demonstrations for the observers to introduce the evaluation tion frames. Minus values indicate that the video quality of 37062 VOLUME 8, 2020 I. Marzuki, D. Sim: Perceptual Adaptive QP Selection Using Deep Convolutional Features for HEVC Encoder TABLE 9. BD-rate-SSIM comparisons of the proposed algorithm and other existing algorithms. FIGURE 11. DMOS comparisons of Xiang’s, Yeo’s, and the proposed algorithms. the proposed algorithm is subjectively worse than that of the algorithm can code nearly visually identical output over anchor ones. As presented, DMOS scales for the entire test those by HM-16.20. For several video sequences, as shown sequences are quite close to 0. It means that the proposed in Table 7, the visual quality of the proposed algorithm is even VOLUME 8, 2020 37063 I. Marzuki, D. Sim: Perceptual Adaptive QP Selection Using Deep Convolutional Features for HEVC Encoder slightly better than that of the anchor, such as in ‘PeopleOn- gains in SSIM, are yielded by the proposed algorithm, com- Street’, ‘BQTerrace’, ‘BQMall’, and ‘BQSquare’, primarily pared with the HM-16.20, for LDP and RA, respectively. when they are generated under the RA coding structure. 
This The subjective quality evaluation shows that the proposed similarity in video quality between the proposed algorithm algorithm can produce comparable visual quality against the and HM-16.20 can be seen for all the video sequence classes. anchor with significant bitrate-saving. We can see that the proposed algorithm degrades visually based on the DMOS test very slightly compared to its anchor, REFERENCES by only about −0.05 and −0.04 for LDP and RA configura- [1] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, ‘‘Overview of the tions, respectively, as shown in Table 8. high efficiency video coding (HEVC) standard,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, Dec. 2012. [2] I. Marzuki, Y.-J. Ahn, and D. Sim, ‘‘Tile-level rate control for tile- C. COMPARISONS WITH EXISTING ALGORITHMS parallelization HEVC encoders,’’ J. Real-Time Image Process., vol. 16, After we presented both objective and subjective comparisons no. 6, pp. 2107–2125, Sep. 2017, doi: 10.1007/s11554-017-0720-5. [3] C. C. Chi, M. Alvarez-Mesa, B. Juurlink, G. Clare, F. Henry, S. Pateux, between the proposed algorithm and HM-16.20, we can and T. Schierl, ‘‘Parallel scalability and efficiency of HEVC parallelization conclude that the perceptual adaptive QP at the frame-level approaches,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, demonstrates its capability to maintain visual quality with pp. 1827–1838, Dec. 2012. [4] H. Jo and D. Sim, ‘‘Bitstream decoding processor for fast entropy decoding better coding efficiency performances in the perceptual of variable length coding-based multiformat videos,’’ Opt. Eng., vol. 53, term. In this sub-section, we present the same compar- no. 6, Jun. 2014, Art. no. 063102, doi: 10.1117/1.OE.53.6.063102. isons (objective and subjective comparisons) of the proposed [5] Y.-J. Yoon, H. Kim, S.-J. Baek, and S.-J. Ko, ‘‘Largest coding unit level algorithm against other existing algorithms. 
Table 9 shows rate control algorithm for hierarchical video coding in HEVC,’’ IEIE Trans. Smart Process. Comput., vol. 1, no. 3, pp. 171–181, Dec. 2012. the SSIM-based BD-rate comparisons of Yeo et al. [40], [6] J. Kim and M. Kim, ‘‘Analysis of the JND-suppression effect in quantiza- Xiang et al. [42], and the proposed algorithms. As both tion perspective for HEVC-based perceptual video coding,’’ IEIE Trans. existing algorithms were integrated into HM-16.0, we also Smart Process. Comput., vol. 4, no. 1, pp. 22–27, Feb. 2015. [7] W. Wiratama, Y.-J. Ahn, I. Marzuki, and D. Sim, ‘‘Adaptive Gaussian low- implemented the proposed algorithm in the same software pass pre-filtering for perceptual video coding,’’ IEIE Trans. Smart Process. version to meet fair comparisons. As shown in Table 9, we can Comput., vol. 7, no. 5, pp. 366–377, Oct. 2018. see that the proposed algorithm in the downgraded version [8] M. Xu, T. Li, Z. Wang, X. Deng, R. Yang, and Z. Guan, ‘‘Reducing com- plexity of HEVC: A deep learning approach,’’ IEEE Trans. Image Process., can still outperform two existing algorithms in perceptual vol. 27, no. 10, pp. 5044–5059, Oct. 2018. coding efficiency. Overall, we can achieve a coding gain [9] B. Lee and M. Kim, ‘‘A CU-level rate and distortion estimation scheme of approximately −14.44%, while Xiang’s and Yeo’s are for RDO of hardware-friendly HEVC encoders using low-complexity integer DCTs,’’ IEEE Trans. Image Process., vol. 25, no. 8, pp. 3787–3800, −4.51% and −3.56%, respectively. Note that all the pre- Aug. 2016. sented results in Table 9 were generated under random-access [10] I. Marzuki, J. Ma, Y.-J. Ahn, and D. Sim, ‘‘A context-adaptive fast intra configuration with all the quantization parameter values. coding algorithm of high-efficiency video coding (HEVC),’’ J. Real-Time Image Process., vol. 16, no. 4, pp. 883–899, Mar. 2016, doi: 10.1007/ Furthermore, we also performed the MOS test to eval- s11554-016-0571-5. 
uate the subjective visual quality of all the algorithms. [11] Q. Hu, X. Zhang, Z. Shi, and Z. Gao, ‘‘Neyman-pearson-based early mode Fig. 11 presents the average DMOS results of Xiang’s, Yeo’s, decision for HEVC encoding,’’ IEEE Trans. Multimedia, vol. 18, no. 3, pp. 379–391, Mar. 2016. and the proposed algorithms in the RA structure. The per- [12] M. Ismail, J. Ma, and D. Sim, ‘‘Full depth RQT after PU decision for fast formance of the baseline, which refers to the HM software, encoding of HEVC,’’ in Proc. 18th IEEE Int. Symp. Consum. Electron. is set to zero for the visual similarity evaluation of the three (ISCE ), Jeju Island, South Korea, Jun. 2014, pp. 1–2. algorithms. DMOS scores that are close to the zero baseline [13] Y.-J. Ahn and D. Sim, ‘‘Square-type-first inter-CU tree search algorithm for acceleration of HEVC encoder,’’ J. Real-Time Image Process., vol. 12, indicate visual similarity to the anchor. From the experi- no. 2, pp. 419–432, Feb. 2015, doi: 10.1007/s11554-015-0487-5. mental results, most of the test sequences tested under the [14] J. Gu, M. Tang, J. Wen, and Y. Han, ‘‘Adaptive intra candidate selection proposed algorithm can stand more DMOS points closer to with early depth decision for fast intra prediction in HEVC,’’ IEEE Signal Process. Lett., vol. 25, no. 2, pp. 159–163, Feb. 2018. zero, followed by the Xiang’s and Yeo’s algorithms. This [15] K. Yang, Y. Gong, M. Ma, and H. R. Wu, ‘‘An efficient rate-distortion means that the proposed algorithm can give better quality optimization method for low-delay configuration in H.265/HEVC based subjectively than the two existing algorithms. on temporal layer rate and distortion dependence,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 4, pp. 1230–1236, Apr. 2019. [16] M. Ismail, H. Jo, and D. Sim, ‘‘Fast intra mode decision for HEVC V. CONCLUSION intra coding,’’ in Proc. 18th IEEE Int. Symp. Consum. Electron. 
(ISCE), In this work, we propose a perceptual adaptive QP algo- Jeju Island, South Korea, Jun. 2014, pp. 1–2. [17] W. Lee, J. Lee, D. Sim, and S.-J. Oh, ‘‘A deep learning based inter-layer rithm at the frame-level to obtain better subjective coding reference picture generation method for improving SHVC coding perfor- performance for HEVC. The proposed algorithm utilizes a mance,’’ J. Broadcast Eng., vol. 24, no. 3, pp. 401–410, May 2019. predefined model of the VGG-16 network for feature extrac- [18] W. Lim and D. Sim, ‘‘Determination of optimum quantization parameters in residual quad-tree of HEVC based on perceptual quality,’’ J. Imag. Sci. tions from the original and previously reconstructed pictures. Technol., vol. 62, no. 2, pp. 205021–205028, Mar. 2018. We designed the proposed algorithm by developing a percep- [19] V. Barocini, J.-R. Ohm, and G. J. Sullivan, Report of Results From the tual loss function based on the extracted features. The pro- Call for Proposals on Video Compression With Capability Beyond HEVC, posed algorithm adaptively determines perceptual QP values document JVET-J1003, Joint Video Experts Team, 2018. [20] S. Liu, B. Choi, K. Kawamura, Y. Li, L. Wang, P. Wu, and H. Yang, JVET for different picture types of the hierarchical coding structure AHG Report: Neural Networks in Video Coding, document JVET-L0009, in HEVC. Results of approximately −21% and −14% coding Joint Video Experts Team, 2018. 37064 VOLUME 8, 2020 I. Marzuki, D. Sim: Perceptual Adaptive QP Selection Using Deep Convolutional Features for HEVC Encoder [21] L. Zhou, X. Song, J. Yao, L. Wang, and F. Chen, Convolutional Neural Net- [45] D. Liu, Y. Li, J. Lin, H. Li, and F. Wu, ‘‘Deep learning-based video coding: work Filter for Intra Frame, document JVET-I0022, Joint Video Experts A review and a case study,’’ 2019, arXiv:1904.12462. [Online]. Available: Team, 2018. http://arxiv.org/abs/1904.12462 [22] J. Yao, X. Song, S. Fang, and L. Wang, AHG9: Convolutional Neural Net- [46] S. 
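As an illustration only, the frame-level QPfactor rules in (12) and (16)–(18) can be sketched in Python as follows. This is not the authors' HM-16.20 implementation: all function and variable names are hypothetical, the range 0.57 ≤ IQPfactor ≤ 0.80 is assumed to be enforced by clamping, and the frame index and temporal ID are assumed to be strictly smaller than the GOP size so that the denominators in (16) and (17) stay positive. The VGG feature and StDintra values are taken as given inputs.

```python
# Constants quoted in the text; every name below is a hypothetical helper.
IQP_FIRST = 0.57                   # IQPfactor for POC = 0 in (12)
IQP_MIN, IQP_MAX = 0.57, 0.80      # range stated for IQPfactor
C_P_FRAME = 0.45                   # c in (16), set empirically in the paper
GOP_SIZE_LDP = 4                   # GOP size used for LDP
GOP_SIZE_RA = 16                   # GOP size used for RA
C_BY_TID = {0: 0.12, 1: 0.20, 2: 0.30, 3: 0.40, 4: 0.42}  # c_i in (18)


def intra_qp_factor(poc: int, std_intra: float, vgg_feature: float) -> float:
    """Eq. (12): fixed value for POC 0, otherwise the mean of the two terms."""
    if poc == 0:
        return IQP_FIRST
    value = (std_intra + vgg_feature) / 2.0
    # Assumption: the stated range is enforced by clamping.
    return min(max(value, IQP_MIN), IQP_MAX)


def _intra_offset(std_intra: float, gop_size: int, index: int) -> float:
    """Shared second term of (16) and (17): StD_intra / (GOP_size - index)."""
    if not 0 <= index < gop_size:
        # Assumption: indices equal to or above the GOP size are rejected.
        raise ValueError("index must satisfy 0 <= index < gop_size")
    return std_intra / (gop_size - index)


def p_frame_qp_factor(vgg_feature: float, std_intra: float, fid: int,
                      c: float = C_P_FRAME,
                      gop_size: int = GOP_SIZE_LDP) -> float:
    """Eq. (16): QPfactor of a P-frame under the LDP configuration."""
    return c * vgg_feature + _intra_offset(std_intra, gop_size, fid)


def b_frame_qp_factor(vgg_feature: float, std_intra: float, tid: int,
                      gop_size: int = GOP_SIZE_RA) -> float:
    """Eq. (17) with c_i from (18): QPfactor of a B-frame under RA."""
    if tid not in C_BY_TID:
        raise ValueError("no c_i value is listed for this temporal ID")
    return C_BY_TID[tid] * vgg_feature + _intra_offset(std_intra, gop_size, tid)
```

For instance, `p_frame_qp_factor(0.40, 0.60, fid=1)` evaluates 0.45 × 0.40 + 0.60 / (4 − 1); the input values here are made up for illustration and do not come from the paper.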
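The evaluation measures in (19)–(21) are simple differences and ratios, and a minimal Python sketch may make the sign convention concrete: negative ΔBitrate means bitrate savings, while negative ΔSSIM or DMOS means the proposed encoder scored lower than the anchor. The function names and the per-QP dictionary layout are assumptions made for this sketch, and the numbers in the usage note are made up rather than taken from Tables 6–8.

```python
from statistics import mean


def delta_ssim(ssim_pro: float, ssim_hm: float) -> float:
    """Eq. (19): SSIM of the proposed encoder minus SSIM of the anchor."""
    return ssim_pro - ssim_hm


def delta_bitrate_percent(rate_pro: float, rate_hm: float) -> float:
    """Eq. (20): relative bitrate change in percent (negative = savings)."""
    if rate_hm <= 0:
        raise ValueError("anchor bitrate must be positive")
    return (rate_pro - rate_hm) * 100.0 / rate_hm


def dmos(mos_pro: float, mos_hm: float) -> float:
    """Eq. (21): MOS of the proposed encoder minus MOS of the anchor."""
    return mos_pro - mos_hm


def average_dmos(mos_pro_by_qp: dict, mos_hm_by_qp: dict) -> float:
    """Average DMOS of one sequence over all tested QP values.

    Assumption: both dictionaries map the same QP values to MOS scores.
    """
    if set(mos_pro_by_qp) != set(mos_hm_by_qp):
        raise ValueError("both encoders must be rated at the same QP values")
    return mean(dmos(mos_pro_by_qp[qp], mos_hm_by_qp[qp])
                for qp in sorted(mos_pro_by_qp))
```

With hypothetical inputs, `delta_bitrate_percent(600.0, 1000.0)` gives −40.0, i.e., a 40% bitrate saving over the anchor, and `average_dmos({22: 4.0, 27: 3.5}, {22: 4.0, 27: 4.0})` gives −0.25.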
Ma, X. Zhang, C. Jia, Z. Zhao, S. Wang, and S. Wanga, ‘‘Image and work Filter for Inter Frame, document JVET-J0043, Joint Video Experts video compression with neural networks: A review,’’ IEEE Trans. Circuits Team, 2018. Syst. Video Technol., to be published, doi: 10.1109/TCSVT.2019.2910119. [23] T. Hashimoto and E. Sasaki T. Ikai, AHG9: Separable Convolutional Neu- [47] H. Choi and I. V. Bajic, ‘‘Deep frame prediction for video coding,’’ IEEE ral Network Filter With Squeeze-and-Excitation Block, document JVET- Trans. Circuits Syst. Video Technol., to be published, doi: 10.1109/TCSVT. K0158, Joint Video Experts Team, 2018. 2019.2924657. [24] Y.-L. Hsiao, C.-Y. Chen, T.-D. Chuang, C.-W. Hsu, Y.-W Huang, and [48] S. Ki, S.-H. Bae, M. Kim, and H. Ko, ‘‘Learning-based just-noticeable- S.-M Lei, AHG9: Convolution Neural Network Loop Filter, document quantization-distortion modeling for perceptual video coding,’’ IEEE JVET-K0222, Joint Video Experts Team, 2018. Trans. Image Process., vol. 27, no. 7, pp. 3178–3193, Jul. 2018. [25] Y. Wang, Z. Chen, and Y. Li, AHG9: Dense Residual Convolutional [49] Y. Li, B. Li, D. Liu, and Z. Chen, ‘‘A convolutional neural network-based Neural Network Based in-Loop Filter, document JVET-K0391, Joint Video approach to rate control in HEVC intra coding,’’ in Proc. IEEE Vis. Experts Team, 2018. Commun. Image Process. (VCIP), St. Petersburg, FL, USA, Dec. 2017, [26] I. Marzuki and D. Sim, ‘‘Overview of potential technologies for future pp. 1–4. video coding standard (FVC) in JEM software: Status and review,’’ IEIE [50] K. Simonyan and A. Zisserman, ‘‘Very deep convolutional networks for Trans. Smart Process. Comput., vol. 7, no. 1, pp. 22–35, Feb. 2018. large-scale image recognition,’’ 2014, arXiv:1409.1556. [Online]. Avail- [27] HM. HEVC Test Model. [Online]. Available: http://hevc.hhi.fraunhofer. able: http://arxiv.org/abs/1409.1556 de/svn/svnHEVCSoftware/ [51] A. Krizhevsky, I. Sutskever, and G. E. 
Hinton, ‘‘ImageNet classification [28] G. J. Sullivan and T. Wiegand, ‘‘Rate-distortion optimization for video with deep convolutional neural networks,’’ Commun. ACM, vol. 60, no. 6, compression,’’ IEEE Signal Process. Mag., vol. 15, no. 6, pp. 74–90, pp. 84–90, May 2017. Nov. 1998. [52] S. Ren, K. He, R. Girshick, and J. Sun, ‘‘Faster R-CNN: Towards real- [29] A. Ortego and K. Ramchandran, ‘‘Rate-distortion methods for image and time object detection with region proposal networks,’’ IEEE Trans. Pattern video compression,’’ IEEE Signal Process. Mag., vol. 15, no. 6, pp. 23–50, Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017. Nov. 1998. [53] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, [30] B. Li, D. Zhang, H. Li, and J. Xu, QP Determination By Lambda Value, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, ‘‘Photo-realistic single document JCTVC-I0426, Joint Collaborative Team on Video Coding, image super-resolution using a generative adversarial network,’’ in Proc. 2012. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, [31] B. Li, J. Xu, D. Zhang, and H. Li, ‘‘QP refinement according to Lagrange Jul. 2017, pp. 105–114. multiplier for high efficiency video coding,’’ in Proc. IEEE Int. Symp. [54] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, ‘‘Image quality Circuits Syst. (ISCAS), Beijing, China, May 2013, pp. 447–480. assessment: From error visibility to structural similarity,’’ IEEE Trans. [32] F. Bossen, Common HM Test Conditions and Software Reference Con- Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004. figurations, document JCTVC-L1100, Joint Collaborative Team on Video [55] S. Pateux and J. Jung, An Excel Add-in for Computing Bjontegaard Metric Coding, 2013. and Its Evolution, document VCEG-AE07, Video Coding Experts Group, [33] M. Wang, K. N. Ngan, H. Li, and H. Zeng, ‘‘Improved block level adaptive 2007. quantization for high efficiency video coding,’’ in Proc. IEEE Int. Symp. 
Circuits Syst. (ISCAS), Lisbon, Portugal, May 2015, pp. 509–512. [34] T. Zhao, Z. Wang, and C. W. Chen, ‘‘Adaptive quantization parameter cascading in HEVC hierarchical coding,’’ IEEE Trans. Image Process., ISMAIL MARZUKI received the B.S. degree in vol. 25, no. 7, pp. 2997–3009, Jul. 2016. informatics from the UIN Sultan Syarif Kasim [35] S. Li, C. Zhu, Y. Gao, Y. Zhou, F. Dufaux, and M.-T. Sun, ‘‘Lagrangian Riau, Indonesia, in 2011, and the M.S. degree in multiplier adaptation for rate-distortion optimization with inter-frame computer engineering from Kwangwoon Univer- dependency,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 26, no. 1, sity, Seoul, South Korea, in 2015, where he is pp. 117–129, Jan. 2016. currently pursuing the Ph.D. degree. He joined [36] J. He, E.-H. Yang, F. Yang, and K. Yang, ‘‘Adaptive quantization parameter the Image Processing Systems Laboratory (IPSL), selection for H.265/HEVC by employing inter-frame dependency,’’ IEEE Trans. Circuits Syst. for Video Technol., vol. 28, no. 12, pp. 3424–3436, in 2013. His research interests are related to Dec. 2018. high-efficiency video compression (HEVC/x265) [37] T.-D. Chuang, C.-Y. Chen, Y.-L. Chang, Y.-W. Huang, and S. Lei, AhG techniques, fast coding, rate control, and versatile Quantization: Sub-LCU Delta QP, document JCTVC-E051, Joint Collab- video coding (VVC), and deep learning. orative Team on Video Coding, 2011. [38] X265/HEVC Reference Software. [Online]. Available: http://hg.videolan. org/x265 [39] MPEG-2 Test Model 5, Rate Control and Quantization Control Chapter DONGGYU SIM received B.S. and M.S. degrees 10. [Online]. Available: http://www.mpeg.org/MPEG/MSSG/tm5/Ch10/ in electronic engineering and the Ph.D. degree Ch10.html from Sogang University, South Korea, in 1993, [40] C. Yeo, H. L. Tan, and Y. H. Tan, ‘‘SSIM-based adaptive quantization 1995, and 1999, respectively. He was with the in HEVC,’’ in Proc. IEEE Int. Conf. 
Acoust., Speech Signal Process., Hyundai Electronics Company, Ltd., from 1999 to Vancouver, BC, Canada, May 2013, pp. 1690–1694. 2000, being involved in MPEG-7 standardization. [41] L. Prangnell, V. Sanchez, and R. Vanam, ‘‘Adaptive quantization by soft He was a Senior Research Engineer with Varo thresholding in HEVC,’’ in Proc. Picture Coding Symp. (PCS), Cairns, Vision Company, Ltd, working on MPEG-4 wire- QLD, Australia, May 2015, pp. 35–39. less applications, from 2000 to 2002. He worked [42] G. Xiang, H. Jia, M. Yang, J. Liu, C. Zhu, Y. Li, and X. Xie, ‘‘An improved for the Image Computing Systems Laboratory adaptive quantization method based on perceptual CU early splitting for (ICSL), University of Washington, as a Senior Research Engineer, from HEVC,’’ in Proc. IEEE Int. Conf. Consum. Electron. (ICCE), Las Vegas, NV, USA, Jan. 2017, pp. 362–365. 2002 to 2005. He researched on ultrasound image analysis and parametric [43] G. Xiang, H. Jia, M. Yang, X. Zhang, X. Huang, J. Liu, and X. Xie, video coding. Since 2005, he has been with the Department of Computer ‘‘A perceptually temporal adaptive quantization algorithm for HEVC,’’ Engineering, Kwangwoon University, Seoul, South Korea. In 2011, he joined J. Vis. Commun. Image Represent., vol. 50, pp. 280–289, Jan. 2018. the Simon Frasier University as a Visiting Scholar. He is one of main inven- [44] K. Rouis, M.-C. Larabi, and J. B. Tahar, ‘‘Perceptually adaptive tors in many essential patents licensed to MPEG-LA for HEVC standard. Lagrangian multiplier for HEVC guided rate-distortion optimization,’’ His current research interests are video coding, video processing, computer IEEE Access, vol. 6, pp. 33589–33603, Jun. 2018, doi: 10.1109/ACCESS. vision, and video communication. 2018.2843384. VOLUME 8, 2020 37065

References (55)

G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, ''Overview of the high efficiency video coding (HEVC) standard,'' IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649-1668, Dec. 2012.
I. Marzuki, Y.-J. Ahn, and D. Sim, ''Tile-level rate control for tile- parallelization HEVC encoders,'' J. Real-Time Image Process., vol. 16, no. 6, pp. 2107-2125, Sep. 2017, doi: 10.1007/s11554-017-0720-5.
C. C. Chi, M. Alvarez-Mesa, B. Juurlink, G. Clare, F. Henry, S. Pateux, and T. Schierl, ''Parallel scalability and efficiency of HEVC parallelization approaches,'' IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1827-1838, Dec. 2012.
H. Jo and D. Sim, ''Bitstream decoding processor for fast entropy decoding of variable length coding-based multiformat videos,'' Opt. Eng., vol. 53, no. 6, Jun. 2014, Art. no. 063102, doi: 10.1117/1.OE.53.6.063102.
Y.-J. Yoon, H. Kim, S.-J. Baek, and S.-J. Ko, ''Largest coding unit level rate control algorithm for hierarchical video coding in HEVC,'' IEIE Trans. Smart Process. Comput., vol. 1, no. 3, pp. 171-181, Dec. 2012.
J. Kim and M. Kim, ''Analysis of the JND-suppression effect in quantiza- tion perspective for HEVC-based perceptual video coding,'' IEIE Trans. Smart Process. Comput., vol. 4, no. 1, pp. 22-27, Feb. 2015.
W. Wiratama, Y.-J. Ahn, I. Marzuki, and D. Sim, ''Adaptive Gaussian low- pass pre-filtering for perceptual video coding,'' IEIE Trans. Smart Process. Comput., vol. 7, no. 5, pp. 366-377, Oct. 2018.
M. Xu, T. Li, Z. Wang, X. Deng, R. Yang, and Z. Guan, ''Reducing com- plexity of HEVC: A deep learning approach,'' IEEE Trans. Image Process., vol. 27, no. 10, pp. 5044-5059, Oct. 2018.
B. Lee and M. Kim, ''A CU-level rate and distortion estimation scheme for RDO of hardware-friendly HEVC encoders using low-complexity integer DCTs,'' IEEE Trans. Image Process., vol. 25, no. 8, pp. 3787-3800, Aug. 2016.
I. Marzuki, J. Ma, Y.-J. Ahn, and D. Sim, ''A context-adaptive fast intra coding algorithm of high-efficiency video coding (HEVC),'' J. Real-Time Image Process., vol. 16, no. 4, pp. 883-899, Mar. 2016, doi: 10.1007/ s11554-016-0571-5.
Q. Hu, X. Zhang, Z. Shi, and Z. Gao, ''Neyman-pearson-based early mode decision for HEVC encoding,'' IEEE Trans. Multimedia, vol. 18, no. 3, pp. 379-391, Mar. 2016.
M. Ismail, J. Ma, and D. Sim, ''Full depth RQT after PU decision for fast encoding of HEVC,'' in Proc. 18th IEEE Int. Symp. Consum. Electron. (ISCE ), Jeju Island, South Korea, Jun. 2014, pp. 1-2.
Y.-J. Ahn and D. Sim, ''Square-type-first inter-CU tree search algorithm for acceleration of HEVC encoder,'' J. Real-Time Image Process., vol. 12, no. 2, pp. 419-432, Feb. 2015, doi: 10.1007/s11554-015-0487-5.
J. Gu, M. Tang, J. Wen, and Y. Han, ''Adaptive intra candidate selection with early depth decision for fast intra prediction in HEVC,'' IEEE Signal Process. Lett., vol. 25, no. 2, pp. 159-163, Feb. 2018.
K. Yang, Y. Gong, M. Ma, and H. R. Wu, ''An efficient rate-distortion optimization method for low-delay configuration in H.265/HEVC based on temporal layer rate and distortion dependence,'' IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 4, pp. 1230-1236, Apr. 2019.
M. Ismail, H. Jo, and D. Sim, ''Fast intra mode decision for HEVC intra coding,'' in Proc. 18th IEEE Int. Symp. Consum. Electron. (ISCE), Jeju Island, South Korea, Jun. 2014, pp. 1-2.
W. Lee, J. Lee, D. Sim, and S.-J. Oh, ''A deep learning based inter-layer reference picture generation method for improving SHVC coding perfor- mance,'' J. Broadcast Eng., vol. 24, no. 3, pp. 401-410, May 2019.
W. Lim and D. Sim, ''Determination of optimum quantization parameters in residual quad-tree of HEVC based on perceptual quality,'' J. Imag. Sci. Technol., vol. 62, no. 2, pp. 205021-205028, Mar. 2018.
V. Barocini, J.-R. Ohm, and G. J. Sullivan, Report of Results From the Call for Proposals on Video Compression With Capability Beyond HEVC, document JVET-J1003, Joint Video Experts Team, 2018.
S. Liu, B. Choi, K. Kawamura, Y. Li, L. Wang, P. Wu, and H. Yang, JVET AHG Report: Neural Networks in Video Coding, document JVET-L0009, Joint Video Experts Team, 2018.
L. Zhou, X. Song, J. Yao, L. Wang, and F. Chen, Convolutional Neural Net- work Filter for Intra Frame, document JVET-I0022, Joint Video Experts Team, 2018.
J. Yao, X. Song, S. Fang, and L. Wang, AHG9: Convolutional Neural Net- work Filter for Inter Frame, document JVET-J0043, Joint Video Experts Team, 2018.
T. Hashimoto and E. Sasaki T. Ikai, AHG9: Separable Convolutional Neu- ral Network Filter With Squeeze-and-Excitation Block, document JVET- K0158, Joint Video Experts Team, 2018.
Y.-L. Hsiao, C.-Y. Chen, T.-D. Chuang, C.-W. Hsu, Y.-W Huang, and S.-M Lei, AHG9: Convolution Neural Network Loop Filter, document JVET-K0222, Joint Video Experts Team, 2018.
Y. Wang, Z. Chen, and Y. Li, AHG9: Dense Residual Convolutional Neural Network Based in-Loop Filter, document JVET-K0391, Joint Video Experts Team, 2018.
I. Marzuki and D. Sim, ''Overview of potential technologies for future video coding standard (FVC) in JEM software: Status and review,'' IEIE Trans. Smart Process. Comput., vol. 7, no. 1, pp. 22-35, Feb. 2018.
HM. HEVC Test Model. [Online]. Available: http://hevc.hhi.fraunhofer. de/svn/svnHEVCSoftware/
G. J. Sullivan and T. Wiegand, ''Rate-distortion optimization for video compression,'' IEEE Signal Process. Mag., vol. 15, no. 6, pp. 74-90, Nov. 1998.
A. Ortego and K. Ramchandran, ''Rate-distortion methods for image and video compression,'' IEEE Signal Process. Mag., vol. 15, no. 6, pp. 23-50, Nov. 1998.
B. Li, D. Zhang, H. Li, and J. Xu, QP Determination By Lambda Value, document JCTVC-I0426, Joint Collaborative Team on Video Coding, 2012.
B. Li, J. Xu, D. Zhang, and H. Li, ''QP refinement according to Lagrange multiplier for high efficiency video coding,'' in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Beijing, China, May 2013, pp. 447-480.
F. Bossen, Common HM Test Conditions and Software Reference Con- figurations, document JCTVC-L1100, Joint Collaborative Team on Video Coding, 2013.
M. Wang, K. N. Ngan, H. Li, and H. Zeng, ''Improved block level adaptive quantization for high efficiency video coding,'' in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Lisbon, Portugal, May 2015, pp. 509-512.
T. Zhao, Z. Wang, and C. W. Chen, ''Adaptive quantization parameter cascading in HEVC hierarchical coding,'' IEEE Trans. Image Process., vol. 25, no. 7, pp. 2997-3009, Jul. 2016.
S. Li, C. Zhu, Y. Gao, Y. Zhou, F. Dufaux, and M.-T. Sun, ''Lagrangian multiplier adaptation for rate-distortion optimization with inter-frame dependency,'' IEEE Trans. Circuits Syst. Video Technol., vol. 26, no. 1, pp. 117-129, Jan. 2016.
J. He, E.-H. Yang, F. Yang, and K. Yang, ''Adaptive quantization parameter selection for H.265/HEVC by employing inter-frame dependency,'' IEEE Trans. Circuits Syst. for Video Technol., vol. 28, no. 12, pp. 3424-3436, Dec. 2018.
T.-D. Chuang, C.-Y. Chen, Y.-L. Chang, Y.-W. Huang, and S. Lei, AhG Quantization: Sub-LCU Delta QP, document JCTVC-E051, Joint Collab- orative Team on Video Coding, 2011.
X265/HEVC Reference Software. [Online]. Available: http://hg.videolan.org/x265
MPEG-2 Test Model 5, Rate Control and Quantization Control Chapter 10. [Online]. Available: http://www.mpeg.org/MPEG/MSSG/tm5/Ch10/Ch10.html
C. Yeo, H. L. Tan, and Y. H. Tan, ''SSIM-based adaptive quantization in HEVC,'' in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Vancouver, BC, Canada, May 2013, pp. 1690-1694.
L. Prangnell, V. Sanchez, and R. Vanam, ''Adaptive quantization by soft thresholding in HEVC,'' in Proc. Picture Coding Symp. (PCS), Cairns, QLD, Australia, May 2015, pp. 35-39.
G. Xiang, H. Jia, M. Yang, J. Liu, C. Zhu, Y. Li, and X. Xie, ''An improved adaptive quantization method based on perceptual CU early splitting for HEVC,'' in Proc. IEEE Int. Conf. Consum. Electron. (ICCE), Las Vegas, NV, USA, Jan. 2017, pp. 362-365.
G. Xiang, H. Jia, M. Yang, X. Zhang, X. Huang, J. Liu, and X. Xie, ''A perceptually temporal adaptive quantization algorithm for HEVC,'' J. Vis. Commun. Image Represent., vol. 50, pp. 280-289, Jan. 2018.
K. Rouis, M.-C. Larabi, and J. B. Tahar, ''Perceptually adaptive Lagrangian multiplier for HEVC guided rate-distortion optimization,'' IEEE Access, vol. 6, pp. 33589-33603, Jun. 2018, doi: 10.1109/ACCESS.2018.2843384.
D. Liu, Y. Li, J. Lin, H. Li, and F. Wu, ''Deep learning-based video coding: A review and a case study,'' 2019, arXiv:1904.12462. [Online]. Available: http://arxiv.org/abs/1904.12462
S. Ma, X. Zhang, C. Jia, Z. Zhao, S. Wang, and S. Wang, ''Image and video compression with neural networks: A review,'' IEEE Trans. Circuits Syst. Video Technol., to be published, doi: 10.1109/TCSVT.2019.2910119.
H. Choi and I. V. Bajic, ''Deep frame prediction for video coding,'' IEEE Trans. Circuits Syst. Video Technol., to be published, doi: 10.1109/TCSVT.2019.2924657.
S. Ki, S.-H. Bae, M. Kim, and H. Ko, ''Learning-based just-noticeable-quantization-distortion modeling for perceptual video coding,'' IEEE Trans. Image Process., vol. 27, no. 7, pp. 3178-3193, Jul. 2018.
Y. Li, B. Li, D. Liu, and Z. Chen, ''A convolutional neural network-based approach to rate control in HEVC intra coding,'' in Proc. IEEE Vis. Commun. Image Process. (VCIP), St. Petersburg, FL, USA, Dec. 2017, pp. 1-4.
K. Simonyan and A. Zisserman, ''Very deep convolutional networks for large-scale image recognition,'' 2014, arXiv:1409.1556. [Online]. Available: http://arxiv.org/abs/1409.1556
A. Krizhevsky, I. Sutskever, and G. E. Hinton, ''ImageNet classification with deep convolutional neural networks,'' Commun. ACM, vol. 60, no. 6, pp. 84-90, May 2017.
S. Ren, K. He, R. Girshick, and J. Sun, ''Faster R-CNN: Towards real-time object detection with region proposal networks,'' IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137-1149, Jun. 2017.
C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, ''Photo-realistic single image super-resolution using a generative adversarial network,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, Jul. 2017, pp. 105-114.
Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, ''Image quality assessment: From error visibility to structural similarity,'' IEEE Trans. Image Process., vol. 13, no. 4, pp. 600-612, Apr. 2004.
S. Pateux and J. Jung, An Excel Add-in for Computing Bjontegaard Metric and Its Evolution, document VCEG-AE07, Video Coding Experts Group, 2007.

ISMAIL MARZUKI received the B.S. degree in informatics from the UIN Sultan Syarif Kasim Riau, Indonesia, in 2011, and the M.S. degree in computer engineering from Kwangwoon University, Seoul, South Korea, in 2015, where he is currently pursuing the Ph.D. degree. He joined the Image Processing Systems Laboratory (IPSL), in 2013. His research interests include high-efficiency video compression (HEVC/x265) techniques, fast coding, rate control, versatile video coding (VVC), and deep learning.

DONGGYU SIM received the B.S. and M.S. degrees in electronic engineering and the Ph.D. degree from Sogang University, South Korea, in 1993, 1995, and 1999, respectively. He was with the Hyundai Electronics Company, Ltd., from 1999 to 2000, being involved in MPEG-7 standardization. He was a Senior Research Engineer with Varo Vision Company, Ltd., working on MPEG-4 wireless applications, from 2000 to 2002. He worked for the Image Computing Systems Laboratory (ICSL), University of Washington, as a Senior Research Engineer, from 2002 to 2005. He conducted research on ultrasound image analysis and parametric video coding. Since 2005, he has been with the Department of Computer Engineering, Kwangwoon University, Seoul, South Korea. In 2011, he joined Simon Fraser University as a Visiting Scholar. He is one of the main inventors of many essential patents licensed to MPEG-LA for the HEVC standard. His current research interests are video coding, video processing, computer vision, and video communication.
