Received January 15, 2020, accepted February 13, 2020, date of publication February 24, 2020, date of current version March 2, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2976142
Perceptual Adaptive Quantization Parameter
Selection Using Deep Convolutional Features
for HEVC Encoder
ISMAIL MARZUKI AND DONGGYU SIM
Department of Computer Engineering, Kwangwoon University, Seoul 139701, South Korea
Corresponding author: Donggyu Sim (
[email protected])
This work was supported in part by the Ministry of Science and ICT (MSIT), South Korea, under the Information Technology Research
Center (ITRC) supervised by the Institute for Information & Communications Technology Planning & Evaluation (IITP), under Grant
IITP-2019-2016-0-00288, and in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF)
through the Ministry of Science, ICT & Future Planning under Grant NRF-2018R1A2B2008238.
ABSTRACT In this paper, we propose a perceptual adaptive quantization method based on a deep neural network
for high efficiency video coding (HEVC) that reduces bitrate while maintaining subjective visual quality.
The proposed algorithm adaptively determines frame-level QP values for different picture types of the
hierarchical coding structure in HEVC by taking into account the high-level features extracted from the
original and previously reconstructed pictures. A predefined model based on the visual geometry group
(VGG-16) network is exploited to extract the high-level features for subjective visual characteristics.
Furthermore, the Lagrange multiplier for each frame is adaptively determined from the proposed
features, which set the multiplier parameter used for rate-distortion
optimization during the encoding process. Experimental results reveal that the proposed perceptual adaptive
QP selection achieves bitrate savings of up to 65.73% and 47.68% and improves the SSIM-based BD-rate
by approximately 20.68% and 14.27% under the low-delay-P and random-access coding structures, respectively,
with minimal visual quality degradation compared to HM-16.20 without adaptive QP selection.
INDEX TERMS Adaptive quantization parameter, deep neural network, high efficiency video coding
(HEVC), perceptual quantization parameter, VGG-16 network, video coding.
The associate editor coordinating the review of this manuscript and approving it for publication was Shiqi Wang.

I. INTRODUCTION
The high-efficiency video coding (HEVC) standard has been widely accepted as achieving better compression performance than H.264/Advanced Video Coding (AVC) while maintaining similar visual quality [1]. It has encompassed various video media services and applies not only to full high definition (FHD) but also to 4K/8K ultra-HD (UHD) [2]–[4]. Since the standard was released, many studies have been conducted on visual quality improvement [5]–[7], computational complexity reduction [8]–[16], bitrate reduction [17], [18], and its prospects as a future video coding standard [19]–[26]. Among many coding tools, rate-distortion optimization (RDO) in the HEVC software model (HM) [26]–[28] is used to improve its coding efficiency [30], [31]. It is based on optimization using the global Lagrange multiplier and determines the quantization parameter (QP) value using a QP-λ model. The Lagrange multiplier λ can be expressed as a function of the quantization step size, which is closely related to the QP value. It is used to improve the coding efficiency of each basic unit by selecting the best coding mode under a given QP value, where the basic unit can be a frame, slice, or coding unit (CU). The common test condition (CTC) designed by the Joint Video Experts Team (JVET) employs static quantization parameters for fair comparison in standardization [32]. However, adaptive QP selection is known to be effective in improving subjective visual quality for practical applications. The adaptive QP should be designed to be harmonized with the RDO process. It can adjust the QP value for a distinctive frame or slice according to different spatial, temporal, or visual aspects. Some studies have proposed approaches to improve the compression rates [33]–[37] or visual quality [38]–[44] with various adaptive QP techniques. Typically, these studies
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
37052 VOLUME 8, 2020
I. Marzuki, D. Sim: Perceptual Adaptive QP Selection Using Deep Convolutional Features for HEVC Encoder
prioritize the determination of optimum QPs for the RDO process to produce better encoding parameters by analyzing the QP-λ relationship or by observing the effectiveness of spatial-temporal dependencies among the basic units. Generally, these studies take into consideration the essential role of λ in the RDO process. Thus, it will be interesting to consider a deep neural network (DNN) for more varied QPs in HEVC. Studies have demonstrated the benefits of DNNs for video coding [45]–[49]. However, there is no existing effective DNN-based algorithm for perceptual adaptive QP purposes.
This study presents a DNN-based QP selection method by the adaptive determination of frame-level perceptual QP for HEVC to achieve bitrate reduction without inducing visual quality degradation. The proposed algorithm is embedded in HM-16.20 and generates QP values adaptively for different picture types and coding structures in HEVC. The proposed algorithm first determines a QP for the first frame in a sequence by averaging the standard deviation value of the original blocks (StD). Then, the proposed algorithm obtains high-level features from the original and reconstructed frames using a pretrained visual geometry group (VGG-16) network model [50]. Based on the extracted high-level features, a more visual-friendly QP is then distributed to the subsequent frames in the encoding order. The algorithm also determines the Lagrange multiplier adaptively for each frame based on the proposed model, which can be used for RDO in the encoding process. As a result, the proposed algorithm demonstrates significant coding gain with minimal visual degradation against HM-16.20 and other existing adaptive QP algorithms.
The rest of this paper is organized as follows. In Section II, we briefly present an overview of the QP decision in HM and related works. In Section III, we discuss the proposed perceptual adaptive QP for HM. In Section IV, we review several performance evaluations of the proposed algorithm, and finally, we draw the conclusions and suggest further research directions in Section V.

II. CURRENT STATE OF QP SELECTION AND RELATED STUDIES OF PERCEPTUAL ADAPTIVE QP IN HEVC
The current QP selection within the RDO process in HEVC is not optimal. Many studies have revealed several weaknesses of the QP selection technique in the HEVC encoder. In this section, several adaptive QP techniques for HEVC are discussed as follows.

A. GENERAL QP SELECTION CONCEPT IN HM
QP selection in video coding can be mathematically described as an RDO problem [35], [36] that minimizes the total coding distortion D at a given bitrate $R_T$ as:

$$\mathbf{QP}^* = (QP_1^*, \cdots, QP_N^*) = \arg\min_{\mathbf{QP}} \sum_{i=1}^{N} D_i, \quad \text{s.t.} \ \sum_{i=1}^{N} R_i \le R_T \tag{1}$$

where N denotes the number of basic units, $D_i$ is the coding distortion, and $R_i$ is the coding bitrate of the i-th basic unit. Note that the basic unit in HEVC terms may be a frame, slice, or CU. $D_i$ and $R_i$ in (1) depend on $\mathbf{QP} = (QP_1, \cdots, QP_N)$. $QP_i$ refers to the QP value for the i-th basic unit, and $\mathbf{QP}^* = (QP_1^*, \cdots, QP_N^*)$ represents the optimal QP set for the N basic units. Applying the λ method [29] to obtain the following unconstrained form, equation (1) can be rewritten as:

$$\mathbf{QP}^* = \arg\min_{\mathbf{QP}} \{J\}, \quad J = \sum_{i=1}^{N} D_i + \lambda \sum_{i=1}^{N} R_i \tag{2}$$

where J stands for the total rate-distortion (RD) cost function, and λ represents the trade-off parameter between $D_i$ and $R_i$. Along with the RDO process, λ in HEVC can be obtained as

$$\lambda = QP_{factor} \cdot 2^{QP/3}, \tag{3}$$

where QP denotes the quantization parameter, and $QP_{factor}$ is a constant parameter related to coding configurations. The QP value in (3) is an integer introduced to represent an actual quantization step size by an exponential mapping function. However, the quantization step size in HEVC tends to be static for complexity reduction in the RDO process. Applying a fixed or predefined QP scheme may cause the compression rate to drop significantly, while HEVC has different coding configurations. Hence, this becomes a major challenge for any QP method design in HEVC. Many QP adjustment methods have been studied for better coding gain. For example, a QP–λ relationship is used to determine the λ value according to an initial QP, and subsequently, the new QP value is recalculated [30], [31]. This algorithm is widely known as a straightforward algorithm for the RDO scheme in HEVC. Wang et al. [33] introduced an improved block-level adaptive QP value that considers previously coded block information. Zhao et al. [34] proposed a QP cascading scheme that assigns QP values to different hierarchical temporal picture layers. Similar algorithms were also introduced by Li et al. [35] and He et al. [36], which presented only an inter-frame dependency technique. As far as we know, these last two algorithms can provide better coding gain for an HEVC encoder. Extensive use of spatial-temporal predictions in HEVC is important for adaptive QP selection in RDO. Although the integration of such propagation effects is desirable, there are not many such studies.

B. EXISTING METHODS OF PERCEPTUAL ADAPTIVE QP SELECTION FOR HM
Determining the QP value for video encoders also affects the overall visual quality of a video sequence. To improve the subjective quality of adaptive QP, the spatial and temporal features or a combination of those may be designed empirically. The open-source x265 software [38] is one of several implementations that developed a perceptual adaptive QP method with spatial and temporal features. However, it still
fails to give promising outcomes if a reference frame has characteristics different from the current coding frame. Test Model 5 (TM5 Model) of MPEG-2 software [39] also uses a method that scales a quantization step according to the spatial activity of one CU relative to a frame-level average of the spatial activity. This method fails when the size of a large CU block needs to be estimated, thus limiting its performance [37]. Similarly, Yeo et al. [40] also introduced a block-level adaptive QP selection algorithm. It observes the spatial and temporal pixel characteristics of CU blocks. However, it needs a higher encoding time. Prangnell et al. [41] used transform coefficients based on a soft thresholding method. However, the proposed soft thresholding method may still cause fluctuations of the visible quality, resulting in severe visual distortion.
An alternative algorithm was proposed by determining a QP offset based on a QP-λ relationship that is formed. Yeo et al. [40] have also studied related topics. However, their method utilized only the spatial variance of a block, which is limited for videos with large homogeneous areas [42]. Xiang et al. [43] proposed a perceptual motion estimation method using a spatial-temporal just-noticeable-distortion (JND) model for a QP offset design. Rouis et al. [44] generated perceptual features temporally as well as CTU visual sensitivity for spatial features. However, both features considered in this algorithm are provided only for an adaptive λ in RDO. In conclusion, spatial and temporal perceptual features for an adaptive QP decision can provide a better trade-off [43], [44].

C. DNN APPROACH TO PERCEPTUAL ADAPTIVE QP SELECTION FOR HM
The use of DNNs for video coding has now become possible for the video coding community. Liu et al. [45] and Ma et al. [46] have presented case studies on deep learning-based video coding. Several researchers such as Choi and Bajic [47] studied a deep learning-based frame prediction using decoded frames to predict the textures of a block. It performs both uni- and bi-directional predictions at various distances from a target frame. Ki et al. [48] developed a JND model based on deep learning for the assessment of perceptual distortion in HEVC. Li et al. [49] proposed a DNN-based rate control for intra coded pictures in HEVC that is designed to predict the parameters of the R-λ rate control model. Other studies have successfully revealed the benefits of deep learning for video encoding. However, it is still difficult to find one specific deep learning method for a perceptual adaptive QP. In this paper, we present a perceptual adaptive QP based on a predefined VGG network for HEVC.

III. PROPOSED ALGORITHM FOR PERCEPTUAL ADAPTIVE QP SELECTION FOR HEVC ENCODER
The main objective of the proposed algorithm is to achieve significant bitrate savings without inducing noticeable visual distortions in reconstructed video frames. We first observed the current setting of the QP-λ relationship in HEVC, as shown in (3). The two main factors involved are the $QP_{factor}$ and QP value. The frame-level QP decision in HM-16.20 is determined with the same QP offset for multiple frames in the same temporal ID layer, while the $QP_{factor}$, which denotes the coding structure parameter, is always statically set to 0.57, regardless of frame or slice types and coding structures. In HEVC, the different frames form a set of hierarchical structures within a group of pictures (GOP). For example, frames at a higher temporal layer in the same GOP can be predicted from one or more frames at the lower temporal layers. Therefore, giving only the default value of QP offset and $QP_{factor}$ to generalize different frames and coding structures is not perceptually wise for HEVC encoders. Both spatial and temporal features could be sufficient to resolve the issues. However, most of the existing adaptive QP methods concentrate on only one of the two elements. In this paper, the proposed algorithm demonstrates visual feature extraction based on a particular convolutional layer of a DNN model for a frame-level adaptive QP. We consider both the spatial and temporal features to generate the adaptive QP and $QP_{factor}$ decision for the proposed algorithm.
Fig. 1 depicts the whole process of the proposed algorithm. As shown in Fig. 1, the proposed algorithm is embedded in the HEVC encoder. The proposed algorithm is processed during the slice initialization. Depending on the slice or frame types, the QP value and $QP_{factor}$ are determined adaptively. Fig. 2 shows the detailed process of the proposed algorithm. For the first frame in a sequence, the proposed algorithm is designed in a straightforward manner by considering the standard deviation values of the original frame to decide upon a QP value and setting $QP_{factor}$ to its default value. Then, a pretrained VGG-16 model is employed to extract visual features from the original and reconstructed frames to predict the QP and $QP_{factor}$ for consecutive frames. The designed visual features result in a perceptual loss value based on the Euclidean distance measure, $VGG_{feature}$. The QP and Lagrange multiplier values based on $VGG_{feature}$ are then adaptively estimated by considering the picture types and coding configurations in HEVC. A detailed discussion of this section is divided into several sub-categories as follows. Symbols and descriptions used in the proposed algorithm of the adaptive frame-level perceptual QP for HEVC are tabulated in Table 1.

A. GENERATION OF VISUAL FEATURES FOR THE PROPOSED PERCEPTUAL ADAPTIVE QP ALGORITHM
We propose to adaptively adjust a perceptual QP value per frame by employing a deep learning network, namely, the VGG-16 network [50]. The proposed algorithm employs a pretrained VGG-16 model to construct high-level feature descriptors using a specific convolutional layer. We select VGG-16 for this study due to some of its desirable characteristics. VGG-16 is widely recognized for its remarkable performance on image classification, which classifies over 14 million images to 1000 categories. It has a better image classification accuracy than the AlexNet model [51]. It has
a straightforward architecture that is constructed simply by stacking convolution, pooling, and fully connected layers without branches or shortcut connections to reinforce gradient flow. Such a design is versatile and adaptable for different practical purposes. Besides, the VGG-16 has an extremely deep convolutional layer design used to train on an enormous and manifold image dataset, which results in convolution filters that are well suited to search universal patterns and generalize them. It is also widely applied as a feature extraction technique in many computer vision solutions [52], [53]. For the same reason, the proposed algorithm also takes advantage of the VGG-16 convolution layers only for visual feature extraction. In this paper, a simplified VGG-16 network is employed by removing the last pooling and fully connected layers, as depicted in Fig. 3. In the figure, h and w represent the height and width of the input 64 × 64 CTU block, respectively. Fortunately, the VGG network can handle any input block size, as long as h and w are multiples of 32. Hence, the CTU block size can be used directly without any prior processing. By examining the visualization of convolution filters and through trial-and-error experiments, we selected ‘block5conv1’, which is the first convolution layer of the fifth block, to build general features for the proposed algorithm. The ‘pool5’ layer is initially included in the network. However, it is neither considered for the algorithm nor included in the figure. The ‘pool5’ layer is commonly affected by specific classification objects, which is not favorable for the detection of general features. We mainly consider the generalizability of the VGG network, and thereby, the proposed feature descriptors can search for common and universal patterns.

FIGURE 1. Block diagram of the proposed perceptual adaptive QP.

FIGURE 2. Overall flowchart of the proposed perceptual adaptive QP.

For better features with human visual system (HVS) consideration, we introduce a perceptual loss function with a full-reference visual quality measure that uses the Euclidean distance. It is based on a comparison of different feature maps extracted from original and reconstructed blocks, as depicted in Fig. 4. The reconstructed block fed to the network is derived after the in-loop filter process. The figure shows that the same model of the
VGG-16 network is utilized for extracting those high-level features. The Euclidean distance is preferred owing to its simplicity in expressing $VGG_{feature}$ as a perceptual loss value. To do this, we first convert the color format of both the original and the reconstructed CTU blocks to the RGB color format. This process is suggested as a requirement of the VGG-16 architecture. Then, the network can operate adequately to obtain visual features from both input blocks. Once a $VGG_{feature}$ is generated, we then use it to determine the QP value and $QP_{factor}$ adaptively for the Lagrange multiplier decision.

TABLE 1. Symbols and descriptions used in the proposed perceptual adaptive QP selection.

B. PERCEPTUAL ADAPTIVE QP DETERMINATION WITH QP-λ RELATIONSHIP
From the formula in (3), the QP value per frame can be derived. However, the λ value in HM-16.20, which represents the Lagrange multiplier, is decided only after the QP decision is made, while the QP value per frame is decided empirically based on the HM configuration. Therefore, finding a proper parameter for predicting a frame-level perceptual adaptive QP is a challenging issue.
Generally, coding errors may propagate from the previous frame to subsequent frames because of the prediction coding scheme in video coding standards. In this study, the proposed algorithm determines the frame-level QP for different picture types by obtaining a perceptual loss value based on high-level features from the original and previously reconstructed pictures. With regard to the first frame in a sequence, the determination of a proper QP value is crucial as it will determine the overall coding performance. However, having only an original picture is not enough to provide a perceptual loss value before the encoding. Hence, we examine whether the standard deviation values (StD) of the original blocks can demonstrate the characteristics of a complete picture for the frame-level QP decision. We activated rate control to observe the different QP values of every CTU within the intra frame using the ‘BasketballPass’ test sequence with QP 22, 27, 32, and 37. Subsequently, a relationship between QP and StD is presented in Fig. 5. A lower StD, which reflects a flat region, tends to have a higher QP, and vice versa. Therefore, we can expect some coding gain with lower visual quality degradation in this area. However, applying the StD value directly to vary λ over the $QP_{factor}$ may lead to a high coding loss. Therefore, the QP decision in this algorithm is adjusted by first normalizing the pixel values of every CTU block in a frame before calculating StD and disregarding the λ and $QP_{factor}$ for the QP decision. Then, a more visual-friendly QP of the first frame can be provided and expressed as:

$$QP_0 = QP_{init} - 3 \log_2 (StD_{intra}) \tag{4}$$

$$StD_{intra} = \frac{1}{N} \sum_{i=1}^{N} \sigma_i \tag{5}$$

$$\sigma_i = \sqrt{\frac{1}{M} \sum_{j=1}^{M} (x_j - \mu_i)^2} \tag{6}$$

where $QP_0$ denotes the QP value of the first frame in a sequence, and $QP_{init}$ represents the initial QP value set by the encoder. Since we design the proposed algorithm CTU-wise, the final picture characteristic of the first frame is decided based on the $StD_{intra}$ value, which is the average StD of the total number N of the original CTU blocks in an intra frame. Thus, the symbols $\sigma_i$ and $\mu_i$ are the StD and mean values of the original i-th CTU block, respectively. M denotes the total number of pixel values $x_j$.
For the rest of the frames, the quality of the reconstructed frames is generally influenced by a previously coded frame with a certain QP value. In this study, instead of analyzing the distortion of two consecutive frames, we investigate the distortion of VGG features for determining a proper QP value perceptually. Note that the proposed VGG features are extracted from the original and reconstructed frames based on the VGG-16 model. Therefore, the distortion of VGG features of two consecutive frames can be expressed as

$$D_{VGG_{pre}} = f\left(D_{VGG_{ref}}\right) \tag{7}$$
where $D_{VGG_{pre}}$ is the VGG feature distortion of a predicted frame, $D_{VGG_{ref}}$ denotes the VGG feature distortion of a reference frame, and $f(\cdot)$ is the relationship between $D_{VGG_{ref}}$ and $D_{VGG_{pre}}$.

FIGURE 3. Proposed double-simplified VGG-16 network architecture.

FIGURE 4. Proposed double-simplified VGG-16 network architecture.

Fig. 6(a) shows the VGG feature distortion relationship between two consecutive frames of the ‘BasketballPass’ test sequence. The sequence is encoded under the LDP configuration with the coding structure of I-P-P-P-P. Each P frame uses only its previous coded frame as a reference. We set the predicted frame with a fixed QP value of 32 and encoded the first 15 frames. It can be seen that $D_{VGG_{ref}}$ influences $D_{VGG_{pre}}$. A further experiment was also conducted with rate control enabled to support the observations. Fig. 6(b) shows a high correlation between the VGG feature and QP selection per frame. Accordingly, the QP decision for the rest of the frames can be determined by considering the picture types as in (8).
The QP decision for a future intra picture can be determined by using the $VGG_{feature}$ from a previously intra coded picture. With regard to the QP decision for P- and B-frames, we control $QP_{init}$ with $\Delta pQP_{Fid_i}$ and $\Delta bQP_{Fid_i}$ depending on the hierarchical frame index i ($Fid_i$) as shown in Table 2. The values of $\Delta pQP_{Fid_i}$ and $\Delta bQP_{Fid_i}$ are derived
empirically, which also corresponds to the coding structure under the LDP and RA configurations, respectively. To avoid large fluctuations in quality between neighboring frames, both $\Delta pQP_{Fid_i}$ and $\Delta bQP_{Fid_i}$ values for different temporal levels should satisfy the conditions described in (9)–(11), where $QP_{OffsetModelScale_i}$ and $QP_{OffsetModelOffset_i}$ are derived as the default settings in the HEVC encoder configurations, organized depending on the frame index i. Values of both the $QP_{OffsetModelScale_i}$ and $QP_{OffsetModelOffset_i}$ parameters can be found in Table 3.

$$QP_{perceptual} = \begin{cases} QP_0, & \text{if I frame or slice, } POC = 0 \\ QP_{init} - 3 \log_2 (VGG_{feature}), & \text{if I frame or slice, } POC \ne 0 \\ QP_{init} + \Delta pQP_{Fid_i}, & \text{if P frame or slice} \\ QP_{init} + \Delta bQP_{Fid_i}, & \text{if B frame or slice} \end{cases} \tag{8}$$

$$\Delta pQP_{Offset} = \mathrm{Clip}(0.0, 3.0, \Delta QP_{Offset_i}) \tag{9}$$

$$\Delta bQP_{Offset} = \begin{cases} \mathrm{Clip}(0.0, 3.0, \Delta QP_{Offset_i}), & \text{if } Fid = 0 \\ \mathrm{Clip}(0.0, 3.0, \Delta QP_{Offset_i}), & \text{if } Fid = 1 \\ \mathrm{Clip}(0.0, 6.0, \Delta QP_{Offset_i}), & \text{if } Fid = 2 \\ \mathrm{Clip}(0.0, 7.0, \Delta QP_{Offset_i}), & \text{if } Fid = 3 \\ \mathrm{Clip}(0.0, 9.0, \Delta QP_{Offset_i}), & \text{if } Fid = 4 \end{cases} \tag{10}$$

$$\Delta QP_{Offset_i} = QP_{perceptual} \times QP_{OffsetModelScale_i} + QP_{OffsetModelOffset_i} + VGG_{feature} \tag{11}$$

FIGURE 5. Correlation between StD value of original blocks and QP values.

FIGURE 6. Relationship of: (a) VGG feature distortion between reference and predicted frames, and (b) VGG feature and QP selection.

C. PERCEPTUAL ADAPTIVE LAGRANGE MULTIPLIER DETERMINATION WITH QP-λ RELATIONSHIP
For increased bitrate savings while maintaining the visual quality of the proposed adaptive QP decision algorithm, we also aim to determine the Lagrange multiplier by involving the proposed $VGG_{feature}$. Note that the Lagrange multiplier in HM-16.20 is assigned a static $QP_{factor}$ value. Hence, it is essential to provide an adaptive $QP_{factor}$ designed for different picture types and coding structures in HEVC.

1) $QP_{factor}$ DECISION FOR I-FRAMES
First, we searched for the best $QP_{factor}$ of intra coded frames by assigning several constant values in equation (3) through experiments using HM-16.20 under All Intra configurations. ‘BasketballPass’, ‘BQSquare’, ‘BlowingBubbles’, and ‘RaceHorses’ were used with all the QP settings for the experiment. Fig. 7 depicts the BD-rate based on SSIM performance with the corresponding $QP_{factor}$ values. It shows an approximation of the optimum $QP_{factor}$ for intra frames, which lies in the range of 0.60 to 0.80 with a minimal BD-BR-SSIM gain of approximately −0.2%, while the highest coding gain of approximately −0.5% is given by a $QP_{factor}$ of 0.65. Accordingly, the $QP_{factor}$ for intra pictures can be
determined as

$$IQP_{factor} = \begin{cases} 0.57, & POC = 0 \\ \dfrac{StD_{intra} + VGG_{feature}}{2}, & POC \ne 0 \end{cases} \tag{12}$$

where $IQP_{factor}$ must satisfy $0.57 \le IQP_{factor} \le 0.80$, POC denotes the picture order count, and $VGG_{feature}$ is a perceptual loss value from the original and previously intra coded pictures based on the VGG-16 model.

TABLE 2. Initial values of $\Delta pQP_{Fid_i}$ and $\Delta bQP_{Fid_i}$ for different $Fid_i$.

TABLE 3. Default values of $QP_{OffsetModelScale_i}$ and $QP_{OffsetModelOffset_i}$ for different $Fid_i$.

FIGURE 7. $QP_{factor}$ decision and BD-rate-SSIM of intra coded frames.

2) $QP_{factor}$ DECISION FOR P-FRAMES
In the inter picture coding framework under the LDP configuration, the quality of the reconstructed frames is generally influenced by the coding structure factor (or $QP_{factor}$ as previously mentioned). As a result, the distortion of one frame with a certain QP value may affect both the visual quality and RD performance of future frames in encoding order according to the given $QP_{factor}$. Based on the previous observation illustrated in Fig. 6(a), the VGG feature of a predicted frame $D_{VGG_{pre}}$ increases linearly with the VGG feature of a reference frame $D_{VGG_{ref}}$. Note that the λ values among different frames in the same GOP should be set differently, although they are coded with the same QP value. Hence, deciding the $QP_{factor}$ for different frames in different temporal layers is desirable, and the relationship in (7) can be approximated as

$$PQP_{factor} = D_{VGG_{pre}} \approx c \times D_{VGG_{ref}} + D_{ref}^{(I)} \tag{13}$$

where $PQP_{factor}$ stands for the $QP_{factor}$ of a P-frame, and c is the linear coefficient, i.e., the slope of the approximated linear distortion relationship between $D_{VGG_{pre}}$ and $D_{VGG_{ref}}$. $D_{ref}^{(I)}$ is added to the linear relationship to represent the feature extraction of the reference frame coded under all intra mode. The $D_{ref}^{(I)}$ value in the proposed algorithm is used to maintain gaps of bit distributions among inter-coded pictures in the same GOP and is set as

$$D_{ref}^{(I)} = \frac{StD_{intra}}{(GOP_{size} - Fid_i)} \tag{14}$$

where $GOP_{size}$ and $Fid_i$ denote the GOP size for LDP, which is set to 4, and the frame index listed in the same GOP, respectively. An illustration of how $PQP_{factor}$ is provided for P-frames under the LDP coding structure can be seen in Fig. 8. Then, the combination of (13) and (14) can be expressed as

$$PQP_{factor} = D_{VGG_{pre}} \approx c \times D_{VGG_{ref}} + \frac{StD_{intra}}{(GOP_{size} - Fid_i)} \tag{15}$$

Since $D_{VGG_{ref}}$ is the same as $VGG_{feature}$ for the perceptual retention purposes in $PQP_{factor}$, (15) can be further adjusted as in (16), where the parameter c is empirically set as 0.45 in this study.

$$PQP_{factor} = c \times VGG_{feature} + \frac{StD_{intra}}{(GOP_{size} - Fid_i)} \tag{16}$$
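The QPfactor rules in (12) and (16), together with the λ mapping in (3), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the HM-16.20 integration: the function names, the use of clamping to enforce 0.57 ≤ IQPfactor ≤ 0.80, the range check on Fid, and all input values are hypothetical choices made for the example.

```python
def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def qp_factor_intra(poc, std_intra, vgg_feature):
    """QPfactor for I-frames following (12).

    Clamping to [0.57, 0.80] is an assumed way to meet the stated constraint.
    """
    if poc == 0:
        return 0.57
    return clamp((std_intra + vgg_feature) / 2.0, 0.57, 0.80)


def qp_factor_p(vgg_feature, std_intra, fid, gop_size=4, c=0.45):
    """QPfactor for P-frames following (16), with c = 0.45 and GOPsize = 4 (LDP)."""
    if not 0 <= fid < gop_size:
        # Assumption: Fid stays below GOPsize so the denominator is positive.
        raise ValueError("fid must satisfy 0 <= fid < gop_size")
    return c * vgg_feature + std_intra / (gop_size - fid)


def lagrange_multiplier(qp_factor, qp):
    """Lagrange multiplier following (3) as written in this paper."""
    return qp_factor * 2.0 ** (qp / 3.0)


if __name__ == "__main__":
    # Hypothetical, already-normalized feature values for illustration only.
    std_intra, vgg_feature = 0.6, 0.7
    i_factor = qp_factor_intra(poc=8, std_intra=std_intra, vgg_feature=vgg_feature)
    p_factor = qp_factor_p(vgg_feature, std_intra, fid=1)
    print(i_factor, p_factor, lagrange_multiplier(p_factor, qp=32))
```

Clamping is only one way to honor the bound stated for (12); an implementation could equally rescale or reject out-of-range values, and the source does not say which is used.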
FIGURE 8. Example of the proposed adaptive QPfactor for LDP case.

3) QPFACTOR DECISION FOR B-FRAMES
For the RA configuration, the QPfactor decision follows a concept similar to that of the LDP case, with further adjustments. We first analyzed the hierarchical B coding structure under the RA configuration in HEVC, depicted in Fig. 9. Both the coding distortion and the visual quality of the higher temporal layers are affected by those of the lower temporal layers. For the first frame in a GOP coded as an I-frame, the coding distortion and visual quality depend only on the spatial operation. However, pictures coded as B-frames, including the frame with temporal ID = 0 that is not an I-frame, need to be treated in an inter-frame fashion with their corresponding reference frames. Table 4 shows the POC differences between the current POC and its reference pictures according to their temporal IDs. This pattern is designed to enable proper feature extraction for the coded frames. However, we used only the reference frame nearest to the current coded picture in the RA coding structure.

FIGURE 9. Hierarchical B coding structure under RA configuration.

TABLE 4. Pattern of POC difference between the current POC and its reference POCs.

Following a concept similar to that of the LDP configuration, the formula for the RA case can be expressed as

$$ BQP_{factor} = c_i \times VGG_{feature} + \frac{StD_{intra}}{GOP_{size} - Tid_i} \tag{17} $$

where $BQP_{factor}$ represents the QPfactor for the B-frame, and $VGG_{feature}$ denotes the VGG feature extraction of the reference frames. $StD_{intra}$ is given from the I-frame depending on the intra period of each sequence configuration. $GOP_{size}$ is the GOP size of the RA case, which is set to 16, and $Tid_i$ is the temporal ID of frames in the same GOP. Parameter $c_i$ is a constant value for the i-th temporal ID that determines the $BQP_{factor}$ of each frame in different temporal IDs. We first searched for the best c per $Tid_i$ empirically with the default QP setting as in HM-16.20. Fig. 10 depicts the BD-BR-SSIM results with the selected c values for different temporal IDs. The ‘BasketballPass’ and ‘RaceHorses’ test sequences were used for testing all the QP settings. According to Fig. 10, the optimum c value for temporal ID 1 (T_1) is 0.20, and the best c values for T_2 to T_4 are 0.30, 0.40, and 0.42, respectively. In this test, the $c_i$ values increase with the temporal ID; hence, we set c to 0.12 for the inter frame with temporal ID = 0. Accordingly, a fixed c value is assigned to each temporal ID in (17).
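As a minimal sketch of how (17) could be evaluated with the per-layer c values reported in the text, the following Python snippet may help. The function name `b_qp_factor` and the numeric inputs for the VGG feature and the intra standard deviation are hypothetical placeholders; how those two quantities are computed is defined elsewhere in the paper and is not reproduced here.

```python
# Per-temporal-ID constants c_i as reported in the text for the RA case.
# The entry for Tid = 0 applies to inter frames only (POC != 0), not to I-frames.
C_BY_TID = {0: 0.12, 1: 0.20, 2: 0.30, 3: 0.40, 4: 0.42}
GOP_SIZE = 16  # GOP size stated for the RA configuration


def b_qp_factor(vgg_feature: float, std_intra: float, tid: int,
                gop_size: int = GOP_SIZE) -> float:
    """Evaluate (17) for one B-frame.

    vgg_feature and std_intra are assumed to be precomputed scalars;
    their values in this sketch are illustrative only.
    """
    if tid not in C_BY_TID:
        raise ValueError(f"unsupported temporal ID: {tid}")
    return C_BY_TID[tid] * vgg_feature + std_intra / (gop_size - tid)


if __name__ == "__main__":
    # Hypothetical feature values, chosen only to show the trend over layers.
    for tid in sorted(C_BY_TID):
        print(tid, round(b_qp_factor(1.0, 1.5, tid), 4))
```

With fixed inputs, the result grows with the temporal ID, because both $c_i$ and the second term of (17) increase as $Tid_i$ increases.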
The $c_i$ values are expressed as:

$$ c_i = \begin{cases} 0.12, & \text{if } Tid = 0 \text{ and } POC \neq 0 \\ 0.20, & \text{if } Tid = 1 \\ 0.30, & \text{if } Tid = 2 \\ 0.40, & \text{if } Tid = 3 \\ 0.42, & \text{if } Tid = 4 \end{cases} \tag{18} $$

FIGURE 10. c parameter decision for each $Tid_i$ under RA configuration.

IV. EXPERIMENTAL RESULTS
The test configuration used for evaluating the proposed algorithm is listed in Table 5. Coding efficiency was evaluated under the common test conditions for HEVC [32] with the SSIM metric [54]. In addition, subjective evaluation was performed using difference mean opinion scores (DMOS). The assessments compared the proposed algorithm against HM-16.20 as the anchor software and also against other existing works [40], [42].

TABLE 5. Experimental environment.

A. CODING PERFORMANCE EVALUATIONS
We conducted several evaluations of the coding performance to assess the objective quality of the proposed algorithm. All the objective quality measures are tabulated in Table 6. First, we checked the SSIM difference, ΔSSIM, between the proposed algorithm and the anchor. It is defined by

$$ \Delta SSIM = SSIM_{PRO} - SSIM_{HM} \tag{19} $$

where $SSIM_{PRO}$ and $SSIM_{HM}$ denote the luma SSIM quality of the proposed algorithm and the anchor, respectively. For (19), a negative value means that the SSIM quality of the proposed algorithm is worse than that of HM-16.20. We also evaluate the bitrate reduction, ΔBitrate, relative to the anchor software, which is given by

$$ \Delta Bitrate = \frac{R_{PRO} - R_{HM}}{R_{HM}} \times 100\% \tag{20} $$

where $R_{PRO}$ and $R_{HM}$ represent the output bitrate of the proposed and anchor algorithms, respectively. The proposed algorithm is also evaluated against the anchor in BD-BR with the SSIM metric (BD-BR-SSIM) [54], [55]. For the bitrate reduction and BD-BR-SSIM measures, a negative value indicates a gain over the anchor. We used the HEVC video test sequences with the LDP and RA configurations for several QPs: 22, 27, 32, and 37. As shown in Table 6, the proposed algorithm shows a negligible average SSIM degradation of approximately −0.00541 and −0.00656 for the LDP and RA configurations, respectively, against HM-16.20 without a perceptual adaptive QP method. In terms of bitrate reduction, the proposed algorithm changes the bitrate, on average, by approximately −42.67% for the LDP and −33.93% for the RA configuration relative to HM-16.20. For the ‘BQTerrace’ test sequence, the proposed algorithm achieves the highest bitrate reduction of −66% for the LDP case and −48% for the RA case. Note that this sequence has large flat regions over its frames that benefit the proposed algorithm both spatially and temporally. In terms of coding efficiency, the proposed algorithm yields better BD-BR-SSIM scores than the anchor, by about −20.68% and −14.27% for the LDP and RA configurations, respectively. The proposed algorithm also performs better for test sequences with higher resolutions. In the case of LDP, Class B and Class E provide an average coding gain of approximately −21% and −28%, respectively. In the case of RA, Class A also gives a coding gain of approximately −15%.

According to Table 6, the proposed algorithm achieves better objective performance under the LDP configuration than under RA. In terms of visual quality, the number of intra-coded pictures in the LDP case indicates that the proposed algorithm has an essential role in maintaining the quality of the reconstructed frames. Better quality of the reconstructed frames can provide better prediction modes for the future inter-coded frames, as well as better visual features for the proposed QP and Lagrange multiplier selections. Considering both spatial and temporal visual features in the proposed algorithm results in significant bitrate reduction while retaining the visual quality of the test videos. For test sequences that have many homogeneous regions, slow motion, and background areas larger than the moving objects in a frame, the proposed algorithm plays a prominent role in obtaining higher objective measures. Such visual characteristics can be seen in the ‘BQTerrace’, ‘Johnny’, ‘FourPeople’, ‘Cactus’, and ‘KristenAndSara’ videos, among others, for which the most significant coding gains are obtained in perceptual terms. On the other hand, the proposed algorithm contributes only moderate coding improvements for ‘Kimono’ and ‘RaceHorses’, which have more texture and faster or more motion.
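The two difference measures in (19) and (20) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the SSIM and bitrate values in the usage lines are made-up inputs, not results from Table 6.

```python
def delta_ssim(ssim_pro: float, ssim_hm: float) -> float:
    """SSIM difference as in (19); negative means the proposed codec is worse."""
    return ssim_pro - ssim_hm


def delta_bitrate(rate_pro: float, rate_hm: float) -> float:
    """Bitrate change in percent as in (20); negative means bitrate saving."""
    if rate_hm <= 0:
        raise ValueError("anchor bitrate must be positive")
    return 100.0 * (rate_pro - rate_hm) / rate_hm


if __name__ == "__main__":
    # Hypothetical measurements for one sequence at one QP (not from the paper).
    print(delta_ssim(0.950, 0.955))      # small negative SSIM difference
    print(delta_bitrate(600.0, 1000.0))  # bitrate saving relative to the anchor
```

Both measures share the same sign convention for the anchor comparison: the anchor is always the subtrahend, so a negative ΔBitrate is a gain while a negative ΔSSIM is a loss.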
TABLE 6. Objective quality comparisons between the proposed algorithm and HM-16.20.

TABLE 7. DMOS comparisons between the proposed algorithm and HM-16.20.

TABLE 8. Average of DMOS comparisons.

B. SUBJECTIVE PERFORMANCE EVALUATIONS
Subjective quality assessment was performed to compare the proposed algorithm and HM-16.20 for all the test sequences by following the double stimulus continuous quality scale (DSCQS) method [55]. There were 18 observers, of whom 11 work in a related field and the rest are naïve in image processing. Before the test, we conducted simple demonstrations for the observers to introduce the evaluation process. For each participant, the reconstructed frames from the proposed algorithm and HM-16.20 were randomly shown twice with all the QP values. Then, the observers were asked to provide MOS values on a continuous scale ranging from 1 to 5. Finally, we processed the MOS values to produce the DMOS scores between $MOS_{PRO}$ and $MOS_{HM}$, which denote the luma MOS quality of the proposed algorithm and the anchor, respectively. DMOS scores are defined by

$$ DMOS = MOS_{PRO} - MOS_{HM} \tag{21} $$

TABLE 9. BD-rate-SSIM comparisons of the proposed algorithm and other existing algorithms.

FIGURE 11. DMOS comparisons of Xiang’s, Yeo’s, and the proposed algorithms.

Table 7 shows the DMOS of all the test sequences under the LDP and RA configurations. For convenience, we introduced the average DMOS per sequence over all the QP values to summarize the visual quality judgments of the reconstructed frames. Negative values indicate that the video quality of the proposed algorithm is subjectively worse than that of the anchor. As presented, the DMOS values for all the test sequences are quite close to 0. This means that the proposed algorithm can produce output that is nearly visually identical to that of HM-16.20. For several video sequences, as shown in Table 7, the visual quality of the proposed algorithm is even slightly better than that of the anchor.
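The DMOS in (21) can be illustrated with a short Python sketch. The rating lists below are hypothetical, and averaging all ratings of a sequence with equal weight is an assumption made for this example; the text does not state how the repeated presentations are combined.

```python
from statistics import mean


def dmos(mos_pro_ratings: list[float], mos_hm_ratings: list[float]) -> float:
    """DMOS as in (21): mean rating of the proposed codec minus the anchor's.

    Ratings are expected on the continuous 1-to-5 scale used in the test.
    """
    for rating in (*mos_pro_ratings, *mos_hm_ratings):
        if not 1.0 <= rating <= 5.0:
            raise ValueError(f"rating out of the 1-5 range: {rating}")
    return mean(mos_pro_ratings) - mean(mos_hm_ratings)


if __name__ == "__main__":
    # Hypothetical ratings from four observers for one sequence.
    proposed = [4.0, 3.5, 4.5, 4.0]
    anchor = [4.0, 4.0, 4.5, 4.0]
    print(dmos(proposed, anchor))  # negative: proposed rated slightly lower
```

A value near zero in this sketch corresponds to the case discussed in the text, where observers rate the two reconstructions almost identically.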
Examples are ‘PeopleOnStreet’, ‘BQTerrace’, ‘BQMall’, and ‘BQSquare’, whose visual quality is slightly better than that of the anchor, primarily when they are generated under the RA coding structure. This similarity in video quality between the proposed algorithm and HM-16.20 can be seen for all the video sequence classes. Based on the DMOS test, the proposed algorithm degrades visual quality only very slightly compared to the anchor, by about −0.05 and −0.04 for the LDP and RA configurations, respectively, as shown in Table 8.

C. COMPARISONS WITH EXISTING ALGORITHMS
Having presented both objective and subjective comparisons between the proposed algorithm and HM-16.20, we can conclude that the frame-level perceptual adaptive QP maintains visual quality while achieving better perceptual coding efficiency. In this subsection, we present the same objective and subjective comparisons of the proposed algorithm against other existing algorithms. Table 9 shows the SSIM-based BD-rate comparisons of Yeo et al. [40], Xiang et al. [42], and the proposed algorithm. As both existing algorithms were integrated into HM-16.0, we also implemented the proposed algorithm in the same software version for a fair comparison. As shown in Table 9, the proposed algorithm in this older software version still outperforms the two existing algorithms in perceptual coding efficiency. Overall, it achieves a coding gain of approximately −14.44%, whereas Xiang’s and Yeo’s algorithms achieve −4.51% and −3.56%, respectively. Note that all the results presented in Table 9 were generated under the random-access configuration with all the quantization parameter values.

Furthermore, we also performed the MOS test to evaluate the subjective visual quality of all the algorithms. Fig. 11 presents the average DMOS results of Xiang’s, Yeo’s, and the proposed algorithms in the RA structure. The performance of the baseline, which refers to the HM software, is set to zero for the visual similarity evaluation of the three algorithms. DMOS scores that are close to the zero baseline indicate visual similarity to the anchor. In the experimental results, for most of the test sequences the DMOS points of the proposed algorithm are the closest to zero, followed by those of Xiang’s and Yeo’s algorithms. This means that the proposed algorithm gives better subjective quality than the two existing algorithms.

V. CONCLUSION
In this work, we propose a perceptual adaptive QP algorithm at the frame level to obtain better subjective coding performance for HEVC. The proposed algorithm utilizes a predefined model of the VGG-16 network for feature extraction from the original and previously reconstructed pictures. We designed the proposed algorithm by developing a perceptual loss function based on the extracted features. The proposed algorithm adaptively determines perceptual QP values for different picture types of the hierarchical coding structure in HEVC. Compared with HM-16.20, the proposed algorithm yields SSIM-based coding gains of approximately −21% and −14% for LDP and RA, respectively. The subjective quality evaluation shows that the proposed algorithm can produce visual quality comparable to that of the anchor with significant bitrate saving.

REFERENCES
[1] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the high efficiency video coding (HEVC) standard," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, Dec. 2012.
[2] I. Marzuki, Y.-J. Ahn, and D. Sim, "Tile-level rate control for tile-parallelization HEVC encoders," J. Real-Time Image Process., vol. 16, no. 6, pp. 2107–2125, Sep. 2017, doi: 10.1007/s11554-017-0720-5.
[3] C. C. Chi, M. Alvarez-Mesa, B. Juurlink, G. Clare, F. Henry, S. Pateux, and T. Schierl, "Parallel scalability and efficiency of HEVC parallelization approaches," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1827–1838, Dec. 2012.
[4] H. Jo and D. Sim, "Bitstream decoding processor for fast entropy decoding of variable length coding-based multiformat videos," Opt. Eng., vol. 53, no. 6, Jun. 2014, Art. no. 063102, doi: 10.1117/1.OE.53.6.063102.
[5] Y.-J. Yoon, H. Kim, S.-J. Baek, and S.-J. Ko, "Largest coding unit level rate control algorithm for hierarchical video coding in HEVC," IEIE Trans. Smart Process. Comput., vol. 1, no. 3, pp. 171–181, Dec. 2012.
[6] J. Kim and M. Kim, "Analysis of the JND-suppression effect in quantization perspective for HEVC-based perceptual video coding," IEIE Trans. Smart Process. Comput., vol. 4, no. 1, pp. 22–27, Feb. 2015.
[7] W. Wiratama, Y.-J. Ahn, I. Marzuki, and D. Sim, "Adaptive Gaussian low-pass pre-filtering for perceptual video coding," IEIE Trans. Smart Process. Comput., vol. 7, no. 5, pp. 366–377, Oct. 2018.
[8] M. Xu, T. Li, Z. Wang, X. Deng, R. Yang, and Z. Guan, "Reducing complexity of HEVC: A deep learning approach," IEEE Trans. Image Process., vol. 27, no. 10, pp. 5044–5059, Oct. 2018.
[9] B. Lee and M. Kim, "A CU-level rate and distortion estimation scheme for RDO of hardware-friendly HEVC encoders using low-complexity integer DCTs," IEEE Trans. Image Process., vol. 25, no. 8, pp. 3787–3800, Aug. 2016.
[10] I. Marzuki, J. Ma, Y.-J. Ahn, and D. Sim, "A context-adaptive fast intra coding algorithm of high-efficiency video coding (HEVC)," J. Real-Time Image Process., vol. 16, no. 4, pp. 883–899, Mar. 2016, doi: 10.1007/s11554-016-0571-5.
[11] Q. Hu, X. Zhang, Z. Shi, and Z. Gao, "Neyman-pearson-based early mode decision for HEVC encoding," IEEE Trans. Multimedia, vol. 18, no. 3, pp. 379–391, Mar. 2016.
[12] M. Ismail, J. Ma, and D. Sim, "Full depth RQT after PU decision for fast encoding of HEVC," in Proc. 18th IEEE Int. Symp. Consum. Electron. (ISCE), Jeju Island, South Korea, Jun. 2014, pp. 1–2.
[13] Y.-J. Ahn and D. Sim, "Square-type-first inter-CU tree search algorithm for acceleration of HEVC encoder," J. Real-Time Image Process., vol. 12, no. 2, pp. 419–432, Feb. 2015, doi: 10.1007/s11554-015-0487-5.
[14] J. Gu, M. Tang, J. Wen, and Y. Han, "Adaptive intra candidate selection with early depth decision for fast intra prediction in HEVC," IEEE Signal Process. Lett., vol. 25, no. 2, pp. 159–163, Feb. 2018.
[15] K. Yang, Y. Gong, M. Ma, and H. R. Wu, "An efficient rate-distortion optimization method for low-delay configuration in H.265/HEVC based on temporal layer rate and distortion dependence," IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 4, pp. 1230–1236, Apr. 2019.
[16] M. Ismail, H. Jo, and D. Sim, "Fast intra mode decision for HEVC intra coding," in Proc. 18th IEEE Int. Symp. Consum. Electron. (ISCE), Jeju Island, South Korea, Jun. 2014, pp. 1–2.
[17] W. Lee, J. Lee, D. Sim, and S.-J. Oh, "A deep learning based inter-layer reference picture generation method for improving SHVC coding performance," J. Broadcast Eng., vol. 24, no. 3, pp. 401–410, May 2019.
[18] W. Lim and D. Sim, "Determination of optimum quantization parameters in residual quad-tree of HEVC based on perceptual quality," J. Imag. Sci. Technol., vol. 62, no. 2, pp. 205021–205028, Mar. 2018.
[19] V. Baroncini, J.-R. Ohm, and G. J. Sullivan, Report of Results From the Call for Proposals on Video Compression With Capability Beyond HEVC, document JVET-J1003, Joint Video Experts Team, 2018.
[20] S. Liu, B. Choi, K. Kawamura, Y. Li, L. Wang, P. Wu, and H. Yang, JVET AHG Report: Neural Networks in Video Coding, document JVET-L0009, Joint Video Experts Team, 2018.
[21] L. Zhou, X. Song, J. Yao, L. Wang, and F. Chen, Convolutional Neural Network Filter for Intra Frame, document JVET-I0022, Joint Video Experts Team, 2018.
[22] J. Yao, X. Song, S. Fang, and L. Wang, AHG9: Convolutional Neural Network Filter for Inter Frame, document JVET-J0043, Joint Video Experts Team, 2018.
[23] T. Hashimoto, E. Sasaki, and T. Ikai, AHG9: Separable Convolutional Neural Network Filter With Squeeze-and-Excitation Block, document JVET-K0158, Joint Video Experts Team, 2018.
[24] Y.-L. Hsiao, C.-Y. Chen, T.-D. Chuang, C.-W. Hsu, Y.-W. Huang, and S.-M. Lei, AHG9: Convolution Neural Network Loop Filter, document JVET-K0222, Joint Video Experts Team, 2018.
[25] Y. Wang, Z. Chen, and Y. Li, AHG9: Dense Residual Convolutional Neural Network Based in-Loop Filter, document JVET-K0391, Joint Video Experts Team, 2018.
[26] I. Marzuki and D. Sim, "Overview of potential technologies for future video coding standard (FVC) in JEM software: Status and review," IEIE Trans. Smart Process. Comput., vol. 7, no. 1, pp. 22–35, Feb. 2018.
[27] HM. HEVC Test Model. [Online]. Available: http://hevc.hhi.fraunhofer.de/svn/svnHEVCSoftware/
[28] G. J. Sullivan and T. Wiegand, "Rate-distortion optimization for video compression," IEEE Signal Process. Mag., vol. 15, no. 6, pp. 74–90, Nov. 1998.
[29] A. Ortega and K. Ramchandran, "Rate-distortion methods for image and video compression," IEEE Signal Process. Mag., vol. 15, no. 6, pp. 23–50, Nov. 1998.
[30] B. Li, D. Zhang, H. Li, and J. Xu, QP Determination By Lambda Value, document JCTVC-I0426, Joint Collaborative Team on Video Coding, 2012.
[31] B. Li, J. Xu, D. Zhang, and H. Li, "QP refinement according to Lagrange multiplier for high efficiency video coding," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Beijing, China, May 2013, pp. 447–480.
[32] F. Bossen, Common HM Test Conditions and Software Reference Configurations, document JCTVC-L1100, Joint Collaborative Team on Video Coding, 2013.
[33] M. Wang, K. N. Ngan, H. Li, and H. Zeng, "Improved block level adaptive quantization for high efficiency video coding," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Lisbon, Portugal, May 2015, pp. 509–512.
[34] T. Zhao, Z. Wang, and C. W. Chen, "Adaptive quantization parameter cascading in HEVC hierarchical coding," IEEE Trans. Image Process., vol. 25, no. 7, pp. 2997–3009, Jul. 2016.
[35] S. Li, C. Zhu, Y. Gao, Y. Zhou, F. Dufaux, and M.-T. Sun, "Lagrangian multiplier adaptation for rate-distortion optimization with inter-frame dependency," IEEE Trans. Circuits Syst. Video Technol., vol. 26, no. 1, pp. 117–129, Jan. 2016.
[36] J. He, E.-H. Yang, F. Yang, and K. Yang, "Adaptive quantization parameter selection for H.265/HEVC by employing inter-frame dependency," IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 12, pp. 3424–3436, Dec. 2018.
[37] T.-D. Chuang, C.-Y. Chen, Y.-L. Chang, Y.-W. Huang, and S. Lei, AhG Quantization: Sub-LCU Delta QP, document JCTVC-E051, Joint Collaborative Team on Video Coding, 2011.
[38] X265/HEVC Reference Software. [Online]. Available: http://hg.videolan.org/x265
[39] MPEG-2 Test Model 5, Rate Control and Quantization Control Chapter 10. [Online]. Available: http://www.mpeg.org/MPEG/MSSG/tm5/Ch10/Ch10.html
[40] C. Yeo, H. L. Tan, and Y. H. Tan, "SSIM-based adaptive quantization in HEVC," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Vancouver, BC, Canada, May 2013, pp. 1690–1694.
[41] L. Prangnell, V. Sanchez, and R. Vanam, "Adaptive quantization by soft thresholding in HEVC," in Proc. Picture Coding Symp. (PCS), Cairns, QLD, Australia, May 2015, pp. 35–39.
[42] G. Xiang, H. Jia, M. Yang, J. Liu, C. Zhu, Y. Li, and X. Xie, "An improved adaptive quantization method based on perceptual CU early splitting for HEVC," in Proc. IEEE Int. Conf. Consum. Electron. (ICCE), Las Vegas, NV, USA, Jan. 2017, pp. 362–365.
[43] G. Xiang, H. Jia, M. Yang, X. Zhang, X. Huang, J. Liu, and X. Xie, "A perceptually temporal adaptive quantization algorithm for HEVC," J. Vis. Commun. Image Represent., vol. 50, pp. 280–289, Jan. 2018.
[44] K. Rouis, M.-C. Larabi, and J. B. Tahar, "Perceptually adaptive Lagrangian multiplier for HEVC guided rate-distortion optimization," IEEE Access, vol. 6, pp. 33589–33603, Jun. 2018, doi: 10.1109/ACCESS.2018.2843384.
[45] D. Liu, Y. Li, J. Lin, H. Li, and F. Wu, "Deep learning-based video coding: A review and a case study," 2019, arXiv:1904.12462. [Online]. Available: http://arxiv.org/abs/1904.12462
[46] S. Ma, X. Zhang, C. Jia, Z. Zhao, S. Wang, and S. Wanga, "Image and video compression with neural networks: A review," IEEE Trans. Circuits Syst. Video Technol., to be published, doi: 10.1109/TCSVT.2019.2910119.
[47] H. Choi and I. V. Bajic, "Deep frame prediction for video coding," IEEE Trans. Circuits Syst. Video Technol., to be published, doi: 10.1109/TCSVT.2019.2924657.
[48] S. Ki, S.-H. Bae, M. Kim, and H. Ko, "Learning-based just-noticeable-quantization-distortion modeling for perceptual video coding," IEEE Trans. Image Process., vol. 27, no. 7, pp. 3178–3193, Jul. 2018.
[49] Y. Li, B. Li, D. Liu, and Z. Chen, "A convolutional neural network-based approach to rate control in HEVC intra coding," in Proc. IEEE Vis. Commun. Image Process. (VCIP), St. Petersburg, FL, USA, Dec. 2017, pp. 1–4.
[50] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, arXiv:1409.1556. [Online]. Available: http://arxiv.org/abs/1409.1556
[51] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017.
[52] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[53] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, "Photo-realistic single image super-resolution using a generative adversarial network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, Jul. 2017, pp. 105–114.
[54] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[55] S. Pateux and J. Jung, An Excel Add-in for Computing Bjontegaard Metric and Its Evolution, document VCEG-AE07, Video Coding Experts Group, 2007.

ISMAIL MARZUKI received the B.S. degree in informatics from the UIN Sultan Syarif Kasim Riau, Indonesia, in 2011, and the M.S. degree in computer engineering from Kwangwoon University, Seoul, South Korea, in 2015, where he is currently pursuing the Ph.D. degree. He joined the Image Processing Systems Laboratory (IPSL), in 2013. His research interests are related to high-efficiency video compression (HEVC/x265) techniques, fast coding, rate control, versatile video coding (VVC), and deep learning.

DONGGYU SIM received the B.S. and M.S. degrees in electronic engineering and the Ph.D. degree from Sogang University, South Korea, in 1993, 1995, and 1999, respectively. He was with the Hyundai Electronics Company, Ltd., from 1999 to 2000, being involved in MPEG-7 standardization. He was a Senior Research Engineer with Varo Vision Company, Ltd., working on MPEG-4 wireless applications, from 2000 to 2002. He worked for the Image Computing Systems Laboratory (ICSL), University of Washington, as a Senior Research Engineer, from 2002 to 2005, where he conducted research on ultrasound image analysis and parametric video coding. Since 2005, he has been with the Department of Computer Engineering, Kwangwoon University, Seoul, South Korea. In 2011, he joined Simon Fraser University as a Visiting Scholar. He is one of the main inventors of many essential patents licensed to MPEG-LA for the HEVC standard. His current research interests are video coding, video processing, computer vision, and video communication.