XXX SIMPÓSIO BRASILEIRO DE TELECOMUNICAÇÕES - SBrT’12, SEPTEMBER 13-16, 2012, BRASÍLIA, DF
B-SSIM: Structural Similarity Index for Blurred Videos
José V. de M. Cardoso, Carlos D. M. Regis and Marcelo S. Alencar
Abstract— This paper presents the B-SSIM, a new metric
for video quality assessment based on the structural similarity
index and the spatial perceptual information. It shows a better
correlation with the quality perceived by the human visual
system than the PSNR, SSIM and Multi-Scale SSIM (MS-SSIM) metrics.
Keywords— Video Quality Assessment, Structural Similarity,
Spatial Perceptual Information, Human Visual System.
I. INTRODUCTION
Visual quality is an important factor in video communication systems and services, especially given the increasing demand for, and variety of, video services distributed over the Internet and mobile networks. Compression and transmission introduce a variety of artifacts and distortions in digital video, such as blurring, blocking, Gaussian noise and salt-and-pepper noise, causing loss of visual quality [1].
The most accurate way to determine the quality of a video is through psychophysical experiments with human subjects, known as subjective video quality assessment [2], in which evaluators watch video samples and rate their quality according to individual judgment criteria. However, this approach is complex and costly to implement, and it demands considerable human resources.
An alternative is objective video quality assessment, which uses algorithms to estimate visual quality. This approach is faster and less expensive than subjective testing, and its results can be used to monitor video quality continuously or to select optimal compression parameters. Nevertheless, popular objective metrics such as the PSNR (Peak Signal-to-Noise Ratio) and the MSE (Mean Squared Error) usually do not correlate well with subjective scores, which compromises the reliability of this approach [2].
Currently, one of the most studied objective metrics is the Structural SIMilarity Index (SSIM), based on the premise that the Human Visual System (HVS) is highly adapted to extract structural information from the visual environment. It has brought substantial progress to image and video quality assessment. Nevertheless, this metric is less successful for videos that present blurring distortion [3].
This paper proposes a new approach to objectively assess digital video quality in the presence of blurring distortion, called B-SSIM, based on the SSIM and the Spatial Perceptual Information (SI) of the videos.
II. SSIM: STRUCTURAL SIMILARITY INDEX
The Structural SIMilarity Index (SSIM) is a metric proposed by Wang et al. [4], based on the hypothesis that the HVS is highly sensitive to the structural information of an image, so that a change in this structural information provides a good approximation of the perceived visual quality. Let $f = \{f_i \mid i = 1, 2, 3, \ldots, P\}$ be the original video signal and $h = \{h_i \mid i = 1, 2, 3, \ldots, P\}$ be the distorted video signal. The SSIM is computed from three measures over the pixel luminance plane: the luminance comparison $l(f, h)$, the contrast comparison $c(f, h)$ and the structural comparison $s(f, h)$,
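$$l(f,h) = \frac{2\mu_f\mu_h + C_1}{\mu_f^2 + \mu_h^2 + C_1}, \qquad c(f,h) = \frac{2\sigma_f\sigma_h + C_2}{\sigma_f^2 + \sigma_h^2 + C_2}, \tag{1}$$

$$s(f,h) = \frac{\sigma_{fh} + C_3}{\sigma_f\sigma_h + C_3}, \tag{2}$$

in which $\mu$ is the mean, $\sigma$ is the standard deviation, $\sigma_{fh}$ is the covariance, $C_1 = (0.01 \cdot 255)^2$ and $C_2 = 2C_3 = (0.03 \cdot 255)^2$.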
The structural similarity index is then given by

$$\mathrm{SSIM}(f,h) = [l(f,h)]^{\alpha} \cdot [c(f,h)]^{\beta} \cdot [s(f,h)]^{\gamma}, \tag{3}$$

in which usually $\alpha = \beta = \gamma = 1$ [4].
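As a minimal sketch of Eqs. (1)-(3), the Python code below computes the SSIM of one pair of luminance blocks given as flat lists of pixel values. The function names `block_stats` and `ssim_block` are hypothetical, and the use of population (1/P) estimators for $\sigma$ and $\sigma_{fh}$ is an assumption, since the text does not state which estimator is used. The value $C_3 = C_2/2$ follows from $C_2 = 2C_3$.

```python
import math

# Stabilising constants from the text: C1 = (0.01*255)^2 and C2 = 2*C3 = (0.03*255)^2.
C1 = (0.01 * 255) ** 2
C2 = (0.03 * 255) ** 2
C3 = C2 / 2


def block_stats(f, h):
    """Means, standard deviations and covariance of two equal-length pixel lists.

    Population (1/P) estimators are an assumption; the text does not specify them.
    """
    p = len(f)
    mu_f = sum(f) / p
    mu_h = sum(h) / p
    var_f = sum((x - mu_f) ** 2 for x in f) / p
    var_h = sum((y - mu_h) ** 2 for y in h) / p
    cov = sum((x - mu_f) * (y - mu_h) for x, y in zip(f, h)) / p
    return mu_f, mu_h, math.sqrt(var_f), math.sqrt(var_h), cov


def ssim_block(f, h, alpha=1.0, beta=1.0, gamma=1.0):
    """SSIM of two luminance blocks following Eqs. (1)-(3)."""
    if not f or len(f) != len(h):
        raise ValueError("blocks must be non-empty and of equal size")
    mu_f, mu_h, sd_f, sd_h, cov = block_stats(f, h)
    l = (2 * mu_f * mu_h + C1) / (mu_f ** 2 + mu_h ** 2 + C1)  # luminance, Eq. (1)
    c = (2 * sd_f * sd_h + C2) / (sd_f ** 2 + sd_h ** 2 + C2)  # contrast, Eq. (1)
    s = (cov + C3) / (sd_f * sd_h + C3)                        # structure, Eq. (2)
    return (l ** alpha) * (c ** beta) * (s ** gamma)           # Eq. (3)
```

Identical blocks give an SSIM of 1; a brightness shift lowers only the luminance term, while an inverted block drives the structure term, and hence the SSIM, below zero.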
In practice, the SSIM is computed over an 8 × 8 sliding square window or over an 11 × 11 circular-symmetric Gaussian window; the first approach is used in this paper. For two videos subdivided into D blocks, the SSIM is then computed as

$$\mathrm{SSIM}(f,h) = \frac{1}{D}\sum_{i=1}^{D} \mathrm{SSIM}(f_i, h_i), \tag{4}$$

in which $f_i$ and $h_i$ denote the $i$-th blocks of the original and distorted videos.
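Eq. (4) can be sketched by cutting each frame into 8 × 8 windows, scoring every co-located pair with Eqs. (1)-(3) and averaging the D scores. The text mentions a sliding window but also speaks of D blocks, so the step between windows is left as a parameter: `step=8` (the default here, an assumption) gives non-overlapping blocks and `step=1` a fully sliding window. Frames are 2-D lists of luminance values, pooling all windows of all frames into a single average is also an assumption, and `windows` and `video_ssim` are hypothetical names.

```python
import math

C1 = (0.01 * 255) ** 2
C2 = (0.03 * 255) ** 2
C3 = C2 / 2


def ssim_block(f, h):
    """SSIM of two equal-size pixel lists, Eqs. (1)-(3) with alpha = beta = gamma = 1."""
    p = len(f)
    mf, mh = sum(f) / p, sum(h) / p
    vf = sum((x - mf) ** 2 for x in f) / p
    vh = sum((y - mh) ** 2 for y in h) / p
    cov = sum((x - mf) * (y - mh) for x, y in zip(f, h)) / p
    sf, sh = math.sqrt(vf), math.sqrt(vh)
    l = (2 * mf * mh + C1) / (mf ** 2 + mh ** 2 + C1)
    c = (2 * sf * sh + C2) / (vf + vh + C2)
    s = (cov + C3) / (sf * sh + C3)
    return l * c * s


def windows(frame, size=8, step=8):
    """Yield flattened size x size windows from a 2-D list of luminance values."""
    rows, cols = len(frame), len(frame[0])
    for r in range(0, rows - size + 1, step):
        for col in range(0, cols - size + 1, step):
            yield [frame[r + i][col + j] for i in range(size) for j in range(size)]


def video_ssim(ref_frames, dist_frames, size=8, step=8):
    """Eq. (4): mean SSIM over all D co-located windows of all frame pairs."""
    scores = [
        ssim_block(fw, hw)
        for fr, hr in zip(ref_frames, dist_frames)
        for fw, hw in zip(windows(fr, size, step), windows(hr, size, step))
    ]
    if not scores:
        raise ValueError("frames are smaller than the window")
    return sum(scores) / len(scores)
```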
III. B-SSIM: STRUCTURAL SIMILARITY INDEX FOR BLURRED VIDEOS
The Spatial Perceptual Information (SI) measures the complexity of the spatial details of a video sample; it is higher for more spatially complex samples [5]. To compute it, each video frame $F_n$ is filtered with the horizontal and vertical Sobel operators, $\mathrm{Sobel}(F_n)$, and the standard deviation of the gradient magnitude, $\mathrm{std}[\mathrm{Sobel}(F_n)]$, is calculated for each frame. The highest of these standard deviations is the SI of the video sample:

$$\mathrm{SI} = \max_{n}\{\mathrm{std}[\mathrm{Sobel}(F_n)]\}. \tag{5}$$
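The SI of Eq. (5) can be sketched in the same style. Following the procedure described above, each frame is filtered with the 3 × 3 horizontal and vertical Sobel kernels, the gradient magnitude is formed at every pixel, its standard deviation is taken over the frame, and the maximum over frames is returned. Skipping the one-pixel border and using the population standard deviation are assumptions of this sketch, and `sobel_magnitude` and `spatial_information` are hypothetical names.

```python
import math
import statistics

# 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(frame):
    """Gradient magnitude at the interior pixels of a 2-D luminance frame (border skipped)."""
    rows, cols = len(frame), len(frame[0])
    mags = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    v = frame[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * v
                    gy += SOBEL_Y[i][j] * v
            mags.append(math.hypot(gx, gy))
    return mags


def spatial_information(frames):
    """Eq. (5): maximum over frames of the spatial std of the Sobel magnitude."""
    return max(statistics.pstdev(sobel_magnitude(f)) for f in frames)
```

With a sharp vertical edge and a smoothed version of the same edge, the smoothed frame yields the lower SI, which is the trend the paper exploits.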
Blurring distortion appears as a reduction of edge sharpness and a loss of spatial detail. In real applications, this degradation results from the removal of high-frequency coefficients in the quantization process [1]. Fig. 1 shows an example of this distortion, in which “2-blurred” means two successive applications of a 3 × 3 mean filter to the video and “4-blurred” means four applications.
Fig. 1: Blurring degradation in the “Foreman” video frame: (a) original frame; (b) 2-blurred frame; (c) 4-blurred frame.
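The “2-blurred” and “4-blurred” conditions can be reproduced, under assumptions, by repeatedly applying a 3 × 3 mean filter to each frame. The paper does not say how frame borders were handled; this sketch replicates edge pixels, which is an assumption, and the helper names `mean_filter_3x3` and `k_blurred` are hypothetical.

```python
def mean_filter_3x3(frame):
    """One pass of a 3x3 mean (box) filter; borders use edge replication (an assumption)."""
    rows, cols = len(frame), len(frame[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # Clamp indices so border pixels reuse their nearest neighbours.
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    total += frame[rr][cc]
            row.append(total / 9.0)
        out.append(row)
    return out


def k_blurred(frame, k):
    """Apply the 3x3 mean filter k times ("2-blurred" -> k = 2, "4-blurred" -> k = 4)."""
    for _ in range(k):
        frame = mean_filter_3x3(frame)
    return frame
```

Applied to every frame of a sequence, `k_blurred(frame, 2)` and `k_blurred(frame, 4)` give the two blur levels described above.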
The investigation that led to the proposed method started from the observation that the SI of a video sample is closely related to the amount of blurring distortion: as Fig. 2 shows, an increase in blurring causes a reduction in the SI.
[Figure: Spatial Perceptual Information (SI), axis from 0 to 200, of the “Mother”, “Foreman”, “Glasgow” and “Mobile” videos in their original, 2-blurred and 4-blurred versions.]
Fig. 2: Relationship between SI and the blurring distortion.
TABLE I: Pearson and Spearman Rank-order Correlation Coefficients
Comparing the SI of the original and processed videos therefore provides a good approximation of the quality perceived by the HVS under blurring degradation.
We therefore propose a modification of the SSIM that takes the SI into account when evaluating the quality of videos with this type of degradation. It is called the Blur-inspired Structural Similarity Index (B-SSIM).
First, define the comparison function of the spatial perceptual information, $b(f, h)$, as

$$b(f,h) = \frac{2\,\mathrm{SI}_f\,\mathrm{SI}_h}{\mathrm{SI}_f^2 + \mathrm{SI}_h^2}, \tag{6}$$

in which $\mathrm{SI}_f$ and $\mathrm{SI}_h$ are the spatial perceptual information values of the original and processed videos, respectively. The B-SSIM is then defined as
$$\text{B-SSIM}(f,h) = b(f,h) \cdot \frac{1}{D}\sum_{i=1}^{D} \mathrm{SSIM}(f_i, h_i). \tag{7}$$
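Given the SI of both videos (Eq. (5)) and the per-block SSIM values (Eqs. (1)-(3)), Eqs. (6) and (7) reduce to a few lines. In the sketch below, `si_comparison` and `b_ssim` are hypothetical names. Eq. (6) has no stabilising constant, so it is undefined when both SI values are zero; returning 1.0 in that case is an assumption made here.

```python
def si_comparison(si_f, si_h):
    """Eq. (6): b(f, h) = 2 SI_f SI_h / (SI_f^2 + SI_h^2), which lies in [0, 1] for SI >= 0."""
    denom = si_f ** 2 + si_h ** 2
    if denom == 0:
        # Both videos have zero spatial detail and Eq. (6) is undefined;
        # treating this as "no SI difference" (1.0) is an assumption of this sketch.
        return 1.0
    return 2 * si_f * si_h / denom


def b_ssim(si_f, si_h, block_ssims):
    """Eq. (7): b(f, h) multiplied by the mean SSIM over the D blocks."""
    if not block_ssims:
        raise ValueError("need at least one block SSIM value")
    return si_comparison(si_f, si_h) * sum(block_ssims) / len(block_ssims)
```

Because b(f, h) is symmetric and equals 1 only when both SI values match, any loss of spatial detail scales the mean SSIM down. As a consistency check, if Eq. (7) is applied as written, the values reported for the videos of Fig. 1b and Fig. 1c (B-SSIM/SSIM = 0.6434/0.8241 and 0.4526/0.7276) correspond to b(f, h) ≈ 0.781 and b(f, h) ≈ 0.622, respectively.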
IV. SIMULATION AND RESULTS
For subjective evaluation, the selected videos were “Foreman”, “Glasgow”, “Mobile & Calendar” and “Mother and Daughter”, each presented with two intensity levels of blurring distortion, as shown in Fig. 1. These videos were rated by 40 evaluators using the Absolute Category Rating (ACR) method [5], with scores assigned on a five-point discrete scale.
The performance of the B-SSIM metric was assessed by the correlation between its results and the subjective evaluation scores (Mean Opinion Score, MOS). For the videos shown in Fig. 1b and Fig. 1c, B-SSIM = 0.6434 and SSIM = 0.8241, and B-SSIM = 0.4526 and SSIM = 0.7276, respectively.
These results indicate that, by incorporating the SI into the objective assessment, the B-SSIM is more sensitive to the blurring artifact than the SSIM, so its scores better represent the quality perceived by the HVS. This improvement is shown in Table I, which lists the Pearson Correlation Coefficient (PCC) and the Spearman Rank-order Correlation Coefficient (SROCC) between the MOS and the results of each objective metric. The B-SSIM achieves the highest PCC and SROCC, suggesting that the proposed metric is better able to predict the visual quality perceived by the HVS.
| Model   | PCC   | SROCC |
|---------|-------|-------|
| PSNR    | 0.677 | 0.738 |
| SSIM    | 0.806 | 0.738 |
| MS-SSIM | 0.817 | 0.786 |
| B-SSIM  | 0.905 | 0.881 |
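The MOS, PCC and SROCC used for this comparison can be computed as sketched below, with one objective score and one MOS value per test video. The MOS is the mean of the evaluators' ACR ratings, and the SROCC is the Pearson correlation of the ranks, with tied values given their average rank. Whether a nonlinear mapping was fitted between metric scores and MOS before computing the PCC is not stated in the text; this sketch correlates the raw scores. The names `mos`, `pearson`, `ranks` and `spearman` are hypothetical, and the code does not reproduce the paper's data.

```python
import math


def mos(ratings):
    """Mean Opinion Score: average of the evaluators' ratings (e.g. on a 1-5 ACR scale)."""
    return sum(ratings) / len(ratings)


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank-order correlation (SROCC): PCC computed on the ranks."""
    return pearson(ranks(x), ranks(y))
```

Since the SROCC depends only on ordering, it rewards a metric that ranks the videos like the evaluators do even when its scale is nonlinear in the MOS, which is why both coefficients are reported.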
ACKNOWLEDGMENTS
The authors would like to thank CNPq/PIBITI, UFCG/COPELE, IFPB and Iecom for supporting this research.
REFERENCES
[1] S. Winkler, Digital Video Quality: Vision Models and Metrics. Wiley, 2005.
[2] Z. Wang and A. Bovik, “Mean squared error: Love it or leave it? A new look at signal fidelity measures,” IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98–117, Jan. 2009.
[3] J. V. de Miranda Cardoso, A. C. S. Mariano, C. D. M. Regis, and M. S. Alencar, “Comparação das métricas objetivas baseadas na similaridade estrutural e na sensibilidade ao erro” [Comparison of objective metrics based on structural similarity and on error sensitivity], Revista de Tecnologia da Informação e Comunicação (RTIC), no. 2, pp. 33–40, 2012.
[4] Z. Wang, L. Lu, and A. Bovik, “Video quality assessment using structural distortion measurement,” in Proc. International Conference on Image Processing, vol. 3, 2002, pp. III-65–III-68.
[5] ITU-T, “Recommendation P.910: Subjective video quality assessment methods for multimedia applications,” ITU-T, Tech. Rep., Apr. 2008.