Graph Signal Processing – Part II: Processing and Analyzing Signals on Graphs
Ljubiša Stankovića , Danilo Mandicb , Miloš Dakovića , Miloš Brajovića , Bruno Scalzob , Anthony G. Constantinidesb
a University of Montenegro, Podgorica, Montenegro
b Imperial College London, London, United Kingdom
Abstract
Data analytics on graphs deals with information processing of data acquired on irregular but structured graph domains.
The focus of Part I of this monograph has been on both the fundamental and higher-order graph properties, graph
topologies, and spectral representations of graphs. Part I also establishes rigorous frameworks for vertex clustering
and graph segmentation, and illustrates the power of graphs in various data association tasks. Part II embarks on
arXiv:1909.10325v1 [cs.IT] 23 Sep 2019
these concepts to address the algorithmic and practical issues centered round data/signal processing on graphs, that is,
the focus is on the analysis and estimation of both deterministic and random data on graphs. The fundamental ideas
related to graph signals are introduced through a simple and intuitive, yet illustrative and general enough case study of
multisensor temperature field estimation. The concept of systems on graph is defined using graph signal shift operators,
which generalize the corresponding principles from traditional learning systems. At the core of the spectral domain
representation of graph signals and systems is the Graph Discrete Fourier Transform (GDFT), which is defined based
on the eigendecomposition of both the adjacency matrix and the graph Laplacian. The spectral domain representations
are then used as the basis to introduce graph signal filtering concepts and address their design, including Chebyshev
polynomial approximation series. Ideas related to the sampling of graph signals, and in particular the challenging topic
of data dimensionality reduction through graph subsampling, are presented and further linked with compressive sensing.
The principles of time-varying signals on graphs and basic definitions related to random graph signals are next reviewed.
Localized graph signal analysis in the joint vertex-spectral domain is referred to as the vertex-frequency analysis, since it
can be considered as an extension of classical time-frequency analysis to the graph domain of a signal. Important topics
related to the local graph Fourier transform (LGFT) are covered, together with its various forms including the graph
spectral and vertex domain windows and the inversion conditions and relations. A link between the LGFT with spectral
varying window and the spectral graph wavelet transform (SGWT) is also established. Realizations of the LGFT and
SGWT using polynomial (Chebyshev) approximations of the spectral functions are further considered and supported
by examples. Finally, energy versions of the vertex-frequency representations are introduced, along with their relations
with classical time-frequency analysis, including a vertex-frequency distribution that can satisfy the marginal properties.
The material is supported by numerous examples.
Contents 3.5 Graph Signal Filtering in the Spectral Do-
main of the Adjacency Matrix . . . . . . . . 11
1 Introduction 2 3.5.1 Normalization of the Adjacency Ma-
trix . . . . . . . . . . . . . . . . . . 11
2 Problem Statement: An Illustrative Exam- 3.5.2 Spectral Ordering of Eigenvectors of
ple 3 the Adjacency Matrix . . . . . . . . 11
3.5.3 Spectral Domain Filter Design . . . 12
3 Signals and Systems on Graphs 7
3.5.4 Polynomial (Chebyshev) Approxima-
3.1 Adjacency Matrix and Graph Signal Shift . 8
tion of the System on a Graph Trans-
3.2 Systems Based on Graph Shifted Signals . . 8
fer Function . . . . . . . . . . . . . . 14
3.3 Graph discrete Fourier transform (GDFT),
3.5.5 Inverse System on a Graph . . . . . 15
adjacency matrix based definition . . . . . . 10
3.6 Graph Fourier Transform Based on the Lapla-
3.4 System on a graph in the GDFT domain . . 10
cian . . . . . . . . . . . . . . . . . . . . . . 15
3.7 Ordering and Filtering in the Laplacian Spec-
Email addresses:
[email protected] (Ljubiša Stanković), tral Domain . . . . . . . . . . . . . . . . . . 16
[email protected] (Danilo Mandic),
[email protected] 3.8 Systems on a Graph Defined Using the Graph
(Miloš Daković),
[email protected] (Miloš Brajović), Laplacian . . . . . . . . . . . . . . . . . . . 17
[email protected] (Bruno Scalzo),
[email protected] (Anthony G. Constantinides)
3.9 Convolution of Signals on a Graph . . . . . 18
3.10 The z-transform of a Signal on a Graph . . 20
Preprint submitted to Elsevier Received: date / Accepted: date
3.11 Shift Operator in the Spectral Domain . . . 21 7.4.1 Graph Wavelet Transform Inversion 54
3.12 Parseval’s Theorem on a Graph . . . . . . . 21 7.5 Vertex-Frequency Energy Distributions . . . 55
3.13 Optimal Denoising . . . . . . . . . . . . . . 21 7.5.1 Smoothness Index and Local Smooth-
3.14 Systems on a Graph Defined Using Random ness . . . . . . . . . . . . . . . . . . 55
Walk Laplacian . . . . . . . . . . . . . . . . 22 7.5.2 Reduced Interference Distributions
(RID) on Graphs . . . . . . . . . . . 57
4 Subsampling, Compressed Sensing, and Re- 7.5.3 Reduced Interference Distribution Ker-
construction 23 nels . . . . . . . . . . . . . . . . . . 57
4.1 Subsampling of Low-Pass Graph Signals . . 24
4.2 Subsampling of Sparse Graph Signals . . . . 25 8 Conclusion 58
4.2.1 Known Coefficient Positions in GDFT 25
4.2.2 Support Matrices, Subsampling and 9 Bibliography 58
Upsampling . . . . . . . . . . . . . . 25
4.2.3 Unknown Coefficient Positions . . . 26 1. Introduction
4.2.4 Unique Reconstruction Conditions . 26
4.3 Measurements as Linear Combinations of Graphs are irregular structures which naturally account
Samples . . . . . . . . . . . . . . . . . . . . 27 for data integrity, however, traditional approaches have
4.4 Aggregate Sampling . . . . . . . . . . . . . 28 been established outside Machine Learning and Signal Pro-
4.5 Filter Bank on a Graph . . . . . . . . . . . 28 cessing, and largely focus on analyzing the underlying graphs
rather than dealing with signals on graphs. On the other
5 Time-Varying Signals on Graphs 32 hand, given the rapidly increasing availability of multisen-
5.1 Diffusion on Graph and Low Pass Filtering 32 sor and multinode measurements, likely recorded on irreg-
5.2 Taubin’s α − β algorithm . . . . . . . . . . 33 ular or ad-hoc grids, it would be extremely advantageous
to analyze such structured data as “signals on graphs” and
6 Random Graph Signals 33
thus benefit from the ability of graphs to account for spa-
6.1 Review of WSS and related Properties for
tial sensing awareness, physical intuition and sensor im-
Random Signals in Standard Time Domain 34 portance, together with the inherent “local versus global”
6.2 Adjacency Matrix Based Definition of GWSS 34 sensor association. The aim of Part II of our monograph is
6.3 Wiener filter on a graph . . . . . . . . . . . 35
therefore to establish a common language between graph
6.4 Spectral Domain Shift Based Definition of
signals which are observed on irregular signal domains,
GWSS . . . . . . . . . . . . . . . . . . . . . 35
and some of the most fundamental paradigms in Learn-
6.5 Isometric Shift Operator . . . . . . . . . . . 35 ing Systems, Signal Processing and Data Analytics, such
7 Vertex-Frequency Representations 36 as spectral analysis, system transfer function, digital filter
design, parameter estimation, and optimal denoising.
7.1 Localized Graph Fourier Transform (LGFT) 37
7.1.1 Windows Defined in the GDFT Do- In classical Data Analytics and Signal Processing, the
main . . . . . . . . . . . . . . . . . . 37 signal domain is determined by equidistant time instants
7.1.2 Spectral Domain Localization of the or by a set of spatial sensing points on a uniform grid.
LGFT . . . . . . . . . . . . . . . . . 40 However, increasingly the actual data sensing domain may
7.1.3 LGFT Realization with Band-Pass not even be related to the physical dimensions of time
and/or space, and it typically does exhibit various forms
Functions . . . . . . . . . . . . . . . 40
7.1.4 Signal Adaptive LGFT . . . . . . . . 43 of irregularity, as, for example, in social or web-related
7.1.5 Polynomial LGFT Approximation . 45 networks, where the sensing points and their connectiv-
7.1.6 The Spectral Graph Wavelet Trans- ity pertain to specific objects/nodes and ad-hoc topology
form . . . . . . . . . . . . . . . . . . 45 of their links. It should be noted that even for the data
7.1.7 Windows Defined Using the Vertex acquired on well defined time and space domains, the in-
Neighborhood . . . . . . . . . . . . . 49 troduction of new relations between the signal samples,
7.1.8 Window Parameter Optimization . . 50 through graphs, may yield new insights into the analysis
7.2 Inversion of the LGFT . . . . . . . . . . . . 50 and provide enhanced data processing (for example, based
7.2.1 Inversion by the Summation of the on local similarity, through neighborhoods). We therefore
LGFT . . . . . . . . . . . . . . . . . 51 set out to show that the advantage of graphs over classical
7.2.2 Inversion of the LGFT with Band- data domains is that graphs account naturally and com-
Pass Functions . . . . . . . . . . . . 52 prehensively for irregular data relations in the problem
7.2.3 Kernel-Based Inversion . . . . . . . 52 definition, together with the corresponding data connec-
7.2.4 Vertex-Varying Filtering . . . . . . . 52 tivity in the analysis [1, 2, 3, 4, 5, 6, 7, 8].
7.3 Uncertainty Principle for Graph Signals . . 53 To build up the intuition behind the fundamental ideas
7.4 Graph Spectrogram and Frames . . . . . . . 53 of signals/data on graph, a simple yet general example of
multisensor temperature estimation is first considered in
2
Section 2. Basic concepts regarding the signals and sys-
tems on graphs are presented in Section 3, including basic 11
6
definitions, operations and transforms, which generalize
the foundations of traditional signal processing. Systems 12
on graphs are interpreted starting from a comprehensive 10 4
account of the existing and the introduction of a novel, iso- 9
metric, graph signal shift operator. Further, graph Fourier 5 14
transform is defined based on both the adjacency matrix 13
and the graph Laplacian and it serves as the basis to in- 0
1
troduce graph signal filtering concepts. Various ideas re- 8 2
lated to the sampling of graph signals, and particularly, 3
the challenging topic of their subsampling, are reviewed 15
in Section 4. Sections 5 and 6 present the concepts of
time-varying signals on graphs and introduce basic defi-
7
nitions related to random graph signals. Localized graph
signal behavior can be simultaneously characterized in the
vertex-frequency domain, which is discussed in Section
(a)
7. This Section also covers the important topics of lo-
cal graph Fourier transform, various forms of its inversion,
40
relations with the frames framework and links with the
graph wavelet transform. Energy versions of the vertex- 30
frequency representations are also considered, along with
20
their relations with classical time-frequency analysis.
10
2. Problem Statement: An Illustrative Example 0
0 2 4 6 8 10 12 14 16
Consider a multi-sensor setup for measuring a temper- Sensor index (b)
ature field in a region of interest. The temperature sensing
locations are chosen according to the significance of a par- 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 (c)
ticular geographic area to local users, with N = 16 sensing
Figure 1: Temperature sensing as a classic data analytics problem.
points in total, as shown in Fig. 1(a). The temperature (a) Sensing locations in a geographic region along the Adriatic sea.
field is denoted by {x(n)}, with n as the sensor index, while (b) Temperatures measured at N = 16 sensing locations. In standard
a snapshot of its values is given in Fig. 1(b). Each mea- data estimation, the spatial sensor index is used for the horizontal
axis and serves as the data domain. This domain can be interpreted
sured sensor signal can then be mathematically expressed
as a directed path graph structure, shown in the bottom panel (c).
as Observe that the consecutive samples (vertices) on this path graph
x(n) = s(n) + ε(n), n = 0, 1, . . . , 15, (1) offer no physical intuition or interpretation, as in this “brute force”
arrangement, for example, vertex 6 is located on a high mountain,
where s(n) is the true temperature that would have been whereas its neighboring vertices 5 and 7 are located along the sea;
obtained in ideal measuring conditions and ε(n) comprises despite the consecutive index numbers these sensors are physically
the adverse effects of the local environment on sensor read- distant, as indicated by their very different temperature measure-
ments.
ings or faulty sensor activity, and is referred to as “noise”
in the sequel. For illustrative purposes, in our study each
ε(n) was modeled as a realization of white, zero-mean, a more complex estimation structure than the standard
Gaussian process, with standard deviation σε = 2, that is, linear one corresponding to the classical signal processing
ε(n) ∈ N (0, 4). It was added to the signal, s(n), to yield framework, shown in Fig. 1(b).
the signal-to-noise ratio in x(n) of SN Rin = 14.2 dB. To introduce a “situation-aware” noise reduction scheme
Remark 1: Classical data analytics require a rearrange- for the temperature field in Fig. 1, we proceed to ex-
ment of the quintessentially irregular spatial temperature plore a graph-theoretic framework to this problem, start-
sensing arrangement in Fig. 1(a) into a linear structure ing from a local signal average operator. In classical anal-
shown in Fig. 1(b). Obviously, such “lexicographic” or- ysis, this may be achieved through a moving average oper-
dering is not amenable to exploiting the information re- ator, e.g., by averaging across the neighboring data sam-
lated to the actual sensor locations, which is inherently ples, or equivalently neighboring sensors in the linear data
dictated by the terrain. This renders classical analyses of setup in Fig. 1(b), and for each sensing point. Physically,
this multisensor temperature field inapplicable (or at best such local neighborhood should include close neighboring
suboptimal), as the performance critically depends on the sensing points but only those which also exhibit similar
chosen sensor ordering scheme. This exemplifies that even meteorological properties defined by the sensor distance,
a most routine multisensor measurement setup requires altitude difference, and other terrain specific properties.
3
by X
6
11 y(n) = x(m),
m at and around n
12
so that the local average temperature for a sensing point n
10 4 may be obtained by dividing the cumulative temperature,
9 y(n), with the number of included sensing points (size of
5 14
local neighborhood). For example, for the sensing points
13 n = 3 and n = 6, presented in Fig. 2(a), the “domain
1
0 knowledge aware” local estimation takes the form
8 2
3 y(3) = x(3) + x(0) + x(14) + x(15) (2)
15 y(6) = x(6) + x(9) + x(10). (3)
For convenience, the full set of relations among the sensing
7 points can now be arranged into a matrix form, to give
y = x + Ax, (4)
(a)
where the adjacency matrix A, given in (5), indicates
the connectivity structure of the sensing locations; this
11
local connectivity structure should be involved in the cal-
6 culation of each y(n).
12 This simple real-world example can be interpreted within
10 the graph signal processing framework as follows:
4 • Sensing points where the signal is measured are des-
9 ignated as the graph vertices, as in Fig. 1,
14
5 13
• Vertex-to-vertex lines which indicate physically mean-
ingful connectivity among the sensing points become
8 2 0
1 the graph edges, as in Fig. 2(a),
3
• The vertices and edges form a graph, as in Fig.
15 2(b), a new very structurally rich signal domain,
7
• The graph, rather than a standard vector of sens-
ing points, is then used for analyzing and processing
data, as it exhibits both spatial and physical domain
(b) awareness,
Figure 2: Temperature sensing setup as a graph signal estimation • The measured temperatures are now interpreted as
problem. (a) Local neighborhood for the sensing points n = 3, 6, signal samples on graph, as shown in Fig. 3,
and 8. These neighborhoods are chosen using “domain knowledge”
dictated by the local terrain and by taking into account the sensor • Similar to traditional signal processing, this new graph
distance and altitude. Neighboring sensors for each of these sensing
signal may have many realizations on the same graph
locations (vertices) are chosen in a physically meaningful way and
their relation is indicated by the connectivity lines, that is, graph and may comprise noise,
edges. (b) Local neighborhoods for all sensing vertices, presented in
a graph form. • Through relation (4), we have therefore introduced
a simple system on a graph for physically and
spatially aware signal averaging (a linear first-order
In other words, since the sensor network in Fig. 1 mea- system on a graph).
sures a set of related temperatures from irregularly spaced
sensors, an effective estimation strategy should include do- To emphasize our trust in a particular sensor (i.e., to
main knowledge – not possible to achieve with standard model sensor relevance), a weighting scheme may be im-
methods (linear path graph). posed, in the form
To illustrate the advantages of approaches based on lo- X
y(n) = x(n) + Wnm x(m), (8)
cal information (neighborhood based) , consider the neigh-
m6=n
borhoods for the sensing points n = 3 (low land), n = 6
(mountains), and 8 (coast), shown in Fig. 2(a). The cu- where Wnm are the elements of the weighting matrix, W.
mulative temperature for each sensing point is then given There are three classes of approaches to the definition
of graph edges and their corresponding weights, Wnm :
4
0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 1
1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0
2 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 0
3
1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
4
1 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0
5
0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0
6
0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0
7 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0
A= (5)
8
0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0
9
0 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0
10
0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0
11
0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0
12
0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0
13
1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0
14 1 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0
15 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 0.97 0.91 0 0 0 0 0 0 0 0 0.05 0.90 0.94
1 0 0 0.03 0 0 0 0 0 0 0.37 0 0 0 0.78 0 0
2 0 0.03 0 0 0 0.96 0 0.95 0.98 0 0 0 0 0 0 0
3 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0.88 0.96
4 0.91 0 0 0 0 0 0 0 0 0 0.06 0.01 0.01 0 0.94 0
5 0 0 0.96 0 0 0 0 0 0.97 0.01 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0.62 0.40 0 0 0 0 0
7 0 0 0.95 0 0 0 0 0 0.94 0 0 0 0 0 0 0
W= (6)
8 0 0 0.98 0 0 0.97 0 0.94 0 0 0 0 0 0 0 0
9 0 0.37 0 0 0 0.01 0.62 0 0 0 0.25 0 0 0 0 0
10
0 0 0 0 0.06 0 0.40 0 0 0.25 0 0 0 0.85 0 0
11
0 0 0 0 0.01 0 0 0 0 0 0 0 0.92 0 0 0
12
0 0 0 0 0.01 0 0 0 0 0 0 0.92 0 0 0.01 0
13
0.05 0.78 0 0 0 0 0 0 0 0 0.85 0 0 0 0 0
14 0.90 0 0 0.88 0.94 0 0 0 0 0 0 0 0.01 0 0 0
15 0.94 0 0 0.96 0 0 0 0 0 0 0 0 0 0 0 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
3.77 0 0 −0.97 −0.91 0 0 0 0 0 0 0 0 −0.05 −0.90 −0.94
0 1.19 −0.03 0 0 0 0 0 0 −0.37 0 0 0 −0.78 0 0
0 −0.03 2.93 0 0 −0.96 0 −0.95 −0.98 0 0 0 0 0 0 0
−0.97 0 0 2.81 0 0 0 0 0 0 0 0 0 0 −0.88 −0.96
−0.91 0 0 0 1.94 0 0 0 0 0 −0.06 −0.01 −0.01 0 −0.94 0
0 0 −0.96 0 0 1.94 0 0 −0.97 −0.01 0 0 0 0 0 0
0 0 0 0 0 0 1.02 0 0 −0.62 −0.40 0 0 0 0 0
0 0 −0.95 0 0 0 0 1.89 −0.94 0 0 0 0 0 0 0
L = (7)
0 0 −0.98 0 0 −0.97 0 −0.94 2.89 0 0 0 0 0 0 0
0 −0.37 0 0 0 −0.01 −0.62 0 0 1.24 −0.25 0 0 0 0 0
0 0 0 0 −0.06 0 −0.40 0 0 −0.25 1.56 0 0 −0.85 0 0
0 0 0 0 −0.01 0 0 0 0 0 0 0.93 −0.92 0 0 0
0 0 0 0 −0.01 0 0 0 0 0 0 −0.92 0.94 0 −0.01 0
−0.05 −0.78 0 0 0 0 0 0 0 0 −0.85 0 0 1.68 0 0
−0.90 0 0 −0.88 −0.94 0 0 0 0 0 0 0 −0.01 0 2.74 0
−0.94 0 0 −0.96 0 0 0 0 0 0 0 0 0 0 0 1.91
5
11 weight elements, Wnm , for the neighboring vertices are cal-
6
culated based on the horizontal vertex distance, rmn , and
12 the altitude difference, hmn , as
10
4 Wmn = e−αrmn −βhmn , (9)
9
where α and β are suitable constants. The so obtained
14 weight matrix, W, is given in (6).
5 13
Based on (4), a weighted graph signal estimator of cu-
0 mulative temperature now becomes
8 2 1
y = x + Wx. (10)
3
15 In order to produce unbiased estimates, instead of the cu-
mulative sums in (4) and (8), the weighting coefficients
7
within the estimate for each y(n) should sum up to unity.
This can be achieved through a normalized form of (10),
given by
1
6
11 y = (x + D−1 Wx), (11)
2
12 where the elements of the diagonal normalization matrix,
10 D,
P are equal to the −1
the degree matrix elements, Dnn =
4 m W nm , while D W is a random walk (diffusion)
9
shift operator [9, 10].
13
14 Now that we have defined the graph vertices and edge
5
weights we may resort to the data-agnostic clustering ap-
2
proaches, given in Part I - Section 4.3, to cluster the ver-
8 0
1 tices of this graph based on the graph topology. Fig. 4
3 shows the clustering result obtained based on the three
smoothest eigenvectors, u1 , u2 , and u3 (excluding the
15
constant eigenvector, u0 ), of the graph Laplacian matrix,
7 L = D − W, of which the values are given in (7). No-
11
6
12
Figure 3: From a multi-sensor temperature measurement to a graph
signal. The temperature field is represented on a graph that combines 10
the spatially unaware measurements in Fig. 1(b) and the physically
4
relevant graph topology in Fig. 2(b). The graph signal values are
represented in two ways: (top) by vertical lines for which the length 9
14
is proportional to the signal values, and (bottom) by using a “hot” 5 13
colormap to designate the signal values at the vertices.
8 2 0
1
• Already physically well defined edges and weights,
3
• Definition of edges and weights based on the geom- 15
etry of vertex positions,
7
• Data similarity based methods for learning the un-
derlying graph topology.
All three approaches to define the edge weights are covered
in detail in Part III of this monograph. Figure 4: Clustering of the graph from Fig. 2(b) based on the graph
Since in our case of geographic temperature measure- Laplacian eigenvectors, u1 , u2 , and u3 . Observe the correct clus-
tering of the graph into the clusters that belong to the seaside area
ments, the graph weights do not belong to the class of
(blue), low mountains (red), low land (yellow), and high mountains
obvious and physically well defined edges and weights, we (green).
will employ the “geometry of the vertices” based approach
for the definition of the edges and weights. In this way, the tice that even such a simple graph clustering scheme was
6
x(5)
2
x(4) x(6)
x(0) x(3) 3 1
x(1) x(2) x(7)
0 1 2 3 4 5 6 7 4 0
Figure 5: Directed path graph representation of a classical time-
domain signal defined on an equidistant discrete-time grid.
5 7
capable of identifying different physically meaningful geo- 6
graphic regions. This also means that temperature estima-
(a)
tion can roughly be performed within each cluster, which
may even be treated as an independent graph (see graph x(3)
x(2)
segmentation and graph cuts in Part I, Section 4), rather x(4) x(1)
than over the whole sensor network. x(0)
x(5) 2
The above-introduced graph data estimation frame- 3 1
work is quite general and admits application to many dif- x(6)
ferent scenarios where, after identifying a suitable graph 4 x(7) 0
topology, we desire to perform estimation on data acquired
on such graphs, the subject of this part of the monograph. 5 7
6
3. Signals and Systems on Graphs
(b)
In classical data analytics, a signal is sampled at suc-
cessive, equally spaced, time instants. This then dictates Figure 6: Graph representation of periodic data. (a) A directed cir-
cular graph. (b) A periodic signal measured on a circular graph. Sig-
the ordering of signal samples, with x(n) being preceded
nal values, x(n), are designated by vertical lines at the corresponding
by x(n−1) and succeeded by x(n+1). The “time distance” vertex, n.
between data samples is therefore an inherent parameter
in standard data processing algorithms. The relation be-
tween sampling instants can also be represented in a graph 2
form, whereby the vertices that correspond to the instants
when the signal is sampled and the corresponding edges 3 1
define the linear sampling (vertex) ordering. The equally
spaced nature of sampling instants in classical scenarios
can then be represented with equal weights for all edges 4 0
(for example, normalized to 1), as shown in Fig. 5.
Algorithms defined in discrete time (like, for example,
those based on the DFT or other similar data transforms), 5 7
usually assume periodicity of the analyzed signals, which
means that sample x(N − 1) is succeeded by sample x(0), 6
in a perpetual sequence. Notice that this case corresponds (a)
to the circular graph, shown in Fig. 6, which allows us to
use this model in many standard data transforms, such as
the DFT, DCT, wavelets, and to define graph-counterparts 2
of other processing algorithms, based on these transforms. 3 1
A signal on general (including also circular) undirected
graph is defined by associating real (or complex) data val- 4 0
ues, x(n), to each vertex, as shown in Fig. 7 and Fig. 8.
Such signal values can be arranged in a vector form 5 7
6
x = [x(0), x(1), . . . , x(N − 1)]T , (b)
so that a graph may be considered as a generalized signal
Figure 7: Undirected circular graph (a) and signal on the graph (b).
domain. Signal values, x(n), are presented as vertical lines at the correspond-
This allows, in general, for any linear processing scheme ing vertex, n.
for a graph signal observed at a vertex, n, to be defined
7
0 1 2
3 1
2
3 0
4
4
5 7
6 5 6
7
(a)
(a) 2
3 1
0 1
2 4 0
3
4
5 7
6
7
5 6
(b)
(b) Figure 9: Graph shift operator on a directed graph (classical circular
shift). (a) Elements of a signal, x, shown as red lines on a directed
Figure 8: Arbitrary undirected graph (a) and signal on graph (b). circular graph. (b) The shifted version, Ax, of the graph signal from
Signal values, x(n), are presented as vertical lines at the correspond- (a). The adjacency matrix of for this graph is given in (14) in Part
ing vertex, n. I.
as a linear combination of the signal value, x(n), at this The original signal, x, is shown in Fig. 9(a), and its shifted
vertex and the signal samples, x(m), at the neighboring version, x1 , in Fig. 9(b). Another simple signal on the undi-
rected graph from Fig. 8 (a) is presented in Fig. 10(a), with
vertices, that is
its shifted version, x1 = Ax, shown in Fig. 10(b).
X
y(n) = x(n)h(n, n) + x(m)h(m, n), (12) A signal shifted by two graph shifts is obtained by fur-
m∈Vn ther shifting x1 = Ax by one shift. The resulting, twice
shifted, graph signal is then given by
where Vn is the set of vertices in the neighborhood of ver-
tex n, and h(m, n) the scaling coefficients. x2 = Ax1 = A(A x) = A2 x.
Remark 2: The estimation form in (12) is highly vertex-
dependent; it is vertex-invariant only in a very specific Therefore, in general, an m times shifted signal on
case of regular graphs, where Vn is a K-neighborhood of graph is given by
the vertex n, with h(n, m) = h(n − m).
xm = Axm−1 = Am x.
We now proceed to define various forms of vertex-invariant
filtering functions, using shifts on a graph. These will then Remark 3: Like the standard shift operator, the second
be used to introduce efficient graph signal processing meth- order shift of a graph signal is obtained by shifting the
ods [11, 12, 13, 14, 15, 16, 17]. already once shifted signal. The role of the shift operator
is assumed by the adjacency matrix, A.
3.1. Adjacency Matrix and Graph Signal Shift
Consider a graph signal, x, for which x(n) is the ob- 3.2. Systems Based on Graph Shifted Signals
served sample at a vertex n. A signal shift on a graph can
Very much like in standard linear shift-based systems,
be defined as movement of the signal sample, x(n), from
a system on a graph can be implemented as a linear combi-
its original vertex, n, along all walks of length one, that is
nation of a graph signal, x, and its graph shifted versions,
K = 1, that start at vertex n. If the signal shifted in this
Am x, m = 1, 2, . . . , M − 1. The output signal from a
way is denoted by x1 , then its values can be defined using
system on a graph can then be written as
the graph adjacency matrix, A, as
M −1
x1 = Ax. (13)
X
y = h0 A0 x+h1 A1 x+· · ·+hM −1 AM −1 x = hm Am x
m=0
Example 1: As an illustration of a graph signal and its shifted (14)
version, consider the signal on a circular graph from Fig. 6(a).
8
0
1
0
2 1
3 4
2
6 7 5 (a) 3 4
6 7 5
0
1
2 input signal x
3 4 (a)
6 7 5 (b)
Figure 10: Graph signal shift on an undirected graph. (a) A simple
signal, x, on an undirected graph. (b) Shifted version, Ax, of the 0
graph signal from (a). 1
2
where A0 = I, by definition, and h0 , h1 , . . . , hM −1 are 3 4
the system coefficients. For a circular (classical linear sys- 7
tem) graph, this relation reduces to the well known Finite 6 5
Impulse Response (FIR) filter, given by,
y(n) = h0 x(n)+h1 x(n−1)+· · ·+hM −1 x(n−M +1). (15)
output signal y
Keeping in mind that the matrix Am describes walks (b)
of the length K = m in a graph (see Property M2 in Figure 11: Example of vertex domain signal filtering. (a) An arbi-
Part 1), the output graph signal, y(n), is calculated as a trary graph signal. (b) The output signal obtained through a first-
linear combination of the input graph signal values and the oder (averaging) system on a graph, defined as y = x + 0.5 Ax.
signal values observed at vertices belonging to the (M −1)-
neighborhood of the considered vertex n. also provided.
Remark 4: When the minimal and characteristic poly- Example 2: Consider a signal on graph from Fig. 8(a), given
nomial are of the same degree, a physically meaningful in Fig. 11(a), and a linear system which operates on this graph,
system order (M − 1) should be lower than the number of defined by the coefficients h0 = 1, h1 = 0.5. Observe that this
vertices N , that is is, M ≤ N . The corresponding con- system on a graph corresponds to a simple classical first-order
dition in classical signal analysis would that the number, weighted moving average system. The output graph signal then
M , of the system impulse response coefficients, hm , in (15) represents a weighted average of the signal value at a vertex n
should be lower or equal to the total number of signal sam- and the signal values at its K = 1 neighborhood. The output
ples, N (for the graph in Fig. 9 it means that the mean- graph signal is shown in Fig. 11(b).
ingful graph signal shifts are m = 0, 1, 2, . . . , N − 1, since General system on graph. A system on a graph may
the shift for m = N reduce to the shift for m = 0, the shift be defined in the vertex domain as
for m = N + 1 is equivalent to the shift for m = 1, and
y = H(A)x, (16)
so on). Therefore, in general, the system order (M − 1)
should be lower than the degree Nm of the minimal poly- where H(A) is a vertex domain system (filter) function.
nomial of the adjacency matrix A. For more detail see A system on a graph is then linear and shift invariant if it
Part I, Section 3.1. satisfies the following properties of:
Remark 5: Any system of order M − 1 ≥ Nm can be 1. Linearity
reduced to a system of order Nm − 1.
H(A)(a1 x1 + a2 x2 ) = a1 y1 + a2 y2 .
Remark 6: If the system order is greater than or equal
to the degree of the minimal polynomial, M − 1 ≥ Nm , 2. Shift invariance
then there exist more than one system producing the same
output signal for a given input signal. All such systems on H(A)[Ax] = A[H(A)x] = Ay.
a graph are called equivalent.
Remark 7: A system on a graph defined by
The statements in the last three remarks will be ad-
dressed in more detail in Section 3.5.3, with their proofs H(A) = h0 A0 + h1 A1 + · · · + hM −1 AM −1 (17)
9
is linear and shift invariant since AAm = Am A. 3.4. System on a graph in the GDFT domain
Consider a general system on a graph defined in (17),
3.3. Graph discrete Fourier transform (GDFT), adjacency
matrix based definition y = H(A)x = h0 A0 + h1 A1 + · · · + hM −1 AM −1 x.
Classical exploratory data analysis often employs esti- (22)
mation of signals in the spectral (Fourier) domain; this has
led to a number of simple and efficient algorithms. While Upon employing the spectral representation of the adja-
standard spectral analysis employs an equidistant grid in cency matrix, A = UΛU−1 , we have
both time and frequency, following the ideas of a system
on a graph, we next show that spectral domain represen- y = h0 UΛ0 U−1 + h1 UΛ1 U−1 + · · · + hM −1 UΛM −1 U−1 x
tations of graph signals are naturally based on spectral
decompositions of the adjacency matrix or graph Lapla- = U(h0 Λ0 + h1 Λ1 + · · · + hM −1 ΛM −1 )U−1 x
cian. = U H(Λ)U−1 x, (23)
The graph Fourier transform of a signal, x, is defined
as with the system on a graph transfer function
X = U−1 x (18)
H(Λ) = h0 Λ0 + h1 Λ1 + · · · + hM −1 ΛM −1 , (24)
where X denotes a vector of the GDFT coefficients, and
U is a matrix whose columns represent the eigenvectors where Λ is the matrix of eigenvalues of A.
of the adjacency matrix, A. Denote the elements of the A pre-multiplication of this relation with U−1 , yields
vector X by X(k), for k = 0, 1, . . . , N − 1, and recall that
for undirected graphs, the adjacency matrix is symmetric, U−1 y = H(Λ)U−1 x (25)
that is, AT = A, and that the eigenmatrices of a symmet-
ric matrix satisfy the property From (18), the terms U−1 y and U−1 x are respectively
the GDFTs of the output graph signal, y, and the input
U−1 = UT . graph signal, x, so that the spectral domain system on a
graph relation becomes
The element, X(k), of the graph Fourier transform vec-
tor, X, therefore represents a projection of the considered Y = H(Λ) X, (26)
graph signal, x(n), onto the k-th eigenvector of A (a basis
function), given by The output graph signal in the vertex domain can then be
calculated as
N
X −1
X(k) = x(n)uk (n). (19) y = H(A)x = IGDFT{H(Λ) X}. (27)
n=0
The element-wise form of the system on a graph in (26) is
In this way, the graph discrete Fourier transform can be of the form
interpreted as a set of projections (signal decomposition)
−1
onto the set of eigenvectors, u0 , u1 , . . . , uN −1 , which serve Y (k) = (h0 + h1 λk + · · · + hM −1 λM
k )X(k),
as orthonormal basis functions.
The inverse graph discrete Fourier transform is then where λk denotes the kth eigenvalue of the adjacency ma-
straightforwardly obtained from (18) as trix, A. From (24) and the above equation, we can now
define the transfer function of a system on a graph in the
x = U X, (20) form
or element-wise Y (k) −1
H(λk ) = = h0 + h1 λk + · · · + hM −1 λM
k . (28)
X(k)
N
X −1
x(n) = X(k)uk (n). (21) Remark 8: The classical linear system in (15) can be ob-
k=0
tained directly from its graph counterpart in (28) when
Observe that, for example, for a circular graph from the graph is directed and circular. This is because the
Fig. 6, the graph discrete Fourier transform pair in (19) adjacency matrix of a directed circular graph has eigen-
and (21) reduces to the standard discrete Fourier trans- values λk = e−j2πk/N (see Part I, Section 3.2.1 for more
form (DFT) pair. For this reason, the transform in (19) detail on directed circular graphs), which are identical to
and its inverse in (21) are referred to as the graph discrete the samples on the unit circle in classical DFT.
Fourier transform (GDFT) and the inverse graph discrete Similar to the z-transform in classical signal processing,
Fourier transform (IGDFT). for systems on graphs we can also introduce the system
transfer function in the z-domain .
10
The z-domain transfer function of a system on a graph with the normalized adjacency matrix in (31) in the same
is defined as way as with the original adjacency matrix. An important
property which does not apply to standard adjacency ma-
H(z −1 ) = Z{hn } = h0 + h1 z −1 + · · · + hM −1 z −(M −1) , trices is that the normalization of adjacency matrix yields
(29) a simpler eigenvector and eigenvalue ordering scheme, as
for n = 0, 1, . . . , M − 1. Obviously, from (28), we have shown next.
H(λk ) = H(z −1 )|z−1 =λk . 3.5.2. Spectral Ordering of Eigenvectors of the Adjacency
However, the definition of the z-transform for arbitrary Matrix
graph signals, x(n) and y(n), that would satisfy the re- For physically meaningful low-pass and high-pass filter-
lation Y (z −1 ) = H(z −1 )X(z −1 ) is not straightforward, ing on a graph, we need to establish the notion of spectral
which limits the application of the z-transform on graphs. order. This, in turn, requires a criterion to classify the
This will be discussed in more detail in Section 3.10. eigenvectors (corresponding to the GDFT basis functions)
into the slow-varying and fast-varying ones.
3.5. Graph Signal Filtering in the Spectral Domain of the Remark 10: In classical Fourier analysis, the basis func-
Adjacency Matrix tions are ordered according to their frequency, whereby, for
The energy of a graph shifted signal is given by example, low-pass (slow varying) basis functions are har-
monic functions characterized by low frequencies. On the
2 2
kx1 k2 = kAxk2 . other hand, the notion of frequency of the eigenvectors of
the graph adjacency matrix, which serve as a basis for for
However, as shown in Fig. 10, in general, the energy of a signal decomposition, is not defined and we have to find
shifted signal is not the same as the energy of the original another criterion to classify or rank order the eigenvectors.
signal, that is Again, we draw the inspiration from classical Fourier anal-
2 2
kAxk2 6= kxk2 . ysis which suggests that the energy of the “signal change”
On the other hand, in graph signal processing it is often can be used instead of frequency to indicate the rate of
desirable that a graph shift does not increase signal energy. change of an eigenvector along time.
One such graph shift operator is introduced bellow. Energy of signal change. The first graph difference can
Remark 9: Using the matrix two-norm it is straightfor- be defined for graph signals as a difference of the original
ward to show that the ratio of energies of the graph shifted graph signal and its graph shift, that is,
signal, Ax, and the original graph signal, x, satisfies the
∆x = x − x1 = x − Anorm x.
relation
kAxk2
2
xT AT Ax In analogy to classical analysis, the energy of signal change
max{ 2 } = max{ 2 } = λ2max . (30) can then be defined as the energy of the first difference of
kxk2 kxk2 a graph signal x, and takes the form
where λmax = maxk |λk |, k = 0, 1, . . . , N − 1.
2
2
1
E∆x = kx − Anorm xk2 =
x −
Ax
.
λmax 2
3.5.1. Normalization of the Adjacency Matrix
From (30), for the energy of a graph shifted signal, When the graph signal assumes a specific form of an
2
kAxk2 , not to exceed the energy of the original graph sig- eigenvector, x = u, of the adjacency matrix, A, the energy
2 of this eigenvector change is equal to
nal, kxk2 , we may employ the normalized adjacency ma-
trix, defined as
2 2
1
1
= 1 − λ ,
Anorm = A (31) E∆u =
u − λu (32)
λmax
λmax
2
λmax
as a graph shift operator within any system on a graph.
whereby the normalized adjacency matrix, Anorm , is used
While this kind of normalization still does not make the
to bound the energy of the shifted graph signal. In the
shift on a graph isometric, the energy of the signal shifted 2
derivation we have also used Au = λu and kuk2 = 1.
in this way is guaranteed not to be bigger than the energy
Now, the lower values of E∆u indicate that u is slow-
of the original graph signal, since
varying, E∆u = 0 indicates that the signal is constant,
2
kAnorm xk2 ≤ kxk2 .
2 while larger values of E∆u are associated with fast changes
of u in time. The form in (32) is also referred to as the
The equality holds only for a very specific signal which is two-norm total variation of a basis function/eigenvector.
proportional to the eigenvector that corresponds to λmax . Therefore, if the change in a basis function, u, has a large
The basic shift on a graph, system on a graph, and energy, then the eigenvector, u, can be considered to be-
graph spectral domain representations can be implemented long to the higher spectral content of the graph signal.
11
Remark 11: From (32), the energy of the rate of change
of a graph signal is minimal for λ = λmax and it increases 0
as λ decreases (see Fig. 10 in Part 1). 1
Now that we have established a criterion for the or- 2
dering of eigenvectors, based on the corresponding eigen- 3 4
values, we shall proceed to define an ideal low-pass filter
6 7 5
on a graph. The intuition behind low-pass filtering in the
graph domain is that such a filter should pass unchanged
all signal components (eigenvectors of A) for which the
rates of change are slower than that defined by the cut-
off eigenvalue, λc (cf. cut-off frequency), while all signal
components (eigenvectors) which exhibit variations which
are faster than that defined by the cut-off eigenvalue, λc , 0 1 2 3 4 5 6 7
should be suppressed. The ideal low-pass filter in the
graph domain is therefore defined as (a) original signal, x = 3.2u7 + 2u6
(
1, for λ > λc , 0
f (λ) =
0, for other λ. 1
2
Example 3: Consider again the undirected graph from Fig.
8(a) on which we observe a graph signal shown in Fig. 12(a), 3 4
which is constructed as a linear combination of two of the
6 7 5
eigenvectors of the adjacency matrix of this graph to give x =
3.2u7 + 2u6 (eigenvectors of the adjacency matrix of the con-
sidered graph are presented in Part I, Fig. 10). The signal is
corrupted by additive white Gaussian noise, ε, at the signal-
to-noise (SNR) ratio of SN Rin = 2.7dB and the noisy graph
signal, xε = x + ε, is shown in Fig. 12(b). This noisy signal is
next filtered using an ideal spectral domain graph filter with a
cut-off eigenvalue of λc = 1. The output signal, xf , is shown
in Fig. 12(c). With SN Rout = 18.8dB, an increase in signal
0 2 6
quality of 16.1dB is achieved with this type of filtering. 1 3 4 5 7
Remark 12: The energy of the rate of change of an eigen-
vector is consistent with the classical DFT based filtering
when λk = exp(−j2πk/N ) and λmax = 1.
(b) noisy signal, xε = x + ε
3.5.3. Spectral Domain Filter Design
0
We shall denote by G(Λ) the desired graph transfer 1
function of a system defined on a graph. Then, a system
with this transfer function can be implemented either in 2
the spectral domain or in the vertex domain. 3 4
In the spectral domain, the implementation is straight- 7
forward and can be performed in the following three steps: 6 5
1. Calculate the GDFT of the input graph signal, X =
U−1 x,
2. Multiply the GDFT of the input graph signal by the
graph transfer function, G(Λ), to obtain the output 0 1 2 3 4 5 6 7
spectral form, Y = G(Λ)X, and
3. Calculate the output graph signal as the inverse GDFT (c) filtered signal
of Y in Step 2, that is, y = UY.
Figure 12: A low-pass graph signal filtering example. (a) Original
This procedure may be computationally very demand- signal, x = 3.2u7 + 2u6 . (b) Noisy signal, xε = x + ε, at an SN R =
2.7dB. (c) Filtered signal, at an SN R = 18.8dB. . Ideal low-pass
ing for large graphs, where it may be more convenient to filtering based on the two highest eigenvalues in the pass-band was
implement the desired filter (or its close approximation) applied.
directly in the vertex domain.
12
For the implementation in the vertex domain, the task 2. If some of the eigenvalues are of a degree higher than
is to find the coefficients (cf. standard impulse response) one (minimal polynomial order, Nm , is lower than
h0 , h1 , . . . , hM −1 in (14), such that their spectral repre- the number of vertices, N ) the system in (33) reduces
sentation, H(Λ), is equal (or approximately equal) to the to a system of Nm linear equations (by removing
desired G(Λ). This is performed in the following way. multiple equations which correspond to the repeated
The transfer function of the vertex domain system is given eigenvalues λ).
by (28) as H(λk ) = h0 + h1 λ1k + · · · + hM −1 λkM −1 and
should be equal to the desired transfer function, G(λk ),
for k = 0, 1, . . . , N − 1. This condition leads to a system (a) If the filter order, M , is such that Nm < M ≤
of linear equations N , the system in (33) is underdetermined. In
that case (M − Nm ) filter coefficients are free
−1 variables and the system has an infinite num-
h0 + h1 λ10 + · · · hM −1 λM
0 = G(λ0 )
−1 ber of solutions, while all so obtained filters are
h0 + h1 λ11 + · · · + hM −1 λM
1 = G(λ1 )
equivalent.
..
.
−1 (b) If the filter order is such that M = Nm , the
h0 + h1 λ1N −1 + · · · + hM −1 λM
N −1 = G(λN −1 ). (33)
solution to the system in (33) is unique.
The matrix form of this system is then
(c) If the filter order is such that M < Nm , the sys-
Vλ h = g, (34) tem in (33) is overdetermined and the solution
is obtained in the least squares sense.
where Vλ is the Vandermonde matrix form of the eigen-
values λk , given by
−1 3. Any filter of an order M > Nm has a unique equiva-
λ10 ··· λM
1 0
−1 lent filter of order Nm . This equivalent filter can be
1 λ11 ··· λM
1
Vλ = . (35) obtained by setting the free variables to zero, that
.. ..
.. ..
. . . is, hi = 0 for i = Nm , Nm + 1, . . . , N − 1.
−1
1 λ1N −1 ··· λM
N −1
Finding the system coefficients
and Exact solution: For M = N = Nm , that is, when the
h = [h0 , h1 , . . . , hM −1 ]T (36) filter order is equal to the number of vertices and the order
is the vector of system coefficients which need to be calcu- of minimal polynomial, the solution to the system in (33)
lated to obtain the desired or (34) is unique and is obtained from
g = [G(λ0 ), G(λ1 ), . . . , G(λN −1 )]T = diag(G(Λ)). (37) h = Vλ−1 g.
Least-squares solution. For the overdetermined case,
Comments on the solution in (33):
when M < Nm , the mean-square approximation of h =
[h0 , h1 , . . . , hM −1 ]T in Vλ h = g is obtained by minimizing
1. Consider the case with N vertices and with all dis- the squared error
tinct eigenvalues of the adjacency matrix (in other
words, the minimal polynomial is equal to the char- 2
e = kVλ h − gk2 .
acteristic polynomial, Pmin (λ) = P (λ)).
From ∂e/∂hT = 0 we then have
(a) If the filter order, M , is such that M = N , then ĥ = (VλT Vλ )−1 VλT g = pinv(Vλ )g.
the solution to (33) is unique, since the deter-
minant of the Vandermonde matrix is always Since M < Nm , the obtained solution, ĥ, is the least-
nonzero. squares approximation for Vλ h = g. Given that this so-
lution may not satisfy Vλ h = g, the designed coefficient
(b) If the filter order, M , is such that M < N , then vector, ĝ (its spectrum Ĝ(Λ)), obey
the system in (33) is overdetermined. There-
Vλ ĥ = ĝ
fore, the solution to (33) can only be obtained
in the least squares sense (as described later in which, in general, differs from the desired system coeffi-
this section). cients, g (their spectrum G(Λ)).
13
1.5 1.5 For this the reason, the approximation of the desired
1 1
transfer function, G(λ), may be performed using the trun-
cated Chebyshev polynomial
0.5 0.5
M −1
c0 X
0 0 PM −1 (z) = + cm Tm (z), (38)
2 m=1
-1 1 3 -1 1 3
where Tm (z) are the Chebyshev polynomials defined as
1.5 1.5
T0 (z) = 1,
1 1
T1 (z) = z,
0.5 0.5 T2 (z) = 2z 2 − 1,
0 0 T3 (z) = 4z 3 − 3z,
..
-1 1 3 -1 1 3 .
Tm (z) = 2zTm−1 (z) − Tm−2 (z), (39)
Figure 13: Design of a graph filter with a specified transfer func-
tion in the spectral domain (cf. standard frequency response). The
desired spectral response, G(λk ), is denoted by blue circles. Red with the variable λ being centered and normalized as
asterisks designate the spectral response of the filter designed in Ex-
ample 4, denoted by Ĝ(λk ), obtained with M filter coefficients, h0 , 2λ − (λmax + λmin )
z= , (40)
h1 , . . . , hM −1 , in the vertex domain. λmax − λmin
such that −1 ≤ z ≤ 1 (required by the Chebyshev poly-
Example 4: Consider the unweighted graph from Fig. 8(a) nomial definition). The inverse mapping, from z to λ, is
and the task of the synthesis of a desired filter for which the
given by
frequency response is described by
g = [0, 0, 0, 0, 0, 0.5, 1, 1]T . 1
λ= z(λmax − λmin ) + λmax + λmin .
2
This filter was designed for various filter orders M = 1, 2, 4, 6,
using (33) and the results are shown in Fig. 13. For clarity, Since the √Chebyshev polynomials are orthogonal, with
analytically, the vertex domain realization of the filter with measure dz/ 1 − z 2 , the Chebyshev coefficients, cm , for
M = 4 is given by an expansion of the desired function, G(z), into the poly-
y = 0.1734A0 x + 0.3532A1 x + 0.0800A2 x − 0.0336A3 x, nomial series, PM −1 (z), are easily obtained as
however, the exact frequency response ĝ = g is only obtained 2 1 dz
Z
with M = N = 8. cm = G(z)Tm (z) √
π −1 1 − z2
Z π
3.5.4. Polynomial (Chebyshev) Approximation of the Sys- 2
= cos(mθ)G(cos(θ))dθ.
tem on a Graph Transfer Function π 0
Without loss of generality, it can be considered that the
Example 5: Consider the unweighted graph from Fig. 8(a)
desired transfer function, g = [G(λ0 ), G(λ1 ), . . . , G(λN −1 )]T ,
with the desired transfer function
consists of samples taken from a continuous function of λ
within the interval λmin ≤ λ ≤ λmax , where λmin and λmax 1 + sign(λ − λ5 )
G(λ) = .
denote the minimum and maximum value of {λ0 , λ1 , . . . , 2
λN −1 }, respectively. The variable λ of the desired trans- The samples of G(λ) at the discrete points
fer function, G(λ), is continuous, and the system on graph
uses only the values at discrete points λ ∈ {λ0 , λ1 , . . . , λN −1 }. λk ∈ {−2, −1.74, −1.28, −0.68, −0.41, 1.11, 1.81, 3.19},
Therefore, for a polynomial approximation, P (λ), of the correspond to the values of G(λk ) in Example 4, Fig. 13. Since
desired transfer function, G(λ), it is important that the the minimum and maximum eigenvalues are λmin = −2 and
error at the points within the considered interval, λmin ≤ λmax = 3.19, this yields the desired transfer function with a
λ ≤ λmax , is bounded and sufficiently small. variable z within a normalized interval, −1 ≤ z ≤ 1,
This problem is known in algebra as the min-max ap-
1 + sign(z − z5 )
proximation, and its goal is to find an approximating poly- G(z) = ,
2
nomial that has the smallest maximum absolute error from
the desired function value. The min-max polynomials can where z5 is defined by (40) as
be approximated by the truncated Chebyshev polynomi- 2λ5 − (λ7 + λ0 )
als, P (λ), which yield approximations of the desired func- z5 = = 0.2.
λ7 − λ0
tion having almost min-max behavior.
14
Calculation complexity. If the number of nonzero ele-
1 1 ments in the adjacency matrix, A, is NA , then the num-
ber of arithmetic operations (additions) to calculate Ax
is of NA order. The same number of operations is re-
quired to calculate A2 x = A(Ax) using the available
0 0
Ax. This means that the total number arithmetic op-
-1 1 3 -1 1 3 erations (additions) to calculate all Ax, A2 x,...,AM −1 x
is of order M NA . Adding these terms requires additional
M NA arithmetic operations (additions), while the calcu-
lation of all terms of the form cm Am x requires an order of
1 1
M NA multiplications by constants cm , m = 0, 1, . . . , M −
1. Therefore, to calculate the output graph signal, y =
P̄M −1 (A)x, an order of 2M NA additions and M NA mul-
0 0 tiplications is needed. Notice that the eigenanalysis of the
adjacency matrix, A, requires an order of N 3 arithmetic
-1 1 3 -1 1 3
operations. For large graphs, the advantage in calculation
Figure 14: Design of a graph filter with a specified transfer function complexity of the vertex domain realization with the poly-
in the spectral domain using the Chebyshev polynomial approxima- nomial transfer function approximation, y = P̄M −1 (A)x,
tion of order (M − 1) with M terms, T0 (z), T1 (z), . . . , TM −1 (z). The is obvious.
desired spectral response, G(λ), is denoted by blue dashed line and
blue dots. Red lines designate the spectral response of the designed
As is common place in standard filter design theory,
Chebyshev approximation. the transition intervals of the approximated transfer func-
tion, G(λ), can be appropriately smoothed, to improve the
approximation.
0 In general, the mapping in (40) from λ to z can be
1
written as z = aλ + b, where a = 2/(λmax − λmin ) and
2 b = −(λmax + λmin )/(λmax − λmin ). The Chebyshev poly-
4 nomials series in λ is then of the form
3
6 7 5 c0
M
X −1
P̄M −1 (λ) = + cm T̄m (λ), (41)
2 m=1
with T̄0 (λ) = 1, T̄1 (λ) = aλ + b, and
T̄m (λ) = 2(aλ + b)T̄m−1 (λ) − T̄m−2 (λ),
0 1 2 3 4 5 6 7
for m ≥ 2.
The same relations hold for
Figure 15: Vertex-domain filtering result for the noisy signal from M −1
Fig. 12, using the Chebyshev approximation of the desired transfer c0 X
P̄M −1 (A) = + cm T̄m (A), (42)
function from Fig. 14 with M = 4. 2 m=1
The Chebyshev series for (M − 1) = 3 is given by This change of variables admits recursive calculation, as
in (39).
PM −1 (z) = 0.43 + 0.62T1 (z) + 0.12T2 (z) − 0.18T3 (z)
= 0.31 + 1.16z + 0.24z 2 − 0.72z 3 . 3.5.5. Inverse System on a Graph
A system on a graph, H(Λ), which represents an in-
Upon the change of variables, z → λ, we obtain the form
verse of the system on a graph, given by G(Λ), can be
P̄M −1 (λ) = 0.07 + 0.36λ + 0.11λ2 − 0.04λ3 . obtained from their generic relationship
Graph signal filtering can now be performed in the vertex
H(Λ)G(Λ)X = X.
domain using
y = P̄M −1 (A)x,
According to (37), this in turn means that if all G(λk ) 6= 0
where and P (λ) = Pmin (λ), then H(λk ) = 1/G(λk ) for each k.
P̄M −1 (A) = 0.07 + 0.36A + 0.11A2 − 0.04A3 .
3.6. Graph Fourier Transform Based on the Laplacian
The result of the vertex domain filtering using P̄M −1 (A) is
shown in Fig. 15 for the noisy signal from Fig. 12, with the Similar to the graph graph discrete Fourier transform
SNR improvement of 16.76 dB. based on the adjacency matrix, spectral representation of
15
a graph signal can be alternatively based on eigenvalue or
decomposition of the graph Laplacian, given by uT Lu = λuT u = λ = Eu .
L = UΛU−1 This means that the quadratic form of an eigenvector, uk ,
is equal to its corresponding eigenvalue. This is elaborated
or LU = UΛ. in detail in Section 4.2.1 in Part I, where we have shown
Although the analysis can be conducted in a unified way that
for spectral decompositions based on both the adjacency N −1 N −1
matrix and the graph Laplacian, due to their different be- 1 X X 2
uTk Luk = λk = Wnm uk (n) − uk (m) ≥ 0.
havior and scope of application, these will be considered 2 n=0 m=0
separately. (45)
The graph Fourier transform of a signal, x, which em-
ploys the graph Laplacian eigenvalue decomposition to de- Obviously, a small uTk Luk = λk implies a small variation of
fine its basis functions, is given by Wnm (uk (n) − uk (m))2 in the eigenvector uk , and for each
vertex n. Consequently, the eigenvectors corresponding
X = U−1 x, (43) to small λk correspond to the low-pass part of a graph
signal. In other words, the smaller the smoothness index
where the matrix U comprises in its columns the eigen-
(curvature), uTk Luk = λk , the smoother the eigenvector,
vectors of the graph Laplacian. The inverse graph Fourier
uk .
transform then follows immediately in the form
An ideal low-pass filter in the Laplacian spectrum do-
x = U X. (44) main, with a cut-off eigenvalue λc , can be therefore defined
as (
In the case of undirected circular unweighted graph, 1, for λ < λc
f (λ) =
such as the graph in Fig. 7(a), this Laplacian based spec- 0, for other λ.
tral analysis reduces to the standard Fourier transform,
but with real-valued basis functions (eigenvectors), as shown Example 6: Consider a signal on the undirected graph from
in Part I, equation (32). Fig. 8(a), shown in Fig. 16(a). This graph signal is generated
as a linear combination of two Laplacian eigenvectors (which
3.7. Ordering and Filtering in the Laplacian Spectral Do- correspond to the slow-varying signal part), to give x = 2u0 +
1.5u1 . The Laplacian eigenvectors of the considered graph are
main
presented in Part I, Fig. 13, while the considered graph signal is
As shown in Section 3.5.2, the graph shift and the ad- shown in Fig. 16(a). The original signal, x, was then corrupted
jacency matrix are related to the first finite difference of by white Gaussian noise at the signal-to-noise ratio of SN Rin =
eigenvectors in the vertex domain, while the rate of the −1.76 dB, and shown in Fig. 16(b). Next, this noisy graph
eigenvector change is related to its corresponding eigen- signal was filtered using an ideal spectral domain graph filter,
value (cf. standard frequency). A similar approach can be with a cut-off eigenvalue λc = 2, to obtain the filtered signal,
used for the Laplacian based eigendecomposition. We have xf , shown in Fig. 16(c). The so achieved output SNR was
seen that for standard time domain signals, the Laplacian SN Rout = 21.29 dB, that is, despite its simplicity, the graph
of a circle graph represents the second order finite differ- filter achieved a gain in SNR of 23.05 dB, as compared to the
noisy signal in Fig. 16(b).
ence, y(n), of a signal u(n), that is
To further illustrate the principle of graph filtering, the
y(n) = −u(n − 1) + 2u(n) − u(n + 1), noisy signal from Fig. 3 was filtered using a filter with the
spectral cut-off at λc = 0.25 and the result is shown in Fig. 17.
as shown in Section 3.3.2 in Part I. A compact expression The same signal was also filtered using a polynomial approxi-
for all elements of the Laplacian can then be written in a mation to the low-pass system, as illustrated in Fig. 18.
matrix form as y = Lu. It is now obvious that the eigen- Laplacian versus adjacency-based GDFT for reg-
vectors, u, which exhibit small variations should also have ular graphs. A direct relation between the adjacency-
a small cumulative energy of the second order difference based and Laplacian-based spectral decomposition can be
X h 2 2 i established for J -regular unweighted graphs (see (13) in
Eu = u(n) − u(n − 1) + u(n) − u(n + 1) /2. Part I), for which
n L = JI−A
Recall that this expression corresponds to the quadratic to yield
(A) (L)
form of the eigenvector, u, defined by Eu = uT Lu. λk = J − λk ,
The above reasoning for the Laplacian quadratic form where the eigenvalues of the adjacency matrix and the
can also be used for graph signals. As a default case for the (A) (L)
graph Laplacian are respectively denoted by λk and λk ,
Laplacian analysis we will consider undirected weighted
while they share the same eigenvectors.
graphs, where by definition
Remark 13: Rank-ordering of the eigenvectors, uk , from
Lu = λu, uT u = 1 low-pass to high-pass, which is based on the respective
16
11
0 6
1
12
2 10
3 4 4
6 7 5
9
14
5 13
8 2 0
1
3
7 15
0 1 2 3 4 5 6
7
(a) original signal
Figure 17: Denoising results for the noisy signal from Fig. 3, which
0 was filtered using a low-pass graph filter with λc = 0.25.
1
2 (A) (L)
eigenvalues, λk and λk , yields exactly opposite ordering
3 4 for these two graph spectral decompositions. For exam-
6 7 5
(L)
ple, the smoothest eigenvector is obtained for mink λk =
(L) (A) (L)
λ0 = 0 or for maxk λk = λmax = J − λ0 = J .
3.8. Systems on a Graph Defined Using the Graph Lapla-
cian
Following on the discussion in Section 3.2 and equa-
tion (14), a system on a graph, defined using the graph
Laplacian, has the form
1 7
0 2 3 4 5 6
y = h0 L0 x + h1 L1 x + · · · + hM −1 LM −1 x
M
X −1
(b) noisy signal = hm Lm x. (46)
m=0
0 For an unweighted graph, this definition of a system
1 on a graph can be related to the corresponding adjacency
matrix form as L = D − A.
2 The spectral domain description of a system on a
3 4 graph is then obtained through the Laplacian eigenvalue
6 7 5
decomposition, to yield
M
X −1
y = UY = hm Lm x = H(L)x
m=0
= UH(Λ)UT x = UH(Λ)X, (47)
7 where we used the property of the eigendecomposition of
0 1 2 3 4 5 6
matrix polynomial,
(c) filtered signal M
X −1 M
X −1
H(L) = hm Lm = hm UΛm UT = UH(Λ)UT
Figure 16: Graph signal filtering example. (a) Original signal. (b) m=0 m=0
Noisy signal. (c) Filtered signal. Low pass filtering was performed (48)
based on the two lowest eigenvalues of the graph Laplacian.
17
described in Section 3.2.3 in Part I, and the notation For illustration, consider the delta function located at a
graph vertex m, given by
M
X −1
hm Λm
(
H(Λ) = (49) 1, for m = n
m=0 δm (n) = (53)
0, for m 6= n
to obtain
Y = H(Λ)X with the corresponding GDFT
N −1
or in an element-wise form X
∆(k) = δm (n)uk (n) = uk (m), (54)
Y (k) = H(λk )X(k), k = 0, 1, . . . , N − 1. n=0
which is defined based on graph Laplacian eigenvectors.
Observe that, similar to the standard time domain, any
11 graph signal can be written as a sum of delta functions at
6
the graph vertices, that is
12
N −1
10 X
x(n) = x(i)δn (i)
4 i=0
9
13
14 or in a vector form
5
N
X −1
8 2 0 x= x(i)δ i ,
1
i=0
3
where δ i is a vector with elements δ(n − i), as in (53).
15
Then, the system output, y, takes the form
7
M
X −1
y= hm Lm x = UH(Λ)UT x
m=0
N
X −1
= x(i)UH(Λ)UT δ i
Figure 18: Graph filtering of a noisy signal from Fig. 3, using a
i=0
fourth-order system given by y = h0 L0 x + h1 L1 x + h2 L2 + h3 L3 +
h4 L4 . and its elements are obtained as
N −1 N −1 N −1
In the vertex domain, the n-th element of the output X X X
signal, y = UH(Λ)UT x, of a system on a graph is given y(n) = x(i) uk (n)H(λk )uk (i) = x(i)hn (i),
by i=0 k=0 i=0
N −1 N −1 N −1
where hn (i) are related to H(λk ) as in (52).
X X X
y(n) = x(i)uk (i)H(λk )uk (n) = x(i)hn (i), Remark 15: According to (47), the form of the graph con-
k=0 i=0 i=0 volution operator for a vertex n, given in (50), is localized
(50) within the (M − 1)-neighborhood of vertex n. This local-
for which the transfer function is defined by ization property is even more important for large graphs.
−1 A generalized convolution for two arbitrary graph sig-
H(λk ) = h0 + h1 λk + · · · + hM −1 λM
k (51)
nals will be addressed next.
and the graph impulse response is
3.9. Convolution of Signals on a Graph
N
X −1 Consider two graph signals, x(n) and h(n). A general-
hn (i) = H(λk )uk (n)uk (i) = Tn {h(i)}. (52) ized convolution operator for these two signals on a graph
k=0 is defined using their graph Laplacian spectra [18], based
on the assumption that the spectrum of a convolution on
Remark 14: The expression for y(n) in (50) can be in- a graph
terpreted as a generalized convolution on graphs, which is y(n) = x(n) ∗ h(n)
performed using a generalized graph shift of the impulse
response, hn (i), in the vertex domain. is equal to the product of the corresponding spectra of
graph signals, x(n) and h(n), that is
We next proceed to describe the generalized convolu-
tion on graphs through responses to the unit delta pulses. Y (k) = X(k)H(k), (55)
18
in the element-wise form. The output of the generalized
graph convolution operation, x(n) ∗ h(n), is then equal to
the inverse GDFT of the spectral product Y (k) in (55),
that is
0 1 2 3 4 5 6 7
y(n) = x(n) ∗ h(n)
N −1 N −1
X X 1
= Y (k)uk (n) = X(k)H(k)uk (n), 0
2
k=0 k=0 3
4
0 1 2 3 4 5 6 7 5
where 7
N
X −1 6
H(k) = h(n)uk (n). (56)
1
n=0 0
Notice the difference between the definition of H(k) in (56) 3
2
4
and H(λk ) in (51). Both these forms will be discussed in 0 1 2 3 4 5 6 7 5
more detail in the next section. 6
7
Shift on a graph – an alternative definition. The
above framework of generalized graph convolution can also 0
1
serve as a basis for a convenient definition of a shift on a 2
4
graph. Consider the graph signal, h(n), and the delta 3
5
function located at a vertex m. Here, we will use hm (n) to 0 1 2 3 4 5 6 7
7
denote the shifted version of the graph signal, h(n), “to- 6
ward” a vertex m. This kind of shifted signal will be de-
1
fined following the reasoning in classical signal processing 0
where the shifted signal is obtained as a convolution of the 3
2
4
original signal and an appropriately shifted delta function. 0 1 2 3 4 5 6 7 5
Therefore, a graph shifted signal is here defined through a 6
7
generalized graph convolution, h(n) ∗ δm (n), whose GDFT
is equal to H(k)uk (m), according to (54) and (55). The 1
0
graph shifted signal is then the IGDFT of H(k)uk (m), that 2
4
is 3
0 1 2 3 4 5 6 7 5
N −1 7
X 6
hm (n) = h(n) ∗ δm (n) = H(k)uk (m)uk (n). (57)
k=0 1
0
The same relation follows when calculating the inverse 2
4
3
GDFT of X(k)H(k), to yield 0 1 2 3 4 5 6 7 5
7
6
N
X −1
y(n) = X(k)H(k)uk (n)
1
k=0 0
N −1 N −1 2
4
X X 3
= x(m)uk (m)H(k)uk (n) 0 1 2 3 4 5 6 7 5
k=0 m=0 6
7
N
X −1
= x(m)hm (n) = x(n) ∗ h(n), (58) 1
0
m=0
2
4
3
where 0 1 2 3 4 5 6 7 5
7
N −1 6
X
hm (n) = H(k)uk (m)uk (n) = Tm {h(n)} (59)
k=0 Figure 19: An example of graph shift operator. Top: The graph
signal defined by its Laplacian GDFT, given by H(k) = exp(−2λk τ ).
is another version of graph shifted signal. Since the defini- Left and right column: The graph signals hm (n) ”shifted” for m = 0
tion of H(k) as a GDFT of a signal h(n) differs from that to m = 7, calculated using hm (n) = Tm {h(n)} in (59). The shifted
in (51), these produce different shift operations, which are signal is shown both on the vertex index line (left) and on the graph
itself (right).
respectively denoted by Tm {h(n)} and Tm {h(n)}.
19
Remark 16: Note that neither of the two shift operations,
2
(52) or (59), satisfy the property that a shift by 0 is equal z -1 complex plane
to the original signal, h0 (n) 6= h(n). 1.5
Example 7: Consider a signal on graph from Fig. 8(a), which 1 4
6
is defined by its graph Laplacian GDFT, given by
0.5 2
H(k) = exp(−2λk τ ), 1 0
0
3
with τ = 0.1573. All shifted signals, hm (n) = Tm {h(n)}, ob-
-0.5 7
tained using the shift operator in (59), are shown in Fig. 19. 5
-1
3.10. The z-transform of a Signal on a Graph
-1.5
The relation between the graph signal shift operators,
Tm {h(n)} and Tm {h(n)}, which are respectively used used -2
to define the generalized convolutions in (51) and (58),
-2 -1 0 1 2
can be established based on the definitions of H(λk ) and
H(k). Consider H(λk ), defined by (51), as a graph discrete Figure 20: Complex eigenvalues of the adjacency matrix of a directed
Fourier transform of signal h(n). The samples of the graph graph in Fig. 1(b) in Part I.
signal h(n) are then equal to the IGDFT of H(λk ), that is
N −1 The z-transform of graph signals. For a given graph
signal x = [x(0), x(1), . . . , x(N −1)]T , following the reason-
X
h(n) = H(λk )uk (n)
k=0 ing as in (60), the coefficients of a system [x0 , x1 , . . . , xN −1 ]T
which corresponds to a system transfer function that would
while the system coefficients hn , n = 0, 1, . . . , M − 1, are have the same GDFT as the graph signal itself are
related to H(λk ) by (51), that is
[x0 , x1 , . . . , xN −1 ]T = Vλ−1 UT [x(0), x(1), . . . , x(N − 1)]T
−1
H(λk ) = h0 + h1 λk + · · · + hM −1 λM
k .
or
For M = N , the vector forms of the last two relations are
[x0 , x1 , . . . , xN −1 ]T = Vλ−1 [X(0), X(1), . . . , X(N − 1)]T .
T
[h(0), h(1), . . . , h(N − 1)] = UH(Λ)
The graph z-transform of a signal x is therefore equal to
H(Λ) = Vλ [h0 , h1 , . . . , hN −1 ]T the classic z-transform of coefficients [x0 , x1 , . . . , xN −1 ]T ,
so that the signal, h(n), and the coefficients, hn , can be X(z −1 ) = Z{xn } = x0 +x1 z −1 +· · ·+xN −1 z −(N −1) (61)
related as
so that the following holds
[h0 , h1 , . . . , hN −1 ]T = Vλ−1 UT [h(0), h(1), . . . , h(N −1)]T .
(60) Y (z −1 ) = H(z −1 )X(z −1 )
Remark 17: In classical DFT (the case of a directed cir- The output signal, y(n), can now be obtained as
cular graph and its adjacency matrix, when UH should be
used instead of UT ), the signal samples, h(n), which are [y(0), y(1), . . . , y(N − 1)]T = UVλ [y0 , y1 , . . . , yN −1 ]T ,
obtained as the inverse DFT of H(λk ) and the system coef-
ficients, hn , are the same, since the eigenvalues are equal to where the output graph signal, y(n), results from the in-
the corresponding shift operators in the spectral domain, verse z-transform of the coefficients, yn , that is
√
λk = √ exp(−j2πk/N ) and u√ k (n) = exp(j2πnk/N )/ N = Y (z −1 ) = Z{yn } = y0 + y1 z −1 + · · · + yN −1 z −(N −1) .
λ−n
k / N , with hn = h(n)/ N and
The z-transform representation in the complex valued
N −1
1 X z-domain may be of interest when the eigenvalues are complex-
H(k) = √ h(n)e−j2πnk/N .
N n=0 valued, which occurs in the decomposition of adjacency
matrices of undirected graphs. For example, for the graph
Therefore, for classical DFT analysis, the following rela- from Fig. 1(b) in Part I and its adjacency matrix, the
tion holds √ eigenvalues are shown in Fig. 20.
N Vλ = (UH )−1 . Definition: The analytic graph signal, Xa (k), and the
√ graph Hilbert transform, Xh (k), are defined in the spectral
This relation is obvious from (35) and u∗k (n) = λnk / N , domain as
and will be used to define the z-transform of a graph sig-
nal. Xa (k) = 1 + sign(=(λk )) X(k)
20
Xh (k) = j sign =(λk ) X(k) 3.13. Optimal Denoising
Consider a measurement, x, composed of a slow-varying
X(k) = Xa (k) + jXh (k), graph signal, s, and a fast changing disturbance, ε, to give
where =(λk ) denotes imaginary part of λk . If these rela-
tions are applied to the standard DFT with λk = e−j2πk/N x = s + ε.
we would obtain the corresponding classical signal process-
The aim is to design a filter for disturbance suppression
ing definitions.
(denoising), the output of which is denoted by y = H(x).
The optimal denoising task may then be defined as a
3.11. Shift Operator in the Spectral Domain
minimization of the objective function
A shift operation in the spectral domain can be defined
in the same way as the shift in the vertex domain. Consider 1
J= ky − xk22 +αyT Ly. (64)
a product of two graph signals, x(n)y(n), defined on an 2
undirected graph. The GDFT of this product then takes
the form Physically, the minimization of the first term 21 ky − xk22
forces the output signal y to be as close as possible to the
N
X −1 available observations x, in terms of the energy of their Eu-
GDFT{x(n)y(n)} = x(n)y(n)uk (n) = clidean distance (minimum error energy), while the second
n=0 term represent a measure of smoothness of y (see Section
N −1 N −1 N −1
X X X 3.7). This is also physically meaningful, as the original in-
X(i)ui (n)y(n)uk (n) = X(i)Yi (k), put, s, was low-pass and smoother than the disturbance, ε.
n=0 i=0 i=0
The parameter α provides a balance between the closeness
where of output, y, to x and the output smoothness criterion.
N
X −1 To solve this minimization problem, we differentiate
Yi (k) = y(n)ui (n)uk (n)
n=0 ∂J
= y − x + 2αLy = 0
can be considered as a shift of Y (k) by i spectral indices. ∂yT
Remark 18: As desired, a shift by i = 0 in the spectral which results in
domain produces the√original value, Y0 (k) = Y (k), up to
a constant factor 1/ N . This relation does not hold for y = (I + 2αL)−1 x.
the shift operators in the vertex domain.
The spectral domain form of this relation follows from L =
UT ΛU, Y = UT y, and X = UT x, to yield
3.12. Parseval’s Theorem on a Graph
Consider two graph signals, x(n) and y(n), which are Y = (I + 2αΛ)−1 X.
observed on an undirected graph and their spectra, X(k)
and Y (k). Then, Parseval’s theorem has the form The element-wise transfer function of the above spectral
input/output relation then takes the form
N
X −1 N
X −1
x(n)y(n) = X(k)Y (k) (62) 1
H(λk ) = . (65)
n=0 k=0 1 + 2αλk
and it holds for any two graph signals.
To prove Parseval’s theorem on graphs, consider Remark 19: For a small α, we have H(λk ) ≈ 1, that
is, an all-pass behavior of (65), with no signal smoothing,
N −1 N −1 N −1
which yields y ≈ x. On the other hand, for a large α,
X X X
x(n)y(n) = X(k)uk (n) y(n) H(λk ) ≈ δ(k). The resulting y ≈ const. is maximally
n=0 n=0 k=0
smooth (a constant output, without any variation).
N
X −1 N
X −1
= X(k) y(n)uk (n), (63) Example 8: The noisy signal from Fig. 3 was filtered using
k=0 n=0 the optimal filter in (65) with α = 1, and the result is shown
in Fig. 21. The achieved SNR was 19.16 dB.
to yield Parseval’s equivalence between the energies in the Other cost functions. Among many possible alterna-
original and spectral domains. It has been assumed that tives, we will introduce two more cost functions for graph
the graphs are undirected, so that U−1 = UT holds. This signal denoising, which exploit different constraints im-
theorem is quite general and applies to both the graph posed on the solution.
Laplacian and the adjacency matrix based decompositions Instead of enforcing the smoothness of the output sig-
on undirected graphs. nal, we may instead desire that its deviation from a linear
form (that is, the signal, y, which satisfies Ly = 0) is
21
and
11
6 !p/2
N −1 N −1 N −1
12 1X 2
X X
2
J= (y(n)−x(n)) +α Wmn (y(n)− y(m))
10 2 n=0 n=0 m=0
4 (68)
9
13
14 with 0 ≤ p ≤ 1.
5
Remark 20: The zero-norm, `0 , with p = 0, is the best in
8 2 0 promoting sparsity, since for p = 0 the second term in the
1
cost function in (67) counts (and minimizes) the number
3 of nonzero elements in Ly. Minimization of the sparsity
15 of Ly promotes constant (or linear) solutions for y, with
the smallest number of discontinuities (nonzero elements of
7
vector Ly). In the second cost function in (68), the zero-
norm promotes the smallest
PN −1 possible number of2 nonzero
elements of the term m=0 Wmn (y(n) − y(m)) ; this is
also known as the total variation (TV) approach. However,
the minimization of such objective functions cannot be
Figure 21: Graph signal denoising for a noisy signal from Fig. 3,
achieved in an analytic way, like in the standard MSE case
which is filtered using an optimal filter in (65), with α = 1.
of p = 2.
On the other hand, the choice of p = 1 with one-norm,
as small as possible. This can be achieved with the cost `1 , makes the above cost functions convex, allowing for
function given by gradient descend methods be used to arrive at the solution,
1 1 while producing the same solution as with p = 0, under
J= ky − xk22 +αkLyk22 = ky − xk22 +αyT L2 y (66) some mild conditions. The `1 -norm serves as an analytic
2 2
proxy to the `0 -norm [19].
which yields a closed form denoising solution
y = (I + 2αL2 )−1 x 3.14. Systems on a Graph Defined Using Random Walk
Laplacian
with the corresponding element-wise spectral domain re- While common choices for the graph shift operator
lation H(λk ) = 1/(1 + 2αλ2k ). are: (i) adjacency matrix, S = A, and (ii) graph Lapla-
A combination of the two cost function forms in (64) cian, S = L, normalized versions of the adjacency matrix,
and (66), may provide additional flexibility in the design graph Laplacian, S = D−1/2 LD−1/2 , and random walk
of the filter transfer function, for example (diffusion) matrix, S = D−1 W, can also be used [20, 21].
1 Various shift operators produce corresponding eigenvector
J= ky − xk22 +αyT Ly + βyT L2 y (signal decomposition) bases, such as those analyzed in
2
Part I and given in Table 1.
would yield the transfer function A generalized form of the output from a system on a
1 graph can then be written as
H(λk ) = .
1 + 2αλk + 2βλ2k M −1
X
0 1 M −1
This transfer function form can be further fine-tuned through y = h0 S x + h1 S x + · · · + hM −1 S x = hm Sm x,
m=0
the choice of the parameters α and β. For example, if we
(69)
desire the component corresponding to λ1 6= 0 not to be at-
where, by definition S0 = I, while h0 , h1 , . . . , hM −1 are
tenuated, we would use α + βλ1 = 0. Such a cost function
the system coefficients.
can be straightforwardly extended to produce a transfer
An unbiased version of the random walk shift operator
function for M unattenuated components.
can also be employed in this context, defined as
Sparsity promoting solutions. Some applications re-
quire to promote the sparsity of the output graph signal, S = (I + D)−1 (I + W), (70)
rather than its smoothness. Such solutions then naturally
rest upon compressive sensing theory which requires the as it exhibits the desirable property of asymptotic signal
two-norm in the previous cost functions to be replaced energy preservation [22]. The shift operator in (70) can
with the norms that promote sparsity. Two examples of be derived under the assumption that the random graph
such cost functions are signal, x, follows the general random walk (GRW) model,
1 which exhibits the following properties:
J = ky − xk22 +αkLykpp (67)
2
22
the GRW has a probability density which convergences to
Table 1: Summary of graph spectral basis vectors.
that of the Wiener process [23, 24, 25, 26]. In the graph
Operator Eigenanalysis setting, for a walker at a vertex m, the central limit the-
orem [27] asserts that after a sufficiently large number of
Graph Laplacian Luk = λk uk independent steps, the probability of walker’s position is
2
Gaussian distributed, Pmn ∝ e−rmn , where rmn is a mea-
Generalized eigenvectors sure of physical distance between vertices m and n. Conse-
of graph Laplacian Luk = λk Duk quently, the elements of the GRW weight matrix, denoted
1 1 by W̃, are given by
Normalized graph Laplacian D− 2 LD− 2 uk = λk uk 2
−r
e mn , (m, n) ∈ E,
Adjacency matrix Auk = λk uk W̃mn = 1, (75)
m = n,
0, (m, n) ∈ / E.
1
Normalized adjacency matrix λmax A uk = λk uk
Notice that in a probabilistic setting the vertices are im-
plicitly self-connected ; to ensure that the transition proba-
i) Graph Markov property, that is, the random process bilities sum up to unity, we therefore need to normalise the
is dependent only of its shifted state, GRW weights to obtain Pmn = W̃mn /D̃mm . Notice that
! the standard weight matrix, W, has zeros on the diagonal
so that for W̃ in (75), W̃ = (I + W) and D̃ = (I + D).
\ m
P x S x = P x Sx ; (71)
Therefore, this graph shift operator takes the form in (70).
m>0
Example 9: Consider again the multi-sensor setup described
in Section 2, and shown in Fig. 22(a). The graph shift op-
ii) Graph Martingale property, whereby the conditional erator based on the GRW model was employed within a first
expectation of the random process is equal to its order averaging system (h0 = 0, h1 = 1), as in (69), to estimate
shifted state, which can be written as the true temperature from the observed temperature field. The
2
( ) weight matrix elements, Wmn = e−rmn , were specified based
on the Euclidean distance between vertices, rmn , thereby ac-
\ m
E x S x = Sx. (72)
counting for the difference in latitude, longitude and altitute.
m>0
The resulting denoised temperature field is illustrated in Fig.
22(b) and demonstrates the attained increase in the SNR from
In this way, the random walk can be described by a Markov
14.2 dB to 19.8 dB, which results from the desirable unbiased-
matrix, P ∈ RN ×N , with its (m, n)-th element defined as ness and asymptotic power preservation properties of the shift
the transition probability, Pmn , of going from vertex m to operator.
vertex n. By setting S = P, this shift operator is unbi-
ased, since each row in P sums up to unity, i.e. P 1 = 1.
Furthermore, owing to the graph Martingale condition in
(72), the shift operator exhibits a dual role of the expecta-
tion operator, since Sx = Px = E {x}. With this result,
it can also be proven that with an increase in the number
of vertices, N , the shift operator is asymptotically power
preserving (isometric), that is [22]
lim kSxk2 = E kxk2 .
(73)
N →∞ (a) Observed field (b) GRW local expectation
(SNR = 14.2 dB). (SNR = 19.8 dB).
Therefore, the class of systems based on this graph shift
also exhibits the following boundedness property Figure 22: Local average operator on a graph based on the general-
ized random walk (GRW) shift operator. The graph signal intensity
M
X −1 is designated by the vertex color.
lim kyk2 ≤ |hm |2 E kxk2 .
(74)
N →∞
m=0
The use of the Markov matrix as the shift operator was re- 4. Subsampling, Compressed Sensing, and Recon-
cently proposed in [20, 21], and the above analysis further struction
justifies this concept.
In practice, the actual probabilities of vertex transition Graphs may comprise of a very large number of ver-
are often unknown but can be inferred from the available tices, of the order of millions or even higher. The associ-
information of the graph topology, implied by the weight ated computational and storage issues bring to the fore the
matrix, W. In the limit, Donsker’s theorem states that consideration of potential advantages of subsampling and
23
compressive sensing defined on graphs. We here present coefficients X(K), X(K + 1), . . . , X(N − 1) in (77) since
several basic approaches to subsampling, along with their these are zero-valued and do not contribute to the for-
relations to classical signal processing [28, 29, 30, 31, 32, mation of graph signal samples. With this in mind, the
33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46]. M × N system of equations in (78) is reduced to the fol-
lowing M × K system
4.1. Subsampling of Low-Pass Graph Signals
x(n1 ) u0 (n1 ) u1 (n1 ) . . . uK−1 (n1 ) X(0)
For convenience, we shall start from the simplest case x(n2 ) u0 (n2 ) u1 (n2 ) . . . uK−1 (n2 ) X(1)
where the considered graph signal is of a low-pass nature. .. =
.. .. .. ..
.. ,
Such a signal can be expressed as a linear combination of . . . . . .
K < N eigenvectors of the graph Laplacian which exhibit x(nM ) u0 (nM ) u1 (nM ) . . . uK−1 (nM ) X(K − 1)
the lowest smoothness indices, or, in the matrix form
K−1
X y = AM K XK , (79)
x(n) = X(k)uk (n), n = 0, 1, . . . , N − 1. (76)
k=0 where the definitions of the reduced measurement matrix
AM K and the reduced GDFT vector XK are obvious. For
The GDFT domain coefficients of this (K-sparse) signal M = K independent measurements, this system can be
in the GDFT domain are of the following form solved uniquely, while for M > K the system is typi-
cally overdetermined and the solution is found in the least
X = [X(0), X(1), . . . , X(K − 1), 0, 0, . . . , 0]T . (77)
squares (LS) sense, as [34]
Recall that a graph signal is sparse in the GDFT do- XK = (ATM K AM K )−1 ATM K y = pinv(AM K )y, (80)
main if K N . The smallest number of graph signal
samples, M , needed to recover the sparse signal is therefore where pinv(AM K ) = (ATM K AM K )−1 ATM K is the matrix
M = K < N . For stability of reconstruction, it is common pseudo-inverse of AM K .
to employ K ≤ M < N graph signal samples. The vector After XK is calculated, all GDFT values follow directly
of available graph signal samples will be referred to as the as X = [X(0), X(1), . . . , X(K − 1), 0, 0, . . . , 0]T , where the
measurement vector, and will be denoted by y, while the assumed zero values are added for X(K), X(K + 1), . . . ,
set of vertices (a random subset of V = {0, 1, 2, . . . , N −1}) X(N −1). The graph signal is then recovered at all vertices
over which the samples of graph signal are available is de- using x = U X.
noted by Recovery condition. The signal reconstruction in (80) is
M = {n1 , n2 , . . . , nM }. possible if the inverse (ATM K AM K )−1 exists, which means
that
The measurement matrix can now be defined using the
rank(ATM K AM K ) = K. (81)
IGDFT, x = U X, of which an element-wise form is given
by (76). The equations in (76) corresponding to the avail- In terms of the matrix condition number, this requirement
able graph signal samples at vertices n ∈ M = {n1 , n2 , . . . , nM }is equivalent to
then define the system cond(ATM K AM K ) < ∞,
x(n1 ) u0 (n1 ) u1 (n1 ) . . . uN −1 (n1 ) X(0) that is, a nonsingular ATM K AM K .
x(n2 ) u0 (n2 ) u1 (n2 ) . . . uN −1 (n2 ) X(1)
Remark 21: For noisy measurements of graph signals, the
.. = .. .. .. .. ,
..
noise in the reconstructed GDFT coefficients is directly re-
. . . . . .
x(nM ) u0 (nM ) u1 (nM ) . . . uN−1 (nM ) X(N − 1) lated to the input noise and the matrix condition number.
If we are able to choose the available signal sample po-
for which the matrix form is given by sitions (vertices), then the sampling strategy would be to
find the set of measurements so that these produce the
y = AM N X, (78) condition number which is as close to unity as possible
(for stability and reduced influence of noise).
where AM N is the measurement matrix and the measure-
ments vector Example 10: To demonstrate the principle of reconstruction
from a reduced set of graph signal samples, consider the values
y = [x(n1 ), x(n2 ), . . . , x(nM )]T of a graph signal at M = 3 vertices, given by
y = [x(0), x(2), x(6)]T = [1.140, 0.996, 0.563]T ,
consists of the available graph signal samples. In general,
since M < N this system is underdetermined, and cannot as shown in Fig. 23 (upper panel). Assume that the graph sig-
be solved uniquely for X without additional constraints. nal is of a low-pass type, with K = 2 lowest nonzero GDFT co-
The assumption that the spectral representation of a efficients X(0) and X(1). The GDFT coefficients of this graph
signal can then be reconstructed from
signal contains a linear combination of only K ≤ M slow-
est varying eigenvectors allows us to exclude the GDFT y = A32 X2 , (82)
24
Similar to (78), the corresponding system of equations
0 x(n1 ) uk1 (n1 ) uk2 (n1 ) . . . ukK (n1 ) X(k1 )
×1 x(n2 ) uk1 (n2 ) uk2 (n2 ) . . . ukK (n2 ) X(k2 )
.. = .. .. .. .. .
..
2
. . . . . .
×
3 ×4 x(nM ) uk1 (nM ) uk2 (nM ) . . . ukK(nM ) X(kK )
(83)
6 7 ×5
× of which the matrix form is y = AM K XK , is solved for
the nonzero spectral values X(k), k ∈ K, in the same way
as in the case of a low-pass signal presented in Section 4.1.
0
1 4.2.2. Support Matrices, Subsampling and Upsampling
In graph signal processing literature, the subsampling
2 problem is often defined using the so called support ma-
3 4 trices. Assume that a graph signal, x, is subsampled
7 in such way that it is available on a subset of vertices
6 5 n ∈ M = {n1 , n2 , . . . , nM }, rather than on the full set of
vertices. For this subsampled signal, we can define its
Figure 23: Illustration of the subsampling of a lowpass graph signal.
Top: A graph signal with missing samples at vertices 1, 3, 4, 5 and 7.
upsampled version, xs , by adding zeros at the vertices
Bottom: The reconstructed graph signal. where the signal is not available. Using a mathematical
formalism, the subsampled and upsampled version, xs , of
the original signal, x, is then
that follows from the definition in (76) for the assumed available
signal samples, x(n), at the three vertices n = 0, n = 2, and xs = Bx, (84)
n = 6, for two nonzero coefficients, X(0) and X(1),
where the support matrix B is an N × N diagonal matrix
with ones at the diagonal positions which correspond to
x(0) u0 (0) u1 (0)
x(2) = u0 (2) u1 (2) X(0) . M = {n1 , n2 , . . . , nM } and zeros elsewhere. The subsam-
X(1)
x(6) u0 (6) u1 (6) pled and upsampled version, xs , of the signal x is obtained
is such a way that the signal x is subsampled on a reduced
The rank of the matrix A32 is 2. The corresponding ma-
trix condition number is cond(AT32 A32 ) = 4.33, while the re- set of vertices, and then upsampled by adding zeros at
constructed nonzero values of the GDFT are X(0) = 2 and the original signal positions where the subsampled signal
X(1) = 1, to yield the reconstructed graph signal x = U X, is not defined.
with X = [2, 1, 0, 0, 0, 0, 0, 0]T , as shown in Fig. 23 (lower Recall that in general a signal, x, with N independent
panel). values cannot be reconstructed from its M < N nonzero
values in xs , without additional constraints. However, for
Remark 22: For a directed circular√ graph, with the eigen- graph signals which are also sparse in the GDFT domain,
vectors uk (n) = exp(j2πnk/N )/ N , the above downsam-
the additional constraint is that the signal, x, has only
pling and interpolation relations are identical to those in
K ≤ M nonzero coefficients in the GDFT domain, X =
classical signal processing [35].
UT x, at k ∈ K = {k1 , k2 , . . . , kK }, so that the relation
4.2. Subsampling of Sparse Graph Signals X = CX
The subsampling of graph signals which are sparse in holds, where the support matrix C is an N × N diagonal
the GDFT domain will be next considered for the cases of matrix with ones at the diagonal positions which corre-
both known and unknown positions of the nonzero GDFT spond to K = {k1 , k2 , . . . , kK } and zeros elsewhere. Note
coefficients. the presence of the GDFT, X, is on both sides of this
equation, contrary to xs = Bx in (84). The reconstruc-
4.2.1. Known Coefficient Positions in GDFT tion formula then follows from
The previous analysis holds not only for a low-pass type
of the graph signal, x, and its corresponding GDFT, X, xs = Bx = BUX = BUCX.
but also for case of GDFT, X, with K nonzero values at as X = pinv(BUC)xs . The inversion
arbitrary, but known spectral positions, that is,
X = CX = pinv(BUC)xs
X(k) = 0 for k ∈
/ K = {k1 , k2 , . . . , kK }.
is possible for K nonzero coefficients of CX if the rank of
BUC is K (if there are K linearly independent equations),
that is
rank(C) = K = rank(BUC).
25
This condition is equivalent to (81) since the nonzero part submatrix of the measurement matrix, AM N , which
of matrix BUC is equal to AM K in (83). consists of the columns defined by the indices k1 and
k2 . The reconstructed vector y2 = A2 X2 , is re-
4.2.3. Unknown Coefficient Positions moved from the measurements, y, with the error,
The reconstruction problem is more complex if the po- e = y − y2 , now acting as the new measurement
sitions of nonzero spectral coefficients K = {k1 , k2 , . . . , kK } vector.
are not known. This case has been addressed within stan-
(iii) The procedure is iteratively repeated K times or un-
dard compressive sensing theory and can be formulated
til the remaining measurement values in e are negli-
as
gible. In the cases when the sparsity, K, is unknown,
min kXk0 subject to y = AM N X, (85)
the procedure is iterated until kek2 < ε, where ε is a
where kXk0 denotes the number of nonzero elements in X predefined precision.
(`0 pseudo-norm).
Example 11: Consider a sparse graph signal, of the sparsity
While the ways to solve this minimization problem are
degree K = 2, measured at vertices n = 2, 3, 4, 5, and 7, which
manifold, we here adopt a simple, two-step approach: takes the values
1. Estimate the positions K = {k1 , k2 , . . . , kK } of the y = [0.707, 1.307, 0.407, 1.307, 0.407]T ,
nonzero coefficients using M > K signal samples,
as shown in Fig. 24 (top panel). Our task is to reconstruct the
2. Reconstruct the nonzero coefficients of X at the esti- full signal, that is, to find the missing samples x(0), x(1), and
x(6).
mated positions K, along with the signal x at all ver-
The estimate positions of the nonzero elements in the GDFT,
tices, using the methods for the reconstruction with
X, the initial estimate, X0 , is calculated for given measure-
the known nonzero coefficient positions, described in ments, y, according to (86). Because K = 2, the positions of
Sections 4.1 and 4.2.1. The nonzero coefficients at the two nonzero coefficients are estimated as positions of the
positions K are calculated as XK = pinv(AM K )y. two largest values in X0 . In the considered case, K = {0, 3},
as shown in Fig. 24 (bottom panel). The GDFT coefficients
The nonzero positions of the GDFT in Step 1 can be are then reconstructed for the sparsity degree K = 2, as X2 =
estimated through the projection of measurements (avail- pinv(A52 )y, resulting in X(0) = 2, X(3) = 1.2, as illustrated
able signal samples), y, on the measurement matrix in Fig. 24 (bottom–right). Finally, the reconstructed graph
signal at all vertices, x = UX, is shown in the middle panel of
u0 (n1 ) u1 (n1 ) . . . uN −1 (n1 ) Fig. 24.
u0 (n2 ) u1 (n2 ) . . . uN −1 (n2 )
AM N = .. .. ..
.. 4.2.4. Unique Reconstruction Conditions
. . . .
u0 (nM ) u1 (nM ) . . . uN −1 (nM ) As is the case with the standard compressive sensing
problem, the initial GDFT estimate, X0 , will produce cor-
to give rect positions of the nonzero elements, X(k), and the re-
X0 = ATM N y, (86) construction will be unique, if
where the positions of K largest values in X0 are used as 1 1
K< 1+ ,
an estimate of the nonzero positions, K. This procedure 2 µ
can also be implemented in an iterative way [34], where where µ is equal to the maximum value of the inner prod-
(i) In the first iteration we assume K = 1 and pro- uct among any two columns of the measurement matrix,
ceed to estimate the largest spectral component in AM N (µ is referred to as the coherence index) [47].
the graph signal. Upon determining its position as For illustration of the uniqueness of reconstruction, re-
k1 = argmax|ATM N y|, the initially empty set of the call that a K-sparse signal can be written as
nonzero positions becomes K = {k1 }. The recon- K
X
structed vector y1 = A1 X1 , where X1 = pinv(AM 1 )y, x(n) = X(ki )uki (n),
is then removed from the measurements, y. In this i=1
case, the matrix AM 1 is a column of the matrix AM N
of which the initial estimate in (86) is equal to X0 =
defined by the index k1 . The difference e = y − y1
ATM N y = ATM N AM N X, or element-wise
is used as the measurement vector in the next step.
K K
(ii) The position of the second largest spectral compo-
X X X
X0 (k) = X(ki ) uk (n)uki (n) = X(ki )µ(k, ki ),
nent in the graph signal is estimated by solving k2 = i=1 n∈M i=1
argmax|ATM N e|. The set of nonzero positions now
becomes K = {k1 , k2 }. The first and the second where M = {n1 , n2 , . . . , nM } and
component of the graph signal are now estimated µ(k, ki ) =
X
uk (n)uki (n).
as X2 = pinv(AM 2 )y, where the matrix AM 2 is a n∈M
26
0 tract these two matrix equations we get a zero-vector on
× ×1 the right-side and a nonzero solution for the resulting vec-
tor,
2
" #
(1)
XK
3 4 X2K = (2) ,
−XK
6× 7 5 (a) requires the zero-valued determinant of AM 2K . The nonzero
determinant of AM 2K guarantees that two such, nonzero
(1) (2)
0 solutions, XK and XK , cannot exist. If all possible sub-
1 matrices AM 2K of the measurement matrix AM K are non-
sigular, then two solutions of sparsity K cannot exist, and
2 the solution is unique. The requirement that all reduced
3 4 measurement matrices corresponding to a 2K-sparse X are
7 nonsingular can be written in several forms, listed below
6 5 (b)
det{ATM 2K AM 2K } = d1 d2 · · · d2K 6= 0
|X0 (k)| |X (k)|
dmax 1 + δ2K
cond{ATM 2K AM 2K } = ≤ <∞
dmin 1 − δ2K
2
kAM 2K X2K k2
1 − δ2K ≤ dmin ≤ ≤ dmax ≤ 1 + δ2K
0 1 2 3 4 5 6 7 k 0 1 2 3 4 5 6 7 k (c) kX2K k2
2
Figure 24: Compressive sensing on graphs. (a) Available sam- where di are the eigenvalues of ATM 2K AM 2K , dmin is the
ples (measurements), y = [x(2), x(3), x(4), x(5), x(7)]T , with missing minimum eigenvalue, dmax is the maximum eigenvalue,
samples at n = 0, 1, 6. (b) Reconstructed signal, x, over the whole
set of vertices. (c) Initial estimate of the GDFT, X0 (k), (left), and and δ2K is the restricted isometry constant. All these con-
the reconstructed sparse GDFT, X(k), (right). ditions are satisfied if dmin > 0 or 0 ≤ δ2K < 1.
Noisy data require robust estimators, and thus more
strict bounds on dmin and δ2K . For example, it has been
If the maximum possible absolute value of µ(k, ki ) is de- shown that the condition 0 ≤ δ2K < 0.41 will guaran-
noted by µ = max|µ(k, ki )| (coherence index of AM N ) tee stable inversion of ATM 2K AM 2K and consequently a
then, in the worst case scenario, the amplitude of the robust reconstruction for noisy signals; in addition, this
largest component, X(ki ), (assumed with the normalized bound will allow for convex relaxation of the reconstruc-
amplitude 1), will be reduced for the maximum possi- tion problem [48]. Namely, the previous problem, (85), can
ble influence of other equally strong (unity) components be solved using the convex relation from the norm-zero to
1 − (K − 1)µ, and should be greater than the maximum a norm-one formulation given by
possible disturbance at k 6= ki , which is Kµ. From 1 −
(K − 1)µ > Kµ, the unique reconstruction condition fol- min kXk1 subject to y = AM N X.
lows; see also [34, 47].
In order to define other unique reconstruction condi- The solutions to these two problem formulations are the
tions, we shall consider again the solution to y = AM N X same if the measurement matrix satisfies the previous con-
which assumes a minimum number of nonzero coefficients ditions, with 0 ≤ δ2K < 0.41. The signal reconstruction
in X. Assume that the sparsity degree K is known, then problem can now be solved using optimization techniques,
a set of K measurements would yield a possible solution, such as gradient-based approaches or linear programming
XK , for any combination of K nonzero coefficients in X. methods [34, 48].
For another set of K measurements, we would obtain an-
other set of possible solutions, XK . Then, a common so- 4.3. Measurements as Linear Combinations of Samples
lution between these two sets of solutions would be the It should be mentioned that if some spectrum coeffi-
solution to our problem. For a unique solution, there are cients of a graph signal are strongly related to only a few
(1) (2) of the signal samples, then these signal samples may not
no two different K-sparse solutions XK and XK if all
possible matrices, ATM 2K AM 2K , are nonsingular. Namely, be good candidates for the measurements.
both of these two different solutions would satisfy mea- Example 12: Consider a graph with one of its eigenvectors
surement equations, of the form close to ui (n) = δ(n − m). This case is possi-
(1) ble on graphs, in contrast to the classic DFT analysis where
0K
XK the basis functions are spread over all sensing instants (ver-
AM 2K = y and AM 2K (2) = y, tices). A similar scenario is also possible in wavelet analysis or
0K XK
short time Fourier transforms, which also allow for some of the
h
(1) (2)
i transform coefficients to be related to only a few of the signal
where AM 2K = AM K AM K . Obviously, if we sub- samples. In the assumed simplified case, if a considered sparse
27
signal contains a nonzero coefficient, X(i), corresponding to To proceed with signal reconstruction, observe that if
ui (n) = δ(n − m), then all information about X(i) is contained the shifts are stopped after M < N steps, the available
in the graph signal sample x(m) only. This is prohibitive to signal samples will be x(n), x(n − 1), . . . , x(n − M + 1).
the principle of reduced number of samples, since an arbitrary From this reduced set of measurements/samples we can
set of available samples may not contain x(m). still recover the full graph signal, x, using compressive
In classical and graph data analysis this class of problems is
sensing based reconstruction methods, if the appropriate
solved by defining a more complex form of the measurements,
y(n), through linear combinations of all signal samples rather
reconstruction conditions are met.
than the original samples themselves. In this way, each mea- Principle of aggregate sampling on arbitrary graph.
surement, y(n), will contain information about all signal sam- The same procedure can be applied to a signal observed
ples, x(n), n = 0, 2, . . . , N − 1. in the same way on an arbitrary graph. Assume that we
Such measurements are linear combinations of all signal observe the graph signal at only one vertex, n, and obtain
samples, and are given by one graph signal sample
y0 (n) = x(n),
y(1) b11 b12 . . . b1N x(0)
y(2) b21 b12 . . . b2N x(1) which will be considered as the measurement y(0) = y0 (n).
.. = .. .. . . .. .. ,
. . . . . . This graph signal may now be “graph shifted” to pro-
y(M ) bM 1 bM 2 . . . bM N x(N − 1) duce y1 = Ax. Recall that in a one-step signal shift on a
graph, all signal samples will move by one step along the
or in a matrix form graph edges, as described in detail in Section 3.1 and illus-
trated in Fig. 25. The sample of a graph signal at vertex
y = BM N x. n will now be a sum of all signal samples that have shifted
to this vertex. Its value is obtained as an inner product of
The weighting coefficients for the measurements, bmn , in the mth row of the adjacency matrix, A, and the original
the matrix, BM N , may be, for example, drawn from a signal vector, x. The value of graph shifted signal at the
Gaussian random distribution. vertex n, is therefore given by
For reconstruction, the sparsity of a graph signal, x, X
should be again assumed in the GDFT domain. The re- y1 (n) = Anm x(m),
lation of the measurement vector, y, with this sparsity m
domain vector of coefficients, X, is then given by
and represents a linear combination of some of the sig-
y = BM N x = BM N UX = AM N X. nal samples, which is now considered as the measurement
y(1) = y1 (n).
The reconstruction is now obtained as a solution to One more signal shift on the graph yields
X
min kXk0 subject to y = (BM N U)X y2 (n) = A(2)
nm x(m),
m
or as a solution of the corresponding convex minimization (2)
problem, where Anm are the elements of matrix A2 = AA (see
Property M2 in Part I, Section 2.3). Such an observed
min kXk1 subject to y = (BM N U)X, value, after two one step shifts, y2 (n) at a vertex n, repre-
sents a new linear combination of some signal samples and
as described in Section 4.2.3. will be considered as the measurement y(2) = y2 (n).
If we proceed with shifts M = N times, a system of
4.4. Aggregate Sampling N linear equations, y = BM N x, is obtained from which
A specific form of a linear combination of graph signals all signal values, x(n), can be calculated. If we stop at
is referred to as aggregate sampling. M < N , the signal can still be recovered using compressive
For clarity, we shall first establish an interpretation of sensing based reconstruction methods if the signal is sparse
sampling in classical signal processing through its graph and the reconstruction conditions are met.
counterpart – sampling on a directed circular graph (Fig. Instead of M signal samples (instants) at one vertex,
6). Consider a graph signal, x, at a vertex/instant n. If we may use, for example, P samples at vertex n and
the signal is observed at this vertex/instant only, then its (M − P ) samples from a vertex m. Other combinations of
value is y0 (n) = x(n). Upon applying the graph shift vertices and samples may be also used to obtain M mea-
operator, we have y1 = Ax, then for the same vertex, n, surements and to fully reconstruct a signal.
we have y1 (n) = x(n − 1). If we continue this “shift and
observe” operation on the directed circular graph N times 4.5. Filter Bank on a Graph
at the same vertex/instant, n, we will eventually have all Subsampling and upsampling are the two standard op-
signal values x(n), x(n − 1), . . . , x(n − N + 1) observed at erators used to alter the scale at which the signal is pro-
vertex n. cessed. Subsampling of a signal by a factor of 2, followed
28
0 This is the basic operation used in multiresolution ap-
1 proaches based on filter banks and can be extended to
signals on graphs in the following way. Consider a graph
2 with the set of vertices V. Any set of vertices can be
3 4
considered as a union of two disjoint subsets E and H,
6 7 5 such that V = E ∪ H and E ∩ H = ∅. The subsampling-
upsampling procedure can then be performed in the fol-
y(0) = x(7) lowing two steps:
(a) signal x
1. Subsample the signal on a graph by keeping only
signal values on the vertices n ∈ E, while not altering
0 the original graph topology,
1
2. Upsample the graph signal by setting the signal val-
2 ues for the vertices n ∈
/ E to zero.
3 4
This combined subsampling-upsampling operation produces
6 7 5 a graph signal
y(1) = x(4) + x(5) + x(6) 1
f (n) = 1 + (−1)βE (n) x(n),
(b) shifted signal Ax 2
where (
Figure 25: Principle of aggregate sampling. (a) A graph signal x. (b) 0, if n ∈ E
Its graph shifted version Ax. For example, for a graph signal value βE (n) =
observed at the vertex n = 7 in the graph in (a) the measurement 1, if n ∈ H.
is y(0) = x(7), and the aggregate measurement at the same vertex,
n = 7, after the graph signal is shifted, is equal to y(1) = x(4) + The values of the resulting graph signal, f (n), are therefore
x(5) + x(6) in (b). These two observations, y(0) and y(1), would f (n) = x(n) if n ∈ E and f (n) = 0 elsewhere.
be sufficient to reconstruct a signal whose sparsity degree is K = 2 The vector form of the subsamped-upsampled graph
with nonzero values at the known spectral index positions, k1 and
k2 , if the reconstruction condition (81) is satisfied for the matrix signal, f (n), which comprises all n ∈ V, is given by
AM N = BM N U at the specified spectral index positions.
1 1
f= (x + JE x) = (I + JE )x, (87)
2 2
where JE = diag((−1)βE (n) ), n ∈ V.
The focus of our analysis will be on the two-channel
wavelet filter bank on a graph, shown in Fig. 27. As in
the classical wavelet analysis framework for temporary sig-
nals, such a filter bank provides decomposition of a graph
signal into the corresponding low-pass (smooth) and high-
2
pass (fast-varying) constituents. The analysis side (left
part of the system in Fig. 27) consists of two channels
with filters characterized by the vertex domain operators
HL (L) and HH (L), with the corresponding spectral do-
main operators HL (Λ) and HH (Λ). The operator HL (L)
acts as a low-pass filter, transferring the low-pass compo-
2 2
nents of the graph signal, while the operator HH (L) does
the opposite, acting as a high-pass filter. The low-pass fil-
ter, HL (H), is followed by a downsampling operator which
keeps only the graph signal values, x, at the vertices n ∈ E.
Figure 26: Principle of a signal, x(n), downsampling and upsampling Similarly, the high-pass filtering with the operator HH (L),
in the classical time domain. is subsequently followed by a downsampling to the vertices
n ∈ H. These operations are crucial to alter the scale at
by the corresponding upsampling, can be described in clas- which the graph signal is processed.
sical signal processing by The synthesis side (right part in Fig. 27), comprises
the complementary upsampling and filtering operations,
1 1 aiming to perform the graph signal reconstruction based
f (n) = x(n) + (−1)n x(n) = (1 + (−1)n )x(n) ,
2 2 on the upsampled versions, 21 (I + JE )HL (L)x and 21 (I +
JH )HH (L)x, of signals obtained on the filter bank anal-
as illustrated in Fig. 26. ysis side. Therefore, upon performing the upsampling of
29
Spectral solution. For the spectral representation of the
filter-bank signals in the domain of Laplacian basis func-
tions, we will use the decomposition of the graph Laplacian
+
in the form
1 1
F = UT f = (UT x + UT JE x) = (X + X(alias) ), (94)
2 2
where X(alias) = UT JE x is the aliasing spectral compo-
nent.
Figure 27: Principle of a filter bank for a graph signal. In the case of bipartite graphs, the matrix operator
UT JE produces the transformation matrix UT with re-
these signals onto the original set of vertices, V, by adding versed (left-right flipped) order of eigenvectors. This is
zeros to the complementary sets of vertices, filtering is obvious from (29) in Part I, since
performed by adequate low-pass, GL (L), and high-pass, UT JE = u0 u1 . . . uN −1 JE
T
GH (L), filters, to replace the zeros with meaningful values, T
as required for a successful reconstruction of the original u0E u1E uN −1E
=
signal. As in the classical wavelet analysis, to achieve the −u0H −u1H · · · −uN −1H
perfect (distortion-free) reconstruction it is necessary to T
= uN −1 uN −2 . . . u0 = UTLR
conveniently design the analysis filters, HL (L) and HH (L),
and the synthesis filters, GL (L) and GH (L), as well as to where
determine adequate downsampling and upsampling oper-
ukE ukE
ators. uk = , uN −1−k = , k = 0, 1, . . . N − 1,
ukH −ukH
It will be shown that the spectral folding phenomenon,
described by (29) in Part I, characterized by the specific and
spectral symmetry in the case of bipartite graphs, can ULR = uN −1 uN −2 ... u0
be used to form the basis for the two-channel filter bank is a left-right flipped version of the eigenvector matrix
framework discussed in this Section.
Consider a graph signal, x, and the filter-bank as in U = u0 u1 . . . uN −1 .
Fig. 27. If the graph signal, x, passes through a low-
pass analysis filter, HL (L), the output signal is HL (L)x. The element-wise form of equation (94) is given by
According to (87), the downsampled-upsampled form of 1
the output signal, HL (L)x, is given by 12 (I + JE )HL (L)x. F (k) = (X(k) + X(N − 1 − k)).
2
After the syntheses filter, GL (L), the graph signal output
For bipartite graphs and the normalized graph Laplacian,
becomes
1 we can write
fL = GL (L)(I + JE )HL (L)x. (88)
2 1
F (λk ) = (X(λk ) + X(2 − λk )).
The same holds for the high-pass part 2
1 The second term in F (λk ) represents an aliasing compo-
fH = GH (L)(I + JH )HH (L)x, (89) nent of the GDFT of the original signal.
2
The spectral representation of (92) is obtained with a
where JH = −JE = diag((−1)1−βE (n) ) and left-multiplication by UT and a right-multiplication by U,
JH + JE = 0. (90) UT GL (L)UUT HL (L)U + UT GH (L)UUT HH (L)U = 2I,
having in mind that we can add UT U = UUT = I be-
The overall output is a sum of these two signals, as illus-
tween GL (L) and HL (L), and between GH (L) and HH (L).
trated in Fig. 27, which after rearranging of terms gives
Using the spectral domain definition of the transfer func-
1 tions, UT HL (L)U = HL (Λ), we get the spectral domain
y = fL + fH = (GL (L)HL (L) + GH (L)HH (L))x+ form of the reconstruction condition (92) as
2
1
(GL (L)JE HL (L) + GH (L)JH HH (L))x. (91) GL (Λ)HL (Λ) + GH (Λ)HH (Λ) = 2I. (95)
2
For the aliasing part in equation (93), the left-multiplication
The perfect reconstruction condition, y = x, is then
is performed by UT , while the right-multiplication is done
achieved if
by UTLR . The first term in (93) is then of the form
GL (L)HL (L) + GH (L)HH (L) = 2I, (92) UT GL (L)UUT JE HL (L)ULR = UT GL (L)UUTLR HL (L)ULR
GL (L)JE HL (L) − GH (L)JE HH (L) = 0. (93) (R)
= GL (Λ)HL (Λ), (96)
30
since UT JE = UTLR and UTLR ULR = I. The term
(R)
HL (Λ) = UTLR HL (L)ULR = HL (2I − Λ)
is just a reversed order version of the diagonal matrix
HL (Λ), with diagonal elements HL (λN −1−k ) = HL (2−λk )
instead of HL (λk ).
The same holds for the second term in (93) which is
equal to GH (L)JH HH (L), yielding the final spectral form (a) (b)
of the aliasing condition in (93) as
Figure 28: Bipartite graph for the Haar wavelet transform with N =
GL (Λ)HL (2I − Λ) − GH (Λ)HH (2I − Λ) = 0. (97) 16 vertices. (a) Vertices in yellow are used for the low-pass part of
the signal and correspond to the set E, while the vertices in gray
An element-wise solution to the system in (92)-(93), for belong to the set H. This is the highest two-vertex resolution level
bipartite graphs and the normalized graph Laplacian, ac- for the Haar wavelet. (b) Graph for a four-vertex resolution level in
cording to (95) and (97), reduces to the Haar wavelet.
GL (λk )HL (λk ) + GH (λk )HH (λk ) = 2, (98)
GL (λk )HL (2 − λk ) − GH (λk )HH (2 − λk ) = 0. (99) since UT U = I, UTLR ULR = I, UT JE = UTLR , UTLR U = ILR ,
and ILR X = XUD , where ILR is an anti-diagonal (backward)
Remark 23: A quadratic mirror filter solution would be identity matrix, and XUD is the GDFT vector, X, with ele-
such that for the designed transfer function of the low-pass ments flipped upside-down.
analysis filter, HL (λ), the other filters are The same holds for the high-pass part in (89), to yield
GL (λ) = HL (λ), 1 T
U GH (L)(I + JH )HH (L)x
FH =
2
HH (λ) = HL (2 − λ), 1 1
= GH (Λ)HH (Λ)X − GH (Λ)HH (2I − Λ)XUD
GH (λ) = HH (λ) = HL (2 − λ). (100) 2 2
For this solution, the design equation is given by and
FL + FH = X.
HL2 (λ) + HL2 (2 − λ) = 2, (101)
while the aliasing cancellation condition, (99), is always Therefore, after the one-step filter-bank based decomposi-
satisfied. tion on a bipartite graph, we have a new low-pass signal, fL , for
which the nonzero values are at the vertices in E, and a high-
An example of such a system
√ would be an ideal low-pass pass signal, fH , with nonzero values only on H. Note that the
filter, defined by HL (λ) = 2 for λ < 1 and HL (λ) = 0 high-pass operator on the graph signal is the graph Laplacian,
elsewhere. Since HH (λ) = HL (2 − λ) holds for systems L, while the low-pass operator is 2I − L, which easily reduces
on bipartite graphs, this satisfies the reconstruction con- to |L|, for the normalized graph Laplacian used here.
dition. For the vertex domain realization, an approxima- Another simple transfer √ function that satisfies the design
tion of the ideal filter with a finite neighborhood filtering equation (101) is HL (λ) = 2 cos(πλ/4). A similar analysis
relation would be required. can also be done for this transfer function and other functions
Example 13: Consider a simple form of the low-pass system defined by (100). √
The considered
√ transfer functions HL (λ) = 2 − λ and
HL2 (λ) = 2 − λ, HL (λ) = 2 cos(πλ/4) have several disadvantages, the most
which satisfies the design equation, HL2 (λ) + HL2 (2 − λ) = 2. important being that they are not sufficiently smooth in the
It also satisfies the condition that its form is of low-pass type spectral domain at the boundary interval points [35]. In ad-
for the normalized Laplacian of bipartite graphs, HL2 (λ0 ) = dition, although the graph Laplacian, L, is commonly sparse
2 − λ0 = 2, since λ0 = 0, and HL2 (λmax ) = 2 − λmax = 0, as (with a small number of nonzero elements √ in large graphs), the
λmax = 2. The vertex domain system operators which satisfy transfer function form HL (L) = 2I − L is not sparse. This
all four quadratic mirror analysis and synthesis filters in (100), is the reason to use other forms which are sufficiently smooth
are toward the boundary points, along with their polynomial ap-
√ √ proximations, HL (Λ) = c0 Λ + c1 Λ2 + · · · + cM −1 ΛM −1 , with
HL (Λ) = 2I − Λ, GL (Λ) = HL (Λ) = 2I − Λ,
√ √ the coefficients c0 , c1 , . . . , cM −1 , that approximate HL (λ) and
HH (Λ) = HL (2I − Λ) = Λ, GH (Λ) = HH (Λ) = Λ. HH (λ) = HL (2 − λ) for each λ = λk , k = 0, 1, . . . , N − 1. This
topic will be addressed in detail on a general form of graphs in
The spectral domain filtering form for the low-pass part of
Section 7.
graph signal is then obtained from (88), as
1 T
The classic time-domain Haar wavelet (and scale) func-
FL = UT fL = U GL (L)(I + JE )HL (L)x tions are easily obtained for a bipartite graph, such that
2
1 T E = 0, 2, 4, . . . , N − 2 and H = 1, 3, 5, . . . , N − 1, with
= U GL (L)UUT (I + JE )HL (L)ULR UTLR UX the adjacency/weighting matrix defined by the elements
2
1 1 Amn = 1, for (m, n) ∈ {(0, 1), (2, 3), . . . , (N − 2, N − 1)},
= GL (Λ)HL (Λ)X + GL (Λ)HL (2I − Λ)XUD
2 2 as shown in Fig. 28(a). This adjacency matrix has the
31
0 1 0 0 0 0 0 0 0 −1 0 0 0 0 0 0 0
2 0 1 0 0 0 0 0 0 0 −1 0 0 0 0 0 0
4 0 0 1 0 0 0 0 0 0 0 −1 0 0 0 0 0
6 0 0 0 1 0 0 0 0 0 0 0 −1 0 0 0 0
8 0 0 0 0 1 0 0 0 0 0 0 0 −1 0 0 0
10
0 0 0 0 0 1 0 0 0 0 0 0 0 −1 0 0
12
0 0 0 0 0 0 1 0 0 0 0 0 0 0 −1 0
14 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 −1
L= (102)
1 −1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
3 0 −1 0 0 0 0 0 0 1 0 0 0 0 0 0
5 0 0 −1 0 0 0 0 0 0 0 1 0 0 0 0 0
7 0 0 0 −1 0 0 0 0 0 0 0 1 0 0 0 0
9 0 0 0 0 −1 0 0 0 0 0 0 0 1 0 0 0
11
0 0 0 0 0 −1 0 0 0 0 0 0 0 1 0 0
13 0 0 0 0 0 0 −1 0 0 0 0 0 0 0 1 0
15 0 0 0 0 0 0 0 −1 0 0 0 0 0 0 0 1
0 2 4 6 8 10 12 14 1 3 5 7 9 11 13 15
block form as in equation (18), Part I. The correspond- way, which designates the diffusion process on a graph in
ing graph Laplacian is given in (102). Its eigenvectors time.
are equal to the wavelet transform functions. The bipar-
tite graph for the four-vertex resolution level in the Haar 5.1. Diffusion on Graph and Low Pass Filtering
wavelet transform is shown in Fig. 28(b). Consider the diffusion equation
5. Time-Varying Signals on Graphs ∂x/∂t = −αLx.
We shall denote a time-varying signal by xp (n), where Its discrete-time form, at a time instant p, may be ob-
n designates the vertex index and p the discrete-time in- tained by using the backward difference approximation of
dex. For uniform sampling in time, the index p corre- the partial derivative (∂x/∂t ∼ xp+1 − xp ), and has the
sponds to the time instant p∆t, where ∆t is the sampling form
interval. In general, this type of data can be considered xp+1 − xp = −αLxp+1
within the graph Cartesian product framework (given in or xp+1 (I + αL) = xp to produce
Property M15 , Section 2.3, Part I). The resulting graph
G = (V, B) follows as a Cartesian product of the given xp+1 = (I + αL)−1 xp .
graph G1 = (V1 , B1 ) and a simple path (or circular) graph
G2 = (V2 , B2 ) that corresponds to the classical uniformly On the other hand, the forward difference approxima-
samples time-domain axis. tion (∂x/∂t ∼ xp − xp−1 ) to the diffusion equation yields
Example 14: A graph topology for a time varying signal on
a graph is shown in Part I, Fig. 9, where the graph vertices xp+1 − xp = −αLxp
are designated by 1, 2, 3, 4, 5 and time instants are denoted as
the a, b, c vertices on the path graph. The resulting Cartesian or
product graph, for the analysis of this kind of signals, is shown xp+1 = (I − αL)xp .
in Part I, Fig. 9.
It is interesting to note that these iterative forms lead
The adjacency matrix of a Cartesian product of two to the minimization of the quadratic form of a graph sig-
graphs is then given by nal, Ex = xLxT , (see Section 4.2.1, Part I). The minimum
A = A1 ⊗ IN2 + IN1 ⊗ A2 = A1 ⊕ A2 , of this quadratic form can be found based on the steepest
descent method, whereby the signal value at a time instant
where A1 is the adjacency matrix of the graph of interest p is moving in the direction opposite to the gradient, to-
G1 , and A2 is the adjacency matrix for the path or circular ward the energy minimum position, with a step constant
graph, G2 , which designates the sampling grid, while N1 α. The gradient of the quadratic form, Ex = xLxT , is
and N2 denote, respectively, the number of vertices in G1 ∂Ex /∂xT = 2xL, which results in an iterative procedure
and G2 .
We will next consider a simple and important example xp+1 = xp − αLxp = (I − αL)xp . (103)
of a time-varying signal defined on graph in an iterative
32
This relation can be used for simple and efficient filtering of For some values of α < β, this system can be a good
graph signals (with the aim to minimize Ex as a measure of and computationally very simple approximation of a graph
signal smoothness). The spectral domain relation follows low-pass filter.
immediately, and has the form
1.5 1.5
Xp+1 = (I − αΛ)Xp
1 1
or for every individual component
0.5 0.5
Xp+1 (k) = (1 − αλk )Xp (k).
0 0
Recall that the eigenvalues, λk , represent the index of
smoothness for a spectral vector (eigenvector), uk , with a 0 2 4 0 2 4
small λk indicating smooth slow-varying elements of the 1.5 1.5
eigenvectors; therefore, for low-pass filtering we should re-
tain the slow-varying eigenvectors in a spectral represen- 1 1
tation of the graph signal. Obviously, these slow-varying
components will pass through this system since (1−αλk ) is 0.5 0.5
close to 1 for small λk , while the fast-varying components 0 0
with a larger λk , are attenuated. This iterative procedure
will converge if |1 − αλmax |< 1. 0 2 4 0 2 4
In a stationary state of a diffusion process, the trivial
minimal energy solution is obtained when Figure 29: Filter approximation in the spectral domain for a varying
number of iterations, K, using Taubin’s algorithm and the graph
lim Xp+1 (k) = (1 − αλk )p+1 X0 (k), Laplacian matrix of the graph in Fig. 8.
p→∞
that is, all components Xp+1 (k) tend to 0, except for the Example 15: Consider the graph from Fig. 8(a) and its graph
constant component, Xp+1 (0), for which λ0 = 0. This Laplacian, L. For the choice of parameters α = 0.1798 and
component therefore defines the stationary state (maxi- β = 0.2193, the spectral transfer function in (106) is shown in
Fig. 29 for the considered graph filter, and for the numbers of
mally smooth solution). In order to avoid this effect in the
iterations in Taubin’s algorithm K = 1, 5, 30 and 150. Observe
processing of data on graphs, and to retain several low-
how the transfer function, H(λk ), approaches the ideal low-pass
pass components (eigenvectors) in the signal, the iteration form as the number of iterations, K, increases.
process in (103) can be used in alternation with The task is next to low-pass filter the noisy signal from
Fig. 16(b), with the initial noisy signal is denoted by x0 . Then
xp+2 = (I + βL)xp+1 . (104)
x1 = (I − 0.1545L)x0 is calculated using the corresponding
This is the basis for Taubin’s α − β algorithm, presented graph Laplacian, followed by obtaining x2 = (I + 0.1875L)x1 .
next. In the third and fourth iteration, the signal values x3 = (I −
0.1545L)x2 and x4 = (I+0.1875L)x3 are calculated. This two-
5.2. Taubin’s α − β algorithm step iteration cycle is repeated K = 20 times. The resulting
signal is the same as the output of an ideal low-pass filter shown
When the two iterative processes in (103) and (104) are in Fig. 16(c).
used in a successive order, the resulting system on a graph Finally, the noisy signal from Fig. 3 was filtered using
is referred to as Taubin’s α−β algorithm. This algorithm is Taubin’s α − β algorithm, with α = 0.1 and β = 0.1, over
widely used for low-pass filtering of data on graphs, since K = 100 iterations, and the result is shown in Fig. 30. Ob-
it is very simple, and admits efficient implementation in serve the reduced level of additive noise in the output.
the vertex domain.
Definition: Taubin’s α−β algorithm is a two-step iterative
algorithm for efficient low-pass data filtering on graphs. Its 6. Random Graph Signals
two steps are defined in a unified way as
This section extends the concepts of data analytics for
xp+2 = (I + βL)(I − αL)xp . (105) deterministic signals on graphs addressed so far, to intro-
duce notions of random signals on graphs, their proper-
The corresponding element-wise transfer function spectral ties, and statistical graph-specific methods for their anal-
domain of the two iteration steps in (105) is given by ysis. The main focus is on wide-sense stationary (WSS)
H(λk ) = (1 + βλk )(1 − αλk ). data observed on graphs. In general, the stationarity of
a signal is inherently related to the signal shift operator
After K iterations of this algorithm, the spectral domain and its properties. We have already presented two ap-
transfer function can be written as proaches to define a shift on a graph (through the adja-
K cency matrix and the graph Laplacian, and their spectral
HK (λk ) = (1 + βλk )(1 − αλk ) . (106) decompositions). These will be used, along with other
33
Remark 27: For WSS random signals, their correlation
11
6 matrix, Rx = E{xxT }, is diagonalizable with the same
def
12 transform matrix, U, which defines the DFT, X = UH x,
def
10 with x = UX. The proof follows from
4
Rx = E{xxT } = E{UX(UX)H }
9
14
5 13 = UE{XXH }UH = UPx UH , (107)
2
and the fact that Px is a diagonal matrix for WSS signals.
8 0
1
The properties of the WSS signals in classical analyses,
3 presented in this subsection, will be used next to define the
15 corresponding properties of random signals on undirected
graphs.
7
6.2. Adjacency Matrix Based Definition of GWSS
Consider a real-valued white noise signal on a graph,
ε = {ε(n)}. Following Remark 24, a signal x on the graph
is graph wide sense stationary (GWSS) if it can be con-
Figure 30: The noisy signal from Fig. 3 was filtered using K = 100
iterations of the Taubin two-step algorithm with α = 0.1 and β = 0.1. sidered an output of a linear shift invariant system on a
PM −1
graph, H(A) = m=0 hm Am , which is driven by a white
noise input, ε, that is
general properties of WSS signals, to define the conditions
for wide sense stationarity of random signals on graphs x = H(A)ε.
[49, 50, 51, 52, 53, 54]. However the main obstacle toward
extending the classical statistical data analytics to graphs Remark 28: The autocorrelation matrix, Rx = E{xxT },
is that the shift on a graph typically does not preserve of a GWSS signal is diagonalizable with the eigenmatrix
2 2
signal energy (isometry property), that is, kAxk2 6= kxk2 . of the adjacency matrix, A, since (cf. Remark 27)
For completeness, we first provide a short review of
WSS definitions in classical signal processing, together with A = UΛU−1 = UΛUT
their properties. E{xxT } = UPx UT , (108)
6.1. Review of WSS and related Properties for Random where Px is a diagonal matrix. The values on the diag-
Signals in Standard Time Domain onal of matrix Px can be comprised into the vector px ,
Definition: A real-valued random signal, x(n), is WSS which represents the PSD of a graph signal, x, px (k) =
in the standard time domain if its mean value is time- E{|X(k)|2 }.
invariant, µx (n) = E{x(n)} = µx , and its autocorrela- To prove this property for a signal x = H(A)ε, con-
tion function is shift-invariant, that is, rx (n, n − m) = sider
E{x(n)x(n − m)} = rx (m). T
Remark 24: A random WSS time-domain signal, x(n), Rx = E{xxT } = E{H(A)ε H(A)ε } = H(A)H T (A),
can be considered as an output of a linear shift invariant
system with impulse response, h(n), which is driven by a since E{εεT } = I. Using H(A) = UT H(Λ)U, we obtain
white noise input, ε(n), with rε (n, m) = δ(n − m).
Rx = UT |H(Λ)|2 U,
Remark 25: In classical time domain, the eigenvectors,
uk , of the shift operator y(n) = x(n − 1), or in a matrix which concludes the proof that the matrix Px is diagonal
form y = Ax, are the DFT basis functions, with A =
Px = |H(Λ)|2 ,
UΛUH . This property is discussed in detail and proven
in Part I, Section 3.2.1, equations (24)-(25). with the diagonal elements equal to the PSD of signal x,
H
Remark 26: For a random signal, its DFT X = U x
px (k) = |H(λk )|2 .
is also a random signal with the power spectrum matrix
Px = E{XXH }, where UH is the DFT transformation The periodogram of a graph signal can be estimated
matrix. For WSS signals, the matrix Px is diagonal and using K realizations of the random signal, denoted by xi ,
has the power spectral density (PSD) as its diagonal values and is equal to the diagonal elements of the matrix
px (k) = DFT{rx (n)} = E{|X(k)|2 }. K K
1 X 1 X
P̂x = |Xi |2 = UT (xi xTi )U.
K i=1 K i=1
34
Consider a system on a graph, with a spectral domain 6.4. Spectral Domain Shift Based Definition of GWSS
transfer function H(Λ). Assume that the input signal to Consider an m-step shift on a graph defined using the
this system is GWSS, with PSD px (k). The PSD of the graph filter response
output graph signal, y(n), is then given by
N −1
py (k) = |H(λk )|2 px (k).
X
Tm {h(n)} = hm (n) = H(λk )uk (m)uk (n). (109)
k=0
This expression is conformal with the output power of a
standard linear system. The matrix form of this relation is given by
M −1
6.3. Wiener filter on a graph X
Th = H(L) = hm Lm = UH(Λ)UT , (110)
Consider a real-valued graph signal, s, which serves as
m=0
an input to a linear shift-invariant system on an undirected
graph, to yield a noisy output where Tm {h(n)} are the elements of Th .
M −1
Note that the graph filter response function is well lo-
calized on a graph. Namely, if we use, for example, the
X
x= hm Am s + ε.
m=0
(M −1)-neighborhood of a vertex n, within a filtering func-
tion of order M defined by H(Λ), then only the vertices
In the spectral domain, this system is described by within this neighborhood are used in the calculation of
X = H(Λ)S + E, graph filter response. From (110), we see that the local-
ization operator acts in the spectral domain and associates
where E is the GDFT of the noise, ε. the corresponding shift to the vertex domain.
Assume that the signal and noise are statistically in- Definition: A random graph signal, x(n), is GWSS if its
dependent, and that the noise is a zero-mean GWSS ran- autocorrelation function is invariant with respect to the
dom signal. The aim is to find the system function of the shift, Tm {rx (n)}, that is
optimal filter, G(Λ), such that its output Y = G(Λ)X,
estimates the GDFT of the input, S, in the least squares rx (m) = E{x(n)x(n − m)} = Tm {rx (n)}.
sense. This condition can be expressed as
Similar to (108), the autocorrelation matrix, Rx , of a
2 2
e2 = E{kS − Yk2 } = E{kS − G(Λ)Xk2 }. GWSS signal is diagonalizable based on the matrix of
eigenvectors of the graph Laplacian L, whereby
Upon setting the derivative of e2 with respect to the ele-
ments of G(Λ) to zero, we arrive at L = UΛUT . (111)
2E{(S − G(Λ)X)XT } = 0, For the basic autocorrelation we use
which results in the system function of the graph Wiener Rx = UPx (Λ)UT
filter in the form
E{SXT } E{S(H(Λ)S + E)T } so that
G(Λ) = =
E{XXT } E{(H(Λ)S + E)(H(Λ)S + E)T } N
X −1
H(Λ)Ps Tm {rx (n)} = px (λk )uk (m)uk (n)
= 2 k=0
H (Λ)Ps + Pε
where
or element-wise
Px (Λ) = URx UT
H(λk )ps (k)
G(λk ) = . is a diagonal matrix.
H 2 (λk )ps (k) + E(k)
When the noise is not present, the elements of the vector E 6.5. Isometric Shift Operator
are zero-valued, E(k) = 0 for all k, and the graph inverse Another possible approach may pbe based on the shift
filter (introduced in Section 3.5.5) directly follows. operator defined as Tm = exp(jπ L/ρ), where ρ is the
Remark 29: The above expressions for the graph Wiener upper bound on the eigenvalues, ρ = maxk {λk }, [55, 56].
filter are conformal with the standard frequency domain Physically, this operator casts the eigenvalues of the Lapla-
Wiener filter, given by cian, L, onto a unit circle, thus preserving in this way the
isometry property, since
Ps (ω)
G(ω) = , p p
Ps (ω) + Pε (ω) Th = exp(jπ L/ρ) = U exp(jπ Λ/ρ)UT . (112)
which again demonstrates the generic nature of Graph The property f (L) = Uf (Λ)UH was used above. Observe
Data Analytics. that for real-valued eigenvalues, λk , all eigenvalues of the
35
1
p 0 1
matrix exp(jπ Λ/ρ) p reside on the unit circle, with the
2
frequency 0 ≤ ωk = π λk /ρ ≤ π being associated to the 3
eigenvector uk . 4
However, with this setup the corresponding GDFT bases 6 5
7
are not necessarily orthogonal for a general graph struc-
ture. In turn, the work in [57] drew inspiration from (a) Directed graph structure. (b) Graph signal, x.
the classical definition of the unitary shift operator acting
on Hilbert spaces [58] to define an isometric shift oper-
ator with orthogonal GDFT bases. Since the adjacency
matrix, A, provides the minimal information required to
fully reflect the connectivity structure arising from the
(c) Backward shifted signal, Sx. (d) Backward shifted signal, Ax.
graph topology, and therefore to define the most elemen-
tary graph shift, the task of determining an isometric graph
shift operator can be formulated as finding the matrix S
closest to A in a Hilbert space, that is,
max hS, Ai = tr SAT
S
(e) Forward shifted signal, ST x. (f) Forward shifted signal, AT x.
s.t. ST S = SST = I
Figure 31: Graph signal shifts by the proposed isometric shift opera-
This can be achieved analytically by evaluating the singu- tor, S, evaluated on a directed graph. (a) Directed graph structure.
(b) A simple graph signal, x. (c) Backward shifted version of x,
lar value decomposition of A, given by A = UΣVT where given by Sx. (d) Backward shifted version, Ax. (e) Forward shifted
U, V ∈ RN ×N are respectively the left and right matrix of version, ST x. (f) Forward shifted version, AT x. The red arrows
singular vectors, and Σ ∈ RN ×N is the diagonal matrix of indicate the movement of the pulse at vertex n = 2 towards vertices
connected with (i) outgoing blue arrows in (a) for a forward shift
singular values. In this way, the backward shift operator
and (ii) to vertices connected with incoming blue arrows in (a) for a
can be expressed as backward shift.
S = UQVT (113)
in a natural analogy with classical time-frequency analysis
where [63, 64, 65].
I(N −1)×(N −1) 0(N −1)×1 It is important to note that, while the concept of win-
Q= (114)
01×(N −1) det(UVT ) dow functions for signal localization has been extended
ensures that det(S) = 1, so as to produce a proper rotation to signals defined on graphs [18, 66, 67, 68, 69], such ex-
matrix. The resulting matrix, S, is called the symmetric tensions are not straightforward, since, owing to inher-
orthogonalization of the matrix A, and is unique [59, 60]. ent properties of graphs as irregular but interconnected
The solution is also closely related to the orthogonal Pro- domains, even an operation which is very simple in clas-
crustes problem [61] and the Kabsch algorithm [62]. sical time-domain analysis, like the time shift, cannot be
Example 16: Consider the directed graph in Fig. 31(a) which
straightforwardly generalized to graph signal domain. This
has N = 8 vertices in the set V = {0, 1, 2, 3, 4, 5, 6, 7}. The has resulted in several approaches to the definition of the
backward and forward shifted versions of the signal in Fig. graph shift operator, and much ongoing research in this
31(b) were evaluated using both the elementary shift matrix, domain [18, 66, 67, 68, 69].
A, and the proposed isometric graph shift operator, S in (113). A common approach to signal windowing in the graph
Notice that, as desired, the signal energy was preserved when domain is to utilize the eigenspectrum of a graph signal
employing the isometric graph shift operator, S, while the en- to obtain window function for each graph vertex [5]. An-
ergy of the signals shifted through A increased. other possibility is to define the window support as a local
neighborhood for each vertex [69]. In either case, the lo-
calization window is defined based on a set of vertices that
7. Vertex-Frequency Representations
contain the current vertex, n, and all vertices that are close
Oftentimes in practical applications concerned with large in some sense to the vertex n, that is, a neighborhood of
graphs, we may not be interested in the analysis of the en- vertex n. In this monograph, special attention is devoted
tire graph signal, but rather in its local behavior. Indeed, to the class of local graph Fourier transform approaches
the Big Data paradigm has revealed the possibility of using which can be implemented in the vertex domain, since
smaller and localized subsets of the available information this domain often offers a basis for numerically efficient
to enable reliable mathematical analysis and local char- analysis in the case of very large graphs.
acterization of subsets of data of interest [12]. Our aim Notice that, as in classical signal analysis, a localization
in this section is to characterize the localized graph signal window should be narrow enough so as to provide good
behavior simultaneously in the vertex-frequency domain, localization of signal properties, but at the same time wide
36
enough to produce high resolution in the spectral domain. while matrix U is composed of the eigenvectors uk , with
With vertex-frequency analysis serving as a key to graph elements uk (n), k = 0, 1, . . . , N −1, of the graph Laplacian
signal estimation, filtering, and efficient representation, as its columns.
two forms of the local graph Fourier transform inversion Special cases:
are considered here, while the inversion condition is de-
fined within the frames framework, that is, based on the • For hm (n) = 1, the localized vertex spectrum is
analysis of energy of the graph spectrogram. A relation equal to the standard spectrum, S(m, k) = X(k),
between the graph wavelet transform and the local graph in (115) for each m; this means that no vertex local-
Fourier transform implementation and its inversion is also ization is performed.
established. • If hm (m) = 1 and hm (n) = 0 for n 6= m, the lo-
Remark 30: The energy versions of the vertex-frequency calized vertex spectrum
√ is equal to the graph signal,
representations are also considered, as these representa- S(m, 0) = x(m)/ N , for k = 0.
tions can be implemented without a localization window, In the following, we outline ways to create vertex do-
and they can serve as estimators of the local smoothness main windows with desirable localization characteristics,
index. and address two methods for defining graph localization
The reduced interference vertex-frequency distributions, window functions, hm (n):
which satisfy the marginal property and localize graph sig-
• Spectral domain definition of windows, hm (n), which
nal energy in the vertex-frequency domain are also defined,
are defined using their spectral basic function. The
and are subsequently related to classical time-frequency
spectral domain definition of the window is shown to
analysis, as a special case.
be related to the wavelet transform.
Consider a graph with N vertices, n ∈ V = {0, 1, . . . , N −
1}, which are connected with edges whose weights are • Vertex domain window definitions, with one method
Wmn . Spectral analysis of graphs is most commonly based bearing a direct relation to the spectral analysis of
on the eigendecomposition of the graph Laplacian, L, or the graph window, while the other method represents
the adjacency matrix, A. By default, we shall assume a purely vertex domain formulation.
the decomposition of the graph Laplacian, L, if not stated
otherwise. 7.1.1. Windows Defined in the GDFT Domain
The basic function of a window, h(n), can be conve-
7.1. Localized Graph Fourier Transform (LGFT)
niently defined in the spectral domain, for example, in the
The localized graph Fourier transform (LGFT), de- form
noted by S(m, k), can be considered as an extension of H(k) = C exp(−λk τ ), (116)
the standard time-localized (short-time) Fourier transform
where C denotes the “window amplitude” and τ > 0 is a
(STFT), and can be calculated as the GDFT of a signal,
constant which determines the window width in the spec-
x(n), multiplied by an appropriate vertex localization win-
tral domain. Notice that the graph shifted and “modu-
dow function, hm (n), to yield
lated” versions of this window are straightforwardly ob-
N
X −1 tained using the generalized convolution of graph signals,
S(m, k) = x(n)hm (n) uk (n). (115) defined in Section 3.9. The graph-shifted window in the
n=0 vertex domain is then defined by the IGDFT of H(k)uk (m),
to give the window localized at the vertex m, denoted by
In general, it is desired that a graph window function, hm (n), as in (57), in the form
hm (n), should be such that it localizes the signal content
N −1
around the vertex m. To this end, its values should be close X
to 1 at vertex m and vertices in its close neighborhood, hm (n) = h(n) ∗ δm (n) = H(k)uk (m)uk (n). (117)
while it should approach to 0 for vertices that are far from k=0
vertex m. For an illustration of the concept of localization An example of two windows obtained in this way is given
window on a graph see Fig. 33, panels (a) and (c). in Fig. 33(a), (b).
The localized GDFT in (115) admits a matrix notation, Observe that the exponential function in (116) corre-
S, and contains all elements, S(m, k), m = 0, 1, . . . , N − 1, sponds to a Gaussian window in classical analysis (thus of-
k = 0, 1, . . . , N − 1. The columns of S which correspond fering the best time-frequency concentration [63, 64, 65]),
to a vertex m are given by since graph signal processing on a path graph reduces to
classical signal analysis. In this case, the eigenvalues of
sm = GDFT{x(n)hm (n)} = UT xm , the graph Laplacian, λ, may be related to the frequency,
ω, in classical signal analysis as λ ∼ ω 2 .
where xm is the vector of which the elements, x(n)hm (n), Properties of graph window functions. The graph
are equal to the graph signal samples, x(n), multiplied by window which is localized at the vertex m, and defined by
the window function, hm (n), centered at the vertex m, (117), satisfies the following properties:
37
1 1
0 0
-1 -1
0.5 0.5
0 0
-0.5 -0.5
-1 0 1 (a) -1 0 1 (b)
99 5
79 74 8
93 59 52 51 31 24 13
8280 62 55
32 22
98 76 75 69 28
29 15 7
63 39 26
95 94 78 66 58 27 19 12 1
86 60 4240 37 23 17
81 65 53
54 35 11 6 4
89 67 61 57 48 41 25 20
85 84 44 2
96 92 38 34
68 64 56 45 43
97 83 18
90 88
73
71
50 30 21 0
77 49
91 36 33 9
87 72 70 47
46 1614 10 3
(c)
(d)
Figure 32: Concept of a signal on a graph. (a) Vertices on a three-dimensional manifold Swiss roll surface. (b) A graph representation
on the Swiss roll manifold. (c) Two-dimensional presentation of the three-dimensional graph from (b), with vertex colors defined by the
three smoothest graph Laplacian eigenvectors u1 (n), u2 (n), and u3 (n). (d) A signal observed on the graph in (c), which is composed of
three Laplacian eigenvectors (signal components). The supports of these three components are designated by different vertex colors. The
vertex-frequency representations are then assessed based on their ability to accurately resolve and localize these three graph signal components.
38
W1 : Symmetry, hm (n) = hn (m), which follows from the the value of S(m, k) in (119) becomes the standard STFT,
definition in (117). that is
W2 : A sum of all coefficients of a localized window, hm (n), N −1 N −1
1 X X 2π 2π 2π
is equal to H(0), since S(m, k) = x(n)H(p)ej− N mp ej N np e−j N nk ,
N 3/2 n=0 p=0
N −1 N −1 N −1
X X X N −1
hm (n) = H(k)uk (m) uk (n) 1 X
= x(n)h(n − m)e−j2πnk/N , (123)
n=0 k=0 n=0 N n=0
N −1
X √
= H(k)uk (m)δ(k) N = H(0), where h(n) is the inverse DFT of H(k).
k=0
Example 17: To illustrate the principle of local vertex-frequency
PN −1 √
with n=0 uk (n) = δ(k) N , following from the def- representations, consider the graph and the graph signal from
inition of the eigenvectors, uk (n). Fig. 32. A graph with N = 100 vertices, randomly placed
on the so called Swiss roll surface, is shown in Fig. 32(a).
W3 : The Parseval theorem for hm (n) has the form The vertices are connected with edges whose weights are de-
2
fined as Wmn = exp(−rmn /α), where rmn is the Euclidean
N
X −1 N
X −1 distance between the vertices m and n, measured along the
|hm (n)|2 = |H(k)uk (m)|2 . (118) Swiss roll manifold, and α is a constant. Small weight values
n=0 k=0 were hard-thresholded to zero, in order to reduce the number
of edges associated with each vertex to only a few strongest
These properties will be used in the sequel in the inversion ones. The so produced graph is shown in Fig. 32(b), and its
analysis of the LGFT. two-dimensional presentation in Fig. 32(c). Vertices are or-
Based on the above properties, the LGFT can now be dered so that the values of the Fiedler eigenvector, u1 (n), are
written as nondecreasing.
N −1
A signal on this graph was created so as to be composed
X of parts of three Laplacian eigenvectors. For the subset, V1 , of
S(m, k) = x(n)hm (n) uk (n) (119)
all vertices, V, which comprises the vertices with indices from
n=0
m = 0 to m = 29, the eigenvector with the spectral index
N −1 N −1
X X k = 8 was used. For the subset, V2 , with the vertex indices
= x(n)H(p)up (m)up (n) uk (n). (120) from m = 30 to m = 59, the signal was equal to the eigenvector
n=0 p=0 u66 (n), that is, with k = 66. The remaining vertices form the
vertex subset V3 , and the signal on this subset was equal to the
The modulated (frequency shifted) version of the window
eigenvector with the spectral index k = 27. The amplitudes of
centered at a vertex m and for a spectral index k will be these eigenvectors were scaled too.
referred to as the vertex-frequency kernel, Hm,k (n), which Consider now the vertex-frequency localization kernels,
is defined as
−1 Hm,k (n) = hm (n)uk (n),
NX
Hm,k (n) = hm (n)uk (n) = H(p)up (m)up (n) uk (n). √
shown in Fig. 33. The constant eigenvector, u0 (n) = 1/ N ,
p=0 was used in the panel shown in Fig. 33(a) at m = 34. In this
(121) case, the√localization window, h34 (n), is shown since H34,0 (n) =
Using the kernel notation, it becomes obvious that the h34 (n)/ N . The illustration is repeated in the panel in Fig.
LGFT in (120), for a given vertex m and a spectral index k, 33(c) for the vertex m = 78. The frequency shifted version of
physically represents a projection of a graph signal, x(n), these two vertex-domain kernels, shown in Figs. 33(a) and (c),
onto the graph kernel, Hm,k (n), that is, are given respectively in Figs. 33(b) and (d), where Hm,20 (n) =
hm (n)u20 (n) is shown for m = 34 and m = 78, respectively.
N −1
X Next, the vertex-frequency representation, S(n, k), using
S(m, k) = hHm,k (n), x(n)i = Hm,k (n)x(n). (122) the LGFT and the localization window defined in the spectral
n=0 domain is shown in Fig. 34. From this representation, we can
clearly identify the three constituent signal components, within
Remark 31: The classical STFT, a basic tool in time- their intervals of support. The marginal properties, such as
frequency analysis, can be obtained as a special case of the the projections of S(n, k) onto the vertex index axis and the
GDFT when the graph is directed and circular. For this spectral index axis, are also clearly distinguishable. From the
type of graph, the eigendecomposition of the adjacency marginal properties, we can conclude that the considered graph
matrix√produces complex-valued eigenvectors of the form signal in hand is spread over all vertex indices, while its spectral
uk (n) N = exp(j2πnk/N ). Then, having in mind the localization is dominated by the three spectral indices which
complex nature of these eigenvectors, correspond to the three components of the original graph signal.
In an ideal case of vertex-frequency analysis, these marginals
N
X −1 N
X −1 should respectively be equal to |x(n)|2 and |X(k)|2 , which is
S(m, k) = x(n)H(p)u∗p (m)up (n) u∗k (n), not the case here.
n=0 p=0
39
(a) (b)
H34,0 (n) = h34 (n)u0 (n) ∼ h34 (n) H78,0 (n) = h78 (n)u0 (n) ∼ h78 (n)
(c) (d)
H34,20 (n) = h34 (n)u20 (n) H78,20 (n) = h78 (n)u20 (n)
1 1
0 0
-1 -1
0.5 0.5
0 0
-0.5 -0.5
-1 0 1 (e) -1 0 1 (f)
H34,0 (n) = h34 (n)u0 (n) ∼ h34 (n) H78,0 (n) = h78 (n)u0 (n) ∼ h78 (n)
Figure 33: Illustration of localization kernels, Hm,k (n) = hm (n)uk (n), for vertex-frequency analysis based on spectral domain defined windows
PN −1
within the local graph Fourier transform, S(m, k) =
√ n=0 x(n)Hm,k (n). (a) Localization kernel H34,0 (n) = h34 (n)u0 (n) ∼ h34 (n), for a
constant eigenvector, u0 (n) = 1/ N , centered at the vertex m = 34. (b) The same localization kernel as in (a) but centered at the vertex
m = 78. (c) Localization kernel, H34,20 (n) = h34 (n)u20 (n), centered at the vertex m = 34 and frequency shifted by u20 (n). Notice that the
variations in kernel amplitude indicate the effects of modulation of the localization window, hm (n). (d) The same localization kernel as in
(c), but centered at the vertex m = 78. (e) Three-dimensional representation of the kernel H34,0 (n) = h34 (n)u0 (n). (f) Three-dimensional
representation of the kernel H78,0 (n) = h78 (n)u0 (n).
7.1.2. Spectral Domain Localization of the LGFT Remark 32: Recall that the classical time-frequency anal-
Recall that the classical STFT admits frequency lo- ysis counterpart of (124) is [63]
calization in the spectral domain; this is achieved based
N −1
on the DFT of the original signal and a spectral domain 1 X 2π
S(m, k) = √ X(p)H(k − p)ej N mp .
window. For graph signals, we may also adapt this ap- N p=0
proach to perform signal localization in the spectral do-
main, whereby the LGFT is obtained as an inverse GDFT
of X(p) that is localized by a spectral domain window,
H(k − p), which is centered around spectral index k, that 7.1.3. LGFT Realization with Band-Pass Functions
is Assume that the GDFT of the localization window,
NX−1 hm (n), corresponds to the transfer function of a band-
S(m, k) = X(p)H(k − p) up (m). (124) pass system on a graph, centered at an eigenvalue, λk , and
p=0 around it, and that it is defined in the form of a polynomial
Note that this form of the LGFT can be entirely imple- given by
mented in the graph spectral domain. The spectral domain −1
LGFT form in (124) can be implemented using band-pass Hk (λp ) = h0,k + h1,k λp + · · · + hM −1,k λM
p , (126)
transfer functions, Hk (λp ) = H(k − p), as with (M − 1) as the polynomial order and k = 0, 1, . . . , K,
N −1 where K is the number of spectral bands.
The vertex shifted version of the window, hm (n), has
X
S(m, k) = X(p)Hk (λp ) up (m). (125)
p=0 the GDFT of the form, GDFT{h(n)∗δm (n)} = H(p)up (m).
Therefore, the inverse GDFT of Hk (λp )up (m) represents
a vertex domain kernel, where Hk (λp ) is centered at the
40
0.5 0
100
Classical time-frequency domain kernel. To addi-
tionally clarify the previous two forms of kernels, we will
observe their special cases for a circular directed graph and
80
write the kernels in the classical time-frequency domain.
The kernel defined by (121) uses low-pass function H(k)
60 and assumes the following form
N −1
40 1 X 2π 2π 2π
Hm,k (n) = H(p)e−j N mp ej N np e−j N kn
N 3/2 p=0
20 1 2π 1
= h(n − m)e−j N kn = √ hk (n − m),
N N
0
which is shifted for m in time, and modulated√ by the
2π
0.08 kth eigenvector elements u∗k (n) = e−j N kn / N , to achieve
0.06 centering around the spectral index k.
0.04 The classical time-frequency domain form of the kernel
0.02 in (127) is given by
0
0 20 40 60 80 100 N −1
1 X 2π 2π
Hm,k (n) = Hk (λp )e−j N mp ej N np
N p=0
Figure 34: Local vertex-frequency spectrum calculated using the
LGFT and the vertex-frequency localized kernels defined in the spec- N −1
1 X 2π 2π 1
tral domain, as in (121). From this representation, observe that the = H(p − k)e−j N mp ej N np = √ hk (n − m),
graph signal consists of three distinct components located at spectral N p=0 N
indices k = 8, k = 66, and k = 27, with the corresponding vertex
index subsets V1 , V2 , and V3 , where V1 ∪ V2 ∪ V3 = V. The marginal
(vertex and spectrum-wise) properties are shown in the panels to the
where hk (n−m) is the temporary shifted version of hk (n) =
right and below the vertex-frequency representation. Observe that, IGDFT{Hk (λp )} = IDFT{H(k − p)}, which corresponds
while the graph signal is spread across all vertices, its spectral con- to the already frequency shifted (band-pass) transfer func-
tent is localized at the three spectral indices which correspond to the tion Hk (λp ) = H(p − k).
constituent signal components. In an ideal case of vertex-frequency
analysis, these marginals should be respectively equal to |x(n)|2 and In the case of kernel (127), the local vertex-frequency
|X(k)|2 . transform for a vertex, m, and a spectral index, k, becomes
N
X −1
spectral index k by definition, while up (m) corresponds to S(m, k) = Hm,k (n)x(n)
the shift in the vertex domain which centers the window at n=0
the vertex m. In other words, this kernel, centered around N
X −1 N
X −1 N
X −1
the spectral index k and vertex m, is defined as = x(n)Hk (λp )up (m)up (n) = X(p)Hk (λp )up (m).
n=0 p=0 p=0
N −1
X (128)
Hm,k (n) = Hk (λp )up (m)up (n). (127)
p=0 The relation (128) can be written in a vector form as
Remark 33: It is important to emphasize crucial differ- M
X −1
ence between the vertex-frequency kernels in (121) and sk = UHk (Λ)UT x = Hk (L)x = hp,k Lp x, (129)
(127). The kernel in (121) is defined based on the low- p=0
pass transfer function H(k), such as in (116), appropri-
ately shifted in the vertex domain and the spectral do- where sk is the column vector with elements S(m, k), m =
main, to be centered at a vertex m and at a spectral in- 0, 1, . . . , N − 1, and the property of the eigendecomposi-
dex k. This is achieved involving adequate modulation tion of a matrix polynomial is used in derivation. The
terms uk (n) and up (m). The transfer function in the ker- number of bands (shifted transfer functions, Hk (λp ), k =
nel given by (127), Hk (λp ), is centered at k by definition 0, 1, . . . , K) is equal to K + 1 and is not related to the total
(126). Hence, it is needed to perform the spectral modula- number of indices, N .
tion only, by up (n), in order to center the kernel, Hm,k (n), Example 18: Consider the simplest decomposition into a low-
at a vertex m. Therefore, the main difference between the pass and high-pass part of a graph signal, with K = 1. In this
kernels in (121) and (127) is that the spectral shift in (121) case, the two values, k = 0 and k = 1, represent respectively
is achieved by a modulation in the vertex domain using the low-pass part and high-pass part of the graph signal. Such
a decomposition can be achieved using the graph Laplacian
uk (n), while in (127) the kernel is directly shifted (defined
with h0,0 = 1, h0,1 = −1/λmax , and h1,0 = 0, h1,1 = 1/λmax ,
as a pass-band function) in the spectral domain.
41
where the coefficients are chosen so as to form a simple lin- 1
early decreasing function of λp for the low-pass, and a linearly 0.8
increasing function of λp for the high-pass, in the correspond-
ing transfer functions. These low-pass and high-pass transfer 0.6
functions are respectively given by
0.4
λp λp
H0 (λp ) = (1 − ), H1 (λp ) = , 0.2
λmax λmax
which leads to the vertex domain implementation of the LGFT 0
0 10 20 30 40 50 60 70 80 90 100
in the form
(a)
1 1
s0 = (I − L) x, s1 = L x.
λmax λmax
To improve the spectral resolution, we can employ the same
transfer function, but divide the low-pass part into its low-pass 1
and high-pass part. The same can be performed for the high-
pass part, to obtain
0.5
L 2 L L L2
s00 = I− x, s01 = 2 I− x, s11 = 2 x.
λmax λmax λmax λmax
The factor 2 appears in the new middle pass-band, s01 , since 0
the low-high-pass and the high-low-pass components are the 0 1 2 3 4 5 6 7
same. (b)
A division into (K +1) bands would correspond to the terms
of a binomial form
K
(I − L/λmax ) + L/λmax x, 1
with the corresponding transfer functions in the vertex domain
given by
0.5
!
K 1 K−k 1 k
Hk (L) = I− L L .
k λmax λmax
0
0 1 2 3 4 5 6 7
Example 19: Consider the transfer functions Hk (λp ), p = (c)
0, 1, . . . , N − 1, k = 0, 1, . . . , K in the spectral domain, cor-
responding to the binomial form terms for K = 25, which Figure 35: Exemplar of transfer functions in the spectral domain. (a)
The spectral domain transfer functions Hk (λp ), p = 0, 1, . . . , N −
are shown in Fig. 35(a). These functions are used for the
1, k = 0, 1, . . . , K which correspond to the binomial form terms
LGFT calculation at vertex indices m = 0, 1, . . . , N − 1 in the for K = 50. (b) The transfer functions Hk (λp ), p = 0, 1, . . . , N −
k = 0, 1, . . . , K bands for the graph and signal from Fig. 32. 1, k = 0, 1, . . . , K which correspond to the raised cosine (Hann)
Since the bands are quite spread out, the resulting LGFT is window form for K = 25. (c) The spectral index-varying (wavelet-
also spread along the frequency axis. The frequency concentra- like) transfer functions Hk (λp ), p = 0, 1, . . . , N − 1, k = 0, 1, . . . , K
tion can be improved by reassigning the values of S(m, k) to which correspond to the raised cosine (Hann) window form for K =
the position of their maximum value along the frequency band 13. The transfer function H9 (λ) is designated by the thick black line
for each considered domain, while its discrete values at λp , H9 (λp ),
index, k, for each vertex index, m. The so reassigned LGFT are shown in gray, in panels (b) and (c).
values are given in Fig. 36.
Of course, any band-pass function, Hk (Λ), can be used
in (125) or (128) to produce the LGFT in the form where (ak , bk ] and (bk , ck ], k = 1, 2, . . . , K, define the spec-
tral bands for Hk (Λ). For uniform bands within 0 ≤ λ ≤
sk = UHk (Λ)UT x = Hk (L)x. (130) λmax , the intervals can be defined by
Commonly used examples of such band-pass functions are λmax
the spline or raised cosine (Hann window) functions. We ak = ak−1 +
K
will next use the general form of the shifted raised cosine λmax
functions as the transfer functions, defined by bk = ak +
K
λmax
2 π ak λ ck = ak + 2 (132)
sin 2 bk −ak ( ak − 1) , for ak < λ ≤ bk
K
Hk (λ) = cos2 π bk ( λ − 1) , for b < λ ≤ c with a1 = 0 and limλ→0 (a1 /λ) = 1. The initial transfer
2 ck −bk bk k k
function, H0 (λ), is defined using only 0 = b0 ≤ λ ≤ c0 =
0, elsewhere, λmax /K, while the last transfer function, HK (λ), is defined
(131) using the interval aK < λ ≤ bK = λmax in (131).
42
100
80
25
60
20
40
15
20
10
5 20 40 60 80 100
20 40 60 80 100 Figure 37: Vertex-frequency representation from Fig. 36(c) with the
(a) axis of the eigenvalue index, p, instead of the frequency band index,
k. The same value of LGFT, S(m, k), is assigned to each spectral
index, p, when λp ∈ ( ak +b
2
k bk +ck
, 2 ], and without any scaling.
25
20
The raised cosine transfer function satisfy the following
condition
XK
15 Hk (λp ) = 1. (133)
k=0
10 The conditions for graph signal reconstruction from the
LGFT will be discussed in Section 7.2.
5 Example 20: The shifted raised cosine functions, defined by
(131) and (132), are shown in Fig. 35(b) for the graph from
Fig. 32, for K = 25. These functions are used for the LGFT
20 40 60 80 100
calculation of the graph signal from Fig. 32 at the vertex
(b) indices m = 0, 1, . . . , N − 1, and in (K + 1) spectral bands,
k = 0, 1, . . . , K. The absolute LGFT values are given in Fig.
36(b). Spectral resolution depends on the number of bands K,
10 with a larger number of spectral bands resulting in a higher
spectral resolution.
8 Example 21: The experiment from Examples 19 and 20 is
repeated with varying bounds of the spectral intervals in the
6 raised cosine transfer functions Hk (λp ), p = 0, 1, . . . , N − 1,
k = 0, 1, . . . , K. The spectral index-varying (wavelet-transform
4 like) form of the raised cosine transfer functions Hk (λp ), p =
0, 1, . . . , N −1, k = 0, 1, . . . , K, is defined by the interval bounds
5
2 λmax (1.5 + p)/11.5 , for p = 0, 1, 2, . . . , 10,
20 40 60 80 100
ak ∈ {0, 0.004, 0.02, 0.07, 0.19, 0.44, 0.9, 1.7, 2.9},
(c) bk ∈ {0.004, 0.02, 0.07, 0.19, 0.44, 0.9, 1.7, 2.9, 4.8},
ck ∈ {0.02, 0.07, 0.19, 0.44, 0.9, 1.7, 2.9, 4.8, 7.63},
Figure 36: Vertex-frequency representation of a three-component sig-
nal in Fig. 32(d). (a) The LGFT of the signal from Fig 32(d), calcu- k = 1, 2, . . . , 9,
lated using the transfer functions for frequency selection given in Fig. and depicted in Fig. 35(c). The LGFT values, S(m, k), cal-
35(a). The LGFT values, S(m, k), were reassigned to the position
culated with the so-obtained transfer functions, Hk (λp ), are
of its maximum value along the frequency band index, k, for each
vertex index, m. (b) The LGFT of the signal from Fig 32(d), calcu- shown in Fig. 36(c). In order to illustrate the change of resolu-
lated using the transfer functions for frequency selection given in Fig. tion in this case, the LGFT was reassigned to each eigenvalue
35(b). The LGFT values, S(m, k), were reassigned to the positions λp , p = 0, 1, . . . , N − 1, and shown in Fig. 37. As in classi-
of their maximum values along the frequency band index, k, for each cal wavelet transform, the spectral resolution is lower for the
vertex index, m. (c) The LGFT of the signal from Fig 32(d), calcu- higher spectral indices.
lated using the wavelet-like transfer functions for frequency selection
given in Fig. 35(c).
7.1.4. Signal Adaptive LGFT
The spectral graph wavelet-like transform is just an ex-
ample of varying spectral transfer functions in the LGFT,
43
where the spectral resolution is the highest (spectral wavelet
1 functions narrowest) for small values of the smoothness in-
dex, λp . The spectral resolution decreases as the spectral
0.5
wavelet functions become wider for large smoothness index
0 values, Fig. 35(c). In general, the change of resolution may
be arbitrary and signal adaptive, for example, the resolu-
tion may be higher for the spectral intervals of λ which are
0 1 2 3 4 5 6 7 (a) rich in signal components and lower within the intervals
where there are no signal components.
Before introducing an example with a signal adaptive
LGFT, we will modify the transfer functions, Hk (λp ), in
1 (131) to satisfy the condition
K
X
0.5 Hk2 (λp ) = 1. (134)
k=0
as this will be important for the frame-based LGFT inver-
0
0 1 2 3 4 5 6 7 sion.
(b) Notice that a simple transformation of the transfer
functions, Hk (λp ) → Hk2 (λp ), would allow for the condi-
PK PK
tion k=0 Hk2 (λp ) = 1 to hold instead of k=0 Hk (λp ) =
16 1. This means that a simple removal of squares in the
14 sine and cosine functions in (131) would produce a form
PK
12 to satisfy the condition k=0 Hk2 (λp ) = 1. Both of these
conditions will be used in Section 7.2 in various approaches
10
to the graph signal reconstruction from the LGFT.
8 By removing the squares in the sine and cosine func-
6 tions in (131), their first derivative loses continuity in λ
4 at the end interval points. In order to preserve continuous
derivatives, the arguments in the sine and cosine functions
2
can be mapped by a polynomial,
20 40 60 80 100
vx (x) = x4 (35 − 84x + 70x2 − 20x3 ), for 0 ≤ x ≤ 1,
(c)
with vx (0) = 0 and vx (1) = 1. In this way, we arrive
100 at the Meyer wavelet-like transfer functions [70] for the
LGFT calculation, given by
80
π ak λ
sin 2 vx bk −ak ( ak − 1) , for ak < λ ≤ bk
60
Hk (λ) = cos π v
bk λ
2 x ck −bk ( bk − 1) , for bk < λ ≤ ck
40
0, elsewhere.
20 (135)
The initial transfer function, k = 0, and the last transfer
function, k = K, are calculated using only the half of the
20 40 60 80 100 interval, as explained after the spectral band definition in
(d) relation (132).
Example 22: The transfer functions of the form defined in
Figure 38: A graph signal and transfer functions in the spectral (135) are used with signal adaptive intervals. These intervals
domain for a signal adaptive LGFT. (a) Graph signal in the spec-
are defined in such a way that they are small (fine) around λ,
tral domain, X(p), as a function of the eigenvalues, λp . (b) The
spectral domain transfer functions Hk (λp ), p = 0, 1, . . . , N − 1, where a significant signal spectral content is detected, and are
k = 0, 1, . . . , K which satisfy the condition K
P 2 big (rough) around λ where the signal spectral content is low,
k=0 Hk (λp ) = 1, with
K = 16. (c) The LGFT of the signal from Fig 32(d), calculated as in Fig. 38(a) and (b). The intervals are narrow (with a high
using the transfer functions for frequency selection given in (b). (d) resolution) around the three signal components at λ = 0.38,
Vertex-frequency representation from (c) with the eigenvalue (spec- λ = 1.87, and λ = 4.62. Vertex-frequency representation with
tral) index, p, axis instead of the frequency band index, k. The same these transfer functions is shown in Fig. 38(c) and (d) with the
value of LGFT, S(m, k), is assigned to each spectral index, p, when
spectral band index, k, and the assigned eigenvalue (spectral)
λp ∈ ( ak +b
2
k bk +ck
, 2 ], without any scaling.
44
index, p, as a spectral axis. Fine intervals around the spectral
Table 2: Coefficients, hi,k , i = 0, 1, . . . , M − 1, k = 0, 1, . . . , K, for
signal components allowed for high spectral resolution repre- the polynomial calculation of the LGFT, sk , of a signal, x , in various
sentation, as in Fig. 38(c), with a smaller number of transfer spectral bands, k, shown in Fig. 39(b). The obtained LGFT of the
functions K + 1 = 17. A wider interval width for the third three-component signal from Fig. 32(d) is given in Fig. 40(a).
component resulted in a lower spectral resolution than in the
sk = (h0,k I + h1,k L + h2,k L2 + h3,k L3 + h4,k L4 + h5,k L5 )x
case of the other two components.
k h0,k h1,k h2,k h3,k h4,k h5,k
7.1.5. Polynomial LGFT Approximation 0 1.062 −1.925 1.168 −0.3115 0.03776 −0.001702
Bandpass LGFT functions, Hk (λ), k = 0, 1, . . . , K, 1 −0.002 1.773 −1.655 0.5357 −0.07250 0.003508
of the form (131) or (135) can be implemented using the 2 −0.154 1.016 −0.601 0.1295 −0.01155 0.000349
Chebyshev finite (M −1)-order polynomial approximation, 3 0.005 −0.301 0.621 −0.2674 0.04200 −0.002225
P̄k,M −1 (λ), k = 0, 1, . . . , K, of the form 4 0.089 −0.748 0.869 −0.3042 0.04217 −0.002040
5 0.060 −0.381 0.319 −0.0704 0.00461 0.000000
M −1 6 −0.024 0.277 −0.430 0.2055 −0.03570 0.002040
ck,0 X
7 −0.076 0.598 −0.714 0.2814 −0.04292 0.002225
P̄k,M −1 (λ) = + ck,m T̄m (λ). (136)
2 m=1 8 −0.027 0.159 −0.122 0.0198 0.00177 −0.000349
9 0.087 −0.699 0.868 −0.3662 0.06140 −0.003508
This leads to the vertex domain implementation of the 10 −0.026 0.220 −0.293 0.1333 −0.02435 0.001536
spectral LGFT form, given by
sk = P̄k,M −1 (L)x, operations is needed to calculate the LGFT in the vertex do-
main, with (K + 1) spectral bands, using a polynomial whose
for k = 0, 1, 2, . . . , K, with order is (M − 1). The number of nonzero elements in the graph
Laplacian is denoted by NL .
M −1
ck,0 X
P̄k,M −1 (L) = + ck,m T̄m (L), (137)
2 m=1
7.1.6. The Spectral Graph Wavelet Transform
As in classical signal processing, wavelet coefficients
= h0,k I + h1,k L + h2,k L2 + · · · + h(M −1),k LM −1
can be defined as a projection of a graph signal onto the
as discussed in Section 3.5.4 and shown in Table 2. The wavelet kernel functions. Assume that the basic form for
polynomial form in (137) uses only the (M −1)-neighborhood the wavelet definition in the spectral domain is a band-pass
in calculation of the LGFT for each considered vertex, function, H(λ). The wavelet in spectral domain then rep-
without the need for eigendecomposition analysis, thus sig- resents a scaled version of H(λ) in scale si , i = 1, 2, . . . , K,
nificantly reducing the computational cost. and is denoted by H(si λ). The wavelet kernel is already,
Example 23: Consider the shifted transfer functions, Hk (λ), k =
by definition, of a high-pass form. Now, in the same way
0, 1, . . . , K, defined by (131) and (132), as in the case of the kernel form of the LGFT in (128), the
PKshown in Fig. 39(a), graph wavelet transform (spectral graph wavelet transform
for K = 10. Functions Hk (λ) satisfy k=0 Hk (λ) = 1, which
is numerically confirmed and designated by the horizontal dot- – SGWT) is defined using the wavelet kernel, ψm,si (n),
ted line in 39(a). Each individual transfer function, Hk (λ), is N −1
approximated using the Chebyshev polynomial, P̄k,M −1 , k = X
ψm,si (n) = H(si λp )up (m)up (n), (138)
0, 1, . . . , K, as detailed in Section 3.5.4, with three polynomial
p=0
orders defined by M = 6, M = 20 and M = 80. These poly-
nomial approximations are shown in P Fig. 39(b), (c) and (d). which corresponds to the LGFT kernel, Hm,k (n), defined
In each considered case, summations K k=0 P̄k,M −1 (λ) are cal- in (127). This yields the wavelet coefficients given by
culated. It can be observed that for different values of M , the
summations in all considered cases are very close to 1, thus N
X −1
guaranteeing numerically stable invertibility of the LGFT, as W (m, si ) = ψm,si (n)x(n) =
discussed later. n=0
The so obtained approximations of transfer functions, Hk (λ), N
X −1 N
X −1 N
X −1
are used for the LGFT based vertex-frequency analysis. Ab- H(si λp )x(n)up (m)up (n) = H(si λp )X(p)up (m).
solute LGFT values, calculated for the three-component graph n=0 p=0 p=0
signal from Fig. 32(d), are shown in Fig. 40(a),(b) and (c),
for M = 6, M = 20 and M = 80. Low resolution in Fig. The wavelet coefficients may be interpreted as the IGDFT
40(a) is directly related to the imprecise and very wide (with a of H(si λp )X(p), that is
low spectral resolution) approximation of the spectral transfer
functions for M = 6, in Fig. 39 (b). Notice that high values of W (m, si ) = IGDFT{H(si λp )X(p)}. (139)
the polynomial order, (M − 1), increase calculation complexity
and require wide vertex neighborhood in the calculation of the Remark 34: We will use the notation H(si λ) = Hi (λ)
LGFT. with the corresponding matrix function form Hi (Λ). No-
Based on the analysis of calculation complexity in Section tice that this scale-based indexing is opposite to the classi-
3.5.4 we may conclude that an order of KM NL of arithmetic cal frequency band indexing. The largest scale for H(s1 λ),
45
10
1 8
6
0.5
4
0
0 1 2 3 4 5 6 7 2
20 40 60 80 100
(a)
1
10
0.5
8
0
0 1 2 3 4 5 6 7 6
4
1 2
20 40 60 80 100
0.5 (b)
0
0 1 2 3 4 5 6 7 10
8
1 6
4
0.5
2
0
0 1 2 3 4 5 6 7 20 40 60 80 100
(c)
Figure 39: Chebyshev approximation of LGFT transfer functions, Figure 40: Vertex-frequency representation of a three-component sig-
which correspond to the raised cosine window in the spectral domain. nal in Fig. 32(d). The LGFT is based on raised cosine (Hann win-
(a) Original transfer functions Hk (λ), k = 0, 1, . . . , K, for K = 10. dow) like bandpass transfer functions for frequency selection, with
PK
The dotted horizontal line designates k=0 Hk (λ). (b) Polyno-
K = 10, approximated using the Chebyshev polynomials of various
mial Chebyshev approximations, P̄k,M −1 (λ), k = 0, 1, . . . , K, with order, as shown in Fig. 39 (b), (c), (d). (a) The LGFT of the signal
M = 6. (c) Polynomial Chebyshev approximations, P̄k,M −1 (λ), k = from Fig 32(d), calculated using the Chebyshev polynomial approx-
0, 1, . . . , K, with M = 20. (d) Polynomial Chebyshev approxima- imation of transfer functions given in Fig. 39 (b), with M = 6.
tions, P̄k,M −1 (λ), k = 0, 1, . . . , K, with M = 80. The dotted hor- (b) The LGFT of the signal from Fig 32(d), calculated using the
izontal line designates
PK transfer functions for frequency selection given in Fig. 35(c), with
k=0 P̄k,M −1 (λ), which is close to 1 in all
considered approximations, thus guaranteeing stable transform in- M = 20. (c) The LGFT of the signal from Fig 32(d), calculated us-
vertibility. Transfer function H6 (λ) and approximations, P̄6,M −1 (λ), ing the transfer functions for frequency selection given in Fig. 35(d),
are designated by the thick black line. with M = 80. Low resolution in (a) can be directly related with
low M = 6 used in approximation in Fig. 39 (b). The resolution is
considerably improved for M = 20.
46
1 < s1 λ ≤ M , is obtained for the smallest s1 , 1/s1 < In classical wavelet transforms the dyadic scheme with M = 2
λ ≤ M/s1 , where M > 1 is the coefficient of the scale is commonly used. The last value of the scale factor, sK =
changes, which will be explained later. The associated M K /λmax /M , is defined by K and indicates how close the last
spectral wavelet transfer function, H(s1 λ) = H1 (λ), corre- wavelet transfer function is to λ = 0.
sponds to the highest frequency band. The wavelet trans- The polynomial function, vx (x), is defined by
fer function in scale sK , H(sK λ) = HK (λ), is associated vx (x) = x4 (35 − 84x + 70x2 − 20x3 ), for 0 ≤ x ≤ 1, with
with the lowest frequency band. Notation for the spectral
scale function (low-pass transfer function complementary vx q(0) = vx (0) = 0, vx q(M − 1) = vx (1) = 1. (142)
to H(sK λ) within the lowest spectral interval) is G(λ). The wavelet transfer functions,
The spectral scale function, G(λ), plays the role of low-
pass transfer function with spectral index 0 in the LGFT. Hi (λ) = H(si λ),
Therefore, K spectral wavelet transfer functions H(si λ), are of a band-pass type. The main property (condition for the
i = 1, 2, . . . , K, along with the scale function G(λ), cover reconstruction) is that the wavelet functions in two successive
exactly K + 1 spectral bands as in the LGFT case. scales satisfy the following property
According to (47), we can write
Hi2 (λ) + Hi+1
2
(λ)
wi = Hi (L)x, (140)
π si λ
π si λ
= cos2 vx q( − 1) + sin2 vx q( − 1) = 1,
where wi a column vector with elements W (m, si ), m = 2 M 2 M
0, 1, . . . , N − 1. within
If Hi (λ) = H(si λ) can be approximated by a polyno- M < si λ ≤ M 2 .
mial in λ, Hi (λ) ≈ Pi (λ), then the relation PK
This property implies i=1 H 2 (si λ) = 1 for all λ except in the
last interval, sK λ ∈ [0, M 2 ]. To handle the low-pass spectral
wi ≈ Pi (L)x, (141)
components (the interval for λ closest to λ = 0), the low-pass
follows, where Pi (L) is a polynomial in the graph Lapla- type scale function, G(λ)), is added in the form
cian (see Section 3.5.4 and Example 23).
K−1
0 ≤ λ ≤ M/sK =λmax /M
1, for
Example 24: The wavelet transform (vertex-scale) representa-
tion of a three-component signal in Fig. 32(d), obtained using G(λ) = cos π2 vx q( sM Kλ
− 1) , for M < sK λ ≤ M 2
the Meyer-like graph wavelet in the spectral domain, λ, will be
0, elsewhere.
illustrated here. As in classical wavelet transform, the wavelet
in the first scale should correspond to the high-pass transfer
function with nonzero values in the interval λmax /M < λ ≤ Remark 35: The number of wavelet transfer functions,
λmax , where M > 1 is the coefficient of the scale changes. In K, does not depend on the other wavelet parameters. A
classical wavelet transforms the dyadic scheme with M = 2 is
large value of K will only increase the number of intervals
commonly used. The scale based indexing is opposite to the
and the resolution (producing smaller width of the first
classical frequency indexing, where large indices indicate the
high frequency content. The Meyer-like graph wavelet in the interval defined by λmax /M K−1 ) toward λ → 0, as shown
first scale is defined by [70, 71] in Fig. 41(a), (b), and (c).
sin π v q(s λ − 1) , for 1 < s λ ≤ M, Remark 36: The wavelet transfer functions, H(si λ), in-
x 1 1
H(s1 λ) = 2 cluding the low-pass scale function, G(λ), defined in Ex-
0, elsewhere. ample 24 satisfy the relation
For 2 ≤ i ≤ K the Meyer-like graph wavelet is given by K
X
H 2 (si λ) + G2 (λ) = 1.
π
sin vx q(si λ − 1) , for 1 < si λ ≤ M
2 i=1
H(si λ) = cos π v q( si λ − 1) , for M < s λ ≤ M 2
2 x M i
Example 25: For q = 1, M = 2, and K = 9 the Meyer
0, elsewhere, wavelet functions are given in Fig. 41(a). The Meyer wavelet
functions for q = 3, M = 4/3, K = 13 and q = 9, M = 10/9,
where q = 1/(M − 1). The initial interval is defined by s1 =
K = 45 are shown in Fig. 41(b) and (c). The vertex-frequency
M/λmax , so that 1 < si λ ≤ M corresponds to λmax /M <
representation of the signal from Fig. 32 using these three sets
λ ≤ λmax , while the other interval bounds are defined using a
of wavelet transfer functions are shown in Fig. 42(a),(b), and
geometric sequence of scale factors,
(c).
1
si = si−1 M = s1 M i−1 = M i. Polynomial SGWT approximation. Chebyshev ap-
λmax
proximation of the wavelet functions, H(si λ) = Hi (λ), in
Observe that the larger the scale factor si (and the scale in- the form
dex i), the narrower the transfer function, H(si λ), while the
M −1
progression coefficient is ci,0 X
P̄i,M −1 (λ) = + ci,m T̄m (λ), (143)
M = (q + 1)/q > 1. 2 m=1
47
1
100
0.5 80
60
0
0 1 2 3 4 5 6 7
(a) 40
20
1
20 40 60 80 100
(a)
0.5
100
0
0 1 2 3 4 5 6 7 80
(b)
60
40
1
20
0.5
20 40 60 80 100
0 (b)
0 1 2 3 4 5 6 7
(c) 100
Figure 41: Exemplars of Meyer wavelet functions (acting as transfer
functions in the wavelet transform), shown in the spectral domain. 80
(a) Band-pass Meyer wavelet functions H(si λ), i = 1, 2, . . . , K and
the low-pass scale function G(λ), for K = 9 and M = 2. (b) Band-
60
pass Meyer wavelet functions H(si λ), i = 1, 2, . . . , K and the low-
pass scale function G(λ), for K = 13 and M = 3/2. (c) Band-
pass Meyer wavelet functions H(si λ), i = 0, 1, . . . , K and the low- 40
pass function G(λ), for K = 45 and M = 10/9. Transfer functions
H(s2 λ), H(s2 λ), H(s5 λ) are designated by the tick black line, for
each of the considered setups in (a), (b) and (c), respectively; their 20
values at λp are shown in gray.
20 40 60 80 100
can be used for the vertex domain wavelet transform im- (c)
plementation
Figure 42: Vertex-frequency representation of a three-component sig-
M −1 nal in Fig. 32(d). (a) The Meyer wavelet transform of the signal from
ci,0 X
Fig 32(d), calculated using the transfer functions for frequency se-
P̄i,M −1 (L) = + ci,m T̄m (L),
2 m=1
lection given in Fig. 41(a). (b) The Meyer wavelet transform of
the signal from Fig 32(d), calculated using the transfer functions
i = 0, 1, 2, . . . , K for frequency selection given in Fig. 41(b). (c) The Meyer wavelet
transform of the signal from Fig 32(d), calculated using the Meyer
using only the (M − 1)-neighborhood of each considered wavelet transform transfer functions for frequency selection given
in Fig. 41(c). Wavelet values were reassigned to spectral indices,
vertex, and without any graph Laplacian eigendecompo-
p, in order to illustrate the change in resolution. The same value
sition analysis. The Chebyshev polynomials can be cal- of SGWT, W (m, k), is assigned to each spectral index, p, when
culated recursively, as in (39), with a change of variables λp ∈ ( ak +b
2
k bk +ck
, 2 ], without any scaling.
and the recursive implementation as described in detail in
Examples 5 and 23.
48
7.1.7. Windows Defined Using the Vertex Neighborhood so that a graph signal which is localized around a vertex
In order to show that the window, hm (n), which is lo- m, may be formed based on this matrix, as
calized at a vertex m can also be defined using the vertex
neighborhood, recall that the distance, dmn , between ver- xm (n) = hm (n)x(n) = PD (n, m)x(n).
tices m and n is equal to the length of the shortest walk The LGFT representation of a graph signal, x(n), then
from vertex m to vertex n, and that dmn takes integer becomes
values. Then, the window function can be defined as a
N −1 N −1
function of vertex distance, in the form X X
S(m, k) = x(n)hm (n) uk (n) = x(n)PD (n, m) uk (n),
hm (n) = g(dmn ), n=0 n=0
(145)
where g(d) corresponds to any basic window function in with the vertex-frequency kernel given by
classical signal processing. For example, we can use the
Hann window, given by Hm,k (n) = hm (n)uk (n) = PD (n, m)uk (n). (146)
1 This allows us to arrive at the matrix form of the LGFT,
hm (n) = 1 + cos(πdmn /D) , for 0 ≤ dmn < D, given by
2
S = UT PD ◦ [x, x, . . . , x] , (147)
where D is the assumed window width.
For convenience, window functions for every vertex can where [x, x, . . . , x] is an N × N matrix, the columns of
be calculated in a matrix form as follows: which are the signal vectors, x.
For a rectangular function g(d) = 1, and for any d < D,
• For the vertices for which the distance is dmn = 1,
the LGFT can be calculated recursively with respect to the
window functions are defined trough an adjacency
window width, D, as
(neighborhood one) matrix A1 = A. In other words,
the vertices which belong to the one-neighborhood
SD = SD−1 + UT AD−1 ◦ [x, x, . . . , x] . (148)
of a vertex, m, are indicated by unit-value elements
in the mth row of the adjacency matrix A (in un-
Example 26: Consider the local vertex-frequency represen-
weighted graphs). In weighed graphs, the corre-
tation of the signal from Fig. 32, using vertex domain defined
sponding adjacency matrix A can be obtained from
windows. The localization kernels, Hm,k (n) = hm (n)uk (n), are
the weighting matrix W as A = sign(W). shown in Fig. 43 for two vertices and two spectral indices. Ob-
serve that for the spectral index k = 0, the localization kernel
• Window functions for vertices m and n, for which
is proportional to the localization function hm (n), given in Fig.
the distance is dmn = 2 are defined by the matrix
43(a) and (c) for the vertices m = 34 and m = 78. Frequency
modulated forms of these localization functions are shown in
A2 = (A A1 ) ◦ (1 − A1 ) ◦ (1 − I),
Figs. 43(b) and (d), for the same vertices and k = 20.
where the symbol denotes the logical (Boolean) A vertex domain window is next used to analyze the graph
signal from Fig. 32. The vertex-frequency representation,
matrix product, ◦ is the Hadamard (element-by-element)
S(n, k), obtained with the LGFT and the vertex domain lo-
product, and 1 is a matrix with all elements equal to calization window is given in Fig. 44. Again, we can observe
1. The nonzero elements of the mth row of the ma- three constituent graph signal components in three distinct ver-
trix A A1 then designate the vertices that are con- tex regions. The marginals of S(n, k) are also shown in the right
nected to the vertex m with walks of length K = 2 and bottom panels.
or lower. It should be mentioned that the element-
by-element multiplication of (A A1 ) by matrix Remark 37: Directed graphs. The vertex neighbor-
(1 − A1 ) removes the vertices connected with walks hood, as a set of vertices that can be reached from the
of length 1, while the multiplication by (1 − I) re- considered vertex by a walk whose length is at most D,
moves the diagonal elements from (A A1 ). may be also defined on directed graphs. In this case, this
approach corresponds to one-sided windows in classical sig-
• For dmn = d ≥ 2, we arrive at a recursive relation for nal analysis.
the calculation of a matrix which will give the infor- If we want to define two-sided window, then we should
mation about the vertices separated by the distance also include all vertices from which we can reach the con-
d. Such a matrix has the form sidered vertex by walk whose length is at most D. This
means that for a directed graph we should assume that
Ad = (A Ad−1 ) ◦ (1 − Ad−1 ) ◦ (1 − I). (144) vertices with distance dmn = 1 form the considered vertex
m are the vertices from which we can reach vertex m with
The window matrix for an assumed graph window width, walk of length 1. In this case A1 = A+AT where addition
D, can now be defined as is logical operation (Boolean OR). The matrix A2 is
PD = g(0)I + g(1)A1 + · · · + g(D − 1)AD−1 , A2 = (A A + AT AT ) ◦ (1 − I) ◦ (1 − A1 ).
49
(a) (b)
H34,0 (n) = h34 (n)u0 (n) ∼ h34 (n) H78,0 (n) = h78 (n)u0 (n) ∼ h78 (n)
(c) (d)
H34,20 (n) = h34 (n)u20 (n) H78,20 (n) = h78 (n)u20 (n)
1 1
0 0
-1 -1
0.5 0.5
0 0
-0.5 -0.5
-1 0 1 (e) -1 0 1 (f)
H34,0 (n) = h34 (n)u0 (n) ∼ h34 (n) H78,0 (n) = h78 (n)u0 (n) ∼ h78 (n)
Figure 43: Localization kernels for vertex-frequency analysis, Hm,k (n) = hm (n)uk (n), for the case of vertex domain defined windows in the
PN −1
local graph Fourier transform, S(m, k) =
√ n=0 x(n)Hm,k (n). (a) Localization kernel H34,0 (n) = h34 (n)u0 (n) ∼ h34 (n), for a constant
eigenvector, u0 (n) = 1/ N , centered at the vertex m = 34. (b) The same localization kernel as in (a), but centered at the vertex m = 78.
(c) Localization kernel, H34,20 (n) = h34 (n)u20 (n), centered at the vertex m = 35 and frequency shifted by u20 (n). Observe the variations in
kernel amplitude, which indicate a modulation of the localization window, hm (n). (d) The same localization kernel as in (c), but centered at
the vertex m = 78. (e) Three-dimensional representation of the kernel H34,0 (n) = h34 (n)u0 (n). (f) Three-dimensional representation of the
kernel H78,0 (n) = h78 (n)u0 (n).
This procedure could be continued for walks up to the Example 27: The concentration measure, M(τ ) = kSk1 /kSkF ,
desired maximal length D. for the signal from Fig. 32, the window given in (116), and for
For a circular directed graph in this way, we will get various τ is shown in Fig. 45, along with the optimal vertex
the classical STFT with symmetric window. frequency representation. This representation is similar to that
shown in Fig. 34, where an empirical value of τ = 3 was used,
with the same localization window and kernel form.
7.1.8. Window Parameter Optimization The optimal τ can be obtained in only a few steps through
The concentration of local vertex spectrum representa- the iteration
tion can be measured using the normalized one-norm [72],
as τk = τk−1 − α M(τk−1 ) − M(τk−2 ) ,
N −1 N −1
1 X X 1
M= |S(m, k)|= kSk1 , (149) with α a step-size parameter.
F m=0 F
k=0 The optimization of parameter τ can also be achieved
where v trough graph uncertainty principle based techniques [32,
uN −1 N −1
uX X 14].
F = kSkF = t |S(m, k)|2
m=0 k=0 7.2. Inversion of the LGFT
is the Frobenius norm of matrix S. Alternatively, any The inversion relation of the LGFT, calculated using
other norm kSkpp , with 0 ≤ p ≤ 1 can be used instead any of the presented localization (window) forms, will next
of kSk1 . Recall that norms with p close to 0 are noise be considered in a unified way; the two approaches for the
sensitive, while the norm with p = 1 is the only convex LGFT inversion here are: (i) inversion by summation of
norm, which hence allows for gradient based optimization LGFT and (ii) kernel based inversion.
[72].
50
1 0 42
100
40
80
38
60
36
40 0 10 20 30
(a)
20 1 0.5 0
100
0
80
0.1
60
0.05
40
0
0 20 40 60 80 100
20
Figure 44: Local vertex-frequency spectrum calculated using the
LGFT and vertex neighborhood windows, as in (146). This represen- 0
tation immediately shows that the graph signal consists of three com-
ponents located at spectral indices k = 8, k = 66, and k = 27, with
0.06
the corresponding vertex indices in their respective vertex subsets V1 ,
V2 , and V3 , where V1 ∪ V2 ∪ V3 = V. The marginal properties are 0.04
also given in the panels to the right and below the vertex-frequency
representation, and they differ from the ideal ones given respectively 0.02
by |x(n)|2 and |X(k)|2 .
0
0 20 40 60 80 100
(b)
7.2.1. Inversion by the Summation of the LGFT
The reconstruction of a graph signal, x(n), from its Figure 45: Principle of the optimization of localization window. (a)
local spectrum, S(m, k), can be performed through an in- Measure of the concentration of graph spectrogram for a varying
spectral domain window parameter τ . (b) The corresponding opti-
verse GDFT of (119), based on the graph windowed signal mal vertex-frequency representation, calculated with τ = 7, together
N −1 with its marginals.
X
x(n)hm (n) = S(m, k) uk (n) (150)
k=0
is a projection of the LGFT onto the spectral index axis.
followed by a summation over all vertices, m, to yield For windows obtained using the generalized graph shift in
N −1 N −1 (144), this conditions is always satisfied since H(0) = 1.
1 X X
x(n) = PN −1 S(m, k)uk (n). (151) The condition
PN −1
m=0 hm (n) = 1 can be enforced by
m=0 hm (n) m=0 k=0 normalizing the elements of the matrix Ad , d = 1, 2, . . . , D−
Remark 38: If the windows, hm (n), for every vertex, n, 1 in (144), prior to the calculation of matrix PD , in such
satisfy the condition a way that the sum of each of its columns is equal to 1,
which allows us to arrive at
N
X −1
hm (n) = 1, N
X −1 N
X −1 D−1
X
m=0 hm (n) = PD (n, m) = g(d) = const.
m=0 m=0 d=1
then the reconstruction does not depend on the vertex
index, n, or in other words such reconstruction is vertex In general, the local vertex spectrum, S(m, k), can also
independent. This becomes clear from be calculated over a reduced set of vertices, m ∈ M ⊂ V.
N
X −1 N
X −1 N
X −1 In this case, the summation over m in the reconstruction
x(n) = S(m, k)uk (n) = X(k)uk (n), (152) formula should be executed over only the vertices m ∈ M,
m=0 k=0 k=0 while
P a vertex-independent reconstruction is achieved if
where m∈M hm (n) = 1.
N
X −1
X(k) = S(m, k)
m=0
51
7.2.2. Inversion of the LGFT with Band-Pass Functions Remark 39: This kind of kernel-based inversion is vertex-
PM −1
For the LGFT, defined in (129) as sk = p=0 hp,k Lp x, invariant if the sum over all vertices, m, is invariant with
the inversion is obtained by a summation over all spectral respect to n and is equal to 1, that is
index shifts, k = 0, 1, . . . , K, that is
N
X −1
K
X K N
X X −1 K
X h2m (n) = 1. (157)
p
sk = hp,k L x = Hk (L)x = x, (153) m=0
k=0 k=0 p=0 k=0
If the LGFT, S(m, k), is calculated over a reduced set
PK of vertices, m ∈ M ⊂ V, then the P vertex independent
if k=0 Hk (L) = I. This condition is equivalent to the
following spectral domain form reconstruction condition becomes m∈M h2m (n) = 1.
(b) For the LGFT with spectral shifted spectral windows,
K
X defined in (128), the kernel based inversion is of the form
Hk (Λ) = I (154)
k=0 N
X −1 X
K
PK x(n) = S(m, k)Hm,k (n) (158)
since U k=0 Hk (Λ)UT = I and UT U = I. The condi- m=0 k=0
tion in (154) is used to define the transfer functions in Fig.
35. if the following condition
K
X
7.2.3. Kernel-Based Inversion Hk2 (λp ) = 1 (159)
Another approach to the inversion of the local vertex k=0
spectrum, S(m, k), follows the Gabor expansion frame-
work [63], whereby the local vertex spectrum, S(m, k), is is satisfied for all λp , p = 0, 1, 2, . . . , N − 1.
projected back to the vertex-frequency localized kernels, The inversion formula in (158), with condition (159),
Hm,k (n). The inversion for two forms of the LGFT, de- follows from
fined as in (120) and (128), will be analyzed. N
X −1 X
K
(a) For the LGFT defined in (120), the sum of all of its S(m, k)Hm,k (n) (160)
projections to the localized kernels, Hm,k (n), is m=0 k=0
N −1 X
K N −1 N −1
−1 N −1 −1 N −1
N N
X X X
= X(p)Hk (λp )up (m) Hk (λl )ul (m)ul (n).
X X X X
S(m, k)Hm,k (n) = S(m, k)hm (n)uk (n) m=0 k=0 p=0 l=0
m=0 k=0 m=0 k=0
N −1 N −1 PN −1
Since m=0 up (m)ul (m) = δ(p − l), the last expression
X X
= IGDFT{S(m, k)}IGDFT{hm (n)uk (n)} reduces to the graph signal, x(n),
k→i k→i
m=0 i=0
N
X −1 N
X −1 K N
X X −1
= [x(i)hm (i)][hm (n)δ(n − i)] X(p)Hk (λp )Hk (λp )up (n) = x(n), (161)
m=0 i=0 k=0 p=0
N
X −1 N
X −1
= x(n)h2m (n) = x(n) h2m (n), (155) if the transfer functions, Hk (λp ), k = 0, 1, . . . , K, satisfy
m=0 m=0 the condition in (159) for all λp .
where IGDFT denotes the inverse GDFT transform. Par- 7.2.4. Vertex-Varying Filtering
seval’s theorem for graph signals Filtering in the vertex-frequency domain may be imple-
N −1 N −1 mented using a vertex-frequency support function, B(m, k).
The filtered LGFT is then given by
X X
x(n)y(n) = X(k)Y (k)
n=0 k=0
Sf (m, k) = S(m, k)B(m, k),
was used in the derivation. In this form of the LGFT all
possible spectral shifts, k = 0, 1, . . . , N − 1, are used. and the filtered signal, xf (n), is obtained by the inversion
The inversion formula for the local vertex spectrum, of Sf (m, k) using the above mentioned inversion methods.
S(m, k), which yields the original graph signal, x(n), then The filtering support function, B(m, k), can be obtained,
becomes for example, by thresholding noisy values of the local ver-
tex spectrum, S(m, k).
N −1 N −1
1 X X Example 28: Consider the graph signal, x(n), from Fig. 32(d),
x(n) = PN −1 S(m, k)Hm,k (n). (156)
2 also shown in Fig. 46 (a), and its version corrupted by an
m=0 hm (n) m=0 k=0
additive white Gaussian noise, at the signal-to-noise ratio of
SN Rin = 5.3 dB, given in Fig. 46 (b). The LGFT, S(m, k)
52
or in other words, that the product of the number of
nonzero signal values, kxk0 , and the number of its nonzero
DFT coefficients, kXk0 , is greater or equal than the total
number of signal samples N ; they cannot simultaneously
assume small values.
To arrive at the uncertainty principle for graph signals,
(a) consider a graph signal, x, and its spectral transform, X,
in a domain of orthonormal basis functions, uk (n). Then,
the uncertainty principle states that [32, 14, 73, 74]
1
kxk0 kXk0 ≥ . (163)
maxk,m {|uk (m)|2 }
This form of the uncertainty principle is generic, and in-
deed for the basis functions uk (n) = √1N exp(j2πnk/N ),
(b) the standard DFT uncertainty principle form in (162) fol-
lows.
Remark 40: Note, however, that in graph signal process-
ing, the eigenvectors/basis functions can assume quite dif-
ferent forms than in the standard DFT case. For example,
when one vertex is loosely connected with other vertices,
then max{|uk (m)|2 } → 1 and even kxk0 kXk0 ≥ 1 is pos-
(c) sible for the uncertainty condition in (163). This means
that, unlike the classical Fourier transform-based
Figure 46: Vertex-varying filtering of a graph signal. (a) The original
graph signal, x(n), from Fig. 32 (d). (b) The graph signal, x(n),
time and frequency domains, a graph signal can be
corrupted by an additive white Gaussian noise, at SN Rin = 5.3 dB. well localized in both the vertex and the spectral
(c) The graph signal, xf (n), after vertex-varying filtering based on domains.
thresholding of the LGFT of noisy graph signal, S(m, k), with the
final signal-to-noise ratio SN Rout = 10.36 dB. Example 29: For the graph shown in Fig. 32, we have
max{|uk (m)|2 } = 0.8713
k,m
of the noisy graph signal is calculated according to (128), us-
ing shifted bandpass spectral transfer functions, Hk (λp ), k = which indicates that even kxk0 kXk0 ≥ 1.1478 is possible. In
0, 1, . . . , K, p = 0, 1, . . . , N − 1, given
Pby (131) without squares other words, a graph signal for which the number of nonzero
(Hk (λp ) → Hk2 (λp )), which allows K k=0 Hk2 (λp ) = 1 to hold, samples, x(n), in the vertex domain is just two, will not violate
instead of K the uncertainty principle even if it has just one nonzero GDFT
P
k=0 Hk (λp ) = 1. In this way, the condition for the
inversion (159) is satisfied. The transfer functions, Hk (λp ), oth- coefficient, X(k).
erwise correspond to those shown in Fig. 35 (b) with K = 25.
The vertex-varying filtering is performed using Sf (m, k) = 7.4. Graph Spectrogram and Frames
S(m, k)B(m, k) for m = 0, 1, . . . , N − 1, k = 0, 1, . . . , K, with
a simple thresholding-based filtering support function Based on (119), the graph spectrogram can be defined
as
−1
NX
(
0, for |S(m, k)|< T 2
B(m, k) = |S(m, k)|2 = x(n)hm (n) uk (n) . (164)
1, otherwise,
n=0
m = 0, 1, . . . , N − 1, k = 0, 1, . . . , K, with the threshold T = Then, according to Parseval’s theorem, the vertex marginal
0.09 set empirically. The output graph signal, xf (n), is ob- property, which is a projection of |S(m, k)|2 onto the vertex
tained using the inversion relation in (158) for the filtered index axis, is given by
LGFT, Sf (m, k), and shown in Fig. 46(c). The achieved output
SNR was SN Rout = 10.36 dB. N
X −1 N
X −1 N
X −1
2
|S(m, k)| = S(m, k) x(n)hm (n) uk (n)
7.3. Uncertainty Principle for Graph Signals k=0 k=0 n=0
N −1
In classical signal analysis, the purpose of a window X
= |x(n)hm (n)|2 ,
function is to enhance signal localization in the joint time-
n=0
frequency domain. However, the uncertainty principle pre-
vents the ideal localization in both time and frequency. which would be equal to the signal power, |x(m)|2 , at the
Indeed, in the classical DFT analysis the uncertainty prin- vertex m, if hm (n) = δ(m − n). Since this is not the case,
ciple states that the vertex marginal property of the graph spectrogram is
kxk0 kXk0 ≥ N, (162)
53
equal to the power of the graph signal in hand, smoothed A frame is termed a tight frame if the equality in (168)
by the window, hm (n). holds, that is, if
Energy of graph spectrogram. For the total energy of N
X −1
graph spectrogram, we consequently have |hm (n)|2 = 1,
m=0
N −1 N −1 N −1 N −1
what is the same condition as in (157).
X X X X
|S(m, k)|2 = |x(n)|2 |hm (n)|2 ) . (165)
m=0 k=0 n=0 m=0 (b) The LGFT defined in (128) is a tight frame if
PN −1 K N −1 K N −1
If m=0 |hm (n)|2 = 1 for all n, then the spectrogram on X X X X
the graph is energy unbiased (statistically consistent with |S(m, k)|2 = |X(p)Hk (λp )|2 = Ex , (170)
k=0 m=0 k=0 p=0
respect to the energy), that is
N
X −1 N
X −1 N
X −1 where Parseval’s theorem for the S(m, k) as the GFT of
|S(m, k)| = 2 2 2
|x(n)| = ||x|| = Ex . (166) X(p)Hk (λp ) was used to yield
m=0 k=0 n=0
N
X −1 N
X −1
The LGFT viewed as a frame. A set of functions, |S(m, k)|2 = |X(p)Hk (λp )|2 .
S(m, k), is called a frame for the expansion of a graph m=0 p=0
signal, x, if
This means that the LGFT in (128) is a tight frame if
N
X −1
A||x||2 ≤ |S(m, k)|2 ≤ B||x||2 , K
X
m=0 |Hk (λp )|2 = 1 for p = 0, 1, . . . , N − 1.
k=0
where A and B are positive constants. If A = B, the
frame is termed Parseval’s tight frame and the signal can This condition is used to define transfer functions in Fig.
be recovered as 35(b) and (c).
N −1 N −1
From (170), it is straightforward to conclude that the
1 X X graph spectrogram energy is bounded with
x(n) = S(m, k)hm (n)uk (n).
A m=0
k=0 K N −1
X X
The constants A and B govern the numerical stability AEx ≤ |S(m, k)|2 ≤ BEx , (171)
k=0 m=0
of recovering the original signal x from the coefficients
S(m, k). where A and B are respectively the minimum and the
The conditions for two forms of the LGFT, defined as maximum of value of
in (120) and (128), to represent frames will be analyzed
K
next: X
g(λp ) = |Hk (λp )|2 .
(a) The LGFT, defined as in (120), is a frame, since in
k=0
this case Parseval’s theorem holds [42, 75, 44, 55], that is
N −1 N −1
7.4.1. Graph Wavelet Transform Inversion
The wavelet inversion formula
X X
2 2 2
|hm (n)| = |H(k)| |uk (n)| , (167)
m=0 k=0 N −1 X
K
X
which allows us to write x(n) = ψ(n, si )W (n, si ) (172)
n=0 i=0
N −1 N −1
1 2 X X
H (0) ≤ |hm (n)|2 ≤ max|uk (n)|2 |H(k)|2 = γ 2 Eh , can be derived in the same way and under the same con-
N m=0
m,k
dition as in (158)-(159), where a set of discrete scales for
k=0
(168) the wavelet calculation, denoted by s ∈ {s1 , s1 , . . . , sK }, is
where γ = maxm,k |uk (n)| and assumed, and ψ(n, s0 ) is used as a notation for the scale
function, φ(n), whose spectral transfer function is G(λ),
N −1
X as explained in Remark 34. In the same way as in the
Eh = |H(k)|2 .
LGFT case, it can be shown that the wavelet transform
k=0
represents a frame with
By multiplying both sides of the above inequalities by
N −1 X
K
||x||2 , we arrive at X
A||x||2 ≤ |W (n, si )|2 ≤ B||x||2 , (173)
N −1 N −1 n=0 i=0
1 2 X X
H (0)||x||2 ≤ |S(m, k)|2 ≤ ||x||2 γ 2 Eh . (169)
N m=0 k=0
54
where [71, 76, 77] where for each vertex, the vertex-frequency energy distri-
bution, E(n, k), is defined by [78, 79]
A= min g(λ),
0≤λ≤λmax N −1
X
B= max g(λ), E(n, k) = x(n)X(k)uk (n) = x(n)x(m)uk (m)uk (n).
0≤λ≤λmax
m=0
(175)
and the function g(λ) is defined by
K
X Remark 41: The definition in (175) corresponds to the
g(λ) = H 2 (si λ) + G2 (λ). Rihaczek distribution in classical time-frequency analysis
i=1 [63, 64, 65]. Observe that based on the Rihaczek distribu-
tion and the expression in (175), we may obtain a vertex-
The low-pass scale function, G(λ), is added in the recon- frequency representation even without a localization win-
struction formula, since all H(si λ) = 0 for λ = 0, as ex- dow. This very important property is also the main advan-
plained in Example 24 and Remark 34. It should be men- tage (along with the concentration improvement) of classi-
tioned that the spectral functions of the wavelet transform, cal time-frequency distributions with respect to the spec-
H(si λ), form Parseval’s frame if trogram and STFT based time-frequency representations.
g(λ) = 1. The marginal properties of the vertex-frequency energy
distribution, E(n, k), are defined as its projections onto the
Since the number of wavelet transform coefficients, W (n, si ), spectral index axis, k, and the vertex index axis, n, to give
for each n and i, is greater than the number of signal sam- N −1 N −1
ples, N , this representation is redundant, and this redun- X
E(n, k) = |X(k)|2 and
X
E(n, k) = x2 (n),
dancy allows us to implement the transform through a fast n=0 k=0
algorithm, rather than using the explicit computation of
all wavelet coefficients [76, 77]. Indeed, for large graphs, it which correspond respectively to the squared spectra, |X(k)|2 ,
can be computationally too complex to compute the full and the signal power, x2 (n), of the graph signal, x(n).
eigendecomposition of the graph Laplacian. A common Example 30: Fig. 47 shows the vertex-frequency distribution,
way to avoid this computational burden is to use a poly- E(n, k), of the graph signal from Fig. 32, together with its
nomial approximation schemes for H(si λ), i = 1, 2, . . . , K, marginal properties. The marginal properties are satisfied up
and G(λ). One such approach is the truncated Chebyshev to the computer precision. Observe also that the localization of
polynomial approximation method which is based on the energy is better than in the cases obtained with the localization
application of the continuous spectral window functions windows in Figs. 34, 44, and 45. Importantly, the distribution,
with Chebyshev polynomials, which admit order-recursive E(n, k), does not employ a localization window.
calculation (see Section 3.5.4 and Example 23). If, for a
given scale, si , the wavelet function is approximated by 7.5.1. Smoothness Index and Local Smoothness
a polynomial in the Laplacian, Pi (L), then the wavelet The smoothness index, l, in graph signal processing
transform can be efficiently calculated using plays the role of frequency, ω, in classical spectral analysis.
For a graph signal, x, the smoothness index is defined as
wi = Pi (L)x, (174) the Rayleigh quotient of the matrix L and vector x, that
is (see Section 4.2.1, Part I)
where wi a column vector with elements W (m, si ), m =
0, 1, . . . , N − 1. Note that this form corresponds to the xT Lx
l= ≥ 0. (176)
LGFT form in (129). xT x
7.5. Vertex-Frequency Energy Distributions
Remark 42: The expression in (176) indicates that the
The energy of a general signal is usually defined as smoothness index can be considered as a measure of the
N −1 N −1 N −1 rate of change of a graph signal. Faster changing signals
(corresponding to high-frequency signals) have larger val-
X X X
E= x2 (n) = x(n) X(k)uk (n).
n=0 n=0 k=0
ues of the smoothness index. The maximally smooth graph
signal is then a constant signal, x(n) = c, for which the
This expression can be rearranged into smoothness index is l = 0.
In the mathematics literature, the inverse of the smooth-
N −1 N −1 N −1 N −1
X X X X ness index is known as the curvature (curvature ∼ 1/l).
E= x(n)X(k)uk (n) = E(n, k),
While larger values of the smoothness index correspond
n=0 k=0 n=0 k=0
to graph signals with larger rates of change (less smooth
graph signals), the larger values of curvature would indi-
cate smoother graph signals.
55
1 0.5 0
100
2. Assume a piece-wise monocomponent signal
x(n) = αi uki (n) for n ∈ Vi , i = 1, 2, . . . , M,
80
where Vi ⊂ V are the subsets of the vertices such that
60
Vi ∩Vj = ∅ for i 6= j, V1 ∪V2 ∪· · ·∪VM = V, that is, ev-
ery vertex belongs to only one subset, Vi . Given the
monocomponent nature of this signal, within each
40
subset, Vi , the considered signal is proportional to
the eigenvector, uki (n).
20
Then, for each interior vertex, n ∈ Vi , i.e., a vertex
whose neighborhood lies in the same set, Vi , the local
0 smoothness index is given by
αi Luki (n)
0.4 λ(n) = = λki . (179)
αi uki (n)
0.2
3. An ideally concentrated vertex-frequency distribu-
0 tion (ideal distribution) can be defined as
0 20 40 60 80 100
I(n, k) ∼ |x(n)|2 δ λk − [λ(n)] ,
Figure 47: Vertex-frequency energy distribution for the graph sig-
nal whose vertex-frequency representation is given in Fig. 34. No whereby it is assumed that the local smoothness in-
localization window was used here.
dex is rounded to the nearest eigenvalue.
This distribution can also be used as a local smooth-
Notice that the smoothness index for an eigenvector, ness estimator, since for each vertex, n, the maxi-
uk , of the graph Laplacian, L, is equal to its corresponding mum of I(n, k) is positioned at λk = λ(n). An es-
eigenvalue, λk , that is timate of the spectral index at a vertex, n, denoted
uTk Luk by k̂(n), is then obtained as
= λk , (177)
uTk uk
k̂(n) = arg max{I(n, k)},
k
since by definition Luk = λk uk .
Remark 43: If the above eigenvectors are the classical so that the estimated local smoothness index be-
Fourier transform basis functions, then the smoothness in- comes λ̂(n) = λk̂(n) . This type of estimator is widely
dex corresponds to the squared frequency of the considered used in classical time-frequency analysis [63, 64, 65].
basis function, λk ∼ ωk2 , while the curvature corresponds
to the squared period in harmonic signals. 4. Local smoothness property. The vertex-frequency
distribution, E(n, k), satisfies the local smoothness
This makes it possible to define the local smoothness
property if
index for a vertex n, λ(n), in analogy with the standard
instantaneous frequency, ω(t), at an instant t, as [80] PN −1
k=0 λk E(n, k)
PN −1 = λ(n). (180)
Lx (n) k=0 E(n, k)
λ(n) = , (178)
x(n)
In that case, the centers of masses of the vertex-
where it was assumed that x(n) 6= 0 and Lx (n) are the frequency distribution along the spectral index axis,
elements of the vector Lx. k, should be exactly at λ = λ(n), and can be used as
The properties of the local smoothness include: an unbiased estimator of this graph signal parame-
ter.
1. The local smoothness index, λ(n), for a monocom-
ponent signal Example 31: The vertex-frequency distribution, defined by
x(n) = αuk (n), E(n, k) = x(n)X(k)uk (n), satisfies the local smoothness prop-
is vertex independent, and is equal to the global erty in (180), since
smoothness index, λk , since PN −1 PN −1
k=0 λk E(n, k) k=0 λk x(n)X(k)uk (n) Lx (n)
= = = λ(n).
Lx (n) = αLuk (n) = αλk uk (n).
PN −1 N −1 x(n)
P
k=0 E(n, k) k=0 x(n)X(k)u k (n)
In the standard time-domain signal analysis, this P −1
The above relation follows from the fact that N
k=0 λk X(k)uk (n)
property means that the instantaneous frequency of are the elements of the IGDFT of λk X(k). Upon employ-
a sinusoidal signal is equal to its global frequency. ing the matrix form of the IGDFT of ΛX, we have UΛX =
56
7
6
20
5
4 40
3
60
2
1 80
0
0 20 40 60 80 100 100
20 40 60 80 100
Figure 48: Local smoothness index, λ(n), of the graph signal from
Fig. 32. Figure 49: The sinc kernel of the reduced interference vertex-
frequency distribution in the frequency domain.
UΛ(UT U)X = (UΛUT )(UX) = Lx. With the notation,
Lx (n), for the elements of Lx, we next obtain This is obvious from
N −1
X N −1 N −1 N −1
λk X(k)uk (n) = Lx (n).
X X X
G(n, k) = X(p)X ∗ (q)up (n)u∗q (n) = |x(n)|2 .
k=0
k=0 p=0 q=0
The local smoothness index for the graph signal from Fig.
32 is shown in Fig. 48. • The frequency marginal property is satisfied if
7.5.2. Reduced Interference Distributions (RID) on Graphs φ(p, k, p) = δ(p − k).
In order to emphasize the close relations with classical
time-frequency analysis, in this subsection we will use the Then, the sum over all vertex indices produces
complex-sensitive notation for eigenvectors and spectral N −1 N −1
vectors. The frequency domain definition of the energy X
G(n, k) =
X
|X(p)|2 φ(p, k, p) = |X(k)|2 ,
distribution in (175) is given by n=0 p=0
N −1 PN −1
since n=0 up (n)u∗q (n) = δ(p−q), that is, the eigen-
X
E(n, k) = x(n)X ∗ (k)u∗k (n) = X(p)X ∗ (k)up (n)u∗k (n).
p=0 vectors are orthonormal.
Then, the general form of a graph distribution can be de- 7.5.3. Reduced Interference Distribution Kernels
fined with the help of a kernel φ(p, k, q), as [81]
A straightforward extension of classical time-frequency
N
X −1 N
X −1 kernels to graph signal processing would be naturally based
G(n, k) = X(p)X ∗ (q)up (n)u∗q (n)φ(p, k, q). upon exploiting the relation λ ∼ ω 2 , together with an ap-
p=0 q=0 propriate exponential kernel normalization.
(181) The simplest reduced interference kernel in the frequency-
frequency shift domain, which would satisfy the marginal
Observe that for φ(p, k, q) = δ(q − k), the graph Rihaczek properties, is the sinc kernel, given by
distribution in (175) follows, while the unbiased energy
PN −1 PN −1 (
condition k=0 n=0 G(n, k) = Ex is satisfied if
1
, for |k − p|≤ |p − q|,
φ(p, k, q) = 1+2|p−q|
N
X −1 0, otherwise,
φ(p, k, p) = 1.
k=0 which is is shown in Fig. 49 at the frequency shift corre-
sponding to k = 50.
The so obtained distribution, G(n, k), may also satisfy
Example 32: The sinc kernel was used for a vertex-frequency
the vertex and frequency marginal properties, as elabo-
representation of the signal from Fig. 32(d), with the results
rated bellow. shown in Fig. 50. This representation is a smoothed version of
• The vertex marginal property is satisfied if the energy vertex-frequency distribution in Fig. 47, whereby
both (vertex and frequency) marginals are preserved.
N −1
Remark 44: Marginal properties of graph spectro-
X
φ(p, k, q) = 1.
k=0 gram. A general vertex-frequency distribution can be
57
1 0.5 0
100
in (181), the classical (Rihaczek based) Cohen class of dis-
tributions directly follows, where c(k, n) is the distribution
kernel in the ambiguity domain [63, 64, 65].
80
60
8. Conclusion
Fundamental ideas of graph signals and their analy-
40 sis have been introduced starting from an intuitive multi-
sensor estimation example, frequently considered in tradi-
20 tional data analytics. The concept of systems on graphs
has been defined using graph signal shift operators, which
generalize the fundamental signal shift concepts in tra-
0
ditional signal processing. In part II of our monograph,
the Graph Discrete Fourier Transform (GDFT) has been
0.4
at the core of the spectral domain representation of graph
signals and systems on graphs, and has been defined based
0.2
on both the adjacency matrix and graph Laplacian. These
0
spectral domain representations have been used as the ba-
0 20 40 60 80 100 sis to introduce graph signal filtering concepts. Methods
for the design of the graph filters have been presented next,
Figure 50: Reduced interference vertex-frequency distribution of a
including those based on the polynomial approximation.
signal whose vertex-frequency representation is given in Fig. 34. The Various ideas related to the sampling of graph signals, and
marginal properties are given in the panels to the right and below the particularly, the challenging topic of the subsampling, have
vertex-frequency representation, and are equal to their corresponding also been addressed in this part of the monograph. This is
ideal forms given by |x(n)|2 and |X(k)|2 .
followed by conditions for the recovery of signals on graphs,
from a reduced number of samples. The concepts of time-
written for the vertex-vertex shift domain as a dual form varying signals on graphs and basic definitions related to
of (181), to yield random graph signals have also been reviewed.
While traditional approaches for graph analysis, clus-
N
X −1 N
X −1 tering and segmentation consider only graph topology and
G(n, k) = x(m)x∗ (l)uk (m)u∗k (l)ϕ(m, n, l), (182) spectral properties of graphs, when dealing with signals on
m=0 l=0 graphs, localized analyzes should employed in order to con-
sider both data on graphs and the graph topology. Such
where ϕ(m, n, l) is the kernel in this domain (the same
a unified approach to define and implement graph signal
mathematical form as for the frequency-frequency shift do-
localization methods, which takes into account both the
main kernel). The frequency marginal is then satisfied if
PN −1 data on graph and the corresponding graph topology, is
n=0 ϕ(m, n, l) = 1 holds, while the vertex marginal is at the core of vertex-frequency analysis. Like in classi-
met if ϕ(m, n, m) = δ(m−n). The relation of this distribu- cal time-frequency analysis, the main research efforts have
tion with the vertex domain spectrogram (115) is simple, been devoted to linear representations of the graph signals
and is given by which include a localization window for enhanced signal
discrimination. Several methods for the definition of lo-
ϕ(m, n, l) = hn (m)h∗n (l).
calization widows in the spectral and vertex domain have
However, this kernel cannot satisfy both the frequency and been addressed in Part II of this monograph. Optimiza-
vertex P
marginal properties, while the unbiased energy con- tion of the window parameters, uncertainty principle, and
N −1
dition n=0 ϕ(m, n, m) = 1 reduces to (157). inversion methods have also been discussed. Following
classical time-frequency analysis, energy forms of vertex-
Remark 45: Classical time-frequency analysis fol- frequency energy and reduced interference distributions,
lows as a special case from the general form of graph which do not use localization windows, have also been con-
distributions in (181), if the considered graph is a di- sidered, together with their role as an estimator of the local
rected circular graph. This becomes obvious upon re- smoothness index.
calling that the adjacency matrix eigendecomposition pro-
duces complex-valued
√ eigenvectors of the form uk (n) =
9. Bibliography
exp(j2πnk/N )/ N . With the kernel choice
References
N −1
−j 2πnk j 2πnp
X
φ(p, k, q) = φ(p − q, k − p) = c(p − q, n)e N eN [1] J. M. Moura, “Graph signal processing,” in Cooperative and
n=0
Graph Signal Processing, P. Djuric and C. Richard, Editors,
pp. 239–259, Elsevier, 2018.
58
[2] M. Vetterli, J. Kovačević, and V. Goyal, Foundations of Signal Society, vol. 6, pp. 1–12, 1951.
Processing. Cambridge University Press., 2014. [24] P. B. Billingsley, Convergence of Probability Measures. John
[3] A. Sandryhaila and J. M. Moura, “Discrete signal processing Wiley & Sons, 1999.
on graphs,” IEEE Transactions on Signal Processing, vol. 61, [25] R. Durrett, Probability: Theory and Examples. Cambridge Uni-
no. 7, pp. 1644–1656, 2013. versity Press, 1996.
[4] V. N. Ekambaram, Graph-structured data viewed through a [26] D. Revuz and M. Yor, Continuous Martingales and Brownian
Fourier lens. University of California, Berkeley, 2014. Motion. Springer, 1999.
[5] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Van- [27] P. B. Billingsley, Probability and Measure. John Wiley & Sons,
dergheynst, “The emerging field of signal processing on graphs: 1995.
Extending high-dimensional data analysis to networks and other [28] S. Chen, A. Sandryhaila, and J. Kovačević, “Sampling theory
irregular domains,” IEEE Signal Processing Magazine, vol. 30, for graph signals,” in Proc. IEEE International Conference on
no. 3, pp. 83–98, 2013. Acoustics, Speech and Signal Processing (ICASSP), pp. 3392–
[6] R. Hamon, P. Borgnat, P. Flandrin, and C. Robardet, “Extrac- 3396, 2015.
tion of temporal network structures from graph-based signals,” [29] S. Chen, R. Varma, A. Sandryhaila, and J. Kovačević, “Discrete
IEEE Transactions on Signal and Information Processing over signal processing on graphs: Sampling theory,” IEEE Transac-
Networks, vol. 2, no. 2, pp. 215–226, 2016. tions on Signal Processing, vol. 63, no. 24, pp. 6510–6523, 2015.
[7] S. Chen, A. Sandryhaila, J. M. Moura, and J. Kovačević, “Sig- [30] S. Chen, A. Sandryhaila, J. M. Moura, and J. Kovačević, “Sig-
nal denoising on graphs via graph filtering,” in Proc. 2014 IEEE nal recovery on graphs: Variation minimization,” IEEE Trans-
Global Conference on Signal and Information Processing (Glob- actions on Signal Processing, vol. 63, no. 17, pp. 4609–4624,
alSIP), pp. 872–876, 2014. 2015.
[8] A. Gavili and X.-P. Zhang, “On the shift operator, graph fre- [31] S. Chen, R. Varma, A. Singh, and J. Kovačević, “Signal re-
quency, and optimal filtering in graph signal processing,” IEEE covery on graphs: Fundamental limits of sampling strategies,”
Transactions on Signal Processing, vol. 65, no. 23, pp. 6303– IEEE Transactions on Signal and Information Processing over
6318, 2017. Networks, vol. 2, no. 4, pp. 539–554, 2016.
[9] L. Stankovic, D. Mandic, M. Dakovic, I. Kisil, E. Sejdic, and [32] M. Tsitsvero, S. Barbarossa, and P. Di Lorenzo, “Signals on
A. G. Constantinides, “Understanding the basis of graph sig- graphs: Uncertainty principle and sampling,” IEEE Transac-
nal processing via an intuitive example-driven approach,” IEEE tions Signal Processing, vol. 64, no. 18, pp. 539–554, 2016.
Signal Processing Magazine, arXiv preprint arXiv:1903.11179, [33] X. Wang, P. Liu, and Y. Gu, “Local-set-based graph signal re-
2019. construction,” IEEE Transactions on Signal Processing, vol. 63,
[10] L. Stanković, E. Sejdić, and M. Daković, “Vertex-frequency en- no. 9, pp. 2432–2444, 2015.
ergy distributions,” IEEE Signal Processing Letters, vol. 25, [34] L. Stanković, E. Sejdić, S. Stanković, M. Daković, and I. Orović,
no. 3, pp. 358–362, 2017. “A tutorial on sparse signal reconstruction and its applications
[11] A. Sandryhaila and J. M. Moura, “Discrete signal processing in signal processing,” Circuits, Systems, and Signal Processing,
on graphs: Frequency analysis,” IEEE Transactions on Signal pp. 1–58, 2018.
Processing, vol. 62, no. 12, pp. 3042–3054, 2014. [35] L. Stanković, Digital signal processing with selected topics. Cre-
[12] A. Sandryhaila and J. M. Moura, “Big data analysis with signal ateSpace Independent Publishing Platform, An Amazon.com
processing on graphs: Representation and processing of mas- Company, 2015.
sive data sets with irregular structure,” IEEE Signal Processing [36] S. K. Narang and A. Ortega, “Downsampling graphs using spec-
Magazine, vol. 31, no. 5, pp. 80–90, 2014. tral theory,” in Proc. IEEE International Conference on Acous-
[13] A. Venkitaraman, S. Chatterjee, and P. Händel, “Hilbert trans- tics, Speech and Signal Processing (ICASSP), pp. 4208–4211,
form, analytic signal, and modulation analysis for graph signal 2011.
processing,” arXiv preprint arXiv:1611.05269, 2016. [37] H. Q. Nguyen and M. N. Do, “Downsampling of signals on
[14] A. Agaskar and Y. M. Lu, “A spectral graph uncertainty princi- graphs via maximum spanning trees.,” IEEE Transactions on
ple,” IEEE Transactions on Information Theory, vol. 59, no. 7, Signal Processing, vol. 63, no. 1, pp. 182–191, 2015.
pp. 4338–4356, 2013. [38] S. K. Narang and A. Ortega, “Perfect reconstruction two-
[15] X. Yan, B. M. Sadler, R. J. Drost, P. L. Yu, and K. Lerman, channel wavelet filter banks for graph structured data,” IEEE
“Graph filters and the z-Laplacian,” IEEE Journal of Selected Transactions on Signal Processing, vol. 60, no. 6, pp. 2786–
Topics in Signal Processing, vol. 11, pp. 774–784, Sept 2017. 2799, 2012.
[16] X. Wang, J. Chen, and Y. Gu, “Local measurement and recon- [39] S. Segarra, A. G. Marques, G. Leus, and A. Ribeiro, “Inter-
struction for noisy bandlimited graph signals,” Signal Process- polation of graph signals using shift-invariant graph filters,”
ing, vol. 129, pp. 119–129, 2016. in Proc. 23rd European Signal Processing Conference (EU-
[17] S. Segarra and A. Ribeiro, “Stability and continuity of centrality SIPCO), pp. 210–214, 2015.
measures in weighted graphs,” IEEE Transactions on Signal [40] A. G. Marques, S. Segarra, G. Leus, and A. Ribeiro, “Sam-
Processing, vol. 64, no. 3, pp. 543–555, 2016. pling of graph signals with successive local aggregations.,” IEEE
[18] D. I. Shuman, B. Ricaud, and P. Vandergheynst, “Vertex- Transactions Signal Processing, vol. 64, no. 7, pp. 1832–1843,
frequency analysis on graphs,” Applied and Computational Har- 2016.
monic Analysis, vol. 40, no. 2, pp. 260–291, 2016. [41] A. Anis, A. Gadde, and A. Ortega, “Efficient sampling set se-
[19] S.-J. Kim, K. Koh, S. Boyd, and D. Gorinevsky, “\ell 1 trend lection for bandlimited graph signals using graph spectral prox-
filtering,” SIAM review, vol. 51, no. 2, pp. 339–360, 2009. ies,” IEEE Transactions on Signal Processing, vol. 64, no. 14,
[20] L. Stanković, M. Daković, and E. Sejdić, “Introduction to graph pp. 3775–3789, 2016.
signal processing,” in Vertex-Frequency Analysis of Graph Sig- [42] H. Behjat, U. Richter, D. Van De Ville, and L. Sörnmo, “Signal-
nals, pp. 3–108, Springer, 2019. adapted tight frames on graphs,” IEEE Transactions on Signal
[21] A. Heimowitz and Y. C. Eldar, “A Unified View of Diffusion Processing, vol. 64, no. 22, pp. 6017–6029, 2016.
Maps and Signal Processing on Graphs,” In Proceedings of the [43] Y. Tanaka and A. Sakiyama, “M-channel oversampled graph
International Conference on Sampling Theory and Applications filter banks,” IEEE Transactions Signal Processessing, vol. 62,
(SampTA), pp. 308–312, 2017. no. 14, pp. 3578–3590, 2014.
[22] B. Scalzo Dees, L. Stankovic, M. Dakovic, A. G. Constantinides, [44] A. Sakiyama and Y. Tanaka, “Oversampled graph Laplacian
and D. P. Mandic, “A Unifying Analysis of Shift Operators on matrix for graph filter banks,” IEEE Transactions on Signal
a Graph,” arXiv:1908.01596, 2019. Processing, vol. 62, no. 24, pp. 6425–6437, 2014.
[23] M. D. Donsker, “An Invariance Principle for Certain Probabil- [45] N. Tremblay and P. Borgnat, “Subgraph-based filterbanks for
ity Limit Theorems,” Memoirs of the American Mathematical graph signals,” IEEE Transactions Signal Processing, vol. 64,
59
no. 15, pp. 3827–3840, 2016. ture notes],” IEEE Signal Processing Magazine, vol. 34, no. 4,
[46] J. Leskovec and C. Faloutsos, “Sampling from large graphs,” in pp. 176–182, 2017.
Proceedings of the 12th ACM SIGKDD International Confer- [70] Y. Meyer, Wavelets and Operators. Cambridge University
ence on Knowledge Discovery and Data Mining, pp. 631–636, Press, 1992.
ACM, 2006. [71] N. Leonardi and D. Van De Ville, “Tight wavelet frames on
[47] L. Stankovic, D. Mandic, M. Dakovic, and I. Kisil, “An in- multislice graphs,” IEEE Transactions on Signal Processing,
tuitive derivation of the coherence index relation in compres- vol. 61, no. 13, pp. 3357–3367, 2013.
sive sensing,” IEEE Signal Processing Magazine, arXiv preprint [72] L. Stanković, “A measure of some time–frequency distributions
arXiv:1903.11136, 2019. concentration,” Signal Processing, vol. 81, no. 3, pp. 621–631,
[48] E. J. Candes, “The restricted isometry property and its implica- 2001.
tions for compressed sensing,” Comptes rendus mathematique, [73] M. Elad and A. M. Bruckstein, “Generalized uncertainty prin-
vol. 346, no. 9-10, pp. 589–592, 2008. ciple and sparse representation in pairs of bases,” IEEE Trans-
[49] N. Perraudin and P. Vandergheynst, “Stationary signal pro- actions on Information Theory, vol. 48, no. 9, pp. 2558–2567,
cessing on graphs,” IEEE Transactions on Signal Processing, 2002.
vol. 65, no. 13, pp. 3462–3477, 2017. [74] N. Perraudin, B. Ricaud, D. I. Shuman, and P. Vandergheynst,
[50] A. G. Marques, S. Segarra, G. Leus, and A. Ribeiro, “Stationary “Global and local uncertainty principles for signals on graphs,”
graph processes and spectral estimation,” IEEE Transactions APSIPA Transactions on Signal and Information Processing,
on Signal Processing, vol. 65, no. 22, pp. 5911–5926, 2017. vol. 7, no. e3, pp. 1–26, 2018.
[51] A. Loukas and N. Perraudin, “Stationary time-vertex signal pro- [75] D. K. Hammond, P. Vandergheynst, and R. Gribonval,
cessing,” arXiv preprint arXiv:1611.00255, 2016. “Wavelets on graphs via spectral graph theory,” Applied and
[52] S. P. Chepuri and G. Leus, “Subsampling for graph power spec- Computational Harmonic Analysis, vol. 30, no. 2, pp. 129–150,
trum estimation,” in Proc. IEEE Sensor Array and Multichan- 2011.
nel Signal Processing Workshop (SAM), pp. 1–5, 2016. [76] D. K. Hammond, P. Vandergheynst, and R. Gribonval, “The
[53] G. Puy, N. Tremblay, R. Gribonval, and P. Vandergheynst, spectral graph wavelet transform: Fundamental theory and fast
“Random sampling of bandlimited signals on graphs,” Applied computation,” in Vertex-Frequency Analysis of Graph Signals,
and Computational Harmonic Analysis, vol. 44, no. 2, pp. 446– pp. 141–175, Springer, 2019.
475, 2016. [77] H. Behjat and D. Van De Ville, “Spectral design of signal-
[54] C. Zhang, D. Florêncio, and P. A. Chou, “Graph signal process- adapted tight frames on graphs,” in Vertex-Frequency Analysis
ing - A probabilistic framework,” Microsoft Research, Redmond, of Graph Signals, pp. 177–206, Springer, 2019.
WA, USA, Tech. Rep. MSR-TR-2015-31, 2015. [78] L. Stanković, E. Sejdić, and M. Daković, “Vertex-frequency en-
[55] B. Girault, “Stationary graph signals using an isometric graph ergy distributions,” IEEE Signal Processing Letters, vol. 25,
translation,” in Proc. 23rd European Signal Processing Confer- no. 3, pp. 358–362, 2018.
ence (EUSIPCO), pp. 1516–1520, 2015. [79] L. Stanković, M. Daković, and E. Sejdić, “Vertex-frequency en-
[56] B. Girault, P. Gonçalves, and É. Fleury, “Translation on graphs: ergy distributions,” in Vertex-Frequency Analysis of Graph Sig-
An isometric shift operator,” IEEE Signal Processing Letters, nals (L. Stanković and E. Sejdić, eds.), pp. 377–415, Springer,
vol. 22, no. 12, pp. 2416–2420, 2015. 2019.
[57] B. Scalzo Dees, L. Stankovic, M. Dakovic, A. G. Constantinides, [80] M. Daković, L. Stanković, and E. Sejdić, “Local smooth-
and D. P. Mandic, “Unitary Shift Operators on a Graph,” ness of graph signals,” Mathematical Problems in Engineering,
arXiv:1909.05767, 2019. vol. 2019, 2019.
[58] P. A. Fillmore, “The Shift Operator,” American Mathematical [81] L. Stanković, E. Sejdić, and M. Daković, “Reduced interfer-
Monthly, vol. 81, no. 7, pp. 717–723, 1974. ence vertex-frequency distributions,” IEEE Signal Processing
[59] P. O. Löwdin, “On the Non-Orthogonality Problem Connected Letters, vol. 25, no. 9, pp. 1393–1397, 2018.
with the Use of Atomic Wave Functions in the Theory of
Molecules and Crystals,” Journal of Chemical Physics, vol. 18,
no. 3, pp. 365–375, 1950.
[60] P. O. Löwdin, “On the Nonorthogonality Problem,” Advances
in Quantum Chemistry, vol. 5, pp. 185–199, 1970.
[61] P. H. Schönemann, “A Generalized Solution of the Orthogonal
Proctrustes Problem,” Psychometrika, vol. 31, pp. 1–10, 1966.
[62] W. Kabsch, “A Solution for the Best Rotation to Related Two
Sets of Vectors,” Acta Crystallographica, vol. 32, p. 922, 1976.
[63] L. Stanković, M. Daković, and T. Thayaparan, Time-frequency
signal analysis with applications. Artech House, 2014. Graph Signal Processing – Part III:
[64] L. Cohen, Time-frequency Analysis. Electrical Engineering Sig-
nal Processing, Prentice Hall PTR, 1995. Learning Graph Topology
[65] B. Boashash, Time-frequency signal analysis and processing: A
comprehensive reference. Academic Press, 2015.
L. Stankovic, D. Mandic, M Dakovic,
[66] D. I. Shuman, B. Ricaud, and P. Vandergheynst, “A windowed
graph Fourier transform,” in Proc. IEEE Statistical Signal Pro- M. Brajovic, B. Scalzo, A. G. Constantinides
cessing Workshop (SSP), pp. 133–136, 2012.
[67] X.-W. Zheng, Y. Y. Tang, J.-T. Zhou, H.-L. Yuan, Y.-L. Wang,
L.-N. Yang, and J.-J. Pan, “Multi-windowed graph Fourier
frames,” in Proc. IEEE International Conference on Machine
Learning and Cybernetics (ICMLC), vol. 2, pp. 1042–1048,
2016.
[68] M. Tepper and G. Sapiro, “A short-graph Fourier transform
via personalized pagerank vectors,” in Proc. IEEE Interna-
tional Conference on Acoustics, Speech and Signal Processing
(ICASSP), pp. 4806–4810, 2016.
[69] L. Stanković, M. Daković, and E. Sejdić, “Vertex-frequency
analysis: A way to localize graph spectral components [lec-
60