Graph Signal Processing - Part II: Processing and Analyzing Signals on Graphs

Anthony  Constantinides

Outline

Graph Signal Processing - Part II: Processing and Analyzing Signals on Graphs

Anthony Constantinides

2019

visibility

…

description

60 pages

Key takeaways
AI

Graph signal processing integrates irregular data structures with traditional signal processing techniques.
The Graph Discrete Fourier Transform (GDFT) underpins spectral representations of graph signals.
Localized methods enhance analysis by considering both graph topology and signal data.
Practical examples, like multisensor temperature estimation, illustrate graph signal applications.
Optimal denoising and subsampling strategies are key for efficient graph signal processing.

Abstract

The focus of Part I of this monograph has been on both the fundamental properties, graph topologies, and spectral representations of graphs. Part II embarks on these concepts to address the algorithmic and practical issues centered round data/signal processing on graphs, that is, the focus is on the analysis and estimation of both deterministic and random data on graphs. The fundamental ideas related to graph signals are introduced through a simple and intuitive, yet illustrative and general enough case study of multisensor temperature field estimation. The concept of systems on graph is defined using graph signal shift operators, which generalize the corresponding principles from traditional learning systems. At the core of the spectral domain representation of graph signals and systems is the Graph Discrete Fourier Transform (GDFT). The spectral domain representations are then used as the basis to introduce graph signal filtering concepts and address their design, including Chebys...

Graph Signal Processing – Part II: Processing and Analyzing Signals on Graphs Ljubiša Stankovića , Danilo Mandicb , Miloš Dakovića , Miloš Brajovića , Bruno Scalzob , Anthony G. Constantinidesb a University of Montenegro, Podgorica, Montenegro b Imperial College London, London, United Kingdom Abstract Data analytics on graphs deals with information processing of data acquired on irregular but structured graph domains. The focus of Part I of this monograph has been on both the fundamental and higher-order graph properties, graph topologies, and spectral representations of graphs. Part I also establishes rigorous frameworks for vertex clustering and graph segmentation, and illustrates the power of graphs in various data association tasks. Part II embarks on arXiv:1909.10325v1 [cs.IT] 23 Sep 2019 these concepts to address the algorithmic and practical issues centered round data/signal processing on graphs, that is, the focus is on the analysis and estimation of both deterministic and random data on graphs. The fundamental ideas related to graph signals are introduced through a simple and intuitive, yet illustrative and general enough case study of multisensor temperature field estimation. The concept of systems on graph is defined using graph signal shift operators, which generalize the corresponding principles from traditional learning systems. At the core of the spectral domain representation of graph signals and systems is the Graph Discrete Fourier Transform (GDFT), which is defined based on the eigendecomposition of both the adjacency matrix and the graph Laplacian. The spectral domain representations are then used as the basis to introduce graph signal filtering concepts and address their design, including Chebyshev polynomial approximation series. Ideas related to the sampling of graph signals, and in particular the challenging topic of data dimensionality reduction through graph subsampling, are presented and further linked with compressive sensing. The principles of time-varying signals on graphs and basic definitions related to random graph signals are next reviewed. Localized graph signal analysis in the joint vertex-spectral domain is referred to as the vertex-frequency analysis, since it can be considered as an extension of classical time-frequency analysis to the graph domain of a signal. Important topics related to the local graph Fourier transform (LGFT) are covered, together with its various forms including the graph spectral and vertex domain windows and the inversion conditions and relations. A link between the LGFT with spectral varying window and the spectral graph wavelet transform (SGWT) is also established. Realizations of the LGFT and SGWT using polynomial (Chebyshev) approximations of the spectral functions are further considered and supported by examples. Finally, energy versions of the vertex-frequency representations are introduced, along with their relations with classical time-frequency analysis, including a vertex-frequency distribution that can satisfy the marginal properties. The material is supported by numerous examples. Contents 3.5 Graph Signal Filtering in the Spectral Do- main of the Adjacency Matrix . . . . . . . . 11 1 Introduction 2 3.5.1 Normalization of the Adjacency Ma- trix . . . . . . . . . . . . . . . . . . 11 2 Problem Statement: An Illustrative Exam- 3.5.2 Spectral Ordering of Eigenvectors of ple 3 the Adjacency Matrix . . . . . . . . 11 3.5.3 Spectral Domain Filter Design . . . 12 3 Signals and Systems on Graphs 7 3.5.4 Polynomial (Chebyshev) Approxima- 3.1 Adjacency Matrix and Graph Signal Shift . 8 tion of the System on a Graph Trans- 3.2 Systems Based on Graph Shifted Signals . . 8 fer Function . . . . . . . . . . . . . . 14 3.3 Graph discrete Fourier transform (GDFT), 3.5.5 Inverse System on a Graph . . . . . 15 adjacency matrix based definition . . . . . . 10 3.6 Graph Fourier Transform Based on the Lapla- 3.4 System on a graph in the GDFT domain . . 10 cian . . . . . . . . . . . . . . . . . . . . . . 15 3.7 Ordering and Filtering in the Laplacian Spec- Email addresses: [email protected] (Ljubiša Stanković), tral Domain . . . . . . . . . . . . . . . . . . 16 [email protected] (Danilo Mandic), [email protected] 3.8 Systems on a Graph Defined Using the Graph (Miloš Daković), [email protected] (Miloš Brajović), Laplacian . . . . . . . . . . . . . . . . . . . 17 [email protected] (Bruno Scalzo), [email protected] (Anthony G. Constantinides) 3.9 Convolution of Signals on a Graph . . . . . 18 3.10 The z-transform of a Signal on a Graph . . 20 Preprint submitted to Elsevier Received: date / Accepted: date 3.11 Shift Operator in the Spectral Domain . . . 21 7.4.1 Graph Wavelet Transform Inversion 54 3.12 Parseval’s Theorem on a Graph . . . . . . . 21 7.5 Vertex-Frequency Energy Distributions . . . 55 3.13 Optimal Denoising . . . . . . . . . . . . . . 21 7.5.1 Smoothness Index and Local Smooth- 3.14 Systems on a Graph Defined Using Random ness . . . . . . . . . . . . . . . . . . 55 Walk Laplacian . . . . . . . . . . . . . . . . 22 7.5.2 Reduced Interference Distributions (RID) on Graphs . . . . . . . . . . . 57 4 Subsampling, Compressed Sensing, and Re- 7.5.3 Reduced Interference Distribution Ker- construction 23 nels . . . . . . . . . . . . . . . . . . 57 4.1 Subsampling of Low-Pass Graph Signals . . 24 4.2 Subsampling of Sparse Graph Signals . . . . 25 8 Conclusion 58 4.2.1 Known Coefficient Positions in GDFT 25 4.2.2 Support Matrices, Subsampling and 9 Bibliography 58 Upsampling . . . . . . . . . . . . . . 25 4.2.3 Unknown Coefficient Positions . . . 26 1. Introduction 4.2.4 Unique Reconstruction Conditions . 26 4.3 Measurements as Linear Combinations of Graphs are irregular structures which naturally account Samples . . . . . . . . . . . . . . . . . . . . 27 for data integrity, however, traditional approaches have 4.4 Aggregate Sampling . . . . . . . . . . . . . 28 been established outside Machine Learning and Signal Pro- 4.5 Filter Bank on a Graph . . . . . . . . . . . 28 cessing, and largely focus on analyzing the underlying graphs rather than dealing with signals on graphs. On the other 5 Time-Varying Signals on Graphs 32 hand, given the rapidly increasing availability of multisen- 5.1 Diffusion on Graph and Low Pass Filtering 32 sor and multinode measurements, likely recorded on irreg- 5.2 Taubin’s α − β algorithm . . . . . . . . . . 33 ular or ad-hoc grids, it would be extremely advantageous to analyze such structured data as “signals on graphs” and 6 Random Graph Signals 33 thus benefit from the ability of graphs to account for spa- 6.1 Review of WSS and related Properties for tial sensing awareness, physical intuition and sensor im- Random Signals in Standard Time Domain 34 portance, together with the inherent “local versus global” 6.2 Adjacency Matrix Based Definition of GWSS 34 sensor association. The aim of Part II of our monograph is 6.3 Wiener filter on a graph . . . . . . . . . . . 35 therefore to establish a common language between graph 6.4 Spectral Domain Shift Based Definition of signals which are observed on irregular signal domains, GWSS . . . . . . . . . . . . . . . . . . . . . 35 and some of the most fundamental paradigms in Learn- 6.5 Isometric Shift Operator . . . . . . . . . . . 35 ing Systems, Signal Processing and Data Analytics, such 7 Vertex-Frequency Representations 36 as spectral analysis, system transfer function, digital filter design, parameter estimation, and optimal denoising. 7.1 Localized Graph Fourier Transform (LGFT) 37 7.1.1 Windows Defined in the GDFT Do- In classical Data Analytics and Signal Processing, the main . . . . . . . . . . . . . . . . . . 37 signal domain is determined by equidistant time instants 7.1.2 Spectral Domain Localization of the or by a set of spatial sensing points on a uniform grid. LGFT . . . . . . . . . . . . . . . . . 40 However, increasingly the actual data sensing domain may 7.1.3 LGFT Realization with Band-Pass not even be related to the physical dimensions of time and/or space, and it typically does exhibit various forms Functions . . . . . . . . . . . . . . . 40 7.1.4 Signal Adaptive LGFT . . . . . . . . 43 of irregularity, as, for example, in social or web-related 7.1.5 Polynomial LGFT Approximation . 45 networks, where the sensing points and their connectiv- 7.1.6 The Spectral Graph Wavelet Trans- ity pertain to specific objects/nodes and ad-hoc topology form . . . . . . . . . . . . . . . . . . 45 of their links. It should be noted that even for the data 7.1.7 Windows Defined Using the Vertex acquired on well defined time and space domains, the in- Neighborhood . . . . . . . . . . . . . 49 troduction of new relations between the signal samples, 7.1.8 Window Parameter Optimization . . 50 through graphs, may yield new insights into the analysis 7.2 Inversion of the LGFT . . . . . . . . . . . . 50 and provide enhanced data processing (for example, based 7.2.1 Inversion by the Summation of the on local similarity, through neighborhoods). We therefore LGFT . . . . . . . . . . . . . . . . . 51 set out to show that the advantage of graphs over classical 7.2.2 Inversion of the LGFT with Band- data domains is that graphs account naturally and com- Pass Functions . . . . . . . . . . . . 52 prehensively for irregular data relations in the problem 7.2.3 Kernel-Based Inversion . . . . . . . 52 definition, together with the corresponding data connec- 7.2.4 Vertex-Varying Filtering . . . . . . . 52 tivity in the analysis [1, 2, 3, 4, 5, 6, 7, 8]. 7.3 Uncertainty Principle for Graph Signals . . 53 To build up the intuition behind the fundamental ideas 7.4 Graph Spectrogram and Frames . . . . . . . 53 of signals/data on graph, a simple yet general example of multisensor temperature estimation is first considered in 2 Section 2. Basic concepts regarding the signals and sys- tems on graphs are presented in Section 3, including basic 11 6 definitions, operations and transforms, which generalize the foundations of traditional signal processing. Systems 12 on graphs are interpreted starting from a comprehensive 10 4 account of the existing and the introduction of a novel, iso- 9 metric, graph signal shift operator. Further, graph Fourier 5 14 transform is defined based on both the adjacency matrix 13 and the graph Laplacian and it serves as the basis to in- 0 1 troduce graph signal filtering concepts. Various ideas re- 8 2 lated to the sampling of graph signals, and particularly, 3 the challenging topic of their subsampling, are reviewed 15 in Section 4. Sections 5 and 6 present the concepts of time-varying signals on graphs and introduce basic defi- 7 nitions related to random graph signals. Localized graph signal behavior can be simultaneously characterized in the vertex-frequency domain, which is discussed in Section (a) 7. This Section also covers the important topics of lo- cal graph Fourier transform, various forms of its inversion, 40 relations with the frames framework and links with the graph wavelet transform. Energy versions of the vertex- 30 frequency representations are also considered, along with 20 their relations with classical time-frequency analysis. 10 2. Problem Statement: An Illustrative Example 0 0 2 4 6 8 10 12 14 16 Consider a multi-sensor setup for measuring a temper- Sensor index (b) ature field in a region of interest. The temperature sensing locations are chosen according to the significance of a par- 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 (c) ticular geographic area to local users, with N = 16 sensing Figure 1: Temperature sensing as a classic data analytics problem. points in total, as shown in Fig. 1(a). The temperature (a) Sensing locations in a geographic region along the Adriatic sea. field is denoted by {x(n)}, with n as the sensor index, while (b) Temperatures measured at N = 16 sensing locations. In standard a snapshot of its values is given in Fig. 1(b). Each mea- data estimation, the spatial sensor index is used for the horizontal axis and serves as the data domain. This domain can be interpreted sured sensor signal can then be mathematically expressed as a directed path graph structure, shown in the bottom panel (c). as Observe that the consecutive samples (vertices) on this path graph x(n) = s(n) + ε(n), n = 0, 1, . . . , 15, (1) offer no physical intuition or interpretation, as in this “brute force” arrangement, for example, vertex 6 is located on a high mountain, where s(n) is the true temperature that would have been whereas its neighboring vertices 5 and 7 are located along the sea; obtained in ideal measuring conditions and ε(n) comprises despite the consecutive index numbers these sensors are physically the adverse effects of the local environment on sensor read- distant, as indicated by their very different temperature measure- ments. ings or faulty sensor activity, and is referred to as “noise” in the sequel. For illustrative purposes, in our study each ε(n) was modeled as a realization of white, zero-mean, a more complex estimation structure than the standard Gaussian process, with standard deviation σε = 2, that is, linear one corresponding to the classical signal processing ε(n) ∈ N (0, 4). It was added to the signal, s(n), to yield framework, shown in Fig. 1(b). the signal-to-noise ratio in x(n) of SN Rin = 14.2 dB. To introduce a “situation-aware” noise reduction scheme Remark 1: Classical data analytics require a rearrange- for the temperature field in Fig. 1, we proceed to ex- ment of the quintessentially irregular spatial temperature plore a graph-theoretic framework to this problem, start- sensing arrangement in Fig. 1(a) into a linear structure ing from a local signal average operator. In classical anal- shown in Fig. 1(b). Obviously, such “lexicographic” or- ysis, this may be achieved through a moving average oper- dering is not amenable to exploiting the information re- ator, e.g., by averaging across the neighboring data sam- lated to the actual sensor locations, which is inherently ples, or equivalently neighboring sensors in the linear data dictated by the terrain. This renders classical analyses of setup in Fig. 1(b), and for each sensing point. Physically, this multisensor temperature field inapplicable (or at best such local neighborhood should include close neighboring suboptimal), as the performance critically depends on the sensing points but only those which also exhibit similar chosen sensor ordering scheme. This exemplifies that even meteorological properties defined by the sensor distance, a most routine multisensor measurement setup requires altitude difference, and other terrain specific properties. 3 by X 6 11 y(n) = x(m), m at and around n 12 so that the local average temperature for a sensing point n 10 4 may be obtained by dividing the cumulative temperature, 9 y(n), with the number of included sensing points (size of 5 14 local neighborhood). For example, for the sensing points 13 n = 3 and n = 6, presented in Fig. 2(a), the “domain 1 0 knowledge aware” local estimation takes the form 8 2 3 y(3) = x(3) + x(0) + x(14) + x(15) (2) 15 y(6) = x(6) + x(9) + x(10). (3) For convenience, the full set of relations among the sensing 7 points can now be arranged into a matrix form, to give y = x + Ax, (4) (a) where the adjacency matrix A, given in (5), indicates the connectivity structure of the sensing locations; this 11 local connectivity structure should be involved in the cal- 6 culation of each y(n). 12 This simple real-world example can be interpreted within 10 the graph signal processing framework as follows: 4 • Sensing points where the signal is measured are des- 9 ignated as the graph vertices, as in Fig. 1, 14 5 13 • Vertex-to-vertex lines which indicate physically mean- ingful connectivity among the sensing points become 8 2 0 1 the graph edges, as in Fig. 2(a), 3 • The vertices and edges form a graph, as in Fig. 15 2(b), a new very structurally rich signal domain, 7 • The graph, rather than a standard vector of sens- ing points, is then used for analyzing and processing data, as it exhibits both spatial and physical domain (b) awareness, Figure 2: Temperature sensing setup as a graph signal estimation • The measured temperatures are now interpreted as problem. (a) Local neighborhood for the sensing points n = 3, 6, signal samples on graph, as shown in Fig. 3, and 8. These neighborhoods are chosen using “domain knowledge” dictated by the local terrain and by taking into account the sensor • Similar to traditional signal processing, this new graph distance and altitude. Neighboring sensors for each of these sensing signal may have many realizations on the same graph locations (vertices) are chosen in a physically meaningful way and their relation is indicated by the connectivity lines, that is, graph and may comprise noise, edges. (b) Local neighborhoods for all sensing vertices, presented in a graph form. • Through relation (4), we have therefore introduced a simple system on a graph for physically and spatially aware signal averaging (a linear first-order In other words, since the sensor network in Fig. 1 mea- system on a graph). sures a set of related temperatures from irregularly spaced sensors, an effective estimation strategy should include do- To emphasize our trust in a particular sensor (i.e., to main knowledge – not possible to achieve with standard model sensor relevance), a weighting scheme may be im- methods (linear path graph). posed, in the form To illustrate the advantages of approaches based on lo- X y(n) = x(n) + Wnm x(m), (8) cal information (neighborhood based) , consider the neigh- m6=n borhoods for the sensing points n = 3 (low land), n = 6 (mountains), and 8 (coast), shown in Fig. 2(a). The cu- where Wnm are the elements of the weighting matrix, W. mulative temperature for each sensing point is then given There are three classes of approaches to the definition of graph edges and their corresponding weights, Wnm : 4   0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 1  1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0    2 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 0   3   1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1   4   1 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0   5   0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0   6   0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0   7  0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0  A=   (5) 8   0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0   9   0 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0   10   0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0   11   0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0   12   0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0   13   1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0   14  1 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0  15 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15   0 0 0 0 0.97 0.91 0 0 0 0 0 0 0 0 0.05 0.90 0.94   1 0 0 0.03 0 0 0 0 0 0 0.37 0 0 0 0.78 0 0     2 0 0.03 0 0 0 0.96 0 0.95 0.98 0 0 0 0 0 0 0     3 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0.88 0.96     4 0.91 0 0 0 0 0 0 0 0 0 0.06 0.01 0.01 0 0.94 0     5 0 0 0.96 0 0 0 0 0 0.97 0.01 0 0 0 0 0 0   6  0 0 0 0 0 0 0 0 0 0.62 0.40 0 0 0 0 0   7  0 0 0.95 0 0 0 0 0 0.94 0 0 0 0 0 0 0  W=   (6) 8  0 0 0.98 0 0 0.97 0 0.94 0 0 0 0 0 0 0 0   9  0 0.37 0 0 0 0.01 0.62 0 0 0 0.25 0 0 0 0 0   10   0 0 0 0 0.06 0 0.40 0 0 0.25 0 0 0 0.85 0 0   11   0 0 0 0 0.01 0 0 0 0 0 0 0 0.92 0 0 0   12   0 0 0 0 0.01 0 0 0 0 0 0 0.92 0 0 0.01 0   13   0.05 0.78 0 0 0 0 0 0 0 0 0.85 0 0 0 0 0   14  0.90 0 0 0.88 0.94 0 0 0 0 0 0 0 0.01 0 0 0  15 0.94 0 0 0.96 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15   3.77 0 0 −0.97 −0.91 0 0 0 0 0 0 0 0 −0.05 −0.90 −0.94   0 1.19 −0.03 0 0 0 0 0 0 −0.37 0 0 0 −0.78 0 0   0 −0.03 2.93 0 0 −0.96 0 −0.95 −0.98 0 0 0 0 0 0 0  −0.97 0 0 2.81 0 0 0 0 0 0 0 0 0 0 −0.88 −0.96     −0.91 0 0 0 1.94 0 0 0 0 0 −0.06 −0.01 −0.01 0 −0.94 0     0 0 −0.96 0 0 1.94 0 0 −0.97 −0.01 0 0 0 0 0 0   0 0 0 0 0 0 1.02 0 0 −0.62 −0.40 0 0 0 0 0  0 0 −0.95 0 0 0 0 1.89 −0.94 0 0 0 0 0 0 0 L =  (7)   0 0 −0.98 0 0 −0.97 0 −0.94 2.89 0 0 0 0 0 0 0   0 −0.37 0 0 0 −0.01 −0.62 0 0 1.24 −0.25 0 0 0 0 0   0 0 0 0 −0.06 0 −0.40 0 0 −0.25 1.56 0 0 −0.85 0 0   0 0 0 0 −0.01 0 0 0 0 0 0 0.93 −0.92 0 0 0   0 0 0 0 −0.01 0 0 0 0 0 0 −0.92 0.94 0 −0.01 0  −0.05 −0.78 0 0 0 0 0 0 0 0 −0.85 0 0 1.68 0 0    −0.90 0 0 −0.88 −0.94 0 0 0 0 0 0 0 −0.01 0 2.74 0 −0.94 0 0 −0.96 0 0 0 0 0 0 0 0 0 0 0 1.91 5 11 weight elements, Wnm , for the neighboring vertices are cal- 6 culated based on the horizontal vertex distance, rmn , and 12 the altitude difference, hmn , as 10 4 Wmn = e−αrmn −βhmn , (9) 9 where α and β are suitable constants. The so obtained 14 weight matrix, W, is given in (6). 5 13 Based on (4), a weighted graph signal estimator of cu- 0 mulative temperature now becomes 8 2 1 y = x + Wx. (10) 3 15 In order to produce unbiased estimates, instead of the cu- mulative sums in (4) and (8), the weighting coefficients 7 within the estimate for each y(n) should sum up to unity. This can be achieved through a normalized form of (10), given by 1 6 11 y = (x + D−1 Wx), (11) 2 12 where the elements of the diagonal normalization matrix, 10 D, P are equal to the −1 the degree matrix elements, Dnn = 4 m W nm , while D W is a random walk (diffusion) 9 shift operator [9, 10]. 13 14 Now that we have defined the graph vertices and edge 5 weights we may resort to the data-agnostic clustering ap- 2 proaches, given in Part I - Section 4.3, to cluster the ver- 8 0 1 tices of this graph based on the graph topology. Fig. 4 3 shows the clustering result obtained based on the three smoothest eigenvectors, u1 , u2 , and u3 (excluding the 15 constant eigenvector, u0 ), of the graph Laplacian matrix, 7 L = D − W, of which the values are given in (7). No- 11 6 12 Figure 3: From a multi-sensor temperature measurement to a graph signal. The temperature field is represented on a graph that combines 10 the spatially unaware measurements in Fig. 1(b) and the physically 4 relevant graph topology in Fig. 2(b). The graph signal values are represented in two ways: (top) by vertical lines for which the length 9 14 is proportional to the signal values, and (bottom) by using a “hot” 5 13 colormap to designate the signal values at the vertices. 8 2 0 1 • Already physically well defined edges and weights, 3 • Definition of edges and weights based on the geom- 15 etry of vertex positions, 7 • Data similarity based methods for learning the un- derlying graph topology. All three approaches to define the edge weights are covered in detail in Part III of this monograph. Figure 4: Clustering of the graph from Fig. 2(b) based on the graph Since in our case of geographic temperature measure- Laplacian eigenvectors, u1 , u2 , and u3 . Observe the correct clus- tering of the graph into the clusters that belong to the seaside area ments, the graph weights do not belong to the class of (blue), low mountains (red), low land (yellow), and high mountains obvious and physically well defined edges and weights, we (green). will employ the “geometry of the vertices” based approach for the definition of the edges and weights. In this way, the tice that even such a simple graph clustering scheme was 6 x(5) 2 x(4) x(6) x(0) x(3) 3 1 x(1) x(2) x(7) 0 1 2 3 4 5 6 7 4 0 Figure 5: Directed path graph representation of a classical time- domain signal defined on an equidistant discrete-time grid. 5 7 capable of identifying different physically meaningful geo- 6 graphic regions. This also means that temperature estima- (a) tion can roughly be performed within each cluster, which may even be treated as an independent graph (see graph x(3) x(2) segmentation and graph cuts in Part I, Section 4), rather x(4) x(1) than over the whole sensor network. x(0) x(5) 2 The above-introduced graph data estimation frame- 3 1 work is quite general and admits application to many dif- x(6) ferent scenarios where, after identifying a suitable graph 4 x(7) 0 topology, we desire to perform estimation on data acquired on such graphs, the subject of this part of the monograph. 5 7 6 3. Signals and Systems on Graphs (b) In classical data analytics, a signal is sampled at suc- cessive, equally spaced, time instants. This then dictates Figure 6: Graph representation of periodic data. (a) A directed cir- cular graph. (b) A periodic signal measured on a circular graph. Sig- the ordering of signal samples, with x(n) being preceded nal values, x(n), are designated by vertical lines at the corresponding by x(n−1) and succeeded by x(n+1). The “time distance” vertex, n. between data samples is therefore an inherent parameter in standard data processing algorithms. The relation be- tween sampling instants can also be represented in a graph 2 form, whereby the vertices that correspond to the instants when the signal is sampled and the corresponding edges 3 1 define the linear sampling (vertex) ordering. The equally spaced nature of sampling instants in classical scenarios can then be represented with equal weights for all edges 4 0 (for example, normalized to 1), as shown in Fig. 5. Algorithms defined in discrete time (like, for example, those based on the DFT or other similar data transforms), 5 7 usually assume periodicity of the analyzed signals, which means that sample x(N − 1) is succeeded by sample x(0), 6 in a perpetual sequence. Notice that this case corresponds (a) to the circular graph, shown in Fig. 6, which allows us to use this model in many standard data transforms, such as the DFT, DCT, wavelets, and to define graph-counterparts 2 of other processing algorithms, based on these transforms. 3 1 A signal on general (including also circular) undirected graph is defined by associating real (or complex) data val- 4 0 ues, x(n), to each vertex, as shown in Fig. 7 and Fig. 8. Such signal values can be arranged in a vector form 5 7 6 x = [x(0), x(1), . . . , x(N − 1)]T , (b) so that a graph may be considered as a generalized signal Figure 7: Undirected circular graph (a) and signal on the graph (b). domain. Signal values, x(n), are presented as vertical lines at the correspond- This allows, in general, for any linear processing scheme ing vertex, n. for a graph signal observed at a vertex, n, to be defined 7 0 1 2 3 1 2 3 0 4 4 5 7 6 5 6 7 (a) (a) 2 3 1 0 1 2 4 0 3 4 5 7 6 7 5 6 (b) (b) Figure 9: Graph shift operator on a directed graph (classical circular shift). (a) Elements of a signal, x, shown as red lines on a directed Figure 8: Arbitrary undirected graph (a) and signal on graph (b). circular graph. (b) The shifted version, Ax, of the graph signal from Signal values, x(n), are presented as vertical lines at the correspond- (a). The adjacency matrix of for this graph is given in (14) in Part ing vertex, n. I. as a linear combination of the signal value, x(n), at this The original signal, x, is shown in Fig. 9(a), and its shifted vertex and the signal samples, x(m), at the neighboring version, x1 , in Fig. 9(b). Another simple signal on the undi- rected graph from Fig. 8 (a) is presented in Fig. 10(a), with vertices, that is its shifted version, x1 = Ax, shown in Fig. 10(b). X y(n) = x(n)h(n, n) + x(m)h(m, n), (12) A signal shifted by two graph shifts is obtained by fur- m∈Vn ther shifting x1 = Ax by one shift. The resulting, twice shifted, graph signal is then given by where Vn is the set of vertices in the neighborhood of ver- tex n, and h(m, n) the scaling coefficients. x2 = Ax1 = A(A x) = A2 x. Remark 2: The estimation form in (12) is highly vertex- dependent; it is vertex-invariant only in a very specific Therefore, in general, an m times shifted signal on case of regular graphs, where Vn is a K-neighborhood of graph is given by the vertex n, with h(n, m) = h(n − m). xm = Axm−1 = Am x. We now proceed to define various forms of vertex-invariant filtering functions, using shifts on a graph. These will then Remark 3: Like the standard shift operator, the second be used to introduce efficient graph signal processing meth- order shift of a graph signal is obtained by shifting the ods [11, 12, 13, 14, 15, 16, 17]. already once shifted signal. The role of the shift operator is assumed by the adjacency matrix, A. 3.1. Adjacency Matrix and Graph Signal Shift Consider a graph signal, x, for which x(n) is the ob- 3.2. Systems Based on Graph Shifted Signals served sample at a vertex n. A signal shift on a graph can Very much like in standard linear shift-based systems, be defined as movement of the signal sample, x(n), from a system on a graph can be implemented as a linear combi- its original vertex, n, along all walks of length one, that is nation of a graph signal, x, and its graph shifted versions, K = 1, that start at vertex n. If the signal shifted in this Am x, m = 1, 2, . . . , M − 1. The output signal from a way is denoted by x1 , then its values can be defined using system on a graph can then be written as the graph adjacency matrix, A, as M −1 x1 = Ax. (13) X y = h0 A0 x+h1 A1 x+· · ·+hM −1 AM −1 x = hm Am x m=0 Example 1: As an illustration of a graph signal and its shifted (14) version, consider the signal on a circular graph from Fig. 6(a). 8 0 1 0 2 1 3 4 2 6 7 5 (a) 3 4 6 7 5 0 1 2 input signal x 3 4 (a) 6 7 5 (b) Figure 10: Graph signal shift on an undirected graph. (a) A simple signal, x, on an undirected graph. (b) Shifted version, Ax, of the 0 graph signal from (a). 1 2 where A0 = I, by definition, and h0 , h1 , . . . , hM −1 are 3 4 the system coefficients. For a circular (classical linear sys- 7 tem) graph, this relation reduces to the well known Finite 6 5 Impulse Response (FIR) filter, given by, y(n) = h0 x(n)+h1 x(n−1)+· · ·+hM −1 x(n−M +1). (15) output signal y Keeping in mind that the matrix Am describes walks (b) of the length K = m in a graph (see Property M2 in Figure 11: Example of vertex domain signal filtering. (a) An arbi- Part 1), the output graph signal, y(n), is calculated as a trary graph signal. (b) The output signal obtained through a first- linear combination of the input graph signal values and the oder (averaging) system on a graph, defined as y = x + 0.5 Ax. signal values observed at vertices belonging to the (M −1)- neighborhood of the considered vertex n. also provided. Remark 4: When the minimal and characteristic poly- Example 2: Consider a signal on graph from Fig. 8(a), given nomial are of the same degree, a physically meaningful in Fig. 11(a), and a linear system which operates on this graph, system order (M − 1) should be lower than the number of defined by the coefficients h0 = 1, h1 = 0.5. Observe that this vertices N , that is is, M ≤ N . The corresponding con- system on a graph corresponds to a simple classical first-order dition in classical signal analysis would that the number, weighted moving average system. The output graph signal then M , of the system impulse response coefficients, hm , in (15) represents a weighted average of the signal value at a vertex n should be lower or equal to the total number of signal sam- and the signal values at its K = 1 neighborhood. The output ples, N (for the graph in Fig. 9 it means that the mean- graph signal is shown in Fig. 11(b). ingful graph signal shifts are m = 0, 1, 2, . . . , N − 1, since General system on graph. A system on a graph may the shift for m = N reduce to the shift for m = 0, the shift be defined in the vertex domain as for m = N + 1 is equivalent to the shift for m = 1, and y = H(A)x, (16) so on). Therefore, in general, the system order (M − 1) should be lower than the degree Nm of the minimal poly- where H(A) is a vertex domain system (filter) function. nomial of the adjacency matrix A. For more detail see A system on a graph is then linear and shift invariant if it Part I, Section 3.1. satisfies the following properties of: Remark 5: Any system of order M − 1 ≥ Nm can be 1. Linearity reduced to a system of order Nm − 1. H(A)(a1 x1 + a2 x2 ) = a1 y1 + a2 y2 . Remark 6: If the system order is greater than or equal to the degree of the minimal polynomial, M − 1 ≥ Nm , 2. Shift invariance then there exist more than one system producing the same output signal for a given input signal. All such systems on H(A)[Ax] = A[H(A)x] = Ay. a graph are called equivalent. Remark 7: A system on a graph defined by The statements in the last three remarks will be ad- dressed in more detail in Section 3.5.3, with their proofs H(A) = h0 A0 + h1 A1 + · · · + hM −1 AM −1 (17) 9 is linear and shift invariant since AAm = Am A. 3.4. System on a graph in the GDFT domain Consider a general system on a graph defined in (17), 3.3. Graph discrete Fourier transform (GDFT), adjacency matrix based definition y = H(A)x = h0 A0 + h1 A1 + · · · + hM −1 AM −1 x. Classical exploratory data analysis often employs esti- (22) mation of signals in the spectral (Fourier) domain; this has led to a number of simple and efficient algorithms. While Upon employing the spectral representation of the adja- standard spectral analysis employs an equidistant grid in cency matrix, A = UΛU−1 , we have both time and frequency, following the ideas of a system on a graph, we next show that spectral domain represen- y = h0 UΛ0 U−1 + h1 UΛ1 U−1 + · · · + hM −1 UΛM −1 U−1 x tations of graph signals are naturally based on spectral decompositions of the adjacency matrix or graph Lapla- = U(h0 Λ0 + h1 Λ1 + · · · + hM −1 ΛM −1 )U−1 x cian. = U H(Λ)U−1 x, (23) The graph Fourier transform of a signal, x, is defined as with the system on a graph transfer function X = U−1 x (18) H(Λ) = h0 Λ0 + h1 Λ1 + · · · + hM −1 ΛM −1 , (24) where X denotes a vector of the GDFT coefficients, and U is a matrix whose columns represent the eigenvectors where Λ is the matrix of eigenvalues of A. of the adjacency matrix, A. Denote the elements of the A pre-multiplication of this relation with U−1 , yields vector X by X(k), for k = 0, 1, . . . , N − 1, and recall that for undirected graphs, the adjacency matrix is symmetric, U−1 y = H(Λ)U−1 x (25) that is, AT = A, and that the eigenmatrices of a symmet- ric matrix satisfy the property From (18), the terms U−1 y and U−1 x are respectively the GDFTs of the output graph signal, y, and the input U−1 = UT . graph signal, x, so that the spectral domain system on a graph relation becomes The element, X(k), of the graph Fourier transform vec- tor, X, therefore represents a projection of the considered Y = H(Λ) X, (26) graph signal, x(n), onto the k-th eigenvector of A (a basis function), given by The output graph signal in the vertex domain can then be calculated as N X −1 X(k) = x(n)uk (n). (19) y = H(A)x = IGDFT{H(Λ) X}. (27) n=0 The element-wise form of the system on a graph in (26) is In this way, the graph discrete Fourier transform can be of the form interpreted as a set of projections (signal decomposition) −1 onto the set of eigenvectors, u0 , u1 , . . . , uN −1 , which serve Y (k) = (h0 + h1 λk + · · · + hM −1 λM k )X(k), as orthonormal basis functions. The inverse graph discrete Fourier transform is then where λk denotes the kth eigenvalue of the adjacency ma- straightforwardly obtained from (18) as trix, A. From (24) and the above equation, we can now define the transfer function of a system on a graph in the x = U X, (20) form or element-wise Y (k) −1 H(λk ) = = h0 + h1 λk + · · · + hM −1 λM k . (28) X(k) N X −1 x(n) = X(k)uk (n). (21) Remark 8: The classical linear system in (15) can be ob- k=0 tained directly from its graph counterpart in (28) when Observe that, for example, for a circular graph from the graph is directed and circular. This is because the Fig. 6, the graph discrete Fourier transform pair in (19) adjacency matrix of a directed circular graph has eigen- and (21) reduces to the standard discrete Fourier trans- values λk = e−j2πk/N (see Part I, Section 3.2.1 for more form (DFT) pair. For this reason, the transform in (19) detail on directed circular graphs), which are identical to and its inverse in (21) are referred to as the graph discrete the samples on the unit circle in classical DFT. Fourier transform (GDFT) and the inverse graph discrete Similar to the z-transform in classical signal processing, Fourier transform (IGDFT). for systems on graphs we can also introduce the system transfer function in the z-domain . 10 The z-domain transfer function of a system on a graph with the normalized adjacency matrix in (31) in the same is defined as way as with the original adjacency matrix. An important property which does not apply to standard adjacency ma- H(z −1 ) = Z{hn } = h0 + h1 z −1 + · · · + hM −1 z −(M −1) , trices is that the normalization of adjacency matrix yields (29) a simpler eigenvector and eigenvalue ordering scheme, as for n = 0, 1, . . . , M − 1. Obviously, from (28), we have shown next. H(λk ) = H(z −1 )|z−1 =λk . 3.5.2. Spectral Ordering of Eigenvectors of the Adjacency However, the definition of the z-transform for arbitrary Matrix graph signals, x(n) and y(n), that would satisfy the re- For physically meaningful low-pass and high-pass filter- lation Y (z −1 ) = H(z −1 )X(z −1 ) is not straightforward, ing on a graph, we need to establish the notion of spectral which limits the application of the z-transform on graphs. order. This, in turn, requires a criterion to classify the This will be discussed in more detail in Section 3.10. eigenvectors (corresponding to the GDFT basis functions) into the slow-varying and fast-varying ones. 3.5. Graph Signal Filtering in the Spectral Domain of the Remark 10: In classical Fourier analysis, the basis func- Adjacency Matrix tions are ordered according to their frequency, whereby, for The energy of a graph shifted signal is given by example, low-pass (slow varying) basis functions are har- monic functions characterized by low frequencies. On the 2 2 kx1 k2 = kAxk2 . other hand, the notion of frequency of the eigenvectors of the graph adjacency matrix, which serve as a basis for for However, as shown in Fig. 10, in general, the energy of a signal decomposition, is not defined and we have to find shifted signal is not the same as the energy of the original another criterion to classify or rank order the eigenvectors. signal, that is Again, we draw the inspiration from classical Fourier anal- 2 2 kAxk2 6= kxk2 . ysis which suggests that the energy of the “signal change” On the other hand, in graph signal processing it is often can be used instead of frequency to indicate the rate of desirable that a graph shift does not increase signal energy. change of an eigenvector along time. One such graph shift operator is introduced bellow. Energy of signal change. The first graph difference can Remark 9: Using the matrix two-norm it is straightfor- be defined for graph signals as a difference of the original ward to show that the ratio of energies of the graph shifted graph signal and its graph shift, that is, signal, Ax, and the original graph signal, x, satisfies the ∆x = x − x1 = x − Anorm x. relation kAxk2 2 xT AT Ax In analogy to classical analysis, the energy of signal change max{ 2 } = max{ 2 } = λ2max . (30) can then be defined as the energy of the first difference of kxk2 kxk2 a graph signal x, and takes the form where λmax = maxk |λk |, k = 0, 1, . . . , N − 1. 2 2 1 E∆x = kx − Anorm xk2 = x − Ax . λmax 2 3.5.1. Normalization of the Adjacency Matrix From (30), for the energy of a graph shifted signal, When the graph signal assumes a specific form of an 2 kAxk2 , not to exceed the energy of the original graph sig- eigenvector, x = u, of the adjacency matrix, A, the energy 2 of this eigenvector change is equal to nal, kxk2 , we may employ the normalized adjacency ma- trix, defined as 2 2 1 1 = 1 − λ , Anorm = A (31) E∆u = u − λu (32) λmax λmax 2 λmax as a graph shift operator within any system on a graph. whereby the normalized adjacency matrix, Anorm , is used While this kind of normalization still does not make the to bound the energy of the shifted graph signal. In the shift on a graph isometric, the energy of the signal shifted 2 derivation we have also used Au = λu and kuk2 = 1. in this way is guaranteed not to be bigger than the energy Now, the lower values of E∆u indicate that u is slow- of the original graph signal, since varying, E∆u = 0 indicates that the signal is constant, 2 kAnorm xk2 ≤ kxk2 . 2 while larger values of E∆u are associated with fast changes of u in time. The form in (32) is also referred to as the The equality holds only for a very specific signal which is two-norm total variation of a basis function/eigenvector. proportional to the eigenvector that corresponds to λmax . Therefore, if the change in a basis function, u, has a large The basic shift on a graph, system on a graph, and energy, then the eigenvector, u, can be considered to be- graph spectral domain representations can be implemented long to the higher spectral content of the graph signal. 11 Remark 11: From (32), the energy of the rate of change of a graph signal is minimal for λ = λmax and it increases 0 as λ decreases (see Fig. 10 in Part 1). 1 Now that we have established a criterion for the or- 2 dering of eigenvectors, based on the corresponding eigen- 3 4 values, we shall proceed to define an ideal low-pass filter 6 7 5 on a graph. The intuition behind low-pass filtering in the graph domain is that such a filter should pass unchanged all signal components (eigenvectors of A) for which the rates of change are slower than that defined by the cut- off eigenvalue, λc (cf. cut-off frequency), while all signal components (eigenvectors) which exhibit variations which are faster than that defined by the cut-off eigenvalue, λc , 0 1 2 3 4 5 6 7 should be suppressed. The ideal low-pass filter in the graph domain is therefore defined as (a) original signal, x = 3.2u7 + 2u6 ( 1, for λ > λc , 0 f (λ) = 0, for other λ. 1 2 Example 3: Consider again the undirected graph from Fig. 8(a) on which we observe a graph signal shown in Fig. 12(a), 3 4 which is constructed as a linear combination of two of the 6 7 5 eigenvectors of the adjacency matrix of this graph to give x = 3.2u7 + 2u6 (eigenvectors of the adjacency matrix of the con- sidered graph are presented in Part I, Fig. 10). The signal is corrupted by additive white Gaussian noise, ε, at the signal- to-noise (SNR) ratio of SN Rin = 2.7dB and the noisy graph signal, xε = x + ε, is shown in Fig. 12(b). This noisy signal is next filtered using an ideal spectral domain graph filter with a cut-off eigenvalue of λc = 1. The output signal, xf , is shown in Fig. 12(c). With SN Rout = 18.8dB, an increase in signal 0 2 6 quality of 16.1dB is achieved with this type of filtering. 1 3 4 5 7 Remark 12: The energy of the rate of change of an eigen- vector is consistent with the classical DFT based filtering when λk = exp(−j2πk/N ) and λmax = 1. (b) noisy signal, xε = x + ε 3.5.3. Spectral Domain Filter Design 0 We shall denote by G(Λ) the desired graph transfer 1 function of a system defined on a graph. Then, a system with this transfer function can be implemented either in 2 the spectral domain or in the vertex domain. 3 4 In the spectral domain, the implementation is straight- 7 forward and can be performed in the following three steps: 6 5 1. Calculate the GDFT of the input graph signal, X = U−1 x, 2. Multiply the GDFT of the input graph signal by the graph transfer function, G(Λ), to obtain the output 0 1 2 3 4 5 6 7 spectral form, Y = G(Λ)X, and 3. Calculate the output graph signal as the inverse GDFT (c) filtered signal of Y in Step 2, that is, y = UY. Figure 12: A low-pass graph signal filtering example. (a) Original This procedure may be computationally very demand- signal, x = 3.2u7 + 2u6 . (b) Noisy signal, xε = x + ε, at an SN R = 2.7dB. (c) Filtered signal, at an SN R = 18.8dB. . Ideal low-pass ing for large graphs, where it may be more convenient to filtering based on the two highest eigenvalues in the pass-band was implement the desired filter (or its close approximation) applied. directly in the vertex domain. 12 For the implementation in the vertex domain, the task 2. If some of the eigenvalues are of a degree higher than is to find the coefficients (cf. standard impulse response) one (minimal polynomial order, Nm , is lower than h0 , h1 , . . . , hM −1 in (14), such that their spectral repre- the number of vertices, N ) the system in (33) reduces sentation, H(Λ), is equal (or approximately equal) to the to a system of Nm linear equations (by removing desired G(Λ). This is performed in the following way. multiple equations which correspond to the repeated The transfer function of the vertex domain system is given eigenvalues λ). by (28) as H(λk ) = h0 + h1 λ1k + · · · + hM −1 λkM −1 and should be equal to the desired transfer function, G(λk ), for k = 0, 1, . . . , N − 1. This condition leads to a system (a) If the filter order, M , is such that Nm < M ≤ of linear equations N , the system in (33) is underdetermined. In that case (M − Nm ) filter coefficients are free −1 variables and the system has an infinite num- h0 + h1 λ10 + · · · hM −1 λM 0 = G(λ0 ) −1 ber of solutions, while all so obtained filters are h0 + h1 λ11 + · · · + hM −1 λM 1 = G(λ1 ) equivalent. .. . −1 (b) If the filter order is such that M = Nm , the h0 + h1 λ1N −1 + · · · + hM −1 λM N −1 = G(λN −1 ). (33) solution to the system in (33) is unique. The matrix form of this system is then (c) If the filter order is such that M < Nm , the sys- Vλ h = g, (34) tem in (33) is overdetermined and the solution is obtained in the least squares sense. where Vλ is the Vandermonde matrix form of the eigen- values λk , given by −1  3. Any filter of an order M > Nm has a unique equiva- λ10 ··· λM  1 0 −1  lent filter of order Nm . This equivalent filter can be 1 λ11 ··· λM 1 Vλ =  . (35) obtained by setting the free variables to zero, that .. ..     .. .. . . .  is, hi = 0 for i = Nm , Nm + 1, . . . , N − 1. −1 1 λ1N −1 ··· λM N −1 Finding the system coefficients and Exact solution: For M = N = Nm , that is, when the h = [h0 , h1 , . . . , hM −1 ]T (36) filter order is equal to the number of vertices and the order is the vector of system coefficients which need to be calcu- of minimal polynomial, the solution to the system in (33) lated to obtain the desired or (34) is unique and is obtained from g = [G(λ0 ), G(λ1 ), . . . , G(λN −1 )]T = diag(G(Λ)). (37) h = Vλ−1 g. Least-squares solution. For the overdetermined case, Comments on the solution in (33): when M < Nm , the mean-square approximation of h = [h0 , h1 , . . . , hM −1 ]T in Vλ h = g is obtained by minimizing 1. Consider the case with N vertices and with all dis- the squared error tinct eigenvalues of the adjacency matrix (in other words, the minimal polynomial is equal to the char- 2 e = kVλ h − gk2 . acteristic polynomial, Pmin (λ) = P (λ)). From ∂e/∂hT = 0 we then have (a) If the filter order, M , is such that M = N , then ĥ = (VλT Vλ )−1 VλT g = pinv(Vλ )g. the solution to (33) is unique, since the deter- minant of the Vandermonde matrix is always Since M < Nm , the obtained solution, ĥ, is the least- nonzero. squares approximation for Vλ h = g. Given that this so- lution may not satisfy Vλ h = g, the designed coefficient (b) If the filter order, M , is such that M < N , then vector, ĝ (its spectrum Ĝ(Λ)), obey the system in (33) is overdetermined. There- Vλ ĥ = ĝ fore, the solution to (33) can only be obtained in the least squares sense (as described later in which, in general, differs from the desired system coeffi- this section). cients, g (their spectrum G(Λ)). 13 1.5 1.5 For this the reason, the approximation of the desired 1 1 transfer function, G(λ), may be performed using the trun- cated Chebyshev polynomial 0.5 0.5 M −1 c0 X 0 0 PM −1 (z) = + cm Tm (z), (38) 2 m=1 -1 1 3 -1 1 3 where Tm (z) are the Chebyshev polynomials defined as 1.5 1.5 T0 (z) = 1, 1 1 T1 (z) = z, 0.5 0.5 T2 (z) = 2z 2 − 1, 0 0 T3 (z) = 4z 3 − 3z, .. -1 1 3 -1 1 3 . Tm (z) = 2zTm−1 (z) − Tm−2 (z), (39) Figure 13: Design of a graph filter with a specified transfer func- tion in the spectral domain (cf. standard frequency response). The desired spectral response, G(λk ), is denoted by blue circles. Red with the variable λ being centered and normalized as asterisks designate the spectral response of the filter designed in Ex- ample 4, denoted by Ĝ(λk ), obtained with M filter coefficients, h0 , 2λ − (λmax + λmin ) z= , (40) h1 , . . . , hM −1 , in the vertex domain. λmax − λmin such that −1 ≤ z ≤ 1 (required by the Chebyshev poly- Example 4: Consider the unweighted graph from Fig. 8(a) nomial definition). The inverse mapping, from z to λ, is and the task of the synthesis of a desired filter for which the given by frequency response is described by g = [0, 0, 0, 0, 0, 0.5, 1, 1]T . 1 λ= z(λmax − λmin ) + λmax + λmin . 2 This filter was designed for various filter orders M = 1, 2, 4, 6, using (33) and the results are shown in Fig. 13. For clarity, Since the √Chebyshev polynomials are orthogonal, with analytically, the vertex domain realization of the filter with measure dz/ 1 − z 2 , the Chebyshev coefficients, cm , for M = 4 is given by an expansion of the desired function, G(z), into the poly- y = 0.1734A0 x + 0.3532A1 x + 0.0800A2 x − 0.0336A3 x, nomial series, PM −1 (z), are easily obtained as however, the exact frequency response ĝ = g is only obtained 2 1 dz Z with M = N = 8. cm = G(z)Tm (z) √ π −1 1 − z2 Z π 3.5.4. Polynomial (Chebyshev) Approximation of the Sys- 2 = cos(mθ)G(cos(θ))dθ. tem on a Graph Transfer Function π 0 Without loss of generality, it can be considered that the Example 5: Consider the unweighted graph from Fig. 8(a) desired transfer function, g = [G(λ0 ), G(λ1 ), . . . , G(λN −1 )]T , with the desired transfer function consists of samples taken from a continuous function of λ within the interval λmin ≤ λ ≤ λmax , where λmin and λmax 1 + sign(λ − λ5 ) G(λ) = . denote the minimum and maximum value of {λ0 , λ1 , . . . , 2 λN −1 }, respectively. The variable λ of the desired trans- The samples of G(λ) at the discrete points fer function, G(λ), is continuous, and the system on graph uses only the values at discrete points λ ∈ {λ0 , λ1 , . . . , λN −1 }. λk ∈ {−2, −1.74, −1.28, −0.68, −0.41, 1.11, 1.81, 3.19}, Therefore, for a polynomial approximation, P (λ), of the correspond to the values of G(λk ) in Example 4, Fig. 13. Since desired transfer function, G(λ), it is important that the the minimum and maximum eigenvalues are λmin = −2 and error at the points within the considered interval, λmin ≤ λmax = 3.19, this yields the desired transfer function with a λ ≤ λmax , is bounded and sufficiently small. variable z within a normalized interval, −1 ≤ z ≤ 1, This problem is known in algebra as the min-max ap- 1 + sign(z − z5 ) proximation, and its goal is to find an approximating poly- G(z) = , 2 nomial that has the smallest maximum absolute error from the desired function value. The min-max polynomials can where z5 is defined by (40) as be approximated by the truncated Chebyshev polynomi- 2λ5 − (λ7 + λ0 ) als, P (λ), which yield approximations of the desired func- z5 = = 0.2. λ7 − λ0 tion having almost min-max behavior. 14 Calculation complexity. If the number of nonzero ele- 1 1 ments in the adjacency matrix, A, is NA , then the num- ber of arithmetic operations (additions) to calculate Ax is of NA order. The same number of operations is re- quired to calculate A2 x = A(Ax) using the available 0 0 Ax. This means that the total number arithmetic op- -1 1 3 -1 1 3 erations (additions) to calculate all Ax, A2 x,...,AM −1 x is of order M NA . Adding these terms requires additional M NA arithmetic operations (additions), while the calcu- lation of all terms of the form cm Am x requires an order of 1 1 M NA multiplications by constants cm , m = 0, 1, . . . , M − 1. Therefore, to calculate the output graph signal, y = P̄M −1 (A)x, an order of 2M NA additions and M NA mul- 0 0 tiplications is needed. Notice that the eigenanalysis of the adjacency matrix, A, requires an order of N 3 arithmetic -1 1 3 -1 1 3 operations. For large graphs, the advantage in calculation Figure 14: Design of a graph filter with a specified transfer function complexity of the vertex domain realization with the poly- in the spectral domain using the Chebyshev polynomial approxima- nomial transfer function approximation, y = P̄M −1 (A)x, tion of order (M − 1) with M terms, T0 (z), T1 (z), . . . , TM −1 (z). The is obvious. desired spectral response, G(λ), is denoted by blue dashed line and blue dots. Red lines designate the spectral response of the designed As is common place in standard filter design theory, Chebyshev approximation. the transition intervals of the approximated transfer func- tion, G(λ), can be appropriately smoothed, to improve the approximation. 0 In general, the mapping in (40) from λ to z can be 1 written as z = aλ + b, where a = 2/(λmax − λmin ) and 2 b = −(λmax + λmin )/(λmax − λmin ). The Chebyshev poly- 4 nomials series in λ is then of the form 3 6 7 5 c0 M X −1 P̄M −1 (λ) = + cm T̄m (λ), (41) 2 m=1 with T̄0 (λ) = 1, T̄1 (λ) = aλ + b, and T̄m (λ) = 2(aλ + b)T̄m−1 (λ) − T̄m−2 (λ), 0 1 2 3 4 5 6 7 for m ≥ 2. The same relations hold for Figure 15: Vertex-domain filtering result for the noisy signal from M −1 Fig. 12, using the Chebyshev approximation of the desired transfer c0 X P̄M −1 (A) = + cm T̄m (A), (42) function from Fig. 14 with M = 4. 2 m=1 The Chebyshev series for (M − 1) = 3 is given by This change of variables admits recursive calculation, as in (39). PM −1 (z) = 0.43 + 0.62T1 (z) + 0.12T2 (z) − 0.18T3 (z) = 0.31 + 1.16z + 0.24z 2 − 0.72z 3 . 3.5.5. Inverse System on a Graph A system on a graph, H(Λ), which represents an in- Upon the change of variables, z → λ, we obtain the form verse of the system on a graph, given by G(Λ), can be P̄M −1 (λ) = 0.07 + 0.36λ + 0.11λ2 − 0.04λ3 . obtained from their generic relationship Graph signal filtering can now be performed in the vertex H(Λ)G(Λ)X = X. domain using y = P̄M −1 (A)x, According to (37), this in turn means that if all G(λk ) 6= 0 where and P (λ) = Pmin (λ), then H(λk ) = 1/G(λk ) for each k. P̄M −1 (A) = 0.07 + 0.36A + 0.11A2 − 0.04A3 . 3.6. Graph Fourier Transform Based on the Laplacian The result of the vertex domain filtering using P̄M −1 (A) is shown in Fig. 15 for the noisy signal from Fig. 12, with the Similar to the graph graph discrete Fourier transform SNR improvement of 16.76 dB. based on the adjacency matrix, spectral representation of 15 a graph signal can be alternatively based on eigenvalue or decomposition of the graph Laplacian, given by uT Lu = λuT u = λ = Eu . L = UΛU−1 This means that the quadratic form of an eigenvector, uk , is equal to its corresponding eigenvalue. This is elaborated or LU = UΛ. in detail in Section 4.2.1 in Part I, where we have shown Although the analysis can be conducted in a unified way that for spectral decompositions based on both the adjacency N −1 N −1 matrix and the graph Laplacian, due to their different be- 1 X X 2 uTk Luk = λk = Wnm uk (n) − uk (m) ≥ 0. havior and scope of application, these will be considered 2 n=0 m=0 separately. (45) The graph Fourier transform of a signal, x, which em- ploys the graph Laplacian eigenvalue decomposition to de- Obviously, a small uTk Luk = λk implies a small variation of fine its basis functions, is given by Wnm (uk (n) − uk (m))2 in the eigenvector uk , and for each vertex n. Consequently, the eigenvectors corresponding X = U−1 x, (43) to small λk correspond to the low-pass part of a graph signal. In other words, the smaller the smoothness index where the matrix U comprises in its columns the eigen- (curvature), uTk Luk = λk , the smoother the eigenvector, vectors of the graph Laplacian. The inverse graph Fourier uk . transform then follows immediately in the form An ideal low-pass filter in the Laplacian spectrum do- x = U X. (44) main, with a cut-off eigenvalue λc , can be therefore defined as ( In the case of undirected circular unweighted graph, 1, for λ < λc f (λ) = such as the graph in Fig. 7(a), this Laplacian based spec- 0, for other λ. tral analysis reduces to the standard Fourier transform, but with real-valued basis functions (eigenvectors), as shown Example 6: Consider a signal on the undirected graph from in Part I, equation (32). Fig. 8(a), shown in Fig. 16(a). This graph signal is generated as a linear combination of two Laplacian eigenvectors (which 3.7. Ordering and Filtering in the Laplacian Spectral Do- correspond to the slow-varying signal part), to give x = 2u0 + 1.5u1 . The Laplacian eigenvectors of the considered graph are main presented in Part I, Fig. 13, while the considered graph signal is As shown in Section 3.5.2, the graph shift and the ad- shown in Fig. 16(a). The original signal, x, was then corrupted jacency matrix are related to the first finite difference of by white Gaussian noise at the signal-to-noise ratio of SN Rin = eigenvectors in the vertex domain, while the rate of the −1.76 dB, and shown in Fig. 16(b). Next, this noisy graph eigenvector change is related to its corresponding eigen- signal was filtered using an ideal spectral domain graph filter, value (cf. standard frequency). A similar approach can be with a cut-off eigenvalue λc = 2, to obtain the filtered signal, used for the Laplacian based eigendecomposition. We have xf , shown in Fig. 16(c). The so achieved output SNR was seen that for standard time domain signals, the Laplacian SN Rout = 21.29 dB, that is, despite its simplicity, the graph of a circle graph represents the second order finite differ- filter achieved a gain in SNR of 23.05 dB, as compared to the noisy signal in Fig. 16(b). ence, y(n), of a signal u(n), that is To further illustrate the principle of graph filtering, the y(n) = −u(n − 1) + 2u(n) − u(n + 1), noisy signal from Fig. 3 was filtered using a filter with the spectral cut-off at λc = 0.25 and the result is shown in Fig. 17. as shown in Section 3.3.2 in Part I. A compact expression The same signal was also filtered using a polynomial approxi- for all elements of the Laplacian can then be written in a mation to the low-pass system, as illustrated in Fig. 18. matrix form as y = Lu. It is now obvious that the eigen- Laplacian versus adjacency-based GDFT for reg- vectors, u, which exhibit small variations should also have ular graphs. A direct relation between the adjacency- a small cumulative energy of the second order difference based and Laplacian-based spectral decomposition can be X h 2 2 i established for J -regular unweighted graphs (see (13) in Eu = u(n) − u(n − 1) + u(n) − u(n + 1) /2. Part I), for which n L = JI−A Recall that this expression corresponds to the quadratic to yield (A) (L) form of the eigenvector, u, defined by Eu = uT Lu. λk = J − λk , The above reasoning for the Laplacian quadratic form where the eigenvalues of the adjacency matrix and the can also be used for graph signals. As a default case for the (A) (L) graph Laplacian are respectively denoted by λk and λk , Laplacian analysis we will consider undirected weighted while they share the same eigenvectors. graphs, where by definition Remark 13: Rank-ordering of the eigenvectors, uk , from Lu = λu, uT u = 1 low-pass to high-pass, which is based on the respective 16 11 0 6 1 12 2 10 3 4 4 6 7 5 9 14 5 13 8 2 0 1 3 7 15 0 1 2 3 4 5 6 7 (a) original signal Figure 17: Denoising results for the noisy signal from Fig. 3, which 0 was filtered using a low-pass graph filter with λc = 0.25. 1 2 (A) (L) eigenvalues, λk and λk , yields exactly opposite ordering 3 4 for these two graph spectral decompositions. For exam- 6 7 5 (L) ple, the smoothest eigenvector is obtained for mink λk = (L) (A) (L) λ0 = 0 or for maxk λk = λmax = J − λ0 = J . 3.8. Systems on a Graph Defined Using the Graph Lapla- cian Following on the discussion in Section 3.2 and equa- tion (14), a system on a graph, defined using the graph Laplacian, has the form 1 7 0 2 3 4 5 6 y = h0 L0 x + h1 L1 x + · · · + hM −1 LM −1 x M X −1 (b) noisy signal = hm Lm x. (46) m=0 0 For an unweighted graph, this definition of a system 1 on a graph can be related to the corresponding adjacency matrix form as L = D − A. 2 The spectral domain description of a system on a 3 4 graph is then obtained through the Laplacian eigenvalue 6 7 5 decomposition, to yield M X −1 y = UY = hm Lm x = H(L)x m=0 = UH(Λ)UT x = UH(Λ)X, (47) 7 where we used the property of the eigendecomposition of 0 1 2 3 4 5 6 matrix polynomial, (c) filtered signal M X −1 M X −1 H(L) = hm Lm = hm UΛm UT = UH(Λ)UT Figure 16: Graph signal filtering example. (a) Original signal. (b) m=0 m=0 Noisy signal. (c) Filtered signal. Low pass filtering was performed (48) based on the two lowest eigenvalues of the graph Laplacian. 17 described in Section 3.2.3 in Part I, and the notation For illustration, consider the delta function located at a graph vertex m, given by M X −1 hm Λm ( H(Λ) = (49) 1, for m = n m=0 δm (n) = (53) 0, for m 6= n to obtain Y = H(Λ)X with the corresponding GDFT N −1 or in an element-wise form X ∆(k) = δm (n)uk (n) = uk (m), (54) Y (k) = H(λk )X(k), k = 0, 1, . . . , N − 1. n=0 which is defined based on graph Laplacian eigenvectors. Observe that, similar to the standard time domain, any 11 graph signal can be written as a sum of delta functions at 6 the graph vertices, that is 12 N −1 10 X x(n) = x(i)δn (i) 4 i=0 9 13 14 or in a vector form 5 N X −1 8 2 0 x= x(i)δ i , 1 i=0 3 where δ i is a vector with elements δ(n − i), as in (53). 15 Then, the system output, y, takes the form 7 M X −1 y= hm Lm x = UH(Λ)UT x m=0 N X −1 = x(i)UH(Λ)UT δ i Figure 18: Graph filtering of a noisy signal from Fig. 3, using a i=0 fourth-order system given by y = h0 L0 x + h1 L1 x + h2 L2 + h3 L3 + h4 L4 . and its elements are obtained as N −1 N −1 N −1 In the vertex domain, the n-th element of the output X X X signal, y = UH(Λ)UT x, of a system on a graph is given y(n) = x(i) uk (n)H(λk )uk (i) = x(i)hn (i), by i=0 k=0 i=0 N −1 N −1 N −1 where hn (i) are related to H(λk ) as in (52). X X X y(n) = x(i)uk (i)H(λk )uk (n) = x(i)hn (i), Remark 15: According to (47), the form of the graph con- k=0 i=0 i=0 volution operator for a vertex n, given in (50), is localized (50) within the (M − 1)-neighborhood of vertex n. This local- for which the transfer function is defined by ization property is even more important for large graphs. −1 A generalized convolution for two arbitrary graph sig- H(λk ) = h0 + h1 λk + · · · + hM −1 λM k (51) nals will be addressed next. and the graph impulse response is 3.9. Convolution of Signals on a Graph N X −1 Consider two graph signals, x(n) and h(n). A general- hn (i) = H(λk )uk (n)uk (i) = Tn {h(i)}. (52) ized convolution operator for these two signals on a graph k=0 is defined using their graph Laplacian spectra [18], based on the assumption that the spectrum of a convolution on Remark 14: The expression for y(n) in (50) can be in- a graph terpreted as a generalized convolution on graphs, which is y(n) = x(n) ∗ h(n) performed using a generalized graph shift of the impulse response, hn (i), in the vertex domain. is equal to the product of the corresponding spectra of graph signals, x(n) and h(n), that is We next proceed to describe the generalized convolu- tion on graphs through responses to the unit delta pulses. Y (k) = X(k)H(k), (55) 18 in the element-wise form. The output of the generalized graph convolution operation, x(n) ∗ h(n), is then equal to the inverse GDFT of the spectral product Y (k) in (55), that is 0 1 2 3 4 5 6 7 y(n) = x(n) ∗ h(n) N −1 N −1 X X 1 = Y (k)uk (n) = X(k)H(k)uk (n), 0 2 k=0 k=0 3 4 0 1 2 3 4 5 6 7 5 where 7 N X −1 6 H(k) = h(n)uk (n). (56) 1 n=0 0 Notice the difference between the definition of H(k) in (56) 3 2 4 and H(λk ) in (51). Both these forms will be discussed in 0 1 2 3 4 5 6 7 5 more detail in the next section. 6 7 Shift on a graph – an alternative definition. The above framework of generalized graph convolution can also 0 1 serve as a basis for a convenient definition of a shift on a 2 4 graph. Consider the graph signal, h(n), and the delta 3 5 function located at a vertex m. Here, we will use hm (n) to 0 1 2 3 4 5 6 7 7 denote the shifted version of the graph signal, h(n), “to- 6 ward” a vertex m. This kind of shifted signal will be de- 1 fined following the reasoning in classical signal processing 0 where the shifted signal is obtained as a convolution of the 3 2 4 original signal and an appropriately shifted delta function. 0 1 2 3 4 5 6 7 5 Therefore, a graph shifted signal is here defined through a 6 7 generalized graph convolution, h(n) ∗ δm (n), whose GDFT is equal to H(k)uk (m), according to (54) and (55). The 1 0 graph shifted signal is then the IGDFT of H(k)uk (m), that 2 4 is 3 0 1 2 3 4 5 6 7 5 N −1 7 X 6 hm (n) = h(n) ∗ δm (n) = H(k)uk (m)uk (n). (57) k=0 1 0 The same relation follows when calculating the inverse 2 4 3 GDFT of X(k)H(k), to yield 0 1 2 3 4 5 6 7 5 7 6 N X −1 y(n) = X(k)H(k)uk (n) 1 k=0 0 N −1 N −1 2 4 X X 3 = x(m)uk (m)H(k)uk (n) 0 1 2 3 4 5 6 7 5 k=0 m=0 6 7 N X −1 = x(m)hm (n) = x(n) ∗ h(n), (58) 1 0 m=0 2 4 3 where 0 1 2 3 4 5 6 7 5 7 N −1 6 X hm (n) = H(k)uk (m)uk (n) = Tm {h(n)} (59) k=0 Figure 19: An example of graph shift operator. Top: The graph signal defined by its Laplacian GDFT, given by H(k) = exp(−2λk τ ). is another version of graph shifted signal. Since the defini- Left and right column: The graph signals hm (n) ”shifted” for m = 0 tion of H(k) as a GDFT of a signal h(n) differs from that to m = 7, calculated using hm (n) = Tm {h(n)} in (59). The shifted in (51), these produce different shift operations, which are signal is shown both on the vertex index line (left) and on the graph itself (right). respectively denoted by Tm {h(n)} and Tm {h(n)}. 19 Remark 16: Note that neither of the two shift operations, 2 (52) or (59), satisfy the property that a shift by 0 is equal z -1 complex plane to the original signal, h0 (n) 6= h(n). 1.5 Example 7: Consider a signal on graph from Fig. 8(a), which 1 4 6 is defined by its graph Laplacian GDFT, given by 0.5 2 H(k) = exp(−2λk τ ), 1 0 0 3 with τ = 0.1573. All shifted signals, hm (n) = Tm {h(n)}, ob- -0.5 7 tained using the shift operator in (59), are shown in Fig. 19. 5 -1 3.10. The z-transform of a Signal on a Graph -1.5 The relation between the graph signal shift operators, Tm {h(n)} and Tm {h(n)}, which are respectively used used -2 to define the generalized convolutions in (51) and (58), -2 -1 0 1 2 can be established based on the definitions of H(λk ) and H(k). Consider H(λk ), defined by (51), as a graph discrete Figure 20: Complex eigenvalues of the adjacency matrix of a directed Fourier transform of signal h(n). The samples of the graph graph in Fig. 1(b) in Part I. signal h(n) are then equal to the IGDFT of H(λk ), that is N −1 The z-transform of graph signals. For a given graph signal x = [x(0), x(1), . . . , x(N −1)]T , following the reason- X h(n) = H(λk )uk (n) k=0 ing as in (60), the coefficients of a system [x0 , x1 , . . . , xN −1 ]T which corresponds to a system transfer function that would while the system coefficients hn , n = 0, 1, . . . , M − 1, are have the same GDFT as the graph signal itself are related to H(λk ) by (51), that is [x0 , x1 , . . . , xN −1 ]T = Vλ−1 UT [x(0), x(1), . . . , x(N − 1)]T −1 H(λk ) = h0 + h1 λk + · · · + hM −1 λM k . or For M = N , the vector forms of the last two relations are [x0 , x1 , . . . , xN −1 ]T = Vλ−1 [X(0), X(1), . . . , X(N − 1)]T . T [h(0), h(1), . . . , h(N − 1)] = UH(Λ) The graph z-transform of a signal x is therefore equal to H(Λ) = Vλ [h0 , h1 , . . . , hN −1 ]T the classic z-transform of coefficients [x0 , x1 , . . . , xN −1 ]T , so that the signal, h(n), and the coefficients, hn , can be X(z −1 ) = Z{xn } = x0 +x1 z −1 +· · ·+xN −1 z −(N −1) (61) related as so that the following holds [h0 , h1 , . . . , hN −1 ]T = Vλ−1 UT [h(0), h(1), . . . , h(N −1)]T . (60) Y (z −1 ) = H(z −1 )X(z −1 ) Remark 17: In classical DFT (the case of a directed cir- The output signal, y(n), can now be obtained as cular graph and its adjacency matrix, when UH should be used instead of UT ), the signal samples, h(n), which are [y(0), y(1), . . . , y(N − 1)]T = UVλ [y0 , y1 , . . . , yN −1 ]T , obtained as the inverse DFT of H(λk ) and the system coef- ficients, hn , are the same, since the eigenvalues are equal to where the output graph signal, y(n), results from the in- the corresponding shift operators in the spectral domain, verse z-transform of the coefficients, yn , that is √ λk = √ exp(−j2πk/N ) and u√ k (n) = exp(j2πnk/N )/ N = Y (z −1 ) = Z{yn } = y0 + y1 z −1 + · · · + yN −1 z −(N −1) . λ−n k / N , with hn = h(n)/ N and The z-transform representation in the complex valued N −1 1 X z-domain may be of interest when the eigenvalues are complex- H(k) = √ h(n)e−j2πnk/N . N n=0 valued, which occurs in the decomposition of adjacency matrices of undirected graphs. For example, for the graph Therefore, for classical DFT analysis, the following rela- from Fig. 1(b) in Part I and its adjacency matrix, the tion holds √ eigenvalues are shown in Fig. 20. N Vλ = (UH )−1 . Definition: The analytic graph signal, Xa (k), and the √ graph Hilbert transform, Xh (k), are defined in the spectral This relation is obvious from (35) and u∗k (n) = λnk / N , domain as and will be used to define the z-transform of a graph sig- nal. Xa (k) = 1 + sign(=(λk )) X(k) 20 Xh (k) = j sign =(λk ) X(k) 3.13. Optimal Denoising Consider a measurement, x, composed of a slow-varying X(k) = Xa (k) + jXh (k), graph signal, s, and a fast changing disturbance, ε, to give where =(λk ) denotes imaginary part of λk . If these rela- tions are applied to the standard DFT with λk = e−j2πk/N x = s + ε. we would obtain the corresponding classical signal process- The aim is to design a filter for disturbance suppression ing definitions. (denoising), the output of which is denoted by y = H(x). The optimal denoising task may then be defined as a 3.11. Shift Operator in the Spectral Domain minimization of the objective function A shift operation in the spectral domain can be defined in the same way as the shift in the vertex domain. Consider 1 J= ky − xk22 +αyT Ly. (64) a product of two graph signals, x(n)y(n), defined on an 2 undirected graph. The GDFT of this product then takes the form Physically, the minimization of the first term 21 ky − xk22 forces the output signal y to be as close as possible to the N X −1 available observations x, in terms of the energy of their Eu- GDFT{x(n)y(n)} = x(n)y(n)uk (n) = clidean distance (minimum error energy), while the second n=0 term represent a measure of smoothness of y (see Section N −1 N −1 N −1 X X X 3.7). This is also physically meaningful, as the original in- X(i)ui (n)y(n)uk (n) = X(i)Yi (k), put, s, was low-pass and smoother than the disturbance, ε. n=0 i=0 i=0 The parameter α provides a balance between the closeness where of output, y, to x and the output smoothness criterion. N X −1 To solve this minimization problem, we differentiate Yi (k) = y(n)ui (n)uk (n) n=0 ∂J = y − x + 2αLy = 0 can be considered as a shift of Y (k) by i spectral indices. ∂yT Remark 18: As desired, a shift by i = 0 in the spectral which results in domain produces the√original value, Y0 (k) = Y (k), up to a constant factor 1/ N . This relation does not hold for y = (I + 2αL)−1 x. the shift operators in the vertex domain. The spectral domain form of this relation follows from L = UT ΛU, Y = UT y, and X = UT x, to yield 3.12. Parseval’s Theorem on a Graph Consider two graph signals, x(n) and y(n), which are Y = (I + 2αΛ)−1 X. observed on an undirected graph and their spectra, X(k) and Y (k). Then, Parseval’s theorem has the form The element-wise transfer function of the above spectral input/output relation then takes the form N X −1 N X −1 x(n)y(n) = X(k)Y (k) (62) 1 H(λk ) = . (65) n=0 k=0 1 + 2αλk and it holds for any two graph signals. To prove Parseval’s theorem on graphs, consider Remark 19: For a small α, we have H(λk ) ≈ 1, that is, an all-pass behavior of (65), with no signal smoothing, N −1 N −1 N −1 which yields y ≈ x. On the other hand, for a large α, X X X x(n)y(n) = X(k)uk (n) y(n) H(λk ) ≈ δ(k). The resulting y ≈ const. is maximally n=0 n=0 k=0 smooth (a constant output, without any variation). N X −1 N X −1 = X(k) y(n)uk (n), (63) Example 8: The noisy signal from Fig. 3 was filtered using k=0 n=0 the optimal filter in (65) with α = 1, and the result is shown in Fig. 21. The achieved SNR was 19.16 dB. to yield Parseval’s equivalence between the energies in the Other cost functions. Among many possible alterna- original and spectral domains. It has been assumed that tives, we will introduce two more cost functions for graph the graphs are undirected, so that U−1 = UT holds. This signal denoising, which exploit different constraints im- theorem is quite general and applies to both the graph posed on the solution. Laplacian and the adjacency matrix based decompositions Instead of enforcing the smoothness of the output sig- on undirected graphs. nal, we may instead desire that its deviation from a linear form (that is, the signal, y, which satisfies Ly = 0) is 21 and 11 6 !p/2 N −1 N −1 N −1 12 1X 2 X X 2 J= (y(n)−x(n)) +α Wmn (y(n)− y(m)) 10 2 n=0 n=0 m=0 4 (68) 9 13 14 with 0 ≤ p ≤ 1. 5 Remark 20: The zero-norm, `0 , with p = 0, is the best in 8 2 0 promoting sparsity, since for p = 0 the second term in the 1 cost function in (67) counts (and minimizes) the number 3 of nonzero elements in Ly. Minimization of the sparsity 15 of Ly promotes constant (or linear) solutions for y, with the smallest number of discontinuities (nonzero elements of 7 vector Ly). In the second cost function in (68), the zero- norm promotes the smallest PN −1 possible number of2 nonzero elements of the term m=0 Wmn (y(n) − y(m)) ; this is also known as the total variation (TV) approach. However, the minimization of such objective functions cannot be Figure 21: Graph signal denoising for a noisy signal from Fig. 3, achieved in an analytic way, like in the standard MSE case which is filtered using an optimal filter in (65), with α = 1. of p = 2. On the other hand, the choice of p = 1 with one-norm, as small as possible. This can be achieved with the cost `1 , makes the above cost functions convex, allowing for function given by gradient descend methods be used to arrive at the solution, 1 1 while producing the same solution as with p = 0, under J= ky − xk22 +αkLyk22 = ky − xk22 +αyT L2 y (66) some mild conditions. The `1 -norm serves as an analytic 2 2 proxy to the `0 -norm [19]. which yields a closed form denoising solution y = (I + 2αL2 )−1 x 3.14. Systems on a Graph Defined Using Random Walk Laplacian with the corresponding element-wise spectral domain re- While common choices for the graph shift operator lation H(λk ) = 1/(1 + 2αλ2k ). are: (i) adjacency matrix, S = A, and (ii) graph Lapla- A combination of the two cost function forms in (64) cian, S = L, normalized versions of the adjacency matrix, and (66), may provide additional flexibility in the design graph Laplacian, S = D−1/2 LD−1/2 , and random walk of the filter transfer function, for example (diffusion) matrix, S = D−1 W, can also be used [20, 21]. 1 Various shift operators produce corresponding eigenvector J= ky − xk22 +αyT Ly + βyT L2 y (signal decomposition) bases, such as those analyzed in 2 Part I and given in Table 1. would yield the transfer function A generalized form of the output from a system on a 1 graph can then be written as H(λk ) = . 1 + 2αλk + 2βλ2k M −1 X 0 1 M −1 This transfer function form can be further fine-tuned through y = h0 S x + h1 S x + · · · + hM −1 S x = hm Sm x, m=0 the choice of the parameters α and β. For example, if we (69) desire the component corresponding to λ1 6= 0 not to be at- where, by definition S0 = I, while h0 , h1 , . . . , hM −1 are tenuated, we would use α + βλ1 = 0. Such a cost function the system coefficients. can be straightforwardly extended to produce a transfer An unbiased version of the random walk shift operator function for M unattenuated components. can also be employed in this context, defined as Sparsity promoting solutions. Some applications re- quire to promote the sparsity of the output graph signal, S = (I + D)−1 (I + W), (70) rather than its smoothness. Such solutions then naturally rest upon compressive sensing theory which requires the as it exhibits the desirable property of asymptotic signal two-norm in the previous cost functions to be replaced energy preservation [22]. The shift operator in (70) can with the norms that promote sparsity. Two examples of be derived under the assumption that the random graph such cost functions are signal, x, follows the general random walk (GRW) model, 1 which exhibits the following properties: J = ky − xk22 +αkLykpp (67) 2 22 the GRW has a probability density which convergences to Table 1: Summary of graph spectral basis vectors. that of the Wiener process [23, 24, 25, 26]. In the graph Operator Eigenanalysis setting, for a walker at a vertex m, the central limit the- orem [27] asserts that after a sufficiently large number of Graph Laplacian Luk = λk uk independent steps, the probability of walker’s position is 2 Gaussian distributed, Pmn ∝ e−rmn , where rmn is a mea- Generalized eigenvectors sure of physical distance between vertices m and n. Conse- of graph Laplacian Luk = λk Duk quently, the elements of the GRW weight matrix, denoted 1 1 by W̃, are given by Normalized graph Laplacian D− 2 LD− 2 uk = λk uk  2 −r e mn , (m, n) ∈ E,  Adjacency matrix Auk = λk uk W̃mn = 1, (75) m = n,  0, (m, n) ∈ / E.  1 Normalized adjacency matrix λmax A uk = λk uk Notice that in a probabilistic setting the vertices are im- plicitly self-connected ; to ensure that the transition proba- i) Graph Markov property, that is, the random process bilities sum up to unity, we therefore need to normalise the is dependent only of its shifted state, GRW weights to obtain Pmn = W̃mn /D̃mm . Notice that ! the standard weight matrix, W, has zeros on the diagonal so that for W̃ in (75), W̃ = (I + W) and D̃ = (I + D). \ m P x S x = P x Sx ; (71) Therefore, this graph shift operator takes the form in (70). m>0 Example 9: Consider again the multi-sensor setup described in Section 2, and shown in Fig. 22(a). The graph shift op- ii) Graph Martingale property, whereby the conditional erator based on the GRW model was employed within a first expectation of the random process is equal to its order averaging system (h0 = 0, h1 = 1), as in (69), to estimate shifted state, which can be written as the true temperature from the observed temperature field. The 2 ( ) weight matrix elements, Wmn = e−rmn , were specified based on the Euclidean distance between vertices, rmn , thereby ac- \ m E x S x = Sx. (72) counting for the difference in latitude, longitude and altitute. m>0 The resulting denoised temperature field is illustrated in Fig. 22(b) and demonstrates the attained increase in the SNR from In this way, the random walk can be described by a Markov 14.2 dB to 19.8 dB, which results from the desirable unbiased- matrix, P ∈ RN ×N , with its (m, n)-th element defined as ness and asymptotic power preservation properties of the shift the transition probability, Pmn , of going from vertex m to operator. vertex n. By setting S = P, this shift operator is unbi- ased, since each row in P sums up to unity, i.e. P 1 = 1. Furthermore, owing to the graph Martingale condition in (72), the shift operator exhibits a dual role of the expecta- tion operator, since Sx = Px = E {x}. With this result, it can also be proven that with an increase in the number of vertices, N , the shift operator is asymptotically power preserving (isometric), that is [22] lim kSxk2 = E kxk2 . (73) N →∞ (a) Observed field (b) GRW local expectation (SNR = 14.2 dB). (SNR = 19.8 dB). Therefore, the class of systems based on this graph shift also exhibits the following boundedness property Figure 22: Local average operator on a graph based on the general- ized random walk (GRW) shift operator. The graph signal intensity M X −1 is designated by the vertex color. lim kyk2 ≤ |hm |2 E kxk2 . (74) N →∞ m=0 The use of the Markov matrix as the shift operator was re- 4. Subsampling, Compressed Sensing, and Recon- cently proposed in [20, 21], and the above analysis further struction justifies this concept. In practice, the actual probabilities of vertex transition Graphs may comprise of a very large number of ver- are often unknown but can be inferred from the available tices, of the order of millions or even higher. The associ- information of the graph topology, implied by the weight ated computational and storage issues bring to the fore the matrix, W. In the limit, Donsker’s theorem states that consideration of potential advantages of subsampling and 23 compressive sensing defined on graphs. We here present coefficients X(K), X(K + 1), . . . , X(N − 1) in (77) since several basic approaches to subsampling, along with their these are zero-valued and do not contribute to the for- relations to classical signal processing [28, 29, 30, 31, 32, mation of graph signal samples. With this in mind, the 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46]. M × N system of equations in (78) is reduced to the fol- lowing M × K system 4.1. Subsampling of Low-Pass Graph Signals      x(n1 ) u0 (n1 ) u1 (n1 ) . . . uK−1 (n1 ) X(0) For convenience, we shall start from the simplest case  x(n2 )   u0 (n2 ) u1 (n2 ) . . . uK−1 (n2 )  X(1)  where the considered graph signal is of a low-pass nature.  ..  =     .. .. .. ..   .. ,  Such a signal can be expressed as a linear combination of  .   . . . .  .  K < N eigenvectors of the graph Laplacian which exhibit x(nM ) u0 (nM ) u1 (nM ) . . . uK−1 (nM ) X(K − 1) the lowest smoothness indices, or, in the matrix form K−1 X y = AM K XK , (79) x(n) = X(k)uk (n), n = 0, 1, . . . , N − 1. (76) k=0 where the definitions of the reduced measurement matrix AM K and the reduced GDFT vector XK are obvious. For The GDFT domain coefficients of this (K-sparse) signal M = K independent measurements, this system can be in the GDFT domain are of the following form solved uniquely, while for M > K the system is typi- cally overdetermined and the solution is found in the least X = [X(0), X(1), . . . , X(K − 1), 0, 0, . . . , 0]T . (77) squares (LS) sense, as [34] Recall that a graph signal is sparse in the GDFT do- XK = (ATM K AM K )−1 ATM K y = pinv(AM K )y, (80) main if K N . The smallest number of graph signal samples, M , needed to recover the sparse signal is therefore where pinv(AM K ) = (ATM K AM K )−1 ATM K is the matrix M = K < N . For stability of reconstruction, it is common pseudo-inverse of AM K . to employ K ≤ M < N graph signal samples. The vector After XK is calculated, all GDFT values follow directly of available graph signal samples will be referred to as the as X = [X(0), X(1), . . . , X(K − 1), 0, 0, . . . , 0]T , where the measurement vector, and will be denoted by y, while the assumed zero values are added for X(K), X(K + 1), . . . , set of vertices (a random subset of V = {0, 1, 2, . . . , N −1}) X(N −1). The graph signal is then recovered at all vertices over which the samples of graph signal are available is de- using x = U X. noted by Recovery condition. The signal reconstruction in (80) is M = {n1 , n2 , . . . , nM }. possible if the inverse (ATM K AM K )−1 exists, which means that The measurement matrix can now be defined using the rank(ATM K AM K ) = K. (81) IGDFT, x = U X, of which an element-wise form is given by (76). The equations in (76) corresponding to the avail- In terms of the matrix condition number, this requirement able graph signal samples at vertices n ∈ M = {n1 , n2 , . . . , nM }is equivalent to then define the system cond(ATM K AM K ) < ∞,      x(n1 ) u0 (n1 ) u1 (n1 ) . . . uN −1 (n1 ) X(0) that is, a nonsingular ATM K AM K .  x(n2 )   u0 (n2 ) u1 (n2 ) . . . uN −1 (n2 )  X(1)   Remark 21: For noisy measurements of graph signals, the  ..  =  .. .. .. .. ,     ..   noise in the reconstructed GDFT coefficients is directly re-  .   . . . .  . x(nM ) u0 (nM ) u1 (nM ) . . . uN−1 (nM ) X(N − 1) lated to the input noise and the matrix condition number. If we are able to choose the available signal sample po- for which the matrix form is given by sitions (vertices), then the sampling strategy would be to find the set of measurements so that these produce the y = AM N X, (78) condition number which is as close to unity as possible (for stability and reduced influence of noise). where AM N is the measurement matrix and the measure- ments vector Example 10: To demonstrate the principle of reconstruction from a reduced set of graph signal samples, consider the values y = [x(n1 ), x(n2 ), . . . , x(nM )]T of a graph signal at M = 3 vertices, given by y = [x(0), x(2), x(6)]T = [1.140, 0.996, 0.563]T , consists of the available graph signal samples. In general, since M < N this system is underdetermined, and cannot as shown in Fig. 23 (upper panel). Assume that the graph sig- be solved uniquely for X without additional constraints. nal is of a low-pass type, with K = 2 lowest nonzero GDFT co- The assumption that the spectral representation of a efficients X(0) and X(1). The GDFT coefficients of this graph signal can then be reconstructed from signal contains a linear combination of only K ≤ M slow- est varying eigenvectors allows us to exclude the GDFT y = A32 X2 , (82) 24 Similar to (78), the corresponding system of equations      0 x(n1 ) uk1 (n1 ) uk2 (n1 ) . . . ukK (n1 ) X(k1 ) ×1  x(n2 )   uk1 (n2 ) uk2 (n2 ) . . . ukK (n2 )  X(k2 )   ..  =  .. .. ..  ..  .      .. 2  .   . . . .  .  × 3 ×4 x(nM ) uk1 (nM ) uk2 (nM ) . . . ukK(nM ) X(kK ) (83) 6 7 ×5 × of which the matrix form is y = AM K XK , is solved for the nonzero spectral values X(k), k ∈ K, in the same way as in the case of a low-pass signal presented in Section 4.1. 0 1 4.2.2. Support Matrices, Subsampling and Upsampling In graph signal processing literature, the subsampling 2 problem is often defined using the so called support ma- 3 4 trices. Assume that a graph signal, x, is subsampled 7 in such way that it is available on a subset of vertices 6 5 n ∈ M = {n1 , n2 , . . . , nM }, rather than on the full set of vertices. For this subsampled signal, we can define its Figure 23: Illustration of the subsampling of a lowpass graph signal. Top: A graph signal with missing samples at vertices 1, 3, 4, 5 and 7. upsampled version, xs , by adding zeros at the vertices Bottom: The reconstructed graph signal. where the signal is not available. Using a mathematical formalism, the subsampled and upsampled version, xs , of the original signal, x, is then that follows from the definition in (76) for the assumed available signal samples, x(n), at the three vertices n = 0, n = 2, and xs = Bx, (84) n = 6, for two nonzero coefficients, X(0) and X(1), where the support matrix B is an N × N diagonal matrix with ones at the diagonal positions which correspond to     x(0) u0 (0) u1 (0) x(2) = u0 (2) u1 (2) X(0) . M = {n1 , n2 , . . . , nM } and zeros elsewhere. The subsam- X(1) x(6) u0 (6) u1 (6) pled and upsampled version, xs , of the signal x is obtained is such a way that the signal x is subsampled on a reduced The rank of the matrix A32 is 2. The corresponding ma- trix condition number is cond(AT32 A32 ) = 4.33, while the re- set of vertices, and then upsampled by adding zeros at constructed nonzero values of the GDFT are X(0) = 2 and the original signal positions where the subsampled signal X(1) = 1, to yield the reconstructed graph signal x = U X, is not defined. with X = [2, 1, 0, 0, 0, 0, 0, 0]T , as shown in Fig. 23 (lower Recall that in general a signal, x, with N independent panel). values cannot be reconstructed from its M < N nonzero values in xs , without additional constraints. However, for Remark 22: For a directed circular√ graph, with the eigen- graph signals which are also sparse in the GDFT domain, vectors uk (n) = exp(j2πnk/N )/ N , the above downsam- the additional constraint is that the signal, x, has only pling and interpolation relations are identical to those in K ≤ M nonzero coefficients in the GDFT domain, X = classical signal processing [35]. UT x, at k ∈ K = {k1 , k2 , . . . , kK }, so that the relation 4.2. Subsampling of Sparse Graph Signals X = CX The subsampling of graph signals which are sparse in holds, where the support matrix C is an N × N diagonal the GDFT domain will be next considered for the cases of matrix with ones at the diagonal positions which corre- both known and unknown positions of the nonzero GDFT spond to K = {k1 , k2 , . . . , kK } and zeros elsewhere. Note coefficients. the presence of the GDFT, X, is on both sides of this equation, contrary to xs = Bx in (84). The reconstruc- 4.2.1. Known Coefficient Positions in GDFT tion formula then follows from The previous analysis holds not only for a low-pass type of the graph signal, x, and its corresponding GDFT, X, xs = Bx = BUX = BUCX. but also for case of GDFT, X, with K nonzero values at as X = pinv(BUC)xs . The inversion arbitrary, but known spectral positions, that is, X = CX = pinv(BUC)xs X(k) = 0 for k ∈ / K = {k1 , k2 , . . . , kK }. is possible for K nonzero coefficients of CX if the rank of BUC is K (if there are K linearly independent equations), that is rank(C) = K = rank(BUC). 25 This condition is equivalent to (81) since the nonzero part submatrix of the measurement matrix, AM N , which of matrix BUC is equal to AM K in (83). consists of the columns defined by the indices k1 and k2 . The reconstructed vector y2 = A2 X2 , is re- 4.2.3. Unknown Coefficient Positions moved from the measurements, y, with the error, The reconstruction problem is more complex if the po- e = y − y2 , now acting as the new measurement sitions of nonzero spectral coefficients K = {k1 , k2 , . . . , kK } vector. are not known. This case has been addressed within stan- (iii) The procedure is iteratively repeated K times or un- dard compressive sensing theory and can be formulated til the remaining measurement values in e are negli- as gible. In the cases when the sparsity, K, is unknown, min kXk0 subject to y = AM N X, (85) the procedure is iterated until kek2 < ε, where ε is a where kXk0 denotes the number of nonzero elements in X predefined precision. (`0 pseudo-norm). Example 11: Consider a sparse graph signal, of the sparsity While the ways to solve this minimization problem are degree K = 2, measured at vertices n = 2, 3, 4, 5, and 7, which manifold, we here adopt a simple, two-step approach: takes the values 1. Estimate the positions K = {k1 , k2 , . . . , kK } of the y = [0.707, 1.307, 0.407, 1.307, 0.407]T , nonzero coefficients using M > K signal samples, as shown in Fig. 24 (top panel). Our task is to reconstruct the 2. Reconstruct the nonzero coefficients of X at the esti- full signal, that is, to find the missing samples x(0), x(1), and x(6). mated positions K, along with the signal x at all ver- The estimate positions of the nonzero elements in the GDFT, tices, using the methods for the reconstruction with X, the initial estimate, X0 , is calculated for given measure- the known nonzero coefficient positions, described in ments, y, according to (86). Because K = 2, the positions of Sections 4.1 and 4.2.1. The nonzero coefficients at the two nonzero coefficients are estimated as positions of the positions K are calculated as XK = pinv(AM K )y. two largest values in X0 . In the considered case, K = {0, 3}, as shown in Fig. 24 (bottom panel). The GDFT coefficients The nonzero positions of the GDFT in Step 1 can be are then reconstructed for the sparsity degree K = 2, as X2 = estimated through the projection of measurements (avail- pinv(A52 )y, resulting in X(0) = 2, X(3) = 1.2, as illustrated able signal samples), y, on the measurement matrix in Fig. 24 (bottom–right). Finally, the reconstructed graph   signal at all vertices, x = UX, is shown in the middle panel of u0 (n1 ) u1 (n1 ) . . . uN −1 (n1 ) Fig. 24.  u0 (n2 ) u1 (n2 ) . . . uN −1 (n2 )  AM N =  .. .. ..   ..  4.2.4. Unique Reconstruction Conditions  . . . .  u0 (nM ) u1 (nM ) . . . uN −1 (nM ) As is the case with the standard compressive sensing problem, the initial GDFT estimate, X0 , will produce cor- to give rect positions of the nonzero elements, X(k), and the re- X0 = ATM N y, (86) construction will be unique, if where the positions of K largest values in X0 are used as 1 1 K< 1+ , an estimate of the nonzero positions, K. This procedure 2 µ can also be implemented in an iterative way [34], where where µ is equal to the maximum value of the inner prod- (i) In the first iteration we assume K = 1 and pro- uct among any two columns of the measurement matrix, ceed to estimate the largest spectral component in AM N (µ is referred to as the coherence index) [47]. the graph signal. Upon determining its position as For illustration of the uniqueness of reconstruction, re- k1 = argmax|ATM N y|, the initially empty set of the call that a K-sparse signal can be written as nonzero positions becomes K = {k1 }. The recon- K X structed vector y1 = A1 X1 , where X1 = pinv(AM 1 )y, x(n) = X(ki )uki (n), is then removed from the measurements, y. In this i=1 case, the matrix AM 1 is a column of the matrix AM N of which the initial estimate in (86) is equal to X0 = defined by the index k1 . The difference e = y − y1 ATM N y = ATM N AM N X, or element-wise is used as the measurement vector in the next step. K K (ii) The position of the second largest spectral compo- X X X X0 (k) = X(ki ) uk (n)uki (n) = X(ki )µ(k, ki ), nent in the graph signal is estimated by solving k2 = i=1 n∈M i=1 argmax|ATM N e|. The set of nonzero positions now becomes K = {k1 , k2 }. The first and the second where M = {n1 , n2 , . . . , nM } and component of the graph signal are now estimated µ(k, ki ) = X uk (n)uki (n). as X2 = pinv(AM 2 )y, where the matrix AM 2 is a n∈M 26 0 tract these two matrix equations we get a zero-vector on × ×1 the right-side and a nonzero solution for the resulting vec- tor, 2 " # (1) XK 3 4 X2K = (2) , −XK 6× 7 5 (a) requires the zero-valued determinant of AM 2K . The nonzero determinant of AM 2K guarantees that two such, nonzero (1) (2) 0 solutions, XK and XK , cannot exist. If all possible sub- 1 matrices AM 2K of the measurement matrix AM K are non- sigular, then two solutions of sparsity K cannot exist, and 2 the solution is unique. The requirement that all reduced 3 4 measurement matrices corresponding to a 2K-sparse X are 7 nonsingular can be written in several forms, listed below 6 5 (b) det{ATM 2K AM 2K } = d1 d2 · · · d2K 6= 0 |X0 (k)| |X (k)| dmax 1 + δ2K cond{ATM 2K AM 2K } = ≤ <∞ dmin 1 − δ2K 2 kAM 2K X2K k2 1 − δ2K ≤ dmin ≤ ≤ dmax ≤ 1 + δ2K 0 1 2 3 4 5 6 7 k 0 1 2 3 4 5 6 7 k (c) kX2K k2 2 Figure 24: Compressive sensing on graphs. (a) Available sam- where di are the eigenvalues of ATM 2K AM 2K , dmin is the ples (measurements), y = [x(2), x(3), x(4), x(5), x(7)]T , with missing minimum eigenvalue, dmax is the maximum eigenvalue, samples at n = 0, 1, 6. (b) Reconstructed signal, x, over the whole set of vertices. (c) Initial estimate of the GDFT, X0 (k), (left), and and δ2K is the restricted isometry constant. All these con- the reconstructed sparse GDFT, X(k), (right). ditions are satisfied if dmin > 0 or 0 ≤ δ2K < 1. Noisy data require robust estimators, and thus more strict bounds on dmin and δ2K . For example, it has been If the maximum possible absolute value of µ(k, ki ) is de- shown that the condition 0 ≤ δ2K < 0.41 will guaran- noted by µ = max|µ(k, ki )| (coherence index of AM N ) tee stable inversion of ATM 2K AM 2K and consequently a then, in the worst case scenario, the amplitude of the robust reconstruction for noisy signals; in addition, this largest component, X(ki ), (assumed with the normalized bound will allow for convex relaxation of the reconstruc- amplitude 1), will be reduced for the maximum possi- tion problem [48]. Namely, the previous problem, (85), can ble influence of other equally strong (unity) components be solved using the convex relation from the norm-zero to 1 − (K − 1)µ, and should be greater than the maximum a norm-one formulation given by possible disturbance at k 6= ki , which is Kµ. From 1 − (K − 1)µ > Kµ, the unique reconstruction condition fol- min kXk1 subject to y = AM N X. lows; see also [34, 47]. In order to define other unique reconstruction condi- The solutions to these two problem formulations are the tions, we shall consider again the solution to y = AM N X same if the measurement matrix satisfies the previous con- which assumes a minimum number of nonzero coefficients ditions, with 0 ≤ δ2K < 0.41. The signal reconstruction in X. Assume that the sparsity degree K is known, then problem can now be solved using optimization techniques, a set of K measurements would yield a possible solution, such as gradient-based approaches or linear programming XK , for any combination of K nonzero coefficients in X. methods [34, 48]. For another set of K measurements, we would obtain an- other set of possible solutions, XK . Then, a common so- 4.3. Measurements as Linear Combinations of Samples lution between these two sets of solutions would be the It should be mentioned that if some spectrum coeffi- solution to our problem. For a unique solution, there are cients of a graph signal are strongly related to only a few (1) (2) of the signal samples, then these signal samples may not no two different K-sparse solutions XK and XK if all possible matrices, ATM 2K AM 2K , are nonsingular. Namely, be good candidates for the measurements. both of these two different solutions would satisfy mea- Example 12: Consider a graph with one of its eigenvectors surement equations, of the form close to ui (n) = δ(n − m). This case is possi- (1) ble on graphs, in contrast to the classic DFT analysis where 0K XK the basis functions are spread over all sensing instants (ver- AM 2K = y and AM 2K (2) = y, tices). A similar scenario is also possible in wavelet analysis or 0K XK short time Fourier transforms, which also allow for some of the h (1) (2) i transform coefficients to be related to only a few of the signal where AM 2K = AM K AM K . Obviously, if we sub- samples. In the assumed simplified case, if a considered sparse 27 signal contains a nonzero coefficient, X(i), corresponding to To proceed with signal reconstruction, observe that if ui (n) = δ(n − m), then all information about X(i) is contained the shifts are stopped after M < N steps, the available in the graph signal sample x(m) only. This is prohibitive to signal samples will be x(n), x(n − 1), . . . , x(n − M + 1). the principle of reduced number of samples, since an arbitrary From this reduced set of measurements/samples we can set of available samples may not contain x(m). still recover the full graph signal, x, using compressive In classical and graph data analysis this class of problems is sensing based reconstruction methods, if the appropriate solved by defining a more complex form of the measurements, y(n), through linear combinations of all signal samples rather reconstruction conditions are met. than the original samples themselves. In this way, each mea- Principle of aggregate sampling on arbitrary graph. surement, y(n), will contain information about all signal sam- The same procedure can be applied to a signal observed ples, x(n), n = 0, 2, . . . , N − 1. in the same way on an arbitrary graph. Assume that we Such measurements are linear combinations of all signal observe the graph signal at only one vertex, n, and obtain samples, and are given by one graph signal sample      y0 (n) = x(n), y(1) b11 b12 . . . b1N x(0)  y(2)   b21 b12 . . . b2N  x(1)  which will be considered as the measurement y(0) = y0 (n).  ..  =  .. .. . . ..  .. ,       .   . . . .  .  This graph signal may now be “graph shifted” to pro- y(M ) bM 1 bM 2 . . . bM N x(N − 1) duce y1 = Ax. Recall that in a one-step signal shift on a graph, all signal samples will move by one step along the or in a matrix form graph edges, as described in detail in Section 3.1 and illus- trated in Fig. 25. The sample of a graph signal at vertex y = BM N x. n will now be a sum of all signal samples that have shifted to this vertex. Its value is obtained as an inner product of The weighting coefficients for the measurements, bmn , in the mth row of the adjacency matrix, A, and the original the matrix, BM N , may be, for example, drawn from a signal vector, x. The value of graph shifted signal at the Gaussian random distribution. vertex n, is therefore given by For reconstruction, the sparsity of a graph signal, x, X should be again assumed in the GDFT domain. The re- y1 (n) = Anm x(m), lation of the measurement vector, y, with this sparsity m domain vector of coefficients, X, is then given by and represents a linear combination of some of the sig- y = BM N x = BM N UX = AM N X. nal samples, which is now considered as the measurement y(1) = y1 (n). The reconstruction is now obtained as a solution to One more signal shift on the graph yields X min kXk0 subject to y = (BM N U)X y2 (n) = A(2) nm x(m), m or as a solution of the corresponding convex minimization (2) problem, where Anm are the elements of matrix A2 = AA (see Property M2 in Part I, Section 2.3). Such an observed min kXk1 subject to y = (BM N U)X, value, after two one step shifts, y2 (n) at a vertex n, repre- sents a new linear combination of some signal samples and as described in Section 4.2.3. will be considered as the measurement y(2) = y2 (n). If we proceed with shifts M = N times, a system of 4.4. Aggregate Sampling N linear equations, y = BM N x, is obtained from which A specific form of a linear combination of graph signals all signal values, x(n), can be calculated. If we stop at is referred to as aggregate sampling. M < N , the signal can still be recovered using compressive For clarity, we shall first establish an interpretation of sensing based reconstruction methods if the signal is sparse sampling in classical signal processing through its graph and the reconstruction conditions are met. counterpart – sampling on a directed circular graph (Fig. Instead of M signal samples (instants) at one vertex, 6). Consider a graph signal, x, at a vertex/instant n. If we may use, for example, P samples at vertex n and the signal is observed at this vertex/instant only, then its (M − P ) samples from a vertex m. Other combinations of value is y0 (n) = x(n). Upon applying the graph shift vertices and samples may be also used to obtain M mea- operator, we have y1 = Ax, then for the same vertex, n, surements and to fully reconstruct a signal. we have y1 (n) = x(n − 1). If we continue this “shift and observe” operation on the directed circular graph N times 4.5. Filter Bank on a Graph at the same vertex/instant, n, we will eventually have all Subsampling and upsampling are the two standard op- signal values x(n), x(n − 1), . . . , x(n − N + 1) observed at erators used to alter the scale at which the signal is pro- vertex n. cessed. Subsampling of a signal by a factor of 2, followed 28 0 This is the basic operation used in multiresolution ap- 1 proaches based on filter banks and can be extended to signals on graphs in the following way. Consider a graph 2 with the set of vertices V. Any set of vertices can be 3 4 considered as a union of two disjoint subsets E and H, 6 7 5 such that V = E ∪ H and E ∩ H = ∅. The subsampling- upsampling procedure can then be performed in the fol- y(0) = x(7) lowing two steps: (a) signal x 1. Subsample the signal on a graph by keeping only signal values on the vertices n ∈ E, while not altering 0 the original graph topology, 1 2. Upsample the graph signal by setting the signal val- 2 ues for the vertices n ∈ / E to zero. 3 4 This combined subsampling-upsampling operation produces 6 7 5 a graph signal y(1) = x(4) + x(5) + x(6) 1 f (n) = 1 + (−1)βE (n) x(n), (b) shifted signal Ax 2 where ( Figure 25: Principle of aggregate sampling. (a) A graph signal x. (b) 0, if n ∈ E Its graph shifted version Ax. For example, for a graph signal value βE (n) = observed at the vertex n = 7 in the graph in (a) the measurement 1, if n ∈ H. is y(0) = x(7), and the aggregate measurement at the same vertex, n = 7, after the graph signal is shifted, is equal to y(1) = x(4) + The values of the resulting graph signal, f (n), are therefore x(5) + x(6) in (b). These two observations, y(0) and y(1), would f (n) = x(n) if n ∈ E and f (n) = 0 elsewhere. be sufficient to reconstruct a signal whose sparsity degree is K = 2 The vector form of the subsamped-upsampled graph with nonzero values at the known spectral index positions, k1 and k2 , if the reconstruction condition (81) is satisfied for the matrix signal, f (n), which comprises all n ∈ V, is given by AM N = BM N U at the specified spectral index positions. 1 1 f= (x + JE x) = (I + JE )x, (87) 2 2 where JE = diag((−1)βE (n) ), n ∈ V. The focus of our analysis will be on the two-channel wavelet filter bank on a graph, shown in Fig. 27. As in the classical wavelet analysis framework for temporary sig- nals, such a filter bank provides decomposition of a graph signal into the corresponding low-pass (smooth) and high- 2 pass (fast-varying) constituents. The analysis side (left part of the system in Fig. 27) consists of two channels with filters characterized by the vertex domain operators HL (L) and HH (L), with the corresponding spectral do- main operators HL (Λ) and HH (Λ). The operator HL (L) acts as a low-pass filter, transferring the low-pass compo- 2 2 nents of the graph signal, while the operator HH (L) does the opposite, acting as a high-pass filter. The low-pass fil- ter, HL (H), is followed by a downsampling operator which keeps only the graph signal values, x, at the vertices n ∈ E. Figure 26: Principle of a signal, x(n), downsampling and upsampling Similarly, the high-pass filtering with the operator HH (L), in the classical time domain. is subsequently followed by a downsampling to the vertices n ∈ H. These operations are crucial to alter the scale at by the corresponding upsampling, can be described in clas- which the graph signal is processed. sical signal processing by The synthesis side (right part in Fig. 27), comprises the complementary upsampling and filtering operations, 1 1 aiming to perform the graph signal reconstruction based f (n) = x(n) + (−1)n x(n) = (1 + (−1)n )x(n) , 2 2 on the upsampled versions, 21 (I + JE )HL (L)x and 21 (I + JH )HH (L)x, of signals obtained on the filter bank anal- as illustrated in Fig. 26. ysis side. Therefore, upon performing the upsampling of 29 Spectral solution. For the spectral representation of the filter-bank signals in the domain of Laplacian basis func- tions, we will use the decomposition of the graph Laplacian + in the form 1 1 F = UT f = (UT x + UT JE x) = (X + X(alias) ), (94) 2 2 where X(alias) = UT JE x is the aliasing spectral compo- nent. Figure 27: Principle of a filter bank for a graph signal. In the case of bipartite graphs, the matrix operator UT JE produces the transformation matrix UT with re- these signals onto the original set of vertices, V, by adding versed (left-right flipped) order of eigenvectors. This is zeros to the complementary sets of vertices, filtering is obvious from (29) in Part I, since performed by adequate low-pass, GL (L), and high-pass, UT JE = u0 u1 . . . uN −1 JE T GH (L), filters, to replace the zeros with meaningful values, T as required for a successful reconstruction of the original u0E u1E uN −1E = signal. As in the classical wavelet analysis, to achieve the −u0H −u1H · · · −uN −1H perfect (distortion-free) reconstruction it is necessary to T = uN −1 uN −2 . . . u0 = UTLR conveniently design the analysis filters, HL (L) and HH (L), and the synthesis filters, GL (L) and GH (L), as well as to where determine adequate downsampling and upsampling oper- ukE ukE ators. uk = , uN −1−k = , k = 0, 1, . . . N − 1, ukH −ukH It will be shown that the spectral folding phenomenon, described by (29) in Part I, characterized by the specific and spectral symmetry in the case of bipartite graphs, can ULR = uN −1 uN −2 ... u0 be used to form the basis for the two-channel filter bank is a left-right flipped version of the eigenvector matrix framework discussed in this Section. Consider a graph signal, x, and the filter-bank as in U = u0 u1 . . . uN −1 . Fig. 27. If the graph signal, x, passes through a low- pass analysis filter, HL (L), the output signal is HL (L)x. The element-wise form of equation (94) is given by According to (87), the downsampled-upsampled form of 1 the output signal, HL (L)x, is given by 12 (I + JE )HL (L)x. F (k) = (X(k) + X(N − 1 − k)). 2 After the syntheses filter, GL (L), the graph signal output For bipartite graphs and the normalized graph Laplacian, becomes 1 we can write fL = GL (L)(I + JE )HL (L)x. (88) 2 1 F (λk ) = (X(λk ) + X(2 − λk )). The same holds for the high-pass part 2 1 The second term in F (λk ) represents an aliasing compo- fH = GH (L)(I + JH )HH (L)x, (89) nent of the GDFT of the original signal. 2 The spectral representation of (92) is obtained with a where JH = −JE = diag((−1)1−βE (n) ) and left-multiplication by UT and a right-multiplication by U, JH + JE = 0. (90) UT GL (L)UUT HL (L)U + UT GH (L)UUT HH (L)U = 2I, having in mind that we can add UT U = UUT = I be- The overall output is a sum of these two signals, as illus- tween GL (L) and HL (L), and between GH (L) and HH (L). trated in Fig. 27, which after rearranging of terms gives Using the spectral domain definition of the transfer func- 1 tions, UT HL (L)U = HL (Λ), we get the spectral domain y = fL + fH = (GL (L)HL (L) + GH (L)HH (L))x+ form of the reconstruction condition (92) as 2 1 (GL (L)JE HL (L) + GH (L)JH HH (L))x. (91) GL (Λ)HL (Λ) + GH (Λ)HH (Λ) = 2I. (95) 2 For the aliasing part in equation (93), the left-multiplication The perfect reconstruction condition, y = x, is then is performed by UT , while the right-multiplication is done achieved if by UTLR . The first term in (93) is then of the form GL (L)HL (L) + GH (L)HH (L) = 2I, (92) UT GL (L)UUT JE HL (L)ULR = UT GL (L)UUTLR HL (L)ULR GL (L)JE HL (L) − GH (L)JE HH (L) = 0. (93) (R) = GL (Λ)HL (Λ), (96) 30 since UT JE = UTLR and UTLR ULR = I. The term (R) HL (Λ) = UTLR HL (L)ULR = HL (2I − Λ) is just a reversed order version of the diagonal matrix HL (Λ), with diagonal elements HL (λN −1−k ) = HL (2−λk ) instead of HL (λk ). The same holds for the second term in (93) which is equal to GH (L)JH HH (L), yielding the final spectral form (a) (b) of the aliasing condition in (93) as Figure 28: Bipartite graph for the Haar wavelet transform with N = GL (Λ)HL (2I − Λ) − GH (Λ)HH (2I − Λ) = 0. (97) 16 vertices. (a) Vertices in yellow are used for the low-pass part of the signal and correspond to the set E, while the vertices in gray An element-wise solution to the system in (92)-(93), for belong to the set H. This is the highest two-vertex resolution level bipartite graphs and the normalized graph Laplacian, ac- for the Haar wavelet. (b) Graph for a four-vertex resolution level in cording to (95) and (97), reduces to the Haar wavelet. GL (λk )HL (λk ) + GH (λk )HH (λk ) = 2, (98) GL (λk )HL (2 − λk ) − GH (λk )HH (2 − λk ) = 0. (99) since UT U = I, UTLR ULR = I, UT JE = UTLR , UTLR U = ILR , and ILR X = XUD , where ILR is an anti-diagonal (backward) Remark 23: A quadratic mirror filter solution would be identity matrix, and XUD is the GDFT vector, X, with ele- such that for the designed transfer function of the low-pass ments flipped upside-down. analysis filter, HL (λ), the other filters are The same holds for the high-pass part in (89), to yield GL (λ) = HL (λ), 1 T U GH (L)(I + JH )HH (L)x FH = 2 HH (λ) = HL (2 − λ), 1 1 = GH (Λ)HH (Λ)X − GH (Λ)HH (2I − Λ)XUD GH (λ) = HH (λ) = HL (2 − λ). (100) 2 2 For this solution, the design equation is given by and FL + FH = X. HL2 (λ) + HL2 (2 − λ) = 2, (101) while the aliasing cancellation condition, (99), is always Therefore, after the one-step filter-bank based decomposi- satisfied. tion on a bipartite graph, we have a new low-pass signal, fL , for which the nonzero values are at the vertices in E, and a high- An example of such a system √ would be an ideal low-pass pass signal, fH , with nonzero values only on H. Note that the filter, defined by HL (λ) = 2 for λ < 1 and HL (λ) = 0 high-pass operator on the graph signal is the graph Laplacian, elsewhere. Since HH (λ) = HL (2 − λ) holds for systems L, while the low-pass operator is 2I − L, which easily reduces on bipartite graphs, this satisfies the reconstruction con- to |L|, for the normalized graph Laplacian used here. dition. For the vertex domain realization, an approxima- Another simple transfer √ function that satisfies the design tion of the ideal filter with a finite neighborhood filtering equation (101) is HL (λ) = 2 cos(πλ/4). A similar analysis relation would be required. can also be done for this transfer function and other functions Example 13: Consider a simple form of the low-pass system defined by (100). √ The considered √ transfer functions HL (λ) = 2 − λ and HL2 (λ) = 2 − λ, HL (λ) = 2 cos(πλ/4) have several disadvantages, the most which satisfies the design equation, HL2 (λ) + HL2 (2 − λ) = 2. important being that they are not sufficiently smooth in the It also satisfies the condition that its form is of low-pass type spectral domain at the boundary interval points [35]. In ad- for the normalized Laplacian of bipartite graphs, HL2 (λ0 ) = dition, although the graph Laplacian, L, is commonly sparse 2 − λ0 = 2, since λ0 = 0, and HL2 (λmax ) = 2 − λmax = 0, as (with a small number of nonzero elements √ in large graphs), the λmax = 2. The vertex domain system operators which satisfy transfer function form HL (L) = 2I − L is not sparse. This all four quadratic mirror analysis and synthesis filters in (100), is the reason to use other forms which are sufficiently smooth are toward the boundary points, along with their polynomial ap- √ √ proximations, HL (Λ) = c0 Λ + c1 Λ2 + · · · + cM −1 ΛM −1 , with HL (Λ) = 2I − Λ, GL (Λ) = HL (Λ) = 2I − Λ, √ √ the coefficients c0 , c1 , . . . , cM −1 , that approximate HL (λ) and HH (Λ) = HL (2I − Λ) = Λ, GH (Λ) = HH (Λ) = Λ. HH (λ) = HL (2 − λ) for each λ = λk , k = 0, 1, . . . , N − 1. This topic will be addressed in detail on a general form of graphs in The spectral domain filtering form for the low-pass part of Section 7. graph signal is then obtained from (88), as 1 T The classic time-domain Haar wavelet (and scale) func- FL = UT fL = U GL (L)(I + JE )HL (L)x tions are easily obtained for a bipartite graph, such that 2 1 T E = 0, 2, 4, . . . , N − 2 and H = 1, 3, 5, . . . , N − 1, with = U GL (L)UUT (I + JE )HL (L)ULR UTLR UX the adjacency/weighting matrix defined by the elements 2 1 1 Amn = 1, for (m, n) ∈ {(0, 1), (2, 3), . . . , (N − 2, N − 1)}, = GL (Λ)HL (Λ)X + GL (Λ)HL (2I − Λ)XUD 2 2 as shown in Fig. 28(a). This adjacency matrix has the 31   0 1 0 0 0 0 0 0 0 −1 0 0 0 0 0 0 0 2  0 1 0 0 0 0 0 0 0 −1 0 0 0 0 0 0   4  0 0 1 0 0 0 0 0 0 0 −1 0 0 0 0 0   6  0 0 0 1 0 0 0 0 0 0 0 −1 0 0 0 0   8  0 0 0 0 1 0 0 0 0 0 0 0 −1 0 0 0   10   0 0 0 0 0 1 0 0 0 0 0 0 0 −1 0 0   12   0 0 0 0 0 0 1 0 0 0 0 0 0 0 −1 0   14  0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 −1  L=   (102) 1  −1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0   3  0 −1 0 0 0 0 0 0 1 0 0 0 0 0 0   5  0 0 −1 0 0 0 0 0 0 0 1 0 0 0 0 0   7  0 0 0 −1 0 0 0 0 0 0 0 1 0 0 0 0   9  0 0 0 0 −1 0 0 0 0 0 0 0 1 0 0 0   11   0 0 0 0 0 −1 0 0 0 0 0 0 0 1 0 0   13  0 0 0 0 0 0 −1 0 0 0 0 0 0 0 1 0  15 0 0 0 0 0 0 0 −1 0 0 0 0 0 0 0 1 0 2 4 6 8 10 12 14 1 3 5 7 9 11 13 15 block form as in equation (18), Part I. The correspond- way, which designates the diffusion process on a graph in ing graph Laplacian is given in (102). Its eigenvectors time. are equal to the wavelet transform functions. The bipar- tite graph for the four-vertex resolution level in the Haar 5.1. Diffusion on Graph and Low Pass Filtering wavelet transform is shown in Fig. 28(b). Consider the diffusion equation 5. Time-Varying Signals on Graphs ∂x/∂t = −αLx. We shall denote a time-varying signal by xp (n), where Its discrete-time form, at a time instant p, may be ob- n designates the vertex index and p the discrete-time in- tained by using the backward difference approximation of dex. For uniform sampling in time, the index p corre- the partial derivative (∂x/∂t ∼ xp+1 − xp ), and has the sponds to the time instant p∆t, where ∆t is the sampling form interval. In general, this type of data can be considered xp+1 − xp = −αLxp+1 within the graph Cartesian product framework (given in or xp+1 (I + αL) = xp to produce Property M15 , Section 2.3, Part I). The resulting graph G = (V, B) follows as a Cartesian product of the given xp+1 = (I + αL)−1 xp . graph G1 = (V1 , B1 ) and a simple path (or circular) graph G2 = (V2 , B2 ) that corresponds to the classical uniformly On the other hand, the forward difference approxima- samples time-domain axis. tion (∂x/∂t ∼ xp − xp−1 ) to the diffusion equation yields Example 14: A graph topology for a time varying signal on a graph is shown in Part I, Fig. 9, where the graph vertices xp+1 − xp = −αLxp are designated by 1, 2, 3, 4, 5 and time instants are denoted as the a, b, c vertices on the path graph. The resulting Cartesian or product graph, for the analysis of this kind of signals, is shown xp+1 = (I − αL)xp . in Part I, Fig. 9. It is interesting to note that these iterative forms lead The adjacency matrix of a Cartesian product of two to the minimization of the quadratic form of a graph sig- graphs is then given by nal, Ex = xLxT , (see Section 4.2.1, Part I). The minimum A = A1 ⊗ IN2 + IN1 ⊗ A2 = A1 ⊕ A2 , of this quadratic form can be found based on the steepest descent method, whereby the signal value at a time instant where A1 is the adjacency matrix of the graph of interest p is moving in the direction opposite to the gradient, to- G1 , and A2 is the adjacency matrix for the path or circular ward the energy minimum position, with a step constant graph, G2 , which designates the sampling grid, while N1 α. The gradient of the quadratic form, Ex = xLxT , is and N2 denote, respectively, the number of vertices in G1 ∂Ex /∂xT = 2xL, which results in an iterative procedure and G2 . We will next consider a simple and important example xp+1 = xp − αLxp = (I − αL)xp . (103) of a time-varying signal defined on graph in an iterative 32 This relation can be used for simple and efficient filtering of For some values of α < β, this system can be a good graph signals (with the aim to minimize Ex as a measure of and computationally very simple approximation of a graph signal smoothness). The spectral domain relation follows low-pass filter. immediately, and has the form 1.5 1.5 Xp+1 = (I − αΛ)Xp 1 1 or for every individual component 0.5 0.5 Xp+1 (k) = (1 − αλk )Xp (k). 0 0 Recall that the eigenvalues, λk , represent the index of smoothness for a spectral vector (eigenvector), uk , with a 0 2 4 0 2 4 small λk indicating smooth slow-varying elements of the 1.5 1.5 eigenvectors; therefore, for low-pass filtering we should re- tain the slow-varying eigenvectors in a spectral represen- 1 1 tation of the graph signal. Obviously, these slow-varying components will pass through this system since (1−αλk ) is 0.5 0.5 close to 1 for small λk , while the fast-varying components 0 0 with a larger λk , are attenuated. This iterative procedure will converge if |1 − αλmax |< 1. 0 2 4 0 2 4 In a stationary state of a diffusion process, the trivial minimal energy solution is obtained when Figure 29: Filter approximation in the spectral domain for a varying number of iterations, K, using Taubin’s algorithm and the graph lim Xp+1 (k) = (1 − αλk )p+1 X0 (k), Laplacian matrix of the graph in Fig. 8. p→∞ that is, all components Xp+1 (k) tend to 0, except for the Example 15: Consider the graph from Fig. 8(a) and its graph constant component, Xp+1 (0), for which λ0 = 0. This Laplacian, L. For the choice of parameters α = 0.1798 and component therefore defines the stationary state (maxi- β = 0.2193, the spectral transfer function in (106) is shown in Fig. 29 for the considered graph filter, and for the numbers of mally smooth solution). In order to avoid this effect in the iterations in Taubin’s algorithm K = 1, 5, 30 and 150. Observe processing of data on graphs, and to retain several low- how the transfer function, H(λk ), approaches the ideal low-pass pass components (eigenvectors) in the signal, the iteration form as the number of iterations, K, increases. process in (103) can be used in alternation with The task is next to low-pass filter the noisy signal from Fig. 16(b), with the initial noisy signal is denoted by x0 . Then xp+2 = (I + βL)xp+1 . (104) x1 = (I − 0.1545L)x0 is calculated using the corresponding This is the basis for Taubin’s α − β algorithm, presented graph Laplacian, followed by obtaining x2 = (I + 0.1875L)x1 . next. In the third and fourth iteration, the signal values x3 = (I − 0.1545L)x2 and x4 = (I+0.1875L)x3 are calculated. This two- 5.2. Taubin’s α − β algorithm step iteration cycle is repeated K = 20 times. The resulting signal is the same as the output of an ideal low-pass filter shown When the two iterative processes in (103) and (104) are in Fig. 16(c). used in a successive order, the resulting system on a graph Finally, the noisy signal from Fig. 3 was filtered using is referred to as Taubin’s α−β algorithm. This algorithm is Taubin’s α − β algorithm, with α = 0.1 and β = 0.1, over widely used for low-pass filtering of data on graphs, since K = 100 iterations, and the result is shown in Fig. 30. Ob- it is very simple, and admits efficient implementation in serve the reduced level of additive noise in the output. the vertex domain. Definition: Taubin’s α−β algorithm is a two-step iterative algorithm for efficient low-pass data filtering on graphs. Its 6. Random Graph Signals two steps are defined in a unified way as This section extends the concepts of data analytics for xp+2 = (I + βL)(I − αL)xp . (105) deterministic signals on graphs addressed so far, to intro- duce notions of random signals on graphs, their proper- The corresponding element-wise transfer function spectral ties, and statistical graph-specific methods for their anal- domain of the two iteration steps in (105) is given by ysis. The main focus is on wide-sense stationary (WSS) H(λk ) = (1 + βλk )(1 − αλk ). data observed on graphs. In general, the stationarity of a signal is inherently related to the signal shift operator After K iterations of this algorithm, the spectral domain and its properties. We have already presented two ap- transfer function can be written as proaches to define a shift on a graph (through the adja- K cency matrix and the graph Laplacian, and their spectral HK (λk ) = (1 + βλk )(1 − αλk ) . (106) decompositions). These will be used, along with other 33 Remark 27: For WSS random signals, their correlation 11 6 matrix, Rx = E{xxT }, is diagonalizable with the same def 12 transform matrix, U, which defines the DFT, X = UH x, def 10 with x = UX. The proof follows from 4 Rx = E{xxT } = E{UX(UX)H } 9 14 5 13 = UE{XXH }UH = UPx UH , (107) 2 and the fact that Px is a diagonal matrix for WSS signals. 8 0 1 The properties of the WSS signals in classical analyses, 3 presented in this subsection, will be used next to define the 15 corresponding properties of random signals on undirected graphs. 7 6.2. Adjacency Matrix Based Definition of GWSS Consider a real-valued white noise signal on a graph, ε = {ε(n)}. Following Remark 24, a signal x on the graph is graph wide sense stationary (GWSS) if it can be con- Figure 30: The noisy signal from Fig. 3 was filtered using K = 100 iterations of the Taubin two-step algorithm with α = 0.1 and β = 0.1. sidered an output of a linear shift invariant system on a PM −1 graph, H(A) = m=0 hm Am , which is driven by a white noise input, ε, that is general properties of WSS signals, to define the conditions for wide sense stationarity of random signals on graphs x = H(A)ε. [49, 50, 51, 52, 53, 54]. However the main obstacle toward extending the classical statistical data analytics to graphs Remark 28: The autocorrelation matrix, Rx = E{xxT }, is that the shift on a graph typically does not preserve of a GWSS signal is diagonalizable with the eigenmatrix 2 2 signal energy (isometry property), that is, kAxk2 6= kxk2 . of the adjacency matrix, A, since (cf. Remark 27) For completeness, we first provide a short review of WSS definitions in classical signal processing, together with A = UΛU−1 = UΛUT their properties. E{xxT } = UPx UT , (108) 6.1. Review of WSS and related Properties for Random where Px is a diagonal matrix. The values on the diag- Signals in Standard Time Domain onal of matrix Px can be comprised into the vector px , Definition: A real-valued random signal, x(n), is WSS which represents the PSD of a graph signal, x, px (k) = in the standard time domain if its mean value is time- E{|X(k)|2 }. invariant, µx (n) = E{x(n)} = µx , and its autocorrela- To prove this property for a signal x = H(A)ε, con- tion function is shift-invariant, that is, rx (n, n − m) = sider E{x(n)x(n − m)} = rx (m). T Remark 24: A random WSS time-domain signal, x(n), Rx = E{xxT } = E{H(A)ε H(A)ε } = H(A)H T (A), can be considered as an output of a linear shift invariant system with impulse response, h(n), which is driven by a since E{εεT } = I. Using H(A) = UT H(Λ)U, we obtain white noise input, ε(n), with rε (n, m) = δ(n − m). Rx = UT |H(Λ)|2 U, Remark 25: In classical time domain, the eigenvectors, uk , of the shift operator y(n) = x(n − 1), or in a matrix which concludes the proof that the matrix Px is diagonal form y = Ax, are the DFT basis functions, with A = Px = |H(Λ)|2 , UΛUH . This property is discussed in detail and proven in Part I, Section 3.2.1, equations (24)-(25). with the diagonal elements equal to the PSD of signal x, H Remark 26: For a random signal, its DFT X = U x px (k) = |H(λk )|2 . is also a random signal with the power spectrum matrix Px = E{XXH }, where UH is the DFT transformation The periodogram of a graph signal can be estimated matrix. For WSS signals, the matrix Px is diagonal and using K realizations of the random signal, denoted by xi , has the power spectral density (PSD) as its diagonal values and is equal to the diagonal elements of the matrix px (k) = DFT{rx (n)} = E{|X(k)|2 }. K K 1 X 1 X P̂x = |Xi |2 = UT (xi xTi )U. K i=1 K i=1 34 Consider a system on a graph, with a spectral domain 6.4. Spectral Domain Shift Based Definition of GWSS transfer function H(Λ). Assume that the input signal to Consider an m-step shift on a graph defined using the this system is GWSS, with PSD px (k). The PSD of the graph filter response output graph signal, y(n), is then given by N −1 py (k) = |H(λk )|2 px (k). X Tm {h(n)} = hm (n) = H(λk )uk (m)uk (n). (109) k=0 This expression is conformal with the output power of a standard linear system. The matrix form of this relation is given by M −1 6.3. Wiener filter on a graph X Th = H(L) = hm Lm = UH(Λ)UT , (110) Consider a real-valued graph signal, s, which serves as m=0 an input to a linear shift-invariant system on an undirected graph, to yield a noisy output where Tm {h(n)} are the elements of Th . M −1 Note that the graph filter response function is well lo- calized on a graph. Namely, if we use, for example, the X x= hm Am s + ε. m=0 (M −1)-neighborhood of a vertex n, within a filtering func- tion of order M defined by H(Λ), then only the vertices In the spectral domain, this system is described by within this neighborhood are used in the calculation of X = H(Λ)S + E, graph filter response. From (110), we see that the local- ization operator acts in the spectral domain and associates where E is the GDFT of the noise, ε. the corresponding shift to the vertex domain. Assume that the signal and noise are statistically in- Definition: A random graph signal, x(n), is GWSS if its dependent, and that the noise is a zero-mean GWSS ran- autocorrelation function is invariant with respect to the dom signal. The aim is to find the system function of the shift, Tm {rx (n)}, that is optimal filter, G(Λ), such that its output Y = G(Λ)X, estimates the GDFT of the input, S, in the least squares rx (m) = E{x(n)x(n − m)} = Tm {rx (n)}. sense. This condition can be expressed as Similar to (108), the autocorrelation matrix, Rx , of a 2 2 e2 = E{kS − Yk2 } = E{kS − G(Λ)Xk2 }. GWSS signal is diagonalizable based on the matrix of eigenvectors of the graph Laplacian L, whereby Upon setting the derivative of e2 with respect to the ele- ments of G(Λ) to zero, we arrive at L = UΛUT . (111) 2E{(S − G(Λ)X)XT } = 0, For the basic autocorrelation we use which results in the system function of the graph Wiener Rx = UPx (Λ)UT filter in the form E{SXT } E{S(H(Λ)S + E)T } so that G(Λ) = = E{XXT } E{(H(Λ)S + E)(H(Λ)S + E)T } N X −1 H(Λ)Ps Tm {rx (n)} = px (λk )uk (m)uk (n) = 2 k=0 H (Λ)Ps + Pε where or element-wise Px (Λ) = URx UT H(λk )ps (k) G(λk ) = . is a diagonal matrix. H 2 (λk )ps (k) + E(k) When the noise is not present, the elements of the vector E 6.5. Isometric Shift Operator are zero-valued, E(k) = 0 for all k, and the graph inverse Another possible approach may pbe based on the shift filter (introduced in Section 3.5.5) directly follows. operator defined as Tm = exp(jπ L/ρ), where ρ is the Remark 29: The above expressions for the graph Wiener upper bound on the eigenvalues, ρ = maxk {λk }, [55, 56]. filter are conformal with the standard frequency domain Physically, this operator casts the eigenvalues of the Lapla- Wiener filter, given by cian, L, onto a unit circle, thus preserving in this way the isometry property, since Ps (ω) G(ω) = , p p Ps (ω) + Pε (ω) Th = exp(jπ L/ρ) = U exp(jπ Λ/ρ)UT . (112) which again demonstrates the generic nature of Graph The property f (L) = Uf (Λ)UH was used above. Observe Data Analytics. that for real-valued eigenvalues, λk , all eigenvalues of the 35 1 p 0 1 matrix exp(jπ Λ/ρ) p reside on the unit circle, with the 2 frequency 0 ≤ ωk = π λk /ρ ≤ π being associated to the 3 eigenvector uk . 4 However, with this setup the corresponding GDFT bases 6 5 7 are not necessarily orthogonal for a general graph struc- ture. In turn, the work in [57] drew inspiration from (a) Directed graph structure. (b) Graph signal, x. the classical definition of the unitary shift operator acting on Hilbert spaces [58] to define an isometric shift oper- ator with orthogonal GDFT bases. Since the adjacency matrix, A, provides the minimal information required to fully reflect the connectivity structure arising from the (c) Backward shifted signal, Sx. (d) Backward shifted signal, Ax. graph topology, and therefore to define the most elemen- tary graph shift, the task of determining an isometric graph shift operator can be formulated as finding the matrix S closest to A in a Hilbert space, that is, max hS, Ai = tr SAT S (e) Forward shifted signal, ST x. (f) Forward shifted signal, AT x. s.t. ST S = SST = I Figure 31: Graph signal shifts by the proposed isometric shift opera- This can be achieved analytically by evaluating the singu- tor, S, evaluated on a directed graph. (a) Directed graph structure. (b) A simple graph signal, x. (c) Backward shifted version of x, lar value decomposition of A, given by A = UΣVT where given by Sx. (d) Backward shifted version, Ax. (e) Forward shifted U, V ∈ RN ×N are respectively the left and right matrix of version, ST x. (f) Forward shifted version, AT x. The red arrows singular vectors, and Σ ∈ RN ×N is the diagonal matrix of indicate the movement of the pulse at vertex n = 2 towards vertices connected with (i) outgoing blue arrows in (a) for a forward shift singular values. In this way, the backward shift operator and (ii) to vertices connected with incoming blue arrows in (a) for a can be expressed as backward shift. S = UQVT (113) in a natural analogy with classical time-frequency analysis where [63, 64, 65]. I(N −1)×(N −1) 0(N −1)×1 It is important to note that, while the concept of win- Q= (114) 01×(N −1) det(UVT ) dow functions for signal localization has been extended ensures that det(S) = 1, so as to produce a proper rotation to signals defined on graphs [18, 66, 67, 68, 69], such ex- matrix. The resulting matrix, S, is called the symmetric tensions are not straightforward, since, owing to inher- orthogonalization of the matrix A, and is unique [59, 60]. ent properties of graphs as irregular but interconnected The solution is also closely related to the orthogonal Pro- domains, even an operation which is very simple in clas- crustes problem [61] and the Kabsch algorithm [62]. sical time-domain analysis, like the time shift, cannot be Example 16: Consider the directed graph in Fig. 31(a) which straightforwardly generalized to graph signal domain. This has N = 8 vertices in the set V = {0, 1, 2, 3, 4, 5, 6, 7}. The has resulted in several approaches to the definition of the backward and forward shifted versions of the signal in Fig. graph shift operator, and much ongoing research in this 31(b) were evaluated using both the elementary shift matrix, domain [18, 66, 67, 68, 69]. A, and the proposed isometric graph shift operator, S in (113). A common approach to signal windowing in the graph Notice that, as desired, the signal energy was preserved when domain is to utilize the eigenspectrum of a graph signal employing the isometric graph shift operator, S, while the en- to obtain window function for each graph vertex [5]. An- ergy of the signals shifted through A increased. other possibility is to define the window support as a local neighborhood for each vertex [69]. In either case, the lo- calization window is defined based on a set of vertices that 7. Vertex-Frequency Representations contain the current vertex, n, and all vertices that are close Oftentimes in practical applications concerned with large in some sense to the vertex n, that is, a neighborhood of graphs, we may not be interested in the analysis of the en- vertex n. In this monograph, special attention is devoted tire graph signal, but rather in its local behavior. Indeed, to the class of local graph Fourier transform approaches the Big Data paradigm has revealed the possibility of using which can be implemented in the vertex domain, since smaller and localized subsets of the available information this domain often offers a basis for numerically efficient to enable reliable mathematical analysis and local char- analysis in the case of very large graphs. acterization of subsets of data of interest [12]. Our aim Notice that, as in classical signal analysis, a localization in this section is to characterize the localized graph signal window should be narrow enough so as to provide good behavior simultaneously in the vertex-frequency domain, localization of signal properties, but at the same time wide 36 enough to produce high resolution in the spectral domain. while matrix U is composed of the eigenvectors uk , with With vertex-frequency analysis serving as a key to graph elements uk (n), k = 0, 1, . . . , N −1, of the graph Laplacian signal estimation, filtering, and efficient representation, as its columns. two forms of the local graph Fourier transform inversion Special cases: are considered here, while the inversion condition is de- fined within the frames framework, that is, based on the • For hm (n) = 1, the localized vertex spectrum is analysis of energy of the graph spectrogram. A relation equal to the standard spectrum, S(m, k) = X(k), between the graph wavelet transform and the local graph in (115) for each m; this means that no vertex local- Fourier transform implementation and its inversion is also ization is performed. established. • If hm (m) = 1 and hm (n) = 0 for n 6= m, the lo- Remark 30: The energy versions of the vertex-frequency calized vertex spectrum √ is equal to the graph signal, representations are also considered, as these representa- S(m, 0) = x(m)/ N , for k = 0. tions can be implemented without a localization window, In the following, we outline ways to create vertex do- and they can serve as estimators of the local smoothness main windows with desirable localization characteristics, index. and address two methods for defining graph localization The reduced interference vertex-frequency distributions, window functions, hm (n): which satisfy the marginal property and localize graph sig- • Spectral domain definition of windows, hm (n), which nal energy in the vertex-frequency domain are also defined, are defined using their spectral basic function. The and are subsequently related to classical time-frequency spectral domain definition of the window is shown to analysis, as a special case. be related to the wavelet transform. Consider a graph with N vertices, n ∈ V = {0, 1, . . . , N − 1}, which are connected with edges whose weights are • Vertex domain window definitions, with one method Wmn . Spectral analysis of graphs is most commonly based bearing a direct relation to the spectral analysis of on the eigendecomposition of the graph Laplacian, L, or the graph window, while the other method represents the adjacency matrix, A. By default, we shall assume a purely vertex domain formulation. the decomposition of the graph Laplacian, L, if not stated otherwise. 7.1.1. Windows Defined in the GDFT Domain The basic function of a window, h(n), can be conve- 7.1. Localized Graph Fourier Transform (LGFT) niently defined in the spectral domain, for example, in the The localized graph Fourier transform (LGFT), de- form noted by S(m, k), can be considered as an extension of H(k) = C exp(−λk τ ), (116) the standard time-localized (short-time) Fourier transform where C denotes the “window amplitude” and τ > 0 is a (STFT), and can be calculated as the GDFT of a signal, constant which determines the window width in the spec- x(n), multiplied by an appropriate vertex localization win- tral domain. Notice that the graph shifted and “modu- dow function, hm (n), to yield lated” versions of this window are straightforwardly ob- N X −1 tained using the generalized convolution of graph signals, S(m, k) = x(n)hm (n) uk (n). (115) defined in Section 3.9. The graph-shifted window in the n=0 vertex domain is then defined by the IGDFT of H(k)uk (m), to give the window localized at the vertex m, denoted by In general, it is desired that a graph window function, hm (n), as in (57), in the form hm (n), should be such that it localizes the signal content N −1 around the vertex m. To this end, its values should be close X to 1 at vertex m and vertices in its close neighborhood, hm (n) = h(n) ∗ δm (n) = H(k)uk (m)uk (n). (117) while it should approach to 0 for vertices that are far from k=0 vertex m. For an illustration of the concept of localization An example of two windows obtained in this way is given window on a graph see Fig. 33, panels (a) and (c). in Fig. 33(a), (b). The localized GDFT in (115) admits a matrix notation, Observe that the exponential function in (116) corre- S, and contains all elements, S(m, k), m = 0, 1, . . . , N − 1, sponds to a Gaussian window in classical analysis (thus of- k = 0, 1, . . . , N − 1. The columns of S which correspond fering the best time-frequency concentration [63, 64, 65]), to a vertex m are given by since graph signal processing on a path graph reduces to classical signal analysis. In this case, the eigenvalues of sm = GDFT{x(n)hm (n)} = UT xm , the graph Laplacian, λ, may be related to the frequency, ω, in classical signal analysis as λ ∼ ω 2 . where xm is the vector of which the elements, x(n)hm (n), Properties of graph window functions. The graph are equal to the graph signal samples, x(n), multiplied by window which is localized at the vertex m, and defined by the window function, hm (n), centered at the vertex m, (117), satisfies the following properties: 37 1 1 0 0 -1 -1 0.5 0.5 0 0 -0.5 -0.5 -1 0 1 (a) -1 0 1 (b) 99 5 79 74 8 93 59 52 51 31 24 13 8280 62 55 32 22 98 76 75 69 28 29 15 7 63 39 26 95 94 78 66 58 27 19 12 1 86 60 4240 37 23 17 81 65 53 54 35 11 6 4 89 67 61 57 48 41 25 20 85 84 44 2 96 92 38 34 68 64 56 45 43 97 83 18 90 88 73 71 50 30 21 0 77 49 91 36 33 9 87 72 70 47 46 1614 10 3 (c) (d) Figure 32: Concept of a signal on a graph. (a) Vertices on a three-dimensional manifold Swiss roll surface. (b) A graph representation on the Swiss roll manifold. (c) Two-dimensional presentation of the three-dimensional graph from (b), with vertex colors defined by the three smoothest graph Laplacian eigenvectors u1 (n), u2 (n), and u3 (n). (d) A signal observed on the graph in (c), which is composed of three Laplacian eigenvectors (signal components). The supports of these three components are designated by different vertex colors. The vertex-frequency representations are then assessed based on their ability to accurately resolve and localize these three graph signal components. 38 W1 : Symmetry, hm (n) = hn (m), which follows from the the value of S(m, k) in (119) becomes the standard STFT, definition in (117). that is W2 : A sum of all coefficients of a localized window, hm (n), N −1 N −1 1 X X 2π 2π 2π is equal to H(0), since S(m, k) = x(n)H(p)ej− N mp ej N np e−j N nk , N 3/2 n=0 p=0 N −1 N −1 N −1 X X X N −1 hm (n) = H(k)uk (m) uk (n) 1 X = x(n)h(n − m)e−j2πnk/N , (123) n=0 k=0 n=0 N n=0 N −1 X √ = H(k)uk (m)δ(k) N = H(0), where h(n) is the inverse DFT of H(k). k=0 Example 17: To illustrate the principle of local vertex-frequency PN −1 √ with n=0 uk (n) = δ(k) N , following from the def- representations, consider the graph and the graph signal from inition of the eigenvectors, uk (n). Fig. 32. A graph with N = 100 vertices, randomly placed on the so called Swiss roll surface, is shown in Fig. 32(a). W3 : The Parseval theorem for hm (n) has the form The vertices are connected with edges whose weights are de- 2 fined as Wmn = exp(−rmn /α), where rmn is the Euclidean N X −1 N X −1 distance between the vertices m and n, measured along the |hm (n)|2 = |H(k)uk (m)|2 . (118) Swiss roll manifold, and α is a constant. Small weight values n=0 k=0 were hard-thresholded to zero, in order to reduce the number of edges associated with each vertex to only a few strongest These properties will be used in the sequel in the inversion ones. The so produced graph is shown in Fig. 32(b), and its analysis of the LGFT. two-dimensional presentation in Fig. 32(c). Vertices are or- Based on the above properties, the LGFT can now be dered so that the values of the Fiedler eigenvector, u1 (n), are written as nondecreasing. N −1 A signal on this graph was created so as to be composed X of parts of three Laplacian eigenvectors. For the subset, V1 , of S(m, k) = x(n)hm (n) uk (n) (119) all vertices, V, which comprises the vertices with indices from n=0 m = 0 to m = 29, the eigenvector with the spectral index N −1 N −1 X X k = 8 was used. For the subset, V2 , with the vertex indices = x(n)H(p)up (m)up (n) uk (n). (120) from m = 30 to m = 59, the signal was equal to the eigenvector n=0 p=0 u66 (n), that is, with k = 66. The remaining vertices form the vertex subset V3 , and the signal on this subset was equal to the The modulated (frequency shifted) version of the window eigenvector with the spectral index k = 27. The amplitudes of centered at a vertex m and for a spectral index k will be these eigenvectors were scaled too. referred to as the vertex-frequency kernel, Hm,k (n), which Consider now the vertex-frequency localization kernels, is defined as −1 Hm,k (n) = hm (n)uk (n), NX Hm,k (n) = hm (n)uk (n) = H(p)up (m)up (n) uk (n). √ shown in Fig. 33. The constant eigenvector, u0 (n) = 1/ N , p=0 was used in the panel shown in Fig. 33(a) at m = 34. In this (121) case, the√localization window, h34 (n), is shown since H34,0 (n) = Using the kernel notation, it becomes obvious that the h34 (n)/ N . The illustration is repeated in the panel in Fig. LGFT in (120), for a given vertex m and a spectral index k, 33(c) for the vertex m = 78. The frequency shifted version of physically represents a projection of a graph signal, x(n), these two vertex-domain kernels, shown in Figs. 33(a) and (c), onto the graph kernel, Hm,k (n), that is, are given respectively in Figs. 33(b) and (d), where Hm,20 (n) = hm (n)u20 (n) is shown for m = 34 and m = 78, respectively. N −1 X Next, the vertex-frequency representation, S(n, k), using S(m, k) = hHm,k (n), x(n)i = Hm,k (n)x(n). (122) the LGFT and the localization window defined in the spectral n=0 domain is shown in Fig. 34. From this representation, we can clearly identify the three constituent signal components, within Remark 31: The classical STFT, a basic tool in time- their intervals of support. The marginal properties, such as frequency analysis, can be obtained as a special case of the the projections of S(n, k) onto the vertex index axis and the GDFT when the graph is directed and circular. For this spectral index axis, are also clearly distinguishable. From the type of graph, the eigendecomposition of the adjacency marginal properties, we can conclude that the considered graph matrix√produces complex-valued eigenvectors of the form signal in hand is spread over all vertex indices, while its spectral uk (n) N = exp(j2πnk/N ). Then, having in mind the localization is dominated by the three spectral indices which complex nature of these eigenvectors, correspond to the three components of the original graph signal. In an ideal case of vertex-frequency analysis, these marginals N X −1 N X −1 should respectively be equal to |x(n)|2 and |X(k)|2 , which is S(m, k) = x(n)H(p)u∗p (m)up (n) u∗k (n), not the case here. n=0 p=0 39 (a) (b) H34,0 (n) = h34 (n)u0 (n) ∼ h34 (n) H78,0 (n) = h78 (n)u0 (n) ∼ h78 (n) (c) (d) H34,20 (n) = h34 (n)u20 (n) H78,20 (n) = h78 (n)u20 (n) 1 1 0 0 -1 -1 0.5 0.5 0 0 -0.5 -0.5 -1 0 1 (e) -1 0 1 (f) H34,0 (n) = h34 (n)u0 (n) ∼ h34 (n) H78,0 (n) = h78 (n)u0 (n) ∼ h78 (n) Figure 33: Illustration of localization kernels, Hm,k (n) = hm (n)uk (n), for vertex-frequency analysis based on spectral domain defined windows PN −1 within the local graph Fourier transform, S(m, k) = √ n=0 x(n)Hm,k (n). (a) Localization kernel H34,0 (n) = h34 (n)u0 (n) ∼ h34 (n), for a constant eigenvector, u0 (n) = 1/ N , centered at the vertex m = 34. (b) The same localization kernel as in (a) but centered at the vertex m = 78. (c) Localization kernel, H34,20 (n) = h34 (n)u20 (n), centered at the vertex m = 34 and frequency shifted by u20 (n). Notice that the variations in kernel amplitude indicate the effects of modulation of the localization window, hm (n). (d) The same localization kernel as in (c), but centered at the vertex m = 78. (e) Three-dimensional representation of the kernel H34,0 (n) = h34 (n)u0 (n). (f) Three-dimensional representation of the kernel H78,0 (n) = h78 (n)u0 (n). 7.1.2. Spectral Domain Localization of the LGFT Remark 32: Recall that the classical time-frequency anal- Recall that the classical STFT admits frequency lo- ysis counterpart of (124) is [63] calization in the spectral domain; this is achieved based N −1 on the DFT of the original signal and a spectral domain 1 X 2π S(m, k) = √ X(p)H(k − p)ej N mp . window. For graph signals, we may also adapt this ap- N p=0 proach to perform signal localization in the spectral do- main, whereby the LGFT is obtained as an inverse GDFT of X(p) that is localized by a spectral domain window, H(k − p), which is centered around spectral index k, that 7.1.3. LGFT Realization with Band-Pass Functions is Assume that the GDFT of the localization window, NX−1 hm (n), corresponds to the transfer function of a band- S(m, k) = X(p)H(k − p) up (m). (124) pass system on a graph, centered at an eigenvalue, λk , and p=0 around it, and that it is defined in the form of a polynomial Note that this form of the LGFT can be entirely imple- given by mented in the graph spectral domain. The spectral domain −1 LGFT form in (124) can be implemented using band-pass Hk (λp ) = h0,k + h1,k λp + · · · + hM −1,k λM p , (126) transfer functions, Hk (λp ) = H(k − p), as with (M − 1) as the polynomial order and k = 0, 1, . . . , K, N −1 where K is the number of spectral bands. The vertex shifted version of the window, hm (n), has X S(m, k) = X(p)Hk (λp ) up (m). (125) p=0 the GDFT of the form, GDFT{h(n)∗δm (n)} = H(p)up (m). Therefore, the inverse GDFT of Hk (λp )up (m) represents a vertex domain kernel, where Hk (λp ) is centered at the 40 0.5 0 100 Classical time-frequency domain kernel. To addi- tionally clarify the previous two forms of kernels, we will observe their special cases for a circular directed graph and 80 write the kernels in the classical time-frequency domain. The kernel defined by (121) uses low-pass function H(k) 60 and assumes the following form N −1 40 1 X 2π 2π 2π Hm,k (n) = H(p)e−j N mp ej N np e−j N kn N 3/2 p=0 20 1 2π 1 = h(n − m)e−j N kn = √ hk (n − m), N N 0 which is shifted for m in time, and modulated√ by the 2π 0.08 kth eigenvector elements u∗k (n) = e−j N kn / N , to achieve 0.06 centering around the spectral index k. 0.04 The classical time-frequency domain form of the kernel 0.02 in (127) is given by 0 0 20 40 60 80 100 N −1 1 X 2π 2π Hm,k (n) = Hk (λp )e−j N mp ej N np N p=0 Figure 34: Local vertex-frequency spectrum calculated using the LGFT and the vertex-frequency localized kernels defined in the spec- N −1 1 X 2π 2π 1 tral domain, as in (121). From this representation, observe that the = H(p − k)e−j N mp ej N np = √ hk (n − m), graph signal consists of three distinct components located at spectral N p=0 N indices k = 8, k = 66, and k = 27, with the corresponding vertex index subsets V1 , V2 , and V3 , where V1 ∪ V2 ∪ V3 = V. The marginal (vertex and spectrum-wise) properties are shown in the panels to the where hk (n−m) is the temporary shifted version of hk (n) = right and below the vertex-frequency representation. Observe that, IGDFT{Hk (λp )} = IDFT{H(k − p)}, which corresponds while the graph signal is spread across all vertices, its spectral con- to the already frequency shifted (band-pass) transfer func- tent is localized at the three spectral indices which correspond to the tion Hk (λp ) = H(p − k). constituent signal components. In an ideal case of vertex-frequency analysis, these marginals should be respectively equal to |x(n)|2 and In the case of kernel (127), the local vertex-frequency |X(k)|2 . transform for a vertex, m, and a spectral index, k, becomes N X −1 spectral index k by definition, while up (m) corresponds to S(m, k) = Hm,k (n)x(n) the shift in the vertex domain which centers the window at n=0 the vertex m. In other words, this kernel, centered around N X −1 N X −1 N X −1 the spectral index k and vertex m, is defined as = x(n)Hk (λp )up (m)up (n) = X(p)Hk (λp )up (m). n=0 p=0 p=0 N −1 X (128) Hm,k (n) = Hk (λp )up (m)up (n). (127) p=0 The relation (128) can be written in a vector form as Remark 33: It is important to emphasize crucial differ- M X −1 ence between the vertex-frequency kernels in (121) and sk = UHk (Λ)UT x = Hk (L)x = hp,k Lp x, (129) (127). The kernel in (121) is defined based on the low- p=0 pass transfer function H(k), such as in (116), appropri- ately shifted in the vertex domain and the spectral do- where sk is the column vector with elements S(m, k), m = main, to be centered at a vertex m and at a spectral in- 0, 1, . . . , N − 1, and the property of the eigendecomposi- dex k. This is achieved involving adequate modulation tion of a matrix polynomial is used in derivation. The terms uk (n) and up (m). The transfer function in the ker- number of bands (shifted transfer functions, Hk (λp ), k = nel given by (127), Hk (λp ), is centered at k by definition 0, 1, . . . , K) is equal to K + 1 and is not related to the total (126). Hence, it is needed to perform the spectral modula- number of indices, N . tion only, by up (n), in order to center the kernel, Hm,k (n), Example 18: Consider the simplest decomposition into a low- at a vertex m. Therefore, the main difference between the pass and high-pass part of a graph signal, with K = 1. In this kernels in (121) and (127) is that the spectral shift in (121) case, the two values, k = 0 and k = 1, represent respectively is achieved by a modulation in the vertex domain using the low-pass part and high-pass part of the graph signal. Such a decomposition can be achieved using the graph Laplacian uk (n), while in (127) the kernel is directly shifted (defined with h0,0 = 1, h0,1 = −1/λmax , and h1,0 = 0, h1,1 = 1/λmax , as a pass-band function) in the spectral domain. 41 where the coefficients are chosen so as to form a simple lin- 1 early decreasing function of λp for the low-pass, and a linearly 0.8 increasing function of λp for the high-pass, in the correspond- ing transfer functions. These low-pass and high-pass transfer 0.6 functions are respectively given by 0.4 λp λp H0 (λp ) = (1 − ), H1 (λp ) = , 0.2 λmax λmax which leads to the vertex domain implementation of the LGFT 0 0 10 20 30 40 50 60 70 80 90 100 in the form (a) 1 1 s0 = (I − L) x, s1 = L x. λmax λmax To improve the spectral resolution, we can employ the same transfer function, but divide the low-pass part into its low-pass 1 and high-pass part. The same can be performed for the high- pass part, to obtain 0.5 L 2 L L L2 s00 = I− x, s01 = 2 I− x, s11 = 2 x. λmax λmax λmax λmax The factor 2 appears in the new middle pass-band, s01 , since 0 the low-high-pass and the high-low-pass components are the 0 1 2 3 4 5 6 7 same. (b) A division into (K +1) bands would correspond to the terms of a binomial form K (I − L/λmax ) + L/λmax x, 1 with the corresponding transfer functions in the vertex domain given by 0.5 ! K 1 K−k 1 k Hk (L) = I− L L . k λmax λmax 0 0 1 2 3 4 5 6 7 Example 19: Consider the transfer functions Hk (λp ), p = (c) 0, 1, . . . , N − 1, k = 0, 1, . . . , K in the spectral domain, cor- responding to the binomial form terms for K = 25, which Figure 35: Exemplar of transfer functions in the spectral domain. (a) The spectral domain transfer functions Hk (λp ), p = 0, 1, . . . , N − are shown in Fig. 35(a). These functions are used for the 1, k = 0, 1, . . . , K which correspond to the binomial form terms LGFT calculation at vertex indices m = 0, 1, . . . , N − 1 in the for K = 50. (b) The transfer functions Hk (λp ), p = 0, 1, . . . , N − k = 0, 1, . . . , K bands for the graph and signal from Fig. 32. 1, k = 0, 1, . . . , K which correspond to the raised cosine (Hann) Since the bands are quite spread out, the resulting LGFT is window form for K = 25. (c) The spectral index-varying (wavelet- also spread along the frequency axis. The frequency concentra- like) transfer functions Hk (λp ), p = 0, 1, . . . , N − 1, k = 0, 1, . . . , K tion can be improved by reassigning the values of S(m, k) to which correspond to the raised cosine (Hann) window form for K = the position of their maximum value along the frequency band 13. The transfer function H9 (λ) is designated by the thick black line for each considered domain, while its discrete values at λp , H9 (λp ), index, k, for each vertex index, m. The so reassigned LGFT are shown in gray, in panels (b) and (c). values are given in Fig. 36. Of course, any band-pass function, Hk (Λ), can be used in (125) or (128) to produce the LGFT in the form where (ak , bk ] and (bk , ck ], k = 1, 2, . . . , K, define the spec- tral bands for Hk (Λ). For uniform bands within 0 ≤ λ ≤ sk = UHk (Λ)UT x = Hk (L)x. (130) λmax , the intervals can be defined by Commonly used examples of such band-pass functions are λmax the spline or raised cosine (Hann window) functions. We ak = ak−1 + K will next use the general form of the shifted raised cosine λmax functions as the transfer functions, defined by bk = ak + K  λmax 2 π ak λ ck = ak + 2 (132) sin 2 bk −ak ( ak − 1) , for ak < λ ≤ bk      K Hk (λ) = cos2 π bk ( λ − 1) , for b < λ ≤ c with a1 = 0 and limλ→0 (a1 /λ) = 1. The initial transfer 2 ck −bk bk k k function, H0 (λ), is defined using only 0 = b0 ≤ λ ≤ c0 =     0, elsewhere, λmax /K, while the last transfer function, HK (λ), is defined  (131) using the interval aK < λ ≤ bK = λmax in (131). 42 100 80 25 60 20 40 15 20 10 5 20 40 60 80 100 20 40 60 80 100 Figure 37: Vertex-frequency representation from Fig. 36(c) with the (a) axis of the eigenvalue index, p, instead of the frequency band index, k. The same value of LGFT, S(m, k), is assigned to each spectral index, p, when λp ∈ ( ak +b 2 k bk +ck , 2 ], and without any scaling. 25 20 The raised cosine transfer function satisfy the following condition XK 15 Hk (λp ) = 1. (133) k=0 10 The conditions for graph signal reconstruction from the LGFT will be discussed in Section 7.2. 5 Example 20: The shifted raised cosine functions, defined by (131) and (132), are shown in Fig. 35(b) for the graph from Fig. 32, for K = 25. These functions are used for the LGFT 20 40 60 80 100 calculation of the graph signal from Fig. 32 at the vertex (b) indices m = 0, 1, . . . , N − 1, and in (K + 1) spectral bands, k = 0, 1, . . . , K. The absolute LGFT values are given in Fig. 36(b). Spectral resolution depends on the number of bands K, 10 with a larger number of spectral bands resulting in a higher spectral resolution. 8 Example 21: The experiment from Examples 19 and 20 is repeated with varying bounds of the spectral intervals in the 6 raised cosine transfer functions Hk (λp ), p = 0, 1, . . . , N − 1, k = 0, 1, . . . , K. The spectral index-varying (wavelet-transform 4 like) form of the raised cosine transfer functions Hk (λp ), p = 0, 1, . . . , N −1, k = 0, 1, . . . , K, is defined by the interval bounds 5 2 λmax (1.5 + p)/11.5 , for p = 0, 1, 2, . . . , 10, 20 40 60 80 100 ak ∈ {0, 0.004, 0.02, 0.07, 0.19, 0.44, 0.9, 1.7, 2.9}, (c) bk ∈ {0.004, 0.02, 0.07, 0.19, 0.44, 0.9, 1.7, 2.9, 4.8}, ck ∈ {0.02, 0.07, 0.19, 0.44, 0.9, 1.7, 2.9, 4.8, 7.63}, Figure 36: Vertex-frequency representation of a three-component sig- nal in Fig. 32(d). (a) The LGFT of the signal from Fig 32(d), calcu- k = 1, 2, . . . , 9, lated using the transfer functions for frequency selection given in Fig. and depicted in Fig. 35(c). The LGFT values, S(m, k), cal- 35(a). The LGFT values, S(m, k), were reassigned to the position culated with the so-obtained transfer functions, Hk (λp ), are of its maximum value along the frequency band index, k, for each vertex index, m. (b) The LGFT of the signal from Fig 32(d), calcu- shown in Fig. 36(c). In order to illustrate the change of resolu- lated using the transfer functions for frequency selection given in Fig. tion in this case, the LGFT was reassigned to each eigenvalue 35(b). The LGFT values, S(m, k), were reassigned to the positions λp , p = 0, 1, . . . , N − 1, and shown in Fig. 37. As in classi- of their maximum values along the frequency band index, k, for each cal wavelet transform, the spectral resolution is lower for the vertex index, m. (c) The LGFT of the signal from Fig 32(d), calcu- higher spectral indices. lated using the wavelet-like transfer functions for frequency selection given in Fig. 35(c). 7.1.4. Signal Adaptive LGFT The spectral graph wavelet-like transform is just an ex- ample of varying spectral transfer functions in the LGFT, 43 where the spectral resolution is the highest (spectral wavelet 1 functions narrowest) for small values of the smoothness in- dex, λp . The spectral resolution decreases as the spectral 0.5 wavelet functions become wider for large smoothness index 0 values, Fig. 35(c). In general, the change of resolution may be arbitrary and signal adaptive, for example, the resolu- tion may be higher for the spectral intervals of λ which are 0 1 2 3 4 5 6 7 (a) rich in signal components and lower within the intervals where there are no signal components. Before introducing an example with a signal adaptive LGFT, we will modify the transfer functions, Hk (λp ), in 1 (131) to satisfy the condition K X 0.5 Hk2 (λp ) = 1. (134) k=0 as this will be important for the frame-based LGFT inver- 0 0 1 2 3 4 5 6 7 sion. (b) Notice that a simple transformation of the transfer functions, Hk (λp ) → Hk2 (λp ), would allow for the condi- PK PK tion k=0 Hk2 (λp ) = 1 to hold instead of k=0 Hk (λp ) = 16 1. This means that a simple removal of squares in the 14 sine and cosine functions in (131) would produce a form PK 12 to satisfy the condition k=0 Hk2 (λp ) = 1. Both of these conditions will be used in Section 7.2 in various approaches 10 to the graph signal reconstruction from the LGFT. 8 By removing the squares in the sine and cosine func- 6 tions in (131), their first derivative loses continuity in λ 4 at the end interval points. In order to preserve continuous derivatives, the arguments in the sine and cosine functions 2 can be mapped by a polynomial, 20 40 60 80 100 vx (x) = x4 (35 − 84x + 70x2 − 20x3 ), for 0 ≤ x ≤ 1, (c) with vx (0) = 0 and vx (1) = 1. In this way, we arrive 100 at the Meyer wavelet-like transfer functions [70] for the LGFT calculation, given by 80  π ak λ    sin 2 vx bk −ak ( ak − 1) , for ak < λ ≤ bk 60   Hk (λ) = cos π v bk λ   2 x ck −bk ( bk − 1) , for bk < λ ≤ ck 40   0, elsewhere.  20 (135) The initial transfer function, k = 0, and the last transfer function, k = K, are calculated using only the half of the 20 40 60 80 100 interval, as explained after the spectral band definition in (d) relation (132). Example 22: The transfer functions of the form defined in Figure 38: A graph signal and transfer functions in the spectral (135) are used with signal adaptive intervals. These intervals domain for a signal adaptive LGFT. (a) Graph signal in the spec- are defined in such a way that they are small (fine) around λ, tral domain, X(p), as a function of the eigenvalues, λp . (b) The spectral domain transfer functions Hk (λp ), p = 0, 1, . . . , N − 1, where a significant signal spectral content is detected, and are k = 0, 1, . . . , K which satisfy the condition K P 2 big (rough) around λ where the signal spectral content is low, k=0 Hk (λp ) = 1, with K = 16. (c) The LGFT of the signal from Fig 32(d), calculated as in Fig. 38(a) and (b). The intervals are narrow (with a high using the transfer functions for frequency selection given in (b). (d) resolution) around the three signal components at λ = 0.38, Vertex-frequency representation from (c) with the eigenvalue (spec- λ = 1.87, and λ = 4.62. Vertex-frequency representation with tral) index, p, axis instead of the frequency band index, k. The same these transfer functions is shown in Fig. 38(c) and (d) with the value of LGFT, S(m, k), is assigned to each spectral index, p, when spectral band index, k, and the assigned eigenvalue (spectral) λp ∈ ( ak +b 2 k bk +ck , 2 ], without any scaling. 44 index, p, as a spectral axis. Fine intervals around the spectral Table 2: Coefficients, hi,k , i = 0, 1, . . . , M − 1, k = 0, 1, . . . , K, for signal components allowed for high spectral resolution repre- the polynomial calculation of the LGFT, sk , of a signal, x , in various sentation, as in Fig. 38(c), with a smaller number of transfer spectral bands, k, shown in Fig. 39(b). The obtained LGFT of the functions K + 1 = 17. A wider interval width for the third three-component signal from Fig. 32(d) is given in Fig. 40(a). component resulted in a lower spectral resolution than in the sk = (h0,k I + h1,k L + h2,k L2 + h3,k L3 + h4,k L4 + h5,k L5 )x case of the other two components. k h0,k h1,k h2,k h3,k h4,k h5,k 7.1.5. Polynomial LGFT Approximation 0 1.062 −1.925 1.168 −0.3115 0.03776 −0.001702 Bandpass LGFT functions, Hk (λ), k = 0, 1, . . . , K, 1 −0.002 1.773 −1.655 0.5357 −0.07250 0.003508 of the form (131) or (135) can be implemented using the 2 −0.154 1.016 −0.601 0.1295 −0.01155 0.000349 Chebyshev finite (M −1)-order polynomial approximation, 3 0.005 −0.301 0.621 −0.2674 0.04200 −0.002225 P̄k,M −1 (λ), k = 0, 1, . . . , K, of the form 4 0.089 −0.748 0.869 −0.3042 0.04217 −0.002040 5 0.060 −0.381 0.319 −0.0704 0.00461 0.000000 M −1 6 −0.024 0.277 −0.430 0.2055 −0.03570 0.002040 ck,0 X 7 −0.076 0.598 −0.714 0.2814 −0.04292 0.002225 P̄k,M −1 (λ) = + ck,m T̄m (λ). (136) 2 m=1 8 −0.027 0.159 −0.122 0.0198 0.00177 −0.000349 9 0.087 −0.699 0.868 −0.3662 0.06140 −0.003508 This leads to the vertex domain implementation of the 10 −0.026 0.220 −0.293 0.1333 −0.02435 0.001536 spectral LGFT form, given by sk = P̄k,M −1 (L)x, operations is needed to calculate the LGFT in the vertex do- main, with (K + 1) spectral bands, using a polynomial whose for k = 0, 1, 2, . . . , K, with order is (M − 1). The number of nonzero elements in the graph Laplacian is denoted by NL . M −1 ck,0 X P̄k,M −1 (L) = + ck,m T̄m (L), (137) 2 m=1 7.1.6. The Spectral Graph Wavelet Transform As in classical signal processing, wavelet coefficients = h0,k I + h1,k L + h2,k L2 + · · · + h(M −1),k LM −1 can be defined as a projection of a graph signal onto the as discussed in Section 3.5.4 and shown in Table 2. The wavelet kernel functions. Assume that the basic form for polynomial form in (137) uses only the (M −1)-neighborhood the wavelet definition in the spectral domain is a band-pass in calculation of the LGFT for each considered vertex, function, H(λ). The wavelet in spectral domain then rep- without the need for eigendecomposition analysis, thus sig- resents a scaled version of H(λ) in scale si , i = 1, 2, . . . , K, nificantly reducing the computational cost. and is denoted by H(si λ). The wavelet kernel is already, Example 23: Consider the shifted transfer functions, Hk (λ), k = by definition, of a high-pass form. Now, in the same way 0, 1, . . . , K, defined by (131) and (132), as in the case of the kernel form of the LGFT in (128), the PKshown in Fig. 39(a), graph wavelet transform (spectral graph wavelet transform for K = 10. Functions Hk (λ) satisfy k=0 Hk (λ) = 1, which is numerically confirmed and designated by the horizontal dot- – SGWT) is defined using the wavelet kernel, ψm,si (n), ted line in 39(a). Each individual transfer function, Hk (λ), is N −1 approximated using the Chebyshev polynomial, P̄k,M −1 , k = X ψm,si (n) = H(si λp )up (m)up (n), (138) 0, 1, . . . , K, as detailed in Section 3.5.4, with three polynomial p=0 orders defined by M = 6, M = 20 and M = 80. These poly- nomial approximations are shown in P Fig. 39(b), (c) and (d). which corresponds to the LGFT kernel, Hm,k (n), defined In each considered case, summations K k=0 P̄k,M −1 (λ) are cal- in (127). This yields the wavelet coefficients given by culated. It can be observed that for different values of M , the summations in all considered cases are very close to 1, thus N X −1 guaranteeing numerically stable invertibility of the LGFT, as W (m, si ) = ψm,si (n)x(n) = discussed later. n=0 The so obtained approximations of transfer functions, Hk (λ), N X −1 N X −1 N X −1 are used for the LGFT based vertex-frequency analysis. Ab- H(si λp )x(n)up (m)up (n) = H(si λp )X(p)up (m). solute LGFT values, calculated for the three-component graph n=0 p=0 p=0 signal from Fig. 32(d), are shown in Fig. 40(a),(b) and (c), for M = 6, M = 20 and M = 80. Low resolution in Fig. The wavelet coefficients may be interpreted as the IGDFT 40(a) is directly related to the imprecise and very wide (with a of H(si λp )X(p), that is low spectral resolution) approximation of the spectral transfer functions for M = 6, in Fig. 39 (b). Notice that high values of W (m, si ) = IGDFT{H(si λp )X(p)}. (139) the polynomial order, (M − 1), increase calculation complexity and require wide vertex neighborhood in the calculation of the Remark 34: We will use the notation H(si λ) = Hi (λ) LGFT. with the corresponding matrix function form Hi (Λ). No- Based on the analysis of calculation complexity in Section tice that this scale-based indexing is opposite to the classi- 3.5.4 we may conclude that an order of KM NL of arithmetic cal frequency band indexing. The largest scale for H(s1 λ), 45 10 1 8 6 0.5 4 0 0 1 2 3 4 5 6 7 2 20 40 60 80 100 (a) 1 10 0.5 8 0 0 1 2 3 4 5 6 7 6 4 1 2 20 40 60 80 100 0.5 (b) 0 0 1 2 3 4 5 6 7 10 8 1 6 4 0.5 2 0 0 1 2 3 4 5 6 7 20 40 60 80 100 (c) Figure 39: Chebyshev approximation of LGFT transfer functions, Figure 40: Vertex-frequency representation of a three-component sig- which correspond to the raised cosine window in the spectral domain. nal in Fig. 32(d). The LGFT is based on raised cosine (Hann win- (a) Original transfer functions Hk (λ), k = 0, 1, . . . , K, for K = 10. dow) like bandpass transfer functions for frequency selection, with PK The dotted horizontal line designates k=0 Hk (λ). (b) Polyno- K = 10, approximated using the Chebyshev polynomials of various mial Chebyshev approximations, P̄k,M −1 (λ), k = 0, 1, . . . , K, with order, as shown in Fig. 39 (b), (c), (d). (a) The LGFT of the signal M = 6. (c) Polynomial Chebyshev approximations, P̄k,M −1 (λ), k = from Fig 32(d), calculated using the Chebyshev polynomial approx- 0, 1, . . . , K, with M = 20. (d) Polynomial Chebyshev approxima- imation of transfer functions given in Fig. 39 (b), with M = 6. tions, P̄k,M −1 (λ), k = 0, 1, . . . , K, with M = 80. The dotted hor- (b) The LGFT of the signal from Fig 32(d), calculated using the izontal line designates PK transfer functions for frequency selection given in Fig. 35(c), with k=0 P̄k,M −1 (λ), which is close to 1 in all considered approximations, thus guaranteeing stable transform in- M = 20. (c) The LGFT of the signal from Fig 32(d), calculated us- vertibility. Transfer function H6 (λ) and approximations, P̄6,M −1 (λ), ing the transfer functions for frequency selection given in Fig. 35(d), are designated by the thick black line. with M = 80. Low resolution in (a) can be directly related with low M = 6 used in approximation in Fig. 39 (b). The resolution is considerably improved for M = 20. 46 1 < s1 λ ≤ M , is obtained for the smallest s1 , 1/s1 < In classical wavelet transforms the dyadic scheme with M = 2 λ ≤ M/s1 , where M > 1 is the coefficient of the scale is commonly used. The last value of the scale factor, sK = changes, which will be explained later. The associated M K /λmax /M , is defined by K and indicates how close the last spectral wavelet transfer function, H(s1 λ) = H1 (λ), corre- wavelet transfer function is to λ = 0. sponds to the highest frequency band. The wavelet trans- The polynomial function, vx (x), is defined by fer function in scale sK , H(sK λ) = HK (λ), is associated vx (x) = x4 (35 − 84x + 70x2 − 20x3 ), for 0 ≤ x ≤ 1, with with the lowest frequency band. Notation for the spectral scale function (low-pass transfer function complementary vx q(0) = vx (0) = 0, vx q(M − 1) = vx (1) = 1. (142) to H(sK λ) within the lowest spectral interval) is G(λ). The wavelet transfer functions, The spectral scale function, G(λ), plays the role of low- pass transfer function with spectral index 0 in the LGFT. Hi (λ) = H(si λ), Therefore, K spectral wavelet transfer functions H(si λ), are of a band-pass type. The main property (condition for the i = 1, 2, . . . , K, along with the scale function G(λ), cover reconstruction) is that the wavelet functions in two successive exactly K + 1 spectral bands as in the LGFT case. scales satisfy the following property According to (47), we can write Hi2 (λ) + Hi+1 2 (λ) wi = Hi (L)x, (140) π si λ π si λ = cos2 vx q( − 1) + sin2 vx q( − 1) = 1, where wi a column vector with elements W (m, si ), m = 2 M 2 M 0, 1, . . . , N − 1. within If Hi (λ) = H(si λ) can be approximated by a polyno- M < si λ ≤ M 2 . mial in λ, Hi (λ) ≈ Pi (λ), then the relation PK This property implies i=1 H 2 (si λ) = 1 for all λ except in the last interval, sK λ ∈ [0, M 2 ]. To handle the low-pass spectral wi ≈ Pi (L)x, (141) components (the interval for λ closest to λ = 0), the low-pass follows, where Pi (L) is a polynomial in the graph Lapla- type scale function, G(λ)), is added in the form cian (see Section 3.5.4 and Example 23).  K−1 0 ≤ λ ≤ M/sK =λmax /M 1, for  Example 24: The wavelet transform (vertex-scale) representa-   tion of a three-component signal in Fig. 32(d), obtained using G(λ) = cos π2 vx q( sM Kλ − 1) , for M < sK λ ≤ M 2 the Meyer-like graph wavelet in the spectral domain, λ, will be    0, elsewhere. illustrated here. As in classical wavelet transform, the wavelet in the first scale should correspond to the high-pass transfer function with nonzero values in the interval λmax /M < λ ≤ Remark 35: The number of wavelet transfer functions, λmax , where M > 1 is the coefficient of the scale changes. In K, does not depend on the other wavelet parameters. A classical wavelet transforms the dyadic scheme with M = 2 is large value of K will only increase the number of intervals commonly used. The scale based indexing is opposite to the and the resolution (producing smaller width of the first classical frequency indexing, where large indices indicate the high frequency content. The Meyer-like graph wavelet in the interval defined by λmax /M K−1 ) toward λ → 0, as shown first scale is defined by [70, 71] in Fig. 41(a), (b), and (c).  sin π v q(s λ − 1) , for 1 < s λ ≤ M, Remark 36: The wavelet transfer functions, H(si λ), in- x 1 1 H(s1 λ) = 2 cluding the low-pass scale function, G(λ), defined in Ex-  0, elsewhere. ample 24 satisfy the relation For 2 ≤ i ≤ K the Meyer-like graph wavelet is given by K X  H 2 (si λ) + G2 (λ) = 1. π sin vx q(si λ − 1) , for 1 < si λ ≤ M     2 i=1  H(si λ) = cos π v q( si λ − 1) , for M < s λ ≤ M 2 2 x M i Example 25: For q = 1, M = 2, and K = 9 the Meyer     0, elsewhere, wavelet functions are given in Fig. 41(a). The Meyer wavelet  functions for q = 3, M = 4/3, K = 13 and q = 9, M = 10/9, where q = 1/(M − 1). The initial interval is defined by s1 = K = 45 are shown in Fig. 41(b) and (c). The vertex-frequency M/λmax , so that 1 < si λ ≤ M corresponds to λmax /M < representation of the signal from Fig. 32 using these three sets λ ≤ λmax , while the other interval bounds are defined using a of wavelet transfer functions are shown in Fig. 42(a),(b), and geometric sequence of scale factors, (c). 1 si = si−1 M = s1 M i−1 = M i. Polynomial SGWT approximation. Chebyshev ap- λmax proximation of the wavelet functions, H(si λ) = Hi (λ), in Observe that the larger the scale factor si (and the scale in- the form dex i), the narrower the transfer function, H(si λ), while the M −1 progression coefficient is ci,0 X P̄i,M −1 (λ) = + ci,m T̄m (λ), (143) M = (q + 1)/q > 1. 2 m=1 47 1 100 0.5 80 60 0 0 1 2 3 4 5 6 7 (a) 40 20 1 20 40 60 80 100 (a) 0.5 100 0 0 1 2 3 4 5 6 7 80 (b) 60 40 1 20 0.5 20 40 60 80 100 0 (b) 0 1 2 3 4 5 6 7 (c) 100 Figure 41: Exemplars of Meyer wavelet functions (acting as transfer functions in the wavelet transform), shown in the spectral domain. 80 (a) Band-pass Meyer wavelet functions H(si λ), i = 1, 2, . . . , K and the low-pass scale function G(λ), for K = 9 and M = 2. (b) Band- 60 pass Meyer wavelet functions H(si λ), i = 1, 2, . . . , K and the low- pass scale function G(λ), for K = 13 and M = 3/2. (c) Band- pass Meyer wavelet functions H(si λ), i = 0, 1, . . . , K and the low- 40 pass function G(λ), for K = 45 and M = 10/9. Transfer functions H(s2 λ), H(s2 λ), H(s5 λ) are designated by the tick black line, for each of the considered setups in (a), (b) and (c), respectively; their 20 values at λp are shown in gray. 20 40 60 80 100 can be used for the vertex domain wavelet transform im- (c) plementation Figure 42: Vertex-frequency representation of a three-component sig- M −1 nal in Fig. 32(d). (a) The Meyer wavelet transform of the signal from ci,0 X Fig 32(d), calculated using the transfer functions for frequency se- P̄i,M −1 (L) = + ci,m T̄m (L), 2 m=1 lection given in Fig. 41(a). (b) The Meyer wavelet transform of the signal from Fig 32(d), calculated using the transfer functions i = 0, 1, 2, . . . , K for frequency selection given in Fig. 41(b). (c) The Meyer wavelet transform of the signal from Fig 32(d), calculated using the Meyer using only the (M − 1)-neighborhood of each considered wavelet transform transfer functions for frequency selection given in Fig. 41(c). Wavelet values were reassigned to spectral indices, vertex, and without any graph Laplacian eigendecompo- p, in order to illustrate the change in resolution. The same value sition analysis. The Chebyshev polynomials can be cal- of SGWT, W (m, k), is assigned to each spectral index, p, when culated recursively, as in (39), with a change of variables λp ∈ ( ak +b 2 k bk +ck , 2 ], without any scaling. and the recursive implementation as described in detail in Examples 5 and 23. 48 7.1.7. Windows Defined Using the Vertex Neighborhood so that a graph signal which is localized around a vertex In order to show that the window, hm (n), which is lo- m, may be formed based on this matrix, as calized at a vertex m can also be defined using the vertex neighborhood, recall that the distance, dmn , between ver- xm (n) = hm (n)x(n) = PD (n, m)x(n). tices m and n is equal to the length of the shortest walk The LGFT representation of a graph signal, x(n), then from vertex m to vertex n, and that dmn takes integer becomes values. Then, the window function can be defined as a N −1 N −1 function of vertex distance, in the form X X S(m, k) = x(n)hm (n) uk (n) = x(n)PD (n, m) uk (n), hm (n) = g(dmn ), n=0 n=0 (145) where g(d) corresponds to any basic window function in with the vertex-frequency kernel given by classical signal processing. For example, we can use the Hann window, given by Hm,k (n) = hm (n)uk (n) = PD (n, m)uk (n). (146) 1 This allows us to arrive at the matrix form of the LGFT, hm (n) = 1 + cos(πdmn /D) , for 0 ≤ dmn < D, given by 2 S = UT PD ◦ [x, x, . . . , x] , (147) where D is the assumed window width. For convenience, window functions for every vertex can where [x, x, . . . , x] is an N × N matrix, the columns of be calculated in a matrix form as follows: which are the signal vectors, x. For a rectangular function g(d) = 1, and for any d < D, • For the vertices for which the distance is dmn = 1, the LGFT can be calculated recursively with respect to the window functions are defined trough an adjacency window width, D, as (neighborhood one) matrix A1 = A. In other words, the vertices which belong to the one-neighborhood SD = SD−1 + UT AD−1 ◦ [x, x, . . . , x] . (148) of a vertex, m, are indicated by unit-value elements in the mth row of the adjacency matrix A (in un- Example 26: Consider the local vertex-frequency represen- weighted graphs). In weighed graphs, the corre- tation of the signal from Fig. 32, using vertex domain defined sponding adjacency matrix A can be obtained from windows. The localization kernels, Hm,k (n) = hm (n)uk (n), are the weighting matrix W as A = sign(W). shown in Fig. 43 for two vertices and two spectral indices. Ob- serve that for the spectral index k = 0, the localization kernel • Window functions for vertices m and n, for which is proportional to the localization function hm (n), given in Fig. the distance is dmn = 2 are defined by the matrix 43(a) and (c) for the vertices m = 34 and m = 78. Frequency modulated forms of these localization functions are shown in A2 = (A A1 ) ◦ (1 − A1 ) ◦ (1 − I), Figs. 43(b) and (d), for the same vertices and k = 20. where the symbol denotes the logical (Boolean) A vertex domain window is next used to analyze the graph signal from Fig. 32. The vertex-frequency representation, matrix product, ◦ is the Hadamard (element-by-element) S(n, k), obtained with the LGFT and the vertex domain lo- product, and 1 is a matrix with all elements equal to calization window is given in Fig. 44. Again, we can observe 1. The nonzero elements of the mth row of the ma- three constituent graph signal components in three distinct ver- trix A A1 then designate the vertices that are con- tex regions. The marginals of S(n, k) are also shown in the right nected to the vertex m with walks of length K = 2 and bottom panels. or lower. It should be mentioned that the element- by-element multiplication of (A A1 ) by matrix Remark 37: Directed graphs. The vertex neighbor- (1 − A1 ) removes the vertices connected with walks hood, as a set of vertices that can be reached from the of length 1, while the multiplication by (1 − I) re- considered vertex by a walk whose length is at most D, moves the diagonal elements from (A A1 ). may be also defined on directed graphs. In this case, this approach corresponds to one-sided windows in classical sig- • For dmn = d ≥ 2, we arrive at a recursive relation for nal analysis. the calculation of a matrix which will give the infor- If we want to define two-sided window, then we should mation about the vertices separated by the distance also include all vertices from which we can reach the con- d. Such a matrix has the form sidered vertex by walk whose length is at most D. This means that for a directed graph we should assume that Ad = (A Ad−1 ) ◦ (1 − Ad−1 ) ◦ (1 − I). (144) vertices with distance dmn = 1 form the considered vertex m are the vertices from which we can reach vertex m with The window matrix for an assumed graph window width, walk of length 1. In this case A1 = A+AT where addition D, can now be defined as is logical operation (Boolean OR). The matrix A2 is PD = g(0)I + g(1)A1 + · · · + g(D − 1)AD−1 , A2 = (A A + AT AT ) ◦ (1 − I) ◦ (1 − A1 ). 49 (a) (b) H34,0 (n) = h34 (n)u0 (n) ∼ h34 (n) H78,0 (n) = h78 (n)u0 (n) ∼ h78 (n) (c) (d) H34,20 (n) = h34 (n)u20 (n) H78,20 (n) = h78 (n)u20 (n) 1 1 0 0 -1 -1 0.5 0.5 0 0 -0.5 -0.5 -1 0 1 (e) -1 0 1 (f) H34,0 (n) = h34 (n)u0 (n) ∼ h34 (n) H78,0 (n) = h78 (n)u0 (n) ∼ h78 (n) Figure 43: Localization kernels for vertex-frequency analysis, Hm,k (n) = hm (n)uk (n), for the case of vertex domain defined windows in the PN −1 local graph Fourier transform, S(m, k) = √ n=0 x(n)Hm,k (n). (a) Localization kernel H34,0 (n) = h34 (n)u0 (n) ∼ h34 (n), for a constant eigenvector, u0 (n) = 1/ N , centered at the vertex m = 34. (b) The same localization kernel as in (a), but centered at the vertex m = 78. (c) Localization kernel, H34,20 (n) = h34 (n)u20 (n), centered at the vertex m = 35 and frequency shifted by u20 (n). Observe the variations in kernel amplitude, which indicate a modulation of the localization window, hm (n). (d) The same localization kernel as in (c), but centered at the vertex m = 78. (e) Three-dimensional representation of the kernel H34,0 (n) = h34 (n)u0 (n). (f) Three-dimensional representation of the kernel H78,0 (n) = h78 (n)u0 (n). This procedure could be continued for walks up to the Example 27: The concentration measure, M(τ ) = kSk1 /kSkF , desired maximal length D. for the signal from Fig. 32, the window given in (116), and for For a circular directed graph in this way, we will get various τ is shown in Fig. 45, along with the optimal vertex the classical STFT with symmetric window. frequency representation. This representation is similar to that shown in Fig. 34, where an empirical value of τ = 3 was used, with the same localization window and kernel form. 7.1.8. Window Parameter Optimization The optimal τ can be obtained in only a few steps through The concentration of local vertex spectrum representa- the iteration tion can be measured using the normalized one-norm [72], as τk = τk−1 − α M(τk−1 ) − M(τk−2 ) , N −1 N −1 1 X X 1 M= |S(m, k)|= kSk1 , (149) with α a step-size parameter. F m=0 F k=0 The optimization of parameter τ can also be achieved where v trough graph uncertainty principle based techniques [32, uN −1 N −1 uX X 14]. F = kSkF = t |S(m, k)|2 m=0 k=0 7.2. Inversion of the LGFT is the Frobenius norm of matrix S. Alternatively, any The inversion relation of the LGFT, calculated using other norm kSkpp , with 0 ≤ p ≤ 1 can be used instead any of the presented localization (window) forms, will next of kSk1 . Recall that norms with p close to 0 are noise be considered in a unified way; the two approaches for the sensitive, while the norm with p = 1 is the only convex LGFT inversion here are: (i) inversion by summation of norm, which hence allows for gradient based optimization LGFT and (ii) kernel based inversion. [72]. 50 1 0 42 100 40 80 38 60 36 40 0 10 20 30 (a) 20 1 0.5 0 100 0 80 0.1 60 0.05 40 0 0 20 40 60 80 100 20 Figure 44: Local vertex-frequency spectrum calculated using the LGFT and vertex neighborhood windows, as in (146). This represen- 0 tation immediately shows that the graph signal consists of three com- ponents located at spectral indices k = 8, k = 66, and k = 27, with 0.06 the corresponding vertex indices in their respective vertex subsets V1 , V2 , and V3 , where V1 ∪ V2 ∪ V3 = V. The marginal properties are 0.04 also given in the panels to the right and below the vertex-frequency representation, and they differ from the ideal ones given respectively 0.02 by |x(n)|2 and |X(k)|2 . 0 0 20 40 60 80 100 (b) 7.2.1. Inversion by the Summation of the LGFT The reconstruction of a graph signal, x(n), from its Figure 45: Principle of the optimization of localization window. (a) local spectrum, S(m, k), can be performed through an in- Measure of the concentration of graph spectrogram for a varying spectral domain window parameter τ . (b) The corresponding opti- verse GDFT of (119), based on the graph windowed signal mal vertex-frequency representation, calculated with τ = 7, together N −1 with its marginals. X x(n)hm (n) = S(m, k) uk (n) (150) k=0 is a projection of the LGFT onto the spectral index axis. followed by a summation over all vertices, m, to yield For windows obtained using the generalized graph shift in N −1 N −1 (144), this conditions is always satisfied since H(0) = 1. 1 X X x(n) = PN −1 S(m, k)uk (n). (151) The condition PN −1 m=0 hm (n) = 1 can be enforced by m=0 hm (n) m=0 k=0 normalizing the elements of the matrix Ad , d = 1, 2, . . . , D− Remark 38: If the windows, hm (n), for every vertex, n, 1 in (144), prior to the calculation of matrix PD , in such satisfy the condition a way that the sum of each of its columns is equal to 1, which allows us to arrive at N X −1 hm (n) = 1, N X −1 N X −1 D−1 X m=0 hm (n) = PD (n, m) = g(d) = const. m=0 m=0 d=1 then the reconstruction does not depend on the vertex index, n, or in other words such reconstruction is vertex In general, the local vertex spectrum, S(m, k), can also independent. This becomes clear from be calculated over a reduced set of vertices, m ∈ M ⊂ V. N X −1 N X −1 N X −1 In this case, the summation over m in the reconstruction x(n) = S(m, k)uk (n) = X(k)uk (n), (152) formula should be executed over only the vertices m ∈ M, m=0 k=0 k=0 while P a vertex-independent reconstruction is achieved if where m∈M hm (n) = 1. N X −1 X(k) = S(m, k) m=0 51 7.2.2. Inversion of the LGFT with Band-Pass Functions Remark 39: This kind of kernel-based inversion is vertex- PM −1 For the LGFT, defined in (129) as sk = p=0 hp,k Lp x, invariant if the sum over all vertices, m, is invariant with the inversion is obtained by a summation over all spectral respect to n and is equal to 1, that is index shifts, k = 0, 1, . . . , K, that is N X −1 K X K N X X −1 K X h2m (n) = 1. (157) p sk = hp,k L x = Hk (L)x = x, (153) m=0 k=0 k=0 p=0 k=0 If the LGFT, S(m, k), is calculated over a reduced set PK of vertices, m ∈ M ⊂ V, then the P vertex independent if k=0 Hk (L) = I. This condition is equivalent to the following spectral domain form reconstruction condition becomes m∈M h2m (n) = 1. (b) For the LGFT with spectral shifted spectral windows, K X defined in (128), the kernel based inversion is of the form Hk (Λ) = I (154) k=0 N X −1 X K PK x(n) = S(m, k)Hm,k (n) (158) since U k=0 Hk (Λ)UT = I and UT U = I. The condi- m=0 k=0 tion in (154) is used to define the transfer functions in Fig. 35. if the following condition K X 7.2.3. Kernel-Based Inversion Hk2 (λp ) = 1 (159) Another approach to the inversion of the local vertex k=0 spectrum, S(m, k), follows the Gabor expansion frame- work [63], whereby the local vertex spectrum, S(m, k), is is satisfied for all λp , p = 0, 1, 2, . . . , N − 1. projected back to the vertex-frequency localized kernels, The inversion formula in (158), with condition (159), Hm,k (n). The inversion for two forms of the LGFT, de- follows from fined as in (120) and (128), will be analyzed. N X −1 X K (a) For the LGFT defined in (120), the sum of all of its S(m, k)Hm,k (n) (160) projections to the localized kernels, Hm,k (n), is m=0 k=0 N −1 X K N −1 N −1 −1 N −1 −1 N −1 N N X X X = X(p)Hk (λp )up (m) Hk (λl )ul (m)ul (n). X X X X S(m, k)Hm,k (n) = S(m, k)hm (n)uk (n) m=0 k=0 p=0 l=0 m=0 k=0 m=0 k=0 N −1 N −1 PN −1 Since m=0 up (m)ul (m) = δ(p − l), the last expression X X = IGDFT{S(m, k)}IGDFT{hm (n)uk (n)} reduces to the graph signal, x(n), k→i k→i m=0 i=0 N X −1 N X −1 K N X X −1 = [x(i)hm (i)][hm (n)δ(n − i)] X(p)Hk (λp )Hk (λp )up (n) = x(n), (161) m=0 i=0 k=0 p=0 N X −1 N X −1 = x(n)h2m (n) = x(n) h2m (n), (155) if the transfer functions, Hk (λp ), k = 0, 1, . . . , K, satisfy m=0 m=0 the condition in (159) for all λp . where IGDFT denotes the inverse GDFT transform. Par- 7.2.4. Vertex-Varying Filtering seval’s theorem for graph signals Filtering in the vertex-frequency domain may be imple- N −1 N −1 mented using a vertex-frequency support function, B(m, k). The filtered LGFT is then given by X X x(n)y(n) = X(k)Y (k) n=0 k=0 Sf (m, k) = S(m, k)B(m, k), was used in the derivation. In this form of the LGFT all possible spectral shifts, k = 0, 1, . . . , N − 1, are used. and the filtered signal, xf (n), is obtained by the inversion The inversion formula for the local vertex spectrum, of Sf (m, k) using the above mentioned inversion methods. S(m, k), which yields the original graph signal, x(n), then The filtering support function, B(m, k), can be obtained, becomes for example, by thresholding noisy values of the local ver- tex spectrum, S(m, k). N −1 N −1 1 X X Example 28: Consider the graph signal, x(n), from Fig. 32(d), x(n) = PN −1 S(m, k)Hm,k (n). (156) 2 also shown in Fig. 46 (a), and its version corrupted by an m=0 hm (n) m=0 k=0 additive white Gaussian noise, at the signal-to-noise ratio of SN Rin = 5.3 dB, given in Fig. 46 (b). The LGFT, S(m, k) 52 or in other words, that the product of the number of nonzero signal values, kxk0 , and the number of its nonzero DFT coefficients, kXk0 , is greater or equal than the total number of signal samples N ; they cannot simultaneously assume small values. To arrive at the uncertainty principle for graph signals, (a) consider a graph signal, x, and its spectral transform, X, in a domain of orthonormal basis functions, uk (n). Then, the uncertainty principle states that [32, 14, 73, 74] 1 kxk0 kXk0 ≥ . (163) maxk,m {|uk (m)|2 } This form of the uncertainty principle is generic, and in- deed for the basis functions uk (n) = √1N exp(j2πnk/N ), (b) the standard DFT uncertainty principle form in (162) fol- lows. Remark 40: Note, however, that in graph signal process- ing, the eigenvectors/basis functions can assume quite dif- ferent forms than in the standard DFT case. For example, when one vertex is loosely connected with other vertices, then max{|uk (m)|2 } → 1 and even kxk0 kXk0 ≥ 1 is pos- (c) sible for the uncertainty condition in (163). This means that, unlike the classical Fourier transform-based Figure 46: Vertex-varying filtering of a graph signal. (a) The original graph signal, x(n), from Fig. 32 (d). (b) The graph signal, x(n), time and frequency domains, a graph signal can be corrupted by an additive white Gaussian noise, at SN Rin = 5.3 dB. well localized in both the vertex and the spectral (c) The graph signal, xf (n), after vertex-varying filtering based on domains. thresholding of the LGFT of noisy graph signal, S(m, k), with the final signal-to-noise ratio SN Rout = 10.36 dB. Example 29: For the graph shown in Fig. 32, we have max{|uk (m)|2 } = 0.8713 k,m of the noisy graph signal is calculated according to (128), us- ing shifted bandpass spectral transfer functions, Hk (λp ), k = which indicates that even kxk0 kXk0 ≥ 1.1478 is possible. In 0, 1, . . . , K, p = 0, 1, . . . , N − 1, given Pby (131) without squares other words, a graph signal for which the number of nonzero (Hk (λp ) → Hk2 (λp )), which allows K k=0 Hk2 (λp ) = 1 to hold, samples, x(n), in the vertex domain is just two, will not violate instead of K the uncertainty principle even if it has just one nonzero GDFT P k=0 Hk (λp ) = 1. In this way, the condition for the inversion (159) is satisfied. The transfer functions, Hk (λp ), oth- coefficient, X(k). erwise correspond to those shown in Fig. 35 (b) with K = 25. The vertex-varying filtering is performed using Sf (m, k) = 7.4. Graph Spectrogram and Frames S(m, k)B(m, k) for m = 0, 1, . . . , N − 1, k = 0, 1, . . . , K, with a simple thresholding-based filtering support function Based on (119), the graph spectrogram can be defined as −1 NX ( 0, for |S(m, k)|< T 2 B(m, k) = |S(m, k)|2 = x(n)hm (n) uk (n) . (164) 1, otherwise, n=0 m = 0, 1, . . . , N − 1, k = 0, 1, . . . , K, with the threshold T = Then, according to Parseval’s theorem, the vertex marginal 0.09 set empirically. The output graph signal, xf (n), is ob- property, which is a projection of |S(m, k)|2 onto the vertex tained using the inversion relation in (158) for the filtered index axis, is given by LGFT, Sf (m, k), and shown in Fig. 46(c). The achieved output SNR was SN Rout = 10.36 dB. N X −1 N X −1 N X −1 2 |S(m, k)| = S(m, k) x(n)hm (n) uk (n) 7.3. Uncertainty Principle for Graph Signals k=0 k=0 n=0 N −1 In classical signal analysis, the purpose of a window X = |x(n)hm (n)|2 , function is to enhance signal localization in the joint time- n=0 frequency domain. However, the uncertainty principle pre- vents the ideal localization in both time and frequency. which would be equal to the signal power, |x(m)|2 , at the Indeed, in the classical DFT analysis the uncertainty prin- vertex m, if hm (n) = δ(m − n). Since this is not the case, ciple states that the vertex marginal property of the graph spectrogram is kxk0 kXk0 ≥ N, (162) 53 equal to the power of the graph signal in hand, smoothed A frame is termed a tight frame if the equality in (168) by the window, hm (n). holds, that is, if Energy of graph spectrogram. For the total energy of N X −1 graph spectrogram, we consequently have |hm (n)|2 = 1, m=0 N −1 N −1 N −1 N −1 what is the same condition as in (157). X X X X |S(m, k)|2 = |x(n)|2 |hm (n)|2 ) . (165) m=0 k=0 n=0 m=0 (b) The LGFT defined in (128) is a tight frame if PN −1 K N −1 K N −1 If m=0 |hm (n)|2 = 1 for all n, then the spectrogram on X X X X the graph is energy unbiased (statistically consistent with |S(m, k)|2 = |X(p)Hk (λp )|2 = Ex , (170) k=0 m=0 k=0 p=0 respect to the energy), that is N X −1 N X −1 N X −1 where Parseval’s theorem for the S(m, k) as the GFT of |S(m, k)| = 2 2 2 |x(n)| = ||x|| = Ex . (166) X(p)Hk (λp ) was used to yield m=0 k=0 n=0 N X −1 N X −1 The LGFT viewed as a frame. A set of functions, |S(m, k)|2 = |X(p)Hk (λp )|2 . S(m, k), is called a frame for the expansion of a graph m=0 p=0 signal, x, if This means that the LGFT in (128) is a tight frame if N X −1 A||x||2 ≤ |S(m, k)|2 ≤ B||x||2 , K X m=0 |Hk (λp )|2 = 1 for p = 0, 1, . . . , N − 1. k=0 where A and B are positive constants. If A = B, the frame is termed Parseval’s tight frame and the signal can This condition is used to define transfer functions in Fig. be recovered as 35(b) and (c). N −1 N −1 From (170), it is straightforward to conclude that the 1 X X graph spectrogram energy is bounded with x(n) = S(m, k)hm (n)uk (n). A m=0 k=0 K N −1 X X The constants A and B govern the numerical stability AEx ≤ |S(m, k)|2 ≤ BEx , (171) k=0 m=0 of recovering the original signal x from the coefficients S(m, k). where A and B are respectively the minimum and the The conditions for two forms of the LGFT, defined as maximum of value of in (120) and (128), to represent frames will be analyzed K next: X g(λp ) = |Hk (λp )|2 . (a) The LGFT, defined as in (120), is a frame, since in k=0 this case Parseval’s theorem holds [42, 75, 44, 55], that is N −1 N −1 7.4.1. Graph Wavelet Transform Inversion The wavelet inversion formula X X 2 2 2 |hm (n)| = |H(k)| |uk (n)| , (167) m=0 k=0 N −1 X K X which allows us to write x(n) = ψ(n, si )W (n, si ) (172) n=0 i=0 N −1 N −1 1 2 X X H (0) ≤ |hm (n)|2 ≤ max|uk (n)|2 |H(k)|2 = γ 2 Eh , can be derived in the same way and under the same con- N m=0 m,k dition as in (158)-(159), where a set of discrete scales for k=0 (168) the wavelet calculation, denoted by s ∈ {s1 , s1 , . . . , sK }, is where γ = maxm,k |uk (n)| and assumed, and ψ(n, s0 ) is used as a notation for the scale function, φ(n), whose spectral transfer function is G(λ), N −1 X as explained in Remark 34. In the same way as in the Eh = |H(k)|2 . LGFT case, it can be shown that the wavelet transform k=0 represents a frame with By multiplying both sides of the above inequalities by N −1 X K ||x||2 , we arrive at X A||x||2 ≤ |W (n, si )|2 ≤ B||x||2 , (173) N −1 N −1 n=0 i=0 1 2 X X H (0)||x||2 ≤ |S(m, k)|2 ≤ ||x||2 γ 2 Eh . (169) N m=0 k=0 54 where [71, 76, 77] where for each vertex, the vertex-frequency energy distri- bution, E(n, k), is defined by [78, 79] A= min g(λ), 0≤λ≤λmax N −1 X B= max g(λ), E(n, k) = x(n)X(k)uk (n) = x(n)x(m)uk (m)uk (n). 0≤λ≤λmax m=0 (175) and the function g(λ) is defined by K X Remark 41: The definition in (175) corresponds to the g(λ) = H 2 (si λ) + G2 (λ). Rihaczek distribution in classical time-frequency analysis i=1 [63, 64, 65]. Observe that based on the Rihaczek distribu- tion and the expression in (175), we may obtain a vertex- The low-pass scale function, G(λ), is added in the recon- frequency representation even without a localization win- struction formula, since all H(si λ) = 0 for λ = 0, as ex- dow. This very important property is also the main advan- plained in Example 24 and Remark 34. It should be men- tage (along with the concentration improvement) of classi- tioned that the spectral functions of the wavelet transform, cal time-frequency distributions with respect to the spec- H(si λ), form Parseval’s frame if trogram and STFT based time-frequency representations. g(λ) = 1. The marginal properties of the vertex-frequency energy distribution, E(n, k), are defined as its projections onto the Since the number of wavelet transform coefficients, W (n, si ), spectral index axis, k, and the vertex index axis, n, to give for each n and i, is greater than the number of signal sam- N −1 N −1 ples, N , this representation is redundant, and this redun- X E(n, k) = |X(k)|2 and X E(n, k) = x2 (n), dancy allows us to implement the transform through a fast n=0 k=0 algorithm, rather than using the explicit computation of all wavelet coefficients [76, 77]. Indeed, for large graphs, it which correspond respectively to the squared spectra, |X(k)|2 , can be computationally too complex to compute the full and the signal power, x2 (n), of the graph signal, x(n). eigendecomposition of the graph Laplacian. A common Example 30: Fig. 47 shows the vertex-frequency distribution, way to avoid this computational burden is to use a poly- E(n, k), of the graph signal from Fig. 32, together with its nomial approximation schemes for H(si λ), i = 1, 2, . . . , K, marginal properties. The marginal properties are satisfied up and G(λ). One such approach is the truncated Chebyshev to the computer precision. Observe also that the localization of polynomial approximation method which is based on the energy is better than in the cases obtained with the localization application of the continuous spectral window functions windows in Figs. 34, 44, and 45. Importantly, the distribution, with Chebyshev polynomials, which admit order-recursive E(n, k), does not employ a localization window. calculation (see Section 3.5.4 and Example 23). If, for a given scale, si , the wavelet function is approximated by 7.5.1. Smoothness Index and Local Smoothness a polynomial in the Laplacian, Pi (L), then the wavelet The smoothness index, l, in graph signal processing transform can be efficiently calculated using plays the role of frequency, ω, in classical spectral analysis. For a graph signal, x, the smoothness index is defined as wi = Pi (L)x, (174) the Rayleigh quotient of the matrix L and vector x, that is (see Section 4.2.1, Part I) where wi a column vector with elements W (m, si ), m = 0, 1, . . . , N − 1. Note that this form corresponds to the xT Lx l= ≥ 0. (176) LGFT form in (129). xT x 7.5. Vertex-Frequency Energy Distributions Remark 42: The expression in (176) indicates that the The energy of a general signal is usually defined as smoothness index can be considered as a measure of the N −1 N −1 N −1 rate of change of a graph signal. Faster changing signals (corresponding to high-frequency signals) have larger val- X X X E= x2 (n) = x(n) X(k)uk (n). n=0 n=0 k=0 ues of the smoothness index. The maximally smooth graph signal is then a constant signal, x(n) = c, for which the This expression can be rearranged into smoothness index is l = 0. In the mathematics literature, the inverse of the smooth- N −1 N −1 N −1 N −1 X X X X ness index is known as the curvature (curvature ∼ 1/l). E= x(n)X(k)uk (n) = E(n, k), While larger values of the smoothness index correspond n=0 k=0 n=0 k=0 to graph signals with larger rates of change (less smooth graph signals), the larger values of curvature would indi- cate smoother graph signals. 55 1 0.5 0 100 2. Assume a piece-wise monocomponent signal x(n) = αi uki (n) for n ∈ Vi , i = 1, 2, . . . , M, 80 where Vi ⊂ V are the subsets of the vertices such that 60 Vi ∩Vj = ∅ for i 6= j, V1 ∪V2 ∪· · ·∪VM = V, that is, ev- ery vertex belongs to only one subset, Vi . Given the monocomponent nature of this signal, within each 40 subset, Vi , the considered signal is proportional to the eigenvector, uki (n). 20 Then, for each interior vertex, n ∈ Vi , i.e., a vertex whose neighborhood lies in the same set, Vi , the local 0 smoothness index is given by αi Luki (n) 0.4 λ(n) = = λki . (179) αi uki (n) 0.2 3. An ideally concentrated vertex-frequency distribu- 0 tion (ideal distribution) can be defined as 0 20 40 60 80 100 I(n, k) ∼ |x(n)|2 δ λk − [λ(n)] , Figure 47: Vertex-frequency energy distribution for the graph sig- nal whose vertex-frequency representation is given in Fig. 34. No whereby it is assumed that the local smoothness in- localization window was used here. dex is rounded to the nearest eigenvalue. This distribution can also be used as a local smooth- Notice that the smoothness index for an eigenvector, ness estimator, since for each vertex, n, the maxi- uk , of the graph Laplacian, L, is equal to its corresponding mum of I(n, k) is positioned at λk = λ(n). An es- eigenvalue, λk , that is timate of the spectral index at a vertex, n, denoted uTk Luk by k̂(n), is then obtained as = λk , (177) uTk uk k̂(n) = arg max{I(n, k)}, k since by definition Luk = λk uk . Remark 43: If the above eigenvectors are the classical so that the estimated local smoothness index be- Fourier transform basis functions, then the smoothness in- comes λ̂(n) = λk̂(n) . This type of estimator is widely dex corresponds to the squared frequency of the considered used in classical time-frequency analysis [63, 64, 65]. basis function, λk ∼ ωk2 , while the curvature corresponds to the squared period in harmonic signals. 4. Local smoothness property. The vertex-frequency distribution, E(n, k), satisfies the local smoothness This makes it possible to define the local smoothness property if index for a vertex n, λ(n), in analogy with the standard instantaneous frequency, ω(t), at an instant t, as [80] PN −1 k=0 λk E(n, k) PN −1 = λ(n). (180) Lx (n) k=0 E(n, k) λ(n) = , (178) x(n) In that case, the centers of masses of the vertex- where it was assumed that x(n) 6= 0 and Lx (n) are the frequency distribution along the spectral index axis, elements of the vector Lx. k, should be exactly at λ = λ(n), and can be used as The properties of the local smoothness include: an unbiased estimator of this graph signal parame- ter. 1. The local smoothness index, λ(n), for a monocom- ponent signal Example 31: The vertex-frequency distribution, defined by x(n) = αuk (n), E(n, k) = x(n)X(k)uk (n), satisfies the local smoothness prop- is vertex independent, and is equal to the global erty in (180), since smoothness index, λk , since PN −1 PN −1 k=0 λk E(n, k) k=0 λk x(n)X(k)uk (n) Lx (n) = = = λ(n). Lx (n) = αLuk (n) = αλk uk (n). PN −1 N −1 x(n) P k=0 E(n, k) k=0 x(n)X(k)u k (n) In the standard time-domain signal analysis, this P −1 The above relation follows from the fact that N k=0 λk X(k)uk (n) property means that the instantaneous frequency of are the elements of the IGDFT of λk X(k). Upon employ- a sinusoidal signal is equal to its global frequency. ing the matrix form of the IGDFT of ΛX, we have UΛX = 56 7 6 20 5 4 40 3 60 2 1 80 0 0 20 40 60 80 100 100 20 40 60 80 100 Figure 48: Local smoothness index, λ(n), of the graph signal from Fig. 32. Figure 49: The sinc kernel of the reduced interference vertex- frequency distribution in the frequency domain. UΛ(UT U)X = (UΛUT )(UX) = Lx. With the notation, Lx (n), for the elements of Lx, we next obtain This is obvious from N −1 X N −1 N −1 N −1 λk X(k)uk (n) = Lx (n). X X X G(n, k) = X(p)X ∗ (q)up (n)u∗q (n) = |x(n)|2 . k=0 k=0 p=0 q=0 The local smoothness index for the graph signal from Fig. 32 is shown in Fig. 48. • The frequency marginal property is satisfied if 7.5.2. Reduced Interference Distributions (RID) on Graphs φ(p, k, p) = δ(p − k). In order to emphasize the close relations with classical time-frequency analysis, in this subsection we will use the Then, the sum over all vertex indices produces complex-sensitive notation for eigenvectors and spectral N −1 N −1 vectors. The frequency domain definition of the energy X G(n, k) = X |X(p)|2 φ(p, k, p) = |X(k)|2 , distribution in (175) is given by n=0 p=0 N −1 PN −1 since n=0 up (n)u∗q (n) = δ(p−q), that is, the eigen- X E(n, k) = x(n)X ∗ (k)u∗k (n) = X(p)X ∗ (k)up (n)u∗k (n). p=0 vectors are orthonormal. Then, the general form of a graph distribution can be de- 7.5.3. Reduced Interference Distribution Kernels fined with the help of a kernel φ(p, k, q), as [81] A straightforward extension of classical time-frequency N X −1 N X −1 kernels to graph signal processing would be naturally based G(n, k) = X(p)X ∗ (q)up (n)u∗q (n)φ(p, k, q). upon exploiting the relation λ ∼ ω 2 , together with an ap- p=0 q=0 propriate exponential kernel normalization. (181) The simplest reduced interference kernel in the frequency- frequency shift domain, which would satisfy the marginal Observe that for φ(p, k, q) = δ(q − k), the graph Rihaczek properties, is the sinc kernel, given by distribution in (175) follows, while the unbiased energy PN −1 PN −1 ( condition k=0 n=0 G(n, k) = Ex is satisfied if 1 , for |k − p|≤ |p − q|, φ(p, k, q) = 1+2|p−q| N X −1 0, otherwise, φ(p, k, p) = 1. k=0 which is is shown in Fig. 49 at the frequency shift corre- sponding to k = 50. The so obtained distribution, G(n, k), may also satisfy Example 32: The sinc kernel was used for a vertex-frequency the vertex and frequency marginal properties, as elabo- representation of the signal from Fig. 32(d), with the results rated bellow. shown in Fig. 50. This representation is a smoothed version of • The vertex marginal property is satisfied if the energy vertex-frequency distribution in Fig. 47, whereby both (vertex and frequency) marginals are preserved. N −1 Remark 44: Marginal properties of graph spectro- X φ(p, k, q) = 1. k=0 gram. A general vertex-frequency distribution can be 57 1 0.5 0 100 in (181), the classical (Rihaczek based) Cohen class of dis- tributions directly follows, where c(k, n) is the distribution kernel in the ambiguity domain [63, 64, 65]. 80 60 8. Conclusion Fundamental ideas of graph signals and their analy- 40 sis have been introduced starting from an intuitive multi- sensor estimation example, frequently considered in tradi- 20 tional data analytics. The concept of systems on graphs has been defined using graph signal shift operators, which generalize the fundamental signal shift concepts in tra- 0 ditional signal processing. In part II of our monograph, the Graph Discrete Fourier Transform (GDFT) has been 0.4 at the core of the spectral domain representation of graph signals and systems on graphs, and has been defined based 0.2 on both the adjacency matrix and graph Laplacian. These 0 spectral domain representations have been used as the ba- 0 20 40 60 80 100 sis to introduce graph signal filtering concepts. Methods for the design of the graph filters have been presented next, Figure 50: Reduced interference vertex-frequency distribution of a including those based on the polynomial approximation. signal whose vertex-frequency representation is given in Fig. 34. The Various ideas related to the sampling of graph signals, and marginal properties are given in the panels to the right and below the particularly, the challenging topic of the subsampling, have vertex-frequency representation, and are equal to their corresponding also been addressed in this part of the monograph. This is ideal forms given by |x(n)|2 and |X(k)|2 . followed by conditions for the recovery of signals on graphs, from a reduced number of samples. The concepts of time- written for the vertex-vertex shift domain as a dual form varying signals on graphs and basic definitions related to of (181), to yield random graph signals have also been reviewed. While traditional approaches for graph analysis, clus- N X −1 N X −1 tering and segmentation consider only graph topology and G(n, k) = x(m)x∗ (l)uk (m)u∗k (l)ϕ(m, n, l), (182) spectral properties of graphs, when dealing with signals on m=0 l=0 graphs, localized analyzes should employed in order to con- sider both data on graphs and the graph topology. Such where ϕ(m, n, l) is the kernel in this domain (the same a unified approach to define and implement graph signal mathematical form as for the frequency-frequency shift do- localization methods, which takes into account both the main kernel). The frequency marginal is then satisfied if PN −1 data on graph and the corresponding graph topology, is n=0 ϕ(m, n, l) = 1 holds, while the vertex marginal is at the core of vertex-frequency analysis. Like in classi- met if ϕ(m, n, m) = δ(m−n). The relation of this distribu- cal time-frequency analysis, the main research efforts have tion with the vertex domain spectrogram (115) is simple, been devoted to linear representations of the graph signals and is given by which include a localization window for enhanced signal discrimination. Several methods for the definition of lo- ϕ(m, n, l) = hn (m)h∗n (l). calization widows in the spectral and vertex domain have However, this kernel cannot satisfy both the frequency and been addressed in Part II of this monograph. Optimiza- vertex P marginal properties, while the unbiased energy con- tion of the window parameters, uncertainty principle, and N −1 dition n=0 ϕ(m, n, m) = 1 reduces to (157). inversion methods have also been discussed. Following classical time-frequency analysis, energy forms of vertex- Remark 45: Classical time-frequency analysis fol- frequency energy and reduced interference distributions, lows as a special case from the general form of graph which do not use localization windows, have also been con- distributions in (181), if the considered graph is a di- sidered, together with their role as an estimator of the local rected circular graph. This becomes obvious upon re- smoothness index. calling that the adjacency matrix eigendecomposition pro- duces complex-valued √ eigenvectors of the form uk (n) = 9. Bibliography exp(j2πnk/N )/ N . With the kernel choice References N −1 −j 2πnk j 2πnp X φ(p, k, q) = φ(p − q, k − p) = c(p − q, n)e N eN [1] J. M. Moura, “Graph signal processing,” in Cooperative and n=0 Graph Signal Processing, P. Djuric and C. Richard, Editors, pp. 239–259, Elsevier, 2018. 58 [2] M. Vetterli, J. Kovačević, and V. Goyal, Foundations of Signal Society, vol. 6, pp. 1–12, 1951. Processing. Cambridge University Press., 2014. [24] P. B. Billingsley, Convergence of Probability Measures. John [3] A. Sandryhaila and J. M. Moura, “Discrete signal processing Wiley & Sons, 1999. on graphs,” IEEE Transactions on Signal Processing, vol. 61, [25] R. Durrett, Probability: Theory and Examples. Cambridge Uni- no. 7, pp. 1644–1656, 2013. versity Press, 1996. [4] V. N. Ekambaram, Graph-structured data viewed through a [26] D. Revuz and M. Yor, Continuous Martingales and Brownian Fourier lens. University of California, Berkeley, 2014. Motion. Springer, 1999. [5] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Van- [27] P. B. Billingsley, Probability and Measure. John Wiley & Sons, dergheynst, “The emerging field of signal processing on graphs: 1995. Extending high-dimensional data analysis to networks and other [28] S. Chen, A. Sandryhaila, and J. Kovačević, “Sampling theory irregular domains,” IEEE Signal Processing Magazine, vol. 30, for graph signals,” in Proc. IEEE International Conference on no. 3, pp. 83–98, 2013. Acoustics, Speech and Signal Processing (ICASSP), pp. 3392– [6] R. Hamon, P. Borgnat, P. Flandrin, and C. Robardet, “Extrac- 3396, 2015. tion of temporal network structures from graph-based signals,” [29] S. Chen, R. Varma, A. Sandryhaila, and J. Kovačević, “Discrete IEEE Transactions on Signal and Information Processing over signal processing on graphs: Sampling theory,” IEEE Transac- Networks, vol. 2, no. 2, pp. 215–226, 2016. tions on Signal Processing, vol. 63, no. 24, pp. 6510–6523, 2015. [7] S. Chen, A. Sandryhaila, J. M. Moura, and J. Kovačević, “Sig- [30] S. Chen, A. Sandryhaila, J. M. Moura, and J. Kovačević, “Sig- nal denoising on graphs via graph filtering,” in Proc. 2014 IEEE nal recovery on graphs: Variation minimization,” IEEE Trans- Global Conference on Signal and Information Processing (Glob- actions on Signal Processing, vol. 63, no. 17, pp. 4609–4624, alSIP), pp. 872–876, 2014. 2015. [8] A. Gavili and X.-P. Zhang, “On the shift operator, graph fre- [31] S. Chen, R. Varma, A. Singh, and J. Kovačević, “Signal re- quency, and optimal filtering in graph signal processing,” IEEE covery on graphs: Fundamental limits of sampling strategies,” Transactions on Signal Processing, vol. 65, no. 23, pp. 6303– IEEE Transactions on Signal and Information Processing over 6318, 2017. Networks, vol. 2, no. 4, pp. 539–554, 2016. [9] L. Stankovic, D. Mandic, M. Dakovic, I. Kisil, E. Sejdic, and [32] M. Tsitsvero, S. Barbarossa, and P. Di Lorenzo, “Signals on A. G. Constantinides, “Understanding the basis of graph sig- graphs: Uncertainty principle and sampling,” IEEE Transac- nal processing via an intuitive example-driven approach,” IEEE tions Signal Processing, vol. 64, no. 18, pp. 539–554, 2016. Signal Processing Magazine, arXiv preprint arXiv:1903.11179, [33] X. Wang, P. Liu, and Y. Gu, “Local-set-based graph signal re- 2019. construction,” IEEE Transactions on Signal Processing, vol. 63, [10] L. Stanković, E. Sejdić, and M. Daković, “Vertex-frequency en- no. 9, pp. 2432–2444, 2015. ergy distributions,” IEEE Signal Processing Letters, vol. 25, [34] L. Stanković, E. Sejdić, S. Stanković, M. Daković, and I. Orović, no. 3, pp. 358–362, 2017. “A tutorial on sparse signal reconstruction and its applications [11] A. Sandryhaila and J. M. Moura, “Discrete signal processing in signal processing,” Circuits, Systems, and Signal Processing, on graphs: Frequency analysis,” IEEE Transactions on Signal pp. 1–58, 2018. Processing, vol. 62, no. 12, pp. 3042–3054, 2014. [35] L. Stanković, Digital signal processing with selected topics. Cre- [12] A. Sandryhaila and J. M. Moura, “Big data analysis with signal ateSpace Independent Publishing Platform, An Amazon.com processing on graphs: Representation and processing of mas- Company, 2015. sive data sets with irregular structure,” IEEE Signal Processing [36] S. K. Narang and A. Ortega, “Downsampling graphs using spec- Magazine, vol. 31, no. 5, pp. 80–90, 2014. tral theory,” in Proc. IEEE International Conference on Acous- [13] A. Venkitaraman, S. Chatterjee, and P. Händel, “Hilbert trans- tics, Speech and Signal Processing (ICASSP), pp. 4208–4211, form, analytic signal, and modulation analysis for graph signal 2011. processing,” arXiv preprint arXiv:1611.05269, 2016. [37] H. Q. Nguyen and M. N. Do, “Downsampling of signals on [14] A. Agaskar and Y. M. Lu, “A spectral graph uncertainty princi- graphs via maximum spanning trees.,” IEEE Transactions on ple,” IEEE Transactions on Information Theory, vol. 59, no. 7, Signal Processing, vol. 63, no. 1, pp. 182–191, 2015. pp. 4338–4356, 2013. [38] S. K. Narang and A. Ortega, “Perfect reconstruction two- [15] X. Yan, B. M. Sadler, R. J. Drost, P. L. Yu, and K. Lerman, channel wavelet filter banks for graph structured data,” IEEE “Graph filters and the z-Laplacian,” IEEE Journal of Selected Transactions on Signal Processing, vol. 60, no. 6, pp. 2786– Topics in Signal Processing, vol. 11, pp. 774–784, Sept 2017. 2799, 2012. [16] X. Wang, J. Chen, and Y. Gu, “Local measurement and recon- [39] S. Segarra, A. G. Marques, G. Leus, and A. Ribeiro, “Inter- struction for noisy bandlimited graph signals,” Signal Process- polation of graph signals using shift-invariant graph filters,” ing, vol. 129, pp. 119–129, 2016. in Proc. 23rd European Signal Processing Conference (EU- [17] S. Segarra and A. Ribeiro, “Stability and continuity of centrality SIPCO), pp. 210–214, 2015. measures in weighted graphs,” IEEE Transactions on Signal [40] A. G. Marques, S. Segarra, G. Leus, and A. Ribeiro, “Sam- Processing, vol. 64, no. 3, pp. 543–555, 2016. pling of graph signals with successive local aggregations.,” IEEE [18] D. I. Shuman, B. Ricaud, and P. Vandergheynst, “Vertex- Transactions Signal Processing, vol. 64, no. 7, pp. 1832–1843, frequency analysis on graphs,” Applied and Computational Har- 2016. monic Analysis, vol. 40, no. 2, pp. 260–291, 2016. [41] A. Anis, A. Gadde, and A. Ortega, “Efficient sampling set se- [19] S.-J. Kim, K. Koh, S. Boyd, and D. Gorinevsky, “\ell 1 trend lection for bandlimited graph signals using graph spectral prox- filtering,” SIAM review, vol. 51, no. 2, pp. 339–360, 2009. ies,” IEEE Transactions on Signal Processing, vol. 64, no. 14, [20] L. Stanković, M. Daković, and E. Sejdić, “Introduction to graph pp. 3775–3789, 2016. signal processing,” in Vertex-Frequency Analysis of Graph Sig- [42] H. Behjat, U. Richter, D. Van De Ville, and L. Sörnmo, “Signal- nals, pp. 3–108, Springer, 2019. adapted tight frames on graphs,” IEEE Transactions on Signal [21] A. Heimowitz and Y. C. Eldar, “A Unified View of Diffusion Processing, vol. 64, no. 22, pp. 6017–6029, 2016. Maps and Signal Processing on Graphs,” In Proceedings of the [43] Y. Tanaka and A. Sakiyama, “M-channel oversampled graph International Conference on Sampling Theory and Applications filter banks,” IEEE Transactions Signal Processessing, vol. 62, (SampTA), pp. 308–312, 2017. no. 14, pp. 3578–3590, 2014. [22] B. Scalzo Dees, L. Stankovic, M. Dakovic, A. G. Constantinides, [44] A. Sakiyama and Y. Tanaka, “Oversampled graph Laplacian and D. P. Mandic, “A Unifying Analysis of Shift Operators on matrix for graph filter banks,” IEEE Transactions on Signal a Graph,” arXiv:1908.01596, 2019. Processing, vol. 62, no. 24, pp. 6425–6437, 2014. [23] M. D. Donsker, “An Invariance Principle for Certain Probabil- [45] N. Tremblay and P. Borgnat, “Subgraph-based filterbanks for ity Limit Theorems,” Memoirs of the American Mathematical graph signals,” IEEE Transactions Signal Processing, vol. 64, 59 no. 15, pp. 3827–3840, 2016. ture notes],” IEEE Signal Processing Magazine, vol. 34, no. 4, [46] J. Leskovec and C. Faloutsos, “Sampling from large graphs,” in pp. 176–182, 2017. Proceedings of the 12th ACM SIGKDD International Confer- [70] Y. Meyer, Wavelets and Operators. Cambridge University ence on Knowledge Discovery and Data Mining, pp. 631–636, Press, 1992. ACM, 2006. [71] N. Leonardi and D. Van De Ville, “Tight wavelet frames on [47] L. Stankovic, D. Mandic, M. Dakovic, and I. Kisil, “An in- multislice graphs,” IEEE Transactions on Signal Processing, tuitive derivation of the coherence index relation in compres- vol. 61, no. 13, pp. 3357–3367, 2013. sive sensing,” IEEE Signal Processing Magazine, arXiv preprint [72] L. Stanković, “A measure of some time–frequency distributions arXiv:1903.11136, 2019. concentration,” Signal Processing, vol. 81, no. 3, pp. 621–631, [48] E. J. Candes, “The restricted isometry property and its implica- 2001. tions for compressed sensing,” Comptes rendus mathematique, [73] M. Elad and A. M. Bruckstein, “Generalized uncertainty prin- vol. 346, no. 9-10, pp. 589–592, 2008. ciple and sparse representation in pairs of bases,” IEEE Trans- [49] N. Perraudin and P. Vandergheynst, “Stationary signal pro- actions on Information Theory, vol. 48, no. 9, pp. 2558–2567, cessing on graphs,” IEEE Transactions on Signal Processing, 2002. vol. 65, no. 13, pp. 3462–3477, 2017. [74] N. Perraudin, B. Ricaud, D. I. Shuman, and P. Vandergheynst, [50] A. G. Marques, S. Segarra, G. Leus, and A. Ribeiro, “Stationary “Global and local uncertainty principles for signals on graphs,” graph processes and spectral estimation,” IEEE Transactions APSIPA Transactions on Signal and Information Processing, on Signal Processing, vol. 65, no. 22, pp. 5911–5926, 2017. vol. 7, no. e3, pp. 1–26, 2018. [51] A. Loukas and N. Perraudin, “Stationary time-vertex signal pro- [75] D. K. Hammond, P. Vandergheynst, and R. Gribonval, cessing,” arXiv preprint arXiv:1611.00255, 2016. “Wavelets on graphs via spectral graph theory,” Applied and [52] S. P. Chepuri and G. Leus, “Subsampling for graph power spec- Computational Harmonic Analysis, vol. 30, no. 2, pp. 129–150, trum estimation,” in Proc. IEEE Sensor Array and Multichan- 2011. nel Signal Processing Workshop (SAM), pp. 1–5, 2016. [76] D. K. Hammond, P. Vandergheynst, and R. Gribonval, “The [53] G. Puy, N. Tremblay, R. Gribonval, and P. Vandergheynst, spectral graph wavelet transform: Fundamental theory and fast “Random sampling of bandlimited signals on graphs,” Applied computation,” in Vertex-Frequency Analysis of Graph Signals, and Computational Harmonic Analysis, vol. 44, no. 2, pp. 446– pp. 141–175, Springer, 2019. 475, 2016. [77] H. Behjat and D. Van De Ville, “Spectral design of signal- [54] C. Zhang, D. Florêncio, and P. A. Chou, “Graph signal process- adapted tight frames on graphs,” in Vertex-Frequency Analysis ing - A probabilistic framework,” Microsoft Research, Redmond, of Graph Signals, pp. 177–206, Springer, 2019. WA, USA, Tech. Rep. MSR-TR-2015-31, 2015. [78] L. Stanković, E. Sejdić, and M. Daković, “Vertex-frequency en- [55] B. Girault, “Stationary graph signals using an isometric graph ergy distributions,” IEEE Signal Processing Letters, vol. 25, translation,” in Proc. 23rd European Signal Processing Confer- no. 3, pp. 358–362, 2018. ence (EUSIPCO), pp. 1516–1520, 2015. [79] L. Stanković, M. Daković, and E. Sejdić, “Vertex-frequency en- [56] B. Girault, P. Gonçalves, and É. Fleury, “Translation on graphs: ergy distributions,” in Vertex-Frequency Analysis of Graph Sig- An isometric shift operator,” IEEE Signal Processing Letters, nals (L. Stanković and E. Sejdić, eds.), pp. 377–415, Springer, vol. 22, no. 12, pp. 2416–2420, 2015. 2019. [57] B. Scalzo Dees, L. Stankovic, M. Dakovic, A. G. Constantinides, [80] M. Daković, L. Stanković, and E. Sejdić, “Local smooth- and D. P. Mandic, “Unitary Shift Operators on a Graph,” ness of graph signals,” Mathematical Problems in Engineering, arXiv:1909.05767, 2019. vol. 2019, 2019. [58] P. A. Fillmore, “The Shift Operator,” American Mathematical [81] L. Stanković, E. Sejdić, and M. Daković, “Reduced interfer- Monthly, vol. 81, no. 7, pp. 717–723, 1974. ence vertex-frequency distributions,” IEEE Signal Processing [59] P. O. Löwdin, “On the Non-Orthogonality Problem Connected Letters, vol. 25, no. 9, pp. 1393–1397, 2018. with the Use of Atomic Wave Functions in the Theory of Molecules and Crystals,” Journal of Chemical Physics, vol. 18, no. 3, pp. 365–375, 1950. [60] P. O. Löwdin, “On the Nonorthogonality Problem,” Advances in Quantum Chemistry, vol. 5, pp. 185–199, 1970. [61] P. H. Schönemann, “A Generalized Solution of the Orthogonal Proctrustes Problem,” Psychometrika, vol. 31, pp. 1–10, 1966. [62] W. Kabsch, “A Solution for the Best Rotation to Related Two Sets of Vectors,” Acta Crystallographica, vol. 32, p. 922, 1976. [63] L. Stanković, M. Daković, and T. Thayaparan, Time-frequency signal analysis with applications. Artech House, 2014. Graph Signal Processing – Part III: [64] L. Cohen, Time-frequency Analysis. Electrical Engineering Sig- nal Processing, Prentice Hall PTR, 1995. Learning Graph Topology [65] B. Boashash, Time-frequency signal analysis and processing: A comprehensive reference. Academic Press, 2015. L. Stankovic, D. Mandic, M Dakovic, [66] D. I. Shuman, B. Ricaud, and P. Vandergheynst, “A windowed graph Fourier transform,” in Proc. IEEE Statistical Signal Pro- M. Brajovic, B. Scalzo, A. G. Constantinides cessing Workshop (SSP), pp. 133–136, 2012. [67] X.-W. Zheng, Y. Y. Tang, J.-T. Zhou, H.-L. Yuan, Y.-L. Wang, L.-N. Yang, and J.-J. Pan, “Multi-windowed graph Fourier frames,” in Proc. IEEE International Conference on Machine Learning and Cybernetics (ICMLC), vol. 2, pp. 1042–1048, 2016. [68] M. Tepper and G. Sapiro, “A short-graph Fourier transform via personalized pagerank vectors,” in Proc. IEEE Interna- tional Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4806–4810, 2016. [69] L. Stanković, M. Daković, and E. Sejdić, “Vertex-frequency analysis: A way to localize graph spectral components [lec- 60

References (81)

J. M. Moura, "Graph signal processing," in Cooperative and Graph Signal Processing, P. Djuric and C. Richard, Editors, pp. 239-259, Elsevier, 2018.
M. Vetterli, J. Kovačević, and V. Goyal, Foundations of Signal Processing. Cambridge University Press., 2014.
A. Sandryhaila and J. M. Moura, "Discrete signal processing on graphs," IEEE Transactions on Signal Processing, vol. 61, no. 7, pp. 1644-1656, 2013.
V. N. Ekambaram, Graph-structured data viewed through a Fourier lens. University of California, Berkeley, 2014.
D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Van- dergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83-98, 2013.
R. Hamon, P. Borgnat, P. Flandrin, and C. Robardet, "Extrac- tion of temporal network structures from graph-based signals," IEEE Transactions on Signal and Information Processing over Networks, vol. 2, no. 2, pp. 215-226, 2016.
S. Chen, A. Sandryhaila, J. M. Moura, and J. Kovačević, "Sig- nal denoising on graphs via graph filtering," in Proc. 2014 IEEE Global Conference on Signal and Information Processing (Glob- alSIP), pp. 872-876, 2014.
A. Gavili and X.-P. Zhang, "On the shift operator, graph fre- quency, and optimal filtering in graph signal processing," IEEE Transactions on Signal Processing, vol. 65, no. 23, pp. 6303- 6318, 2017.
L. Stankovic, D. Mandic, M. Dakovic, I. Kisil, E. Sejdic, and A. G. Constantinides, "Understanding the basis of graph sig- nal processing via an intuitive example-driven approach," IEEE Signal Processing Magazine, arXiv preprint arXiv:1903.11179, 2019.
L. Stanković, E. Sejdić, and M. Daković, "Vertex-frequency en- ergy distributions," IEEE Signal Processing Letters, vol. 25, no. 3, pp. 358-362, 2017.
A. Sandryhaila and J. M. Moura, "Discrete signal processing on graphs: Frequency analysis," IEEE Transactions on Signal Processing, vol. 62, no. 12, pp. 3042-3054, 2014.
A. Sandryhaila and J. M. Moura, "Big data analysis with signal processing on graphs: Representation and processing of mas- sive data sets with irregular structure," IEEE Signal Processing Magazine, vol. 31, no. 5, pp. 80-90, 2014.
A. Venkitaraman, S. Chatterjee, and P. Händel, "Hilbert trans- form, analytic signal, and modulation analysis for graph signal processing," arXiv preprint arXiv:1611.05269, 2016.
A. Agaskar and Y. M. Lu, "A spectral graph uncertainty princi- ple," IEEE Transactions on Information Theory, vol. 59, no. 7, pp. 4338-4356, 2013.
X. Yan, B. M. Sadler, R. J. Drost, P. L. Yu, and K. Lerman, "Graph filters and the z-Laplacian," IEEE Journal of Selected Topics in Signal Processing, vol. 11, pp. 774-784, Sept 2017.
X. Wang, J. Chen, and Y. Gu, "Local measurement and recon- struction for noisy bandlimited graph signals," Signal Process- ing, vol. 129, pp. 119-129, 2016.
S. Segarra and A. Ribeiro, "Stability and continuity of centrality measures in weighted graphs," IEEE Transactions on Signal Processing, vol. 64, no. 3, pp. 543-555, 2016.
D. I. Shuman, B. Ricaud, and P. Vandergheynst, "Vertex- frequency analysis on graphs," Applied and Computational Har- monic Analysis, vol. 40, no. 2, pp. 260-291, 2016.
S.-J. Kim, K. Koh, S. Boyd, and D. Gorinevsky, "\ell 1 trend filtering," SIAM review, vol. 51, no. 2, pp. 339-360, 2009.
L. Stanković, M. Daković, and E. Sejdić, "Introduction to graph signal processing," in Vertex-Frequency Analysis of Graph Sig- nals, pp. 3-108, Springer, 2019.
A. Heimowitz and Y. C. Eldar, "A Unified View of Diffusion Maps and Signal Processing on Graphs," In Proceedings of the International Conference on Sampling Theory and Applications (SampTA), pp. 308-312, 2017.
B. Scalzo Dees, L. Stankovic, M. Dakovic, A. G. Constantinides, and D. P. Mandic, "A Unifying Analysis of Shift Operators on a Graph," arXiv:1908.01596, 2019.
M. D. Donsker, "An Invariance Principle for Certain Probabil- ity Limit Theorems," Memoirs of the American Mathematical Society, vol. 6, pp. 1-12, 1951.
P. B. Billingsley, Convergence of Probability Measures. John Wiley & Sons, 1999.
R. Durrett, Probability: Theory and Examples. Cambridge Uni- versity Press, 1996.
D. Revuz and M. Yor, Continuous Martingales and Brownian Motion. Springer, 1999.
P. B. Billingsley, Probability and Measure. John Wiley & Sons, 1995.
S. Chen, A. Sandryhaila, and J. Kovačević, "Sampling theory for graph signals," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3392- 3396, 2015.
S. Chen, R. Varma, A. Sandryhaila, and J. Kovačević, "Discrete signal processing on graphs: Sampling theory," IEEE Transac- tions on Signal Processing, vol. 63, no. 24, pp. 6510-6523, 2015.
S. Chen, A. Sandryhaila, J. M. Moura, and J. Kovačević, "Sig- nal recovery on graphs: Variation minimization," IEEE Trans- actions on Signal Processing, vol. 63, no. 17, pp. 4609-4624, 2015.
S. Chen, R. Varma, A. Singh, and J. Kovačević, "Signal re- covery on graphs: Fundamental limits of sampling strategies," IEEE Transactions on Signal and Information Processing over Networks, vol. 2, no. 4, pp. 539-554, 2016.
M. Tsitsvero, S. Barbarossa, and P. Di Lorenzo, "Signals on graphs: Uncertainty principle and sampling," IEEE Transac- tions Signal Processing, vol. 64, no. 18, pp. 539-554, 2016.
X. Wang, P. Liu, and Y. Gu, "Local-set-based graph signal re- construction," IEEE Transactions on Signal Processing, vol. 63, no. 9, pp. 2432-2444, 2015.
L. Stanković, E. Sejdić, S. Stanković, M. Daković, and I. Orović, "A tutorial on sparse signal reconstruction and its applications in signal processing," Circuits, Systems, and Signal Processing, pp. 1-58, 2018.
L. Stanković, Digital signal processing with selected topics. Cre- ateSpace Independent Publishing Platform, An Amazon.com Company, 2015.
S. K. Narang and A. Ortega, "Downsampling graphs using spec- tral theory," in Proc. IEEE International Conference on Acous- tics, Speech and Signal Processing (ICASSP), pp. 4208-4211, 2011.
H. Q. Nguyen and M. N. Do, "Downsampling of signals on graphs via maximum spanning trees.," IEEE Transactions on Signal Processing, vol. 63, no. 1, pp. 182-191, 2015.
S. K. Narang and A. Ortega, "Perfect reconstruction two- channel wavelet filter banks for graph structured data," IEEE Transactions on Signal Processing, vol. 60, no. 6, pp. 2786- 2799, 2012.
S. Segarra, A. G. Marques, G. Leus, and A. Ribeiro, "Inter- polation of graph signals using shift-invariant graph filters," in Proc. 23rd European Signal Processing Conference (EU- SIPCO), pp. 210-214, 2015.
A. G. Marques, S. Segarra, G. Leus, and A. Ribeiro, "Sam- pling of graph signals with successive local aggregations.," IEEE Transactions Signal Processing, vol. 64, no. 7, pp. 1832-1843, 2016.
A. Anis, A. Gadde, and A. Ortega, "Efficient sampling set se- lection for bandlimited graph signals using graph spectral prox- ies," IEEE Transactions on Signal Processing, vol. 64, no. 14, pp. 3775-3789, 2016.
H. Behjat, U. Richter, D. Van De Ville, and L. Sörnmo, "Signal- adapted tight frames on graphs," IEEE Transactions on Signal Processing, vol. 64, no. 22, pp. 6017-6029, 2016.
Y. Tanaka and A. Sakiyama, "M-channel oversampled graph filter banks," IEEE Transactions Signal Processessing, vol. 62, no. 14, pp. 3578-3590, 2014.
A. Sakiyama and Y. Tanaka, "Oversampled graph Laplacian matrix for graph filter banks," IEEE Transactions on Signal Processing, vol. 62, no. 24, pp. 6425-6437, 2014.
N. Tremblay and P. Borgnat, "Subgraph-based filterbanks for graph signals," IEEE Transactions Signal Processing, vol. 64, no. 15, pp. 3827-3840, 2016.
J. Leskovec and C. Faloutsos, "Sampling from large graphs," in Proceedings of the 12th ACM SIGKDD International Confer- ence on Knowledge Discovery and Data Mining, pp. 631-636, ACM, 2006.
L. Stankovic, D. Mandic, M. Dakovic, and I. Kisil, "An in- tuitive derivation of the coherence index relation in compres- sive sensing," IEEE Signal Processing Magazine, arXiv preprint arXiv:1903.11136, 2019.
E. J. Candes, "The restricted isometry property and its implica- tions for compressed sensing," Comptes rendus mathematique, vol. 346, no. 9-10, pp. 589-592, 2008.
N. Perraudin and P. Vandergheynst, "Stationary signal pro- cessing on graphs," IEEE Transactions on Signal Processing, vol. 65, no. 13, pp. 3462-3477, 2017.
A. G. Marques, S. Segarra, G. Leus, and A. Ribeiro, "Stationary graph processes and spectral estimation," IEEE Transactions on Signal Processing, vol. 65, no. 22, pp. 5911-5926, 2017.
A. Loukas and N. Perraudin, "Stationary time-vertex signal pro- cessing," arXiv preprint arXiv:1611.00255, 2016.
S. P. Chepuri and G. Leus, "Subsampling for graph power spec- trum estimation," in Proc. IEEE Sensor Array and Multichan- nel Signal Processing Workshop (SAM), pp. 1-5, 2016.
G. Puy, N. Tremblay, R. Gribonval, and P. Vandergheynst, "Random sampling of bandlimited signals on graphs," Applied and Computational Harmonic Analysis, vol. 44, no. 2, pp. 446- 475, 2016.
C. Zhang, D. Florêncio, and P. A. Chou, "Graph signal process- ing -A probabilistic framework," Microsoft Research, Redmond, WA, USA, Tech. Rep. MSR-TR-2015-31, 2015.
B. Girault, "Stationary graph signals using an isometric graph translation," in Proc. 23rd European Signal Processing Confer- ence (EUSIPCO), pp. 1516-1520, 2015.
B. Girault, P. Gonçalves, and É. Fleury, "Translation on graphs: An isometric shift operator," IEEE Signal Processing Letters, vol. 22, no. 12, pp. 2416-2420, 2015.
B. Scalzo Dees, L. Stankovic, M. Dakovic, A. G. Constantinides, and D. P. Mandic, "Unitary Shift Operators on a Graph," arXiv:1909.05767, 2019.
P. A. Fillmore, "The Shift Operator," American Mathematical Monthly, vol. 81, no. 7, pp. 717-723, 1974.
P. O. Löwdin, "On the Non-Orthogonality Problem Connected with the Use of Atomic Wave Functions in the Theory of Molecules and Crystals," Journal of Chemical Physics, vol. 18, no. 3, pp. 365-375, 1950.
P. O. Löwdin, "On the Nonorthogonality Problem," Advances in Quantum Chemistry, vol. 5, pp. 185-199, 1970.
P. H. Schönemann, "A Generalized Solution of the Orthogonal Proctrustes Problem," Psychometrika, vol. 31, pp. 1-10, 1966.
W. Kabsch, "A Solution for the Best Rotation to Related Two Sets of Vectors," Acta Crystallographica, vol. 32, p. 922, 1976.
L. Stanković, M. Daković, and T. Thayaparan, Time-frequency signal analysis with applications. Artech House, 2014.
L. Cohen, Time-frequency Analysis. Electrical Engineering Sig- nal Processing, Prentice Hall PTR, 1995.
B. Boashash, Time-frequency signal analysis and processing: A comprehensive reference. Academic Press, 2015.
D. I. Shuman, B. Ricaud, and P. Vandergheynst, "A windowed graph Fourier transform," in Proc. IEEE Statistical Signal Pro- cessing Workshop (SSP), pp. 133-136, 2012.
X.-W. Zheng, Y. Y. Tang, J.-T. Zhou, H.-L. Yuan, Y.-L. Wang, L.-N. Yang, and J.-J. Pan, "Multi-windowed graph Fourier frames," in Proc. IEEE International Conference on Machine Learning and Cybernetics (ICMLC), vol. 2, pp. 1042-1048, 2016.
M. Tepper and G. Sapiro, "A short-graph Fourier transform via personalized pagerank vectors," in Proc. IEEE Interna- tional Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4806-4810, 2016.
L. Stanković, M. Daković, and E. Sejdić, "Vertex-frequency analysis: A way to localize graph spectral components [lec- ture notes]," IEEE Signal Processing Magazine, vol. 34, no. 4, pp. 176-182, 2017.
Y. Meyer, Wavelets and Operators. Cambridge University Press, 1992.
N. Leonardi and D. Van De Ville, "Tight wavelet frames on multislice graphs," IEEE Transactions on Signal Processing, vol. 61, no. 13, pp. 3357-3367, 2013.
L. Stanković, "A measure of some time-frequency distributions concentration," Signal Processing, vol. 81, no. 3, pp. 621-631, 2001.
M. Elad and A. M. Bruckstein, "Generalized uncertainty prin- ciple and sparse representation in pairs of bases," IEEE Trans- actions on Information Theory, vol. 48, no. 9, pp. 2558-2567, 2002.
N. Perraudin, B. Ricaud, D. I. Shuman, and P. Vandergheynst, "Global and local uncertainty principles for signals on graphs," APSIPA Transactions on Signal and Information Processing, vol. 7, no. e3, pp. 1-26, 2018.
D. K. Hammond, P. Vandergheynst, and R. Gribonval, "Wavelets on graphs via spectral graph theory," Applied and Computational Harmonic Analysis, vol. 30, no. 2, pp. 129-150, 2011.
D. K. Hammond, P. Vandergheynst, and R. Gribonval, "The spectral graph wavelet transform: Fundamental theory and fast computation," in Vertex-Frequency Analysis of Graph Signals, pp. 141-175, Springer, 2019.
H. Behjat and D. Van De Ville, "Spectral design of signal- adapted tight frames on graphs," in Vertex-Frequency Analysis of Graph Signals, pp. 177-206, Springer, 2019.
L. Stanković, E. Sejdić, and M. Daković, "Vertex-frequency en- ergy distributions," IEEE Signal Processing Letters, vol. 25, no. 3, pp. 358-362, 2018.
L. Stanković, M. Daković, and E. Sejdić, "Vertex-frequency en- ergy distributions," in Vertex-Frequency Analysis of Graph Sig- nals (L. Stanković and E. Sejdić, eds.), pp. 377-415, Springer, 2019.
M. Daković, L. Stanković, and E. Sejdić, "Local smooth- ness of graph signals," Mathematical Problems in Engineering, vol. 2019, 2019.
L. Stanković, E. Sejdić, and M. Daković, "Reduced interfer- ence vertex-frequency distributions," IEEE Signal Processing Letters, vol. 25, no. 9, pp. 1393-1397, 2018. Graph Signal Processing -Part III: Learning Graph Topology L. Stankovic, D. Mandic, M Dakovic, M. Brajovic, B. Scalzo, A. G. Constantinides

FAQs

How do graph signals enhance spatial data analysis compared to classical methods?add

Graph signals accommodate irregular data relations and maintain spatial awareness, outperforming classical uniform grid analyses. For instance, representing multisensor temperature data on graphs highlighted significant geospatial variations overlooked by traditional linear methods.

What are the key implications of using the Localized Graph Fourier Transform (LGFT)?add

LGFT enables precise vertex-frequency localization, aiding in efficient graph signal representation and reconstruction. This method demonstrates superior energy concentration, aligning closely with classical time-frequency analysis benefits.

What findings support the effectiveness of graph-based filtering over traditional methods?add

Graph-based filtering significantly increased signal-to-noise ratios (SNR) from 14.2 dB to 19.8 dB during temperature estimates, showcasing enhanced noise reduction capabilities via localized averaging. Traditional methods failed to consider spatial relationships, resulting in inferior performance.

How does graph signal processing address subsampling challenges in large datasets?add

Graph signal processing employs structured sampling strategies that exploit graph topology, allowing stable reconstruction from reduced data points. The techniques ensure efficient data recovery even in scenarios where traditional methods struggle with dimensionality.

What is the role of the graph Laplacian in defining systems on graphs?add

The graph Laplacian facilitates spectral decomposition, enabling the formulation of linear systems that incorporate local neighborhood effects and enhance signal processing on graphs. This operator captures the smoothness characteristics of graph signals effectively.

About the author