1.1 Eigen-stratified models
Stratified models are models that depend in an arbitrary way on a selected categorical feature (or set of features) that takes K values, and depend linearly on the other features. For example in a date-stratified model we might have a different linear model for each day of the year, with K = 365. Laplacian regularization can be added to exploit some known relations among the categorical features, expressed as a graph. In our date-stratified example, Laplacian regularization encourages the models for adjacent dates to be close, including the January 1 and December 31 models. In this example, the underlying graph is a cycle with 365 vertices.
Laplacian regularization can greatly improve the performance of a stratified model, especially in the low-data regime. In particular, it allows us to form a reasonable model even when we have no training data for some values of the categorical variable. The number of parameters in a Laplacian-regularized stratified model is K, the number of values of the categorical feature, times the size of the base model, which can be quite large when the catgorical values take many values. For example, a date-stratified model contains 365 times more coefficients than the base model. This is one of the challenges that we address in this paper.
Laplacian regularization encourages the model parameters to vary smoothly across the graph that encodes our prior information about the categorical values. If the model parameters vary smoothly across the graph, it is reasonable to assume they can be well approximated as linear combinations of a modest number of the eigenvectors of the associated graph Laplacian associated with the smallest eigenvalues. Our idea is simple: We impose the constraint that the model parameters are linear combinations of some number m of the bottom eigenvectors of the graph Laplacian. We refer to such a model as an eigen-stratified model. The resulting eigen-stratified model uses only a factor m parameters more than the base model, compared to a factor K for a general stratified model. In addition to this savings in model size, insisting that the model parameters are linear combinations of the bottom m eigenvectors acts as an additional useful regularization, that enforces smooth variation of the model parameters across the graph.
In our date-stratified example, the bottom eigenvector is constant, and the next ones occur in sine and cosine pairs, with periods one year, a half year, one-third of year, and so on. Using m = 7, say, requires that the model parameters are Fourier series with 7 terms (i.e., constant plus three harmonics). So here the eigen-stratified model is very natural.
In more complex cases, the eigen-stratified model is far less obvious. For example, the underlying graph can contain multiple edge weights, which are hyper-parameters. In any but the simplest cases, we do not have analytical expressions for the eigenvalues and eigenvectors, but they are readily computed, even for very large graphs.
1.2 Related work
Model approximations. It is quite common to approximate a larger model with a smaller, but only slightly less accurate model. In signal processing, discrete signals are transformed into a basis where they may be approximated by a linear combination of a small number of basis vectors, such as complex exponentials or cosines, in order to achieve significant size compression at the cost of signal degradation, which in many cases is minimal [ANR74, OS09].
Categorical embeddings. Learning low-dimensional vector representations of discrete variables is consistently used as a method to handle categorical features. Embeddings are a popular tool in fields such as natural language processing, to embed text as continuous vectors [Elm90, MCCD13, GB16]. We can associate with each vertex the m coefficients of the bottom Laplacian eigenvectors. This gives a Laplacian or spectral embedding of the features into
Spectral graph theory. The study of properties of graphs through their Laplacian eigen-decomposition is a long studied field in graph theory [Chu97, CSKSV18]. Three example applications include spectral clustering [NJW02], which is a form of dimensionality reduction that uses the the eigen-decomposition of the graph Laplacian to cluster nodes in a graph; finding the fastest Mixing Markov process on a graph, whose convergence guarantees rely on the spectrum of the graph’s Laplacian matrix (namely, the Fiedler eigenvalue of the Laplacian) [BDX04, SBXD06, BDPX09]; and graph coloring [Bro41, Br´e79], where the goal is to assign one of a set of colors to a graph node such that no two adjacent nodes share a color. Graph coloring is an NP-hard task in general, but ideas from spectral graph theory are naturally used as heuristics to sub-optimally solve this problem [AG84].
Laplacian regularization in large-scale optimization. There are many general methods to solve convex optimization problems with Laplacian regularization. Examples include the alternating direction method of multipliers (ADMM) [TBB19], majorization-minimization (MM) [THB19], and Anderson accelerated Douglas-Rachford splitting [FZB19]. In addition, the idea of applying Laplacian approximations to large-scale optimization problems has been studied in the past, where one approximates the graph Laplacian by a linear combination of the eigenvectors to solve extremely large semidefinite programs in, e.g., maximum variance unfolding [WSZS07].
1.3 Outline
In 2.1 we review stratified models, fixing our notation; in
2.2 we formally describe the eigen-stratified model fitting problem, and in
3, we give a distributed solution method. In
4 we give some simple numerical examples, carried out using an accompanying opensource implementation of our method.
In this section, we give a brief overview of stratified models; see [TBB19] for much more detail.
2.1 Stratified models
We fit a model to data records of the form (is the feature over which we stratify,
is the other features, and
is the outcome, label, or dependent variable. The feature and label spaces X and Y are arbitrary data types; the stratified feature values Z, however, must consist of only K possible values, which we denote as Z = {1, . . . , K}.
A stratified model is built on top of a base model, which models pairs (x, y) (or, when x is absent, just y). The base model is parametrized by a parameter vector stratified model, we use a different value of the parameter
for each value of z. We denote these parameters as
is the parameter value used when z = k. We let
denote the parameter values for the stratified model, where
(In [TBB19], the individual parameter vectors were stacked into one vector of dimension nK; here it will be more convenient to assemble them into a matrix.) To choose the parameters
, we minimize
The first term is the sum of K local objective functions, with the kth local objective function consisting of a local loss of the form
with loss function , and local regularizer
values of the regularizer encode constraints on allowable into a matrix.) Choosing
minimize
) gives the regularized empirical risk minimization model parameters, based only on the data records that take the particular value of the stratification feature z = k.
The second term ) in (1) measures the non-smoothness of the model parameters over
be a symmetric matrix with nonnegative entries. The associated Laplacian regularization or Dirichlet energy is the function
We can associate the Laplacian regularization with a graph with K vertices, with an edge (i, j) for each positive , with weight
. We can express the Laplacian regularization as the positive semidefinite quadratic form
where is the (weighted) Laplacian matrix associated with the weighted graph, given by
We note that the Laplacian regularization ) is separable in the rows of
We refer to the model obtained by solving (1) as a standard stratified model. When the loss function and local regularization function r are convex, the objective in (1) is convex, which implies that a global solution can be found efficiently [BV04]. When this assumption does not hold, heuristic methods can be used to approximately solve (1).
2.2 Eigen-stratified models
The eigen-decomposition of the Laplacian matrix L is
where Λ , a diagonal matrix consisting of the eigenvalues of L, is of the form Λ =
is a matrix of orthonormal eigenvectors of L. Since L1 = 0, where 1 is the vector with all entries one, we have
[Spi10]. (When the graph is connected,
is unique, and
0.) In many cases, the eigenvectors and eigenvalues of a graph Laplacian matrix can be computed analytically; in Appendix A, we mention a few of these common graphs and give their eigenvectors and eigenvalues. For
, we refer to
as the bottom m eigenvalues, and
eigenvectors. They are an orthonormal basis of the subspace of
smoothest, i.e., minimizes
, subject to ˜
Roughly speaking, functions on Z that are smooth should be well approximated by a linear combination of the bottom m eigenvectors (for suitable m).
Assuming that has low Dirichlet energy, i.e., a small Laplacian regularization term, we conclude that its rows are well approximated by a linear combination of the bottom m eigenvectors. This motivates us to impose a further constraint on the rows of
: They must be linear combinations of the bottom m eigenvectors of L. This can be expressed as
where are the (factorized) model parameters and ˜
are the bottom m eigenvectors of L.
Adding the constraint (4) to the Laplacian regularized stratified model fitting problem (1), we obtain the problem
where now both are optimization variables, coupled by the equality constraint. We can express the Laplacian regularization term in (5) directly in terms of Z as
where Λ) is the diagonal matrix of the eigenvalues corresponding to the bottom m eigenvectors of L. We refer to the model obtained by solving (5) as an eigen-stratified model.
We note that the sum of empirical losses and local regularization are clearly separable in the columns of , and the Laplacian regularization is a separable function in the rows of Z.
Comparison to standard stratified models. With standard stratified models, we allow arbitrary variations of the model parameter across the graph. With eigen-stratified models, we sharply limit how
varies across the graph by constraining
to be a linear combination of the m bottom eigenvectors of the graph.
Storage. The standard stratified model requires us to store Kn model parameters. An eigen-stratified model, on the other hand, stores m(K + n) variables in the eigenvectors ˜Q and the factorized model parameters Z. This implies that when storage savings is significant.
are convex, then (5) is a convex problem, which is readily solved globally in an efficient manner. It is easily formulated using domain specific languages for convex optimization [BV04, GBY06, GB14, DB16, FNB19]. If any of the
nonconvex, it is a hard problem to solve (5) globally. In this case, our method (described in
3) will provide a good heuristic approximate solution.
The two extremes. For a given set of edge weights, we analyze the behavior of the eigen-stratified model as we vary m. When we take m = 1 and the graph is connected, we recover the common model (i.e., a stratified model with all equal). We can see this by noting that when m = 1 and the graph is connected, the constraint in (5) becomes a consensus constraint (recall that the bottom eigenvector of a Laplacian matrix is a scalar multiple of 1). If we take m = K, the eigen-stratified model is the same as the standard stratified model.
In this section we describe a distributed algorithm for solving the fitting problem (5). To derive the algorithm, we first express (5) in the equivalent form
where are the (scaled) dual variables associated with the two constraints in (6), respectively, and
0 is the penalty parameter. The ADMM algorithm (in scaled dual form) for the splitting (
consists of the iterations
If the are convex, the iterates
are guaranteed to converge to each other and
are guaranteed to converge to a primal optimal point of (6) [BPC
This algorithm can be greatly simplified (and parallelized) by making use of a few observations. Our first observation is that the first step in ADMM can be expressed as
where is the proximal operator of the function g [PB14], and
This means that we can compute at the same time, since they do not depend on each other. Also, we can compute
in parallel. Our second observation is that the second step in ADMM can be expressed as
Complexity. Generally, the dominant cost of this algorithm depends on the complexity of computing a single proximal operator of . Otherwise, the dominant costs are in multiplying a dense
matrix, a dense
matrix, and a diagonal
together.
In this section, we illustrate the efficacy of the proposed method on two simple and relatively small examples.
Software implementation. An implementation of our method for fitting an eigen-stratified model is given as an extension of the stratified model fitting implementation in [TBB19], available at www.github.com/cvxgrp/strat_models (along with the accompanying examples). To fit an eigen-stratified model, one may invoke
Here, data are the problem data (i.e., (z, x, y) or (z, y)) and num_eigen is the number of bottom eigenvectors to use in the eigen-stratified model (i.e., m); if num_eigen is None, a standard Laplacian-regularized stratified model is fit.
4.1 Cardiovascular disease prediction
We consider the problem of predicting whether a patient has cardiovascular disease, given their sex, age, and other medical features.
Dataset. We use data describing approximately 70000 patients across the world [Kag19]. The dataset is comprised of males and females between the ages of 39 and 65 (inclusive), with approximately 50% of the patients diagnosed with cardiovascular disease.
There are 9 raw medical features in this dataset, which include: height, weight, systolic blood pressure (a categorical feature with values “below average”, “average”, and “above average”), diastolic blood pressure (a categorical feature with values “below average”, “average”, and “above average”), cholesterol, glucose levels, whether or not the patient smokes, whether or not the patient drinks alcohol, and whether or not the patient undergoes regular physical activity. We randomly partition the data into a training set consisting of 5% of the records, a validation set containing 5% of the records, and a test set containing the remaining 90% of the records. We choose extremely small training and validation sets to illustrate the efficacy of stratified models in low-data regimes.
Data records. We performed basic feature engineering on the raw medical features to derive a feature vector = 14), namely scalarization and converting the systolic blood pressure and diastolic blood pressure basic categorical features into multiple features via one-hot encoding, and adding a constant feature. The outcomes
whether or not the patient has contracted cardiovascular disease, with y = 1 meaning the patient has cardiovascular disease. The stratification feature z is a tuple of the patient’s sex and age; e.g., z = (Male, 47) corresponds to a 47 year old male. The number of stratification feature values is thus
Data model. We model the conditional probability of contracting cardiovascular disease given the features using a logistic regression base model (with intercept). We use logistic loss and sum of squares regularization, , with associated hyper-parameter
Regularization graph. We take the Cartesian product of two regularization graphs:
• Sex. The regularization graph is a path graph that has one edge between male and female, with edge weight
• Age. The regularization graph is a path graph between ages, with edge weight
Figure 1 illustrates the structure of this regularization graph. From Appendix A, the eigenvectors of the regularization graph’s Laplacian, and j = 1, . . . , 27, are given in closed-form as
where denotes a Kronecker product,
1), and cos(
) is applied elementwise. (It is convenient for the eigenvectors to be indexed by two numbers, corresponding to the sex and age subgraphs that make up the regularization graph.)
Figure 1: Regularization graph for
Table 1: Results for
Figure 2 plots 8 of the 54 eigenvectors of the sex/age regularization graph Laplacian, with the particular sex/age edge weights (175), sorted in increasing order corresponding to the bottom 8 eigenvalues of the Laplacian.
Results. For each of the fitting methods, we ran a crude hyper-parameter search over their hyper-parameters and selected hyper-parameters that performed well over the validation set. For the separate model, we used = 35, and for the common model, we used
(Recall that the separate model is a stratified model with all edge weights zero, and a common model is a stratified model with all edge weights +
[TBB19].) For the standard stratified model, we used
= 150. For the eigen-stratified model, we used
= 175, and m = 5. Table 1 shows the average negative log likelihood (ANLL) over the training, validation, and test datasets for the separate, common, standard stratified and eigen-stratified models. We see that this test ANLL attains a minimum when only 5 bottom eigenvectors are used for the eigen-stratified model. This minimum test ANLL of the eigen-stratified model is competitive with (in fact, slightly smaller than) the test ANLL of the standard stratified model.
In the eigen-stratified model with m = 5, the model parameters are linear combinations of 5 bottom eigenvectors. There are 54 = 756 parameters in the standard stratified model to store, whereas the eigen-stratified model with minimum test ANLL stores
(14 + 54) = 340 values, or approximately 45% as many parameters. So there is some storage efficiency gain even in this very simple example.
Figure 2: Heatmaps of the eigenvectors of the sex/age regularization graph Laplacian corresponding to the bottom 8 eigenvalues of the Laplacian.
4.2 Weather distribution modeling
We consider the problem of modeling the distribution of weather temperature as a function of week of year and hour of day.
Data records and dataset. We use temperature measurements from the city of Atlanta, Georgia for all of 2013 and 2014, sampled every hour (for a total of approximately 17500 measurements). The temperature is in Celsius; we round the temperatures to the nearest integer. There are n = 43 unique temperatures, ranging from -9 to 33 Celsius. Each data record includes the temperature, as well as the week of the year and the hour of the day (which will be the stratification features). The number of stratification features is
We partition the dataset into three separate sets; a training set consisting of 30% of the data, a validation set consisting of 35% of the data, and a held-out test set consisting of the remaining 35% of the data. The model is trained on approximately 4.2 samples per stratification feature.
Data model. We model the distribution of temperature in Atlanta at each week and hour using a non-parametric discrete distribution [TBB19]. Our local regularizer is a sum of two regularizers: a sum of squares regularizer and a scaled sum of squares regularizer on the difference between adjacent parameters, and
; the associated hyper-parameters with each are
. The distribution
at each node z is calculated as
where exp() is evaluated elementwise.
Regularization graph. We take the Cartesian product of two regularization graphs:
• Week of year. The regularization graph is a cycle graph with 52 nodes (one for each week of the year) with edge weights
• Hour of day. The regularization graph is a cycle graph with 24 nodes (one for each hour of the day) with edge weights
The Cartesian product of these two graphs is a torus, illustrated in figure 3. This graph has K = 1248 eigenvectors. The eigenvectors of this graph are given by
Figure 3: Regularization graph for 4.2. Each node corresponds to a week of year and hour of day, where the toroidal direction corresponds to increasing week of year, and the poloidal direction corresponds to increasing hour of day.
where
for i = 1, . . . , 26 and j = 1, . . . , 12, u = (0, . . . , 51), v = (0, . . . , 23), and cos(are applied elementwise. Figures 4 and 5 plots the bottom 10 eigenvectors of the week/hour regularization graph Laplacian, with the particular week/hour edge weights (
(.45, .55), sorted in increasing order corresponding to the bottom 10 eigenvalues of the Laplacian.
Results. For each of the fitting methods, we ran a crude hyper-parameter search over their hyper-parameters and selected hyper-parameters that performed well over the validation set. For the separate model, we used 3, and for the common model, we used
55. For the standard stratified model, we used
5. For the eigen-stratified model, we used
= 90 (roughly 7% of the 52
24 = 1248 eigenvectors). We compare the ANLLs over the training, validation, and test datasets for the separate, common, standard stratified and eigen-stratified models in table 2. The validation and held-out test ANLLs of the eigen-stratified model were smaller than the respective ANLLs of every other model in table 2, including those of the standard stratified model.
Figure 4: Heatmaps of the eigenvectors of the week/hour regularization graph Laplacian corresponding to
Figure 5: Heatmaps of the eigenvectors of the week/hour regularization graph Laplacian corresponding to
Table 2: Results for
In figure 6 we plot the cumulative distribution functions (CDFs) of temperature for week 1, hour 1; week 28, hour 12; and week 51, hour 21 (which have 2, 3, and 2 empirical measurements in the training dataset, respectively), for the eigen-stratified model, and for the test empirical data.
In figure 7, we plot heatmaps of the expected value and standard deviation of the distributions given by the eigen-stratified model. The statistics vary smoothly as hours of day and weeks of year vary.
We gratefully acknowledge discussions with Shane Barratt and Peter Stoica, who provided us with useful suggestions that improved the paper.
Figure 6: CDFs of various weeks of the year and hours of the day, as given by the eigen-stratified model, along with their corresponding empirical CDFs.
Figure 7: Heatmaps of the expected value (top) and standard deviation (bottom) of the distributions given by the eigen-stratified model.
[AG84] B. Aspvall and J. R. Gilbert. Graph coloring using eigenvalue decomposition. SIAM Journal on Matrix Analysis and Applications, 5(4):526–13, 12 1984.
[ANR74] N. Ahmed, T. Natarajan, and K. R. Rao. Discrete cosine transform. IEEE Transactions on Computers, C-23(1):90–93, Jan 1974.
[BDPX09] S. Boyd, P. Diaconis, P. Parrilo, and L. Xiao. Fastest mixing markov chain on graphs with symmetries. SIAM Journal on Optimization, 20(2):792819, June 2009.
[BDX04] S. Boyd, P. Diaconis, and L. Xiao. Fastest mixing markov chain on a graph. SIAM Review, 46(4):667689, April 2004.
[BH12] A. E. Brouwer and W. H. Haemers. Spectra of Graphs. Springer, New York, NY, 2012.
[BPCS. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundation and Trends in Machine Learning, 3(1):1–122, 2011.
[Br´e79] D. Br´elaz. New methods to color the vertices of a graph. Communications of the ACM, 22(4):251–256, April 1979.
[Bro41] R. L. Brooks. On colouring the nodes of a network. Mathematical Proceedings of the Cambridge Philosophical Society, 37:194197, 1941.
[But08] S. K. Butler. Eigenvalues and Structures of Graphs. PhD thesis, University of California, San Diego, 2008.
[BV04] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[Chu97] F. R. K. Chung. Spectral Graph Theory. American Mathematical Society, 1997.
[CSKSV18] D. Cohen-Steiner, W. Kong, C. Sohler, and G. Valiant. Approximating the spectrum of a graph. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’18, pages 1263– 1271, 2018.
[DB16] S. Diamond and S. Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(1):2909–2913, 2016.
[Elm90] J. L. Elman. Finding structure in time. Cognitive Science, 14(2):179 – 211, 1990.
[FNB19] A. Fu, B. Narasimhan, and S. Boyd. CVXR: An R package for disciplined convex optimization. Journal of Statistical Software, 2019.
[FZB19] A. Fu, J. Zhang, and S. Boyd. Anderson accelerated Douglas-Rachford splitting. http://stanford.edu/~boyd/papers/a2dr.html, 2019.
[GB14] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx, March 2014.
[GB16] C. Guo and F. Berkhahn. Entity embeddings of categorical variables. CoRR, abs/1604.06737, 2016.
[GBY06] M. Grant, S. Boyd, and Y. Ye. Disciplined convex programming. In Global Optimization: From Theory to Implementation, Nonconvex Optimization and Its Application Series, pages 155–210. Springer, 2006.
[JM85] W. N. Anderson Jr. and T. D. Morley. Eigenvalues of the laplacian of a graph. Linear and Multilinear Algebra, 18(2):141–145, 1985.
[Kag19] Kaggle. Cardiovascular disease dataset. https://www.kaggle.com/sulianova/ cardiovascular-disease-dataset, 2019.
[Lan50] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. Journal of Research of the National Bureau of Standards, 45, October 1950.
[MCCD13] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv e-prints, Jan 2013.
[Mer98] R. Merris. Laplacian graph eigenvectors. Linear Algebra and its Applications, 278(1):221 – 236, 1998.
[NJW02] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 849–856. MIT Press, 2002.
[ON70] I. U. Ojalvo and M. Newman. Vibration modes of large structures by an automatic matrix-reduction method. AIAA Journal, 8(7):1234–1239, 1970.
[OS09] A. V. Oppenheim and R. W. Schafer. Discrete-Time Signal Processing. Prentice Hall Press, Upper Saddle River, NJ, USA, 3rd edition, 2009.
[Pai71] C. C. Paige. The computation of eigenvalues and eigenvectors of very large sparse matrices. PhD thesis, University of London, 1971.
[PB14] N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization, 1(3):127–239, 2014.
[SBXD06] J. Sun, S. Boyd, L. Xiao, and P. Diaconis. The fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem. SIAM Review, 48(4):681699, November 2006.
[Spi10] D. Spielman. Algorithms, graph theory, and linear equations in Laplacian matrices. In Proceedings of the International Congress of Mathematicians, pages 2698–2722. World Scientific, 2010.
[TBB19] J. Tuck, S. Barratt, and S. Boyd. A distributed method for fitting Laplacian regularized stratified models. arXiv preprint arXiv:1904.12017, 2019.
[THB19] J. Tuck, D. Hallac, and S. Boyd. Distributed majorization-minimization for Laplacian regularized problems. IEEE/CAA Journal of Automatica Sinica, 6(1):45–52, January 2019.
[WSZS07] K. Q. Weinberger, F. Sha, Q. Zhu, and L. K. Saul. Graph Laplacian regularization for large-scale semidefinite programming. In B. Sch¨olkopf, J. C. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 1489–1496. MIT Press, 2007.
[ZLY09] Y. Zhang, X. Liu, and X. Yong. Which wheel graphs are determined by their Laplacian spectra? Computers and Mathematics with Applications, 58(10):1887 – 1890, 2009.
Figure 8: A path graph with 8 vertices and unit weights.
The direct relation of a graph’s structure to the eigenvalues and eigenvectors of its corresponding graph Laplacian is well-known [JM85]. In some cases, mentioned below, we can find them analytically, especially when the graph has many symmetries. The eigenvectors are given in normalized form (= 1.) Outside of these common graphs, many other simple graphs can be analyzed analytically; see, e.g., [BH12].
A note on complex graphs. If a graph is complex, i.e., there is no analytical form for its graph Laplacian’s eigenvalues and eigenvectors, the bottom eigenvalues and eigenvectors of the Laplacian of a graph can be computed extremely efficiently by, e.g., the Lanczos algorithm or other more exotic methods. We refer the reader to [Lan50, ON70, Pai71] for these methods.
Path graph. A path or linear/chain graph is a graph whose vertices can be listed in order, with edges between adjacent vertices in that order. The first and last vertices only have one edge, whereas the other vertices have two edges. Figure 8 shows a path graph with 8 vertices and unit weights.
Eigenvectors of a path graph Laplacian with K nodes and unit edge weights are given by
where 1) and cos(
) is applied elementwise. The eigenvalues are 2
2 cos(
Cycle graph. A cycle graph or circular graph is a graph where the vertices are connected in a closed chain. Every node in a cycle graph has two edges. Figure 9 shows a cycle graph with 10 vertices and unit weights.
Eigenvectors of a cycle graph Laplacian with K nodes and unit weights are given by
where 1) and cos(
) are applied elementwise. The eigenvalues are 2
Figure 9: A cycle graph with 10 vertices and unit weights.
Figure 10: A star graph with 10 vertices (9 outer vertices) and unit weights.
Figure 11: A wheel graph with 11 vertices (10 peripheral vertices) and unit weights.
Star graph. A star graph is a graph where all of the vertices are only connected to one central vertex. Figure 10 shows an example of a star graph with 10 vertices (9 outer vertices) and unit weights.
Eigenvectors of a star graph with K vertices (1 outer vertices) and unit edge weights are given by
where th basis vector in
. The smallest eigenvalue of this graph is zero, the largest eigenvalue is K, and all other eigenvalues are 1.
Wheel graph. A wheel graph with K nodes consists of a center (hub) vertex and a ring of 1 peripheral vertices, each connected to the hub [BDPX09]. Figure 11 shows a wheel graph with 11 vertices (10 peripheral vertices) and unit weights.
Eigenvectors of a wheel graph with K vertices (1 peripheral vertices) are given by [ZLY09]
where 1) and cos(
) are applied elementwise. The smallest eigenvalue of the graph is zero, the largest eigenvalue is K, and the middle eigenvalues are given by 3
2, with multiplicity 2 [But08].
Figure 12: A complete graph with 8 vertices and unit weights.
Complete graph. A complete graph contains every possible edge; we assume here the edge weights are all one. The first eigenvector of a complete graph Laplacian with K nodes is , and the other
1 eigenvectors are any orthonormal vectors that complete the basis. The eigenvalues are 0 with multiplicity 1, and K with multiplicity
Figure 12 shows an example of a complete graph with 8 vertices and unit weights.
Complete bipartite graph. A bipartite graph is a graph whose vertices can be decomposed into two disjoint sets such that no two vertices share an edge within a set. A complete bipartite graph is a bipartite graph such that every pair of vertices in the two sets share an edge. We denote a complete bipartite graph with vertices on the first set and
on the second set as an (
)-complete bipartite graph. We have that
the convention that
. Figure 13 illustrates an example of a complete bipartite graph with (
6) and unit weights.
Eigenvectors of an ()-complete bipartite graph with unit edge weights are given
Figure 13: A (3,6)-complete bipartite graph with unit weights.
by [Mer98]:
The eigenvalues are zero (multiplicity 1), (multiplicity
(multiplicity
(multiplicity 1).
Scaling and products of graphs. We can find the eigenvectors and eigenvalues of the graph Laplacian of more complex graphs using some simple relationships. First, if we scale the edge weights of a graph by 0, the eigenvectors remain the same, and the eigenvalues are scaled by
. Second, the eigenvectors of a Cartesian product of graph Laplacians are given by the Kronecker products between the eigenvectors of each of the individual graph Laplacians; the eigenvalues consist of the sums of one eigenvalue from one graph and one from the other. This can be seen by noting that the Laplacian matrix of the Cartesian product of two graphs with graph Laplacians
is given by
where L is the Laplacian matrix of the Cartesian product of the two graphs. With Cartesian products of graphs, we find it convienent to index the eigenvalues and eigenvectors of the Laplacian by two indices, i.e., the eigenvalues may be denoted as with corresponding eigenvector
1. (The eigenvalues will need to be sorted, as explained below.)
As an example, consider a graph which is the product of a chain graph with P vertices, edge weights and eigenvalues
; and a cycle graph with Q vertices, edge weights
, and eigenvalues
. The eigenvalues have the form
To find the m smallest of these, we sort them. The order depends on the ratio of the edge weights,
As a very specific example, take = 2. The eigenvalues of the chain and cycle graphs are
The bottom six eigenvalues of the Cartesian product of these two graphs are then