Many application domains generate network structured data. Networked data arises in the study of self-organizing systems constituted by individual agents who can interact [21], [22]. Networked data also arises in computer vision where nodes represent individual pixels that are connected if they are close-by. Metrological observations collected by spatially distributed stations forms a network of time series with edges connecting close-by stations. We represent networked data conveniently using an “empirical” or “similarity” graph G [4], [30].
The analysis of networked data is often facilitated by grouping or clustering the data points into coherent subsets of data points. Clustering methods aim at finding subsets (clusters) of data points that are more similar to each other than to the remaining data points. Most existing clustering methods are unsupervised as they do not require the true cluster assignments for any data point [17]–[20], [23], [30].
Local graph clustering starts from few “seed nodes” and explore their neighbourhoods to find clusters around them [27], [29]. With a runtime depending only on the resulting clusters, these methods are attractive for massive graphs [8], [27].
Spectral clustering methods use the eigenvectors of the graph Laplacian matrix to approximate the indicator functions of clusters [1], [5], [23], [26], [30]. These methods are computationally attractive as they amount to linear systems which can be implemented as scalable message passing protocols [7], [32]. Our approach differs from spectral clustering in the approximation of the cluster indicators.
To approximate cluster indicators, we use the solutions of a particular instance of the network Lasso (nLasso) optimization problem [9]. We solve this nLasso clustering problem using an efficient primal-dual method. This primal-dual method has attractive convergence guarantees and can be implemented as scalable message passing (see Section IV).
Building on our recent work on the duality between TV minimization and network flow optimization [14], [15], we
AJ is with the Department of Computer Science, Aalto University, Finland. YS is with the Department of Mathematical Sciences, Sharif University of Technology, Iran.
show that the proposed nLasso clustering method can be interpreted (in a precise sense) as a flow-based clustering method [16], [28], [29], [31]. As detailed in Section III, our nLasso problem (and its dual flow optimization problem) is similar but different from the TV minimization problems (and its dual flow optimization problems) studied in [14], [15].
Compared with spectral methods, flow-based methods (including our approach) better handle sparsely connected (chain-like) clusters (see Section VI) and are more robust to “structural heterogeneities” [11], [31]. In contrast to existing flow-based local clustering methods, our approach is based on efficient convex optimization methods instead of computationally expensive combinatorial algorithms.
This paper makes the following contributions:
• Section II formulates local graph clustering as a particular instance of the nLasso problem.
• In Section III we derive the dual problem of the nLasso. We provide an interpretation of this dual problem as an instance of network flow optimization.
• Section IV presents a local clustering method by applying a primal-dual method to the nLasso problem. This method is appealing for big data applications as it can be implemented as a scalable message-passing method.
• Section V characterizes the clusters delivered by nLasso in terms of the amount of flow that can be routed from cluster boundaries to the seed nodes within that cluster. This offers a novel link between flow-based clustering and convex optimization.
We consider networked data which is represented by a simple undirected weighted graph . The nodes V = {1, . . . , n} represent individual data points. Undirected edges
connect similar data points
and are assigned a positive weight
. Absence of an edge between nodes
implies
. The neighbourhood of a node
It will be convenient to define a directed version of the graph G by replacing each undirected edge {i, j} by the directed edge. We overload notation and use G to denote the undirected and directed version of the empirical graph. The directed neighbourhoods of a node
Local graph clustering starts from a given set of seed nodes
The seed nodes might be obtained by exploiting domain knowledge and are grouped into batches
Each batch contains seed nodes of the same cluster
We allow the number of seed nodes to be a vanishing fraction of the entire graph. This is an extreme case of semi-supervised learning where the labelling ratio (viewing seed nodes as labeled data points) goes to zero.
The proposed local graph clustering method (see Section 3) operates by exploring the neighbourhoods of the seed nodes S. It constructs clusters around the seed nodes
such that only few edges leave the cluster
We characterize a cluster via its boundary
A good cluster is such that the total weight of the edges in its boundary
is small. We make this characterization more precise in Section V using network flows to quantify the connectivity between cluster boundary and seed nodes.
Local graph clustering methods learn graph signals that are good approximations to the indicator signals
. These indicator signals represent the clusters
around the seed nodes
(see (3)) via
Spectral graph clustering uses eigenvectors of the graph Laplacian matrix to approximations to cluster indicator signals. In contrast, we use TV minimization to learn approximations to the cluster indicators
We have recently explored the relation between network flow problems and TV minimization [14], [15]. Loosely speaking, the solution of TV minimization is piece-wise constant over clusters whose boundaries have a small total weight. This property motivates us to learn the indicator function for the cluster around the seed nodes
by solving
Here, we used the total variation (TV)
Note that (6) is a non-smooth convex optimization problem. It is a special case of the nLasso problem [9].
We solve a separate nLasso problem (6) for each batch for k=1, . . . , F, of seed nodes in the same cluster. The nodes
which are not seed nodes for
belong to one of two groups. One group
of nodes which belong to
and the other group of nodes
outside the cluster.
The special case of (6) when is studied in [14]. We can also interpret (6) as TV minimization using soft constraints instead of hard constraints [15]. While [15] enforces
for each seed node
, (6) uses soft constraints such that typically
at seed nodes
. Spectral methods use optimization problems similar to (6) but with the Laplacian quadratic form
instead of TV (7).
We hope that any solution to (6) is a good approximation to the indicator function of a well-connected subset around the seed nodes
. We use the graph signal
from solving (6) to determine a reasonable cluster
The idea of determining clusters via learning graph signals as (approximations) of indicator functions of good clusters is also underlying spectral clustering [30]. Instead of TV minimization underlying nLasso (6), spectral clustering uses the matrix Laplacian to score candidates for cluster indicator functions. Moreover, spectral clustering methods do not require any seed nodes with known cluster assignment.
The choice of the tuning parameters influence the behaviour of the clustering method and the properties of clusters delivered by (6). Their choice can be based on the intuition provided by a minimum cost flow problem that is dual (equivalent) to nLasso (6). This minimum cost flow problem is not defined directly on the empirical graph G but the augmented graph
. This augmented graph is obtained by augmenting the graph G with an additional node “
” and edges
for each node
As detailed in the supplementary material, the nLasso (6) is equivalent (dual) to the minimum cost flow problem [14], [15]
The constraints (9) enforce conservation of the flow at every node
. The constrains (10) enforce the flow
not exceeding the edge capacity
. There are no capacity constrains for augmented edges
The node signal solves (6) and the edge signal
solve (8), respectively, if and only if [25, Ch. 31]
We can interpret conditions (11), (12) as conservation laws satisfied by any flow that solves the nLasso dual (8). We can think of injecting (extracting) a flow of value
nodes
. The nodes
are leaking a flow of value
. The optimal flow
has to provide these demands while respecting the capacity constraints (13).
We illustrate the conditions (11)-(14) in Fig. 1 for a simple chain graph. According to (14), the nLasso solution change across edges e = (i, j) which are saturated
For a chain graph, using a suitable choice for
nLasso is able to recover a cluster structure as soon as the weights of boundary edges exceeds the weights of intra-cluster edges.
Fig. 1: The dual (8) of nLasso optimizes the flow through a leaky network obtained from the empirical graph G.
We use the optimality condition (11)-(14) to characterize the solutions of nLasso (6) in Section V. Combining this characterization with generative models for the clusters , such as stochastic block models, allows to derive sufficient conditions on the parameters of the generative model such that solutions of (6) allow to recover the true underlying local clusters [13].
The necessary and sufficient conditions (11)-(14) characterize any pair of solutions for (6) and its dual (8). We can find solutions to the conditions (11)-(14), which provides a solution to nLasso in turn, by reformulating those coupled condition as a fixed point equation.
There are many different fixed-point equations that are equivalent to the optimality conditions (11)-(14). We will use a particular construction which results in a method that is guaranteed to converge to a solution of (6) and (8) and can be implemented as a scalable message passing on the empirical graph G. This construction is discussed in great detail in [2] and has been applied to the special case of nLasso (6) for in our recent work [14]. For local graph clustering, we need
to force the solutions of (6) to decay towards zero outside the local cluster around the seed nodes
Applying tools from [14], we obtain the following updates generating two sequences and
, for r = 0, 1, . . ., converging to solutions of (6) and (8), respectively.
Here, is the inverse of the node degree
. Starting from an arbitrary initialization
and
, the iterates
and
converge to a solution of nLasso (6) and its dual (8), respectively [10].
The updates (15)-(20) define a message-passing on the empirical graph G to jointly solve nLasso (6) and its dual (8). The computational complexity of one full iteration is proportional to the number of edges in the empirical graph. The overall complexity also depends on the number of iterations required to ensure the iterate being sufficiently close to the nLasso (6) solution.
Basic analysis of proximal methods shows that the number of required iterations required scales inversely with the required sub-optimality of (see [3], [6]). This convergence rate cannot be improved for chain graphs [12]. For a fixed number of iterations and empirical graphs with bounded maximum node degree, the computational complexity of our method scales linearly with the number of nodes (data points).
We now develop an interpretation of the updates (15)-(20) as an iterative method for network flow optimization. The update (17) enforces the capacity constraints (10) to be satisfied for the flow iterates . The update (18) amounts to adjusting the current nLasso estimate
, for each node
by the demand induced by the current flow approximation
Together with the updates (19) and (20), the update (18) enforces the flow to satisfy the conservation laws (11) and (12). The update (16) aims at enforcing (14) by adjusting the cumulated demands
via the flow
through an edge
according to the difference
The above interpretation helps to guide the choice for the parameters . The edge capacities
rate by which the values
can be “build up”. Choosing
too small would, therefore, slow down the convergence of
On the other hand, using nLasso (6) with too large
allow to detect small local clusters
(see Section V).
We use the solution of nLasso (6) to approximate the indicator of a local cluster around the seed nodes
. The cluster delivered by our method is obtained by thresholding,
In practice we replace the exact nLasso solution in (21) with the iterate
obtained after a sufficient number r of primal-dual updates (15)-(20) (see Section VI). The threshold 1/2 is (21) is somewhat arbitrary. Our theoretical results can be easily adapted for other choices for the threshold. The question if there exists an optimal choice for the threshold and what this actually means precisely is beyond the scope of this paper.
Our main theoretical result is a necessary condition on the cluster (21) and the nLasso parameters (see (6)).
Proposition 1. Consider the cluster (21) obtained from the nLasso solution. Then, if
and
Proof. Follows from the optimality conditions (11)-(14).
The necessary conditions (22) and (23) can guide the choice of the parameters and
in (6). We can enforce nLasso to deliver clusters with small boundary
by using a a large
. Since the left side of (22) must not exceed the right hand side, using a large
enforces a cluster (21) such that
is small. In the extreme case of very large
leads to
being empty. There is a critical value for
in (6) beyond which the cluster
contains all connected components with seed nodes
We can combine (23) with an upper bound U on the number of nodes reached by message-passing updates (15)-(20). Inserting this bound U on the number of “relevant” nodes
into (23), yields the necessary condition
We verify Proposition 1 numerically on a chain graph with nodes V ={1, . . . , 100}. Consecutive nodes i and i+1 are connected by edges of weight
with the exception of edge
with the weight
We determine a cluster around seed node i = 1 using (21). The updates (15)-(20) are iterated for a fixed number of K = 1000 iterations. The nLasso parameters were set to
and
(see (6)). These parameter values ensure conditions (22) and (24) (with U =80) are satisfied for the resulting cluster is
We depict the resulting graph signal ”) for the first 20 nodes of
in Fig. 2. We also show the (scaled) eigenvector (“
”) of the graph Laplacian corresponding to the smallest non-zero eigenvalue. This eigenvector is known as the Fiedler vector and used by spectral graph clustering methods to approximate the cluster indicators [24]. According to Fig. 2, the (approximate) nLasso solution better approximates the indicator of the true cluster
Fig. 2: Signal (“”) obtained after 1000 iterations of (15)- (20) for the chain graph
. The resulting local cluster is
(see (21)). We also depict the graph signal (“
”) obtained by the eigenvector of the normalized graph Laplacian corresponding to the smallest non-zero eigenvalue.
In a second experiment, we compare our method with existing local clustering methods in a simple image segmentation task. We represent an image as a grid graph whose nodes are individual pixels. Vertically and horizontally adjacent pixels are connected by edges with weight with the greyscale value
We determine a local cluster around a set of seed nodes (see Fig. 3a) using K = 1000 iterations of (15)-(20) to approximately solve nLasso (6) (see Fig. 3b). The local cluster obtained by the flow-based capacity releasing diffusion (CRD)
Fig. 3: Image segmentation results.
method [31] is depicted in Fig. 3c. The local clustering obtained by the spectral method presented in [1] is shown in Fig. 3d. The seed nodes and resulting clusters obtained by the three methods are enclosed by a red contour line in Fig. 3. It seems that our method is the only method which can accurately determine the pixels belonging to the foreground object (a coin) around the seed nodes (see Fig. 3a).
A third experiment compares our method with existing clustering methods for an empirical graph being the realization of a partially labelled stochastic block model (SBM). We used a SBM with two blocks or clusters and
. Each cluster consists of 100 nodes. A randomly chosen pair of nodes is connected by an edge with probability 1/5 (1/100) if they belong to the same block (different blocks). The cluster (21) delivered by nLasso (6), with
and
and using 20 randomly chosen seed nodes, perfectly recovered the true clusters. The spectral method [1] achieved labelling accuracy (fraction of correctly labelled nodes) of 1/2. The flow-based methods [16], [31] achieved a labelling accuracy of around 9/10.
The source code for the above experiments can be found at https://github.com/alexjungaalto/.
We have studied the application of nLasso to local graph clustering. Our main technical result is a characterization of the nLasso solutions in terms of network flows between cluster boundaries and seed nodes. Conceptually, we provide an interesting link between flow-based clustering and non-smooth convex optimization. This work offers several avenues for follow-up research. We have recently proposed networked exponential families to couple the network topology with the information geometry of node-wise probabilistic models. It is interesting to study how the properties of these node-wise probabilistic models can be exploited to guide local clustering methods.
[1] R. Andersen, F. Chung, and K. Lang. Local graph partitioning using pagerank vectors. In Proc. 47th Annual IEEE Symp. Found. Computer Science, pages 475–486, Oct. 2006.
[2] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imag. Vis., 40(1), 2011.
[3] A. Chambolle and T. Pock. An introduction to continuous optimization for imaging. Acta Numer., 25:161–319, 2016.
[4] O. Chapelle, B. Schölkopf, and A. Zien, editors. Semi-Supervised Learning. The MIT Press, Cambridge, Massachusetts, 2006.
[5] F. R. ˜K. Chung. Spectral Graph Theory. 1997.
[6] P. L. Combettes and J.-C. Pesquet. Proximal splitting methods in signal processing. In H. Bauschke, R. Burachik, P. Combettes, V. Elser, D. Luke, and H. Wolkowicz, editors, Fixed-Point Algorithms for Inverse Problems in Science and Engineering, volume 49. Springer New York, 2011.
[7] A. G. Dimakis, S. Kar, J. M. F. Moura, M. G. Rabbat, and A. Scaglione. Gossip algorithms for distributed signal processing. Proceedings of the IEEE, 98(11):1847–1864, Nov. 2010.
[8] K. Fountoulakis, D. F. Gleich, and M. W. Mahoney. A short introduction to local graph clustering methods and software. In The 7th Int. Conf. Compl. Networks and App., Cambridge, UK, Dec. 2018.
[9] D. Hallac, J. Leskovec, and S. Boyd. Network lasso: Clustering and optimization in large graphs. In Proc. SIGKDD, pages 387–396, 2015.
[10] B. He, Y. You, and X. Yuan. On the convergence of primal-dual hybrid gradient algorithm. SIAM J. Imaging Sci., 7(4):2526–2537, 2014.
[11] L.G.S. Jeub, P. Balachandran, M.A. Porter, P.J. Mucha, and M.W. Mahoney. Think locally, act locally: Detection of small, medium-sized, and large communities in large networks. Phys. Rev. E, 2015.
[12] A. Jung. On the complexity of sparse label propagation. Front. Appl. Math. Stat., 4:22, July 2018.
[13] A. Jung. Clustering in partially labeled stochastic block models via total variation minimization. In Proc. 54th Asilomar Conf. Signals, Systems, Computers, Pacific Grove, CA, Nov. 2020.
[14] A. Jung. On the duality between network flows and network lasso. IEEE Sig. Proc. Lett., 27:940 – 944, 2020.
[15] A. Jung, A O. Hero, A. Mara, S. Jahromi, A. Heimowitz, and Y.C. Eldar. Semi-supervised learning in network-structured data via total variation minimization. IEEE Trans. Signal Processing, 67(24), Dec. 2019.
[16] K. Lang and S. Rao. A flow-based method for improving the expansion or conductance of graph cuts. In D. Bienstock and G. Nemhauser, editors, Integer Programming and Combinatorial Optimization, pages 325–337, Berlin, Heidelberg, 2004. Springer.
[17] H. Li, Z. Bu, Z. Wang, and J. Cao. Dynamical clustering in electronic commerce systems via optimization and leadership expansion. IEEE Trans. Ind. Inf., 16(8):5327–5334, Aug. 2020.
[18] H.-J. Li, Z. Bu, Z. Wang, J. Cao, and Y. Shi. Enhance the performance of network computationby a tunable weighting strategy. IEEE Trans. Emerg. Top. Comp. Int., 2(3):214–223, Jun. 2018.
[19] H.-J. Li and J.J. Daniels. Social significance of community structure: Statistical view. Phys. Rev. E, 91(1):012801, Jan. 2015.
[20] H.-J. Li and L. Wang. Multi-scale asynchronous belief percolation model on multiplex networks. New Journal of Physics, 21, Jan. 2019.
[21] H.-J. Li, Q. Wang, S. Liu, and J. Hu. Exploring the trust management mechanism in self-organizing complex network based on game theory. Physica A: Stat. Mech. App., 542, 2020.
[22] M. E. J. Newman. Networks: An Introduction. Oxford Univ. Press, 2010.
[23] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Adv. Neur. Inf. Proc. Syst., 2001.
[24] P. Orponen and S.E. Schaeffer. Local clustering of large graphs by approximate fiedler vectors. In Proc. of the 4th international conference on Experimental and Efficient Algorithms, pages 524–533, 2005.
[25] R. T. Rockafellar. Convex Analysis. Princeton Univ. Press, Princeton, NJ, 1970.
[26] D. Spielman. Spectral graph theory. In U. Naumann and O. Schenk, editors, Combinatorial Scientific Computing. Chapman and Hall/CRC, 2012.
[27] D.A. Spielman and S.-H. Teng. A local clustering algorithm for massive graphs and its application to nearly-linear time graph partitioning. SIAM J. Comput., 42(1):1–26, Jan. 2013.
[28] N. Veldt, D.F. Gleich, and M.W. Mahoney. A simple and strongly-local flow-based method for cut improvement. In Proc. 33rd Int. Conf. Mach. Learn. (ICML), volume 48 of JMLR: W and CP, 2016.
[29] N. Veldt, C. Klymko, and D.F. Gleich. Flow-based local graph clustering with better seed set inclusion. In Proc. SIAM Int. Conf. on Data Mining, May 2019.
[30] U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, Dec. 2007.
[31] D. Wang, K. Fountoulakis, M. Henzinger, M.W. Mahoney, and S. Rao. Capacity releasing diffusion for speed and locality. In Proc. of ICML, pages 3598–3607, 2017.
[32] L. Xiao, S. Boyd, and S.-J. Kim. Distributed average consensus with least-mean-square deviation. Journal of Parallel and Distributed Computing, 67(1):33–46, 2007.
Duality of nLasso (6) and Minimum Cost Flow (8). Let us rewrite nLasso (6) as
with the unweighted incidence matrix B of G. For some edge and node
if e = (i, j) for some
,
otherwise. The components in (25) are
According to [25, Cor. 31.2.1] (see also [3, Sec. 3.5]),
with the convex conjugates
and
Exploiting the separability of the supremum in (29),
Using (30) and (28) allows to rewrite the RHS of (27) as
Since maximizing some real-valued function is equivalent to minimizing
, the optimization problem (31) is equivalent to
Primal-Dual Optimality Condition (11)-(14). Consider the primal form (25) of the nLasso (6) and the corresponding dual problem on the RHS of (27). According to [25, Thm. 31.3], the graph signal solves (25) and the edge signal
solves (8), respectively, if and only if,
The second condition in (33) is equivalent to (13)-(14). This
Fig. 4: Components of convex conjugate
equivalence can be verified by evaluating the sub-differential of (see Figure 4). The first condition in (33) is equivalent to (11) and (12) since