b

DiscoverSearch
About
My stuff
Some Geometrical and Topological Properties of DNNs' Decision Boundaries
2020·arXiv
Abstract
Abstract

Geometry and topology of decision regions are closely related with clas-sification performance and robustness against adversarial attacks. In this paper, we use differential geometry to theoretically explore the geometrical and topological properties of decision regions produced by deep neural networks (DNNs). The goal is to obtain some geometrical and topological properties of decision boundaries for given DNN models, and provide some principled guidance to design and regularization of DNNs. First, we present the curvatures of decision boundaries in terms of network parameters, and give sufficient conditions on network parameters for producing flat or developable decision boundaries. Based on the Gauss-Bonnet-Chern theorem in differential geometry, we then propose a method to compute the Euler characteristics of compact decision boundaries, and verify it with experiments.

Although deep learning has been successfully applied to various domains including computer vision, speech recognition and natural language processing, the theoretical understanding of its success is still limited. One important theoretical aspect of deep learning is the geometry and topology of decision regions produced by DNNs. Geometry of decision boundaries concerns their curvature, size and convexity etc., and topology of decision boundaries considers their connectivity and genus (the number of holes) etc. There are two important relevant questions: on one hand, how to obtain the geometrical and topological properties of decision boundaries for given DNN models? And on the other hand, what are the constraints on network architecture and parameters in order to produce decision boundaries with desired geometrical and topological features? Besides theoretical interest, answering these questions is important to model selection [10, 17], reducing overfitting [5] and improving DNNs’ robustness against adversarial attacks [14, 9]. Moreover, researches on the geometry and topology of decision regions can provide some principled guidance for network design. For example, choose appropriate network architecture to produce disconnected decision boundaries [15], and regularize curvature or number of connected components to boost robustness or accuracy [14, 5, 11].

Researches on the geometry and topology of DNN produced decision regions are still at their infancy. This paper aims at exploring some geometrical and topological properties of DNNs’ decision regions using knowledges from differential geometry and topology [4, 13]. We first use differential geometry to compute the curvature of decision boundaries. We formulate curvatures in terms of network parameters, and based on that it is possible to design DNNs with desired geometrical properties. We then use global differential geometry, which establishes the analytical connection between local curvature and global topology of manifolds, to compute some topological properties of decision boundaries.

The contributions of this paper are summarized as follows:

For DNNs with any input dimension, we give sufficient conditions on network parameters for producing flat or developable decision boundaries.

Based on global differential geometry, more specifically, the Gauss-Bonnet-Chern theorem for 2D surfaces and higher-dimensional manifolds, we propose a method to compute the Euler characteristics (a topological property) of compact decision boundary manifolds.

This paper is organized as follows. Section 1.1 is related work. In section 2, we present the preliminaries of differential geometry, and introduce the feed-forward deep neural network models, as well as some definitions and notations. Curvatures of decision boundaries for 2D, 3D and higher-dimensional input spaces are explored in section 3. Section 4 presents the sufficient conditions on network parameters for producing flat or developable decision boundaries. In section 5, we propose the method to compute Euler characteristics of decision boundaries, with experiments on synthetic data. The proofs of main results are given in section 6. Finally, we give our conclusion and future work.

1.1 Related work

There have been some works on the geometry and topology of DNN produced decision regions. [16] experimentally finds that the decision boundary can become exponentially curved with depth, thus enabling highly complex classifications. [14, 9] show that decision boundaries exhibiting quasi-linear behavior in the vicinity of data points can improve adversarial robustness. Although numerical computations of curvatures have been presented in these works, systematic expressions of curvatures in terms of network parameters are still missing. [9] demonstrates that the decision regions of CaffeNet [12] trained on ImageNet are connected. Based on persistent homology [8], [17, 10] compute the topological properties of DNNs’ decision boundaries by constructing simplicial complexes of decision boundaries with discrete data points. In comparison, our method is analytical and based on global differential geometry. [15] demonstrates that a sufficiently wide hidden layer is necessary to produce disconnected decision regions. [2] shows that the decision regions are unbounded for DNNs with width less than or equal to input dimensions. [3] shows the topological expressiveness advantage of deep networks over shallow ones in terms of bounds on the sum of Betti numbers. To achieve a better balance between accuracy and overfitting, [5, 11] propose persistent homology based connectivity regularizers for logistic regression classifier and autoencoder respectively. [1] presents an algorithm to compute the topological features of neural networks’ objective functions.

2.1 Fully-connected feed-forward deep neural networks

We consider in this paper fully-connected feed-forward deep neural networks for binary classification. Let d be the input dimension, L be the number of hidden layers and  dibe the width of layer  i with d0 = d. The output of a fully-connected feed-forward DNN is

image

where  x ∈ Rdis the input, Wi ∈ Rdi×di−1and  bi ∈ Rdiare the weight matrix and bias vector of layer i respectively,  a ∈ RdLand c are the weight vector and bias of the output layer.  σ : R → Ris the activation function of every hidden layer which applies component-wise. We consider in this paper activation functions that are at minimum twice differentiable and strictly monotonically increasing (thus bijective), including the widely used sigmoid, tanh, and softplus (  σ(t) = 1α log(1 + eαt) ) which can approximate ReLU activation arbitrarily well.

We consider binary classification problems in this paper. The solution to f(x) = 0 gives the decision boundary,  {x ∈ Rd|f(x) > 0}gives the decision region of the positive class and  {x ∈ Rd|f(x) < 0}specifies the decision region of the negative class. We assume that the decision boundaries are smooth manifolds.

The pre-image of a mapping  f : U → Vis defined as the set  {x ∈ U|f(x) ∈V }. [d] stands for  {1, 2, · · · , d}.Both  wTiand Wi,·represent the ith row of matrix W, and W·,idenotes the ith column of W.

2.2 Differential geometry and topology

In this subsection, we will give a brief introduction to the concepts and knowledges of differential geometry and topology that will be used in this paper. Interested readers are referred to [4, 13] for the details.

2.2.1 Differential geometry of curves

image

parameter  t ∈ Rto a point  r(t) ∈ R2on the curve. Arc length s is usually used as the parameter t. The tangent vector at a point on the curve is defined as T = dr(s)ds .The Frenet formula  dT(s)ds = kNgives the definition of planar curvature k, where N is the unit normal vector.

2.2.2 Differential geometry of surfaces

A 2D surface is locally parameterized as r(u, v) = (x(u, v), y(u, v), z(u, v))T ,which maps the parameters (u, v) ∈ R2to a point  r(u, v) ∈ R3on the surface. Let ds be the infinitesimal arc length on the surface, then ds2= dr · dr= Edu2+ 2Fdudv + Gdv2, where  E = ru · ru, F = ru · rv, G = rv · rvare called coefficients of the first fundamental form. rudenotes ∂r(u,v)∂uand  rvdenotes ∂r(u,v)∂v. The second fundamental form, which measures the distance from r(u+du, v+dv) to the tangent plane at r(u, v), is given as  Ldu2+2Mdudv+Ndv2,where  L = ruu · n, N = rvv · n, M = ruv · nare called coefficients of the second fundamental form. n := ru×rv|ru×rv|is the unit normal vector,  ruu:= ∂2r(u,v)∂u2,

image

At any point on the surface, there are two principal curvatures  k1and  k2defined along two orthogonal principal directions. The Gaussian curvature is defined as  K = k1k2 = LN−M 2EG−F 2 .

The classification theorem of compact surfaces states that all orientable 2D compact surfaces can be classified into homeomorphism classes by their Euler characteristics. The Euler characteristic  χ(S) of a 2D compact surface S can take a value only in  {2, 0, −2, . . . , −2n, . . .}.For example,  χ= 2 for a sphere, χ= 0 for a torus and  χ = −2 for a double torus.  n = 2−χ(S)2is called the genus of S, which is the number of holes in S.

For an orientable 2D compact surface S, the Gauss-Bonnet theorem states ��S Kdσ = 2πχ(S), where dσis the area element. Gauss-Bonnet theorem estab- lishes the connection between local curvature and global topological property.

2.2.3 Differential geometry of higher-dimensional manifolds

High-dimensional case is much complex and unintuitive. A n-dimensional manifold is locally parameterized as  r = r (x1, x2, . . . , xn) . Then

image

where  gij = ri · rjand  ri = ∂r∂xi. g = (gij)is called metric tensor. The Riemannian connection is defined as Γmik:= 12Σjgmj (∂kgij + ∂igkj − ∂jgik) ,where  gmj is the element of  g−1,the inverse matrix of  g. ∂kgij = ∂gij∂xk .

The Riemann curvature tensor is defined as

image

At any point on the manifold, the curvature along any 2D tangent subspace (like K for 2D surfaces) can be obtained from  Rabik.The curvature 2-form is defined as

image

where  ∧is the outer product of differential forms.

image

sum of its Betti numbers, which in turn is a concept from algebraic topology. Intuitively, the 0-st Betti number  b0is the number of connected components, and the ith Betti number  bi (i ≥1) counts the number of i-dimensional holes of the manifold. The Gauss-Bonnet-Chern theorem for high-dimensional orientable compact manifolds gives the relationship between local curvature and global topological property. For a 2n-dimensional orientable compact manifold M, Gauss-Bonnet-Chern theorem states

image

where  e(M) = 12n(2π)nn!�a1,a2,··· ,a2n εa1,a2,··· ,a2nΩa1a2 ∧ · · · ∧ Ωa2n−1a2n, Ωab =Σc,dgacgbdΩcd, εa1··· ,a2n =�det(g)δ1···2na1···a2n.

δ1···2na1···a2n is the generalized Kronecker symbol which equals 1 if (a1 · · · a2n) is aneven permutation of (1 2  · · · 2n), −1 if odd permutation and 0 otherwise.

While there are numerical computations of curvatures in (e.g. [16, 14]), a systematic and analytical treatment of decision boundary’s curvature in the framework of differential geometry is still absent. In this section, we will compute the curvature of decision boundaries for 2D, 3D and higher-dimensional input spaces respectively. The key observation is that although the decision boundaries usually cannot be solved analytically, we can use the equation f(x) = 0 to describe them implicitly and use the derivatives of implicit functions to compute curvatures. The expressions of curvatures obtained in this section will be used frequently later in this paper.

3.1 2D input space

In 2D input space, the decision boundary is in general a curve and satisfy f(x, y) = 0. This gives an implicit function y = y(x).

Lemma 3.1. The planar curvature of a decision boundary in 2D input space is

image

where f = f(x, y) is the output function of neural network,  fx = ∂f∂x, fxx = ∂2f∂x2and so on.

For neural networks, we can compute the derivatives involved in (2) and consequently k in terms of network weights. For a one-hidden-layer network f(x) = aTσ(Wx + b) + c, its 1st-order and 2nd-order partial derivatives are as follows,

image

then the 1st-order derivative is as follows

image

where  x1 := x, x2 := y, J(xl) := diag(σ′(xl1), σ′(xl2), · · · , σ′(xldl)). The 2nd-order derivatives are tedious and omitted here to save space.

3.2 3D input space

In 3D input space, the decision boundary is a 2D surface satisfying f(x, y, z) = 0. Its Gaussian curvature is described by the following lemma.

Lemma 3.2. The Gaussian curvature of a decision boundary in 3D input space is

image

3.3 Higher-dimensional input space

For high-dimensional case, we need to compute the curvature tensor and curvature 2-form. Before that, it is necessary to compute metric tensor (gij) and Riemannian connection Γmik. The details of computation will be presented in subsection 6.3.

image

Given the expressions of curvatures of decision boundaries, one way wonder what network parameters can generate decision boundaries with some desired geometrical properties, such as flatness and convexity etc. As a starting point, we discuss in this paper how to get flat or developable decision boundaries with appropriate network weights. It has been shown in [14, 9] that there is a strong relation between small curvature decision boundaries and large robustness against adversarial attacks.

We will start with low-dimensional input space and networks with one hidden layer, and then extend to the general case of arbitrary input dimensions and DNNs. We will derive the sufficient conditions for flat or developable decision boundaries, and give geometric interpretations of some conditions to show intuitively how they are produced.

4.1 2D input space and one-hidden-layer networks

In 2D input space, the flat decision boundary is a line. The sufficient conditions for 2D input space and one-hidden-layer networks are given by the following theorem.

Theorem 4.1. For one-hidden-layer neural networks f(x) =  aT σ(Wx + b) + c �a ∈ Rd1�with 2D input x and twice differentiable and bijective activation function  σ,one of the following conditions is sufficient to produce linear decision boundaries with zero curvatures:

a) aiWi1 = 0 or aiWi2 = 0 (∀i ∈ [d1]).

b) aiaj (Wi1Wj2 − Wi2Wj1) = 0 (∀i, j ∈ [d1]).

image

if  ∀i, ai ̸= 0, we have W·,1 = 0 or W·,2 = 0.Geometrically, W·,1 = 0 or W·,2 = 0transforms the 2D input space into a line which, after activation by  σ,results in a curve. This curve will intersect the (hyper)plane (or line) in  Rd1 specified by (a, c) at a point (the case in which the curve lies on the (hyper)plane is omitted, since it implies that f(x) = 0 (∀x ∈ R2) and thus the whole input space is decision boundary), whose pre-image in input space is a line. Thus, W·,1= 0 or W·,2= 0 produces linear decision boundaries. Fig.2(a) illustrates the case of W·,1 = 0.If ai ̸= 0 and Wi1= 0 or Wi2= 0 only for partial values of i, for example,if  d1= 3 and a = (0, 1, 1)T , W21= W31= 0, then the image of Wx + b =

image

becomes a curved surface whose normal’s x component is also 0. The intersection of  σ(p) with the plane q in  Rd1with normal a is a straight line parallel to x axis, whose pre-image in input space is a line parallel to x axis. This case is illustrated in fig.2(b).

The case of (Wi1Wj2 − Wi2Wj1) = 0 (∀i, j ∈ [d1])means W is rank-deficient and the image of Wx + b (x ∈ R2) is a line, similar to the case shown in fig. 2(a).

4.2 3D input space and one-hidden-layer networks

For 3D input space and one-hidden-layer networks, the sufficient conditions for flat or developable decision boundaries are presented in the following theorem.

Theorem 4.2. For one-hidden-layer networks  f(x) = aT σ(Wx+b)+c�W ∈ Rd1×3�

with 3D input and twice differentiable and bijective activation function  σ, one of the following conditions is sufficient to produce decision boundaries with zero

Gaussian curvatures:

image

Fig. 2(c) illustrates the case of W·,1= 0 for  d1= 3. The 3D input space is mapped into a plane p. Denote by q the plane in  Rd1determined by (a, c), σ−1(q) will be a curved surface. The intersection of p with  σ−1(q) is a curve, whose pre-image in input space is a developable surface that is curved along one direction and straight along the orthogonal direction (using the terminology of differential geometry, one of the principal curvatures  k1and  k2is zero). It is worthy to point out that in 3D input space,  K = k1k2= 0 does not always imply planar decision boundaries (k1 = k2= 0), developable surfaces (one of  k1and  k2is zero) are also possible, as shown in fig. 2(c).

4.3 General case: higher dimensional input space and deep neural networks

From previous subsections, it can be seen that for one-hidden-layer networks, while deriving from different expressions of curvatures for different input dimensions, there are some common conditions for flat or developable decision boundaries. By extension, we will give in this subsection the sufficient conditions for producing flat or developable decision boundaries with DNNs of arbitrary input dimensions. Instead of starting from complex curvature tensor, our proof (in subsection 6.6) will be based on analyzing the equation f(x) = 0. We shall point out that although flat decision boundaries can be obtained without utilizing the expressions of curvatures, many other geometry and topology related tasks, such as those in section 5, still rely heavily on curvatures.

image

Remark: We have obtained decision boundaries that are straight along certain axes. If one wants to have a decision boundary that is straight along an arbitrary direction, the input space can be rotated at first to make the desired direction be aligned with certain axis, and a decision boundary that is straight along this axis can be produced like before and then the input space is rotated back.

image

In this section, based on the Gauss-Bonnet-Chern theorem for surfaces and higher-dimensional manifolds, we describe how to obtain the Euler characteristic of a compact decision boundary.

5.1 3D input space

In this subsection, given a compact decision surface in 3D input space, we discuss how to compute its topological property in terms of Euler characteristic. The result is presented in the following theorem, which is a direct consequence of the Gauss-Bonnet theorem for 2D surfaces.

Theorem 5.1. In 3D input space, for a compact orientable decision boundary surface S, its Euler characteristic is  χ(S) = 12π��S Kdσ, where K is the Gaussian curvature given in (5), dσis the area element.

Example of compact surface and numerical computation of the integral ��S Kdσwill be presented in subsection 5.3.

5.2 Higher-dimensional input space

For higher-dimensional input space, the topological structure of a compact orientable decision boundary can be characterized by its Euler characteristic. The following theorem shows that the Euler characteristic of a DNN’s decision boundary can be obtained by an integral of local curvature, which is a direct application of the Gauss-Bonnet-Chern theorem for higher-dimensional manifolds.

Theorem 5.2. For a 2n-dimensional compact orientable decision boundary manifold M, its Euler characteristic  χ(M)can be obtained by

image

where

image

and ab= Σc,dgacgbdΩcd , εa1,a2,···a2n =�det(g)δ1···2na1···a2n. The expressions of metric tensor g and curvature 2-form abin terms of network weights will be given in subsection 6.3.

Remark. For high-dimensional input space, computing the integral in (6) faces two difficulties: the exponentially increasing number of terms in e, and the high dimensions of integral. Efficient computation of this integral using techniques such as sampling will be one of our future work.

5.3 Experiments on Topological Properties

5.3.1 3D input space

In this subsection, we conduct experiments to show that in 3D input space, for sphere-like (genus zero) decision boundaries obtained by neural network training, the integral��S Kdσin Theorem 5.1 is indeed 2πχ(S), where  χ(S) = 2 is the Euler characteristic of sphere-like surfaces, thus demonstrating the correctness of Theorem 5.1 for genus zero decision boundaries.

We generate a synthetic dataset, shown in fig. 2(a), as follows for binary classification. For the first class, we draw 600 3D data points from the 3D Gaussian distribution with zero mean vector and identity covariance matrix. Each data point is then normalized to have unit vector length, i.e., these points are located on a unit sphere after normalization. Data points of the second class are obtained by scaling the data vectors of the first class such that they lie on the sphere with a radius of 2.

A one-hidden-layer neural network is trained to classify the data points. In the hidden layer, there are 40 hidden neurons with the tanh activation function. Network weights are initialized by a Gaussian distribution with zero mean and a variance of 0.01, and each bias is initialized to zero. Gradient descent is used to train the network with cross-entropy loss, and the learning rate is set to 0.5. The training stops after 5  × 105 iterations, obtaining a training accuracy of 98.9%.

We now describe how to compute the integral��S Kdσ, where Kis the Gaus- sian curvature given in (5), dσis the surface element. We partition the decision boundary surface into a number of small patches, then according to the defini-tion of surface integral,��S Kdσcan be approximated by  �ni=0 K(ξi, ηi, ζi)∆Si,where (ξi, ηi, ζi) is any point in the ith patch, ∆Siis its area. When the patches are sufficiently small, the integral��S Kdσcan be obtained as such with high accuracy. Matlab is used to implement the computation. We use levelset of f(x, y, z) = 0 to compute the decision boundary explicitly. We first use the meshgrid() function in Matlab to sample a 3D grid of points evenly spaced in the region of interest that contains all input data points, and the distance between two neighboring grid points is given by a parameter  λ. Then, the values of f(x, y, z) at all grid points are computed, and the isosurface() function in Matlab is utilized to find the decision boundary, defined as the levelset of f(x, y, z) = 0, from the values at grid points. The isosurface() function returns the decision boundary in the form of faces and vertices of isosurface, and we then use the patch() function of Matlab to visualize the decision boundary through plotting filled polygonal faces. Fig. 2(b) shows the obtained decision boundary. Based on the faces and vertices of isosurface, we calculate the term  K(ξi, ηi, ζi) for each patch on the decision boundary using (5). Each patch generated by isosurface() is a triangle, thus the area of each patch ∆Sican be computed by 0.5|−→ab × −→ac|,where a, b, c are the three vertices of a triangle. Finally, integral��S Kdσis approximated by �ni=0 K(ξi, ηi, ζi)∆Si.

In our experimental results, for the obtained genus zero (sphere-like) decision boundary, we have��S Kdσ= 12.65 when  λ= 0.02, which is very close to 4π = 12.57 and apparently different from��S Kdσfor surfaces with other Euler characteristics (2πχ(S) = 0, −4π etc), demonstrating the correctness of Theorem 5.1 for genus zero decision boundaries. The accuracy of evaluating��S Kdσ canbe increased with smaller  λ.

6.1 Proof of Lemma 3.1

The curvature of a planar curve can be found in some standard calculus textbooks (e.g. [6]) as follows,

image

where  y′= dydxand  y′′ = d2ydx2. We then need to compute  y′and  y′′involved in (8). Given the equation of decision boundary f(x, y) = 0 which implicitly defines a curve y = y(x), its derivative with respect to x leads to

image

Substituting them into (8), we get (2).

6.2 Proof of Lemma 3.2

The equation f(x, y, z) = 0 defines a decision boundary surface z = z(x, y) implicitly. The Gaussian curvature K at a point r = (x, y, z(x, y))Ton the surface is given by [7]

image

From  f(x, y, z) = 0, we have ∂f∂x + ∂f∂z · ∂z∂x = 0, thus zx = ∂z∂x = − fxfz and similarly zy = − fyfz. Similar to (9), we can obtain  zxx, zyyand  zxy. Substituting them into (10) and through direct calculation, we can obtain (5).

6.3 Curvatures for higher-dimensional input space

For high-dimensional input space, we need to compute the curvature tensor and curvature 2-form, which in turn depend on the metric tensor (gij) and Riemannian connection Γmik.

In d-dimensional input space, the decision boundary is generally a (d −1)-dimensional manifold. We use  xi (i= 1, 2, . . . , d −1) to parameterize the decision boundary, thus a point on the decision boundary is locally represented as r = (x1, x2, · · · , xd−1, xd)T ,where  xd = xd (x1, x2, · · · , xd−1)is implicitly defined by the equation  f (x1, x2, · · · , xd−1, xd)= 0. The metric tensor  gijis defined by ds2 = �d−1i,j=1 gijdxidxj. Therefore, by

image

and by ∂xd∂xi = −fi/fd,there is ds2 = �d−1i=1 dx2i + �d−1i,j=1fifjf 2d dxidxj.We get the components of (gij) as follows,

image

where  gmj is the element of  g−1. The computation is very lengthy and tedious, hence we only show the key techniques to save space. For example,  ∂kgijis computed as follows,

image

(12) The Riemann curvature tensor is defined by

image

Therefore, we have

image

Finally, the curvature 2-form is

image

abis used in Gauss-Bonnet-Chern theorem to compute Euler characteristics of compact manifolds.

6.4 Proof of Theorem 4.1

For a one-hidden-layer network f(x) =  aTσ(Wx + b) + c, its 1st-order and 2nd-order partial derivatives have been given in (3). By (2), we can obtain the expression of k in terms of network weights. Here we only concern the following part of k that will cause k = 0.

image

=�

image

= 0.

(16) For general activation functions,  σ′′(y) and σ′(y) cannot equal zero everywhere. Therefore by (16), k(x) = 0 at any point x on the decision boundary requires

aiajak[Wi1Wk2 (Wi1Wj2 − Wi2Wj1) + Wi2Wj1 (Wi2Wk1 − Wi1Wk2)] = 0, (∀i, j, k ∈ [d1]) .

(17) There are serval possible non-trivial (i.e.,  a ̸= 0 and W ̸= 0 ) cases making (17) satisfied, which constitute the sufficient conditions for k(x) = 0.

a) aiWi1= 0 or aiWi2= 0 (∀i ∈ [d1]). So, if ai ̸= 0, there is Wi1= 0 or Wi2 = 0. Let I =�i ∈ Rd1|ai ̸= 0�,we have (assume aiWi1 = 0)

image

due to Wi1= 0 (i ∈ I).This equation only involves variable y. As a result, its solution  y∗ determines a linear decision boundary that is parallel to x axis.

b) aiaj (Wi1Wj2 − Wi2Wj1)= 0 (∀i, j ∈ [d1]). Wi1Wj2 − Wi2Wj1= 0 means the ith row and jth row of W are linear dependent, hence vectors in {wi|i ∈ I}are linear dependent. Without loss of generality, suppose  w1 ̸= 0 and1  ∈ I, thus wi = ciw1 (i ∈ [k], ciis a constant) . Then, we have

image

f(x) now only involves  w1 · x. The solution  w1 · x = d∗ to f(x) = 0 produces a linear decision boundary in input space whose normal is  w1.

6.5 Proof of Theorem 4.2

Given the derivatives for one-hidden-layer networks in (3), the numerator of K in (5) can be expressed as

image

= �

image

and aiajak[Wi3Wj1 (Wi2Wk3 − Wi3Wk2) + Wi1Wk3 (Wi3Wj2 − Wi2Wj3)] = 0,(19) which further lead to the following sufficient conditions for K = 0.

a) aiWi1= 0 or aiWi2= 0 or aiWi3= 0 (∀i ∈ [d1]). As an example, aiWi1= 0 means Wi1 = 0 (i ∈ I).Thus, on the decision boundary,

image

This is an equation of y and z, giving an implicit curve z = z(y). The pre-image of this curve in 3D input space is a developable surface that is straight along x direction.

b) aiaj (Wi3Wj2 − Wi2Wj3)= 0 (∀i, j ∈ [d1]) .This case also generates developable decision boundaries. The reason is as follows. We have

image

Choose an index  k ∈ I for which wk ̸= 0, then Wi3Wj2 −Wi2Wj3 = 0 (∀i, j ∈ I)implies  wj,2∼3 = cjwk,2∼3 (j ∈ I), where  wTj,2∼3denotes the vector composed of the 2nd and 3rd columns of  wTj,  cjis a constant. Define a new variable

image

which specifies a line parallel to the  y − zplane with normal (0, Wk2, Wk3)⊤. As a result, the whole decision boundary is a developable surface.

6.6 Proof of Theorem 4.3

image

We have explored some geometrical and topological properties of decision boundaries produced by DNNs using knowledge of differential geometry. Based on the curvatures of decision boundaries in terms of network parameters, we give sufficient conditions on neural network parameters in order to produce flat or developable decision boundaries. Based on the Gauss-Bonnet-Chern theorem in differential geometry, we then proposed a method to obtain the Euler characteristics of compact decision boundaries using a particular integral of curvatures, and conducted experiments to compute the Euler characteristics of sphere-like decision boundaries generated by neural networks.

In future work, we plan to apply efficient numerical techniques to compute the integrals in section 5 for topological properties of decision boundaries in high-dimensional input space. We are also interested in exploring the constraints on network architecture and parameters in order to produce decision boundaries with more complex geometrical properties, such as convexity or constant curvatures.

[1] Serguei Barannikov, Alexander Korotin, Dmitry Oganesyan, Daniil Emtsev, and Evgeny Burnaev. Barcodes as summary of objective functions’ topology. arXiv preprint arXiv:1912.00043, 2019.

[2] Hans-Peter Beise, Steve Dias Da Cruz, and Udo Schr¨oder. On decision regions of narrow deep neural networks. arXiv preprint arXiv:1807.01194, 2018.

[3] M. Bianchini and F. Scarselli. On the complexity of neural network classifiers: A comparison between shallow and deep architectures. IEEE Transactions on Neural Networks and Learning Systems, 25(8):1553–1565, 2014.

[4] Manfredo P. Do Carmo. Differential Geometry of Curves and Surfaces, 1st edition. Prentice-Hall, 1976.

[5] Chao Chen, Xiuyan Ni, Qinxun Bai, and Yusu Wang. A topological regularizer for classifiers via persistent homology. In AISTATS, 2019.

[6] Paul Dawkins. Paul’s online notes on calculus 3, section 1-10: Curvature. https://tutorial.math.lamar.edu/Classes/CalcIII/Curvature. aspx, 2021.

[7] Markus Deserno. Notes on differential geometry with special emphasis on surfaces in r3. https://www.cmu.edu/biolphys/deserno/pdf/diff_ geom.pdf, 2004.

[8] H Edelsbrunner and J Harer. Persistent homology—a survey. Contemporary Mathematics, 453:257–282, 2008.

[9] Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard, and Stefano Soatto. Empirical study of the topology and geometry of deep networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

[10] William H. Guss and Ruslan Salakhutdinov. On characterizing the capacity of neural networks using algebraic topology. arXiv preprint arXiv:1802.04443, 2018.

[11] Christoph D. Hofer, Roland Kwitt, Mandar Dixit, and Marc Niethammer. Connectivity-optimized representation learning via persistent homology. In International Conference on Machine Learning, 2019.

[12] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia (MM), 2014.

[13] J. M. Lee. Manifolds and differential geometry. American Mathematical Society Providence, 2009.

[14] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Jonathan Uesato, and Pascal Frossard. Robustness via curvature regularization, and vice versa. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

[15] Q. Nguyen, M. Mukkamala, and M. Hein. Neural networks should be wide enough to learn disconnected decision regions. In International Conference on Machine Learning, 2018.

[16] Ben Poole, Subhaneil Lahiri, Maithra Raghu, Jascha Sohl-Dickstein, and Surya Ganguli. Exponential expressivity in deep neural networks through transient chaos. In Advances in Neural Information Processing Systems, 2016.

[17] Karthikeyan Natesan Ramamurthy, Kush R. Varshney, and Krishnan Mody. Topological data analysis of decision boundaries with application to model selection. In International Conference on Machine Learning, 2019.

image

(a) Rank-deficient W (W  ∈ R2×2, W·,1= 0) maps 2D input space into a line, and then the curve  σ(Wx + b) intersects the line with normal a at a point whose pre-image in input space is a linear decision boundary.

image

(b) Wx + b (W ∈ R3×2, W21 = W31= 0) maps 2D input space into a plane p in 3D, then the curved surface  σ(p) intersects the plane q with normal  a = (0, 1, 1)T at a line (both planes p and q are parallel to x axis) whose pre-image in input space is a linear decision boundary.

image

(c) Rank-deficient W (W  ∈ R3×3, W·,1= 0) maps 3D input space into a plane p, and its intersection with  σ−1(q) (qis the plane with normal a ) is a curve whose pre-image in 3D input space is a developable decision boundary.

Figure 1: Illustrations of how flat or developable decision boundaries are generated.

image

Figure 2: (a). a synthetic dataset for binary classification. (b). the decision boundary produced by a one-hidden-layer neural network.


Designed for Accessibility and to further Open Science