We develop an advanced approach for extending Gaussian Differential Privacy (GDP) to general Riemannian manifolds. The concept of GDP stands out as a prominent privacy definition that strongly warrants extension to manifold settings, due to its central limit properties. By harnessing the power of the renowned Bishop–Gromov theorem in geometric analysis, we propose a Riemannian Gaussian distribution that integrates the Riemannian distance, allowing us to achieve GDP in Riemannian manifolds with bounded Ricci curvature. To the best of our knowledge, this work marks the first instance of extending the GDP framework to accommodate general Riemannian manifolds, encompassing curved spaces and circumventing the reliance on tangent-space summaries. We provide a simple algorithm to evaluate the privacy budget on any one-dimensional manifold and introduce a versatile Markov Chain Monte Carlo (MCMC)-based algorithm to compute the privacy budget on any Riemannian manifold with constant curvature. Through simulations on one of the most prevalent manifolds in statistics, the unit sphere, we demonstrate the superior utility of our Riemannian Gaussian mechanism in comparison to the previously proposed Riemannian Laplace mechanism for implementing GDP.
As technological advancements continue to accelerate, we are faced with the challenge of managing and understanding increasingly complex data. This data often resides in nonlinear manifolds, commonly found in various domains such as medical imaging [Pennec et al., 2019, Dryden, 2005, Dryden et al., 2009], signal processing [Barachant et al., 2010, Zanini et al., 2018], computer vision [Turaga and Srivastava, 2015, Turaga et al., 2008, Cheng and Vemuri, 2013], and geometric deep learning [Belkin et al., 2006, Niyogi, 2013]. These nonlinear manifolds are characterized by their distinct geometric properties, which can be utilized to extract valuable insights from the data.
As data complexity grows, so does the imperative to safeguard data privacy. Differential Privacy (DP) [Dwork et al., 2006b] has gained recognition as a prominent mathematical framework for quantifying privacy protection, and a number of privacy mechanisms [McSherry and Talwar, 2007, Barak et al., 2007, Wasserman and Zhou, 2010, Reimherr and Awan, 2019] have been devised with the aim of achieving DP. Conventional privacy mechanisms, while effective for dealing with linear data, encounter difficulties when handling complex non-linear data. In such cases, a common approach, referred to as the extrinsic approach, is to embed the non-linear data into the ambient Euclidean space, followed by the application of standard DP mechanisms. However, as exemplified in the work of Reimherr et al. [2021], the intrinsic properties of such non-linear data enable us to achieve better data utility while simultaneously preserving data privacy. Therefore, it is imperative that privacy mechanisms adapt to the complexity of non-linear data by employing tools from differential geometry to leverage the geometric structure within the data.
Related Work Reimherr et al. [2021] is the first to consider general manifolds in the DP literature. It extends the Laplace mechanism for ε-DP from Euclidean spaces to Riemannian manifolds. Focusing on the task of privatizing the Fréchet mean, it demonstrates that better utility can be achieved when utilizing the underlying geometric structure within the data. Continuing this line of work, Soto et al. [2022] develops a K-norm gradient mechanism for ε-DP on Riemannian manifolds and shows that it outperforms the Laplace mechanism previously mentioned in the task of privatizing the Fréchet mean. Similarly, Utpala et al. [2023b] extends (ε, δ)-DP and its Gaussian mechanism, but only to one specific manifold, the space of symmetric positive definite matrices (SPDM). Equipped with the log-Euclidean metric, the space of SPDM becomes a geometrically flat space [Arsigny et al., 2007], which allows them to simplify their approach and work with fewer complications, although at the expense of generality. In contrast to the task of releasing a manifold-valued private summary, Han et al. [2022] and Utpala et al. [2023a] focus on solving empirical risk minimization problems in a (ε, δ)-DP compliant manner by privatizing the gradient, which resides on the tangent bundle of Riemannian manifolds. Working on tangent spaces instead of the manifold itself, they can bypass many of the difficulties associated with working on Riemannian manifolds.
Motivations Although ε-differential privacy (DP) and its Laplace mechanism have been extended to general Riemannian manifolds in Reimherr et al. [2021], there are other variants of DP [Dwork et al., 2006a, Mironov, 2017, Bun and Steinke, 2016, Dong et al., 2022]. Each of them possesses unique advantages over the pure DP definition, and therefore their extensions should be considered as well. As one such variant, GDP offers composition and subsampling properties superior to those of (ε, δ)-DP. Additionally, it's shown that all hypothesis testing-based privacy definitions converge to the guarantees of GDP in the limit of composition [Dong et al., 2022]. Furthermore, when the dimension of the privatized data approaches infinity, a large class of noise-addition private mechanisms is shown to be asymptotically GDP [Dong et al., 2021]. These traits establish GDP as the focal privacy definition among the different variants of DP and therefore make it the most suitable option for generalizing to Riemannian manifolds.
Main Contributions With the goal of releasing manifold-valued statistical summaries in a GDP-compliant manner, we extend the GDP framework to general Riemannian manifolds, establishing the ability to use the Riemannian Gaussian distribution for achieving GDP on Riemannian manifolds. We then develop an analytical form to achieve μ-GDP that covers all the one-dimensional cases. Furthermore, we propose a general MCMC-based algorithm to evaluate the privacy budget μ on Riemannian manifolds with constant curvature. Lastly, we conduct numerical experiments to evaluate the utility of our Riemannian Gaussian mechanism by comparing it to the Riemannian Laplace mechanism. Our results conclusively demonstrate that, to achieve GDP, our Gaussian mechanism exhibits superior utility compared to the Laplace mechanism.
In this section, we first cover some basic concepts from Riemannian geometry; the materials covered can be found in standard Riemannian geometry texts such as Lee [2006], Petersen [2006], Pennec et al. [2019], Said [2021]. We then review some definitions and results on DP and GDP; please refer to Dwork and Roth [2014], Dong et al. [2022, 2021] for more detail.
2.1 Riemannian Geometry
Throughout this paper we let M denote a d-dimensional complete Riemannian manifold unless stated otherwise. A Riemannian metric g is a collection of scalar products ⟨·,·⟩_x on each tangent space T_xM at points x of the manifold that varies smoothly from point to point. For each x, each such scalar product is a positive definite bilinear map ⟨·,·⟩_x : T_xM × T_xM → ℝ. Equipped with a Riemannian metric g, we can define length and distance on M. Consider a curve γ : [0, 1] → M; the length of the curve is given by the integral

L(γ) = ∫₀¹ ‖γ̇(t)‖_{γ(t)} dt,

where γ̇(t) is the velocity vector and the norm ‖·‖_x = ⟨·,·⟩_x^{1/2} is uniquely determined by the Riemannian metric g. Note we use ‖·‖ to denote the ℓ₂ norm throughout this paper.
It follows that the distance between two points x, y ∈ M is the infimum of the lengths of all piecewise smooth curves from x to y: d(x, y) = inf{L(γ) : γ(0) = x, γ(1) = y}. In a similar fashion, we can introduce the notion of measure on M. The Riemannian metric g induces a unique measure ν on the Borel σ-algebra of M such that in any chart U, dν = √(det g) dλ, where g = (g_{ij}) is the matrix of the Riemannian metric in the local coordinates of U and λ is the Lebesgue measure in U [Grigoryan, 2009].
Given a point p ∈ M and a tangent vector v ∈ T_pM, there exists a unique geodesic γ_{(p,v)}(t) starting from p = γ_{(p,v)}(0) with tangent vector v = γ̇_{(p,v)}(0), defined in a small neighborhood of zero. It can then be extended to all of ℝ since we assume M is complete. This enables us to define the exponential map exp_p : T_pM → M as exp_p(v) = γ_{(p,v)}(1). For any p ∈ M, there is a neighborhood V of the origin in T_pM and a neighborhood U of p such that exp_p|_V : V → U is a diffeomorphism. Such a U is called a normal neighborhood of p. Locally, a straight line crossing the origin in T_pM transforms into a geodesic crossing through p on M via this map. On the normal neighborhood U, the inverse of the exponential map can be defined and is denoted by log_p : U → T_pM. The injectivity radius at a point p ∈ M is then defined as the maximal radius R such that exp_p(B(0, R)) is a normal neighborhood of p, and the injectivity radius of M is given by inj M = inf{inj p : p ∈ M}.
2.2 Differential Privacy
We start this section with the definition of (ε, δ)-DP.
Definition 2.1 ([Dwork et al., 2006a]). A data-releasing mechanism M is said to be (ε, δ)-differentially private with ε ≥ 0 and 0 ≤ δ ≤ 1 if, for any adjacent datasets D and D′ differing in only one record, we have

P(M(D) ∈ A) ≤ e^ε P(M(D′) ∈ A) + δ

for any measurable set A in the range of M.
Differential privacy can be interpreted through the lens of statistical hypothesis testing [Wasserman and Zhou, 2010, Kairouz et al., 2017]. Given the outcome of an (ε, δ)-DP mechanism and a pair of neighboring datasets D and D′, consider the hypothesis testing problem with H₀: the underlying dataset is D, against H₁: the underlying dataset is D′. The smaller ε and δ are, the harder this hypothesis testing will be; that is, it will be harder to detect the presence of one individual based on the outcome of the mechanism. More specifically, (ε, δ)-DP tells us that the power (that is, 1 − type II error) of any test at significance level α ∈ [0, 1] is bounded above by e^ε α + δ. Using this hypothesis testing interpretation, we can extend (ε, δ)-DP to the notion of Gaussian differential privacy.
Let P and Q denote the distributions of the outcome under H₀ and H₁ respectively. Let T(P, Q) : [0, 1] → [0, 1], α ↦ T(P, Q)(α), denote the optimal tradeoff between type I error and type II error; more specifically, T(P, Q)(α) is the smallest type II error attainable when the type I error equals α.
Definition 2.2 ([Dong et al., 2022]). A mechanism M is said to satisfy μ-Gaussian Differential Privacy (μ-GDP) if T(M(D), M(D′)) ≥ G_μ for all neighboring datasets D and D′, with G_μ := T(N(0, 1), N(μ, 1)).
Informally, μ-GDP states that it's harder to distinguish D from D′ than to distinguish between N(0, 1) and N(μ, 1). Similar to the case of (ε, δ)-differential privacy, a smaller value of μ indicates stronger privacy guarantees. As a privacy definition, μ-GDP enjoys several unique advantages over the (ε, δ)-DP definition. Notably, it has a tight composition property that cannot be improved in general. More importantly, a crucial insight in Dong et al. [2022] is that the best way to evaluate the privacy of the composition of many "highly private" mechanisms is through μ-GDP: they give a central limit theorem stating that all hypothesis testing-based privacy definitions converge to the guarantees of μ-GDP in the limit of composition. Furthermore, Dong et al. [2021] show that a large class of noise addition mechanisms is asymptotically μ-GDP when the dimension of the privatized data approaches infinity. These distinct characteristics position μ-GDP as the focal privacy definition among the different variants of DP, and we will extend the μ-GDP framework to general Riemannian manifolds in Section 3.
Our primary objective in this study is to disclose an M-valued statistical summary while preserving privacy in a GDP-compliant manner. To this end, we first extend the GDP definition to general Riemannian manifolds. In Definition 2.2, μ-GDP is defined through the optimal trade-off function T(M(D), M(D′)), which is challenging to work with on Riemannian manifolds. We resolve this difficulty by expressing μ-GDP as an infinite collection of (ε, δ)-DP guarantees (Corollary 1 in Dong et al. [2022]). Since (ε, δ)-DP is a well-defined notion on any measurable space [Wasserman and Zhou, 2010], it readily extends to any Riemannian manifold equipped with the Borel σ-algebra. Following this methodology, we define μ-GDP on general Riemannian manifolds as follows.
Definition 3.1. An M-valued data-releasing mechanism M is said to be μ-GDP if it is (ε, δ(ε))-DP for all ε ≥ 0, where

δ(ε) = Φ(−ε/μ + μ/2) − e^ε Φ(−ε/μ − μ/2)

and Φ denotes the cumulative distribution function of the standard normal distribution.
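The curve δ(ε) in Definition 3.1 is straightforward to evaluate numerically. A minimal R sketch (the function name is ours; pnorm() is the standard normal CDF Φ):

```r
# The (eps, delta)-DP curve equivalent to mu-GDP (Definition 3.1).
delta_mu <- function(eps, mu) {
  pnorm(-eps / mu + mu / 2) - exp(eps) * pnorm(-eps / mu - mu / 2)
}

delta_mu(1, 1)  # delta required at eps = 1 under 1-GDP, approx. 0.127
```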
Similarly, we extend the notion of sensitivity to Riemannian manifolds as well.
Definition 3.2 (Reimherr et al. [2021]). A summary f is said to have a global sensitivity of Δ < ∞, with respect to the Riemannian distance d, if d(f(D), f(D′)) ≤ Δ for any two adjacent datasets D and D′ differing in one record.
Following the extension of μ-GDP to Riemannian manifolds, it is crucial to develop a private mechanism that is compliant with μ-GDP. Given that the Gaussian distribution satisfies μ-GDP on Euclidean space [Dong et al., 2022], we hypothesize that an analogous extension of the Gaussian distribution to Riemannian manifolds would yield similar adherence to μ-GDP (e.g., Reimherr et al. [2021] extend the Laplace distribution to satisfy ε-DP on Riemannian manifolds). We introduce the Riemannian Gaussian distribution in the following definition.
Definition 3.3 (Section 2.5 in Pennec et al. [2019]). Let (M, g) be a Riemannian manifold such that Z(η, σ) := ∫_M exp(−d(η, x)²/(2σ²)) dν(x) < ∞. We define a probability density function w.r.t. ν as

p_{η,σ}(x) = Z(η, σ)⁻¹ exp(−d(η, x)²/(2σ²)).   (1)

We call this distribution a Riemannian Gaussian distribution with footprint η and rate σ, and denote it by N_M(η, σ²).
The necessity for Z(η, σ) to be finite is generally of little concern, as it has been shown to be finite on any compact manifold [Chakraborty and Vemuri, 2019] and on any Hadamard manifold (a complete, simply connected Riemannian manifold of non-positive sectional curvature) with lower-bounded sectional curvature [Said, 2021]. The distribution we introduce has been established to maximize entropy given the first two moments [Pennec et al., 2019, Pennec, 2006], and has already found applications in a variety of scenarios [Zhang and Fletcher, 2013, Hauberg, 2018, Said et al., 2017, Cheng and Vemuri, 2013, Zanini et al., 2018, Chakraborty and Vemuri, 2019]. When M = ℝ^d, the Riemannian Gaussian distribution reduces to the multivariate Gaussian distribution with mean η and covariance σ²I_d.
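In code, the density (1) is simply a Gaussian kernel in the Riemannian distance. A minimal R sketch with a generic distance function passed in (the naming is ours):

```r
# Unnormalized Riemannian Gaussian density (1): exp(-d(eta, x)^2 / (2 sigma^2)).
# `dist` is the Riemannian distance function of the manifold at hand.
riem_gauss_kernel <- function(x, eta, sigma, dist) {
  exp(-dist(eta, x)^2 / (2 * sigma^2))
}
```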
However, it is critical to highlight that this extension is not the only possible method for integrating the Gaussian distribution into Riemannian manifolds. Other approaches are available, such as the heat kernel diffusion process articulated by Grigoryan [2009], or the exponential-wrapped Gaussian introduced by Chevallier et al. [2022]. For an in-depth exploration of sampling from the Riemannian Gaussian distribution defined above, we refer readers to Section 4.1.
Furthermore, the subsequent theorem underscores the ability of the Riemannian Gaussian distribution, as defined herein, to meet the requirements of Gaussian Differential Privacy.
Theorem 3.1. Let M be a Riemannian manifold with lower-bounded Ricci curvature and f be an M-valued summary with global sensitivity Δ. The Riemannian Gaussian distribution with footprint f(D) and rate σ > 0 is μ-GDP for some μ < ∞.
Proof. See Appendix A.1.
While Theorem 3.1 confirms the potential of the Riemannian Gaussian distribution to achieve GDP, it leaves the relationship between the privacy budget μ and the rate σ undefined. The subsequent theorem establishes such a connection:
Theorem 3.2 (Riemannian Gaussian Mechanism). Let M be a Riemannian manifold with lower-bounded Ricci curvature and f be an M-valued summary with global sensitivity Δ. The Riemannian Gaussian distribution with footprint f(D) and rate σ is μ-GDP if and only if σ satisfies the following condition: for any two adjacent datasets D, D′ and all ε ≥ 0,

∫_A p_{f(D),σ}(x) dν(x) − e^ε ∫_A p_{f(D′),σ}(x) dν(x) ≤ Φ(−ε/μ + μ/2) − e^ε Φ(−ε/μ − μ/2),   (2)

where A = {x ∈ M : log(p_{f(D),σ}(x)/p_{f(D′),σ}(x)) ≥ ε}.
Proof. See Appendix A.2.
Given the rate σ, Theorem 3.2 provides us a way of computing the privacy budget μ through the inequality (2). When M = ℝ^d, the normalizing constant enjoys a tractable form, Z(η, σ) = (2πσ²)^{d/2}, and the inequality (2) then reduces to

Φ(Δ/(2σ) − εσ/Δ) − e^ε Φ(−Δ/(2σ) − εσ/Δ) ≤ Φ(−ε/μ + μ/2) − e^ε Φ(−ε/μ − μ/2),

where the equality holds if and only if μ = Δ/σ; this recovers the Gaussian mechanism on Euclidean spaces in Dong et al. [2022].
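This reduction is easy to verify numerically. A small R check (a sketch using delta_mu() from above; gauss_profile() is our name for the left-hand side):

```r
# Euclidean Gaussian privacy profile with sensitivity Delta and rate sigma.
gauss_profile <- function(eps, Delta, sigma) {
  pnorm(Delta / (2 * sigma) - eps * sigma / Delta) -
    exp(eps) * pnorm(-Delta / (2 * sigma) - eps * sigma / Delta)
}

# With sigma = Delta / mu the two curves coincide (up to rounding error).
eps <- seq(0, 5, by = 0.1)
max(abs(gauss_profile(eps, Delta = 1, sigma = 0.5) - delta_mu(eps, mu = 2)))
```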
It's worth pointing out that on ℝ^d, the Pythagorean theorem allows us to reduce the integrals to one-dimensional integrals, resulting in a simple solution for any dimension d. Unfortunately, the lack of a Pythagorean theorem on non-Euclidean spaces makes it difficult to evaluate the integrals on manifolds of dimension greater than one. It's known that any smooth, connected one-dimensional manifold is diffeomorphic either to the unit circle S¹ or to an interval of ℝ [Milnor and Weaver, 1997]. Therefore, we encompass all one-dimensional cases by presenting the following result on S¹.
Corollary 3.2.1 (Riemannian Gaussian Mechanism on S¹). Let f be an S¹-valued summary with global sensitivity Δ. The Riemannian Gaussian distribution with footprint f(D) and rate σ is μ-GDP if and only if σ satisfies condition (2), in which both integrals admit closed-form expressions in terms of the standard normal CDF Φ (given explicitly in Appendix A.3).
Proof. See Appendix A.3.
In summary, Corollary 3.2.1 provides us the analytical form for the integrals in (2). By specifying the rate σ and the sensitivity Δ, it becomes feasible to compute the privacy budget μ, namely the smallest μ such that inequality (2) holds for all ε ≥ 0. The systematic procedure is summarized in Algorithm 1. It is important to emphasize that the result in Corollary 3.2.1, together with the existing Euclidean-space result, covers all one-dimensional cases; manifolds of dimension d > 1 are tackled in Section 4.
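A minimal R sketch of the search step in Algorithm 1 (our own names, assuming a function profile(eps) that evaluates the left-hand side of (2), e.g., the closed form of Corollary 3.2.1 on S¹ or gauss_profile() above on ℝ): since δ_μ(ε) is increasing in μ for every ε, the smallest feasible μ can be found by bisection.

```r
# Smallest mu such that profile(eps) <= delta_mu(eps, mu) on a grid of eps.
find_mu <- function(profile, eps_grid = seq(0, 10, by = 0.01),
                    lower = 1e-6, upper = 100, tol = 1e-6) {
  feasible <- function(mu) all(profile(eps_grid) <= delta_mu(eps_grid, mu) + 1e-12)
  while (upper - lower > tol) {
    mid <- (lower + upper) / 2
    if (feasible(mid)) upper <- mid else lower <- mid
  }
  upper
}

# Sanity check on R: recovers mu = Delta / sigma = 2.
find_mu(function(e) gauss_profile(e, Delta = 1, sigma = 0.5))
```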
In Section 3, we demonstrated that our proposed Riemannian Gaussian distribution can be used to achieve GDP (Theorems 3.1 and 3.2). Furthermore, we documented our method for computing the privacy budget μ on S¹ in Algorithm 1. This, together with the already existing Euclidean results, encompasses all the one-dimensional Riemannian manifolds. However, for general Riemannian manifolds of dimension d > 1, the integrals in inequality (2) are difficult to compute (see Section 3 for more explanation). One of the central difficulties is that the dependence of the normalizing constant Z(η, σ) on the footprint η makes the expression of A intractable. To avoid this dependence, we introduce the concept of homogeneous Riemannian manifolds.
Definition 4.1 (Definition 4.6.1 in Berestovskii and Nikonorov [2020]). A Riemannian manifold (M, g) is called a homogeneous Riemannian manifold if a Lie group G acts transitively and isometrically on M.
Homogeneous Riemannian manifolds encompass a broad class of manifolds that are commonly encountered in statistics, such as the (hyper)sphere [Bhattacharya and Bhattacharya, 2012, Mardia et al., 2000], the space of SPD matrices [Pennec et al., 2019, Said et al., 2017, Hajri et al., 2016], the Stiefel manifold [Chakraborty and Vemuri, 2019, Turaga et al., 2008] and the Grassmann manifold [Turaga et al., 2008]. It's a more general class than Riemannian symmetric spaces, which are a common setting in geometric statistics [Cornea et al., 2017, Asta, 2014, Said et al., 2018, Said, 2021, Chevallier et al., 2022]. Please refer to Appendix A.4 for more detail.
Informally, a homogeneous Riemannian manifold looks geometrically the same at every point. The transitive property required in the definition implies that any homogeneous Riemannian manifold M only has one orbit. Therefore, we have the following proposition.
Proposition 4.1. If M is a homogeneous Riemannian manifold, then the normalizing constant Z(η, σ) does not depend on the footprint η, and we write it as Z(σ).
Therefore, on homogeneous Riemannian manifolds, we can simplify the set A in Theorem 3.2 to A = {x ∈ M : d(x, f(D′))² − d(x, f(D))² ≥ 2σ²ε}. To further simplify (2), we will need a much stronger assumption than homogeneity. In particular, we require the condition of constant curvature.
Theorem 4.1 (Riemannian Gaussian Mechanism on Manifolds with Constant Curvature). Let M be a Riemannian manifold with constant curvature and f be an M-valued summary with global sensitivity Δ. The Riemannian Gaussian distribution with footprint f(D) and rate σ is μ-GDP if and only if σ satisfies condition (2) for one (equivalently, every) pair of points η₁, η₂ ∈ M with d(η₁, η₂) = Δ, that is, for all ε ≥ 0,

∫_A p_{η₁,σ}(x) dν(x) − e^ε ∫_A p_{η₂,σ}(x) dν(x) ≤ Φ(−ε/μ + μ/2) − e^ε Φ(−ε/μ − μ/2),

where A = {x ∈ M : d(x, η₂)² − d(x, η₁)² ≥ 2σ²ε}.
Proof. See Appendix A.5.
Theorem 4.1 tells us that instead of evaluating the integrals in (2) for every neighboring pair f(D) and f(D′), we only need to check one such pair. Despite this convenience, evaluating the integrals remains a challenge, even for an elementary space like S^d with d > 1. To circumvent this challenge, we employ MCMC techniques for the integral computations. The main idea is simple: let X₁, X₂, …, Xₙ be independent and identically distributed random variables from N_M(η₁, σ²), and let Yᵢ = 1{Xᵢ ∈ A} denote the indicator that Xᵢ falls in A. By the strong law of large numbers [Dudley, 2002], we then have n⁻¹ Σᵢ Yᵢ → ∫_A p_{η₁,σ} dν almost surely, and likewise for samples from N_M(η₂, σ²). By using these empirical frequencies as approximations, we avoid the challenge of evaluating the integrals analytically. The detailed algorithm is documented in Algorithm 2, with a code sketch following this paragraph. It's known that any space of constant curvature is isometric, up to rescaling, to one of the model spaces: Euclidean space, the sphere, and hyperbolic space [Vinberg et al., 1993, Woods, 1901]. Therefore, Algorithm 2 offers a straightforward and practical method for assessing the privacy budget μ on spheres and hyperbolic spaces. Furthermore, Algorithm 2 can be extended to a more general class of Riemannian manifolds by sampling more than one pair of η₁, η₂ in step 1. The determination of the number of pairs to sample and of a selection method ensuring sufficient dissimilarity among the sampled pairs is an aspect that requires further investigation.
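A minimal R sketch of this Monte Carlo step (the function names are ours; rgauss_M stands in for any sampler from N_M(η, σ²) returning samples as rows, such as the Metropolis–Hastings sampler of Section 4.1, and dist for the Riemannian distance):

```r
# Monte Carlo estimate of the left-hand side of (2) on a constant-curvature
# manifold, where A = { x : d(x, eta2)^2 - d(x, eta1)^2 >= 2 sigma^2 eps }.
delta_hat <- function(eps, eta1, eta2, sigma, dist, rgauss_M, n = 1e5) {
  x1 <- rgauss_M(n, eta1, sigma)  # samples from N_M(eta1, sigma^2)
  x2 <- rgauss_M(n, eta2, sigma)  # samples from N_M(eta2, sigma^2)
  in_A <- function(x) dist(x, eta2)^2 - dist(x, eta1)^2 >= 2 * sigma^2 * eps
  p1 <- mean(apply(x1, 1, in_A))  # estimates the integral of p_{eta1} over A
  p2 <- mean(apply(x2, 1, in_A))  # estimates the integral of p_{eta2} over A
  p1 - exp(eps) * p2
}
```

The privacy budget is then the smallest μ with delta_hat(ε, …) ≤ δ_μ(ε) across a grid of ε values, found exactly as in find_mu() above.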
4.1 Sampling from Riemannian Gaussian Distribution
One crucial step in Algorithm 2 involves sampling from the two Riemannian Gaussian distributions N_M(η₁, σ²) and N_M(η₂, σ²). Since their densities (1) are known up to a constant, a Metropolis–Hastings algorithm is a natural choice. In this section, we describe a general Metropolis–Hastings algorithm for sampling from a Riemannian Gaussian distribution on an arbitrary homogeneous Riemannian manifold [Pennec et al., 2019]. However, there are more efficient sampling algorithms tailored to specific manifolds (e.g., [Said et al., 2017, Hauberg, 2018]).
The Metropolis–Hastings algorithm samples a candidate y from a proposal distribution q(·|xₜ) given the current state xₜ; the candidate y is accepted as the new state with probability ρ(xₜ, y) = min{1, p(y)q(xₜ|y)/(p(xₜ)q(y|xₜ))}. A natural choice for the proposal distribution q(·|xₜ) is an exponential-wrapped Gaussian distribution [Galaz-Garcia et al., 2022, Chevallier et al., 2022]. Informally, it's the distribution resulting from "wrapping" a Gaussian distribution on the tangent space back onto the manifold using the exponential map. Given the current state xₜ, we sample a tangent vector v from a Gaussian distribution on the tangent space T_{xₜ}M; if ‖v‖ is less than the injectivity radius, we propose y = exp_{xₜ}(v) and accept it with probability ρ(xₜ, y). Please refer to Section 2.5 in Pennec et al. [2019] for the detailed algorithm.
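On the unit sphere S^d ⊂ ℝ^{d+1}, this sampler takes a particularly simple form: the wrapped proposal density depends only on the geodesic distance between the current and proposed states and is therefore symmetric, so the acceptance ratio reduces to the ratio of target densities. A sketch in R (names are ours):

```r
# Geodesic distance and exponential map on the unit sphere S^d in R^(d+1).
sphere_dist <- function(p, q) acos(pmin(pmax(sum(p * q), -1), 1))
sphere_exp <- function(p, v) {
  nv <- sqrt(sum(v * v))
  if (nv < 1e-12) return(p)
  cos(nv) * p + sin(nv) * v / nv
}

# Metropolis-Hastings sampler targeting N_M(eta, sigma^2) on the sphere,
# with a tangent-space Gaussian proposal pushed through the exponential map.
rgauss_sphere <- function(n, eta, sigma, tau = sigma, burnin = 1000) {
  x <- eta
  out <- matrix(NA_real_, n, length(eta))
  for (i in seq_len(n + burnin)) {
    v <- rnorm(length(eta), sd = tau)
    v <- v - sum(v * x) * x             # project onto the tangent space at x
    if (sqrt(sum(v * v)) < pi) {        # stay within the injectivity radius
      y <- sphere_exp(x, v)
      logr <- (sphere_dist(x, eta)^2 - sphere_dist(y, eta)^2) / (2 * sigma^2)
      if (log(runif(1)) < logr) x <- y  # accept with prob. min(1, exp(logr))
    }
    if (i > burnin) out[i - burnin, ] <- x
  }
  out
}
```

Together with sphere_dist(), this rgauss_sphere() can be plugged directly into delta_hat() above.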
4.2 GDP on ℝ and S¹
To evaluate the performance of Algorithm 2, we conduct simulations on the Euclidean space ℝ and the unit circle S¹, as the relation between μ and σ is established for these spaces in Dong et al. [2022] and Corollary 3.2.1 respectively.
We fix the sensitivity Δ and let the rate σ vary over a grid of values. For each σ, we determine the privacy budget μ using two approaches: (i) the analytical approach, i.e., Algorithm 1 on S¹ and the relation μ = Δ/σ on ℝ; (ii) the MCMC approach of Algorithm 2, applied to both S¹ and ℝ. Since Algorithm 2 is a randomized algorithm, we generate 20 replicates for approach (ii) for each σ.
In Figure 1, the first panel plots the sample means of the μ's generated by approach (ii) (in grey with rectangular symbols), with error bars indicating the minimum and the maximum of the μ's, alongside the μ computed by approach (i) (in red with circular symbols). Additionally, as a comparison, we also plot the μ–σ relation for Euclidean space (in blue with triangular symbols) in the first plot. As we can see from the first plot, the GDP results on S¹ are almost exactly the same as on ℝ for smaller σ. This is expected, as manifolds are locally Euclidean, and the smaller the rate σ, the closer the two results will be.
Figure 1: First and second plots: red lines with circular symbols represent the relation between the privacy budget μ and the rate σ on the unit circle S¹; blue lines with triangular symbols represent the relation in Euclidean space. Gray lines with rectangular symbols plot the sample mean of the μ's, across the 20 repeats, computed at a variety of σ using Algorithm 2; the error bars indicate the minimum and maximum of the μ's. Refer to Section 4.2 for details. Third plot: the blue line with triangular symbols and the red line with circular symbols indicate the sample means, across 100 repeats, of the Riemannian distances from the non-private Fréchet mean η̄ to the two privatized means η̃_G and η̃_L. The error bands indicate the sample mean ± SE. Refer to Section 5.2 for details.
As σ gets larger, the privacy budget μ becomes smaller on S¹ than on ℝ. This can be explained by the fact that S¹ is a compact space while ℝ is not. For the second panel, we plot exactly the same quantities for Euclidean space. As we can observe from both panels, our Algorithm 2 gives a fairly accurate estimation for larger σ; as σ gets smaller, Algorithm 2 has a tendency to generate estimates that exhibit a higher degree of overestimation.
In this section, we evaluate the utility of our Riemannian Gaussian mechanism by focusing on the task of releasing a differentially private Fréchet mean. Specifically, we conduct numerical examples on the unit sphere, which is commonly encountered in statistics [Bhattacharya and Bhattacharya, 2012, Mardia et al., 2000]. As a comparison to our Riemannian Gaussian mechanism, we use the Riemannian Laplace mechanism implemented in Reimherr et al. [2021] to achieve GDP. Although the Riemannian Laplace mechanism was developed originally to achieve ε-DP, it's shown in Liu et al. [2022] that any mechanism satisfying ε-DP also achieves μ-GDP with μ = −2Φ⁻¹(1/(1 + e^ε)). Our results show significant improvements in utility for the privatization of the Fréchet mean when using our proposed Gaussian mechanism, as opposed to the Laplace mechanism. In Section 5.1, we cover some basics of the differentially private Fréchet mean. In Section 5.2, we discuss the numerical results on the sphere. Simulations are done in R on a Mac Mini computer with an Apple M1 processor and 8 GB of RAM running macOS 13. For more details on each simulation, please refer to Appendix A.6. The R code is available in the GitHub repository: https://github.com/Lei-Ding07/Gaussian-Differential-Privacy-on-Riemannian-Manifolds
5.1 Differentially Private Fréchet Mean
For more details on the Fréchet mean under the DP setting, please refer to Reimherr et al. [2021]. Consider a set of data x₁, …, xₙ on M. The Euclidean sample mean can be generalized to Riemannian manifolds as the sample Fréchet mean, which is the minimizer of the sum-of-squared distances to the data, η̄ = argmin_{m ∈ M} Σᵢ₌₁ⁿ d(m, xᵢ)². To ensure the existence and uniqueness of the Fréchet mean and to determine its sensitivity, we need the following assumption.
Assumption 1. The data D lie in a geodesic ball B_r(m₀) for some m₀ ∈ M, where the radius r satisfies r < r*, with r* determined by the injectivity radius of M and the curvature bound (r* scales with min{inj M, π/√κ} for κ > 0 and with inj M for κ ≤ 0; see Reimherr et al. [2021] for the precise constants). Note κ denotes an upper bound on the sectional curvatures of M.
Under Assumption 1, we can then compute the sensitivity of the Fréchet mean [Reimherr et al., 2021]: consider two adjacent datasets D and D′ differing in one record. If η̄ and η̄′ are the two sample Fréchet means of D and D′ respectively, then d(η̄, η̄′) ≤ Δ, where the global sensitivity Δ is an explicit bound, derived in Reimherr et al. [2021], of order r/n with a constant depending on r and κ.
5.2 Sphere
First, we revisit some background material on spheres; refer to Bhattacharya and Bhattacharya [2012], Reimherr et al. [2021] for more details. We denote the d-dimensional unit sphere as S^d and identify it as a subspace of ℝ^{d+1}: S^d = {p ∈ ℝ^{d+1} : ‖p‖ = 1}. Similarly, at each p ∈ S^d, we identify the tangent space T_pS^d as T_pS^d = {v ∈ ℝ^{d+1} : vᵀp = 0}. The geodesics are the great circles, γ_{(p,v)}(t) = cos(t)p + sin(t)v with ‖v‖ = 1, where γ_{(p,v)} denotes the geodesic starting at p with unit direction vector v. The exponential map is given by exp_p(0) = p and exp_p(v) = cos(‖v‖)p + sin(‖v‖)v/‖v‖ for v ≠ 0. The inverse of the exponential map, defined on S^d ∖ {−p}, has the expression log_p(p) = 0 and

log_p(q) = arccos(pᵀq) (q − (pᵀq)p)/‖q − (pᵀq)p‖

for q ≠ p. It follows that the distance function is given by d(p, q) = arccos(pᵀq). Therefore, S^d has an injectivity radius of π.
We initiate our analysis by generating sample data from a ball of radius r on the unit sphere and subsequently computing the Fréchet mean η̄. To disseminate the private Fréchet mean, we implement two methods: (i) we first generate the privatized mean η̃_G by drawing from N_M(η̄, σ²), employing the sampling method proposed by Hauberg [2018]; the privacy budget μ is then computed using Algorithm 2. (ii) Next, we convert μ-GDP to the equivalent ε-DP using μ = −2Φ⁻¹(1/(1 + e^ε)), and generate the privatized mean η̃_L by sampling from the Riemannian Laplace distribution with footprint η̄ and rate Δ/ε, using the sampling method introduced by You and Shung [2022].
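The μ-to-ε conversion in step (ii) can be carried out numerically by inverting the stated relation; a one-line R sketch (assuming the relation μ = −2Φ⁻¹(1/(1 + e^ε)) quoted above):

```r
# Solve mu = -2 * qnorm(1 / (1 + exp(eps))) for eps, given mu > 0.
mu_to_eps <- function(mu) {
  uniroot(function(eps) -2 * qnorm(1 / (1 + exp(eps))) - mu,
          lower = 1e-8, upper = 50)$root
}

mu_to_eps(1)  # epsilon of the epsilon-DP guarantee matched to 1-GDP
```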
Throughout these simulations, we fix the sample size at n = 10 to maintain a constant sensitivity. With the sensitivity held constant, we let the rate σ vary over a grid of values. The objective here is to discern the difference between the two distances d(η̄, η̃_G) and d(η̄, η̃_L) across varying privacy budgets μ. The third plot in Figure 1 displays the sample means of the Riemannian distances d(η̄, η̃_G) and d(η̄, η̃_L) across 1000 iterations, with the error bands indicating the sample mean ± SE. From observing the third plot, we see that our Gaussian mechanism achieves better utility, especially with a smaller privacy budget μ. With larger μ, the gain in utility is less pronounced. One obvious reason is that there is much less perturbation with larger μ for both approaches, so the difference is subtle. The other reason is that Algorithm 2 has a tendency to overestimate μ with smaller σ: effectively, η̃_G then satisfies μ-GDP with a smaller μ than the one we report, making the comparison with η̃_L conservative.
In this paper, we extend the notion of GDP to general Riemannian manifolds. We then show that GDP can be achieved by using the Riemannian Gaussian distribution as the additive noise. Furthermore, we propose a general MCMC-based algorithm to compute the privacy budget μ on manifolds with constant curvature. Lastly, we show through simulations that our Gaussian mechanism outperforms the Laplace mechanism in achieving μ-GDP on the unit sphere.
There are many future research directions. First of all, the framework established in this paper can be used for extending f-DP to general Riemannian manifolds. There are several points of improvement around Algorithm 2 as well. Although Algorithm 2 provides us a general method of computing the privacy budget μ, it lacks an error bound on its estimation. Furthermore, due to the random nature of Algorithm 2, a variance estimator of the output is desirable and would better inform the end user. Though we demonstrate the utility of our mechanism through simulations in Section 5, it's difficult to obtain a theoretical utility guarantee due to the lack of a simple analytical relation between μ and σ. Furthermore, although Algorithm 2 requires the manifolds to have constant curvature, it's possible to extend Algorithm 2 and Theorem 4.1 to a slightly more general class of manifolds. Additionally, the Riemannian Gaussian distribution defined in this paper is not the only way of extending the Gaussian distribution to Riemannian manifolds, as mentioned in Section 3. Potentially, the two other approaches, the wrapped-distribution approach and the heat-kernel approach, can be used to achieve GDP on Riemannian manifolds as well. In particular, there are many rich results around heat kernels on Riemannian manifolds (see Grigoryan [2009] for example). Incorporating the heat kernel in a privacy mechanism presents ample potential for novel and noteworthy discoveries.
We would like to express our greatest appreciation to Professor Xiaowei Wang of Rutgers University and Professor Xi Chen from the University of Alberta for their invaluable guidance, support, discussion, and expertise throughout the course of this research. This work was supported by the Canada CIFAR AI Chairs program, the Alberta Machine Intelligence Institute, the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada Research Chairs program from NSERC, and the Canadian Statistical Sciences Institute. We also thank Qirui Hu from Tsinghua University, as well as the reviewers for their constructive suggestions and comments.
V. Arsigny, P. Fillard, X. Pennec, and N. Ayache. Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM Journal on Matrix Analysis and Applications, 29 (1):328–347, 2007. doi: 10.1137/050637996. URL https://doi.org/10.1137/050637996.
D. M. Asta. Kernel density estimation on symmetric spaces. In International Conference on Geometric Science of Information, 2014.
B. Balle and Y.-X. Wang. Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. In J. Dy and A. Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 394–403. PMLR, 10–15 Jul 2018. URL https://proceedings.mlr.press/v80/balle18a.html.
B. Balle, G. Barthe, and M. Gaboardi. Privacy amplification by subsampling: Tight analyses via couplings and divergences. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, page 6280–6290, Red Hook, NY, USA, 2018. Curran Associates Inc.
A. Barachant, S. Bonnet, M. Congedo, and C. Jutten. Riemannian geometry applied to BCI classification. Latent Variable Analysis and Signal Separation, 09 2010. doi: 10.1007/978-3-642-15995-4_78.
B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: A holistic solution to contingency table release. In Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '07, page 273–282, New York, NY, USA, 2007. Association for Computing Machinery. ISBN 9781595936851. doi: 10.1145/1265530.1265569. URL https://doi.org/10.1145/1265530.1265569.
M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7(85): 2399–2434, 2006. URL http://jmlr.org/papers/v7/belkin06a.html.
V. Berestovskii and Y. Nikonorov. Riemannian manifolds and homogeneous geodesics. Springer, 2020.
A. Bhattacharya and R. Bhattacharya. Nonparametric Inference on Manifolds: With Applications to Shape Spaces. IMS monograph. Cambridge University Press, 2012. ISBN 9781107019584.
M. Bun and T. Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. 14th Theory of Cryptography Conference, 2016. URL https://arxiv.org/abs/1605.02065.
R. Chakraborty and B. C. Vemuri. Statistics on the Stiefel manifold: Theory and applications. The Annals of Statistics, 47(1):415–438, 2019. doi: 10.1214/18-AOS1692. URL https://doi.org/10.1214/18-AOS1692.
G. Cheng and B. C. Vemuri. A novel dynamic system in the space of spd matrices with applications to appearance tracking. SIAM J. Img. Sci., 6(1):592–615, jan 2013. doi: 10.1137/110853376. URL https://doi.org/10.1137/110853376.
E. Chevallier, D. Li, Y. Lu, and D. Dunson. Exponential-wrapped distributions on symmetric spaces. SIAM Journal on Mathematics of Data Science, 4(4):1347–1368, 2022.
E. Cornea, H. Zhu, P. Kim, and J. G. Ibrahim. Regression models on riemannian symmetric spaces. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 79(2):463–482, 2017. ISSN 13697412, 14679868. URL http://www.jstor.org/stable/44682521.
J. Dong, W. Su, and L. Zhang. A central limit theorem for differentially private query answering. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 14759–14770. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper_files/paper/2021/file/7c2c48a32443ad8f805e48520f3b26a4-Paper.pdf.
J. Dong, A. Roth, and W. J. Su. Gaussian differential privacy. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(1):3–37, 2022.
I. L. Dryden. Statistical analysis on high-dimensional spheres and shape spaces. The Annals of Statistics, 33(4):1643–1665, 2005. ISSN 00905364. URL http://www.jstor.org/stable/3448620.
I. L. Dryden, A. Koloydenko, and D. Zhou. Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging. The Annals of Applied Statistics, 3(3):1102–1123, 2009. ISSN 19326157, 19417330. URL http://www.jstor.org/stable/30242879.
R. M. Dudley. Real Analysis and Probability. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2nd edition, 2002. doi: 10.1017/CBO9780511755347.
C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9:211–407, aug 2014. ISSN 1551-305X. doi: 10.1561/0400000042. URL https://doi.org/10.1561/0400000042.
C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In Proceedings of the 24th Annual International Conference on The Theory and Applications of Cryptographic Techniques, EUROCRYPT’06, page 486–503, Berlin, Heidelberg, 2006a. Springer-Verlag. ISBN 3540345469. doi: 10.1007/11761679_29. URL https://doi.org/10.1007/11761679_29.
C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In S. Halevi and T. Rabin, editors, Theory of Cryptography, pages 265–284, Berlin, Heidelberg, 2006b. Springer Berlin Heidelberg.
F. Galaz-Garcia, M. Papamichalis, K. Turnbull, S. Lunagomez, and E. Airoldi. Wrapped distributions on homogeneous riemannian manifolds. arXiv preprint arXiv:2204.09790, 2022.
A. Grigoryan. Heat kernel and analysis on manifolds. American Mathematical Soc., 2009.
H. Hajri, I. Ilea, S. Said, L. Bombrun, and Y. Berthoumieu. Riemannian laplace distribution on the space of symmetric positive definite matrices. Entropy, 18(3), 2016. ISSN 1099-4300. doi: 10.3390/e18030098. URL https://www.mdpi.com/1099-4300/18/3/98.
A. Han, B. Mishra, P. Jawanpuria, and J. Gao. Differentially private riemannian optimization. arXiv preprint arXiv:2205.09494, 2022.
S. Hauberg. Directional statistics with the spherical normal distribution. In 2018 21st International Conference on Information Fusion (FUSION), pages 704–711, 2018. doi: 10.23919/ICIF.2018.8455242.
S. Helgason. Differential geometry and symmetric spaces. New York and London: Academic Press, 1962.
P. Kairouz, S. Oh, and P. Viswanath. The composition theorem for differential privacy. IEEE Transactions on Information Theory, 63(6):4037–4049, 2017. doi: 10.1109/TIT.2017.2685505.
J. M. Lee. Riemannian manifolds: an introduction to curvature. Springer Science & Business Media, 2006.
Y. Liu, K. Sun, B. Jiang, and L. Kong. Identification, amplification and measurement: A bridge to gaussian differential privacy. In A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, editors, Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=UpNCpGvD96A.
K. V. Mardia, P. E. Jupp, and K. Mardia. Directional statistics. Wiley, 2000.
F. McSherry and K. Talwar. Mechanism design via differential privacy. Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS, pages 94–103, 11 2007. doi: 10.1109/FOCS.2007.66.
J. Milnor and D. W. Weaver. Topology from the differentiable viewpoint. Princeton university press, 1997.
I. Mironov. Rényi differential privacy. In 2017 IEEE 30th computer security foundations symposium (CSF), pages 263–275. IEEE, 2017.
P. Niyogi. Manifold regularization and semi-supervised learning: Some theoretical analyses. Journal of Machine Learning Research, 14(37):1229–1250, 2013. URL http://jmlr.org/papers/v14/niyogi13a.html.
X. Pennec. Intrinsic statistics on riemannian manifolds: Basic tools for geometric measurements. Journal of Mathematical Imaging and Vision, 25:127–154, 07 2006. doi: 10.1007/s10851-006-6228-4.
X. Pennec, S. Sommer, and T. Fletcher. Riemannian geometric statistics in medical image analysis. Academic Press, 2019.
P. Petersen. Riemannian geometry. Springer, 3rd edition, 2006.
M. Reimherr and J. Awan. KNG: The k-norm gradient mechanism. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_files/paper/2019/file/faefec47428cf9a2f0875ba9c2042a81-Paper.pdf.
M. Reimherr, K. Bharath, and C. Soto. Differential privacy over riemannian manifolds. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 12292–12303. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper_files/paper/2021/file/6600e06fe9350b62c1e343504d4a7b86-Paper.pdf.
S. Said. Statistical models and probabilistic methods on riemannian manifolds, 2021.
S. Said, L. Bombrun, Y. Berthoumieu, and J. H. Manton. Riemannian gaussian distributions on the space of symmetric positive definite matrices. IEEE Transactions on Information Theory, 63(4): 2153–2170, 2017. doi: 10.1109/TIT.2017.2653803.
S. Said, H. Hajri, L. Bombrun, and B. C. Vemuri. Gaussian distributions on riemannian symmetric spaces: Statistical learning with structured covariance matrices. IEEE Transactions on Information Theory, 64(2):752–772, 2018. doi: 10.1109/TIT.2017.2713829.
C. Soto, K. Bharath, M. Reimherr, and A. Slavković. Shape and structure preserving differential privacy. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 24693–24705. Curran Associates, Inc., 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/9c84feb75eae1ef6389f31b3ef050b6a-Paper-Conference.pdf.
P. Turaga, A. Veeraraghavan, and R. Chellappa. Statistical analysis on stiefel and grassmann manifolds with applications in computer vision. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2008. doi: 10.1109/CVPR.2008.4587733.
P. K. Turaga and A. Srivastava. Riemannian Computing in Computer Vision. Springer Publishing Company, Incorporated, 1st edition, 2015. ISBN 3319229567.
S. Utpala, A. Han, P. Jawanpuria, and B. Mishra. Improved differentially private riemannian optimization: Fast sampling and variance reduction. Transactions on Machine Learning Research, 2023a. ISSN 2835-8856. URL https://openreview.net/forum?id=paguBNtqiO.
S. Utpala, P. Vepakomma, and N. Miolane. Differentially private fréchet mean on the manifold of symmetric positive definite (SPD) matrices with log-euclidean metric. Transactions on Machine Learning Research, 2023b. ISSN 2835-8856. URL https://openreview.net/forum?id=mAx8QqZ14f.
E. B. Vinberg, V. Minachin, D. Alekseevskij, O. Shvartsman, and A. Solodovnikov. Geometry II: spaces of constant curvature. Springer, 1993.
L. Wasserman and S. Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010. ISSN 01621459. URL http://www.jstor.org/stable/29747034.
F. S. Woods. Space of constant curvature. Annals of Mathematics, 3(1/4):71–112, 1901. ISSN 0003486X. URL http://www.jstor.org/stable/1967636.
K. You and D. Shung. On the spherical laplace distribution, 2022.
P. Zanini, M. Congedo, C. Jutten, S. Said, and Y. Berthoumieu. Transfer learning: A riemannian geometry framework with applications to brain–computer interfaces. IEEE Transactions on Biomedical Engineering, 65(5):1107–1116, 2018. doi: 10.1109/TBME.2017.2742541.
M. Zhang and T. Fletcher. Probabilistic principal geodesic analysis. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, editors, Advances in Neural Information Processing Systems, volume 26. Curran Associates, Inc., 2013. URL https://proceedings.neurips.cc/paper_files/paper/2013/file/eb6fdc36b281b7d5eabf33396c2683a2-Paper.pdf.
A.1 Theorem 3.1
First, we need to introduce the notion of the privacy profile [Balle et al., 2018]. The privacy profile δ_M of a mechanism M is a function associating to each privacy parameter ε a bound on the e^ε-divergence between the results of running the mechanism on two adjacent datasets, i.e.,

δ_M(ε) = sup_{D≃D′} D_{e^ε}(M(D) ‖ M(D′)),

where the α-divergence D_α between two probability measures P and Q is defined as

D_α(P ‖ Q) = sup_E [P(E) − α Q(E)],

where E ranges over all measurable subsets of the sample space. Informally speaking, the privacy profile represents the set of all privacy parameters (ε, δ) under which a mechanism provides differential privacy. Furthermore, the privacy profile can be computed using Theorem 5 of Balle and Wang [2018]:

δ_M(ε) = P(L_{P,Q} ≥ ε) − e^ε Q(L_{P,Q} ≥ ε),
where L_{P,Q} is the privacy loss random variable of the mechanism M on inputs D and D′, defined as L_{P,Q} = log(p(X)/q(X)), where X is the output of the mechanism, and p and q are the densities of P = M(D) and Q = M(D′) respectively. It follows that for our Riemannian Gaussian mechanism M, the privacy profile δ_M(ε) can be rewritten as

δ_M(ε) = ∫_A p_{η₁,σ}(x) dν(x) − e^ε ∫_A p_{η₂,σ}(x) dν(x),

where η₁ = f(D), η₂ = f(D′), and A = {x ∈ M : log(p_{η₁,σ}(x)/p_{η₂,σ}(x)) ≥ ε}. This is exactly the left-hand side of (2).
The following theorem establishes a connection between GDP and the privacy profile:
Theorem A.1 (Theorem 3.3 of Liu et al. [2022]). Let μ > 0. A privacy mechanism A with privacy profile δ_A is μ-GDP if and only if δ_A(ε) ≤ δ_μ(ε) for all ε ≥ 0, where δ_μ(ε) := Φ(−ε/μ + μ/2) − e^ε Φ(−ε/μ − μ/2).
Theorem A.1 implies that a mechanism whose privacy profile is dominated by δ_μ(ε) for some finite μ is μ-GDP for that privacy budget μ. Note that Theorem 3.1 only claims that the Riemannian Gaussian distribution achieves μ-GDP for some μ < ∞. Therefore, to prove it, we only need to show that its privacy profile δ_M(ε) is eventually dominated by δ_μ(ε) for a large enough μ. We will show this by demonstrating that δ_M(ε) decays at a Gaussian rate in ε. To do so, we will need the Bishop–Gromov comparison theorem:
Theorem A.2 (Bishop–Gromov; Lemma 7.1.4 in Petersen [2006]). Let M be a complete n-dimensional Riemannian manifold whose Ricci curvature satisfies the lower bound Ric ≥ (n − 1)K for a constant K ∈ ℝ. Let M^n_K be the complete n-dimensional simply connected space of constant sectional curvature K (and hence of constant Ricci curvature (n − 1)K). Denote by B(p, r) the ball of radius r around a point p, defined with respect to the Riemannian distance function. Then, for any p ∈ M and p_K ∈ M^n_K, the function

φ(r) = V(B(p, r)) / V_K(B(p_K, r))

is non-increasing on (0, ∞), where V and V_K denote the Riemannian volumes in M and M^n_K respectively. As r goes to zero, the ratio approaches one, so together with the monotonicity this implies that V(B(p, r)) ≤ V_K(B(p_K, r)).
The Bishop–Gromov comparison theorem not only gives us control of the volume growth of such manifolds but also gives a rough classification by sectional curvature. Besides, this is a global property in the sense that p can be an arbitrary point on the manifold.
A.1.1 Proof of Theorem 3.1
Proof. By Theorem A.1, we only need to show that δ_M(ε) ≤ δ_μ(ε) for all ε ≥ 0 once μ is taken large enough, where

δ_M(ε) = ∫_A p_{η₁,σ}(x) dν(x) − e^ε ∫_A p_{η₂,σ}(x) dν(x)

and A is given by A = {x ∈ M : log(p_{η₁,σ}(x)/p_{η₂,σ}(x)) ≥ ε}.

Let's consider the privacy loss

log(p_{η₁,σ}(x)/p_{η₂,σ}(x)) = C + (d(x, η₂)² − d(x, η₁)²)/(2σ²) ≤ C + Δ(2d(x, η₁) + Δ)/(2σ²),

where C = log(Z(η₂, σ)/Z(η₁, σ)) is a finite constant and the inequality follows from d(x, η₂) ≤ d(x, η₁) + Δ. Thus we have

A ⊆ {x ∈ M : d(x, η₁) ≥ R(ε)}, where R(ε) = σ²(ε − C)/Δ − Δ/2.

Let P = N_M(η₁, σ²); note that since δ_M(ε) ≤ P(A), we have δ_M(ε) ≤ P(d(X, η₁) ≥ R(ε)). Thus, we only need to prove the following: the tail

P(d(X, η₁) ≥ R) = Z(η₁, σ)⁻¹ ∫_{M∖B(η₁,R)} exp(−d(η₁, x)²/(2σ²)) dν(x)

decays at a Gaussian rate when R → ∞. One can easily show the following inequality,

∫_{M∖B(η₁,R)} exp(−d(η₁, x)²/(2σ²)) dν(x) ≤ Σ_{k≥0} exp(−(R + k)²/(2σ²)) V(B(η₁, R + k + 1)),

so it suffices to control the volume growth r ↦ V(B(η₁, r)). By Theorem A.2, we have the following three cases:

1. K > 0. The comparison space is the n-sphere of radius 1/√K, and V_K(B(p_K, r)) is obviously less than the volume of the whole space S^n_{1/√K}. Thus we have V(B(η₁, r)) ≤ C_n K^{−n/2}, where C_n is a constant depending only on the dimension n. One can easily see that as R grows, B(η₁, R) will cover the whole manifold M while the volume factor stays bounded, so the right-hand side of the inequality above approaches 0 at a Gaussian rate.

2. K = 0. The comparison space is the n-dimensional Euclidean space ℝⁿ, so V(B(η₁, r)) ≤ c_n rⁿ. The same argument as above applies: the polynomial volume growth is dominated by the Gaussian factor, and the tail approaches 0 at a Gaussian rate.

3. K < 0. The comparison space is the hyperbolic space H^n_K. The hyperbolic volume of a ball is given by

V_K(B(p_K, r)) = ω_{n−1} ∫₀^r (sinh(√(−K) t)/√(−K))^{n−1} dt,

where the hyperbolic function is given by sinh(t) = (e^t − e^{−t})/2 and ω_{n−1} is the surface area of the unit (n − 1)-sphere. It's not hard to see that V_K(B(p_K, r)) ≤ c_{n,K} e^{(n−1)√(−K) r} for a constant c_{n,K}. Plugging this bound into the tail sum, the principal part of the exponent on the right-hand side is still −R²/(2σ²); thus the tail again approaches 0 at a Gaussian rate as R → ∞.

In all three cases, δ_M(ε) decays at a Gaussian rate in ε, so δ_M(ε) ≤ δ_μ(ε) for all ε ≥ 0 once μ is large enough, and the mechanism is μ-GDP for some μ < ∞ by Theorem A.1.
A.2 Theorem 3.2
Proof. Follows directly from Definition 2.2, Theorem 3.1 and Theorem 5 in Balle and Wang [2018].
A.3 Corollary 3.2.1
Proof. In this proof, we parameterize points on S¹ by their polar angles θ ∈ [−π, π).

On S¹, the Riemannian Gaussian distribution with footprint η has the density

p_{η,σ}(θ) = Z(σ)⁻¹ exp(−d(θ, η)²/(2σ²)), where d(θ, η) = min{|θ − η|, 2π − |θ − η|} and Z(σ) = √(2πσ²)(2Φ(π/σ) − 1);

restricted to the arc (η − π, η + π], this is a normal density with mean η and scale σ truncated to that arc. Note that since S¹ has constant curvature, we can use Theorem 4.1 instead of Theorem 3.2.

WLOG we assume η₁ = 0 and η₂ = Δ with 0 < Δ ≤ π. Given an arbitrary ε ≥ 0, A takes the following form: for θ ∈ [Δ − π, π) we have d(θ, Δ)² − d(θ, 0)² = Δ² − 2Δθ, so the defining condition d(θ, Δ)² − d(θ, 0)² ≥ 2σ²ε holds if and only if θ ≤ Δ/2 − σ²ε/Δ; for θ ∈ [−π, Δ − π) we have d(θ, Δ)² − d(θ, 0)² = (2π − Δ)(2θ + 2π − Δ), so the condition holds if and only if θ ≥ Δ/2 − π + σ²ε/(2π − Δ). Thus we have

A = {θ ∈ [−π, Δ − π) : θ ≥ Δ/2 − π + σ²ε/(2π − Δ)} ∪ {θ ∈ [Δ − π, π) : θ ≤ Δ/2 − σ²ε/Δ},

an arc that shrinks as ε grows and is eventually empty. The two integrals in (2) are therefore truncated-normal probabilities of this arc under the footprints 0 and Δ, and each evaluates to a difference of Φ at the (recentered and rescaled) endpoints; for the footprint Δ, the piece of A lying in [−π, Δ − π) is reached by wrapping around −π. Evaluating the first integral and then the second integral separately over the ranges of ε for which each piece of A is non-empty yields the closed-form condition of Corollary 3.2.1; putting these expressions together with Theorem 4.1 proves the corollary.
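A transcription of this case analysis into R makes it concrete (a sketch under the derivation above, with our own function name; this is the profile(eps) that the bisection of Algorithm 1 consumes on S¹):

```r
# Left-hand side of (2) on S^1, footprints at angles 0 and Delta (0 < Delta <= pi).
s1_profile <- function(eps, Delta, sigma) {
  Zn <- 2 * pnorm(pi / sigma) - 1   # truncated-normal normalizing constant
  a <- Delta / 2 - pi + sigma^2 * eps / (2 * pi - Delta)  # left endpoint of A
  b <- Delta / 2 - sigma^2 * eps / Delta                  # right endpoint of A
  # P(A) under footprint 0: the two pieces of A, clamped to zero when empty.
  p1 <- pmax(0, pnorm((Delta - pi) / sigma) - pnorm(a / sigma)) +
        pmax(0, pnorm(b / sigma) - pnorm((Delta - pi) / sigma))
  # P(A) under footprint Delta: the piece in [-pi, Delta - pi) wraps around -pi.
  p2 <- pmax(0, pnorm(pi / sigma) - pnorm((a + 2 * pi - Delta) / sigma)) +
        pmax(0, pnorm((b - Delta) / sigma) - pnorm(-pi / sigma))
  (p1 - exp(eps) * p2) / Zn
}

# Privacy budget on S^1 via the bisection of Algorithm 1:
# find_mu(function(e) s1_profile(e, Delta = 1, sigma = 0.5))
```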
A.4 Homogeneous Riemannian Manifolds
For a more detailed treatment of homogeneous Riemannian manifolds and related concepts, refer to Helgason [1962], Berestovskii and Nikonorov [2020], and Lee [2006], and to Chakraborty and Vemuri [2019] for a more concise summary.
A.4.1 Group actions on Manifolds
In this section, we introduce some basic facts about group actions, which will be used to define homogeneous Riemannian manifolds in the next section. The materials covered in this section can be found in any standard abstract algebra text.
Definition A.1. A group is a non-empty set G together with a binary operation · : G × G → G, (g, h) ↦ g·h, such that the following three axioms are satisfied:
• Associativity: (g₁·g₂)·g₃ = g₁·(g₂·g₃) for all g₁, g₂, g₃ ∈ G.
• Identity element: there exists e ∈ G such that e·g = g·e = g for all g ∈ G.
• Inverse element: for each g ∈ G, there exists g⁻¹ ∈ G such that g·g⁻¹ = g⁻¹·g = e.
Definition A.2. Let G be a group and X be an arbitrary set. A left group action is a map α : G × X → X that satisfies the following axioms:
• α(e, x) = x for all x ∈ X.
• α(g, α(h, x)) = α(gh, x) for all g, h ∈ G and x ∈ X.
Note here we use the juxtaposition gh to denote the binary operation in the group. If we shorten α(g, x) to g·x, it's equivalent to say that e·x = x and g·(h·x) = (gh)·x. Note each g ∈ G induces a map X → X, x ↦ g·x.
A.4.2 Homogeneous Riemannian manifolds, symmetric spaces and spaces of constant curvature
Let M be a Riemannian manifold and I(M) be the set of all isometries of M, that is, all maps g : M → M such that d(g(x), g(y)) = d(x, y) for all x, y ∈ M. It is clear that I(M) forms a group, and thus, for a given g ∈ I(M), the map α(g, x) = g(x) is a group action. We call I(M) the isometry group of M.
Consider the stabilizer of a point x ∈ M, Stab(x) = {g ∈ I(M) : g(x) = x}. Given g ∈ Stab(x), its linear representation dg_x in the tangent space T_xM is called the isotropy representation, and the linear group {dg_x : g ∈ Stab(x)} is called the isotropy group at the point x.
We say that G acts transitively on M if, given x, y ∈ M, there exists a g ∈ G such that y = g(x).
Definition A.3 ([Helgason, 1962]). Let G = I(M) act transitively on M, and let H = Stab(o) for some o ∈ M (called the "origin" of M) be a subgroup of G. Then M is called a homogeneous Riemannian manifold and can be identified with the quotient space G/H under the diffeomorphic mapping gH ↦ g(o).
By definition, we have M ≅ G/H. More importantly, for any integrable function f on M and any g ∈ I(M), we have [Helgason, 1962]

∫_M f(x) dν(x) = ∫_M f(g(x)) dν(x).

This property leads to Proposition 4.1.
Definition A.4 ([Helgason, 1962]). A Riemannian symmetric space is a Riemannian manifold M such that for any x ∈ M, there exists s_x ∈ I(M) with s_x(x) = x and ds_x = −id on T_xM; s_x is called the symmetry at x.
That is, a Riemannian symmetric space is a Riemannian manifold M with the property that the geodesic reflection at any point is an isometry of M. Note that any Riemannian symmetric space is a homogeneous Riemannian manifold, but the converse is not true.
Definition A.5 ([Vinberg et al., 1993]). A simply-connected homogeneous Riemannian manifold is said to be a space of constant curvature if its isotropy group (at each point) is the group of all orthogonal transformations with respect to some Euclidean metric.
Once again, a space of constant curvature is a symmetric space but the converse is not true.
A.5 Theorem 4.1
Proof. Let G be the isometry group of M, and let η₁, η₂, η₁′, η₂′ ∈ M be arbitrary points such that d(η₁, η₂) = d(η₁′, η₂′) = Δ. By Proposition 4.1, the set A reduces to A = {x ∈ M : d(x, η₂)² − d(x, η₁)² ≥ 2σ²ε}. What we need to show is the following: for any such points,

∫_A p_{η₁,σ} dν − e^ε ∫_A p_{η₂,σ} dν = ∫_{A′} p_{η₁′,σ} dν − e^ε ∫_{A′} p_{η₂′,σ} dν,   (14)

where A′ = {x ∈ M : d(x, η₂′)² − d(x, η₁′)² ≥ 2σ²ε}. It's sufficient to show that there exists g ∈ G with g(η₁) = η₁′ and g(η₂) = η₂′, since isometries preserve both the distance d and the measure ν.
We can separate the proof into three cases: (1) η₁ = η₁′; (2) η₂ = η₂′; (3) the general case.
Case (1): It follows that both η₂ and η₂′ lie on the sphere centered at η₁ with radius Δ. (14) then follows from the rotational symmetry of constant-curvature spaces: by Definition A.5, the isotropy group at η₁ is the full orthogonal group, which contains an isometry fixing η₁ and mapping η₂ to η₂′.
Case (2): Same as case (1).
Case (3): For any η₁, η₁′, there exists g ∈ G such that g(η₁) = η₁′, since the isometry group acts transitively. Then d(η₁′, g(η₂)) = Δ, so the pairs (η₁′, g(η₂)) and (η₁′, η₂′) fall under case (1). For the pairs (η₁, η₂) and (η₁′, g(η₂)), the proof is the same, as the isometry g directly matches them. Combined with the result of case (1), we have finished the proof for case (3).
A.6 Simulation Details
For sampling from the Riemannian Gaussian distribution on S¹ (Section 4.2), we first sample an angle from a normal distribution with mean 0 and scale σ truncated to (−π, π], then embed the sample into ℝ² and lastly rotate it counter-clockwise by the footprint angle η.
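A direct R implementation of this sampler (a sketch; truncated-normal draws via inverse-CDF sampling, with the footprint given as an angle):

```r
# n draws from the Riemannian Gaussian on S^1 with footprint angle eta:
# N(0, sigma^2) truncated to (-pi, pi], rotated by eta, embedded in R^2.
rgauss_circle <- function(n, eta, sigma) {
  u <- runif(n, pnorm(-pi / sigma), pnorm(pi / sigma))
  theta <- eta + sigma * qnorm(u)
  cbind(cos(theta), sin(theta))  # rows are points on the unit circle
}
```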
For the simulation on the sphere (Section 5.2), we choose the pair η₁ and η₂ to be (1, 0, 0) and a point at geodesic distance Δ from it. Though any pair η₁, η₂ with d(η₁, η₂) = Δ works, we simply choose this specific pair for convenience. For the Fréchet mean computation, we use a gradient descent algorithm described in Reimherr and Awan [2019].
A.6.1 R Codes
For the simulations in Section 4.2, refer to the R files euclid_functions.R & euclid_simulation.R for Euclidean space and sphere_functions.R & s1_simulation.R for the unit circle S¹. For the simulations in Section 5.2, refer to the R files sphere_functions.R & sphere_simulation.R.