b

DiscoverSearch
About
My stuff
Context-aware Data Aggregation with Localized Information Privacy
2018·arXiv
Abstract
Abstract

In this paper, localized information privacy (LIP) is proposed, as a new privacy definition, which allows statistical aggregation while protecting users’ privacy without relying on a trusted third party. The notion of context-awareness is incorporated in LIP by the introduction of priors, which enables the design of privacy-preserving data aggregation with knowledge of priors. We show that LIP relaxes the Localized Differential Privacy (LDP) notion by explicitly modeling the adversary’s knowledge. However, it is stricter than  2ǫ-LDP and ǫ-mutualinformation privacy. The incorporation of local priors allows LIP to achieve higher utility compared to other approaches. For four different applications in privacy-preserving data aggregation, including survey, summation, weighted summation and histogram, we present an optimization framework, with the goal of minimizing the mean square error while satisfying the LIP privacy constraints. Utility-privacy tradeoffs are obtained under each model in closed-form, we then theoretically compare with the centralized information privacy and LDP. At last, we validate our analysis by simulations using both synthetic and real-world data. Results show that our LIP mechanism provides better utility-privacy tradeoffs than LDP and when the prior is not uniformly distributed, the advantage of LIP is even more significant.

Index Terms—privacy-preserving data aggregation, differential privacy, information-theoretic privacy

PRIVACY issues are crucial in this big data era, as users’data are collected both intentionally or unintentionally by a large number of private or public organizations. Most of the collected data are used for ensuring high quality of service, but may also put one’s sensitive information at potential risk. For instance, when someone is rating a movie, his/her preferences may be leaked; when someone is searching for a parking spot nearby using a smartphone, his/her real location is uploaded and may be prone to leakage. To mitigate such privacy leakage, it is desirable to design privacy-preserving mechanisms that provide strong privacy guarantees without affecting data utility. Traditional privacy notions such as k-anonymity [2] do not provide rigorous privacy guarantee and are prone to various attacks. On the other hand, Differential Privacy (DP) [3], [4] has become the defacto standard for ensuring data privacy in the database community [5]. The definition of DP assures each user’s data has minimal influence on the output of certain types of queries on a database. In the classical DP setting, it is assumed that there is a trusted server which perturbs users’ data while answering queries. However, more often then not,

image

B. Jiang M. Li and R. Tandon are with the Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ, 85742 USA email: {bjiang, lim, tandonr}@email.arizona.edu

organization collecting users’ data may not be trustworthy and the database storage system may not be secure [6].

Recently, localized privacy protection mechanisms have gained attention as this setting allows local data aggregation while protecting each user’s data without relying on a trusted third party [7]. In localized privacy-preserving data release, each user perturbs his or her data before uploading it; organizations that want to take advantage of users’ data then aggregate with collected users’ published results. Earliest such mechanism is randomized response [8], which randomly perturbs each user’s data. However, the original randomized response does not have formal privacy guarantees. Later, Localized Differential Privacy (LDP) was proposed as a local variant of DP that quantifies the privacy leakage in the local setting [9]. Many schemes were proposed under the notion of LDP. For example, [10][12], and Google’s RAPPOR [13]. LDP based data aggregation mechanisms have already been deployed in the real-world. For example, in June 2016, Apple announced that it would deploy LDP-based mechanisms to collect user’s typing data [14]. However, Tang et al. show that although each user’s perturbation mechanism satisfies LDP, the privacy budget is too large (ǫ = 43)1 to provide any useful privacy protection. Wang et al. provide a variety of LDP protocols for frequency estimation [15] and compare their performance with Google’s RAPPOR. However, for a given reasonable privacy budget, these protocols provide limited utility. Intuitively, compared with the centralized DP model, it is more challenging to achieve a good utility-privacy tradeoff under the LDP model. The main reasons are two-fold: (1) LDP requires introducing noise at a significantly higher level than what is required in the centralized model. That is, a lower bound of noise magnitude of  Ωǫ(√N)is required for LDP, where N is the number of users. In contrast, only  Oǫ(1)is required for centralized DP [16]. (2) LDP does not assume a neighborhood constraint on users’ data as inputs, thus when the domain of data is very large, LDP leads to a significantly reduced utility [17].

In general, both localized and centralized DP provide strong context-free theoretical guarantees against worst-case adversaries [18]. Context-free means that there is no knowledge of users’ data (either instantaneous or statistical). On the other hand, context-aware privacy notions such as ones where statistical knowledge is available are favorable, as the utility can be increased by explicitly modeling the adversary’s knowledge. Information-theoretic privacy notions [19] [20] that incorporate statistical (prior) knowledge fall into this category, which use mutual information (MI) to measure the information leaked about the original database in the released data [21][23]. Compared with context-free privacy notions, context-aware privacy notions, especially prior-aware notions achieve a better utility-privacy tradeoff [18].

We next discuss a simple illustrative example to motivate the need and advantages of context-aware privacy notions.

Example 1. Consider taking a survey over N = 100 individuals, where each person is independently asked whether he/she has been infected by some kind of disease. It is known based on clinical studies that this disease infects 1 out of 10 people on average. Each individual holds a local true answer  Xi, where i is the individual’s index and  Xi = 1if his/her answer is yes, Xi = 0if the answer is no. For privacy consideration, each individual perturbs his/her data by a randomized response mechanism (shown in Fig. 1) before publishing it. The goal is to estimate the aggregate �Ni=1 Xibased on  Yi, i = 1, ..., N.

image

Fig. 1. Randomized response mechanism for each individual.

We first assume that the perturbation mechanism satisfies the context-free  ǫ-LDP for a given privacy budget  ǫ. By the definition of LDP, for each user:  max{ p1−q , q1−p, 1−qp , 1−pq } ≤ǫ. In [15], each user’s input  Xiis treated as a fixed instance and a pair of valid solution for (p, q) is  p = q = 1eǫ+1. Using the unbiased estimator adopted in [15], the expected error of the aggregate is  ELDP = Np(1−p)(p−q)2 = 100eǫ(eǫ+1)2. The authors then derived an optimal solution for (p, q) by minimizing this error while subject to the privacy constraints. Their optimal values of p and q are different, where  q∗ = 1 − eǫ/2and  p∗ = 0.5, resulting in an expected estimation error of EOpt−LDP = 25(0.5−eǫ/2)2in our example, which is smaller than  ELDP(as shown in Fig. 2). This approach increases utility by implicitly using prior knowledge, as it is based on the fact that the answers of a majority of users are zeros (q∗is smaller than  p∗). Unfortunately, the assumption that Xi, i = 1, 2, ..., Nare instances rather than random variables prohibits introducing prior in a principled manner. In addition, the definition of LDP is independent of the priors, which is unable to adjust the perturbation parameters based on different priors. To explicitly introduce prior knowledge, a new privacy definition is needed.

Assuming that each  Xiis a random variable with priors P1 = Pr(Xi = 1)and  P0 = Pr(Xi = 0), we propose a context-aware LIP notion which imposes a bound on the ratio between the prior and posterior. In this example, we have: −ǫ ≤ { P r(Yi=0)1−q , P r(Yi=0)q , P r(Yi=1)1−p , P r(Yi=1)p } ≤ ǫ. This notion guarantees that taking observations on the published data provides limited additional information of the real data. Restricted by this privacy notion, the optimal  p∗and  q∗which

image

Fig. 2. Comparison among the expected estimation error of different approaches in Example 1: considering prior in the perturbation mechanism significantly reduces the error.

minimize the mean square error (MSE) can be derived as: q∗ = P1/eǫand  p∗ = P0/eǫ. Intuitively, it reduces MSE by carefully adjusting p and q to different priors. By this perturbation mechanism, the resulting error is  ELIP = 10eǫ−10.64which is significantly smaller than  ELDP(as shown in Fig. 2). As we discussed above, introducing priors can provide

higher utility. However, this comes at the overhead of esti-

mating or learning prior knowledge. This can be obtained in two ways: (1) Sometimes, each individual user’s local prior is available (e.g., can be obtained by training based on historical published data) [24], [25]. For instance, when Google wants to survey multiple users’ current locations to construct a traffic heat-map, it is possible that it already possesses past reported (unperturbed) locations of each user. (2) However, local-prior is relatively strong knowledge on users’ data that may not always be attainable, and one may only be able to learn a global prior (assuming that users’ data are identically distributed). For example, when taking a periodic survey, aggregated results in the recent past can be used as the global prior. Another example is when estimating the frequency of a rare disease, one can leverage the results of past clinical research to obtain a global prior [26]. The main contributions of this paper are three-fold: (1) We propose a new notion of Localized Information

Privacy (LIP) for the local data release setting (without a

trusted third party), which relaxes the notion of LDP by introducing priors to increase data utility. We formally show that,  ǫ-LIP implies  ǫ-Mutual Information Privacy (MIP), and ǫ-LIP is sandwiched between  ǫ-LDP and 2ǫ-LDP. (2) We apply the LIP notion to privacy-preserving data

aggregation. Focusing on four applications including: survey,

(weighted) summation and histogram, we present a utility-privacy optimization framework with the goal of minimizing the mean squared error while satisfying the LIP constraints. We deploy the randomized response type of perturbation mechanisms. We formulate two perturbation and aggregation models, including: binary-input binary-output (BIBO) model, multiple-input multiple-output (MIMO) model, and derive the corresponding optimal perturbation parameters in closed form and discuss how each application can be realized. We show that one set of the optimal solution can minimize the MSE and the Mean absolute error (MAE) between the raw data and the perturbed data simultaneously. We theoretically demonstrate the advantages of the proposed mechanism and compare with the LDP mechanisms. For comparison, we also study a centralized version of data aggregation under information privacy (CIP), and derive its optimal utility-privacy tradeoff.

(3) We validate our analysis using simulations on both synthetic and real-world datasets (i.e., Karosak, a websiteclick stream data set, and Gowalla, a location aggregation data set). Both theoretical and simulation results show that optimal perturbation mechanisms under  ǫ-LIP always achieve a better utility-privacy tradeoff than those under  ǫ-LDP when  ǫ > 0, especially when the prior is not uniformly distributed. In the MIMO case, We show that when the domain size increases, the advantage of the LIP is enhanced. We show that the advantages of the context-aware LIP are two-fold: on one hand, users are setting the perturbation parameters according to the prior knowledge; on the other hand, the adversary is allowed to build prior-related estimators that can minimize the expected errors. When compared with CIP, it always leads to a better utility-privacy tradeoff than LIP, as the server possesses global information. However, the utility gap is not large.

The remainder of the paper is organized as follows. In Section II, we describe the proposed LIP notion and its relationship with other existing privacy notions. In Section III, we introduce the system model and problem formulation. In Section IV, we derive the utility-privacy tradeoff under the several applications. In Section V, we present the simulation results and compare utility-privacy tradeoffs among different models. In Section VI, we offer concluding remarks and discuss future directions.

In local privacy-preserving data release, each user uploads its perturbed data directly to an untrusted aggregator. In this section, we first recap two existing privacy notions in localized settings, and then present our new LIP definition.

The traditional LDP definition guarantees that each user’s perturbed data has a similar probability to result in the same output for any two inputs from the data domain D:

Definition 1. (ǫ-Localized Differential Privacy (LDP)) [27] A mechanism M which takes input X and outputs Y satisfies ǫ-LDP for some  ǫ ∈ R+, if  ∀x, x′ ∈ Dand  ∀y ∈ Range(M):

image

LDP provides strong context-free privacy guarantee against worst-case adversaries. However, there are many scenarios where some context of X is available (e.g., prior distribution). In such situations, introducing context provides relaxed privacy guarantees. One such definition is mutual information privacy, which uses the mutual information between Y , X to measure the average information leakage of X contained in Y :

Definition 2. (ǫ-Mutual Information Privacy (MIP)) [22] A mechanism M which takes input X and outputs Y , satisfies -MIP for some  ǫ ∈ R+, if the mutual information between X and Y satisfies  I(X; Y ) ≤ ǫ, where I(X; Y ) is:

image

Originally, MIP was proposed under the centralized setting where X is the database or individual items and Y is a query output. Here we can adapt it to the local setting, where X and Y are each individual user’s input and output.

Although MIP is context-aware, it is a relative weak privacy notion since it only bounds the average information leakage. There may exist some (x, y) pair that makes the ratio between the joint and product of marginal distributions very large (while the joint probability is very small).

In order to limit the information leakage of every pair of realizations of X and Y , we consider a bound on the ratio between the prior Pr(X) and posterior Pr(X|Y ), which leads to our proposed localized information privacy notion:

Definition 3. (ǫ-Localized Information Privacy (LIP)) A mechanism M which takes input X and output Y satisfies  ǫ-LIP for some  ǫ ∈ R+, if  ∀x, y ∈ D:

image

Intuitively, LIP guarantees that having the knowledge of users’ priors, the adversary can’t infer too much additional information about each input x by observing each output y. Note that, when  ǫis small, this ratio is bounded close to 1. Definition (3) can be viewed as the localized version of information privacy, which focused on a centralized setting [28], and the main differences are in the definitions of input and output. Again, here X and Y stand for each user’s input and output variables, respectively.

In the following, We show that LIP is a stronger privacy notion than MIP, since the latter only provides an average privacy guarantee while LIP bounds the leakage on every pair of realizations of X and Y .

Proposition 1. If a mechanism M satisfies  ǫ-LIP, it also satisfies  ǫ-MIP.

Proof. Assume that M satisfies  ǫ-LIP. By Bayes rules, we have that,  ∀x, y ∈ D:

e−ǫ ≤ Pr(X = x, Y = y)Pr(X = x)Pr(Y = y) ≤ eǫ.(4) Substituting (4) into (2), we get:

image

Furthermore, the following theorem shows the relationship between LIP and LDP:

Theorem 1. If a mechanism M satisfies  ǫ-LIP, then it also satisfies  2ǫ-LDP; if a mechanism M satisfies  ǫ-LDP, then it also satisfies  ǫ-LIP.

Proof. To prove the first part, consider a mechanism M that takes any two inputs  X = x, X = x′and outputs the same Y = y.

When M satisfies  ǫ-LIP, using the Bayes rule, Definition (3) is equivalent to:

image

image

Since the above also holds for  X = x′, we have:

e−ǫ ≤ Pr(Y = y)Pr(Y = y|X = x′) ≤ eǫ.(6) Inequality (5) is equivalent to:

image

Since both of the metrics in above inequalities are positive, by multiplying these two inequalities, we get:

image

Since we can switch x and  x′, it is equivalent to the definition of  2ǫ-LDP.

image

On the other hand:

image

by switching inputs, we can also get:

Pr(Y = y) ≥ e−ǫPr(Y = y|X = x),which means that M also satisfies  ǫ-LIP.

Thus,  ǫ-LIP is a more relaxed privacy notion than  ǫ-LDP. However, it is stronger than  2ǫ-LDP. Intuitively, LIP relaxes LDP because LDP results in the same output y for every input x, no matter what his/her prior is. On the other hand, for inputs with different priors, LIP perturbs differently. For example, when a user with Pr(X = 1) = 0.99, if he holds X = 1, which means his real data is consistent with the prior knowledge. As LIP bounds on the ratio between prior and posterior, it has a large probability to output 1 to make the posterior probability similar to its prior; if he holds X = 0, which has a small prior to happen, he also has large probability to output 1 to make the posterior probability small.

image

Fig. 3. System Model of Privacy-Preserving Data Aggregation.

A. System and Threat Models

Consider a data aggregation system with N users and a data curator. Each user i locally generates private data which is denoted as random variable  Xi, taking value  xifrom the domain  D = {a1, a2...ad}with probability  P ik = Pr(Xi = k). We assume that  Xis are independent from each other. Before publishing his/her data to the curator, each user locally perturbs it by a privacy-preserving mechanism  Mi. The output is denoted as  Yiwhich takes value  yifrom D. The mechanism Mimaps each possible input to each possible output with certain probability, the set of them are called perturbation parameters (denoted as  qi). After receiving each user’s perturbed data, the curator computes a statistical function on those data (for example, estimating the frequency of certain input which will be useful for data mining). The system model is depicted in Fig. 3.

The curator is considered as untrusted due to both internal and external threats. On the one hand, users’ private data is profitable and companies can be interested in user tracking or selling their data. On the other hand, data breaches may happen from time to time due to hacking activities. Denote the true aggregated result by  f( ¯X), where ¯X = {X1, X2, ..., XN}. The curator (adversary) observes all the users’ perturbed outputs ¯Y = {Y1, Y2, ..., YN}and tries to obtain an estimate of  S = f( ¯X). Furthermore, we assume that the adversary possesses knowledge of prior distributions of users’ inputs, and the algorithms/perturbation mechanisms that users adopt to publish their data. The curator aims at performing accurate estimations using all the information above, but is also interested in inferring each user’s real input  Xi.

For different applications of data aggregation, the definition of  f(·)s varies. In this paper, four most common applications are considered:

Survey: the curator is interested in estimating how many people in the survey possess a private value v. Each user’s data can be mapped into a binary bit as:  fi(Xi) =

1{v=Xi}, where1{a=b}is an indicator function, which is 1 if a = b; 0 if  a ̸= b. Then the curator just sum up all the collected data to get the answer. Thus S = f( ¯X) = �Ni=11{v=Xi}.

Summation: Summation results are usually used to measure an average property of the surveyed individuals. For example, the census bureau wants to survey the

average income over a group of people, mathematically, S = f( ¯X) = 1N�Ni=1 Xi. Weighted summation (linear combination): It’s straightforward to extend the direct summation to a weighted summation (plus an offset) for more general applications. For example, assume that the curator is interested in some particular users more than others, such as employer v.s. employees; adults v.s. children; the professionals v.s. amateurs. Thus, these important users’ data are assigned a larger coefficient than others. The offset can be used as a correction to the raw data. For example, avoiding summation blowing up or controlling data in a particular range. Thus  S = f( ¯X) = �Ni=1 (aiXi + bi).•Histogram: In this application, the curator is interested in estimating how many people possess each of the data category inD, or classify people according to their data value. For example, the curator wants to statistically estimate a location frequency matrix with each entry standing for how many people are within a particular location. For this application, S is a set of ”categorized” data: {S1, S2, ..., Sd}, such that,  ∀k ∈ D, Sk = �Ni=11{Xi=k}.

B. Privacy and Utility Definitions

The privacy of each user’s input satisfies LIP and is parameterized by the privacy budget (ǫ) in Definition (3). The smaller  ǫis, the higher privacy level the mechanism satisfies. Note that, for simplicity, we consider  ǫto be the same for all the users; however it is straightforward to extend our model and results to different  ǫfor different users. Under LIP, the privacy constraints can be formulated as:  ∀i ∈ {1, 2, ..., N}and  ∀xi, yi ∈ D, there is

image

Note that,  qi ≜ {Pr(Yi = yi|Xi = xi), ∀xi, yi ∈ D}. When  ǫis given, the set of inequalities in Eq. (7) constrains qito be within a feasible region  Ti, ∀i ∈ 1, 2, ..N.

The definition of utility depends on the application scenario. For example, in statistical aggregation, estimation accuracy is often measured by absolute error or mean square error [29] [30]; in location tracking, it is typically measured by Euclidean distance [24]; in privacy-preserving data publishing, distortion is usually used to measure the utility [22].

Focus on the four applications discussed above, we define utility as the inverse of the Mean Square Error (MSE): U(S, ˆS) = −E(S, ˆS), where  E(S, ˆS) = E[(S − ˆS)2], ˆSis the estimated S. Notice that, given  P i, E(S, ˆS)depends only on each user’s perturbation parameters:  q1, ..., qN, as any estimator ˆSwill depend on the output ¯Ywhose distribution is a function of  qi. Thus, maximizing the utility is equivalent to find the optimal parameters to minimize the MSE.

C. Problem Formulation

In general, there is a tradeoff between utility and privacy. We can formulate the following optimization problem to find the optimal perturbation mechanism that yields the optimal tradeoff:

image

From [31], it is well known that the optimal estimator that results in the minimized mean square error (MMSE) is ˆS =g( ¯Y ) = E[S| ¯Y ]. Since  E[E[S| ¯Y ]] = E[S], ˆSis an unbiased estimator. Therefore, we use the MMSE estimator in Eq. (8).

Problem Decomposition: Next, we show how the problem defined in Eq. (8) can be decomposed into the local cases.

Since we assume that each user’s input is independent from each other, all the  f(·)functions above can be decomposed into local functions of each  Xi: local functions in application 1 and application 4 are indicator functions (or vector); local functions in application 3 is a linear function. Without loss of generality, we denote the local function for user i as  fi.

In the local setting, users independently perturb their data, thus each of them results in a MSE in aggregation, which is denoted by  Ei = E[(fi(Xi) − E[fi(Xi)|Yi])2](for the fourth application, denote  Eki = E[(f ki (Xi)−E[f ki (Xi)|Yi])2]) as the MSE of user i when aggregating the k-th data, and the overall utility defined in Eq. (8) satisfies decomposition proposition:

Proposition 2. The global optimization problem defined in Eq. (8) can be decomposed into N local optimization problems, under independent user inputs.

image

Proof. Observe that, the  f(·)s in the four basic applications above can be expressed as a summation over all the  fi(Xi)s, as the semantic of aggregation implies a summation operation. thus the summation based MMSE estimator ˆScan be expressed as:

image

where (a) in Eq. (10) is due to the independence of  Xis, and (b) is because  Xiis only correlated with  Yiin the output sequence. Thus  E(S, ˆS)can be derived as:

image

Note that, for the application of histogram, the error forms a error vector that  (Sk, ˆSk)dk=1. By the definition of second order norm. the mean square error of this case is:

image

where  f ki (Xi) =1{Xi=k}. For the first three applications, we next show that the total MSE can be decompose into the summation of local MSEs. (the proof for histogram is shown in Appendix.D.

image

(b) Model with multiple input and multiple output (c) Centralized binary information privacy model (trusted server based) Fig. 4. Different models for the perturbation mechanism considered in this paper ((a) and (b) are for the i-th user).

image

The cross terms equal to 0 because  ∀j, l ∈ {1, N}and  j ̸= l:

image

E{E[f kl (Xi)|Yl]}are 0, thus,  E(S, ˆS) = �Ni=1 Eki (qi)We next show that the overall optimal solutions (perturbation parameters) satisfy each local privacy constraints:

image

achieved at  qi∗ ∈ Ti, then  E(q1∗, ..., qN∗) = �Ni=1 ei.If for some  userkwho takes parameters  qk ∈ Tk, by assumption, we know that  Ek(qk) ≥ ek. Thus

image

That means the minimal value of  E(q1, ..., qN), where  qi ∈Ti, ∀i ∈ [1, N]can be achieved if for each user,  qi = qi∗.

image

By proposition 2, when the perturbation parameters of each user are optimal, the overall MSE of the mechanism achieves its minimum. In addition, each user can perform its local optimization independent from each other, which well suits the local setting.

Notice that the resulted MSE by a MMSE estimator for each user is  Ei(qi) = E[V ar(fi(Xi)|Yi)]. By the law of total variance:

image

In the context-aware setting,  V ar[fi(Xi)]is a constant, thus the MSE is a function of the variance of each user’s estimator.

In this section, we study the privacy-utility tradeoffs by solving the optimization problems defined in Eq. (8). We start by a binary input binary output (BIBO) model where each user has an input/output range of {0, 1}, which well suits the application of survey. In the BIBO model, we illustrate the way of perturbation by analysis on the optimal solutions. we first assume that  fi(Xi) = Xidue to settings of the first three applications, we discuss optimal solutions for each of the four applications in section IV-C.

A. Utility-Privacy Tradeoff under the Binary Input Binary Output Model The binary model is widely used for survey: each individual’s data is first mapped to one bit, than randomly perturbed before publishing to the curator. For a binary input/output model, each user has a binary input range, i.e, D = {0, 1} (shown in Fig. 4(a)). As a direct result, the  V ar(Xi)in (14) becomes  P i1(1 − P i1). Denote the perturbation parameters by:

image

constraints. By Eq. (10), the MMSE estimator ˆXifor user i is derived as:

image

where  λi0=  Pr(Yi= 0) = (1  − P1)(1  − qi0) +  P1qi1and λi1=  Pr(Yi= 1) = (1  − P1)qi0+  P1(1 − qi1). On the other hand,

image

Take (17) into (14), the each user’s MSE function  Ei(qi0, qi1)can be derived as

image

For the privacy constraints, by Eq. (7), when the perturbation mechanism satisfies  ǫ-LIP: we have:

image

where  F i1, F i2, F i3, F i4are directly derived from Definition (3): F i1(qi0, qi1) = λi0qi1,  F i2(qi0, qi1) = λi11−qi1,  F i3(qi0, qi1) = λi01−qi0,

F i4(qi0, qi1) = λi1qi0. Then, the feasible region  Tiis defined as those  (qi0, qi1)pairs satisfying constraints in Eq. (19). By proposition 2, the optimization problem of Opt-binary-LIP can be reformulated as:

image

We have the following result:

Theorem 2. In Opt-binary-LIP, for the i-th user, the optimal (qi0, qi1)pairs that minimize  Ei(qi0, qi1)in problem (20) are: either  qi∗0 = P i1/eǫand  qi∗1 = (1−P i1)/eǫ, or  qi∗0 = 1−P i1/eǫand  qi∗1 = 1 − (1 − P i1)/eǫ, for any given  ǫ ≥ 0. The resulting MSE by (qi∗0 , qi∗1) is:

image

Proof. Here we outline the proof sketch (detailed proofs are shown in Appendix. A.

(1) We show that  Eiis monotonically increasing with  qi0and qi1within the region of  {qi0 ≥ 0} ∩ {qi1 ≥ 0} ∩ {qi0 + qi1 ≤ 1};

(2) We simplify the feasible region  Tiby showing that both Ei(qi0, qi1)and  Tiare symmetric w.r.t. point (0.5, 0.5); Then we change  Tito the monotonic region in step. (1).

(3) By the monotonicity, showing that the optimal solution is at the boundary of  Ti, which is a linear function of  (qi0, qi1).

(4) The final step is to show that optimal solution is at the intersection of two linear functions in step (3) by testing the monotonicity of  Ei(qi0, qi1)on the boundary.

Note that, the optimal solution of each user is achieved when F i1(qi0, qi1) = F i4(qi0, qi1) = eǫ(or  F i2(qi0, qi1) = F i3(qi0, qi1) =eǫ). Intuitively, to increase utility, we need the probability of perturbation as small as possible (when  qi0 + qi1 ≤ 0.5), and the smallest perturbation probability is bounded by the privacy constraints. As a result, the optimal solution is at the point where the privacy requirement is just met. The two optimal (qi∗0 , qi∗1 )pairs are symmetric w.r.t. (0.5, 0.5). This is due to the symmetric properties of the binary input/output model. The symmetric properties can also be explained as: if we do not consider privacy, utility is maximized in two ways: the first way is each user publishes his/her data directly; the second way is swapping his/her data from 0 to 1 and 1 to 0 before publishing it.

From  qi∗0 = P i1/eǫand  qi∗1 = (1 − P i1)/eǫ, we can see that qi∗0is proportional to  P i1, and  qi∗1is proportional to  1 − P i1. Intuitively, from the perspective of one user, when his/her true input value  xi’s prior is small, directly revealing  xiwill leak too much information about it. In such cases, to satisfy LIP constraints, a large perturbation probability is needed to limit the posterior about  xi. On the contrary, if  xihappens with a large prior, directly releasing  xiwill reveal little additional information. Thus a small perturbation probability can be used for  xiin this case.

Minimizing mean absolute error between the raw data and the perturbed data: In many applications, the published data by each user should contain certain information even for the parties with no prior knowledge. For example, the reported locations from smart phones: facilities or companies want to take advantages from the aggregated location data frequencies on one hand, the users are uploading locations for location based service on the other. As a result, although the locally published data are perturbed, it should remain a certain level of accuracy. As a result, We want  Yias close to  Xias possible as long as it still satisfies the privacy constraints. As  P iis given, we want the mean absolute error (MAE) between  Xiand  Yias small as possible, that is, we also want to minimize:

image

Regarding at the optimal solutions of the binary model, we have the following proposition:

Proposition 3. The optimal solution that minimize the MSE defined in (18) and the MAE defined in (22) simultaneously is:  qi∗0 = P i1/eǫand  qi∗1 = (1 − P i1)/eǫ.

Proof. Notice that there are two set of optimal solutions under the Opt-binary-LIP model.

image

As  P i1is given, minimizing (23) is equivalent to minimize  qi1and  qi0, which are also restricted by the privacy constraint, thus the optimal solution  qi∗0 = P i1/eǫand  qi∗1 = (1 − P i1)/eǫcan also minimize the MAE.

From the result of proposition. 3, we can see a trend from the optimal perturbation parameters that minimize the MAE and MSE at the same time, that is when  ǫincrease, the raw data is more likely to be directly published. Notice that, with the optimal solutions,

image

Thus the distribution of each  Xiis identical to  Yi, this also guarantees that the output  Yiis very close to  Xi. Comparison with Optimized LDP: We next compare our optimal LIP-based perturbation mechanism with the optimal LDP-based one. Define the Opt-binary-LDP problem to be the same with Opt-binary-LIP in (20), except having different privacy constraints of LDP:  {Ri1, Ri2, Ri3, Ri4} ≤ eǫ, where Ri1, Ri2, Ri3, Ri4are derived from Definition (1):  Ri1 = 1−qi1qi0,

image

Proposition 4. In Opt-binary-LDP, for the i-th user, the optimal solution for  (qi0, qi1)is  qi∗0 = qi∗1 = 1eǫ+1, which results in a MSE  E∗bi−LDPof:

image

Given any fixed  ǫ ≥ 0, we have  E∗bi−LDP ≥ E∗bi−LIP.

Proof. The proof of the optimal solution is similar to that of Opt-binary-LIP, the only difference is the feasible region in Opt-binary-LIP is flexible for different priors, while the feasible region in Opt-binary-LDP is fixed. The optimal solution also coincides with the one used in [15].

For the second part, it’s easy to check by taking derivative over  eǫthat  E∗bi−LIP ≤ E∗bi−LDP, where  E∗bi−LIP = E∗bi−LDPif  ǫ = 0or  ǫ = ∞. This means LIP provides increased utility given any  ǫ. We then taking derivative of  P i1over  ∆E∗ =E∗bi−LDP − E∗bi−LIP, result shows that  ∂∆E∗∂P i1 = 0when  P i1 =0.5. As  ∆E∗(P i1=0.5) ≥ 0, E∗bi−LDP ≥ E∗bi−LIPfor any  P i1. Result also shows that as  |P i1−0.5|grows,  ∆E∗also increases.

image

The above result shows that, Opt-binary-LIP always achieves a better utility than Opt-binary-LDP under any  ǫ, this is because explicitly considering prior in the privacy definition allows a larger search space for optimal parameters than that in Opt-binary-LDP. We can also learn this result from the aspect of information theory, when  |P i1 − 0.5|is small, then the  H(Xi)is large, which means  Xihas the largest amount of uncertainty, thus knowing the prior of  Xidoes not help in perturbation; However, when  H(Xi)is small, prior knowledge of  Xiprovides a clearer indication of the real value, thus the context-aware model achieves enhanced advantages.

The binary case is an illustrative example that shows how we derive the optimal solutions, as the problem is equivalent to finding the maximum in a monotonically increasing region, thus the methods with goal to find the station points in the feasible region such as Lagrangian multiplier and KKT conditions are not applicable. On the other hand, thanks to the theorem that the maximum value of the monotocity function can be found at the boundary, the problem can be first regarded as developing the monotocity region and then finding the boundary values of the parameters. Moreover, with the goal to minimize the MAE, we can further restrict the optimal solutions to make the model more practical.

When each user has a same prior and when the perturbation channel is symmetric, these can be considered as special cases of the BIBO model, and details are referred to our conference version in [1].

B. Utility-Privacy Tradeoff under MIMO Model

More generally, we study the case in which D has a large domain, i.e.,  D = {a1, a2, ..., ad}with prior distribution Pr(Xi = am) = P im. This case well suits the application of summation and weighted summation and we denote this model as MIMO model. The optimization of the MIMO model is obviously obscure, as there are  d2parameters that need to optimize for a single user. (shown in 4(b))

Assume that a random-response perturbation mechanism which satisfies  ǫ-LIP takes input  Xiand output  Yi (Yihas the same range, we discuss the optimal output range after deriving the main results) with probability  Pr(Yi = ak|Xi = am) =qimk. Denote  Pr(Yi = ak) = λik. Thus, by (7):

image

By Bayes rules, (25) can be transferred to:

image

In (14),  V ar[Xi] = �dm=1 a2mP im − (�dm=1 amP im)2, and

image

where1ikis the indicator function of1i{Yi=ak}, which is 1 if Yi = akand 0 if not, thus1ikcan be regarded as a binary random variable which has the distribution of:  Pr(1ik = 1) =λikand  Pr(1ik = 0) = 1 − λik, as a result:  V ar[1ik] = λik(1 −λik)and  Cov[1ik,1il] = −λikλil.

image

Theorem 3. For the constraint optimization problem defined in (29). The optimal solutions is:  qi∗mm = 1 − (1 − P im)/eǫ, qi∗mk = P ik/eǫ, for all  m, k ∈ {1, 2, ..., d}, m ̸= k.

Brief steps of proof (detailed proof is shown in Appendix. B.

Regardless of the privacy constraints, we first show that the maximum value of  V ar[ ˆXi]can be reached when the  qimm = 1, for  m ∈ {1, 2, ..., d}. also, the minimum value of  V ar[ ˆXi]can be reached when  qimk = λk, for any  m, k ∈ {1, 2, ..., d}.

We then take derivative over a randomly chosen parameter:  qimkand derive the monotonicity region.

In each monotonicity region we show that the parameters that decrease will first reach the boundary. Thus the optimal solutions lies at the lower boundaries. As we can permute the sequence of  Yi, we can find d! optimal solutions.

Considering minimizing the MAE defined in (22), there is only one set of optimal solution remaining. From Theorem 3, we can see that the optimal solutions of the Opt-mimo-LIP also lies at the boundary of the privacy constraints. As  ǫincreases,  ∀m ∈ {1, 2, ..., d}, all the  qimms are increasing while all the  qimks are decreasing (m ̸= k),

and the value of  qimks are proportional to  P iks. thus an input value that has a larger prior should also be output with a large probability.

For example, consider a model in which D = {1, 2, 3} the prior of one of the users is given as:  P1 = 0.1, P2 = 0.2, P3 = 0.7. By Theorem 3  q∗11 = 1 − 0.9/ǫ, q∗22 = 1 − 0.8/ǫ, q∗33 = 1 − 0.3/ǫ, q∗21 = q∗31 = 0.1/ǫ, q∗12 = q∗32 = 0.2/ǫ, q∗13 = q∗23 = 0.7/ǫ. When  ǫincreases, each value is more likely to be directly published. As 3 has a larger prior than 1 and 2, if 1 and 2 are not directly published, they are more likely to be published as a 3.

Notice the main goal of (29) is to minimize the MSE rather than the MAE, so, we still deploy the MMSE estimator. Regardless of the goal to minimize MAE, there are d! optimal solutions for a single user in the problem of Opt-mimo-LIP, as we can randomly permute the order of  Yi. When minimizing the MAE, there is a unique solution remaining, as :  λik = P ik(1 − (1 − P ik))/eǫ + �dj̸=k P ij P ik/eǫ = P ik.Similarly, we formulate the optimization problem for the LDP, which is the same goal defined in (29) while subject to the constraints of LDP:  ∀m, n, k ∈ {1, 2, ..., d}, m ̸= n, there is qimkqink ≤ eǫ. We derive the optimal solutions for Opt-mimo- LDP as  qi∗mm = eǫeǫ+d−1, qi∗mk = 1eǫ+d−1, ∀m, k ∈ 1, 2, ..., d, m ̸= k. Direct comparison between the two mechanisms involves large amount of calculation, thus we compare them in simulation .

Optimal Output Range: In terms of the optimal range for each  Yi, previous models are considering  Yihas the same domain with  Xi, now we consider that the outputs take values from a different domain size. Denote the domain of the input as  Dx = {a1, a2, ..., ad}; the domain of the output as  Dy ={a1, a2, ..., af}. When d is fixed, we want to find the optimal value of f.

Theorem 4. In the Opt-mimo-LIP problem, when the input range of d is fixed, the optimal output range  f ∗is  f ∗ = d.

Detailed proof is shown in Appendix. C. From Theorem 4, we know the optimal output range is the same with the input. This property is helpful for model setting.

C. Applications to Histogram Estimation

Obviously, for the application of survey, the Opt-binary-LIP is suitable, and Opt-mimo-LIP can deal with the problem of direct summation and weighted summation. We now consider the problem of estimating a histogram.

The difference between the application of the histogram and other applications is even though for each user, before uploading data, the perturbation mechanism still takes one input data and outputs one data, the estimator of a histogram is a random vector rather than a random variable, as the value of each data stands for a category.

Derive the optimal estimator of the histogram vector, ˆS = { ˆS1, ˆS2, ..., ˆSd} = {E[S1| ¯Y ], E[S2| ¯Y ], ..., E[Sd| ¯Y ]}, with each entry:

image

Thus the mean square error of the estimation is

image

Theorem 5. The optimal perturbation parameters of problem defined in (32) is  qi∗mm = 1 − (1 − P im)/eǫ, qi∗mk = P ik/eǫ, for all  m, k ∈ {1, 2, ..., d}, m ̸= k.

Proof. Detailed proof is shown in Appendix. D.

The application of histogram can be viewed as a special case of the Opt-mimo-LIP because in the Opt-mimo-LIP, the optimal solutions are to make the estimation of each user ’s data as close to the real value as possible, while the request of the histogram is to classify each user’s data as accurate as possible, when the perturbation parameters are making each of the estimation as accurate as possible, it also handle the problem of identifying its category.

For implementation of the histogram, users are still publishing data according to the Opt-mimo-LIP perturbation mechanism. After receiving the published data by each user, the curator then estimates  {E[S1|Yi], E[S2|Yi], ..., E[Sd|Yi]} =

image

according to the optimal perturbation parameters.

D. Centralized Information Privacy (Binary case)

For comparison purposes, we now derive the formula for centralized information privacy under the binary perturbation mechanism, we first illustrate the idea by the global prior model: Consider N users are in a data aggregating model who directly submit their data to a trusted central aggregator, who publishes a perturbed version Y of S = f(X) using a CIP mechanism M(S). See Fig. 4(c). Assume that each user’s input value is taken as a random variable  Xiwhich takes value from a range of  Di = {0, 1}, ∀i ∈ {1, 2, ..., N}. It is known that the prior distribution of  Xiis:  P i1 = Pr(X = 1), ∀i ∈ {1, 2, ..., N}. Denote D as the database holding all users’ data, where  Di = Xi. Thus  Pr(Di = 1) = P1. Assume that the data the curator wants to aggregate is  S = �Ni=1 Diand the trusted third party perturbs S by using a randomized response mechanism M. Notice that  D1, D2, ..., DNis a N times Bernoulli trial, Thus:

image

Definition 4. A mechanism M which takes input values of D = {X1, X2, ..., XN}and output Y satisfies centralized information privacy if  ∀i, y{1, 2, ..., N}:

image

Intuitively, the difference between the centralized and the localized IP is the input of the mechanism: the input of the centralized IP is a database, while the input of the LIP is each user’s local data. In terms of the implementation, in the settings of CIP, all the raw data is hold by the trusted server, and the mechanism is run at the server rather than at each user. The configuration of the centralized IP model is shown in Fig.4(c).

Notice that the right side of the configuration of the centralized IP is multiple input multiple output LIP for a single user. Based on this observation, we have the following theorem.

Theorem 6. If the mechanism M which takes the inputs of S and output Y satisfies  ǫ-LIP, then it also satisfies the centralized model defined in Definition (4).

Proof. If the mechanism  M ǫ-LIP:

image

then, for any  j ∈ 0, 1

image

Thus, if the requirement of localized information privacy is met, then centralized information privacy is also satisfied. It’s easy to check it’s not vise versa as  Di, Yiand S does not form a Markov chain.

From Theorem 6, we know that localized information privacy infers centralized information privacy. In terms of the utility, we still want to minimize the mean square error: E[(S − ˆS)2]using the MMSE estimator: ˆS = E[S|Y ]. Thus the optimization problem with goal to minimize the MSE while subject to the privacy constraints satisfying Definition (5) involves large amount of calculation and is not the goal of this paper, however, we can still find the optimal utility-privacy trade off without specifying each parameter.

Proposition 5. When there is a global prior for each user, the utility-privacy trade off of the centralized Information privacy is found at the solutions of the optimization problem that

image

Proof. We know by previous results:

image

where  V ar(S) = NP1(1 − P1). We now derive the  V ar( ˆS). Base on the privacy constraints, we further derive the privacy metric:

image

On the other hand:

image

So, we have  e−ǫ ≤ { NP1E[S|Y ], NP1N−E[S|Y ]} ≤ eǫ, which further infers  e−ǫNP1 ≤ ˆS ≤ N − e−ǫ(N − P1N).

To deriving the optimal solutions of each parameters is not a main objective of this paper, however, we want to show the comparison between the centralized IP and the localized IP.

In this section, we use simulation to validate our analyt- ical results. In the first part, we validate our analysis using synthetic data and via Monte-Carlo simulation. We first show the advantages of the context-aware privacy notion (based on LIP) versus the context-free notion (based on LDP), by comparing their utility-privacy tradeoffs. Then we compare different models of LIP and LDP under different privacy budgets and prior distributions in both binary case and MIMO case (with both global priors and local priors). At last, we validate the analytical results through Monte-Carlo simulation, and compare utility-privacy trade offs provided by the localized models and the centralized model In section IV-D. In the second part, we evaluate on two real-world datasets: Karosak (click-streams of websites) and Gowalla (location check-in data). We evaluate utility by square root average MSE  (�E/N). This is because MSE depends on the number of users, for comparison purposes, we first average the MSE to normalize the influence of user count. In addition, since MSE is square error, we take the square root in order to make it comparable with absolute error, which roughly means how much deviation percentage is the result from the exact value. Note that, doing so does not change the optimal solutions in any of our optimization problems. In addition, since LIP achieves a relaxed privacy level than LDP, it is difficult to compare their utilities under the same privacy level. Thus, we will compare their optimal utilities under any given privacy budget  ǫ. All the simulations are done in Matlab (R2016a) on a Dell desktop (OptiPlex 7040; CPU: Intel (R) Core (TM) i5-6500 @ 3.2GHz; RAM 8.0 GB; OS: windows 64bit).

A. Simulation Results on Synthetic Data

1) Benefit of Context-awareness: First we would like to compare the utility-privacy tradeoffs between  ǫ-LIP and  ǫ-LDP using the binary model, and the goal is to observe

image

Fig. 5. The utility-privacy tradeoff comparison between prior-aware and prior- free models (log scale for y-axis)

the advantage of our proposed context-aware notion versus context-free notion of LDP. Intuitively, the utility gain of the former can be attributed to two factors: 1) using the prior in the MMSE estimator, which improves the accuracy compared with estimators that do not use prior knowledge; 2) the privacy guarantee of LIP is relaxed compared with LDP, by explicitly modeling prior in the definition. As a result, less perturbation is needed to satisfy the same privacy budget  ǫ. The latter factor is already proven in proposition 3. To decouple the influence of the above two factors, we compare the utility-privacy tradeoff of Opt-LIP with two other schemes: Opt-LDP (defined earlier), as well as context-free LDP adopted in previous work [15]. The prior-unaware estimator used in [15] is denoted as ˆC, which treats  Xias instances rather than variables:

image

where  pi = eǫeǫ+1as we discussed in section IV. This is an unbiased estimator under the binary symmetric channel (BSC) model. The MSE function of this estimator is:

image

The comparison is shown in Fig. 5. The privacy budget  ǫchanges from 1 to 5 with a step of 0.5. For now we assume that N users share the same global prior. We can see that, the square root average MSE of “ǫ-Opt-LDP” is always smaller than that of “context-free LDP” under any given  ǫ. When P1 = 0.5(prior is uniformly distributed), the distance between these two models is smaller; when the prior is more skewed, advantage of the former is even enhanced. This validates the benefit of the prior-aware estimator. On the other hand, by comparing the curves of “ǫ-Opt-LDP” and “ǫ-Opt-LIP” (using the same MMSE estimator), the error of Opt-LIP is always smaller than that of Opt-LDP, and the gap between the two models increases when  P1 = 0.99. This result confirms that our relaxed prior-aware privacy notion leads to increased utility. When  P1 = 0.99, users’ inputs are highly certain, merely considering prior in the estimator can already result in accurate aggregation. Thus advantage of the context-aware ǫ-LIP is even enhanced.

2) Comparing Models in MIMO Settings: We now evaluate the utility-privacy tradeoffs under the optimal solutions of LIP and LDP models when the D has a large domain.

We first fix the users number as N = 500 and |D| = 5.

image

(b) Utility-privacy tradeoff comparison with |D| increasing from 2 to 20 Fig. 6. Utility-privacy tradeoff comparisons under MIMO model

image

Fig. 7. Utility-privacy tradeoff comparison for the application of histogram

Without loss of generality, we assume that for each user, D = {0, 1, 2, 3, 4}, with the prior of each value randomly generated. The utility-privacy trade offs are shown in Fig.6(a). We can see that the figure shows that the  ǫ-MIMO-LIP performs better than under the binary model, as the  ǫ/2-MIMO-LIP provides higher utility even than the-MIMO-LDP. This means the advantage of LIP becomes more obvious with |D| increases.

We next compare in detail how the data domain affects the LIP and LIP: Consider 500 users are in the system and each of them has an input range varying from |D| = 2 to |D| = 20. To explicitly illustrate the comparison of LIP and LDP models with |D| increasing, we assume that each user’s prior is uniformly distributed. We then fixed  ǫ = 1, and show the utility with different input ranges under  ǫ = 1. In Fig.6(b), we observe that when |D| is small, the  ǫ-MIMO-LDP model provides better utility than the  ǫ/2-MIMO-LIP model. However, as |D| increases, the  ǫ/2-MIMO-LIP eventually outperforms  ǫ-MIMO-LDP. We can also see that both the LDP and LIP models suffer from decreased utility when the input range increases, but the LIP models decreases linearly with the input range increasing while the LDP model decreases faster than that.

3) Histogram: For the application of histogram, the defini-tion of the total MSE is different from the MIMO case, as the estimator is an expected vector. We compare the performance of LIP and the LDP in two ways: (1) the LIP v.s. the context aware LDP, in this case, the utility of LDP is measured by the square root average MSE between the real value and the prior-related estimated vector; (2) the LIP v.s. the context-free LDP. To measure the utility of the context-free LDP, we adopt the optimal unary encoding (OUE-LDP) protocol proposed in [15]. Unary encoding is first mapping the input data into a binary vector and perturb each bit independently, which is proved to provide the highest utility among all the other LDP protocols, as the binary encoding method reduces the amount of increased error from a large domain. We fix the data domain

image

Fig. 8. Utility-privacy tradeoff comparison when N users have different local priors, y-axis is square root average MSE:�E/N.

to |D| = 20 and range  ǫfrom 0 to 10. The MSE resulted by the OUE-LDP in the application of histogram is  n 4eǫ(eǫ−1)2. The comparison is shown in Fig. 7. Observe that the tradeoff curve of the  ǫ-context-aware LDP is sandwiched between the ǫ-LIP and the  ǫ/2-LIP. The LDP with optimal unary encoding results in very large MSE with small  ǫ, when  ǫincreasing, it achieves increased utility than the context-aware LDP, but it still sandwiched between the  ǫ-LIP and the  ǫ/2-LIP. It further shows that the context-aware models are more applicable with strong privacy protection.

4) Monte-Carlo Simulation: We further study the case when each user has different local priors and use Monte-Carlo Simulation to study the convergence of performance when N increases. Fig. 8 shows the comparison among three models described above as well as the model with CIP described in section IV-D. We assume that each user’s prior probability is sampled uniformly at random from [0, 1]. We create three datasets with N = 5, 500 and 50000, each of which contains randomly generated binary data according to their priors. For each dataset, in the localized cases, we assume that each user publishes  Yiusing the LIP models or the LDP models and the curator aggregates data using corresponding estimators discussed above. The error is measured by the square root of the averaged error over all users. In the centralized case, as the close form of the optimal parameters are difficult to find, we use build in optimization tools to find the parameters and derive the error. We ran each simulation 10000 times and average the errors, which are shown in Fig.8. Note that when N = 5, curves of  ǫ/2-Opt-LIP and  ǫ-Opt-LDP cross over. This is because some users’  P i1values are far from 0.5, which makes the MSE of  ǫ/2-Opt-LIP smaller than that of  ǫ-OptLDP for smaller  ǫ. As N becomes larger, we can see that  ǫ-Opt-LIP always provides higher utility than  ǫ-Opt-LDP model under any  ǫ. We also observe that the curve of  ǫ-Opt-LDP is almost sandwiched between the ones of  ǫ/2-Opt-LIP and  ǫ-Opt-LIP. In comparison with the centralized information privacy, we can observe that the centralized information privacy always provide increased utility than the localized models.

B. Simulation with Real-world Datasets

1) Website Popularity Statistics: The first comparison is run using the dataset of Karosak which is a collection of anonymized click-stream data of a hungarian on-line news

image

(b) utility-privacy tradeoffs for location tracking with Gowalla Fig. 9. Utility-privacy tradeoff comparisons using real world data

portal. There are around 8 million click events for 41,270 different pages. In this data set, each row stands for a click-stream for a website in different time slots. Our goal is to estimate the frequency of popular websites (with a total click over 15,000). We treat each website as a user, thus  Xi = 1if the total clicks of website i is above 15,000; otherwise, Xi = 0. Since no historical data is available, we regard it as the first special case in which users have a global prior.

In Fig. 9(a), we can see that  ǫ-Opt-LDP results in larger square root average MSE than  ǫ-Opt-LIP. For some smaller value of  ǫ, ǫ/2-Opt-LIP performs better than  ǫ-Opt-LDP. This is because the popular websites are rare, thus the global prior is relative small. Again, this result confirms that, the more specific the prior is, the more beneficial is LIP than LDP.

2) Location Check-In Dataset: We then compare the performance of different models with another real-world dataset Gowalla, which is a social networking application where users share their locations by checking-in. There are 6,442,892 users in this dataset. For each user, a trace of his/her check-in locations are recorded. For this dataset, we wish to estimate a histogram of users’ last check-in location. We first divide the area into  36 × 36districts. We then map each user’s locations into districts. Each user’s past check-in locations are used for calculating a global prior of the last check-in location for all the users. As studied in section. IV-C, for each user, the last check-in location is perturbed according to a MIMO-LIP (LDP) channel and a random vector is used for the adversary to estimate the histogram. The results are shown in Fig. 9(b), where similar trends can be observed as in the empirical results. Note that comparing with the theoretical results from Fig. 7, the advantage of LIP is even more clear in Fig. 9(b), that is because the theoretical analysis uses data from a domain with |D| = 20. On the other hand, in the dataset of Gowalla, the input data is from a domain with  |D| = 36 × 36, even though many of the districts has o users checking-in, which results in zero priors for these districts. Based on the MIMO perturbation mechanism, for those districts with 0 prior, the system will also never output those districts, as a result, the data domain is equivalent to a much decreased one. Nevertheless, the effective domain size is |D| = 83, which is much larger than 20. As we discussed, when domain size is large, the advantage of LIP is enhanced.

In this paper, the notion of localized information privacy is proposed. As a context-aware privacy notion, it provides relaxed privacy guarantee than LDP by introducing prior knowledge in the privacy definition while achieving increased utility. Combined with an MMSE estimator which also leverages prior knowledge, larger gains in utility can be obtained. We studied the utility-privacy tradeoff of our proposed LIP notion and the traditional LDP notion under different both binary perturbation model and multiple-input and multiple-output perturbation model. In the binary model, we show that our  ǫ-LIP always outperform  ǫ-LDP for any given privacy budget  ǫ. The advantage is more enhanced when the prior distribution is more skewed (even  ǫ/2-LIP with a stronger privacy guarantee than  ǫ-LDP is better than the latter for small  ǫvalues). In the MIMO model, we show that when the input data has a larger range, both the LIP mechanisms and LDP mechanism suffer decreased utility. However, as the input range increasing, the decreased amount of the LIP mechanism is less than that of the LDP mechanism. Which means the context-aware privacy notion is more applicable than the context-free privacy notion in the MIMO case.

For future work, we will extend our work to handle more general models. For example, we will consider an adversary that has less accurate/complete prior knowledge than users, and also understand the impact of user correlations. We will also study the optimality of the perturbation model.

[1] B. Jiang, M. Li, and R. Tandon, “Context-Aware data aggregation with localized information privacy,” in 2018 IEEE Conference on Communications and Network Security (CNS) (IEEE CNS 2018), (Beijing, P.R. China), May 2018.

[2] P. Samarati and L. Sweeney, “Protecting privacy when disclosing in- formation: k-anonymity and its enforcement through generalization and suppression,” tech. rep., 1998.

[3] C. Dwork, “Differential privacy: A survey of results,” in Theory and Applications of Models of Computation: 5th International Conference, TAMC (M. Agrawal, D. Du, and Z. Duan, eds.), pp. 1–19, 2008.

[4] C. Dwork, “Differential privacy,” in Automata, Languages and Programming: 33rd International Colloquium, ICALP 2006, Part II (M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, eds.), pp. 1–12, 2006.

[5] C. Dwork, F. McSherry, and K. Nissim, “Calibrating noise to sensitivity in private data analysis,” in Theory of Cryptography: Third Theory of Cryptography Conference, pp. 265–284, 2006.

[6] A. FITZPATRICK, “What to do after the massive yahoo hack,” 2016.

[7] R. Chen, H. Li, A. K. Qin, S. P. Kasiviswanathan, and H. Jin, “Private spatial data aggregation in the local setting,” in 2016 IEEE 32nd ICDE, pp. 289–300, May 2016.

[8] S. L. Warner, “Randomized response: A survey technique for eliminating evasive answer bias,” Journal of the American Statistical Association, vol. 60, no. 309, pp. 63–69, 1965.

[9] J. Freudiger, R. Shokri, and J.-P. Hubaux, “Evaluating the privacy risk of location-based services,” in Proceedings of the 15th International Conference on Financial Cryptography and Data Security, FC’11, (Berlin, Heidelberg), pp. 31–46, Springer-Verlag, 2012.

[10] P. Kairouz, S. Oh, and P. Viswanath, “Extremal mechanisms for local differential privacy,” in Advances in Neural Information Processing Systems 27, pp. 2879–2887, Curran Associates, Inc., 2014.

[11] A. D. Sarwate and L. Sankar, “A rate-disortion perspective on local differential privacy,” in 2014 52nd Annual Allerton Conference on Communication, Control, and Computing, pp. 903–908, Sept 2014.

[12] S. Xiong, A. D. Sarwate, and N. B. Mandayam, “Randomized requan- tization with local differential privacy,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2189–2193, March 2016.

[13] lfar Erlingsson, V. Pihur, and A. Korolova, “Rappor: Randomized aggregatable privacy-preserving ordinal response,” in Proceedings of the 21st ACM CCCS, 2014.

[14] J. Tang, A. Korolova, X. Bai, X. Wang, and X. Wang, “Privacy loss in apple’s implementation of differential privacy on MacOS 10.12,” CoRR, vol. abs/1709.02753, 2017.

[15] T. Wang, J. Blocki, N. Li, and S. Jha, “Locally differentially private protocols for frequency estimation,” in 26th USENIX Security 17, pp. 729–745, USENIX Association, 2017.

[16] T.-H. H. Chan, E. Shi, and D. Song, “Optimal lower bound for differentially private multi-party aggregation,” in Proceedings of the 20th Annual ECA, ESA’12, pp. 277–288, 2012.

[17] R. Bassily, K. Nissim, U. Stemmer, and A. Thakurta, “Practical locally private heavy hitters,” CoRR, vol. abs/1707.04982, 2017.

[18] C. Huang, P. Kairouz, X. Chen, L. Sankar, and R. Rajagopal, “Context- aware generative adversarial privacy,” CoRR, vol. abs/1710.09549, 2017.

[19] W. Wang, L. Ying, and J. Zhang, “On the tradeoff between privacy and distortion in differential privacy,” CoRR, vol. abs/1402.3757, 2014.

[20] S. Asoodeh, F. Alajaji, and T. Linder, “Notes on information-theoretic privacy,” in 2014 52nd Allerton, pp. 1272–1278, Sept 2014.

[21] G. Wu and X. Xia, “Extending differential privacy for treating dependent records via information theory,” CoRR, vol. abs/1703.07474, 2017.

[22] W. Wang, L. Ying, and J. Zhang, “On the relation between identifiability, differential privacy, and mutual-information privacy,” IEEE Transactions on Information Theory, vol. 62, pp. 5018–5029, Sept 2016.

[23] S. Asoodeh, F. Alajaji, and T. Linder, “On maximal correlation, mutual information and data privacy,” in IEEE CWIT, pp. 27–31, July 2015.

[24] M. E. Andr´es, N. E. Bordenabe, and K. Chatzikokolakis, “Geo- indistinguishability: Differential privacy for location-based systems,” CoRR, vol. abs/1212.1984, 2012.

[25] Y. Cao, M. Yoshikawa, Y. Xiao, and L. Xiong, “Quantifying differential privacy under temporal correlations,” in 2017 IEEE 33rd ICDE, pp. 821– 832, April 2017.

[26] F. Tram`er and Z. Huang, “Differential privacy with bounded priors: Reconciling utility and privacy in genome-wide association studies,” in Proceedings of the 22Nd ACM SIGSAC, pp. 1286–1297, 2015.

[27] P. Kairouz, Oh, and Sewoong, “Extremal mechanisms for local differ- ential privacy,” in Advances in Neural Information Processing Systems 27, pp. 2879–2887, 2014.

[28] F. Calmon and N. Fawaz, “Privacy against statistical inference,” 10 2012.

[29] F. A. S. Asoodeh and T. Linder, “Privacy-aware mmse estimation,” in 2016 IEEE ISIT, pp. 1989–1993, July 2016.

[30] Z. Qin, Y. Yang, and T. Yu, “Heavy hitter estimation over set-valued data with local differential privacy,” in Proceedings of the 2016 ACM SIGSAC, CCS ’16, pp. 192–203, 2016.

[31] A. Papoulis and S. Pillai, Probability, random variables, and stochastic processes. McGraw-Hill, 2002.

image

Proof. 1) Step 1 Notice that  P i1is a constant, thus the optimization problem is equivalent to maximize:

image

In order to test the monotoncity of  Li(qi0, qi1), taking partial derivative on it.

image

∂Li(qi0,qi1)∂qi0 < 0when  qi1+qi0−1 < 0. Which means given any value of  qi1, Li(qi0, qi1)is monotonically decreasing with  qi0from 0 to  1 − qi1. By the same way, when  qi0is fixed,  Li(qi0, qi1)is mono- tonically decreasing with  qi1from 0 to  1 − qi0. 2) Step 2 To test the symmetry of  Li(qi0, qi1), take  qi′0 = 1 − qi0, qi′1 = 1 − qi1.

image

Thus,  Li(qi0, qi1)is symmetric about (0.5, 0.5), which means for every point  (qi0, qi1)on the left of  qi0 +qi1 = 1, we can find a point on the right side of  qi0 + qi1 = 1that result in a same  Li(qi0, qi1)value.

In terms of the constraints, first derive  F i1(qi0, qi1), F i2(qi0, qi1), F i3(qi0, qi1), F i4(qi0, qi1)as:

image

if taking qi′0 = 1 − qi0, qi′1 = 1 − qi1into F  i1(qi0, qi1), F i2(qi0, qi1), F  i3(qi0, qi1)and F  i4(qi0, qi1):

image

image

As a result, the four constraints functions are symmetric about point (0.5, 0.5). Assume that  F ij(qi0, qi1), j = 1, 2, 3, 4 form a feasible region  Tifor  (qi0, qi1), thus for any point in this region, another point can be found on the other side of  qi0 + qi1 = 1also in  Ti. Now (qi∗0 , qi∗1 ) ∈ Ti = Ti ∩ {qi0 + qi1 ≤ 1}.

Fig. ?? illustrates the function of  Li(qi0, qi1)and feasible region  Rion the plane of  qi0, qi1, curves in the figure is the contour line of  Li(qi0, qi1), while the shadow region is the feasible region of  Tiwhen  ǫis fixed.

3) Step 3

image

for any point  (qi0, qi1) ∈ Ti, there is  F i1 ≥ F i2and F i4 ≥ F i3

image

Since  Li(qi0, qi1)is monotonically decreasing with  qi0and qi1in  Ti, For any  qi0in  Ti, its according optimal  qi∗1is the smallest value s.t.  (qi0, qi∗1 )is in  Ti. It is the same with  qi∗0. On the other hand,  F i1, F i2, F i3, F i4are all liner, their bounds value can be achieved when they are equal to their constraints values  eǫor  e−ǫ. As a result,

image

4) Step 4

To find the minimal value of  Li(qi0, qi1), where  (qi0, qi1) ∈Ti, the last thing we need to do is to test its monotocity of  Li(qi0, qi1)given  F i1 = eǫor  F i4 = eǫ.

image

image

Which is obviously greater than 0, similarly, when  F i4 =eǫis given:

image

Which is also greater then 0. Noticed that, when  qi0 ≤P i1

eǫ, optimal point is  (qi∗0 , qi∗1 ) = ( P i1eǫ , 1−P i1eǫ ), when  qi0 >P i1

eǫ, the optimal point is also  (qi∗0 , qi∗1 ) = ( P i1eǫ , 1−P i1eǫ ). Thus the global optimal solution is  ( P i1eǫ , 1−P i1eǫ ).

image

APPENDIX B PROOF OF THEOREM 3 Proof. Notice that  V ar[Xi]is a non-negative constant, thus minizing MSE is equivalent to maximize  V ar[ ˆXi].

Lemma 1. Regardless of the privacy constraints, one set of solutions of  qithat result in the minimal value of  V ar[ ˆXi]is qink = λik, ∀n, k ∈ 1, 2, 3...d; one set of solutions of  qithat result in the maximal value of  V ar[ ˆXi]is all  qikjs are from the set of {0, 1}, and for any  n ̸= m, qinjand  qimjcan not be 1 simultaneously.

Proof. Obviously, minimized solution:

Consider a set of parameters:  qimin, when  qink = λik, ∀n, k ∈ 1, 2, 3...d, V ar[ ˆXi] = 0, as  V ar[ ˆXi] ≥ 0, thus qink = λikresults in a minimal value of  V ar[ ˆXi].

On the other hand, maximized solution: Consider a set of parameters:  qimax, assume that for all  n ∈ 1, 2...d, there’s a k ∈ 1, 2, ..., d, such that  qink = 1and  qinl = 0for all  l ̸= kand for different n, k is different, thus  λik = P in

image

Notice that  MSEi ≥ 0, V ar[Xi] ≥ V ar[ ˆXi]. Thus  qminiis a maximum when for all  n ∈ 1, 2, ..., d, qink = 1and  qinl = 0

for all  l ̸= k.

When  ǫ = 0, the mechanism satisfies strongest privacy guarantee and λikqijk = 1. We know that when  qink = λik, the mechanism achieves minimized utility.

We now develop monotocity of the region between minimum and maximum with the following lemma:

Lemma 2. The sets of solutions in the form in lemma.1 that result in the maximum and minimum of  V ar[ ˆXi]are unique. On the other hand,  V ar[ ˆXi]is monotonically increasing when qilk > λik; V ar[ ˆXi]is monotonically decreasing when  qilk <

image

Proof. Taking derivative over  qilk, where  l, k ∈ 1, 2..., d:

image

From (48), we can see that the station point of  qilkis  λik, which we know is the minimal value and  V ar[ ˆXi]is monotonically increasing when  qilk > λik; V ar[ ˆXi]is monotonically decreasing when  qilk < λik. As a result, without considering the privacy constraints, the optimal solutions of each  qimnis either 0 or 1. We first show that the maximum value of  V ar[ ˆXi]can only be achieved by the solutions discussed above. We take one set of optimal solution as an examples, other solutions follow the same idea, the set of optimal solution is:  qikk = 1for any  k ∈ 1, 2, ..., d, and  qikj = 0for any j ∈ 1, 2, ..., d. Know we assume that there is an subset of index 1 to n, s.t:  qillk ̸= 1 ̸= 0, for any  k ∈ {1, 2, ..., n}. By the monotocity, we know that, regardless the privacy constraint and the total probability constraints,  V ar[ ˆXi]with qilk ̸= 1 ̸= 0and is less than  V ar[ ˆX′i]with  qilk = 1, for any k ∈ {1, 2, ..., n}. Compare with the two solutions, we have:

image

Thus the optimal solution is: only one of the  qikkcan be one, other  qikjs are all zeros.

image

Now, we know that the optimal solution for the miniming MSE problem with  ǫgrowing is from the minimal value to its maximal value. As a result of the monotocity property, the optimal solution (with privacy constraints) lies on the boundaries of the constraints:  e−ǫ = λikqijk, or λikqijk = eǫas well as the constraints:  0 ≤ qijk; �dn=1 qijn = 1;, ∀j, k ∈ 1, 2, ..., d.Based on the solutions which result in the maximum and minimum of  V ar[ ˆXi]. We know that one of the probabilities of  qim1, qim2, ..., qimd, ∀mapproaches 1 and others approaches 0. As we can randomly permute the sequence of  Yi, there are d! feasible solutions. We now consider the case in which qikks approach 1 for all  k ∈ 1, 2, ..., d, and other  qikjs are approaching 0, where  j ̸= k. For the  qikks which approach 1, the upper bounds is restricted, and for  qikjs which approach 0, the lower bounds are mounted. Considering the privacy constraints, we know the upper of  qikkis  λik/e−ǫand the lower bound of  qikjis  λik/eǫ. As  qikk + �dj=1,j̸=k qikj = 1, for all js qikjs are approaching boundaries simultaneously, as a result, they may not reach the boundaries at the same time.

We now discuss whether lower bounds or upper bounds are reached first.

Lemma 3. The parameters that decrease when  ǫgrows reach the boundary first.

Proof. When lower bounds are reached,  qij,k = λkeǫfor all j, k ∈ 1, 2, 3, ..., d, j ̸= k. Thus  qikk = 1 − (1 − P ik)/eǫ.

image

So, we know that when  qikjs reach the lower bound,  qikkis still in the feasible region, it’s easy to check that when  qikkreaches the upper bound,  qikjs do not satisfies the privacy constraints.

image

As a result, one optimal solution is:  qikj = P ij/eǫfor all k, j ∈ 1, 2, 3...dand  j ̸= k; qikk = 1 − (1 − P ik)/eǫfor all k ∈ 1, 2, 3...d. We can randomly permute the sequence of  Yi, thus there are d! optimal solutions.

image

Proof. We know the optimal solution of the parameters of any input  akare in the form of:  qikkis approaching to 1 while other  qikjs are approaching to 0 so that each input value can be inferred by a particular output. For example, given  Y i = ak, we can probably infer that  Xiis also  akand the confidence increases with  ǫ.

Assume that f < d, intuitively, this will achieve decreased utility because at least one of input values with index  k ∈ {1, 2, ..., d}can not find an parameter that approaches to 1, as a result, inputs with indexed {f + 1, f + 2, ..., d} can not be inferred by any outputs. Mathematically, as the d is fixed, V ar(X) is also fixed. denote  V ar( ˆXi)as the variance of the estimator with d = f and  V ar( ˆX′i)as the variance of the estimator with d > f. recall that

image

First assume that for each  j ∈ {1, 2, ..., d}, k ∈{1, 2, ..., f}, the parameters of ˆXiand ˆX′iare identical. We know that for each  j ∈ {1, 2, ..., d}, k ∈ {1, 2, ..., f}, ajPr(Xi = aj|Yi = ak) ≥ 0, thus  V ar( ˆX′i)is monotonically increasing with f. Notice that the parameters of ˆXiand ˆX′ican not be identical as for at least one  j, qikjwill increase for k ∈ {f + 1, f + 2, ..., d}, j ∈ {1, 2, ..., f}. However, the increase of these values will make each  Pr(Xi =ak|Yi = aj)smaller, thus  Pr(Xi = ak|Yi = aj) >Pr(X′i = ak|Y ′i = aj). As a result: V ar(Xi)> V ar(X′i).

Assume that d < f, this case can be viewed as a special case of the general model with  P id+1 = P id+2 = ... =P if. Thus the optimal solutions is straightforward:  qikk =1 − (1 − P ik)/eǫ, qikj = P ij/eǫfor  k, j ∈ {1, 2, ..., d}; qikj = 0, for  k ∈ {1, 2, ..., d}, j ∈ {d + 1, d + 2, ..., f}. As a result, the optimal solution is equivalent to the case of the general model with d = f.

In summary, the optimal range of output is f = d.

image

Proof. The parameter-related term in (31) can be further expressed as:

N �i=1

N �i=1

image

N �i=1

N �i=1

(55) Comparing with the parameter-related term in the MSE formulation of the mimo model:

image

when each m = n = k, and  aj = 1for  j ∈ {1, 2, ..., d}, (56) becomes:

image

(57) Notice that (57) is the same with the parameter-related term in and we can consider  aj = 1, ∀j ∈ {1, 2, ..., d}, m = n =k as a special case of (55); On the other hand, these two minimization problem has the same privacy constraints. Thus their optimal solutions are identical.

image


Designed for Accessibility and to further Open Science