Human biometrics as identity credential offers great usability for identity management. Biometrics based authentication systems have been widely deployed and commonplace. With the prevalent of biometrics systems, the public are getting concern about the security and privacy of the biometric templates if they are compromised. For instance, if an adversary manages to retrieve an individual’s template, impersonation with the stolen template is straightforward. Furthermore, biometrics is intrinsically linked to individual and it is limited, thus revocation and replacement like passwords are impossible. With the advancement of technology, another critical threat on privacy is that the original biomet-
ric data can actually be reconstructed with high accuracy [1–3].
Due to above concern, cancellable biometrics (CB) has been devised for biometric templates. CB is a parameterized irreversible yet revocable transform to ensure security and privacy of the biometric template. [4–7]. Revocability of CB demands the transformed template can be cancelled and replaced by changing the parameter (or known as helper data) whenever required. The transformed templates should be independent to each other as well as with original template to prevent cross-matching (unlinkability criterion). This can be done with application-specific helper data. Another vital criterion is irreversibility wherein the original template should be computationally difficult be inverted from a single instance or multiple instances of transformed templates with or without helper data.
Numerous CB schemes have been proposed in the literature and shown satisfy the CB criteria up to certain level. Several specific attacks have also been proposed for CB schemes such as brute-force inversion attack, dictionary attack (or known as zero effort false accept attack), correlation attack (or known as attacks via record multiplicity, ARM) and similarity-based attack (also named masquerade or preimage attack). (see review papers [4–7]). While correlation attack is associated to unlinkability criterion, dictionary attack exploits decision threshold of biometric systems and brute-force inversion is related to irreversibility, similarity-based attack breaches all the criteria mentioned above.
Brute-force inversion assumes by given transformed templates with or without helper data, an adversary may obtain an exact or approximate original template. The attack could be largely resisted if the transformation is a many-to-one function [8] or salted [9]. On the other hand, similarity-based attack (SA) attempts to find a preimage, an inverse of the transformed template, which may or may not close to original template that against irreversibility criterion. Furthermore, an attacker can leverage the preimage with the compromised helper data to impersonate the genuine user even his/her template is protected. Accordingly, cross-matching also become possible when the associated application-specific helper data is known. SA is enabled by the similarity preserving property of the cancellable biometric transformation while this property is essential to achieve non-degradation in terms of accuracy performance after transformation. To be specific, the distance or similarity between templates in the original space should preserve relatively in its transform domain. Hence, SA utilizes the distances relationships between templates instead of the exact distances between them for inversion [10, 11]. In literature, SA is often ignored while the rest of the attacks have became a must component to overcome wherever a new CB scheme is designed [9, 12] .
There are a few realizations of SA. In 2010, Nagar et al. propose a method to find preimage based on a given BioHashing template (A salting-based CB scheme) and its parameters [13]. The Biohashing is formulated as an (approximated) linear equation system and the reconstruction problem is solved via a bounded least-squares solver. However, this method does not work on non-linear systems and sensitive to outlier [14].
In 2012, Feng et al. propose a two-step method to retrieve preimage from binary transformed templates [15] . In the first step, given a binary template b, the bases , and the threshold
, the reconstruction to real-valued templates problem can be solved by using a Perceptron. In the second step, a modified hill-climbing algorithm is used to construct the fake face image from the real-valued template. Despite this scheme can achieve good performance, the implementation is complex. The latest work about SA is published in 2019 [11] . The authors justify theoretically most of the existing CB schemes are highly susceptible to SA as they largely rely on the similarity preserving property to maintain accuracy performance. However, no implementation is given in this work.
Despite a handful works of SA has been proposed, they are either method or biometric modality dependence or of inefficient, such as hill climbing that requires long iterations. In this paper, we introduce a Genetic Algorithm based similarity-based attack framework (GASAF) that works ef-ficiently, effectively and easily irrespective to CB algorithm or biometric modality. With this framework, we break two well-known cancellable schemes, namely BioHashing [16] on deep model face features and Bloom-filter [17] on IrisCode [18]. We made our source code available at https://bit.ly/2FduSYl.
The contributions of this paper are as follows:
• We propose a framework of similarity-based attack by means of GA (GASAF). The GASAF can be applied to CB schemes that transformed real-valued and binary biometrics templates. The former instance is face feature vector that generated by the deep network and the
Figure 1. Locality sensitive hashing (LSH)
• We demonstrate the GASAF can find the preimage from its transformed counterparts in an efficient and effective manner.
• We design a set of new metrics to evaluate the effectiveness of the similarity-based attack for cancellable biometrics.
In this section, we first provide a brief account of Locality Sensitive Hashing (LSH) in which most of the CB schemes based upon and then followed by the description of BioHashing and Bloom-filter.
Cancellable biometrics schemes leverage similarity preserving property to enable accuracy performance preservation. More specifically, given two biometric feature sets in its original feature space, they are transformed into a new space as
, and the distance between X and Y should be nearly preserved in the new space
. Therefore, matching can be accomplished in the transformed domain. LSH is one such a notion used as a transformation function in CB attributed to its similarity/distance preserving characteristic. LSH aims to map similar items into same ”buckets” with a maximized probability (see Figure 1 ).
Given the LSH family which maps data points from
to a bucket
, and the similarity function
, for any two given points
,the LSH family satisfies the following conditions :
where .
BioHashing is a representative salting-based generic CB scheme [16]. It is a two-factor scheme based on the user-specific token, act as salt (helper data) and real-valued biometric feature vector, and followed by a binarization procedure. The n-bit BioHash code c of a feature vector is computed as
, where
is a signum function, and
is an empirically determined threshold, and
is a random vector. The Hamming distance is computed between two hash codes to indicate the similarity between two biometric vectors. New instance of x can be reissued by replacing a newly generated pseudo-random vector.
The theory of Biohashing is grounded in random projection (RP), which is an instance of LSH. RP is a process of projecting feature vector from n dimensions to m dimensions () in the Euclidean space by using random matrices [12]. RP is based on Johnson-Lindenstrauss lemma (J-L lemma) [19] that warrants the points from a highdimensional space can be embedded onto low-dimensional space while preserving the distance approximately.
Bloom-filter is adapted as a means of generic CB schemes and also shares same characteristic of LSH [20]. In bloom-filter, the biometric feature is mapped to a bit array b with several independent hashing functions, where b is an bit-array of length . Specifically,
independent hash functions denoted as
are pre-defined first, then each element in set S is hashed and the hashed result is derived as k indices. Finally, set all k indices of the bit-array b to unity. At the verification stage, the bit-array of the query element y is matched to the stored template by means of hamming distance. The Bloom-filter is proposed on IrisCode by Rathgeb et. al [17]. In their approach, the
IrisCode is divided into K blocks where each block consists of l = W/K columns of codewords. The word size of each codeword is denoted as
bits. Finally K Bloom-filters with length of
bits are generated as the transformed template.
In this section, a general framework of SA is presented firstly, followed by a detailed implementation based on Genetic Algorithm. Finally the evaluation metrics of SA is proposed.
3.1. Framework of Similarity-Based Attack
In SA, we introduce Kerckhoffs’s assumption and assume that the attacker can access to the protected template, and known well about the transformation function as well as the parameters of the function. The essential part of the similarity-based attack is to generate the original template approximately (preimage) based on an initial guess of template instance. SA can be launched by solving the following minimization problem:
where x is the original template and is the pre-image of x. Given a CB transformation function
, the relative distance of two transformed templates should be preserved with respect to their counterparts in the original space, hence (3) can be rewritten as:
where indicates an algorithm-specific distance function and h(x) is the compromised template in the database. The cost function of the optimization problem can be defined as:
It is worth noting that the is the CB transformation function that possess similarity preserving property. The x can be manifested in the form including discrete (binary or integer) and continuous-valued.
As suggested by [21] and [11], can be estimated by a search algorithm based on the similarity preserving characteristic of their transformed counterparts. Basically the process of estimating the best solution resembles number guess game wherein several steps are taken to ’guess’ the best
:
1. Define the solution space and generate the first guess randomly;
2. Project the guessed into the transform space with
;
3. Compute the cost of the guessed by its cost function (4);
4. Generate a new guess based on the current guess and compute the cost. Repeat step 2 to step 4 until the stop condition is met.
3.2. Genetic Algorithm
GA is a commonly used robust search algorithm that simulates the genetic mechanism of the biology world such as mutation, crossover and selection. Like the biological evolution, GA repeatedly modifies a population of individual solutions to drive the population to evolve into a optimal solution. The children generation generally can be formed by three types of manipulations:
• Selection: select the individuals of generated features as parents, which is used to generate the next generation of features.
• Crossover: combine or exchange some part of two parents features to form children for the next generation of features, e.g. exchange some bits between two generated IrisCode.
• Mutation: apply random changes to individual parents features to form children, e.g. inverse some bits of one generated IrisCode.
Specifically, the solution space encoding is generated firstly, and the first population is initialized randomly.A fitness function is designed to evaluate how optimal the individual in the population is, and the aforementioned three types of manipulation are used to form the next generation. Then same fitness function is used to evaluate the individual in the children population and continue the above procedures repeatedly (see Algorithm 1). Over successive generations, GA drives the population to evolve into a optimal solution. Until the newly generated individuals reach the threshold of the fitness function or the average change in the fitness value less than certain threshold, the best individual is returned by GA. GA can be used to solve various optimization problems even though the objective function is discontinuous, non-differentiable, stochastic, or highly nonlinear.
3.3. Similarity-Based Attack Evaluation Metrics
The key idea of SA is to find the preimage of the original template x based on the compromised template h(x). Let
be the helper data that associated to h(x) in an application. We denote this process as
. With
, the attacker can gain access to other applications if
sat-isfies
, where
is a distance function and
is the system threshold.
For clarity, the matching scores under a few scenarios are defined as follows:
• Genuine scores: In a biometric authentication system, the similarity matching score between two transformed templates from same identity by same helper data is named as Genuine score;
• Imposter scores: Matching score from different identity by same helper data is known as Imposter scores;
Figure 2. Score distribution shifting
• Mated-SA-Imposter scores: In the proposed SA, an attacker attempts to estimate the user original template, then use the preimage to access another application. Under this situation, the matching score computed among
and its corresponding transformed template in the target database, i.e.
, is named as Mated-SA-Imposter scores (see Figure 2).
Normally, the threshold value in Figure 2 should be chosen to minimize the False Accept Rate (FAR) of the system. However, under SA, the Mated-SA-Imposter score distribution can ’shift’ to the right as illustrated in Figure 2. This shift implies more matching scores will exceed threshold thus become easier to gain access to the system.
Based on the above discussion, the overlapping region of the Imposter and Mated-SA-Imposter score distributions, denoted as OL, is used to evaluate the system’s resistances to SA. If OL = 1, this indicates the Mated-SA-Imposter and Imposter are overlapped totally, hence the attacker would fail to gain access. Otherwise, OL = 0 implies the attacker can easily gain access with since the matching score is greater than the threshold.
To compute the overlapping region of two distributions, we assume the score distribution follows Gaussian distribution for simplistic. Let be the Imposter score and
be the Mated-SA-Imposter score, hence
,
where
. Let c denote the intersection point of the overlapping region, the area of the intersection zone can be computed as:
Even though OL can be used to indicate the score distribution shifting, the overlapping area depends on the specific score distribution, hence may not reflect the exact shifting circumstance. Hence apart from the overlapping of the distribution (shifting), a more practical indicator - false accept rate (FAR) under the pre-defined threshold is also used to measure the system’s resistance to SA. The pre-defined threshold in this work is determined based on the matching score corresponding to the Equal Error Rate (EER) of the Genuine and Imposter scores. The Mated-SA-Imposter scores is used to compute the FAR of SA under threshold of EER, and here we denote as FAR@ET (FAR at EER threshold).
To validate GASAF, two representative CB schemes are employed. BioHashing is tested firstly on a face benchmark dataset, i.e. LFW. Then Bloom-filter is evaluated on an iris dataset, i.e. CASIA-Iris-Interval.
To explore the capability of successive attack for BioHashing, LFW [22] is used in this work to compute the above scores distribution under different simulations.Since GA is time consuming, and LFW contains 5749 subjects, it is impossible to perform the SA test on all LFW subjects, hence we build a customized LFW dataset to validate our work. In LFW, the subjects who have more than 10 images are selected and the first 10 images are choose to form a new small dataset denoted as LFW10 (158 users). The face feature of LFW10 is extracted by a deep network namely InsightFace (a.k.a ArcFace) [23].
As for Bloom-filter, the left eye images in the CASIA-v4-Iris-Interval (The Center of Biometrics and Security Research, http://www.cbsr.ia.ac.cn/china/Iris Databases CH.asp) are used to generate the IrisCode [18] by Libor Masek’s method [24].
4.1. BioHashing
In this work, the resistance to SA for BioHashing template h(x) with different bit size l is evaluated. Firstly, the EER, FAR of BioHashing system under helper data stolen scenario (all users use the same helper data), denoted as is setup. The system threshold
is computed and fixed with respect to the system EER under normal situation. Next, the SA is launched in
to find each user’s pre-image (deep features)
from a transform template stored in a compromised system
. The
is then used as a mated-SA-imposter to match with h(x) in
to generate the mated-SA-imposter scores. Finally the FAR@ET under the threshold
, and the overlapping area OL of imposter and mated-SA-imposter distribution are reported (see Table 1, Figure 3).
The results reveal several interesting points:
• For a BioHashing secured biometric system, the longer BioHash code can achieve better accuracy performance in terms of EER and FAR@ET.
• However, under SA, longer BioHash code leads to poor FAR@ET, which implies weak resistance to the similarity-based attack. This is not surprising as the
Table 1. Similarity-based attack evaluation on BioHashing (%).
longer BioHash code suggests contain more information, thus can bring decent accuracy performance. However, more information implies lower attack complexity.
• As shown in Figure 3, the longer BioHash code will cause larger shifting of the mated-SA-imposter distribution, and this leads to more instances which can exceed the threshold and gain access to the system.
It is worth to point out that short BioHash code does not imply better security, since it is vulnerable to brute-force inversion attack. Thus both long and short templates are not recommended.
4.2. Bloom-filter
In this paper, the Bloom-filter is implemented as described in [17] without using the key, since we focus on irreversibility instead of revocability. The IrisCode is evaluated by Equal Error Rate (EER). We follow the settings in [17] and the EER of 8 bits shifting and direct matching of IrisCode is calculated. In our experiment, the EER of
8 bits shifting matching is 3.46% while the direct Hamming matching is 21.6%.
Bloom-filter has two parameters, i.e. word size and block size l. In this work we fix the block size as
, and the word size rangers from 8 to 10 bits. We consider two SA variants:
• Attack 1-template: SA attack based on one compromised template per subject.
• Attack n-templates: SA attack based on multiple compromised templates per subject (n = 3 templates in this work), i.e. , where i denotes i- th subject, j = 1, 2, 3. The fitness is calculated as
.
Like BioHashing, the system threshold is computed and fixed with respect to the system EER under normal situation first. Next, two SAs are launched to reconstruct
(b) (c) Figure 3. Score distribution shifting of BioHashing under attack. The bits length l of (a),(b),(c) are 100,300,500 respectively.
back the IrisCode. The experiment is reported in Table 2. Firstly we can find that the SA attack can increase the FAR@ET under Attack 1-template. Unsurprisingly, Attack n-templates can significantly increase the FAR@ET compared with 1-template. This is due to more information are presented.Overall, we can conclude it is feasible to perform SA attack on Bloom-filter.
4.3. Time cost
The time efficiency of the GASAF is evaluated in this section. By setting the BioHash code to 200 bits, the time spend for BioHashing are recorded. The machine we use for simulation is equipped with a MATLAB Ver. 2018b, Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz and 16GB RAM. To reconstruct a face biometric feature, 35.8 seconds is spent for a template. As shown in Figure 4 (a), only 50 generations is needed to find the best solution.
As for Bloom-filter, Attack-1-template with is used to evaluate the time cost. Under this setting, it takes average 2175 seconds, 1500 generations to complete one attack (Figure 4 (b)). Even though it takes around 30 minutes to complete one attack, Bloom-filter remains high risk under SA.
In a nutshell, the result shows that the proposed attack can be time efficient and has great potential risk in real life.
In this work, we present a framework of Genetic Algorithm based similarity-based attack (GASAF) on cancellable biometrics. The GASAF can be launched on real-valued and binary based biometric templates.The genetic algorithm used in this work does not need training data and is time efficient. To evaluate the resistance to similarity-based attack, the overlapping area of Imposter and Mated-SA-Imposter distributions and FAR@ET are proposed. Two representative cancellable biometrics, namely BioHashing and Bloom-filter have been used as case study for this paper.
Similarity-based attack exploits similarity preserving
Figure 4. Time cost (a): BioHashing under l = 200 bits, (b) Bloom-filter under
characteristics of cancellable biometrics to breach the biometric security while similarity preserving is a vital ingredient to keep accuracy performance intact after transformation. This suggests the trade-off between security and performance is inevitable in cancellable biometric schemes. To alleviate similarity-based attack, we recommend that the system should carefully tune the parameters to achieve a balance between security and accuracy. From this work, we notice if higher accuracy is needed, then the template must capture enough information e.g. long transformed code. However, this will lead to information leakage and weak resist to similarity-based attack. On the other hand, small code is vulnerable to brute-force inversion attack.
This research was partly supported by Fundamental Research Grant Scheme (FRGS/1/2018/ICT02/MUSM/03/3). We also gratefully acknowledge the support of NVIDIA Corporation with the GPU grant donation of the Titan Xp GPU used for this research. Thanks for the helpful discus-
Table 2. Similarity-based attack evaluation on Bloom-filter (%) with block size
sion with my colleague Jeffrey Ting. Special thanks for my grandfather.
[1] G. Mai, K. Cao, C. Y. Pong, and A. K. Jain, “On the recon- struction of face images from deep face templates,” IEEE transactions on pattern analysis and machine intelligence, 2018.
[2] J. Galbally, A. Ross, M. Gomez-Barrero, J. Fierrez, and J. Ortega-Garcia, “Iris image reconstruction from binary templates: An efficient probabilistic approach based on genetic algorithms,” Computer Vision and Image Understanding, vol. 117, no. 10, pp. 1512–1525, 2013.
[3] K. Cao and A. K. Jain, “Learning fingerprint reconstruction: From minutiae to image,” IEEE Transactions on information forensics and security, vol. 10, no. 1, pp. 104–117, 2015.
[4] V. M. Patel, N. K. Ratha, and R. Chellappa, “Cancelable Biometrics: A review,” IEEE Signal Processing Magazine, vol. 32, pp. 54–65, Sept. 2015.
[5] M. Sandhya and M. V. N. K. Prasad, “Biometric Template Protection: A Systematic Literature Review of Approaches and Modalities,” in Biometric Security and Privacy: Opportunities & Challenges in The Big Data Era (R. Jiang, S. Almaadeed, A. Bouridane, P. D. Crookes, and A. Beghdadi, eds.), Signal Processing for Security Technologies, pp. 323– 370, Cham: Springer International Publishing, 2017.
[6] E. Chandra and K. Kanagalakshmi, “Cancelable biometric template generation and protection schemes: A review,” in 2011 3rd International Conference on Electronics Computer Technology, vol. 5, pp. 15–20, Apr. 2011.
[7] K. Nandakumar and A. K. Jain, “Biometric Template Protec- tion: Bridging the performance gap between theory and practice,” IEEE Signal Processing Magazine, vol. 32, pp. 88– 100, Sept. 2015.
[8] N. K. Ratha, S. Chikkerur, J. H. Connell, and R. M. Bolle, “Generating cancelable fingerprint templates,” IEEE Transactions on pattern analysis and machine intelligence, vol. 29, no. 4, pp. 561–572, 2007.
[9] Z. Jin, J. Y. Hwang, Y. Lai, S. Kim, and A. B. J. Teoh, “Ranking-Based Locality Sensitive Hashing-Enabled Cancelable Biometrics: Index-of-Max Hashing,” IEEE Transactions on Information Forensics and Security, vol. 13, pp. 393–407, Feb. 2018.
[10] P. Lacharme, E. Cherrier, and C. Rosenberger, “Preimage at- tack on biohashing,” in 2013 International Conference on Security and Cryptography (SECRYPT), pp. 1–8, IEEE, 2013.
[11] Y. Chen, Y. Wo, R. Xie, C. Wu, and G. Han, “Deep se- cure quantization: On secure biometric hashing against similarity-based attacks,” Signal Processing, vol. 154, pp. 314–323, 2019.
[12] J. K. Pillai, V. M. Patel, R. Chellappa, and N. K. Ratha, “Se- cure and robust iris recognition using random projections and sparse representations,” IEEE transactions on pattern analysis and machine intelligence, vol. 33, no. 9, pp. 1877–1893, 2011.
[13] A. Nagar, K. Nandakumar, and A. K. Jain, “Biometric tem- plate transformation: a security analysis,” in Media Forensics and Security II, vol. 7541, p. 75410O, International Society for Optics and Photonics, 2010.
[14] M. Natrella, “Nist/sematech e-handbook of statistical meth- ods,section 4.1.4.1. linear least squares regression,” 2010.
[15] Y. C. Feng, M.-H. Lim, and P. C. Yuen, “Masquerade at- tack on transform-based binary-template protection based on perceptron learning,” Pattern Recognition, vol. 47, no. 9, pp. 3019–3033, 2014.
[16] A. B. J. Teoh, A. Goh, and D. C. L. Ngo, “Random Mul- tispace Quantization as an Analytic Mechanism for BioHashing of Biometric and Random Identity Inputs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, pp. 1892–1901, Dec. 2006.
[17] C. Rathgeb, F. Breitinger, and C. Busch, “Alignment-free cancelable iris biometric templates based on adaptive bloom filters,” in 2013 international conference on biometrics (ICB), pp. 1–8, IEEE, 2013.
[18] J. Daugman, “Probing the uniqueness and randomness of iriscodes: Results from 200 billion iris pair comparisons,” Proceedings of the IEEE, vol. 94, no. 11, pp. 1927–1935, 2006.
[19] W. B. Johnson and J. Lindenstrauss, “Extensions of Lips- chitz mappings into a Hilbert space,” in Contemporary Mathematics (R. Beals, A. Beck, A. Bellow, and A. Hajian, eds.), vol. 26, pp. 189–206, Providence, Rhode Island: American Mathematical Society, 1984.
[20] M. Gomez-Barrero, C. Rathgeb, J. Galbally Herrero, J. Fier- rez, and C. H. Busch, “Protected facial biometric templates based on local gabor patterns and adaptive bloom filters,” in International Conference on Pattern Recognition, IEEE, 2014.
[21] E. Kaplan, M. E. Gursoy, M. E. Nergiz, and Y. Saygin, “Known sample attacks on relation preserving data transformations,” IEEE Transactions on Dependable and Secure Computing, 2017.
[22] G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, “Labeled faces in the wild: A database forstudying face recognition in unconstrained environments,” in Workshop on faces in’Real-Life’Images: detection, alignment, and recognition, 2008.
[23] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, “Arcface: Ad- ditive angular margin loss for deep face recognition,” arXiv preprint arXiv:1801.07698, 2018.
[24] P. K. Libor Masek, “Matlab source code for a biometric iden- tification system based on iris patterns,” 2003.