Iris recognition has emerged as one of the most accurate, convenient and low-cost biometric modalities for verifying the identity of an individual. It is well established [1][2] that iris patterns are unique among different subjects, even among identical twins, and can be easily acquired using low-cost cameras. Therefore iris recognition has been widely incorporated in national ID programs for the benefit of citizens and effective e-governance. However, the constrained imaging requirements of such widely deployed conventional iris recognition systems, i.e., the requirement for subjects to stop, stand and stare at nearby iris sensors, pose severe limitations on the use of iris recognition for surveillance and forensics. Iris recognition from less-constrained or distantly acquired images has therefore gained increasing importance in recent years. Iris image acquisition modules widely use near-infrared (NIR) illumination, typically in the wavelength range of 700-900 nm, which can reveal enhanced iris texture under constrained imaging environments. However, as the standoff distance increases, the quality of the acquired iris images degrades significantly. In such imaging scenarios the periocular information can play an increasingly important role in accurate personal identification. In recent years, periocular recognition has received increasing attention for its promising performance under such less-constrained imaging conditions [3][4]. The periocular region usually refers to the region around the eye, which preferably includes the eyebrow [5]. Such periocular regions in near-infrared iris images, in particular, present highly discriminative features for person identification.
Earlier work [3][4][6][7][8] in this area has validated that the periocular region is highly discriminative among different persons, and can be considered as an effective alternative or supplement to the face or iris recognition especially when the entire face or clear iris images are not available. This work is motivated to further such advances in the less-constrained iris recognition capabilities and introduces a new framework to more accurately and adaptively match less-constrained iris images.
A. Related Work
This section presents a brief summary of earlier or related work. We firstly review the related work on iris recognition, followed by the periocular recognition in Section I-A2 which also includes promising references on the less-constrained iris recognition.
1) Iris Recognition: Daugman [1] proposed one of the most classic and popular approaches for the automated iris recognition which uses band-pass Gabor filters, on the segmented and normalized iris images, for the feature encoding. These filter responses, including the real-part and imaginary-part, are then binarized to generate IrisCode which offers a compact and more robust feature representation. The Hamming distance between two IrisCodes is used as the dissimilarity score for verification. Based on [1], 1D log-Gabor filter was incorporated in [9] to replace 2D Gabor filter for more efficient iris feature extraction. In 2007, a different approach [10] using discrete cosine transform (DCT) was explored for analyzing frequency information from fixed-size image blocks and encoded binarized iris features. Miyazawa et al. [11] propose another spatial-frequency domain approach using 2D discrete Fourier transforms (DFT) which offered promising results. In 2009, Sun and Tan [12] employed the multi-lobe differential filter (MLDF), and referred to as the ordinal filters, which offered an alternative for the Gabor/log-Gabor filters in generating rich iris feature templates.
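The IrisCode comparison described above can be illustrated with a short sketch. It assumes boolean NumPy arrays for the binary codes and their validity masks, and it adds the horizontal bit-shifting that the later discussion of [20] refers to; the function names, the `max_shift` value and the handling of empty masks are illustrative assumptions, not details taken from [1].

```python
import numpy as np

def masked_hamming(code_a, mask_a, code_b, mask_b):
    """Fractional Hamming distance over the bits that are valid in both masks."""
    valid = mask_a & mask_b
    n_valid = int(valid.sum())
    if n_valid == 0:
        # Assumption: with no comparable bits, report maximal dissimilarity.
        return 1.0
    disagree = (code_a ^ code_b) & valid
    return float(disagree.sum()) / n_valid

def min_shifted_hamming(code_a, mask_a, code_b, mask_b, max_shift=8):
    """Smallest masked Hamming distance over circular horizontal shifts.

    Shifting along the angular axis (axis=1) compensates for eye rotation.
    """
    best = 1.0
    for s in range(-max_shift, max_shift + 1):
        d = masked_hamming(code_a, mask_a,
                           np.roll(code_b, s, axis=1),
                           np.roll(mask_b, s, axis=1))
        best = min(best, d)
    return best
```

A lower value indicates a better match; two codes that differ only by a small rotation of the eye should yield a distance close to zero after shifting.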
Iris recognition research has also attracted a variety of approaches to enhance the segmentation accuracy for the acquired iris images, since accurate segmentation is critical to the reliability of iris recognition. Some of the most widely employed iris segmentation algorithms are based on the integrodifferential operator [1] and circular Hough transforms [9], which are adapted for detecting iris and pupil circles in near-infrared eye images. These methods perform well for high-quality iris images but are known to be less reliable for noisy images acquired under relaxed environments. Tan et al. [13] proposed an iterative approach to coarsely cluster the iris and non-iris region pixels before applying the integrodifferential operator, and achieved higher reliability in segmenting the iris pixels from noisy iris images. Following a similar coarse-to-fine strategy, a competitive approach is detailed in [3], which makes use of the Random Walker algorithm [14] for coarsely locating the iris region, followed by a couple of gray-level statistics based operators to refine the boundaries. These operators have been shown to enable pixel-level precision in the final output, i.e., the iris masks. More recent approaches include [15], which utilizes an improved total variation model to address accompanying noise and artifacts in less-constrained iris images, and [16], which relies on color/illumination correction along with the watershed transform for segmenting noisy iris images acquired under visible wavelengths.
There has been quite limited work to exploit the potential from deep neural network capabilities for the iris recognition, especially while considering the tremendous popularity of deep learning for various computer vision tasks including for face recognition. An earlier attempt for deep representation of iris appears in [17] in 2015, but such proposal was to detect presentation attacks, a two-class classification problem, instead of the iris recognition. A new approach using DeepIrisNet was investigated in [18] and used a deep learning-based framework for general iris recognition. This work is essentially a direct application of typical convolutional neural networks (CNN) without many optimizations for the iris patterns. Another more recent work in [19] has attempted to exploit a deep belief net (DBN) for iris recognition. Its core component, however, is the optimal Gabor filter selection, while the DBN is again an application on the IrisCode without iris-specific optimization. More recent work in [20] proposes a UniNet [20] employing the deep fully convolutional networks (FCN) [21] to generate iris binary images and masks for Hamming distance calculation, which explores the substantial connections between iris recognition and deep learning. This work introduces a new loss function that incorporates conventional bit-shifting operations and masks in matching score computations, and achieves state-of-the-art accuracy on several publicly available datasets. Another related and promising work appears in [22] which uses a deep learning architecture to infer misalignment between a pair of iris images that are represented in a segmentation-less polar domain.
2) Periocular Recognition: In recent years, researchers have devoted consistent efforts to investigating new periocular recognition algorithms for images acquired under less-constrained environments [7][23]. An early feasibility study on using the periocular region for human recognition under varying imaging conditions was undertaken by Park et al. [6] in 2009, and the promising results reported there provided support for the subsequent research. Bharadwaj et al. [4] further explored the effectiveness of periocular recognition in situations arising from the failure of iris recognition. Part of the later research focuses on cross-spectral periocular matching [24] using the potential of neural networks. These exploratory works have further motivated researchers to continuously improve the matching accuracy for periocular images. In 2013, another promising approach appeared in [3], which exploited key-point features and spatial filter banks, i.e., Dense-SIFT and LMF features, followed by K-means clustering for dictionary learning and representation. However, this approach did not investigate periocular-specific feature representations, and it uses computationally demanding Dense-SIFT feature matching. Smereka and Kumar [5] proposed the Periocular Probabilistic Deformation Model (PPDM) in 2015, which provided sound modeling for the potential deformations that exist between two matched periocular images. Inference of the captured deformation using correlation filters is utilized for matching periocular image pairs. Later in 2016, the same group of researchers improved their basic model by selecting discriminative patch regions for more accountable matching [25]. These two methods achieved promising performance on multiple datasets.
Nevertheless, both of them relied on patch-based matching scheme, and therefore are more susceptible to scale variations or misalignment, that often violate the patch correspondences, which is more likely to happen during the real deployments. Deep learning techniques, especially convolutional neural networks (CNN), have gained immense popularity for computer vision and pattern analysis tasks in recent years.
Recent surveys on periocular recognition methods [7][23] suggest that few studies have considered the potential of deep learning techniques to boost the periocular matching accuracy. Reference [26] provides insightful observations on periocular features and a comparison of machine with human matching performance. In [27], Bowyer and Burge present a systematic summary of the related ocular recognition systems and algorithms. More recently, Proença and Neves [28] claimed that iris and sclera regions might be less reliable for periocular recognition and proposed Deep-PRWIS. In their work, periocular images are augmented with inconsistent iris and sclera regions for training a deep CNN, so that the network implicitly degrades the iris and sclera features during learning. Promising results were reported for Deep-PRWIS on two public databases. More promising efforts appear in [8], which uses a deep learning-based architecture for robust and accurate periocular recognition, incorporating an attention model to emphasize the regions with higher discriminative information. This algorithm achieves state-of-the-art accuracy on six publicly available databases and can serve as a reasonable baseline for further research in this area.
3) Iris and Periocular Feature Fusion: The periocular information is simultaneously accessible from the iris images and therefore its use to achieve better iris recognition performance is a feasible strategy. Quite a few prior works have paid attention to this aspect and several approaches have been proposed for non-ideal scenarios. In 2010, Woodard et al. [32] combined iris and periocular features using score-level combination, i.e., the weighted sum rule, to improve the recognition performance on non-ideal iris imagery. Optimal weights for the two modalities were empirically obtained. Another promising attempt appears in [3], which simultaneously recovers the iris features extracted from log-Gabor filters and the periocular features extracted from Dense-SIFT and LMF to enhance the iris recognition accuracy under relaxed imaging constraints. Raja et al. [33] proposed a framework to combine the information from face, iris and periocular biometric modalities for user authentication on smartphones. Various score-level combination schemes are explored, including the min rule, max rule, product rule, and weighted-score fusion rule, where the weight for each modality is determined according to its contribution to the recognition performance. Besides, some approaches adopt learning-based score-level fusion strategies. Santos et al. [34] present an artificial neural network with two hidden layers to fuse iris and periocular information at the score level for mobile cross-sensor applications. Verma et al. [35] utilize the random decision forest (RDF), which is an ensemble learning method, to combine the match scores of iris and periocular biometrics. Noticeable improvement in the performance is shown for at-a-distance person recognition. Ahuja et al. [36] extract the periocular features using deep learning and the iris features using root SIFT. They then combine the match scores from these two modalities using the mean rule and linear regression. There are some other promising attempts in the literature that integrate the information from these two biometrics at the decision level and at the feature level. Santos and Hoyle [37] fuse the iris and periocular modalities at the decision level to increase the reliability of unconstrained iris recognition. They train a logistic regression model to predict the weights for each of the classifiers and obtain a final response. Joshi et al. [38] investigate iris and periocular biometric performance from their feature-level combination. They first concatenate iris and periocular features and then employ Direct Linear Discriminant Analysis (DLDA) to obtain discriminative and low-dimensional feature vectors for the final classification. More recently, Zhang et al. [29] provide a promising framework to combine the iris and periocular features extracted from a maxout CNN to enhance the performance of mobile-based personal identification.
TABLE I: Comparative summary of related and recent work on less-constrained iris recognition.
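The fixed score-level combination rules mentioned above (min, max, product and weighted sum) can be sketched generically as follows. The sketch assumes that both inputs are similarity scores already normalized to [0, 1], with higher values indicating a better match; the function name and the default weight are hypothetical and are not taken from any of the cited works.

```python
import numpy as np

def fuse_scores(iris, peri, rule="weighted", w_iris=0.5):
    """Combine two normalized similarity scores with a fixed fusion rule."""
    iris = np.asarray(iris, dtype=float)
    peri = np.asarray(peri, dtype=float)
    if rule == "min":
        return np.minimum(iris, peri)
    if rule == "max":
        return np.maximum(iris, peri)
    if rule == "product":
        return iris * peri
    if rule == "weighted":
        # w_iris is the (empirically chosen) weight of the iris modality.
        return w_iris * iris + (1.0 - w_iris) * peri
    raise ValueError(f"unknown fusion rule: {rule}")
```

All of these rules apply the same combination to every comparison, regardless of how much of the iris is actually visible in a given image pair.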
B. Our work
Accuracy of iris recognition under a less-constrained environment is known to significantly degrade, as compared to those from the conventional or standoff iris recognition systems. Such iris images are generally acquired with greater standoff distances, for the surveillance or from the mobile devices with less-cooperative individuals. This research is motivated to address such iris recognition challenges and evaluate iris recognition capabilities under more realistic scenarios. Iris images acquired under less constrained imaging environments often present varying regions of effective iris pixels [1][39]. In the context of such less constrained iris images, we revisit the conventional Hamming distance to match binarized iris templates. Such iris images present significant variations in occlusions which should be carefully considered while simultaneously utilizing available periocular features. Since iris information is inherently embedded in periocular images, the effectiveness of iris matching can benefit from the relative attention or eye area, like for the human visual systems [40]. Table I presents a summary of related work with our work in this paper for the less constrained iris recognition. The key contributions of this paper can be summarized as follows:
• This paper introduces a new framework for the periocular assisted iris recognition. Iris images under a less-constrained imaging environment often present varying regions of effective iris pixels for the iris matching. Such differences in the effective number of available iris pixels can be used to dynamically reinforce periocular information which is simultaneously available from such iris images. Such dynamic reinforcement should also consider effective regions of discriminative features that receive varying attention during respective periocular matching. Our framework therefore incorporates such discriminative information using a multilayer perceptron network for the less-constrained iris recognition. The experimental results presented in Section III-B for within-database matching using the receiver operating characteristics curve (ROC), on three publicly available databases, indicate outperforming results over state of the art methods. Also, the
ROC results presented in Section III-C show that our algorithm outperforms others in cross-dataset matching. The results from within dataset matching and cross-dataset matching validate the effectiveness and generalization ability of the framework presented in this paper for the less-constrained iris recognition.
• The importance of black (0) and white (1) pixels in binarized iris templates may not be the same or similar for iris image templates acquired under less constrained imaging. Therefore this paper presents a new approach to match such templates using a similarity measure, instead of Hamming distance in the literature, which can accommodate the importance of different bits in iris templates. The experimental results presented in this paper on three publicly available iris databases consistently indicate outperforming results and validate the effectiveness of such approach for less-constrained iris recognition.
Comparative performance of our approach against other competing methods, on three common and public iris image datasets, is also summarized in Table I. The rest of this paper is organized as follows. Section II provides details on our unified framework for less-constrained iris recognition. This section also includes the architectures for iris and periocular recognition, together with the formulation of the dynamic fusion approach introduced in this work. Our comparative experimental results from within-dataset matching and cross-dataset matching using three different public databases are presented in Section III. Section IV discusses the theoretical reasons for the effectiveness of our proposed approaches. The key conclusions from this paper are summarized in Section V.
The framework for periocular-assisted and multi-feature collaboration schemes to achieve dynamic iris recognition is illustrated in Figure 1. The different blocks in this diagram are explained in detail in the following three sections. This framework adopts the UniNet [20] to achieve accurate iris matching, while the AttenNet and FCN-Peri [8] are embedded for simultaneously matching the periocular regions in the acquired eye or iris images. The network is trained in two different training or offline phases. We first pre-process each of the acquired eye images to independently recover the normalized iris images and the respective periocular images. The corresponding region of interest images are fed to the respective subnets, which are trained independently during the first training phase. During the second training phase, all the parameters in the two subnets are frozen and used to recover several cues that indicate the similarity among the iris and periocular templates, including the effective region of iris images among the matched templates and the corresponding periocular region components among the matched templates. Finally, these cues from the two subnets are employed to train a multilayer perceptron (MLP) network that enables a binary prediction using the softmax cross-entropy loss. During the performance evaluation or test phase, a pair of eye images is fed into the trained models to recover the prediction results from the last softmax layer. These softmax layer results are considered as the consolidated match scores between the input or unknown eye image pair. These consolidated match scores are used to achieve the binary or classification decisions for the different applications. The following sections provide further details on the different components of the framework.
A. Iris Template Generation and Comparisons
Each of the acquired eye images is first subjected to the localization of the region of interest, i.e., iris segmentation, and to image normalization. These preprocessing steps result in the normalized iris images and are the same as employed in earlier work [15]. The dimension of all the segmented and normalized iris images generated from the preprocessing steps, for all the databases employed in our work, is 512 × 64 pixels. These images are also subjected to contrast enhancement which saturates 5% of the iris region pixels at high and low intensities.
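The saturation-based contrast enhancement can be sketched as a percentile stretch. The text does not say whether the 5% applies to each end of the intensity range or to both ends combined; the sketch below assumes 5% at each end, and the function name and the optional `mask` argument (a boolean array selecting iris pixels) are hypothetical.

```python
import numpy as np

def saturate_stretch(img, mask=None, frac=0.05):
    """Linearly stretch intensities to [0, 1], saturating `frac` of the
    reference pixels at the low end and `frac` at the high end."""
    img = np.asarray(img, dtype=float)
    # Percentiles are estimated from the masked (iris) pixels when a mask is given.
    pixels = img[mask] if mask is not None else img.ravel()
    lo, hi = np.percentile(pixels, [100.0 * frac, 100.0 * (1.0 - frac)])
    if hi <= lo:
        # Flat region: nothing to stretch.
        return np.zeros_like(img)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)
```

For a normalized iris image this would be applied to the 64-row by 512-column array before it is passed to the feature extraction network.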
The normalized rectangular iris images are used to recover the respective feature templates and the respective masks depicting valid iris pixels or regions. The UniNet architecture introduced in [20] has been shown to offer state-of-the-art iris matching capabilities and was also adopted in this work. The UniNet includes two fully convolutional sub-networks called FeatNet and MaskNet, as specified in Table II. The MaskNet generates a binary mask distinguishing the valid regions from the invalid or less reliable regions in the iris templates that often degrade the iris matching accuracy. The network uses a triplet architecture for the training and we generate triplets in a ratio of 1:3 between the genuine match pairs and the imposter match pairs for the respective training sets. The MaskNet is pre-trained using the ND-IRIS-0405 Iris Image Dataset [41] and all its parameters are frozen in this work. The FeatNet, pre-trained with the ND-IRIS-0405 Iris Image Dataset and made publicly available from [20], is fine-tuned using the triplet pairs generated from the respective training sets. The FeatNet is essentially a fully convolutional neural network and aims to learn a same-size but more robust pseudo-binary representation of the input iris images. The loss function used in the FeatNet training is the extended triplet loss, which aims to enlarge the margin of the pseudo-Hamming distance between the intra-class and inter-class matching. The extended triplet loss can be defined as follows.
where N is the batch size, the remaining terms are the corresponding masked feature maps generated by the FeatNet, and m is the hyperparameter controlling the margin between the anchor-positive and anchor-negative distances.
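The margin-based structure of this loss can be sketched as a generic triplet hinge, L = (1/N) Σ max(D(A, P) − D(A, N) + m, 0). The sketch assumes that the anchor-positive and anchor-negative distances have already been computed, for instance as masked and shift-tolerant distances between the FeatNet feature maps; the exact distance used in [20] is not reproduced here, and the function name is hypothetical.

```python
import numpy as np

def triplet_hinge_loss(d_ap, d_an, margin):
    """Mean hinge loss over a batch of triplets.

    d_ap: anchor-positive distances, one per triplet
    d_an: anchor-negative distances, one per triplet
    margin: required gap m between the two distances
    """
    d_ap = np.asarray(d_ap, dtype=float)
    d_an = np.asarray(d_an, dtype=float)
    # A triplet contributes only when the negative is not at least
    # `margin` farther from the anchor than the positive.
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))
```

For example, with distances (0.2, 0.6) and (0.5, 0.4) and a margin of 0.1, only the second triplet violates the margin and contributes to the loss.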
B. Comparisons using Similarity Score
Hamming distance is widely employed to compute the dissimilarity between two binary feature templates in a range of biometric identification problems, such as iris or palmprint recognition. It assumes that the information content of all the template values in the coding space is equally important for distinguishing the user identity. However, the choice of feature extraction and binarization methods, along with the nature of the input images, can effectively determine the relative importance of the white (ones) area and the black (zeros) area in the encoded images. Therefore, a more flexible measure that can account for such asymmetric importance is incorporated for matching less-constrained iris images. This measure is referred to as the weighted similarity score (WS) and uses the azzoo similarity measure [42].
Fig. 1: The framework for the deep dynamic fusion using iris and periocular information.
TABLE II: The specification of incorporated UniNet (columns: Layer Name, Layer Type, Kernel Size, Output Channel).
The effectiveness of white pixel matching and black pixel matching in feature templates can also be experimentally evaluated. The numbers of white pixels and black pixels in a feature template A can be counted separately. While comparing template A and template B, we can perform white-pixel-only matching and black-pixel-only matching, and compute the corresponding white pixel matching rate and black pixel matching rate. The difference in the contributions from the two kinds of pixel matching, i.e., the average white and black pixel matching rates for the genuine and imposter pairs, can also be empirically observed from experiments using templates generated from the databases. We select 1,000 genuine matches and 2,000 imposter matches from the test set of the CASIA-Mobile-V1-S3 dataset for this empirical evaluation. It was observed that the average white pixel matching rate is 0.5733 and the average black pixel matching rate is 0.6138 for the genuine matches, while the corresponding averages are 0.4159 and 0.4563 for the imposter matches.
In order to accommodate the differences in the discriminative information from the white pixel pairs and from the black pixel pairs, we assign them different weights and generate a weighted similarity measure for each pair of pixels in row i and column j of the two matched iris templates, with a hyperparameter controlling the significance of the coding pairs. In all our experiments, this hyperparameter is empirically set to 0.3. Given the image size of the iris templates, we generate the match score from the weighted similarity over all pixel positions. It can be observed that when the hyperparameter is unity, the match score is essentially the difference between unity and the normalized Hamming distance. Therefore the weighted similarity can be considered a more flexible alternative for template matching.
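A minimal sketch of such a weighted similarity score is given below. It assumes an azzoo-style weighting in which a white-white (1-1) agreement counts as 1, a black-black (0-0) agreement counts as the hyperparameter (called `alpha` here), and a disagreement counts as 0, with the sum normalized by the template size. Which of the two agreement types is down-weighted, the name `alpha`, and the omission of the validity masks are all assumptions of this sketch.

```python
import numpy as np

def weighted_similarity(a, b, alpha=0.3):
    """Weighted similarity between two binary templates of equal shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("templates must have the same shape")
    white = np.logical_and(a, b).sum()    # positions where both bits are 1
    black = np.logical_and(~a, ~b).sum()  # positions where both bits are 0
    return float(white + alpha * black) / a.size
```

With `alpha = 1` every agreement counts equally, so the score reduces to one minus the normalized Hamming distance, which is the special case noted in the text.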
C. Periocular Template Generation and Comparisons
The periocular preprocessing is simpler and incorporates image normalization with a bilinear filter. The dimensions of all normalized periocular images are empirically fixed as 300 × 240. Earlier research has shown that periocular recognition with attention models can offer state-of-the-art performance [8], and this approach was also employed for generating the periocular templates for the matching. The periocular recognition model therefore includes two components, FCN-Peri and AttenNet. The architectures of these networks are detailed in Table III. The FCN-Peri is a fully convolutional network which aims to detect the eye region and eyebrow region in the presented periocular images. We use the FCN-Peri for near-infrared (NIR) images, as made publicly available in [8], and do not perform any further tuning. With such automatically detected eye and eyebrow regions, the AttenNet provides the pixel locations of these specific regions so that specific attention is given to these locations in generating more discriminative periocular features. The output of AttenNet is a feature vector with 512 elements. We compute the distance-driven sigmoid cross-entropy (DSC) loss between the siamese pairs, which are generated from the corresponding training set during the training phase. The ratio of genuine pairs to imposter pairs is empirically set as 1:2 for all our experiments. The DSC loss [8] can be defined as follows.
where N is the batch size, t is the ground truth label for every genuine and imposter pair, and s is a transformed Euclidean distance.
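The general shape of a sigmoid cross-entropy loss over a distance-derived quantity s can be sketched as follows. The exact transformation from the Euclidean distance to s used in [8] is not given here, so `distance_to_logit` is a purely hypothetical linear transform with made-up `offset` and `scale` parameters; the label convention (t = 1 for genuine pairs, t = 0 for imposter pairs) is likewise an assumption.

```python
import numpy as np

def sigmoid_cross_entropy(s, t):
    """Mean binary cross-entropy between sigmoid(s) and labels t.

    Uses the numerically stable form max(s, 0) - s*t + log(1 + exp(-|s|)).
    """
    s = np.asarray(s, dtype=float)
    t = np.asarray(t, dtype=float)
    loss = np.maximum(s, 0.0) - s * t + np.log1p(np.exp(-np.abs(s)))
    return float(loss.mean())

def distance_to_logit(d, offset=1.0, scale=1.0):
    """Hypothetical transform: small distances map to positive logits."""
    return scale * (offset - np.asarray(d, dtype=float))
```

Under these assumptions, a genuine pair with a small feature distance yields a large positive s and a loss near zero, while a genuine pair with a large distance is penalized.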
D. Segmentation-Aware Dynamic Fusion
Any effective dynamic mechanism to simultaneously utilize the iris and periocular information should carefully consider multiple cues, not just from the individual feature similarity but also from the segmentation steps, which can provide (dynamic) importance for the individual similarity scores. Iris images under less-constrained imaging often present a varying number of effective iris pixels that are incorporated to generate the respective iris match scores. The differences in the effective number of available iris pixels among two matched iris images can be used to dynamically reinforce the periocular information for a more reliable match score. Such dynamic reinforcement should also consider the effective regions of discriminative features, which receive varying attention during the respective periocular matching. Therefore we incorporate a multilayer perceptron network to dynamically consolidate such multiple pieces of discriminative information and generate a more reliable consolidated match score between two unknown or input images.
TABLE III: Details on the architecture for the AttenNet and FCN-Peri.
TABLE IV: The specification of incorporated MLP.
As illustrated in Figure 1, the UniNet generates pseudo-binary feature maps, along with the respective masks, while the AttenNet generates the feature vectors used to compute the Euclidean distance among the respective ROI maps. Therefore we can simultaneously generate the iris match scores and, using the Euclidean distance, the periocular match score. Another important input for the MLP, which effectively represents the importance or the quality of the respective iris match scores, is the mask rate. This mask rate is the ratio between the valid pixels and all iris pixels among the two matched iris image templates. Similarly, the effectiveness of the periocular feature template match scores is represented using the sum and the difference of the eye and eyebrow ratios, i.e., the sum (and difference) of the eye areas among the matched periocular images and the sum (and difference) of the eyebrow areas among the matched periocular images. It should be noted that these eye and eyebrow areas are automatically predicted or available from AttenNet, as shown in Figure 1. The MLP network therefore receives an eight-element feature vector and is trained offline using the respective genuine and impostor pairs from the training dataset. The architecture of the incorporated MLP is shown in Table IV. The network is trained for binary classification using the softmax cross-entropy loss, with the respective genuine and impostor class labels. The trained network is used to generate consolidated match scores from the softmax value in the last layer output, which ranges between 0 and 1.
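The fusion step can be sketched as a small feed-forward network over the cue vector. The sketch below is illustrative only: the exact composition and ordering of the eight inputs, the hidden layer sizes, the ReLU activation and the choice of output index for the genuine class are assumptions, since the layer specification of Table IV is not reproduced here. The weights are random and untrained, so the score only demonstrates the data flow.

```python
import numpy as np

def mask_rate(mask_a, mask_b):
    """Fraction of template positions that are valid in both iris masks."""
    valid = np.logical_and(mask_a, mask_b)
    return float(valid.sum()) / valid.size

def area_cues(ratio_a, ratio_b):
    """Sum and absolute difference of an area ratio (eye or eyebrow)
    measured on the two matched periocular images."""
    return ratio_a + ratio_b, abs(ratio_a - ratio_b)

def init_params(sizes, seed=0):
    """Random (untrained) weights for an MLP with the given layer sizes."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_match_score(x, params):
    """Forward pass; returns the softmax probability of the genuine class."""
    h = np.asarray(x, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)      # hidden layers with ReLU (assumed)
    W, b = params[-1]
    logits = h @ W + b
    logits = logits - logits.max()          # stabilize the softmax
    p = np.exp(logits) / np.exp(logits).sum()
    return float(p[1])                      # index 1 = genuine (assumed)
```

In use, the iris and periocular match scores, the mask rate and the eye and eyebrow area cues would be concatenated into the input vector, and the returned probability would serve as the consolidated match score.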
We perform a series of experiments on three publicly available datasets to ascertain the effectiveness of the proposed framework for less-constrained iris recognition. This section firstly provides brief but necessary information for the three public datasets used in this work. We then explain the experimental protocols in the following section. This section also provides a comparative analysis of results from our method with other state-of-the-art methods.
A. Datasets and Protocols
The experimental results presented in this section utilized the following three near-infrared eye image datasets in the public domain. Figure 2 illustrates the sample eye images from these different datasets.
1) Q-FIRE-05-middle-illumination Dataset: The Quality in Face and Iris Research Ensemble (Q-FIRE) dataset [30] is a publicly available dataset with at-a-distance iris images. Our experiments use the Q-FIRE-05-middle-illumination subset, which has been acquired at a distance of five feet under middle-level near-infrared illumination. We automatically segment the periocular region images with a trained Fast-RCNN detector. The processed dataset includes both eye images from 159 different subjects. The first 15 right-eye images are used to train the network while the first ten left-eye images are used for the test evaluation. Therefore this set of experiments generates 7,155 (45 × 159) genuine scores and 1,256,100 (159 × 158 × 50) imposter match scores.
Fig. 2: Sample eye images from the employed datasets: (a) CASIA-Mobile-V1-S3 dataset. (b) CASIA iris image v.4 distance dataset. (c) Q-FIRE-05-middle-illumination dataset.
2) CASIA-Mobile-V1-S3 Dataset: The CASIA-Mobile-V1-S3 dataset [29] is another publicly available dataset that includes 3600 face images from 360 different subjects; these images have been acquired using a mobile device with near-infrared illumination. A Fast-RCNN detector [43] is trained with 100 manually labeled samples to detect the periocular region. We follow the same match protocols, both for the iris matching and the periocular matching, as described in [29]. Therefore the training set includes 3600 samples from 360 classes (eyes) of the first 180 subjects. The test set includes the other 3600 samples from 360 classes (eyes) of the remaining 180 subjects. The left-eye images are matched with all the left-eye images while the right-eye images are matched with all the right-eye images. After that, the left-eye match scores and right-eye match scores are combined using the sum rule to generate 8,100 genuine and 1,611,000 imposter match scores.
3) CASIA Iris Image v.4 Distance Dataset: This subset of the CASIA.v4 database [31] contains images of the upper part of the face from 142 subjects. We detect the iris region images with an OpenCV-implemented iris detector [44], as in earlier references, and generate an eye dataset with 2,446 instances. The training set comprises all the right-eye samples, and the test set is composed of all the left-eye samples, as in [20]. The test set therefore generates 20,702 genuine and 2,969,533 imposter match scores.
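The genuine and imposter score counts for the two balanced protocols can be checked with a short calculation. The sketch assumes all-to-all matching in which every class contributes the same number of test samples and each unordered pair of samples is scored once; the function name is hypothetical. Treating the CASIA-Mobile-V1-S3 test set as 180 units of ten samples each (after the left and right scores are combined) is an inference from the reported totals rather than something stated explicitly in the text.

```python
from math import comb

def match_counts(num_classes, images_per_class):
    """Number of genuine and imposter comparisons for all-to-all matching."""
    # Genuine: every unordered pair of samples within a class.
    genuine = num_classes * comb(images_per_class, 2)
    # Imposter: every pair of samples drawn from two different classes.
    impostor = comb(num_classes, 2) * images_per_class ** 2
    return genuine, impostor

print(match_counts(159, 10))  # Q-FIRE: (7155, 1256100)
print(match_counts(180, 10))  # CASIA-Mobile-V1-S3: (8100, 1611000)
```

The CASIA.v4-distance test set has a varying number of samples per subject, so its totals cannot be reproduced with this balanced formula.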
B. Iris and Periocular Recognition
We first present comparative experimental results using iris and periocular features simultaneously recovered with the framework presented in Section II. In this set of experiments, all the models were trained using their respective training sets, and verification performance is evaluated using the respective test sets as detailed in the earlier sections. We use iris recognition results generated from UniNet [20] and periocular recognition results generated using AttenNet [8] as the baseline methods for the comparative performance evaluation. We also provide a comparison with static score-level combination, in which the iris match scores generated using our similarity measure are combined with the periocular match scores using a weighted sum. These comparative results from our algorithms and the respective benchmarks are presented in Figure 3 and summarized in Table V.
Fig. 3: Comparative receiver operating characteristic (ROC) curves from within-dataset matching.
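The static score-level combination used as a baseline can be illustrated with a minimal sketch. It assumes that both score lists hold similarity scores for the same comparisons, that min-max normalization is applied before fusion, and that a single fixed weight `iris_weight` is used. The normalization choice, the weight value and all function names are hypothetical and are not taken from the experiments reported here.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores linearly to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: all scores identical, no spread to normalize.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_sum_fusion(
    iris_scores: Sequence[float],
    periocular_scores: Sequence[float],
    iris_weight: float = 0.5,  # hypothetical value, fixed for all comparisons
) -> List[float]:
    """Static fusion: the same weight is applied to every comparison."""
    if len(iris_scores) != len(periocular_scores):
        raise ValueError("score lists must describe the same comparisons")
    if not 0.0 <= iris_weight <= 1.0:
        raise ValueError("iris_weight must lie in [0, 1]")
    iris_norm = min_max_normalize(iris_scores)
    peri_norm = min_max_normalize(periocular_scores)
    return [
        iris_weight * i + (1.0 - iris_weight) * p
        for i, p in zip(iris_norm, peri_norm)
    ]


fused = weighted_sum_fusion([1.0, 2.0, 3.0], [10.0, 30.0, 20.0], iris_weight=0.7)
print(fused)
```

Because the weight does not depend on the input images, such a baseline cannot adapt when one modality is degraded in a particular sample, which is the limitation that a dynamic fusion scheme is meant to address.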
The receiver operating characteristic (ROC) curves shown in Figure 3, along with the GAR and equal error rate (EER) summarized in Table V, indicate that our framework outperforms the baselines in this set of within-database experiments. It can be observed that iris recognition alone, using the proposed similarity measure, achieves significantly superior performance over the state-of-the-art iris recognition approach in [20]. The combination of the respective iris and periocular match scores using static fusion offers a significant performance improvement, while the dynamic fusion framework using DCNN provides consistently outperforming results on the three different datasets. Our approach also outperforms the framework proposed in TIP13 [3]. The limited performance of [3] can be attributed to the lack of any specialized periocular matching algorithm, and our analysis indicates that this is the main constraint limiting its overall performance. We implemented the Maxout CNN ourselves based on the parameters provided in [29], since there is no publicly available code for the employed DCNN model and the segmentation algorithm. Also, the Bath dataset used to pre-train the model is no longer publicly available.
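For readers who wish to compute comparable summary figures from their own score lists, the sketch below shows one simple way to obtain ROC operating points, an approximate EER and the GAR at a given false accept rate (FAR). It assumes similarity scores (higher means more similar) and uses the observed scores as candidate thresholds. It is an illustrative quadratic-time implementation with hypothetical function names, not the evaluation code used for Table V.

```python
from typing import List, Sequence, Tuple


def roc_points(
    genuine: Sequence[float], impostor: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, FAR, GAR) at every observed score value.

    A comparison is accepted when its score is >= threshold.
    """
    points = []
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        gar = sum(s >= t for s in genuine) / len(genuine)
        points.append((t, far, gar))
    return points


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate EER: the operating point where FAR and FRR are closest."""
    _, far, gar = min(
        roc_points(genuine, impostor), key=lambda p: abs(p[1] - (1.0 - p[2]))
    )
    frr = 1.0 - gar
    return (far + frr) / 2.0


def gar_at_far(
    genuine: Sequence[float], impostor: Sequence[float], target_far: float
) -> float:
    """Highest GAR among operating points whose FAR does not exceed target_far."""
    feasible = [gar for _, far, gar in roc_points(genuine, impostor) if far <= target_far]
    return max(feasible) if feasible else 0.0


genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.5]
print(equal_error_rate(genuine_scores, impostor_scores))  # 0.25
print(gar_at_far(genuine_scores, impostor_scores, 0.0))   # 0.75
```

With millions of impostor scores, as in the protocols above, a practical implementation would sort the scores once and sweep the threshold instead of recounting at every candidate value.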
C. Cross-Database Performance Evaluation
In the cross-database configuration, we use the model trained on CASIA.v4-distance to match CASIA-Mobile-V1-S3 and Q-FIRE dataset images directly, without any fine-tuning. In addition, we present cross-database experimental results with the model trained on the CASIA-Mobile-V1-S3 database and tested on the CASIA.v4-distance and Q-FIRE dataset images. This set of experiments aims to validate the generalization capability of our framework, especially when the image samples available for training are quite limited. The EER values are summarized in Table VI and the respective ROCs are shown in Figure 4.
The results from this set of experiments indicate consistent improvement from our framework during cross-database matching, which supports the generality of the framework in matching less-constrained iris images.
The complementary nature of the match scores generated from the deep features in our experiments can be visualized from two-dimensional plots of the iris and periocular scores. Figure 5 illustrates such plots for the distribution of (normalized) genuine and impostor scores from iris and periocular matching using the respective databases. The subplots along each axis show the kernel density estimate of the corresponding score distribution. These plots from less-constrained images indicate that the joint use of the individual match scores can more effectively separate genuine and impostor match scores, as pursued in this work.
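The marginal density curves in such plots can be approximated with a Gaussian kernel density estimate. The sketch below is a minimal pure-Python version that assumes a Gaussian kernel and the normal-reference bandwidth h = 1.06 σ n^(-1/5). The kernel, the bandwidth rule and the function names are assumptions made for illustration, since the settings used to draw Figure 5 are not stated.

```python
import math
from statistics import stdev
from typing import List, Optional, Sequence


def normal_reference_bandwidth(samples: Sequence[float]) -> float:
    """Bandwidth h = 1.06 * sigma * n^(-1/5); needs at least two samples."""
    return 1.06 * stdev(samples) * len(samples) ** (-1.0 / 5.0)


def gaussian_kde(
    samples: Sequence[float],
    grid: Sequence[float],
    bandwidth: Optional[float] = None,
) -> List[float]:
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    h = bandwidth if bandwidth is not None else normal_reference_bandwidth(samples)
    if h <= 0.0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        for x in grid
    ]


# Toy normalized scores; real genuine and impostor lists would replace these.
impostor_scores = [0.10, 0.20, 0.25, 0.30]
grid = [k * 0.01 - 1.0 for k in range(301)]  # evaluation points from -1.0 to 2.0
density = gaussian_kde(impostor_scores, grid)
area = sum(density) * 0.01  # Riemann sum; should be close to 1
print(round(area, 3))
```

Evaluating the same function separately on the genuine and impostor scores of each modality gives the two curves drawn along each axis of a joint score plot.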
This work has introduced a new framework for periocular-assisted less-constrained iris recognition. Our approach has attempted to use better matches for the periocular matching and introduces a similarity score for more accurate iris recognition. The fusion mechanism can dynamically consider the importance of each modality, their relative importance, and the effective region of interest to generate more reliable consolidated match scores. The experimental results presented in Section III using three publicly available datasets demonstrate the merit of the proposed approach, with outperforming ROC results under both within-dataset and cross-dataset scenarios. In order to ensure reproducibility of all our results, we will provide all codes along with ground truth labels. Building an end-to-end framework for periocular and iris recognition is one possible direction to further improve this work. Iris recognition itself can be considered as a form of attention within periocular recognition. An end-to-end framework that can perform segmentation and simultaneously learn robust features is expected to be more attractive and elegant, and is part of further work in this area.
TABLE V: Summary of recognition rates and equal error rate values from the within-dataset matching comparison.
Fig. 4: Comparative ROC results from cross-dataset performance evaluation. The results in (a)-(b) use the model trained with CASIA.v4 Distance dataset while results in (c)-(d) use the model trained with CASIA-Mobile-V1-S3 dataset.
Fig. 5: Distribution of matching scores from iris and periocular features.
TABLE VI: Comparative summary of recognition rates and equal error rate values from the cross-dataset matching.
Fig. 6: Degraded quality image samples: (a) Defocus blur sample in Q-FIRE dataset, (b) Poor illumination sample in CASIA-Mobile-V1-S3 dataset, (c) Severely occluded sample in CASIA v.4 Distance dataset.
[1] J. Daugman, “How Iris Recognition Works,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 1, 2004.
[2] K. W. Bowyer, K. Hollingsworth, and P. J. Flynn, “Image understanding for iris biometrics: A survey,” Computer Vision and Image Understanding, vol. 110, no. 2, pp. 281–307, 2008.
[3] C. W. Tan and A. Kumar, “Towards online iris and periocular recognition under relaxed imaging constraints,” IEEE Transactions on Image Processing, vol. 22, no. 10, pp. 3751–3765, 2013.
[4] S. Bharadwaj, H. S. Bhatt, M. Vatsa, and R. Singh, “Periocular biometrics: When iris recognition fails,” in 2010 Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS). IEEE, 2010, pp. 1–6.
[5] J. M. Smereka, V. N. Boddeti, and B. V. K. Vijaya Kumar, “Probabilistic Deformation Models for Challenging Periocular Image Verification,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1875–1890, 9 2015.
[6] U. Park, A. Ross, and A. K. Jain, “Periocular biometrics in the visible spectrum: A feasibility study,” in 2009 IEEE 3rd International Conference on Biometrics: Theory, Applications, and Systems. IEEE, 9 2009, pp. 1–6.
[7] F. Alonso-Fernandez and J. Bigun, “A survey on periocular biometrics research,” Pattern Recognition Letters, vol. 82, pp. 92–105, 2016.
[8] Z. Zhao and A. Kumar, “Improving periocular recognition by explicit attention to critical regions in deep neural network,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 12, pp. 2937–2952, 2018.
[9] L. Masek, “Recognition of human iris patterns for biometric identification,” Ph.D. dissertation, University of Western Australia, 2003.
[10] D. M. Monro, S. Rakshit, and D. Zhang, “DCT-Based Iris Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 586–595, 4 2007.
[11] K. Miyazawa, K. Ito, T. Aoki, K. Kobayashi, and H. Nakajima, “An Effective Approach for Iris Recognition Using Phase-Based Image Matching,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 10, pp. 1741–1756, 10 2008.
[12] Z. Sun and T. Tan, “Ordinal Measures for Iris Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 12, pp. 2211–2226, 2008.
[13] T. Tan, Z. He, and Z. Sun, “Efficient and robust segmentation of noisy iris images for non-cooperative iris recognition,” Image and Vision Computing, vol. 28, no. 2, pp. 223–230, 2 2010.
[14] L. Grady, “Random Walks for Image Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 11, pp. 1768–1783, 11 2006.
[15] Z. Zhao and A. Kumar, “An Accurate Iris Segmentation Framework Under Relaxed Imaging Constraints Using Total Variation Model,” in 2015 IEEE International Conference on Computer Vision (ICCV). IEEE, 12 2015, pp. 3828–3836.
[16] M. Frucci, M. Nappi, D. Riccio, and G. Sanniti di Baja, “WIRE: Watershed based iris recognition,” Pattern Recognition, vol. 52, pp. 148–159, 4 2016.
[17] D. Menotti, G. Chiachia, A. Pinto, W. Robson Schwartz, H. Pedrini, A. Xavier Falcao, and A. Rocha, “Deep Representations for Iris, Face, and Fingerprint Spoofing Detection,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 4, pp. 864–879, 4 2015.
[18] A. Gangwar and A. Joshi, “DeepIrisNet: Deep iris representation with applications in iris recognition and cross-sensor iris recognition,” in Proceedings - International Conference on Image Processing, ICIP, vol. 2016-Augus, 2016, pp. 2301–2305.
[19] F. He, Y. Han, H. Wang, J. Ji, Y. Liu, and Z. Ma, “Deep learning architecture for iris recognition based on optimal Gabor filters and deep belief network,” Journal of Electronic Imaging, vol. 26, no. 2, p. 023005, 3 2017.
[20] Z. Zhao and A. Kumar, “Towards More Accurate Iris Recognition Using Deeply Learned Spatially Corresponding Features,” in Proceedings of the IEEE International Conference on Computer Vision, vol. 2017-Octob. IEEE, 12 2017, pp. 3829–3838.
[21] J. Long, E. Shelhamer, and T. Darrell, “Fully Convolutional Networks for Semantic Segmentation,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 6 2015, pp. 3431–3440.
[22] H. Proenca and J. C. Neves, “Segmentation-less and non-holistic deep-learning frameworks for iris recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019, pp. 0–0.
[23] A. Rattani and R. Derakhshani, “Ocular biometrics in the visible spectrum: A survey,” Image and Vision Computing, vol. 59, pp. 1–16, 3 2017.
[24] A. Sharma, S. Verma, M. Vatsa, and R. Singh, “On cross spectral periocular recognition,” in 2014 IEEE International Conference on Image Processing (ICIP). IEEE, 10 2014, pp. 5007–5011.
[25] J. M. Smereka, B. V. K. V. Kumar, and A. Rodriguez, “Selecting discriminative regions for periocular verification,” in 2016 IEEE International Conference on Identity, Security and Behavior Analysis (ISBA). IEEE, 2 2016, pp. 1–8.
[26] K. P. Hollingsworth, S. S. Darnell, P. E. Miller, D. L. Woodard, K. W. Bowyer, and P. J. Flynn, “Human and machine performance on periocular biometrics under near-infrared light and visible light,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 2, pp. 588–601, 2011.
[27] K. W. Bowyer and M. J. Burge, Handbook of iris recognition. Springer, 2016.
[28] H. Proenca and J. C. Neves, “Deep-PRWIS: Periocular Recognition Without the Iris and Sclera Using Deep Learning Frameworks,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 4, pp. 888–896, 4 2018.
[29] Q. Zhang, H. Li, Z. Sun, and T. Tan, “Deep feature fusion for iris and periocular biometrics on mobile devices,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 11, pp. 2897–2912, 2018.
[30] P. A. Johnson, P. Lopez-Meyer, N. Sazonova, F. Hua, and S. Schuckers, “Quality in face and iris research ensemble (Q-FIRE),” in 2010 Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2010, pp. 1–6.
[31] “Biometrics Ideal Test, CASIA.v4 database.” [Online]. Available: http://www.idealtest.org/dbDetailForUser.do?id=4
[32] D. L. Woodard, S. Pundlik, P. Miller, R. Jillela, and A. Ross, “On the Fusion of Periocular and Iris Biometrics in Non-ideal Imagery,” in 2010 20th International Conference on Pattern Recognition. IEEE, 8 2010, pp. 201–204.
[33] K. B. Raja, R. Raghavendra, M. Stokkenes, and C. Busch, “Multi-modal authentication system for smartphones using face, iris and periocular,” in 2015 International Conference on Biometrics (ICB). IEEE, 5 2015, pp. 143–150.
[34] G. Santos, E. Grancho, M. V. Bernardo, and P. T. Fiadeiro, “Fusing iris and periocular information for cross-sensor recognition,” Pattern Recognition Letters, vol. 57, pp. 52–59, 5 2015.
[35] S. Verma, P. Mittal, M. Vatsa, and R. Singh, “At-a-distance person recognition via combining ocular features,” in 2016 IEEE International Conference on Image Processing (ICIP). IEEE, 9 2016, pp. 3131–3135.
[36] K. Ahuja, R. Islam, F. A. Barbhuiya, and K. Dey, “A preliminary study of CNNs for iris and periocular verification in the visible spectrum,” in 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 12 2016, pp. 181–186.
[37] V. Talreja, M. C. Valenti, and N. M. Nasrabadi, “Multibiometric secure system based on deep learning,” in 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2017, pp. 298–302.
[38] A. Joshi, A. K. Gangwar, and Z. Saquib, “Person recognition based on fusion of iris and periocular biometrics,” in 2012 12th International Conference on Hybrid Intelligent Systems (HIS). IEEE, 12 2012, pp. 57–62.
[39] J. Cambier, “Biometric data interchange formats–part 6: Iris image data,” ISO/IEC, vol. 19794, 2011.
[40] V. Mnih, N. Heess, A. Graves et al., “Recurrent models of visual attention,” in Advances in neural information processing systems, 2014, pp. 2204–2212.
[41] P. J. Phillips, W. T. Scruggs, A. J. O’Toole, P. J. Flynn, K. W. Bowyer, C. L. Schott, and M. Sharpe, “FRVT 2006 and ICE 2006 large-scale experimental results,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 5, pp. 831–846, 2010.
[42] H. M. Cheng and A. Kumar, “Advancing Surface Feature Encoding and Matching for More Accurate 3D Biometric Recognition,” in Proceedings - International Conference on Pattern Recognition, vol. 2018-Augus, 2018, pp. 3501–3506.
[43] R. Girshick, “Fast R-CNN,” Proceedings of the IEEE International Conference on Computer Vision, vol. 2015 Inter, pp. 1440–1448, 2015.
[44] “OpenCV based face and eye detector.” [Online]. Available: http://docs.opencv.org/trunk/d7/d8b/tutorial_py_face_detection.html