Our framework has two main parallel sub-processes: foreground saliency and background saliency. They are computed separately from foreground seeds and background seeds, respectively. The two saliency maps are then fused into one, which is enhanced by a refinement step based on geodesic distance to derive the final saliency map. The main framework of the proposed approach is depicted in Fig. 1.
Fig. 1: Overview of the main framework of our proposed approach
2.1 Foreground saliency
This section details how to find reliable foreground seeds and how to generate a saliency map from the selected seeds.
2.1.1 Foreground seeds estimation
To extract foreground seeds from an image reliably, the surroundedness cue is employed. We adopt the binary-segmentation-based method of BMS [11], which exploits the surroundedness cue thoroughly in an image, to guide our foreground seed localization. We denote the map generated by BMS as the surroundedness map, in which each pixel value indicates the degree of surroundedness of that pixel. To better utilize structural information and suppress small noise, we decompose the image into a set of superpixels with the SLIC algorithm [14]. All operations in the rest of this paper are performed at the superpixel level. The surroundedness value of each superpixel is defined as the average value of all pixels inside it, computed for every superpixel $i = 1, \dots, N$, where $N$ is the number of superpixels.
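The per-superpixel averaging described above can be sketched as follows. This is a minimal illustration, assuming the surroundedness map and the SLIC label map are already available as NumPy arrays; the function name `superpixel_means` and the toy data are hypothetical and not part of the original method description.

```python
import numpy as np

def superpixel_means(value_map, labels):
    """Average a per-pixel map over each superpixel.

    value_map: 2-D float array (e.g. a surroundedness map).
    labels:    2-D int array of the same shape with superpixel ids 0..N-1.
    Returns a length-N vector of per-superpixel mean values.
    """
    flat_labels = labels.ravel()
    n = int(flat_labels.max()) + 1
    # Sum of pixel values and pixel count for every superpixel id.
    sums = np.bincount(flat_labels, weights=value_map.ravel(), minlength=n)
    counts = np.bincount(flat_labels, minlength=n)
    # Guard against ids that never occur in the label map.
    return sums / np.maximum(counts, 1)

# Toy 2x4 image split into two superpixels (left half = 0, right half = 1).
toy_map = np.array([[0.2, 0.4, 0.8, 1.0],
                    [0.2, 0.4, 0.8, 1.0]])
toy_labels = np.array([[0, 0, 1, 1],
                       [0, 0, 1, 1]])
sp_values = superpixel_means(toy_map, toy_labels)
```

Using `np.bincount` keeps the averaging to a single pass over the pixels, which matters little for one image but avoids a Python loop over superpixels.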
Unlike previous works [12, 13] that treat some regions as certain seeds, we provide a more flexible scheme for seed estimation. We define two types of seed elements: strong seeds and weak seeds. Strong seeds have a high probability of belonging to the foreground/background, while weak seeds have a relatively low probability of belonging to the foreground/background. For foreground seeds, the two types of seeds are selected by:
where the two sets are the strong seeds and the weak seeds, respectively, $i$ denotes the $i$th superpixel, and mean(·) is the averaging function. It is clear from formulas (1) and (2) that elements with a higher degree of surroundedness are more likely to be chosen as strong foreground seeds, which is consistent with intuition.
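A two-level seed selection of this kind might look like the sketch below. The exact thresholds of formulas (1) and (2) are not reproduced here: the rule used (strong if the value is at least `k` times the mean, weak if it lies between the mean and that bound) and the parameter `k` are assumptions chosen only to illustrate the strong/weak split.

```python
import numpy as np

def select_foreground_seeds(sp_values, k=2.0):
    """Split superpixels into strong and weak foreground seeds.

    ASSUMPTION: the thresholds k * mean and mean are illustrative only;
    they are not taken from formulas (1)(2) of the paper.
    """
    sp_values = np.asarray(sp_values, dtype=float)
    m = sp_values.mean()
    # Highly surrounded superpixels become strong seeds.
    strong = np.flatnonzero(sp_values >= k * m)
    # Moderately surrounded superpixels become weak seeds.
    weak = np.flatnonzero((sp_values >= m) & (sp_values < k * m))
    return strong, weak

# Five superpixels with mean surroundedness 0.4.
strong, weak = select_foreground_seeds([0.1, 0.1, 0.2, 0.5, 1.1])
```

With these toy values, superpixel 4 exceeds twice the mean and is a strong seed, superpixel 3 lies between the mean and twice the mean and is a weak seed, and the rest are not seeds.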
2.1.2 Foreground saliency map
For saliency calculation based on given seeds, we use the ranking method of [15], which exploits the intrinsic manifold structure of the data for graph labelling. The method ranks the relevance of every element to the given set of seeds. We construct a graph that represents the whole image as in [16], where each node is a superpixel generated by SLIC.
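The ranking procedure is as follows. Given a graph $G = (V, E)$, where $V$ is the set of nodes and the edges $E$ are weighted by an affinity matrix $W = [w_{ij}]_{N \times N}$, the degree matrix is defined by $D = \mathrm{diag}\{d_{11}, \dots, d_{NN}\}$, where $d_{ii} = \sum_j w_{ij}$. The ranking function is given by:
$$f^* = (D - \alpha W)^{-1} y$$
Here $f^*$ is the resulting vector, which stores the ranking result of each element, $y$ is a vector indicating the seed queries, and $\alpha$ is the parameter of the ranking method in [16].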
In this work, the weight between two nodes is defined by:
$$w_{ij} = \exp\left(-\frac{\lVert c_i - c_j \rVert}{\sigma^2}\right)$$
where $c_i$ and $c_j$ denote the mean colors of the superpixels corresponding to the two nodes in the CIE LAB color space, and $\sigma$ is a constant that controls the strength of the weight.
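As a rough sketch of how such edge weights could be computed, the snippet below uses the exponential form from the manifold ranking work [16], $w_{ij} = \exp(-\lVert c_i - c_j \rVert / \sigma^2)$, restricted to connected node pairs. The function name, the explicit `adjacency` argument, and the default `sigma2 = 0.1` are assumptions made for illustration.

```python
import numpy as np

def affinity_matrix(colors, adjacency, sigma2=0.1):
    """Edge weights w_ij = exp(-||c_i - c_j|| / sigma^2) on connected pairs.

    colors:    (N, 3) array of mean CIE LAB colors, one row per superpixel.
    adjacency: (N, N) 0/1 array, 1 where two nodes share an edge.
    sigma2:    ASSUMPTION, an illustrative value for sigma^2.
    """
    colors = np.asarray(colors, dtype=float)
    # Pairwise Euclidean distances between mean colors.
    diff = colors[:, None, :] - colors[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    # Keep weights only where an edge exists; no self-loops.
    W = np.exp(-dist / sigma2) * np.asarray(adjacency, dtype=float)
    np.fill_diagonal(W, 0.0)
    return W

# Three nodes on a chain: 0 - 1 - 2.
colors = np.array([[0.0, 0.0, 0.0],
                   [0.1, 0.0, 0.0],
                   [1.0, 0.0, 0.0]])
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])
W = affinity_matrix(colors, adjacency)
```

Nodes 0 and 1 have similar colors and receive a much larger weight than nodes 1 and 2, while the non-adjacent pair (0, 2) has weight zero.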
Different from [16], which defines $y_i = 1$ if $i$ is a query and $y_i = 0$ otherwise, we additionally define $y_i$ as the strength of the query. That is, $y_i$ takes a higher value if $i$ is a strong query, a lower value if $i$ is a weak query, and $y_i = 0$ otherwise.
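The ranking step with graded query strengths can be sketched as below, using the closed form $f^* = (D - \alpha W)^{-1} y$ of the manifold ranking method [16]. The values `y_strong = 1.0`, `y_weak = 0.5`, and `alpha = 0.99` are assumptions for illustration; the text does not fix them here.

```python
import numpy as np

def manifold_rank(W, strong, weak, alpha=0.99, y_strong=1.0, y_weak=0.5):
    """Rank all nodes by relevance to strong and weak seed queries.

    ASSUMPTION: y_strong, y_weak and alpha are illustrative values.
    Every node needs at least one edge, otherwise D - alpha*W is singular.
    """
    n = W.shape[0]
    y = np.zeros(n)
    y[np.asarray(weak, dtype=int)] = y_weak
    y[np.asarray(strong, dtype=int)] = y_strong
    D = np.diag(W.sum(axis=1))
    # Solve (D - alpha*W) f = y instead of forming the inverse explicitly.
    return np.linalg.solve(D - alpha * W, y)

# Chain graph 0 - 1 - 2 - 3 with unit weights.
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
f = manifold_rank(W, strong=[0], weak=[1])
```

On this chain, the scores decrease monotonically with distance from the seeds, which is the behavior the ranking is meant to capture. Solving the linear system is generally preferable to computing the matrix inverse when only one query vector is ranked.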
For foreground-seed-based ranking, all elements are ranked by formula (4) given the seed sets in (1) and (2). The foreground saliency process is illustrated in Fig. 2 (first row).
Fig. 2: Illustration of foreground and background saliency. (a) original image; (b) superpixel segmentation; (c) top: foreground seeds, bottom: background seeds (blue: mask of strong seeds, green: mask of weak seeds); (d) top: foreground saliency map, bottom: background saliency map.
2.2 Background saliency
Complementary to foreground saliency, background saliency aims to extract regions that differ from the background in feature distribution. We first select a set of background seeds and then calculate the saliency of every image element according to its relevance to these seeds. This section elaborates on seed estimation and background saliency calculation.
Fig. 3: Comparison of different seed estimation schemes. (a) original image; (b) superpixel segmentation; (c) our scheme for background seed estimation; (d) saliency map corresponding to (c); (e) common seed estimation scheme; (f) saliency map corresponding to (e).
2.2.1 Background seeds estimation
Unlike most previous works [7] that use the elements on the image boundary as background seeds, we divide the elements on the image border into two categories (strong seeds and weak seeds), as in the foreground case. We denote the average feature vector of all border elements as $c$. The Euclidean distance between the feature vector of each border element and this average feature vector is computed, and the average of these distances is then taken. The background seeds are estimated by:
where the two sets denote the strong background seeds and the weak background seeds, respectively.
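The border-based split can be illustrated with the sketch below. The thresholding rule is an assumption: border elements whose distance to the mean border feature is at most `k` times the average distance are treated as strong seeds and the remaining border elements as weak seeds. The actual formulas of the paper are not reproduced, and `k` is a hypothetical parameter.

```python
import numpy as np

def select_background_seeds(border_colors, border_ids, k=1.0):
    """Split border superpixels into strong and weak background seeds.

    ASSUMPTION: the rule d <= k * mean(d) is illustrative only.
    border_colors: (B, 3) feature vectors of the border superpixels.
    border_ids:    length-B array of their superpixel indices.
    """
    colors = np.asarray(border_colors, dtype=float)
    ids = np.asarray(border_ids)
    mean_color = colors.mean(axis=0)
    # Distance of every border element to the mean border feature.
    d = np.linalg.norm(colors - mean_color, axis=1)
    d_mean = d.mean()
    strong = ids[d <= k * d_mean]   # typical border elements
    weak = ids[d > k * d_mean]      # atypical border elements
    return strong, weak

# Four border superpixels; the last one differs strongly from the others.
border_colors = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [4, 0, 0]]
bg_strong, bg_weak = select_background_seeds(border_colors, [10, 11, 12, 13])
```

In this toy case the three similar border elements are kept as strong seeds, while the outlier, which could belong to an object touching the border, is only a weak seed.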
2.2.2 Background saliency map
Similar to the foreground case, the indication vector $y$ for background seeds takes a higher value if $i$ belongs to the strong background seeds, a lower value if $i$ belongs to the weak background seeds, and 0 otherwise. The relevance of each element to the background seeds is computed by formula (3). Each element of the resulting vector indicates the relevance of a node to the background queries, and its complement is the saliency measure. The saliency map using these background seeds can be written as:
The background saliency process is shown in Fig. 2 (second row), and a comparison between our seed estimation scheme and the common scheme is illustrated in Fig. 3. Note that our scheme is more robust in extracting salient regions from an image.
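Taking the complement of the background relevance can be sketched as follows. The min-max normalization to [0, 1] before taking the complement is an assumption; the text only states that the complement of the relevance is the saliency measure.

```python
import numpy as np

def background_saliency(f):
    """Turn relevance-to-background scores into saliency values.

    ASSUMPTION: scores are min-max normalized to [0, 1] first.
    """
    f = np.asarray(f, dtype=float)
    span = f.max() - f.min()
    # A constant score vector carries no ranking information.
    f_norm = (f - f.min()) / span if span > 0 else np.zeros_like(f)
    # High relevance to the background means low saliency.
    return 1.0 - f_norm

bg_sal = background_saliency([4.0, 3.0, 1.0, 0.0])
```

The element most relevant to the background seeds gets saliency 0 and the least relevant one gets saliency 1.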
2.3 Geodesic distance refinement
The foreground and background saliency maps are combined as follows. In each of the two maps separately, the elements whose value is larger than the average value of that map are selected as salient elements. The selected elements from both maps are merged into one set, and ranking is conducted again using these elements as seeds to obtain a combined map.
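A minimal sketch of this combination step is given below. It assumes that every selected element is used as a query of equal strength and that the re-ranking uses the closed form $f^* = (D - \alpha W)^{-1} y$ from [16]; both choices, the value of `alpha`, and the final min-max normalization are assumptions made for illustration.

```python
import numpy as np

def combine_maps(fg_map, bg_map, W, alpha=0.99):
    """Fuse two superpixel saliency vectors by re-ranking their salient elements."""
    fg = np.asarray(fg_map, dtype=float)
    bg = np.asarray(bg_map, dtype=float)
    # In each map separately, keep the elements above that map's mean.
    selected = (fg > fg.mean()) | (bg > bg.mean())
    seeds = np.flatnonzero(selected)
    # ASSUMPTION: every selected element is a query of equal strength.
    y = selected.astype(float)
    D = np.diag(W.sum(axis=1))
    f = np.linalg.solve(D - alpha * W, y)
    # ASSUMPTION: rescale the ranking scores to [0, 1].
    span = f.max() - f.min()
    combined = (f - f.min()) / span if span > 0 else np.zeros_like(f)
    return seeds, combined

# Chain graph 0 - 1 - 2 - 3 with unit weights.
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
seeds, combined = combine_maps([0.9, 0.2, 0.1, 0.1], [0.8, 0.7, 0.1, 0.0], W)
```

Here node 0 is selected from both maps and node 1 only from the background-based map, so the merged seed set is {0, 1}.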
The final step of our proposed approach is refinement with geodesic distance [17]. The motivation is the observation that determining the saliency of an element as a weighted sum of the saliency of its surrounding elements, with weights given by Euclidean distance, has limited ability to highlight the salient object uniformly. We therefore seek a solution that enhances the regions of the salient object more uniformly. From recent work [18], we found that the weights may be sensitive to geodesic distance.
Given the posterior probability of the $j$th superpixel, the saliency value of the $q$th superpixel is refined by geodesic distance as follows:
where $N$ is the total number of superpixels and the weight is based on the geodesic distance [17] between the $q$th and $j$th superpixels. Based on the graph model constructed in Section 2.1.2, the geodesic distance between two superpixels can be defined as the accumulated edge weights along their shortest path on the graph:
In this way we can obtain the geodesic distance between any two superpixels in the image. The weight is then defined as
where the deviation is computed over all geodesic distance values. The salient objects are highlighted uniformly after this step, as will be seen in the experiment section.
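The refinement can be sketched as below. The Gaussian weight $\exp(-d_{geo}^2 / (2\sigma^2))$ follows the form used in [17] and is an assumption here, as are the use of the standard deviation of all pairwise distances for $\sigma$, the plain weighted sum, and the Floyd-Warshall shortest-path routine. The graph is assumed to be connected.

```python
import numpy as np

def geodesic_distances(edge_cost):
    """All-pairs shortest-path distances (Floyd-Warshall).

    edge_cost[i, j] is the cost of edge (i, j), or np.inf if not adjacent.
    """
    d = np.array(edge_cost, dtype=float)
    np.fill_diagonal(d, 0.0)
    for k in range(d.shape[0]):
        # Allow paths that pass through node k.
        d = np.minimum(d, d[:, k:k + 1] + d[k:k + 1, :])
    return d

def geodesic_refine(saliency, edge_cost):
    """Refine saliency as a geodesic-weighted sum over all superpixels."""
    s = np.asarray(saliency, dtype=float)
    d = geodesic_distances(edge_cost)
    sigma = d.std()  # ASSUMPTION: deviation of all geodesic distances
    if sigma == 0:
        return s.copy()
    weights = np.exp(-d ** 2 / (2.0 * sigma ** 2))
    return weights @ s

# Chain 0 - 1 - 2 with edge costs 1 and 2; only node 0 is salient.
inf = np.inf
cost = np.array([[0.0, 1.0, inf],
                 [1.0, 0.0, 2.0],
                 [inf, 2.0, 0.0]])
geo = geodesic_distances(cost)
refined = geodesic_refine([1.0, 0.0, 0.0], cost)
```

In the toy chain, the saliency of node 0 spreads to nodes 1 and 2 with a strength that falls off with their geodesic distance (1 and 3) rather than with their spatial distance.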
This section presents the evaluation of our proposed method.
Datasets. We test our proposed model on the ASD dataset [5] and the DUT-OMRON dataset [16]. The ASD dataset provides 1000 images with annotated object-contour-based ground truth, while the DUT-OMRON dataset provides 5168 more challenging images with pixel-level annotation.
Fig. 4: Visual comparison of saliency models
Fig. 5: (a) PR curve on ASD dataset; (b) precision, recall and F-measure on ASD dataset; (c) PR curve on DUT-OMRON dataset; (d) precision, recall and F-measure on DUT-OMRON dataset.
Table 1: Quantitative comparison of MAE and AUC on ASD dataset
Table 2: Quantitative comparison of MAE and AUC on DUT-OMRON dataset
Evaluation metrics. For accurate evaluation, we adopt four metrics: precision-recall (PR) curve, F-measure, mean absolute error (MAE), and AUC score. Fig. 5 shows the PR curves, together with the precision, recall, and F-measure values obtained with an adaptive threshold defined as twice the mean saliency of the image. Tables 1 and 2 show the MAE and AUC scores on the two datasets.
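Two of these metrics can be sketched as follows: MAE, and precision, recall, and F-measure at the adaptive threshold of twice the mean saliency. The weighting `beta2 = 0.3` in the F-measure is a common convention in the salient object detection literature and is an assumption here, since the text does not state it.

```python
import numpy as np

def mae(saliency, gt):
    """Mean absolute error between a saliency map in [0, 1] and a binary mask."""
    return float(np.mean(np.abs(saliency - gt)))

def adaptive_prf(saliency, gt, beta2=0.3):
    """Precision, recall and F-measure at threshold 2 * mean saliency.

    ASSUMPTION: beta2 = 0.3 is a conventional choice, not given in the text.
    """
    pred = saliency >= 2.0 * saliency.mean()
    mask = gt > 0.5
    tp = np.logical_and(pred, mask).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(mask.sum(), 1)
    denom = beta2 * precision + recall
    f_measure = (1.0 + beta2) * precision * recall / denom if denom > 0 else 0.0
    return float(precision), float(recall), float(f_measure)

sal = np.array([[0.9, 0.8, 0.1, 0.0],
                [0.7, 0.1, 0.0, 0.0]])
gt = np.array([[1.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 0.0, 0.0]])
err = mae(sal, gt)
precision, recall, f_measure = adaptive_prf(sal, gt)
```

For this toy map the adaptive threshold is 0.65, so three pixels are predicted salient, two of which are correct.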
Comparison. We compare our proposed method with 11 state-of-the-art models: CAS [19], wCtr [17], FT [5], DRFI [7], GBVS [20], ITTI [21], MILPS [22], MR [13], PCA [9], SBD [23], and BMS [11]. Our method highlights salient regions more uniformly and achieves better results, especially in PR curves and MAE scores. In general, our method outperforms the other competitive approaches.
In this paper, we present a novel and efficient framework for salient object detection via a complementary combination of foreground and background priors. The key contributions of our method are: (1) the surroundedness cue is utilized to exploit the foreground prior, which proves to be highly effective when combined with the background prior; (2) a robust seed estimation scheme is provided, in which seeds are selected together with an estimate of their confidence of belonging to the background/foreground. Extensive experimental results demonstrate the superiority of our proposed method over other outstanding methods. Our method also has an efficient implementation, which is useful for real-time applications.
[1] Shi Min Hu, Tao Chen, Kun Xu, Ming Ming Cheng, and Ralph R. Martin, “Internet visual media processing: a survey with graphics and vision applications,” Visual Computer, vol. 29, no. 5, pp. 393–405, 2013.
[2] Chenlei Guo and Liming Zhang, “A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression,” vol. 19, pp. 185–198, Feb. 2010.
[3] Ming Ming Cheng, Fang Lue Zhang, Niloy J Mitra, Xiaolei Huang, and Shi Min Hu, “Repfinder: finding approximately repeated scene elements for image editing,” ACM Transactions on Graphics, vol. 29, no. 4, pp. 1–8, 2010.
[4] Zhixiang Ren, Shenghua Gao, Liang Tien Chia, and Wai Hung Tsang, “Region-based saliency detection and its application in object recognition,” IEEE Transactions on Circuits & Systems for Video Technology, vol. 24, no. 5, pp. 769–779, 2014.
[5] Radhakrishna Achanta, Sheila Hemami, Francisco Estrada, and Sabine Susstrunk, “Frequency-tuned salient region detection,” in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, 2009, pp. 1597–1604.
[6] Zheshen Wang and Baoxin Li, “A two-stage approach to saliency detection in images,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, pp. 965–968.
[7] Jingdong Wang, Huaizu Jiang, Zejian Yuan, Ming Ming Cheng, Xiaowei Hu, and Nanning Zheng, “Salient object detection: A discriminative regional feature integration approach,” vol. 123, no. 2, pp. 2083–2090, 2014.
[8] M. M. Cheng, N. J. Mitra, X. Huang, P. H. Torr, and S. M. Hu, “Global contrast based salient region detection,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 37, no. 3, pp. 569–582, 2015.
[9] Ran Margolin, Ayellet Tal, and Lihi Zelnik-Manor, “What makes a patch distinct?,” vol. 9, no. 4, pp. 1139–1146, 2013.
[10] V. Mazza, M. Turatto, and C. Umiltà, “Foreground-background segmentation and attention: a change blindness study,” Psychological Research, vol. 69, no. 3, pp. 201–210, 2005.
[11] Jianming Zhang and Stan Sclaroff, “Exploiting surroundedness for saliency detection: A boolean map approach,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 5, pp. 889, 2016.
[12] X. Li, H. Lu, L. Zhang, X. Ruan, and M. H. Yang, “Saliency detection via dense and sparse reconstruction,” in 2013 IEEE International Conference on Computer Vision, Dec 2013, pp. 2976–2983.
[13] C. Yang, L. Zhang, H. Lu, X. Ruan, and M. H. Yang, “Saliency detection via graph-based manifold ranking,” in 2013 IEEE Conference on Computer Vision and Pattern Recognition, June 2013, pp. 3166–3173.
[14] Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Susstrunk, “SLIC superpixels compared to state-of-the-art superpixel methods,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 34, no. 11, pp. 2274, 2012.
[15] Dengyong Zhou, Jason Weston, Arthur Gretton, Olivier Bousquet, and Bernhard Schölkopf, “Ranking on data manifolds,” Advances in Neural Information Processing Systems, pp. 169–176, 2003.
[16] Chuan Yang, Lihe Zhang, Huchuan Lu, Ruan Xiang, and Ming Hsuan Yang, “Saliency detection via graph-based manifold ranking,” in IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 3166–3173.
[17] Wangjiang Zhu, Shuang Liang, Yichen Wei, and Jian Sun, “Saliency optimization from robust background detection,” in Computer Vision and Pattern Recognition, 2014, pp. 2814–2821.
[18] Keren Fu, Chen Gong, Irene Y. H. Gu, and Jie Yang, “Geodesic saliency propagation for image salient region detection,” in IEEE International Conference on Image Processing, 2014, pp. 3278–3282.
[19] Stas Goferman, Lihi Zelnik-Manor, and Ayellet Tal, “Context-aware saliency detection,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 34, no. 10, pp. 1915, 2012.
[20] Bernhard Schölkopf, John Platt, and Thomas Hofmann, “Graph-based visual saliency,” Advances in Neural Information Processing Systems, vol. 19, pp. 545–552, 2007.
[21] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254–1259, Nov 1998.
[22] Fang Huang, Jinqing Qi, Huchuan Lu, Ruan Xiang, and Ruan Xiang, “Salient object detection via multiple instance learning,” IEEE Transactions on Image Processing A Publication of the IEEE Signal Processing Society, vol. 26, no. 4, pp. 1911–1922, 2017.
[23] Tong Zhao, Lin Li, Xinghao Ding, Yue Huang, and Delu Zeng, “Saliency detection with spaces of background-based distribution,” IEEE Signal Processing Letters, vol. 23, no. 5, pp. 683–687, 2016.