Adaptive Control of Embedding Strength in Image Watermarking using Neural Networks

2020Β·Arxiv

Abstract

Abstract

Digital image watermarking has been widely used in different applications such as copyright protection of digital media, such as audio, image, and video files. Two opposing criteria of robustness and transparency are the goals of watermarking methods. In this paper, we propose a framework for determining the appropriate embedding strength factor. The framework can use most DWT and DCT based blind watermarking approaches. We use Mask R-CNN on the COCO dataset to find a good strength factor for each sub-block. Experiments show that this method is robust against different attacks and has good transparency. Keywordsβ watermark; strength factor; mask R-CNN; discrete cosine transform; discrete wavelet transform.

I. INTRODUCTION

Digital image watermarking technology has been receiving more attention than before due to the rapid development of multimedia. This technology is a method for embedding a logo into the host image without causing a considerable visual change in the image. The logo could be copyright information, side information or any other information required to be embedded in the host image.

A watermarking method can be blind or non-blind. Nonblind methods, such as that of [1], need side information to extract watermark data but blind methods do not require such information. The embedding method can be in the spatial domain or the transform domain. The embedded watermark data should be robust enough against attacks and also transparent sufficient to meet the standard measures. Different methods have proposed during the past few years, for watermarking with different capabilities and restrictions. In [2] as an example, proper blocks are found for embedding in the host image, and a discrete cosine transform (DCT) is applied. Then coefficients that have fewer effects on the imperceptibility are selected, and information is embedded in those coefficients using a threshold. Margolis et al. [3] proposed a method that embeds in the transform domain. They use student-T distribution to find transform coefficients. DCT and discrete wavelet transform (DWT) are transform domains they use in their paper separately. Some papers cascade two transforms [4], [5] and [6]. The work done in [4] at first, applies DWT on image and then DCT on LL of the first level in the transformed image. Then it chooses the lower frequency coefficients of the final transformed image for embedding watermark string. In [5], after using ROI detection for selecting best host blocks, lowest ones in an ROI ranking, DCT is applied on sub-blocks of LH, HL, and HH in wavelet transform of them. Here like [2], the authors choose a couple of coefficients that have a lower effect on human eyes and also adequate robustness at the same time. Embedding phase in this paper finishes by conditional swapping of the coefficients couple considering each bit of watermark stream. The distance of couple value will be increased to enhance the robustness of their embedding method. We, in one of our previous works [6], used a cascade of DCT after DWT on a color space transforming. The embedding phase in this paper is different for each channel of YUV color space. As Y channel stores illuminance date, we embed in half of its capacity although the full capacity of U and V channels are used for embedding bits of watermark logo. In [7], after applying DWT on blocks of the image, an adaptive strength factor is used for embedding. The strength factor in this paper is computed by a function of wavelet energy and saliency of the image. Some research works used optimization algorithms to calculate better strength-factor, similar to [8]. The authors use the Cuckoo search algorithm in the wavelet domain to find the best strength factor. These mentioned papers used almost naΓ―ve methods without any serious use of artificial intelligence.

Around the year 2012, scientists started to use neural networks (NN) and similarly, researchers use these networks in watermarking methods. But most of the papers use neural networks for some parts of their methods. For example, an NN in [9] is used for finding the best strength factor. They used their own designed NN After applying a four-level wavelet transform on the host image. This NN results in predicting a sequence of coefficients that can choose the best strength factor for embedding. Zheng et al. [10] applied a DWT and then SVD on the host image and, after that, used a designed network. This network is trained to embed the watermark data in the transformed image. In [9] and [10], the authors both developed and trained a network for watermarking. As there is a lot of pre-trained networks, it seems reasonable to choose one of them and use it for embedding. Mask R-CNN [11] is one of the popular networks which is used for semantic segmentation. It was pre-trained on the COCO dataset [12] and can segment 48 classes in the target image. Mask R-CNN is an advanced version of Faster R-CNN [13]. Faster R-CNN is pre-trained on the PASCAL dataset to detect objects based on object locations hypothesis.

In this paper, we propose a blind method based on NN for calculating an adaptive strength factor. Our embedding phase consists of blocking, DWT, sub-blocking, and then DCT. The extraction phase is the inverse of the embedding phase. We also applied different attacks to embedded images to show the robustness of our proposed method.

This paper is organized as follows: The proposed method is given in Section II, the experimental results are shown in Section III, and the paper concludes in the last Section.

II. PROPOSED METHOD

Our proposed watermarking method contains four main stages. In this section, we first explain a NN that segments the important parts of the image. Then, the way of computing strength factor and embedding phase is, and finally, extracting the logo image from the watermarked image is explained. Figure 1 shows our proposed scheme for adaptive image watermarking.

A. Mask R-CNN Network

The new generation of Faster R-CNN is Mask R-CNN and is appropriate for semantic segmentation. The whole structure of Mask R-CNN is shown in Figure 2-a. The first stage of RCNN finds candidates to be accepted as objects and points them by rectangular bounding boxes. This stage, called Region Proposal Network (RPN), is a neural network that uses a spatial sliding window to map images to a lower dimension vector. This vector is the input of two layers: the box-regression layer (reg layer) and the box-classification layer (cls layer). As shown in figure 2-b, if we have K regions, the cls layer and reg layer produce 2ΓK scores and 4ΓK coordinates, respectively [13]. The output of the second stage in Mask R-CNN is a mask that predicts the class and box offset for each ROI [11]. We used Mask R-CNN for our proposed method to get masks for each class in the host image.

B. Compute Strength Factor

One of the outputs of Mask R-CNN contains masks for any given class. As a study, we defined a table that consists of two classes: person and car. We chose images inclusive of both person and car and extract their masks by Mask R-CNN. As Figure 3 shows, each mask multiplies by its coefficient that could be arbitrarily set considering any particular condition. The following equation produces as a final embedding map,

where represent for respective classes Car and Person and are coefficients and maps for each class. constant coefficient to scale for embedding. The number of classes could be more significant in the actual implementation of our method to cover any number of desired objects and ROIs. Mean of values corresponding to sub-block is computed as strength factor.

C. Embedding Phase

The start stage of our embedding method is similar to our previous work [5] and applies a saliency detection on host grayscale image to locate blocks with minimum saliency as candidates of embedding. Figure 1 shows the overall operation of embedding information in the host image of our previous work [5], resize all images to 512Γ512. Then compute saliency

for grayscale host images to find ROI areas by semantic segmentation from Mask R-CNN.

ROI areas have a lower rank in host image blocks and have a lower probability of embedding. Thus, after partitioning the host image to 128Γ128 blocks, a subset of them with lower ROI pixels are embedding candidates. The first five blocks of size 128Γ128 are selected, and two levels of DWT are applied to them, and then LH, HL, and HH parts will be partitioned to 8Γ8 sub-blocks. Then, we applied DCT on each sub-block and embed one bit in each 8Γ8 sub-block. Redundancy could be considered if the candidate sub-blocks were more than watermark bits.

The embedding process is done by the state of a couple of coefficients in the DCT domain. If the watermark bit is β1β, DCT(6,7) should be higher than DCT(7,6) else, we swap them. For β0β, the rule is inverse as equation 2.

Fig 1. Block diagram of the proposed method.

After that, we use a strength-factor that is calculated for each block. It enlarges the distance between the two coefficients to ensure maintaining the embedded bits in case of attacks to the host image.

We apply IDCT on each sub-block and transfer sub-blocks to

128Γ128 blocks after embedding watermark information.

Then we apply IDWT on each of five blocks and transfer them to create the watermarked image.

D. Extracting Phase

The extraction of this method is blind and does not need the host image. When the watermarked image is delivered to the receiver, it will be partitioned to 128Γ128 blocks, and five selected blocks can be detected by indexes as side information. After that, DWT is applied to each of them and LH, HL, and HH parts are divided into 8Γ8 sub-blocks. Then, we use DCT on each, and the extraction method is based on Equation 3.

We use a voting algorithm when redundant watermark bits are embedded in the same image to extract information that leverages the robustness of our method in the presence of attacks to the watermark image.

III. EXPERIMENTAL RESULT

We test our method on images of the COCO dataset that has a person and car because our table concludes these classes. The COCO dataset has about 82K images that we select 93 images for evaluating our method. Input images have different sizes so at first, we resize all images to 512Γ512. The watermark bits by the size 4Γ4, one of the input images and itβs watermarked

image are shown in Figure 3. Our method is implemented in MATLAB R2015a in laptop Corei5-4200U CPU and 6GB of RAM and we use pre-train Mask R-CNN in Jupyter Notebook. The watermark information is a 4Γ4 logo that can be random that half of them should be 0, and the other half should be 1 like Figure 4-b.

A. Evaluation Parameters

There are parameters that we can evaluate our methods by them. These parameters can show the goodness of our method and get the ability to compare our approach to other similar methods. One of them is the normalized correlation (NC)

where are sizes of the image, watermark information and extracted watermark respectively. The other parameter is Structural Similarity (SSIM) where

That Β΅ and Ο are mean and variance and c1 and c2 are constants. The Bit Error Rate (BER) is defined as below

The last parameter that we use in this paper is Peak Signal to Noise Ratio (PSNR), which is the influence of imperceptibility of an image.

Fig 2. Structure of network, a) Mask R-CNN [11], b) Region Proposal Network (RPN) [13]

Fig 3. Embedding map formation.

Fig 4. a) Input image, b) watermark bits, c) watermarked image

B. Results

We test our method for different parameters and compare them with [5]. The maximum PSNR and SSIM that our approach can achieve are 49.1052 and 0.9985 respectively. In comparison with [5], we have higher results. For example, in equal SSIM, we get 43.6043 for PSNR, but Jamaliβs PSNR is 42.4306. Results in Table 1 show our method is more robust against different attacks than [5]. Attacks that we test our approach by them are Gaussian noise (GN), JPEG compression (JC), 3Γ3 median filter (MF), histogram equalization (HE), and salt and pepper noise (S&P). We use three types of GN and JC that show the variance and ratio of compression respectively. Figures 5 show the effect of kΞ± on PSNR and SSIM that values of cp and cc are 1. By increasing kΞ± from 0.01 to 1, the value of PSNR and SSIM will be decreased. One of the results of our method represents in Figure 4.

IV. CONCLUSION

In this paper, we proposed a method to use the neural network to find the strength factor of ROI based blind image watermarking. Each watermark bit embedded 15 times in the COCO dataset for redundancy improvement. Our method is robust against different attacks. In comparison to similar work, our approach has better results and in equal NC, has higher PSNR and SSIM.

REFERENCES

[1] A. Roy, A. K. Maiti, and K. Ghosh, βAn HVS Inspired Robust Non-blind Watermarking Scheme in YCbCr Color Space,β Int. J. Image Graph., vol. 18, no. 03, p. 1850015, 2018.

[2] F. Ernawan and M. N. Kabir, βA robust image watermarking technique with an optimal DCT-psychovisual threshold,β IEEE Access, vol. 6, pp. 20464β20480, 2018.

[3] A. Mairgiotis, L. P. Kondi, and Y. Yang, βDct/dwt blind multiplicative watermarking through student-t distribution,β in Image Processing (ICIP), 2017 IEEE International Conference on, 2017, pp. 520β524.

[4] C. Dong, J. Li, and Y. Chen, βA DWT-DCT Based Robust Multiple Watermarks for Medical Image,β in Photonics and Optoelectronics (SOPO), 2012 Symposium on, 2012, pp. 1β4.

[5] M. Jamali, S. Samavi, N. Karimi, S. M. R. Soroushmehr, K. Ward, and K. Najarian, βRobust watermarking in non-ROI of medical images based on DCT-DWT,β in Engineering in Medicine and Biology Society (EMBC), 2016 IEEE 38th Annual International Conference of the, 2016, pp. 1200β1203.

[6] M. Jamali, M. Bagheri, N. Karimi, and S. Samavi, βRobustness and Imperceptibility Enhancement in Watermarked Images by Color Transformation,β arXiv Prepr. arXiv1911.00772, 2019.

[7] H. Liu, J. Liu, and M. Zhao, βVisual Saliency Model-Based Image Watermarking with Laplacian Distribution,β Information, vol. 9, no. 9, p. 239, 2018.

[8] M. Ali and C. W. Ahn, βAn optimal image watermarking approach through Cuckoo search algorithm in wavelet domain,β Int. J. Syst. Assur. Eng. Manag., vol. 9, no. 3, pp. 602β611, 2018.

[9] A. Rajpal, A. Mishra, and R. Bala, βMultiple scaling factors based semiblind watermarking of grayscale images using OS-ELM neural network,β in Signal Processing, Communications and Computing (ICSPCC), 2016 IEEE International Conference on, 2016, pp. 1β6.

[10] W. Zheng et al., βRobust and high capacity watermarking for image based on DWT-SVD and CNN,β in 2018 13th IEEE Conference on Industrial Electronics and Applications (ICIEA), 2018, pp. 1233β1237.

[11] K. He, G. Gkioxari, P. DollΓ‘r, and R. Girshick, βMask r-cnn,β in Computer Vision (ICCV), 2017 IEEE International Conference on, 2017, pp. 2980β2988.

[12] βhttp:// cocodataset.org.β.

[13] S. Ren, K. He, R. Girshick, and J. Sun, βFaster r-cnn: Towards real-time object detection with region proposal networks,β in Advances in neural information processing systems, 2015, pp. 91β99.

Fig 5. a) PSNR (dB), b) SSIM for different values of k.

Table 1. Results for different attacks