DBP thanks the Royal Academy of Engineering for support. MJP acknowledges financial support from UK Quantum Technology Hub in Quantum Enhanced Imaging (Grant No. EP/M01326X/1), the Wolfson foundation and the Royal Society. M-JS acknowledges support from the National Natural Foundation of China (Grant No. 61307021) and China Scholarship Council (Grant No. 201306025016). DBP thanks Joy Hollamby and Kevin Mitchell for providing photos to image.
Correspondence and requests for materials should be sent to DBP (email: david.phillips@glasgow.ac.uk) or M-JS (email: mingle.sun@buaa.edu.cn).
[1] P. Sen, B. Chen, G. Garg, S. R. Marschner, M. Horowitz, M. Levoy, and H. Lensch, in ACM Transactions on Graphics (TOG) (ACM, 2005), vol. 24, pp. 745–755.
[2] D. Takhar, J. N. Laska, M. B. Wakin, M. F. Duarte, D. Baron, S. Sarvotham, K. F. Kelly, and R. G. Baraniuk, in Computational Imaging IV at SPIE Electronic Imaging (SPIE, 2006), pp. 43–52.
[3] K. Goda, K. Tsia, and B. Jalali, Nature 458, 1145 (2009).
[4] J. Hunt, T. Driscoll, A. Mrozack, G. Lipworth, M. Reynolds, D. Brady, and D. R. Smith, Science 339, 310 (2013).
[5] W. L. Chan, K. Charan, D. Takhar, K. F. Kelly, R. G. Baraniuk, and D. M. Mittleman, Applied Physics Letters 93, 121105 (2008).
[6] C. M. Watts, D. Shrekenhamer, J. Montoya, G. Lipworth, J. Hunt, T. Sleasman, S. Krishna, D. R. Smith, and W. J. Padilla, Nature Photonics 8, 605 (2014).
[7] R. I. Stantchev, B. Sun, S. M. Hornett, P. A. Hobson, G. M. Gibson, M. J. Padgett, and E. Hendry, Science Advances 2, e1600190 (2016).
[8] T. J. Kane, C. E. Byvik, W. Kozlovsky, and R. L. Byer, Optics letters 12, 239 (1987).
[9] G. A. Howland, P. B. Dixon, and J. C. Howell, Applied optics 50, 5917 (2011).
[10] G. A. Howland, D. J. Lum, M. R. Ware, and J. C. Howell, Optics express 21, 23822 (2013).
[11] M.-J. Sun, M. P. Edgar, G. M. Gibson, B. Sun, N. Rad- well, R. Lamb, and M. J. Padgett, Nature Communications 7 (2016).
[12] S. Popoff, A. Goetschy, S. Liew, A. D. Stone, and H. Cao, Physical review letters 112, 133903 (2014).
[13] T. ˇCiˇzm´ar and K. Dholakia, Nature communications 3, 1027 (2012).
[14] R. N. Mahalati, R. Y. Gu, and J. M. Kahn, Optics express 21, 1656 (2013).
[15] M. Pl¨oschner, T. Tyc, and T. ˇCiˇzm´ar, Nature Photonics 9, 529 (2015).
[16] M. B. Wakin, J. N. Laska, M. F. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. F. Kelly, and R. G. Baraniuk, in 2006 International Conference on Image Processing (IEEE, 2006), pp. 1273–1276.
[17] E. Candes and J. Romberg, Inverse problems 23, 969 (2007).
[18] R. G. Baraniuk, IEEE signal processing magazine 24 (2007).
[19] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, IEEE Signal Processing Magazine 25, 83 (2008).
[20] O. Katz, Y. Bromberg, and Y. Silberberg, Applied Physics Letters 95, 131110 (2009).
[21] M. Amann and M. Bayer, Scientific reports 3 (2013).
[22] W.-K. Yu, M.-F. Li, X.-R. Yao, X.-F. Liu, L.-A. Wu, and G.-J. Zhai, Optics express 22, 7133 (2014).
[23] F. Soldevila, E. Salvador-Balaguer, P. Clemente, E. Tajahuerce, and J. Lancis, Scientific reports 5 (2015).
[24] C. L. Chen, A. Mahjoubfar, and B. Jalali, PloS one 10, e0125106 (2015).
[25] F. Crescitelli, in Photochemistry of vision (Springer, 1972), pp. 245–363.
[26] M. S. Banks, W. W. Sprague, J. Schmoll, J. A. Parnell, and G. D. Love, Science advances 1, e1500391 (2015).
[27] B. O?BRIEN, JOSA 41, 882 (1951).
[28] W. Becker and A. Fuchs, Vision research 9, 1247 (1969).
[29] B. Fischer and E. Ramsperger, Experimental Brain Re- search 57, 191 (1984).
[30] M. C. Morrone and D. Burr, Proceedings of the Royal Society of London B: Biological Sciences 235, 221 (1988).
[31] J. Duncan, R. Ward, K. Shapiro, et al., Nature 369, 313 (1994).
[32] H. Strasburger, I. Rentschler, and M. J¨uttner, Journal of vision 11, 13 (2011).
[33] M. Bolduc and M. D. Levine, Computer vision and image understanding 69, 170 (1998).
[34] W. S. Geisler and J. S. Perry, in Electronic Imaging (International Society for Optics and Photonics, 1998), pp. 294–305.
[35] O. Stasse, Y. Kuniyoshi, and G. Cheng, in International Workshop on Biologically Motivated Computer Vision (Springer, 2000), pp. 150–159.
[36] P. Kortum and W. S. Geisler, in Electronic Imaging: Science & Technology (International Society for Optics and
Photonics, 1996), pp. 350–360.
[37] G. Carles, S. Chen, N. Bustin, J. Downing, D. McCall, A. Wood, and A. R. Harvey, Optics letters 41, 1869 (2016).
[38] H. Yamamoto, Y. Yeshurun, and M. D. Levine, Com- puter Vision and Image Understanding 63, 50 (1996).
[39] J. Tanida, T. Kumagai, K. Yamada, S. Miyatake, K. Ishida, T. Morimoto, N. Kondou, D. Miyazaki, and Y. Ichioka, Applied optics 40, 1806 (2001).
[40] S. C. Park, M. K. Park, and M. G. Kang, IEEE signal processing magazine 20, 21 (2003).
[41] G. Carles, J. Downing, and A. R. Harvey, Optics letters 39, 1889 (2014).
[42] J. H. Shapiro, Physical Review A 78, 061802 (2008).
[43] Y. Bromberg, O. Katz, and Y. Silberberg, Physical Review A 79, 053840 (2009).
[44] D. Phillips, R. He, Q. Chen, G. Gibson, and M. Padgett, Optics Express 24, 14172 (2016).
[45] M. P. Edgar, G. M. Gibson, R. W. Bowman, B. Sun, N. Radwell, K. J. Mitchell, S. S. Welsh, and M. J. Padgett, Scientific reports 5 (2015).
[46] M.-J. Sun, M. P. Edgar, D. B. Phillips, G. M. Gibson, and M. J. Padgett, Optics Express 24, 10476 (2016).
[47] N. J. Sloane and M. Harwit, Applied optics 15, 107 (1976).
[48] D. S. Davis, Applied optics 34, 1170 (1995).
[49] M. Harwit, Hadamard transform optics (Elsevier, 2012).
[50] R. Murray-Smith and B. A. Pearlmutter, in Deterministic and Statistical Methods in Machine Learning (Springer, 2005), pp. 110–123.
[51] J. K. Aggarwal and Q. Cai, in Nonrigid and Articulated Motion Workshop, 1997. Proceedings., IEEE (IEEE, 1997), pp. 90–102.
[52] J. Aggarwal and N. Nandhakumar, Tech. Rep., DTIC Document (1988).
[53] C. S. Burrus, R. A. Gopinath, and H. Guo (1997).
[54] D. Le Gall, Communications of the ACM 34, 46 (1991).
[55] I. E. Richardson, H. 264 and MPEG-4 video compression: video coding for next-generation multimedia (John Wiley & Sons, 2004).
[56] C. M. Bishop, Machine Learning 128 (2006).
[57] C. E. Rasmussen (2006).
[58] A. C. Sankaranarayanan, L. Xu, C. Studer, Y. Li, K. F. Kelly, and R. G. Baraniuk, SIAM Journal on Imaging Sciences 8, 1489 (2015).
[59] Z. Li, J. Suo, X. Hu, and Q. Dai, Optics express 24, 7328 (2016).
[60] F. Ferri, D. Magatti, L. Lugiato, and A. Gatti, Physical review letters 104, 253603 (2010).
basis. A Hadamard matrix is defined as an matrix with elements that take the values of +1 or -1, and with rows that are orthogonal to one another. The Supplementary Information of reference [7], and the references therein give an excellent description of the generation and use of the Hadamard matrices.
In summary a 2D uniform resolution mask of index n is formed by reformatting row n of the Hadamard matrix into a uniform 2D grid, as shown in Figs. 2(a-b). However, our experimental implementation uses a DMD that can represent masks that transmit (mirrors ‘on’) or block (mirrors ‘off’) intensity regions within the image. This corresponds to masks consisting of +1 (transmitted light) and 0 (blocked light), but not the -1 required by the Hadamard matrix. This problem is circumvented by first displaying a ‘positive’ pattern of +1s and 0s (in place of the -1s) followed by the ‘negative’ of this pattern (i.e. where the positions of 1s and 0s have been swapped). The desired Hadamard encoding matrix can then be emulated by subtraction of the intensity transmitted by the negative pattern from the positive pattern. In addition, displaying the negative mask immediately after the positive mask also acts to cancel out some of the noise due to low frequency fluctuations in the ambient illumination, a technique analogous to differential ghost imaging [60].
frames. Here we derive and discuss in further detail Equation 2 of the main text. We will make use of the transformation matrix T , which maps from N-element cell space to the larger M-element hr-pixel space. T is an binary matrix where the locations of the ‘ones’ in column n denote the hr-pixels that belong to cell n.
With the help of this matrix, we can define a new basis , formed by “stretching” the Hadamard vectors
to conform to our nonuniform pixel grid:
These are the raw patterns that we will measure using our DMD. It is important to note that, in contrast to conventional computational imaging using Hadamard matrices on a regular grid, the vectors are not orthogonal in hr-pixel space. However, after some matrix algebra it can be shown that there exists a dual basis ˜
forming a biorthogonal set with
, i.e. the following relation holds:
Here A is an diagonal matrix such that
is equal to the area of the cell to which hr-pixel m belongs.
The existence of this biorthogonality relationship makes it helpful to represent our high-resolution object o in the dual basis ˜as follows:
where represents those high-spatial-frequency components that are orthogonal to all
(i.e.
n), and hence that the imaging system described here is not sensitive to.
Then, by projecting o onto our basis set and expanding, we can show that
as follows:
These measurements can then be used to derive our estimate of the object (Equation 2 in the main text):
However, if we substitute (3) into this equation then we see that the same reconstruction in fact be computed more efficiently in cell space:
Although this matrix equation is slightly more longwinded than (2), it shows that the reconstruction can be performed in the (lower-dimensional) cell space, and then the final result remapped just once onto the uniform hr-pixel grid.
tion sub-frames. In passive single-pixel imaging techniques, the signal-to-noise ratio (SNR) scales approximately in inverse proportion to the square of the linear resolution, for a given constant exposure-time. Following from this observation, in our spatially-variant resolution imaging system the SNR across each individual sub-frame is also spatially-variant, with the higher resolution regions being most sensitive to noise. Therefore the local SNR scales in inverse proportion to the square of the local linear resolution. The weighted-averaging method does go some way towards improving the SNR, as is discussed in more detail in [46].
However, our space-variant imaging system does offer a reduction in the noise caused by motion blur. When a scene changes during the measurement of a computational image using Hadamard patterns (uniform or spatially-variant), the reconstruction not only exhibits conventional motion blur, but also a splash of noisy pixels across the field-of-view. This pattern multiplexing noise is due to scene movement causing inconsistencies in the measured weights of each pattern. By lowering the resolution in regions of the scene deemed static, our foveated imaging system reduces the amount of time required to image a moving part of the scene to a given resolution (in the examples here, by a factor of 4), therefore reducing both conventional motion blur and pattern multiplexing noise.
is formed by weighting the contribution of data from each sub-frame in inverse proportion to the area of the corresponding sub-frame cell that the data is taken from. Therefore the intensity of pixel i, j in the weighted mean composite image, ), is given by:
where k indexes the sub-frames used to calculate the composite image, ) is the value of pixel i, j in space-variant sub-frame k, and
) is the area of the cell that pixel i, j belongs to.
) serves to normalise the sum. Consistent with our earlier vector notation, we can equivalently write:
where . Equations 6 and 7 therefore specify an equal weighting of sub-frames within the fovea (where the pixels are all of the same size), and in the peripheral region promotes data from pixels that have a smaller area and thus a higher local resolution. This strategy incorporates local data from all sub-frames in every composite image pixel, which has the benefit of suppressing noise.
We note that a variety of other weightings may also be applied. Other examples that we investigated include using only data from the sub-frame with the highest resolution pixel (with equal weighting given in regions where sub-frames have the same sized pixels), and the weighting of more recent measurements more prominently. This second weighting strategy can be applied if some parts of the scene are expected to change throughout the measurement. The weighting choice depends upon the distribution of pixel areas in the sub-frames, the noise levels in the measurement, and the expected level of scene motion.
constraint algorithm, we fuse information from multiple sub-frames by forming a system of linear equations representing constraints on the high-resolution reconstructed image. The problem can be expressed as:
where T is our binary stretching transform matrix as defined as above. Therefore here (is an
binary matrix encoding which hr-pixels belong to each cell (i.e. element m, n is 1 if pixel m belongs to cell n, and 0 otherwise).
is a column vector of length N, element n of which represents the sum of all the hr-pixel values in cell n of sub-frame k. Conveniently, the vector c is already computed as part of the reconstruction of sub-frame k (referring to Equation 5, we see that c =
). Note that, in the same way as T maps from cell
space to hr-pixel space, maps from hr-pixel space
to cell space. These transformations are related by: , indicating that conversion from cell to hr-pixel space and back again is lossless. However, the reverse transformation
does not equal 1, indicating that a transformation from hr-pixel to cell space and back again is not lossless, and high resolution detail is lost in the transformation. In practice, we solve for
using a least-squares
method that is suitable for systems that may be lo-
cally overdetermined, critically determined, or underdetermined depending on the number of sub-frames available for the reconstruction. Our linear-constraint method can be sensitive to noise in the sub-frame measurements, and in particular noise is amplified in the highest spatial frequencies of the composite image (i.e. within the fovea). We suppress this noise by applying a spatially-variant smoothing constraint to the system of equations, which maintains generality as it is derived from the weighted average composite image formed from the raw data. This is achieved by adding extra rows to the system of equations incorporating the information present in the weighted average composite image. In this case the relative importance of the constraint terms can be tuned by solving for using a weighted least-squares method. This effectively gives us a tunable compromise between
noise suppression and faithful reproduction of the high
spatial frequencies close to the cut-off frequency of the reconstruction: for example a greater weighting of the constraint leads to lower noise images but with high frequencies suppressed (non-uniformly across the field of view, reflecting the underlying measurements). In practical terms it is highly attractive to use the weighted average image as a constraint for the linear-constraint reconstruction, as it represents a ready-made space-variant noise suppression function, which would be non-trivial to otherwise synthesise from our irregular grids of sub-frame cells. Therefore the linear-constraint method is essentially equivalent to performing a deconvolution of the weighted-average reconstruction using the appropriate spatially-variant PSF. In Equation 8 we can also account for cases where local
motion has been detected in the images, as encoded in
the difference map stacks. We do this simply by deleting any rows from the matrices , along with the corresponding elements from vectors
, that correspond to cells in sub-frame k that are deemed to have changed. We note that the matrices in the equations presented
here are highly sparse, and an efficient implementation of the reconstruction code benefits significantly from exploiting this property. Note also that in the case of a regular square pixel grid (e.g. Fig. 2(a)) the algebraic problem is separable in the x and y dimensions. This reduces the formal computational complexity of the problem, thus making it possible to implement a real-time reconstruction. However, it would be significantly more challenging to construct an irregular pixel grid (e.g. Fig. 2(d)) that is separable in this way, while still meeting the other requirements for our geometry.
Media 1: Real-time sub-frame display. This movie shows data presented in Fig. 3(a). The left hand panel shows the sub-frames captured at 8 Hz (and processed and displayed in real-time). The super-sampling from one frame to the next can be seen both within the fovea where they repeat every 4 frames, and in the periphery where they repeat every 36 frames (the same as the length of the movie). The right hand panel shows the cell grid for each frame.
Media 2: Post-processed linear constraints reconstruction. This movie shows data presented in Fig. 3(c). The left hand panel shows the frame-by-frame linear constraints reconstruction. The high resolution appears to spread from the centre as in the periphery each new frame is fused with the existing data to improve the reconstruction. Right hand panel shows the effective exposure-time across the field-of-view. Initially the entire field-of-view has the same effective exposure-time as only a single frame has been recorded. In the centre only the most recent 4 sub-frames are used in the reconstruction (hence an effective exposure-time of 0.5 s). Surrounding this data from progressively more frames back is used in the reconstruction (thus increasing the effective exposure time).
Media 3: Real-time motion tracking and fovea guidance. This movie shows data presented in Fig. 4(b). The top-left panel shows the low-resolution blip-frame (recorded after every 4th sub-frame in 31 ms). The bottom-left panel shows the difference between the 2 most recent consecutive blip-frames. The region of the moving object is clearly visible. The blip-frame and blip-difference frame are reconstructed, analysed and displayed in real-time. The middle panel shows the sub-frames captured at 8 Hz (and processed and displayed in real-time). Here the fovea is programmed to follows the moving part of the scene to image it at high resolution. Additionally, 20% of the time the fovea is programmed to jump to a random location within the field-of-view that was not recently accessed. This is performed to ensure that all of the sub-frames are at least intermittently sampled, improving the quality of the longer exposure super-sampled reconstruction of static parts of the scene (shown in Media 4). The global position of the fovea is updated at 2 Hz (every 4 sub-frames), based on the blip-difference frame analysis. During each 4 sub-frame fixation phase in-between blip-frame measurements, the fovea cell footprints are shifted by 4 half-cell displacements in x and y [46]. This enables the resolution within the fovea to be doubled by combining the 4 measurements should the scene with the fovea remain static for the time of the measurements. The right hand panel shows the cell grid for each frame.
Media 4: Real-time weighted-averaging and post-processed linear constraints reconstruction of a dynamic scene. This movie shows data presented in Fig. 4(c-d). The top-left panel shows the post-processed linear-constraints reconstruction of the raw data shown in Media 3. The effective exposure-time of this reconstruction is shown in the top-right panel. Here the minimum effective exposure-time is 0.25 s and the maximum is 4 s. Therefore in the parts of the scene that are currently deemed to be moving, we display the average of the most recent 2 frames. In other regions the data used in this reconstruction is flushed after a maximum of 4 s (i.e. the maximum number of previous sub-frames from which data is used is 32). In this case as in some regions no change was detected throughout the entire duration of the clip (15 s, 120 sub-frames), the maximum effective-exposure could have been 15 s. We show the case for a reduced 4 s maximum effective exposure-time to demonstrate the super-sampled recovery of the entire field-of-view.
The bottom-left panel shows the real-time weighted-averaging reconstruction. Here instead of choosing a hard limit on the maximum effective exposure-time we have weighted older frames less prominently, which protea recent measurements.
The bottom-right panel shows a uniform resolution video of a similar scene, using the same measurement resource as the data in the other panels. Here the data is completely flushed every frame (0.125 s), however the resolution is never high enough to identify the lettering on the moving sign, or any of the features on the resolution target in the background.