The main contribution of this paper is to introduce the concept of the quantum neural network layer (QNNL) as a quantum analog to the classic neural network layer in DNN. Since all quantum gates are unitary and hence linear, the main difficulty of building a QNNL is introducing non-linearity. We solve this problem by encoding the input vector to a quantum state non-linearly with a parameterized quantum circuit (PQC), which is similar to [25]. Since all quantum gates are unitary and hence linear, the main difficulty of building a QNNL is introducing non-linearity. We solve this problem by encoding the input vector to a quantum state non-linearly with a parameterized quantum circuit (PQC) [16]. A QNNL is a quantum circuit that is different from the classical neural network layer. Different from [25], we consider a multi-layer structure that may contain multiple PQCs with a deep neural network. A quantum DNN (QDNN) can be easily built with QNNLs since the input and output of a QNNL are classical values.
The advantage of introducing QNNLs is that we can access vectors of exponential dimensional Hilbert spaces with only polynomial resources on a quantum computer. We prove that this model can not be classically simulated efficiently unless universal quantum computing can be classically simulated efficiently. So QDNNs have more representation power than classical DNNs. We also give training algorithms of QDNNs which are similar to the backpropagation (BP) algorithm. Moreover, QNNLs use the hybrid quantum-classical scheme. Hence, a QDNN with a reasonable size can be trained efficiently on NISQ processors. Finally, a numerical experiment for image recognition is given using QDNNs, where a high accuracy rate is achieved and only 8 qubits are used.
We finally remark that all tasks using DNN can be turned into quantum algorithms with more representation powers by replacing the DNN by QDNN.
Figure 1: Hybrid quantum-classical scheme.
2.1 Quantum neural network layers
A deep neural network consists of a large number of neural network layers, and each neural network layer is a non-linear function ) :
with parameters
. In the classical DNN,
takes the form of
, where
is an affine transformation and
is a non-linear activation function. The power of DNNs comes from the non-linearity of the activation function. Without activation functions, DNNs will be nothing more than affine transformations.
However, all quantum gates are unitary matrices and hence linear. So the key point of developing QNNLs is introducing non-linearity.
We build QNNLs using the hybrid quantum-classical algorithm scheme [26], which is widely used in many NISQ quantum algorithms [13, 27]. As shown in figure 1, a hybrid quantum-classical algorithm scheme consists of a quantum part and a classical part. In the quantum part, parameterized quantum circuits (PQCs) are used to prepare quantum states with quantum processors. In the classical part, parameters of the PQCs are optimized using classical computers.
A PQC is a quantum circuit with parametric gates, which is of the form
where = (
) are the parameters, each
) is a rotation gate
) = exp(
), and
is a 1-qubit or a 2-qubits gate such that
= I. For example, in this paper we will use the Pauli gates X, Y, Z, and the CNOT gate.
As shown in figure 1, once fixed an ansatz circuit ) and a Hamiltonian H, we can define the loss function of the form
. Then we can optimize L by updating parameters
using optimizating algorithms [28, 29]. With gradient-based algorithms [28], one can efficiently get gradient imformation
which is essentially important in our model. Hence, we will focus on gradient-based algorithms in this paper.
Now we are going to define a QNNL, which consists of 3 parts: the encoder, the transformation, and the output, as shown in figure 2.
For a classical input data , we introduce non-linearity to our QNNL by encoding the input
to a quantum state
non-linearly. Precisely, we choose a PQC
) with at most O(n) qubits and apply it to an initial state
Figure 2: The structure of a QNNL
to obtain a quantum state
encoded from . The PQC is naturally non-linear in the parameters. For example, the encoding process
from x to is non-linear. Moreover, we can compute the gradient of each component of
efficiently. The gradient of the input in each layer is necessary when training the QDNN. The encoding step is the analog to the classical activation step.
After encoding the input data, we apply a linear transformation as the analog of the linear transformation in the classical DNNs. This part is natural on quantum computers because all quantum gates are linear. We use another PQC ) with parameters
for this purpose. We assume that the number of parameters in
) is O(poly(n)).
Finally, the output of a QNNL will be computed as follow. We choose m fixed Hamiltonians = 1, . . . , m, and output
Here, the bias term = (
) is an analog of bias in classical DNNs. Also, each
is a hybrid quantum-classical scheme with PQC U and Hamiltonian
).
To compute the output efficiently, we assume that the expectation value of each of these Hamiltonians can be computed in O(poly()), where
is the precision. It is easy to show that all Hamiltonians of the following form satisfy
this assumption
where are tensor products of Pauli matrices or k-local Hamiltonians. In summary, a QNNL is a function
defined by (1) and (2), and shown in figure 2. Note that a QNNL is a function with classic input and output, and can be determined by a tuple
with parameters (). Notice that the QNNLs activate before affine transformations while classical DNNLs activate after affine transformations. But this difference can be ignored when considering multi-layers.
2.2 QDNN and its training algorithms
Since the input and output of QNNLs are classical values, the QNNLs can be naturally embedded in classical DNNs. A neural network consists of the composition of multiple compatible QNNLs and classical DNN layers is called quantum DNN (QDNN):
where each is a classical or a quantum layer from
to
for i = 1, . . . , l and
= 1, . . . , l} are the parameters of the QDNN.
We will use gradient descent to update the parameters. In classical DNNs, the gradient of parameters in each layer is computed by the backpropagation algorithm (BP). Suppose that we have a QDNN. Consider a QNNL Q with parameters , whose input is
and output is
. Refer to (1) and (2) for details.
To use the BP algorithm, we need to compute and
. Computing
is trivial. Because U, V are PQCs and each component of
is an output of a hybrid quantum-classical scheme, both
and
can be estimated by shifting parameters [28].
We can use the above algorithm to estimate the gradient in each quantum layer.
Hence, gradients can be back propagated through the quantum layer, and QDNNs can be trained with the BP algorithm.
2.3 Representation power of QDNNs
In this section, we will consider the representation power of the QDNN. We will show that QDNN can approximate any continuous function similar to the classical DNN. Moreover, if quantum computing can not be classically simulated efficiently, the QDNN has more representation power than the classical DNN with polynomial computation resources.
2.3.1 Approximate functions with QDNNs
The universal approximation theorem ensures that DNNs can approximate any continuous function [20, 21, 30, 31]. Since the class of QDNNs is an extension of the class of classical DNNs, the universal approximation theorem can be applied to the QDNN trivially. Now, let us consider two subclasses of the QDNN.
• DNN with only QNNLs.
• DNN with QNNLs and affine layers.
In the first subclass, let us consider a special type of QNNLs which can represent monomials [25]. Consider the circuit
and the Hamiltonian . The expectation value
is the monomial x for
[0, 1]. For multivariable monomial
, we use the circuit
and the Hamiltonian . Similarly, the expectation value of
for [0, 1]. With the above results and Stone-Weierstrass theorem [32], we can deduce the following theorem.
Theorem 1. The QDNN with only QNNLs can uniformly approximate any continuous function f : [0, 1]
Now let us consider the second subclass. As the affine transformation can map the hypercube [0, 1]to [
] for any
. Hence we have the following result.
Corollary 1. The QDNN with QNNLs and affine layers can uniformly approximate any continuous function
where D is a compact set in .
Also, the QNNL can perform as a non-linear activation function. For example, we consider a QNNL with the input circuit
and the Hamiltonian
where the projection is on the j-th qubit for j = 1, . . . , m. By simple computation, we have
By the universal approximation property [21, 33], neural networks with nonpolynomial activation functions can approximate any continuous function f : . Thus, the QDNN with QNNLs and affine layers can approximate any continuous function.
2.3.2 Quantum advantages
According to the definition of QNNLs in (2), each element of the outputs in a QNNL is of the form
In general, estimation of on a classical computer will be difficult by the following theorem.
Theorem 2. Estimation (6) with precision is BQP-hard, where BQP is the bounded-error quantum polynomial time complexity class.
Proof. Consider any language . There exists a polynomial-time Turing machine which takes
as input and outputs a polynomial-sized quantum circuit C(x). Moreover,
if and only if the measurement result of
of the first qubit has the probability
to be
.
Because CNOT} are universal quantum gates, C(x) can be expressed as a polynomial-sized PQC:
) = C(x) with proper parameters. Consider
, then
if and only if , and
if and only if .
Given inputs, computing the outputs of classical DNNs is polynomial time. Hence, functions represented by classical DNNs are characterized by the complexity class P/poly. On the other hand, computing the outputs of QDNNs is BQP-hard in general according to Theorem 2. The functions represented by QDNNs are characterized by a complexity class that has a lower bound BQP/poly. Here, BQP/poly is the problems which can be solved by polynomial sized quantum circuits with bounded error probability [34]. Under the hypothesis that quantum computers cannot be simulated efficiently by classical computers, which is generally believed, there exists a function represented by a QDNN which cannot be computed by classical circuits of polynomial size. Hence, QDNNs have more representation power than DNNs.
We used QDNN to conduct a numerical experiment for an image classification task. The data comes from the MNIST data set. We built a QDNN with 3 QNNLs. The goal of this QDNN is to recognize the digit in the image is either 0 or 1 as a classifier.
3.1 Experiment details
The data in the MNIST is 28 28 = 784 dimensional images. This dimension is too large for the current quantum simulator. Hence, we first resize the image to 8
8 pixels. We use three QNNLs in our QDNN, which will be called the input layer, the hidden layer, and the output layer, respectively.
In each layer, we set the ansatz circuit to be the one in figure 3, which is similar to the hardware efficient ansatz [35].
Figure 3: Ansatz circuits in each QNNL. a. The ansatz circuit of the encoder. b. The ansatz circuit of the transformation. c. The structure of gate Ent.
3.1.1 Input layer
The input layer uses an 8-qubit circuit which accept an input vector and output a vector
. The structure in FIG. 3 is used, where
= 2
= 5. We denote
) for
to be the transformation circuit in this layer.
The output of the input layer is of the form
where and
denotes the result obtained by applying the operator M on the j-th qubit for
.
3.1.2 Hidden layer
The hidden layer uses 6 qubits. It accepts an vector and outputs a vector
. The structure shown in figure 4 is used, with
= 1
= 4. Because there are 30 parameters in the encoder, we set the last column of
gates to be
(0). Similar to the input layer, the output of the hidden layer is
Table 1: Settings of three layers
3.1.3 Output layer
The output layer uses 4 qubits. We also use the structure in FIG. 3 with = 1
= 2. Because there are 20 parameters in the encoder, we set the last column of
and
gates to be
(0) and
(0). The output of the output layer is
Notice that we do not add bias term here, and it will output a vector in . Moreover, after training, we hope to see if the input
is from an image of digit
0, the output should be close to
, otherwise it should be close to
. In conclusion, the settings of these three layers are shown in table 1. Finally, the loss function is defined as
where D is the training set.
3.2 Experiments result
We used the Julia package Yao.jl [36] as a quantum simulator in our experiments. All data were collected on a desktop PC with Intel CPU i7-4790 and 4GB RAM.
All parameters were initialized randomly in (). We use Adam optimizer [37] to update parameters. We train this QDNN for 400 iterations with batch size of 240. In the first 200 of iterations, the hyper parameters of Adam is set to be
= 0
= 0
= 0.999. In the later 200 of iterations, we change
to 0.001.
The values of the loss function on the training set and test set during training is shown in figure 4. The accurate rate of this QDNN on the test set rise to 99.29% after training.
We introduced the model of QNNL and built QDNN with QNNLs. We proved that QDNNs have more representation power than classical DNNs. We pre-
Figure 4: Loss function
sented a practical gradient-based training algorithm as the analog of BP algorithms. Because the model is based on the hybrid quantum-classical scheme, it can be realized on NISQ processors. As a result, the QDNN has more representation powers than classical DNNs and still keeps most of the advantages of the classical DNNs.
Since we use a classical simulator on a desktop PC for quantum computation, only QDNNs with a small number of qubits can be used and only simple examples can be demonstrated. Quantum hardware is developing fast. Google achieved quantum supremacy by using a superconducting quantum processor with 53 qubits [38]. From table 1, up to 8 qubits are used in our experiments described in Sections 2.5 and 4.2, so in principle, our image classification experiment can be implemented in Google’s quantum processor. With quantum computing resources, we can access exponential dimensional feature Hilbert spaces [14] with QDNNs and only use polynomial-size of parameters. Hence, we believe that QDNNs will help us to extract features more efficiently than DNNs. This is similar to the ideas of kernel methods [39, 40].
We thank Xiu-Zhe Luo and Jin-Guo Liu for helping in Julia programming. This work is partially supported by an NSFC grant no. 11688101 and an NKRDP grant no. 2018YFA0306702.
All codes are available on http://github.com/ChenZhao44/QDNN.jl. And the data that support the findings of this study are available if request.
[1] Shor, P. W. Algorithms for quantum computation: Discrete logarithms and factoring. In Proceedings 35th Annual Symposium on Foundations of Computer Science, 124–134 (Ieee, 1994).
[2] Grover, L. K. A fast quantum mechanical algorithm for database search. In Proceedings of 28th Annual ACM Symposium on Theory of Computing, STOC ’96, 212–219 (ACM, New York, NY, USA, 1996). URL http:// doi.acm.org/10.1145/237814.237866.
[3] Preskill, J. Quantum computing in the nisq era and beyond. Quantum 2, 79 (2018).
[4] Wiebe, N., Braun, D. & Lloyd, S. Quantum algorithm for data fitting. Physical review letters 109, 050505 (2012).
[5] Schuld, M., Sinayskiy, I. & Petruccione, F. An introduction to quantum machine learning. Contemporary Physics 56, 172–185 (2015).
[6] Biamonte, J. et al. Quantum machine learning. Nature 549, 195–202 (2017).
[7] Rebentrost, P., Mohseni, M. & Lloyd, S. Quantum support vector machine for big data classification. Physical review letters 113, 130503 (2014).
[8] Lloyd, S., Mohseni, M. & Rebentrost, P. Quantum principal component analysis. Nature Physics 10, 631 (2014).
[9] Amin, M. H., Andriyash, E., Rolfe, J., Kulchytskyy, B. & Melko, R. Quantum boltzmann machine. Physical Review X 8, 021050 (2018).
[10] Gao, X., Zhang, Z. & Duan, L. A quantum machine learning algorithm based on generative models. Science advances 4, eaat9004 (2018).
[11] Lloyd, S. & Weedbrook, C. Quantum generative adversarial learning. Physical review letters 121, 040502 (2018).
[12] Dallaire-Demers, P.-L. & Killoran, N. Quantum generative adversarial networks. Physical Review A 98, 012324 (2018).
[13] Liu, J.-G. & Wang, L. Differentiable learning of quantum circuit born machines. Physical Review A 98, 062324 (2018).
[14] Schuld, M. & Killoran, N. Quantum machine learning in feature hilbert spaces. Physical review letters 122, 040504 (2019).
[15] Havl´ıˇcek, V. et al. Supervised learning with quantum-enhanced feature spaces. Nature 567, 209 (2019).
[16] Benedetti, M., Lloyd, E., Sack, S. & Fiorentini, M. Parameterized quantum circuits as machine learning models. Quantum Science and Technology 4, 043001 (2019).
[17] LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
[18] Voulodimos, A., Doulamis, N., Doulamis, A. & Protopapadakis, E. Deep learning for computer vision: A brief review. Computational intelligence and neuroscience 2018 (2018).
[19] Socher, R., Bengio, Y. & Manning, C. D. Deep learning for nlp (without magic). In Tutorial Abstracts of ACL 2012, 5–5 (Association for Computational Linguistics, 2012).
[20] Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Networks 4, 251 – 257 (1991). URL http://www.sciencedirect. com/science/article/pii/089360809190009T.
[21] Leshno, M., Lin, V. Y., Pinkus, A. & Schocken, S. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks 6, 861 – 867 (1993). URL http: //www.sciencedirect.com/science/article/pii/S0893608005801315.
[22] Killoran, N. et al. Continuous-variable quantum neural networks. Physical Review Research 1, 033063 (2019).
[23] Zhao, J. et al. Building quantum neural networks based on a swap test. Phys. Rev. A 100, 012334 (2019). URL https://link.aps.org/doi/10. 1103/PhysRevA.100.012334.
[24] Kerenidis, I., Landman, J. & Prakash, A. Quantum algorithms for deep convolutional neural networks. In International Conference on Learning Representations (2020). URL https://openreview.net/forum?id= Hygab1rKDS.
[25] Mitarai, K., Negoro, M., Kitagawa, M. & Fujii, K. Quantum circuit learn- ing. Phys. Rev. A 98, 032309 (2018). URL https://link.aps.org/doi/ 10.1103/PhysRevA.98.032309.
[26] McClean, J. R., Romero, J., Babbush, R. & Aspuru-Guzik, A. The theory of variational hybrid quantum-classical algorithms. New Journal of Physics 18, 023023 (2016).
[27] Liu, J.-G., Zhang, Y.-H., Wan, Y. & Wang, L. Variational quantum eigen- solver with fewer qubits. Phys. Rev. Research 1, 023025 (2019). URL https://link.aps.org/doi/10.1103/PhysRevResearch.1.023025.
[28] Li, J., Yang, X., Peng, X. & Sun, C.-P. Hybrid quantum-classical approach to quantum optimal control. Physical review letters 118, 150503 (2017).
[29] Nakanishi, K. M., Fujii, K. & Todo, S. Sequential minimal optimization for quantum-classical hybrid algorithms. arXiv preprint arXiv:1903.12166 (2019).
[30] Cybenko, G. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems 2, 303–314 (1989).
[31] Pinkus, A. Approximation theory of the mlp model in neural networks. Acta Numerica 8, 143–195 (1999).
[32] Stone, M. H. The generalized weierstrass approximation theorem. Mathematics Magazine 21, 167–184 (1948). URL http://www.jstor.org/ stable/3029750.
[33] Kratsios, A. The universal approximation property: Characterizations, existence, and a canonical topology for deep-learning. arXiv preprint arXiv:1910.03344 (2019).
[34] Aaronson, S. The complexity zoo.
[35] Kandala, A. et al. Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature 549, 242–246 (2017).
[36] Luo, X.-Z., Liu, J.-G., Zhang, P. & Wang, L. Yao.jl: Extensible, efficient framework for quantum algorithm design. arXiv preprint arXiv:1912.10877 (2019).
[37] Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[38] Arute, F. et al. Quantum supremacy using a programmable superconducting processor. Nature 574, 505–510 (2019).
[39] Shawe-Taylor, J., Cristianini, N. et al. Kernel methods for pattern analysis (Cambridge university press, 2004).
[40] Zelenko, D., Aone, C. & Richardella, A. Kernel methods for relation ex- traction. Journal of machine learning research 3, 1083–1106 (2003).