Restricted boltzmann machine via bayesian machine learning. We first discribe the restricted Boltzmann machine (RBM) model and the methodology to train it by utilizing bayesian framework [33]. The RBM model is a two-layer, bipartite neural network, which is a ”restricted version”of the Boltzmann machine with only inter-connections between hidden layers and visible layers. The visible layer v = (encodes the input data into a
dimensional binary string, and the hidden layer h = (
is composed of
stochastic binary variables. Then the joint probability of (v, h) is expressed as p(v, h) =
exp (
)), where the potential function E(v, h) =
. The Boltzmann weights w = (
compose a symmetry matrix, and the formalized factor Z =
exp (
)).
Given the training data , the Boltzmann weights can be estimated by utilizing the maximum a posterior (MAP) within the bayesian learning framework [29]:
The probability p(w) is the priori distribution of the Boltzmann weights, and p(D|w) is the likely-hood probability distribution, which can be computed as p(D|w) = ) on the RBM model.
Meanwhile, in some application scenarios, the predictive distribution of the new data point prevails over the Boltzmann parameter w itself, and we can directly express the predictive distribution
) by sum of
) with the weight parameter p(w|D):
In general, the distribution ) can not be analytically computed, therefore researchers often utilize Markov Chain Monte Carlo (MCMC) method to estimate this distribution. Nevertheless, if the machine learning tasks process the data on a high dimensional feature space Φ(
), estimating the posterior distribution
)) and predictive distribution
Φ(D)) will take a large amount of computational overheads which is extremely hard for classical computing. We therefore choose the quantum state space as the feature space, and we design a quantum bayesian framework to handle this issue.
Encoding phase of bayesian learning framework. The quantum bayesian framework contains two fundamental components, namely the encoding phase and the training phase. The first stage of encoding phase indicates the procedure that maps the real data onto the quantum feature space which is special designed for quantum bayesian learning. In quantum settings, the feature map is an injective encoding of
Figure 1: Circuits of generating the feature states and
, respectively.
classical information into a quantum state
on an
qubits register, s.t. Φ :
. The family of feature map circuit can be defined as
, where the unitary operator
The notation S indicates a subset that belongs to set [m] = , and
are non-linear functions of the input data
[34], and other choices of the feature maps are also possible such as squeezed vacuum state [28]. For the supervised classification tasks, the training data x is stored into two parts x = (
), namely data parts and label parts. Then the feature state is defined as:
where the operator = (
encodes the label parts into the last
qubits, and the binary string (
) indicates the
. For instance, if handling the binary classification task, the label is marked by
when
= 1, otherwise
.
On the other hand, the predictive data x, which does not have its label, is encoded by the quantum feature map:
Still taking the binary classification into consideration, the feature state of predictive data x becomes . Fig.1 illustrates the quantum circuits of these two quantum feature maps respectively.
The second subroutine aims at encoding the parameter w into a density matrix ), whose structure satisfies connection rules between the visible nodes and hidden nodes. To achieve this, we propose a parallel hardware-efficient ansatz to represent the priori density matrix
) of the RBM model (see Methods). Training phase of quantum bayesian framework. We then discuss the training phase of our framework. First, we deliver a quantum algorithm for maximum the posterior distribution, and this quantum algorithm is designed according to the measure of a physical quantity. We find the overlap between the priori density matrix
) and likelyhood density matrix
(Φ(
)) can be recognized as a principle to evaluate the posterior distribution in quantum settings, furthermore this physical quantity can be efficiently computed by quantum computer but #P hard for classical computer [35], and this measure reveals the quantum advantages. The validity for choosing this measure as well as the method for computing the likelyhood density matrix
(Φ(
)) are presented in Methods.
Specifically, the quantum maximum posterior distribution (QMAP) algorithm can be regarded as a procedure that adjusts the parameter to achieve
The QMAP algorithm starts with estimating the measure, s.t. Tr ((Φ(
))), by swap test technique [30]. Utilizing this information, QMAP algorithm is followed by simultaneous perturbation stochastic approximation (SPSA) approach [31] to find out the optimal parameter
. For every step k of the optimization, we sample from p symmetrical Bernoulli distributions ∆
using preassigned elements from a sequence converging to zero, s.t.,
. The gradient at
is approximated using energy evaluations at
=
, and is constructed as
Noting that this gradient approximation only requires two estimations of the energy, regardless of the number of variables in . The value of
) can be obtained by swap test technique, then the parameters are updated as
The step length is selected based on experience, and the optimal solution
convergences rapidly after several iterations in this iterative algorithm. So far, the QMAP algorithm obtains the optimal parameter
which can be used to classify the predictive data.
The classification rules of QMAP algorithm are interpreted as follows. After determining the optimal solution , the likelyhood distribution
) of a test data point
can be computed by the QCL subroutine (see Methods), where the training feature state
should be substituted by the testing feature state
. Then we utilize computational basis to measure the last
registers of
). Taking binary classification as an example, we select
basis to measure the third register. If the probability p(0) > p(1), then
is assigned to the class
1, otherwise +1.
The QMAP algorithm involves swap test technique as well as the QCL subroutine proposed in Methods, which actually takes gate complexity. The parameter n is the number of utilized qubits or the scale of the quantum system,
indicates the error taken by swap test, N represents the number of data points in the data set, and M is the iterative times of SPSA approach. Given the real data set
and the corresponding feature space Φ(D) =
Φ(
, then the QMAP algorithm achieves the exponential speed-up under the assumption of d = O(n).
Then we discuss another inference method in our framework. The quantum predictive distribution estimator (QPDE) can directly construct a predictive distribution rather than deliver the Boltamann weight, and this result naturally corresponds to a quantum state which represents such distribution. Given a predictive visible data (without label), the QPDE aims at obtaining the corresponding predictive distribution
Figure 2: (a) Experimental classification result by QMAP algorithm. The blue line chart shows the classifi- cation accuracies for different depths, and each depth involves training 400 data points and testing 40 data points for 10 times, where the mean values for the 10 times are represented by the square-shaped filled dots. And the inset blue bars shows the probability distribution of with depth 10 and accuracy 98.1%. (b) The accuracy distribution of 10 experimental results for each depth.
Figure 3: (a) Illustration of the classification accuracy utilizing QPDE, the number of sampling times vary from 20 to 80 with variation length 20. The sampling interval of parameter is 0
, and the depth of variational quantum circuit for generating state
equals to 5. (b) Illustration of the classification success with the changing of layer depth, from 5 to 10, the success curve witnesses from 0.849 to 0.854. (c) Success rates by QPDE for sampling 40, 60, 80 times, respectively.
where ) is the likelyhood distribution of the predicted data
that has been introduced above. We indicate the fact that traversing all the possible state
can be achieved by modifying the rotation parameter
accompanied by an appropriate error in Theorem 2 (see Methods). Generally, the predictive distribution
) can not be calculated analytically, unless priori and likelyhood distribution belong to the same conjugate distribution family. Furthermore, the likelyhood density matrix
Φ(D)), in which visible data has been mapped on the quantum feature space, is computationally hard by classical means because of the construction of the feature map
and overleap
). Thus we also believe that function (9) may not be estimated directly by Markov chain Monte Carlo (MCMC) method efficiently [34]. Combining MCMC method and quantum technique, the QPDE provides a method to estimate function (9).
Figure 4: (a) Example data used for QMAP algorithm. The data labels (red for +1 label and blue for -1 label). (b) Decision bound provided by QMAP algorithm with 10 layers depth. (c) Evolution of the quantum posterior distribution Tr ((Φ(
))) after 140 iterations of using SPSA algorithm. This graph corresponds to the quantum circuit with 10 layers depth.
We first select and prepare a sampling distribution ) which satisfies the condition
= 1, and there are some possible choices of
) such as uniform distribution, multivariate Gaussian distribution and multivariate Laplasian distribution. Then the function (9) can be rewritten as
This is followed by obtaining a set of samples (where l = 1, 2, ..., K) drawn independently from the distribution
), which allows the function (9) to be approximated by a finite sum
where
Noting that QPDE does not directly achieve ), but it just measures the last
qubits and then performs simple calculations on the measurement results. We still deliver a binary classification as an illustrative example. In detail, parameters Tr
(Φ(
can be calculated by the swap test, then one can measure the last qubit of
) by computational basis
, where the probability p(0), p(1) can be estimated by simply counting the measurement results. Finally
is assigned to the class
1 if p(0) > p(1), otherwise +1 class is assigned.
Noting that QPDE also needs the supportive of QCL subroutine, meanwhile sampling from the distribution ) takes additive complexity of O(K) times. Therefore, QPDE takes
time, where N is the number of data points in real dataset,
represents the error taken by swap test.
Figure 5: Wave-function evolutions by running QPDE for different sampling times on the two testing data points = (4.272566, 5.08938) and
= (5.08938, 3.39293), shown in the left side (a and b) and right side (c and d), respectively. There are eight amplitudes in the wavefunction and the
= +1 and
=
1.
Experimental results. The experimental preparation of the QMAP algorithm is illustrated as follows. We utilize 1 hidden node and 3 visible nodes
to construct a QRBM to implement a binary classification task. At first, we encode the classical training data
onto the quantum qubits
and
by utilizing the quantum feature map
, which is illustrated in Fig.6. Then we invoke the QCL subroutine to add the hidden node
onto the RBM model and computing the likelyhood distribution at the same time. Finally, QMAP algorithm can find out the optimal parameter
with the help of SPSA algorithm and swap test technique, and the evolution of quantum posterior distribution is illustrated as Fig.4(c). The optimal parameter
can be further examined by the testing data set. We set the iterative parameter
= 1
, afterwards the test results are illustrated as Fig.2. Our quantum algorithm yields 98.1% success for the test data set. The original data distribution and the decision boundary of the QMAP algorithm are illustrated as Fig.4(a) and (b).
For the experiments of QPDE, the preparation stage is to obtain a set of samples from a uniform
Figure 6: Illustration of the quantum restricted boltzmann machine (QRBM) which is composed of 6 qubits. In detail, qubits are utilized to encode hidden nodes (
) in the form of superposition 1
. The qubits
and
are the visible nodes, from which
encode the visible data via nonlinear operator
and
represents the corresponding label of x.
distribution ) and the function (9) can be evaluated as
where the parameter ), and
. We first implement the quantum variational classifier
) with 20, 40, 60, 80 sampling times, respectively, on the quantum simulation processor. We expect the classification success rate goes up with increasing the number of samplings. We utilize training sets consisting of 400 data points per label. And we take 40 testing data points for each class. Two of the testing data
= (4.272566, 5.08938) and
= (5.08938, 3.39293)’s wavefunction are shown in Fig.5 for different number of samplings. And the classification success is also illustrated in Fig.3(a). We observe that the upper bound of the accuracy converges to 0.850 albeit with more optimization steps. The number of layer depth of priori distribution
) is set to 5, and sampling interval is 0
.
Second, we implement the quantum variational circuit ) for different layer depth from 5 to 10 on the quantum simulation processor. We observe that the classification accuracy shows a slight rise from 0.850 to 0.854 with layer depth varying from 5 to 10, and maintains a constant regardless of increasing the layer depth. The sampling interval is set to 0
and the number of sampling times is 40.
Finally, we also implement the experiments on varying the sampling interval from 0to 2
when executing 40 to 80 different sampling times on the 5 layer depth quantum circuit. It is interesting to note that larger number of sampling times with larger
interval actually yield higher success rate up to 85.8%.
According to the experiments results, compared with QPDE, the QMAP algorithm has a larger complexity, but achieves higher success rate up to 98.1%. Therefore, we in practice have to face a tradeoff between complexity and success rate before choosing a more suitable one between the two algorithms.
In summary, we implement a quantum bayesian learning framework in the exponentially large feature spaces. Our framework provides two variational quantum algorithms, which build upon the realization that the posterior and predictive distributions are hard to estimate classically in the quantum-state feature spaces. Both of them can handle the classical dataset and output the classical predictive label, and it is shown that our algorithms are able to be exponentially faster than the classical counterparts. We hope our work can inspire more bayesian machine learning algorithms accessible to NISQ devices. .
Parallel hardware-efficient ansatz. In this section, we present a quantum technique to generate the priori probability in the pattern of quantum state. To process the data in the high dimensional quantum-enhanced feature space with ) dimension, researchers propose the hardware efficient ansatz to approximate an arbitrary quantum state with an affordable error [21]. To represent the relationship between the visible nodes and hidden nodes, we propose the parallel hardware-efficient ansatz whose fundamental quantum circuit is illustrated in Fig.7(a). If the RBM model has
hidden nodes
, the quantum priori quantum state can be prepared by
in which ) =
). The shallow quantum circuit
) can be achieved by appending layers of single-qubit unitaries and entangling gates, and each layer contains an additional set of entanglers across all the qubits used. The operator
) is a circuit of l repeated entanglers, and interleave them with layers comprised of local single qubit rotations:
) =
. It is interesting to
note that ) is only confined to invoking
and
, of course, there are other ways to construct the rotations. The rotation angles
are sampled from a classical distribution (such as uniform distribution, multivariate Gaussian distribution and multivariate Laplasian distribution), and the entangler
comes from the graph model G(V, E), s.t.|V | = n, which is uniquely determined by the structure of the RBM:
The implementing of the parallel hardware efficient ansatz ) =
) depends on the Hadamard gate H and controlled unitary W =
), i.e.,
where= 0. Therefore the priori probability distribution
) can be computed by measuring the above state on the basis
, and the oblivious amplitude amplification technique [3] can realize the polynomial acceleration in this measuring step.
Density matrix of likelyhood distribution. In the feature space, the posterior distribution of RBM can be further simplified as
where denotes the j-th column vector of W. Noting that Φ(
is actually an overlap, or the inner product, between the vectors Φ(
) and
[33]. Given the weight state
and feature state
, we attempt to estimate there overlap
using quantum computer. The implementing of the following quantum algorithm depends on two two quantum techniques provided in Supplementary materials. The first technique [38], the combination of amplitude estimation [36] and swap test, promises
Figure 7: (a) Illustration of the quantum circuit of W = ) by introducing log
ancilla qubits. And the gate complexity of W is
log
). (b) Variational circuit used for encoding the boltzmann weight between the hidden nodes
and visible nodes. We choose the hardware-efficient ansatz for the variational unitary
) =
(
)
(
), with entanglement layer
. (c) The fundamental structure of
.
our proposed algorithm enabling calculating that overlap and immediately encoding it into the register, and the second one interprets the procedure of linear combination of unitary. Drawing supportive from these two techniques, we can implement the following quantum algorithm that computes likelyhood distribution in quantum settings.
Subroutine (QCL): Quantum algorithm for computing likelyhood density matrix (Φ(D)|w)
Input: A quantum restricted boltzmann machine composed of visible nodes and
hidden nodes; n qubits initialized to
to encode
visible nodes; n qubits initialized to
to encode
hidden nodes; an auxiliary qubit; and training data set
= (
.
Output:Likely-hood distribution (Φ(D)|w)
1. For any training data , we first construct quantum state
in the first system, and then apply Hardmard gate H to the auxiliary qubit (the second system) resulting in the state of the whole system
The operation is performed onto the the third system controlled by the second register, then we obtain
2. According to the linear combination of unitary (LCU) technique, we have
3. Considering the QRBM model with hidden nodes, for the first hidden node
and corresponding weight states
. Construct the controlled operator
) and apply it on the second and the third system, where the parameter
is determined by the previous phase. The resulting state becomes
4. Perform Hadamard gate H on the second system, then the system turns to the following state by invoking the analogue of swap test technique (see Supplementary material), s.t.
5. Perform exponential operator onto the third system, then add ancillary qubit
and utilize controlled rotation procedure, s.t.
exp(
), so the state becomes
the number of qubits to describe the overlap ), and the controlled rotation therefore takes complexity of O(t).
6. Uncomputing the step 3 and step 4, we have the state
Then repeat the steps 3-5 adding the hidden nodes respectively, we finally obtain the likely-hood state of the training data D under the quantum restricted boltzmann machine model.
in which the probability parameter ) =
exp
is the likely-hood proba- bility of the data
. We therefore obtain the likely-hood distribution density operator
(Φ(D)|w) = |P (Φ(
(Φ(D)|w) |.
To save the sources of qubits, actually, one can also measure the third system and save each ) on the classical memory without utilizing amplitude estimation technique, and finally designs the unitary operator
and perform it on the initial state to achieve the state of Eq.(25) when implementing the experiments. Thus the procedure does not depend on the auxiliary qubit and amplitude estimation algorithm necessarily, furthermore this schema does not submerge the quantum supremacy taken by the feature state. On the other hand, we should also point that the number of hidden nodes
(1) is a constant which does not depend on the scale of visible data set D. In fact, our experiment only sets 1 hidden nodes but manifests a descent performance.
Theorem 1: Suppose the ansatz ) can be generated in the way of
) =
) (
), and there are enough layers in the quantum circuit
), s.t.
can generate an arbitrary state, then the physical quantity Tr (
(Φ(
))) can be recognized as a measure to represent posterior distribution in quantum settings.
Proof: If we expand the physical quantity under the computational basis, then we have
If we traverse all the possible w, this physical quantity can be simplified as:
Therefore, the physical quantity, s.t. the overlap between likelyhood and priori density matrixes, can be recognized as a measure to represent posterior distribution in quantum settings.
Theorem 2: If the ansatz ) can be prepared by
) =
) (
), then modifying the rotation parameter
can actually traverse all the possible weight parameter
with an affordable error.
Proof: Suppose we perform a sequence of gates intended to approximate some other sequence of gates:
. Then it turns out that the error caused by the entire sequence of imperfect gates is at most the sum of the errors in the individual gates:
Now we consider the local single qubit rotation ) =
, then the difference between
) and
+ ∆
) can be measured as
Combining the theorem, therefore the difference between and
can be measured as
Thus the microelement ), which implies the fact that modifying the rotation parameter
can actually traverse all the possible weight states with an exponential small error.
1. M.Nielsen and I.Chuang. Quantum computation and quantum information. Mathematical Structures in Computer Science, 21(1):1-59 (2002).
2. C.H.Bennett, J.I.Cirac, M.S.Leifer, D.W.Leung, N.Linden, S.Popescu, and G.Vidal. Optimal simulation of two-qubit hamiltonians using general local operations. Physical Review A, 66(1):144-144 (2001).
3. D.W.Berry, A.M.Childs, R.Cleve, R.Kothari, and R.D.Somma. Simulating hamiltonian dynamics with a truncated taylor series. Phys. Rev. Lett, 114(9):090502 (2014).
4. G.H.Low and I.Chuang. Optimal hamiltonian simulation by quantum signal processing. Phys. Rev. Lett., 118(1):010501 (2017).
5. P.W.Shor. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. Siam Rev., 41(2):303-332 (1999).
6. Grover and k.Lov. A fast quantum mechanical algorithm for database search. Phys. Rev. lett., 78:212- 219 (1996).
7. P.Wittek and S.Lloyd. Quantum machine learning. Nature, 549(7671):195 (2017).
8. W.A.Harrow, A.Hassidim, and S.Lloyd. Quantum algorithm for linear systems of equations. Phys. Rev. lett., 103(15):150502 (2009).
9. Leonard Wossnig, Zhikuan Zhao, and Anupam Prakash Quantum linear system algorithm for dense matrices. Phys. Rev. lett., 120, 050502 (2018).
10. L.C.Wan, C.-H.Yu, S.J.Pan, F.Gao, Q.Y.Wen, and S.J.Qin. Asymptotic quantum algorithm for the toeplitz systems. Phys. Rev. A (2018).
11. M.Schuld, I.Sinayskiy, and F.Petruccione. Prediction by linear regression on a quantum computer. Phys. Rev. A, 94(2) (2016).
12. P.Rebentrost, M.Mohseni, and S.Lloyd. Quantum support vector machine for big data classification. Phys. Rev. lett., 113(13):130503 (2014).
13. S.Lloyd, M.Mohseni, and P.Rebentrost. Quantum algorithms for supervised and unsupervised machine learning. arXiv (2013).
14. S.Lloyd, M.Mohseni, and P.Rebentrost. Quantum principal component analysis. Nature Physics, 10(9):108-113 (2013).
15. C.H.Yu, F.Gao, and Q.Y.Wen. Quantum algorithms for ridge regression. arXiv:1707.09524 (2017).
16. N. Wiebe, D. Braun, and S. Lloyd. Quantum algorithm for data fitting, Phys. Rev. Lett. 109, 050505 (2012).
17. Y. Liu and S. Zhang, Fast quantum algorithms for least squares regression and statistic leverage scores, Theor. Comput. Sci. 657, 38 (2017).
18. P.Rebentrost and S.Lloyd. Quantum gradient descent and newton’s method for constrained polynomial optimization. arXiv (2016).
19. B.Kulchytskyy, E.Andriyash, M.Amin, and R.Melko. Quantum boltzmann machine. Phys. Rev. X, 33(2):489-493 (2016).
20. A. Peruzzo, J. McClean, P. Shadbolt, M. Yung, X. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O a´rBrien, A variational eigenvalue solver on a quantum processor. Nat. Commun.4213 (2014).
21. A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, and J. M. Gambetta, Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature.549, 242 (2017).
22. E. Farhi, J. Goldstone, and S. Gutmann, A Quantum Approximate Optimization Algorithm, arXiv:1411.4028, (2014).
23. E. Farhi and A. W. Harrow, Quantum supremacy through the Quantum Approximate Optimization Algorithm, arXiv:1602.07674, (2016).
24. J. S. Otterbach, R. Manenti, N. Alidoust, A. Bestwick, M. Block, B. Bloom, S. Caldwell, N. Didier, E. S. Fried, S. Hong, P. Karalekas, C. B. Osborn, A. Papageorge, E. C. Peterson, G. Prawiroatmodjo, N. Rubin, C. A. Ryan, D. Scarabelli, M. Scheer, E. A. Sete, P. Sivarajah, R. S. Smith, A. Staley, N. Tezak,W. J. Zeng, A. Hudson, B. R. Johnson, M. Reagor, M. P. da Silva, and C. Rigetti, Unsupervised machine learning on a hybrid quantum computer, arXiv:1712.05771, (2017).
25. K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Quantum circuit learning, Phys. Rev. A 98, 032309 (2018).
26. Ryan LaRose, Arkin Tikku, ´letude O
a´rNeel-Judy, Lukasz Cincio and Patrick J. Coles Variational quantum state diagonalization, npj Quantum Information (2019).
27. M. Schuld, K. Brsadler, R. Israel, D. Su, and B. Gupt. A quantum hardware-induced graph kernel based on Gaussian Boson sampling. arXiv:1905.12646.
28. M. Schuld, Quantum machine learning in feature Hilbert spaces. Phys. Rev. Lett. (2019)
29. N. Friedman et al. Bayesian network classifiers. Machine Learning, 29(2-3):131-163, (1997).
30. H. Buhrman, R. Cleve, J.Watrous, and R. deWolf, Quantum fingerprinting. Phys. Rev. Lett. 87, 167902 (2001).
31. Spall J. C. Multivariate stochastic approximation using a simultaneous perturbation gradient approxi- mation In Proceedings of C IEEE Transactions on Automatic Control, pages 332-341 (1992).
32. Zhikuan Zhao et al. Bayesian deep learning on a quantum computer, arXiv:1806.11463, (2018).
33. Hanchen Xiong et al. Towards sparsity and selectivity: bayesian learning of restricted Boltzmann machine for early visual features ICNN, (2014).
34. Havlicek, Vojtech, et al. Supervised learning with quantum-enhanced feature spaces. Nature: 567.7747 (2019): 209-212.
35. Goldberg L. A., Guo H. The complexity of approximating complex-valued Ising and Tutte partition functions Computational Complexity., 26(4): 765-833, (2017).
36. G.Brassard, P.Hoyer, M.Mosca, and A.Tapp. Quantum amplitude amplification and estimation. Quan- tum Computation and Information, 5494:53-74 (2012).
37. Huawei HiQ team, ’Huawei HiQ: A High-performance quantum computing simulator and programming framework’, http://hiq.huaweicloud.com.
38. S.S.Zhou, T.Loke, J.A.Izaac, and J. B. Wang. Quantum fourier transform in computational basis. Quantum Information Processing, 16(3):82 (2017).
39. Yusen Wu et al. Quantum conditional random field. arXiv:1901.01027, (2019).
This work is supported by NSFC (Grant Nos. 61672110, 61671082, 61976024, 61972048), and the Fundamental Research Funds for the Central Universities (Grant No.2019XD-A01). We thank Bailing Zhang, Shijie Pan and Binbin Cai on discussing the writing thoughts, and we also acknowledge using of the HiQ for this work.
Y.W. contributed to the initiation of the idea. All authors wrote and reviewed the manuscript.