The D-Wave 2000Q quantum annealing system is an adiabatic quantum system [1] built around a 2048-qubit processor. Quantum annealing can improve performance on a number of optimization problems [2, 3, 4]. Recent work has explored both the theoretical prospects of machine learning on the D-Wave [5, 6, 7] and the use of the D-Wave for machine learning applications [8, 9]. Building on previous work, we describe a hybrid system in which a deep convolutional autoencoder neural network compresses images for use in quantum machine learning. We demonstrate this compression technique using a Restricted Boltzmann Machine (RBM) and the D-Wave 2000Q. The D-Wave 2000Q is limited by the number of qubits available, which limits the overall size of the problem that can be embedded on it. In addition, image-based sampling on the D-Wave is complicated by the fact that only binary information can be sampled. We overcome both issues by using a classical deep convolutional autoencoder to translate between the image representation on the classical machine and the image representation on the D-Wave. In doing so, we currently compress a 28 x 28 grayscale image to a 6 x 6 binary image and recover the original 28 x 28 grayscale image (with some loss). We show how we trained an RBM using the D-Wave as the sampler and how sampling from the D-Wave provides enough low-level noise to extract new variants of images. We compare a hybrid quantum RBM with a hybrid classical RBM using a downstream classification problem. This approach to quantum compression and image mapping is not constrained to quantum RBMs and could be coupled with other quantum-based algorithms.
RBMs have a long history dating back to the original paper introducing Boltzmann machines [10]. Boltzmann machines are known for their intractability on classical machines [11] due to the connectivity among units. In addition to every unit in the visible layer being connected to every unit in the hidden layer, the Boltzmann machine also contains connections between units within the same layer. RBMs are restricted in that there is no connectivity among units within a given layer. An early example of an autoencoder, the RBM is a simple neural network. It is distinct from other neural networks in that it is probabilistic and represents an undirected graphical model.
Due to this property, RBMs can be used to learn a stochastic representation of their input, modeling the input distribution [11]. During training, the model parameters are changed so that the probability distribution fits the input data [12].
The general structure of an RBM is one visible layer, one hidden layer, and two corresponding bias vectors, as shown in Figure 1. RBMs in the standard form have binary values for the visible units v and the hidden units h.
Figure 1: Restricted Boltzmann Machine Network Diagram.
2.1 Energy Function
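The energy of a given configuration of the visible units v and hidden units h is defined by the following equation:
$$E(v, h) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i} \sum_{j} v_i w_{ij} h_j$$
where v represents the visible units, h represents the hidden units, a and b represent the visible and hidden bias vectors, and w represents the weight matrix. E defines a probability distribution over v and h, and training improves the quality of the model by lowering the energy of configurations that match the training data.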
The joint probability distribution is defined by the following equation (Gibbs distribution):
$$P(v, h) = \frac{1}{Z} e^{-E(v, h)}$$
Z is the partition function, which normalizes this distribution and is calculated over all possible states of v and h:
$$Z = \sum_{v, h} e^{-E(v, h)}$$
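As an illustration, the energy and the Gibbs distribution can be evaluated exactly for a very small RBM by enumerating every state. The sketch below is not the implementation used in our experiments. It assumes the standard RBM energy E(v, h) = -a·v - b·h - vᵀWh with a as the visible bias and b as the hidden bias, and the function names and the 3 x 2 network size are chosen only for this example.

```python
import itertools
import numpy as np

def energy(v, h, a, b, W):
    """Standard RBM energy: E(v, h) = -a.v - b.h - v^T W h (assumed convention)."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

def joint_distribution(a, b, W):
    """Enumerate all binary (v, h) states and return {(v, h): P(v, h)}."""
    n_v, n_h = W.shape
    states = [
        (np.array(v), np.array(h))
        for v in itertools.product([0, 1], repeat=n_v)
        for h in itertools.product([0, 1], repeat=n_h)
    ]
    unnormalized = np.array([np.exp(-energy(v, h, a, b, W)) for v, h in states])
    Z = unnormalized.sum()  # partition function: sum over all states
    return {
        (tuple(v.tolist()), tuple(h.tolist())): p / Z
        for (v, h), p in zip(states, unnormalized)
    }

# Toy parameters (hypothetical): 3 visible units, 2 hidden units.
rng = np.random.default_rng(0)
a = rng.normal(size=3)
b = rng.normal(size=2)
W = rng.normal(size=(3, 2))
P = joint_distribution(a, b, W)
```

The enumeration visits 2^(n_v + n_h) states, so it is only feasible for toy sizes. This exponential cost is the reason Z is not computed directly for the 36 and 49 visible unit models considered later.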
Within a layer, unit states are conditionally independent: the conditional probability of each visible unit's state depends only on the current state of the hidden units, and the conditional probability of each hidden unit's state depends only on the current state of the visible units. This independence is due to the fact that there are no connections between units of a given layer. It makes Gibbs sampling more efficient because the states of all units in a given layer can be sampled jointly [12].
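A minimal sketch of this layer-wise sampling is given below. It assumes the standard binary RBM conditionals, P(h_j = 1 | v) = sigmoid(b_j + Σ_i v_i w_ij) and P(v_i = 1 | h) = sigmoid(a_i + Σ_j w_ij h_j), which follow from an energy of the form -a·v - b·h - vᵀWh. The function names and the random initial parameters are hypothetical and serve only as an example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, b, W, rng):
    """Sample every hidden unit at once, given the visible layer."""
    p_h = sigmoid(b + v @ W)
    return (rng.random(p_h.shape) < p_h).astype(np.int8)

def sample_visible(h, a, W, rng):
    """Sample every visible unit at once, given the hidden layer."""
    p_v = sigmoid(a + h @ W.T)
    return (rng.random(p_v.shape) < p_v).astype(np.int8)

def gibbs_step(v, a, b, W, rng):
    """One full Gibbs step: v -> h -> v'."""
    h = sample_hidden(v, b, W, rng)
    v_next = sample_visible(h, a, W, rng)
    return h, v_next

# Example with 36 visible and 18 hidden units (untrained, random weights).
rng = np.random.default_rng(0)
a = np.zeros(36)
b = np.zeros(18)
W = rng.normal(scale=0.1, size=(36, 18))
v0 = rng.integers(0, 2, size=36)
h0, v1 = gibbs_step(v0, a, b, W, rng)
```

Because there are no connections within a layer, each call samples a whole layer with one matrix product instead of looping over units.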
2.2 Learning
Maximizing the probability of the training data under the model corresponds to maximum likelihood estimation, in which a likelihood function is maximized over the model parameters. To maximize the likelihood function, the gradients of the log-likelihood are needed, but calculating these gradients is not tractable. Instead, Gibbs sampling [13], a Markov Chain Monte Carlo (MCMC) method, is used to sample from the joint Boltzmann distribution. Gibbs sampling often requires many steps and can become computationally expensive. Therefore, a method known as Contrastive Divergence [14] can be used to perform only n steps of Gibbs sampling. It has been shown that in some cases even 1 step is enough for training the RBM [14].
Given a set of training samples, the RBM is trained by adjusting the model parameters so that the model's probability distribution fits the probability distribution of the training data.
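The following sketch illustrates a single-step Contrastive Divergence (CD-1) update in the style of [14] for a classical binary RBM. It is a simplified illustration rather than the training code used in our experiments. The learning rate, batch size, toy data, and function names are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(V, a, b, W, lr, rng):
    """One CD-1 parameter update on a batch V of shape (batch, n_visible)."""
    # Positive phase: hidden probabilities driven by the data.
    p_h0 = sigmoid(b + V @ W)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: a single Gibbs step gives a reconstruction.
    p_v1 = sigmoid(a + h0 @ W.T)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(b + v1 @ W)
    # Approximate log-likelihood gradient: data term minus model term.
    n = V.shape[0]
    W_new = W + lr * (V.T @ p_h0 - v1.T @ p_h1) / n
    a_new = a + lr * (V - v1).mean(axis=0)
    b_new = b + lr * (p_h0 - p_h1).mean(axis=0)
    return a_new, b_new, W_new

# Toy run (hypothetical data): every training vector is the pattern 1100.
rng = np.random.default_rng(0)
V = np.tile(np.array([1.0, 1.0, 0.0, 0.0]), (16, 1))
a = np.zeros(4)
b = np.zeros(2)
W = rng.normal(scale=0.01, size=(4, 2))
for _ in range(200):
    a, b, W = cd1_update(V, a, b, W, lr=0.1, rng=rng)
```

In the quantum variant, the negative-phase samples would come from the annealer instead of the Gibbs step, while the form of the update stays the same.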
2.3 Generative Models
Generative models model the joint probability distribution rather than a conditional probability distribution. For image generation, deep networks are used to learn a distribution that is similar to the input data distribution. The distribution of sampled output does not have a relationship with the distribution of samples from input variables. RBMs learn a joint probability distribution P(v, h), where v represents the visible units and h represents the hidden units. Given P(v, h), sampling from this distribution could produce output that is not necessarily a recreation of a sample from the input distribution. Previous work has used RBMs for generative sampling [15, 16, 17]. We illustrate this type of learning in Figure 2.
Figure 2: Generative Learning Data Distributions.
2.4 RBMs on the D-Wave
In order to implement an RBM that uses the D-Wave, the RBM problem to be solved on the D-Wave needs to be expressed as a quadratic unconstrained binary optimization (QUBO) objective function [18]. We express the QUBO in terms of a Chimera graph [19], which is the architecture formed by D-Wave qubit connectivity. This architecture (a 2-D grid) consists of connected groupings of 'unit cells', where each cell contains a set of qubits that have bipartite connectivity locally and are connected to qubits in other cells through couplers [20]. Minor embedding maps logical qubits to physical qubits, so that the nodes and edges of the problem graph map to qubits and couplers in the Chimera graph. This architecture is similar to the architecture of an RBM.
Bias noise in this model is different from bias noise in classical RBMs, in that quantum RBM bias variables are random [21].
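To make the QUBO formulation concrete, the sketch below builds a QUBO dictionary from RBM parameters. It assumes the standard energy -a·v - b·h - vᵀWh and stacks the visible and hidden units into one binary vector x = (v, h), so that the QUBO objective xᵀQx equals the RBM energy. The function names and variable indexing are hypothetical, and any scaling of the coefficients to the hardware range or to an effective temperature is omitted.

```python
import numpy as np

def rbm_to_qubo(a, b, W):
    """Return {(i, j): coefficient} with visible units 0..n_v-1, hidden after."""
    n_v, n_h = W.shape
    Q = {}
    for i in range(n_v):
        Q[(i, i)] = -float(a[i])              # linear term of visible unit i
    for j in range(n_h):
        Q[(n_v + j, n_v + j)] = -float(b[j])  # linear term of hidden unit j
    for i in range(n_v):
        for j in range(n_h):
            Q[(i, n_v + j)] = -float(W[i, j])  # visible-hidden coupling
    return Q

def qubo_energy(Q, x):
    """Evaluate x^T Q x for a binary vector x."""
    return sum(coeff * x[i] * x[j] for (i, j), coeff in Q.items())

# Small check with hypothetical parameters: 4 visible and 2 hidden units.
rng = np.random.default_rng(1)
a = rng.normal(size=4)
b = rng.normal(size=2)
W = rng.normal(size=(4, 2))
Q = rbm_to_qubo(a, b, W)
```

A dictionary keyed by index pairs is the QUBO form accepted by `dimod.BinaryQuadraticModel.from_qubo`. The resulting problem graph is complete bipartite, which is why it must be minor-embedded into the sparser Chimera graph.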
We follow the approach of mapping images to the RBM by having each pixel of the image represented by a visible unit of the RBM. However, since images go through the autoencoder before RBM processing, we formulate an encoding based on the number of pixels and a compression size.
Early work by Dorband [22] explored an RBM implementation using the D-Wave. This implementation used a different approach and was not necessarily identifying generative differences between a classical and a quantum approach. In the past, generative sampling using the D-Wave with a similar approach for the RBM yielded poor results. In the work by Thulasidasan et al. [23], samples generated after training on down-scaled MNIST digits were not visibly distinguishable. Previous work by Adachi et al. [24] described a generative RBM with 32 visible nodes and 32 hidden nodes using 512 qubits for sampling. They showed that quantum sampling was able to achieve accuracy comparable to the classical system with fewer iterations.
Work by Romero et al. [5] describes tools that reduce experimental overhead as advantageous and introduces the idea of a quantum autoencoder. Though they describe using the quantum autoencoder for compression, they solve a different problem than the one we address. Work by Ni et al. [25] compared a classical RBM with a quantum RBM on binary problems and saw some improvements using the quantum RBM. Most experiments were simple binary problems. We build on this work and extend it to support RBMs for real problems (such as MNIST). Recent work by Amin et al. [6] explored using a quantum RBM for generative sampling. Interestingly, when experimenting with their fully-connected model of 8 inputs and 3 outputs of binary data, they showed that, for a small test, the distribution learned using the quantum method was very different from the actual distribution when compared with the classical method. Recent work by Khoshaman et al. [7] used a quantum Boltzmann machine to generate the latent space for a variational autoencoder, showing state-of-the-art results on the MNIST dataset. This work is most closely related to ours in that both explore using the D-Wave for sampling to generate a latent space. However, our approach provides a hybrid classical-quantum method to overcome quantum hardware limitations that affect the number of qubits available to represent problems.
To summarize, many of the early theoretical contributions have shown that using the D-Wave for generative sampling can enable faster learning of the latent space. However, as Amin et al. [6] concluded, the learned distribution can deviate significantly from a given actual distribution. We believe the classical autoencoder performing the translation between the classical system and the quantum system acts as a stabilizer for the latent space since part of the latent space is captured on the classical side.
Our approach is designed to address the challenges of image sampling using the D-Wave with regard to the number of qubits available by simultaneously mapping binary output to floating point values and the original image space to a compressed binary image space. The overall architecture of our approach is shown in Figure 3 using the MNIST dataset as an example dataset. An autoencoder is trained on the original input, a bottleneck is defined in terms of the compression size, and the autoencoder learns how to map binary compressed encodings to the original grayscale full resolution images.
We use a 3-layer convolutional autoencoder with compression sizes of 6 x 6 and 7 x 7. We use the bottleneck of the autoencoder to generate the binary compression.
After training, we take the encoded data and use it to train the RBM. Each pixel of the encoded input is represented by a unit in the visible layer. For example, a 6 x 6 binary compression has 36 units in the visible layer. To map this onto the D-Wave, we create an embedding that has 36 visible units, 18 hidden units, and connectivity from each visible unit to each hidden unit. Hence, with at most 2048 qubits on the D-Wave 2000Q system (not all of which are available), compressing to a size larger than 7 x 7 exceeds the capacity of the D-Wave.
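The step from bottleneck activations to a binary code can be pictured with the sketch below. It assumes that the bottleneck produces values in [0, 1] and that binarization is a simple threshold at 0.5. Both are illustrative assumptions, and the random array only stands in for real encoder output.

```python
import numpy as np

def binarize_codes(codes, threshold=0.5):
    """Map real-valued bottleneck activations in [0, 1] to {0, 1}."""
    return (np.asarray(codes) >= threshold).astype(np.int8)

def to_visible_units(binary_codes):
    """Flatten (batch, side, side) binary codes to (batch, side * side)."""
    return binary_codes.reshape(binary_codes.shape[0], -1)

rng = np.random.default_rng(0)
codes = rng.random((5, 6, 6))  # stand-in for 6 x 6 encoder outputs
visible = to_visible_units(binarize_codes(codes))
```

Each row of `visible` then holds the 36 binary values that would be presented to the visible layer of the RBM.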
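The size of the logical problem grows quickly with the compression size, as the short calculation below illustrates. It counts only logical variables and visible-to-hidden couplings. The hidden layer is taken as half of the visible layer, rounded down for odd sizes, and this rounding is an assumption. The number of physical qubits is not computed here, since it depends on the embedding that is found.

```python
def rbm_problem_size(side):
    """Logical size of an RBM for a side x side binary encoding."""
    n_visible = side * side
    n_hidden = n_visible // 2  # assumption: half the visible layer, rounded down
    return {
        "visible": n_visible,
        "hidden": n_hidden,
        "logical_variables": n_visible + n_hidden,
        "couplings": n_visible * n_hidden,  # complete bipartite connectivity
    }

for side in (6, 7, 16):
    print(side, rbm_problem_size(side))
```

A 6 x 6 encoding gives 54 logical variables and 648 couplings. Each qubit in the Chimera graph connects to at most six others, so every logical variable has to be represented by a chain of several physical qubits, and the physical qubit count is far larger than the logical one.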
During training, the RBM learns the best configuration for recreating the binary encodings, meaning the learned data distribution moves towards the actual training data distribution. When this learned distribution no longer improves (converges), we use the trained RBM for sampling binary encodings. Those encodings are then decoded by the autoencoder to obtain images in the original representation space, both in size and in pixel values (grayscale rather than binary).
Figure 3: Hybrid Approach that uses a Classical Autoencoder to map the Image Space to a Compressed Space.
We used the MNIST dataset to evaluate this approach. The autoencoder runs on a classical machine (Intel Core i7-7700HQ CPU 2.8GHz x 8) with 32 GB of memory and a GeForce GTX 1060 GPU, and takes as input grayscale MNIST digits of size 28 x 28. The output from the encoder is compressed binary data of size 7 x 7 in run 1 and 6 x 6 in run 2. The autoencoder is trained using the 60,000 MNIST training digits. The RBM is then trained using the binary encodings of these 60,000 digits. Using the D-Wave during training, the model is updated by sampling states of P(v, h) that minimize E. Once convergence is reached during training, i.e., samples can be recreated, training ends. The D-Wave is then used for sampling to obtain learned binary encodings. The samples obtained from the D-Wave are then given to the classical autoencoder to recover 28 x 28 grayscale MNIST images. To assess the quality of the generated MNIST images, we performed downstream classification experiments.
For the quantum RBM, we use the D-Wave API for working with the quantum annealer. We use the D-Wave 2000Q solver. We create a BinaryQuadraticModel for the QUBO and use minorminer to create the embedding. Samples are obtained using the DWaveSampler. These experiments do not include the new D-Wave Hybrid API (our experimentation began before the Hybrid API was available). The classical RBM is built to be as similar as possible to the quantum RBM. Both methods use Gibbs sampling and contrastive divergence. Both have a visible layer with unit size equal to the number of encoded values (36 for 6 x 6 compressions and 49 for 7 x 7 compressions) and a hidden layer that is half of the size of the visible layer.
After training the quantum RBM, we sample from the D-Wave to obtain 60,000 samples, where each class is balanced based on the original size of the MNIST training data. We use those samples to train a downstream deep convolutional neural network to classify 10,000 unseen MNIST test digits. We compare these results with training a downstream deep convolutional neural network using the original MNIST training data and classifying the same 10,000 unseen MNIST test digits. We also compare these results with training a downstream deep convolutional neural network using the classical RBM recreated samples as training data and classifying the same 10,000 unseen MNIST test digits. With the classical RBM, Gibbs sampling is used.
Though classical RBMs and quantum RBMs differ in the way they learn, this comparison provides insight into how the D-Wave sampling method compares to the Gibbs sampling method performed on the classical system. These experiments are intended to be the overfit case, in that we train on a set of images and try to regenerate those training samples.
The training data in each experiment consists of 60,000 samples (with a subset reserved for validation) and the test set consists of 10,000 samples (unseen). We trained the classifier for 5 epochs, since accuracy can reach 100% with the original data set in 5 epochs.
As shown in Table 1, the classifier trained with the original MNIST training data achieved about 99% accuracy (ExpID 1). This measure serves as a benchmark for the maximum accuracy that could be achieved. We then drew samples from a classical RBM after training it on the original 28 x 28 grayscale MNIST digits. The learned samples were then used to train the downstream MNIST classifier. In this case, the accuracy was 98% (ExpID 2), as shown in Table 1. We used this result as a baseline for what the classical RBM trained on the original data could achieve.
Table 1: Comparing MNIST classification accuracy scores when using the original training data and samples generated from a classical RBM.
The next set of experiments as shown in Table 2 was used to evaluate the classical autoencoder binary compression recovery of the original MNIST data representation.
Table 2: Comparing MNIST classification accuracy when using different compressed encodings that are decoded to recover the original MNIST representations.
ExpID 3 isolates the effect of binarization alone: MNIST digits were converted from grayscale to binary and back to grayscale at the original 28 x 28 size, with no reduction in size. The results in this case were 97% accuracy (averaged over three runs). This was compared to compressing the digits to a binary 16 x 16 size and then decoding back to 28 x 28 grayscale (ExpID 4). When these images were used for training, the classifier achieved 95% accuracy. The importance of a 16 x 16 compression is that it is a size we anticipate the next generation D-Wave will support, given an increase in the number of qubits that will be available. We treat this accuracy as a potential upper bound for what could be achieved once a 16 x 16 compression can be supported on the D-Wave. ExpID 5 evaluated MNIST digits compressed to a binary 6 x 6 size and decoded back to the 28 x 28 grayscale representation, achieving 91% accuracy. These encoding/decoding results provide a secondary baseline, in that they give an upper bound for the accuracy that could be obtained using encoded learned samples from an RBM.
Examples of 6 x 6 grayscale encoding/decoding results are shown in Figure 4, where the first row represents the original 28 x 28 MNIST digits, the second row represents the 6 x 6 grayscale encodings, the third row represents the 6 x 6 binary encodings and the fourth row represents the recovered 28 x 28 digits. It is observed from the decoded digits that there are times when the digits are incorrectly decoded in the case of 6 x 6. This is due to the loss incurred when compressing from the 28 x 28 to 6 x 6 binary.
Figure 4: Examples of 6 x 6 binary encoding and decoding to recover 28 x 28 grayscale MNIST digits.
In Table 3 we show results related to using MNIST digits recovered from the encodings of the size required for embedding the RBM model on the D-Wave.
Table 3: Comparing MNIST classification accuracy when using classical and quantum RBM samples.
We compress MNIST digits from 28 x 28 to binary 7 x 7 and also from 28 x 28 to binary 6 x 6. In these experiments, we encoded and converted the MNIST digits to a 6 x 6 binary representation. We then trained a purely classical RBM and also the quantum RBM. For both RBMs, the visible layer was composed of 36 units and the hidden layer of 18 units. We set the learning rate for the classical RBM to be the same as for the quantum RBM.
We achieved 75% accuracy using the downstream classifier with the samples obtained by Gibbs sampling from the classical RBM (ExpID 6). We achieved 72% accuracy using the downstream classifier with the samples obtained from the quantum RBM (ExpID 7). By modifying the autoencoder to include dropout layers and by increasing the number of RBM epochs, we were able to achieve a 12 percentage point increase in downstream classification accuracy, which gave us 72% accuracy (previously we achieved 60% accuracy on average).
For the quantum RBM, using 6 x 6 binary encodings, we were able to recover MNIST digits, as shown in Figure 5 when sampling from the D-Wave after training and using the classical autoencoder to decode sampled encodings.
Figure 5: Recovered MNIST digits from the quantum RBM after a 6 x 6 binary encoding.
The quality of the digits tends to be better when using 7 x 7 binary encodings sampled from the D-Wave and decoded using the classical autoencoder, as shown in Figure 6. However, downstream classification results were not significantly different, with only modest improvements using 7 x 7 decoded samples.
Figure 6: Recovered MNIST digits from the quantum RBM after a 7 x 7 binary encoding.
All results are shown in Table 4.
Table 4: Downstream classification results for all ExpIDs.
6.1 Measuring Image Similarity
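To understand how samples generated using the quantum RBM differ from the original data set and from the classical RBM samples, we use the structural similarity (SSIM) metric [26], defined by the following equation:
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$
where x and y are the two images to be compared, \mu_x is the average of x, \mu_y is the average of y, \sigma_x^2 is the variance of x, \sigma_y^2 is the variance of y, \sigma_{xy} is the covariance of x and y (the term that captures their correlation), and c_1 and c_2 are the small stabilizing constants of [26].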
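For illustration, the SSIM of [26] can be computed in its single-window form from the means, variances, and covariance of two images, as sketched below. This sketch evaluates the statistics once over the whole image, whereas [26] averages the index over local windows, so the values will differ from a windowed implementation. The constants K1 = 0.01 and K2 = 0.03 follow [26], and the assumption that pixel values lie in [0, 1] is ours.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM computed from whole-image statistics."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Toy 28 x 28 arrays standing in for two digit images.
rng = np.random.default_rng(0)
img_a = rng.random((28, 28))
img_b = rng.random((28, 28))
score = global_ssim(img_a, img_b)
```

An image compared with itself scores 1, and the score is symmetric in its two arguments.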
We use this method to compare generated MNIST images with the original training data set of 60,000 samples. We also use it to measure how much similarity there is among samples within a dataset. Computing SSIM over a whole dataset would require comparing each image against all 60,000 images, which would be extremely compute intensive. Instead, given a dataset D, we randomly select a set of n images from D, compare each selected image with the remaining 59,999 images, and average the results to calculate the SSIM. To measure the SSIM between two datasets D_1 and D_2, we compare them 1-to-1 if there is a 1-to-1 mapping between D_1 and D_2. If there is no 1-to-1 mapping, as is the case with the D-Wave generated samples, we use the same sampling method.
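The sampling procedure can be sketched as follows. The code is an illustration under stated assumptions rather than the evaluation code used for the tables: it uses a single-window SSIM instead of the windowed index of [26], a small random array in place of the 60,000 images, and hypothetical function names.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM computed from whole-image statistics."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (x.var() + y.var() + c2)
    )

def dataset_self_similarity(images, n, rng):
    """Average SSIM between n randomly chosen images and all other images."""
    images = np.asarray(images, dtype=np.float64)
    chosen = rng.choice(len(images), size=n, replace=False)
    scores = [
        global_ssim(images[i], images[j])
        for i in chosen
        for j in range(len(images))
        if j != i
    ]
    return float(np.mean(scores))

def paired_similarity(images_a, images_b):
    """Average SSIM over 1-to-1 pairs, used when such a mapping exists."""
    return float(np.mean([global_ssim(x, y) for x, y in zip(images_a, images_b)]))

rng = np.random.default_rng(0)
toy = rng.random((20, 28, 28))  # stand-in for a dataset of decoded digits
score = dataset_self_similarity(toy, n=5, rng=rng)
```

With n selected images and a dataset of N images, the cost is n(N - 1) comparisons rather than N(N - 1), which is what makes the measure practical for N = 60,000.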
To determine the best n, we experimented with different values. In Table 5, we show the average SSIM score as a function of the number of samples used to calculate it for a dataset of 60,000 samples. The average similarity across digits is stable whether 10, 100, or 1000 samples are used. Therefore, we use 100 samples to measure similarity among images in a given dataset.
Table 5: Measuring how similar a sample of decodings is to the remaining decodings in the dataset after binary encoding/decoding and comparing this to measuring similarity among the original MNIST digits.
We compare this measure for the original MNIST dataset with digits that are decoded after encoding and compressing down to a 7 x 7 size. For 7 x 7 binary encoding followed by decoding to the original grayscale 28 x 28 representation, the image sample similarity score differs by only 0.01 between the original MNIST digits and the encoded/decoded recovered digits. In Table 6 we show a comparison of image sample similarity scores for the original MNIST dataset, the encoded/decoded recovered digits, and the classical RBM learned encoded/decoded recovered digits. As observed, there tends to be more duplication among what is learned using the classical RBM.
Table 6: Measuring dataset similarity using SSIM and comparing the original MNIST dataset with the binary encoded/decoded dataset, with the Classical RBM learned binary encoded/decoded dataset, and with the Quantum RBM learned binary encoded/decoded dataset.
To get a better idea of how much images overlap structurally within a given class, we calculated SSIM measures for each class of the original training data set, again using sample sets of size 100. The digits classified as the number 1 tend to have a higher average SSIM score, as would be expected. We show these results in Table 7.
Table 7: Measuring dataset similarity using SSIM by Digit using the original MNIST dataset.
6.2 Generating MNIST Samples
We examined individual MNIST digits of large samples taken from the D-Wave. Using the MNIST digit 3, we show a large sampling in Figure 7 of this digit after training the RBM using the D-Wave as the sampler. We sampled 100,000 3’s.
Figure 7: D-Wave samples of the digit 3 after training the RBM.
We also show its binary output and recovered digit in Figure 8 from the D-Wave sampling. As can be seen, both binary output and recovered digits represent a distinct set of 3’s that were not necessarily part of the original training distribution.
Figure 8: D-Wave samples of the digit 3 after training the RBM.
We were able to produce these variations in samples when training on hundreds of samples of the original training digits. These variations were reproducible on the original D-Wave 2000Q system. When the D-Wave 2000Q was replaced with the D-Wave 2000Q lower-noise system (holding all parameters constant), we were not able to reproduce these results, as shown in Figure 9.
Figure 9: Using Low Noise D-Wave Quantum Annealing for RBM sampling training on the digit 3.
In addition, using simulated annealing for sampling also produced results that were not comparable, as shown in Figure 10. We conclude from these results that thermal fluctuations were sufficient to enable the D-Wave sampling to produce variations in the sampled encodings, and in the digits obtained after decoding the generated encodings.
Figure 10: Using Simulated Annealing for RBM sampling training on the digit 3.
However, with the D-Wave 2000Q lower-noise system, when we reduced the learning rate and increased the number of epochs, we were able to achieve variations in samples.
6.3 The First Quantum Generated Fashion
In addition to the MNIST dataset, we also trained the quantum RBM using the MNIST-Fashion dataset [27]. Designed to be similar to the MNIST dataset as shown in Figure 11, but harder to classify, the Fashion MNIST dataset provides another dataset for experimenting with the D-Wave. There are 10 classes, 60,000 images in total, and the images are grayscale sized at 28 x 28.
Figure 11: Fashion MNIST
In Figure 12, the first row shows the original grayscale fashion images, the second row the 7 x 7 encoded fashion images, the third row the 7 x 7 binary encoded fashion images, and the final row the 28 x 28 grayscale decoded fashion images. As can be observed, the autoencoder tends to have a harder time generating details on shirts, shoes, and handbags.
Figure 12: Fashion MNIST Generated By Sampling the D-Wave After Training the Quantum RBM and Decoding the 7 x 7 Samples.
In Figure 13 we show decoded samples from the D-Wave after training the quantum RBM. Future experiments will include classification of generated Fashion MNIST samples.
Figure 13: New Shirts - Fashion MNIST Generated By Sampling the D-Wave After Training the Quantum RBM and Decoding the 7 x 7 Samples.
The method we described provides a way to overcome the limitations of the D-Wave 2000Q by providing a hybrid method for mapping original data representations to a representation that can be processed using quantum annealing.
Our prior attempts at using the classical autoencoder for this work were unsuccessful where learned encodings from the D-Wave did not decode to digits. Initially we began with a fully connected autoencoder, which showed only hints of a digit recovered. We saw improvement when moving to a convolutional neural network for the autoencoder. However, the real improvements came when using a denoising autoencoder for this work. By applying Gaussian noise to the images prior to encoding them, we saw improved quality when decoding the quantum sampled learned encodings.
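The noise step of a denoising autoencoder can be sketched as below: the noisy image is the network input and the clean image is the reconstruction target. The noise level of 0.3 and the [0, 1] pixel range are assumptions made for the example, since the values used in our experiments are not restated here, and the function name is hypothetical.

```python
import numpy as np

def add_gaussian_noise(images, sigma, rng):
    """Corrupt images with zero-mean Gaussian noise and keep pixels in [0, 1]."""
    images = np.asarray(images, dtype=np.float64)
    noisy = images + rng.normal(loc=0.0, scale=sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

rng = np.random.default_rng(0)
clean = rng.random((4, 28, 28))  # stand-in for normalized MNIST digits
noisy = add_gaussian_noise(clean, sigma=0.3, rng=rng)
# Training pairs for a denoising autoencoder would be (noisy, clean).
```

Training against corrupted inputs encourages the decoder to map nearby encodings to plausible digits, which is consistent with the improved decoding of sampled encodings described above.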
Using a classical RBM with Gibbs sampling did not produce the same sort of variations as when the D-Wave was used as the sampler. In fact, when running a number of experiments varying the number of epochs and the learning rate on the classical RBM, we often saw the network learning only one digit, and we saw this more frequently using 6 x 6 compressed encodings. As we moved up to 16 x 16 encodings, we could finally see the classical RBM learning different types of digits. We conclude from these experiments that the quantum RBM was able to tolerate and learn from these highly compressed encodings whereas the classical RBM could not. Though we are able to decode results from the classical RBM, they often decoded to the same digit.
Though RBMs are generative, using an RBM to generate new images from a latent representation, as opposed to recreating training samples, is not typically performed on the classical computer but has been achieved, for example with deep layered RBMs [28]. By taking advantage of the inherent noise on the D-Wave and the natural quantum properties of the D-Wave, we have been able to successfully use it to generate images. Because we were also able to generate images by sampling the D-Wave 2000Q lower-noise system, we conclude that the variations in images were not solely due to thermal fluctuations. However, more experiments are required to fully support this claim. Though there were runs in which the quantum D-Wave RBM learned only a single digit, we were able to create these variations successfully and repeatably. In addition, we observed that when the number of epochs reaches a certain threshold, the samples tend to collapse to a single image.
Given the samples we collect from the D-Wave after training the RBM using the MNIST dataset encodings and also after training the RBM using the MNIST-Fashion dataset, we show early quantum image generation. Thus, paraphrasing Biamonte et al. [8], "if a small quantum [D-Wave 2000Q] processor can produce statistical patterns that are computationally difficult to be produced by a classical computer, then larger quantum annealers [perhaps the 5000 qubit D-Wave processor using the above hybrid RBM method] might recognize patterns that are significantly more difficult to recognize classically".
[1] Edward Farhi, Jeffrey Goldstone, Sam Gutmann, and Michael Sipser. Quantum computation by adiabatic evolution. arXiv preprint quant-ph/0001106, 2000.
[2] Yoichiro Hashizume, Takashi Koizumi, Kento Akitaya, Takashi Nakajima, Soichiro Okamura, and Masuo Suzuki. Singular-value decomposition using quantum annealing. Physical Review E, 92(2):023302, 2015.
[3] Florian Neukart, Gabriele Compostella, Christian Seidel, David Von Dollen, Sheir Yarkoni, and Bob Parney. Traffic flow optimization using a quantum annealer. Frontiers in ICT, 4:29, 2017.
[4] Hayato Ushijima-Mwesigwa, Christian FA Negre, and Susan M Mniszewski. Graph partitioning using quantum annealing on the D-Wave system. In Proceedings of the Second International Workshop on Post Moores Era Supercomputing, pages 22–29. ACM, 2017.
[5] Jonathan Romero, Jonathan P Olson, and Alan Aspuru-Guzik. Quantum autoencoders for efficient compression of quantum data. Quantum Science and Technology, 2(4):045001, 2017.
[6] Mohammad H Amin, Evgeny Andriyash, Jason Rolfe, Bohdan Kulchytskyy, and Roger Melko. Quantum Boltzmann machine. Physical Review X, 8(2):021050, 2018.
[7] Amir Khoshaman, Walter Vinci, Brandon Denis, Evgeny Andriyash, and Mohammad H Amin. Quantum variational autoencoder. Quantum Science and Technology, 4(1):014001, 2018.
[8] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. Quantum machine learning. Nature, 549(7671):195, 2017.
[9] Daniel O’Malley, Velimir V Vesselinov, Boian S Alexandrov, and Ludmil B Alexandrov. Nonnegative/binary matrix factorization with a D-Wave quantum annealer. PLoS ONE, 13(12):e0206653, 2018.
[10] David H Ackley, Geoffrey E Hinton, and Terrence J Sejnowski. A learning algorithm for Boltzmann machines. Cognitive Science, 9(1):147–169, 1985.
[11] Ruslan Salakhutdinov and Geoffrey Hinton. Deep Boltzmann machines. In Artificial Intelligence and Statistics, pages 448–455, 2009.
[12] Asja Fischer and Christian Igel. Training restricted Boltzmann machines: An introduction. Pattern Recognition, 47(1):25–39, 2014.
[13] Chris K Carter and Robert Kohn. On Gibbs sampling for state space models. Biometrika, 81(3):541–553, 1994.
[14] Geoffrey E Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771–1800, 2002.
[15] Graham W Taylor, Geoffrey E Hinton, and Sam T Roweis. Modeling human motion using binary latent variables. In Advances in Neural Information Processing Systems, pages 1345–1352, 2007.
[16] Tanya Schmah, Geoffrey E Hinton, Steven L Small, Stephen Strother, and Richard S Zemel. Generative versus discriminative training of RBMs for classification of fMRI images. In Advances in Neural Information Processing Systems, pages 1409–1416, 2009.
[17] Sainbaya Sukhbaatar, Takaki Makino, Kazuyuki Aihara, and Takashi Chikayama. Robust generation of dynamical patterns in human motion by a deep belief nets. In Asian Conference on Machine Learning, pages 231–246, 2011.
[18] Endre Boros, Peter L Hammer, and Gabriel Tavares. Local search heuristics for quadratic unconstrained binary optimization (QUBO). Journal of Heuristics, 13(2):99–132, 2007.
[19] Jun Cai, William G Macready, and Aidan Roy. A practical heuristic for finding graph minors. arXiv preprint arXiv:1406.2741, 2014.
[20] D-Wave. D-Wave QPU architecture: Chimera.
[21] Vincent Dumoulin, Ian J Goodfellow, Aaron Courville, and Yoshua Bengio. On the challenges of physical implementations of RBMs. In Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.
[22] John E Dorband. A Boltzmann machine implementation for the D-Wave. In 2015 12th International Conference on Information Technology-New Generations, pages 703–707. IEEE, 2015.
[23] Sunil Thulasidasan. Generative modeling for machine learning on the D-Wave. Technical report, Los Alamos National Lab. (LANL), Los Alamos, NM (United States), 2016.
[24] Steven H Adachi and Maxwell P Henderson. Application of quantum annealing to training of deep neural networks. arXiv preprint arXiv:1510.06356, 2015.
[25] Shitian Ni and Shota Nagayama. Performance comparison on CFRBM between GPU and quantum annealing. Technical report, Mercari, 2018.
[26] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
[27] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
[28] Hengyuan Hu, Lisheng Gao, and Quanbin Ma. Deep restricted Boltzmann networks. arXiv preprint arXiv:1611.07917, 2016.