The authors want to thank Victor Albert, Fernando Brandão, Manuel Endres, Ingo Roth, Joel Tropp, Thomas Vidick and John Wright for valuable inputs and inspiring discussions. Leandro Aolita and Giuseppe Carleo provided helpful advice regarding presentation. Our gratitude extends, in particular, to Joseph Iverson who helped us devising a numerical sampling strategy for toric code ground states. We also thank Marco Paini and Amir Kalev for informing us about their related work [58], where they discussed succinct classical “snapshots” of quantum states obtained from randomized local measurements. HH is supported by the Kortschak Scholars Program. RK acknowledges funding provided by the Office of Naval Research (Award N00014-17-1-2146) and the Army Research Office (Award W911NF121054). JP acknowledges funding from ARO-LPS, NSF, and DOE. The Institute for Quantum Information and Matter is an NSF Physics Frontiers Center.
H.H. and R.K. developed the theoretical aspects of this work. H.H. conducted the numerical experiments and wrote the open source code. J.P. conceived the applications of classical shadows. H.H., R.K. and J.P. wrote the manuscript.
The authors declare no competing interests.
Throughout this work we restrict attention to multi-qubit systems and is a fixed, but unknown, quantum state in
dimensions. We present a general-purpose strategy for predicting many properties of this unknown state. To extract meaningful information about
, we need to perform a collection of measurements.
Definition 1 (measurement primitive). We can apply a restricted set of unitary evolutions , where U is chosen from an ensemble U. Subsequently, we can measure the rotated state in the computational basis
. Moreover, we assume that this collection is tomographically complete, i.e. for each
there exist
and b such that
.
Based on this primitive, we repeatedly perform a simple randomized measurement procedure: randomly rotate the state and perform a computational basis measurement. Then, after the measurement, we apply the inverse of U to the resulting computational basis state. This procedure collapses
to
This random snapshot contains valuable information about in expectation:
For any unitary ensemble U, this relation describes a quantum channel . Tomographic completeness ensures that M — viewed as a linear map — has a unique inverse
and we set
The classical shadow is a modified post-measurement state that has unit trace, but need not be positive semi-definite. However, it is designed to reproduce the underlying state exactly in expectation:
. This classical shadow
corresponds to the linear inversion (or least squares) estimator of
in the single-shot limit. Linear inversion estimators have been used to perform full quantum state tomography [35, 68], where an exponential number of measurements is needed. We wish to show that
can predict many properties from only very few measurements.
Classical shadows are well suited to predict linear functions in the unknown state :
To achieve this goal, we simply replace the (unknown) quantum state by a classical shadow
. Since classical shadows are random, this produces a random variable that yields the correct prediction in expectation:
Fluctuations of around this desired expectation are controlled by the variance.
Lemma 1. Fix O and set , where
is a classical shadow (S3). Then
The norm only depends on the measurement primitive:
It is easy to check that is nonnegative and homogeneous (
). After some work, one can verify that this expression also obeys the triangle inequality, and so is indeed a norm.
Proof. Classical shadows have unit trace by construction (). This feature implies that the variance only depends on the traceless part
of O, not O itself:
Moreover, it is easy to check that the inverse of M (S2) is self-adjoint (for any pair of matrices X, Y with compatible dimension). These two observations allow us to rewrite the variance in the following fashion:
Classical shadows arise from mixing two types of randomness: (i) a (classical) random choice of unitary and (ii) a random choice of computational basis state
that is governed by Born’s rule (S1). Inserting the average over computational basis states produces a (squared) norm that closely resembles the advertised expression, but does depend on the underlying state:
Maximizing over all possible states removes this implicit dependence and produces a universal upper bound on the variance. Ignoring the subtraction of
(which can only make the bound tighter), we obtain (S6).
Lemma 1 sets the stage for successful linear function estimation with classical shadows. A single classical shadow (S3) correctly predicts any linear function in expectation. Convergence to this desired target can be boosted by forming empirical averages of multiple independent shadow predictions. The empirical mean is the canonical example for such a procedure. Construct N independent classical shadows
and set
Each summand is an independent random variable with correct expectation and variance bounded by Lemma 1. Convergence to the expectation value can be controlled by classical concentration arguments (e.g. Chernoff or Hoeffding inequalities). In order to achieve a failure probability of (at most)
, the number of samples must scale like
. While the scaling in variance and approximation accuracy
is optimal, the dependence on
is particularly bad. Unfortunately, this feature of sample mean estimators cannot be avoided without imposing additional assumptions (that do not apply to classical shadows). Median of means [41, 55] is a conceptually simple trick that addresses this issue. Instead of using all samples to construct a single empirical mean (S11), construct K independent sample means and form their median:
for . This estimation technique requires NK samples in total, but it is much more robust with respect to outlier corruption. Indeed,
if and only if more than half of the empirical means individually deviate by more than
. The probability associated with such an undesirable event decreases exponentially with the number of batches K. This results in an exponential improvement over sample mean estimation in terms of failure probability. The main result of this work capitalizes on this improvement.
Theorem 1. Fix a measurement primitive U, a collection of
Hermitian matrices and accuracy parameters
. Set
where denotes the norm defined in Eq. (S7). Then, a collection of NK independent classical shadows allow for accurately predicting all features via median of means prediction (S12):
with probability at least .
Proof. The claim follows from combining the variance estimates from Lemma 1 with a rigorous performance guarantee for median of means estimation [41, 55]: Let X be a random variable with variance . Then, K independent sample means of size
suffice to construct a median of means estimator
that obeys
for all
. The parameters N and K are chosen such that this general statement ensures
for all
. Apply a union bound over all M failure probabilities to deduce the claim.
Remark 1 (Constants in Theorem 1). The numerical constants featuring in N and K result from a conservative (worst case) argument that is designed to be simple, not tight. We expect that the actual constants are much smaller in practice.
Each classical shadow is the result of a single quantum measurement on . Viewed from this angle, Theorem 1 asserts that a total of
measurement repetitions suffice to accurately predict a collection of M linear target functions .
Importantly, this sample complexity only scales logarithmically in the number of target functions M. Moreover, the problem dimension does not feature explicitly. The sample complexity does, however, depend on the measurement primitive via the norm
. This term reflects expressiveness and structure of the measurement primitive in question. This subtle point is best illustrated with two concrete examples. We defer technical derivations to subsequent sections and content ourselves with summarizing the important aspects here.
Example 1: Random Clifford measurements Clifford circuits are generated by CNOT, Hadamard and Phase gates and form the group . The “random global Clifford basis” measurement primitive —
(endowed with uniform weights) — implies the following simple expression for classical shadows and the associated norm
:
We refer to Supplementary Section 5 B for details and proofs. Combined with Eq. (S15), this ensures that random global Clifford basis measurements suffice to accurately predict M linear functions. This prediction technique is most powerful, when the target functions have constant Hilbert-Schmidt norm. In this case, the sample rate is completely independent of the problem dimension
. Prominent examples include estimating quantum fidelities (with pure states), or entanglement witnesses.
Example 2: Random Pauli measurements Although (global) Clifford circuits are believed to be much more tractable than general quantum circuits, they still feature entangling gates, like CNOT. Such gates are challenging to implement reliably on today’s devices. The “random Pauli basis” measurement primitive takes this serious drawback into account and assumes that one is only able to apply single-qubit Clifford gates, i.e. (endowed with uniform weights). This is equivalent to assuming that we can perform arbitrary Pauli (basis) measurements, i.e., measuring each qubit in the X-, Y - and Z-basis, respectively. Such basis measurements decompose nicely into tensor products (
for
) and respect locality. The associated classical shadows and the norm
inherit these desirable features:
Here, locality(O) counts the number of qubits on which O acts nontrivially. We refer to Supplementary Section 5 C for details and proofs. Combined with Eq. (S15) this ensures that local Clifford (Pauli) basis measurements suffice to predict M bounded observables that are at most k-local. For observables that are the tensor product of k single-qubit observables, the sample complexity can be further improved to
. This prediction technique is most powerful when the target functions do respect some sort of locality constraint. Prominent examples include k-point correlators, or individual terms in a local Hamiltonian.
Discussion and information-theoretic optimality These two examples complement each other nicely. Random Clifford measurements excel at performing useful subroutines in quantum computing and communication tasks, such as certifying (global) entanglement, which will be feasible using sufficiently advanced hardware. Their practical utility, however, hinges on the ability to execute circuits with many entangling gates. Random Pauli measurements, on the other hand, are much less demanding from a hardware perspective. In today’s NISQ era, local Pauli operators can be accurately measured using available hardware platforms. While not well-suited for predicting global features, Pauli measurements excel at making local predictions. Furthermore, for both kinds of randomized measurements, linear prediction based on classical shadows saturates fundamental lower bounds from information theory.
Theorem 2 (random Clifford measurements; informal version). Any procedure based on a fixed set of single-copy measurements that can predict, with additive error arbitrary linear functions
, requires at least
copies of the state
.
Theorem 3 (random Pauli measurements; informal version). Any procedure based on a fixed set of single-copy local measurements that can predict, with additive error arbitrary k-local linear functions
, requires at least
copies of the state
.
We refer to Supplementary Section 7 (Clifford) and 8 (Pauli) for further context, details and proofs. In the random Pauli basis measurement setting, classical shadows provably saturate this lower bound only for tensor product observables. For general k-local observables, there is a small discrepancy between (upper bound) and
(lower bound).
Feature prediction with classical shadows readily extends beyond the linear case. Here, we shall focus on quadratic functions, but the procedure and analysis readily extend to higher order polynomials. Every quadratic function in an unknown state can be recast as a linear function acting on the tensor product
:
An immediate generalization of linear feature prediction with classical shadows suggests the following procedure. Take two independent snapshots of the unknown state
and set
This random variable is designed to yield the correct target function in expectation. Similar to linear function prediction we can boost convergence to this desired target by forming empirical averages. To make the best of use of N samples, we average over all (distinct) pairs:
This idea provides a systematic approach for constructing estimators for nonlinear (polynomial) functions. Estimators of this form always yield the desired target in expectation. For context, we point out that the estimator (S20) closely resembles the sample variance, while estimators of higher order polynomials are known as U-statistics [38]. Fluctuations of around its desired expectation are once more controlled by the variance. U-statistics estimators are designed to minimize this variance and therefore considerably boost the rate of convergence.
Lemma 2. Fix O and a sample size N. Then, the variance of the U-statistics estimator (S20) obeys
We emphasize that this variance decreases with the number of samples N. This sets the stage for successful quadratic function prediction with classical shadows. Similar to the linear case, we will not use all samples to construct a single U-statistics estimator. Instead, we construct K of them and form their median:
This renders the entire estimation procedure more robust to outliers and exponentially suppresses failure probabilities.
Theorem 4. Fix a measurement primitive U, a collection of (quadratic) target functions and accuracy parameters
. Set
Then, a collection of NK independent classical shadows allow for accurately predicting all quadratic features via the median of U-statistics estimators (S22):
with probability at least .
Proof. The proof is similar to the argument for linear prediction. We combine the bound on the variance of U-statistics estimators from Lemma 2 with a rigorous performance guarantee for median estimation [41, 55]. Let Z be a random variable with variance at most . Then, setting
produces an estimator that obeys
. The parameter N is chosen ensure that each
has variance at most
. The parameter K is chosen such that each probability of failure is at most
. The advertised statement then follows from taking a union bound over all M target estimations.
Remark 2 (Constants in Theorem 4). The numerical constants featuring in N and K result from a conservative (worst case) argument that is designed to be simple, not tight. We expect that the actual constants are much smaller in practice.
Theorem 4 is a general statement that provides upper bounds for the sample complexity associated with predicting quadratic target functions:
independent randomized measurements suffice to accurately predict a collection of M nonlinear target functions . This sampling rate once more depends on the measurement primitive and it is instructive to consider concrete examples.
Example 1: Random Pauli measurements We first discuss the practically more relvant example for today’s NISQ era: classical shadows constructed from random single-qubit Pauli basis measurements. This measurement primitive remains well-suited for predicting local quadratic features . Suppose that O acts nontrivially on k qubits in the first state copy and on k qubits in the second state copy. Thus, when viewed as an observable for a 2n-qubit system, O is 2k-local. A technical argument shows that the maximum of the variances in Equation (S25) is bounded by
. We emphasize that this scaling is much better than the naive guess
– one of the key advantages of U-statistics. Hence we only need a total number of
random Pauli basis measurements to predict M quadratic functions
. An important concrete application of this procedure is the prediction of subsystem Rényi-2 entanglement entropies.
Example 2: Random Clifford measurements Theorem 4 also applies to the global Clifford measurement primitive. There, the maximum of the variances in Equation (S25) can be bounded by. Hence we only need a total number of
random Clifford basis measurements to predict M quadratic functions
. While a clean extension of linear feature prediction with Clifford basis measurements, the applicability of this result seems somewhat limited. Interesting global quadratic features tend to have prohibitively large Hilbert-Schmidt norms. The purity
provides an instructive non-example. It can be written as
, where
denotes the swap operator. Alas,
which scales exponentially in the number of qubits. Nonetheless, quadratic feature prediction with Clifford measurements is by no means useless. For instance, it can help provide statistical a posteriori guarantees on the quality of linear feature prediction — for example, by estimating sample variances to construct confidence intervals.
Supplementary Figure 1: Comparison between classical shadow and neural network tomography (NNQST); toric code. (a) (Left): Number of measurements required for neural network tomography to identify a particular toric-code ground state. We use classical fidelity for NNQST, which is an upper bound for quantum fidelity. (b)(Right): Performance of classical shadows for the same problem. We use quantum fidelity for classical shadows. The shaded regions are the standard deviation of the estimated fidelity over ten runs.
In this section we report additional numerical experiments that demonstrate the viability of linear feature prediction with classical shadows. We focus on the Clifford basis measurement primitive, i.e. applying a random Clifford circuit to and then measuring in the computational basis.
In the main text, we have considered direct fidelity estimation for GHZ states and compared it with neural network quantum state tomography (NNQST). While highly instructive from a theoretical perspective, GHZ states comprised of 100 qubits are very fragile and challenging to implement in practice. To conduct experiments for more physical target states, we consider Toric code ground states [21]. Not only are they the most prominent example of a topological quantum error correcting code and thus highly relevant for quantum computing devices. They also correspond to ground states of a Hamiltonian: , where
and
denote vertex- and plaquette operators
. The ground space of H is four-fold degenerate and we select the superposition of all closed-loop configurations (
) as a test state for both classical shadows and NNQST: how many measurement repetitions are required to accurately identify this toric code ground state with high fidelity? The results are shown in Supplementary Figure 1. Neural network tomography based on a deep generative model seems to require a number of samples that scales unfavorably in the system size n (left). In contrast, fidelity estimation with classical shadows is completely independent of the system size. The difficulty of NNQST in learning 2D toric code may be related to some observed failures of deep learning [66] for learning patterns with combinatorial structures. In Supplementary Section 4, we provide further evidence for potential difficulties when using machine learning approaches to reconstruct some simple quantum states due to a well-known computational hardness conjecture.
Entanglement is at the heart of virtually all quantum communication and cryptography protocols and an important resource for quantum technologies in general. This renders the task of detecting entanglement
Supplementary Figure 2: Detection of GHZ-type entanglement for 3-qubit states.
(a) (Left): Schematic illustration of 3-partite entanglement. Entanglement witnesses are linear functions that separate part of one entanglement class from all other classes.
(b) (Right): Number of entanglement witnesses vs. number of experiments required to accurately estimate all of them. The dashed lines represent the expected number of (random) entanglement witnesses required to detect genuine threepartite entanglement and GHZ-type entanglement in a randomly rotated GHZ state. The shaded region is the standard deviation of the required number of experiments over ten independent repetitions of the entire setup.
important both in theory and practice [28, 36]. While bipartite entanglement is comparatively well-understood, multi-partite entanglement has a much more involved structure. Already for n = 3 qubits, there is a variety of inequivalent entanglement classes. These include fully-separable, as well as bi-separable states, W-type states and finally GHZ-type states. The relations between these classes are summarized in Supplementary Figure 2 and we refer to [4] for a complete characterization. Despite this increased complexity, entanglement witnesses remain a simple and useful tool for testing which class a certain state belongs to. However, any given entanglement witness only provides a one-sided test – see Supplementary Figure 2 (left) for an illustration – and it is often necessary to compute multiple witnesses for a definitive answer.
Classical shadows based on random Clifford measurements can considerably speed up this search: according to Theorem 1 a classical shadow of moderate size allows for checking an entire list of fixed entanglement witnesses simultaneously. Supplementary Figure 2 (right) underscores the economic advantage of such an approach over measuring the individual witnesses directly. Directly measuring M different entanglement witnesses requires a number of quantum measurements that scales (at least) linearly in M. In contrast, classical shadows get by with log(M)-many measurements only.
More concretely, suppose that the state to be tested is a local, random unitary transformation of the GHZ state. Then, this state is genuinely tripartitely entangled and moreover belongs to the GHZ class. The dashed vertical lines in Supplementary Figure 2 (right) denote the expected number of (randomly selected) witnesses required to detect genuine tripartite entanglement (first) and GHZ-type entanglement (later). From the experiment, we can see that classical shadows achieve these thresholds with an exponentially smaller number of samples than the naive direct method. Finally, classical shadows are based on random Clifford measurements and do not depend on the structure of the concrete witness in question. In contrast, direct estimation crucially depends on the concrete witness in question and may be considerably more difficult to implement.
General quantum state tomography The task of reconstructing a full classical description — the density matrix — of a d-dimensional quantum system from experimental data is one of the most fundamental problems in quantum statistics, see e.g. [5, 7, 34, 39] and references therein. Sample-optimal protocols, i.e. estimation techniques that get by with a minimal number of measurement repetitions, have only been developed recently. Information-theoretic bounds assert that of order
state copies are necessary to fully reconstruct
[37]. Constructive protocols [37, 57] saturate this bound, but require entangled circuits and measurements that act on all state copies simultaneously. More tractable single-copy measurement procedures require of order
measurements [37]. This more stringent bound is saturated by low rank matrix recovery [26, 48, 49]
and projected least squares estimation [35, 68].
These results highlight an exponential bottleneck for tomography protocols that work in full generality: at least copies of an unknown n-qubit state are necessary. This exponential scaling extends to the computational cost associated with storing and processing the measurement data.
Matrix product state tomography Restricting attention to highly structured subsets of quantum states sometimes allows for overcoming the exponential bottleneck that plagues general tomography. Matrix product state (MPS) tomography [18] is the most prominent example for such an approach. It only requires a polynomial number of samples, provided that the underlying quantum state is well approximated by a MPS with low bond dimension. In quantum many-body physics this assumption is often justifiable [51]. However, MPS representations of general states have exponentially large bond dimension. In this case, MPS tomography offers no advantage over general tomography. Similar ideas could also be extended to multi-scale entangled states (MERA) tomography [50].
Neural network tomography Recently, machine learning has also been applied to the problem of predicting features of a quantum systems. These approaches construct a classical representation of the quantum system by means of a deep neural network that is trained by feeding in quantum measurement outcomes. Compared to MPS tomography, neural network tomography may be more broadly applicable [15, 29, 69]. However, the actual class of systems that can be efficiently represented, reconstructed and manipulated is still not well understood.
Compressed classical description of quantum states To circumvent the exponential scaling in representing quantum states, Gosset and Smolin [30] have proposed a stabilizer sketching approach that compresses a classical description of quantum states to an accurate sketch of subexponential size. This approach bears some similarity with classical shadows based on random Clifford measurements. However, stabilizer sketching requires a fullycharacterized classical description of the state as an input. So, it still suffers from an exponential scaling in the resources used in practice. Recently, Paini and Kalev [58] have proposed an approximate classical description of a quantum state that can estimate the expectation value of an observable from Haar-random single-qubit rotations followed by computational basis measurements. They focus on estimating a single observable, while we focus on estimating many observables simultaneously. In our classical shadow approach, the Haar-random single-qubit rotations [58] are replaced by random single-qubit Clifford rotations, or – equivalently – measuring each qubit in a random Pauli basis. This simplification may be viewed as a partial derandomization and works, because the (single-qubit) Clifford group forms a 3-design [47, 70, 72].
Direct fidelity estimation Direct fidelity estimation is a procedure that allows for predicting a single pure target fidelity up to accuracy
. The best-known technique is based on few Pauli measurements that are selected randomly using importance sampling [19, 27]. The required number of samples depends on the target: it can range from a dimension-independent order of
(if
is a stabilizer state) to roughly
in the worst case.
Efficient estimation of local observables In quantum many-body physics, many interesting observables can be decomposed into local constituents. This renders the task of accurately predicting many local observables very important — both in theory and practice. A series of recent works [8, 16, 24, 42] propose different measurement strategies to measure many local observables simultaneously. All of them focus on estimating k-local Pauli observables up to accuracy . This would directly translate to an approximation error
for general k-local observables. For some measurement schemes, this general error bound seems unavoidable. But, for certain strategies a careful analysis could lead to an improved performance. The two works [8, 16] are based on properly analyzing the commutation relations between the k-local Pauli observables of interest. Subsequently, one can group commuting observables together and measure them all at once. Different from this more standardized strategy, [42] uses entangled Bell-basis measurements, and [24] is based on randomized measurements to efficiently measure local observables. The prior earlier works [8, 16] have worse performance compared to the more recent two [24, 42]. While the latter two procedures are seemingly different from prediction with classical shadows (Pauli measurements), the sample complexities associated with all three approaches are comparable. Derandomizing classical shadows, however, could considerably reduce the number of measurements required. We will address such a substantial and practical improvement in upcoming work.
Shadow tomography Shadow tomography aims at simultaneously estimating the outcome probabilities associated with M 2-outcome measurements up to accuaracy , where each
is a positive semidefinite matrix with operator norm at most one [1, 3, 10]. This may be viewed as a generalization of fidelity estimation. The best existing result is due to Aaronson and Rothblum [3]. They showed that
copies of the unknown state suffice to achieve this task
. Broadly speaking, their protocol is based on performing gentle 2-outcome measurements one-by-one and subsequently (partially) reversing the damage to the quantum state caused by the measurement. This task is achieved by explicit quantum circuits of exponential size that act on all copies of the unknown state simultaneously. This rather intricate procedure bypasses the no-go result advertised in Theorem 2 and results in a sampling rate that is independent of the 2-outcome measurements in question — only their cardinality M matters.
This numerical experiment considers classical shadows based on random Clifford measurements. We exploit the Gottesman-Knill theorem for efficient classical computations. This well-known result states that Clifford circuits can be simulated efficiently on classical computers; see also [2] for an improved classical algorithm. This has allowed us to address rather large system sizes (more than 160 qubits). To test the performance of feature prediction with classical shadows we first have to simulate the (quantum) data acquisition phase. We do this by repeatedly executing the following (efficient) protocol:
1. Sample a Clifford unitary U from the Clifford group using the algorithm proposed in [45]. This Clifford unitary is parameterized by which fully characterize its action on Pauli operators:
for all j = 1, . . . , n. Here, are the Pauli X, Z-operators acting on the j-th qubit, and
.
2. Given a unitary U parameterized by , we can apply U on any stabilizer state by changing the stabilizer generators and the destabilizers as defined in [2].
3. A computational basis measurement can be simulated using the standard algorithm provided in [2].
Although originally designed for pure target states , we can readily extend this strategy to mixed states
. Operationally speaking, mixed states arise from sampling from a pure state ensemble. This mixing process can be simulated efficiently on classical machines.
For neural network quantum state tomography, we use the open-source code provided by the authors [15]. The main challenge is generating training data, i.e. simulating measurement outcomes. For pure and noisy GHZ states, we use the tetrahedral POVM [15]. For the toric code ground state, we use the Psi2 POVM (which is a measurement in the computational (Z-) basis). Note that measuring in the Z-basis is not a tomographically complete measurement, but we found machine learning models to perform better using Psi2. This is possibly because the pattern is much more obvious (closed-loop configurations) and the figure of merit used in NNQST is a classical fidelity.
A concrete algorithm for creating training data for pure GHZ states is included in the aforementioned open-source implementation of [15]. It uses matrix product states to simulate quantum measurements efficiently. The training data for noisy GHZ states is a slight modification of the existing code. With probability , we sample a measurement outcome from the original state
. And with probability p,
we sample a measurement outcome from (phase error). Since the figure of merit is the fidelity with the pure GHZ state in both pure and noisy GHZ experiment, we reuse the implementation provided in [15].
Creating training data for toric code is somewhat more involved. The goal is to sample a closed-loop configuration on a 2D torus uniformly at random. This can again be done using classical simulations of stabilizer states [2]. The main technical detail is to create a tableau that contains both the stabilizer and the de-stabilizer for the state in question. The rich structure of the toric code renders this task rather easy. The stabilizers are the X-stars and the Z-plaquettes, with two Z-strings over the two loops of the torus. The de-stabilizer of each stabilizer is a Pauli-string that anticommutes with the stabilizer, but commutes with other stabilizers and other de-stabilizers. The full set of stabilizers and de-stabilizers for the toric code can be seen in Supplementary Figure 3.
In our numerical studies, we have seen that neural network quantum state tomography based on deep generative models seems to have difficulty learning toric code ground states.
Here, we take a closer look at this curious aspect and construct a simple class of quantum states where efficient learning of the quantum state from the measurement data would violate a well-known computational
hardness conjecture. First of all, each computational (Z-) basis measurement of the toric code produces a random bit-string. Most bits are sampled uniformly at random from {0, 1} and the remaining bits are binary functions that only depend on these random bits. Consider a simple class of quantum states that mimic this property. Given and
(mod 2), we define
. Such states can be created by preparing
on the first
qubits,
on the n-th qubit followed by CNOT gates between i-th qubit and n-th qubit for every
. Measuring
in the computational (Z-) basis is equivalent to sampling the first
bits x uniformly at random. The final bit is characterized by the deterministic formula
. Now, consider a (globally) depolarized version of this pure state:
One of the most widely used conjectures for building post-quantum cryptography is the hardness of learning with error (LWE) [63]. LWE considers the task of learning a linear n-ary function f over a finite ring from noisy data samples , where x is sampled uniformly at random and
is some independent error. An efficient learning algorithm for LWE will be able to break many post-quantum cryptographic protocals that are believed to be hard even for quantum computers. The simplest example of LWE is called learning parity with error, where
(mod 2) for
and some unknown
. Learning parity with error is also conjectured to be computationally hard [6]. Since learning
from computational (Z-) basis measurements on
is equivalent to learning parity with error, it is unlikely there will be a neural network approach that can learn
efficiently.
This numerical experiment considers classical shadows based on random Clifford measurements. The numerical studies regarding entanglement witnesses are based locally rotated 3-qubit (n = 3) GHZ states:
For , we hope to verify the tripartite entanglement present in the system. To this end, we consider a simple family of entanglement witnesses with compatible structure:
The single-qubit unitaries parametrize different witnesses.
A complete characterization of entanglement in three-qubit systems can be found in Supplementary Figure 2. The expectation value of an entanglement witness in the tripartite state
can certify that
belongs to a particular entanglement class. For example, it is known from the analysis in [4] that for any state
with only bipartite entanglement,
, while for any state
with at most W-type entanglement,
. Therefore verifying that
certifies that
has tripartite entanglement, while
certifies that
has GHZ-type entanglement.
After choosing random unitaries to specify the GHZ-type state
, we generate a list of random
to specify a set of potential entanglement witnesses for
:
If the randomly generated satisfies
, then
is an entanglement witness for genuine tripartite entanglement, and if
, then
is a witness for GHZ-type entanglement. We can compute the expected number of random candidates we have to test to find an observable O such that
or
; these numbers are indicated as the dashed lines on the right side of Supplementary Figure 2.
Given the list of randomly generated witness candidates , we would like to predict
for all
. The naive approach is to directly measure all observables (witnesses). We refer to this as the direct measurement approach. For this approach, we consider the number of total experiments required to estimate every
up to an error 0.1. Note that the number of required samples may vary from witness to witness — it depends on the variance associated with the estimation. In the worst case, one would need
measurements for each witness candidate.
Instead of this direct measurement approach, one could use classical shadows (Clifford measurements) to predict all the observables (witnesses) at once. Because,
for al
, the shadow norm obeys
, according to the analysis in Supplementary Section 1 B. Hence Theorem 1 shows that classical shadows can predict the expectation values of many candidate witnesses very efficiently.
In the numerical experiment, we gradually increased the number of random Clifford measurements we use to construct classical shadows until the classical shadows could accurately predict all up to 0.1-error. The results are shown in Supplementary Figure 2. Because the system size is small (n = 3 qubits), we simulate the quantum experiments classically by storing and processing all
amplitudes. In practice, one should use statistics, like sample variance estimation or the bootstrap [22], to determine confidence intervals and a posteriori guarantees. Quadratic function prediction with classical shadows (Clifford measurements) can be used to achieve this goal efficiently.
Predicting two-point correlation function could be done efficiently using classical shadows based on random Pauli measurements. To facilitate direct comparison, this numerical experiment is designed to reproduce one of the core examples in in [15]. In particular, we use the same data, downloaded from https://github.com/ carrasqu/POVM_GENMODEL. The classical shadow (based on random Pauli basis measurements) replaces the original machine learning based approach for predicting local observables. We use multi-core CPU for training and making prediction with the machine learning model. The reported time is the total CPU time. Predicting local observables O using the (Pauli) classical shadow can be done efficiently by creating the reduced density matrix , where A is the subsystem O acts on. The reduced density matrix
can be created by simply neglecting the data for the rest of the system. Importantly,
is never created as an
matrix. Taking the inner product of
with the local observables O yields the desired result.
We consider classical shadows based on random Pauli measurements for predicting subsystem entanglement entropies. In the first part of the experiment, we consider the ground state of a disordered Heisenberg model. The associated Hamiltonian is , where each
is sampled uniformly (and independently) from the unit interval [0, 1]. The approximate ground state is found by implementing the recursive procedure from [62]: identify the largest
, forming singlet for the connected sites, and reduce the system by removing
. We refer to [62] for details. In the experiment, we perform single-shot random Pauli basis measurements on the approximate ground state. I.e. we measure the state in a random Pauli basis only once and then choose a new random basis. However, in physical experiments, it is often easier to repeat a single Pauli basis measurement many times before re-calibrating to measure another Pauli basis. Performing a single random basis measurement for many repetitions can be beneficial experimentally compared to measuring a random basis every single time. Classical shadows (Pauli) are flexible enough to incorporate economic measurement strategies that take this discrepancy into account. We refer to the open source implementation in https: //github.com/momohuang/predicting-quantum-properties for the exact details.
To obtain a reasonable benchmark, we compare this procedure with the approach proposed by Brydges et al. [12]. For a subsystem A comprised of k qubits, the approach proposed in [12] for predicting the Rényi entropy works as follows. First, one samples a random single-qubit unitary rotations independently for all k qubits. Then, one applies the single-qubit unitary rotation to the system and measures the system in the computational basis to obtain a string of binary values . For each random unitary rotation, several repetitions are performed. The precise number of repetitions for a single random basis is a hyper-parameter that has to be optimized. The estimator for the Rényi entropy takes the following form:
The function is the Hamming distance between strings s and
(i.e, the number of positions at which individual bits are different), while P(s) and
are the probabilities for measuring
and obtaining the outcomes s and
, respectively. The probability P(s) is a function that depends on the randomly sampled single-qubit rotation.
is the expectation of
averaged over the random single-qubit rotations.
The random single-qubit rotations could be taken as single-qubit Haar-random rotations or single-qubit random Clifford rotations. The latter choice is equivalent to random Pauli measurements – the measurement primitive we consider for classical shadows also. For the test cases we considered, using random Pauli measurements yields similar (and sometimes improved) performance compared to single-qubit Haar-random unitary rotation. This allows the approach by [12] and the procedure based on classical shadows to be compared on the same ground. We follow the strategy in [12] to estimate the formula in Eq. (S31). First, we sample random unitary rotations. For each random unitary rotation, we perform
repetitions of rotating the system and measuring in the computational basis. The
measurement outcomes allow us to construct an empirical distribution for P(s). Thus we could use the
measurement outcomes to estimate
for a single random unitary rotation. We then take the average over
different random unitary rotations. Choosing a suitable parameter for
and
is nontrivial. We employ the strategy advocated in [12] for finding the best parameter for
and
. This strategy is called grid search and is performed by trying many different choices for
and recording the best one.
The application for variational quantum simulation uses classical shadows based on random Pauli measurements which is designed to predict a large number of local observables efficiently. It is based on the seminal work presented in [46]. After a Kogut-Susskind encoding to map fermionic configurations to a spin-1/2 lattice with an even number N of lattice sites and a subsequent Jordan-Wigner transform, the Hamiltonian becomes
Here, denote Pauli-X, Y, Z operators acting on the j-th qubit (
). This Hamiltonian has very advantageous structure. Each of the three contributions can be estimated by performing a single Pauli basis measurement (measure every qubit in the X basis to determine
, measure every qubit in the Y basis to determine
and measure every qubit in the Z basis to determine
). The measurement of the Hamiltonian variance
is more complicated, because
does not decompose nicely. To determine its value, we must first measure
and
. This is the easy part, because 3 measurement bases once more suffice. However, in addition, we must also estimate the anti-commutators
. This may be achieved by measuring the following k-local observables (with k at most 4):
,
P
P
j, j
, N
, s.t. j
, j
, j
,
,
P
P
j, j
, j
, N
, s.t. j
, j
, j
, j
, j
< j
,
Although local, estimating all observables of this form is the main bottleneck of the entire procedure. To minimize the number of measurement bases, the original work [46] has performed an analysis of symmetry in the lattice Schwinger model. First, the target Hamiltonian in Equation (S32) satisfies , which corresponds to a charge conservation symmetry in the scalar fermionic field. [46] further consider a charge symmetry subspace with
, which corresponds to a
symmetry. In this subspace, we have
. This ensures that we only have to estimate local observables corresponding to
and
. In the original setup [46], this task was achieved by measuring roughly 2N bases in total. We refer to [46, Appendix B and Appendix C] for further details and explanation. We propose to replace this original approach by linear feature prediction with classical shadows (Pauli measurements).
For classical shadows based on random Pauli measurements, every measurement basis is an independent random X, Y , or Z measurement for every qubit. This randomized general purpose procedure does not take into account the fact that we want to measure a specific set of k-local observables given in Equation (S33). The derandomized version of classical shadows is based on the concept of pessimistic estimators [60, 67] (see also [71] for an application with quantum information context). It removes the original randomness by utilizing the knowledge of this specific set of k-local observables. When we throw a dice (or coin) to decide whether we want to measure in either, the , the
, or the
basis, the derandomized version would choose the measurement basis (X, Y , or Z) that would lead to the best expected performance on the set of k-local observables given in Equation (S33). The expected performance is computed based on random Pauli basis measurements and the analysis in Supplementary Section 1. The derandomized version of classical shadows would perform at least as well as the original randomized version. Furthermore, due to the dependence on the specific set of observables for choosing the measurement bases, the derandomized version can exploit advantageous structures in the set of observables we want to measure. As detailed in the main text, classical shadows based on random Pauli measurements provide improvement only for larger system sizes (more than 50 qubits). A derandomized version of classical shadows improves upon the randomized version and leads to a substantial improvement in efficiency and scalability over a wide range of system sizes. As an added benefit, derandomization can be completely automated and does not depend on the concrete set of target observables. We refer to https://github. com/momohuang/predicting-quantum-properties for a (roughly linear time) algorithm that derandomizes random Pauli measurements for any collection of target observables with Pauli structure.
Clifford circuits were introduced by Gottesman [31] and form an indispensable tool in quantum information processing. Applications range from quantum error correction [56], to measurement-based quantum computation [11, 61] and randomized benchmarking [23, 44, 53]. For systems comprised of n qubits, the Clifford group is generated by CNOT, Hadamard and phase gates. This results in a finite group of cardinality
that maps (tensor products of) Pauli matrices to Pauli matrices upon conjugation. This underlying structure allows for efficiently storing and simulating Clifford circuits on classical computers – a result commonly known as Gottesman-Knill theorem. The n-qubit Clifford group also comprises a unitary 3-design [47, 70, 72]. Sampling Clifford circuits uniformly at random reproduces the first 3 moments of the full unitary group endowed with the Haar measure. For k = 1, 2, 3
The right hand side of this equation can be evaluated explicitly by using techniques from representation theory, see e.g. [33, Sec. 3.5]. This in turn yields closed-form expressions for Clifford averages of linear and quadratic operator-valued functions. Choose a unit vector and let
denote the space of Hermitian
matrices. Then,
Here, denotes a n-qubit depolarizing channel with loss parameter p. Linear maps of this form can be readily inverted. In particular,
These closed-form expressions allow us to develop very concrete strategies and rigorous bounds for classical shadows based on (global and local) Clifford circuits.
Proposition 1. Adopt a “random Clifford basis” measurement primitive, i.e. each rotation is chosen uniformly from the n qubit Clifford group
. Then, the associated classical shadow is
where is the observed computational basis measurement outcome (of the rotated state
). Moreover, the norm defined in Eq. (S7) is closely related to the Hilbert-Schmidt norm:
Note that passing from O to its traceless part is a contraction in Hilbert-Schmidt norm:
Hence, we can safely replace the upper bound in Eq. (S39) by — the Hilbert Schmidt norm (squared) of the original observable.
Proof. Eq. (S35) readily provides a closed-form expression for the measurement channel defined in Eq. (S2):
This depolarizing channel can be readily inverted, see Eq. (S37). In particular,
for any traceless matrix . The latter reformulation considerably simplifies the expression for the norm
defined in Eq. (S7). A slight reformulation allows us to furthermore capitalize on Eq. (S36) to exactly compute this norm for traceless observables:
To further simplify this expression, recall and note that
, where
denotes the spectral norm. The bound Eq. (S39) then foloows from the elementary relation between the spectral and Hilbert-Schmidt norms:
.
Proposition 2. Adopt a “random Pauli basis” measurement primitive, i.e. each rotation is a tensor product
of randomly selected single-qubit Clifford gates
. Then, the associated classical shadow is
Moreover, the norm defined in Eq. (S7) respects locality. Suppose that only acts nontrivially on k-qubits, e.g.
with
. Then
, where
is the same norm, but for k-qubit systems.
Proof. Unitary rotation and computational basis measurements factorize completely into tensor products. This insight allows us to decompose the measurement channel M defined in Eq. (S2) into a tensor product of single-qubit operations. For elementary tensor products we can apply Eq. (S35) separately for each single-qubit action and infer
Linear extension to all of yields the following formula for M and its inverse:
where according to Eq. (S37). This formula readily yields a closed-form expression for the classical shadow. Use
to conclude
For the second claim, we exploit a key feature of depolarizing channels and their inverses. The identity matrix is a fix-point, i.e. . For k-local observables, e.g.
, this feature ensures
where denotes the inverse channel of a k-qubit local Clifford measurement procedure. This observation allows us to compress the norm (S7) to the “active” subset of k qubits. Exploit the tensor product structure
with
to conclude
where denotes the partial trace over all “inactive” subsystems. Partial traces preserve the space of all quantum states. So maximizing over all partial traces
is equivalent to maximizing over all k-qubit states and we exactly recover the norm
on k qubits. Finally, it is easy to check that the actual location of the active k-qubit support of O does not affect the argument.
Recall that the (squared) norm is the most important figure of merit for feature prediction with classical shadows. According to Theorem 1,
determines the number of samples required to accurately predict a collection of linear functions
. Viewed from this angle, Proposition 2 has profound consequences for predicting (collections of) local observables under the local Clifford measurement primitive. For each local observable
, the norm
collapses to its active support, regardless of its precise location. The size of these supports is governed by the locality alone, not the total number of qubits!
It is instructive to illustrate this point with a simple special case first.
Lemma 3. Let O be a single k-local Pauli observable, e.g. , where
. Then,
, for any choice of the k qubits where nontrivial Pauli matrices act. This scaling can be generalized to arbitrary elementary tensor products supported on k qubits, e.g.
.
Proof. Pauli matrices are traceless and obey, and
for each
. Proposi- tion 2 and the tensor product structure of the problem then ensure
where we have used Eq. (S36) to explicitly evaluate the single qubit Clifford averages. We leave the extension to more general tensor product observables as an exercise for the dedicated reader.
The norm expression in Lemma 3 scales exponentially in the locality k, but is independent of the total number of qubits n. The compression property (Proposition 2) suggests that this desirable feature should extend to general k-local observables. And, indeed, it is relatively straightforward to obtain crude upper bounds that scale with . The additional factor of two, however, effectively doubles the locality parameter and can render conservative feature prediction with classical shadows prohibitively expensive in concrete applications.
The main result of this section considerably improves upon these crude bounds and almost reproduces the (tight) scaling associated with k-local Pauli observables.
Proposition 3. Let O be a k-local observable, e.g. with
Then,
The same bound holds for the shadow norm of the traceless part of .
The proof is considerably more technical than the proof of Lemma 3 and relies on the following auxiliary result.
Lemma 4. Fix two k-qubit Pauli observables with
. Then, the following formula is true for any state
:
where f(p, q) = 0 whenever there exists an index i such that and
. Otherwise,
, where s is the number of non-identity Pauli indices that match (
).
This combinatorial formula follows from a straightforward, but somewhat cumbersome, case-by-case analysis based on the (single-qubit) relations (S35) and (S36). We include a proof at the end of this subsection.
Proof of Proposition 3. Proposition 2 allows us to restrict our attention to the relevant k-qubit region on which acts nontrivially. Next, expand
in the (tensor product) Pauli basis, i.e.
with
. Fix an arbitrary k-qubit state
and use Lemma 4 to conclude
where f(p, q) is the combinatorial function defined in Lemma 4. The last equality follows from the dual characterization of the spectral norm: for any positive semidefinite matrix A.
We can further simplify this expression by introducing a partial order on Pauli strings . We write
if it is possible to obtain q from s by replacing some local non-identity Paulis with I. Moreover, let
denote the number of non-identity Pauli’s in the string q. Then,
where we have used for all Pauli strings. Next, note that for fixed
,
Together with Cauchy-Schwarz, this numerical insight implies
Finally, observe that every is dominated by exactly
different strings
. This ensures
because Pauli matrices are proportional to an orthonormal basis of :
The general claim then follows from the fundamental relation among Schatten norms:
.
The bound on traceless parts of observables is nearly analogous, because the transition from O to
respects locality. E.g.
obeys
. To get the same bound, we use that this transition is a contraction in Hilbert-Schmidt norm:
Proof of Lemma 4. Since Pauli observables decompose nicely into tensor products, this claim readily follows from extending a single-qubit argument. Note that for
and
. It is straightfor- ward to evaluate the single-qubit expression for the trivial case
. Fix a state
and compute
Next, suppose , but
. This single-qubit case is covered by Eq. (S35):
because . The case
and
leads to analogous results. Finally, suppose that both
. By assumption
and both matrices are traceless. Hence, we can resort to Eq. (S36) to conclude
for any state . Pauli matrices are orthogonal (
) and anticommute (
). This implies that the above expression vanishes whenever
. If p = q it evaluates to
and we can conclude that the single qubit average always equals
The statement then follows from extending this formula to tensor products of k Pauli matrices.
We focus on the particularly relevant task of predicting quadratic functions with classical shadows, using
Lemma 5 (Variance). The variance associated with the estimator obeys
where is the symmetrized version of O and S denotes the swap operator (
).
Proof. First, note that and the target
are invariant under symmetrization. This ensures
Thus, we may without loss replace the original observable O by its symmetrized version . Next, we expand the definition of the variance:
We can use the inequality (for any pair of random variables A, B) to obtain a simplified upper bound:
Proposition 4. Suppose that O describes a quadratic function that acts on at most k-qubits in the first system and at most k-qubits in the second system and obeys
. Then,
Proof. Because of the single-qubit tensor product structure in the random Pauli measurement and the inverted quantum channel , the tensor product of two snapshots
of the unknown quantum state
may be viewed as a single snapshot of the tensor product state
:
Hence and, by assumption, O is an observable that acts on k + k = 2k qubits only. The claim then follows from invoking the variance bounds for linear feature prediction presented in Proposition 3.
In contrast to the Pauli basis setup, variances for quadratic feature prediction with Clifford basis measurements cannot be directly reduced to its linear counterpart. Nonetheless, a more involved direct analysis does produces bounds that do closely resemble the linear base case.
Proposition 5. Suppose that O describes a quadratic function and obeys
. Then, the variance associated with classical shadow estimation (random Clifford measurements) obeys
The pre-factorconverges to the constant 3 at an exponential rate in system size.
This claim is based on the following technical Lemma and insights regarding linear feature prediction.
Lemma 6. Suppose that O describes a quadratic function . Then,
Proof of Proposition 5. The variance of is equivalent to the variance of
, where
describes a linear function. According to Proposition 1, this variance term obeys
because and
. A similar argument takes care of the second variance contribution
. Lemma 6 supplies a bound for the square of the final contribution. By assumption
and the claim follows.
The remainder of this section is devoted to proving Lemma 6. Unfortunately, there does not seem to be a direct way to relate this task to variance bounds for linear feature prediction. Instead, we base our analysis on the 3-design property (S36) of Clifford circuits and a reformulation of this feature in terms of permutation operators. This strategy is inspired by the approach developed in [9], but conceptually and technically somewhat simpler. We believe that similar arguments extend to variances associated with higher order polynomials, but do refrain from a detailed analysis. Instead, we carefully outline the main ideas and leave a rigorous extension to future work.
Problem statement and reformulation: We will ignore symmetrization (which can only make the variance smaller) and focus on bounding the variance of , where each
is an independent classical shadow. To simplify notation, we set
and define the following traceless variants of O:
Here, with a = 1, 2 denotes the partial trace over the first and second system, respectively. All three operators are traceless (recall
) and the final (bipartite) operator has the additional property that both partial traces vanish identically:
.
Proposition 1 asserts , where each
is a random Clifford unitary and
is the outcome of a computational basis measurement. These explicit formulas allow us to decompose the expression of interest in the following fashion:
The variance corresponds to the expected square of this expression. The second term is constant and does not contribute. We analyze the remaining terms on a case-by case basis.
Linear terms: The third and fourth terms in Eq. (S73) are linear feature functions in one classical shadow only. Their (squared) contribution to the overall variance is characterized by Proposition 1:
Both bounds can be related to the Hilbert-Schmidt norm (squared) of the original observable:
the special property that both partial traces vanish identically: for a = 1, 2. Moreover, the Hilbert-Schmidt norm (squared) of this operator factorizes nicely:
Not only is this expression bounded by the original Hilbert-Schmidt norm . The norms of partial traces also feature explicitly with a minus sign. This will allow us to fully counter-balance the variance contributions (S75) from the linear terms.
Next, we use the 3-design property (S34) of Clifford circuits in dimension :
where is the projector onto the totally symmetric subspace of
. This formula implies
The specific structure of implies that several contributions must vanish. Permutations that have either 1 or 2 as a fix-point lead to a partial trace of
that evaluates to zero. There are only three permutations that do not have such fix-points: The flip
and the two cycles
(2, 3, 1). There are in total
potential combinations of such permutations. Each of them results in a trace expression that can be upper-bounded by Hilbert-Schmidt norms. For instance the pair flip and flip produces
All other 8 contributions can also be bounded by this expression and we conclude
Bounds on cross-terms: Cross-terms are considerably easier to evaluate, because one (or both) random matrices only feature linearly. We can use to effectively get rid of the linear contribution. For instance,
where denotes the operator norm. Cross terms that do feature the leading order term require slightly more work, but can be addressed in a similar fashion. Using linearity in one snapshot reduces the expression to an expectation of a quadratic function in one snapshot only. The remaining computation is similar to the proof of Proposition 1 and yields
Standard norm inequalities, as well as the explicit expression for allow for counter-balancing some of the sub-leading terms and we conclude
Before stating the content of the statement, we need to introduce some additional notation. In quantum mechanics, the most general notion of a quantum measurement is a POVM (positive operator-valued measure). A d-dimensional POVM F consists of a collection of positive semidefinite matrices that sum up to the identity matrix:
for all
and
. The index i is associated with different potential measurement outcomes and Born’s rule asserts
for all
and any d-dimensional quantum state
. We present a simplified version of the proof by consider the relevant case where
. The full proof can be found in [40].
Theorem 5 (Detailed restatement of Theorem 2 for Hilbert-Schmidt norm). Fix a sequence of POVMs . Suppose that given any M features
with
, there exists a machine (with arbitrary runtime as long as it always terminates) that can use the measurement outcomes of
on N copies of an unknown d-dimensional quantum state
to
-accurately predict
with high probability. Assuming
, then necessarily
It is worthwhile to put this statement into context and discuss consequences, as well as limitations. Theorem 1 (Clifford measurements) equips classical shadows with a universal convergence guarantee: (order) single-copy measurements suffice to accurately predict any collection of M target func- tions in any state. Theorem 5 implies that there are cases where this number of measurements is unavoidable. This highlights that the sample complexity of feature prediction with classical shadows is optimal in the worst case – a feature also known as minimax optimality.
Minimax optimality, however, does not rule out potential for further improvement in certain best-case scenarios. Advantageous structure in or the
’s (or both) can facilitate the design of more efficient prediction techniques. Prominent examples include matrix product state tomography (MPST) [18, 51] and neural network tomography (NNQST) [15]. Such tailored approaches, however, hinge on additional assumptions about the states to be measured or the properties to be predicted.
Finally, we emphasize that Theorem 2 only applies to single-copy measurements. Another way to bypass this lower bound is to use joint quantum measurements that act on all copies of the quantum state simultaneously. Although very challenging to implement, such procedures can get by with substantially fewer state copies while still being universal. Shadow tomography [1, 3] is a prominent example.
Proof idea: We adapt a versatile proof technique for establishing information-theoretic lower bounds on tomographic procedures that is originally due to Flammia et al. [26]; see also [37, 65] for adaptations and refinements. The key idea is to consider a communication task in which Alice chooses a quantum state from among an alphabet of possible states and then sends copies of her chosen state to Bob, who measures all the copies hoping to extract a classical message from Alice. If we choose Alice’s alphabet suitably, then by learning many properties of Alice’s state Bob will be able to identify the state, hence decoding Alice’s message. Information-theoretical lower bounds on the number of copies Bob needs to decode the message can therefore be translated into lower bounds on how many copies Bob needs to learn the properties.
To be more specific, suppose Alice chooses her state from an ensemble of M possible n-qubit signal states and suppose there are M linear operators
, each with tr
, such that learning the expectation values of all the operators
up to an additive error
suffices to determine
uniquely. Suppose furthermore that if Bob receives N copies of any n-qubit state, and measures them one at a time, he is able to learn all of the properties
with an additive error no larger than
with high success probability. This provides Bob with a method for identifying the state
with high probability. Therefore, if Alice chooses her signal state uniformly at random from among the M possible states, by performing the appropriate single-copy measurements Bob can acquire
bits of information about Alice’s message. A lower bound on how many copies Bob needs to gain
bits of information about Alice’s state, then, becomes a lower bound on how many copies Bob needs to learn the M properties
. To get the best possible lower bound, we choose Alice’s signal ensemble
so that it is as hard as possible for Bob to distinguish the signals using properties with tr
.
So far, this lower bound on N would apply even if Bob has complete knowledge of Alice’s signal states and the properties he should learn to distinguish them. We can derive a stronger lower bound on N by invoking a powerful feature of classical shadows — that Bob must make his measurements before he finds out which properties he must learn. To obtain this stronger bound, we introduce into the communication scenario a third party, named Loki, who tampers with the signal states. Loki chooses a Haar-random n-qubit unitary U, and
Randomly Select X
Supplementary Figure 4: Illustration of the communication protocol behind Theorem 5 and Theorem 6. Two parties (Alice and Bob) devise a protocol that allows them to communicate classical bit strings: Alice encodes a bit string X in a quantum state and sends N independent copies of the state to Bob. Bob performs quantum measurements and uses a black box device (e.g. classical shadows) to decode Alice’s original message. An unpredictable trickster (Loki) tampers with this procedure by randomly rotating Alice’s quantum states en route to Bob. Loki reveals his actions only after Bob has completed the measurement stage of his protocol.
replaces all N copies of Alice’s signal state by the rotated states
before presenting the states to Bob (Loki’s mischief).
If Bob knew Loki’s unitary U, he could modify his measurement procedure to learn the rotated properties . These rotated properties are just as effective for distinguishing the rotated states as the unrotated properties were effective for distinguishing the unrotated states. However, Loki keeps U secret, so Bob is forced to perform his measurements on the rotated states without knowing U. Only after Bob’s data acquisition phase is completed does Loki confide in Bob and provide him with a full classical description of the unitary he applied earlier (Loki’s redemption). This three-party scenario is illustrated in Supplementary Figure 4.
Suppose, though, that using the classical shadow based on his measurements, Bob can predict any M properties (with additive error bounded by and with high success probability), provided that the HilbertSchmidt norm is no larger than
for each property. Then he is just as well equipped to learn
as
, and can therefore decode Alice’s message successfully once Loki reveals U. It must be, then, that Bob’s measurement outcomes provide
bits of information about Alice’s prepared state, when U is known. This is the idea we use to derive the stronger upper bound on N, and hence prove Theorem 5.
We emphasize again that quantum feature prediction with classical shadows can cope with Loki’s mischief, by merely rotating the features Bob predicts, because the predicted features need not be known at the time Bob measures. The lower bound in Theorem 5 does not apply to the task of learning features that are already known in advance. We also emphasize again that Theorem 5 assumes that the copies of the state are measured individually. It does not apply to protocols where collective measurements are applied across many copies.
We show how Alice can communicate any integer in {1, . . . , M} to Bob. Alice and Bob first agree on a codebook for encoding any integer selected from {1, . . . , M} in a d-dimensional quantum state. We denote these codebook states by . Alice and Bob also agree on a set of linear features
that satisfies
Therefore, if each feature can be predicted with additive error , these features can be used to identify the state
. The communication protocol between Alice and Bob is now apparent:
1. Alice randomly selects an integer X from {1, . . . , M}.
2. Alice prepares N copies of the code-state associated to X and sends them to Bob.
3. Bob performs POVMs on individual states and receives a string of measurement outcomes Y .
4. Bob inputs Y into the feature prediction machine to estimate .
5. Bob finds X that has the largest .
The working assumption is that the feature prediction machine can estimate within
-error and high success probability. This in turn ensures that this plain communication protocol is mostly successful, i.e. X = X with high probability. In words: Alice can transmit information to Bob, when no adversary is present.
We now show how they can still communicate safely in the presence of an adversary (Loki) who randomly rotates the transmitted code states en route: and U is a Haar-random unitary.
This random rotation affects the measurement outcome statistics associated with the fixed POVMs . Each element of
is now a random variable that depends on both X and U. After Bob has performed the quantum measurements to obtain Y , the adversary confesses to Bob and reveals the random unitary U. While Bob no longer has any copies of
, he can still incorporate precise knowledge of U by instructing the machine to predict linear features
, instead of the original
. This reverses the effect of the original unitary transformation, because
. This modification renders the original communication protocol stable with respect to Loki’s actions. Alice can still send any integer in {1, . . . , M} to Bob with high probability.
The following arguments use properties of Shannon entropy and mutual information which can be found in standard textbooks on information theory, such as [17].
The communication protocol is guaranteed to work with high probability, ensuring that Bob’s recovered message equals Alice’s input X with high probability. Moreover, we assume that Alice selects her message uniformly at random. Fano’s inequality then implies
where I(X : X) is the mutual information, and H(X) is the Shannon entropy. By assumption, Loki chooses the unitary roatation U uniformly at random, regardless of the message X. This implies I(X : U) = 0 and, in turn
For fixed U, X is the output of the machine that only takes into account the measurement outcomes Y . The data processing inequality then yields
Recall that Y is the measurement outcome of the N POVMs . We denote the measurement outcome of
as
. Because
are random variables that depend on X and U,
The second to last equality uses the fact that when X, U are fixed, are independent. To obtain the best lower bound, we should choose Alice’s signal states
such that
on
is as small as possible. In Sec. 7 D, we will see that, no matter how Bob chooses his measurements
, there are signal states satisfying (S87) such that
Assuming that this relation holds, we have established a connection between M and and, therefore,
. This establishes the claim in Theorem 5.
We now construct a codebook and linear features
with
that obey two key properties:
1. the code states obey the requirement displayed in Eq. (S92).
2. the linear features are capable of identifying a unique code state:
The second condition requires each to be distinguishable from
via linear features
. The first condition, on the contrary, requires
to convey as little information about X as possible. The general idea would then be to create distinguishable quantum states that are, at the same time, very similar to each other.
In order to achieve these two goals, we choose M rank-B/4 subspace projectors that obey
for all
. The probabilistic method asserts that such a projector configuration exists; see Lemma 7 below. Now, we set
It is easy to check that this construction meets the requirement displayed in Eq. (S93). The other condition – Eq. (S92) is verified in Lemma 8 below.
Lemma 7. If and
, then
rank-r subspace projectors
such that
Proof. We find the subspace projectors using a probabilistic argument. We randomly choose M rank-r subspaces according to the unitarily invariant measure in the Hilbert space, the Grassmannian, and bound the probability that the randomly chosen subspaces do not satisfy the condition. For a pair of fixed , we have
where we make use of [37, Lemma 6] in the first inequality and for all
in the second inequality. A union bound then asserts
Because the probability is less than one, there must exist that satisfy the desired property.
Lemma 8. Consider a set of d-dimensional quantum states such that
, where
is a rank-r subspace projector. Consider U sampled from Haar measure, and X sampled from {1, . . . , M} uniformly at random. Consider any POVM measurement F. Then the information gain regarding X, conditioned on U, obtained from the measurement F performed on the state
satisfies
Note that we can obtain the statement (S92) by choosing and r = B/4, hence completing the proof of Theorem 5.
Proof. First of all, let us decompose all POVM elements to rank-1 elements
, where
. We can perform measurement F by performing measurement with
: when we measure a rank-1 element, we return the original POVM element the rank-1 element belongs to. Using data processing inequality, we have I(X : F on
on
. From now on, we can consider the POVM
to be
. Normalization demands
Let us define the probability vector so
And the expression we hope to bound satisfies I(X : F on
on
on
F on
using the chain rule and the nonnegativity of mutual information. We now bound
The second equality uses the fact that which follows from the fact that
. The inequality uses the fact that log(x) is concave, so
. Using properties of Haar random unitary
matrices, we conclude
Therefore we have
which establishes the claim:
In Theorem 5, we have shown that if a procedure can predict arbitrary observables with , then it must use at least
single-copy measurements (as long as M is not extraordinarily large). A similar argument can be used to show that if a procedure can predict arbitrary k-local observables, then it requires at least
single-copy measurements (when M is not too large). This is because if we focus on a k-qubit subsystem, then the guarantee allows us to predict arbitrary observables
with
. In the following theorem, we show a stronger lower bound by focusing on local measurements. A local measurement is a POVM
where
,
, and
. This is the same as not performing any entangling gates when implementing the measurement. (Random) Pauli basis measurements are a prominent example.
Theorem 6 (Detailed restatement of Theorem 2 for exponential scaling in locality). Fix a sequence of local measurements on n-qubit system, i.e.,
where
,
, and
. Suppose that given any M k-local observables
, there exists a machine (with arbitrary runtime as long as it always terminates) that can use the measurement outcomes of
on N copies of an unknown quantum state
to
-accurately predict
with high probability. Assuming
, then necessarily
Proof. The proof uses a quantum communication protocol between Alice and Bob, with Loki interfering in the middle. Alice would encode some classical information in the quantum state and send to Bob. Bob would then use the prediction procedure to decode the encoded classical information. In the middle, Loki will alter the quantum state by applying a random unitary. Loki would then reveal the random unitary to Bob after Bob performed quantum measurements on the quantum states. An illustration of the communication protocol can be found in Supplementary Figure 4. The quantum state Alice encodes, the unitary applied by Loki, and the features predicted by Bob is considerably simplified in this result compared to the previous proof.
We define is the i-th Pauli observable acting on k qubits in the n-qubit system. Any ordering of the Pauli observables is fine. Note that there are at most
such Pauli observables. This is the reason why we assume
. The corresponding linear functions chosen by Bob are
. This guarantees the following relation:
where is the Kronecker-delta (
if i = j and
otherwise). The random unitary applied by Loki consists of random single-qubit unitary rotations, i.e.
. The complete communication protocol works as follows.
1. Alice randomly selects an integer X from {1, . . . , M}.
2. Alice prepares N copies of the code-state according associated to X and sends them to Bob.
3. Loki intercepts the N copies, samples a random unitary , applies U on all copies of
, and sends to Bob.
4. Bob performs local measurements on individual states and receives a string of measurement outcomes Y .
5. Loki reveals the random unitary U to Bob. Now Bob would have to predict the expectation value of instead of the original
.
6. Since are still k-local observables, Bob can input Y into the feature prediction machine to estimate
.
7. Bob finds that has the largest
.
Because are predicted to
additive error, and
, if the prediction procedure works as guaranteed, Bob’s decoded information
would be equal to Alice’s encoded information X with high probability. Moreover, we assume that Alice selects her message uniformly at random. Fano’s inequality then implies
where I(X : X) is the mutual information, and H(X) is the Shannon entropy. By assumption, Loki chooses the random unitary U regardless of the message X. This implies I(X : U) = 0 and, in turn
For fixed U, X is the output of the machine that only takes into account the measurement outcomes Y . The data processing inequality then implies
Recall that Y is the measurement outcome of the N POVMs . We denote the measurement outcome of
as
. Because
are random variables that depend on X and U,
The second to last equality uses the fact that when X, U are fixed, are independent. This part of the derivation is exactly the same as in Supplementary Section 7 C. All that is left is to properly upper bound
on
. First, by definition,
The last inequality exploits concavity of the Shannon entropy . By assumption, the
’s must be local measurements, i.e.
where
, and
. We define the probability of measuring i-th outcome using POVM
as
which is a random number depending on X and U. Using Equation (S110) and the definition of , we have
The second inequality uses the fact that log(x) is concave, so . We now compute
and
by using the following relation for single-qubit random unitary:
where j refers to the j-th qubit, and S is the two qubit swap operator (). Recall the definition of
in Equation (S111). Together with the above relation, we have
Putting this computation into Inequality (S112), we have obtained
Combining the above result with Inequality (S108) and (S109), we have
[1] S. Aaronson. Shadow tomography of quantum states. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, pages 325–338, New York, NY, USA, 2018. ACM.
[3] S. Aaronson and G. N. Rothblum. Gentle measurement of quantum states and differential privacy. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, page 322–333, New York, NY, USA, 2019. Association for Computing Machinery.
[4] A. Acín, D. Bruß, M. Lewenstein, and A. Sanpera. Classification of mixed three-qubit states. Phys. Rev. Lett., 87:040401, Jul 2001.
[5] K. Banaszek, M. Cramer, and D. Gross. Focus on quantum tomography. New J. Phys., 15(12):125020, Dec 2013. [6] A. Blum, A. Kalai, and H. Wasserman. Noise-tolerant learning, the parity problem, and the statistical query model. J. ACM, 50(4):506–519, July 2003.
[7] R. Blume-Kohout. Optimal, reliable estimation of quantum states. New J. Phys., 12(4):043034, Apr 2010. [8] X. Bonet-Monroig, R. Babbush, and T. E. O’Brien. Nearly optimal measurement scheduling for partial tomography of quantum states. arXiv preprint arXiv:1908.05628, 2019.
[9] F. G. Brandão, W. Chemissany, N. Hunter-Jones, R. Kueng, and J. Preskill. Models of quantum complexity growth. arXiv preprint arXiv:1912.04297, 2019.
[10] F. G. Brandão, A. Kalev, T. Li, C. Y.-Y. Lin, K. M. Svore, and X. Wu. Quantum SDP solvers: Large speed-ups, optimality, and applications to quantum learning. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
[11] H. J. Briegel, D. E. Browne, W. Dür, R. Raussendorf, and M. Van den Nest. Measurement-based quantum compu- tation. Nat. Phys., 5:19–26, Jan 2009.
[12] T. Brydges, A. Elben, P. Jurcevic, B. Vermersch, C. Maier, B. P. Lanyon, P. Zoller, R. Blatt, and C. F. Roos. Probing Rényi entanglement entropy via randomized measurements. Science, 364(6437):260–263, 2019.
[13] G. Carleo and M. Troyer. Solving the quantum many-body problem with artificial neural networks. Science, 355(6325):602–606, 2017.
[14] J. Carrasquilla and R. G. Melko. Machine learning phases of matter. Nat. Phys, 13(5):431–434, 2017. [15] J. Carrasquilla, G. Torlai, R. G. Melko, and L. Aolita. Reconstructing quantum states with generative models. Nat. Mach. Intell., 1(3):155–161, 2019.
[16] J. Cotler and F. Wilczek. Quantum overlapping tomography. Phys. Rev. Lett., 124:100401, Mar 2020. [17] T. M. Cover and J. A. Thomas. Elements of information theory. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, second edition, 2006.
[18] M. Cramer, M. B. Plenio, S. T. Flammia, R. Somma, D. Gross, S. D. Bartlett, O. Landon-Cardinal, D. Poulin, and Y.-K. Liu. Efficient quantum state tomography. Nat. Commun., 1:149, 2010.
[19] M. P. da Silva, O. Landon-Cardinal, and D. Poulin. Practical characterization of quantum devices without tomog- raphy. Phys. Rev. Lett., 107(21):210404, 2011.
[20] C. Dasgupta and S.-k. Ma. Low-temperature properties of the random heisenberg antiferromagnetic chain. Phys. Rev. B, 22(3):1305, 1980.
[21] E. Dennis, A. Kitaev, and J. Preskill. Topological quantum memory. volume 43, pages 4452–4505. 2002. Quantum information theory.
[22] B. Efron and R. J. Tibshirani. An introduction to the bootstrap, volume 57 of Monographs on Statistics and Applied Probability. Chapman and Hall, New York, 1993.
[23] J. Emerson, R. Alicki, and K. Życzkowski. Scalable noise estimation with random unitary operators. J. Opt. B Quantum Semiclass. Opt., 7(10):S347–S352, 2005.
[24] T. J. Evans, R. Harper, and S. T. Flammia. Scalable Bayesian Hamiltonian learning. arXiv preprint
[25] R. M. Fano. Transmission of information: A statistical theory of communications. The M.I.T. Press, Cambridge, Mass.; John Wiley & Sons, Inc., New York-London, 1961.
[26] S. T. Flammia, D. Gross, Y.-K. Liu, and J. Eisert. Quantum tomography via compressed sensing: error bounds, sample complexity and efficient estimators. New J. Phys., 14(9):095022, 2012.
[27] S. T. Flammia and Y.-K. Liu. Direct fidelity estimation from few Pauli measurements. Phys. Rev. Lett., 106:230501, Jun 2011.
[28] N. Friis, G. Vitagliano, M. Malik, and M. Huber. Entanglement certification from theory to experiment. Nat. Rev. Phys., 1(1):72–87, 2019.
[29] X. Gao and L.-M. Duan. Efficient representation of quantum many-body states with deep neural networks. Nat. Commun., 8(1):662, 2017.
[30] D. Gosset and J. Smolin. A Compressed Classical Description of Quantum States. In 14th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2019), volume 135 of Leibniz International Proceedings in Informatics (LIPIcs), pages 8:1–8:9, Dagstuhl, Germany, 2019. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
[31] D. Gottesman. Stabilizer codes and quantum error correction. Caltech Ph. D. PhD thesis, Thesis, eprint: quant- ph/9705052, 1997.
[32] D. M. Greenberger, M. A. Horne, and A. Zeilinger. , pages 69–72. Springer Netherlands, Dordrecht, 1989.
[33] D. Gross, F. Krahmer, and R. Kueng. A partial derandomization of PhaseLift using spherical designs. J. Fourier Anal. Appl., 21(2):229–266, 2015.
[34] D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert. Quantum state tomography via compressed sensing. Phys. Rev. Lett., 105:150401, Oct 2010.
[35] M. Guta, J. Kahn, R. J. Kueng, and J. A. Tropp. Fast state tomography with optimal error bounds. J. Phys. A, 2020.
[36] O. Gühne and G. Tóth. Entanglement detection. Phys. Rep., 474(1):1 – 75, 2009. [37] J. Haah, A. W. Harrow, Z. Ji, X. Wu, and N. Yu. Sample-optimal tomography of quantum states. IEEE T. Inform. Theory, 63(9):5628–5641, 2017.
[38] W. Hoeffding. A class of statistics with asymptotically normal distribution. In Breakthroughs in Statistics, pages 308–334. Springer, 1992.
[39] Z. Hradil. Quantum-state estimation. Phys. Rev. A, 55:R1561–R1564, Mar 1997. [40] H.-Y. Huang and R. Kueng. Predicting features of quantum systems using classical shadows. arXiv preprint
[41] M. R. Jerrum, L. G. Valiant, and V. V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoret. Comput. Sci., 43(2-3):169–188, 1986.
[42] Z. Jiang, A. Kalev, W. Mruczkiewicz, and H. Neven. Optimal fermion-to-qubit mapping via ternary trees with applications to reduced quantum states learning. arXiv preprint arXiv:1910.10746, 2019.
[43] A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, and J. M. Gambetta. Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature, 549(7671):242–246, 2017.
[44] E. Knill, D. Leibfried, R. Reichle, J. Britton, R. B. Blakestad, J. D. Jost, C. Langer, R. Ozeri, S. Seidelin, and D. J. Wineland. Randomized benchmarking of quantum gates. Phys. Rev. A, 77:012307, Jan 2008.
[45] R. Koenig and J. A. Smolin. How to efficiently select an arbitrary Clifford group element. J. Math. Phys., 55(12):122202, 12, 2014.
[46] C. Kokail, C. Maier, R. van Bijnen, T. Brydges, M. K. Joshi, P. Jurcevic, C. A. Muschik, P. Silvi, R. Blatt, C. F. Roos, et al. Self-verifying variational quantum simulation of lattice models. Nature, 569(7756):355–360, 2019.
[47] R. Kueng and D. Gross. Qubit stabilizer states are complex projective 3-designs. arXiv preprint arXiv:1510.02767, 2015.
[48] R. Kueng, H. Rauhut, and U. Terstiege. Low rank matrix recovery from rank one measurements. Appl. Comput. Harmon. Anal., 42(1):88–116, 2017.
[49] R. Kueng, H. Zhu, and D. Gross. Low rank matrix recovery from Clifford orbits. arXiv preprint arXiv:1610.08070, 2016.
[50] O. Landon-Cardinal and D. Poulin. Practical learning method for multi-scale entangled states. New J. of Phys., 14(8):085004, 2012.
[51] B. P. Lanyon, C. Maier, M. Holzäpfel, T. Baumgratz, C. Hempel, P. Jurcevic, I. Dhand, A. S. Buyskikh, A. J. Daley, M. Cramer, M. B. Plenio, R. Blatt, and C. F. Roos. Efficient tomography of a quantum many-body system. Nat. Phys., 13:1158 EP –, Sep 2017.
[52] S.-k. Ma, C. Dasgupta, and C.-k. Hu. Random antiferromagnetic chain. Phys. Rev. Lett., 43(19):1434, 1979. [53] E. Magesan, J. M. Gambetta, and J. Emerson. Scalable and robust randomized benchmarking of quantum processes. Phys. Rev. Lett., 106:180504, May 2011.
[54] R. Nandkishore and D. A. Huse. Many-body localization and thermalization in quantum statistical mechanics. Annu. Rev. Condens. Matter Phys., 6(1):15–38, 2015.
[55] A. S. Nemirovsky and D. B. a. Yudin. Problem complexity and method efficiency in optimization. A WileyInterscience Publication. John Wiley & Sons, Inc., New York, 1983. Translated from the Russian and with a preface by E. R. Dawson, Wiley-Interscience Series in Discrete Mathematics.
[56] M. A. Nielsen and I. L. Chuang. Quantum computation and quantum information. Cambridge University Press, Cambridge, 2000.
[57] R. O’Donnell and J. Wright. Efficient quantum tomography. In Proceedings of the Forty-eighth Annual ACM Symposium on Theory of Computing, STOC ’16, pages 899–912, New York, NY, USA, 2016. ACM.
[60] P. Raghavan. Probabilistic construction of deterministic algorithms: approximating packing integer programs. volume 37, pages 130–143. 1988. Twenty-Seventh Annual IEEE Symposium on the Foundations of Computer Science (Toronto, ON, 1986).
[61] R. Raussendorf and H. J. Briegel. A one-way quantum computer. Phys. Rev. Lett., 86:5188–5191, May 2001. [62] G. Refael and E. Altman. Strong disorder renormalization group primer and the superfluid–insulator transition. C. R. Phys., 14(8):725–739, 2013.
[63] O. Regev. On lattices, learning with errors, random linear codes, and cryptography. J. ACM, 56(6), Sept. 2009. [64] J. M. Renes, R. Blume-Kohout, A. J. Scott, and C. M. Caves. Symmetric informationally complete quantum measurements. J. Math. Phys., 45(6):2171–2180, 2004.
[65] I. Roth, R. Kueng, S. Kimmel, Y.-K. Liu, D. Gross, J. Eisert, and M. Kliesch. Recovering quantum gates from few average gate fidelities. Phys. Rev. Lett., 121:170502, Oct 2018.
[66] S. Shalev-Shwartz, O. Shamir, and S. Shammah. Failures of gradient-based deep learning. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3067–3075, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.
[67] J. Spencer. Ten lectures on the probabilistic method, volume 64 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, second edition, 1994.
[68] T. Sugiyama, P. S. Turner, and M. Murao. Precision-guaranteed quantum tomography. Phys. Rev. Lett., 111:160406, Oct 2013.
[69] G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko, and G. Carleo. Neural-network quantum state tomography. Nat. Phys., 14(5):447–450, 2018.
[70] Z. Webb. The clifford group forms a unitary 3-design. Quantum Information & Computation, 16(15-16):1379–1400, 2016.
[71] A. Wigderson and D. Xiao. Derandomizing the Ahlswede-Winter matrix-valued Chernoff bound using pessimistic estimators, and applications. Theory Comput., 4:53–76, 2008.