35:[["$","audio",null,{"id":"tts"}],["$","$L3a",null,{"paperID":"2405.19380","publisher":"arxiv","paperJSON":{"title":"Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\\sqrt{T})$ Regret","paperID":"2405.19380","avgLineHeight":13.55,"imgScale":4,"sections":[{"heading":"Abstract","paragraphs":[[{"text":"We propose a novel Thompson sampling algorithm that learns linear quadratic regulators (LQR) with a Bayesian regret bound of ","element":"span"},{"style":{"height":18.3},"width":109.21,"height":45.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/0-0.png","element":"img","alt":" O(√T","inline":true},{"text":"). Our method leverages Langevin dynamics with a carefully designed preconditioner and incorporates a simple excitation mechanism. We show that the excitation signal drives the minimum eigenvalue of the preconditioner to grow over time, thereby accelerating the approximate posterior sampling process. Furthermore, we establish nontrivial concentration properties of the approximate posteriors generated by our algorithm. These properties enable us to bound the moments of the system state and attain an ","element":"span"},{"style":{"height":18.3},"width":109.21,"height":45.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/0-1.png","element":"img","alt":"O(√T","inline":true},{"text":") regret bound without relying on the restrictive assumptions that are often used in the literature.","element":"span"}]]},{"heading":"1 Introduction","paragraphs":[[{"text":"Balancing the exploration-exploitation trade-off is a fundamental challenge in reinforcement learning (RL) because in most cases, there is no clear criterion to choose between acting to learn about the unknown environment (‘exploration’) or making a reward-maximizing decision given the information gathered thus far (‘exploitation’). 
This dilemma has been systematically addressed by two principal approaches: ","element":"span"},{"style":{"fontStyle":"italic"},"text":"optimism in the face of uncertainty ","element":"span"},{"text":"(OFU) and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Thompson sampling ","element":"span"},{"text":"(TS). OFU-based methods construct confidence sets for the environment or model parameters using the data observed thus far. An optimistic or reward-maximizing set of parameters is then selected from within this confidence set, and a corresponding optimal policy is executed ","element":"span"},{"href":"#id-0","referenceIndex":1,"text":"[1]","element":"a"},{"text":". Algorithms based on OFU have been shown to provide strong theoretical guarantees, particularly in the context of bandit problems ","element":"span"},{"href":"#id-1","referenceIndex":2,"text":"[2]","element":"a"},{"text":". On the other hand, TS is a Bayesian method in which the environment or model parameters are sampled from a posterior distribution that is updated over time using observed data and a prior ","element":"span"},{"href":"#id-2","referenceIndex":3,"text":"[3]","element":"a"},{"text":". An optimal policy with respect to the sampled parameters is then constructed and executed. TS is often more computationally tractable than OFU, as OFU typically requires solving a nonconvex optimization problem over a confidence set in each episode. 
TS has proven effective for online learning across a wide range of sequential decision-making problems, including multi-armed bandits ","element":"span"},{"href":"#id-3","referenceIndex":4,"text":"[4–","element":"a"},{"href":"#id-4","referenceIndex":6,"text":"6]","element":"a"},{"text":", Markov decision processes ","element":"span"},{"href":"#id-5","referenceIndex":7,"text":"[7–","element":"a"},{"href":"#id-6","referenceIndex":9,"text":"9]","element":"a"},{"text":", and LQR problems ","element":"span"},{"href":"#id-7","referenceIndex":8,"text":"[8,","element":"a"},{"href":"#id-8","referenceIndex":10,"text":"10–","element":"a"},{"href":"#id-9","referenceIndex":13,"text":"13]","element":"a"},{"text":".","element":"span"}],[{"text":"In TS-based online learning, posterior sampling becomes challenging in high-dimensional settings. It is also computationally intractable when the posterior distribution lacks a closed-form expression, which occurs when the noise and prior distributions are not conjugate. To address this, Markov chain Monte Carlo (MCMC) methods, particularly Langevin MCMC, have been proposed ","element":"span"},{"href":"#id-10","referenceIndex":14,"text":"[14–","element":"a"},{"href":"#id-11","referenceIndex":17,"text":"17]","element":"a"},{"text":". Building on these theoretical foundations, Langevin MCMC has been applied to contextual bandit problems ","element":"span"},{"href":"#id-12","referenceIndex":18,"text":"[18–","element":"a"},{"href":"#id-13","referenceIndex":20,"text":"20] ","element":"a"},{"text":"and MDPs ","element":"span"},{"href":"#id-14","referenceIndex":21,"text":"[21, ","element":"a"},{"href":"#id-15","referenceIndex":22,"text":"22]","element":"a"},{"text":". Nevertheless, Langevin MCMC remains computationally intensive. 
To mitigate this issue, various acceleration techniques have been studied (see ","element":"span"},{"href":"#id-11","referenceIndex":17,"text":"[17, ","element":"a"},{"href":"#id-16","referenceIndex":23,"text":"23–","element":"a"},{"href":"#id-17","referenceIndex":26,"text":"26] ","element":"a"},{"text":"and references therein). In particular, preconditioning has been shown to be effective for improving sampling efficiency ","element":"span"},{"href":"#id-11","referenceIndex":17,"text":"[17, ","element":"a"},{"href":"#id-18","referenceIndex":27,"text":"27–","element":"a"},{"href":"#id-19","referenceIndex":29,"text":"29]","element":"a"},{"text":". Motivated by these findings, we incorporate preconditioned Langevin MCMC into TS for LQR problems.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Related work","element":"span"}],[{"text":"There is a rich body of literature on regret analysis for online learning of LQR problems; existing approaches can be grouped into the following three categories.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Certainty equivalence (CE)","element":"span"},{"text":": The certainty equivalence principle ","element":"span"},{"href":"#id-20","referenceIndex":30,"text":"[30] ","element":"a"},{"text":"has been widely adopted for learning dynamical systems with unknown transitions, wherein the policy that is optimal for the estimated system parameters is applied as if these estimates were the true parameters. 
The performance of CE-based methods has been extensively studied across various settings, including online learning ","element":"span"},{"href":"#id-21","referenceIndex":31,"text":"[31–","element":"a"},{"href":"#id-22","referenceIndex":34,"text":"34]","element":"a"},{"text":", sample complexity analysis ","element":"span"},{"href":"#id-23","referenceIndex":35,"text":"[35]","element":"a"},{"text":", finite-time stabilization ","element":"span"},{"href":"#id-24","referenceIndex":36,"text":"[36]","element":"a"},{"text":", and asymptotic regret bounds ","element":"span"},{"href":"#id-9","referenceIndex":13,"text":"[13]","element":"a"},{"text":".","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Optimism in the face of uncertainty (OFU)","element":"span"},{"text":": ","element":"span"},{"href":"#id-25","referenceIndex":37,"text":"[37, ","element":"a"},{"href":"#id-26","referenceIndex":38,"text":"38] ","element":"a"},{"text":"proposed OFU-based learning algorithms that iteratively select high-performing control actions while constructing confidence sets. These methods achieve a frequentist regret bound of ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":19.98},"width":118.83,"height":49.95,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/1-0.png","element":"img","alt":"O(√T","inline":true},{"text":"), but are often computationally impractical due to the complexity of the resulting constraints. 
","element":"span"},{"text":"To address this issue, subsequent works ","element":"span"},{"href":"#id-27","referenceIndex":39,"text":"[39,","element":"a"},{"href":"#id-28","referenceIndex":40,"text":"40] ","element":"a"},{"text":"translated the nonconvex optimization problem inherent in OFU into a semidefinite programming (SDP) formulation, attaining the same ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":19.98},"width":118.83,"height":49.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/1-1.png","element":"img","alt":"O(√T","inline":true},{"text":") regret bound with high probability. Alternatively, ","element":"span"},{"href":"#id-9","referenceIndex":13,"text":"[13,","element":"a"},{"href":"#id-29","referenceIndex":41,"text":"41] ","element":"a"},{"text":"introduced randomized control actions to avoid constructing confidence sets, while still achieving an asymptotic regret bound of ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":19.98},"width":118.82,"height":49.95,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/1-2.png","element":"img","alt":"O(√T","inline":true},{"text":"). 
More recently, ","element":"span"},{"href":"#id-30","referenceIndex":42,"text":"[42] ","element":"a"},{"text":"proposed an algorithm that rapidly stabilizes the system and attains a ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":19.98},"width":118.84,"height":49.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/1-3.png","element":"img","alt":"O(√T","inline":true},{"text":") frequentist regret bound without requiring an initial stabilizing control gain matrix.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Thompson sampling (TS)","element":"span"},{"text":": Under Gaussian noise, the frequentist regret of TS was first shown to be bounded by ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":20.33},"width":133.9,"height":50.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/1-4.png","element":"img","alt":"O(T^2/3","inline":true},{"text":") ","element":"span"},{"href":"#id-31","referenceIndex":12,"text":"[12]","element":"a"},{"text":", which was later improved to ","element":"span"},{"text":"˜","element":"span"},{"href":"#id-32","referenceIndex":43,"style":{"height":19.98},"width":271.03,"height":49.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/1-5.png","element":"img","alt":"O(√T) in [43]","inline":true,"padRight":true},{"text":"using a TS-based approach; however, this result is limited to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"scalar ","element":"span"},{"text":"systems. Subsequently, ","element":"span"},{"href":"#id-33","referenceIndex":44,"text":"[44] ","element":"a"},{"text":"extended the analysis to multidimensional systems, achieving a ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":19.98},"width":118.82,"height":49.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/1-6.png","element":"img","alt":"O(√T","inline":true},{"text":") frequentist regret bound. 
Nonetheless, the Gaussian noise assumption remains essential for establishing these guarantees. For the Bayesian regret, prior results ","element":"span"},{"href":"#id-8","referenceIndex":10,"text":"[10,","element":"a"},{"href":"#id-34","referenceIndex":45,"text":"45] ","element":"a"},{"text":"show that TS-based algorithms can achieve a ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":19.98},"width":118.84,"height":49.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/1-7.png","element":"img","alt":"O(√T","inline":true},{"text":") Bayesian regret bound. However, these methods are subject to several limitations. Specifically, both the noise and the prior distribution over system parameters are assumed to be Gaussian, so that the prior is conjugate to the likelihood and the posterior remains Gaussian. Additionally, the columns of the system parameter matrix are assumed to be mutually independent.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Comparison with ","element":"span"},{"href":"#id-13","referenceIndex":20,"style":{"fontWeight":"bold"},"text":"[20]","element":"a"},{"style":{"fontWeight":"bold"},"text":": ","element":"span"},{"text":"Our work builds on the ideas introduced in ","element":"span"},{"href":"#id-13","referenceIndex":20,"text":"[20]","element":"a"},{"text":", which focuses on multi-armed bandits. However, key differences arise from the fundamentally different nature of LQR problems. For example, in the bandit setting, the strong log-concavity of the reward function ensures linear growth of the likelihood function as more data is collected. This property plays a crucial role in their analysis. In contrast, such growth does not occur in LQR problems, prompting us to introduce an adaptive preconditioner to improve computational efficiency. 
","element":"span"},{"text":"Moreover, the Lipschitz smoothness of the log-reward function in ","element":"span"},{"href":"#id-13","referenceIndex":20,"text":"[20] ","element":"a"},{"text":"facilitates the analysis of the gap between exact and approximate posteriors—a simplification that does not hold in the LQR setting.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Contributions","element":"span"}],[{"text":"In this paper, we propose a computationally efficient approximate Thompson sampling algorithm for learning linear quadratic regulators (LQR) with a Bayesian regret bound of ","element":"span"},{"style":{"height":19.98},"width":278.86,"height":49.95,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/2-0.png","element":"img","alt":" O(√T).1 Our","inline":true,"padRight":true},{"text":"algorithm is based on carefully designed Langevin dynamics that achieve an improved convergence rate. The regret analysis is conducted under the assumption that the system noise follows a strongly log-concave distribution—a relaxation of the Gaussian noise assumption commonly adopted in prior works. 
To the best of our knowledge, our method achieves the tightest known Bayesian regret bound for online LQR learning, improving upon the existing ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":19.98},"width":304.7,"height":49.95,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/2-1.png","element":"img","alt":"O(√T) bounds2 ","inline":true,"padRight":true},{"text":"in the literature ","element":"span"},{"href":"#id-8","referenceIndex":10,"text":"[10,","element":"a"},{"href":"#id-32","referenceIndex":43,"text":"43,","element":"a"},{"href":"#id-34","referenceIndex":45,"text":"45]","element":"a"},{"text":".","element":"span"}],[{"text":"It is worth noting that in ","element":"span"},{"href":"#id-8","referenceIndex":10,"text":"[10,","element":"a"},{"href":"#id-34","referenceIndex":45,"text":"45]","element":"a"},{"text":", the system noise is assumed to be independent and identically distributed Gaussian. Moreover, the columns of the system parameter matrix are assumed to be mutually independent and Gaussian in the prior, which is key to both the tractability of their regret analysis and the simplification of posterior updates. In contrast, our work not only achieves a tighter regret bound but also relaxes these restrictive assumptions. While we adopt the assumption on system parameters from ","element":"span"},{"href":"#id-32","referenceIndex":43,"text":"[43]","element":"a"},{"text":", we go beyond their analysis by establishing a regret bound that holds for multidimensional systems.","element":"span"}],[{"text":"The two key components of our method are: (","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":") a preconditioned unadjusted Langevin algorithm (ULA) for approximate Thompson sampling, and (","element":"span"},{"style":{"fontStyle":"italic"},"text":"ii","element":"span"},{"text":") a simple excitation mechanism. 
The proposed excitation mechanism injects a noise signal into the control input at the end of each episode, which causes the minimum eigenvalue of the preconditioner to increase over time, thereby accelerating the posterior sampling process. We identify appropriate step sizes and iteration counts for the preconditioned Langevin MCMC and demonstrate both an accelerated convergence rate for approximate Thompson sampling and improved learning performance. Specifically, we show that the sampled system parameters converge to the true parameters at a rate of ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":22.89},"width":113.35,"height":57.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/2-2.png","element":"img","alt":"O(t^−1/4","inline":true,"padRight":true},{"text":"). This improvement yields a tighter bound on the system state norm, which in turn leads to the improved regret bound of ","element":"span"},{"style":{"height":19.98},"width":148.36,"height":49.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/2-3.png","element":"img","alt":" O(√T).","inline":true}]]},{"heading":"2 Preliminaries","paragraphs":[[{"style":{"fontWeight":"bold"},"text":"2.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Linear-Quadratic Regulators","element":"span"}],[{"text":"Consider a linear stochastic system of the form","element":"span"}],[{"id":"id-95","style":{"width":"69%"},"width":1296,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/2-4.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":14.62},"width":145.72,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/2-5.png","element":"img","alt":" xt ∈ Rn ","inline":true,"padRight":true},{"text":"is the system state, and 
","element":"span"},{"style":{"height":14.62},"width":163.31,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/2-6.png","element":"img","alt":" ut ∈ Rnu ","inline":true,"padRight":true},{"text":"is the control input. The disturbance ","element":"span"},{"style":{"height":14.62},"width":197.45,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/2-7.png","element":"img","alt":" wt ∈ Rn is","inline":true,"padRight":true},{"text":"an independent and identically distributed (i.i.d.) zero-mean random vector with covariance matrix ","element":"span"},{"style":{"fontWeight":"bold"},"text":"W","element":"span"},{"text":". Throughout the paper, let ","element":"span"},{"style":{"height":14.62},"width":40.18,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/2-8.png","element":"img","alt":" In","inline":true,"padRight":true},{"text":"denote the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"by ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"identity matrix, let ","element":"span"},{"style":{"height":20.96},"width":438.49,"height":52.41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/2-9.png","element":"img","alt":" |v|P :=√v⊤Pv be the","inline":true,"padRight":true},{"text":"weighted 2-norm of a vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v ","element":"span"},{"text":"with respect to a positive semidefinite matrix ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":", let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"v","element":"span"},{"style":{"fontStyle":"italic"},"text":"| ","element":"span"},{"text":"indicate the Euclidean norm, and let 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"|","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"style":{"fontStyle":"italic"},"text":"| ","element":"span"},{"text":"represent the spectral norm of a matrix ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":".","element":"span"}],[{"id":"id-38","style":{"fontWeight":"bold"},"text":"Assumption 2.1. ","element":"span"},{"text":"For every ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"2","element":"span"},{"style":{"fontStyle":"italic"},"text":", . . .","element":"span"},{"text":", the random vector ","element":"span"},{"style":{"height":10.62},"width":43.24,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-0.png","element":"img","alt":" wt","inline":true,"padRight":true},{"text":"satisfies the following properties:","element":"span"}],[{"text":"1. The probability density function (pdf) of the noise ","element":"span"},{"style":{"height":17.6},"width":78.09,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-1.png","element":"img","alt":" pw(·","inline":true},{"text":") is known and twice differentiable. Additionally, ","element":"span"},{"style":{"height":19.13},"width":939.69,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-2.png","element":"img","alt":" m̲In ⪯ −∇² log pw(·) ⪯ m̄In for some m̲, m̄ > 0","inline":true}],[{"text":"2. 
","element":"span"},{"style":{"height":17.6},"width":753.35,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-3.png","element":"img","alt":" E[wt] = 0 and E[wtwt⊤] = W, where W","inline":true,"padRight":true},{"text":"is positive definite.","element":"span"}],[{"text":"Our analysis covers a broader class of disturbances than existing methods ","element":"span"},{"href":"#id-8","referenceIndex":10,"text":"[10,","element":"a"},{"href":"#id-32","referenceIndex":43,"text":"43,","element":"a"},{"href":"#id-34","referenceIndex":45,"text":"45]","element":"a"},{"text":", since any multivariate Gaussian distribution with positive definite covariance W satisfies Assumption 2.1: its negative log-density has the constant Hessian W⁻¹, whose eigenvalues lie between 1/λmax(W) and 1/λmin(W).","element":"span"}],[{"text":"Let ","element":"span"},{"style":{"height":20.8},"width":1725.96,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-4.png","element":"img","alt":" d := n + nu and Θ be the system parameter matrix defined by Θ := [Θ(1) · · · Θ(n)] :=","inline":true},{"style":{"height":21},"width":918.84,"height":52.49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-5.png","element":"img","alt":"[A B]⊤ ∈ Rd×n, where Θ(i) ∈ Rd is the i","inline":true},{"text":"th column of Θ. ","element":"span"},{"text":"We also let ","element":"span"},{"style":{"height":17.6},"width":323.41,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-6.png","element":"img","alt":" θ := vec(Θ) :=","inline":true,"padRight":true},{"text":"(Θ(1)","element":"span"},{"style":{"height":19.53},"width":440.39,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-7.png","element":"img","alt":", Θ(2), . . . , Θ(n)) ∈ Rdn ","inline":true,"padRight":true},{"text":"denote the vectorized version of Θ. 
We often refer to ","element":"span"},{"style":{"height":12.8},"width":21,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-8.png","element":"img","alt":" θ","inline":true,"padRight":true},{"text":"as the parameter vector.","element":"span"}],[{"text":"Let ","element":"span"},{"style":{"height":17.6},"width":883.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-9.png","element":"img","alt":" ht := (x1, u1, . . . , xt−1, ut−1, xt) be the history","inline":true,"padRight":true},{"text":"of observations made up to time ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", and let ","element":"span"},{"style":{"height":14.62},"width":48.27,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-10.png","element":"img","alt":"Ht","inline":true,"padRight":true},{"text":"denote the collection of such histories at stage ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":". A (deterministic) policy ","element":"span"},{"style":{"height":10.22},"width":36.88,"height":25.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-11.png","element":"img","alt":" πt","inline":true,"padRight":true},{"text":"maps history ","element":"span"},{"style":{"height":15.02},"width":91.94,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-12.png","element":"img","alt":" ht to","inline":true,"padRight":true},{"text":"action ","element":"span"},{"style":{"height":17.6},"width":1740.62,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-13.png","element":"img","alt":" ut, i.e., πt(ht) = ut. The set of admissible policies is defined as Π := {π = (π1, π2, . . .) 
| πt :","inline":true},{"style":{"height":14.62},"width":188.44,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-14.png","element":"img","alt":"Ht → Rnu ","inline":true,"padRight":true},{"text":"is measurable ","element":"span"},{"style":{"height":17.6},"width":73.82,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-15.png","element":"img","alt":" ∀t}.","inline":true}],[{"text":"The stage-wise cost is chosen to be a quadratic function of the form ","element":"span"},{"style":{"height":17.6},"width":527.86,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-16.png","element":"img","alt":" c(xt, ut) := x⊤t Qxt+u⊤t Rut,","inline":true,"padRight":true},{"text":"where ","element":"span"},{"style":{"height":16.73},"width":187.24,"height":41.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-17.png","element":"img","alt":" Q ∈ Rn×n ","inline":true,"padRight":true},{"text":"is symmetric positive semidefinite and ","element":"span"},{"style":{"height":13.93},"width":223.39,"height":34.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-18.png","element":"img","alt":" R ∈ Rnu×nu ","inline":true,"padRight":true},{"text":"is symmetric positive definite. 
The cost matrices ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Q ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"R ","element":"span"},{"text":"are assumed to be known.","element":"span"},{"style":{"height":8.4},"width":17,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-19.png","element":"img","alt":"4 ","inline":true,"padRight":true},{"text":"We consider the infinite-horizon average cost LQ setting with the following cost function:","element":"span"}],[{"style":{"width":"68%"},"width":1276,"height":129,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-20.png","element":"img"}],[{"text":"Given ","element":"span"},{"style":{"height":19.53},"width":327.89,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-21.png","element":"img","alt":" θ ∈ Rdn, π∗(x; θ","inline":true},{"text":") denotes an optimal policy if it exists, and the corresponding optimal cost is given by ","element":"span"},{"style":{"height":17.6},"width":370.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-22.png","element":"img","alt":" J(θ) = infπ∈Π Jπ(θ","inline":true},{"text":"). It is well known that the optimal policy and cost can be obtained using the Riccati equation under the standard stabilizability and observability assumptions (e.g., ","element":"span"},{"href":"#id-35","referenceIndex":46,"text":"[46]","element":"a"},{"text":").","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"Theorem 2.2. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose that ","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"A, B","element":"span"},{"text":") ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is stabilizable, and ","element":"span"},{"text":"(","element":"span"},{"style":{"height":20.33},"width":156.42,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-23.png","element":"img","alt":"A, Q1/2)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is observable. Then, the following algebraic Riccati equation (ARE) has a unique positive definite solution ","element":"span"},{"style":{"height":17.6},"width":122.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-24.png","element":"img","alt":" P ∗(θ):","inline":true}],[{"style":{"width":"84%"},"width":1584,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-25.png","element":"img"}],[{"id":"id-37","style":{"fontStyle":"italic"},"text":"Furthermore, the optimal cost function is given by ","element":"span"},{"style":{"height":17.6},"width":393.79,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-26.png","element":"img","alt":" J(θ) = tr(WP ∗(θ))","inline":true},{"style":{"fontStyle":"italic"},"text":", which is continuously differentiable with respect to ","element":"span"},{"style":{"height":12.8},"width":21,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-27.png","element":"img","alt":" θ","inline":true},{"style":{"fontStyle":"italic"},"text":", and the optimal policy is uniquely obtained as ","element":"span"},{"style":{"height":17.6},"width":346.22,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-28.png","element":"img","alt":" π∗(x; θ) = K(θ)x,","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"where the control gain 
matrix ","element":"span"},{"style":{"height":17.6},"width":95.85,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-29.png","element":"img","alt":" K(θ)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is given by ","element":"span"},{"style":{"height":19.13},"width":791.63,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-30.png","element":"img","alt":" K(θ) := −(R + B⊤P ∗(θ)B)−1B⊤P ∗(θ)A.","inline":true}],[{"text":"The optimal policy, called the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"linear-quadratic regulator ","element":"span"},{"text":"(LQR), is an asymptotically stabilizing controller: in the absence of noise, it drives the closed-loop system state to the origin; that is, every eigenvalue of ","element":"span"},{"style":{"height":17.6},"width":215.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-31.png","element":"img","alt":" A + BK(θ)","inline":true,"padRight":true},{"text":"lies strictly inside the unit circle ","element":"span"},{"href":"#id-35","referenceIndex":46,"text":"[46]","element":"a"},{"text":".","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"2.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Online learning of LQR","element":"span"}],[{"text":"The theory of LQR applies when the true system parameters ","element":"span"},{"style":{"height":20.8},"width":630.24,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-32.png","element":"img","alt":" θ∗ := vec(Θ∗) := vec([A∗ B∗]⊤)","inline":true,"padRight":true},{"text":"are fully known and (A∗, B∗) is stabilizable. However, we consider the case in which the true parameter vector ","element":"span"},{"style":{"height":15.02},"width":37.48,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/3-33.png","element":"img","alt":" θ∗","inline":true,"padRight":true},{"text":"is unknown. 
Online learning is a popular approach to addressing this case ","element":"span"},{"href":"#id-25","referenceIndex":37,"text":"[37]","element":"a"},{"text":". The performance of an online learning algorithm is typically measured by regret. In particular, we consider the Bayesian setting where the prior distribution ","element":"span"},{"style":{"height":11.6},"width":38.96,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-0.png","element":"img","alt":" p1","inline":true,"padRight":true},{"text":"of the true system parameter random variable ","element":"span"},{"text":"¯","element":"span"},{"style":{"height":15.02},"width":37.49,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-1.png","element":"img","alt":"θ∗","inline":true,"padRight":true},{"text":"is assumed to be given, and define the Bayesian regret over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"stages as:","element":"span"}],[{"id":"id-72","style":{"width":"67%"},"width":1264,"height":129,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-2.png","element":"img"}],[{"text":"The expectation is taken with respect to the distributions of system noise (","element":"span"},{"style":{"height":17.6},"width":384.21,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-3.png","element":"img","alt":"w1, w2, . . . 
, wT ), the","inline":true,"padRight":true},{"text":"internal randomness of the learning algorithm, and the prior distribution, since our belief about the true system parameters is available only in the form of the prior distribution.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"2.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Thompson sampling","element":"span"}],[{"text":"Thompson sampling (TS), also known as posterior sampling, has been applied to a broad class of online learning problems ","element":"span"},{"href":"#id-36","referenceIndex":47,"text":"[47]","element":"a"},{"text":". The naive TS algorithm for learning LQR starts by sampling a system parameter from the posterior ","element":"span"},{"style":{"height":12},"width":44.3,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-4.png","element":"img","alt":" µk","inline":true,"padRight":true},{"text":"at the beginning of episode ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":". Treating this sampled parameter as the true one, the control gain matrix ","element":"span"},{"style":{"height":17.6},"width":95.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-5.png","element":"img","alt":" K(θk","inline":true},{"text":") is computed by solving the ARE ","element":"span"},{"href":"#id-37","text":"(3)","element":"a"},{"text":". During the episode, the control gain matrix is used to produce the control action ","element":"span"},{"style":{"height":17.6},"width":441.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-6.png","element":"img","alt":" ut = K(θk)xt, where xt","inline":true,"padRight":true},{"text":"is the system state observed at time ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":". Along the way, state-input data are collected, and the posterior is updated using this dataset. 
We use dynamic episodes, meaning that the episode length increases as learning proceeds. Specifically, the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":"th episode starts at ","element":"span"},{"style":{"height":24.22},"width":193.82,"height":60.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-7.png","element":"img","alt":" t = k(k+1)/2","inline":true,"padRight":true},{"text":"and the sampled system parameter is used throughout the episode.","element":"span"}],[{"text":"The posterior update is performed using Bayes’ rule, which preserves log-concavity. To see this, let ","element":"span"},{"style":{"height":19.54},"width":1300.1,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-8.png","element":"img","alt":" zt := (xt, ut) ∈ Rd and write p(xt+1|zt, θ) = pw(xt+1 − Θ⊤zt), which","inline":true,"padRight":true},{"text":"is log-concave with respect to ","element":"span"},{"style":{"height":12.8},"width":21,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-9.png","element":"img","alt":" θ","inline":true,"padRight":true},{"text":"under Assumption ","element":"span"},{"href":"#id-38","text":"2.1. 
","element":"a"},{"text":"Hence, the posterior at stage ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"is given by","element":"span"}],[{"style":{"width":"79%"},"width":1484,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-10.png","element":"img"}],[{"id":"id-50","text":"Thus, if ","element":"span"},{"style":{"height":17.6},"width":109.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-11.png","element":"img","alt":" p(θ|ht","inline":true},{"text":") is log-concave, then so is ","element":"span"},{"style":{"height":17.6},"width":184.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-12.png","element":"img","alt":" p(θ|ht+1).","inline":true}],[{"text":"However, exact sampling from the posterior can be computationally intractable, particularly when the distributions at hand are not conjugate. Without conjugacy, the posterior distribution generally has no closed-form expression. 
A popular remedy is to use Markov chain Monte Carlo (MCMC) algorithms, which sample from the posterior approximately but tractably, as described in the following subsection.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"2.4 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"The unadjusted Langevin algorithm (ULA)","element":"span"}],[{"text":"Consider the problem of sampling from a probability distribution with density ","element":"span"},{"style":{"height":20.33},"width":395.77,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-13.png","element":"img","alt":" p(x) ∝ e−U(x), where","inline":true,"padRight":true},{"text":"the potential ","element":"span"},{"style":{"height":12.4},"width":243.53,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-14.png","element":"img","alt":" U : Rnx → R","inline":true,"padRight":true},{"text":"is twice differentiable. The Langevin dynamics take the form","element":"span"}],[{"id":"id-43","style":{"width":"65%"},"width":1223,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-15.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":14.62},"width":51.1,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-16.png","element":"img","alt":" Bτ","inline":true,"padRight":true},{"text":"is a standard Brownian motion in ","element":"span"},{"style":{"height":12},"width":69.07,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-17.png","element":"img","alt":" Rnx","inline":true},{"text":". 
It is well known that, given an arbitrary ","element":"span"},{"style":{"height":15.6},"width":145.46,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-18.png","element":"img","alt":" X0, the","inline":true,"padRight":true},{"text":"pdf of ","element":"span"},{"style":{"height":17.24},"width":51.15,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-19.png","element":"img","alt":" Xξ","inline":true,"padRight":true},{"text":"converges to the target pdf ","element":"span"},{"href":"#id-39","referenceIndex":24,"style":{"height":17.6},"width":474.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-20.png","element":"img","alt":" p(x) as ξ → ∞ [24, 48].","inline":true,"padRight":true},{"text":"To approximate ","element":"span"},{"style":{"height":16.4},"width":265.31,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-21.png","element":"img","alt":" Xτ, we apply","inline":true,"padRight":true},{"text":"the Euler–Maruyama discretization to the Langevin diffusion, yielding the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"unadjusted Langevin algorithm ","element":"span"},{"text":"(ULA):","element":"span"}],[{"id":"id-44","style":{"width":"68%"},"width":1277,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/4-22.png","element":"img"}],[{"text":"where (","element":"span"},{"style":{"height":18.22},"width":134.59,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-0.png","element":"img","alt":"Wj)j≥1","inline":true,"padRight":true},{"text":"are i.i.d. 
standard ","element":"span"},{"style":{"height":10.62},"width":45.19,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-1.png","element":"img","alt":" nx","inline":true},{"text":"-dimensional Gaussian random vectors, and (","element":"span"},{"style":{"height":18.22},"width":286.13,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-2.png","element":"img","alt":"γj)j≥1 are step","inline":true,"padRight":true},{"text":"sizes. While Metropolis–Hastings corrections are often used to mitigate discretization error ","element":"span"},{"href":"#id-40","referenceIndex":15,"text":"[15,","element":"a"},{"href":"#id-41","referenceIndex":49,"text":"49]","element":"a"},{"text":", sufficiently small step sizes can make such corrections unnecessary. In this work, we propose adaptive step sizes and iteration counts that yield improved concentration properties, as discussed in Section ","element":"span"},{"href":"#id-42","text":"3.2.","element":"a"}],[{"text":"The condition number of the Hessian of the potential is a key factor in determining the rate of convergence. More precisely, ULA satisfies the following concentration property, which is a modification of Theorem 5 in ","element":"span"},{"href":"#id-13","referenceIndex":20,"text":"[20]","element":"a"},{"text":".","element":"span"}],[{"id":"id-56","style":{"fontStyle":"italic"},"text":"Remark ","element":"span"},{"text":"2.3","element":"span"},{"style":{"fontStyle":"italic"},"text":". ","element":"span"},{"text":"Note that if ","element":"span"},{"style":{"height":18.33},"width":524.12,"height":45.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-3.png","element":"img","alt":" X0 ∼ e−U, then Xt ∼ e−U ","inline":true,"padRight":true},{"text":"in ","element":"span"},{"href":"#id-43","text":"(6) ","element":"a"},{"text":"for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":". 
Thus, the noise sequence in ","element":"span"},{"href":"#id-44","text":"(7) ","element":"a"},{"text":"used to generate ","element":"span"},{"style":{"height":15.1},"width":286.32,"height":37.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-4.png","element":"img","alt":" XN for N ∈ N","inline":true,"padRight":true},{"text":"can be regarded as a realization of the continuous-time Brownian motion in ","element":"span"},{"href":"#id-43","text":"(6) ","element":"a"},{"text":"up to time ","element":"span"},{"style":{"height":24.4},"width":248.68,"height":61.01,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-5.png","element":"img","alt":" τ = Σ_{j=0}^{N−1} γj","inline":true},{"text":", as further specified in Appendix ","element":"span"},{"href":"#id-45","text":"A.1.","element":"a"}],[{"id":"id-48","style":{"fontWeight":"bold"},"text":"Theorem 2.4. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose that the pdf ","element":"span"},{"style":{"height":20.33},"width":258.1,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-6.png","element":"img","alt":" p(x) ∝ e−U(x) ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is strongly log-concave and ","element":"span"},{"style":{"height":19.13},"width":360.84,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-7.png","element":"img","alt":" λminI ⪯ ∇2U(x) ⪯","inline":true},{"style":{"height":16.4},"width":743.68,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-8.png","element":"img","alt":"λmaxI for all x, where λmax, λmin > 0","inline":true},{"style":{"fontStyle":"italic"},"text":". 
Let the stepsize be given by ","element":"span"},{"style":{"height":25.94},"width":532.4,"height":64.86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-9.png","element":"img","alt":" γj ≡ γ = O(λmin/λmax²) and the","inline":true}],[{"style":{"fontStyle":"italic"},"text":"number of iterations ","element":"span"},{"style":{"height":23.89},"width":1179.15,"height":59.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-10.png","element":"img","alt":" N satisfy N = Ω((λmax/λmin)²).⁵ Given X0 ∈ arg min U(x), let pN","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"denote the pdf of ","element":"span"},{"style":{"height":14.7},"width":66.15,"height":36.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-11.png","element":"img","alt":" XN","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"obtained by iterating ","element":"span"},{"href":"#id-44","text":"(7)","element":"a"},{"style":{"fontStyle":"italic"},"text":". 
Then, ","element":"span"},{"style":{"height":32.9},"width":571.84,"height":82.26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-12.png","element":"img","alt":" E_{x∼p, x̃∼pN}[|x − x̃|²]^{1/2} ≤ O(","inline":true}],[{"style":{"fontStyle":"italic"},"text":"solution to ","element":"span"},{"href":"#id-43","text":"(6) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"with ","element":"span"},{"style":{"height":18.55},"width":244.69,"height":46.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-13.png","element":"img","alt":" X0 ∼ e−U(x) ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and the joint probability distribution of ","element":"span"},{"style":{"height":16},"width":428.16,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-14.png","element":"img","alt":" x ∼ p and x̃ ∼ pN is","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"obtained via the shared Brownian motion.","element":"span"}]]},{"heading":"3 Online Learning Algorithm","paragraphs":[[{"text":"The naive TS approach for learning LQR has two main weaknesses. The first is the possible selection of a destabilizing controller, which can cause the system state to grow exponentially and lead to unbounded regret. To address this issue, we control the probability that the state attains excessively large norms. The second weakness stems from inefficient sampling when the system noise and prior distributions are not conjugate. In such cases, ULA offers an alternative for posterior approximation, but it is often extremely slow. 
To accelerate the sampling process, we introduce a preconditioning technique.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"3.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Preconditioned ULA for approximate posterior sampling","element":"span"}],[{"text":"One of the key components of our learning algorithm is approximate posterior sampling via preconditioned Langevin dynamics. The potential in ULA is chosen as ","element":"span"},{"style":{"height":17.6},"width":550.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-15.png","element":"img","alt":" Ut(θ) := − log p(θ|ht), where","inline":true},{"style":{"height":17.6},"width":109.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-16.png","element":"img","alt":"p(θ|ht","inline":true},{"text":") denotes the posterior distribution of the true system parameter given the history up to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":". Unfortunately, a direct application of ULA to TS for LQR is inefficient because it requires a large number of iterations. To accelerate the convergence of the Langevin dynamics, we propose a preconditioning technique.","element":"span"},{"style":{"height":8.4},"width":17,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-17.png","element":"img","alt":"6","inline":true}],[{"text":"To describe the preconditioned Langevin dynamics, we choose a positive definite matrix ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":", referred to as a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"preconditioner","element":"span"},{"text":". 
The change of variables ","element":"span"},{"style":{"height":22.89},"width":797.63,"height":57.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-18.png","element":"img","alt":" θ′ = P^{1/2}θ yields dθτ = −P^{−1}∇Ut(θτ)dτ +","inline":true},{"style":{"height":18.86},"width":212.88,"height":47.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-19.png","element":"img","alt":"√(2P^{−1}) dBτ","inline":true},{"text":". Applying the Euler–Maruyama discretization with constant stepsize ","element":"span"},{"style":{"height":11.6},"width":24,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-20.png","element":"img","alt":" γ","inline":true,"padRight":true},{"text":"yields the preconditioned ULA:","element":"span"}],[{"style":{"width":"70%"},"width":1322,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-21.png","element":"img"}],[{"text":"where (","element":"span"},{"style":{"height":18.22},"width":134.59,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/5-22.png","element":"img","alt":"Wj)j≥1","inline":true,"padRight":true},{"text":"is an i.i.d. 
sequence of standard ","element":"span"},{"style":{"fontStyle":"italic"},"text":"dn","element":"span"},{"text":"-dimensional Gaussian random vectors.","element":"span"}],[{"text":"Given the collected data ","element":"span"},{"style":{"height":17.6},"width":205.2,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-0.png","element":"img","alt":" zt = (xt, ut","inline":true},{"text":"), the preconditioner in our setting is defined as","element":"span"}],[{"style":{"width":"67%"},"width":1262,"height":129,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-1.png","element":"img"}],[{"text":"where blkdiag","element":"span"},{"style":{"height":20.02},"width":354.82,"height":50.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-2.png","element":"img","alt":"{Ai}_{i=1}^{n} ∈ R^{dn×dn}","inline":true,"padRight":true},{"text":"denotes the block diagonal matrix with diagonal blocks ","element":"span"},{"style":{"height":16},"width":390.3,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-3.png","element":"img","alt":" Ai, and λ > 0 is a","inline":true,"padRight":true},{"text":"constant determined by the prior. As the following lemma shows, once the Hessian of the potential is rescaled by the preconditioner, its eigenvalues are bounded above and below uniformly in θ and t:","element":"span"}],[{"id":"id-47","style":{"fontWeight":"bold"},"text":"Lemma 3.1. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose Assumption ","element":"span"},{"href":"#id-38","style":{"fontStyle":"italic"},"text":"2.1 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"holds and the potential of the prior satisfies ","element":"span"},{"style":{"height":20.05},"width":291.33,"height":50.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-4.png","element":"img","alt":" ∇2θU1(·) = λIdn","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"for some ","element":"span"},{"style":{"height":13.2},"width":115.32,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-5.png","element":"img","alt":" λ > 0","inline":true},{"style":{"fontStyle":"italic"},"text":". Then, for all ","element":"span"},{"style":{"height":27.25},"width":1260.94,"height":68.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-6.png","element":"img","alt":" θ and t, we have mIdn ⪯ Pt^{−1/2}∇2Ut(θ)Pt^{−1/2} ⪯ MIdn, where m =","inline":true,"padRight":true},{"text":"min","element":"span"},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"style":{"fontStyle":"italic"},"text":"m, ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"} ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M ","element":"span"},{"text":"= max","element":"span"},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"style":{"fontStyle":"italic"},"text":"m, ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":"}","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"}],[{"text":"The proof of this lemma can be found in Appendix ","element":"span"},{"href":"#id-46","text":"A.2. 
","element":"a"},{"text":"It follows from Lemma ","element":"span"},{"href":"#id-47","text":"3.1 ","element":"a"},{"text":"and Theorem ","element":"span"},{"href":"#id-48","text":"2.4 ","element":"a"},{"text":"that we can rescale the number of iterations required for the convergence of ULA while ensuring improved accuracy in the concentration of the sampled system parameter. In fact, we show later that the number of required iterations scales only with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n","element":"span"},{"text":". To demonstrate the effect of preconditioning, note that Lemma ","element":"span"},{"href":"#id-47","text":"3.1 ","element":"a"},{"text":"implies ","element":"span"},{"style":{"height":19.13},"width":744.6,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-7.png","element":"img","alt":" mλmin(Pt)Idn ⪯ ∇2Ut ⪯ Mλmax(Pt)Idn","inline":true},{"text":". Theorem ","element":"span"},{"href":"#id-48","text":"2.4 ","element":"a"},{"text":"then implies that ","element":"span"},{"style":{"height":20.8},"width":475.94,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-8.png","element":"img","alt":" O((λmax(Pt)/λmin(Pt))²)","inline":true,"padRight":true},{"text":"iterations are needed to achieve an error bound of ","element":"span"},{"style":{"height":21.59},"width":321.85,"height":53.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-9.png","element":"img","alt":"O(1/√λmin(Pt))","inline":true},{"text":". Our algorithm improves this bound to ","element":"span"},{"style":{"height":21.59},"width":481.85,"height":53.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-10.png","element":"img","alt":" O(1/√max{λmin(Pt), t})","inline":true},{"text":". 
Throughout the paper, we use the notation ","element":"span"},{"style":{"height":16.87},"width":202.06,"height":42.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-11.png","element":"img","alt":" Uk := Utk","inline":true,"padRight":true},{"text":"to explicitly indicate the dependence on the current episode ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":".","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Remark ","element":"span"},{"text":"3.2","element":"span"},{"style":{"fontStyle":"italic"},"text":". ","element":"span"},{"text":"Our preconditioner can be viewed as an adaptive scaling mechanism analogous to the Fisher information matrix in natural policy gradient methods. This connection arises because the empirical covariance matrix captures the local curvature of the posterior distribution, effectively conditioning the Langevin dynamics for more efficient sampling.","element":"span"}],[{"id":"id-42","style":{"fontWeight":"bold"},"text":"3.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Algorithm","element":"span"}],[{"text":"We begin by introducing the following log-concavity condition on the prior, centered arbitrarily. This condition is a slight relaxation of the assumption in ","element":"span"},{"href":"#id-8","referenceIndex":10,"text":"[10]","element":"a"},{"text":".","element":"span"}],[{"id":"id-57","style":{"width":"97%"},"width":1830,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-12.png","element":"img"}],[{"text":"The initialization of the preconditioner ","element":"span"},{"style":{"height":14.62},"width":40.02,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-13.png","element":"img","alt":" Pt","inline":true,"padRight":true},{"text":"plays a crucial role in the efficiency of the sampling process. 
If ","element":"span"},{"style":{"height":14.62},"width":45.02,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-14.png","element":"img","alt":" P0","inline":true,"padRight":true},{"text":"is too small, the algorithm may suffer from slow exploration due to small step sizes in the Langevin dynamics. Conversely, if ","element":"span"},{"style":{"height":14.62},"width":45.02,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-15.png","element":"img","alt":" P0","inline":true,"padRight":true},{"text":"is too large, the algorithm may place excessive trust in the prior, potentially slowing adaptation to the true system parameters. Our choice of ","element":"span"},{"style":{"height":15.02},"width":248.8,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-16.png","element":"img","alt":" P0 = λI with","inline":true,"padRight":true},{"text":"a moderate ","element":"span"},{"style":{"height":12.8},"width":26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-17.png","element":"img","alt":" λ","inline":true,"padRight":true},{"text":"strikes a balance between these effects. 
Mathematically, it suffices to take ","element":"span"},{"style":{"height":13.2},"width":71.57,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-18.png","element":"img","alt":" λ >","inline":true,"padRight":true},{"text":"0, but we assume ","element":"span"},{"style":{"height":14.8},"width":71.58,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-19.png","element":"img","alt":" λ ≥","inline":true,"padRight":true},{"text":"1 to simplify the analysis.","element":"span"}],[{"text":"Following ","element":"span"},{"href":"#id-32","referenceIndex":43,"text":"[43]","element":"a"},{"text":", we consider an admissible set of parameters defined as ","element":"span"},{"style":{"height":19.53},"width":426.94,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-20.png","element":"img","alt":" C := {θ ∈ Rdn : |θ| ≤","inline":true},{"style":{"height":17.6},"width":693,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-21.png","element":"img","alt":"S, |A + BK(θ)| ≤ ρ < 1, J(θ) ≤ MJ}","inline":true,"padRight":true},{"text":"for some constants ","element":"span"},{"style":{"height":20.8},"width":800.32,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-22.png","element":"img","alt":" S, ρ, MJ > 0, where θ = vec([A B]⊤). To","inline":true,"padRight":true},{"text":"sample from the posterior distribution, we restrict the sample to lie within ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"via rejection sampling. 
This ensures that for any sampled system parameter ","element":"span"},{"style":{"height":13.2},"width":104.91,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-23.png","element":"img","alt":" θ ∈ C","inline":true},{"text":", there exists a positive constant ","element":"span"},{"style":{"height":14.7},"width":83.52,"height":36.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-24.png","element":"img","alt":" MP ∗","inline":true,"padRight":true},{"text":"such that ","element":"span"},{"style":{"height":17.6},"width":274.58,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-25.png","element":"img","alt":" |P ∗(θ)| ≤ MP ∗","inline":true,"padRight":true},{"href":"#id-31","referenceIndex":12,"text":"[12]","element":"a"},{"text":". Consequently, ","element":"span"},{"style":{"height":17.6},"width":683.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-26.png","element":"img","alt":" |[I K(θ)⊤]| ≤ MK for some MK >","inline":true,"padRight":true},{"text":"1, and therefore, ","element":"span"},{"style":{"height":18.62},"width":741.27,"height":46.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-27.png","element":"img","alt":"|A∗ + B∗K(θ)| ≤ Mρ for some Mρ ≥ 1.","inline":true}],[{"text":"Our proposed algorithm is presented in Algorithm ","element":"span"},{"href":"#id-49","text":"1. ","element":"a"},{"text":"We employ dynamic episode scheduling, as it has been shown to be effective in the literature ","element":"span"},{"href":"#id-8","referenceIndex":10,"text":"[10,","element":"a"},{"href":"#id-31","referenceIndex":12,"text":"12,","element":"a"},{"href":"#id-25","referenceIndex":37,"text":"37]","element":"a"},{"text":". 
In the algorithm, ","element":"span"},{"style":{"height":15.24},"width":174.46,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-28.png","element":"img","alt":" tk and Tk","inline":true,"padRight":true},{"text":"denote the start time and the length of episode ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":", respectively. By definition, ","element":"span"},{"style":{"height":16.44},"width":594.72,"height":41.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/6-29.png","element":"img","alt":" t1 = 1 and tk+1 = tk + Tk. The","inline":true}],[{"id":"id-49","style":{"width":"100%"},"width":1872,"height":1436,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/7-0.png","element":"img"}],[{"text":"episode length is chosen as ","element":"span"},{"style":{"height":15.24},"width":127.16,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/7-1.png","element":"img","alt":" Tk = k","inline":true,"padRight":true},{"text":"+ 1. To update the posterior—or equivalently, the potential—at episode ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":", we use the dataset ","element":"span"},{"style":{"height":18.22},"width":534.86,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/7-2.png","element":"img","alt":" D := {(zt, xt+1)}tk−1≤t≤tk−1","inline":true,"padRight":true},{"text":"collected during the previous episode. 
It follows from ","element":"span"},{"href":"#id-50","text":"(5) ","element":"a"},{"text":"that the potential can be updated as in Line 5 of Algorithm 1, where ","element":"span"},{"style":{"height":14.62},"width":55.6,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/7-3.png","element":"img","alt":" U0","inline":true,"padRight":true},{"text":"is initialized as ","element":"span"},{"style":{"height":15.2},"width":60.72,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/7-4.png","element":"img","alt":" U1,","inline":true,"padRight":true},{"text":"the potential of the prior. ","element":"span"},{"text":"Approximate TS is then performed using the preconditioned ULA with the preconditioner, step size, and number of iterations chosen as ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":17.27},"width":489.3,"height":43.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/7-5.png","element":"img","alt":"Pk := Ptk, ˜γk := γtk and","inline":true,"padRight":true},{"text":"˜","element":"span"},{"style":{"height":18.07},"width":527.48,"height":45.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/7-6.png","element":"img","alt":"Nk := max(1, ⌈Ntk⌉), where","inline":true}],[{"style":{"width":"98%"},"width":1837,"height":148,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/7-7.png","element":"img"}],[{"text":"Here, ","element":"span"},{"style":{"height":17.42},"width":314.34,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/7-8.png","element":"img","alt":" λmin,t and λmax,t","inline":true,"padRight":true},{"text":"denote the minimum and maximum eigenvalues of ","element":"span"},{"style":{"height":14.62},"width":40.02,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/7-9.png","element":"img","alt":" Pt","inline":true},{"text":". 
This choice is based on a detailed analysis of the concentration properties of ULA, as established in Proposition ","element":"span"},{"href":"#id-51","text":"4.1. ","element":"a"},{"text":"The ceiling and max operations applied to ","element":"span"},{"style":{"height":20.88},"width":356.98,"height":52.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/7-10.png","element":"img","alt":" Ntk ensure ˜Nk ∈ N","inline":true},{"text":", which rules out an infinite rejection loop when ˜","element":"span"},{"style":{"height":17.42},"width":1226.86,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/7-11.png","element":"img","alt":"Nk = 0. In the algorithm, we obtain the unique minimizer θmin,t","inline":true,"padRight":true},{"text":"using Newton’s method.","element":"span"}],[{"text":"After performing the preconditioned ULA update ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":14.84},"width":53.06,"height":37.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/7-12.png","element":"img","alt":"Nk","inline":true,"padRight":true},{"text":"times, we check whether ","element":"span"},{"style":{"height":20.41},"width":241.06,"height":51.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/7-13.png","element":"img","alt":" θ ˜Nk ∈ C. If","inline":true},{"text":"so, the sampled parameter is accepted, and the corresponding control gain matrix is computed by solving the ARE ","element":"span"},{"href":"#id-37","text":"(3)","element":"a"},{"text":". 
To ensure that the rejection step terminates after finitely many iterations, we assume that there exists a small positive constant ","element":"span"},{"style":{"height":8},"width":18,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/7-14.png","element":"img","alt":" ϵ","inline":true,"padRight":true},{"text":"such that, for each episode ","element":"span"},{"style":{"height":20.6},"width":599.79,"height":51.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/7-15.png","element":"img","alt":" k, Pr(˜θk ∈ C) ≥ 1 − ϵ under the","inline":true,"padRight":true},{"text":"posterior distribution. Although this assumption may appear restrictive, it held empirically in all of our examples, as shown in Appendix ","element":"span"},{"href":"#id-52","text":"C.3.","element":"a"}],[{"style":{"width":"46%"},"width":873,"height":210,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/8-0.png","element":"img"}],[{"text":"Figure 1: Infusing noise for enhanced exploration","element":"figcaption","subtype":"caption"}],[{"id":"id-53","text":"A novel component of our algorithm is the injection of a noise signal into the control input ","element":"span"},{"style":{"height":10.62},"width":36.98,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/8-1.png","element":"img","alt":" ut","inline":true,"padRight":true},{"text":"at the end of each episode, as illustrated in Figure ","element":"span"},{"href":"#id-53","text":"1. ","element":"a"},{"text":"This perturbation enhances exploration. The external noise signal is assumed to satisfy the following:","element":"span"}],[{"id":"id-64","style":{"fontWeight":"bold"},"text":"Assumption 3.4. 
","element":"span"},{"text":"The random variable ","element":"span"},{"style":{"height":17.43},"width":299.3,"height":43.58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/8-2.png","element":"img","alt":" νs ∈ Rnu is ¯Lν","inline":true},{"text":"-sub-Gaussian,","element":"span"},{"style":{"height":8.4},"width":17,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/8-3.png","element":"img","alt":"7 ","inline":true,"padRight":true},{"text":"and satisfies ","element":"span"},{"style":{"height":15.02},"width":183.8,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/8-4.png","element":"img","alt":" νs = 0 if","inline":true},{"style":{"height":18.22},"width":468.97,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/8-5.png","element":"img","alt":"s ∈ [tj, tj+1 − 2] for j ≥","inline":true,"padRight":true},{"text":"2. Moreover, ","element":"span"},{"style":{"height":17.6},"width":546.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/8-6.png","element":"img","alt":" E[νs] = 0 and W′ := E[νsν⊤s ","inline":true,"padRight":true},{"text":"] is a positive definite matrix ","element":"span"},{"text":"whose maximum and minimum eigenvalues are identical to those of ","element":"span"},{"style":{"height":15.13},"width":81.7,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/8-7.png","element":"img","alt":" W.8","inline":true}],[{"text":"Since our algorithm does not rely on a predefined stabilizing set of parameters, one may be concerned that the control policies generated early in learning, when data are scarce, could destabilize the system. 
To address this issue, our excitation mechanism ensures that the minimum eigenvalue of the preconditioner grows over time, thereby improving the concentration properties of the sampled system parameters, as shown in the following section.","element":"span"}]]},{"heading":"4 Concentration Properties","paragraphs":[[{"text":"To show that Algorithm ","element":"span"},{"href":"#id-49","text":"1 ","element":"a"},{"text":"achieves an ","element":"span"},{"style":{"height":19.98},"width":118.83,"height":49.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/8-8.png","element":"img","alt":" O(√T","inline":true},{"text":") regret bound, we first examine the concentration properties of the exact and approximate posterior distributions given the history up to a fixed time ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"for the potential ","element":"span"},{"style":{"height":21.6},"width":961.19,"height":54.01,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/8-9.png","element":"img","alt":" Ut(θ) = U1(θ) − Σ_{s=1}^{t−1} log pw(xs+1 − Θ⊤zs). When t","inline":true,"padRight":true},{"text":"is chosen as ","element":"span"},{"style":{"height":14.04},"width":33.76,"height":35.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/8-10.png","element":"img","alt":" tk","inline":true},{"text":", we recover the ","element":"span"},{"text":"case corresponding to Algorithm ","element":"span"},{"href":"#id-49","text":"1. 
","element":"a"},{"text":"As illustrated in Figure ","element":"span"},{"href":"#id-54","text":"2, ","element":"a"},{"text":"the concentration results established in this section enable us to bound the moments of the system state, which is essential for attaining the desired regret bound in Section ","element":"span"},{"text":"5.","element":"span"}],[{"id":"id-54","style":{"width":"85%"},"width":1606,"height":451,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/8-11.png","element":"img"}],[{"text":"Figure 2: Flow chart of our theoretical results.","element":"figcaption","subtype":"caption"}],[{"style":{"fontWeight":"bold"},"text":"4.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Comparing exact and approximate posteriors","element":"span"}],[{"text":"Let ","element":"span"},{"style":{"height":12},"width":38.29,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-0.png","element":"img","alt":" µt","inline":true,"padRight":true},{"text":"denote the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"exact posterior ","element":"span"},{"text":"distribution defined by ","element":"span"},{"href":"#id-55","style":{"height":19.13},"width":317.2,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-1.png","element":"img","alt":" µt ∝ exp(−Ut).9 ","inline":true,"padRight":true},{"text":"For the approximate posterior, recall the preconditioned ULA that generates ","element":"span"},{"style":{"height":21.6},"width":761.63,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-2.png","element":"img","alt":" θj+1 ∼ N(θj − γt Pt^{−1} ∇Ut(θj), 2γt Pt^{−1})","inline":true,"padRight":true},{"text":"starting from ","element":"span"},{"style":{"height":17.6},"width":319.62,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-3.png","element":"img","alt":" θ0 ∈ arg min Ut(·","inline":true},{"text":"). 
After repeating this update for ","element":"span"},{"style":{"height":14.62},"width":47.06,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-4.png","element":"img","alt":" Nt","inline":true,"padRight":true},{"text":"steps, we obtain ","element":"span"},{"style":{"height":16.7},"width":224.28,"height":41.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-5.png","element":"img","alt":" θNt. We let","inline":true,"padRight":true},{"text":"˜","element":"span"},{"style":{"height":12},"width":38.29,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-6.png","element":"img","alt":"µt","inline":true,"padRight":true},{"text":"denote the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"approximate posterior","element":"span"},{"text":", defined as the distribution of ","element":"span"},{"style":{"height":16.7},"width":58.37,"height":41.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-7.png","element":"img","alt":" θNt","inline":true},{"text":". We first compare the exact and approximate posteriors. The resulting bound quantifies the concentration in terms of the moment order ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":". The higher-moment bound for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p > ","element":"span"},{"text":"2 is used to characterize a set of system parameters under which the state does not grow exponentially, as shown in the following subsection, while the bound for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"= 2 is needed for our regret analysis. 
Throughout the paper, the joint distribution of ","element":"span"},{"style":{"height":20.21},"width":410.28,"height":50.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-8.png","element":"img","alt":"θt ∼ µt and ˜θt ∼ ˜µt","inline":true,"padRight":true},{"text":"is defined through a shared Brownian path that drives both the continuous Langevin diffusion and the discrete preconditioned ULA dynamics, as described in Remark ","element":"span"},{"href":"#id-56","text":"2.3.","element":"a"}],[{"id":"id-51","style":{"fontWeight":"bold"},"text":"Proposition 4.1. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose Assumptions ","element":"span"},{"href":"#id-38","style":{"fontStyle":"italic"},"text":"2.1 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-57","style":{"fontStyle":"italic"},"text":"3.3 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"hold. Then, the exact posterior ","element":"span"},{"style":{"height":16.4},"width":203.9,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-9.png","element":"img","alt":" µt and the","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"approximate posterior ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":12},"width":38.3,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-10.png","element":"img","alt":"µt","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"obtained via preconditioned ULA satisfy","element":"span"}],[{"style":{"width":"33%"},"width":632,"height":63,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-11.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for all ","element":"span"},{"style":{"height":26.11},"width":996.03,"height":65.28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-12.png","element":"img","alt":" p ≥ 2, where Dp = (pdnm)^{p/2}(2^{2p+1} + 5^p). When p = 2","inline":true},{"style":{"fontStyle":"italic"},"text":", we further have","element":"span"}],[{"id":"id-59","style":{"width":"74%"},"width":1391,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-13.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":21.69},"width":412.2,"height":54.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-14.png","element":"img","alt":" D = 114dnm and λmin,t","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"denotes the minimum eigenvalue of ","element":"span"},{"style":{"height":14.62},"width":55.24,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-15.png","element":"img","alt":" Pt.","inline":true}],[{"text":"The proof of this proposition is contained in Appendix ","element":"span"},{"href":"#id-58","text":"A.3. ","element":"a"},{"text":"Without the preconditioner, we could only obtain a result weaker than Proposition ","element":"span"},{"href":"#id-51","text":"4.1; ","element":"a"},{"text":"Theorem ","element":"span"},{"href":"#id-48","text":"2.4 ","element":"a"},{"text":"would yield a convergence rate of ","element":"span"},{"style":{"height":20.8},"width":242.05,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-16.png","element":"img","alt":" O(1/√λmin,t","inline":true},{"text":"), which is an LQR version of ","element":"span"},{"href":"#id-13","referenceIndex":20,"text":"[20, ","element":"a"},{"text":"Theorem 5]. We incorporate the time step ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"into the ULA step size so that the right-hand side of ","element":"span"},{"href":"#id-59","text":"(11) ","element":"a"},{"text":"decreases with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":". 
Since max","element":"span"},{"style":{"height":18.22},"width":345.83,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-17.png","element":"img","alt":"{λmin,t, t} ≥ λmin,t","inline":true,"padRight":true},{"text":"always holds, this choice improves the concentration property.","element":"span"}],[{"text":"Another key ingredient is a concentration bound for the exact posterior, which is essential for characterizing the confidence set used in the proof of Theorem ","element":"span"},{"href":"#id-60","text":"4.3.","element":"a"}],[{"id":"id-62","style":{"fontWeight":"bold"},"text":"Proposition 4.2. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose Assumptions ","element":"span"},{"href":"#id-38","style":{"fontStyle":"italic"},"text":"2.1 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-57","style":{"fontStyle":"italic"},"text":"3.3 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"hold. 
Then, the following inequality","element":"span"}],[{"style":{"width":"85%"},"width":1606,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-18.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"holds with probability at least ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":16.4},"width":609.51,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-19.png","element":"img","alt":" − δ for any 0 < δ < 1 and p ≥ 2","inline":true},{"style":{"fontStyle":"italic"},"text":", where the constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C > ","element":"span"},{"text":"0 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"depends only on ","element":"span"},{"style":{"height":17.42},"width":559,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-20.png","element":"img","alt":" p, m, n, d, and λ, and λmax,t","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"denotes the maximum eigenvalue of ","element":"span"},{"style":{"height":17.35},"width":89.55,"height":43.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/9-21.png","element":"img","alt":" Pt.10","inline":true}],[{"id":"id-55","text":"The proof of this proposition can be found in Appendix ","element":"span"},{"href":"#id-61","text":"A.4.","element":"a"}],[{"id":"id-93","style":{"fontWeight":"bold"},"text":"4.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Bounding expected state norms by a polynomial in time","element":"span"}],[{"text":"A key result we derive from Propositions ","element":"span"},{"href":"#id-51","text":"4.1 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-62","text":"4.2 ","element":"a"},{"text":"is that, in expectation, the system state grows at most polynomially in time. 
To show this property, we adapt the confidence-set construction and self-normalization techniques developed for the OFU approach ","element":"span"},{"href":"#id-25","referenceIndex":37,"text":"[37, ","element":"a"},{"href":"#id-63","referenceIndex":53,"text":"53]","element":"a"},{"text":". Our key idea is to construct a set that contains the system parameters sampled via ULA with high probability. The higher-moment bounds from Propositions ","element":"span"},{"href":"#id-51","text":"4.1 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-62","text":"4.2 ","element":"a"},{"text":"are crucial to our analysis because they allow Markov-type inequalities to be applied for any ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":". We then partition the probability space of the stochastic process into two sets, “good” and “bad,” as in the OFU approach.","element":"span"}],[{"id":"id-60","style":{"fontWeight":"bold"},"text":"Theorem 4.3. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose Assumptions ","element":"span"},{"href":"#id-38","style":{"fontStyle":"italic"},"text":"2.1, ","element":"a"},{"href":"#id-57","style":{"fontStyle":"italic"},"text":"3.3 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-64","style":{"fontStyle":"italic"},"text":"3.4 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"hold. 
For ","element":"span"},{"style":{"height":15.6},"width":281.55,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/10-0.png","element":"img","alt":" T > 0, p ≥ 2","inline":true},{"style":{"fontStyle":"italic"},"text":", and a random trajectory ","element":"span"},{"text":"(","element":"span"},{"style":{"height":19.81},"width":118.56,"height":49.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/10-1.png","element":"img","alt":"xs)_{s=1}^{T} ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"generated by Algorithm ","element":"span"},{"href":"#id-49","style":{"fontStyle":"italic"},"text":"1, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"we have","element":"span"}],[{"style":{"width":"35%"},"width":659,"height":86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/10-2.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where the constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C > ","element":"span"},{"text":"0 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"depends only on ","element":"span"},{"style":{"height":17.82},"width":532.29,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/10-3.png","element":"img","alt":" p, m, n, nu, W, Mρ and λ.","inline":true}],[{"text":"The proof of this theorem can be found in Appendix ","element":"span"},{"href":"#id-65","text":"A.5. ","element":"a"},{"text":"It is worth emphasizing that this polynomial-in-time bound is attained without relying on a predefined stabilizing set of parameters. 
In Section ","element":"span"},{"text":"5, ","element":"span"},{"text":"we further strengthen this result to a uniform bound, which plays a critical role in our regret analysis.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"4.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Concentration of exact and approximate posteriors","element":"span"}],[{"text":"Leveraging the preceding concentration results and the bound on expected state norms, we show that the minimum eigenvalue of the preconditioner indeed grows over time. Combining this property with Theorem ","element":"span"},{"href":"#id-60","text":"4.3, ","element":"a"},{"text":"we obtain an improved concentration property for the exact posterior. Finally, the triangle inequality yields the desired result: concentration of the approximate posterior around the true system parameter.","element":"span"}],[{"text":"We begin by characterizing the growth of the minimum eigenvalue of the preconditioner, which results from injecting a random noise signal ","element":"span"},{"style":{"height":10.62},"width":37.56,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/10-4.png","element":"img","alt":" νs","inline":true,"padRight":true},{"text":"to perturb the action at the end of each episode. To derive this result, we decompose the preconditioner in each episode into two parts, a random matrix and a self-normalized matrix-valued process, as in ","element":"span"},{"href":"#id-22","referenceIndex":34,"text":"[34]","element":"a"},{"text":". Specifically, by Lemma ","element":"span"},{"href":"#id-66","text":"B.4,","element":"a"}],[{"style":{"width":"99%"},"width":1872,"height":300,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/10-5.png","element":"img"}],[{"text":"matrix used in the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":"th episode. 
The random-matrix part contributes to the growth of the minimum eigenvalue of the preconditioner with high probability. More precisely, the following proposition holds:","element":"span"}],[{"id":"id-68","style":{"fontWeight":"bold"},"text":"Proposition 4.4. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose that Assumptions ","element":"span"},{"href":"#id-38","style":{"fontStyle":"italic"},"text":"2.1–","element":"a"},{"href":"#id-64","style":{"fontStyle":"italic"},"text":"3.4 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"hold. For ","element":"span"},{"style":{"height":18.62},"width":621.91,"height":46.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/10-6.png","element":"img","alt":" k ≥ k0(m, n, nu, λ, MK, Mρ, W),","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"we have","element":"span"}],[{"style":{"width":"31%"},"width":583,"height":102,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/10-7.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":15.24},"width":77.58,"height":38.1,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/10-8.png","element":"img","alt":" tk+1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is the start time of episode ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"+ 1 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"in Algorithm ","element":"span"},{"href":"#id-49","style":{"fontStyle":"italic"},"text":"1, ","element":"a"},{"style":{"height":18.87},"width":157.21,"height":47.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/10-9.png","element":"img","alt":" λmin,tk+1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"denotes the minimum eigenvalue of 
","element":"span"},{"text":"˜","element":"span"},{"style":{"height":18.47},"width":255.98,"height":46.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/10-10.png","element":"img","alt":"Pk+1 := Ptk+1","inline":true},{"style":{"fontStyle":"italic"},"text":", and the constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C > ","element":"span"},{"text":"0 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"depends only on ","element":"span"},{"style":{"height":16},"width":440.06,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/10-11.png","element":"img","alt":" p, n, nu, W, MK and λ.","inline":true}],[{"text":"The proof of this proposition can be found in Appendix ","element":"span"},{"href":"#id-67","text":"A.6. ","element":"a"},{"text":"Recalling the probabilistic bound for ","element":"span"},{"style":{"height":17.6},"width":175.41,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-0.png","element":"img","alt":" |θt −θ∗|Pt","inline":true,"padRight":true},{"text":"from Proposition ","element":"span"},{"href":"#id-62","text":"4.2, ","element":"a"},{"text":"we observe that ","element":"span"},{"style":{"height":17.6},"width":142.82,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-1.png","element":"img","alt":" |θt −θ∗|","inline":true,"padRight":true},{"text":"is controlled by 1","element":"span"},{"style":{"height":20.8},"width":168.77,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-2.png","element":"img","alt":"/√λmin,t","inline":true,"padRight":true},{"text":"and the self-normalization term. Using Theorem ","element":"span"},{"href":"#id-60","text":"4.3, ","element":"a"},{"text":"we can show that the latter is dominated by the former, which grows at most polynomially in time due to Proposition ","element":"span"},{"href":"#id-68","text":"4.4. 
","element":"a"},{"text":"Consequently, the following improved concentration bound holds for the exact posterior.","element":"span"}],[{"id":"id-70","style":{"fontWeight":"bold"},"text":"Theorem 4.5. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose Assumptions ","element":"span"},{"href":"#id-38","style":{"fontStyle":"italic"},"text":"2.1–","element":"a"},{"href":"#id-64","style":{"fontStyle":"italic"},"text":"3.4 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"hold. Then, the exact posterior ","element":"span"},{"style":{"height":12},"width":38.3,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-3.png","element":"img","alt":" µt","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and the approximate posterior ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":12},"width":38.29,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-4.png","element":"img","alt":"µt","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"coupled through the shared Brownian motion satisfy","element":"span"}],[{"style":{"width":"93%"},"width":1741,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-5.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for all ","element":"span"},{"style":{"height":16},"width":317.69,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-6.png","element":"img","alt":" t ≥ 1 and p ≥ 2","inline":true},{"style":{"fontStyle":"italic"},"text":", where the outer expectation is taken over all histories, and the constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C > ","element":"span"},{"text":"0 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"depends only on 
","element":"span"},{"style":{"height":17.82},"width":553.7,"height":44.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-7.png","element":"img","alt":" p, n, nu, W, MK, Mρ, and λ.","inline":true}],[{"text":"The proof of this theorem can be found in Appendix ","element":"span"},{"href":"#id-69","text":"A.7.","element":"a"}]]},{"heading":"5 Regret Bound","paragraphs":[[{"text":"To further improve the bound in Theorem ","element":"span"},{"href":"#id-60","text":"4.3, ","element":"a"},{"text":"we decompose the moment of the system state into two parts corresponding to the cases ","element":"span"},{"style":{"height":20.61},"width":827.56,"height":51.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-8.png","element":"img","alt":" |˜θt − θ∗| ≤ ϵ0 and |˜θt − θ∗| > ϵ0, where ϵ0","inline":true,"padRight":true},{"text":"is a positive constant. When ","element":"span"},{"style":{"height":10.22},"width":34.71,"height":25.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-9.png","element":"img","alt":" ϵ0","inline":true,"padRight":true},{"text":"is sufficiently small, the first case implies ","element":"span"},{"style":{"height":20.6},"width":332.81,"height":51.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-10.png","element":"img","alt":" |A∗ + B∗K(˜θt)| <","inline":true,"padRight":true},{"text":"1, so the first part is straightforward to bound. For the second part, we use Markov’s inequality to balance the growth of the state against the tail probability by choosing an appropriate value of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":". 
This intuitive argument can be made rigorous using Theorems ","element":"span"},{"href":"#id-60","text":"4.3 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-70","text":"4.5, ","element":"a"},{"text":"leading to the following result.","element":"span"}],[{"id":"id-113","style":{"fontWeight":"bold"},"text":"Theorem 5.1. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose that Assumptions ","element":"span"},{"href":"#id-38","style":{"fontStyle":"italic"},"text":"2.1–","element":"a"},{"href":"#id-64","style":{"fontStyle":"italic"},"text":"3.4 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"hold. For any ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T > ","element":"span"},{"text":"0 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and a random trajectory ","element":"span"},{"text":"(","element":"span"},{"style":{"height":19.81},"width":118.56,"height":49.53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-11.png","element":"img","alt":"xs)_{s=1}^{T} ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"generated by Algorithm ","element":"span"},{"href":"#id-49","style":{"fontStyle":"italic"},"text":"1, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"we have","element":"span"}],[{"style":{"width":"23%"},"width":440,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-12.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where the constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C > ","element":"span"},{"text":"0 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"depends only on ","element":"span"},{"style":{"height":17.82},"width":777.98,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-13.png","element":"img","alt":" p, n, nu, W, MK, Mρ, ϵ0, and λ. 
Here, ϵ0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a positive constant such that ","element":"span"},{"style":{"height":17.6},"width":768.63,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-14.png","element":"img","alt":" |θ − θ∗| ≤ ϵ0 implies |A∗ + B∗K(θ)| < 1.","inline":true}],[{"text":"The proof of this theorem can be found in Appendix ","element":"span"},{"href":"#id-71","text":"A.8. ","element":"a"},{"text":"Finally, we establish our main result: Algorithm ","element":"span"},{"href":"#id-49","text":"1 ","element":"a"},{"text":"achieves an ","element":"span"},{"style":{"height":19.98},"width":118.82,"height":49.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-15.png","element":"img","alt":" O(√T","inline":true},{"text":") Bayesian regret bound.","element":"span"}],[{"id":"id-115","style":{"fontWeight":"bold"},"text":"Theorem 5.2. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose that Assumptions ","element":"span"},{"href":"#id-38","style":{"fontStyle":"italic"},"text":"2.1–","element":"a"},{"href":"#id-64","style":{"fontStyle":"italic"},"text":"3.4 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"hold. Then, the Bayesian regret ","element":"span"},{"href":"#id-72","text":"(4) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"of Algorithm ","element":"span"},{"href":"#id-49","style":{"fontStyle":"italic"},"text":"1 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"is bounded as follows:","element":"span"}],[{"style":{"width":"16%"},"width":301,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-16.png","element":"img"}],[{"text":"The proof of this theorem can be found in Appendix ","element":"span"},{"href":"#id-73","text":"A.9. ","element":"a"},{"text":"Our experimental results are consistent with this regret bound. 
See Appendix ","element":"span"},{"text":"C ","element":"span"},{"text":"for our empirical analyses.","element":"span"}]]},{"heading":"6 Concluding Remarks","paragraphs":[[{"text":"We proposed a novel approximate Thompson sampling algorithm for learning LQR with an improved ","element":"span"},{"style":{"height":19.98},"width":118.84,"height":49.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/11-17.png","element":"img","alt":" O(√T","inline":true},{"text":") regret bound. Our method does not require the noise to be Gaussian or the columns of Θ to be independent. This relaxation of restrictive assumptions is enabled by a carefully designed preconditioned ULA and the use of perturbed control actions only at the end of each episode.","element":"span"}],[{"text":"As a future research direction, it may be possible to extend our algorithm to settings with noise distributions having non-log-concave potentials. In our work, the log-concavity of the posterior potential is preserved under the considered noise models, which enables acceleration of the sampling process through preconditioning. To handle more general classes of noise, alternative techniques beyond the current ULA framework may be necessary. Recently, ","element":"span"},{"href":"#id-74","referenceIndex":54,"text":"[54] ","element":"a"},{"text":"derived sharp non-asymptotic convergence rates for Langevin dynamics in nonconvex settings. 
We plan to investigate incorporating such results into our framework.","element":"span"}]]},{"heading":"A Proofs","paragraphs":[[{"id":"id-45","style":{"fontWeight":"bold"},"text":"A.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-48","style":{"fontWeight":"bold"},"text":"2.4","element":"a"}],[{"text":"To prove Theorem ","element":"span"},{"href":"#id-48","text":"2.4, ","element":"a"},{"text":"we use the following lemma.","element":"span"}],[{"id":"id-80","style":{"fontWeight":"bold"},"text":"Lemma A.1. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose Assumption ","element":"span"},{"href":"#id-38","style":{"fontStyle":"italic"},"text":"2.1 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"holds. Let ","element":"span"},{"style":{"height":12.8},"width":164.84,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-0.png","element":"img","alt":" X ∈ Rnx ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a random variable with probability density function ","element":"span"},{"style":{"height":20.95},"width":1544.8,"height":52.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-1.png","element":"img","alt":" p(x) ∝ e^(−U(x)), where λmin I_nx ⪯ ∇²U ⪯ λmax I_nx for λmax, λmin > 0. 
Let {Yj},","inline":true},{"style":{"height":17.02},"width":165.26,"height":42.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-2.png","element":"img","alt":"Yj ∈ Rnx","inline":true},{"style":{"fontStyle":"italic"},"text":", be generated by the ULA as","element":"span"}],[{"style":{"width":"33%"},"width":624,"height":54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":14.62},"width":42.33,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-4.png","element":"img","alt":" Y0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a random variable with an arbitrary density function. If ","element":"span"},{"style":{"height":25.94},"width":195.13,"height":64.86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-5.png","element":"img","alt":" γ ≤ λmin/(16λ²max) ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":", then we have","element":"span"}],[{"style":{"width":"51%"},"width":958,"height":112,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-6.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":17.42},"width":191.47,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-7.png","element":"img","alt":" X and Yj","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are coupled through the shared Brownian motion driving the continuous and discretized stochastic differential equations, as described in Remark ","element":"span"},{"href":"#id-56","style":{"fontStyle":"italic"},"text":"2.3.","element":"a"}],[{"style":{"height":18.22},"width":371.1,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-8.png","element":"img","alt":"Proof. 
Let {Zτ}τ≥0","inline":true,"padRight":true},{"text":"be a continuous interpolation of ","element":"span"},{"style":{"height":18.22},"width":86.68,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-9.png","element":"img","alt":" {Yj}","inline":true},{"text":", defined by","element":"span"}],[{"id":"id-77","style":{"width":"77%"},"width":1458,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-10.png","element":"img"}],[{"text":"Note that lim","element":"span"},{"style":{"height":19.24},"width":1047.06,"height":48.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-11.png","element":"img","alt":"τ↗jγ Zτ = Yj = limτ↘jγ Zτ for each j, and thus {Zτ}","inline":true,"padRight":true},{"text":"is a continuous process. We introduce another stochastic process ","element":"span"},{"style":{"height":17.6},"width":100.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-12.png","element":"img","alt":" {Xτ}","inline":true},{"text":", defined by","element":"span"}],[{"style":{"width":"31%"},"width":581,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-13.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":14.62},"width":53.16,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-14.png","element":"img","alt":" X0","inline":true,"padRight":true},{"text":"is a random variable with pdf ","element":"span"},{"style":{"height":20.33},"width":257.24,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-15.png","element":"img","alt":" p(x) ∝ e−U(x)","inline":true},{"text":". 
By Lemma ","element":"span"},{"href":"#id-75","text":"A.2, ","element":"a"},{"style":{"height":14.62},"width":54.14,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-16.png","element":"img","alt":" Xτ","inline":true,"padRight":true},{"text":"has the same pdf ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":") for all ","element":"span"},{"style":{"height":8},"width":23,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-17.png","element":"img","alt":" τ","inline":true},{"text":". We use the same Brownian motion ","element":"span"},{"style":{"height":14.62},"width":51.1,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-18.png","element":"img","alt":" Bτ","inline":true,"padRight":true},{"text":"to define both ","element":"span"},{"style":{"height":17.6},"width":296.26,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-19.png","element":"img","alt":" {Zτ} and {Xτ}","inline":true},{"text":". Fix an arbitrary ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":". 
Differentiating ","element":"span"},{"style":{"height":19.13},"width":201.93,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-20.png","element":"img","alt":" |Zτ − Xτ|² ","inline":true,"padRight":true},{"text":"with respect to ","element":"span"},{"style":{"height":17.6},"width":448.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-21.png","element":"img","alt":" τ ∈ [jγ, (j + 1)γ) yields","inline":true}],[{"id":"id-76","style":{"width":"91%"},"width":1705,"height":178,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-22.png","element":"img"}],[{"text":"Therefore, we have","element":"span"}],[{"style":{"width":"74%"},"width":1400,"height":197,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/12-23.png","element":"img"}],[{"text":"where the first inequality follows from the strong convexity of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U","element":"span"},{"text":". 
On the other hand, using Young’s inequality, we have","element":"span"}],[{"style":{"width":"74%"},"width":1387,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/13-0.png","element":"img"}],[{"text":"Combining these bounds, we deduce that","element":"span"}],[{"style":{"width":"81%"},"width":1521,"height":299,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/13-1.png","element":"img"}],[{"text":"Integrating both sides from ","element":"span"},{"style":{"height":17.6},"width":266.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/13-2.png","element":"img","alt":" jγ to (j + 1)γ","inline":true,"padRight":true},{"text":"and then multiplying both sides by ","element":"span"},{"style":{"height":19.13},"width":400.96,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/13-3.png","element":"img","alt":" e^(−λmin(j+1)γ), we have","inline":true}],[{"style":{"width":"78%"},"width":1467,"height":191,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/13-4.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":15.02},"width":187.77,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/13-5.png","element":"img","alt":" Xt and X","inline":true,"padRight":true},{"text":"have the same pdf, we have","element":"span"}],[{"style":{"width":"92%"},"width":1724,"height":251,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/13-6.png","element":"img"}],[{"text":"where the first inequality follows from ","element":"span"},{"style":{"height":18.33},"width":338.98,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/13-7.png","element":"img","alt":" e−λmin((j+1)γ−s) ≤","inline":true,"padRight":true},{"text":"1 and the second inequality follows from the Lipschitz smoothness of 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"U","element":"span"},{"text":".","element":"span"}],[{"text":"To bound ","element":"span"},{"href":"#id-76","text":"(A.2)","element":"a"},{"text":", we handle its first and second terms separately. Regarding the second term, we first integrate the SDE ","element":"span"},{"href":"#id-77","text":"(A.1) ","element":"a"},{"text":"from ","element":"span"},{"style":{"height":17.6},"width":417.71,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/13-8.png","element":"img","alt":" jγ to s ∈ [jγ, (j + 1)γ","inline":true},{"text":") to obtain","element":"span"}],[{"id":"id-78","style":{"width":"46%"},"width":863,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/13-9.png","element":"img"}],[{"text":"The second term of ","element":"span"},{"href":"#id-76","text":"(A.2) ","element":"a"},{"text":"can then be bounded by","element":"span"}],[{"style":{"width":"91%"},"width":1713,"height":250,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/13-10.png","element":"img"}],[{"text":"For ","element":"span"},{"style":{"height":17.6},"width":304.34,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/13-11.png","element":"img","alt":" s ∈ [jγ, (j + 1)γ","inline":true},{"text":"), we note that ","element":"span"},{"style":{"height":17.6},"width":419.57,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/13-12.png","element":"img","alt":" |s − jγ| ≤ γ, and thus","inline":true}],[{"style":{"width":"81%"},"width":1524,"height":326,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/13-13.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":10.62},"width":81.56,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/14-0.png","element":"img","alt":" xmin","inline":true,"padRight":true},{"text":"is a minimizer of 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"U","element":"span"},{"text":". It follows from ","element":"span"},{"href":"#id-13","referenceIndex":20,"text":"[20, ","element":"a"},{"text":"Lemma 9] that","element":"span"}],[{"style":{"width":"71%"},"width":1337,"height":86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/14-1.png","element":"img"}],[{"text":"Moreover, ","element":"span"},{"href":"#id-13","referenceIndex":20,"text":"[20, ","element":"a"},{"text":"Lemma 8] yields","element":"span"}],[{"id":"id-79","style":{"width":"67%"},"width":1263,"height":118,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/14-2.png","element":"img"}],[{"text":"Combining ","element":"span"},{"href":"#id-78","text":"(A.3)","element":"a"},{"text":"–","element":"span"},{"href":"#id-79","text":"(A.6)","element":"a"},{"text":", we obtain that","element":"span"}],[{"id":"id-81","style":{"width":"94%"},"width":1764,"height":627,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/14-3.png","element":"img"}],[{"text":"where the second inequality follows from the fact that ","element":"span"},{"style":{"height":19.22},"width":419.8,"height":48.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/14-4.png","element":"img","alt":" e^(−x) ≤ 1 − x/2 for x ∈ [0,","inline":true,"padRight":true},{"text":"1]. To further simplify the upper bound, we use the following two inequalities: 2","element":"span"},{"style":{"height":27.58},"width":764.62,"height":68.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/14-5.png","element":"img","alt":"2 λ4maxλmin γ3 = λmin64 (16λ2maxλmin)2γ3 ≤ λmin64 γ and","inline":true},{"style":{"height":24.21},"width":692.19,"height":60.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/14-6.png","element":"img","alt":"(1 − (λmin/4)γ)² + (λmin/64)γ ≤ (1 − (λmin/8)γ)²","inline":true},{"text":". 
Consequently, ","element":"span"},{"style":{"height":21.49},"width":311.64,"height":53.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/14-7.png","element":"img","alt":" E[|Z_(j+1)γ − X|²","inline":true},{"text":"] is bounded as","element":"span"}],[{"style":{"width":"63%"},"width":1187,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/14-8.png","element":"img"}],[{"text":"Invoking this inequality repeatedly yields","element":"span"}],[{"style":{"width":"86%"},"width":1620,"height":411,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/14-9.png","element":"img"}],[{"text":"Since (1 ","element":"span"},{"style":{"height":27.64},"width":761.06,"height":69.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/14-10.png","element":"img","alt":" − (λmin/8)γ) ≤ (1/2)^((λmin/8)γ) and Z_(j+1)γ = Y_(j+1)","inline":true},{"text":", we conclude that","element":"span"}],[{"style":{"width":"79%"},"width":1493,"height":131,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/14-11.png","element":"img"}],[{"text":"Replacing ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"+ 1 with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":", the result follows.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-48","style":{"fontStyle":"italic"},"text":"2.4. ","element":"a"},{"text":"We now prove Theorem ","element":"span"},{"href":"#id-48","text":"2.4. 
","element":"a"},{"text":"It follows from ","element":"span"},{"href":"#id-13","referenceIndex":20,"text":"[20, ","element":"a"},{"text":"Lemma 10] that","element":"span"}],[{"style":{"width":"31%"},"width":587,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/15-0.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":10.62},"width":81.56,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/15-1.png","element":"img","alt":" xmin","inline":true,"padRight":true},{"text":"is a minimizer of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"U","element":"span"},{"text":". ","element":"span"},{"text":"Using Lemma ","element":"span"},{"href":"#id-80","text":"A.1 ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":15.02},"width":172.32,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/15-2.png","element":"img","alt":" nx = dn","inline":true,"padRight":true},{"text":"and the initial distribution ","element":"span"},{"style":{"height":17.08},"width":200.05,"height":42.71,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/15-3.png","element":"img","alt":"X0 ∼ δxmin","inline":true},{"text":", we obtain that","element":"span"}],[{"style":{"width":"64%"},"width":1215,"height":111,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/15-4.png","element":"img"}],[{"text":"Taking the step size and the number of steps as ","element":"span"},{"style":{"height":31.1},"width":518.42,"height":77.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/15-5.png","element":"img","alt":" γ = λmin/(16λ²max) and N = 64λ²max/λ²min ","inline":true,"padRight":true},{"text":", respectively, the first ","element":"span"},{"text":"and second terms on the right-hand side of the inequality above are bounded 
as","element":"span"}],[{"style":{"width":"60%"},"width":1133,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/15-6.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"22%"},"width":413,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/15-7.png","element":"img"}],[{"text":"respectively. Therefore, we conclude that","element":"span"}],[{"style":{"width":"49%"},"width":926,"height":110,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/15-8.png","element":"img"}],[{"text":"as desired.","element":"span"}],[{"id":"id-46","style":{"fontWeight":"bold"},"text":"A.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-47","style":{"fontWeight":"bold"},"text":"3.1","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"By direct calculation, we first observe that","element":"span"}],[{"style":{"width":"60%"},"width":1129,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/15-9.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":12},"width":34,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/15-10.png","element":"img","alt":" ⊗","inline":true,"padRight":true},{"text":"denotes the Kronecker product. 
Then, the Hessian ","element":"span"},{"style":{"height":20.05},"width":97.09,"height":50.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/15-11.png","element":"img","alt":" ∇²_θ U_t","inline":true,"padRight":true},{"text":"is given by","element":"span"}],[{"style":{"width":"52%"},"width":986,"height":129,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/15-12.png","element":"img"}],[{"text":"Under Assumption ","element":"span"},{"href":"#id-38","text":"2.1, ","element":"a"},{"text":"for any state-action pair ","element":"span"},{"style":{"height":17.6},"width":412.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/15-13.png","element":"img","alt":" zs = (xs, us), we have","inline":true}],[{"style":{"width":"84%"},"width":1575,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/15-14.png","element":"img"}],[{"text":"which implies that","element":"span"}],[{"style":{"width":"92%"},"width":1736,"height":281,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/15-15.png","element":"img"}],[{"text":"Finally, letting the preconditioner ","element":"span"},{"style":{"height":21.6},"width":708.13,"height":54.01,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/15-16.png","element":"img","alt":" P_t := λI_dn + Σ_{s=1}^{t−1} blkdiag({z_s z_s⊤}_{i=1}^n","inline":true},{"text":"), the result follows.","element":"span"}],[{"id":"id-58","style":{"fontWeight":"bold"},"text":"A.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Proposition ","element":"span"},{"href":"#id-51","style":{"fontWeight":"bold"},"text":"4.1","element":"a"}],[{"text":"To prove Proposition ","element":"span"},{"href":"#id-51","text":"4.1, ","element":"a"},{"text":"we first introduce the following two lemmas regarding the stationarity of the preconditioned Langevin diffusion and the non-asymptotic behavior of the preconditioned 
ULA.","element":"span"}],[{"id":"id-75","style":{"fontWeight":"bold"},"text":"Lemma A.2. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose that Assumption ","element":"span"},{"href":"#id-38","style":{"fontStyle":"italic"},"text":"2.1 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"holds. Let ","element":"span"},{"style":{"height":14.62},"width":186.02,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-0.png","element":"img","alt":" Xτ ∈ Rnx ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"denote the solution of the preconditioned Langevin equation","element":"span"}],[{"style":{"width":"39%"},"width":747,"height":59,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-1.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":14.62},"width":53.16,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-2.png","element":"img","alt":" X0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is distributed according to ","element":"span"},{"style":{"height":20.33},"width":591.01,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-3.png","element":"img","alt":" p(x) ∝ e−U(x), and P ∈ Rnx×nx ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is an arbitrary positive definite matrix. 
Then, ","element":"span"},{"style":{"height":14.62},"width":54.15,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-4.png","element":"img","alt":" Xτ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"has the same probability density ","element":"span"},{"style":{"height":17.6},"width":345.6,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-5.png","element":"img","alt":" p(x) for all τ ≥ 0.","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"Consider the following Fokker–Planck equation associated with the preconditioned Langevin equation:","element":"span"}],[{"style":{"width":"92%"},"width":1735,"height":129,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-6.png","element":"img"}],[{"text":"It is well known that ","element":"span"},{"style":{"height":17.6},"width":105.39,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-7.png","element":"img","alt":" q(x, τ","inline":true},{"text":") is the probability density function of ","element":"span"},{"style":{"height":14.62},"width":54.15,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-8.png","element":"img","alt":" Xτ","inline":true},{"text":". We can check that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":") is a stationary solution of the Fokker–Planck equation by substituting ","element":"span"},{"style":{"height":17.6},"width":245.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-9.png","element":"img","alt":" q(x, τ) = p(x","inline":true},{"text":") into ","element":"span"},{"href":"#id-81","text":"(A.7)","element":"a"},{"text":". 
Specifically,","element":"span"}],[{"style":{"width":"92%"},"width":1732,"height":280,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-10.png","element":"img"}],[{"text":"Since the Fokker–Planck equation with the initial condition q(x, 0) = p(x) has a unique smooth solution ","element":"span"},{"href":"#id-82","referenceIndex":48,"text":"[48]","element":"a"},{"text":", we conclude that ","element":"span"},{"style":{"height":17.6},"width":254.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-11.png","element":"img","alt":" q(x, t) ≡ p(x)","inline":true,"padRight":true},{"text":"for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", and the result follows.","element":"span"}],[{"id":"id-87","style":{"fontWeight":"bold"},"text":"Lemma A.3. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose Assumption ","element":"span"},{"href":"#id-38","style":{"fontStyle":"italic"},"text":"2.1 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"holds. 
Let ","element":"span"},{"style":{"height":12.8},"width":164.84,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-12.png","element":"img","alt":" X ∈ Rnx ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a random variable with probability density function ","element":"span"},{"style":{"height":20.33},"width":272.59,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-13.png","element":"img","alt":" p(x) ∝ e^(−U(x))","inline":true},{"style":{"fontStyle":"italic"},"text":", and the stochastic process ","element":"span"},{"style":{"height":18.22},"width":301.42,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-14.png","element":"img","alt":" {Yj}, Yj ∈ Rnx","inline":true},{"style":{"fontStyle":"italic"},"text":", be generated by the preconditioned ULA as","element":"span"}],[{"style":{"width":"41%"},"width":782,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-15.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":14.62},"width":42.33,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-16.png","element":"img","alt":" Y0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a random variable with an arbitrary density function, and ","element":"span"},{"style":{"height":13.93},"width":222.06,"height":34.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-17.png","element":"img","alt":" P ∈ Rnx×nx ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a positive definite matrix with minimum eigenvalue ","element":"span"},{"style":{"height":15.02},"width":82.07,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-18.png","element":"img","alt":" 
λmin","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and maximum eigenvalue ","element":"span"},{"style":{"height":25.96},"width":536.97,"height":64.89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-19.png","element":"img","alt":" λmax. If γ ≤ mλmin16M2 max{λmin,t}","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"height":22.7},"width":581.93,"height":56.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-20.png","element":"img","alt":" m I_nx ⪯ P^(−1/2) ∇²U P^(−1/2) ⪯ M I_nx","inline":true},{"style":{"fontStyle":"italic"},"text":", then we have","element":"span"}],[{"style":{"width":"59%"},"width":1112,"height":124,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-21.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for any ","element":"span"},{"style":{"height":17.42},"width":428.34,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-22.png","element":"img","alt":" p ≥ 2, where X and Yj","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are coupled through the shared Brownian motion driving the continuous and discretized stochastic differential equations, as described in Remark ","element":"span"},{"href":"#id-56","style":{"fontStyle":"italic"},"text":"2.3.","element":"a"}],[{"style":{"height":18.22},"width":371.1,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-23.png","element":"img","alt":"Proof. 
Let {Zτ}τ≥0","inline":true,"padRight":true},{"text":"be a continuous interpolation of ","element":"span"},{"style":{"height":18.22},"width":86.68,"height":45.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-24.png","element":"img","alt":" {Yj}","inline":true},{"text":", defined by","element":"span"}],[{"id":"id-84","style":{"width":"82%"},"width":1537,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/16-25.png","element":"img"}],[{"text":"Note that lim","element":"span"},{"style":{"height":19.24},"width":1047.06,"height":48.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-0.png","element":"img","alt":"τ↗jγ Zτ = Yj = limτ↘jγ Zτ for each j, and thus {Zτ}","inline":true,"padRight":true},{"text":"is a continuous process. We introduce another stochastic process ","element":"span"},{"style":{"height":17.6},"width":100.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-1.png","element":"img","alt":" {Xτ}","inline":true},{"text":", defined by","element":"span"}],[{"style":{"width":"39%"},"width":747,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-2.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":14.62},"width":53.16,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-3.png","element":"img","alt":" X0","inline":true,"padRight":true},{"text":"is a random variable with pdf ","element":"span"},{"style":{"height":20.33},"width":257.23,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-4.png","element":"img","alt":" p(x) ∝ e−U(x)","inline":true},{"text":". 
By Lemma ","element":"span"},{"href":"#id-75","text":"A.2, ","element":"a"},{"style":{"height":14.62},"width":54.15,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-5.png","element":"img","alt":" Xτ","inline":true,"padRight":true},{"text":"has the same pdf ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":") for all ","element":"span"},{"style":{"height":8},"width":23,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-6.png","element":"img","alt":" τ","inline":true},{"text":". We use the same Brownian motion ","element":"span"},{"style":{"height":14.62},"width":51.1,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-7.png","element":"img","alt":" Bτ","inline":true,"padRight":true},{"text":"to define both ","element":"span"},{"style":{"height":17.6},"width":306.02,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-8.png","element":"img","alt":" {Zτ} and {Xτ}.","inline":true}],[{"text":"Fix an arbitrary ","element":"span"},{"style":{"height":16},"width":278.9,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-9.png","element":"img","alt":" j. 
For any p ≥","inline":true,"padRight":true},{"text":"2, differentiating ","element":"span"},{"style":{"height":23.99},"width":561.94,"height":59.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-10.png","element":"img","alt":" |Zτ − Xτ|pP = |P12 (Zτ − Xτ)|p ","inline":true,"padRight":true},{"text":"with respect to","element":"span"}],[{"style":{"width":"86%"},"width":1622,"height":334,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-11.png","element":"img"}],[{"text":"Noting that ","element":"span"},{"style":{"height":22.7},"width":763.17,"height":56.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-12.png","element":"img","alt":" mInx ⪯ P − 12 ∇2UP − 12 ⪯ MInx, we have","inline":true}],[{"id":"id-83","style":{"width":"95%"},"width":1788,"height":221,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-13.png","element":"img"}],[{"text":"where the first inequality follows from the mean value theorem. Now, recall the generalized Young’s inequality, ","element":"span"},{"style":{"height":26.99},"width":833.21,"height":67.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-14.png","element":"img","alt":" ab ≤ sαaαα + s−βbββ for s > 0, a, b, α, β >","inline":true,"padRight":true},{"text":"0 such that ","element":"span"},{"style":{"height":23.69},"width":544.3,"height":59.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-15.png","element":"img","alt":"1α + 1β = 1. 
Choosing s =","inline":true,"padRight":true},{"text":"( ","element":"span"},{"style":{"height":25.38},"width":771.97,"height":63.46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-16.png","element":"img","alt":"pm2(p−1))(p−1)/p, α = pp−1, and β = p yields","inline":true}],[{"style":{"width":"75%"},"width":1421,"height":191,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-17.png","element":"img"}],[{"text":"Combining these bounds with ","element":"span"},{"style":{"height":22.94},"width":383.68,"height":57.35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-18.png","element":"img","alt":"pm2(p−1) ≥ m2 , we have","inline":true}],[{"style":{"width":"72%"},"width":1363,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-19.png","element":"img"}],[{"text":"which implies that","element":"span"}],[{"style":{"width":"65%"},"width":1235,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-20.png","element":"img"}],[{"text":"Integrating both sides from ","element":"span"},{"style":{"height":17.6},"width":251.39,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-21.png","element":"img","alt":" jγ to (j +1)γ","inline":true,"padRight":true},{"text":"and then multiplying both sides by ","element":"span"},{"style":{"height":17.59},"width":201.91,"height":43.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-22.png","element":"img","alt":" e− pm2 (j+1)γ","inline":true},{"text":", we obtain that","element":"span"}],[{"style":{"width":"87%"},"width":1642,"height":186,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/17-23.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":15.02},"width":194.25,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/18-0.png","element":"img","alt":" Xτ and 
X","inline":true,"padRight":true},{"text":"have the same pdf due to Lemma ","element":"span"},{"href":"#id-75","text":"A.2, ","element":"a"},{"text":"we have","element":"span"}],[{"style":{"height":20.84},"width":334.8,"height":52.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/18-1.png","element":"img","alt":"E[|Z(j+1)γ − X|pP ]","inline":true}],[{"style":{"width":"99%"},"width":1864,"height":375,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/18-2.png","element":"img"}],[{"text":"where the first inequality follows from ","element":"span"},{"style":{"height":18.33},"width":297.59,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/18-3.png","element":"img","alt":" e−m((j+1)γ−s) ≤","inline":true,"padRight":true},{"text":"1 and the second inequality follows since ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M ","element":"span"},{"text":"is an upper bound for ","element":"span"},{"style":{"height":22.89},"width":287.14,"height":57.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/18-4.png","element":"img","alt":" |P − 12 ∇2UP − 12 |","inline":true,"padRight":true},{"text":"from the assumption in the lemma. 
To bound ","element":"span"},{"href":"#id-83","text":"(A.10)","element":"a"},{"text":", we handle the first and second terms separately.","element":"span"}],[{"text":"For the second term, we integrate ","element":"span"},{"id":"id-85","href":"#id-84","text":"(A.9) ","element":"a"},{"text":"from ","element":"span"},{"style":{"height":17.6},"width":417.71,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/18-5.png","element":"img","alt":" jγ to s ∈ [jγ, (j + 1)γ","inline":true},{"text":") to obtain","element":"span"}],[{"style":{"width":"99%"},"width":1868,"height":1260,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/18-6.png","element":"img"}],[{"text":"where ˜","element":"span"},{"style":{"height":23.5},"width":1905.86,"height":58.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/18-7.png","element":"img","alt":"xmin = P12 xmin. Since ˜p(˜x) = det(P − 12 )p(P − 12 ˜x), we have −∇2˜x log ˜p(˜x) = −P − 12 ∇2x log p(P − 12 ˜x)P − 12 .","inline":true,"padRight":true},{"text":"Thus, ˜","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"is ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m","element":"span"},{"text":"-strongly log-concave. 
It follows from ","element":"span"},{"href":"#id-13","referenceIndex":20,"text":"[20, ","element":"a"},{"text":"Lemma 9] that","element":"span"}],[{"style":{"width":"75%"},"width":1408,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/18-8.png","element":"img"}],[{"text":"On the other hand, ","element":"span"},{"href":"#id-13","referenceIndex":20,"text":"[20, ","element":"a"},{"text":"Lemma 8] yields that","element":"span"}],[{"id":"id-86","style":{"width":"71%"},"width":1333,"height":118,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/18-9.png","element":"img"}],[{"text":"Combining ","element":"span"},{"href":"#id-85","text":"(A.11)","element":"a"},{"text":"–","element":"span"},{"href":"#id-86","text":"(A.15)","element":"a"},{"text":", we obtain that","element":"span"}],[{"style":{"width":"90%"},"width":1697,"height":285,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/19-0.png","element":"img"}],[{"text":"where the second inequality follows from ","element":"span"},{"style":{"height":25.95},"width":521.37,"height":64.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/19-1.png","element":"img","alt":" γ ≤ mλmin16M2 max{λmin,t} ≤ m16M2","inline":true,"padRight":true},{"text":". 
Plugging this inequality into ","element":"span"},{"href":"#id-83","text":"(A.10) ","element":"a"},{"text":"yields","element":"span"}],[{"style":{"width":"81%"},"width":1535,"height":166,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/19-2.png","element":"img"}],[{"text":"To further simplify the first two terms on the right-hand side, we use the following inequalities:","element":"span"}],[{"style":{"width":"78%"},"width":1463,"height":204,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/19-3.png","element":"img"}],[{"text":"where the second line follows from the fact that ","element":"span"},{"style":{"height":19.22},"width":410.38,"height":48.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/19-4.png","element":"img","alt":" e−x ≤ 1− x2 for x ∈ [0,","inline":true,"padRight":true},{"text":"1]. Consequently, ","element":"span"},{"style":{"height":19.95},"width":225.02,"height":49.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/19-5.png","element":"img","alt":" E[|Z(j+1)γ−","inline":true},{"style":{"height":19.59},"width":77.69,"height":48.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/19-6.png","element":"img","alt":"X|pP ","inline":true,"padRight":true},{"text":"] is bounded as","element":"span"}],[{"style":{"width":"71%"},"width":1345,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/19-7.png","element":"img"}],[{"text":"Invoking the bound repeatedly, we obtain that","element":"span"}],[{"style":{"width":"91%"},"width":1717,"height":403,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/19-8.png","element":"img"}],[{"text":"Since (1 ","element":"span"},{"style":{"height":23.24},"width":606.13,"height":58.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/19-9.png","element":"img","alt":" − m4 γ) ≤ ( 12)m4 γ, Z(j+1)γ = Yj+1","inline":true},{"text":", we conclude 
that","element":"span"}],[{"style":{"width":"86%"},"width":1623,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/19-10.png","element":"img"}],[{"text":"Replacing ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"+ 1 with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":", the result follows.","element":"span"}],[{"text":"We are now ready to prove Proposition ","element":"span"},{"href":"#id-51","text":"4.1.","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Proposition ","element":"span"},{"href":"#id-51","style":{"fontStyle":"italic"},"text":"4.1. ","element":"a"},{"text":"For simplicity, the following notation is used throughout the proof: for a positive definite matrix ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":", we let","element":"span"}],[{"style":{"width":"35%"},"width":670,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/20-0.png","element":"img"}],[{"text":"We also let ","element":"span"},{"style":{"height":17.42},"width":314.84,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/20-1.png","element":"img","alt":" λmax,t and λmin,t","inline":true,"padRight":true},{"text":"denote the maximum and minimum eigenvalues of ","element":"span"},{"style":{"height":14.62},"width":40.02,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/20-2.png","element":"img","alt":" Pt","inline":true},{"text":", respectively. 
Since ","element":"span"},{"style":{"height":15.6},"width":137.94,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/20-3.png","element":"img","alt":" µt is m","inline":true},{"text":"-strongly log-concave, it follows from ","element":"span"},{"href":"#id-13","referenceIndex":20,"text":"[20, ","element":"a"},{"text":"Lemma 10] that","element":"span"}],[{"style":{"width":"66%"},"width":1251,"height":118,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/20-4.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":". ","element":"span"},{"text":"We then use Lemma ","element":"span"},{"href":"#id-87","text":"A.3 ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":15.02},"width":176.92,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/20-5.png","element":"img","alt":" nx = dn","inline":true,"padRight":true},{"text":"and the initial distribution ","element":"span"},{"style":{"height":19.31},"width":285.98,"height":48.27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/20-6.png","element":"img","alt":" θ0 ∼ δθmin,t in","inline":true,"padRight":true},{"text":"Algorithm ","element":"span"},{"href":"#id-49","text":"1 ","element":"a"},{"text":"to obtain that","element":"span"}],[{"style":{"width":"67%"},"width":1256,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/20-7.png","element":"img"}],[{"text":"In Algorithm ","element":"span"},{"href":"#id-49","text":"1, ","element":"a"},{"text":"the stepsize and number of iterations are chosen to be ","element":"span"},{"style":{"height":28.68},"width":422.2,"height":71.7,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/20-8.png","element":"img","alt":" γt = mλmin,t16M2 max{λmin,t,t}","inline":true,"padRight":true},{"text":"and 
","element":"span"},{"style":{"height":27.75},"width":543.49,"height":69.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/20-9.png","element":"img","alt":" Nt = 4 log2(max{λmin,t,t}/λmin,t)mγt","inline":true,"padRight":true},{"text":". Thus, the first and second terms on the right-hand side of the inequality above are bounded as","element":"span"}],[{"style":{"width":"75%"},"width":1408,"height":205,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/20-10.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"55%"},"width":1035,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/20-11.png","element":"img"}],[{"text":"respectively. Therefore, we conclude that","element":"span"}],[{"style":{"width":"73%"},"width":1376,"height":253,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/20-12.png","element":"img"}],[{"text":"For the special case with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"= 2, a simpler bound is attained. Using the inequality","element":"span"}],[{"style":{"width":"49%"},"width":935,"height":63,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/20-13.png","element":"img"}],[{"text":"one can deduce that","element":"span"}],[{"style":{"width":"86%"},"width":1622,"height":280,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/20-14.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":21.69},"width":220.52,"height":54.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/20-15.png","element":"img","alt":" D = 114dnm .","inline":true}],[{"id":"id-61","style":{"fontWeight":"bold"},"text":"A.4 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Proposition ","element":"span"},{"href":"#id-62","style":{"fontWeight":"bold"},"text":"4.2","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"Fix an arbitrary ","element":"span"},{"style":{"height":18.33},"width":592.05,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/21-0.png","element":"img","alt":" t. Given θ0 ∈ Rdn, let θτ ∈ Rdn ","inline":true,"padRight":true},{"text":"denote the solution of the following SDE:","element":"span"}],[{"style":{"width":"38%"},"width":725,"height":70,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/21-1.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":21.6},"width":1748.61,"height":54.01,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/21-2.png","element":"img","alt":" Pt = λIdn+∑t−1s=1 blkdiag({zsz⊤s }ni=1) and Ut = U1+U ′t with U ′t = ∑t−1s=1 log pw(xs+1−Θ⊤zs).","inline":true,"padRight":true},{"text":"Define ","element":"span"},{"style":{"height":17.6},"width":146.46,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/21-3.png","element":"img","alt":" V (τ) as","inline":true}],[{"style":{"width":"23%"},"width":448,"height":84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/21-4.png","element":"img"}],[{"text":"for a fixed ","element":"span"},{"style":{"height":10.4},"width":74.19,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/21-5.png","element":"img","alt":" α >","inline":true,"padRight":true},{"text":"0. 
Applying Itô’s lemma to ","element":"span"},{"style":{"height":17.6},"width":215.58,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/21-6.png","element":"img","alt":" V (τ) yields","inline":true}],[{"style":{"width":"29%"},"width":551,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/21-7.png","element":"img"}],[{"text":"where","element":"span"}],[{"style":{"width":"84%"},"width":1582,"height":1232,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/21-8.png","element":"img"}],[{"text":"It follows from Young’s inequality that the second and third terms on the right-hand side can be bounded as follows:","element":"span"}],[{"style":{"width":"93%"},"width":1758,"height":527,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/21-9.png","element":"img"}],[{"text":"Putting everything together, we have","element":"span"}],[{"style":{"width":"64%"},"width":1212,"height":218,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/22-0.png","element":"img"}],[{"text":"Let ","element":"span"},{"style":{"height":8.4},"width":125.26,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/22-1.png","element":"img","alt":" α = m","inline":true},{"text":". 
We then obtain that","element":"span"}],[{"style":{"width":"83%"},"width":1555,"height":504,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/22-2.png","element":"img"}],[{"text":"Regarding ","element":"span"},{"style":{"height":14.62},"width":45.06,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/22-3.png","element":"img","alt":" F3","inline":true},{"text":", we use the Burkholder-Davis-Gundy inequality ","element":"span"},{"href":"#id-88","referenceIndex":55,"text":"[55] ","element":"a"},{"text":"to obtain that for a fixed ∆ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"> ","element":"span"},{"text":"0","element":"span"}],[{"style":{"width":"59%"},"width":1118,"height":551,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/22-4.png","element":"img"}],[{"text":"where the expectation is taken with respect to ","element":"span"},{"style":{"height":15.02},"width":38.48,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/22-5.png","element":"img","alt":" θτ","inline":true},{"text":". 
By Young’s inequality, we further have","element":"span"}],[{"style":{"width":"79%"},"width":1480,"height":250,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/22-6.png","element":"img"}],[{"text":"Putting everything together, we finally have the following bound for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"V ","element":"span"},{"text":":","element":"span"}],[{"style":{"width":"96%"},"width":1805,"height":417,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/22-7.png","element":"img"}],[{"text":"which implies that","element":"span"}],[{"style":{"width":"73%"},"width":1373,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/23-0.png","element":"img"}],[{"text":"We then have","element":"span"}],[{"style":{"width":"96%"},"width":1805,"height":250,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/23-1.png","element":"img"}],[{"text":"Letting ∆ ","element":"span"},{"style":{"height":9.6},"width":99.76,"height":24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/23-2.png","element":"img","alt":" → ∞","inline":true,"padRight":true},{"text":"and using Fatou’s lemma, we have","element":"span"}],[{"style":{"width":"64%"},"width":1199,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/23-3.png","element":"img"}],[{"text":"For a random vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"having a log-concave pdf, ","element":"span"},{"href":"#id-89","referenceIndex":56,"text":"[56, ","element":"a"},{"text":"Theorem 5.22] yields that","element":"span"}],[{"style":{"width":"20%"},"width":377,"height":123,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/23-4.png","element":"img"}],[{"text":"for any ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p > ","element":"span"},{"text":"0. 
We now observe that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"y ","element":"span"},{"text":":= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"}],[{"style":{"height":22.89},"width":269.52,"height":57.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/23-5.png","element":"img","alt":"Ut(Pt− 12 y + θ∗","inline":true},{"text":") is convex. Therefore, it follows that","element":"span"}],[{"id":"id-92","style":{"width":"96%"},"width":1802,"height":299,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/23-6.png","element":"img"}],[{"text":"of ","element":"span"},{"style":{"height":10.62},"width":47.24,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/23-7.png","element":"img","alt":" ws","inline":true,"padRight":true},{"text":"is denoted by ","element":"span"},{"style":{"height":17.6},"width":410.2,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/23-8.png","element":"img","alt":" ws(j). Therefore, Pt","inline":true,"padRight":true},{"text":"can be written as ","element":"span"},{"style":{"height":18.09},"width":656.98,"height":45.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/23-9.png","element":"img","alt":" Pt = λIdn + blkdiag{Z⊤Z}ni=1 =","inline":true},{"style":{"height":17.6},"width":331,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/23-10.png","element":"img","alt":"In ⊗ (Z⊤Z + λId","inline":true},{"text":"), and it is straightforward to check that ","element":"span"},{"style":{"height":20.36},"width":723.7,"height":50.91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/23-11.png","element":"img","alt":" P −1t = In ⊗ (Z⊤Z + λId)−1. 
Letting","inline":true},{"style":{"height":18.22},"width":547.92,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/23-12.png","element":"img","alt":"θℓ := Θij for ℓ = (j − 1)d + i","inline":true},{"text":", we deduce that","element":"span"}],[{"style":{"width":"79%"},"width":1488,"height":458,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/23-13.png","element":"img"}],[{"text":"We are now ready to leverage the self-normalization technique, Lemma ","element":"span"},{"href":"#id-90","text":"B.1 ","element":"a"},{"text":"in Section ","element":"span"},{"href":"#id-91","text":"B.1. ","element":"a"},{"text":"For a fixed ","element":"span"},{"style":{"height":27.65},"width":1295.06,"height":69.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/23-14.png","element":"img","alt":" j, we let Xs = zs and Vt = λId + ∑t−1s=1 zsz⊤s , St = ∑t−1s=1∂ log pw(ws)∂ws(j) zs","inline":true,"padRight":true},{"text":"and take the probability","element":"span"}],[{"style":{"width":"93%"},"width":1756,"height":225,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/23-15.png","element":"img"}],[{"text":"holds with probability at least 1 − ","element":"span"},{"style":{"height":22.49},"width":220.79,"height":56.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-0.png","element":"img","alt":"δn for each j","inline":true},{"text":". Combining these for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", . . . 
, n ","element":"span"},{"text":"with ","element":"span"},{"href":"#id-92","text":"(A.19)","element":"a"},{"text":", we ","element":"span"},{"text":"conclude that","element":"span"}],[{"style":{"width":"89%"},"width":1676,"height":281,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-1.png","element":"img"}],[{"text":"holds with probability no less than 1 ","element":"span"},{"style":{"height":12.8},"width":64.75,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-2.png","element":"img","alt":" − δ","inline":true,"padRight":true},{"text":"for some positive constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"depending only on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m, n, d ","element":"span"},{"text":"and ","element":"span"},{"style":{"height":12.8},"width":26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-3.png","element":"img","alt":" λ","inline":true},{"text":", as desired.","element":"span"}],[{"id":"id-65","style":{"fontWeight":"bold"},"text":"A.5 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-60","style":{"fontWeight":"bold"},"text":"4.3","element":"a"}],[{"text":"Before proving Theorem ","element":"span"},{"href":"#id-60","text":"4.3, ","element":"a"},{"text":"we introduce some auxiliary results on the behavior of ","element":"span"},{"style":{"height":18.63},"width":307.22,"height":46.58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-4.png","element":"img","alt":" Mt := Θ∗− ˜Θt ∈","inline":true},{"style":{"height":19.21},"width":302.06,"height":48.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-5.png","element":"img","alt":"Rd×n, where ˜Θt","inline":true,"padRight":true},{"text":"is a matrix whose vectorization is 
","element":"span"},{"text":"˜","element":"span"},{"style":{"height":17.75},"width":166.06,"height":44.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-6.png","element":"img","alt":"θt ∈ Rdn","inline":true},{"text":". One of the fundamental ideas is to identify critical columns of ","element":"span"},{"style":{"height":14.62},"width":54.33,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-7.png","element":"img","alt":" Mt","inline":true,"padRight":true},{"text":"representing the column space of ","element":"span"},{"style":{"height":14.62},"width":68.56,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-8.png","element":"img","alt":" Mt.","inline":true,"padRight":true},{"text":"We follow the argument presented in ","element":"span"},{"href":"#id-25","referenceIndex":37,"text":"[37, ","element":"a"},{"text":"Appendix D]. For ","element":"span"},{"style":{"height":19.53},"width":594.54,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-9.png","element":"img","alt":" B ⊂ Rd and v ∈ Rd, let π(v, B","inline":true},{"text":") denote the projection of the vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v ","element":"span"},{"text":"onto the space ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":". ","element":"span"},{"text":"Similarly, we let ","element":"span"},{"style":{"height":17.6},"width":137.47,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-10.png","element":"img","alt":" π(M, B","inline":true},{"text":") denote the column-wise projection of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"M ","element":"span"},{"text":"onto ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":". 
We then construct a sequence of subspaces ","element":"span"},{"style":{"height":15.6},"width":326.74,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-11.png","element":"img","alt":" Bt for t = T, . . . ,","inline":true,"padRight":true},{"text":"1 in the following way. Let ","element":"span"},{"style":{"height":17.5},"width":402.45,"height":43.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-12.png","element":"img","alt":"BT+1 = ∅. For step t","inline":true},{"text":", we begin by setting ","element":"span"},{"style":{"height":19.54},"width":1054.3,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-13.png","element":"img","alt":" Bt = Bt+1. Given ϵ > 0, while |π(Mt, B⊥t )|F > dϵ,11 we","inline":true,"padRight":true},{"text":"pick a column ","element":"span"},{"style":{"height":15.02},"width":193.57,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-14.png","element":"img","alt":" v from Mt","inline":true,"padRight":true},{"text":"satisfying ","element":"span"},{"style":{"height":19.53},"width":237.02,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-15.png","element":"img","alt":" π(v, B⊥t ) > ϵ","inline":true,"padRight":true},{"text":"and update ","element":"span"},{"style":{"height":17.6},"width":271.17,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-16.png","element":"img","alt":" Bt ← Bt ⊕ {v}","inline":true},{"text":". 
Thus, for each step ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", ","element":"span"},{"text":"we have","element":"span"}],[{"id":"id-128","style":{"width":"66%"},"width":1245,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-17.png","element":"img"}],[{"style":{"height":17.6},"width":1195.86,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-18.png","element":"img","alt":"Definition A.4. Let TT = {t1, . . . , tm}, t1 > t2 > ... > tm","inline":true},{"text":", be the set of timesteps at which subspaces ","element":"span"},{"style":{"height":15.02},"width":40.66,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-19.png","element":"img","alt":" Bt","inline":true,"padRight":true},{"text":"expand. Clearly, ","element":"span"},{"style":{"height":17.6},"width":453.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-20.png","element":"img","alt":" |TT | ≤ n since Mt has n","inline":true,"padRight":true},{"text":"columns. 
We also let ","element":"span"},{"style":{"height":17.6},"width":409.83,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-21.png","element":"img","alt":" i(t) := max{i ≤ |TT | :","inline":true},{"style":{"height":17.6},"width":137.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-22.png","element":"img","alt":"ti ≥ t}.","inline":true}],[{"text":"A key insight of this procedure is to discover a sequence of subspaces ","element":"span"},{"style":{"height":15.02},"width":40.66,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-23.png","element":"img","alt":" Bt","inline":true,"padRight":true},{"text":"supporting ","element":"span"},{"style":{"height":15.02},"width":162.81,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-24.png","element":"img","alt":" Mt’s. In","inline":true,"padRight":true},{"text":"this way, we derive the following bounds for the projection of any vector ","element":"span"},{"style":{"height":15.02},"width":175.22,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-25.png","element":"img","alt":" x onto Bt","inline":true,"padRight":true},{"href":"#id-25","referenceIndex":37,"text":"[37, ","element":"a"},{"text":"Lemma 17]:","element":"span"}],[{"id":"id-127","style":{"width":"99%"},"width":1872,"height":251,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-26.png","element":"img"}],[{"text":"larger than max","element":"span"},{"style":{"height":28.73},"width":695.5,"height":71.82,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-27.png","element":"img","alt":"{16, 4S2 ˜M2dU0 }, where ¯L = 1√2m and ˜M","inline":true,"padRight":true},{"text":"is defined 
as","element":"span"}],[{"style":{"width":"92%"},"width":1726,"height":242,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/24-28.png","element":"img"}],[{"text":"As mentioned in Section ","element":"span"},{"href":"#id-93","text":"4.2, ","element":"a"},{"text":"we decompose an event into a good set and a bad set. Let Ω denote the probability space representing all randomness arising from the noise and the preconditioned ","element":"span"},{"text":"ULA. Given 0 ","element":"span"},{"style":{"height":13.2},"width":113.2,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/25-0.png","element":"img","alt":" < δ <","inline":true,"padRight":true},{"text":"1 in Proposition ","element":"span"},{"href":"#id-62","text":"4.2, ","element":"a"},{"text":"we define the events ","element":"span"},{"style":{"height":15.02},"width":241.48,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/25-1.png","element":"img","alt":" Et and Ft as","inline":true}],[{"style":{"width":"71%"},"width":1344,"height":207,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/25-2.png","element":"img"}],[{"text":"where","element":"span"}],[{"style":{"width":"99%"},"width":1857,"height":158,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/25-3.png","element":"img"}],[{"text":"with the constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"from Proposition ","element":"span"},{"href":"#id-62","text":"4.2, ","element":"a"},{"text":"and","element":"span"}],[{"style":{"width":"85%"},"width":1604,"height":136,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/25-4.png","element":"img"}],[{"text":"with the constants ","element":"span"},{"style":{"height":17.82},"width":231.72,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/25-5.png","element":"img","alt":" S, ρ and 
Mρ","inline":true,"padRight":true},{"text":"defined at the beginning of Section ","element":"span"},{"href":"#id-42","text":"3.2.","element":"a"},{"href":"#id-94","style":{"height":25.5},"width":528.46,"height":63.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/25-6.png","element":"img","alt":"12 Here, ¯L = 1√2m and ¯Lν is","inline":true,"padRight":true},{"text":"defined in Assumption ","element":"span"},{"href":"#id-64","text":"3.4, ","element":"a"},{"text":"and ","element":"span"},{"style":{"height":31.64},"width":783.46,"height":79.09,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/25-7.png","element":"img","alt":" G =�H−1/(d+1) + Hd/(d+1)�� 2Sdd+0.5√U � 1d+1","inline":true,"padRight":true},{"text":". Note that when ","element":"span"},{"style":{"height":19.41},"width":684.46,"height":48.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/25-8.png","element":"img","alt":" w ∈ Et, ˜θs ∈ C for s ≤ t − 1 while ˜θt","inline":true,"padRight":true},{"text":"follows the approximate posterior distribution without restriction to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":".","element":"span"}],[{"text":"We first show that the event ","element":"span"},{"style":{"height":14.62},"width":40.06,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/25-9.png","element":"img","alt":" Ft","inline":true,"padRight":true},{"text":"occurs with high probability. 
This result allows us to integrate the OFU-based approach into our Bayesian setting for Thompson sampling.","element":"span"}],[{"id":"id-98","style":{"width":"100%"},"width":1872,"height":151,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/25-10.png","element":"img"}],[{"style":{"height":16.4},"width":448.03,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/25-11.png","element":"img","alt":"Proof. Given 1 ≤ t ≤ T","inline":true},{"text":", fix an arbitrary time step ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s ","element":"span"},{"text":"such that 1 ","element":"span"},{"style":{"height":13.6},"width":140.7,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/25-12.png","element":"img","alt":" ≤ s ≤ t","inline":true},{"text":". By Proposition ","element":"span"},{"href":"#id-62","text":"4.2,","element":"a"}],[{"style":{"width":"72%"},"width":1357,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/25-13.png","element":"img"}],[{"text":"holds with probability no less than 1 ","element":"span"},{"style":{"height":25.92},"width":149.7,"height":64.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/25-14.png","element":"img","alt":" − δs(s+1)","inline":true},{"text":". 
It follows from Proposition ","element":"span"},{"href":"#id-51","text":"4.1 ","element":"a"},{"text":"and the Minkowski ","element":"span"},{"text":"inequality that for any ","element":"span"},{"style":{"height":15.2},"width":113.96,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/25-15.png","element":"img","alt":" p ≥ 2,","inline":true}],[{"id":"id-94","style":{"width":"95%"},"width":1790,"height":681,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/25-16.png","element":"img"}],[{"text":"where the second inequality holds with probability no less than 1 ","element":"span"},{"style":{"height":25.92},"width":149.12,"height":64.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-0.png","element":"img","alt":" − δs(s+1)","inline":true},{"text":". We now set ","element":"span"},{"style":{"height":21.29},"width":196.97,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-1.png","element":"img","alt":" p = log( 1δ)","inline":true,"padRight":true},{"text":"and","element":"span"}],[{"style":{"width":"73%"},"width":1374,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-2.png","element":"img"}],[{"text":"Then, Pr(","element":"span"},{"style":{"height":20.61},"width":427.57,"height":51.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-3.png","element":"img","alt":"|˜θs − θ∗|Ps ≤ βs(δ) | hs","inline":true},{"text":") with probability at least 1 ","element":"span"},{"style":{"height":25.92},"width":149.18,"height":64.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-4.png","element":"img","alt":" − δs(s+1)","inline":true},{"text":". 
This implies that","element":"span"}],[{"style":{"width":"59%"},"width":1108,"height":266,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-5.png","element":"img"}],[{"text":"Let Λ","element":"span"},{"style":{"height":20.6},"width":937.33,"height":51.51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-6.png","element":"img","alt":"s := {w ∈ Ωs ⊂ Ω : |˜θs − θ∗|Ps ≤ βs(δ)} where Ωs","inline":true,"padRight":true},{"text":"denotes the set of all events before time ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s","element":"span"},{"text":". Hence, Pr(Λ","element":"span"},{"style":{"height":25.92},"width":198.35,"height":64.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-7.png","element":"img","alt":"cs) ≤ 2δs(s+1)","inline":true},{"text":". Thus, we have","element":"span"}],[{"style":{"width":"69%"},"width":1298,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-8.png","element":"img"}],[{"text":"For ","element":"span"},{"style":{"height":14.4},"width":94.22,"height":36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-9.png","element":"img","alt":" i ≤ s","inline":true},{"text":", we rewrite the linear system ","element":"span"},{"href":"#id-95","text":"(1) ","element":"a"},{"text":"as","element":"span"}],[{"style":{"width":"16%"},"width":314,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-10.png","element":"img"}],[{"text":"where","element":"span"}],[{"style":{"width":"27%"},"width":507,"height":117,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-11.png","element":"img"}],[{"text":"with ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":32.4},"width":526.03,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-12.png","element":"img","alt":"K(θ)⊤ =�In K(θ)⊤�, 
and","inline":true}],[{"style":{"width":"56%"},"width":1050,"height":182,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-13.png","element":"img"}],[{"text":"The system state at time ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"can then be expressed as","element":"span"}],[{"style":{"width":"63%"},"width":1193,"height":466,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-14.png","element":"img"}],[{"text":"Recall that ","element":"span"},{"style":{"height":21.62},"width":777.02,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-15.png","element":"img","alt":" |˜Θ⊤i ˜K(˜θi)| ≤ ρ < 1 and |Θ⊤∗ ˜K(˜θi)| ≤ Mρ ","inline":true,"padRight":true},{"text":"thanks to the construction of our algorithm. ","element":"span"},{"text":"Since ","element":"span"},{"style":{"height":17.6},"width":324.13,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-16.png","element":"img","alt":" |Ts| ≤ d, we have","inline":true}],[{"style":{"width":"25%"},"width":472,"height":122,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/26-17.png","element":"img"}],[{"text":"which implies that","element":"span"}],[{"style":{"width":"63%"},"width":1184,"height":135,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/27-0.png","element":"img"}],[{"text":"By the definition of ","element":"span"},{"style":{"height":17.42},"width":214.78,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/27-1.png","element":"img","alt":" rj, we have","inline":true}],[{"style":{"width":"58%"},"width":1102,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/27-2.png","element":"img"}],[{"text":"It follows from Lemma ","element":"span"},{"href":"#id-96","text":"B.3 
","element":"a"},{"text":"that","element":"span"}],[{"id":"id-97","style":{"width":"99%"},"width":1871,"height":721,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/27-3.png","element":"img"}],[{"text":"with probability no less than 1 ","element":"span"},{"style":{"height":26.26},"width":759.25,"height":65.65,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/27-4.png","element":"img","alt":" − δs(s+1). Let ˆEw,s ⊂ Es and ˆEν,s ⊂ Es","inline":true,"padRight":true},{"text":"denote the events satisfying ","element":"span"},{"href":"#id-97","text":"(A.22) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-97","text":"(A.23)","element":"a"},{"text":", respectively. Then, on the event ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"height":21.83},"width":207.38,"height":54.58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/27-5.png","element":"img","alt":"Ew,s ∩ ˆEν,s","inline":true},{"text":", we obtain that","element":"span"}],[{"style":{"width":"96%"},"width":1812,"height":329,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/27-6.png","element":"img"}],[{"text":"By the union bound,","element":"span"}],[{"style":{"width":"60%"},"width":1141,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/27-7.png","element":"img"}],[{"text":"where the last inequality follows from Pr( ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"height":26.26},"width":1041.5,"height":65.66,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/27-8.png","element":"img","alt":"Ecw,s) ≤ δs(s+1), Pr( ˆEcν,s) ≤ δs(s+1) and Pr(Ect ) ≤ 2δ.","inline":true,"padRight":true},{"text":"Consequently, we obtain that","element":"span"}],[{"style":{"width":"77%"},"width":1458,"height":138,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/27-9.png","element":"img"}],[{"text":"It immediately 
follows from Proposition ","element":"span"},{"href":"#id-98","text":"A.5 ","element":"a"},{"text":"that Pr(","element":"span"},{"style":{"height":17.6},"width":177.62,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/27-10.png","element":"img","alt":"F ct ) ≤ 4δ","inline":true},{"text":". Using this property, we now ","element":"span"},{"text":"prove Theorem ","element":"span"},{"href":"#id-60","text":"4.3.","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-60","style":{"fontStyle":"italic"},"text":"4.3. ","element":"a"},{"text":"We first decompose ","element":"span"},{"style":{"height":18.22},"width":337.04,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/28-0.png","element":"img","alt":" E[maxj≤t |xj|p] as","inline":true}],[{"id":"id-100","style":{"width":"76%"},"width":1437,"height":86,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/28-1.png","element":"img"}],[{"text":"It follows from the Cauchy-Schwarz inequality and Proposition ","element":"span"},{"href":"#id-98","text":"A.5 ","element":"a"},{"text":"that","element":"span"}],[{"style":{"width":"85%"},"width":1607,"height":596,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/28-2.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":18.62},"width":372.02,"height":46.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/28-3.png","element":"img","alt":" |Dt| ≤ Mρ, we have","inline":true}],[{"style":{"width":"58%"},"width":1104,"height":580,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/28-4.png","element":"img"}],[{"text":"where the second inequality follows from Jensen’s inequality. 
By Lemma ","element":"span"},{"href":"#id-99","text":"B.2 ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":28.39},"width":365.46,"height":70.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/28-5.png","element":"img","alt":" δ = 1t2p+1M2ptρ ≤ 1t ","inline":true,"padRight":true},{"text":", the first term on the right-hand side of ","element":"span"},{"href":"#id-100","text":"(A.24) ","element":"a"},{"text":"is estimated as","element":"span"}],[{"style":{"width":"59%"},"width":1119,"height":326,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/28-6.png","element":"img"}],[{"text":"for some positive constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"depending only on ","element":"span"},{"style":{"height":20.23},"width":548.66,"height":50.57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/28-7.png","element":"img","alt":" n, nu, ρ, Mρ, S, ¯Lν, m and M.","inline":true}],[{"text":"Finally, we obtain that","element":"span"}],[{"style":{"width":"78%"},"width":1468,"height":562,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-0.png","element":"img"}],[{"id":"id-102","text":"It follows from Jensen’s inequality that","element":"span"}],[{"style":{"width":"40%"},"width":760,"height":177,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-1.png","element":"img"}],[{"text":"where the second inequality holds because ","element":"span"},{"style":{"height":15.02},"width":170.95,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-2.png","element":"img","alt":" νt and wt","inline":true,"padRight":true},{"text":"are sub-Gaussian. 
Putting everything together, the result follows.","element":"span"}],[{"id":"id-67","style":{"fontWeight":"bold"},"text":"A.6 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Proposition ","element":"span"},{"href":"#id-68","style":{"fontWeight":"bold"},"text":"4.4","element":"a"}],[{"style":{"height":17.6},"width":638.69,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-3.png","element":"img","alt":"Proof. Given j ∈ [1, k], let A∗, B∗","inline":true,"padRight":true},{"text":"be the true system parameters and ","element":"span"},{"style":{"height":18.22},"width":533.07,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-4.png","element":"img","alt":" s ∈ (tj, tj+1) := Ij. We first","inline":true,"padRight":true},{"text":"define the following quantities for ","element":"span"},{"style":{"height":17.02},"width":141.63,"height":42.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-5.png","element":"img","alt":" s ∈ Ij :","inline":true}],[{"style":{"width":"31%"},"width":588,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-6.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.02},"width":52.06,"height":42.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-7.png","element":"img","alt":" Kj","inline":true,"padRight":true},{"text":"denotes the control gain matrix computed at the beginning of the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":"th episode. 
Writing ","element":"span"},{"style":{"height":14.62},"width":105.61,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-8.png","element":"img","alt":"Ls :=","inline":true},{"style":{"height":42.4},"width":159.24,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-9.png","element":"img","alt":"�In 0","inline":true},{"style":{"height":17.02},"width":152.18,"height":42.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-10.png","element":"img","alt":"Kj Inu","inline":true},{"text":", we can decompose ","element":"span"},{"style":{"height":16.4},"width":386.36,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-11.png","element":"img","alt":" zs as zs = ys + Lsψs","inline":true,"padRight":true},{"text":"by construction of the algorithm.","element":"span"}],[{"text":"For a trajectory (","element":"span"},{"style":{"height":18.22},"width":113.93,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-12.png","element":"img","alt":"zs)s≥1","inline":true},{"text":", we denote the sequence of random variables up to time ","element":"span"},{"style":{"fontStyle":"italic"},"text":"s","element":"span"},{"text":" by ˜","element":"span"},{"style":{"height":17.6},"width":596.94,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-13.png","element":"img","alt":"hs := (x1, W1, ν1, ..., xs, Ws, νs),","inline":true},{"text":" where ","element":"span"},{"style":{"height":14.62},"width":57.21,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-14.png","element":"img","alt":" Ws","inline":true,"padRight":true},{"text":"denotes the randomness generated by the ULA when it is triggered; hence, 
","element":"span"},{"style":{"height":17.82},"width":392.61,"height":44.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-15.png","element":"img","alt":" Ws = 0 if s ̸= tj for","inline":true,"padRight":true},{"text":"all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":". Defining the index set","element":"span"}],[{"style":{"width":"26%"},"width":487,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-16.png","element":"img"}],[{"text":"we consider the modified filtration","element":"span"}],[{"style":{"width":"60%"},"width":1141,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-17.png","element":"img"}],[{"text":"In this way, we can combine the information observed at ","element":"span"},{"style":{"height":16.22},"width":112.72,"height":40.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-18.png","element":"img","alt":" s = tj","inline":true,"padRight":true},{"text":"with that observed up to ","element":"span"},{"style":{"height":16.62},"width":191.91,"height":41.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/29-19.png","element":"img","alt":" s = tj − 1","inline":true,"padRight":true},{"text":"as shown in Figure ","element":"span"},{"href":"#id-101","text":"3.","element":"a"}],[{"id":"id-101","style":{"width":"82%"},"width":1541,"height":173,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/30-0.png","element":"img"}],[{"text":"Figure 3: Filtration and measurability of (","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":248.71,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/30-1.png","element":"img","alt":"ys) and (Ls).","inline":true}],[{"text":"A simple but important observation is that, for 
","element":"span"},{"style":{"height":25.8},"width":832.83,"height":64.5,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/30-2.png","element":"img","alt":" Jk = {ni : n1 < n2 < ... < n k(k+1)2 } both","inline":true},{"text":" stochastic processes (","element":"span"},{"style":{"height":19.29},"width":397,"height":48.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/30-3.png","element":"img","alt":"Lns), (yns) are F′ns−1","inline":true},{"text":"-measurable and (","element":"span"},{"style":{"height":19.11},"width":208.25,"height":47.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/30-4.png","element":"img","alt":"ψns) is F′ns","inline":true},{"text":"-measurable. ","element":"span"},{"text":"To proceed, we first note that","element":"span"}],[{"style":{"width":"99%"},"width":1867,"height":612,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/30-5.png","element":"img"}],[{"text":"Our goal is to find a lower bound on ","element":"span"},{"href":"#id-102","text":"(A.25)","element":"a"},{"text":". To begin with, define ","element":"span"},{"style":{"height":42.4},"width":245.23,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/30-6.png","element":"img","alt":" ψ1,s =�ws−10","inline":true},{"text":" for ","element":"span"},{"style":{"height":17.42},"width":1360.62,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/30-7.png","element":"img","alt":" s ≥ 1 setting w0 = 0 for simplicity. 
Noting that Lsψs = Lsψ1,s + ψ2,s","inline":true},{"text":", we apply Lemma ","element":"span"},{"href":"#id-66","text":"B.4 ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":22.23},"width":423.4,"height":55.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/30-8.png","element":"img","alt":" ϵ = 12, ˜λ = 1 to obtain","inline":true}],[{"id":"id-103","style":{"width":"91%"},"width":1704,"height":341,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/30-9.png","element":"img"}],[{"text":"The first term of ","element":"span"},{"href":"#id-103","text":"(A.26) ","element":"a"},{"text":"can be written as","element":"span"}],[{"style":{"width":"76%"},"width":1429,"height":252,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/30-10.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"v","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"s","element":"span"},{"text":") denotes the index of the episode such that ","element":"span"},{"style":{"height":18.75},"width":157.02,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/30-11.png","element":"img","alt":" s ∈ Iv(s)","inline":true},{"text":". 
By Lemma ","element":"span"},{"href":"#id-104","text":"B.5, ","element":"a"},{"text":"we conclude that","element":"span"}],[{"id":"id-106","style":{"width":"83%"},"width":1563,"height":136,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/30-12.png","element":"img"}],[{"text":"for any ","element":"span"},{"text":"¯","element":"span"},{"style":{"height":21.93},"width":1874.58,"height":54.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/30-13.png","element":"img","alt":"λ > 0, where X = [wn1−1, · · · , wnk(k+1)/2−1]⊤ and Y = [Kv(n1)wn1−1, · · · , Kv(nk(k+1)/2)wnk(k+1)/2−1]⊤.","inline":true}],[{"text":"Next, we invoke Lemma ","element":"span"},{"href":"#id-105","text":"B.7 ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":21.29},"width":805.43,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/31-0.png","element":"img","alt":" ϵ = 12λmin(W) for ψs = ws−1, ψs = νs","inline":true,"padRight":true},{"text":"respectively, to ","element":"span"},{"text":"characterize the good noise sets. 
With ","element":"span"},{"style":{"height":21.29},"width":170.8,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/31-1.png","element":"img","alt":" ρ = log 2δ ","inline":true,"padRight":true},{"text":"in Lemma ","element":"span"},{"href":"#id-105","text":"B.7, ","element":"a"},{"text":"there exists ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C > ","element":"span"},{"text":"0 such that for ","element":"span"},{"text":"any ","element":"span"},{"style":{"height":31.6},"width":553.8,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/31-2.png","element":"img","alt":" δ > 0 and k ≥ C�log( 2δ) + d","inline":true,"padRight":true},{"text":"log 9, the following events hold with probability at least 1 ","element":"span"},{"style":{"height":12.8},"width":76.68,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/31-3.png","element":"img","alt":" − δ:","inline":true}],[{"style":{"width":"98%"},"width":1849,"height":266,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/31-4.png","element":"img"}],[{"text":"where Ω","element":"span"},{"style":{"height":12.22},"width":69.74,"height":30.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/31-5.png","element":"img","alt":"ν ⊂","inline":true,"padRight":true},{"text":"Ω denotes the probability space associated with the random sequence (","element":"span"},{"style":{"height":18.22},"width":252.02,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/31-6.png","element":"img","alt":"νs)s≥1 and Ω","inline":true,"padRight":true},{"text":"is the probability space representing all randomness in the algorithm as defined in the previous subsection. 
Furthermore, from the observation,","element":"span"}],[{"style":{"width":"72%"},"width":1350,"height":390,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/31-7.png","element":"img"}],[{"text":"we also obtain the following event, which contains ","element":"span"},{"style":{"height":17.24},"width":91.04,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/31-8.png","element":"img","alt":" E1,k:","inline":true}],[{"style":{"width":"94%"},"width":1775,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/31-9.png","element":"img"}],[{"text":"To proceed, we choose ","element":"span"},{"text":"¯","element":"span"},{"style":{"height":21.29},"width":311.46,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/31-10.png","element":"img","alt":"λ = 18λmin(W)k","inline":true,"padRight":true},{"text":"in ","element":"span"},{"href":"#id-106","text":"(A.27) ","element":"a"},{"text":"and recall that ","element":"span"},{"style":{"height":19.13},"width":542.11,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/31-11.png","element":"img","alt":" |Y |2 = λmax(Y ⊤Y ). On the","inline":true,"padRight":true},{"text":"event ","element":"span"},{"style":{"height":17.24},"width":331.6,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/31-12.png","element":"img","alt":" E1,k ∩ E2,k ∩ E3,k","inline":true},{"text":", the first two terms on the right-hand side of ","element":"span"},{"href":"#id-103","text":"(A.26) ","element":"a"},{"text":"are lower bounded as","element":"span"}],[{"style":{"width":"98%"},"width":1836,"height":669,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/31-13.png","element":"img"}],[{"text":"for some ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C > ","element":"span"},{"text":"0. 
We next deal with (","element":"span"},{"style":{"height":8.4},"width":22,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/31-14.png","element":"img","alt":"∗","inline":true},{"text":") in ","element":"span"},{"href":"#id-102","text":"(A.25) ","element":"a"},{"text":"and (","element":"span"},{"style":{"height":8.4},"width":43.82,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/31-15.png","element":"img","alt":"∗∗","inline":true},{"text":") in ","element":"span"},{"href":"#id-103","text":"(A.26) ","element":"a"},{"text":"together as they have the same structure. Let us begin by defining","element":"span"}],[{"style":{"width":"81%"},"width":1535,"height":133,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/31-16.png","element":"img"}],[{"text":"Similarly,","element":"span"}],[{"style":{"width":"70%"},"width":1319,"height":126,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/32-0.png","element":"img"}],[{"text":"Applying Lemma ","element":"span"},{"href":"#id-107","text":"B.8 ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":21.29},"width":180.98,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/32-1.png","element":"img","alt":" ρ = log( 1δ","inline":true},{"text":") to the stochastic processes (","element":"span"},{"style":{"height":18.15},"width":513.46,"height":45.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/32-2.png","element":"img","alt":"ψs)s∈Jk and (ys)s∈Jk, each","inline":true,"padRight":true},{"text":"of the following events holds with probability at least 1 ","element":"span"},{"style":{"height":12.8},"width":76.68,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/32-3.png","element":"img","alt":" − 
δ:","inline":true}],[{"style":{"width":"98%"},"width":1839,"height":247,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/32-4.png","element":"img"}],[{"text":"since max","element":"span"},{"style":{"height":42.4},"width":784.32,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/32-5.png","element":"img","alt":"s≤t |Ls| ≤�M2K + 2 with Ls :=�In 0Kj Inu","inline":true}],[{"text":"Here,","element":"span"}],[{"style":{"width":"29%"},"width":556,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/32-6.png","element":"img"}],[{"text":"Fixing ","element":"span"},{"style":{"height":20.8},"width":1324.5,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/32-7.png","element":"img","alt":" v =�x⊤ y⊤�⊤ such that |v| = 1 where x ∈ Rn and y ∈ Rnu, we have","inline":true}],[{"style":{"width":"59%"},"width":1123,"height":250,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/32-8.png","element":"img"}],[{"text":"• ","element":"span"},{"text":"Bound on ","element":"span"},{"style":{"height":18.44},"width":518.68,"height":46.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/32-9.png","element":"img","alt":" Sk(ψ2, Lψ1) on E2,k ∩ E4,k:","inline":true}],[{"style":{"width":"87%"},"width":1647,"height":1239,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/32-10.png","element":"img"}],[{"text":"• ","element":"span"},{"text":"Bound on ","element":"span"},{"style":{"height":19.67},"width":622.96,"height":49.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/33-0.png","element":"img","alt":" Sk(y, Lψ) on Ftk+1 ∩ E1,k ∩ E5,k:","inline":true}],[{"style":{"width":"84%"},"width":1580,"height":1287,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/33-1.png","element":"img"}],[{"text":"To bound (a) above, let us observe that 
","element":"span"},{"style":{"height":24.22},"width":689.05,"height":60.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/33-2.png","element":"img","alt":" tk+1 = (k+1)(k+2)2 ≤ 3kp for any p ≥","inline":true,"padRight":true},{"text":"3 and consider the event ","element":"span"},{"style":{"height":18.47},"width":222.59,"height":46.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/33-3.png","element":"img","alt":" Ftk+1 ∩ E1,k","inline":true},{"text":". Applying Lemma ","element":"span"},{"href":"#id-99","text":"B.2 ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":22.58},"width":284.47,"height":56.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/33-4.png","element":"img","alt":" δ = k−p ≤ t−1k+1","inline":true},{"text":", we deduce that","element":"span"}],[{"style":{"width":"91%"},"width":1721,"height":863,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/33-5.png","element":"img"}],[{"style":{"width":"82%"},"width":1539,"height":331,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/34-0.png","element":"img"}],[{"text":"Combining these bounds and plugging them into ","element":"span"},{"href":"#id-102","text":"(A.25)","element":"a"},{"text":", on the event ","element":"span"},{"style":{"height":18.47},"width":516.19,"height":46.17,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/34-1.png","element":"img","alt":" Ftk+1 ∩ E1,k ∩ E2,k ∩ E3,k ∩","inline":true},{"style":{"height":17.24},"width":204.08,"height":43.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/34-2.png","element":"img","alt":"E4,k ∩ E5,k","inline":true},{"text":", we derive that","element":"span"}],[{"style":{"width":"60%"},"width":1141,"height":165,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/34-3.png","element":"img"}],[{"text":"for some 
","element":"span"},{"style":{"height":15.6},"width":768.72,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/34-4.png","element":"img","alt":" Ci, C > 0 with δ = k−p and k ≥ k0 for k0","inline":true,"padRight":true},{"text":"large enough. In turn, we obtain the following concentration bound for the excitation:","element":"span"}],[{"style":{"width":"77%"},"width":1445,"height":683,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/34-5.png","element":"img"}],[{"text":"where the second inequality follows from ","element":"span"},{"style":{"height":17.42},"width":281.18,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/34-6.png","element":"img","alt":" λmin,t ≥ λ ≥ 1.","inline":true}],[{"id":"id-69","style":{"fontWeight":"bold"},"text":"A.7 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-70","style":{"fontWeight":"bold"},"text":"4.5","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"It follows from ","element":"span"},{"href":"#id-92","text":"(A.19) ","element":"a"},{"text":"in Proposition ","element":"span"},{"href":"#id-62","text":"4.2 ","element":"a"},{"text":"that","element":"span"}],[{"style":{"width":"87%"},"width":1644,"height":300,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/34-7.png","element":"img"}],[{"text":"and hence,","element":"span"}],[{"id":"id-112","style":{"width":"93%"},"width":1749,"height":357,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/34-8.png","element":"img"}],[{"text":"where the second inequality holds by Jensen’s inequality and the outer expectation is taken with respect to the history at time ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":".","element":"span"}],[{"style":{"width":"96%"},"width":1803,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-0.png","element":"img"}],[{"text":"component of the noise ","element":"span"},{"style":{"height":17.6},"width":202.05,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-1.png","element":"img","alt":" wt by wt(j","inline":true},{"text":"). A naive bound is given by","element":"span"}],[{"id":"id-109","style":{"width":"89%"},"width":1669,"height":606,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-2.png","element":"img"}],[{"text":"where the second inequality follows from the fact that ","element":"span"},{"style":{"height":19.13},"width":255.19,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-3.png","element":"img","alt":" Z(Z⊤Z)−1Z⊤ ","inline":true,"padRight":true},{"text":"is a projection matrix. We now claim that ","element":"span"},{"text":"E","element":"span"}],[{"text":"high probability by leveraging the self-normalized bound for vector-valued martingales. 
For ","element":"span"},{"style":{"height":15.6},"width":241.98,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-4.png","element":"img","alt":" s ≥ 0, let us","inline":true,"padRight":true},{"text":"consider the natural filtration","element":"span"}],[{"style":{"width":"21%"},"width":406,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-5.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":216.07,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-6.png","element":"img","alt":" zs = (xs, us","inline":true},{"text":"). Clearly, for ","element":"span"},{"style":{"height":15.6},"width":307.87,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-7.png","element":"img","alt":" s ≥ 1, zs is Fs−1","inline":true},{"text":"-measurable and the random vector ","element":"span"},{"style":{"height":17.6},"width":266.41,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-8.png","element":"img","alt":" ∇w log pw(ws)","inline":true,"padRight":true},{"text":"is ","element":"span"},{"style":{"height":15.02},"width":47.36,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-9.png","element":"img","alt":" Fs","inline":true},{"text":"-measurable. Then for each ","element":"span"},{"style":{"height":27.65},"width":1222.7,"height":69.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-10.png","element":"img","alt":" j ∈ [1, n], we set ηs = ∂ log pw(ws)∂ws(j) , Xs = zs, St = �t−1s=1 ηsXs =","inline":true},{"style":{"height":28.43},"width":694.35,"height":71.07,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-11.png","element":"img","alt":"�t−1s=1∂ log pw(ws)∂ws(j) zs. 
Here, ηs is a M√m","inline":true},{"text":"-sub-Gaussian random variable since ","element":"span"},{"style":{"height":25.9},"width":456.9,"height":64.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-12.png","element":"img","alt":" v⊤∇w log pw(wt) is M√m-","inline":true,"padRight":true},{"text":"sub-Gaussian random variable for any ","element":"span"},{"style":{"height":16.4},"width":439.58,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-13.png","element":"img","alt":" v ∈ Rn given when wt","inline":true,"padRight":true},{"text":"is sub-Gaussian (Proposition 2.18 in ","element":"span"},{"href":"#id-108","referenceIndex":58,"text":"[58]","element":"a"},{"text":"). Together with the fact that","element":"span"}],[{"style":{"width":"31%"},"width":585,"height":129,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-14.png","element":"img"}],[{"text":"and the result for self-normalized bound ","element":"span"},{"href":"#id-90","text":"B.1,","element":"a"}],[{"style":{"width":"58%"},"width":1102,"height":429,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-15.png","element":"img"}],[{"text":"holds with probability at least 1 ","element":"span"},{"style":{"height":22.49},"width":89.78,"height":56.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-16.png","element":"img","alt":" − δn.","inline":true,"padRight":true},{"text":"Note that in the last inequality, we used the fact that det(","element":"span"},{"style":{"height":31.6},"width":1269.22,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/35-17.png","element":"img","alt":"λId + Z⊤Z) = n�det(λIdn + �t−1s=1 blkdiag{zsz⊤s }ni=1) = n�det(Pt).","inline":true}],[{"text":"By the union bound 
argument,","element":"span"}],[{"id":"id-110","style":{"width":"89%"},"width":1669,"height":278,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/36-0.png","element":"img"}],[{"text":"with probability at least 1 ","element":"span"},{"style":{"height":16.4},"width":292.6,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/36-1.png","element":"img","alt":" − δ for any δ >","inline":true,"padRight":true},{"text":"0. Let us denote this event as ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":20.41},"width":483.22,"height":51.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/36-2.png","element":"img","alt":"E so that Pr( ˜E) ≥ 1 − δ.","inline":true,"padRight":true},{"text":"Combining the naive bound ","element":"span"},{"href":"#id-109","text":"(A.29) ","element":"a"},{"text":"and the improved bound ","element":"span"},{"href":"#id-110","text":"(A.30)","element":"a"},{"text":",","element":"span"}],[{"id":"id-111","style":{"width":"92%"},"width":1728,"height":678,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/36-3.png","element":"img"}],[{"text":"We handle the two terms on the right-hand side separately. Recall that ","element":"span"},{"style":{"height":17.6},"width":294.46,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/36-4.png","element":"img","alt":" g : x → (log x)p ","inline":true,"padRight":true},{"text":"is concave on ","element":"span"},{"style":{"height":19.13},"width":606.09,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/36-5.png","element":"img","alt":"x ≥ max{1, ep−1} whenever p >","inline":true,"padRight":true},{"text":"0. 
By Jensen’s inequality, the first term is bounded as","element":"span"}],[{"style":{"width":"100%"},"width":1890,"height":767,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/36-6.png","element":"img"}],[{"text":"where the last inequality follows from Theorem ","element":"span"},{"href":"#id-60","text":"4.3.","element":"a"}],[{"text":"On the other hand, the second term of ","element":"span"},{"href":"#id-111","text":"(A.31) ","element":"a"},{"text":"can be handled similarly. Recalling Jensen’s inequality,","element":"span"}],[{"style":{"width":"23%"},"width":446,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/36-7.png","element":"img"}],[{"text":"for ","element":"span"},{"style":{"height":16},"width":288.91,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/37-0.png","element":"img","alt":" ai ∈ R and p ≥","inline":true,"padRight":true},{"text":"1, we have that","element":"span"}],[{"style":{"width":"74%"},"width":1399,"height":568,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/37-1.png","element":"img"}],[{"text":"where the third inequality follows from the well-known fact that any ","element":"span"},{"text":"¯","element":"span"},{"style":{"fontStyle":"italic"},"text":"L","element":"span"},{"text":"-sub-Gaussian random vector ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"satisfies ","element":"span"},{"style":{"height":19.21},"width":618.9,"height":48.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/37-2.png","element":"img","alt":" E[X2q] ≤ q!(4¯L2)q for any q > 0.","inline":true,"padRight":true},{"text":"Choosing ","element":"span"},{"style":{"height":21.75},"width":125.85,"height":54.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/37-3.png","element":"img","alt":" δ = 1t2p","inline":true,"padRight":true},{"text":"and combining the two 
bounds,","element":"span"}],[{"style":{"width":"83%"},"width":1561,"height":233,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/37-4.png","element":"img"}],[{"text":"Finally, going back to ","element":"span"},{"href":"#id-112","text":"(A.28)","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"106%"},"width":1986,"height":649,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/37-5.png","element":"img"}],[{"text":"where the last inequality follows from Proposition ","element":"span"},{"href":"#id-68","text":"4.4. ","element":"a"},{"text":"For the concentration of the approximate posterior, we invoke Jensen’s inequality to derive","element":"span"}],[{"id":"id-114","style":{"width":"97%"},"width":1824,"height":477,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/37-6.png","element":"img"}],[{"text":"where the second inequality follows from Proposition ","element":"span"},{"href":"#id-51","text":"4.1 ","element":"a"},{"text":"and the concentration result for the exact posterior established above.","element":"span"}],[{"id":"id-71","style":{"fontWeight":"bold"},"text":"A.8 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-113","style":{"fontWeight":"bold"},"text":"5.1","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"In the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":"th episode, for timestep ","element":"span"},{"style":{"height":17.6},"width":296.94,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-0.png","element":"img","alt":" t ∈ [tk, tk+1), xt","inline":true,"padRight":true},{"text":"can be written as","element":"span"}],[{"style":{"width":"65%"},"width":1221,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-1.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":14.62},"width":276.67,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-2.png","element":"img","alt":" rt = B∗νt + wt","inline":true},{"text":". Squaring and taking expectations on both sides of the equation above with respect to the noise, the prior, and the randomized actions,","element":"span"}],[{"style":{"width":"67%"},"width":1263,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-3.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":20.61},"width":386.39,"height":51.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-4.png","element":"img","alt":" Dt = A∗ + B∗K(˜θt).","inline":true}],[{"text":"Since ","element":"span"},{"style":{"height":15.02},"width":37.48,"height":37.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-5.png","element":"img","alt":" θ∗","inline":true,"padRight":true},{"text":"is stabilizable, there exists a small ","element":"span"},{"style":{"height":12.22},"width":85.46,"height":30.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-6.png","element":"img","alt":" ϵ0 >","inline":true,"padRight":true},{"text":"0 for which 
","element":"span"},{"style":{"height":17.6},"width":239.14,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-7.png","element":"img","alt":" |θ − θ∗| ≤ ϵ0","inline":true,"padRight":true},{"text":"implies that ","element":"span"},{"style":{"height":17.6},"width":409.5,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-8.png","element":"img","alt":" |A∗ +B∗K(θ)| ≤ ∆ <","inline":true,"padRight":true},{"text":"1 for some ∆ ","element":"span"},{"style":{"fontStyle":"italic"},"text":"> ","element":"span"},{"text":"0. Splitting ","element":"span"},{"style":{"height":19.13},"width":215.14,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-9.png","element":"img","alt":" E[|Dt|2|xt|2","inline":true},{"text":"] around the true system parameter ","element":"span"},{"style":{"height":15.6},"width":51.42,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-10.png","element":"img","alt":" θ∗,","inline":true}],[{"style":{"width":"63%"},"width":1195,"height":130,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-11.png","element":"img"}],[{"text":"One can see that (i) is bounded by ∆","element":"span"},{"style":{"height":19.13},"width":140.55,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-12.png","element":"img","alt":"2E[|xt|2","inline":true},{"text":"] by the construction. For (ii), we note that ","element":"span"},{"style":{"height":18.62},"width":192.11,"height":46.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-13.png","element":"img","alt":" |Dt| ≤ Mρ","inline":true,"padRight":true},{"text":"by Assumption ","element":"span"},{"href":"#id-57","text":"3.3. 
","element":"a"},{"text":"Using the Cauchy-Schwarz inequality, (ii) is bounded as","element":"span"}],[{"style":{"width":"79%"},"width":1496,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-14.png","element":"img"}],[{"text":"By Markov’s inequality,","element":"span"}],[{"style":{"width":"37%"},"width":710,"height":236,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-15.png","element":"img"}],[{"text":"where the last inequality holds for ","element":"span"},{"style":{"height":13.82},"width":120.72,"height":34.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-16.png","element":"img","alt":" t ≥ t0","inline":true,"padRight":true},{"text":"by Theorem ","element":"span"},{"href":"#id-70","text":"4.5, ","element":"a"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"is a positive constant depending only on ","element":"span"},{"style":{"height":16.4},"width":370.67,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-17.png","element":"img","alt":" p and ϵ0. 
Taking p","inline":true,"padRight":true},{"text":"large enough to satisfy ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p > ","element":"span"},{"text":"28(","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"+ 1), Theorem ","element":"span"},{"href":"#id-60","text":"4.3 ","element":"a"},{"text":"yields that","element":"span"}],[{"style":{"width":"83%"},"width":1570,"height":348,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-18.png","element":"img"}],[{"text":"As ","element":"span"},{"style":{"height":10.62},"width":31.69,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-19.png","element":"img","alt":" rt","inline":true,"padRight":true},{"text":"is sub-Gaussian, we also have that ","element":"span"},{"style":{"height":19.14},"width":116.36,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-20.png","element":"img","alt":" E[|rt|2","inline":true},{"text":"] is bounded, and hence,","element":"span"}],[{"style":{"width":"12%"},"width":228,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-21.png","element":"img"}],[{"text":"for all ","element":"span"},{"style":{"height":17.6},"width":345.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/38-22.png","element":"img","alt":" t ∈ [1, T] and C >","inline":true,"padRight":true},{"text":"0 by the recursive relation.","element":"span"}],[{"text":"To handle the fourth moment, we raise both sides of ","element":"span"},{"href":"#id-114","text":"(A.32) ","element":"a"},{"text":"to the fourth power and take expectations to obtain","element":"span"}],[{"style":{"width":"101%"},"width":1902,"height":414,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/39-0.png","element":"img"}],[{"text":"since 
","element":"span"},{"style":{"height":19.13},"width":233.86,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/39-1.png","element":"img","alt":" E[|xt|2] ≤ C","inline":true},{"text":". We recall Theorem ","element":"span"},{"href":"#id-60","text":"4.3 ","element":"a"},{"text":"once again with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p ","element":"span"},{"text":"satisfying ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p > ","element":"span"},{"text":"56(","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"+ 1) to deduce that","element":"span"}],[{"style":{"width":"84%"},"width":1579,"height":491,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/39-2.png","element":"img"}],[{"text":"for some ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C > ","element":"span"},{"text":"0.","element":"span"}],[{"id":"id-73","style":{"fontWeight":"bold"},"text":"A.9 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-115","style":{"fontWeight":"bold"},"text":"5.2","element":"a"}],[{"text":"It follows from ","element":"span"},{"href":"#id-31","referenceIndex":12,"text":"[12] ","element":"a"},{"text":"that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J ","element":"span"},{"text":"is Lipschitz continuous on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"with a Lipschitz constant ","element":"span"},{"style":{"height":14.7},"width":99.31,"height":36.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/39-3.png","element":"img","alt":" LJ >","inline":true,"padRight":true},{"text":"0. We then estimate one of the key components of the regret.","element":"span"}],[{"id":"id-116","style":{"fontWeight":"bold"},"text":"Lemma A.6. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose that Assumptions ","element":"span"},{"href":"#id-38","style":{"fontStyle":"italic"},"text":"2.1, ","element":"a"},{"href":"#id-57","style":{"fontStyle":"italic"},"text":"3.3 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"href":"#id-64","style":{"fontStyle":"italic"},"text":"3.4 ","element":"a"},{"style":{"fontStyle":"italic"},"text":"hold. Recall that ","element":"span"},{"text":"¯","element":"span"},{"text":"Θ","element":"span"},{"style":{"height":17.75},"width":384.83,"height":44.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/39-4.png","element":"img","alt":"∗ ∈ Rd×n denote the","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"matrix of the true parameter random variables, ","element":"span"},{"text":"˜","element":"span"},{"text":"Θ","element":"span"},{"style":{"height":17.98},"width":170.1,"height":44.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/39-5.png","element":"img","alt":"k ∈ Rd×n ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is the matrix of the parameters sampled in episode ","element":"span"},{"style":{"height":19.53},"width":477.2,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/39-6.png","element":"img","alt":" k, and zt := (xt, ut) ∈ Rd","inline":true},{"style":{"fontStyle":"italic"},"text":". 
Then, the following inequality holds:","element":"span"}],[{"style":{"width":"47%"},"width":883,"height":212,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/39-7.png","element":"img"}],[{"id":"id-117","style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":21.52},"width":257.51,"height":53.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/39-8.png","element":"img","alt":" P ∗k := P ∗(˜θk)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is the symmetric positive definite solution of the ARE ","element":"span"},{"href":"#id-37","text":"(3) ","element":"a"},{"style":{"fontStyle":"italic"},"text":"with ","element":"span"},{"style":{"height":19.41},"width":242.21,"height":48.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/39-9.png","element":"img","alt":" θ := ˜θk, and","inline":true},{"style":{"height":10.7},"width":50.19,"height":26.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/39-10.png","element":"img","alt":"nT","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is the last episode for time horizon ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-116","style":{"fontStyle":"italic"},"text":"A.6. 
","element":"a"},{"text":"We first observe that for any ","element":"span"},{"style":{"height":12.8},"width":21,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/39-11.png","element":"img","alt":" θ","inline":true,"padRight":true},{"text":"which satisfies ","element":"span"},{"style":{"height":17.6},"width":145.39,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/39-12.png","element":"img","alt":" |θ| ≤ S,","inline":true}],[{"style":{"width":"89%"},"width":1669,"height":251,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/39-13.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":782.89,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/40-0.png","element":"img","alt":" MP ∗ satisfies |P ∗(θ)| ≤ MP ∗ for all θ ∈ C","inline":true},{"text":". We then consider","element":"span"}],[{"style":{"width":"99%"},"width":1867,"height":357,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/40-1.png","element":"img"}],[{"id":"id-118","text":"Thus, with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"⟨x, y⟩ ","element":"span"},{"text":"denoting the inner product of two vectors ","element":"span"},{"style":{"height":19.13},"width":183.56,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/40-2.png","element":"img","alt":" x, y ∈ Rd,","inline":true}],[{"style":{"width":"73%"},"width":1370,"height":485,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/40-3.png","element":"img"}],[{"text":"Combining ","element":"span"},{"href":"#id-117","text":"(A.35) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-118","text":"(A.36) ","element":"a"},{"text":"yields 
that","element":"span"}],[{"style":{"width":"92%"},"width":1728,"height":300,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/40-4.png","element":"img"}],[{"text":"Invoking the Cauchy-Schwarz inequality, we have","element":"span"}],[{"style":{"width":"41%"},"width":769,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/40-5.png","element":"img"}],[{"text":"It follows from the tower rule together with Proposition ","element":"span"},{"href":"#id-51","text":"4.1 ","element":"a"},{"text":"that","element":"span"}],[{"style":{"width":"89%"},"width":1684,"height":745,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/40-6.png","element":"img"}],[{"text":"Now putting these together with Theorem ","element":"span"},{"href":"#id-113","text":"5.1, ","element":"a"},{"text":"we obtain","element":"span"}],[{"style":{"width":"70%"},"width":1323,"height":125,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-0.png","element":"img"}],[{"text":"Finally, to bound ","element":"span"},{"style":{"height":26.69},"width":178.34,"height":66.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-1.png","element":"img","alt":"�nTk=1 Tk√tk","inline":true,"padRight":true},{"text":", we recall that ","element":"span"},{"style":{"height":24.22},"width":1038.39,"height":60.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-2.png","element":"img","alt":" Tk = k + 1 and tk = tk−1 + Tk−1. 
Thus, tk = Tk(Tk−1)2 .","inline":true,"padRight":true},{"text":"Then, the sum ","element":"span"},{"style":{"height":26.69},"width":178.33,"height":66.73,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-3.png","element":"img","alt":"�nTk=1 Tk√tk","inline":true,"padRight":true},{"text":"is bounded as follows:","element":"span"}],[{"id":"id-125","style":{"width":"71%"},"width":1345,"height":125,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-4.png","element":"img"}],[{"text":"Therefore, the result follows.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-115","style":{"fontStyle":"italic"},"text":"5.2. ","element":"a"},{"text":"Combining Theorem ","element":"span"},{"href":"#id-113","text":"5.1 ","element":"a"},{"text":"and Lemma ","element":"span"},{"href":"#id-116","text":"A.6, ","element":"a"},{"text":"we finally prove Theorem ","element":"span"},{"href":"#id-115","text":"5.2, ","element":"a"},{"text":"which yields the ","element":"span"},{"style":{"height":19.98},"width":118.82,"height":49.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-5.png","element":"img","alt":" O(√T","inline":true},{"text":") regret bound. Recall that the system parameter sampled in Algorithm ","element":"span"},{"href":"#id-49","text":"1 ","element":"a"},{"text":"is denoted by ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":15.24},"width":38.48,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-6.png","element":"img","alt":"θk","inline":true},{"text":", which is used in obtaining the control gain matrix ","element":"span"},{"style":{"height":20.61},"width":635.12,"height":51.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-7.png","element":"img","alt":" Kk = K(˜θk) for t ∈ [tk, tk+1). 
Let","inline":true},{"style":{"height":21.52},"width":231.77,"height":53.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-8.png","element":"img","alt":"P ∗k := P ∗(˜θk","inline":true},{"text":") for brevity and ˜","element":"span"},{"style":{"height":14.84},"width":191.86,"height":37.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-9.png","element":"img","alt":"ut = Kkxt","inline":true,"padRight":true},{"text":"be an optimal action for ˜","element":"span"},{"style":{"height":15.24},"width":38.48,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-10.png","element":"img","alt":"θk","inline":true},{"text":". Fix an arbitrary ","element":"span"},{"style":{"height":17.6},"width":245.33,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-11.png","element":"img","alt":" t ∈ [tk, tk+1).","inline":true,"padRight":true},{"text":"Then, the Bellman equation ","element":"span"},{"href":"#id-35","referenceIndex":46,"text":"[46] ","element":"a"},{"text":"for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"in episode ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"is given by","element":"span"}],[{"style":{"width":"90%"},"width":1689,"height":198,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-12.png","element":"img"}],[{"id":"id-120","text":"where the expectation is taken with respect to ","element":"span"},{"style":{"height":10.62},"width":43.24,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-13.png","element":"img","alt":" wt","inline":true},{"text":", and the second inequality holds because the mean of ","element":"span"},{"style":{"height":10.62},"width":43.24,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-14.png","element":"img","alt":" 
wt","inline":true,"padRight":true},{"text":"is zero. On the other hand, the observed next state is expressed as","element":"span"}],[{"style":{"width":"18%"},"width":344,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-15.png","element":"img"}],[{"id":"id-119","text":"where ","element":"span"},{"text":"¯","element":"span"},{"text":"Θ","element":"span"},{"style":{"height":17.75},"width":168.56,"height":44.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-16.png","element":"img","alt":"∗ ∈ Rd×n ","inline":true,"padRight":true},{"text":"is the matrix of the true parameter random variables. We then notice that","element":"span"}],[{"style":{"width":"79%"},"width":1483,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-17.png","element":"img"}],[{"id":"id-121","text":"Plugging ","element":"span"},{"href":"#id-119","text":"(A.41) ","element":"a"},{"text":"into ","element":"span"},{"href":"#id-120","text":"(A.40) ","element":"a"},{"text":"and rearranging it,","element":"span"}],[{"style":{"width":"90%"},"width":1689,"height":125,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-18.png","element":"img"}],[{"text":"Since ˜","element":"span"},{"style":{"height":10.62},"width":223.48,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-19.png","element":"img","alt":"ut = ut − νt","inline":true},{"text":", we derive that","element":"span"}],[{"id":"id-122","style":{"width":"72%"},"width":1357,"height":53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-20.png","element":"img"}],[{"text":"and","element":"span"}],[{"id":"id-123","style":{"width":"97%"},"width":1826,"height":183,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-21.png","element":"img"}],[{"text":"Combining ","element":"span"},{"href":"#id-121","text":"(A.42)","element":"a"},{"text":", 
","element":"span"},{"href":"#id-122","text":"(A.43) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-123","text":"(A.44)","element":"a"},{"text":", we conclude that","element":"span"}],[{"style":{"width":"85%"},"width":1599,"height":198,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/41-22.png","element":"img"}],[{"text":"where the expectation is taken with respect to ","element":"span"},{"style":{"height":15.02},"width":192.64,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/42-0.png","element":"img","alt":" wt and νt.","inline":true,"padRight":true},{"text":"Using this expression and observing ","element":"span"},{"style":{"height":16.76},"width":355.8,"height":41.9,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/42-1.png","element":"img","alt":" tnT ≤ T ≤ tnT +1 −","inline":true,"padRight":true},{"text":"1, the expected regret of Algorithm ","element":"span"},{"href":"#id-49","text":"1 ","element":"a"},{"text":"is decomposed as","element":"span"}],[{"style":{"width":"72%"},"width":1361,"height":208,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/42-2.png","element":"img"}],[{"id":"id-124","text":"where","element":"span"}],[{"style":{"width":"50%"},"width":937,"height":773,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/42-3.png","element":"img"}],[{"text":"To obtain the exact regret bound, we include ","element":"span"},{"style":{"height":14.62},"width":50.13,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/42-4.png","element":"img","alt":" R5","inline":true,"padRight":true},{"text":"which is not considered in ","element":"span"},{"href":"#id-8","referenceIndex":10,"text":"[10]","element":"a"},{"text":". 
By Lemma ","element":"span"},{"href":"#id-116","text":"A.6, ","element":"a"},{"style":{"height":14.62},"width":50.13,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/42-5.png","element":"img","alt":"R1","inline":true,"padRight":true},{"text":"is bounded as","element":"span"}],[{"style":{"width":"36%"},"width":674,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/42-6.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":15.24},"width":127.16,"height":38.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/42-7.png","element":"img","alt":" Tk = k","inline":true,"padRight":true},{"text":"+ 1, we have","element":"span"}],[{"style":{"width":"38%"},"width":716,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/42-8.png","element":"img"}],[{"text":"which implies that","element":"span"}],[{"style":{"width":"55%"},"width":1038,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/42-9.png","element":"img"}],[{"text":"Therefore, we conclude that","element":"span"}],[{"style":{"width":"39%"},"width":748,"height":53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/42-10.png","element":"img"}],[{"text":"Regarding ","element":"span"},{"style":{"height":14.62},"width":50.13,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/43-0.png","element":"img","alt":" R2","inline":true},{"text":", we use the tower rule ","element":"span"},{"style":{"height":17.6},"width":356.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/43-1.png","element":"img","alt":" E[E[Xt|ht]] = E[Xt","inline":true},{"text":"] to obtain","element":"span"}],[{"style":{"width":"43%"},"width":813,"height":723,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/43-2.png","element":"img"}],[{"text":"where the last inequality follows from 
","element":"span"},{"href":"#id-124","text":"(A.45)","element":"a"},{"text":".","element":"span"}],[{"text":"We also need to deal with ","element":"span"},{"style":{"height":14.62},"width":50.13,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/43-3.png","element":"img","alt":" R3","inline":true,"padRight":true},{"text":"carefully. In the analysis presented in ","element":"span"},{"href":"#id-8","referenceIndex":10,"text":"[10]","element":"a"},{"text":", this term simply vanishes by the probability-matching property of Thompson sampling, since exact posterior distributions are used there. In our analysis, however, approximate posteriors are used instead, so a different approach is required. To cope with this, we exploit the Lipschitz continuity of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J ","element":"span"},{"text":"with respect to the system parameter. Specifically,","element":"span"}],[{"style":{"width":"35%"},"width":661,"height":716,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/43-4.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":14.7},"width":50.7,"height":36.75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/43-5.png","element":"img","alt":" LJ","inline":true,"padRight":true},{"text":"is a Lipschitz constant of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"J ","element":"span"},{"text":"and the last inequality follows from Proposition ","element":"span"},{"href":"#id-51","text":"4.1 ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":21.69},"width":220.52,"height":54.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/43-6.png","element":"img","alt":"D = 114dnm .","inline":true,"padRight":true},{"text":"Using the bound ","element":"span"},{"href":"#id-125","text":"(A.39) ","element":"a"},{"text":"on 
","element":"span"},{"style":{"height":26.69},"width":178.34,"height":66.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/43-7.png","element":"img","alt":"�nTk=1 Tk√tk","inline":true,"padRight":true},{"text":"in the proof of Lemma ","element":"span"},{"href":"#id-116","text":"A.6, ","element":"a"},{"text":"we have","element":"span"}],[{"style":{"width":"20%"},"width":392,"height":124,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/43-8.png","element":"img"}],[{"text":"By the definition of ","element":"span"},{"style":{"height":15.2},"width":112.58,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/44-0.png","element":"img","alt":" νt, R4","inline":true,"padRight":true},{"text":"is bounded as","element":"span"}],[{"style":{"width":"44%"},"width":831,"height":592,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/44-1.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":922.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/44-2.png","element":"img","alt":" MP ∗ satisfies P ∗(θ) ≤ MP ∗ for θ ∈ C. Lastly, R5","inline":true,"padRight":true},{"text":"is bounded as","element":"span"}],[{"style":{"width":"45%"},"width":860,"height":580,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/44-3.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":629.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/44-4.png","element":"img","alt":" MJ satisfies J(θ) ≤ MJ for θ ∈ C","inline":true},{"text":". Putting all the bounds together, we conclude that","element":"span"}],[{"style":{"width":"14%"},"width":268,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/44-5.png","element":"img"}],[{"text":"and thus the result follows. 
One novelty in our analysis is that the concentration of the approximate posterior is naturally embedded into the analysis, which ultimately removes the log ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"factor from the resulting regret bound.","element":"span"}]]},{"heading":"B Lemmas","paragraphs":[[{"id":"id-91","style":{"fontWeight":"bold"},"text":"B.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Self-normalization lemma","element":"span"}],[{"id":"id-90","style":{"fontWeight":"bold"},"text":"Lemma B.1 ","element":"span"},{"text":"(Theorem 1 in ","element":"span"},{"href":"#id-63","referenceIndex":53,"text":"[53]","element":"a"},{"text":", self-normalized bound for vector-valued martingales)","element":"span"},{"style":{"height":17.88},"width":252.5,"height":44.69,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/44-6.png","element":"img","alt":". Let (Fs)∞s=1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a filtration. 
Let ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.88},"width":115.29,"height":44.7,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/44-7.png","element":"img","alt":"ηs)∞s=1 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a real-valued stochastic process such that ","element":"span"},{"style":{"height":16.4},"width":149.55,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/44-8.png","element":"img","alt":" ηs is Fs","inline":true},{"style":{"fontStyle":"italic"},"text":"-measurable and ","element":"span"},{"style":{"height":12},"width":37.66,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/44-9.png","element":"img","alt":" ηs","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is conditionally ","element":"span"},{"style":{"fontStyle":"italic"},"text":"R","element":"span"},{"style":{"fontStyle":"italic"},"text":"-sub-Gaussian for some ","element":"span"},{"style":{"height":19.81},"width":573.62,"height":49.53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/44-10.png","element":"img","alt":" R > 0. Let (Xs)∞s=1 be an Rd","inline":true},{"style":{"fontStyle":"italic"},"text":"-valued stochastic process ","element":"span"},{"style":{"fontStyle":"italic"},"text":"such that ","element":"span"},{"style":{"height":15.02},"width":206.63,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/44-11.png","element":"img","alt":" Xs is Fs−1","inline":true},{"style":{"fontStyle":"italic"},"text":"-measurable. 
For any ","element":"span"},{"style":{"height":16.4},"width":237.2,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/44-12.png","element":"img","alt":" t ≥ 0, define","inline":true}],[{"style":{"width":"39%"},"width":741,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/44-13.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"where ","element":"span"},{"style":{"height":13.2},"width":105.64,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/44-14.png","element":"img","alt":" λ > 0","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is a given constant. Then, for any ","element":"span"},{"style":{"height":13.2},"width":101.23,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/44-15.png","element":"img","alt":" δ > 0","inline":true},{"style":{"fontStyle":"italic"},"text":", the inequality","element":"span"}],[{"style":{"width":"42%"},"width":799,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/44-16.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"holds with probability no less than ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":12.8},"width":77.68,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/45-0.png","element":"img","alt":" − δ.","inline":true}],[{"style":{"fontWeight":"bold"},"text":"B.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Maximum norm bound","element":"span"}],[{"id":"id-99","style":{"fontWeight":"bold"},"text":"Lemma B.2 ","element":"span"},{"text":"(Lemma 5 in ","element":"span"},{"href":"#id-25","referenceIndex":37,"text":"[37]","element":"a"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"For any ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", . . . , T","element":"span"},{"style":{"fontStyle":"italic"},"text":", the following inequality holds:","element":"span"}],[{"style":{"width":"44%"},"width":829,"height":148,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/45-1.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for some constant ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C > ","element":"span"},{"text":"0 ","element":"span"},{"style":{"fontStyle":"italic"},"text":"depending only on ","element":"span"},{"style":{"height":20.23},"width":415.9,"height":50.58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/45-2.png","element":"img","alt":" d, m, ρ, Mρ, ¯Lν and S.","inline":true}],[{"style":{"width":"99%"},"width":1870,"height":322,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/45-3.png","element":"img"}],[{"text":"and ","element":"span"},{"style":{"height":10.62},"width":39.92,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/45-4.png","element":"img","alt":" αt","inline":true,"padRight":true},{"text":"is monotone increasing in ","element":"span"},{"style":{"height":14.62},"width":174.59,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/45-5.png","element":"img","alt":" Ft. 
From","inline":true}],[{"style":{"width":"19%"},"width":372,"height":68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/45-6.png","element":"img"}],[{"text":"in ","element":"span"},{"style":{"height":14.62},"width":40.06,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/45-7.png","element":"img","alt":" Ft","inline":true},{"text":", we derive that","element":"span"}],[{"id":"id-126","style":{"width":"67%"},"width":1265,"height":124,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/45-8.png","element":"img"}],[{"text":"by choosing constants ","element":"span"},{"style":{"height":15.02},"width":46.31,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/45-9.png","element":"img","alt":" Gi","inline":true},{"text":"’s appropriately. Recall ","element":"span"},{"style":{"height":17.6},"width":75.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/45-10.png","element":"img","alt":" βt(δ","inline":true},{"text":"), which is given by","element":"span"}],[{"style":{"width":"100%"},"width":1879,"height":140,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/45-11.png","element":"img"}],[{"text":"For ","element":"span"},{"style":{"height":21.29},"width":117.73,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/45-12.png","element":"img","alt":" δ ≤ 1t ,","inline":true}],[{"style":{"width":"34%"},"width":637,"height":267,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/45-13.png","element":"img"}],[{"text":"As a result,","element":"span"}],[{"style":{"width":"95%"},"width":1793,"height":140,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/45-14.png","element":"img"}],[{"text":"In turn, ","element":"span"},{"href":"#id-126","text":"(B.1) ","element":"a"},{"text":"implies 
that","element":"span"}],[{"style":{"width":"36%"},"width":676,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/45-15.png","element":"img"}],[{"text":"We now claim that one further has","element":"span"}],[{"style":{"width":"68%"},"width":1283,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/46-0.png","element":"img"}],[{"text":"when ","element":"span"},{"style":{"height":31.6},"width":472.84,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/46-1.png","element":"img","alt":" G1β′t(δ) + G2�log� tδ�≥","inline":true,"padRight":true},{"text":"1. To see this, set","element":"span"}],[{"style":{"width":"99%"},"width":1871,"height":216,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/46-2.png","element":"img"}],[{"text":"constants. Clearly, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":") is increasing when ","element":"span"},{"style":{"height":25.81},"width":857.88,"height":64.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/46-3.png","element":"img","alt":" x >� αdd+1�d+1 and αdd+1 < α. Since α + β ≥ 1,","inline":true}],[{"style":{"width":"99%"},"width":1867,"height":367,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/46-4.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":803.02,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/46-5.png","element":"img","alt":" MK satisfies |[I K(θ)⊤]| ≤ MK for θ ∈ C","inline":true},{"text":". 
Using this relation, one derives that","element":"span"}],[{"style":{"width":"94%"},"width":1774,"height":281,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/46-6.png","element":"img"}],[{"text":"for appropriately chosen ","element":"span"},{"style":{"height":15.6},"width":332.05,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/46-7.png","element":"img","alt":" Gi > 0. Here, Gi","inline":true},{"text":"’s denote generic constants whose values may differ from one occurrence to the next.","element":"span"}],[{"text":"Define ","element":"span"},{"style":{"height":14.62},"width":145.6,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/46-8.png","element":"img","alt":" at := X","inline":true}],[{"style":{"width":"75%"},"width":1406,"height":585,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/46-9.png","element":"img"}],[{"text":"From","element":"span"}],[{"style":{"width":"99%"},"width":1868,"height":186,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/46-10.png","element":"img"}],[{"text":"Finally, setting","element":"span"}],[{"style":{"width":"47%"},"width":880,"height":128,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/47-0.png","element":"img"}],[{"text":"we deduce that","element":"span"}],[{"style":{"width":"64%"},"width":1207,"height":178,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/47-1.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"B.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Lemmas for Theorem ","element":"span"},{"href":"#id-60","style":{"fontWeight":"bold"},"text":"4.3","element":"a"}],[{"text":"Recall the setup and notation in Section ","element":"span"},{"href":"#id-65","text":"A.5.","element":"a"}],[{"id":"id-96","style":{"fontWeight":"bold"},"text":"Lemma B.3. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"For any ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", . . . , T","element":"span"},{"style":{"fontStyle":"italic"},"text":", on the event ","element":"span"},{"style":{"height":14.62},"width":44.21,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/47-2.png","element":"img","alt":" Et","inline":true}],[{"style":{"width":"70%"},"width":1325,"height":229,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/47-3.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"We note that the following inequalities hold on the event ","element":"span"},{"style":{"height":14.62},"width":58.44,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/47-4.png","element":"img","alt":" Et:","inline":true}],[{"style":{"width":"82%"},"width":1551,"height":706,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/47-5.png","element":"img"}],[{"text":"The rest of the proof follows that of Lemma 18 in ","element":"span"},{"href":"#id-63","referenceIndex":53,"text":"[53] ","element":"a"},{"text":"and we provide the details for completeness.","element":"span"}],[{"text":"For now, we assume that ","element":"span"},{"style":{"height":10.4},"width":70,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/47-6.png","element":"img","alt":" ϵ <","inline":true,"padRight":true},{"text":"1; we return to this point later with a particular choice of ","element":"span"},{"style":{"height":8},"width":18,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/47-7.png","element":"img","alt":" ϵ","inline":true},{"text":". 
From ","element":"span"},{"href":"#id-127","text":"(A.21)","element":"a"},{"text":", we obtain,","element":"span"}],[{"style":{"width":"40%"},"width":761,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/47-8.png","element":"img"}],[{"text":"which implies that","element":"span"}],[{"style":{"width":"67%"},"width":1267,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/47-9.png","element":"img"}],[{"text":"Using ","element":"span"},{"href":"#id-128","text":"(A.20) ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-127","text":"(A.21)","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"60%"},"width":1138,"height":330,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/48-0.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":14.62},"width":41.78,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/48-1.png","element":"img","alt":" Zt","inline":true,"padRight":true},{"text":"is increasing in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", we have","element":"span"}],[{"style":{"width":"58%"},"width":1096,"height":111,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/48-2.png","element":"img"}],[{"text":"Recalling the definition of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"s","element":"span"},{"text":"), the condition ","element":"span"},{"style":{"height":17.6},"width":413.26,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/48-3.png","element":"img","alt":" s /∈ Tt and 1 ≤ i ≤ i(s","inline":true},{"text":") implies that ","element":"span"},{"style":{"height":17.44},"width":106.4,"height":43.6,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/48-4.png","element":"img","alt":" s < 
˜ti","inline":true},{"text":". Therefore, for ","element":"span"},{"style":{"height":15.6},"width":113.04,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/48-5.png","element":"img","alt":" δ < 1,","inline":true}],[{"style":{"width":"42%"},"width":788,"height":158,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/48-6.png","element":"img"}],[{"text":"Hence, we deduce that","element":"span"}],[{"id":"id-129","style":{"width":"99%"},"width":1867,"height":252,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/48-7.png","element":"img"}],[{"text":"To further simplify ","element":"span"},{"href":"#id-129","text":"(B.6)","element":"a"},{"text":",","element":"span"}],[{"style":{"width":"71%"},"width":1342,"height":206,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/48-8.png","element":"img"}],[{"text":"Now let us show ","element":"span"},{"style":{"height":10.4},"width":67.95,"height":26,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/48-9.png","element":"img","alt":" ϵ <","inline":true,"padRight":true},{"text":"1, which is the part we postponed at the beginning of the proof. 
Since ","element":"span"},{"style":{"height":26.28},"width":209.63,"height":65.71,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/48-10.png","element":"img","alt":"H > 4S2 ˜M2dU0 ","inline":true,"padRight":true},{"text":", a direct computation yields that","element":"span"}],[{"style":{"width":"20%"},"width":391,"height":122,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/48-11.png","element":"img"}],[{"text":"Noting that ","element":"span"},{"style":{"height":22.46},"width":940.37,"height":56.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/49-0.png","element":"img","alt":" λmax,t ≤ 1ntr(Pt) = dλ + �t−1s=1 |zs|2 ≤ dλ + t|Zt|2,","inline":true}],[{"style":{"width":"94%"},"width":1777,"height":827,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/49-1.png","element":"img"}],[{"text":"Therefore, ","element":"span"},{"style":{"height":20.41},"width":240.95,"height":51.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/49-2.png","element":"img","alt":" βt(δ) ≤ ˜MZt","inline":true,"padRight":true},{"text":"holds for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"and consequently,","element":"span"}],[{"style":{"width":"94%"},"width":1770,"height":207,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/49-3.png","element":"img"}],[{"style":{"fontWeight":"bold"},"text":"B.4 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Lemmas for Proposition ","element":"span"},{"href":"#id-68","style":{"fontWeight":"bold"},"text":"4.4","element":"a"}],[{"id":"id-66","style":{"fontWeight":"bold"},"text":"Lemma B.4 ","element":"span"},{"text":"(Lemma 10 in 
","element":"span"},{"href":"#id-22","referenceIndex":34,"text":"[34]","element":"a"},{"text":")","element":"span"},{"style":{"height":17.88},"width":632.81,"height":44.69,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/49-4.png","element":"img","alt":". Let (zs)∞s=1, (ys)∞s=1 and (ξs)∞s=1 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be three sequences of vectors in ","element":"span"},{"style":{"height":15.13},"width":49.52,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/49-5.png","element":"img","alt":"Rd","inline":true},{"style":{"fontStyle":"italic"},"text":", satisfying the linear relation ","element":"span"},{"style":{"height":16.4},"width":485.88,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/49-6.png","element":"img","alt":" zs = ys + ξs for all s ≥ 0","inline":true},{"style":{"fontStyle":"italic"},"text":". Then, for all ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":15.6},"width":454.72,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/49-7.png","element":"img","alt":"λ > 0, all t ≥ 1 and all","inline":true}],[{"style":{"width":"98%"},"width":1846,"height":212,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/49-8.png","element":"img"}],[{"id":"id-104","style":{"fontWeight":"bold"},"text":"Lemma B.5 ","element":"span"},{"text":"(Lemma 12 in ","element":"span"},{"href":"#id-22","referenceIndex":34,"text":"[34]","element":"a"},{"text":")","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"For two matrices ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"},{"style":{"fontStyle":"italic"},"text":"Y ","element":"span"},{"style":{"fontStyle":"italic"},"text":"with the same number of rows and any ","element":"span"},{"text":"¯","element":"span"},{"style":{"height":15.6},"width":284.82,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/49-9.png","element":"img","alt":"λ > 0, we have","inline":true}],[{"style":{"width":"85%"},"width":1603,"height":551,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/49-10.png","element":"img"}],[{"style":{"width":"74%"},"width":1399,"height":585,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-0.png","element":"img"}],[{"text":"where the last inequality follows from the singular value decomposition and the relation","element":"span"}],[{"style":{"width":"69%"},"width":1309,"height":181,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-1.png","element":"img"}],[{"id":"id-131","href":"#id-130","referenceIndex":59,"style":{"height":19.54},"width":704.25,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-2.png","element":"img","alt":"Lemma B.6 ( [59]). Let W ∈ Rd×d ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a random matrix and ","element":"span"},{"style":{"height":21.29},"width":663.06,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-3.png","element":"img","alt":" ϵ ∈ (0, 12) and M be ϵ-net in Sd−1","inline":true}],[{"style":{"fontStyle":"italic"},"text":"with minimal cardinality. 
Then, for any ","element":"span"},{"style":{"height":15.6},"width":115.56,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-4.png","element":"img","alt":" ρ > 0,","inline":true}],[{"style":{"width":"56%"},"width":1054,"height":115,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-5.png","element":"img"}],[{"id":"id-105","style":{"fontWeight":"bold"},"text":"Lemma B.7 ","element":"span"},{"text":"(Modification of Proposition 8 in ","element":"span"},{"href":"#id-22","referenceIndex":34,"text":"[34]","element":"a"},{"text":")","element":"span"},{"style":{"height":17.88},"width":242.97,"height":44.69,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-6.png","element":"img","alt":". Let (ψs)∞s=1 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a sequence of independent, zero-","element":"span"},{"style":{"fontStyle":"italic"},"text":"mean, ","element":"span"},{"text":"¯","element":"span"},{"style":{"fontStyle":"italic"},"text":"L","element":"span"},{"style":{"fontStyle":"italic"},"text":"-sub-Gaussian and ","element":"span"},{"style":{"height":15.02},"width":47.36,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-7.png","element":"img","alt":" Fs","inline":true},{"style":{"fontStyle":"italic"},"text":"-measurable random vectors in ","element":"span"},{"style":{"height":15.13},"width":49.52,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-8.png","element":"img","alt":" Rd","inline":true},{"style":{"fontStyle":"italic"},"text":". 
Then, for all ","element":"span"},{"style":{"height":15.6},"width":345.32,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-9.png","element":"img","alt":" ρ′ > 0, 0 < ϵ < 1","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"height":24.25},"width":640.69,"height":60.63,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-10.png","element":"img","alt":" t ≥ max(16² L̄⁴/ϵ², 16L̄²/ϵ)(ρ′ + d log 9),","inline":true}],[{"style":{"width":"80%"},"width":1511,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-11.png","element":"img"}],[{"style":{"height":16.4},"width":296.42,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-12.png","element":"img","alt":"Proof. Here, ψs","inline":true,"padRight":true},{"text":"is a zero-mean, ","element":"span"},{"text":"¯","element":"span"},{"style":{"fontStyle":"italic"},"text":"L","element":"span"},{"text":"-sub-Gaussian random vector satisfying","element":"span"}],[{"style":{"width":"29%"},"width":560,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-13.png","element":"img"}],[{"text":"for any vector ","element":"span"},{"style":{"height":15.93},"width":127.13,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-14.png","element":"img","alt":" θ ∈ Rd","inline":true},{"text":". 
Then, for any unit vector ","element":"span"},{"style":{"height":16.4},"width":258.19,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-15.png","element":"img","alt":" x, Y := x⊤ψs","inline":true,"padRight":true},{"text":"is zero-mean, ","element":"span"},{"text":"¯","element":"span"},{"style":{"fontStyle":"italic"},"text":"L","element":"span"},{"text":"-sub-Gaussian, and hence, it follows that","element":"span"}],[{"style":{"width":"69%"},"width":1305,"height":424,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-16.png","element":"img"}],[{"text":"and therefore,","element":"span"}],[{"style":{"width":"61%"},"width":1151,"height":125,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/50-17.png","element":"img"}],[{"text":"Invoking Markov's inequality, for any ","element":"span"},{"style":{"height":15.6},"width":114.56,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/51-0.png","element":"img","alt":" ρ > 0,","inline":true}],[{"style":{"width":"86%"},"width":1616,"height":413,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/51-1.png","element":"img"}],[{"text":"Similarly,","element":"span"}],[{"style":{"width":"72%"},"width":1364,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/51-2.png","element":"img"}],[{"text":"Altogether,","element":"span"}],[{"style":{"width":"75%"},"width":1422,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/51-3.png","element":"img"}],[{"text":"Now we apply Lemma ","element":"span"},{"href":"#id-131","text":"B.6 ","element":"a"},{"text":"with ","element":"span"},{"style":{"height":22.06},"width":722.24,"height":55.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/51-4.png","element":"img","alt":" ϵ = 1/4 and W = ∑_{s=1}^{t}(ψsψ⊤s − E[ψsψ⊤s ","inline":true,"padRight":true},{"text":"]), we 
have","element":"span"}],[{"style":{"width":"77%"},"width":1450,"height":128,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/51-5.png","element":"img"}],[{"text":"Upon substituting exp(","element":"span"},{"style":{"height":26.01},"width":647.46,"height":65.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/51-6.png","element":"img","alt":"−ρ′) = 9^d exp(− min{ρ/(16L̄²), ρ²/(256tL̄⁴)}","inline":true},{"text":"), or equivalently,","element":"span"}],[{"style":{"width":"37%"},"width":698,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/51-7.png","element":"img"}],[{"text":"and solving for ","element":"span"},{"style":{"height":12},"width":23,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/51-8.png","element":"img","alt":" ρ","inline":true},{"text":", we further obtain that","element":"span"}],[{"style":{"width":"94%"},"width":1775,"height":680,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/51-9.png","element":"img"}],[{"text":"which implies that","element":"span"}],[{"style":{"width":"45%"},"width":855,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/51-10.png","element":"img"}],[{"text":"Therefore,","element":"span"}],[{"style":{"width":"90%"},"width":1689,"height":393,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/52-0.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"39%"},"width":748,"height":117,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/52-1.png","element":"img"}],[{"text":"As a result,","element":"span"}],[{"style":{"width":"74%"},"width":1400,"height":587,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/52-2.png","element":"img"}],[{"id":"id-107","style":{"fontWeight":"bold"},"text":"Lemma B.8 ","element":"span"},{"text":"(Proposition 9 in 
","element":"span"},{"href":"#id-22","referenceIndex":34,"text":"[34]","element":"a"},{"text":")","element":"span"},{"style":{"height":15.02},"width":149.39,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/52-3.png","element":"img","alt":". Let Fs","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a filtration and ","element":"span"},{"text":"(","element":"span"},{"style":{"height":17.88},"width":122.04,"height":44.69,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/52-4.png","element":"img","alt":"ψs)∞s=1 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a sequence of independent, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"zero-mean, ","element":"span"},{"text":"¯","element":"span"},{"style":{"fontStyle":"italic"},"text":"L","element":"span"},{"style":{"fontStyle":"italic"},"text":"-sub-Gaussian and ","element":"span"},{"style":{"height":15.02},"width":47.36,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/52-5.png","element":"img","alt":" Fs","inline":true},{"style":{"fontStyle":"italic"},"text":"-measurable random vectors in ","element":"span"},{"style":{"height":19.81},"width":298.42,"height":49.53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/52-6.png","element":"img","alt":" Rd. 
Let (Ls)∞s=1 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a sequence of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"random matrices in ","element":"span"},{"style":{"height":17.75},"width":394.33,"height":44.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/52-7.png","element":"img","alt":" Rd×d such that Fs−1","inline":true},{"style":{"fontStyle":"italic"},"text":"-measurable and ","element":"span"},{"style":{"height":17.88},"width":427.33,"height":44.69,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/52-8.png","element":"img","alt":" |Ls| < ∞. Let (ys)∞s=1 ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"be a sequence of ","element":"span"},{"style":{"height":15.02},"width":90.37,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/52-9.png","element":"img","alt":"Fs−1","inline":true},{"style":{"fontStyle":"italic"},"text":"-measurable random vectors in ","element":"span"},{"style":{"height":15.13},"width":49.52,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/52-10.png","element":"img","alt":" Rd","inline":true},{"style":{"fontStyle":"italic"},"text":". 
Then, for any positive definite matrix ","element":"span"},{"style":{"height":12.8},"width":115.33,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/52-11.png","element":"img","alt":" V ≻ 0","inline":true},{"style":{"fontStyle":"italic"},"text":", the following self-normalized matrix process defined by","element":"span"}],[{"style":{"width":"68%"},"width":1285,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/52-12.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"satisfies","element":"span"}],[{"style":{"width":"87%"},"width":1647,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/52-13.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for all ","element":"span"},{"style":{"height":15.6},"width":150.71,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/52-14.png","element":"img","alt":" ρ, t ≥ 1.","inline":true}]]},{"heading":"C Empirical Analyses","paragraphs":[[{"text":"We test the performance of our algorithm under the Gaussian mixture noise models specified in Sections ","element":"span"},{"href":"#id-132","text":"C.4 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-133","text":"C.5. ","element":"a"},{"text":"The source code for our TSLD-LQ implementation is available online: ","element":"span"},{"href":"https://github.com/Jiwhan-Park/tsld","style":{"fontFamily":"monospace"},"text":"https://github.","element":"a"},{"href":"https://github.com/Jiwhan-Park/tsld","style":{"fontFamily":"monospace"},"text":"com/Jiwhan-Park/tsld","element":"a"},{"text":". 
The true system parameter Θ","element":"span"},{"style":{"height":6},"width":17,"height":15,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/52-15.png","element":"img","alt":"∗","inline":true,"padRight":true},{"text":"is chosen as follows:","element":"span"}],[{"text":"• ","element":"span"},{"text":"for ","element":"span"},{"style":{"height":14.8},"width":224.19,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/53-0.png","element":"img","alt":" n = nu = 3,","inline":true}],[{"style":{"width":"48%"},"width":916,"height":148,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/53-1.png","element":"img"}],[{"text":"• ","element":"span"},{"text":"for ","element":"span"},{"style":{"height":15.2},"width":224.19,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/53-2.png","element":"img","alt":" n = nu = 5,","inline":true}],[{"style":{"width":"69%"},"width":1298,"height":263,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/53-3.png","element":"img"}],[{"text":"• ","element":"span"},{"text":"for ","element":"span"},{"style":{"height":14.8},"width":246.01,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/53-4.png","element":"img","alt":" n = nu = 10,","inline":true}],[{"style":{"width":"58%"},"width":1100,"height":1101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/53-5.png","element":"img"}],[{"text":"For the quadratic cost, ","element":"span"},{"style":{"height":16.61},"width":355.18,"height":41.53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/53-6.png","element":"img","alt":" Q = 2In, R = Inu","inline":true,"padRight":true},{"text":"are used, where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= 3","element":"span"},{"style":{"fontStyle":"italic"},"text":", 
","element":"span"},{"text":"5","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"10. True system parameters (","element":"span"},{"style":{"height":17.6},"width":1854.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/53-7.png","element":"img","alt":"A∗, B∗) satisfy ρ(A∗ + B∗K) = 0.3365 for n = nu = 3, 0.3187 for n = nu = 5, and 0.3839 for","inline":true},{"style":{"height":15.6},"width":422.12,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/53-8.png","element":"img","alt":"n = nu = 10, where K","inline":true,"padRight":true},{"text":"denotes the control gain matrix associated with (","element":"span"},{"style":{"height":16},"width":121.15,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/53-9.png","element":"img","alt":"A∗, B∗","inline":true},{"text":"). For the admissible set ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":", we choose ","element":"span"},{"style":{"height":16.4},"width":623.4,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/53-10.png","element":"img","alt":" S = 20, MJ = 20000, and ρ = 0.","inline":true},{"text":"99 in all settings, regardless of the type of noise. We also sample the action perturbation ","element":"span"},{"style":{"height":21.29},"width":416.8,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/53-11.png","element":"img","alt":" νs from N(0, (1/10000)Inu","inline":true},{"text":") at the end of each episode. 
Finally, ","element":"span"},{"text":"the prior is set to a Gaussian distribution with covariance 0","element":"span"},{"style":{"height":17.6},"width":696.94,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/53-12.png","element":"img","alt":".2In for n = nu = 3, n = nu = 5 (or","inline":true},{"style":{"height":17.6},"width":1871.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/53-13.png","element":"img","alt":"λ = 5), and with covariance 0.1In for n = nu = 10 (or λ = 10). The mean of each component is","inline":true,"padRight":true},{"text":"set to 0","element":"span"},{"style":{"fontStyle":"italic"},"text":".","element":"span"},{"text":"5.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"C.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Regret","element":"span"}],[{"text":"We test our method under both the symmetric and the asymmetric Gaussian mixture noise specified in Sections ","element":"span"},{"href":"#id-132","text":"C.4 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-133","text":"C.5","element":"a"},{"text":", respectively. 
","element":"span"},{"text":"As shown in Figure ","element":"span"},{"href":"#id-134","text":"4, ","element":"a"},{"text":"the empirical regret of the proposed algorithm is consistent with the ","element":"span"},{"style":{"height":19.98},"width":118.83,"height":49.95,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/53-14.png","element":"img","alt":"O(√T","inline":true},{"text":") bound, even when the noise is asymmetric.","element":"span"}],[{"id":"id-134","style":{"width":"96%"},"width":1806,"height":399,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/54-0.png","element":"img"}],[{"text":"Figure 4: Expected cumulative regret ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"R","element":"figcaption","subtype":"caption"},{"text":"(","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"T","element":"figcaption","subtype":"caption"},{"text":") over a time horizon ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"T ","element":"figcaption","subtype":"caption"},{"text":"using the Gaussian mixture noise for ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":1270.86,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/54-1.png","element":"img","alt":" n = nu = 3 (left), for n = nu = 5 (center), for n = nu = 10 (right).","inline":true}],[{"style":{"fontWeight":"bold"},"text":"C.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Effect of the preconditioner on the number of iterations","element":"span"}],[{"id":"id-135","text":"Table 1: The number of iterations required for the naive ULA and the preconditioned ULA when ","element":"figcaption","subtype":"caption"},{"style":{"height":14.22},"width":224.19,"height":35.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/54-2.png","element":"img","alt":"n = nu = 
3.","inline":true}],[{"style":{"width":"71%"},"width":1330,"height":179,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/54-3.png","element":"img"}],[{"text":"Table ","element":"span"},{"href":"#id-135","text":"1 ","element":"a"},{"text":"shows the number of iterations computed according to Theorem ","element":"span"},{"href":"#id-48","text":"2.4 ","element":"a"},{"text":"(naive ULA) and Algorithm ","element":"span"},{"href":"#id-49","text":"1 ","element":"a"},{"text":"(preconditioned ULA). The preconditioned ULA requires significantly fewer iterations for the sampling process than the naive ULA. This indicates that our algorithm attains the regret bound with fewer computational resources.","element":"span"}],[{"id":"id-52","style":{"fontWeight":"bold"},"text":"C.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Additional Analyses on Gaussian Mixture Noise","element":"span"}],[{"text":"Figure ","element":"span"},{"href":"#id-136","text":"5 ","element":"a"},{"text":"shows the relative error between the sampled and true system parameters over episodes, which is consistent with the ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":22.89},"width":113.35,"height":57.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/54-4.png","element":"img","alt":"O(t−1/4","inline":true,"padRight":true},{"text":") convergence rate proved in Theorem ","element":"span"},{"href":"#id-70","text":"4.5.","element":"a"}],[{"id":"id-136","style":{"width":"96%"},"width":1809,"height":406,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/54-5.png","element":"img"}],[{"text":"Figure 5: System parameter error ","element":"figcaption","subtype":"caption"},{"style":{"height":20.61},"width":246.25,"height":51.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/54-6.png","element":"img","alt":" |˜θk − 
θ∗|/|θ∗|","inline":true,"padRight":true},{"text":"over episode ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"k ","element":"figcaption","subtype":"caption"},{"text":"using the Gaussian mixture noise for ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":1270.86,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/54-7.png","element":"img","alt":" n = nu = 3 (left), for n = nu = 5 (center), for n = nu = 10 (right).","inline":true}],[{"text":"The sample rejection rate in Figure ","element":"span"},{"href":"#id-137","text":"6 ","element":"a"},{"text":"is computed as ","element":"span"},{"style":{"height":18.22},"width":531.11,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/54-8.png","element":"img","alt":" nrej/(nacc + nrej) where nrej","inline":true,"padRight":true},{"text":"is the total number of rejected samples in the episode and ","element":"span"},{"style":{"height":10.62},"width":73.17,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/54-9.png","element":"img","alt":" nacc","inline":true,"padRight":true},{"text":"is the total number of accepted samples in the episode, which equals the number of simulations carried out. 
This result provides empirical evidence for the existence of a small positive constant ","element":"span"},{"style":{"height":8},"width":18,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/54-10.png","element":"img","alt":" ϵ","inline":true,"padRight":true},{"text":"such that Pr(","element":"span"},{"text":"˜","element":"span"},{"style":{"height":17.6},"width":299.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/54-11.png","element":"img","alt":"θk ∈ C) ≥ 1 − ϵ.","inline":true}],[{"id":"id-137","style":{"width":"96%"},"width":1807,"height":417,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/55-0.png","element":"img"}],[{"text":"Figure 6: Sample rejection rate over episodes using the Gaussian mixture noise for ","element":"figcaption","subtype":"caption"},{"style":{"height":14.22},"width":239.33,"height":35.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/55-1.png","element":"img","alt":" n = nu = 3","inline":true,"padRight":true},{"text":"(left), for ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":854.94,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/55-2.png","element":"img","alt":" n = nu = 5 (center), for n = nu = 10 (right).","inline":true}],[{"text":"The execution times reported in Table ","element":"span"},{"href":"#id-138","text":"2 ","element":"a"},{"text":"were measured on an Intel Xeon W-2295 (3.00 GHz) platform equipped with an NVIDIA RTX 3090 GPU.","element":"span"}],[{"id":"id-138","text":"Table 2: The mean and standard deviation of the execution time, in seconds, of 2000 time steps of Algorithm ","element":"figcaption","subtype":"caption"},{"href":"#id-49","text":"1 ","element":"a","subtype":"caption"},{"text":"under the Gaussian mixture noise. 
For each system dimension, the left column gives the mean and the right column gives the standard deviation.","element":"figcaption","subtype":"caption"}],[{"style":{"width":"99%"},"width":1868,"height":170,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/55-3.png","element":"img"}],[{"id":"id-132","style":{"fontWeight":"bold"},"text":"C.4 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Gaussian mixture noise","element":"span"}],[{"text":"We consider the Gaussian mixture noise given by","element":"span"}],[{"style":{"width":"42%"},"width":800,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/55-4.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":21.29},"width":1747.03,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/55-5.png","element":"img","alt":" a = [1/2, 1/2, 1/2]⊤, [1/4, 1/4, 1/4, 1/4, 1/4]⊤ and [1/4, 1/4, 1/4, 1/4, 1/4, 1/4, 1/4, 1/4, 1/4, 1/4]⊤ for n = 3, 5 and 10, respectively.","inline":true,"padRight":true},{"text":"Taking gradients,","element":"span"}],[{"style":{"width":"37%"},"width":701,"height":92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/55-6.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"41%"},"width":775,"height":260,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/55-7.png","element":"img"}],[{"text":"Therefore, the first condition in Assumption ","element":"span"},{"href":"#id-38","text":"2.1 ","element":"a"},{"text":"is satisfied for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"= 3, 5 and 
10:","element":"span"}],[{"style":{"width":"28%"},"width":539,"height":226,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/55-8.png","element":"img"}],[{"style":{"width":"29%"},"width":551,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/56-0.png","element":"img"}],[{"text":"Figure ","element":"span"},{"href":"#id-139","text":"7 ","element":"a"},{"text":"compares the marginal distribution of a selected ","element":"span"},{"id":"id-139","text":"dimension of our symmetric Gaussian mixture noise with that of the standard Gaussian noise.","element":"span"}],[{"style":{"width":"69%"},"width":1300,"height":790,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/56-1.png","element":"img"}],[{"text":"Figure 7: Comparison between the symmetric Gaussian mixture noise and the standard Gaussian noise.","element":"figcaption","subtype":"caption"}],[{"id":"id-133","style":{"fontWeight":"bold"},"text":"C.5 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Asymmetric Gaussian mixture noise","element":"span"}],[{"text":"We consider the asymmetric Gaussian mixture noise given by","element":"span"}],[{"style":{"width":"56%"},"width":1059,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/56-2.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":21.29},"width":1744.26,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/56-3.png","element":"img","alt":" γ = 1/4 and a = [1, 1, 1]⊤, [1/2, 1/2, 1/2, 1/2, 1/2]⊤ and [1/2, 1/2, 1/2, 1/2, 1/2, 1/2, 1/2, 1/2, 1/2, 1/2]⊤ for n = 3, 5 and 10","inline":true,"padRight":true},{"text":"respectively. 
Taking gradients,","element":"span"}],[{"style":{"width":"46%"},"width":871,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/56-4.png","element":"img"}],[{"text":"and","element":"span"}],[{"style":{"width":"56%"},"width":1057,"height":349,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/56-5.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":19.13},"width":428.87,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/56-6.png","element":"img","alt":" k = exp((1 − 2γ)|a|^2/","inline":true},{"text":"2). Therefore, the first condition in Assumption ","element":"span"},{"href":"#id-38","text":"2.1 ","element":"a"},{"text":"is satisfied for ","element":"span"},{"href":"#id-132","style":{"height":21.29},"width":1164.46,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/56-7.png","element":"img","alt":"n = 3, 5, and 10 as in Section C.4. Note that if we set γ = 1/2","inline":true},{"text":", we recover the symmetric Gaussian ","element":"span"},{"text":"mixture noise defined in Section ","element":"span"},{"href":"#id-132","text":"C.4. ","element":"a"},{"text":"Figure ","element":"span"},{"href":"#id-140","text":"8 ","element":"a"},{"text":"compares the marginal distribution of a selected dimension of our asymmetric Gaussian mixture noise with that of the standard Gaussian noise.","element":"span"}],[{"id":"id-140","style":{"width":"69%"},"width":1300,"height":790,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/57-0.png","element":"img"}],[{"text":"Figure 8: Comparison between the asymmetric Gaussian mixture noise and the standard Gaussian noise.","element":"figcaption","subtype":"caption"}]]},{"heading":"References","paragraphs":[[{"id":"id-0","text":"[1] T. L. Lai and H. 
Robbins, “Asymptotically efficient adaptive allocation rules,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Applied Mathematics","element":"span"},{"text":", vol. 6, no. 1, pp. 4–22, 1985.","element":"span"}],[{"id":"id-1","text":"[2] M. Kearns and S. Singh, “Near-optimal reinforcement learning in polynomial time,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Machine Learning","element":"span"},{"text":", vol. 49, no. 2, pp. 209–232, 2002.","element":"span"}],[{"id":"id-2","text":"[3] W. R. Thompson, “On the likelihood that one unknown probability exceeds another in view ","element":"span"},{"text":"of the evidence of two samples,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Biometrika","element":"span"},{"text":", vol. 25, no. 3-4, pp. 285–294, 1933.","element":"span"}],[{"id":"id-3","text":"[4] S. Agrawal and N. Goyal, “Analysis of Thompson sampling for the multi-armed bandit prob","element":"span"},{"text":"lem,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 25th Annual Conference on Learning Theory","element":"span"},{"text":". ","element":"span"},{"text":"PMLR, 2012, pp. 39.1–26.","element":"span"}],[{"text":"[5] ——, “Thompson sampling for contextual bandits with linear payoffs,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Conference on Machine Learning","element":"span"},{"text":". ","element":"span"},{"text":"PMLR, 2013, pp. 127–135.","element":"span"}],[{"id":"id-4","text":"[6] E. Kaufmann, N. Korda, and R. Munos, “Thompson sampling: An asymptotically optimal ","element":"span"},{"text":"finite-time analysis,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Conference on Algorithmic Learning Theory","element":"span"},{"text":". ","element":"span"},{"text":"Springer, 2012, pp. 199–213.","element":"span"}],[{"id":"id-5","text":"[7] I. Osband, D. Russo, and B. 
Van Roy, “(More) efficient reinforcement learning via posterior ","element":"span"},{"text":"sampling,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Neural Information Processing Systems","element":"span"},{"text":", vol. 26, 2013.","element":"span"}],[{"id":"id-7","text":"[8] I. Osband and B. Van Roy, “Posterior sampling for reinforcement learning without episodes,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1608.02731","element":"span"},{"text":", 2016.","element":"span"}],[{"id":"id-6","text":"[9] A. Gopalan and S. Mannor, “Thompson sampling for learning parameterized Markov decision ","element":"span"},{"text":"processes,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of The 28th Conference on Learning Theory","element":"span"},{"text":". ","element":"span"},{"text":"PMLR, 2015, pp. 861–898.","element":"span"}],[{"id":"id-8","text":"[10] Y. Ouyang, M. Gagrani, and R. Jain, “Posterior sampling-based reinforcement learning for ","element":"span"},{"text":"control of unknown linear systems,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Transactions on Automatic Control","element":"span"},{"text":", vol. 65, no. 8, pp. 3600–3607, 2019.","element":"span"}],[{"text":"[11] Y. Abbasi-Yadkori and C. Szepesvári, “Bayesian optimal control of smoothly parameterized systems,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of 31st Conference on Uncertainty in Artificial Intelligence","element":"span"},{"text":". Citeseer, 2015, pp. 1–11.","element":"span"}],[{"id":"id-31","text":"[12] M. Abeille and A. Lazaric, “Thompson sampling for linear-quadratic control problems,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Artificial Intelligence and Statistics","element":"span"},{"text":". ","element":"span"},{"text":"PMLR, 2017, pp. 1246–1254.","element":"span"}],[{"id":"id-9","text":"[13] M. K. S. Faradonbeh, A. 
Tewari, and G. Michailidis, “On adaptive linear-quadratic regulators,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Automatica","element":"span"},{"text":", vol. 117, p. 108982, 2020.","element":"span"}],[{"id":"id-10","text":"[14] W. R. Gilks, S. Richardson, and D. Spiegelhalter, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Markov Chain Monte Carlo in Practice","element":"span"},{"text":". CRC Press, 1995.","element":"span"}],[{"id":"id-40","text":"[15] G. O. Roberts and R. L. Tweedie, “Exponential convergence of Langevin distributions and ","element":"span"},{"text":"their discrete approximations,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Bernoulli","element":"span"},{"text":", pp. 341–363, 1996.","element":"span"}],[{"text":"[16] A. Durmus and E. Moulines, “Sampling from a strongly log-concave distribution with the unadjusted Langevin algorithm,” 2016.","element":"span"}],[{"id":"id-11","text":"[17] M. Welling and Y. W. Teh, “Bayesian learning via stochastic gradient Langevin dynamics,” ","element":"span"},{"text":"in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Conference on Machine Learning","element":"span"},{"text":", 2011, pp. 681–688.","element":"span"}],[{"id":"id-12","text":"[18] T. Huix, M. Zhang, and A. Durmus, “Tight regret and complexity bounds for Thompson ","element":"span"},{"text":"Sampling via Langevin Monte Carlo,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Conference on Artificial Intelligence and Statistics","element":"span"},{"text":". ","element":"span"},{"text":"PMLR, 2023, pp. 8749–8770.","element":"span"}],[{"text":"[19] P. Xu, H. Zheng, E. V. Mazumdar, K. Azizzadenesheli, and A. Anandkumar, “Langevin Monte Carlo for contextual bandits,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Conference on Machine Learning","element":"span"},{"text":". PMLR, 2022, pp. 
24 830–24 850.","element":"span"}],[{"id":"id-13","text":"[20] E. Mazumdar, A. Pacchiano, Y.-A. Ma, P. L. Bartlett, and M. I. Jordan, “On Thompson ","element":"span"},{"text":"sampling with Langevin algorithms,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:2002.10002","element":"span"},{"text":", 2020.","element":"span"}],[{"id":"id-14","text":"[21] H. Ishfaq, Q. Lan, P. Xu, A. R. Mahmood, D. Precup, A. Anandkumar, and K. Azizzadenesheli, ","element":"span"},{"text":"“Provable and practical: Efficient exploration in reinforcement learning via Langevin Monte Carlo,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:2305.18246","element":"span"},{"text":", 2023.","element":"span"}],[{"id":"id-15","text":"[22] A. Karbasi, N. L. Kuang, Y. Ma, and S. Mitra, “Langevin Thompson Sampling with logarithmic communication: bandits and reinforcement learning,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Conference on Machine Learning","element":"span"},{"text":". ","element":"span"},{"text":"PMLR, 2023, pp. 15 828–15 860.","element":"span"}],[{"id":"id-16","text":"[23] X. Li, D. Wu, L. Mackey, and M. A. Erdogdu, “Stochastic Runge-Kutta accelerates Langevin ","element":"span"},{"text":"Monte Carlo and beyond,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1906.07868","element":"span"},{"text":", 2019.","element":"span"}],[{"id":"id-39","text":"[24] W. Mou, Y.-A. Ma, M. J. Wainwright, P. L. Bartlett, and M. I. Jordan, “High-order Langevin ","element":"span"},{"text":"diffusion yields an accelerated MCMC algorithm,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1908.10859","element":"span"},{"text":", 2019.","element":"span"}],[{"text":"[25] Z. Ding, Q. Li, J. Lu, and S. J.
Wright, “Random coordinate Langevin Monte Carlo,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Conference on Learning Theory","element":"span"},{"text":". ","element":"span"},{"text":"PMLR, 2021, pp. 1683–1710.","element":"span"}],[{"id":"id-17","text":"[26] Y. Lu, J. Lu, and J. Nolen, “Accelerating Langevin sampling with birth-death,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1905.09863","element":"span"},{"text":", 2019.","element":"span"}],[{"id":"id-18","text":"[27] M. Girolami and B. Calderhead, “Riemann manifold Langevin and Hamiltonian Monte Carlo ","element":"span"},{"text":"methods,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the Royal Statistical Society: Series B (Statistical Methodology)","element":"span"},{"text":", vol. 73, no. 2, pp. 123–214, 2011.","element":"span"}],[{"text":"[28] A. S. Dalalyan, “Theoretical guarantees for approximate sampling from smooth and log-concave densities,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of the Royal Statistical Society: Series B (Statistical Methodology)","element":"span"},{"text":", vol. 79, no. 3, pp. 651–676, 2017.","element":"span"}],[{"id":"id-19","text":"[29] R. Dwivedi, Y. Chen, M. J. Wainwright, and B. Yu, “Log-concave sampling: Metropolis-","element":"span"},{"text":"Hastings algorithms are fast!” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Conference on Learning Theory","element":"span"},{"text":". ","element":"span"},{"text":"PMLR, 2018, pp. 793–797.","element":"span"}],[{"id":"id-20","text":"[30] I. D. Landau, R. Lozano, M. M’Saad ","element":"span"},{"style":{"fontStyle":"italic"},"text":"et al.","element":"span"},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Adaptive control","element":"span"},{"text":". ","element":"span"},{"text":"Springer New York, 1998, vol. 51.","element":"span"}],[{"id":"id-21","text":"[31] M. Simchowitz and D.
Foster, “Naive exploration is optimal for online LQR,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Conference on Machine Learning","element":"span"},{"text":". ","element":"span"},{"text":"PMLR, 2020, pp. 8937–8948.","element":"span"}],[{"text":"[32] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu, “Regret bounds for robust adaptive control of the linear quadratic regulator,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Neural Information Processing Systems","element":"span"},{"text":", vol. 31, 2018.","element":"span"}],[{"text":"[33] H. Mania, S. Tu, and B. Recht, “Certainty equivalence is efficient for linear quadratic control,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Neural Information Processing Systems","element":"span"},{"text":", vol. 32, 2019.","element":"span"}],[{"id":"id-22","text":"[34] Y. Jedra and A. Proutiere, “Minimal expected regret in linear quadratic control,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Conference on Artificial Intelligence and Statistics","element":"span"},{"text":". ","element":"span"},{"text":"PMLR, 2022, pp. 10 234–10 321.","element":"span"}],[{"id":"id-23","text":"[35] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu, “On the sample complexity of the linear ","element":"span"},{"text":"quadratic regulator,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Foundations of Computational Mathematics","element":"span"},{"text":", vol. 20, no. 4, pp. 633–679, 2020.","element":"span"}],[{"id":"id-24","text":"[36] M. K. S. Faradonbeh, A. Tewari, and G. Michailidis, “Finite-time adaptive stabilization of ","element":"span"},{"text":"linear systems,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IEEE Transactions on Automatic Control","element":"span"},{"text":", vol. 64, no. 8, pp. 3498–3505, 2018.","element":"span"}],[{"id":"id-25","text":"[37] Y. Abbasi-Yadkori and C. 
Szepesvári, “Regret bounds for the adaptive control of linear ","element":"span"},{"text":"quadratic systems,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 24th Annual Conference on Learning Theory","element":"span"},{"text":". PMLR, 2011, pp. 19.1–26.","element":"span"}],[{"id":"id-26","text":"[38] M. Ibrahimi, A. Javanmard, and B. Van Roy, “Efficient reinforcement learning for high dimensional ","element":"span"},{"text":"linear quadratic systems,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Neural Information Processing Systems","element":"span"},{"text":", vol. 25, 2012.","element":"span"}],[{"id":"id-27","text":"[39] A. Cohen, T. Koren, and Y. Mansour, “Learning linear-quadratic regulators efficiently with ","element":"span"},{"text":"only","element":"span"},{"style":{"height":17.6},"width":67.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/59-0.png","element":"img","alt":"√T","inline":true,"padRight":true},{"text":"regret,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Conference on Machine Learning","element":"span"},{"text":". ","element":"span"},{"text":"PMLR, 2019, pp. 1300–1309.","element":"span"}],[{"id":"id-28","text":"[40] M. Abeille and A. Lazaric, “Efficient optimistic exploration in linear-quadratic regulators via ","element":"span"},{"text":"Lagrangian relaxation,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Conference on Machine Learning","element":"span"},{"text":". ","element":"span"},{"text":"PMLR, 2020, pp. 23–31.","element":"span"}],[{"id":"id-29","text":"[41] M. K. S. Faradonbeh, A. Tewari, and G. Michailidis, “Input perturbations for adaptive control ","element":"span"},{"text":"and learning,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Automatica","element":"span"},{"text":", vol. 117, p. 108950, 2020.","element":"span"}],[{"id":"id-30","text":"[42] S. Lale, K.
Azizzadenesheli, B. Hassibi, and A. Anandkumar, “Reinforcement learning with ","element":"span"},{"text":"fast stabilization in linear dynamical systems,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Conference on Artificial Intelligence and Statistics","element":"span"},{"text":". ","element":"span"},{"text":"PMLR, 2022, pp. 5354–5390.","element":"span"}],[{"id":"id-32","text":"[43] M. Abeille and A. Lazaric, “Improved regret bounds for Thompson sampling in linear quadratic ","element":"span"},{"text":"control problems,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"International Conference on Machine Learning","element":"span"},{"text":". ","element":"span"},{"text":"PMLR, 2018, pp. 1–9.","element":"span"}],[{"id":"id-33","text":"[44] T. Kargin, S. Lale, K. Azizzadenesheli, A. Anandkumar, and B. Hassibi, “Thompson sampling ","element":"span"},{"text":"achieves ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":19.98},"width":118.83,"height":49.95,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/2405.19380/images/60-0.png","element":"img","alt":"O(√T","inline":true},{"text":") regret in linear quadratic control,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Conference on Learning Theory","element":"span"},{"text":". PMLR, 2022, pp. 3235–3284.","element":"span"}],[{"id":"id-34","text":"[45] M. Gagrani, S. Sudhakara, A. Mahajan, A. Nayyar, and Y. Ouyang, “A modified Thompson ","element":"span"},{"text":"sampling-based learning algorithm for unknown linear systems,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"2022 IEEE 61st Conference on Decision and Control (CDC)","element":"span"},{"text":". ","element":"span"},{"text":"IEEE, 2022, pp. 6658–6665.","element":"span"}],[{"id":"id-35","text":"[46] D. Bertsekas, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Dynamic programming and optimal control: Volume II","element":"span"},{"text":". 
Athena Scientific, 2011.","element":"span"}],[{"id":"id-36","text":"[47] D. Russo, B. Van Roy, A. Kazerouni, I. Osband, and Z. Wen, “A tutorial on Thompson ","element":"span"},{"text":"sampling,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1707.02038","element":"span"},{"text":", 2017.","element":"span"}],[{"id":"id-82","text":"[48] G. A. Pavliotis, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Stochastic processes and applications: Diffusion processes, the Fokker-Planck and Langevin equations","element":"span"},{"text":". ","element":"span"},{"text":"Springer, 2014, vol. 60.","element":"span"}],[{"id":"id-41","text":"[49] N. Bou-Rabee and M. Hairer, “Nonasymptotic mixing of the MALA algorithm,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"IMA Journal of Numerical Analysis","element":"span"},{"text":", vol. 33, no. 1, pp. 80–110, 2013.","element":"span"}],[{"text":"[50] C. Li, C. Chen, D. Carlson, and L. Carin, “Preconditioned stochastic gradient Langevin dynamics for deep neural networks,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"30th AAAI Conference on Artificial Intelligence","element":"span"},{"text":", 2016.","element":"span"}],[{"text":"[51] J. Lu, Y. Lu, and Z. Zhou, “Continuum limit and preconditioned Langevin sampling of the path integral molecular dynamics,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Journal of Computational Physics","element":"span"},{"text":", vol. 423, p. 109788, 2020.","element":"span"}],[{"text":"[52] P. Bras, “Langevin algorithms for very deep neural networks with application to image classification,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:2212.14718","element":"span"},{"text":", 2022.","element":"span"}],[{"id":"id-63","text":"[53] Y. Abbasi-Yadkori, D. Pál, and C.
Szepesvári, “Improved algorithms for linear stochastic ","element":"span"},{"text":"bandits,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Neural Information Processing Systems","element":"span"},{"text":", vol. 24, pp. 2312–2320, 2011.","element":"span"}],[{"id":"id-74","text":"[54] X. Cheng, N. S. Chatterji, Y. Abbasi-Yadkori, P. L. Bartlett, and M. I. Jordan, “Sharp convergence rates for Langevin dynamics in the nonconvex setting,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1805.01648","element":"span"},{"text":", 2018.","element":"span"}],[{"id":"id-88","text":"[55] Y.-F. Ren, “On the Burkholder–Davis–Gundy inequalities for continuous martingales,” ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Statistics & Probability Letters","element":"span"},{"text":", vol. 78, no. 17, pp. 3034–3039, 2008.","element":"span"}],[{"id":"id-89","text":"[56] L. Lovász and S. Vempala, “Logconcave functions: Geometry and efficient sampling algorithms,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings. ","element":"span"},{"text":"IEEE, 2003, pp. 640–649.","element":"span"}],[{"text":"[57] M. Ledoux, “Concentration of measure and logarithmic Sobolev inequalities,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Séminaire de probabilités XXXIII","element":"span"},{"text":". ","element":"span"},{"text":"Springer, 1999, pp. 120–216.","element":"span"}],[{"id":"id-108","text":"[58] ——, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The concentration of measure phenomenon","element":"span"},{"text":". ","element":"span"},{"text":"American Mathematical Soc., 2001, no. 89.","element":"span"}],[{"id":"id-130","text":"[59] R.
Vershynin, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"High-dimensional probability: An introduction with applications in data science","element":"span"},{"text":". Cambridge University Press, 2018, vol. 47.","element":"span"}],[{"text":"[60] J. Honorio and T. Jaakkola, “Tight bounds for the expected risk of linear classifiers and PAC-Bayes finite-sample guarantees,” in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Artificial Intelligence and Statistics","element":"span"},{"text":". ","element":"span"},{"text":"PMLR, 2014, pp. 384–392.","element":"span"}]]}],"_version":"3.3.4"},"paperNode":"$28:props:children:props:children:0:props:product"}]]