
Resolution-Invariant Error Bounds for Denoising Diffusion Operators via Function-Space Langevin Stability

Table of Contents

  1. Introduction: motivation (resolution invariance), gap between empirical DDO results and end-to-end theory, contributions and scope (strongly log-concave targets; discretize-later guarantees).
  2. Background on DDOs and Function-Space Scores: Gaussian reference measures, Cameron–Martin space, score operator as logarithmic derivative, DDO denoising objective, and preconditioned Langevin dynamics.
  3. Problem Setup and Metrics: target class ν ≪ μ0, contraction assumptions, Wasserstein-2 on Hilbert spaces, Galerkin discretizations Pn, and what “resolution-invariant bound” means formally.
  4. Exact Continuum Langevin Contraction: define the SPDE and its Markov semigroup; prove W2-contraction under strong monotonicity/Lipschitz conditions; derive mixing-time bound independent of discretization.
  5. Drift Perturbation and Learned Score Error: prove a drift-perturbation lemma bounding the W2 distance between trajectories and between invariant measures in terms of ∥F − F̂θ∥L2(ν); connect this to DDO score approximation error.
  6. Time Discretization Error: Euler–Maruyama (and optional Crank–Nicolson) weak/strong error bounds for the function-space Langevin chain; obtain an O(h^{1/2}) (or improved) contribution to the W2 error.
  7. Uniform-in-Resolution (Galerkin) Results: show constants do not depend on n for projected dynamics; prove commutation/consistency conditions for Pn, noise, and drift needed for the discretize-later theorem.
  8. Annealed Noise Scales: extend the bound to a schedule {νt} (DDO/NCSN-like); add an explicit annealing error term and guidelines for choosing ht, M, and σt.
  9. Implementation Notes & Empirical Validation Plan: how to estimate δ in practice, diagnostic plots, and experiments (Gaussian mixture GRF, Darcy inverse posterior) to test resolution-invariant scaling.
  10. Discussion: limitations (nonconvexity, small-noise singularity), relationship to other infinite-dimensional diffusion formulations, and open directions (beyond log-concave; adaptive preconditioners).

Content

1. Introduction: motivation (resolution invariance), gap between empirical DDO results and end-to-end theory, contributions and scope (strongly log-concave targets; discretize-later guarantees).

Many contemporary inverse problems, generative models for physical fields, and Bayesian formulations of PDE-constrained learning lead naturally to probability measures on infinite-dimensional state spaces. In such settings the object of interest is not a vector in ℝᵈ but a function u taking values in a separable Hilbert space H. Any implementable algorithm necessarily produces a finite-dimensional approximation u(n) ∈ Hn, obtained for instance by truncating a Fourier series or by using a grid-based discretization. A recurring practical observation is that some learned samplers—especially those based on neural operators and denoising objectives—appear to behave stably as the resolution n is refined: one trains at a certain discretization and samples at another, with only mild degradation. Our goal is to formulate and prove a precise form of this in a setting where both the analysis and the algorithm are naturally expressed at the function-space level.

The central difficulty is that the usual convergence theory for Markov chain Monte Carlo (MCMC) algorithms is typically dimension dependent. Even when one proves geometric ergodicity in finite dimensions, the constants often deteriorate with d, and naive limits as d → ∞ can be misleading. In the function-space context one must distinguish two sources of instability. First, the measure may become increasingly ill-conditioned under discretization (e.g., if the potential introduces high-frequency stiffness). Second, the algorithm may inject discretization artifacts (e.g., by using noise models that are not H-valued, or by training objectives whose risk necessarily grows with resolution). If either phenomenon occurs, uniform performance across n is impossible, and any apparent empirical stability must be explained by additional structure.

The present work is motivated by denoising diffusion operators (DDOs) and related score-based constructions in which a learned operator approximates a score or drift acting on functions. Empirically, such models are often trained from samples corrupted by Gaussian fields and then used to drive Langevin-type dynamics. While one can justify individual components of this pipeline (e.g., consistency of denoising estimators under suitable conditions), a complete statement of the form
sampling error ≤ mixing error + learning error + time-discretization error,
with constants independent of spatial resolution, has been missing. This gap is not merely technical: without an explicit function-space statement, one cannot disentangle whether a method is robust because it genuinely approximates a well-posed infinite-dimensional algorithm, or because discretization-dependent effects happen to cancel at the tested resolutions.

We therefore adopt a continuum-first viewpoint. We formulate the target distribution ν on H through a Gaussian reference measure μ0 = 𝒩(0, C) and a potential Φ, and we study the preconditioned Langevin dynamics whose invariant measure is ν. The algorithm we ultimately implement is Euler–Maruyama applied to a learned drift F̂θ, with additive trace-class Gaussian noise. The analysis proceeds in three steps: (i) contraction and mixing of the exact continuum Langevin semigroup in W2; (ii) stability of the law with respect to drift perturbations, quantified by an L2(ν; H) error level δ; and (iii) a W2 control of the Euler–Maruyama discretization error, scaling as h^{1/2} under minimal assumptions. The key point is that each step admits bounds whose constants are determined by function-space quantities (monotonicity and Lipschitz constants in the Cameron–Martin geometry, and Tr(C)), and hence remain uniform over Galerkin projections that commute with the covariance.

Our contributions are the following. First, we prove a W2-contraction and mixing-time estimate for the exact continuum preconditioned Langevin semigroup (Section 4). Second, we establish a drift-perturbation lemma converting the L2(ν; H) drift error δ into an additive W2 bias, and we connect δ to DDO score approximation error (Section 5). Third, we bound the Euler–Maruyama time-discretization error by O(h^{1/2}) with constants controlled by function-space quantities (Section 6). Fourth, we show that all constants can be chosen uniformly over Galerkin projections commuting with C, yielding the discretize-later guarantee, and we extend the resulting end-to-end bound to annealed noise schedules (Sections 7–8).

The scope of the theory is deliberately focused. We work in a strongly log-concave regime, where the relevant monotonicity and Lipschitz constants are uniform and where trace-class noise ensures that the driving Gaussian perturbations are H-valued. These hypotheses are restrictive relative to the most challenging diffusion-model applications, but they delineate a regime in which resolution-invariant guarantees are mathematically natural and provably attainable. They also clarify failure modes: if the corruption or driving noise is not trace class (e.g., effectively white in the discretization limit), then the learned error δ cannot, in general, remain bounded as n → ∞, and dimension-free sampling bounds are not available without additional regularity.

The remainder of the paper develops the necessary background and then formalizes the three-step argument above. In the next section we recall Gaussian reference measures and the Cameron–Martin structure, describe how scores appear as logarithmic derivatives, and explain how DDO denoising objectives lead to operator-valued score approximations compatible with the function-space Langevin dynamics.


2. Background on DDOs and Function-Space Scores: Gaussian reference measures, Cameron–Martin space, score operator as logarithmic derivative, DDO denoising objective, and preconditioned Langevin dynamics.

We recall the objects that allow one to speak meaningfully about “scores” and Langevin drifts on an infinite-dimensional Hilbert space. Throughout, μ0 = 𝒩(0, C) is a centered Gaussian measure on H with C self-adjoint, strictly positive, and trace-class. The trace-class assumption implies that μ0(H) = 1 (in particular, a μ0-distributed random element is H-valued), and it ensures that the Cameron–Martin space
$$ H_{\mu_0} := \mathrm{Im}(C^{1/2}), \qquad \|h\|_{H_{\mu_0}} := \|C^{-1/2}h\|_H, $$
is continuously embedded in H. The Cameron–Martin theorem characterizes the quasi-invariance of μ0 under translations by h ∈ Hμ0, and hence provides the correct geometric notion of differentiability for densities built from μ0.

A basic calculus tool is the Gaussian integration-by-parts formula along Cameron–Martin directions. If f is a sufficiently smooth cylindrical function on H and h ∈ Hμ0, then
$$ \int_H \partial_h f(u)\,\mu_0(du) \;=\; \int_H f(u)\,\langle C^{-1}u,\,h\rangle_H\,\mu_0(du), $$
where ∂hf(u) denotes the directional derivative along h and ⟨C−1u, h⟩H is understood as the Paley–Wiener pairing. In particular, this identity identifies the logarithmic derivative of μ0 along Hμ0 as the map u ↦ −C−1u, understood in the weak sense encoded by the formula above. This provides the infinite-dimensional analogue of the finite-dimensional identity ∇log 𝒩(0, C)(u) = −C−1u.

Now let ν ≪ μ0 be given by
$$ \nu(du) = Z^{-1}\exp(-\Phi(u))\,\mu_0(du), \qquad Z := \int_H e^{-\Phi}\,d\mu_0, $$
where Φ is differentiable along Hμ0. Differentiating the density along Cameron–Martin directions and combining with the Gaussian integration-by-parts formula yields the integration-by-parts formula under ν:
$$ \int_H \partial_h f(u)\,\nu(du) \;=\; \int_H f(u)\,\bigl(\langle C^{-1}u,\,h\rangle_H + \partial_h\Phi(u)\bigr)\,\nu(du). $$
Thus the logarithmic derivative of ν relative to the Gaussian reference μ0 is the Hμ0-valued map
$$ u \longmapsto -\nabla_{H_{\mu_0}}\Phi(u) = \nabla_{H_{\mu_0}}\log\frac{d\nu}{d\mu_0}(u), $$
whereas the full logarithmic derivative of ν in the sense of the identity above contains the additional term −C−1u inherited from μ0. This distinction is convenient: in function space, there is no Lebesgue reference, and the object that is stable under discretization is typically the μ0-relative score ∇Hμ0 log(dν/dμ0).

The preconditioned Langevin SPDE is the diffusion on H whose generator encodes these logarithmic derivatives and for which ν is invariant. Writing
$$ F(u) := -\bigl(u + C\,\nabla_{H_{\mu_0}}\Phi(u)\bigr), $$
the dynamics read
$$ du_t = F(u_t)\,dt + \sqrt{2}\,dW_t, $$
where Wt is a C-Wiener process. Heuristically, F consists of the Ornstein–Uhlenbeck drift −u associated with μ0 and the additional term −C∇Hμ0Φ(u) that accounts for the potential. Equivalently,
$$ F(u) = -u + C\,\nabla_{H_{\mu_0}}\log\frac{d\nu}{d\mu_0}(u), $$
so the drift is obtained by applying the preconditioner C to the μ0-relative score and then adding the stabilizing linear term −u. Under standard dissipativity and regularity conditions, one verifies invariance of ν by checking that the corresponding Kolmogorov operator is symmetric (or at least has ν as a stationary measure) on cylindrical test functions; the integration-by-parts formula under ν is precisely the identity needed for this computation.

We next summarize how denoising diffusion operators (DDOs) produce approximations of these score objects in a manner compatible with the function-space setting. Fix a noise level τ > 0, and consider the corruption model
$$ y = u + \sqrt{\tau}\,\xi, \qquad u\sim \nu,\ \ \xi\sim \mathcal{N}(0,C)\ \text{independent}. $$
Since C is trace-class, ξ ∈ H almost surely and hence y ∈ H almost surely. A DDO is an operator-valued estimator Gθ( ⋅ , τ) : H → H trained by the denoising objective
$$ \min_\theta\; \mathbb{E}\bigl[\|G_\theta(y,\tau) - u\|_H^2\bigr], $$
or a variant in which the loss is measured in a Cameron–Martin norm. The minimizer of this objective (over all measurable maps) is the conditional mean G*(y, τ) = 𝔼[u ∣ y]. Under suitable regularity, this conditional mean determines the score of the corrupted law via an infinite-dimensional Tweedie-type identity: the difference G*(y, τ) − y is a smoothed version of a gradient of the log-density of the corrupted measure, and applying C−1 transfers this difference into the Cameron–Martin space. Concretely, one may define a score surrogate by
$$ \widehat{s}_\theta(y,\tau) := \tau^{-1}\,C^{-1}\bigl(G_\theta(y,\tau) - y\bigr) \in H_{\mu_0}, $$
which, when Gθ ≈ G*, approximates an appropriate logarithmic derivative of the corrupted law along Hμ0. In the small-noise limit (or in a multi-noise training scheme), such estimators are used to approximate ∇Hμ0 log(dν/dμ0) = −∇Hμ0Φ, and hence to construct a drift approximation of the form
$$ \widehat{F}_\theta(u) = -u + C\,\widehat{s}_\theta(u), $$
which matches the structure of the continuum drift F. The subsequent analysis treats F̂θ abstractly as a measurable approximation of F, with its quality summarized by an L2(ν; H) error level; this is the point of contact between statistical training guarantees for DDOs and stability properties of Langevin sampling in function space.
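To fix ideas, the following sketch instantiates the score surrogate ŝθ and the induced drift F̂θ in Karhunen–Loève coordinates, where C = diag(λi). It is illustrative only: G_theta is a hypothetical placeholder denoiser (it happens to be the exact conditional mean for the pure-Gaussian toy case u ∼ μ0), not a trained neural operator.

```python
# Minimal sketch in assumed KL coordinates, C = diag(lam); G_theta is a placeholder.
import numpy as np

n = 64
lam = 1.0 / np.arange(1, n + 1) ** 2          # eigenvalues of C; trace-class since the sum is finite

def G_theta(y, tau):
    # Hypothetical denoiser. For the toy case u ~ N(0, C), y = u + sqrt(tau)*xi,
    # the exact conditional mean is E[u | y] = y / (1 + tau).
    return y / (1.0 + tau)

def s_hat(y, tau):
    # Score surrogate s_theta(y, tau) = tau^{-1} C^{-1} (G_theta(y, tau) - y), mode-wise.
    return (G_theta(y, tau) - y) / (tau * lam)

def F_hat(u, tau):
    # Drift F_theta(u) = -u + C s_theta(u, tau); multiplying by lam applies C.
    return -u + lam * s_hat(u, tau)
```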


3. Problem Setup and Metrics: target class ν ≪ μ0, contraction assumptions, Wasserstein-2 on Hilbert spaces, Galerkin discretizations Pn, and what “resolution-invariant bound” means formally.

We study sampling from a target probability measure ν on H given in Radon–Nikodym form with respect to the Gaussian reference μ0 = 𝒩(0, C),
$$ \nu(du) = Z^{-1}\exp(-\Phi(u))\,\mu_0(du), \qquad Z := \int_H e^{-\Phi}\,d\mu_0, $$
where Φ : H → ℝ is differentiable along the Cameron–Martin space Hμ0 = Im(C1/2). The function-space geometry is governed by Hμ0 and by the preconditioner C; in particular, the drift associated with the preconditioned Langevin dynamics is
$$ F(u) = -\bigl(u + C\,\nabla_{H_{\mu_0}}\Phi(u)\bigr). $$
Our target class is the strongly log-concave regime in the Gaussian base geometry: we assume that ∇Hμ0Φ is globally L-Lipschitz and m-strongly monotone in Hμ0, i.e.,
$$ \|\nabla_{H_{\mu_0}}\Phi(u) - \nabla_{H_{\mu_0}}\Phi(v)\|_{H_{\mu_0}} \le L\,\|u - v\|_{H_{\mu_0}}, $$
and
$$ \langle \nabla_{H_{\mu_0}}\Phi(u) - \nabla_{H_{\mu_0}}\Phi(v),\, u - v\rangle_{H_{\mu_0}} \ge m\,\|u - v\|_{H_{\mu_0}}^2, \qquad u, v \in H_{\mu_0}. $$
These conditions are the function-space analogue of uniform convexity and smoothness; they imply well-posedness of the continuum Langevin SPDE and yield quantitative contraction of its Markov semigroup in transport metrics. We emphasize that the constants m, L are assumed to be intrinsic to the model and, in particular, do not depend on any spatial discretization.

Our sampling algorithm will not typically access F exactly. Instead, we are given a measurable approximation F̂θ : H → H (e.g. induced by a learned score), whose quality is summarized by an L2(ν; H) error level
$$ \delta^2 := \mathbb{E}_{u\sim\nu}\bigl\|\widehat{F}_\theta(u) - F(u)\bigr\|_H^2. $$
We will treat δ as an exogenous parameter that captures learning and model-mismatch errors; the analysis below propagates δ through the sampling dynamics in a stable way.

The notion of “closeness to ν” we use is the 2-Wasserstein distance on H. For Borel probability measures ρ, ρ′ ∈ 𝒫2(H) with finite second moments, we define
$$ W_2(\rho,\rho')^2 := \inf_{\pi\in\Pi(\rho,\rho')} \int_{H\times H} \|x-y\|_H^2\,\pi(dx,dy), $$
where Π(ρ, ρ′) denotes the set of couplings of ρ and ρ′. Since H is separable, the infimum is attained, and W2 metrizes weak convergence plus convergence of second moments on 𝒫2(H). The choice of W2 is dictated by two features that we will exploit repeatedly: first, it is compatible with synchronous couplings of Langevin dynamics; second, it behaves well under Lipschitz maps. In particular, if T : H → H is 1-Lipschitz, then
$$ W_2\bigl(T_{\#}\rho,\,T_{\#}\rho'\bigr) \le W_2(\rho,\rho'), $$
a fact we will use with T = Pn a projection.

We implement the sampler by Euler–Maruyama time stepping. Given h > 0 and T = Nh, we consider the update
$$ u_{k+1} = u_k + h\,\widehat{F}_\theta(u_k) + \sqrt{2h}\,\xi_k, \qquad \xi_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,C), $$
with initialization u0 ∈ H (possibly random). The trace-class assumption on C ensures that ξk ∈ H almost surely, so the update defines an H-valued Markov chain under mild growth conditions on F̂θ (in our setting we will assume global Lipschitzness/dissipativity as needed for the discretization estimates). Our primary output is the law ℒ(uN), and the goal is to bound W2(ℒ(uN), ν) in terms of the runtime T, the step size h, and the drift error δ.
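In KL coordinates the chain is a few lines; a minimal sketch, where drift_hat is a placeholder for F̂θ (here the exact drift of the toy target ν = μ0):

```python
# Euler-Maruyama chain with trace-class noise, in assumed KL coordinates (C = diag(lam)).
import numpy as np

rng = np.random.default_rng(0)
n, h, N = 64, 1e-2, 2000
lam = 1.0 / np.arange(1, n + 1) ** 2

def drift_hat(u):
    return -u                                   # placeholder: exact drift when nu = mu_0

u = np.zeros(n)
for _ in range(N):
    xi = np.sqrt(lam) * rng.standard_normal(n)  # xi ~ N(0, C); H-valued since Tr(C) < inf
    u = u + h * drift_hat(u) + np.sqrt(2.0 * h) * xi
```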

To connect the continuum-first analysis to practical discretizations, we introduce a family of finite-dimensional subspaces {Hn}n ≥ 1 with orthogonal projections Pn : H → Hn. We assume that Pn is nonexpansive, ∥Pnx∥H ≤ ∥x∥H, and that the discretization is consistent with the Gaussian structure in the sense that
$$ P_n C = C P_n. $$
The commutation is natural when Pn truncates in an eigenbasis of C (e.g. Fourier/Karhunen–Loève truncations), and it ensures that Pnξ ∼ 𝒩(0, PnCPn) and that projected C-Wiener processes are PnCPn-Wiener processes on Hn.

The corresponding projected Euler–Maruyama chain is
$$ u^{(n)}_{k+1} = u^{(n)}_k + h\,P_n\widehat{F}_\theta(u^{(n)}_k) + \sqrt{2h}\,P_n\xi_k. $$
By construction, the chain evolves in Hn. The natural finite-dimensional target is the pushforward Pnν, i.e. the law of Pnu when u ∼ ν. The stability of the learning error under projection follows immediately from nonexpansiveness:
$$ \mathbb{E}_{u\sim\nu}\bigl\|P_n\widehat{F}_\theta(u) - P_nF(u)\bigr\|_H^2 \;\le\; \mathbb{E}_{u\sim\nu}\bigl\|\widehat{F}_\theta(u) - F(u)\bigr\|_H^2 \;\le\; \delta^2. $$

We now formalize what we mean by a resolution-invariant sampling bound. For each n, the algorithm induces a law ℒ(uN(n)) ∈ 𝒫2(Hn), which we view as a measure on H supported on Hn. A bound is resolution-invariant if it takes the form
$$ W_2\bigl(\mathcal{L}(u^{(n)}_N),\,P_n\nu\bigr) \;\le\; \mathrm{Mix}(T) + \mathrm{Learn}(\delta) + \mathrm{Disc}(h), $$
where the functions Mix, Learn, Disc and all constants implicit in them depend only on intrinsic model parameters (e.g. m, L, Tr(C) and regularity of F̂θ) and not on n. In particular, we seek bounds of the type
$$ \mathrm{Mix}(T) = e^{-\kappa T}\,W_2(\mathcal{L}(u_0),\nu), \qquad \mathrm{Learn}(\delta) = K\,\delta, \qquad \mathrm{Disc}(h) = K\,h^{1/2}, $$
with κ, K independent of resolution. The next section establishes the key ingredient behind Mix(T): a W2-contraction estimate for the exact continuum Langevin dynamics, from which mixing-time bounds follow without reference to any discretization.


4. Exact Continuum Langevin Contraction: define the SPDE and its Markov semigroup; prove W2-contraction under strong monotonicity/Lipschitz conditions; derive mixing-time bound independent of discretization.

We consider the continuum preconditioned Langevin dynamics on H,
$$ du_t = F(u_t)\,dt + \sqrt{2}\,dW_t, $$
driven by a C-Wiener process Wt. Under the standing Lipschitz and monotonicity assumptions on ∇Hμ0Φ, the SPDE is well posed and defines a time-homogeneous Markov process {ut}t≥0 on H. We denote by $u_t^u$ the solution started at $u_0^u = u \in H$. The associated Markov semigroup {Pt}t≥0 acts on bounded measurable test functions φ : H → ℝ by
$$ (P_t\varphi)(u) := \mathbb{E}\bigl[\varphi(u_t^u)\bigr], $$
and, by duality, on probability measures ρ ∈ 𝒫2(H) via $\rho P_t := \mathcal{L}(u_t^{U_0})$ when U0 ∼ ρ. The measure ν is invariant for this dynamics (indeed, the SPDE is the standard μ0-preconditioned Langevin diffusion targeting ν); we will use invariance only as an identity νPt = ν and not rely on any explicit formula for the transition kernels.

The key quantitative property we need is a contraction estimate for the semigroup in W2. This follows from a synchronous coupling argument. Fix u, v ∈ H, and let $u_t := u_t^u$ and $v_t := u_t^v$ solve the SPDE driven by the same Wiener path Wt. Defining the difference et := ut − vt, we observe that the noise cancels and hence
$$ \frac{d}{dt}e_t = F(u_t) - F(v_t). $$
We estimate the time derivative of ∥et∥H2. Using this identity and the chain rule,
$$ \frac{d}{dt}\|e_t\|_H^2 = 2\,\langle e_t,\,F(u_t) - F(v_t)\rangle_H. $$
The drift F is one-sided dissipative in the H-inner product. Indeed, for any x, y ∈ H such that ∇Hμ0Φ(x), ∇Hμ0Φ(y) ∈ Hμ0, we compute
$$ \langle x - y,\,F(x) - F(y)\rangle_H = -\|x-y\|_H^2 - \bigl\langle x - y,\,C\bigl(\nabla_{H_{\mu_0}}\Phi(x) - \nabla_{H_{\mu_0}}\Phi(y)\bigr)\bigr\rangle_H \le -\|x-y\|_H^2, $$
where the cross term is nonnegative by the m-strong monotonicity of ∇Hμ0Φ, read in the Cameron–Martin pairing. Inserting this into the derivative identity yields
$$ \frac{d}{dt}\|e_t\|_H^2 \le -2\|e_t\|_H^2, $$
and therefore, by Grönwall,
$$ \|e_t\|_H^2 \le e^{-2t}\,\|e_0\|_H^2. $$
In particular, taking expectations (trivial here since the bound is pathwise) shows that the synchronous coupling contracts second moments at rate e−2t.

We translate this pathwise bound into a W2-contraction bound for the semigroup. Let ρ, ρ′ ∈ 𝒫2(H) and let (U0, V0) be any coupling with ℒ(U0) = ρ and ℒ(V0) = ρ′. Drive the coupled dynamics from (U0, V0) using the same Wiener process, producing (Ut, Vt). Then ℒ(Ut) = ρPt and ℒ(Vt) = ρ′Pt, and (Ut, Vt) is a coupling of these laws. Hence,
$$ W_2(\rho P_t, \rho' P_t)^2 \;\le\; \mathbb{E}\bigl[\|U_t - V_t\|_H^2\bigr] \;\le\; e^{-2t}\,\mathbb{E}\bigl[\|U_0 - V_0\|_H^2\bigr]. $$
Taking the infimum over all initial couplings (U0, V0) gives the contraction estimate
$$ W_2(\rho P_t, \rho' P_t) \;\le\; e^{-t}\,W_2(\rho, \rho'). $$
We emphasize that the rate in this estimate is intrinsic to the continuum drift structure and does not depend on any spatial discretization parameter. Moreover, the contraction implies uniqueness of an invariant measure in 𝒫2(H) whenever such a measure exists; in our setting ν is invariant, so it is the unique invariant measure with finite second moment.

Finally, choosing ρ′ = ν and using νPt = ν yields the mixing estimate
$$ W_2(\mathcal{L}(u_t), \nu) \;\le\; e^{-t}\,W_2(\mathcal{L}(u_0), \nu). $$
Equivalently, for any ε > 0, it suffices to take t ≥ log (W2(ℒ(u0), ν)/ε) to ensure W2(ℒ(ut), ν) ≤ ε. This contraction/mixing estimate provides the Mix(T) term in the resolution-invariant bound. In the next section we quantify how this picture degrades when the drift F is replaced by a learned approximation F̂θ, leading to an additive error contribution controlled by ∥F̂θ − F∥L2(ν; H).
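The contraction is easy to check numerically in the Ornstein–Uhlenbeck special case Φ = 0 (so F(u) = −u), where the synchronous coupling is fully explicit; the sketch below, with illustrative constants, verifies the e−2t decay of the squared distance:

```python
# Synchronous-coupling contraction check for the OU special case F(u) = -u.
import numpy as np

rng = np.random.default_rng(1)
n, h, N = 64, 1e-3, 3000
lam = 1.0 / np.arange(1, n + 1) ** 2
u = rng.standard_normal(n); v = -u.copy()        # two different initializations
d2 = []
for _ in range(N):
    xi = np.sqrt(lam) * rng.standard_normal(n)   # SAME noise for both chains
    u = u + h * (-u) + np.sqrt(2 * h) * xi
    v = v + h * (-v) + np.sqrt(2 * h) * xi
    d2.append(np.sum((u - v) ** 2))
rate = np.polyfit(h * np.arange(1, N + 1), np.log(d2), 1)[0]
print(f"empirical decay rate of ||e_t||^2: {rate:.2f} (theory: -2)")
```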


5. Drift Perturbation and Learned Score Error: prove a drift-perturbation lemma bounding the W2 distance between trajectories and between invariant measures in terms of ∥F − F̂θ∥L2(ν); connect this to DDO score approximation error.

We next quantify the effect of replacing the true drift F in the Langevin SPDE by a learned approximation F̂θ. At the continuum level we consider the perturbed dynamics
$$ d\widehat{u}_t = \widehat{F}_\theta(\widehat{u}_t)\,dt + \sqrt{2}\,dW_t, $$
driven by the C-Wiener process Wt. We assume throughout that F̂θ : H → H is globally Lipschitz (so the perturbed SPDE is well posed) and that its pointwise discrepancy from F is controlled in the target measure,
$$ \mathbb{E}_{u\sim\nu}\bigl\|\widehat{F}_\theta(u) - F(u)\bigr\|_H^2 \le \delta^2. $$
The goal is an additive W2-error term proportional to δ, uniform in time after mixing.

The basic estimate is obtained by synchronous coupling and an energy inequality for the error process. Let (ut)t≥0 solve the exact dynamics and (ût)t≥0 solve the perturbed dynamics, started from a coupling (u0, û0) and driven by the same Wt. Define et := ût − ut. Then the noise cancels and
$$ \frac{d}{dt}e_t = \widehat{F}_\theta(\widehat{u}_t) - F(u_t) = \bigl(F(\widehat{u}_t) - F(u_t)\bigr) + \bigl(\widehat{F}_\theta(\widehat{u}_t) - F(\widehat{u}_t)\bigr). $$
Using the chain rule and the one-sided dissipativity of F,
$$ \frac{d}{dt}\|e_t\|_H^2 \;\le\; -2\|e_t\|_H^2 + 2\bigl\langle e_t,\,\widehat{F}_\theta(\widehat{u}_t) - F(\widehat{u}_t)\bigr\rangle_H \;\le\; -\|e_t\|_H^2 + \bigl\|\widehat{F}_\theta(\widehat{u}_t) - F(\widehat{u}_t)\bigr\|_H^2, $$
where the last step is Young’s inequality. Grönwall yields, for all t ≥ 0,
$$ \mathbb{E}\|e_t\|_H^2 \;\le\; e^{-t}\,\mathbb{E}\|e_0\|_H^2 + \int_0^t e^{-(t-s)}\,\mathbb{E}\bigl\|\widehat{F}_\theta(\widehat{u}_s) - F(\widehat{u}_s)\bigr\|_H^2\,ds. $$
To turn this into a closed W2-bound, we relate the integrand to δ. In the strongly log-concave regime, ν has finite second moments with constants depending only on (m, L, Tr(C)), and by the mixing estimate the law of ûs remains close to ν once s is moderately large. Under the additional mild regularity that the discrepancy map G := F̂θ − F is globally Lipschitz, one can bound
$$ \mathbb{E}\bigl\|G(\widehat{u}_s)\bigr\|_H^2 \;\le\; 2\,\delta^2 + K_0\,W_2\bigl(\mathcal{L}(\widehat{u}_s),\,\nu\bigr)^2, $$
where K0 depends on the Lipschitz constant of G and on ν-moment bounds, hence ultimately on (m, L, Tr(C)) and the (dimension-free) Lipschitz controls imposed on F̂θ. Combining these estimates with the definition of W2 via couplings, we obtain a stability estimate of the form
$$ W_2\bigl(\mathcal{L}(\widehat{u}_t),\,\nu\bigr) \;\le\; e^{-\kappa t}\,W_2\bigl(\mathcal{L}(\widehat{u}_0),\,\nu\bigr) + K_1\,\delta, $$
with constants κ > 0 and K1 < ∞ independent of any spatial discretization. In particular, if ν̂θ ∈ 𝒫2(H) denotes an invariant measure (when it exists) for the perturbed dynamics, then setting û0 ∼ ν̂θ, using invariance, and letting t → ∞ gives
$$ W_2\bigl(\widehat{\nu}_\theta,\,\nu\bigr) \;\le\; K_1\,\delta. $$
Thus, the learned drift incurs an irreducible bias controlled by the L2(ν; H) drift error.

Finally, we connect δ to the score approximation error in denoising diffusion operators. In the μ0-preconditioned Langevin setting the drift is F(u) = −(u + C∇Hμ0Φ(u)). If a learned score model provides sθ(u) ≈ ∇Hμ0Φ(u) and we set
$$ \widehat{F}_\theta(u) := -\bigl(u + C\,s_\theta(u)\bigr), $$
then G(u) = F̂θ(u) − F(u) = −C(sθ(u) − ∇Hμ0Φ(u)) and hence
$$ \delta \;\le\; \|C\|_{\mathcal{L}(H)}\,\bigl\|s_\theta - \nabla_{H_{\mu_0}}\Phi\bigr\|_{L^2(\nu;\,H)}. $$
Moreover, using the continuous embedding ∥h∥H ≤ ∥C1/2∥ℒ(H)∥h∥Hμ0 for h ∈ Hμ0, we also have
$$ \delta \;\le\; \|C\|_{\mathcal{L}(H)}\,\|C^{1/2}\|_{\mathcal{L}(H)}\,\bigl\|s_\theta - \nabla_{H_{\mu_0}}\Phi\bigr\|_{L^2(\nu;\,H_{\mu_0})}. $$
Consequently, any DDO training guarantee yielding an L2(ν) score error in either H or Hμ0 translates directly into the perturbation term K1δ in the stability estimate, and therefore into the overall end-to-end W2 bound once discretization error is added in the next section.


6. Time Discretization Error: Euler–Maruyama (and optional Crank–Nicolson) weak/strong error bounds for the function-space Langevin chain; obtain an O(h^{1/2}) (or improved) contribution to the W2 error.

We now control the bias introduced by replacing the continuous-time perturbed dynamics by its Euler–Maruyama discretization. Throughout this section we fix a globally Lipschitz drift field F̂θ : H → H with Lipschitz constant Lip(F̂θ), and we assume a one-sided dissipativity condition ensuring uniform moment bounds for both the SPDE and the chain: there exist α > 0 and b ≥ 0 such that
$$ \langle u,\,\widehat{F}_\theta(u)\rangle_H \;\le\; -\alpha\,\|u\|_H^2 + b, \qquad u \in H. $$
(Under the strongly log-concave hypotheses on the target drift F, and for drift approximations F̂θ that preserve the same coercive structure up to a controlled perturbation, this condition is natural; we only use it here to keep constants from growing in time.)

Let ût solve the perturbed SPDE with initial condition û0 ∈ L2(Ω; H). Fix h ∈ (0, h0], tk := kh, T = Nh. The Euler–Maruyama chain is
$$ u_{k+1} = u_k + h\,\widehat{F}_\theta(u_k) + \sqrt{2}\,\bigl(W_{t_{k+1}} - W_{t_k}\bigr), $$
which is equivalent to the earlier update with noise $\sqrt{2h}\,\xi_k$ and ξk ∼ 𝒩(0, C). We couple {uk}k≥0 and {ût}t≥0 by taking the same C-Wiener process Wt in the SPDE and in the chain, and by setting u0 = û0. This coupling directly yields a strong error estimate, and hence a W2-bound via the standard inequality W2(ℒ(X), ℒ(Y))2 ≤ 𝔼∥X − Y∥H2.

To compare ûtk and uk, we write the exact increment of the SPDE on [tk, tk+1]:
$$ \widehat{u}_{t_{k+1}} =\widehat{u}_{t_k} + \int_{t_k}^{t_{k+1}}\widehat{F}_\theta(\widehat{u}_s)\,ds + \sqrt{2}\,(W_{t_{k+1}}-W_{t_k}). $$
Subtracting this from the chain update gives the recursion for the error ek := uk − ûtk:
$$ e_{k+1} = e_k + h\bigl(\widehat{F}_\theta(u_k) - \widehat{F}_\theta(\widehat{u}_{t_k})\bigr) + r_k, \qquad r_k := \int_{t_k}^{t_{k+1}} \bigl(\widehat{F}_\theta(\widehat{u}_{t_k}) - \widehat{F}_\theta(\widehat{u}_s)\bigr)\,ds. $$
The remainder term rk is controlled by Lipschitz continuity:
$$ \mathbb{E}\|r_k\|_H^2 \;\le\; h\,\mathrm{Lip}(\widehat{F}_\theta)^2 \int_{t_k}^{t_{k+1}} \mathbb{E}\bigl\|\widehat{u}_s - \widehat{u}_{t_k}\bigr\|_H^2\,ds. $$
We next bound 𝔼∥ûs − ûtk∥H2 for s ∈ [tk, tk+1]. Writing
$$ \widehat{u}_s-\widehat{u}_{t_k} =\int_{t_k}^{s}\widehat{F}_\theta(\widehat{u}_\tau)\,d\tau + \sqrt{2}\,(W_s-W_{t_k}), $$
the Itô isometry and (a + b)2 ≤ 2a2 + 2b2 give
$$ \mathbb{E}\bigl\|\widehat{u}_s - \widehat{u}_{t_k}\bigr\|_H^2 \;\le\; 2\,(s - t_k)\int_{t_k}^{s} \mathbb{E}\bigl\|\widehat{F}_\theta(\widehat{u}_\tau)\bigr\|_H^2\,d\tau \;+\; 4\,(s - t_k)\,\mathrm{Tr}(C). $$
Since Wt is C-Wiener, 𝔼∥Ws − Wtk∥H2 = (s − tk) Tr(C). Moreover, under the dissipativity condition and global Lipschitzness, standard Lyapunov estimates yield a uniform-in-time bound supt∈[0,T] 𝔼∥ût∥H2 ≤ MT and thus supt∈[0,T] 𝔼∥F̂θ(ût)∥H2 ≤ M̃T, with M̃T depending on Lip(F̂θ), (α, b), Tr(C), and 𝔼∥û0∥H2, but not on any spatial resolution. Inserting these bounds above shows 𝔼∥ûs − ûtk∥H2 ≤ c1(s − tk) for s ∈ [tk, tk+1], where c1 depends on (M̃T, Tr(C)). Combining with the remainder bound yields
$$ \mathbb{E}\|r_k\|_H^2 \;\le\; \tfrac{1}{2}\,c_1\,\mathrm{Lip}(\widehat{F}_\theta)^2\,h^3. $$
We return to the error recursion. Squaring and taking expectations, using (a + b + c)2 ≤ (1 + η)a2 + (1 + η)b2 + (1 + 1/η)c2 and Lipschitzness of F̂θ, we obtain for h ≤ h0 small enough (depending on Lip(F̂θ)) a discrete Grönwall inequality of the form
$$ \mathbb{E}\|e_{k+1}\|_H^2 \;\le\; (1 - \gamma h)\,\mathbb{E}\|e_k\|_H^2 + c_2\,h^2, $$
with γ > 0 depending on the dissipativity condition (or, more generally, on a one-sided Lipschitz constant for F̂θ). Iterating and inserting the remainder bound yields 𝔼∥eN∥H2 ≤ KTh for a constant KT < ∞ depending on Lip(F̂θ), (α, b), Tr(C), and T. Consequently,
$$ W_2\bigl(\mathcal{L}(u_N),\,\mathcal{L}(\widehat{u}_T)\bigr) \;\le\; K_2\,h^{1/2}, $$
for a constant K2 with the same parameter dependence.

The h^{1/2} rate in this bound is the natural strong order of Euler–Maruyama for additive-noise stochastic evolution equations under global Lipschitz conditions. Under additional smoothness (e.g. F̂θ ∈ C1 with Lipschitz derivative and suitable trace/Hilbert–Schmidt compatibility conditions ensuring higher-order Itô–Taylor remainders are controlled in H), one can improve the strong rate and obtain W2(ℒ(uN), ℒ(ûT)) ≤ K̃2 h. We do not pursue such refinements, since the end-to-end sampling bound only requires a dimension-free control that vanishes as h → 0.

Finally, we note that an implicit or Crank–Nicolson-type variant may be preferable numerically when F̂θ contains a stiff linear component. For instance, splitting F̂θ(u) = −u + Ĝθ(u), one may treat u implicitly and Ĝθ explicitly:
$$ u_{k+1} = \frac{1}{1+h}u_k + \frac{h}{1+h}\widehat{G}_\theta(u_k) + \frac{\sqrt{2}}{1+h}(W_{t_{k+1}}-W_{t_k}), $$
which preserves contractivity properties uniformly in h and yields the same O(h^{1/2}) strong rate under analogous assumptions. The subsequent analysis is unchanged: we always reduce W2-control to a synchronous strong error bound and track constants through dissipativity and the trace-class noise structure.
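A minimal sketch of this split-step update in KL coordinates; G_hat stands for Ĝθ = F̂θ + id and is a placeholder (identically zero for the toy target μ0):

```python
# Semi-implicit (split-step) update: the stiff linear part -u is implicit,
# the learned remainder G_theta explicit. Assumed KL coordinates, C = diag(lam).
import numpy as np

rng = np.random.default_rng(2)
n, h = 64, 1e-2
lam = 1.0 / np.arange(1, n + 1) ** 2

def G_hat(u):
    return np.zeros_like(u)                          # placeholder for hat{G}_theta

def semi_implicit_step(u):
    dW = np.sqrt(h * lam) * rng.standard_normal(n)   # W_{t+h} - W_t ~ N(0, h C)
    return (u + h * G_hat(u) + np.sqrt(2.0) * dW) / (1.0 + h)
```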


7. Uniform-in-Resolution (Galerkin) Results: show constants do not depend on n for projected dynamics; prove commutation/consistency conditions for Pn, noise, and drift needed for the discretize-later theorem.

We now formalize the “discretize-later” principle for our bounds: when the algorithm is implemented in a finite-dimensional subspace Hn ⊂ H via an orthogonal projection Pn : H → Hn, the constants appearing in the W2-error estimates can be chosen independently of n. The key point is that, under natural consistency assumptions on (Pn, C) and the implementation of the drift oracle, the projected dynamics inherit the same Lipschitz/dissipativity structure and the same trace-class noise control, with no dimension-dependent degradation.

We assume that Pn is the orthogonal projection onto Hn, hence nonexpansive:
$$ \|P_n x\|_H \le \|x\|_H, \qquad x \in H. $$
We also assume the commutation condition (basis alignment):
$$ P_n C = C P_n. $$
This is automatic when Pn is defined by truncation in an eigenbasis of C, and it is precisely the condition that allows noise sampling and preconditioning to be consistent across resolutions.

Define the truncated covariance Cn := PnCPn as an operator on Hn. By the commutation condition we have Cn = CPn = PnC and therefore
$$ \mathrm{Tr}(C_n) = \mathrm{Tr}(P_n C P_n) \le \mathrm{Tr}(C), $$
so any estimate in which the noise enters only through Tr(C) remains valid uniformly in n.

Let Wt be a C-Wiener process on H and set Wt(n) := PnWt. For any x, y ∈ Hn,
$$ \mathbb{E}\bigl[\langle W_t^{(n)}, x\rangle_H\,\langle W_t^{(n)}, y\rangle_H\bigr] = \mathbb{E}\bigl[\langle W_t, x\rangle_H\,\langle W_t, y\rangle_H\bigr] = t\,\langle Cx, y\rangle_H = t\,\langle C_n x, y\rangle_H, $$
where we used x = Pnx, y = Pny, and the commutation condition. Hence Wt(n) is a Cn-Wiener process on Hn, and the projected noise increments satisfy
$$ W^{(n)}_{t_{k+1}} - W^{(n)}_{t_k} \sim \mathcal{N}(0,\,hC_n), \qquad \text{equivalently} \qquad P_n\xi_k \sim \mathcal{N}(0,\,C_n). $$
In particular, we may and will couple all resolutions {Hn} and the continuum by using the same underlying Wt and then projecting, which is the natural resolution-consistent coupling.
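Concretely, the coupling amounts to drawing one set of mode-wise Gaussian increments and truncating them at each resolution; a minimal sketch, assuming a shared KL basis:

```python
# Resolution-consistent coupling: one underlying C-Wiener increment, two truncations.
import numpy as np

rng = np.random.default_rng(3)
n_max, h = 256, 1e-2
lam = 1.0 / np.arange(1, n_max + 1) ** 2
n1, n2 = 64, 128

z = rng.standard_normal(n_max)
dW = np.sqrt(h * lam) * z            # increment of the C-Wiener process, all modes
dW_n1, dW_n2 = dW[:n1], dW[:n2]      # P_{n1} dW ~ N(0, h C_{n1}),  P_{n2} dW ~ N(0, h C_{n2})
```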

Given a drift field G : H → H, we define its Galerkin version by Gn := PnG. If G is globally Lipschitz on H, then by nonexpansiveness
$$ \|G_n(x) - G_n(y)\|_H = \|P_n(G(x) - G(y))\|_H \le \mathrm{Lip}(G)\,\|x - y\|_H, $$
so Lip(Gn) ≤ Lip(G) with no dependence on n. Likewise, if G satisfies the one-sided dissipativity condition, then for u ∈ Hn,
$$ \langle u,\,G_n(u)\rangle_H = \langle P_n u,\,G(u)\rangle_H = \langle u,\,G(u)\rangle_H \le -\alpha\,\|u\|_H^2 + b, $$
so the condition holds for Gn with the same (α, b).

Applying these observations to G = F̂θ shows that the projected perturbed SPDE
$$ du^{(n)}_t = P_n\widehat{F}_\theta(u^{(n)}_t)\,dt + \sqrt{2}\,dW^{(n)}_t $$
and its Euler–Maruyama discretization
$$ u^{(n)}_{k+1} = u^{(n)}_k + h\,P_n\widehat{F}_\theta(u^{(n)}_k) + \sqrt{2h}\,P_n\xi_k $$
inherit the same global Lipschitz constant and the same dissipativity parameters as their continuum counterparts. Consequently, all moment bounds used in the time-discretization analysis (and hence the constant K2) may be chosen uniformly in n, with Tr(Cn) bounded by Tr(C) as above.

The contractivity and stability estimates in W2 are likewise uniform. Indeed, any bound obtained by synchronous coupling and estimates of the form
$$ \langle x - y,\,G(x) - G(y)\rangle_H \le -\kappa\,\|x - y\|_H^2 $$
is preserved under projection on Hn: for x, y ∈ Hn,
$$ \langle x - y,\,G_n(x) - G_n(y)\rangle_H = \langle x - y,\,P_n(G(x) - G(y))\rangle_H = \langle x - y,\,G(x) - G(y)\rangle_H. $$
Thus the same contraction rate κ applies to the projected exact dynamics (with drift Fn := PnF) and to the projected perturbed dynamics (with drift F̂θ,n := PnF̂θ).

Finally, the learned-drift error does not worsen under projection: by nonexpansiveness,
$$ \mathbb{E}_{u\sim\nu}\bigl\|F_n(u) - \widehat{F}_{\theta,n}(u)\bigr\|_H^2 = \mathbb{E}_{u\sim\nu}\bigl\|P_n(F(u) - \widehat{F}_\theta(u))\bigr\|_H^2 \le \mathbb{E}_{u\sim\nu}\bigl\|F(u) - \widehat{F}_\theta(u)\bigr\|_H^2 \le \delta^2. $$
Moreover, since Pn is 1-Lipschitz, W2 is nonexpansive under pushforward: for any H-valued random variables X, Y,
$$ W_2\bigl(\mathcal{L}(P_nX),\,\mathcal{L}(P_nY)\bigr) \le W_2\bigl(\mathcal{L}(X),\,\mathcal{L}(Y)\bigr). $$
Combining this with the continuum bounds (mixing, drift perturbation stability, and Euler–Maruyama error) yields the claimed discretization-uniform estimate for the Galerkin chain targeting Pnν:
$$ W_2\bigl(\mathcal{L}(u^{(n)}_N),\,P_n\nu\bigr) \;\le\; e^{-\kappa T}\,W_2\bigl(\mathcal{L}(u^{(n)}_0),\,P_n\nu\bigr) + K\,(\delta + h^{1/2}), $$
where κ and K depend only on (m, L, Tr(C)) and on the dissipativity/Lipschitz parameters imposed on F̂θ, but not on n. This is the sense in which the sampling guarantee is resolution-invariant: the number of steps required to reach a prescribed accuracy is controlled by the statistical and temporal errors (δ, h), and not by the spatial discretization level.


8. Annealed Noise Scales: extend the bound to a schedule {νt} (DDO/NCSN-like); add an explicit annealing error term and guidelines for choosing ht, M, and σt.

In many applications the learned drift (or score-induced drift) is available not only for the target ν, but for a whole family of “noised” measures {νt}t∈[0,1] which interpolates between an “easy” reference at t = 1 and the desired target at t = 0. The typical score-based instantiation is a noise scale σ ↦ νσ, where
$$ \nu_\sigma := \mathrm{Law}\bigl(u + \sigma\,\xi\bigr), \qquad u \sim \nu,\ \ \xi \sim \mathcal{N}(0,C)\ \text{independent}, $$
so that νσ becomes progressively more Gaussian as σ increases. We treat the general situation in which for each t we have a strongly log-concave invariant measure νt for a preconditioned Langevin dynamics with drift
$$ F_t(u) := -\bigl(u + C\,\nabla_{H_{\mu_0}}\Phi_t(u)\bigr), $$
and we have a learned approximation F̂θ,t satisfying a uniform-in-resolution error control
$$ \mathbb{E}_{u\sim\nu_t}\bigl\|\widehat{F}_{\theta,t}(u) - F_t(u)\bigr\|_H^2 \le \delta_t^2, $$
together with Lipschitz/dissipativity assumptions (uniform in t, or at least controlled along the schedule) sufficient to apply the continuum perturbation and Euler–Maruyama estimates at each level.

Fix a decreasing schedule 1 = t0 > t1 > ⋯ > tM = 0. For each level j ∈ {0, …, M − 1}, we run an Euler–Maruyama chain with step hj and Nj steps (sampling time τj := Njhj) targeting νtj+1:
$$ u^{j}_{k+1} = u^{j}_k + h_j\,\widehat{F}_{\theta,t_{j+1}}(u^{j}_k) + \sqrt{2h_j}\,\xi_k, \qquad \xi_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,C), $$
initialized by $u^{j}_0 := u^{j-1}_{N_{j-1}}$ (with $u^{0}_0$ given). Denote by $\widehat{\nu}_j := \mathcal{L}(u^{j}_{N_j})$ the distribution after level j, so the final output law is $\widehat{\nu}_{M-1}$, intended to approximate νtM = ν0.

Iterating the end-to-end bound level-by-level yields a decomposition into (i) residual mixing from the initial condition at the first level, (ii) the accumulated learned-drift and time-discretization errors, and (iii) a term measuring the Wasserstein distance between consecutive targets. Concretely, under a uniform contraction rate κ > 0 and a uniform constant K (depending on the same structural parameters as before, but independent of any Galerkin resolution), we obtain a bound of the form
$$ W_2\bigl(\widehat{\nu}_{M-1},\,\nu_0\bigr) \;\le\; e^{-\kappa\sum_{j}\tau_j}\,W_2\bigl(\mathcal{L}(u^{0}_0),\,\nu_{t_1}\bigr) \;+\; \sum_{j=0}^{M-1} e^{-\kappa\sum_{i\ge j}\tau_i}\,W_2\bigl(\nu_{t_j},\,\nu_{t_{j+1}}\bigr) \;+\; \sum_{j=0}^{M-1} e^{-\kappa\sum_{i>j}\tau_i}\,K\bigl(\delta_{t_{j+1}} + h_j^{1/2}\bigr). $$
The middle sum is the annealing error: even if each level were sampled exactly from νtj+1, a finite schedule cannot avoid paying for the cumulative discrepancy between successive targets, unless additional time is spent to contract these errors away (as reflected by the exponential weights).

In the convolutional noise model, we can make this term fully explicit. Let σ0 > σ1 > ⋯ > σM = 0 and write νσj in place of νtj. Coupling U + σjΞ and U + σj+1Ξ with the same (U, Ξ) gives
$$ W_2(\nu_{\sigma_j},\nu_{\sigma_{j+1}}) \le \bigl(\mathbb{E}\|(\sigma_j-\sigma_{j+1})\Xi\|_H^2\bigr)^{1/2} =|\sigma_j-\sigma_{j+1}|\,\sqrt{\mathrm{Tr}(C)}. $$
Thus the annealing error is controlled by the discrete total variation of the noise schedule, scaled by $\sqrt{\mathrm{Tr}(C)}$, with no dependence on spatial resolution.

To reach a prescribed accuracy ε, we balance the three contributions in the annealed bound.

First, we choose per-level sampling times τj so that the exponential weights make earlier errors negligible, e.g.
$$ e^{-\kappa \tau_j}\approx \frac{1}{M} \quad\Longleftrightarrow\quad \tau_j \approx \kappa^{-1}\log M, $$
or larger at the smallest-noise levels, where one typically observes a larger learned error δσ.

Second, we choose the number of levels M and the schedule increments so that ∑j W2(νtj, νtj+1) ≲ ε. In the convolutional case, it suffices to enforce
$$ \sum_{j=0}^{M-1} |\sigma_j-\sigma_{j+1}|\,\sqrt{\mathrm{Tr}(C)} \lesssim \varepsilon, $$
so a simple choice is a geometric ladder σj+1 = γσj with γ ∈ (0, 1) and σ0 fixed large enough that νσ0 is easy to initialize. Then ∑j|σj − σj+1| = σ0, and increasing M primarily improves the annealing term by allowing contraction between closer consecutive targets.

Third, we select hj to control the discretization error hj^{1/2} (or hj under higher regularity) and to satisfy the stability constraint hj ≤ h0 required by the Euler–Maruyama analysis. If Lipschitz constants improve at larger noise (as is typical for DDO/NCSN training), one may take hj larger for large σj and decrease hj as σj ↓ 0, while keeping hj ≲ ε2 if the h^{1/2} rate is the operative one.
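The sketch below assembles these guidelines with purely illustrative constants; κ and ε are assumed inputs of the example, not quantities the code estimates:

```python
# Schedule construction: geometric sigma ladder, tau_j ~ log(M)/kappa, h <~ eps^2,
# and the explicit annealing term sum_j |sigma_j - sigma_{j+1}| * sqrt(Tr(C)).
import numpy as np

kappa, eps = 1.0, 1e-2                     # assumed contraction rate and target accuracy
lam = 1.0 / np.arange(1, 256 + 1) ** 2
trC = lam.sum()

M, gamma, sigma0 = 20, 0.7, 1.0
sigmas = np.append(sigma0 * gamma ** np.arange(M), 0.0)   # sigma_M = 0
tau = np.log(M) / kappa                                   # per-level sampling time
h = min(1e-2, eps ** 2)                                   # step size in the h^{1/2} regime
anneal = np.sum(np.abs(np.diff(sigmas))) * np.sqrt(trC)   # telescopes to sigma0*sqrt(Tr(C))
print(f"M={M}, tau per level={tau:.2f}, h={h:.0e}, annealing term={anneal:.3f}")
```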

All statements above remain valid for the Galerkin-projected implementation, with the same constants, by combining them with the uniform-in-n arguments from Section 7.


9. Implementation Notes & Empirical Validation Plan: how to estimate δ in practice, diagnostic plots, and experiments (Gaussian mixture GRF, Darcy inverse posterior) to test resolution-invariant scaling.

We record practical considerations for implementing the sampler and for empirically assessing the quantities that enter the dimension-free bound, with particular emphasis on estimating the drift error level δt and on verifying that performance does not degrade as the Galerkin resolution increases.

The bound depends on an L2(νt; H) quantity,
$$ \delta_t^2 = \mathbb{E}_{u\sim\nu_t}\bigl\|\widehat{F}_{\theta,t}(u) - F_t(u)\bigr\|_H^2, $$
which is not directly observable when Ft is defined through an unknown Φt. In practice we consider complementary estimators and proxies, such as the Monte Carlo construction sketched below.
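When a trustworthy reference drift is available on a test family (e.g. the closed-form mixture target below, or a Gaussian case), a plain Monte Carlo average gives a direct estimate; the sketch assumes hypothetical callables drift_hat, drift_ref, and sample_nu_t:

```python
# Monte Carlo proxy for delta_t; all three callables are user-supplied placeholders.
import numpy as np

def delta_hat(drift_hat, drift_ref, sample_nu_t, n_samples=1000):
    acc = 0.0
    for _ in range(n_samples):
        u = sample_nu_t()                                  # u ~ nu_t (e.g. long pCN runs)
        acc += np.sum((drift_hat(u) - drift_ref(u)) ** 2)  # ||.||_H^2 in KL coordinates
    return np.sqrt(acc / n_samples)
```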

To preserve the commutation PnC = CPn and obtain resolution-invariant constants in practice, we implement the sampler in a basis {ei}i≥1 diagonalizing C, i.e. Cei = λiei, and represent $u=\sum_{i=1}^n u_i e_i$. Then ξ ∼ 𝒩(0, C) is sampled by $\xi_i=\sqrt{\lambda_i}z_i$ with $z_i\stackrel{i.i.d.}{\sim}\mathcal{N}(0,1)$, and $P_n\xi=\sum_{i=1}^n \xi_i e_i$. When H = L2(D) and C is stationary, this is naturally realized by FFT-based sampling; otherwise, we precompute a truncated Karhunen–Loève expansion. We approximate ∥·∥H consistently with the basis (or with quadrature weights on a grid) and report all errors in this norm to avoid spurious resolution effects.

Beyond monitoring observables, we emphasize diagnostics aligned with the proof mechanism (contraction plus perturbation). We recommend: (i) coupled-chain contraction plots, where two chains started from different initializations but driven by the same Gaussian noises are used to estimate the empirical decay of 𝔼∥uk − vk∥H2 and hence an effective κ (a sketch follows below); (ii) spectral marginal checks, comparing histograms and second moments of ⟨u, ei⟩H for the first r modes across resolutions; (iii) stability traces, tracking Φ(uk) (when computable) and ∥uk∥H2 to detect step-size instability; and (iv) projected Wasserstein diagnostics, computing W2 on a fixed low-dimensional projection (first r modes) via standard finite-dimensional solvers, which yields a resolution-comparable metric even when full W2 is impractical.
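A minimal sketch of diagnostic (i), assuming two stored trajectories of shape (N, n) from noise-shared chains:

```python
# Effective contraction rate from synchronously coupled trajectories:
# fit log ||u_k - v_k||^2 ~ -2*kappa*t by least squares.
import numpy as np

def effective_kappa(traj_u, traj_v, h):
    d2 = np.sum((traj_u - traj_v) ** 2, axis=1)        # ||u_k - v_k||_H^2 per step
    t = h * np.arange(len(d2))
    slope = np.polyfit(t, np.log(d2 + 1e-300), 1)[0]   # guard against exact zeros
    return -slope / 2.0
```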

We consider a non-Gaussian target on H defined as a mixture of two Gaussian measures with shared covariance C,
$$ \nu := \tfrac12 \mathcal{N}(m_1,C)+\tfrac12 \mathcal{N}(m_2,C), $$
with m1, m2 ∈ Hμ0 supported on low frequencies. This target intentionally violates strong log-concavity, and therefore serves to probe failure modes while keeping the score analytically tractable (the mixture score can be evaluated in closed form on each Hn). We train F̂θ,σ from the corruption model νσ and evaluate: (i) how δ̂σ,n varies with n; (ii) how many steps N are needed for the projected W2 error on the first r modes to reach a fixed tolerance; and (iii) whether empirical contraction deteriorates with n. The expected outcome, if the method is resolution-consistent, is that the required N is stable in n until representation error (insufficient network capacity for high frequencies) becomes dominant.
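Under the representer convention of Section 2, the exact drift for this target takes the closed form F(u) = −(u − m̄(u)), where m̄(u) is the responsibility-weighted mean of the two components; a minimal sketch in KL coordinates (the means are illustrative):

```python
# Closed-form mixture drift for nu = 0.5 N(m1, C) + 0.5 N(m2, C), KL coordinates.
import numpy as np

n = 64
lam = 1.0 / np.arange(1, n + 1) ** 2
m1 = np.zeros(n); m1[0] = 2.0                 # illustrative low-frequency means
m2 = np.zeros(n); m2[1] = -1.0

def mixture_drift(u):
    # Log-weights <C^{-1} m_i, u>_H - 0.5 <C^{-1} m_i, m_i>_H (Cameron-Martin pairing).
    a1 = np.sum(m1 * u / lam) - 0.5 * np.sum(m1 ** 2 / lam)
    a2 = np.sum(m2 * u / lam) - 0.5 * np.sum(m2 ** 2 / lam)
    w1 = 1.0 / (1.0 + np.exp(a2 - a1))        # responsibility of component 1
    mbar = w1 * m1 + (1.0 - w1) * m2
    return -(u - mbar)                        # preconditioned Langevin drift
```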

We consider an elliptic inverse problem with prior μ0 = 𝒩(0, C) on the log-permeability field u, and observations y = G(u) + η, η ∼ 𝒩(0, Γ). The posterior satisfies ν ≪ μ0 with
$$ \Phi(u)=\tfrac12\bigl\|\Gamma^{-1/2}(G(u)-y)\bigr\|^2. $$
We obtain reference samples on a moderate discretization n0 using a standard function-space MCMC method (e.g. pCN) and train F̂θ,σ using the trace-class corruption νσ. We then run the learned sampler on resolutions n ∈ {n0, 2n0, 4n0} without retraining and compare: (i) posterior predictive statistics (pushforwards through G and sensor predictions), (ii) low-mode marginals against the reference chain projected to the same modes, and (iii) the dependence of effective sample size per unit compute on n. The central empirical claim corresponding to the theory is that, once δσ is controlled uniformly, the iteration complexity needed to stabilize these diagnostics should not increase with n, up to the expected O(n log n) per-step cost.
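For completeness, a minimal pCN sketch for the reference chain, in KL coordinates; Phi is a placeholder misfit, and the acceptance rule is the standard function-space pCN rule:

```python
# Preconditioned Crank-Nicolson (pCN) for nu ∝ exp(-Phi) d mu_0, KL coordinates.
import numpy as np

rng = np.random.default_rng(4)
n, beta, N = 64, 0.2, 5000
lam = 1.0 / np.arange(1, n + 1) ** 2

def Phi(u):
    return 0.5 * np.sum(u[:4] ** 2)                 # placeholder data misfit

u = np.sqrt(lam) * rng.standard_normal(n)           # initialize from the prior mu_0
for _ in range(N):
    xi = np.sqrt(lam) * rng.standard_normal(n)
    v = np.sqrt(1.0 - beta ** 2) * u + beta * xi    # prior-reversible proposal
    if np.log(rng.uniform()) < Phi(u) - Phi(v):     # accept w.p. min(1, e^{Phi(u)-Phi(v)})
        u = v
```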


10. Discussion: limitations (nonconvexity, small-noise singularity), relationship to other infinite-dimensional diffusion formulations, and open directions (beyond log-concave; adaptive preconditioners).

Our analysis is intentionally confined to a regime in which three ingredients interact cleanly: (i) trace-class, strictly positive noise with a fixed covariance operator C; (ii) strong log-concavity of ν along Hμ0, encoded by m-strong monotonicity and L-Lipschitzness of ∇Hμ0Φ; and (iii) a learned drift error controlled in an L2(ν; H) sense by δ. These assumptions are sufficient to make the proof mechanism essentially one-dimensional: synchronous coupling yields contraction, perturbation yields an additive bias O(δ), and Euler–Maruyama yields a discretization bias O(h^{1/2}). The benefit is that the resulting constants are independent of Galerkin resolution, but the scope is correspondingly limited.

A first limitation is nonconvexity (or multimodality). If Φ is not strongly convex along the Cameron–Martin directions, we generally lose global contractivity in W2 for the continuum semigroup and therefore lose the straightforward mixing term e−κT. Even when the invariant measure exists and the SPDE is well-posed, the relevant convergence mechanism can be metastable and dominated by barrier-crossing times, for which dimension-free bounds are substantially more delicate. One can still pursue bounds of the form
$$ W_2(\mathcal{L}(u_T), \nu) \;\le\; \mathrm{Mix}(T; V) + \mathrm{Bias}(\delta, h), $$
where Mix(T; V) is controlled via a Lyapunov function V and a minorization condition on suitable sublevel sets. However, establishing such conditions in function space, and doing so uniformly over Pn, typically requires additional structure (e.g. a globally dissipative drift plus a noise that is sufficiently nondegenerate on a “small set”). In the nonconvex setting, we therefore view the present result as giving a baseline: it characterizes when one can expect resolution-invariant behavior, and it isolates precisely which part of the argument fails when contractivity is absent.

A second limitation concerns small-noise or singular-noise regimes. Our setting assumes C is trace-class and fixed as the resolution increases, so ξ ∼ 𝒩(0, C) is H-valued almost surely and Tr(C) < ∞ enters the constants. If one instead considers “whiter” noise (e.g. covariance approaching the identity on finer grids), then Tr(C) may diverge with n, and both the implementation and the bounds can degrade. This is not merely technical: if the corruption/noise injects energy at arbitrarily high frequencies, then learning a drift in an H-norm that remains uniformly accurate becomes information-theoretically and computationally harder, and the error level δ typically grows with resolution. This small-noise singularity also arises in annealing limits where one lets the effective diffusion scale go to zero; in such limits, the bias term induced by δ is amplified unless δ simultaneously vanishes at a compatible rate.

A third limitation is that the learning assumption is expressed under the law ν:
$$ \mathbb{E}_{u\sim\nu}\bigl\|F(u) - \widehat{F}_\theta(u)\bigr\|_H^2 \le \delta^2, $$
whereas the sampler operates out of stationarity during burn-in and may explore regions under its own transient law. In strongly contractive regimes this mismatch is mitigated, but, strictly speaking, an off-distribution control (e.g. uniform Lipschitzness of F̂θ and a weighted L2 bound under a Lyapunov measure) would be more satisfactory. This suggests an open direction: replace the L2(ν; H) error by a stability notion that can be verified from training data and that is robust under the sampling dynamics, for example a bound of the form
$$ \sup_{\rho\in\mathcal{P}_2(H):\ \rho(V)\le M}\ \mathbb{E}_{u\sim\rho}\bigl\|F(u) - \widehat{F}_\theta(u)\bigr\|_H^2 \;\le\; \delta_M^2, $$
for a coercive V. Proving an end-to-end bound with δM would better reflect practical training and would more naturally accommodate non-log-concave targets.

It is also useful to place the present formulation among other infinite-dimensional diffusion constructions. The SPDE
$$ du_t\ =\ -(u_t+C\nabla_{H_{\mu_0}}\Phi(u_t))\,dt+\sqrt{2}\,dW_t $$
may be viewed as an overdamped Langevin dynamics preconditioned by the prior covariance, with Wt a C-Wiener process; this is closely related to function-space MALA and to the idea that the prior should set the geometry of the proposal. From this perspective, our resolution-invariant estimate can be read as a quantitative stability result for a μ0-reversible diffusion under simultaneous drift approximation and time discretization. Other diffusion-based approaches (e.g. Schrödinger bridge formulations or score-based reverse-time SDEs) can also be posed on Hilbert spaces, but they often require either time-inhomogeneous drifts or additional regularity to justify changes of measure. An interesting conceptual link is that, in all cases, the trace-class nature of the noise is the key property that prevents high-frequency divergences and makes the continuum-first viewpoint meaningful.

Finally, we highlight two open directions that we expect to be central in applications. The first is to move beyond strong log-concavity while retaining resolution-invariant control, potentially by combining (a) contractivity on low modes, (b) coercivity of the prior on high modes, and (c) localized minorization driven by the nondegenerate directions of C. The second is to develop adaptive or learned preconditioners. Replacing C by an operator K (possibly state-dependent) could accelerate mixing, but it changes both the invariant measure and the discretization stability. Even in the linear case, choosing K to match posterior curvature resembles operator preconditioning; in the learned setting, one may attempt to parameterize K and the drift jointly while enforcing that the resulting diffusion preserves ν. Establishing dimension-free guarantees in such adaptive geometries appears feasible in principle, but it will require a careful extension of the contraction argument to nonconstant metrics and a corresponding notion of drift error aligned with the new geometry.