Many contemporary inverse problems, generative models for physical fields, and Bayesian formulations of PDE-constrained learning lead naturally to probability measures on infinite-dimensional state spaces. In such settings the object of interest is not a vector in ℝd but a function u taking values in a separable Hilbert space H. Any implementable algorithm necessarily produces a finite-dimensional approximation u(n) ∈ Hn, obtained for instance by truncating a Fourier series or by using a grid-based discretization. A recurring practical observation is that some learned samplers—especially those based on neural operators and denoising objectives—appear to behave stably as the resolution n is refined: one trains at a certain discretization and samples at another, with only mild degradation. Our goal is to formulate and prove a precise form of this in a setting where both the analysis and the algorithm are naturally expressed at the function-space level.
The central difficulty is that the usual convergence theory for Markov chain Monte Carlo (MCMC) algorithms is typically dimension dependent. Even when one proves geometric ergodicity in finite dimensions, the constants often deteriorate with d, and naive limits as d → ∞ can be misleading. In the function-space context one must distinguish two sources of instability. First, the measure may become increasingly ill-conditioned under discretization (e.g., if the potential introduces high-frequency stiffness). Second, the algorithm may inject discretization artifacts (e.g., by using noise models that are not H-valued, or by training objectives whose risk necessarily grows with resolution). If either phenomenon occurs, uniform performance across n is impossible, and any apparent empirical stability must be explained by additional structure.
The present work is motivated by denoising diffusion operators (DDOs)
and related score-based constructions in which a learned operator
approximates a score or drift acting on functions. Empirically, such
models are often trained from samples corrupted by Gaussian fields and
then used to drive Langevin-type dynamics. While one can justify
individual components of this pipeline (e.g., consistency of denoising
estimators under suitable conditions), a complete statement of the
form
sampling error ≤ mixing error + learning error + time-discretization error,
with constants independent of spatial resolution, has been missing. This
gap is not merely technical: without an explicit function-space
statement, one cannot disentangle whether a method is robust because it
genuinely approximates a well-posed infinite-dimensional algorithm, or
because discretization-dependent effects happen to cancel at the tested
resolutions.
We therefore adopt a continuum-first viewpoint. We formulate the target distribution ν on H through a Gaussian reference measure μ0 = 𝒩(0, C) and a potential Φ, and we study the preconditioned Langevin dynamics whose invariant measure is ν. The algorithm we ultimately implement is Euler–Maruyama applied to a learned drift F̂θ, with additive trace-class Gaussian noise. The analysis proceeds in three steps: (i) contraction and mixing of the exact continuum Langevin semigroup in W2; (ii) stability of the law with respect to drift perturbations, quantified by an L2(ν; H) error level δ; and (iii) a W2 control of the Euler–Maruyama discretization error, scaling as h1/2 under minimal assumptions. The key point is that each step admits bounds whose constants are determined by function-space quantities (monotonicity and Lipschitz constants in the Cameron–Martin geometry, and Tr(C)), and hence remain uniform over Galerkin projections that commute with the covariance.
The scope of the theory is deliberately focused. We work in a strongly log-concave regime, where the relevant monotonicity and Lipschitz constants are uniform and where trace-class noise ensures that the driving Gaussian perturbations are H-valued. These hypotheses are restrictive relative to the most challenging diffusion-model applications, but they delineate a regime in which resolution-invariant guarantees are mathematically natural and provably attainable. They also clarify failure modes: if the corruption or driving noise is not trace class (e.g., effectively white in the discretization limit), then the learned drift error δ cannot, in general, remain bounded as n → ∞, and dimension-free sampling bounds are not available without additional regularity.
The remainder of the paper develops the necessary background and then formalizes the three-step argument above. In the next section we recall Gaussian reference measures and the Cameron–Martin structure, describe how scores appear as logarithmic derivatives, and explain how DDO denoising objectives lead to operator-valued score approximations compatible with the function-space Langevin dynamics.
We recall the objects that allow one to speak meaningfully about
"scores" and Langevin drifts on an infinite-dimensional Hilbert space.
Throughout, μ0 = 𝒩(0, C) is
a centered Gaussian measure on H with C self-adjoint, strictly positive,
and trace-class. The trace-class assumption implies that μ0(H) = 1 (in
particular, a μ0-distributed random
element is H-valued), and it
ensures that the Cameron–Martin space
Hμ0 := Im(C1/2), ∥h∥Hμ0 := ∥C−1/2h∥H,
is continuously embedded in H.
The Cameron–Martin theorem characterizes the quasi-invariance of μ0 under translations by
h ∈ Hμ0,
and hence provides the correct geometric notion of differentiability for
densities built from μ0.
A basic calculus tool is the Gaussian integration-by-parts formula
along Cameron–Martin directions. If f is a sufficiently smooth
cylindrical function on H and
h ∈ Hμ0,
then
$$
\int_H \partial_h f(u)\,\mu_0(du)
\;=\;
\int_H f(u)\,\langle C^{-1}h,\,u\rangle_H\,\mu_0(du),
$$
where ∂hf(u)
denotes the directional derivative along h and ⟨C−1h, u⟩H is understood as
the measurable linear functional associated with h ∈ Hμ0. In particular, this formula identifies the
logarithmic derivative of μ0 along Hμ0
as the map u ↦ −C−1u,
understood in the weak sense encoded by the identity above. This provides the
infinite-dimensional analogue of the finite-dimensional identity ∇log 𝒩(0, C)(u) = −C−1u.
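As a concrete finite-dimensional illustration (truncating H to ℝn purely for exposition; all numerical values below are illustrative), the following sketch checks the identity ∇log 𝒩(0, C)(u) = −C−1u against a central finite difference:

import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)            # symmetric positive-definite covariance
Cinv = np.linalg.inv(C)

def log_density(u):
    # log of the unnormalized Gaussian density exp(-<u, C^{-1} u>/2)
    return -0.5 * u @ Cinv @ u

u = rng.standard_normal(n)
eps = 1e-6
fd_grad = np.array([(log_density(u + eps * e) - log_density(u - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
assert np.allclose(fd_grad, -Cinv @ u, atol=1e-6)   # score of N(0, C) is -C^{-1} u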
Now let ν ≪ μ0 be given
by
ν(du) = Z−1exp ( − Φ(u)) μ0(du), Z := ∫He−Φ dμ0,
where Φ is differentiable
along Hμ0.
Differentiating the density along Cameron–Martin directions and
combining with the Gaussian integration-by-parts formula yields the analogous identity under ν:
$$
\int_H \partial_h f(u)\,\nu(du)
\;=\;
\int_H f(u)\,\bigl(\langle C^{-1}h,\,u\rangle_H + \partial_h\Phi(u)\bigr)\,\nu(du).
$$
Thus the logarithmic derivative of ν relative to the Gaussian reference μ0 is the Hμ0-valued
map
$$
u \longmapsto -\nabla_{H_{\mu_0}}\Phi(u)
=
\nabla_{H_{\mu_0}}\log\frac{d\nu}{d\mu_0}(u),
$$
whereas the full logarithmic derivative of ν in the sense of the identity above contains the
additional term C−1u inherited
from μ0. This
distinction is convenient: in function space, there is no Lebesgue
reference, and the object that is stable under discretization is
typically the μ0-relative score ∇Hμ0log (dν/dμ0).
The preconditioned Langevin SPDE is the diffusion on H whose generator encodes these logarithmic derivatives and for
which ν is invariant.
Writing
F(u) := −(u + C∇Hμ0Φ(u)),
the dynamics read
$$
du_t = F(u_t)\,dt + \sqrt{2}\,dW_t,
$$
where Wt
is a C-Wiener process.
Heuristically, F consists of
the Ornstein–Uhlenbeck drift −u associated with μ0 and the additional
term −C∇Hμ0Φ
that accounts for the potential. Equivalently,
$$
F(u) = -u + C\,\nabla_{H_{\mu_0}}\log\frac{d\nu}{d\mu_0}(u),
$$
so the drift is obtained by applying the preconditioner C to the μ0-relative score and
then adding the stabilizing linear term −u. Under standard dissipativity and
regularity conditions, one verifies invariance of ν by checking that the corresponding
Kolmogorov operator is symmetric (or at least has ν as a stationary measure) on
cylindrical test functions; the integration-by-parts formula under ν is precisely the identity
needed for this computation.
We next summarize how denoising diffusion operators (DDOs) produce
approximations of these score objects in a manner compatible with the
function-space setting. Fix a noise level τ > 0, and consider the
corruption model
$$
y = u + \sqrt{\tau}\,\xi,
\qquad
u\sim \nu,\ \ \xi\sim \mathcal{N}(0,C)\ \text{independent}.
$$
Since C is trace-class, ξ ∈ H almost surely and
hence y ∈ H almost
surely. A DDO is an operator-valued estimator Gθ( ⋅ , τ) : H → H
trained by the denoising objective
minθ 𝔼[∥Gθ(y, τ) − u∥H2],
or a variant in which the loss is measured in a Cameron–Martin norm. The
minimizer of this objective (over all measurable maps) is the
conditional mean G*(y, τ) = 𝔼[u ∣ y].
Under suitable regularity, this conditional mean determines the score of
the corrupted law via an infinite-dimensional Tweedie-type identity: the
difference G*(y, τ) − y
is a smoothed version of a gradient of the log-density of the corrupted
measure, and applying C−1 transfers this
difference into the Cameron–Martin space. Concretely, one may define a
score surrogate by
ŝθ(y, τ) := τ−1C−1(Gθ(y, τ) − y) ∈ Hμ0,
which, when Gθ ≈ G*,
approximates an appropriate logarithmic derivative of the corrupted law
along Hμ0.
In the small-noise limit (or in a multi-noise training scheme), such
estimators are used to approximate ∇Hμ0log (dν/dμ0) = −∇Hμ0Φ,
and hence to construct a drift approximation of the form
F̂θ(u) = −u + C ŝθ(u),
which matches the structure of the continuum drift F. The subsequent analysis treats
F̂θ
abstractly as a measurable approximation of F, with its quality summarized by an
L2(ν; H)
error level; this is the point of contact between statistical training
guarantees for DDOs and stability properties of Langevin sampling in
function space.
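To fix ideas, here is a minimal sketch of the corruption model, the denoising objective, and the score surrogate in a basis diagonalizing C. The toy data law is taken to be ν = μ0 (an assumption made so that the exact conditional mean G* is available in closed form); names such as sample_C_noise and s_surrogate are illustrative:

import numpy as np

rng = np.random.default_rng(1)
n, tau = 64, 0.1
lam = 1.0 / np.arange(1, n + 1) ** 2          # trace-class spectrum of C

def sample_C_noise(batch):
    return np.sqrt(lam) * rng.standard_normal((batch, n))   # draws from N(0, C)

u = sample_C_noise(512)                        # toy data u ~ nu (here nu = mu_0)
y = u + np.sqrt(tau) * sample_C_noise(512)     # corruption y = u + sqrt(tau) xi

def G_star(y, tau):
    # exact conditional mean E[u | y] for this toy nu: here y ~ N(0, (1 + tau) C)
    return y / (1.0 + tau)

denoising_loss = np.mean(np.sum((G_star(y, tau) - u) ** 2, axis=1))

def s_surrogate(y, tau):
    # score surrogate tau^{-1} C^{-1} (G(y, tau) - y); C^{-1} acts diagonally
    return (G_star(y, tau) - y) / (tau * lam)

# sanity check of the Tweedie-type identity in this toy case: the surrogate
# equals -C^{-1} y / (1 + tau), the full score of the corrupted law N(0, (1+tau) C)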
We study sampling from a target probability measure ν on H given in Radon–Nikodym form with
respect to the Gaussian reference μ0 = 𝒩(0, C),
ν(du) = Z−1exp ( − Φ(u)) μ0(du), Z := ∫He−Φ dμ0,
where Φ : H → ℝ is
differentiable along the Cameron–Martin space Hμ0 = Im(C1/2).
The function-space geometry is governed by Hμ0
and by the preconditioner C;
in particular, the drift associated with the preconditioned Langevin
dynamics is
F(u) = −(u + C∇Hμ0Φ(u)).
Our target class is the strongly log-concave regime in the Gaussian base
geometry: we assume that ∇Hμ0Φ
is globally L-Lipschitz and
m-strongly monotone in Hμ0,
i.e.,
∥∇Hμ0Φ(u) − ∇Hμ0Φ(v)∥Hμ0 ≤ L∥u − v∥Hμ0,
and
⟨∇Hμ0Φ(u) − ∇Hμ0Φ(v), u − v⟩Hμ0 ≥ m∥u − v∥Hμ02, u, v ∈ Hμ0.
These conditions are the function-space analogue of uniform convexity
and smoothness; they imply well-posedness of the continuum Langevin SPDE
and yield quantitative contraction of its Markov semigroup in transport
metrics. We emphasize that the constants m, L are assumed to be
intrinsic to the model and, in particular, do not depend on any spatial
discretization.
Our sampling algorithm will not typically access F exactly. Instead, we are given a
measurable approximation F̂θ : H → H
(e.g. induced by a learned score), whose quality is summarized by an
L2(ν; H)
error level
$$
\delta^2 \;:=\; \mathbb{E}_{u\sim\nu}\bigl\|\widehat{F}_\theta(u)-F(u)\bigr\|_H^2.
$$
We will treat δ as an
exogenous parameter that captures learning and model-mismatch errors;
the analysis below propagates δ through the sampling dynamics in a
stable way.
The notion of "closeness to ν" we use is the 2-Wasserstein distance on H. For Borel probability measures
ρ, ρ′ ∈ 𝒫2(H)
with finite second moments, we define
W2(ρ, ρ′)2 := infπ ∈ Π(ρ, ρ′)∫H × H∥x − y∥H2 π(dx, dy),
where Π(ρ, ρ′)
denotes the set of couplings of ρ and ρ′. Since H is separable, the infimum is
attained, and W2
metrizes weak convergence plus convergence of second moments on 𝒫2(H). The choice of
W2 is dictated by
two features that we will exploit repeatedly: first, it is compatible
with synchronous couplings of Langevin dynamics; second, it behaves well
under Lipschitz maps. In particular, if T : H → H is 1-Lipschitz, then
$$
W_2\bigl(T_\sharp\rho,\,T_\sharp\rho'\bigr) \;\le\; W_2(\rho,\rho'),
$$
a fact we will use with T = Pn
a projection.
We implement the sampler by Euler–Maruyama time stepping. Given h > 0 and T = Nh, we
consider
$$
u_{k+1} \;=\; u_k + h\,\widehat{F}_\theta(u_k) + \sqrt{2h}\,\xi_k,
\qquad \xi_k \stackrel{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0,C),
$$
with initialization u0 ∈ H (possibly
random). The trace-class assumption on C ensures that ξk ∈ H
almost surely, so the update defines an H-valued Markov chain under mild
growth conditions on F̂θ (in our
setting we will assume global Lipschitzness/dissipativity as needed for
the discretization estimates). Our primary output is the law ℒ(uN), and the
goal is to bound W2(ℒ(uN), ν)
in terms of the runtime T, the
step size h, and the drift
error δ.
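A minimal sketch of this chain in a basis diagonalizing C, with Φ = 0 so that the drift reduces to the Ornstein–Uhlenbeck case (an assumption for illustration only; F_hat stands in for the learned drift):

import numpy as np

rng = np.random.default_rng(2)
n, h, N = 128, 1e-2, 2000
lam = 1.0 / np.arange(1, n + 1) ** 2          # eigenvalues of C (trace class)

def F_hat(u):
    # drift -(u + C grad Phi(u)); with Phi = 0 this is the OU drift toward mu_0
    return -u

u = np.zeros(n)
for _ in range(N):
    xi = np.sqrt(lam) * rng.standard_normal(n)   # xi_k ~ N(0, C), H-valued
    u = u + h * F_hat(u) + np.sqrt(2 * h) * xi
# the invariant law here is (a small-h perturbation of) mu_0 = N(0, C)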
To connect the continuum-first analysis to practical discretizations,
we introduce a family of finite-dimensional subspaces {Hn}n ≥ 1
with orthogonal projections Pn : H → Hn.
We assume that Pn is
nonexpansive, ∥Pnx∥H ≤ ∥x∥H,
and that the discretization is consistent with the Gaussian structure in
the sense that
$$
P_n C \;=\; C P_n.
$$
This commutation is natural when Pn truncates in
an eigenbasis of C
(e.g. Fourier/Karhunen–Loève truncations), and it ensures that Pnξ ∼ 𝒩(0, PnCPn)
and that projected C-Wiener
processes are PnCPn-Wiener
processes on Hn.
The corresponding projected Euler–Maruyama chain is
$$
u^{(n)}_{k+1} \;=\; u^{(n)}_k + h\,P_n\widehat{F}_\theta\bigl(u^{(n)}_k\bigr) + \sqrt{2h}\,P_n\xi_k.
$$
By construction, this chain evolves in Hn. The natural
finite-dimensional target is the pushforward Pn♯ν,
i.e. the law of Pnu
when u ∼ ν. The
stability of the learning error under projection follows immediately
from nonexpansiveness:
𝔼u ∼ ν[∥PnF̂θ(u) − PnF(u)∥H2] ≤ 𝔼u ∼ ν[∥F̂θ(u) − F(u)∥H2] ≤ δ2.
We now formalize what we mean by a resolution-invariant sampling
bound. For each n, the
algorithm induces a law ℒ(uN(n)) ∈ 𝒫2(Hn),
which we view as a measure on H supported on Hn. A bound is
resolution-invariant if it takes the form
$$
W_2\bigl(\mathcal{L}(u_N^{(n)}),\,P_{n\sharp}\nu\bigr)
\;\le\;
\mathrm{Mix}(T) + \mathrm{Learn}(\delta) + \mathrm{Disc}(h),
$$
where the functions Mix, Learn, Disc
and all constants implicit in them depend only on intrinsic model
parameters (e.g. m, L, Tr(C) and
regularity of F̂θ) and not on n. In particular, we seek bounds of
the type
Mix(T) = e−κTW2(ℒ(u0), ν), Learn(δ) = Kδ, Disc(h) = Kh1/2,
with κ, K independent
of resolution. The next section establishes the key ingredient behind
Mix(T): a W2-contraction estimate
for the exact continuum Langevin dynamics, from which mixing-time bounds
follow without reference to any discretization.
We consider the continuum preconditioned Langevin dynamics on H,
driven by a C-Wiener process
Wt. Under
the standing Lipschitz and monotonicity assumptions on ∇Hμ0Φ,
the SPDE is well posed and defines a time-homogeneous Markov process
{ut}t ≥ 0
on H. We denote by utu
the solution started at u0u = u ∈ H.
The associated Markov semigroup {Pt}t ≥ 0
acts on bounded measurable test functions φ : H → ℝ by
(Ptφ)(u) := 𝔼[φ(utu)],
and, by duality, on probability measures ρ ∈ 𝒫2(H) via
ρPt := ℒ(utU0)
when U0 ∼ ρ. The
measure ν is invariant for this dynamics
(indeed, the SPDE is the standard μ0-preconditioned
Langevin diffusion targeting ν); we will use invariance only as
an identity νPt = ν
and not rely on any explicit formula for the transition kernels.
The key quantitative property we need is a contraction estimate for
the semigroup in W2. This follows from a
synchronous coupling argument. Fix u, v ∈ H, and let
ut := utu
and vt := utv
solve the SPDE driven by the same Wiener path Wt. Defining the
difference et := ut − vt,
we observe that the noise cancels and hence
$$
\dot e_t \;=\; F(u_t) - F(v_t).
$$
We estimate the time derivative of ∥et∥H2:
by the chain rule,
$$
\frac{d}{dt}\|e_t\|_H^2 \;=\; 2\,\bigl\langle e_t,\,F(u_t)-F(v_t)\bigr\rangle_H.
$$
The drift F is one-sided
dissipative in the H-inner
product. Indeed, for any x, y ∈ H such that
∇Hμ0Φ(x), ∇Hμ0Φ(y) ∈ Hμ0,
we compute
$$
\bigl\langle F(x)-F(y),\,x-y\bigr\rangle_H
\;=\;
-\|x-y\|_H^2
-\bigl\langle C\bigl(\nabla_{H_{\mu_0}}\Phi(x)-\nabla_{H_{\mu_0}}\Phi(y)\bigr),\,x-y\bigr\rangle_H
\;\le\;
-\|x-y\|_H^2,
$$
the potential term having a favorable sign by the monotonicity of ∇Hμ0Φ. Inserting this bound into the derivative identity yields
$$
\frac{d}{dt}\|e_t\|_H^2 \le -2\|e_t\|_H^2,
$$
and therefore, by Grönwall's inequality,
$$
\|e_t\|_H^2 \;\le\; e^{-2t}\,\|e_0\|_H^2.
$$
In particular, taking expectations (trivial here since the bound is pathwise)
shows that the synchronous coupling contracts second moments at rate
e−2t.
We translate this pathwise contraction into a W2-contraction bound for
the semigroup. Let ρ, ρ′ ∈ 𝒫2(H)
and let (U0, V0)
be any coupling with ℒ(U0) = ρ and
ℒ(V0) = ρ′.
Drive the coupled dynamics from (U0, V0)
using the same Wiener process, producing (Ut, Vt).
Then ℒ(Ut) = ρPt
and ℒ(Vt) = ρ′Pt,
and (Ut, Vt)
is a coupling of these laws. Hence,
W2(ρPt, ρ′Pt)2 ≤ 𝔼[∥Ut − Vt∥H2] ≤ e−2t 𝔼[∥U0 − V0∥H2].
Taking the infimum over all initial couplings (U0, V0)
gives the contraction estimate
$$
W_2(\rho P_t,\,\rho' P_t) \;\le\; e^{-t}\,W_2(\rho,\rho').
$$
We emphasize that this rate is intrinsic to the continuum drift
structure and does not depend on any spatial discretization parameter.
Moreover, the contraction implies uniqueness of an invariant measure in 𝒫2(H) whenever such a
measure exists; in our setting ν is invariant, so it is the unique
invariant measure with finite second moment.
Finally, choosing ρ′ = ν and using
νPt = ν
yields the mixing estimate
$$
W_2(\mathcal{L}(u_t),\,\nu) \;\le\; e^{-t}\,W_2(\mathcal{L}(u_0),\,\nu).
$$
Equivalently, for any ε > 0, it suffices to take t ≥ log (W2(ℒ(u0), ν)/ε)
to ensure W2(ℒ(ut), ν) ≤ ε.
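The synchronous-coupling mechanism behind these estimates is easy to exercise numerically. The following sketch, with an exactly dissipative toy drift (an assumption for illustration), estimates an effective contraction rate from the decay of the coupled gap:

import numpy as np

rng = np.random.default_rng(3)
n, h, N = 64, 1e-2, 500
lam = 1.0 / np.arange(1, n + 1) ** 2

def F(u):
    return -u                                  # toy one-sided dissipative drift

u, v = rng.standard_normal(n), -rng.standard_normal(n)   # distinct initializations
gaps = []
for _ in range(N):
    xi = np.sqrt(lam) * rng.standard_normal(n)  # shared noise: it cancels in u - v
    u = u + h * F(u) + np.sqrt(2 * h) * xi
    v = v + h * F(v) + np.sqrt(2 * h) * xi
    gaps.append(np.linalg.norm(u - v))
kappa_hat = -np.log(gaps[-1] / gaps[0]) / (N * h)   # should be close to 1 here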
This contraction/mixing estimate provides the Mix(T) term in the end-to-end bound. In the next section
we quantify how this picture degrades when the drift F is replaced by a learned
approximation F̂θ, leading to
an additive error contribution controlled by ∥F̂θ − F∥L2(ν; H).
We next quantify the effect of replacing the true drift F in the Langevin SPDE by a learned approximation
F̂θ. At the
continuum level we consider the perturbed dynamics
$$
d\widehat{u}_t \;=\; \widehat{F}_\theta(\widehat{u}_t)\,dt + \sqrt{2}\,dW_t,
$$
driven by the C-Wiener process
Wt. We
assume throughout that F̂θ : H → H
is globally Lipschitz (so the perturbed dynamics are well posed) and that its pointwise
discrepancy from F is
controlled in the target measure,
$$
\mathbb{E}_{u\sim\nu}\bigl\|\widehat{F}_\theta(u)-F(u)\bigr\|_H^2 \;\le\; \delta^2.
$$
The goal is an additive W2-error term
proportional to δ, uniform in
time after mixing.
The basic estimate is obtained by synchronous coupling and an energy
inequality for the error process. Let (ut)t ≥ 0
solve the exact dynamics and (ût)t ≥ 0
solve the perturbed dynamics, started from a coupling (u0, û0)
and driven by the same Wt. Define et := ût − ut.
Then the noise cancels and
ėt = F̂θ(ût) − F(ut) = (F(ût) − F(ut)) + (F̂θ(ût) − F(ût)).
Using the chain rule and the one-sided dissipativity of F,
$$
\frac{d}{dt}\|e_t\|_H^2
\;\le\;
-2\|e_t\|_H^2 + 2\,\bigl\langle e_t,\,\widehat{F}_\theta(\widehat{u}_t)-F(\widehat{u}_t)\bigr\rangle_H
\;\le\;
-\|e_t\|_H^2 + \bigl\|\widehat{F}_\theta(\widehat{u}_t)-F(\widehat{u}_t)\bigr\|_H^2,
$$
where the last step is Young’s inequality. Grönwall's inequality yields, for all
t ≥ 0,
$$
\mathbb{E}\|e_t\|_H^2
\;\le\;
e^{-t}\,\mathbb{E}\|e_0\|_H^2
+\int_0^t e^{-(t-s)}\,\mathbb{E}\bigl\|\widehat{F}_\theta(\widehat{u}_s)-F(\widehat{u}_s)\bigr\|_H^2\,ds.
$$
To turn this into a closed W2-bound, we relate the
integrand to the error level δ. In the strongly log-concave regime, ν has finite second moments with
constants depending only on (m, L, Tr(C)), and
by the contraction estimate the law of ûs remains close
to ν once s is moderately large. Under the
additional mild regularity that the discrepancy map G := F̂θ − F
is globally Lipschitz, one can bound
$$
\mathbb{E}\bigl\|G(\widehat{u}_s)\bigr\|_H^2
\;\le\;
2\delta^2 + K_0\,W_2\bigl(\mathcal{L}(\widehat{u}_s),\,\nu\bigr)^2,
$$
where K0 depends on
the Lipschitz constant of G
and on ν-moment bounds, hence
ultimately on (m, L, Tr(C)) and
the (dimension-free) Lipschitz controls imposed on F̂θ. Combining these estimates
with the definition of W2 via couplings, we
obtain a stability estimate of the form
$$
W_2\bigl(\mathcal{L}(\widehat{u}_t),\,\nu\bigr)
\;\le\;
e^{-\kappa t}\,W_2\bigl(\mathcal{L}(\widehat{u}_0),\,\nu\bigr) + K_1\,\delta,
$$
with constants κ > 0 and
K1 < ∞
independent of any spatial discretization. In particular, if ν̂θ ∈ 𝒫2(H)
denotes an invariant measure (when it exists) for the perturbed dynamics, then setting û0 ∼ ν̂θ
and using invariance in the stability estimate (letting t → ∞) gives
$$
W_2(\widehat{\nu}_\theta,\,\nu) \;\le\; K_1\,\delta.
$$
Thus, the learned drift incurs an irreducible bias controlled by the
L2(ν; H)
drift error.
Finally, we connect the error level δ to
score approximation error in denoising diffusion operators. In the μ0-preconditioned
Langevin setting the drift is F(u) = −(u + C∇Hμ0Φ(u)).
If a learned score model provides sθ(u) ≈ ∇Hμ0Φ(u)
and we set
F̂θ(u) := −(u + Csθ(u)),
then G(u) = F̂θ(u) − F(u) = −C(sθ(u) − ∇Hμ0Φ(u))
and hence
$$
\delta \;\le\; \|C\|_{\mathcal{L}(H)}\,\bigl\|s_\theta-\nabla_{H_{\mu_0}}\Phi\bigr\|_{L^2(\nu;H)}.
$$
Moreover, using the continuous embedding ∥h∥H ≤ ∥C1/2∥ℒ(H)∥h∥Hμ0
for h ∈ Hμ0,
we also have
δ ≤ ∥C∥ℒ(H) ∥C1/2∥ℒ(H) ∥sθ − ∇Hμ0Φ∥L2(ν; Hμ0).
Consequently, any DDO training guarantee yielding an L2(ν) score
error in either H or Hμ0
translates directly into the perturbation term K1δ in the stability estimate, and
therefore into the overall end-to-end W2 bound once
discretization error is added in the next section.
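Before adding discretization error, we note that δ is directly estimable by Monte Carlo through the identity G = −C(sθ − ∇Hμ0Φ). A minimal sketch under diagonal toy assumptions (grad_Phi, s_theta, and the surrogate draws from ν are illustrative stand-ins):

import numpy as np

rng = np.random.default_rng(4)
n, M = 64, 4096
lam = 1.0 / np.arange(1, n + 1) ** 2

def grad_Phi(u):
    return u                                   # toy Cameron-Martin gradient

def s_theta(u):
    return 1.05 * u                            # learned score with 5% relative error

u = np.sqrt(lam) * rng.standard_normal((M, n))    # surrogate draws u ~ nu (toy)
err = lam * (s_theta(u) - grad_Phi(u))            # C (s_theta - grad Phi), C diagonal
delta_hat = np.sqrt(np.mean(np.sum(err ** 2, axis=1)))   # Monte Carlo estimate of delta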
We now control the bias introduced by replacing the continuous-time
perturbed dynamics by its Euler–Maruyama discretization. Throughout this
section we fix a globally Lipschitz drift field F̂θ : H → H
with Lipschitz constant Lip(F̂θ), and we
assume a one-sided dissipativity condition ensuring uniform moment
bounds for both the SPDE and the chain: there exist α > 0 and b ≥ 0 such that
$$
\bigl\langle \widehat{F}_\theta(u),\,u\bigr\rangle_H \;\le\; -\alpha\|u\|_H^2 + b,
\qquad u \in H.
$$
(Under the strongly log-concave hypotheses on the target drift F, and for drift approximations
F̂θ that
preserve the same coercive structure up to a controlled perturbation, this condition is
natural; we only use it here to keep constants from growing in
time.)
Let ût
solve the perturbed dynamics with initial condition û0 ∈ L2(Ω; H).
Fix h ∈ (0, h0],
tk := kh,
T = Nh. The
Euler–Maruyama chain is
$$
u_{k+1} \;=\; u_k + h\,\widehat{F}_\theta(u_k) + \sqrt{2}\,\bigl(W_{t_{k+1}}-W_{t_k}\bigr),
$$
which is equivalent to the update with $\sqrt{2h}\,\xi_k$ and ξk ∼ 𝒩(0, C).
We couple {uk}k ≥ 0
and {ût}t ≥ 0
by taking the same C-Wiener
process Wt
in the SPDE and in the chain, and by setting u0 = û0.
This coupling directly yields a strong error estimate, and hence a W2-bound via the standard
inequality W2(ℒ(X), ℒ(Y))2 ≤ 𝔼∥X − Y∥H2.
To compare ûtk
and uk, we
write the exact increment of the SPDE on [tk, tk + 1]:
$$
\widehat{u}_{t_{k+1}}
=\widehat{u}_{t_k} +
\int_{t_k}^{t_{k+1}}\widehat{F}_\theta(\widehat{u}_s)\,ds
+ \sqrt{2}\,(W_{t_{k+1}}-W_{t_k}).
$$
Subtracting gives the recursion for the error ek := uk − ûtk:
$$
e_{k+1} \;=\; e_k + h\bigl(\widehat{F}_\theta(u_k)-\widehat{F}_\theta(\widehat{u}_{t_k})\bigr) + r_k,
\qquad
r_k := \int_{t_k}^{t_{k+1}} \bigl(\widehat{F}_\theta(\widehat{u}_{t_k})-\widehat{F}_\theta(\widehat{u}_s)\bigr)\,ds.
$$
The remainder term rk is controlled
by Lipschitz continuity:
$$
\mathbb{E}\|r_k\|_H^2
\;\le\;
\mathrm{Lip}(\widehat{F}_\theta)^2\,h \int_{t_k}^{t_{k+1}} \mathbb{E}\bigl\|\widehat{u}_s-\widehat{u}_{t_k}\bigr\|_H^2\,ds.
$$
We next bound 𝔼∥ûs − ûtk∥H2
for s ∈ [tk, tk + 1].
By the Itô isometry and (a + b)2 ≤ 2a2 + 2b2,
$$
\widehat{u}_s-\widehat{u}_{t_k}
=\int_{t_k}^{s}\widehat{F}_\theta(\widehat{u}_\tau)\,d\tau +
\sqrt{2}\,(W_s-W_{t_k}),
$$
hence
$$
\mathbb{E}\bigl\|\widehat{u}_s-\widehat{u}_{t_k}\bigr\|_H^2
\;\le\;
2\,(s-t_k)\int_{t_k}^{s}\mathbb{E}\bigl\|\widehat{F}_\theta(\widehat{u}_\tau)\bigr\|_H^2\,d\tau
+4\,(s-t_k)\,\mathrm{Tr}(C).
$$
Since Wt
is C-Wiener, 𝔼∥Ws − Wtk∥H2 = (s − tk) Tr(C).
Moreover, under the dissipativity condition and global Lipschitzness, standard Lyapunov estimates
yield a uniform-in-time bound supt ∈ [0, T]𝔼∥ût∥H2 ≤ MT
and thus supt ∈ [0, T]𝔼∥F̂θ(ût)∥H2 ≤ M̃T,
with M̃T
depending on Lip(F̂θ), (α, b), Tr(C), and 𝔼∥û0∥H2,
but not on any spatial resolution. Inserting these bounds into the preceding display shows
𝔼∥ûs − ûtk∥H2 ≤ c1(s − tk)
for s ∈ [tk, tk + 1],
where c1 depends on
(M̃T, Tr(C)).
Combining this with the remainder estimate yields
$$
\mathbb{E}\|r_k\|_H^2 \;\le\; c_2\,h^3,
$$
with c2 depending only on Lip(F̂θ) and c1.
We return to the error recursion. Squaring and taking expectations, using (a + b + c)2 ≤ (1 + η)a2 + (1 + η)b2 + (1 + 1/η)c2
and Lipschitzness of F̂θ, we obtain
for h ≤ h0
small enough (depending on Lip(F̂θ)) a
discrete Grönwall inequality of the form
$$
\mathbb{E}\|e_{k+1}\|_H^2 \;\le\; (1-\gamma h)\,\mathbb{E}\|e_k\|_H^2 + c_3\,h^2,
$$
with γ > 0 depending on the
dissipativity in (or, more generally, on a one-sided Lipschitz constant
for F̂θ).
Iterating this recursion yields 𝔼∥eN∥H2 ≤ KTh
for a constant KT < ∞
depending on Lip(F̂θ), (α, b), Tr(C), and T. Consequently,
$$
W_2\bigl(\mathcal{L}(u_N),\,\mathcal{L}(\widehat{u}_T)\bigr) \;\le\; K_2\,h^{1/2},
$$
for a constant K2
with the same parameter dependence.
The h1/2 rate here is the natural strong order of Euler–Maruyama for additive-noise stochastic evolution equations under global Lipschitz conditions. Under additional smoothness (e.g. F̂θ ∈ C1 with Lipschitz derivative and suitable trace/Hilbert–Schmidt compatibility conditions ensuring higher-order Itô–Taylor remainders are controlled in H), one can improve the strong rate and obtain W2(ℒ(uN), ℒ(ûT)) ≤ K̃2 h. We do not pursue such refinements, since the end-to-end sampling bound only requires a dimension-free control that vanishes as h → 0.
Finally, we note that an implicit or Crank–Nicolson-type variant may
be preferable numerically when F̂θ contains a
stiff linear component. For instance, splitting F̂θ(u) = −u + Ĝθ(u),
one may treat −u implicitly
and Ĝθ
explicitly:
$$
u_{k+1} = \frac{1}{1+h}u_k + \frac{h}{1+h}\widehat{G}_\theta(u_k) +
\frac{\sqrt{2}}{1+h}(W_{t_{k+1}}-W_{t_k}),
$$
which preserves contractivity properties uniformly in h and yields the same O(h1/2) strong
rate under analogous assumptions. The subsequent analysis is unchanged:
we always reduce W2-control to a
synchronous strong error bound and track constants through dissipativity
and the trace-class noise structure.
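A minimal sketch of this semi-implicit update in a basis diagonalizing C (G_hat is a placeholder for the learned component Ĝθ):

import numpy as np

rng = np.random.default_rng(5)
n, h, N = 64, 1e-1, 1000
lam = 1.0 / np.arange(1, n + 1) ** 2

def G_hat(u):
    return np.zeros_like(u)                    # placeholder learned component

u = rng.standard_normal(n)
for _ in range(N):
    dW = np.sqrt(h * lam) * rng.standard_normal(n)   # increment W_{t_{k+1}} - W_{t_k}
    u = (u + h * G_hat(u) + np.sqrt(2.0) * dW) / (1.0 + h)
# the linear factor 1/(1+h) < 1 is contractive uniformly in h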
We now formalize the "discretize-later" principle for our bounds: when the algorithm is implemented in a finite-dimensional subspace Hn ⊂ H via an orthogonal projection Pn : H → Hn, the constants appearing in the W2-error estimates can be chosen independently of n. The key point is that, under natural consistency assumptions on (Pn, C) and the implementation of the drift oracle, the projected dynamics inherit the same Lipschitz/dissipativity structure and the same trace-class noise control, with no dimension-dependent degradation.
We assume that Pn is the
orthogonal projection onto Hn, hence
nonexpansive:
$$
\|P_n x\|_H \;\le\; \|x\|_H, \qquad x \in H.
$$
We also assume (basis alignment):
$$
P_n C \;=\; C P_n.
$$
This is automatic when Pn is defined by
truncation in an eigenbasis of C, and it is precisely the condition
that allows noise sampling and preconditioning to be consistent across
resolutions.
Define the truncated covariance Cn := PnCPn
as an operator on Hn. By the basis alignment we have
Cn = CPn = PnC
and therefore
$$
\mathrm{Tr}(C_n) \;\le\; \mathrm{Tr}(C),
$$
so any estimate in which the noise enters only through Tr(C) remains valid uniformly in
n.
Let Wt
be a C-Wiener process on H and set Wt(n) := PnWt.
For any x, y ∈ Hn,
𝔼[⟨Wt(n), x⟩H⟨Wt(n), y⟩H] = 𝔼[⟨Wt, x⟩H⟨Wt, y⟩H] = t⟨Cx, y⟩H = t⟨Cnx, y⟩H,
where we used x = Pnx,
y = Pny,
and the basis alignment. Hence Wt(n)
is a Cn-Wiener
process on Hn, and the
projected noise increments satisfy
Wtk + 1(n) − Wtk(n) ∼ 𝒩(0, h Cn), equivalently Pnξk ∼ 𝒩(0, Cn).
In particular, we may and will couple all resolutions {Hn} and the
continuum by using the underlying Wt and then
projecting, which is the natural resolution-consistent coupling.
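This resolution-consistent coupling is straightforward to verify empirically: sampling ξ ∼ 𝒩(0, C) at a fine resolution and truncating modes yields draws distributed exactly as 𝒩(0, Cn). A minimal sketch (spectrum and sizes are illustrative):

import numpy as np

rng = np.random.default_rng(6)
n_fine, n_coarse, M = 256, 32, 20000
lam = 1.0 / np.arange(1, n_fine + 1) ** 2

xi = np.sqrt(lam) * rng.standard_normal((M, n_fine))   # fine-resolution draws of N(0, C)
xi_n = xi[:, :n_coarse]                                # P_n: keep the first n modes
emp_var = xi_n.var(axis=0)
assert np.allclose(emp_var, lam[:n_coarse], rtol=0.1)  # matches diag(C_n)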
Given a drift field G : H → H, we
define its Galerkin version by Gn := PnG.
If G is globally Lipschitz on
H, then by
so Lip(Gn) ≤ Lip(G)
with no dependence on n.
Likewise, if G satisfies the
one-sided dissipativity condition, then for u ∈ Hn,
$$
\bigl\langle G_n(u),\,u\bigr\rangle_H \;=\; \bigl\langle G(u),\,P_n u\bigr\rangle_H \;=\; \bigl\langle G(u),\,u\bigr\rangle_H \;\le\; -\alpha\|u\|_H^2 + b,
$$
so it holds for Gn with the same
(α, b).
Applying these observations to G = F̂θ
shows that the projected perturbed SPDE
$$
du_t^{(n)} \;=\; P_n\widehat{F}_\theta\bigl(u_t^{(n)}\bigr)\,dt + \sqrt{2}\,dW_t^{(n)}
$$
and its Euler–Maruyama discretization
$$
u^{(n)}_{k+1} \;=\; u^{(n)}_k + h\,P_n\widehat{F}_\theta\bigl(u^{(n)}_k\bigr) + \sqrt{2h}\,P_n\xi_k
$$
inherit the same global Lipschitz constant and the same dissipativity
parameters as their continuum counterparts. Consequently, all moment
bounds used in the time-discretization analysis (and hence the constant
K2 above) may be
chosen uniformly in n, with
Tr(Cn) bounded by Tr(C) as above.
The contractivity and stability estimates in W2 are likewise uniform.
Indeed, any bound obtained by synchronous coupling and estimates of the
form
⟨x − y, G(x) − G(y)⟩H ≤ −κ∥x − y∥H2
is preserved under projection on Hn: for x, y ∈ Hn,
⟨x − y, Gn(x) − Gn(y)⟩H = ⟨x − y, Pn(G(x) − G(y))⟩H = ⟨x − y, G(x) − G(y)⟩H.
Thus the same contraction rate κ applies to the projected exact
dynamics (with drift Fn := PnF)
and to the projected perturbed dynamics (with drift F̂θ, n := PnF̂θ).
Finally, the learned-drift error does not worsen under projection: by
nonexpansiveness,
𝔼u ∼ ν∥Fn(u) − F̂θ, n(u)∥H2 = 𝔼u ∼ ν∥Pn(F(u) − F̂θ(u))∥H2 ≤ 𝔼u ∼ ν∥F(u) − F̂θ(u)∥H2 ≤ δ2.
Moreover, since Pn is 1-Lipschitz, W2 is nonexpansive under
pushforward: for any H-valued
random variables X, Y,
$$
W_2\bigl(P_{n\sharp}\mathcal{L}(X),\,P_{n\sharp}\mathcal{L}(Y)\bigr) \;\le\; W_2\bigl(\mathcal{L}(X),\,\mathcal{L}(Y)\bigr).
$$
Combining these facts with the continuum bounds (mixing, drift perturbation
stability, and Euler–Maruyama error) yields the claimed
discretization-uniform estimate for the Galerkin chain targeting Pn♯ν:
W2(ℒ(uN(n)), Pn♯ν) ≤ e−κTW2(ℒ(u0(n)), Pn♯ν) + K(δ + h1/2),
where κ and K depend only on (m, L, Tr(C)) and
on the dissipativity/Lipschitz parameters imposed on F̂θ, but not on
n. This is the sense in which
the sampling guarantee is resolution-invariant: the number of steps required to reach a
prescribed accuracy is controlled by the statistical and temporal errors
(δ, h), and not by
the spatial discretization level.
In many applications the learned drift (or score-induced drift) is
available not only for the target ν, but for a family of
"noised" measures {νt}t ∈ [0, 1] which interpolates between an "easy"
reference at t = 1 and the
desired target at t = 0. The
typical score-based instantiation is a noise scale σ ↦ νσ,
where
$$
\nu_\sigma \;:=\; \mathcal{L}\bigl(u + \sigma\,\xi\bigr),
\qquad u \sim \nu,\ \ \xi \sim \mathcal{N}(0,C)\ \text{independent},
$$
so that νσ
becomes progressively more Gaussian as σ increases. We treat the general
situation in which for each t
we have a strongly log-concave invariant measure νt for a
preconditioned Langevin dynamics with drift
Ft(u) := −(u + C∇Hμ0Φt(u)),
and we have a learned approximation F̂θ, t
satisfying a uniform-in-resolution error control
$$
\mathbb{E}_{u\sim\nu_t}\bigl\|\widehat{F}_{\theta,t}(u)-F_t(u)\bigr\|_H^2 \;\le\; \delta_t^2,
$$
together with Lipschitz/dissipativity assumptions (uniform in t, or at least controlled along the
schedule) sufficient to apply the continuum perturbation and
Euler–Maruyama estimates at each level.
Fix a decreasing schedule 1 = t0 > t1 > ⋯ > tM = 0.
For each level j ∈ {0, …, M − 1}, we run
an Euler–Maruyama chain with step hj and Nj steps
(sampling time τj := Njhj)
targeting νtj + 1:
$$
u^{j}_{k+1} \;=\; u^{j}_k + h_j\,\widehat{F}_{\theta,t_{j+1}}\bigl(u^{j}_k\bigr) + \sqrt{2h_j}\,\xi_k,
\qquad \xi_k \stackrel{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0,C),
$$
initialized by u0j := uNj − 1j − 1
(with u00 given).
Denote by ν̂j := ℒ(uNjj)
the distribution after level j, so the final output law is ν̂M − 1, intended
to approximate νtM = ν0.
Iterating the end-to-end bound level-by-level yields a decomposition
into (i) residual mixing from the initial condition at the first level,
(ii) the accumulated learned-drift and time-discretization errors, and
(iii) a term measuring the Wasserstein distance between consecutive
targets. Concretely, under a uniform contraction rate κ > 0 and a uniform constant
K (depending on the same
structural parameters as before, but independent of any Galerkin
resolution), we obtain
$$
W_2\bigl(\widehat{\nu}_{M-1},\,\nu_0\bigr)
\;\le\;
e^{-\kappa\sum_{j}\tau_j}\,W_2\bigl(\mathcal{L}(u_0^0),\,\nu_{t_1}\bigr)
+\sum_{j=0}^{M-1} e^{-\kappa\sum_{i>j}\tau_i}\,W_2\bigl(\nu_{t_j},\,\nu_{t_{j+1}}\bigr)
+K\sum_{j=0}^{M-1} e^{-\kappa\sum_{i>j}\tau_i}\bigl(\delta_{t_{j+1}}+h_j^{1/2}\bigr).
$$
The middle sum is the annealing error: even if each level were sampled exactly from
νtj + 1,
a finite schedule cannot avoid paying for the cumulative discrepancy
between successive targets, unless additional time is spent to contract
these errors away (as reflected by the exponential weights).
In the convolutional noise model above, we can make this term fully
explicit. Let σ0 > σ1 > ⋯ > σM = 0
and write νσj
in place of νtj.
Coupling U + σjΞ
and U + σj + 1Ξ
with the same pair (U, Ξ)
gives
$$
W_2(\nu_{\sigma_j},\nu_{\sigma_{j+1}})
\le \bigl(\mathbb{E}\|(\sigma_j-\sigma_{j+1})\Xi\|_H^2\bigr)^{1/2}
=|\sigma_j-\sigma_{j+1}|\,\sqrt{\mathrm{Tr}(C)}.
$$
Thus the annealing error is controlled by the discrete total variation
of the noise schedule, scaled by $\sqrt{\mathrm{Tr}(C)}$, with no dependence
on spatial resolution.
To reach a prescribed accuracy ε, we balance the three contributions in the level-by-level bound.
First, we choose per-level sampling times τj so that the
exponential weights make earlier errors negligible, e.g.
$$
e^{-\kappa \tau_j}\approx \frac{1}{M}
\quad\Longleftrightarrow\quad
\tau_j \approx \kappa^{-1}\log M,
$$
or larger at the smallest-noise levels where one typically observes
larger learned error δσ.
Second, we choose the number of levels M and schedule increments so that
∑jW2(νtj, νtj + 1) ≲ ε.
In the convolutional case, it suffices to enforce
$$
\sum_{j=0}^{M-1} |\sigma_j-\sigma_{j+1}|\,\sqrt{\mathrm{Tr}(C)} \lesssim
\varepsilon,
$$
so a simple choice is a geometric ladder σj + 1 = γσj
with γ ∈ (0, 1) and σ0 fixed large enough
that νσ0
is easy to initialize. Then ∑j|σj − σj + 1| = σ0,
and increasing M primarily
improves the annealing term by allowing contraction between closer
consecutive targets.
Third, we select hj to control the discretization error hj1/2 (or hj under higher regularity) and to satisfy the stability constraint hj ≤ h0 required by the Euler–Maruyama analysis. If Lipschitz constants improve at larger noise (as is typical for DDO/NCSN training), one may take hj larger for large σj and decrease hj as σj ↓ 0, while keeping hj ≲ ε2 if the h1/2 rate is the operative one.
All statements above remain valid for the Galerkin-projected implementation, with the same constants, by combining them with the uniform-in-n arguments of the preceding sections.
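As a concrete illustration of the schedule choices above, the following sketch builds a geometric noise ladder and evaluates its annealing budget Σj|σj − σj+1|·√Tr(C) (the values of σ0, γ, M, and the spectrum are illustrative):

import numpy as np

sigma0, gamma, M = 2.0, 0.8, 20
lam = 1.0 / np.arange(1, 129) ** 2                 # trace-class spectrum of C
sigmas = sigma0 * gamma ** np.arange(M + 1)        # geometric ladder sigma_{j+1} = gamma * sigma_j
sigmas[-1] = 0.0                                   # final level targets nu itself
increments = np.abs(np.diff(sigmas))
annealing_budget = increments.sum() * np.sqrt(lam.sum())   # sum |dsigma| * sqrt(Tr C)
assert np.isclose(increments.sum(), sigma0)        # telescoping: total variation = sigma_0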
We record practical considerations for implementing the sampler and for empirically assessing the quantities that enter the dimension-free bound, with particular emphasis on estimating the drift error level δt and on verifying that performance does not degrade as the Galerkin resolution increases.
The bound depends on an L2(νt; H) quantity, so the drift error δt should be estimated in the same H-norm that enters the analysis. To preserve the commutation PnC = CPn and obtain resolution-invariant constants in practice, we implement in a basis {ei}i ≥ 1 diagonalizing C, i.e. Cei = λiei, and represent $u=\sum_{i=1}^n u_i e_i$. Then ξ ∼ 𝒩(0, C) is sampled by $\xi_i=\sqrt{\lambda_i}z_i$ with $z_i\stackrel{i.i.d.}{\sim}\mathcal{N}(0,1)$, and $P_n\xi=\sum_{i=1}^n \xi_i e_i$. When H = L2(D) and C is stationary, this is naturally realized by FFT-based sampling; otherwise, we precompute a truncated Karhunen–Loève expansion. We approximate ∥ ⋅ ∥H consistently with the basis (or with quadrature weights on a grid) and report all errors in this norm to avoid spurious resolution effects.
Beyond monitoring observables, we emphasize diagnostics aligned with the proof mechanism (contraction plus perturbation). We recommend: (i) coupled-chain contraction tests, where two chains started from different initializations but driven by the same Gaussian noises are used to estimate empirical decay of 𝔼∥uk − vk∥H2 and hence an effective κ; (ii) mode-wise marginal checks, comparing histograms and second moments of ⟨u, ei⟩H for the first r modes across resolutions; (iii) energy monitoring, tracking Φ(uk) (when computable) and ∥uk∥H2 to detect step-size instability; and (iv) projected Wasserstein diagnostics, computing W2 on a fixed low-dimensional projection (first r modes) via standard finite-dimensional solvers, which yields a resolution-comparable metric even when full W2 is impractical.
We consider a non-Gaussian target on H defined as a mixture of two
Gaussian measures with shared covariance C,
$$
\nu := \tfrac12 \mathcal{N}(m_1,C)+\tfrac12 \mathcal{N}(m_2,C),
$$
with m1, m2 ∈ Hμ0
supported on low frequencies. This target intentionally violates strong
log-concavity, and therefore serves to probe failure modes while keeping
the score analytically tractable (the mixture score can be evaluated in
closed form on each Hn). We train
F̂θ, σ
from the corruption model νσ and evaluate:
(i) how δ̂σ, n
varies with n; (ii) how many
steps N are needed for the
projected W2 error
on the first r modes to reach
a fixed tolerance; and (iii) whether empirical contraction deteriorates
with n. The expected outcome,
if the method is resolution-consistent, is that the required N is stable in n until representation error
(insufficient network capacity for high frequencies) becomes
dominant.
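For completeness, the closed-form score used in this benchmark: for ν = ½𝒩(m1, C) + ½𝒩(m2, C) with shared covariance, the full score is −C−1(u − m̄(u)), where m̄(u) is the responsibility-weighted mean of the component means. A minimal sketch in the diagonal basis (means and spectrum are illustrative):

import numpy as np

n = 64
lam = 1.0 / np.arange(1, n + 1) ** 2
m1 = np.zeros(n); m1[:4] = 1.0                 # means supported on low frequencies
m2 = -m1

def mixture_score(u):
    # responsibilities from component log-densities (shared covariance C)
    logp = np.array([-0.5 * np.sum((u - m) ** 2 / lam) for m in (m1, m2)])
    w = np.exp(logp - logp.max()); w /= w.sum()
    m_bar = w[0] * m1 + w[1] * m2
    return -(u - m_bar) / lam                  # -C^{-1}(u - m_bar), C diagonal

u = np.sqrt(lam) * np.random.default_rng(7).standard_normal(n)
s = mixture_score(u)                           # evaluable on any H_n, uniformly in n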
We consider an elliptic inverse problem with prior μ0 = 𝒩(0, C) on
the log-permeability field u,
and observations y = G(u) + η,
η ∼ 𝒩(0, Γ). The
posterior satisfies ν ≪ μ0
with
$$
\Phi(u)=\tfrac12\bigl\|\Gamma^{-1/2}(G(u)-y)\bigr\|^2.
$$
We obtain reference samples on a moderate discretization n⋆ using a standard
function-space MCMC method (e.g. pCN) and train F̂θ, σ
using the trace-class corruption νσ. We then run
the learned sampler on resolutions n ∈ {n⋆, 2n⋆, 4n⋆}
without retraining and compare: (i) posterior predictive statistics
(pushforwards through G and
sensor predictions), (ii) low-mode marginals against the reference chain
projected to the same modes, and (iii) the dependence of effective
sample size per unit compute on n. The central empirical claim
corresponding to the theory is that, once δσ is controlled
uniformly, the iteration complexity needed to stabilize these
diagnostics should not increase with n, up to the expected Õ(n log n) per-step
cost.
Our analysis is intentionally confined to a regime in which three ingredients interact cleanly: (i) trace-class, strictly positive noise with a fixed covariance operator C; (ii) strong log-concavity of ν along Hμ0, encoded by m-strong monotonicity and L-Lipschitzness of ∇Hμ0Φ; and (iii) a learned drift error controlled in an L2(ν; H) sense by δ. These assumptions are sufficient to make the proof mechanism essentially one-dimensional: synchronous coupling yields contraction, perturbation yields an additive bias O(δ), and Euler–Maruyama yields a discretization bias O(h1/2). The benefit is that the resulting constants are independent of Galerkin resolution, but the scope is correspondingly limited.
A first limitation is nonconvexity (or multimodality). If Φ is not strongly convex along the
Cameron–Martin directions, we generally lose global contractivity in
W2 for the
continuum semigroup and therefore lose the straightforward mixing term
e−κT.
Even when the invariant measure exists and the SPDE is well-posed, the
relevant convergence mechanism can be metastable and dominated by
barrier-crossing times, for which dimension-free bounds are
substantially more delicate. One can still pursue bounds of the
form
W2(ℒ(uT), ν) ≤ Mix(T; V) + Bias(δ, h),
where Mix(T; V) is
controlled via a Lyapunov function V and a minorization condition on
suitable sublevel sets. However, establishing such conditions in
function space, and doing so uniformly over Pn, typically
requires additional structure (e.g. a globally dissipative drift plus a
noise that is sufficiently nondegenerate on a "small set"). In the
nonconvex setting, we therefore view the present result as giving a
baseline: it characterizes when one can expect resolution-invariant
behavior, and it isolates precisely which part of the argument fails
when contractivity is absent.
A second limitation concerns small-noise or singular-noise regimes. Our setting assumes C is trace-class and fixed as the resolution increases, so ξ ∼ 𝒩(0, C) is H-valued almost surely and Tr(C) < ∞ enters the constants. If one instead considers ``whiter’’ noise (e.g. covariance approaching the identity on finer grids), then Tr(C) may diverge with n, and both the implementation and the bounds can degrade. This is not merely technical: if the corruption/noise injects energy at arbitrarily high frequencies, then learning a drift in an H-norm that remains uniformly accurate becomes information-theoretically and computationally harder, and the error level δ typically grows with resolution. This small-noise singularity also arises in annealing limits where one lets the effective diffusion scale go to zero; in such limits, the bias term induced by δ is amplified unless δ simultaneously vanishes at a compatible rate.
A third limitation is that the learning assumption is expressed under
the law ν:
𝔼u ∼ ν∥F(u) − F̂θ(u)∥H2 ≤ δ2,
whereas the sampler operates out of stationarity during burn-in and may
explore regions under its own transient law. In strongly contractive
regimes this mismatch is mitigated, but, strictly speaking, an
off-distribution control (e.g. uniform Lipschitzness of F̂θ and a
weighted L2 bound
under a Lyapunov measure) would be more satisfactory. This suggests an
open direction: replace the L2(ν; H)
error by a stability notion that can be verified from training data and
that is robust under the sampling dynamics, for example a bound of the
form
supρ ∈ 𝒫2(H): ρ(V) ≤ M 𝔼u ∼ ρ∥F(u) − F̂θ(u)∥H2 ≤ δM2,
for a coercive V. Proving an
end-to-end bound with δM would better
reflect practical training and would more naturally accommodate
non-log-concave targets.
It is also useful to place the present formulation among other
infinite-dimensional diffusion constructions. The SPDE
$$
du_t\ =\ -(u_t+C\nabla_{H_{\mu_0}}\Phi(u_t))\,dt+\sqrt{2}\,dW_t
$$
may be viewed as an overdamped Langevin dynamics preconditioned by the
prior covariance, with Wt a C-Wiener process; this is closely
related to function-space MALA and to the idea that the prior should set
the geometry of the proposal. From this perspective, our
resolution-invariant estimate can be read as a quantitative stability
result for a μ0-reversible diffusion
under simultaneous drift approximation and time discretization. Other
diffusion-based approaches (e.g. Schrödinger bridge formulations or
score-based reverse-time SDEs) can also be posed on Hilbert spaces, but
they often require either time-inhomogeneous drifts or additional
regularity to justify changes of measure. An interesting conceptual link
is that, in all cases, the trace-class nature of the noise is the key
property that prevents high-frequency divergences and makes the
continuum-first viewpoint meaningful.
Finally, we highlight two open directions that we expect to be central in applications. The first is to move beyond strong log-concavity while retaining resolution-invariant control, potentially by combining (a) contractivity on low modes, (b) coercivity of the prior on high modes, and (c) localized minorization driven by the nondegenerate directions of C. The second is to develop adaptive or learned preconditioners. Replacing C by an operator K (possibly state-dependent) could accelerate mixing, but it changes both the invariant measure and the discretization stability. Even in the linear case, choosing K to match posterior curvature resembles operator preconditioning; in the learned setting, one may attempt to parameterize K and the drift jointly while enforcing that the resulting diffusion preserves ν. Establishing dimension-free guarantees in such adaptive geometries appears feasible in principle, but it will require a careful extension of the contraction argument to nonconstant metrics and a corresponding notion of drift error aligned with the new geometry.