Probabilistic programming systems in current use are rarely deployed with a semantics that is both fully exact and fully preserved by the surrounding toolchain. Inference is typically approximate (variational objectives, sequential Monte Carlo with resampling, MCMC with finite burn-in), compilation passes may replace subroutines by faster surrogates, and implementations rely on finite precision and randomized kernels whose idealized counterparts are not literally realized. For these reasons, a notion of equivalence that is strictly extensional (equality of denotations) is often too brittle to support program reasoning as practiced: small, localized approximations should induce small changes in observable behavior, and the relevant mathematical problem is to control such changes uniformly over program contexts. We therefore regard robust, quantitative observational comparison as a primary semantic interface: the semantics should provide a metric in which common transformations are nonexpansive, and in which convergence of approximations implies convergence of induced observations.
Random graph models provide a particularly clear setting in which to make this point. A typical program samples a finite number of vertices and queries adjacency relations between sampled vertices, possibly combining these queries with additional probabilistic choices and deterministic control. Even in this restricted fragment, exact contextual equivalence is subtle: two models may agree on all finite observations while differing on sets of measure zero, and syntactic equality of generators is far from informative. The work we build on develops an exact classification theorem for the undirected simple graph interface, showing that distributive Markov semantics for such programs correspond to graphons, unique up to the usual notion of weak isomorphism. In that picture, the semantic content of a theory is not a particular measurable function W : [0, 1]2 → [0, 1] but its equivalence class under measure-preserving relabelings, exactly matching the standard representation of dense exchangeable random graphs. This identification explains why graphon geometry is the natural ambient structure for program equivalence in the dense regime.
The present work takes the same classification result as a starting point and asks for a quantitative refinement compatible with program observation. Our guiding principle is that the semantic distance between two theories should be defined by a family of tests that are themselves programs, and that are stratified by the amount of sampling they perform. Concretely, we test theories using closed boolean programs that sample at most k vertices via new and then query edge among the sampled vertices. Such tests are operationally meaningful—they correspond to experiments that inspect a bounded number of sampled vertices—and they align directly with the statistics used throughout graph limit theory. We measure discrepancies between test outcomes in total variation, so that postprocessing (deterministic or stochastic) is automatically nonexpansive and so that the resulting distances admit a direct probabilistic interpretation as the maximal distinguishing advantage of a bounded experiment.
The resulting hierarchy dk of observational distances and its aggregate dprog serve two simultaneous purposes. First, they provide a robust, program-defined metric on theories, suitable for reasoning about approximate implementations and approximate compilation. Second, they connect the semantics of probabilistic programs to the established analytic notions on graphons, notably induced subgraph densities and the cut metric. This connection is not merely qualitative: the basic tests realized by k-bounded programs generate exactly the same σ-algebra of finite observations as induced k-vertex subgraph events, and the standard stability bounds for subgraph counts yield explicit Lipschitz control of program observables by cut distance. Conversely, convergence in program distance implies convergence of all finite induced laws, and hence (by classical compactness and separation arguments in dense graph limits) convergence in cut distance up to equivalence.
The net effect of our contributions is that random graph theories acquire a canonical, implementation-relevant notion of approximate equivalence: two theories are close exactly when no bounded sampling experiment can distinguish them with significant advantage, and this closeness is equivalent (in the topological sense) to closeness in the standard cut metric on graphons. The remainder of the paper develops the necessary preliminaries and then proves the stated identifications and continuity properties. Our emphasis throughout is that the metric structure arises from the programming interface itself, while the graphon correspondence provides the analytic tools needed to relate these program metrics to established notions in dense graph limit theory.
We collect the analytic and probabilistic background used throughout. Our two main ambient structures are the category FinStoch, in which program denotations on numeral types live, and the space of graphons equipped with the cut metric, which governs dense exchangeable random graphs.
We take FinStoch to have objects the natural numbers n ∈ ℕ, viewed as finite sets [n] = {1, …, n}. A morphism M : m → n is a row-stochastic matrix M ∈ [0, 1]m × n with $\sum_{j=1}^n M_{ij}=1$ for each i ∈ [m]. Composition is matrix multiplication: for K : n → r, $(K\circ M)_{i\ell}=\sum_{j=1}^n M_{ij}K_{j\ell}$. A deterministic map f : [m] → [n] is identified with the stochastic matrix δf given by (δf)ij = 1 if f(i) = j and 0 otherwise. A morphism p : 1 → n is equivalently a probability distribution on [n]; in particular, programs 1 → 2 denote Bernoulli laws.
For distributions p, q : 1 → n we
use the total variation metric
$$
d_{\mathrm{TV}}(p,q)\;:=\;\frac12\sum_{j=1}^n |p(j)-q(j)|,
$$
which also admits the equivalent variational characterization dTV(p, q) = supA ⊆ [n]|p(A) − q(A)|.
For stochastic matrices M, N : m → n
we equip each hom-set with the induced sup-row metric
$$
d_{\mathrm{TV}}(M,N)\;:=\;\max_{i\in[m]}
d_{\mathrm{TV}}\big(M(i,-),N(i,-)\big)
\;=\;\max_{i\in[m]}\frac12\sum_{j=1}^n|M_{ij}-N_{ij}|.
$$
This choice is convenient for program semantics because it is precisely
the worst-case distinguishing advantage over inputs i ∈ [m], and because it
interacts well with composition.
We shall repeatedly use the standard nonexpansiveness properties of
TV under stochastic postprocessing. Concretely, if K : n → r is
stochastic, then for each fixed row i the map p ↦ pK is an
affine map on the simplex, and TV is contractive under such Markov
kernels. In matrix form, this yields
$$
d_{\mathrm{TV}}(K\circ M,\,K\circ N)\;\le\;d_{\mathrm{TV}}(M,N).
$$
Similarly, if f : [ℓ] → [m] is
deterministic, then precomposition simply selects rows, hence
dTV(M ∘ δf, N ∘ δf) ≤ dTV(M, N).
We will use these contraction properties to justify that observational distances defined by suprema
of TV discrepancies behave well under program contexts.
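As a minimal numerical sketch of the sup-row TV metric and the contraction property just stated: the function names below (tv_rows, random_stochastic) are ours and purely illustrative.

```python
# Sketch: sup-row total variation on FinStoch hom-sets and its contraction
# under stochastic postprocessing (data processing). Illustrative names only.
import numpy as np

rng = np.random.default_rng(0)

def tv_rows(M, N):
    """Sup-row TV between row-stochastic matrices M, N : m -> n."""
    return 0.5 * np.abs(M - N).sum(axis=1).max()

def random_stochastic(m, n):
    A = rng.random((m, n))
    return A / A.sum(axis=1, keepdims=True)

M, N = random_stochastic(3, 5), random_stochastic(3, 5)
K = random_stochastic(5, 4)  # postprocessing kernel K : n -> r

# In the row-stochastic convention used here, K ∘ M is the product M @ K.
assert tv_rows(M @ K, N @ K) <= tv_rows(M, N) + 1e-12
print(tv_rows(M, N), tv_rows(M @ K, N @ K))
```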
A (dense) graphon is a symmetric measurable function W : [0, 1]2 → [0, 1],
with W(x, y) = W(y, x)
a.e. Given W, we obtain an
exchangeable random graph on vertex set [k] by sampling i.i.d. latent
variables U1, …, Uk ∼ Unif[0, 1]
and then, conditional on (Ui), sampling
edges {i, j}
independently with
ℙ({i, j} ∈ E ∣ U1, …, Uk) = W(Ui, Uj) (1 ≤ i < j ≤ k).
This defines a distribution on 𝒢k, the set of simple
graphs on [k]. For F ∈ 𝒢k, the
induced subgraph density is the probability of seeing exactly F under this procedure:
$$
t_{\mathrm{ind}}(F,W)\;:=\;\int_{[0,1]^k}\ \prod_{\{i,j\}\in E(F)} W(x_i,x_j)\prod_{\{i,j\}\notin E(F)} \bigl(1-W(x_i,x_j)\bigr)\, dx_1\cdots dx_k.
$$
Equivalently, (tind(F, W))F ∈ 𝒢k
is the law of the k-vertex
labeled graph sampled from W;
we will often regard these numbers as the canonical finite observables
of W.
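The sampling procedure and the reading of induced densities as acceptance probabilities can be made concrete by a Monte Carlo sketch; it assumes the graphon is given as a Python function, and all names are ours.

```python
# Sketch: sample the k-vertex exchangeable random graph from W and estimate
# t_ind(F, W) as the probability of seeing exactly F. Illustrative code only.
import itertools
import numpy as np

rng = np.random.default_rng(1)

def sample_graph(W, k):
    """One draw of the k-vertex random graph; returns its edge set."""
    U = rng.random(k)  # latent U_1, ..., U_k ~ Unif[0,1]
    return frozenset(
        (i, j)
        for i, j in itertools.combinations(range(k), 2)
        if rng.random() < W(U[i], U[j])  # edge {i,j} w.p. W(U_i, U_j)
    )

def t_ind_mc(W, F_edges, k, n_samples=100_000):
    """Estimate t_ind(F, W) = P(sampled labeled graph equals F exactly)."""
    F = frozenset(F_edges)
    return sum(sample_graph(W, k) == F for _ in range(n_samples)) / n_samples

W_half = lambda x, y: 0.5
# Triangle K_3 on [3]: the induced density is (1/2)^3 = 0.125.
print(t_ind_mc(W_half, {(0, 1), (0, 2), (1, 2)}, k=3))
```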
Graphons are considered up to measure-preserving relabeling. If φ : [0, 1] → [0, 1] is measure-preserving, we write Wφ(x, y) := W(φ(x), φ(y)). Such relabelings do not change the sampled graph law, hence leave all quantities tind(F, W) invariant.
We measure graphon proximity using the cut norm
∥W − U∥▫ := supS, T ⊆ [0, 1]|∫S × T(W(x, y) − U(x, y)) dx dy|,
where S, T range over
measurable sets. The cut distance on equivalence classes is defined
by
δ▫(W, U) := infφ ∥Wφ − U∥▫,
where φ ranges over
measure-preserving bijections (or, equivalently, measure-preserving maps
in the usual completion). This is a pseudometric on concrete
representatives and descends to a metric on weak isomorphism classes; we
use it as the standard topology on dense graph limits.
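For step graphons the cut norm can be computed exactly: the integrand is constant on cells of the product partition, so the supremum over measurable S, T is attained at unions of parts, and brute force over subsets suffices for small part counts. A minimal sketch (cut_norm_step is our own naming):

```python
# Sketch: exact cut norm of a step kernel D given by block values D[a][b]
# on parts of measures mu[a]; brute force over unions of parts is exact here.
from itertools import product
import numpy as np

def cut_norm_step(D, mu):
    D, mu = np.asarray(D, float), np.asarray(mu, float)
    B = mu[:, None] * mu[None, :] * D  # cell masses mu_a * mu_b * D_ab
    r = len(mu)
    best = 0.0
    for s in product([0, 1], repeat=r):      # S = union of parts with s_a = 1
        for t in product([0, 1], repeat=r):  # T likewise
            best = max(best, abs(np.array(s) @ B @ np.array(t)))
    return best

print(cut_norm_step([[0.3, -0.2], [-0.2, 0.1]], [0.5, 0.5]))  # -> 0.075
```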
We will use the fact that cut proximity controls finite subgraph
statistics. The relevant inequalities are classical; we record a
convenient induced variant with explicit (though not optimized)
constants. Let k ≥ 1 and F ∈ 𝒢k, and
write $m=\binom{k}{2}$. Expanding the
integrand in the definition of tind as a multilinear polynomial in the m variables {W(xi, xj)}i < j
and bounding termwise, one obtains
$$
|t_{\mathrm{ind}}(F,W)-t_{\mathrm{ind}}(F,U)|\;\le\;2^m\, m\, \|W-U\|_\square.
$$
A standard route to this bound is to first bound differences of non-induced
homomorphism densities by |E(H)| ∥W − U∥▫ for each supergraph H of F,
and then express induced densities as alternating sums of homomorphism
densities over such supergraphs, producing the factor $2^m$. In particular, after
optimizing over relabelings we get the cut-distance form
$$
|t_{\mathrm{ind}}(F,W)-t_{\mathrm{ind}}(F,U)|\;\le\;2^m\, m\,\delta_\square(W,U).
$$
We will only need the qualitative content that, for fixed k, all induced k-vertex statistics vary Lipschitzly
with δ▫, with
constants depending solely on k.
The converse direction—that agreement of all finite induced statistics forces δ▫ = 0—is also standard in dense graph limit theory: the family {tind(F, −)}F separates graphon equivalence classes, and the cut-metric space is compact. We will appeal to these facts when relating program-defined distances to the usual graphon topology.
Our goal is to compare two random-graph theories Ψ, Φ : 𝒰ℕ → FinStoch by the observable behavior of numeral-typed programs. Since 𝒰ℕ is purely probabilistic/combinatorial (objects are numerals n, morphisms are programs with n-valued outputs), it is natural to equip each hom-set FinStoch(m, n) with a metric that measures statistical distinguishability. We take total variation, as recalled in the preliminaries, and regard it as an observation metric rather than merely a property of distributions.
Fix m, n ∈ ℕ. For
M, N : m → n
in FinStoch, recall
$$
d_{\mathrm{TV}}(M,N)\;=\;\max_{i\in[m]}\frac12\sum_{j\in[n]}|M_{ij}-N_{ij}|.
$$
This choice admits an explicit ``testing’’ interpretation: a
deterministic predicate on [n]
is a function b : [n] → [2], identified
with δb : n → 2
in FinStoch. Postcomposing M : m → n with
δb yields
the induced Bernoulli experiment δb ∘ M : m → 2.
Using the variational characterization of TV rowwise, we have the
equivalence
$$
d_{\mathrm{TV}}(M,N)\;=\;\max_{i\in[m]}\;\max_{b:[n]\to[2]}\;\bigl|(\delta_b\circ M)_{i1}-(\delta_b\circ N)_{i1}\bigr|,
$$
i.e. dTV(M, N)
is the maximum distinguishing advantage of any (deterministic) yes/no
observation applied to the output of M versus N, in the worst case over inputs
i ∈ [m].
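The optimal deterministic test is explicit: accept exactly on A* = {j : p(j) > q(j)}. A short numerical check (names are ours):

```python
# Sketch: the best deterministic yes/no test between two distributions on [n]
# accepts on {j : p(j) > q(j)}, and its advantage equals d_TV(p, q).
import numpy as np

rng = np.random.default_rng(2)

def tv(p, q):
    return 0.5 * np.abs(p - q).sum()

def best_test_advantage(p, q):
    A = p > q  # optimal acceptance set A*
    return abs(p[A].sum() - q[A].sum())

p, q = rng.dirichlet(np.ones(6)), rng.dirichlet(np.ones(6))
assert np.isclose(tv(p, q), best_test_advantage(p, q))
print(tv(p, q))
```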
Moreover, TV is stable under program contexts: if K : n → r is any stochastic postprocessing, then dTV(K ∘ M, K ∘ N) ≤ dTV(M, N), and similarly precomposition by deterministic maps is nonexpansive. We will use these contraction properties to justify that distances defined by taking suprema of dTV-discrepancies are robust under adding surrounding code.
Let Ψ : 𝒰ℕ → FinStoch be a
theory. For each program p : m → n in 𝒰ℕ we write Ψ⟦p⟧ ∈ FinStoch(m, n)
for its denotation. In particular, if p : 1 → n is closed then
Ψ⟦p⟧ is a
distribution on [n]. For
boolean outputs p : 1 → 2, the
total variation distance reduces to absolute difference of acceptance
probabilities:
dTV(Ψ⟦p⟧, Φ⟦p⟧) = |Ψ⟦p⟧(1) − Φ⟦p⟧(1)|.
Thus boolean programs are canonical statistical tests for distinguishing
theories.
While one could attempt to compare Ψ and Φ by taking a supremum over all boolean programs p : 1 → 2, this is too coarse for two reasons: (i) the space of tests is infinite and syntactically unstructured, and (ii) in the graph interface the only source of unbounded information is the number of vertex samples. We therefore introduce a bounded testing hierarchy aligned with the number of calls to new.
We say that a closed program p : 1 → 2 in 𝒰ℕ is k-bounded if its syntax contains at
most k occurrences of the
vertex-sampling primitive new. Intuitively, such a program
can only inspect the random graph through the induced subgraph on at
most k sampled vertices,
together with any internal probabilistic choices that do not introduce
additional vertices. We write Progk(1, 2) for the
collection of closed k-bounded
boolean programs (considered modulo the equational theory of 𝒰, so that program rewrites do not change
membership).
The restriction to closed programs 1 → 2 is deliberate: by the variational characterization above, any observable property of an n-valued output can be reduced to a boolean post-test, and by the contraction of TV under postcomposition it suffices to bound boolean observations. In other words, we adopt the standard ``testing semantics’’ stance that observational power is measured by acceptance probabilities of predicates.
Given two theories Ψ, Φ : 𝒰ℕ → FinStoch,
we define
dk(Ψ, Φ) := supp ∈ Progk(1, 2) dTV(Ψ⟦p⟧, Φ⟦p⟧).
Since each denotation Ψ⟦p⟧ : 1 → 2 is a Bernoulli
law, we have 0 ≤ dk(Ψ, Φ) ≤ 1
and dk(Ψ, Φ) = dk(Φ, Ψ).
Monotonicity in the bound is immediate: if k ≤ k′ then
Progk(1, 2) ⊆ Progk′(1, 2)
and hence dk(Ψ, Φ) ≤ dk′(Ψ, Φ).
The point of this definition is that the supremum ranges over a conceptually simple family of experiments: choose up to k vertices using new, query the edge predicate among them as desired, and finally output a bit. The next section makes this intuition precise by proving a normal form in which all new calls are hoisted and the remainder of the program is deterministic in the induced adjacency matrix.
To aggregate all finite observational levels into a single bounded
metric, we define
dprog(Ψ, Φ) := ∑k ≥ 1 2−k min {1, dk(Ψ, Φ)}.
This is a standard weighted construction: each min {1, dk} is a
bounded pseudometric, the weights 2−k ensure convergence,
and the resulting dprog is bounded by 1. Operationally, dprog measures
distinguishability by allowing tests of all finite vertex complexities
but discounting higher complexities exponentially.
Two further remarks clarify why dprog is the appropriate notion of distance for our purposes. First, because min {1, −} and weighted sums preserve the triangle inequality, dprog is a pseudometric on theories; it becomes a genuine metric after quotienting by observational equivalence (i.e. identifying theories that agree on all finite tests). Second, dprog is designed to interact with the graphon correspondence: the subsequent identification of dk with induced k-vertex graph statistics will imply that dprog factors through graphon weak isomorphism classes and is topologically equivalent to cut distance.
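Anticipating the identification of dk with the discrepancy of induced k-vertex laws (Theorem~A below), the following sketch estimates dk for two graphons as the total variation between their induced laws and aggregates a truncated dprog; graphons are assumed given as vectorized Python functions, and all names are illustrative.

```python
# Sketch: estimate d_k as the TV distance between induced k-vertex laws,
# and a truncation of d_prog (the tail beyond K contributes at most 2^-K).
import itertools
import numpy as np

rng = np.random.default_rng(3)

def induced_law(W, k, n_samples=20_000):
    """Estimated law (t_ind(F, W))_F over adjacency patterns on [k]."""
    pairs = list(itertools.combinations(range(k), 2))
    U = rng.random((n_samples, k))
    # Conditional edge probabilities W(U_i, U_j), one column per pair.
    P = (np.column_stack([W(U[:, i], U[:, j]) for i, j in pairs])
         if pairs else np.ones((n_samples, 0)))
    law = np.empty(2 ** len(pairs))
    for code in range(law.size):  # each adjacency pattern; bit b = b-th pair
        bits = [(code >> b) & 1 for b in range(len(pairs))]
        law[code] = np.prod(np.where(bits, P, 1 - P), axis=1).mean()
    return law

def d_k(W1, W2, k):
    return 0.5 * np.abs(induced_law(W1, k) - induced_law(W2, k)).sum()

def d_prog_truncated(W1, W2, K):
    return sum(2.0 ** -k * min(1.0, d_k(W1, W2, k)) for k in range(1, K + 1))

W_half = lambda x, y: np.full_like(x, 0.5)
U_bip  = lambda x, y: ((x < 0.5) != (y < 0.5)).astype(float)
print(d_k(W_half, U_bip, 2))             # ~ 0: equal edge densities
print(d_k(W_half, U_bip, 3))             # > 0: triangles separate them
print(d_prog_truncated(W_half, U_bip, 4))
```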
The definition of dk quantifies
over all closed boolean programs that use at most k calls to new. In order to relate such
programs to classical finite-graph statistics, we isolate the precise
syntactic dependence of a k-bounded program on the random
graph: up to the equations of 𝒰, a
k-bounded program can only
depend on the induced adjacency pattern among the sampled vertices.
Concretely, we show that all occurrences of new can be hoisted to the
outside, after which the remainder factors through the k × k adjacency matrix
(edge(xi, xj))i, j ≤ k.
We begin with a purely structural normalization step: vertex samples
can be pulled outward across the other program constructs. This is an
instance of the usual ``Fubini/let-exchange’’ principles in a
commutative Markov setting, specialized to a closed sampler new : 1 → vertex. Intuitively,
sampling a vertex is independent of all computations that do not
themselves resample vertices; hence we may always reorder and regroup
vertex samples.
For later use it is convenient to allow padding: if r < k we may adjoin unused samples xr + 1, …, xk and ignore them. Thus, without loss of generality, we may work with exactly k samples, at the cost of notational redundancy.
After hoisting, the only way that the sampled vertices x1, …, xk can influence the outcome is through applications of edge. Since edge is deterministic and satisfies the simple undirected axioms (irreflexive and symmetric), its observable content on a k-tuple of vertices is exactly a simple labeled graph on [k], equivalently a symmetric {0, 1}-matrix with zero diagonal.
Fix once and for all an enumeration of unordered pairs
$$
E_k \;:=\;\{\,\{i,j\}\mid 1\le i<j\le k\,\},\qquad
|E_k|=\binom{k}{2}.
$$
We write Ak for the
finite set {0, 1}Ek
of adjacency patterns on [k],
and we identify it with a numeral object of size $|A_k|=2^{\binom{k}{2}}$ in 𝒰ℕ (any fixed encoding suffices).
There is then a canonical deterministic program
adjk : vertexk → Ak
which, given (x1, …, xk),
returns the bit-vector (edge(xi, xj)){i, j} ∈ Ek.
Determinism of edge ensures that adjk is deterministic,
and symmetry/irreflexivity ensure that this vector exactly corresponds
to the induced simple graph on the sampled vertices.
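A concrete encoding of adjk as a numeral may clarify the construction; the enumeration of Ek and the function names below are our own choices.

```python
# Sketch: adj_k maps k sampled vertices to an adjacency pattern in A_k,
# encoded as an integer whose bit b records the b-th pair in E_k.
import itertools

def adj_k(vertices, edge):
    """Deterministic map vertex^k -> A_k; edge is a symmetric, irreflexive oracle."""
    code = 0
    for b, (i, j) in enumerate(itertools.combinations(range(len(vertices)), 2)):
        if edge(vertices[i], vertices[j]):
            code |= 1 << b
    return code

# Example with three vertices and edges {0,1}, {1,2} under a toy oracle:
edges = {frozenset({0, 1}), frozenset({1, 2})}
print(adj_k([0, 1, 2], lambda u, v: frozenset({u, v}) in edges))  # 0b101 = 5
```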
We now show that once new has been hoisted, every
remaining program depends on the vertices only through adjk. The key point is
that in the universal graph interface there are no vertex-valued
observations other than feeding vertices into edge; thus vertices have no
observable structure beyond their induced adjacency relations.
Combining the two hoisting and factorization lemmas above, every closed k-bounded boolean program p : 1 → 2 is equal to one of the
form
p = q̃ ∘ adjk ∘ newk
for some q̃ : Ak → 2.
Here newk denotes
the iterated sampling program that returns k vertices.
For the identification results in the next section, we further refine this normal form by observing that q̃ can be taken deterministic without loss of distinguishing power. Indeed, q̃ is precisely a (possibly randomized) statistical test applied to the finite outcome space Ak. By the variational characterization of total variation (cf. above), the optimal distinguishing advantage between two distributions on Ak is achieved by a deterministic predicate f : Ak → 2, i.e. by choosing a subset S ⊆ Ak and accepting exactly on S. Thus, although our definition of dk allows arbitrary k-bounded programs (including internal randomness after sampling vertices), in computing the supremum we may restrict to programs whose final stage is deterministic in the adjacency pattern.
In summary, the k-bounded
testing hierarchy is equivalent to testing, by deterministic predicates,
the distribution of the induced labeled graph on [k] generated by newk
together with the deterministic edge oracle. The next section exploits
this normal form to identify dk(Ψ, Φ)
with the total variation distance between the induced k-vertex graph laws of WΨ and WΦ, and to
exhibit explicit programs realizing the individual induced subgraph
densities tind(F, ⋅).
We now relate the observational distances dk to the graphon cut metric. Since by Theorem~A we may identify dk(Ψ, Φ) with the maximum discrepancy of induced k-vertex laws, it suffices to bound induced subgraph densities in terms of δ▫.
We recall the standard counting-lemma estimate for (labeled)
homomorphism densities. For a simple graph H on vertex set [k], write
t(H, W) := ∫[0, 1]k ∏{i, j} ∈ E(H)W(xi, xj) dx1⋯dxk.
The following inequality is classical (see e.g. Lovász) and is the
basic quantitative interface between cut norm and subgraph
statistics:
$$
|t(H,W)-t(H,U)|\;\le\;|E(H)|\,\|W-U\|_\square.
$$
For F ∈ 𝒢k, the
induced density tind(F, W)
may be expressed by inclusion–exclusion in terms of the homomorphism
densities t(H, W) as H ranges over supergraphs of F on the same vertex set. Writing
$m=\binom{k}{2}$ and letting Sup(F) denote the family of graphs
H on [k] with E(F) ⊆ E(H) ⊆ E(Kk),
we have
$$
t_{\mathrm{ind}}(F,W)\;=\;\sum_{H\in\mathrm{Sup}(F)} (-1)^{|E(H)|-|E(F)|}\, t(H,W).
$$
Consequently, the counting lemma implies the uniform cut-norm stability
estimate
$$
|t_{\mathrm{ind}}(F,W)-t_{\mathrm{ind}}(F,U)|\;\le\;2^m\, m\, \|W-U\|_\square,
$$
since |Sup(F)| ≤ $2^m$
and |E(H)| ≤ m for all
H ⊆ Kk.
This gives an explicit admissible choice $C_k:=2^{\binom{k}{2}}\binom{k}{2}$ in
Proposition~B.
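The inclusion–exclusion expansion can be checked numerically; the sketch below estimates homomorphism densities by Monte Carlo and recovers an induced density as the alternating sum over supergraphs (all names ours).

```python
# Sketch: t_ind(F, W) as the alternating sum of t(H, W) over supergraphs H
# with E(F) <= E(H) <= E(K_k); densities estimated by Monte Carlo.
import itertools
import numpy as np

rng = np.random.default_rng(4)

def t_hom(W, H_edges, k, n_samples=100_000):
    """Labeled homomorphism density t(H, W)."""
    X = rng.random((n_samples, k))
    vals = np.ones(n_samples)
    for i, j in H_edges:
        vals *= W(X[:, i], X[:, j])
    return vals.mean()

def t_ind_via_ie(W, F_edges, k):
    all_pairs = list(itertools.combinations(range(k), 2))
    extra = [e for e in all_pairs if e not in F_edges]
    total = 0.0
    for r in range(len(extra) + 1):
        for added in itertools.combinations(extra, r):  # supergraph H
            total += (-1) ** r * t_hom(W, list(F_edges) + list(added), k)
    return total

W = lambda x, y: (x + y) / 2  # an illustrative symmetric graphon
print(t_ind_via_ie(W, [(0, 1)], k=3))  # induced density of one edge on [3]
```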
Let Ψ, Φ be
theories with associated graphons WΨ, WΦ.
By Theorem~A and the stability estimate above,
dk(Ψ, Φ) = maxF ∈ 𝒢k|tind(F, WΨ) − tind(F, WΦ)| ≤ Ck ∥WΨ − WΦ∥▫.
Finally, since δ▫(WΨ, WΦ) = infφ∥WΨφ − WΦ∥▫
over measure-preserving relabelings φ, we may apply the same bound to
WΨφ
and take the infimum in φ,
obtaining
dk(Ψ, Φ) ≤ Ck δ▫(WΨ, WΦ).
Fix Ψ and let Φn be any
sequence with δ▫(WΦn, WΨ) → 0.
The inequality above yields dk(Φn, Ψ) → 0
for each fixed k. To pass to
dprog, we use the
summability built into its definition: given ε > 0, choose K so that ∑k > K2−k < ε/2.
Then pick N such that for all
n ≥ N and all 1 ≤ k ≤ K, we have dk(Φn, Ψ) < ε/2
(possible since K is finite).
For such n,
dprog(Φn, Ψ) ≤ ∑k ≤ K2−kdk(Φn, Ψ) + ∑k > K2−k < (ε/2)∑k ≤ K2−k + ε/2 ≤ ε,
so dprog is
continuous in δ▫.
For the converse direction (Theorem~C and Corollary~D), we require that sufficiently many induced densities control cut distance. Qualitatively, this is a standard consequence of (i) compactness of graphon equivalence classes under δ▫ and (ii) separation by the family {tind(F, −)}F (Lemma~5). Quantitatively, one may route this through a weak regularity lemma: for each ε > 0, every graphon W admits a step approximation W(ε) with at most r(ε) parts such that δ▫(W, W(ε)) ≤ ε/3, where r(ε) may be taken on the order of $2^{O(\varepsilon^{-2})}$. The parameters of a step graphon with r parts are a finite list of edge-densities between parts; these may be recovered (up to an error depending on ε) from finitely many induced densities tind(F, ⋅) with |V(F)| ≤ k(ε), for an explicit (but very large) function k(ε) depending on r(ε).
Thus, given ε, there exist
k(ε) and η(ε) > 0 such that
if
maxF ∈ 𝒢k(ε)|tind(F, W) − tind(F, U)| < η(ε),
then δ▫(W, U) < ε.
Translating the left-hand side via Theorem~A, it suffices to require
dk(ε)(Ψ, Φ) < η(ε),
which in turn is implied by dprog(Ψ, Φ)
being small enough since dprog dominates 2−k(ε)min {1, dk(ε)}.
Defining
ω(δ) := inf {ε > 0: δ < 2−k(ε)min {1, η(ε)}},
we obtain a monotone modulus with δ▫(WΨ, WΦ) ≤ ω(dprog(Ψ, Φ)).
As noted earlier, any such explicit ω extracted from weak regularity
will have poor (typically tower-type) growth, and improving it requires
additional structural hypotheses beyond the dense graphon regime.
To control the k-bounded observational discrepancy in terms of the cut geometry, we reduce to the usual analytic bounds relating the cut norm to subgraph statistics. Throughout, we regard graphons W, U : [0, 1]2 → [0, 1] as representatives of their weak isomorphism classes, and we write ∥ ⋅ ∥▫ for the cut norm on [0, 1]2. Since δ▫ is obtained by taking an infimum of cut norms over measure-preserving relabelings, it is enough to derive estimates in ∥ ⋅ ∥▫ and then infimize.
Let H be a simple graph on
vertex set [k] with edge set
E(H) and m := |E(H)|. We
use the labeled homomorphism density
t(H, W) := ∫[0, 1]k ∏{i, j} ∈ E(H)W(xi, xj) dx1⋯dxk.
The next estimate is the standard ``counting lemma’’ in its simplest
form; the dependence on H is
only through the number of edges.
Fix k ≥ 1 and F ∈ 𝒢k. Write
$m:=\binom{k}{2}$ and denote by Sup(F) the family of graphs H on [k] satisfying E(F) ⊆ E(H) ⊆ E(Kk).
The induced density is
tind(F, W) = ∫[0, 1]k ∏{i, j} ∈ E(F)W(xi, xj) ∏{i, j} ∉ E(F)(1 − W(xi, xj)) dx1⋯dxk,
and expanding the ∏(1 − W) by
inclusion–exclusion yields the identity
tind(F, W) = ∑H ∈ Sup(F)(−1)|E(H)|−|E(F)| t(H, W).
Combining this with the counting lemma, we obtain the uniform estimate
$$
|t_{\mathrm{ind}}(F,W)-t_{\mathrm{ind}}(F,U)|\;\le\;2^m\, m\,\|W-U\|_\square.
$$
In particular, we may take $C_k:=2^{\binom{k}{2}}\binom{k}{2}$ in the
stability statement dk ≤ Ck δ▫;
no optimization of Ck is needed for
our purposes, and any explicit Ck depending
only on k suffices.
Let Ψ, Φ : 𝒰ℕ → FinStoch
be theories and let WΨ, WΦ
be their associated graphons. By the identification of k-bounded program tests with induced
k-vertex statistics
(Theorem~A), we have
dk(Ψ, Φ) = maxF ∈ 𝒢k|tind(F, WΨ) − tind(F, WΦ)|.
Applying the induced-density cut-norm bound to each F gives
dk(Ψ, Φ) ≤ Ck ∥WΨ − WΦ∥▫.
To pass from ∥ ⋅ ∥▫ to δ▫, we use that tind(F, ⋅) is
invariant under measure-preserving relabelings: for any
measure-preserving φ, we have
tind(F, WΨ) = tind(F, WΨφ).
Hence the same inequality holds with WΨ replaced by
WΨφ,
and taking the infimum over φ
yields
dk(Ψ, Φ) ≤ Ck δ▫(WΨ, WΦ).
This is Proposition~B.
The definition
dprog(Ψ, Φ) = ∑k ≥ 12−kmin {1, dk(Ψ, Φ)}
builds summability into the observable hierarchy. If δ▫(WΦn, WΨ) → 0,
then Proposition~B implies dk(Φn, Ψ) → 0
for each fixed k. Given ε > 0, choose K so that ∑k > K2−k < ε/2.
Then for n sufficiently large,
we have dk(Φn, Ψ) < ε/2
for all k ≤ K, and
therefore
dprog(Φn, Ψ) ≤ ∑k ≤ K2−kdk(Φn, Ψ) + ∑k > K2−k < (ε/2)∑k ≤ K2−k + ε/2 ≤ ε.
Thus δ▫-convergence
implies dprog-convergence.
For the reverse implication, the qualitative statement is that the family of all induced densities {tind(F, ⋅)}F separates graphon equivalence classes, and the cut-metric space of equivalence classes is compact; consequently, convergence of all induced densities forces convergence in δ▫. In our setting, dprog(Ψn, Ψ) → 0 implies dk(Ψn, Ψ) → 0 for every fixed k, hence convergence of induced k-vertex laws for every k, and therefore convergence of all induced densities.
To obtain a quantitative modulus, we appeal to a weak regularity
lemma. Fix ε ∈ (0, 1]. There
exists a function r(ε) (e.g. r(ε) = $2^{O(\varepsilon^{-2})}$
suffices by Frieze–Kannan) such that every graphon W admits a step graphon W(ε) with at
most r(ε) parts
and
δ▫(W, W(ε)) ≤ ε/3.
A step graphon with r parts is
determined by finitely many parameters (the part measures and the r × r edge-density matrix).
For fixed r, these parameters
can be estimated, with an error tolerance depending on ε, from finitely many induced
densities on at most k(ε) vertices, for some
explicit (though rapidly growing) k(ε) depending on r(ε). Consequently, there
exists η(ε) > 0
such that whenever
maxF ∈ 𝒢k(ε)|tind(F, W) − tind(F, U)| < η(ε),
we can couple the step approximants W(ε) and U(ε) so that
δ▫(W(ε), U(ε)) ≤ ε/3,
and hence by the triangle inequality δ▫(W, U) < ε.
Translating the hypothesis through Theorem~A, the displayed maximum
equals dk(ε)(Ψ, Φ)
for the corresponding theories. Moreover, since
dprog(Ψ, Φ) ≥ 2−k(ε)min {1, dk(ε)(Ψ, Φ)},
it suffices to require dprog(Ψ, Φ) < 2−k(ε)η(ε)
(and η(ε) ≤ 1 if
desired). We therefore obtain a monotone modulus by setting
ω(δ) := inf {ε ∈ (0, 1]: δ < 2−k(ε)min {1, η(ε)}},
which ensures δ▫(WΨ, WΦ) ≤ ω(dprog(Ψ, Φ)).
The dependence hidden in k(ε) and η(ε) is necessarily poor
when derived solely from weak regularity; sharpening ω is a separate quantitative problem
rather than a conceptual obstruction.
The family {dk}k ≥ 1 is intentionally stratified: fixing k restricts us to observables expressible by sampling at most k vertices and then applying a deterministic Boolean predicate to the induced adjacency matrix. Consequently, for any fixed k, the pseudometric dk is strictly weaker than cut distance and does not determine the underlying graphon up to weak isomorphism. We record concrete instances of this phenomenon and briefly indicate where the constants in the quantitative comparison necessarily deteriorate.
For k = 2 we have $\mathcal G_2=\{\overline{K_2},K_2\}$, hence
the induced 2-vertex law is determined
by a single parameter, the edge density
ρ(W) := ∫[0, 1]2W(x, y) dx dy.
Indeed tind(K2, W) = ρ(W)
and $t_{\mathrm{ind}}(\overline{K_2},W)=1-\rho(W)$,
so
d2(Ψ, Φ) = max {|ρ(WΨ) − ρ(WΦ)|, |(1 − ρ(WΨ)) − (1 − ρ(WΦ))|} = |ρ(WΨ) − ρ(WΦ)|.
Thus any two theories/graphons with the same overall edge density are
indistinguishable by 2-bounded tests,
even if their higher-order structure is far apart.
Fix α ∈ (0, 1) and let
Wα be the
constant graphon Wα ≡ α.
Let U be the balanced complete
bipartite graphon given by the partition [0, 1] = A ⊔ B with λ(A) = λ(B) = 1/2
and
$$
U(x,y)\;=\;\begin{cases}
0,& (x,y)\in A^2\cup B^2,\\
1,& (x,y)\in A\times B\cup B\times A.
\end{cases}
$$
Then ρ(W1/2) = ρ(U) = 1/2,
so d2(W1/2, U) = 0.
However, the triangle density separates them:
$$
t(K_3,W_{1/2})=\Bigl(\frac12\Bigr)^3=\frac18,
\qquad
t(K_3,U)=0,
$$
hence d3(W1/2, U) ≥ |tind(K3, W1/2) − tind(K3, U)| = 1/8.
In program terms, there exists a 3-bounded Boolean program that samples three
vertices and returns exactly when they form a triangle; its acceptance
probability differs by 1/8 under these
two theories despite complete agreement on all 2-vertex experiments.
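This experiment is easy to simulate; the sketch below averages the conditional triangle probability over the latents and exhibits the 1/8 gap (names ours).

```python
# Sketch: the 3-bounded triangle test. Sample three latents; accept iff all
# three edges are present. The acceptance gap between the two models is 1/8.
import numpy as np

rng = np.random.default_rng(5)

def triangle_accept_prob(W, n_samples=400_000):
    X = rng.random((n_samples, 3))
    p01, p02, p12 = (W(X[:, 0], X[:, 1]), W(X[:, 0], X[:, 2]),
                     W(X[:, 1], X[:, 2]))
    # P(all three edges | latents), averaged over the latents.
    return (p01 * p02 * p12).mean()

W_half = lambda x, y: np.full_like(x, 0.5)
U_bip  = lambda x, y: ((x < 0.5) != (y < 0.5)).astype(float)
print(triangle_accept_prob(W_half) - triangle_accept_prob(U_bip))  # -> 0.125
```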
The previous example already shows that d2 = 0 does not imply
small cut distance. In fact δ▫(W1/2, U)
is bounded below by an absolute constant. Writing D := U − W1/2,
we have D = +1/2 on A × B ∪ B × A
and D = −1/2 on A2 ∪ B2.
Taking S = T = A
gives
$$
\Bigl|\int_{S\times T} D\Bigr|
=\Bigl|\int_{A\times A} \Bigl(-\frac12\Bigr)\Bigr|
=\frac12\cdot \lambda(A)^2
=\frac18,
$$
so ∥U − W1/2∥▫ ≥ 1/8
and therefore δ▫(W1/2, U) ≥ 1/8
as well. Hence no inequality of the form δ▫ ≤ f(d2)
can hold with f(δ) → 0 as δ → 0; any modulus necessarily
requires access to all k,
i.e. to dprog.
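Numerically, the brute-force step-graphon routine cut_norm_step sketched in the preliminaries confirms the bound, and in fact shows it is tight for this pair (recall that for step kernels the optimum is attained at unions of parts):

```python
# Sketch: D = U - W_{1/2} is a step kernel on parts A, B of measure 1/2;
# brute force over unions of parts gives the exact cut norm.
D = [[-0.5, 0.5], [0.5, -0.5]]       # block values of U - W_{1/2}
print(cut_norm_step(D, [0.5, 0.5]))  # -> 0.125, so the 1/8 bound is tight
```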
More generally, for a fixed k the finite vector
(tind(F, W))F ∈ 𝒢k
cannot separate graphon equivalence classes: the space of graphons
(modulo weak isomorphism) is infinite-dimensional, while 𝒢k is finite. Concretely,
already within the class of step graphons with a moderate number of
parts, one can vary the parameters (part measures and edge
probabilities) along nontrivial directions while preserving the finitely
many constraints tind(F, ⋅) for
F ∈ 𝒢k.
Such perturbations yield distinct graphons W ≆ U with tind(F, W) = tind(F, U)
for all F ∈ 𝒢k, hence
dk(W, U) = 0,
yet typically tind(F′, W) ≠ tind(F′, U)
for some F′ ∈ 𝒢k + 1.
This is the precise sense in which bounded testing is a strictly
incomplete observational principle: k-bounded programs recover exactly
the induced k-vertex law and
nothing beyond it.
Our choice $C_k=2^{\binom{k}{2}}\binom{k}{2}$ comes from a direct inclusion–exclusion expansion of induced densities into ordinary homomorphism densities together with the edge-counting bound |t(H, W) − t(H, U)| ≤ |E(H)|∥W − U∥▫. The factor |E(H)| in the latter estimate is essentially sharp in its dependence on H: if H is a single edge, then t(H, W) − t(H, U) = ∫(W − U), and one can realize |∫(W − U)| on a near-extremal cut by taking W − U approximately constant on a rectangle S × T. For induced densities, the combinatorial factor $2^{\binom{k}{2}}$ is certainly not optimized and can be improved in specific regimes (e.g. for particular F or for restricted classes of graphons), but we do not expect polynomial dependence on k uniformly over all F ∈ 𝒢k: the inclusion–exclusion expansion inherently sums over all supergraphs H of F, whose number is $2^{\binom{k}{2}-|E(F)|}$.
The modulus ω obtained via weak regularity is correspondingly coarse. Intuitively, to force δ▫(W, U) < ε from finitely many subgraph statistics, one must first approximate both W and U by step graphons of complexity r(ε) and then identify (up to relabeling) their step parameters from a finite list of induced densities. Even with Frieze–Kannan weak regularity, one has r(ε) = 2O(ε−2), and any reconstruction argument that is uniform over all step graphons with r parts leads to rapidly growing k(ε) and correspondingly tiny thresholds η(ε). This reflects a genuine limitation rather than a technical artifact: without restricting to a smaller model class (e.g. step graphons with a fixed number of parts, Lipschitz graphons, or other parametrically controlled families), low-order statistics can remain stable under substantial cut perturbations, and conversely small changes in cut metric may require inspecting very high-order induced laws to certify.
From the programmatic viewpoint, these examples say that bounded-k approximate equivalence is a meaningful but deliberately local notion: it certifies agreement of all experiments that inspect at most k sampled vertices, but it cannot preclude global rearrangements visible only at larger sample sizes. The aggregated distance dprog remedies this by weighting all k, but its converse control of δ▫ necessarily inherits the poor quantitative behavior of regularity-type arguments in the absence of additional hypotheses.
We briefly indicate how the distances dk and dprog can be used as correctness/robustness criteria for probabilistic transformations, and we record several natural extension directions.
A common situation in probabilistic compilation and approximate
inference is that a source-to-source rewrite replaces a subprogram by an
approximation whose denotation is close in total variation. The TV
metric on FinStoch is designed
precisely so that such local errors do not amplify under surrounding
program structure: by Lemma~1, stochastic postcomposition is 1-Lipschitz, and deterministic wiring is
nonexpansive as well. Concretely, if q, q′ : m → n
are two subterms with
dTV(Ψ⟦q⟧, Ψ⟦q′⟧) ≤ ε,
then for any context C[−]
built from composition, products/coproducts, and the distributive Markov
structure of 𝒰ℕ, we
have
dTV(Ψ⟦C[q]⟧, Ψ⟦C[q′]⟧) ≤ ε.
When C[−] is moreover k-bounded in the
sense that C[q] uses
at most k occurrences of new, we
obtain a uniform bound on the effect of the rewrite on all k-vertex experiments. In practice,
this permits a modular proof methodology: we certify local
approximations in TV, and the global observational deviation for all
bounded tests follows for free. This is particularly useful for
transformations that change sampling order, eliminate latent variables,
or replace an expensive conditional distribution by a cheaper surrogate,
since the denotational error bound is stable under further optimization
passes.
By Theorem~A, each induced density is realized by an explicit k-bounded program pF whose acceptance probability satisfies ℙ[pF = true] = tind(F, W) under graphon semantics; bounded observational agreement therefore reduces to checking finitely many induced statistics. For the aggregated metric dprog, we may truncate: since ∑k > K2−k = 2−K, checking dk only up to k = K controls dprog up to an additive error 2−K. Hence, for a target tolerance η, it suffices to check finitely many induced laws up to K = ⌈log2(1/η)⌉, together with the desired thresholds for each dk.
Because dprog factors through weak isomorphism (Theorem~C), it is insensitive to measure-preserving relabelings and therefore compares the induced graph distributions themselves rather than arbitrary latent parametrizations. This makes dprog a natural objective or evaluation metric in settings where one learns graphon-like latent structure from data and wishes to compare two fitted models, two training runs, or a fitted model against ground truth. Proposition~B provides a one-sided stability guarantee: small cut distance implies small program distance, so convergence in graphon geometry implies convergence of all finite experiments expressible in the language.
The graph interface can be generalized by replacing the binary predicate with an r-ary relation symbol (or multiple relation symbols), yielding theories of uniform hypergraphs or, more generally, relational structures. On the limit side, one expects the corresponding semantics to be governed by hypergraphons or by Aldous–Hoover exchangeable array representations. The same blueprint should persist: k-bounded programs observe the induced law on the labeled k-element substructure, and a suitable analogue of Theorem~A identifies dk with the maximum discrepancy over induced k-vertex hypergraphs/structures. The main technical work is to import the appropriate limit topology (hypergraph cut norms and their variants) and to re-establish a counterpart of Theorem~C, which is known to be more delicate for hypergraphs than for graphs.
Our setup is intentionally dense: new produces a vertex, and edge yields a Bernoulli adjacency with no additional scaling, so graphons are the appropriate limits. For sparse random graphs, a direct graphon semantics collapses many distinct models, and one must replace W by a graphex-type object and replace δ▫ by an appropriate sparse metric. From the programming side, two issues arise: (i) a sparse interface typically requires an explicit intensity parameter (or Poisson sampling of vertices/edges), and (ii) bounded-k vertex sampling may not reveal sparse structure unless the observation language also exposes counts or degrees at the correct scale. A satisfactory sparse analogue therefore likely needs a refined notion of boundedness (e.g. bounded expected number of revealed edges) and an observation class rich enough to express sparse statistics.
We have focused on closed Boolean programs p : 1 → 2 and TV distance, since this makes dk a clean supremum over events and gives an exact identification with induced subgraph laws. Two enrichments are natural. First, one may allow finite outputs p : 1 → n and use the same matrix TV metric on FinStoch(1, n), thereby measuring discrepancies of multiway observables (not only predicates). Second, one may replace TV by another divergence/metric (e.g. Wasserstein on ordered outputs, or an f-divergence when absolute continuity is available). Such choices can be better aligned with downstream tasks, but they typically forfeit the simple ``maximum over coordinates’’ characterization and may require additional analytic input to relate the resulting program distances to cut-type graphon metrics.
The distances dk and dprog provide a semantics-driven interface between program equivalence and graph limit geometry: they support modular robustness arguments for compiler transformations, reduce bounded observational equivalence to finitely many induced statistics, and admit principled generalizations to richer relational signatures, to sparse models, and to alternative notions of observable discrepancy.