We consider the problem of extracting graph features that are simultaneously (i) sufficiently expressive to distinguish nontrivial structures and (ii) computable at the scale of sparse graphs. A canonical yardstick for expressivity is the Weisfeiler–Leman (WL) refinement family: the 1-dimensional WL procedure underlies many message-passing graph neural networks, while its higher-order variants (k-WL and its folklore variants such as k-FWL) capture increasingly rich combinatorial invariants. The difficulty is that the straightforward implementation of k-WL evolves colors on V^k and requires Θ(n^k) states, which becomes prohibitive even for moderate k once n is large. The same obstacle appears in higher-order neural architectures and tensorized aggregations, where a faithful simulation of tuple refinement requires storing and updating large tensors.
Our point of departure is that, in sparse regimes, it is often
unnecessary to materialize the full Vk state in
order to approximate the statistics that certify k-WL separation. In particular, a
substantial body of work relates k-WL invariants to counts of small
patterns, typically formulated as homomorphism counts hom(H, G) for
bounded-treewidth graphs H. If
we fix a maximum motif size h
and a treewidth bound k, then
the family
ℋk, h = {H : |V(H)| ≤ h, tw(H) ≤ k}
indexes a collection of low-complexity witnesses. The combinatorial
content of WL invariants may then be accessed via the vector of
statistics (hom(H, G))H ∈ ℋk, h,
up to known equivalences on appropriate graph classes. This suggests a
concrete algorithmic target: approximate, in near-linear time, a
suitably rich set of bounded-treewidth homomorphism counts, and package
these approximations into a permutation-invariant graph signature.
However, even for fixed h, the naive computation of hom(H, G) by dynamic programming along a tree decomposition of H typically produces intermediate tables whose size is n^{O(k)}, and thus again runs into a tensor blow-up when implemented directly. The main technical objective is therefore to replace these large intermediate objects by succinct randomized sketches that preserve the relevant multilinear forms. In this way we seek a ``sketched'' analogue of higher-order refinement: we do not store tuple colors, but rather maintain per-node (or per-anchor) compressed summaries from which the required motif statistics can be estimated with controlled variance.
The procedure we study, which we refer to as Sketch-WL, instantiates this objective in the adjacency-list access model on sparse graphs. The algorithm maintains, for each node v ∈ V, a sketch state Sℓ(v) updated over ℓ = 0, 1, …, L rounds by aggregating the multiset of neighbor sketches and applying a sketch-based compression map. Two features are central. First, the multiset aggregation step is realized by linear sketches (e.g., CountSketch-style signed hashing), so that the update is compatible with the linearity properties required for unbiased estimation. Second, we augment purely local refinement with a small number of randomly sampled anchors that provide coarse global position information (for example, truncated distances or short random-walk landing estimates). This augmentation is designed to mitigate symmetries that defeat purely local schemes while preserving permutation invariance in distribution.
From the standpoint of output, Sketch-WL produces a graph-level
signature vector Φk, h(G) ∈ ℝd
obtained by pooling the final node sketches together with explicit
estimators for selected motif statistics. The intended contract is
additive approximation: for each fixed pattern H with |V(H)| ≤ h, we aim
for
$$
\bigl|\widehat{\mathrm{hom}}(H,G)-\mathrm{hom}(H,G)\bigr|
\;\le\;
\varepsilon\, n^{|V(H)|}
$$
with failure probability at most δ, uniformly over all H ∈ ℋk, h.
The scale εn^{|V(H)|}
is the natural worst-case normalization, since hom(H, G) ≤ n^{|V(H)|}
for simple graphs. In dense regimes this guarantee is coarse; in the
sparse bounded-degree regime it is meaningful insofar as motif counts
for fixed H can still vary on
the order of n (or larger,
depending on H), while exact
counting remains sensitive to fine local structure.
The computational aim is that the above approximation be achievable
without ever storing Θ(n^k)
tuple states. Under a bounded-degree promise Δ ≤ Δ0 (with
Δ0 constant),
Sketch-WL can be implemented so that node refinement costs essentially a
constant number of passes over adjacency lists, and motif estimation
reduces to a controlled number of sparse linear-algebraic primitives
(e.g., A-vector
multiplications) and tensor-sketch updates. Consequently, for fixed
k, h the overall
complexity is near-linear in the input size, of the form
$$
\tilde{O}\!\left(m\cdot f(k,h)\cdot
\varepsilon^{-2}\log\!\frac{1}{\delta}\right)
\quad\text{time}
\qquad\text{and}\qquad
\tilde{O}\!\left(n\cdot f(k,h)\cdot
\varepsilon^{-2}\log\!\frac{1}{\delta}\right)
\quad\text{space},
$$
where f(k, h) depends
only on the pattern parameters (typically exponentially in h, or in (k, h)) and the Õ(⋅) notation suppresses
polylogarithmic factors in n
(and similarly in ε−1, δ−1
when appropriate). The dependence on ε and δ matches the standard behavior of
variance-reduction via independent repetitions and concentration bounds
for sketch estimators.
In addition to approximation, we emphasize a separation guarantee that is aligned with the WL motivation. If two graphs G and G′ differ on some target statistic by a margin exceeding the approximation error, namely if for some H ∈ ℋk, h one has |hom(H, G) − hom(H, G′)| > 2εn^{|V(H)|}, then the corresponding sketched signatures differ with probability at least 1 − δ. Thus Sketch-WL does not merely approximate counts in isolation; it yields a randomized invariant whose discrimination power tracks that of the targeted bounded-treewidth statistics up to an explicit additive tolerance.
Finally, we view lower bounds as part of the motivation, in that they clarify which resource savings are feasible in our access model. For example, even the seemingly modest task of distinguishing triangle-rich from triangle-poor graphs in the worst case forces Ω(m) adjacency-list operations, and achieving additive error εn^3 with failure probability δ requires on the order of ε^{-2} log(1/δ) independent measurements. Hence, once we commit to worst-case guarantees, one should not expect sublinear-time algorithms in m, nor should one expect to remove the quadratic dependence on ε^{-1} without additional structural assumptions. The role of Sketch-WL is therefore to match these unavoidable dependencies while eliminating the Θ(n^k) blow-up that would arise from a direct implementation of k-WL or tensor-based higher-order message passing.
The remainder of this exposition develops the requisite background for these claims. We review WL refinements and their higher-order variants, the connection between WL invariants and homomorphism counts for bounded-treewidth patterns, and the sketching primitives (CountSketch and TensorSketch) together with the concentration tools that justify their use for multiset and tensor aggregates.
We recall the 1-dimensional
Weisfeiler–Leman (WL) procedure (also called color refinement). Starting
from initial colors c0(v) (which may
encode discrete node labels, degrees, or be constant), WL iteratively
updates
$$
c_{\ell}(v)\;:=\;\mathrm{Hash}\Bigl(c_{\ell-1}(v),\
\multiset{c_{\ell-1}(u):u\in N(v)}\Bigr),
\qquad \ell=1,2,\dots,
$$
where $\multiset{\cdot}$ denotes a
multiset and Hash is an injective
encoding up to relabeling of colors. Two graphs are distinguished by
1-WL if, for some round ℓ, the multisets of colors differ
(equivalently, the color histograms differ). The algorithm is
permutation-invariant by construction and runs in time O(Lm) for L rounds. Its limitation is well
known: there exist nonisomorphic regular graphs that remain
indistinguishable under 1-WL for all
ℓ.
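For concreteness, the refinement loop admits a very short implementation; the following Python sketch is illustrative (the names wl_colors and the dictionary-based adjacency format are our own conventions, not part of the formal model). The injective Hash is realized by canonically re-indexing (previous color, sorted neighbor-color multiset) signatures, and two graphs are compared by refining their disjoint union so that both share one palette.

    def wl_colors(adj, rounds):
        # 1-WL / color refinement; `adj` maps each vertex to its neighbor list.
        colors = {v: 0 for v in adj}                  # constant initial coloring
        for _ in range(rounds):
            # Injective-up-to-relabeling Hash: (old color, sorted neighbor multiset)
            sig = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                   for v in adj}
            palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
            colors = {v: palette[sig[v]] for v in adj}
        return colors

    def distinguished_by_1wl(adj_G, adj_H, rounds):
        # Refine the disjoint union so both graphs share one color palette,
        # then compare color histograms.
        union = {("G", v): [("G", u) for u in nbrs] for v, nbrs in adj_G.items()}
        union.update({("H", v): [("H", u) for u in nbrs] for v, nbrs in adj_H.items()})
        colors = wl_colors(union, rounds)
        hist_G = sorted(c for (side, _), c in colors.items() if side == "G")
        hist_H = sorted(c for (side, _), c in colors.items() if side == "H")
        return hist_G != hist_H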
For k ≥ 2, the k-WL refinement lifts the state
space from vertices to k-tuples. One may view it as
refining colors on V^k, where each
tuple v = (v1, …, vk) ∈ V^k
carries a color Cℓ(v).
The update aggregates over the tuples obtained by varying one coordinate:
$$
C_{\ell}(\mathbf{v})
\;:=\;
\mathrm{Hash}\Bigl(C_{\ell-1}(\mathbf{v}),\
\bigl(\multiset{C_{\ell-1}(\mathbf{v}^{(i\leftarrow u)}):u\in
V}\bigr)_{i=1}^k\Bigr),
$$
where v(i ← u)
denotes the tuple obtained from v by replacing its i-th component by u. The initial coloring C0(v)
typically encodes the equality pattern among coordinates and the
adjacency relations induced by G on the tuple. The expressivity of
k-WL increases with k, and, on many graph classes,
moderate k suffices to
distinguish graphs that defeat 1-WL.
A commonly used, computationally friendlier analogue is the local variant of k-WL (in the spirit of the folklore k-FWL update), in which the aggregation over u ∈ V is restricted to local neighborhoods (e.g., replacing u ∈ V by u ∈ N(vi), or by u adjacent to some coordinate), thereby recovering a message-passing interpretation on tuples. Regardless of the exact variant, the salient point for our purposes is that direct simulation maintains Θ(n^k) states and performs Θ(k n^k ⋅ n)- or Θ(k n^k Δ)-type aggregations, which is infeasible at scale even when m is sparse.
We recall that a graph homomorphism from a pattern H to a host graph G is a map φ : V(H) → V(G)
such that {x, y} ∈ E(H)
implies {φ(x), φ(y)} ∈ E(G).
The homomorphism count is
$$
\mathrm{hom}(H,G)\;=\;\sum_{\varphi:V(H)\to V(G)}\ \prod_{\{x,y\}\in E(H)}\mathbf{1}\bigl[\{\varphi(x),\varphi(y)\}\in E(G)\bigr],
$$
and satisfies the trivial bound hom(H, G) ≤ n^{|V(H)|}.
We emphasize that hom(H, G) is a polynomial
in the adjacency indicators of G of degree |E(H)| and is invariant
under relabeling of V(G).
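As ground truth for what follows, hom(H, G) can be evaluated directly from this definition; the brute-force routine below (our own illustrative helper, exponential in |V(H)|) is usable only for tiny patterns and graphs.

    from itertools import product

    def hom(H_vertices, H_edges, G_adj):
        # hom(H, G): sum over all maps phi of the product of edge indicators.
        # G_adj maps each vertex of G to a set of its neighbors.
        total = 0
        for phi in product(G_adj.keys(), repeat=len(H_vertices)):
            m = dict(zip(H_vertices, phi))
            if all(m[y] in G_adj[m[x]] for (x, y) in H_edges):
                total += 1
        return total

    # Example: for the triangle K3, hom(K3, G) = 6 * T(G):
    # hom(["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")], G_adj)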
A key structural fact underlying our approach is that, for fixed
k, families of
bounded-treewidth patterns capture precisely the information content of
k-WL invariants in a
quantitative sense: there are characterizations (and refinements
thereof) relating indistinguishability by k-WL to equality of homomorphism
counts from all patterns of treewidth at most k (with minor shifts in the
parameter depending on the chosen WL variant and whether one considers
vertex-colored graphs). We do not reprove these connections here;
rather, we use them as guidance for which statistics to approximate.
Concretely, fixing a size bound h and treewidth bound k, we focus on the finite
family
ℋk, h = {H: |V(H)| ≤ h, tw(H) ≤ k},
and treat the vector (hom(H, G))H ∈ ℋk, h
as a surrogate for the bounded-complexity portion of k-WL information.
When tw(H) ≤ k, a tree decomposition of H of width k yields a standard dynamic program for computing hom(H, G). The computation propagates tables indexed by assignments of vertices of G to the current bag, and each table has size on the order of n^{k+1}. For fixed k and h, the number of bags is O(h), so the algorithm is polynomial-time in n; however, it explicitly materializes intermediate tensors of order k + 1 (each mode of size n), which is exactly the blow-up we seek to avoid when n is large. Thus, even though hom(H, G) is ``fixed-parameter tractable'' in (k, h), we require a different implementation strategy to achieve near-linear resource bounds in sparse graphs.
CountSketch is a canonical linear sketch for frequency vectors. Given
a universe [D], choose a hash
function h : [D] → [s] and
a sign function σ : [D] → {±1} (typically
from a pairwise independent family). For a vector x ∈ ℝD, its
sketch CS(x) ∈ ℝs is
defined by
$$
\mathrm{CS}(x)[j]\;=\;\sum_{i:\,h(i)=j}\sigma(i)\,x_i.
$$
The map x ↦ CS(x) is
linear, and inner products are preserved in expectation: for independent
sketches,
$$
\mathbb{E}\bigl[\langle \mathrm{CS}(x),\ \mathrm{CS}(y)\rangle\bigr]\;=\;\langle x,y\rangle,
$$
with variance decaying as O(∥x∥₂²∥y∥₂²/s)
under standard assumptions. Using depth t (independent repetitions) and
aggregating by median or mean yields concentration with failure
probability exp (−Θ(t)). For our
purposes, we interpret multiset aggregation as sketching a histogram
vector; linearity then permits summing neighbor contributions without
enumerating distinct multiset keys.
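A minimal NumPy rendering of a single CountSketch row, with explicit hash tables standing in for the limited-independence families assumed above (all names are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    D, s = 10_000, 256                       # universe size, sketch width
    h = rng.integers(0, s, size=D)           # bucket hash  h : [D] -> [s]
    sigma = rng.choice([-1.0, 1.0], size=D)  # sign hash    sigma : [D] -> {+-1}

    def count_sketch(x):
        # CS(x)[j] = sum_{i : h(i) = j} sigma(i) * x_i  (linear in x)
        cs = np.zeros(s)
        np.add.at(cs, h, sigma * x)
        return cs

    x, y = rng.standard_normal(D), rng.standard_normal(D)
    est = count_sketch(x) @ count_sketch(y)  # unbiased estimate of <x, y>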
When dynamic programs produce multilinear expressions (e.g., products
of compatibility indicators across bags), the intermediate object is
naturally a tensor (Kronecker product) rather than a vector.
TensorSketch provides an efficient linear map TS(⋅) such that for vectors x^{(1)}, …, x^{(q)}
one can form a sketch of the tensor product x^{(1)} ⊗ ⋯ ⊗ x^{(q)}
without materializing it, and moreover preserve inner products:
$$
\mathbb{E}\bigl[\langle \mathrm{TS}(x^{(1)}\otimes\cdots\otimes
x^{(q)}),\ \mathrm{TS}(y^{(1)}\otimes\cdots\otimes y^{(q)})\rangle\bigr]
\;=\;
\prod_{i=1}^q \langle x^{(i)},y^{(i)}\rangle.
$$
The standard construction uses hash-and-sign maps together with
convolution (e.g., via FFT) to combine sketches of the factors. The
relevance is that many bounded-treewidth homomorphism computations
reduce to repeated applications of such multilinear contractions;
TensorSketch allows us to approximate these contractions while storing
only O(s) numbers per
sketched table rather than n^{Θ(k)}
entries.
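The degree-2 case of the standard construction can be sketched as follows (illustrative code with explicit hash tables; the circular convolution of the two factor sketches is realized via the FFT):

    import numpy as np

    rng = np.random.default_rng(1)
    D, s = 512, 128
    h1, h2 = rng.integers(0, s, size=D), rng.integers(0, s, size=D)
    g1, g2 = rng.choice([-1.0, 1.0], size=D), rng.choice([-1.0, 1.0], size=D)

    def cs(x, h, g):
        out = np.zeros(s)
        np.add.at(out, h, g * x)
        return out

    def tensor_sketch(x, y):
        # Sketch of the tensor product x (x) y as the circular convolution of
        # the two factor sketches -- computed with FFTs, never materializing
        # the outer product itself.
        return np.fft.irfft(np.fft.rfft(cs(x, h1, g1)) *
                            np.fft.rfft(cs(y, h2, g2)), n=s)

    x, y, u, v = (rng.standard_normal(D) for _ in range(4))
    est = tensor_sketch(x, y) @ tensor_sketch(u, v)  # approx <x,u> * <y,v>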
The sketches we employ are unbiased (or nearly so, depending on normalization), and their variances admit bounds in terms of sketch width s and depth t. Consequently, by choosing s = Θ(ε^{-2}) (up to logarithmic factors) and t = Θ(log(1/δ)) and applying standard concentration inequalities (Chebyshev with repetition, or median-of-means), we obtain additive error guarantees of the form εn^{|V(H)|} for a fixed pattern H. Since k and h are fixed, the family ℋk, h is finite, and we may take a union bound over H ∈ ℋk, h to obtain simultaneous correctness for all requested motifs. The core conceptual point is that linear sketches commute with the summation/product structure arising from multiset aggregation and bounded-treewidth dynamic programs, thereby enabling us to approximate WL-aligned statistics without ever constructing tuple colors or large intermediate tensors.
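The median-of-means device referred to above is short enough to record explicitly (an illustrative helper):

    import numpy as np

    def median_of_means(samples, t):
        # Split i.i.d. probe values into t groups; the median of group means
        # achieves failure probability exp(-Theta(t)) at a fixed additive
        # accuracy determined by the group size.
        groups = np.array_split(np.asarray(samples, dtype=float), t)
        return float(np.median([g.mean() for g in groups]))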
We fix integers k ≥ 2 and h ≥ 3 throughout and consider an input graph G = (V, E) with |V| = n and |E| = m. Our objective is to approximate a collection of WL-aligned statistics of G—specifically, homomorphism counts from bounded-size, bounded-treewidth patterns—using a procedure whose time is near-linear in m and whose memory is near-linear in n, up to factors depending only on (k, h) and polylogarithmic factors in n.
We work in the randomized RAM model with adjacency-list query access. Concretely, for any vertex v ∈ V the algorithm may query deg (v) and may query the i-th neighbor N(v)[i] for i ∈ [deg (v)]. We count time in such adjacency-list operations (together with O(1) arithmetic on O(log n)-bit words). We assume the algorithm can sample a uniform random vertex from V at unit cost; sampling a uniform random directed edge can then be implemented by sampling v proportional to deg (v) via standard preprocessing, or by rejection sampling under bounded degree. Our constructions will be expressible as one (or a constant number of) passes over adjacency lists, while maintaining O(1) (sketched) state per vertex.
The primary regime is sparse graphs, i.e. m = Õ(n), and in
the main theorems we assume a bounded-degree promise
$$
\Delta \;=\; \max_{v\in V}\deg(v)\;\le\;\Delta_0,
$$
where Δ0 is a fixed
constant. This assumption is not merely aesthetic: it rules out
pathological concentration failures for local sampling and simplifies
variance bounds for sketch-based multilinear estimators. When Δ is unbounded, additional
normalization or degree-splitting may be required; since our focus is on
near-linear procedures in sparse graphs, we treat bounded degree as the
canonical case.
For a fixed pattern graph H
we write hom(H, G)
for the number of graph homomorphisms φ : V(H) → V(G),
i.e.,
$$
\mathrm{hom}(H,G)\;=\;\sum_{\varphi:V(H)\to V(G)}\ \prod_{\{x,y\}\in E(H)}\mathbf{1}\bigl[\{\varphi(x),\varphi(y)\}\in E(G)\bigr].
$$
We fix the finite family of patterns
ℋk, h = {H: |V(H)| ≤ h, tw(H) ≤ k},
and we regard the vector (hom(H, G))H ∈ ℋk, h
as the statistics-to-approximate. The role of the treewidth bound is to
ensure that these statistics correspond to bounded-complexity WL
information and admit decompositions amenable to sketching, while the
size bound |V(H)| ≤ h ensures
the family is finite and hence supports uniform guarantees via a union
bound. Since
0 ≤ hom(H, G) ≤ n^{|V(H)|} for all H,
it is natural to measure error additively in units of n^{|V(H)|}.
Given parameters ε ∈ (0, 1)
and δ ∈ (0, 1), the estimation
task is to output, for each requested H ∈ ℋk, h,
a randomized estimator $\widehat{\mathrm{hom}}(H,G)$
satisfying
$$
\Pr\Bigl[\ \bigl|\widehat{\mathrm{hom}}(H,G)-\mathrm{hom}(H,G)\bigr|\
\le\ \varepsilon\,n^{|V(H)|}\ \Bigr]\ \ge\ 1-\delta,
$$
and, more strongly, to achieve this for all H ∈ ℋk, h
with overall failure probability at most δ:
$$
\Pr\Bigl[\ \forall H\in\mathcal{H}_{k,h}:\
\bigl|\widehat{\mathrm{hom}}(H,G)-\mathrm{hom}(H,G)\bigr|\ \le\
\varepsilon\,n^{|V(H)|}\ \Bigr]\ \ge\ 1-\delta.
$$
Since k and h are fixed, |ℋk, h| is a
constant (depending only on (k, h)), and the uniform
guarantee is obtained by choosing per-pattern failure probability δ/|ℋk, h|
and applying a union bound. We emphasize that we target additive accuracy; when
hom(H, G) = Θ(n^{|V(H)|})
this is equivalent to O(ε) relative error,
whereas for sparser motifs additive error is the appropriate worst-case
benchmark.
Beyond estimating individual counts, we seek a graph-level signature
Φk, h(G) ∈ ℝd
which supports a robust separation statement. Formally, we say that two
graphs G and G′ are (k, h, ε)-separated if there exists a
witness pattern H ∈ ℋk, h
with
|hom(H, G) − hom(H, G′)| > 2ε n^{|V(H)|}.
The margin 2εn^{|V(H)|}
is chosen so that, if both estimators are accurate to εn^{|V(H)|},
then the corresponding estimated intervals are disjoint. The separation
task is to construct Φk, h(G)
such that, with probability at least 1 − δ over the algorithmic
randomness,
G, G′ are
(k, h, ε)-separated ⇒ Φk, h(G) ≠ Φk, h(G′).
In particular, if we include the (approximate) motif counts themselves
as part of the signature, then separation follows immediately from the
uniform estimation guarantee; however, we will also incorporate
additional WL-style refined summaries so that the signature serves as a
compact proxy for iterative neighborhood information as well.
Our algorithms are required to avoid the Θ(n^k) state size implicit in maintaining colors on V^k. Instead, we target space Õ(n ⋅ f(k, h) ⋅ ε^{-2} log(1/δ)) by storing only O(1) sketches per vertex (and a constant number of global sketches), and time Õ(m ⋅ f(k, h) ⋅ ε^{-2} log(1/δ)) by performing sparse neighborhood aggregations and a bounded number of sketch-based multilinear contractions. Here f(k, h) may be exponential in h (or in (k, h)) but is independent of n and m, reflecting fixed-parameter dependence on the pattern complexity. The output dimension d is permitted to scale as Õ(f(k, h)ε^{-2} log(1/δ)), consistent with the number of stored sketch coordinates.
All procedures are randomized, using hash functions and (when needed) uniform samples of vertices or local neighborhoods. We require that the output distribution is invariant under relabeling of V(G); equivalently, for any permutation π of V the distributions of Φk, h(G) and Φk, h(π(G)) coincide. This is achieved by using only permutation-invariant access (adjacency lists) together with randomness independent of labeling (uniform vertex sampling and universal hashing).
The next section instantiates these requirements by combining (i) WL-inspired neighborhood refinement implemented via linear multiset sketches and (ii) bounded-treewidth motif estimation implemented via tensor sketches, yielding a single signature Φk, h(G) that supports both uniform additive estimation and margin-based separation under the stated resource bounds.
We now describe the randomized procedure that produces the signature Φk, h(G) and the estimators $\widehat{\mathrm{hom}}(H,G)$ while maintaining only sketched states at vertices. Conceptually, the construction has two coupled components: (i) a WL-style refinement operating on vertex states but implemented via linear sketches of neighbor multisets, and (ii) a motif module which contracts collections of refined vertex states using tensor-sketch primitives aligned with bounded-treewidth decompositions. Throughout we view k, h as fixed constants, and we select the sketch parameters so that the total number of stored coordinates is Õ(f(k, h)ε−2log (1/δ)).
A purely local refinement can be too symmetric in sparse
bounded-degree graphs; we therefore inject a small amount of global
reference information by sampling a set of anchors Anc = {a1, …, ar} ⊆ V
uniformly at random, with r = Õ(ε^{-2} log(1/δ))
(or smaller when anchors are used only for stabilization rather than
estimation). For each v ∈ V we compute a
positional feature vector p(v) relative to Anc. One convenient choice is the truncated
distance profile
$$
p(v)\;=\;\bigl(\min\{\mathrm{dist}(v,a_j),\,q\}\bigr)_{j=1}^{r},
$$
where q = O(log n)
bounds the search radius; under bounded degree, q can be treated as a constant in
the neighborhood explored around each anchor. Alternatively, one may use
short random-walk landing probabilities from each anchor, approximated
by a small number of steps and stored in sketch form. The only property
we require is that p(v) is label-invariant
(depends only on graph structure and random seed) and computable in
near-linear time in m for
fixed r, q.
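A minimal realization of the truncated distance profile, assuming dictionary adjacency lists and a precomputed anchor list (illustrative code):

    from collections import deque

    def distance_profiles(adj, anchors, q):
        # p(v)[j] = min(dist(v, a_j), q), via one truncated BFS per anchor.
        profiles = {v: [q] * len(anchors) for v in adj}
        for j, a in enumerate(anchors):
            dist = {a: 0}
            frontier = deque([a])
            while frontier:
                v = frontier.popleft()
                if dist[v] >= q:          # do not expand beyond radius q
                    continue
                for u in adj[v]:
                    if u not in dist:
                        dist[u] = dist[v] + 1
                        frontier.append(u)
            for v, d in dist.items():
                profiles[v][j] = d        # d <= q by construction
        return profiles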
We form an initial vertex descriptor by concatenating any available
base feature (e.g., a constant, degree, or a hashed label when present)
with the anchor-position vector:
x(v) = concat(base(v), p(v)).
We then apply a CountSketch map to obtain a fixed-width randomized
summary S0(v) ∈ ℝs,
where s is the sketch width
and t is the depth (number of
independent hash/sign functions). Formally, we fix t hash functions hj : [D] → [s]
and sign functions σj : [D] → {±1}
(for an appropriate input dimension D) drawn from a limited-independence
family, and we store the sketched coordinates
$$
\bigl(S_0(v)\bigr)_{j,b}\;=\;\sum_{i:\,h_j(i)=b}\sigma_j(i)\,x(v)_i,
\qquad j\in[t],\ b\in[s].
$$
We emphasize that we do not require x(v) to be explicitly
materialized as a dense vector; it suffices that the nonzero features
can be streamed into the sketch updates.
For ℓ ≥ 1 we update vertex
states by aggregating the multiset {Sℓ − 1(u) : u ∈ N(v)}.
A direct WL update would store a histogram over neighbor colors; we
instead maintain a multiset operator. Concretely, we define
$$
T_\ell(v)\;=\;\mathrm{MultiSetSketch}\Bigl(\multiset{S_{\ell-1}(u):u\in N(v)}\Bigr)\;\in\;\mathbb{R}^{s'},
$$
where MultiSetSketch is linear in its
inputs. The simplest instantiation is coordinatewise summation of
neighbor sketches (possibly after an additional randomized hashing layer
to reduce correlations):
$$
T_\ell(v)\;=\;\sum_{u\in N(v)}\Psi\bigl(S_{\ell-1}(u)\bigr),
$$
with Ψ a fixed random
sign-permutation map. Linearity ensures that Tℓ(v)
can be computed by iterating once over adjacency lists and accumulating
into the bucketed arrays of v.
Under bounded degree, the per-vertex update cost is O(Δ0 ⋅ s′),
hence total time O(m ⋅ s′)
per refinement round.
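In code, one such round is a single pass over adjacency lists; the sketch below is illustrative, with the sign-permutation Ψ given by precomputed arrays perm and signs drawn once from the random seed, and Ψ applied once per vertex before accumulation:

    import numpy as np

    def refine_round(adj, S_prev, perm, signs):
        # T_l(v) = sum_{u in N(v)} Psi(S_{l-1}(u)),  Psi(x) = signs * x[perm].
        mixed = {u: signs * S_prev[u][perm] for u in adj}   # O(n * s') work
        T = {}
        for v in adj:                                       # O(m * s') total
            acc = np.zeros(len(signs))
            for u in adj[v]:
                acc += mixed[u]
            T[v] = acc
        return T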
Given the previous state and the aggregated neighbor sketch, we
compute the next refined state by a randomized hash-and-sketch
step:
$$
S_\ell(v)\;=\;\mathrm{HashAndSketch}\bigl(S_{\ell-1}(v),\,T_\ell(v)\bigr)\;\in\;\mathbb{R}^{s}.
$$
One may implement HashAndSketch as (i)
concatenation of Sℓ − 1(v)
and Tℓ(v)
followed by a fresh CountSketch, or (ii) a polynomial sketch of low
degree in the coordinates of (Sℓ − 1(v), Tℓ(v))
when we wish to model higher-order interactions. For the purposes of our
guarantees it suffices that the map is (a) permutation invariant with
respect to neighbor ordering, (b) computed from limited-independence
hashing, and (c) induces bounded variance for the downstream motif
estimators. We perform L such
refinement rounds, where L is
a fixed constant chosen to match the desired locality depth (and,
heuristically, the number of WL iterations we intend to
approximate).
After refinement, we have vertex states SL(v)
that act as randomized feature vectors summarizing rooted neighborhoods
(with anchor references). To estimate homomorphism counts for a fixed
bounded-treewidth pattern H,
we must contract collections of such features across multiple vertices
of G in a manner consistent
with the adjacency constraints of H. We proceed by fixing a tree
decomposition of H of width at
most k, and writing hom(H, G) as a sum over
assignments to bags of products of local compatibility indicators. The
intermediate objects in such a dynamic program are tensors of order at
most k + 1; rather than
materializing them, we maintain tensor sketches of these tensors. In
particular, for any multilinear form
$$
\sum_{v_1,\dots,v_p\in V} \Bigl(\prod_{(i,j)\in \mathcal{C}}
\mathbf{1}[\{v_i,v_j\}\in E]\Bigr)\cdot \Bigl(\prod_{i=1}^p
g_i(v_i)\Bigr),
$$
with p ≤ h and
constraint set 𝒞 derived from H, we replace the products by a
TensorSketch estimator that is unbiased and computable from the sketched
vertex states and sparse edge traversals. For small motifs such as
length-2 paths (i.e., wedges), the required
contractions reduce to simple bilinear forms over E; for more complex
bounded-treewidth motifs, the decomposition ensures that all
contractions are over constant-width separators and can be implemented
with f(k, h)
many sketch coordinates.
We define the graph-level signature Φk, h(G)
by combining (i) a global pooling of refined vertex states and (ii) the
motif estimators:
$$
\Phi_{k,h}(G)\;=\;\Bigl(\sum_{v\in V} S_L(v)\Bigr)\ \oplus\
\Bigl(\widehat{\mathrm{hom}}(H,G)\Bigr)_{H\in\mathcal{H}_{k,h}},
$$
where ⊕ denotes concatenation. The
pooled vector ∑vSL(v)
serves as a compact summary of refined local neighborhoods, while the
appended motif estimates provide explicit access to the WL-aligned
statistics. Because both parts are computed from permutation-invariant
primitives (uniform anchor sampling, adjacency-list iteration, and
hashing independent of labels), the distribution of Φk, h(G)
is invariant under relabeling of V.
All steps are implementable with O(1) (sketched) state per vertex: we store Sℓ(v) for ℓ ∈ {0, …, L} either explicitly for the current round (overwriting in place) or with two buffers, together with the anchor-position features p(v) (or their sketches). Each refinement round is one pass over adjacency lists to accumulate Tℓ(v), followed by an O(n) pass to apply HashAndSketch. The motif module is likewise expressible as a bounded number of sparse passes and sketch contractions whose total work is Õ(m ⋅ f(k, h) ⋅ ε^{-2} log(1/δ)) when sketch widths are set to achieve variance O(ε² n^{2|V(H)|}) and failure probability δ after a union bound over H ∈ ℋk, h. In the next section we instantiate these choices for triangles and 4-cycles and then state the corresponding additive-error guarantees together with near-matching lower bounds in the access model.
We first instantiate the motif module for the two canonical constant-size motifs: triangles and 4-cycles. These serve as a concrete calibration point for (i) the additive-error guarantees we target throughout and (ii) the near-linear dependence on m and on the sketch repetition parameter ε−2log (1/δ).
Let A be the adjacency
matrix of an undirected simple graph G. The triangle count
satisfies
$$
T(G)\;=\;\frac{1}{6}\,\mathrm{tr}(A^3),
$$
since each triangle contributes 6
oriented length-3 closed walks. For
4-cycles we use the standard
decomposition of length-4 closed
walks:
$$
\mathrm{tr}(A^4)\;=\;8\,C_4(G)\;+\;2m\;+\;4\sum_{v\in V}\binom{d(v)}{2},
$$
where d(v) is the
degree of v. Indeed, the trace
counts (a) genuine 4-cycles (each
contributing 8), (b) back-and-forth
walks on a single edge (contributing 2
per edge), and (c) walks supported on a length-2 path u − v − w
(contributing 4 per unordered
neighbor-pair at the middle vertex). Consequently,
$$
C_4(G)\;=\;\frac{1}{8}\Bigl(\mathrm{tr}(A^4)-2m-4\sum_{v\in
V}\binom{d(v)}{2}\Bigr),
$$
and the correction term is computable exactly in O(m) time from the
adjacency lists.
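The decomposition is easy to confirm mechanically on small graphs; the following dense-matrix check (a validation utility only, not part of the sparse pipeline) verifies it on the 4-cycle itself:

    import numpy as np
    from math import comb

    def c4_via_trace(A):
        # C4(G) = (tr(A^4) - 2m - 4 * sum_v C(d(v), 2)) / 8 for simple graphs.
        deg = A.sum(axis=1).astype(int)
        m = int(A.sum()) // 2
        tr_A4 = np.trace(np.linalg.matrix_power(A, 4))
        return (tr_A4 - 2 * m - 4 * sum(comb(int(d), 2) for d in deg)) / 8

    # Sanity check: the 4-cycle contains exactly one copy of C4.
    A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
    assert c4_via_trace(A) == 1.0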
For any symmetric matrix M and a Rademacher vector z ∈ {±1}^n, we have
$$
\mathbb{E}\bigl[z^\top M z\bigr]\;=\;\mathrm{tr}(M)
\qquad\text{and}\qquad
\mathrm{Var}\bigl(z^\top M z\bigr)\;\le\;2\,\|M\|_F^2.
$$
We therefore estimate tr(A^p) for p ∈ {3, 4} by averaging independent
probes:
$$
\widehat{\tau}_p\;:=\;\frac{1}{q}\sum_{i=1}^q \bigl(z^{(i)}\bigr)^\top
A^p z^{(i)}.
$$
Each quantity (z^{(i)})⊤ A^p z^{(i)}
is computed using p sparse
matrix–vector multiplications, which, under adjacency-list access,
reduce to p passes over the
edge set:
$$
(Ax)_v\;=\;\sum_{u\in N(v)} x_u,\qquad v\in V.
$$
Thus, one probe costs O(m) arithmetic operations
and O(n) working
memory for a constant number of vectors; the total cost is O(qm) time and
O(n) space, up to
polylogarithmic factors needed to implement limited-independence hashing
when we generate the z(i)
pseudorandomly.
We package the two trace estimators into
$$
\widehat{T}(G)\;:=\;\frac{1}{6}\widehat{\tau}_3,
\qquad
\widehat{C}_4(G)\;:=\;\frac{1}{8}\Bigl(\widehat{\tau}_4-2m-4\sum_{v\in
V}\binom{d(v)}{2}\Bigr).
$$
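A compact rendering of the probe loop, assuming the undirected edge set is stored as paired index arrays rows, cols with each edge listed in both orientations (illustrative code):

    import numpy as np

    def matvec(rows, cols, x):
        # (A x)_v = sum_{u in N(v)} x_u, one pass over the edge arrays.
        y = np.zeros_like(x)
        np.add.at(y, rows, x[cols])
        return y

    def hutchinson_trace(rows, cols, n, p, q, seed=0):
        # Average of q Rademacher probes z^T A^p z; each probe costs p sparse
        # matvecs, i.e., O(m) adjacency-list work and O(n) memory.
        rng = np.random.default_rng(seed)
        acc = 0.0
        for _ in range(q):
            z = rng.choice([-1.0, 1.0], size=n)
            w = z
            for _ in range(p):
                w = matvec(rows, cols, w)
            acc += z @ w
        return acc / q

    # T_hat = hutchinson_trace(rows, cols, n, 3, q) / 6, and C4_hat follows by
    # subtracting the exact O(m) degree correction before dividing by 8.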
The following guarantee is the specialization of our general sketching bounds to these motifs: with q = Θ(ε^{-2} log(1/δ)) probes, the estimators satisfy |T̂(G) − T(G)| ≤ εn^3 and |Ĉ4(G) − C4(G)| ≤ εn^4, each with probability at least 1 − δ, using Õ(m ε^{-2} log(1/δ)) time and O(n) working space.
Unbiasedness is immediate from 𝔼[z⊤A^p z] = tr(A^p) and the deterministic correction for C4(G). For concentration, we invoke the variance bound Var(z⊤Mz) ≤ 2∥M∥_F² with M = A^p, together with averaging over q probes and a standard median-of-means (or Chebyshev plus repetition) amplification to reach failure probability δ. Since we target additive error εn^p, it suffices that q = Θ(ε^{-2} log(1/δ)) controls the empirical mean to within εn^p; polylogarithmic overhead accounts for limited-independence generation of the probes and for union bounding when we report multiple motif estimates simultaneously. The time bound follows because each probe requires a constant number of sparse matvecs, each implementable in O(m) adjacency-list operations; the working memory is O(n) per probe and reused across probes.
In the full procedure, T̂(G) and Ĉ4(G) are appended to the pooled refined-state readout. The resulting concatenated vector remains permutation invariant in distribution: the probe vectors may be generated from a seed independent of vertex labels (or derived from the same seed as the anchor sampling), and the subsequent computations are expressed as sums over adjacency lists.
The above dependence on m and on ε−2log (1/δ) is not an artifact of our implementation; it is forced by the access model in the worst case.
First, any adjacency-list algorithm that distinguishes two families of graphs that differ only by Θ(εn^3) triangles must, in the worst case, inspect Ω(m) edges. The reason is that an adversary can localize all triangle discrepancies inside a subset of edges that is information-theoretically hidden unless a linear fraction of adjacency entries is queried. Formally, one applies Yao’s minimax principle to a distribution supported on graphs in which (i) all but a planted subgraph region are identical, and (ii) the planted region is placed uniformly among many candidate locations; then any sublinear-time algorithm fails to hit the planted region with constant probability, and cannot distinguish the two cases.
Second, even assuming Ω(m) time per pass, the number of independent measurements cannot be reduced below Ω(ε^{-2} log(1/δ)) for worst-case additive error. This follows from the classical sample-complexity lower bound for estimating the mean of a bounded-variance random variable: for any unbiased estimator assembled from q independent probes (or q independent sketch repetitions), the variance decreases at best linearly in q, so achieving standard deviation O(εn^3) (and failure probability δ via amplification) forces q = Ω(ε^{-2} log(1/δ)). This lower bound applies to triangle estimation (and, with the same reasoning, to any fixed-degree trace/homomorphism statistic whose unbiased probe has variance bounded away from zero on a hard family), and it matches the repetition complexity of the guarantee above up to polylogarithmic factors.
These two lower bounds justify our choice of objectives: in the adjacency-list model, near-linear time in m and ε−2log (1/δ) repetitions are essentially optimal for constant-size motif estimation. In the next section we extend the motif module from these two trace-based motifs to arbitrary bounded-treewidth patterns, recovering the corresponding WL-aligned homomorphism statistics.
We now extend the motif module from the two trace-based motifs to the full family of bounded-treewidth patterns. Fix integers k ≥ 2 and h ≥ 3, and recall ℋk, h denotes the collection of graphs H with |V(H)| ≤ h and tw(H) ≤ k. Our target statistic for each H ∈ ℋk, h is the homomorphism count hom(H, G), i.e., the number of maps φ : V(H) → V(G) such that {φ(x), φ(y)} ∈ E(G) for all {x, y} ∈ E(H). For fixed H, this quantity is a degree-|E(H)| polynomial in the adjacency matrix entries of G, and it subsumes standard small-motif counts (injective counts can be recovered from hom(⋅, G) by inclusion–exclusion for fixed H).
Let H ∈ ℋk, h and fix a tree decomposition (𝒯, {Bα}α ∈ V(𝒯)) of width at most k, so |Bα| ≤ k + 1 for all bags. A classical exact algorithm computes hom(H, G) by dynamic programming over 𝒯: for each bag Bα one maintains a table indexed by assignments σ : Bα → V(G), storing the number of extensions of σ to a homomorphism from the portion of H covered by the subtree below α. The combine operation at a parent bag α with children β1, …, βr is a product-and-sum over assignments on intersections Bα ∩ Bβj, and the running time is n^{O(k)} for fixed k (already too large for our purpose when n is large, even if k is small).
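Before sketching, it may help to see the exact DP in its simplest case: for treewidth 1 the bag recurrence collapses to the classical tree dynamic program, shown below for a rooted tree pattern (illustrative code; children encodes the pattern's rooted structure, adj the host graph).

    def hom_tree(children, root, adj):
        # Exact hom(H, G) for a rooted tree pattern H. For pattern node x,
        # t[v] = #homomorphisms of the subtree at x that map x to v; folding in
        # a child c multiplies by sum_{u in N(v)} t_c[u] (the bag "gluing").
        def table(x):
            t = {v: 1 for v in adj}
            for c in children.get(x, []):
                tc = table(c)
                t = {v: t[v] * sum(tc[u] for u in adj[v]) for v in adj}
            return t
        return sum(table(root).values())

    # Example: the path a-b-c rooted at b gives hom(P3, G) = sum_v deg(v)^2:
    # hom_tree({"b": ["a", "c"]}, "b", adj)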
Our sketching step replaces each such table by a linear sketch that supports (i) summation over assignments, (ii) multiplication of independent factors (as induced by the decomposition), and (iii) restriction/gluing along separators Bα ∩ Bβ, all without materializing n^{k+1} entries. Concretely, we represent a table Fα : V(G)^{Bα} → ℝ by a CountSketch/TensorSketch of Fα viewed as a vector of length n^{|Bα|}. Since |Bα| ≤ k + 1 is constant, we may regard the sketch dimension as depending only on (k, h, ε, δ) and not on n.
At an internal node α of
the decomposition, the exact recurrence has the schematic form
$$
F_\alpha(\sigma)\;=\;\mathrm{Loc}_\alpha(\sigma)\cdot \prod_{j=1}^{r}
\Bigl(\sum_{\tau_j:\,B_{\beta_j}\to V(G),\ \tau_j|_{B_\alpha\cap
B_{\beta_j}}=\sigma|_{B_\alpha\cap B_{\beta_j}}}
F_{\beta_j}(\tau_j)\Bigr),
$$
where Locα(σ) ∈ {0, 1}
enforces the edges of H whose
endpoints both lie in Bα (and
similarly enforces consistency of already-introduced vertices). The key
point is that the operations involved are multilinear in the child
tables. We implement the multilinear forms using TensorSketch so that,
for each α, from sketches of
Fβ1, …, Fβr
we can produce a sketch of Fα with (i)
unbiasedness for every linear query of Fα and (ii)
variance bounded by a factor controlled by the sketch width and depth.
The local factor Locα(σ) is
applied by multiplying by adjacency indicators; under adjacency-list
access this reduces to iterating over edges incident to candidate images
of bag vertices, and since |Bα| is constant
and Δ ≤ Δ0, the
local enforcement can be integrated into the sketch update with constant
overhead per neighborhood lookup.
At the root bag ρ, the homomorphism count is hom(H, G) = ∑σFρ(σ); in the sketch representation this is a linear functional and is therefore estimated unbiasedly from the sketch of Fρ. Repeating the sketch with independent hash functions and applying median-of-means yields concentration at failure probability δ.
We fix a tree decomposition of H of width ≤ k. Each DP combine step is a multilinear map in the child tables followed by summation over eliminated variables; TensorSketch linearizes such multilinear maps in the sense that the sketch of a product can be computed from the sketches of the factors, and any linear query of the resulting table has an unbiased estimator with variance proportional to the squared ℓ2-mass of the table. Since H is constant size and Δ is bounded, the ℓ2-mass of the DP tables is controlled by a polynomial in n of degree |V(H)|, and choosing sketch width s = Θ̃(ε^{-2}) and depth t = Θ̃(log(1/δ)) bounds the variance so that a median-of-means amplification yields the stated additive error. The adjacency-list access is used only to evaluate constant-size local constraints and to perform neighborhood-based aggregations; these cost Õ(m) per independent sketch repetition.
We emphasize that the restriction tw(H) ≤ k is not merely algorithmic convenience: it aligns the statistic family with k-dimensional WL invariants. In particular, known WL–homomorphism characterizations imply that equality of hom(H, ⋅) over all bounded-treewidth patterns H (with width bounded as a function of k) is an isomorphism-invariant refinement comparable to k-WL. Consequently, the collection {hom(H, G)}H ∈ ℋk, h may be viewed as a finite truncation of the corresponding WL-aligned invariant family, and our procedure approximates precisely this truncation while avoiding the Θ(n^k) tuple-state blow-up inherent in explicit k-WL refinements.
We now state the separation property that justifies including $\{\widehat{\mathrm{hom}}(H,G)\}_{H\in\mathcal{H}_{k,h}}$ inside the global signature. Let Φk, h(G) denote the concatenation of the pooled refined-state readout with the motif estimates $\widehat{\mathrm{hom}}(H,G)$ for H ∈ ℋk, h (or for any predetermined subfamily). Since each coordinate estimate is permutation-invariant in distribution and is computed from adjacency lists through sums and sketches, Φk, h is an isomorphism-invariant randomized signature.
By the per-pattern estimation guarantee and a union bound over the finitely many H ∈ ℋk, h that we estimate, with probability at least 1 − δ all estimates satisfy $|\widehat{\mathrm{hom}}(H,\cdot)-\mathrm{hom}(H,\cdot)|\le \varepsilon n^{|V(H)|}$ simultaneously. On this event, if hom(H, G) and hom(H, G′) differ by more than 2εn^{|V(H)|} for some witness H, then the corresponding estimated coordinates differ (their error intervals are disjoint), hence the concatenated vectors Φk, h(G) and Φk, h(G′) are unequal.
The foregoing establishes that bounded-treewidth homomorphism statistics can be approximated uniformly in near-linear time and used as separating coordinates in our signature; we next delineate limitations and lower bounds that preclude exact emulation of full k-WL at comparable resource bounds in worst-case regimes.
Our guarantees deliberately target the regime of parameters (k, h) and recovery of the WL-aligned statistics {hom(H, G)}H ∈ ℋk, h. We now record the obstructions showing that, in worst-case graphs and under standard access models, one cannot simultaneously demand (i) exact emulation of k-WL, (ii) uniformity over patterns of unbounded size, and (iii) near-linear resource bounds in m and n.
The k-dimensional WL refinement maintains a color for each ordered k-tuple in V^k and iteratively updates it using multisets of colors of neighboring tuples (neighbors obtained by changing one coordinate at a time). Even ignoring the cost of forming these multisets, the object being refined has cardinality n^k, hence any explicit simulation necessarily uses Ω(n^k) time merely to touch all tuple states, and Ω(n^k) space to store the current colors (or any information sufficient to output them).
One might hope to circumvent this by computing only a graph-level invariant that separates exactly the same graph pairs as k-WL, without materializing the tuple coloring. However, there exist families of graphs for which the stabilized k-WL coloring induces Θ(n^k) distinct color classes on V^k (indeed, for generic instances the orbit structure on tuples can be essentially discrete). Any procedure that matches k-WL separation on all graphs must in particular be able to distinguish instances whose only witnesses live at the tuple level, and in such cases the information content of the stabilized coloring is itself Θ(n^k) in the worst case. Thus, absent additional structure assumptions on the graph class (e.g., bounded treewidth, bounded expansion, or strong regularity constraints), exact k-WL emulation is incompatible with our objective of avoiding Θ(n^k) blow-up.
Our procedure fixes h in advance and treats f(k, h) as a constant with respect to n. This restriction is not an artifact of the analysis but is compelled by classical counting hardness. In particular, the following facts are standard.
Given an input pattern H with |V(H)| = h as part of the input, computing the subgraph count sub(H, G) (the number of copies of H in G) is #P-complete. A canonical reduction is from #Clique: taking H = Kh, the value sub(Kh, G) equals the number of h-cliques in G, and counting h-cliques is #P-hard (the corresponding decision problem is NP-hard) when h is part of the input.
While hom(H, G) permits non-injective maps, it remains hard uniformly over H. Indeed, for fixed H the quantities hom(F, G) over all quotients F of H determine sub(H, G) via Möbius inversion on the partition lattice of V(H) (equivalently, inclusion–exclusion over identifications of vertices). Consequently, an algorithm that could compute hom(H, G) exactly for arbitrary input H in time polynomial in n and |H| would yield exact injective counts under polynomial-time Turing reductions, contradicting #P-hardness unless unlikely complexity collapses occur.
These hardness statements justify the design choice that (k, h) must be treated as fixed constants if one seeks uniform near-linear dependence on m and n.
Even when H is fixed (say H = K3), the access model imposes strong barriers. With adjacency-list access, a randomized algorithm that must succeed on worst-case graphs cannot, in general, estimate T(G) (or distinguish T(G) from T(G) ± Θ(εn^3)) without inspecting a linear number of edges in the worst case. Intuitively, an adversary can place all triangle-discrepancy inside a set of edges that is information-theoretically hidden unless the algorithm probes a constant fraction of adjacency lists; formal statements follow by Yao-style minimax arguments with distributions supported on graphs that differ only in a sparse, hard-to-find region. Thus the Õ(m) dependence of our motif estimators is not merely convenient; it is essentially necessary in the worst case.
Independently, concentration imposes the familiar dependence on ε and δ. Any unbiased estimator with bounded variance requires on the order of ε^{-2} log(1/δ) independent measurements (sketch repetitions, random probes, or equivalent randomness) to achieve additive error εn^{|V(H)|} with failure probability at most δ. Hence, within the usual randomized estimation framework, our ε^{-2} log(1/δ) scaling is near-optimal up to polylogarithmic factors.
Our procedures exploit adjacency-list access to perform neighborhood-limited consistency checks and sparse matrix–vector style primitives. In an insertion-only edge stream, however, even triangle counting admits strong space lower bounds via reductions from communication complexity (e.g., set-disjointness embeddings). In particular, one cannot in general maintain sufficiently informative sketches in o(n) space while guaranteeing worst-case additive accuracy with constant success probability. This separates the offline adjacency-list model from the streaming model: the former admits near-linear-time, near-linear-space estimators for fixed motifs, while the latter provably forbids analogous guarantees without additional promises on the input distribution or graph class.
Our uniform additive bounds are stated in the bounded-degree regime Δ ≤ Δ0. This hypothesis enters both algorithmically (local constraint enforcement can be done with constant neighborhood work) and statistically (higher moments of the relevant estimators can be controlled). If Δ is unbounded, then either (i) local enforcement costs can scale with Δ, or (ii) variance can be dominated by high-degree vertices, necessitating additional normalization, clipping, or degree-sensitive sampling; such modifications typically trade simplicity for more delicate bias–variance analyses. Moreover, since our guarantees are additive, motifs whose true counts are o(n|V(H)|) cannot be promised small error without further density assumptions.
In summary, the constraints ``fixed (k, h)'', ``randomized approximation'', and ``near-linear dependence on m and n'' are jointly consistent and essentially sharp under standard complexity and access-model lower bounds; dropping any of these constraints changes the landscape qualitatively.
We record practical choices for implementing Sketch-WL so that the resulting procedure remains faithful to the stated access model while being usable as a component in standard learning pipelines. The guiding constraint is that all randomness should enter through explicit hash functions and anchor sampling, and all state should remain linear in n up to the chosen sketch sizes.
In all sketching subroutines (CountSketch, MultiSetSketch,
TensorSketch), we require only limited independence to obtain the usual
variance bounds. In practice we instantiate each hash as a fast 64-bit mix function followed by a range
reduction. Concretely, for bucket hashes h : [n] → [s] and
sign hashes σ : [n] → {±1}, we use
either (i) tabulation hashing (which empirically performs well and is
close to 3–4-wise independent in relevant regimes), or
(ii) multiply–shift families of the form
$$
h(x)\;=\;\bigl((a x + b)\bmod 2^{64}\bigr)\gg\bigl(64-\lceil\log_2 s\rceil\bigr),
\qquad
\sigma(x)\;=\;(-1)^{\mathrm{lsb}\left((c x + d)\bmod 2^{64}\right)},
$$
with independent odd a, c and uniform b, d. We emphasize that
separation comparisons between graphs should use common randomness: the same hash
parameters and (when applicable) the same anchor-sampling seed. For
training, one may optionally resample hashes across epochs as a form of
noise injection, but evaluation should fix a seed so that signatures are
stable and comparable.
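The multiply–shift pair can be written directly on 64-bit words; the constants below are arbitrary illustrative seeds, and s is assumed to be a power of two:

    MASK64 = (1 << 64) - 1

    def make_multiply_shift(s, a=0x9E3779B97F4A7C15, b=0xD1B54A32D192ED03,
                            c=0xC2B2AE3D27D4EB4F, d=0x165667B19E3779F9):
        # a, c must be odd; s a power of two. Returns bucket and sign hashes.
        assert s & (s - 1) == 0 and a % 2 == 1 and c % 2 == 1
        shift = 64 - s.bit_length() + 1      # equals 64 - log2(s)
        def h(x):                            # bucket in [s]: top log2(s) bits
            return ((a * x + b) & MASK64) >> shift
        def sigma(x):                        # sign in {+1, -1} from the low bit
            return 1 - 2 * (((c * x + d) & MASK64) & 1)
        return h, sigma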
Theoretical bounds suggest s = Θ(ε^{-2}) (up to logarithmic factors) and t = Θ(log(1/δ)) when using median/mean aggregation across independent repetitions. Empirically, we find that small depths t ∈ {2, 3, 4} already control catastrophic collisions, while the width s primarily governs the bias/variance tradeoff. A useful heuristic is to choose s so that a single node sketch occupies a cache-friendly footprint (e.g., s = 128, 256, 512), then increase t only if the downstream objective exhibits heavy-tailed errors. When a single scalar estimate is required (e.g., a single motif count), we prefer median-of-means over a raw mean across depths, since it improves robustness to occasional large-collision events without changing asymptotic cost.
Anchors serve two roles: they break symmetries in a controlled, distributional manner, and they provide a low-dimensional positional scaffold that helps downstream models (and sometimes the sketch itself) disambiguate globally similar neighborhoods. We implement anchors as uniformly sampled vertices a1, …, ar ∈ V, and we compute truncated BFS distances dist(v, ai) ∧ q for a small radius q (typically q ∈ {3, 4, 5}). This can be done via r BFS runs in time Õ(rm), which is acceptable when r is small (e.g., r ∈ [8, 64]) and m is sparse; on larger graphs we instead use multi-source BFS over batches of anchors or replace distances by short random-walk landing sketches (a small number of sparse matrix–vector multiplies with restart). We also record that, for permutation invariance, anchors must be sampled independently of vertex labels; when deterministic runs are required, we implement sampling by hashing the vertex identifier with a global seed and selecting the r smallest hash values.
The refinement depth L
plays the role of message-passing radius. For bounded-size motif
estimation with |V(H)| ≤ h, it is
typically sufficient to take L
on the order of the motif diameter (often L ≤ h), whereas for purely
discriminative signatures one may increase L until performance saturates. The
multiset compression step should be implemented as a linear operator in
neighbor sketches. In sparse frameworks, we realize
Tℓ(v) = ∑u ∈ N(v)Sℓ − 1(u)
followed by a hash-based mixing (to emulate independent CountSketch
rows) as a sparse matrix aggregation (SpMM) with optional signed
neighbor contributions. This avoids explicit neighbor-list
materialization and yields near-linear passes over edges. For unbounded
degrees, we recommend degree-normalized sums (e.g., dividing by $\sqrt{\deg(v)}$) to mitigate variance
dominated by hubs; this does not alter invariance and empirically
stabilizes training.
Because sketches are sums of signed contributions, their variance grows with the number of aggregated terms and the number of refinement rounds. In float32 arithmetic, large graphs or large L can therefore incur noticeable roundoff. We address this by (i) using float32 for per-node sketches but accumulating graph-level sums in float64, (ii) optionally applying a per-round rescaling (e.g., multiplying Tℓ(v) by (deg(v) + 1)^{-1/2}), and (iii) applying standard normalization layers (LayerNorm or RMSNorm) when Sketch-WL is embedded into a learned model. For purely algorithmic motif estimation (without learning), we instead prefer explicit variance control via increased s or increased independent repetitions. When reporting motif estimates, we recommend storing both the point estimate and an empirical standard deviation across depths t as a diagnostic.
For patterns that admit trace expressions (triangles, 4-cycles), Hutchinson-style estimators reduce to repeated sparse matrix–vector multiplies. In implementations, we fuse these multiplies where possible and reuse random probe vectors across motifs to reduce memory traffic. For bounded-treewidth patterns handled via TensorSketch, we precompute the tree decomposition of each H once, then generate the corresponding sequence of multilinear contractions as a small static computation graph whose parameters are the hash seeds. This yields a fixed overhead depending only on h and k, and ensures that the per-graph runtime is dominated by sparse passes over E.
We treat Sketch-WL as a structural encoder producing node features ψ(v) ∈ ℝD and optionally graph features Ψ(G) ∈ ℝD′. In an MPNN, we concatenate ψ(v) to the initial node attributes and run standard message passing; in a transformer, we use ψ(v) either as an additive positional encoding or as additional keys/values in attention. Since all sketch operations are linear in the input features given fixed hashes, gradients propagate through the linear algebra, while the hashes themselves remain non-trainable (as in random-feature methods). If one includes the motif estimates $\widehat{\mathrm{hom}}(H,G)$, they can be appended to the graph representation or used as auxiliary regression targets; in either case, one should share hash seeds across the training set to avoid introducing a spurious source of distribution shift.
As a starting point for sparse graphs, we take r ∈ [16, 32], q ∈ [3, 5], L ∈ [3, 6], s ∈ {256, 512}, and t ∈ {3, 4}, then tune s first (for variance) and L second (for expressivity). These choices keep per-node state modest while preserving the essential near-linear scaling in m and n targeted by our analysis.
We design experiments to probe three claims implicit in the preceding sections: (i) that the sketch-based motif estimators achieve useful additive accuracy at near-linear cost on sparse graphs, (ii) that the resulting signatures separate graphs whenever a bounded-treewidth witness statistic differs beyond the prescribed margin, and (iii) that the same machinery can be used as a practical structural encoder that improves downstream prediction relative to standard message passing at comparable computational budgets.
To stress-test separation rather than average-case prediction, we include synthetic benchmarks where low-dimensional local statistics are insufficient. Concretely, we generate pairs and small collections of graphs drawn from classical WL-hard constructions (e.g., Cai–Fürer–Immerman style pairs at controlled size) and from families of regular graphs with matched degree profiles and near-matched small-radius neighborhoods. We additionally include motif-controlled random graph families where a base sparse random graph is perturbed by planting a prescribed number of copies of a fixed motif (triangles, 4-cycles, and selected H ∈ ℋk, h with h ≤ 6) in randomly chosen locations while preserving the maximum-degree cap by rejecting placements that exceed Δ0. This yields instances where the only intended distinguishing signal is a controlled change in hom(H, G), allowing us to evaluate separation under explicit additive margins.
To evaluate utility as an encoder in a standard supervised pipeline, we use a molecular regression suite in the style of ZINC (graphs with atom and bond types, sparse and moderate size). We follow standard splits (train/validation/test) and report mean absolute error (MAE) on the target property. Since molecules are naturally bounded-degree, these data also align with the bounded-degree regime assumed in our analysis. For all datasets, we treat node and edge attributes as given; when graphs are unlabeled we use only a constant node feature and optionally degree as an additional baseline feature, so that any gains are attributable to structural information rather than rich attributes.
We compare against (i) expressive MPNN baselines (GIN/GraphConv variants) with comparable parameter counts, (ii) substructure-enhanced architectures, including ESAN (with its rooted-subgraph enumeration choices), GSN (subgraph counting features where feasible), and KP-GNN (kernel/positional enhancements), and (iii) sparse transformer-style graph models that scale by neighborhood sampling or sparsified attention. For motif estimation specifically, we include two reference points: an exact motif counter for small graphs and small motifs (enumeration with pruning), used only to provide ground-truth labels; and a trace-based estimator for triangles/4-cycles based on sparse matrix–vector multiplies without Sketch-WL refinement, serving as a sanity check that the Hutchinson component behaves as expected. Where a baseline requires motif features not available at scale (e.g., exact subgraph counts), we either restrict to motif sizes where computation is feasible on the given graphs or report the baseline as ``not applicable'' rather than imputing approximations that change its nature.
For separation experiments, we enforce common randomness across all graphs in a comparison set: identical hash seeds, identical sketch widths/depths, and (when anchors are used) identical anchor-sampling seeds. This implements the invariance/distributional comparability discussed earlier and prevents spurious separation due to different random projections. For supervised training, we fix one global seed per run (including hashes and anchor sampling) and average over multiple independent runs; we also report the standard deviation across runs for the main metrics. When baselines have tunable depth/width, we tune within a matched compute budget (similar wall-clock time or parameter count) using the validation split; for Sketch-WL we tune (r, q, L, s, t) with the heuristic that s governs variance and L governs expressivity, consistent with the implementation notes.
For motif estimation, we report MAE for each target H, both in raw scale and normalized by n^{|V(H)|} to reflect the additive error target εn^{|V(H)|}. When motif counts are very small (near-zero), we additionally report a clipped relative error to avoid division artifacts. For separation, we construct labeled pairs (G, G′) where the ground-truth witness statistic differs by a controlled amount; we then measure separation accuracy, defined as the fraction of pairs for which Φk, h(G) ≠ Φk, h(G′) under a fixed seed, and we plot accuracy as a function of the planted margin (expressed in units of n^{|V(H)|}). For downstream tasks, we report MAE on regression (ZINC-like) and, where applicable, ROC-AUC on classification tasks; we also report throughput (edges processed per second) and peak memory, since our primary objective is to avoid Θ(n^k) state growth.
We isolate the roles of the main components by a controlled ablation suite. First, we remove anchors entirely (r = 0) and keep only refinement-based sketches, to measure how much positional scaffolding contributes beyond pure multiset aggregation. Second, we keep anchors but replace CountSketch/MultiSetSketch by exact dense concatenation on small graphs (where feasible), which exposes the effect of sketch collisions independently of aggregation design. Third, we vary the refinement depth L while holding (s, t) fixed, to observe the saturation point where additional rounds no longer improve separation or downstream accuracy. Fourth, we vary sketch width s and depth t to trace the bias–variance and collision-robustness tradeoffs predicted by standard sketch analyses; we summarize these as (MAE, separation accuracy) versus memory footprint per node. Finally, we ablate the motif module by (i) using only node-level signatures ∑vSL(v) and (ii) appending the explicit motif estimates $\{\widehat{\mathrm{hom}}(H,G)\}$, thereby quantifying the incremental value of explicit low-treewidth statistics in supervised tasks.
Unless otherwise stated, each plotted point is averaged over multiple random seeds, with error bars reflecting standard error; all motif ground truths are computed exactly when h is small enough for the dataset sizes under consideration. We emphasize that our aim is not to optimize absolute predictive performance per se, but to demonstrate that the near-linear sketching regime yields (a) reliable motif estimates at the intended additive scale, (b) separation behavior consistent with the witness-based guarantees, and (c) competitive downstream gains when used as a drop-in structural encoder.
Our analysis and experimental protocol are organized around a bounded-degree regime, fixed pattern size $h$, and bounded treewidth $\mathrm{tw}(H) \le k$. We regard these choices as the minimal assumptions under which one can simultaneously (i) retain WL-relevant statistics, (ii) obtain uniform additive guarantees of the form $\varepsilon n^{|V(H)|}$, and (iii) avoid $\Theta(n^k)$ storage. We now indicate several directions in which the present framework can be extended, together with technical obstacles that appear fundamental in the adjacency-list model.
The bounded-degree promise $\Delta \le \Delta_0$ enters our bounds through variance control: whenever local neighborhoods are explored (explicitly via refinement or implicitly via $A$-vector products), high-degree vertices can dominate second moments. A first extension is to replace worst-case degree control by a distributional or moment condition, e.g.,
$$\sum_{v \in V} \deg(v)^2 \le B \qquad\text{or}\qquad \Pr_{v \sim V}\!\left[\deg(v) > \tau\right] \le \eta,$$
and to let sketch widths scale with $B$ (or with $\tau, \eta$) rather than with $\Delta_0$. On the algorithmic side, we can incorporate degree-aware normalization into the refinement operator (e.g., a signed sum of neighbor sketches scaled by $\deg(v)^{-1/2}$) so that the induced random features resemble polynomials in a normalized adjacency matrix. This typically stabilizes the contribution of hubs but changes the target statistic from raw homomorphism counts to weighted counts; one must then track how the weighting interacts with the WL–homomorphism correspondence.
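The following numpy sketch illustrates the degree-aware update on dense per-node states (the compression map is omitted for clarity, and the symmetric scaling is one plausible choice): with $\deg^{-1/2}$ factors on both sides, $L$ rounds produce features that behave like degree-$L$ polynomials in $D^{-1/2} A D^{-1/2}$.

```python
import numpy as np

def normalized_refinement_step(S: np.ndarray, adj: list) -> np.ndarray:
    """One degree-aware aggregation round on dense per-node states.

    S[v] is the current state of node v; adj[v] lists its neighbors.
    Scaling neighbor w by deg(w)^{-1/2} and the sum at v by deg(v)^{-1/2}
    makes the round a multiplication by D^{-1/2} A D^{-1/2}, so hub
    vertices no longer dominate second moments; the sketch compression
    map of the actual refinement operator would be applied on top.
    """
    deg = np.array([max(len(nb), 1) for nb in adj], dtype=float)
    T = np.zeros_like(S)
    for v, nbrs in enumerate(adj):
        for w in nbrs:
            T[v] += S[w] / np.sqrt(deg[w])
        T[v] /= np.sqrt(deg[v])
    return T
```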
A complementary approach is explicit heavy-vertex handling. We may maintain exact neighborhood aggregates for vertices with $\deg(v) > \tau$ (for a threshold $\tau$ chosen as a function of $\varepsilon$), and apply sketches only to the light part. This is analogous to standard ``heavy–light'' decompositions in sublinear algorithms for triangle counting. The open point is to obtain a clean uniform guarantee simultaneously for all $H \in \mathcal{H}_{k,h}$ without introducing a dependence on the number of heavy vertices that is too pessimistic. We expect that a hybrid estimator, in which the DP-style decomposition of $H$ is performed with conditioning on whether each mapped vertex lands in the heavy set, can yield bounds depending on $\sum_v \deg(v)^p$ for a small $p$ determined by $H$.
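Schematically, the split and the hybrid aggregation might look as follows; this is a sketch under the stated assumptions, with the threshold $\tau$ and the compression map left as placeholders:

```python
import numpy as np

def split_heavy_light(adj: list, tau: int):
    """Partition vertices by the degree threshold tau (heavy-light split)."""
    heavy = {v for v, nb in enumerate(adj) if len(nb) > tau}
    return heavy, set(range(len(adj))) - heavy

def aggregate_hybrid(S: np.ndarray, adj: list, heavy: set, compress):
    """Exact neighborhood aggregates at heavy vertices; sketched elsewhere.

    compress is any linear sketching map (e.g., a CountSketch projection);
    heavy vertices bypass it so their aggregates stay exact.
    """
    out = []
    for v, nbrs in enumerate(adj):
        agg = np.sum([S[w] for w in nbrs], axis=0) if nbrs else np.zeros_like(S[0])
        out.append(agg if v in heavy else compress(agg))
    return out
```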
The sketch primitives used in Sketch-WL (CountSketch, TensorSketch, and linear multiset sketches) are, by construction, linear. This suggests a dynamic variant in which we maintain per-node states $S_\ell(v)$ under edge insertions and deletions. At a purely algebraic level, if $T_\ell(v)$ is a linear function of neighbor sketches, then an update $(u, v)$ can be propagated locally. The difficulty is algorithmic rather than definitional: WL-style refinement is an iterative fixed-point computation, hence a single edge update can, in principle, affect sketches at all radii up to $L$.
Two restricted regimes appear tractable. First, when we fix $L$ to be a small constant (as in practical encoders), one can propagate updates for $O(L)$ hops, giving worst-case update time $\tilde{O}(\Delta^L)$ and, in bounded-degree graphs, polylogarithmic amortized time if updates are sparse. Second, for motif estimation modules expressible as traces (e.g., triangles, 4-cycles), there are streaming/dynamic trace estimators based on maintaining sketch vectors $z$ and quantities of the form $z^\top A^p z$; since $A$ changes by a rank-2 update per edge, these quantities can sometimes be updated faster than recomputing sparse products from scratch. A precise characterization of which $H \in \mathcal{H}_{k,h}$ admit polylogarithmic-time dynamic maintenance (with additive accuracy) remains open, especially under deletions, where adversarial cancellations can inflate variance.
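To make the rank-2 mechanics concrete, the following minimal Python class (the name and interface are ours) maintains $w = Az$ and $s = z^\top A^2 z = \|w\|^2$ with $O(1)$ work per edge update; for $p = 3$ (e.g., triangles via $\mathrm{tr}(A^3)$) the analogous correction would additionally touch the current neighborhoods of $u$ and $v$:

```python
import numpy as np

class DynamicQuadraticForm:
    """Maintain w = A z and s = z^T A^2 z = ||w||^2 under edge updates.

    A is symmetric; inserting edge (u, v) is the rank-2 update
    A += e_u e_v^T + e_v e_u^T, so w changes in exactly two coordinates
    and s can be corrected in O(1). For Rademacher z, E[s] = tr(A^2);
    p = 2 is shown for brevity, higher powers follow the same pattern.
    """
    def __init__(self, n: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe
        self.w = np.zeros(n)                      # w = A z; A starts empty
        self.s = 0.0                              # s = ||w||^2 = z^T A^2 z

    def insert_edge(self, u: int, v: int):
        zu, zv = self.z[u], self.z[v]
        # ||w + d||^2 = ||w||^2 + 2<w, d> + ||d||^2 with d = z_v e_u + z_u e_v
        # (u != v assumed, i.e., no self-loops).
        self.s += 2.0 * (self.w[u] * zv + self.w[v] * zu) + zv**2 + zu**2
        self.w[u] += zv
        self.w[v] += zu

    def delete_edge(self, u: int, v: int):
        zu, zv = self.z[u], self.z[v]
        self.s += -2.0 * (self.w[u] * zv + self.w[v] * zu) + zv**2 + zu**2
        self.w[u] -= zv
        self.w[v] -= zu
```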
In many applications graphs carry node and edge labels (categorical or continuous). Our sketching formalism already permits this: the base feature map base_feature(v) in the initialization step can include a hashed embedding of node labels, and the multiset sketch can be made edge-aware by hashing neighbor sketches together with an edge-label token (a minimal illustration follows this discussion). Two questions are then substantive.
First, we may ask for guarantees that are uniform over labelings in the same sense as our unlabelled guarantees. This is immediate when labels are treated as fixed inputs and the randomness is only in the sketches; however, the relevant scale of $\mathrm{hom}(H, G)$ changes because label constraints reduce the number of admissible homomorphisms. The natural normalization becomes $n^{|V(H)|}$ multiplied by a label-frequency factor, and one should state bounds in terms of the empirical label distribution.
Second, we may ask for feature-aware analogues of the WL–homomorphism connection. For colored graphs, it is standard to consider colored homomorphisms (respecting colors). Our estimator constructions extend by inserting indicator factors into the local compatibility predicates in the tree-decomposition DP. The remaining work is to determine which families of colored patterns witness the expressivity of the corresponding colored k-WL, and to quantify how many such patterns are needed for a given accuracy target.
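Concretely, the labeled initialization and edge-aware tokenization described above admit the following minimal realization; the helper names and the simplified signatures are ours, and a keyed hash is used so that the mapping is stable across runs:

```python
import hashlib
import numpy as np

def _keyed_hash(*tokens, key: int) -> int:
    """Keyed hash that is stable across processes (unlike built-in hash)."""
    dig = hashlib.blake2b(repr(tokens).encode(),
                          key=key.to_bytes(8, "big"), digest_size=8).digest()
    return int.from_bytes(dig, "big")

def base_feature(label, width: int, key: int) -> np.ndarray:
    """Signed-indicator embedding of a node label: identically labeled
    nodes start refinement from identical base sketches."""
    h = _keyed_hash("node", label, key=key)
    e = np.zeros(width)
    e[h % width] = 1.0 if (h >> 32) & 1 else -1.0
    return e

def edge_aware_token(neighbor_token: int, edge_label, key: int) -> int:
    """Fold an edge-label token into a neighbor token before multiset
    sketching, so neighbors reached via different edge types hash apart."""
    return _keyed_hash("edge", neighbor_token, edge_label, key=key)
```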
Our separation guarantee is stated with respect to the family $\mathcal{H}_{k,h}$, hence it is inherently a ``finite truncation'' of the full collection of bounded-treewidth patterns. In classes where $k$-WL equivalence is characterized by equality of all $\mathrm{hom}(H, \cdot)$ for $\mathrm{tw}(H) \le k$, the truncation parameter $h$ determines what we can witness. It is therefore natural to ask for quantitative completeness statements of the following form: given two graphs $G, G'$ that are distinguished by $k$-WL after $L$ rounds, does there exist a witness $H$ with $\mathrm{tw}(H) \le k$ and $|V(H)| \le g(k, L)$ such that $\mathrm{hom}(H, G) \neq \mathrm{hom}(H, G')$? Any such bound $g$ would translate directly into a principled choice of $h$ for Sketch-WL, and would tighten the gap between approximate sketch-based signatures and exact $k$-WL distinguishability.
Relatedly, we have focused on additive guarantees. In WL contexts one sometimes cares about exact distinguishability rather than approximation. This suggests studying ``collision-free'' regimes: for fixed $n$ and fixed families of graphs, can one choose sketch width $s$ and depth $t$ so that, with high probability, the entire refinement process is injective on the multiset of neighborhoods encountered (thus simulating exact refinement)? Such statements are plausible when the number of distinct intermediate multisets is subexponential in $s$, but proving them in a data-dependent manner would require controlling adaptivity across rounds.
On the algorithmic side, we have not optimized constants or polylogarithmic factors. There is room to reduce the number of passes over adjacency lists, to exploit SIMD-friendly hash families, and to combine multiple motifs in a single tensor sketch to amortize cost. On the lower-bound side, our discussion has emphasized triangles as a representative statistic. It would be useful to identify, for each fixed $H \in \mathcal{H}_{k,h}$, the tight dependence on $\varepsilon$ and on degree/moment parameters in the adjacency-list model, and to isolate motifs whose estimation is strictly harder than trace-style motifs even under bounded degree.
We view these directions as compatible with the central objective: to obtain WL-relevant, permutation-invariant signatures that scale near-linearly in sparse graphs while providing explicit statistical guarantees for a controlled family of patterns.