We consider the problem of extracting graph features that are simultaneously (i) sufficiently expressive to distinguish nontrivial structures and (ii) computable at the scale of sparse graphs. A canonical yardstick for expressivity is the Weisfeiler–Leman (WL) refinement family: the 1-dimensional WL procedure underlies many message-passing graph neural networks, while its higher-order variants (k-WL and its folklore variants such as k-FWL) capture increasingly rich combinatorial invariants. The difficulty is that the straightforward implementation of k-WL evolves colors on V^k and requires Θ(n^k) states, which becomes prohibitive even for moderate k once n is large. The same obstacle appears in higher-order neural architectures and tensorized aggregations, where a faithful simulation of tuple refinement requires storing and updating large tensors.
Our point of departure is that, in sparse regimes, it is often
unnecessary to materialize the full Vk state in
order to approximate the statistics that certify k-WL separation. In particular, a
substantial body of work relates k-WL invariants to counts of small
patterns, typically formulated as homomorphism counts hom(H, G) for
bounded-treewidth graphs H. If
we fix a maximum motif size h
and a treewidth bound k, then
the family
ℋk, h = {H : |V(H)| ≤ h, tw(H) ≤ k}
indexes a collection of low-complexity witnesses. The combinatorial
content of WL invariants may then be accessed via the vector of
statistics (hom(H, G))H ∈ ℋk, h,
up to known equivalences on appropriate graph classes. This suggests a
concrete algorithmic target: approximate, in near-linear time, a
suitably rich set of bounded-treewidth homomorphism counts, and package
these approximations into a permutation-invariant graph signature.
However, even for fixed h, the naive computation of hom(H, G) by dynamic programming along a tree decomposition of H typically produces intermediate tables whose size is n^{O(k)}, and thus again runs into a tensor blow-up when implemented directly. The main technical objective is therefore to replace these large intermediate objects by succinct randomized sketches that preserve the relevant multilinear forms. In this way we seek a ``sketched'' analogue of higher-order refinement: we do not store tuple colors, but rather maintain per-node (or per-anchor) compressed summaries from which the required motif statistics can be estimated with controlled variance.
The procedure we study, which we refer to as Sketch-WL, instantiates this objective in the adjacency-list access model on sparse graphs. The algorithm maintains, for each node v ∈ V, a sketch state Sℓ(v) updated over ℓ = 0, 1, …, L rounds by aggregating the multiset of neighbor sketches and applying a sketch-based compression map. Two features are central. First, the multiset aggregation step is realized by linear sketches (e.g., CountSketch-style signed hashing), so that the update is compatible with the linearity properties required for unbiased estimation. Second, we augment purely local refinement with a small number of randomly sampled anchors that provide coarse global position information (for example, truncated distances or short random-walk landing estimates). This augmentation is designed to mitigate symmetries that defeat purely local schemes while preserving permutation invariance in distribution.
From the standpoint of output, Sketch-WL produces a graph-level
signature vector Φk, h(G) ∈ ℝd
obtained by pooling the final node sketches together with explicit
estimators for selected motif statistics. The intended contract is
additive approximation: for each fixed pattern H with |V(H)| ≤ h, we aim
for
$$
\bigl|\widehat{\mathrm{hom}}(H,G)-\mathrm{hom}(H,G)\bigr|
\;\le\;
\varepsilon\, n^{|V(H)|}
$$
with failure probability at most δ, uniformly over all H ∈ ℋk, h.
The scale εn^{|V(H)|}
is the natural worst-case normalization, since hom(H, G) ≤ n^{|V(H)|}
for simple graphs. In dense regimes this guarantee is coarse; in the
sparse bounded-degree regime it is meaningful insofar as motif counts
for fixed H can still vary on
the order of n (or larger,
depending on H), while exact
counting remains sensitive to fine local structure.
The computational aim is that the above approximation be achievable
without ever storing Θ(n^k)
tuple states. Under a bounded-degree promise Δ ≤ Δ0 (with
Δ0 constant),
Sketch-WL can be implemented so that node refinement costs essentially a
constant number of passes over adjacency lists, and motif estimation
reduces to a controlled number of sparse linear-algebraic primitives
(e.g., A-vector
multiplications) and tensor-sketch updates. Consequently, for fixed
k, h the overall
complexity is near-linear in the input size, of the form
$$
\tilde{O}\!\left(m\cdot f(k,h)\cdot
\varepsilon^{-2}\log\!\frac{1}{\delta}\right)
\quad\text{time}
\qquad\text{and}\qquad
\tilde{O}\!\left(n\cdot f(k,h)\cdot
\varepsilon^{-2}\log\!\frac{1}{\delta}\right)
\quad\text{space},
$$
where f(k, h) depends
only on the pattern parameters (typically exponentially in h, or in (k, h)) and the Õ(⋅) notation suppresses
polylogarithmic factors in n
(and similarly in ε−1, δ−1
when appropriate). The dependence on ε and δ matches the standard behavior of
variance-reduction via independent repetitions and concentration bounds
for sketch estimators.
In addition to approximation, we emphasize a separation guarantee that is aligned with the WL motivation. If two graphs G and G′ differ on some target statistic by a margin exceeding the approximation error, namely if for some H ∈ ℋk, h one has |hom(H, G) − hom(H, G′)| > 2εn^{|V(H)|}, then the corresponding sketched signatures differ with probability at least 1 − δ. Thus Sketch-WL does not merely approximate counts in isolation; it yields a randomized invariant whose discrimination power tracks that of the targeted bounded-treewidth statistics up to an explicit additive tolerance.
Finally, we view lower bounds as part of the motivation, in that they clarify which resource savings are feasible in our access model. For example, even the seemingly modest task of distinguishing triangle-rich from triangle-poor graphs in the worst case forces Ω(m) adjacency-list operations, and achieving additive error εn^3 with failure probability δ requires on the order of ε^{-2} log(1/δ) independent measurements. Hence, once we commit to worst-case guarantees, one should not expect sublinear-time algorithms in m, nor should one expect to remove the quadratic dependence on ε^{-1} without additional structural assumptions. The role of Sketch-WL is therefore to match these unavoidable dependencies while eliminating the Θ(n^k) blow-up that would arise from a direct implementation of k-WL or tensor-based higher-order message passing.
The remainder of this exposition develops the requisite background for these claims. We review WL refinements and their higher-order variants, the connection between WL invariants and homomorphism counts for bounded-treewidth patterns, and the sketching primitives (CountSketch and TensorSketch) together with the concentration tools that justify their use for multiset and tensor aggregates.
We recall the 1-dimensional
Weisfeiler–Leman (WL) procedure (also called color refinement). Starting
from initial colors c0(v) (which may
encode discrete node labels, degrees, or be constant), WL iteratively
updates
$$
c_{\ell}(v)\;:=\;\mathrm{Hash}\Bigl(c_{\ell-1}(v),\
\multiset{c_{\ell-1}(u):u\in N(v)}\Bigr),
\qquad \ell=1,2,\dots,
$$
where $\multiset{\cdot}$ denotes a
multiset and Hash is an injective
encoding up to relabeling of colors. Two graphs are distinguished by
1-WL if, for some round ℓ, the multisets of colors differ
(equivalently, the color histograms differ). The algorithm is
permutation-invariant by construction and runs in time O(Lm) for L rounds. Its limitation is well
known: there exist nonisomorphic regular graphs that remain
indistinguishable under 1-WL for all
ℓ.
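For concreteness, the refinement loop admits a very short implementation; the following Python sketch is illustrative (the names wl_colors and the dictionary-based adjacency format are our own conventions, not part of the formal model). The injective Hash is realized by canonically re-indexing (previous color, sorted neighbor-color multiset) signatures, and two graphs are compared by refining their disjoint union so that both share one palette.

    def wl_colors(adj, rounds):
        # 1-WL / color refinement; `adj` maps each vertex to its neighbor list.
        colors = {v: 0 for v in adj}                  # constant initial coloring
        for _ in range(rounds):
            # Injective-up-to-relabeling Hash: (old color, sorted neighbor multiset)
            sig = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                   for v in adj}
            palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
            colors = {v: palette[sig[v]] for v in adj}
        return colors

    def distinguished_by_1wl(adj_G, adj_H, rounds):
        # Refine the disjoint union so both graphs share one color palette,
        # then compare color histograms.
        union = {("G", v): [("G", u) for u in nbrs] for v, nbrs in adj_G.items()}
        union.update({("H", v): [("H", u) for u in nbrs] for v, nbrs in adj_H.items()})
        colors = wl_colors(union, rounds)
        hist_G = sorted(c for (side, _), c in colors.items() if side == "G")
        hist_H = sorted(c for (side, _), c in colors.items() if side == "H")
        return hist_G != hist_H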
For k ≥ 2, the k-WL refinement lifts the state
space from vertices to k-tuples. One may view it as
refining colors on V^k, where each
tuple v = (v1, …, vk) ∈ V^k
carries a color Cℓ(v).
The update aggregates over the tuples obtained by varying one coordinate:
$$
C_{\ell}(\mathbf{v})
\;:=\;
\mathrm{Hash}\Bigl(C_{\ell-1}(\mathbf{v}),\
\bigl(\multiset{C_{\ell-1}(\mathbf{v}^{(i\leftarrow u)}):u\in
V}\bigr)_{i=1}^k\Bigr),
$$
where v(i ← u)
denotes the tuple obtained from v by replacing its i-th component by u. The initial coloring C0(v)
typically encodes the equality pattern among coordinates and the
adjacency relations induced by G on the tuple. The expressivity of
k-WL increases with k, and, on many graph classes,
moderate k suffices to
distinguish graphs that defeat 1-WL.
A commonly used, computationally friendlier analogue is the local variant of k-WL (in the spirit of the folklore k-FWL update), in which the aggregation over u ∈ V is restricted to local neighborhoods (e.g., replacing u ∈ V by u ∈ N(vi), or by u adjacent to some coordinate), thereby recovering a message-passing interpretation on tuples. Regardless of the exact variant, the salient point for our purposes is that direct simulation maintains Θ(n^k) states and performs Θ(k n^k ⋅ n)- or Θ(k n^k Δ)-type aggregations, which is infeasible at scale even when m is sparse.
We recall that a graph homomorphism from a pattern H to a host graph G is a map φ : V(H) → V(G)
such that {x, y} ∈ E(H)
implies {φ(x), φ(y)} ∈ E(G).
The homomorphism count is
$$
\mathrm{hom}(H,G)\;=\;\sum_{\varphi:V(H)\to V(G)}\ \prod_{\{x,y\}\in E(H)}\mathbf{1}\bigl[\{\varphi(x),\varphi(y)\}\in E(G)\bigr],
$$
and satisfies the trivial bound hom(H, G) ≤ n^{|V(H)|}.
We emphasize that hom(H, G) is a polynomial
in the adjacency indicators of G of degree |E(H)| and is invariant
under relabeling of V(G).
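As ground truth for what follows, hom(H, G) can be evaluated directly from this definition; the brute-force routine below (our own illustrative helper, exponential in |V(H)|) is usable only for tiny patterns and graphs.

    from itertools import product

    def hom(H_vertices, H_edges, G_adj):
        # hom(H, G): sum over all maps phi of the product of edge indicators.
        # G_adj maps each vertex of G to a set of its neighbors.
        total = 0
        for phi in product(G_adj.keys(), repeat=len(H_vertices)):
            m = dict(zip(H_vertices, phi))
            if all(m[y] in G_adj[m[x]] for (x, y) in H_edges):
                total += 1
        return total

    # Example: for the triangle K3, hom(K3, G) = 6 * T(G):
    # hom(["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")], G_adj)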
A key structural fact underlying our approach is that, for fixed
k, families of
bounded-treewidth patterns capture precisely the information content of
k-WL invariants in a
quantitative sense: there are characterizations (and refinements
thereof) relating indistinguishability by k-WL to equality of homomorphism
counts from all patterns of treewidth at most k (with minor shifts in the
parameter depending on the chosen WL variant and whether one considers
vertex-colored graphs). We do not reprove these connections here;
rather, we use them as guidance for which statistics to approximate.
Concretely, fixing a size bound h and treewidth bound k, we focus on the finite
family
ℋk, h = {H: |V(H)| ≤ h, tw(H) ≤ k},
and treat the vector (hom(H, G))H ∈ ℋk, h
as a surrogate for the bounded-complexity portion of k-WL information.
When tw(H) ≤ k, a tree decomposition of H of width k yields a standard dynamic program for computing hom(H, G). The computation propagates tables indexed by assignments of vertices of G to the current bag, and each table has size on the order of n^{k+1}. For fixed k and h, the number of bags is O(h), so the algorithm is polynomial-time in n; however, it explicitly materializes intermediate tensors of order k + 1 (each mode of size n), which is exactly the blow-up we seek to avoid when n is large. Thus, even though hom(H, G) is ``fixed-parameter tractable'' in (k, h), we require a different implementation strategy to achieve near-linear resource bounds in sparse graphs.
CountSketch is a canonical linear sketch for frequency vectors. Given
a universe [D], choose a hash
function h : [D] → [s] and
a sign function σ : [D] → {±1} (typically
from a pairwise independent family). For a vector x ∈ ℝD, its
sketch CS(x) ∈ ℝs is
defined by
$$
\mathrm{CS}(x)[j]\;=\;\sum_{i:\,h(i)=j}\sigma(i)\,x_i.
$$
The map x ↦ CS(x) is
linear, and inner products are preserved in expectation: for independent
sketches,
$$
\mathbb{E}\bigl[\langle \mathrm{CS}(x),\ \mathrm{CS}(y)\rangle\bigr]\;=\;\langle x,y\rangle,
$$
with variance decaying as O(∥x∥₂²∥y∥₂²/s)
under standard assumptions. Using depth t (independent repetitions) and
aggregating by median or mean yields concentration with failure
probability exp (−Θ(t)). For our
purposes, we interpret multiset aggregation as sketching a histogram
vector; linearity then permits summing neighbor contributions without
enumerating distinct multiset keys.
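A minimal NumPy rendering of a single CountSketch row, with explicit hash tables standing in for the limited-independence families assumed above (all names are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    D, s = 10_000, 256                       # universe size, sketch width
    h = rng.integers(0, s, size=D)           # bucket hash  h : [D] -> [s]
    sigma = rng.choice([-1.0, 1.0], size=D)  # sign hash    sigma : [D] -> {+-1}

    def count_sketch(x):
        # CS(x)[j] = sum_{i : h(i) = j} sigma(i) * x_i  (linear in x)
        cs = np.zeros(s)
        np.add.at(cs, h, sigma * x)
        return cs

    x, y = rng.standard_normal(D), rng.standard_normal(D)
    est = count_sketch(x) @ count_sketch(y)  # unbiased estimate of <x, y>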
When dynamic programs produce multilinear expressions (e.g., products
of compatibility indicators across bags), the intermediate object is
naturally a tensor (Kronecker product) rather than a vector.
TensorSketch provides an efficient linear map TS(⋅) such that for vectors x^{(1)}, …, x^{(q)}
one can form a sketch of the tensor product x^{(1)} ⊗ ⋯ ⊗ x^{(q)}
without materializing it, and moreover preserve inner products:
$$
\mathbb{E}\bigl[\langle \mathrm{TS}(x^{(1)}\otimes\cdots\otimes
x^{(q)}),\ \mathrm{TS}(y^{(1)}\otimes\cdots\otimes y^{(q)})\rangle\bigr]
\;=\;
\prod_{i=1}^q \langle x^{(i)},y^{(i)}\rangle.
$$
The standard construction uses hash-and-sign maps together with
convolution (e.g., via FFT) to combine sketches of the factors. The
relevance is that many bounded-treewidth homomorphism computations
reduce to repeated applications of such multilinear contractions;
TensorSketch allows us to approximate these contractions while storing
only O(s) numbers per
sketched table rather than n^{Θ(k)}
entries.
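The degree-2 case of the standard construction can be sketched as follows (illustrative code with explicit hash tables; the circular convolution of the two factor sketches is realized via the FFT):

    import numpy as np

    rng = np.random.default_rng(1)
    D, s = 512, 128
    h1, h2 = rng.integers(0, s, size=D), rng.integers(0, s, size=D)
    g1, g2 = rng.choice([-1.0, 1.0], size=D), rng.choice([-1.0, 1.0], size=D)

    def cs(x, h, g):
        out = np.zeros(s)
        np.add.at(out, h, g * x)
        return out

    def tensor_sketch(x, y):
        # Sketch of the tensor product x (x) y as the circular convolution of
        # the two factor sketches -- computed with FFTs, never materializing
        # the outer product itself.
        return np.fft.irfft(np.fft.rfft(cs(x, h1, g1)) *
                            np.fft.rfft(cs(y, h2, g2)), n=s)

    x, y, u, v = (rng.standard_normal(D) for _ in range(4))
    est = tensor_sketch(x, y) @ tensor_sketch(u, v)  # approx <x,u> * <y,v>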
The sketches we employ are unbiased (or nearly so, depending on normalization), and their variances admit bounds in terms of sketch width s and depth t. Consequently, by choosing s = Θ(ε^{-2}) (up to logarithmic factors) and t = Θ(log(1/δ)) and applying standard concentration inequalities (Chebyshev with repetition, or median-of-means), we obtain additive error guarantees of the form εn^{|V(H)|} for a fixed pattern H. Since k and h are fixed, the family ℋk, h is finite, and we may take a union bound over H ∈ ℋk, h to obtain simultaneous correctness for all requested motifs. The core conceptual point is that linear sketches commute with the summation/product structure arising from multiset aggregation and bounded-treewidth dynamic programs, thereby enabling us to approximate WL-aligned statistics without ever constructing tuple colors or large intermediate tensors.
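The median-of-means device referred to above is short enough to record explicitly (an illustrative helper):

    import numpy as np

    def median_of_means(samples, t):
        # Split i.i.d. probe values into t groups; the median of group means
        # achieves failure probability exp(-Theta(t)) at a fixed additive
        # accuracy determined by the group size.
        groups = np.array_split(np.asarray(samples, dtype=float), t)
        return float(np.median([g.mean() for g in groups]))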
We fix integers k ≥ 2 and h ≥ 3 throughout and consider an input graph G = (V, E) with |V| = n and |E| = m. Our objective is to approximate a collection of WL-aligned statistics of G—specifically, homomorphism counts from bounded-size, bounded-treewidth patterns—using a procedure whose time is near-linear in m and whose memory is near-linear in n, up to factors depending only on (k, h) and polylogarithmic factors in n.
We work in the randomized RAM model with adjacency-list query access. Concretely, for any vertex v ∈ V the algorithm may query deg (v) and may query the i-th neighbor N(v)[i] for i ∈ [deg (v)]. We count time in such adjacency-list operations (together with O(1) arithmetic on O(log n)-bit words). We assume the algorithm can sample a uniform random vertex from V at unit cost; sampling a uniform random directed edge can then be implemented by sampling v proportional to deg (v) via standard preprocessing, or by rejection sampling under bounded degree. Our constructions will be expressible as one (or a constant number of) passes over adjacency lists, while maintaining O(1) (sketched) state per vertex.
The primary regime is sparse graphs, i.e. m = Õ(n), and in
the main theorems we assume a bounded-degree promise
$$
\Delta \;=\; \max_{v\in V}\deg(v)\;\le\;\Delta_0,
$$
where Δ0 is a fixed
constant. This assumption is not merely aesthetic: it rules out
pathological concentration failures for local sampling and simplifies
variance bounds for sketch-based multilinear estimators. When Δ is unbounded, additional
normalization or degree-splitting may be required; since our focus is on
near-linear procedures in sparse graphs, we treat bounded degree as the
canonical case.
For a fixed pattern graph H
we write hom(H, G)
for the number of graph homomorphisms φ : V(H) → V(G),
i.e.,
$$
\mathrm{hom}(H,G)\;=\;\sum_{\varphi:V(H)\to V(G)}\ \prod_{\{x,y\}\in E(H)}\mathbf{1}\bigl[\{\varphi(x),\varphi(y)\}\in E(G)\bigr].
$$
We fix the finite family of patterns
ℋk, h = {H: |V(H)| ≤ h, tw(H) ≤ k},
and we regard the vector (hom(H, G))H ∈ ℋk, h
as the statistics-to-approximate. The role of the treewidth bound is to
ensure that these statistics correspond to bounded-complexity WL
information and admit decompositions amenable to sketching, while the
size bound |V(H)| ≤ h ensures
the family is finite and hence supports uniform guarantees via a union
bound. Since
0 ≤ hom(H, G) ≤ n^{|V(H)|} for all H,
it is natural to measure error additively in units of n^{|V(H)|}.
Given parameters ε ∈ (0, 1)
and δ ∈ (0, 1), the estimation
task is to output, for each requested H ∈ ℋk, h,
a randomized estimator $\widehat{\mathrm{hom}}(H,G)$
satisfying
$$
\Pr\Bigl[\ \bigl|\widehat{\mathrm{hom}}(H,G)-\mathrm{hom}(H,G)\bigr|\
\le\ \varepsilon\,n^{|V(H)|}\ \Bigr]\ \ge\ 1-\delta,
$$
and, more strongly, to achieve this for all H ∈ ℋk, h
with overall failure probability at most δ:
$$
\Pr\Bigl[\ \forall H\in\mathcal{H}_{k,h}:\
\bigl|\widehat{\mathrm{hom}}(H,G)-\mathrm{hom}(H,G)\bigr|\ \le\
\varepsilon\,n^{|V(H)|}\ \Bigr]\ \ge\ 1-\delta.
$$
Since k and h are fixed, |ℋk, h| is a
constant (depending only on (k, h)), and the uniform
guarantee is obtained by choosing per-pattern failure probability δ/|ℋk, h|
and applying a union bound. We emphasize that we target additive accuracy; when
hom(H, G) = Θ(n^{|V(H)|})
this is equivalent to O(ε) relative error,
whereas for sparser motifs additive error is the appropriate worst-case
benchmark.
Beyond estimating individual counts, we seek a graph-level signature
Φk, h(G) ∈ ℝd
which supports a robust separation statement. Formally, we say that two
graphs G and G′ are (k, h, ε)-separated if there exists a
witness pattern H ∈ ℋk, h
with
|hom(H, G) − hom(H, G′)| > 2ε n^{|V(H)|}.
The margin 2εn^{|V(H)|}
is chosen so that, if both estimators are accurate to εn^{|V(H)|},
then the corresponding estimated intervals are disjoint. The separation
task is to construct Φk, h(G)
such that, with probability at least 1 − δ over the algorithmic
randomness,
G, G′ are
(k, h, ε)-separated ⇒ Φk, h(G) ≠ Φk, h(G′).
In particular, if we include the (approximate) motif counts themselves
as part of the signature, then separation follows immediately from the
uniform estimation guarantee; however, we will also incorporate
additional WL-style refined summaries so that the signature serves as a
compact proxy for iterative neighborhood information as well.
Our algorithms are required to avoid the Θ(n^k) state size implicit in maintaining colors on V^k. Instead, we target space Õ(n ⋅ f(k, h) ⋅ ε^{-2} log(1/δ)) by storing only O(1) sketches per vertex (and a constant number of global sketches), and time Õ(m ⋅ f(k, h) ⋅ ε^{-2} log(1/δ)) by performing sparse neighborhood aggregations and a bounded number of sketch-based multilinear contractions. Here f(k, h) may be exponential in h (or in (k, h)) but is independent of n and m, reflecting fixed-parameter dependence on the pattern complexity. The output dimension d is permitted to scale as Õ(f(k, h)ε^{-2} log(1/δ)), consistent with the number of stored sketch coordinates.
All procedures are randomized, using hash functions and (when needed) uniform samples of vertices or local neighborhoods. We require that the output distribution is invariant under relabeling of V(G); equivalently, for any permutation π of V the distributions of Φk, h(G) and Φk, h(π(G)) coincide. This is achieved by using only permutation-invariant access (adjacency lists) together with randomness independent of labeling (uniform vertex sampling and universal hashing).
The next section instantiates these requirements by combining (i) WL-inspired neighborhood refinement implemented via linear multiset sketches and (ii) bounded-treewidth motif estimation implemented via tensor sketches, yielding a single signature Φk, h(G) that supports both uniform additive estimation and margin-based separation under the stated resource bounds.
We now describe the randomized procedure that produces the signature Φk, h(G) and the estimators $\widehat{\mathrm{hom}}(H,G)$ while maintaining only sketched states at vertices. Conceptually, the construction has two coupled components: (i) a WL-style refinement operating on vertex states but implemented via linear sketches of neighbor multisets, and (ii) a motif module which contracts collections of refined vertex states using tensor-sketch primitives aligned with bounded-treewidth decompositions. Throughout we view k, h as fixed constants, and we select the sketch parameters so that the total number of stored coordinates is Õ(f(k, h)ε−2log (1/δ)).
A purely local refinement can be too symmetric in sparse
bounded-degree graphs; we therefore inject a small amount of global
reference information by sampling a set of anchors Anc = {a1, …, ar} ⊆ V
uniformly at random, with r = Õ(ε^{-2} log(1/δ))
(or smaller when anchors are used only for stabilization rather than
estimation). For each v ∈ V we compute a
positional feature vector p(v) relative to Anc. One convenient choice is the truncated
distance profile
$$
p(v)\;=\;\bigl(\min\{\mathrm{dist}(v,a_j),\,q\}\bigr)_{j=1}^{r},
$$
where q = O(log n)
bounds the search radius; under bounded degree, q can be treated as a constant in
the neighborhood explored around each anchor. Alternatively, one may use
short random-walk landing probabilities from each anchor, approximated
by a small number of steps and stored in sketch form. The only property
we require is that p(v) is label-invariant
(depends only on graph structure and random seed) and computable in
near-linear time in m for
fixed r, q.
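A minimal realization of the truncated distance profile, assuming dictionary adjacency lists and a precomputed anchor list (illustrative code):

    from collections import deque

    def distance_profiles(adj, anchors, q):
        # p(v)[j] = min(dist(v, a_j), q), via one truncated BFS per anchor.
        profiles = {v: [q] * len(anchors) for v in adj}
        for j, a in enumerate(anchors):
            dist = {a: 0}
            frontier = deque([a])
            while frontier:
                v = frontier.popleft()
                if dist[v] >= q:          # do not expand beyond radius q
                    continue
                for u in adj[v]:
                    if u not in dist:
                        dist[u] = dist[v] + 1
                        frontier.append(u)
            for v, d in dist.items():
                profiles[v][j] = d        # d <= q by construction
        return profiles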
We form an initial vertex descriptor by concatenating any available
base feature (e.g., a constant, degree, or a hashed label when present)
with the anchor-position vector:
x(v) = concat(base(v), p(v)).
We then apply a CountSketch map to obtain a fixed-width randomized
summary S0(v) ∈ ℝs,
where s is the sketch width
and t is the depth (number of
independent hash/sign functions). Formally, we fix t hash functions hj : [D] → [s]
and sign functions σj : [D] → {±1}
(for an appropriate input dimension D) drawn from a limited-independence
family, and we store the sketched coordinates
$$
\bigl(S_0(v)\bigr)_{j,b}\;=\;\sum_{i:\,h_j(i)=b}\sigma_j(i)\,x(v)_i,
\qquad j\in[t],\ b\in[s].
$$
We emphasize that we do not require x(v) to be explicitly
materialized as a dense vector; it suffices that the nonzero features
can be streamed into the sketch updates.
For ℓ ≥ 1 we update vertex
states by aggregating the multiset {Sℓ − 1(u) : u ∈ N(v)}.
A direct WL update would store a histogram over neighbor colors; we
instead maintain a multiset operator. Concretely, we define
$$
T_\ell(v)\;=\;\mathrm{MultiSetSketch}\Bigl(\multiset{S_{\ell-1}(u):u\in N(v)}\Bigr)\;\in\;\mathbb{R}^{s'},
$$
where MultiSetSketch is linear in its
inputs. The simplest instantiation is coordinatewise summation of
neighbor sketches (possibly after an additional randomized hashing layer
to reduce correlations):
$$
T_\ell(v)\;=\;\sum_{u\in N(v)}\Psi\bigl(S_{\ell-1}(u)\bigr),
$$
with Ψ a fixed random
sign-permutation map. Linearity ensures that Tℓ(v)
can be computed by iterating once over adjacency lists and accumulating
into the bucketed arrays of v.
Under bounded degree, the per-vertex update cost is O(Δ0 ⋅ s′),
hence total time O(m ⋅ s′)
per refinement round.
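In code, one such round is a single pass over adjacency lists; the sketch below is illustrative, with the sign-permutation Ψ given by precomputed arrays perm and signs drawn once from the random seed, and Ψ applied once per vertex before accumulation:

    import numpy as np

    def refine_round(adj, S_prev, perm, signs):
        # T_l(v) = sum_{u in N(v)} Psi(S_{l-1}(u)),  Psi(x) = signs * x[perm].
        mixed = {u: signs * S_prev[u][perm] for u in adj}   # O(n * s') work
        T = {}
        for v in adj:                                       # O(m * s') total
            acc = np.zeros(len(signs))
            for u in adj[v]:
                acc += mixed[u]
            T[v] = acc
        return T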
Given the previous state and the aggregated neighbor sketch, we
compute the next refined state by a randomized hash-and-sketch
step:
$$
S_\ell(v)\;=\;\mathrm{HashAndSketch}\bigl(S_{\ell-1}(v),\,T_\ell(v)\bigr)\;\in\;\mathbb{R}^{s}.
$$
One may implement HashAndSketch as (i)
concatenation of Sℓ − 1(v)
and Tℓ(v)
followed by a fresh CountSketch, or (ii) a polynomial sketch of low
degree in the coordinates of (Sℓ − 1(v), Tℓ(v))
when we wish to model higher-order interactions. For the purposes of our
guarantees it suffices that the map is (a) permutation invariant with
respect to neighbor ordering, (b) computed from limited-independence
hashing, and (c) induces bounded variance for the downstream motif
estimators. We perform L such
refinement rounds, where L is
a fixed constant chosen to match the desired locality depth (and,
heuristically, the number of WL iterations we intend to
approximate).
After refinement, we have vertex states SL(v)
that act as randomized feature vectors summarizing rooted neighborhoods
(with anchor references). To estimate homomorphism counts for a fixed
bounded-treewidth pattern H,
we must contract collections of such features across multiple vertices
of G in a manner consistent
with the adjacency constraints of H. We proceed by fixing a tree
decomposition of H of width at
most k, and writing hom(H, G) as a sum over
assignments to bags of products of local compatibility indicators. The
intermediate objects in such a dynamic program are tensors of order at
most k + 1; rather than
materializing them, we maintain tensor sketches of these tensors. In
particular, for any multilinear form
$$
\sum_{v_1,\dots,v_p\in V} \Bigl(\prod_{(i,j)\in \mathcal{C}}
\mathbf{1}[\{v_i,v_j\}\in E]\Bigr)\cdot \Bigl(\prod_{i=1}^p
g_i(v_i)\Bigr),
$$
with p ≤ h and
constraint set 𝒞 derived from H, we replace the products by a
TensorSketch estimator that is unbiased and computable from the sketched
vertex states and sparse edge traversals. For small motifs such as
length-2 paths (i.e., wedges), the required
contractions reduce to simple bilinear forms over E; for more complex
bounded-treewidth motifs, the decomposition ensures that all
contractions are over constant-width separators and can be implemented
with f(k, h)
many sketch coordinates.
We define the graph-level signature Φk, h(G)
by combining (i) a global pooling of refined vertex states and (ii) the
motif estimators:
$$
\Phi_{k,h}(G)\;=\;\Bigl(\sum_{v\in V} S_L(v)\Bigr)\ \oplus\
\Bigl(\widehat{\mathrm{hom}}(H,G)\Bigr)_{H\in\mathcal{H}_{k,h}},
$$
where ⊕ denotes concatenation. The
pooled vector ∑vSL(v)
serves as a compact summary of refined local neighborhoods, while the
appended motif estimates provide explicit access to the WL-aligned
statistics. Because both parts are computed from permutation-invariant
primitives (uniform anchor sampling, adjacency-list iteration, and
hashing independent of labels), the distribution of Φk, h(G)
is invariant under relabeling of V.
All steps are implementable with O(1) (sketched) state per vertex: we store Sℓ(v) for ℓ ∈ {0, …, L} either explicitly for the current round (overwriting in place) or with two buffers, together with the anchor-position features p(v) (or their sketches). Each refinement round is one pass over adjacency lists to accumulate Tℓ(v), followed by an O(n) pass to apply HashAndSketch. The motif module is likewise expressible as a bounded number of sparse passes and sketch contractions whose total work is Õ(m ⋅ f(k, h) ⋅ ε^{-2} log(1/δ)) when sketch widths are set to achieve variance O(ε² n^{2|V(H)|}) and failure probability δ after a union bound over H ∈ ℋk, h. In the next section we instantiate these choices for triangles and 4-cycles and then state the corresponding additive-error guarantees together with near-matching lower bounds in the access model.
We first instantiate the motif module for the two canonical constant-size motifs: triangles and 4-cycles. These serve as a concrete calibration point for (i) the additive-error guarantees we target throughout and (ii) the near-linear dependence on m and on the sketch repetition parameter ε−2log (1/δ).
Let A be the adjacency
matrix of an undirected simple graph G. The triangle count
satisfies
$$
T(G)\;=\;\frac{1}{6}\,\mathrm{tr}(A^3),
$$
since each triangle contributes 6
oriented length-3 closed walks. For
4-cycles we use the standard
decomposition of length-4 closed
walks:
$$
\mathrm{tr}(A^4)\;=\;8\,C_4(G)\;+\;2m\;+\;4\sum_{v\in V}\binom{d(v)}{2},
$$
where d(v) is the
degree of v. Indeed, the trace
counts (a) genuine 4-cycles (each
contributing 8), (b) back-and-forth
walks on a single edge (contributing 2
per edge), and (c) walks supported on a length-2 path u − v − w
(contributing 4 per unordered
neighbor-pair at the middle vertex). Consequently,
$$
C_4(G)\;=\;\frac{1}{8}\Bigl(\mathrm{tr}(A^4)-2m-4\sum_{v\in
V}\binom{d(v)}{2}\Bigr),
$$
and the correction term is computable exactly in O(m) time from the
adjacency lists.
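The decomposition is easy to confirm mechanically on small graphs; the following dense-matrix check (a validation utility only, not part of the sparse pipeline) verifies it on the 4-cycle itself:

    import numpy as np
    from math import comb

    def c4_via_trace(A):
        # C4(G) = (tr(A^4) - 2m - 4 * sum_v C(d(v), 2)) / 8 for simple graphs.
        deg = A.sum(axis=1).astype(int)
        m = int(A.sum()) // 2
        tr_A4 = np.trace(np.linalg.matrix_power(A, 4))
        return (tr_A4 - 2 * m - 4 * sum(comb(int(d), 2) for d in deg)) / 8

    # Sanity check: the 4-cycle contains exactly one copy of C4.
    A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
    assert c4_via_trace(A) == 1.0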
For any symmetric matrix M and a Rademacher vector z ∈ {±1}^n, we have
$$
\mathbb{E}\bigl[z^\top M z\bigr]\;=\;\mathrm{tr}(M)
\qquad\text{and}\qquad
\mathrm{Var}\bigl(z^\top M z\bigr)\;\le\;2\,\|M\|_F^2.
$$
We therefore estimate tr(A^p) for p ∈ {3, 4} by averaging independent
probes:
$$
\widehat{\tau}_p\;:=\;\frac{1}{q}\sum_{i=1}^q \bigl(z^{(i)}\bigr)^\top
A^p z^{(i)}.
$$
Each quantity (z^{(i)})⊤ A^p z^{(i)}
is computed using p sparse
matrix–vector multiplications, which, under adjacency-list access,
reduce to p passes over the
edge set:
$$
(Ax)_v\;=\;\sum_{u\in N(v)} x_u,\qquad v\in V.
$$
Thus, one probe costs O(m) arithmetic operations
and O(n) working
memory for a constant number of vectors; the total cost is O(qm) time and
O(n) space, up to
polylogarithmic factors needed to implement limited-independence hashing
when we generate the z(i)
pseudorandomly.
We package the two trace estimators into
$$
\widehat{T}(G)\;:=\;\frac{1}{6}\widehat{\tau}_3,
\qquad
\widehat{C}_4(G)\;:=\;\frac{1}{8}\Bigl(\widehat{\tau}_4-2m-4\sum_{v\in
V}\binom{d(v)}{2}\Bigr).
$$
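A compact rendering of the probe loop, assuming the undirected edge set is stored as paired index arrays rows, cols with each edge listed in both orientations (illustrative code):

    import numpy as np

    def matvec(rows, cols, x):
        # (A x)_v = sum_{u in N(v)} x_u, one pass over the edge arrays.
        y = np.zeros_like(x)
        np.add.at(y, rows, x[cols])
        return y

    def hutchinson_trace(rows, cols, n, p, q, seed=0):
        # Average of q Rademacher probes z^T A^p z; each probe costs p sparse
        # matvecs, i.e., O(m) adjacency-list work and O(n) memory.
        rng = np.random.default_rng(seed)
        acc = 0.0
        for _ in range(q):
            z = rng.choice([-1.0, 1.0], size=n)
            w = z
            for _ in range(p):
                w = matvec(rows, cols, w)
            acc += z @ w
        return acc / q

    # T_hat = hutchinson_trace(rows, cols, n, 3, q) / 6, and C4_hat follows by
    # subtracting the exact O(m) degree correction before dividing by 8.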
The following guarantee is the specialization of our general sketching bounds to these motifs: with q = Θ(ε^{-2} log(1/δ)) probes, the estimators satisfy |T̂(G) − T(G)| ≤ εn^3 and |Ĉ4(G) − C4(G)| ≤ εn^4, each with probability at least 1 − δ, using Õ(m ε^{-2} log(1/δ)) time and O(n) working space.
Unbiasedness is immediate from 𝔼[z⊤A^p z] = tr(A^p) and the deterministic correction for C4(G). For concentration, we invoke the variance bound Var(z⊤Mz) ≤ 2∥M∥_F² with M = A^p, together with averaging over q probes and a standard median-of-means (or Chebyshev plus repetition) amplification to reach failure probability δ. Since we target additive error εn^p, it suffices that q = Θ(ε^{-2} log(1/δ)) controls the empirical mean to within εn^p; polylogarithmic overhead accounts for limited-independence generation of the probes and for union bounding when we report multiple motif estimates simultaneously. The time bound follows because each probe requires a constant number of sparse matvecs, each implementable in O(m) adjacency-list operations; the working memory is O(n) per probe and reused across probes.
In the full procedure, T̂(G) and Ĉ4(G) are appended to the pooled refined-state readout. The resulting concatenated vector remains permutation invariant in distribution: the probe vectors may be generated from a seed independent of vertex labels (or derived from the same seed as the anchor sampling), and the subsequent computations are expressed as sums over adjacency lists.
The above dependence on m and on ε−2log (1/δ) is not an artifact of our implementation; it is forced by the access model in the worst case.
First, any adjacency-list algorithm that distinguishes two families of graphs that differ only by Θ(εn^3) triangles must, in the worst case, inspect Ω(m) edges. The reason is that an adversary can localize all triangle discrepancies inside a subset of edges that is information-theoretically hidden unless a linear fraction of adjacency entries is queried. Formally, one applies Yao’s minimax principle to a distribution supported on graphs in which (i) all but a planted subgraph region are identical, and (ii) the planted region is placed uniformly among many candidate locations; then any sublinear-time algorithm fails to hit the planted region with constant probability, and cannot distinguish the two cases.
Second, even assuming Ω(m) time per pass, the number of independent measurements cannot be reduced below Ω(ε^{-2} log(1/δ)) for worst-case additive error. This follows from the classical sample-complexity lower bound for estimating the mean of a bounded-variance random variable: for any unbiased estimator assembled from q independent probes (or q independent sketch repetitions), the variance decreases at best linearly in q, so achieving standard deviation O(εn^3) (and failure probability δ via amplification) forces q = Ω(ε^{-2} log(1/δ)). This lower bound applies to triangle estimation (and, with the same reasoning, to any fixed-degree trace/homomorphism statistic whose unbiased probe has variance bounded away from zero on a hard family), and it matches the repetition complexity of the guarantee above up to polylogarithmic factors.
These two lower bounds justify our choice of objectives: in the adjacency-list model, near-linear time in m and ε−2log (1/δ) repetitions are essentially optimal for constant-size motif estimation. In the next section we extend the motif module from these two trace-based motifs to arbitrary bounded-treewidth patterns, recovering the corresponding WL-aligned homomorphism statistics.
We now extend the motif module from the two trace-based motifs to the full family of bounded-treewidth patterns. Fix integers k ≥ 2 and h ≥ 3, and recall ℋk, h denotes the collection of graphs H with |V(H)| ≤ h and tw(H) ≤ k. Our target statistic for each H ∈ ℋk, h is the homomorphism count hom(H, G), i.e., the number of maps φ : V(H) → V(G) such that {φ(x), φ(y)} ∈ E(G) for all {x, y} ∈ E(H). For fixed H, this quantity is a degree-|E(H)| polynomial in the adjacency matrix entries of G, and it subsumes standard small-motif counts (injective counts can be recovered from hom(⋅, G) by inclusion–exclusion for fixed H).
Let H ∈ ℋk, h and fix a tree decomposition (𝒯, {Bα}α ∈ V(𝒯)) of width at most k, so |Bα| ≤ k + 1 for all bags. A classical exact algorithm computes hom(H, G) by dynamic programming over 𝒯: for each bag Bα one maintains a table indexed by assignments σ : Bα → V(G), storing the number of extensions of σ to a homomorphism from the portion of H covered by the subtree below α. The combine operation at a parent bag α with children β1, …, βr is a product-and-sum over assignments on intersections Bα ∩ Bβj, and the running time is n^{O(k)} for fixed k (already too large for our purpose when n is large, even if k is small).
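Before sketching, it may help to see the exact DP in its simplest case: for treewidth 1 the bag recurrence collapses to the classical tree dynamic program, shown below for a rooted tree pattern (illustrative code; children encodes the pattern's rooted structure, adj the host graph).

    def hom_tree(children, root, adj):
        # Exact hom(H, G) for a rooted tree pattern H. For pattern node x,
        # t[v] = #homomorphisms of the subtree at x that map x to v; folding in
        # a child c multiplies by sum_{u in N(v)} t_c[u] (the bag "gluing").
        def table(x):
            t = {v: 1 for v in adj}
            for c in children.get(x, []):
                tc = table(c)
                t = {v: t[v] * sum(tc[u] for u in adj[v]) for v in adj}
            return t
        return sum(table(root).values())

    # Example: the path a-b-c rooted at b gives hom(P3, G) = sum_v deg(v)^2:
    # hom_tree({"b": ["a", "c"]}, "b", adj)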
Our sketching step replaces each such table by a linear sketch that supports (i) summation over assignments, (ii) multiplication of independent factors (as induced by the decomposition), and (iii) restriction/gluing along separators Bα ∩ Bβ, all without materializing n^{k+1} entries. Concretely, we represent a table Fα : V(G)^{Bα} → ℝ by a CountSketch/TensorSketch of Fα viewed as a vector of length n^{|Bα|}. Since |Bα| ≤ k + 1 is constant, we may regard the sketch dimension as depending only on (k, h, ε, δ) and not on n.
At an internal node α of
the decomposition, the exact recurrence has the schematic form
$$
F_\alpha(\sigma)\;=\;\mathrm{Loc}_\alpha(\sigma)\cdot \prod_{j=1}^{r}
\Bigl(\sum_{\tau_j:\,B_{\beta_j}\to V(G),\ \tau_j|_{B_\alpha\cap
B_{\beta_j}}=\sigma|_{B_\alpha\cap B_{\beta_j}}}
F_{\beta_j}(\tau_j)\Bigr),
$$
where Locα(σ) ∈ {0, 1}
enforces the edges of H whose
endpoints both lie in Bα (and
similarly enforces consistency of already-introduced vertices). The key
point is that the operations involved are multilinear in the child
tables. We implement the multilinear forms using TensorSketch so that,
for each α, from sketches of
Fβ1, …, Fβr
we can produce a sketch of Fα with (i)
unbiasedness for every linear query of Fα and (ii)
variance bounded by a factor controlled by the sketch width and depth.
The local factor Locα(σ) is
applied by multiplying by adjacency indicators; under adjacency-list
access this reduces to iterating over edges incident to candidate images
of bag vertices, and since |Bα| is constant
and Δ ≤ Δ0, the
local enforcement can be integrated into the sketch update with constant
overhead per neighborhood lookup.
At the root bag ρ, the homomorphism count is hom(H, G) = ∑σFρ(σ); in the sketch representation this is a linear functional and is therefore estimated unbiasedly from the sketch of Fρ. Repeating the sketch with independent hash functions and applying median-of-means yields concentration at failure probability δ.
We fix a tree decomposition of H of width ≤ k. Each DP combine step is a multilinear map in the child tables followed by summation over eliminated variables; TensorSketch linearizes such multilinear maps in the sense that the sketch of a product can be computed from the sketches of the factors, and any linear query of the resulting table has an unbiased estimator with variance proportional to the squared ℓ2-mass of the table. Since H is constant size and Δ is bounded, the ℓ2-mass of the DP tables is controlled by a polynomial in n of degree |V(H)|, and choosing sketch width s = Θ̃(ε^{-2}) and depth t = Θ̃(log(1/δ)) bounds the variance so that a median-of-means amplification yields the stated additive error. The adjacency-list access is used only to evaluate constant-size local constraints and to perform neighborhood-based aggregations; these cost Õ(m) per independent sketch repetition.
We emphasize that the restriction tw(H) ≤ k is not merely algorithmic convenience: it aligns the statistic family with k-dimensional WL invariants. In particular, known WL–homomorphism characterizations imply that equality of hom(H, ⋅) over all bounded-treewidth patterns H (with width bounded as a function of k) is an isomorphism-invariant refinement comparable to k-WL. Consequently, the collection {hom(H, G)}H ∈ ℋk, h may be viewed as a finite truncation of the corresponding WL-aligned invariant family, and our procedure approximates precisely this truncation while avoiding the Θ(n^k) tuple-state blow-up inherent in explicit k-WL refinements.
We now state the separation property that justifies including $\{\widehat{\mathrm{hom}}(H,G)\}_{H\in\mathcal{H}_{k,h}}$ inside the global signature. Let Φk, h(G) denote the concatenation of the pooled refined-state readout with the motif estimates $\widehat{\mathrm{hom}}(H,G)$ for H ∈ ℋk, h (or for any predetermined subfamily). Since each coordinate estimate is permutation-invariant in distribution and is computed from adjacency lists through sums and sketches, Φk, h is an isomorphism-invariant randomized signature.
By the per-pattern estimation guarantee and a union bound over the finitely many H ∈ ℋk, h that we estimate, with probability at least 1 − δ all estimates satisfy $|\widehat{\mathrm{hom}}(H,\cdot)-\mathrm{hom}(H,\cdot)|\le \varepsilon n^{|V(H)|}$ simultaneously. On this event, if hom(H, G) and hom(H, G′) differ by more than 2εn^{|V(H)|} for some witness H, then the corresponding estimated coordinates differ (their error intervals are disjoint), hence the concatenated vectors Φk, h(G) and Φk, h(G′) are unequal.
The foregoing establishes that bounded-treewidth homomorphism statistics can be approximated uniformly in near-linear time and used as separating coordinates in our signature; we next delineate limitations and lower bounds that preclude exact emulation of full k-WL at comparable resource bounds in worst-case regimes.
Our guarantees deliberately target the regime of parameters (k, h) and recovery of the WL-aligned statistics {hom(H, G)}H ∈ ℋk, h. We now record the obstructions showing that, in worst-case graphs and under standard access models, one cannot simultaneously demand (i) exact emulation of k-WL, (ii) uniformity over patterns of unbounded size, and (iii) near-linear resource bounds in m and n.
The k-dimensional WL refinement maintains a color for each ordered k-tuple in V^k and iteratively updates it using multisets of colors of neighboring tuples (neighbors obtained by changing one coordinate at a time). Even ignoring the cost of forming these multisets, the object being refined has cardinality n^k, hence any explicit simulation necessarily uses Ω(n^k) time merely to touch all tuple states, and Ω(n^k) space to store the current colors (or any information sufficient to output them).
One might hope to circumvent this by computing only a graph-level invariant that separates exactly the same graph pairs as k-WL, without materializing the tuple coloring. However, there exist families of graphs for which the stabilized k-WL coloring induces Θ(n^k) distinct color classes on V^k (indeed, for generic instances the orbit structure on tuples can be essentially discrete). Any procedure that matches k-WL separation on all graphs must in particular be able to distinguish instances whose only witnesses live at the tuple level, and in such cases the information content of the stabilized coloring is itself Θ(n^k) in the worst case. Thus, absent additional structure assumptions on the graph class (e.g., bounded treewidth, bounded expansion, or strong regularity constraints), exact k-WL emulation is incompatible with our objective of avoiding Θ(n^k) blow-up.
Our procedure fixes h in advance and treats f(k, h) as a constant with respect to n. This restriction is not an artifact of the analysis but is compelled by classical counting hardness. In particular, the following facts are standard.
Given an input pattern H with |V(H)| = h as part of the input, computing the subgraph count sub(H, G) (the number of copies of H in G) is #P-complete. A canonical reduction is from #Clique: taking H = Kh, the value sub(Kh, G) equals the number of h-cliques in G, and counting h-cliques is #P-hard (the corresponding decision problem is NP-hard) when h is part of the input.
While hom(H, G) permits non-injective maps, it remains hard uniformly over H. Indeed, for fixed H the quantities hom(F, G) over all quotients F of H determine sub(H, G) via Möbius inversion on the partition lattice of V(H) (equivalently, inclusion–exclusion over identifications of vertices). Consequently, an algorithm that could compute hom(H, G) exactly for arbitrary input H in time polynomial in n and |H| would yield exact injective counts under polynomial-time Turing reductions, contradicting #P-hardness unless unlikely complexity collapses occur.
These hardness statements justify the design choice that (k, h) must be treated as fixed constants if one seeks uniform near-linear dependence on m and n.
Even when H is fixed (say H = K3), the access model imposes strong barriers. With adjacency-list access, a randomized algorithm that must succeed on worst-case graphs cannot, in general, estimate T(G) (or distinguish T(G) from T(G) ± Θ(εn^3)) without inspecting a linear number of edges in the worst case. Intuitively, an adversary can place all triangle-discrepancy inside a set of edges that is information-theoretically hidden unless the algorithm probes a constant fraction of adjacency lists; formal statements follow by Yao-style minimax arguments with distributions supported on graphs that differ only in a sparse, hard-to-find region. Thus the Õ(m) dependence of our motif estimators is not merely convenient; it is essentially necessary in the worst case.
Independently, concentration imposes the familiar dependence on ε and δ. Any unbiased estimator with bounded variance requires on the order of ε^{-2} log(1/δ) independent measurements (sketch repetitions, random probes, or equivalent randomness) to achieve additive error εn^{|V(H)|} with failure probability at most δ. Hence, within the usual randomized estimation framework, our ε^{-2} log(1/δ) scaling is near-optimal up to polylogarithmic factors.
Our procedures exploit adjacency-list access to perform neighborhood-limited consistency checks and sparse matrix–vector style primitives. In an insertion-only edge stream, however, even triangle counting admits strong space lower bounds via reductions from communication complexity (e.g., set-disjointness embeddings). In particular, one cannot in general maintain sufficiently informative sketches in o(n) space while guaranteeing worst-case additive accuracy with constant success probability. This separates the offline adjacency-list model from the streaming model: the former admits near-linear-time, near-linear-space estimators for fixed motifs, while the latter provably forbids analogous guarantees without additional promises on the input distribution or graph class.
Our uniform additive bounds are stated in the bounded-degree regime Δ ≤ Δ0. This hypothesis enters both algorithmically (local constraint enforcement can be done with constant neighborhood work) and statistically (higher moments of the relevant estimators can be controlled). If Δ is unbounded, then either (i) local enforcement costs can scale with Δ, or (ii) variance can be dominated by high-degree vertices, necessitating additional normalization, clipping, or degree-sensitive sampling; such modifications typically trade simplicity for more delicate bias–variance analyses. Moreover, since our guarantees are additive, motifs whose true counts are o(n|V(H)|) cannot be promised small error without further density assumptions.
In summary, the constraints ``fixed (k, h)'', ``randomized approximation'', and ``near-linear dependence on m and n'' are jointly consistent and essentially sharp under standard complexity and access-model lower bounds; dropping any of these constraints changes the landscape qualitatively.
We record practical choices for implementing Sketch-WL so that the resulting procedure remains faithful to the stated access model while being usable as a component in standard learning pipelines. The guiding constraint is that all randomness should enter through explicit hash functions and anchor sampling, and all state should remain linear in n up to the chosen sketch sizes.
In all sketching subroutines (CountSketch, MultiSetSketch,
TensorSketch), we require only limited independence to obtain the usual
variance bounds. In practice we instantiate each hash as a fast 64-bit mix function followed by a range
reduction. Concretely, for bucket hashes h : [n] → [s] and
sign hashes σ : [n] → {±1}, we use
either (i) tabulation hashing (which empirically performs well and is
close to 3–4-wise independent in relevant regimes), or
(ii) multiply–shift families of the form
$$
h(x)\;=\;\bigl((a x + b)\bmod 2^{64}\bigr)\gg\bigl(64-\lceil\log_2 s\rceil\bigr),
\qquad
\sigma(x)\;=\;(-1)^{\mathrm{lsb}\left((c x + d)\bmod 2^{64}\right)},
$$
with independent odd a, c and uniform b, d. We emphasize that
separation comparisons between graphs should use common randomness: the same hash
parameters and (when applicable) the same anchor-sampling seed. For
training, one may optionally resample hashes across epochs as a form of
noise injection, but evaluation should fix a seed so that signatures are
stable and comparable.
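The multiply–shift pair can be written directly on 64-bit words; the constants below are arbitrary illustrative seeds, and s is assumed to be a power of two:

    MASK64 = (1 << 64) - 1

    def make_multiply_shift(s, a=0x9E3779B97F4A7C15, b=0xD1B54A32D192ED03,
                            c=0xC2B2AE3D27D4EB4F, d=0x165667B19E3779F9):
        # a, c must be odd; s a power of two. Returns bucket and sign hashes.
        assert s & (s - 1) == 0 and a % 2 == 1 and c % 2 == 1
        shift = 64 - s.bit_length() + 1      # equals 64 - log2(s)
        def h(x):                            # bucket in [s]: top log2(s) bits
            return ((a * x + b) & MASK64) >> shift
        def sigma(x):                        # sign in {+1, -1} from the low bit
            return 1 - 2 * (((c * x + d) & MASK64) & 1)
        return h, sigma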
Theoretical bounds suggest s = Θ(ε^{-2}) (up to logarithmic factors) and t = Θ(log(1/δ)) when using median/mean aggregation across independent repetitions. Empirically, we find that small depths t ∈ {2, 3, 4} already control catastrophic collisions, while the width s primarily governs the bias/variance tradeoff. A useful heuristic is to choose s so that a single node sketch occupies a cache-friendly footprint (e.g., s = 128, 256, 512), then increase t only if the downstream objective exhibits heavy-tailed errors. When a single scalar estimate is required (e.g., a single motif count), we prefer median-of-means over a raw mean across depths, since it improves robustness to occasional large-collision events without changing asymptotic cost.
Anchors serve two roles: they break symmetries in a controlled, distributional manner, and they provide a low-dimensional positional scaffold that helps downstream models (and sometimes the sketch itself) disambiguate globally similar neighborhoods. We implement anchors as uniformly sampled vertices a1, …, ar ∈ V, and we compute truncated BFS distances dist(v, ai) ∧ q for a small radius q (typically q ∈ {3, 4, 5}). This can be done via r BFS runs in time Õ(rm), which is acceptable when r is small (e.g., r ∈ [8, 64]) and m is sparse; on larger graphs we instead use multi-source BFS over batches of anchors or replace distances by short random-walk landing sketches (a small number of sparse matrix–vector multiplies with restart). We also record that, for permutation invariance, anchors must be sampled independently of vertex labels; when deterministic runs are required, we implement sampling by hashing the vertex identifier with a global seed and selecting the r smallest hash values.
The refinement depth L
plays the role of message-passing radius. For bounded-size motif
estimation with |V(H)| ≤ h, it is
typically sufficient to take L
on the order of the motif diameter (often L ≤ h), whereas for purely
discriminative signatures one may increase L until performance saturates. The
multiset compression step should be implemented as a linear operator in
neighbor sketches. In sparse frameworks, we realize
Tℓ(v) = ∑u ∈ N(v)Sℓ − 1(u)
followed by a hash-based mixing (to emulate independent CountSketch
rows) as a sparse matrix aggregation (SpMM) with optional signed
neighbor contributions. This avoids explicit neighbor-list
materialization and yields near-linear passes over edges. For unbounded
degrees, we recommend degree-normalized sums (e.g., dividing by $\sqrt{\deg(v)}$) to mitigate variance
dominated by hubs; this does not alter invariance and empirically
stabilizes training.
Because sketches are sums of signed contributions, their variance grows with the number of aggregated terms and the number of refinement rounds. In float32 arithmetic, large graphs or large L can therefore incur noticeable roundoff. We address this by (i) using float32 for per-node sketches but accumulating graph-level sums in float64, (ii) optionally applying a per-round rescaling (e.g., multiplying Tℓ(v) by (deg(v) + 1)^{-1/2}), and (iii) applying standard normalization layers (LayerNorm or RMSNorm) when Sketch-WL is embedded into a learned model. For purely algorithmic motif estimation (without learning), we instead prefer explicit variance control via increased s or increased independent repetitions. When reporting motif estimates, we recommend storing both the point estimate and an empirical standard deviation across depths t as a diagnostic.
For patterns that admit trace expressions (triangles, 4-cycles), Hutchinson-style estimators reduce to repeated sparse matrix–vector multiplies. In implementations, we fuse these multiplies where possible and reuse random probe vectors across motifs to reduce memory traffic. For bounded-treewidth patterns handled via TensorSketch, we precompute the tree decomposition of each H once, then generate the corresponding sequence of multilinear contractions as a small static computation graph whose parameters are the hash seeds. This yields a fixed overhead depending only on h and k, and ensures that the per-graph runtime is dominated by sparse passes over E.
We treat Sketch-WL as a structural encoder producing node features ψ(v) ∈ ℝD and optionally graph features Ψ(G) ∈ ℝD′. In an MPNN, we concatenate ψ(v) to the initial node attributes and run standard message passing; in a transformer, we use ψ(v) either as an additive positional encoding or as additional keys/values in attention. Since all sketch operations are linear in the input features given fixed hashes, gradients propagate through the linear algebra, while the hashes themselves remain non-trainable (as in random-feature methods). If one includes the motif estimates $\widehat{\mathrm{hom}}(H,G)$, they can be appended to the graph representation or used as auxiliary regression targets; in either case, one should share hash seeds across the training set to avoid introducing a spurious source of distribution shift.
As a starting point for sparse graphs, we take r ∈ [16, 32], q ∈ [3, 5], L ∈ [3, 6], s ∈ {256, 512}, and t ∈ {3, 4}, then tune s first (for variance) and L second (for expressivity). These choices keep per-node state modest while preserving the essential near-linear scaling in m and n targeted by our analysis.
We design experiments to probe three claims implicit in the preceding sections: (i) that the sketch-based motif estimators achieve useful additive accuracy at near-linear cost on sparse graphs, (ii) that the resulting signatures separate graphs whenever a bounded-treewidth witness statistic differs beyond the prescribed margin, and (iii) that the same machinery can be used as a practical structural encoder that improves downstream prediction relative to standard message passing at comparable computational budgets.
To stress-test separation rather than average-case prediction, we include synthetic benchmarks where low-dimensional local statistics are insufficient. Concretely, we generate pairs and small collections of graphs drawn from classical WL-hard constructions (e.g., Cai–Fürer–Immerman style pairs at controlled size) and from families of regular graphs with matched degree profiles and near-matched small-radius neighborhoods. We additionally include motif-controlled random graph families where a base sparse random graph is perturbed by planting a prescribed number of copies of a fixed motif (triangles, 4-cycles, and selected H ∈ ℋk, h with h ≤ 6) in randomly chosen locations while preserving the maximum-degree cap by rejecting placements that exceed Δ0. This yields instances where the only intended distinguishing signal is a controlled change in hom(H, G), allowing us to evaluate separation under explicit additive margins.
To evaluate utility as an encoder in a standard supervised pipeline, we use a molecular regression suite in the style of ZINC (graphs with atom and bond types, sparse and moderate size). We follow standard splits (train/validation/test) and report mean absolute error (MAE) on the target property. Since molecules are naturally bounded-degree, these data also align with the bounded-degree regime assumed in our analysis. For all datasets, we treat node and edge attributes as given; when graphs are unlabeled we use only a constant node feature and optionally degree as an additional baseline feature, so that any gains are attributable to structural information rather than rich attributes.
We compare against (i) expressive MPNN baselines (GIN/GraphConv variants) with comparable parameter counts, (ii) substructure-enhanced architectures, including ESAN (with its rooted-subgraph enumeration choices), GSN (subgraph counting features where feasible), and KP-GNN (kernel/positional enhancements), and (iii) sparse transformer-style graph models that scale by neighborhood sampling or sparsified attention. For motif estimation specifically, we include two reference points: an exact motif counter for small graphs and small motifs (enumeration with pruning), used only to provide ground-truth labels; and a trace-based estimator for triangles/4-cycles based on sparse matrix–vector multiplies without Sketch-WL refinement, serving as a sanity check that the Hutchinson component behaves as expected. Where a baseline requires motif features not available at scale (e.g., exact subgraph counts), we either restrict to motif sizes where computation is feasible on the given graphs or report the baseline as ``not applicable'' rather than imputing approximations that change its nature.
For separation experiments, we enforce common randomness across all graphs in a comparison set: identical hash seeds, identical sketch widths/depths, and (when anchors are used) identical anchor-sampling seeds. This implements the invariance/distributional comparability discussed earlier and prevents spurious separation due to different random projections. For supervised training, we fix one global seed per run (including hashes and anchor sampling) and average over multiple independent runs; we also report the standard deviation across runs for the main metrics. When baselines have tunable depth/width, we tune within a matched compute budget (similar wall-clock time or parameter count) using the validation split; for Sketch-WL we tune (r, q, L, s, t) with the heuristic that s governs variance and L governs expressivity, consistent with the implementation notes.
For motif estimation, we report MAE for each target H, both in raw scale and normalized by n^{|V(H)|} to reflect the additive error target εn^{|V(H)|}. When motif counts are very small (near-zero), we additionally report a clipped relative error to avoid division artifacts. For separation, we construct labeled pairs (G, G′) where the ground-truth witness statistic differs by a controlled amount; we then measure separation accuracy, defined as the fraction of pairs for which Φk, h(G) ≠ Φk, h(G′) under a fixed seed, and we plot accuracy as a function of the planted margin (expressed in units of n^{|V(H)|}). For downstream tasks, we report MAE on regression (ZINC-like) and, where applicable, ROC-AUC on classification tasks; we also report throughput (edges processed per second) and peak memory, since our primary objective is to avoid Θ(n^k) state growth.
We isolate the roles of the main components by a controlled ablation suite. First, we remove anchors entirely (r = 0) and keep only refinement-based sketches, to measure how much positional scaffolding contributes beyond pure multiset aggregation. Second, we keep anchors but replace CountSketch/MultiSetSketch by exact dense concatenation on small graphs (where feasible), which exposes the effect of sketch collisions independently of aggregation design. Third, we vary the refinement depth L while holding (s, t) fixed, to observe the saturation point where additional rounds no longer improve separation or downstream accuracy. Fourth, we vary sketch width s and depth t to trace the bias–variance and collision-robustness tradeoffs predicted by standard sketch analyses; we summarize these as (MAE, separation accuracy) versus memory footprint per node. Finally, we ablate the motif module by (i) using only node-level signatures ∑vSL(v) and (ii) appending the explicit motif estimates $\{\widehat{\mathrm{hom}}(H,G)\}$, thereby quantifying the incremental value of explicit low-treewidth statistics in supervised tasks.
Unless otherwise stated, each plotted point is averaged over multiple random seeds, with error bars reflecting standard error; all motif ground truths are computed exactly when h is small enough for the dataset sizes under consideration. We emphasize that our aim is not to optimize absolute predictive performance per se, but to demonstrate that the near-linear sketching regime yields (a) reliable motif estimates at the intended additive scale, (b) separation behavior consistent with the witness-based guarantees, and (c) competitive downstream gains when used as a drop-in structural encoder.
Our analysis and experimental protocol are organized around a bounded-degree regime, fixed pattern size $h$, and bounded treewidth $\mathrm{tw}(H) \le k$. We regard these choices as the minimal assumptions under which one can simultaneously (i) retain WL-relevant statistics, (ii) obtain uniform additive guarantees of the form $\varepsilon n^{|V(H)|}$, and (iii) avoid $\Theta(n^k)$ storage. We now indicate several directions in which the present framework can be extended, together with technical obstacles that appear fundamental in the adjacency-list model.
The bounded-degree promise $\Delta \le \Delta_0$ enters our bounds through variance control: whenever local neighborhoods are explored (explicitly via refinement or implicitly via $A$-vector products), high-degree vertices can dominate second moments. A first extension is to replace worst-case degree control by a distributional or moment condition, e.g.,
$$\sum_{v \in V} \deg(v)^2 \le B \qquad\text{or}\qquad \Pr_{v \sim V}\!\left[\deg(v) > \tau\right] \le \eta,$$
and to let sketch widths scale with $B$ (or with $\tau, \eta$) rather than with $\Delta_0$. On the algorithmic side, we can incorporate degree-aware normalization into the refinement operator (e.g., a signed sum of neighbor sketches scaled by $\deg(v)^{-1/2}$) so that the induced random features resemble polynomials in a normalized adjacency matrix. This typically stabilizes the contribution of hubs but changes the target statistic from raw homomorphism counts to weighted counts; one must then track how the weighting interacts with the WL–homomorphism correspondence.
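The following numpy sketch illustrates the degree-aware update on dense per-node states (the compression map is omitted for clarity, and the symmetric scaling is one plausible choice): with $\deg^{-1/2}$ factors on both sides, $L$ rounds produce features that behave like degree-$L$ polynomials in $D^{-1/2} A D^{-1/2}$.

```python
import numpy as np

def normalized_refinement_step(S: np.ndarray, adj: list) -> np.ndarray:
    """One degree-aware aggregation round on dense per-node states.

    S[v] is the current state of node v; adj[v] lists its neighbors.
    Scaling neighbor w by deg(w)^{-1/2} and the sum at v by deg(v)^{-1/2}
    makes the round a multiplication by D^{-1/2} A D^{-1/2}, so hub
    vertices no longer dominate second moments; the sketch compression
    map of the actual refinement operator would be applied on top.
    """
    deg = np.array([max(len(nb), 1) for nb in adj], dtype=float)
    T = np.zeros_like(S)
    for v, nbrs in enumerate(adj):
        for w in nbrs:
            T[v] += S[w] / np.sqrt(deg[w])
        T[v] /= np.sqrt(deg[v])
    return T
```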
A complementary approach is explicit heavy-vertex handling. We may maintain exact neighborhood aggregates for vertices with $\deg(v) > \tau$ (for a threshold $\tau$ chosen as a function of $\varepsilon$), and apply sketches only to the light part. This is analogous to standard ``heavy–light'' decompositions in sublinear algorithms for triangle counting. The open point is to obtain a clean uniform guarantee simultaneously for all $H \in \mathcal{H}_{k,h}$ without introducing a dependence on the number of heavy vertices that is too pessimistic. We expect that a hybrid estimator, in which the DP-style decomposition of $H$ is performed with conditioning on whether each mapped vertex lands in the heavy set, can yield bounds depending on $\sum_v \deg(v)^p$ for a small $p$ determined by $H$.
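Schematically, the split and the hybrid aggregation might look as follows; this is a sketch under the stated assumptions, with the threshold $\tau$ and the compression map left as placeholders:

```python
import numpy as np

def split_heavy_light(adj: list, tau: int):
    """Partition vertices by the degree threshold tau (heavy-light split)."""
    heavy = {v for v, nb in enumerate(adj) if len(nb) > tau}
    return heavy, set(range(len(adj))) - heavy

def aggregate_hybrid(S: np.ndarray, adj: list, heavy: set, compress):
    """Exact neighborhood aggregates at heavy vertices; sketched elsewhere.

    compress is any linear sketching map (e.g., a CountSketch projection);
    heavy vertices bypass it so their aggregates stay exact.
    """
    out = []
    for v, nbrs in enumerate(adj):
        agg = np.sum([S[w] for w in nbrs], axis=0) if nbrs else np.zeros_like(S[0])
        out.append(agg if v in heavy else compress(agg))
    return out
```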
The sketch primitives used in Sketch-WL (CountSketch, TensorSketch, and linear multiset sketches) are, by construction, linear. This suggests a dynamic variant in which we maintain per-node states $S_\ell(v)$ under edge insertions and deletions. At a purely algebraic level, if $T_\ell(v)$ is a linear function of neighbor sketches, then an update $(u, v)$ can be propagated locally. The difficulty is algorithmic rather than definitional: WL-style refinement is an iterative fixed-point computation, hence a single edge update can, in principle, affect sketches at all radii up to $L$.
Two restricted regimes appear tractable. First, when we fix $L$ to be a small constant (as in practical encoders), one can propagate updates for $O(L)$ hops, giving worst-case update time $\tilde{O}(\Delta^L)$ and, in bounded-degree graphs, polylogarithmic amortized time if updates are sparse. Second, for motif estimation modules expressible as traces (e.g., triangles, 4-cycles), there are streaming/dynamic trace estimators based on maintaining sketch vectors $z$ and quantities of the form $z^\top A^p z$; since $A$ changes by a rank-2 update per edge, these quantities can sometimes be updated faster than recomputing sparse products from scratch. A precise characterization of which $H \in \mathcal{H}_{k,h}$ admit polylogarithmic-time dynamic maintenance (with additive accuracy) remains open, especially under deletions, where adversarial cancellations can inflate variance.
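To make the rank-2 mechanics concrete, the following minimal Python class (the name and interface are ours) maintains $w = Az$ and $s = z^\top A^2 z = \|w\|^2$ with $O(1)$ work per edge update; for $p = 3$ (e.g., triangles via $\mathrm{tr}(A^3)$) the analogous correction would additionally touch the current neighborhoods of $u$ and $v$:

```python
import numpy as np

class DynamicQuadraticForm:
    """Maintain w = A z and s = z^T A^2 z = ||w||^2 under edge updates.

    A is symmetric; inserting edge (u, v) is the rank-2 update
    A += e_u e_v^T + e_v e_u^T, so w changes in exactly two coordinates
    and s can be corrected in O(1). For Rademacher z, E[s] = tr(A^2);
    p = 2 is shown for brevity, higher powers follow the same pattern.
    """
    def __init__(self, n: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe
        self.w = np.zeros(n)                      # w = A z; A starts empty
        self.s = 0.0                              # s = ||w||^2 = z^T A^2 z

    def insert_edge(self, u: int, v: int):
        zu, zv = self.z[u], self.z[v]
        # ||w + d||^2 = ||w||^2 + 2<w, d> + ||d||^2 with d = z_v e_u + z_u e_v
        # (u != v assumed, i.e., no self-loops).
        self.s += 2.0 * (self.w[u] * zv + self.w[v] * zu) + zv**2 + zu**2
        self.w[u] += zv
        self.w[v] += zu

    def delete_edge(self, u: int, v: int):
        zu, zv = self.z[u], self.z[v]
        self.s += -2.0 * (self.w[u] * zv + self.w[v] * zu) + zv**2 + zu**2
        self.w[u] -= zv
        self.w[v] -= zu
```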
In many applications graphs carry node and edge labels (categorical or continuous). Our sketching formalism already permits this: the base feature map base_feature(v) in the initialization step can include a hashed embedding of node labels, and the multiset sketch can be made edge-aware by hashing neighbor sketches together with an edge-label token (a minimal illustration follows this discussion). Two questions are then substantive.
First, we may ask for guarantees that are uniform over labelings in the same sense as our unlabelled guarantees. This is immediate when labels are treated as fixed inputs and the randomness is only in the sketches; however, the relevant scale of $\mathrm{hom}(H, G)$ changes because label constraints reduce the number of admissible homomorphisms. The natural normalization becomes $n^{|V(H)|}$ multiplied by a label-frequency factor, and one should state bounds in terms of the empirical label distribution.
Second, we may ask for feature-aware analogues of the WL–homomorphism connection. For colored graphs, it is standard to consider colored homomorphisms (respecting colors). Our estimator constructions extend by inserting indicator factors into the local compatibility predicates in the tree-decomposition DP. The remaining work is to determine which families of colored patterns witness the expressivity of the corresponding colored k-WL, and to quantify how many such patterns are needed for a given accuracy target.
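Concretely, the labeled initialization and edge-aware tokenization described above admit the following minimal realization; the helper names and the simplified signatures are ours, and a keyed hash is used so that the mapping is stable across runs:

```python
import hashlib
import numpy as np

def _keyed_hash(*tokens, key: int) -> int:
    """Keyed hash that is stable across processes (unlike built-in hash)."""
    dig = hashlib.blake2b(repr(tokens).encode(),
                          key=key.to_bytes(8, "big"), digest_size=8).digest()
    return int.from_bytes(dig, "big")

def base_feature(label, width: int, key: int) -> np.ndarray:
    """Signed-indicator embedding of a node label: identically labeled
    nodes start refinement from identical base sketches."""
    h = _keyed_hash("node", label, key=key)
    e = np.zeros(width)
    e[h % width] = 1.0 if (h >> 32) & 1 else -1.0
    return e

def edge_aware_token(neighbor_token: int, edge_label, key: int) -> int:
    """Fold an edge-label token into a neighbor token before multiset
    sketching, so neighbors reached via different edge types hash apart."""
    return _keyed_hash("edge", neighbor_token, edge_label, key=key)
```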
Our separation guarantee is stated with respect to the family $\mathcal{H}_{k,h}$, hence it is inherently a ``finite truncation'' of the full collection of bounded-treewidth patterns. In classes where $k$-WL equivalence is characterized by equality of all $\mathrm{hom}(H, \cdot)$ for $\mathrm{tw}(H) \le k$, the truncation parameter $h$ determines what we can witness. It is therefore natural to ask for quantitative completeness statements of the following form: given two graphs $G, G'$ that are distinguished by $k$-WL after $L$ rounds, does there exist a witness $H$ with $\mathrm{tw}(H) \le k$ and $|V(H)| \le g(k, L)$ such that $\mathrm{hom}(H, G) \neq \mathrm{hom}(H, G')$? Any such bound $g$ would translate directly into a principled choice of $h$ for Sketch-WL, and would tighten the gap between approximate sketch-based signatures and exact $k$-WL distinguishability.
Relatedly, we have focused on additive guarantees. In WL contexts one sometimes cares about exact distinguishability rather than approximation. This suggests studying ``collision-free'' regimes: for fixed $n$ and fixed families of graphs, can one choose sketch width $s$ and depth $t$ so that, with high probability, the entire refinement process is injective on the multiset of neighborhoods encountered (thus simulating exact refinement)? Such statements are plausible when the number of distinct intermediate multisets is subexponential in $s$, but proving them in a data-dependent manner would require controlling adaptivity across rounds.
On the algorithmic side, we have not optimized constants or polylogarithmic factors. There is room to reduce the number of passes over adjacency lists, to exploit SIMD-friendly hash families, and to combine multiple motifs in a single tensor sketch to amortize cost. On the lower-bound side, our discussion has emphasized triangles as a representative statistic. It would be useful to identify, for each fixed $H \in \mathcal{H}_{k,h}$, the tight dependence on $\varepsilon$ and on degree/moment parameters in the adjacency-list model, and to isolate motifs whose estimation is strictly harder than trace-style motifs even under bounded degree.
We view these directions as compatible with the central objective: to obtain WL-relevant, permutation-invariant signatures that scale near-linearly in sparse graphs while providing explicit statistical guarantees for a controlled family of patterns.