When Does Homomorphism Expressivity Exist? A Meta-Theorem for Stabilizing Refinement GNNs

Metadata

Total Words: 13,826
Export Date: 2026-01-18 03:49:52
Description: Recent work introduced homomorphism expressivity F_M as a complete, quantitative framework for comparing GNN architectures: two graphs receive identical representations iff they match homomorphism counts over a maximal pattern family F_M. That work proved existence, uniqueness, and tight structural characterizations of F_M for several important architectures via unfolding trees, tree decompositions, nested ear decomposition (NED), and maximality via F"urer-type counterexamples. This paper turns those case analyses into a platform theorem. We formalize a general class of stabilizing refinement-with-aggregation (RWA) architectures (covering message passing, subgraph/local tuple methods, and guarded folklore variants) and propose explicit axioms—equivariance, stabilization, a locality schema, and compositionality—that are checkable from an architecture specification and rule out pathological ‘if-then’ invariants. Under these axioms we prove that homomorphism expressivity exists and is unique at graph-, node-, and edge-levels. The proof supplies a generic recipe to derive a candidate pattern basis from the architecture’s unfoldings and then establishes maximality by constructing universal separating graph pairs using generalized pebble-game arguments and gadget graphs extending F"urer constructions (including rooted, clique-augmented variants). As corollaries we recover known F_M characterizations for MPNNs, Subgraph GNNs, Local k-GNNs, and (Local) folklore GNNs, and we formally pinpoint why certain ad hoc architectures fail to admit a well-defined F_M. We conclude with a blueprint for applying the theorem to modern architectures and discuss where additional assumptions are needed for models with learned, data-dependent neighborhoods (e.g., sparse attention).

1. Introduction: why WL-style qualitative expressiveness is insufficient; recap homomorphism expressivity as a complete invariant; goal of a meta-theorem and ‘expressivity certificate’ conditions.
2. Preliminaries: graphs, rooted graphs, homomorphisms and spasms; refinement procedures; categorical product; overview of Dell–Grohe–Rattan-style linear relations used in completeness proofs.
3. The RWA Architecture Class: formal definition of tuple-state refinement with a neighbor relation template; invariant vs equivariant outputs; examples (MPNN, Subgraph, Local k, folklore).
4. Axioms for Existence and Uniqueness: (A1) equivariance, (A2) stabilization, (A3) locality schema, (A4) compositionality, (A5) unfolding identifiability; discussion, necessity, and a taxonomy of failure modes.
5. Upper Bound / Completeness: unfolding objects and bag-isomorphism counting; show representations are determined by hom counts on a derived pattern basis; matrix factorization argument (bag-isomorphism ↔︎ hom).
6. Lower Bound / Maximality: generalized pebble games parameterized by the neighbor relation template; construction of generalized F"urer gadgets and twists; rooted (clique-augmented) gadgets for node/edge-level; prove that every forbidden pattern yields a separating pair.
7. Recipe to Derive F_𝒜: how to compute/describe the candidate family from an architecture spec; conversion of unfolding-basis descriptions into structural decompositions (treewidth/NED-like) when possible; guidance for new architectures.
8. Instantiations and Corollaries: recover known F_M for standard GNNs; formalize the pathological counterexample and show exactly which axiom it violates; additional examples (distance-guarded variants, fixed sparse-attention variants).
9. Complexity and Constructivity: size bounds for gadget counterexamples; decidability/complexity of membership tests under common structural characterizations; limits of full automation.
10. Discussion and 2026 Outlook: how the meta-theorem can underwrite expressivity claims for modern models; open directions (learned neighborhoods, approximate variants, transformers).

Content

1. Introduction: why WL-style qualitative expressiveness is insufficient; recap homomorphism expressivity as a complete invariant; goal of a meta-theorem and ‘expressivity certificate’ conditions.

The contemporary theory of message-passing and Weisfeiler–Leman (WL) style graph representations has largely developed around a notion of expressiveness: an architecture is deemed stronger if it distinguishes strictly more non-isomorphic graphs, or if it simulates a higher-dimensional refinement. While such comparisons are useful as a first taxonomy, they do not by themselves constitute a complete description of what a representation . In particular, (i) qualitative distinguishability is insensitive to the of the separating signal (one witness substructure versus exponentially many), (ii) it fails to capture the many settings where the downstream task is inherently -like (motifs, partition functions, kernel features), and (iii) it does not directly yield a tight ``if and only if’’ characterization of the information preserved by a stable representation.

A canonical illustration is provided by pairs of graphs that are indistinguishable by a given refinement (say, 1-WL), but that differ drastically in the number of occurrences of a fixed pattern. In such a case, any invariant extracted solely from the stable colors cannot recover that pattern statistic; yet qualitative language—``the architecture cannot distinguish the graphs’’—does not specify statistics are lost, nor does it provide a systematic recipe to certify loss of information for pattern outside the architecture’s range. Conversely, when an architecture distinguish two graphs, qualitative statements do not identify a canonical family of witnesses that explains the distinction in a uniform algebraic way.

For refinement-based architectures, there is a well-established quantitative alternative: . For a fixed pattern graph F, the number hom(F, G) of homomorphisms from F to an input graph G is a robust, compositional statistic. First, it linearizes naturally: many classical graph invariants can be expressed as (finite) linear combinations of homomorphism counts, and conversely, subgraph counts can be recovered from homomorphism counts through inclusion–exclusion over quotients (the ``spasm’’ formalism). Second, homomorphism counts behave well under the standard algebraic operations on graphs (in particular, disjoint union and categorical product), which makes them compatible with closure properties that arise implicitly when one analyzes stable refinement procedures. Third, and crucially for our purposes, homomorphism counts admit relative to refinement: for several canonical variants of WL and local k-tuple methods, equality of stable representations is equivalent to equality of hom(F, ⋅) over an explicitly characterized family of patterns F.

Our objective is to lift these completeness phenomena to a single meta-theorem that treats a broad class of architectures uniformly. We do not fix a particular GNN or a particular WL dimension a priori. Instead, we consider an abstract refinement-with-aggregation (RWA) architecture 𝒜 specified by (a) an initialization of colors on k-tuples, (b) an equivariant round update that aggregates colors over a fixed neighbor-relation template R on k-tuples, and (c) an invariant (graph-level) or equivariant (node-/edge-level) readout of the stabilized colors. Under mild axioms ensuring that 𝒜 behaves like a genuine refinement procedure rather than an ad hoc encoding, we aim to associate to 𝒜 a family of patterns F_𝒜 that plays the role of a complete quantitative invariant: the stable representation χ_𝒜(G) should be determined exactly by the vector (hom(F, G))_{F ∈ F_𝒜}, and conversely, if two graphs have identical stable representations, then their homomorphism counts agree for all F ∈ F_𝒜. The maximality requirement is essential: it is not merely that there exists family that is complete, but that there is a unique such family, so that membership F ∈ F_𝒜 is the sharp dividing line between what the architecture can and cannot determine from its stable output.

To make such a statement meaningful across architectures, we must articulate what constitutes an ``expressivity certificate.’’ Informally, a certificate for 𝒜 should provide three ingredients.

We require an unfolding semantics that explains stabilized colors as isomorphism types of local combinatorial objects generated by iterating R. For 1-WL this basis is the family of rooted unfolding trees; for higher-dimensional methods one obtains hypertree-like objects or canonical decomposition fragments. The meta-theorem demands that the architecture admit such a basis , so that the stable colors do not hide information unrelated to the refinement dynamics. This is the role of an unfolding identifiability condition: distinct stable colors must correspond to non-isomorphic unfoldings.

Counting unfolding types inside an input graph is not yet the same as counting homomorphisms from ordinary patterns, because unfoldings are typically rooted, layered, or bagged objects rather than plain graphs. The certificate therefore includes a reduction showing that the relevant counts (bag-isomorphisms, typed embeddings, or their analogues) are expressible as a finite triangular linear system in homomorphism counts of an induced family of pattern graphs. This is the mechanism behind completeness: once the stable representation is determined by counts of unfolding types, and those are in turn determined by homomorphism counts of induced patterns, we obtain a homomorphism characterization of representation equality.

Maximality requires more than an upper bound. For every pattern F that is compatible with the unfolding basis, we need an explicit pair of graphs G_F, H_F such that 𝒜 produces identical stable representations on G_F and H_F, yet hom(F, G_F) ≠ hom(F, H_F). Such pairs witness that hom(F, ⋅) is determined by χ_𝒜(⋅) and therefore that F ∉ F_𝒜. The meta-theorem we develop demands a uniform construction of these separators from F, typically via a generalized F"urer-type gadget together with a ``twist’’ operation that preserves the architecture’s indistinguishability while flipping the target homomorphism count.

The axioms imposed on 𝒜 should be read precisely as conditions ensuring that the above three ingredients can be made canonical and architecture-independent. Equivariance (or invariance) of initialization and updates is necessary so that χ_𝒜 respects isomorphism and the unfolding semantics is well-defined. Stabilization ensures that the refinement reaches a fixed point on every finite graph, so that the representation is a finite object and is amenable to counting arguments. The fixed neighbor-relation template R is the locus of locality: it is what permits the inductive definition of unfoldings and the extraction of a grammar of local witnesses. Compositional closure under disjoint union and categorical product rules out special-purpose behavior that would break the homomorphism algebra and would obstruct maximality; it is also the structural reason why homomorphism counts, rather than arbitrary statistics, emerge as the appropriate complete invariants. Finally, unfolding identifiability ensures that the architecture does not conflate non-isomorphic unfoldings into the same stable color in an uncontrolled way; without this, the unfolding basis would cease to be an invariant semantics of the representation, and the resulting ``expressivity family’’ could fail to exist.

We emphasize that our aim is not to re-prove isolated results for particular architectures, but to isolate a reusable proof pattern: from R and the refinement update, we derive an induced family of unfolding objects and hence an induced family of patterns, yielding a completeness direction; then, from violations of the induced structural constraints, we derive explicit gadget separators, yielding maximality. The outcome is a certificate that can be instantiated to recover known homomorphism characterizations (forests for message passing, bounded treewidth for suitable k-tuple models, and so on) while also applying to less standard neighbor relations, provided they satisfy the axioms.

The remainder of the paper is organized to support this program. In the next section we fix notation for graphs (including rooted variants), homomorphisms and their relation to subgraph counts, and we formalize refinement procedures in the RWA model. We also recall the categorical product operation and summarize the Dell–Grohe–Rattan style linear relations that convert between bagged counts and homomorphism counts, which will serve as the algebraic backbone of the completeness proofs. Subsequent sections then implement the certificate: we derive the unfolding basis associated with R, prove completeness via the resulting linear system, and establish maximality by constructing gadget pairs that witness the failure of any pattern outside the maximal family.

We collect the basic objects and algebraic identities used throughout. All graphs are finite, simple, undirected, and vertex-labeled. We write 𝒢 for the class of such graphs.

A (labeled) graph is a triple G = (V_G, E_G, ℓ_G), where V_G is a finite set, $E_G\subseteq \binom{V_G}{2}$ is an undirected edge set, and ℓ_G : V_G → Σ is a label function into a fixed countable label set Σ. An isomorphism φ : G → H is a bijection φ : V_G → V_H such that {u, v} ∈ E_G ⇔ {φ(u), φ(v)} ∈ E_H and ℓ_G(u) = ℓ_H(φ(u)) for all u ∈ V_G. We write G ≅ H if an isomorphism exists.

We frequently use rooted and multi-rooted variants. For u ∈ V_G we write G^u for the rooted graph obtained by designating u as a distinguished vertex. For u, v ∈ V_G we analogously write G^uv for the graph rooted at an ordered pair (u, v); the order is important when we later treat k-tuples. More generally, for a tuple u = (u₁, …, u_k) ∈ V_G^k we write G^u for the k-rooted graph. Root-preserving isomorphisms are defined in the obvious way: φ : G^u → H^v must satisfy φ(u_i) = v_i for all i ∈ [k].

Let F, G ∈ 𝒢. A graph homomorphism h : F → G is a map h : V_F → V_G such that
{x, y} ∈ E_F ⇒ {h(x), h(y)} ∈ E_G, and ℓ_F(x) = ℓ_G(h(x)) ∀x ∈ V_F.
We denote by Hom(F, G) the set of all such homomorphisms, and by hom(F, G) = |Hom(F, G)| its cardinality. For rooted graphs we enforce root preservation: for a ∈ V_F^k and u ∈ V_G^k we write Hom(F^a, G^u) for the homomorphisms h : F → G satisfying h(a_i) = u_i for all i ∈ [k], and hom(F^a, G^u) for their number.

We also use the related notion of injective embeddings and subgraph counts. Writing Emb(F, G) for the set of label-preserving injective homomorphisms, we define emb(F, G) = |Emb(F, G)|. If F is unlabeled or labels are rigid, one may further divide by |Aut(F)| to obtain the number sub(F, G) of (not-necessarily-induced) subgraphs isomorphic to F. For our purposes the fundamental statistic is hom(F, G), because it interacts cleanly with the graph operations that arise from refinement dynamics.

Homomorphisms conflate vertices of F, and it is therefore convenient to organize this conflation explicitly. A partition π of V_F is if all vertices within a block have the same label. From such a partition we form a quotient graph F/π by contracting each block to a single vertex and placing an edge between two blocks if there exists at least one edge of F between the corresponding blocks. (Since our graphs are simple, loops produced by contracting an edge within a block are discarded; equivalently, we may restrict to partitions π in which no edge lies within a block.) The of F, denoted Spasm(F), is the set of isomorphism types of all such quotients F/π that can occur as homomorphic images of F.

The relevance is the standard fact that homomorphism counts are linear combinations of injective counts over Spasm(F), and conversely. Concretely, if we let surj(F, H) be the number of surjective homomorphisms from F onto H (up to equality on vertices, not modulo automorphisms), then for every G we have
hom(F, G) = ∑_{H ∈ Spasm(F)}surj(F, H) ⋅ emb(H, G),
and by Möbius inversion on the partition lattice one obtains emb(F, G) as an integer linear combination of hom(H, G) over H ∈ Spasm(F). We will use this perspective only as a bookkeeping device: it explains why the passage from homomorphisms to substructures and back is always mediated by finite triangular linear systems indexed by quotients.

We next fix the minimal language for refinement-based representations. Let k ≥ 1. For a graph G, a k-tuple is u = (u₁, …, u_k) ∈ V_G^k. A (discrete) refinement state at round t is a coloring
c_G^t : V_G^k → Γ_t,
into some finite or countable set of colors Γ_t. A refinement round replaces c_G^t by c_G^t + 1 via an isomorphism-equivariant update rule that aggregates information from a specified neighborhood relation on k-tuples. We will formalize the update schema and the allowed neighbor relations in Section~3; for the present discussion we only require the following two generic properties.

First, refinement is : if φ : G → H is an isomorphism and u ∈ V_G^k, then c_G^t(u) = c_H^t(φ(u)) for all rounds t, where φ(u) is applied componentwise. Second, refinement : for each finite G there exists T = T(G) such that c_G^T = c_G^T + 1, hence c_G^t = c_G^T for all t ≥ T. The stabilized coloring induces a partition of V_G^k into color classes, and any invariant readout of this partition (e.g., the multiset of colors, or color histograms) yields an isomorphism-invariant graph representation. Rooted and equivariant outputs are obtained by restricting attention to tuples containing specified vertices; again, the precise readout interface will be fixed later.

Two graph operations play a structural role in our completeness and maximality arguments.

For disjoint union, given graphs G, H with disjoint vertex sets we write G ⊔ H for their union (labels and edges are inherited componentwise). Homomorphism counts interact with ⊔ by component factorization: if F has connected components F₁, …, F_m, then
$$ \mathrm{hom}(F,G)=\prod_{i=1}^m \mathrm{hom}(F_i,G), \qquad\text{and}\qquad \mathrm{hom}(F,G\sqcup H)=\prod_{i=1}^m \bigl(\mathrm{hom}(F_i,G)+\mathrm{hom}(F_i,H)\bigr). $$
In particular, for connected F we have hom(F, G ⊔ H) = hom(F, G) + hom(F, H).

For categorical product, the product G × H is defined by
V_G × H := V_G × V_H, {(g, h), (g^′, h^′)} ∈ E_G × H ⇔ {g, g^′} ∈ E_G ∧ {h, h^′} ∈ E_H,
with vertex labels ℓ_G × H(g, h) := (ℓ_G(g), ℓ_H(h)) (or any fixed encoding of pairs into Σ). The key identity is multiplicativity of homomorphisms:
hom(F, G × H) = hom(F, G) ⋅ hom(F, H) for all F, G, H ∈ 𝒢.
This follows by the bijection Hom(F, G × H) ≅ Hom(F, G) × Hom(F, H) obtained by projecting a homomorphism h : V_F → V_G × V_H to its two coordinate maps. Rooted variants satisfy the same identity when roots are taken as tuples: hom(F^a, (G × H)^(u, v)) = hom(F^a, G^u)hom(F^a, H^v).

These two operations are the algebraic reason homomorphism counts form the appropriate coordinate system for refinement-based representations: the invariants preserved by stable refinement are compatible with ⊔ and × under mild closure hypotheses, and therefore live naturally in the semiring generated by {hom(F, ⋅)}_{F ∈ 𝒢}.

In completeness proofs we must bridge the gap between what refinement ``sees’’ locally and plain homomorphism counts. The refinement dynamics will be interpreted via (rooted trees, hypertrees, or bagged fragments) generated by iterating the neighbor relation on tuples. Counting occurrences of such objects in G typically yields quantities that are not immediately of the form hom(F, G), because unfoldings come with additional structure: layers, bags, or explicit equality constraints between repeated vertices.

The standard mechanism, originating in work of Dell–Grohe–Rattan and subsequently reused in several WL completeness results, is that these structured counts are linearly interreducible with homomorphism counts of finitely many ordinary patterns obtained from the unfolding object by ``forgetting structure’’ and taking quotients. We state this at the level of a schematic principle, which will be instantiated once U_𝒜 is defined.

Fix a finite structured pattern P (think: a finite unfolding object with a set of bags and specified overlaps). Let biso(P, G) denote the number of of P into G: assignments of vertices of P to vertices of G that preserve labels and edges and that respect the intended identifications within bag overlaps. Such assignments can be viewed as homomorphisms from an auxiliary relational structure encoding the bag incidences; equivalently, they are homomorphisms from a graph equipped with extra equality constraints.

The conversion proceeds in two steps. First, equality constraints are eliminated by expanding over partitions of the vertex set of P, producing a finite set Quot(P) of ordinary graphs (quotients of the underlying graph of P) such that
biso(P, G) = ∑_{F ∈ Quot(P)}α_P, F hom(F, G)
for explicit integers α_P, F. Second, by ordering quotients by (say) number of vertices and refining by isomorphism type, the coefficient matrix becomes triangular with nonzero diagonal; hence the system is invertible over ℚ, and we also have
hom(F, G) = ∑_{P ∈ UnfTypes( ≤ |V_F|)}β_F, P biso(P, G)
for suitable coefficients β_F, P. In particular, vectors of bag-isomorphism counts and vectors of homomorphism counts over the induced pattern family determine one another.

We will use this linear-algebraic bridge as follows: stabilization of refinement colors will be shown to be equivalent to counting unfolding types P inside G; the above relations then imply that χ_𝒜(G) is determined exactly by hom(F, G) over the induced family of patterns F. The maximality direction will rely on explicit separators rather than linear algebra, but the same quotient-and-triangularity viewpoint clarifies why the resulting family is canonical.

We now formalize the class of refinement-based architectures covered by our meta-theorems. The common interface is that an architecture 𝒜 maintains discrete states (``colors’’) on k-tuples of vertices, updates these states for finitely many rounds by aggregating over a fixed tuple-neighborhood relation, and finally applies an invariant or equivariant readout. We emphasize that we treat the update as an information-theoretic refinement operator: whenever we write Hash below, it stands for an injective encoding of the input data into a fresh color alphabet (equivalently, a canonical naming of isomorphism types of the local view presented to the update).

Fix k ≥ 1. For each input graph G, the architecture maintains a sequence of colorings
c_G^t : V_G^k → Γ_t, t = 0, 1, 2, …,
where Γ_t is a (finite or countable) color set that may grow with t. We allow repeated vertices in tuples, i.e., V_G^k rather than $\binom{V_G}{k}$, since equality patterns among coordinates often carry information in higher-order refinement.

The locality of the update is specified by a R. Concretely, R is a rule that, given a graph G and a tuple u ∈ V_G^k, returns a finite multiset (or set) of tuples R_G(u) ⊆ V_G^k. The interpretation is that v ∈ R_G(u) is a ``message source’’ for u. We do not impose a specific syntactic form on R here; later axioms will restrict R to be fixed across inputs and isomorphism-equivariant. For the present section it suffices that R_G(u) is defined uniformly and is finite for each (G, u).

We optionally allow a single updated alongside tuple colors. This is a sequence g_G^t ∈ Δ_t that is intended to model the effect of graph-level tokens or global pooling used within each round.

An RWA round computes for each u ∈ V_G^k an aggregated neighborhood summary from current colors and then applies a deterministic refinement map. We write this schematically as

where Hash is injective in the tuple of its arguments (so two tuples receive the same new color if and only if their previous colors, aggregated neighbor-multisets, and global values coincide). If no global channel is present, one omits g_G^t. The aggregation operator is thus fixed to be multiset collection; the architectural variability is encoded in R (which determines what is collected) and in the initialization/readout interfaces described next. This abstraction matches the standard viewpoint that sufficiently expressive MLPs together with multiset pooling simulate injective hashing of neighbor-multisets.

When a global channel is present, we allow it to be updated by an invariant aggregation over tuple colors, e.g.,
$$ g_G^{t+1}:=\mathsf{Hash}_\mathrm{glob}\Bigl(g_G^t,\; \multiset{\,c_G^t(\mathbf{u}) : \mathbf{u}\in V_G^k\,}\Bigr), $$
again with Hash_glob injective. This is a convenient way to include common ``global context’’ mechanisms while keeping the refinement semantics explicit.

Initialization fixes c_G⁰ (and possibly g_G⁰). In all examples of interest, c_G⁰(u) depends only on the labeled equality/adjacency type induced by the tuple, for instance:
c_G⁰(u) = ℓ_G(u) for k = 1,
and for k > 1 one may include (i) the label tuple (ℓ_G(u₁), …, ℓ_G(u_k)), (ii) the equality pattern on coordinates (u_i = u_j or not), and (iii) the adjacency pattern among coordinates ({u_i, u_j} ∈ E_G or not). We keep initialization abstract, since later axioms only require that it respects isomorphisms.

Given an input graph G, we run the refinement until it stabilizes, i.e., until some T = T(G) satisfies c_G^T = c_G^T + 1 (and g_G^T = g_G^T + 1 if present). We denote the stabilized tuple-coloring by c_G^∞ := c_G^T.

A graph-level representation is obtained by an isomorphism-invariant function of the stabilized coloring, typically the multiset histogram of tuple-colors:
$$ \chi_{\mathcal{A}}(G)\;:=\;\mathsf{Read}^{\mathrm{inv}}\Bigl(\multiset{\,c_G^\infty(\mathbf{u}) : \mathbf{u}\in V_G^k\,},\; g_G^\infty\Bigr). $$
The simplest choice is to take Read^inv to be the identity on the multiset (or its multiplicity vector over Γ_T), but we allow any invariant post-processing since our results depend only on the stable partition induced by c_G^∞.

Equivariant outputs are defined by restricting the stabilized coloring to tuples that contain designated vertices and then applying an invariant aggregation over the remaining coordinates. For example, a node-level representation may be defined by
$$ \chi_{\mathcal{A}}(u)\;:=\;\mathsf{Read}^{\mathrm{n}}\Bigl(\multiset{\,c_G^\infty(\mathbf{w}) : \mathbf{w}\in V_G^k,\; \exists i\in[k]\; \mathbf{w}_i=u\,}\Bigr), $$
and an ordered-pair (edge-level) representation analogously by conditioning on (u, v) appearing in specified coordinates:
$$ \chi_{\mathcal{A}}(u,v)\;:=\;\mathsf{Read}^{\mathrm{e}}\Bigl(\multiset{\,c_G^\infty(\mathbf{w}) : \mathbf{w}\in V_G^k,\; \mathbf{w}_1=u,\;\mathbf{w}_2=v\,}\Bigr), $$
with the obvious generalization to k-tuple outputs χ_𝒜(u). This interface covers the usual situation where an architecture is permutation-equivariant at the tuple level and then applies coordinatewise pooling to obtain node/edge embeddings.

We record how several common GNN refinement schemes fit into the above template. These examples only serve to anchor the abstraction; later sections will characterize their induced homomorphism families.

Take k = 1 and define R_G(u) := {v ∈ V_G : {u, v} ∈ E_G}. With initialization c_G⁰(u) = ℓ_G(u), the update
$$ c_G^{t+1}(u)=\mathsf{Hash}\Bigl(c_G^t(u),\multiset{\,c_G^t(v):v\in N_G(u)\,}\Bigr) $$
is exactly the standard color refinement / 1-WL update when Hash is injective. Many message-passing architectures are continuous relaxations of this scheme; at our level of abstraction they are treated as computing a stable partition coarser than or equal to the one produced by injective hashing.

Let u = (u₁, …, u_k). The classical k-WL neighborhood replaces one coordinate by an arbitrary vertex:
$$ \mathsf{R}_G^{\mathrm{WL}}(\mathbf{u}) :=\bigcup_{i=1}^k \bigl\{(u_1,\dots,u_{i-1},w,u_{i+1},\dots,u_k): w\in V_G\bigr\}. $$
This relation is global (it ranges over all vertices) and yields the usual k-dimensional refinement on tuples. Local variants are obtained by restricting w to a graph-theoretic neighborhood of u_i, which we discuss next.

A common restriction is to only modify a coordinate along an edge. Define
$$ \mathsf{R}_G^{\mathrm{loc}}(\mathbf{u}) :=\bigcup_{i=1}^k \bigl\{(u_1,\dots,u_{i-1},w,u_{i+1},\dots,u_k): w\in N_G(u_i)\bigr\}. $$
This is a purely local schema: messages to u come from tuples obtained by a single-step move in exactly one coordinate. The resulting refinement is strictly weaker than k-WL in general but admits a clean unfolding semantics in terms of locally generated tuple-neighborhood trees.

For k = 2, writing (u, v) for a tuple, the local neighborhood becomes
R_G^pair(u, v) := {(u^′, v) : u^′ ∈ N_G(u)} ∪ {(u, v^′) : v^′ ∈ N_G(v)},
optionally augmented by the swapped pair (v, u) to model symmetric pair features. Such pair refinements underlie many ``2-GNN’’ and edge-aware architectures; the difference between variants is largely in which additional relations are included (e.g., conditioning on whether {u, v} ∈ E_G in initialization, or adding special handling for diagonal pairs u = v).

Architectures that explicitly encode small rooted subgraphs can also be represented in the tuple framework by taking k = s for a fixed subgraph size s and interpreting a tuple u ∈ V_G^s as an ordered candidate vertex set for a pattern instance. One then chooses R so that neighboring tuples correspond to simple edits of the candidate set (e.g., replace one coordinate by a nearby vertex, swap coordinates to enforce permutation symmetry, or constrain replacements to maintain a connected induced subgraph around a designated root coordinate). The refinement colors on V_G^s can be read out to node-level summaries by pooling over tuples that contain a given node. This viewpoint captures a wide range of ``Subgraph-GNN’’ constructions in which the internal message passing over extracted subgraphs is abstracted into an injective refinement on the space of tuples representing those subgraphs.

Finally, several standard modifications are subsumed by the optional global channel and by allowing R to be a union of finitely many basic relations. For instance, adding a graph-level token that aggregates from all nodes each round corresponds to including g_G^t as above; incorporating multiple edge types or directions corresponds to defining R_G(u) as a disjoint union of neighbor multisets indexed by relation type and including the index in the hashed summary. These extensions do not change the refinement nature of the computation; they only enrich the local view presented to Hash.

The remainder of the paper isolates the precise axioms under which such an RWA specification admits a canonical homomorphism-count characterization and a maximal separating family.

4. Axioms for Existence and Uniqueness: (A1) equivariance, (A2) stabilization, (A3) locality schema, (A4) compositionality, (A5) unfolding identifiability; discussion, necessity, and a taxonomy of failure modes.

The RWA interface of Section~ isolates a large class of refinement procedures, but it does not by itself guarantee that the resulting stable representations admit a canonical characterization by homomorphism counts, nor that the induced family of ``test patterns’’ is uniquely determined. We therefore impose five axioms (A1–A5). Each axiom formalizes a closure or uniformity principle that is implicitly satisfied by the standard refinement schemes (WL and its local variants) and is explicitly used in our completeness and maximality arguments.

We require that initialization, each round update, and the readout commute with graph isomorphisms. Concretely, for an isomorphism π : V_G → V_H between labeled graphs G and H, we extend π coordinatewise to tuples u = (u₁, …, u_k) ∈ V_G^k by π(u) = (π(u₁), …, π(u_k)). Then (A1) demands that for all t ≥ 0,
c_H^t(π(u)) = c_G^t(u), g_H^t = g_G^t,
and that the readout is invariant/equivariant in the standard sense:
χ_𝒜(H) = χ_𝒜(G), χ_𝒜^H(π(u)) = χ_𝒜^G(u), χ_𝒜^H(π(u), π(v)) = χ_𝒜^G(u, v),
for graph-/node-/edge-level outputs (and analogously for general tuple outputs).
Axiom (A1) is not merely cosmetic: it is the minimal condition under which χ_𝒜 can be compared to homomorphism counts, since hom(F, ⋅) is itself isomorphism-invariant (and the rooted variants are equivariant).

We assume that for every finite input graph G, the refinement dynamics reaches a fixed point in finitely many rounds:
∃ T(G) < ∞: c_G^T(G) = c_G^T(G) + 1 (and g_G^T(G) = g_G^T(G) + 1 if present).
This axiom separates refinement procedures from more general recurrent computations. In the information-theoretic view where Hash is injective, standard WL-style refinements are monotone in the sense that they refine partitions of V_G^k and hence must stabilize after at most |V_G|^k proper refinement steps. However, the abstract RWA interface permits global channels and post-processing, and it allows color alphabets Γ_t to vary with t. Axiom (A2) asserts that, regardless of such details, the architecture admits a well-defined stable coloring c_G^∞ that underlies χ_𝒜. Without stabilization, there is no canonical limit partition'' to which one could attach a basis of local witnesses, and the equivalence relationχ_𝒜(G) = χ_𝒜(H)’’ becomes ill-behaved with respect to the algebraic manipulations used later.

We require that the neighbor relation is specified by a template R that is instantiated on each input graph and is isomorphism-equivariant. Formally, for every isomorphism π : G → H and every tuple u ∈ V_G^k, we demand
π(R_G(u)) = R_H(π(u))
as multisets of tuples. In particular, R may depend on k and on the labeled adjacency/equality type of u in G, but it may not depend on non-isomorphism-invariant features (such as vertex indices) nor on graph-specific parameters chosen adversarially per input. We also assume that R_G(u) is finite for each (G, u), ensuring that one refinement round only inspects a finite ``local view’’ around u.

Axiom (A3) is the uniformity principle that makes an ``unfolding semantics’’ possible. The unfolding objects Unf_𝒜^(D)(G, u) used in Section~5 are defined by repeatedly expanding according to R_G; if R were allowed to vary arbitrarily with G, there would be no single object family U_𝒜 whose isomorphism types enumerate the possible local views across all graphs.

The meta-theorems rely on standard amplification and linear-independence arguments that operate by forming composite graphs from smaller ones. We therefore assume that the architecture is with respect to two operations.

For disjoint graphs G and H, the refinement on G ⊎ H should decompose into independent refinements on the components, up to a canonical combination at the readout stage. In its weakest form, we require that χ_𝒜(G ⊎ H) is determined by χ_𝒜(G) and χ_𝒜(H) (and analogously for rooted/equivariant outputs when the root is contained in a specified component). This prevents architectures from encoding nonlocal ``cross-component’’ information not justified by the locality template R.

We also assume closure under the categorical product G × H, which is the standard device for turning local indistinguishability into global lower bounds and for enforcing linear independence of count vectors. Concretely, we require that the stable representation of G × H is a function of the stable representations of G and H, in a way compatible with the product structure (and similarly for rooted products). At the level of refinement, the intended reading is that the refinement operator respects product symmetries and does not contain ad hoc rules that ``detect’’ that a graph is a product and branch on that event.

Axiom (A4) rules out special-casing that would otherwise destroy the existence of a homomorphism family: maximality is proved by showing that indistinguishability is preserved under such compositions, while certain homomorphism counts multiply or combine predictably. If 𝒜 violates these closure properties, then indistinguishability may fail to be a congruence with respect to ⊎ or ×, and the usual algebraic completion argument that produces a unique maximal family F_𝒜 can break down (cf. Proposition~5).

Finally, we require that stable colors correspond faithfully to isomorphism types of unfolding objects induced by R. Informally, (A5) says that 𝒜 does not collapse two genuinely different local views into the same stable color unless those views are already indistinguishable by R-unfolding isomorphism. Formally, for each depth D we consider the rooted unfolding object Unf_𝒜^(D)(G, u) obtained by expanding from u according to R_G and recording at each node the relevant initial type information (labels, equality patterns, adjacency within tuples, and, if present, the global-channel input viewed as a root annotation). Axiom (A5) asserts:

The first direction is typically proved by induction from (A1) and the definition of the update; the second direction is an additional identifiability requirement that excludes degenerate architectures whose update or readout intentionally merges distinct unfolding types. It is precisely this axiom that allows us to treat the stabilized colors as local witness objects and to define U_𝒜 as the set of all unfoldings that can occur.

Each axiom corresponds to a distinct class of pathologies.
Violating (A1) allows the architecture to depend on vertex identifiers or arbitrary tie-breaking; then χ_𝒜 need not be expressible in terms of any isomorphism-invariant counting logic, and homomorphism characterizations are inapplicable.
Violating (A2) permits oscillations or unbounded state growth; then there is no stable partition and no finite-depth unfolding basis that determines the output.
Violating (A3) leads to non-uniform locality where the neighborhood rule depends on global properties of the input graph; in that case there is no single induced unfolding family, and any purported F_𝒜 would have to encode graph-dependent behavior.
Violating (A4) admits ``if–then’’ constructions that break congruence under ⊎ or ×; such architectures can implement functions for which no unique maximal separating homomorphism family exists, as the closure steps in the maximality proof fail.
Violating (A5) allows intentional information loss that collapses distinct unfoldings; then U_𝒜 cannot be recovered from the stable coloring, and the completeness direction in Section~5 no longer yields an characterization.

In the remainder, we work exclusively under (A1–A5). These axioms are strong enough to force a canonical unfolding semantics and weak enough to include the refinement schemes of interest; the subsequent completeness and maximality theorems should be read as consequences of this constrained, but still expressive, design space.

5. Upper Bound / Completeness: unfolding objects and bag-isomorphism counting; show representations are determined by hom counts on a derived pattern basis; matrix factorization argument (bag-isomorphism ↔︎ hom).

We establish the upper bound direction: the stable representation produced by 𝒜 is determined by homomorphism counts over an explicit pattern basis induced by the neighbor template R. Throughout we fix the tuple dimension k and write c_G^t : V_G^k → Γ_t for the tuple-coloring after t rounds.

For D ≥ 0, a graph G, and a tuple u ∈ V_G^k, we define the depth-D unfolding object Unf_𝒜^(D)(G, u) by iteratively expanding from u according to the neighbor relation R_G. Concretely, nodes of the unfolding are R-walks (u = u₀, u₁, …, u_d) of length d ≤ D such that u_i + 1 ∈ R_G(u_i), with the root being the empty walk at u. Each node is annotated by the of the corresponding tuple in G, namely: the vertex labels (ℓ_G(u₁), …, ℓ_G(u_k)), the equality pattern on coordinates, and the adjacency pattern between coordinates (as induced by E_G); if 𝒜 uses a global channel, we include its initial value as an additional root annotation. Children of a node corresponding to x are indexed by the multiset R_G(x), so that multiplicities in R_G(x) are represented as multiplicities of isomorphic subtrees. Isomorphisms between unfoldings are root-preserving, type-preserving, and respect these multiset child structures.

The relevance of unfoldings is that one refinement round is precisely one expansion step together with multiset aggregation. By induction on D, using (A1) and the fact that R is instantiated equivariantly on each input (A3), we obtain the following correctness statement.

If Unf_𝒜^(D)(G, u) ≅ Unf_𝒜^(D)(H, v), then c_G^D(u) = c_H^D(v).

By (A5), stable colors do not identify genuinely different unfolding types. In particular, for each pair (G, u) and (H, v), equality c_G^∞(u) = c_H^∞(v) implies that their unfoldings are isomorphic for all sufficiently large depths D (depending only on the stabilization depths of G and H). Thus stable tuple-colors may be treated as of unfolding isomorphism types.

Let U_𝒜 be the set of all finite unfolding objects that arise as Unf_𝒜^(D)(G, u) for some G ∈ 𝒢, u ∈ V_G^k, and D ≥ 0, taken up to isomorphism. For each τ ∈ U_𝒜, we write N_τ(G) for the number of tuples u ∈ V_G^k whose stabilized color corresponds to τ, equivalently whose unfolding type is τ at a depth beyond stabilization. Since χ_𝒜(G) is (by definition of the RWA interface) an invariant of the multiset of stabilized tuple-colors (possibly post-processed by an invariant readout), we conclude:

For every G, the stable representation χ_𝒜(G) is determined by the vector (N_τ(G))_{τ ∈ U_𝒜}, with finite support on each fixed G.

It remains to express these unfolding-type counts in terms of homomorphism counts of ordinary pattern graphs.

A depth-D unfolding object can be encoded by a finite capturing the same local constraints while remembering the tuple structure. Formally, with each unfolding type τ ∈ U_𝒜 we associate a finite structure Bag(τ) consisting of (i) a finite graph-shaped incidence skeleton given by the unfolding tree truncated at depth D (for some representative depth where τ is realized), (ii) a distinguished of k vertices for each node of the skeleton, together with explicit equalities inside the bag, and (iii) edge and label constraints that enforce the adjacency/equality/label types recorded by τ. A from Bag(τ) into G is a mapping of all vertices of Bag(τ) to V_G that is a graph homomorphism and is injective on each bag after quotienting by the enforced equalities, while preserving the marked root bag. We write biso(Bag(τ), G) for the number of such bag-isomorphisms.

The construction is arranged so that choosing a tuple u in G together with compatible witnesses for each expansion step of τ is equivalent to choosing a bag-isomorphism of Bag(τ) into G. Consequently, for each τ we have an identity of the form
N_τ(G) = biso(Bag(τ), G),
up to a fixed normalization that only depends on automorphisms of the bagged pattern (and may be absorbed by considering rooted variants of Bag(τ)). Hence χ_𝒜(G) is determined by the family of numbers biso(Bag(τ), G) as τ ranges over U_𝒜.

We now pass from bag-isomorphism counts to ordinary homomorphism counts. The key point is that bag-injectivity constraints can be removed by a finite inclusion–exclusion (equivalently, a Möbius inversion over partitions), yielding a linear combination of homomorphism numbers.

Fix a finite bagged pattern B (in our application, B = Bag(τ)). Consider all quotients of B obtained by identifying vertices in all possible ways consistent with the forced equalities. Each such quotient is an ordinary finite graph F together with induced labels; we denote by Spasm(B) the finite set of all graphs F obtainable in this way (up to isomorphism). For each F ∈ Spasm(B), there is a natural surjection q : B → F. Any homomorphism h : F → G composes with q to a homomorphism B → G that violates bag-injectivity exactly on those identifications represented by q. Standard Möbius inversion over the lattice of partitions inside each bag yields integer coefficients μ_B, F such that for all G,

Equation~ is finite because Spasm(B) is finite.

Conversely, for each finite graph F we may pick a canonical bagged expansion B_F whose bags remember the tuple incidences that arise in our unfolding grammar (for instance, by taking a tree-shaped covering of F consistent with R and placing a k-bag at each node). When these expansions are ordered by refinement (coarser identifications first), the coefficient system μ_B, F becomes triangular with respect to this order, and the diagonal coefficients are 1. Therefore the linear transformation from the vector (hom(F, G))_{F ∈ Pat} to (biso(B, G))_{B ∈ BagPat} is invertible over ℤ on each finite truncation of the index sets. In particular, for any finite subcollection ℬ of bagged patterns, there exists a finite graph family ℱ and an integer matrix M with det (M) = ±1 such that
(biso(B, G))_{B ∈ ℬ} = M ⋅ (hom(F, G))_{F ∈ ℱ}
for all G. Hence the bag-isomorphism counts and the corresponding homomorphism counts carry exactly the same information.

Applying this to ℬ = {Bag(τ) : τ ∈ U_𝒜^( ≤ D)}, the set of unfolding types realizable up to some fixed depth D, we obtain a finite induced pattern set Pat(U_𝒜^( ≤ D)) of ordinary graphs such that, for all G, the depth-D unfolding statistics (and thus the depth-D colors) are determined by (hom(F, G))_{F ∈ Pat(U_𝒜^( ≤ D))}. Letting D exceed stabilization depths yields the desired completeness statement.

Define Pat(U_𝒜) := ⋃_D ≥ 0Pat(U_𝒜^( ≤ D)). For each fixed input graph G, only finitely many depths are relevant (beyond stabilization no new tuple-colors appear), so the homomorphism vector (hom(F, G))_{F ∈ Pat(U_𝒜)} determines χ_𝒜(G). Moreover, if χ_𝒜(G) = χ_𝒜(H), then the stabilized tuple-color multisets agree, hence all unfolding-type counts agree, hence all corresponding bag-isomorphism counts agree, and by invertibility of the linear system, hom(F, G) = hom(F, H) for all F ∈ Pat(U_𝒜). We thus obtain the equivalence
χ_𝒜(G) = χ_𝒜(H) ⇔ hom(F, G) = hom(F, H) ∀ F ∈ Pat(U_𝒜),
which is the promised completeness (upper bound) in homomorphism-counting terms.

The rooted/node- and edge-level variants are obtained by the same argument with rooted unfoldings. For node-level outputs, we work with tuples u in which one coordinate is designated as the output node (or, equivalently, with rooted graphs G^u) and define rooted unfolding objects Unf_𝒜^(D)(G^u, u) whose root annotation records the mark. The associated bagged patterns are rooted, and the linear reduction is replaced by its rooted analogue, expressing rooted bag-isomorphism counts as integer combinations of rooted homomorphism counts hom(F, G^u). The edge-level case is identical with two marked vertices. In each case, stabilization and (A5) identify equivariant stable representations with rooted unfolding types, and the bag-isomorphism factorization transfers this information to rooted homomorphism counts over the induced rooted pattern basis.

6. Lower Bound / Maximality: generalized pebble games parameterized by the neighbor relation template; construction of generalized F"urer gadgets and twists; rooted (clique-augmented) gadgets for node/edge-level; prove that every forbidden pattern yields a separating pair.

We now establish the lower bound direction: the homomorphism basis extracted from the unfolding semantics is . Concretely, for every pattern graph F ∉ F_𝒜 we construct an explicit separating pair (G_F, H_F) such that χ_𝒜(G_F) = χ_𝒜(H_F) while hom(F, G_F) ≠ hom(F, H_F); we also give rooted variants yielding separations at node- and edge-level.

Fix the neighbor template R on k-tuples. We define a two-structure game PG_R^D(G, H) of length D that mirrors the refinement-with-aggregation interface. A position is a pair of tuples (u, v) ∈ V_G^k × V_H^k. We say that (u, v) is if the initial types of u in G and v in H agree (labels, equality pattern on coordinates, and adjacency pattern between coordinates; and, if present, the global channel annotation agrees at the root).

In round i, Spoiler chooses one of the two graphs, say G, and selects a neighbor tuple u^′ ∈ R_G(u). Duplicator must respond with a tuple v^′ ∈ R_H(v). The game proceeds from (u^′, v^′). Duplicator loses immediately if the resulting position is illegal. Duplicator wins if she can survive D rounds.

This game is the natural abstraction of one-step multiset aggregation: R determines which tuple-colors can influence the next color, and legality enforces that any matched tuples have compatible intrinsic types. The standard induction, adapted to the present R-parametric setting, yields:

For all D ≥ 0, if Duplicator wins PG_R^D(G, H) from (u, v), then c_G^D(u) = c_H^D(v). Conversely, by (A5), if c_G^D(u) = c_H^D(v), then the depth-D unfoldings are isomorphic and Duplicator wins from (u, v).

Passing to stabilization, we obtain: if for every u ∈ V_G^k there exists v ∈ V_H^k with Duplicator winning for all D (and symmetrically), then the multisets of stable tuple-colors coincide and hence χ_𝒜(G) = χ_𝒜(H). Thus, to prove indistinguishability by 𝒜, it suffices to provide a Duplicator strategy in this R-game.

By definition of F_𝒜, a pattern F ∉ F_𝒜 is one whose homomorphism count is determined by the unfolding-type statistics encoded in χ_𝒜. Equivalently (and this is the viewpoint we use for gadget design), F fails to admit the decomposition discipline implicit in the unfolding grammar U_𝒜: there is no way to represent F as a consistent gluing of local k-tuple constraints generated by repeated R-expansion. In familiar instantiations this failure specializes to, e.g., having a cycle (for k = 1 MPNNs), or exceeding a prescribed NED/treewidth bound (for higher-order local architectures). We keep the discussion abstract and only use the existence of such a failure mode, witnessed by the R-pebble game characterization below.

We consider the CG_R(F), in which Spoiler tries to force a configuration that violates any putative R-consistent unfolding decomposition of F. One may formalize CG_R(F) as a finite obstruction search over partial R-walks of k-tuples in F, with legality given by the same initial-type constraints as above, and with moves constrained to follow R_F. The key property is:

Alice (Spoiler) wins CG_R(F) if and only if F ∉ F_𝒜.

We use this equivalence only as an interface: given F ∉ F_𝒜, we fix a finite winning transcript for Spoiler in CG_R(F) and compile it into a gadget in which R-local views are symmetric while hom(F, ⋅) detects the asymmetry.

We now describe a concrete gadget family Gad(F) together with a operation that preserves all R-local unfolding types but changes hom(F, ⋅). The construction generalizes the Cai–F"urer–Immerman (CFI) paradigm to arbitrary degree patterns and to the present R-parametric locality.

Fix a connected pattern F = (V_F, E_F, ℓ_F), and orient each edge arbitrarily to obtain an incidence list at every vertex. For each vertex x ∈ V_F of degree d(x), we create a C_x consisting of 2^{d(x) − 1} vertices, indexed by even-parity bitstrings α ∈ {0, 1}^d(x) (or equivalently even subsets of incident edges). Intuitively, α encodes a parity constraint on how a homomorphism traverses the incident edges of x. Labels are lifted by setting all vertices of C_x to carry label ℓ_F(x) (or a refined label if we need to break symmetry between different degrees).

For each edge e = {x, y} ∈ E_F, we connect clouds C_x and C_y by a perfect matching defined as follows. Let the coordinate corresponding to e in x’s incidence list be i, and the coordinate corresponding to e in y’s incidence list be j. For α ∈ C_x and β ∈ C_y, we add an edge between the vertices (x, α) and (y, β) if and only if α_i = β_j. This yields a graph Gad(F) whose local neighborhoods are highly regular: any vertex in a cloud sees, for each incident edge of F, exactly half of the opposite cloud, and the parity restriction is internal to the cloud.

To define the twist, fix a designated edge e^⋆ = {x^⋆, y^⋆} ∈ E_F. We obtain Twist(Gad(F), e^⋆) by modifying only the matching on e^⋆: instead of connecting (x^⋆, α) to (y^⋆, β) when α_i = β_j, we connect them when α_i ≠ β_j. All other adjacencies and labels are unchanged. We write
G_F := Gad(F), H_F := Twist(Gad(F), e^⋆).

The reason G_F and H_F are hard for refinement procedures is that the twist is globally defined but locally invisible to R-bounded exploration. Formally, one shows that Duplicator wins PG_R^D(G_F, H_F) for every D, from an appropriate correspondence between tuples in G_F and tuples in H_F. The correspondence is induced by an 𝔽₂-affine symmetry: within each cloud C_x, flipping all indices by adding a fixed even-parity vector is an automorphism; moreover, along each non-twisted edge matching, equal-bit constraints are preserved under synchronized flips of the relevant coordinates. The only discrepancy occurs at the twisted edge e^⋆, and Duplicator compensates by switching between two affine cosets when the play crosses e^⋆. Since the game only exposes neighborhoods through R-moves, and R is fixed and equivariant, this compensation can be maintained uniformly for any finite number of rounds.

Consequently, stabilized tuple-colors match, and therefore
χ_𝒜(G_F) = χ_𝒜(H_F).
This is the point at which (A1)–(A3) are used: equivariance ensures that the Duplicator strategy, defined in terms of the gadget symmetries, aligns with the refinement operator; and the fixed template R prevents the architecture from probing non-R-local global structure.

It remains to show that the twist changes hom(F, ⋅) whenever F ∉ F_𝒜. A homomorphism h : F → G_F chooses, for each x ∈ V_F, a vertex in the corresponding cloud C_x, i.e. an even-parity index α_x. The edge constraints ensure that for each edge {x, y}, the selected indices satisfy (α_x)_xy = (α_y)_yx in G_F, whereas in H_F the constraint is flipped on e^⋆. Thus homomorphisms correspond to solutions of a system of 𝔽₂-constraints of the form
(α_x)_xy ⊕ (α_y)_yx = b_xy,
with b_xy = 0 for all edges in G_F and b_x^⋆y^⋆ = 1 (and b = 0 elsewhere) for H_F, together with the requirement that each α_x has even parity.

If F admitted an R-consistent unfolding decomposition, then these constraints would be satisfiable in a way that is invariant under flipping b_e^⋆, and the solution counts (hence homomorphism counts) would coincide. Precisely the failure F ∉ F_𝒜 yields an obstruction cycle (or, in general, an R-certified parity witness) forcing different solution counts when a single edge constraint is flipped. Therefore
hom(F, G_F) ≠ hom(F, H_F).
This establishes the required separation at graph level.

For node- and edge-level statements, we require separations that persist under equivariant readouts, i.e. we must exhibit roots r and s in G_F and H_F whose stable equivariant representations agree while rooted homomorphism counts differ. We enforce root rigidity by a standard clique-augmentation.

We form Ĝ_F from G_F by adjoining a labeled clique K of size k + 2 whose vertices carry pairwise distinct labels (or, equivalently, are made distinguishable by attaching uniquely labeled leaves). We connect a designated vertex of K to a designated anchor vertex of G_F (chosen in a cloud corresponding to a fixed vertex of F), and do the same for H_F, obtaining Ĥ_F. The unique labeling forces every homomorphism of a rooted pattern to map the root(s) into prescribed vertices of K, and forces any 𝒜-equivariant representation at those vertices to be canonical. Since the same rigid gadget is attached in both graphs and the attachment is R-locally identical, the previous Duplicator strategy extends, yielding equality of node-/edge-level stable representations at the chosen roots. On the other hand, rooted homomorphisms hom(F, Ĝ_F^u) and hom(F, Ĥ_F^u) still encode the same parity system on the underlying CFI part and hence remain separated.

Combining indistinguishability in the R-pebble game with the parity separation of homomorphism counts, we obtain the promised maximality principle: for every forbidden pattern F ∉ F_𝒜 we can explicitly produce (G_F, H_F) (and rooted analogues) witnessing that no invariant computable from χ_𝒜 can recover hom(F, ⋅). This completes the lower bound direction and, together with the completeness statement proved earlier, pins down F_𝒜 as the unique maximal homomorphism expressivity family associated with 𝒜 under axioms (A1)–(A5).

7. Recipe to Derive F_𝒜: how to compute/describe the candidate family from an architecture spec; conversion of unfolding-basis descriptions into structural decompositions (treewidth/NED-like) when possible; guidance for new architectures.

Having proved that every refinement-with-aggregation architecture 𝒜 satisfying (A1–A5) induces a unique maximal homomorphism expressivity family F_𝒜, we now make explicit how one (and, in favorable cases, ) this family from the architecture-level specification. The point is not that one can always list F_𝒜 explicitly, but rather that one can (i) compute an unfolding basis U_𝒜 mechanically from the neighbor template R and the initialization types, (ii) read off from U_𝒜 a candidate pattern family Pat(U_𝒜) that is complete for χ_𝒜, and (iii) translate U_𝒜 into a structural decomposition discipline on patterns, yielding a concise characterization of F_𝒜 (treewidth/NED-like) when such a translation exists.

We first rewrite the forward pass of 𝒜 in the RWA normal form: a tuple dimension k, an initial type map on k-tuples (encoding labels and equality/adjacency patterns among coordinates, and any extra designated channels), and a fixed neighbor template R that is instantiated as R_G(u) on each input graph G. In this normalization we ignore the particular hash function and aggregator implementation; only the interface matters: the next-round color of a tuple is a function of its current color and the multiset of colors in R_G(u). If the architecture uses several aggregators (e.g. multiple radii or multiple relation types), we take R to be the disjoint union of the corresponding neighbor relations, with colors recording the relation identifier; this is without loss of generality for expressivity.

Given R and the initial tuple types, we define the depth-D unfolding object Unf_𝒜^(D)(G, u) inductively:

We then quotient by isomorphism of unfoldings (root-preserving, type-preserving, and respecting relation tags). The family U_𝒜 is the set of all finite unfolding types obtainable as Unf_𝒜^(D)(G, u) over all finite graphs G, tuples u ∈ V_G^k, and depths D. By (A5), distinct stable colors correspond to distinct unfolding types for sufficiently large D; thus U_𝒜 is the canonical ``witness vocabulary’’ of 𝒜.

In practice, this step is a grammar extraction: from the syntactic description of R we write an inductive datatype whose constructors correspond to (i) initial tuple-types and (ii) R-expansions that collect multisets of previously constructed objects, decorated by relation tags. When R has bounded arity and finitely many clauses, this grammar is finite-state in the usual sense (though it generates infinitely many objects across depths).

To obtain a homomorphism-complete family, we associate to each unfolding object U ∈ U_𝒜 a finite set of pattern graphs Pat(U) whose homomorphism counts into G recover the number of occurrences of U as an unfolding type in G. Concretely, one proceeds in two substeps.

This compilation is canonical up to homomorphic equivalence, and it is designed so that counting homomorphisms from P into G is (via a triangular linear system) equivalent to counting bag-isomorphisms from Bag(U) into G, hence equivalent to counting tuples in G whose unfolding type is U. We then set
Pat(U_𝒜) := ⋃_{U ∈ U_𝒜}Pat(U).
By the completeness direction proved earlier, equality of χ_𝒜(G) and χ_𝒜(H) is equivalent to equality of hom(P, ⋅) over all P ∈ Pat(U_𝒜) (and rooted analogues if the readout is equivariant).

While Pat(U_𝒜) is already a complete basis, it is often redundant and not structurally transparent. The structurally meaningful object is the on patterns induced by U_𝒜: a pattern F is said to be if it can be obtained by a finite construction that mirrors the unfolding grammar, where each constructor adds a constant-size piece (a bag) and glues it to previously built pieces according to the overlap constraints specified by the corresponding R-tag. Equivalently, F admits a decomposition whose ``bags’’ are k-tuple-shaped (or (k + 1)-tuple-shaped, depending on whether R replaces one coordinate at a time) and whose adhesion sets are exactly those coordinate overlaps permitted by R.

Operationally, we proceed as follows. We consider the hypergraph ℋ_F whose hyperedges are the candidate bags, and we attempt to orient and arrange these bags into a tree-like or hypertree-like object such that:

When this succeeds, we typically obtain a known width measure: for k-tuple replacement neighborhoods one recovers tree decompositions of width ≤ k; for pair-based neighborhoods with endpoint-sharing constraints one recovers NED-like decompositions. When it does not match a known width measure, we obtain a new decomposition notion canonically attached to R; in that case, F_𝒜 is best stated as the class of patterns admitting this R-decomposition.

From the preceding step, we propose the candidate family
F̂_𝒜 := {F ∈ 𝒢 : F is U_𝒜-decomposable},
with the corresponding rooted families defined by requiring a decomposition that respects the designated root tuple(s) as a distinguished initial bag. The meta-theorem then supplies the tightness criterion: to identify F̂_𝒜 with the unique maximal family F_𝒜, it suffices to verify that the decomposition discipline exactly matches the R-pebble-game characterization of indistinguishability on patterns. In practice this is done by proving two implications:

Once these are shown, maximality and uniqueness promote F̂_𝒜 to F_𝒜 automatically.

For a newly proposed 𝒜, we recommend the following workflow.

In summary, the derivation of F_𝒜 is a compilation pipeline
(𝒜, R) ⇒ U_𝒜 ⇒ Pat(U_𝒜) ⇒ decomposition discipline ⇒ F_𝒜,
where the last implication is certified by the game/gadget maximality interface. This is the mechanism we will instantiate next to recover the known families for standard architectures and to isolate, in a formal way, which axiom failures lead to pathological behavior.

8. Instantiations and Corollaries: recover known F_M for standard GNNs; formalize the pathological counterexample and show exactly which axiom it violates; additional examples (distance-guarded variants, fixed sparse-attention variants).

We now instantiate the derivation pipeline of on a number of standard neighborhood templates, and we record the resulting maximal families in the form most useful for subsequent arguments. In each case the identification follows the same logic: from the syntactic description of R we read off the unfolding grammar U_𝒜, from U_𝒜 we extract the permitted overlap types between constant-size bags, and we then recognize the induced decomposition discipline as a known width notion. The soundness direction is then a dynamic program over the induced decomposition, while completeness follows from the R-pebble-game/gadget interface developed earlier.

Consider first the archetypal message-passing architecture with tuple dimension k = 1 and neighbor template
R_G(u) = {u^′ ∈ V_G : {u, u^′} ∈ E_G},
possibly with multiple relation types corresponding to edge labels. The depth-D unfolding Unf_𝒜^(D)(G, u) is then a rooted, vertex-labeled tree of depth D, where each child corresponds to a neighbor and relation tags record the edge type. Consequently, the induced bag structures are precisely rooted trees, and the compilation Pat(U_𝒜) yields (rooted) tree patterns. Since homomorphism counts of trees determine the stable color multiset (and conversely, tree unfoldings are the only local witnesses that survive stabilization), we recover the well-known characterization: the maximal family F_𝒜 consists of forests. More precisely, χ_𝒜(G) = χ_𝒜(H) if and only if hom(T, G) = hom(T, H) for all finite trees T (and similarly for node-rooted trees in the equivariant setting). This is the k = 1 instance of Corollary~4.

We next consider architectures whose refinement state lives on ordered pairs (u, v) ∈ V_G² (so k = 2) and whose neighbor template updates one endpoint at a time. A minimal such template is
R_G(u, v) = {(u^′, v) : u^′ ∼ u} ∪ {(u, v^′) : v^′ ∼ v},
with relation tags recording whether the first or second coordinate moved. The corresponding unfolding objects are rooted trees of pair-states, but crucially each expansion step carries an overlap constraint: along an edge of the unfolding, one coordinate is preserved. When we translate this to a decomposition discipline on a pattern F, we obtain decompositions into constant-size bags shaped like ordered pairs (or small augmentations thereof) whose adhesion sets are forced to include one endpoint. This is the abstract form of an ``endpoint-sharing’’ constraint.

Different pair-based architectures impose different additional constraints on R, and these translate into different permitted adhesions, hence different decomposition notions. For example:

In each case, the reason such NED-like notions appear is that the only overlap information that can be transported through the refinement is the overlap already hard-coded in the relation tags of R: if the template never permits an expansion that simultaneously changes both endpoints, then no decomposition assembled from the corresponding unfolding vocabulary can require larger adhesions than those endpoints.

We emphasize that the above statements are not merely upper bounds. The maximality theorem supplies the converse: whenever a pattern F fails the corresponding endpoint-discipline, we obtain explicit gadget graphs G_F, H_F that are indistinguishable by the pair-refinement (hence yield identical χ_𝒜) but satisfy hom(F, G_F) ≠ hom(F, H_F).

For k-tuple models (including k-WL-like and folklore k-GNN variants) the most common neighbor template replaces one coordinate at a time, typically along adjacency:
$$ \mathsf{R}_G(u_1,\dots,u_k)\;=\;\bigcup_{i=1}^k \{(u_1,\dots,u_{i-1},u_i',u_{i+1},\dots,u_k): u_i'\sim u_i\}, $$
together with intrinsic coordinate relations (equalities and adjacencies among coordinates) included in the tuple type. The unfolding grammar then describes how k-tuples can be expanded by changing a single coordinate while keeping the remaining k − 1 coordinates fixed, and the relation tags record which coordinate was replaced. Translating this into a pattern discipline yields a tree decomposition whose bags correspond to (k + 1)-vertex neighborhoods obtained by introducing one new vertex adjacent to a preserved k-tuple context. The adhesion sets are exactly the preserved k-tuples, hence have size at most k, and the running-intersection property follows from the tree-like nature of unfoldings.

This is precisely the standard treewidth mechanism: the induced family is the class of patterns of treewidth at most k. Concretely, we recover the known equivalence (again as in Corollary~4): for such an architecture 𝒜, the stable representation χ_𝒜(G) determines hom(F, G) for every F of treewidth ≤ k, and for every pattern F with tw(F) > k there exist indistinguishable gadget graphs separating hom(F, ⋅). Rooted analogues correspond to rooted tree decompositions in which the root tuple sits inside a designated initial bag.

We next formalize two simple ``if–then’’ constructions illustrating why the axioms (A4) and (A5) are not technical conveniences but structural necessities for the existence of a maximal homomorphism family.

Let 𝒜_⊕ be the architecture that outputs a single bit
χ_{𝒜_⊕}(G) := |V_G| mod 2,
computed via a global channel that increments a counter during refinement and then reduces mod 2 at readout. This is isomorphism-invariant and stabilizes immediately, and it can be phrased as an RWA-style procedure with a global accumulator. However, it violates compositionality under disjoint union: we have χ_{𝒜_⊕}(G ⊔ H) = χ_{𝒜_⊕}(G) ⊕ χ_{𝒜_⊕}(H) rather than a homomorphism-compatible additive behavior.

Assume for contradiction that there were a family F such that χ_{𝒜_⊕}(G) = χ_{𝒜_⊕}(H) if and only if hom(F^′, G) = hom(F^′, H) for all F^′ ∈ F. For every fixed pattern F^′, the function G ↦ hom(F^′, G) is additive under disjoint union:
hom(F^′, G ⊔ H) = hom(F^′, G) + hom(F^′, H).
Hence any graph invariant determined solely by the collection {hom(F^′, ⋅) : F^′ ∈ F} must respect the congruence induced by these additive coordinates. The parity function does not define such a congruence: for example, there exist G, H with the same parity but such that G ⊔ G and H ⊔ H have opposite parity. This contradicts determinability from hom counts, and therefore no maximal family F_{𝒜_⊕} exists. This is an explicit instance of Proposition~5 with failure of (A4).

Fix a finite graph H₀. Define an architecture 𝒜_H₀ that first runs an otherwise standard refinement procedure (say, the k = 1 neighborhood template) to stabilization, and then applies the readout
$$ \chi_{\mathcal{A}_{H_0}}(G)\;:=\; \begin{cases} 1 & \text{if } G\cong H_0,\\ 0 & \text{otherwise.} \end{cases} $$
This is isomorphism-invariant and stabilizing, but it is not ``explainable’’ by unfolding objects: the difference between H₀ and a graph G with identical stabilized color multiset (which can be arranged by choosing G among known WL-indistinguishable families) is introduced entirely at readout by an external test not reflected in the unfolding vocabulary. In particular, there cannot exist a family F such that all non-isomorphic graphs outside H₀ become homomorphism-indistinguishable while H₀ is separated, since for any nontrivial pattern F^′ the map G ↦ hom(F^′, G) is not constant across all G ≆ H₀. Thus the induced equivalence relation of χ_{𝒜_H₀} cannot be represented as equality on any set of homomorphism counts. This is a canonical instance of Proposition~5 tied to failure of (A5) (and typically also (A4), since such special-casing does not behave well under products/unions).

Finally, we note two template families that fall between the standard cases above and illustrate how the recipe produces new, architecture-specific decompositions.

Suppose R permits a coordinate replacement not only along edges but along paths of length at most r, with relation tags recording the chosen radius class (or, equivalently, working in the r-th power graph G^(r) while keeping the original labels). The unfolding grammar then expands tuples by ``guarded’’ moves within r-balls, and the induced bags must cover edges of F while maintaining adhesions that are realizable within distance ≤ r in the partially constructed pattern. The resulting pattern class can be stated as admitting an R-decomposition with radius-r guards: each new bag introduces vertices whose attachment to the existing structure is witnessed by a bounded-radius connector. This yields expressivity strictly stronger than purely edge-local templates but still limited to patterns compatible with the guarded adhesion discipline.

Assume the architecture aggregates not over all of R_G(u) but over a fixed subrelation R_G^♯(u) ⊆ R_G(u) chosen by an isomorphism-equivariant rule (e.g. restricting to neighbors of specified labels, or to a designated edge type). Then the unfolding vocabulary is computed with R^♯ in place of R, and F_𝒜 aligns with decompositions that only use the corresponding restricted attachment types. In particular, such sparsification can only decrease the maximal family: any pattern requiring attachments outside the permitted relation tags is excluded and admits gadget separations. This provides a formal way to certify expressivity loss (and, conversely, to justify safe sparsification) directly from the template.

These instantiations will be the input to the next section, where we quantify the constructivity implicit in the maximality theorem by bounding gadget sizes and discussing the complexity of recognizing membership in the structurally characterized cases.

9. Complexity and Constructivity: size bounds for gadget counterexamples; decidability/complexity of membership tests under common structural characterizations; limits of full automation.

The meta-theorem in is information-theoretic: it asserts the existence of a unique maximal homomorphism family and, for every excluded pattern F ∉ F_𝒜, the existence of explicit counterexamples (G_F, H_F) with χ_𝒜(G_F) = χ_𝒜(H_F) but hom (F, G_F) ≠ hom (F, H_F) (and rooted analogues). In this section we make precise what ``explicit’’ entails in the present setting, we record size bounds for the gadgets arising from our construction, and we delineate which membership and separation questions admit effective procedures under the structural characterizations identified in . We also isolate the points at which full automation is provably limited.

Fix an architecture 𝒜 satisfying (A1)–(A5), with tuple dimension k and neighbor template R. For a pattern F ∉ F_𝒜, Theorem~3 produces graphs G_F, H_F that are indistinguishable by the induced R-pebble game (hence by 𝒜), but are separated by F-homomorphism counts. Operationally, our construction proceeds in two steps: (i) we build a base gadget Gad(F) that locally encodes the incidence structure of F in a way compatible with the moves of the R-game, and (ii) we apply a along a chosen attachment edge (or, more generally, along a designated relation instance) to obtain Twist(Gad(F), e), which preserves R-indistinguishability but flips the parity of a controlled family of F-homomorphisms. We then set
G_F := Gad(F), H_F := Twist(Gad(F), e).
The specific form of Gad(F) depends on the decomposition obstruction exhibited by F relative to the unfolding basis U_𝒜; however, in all standard instances the gadget is a bounded-depth, bounded-adhesion blow-up of F in which each vertex v ∈ V_F is replaced by a local cloud whose internal wiring supports a exploitable by Duplicator.

The resulting size bounds are most conveniently stated in terms of degree parameters of F and the local signature induced by R. In the common case where the gadget step replaces each vertex v by a cloud of size on the order of 2^{deg_F(v) − 1} (the familiar phenomenon from CFI/F"urer constructions), we obtain
|V_{G_F}|,|V_{H_F}| = Θ (∑_{v ∈ V_F}2^{deg_F(v) − 1}), |E_{G_F}|,|E_{H_F}| = poly(∑_{v ∈ V_F}2^{deg_F(v) − 1}),
where the implicit polynomial degree is uniform over F and depends only on the arities and tags present in R (and on whether we enforce vertex labels). In particular, for bounded-degree patterns this yields gadgets of size polynomial in |V_F|, whereas for unbounded degree an exponential blow-up is inherent to the symmetry needed to maintain R-indistinguishability while enabling a controlled twist.

For node- or edge-rooted separations (equivariant readouts), we additionally require that the distinguished root tuple(s) be anchored in a rigid structure that cannot be permuted away by an automorphism of the gadget. A convenient method is : we adjoin a labeled rigid container (typically a (k + 1)-clique with unique label pattern) and attach the root vertex (or root endpoints) in a prescribed way. This increases the gadget size by an additive constant depending only on k and the label alphabet, leaving the asymptotic bound above unchanged:
|V_{G_F^root}| = |V_{G_F}|+O_k(1), |V_{H_F^root}| = |V_{H_F}|+O_k(1).
We stress that these are bounds for a generic obstruction. In concrete instantiations, smaller separating pairs often exist; our point is that the meta-theorem supplies a uniform construction with explicit resource dependence on F.

The equivalence χ_𝒜(G) = χ_𝒜(H) ⇔ hom (F, G) = hom (F, H) for all F ∈ F_𝒜 is only as useful as our ability to recognize when a given pattern F lies in F_𝒜. In general, F_𝒜 is defined as a maximal family extracted from the unfolding basis U_𝒜, and without additional structure one should not expect a simple graph-theoretic description. Nevertheless, in the instantiations of the family aligns with standard width notions, and membership becomes decidable by known algorithms.

For k = 1 message passing, we have F_𝒜= forests. Membership is immediate: F ∈ F_𝒜 iff F is acyclic, decidable in time O(|V_F|+|E_F|) by a DFS.

For k-tuple refinement with the standard one-coordinate replacement template, we have $F_{\mathcal{A}}=\{F:\tw(F)\le k\}$. For fixed k, membership is decidable in linear time in |V_F| via Bodlaender’s algorithm (with constants depending superexponentially on k, as usual). If k is part of the input, deciding $\tw(F)\le k$ is NP-hard, and thus no fully general polynomial-time membership test should be expected unless P = NP. This delineates a practical regime: one fixes the architecture parameter k and then performs membership tests for patterns F of interest.

For the pair-based local models characterized via NED-like decompositions, the situation is analogous. Once the endpoint discipline is fixed (endpoint-shared, strong, almost-strong), membership reduces to the existence of a decomposition of F into constant-size bags with prescribed adhesion patterns. Such existence can be checked by dynamic programming over candidate decomposition trees, or, equivalently, by a finite-state search over partial decompositions, yielding a procedure polynomial in |V_F| for each fixed bag size/adhesion template. Here again, the difficulty is not decidability but parameter dependence: if we enlarge the permitted bag vocabulary (or allow it to be part of the input), the recognition problem can encode hard constraint satisfaction tasks.

Given an explicit membership test for the structural characterization of F_𝒜, we may also extract the for non-membership and feed it into the gadget construction. Concretely, for width notions the failure of membership can be witnessed by an obstruction: for forests, a cycle; for treewidth ≤ k, a minor-like certificate of $\tw(F)>k$ or, operationally, failure of a k-tree decomposition search; for NED variants, a minimal subpattern violating the endpoint adhesion rule. In our framework, these obstructions correspond to a Spoiler win in the R-pebble game on F against the unfolding vocabulary. Once such a certificate is available, the gadget Gad(F) is constructed by composing local widgets prescribed by the certificate and then applying the twist along a distinguished interface.

The resulting overall runtime is polynomial in the output size of the gadget graphs:
$$ \mathrm{time}(\textsc{ConstructSeparator}(F)) \;=\; \mathrm{poly}(|V_{G_F}|+|E_{G_F}|), $$
with the polynomial uniform over F for each fixed architecture 𝒜. The dependence on F is therefore dominated by the blow-up intrinsic to |V_{G_F}|, as discussed above. This separation between and is useful: it clarifies that the main barrier to small counterexamples is not implementability but the need for sufficiently symmetric gadgets.

Even when F ∈ F_𝒜, and thus hom (F, ⋅) is determined by χ_𝒜(⋅), this does not imply that we can efficiently hom (F, G) from χ_𝒜(G) in general. Theorem~2 proceeds via a linear reduction between counts of unfolding-object types and homomorphism counts of induced patterns, but the relevant linear system is conceptual: it is finite for each fixed radius and vocabulary, yet its size grows quickly with the unfolding depth required for stabilization and with the number of unfolding types.

In regimes where F_𝒜 is a bounded-width class, there are nevertheless efficient counting algorithms independent of χ_𝒜. For instance, for each fixed k and each fixed pattern F with $\tw(F)\le k$, the homomorphism number hom (F, G) can be computed in time O(|V_G|^k + 1) by standard tree-decomposition dynamic programming. Thus the content of the meta-theorem should be read as a : it characterizes exactly which patterns are in principle recoverable from χ_𝒜, without asserting that the recovery map is computationally efficient when given χ_𝒜(G) alone.

Finally, we record three barriers that delimit what one can hope to automate uniformly across architectures.

First, if an architecture were universal up to isomorphism, i.e. χ_𝒜(G) = χ_𝒜(H) ⇔ G ≅ H, then deciding representation equality would solve Graph Isomorphism. Any attempt at a fully general, efficiently checkable characterization of F_𝒜 for such architectures would therefore amount to progress on GI-hard instances. This motivates restricting attention to refinement-based architectures whose neighbor template enforces a bounded-width unfolding vocabulary.

Second, even within refinement-based families, parameters that are treated as architecture hyperparameters (such as tuple dimension k) are not benign from the standpoint of meta-automation. As noted above, allowing k to vary turns membership in the induced family into the treewidth decision problem, NP-hard in k. Thus any general tool that takes (𝒜, F) as input and outputs whether F ∈ F_𝒜 must either fix such parameters in advance or accept worst-case hardness.

Third, the underlying counting problems remain hard: for general patterns F, computing hom (F, G) is #P-hard. Our equivalence statements therefore cannot be interpreted as providing efficient counting algorithms from representations; rather, they certify that no method observing only χ_𝒜(G) can, in the worst case, recover counts for patterns outside F_𝒜, while for patterns inside F_𝒜 recovery is information-theoretically possible (and algorithmically possible under additional bounded-width assumptions).

These complexity considerations will guide our discussion in the next section: they indicate which kinds of expressivity claims can be made robustly for modern architectures, and which extensions (learned neighbor selection, approximate aggregation, and hybrid attention/refinement mechanisms) require revisiting the axioms and the unfolding-based proof strategy.

10. Discussion and 2026 Outlook: how the meta-theorem can underwrite expressivity claims for modern models; open directions (learned neighborhoods, approximate variants, transformers).

We close by articulating what the meta-theorem buys us as a for modern graph models, and by isolating the main points at which contemporary architectures fall outside the present axiomatic envelope. Our aim is not to proclaim universality, but to make explicit which claims can be justified model-specific ad hoc arguments and which claims require new technical ideas.

A recurrent difficulty in the expressivity literature is that statements are often phrased at the level of ``model class = some refinement power’’ without clarifying which implementation choices are essential (hashing vs. continuous MLPs, sum vs. multiset normalization, finite vs. infinite precision, etc.). The present framework promotes a different view: once a forward pass can be cast as a stabilizing refinement-with-aggregation procedure with a fixed neighbor template, the architecture inherits a homomorphism family F_𝒜 (and rooted analogues) that exactly characterizes the information content of its stable representation.

Operationally, this gives a clean rule for expressivity claims of the form
χ_𝒜(G) = χ_𝒜(H) ⇒ hom(F, G) = hom(F, H) for all F ∈ 𝒞.
Such a statement is true if and only if 𝒞 ⊆ F_𝒜, and it is in the sense that every F ∉ F_𝒜 admits a uniform counterexample pair indistinguishable to 𝒜 but separated by hom(F, ⋅). In particular, one does not need to guess the ``right’’ test family by analogy with 1-WL or k-WL; the unfolding basis induced by R dictates it.

From the standpoint of model design, we may read F_𝒜 as the set of patterns for which the architecture can, in principle, act as a sufficient statistic (at the level of stable representations). If an application demands sensitivity to a pattern outside F_𝒜, then no amount of training data or parameter tuning can overcome the obstruction while keeping the same R-template and refinement semantics.

Many architectures deployed in 2024–2026 fit the axioms once we separate from . For example, message-passing GNNs with any continuous update (MLPs, residuals, layer normalization) still instantiate a refinement operator provided that (i) aggregation is permutation-invariant, (ii) the update is equivariant, and (iii) we interpret ``colors’’ as stable equivalence classes under the induced operator (i.e. we work information-theoretically, ignoring numerical collisions). Under this interpretation, the neighbor template is the sole driver of the unfolding vocabulary.

Hybrid models that mix local message passing with a global channel (graph token, virtual node, pooled context) also fall into the framework if the global channel is treated as an additional, isomorphism-invariant register whose update depends only on multisets of tuple-states. In that case, the unfolding objects become trees (or hypertrees) decorated with global annotations; the induced pattern family typically remains structurally bounded, but can strictly enlarge compared to the purely local case.

Equivariant readouts (node- or edge-level) are particularly clean in this language: rooted homomorphism families F_𝒜ⁿ, F_𝒜^e provide an exact account of which rooted patterns are determined by node/edge representations. Thus, claims such as ``the model can represent all functions of neighborhoods of radius r’’ are only meaningful once one specifies whether the neighborhood is defined by R on tuples (and hence by the unfolding), or by an extrinsic metric (e.g. shortest-path balls) that may not be preserved by the refinement dynamics.

A dominant trend is to the notion of neighborhood—via attention, learned edge weights, top-m selection, or structure learning modules that add/remove edges. This is precisely where axiom (A3) (fixed, isomorphism-equivariant neighbor template) becomes delicate.

There are two qualitatively different regimes.

We view the first regime as a promising direction for 2026: one can seek an axiom system where the template is not fixed but is drawn from a equivariant grammar of neighborhood rules, so that the induced guarded unfoldings remain finitely describable and yield a maximal homomorphism family.

Modern models often depart from exact multiset aggregation in three ways: (i) neighbor sampling for scalability, (ii) truncation to a bounded number of rounds, and (iii) real-valued computations with finite precision. Each interacts differently with our axioms.

Neighbor sampling breaks (A2) and (A5) if interpreted literally: the refinement state becomes random, stabilization is not guaranteed on a fixed instance, and representation equality becomes a probabilistic statement. A natural extension is therefore to replace determinacy by distributional invariants, studying when
ℒ(χ_𝒜(G)) = ℒ(χ_𝒜(H))
implies equality of certain homomorphism counts or of low-degree graph statistics. One expects the induced ``pattern family’’ to become a family of patterns weighted by sampling probabilities, akin to unbiased estimators for subgraph counts. Establishing maximality in this stochastic setting would require a gadget theory that equalizes the distribution of sampled unfoldings while perturbing targeted hom counts in expectation.

Truncation to T rounds yields a well-defined, finitely bounded unfolding family U_𝒜^( ≤ T). In this case we can recover a clean theorem: representation equality after T rounds is equivalent to equality of homomorphism counts over the patterns induced by depth-T unfoldings. The maximal family is then explicitly finite (up to isomorphism types of bounded depth), and the separation gadgets can be chosen of bounded radius. Thus, for truncated models, the theory becomes constructive, at the cost of a weaker invariance.

Finite precision and continuous embeddings are subtler. Our meta-theorem is intentionally information-theoretic; it identifies what is determined by the refinement operator, not what is numerically stable. For practitioners, this suggests a two-step interpretive discipline: first identify F_𝒜 at the level of the induced symbolic refinement; then separately analyze whether the chosen parameterization approximates that refinement with acceptable error. Bridging these two steps is an open interface between expressivity theory and numerical analysis.

Graph Transformers with full self-attention do not fit the local neighbor-template paradigm unless one treats the neighborhood of each tuple as ``all tuples,’’ which collapses the unfolding perspective into a global computation. Even then, two features of Transformers complicate matters: (i) attention scores induce a (weighted) aggregation rather than a pure multiset, and (ii) positional/structural encodings (Laplacian eigenvectors, random features, shortest-path biases) inject additional information that may not be definable from the graph structure alone.

There is nevertheless a plausible path forward. One may define an RWA-like model over a richer signature in which the neighbor relation includes global edges annotated by equivalence classes of relative positions, and in which aggregation carries weights drawn from a finite palette (or from an abstract commutative semiring). The corresponding unfolding objects would be hypertrees or, in the fully global case, canonical decompositions akin to color refinement on complete relational structures. The question for 2026 is then: can we recover a maximal family of test structures that plays the role of F_𝒜 when the underlying logic is no longer purely homomorphism logic over graphs but over enriched signatures?

We emphasize a limitation: if positional encodings leak vertex identities (even approximately), then the induced representation can become as hard to characterize as graph isomorphism itself. In that regime, the correct conclusion is not that the framework fails, but that any maximal-family characterization will inherit GI-level barriers.

We conclude with a short list of technically precise problems suggested by the above discussion.

Our position is that the unfolding-based viewpoint remains the correct organizing principle: as architectures evolve, the relevant question is which witness objects they canonically induce, and which pattern counts those witnesses linearly generate. The meta-theorem provides the stable core; the outlook is to enlarge the space of admissible unfoldings without sacrificing maximality and separability.

When Does Homomorphism Expressivity Exist? A Meta-Theorem for Stabilizing Refinement GNNs

Metadata

Table of Contents

Content

1. Introduction: why WL-style qualitative expressiveness is insufficient; recap homomorphism expressivity as a complete invariant; goal of a meta-theorem and ‘expressivity certificate’ conditions.

2. Preliminaries: graphs, rooted graphs, homomorphisms and spasms; refinement procedures; categorical product; overview of Dell–Grohe–Rattan-style linear relations used in completeness proofs.

3. The RWA Architecture Class: formal definition of tuple-state refinement with a neighbor relation template; invariant vs equivariant outputs; examples (MPNN, Subgraph, Local k, folklore).