
Preprocessing Leakage in Graph Learning Pipelines: A (Tpre, rpre) Audit via Locality Collapse


Table of Contents

  1. Introduction: why preprocessing is the modern expressiveness confound (pipelines, tools, retrieval), and why WL-style comparisons fail without budgets
  2. Background: RL-CONGEST separation of preprocessing/message passing; locality (k-hop) vs global properties; examples from biconnectivity + resistance distance
  3. Formal setup: graph instance families, tasks (node/edge/graph), preprocessing functions, and a unified preprocessing budget model
  4. Defining leakage: the post-preprocessing locality radius r_pre and the leakage report tuple (T_pre, r_pre, verifier class); invariance to architecture choice (GNN vs Transformer)
  5. General locality-collapse theorem: any efficiently computable task can be made locally trivial if preprocessing budget matches task complexity; implications for ‘expressiveness’ claims
  6. Conditional lower bounds / non-triviality: when can a polynomial-time preprocessor not collapse locality (NP-hardness and oracle separations); what must be assumed to avoid vacuity
  7. Case studies with tight locality bounds: (a) effective resistance and biconnectivity, (b) shortest-path distances and distance-definable tasks, (c) positional encodings, (d) motif/subgraph counting preprocessors and their complexity
  8. Leakage auditing algorithms: (i) certificate-based analytic auditing (prove r_pre upper bounds), (ii) empirical auditing via ego-graph baselines and feature-only baselines; recommendations for benchmark design
  9. Implementation & experiments (recommended): standardized ablations showing locality collapse; tool-augmented preprocessing scenarios; reporting guidelines
  10. Discussion and open problems: measuring ‘partial leakage’, approximate distances, public randomness, and extending audits to foundation-model pretraining pipelines

Content

1. Introduction: why preprocessing is the modern expressiveness confound (pipelines, tools, retrieval), and why WL-style comparisons fail without budgets

In contemporary graph learning practice we rarely deploy a ``bare'' learning architecture on an input attributed graph G = (V, E, X). Rather, we employ a two-stage pipeline: an explicit preprocessing routine fpre first transforms G into a derived graph G′ = (V′, E′, X′), after which a downstream predictor (a message-passing GNN, a graph Transformer, a kernel method, or a retrieval-based mechanism) operates on G′. This division of labor is often motivated by engineering considerations—data cleaning, feature normalization, positional encodings, subgraph extraction, augmentation, or indexing—but it has a direct mathematical consequence: the information available to the learner is not constrained by the inductive bias of the architecture alone, but by the joint power of (fpre, learner).

We emphasize that fpre is not merely ``feature engineering'' in the weak sense of composing a fixed map ℝd → ℝd′ at each node. Modern preprocessors can alter topology (adding edges to represent proximity, adding supernodes, sparsifying or densifying), can attach global numerical summaries (all-pairs distances, spectral embeddings, effective resistances), and can call external tools (linear solvers, shortest-path routines, motif counters, canonicalization packages) whose computational strength is not reflected in the stated model class. In many benchmark settings these operations are silently permitted, and their cost is not accounted for. Consequently, when we attribute performance differences to ``expressiveness'' of the learner, we may in fact be measuring expressiveness of the preprocessing pipeline.

This confound becomes acute in regimes where preprocessing is tantamount to solving the task. At the extreme, if a preprocessor computes the target labels (or a succinct certificate thereof) and stores them in X′, then the downstream model need only read off this information locally. Such constructions are not pathological; they capture a spectrum of common practices. For instance, a pipeline that augments each node with its shortest-path distances to a designated set of anchors can make long-range reachability predicates locally recognizable. A pipeline that attaches spectral or diffusion-based positional encodings can expose global connectivity patterns through constant-radius neighborhoods in G′. A pipeline that stores subgraph counts or graphlet histograms can reveal global motif structure to a model that otherwise only aggregates within a few hops. From the standpoint of what the learner must compute after preprocessing, these pipelines reduce the ``effective'' difficulty of the task by pushing computation into fpre.

We therefore require a vocabulary that separates architectural limitations from preprocessing power. The natural notion, and the one we adopt, is locality after preprocessing. Informally, a node-level label is k-local on G if it is determined by the induced k-hop neighborhood G[NkG(u)]; analogous definitions apply to edge- and graph-level tasks. Once this is formalized, we can ask a concrete question: what is the minimum radius rpre(P, fpre) at which the composed task P ∘ fpre becomes local on the preprocessed instance family? A small value of rpre indicates that preprocessing has collapsed the task into a local readout problem, regardless of the downstream architecture.

This perspective also clarifies why comparisons against Weisfeiler–Leman (WL) type baselines can fail to diagnose genuine architectural advantages when preprocessing is unconstrained. The standard narrative is that message-passing GNNs are bounded (in various senses) by 1-WL, higher-order GNNs correspond to higher-dimensional WL variants, and graph Transformers may or may not exceed these limitations depending on their positional encodings and attention mechanisms. Yet WL comparisons implicitly assume that the only available input is the raw G with its original features X. If we allow an arbitrary fpre, then the WL hierarchy is no longer a meaningful ceiling: preprocessing can write into X′ any globally computed invariant, including invariants that are strictly stronger than any fixed-dimensional WL refinement on the original graph. In that case, a 1-WL-equivalent architecture run on G′ can appear to ``beat'' 1-WL on G, not because the architecture has transcended WL, but because the task has been rewritten into a locally checkable form.

Even more pointedly, the failure mode is not merely that preprocessing can add global information; it can add exactly the information needed for the prediction objective. In a supervised setting, it is common to tune preprocessing choices on validation performance. Without an explicit accounting of the preprocessing budget and its informational content, one can inadvertently (or deliberately) bake label information into features through leakage channels such as near-duplicate retrieval, graph-level metadata joined from external sources, or computed statistics that are highly correlated with the label and require global computation to obtain. The resulting system may generalize within the benchmark but does not support claims about the intrinsic representational power of the learner.

We thus advocate that any claim of expressiveness or limitation should be stated relative to a declared preprocessing regime. Concretely, we must specify: (a) what computations fpre is allowed to perform, (b) the associated budget Tpre(G) measured in an explicit centralized model (optionally including oracle calls), and (c) the post-preprocessing locality radius rpre(P, fpre) (exact, certified upper bound, or empirical diagnostic). This reporting does not forbid powerful preprocessing; rather, it makes transparent when performance is achieved by global computation performed upstream.

Finally, we note that the above concerns apply equally to GNNs and graph Transformers, and extend beyond neural networks. Retrieval-augmented systems, for example, can be viewed as preprocessors that build indices and attach retrieved exemplars or templates as additional features; this is a form of external computation whose complexity is carried by Tpre. Similarly, tool-using pipelines that call shortest-path solvers, linear system solvers, or motif counters are preprocessors with substantial algorithmic power. Without a budgeted analysis, it is impossible to distinguish whether a downstream learner is exploiting a genuinely nonlocal reasoning mechanism or merely consuming a locally accessible encoding of a globally computed quantity.

The remainder of this work formalizes these intuitions. We treat preprocessing as a first-class computational stage and analyze when it can, and cannot, collapse global graph properties into local decision rules. This leads to a principled separation between ``architectural expressiveness'' and ``preprocessing leakage,'' and provides a language in which fairness of comparisons can be assessed.


2. Background: RL-CONGEST separation of preprocessing/message passing; locality (k-hop) vs global properties; examples from biconnectivity + resistance distance

A convenient abstraction for understanding graph learning pipelines is the analogy with round-based distributed computing. A depth-k message-passing architecture (MPNN) can be viewed as performing k synchronous communication rounds on G: in each round, every node sends a message to its neighbors, aggregates received messages, and updates its internal state. When we bound the per-edge message size to O(log n) bits (or, more generally, to polylog(n)), this abstraction aligns with the CONGEST model; if messages are unbounded, it aligns with the LOCAL model. In both cases, after k rounds, the state at a node u is a function only of the labeled induced neighborhood G[NkG(u)], hence any node-level prediction produced at u is necessarily k-local in the sense that it depends only on NkG(u) and the features therein. This observation, which underlies the standard locality-based limitations of MPNNs, is independent of training and applies equally to deterministic and randomized algorithms (conditioning on the randomness).

The role of preprocessing is naturally interpreted as an additional stage occurring before the round-limited message passing begins. In distributed terminology, this resembles augmenting the instance with advice (node/edge labels) or with a locally checkable proof produced by a powerful prover, after which a limited-round verifier checks the property. In our setting, the preprocessor fpre plays the role of this prover: it may perform substantial centralized computation and then write the outcome into X′, or modify the topology to make relevant information accessible within small radius. Once such a stage is permitted, the effective limitation of round-limited message passing is no longer controlled solely by the learner depth; instead, it is controlled by the locality radius of the task P ∘ fpre. The point of the (Tpre, rpre)-view developed later is that it makes this separation explicit: the downstream architecture may still be k-local in the raw graph G, but the task can become O(1)-local on the derived graph G′ because preprocessing has already performed the global work.

The distinction between local and global properties is standard in both distributed computing and descriptive complexity, and we recall the central mechanism: local indistinguishability. Fix k ≥ 0. Two rooted graphs (G, u) and (H, v) are k-locally indistinguishable if there exists an isomorphism between the rooted induced neighborhoods G[NkG(u)] and H[NkH(v)] that maps u to v and preserves all node and edge features. Any k-local node-level rule must assign the same label to u and v on such instances. Therefore, to show that a task is not k-local on some family, it suffices to construct pairs of instances with identical k-hop views around the relevant node/edge but different ground-truth labels. For edge- and graph-level tasks, one uses analogous indistinguishability statements for neighborhoods of edges or for all nodes simultaneously. In particular, many connectivity-type properties are global: for every fixed k, one can find graphs with large girth or large diameter in which all k-hop neighborhoods look like trees, while the property of interest depends on a cycle or on a distant connection.

Biconnectivity provides a concrete example of this phenomenon. Consider the edge-level property Bridge(G) ∈ {0, 1}E indicating whether an edge is a bridge, and the node-level property Cut(G) ∈ {0, 1}V indicating whether a node is an articulation point. On raw, unannotated graphs, neither property is constant-radius local on unrestricted families. For bridges, fix k and take a cycle CL of length L ≫ k and a path PL of length L. Choose an edge e far from the endpoints of PL. Then the induced k-hop neighborhood around e in both CL and PL is a length-(2k + 1) path with all degrees equal to 2 internally, hence the rooted edge neighborhoods are isomorphic; yet e is a bridge in PL and not a bridge in CL. Thus bridge detection is not k-local for any fixed k on this family. A similar construction witnesses globality for cut vertices: take a cycle CL and a ``lollipop’’ obtained by attaching a long path to the cycle at a node u; for L ≫ k, the k-hop neighborhood around u can be made identical while u is a cut vertex only in the lollipop. These are representative of a general pattern: local neighborhoods cannot certify the presence or absence of a distant alternative route.
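
The construction is easy to check computationally. The sketch below is our own illustration, using networkx; the helper edge_ego and the parameters k = 3, L = 50 are our choices. It builds the two families, verifies that the rooted k-hop views around the chosen edges are isomorphic, and confirms that the bridge labels nevertheless differ.

```python
import networkx as nx
from networkx.algorithms import isomorphism as iso

def edge_ego(G, e, k):
    """Induced subgraph on all nodes within distance k of either endpoint of e,
    with the endpoints tagged so the isomorphism check is a rooted one."""
    u, v = e
    nodes = set(nx.single_source_shortest_path_length(G, u, cutoff=k)) \
          | set(nx.single_source_shortest_path_length(G, v, cutoff=k))
    H = G.subgraph(nodes).copy()
    nx.set_node_attributes(H, {u: True, v: True}, name="root")
    return H

def is_bridge(G, e):
    return tuple(sorted(e)) in {tuple(sorted(b)) for b in nx.bridges(G)}

k, L = 3, 50                                   # L >> k
C, P = nx.cycle_graph(L), nx.path_graph(L)     # cycle: no bridges; path: every edge is a bridge
e_C, e_P = (0, 1), (L // 2, L // 2 + 1)        # an interior path edge, far from the endpoints

same_view = nx.is_isomorphic(
    edge_ego(C, e_C, k), edge_ego(P, e_P, k),
    node_match=iso.categorical_node_match("root", False))
print(same_view)                               # True: the k-hop views are indistinguishable
print(is_bridge(C, e_C), is_bridge(P, e_P))    # False True: the labels differ
```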

The same tasks become locally decidable if preprocessing appends sufficiently informative global features. Effective resistance (resistance distance) is a canonical example because it encodes global connectivity through a numerical invariant computable from the Laplacian pseudoinverse. Let R(u, v) denote the effective resistance between nodes u and v in an unweighted connected graph, interpreted as the voltage drop induced by one unit of current injected at u and extracted at v. Suppose the preprocessor fpre augments G with exact values R(u, v) (e.g., stored as edge features on the complete graph on V, or accessible on demand). Then the bridge predicate becomes 0-local: an existing edge (u, v) ∈ E is a bridge if and only if R(u, v) = 1. Consequently, after preprocessing, a local rule can label each edge by reading a single scalar. Likewise, cut-vertex detection becomes 1-local: a node u is a cut vertex if and only if there exist neighbors s, t ∈ N(u) such that the resistance metric satisfies the additive condition
R(s, t) = R(s, u) + R(u, t),
which witnesses that all current (and hence all paths in an appropriate metric sense) between s and t must pass through u. This condition can be checked by a 1-hop local verifier that inspects u, its neighbors, and the pairwise resistances among {u} ∪ N(u). In both cases, we obtain a sharp contrast: the raw tasks are global on natural instance families, but once a heavy global precomputation is allowed, the same tasks collapse to constant-radius locality.

This example also clarifies why budgets must accompany locality claims. Exact all-pairs resistance distances typically require solving linear systems or computing a Laplacian pseudoinverse, with worst-case cost far beyond what is implicit in a ``k-layer message passing’’ description, and storing R explicitly requires Θ(n2) bandwidth in X. From the standpoint of attribution, a downstream predictor that succeeds using such features is exploiting the global computation performed by fpre, not circumventing locality limitations by architectural means. The formal setup in the next section turns these observations into a measurable quantity rpre(P, fpre) paired with an explicit preprocessing budget Tpre(G), allowing us to state and compare expressiveness claims relative to declared preprocessing power.


3. Formal setup: graph instance families, tasks (node/edge/graph), preprocessing functions, and a unified preprocessing budget model

We fix a universe of attributed graphs and make explicit which parts of a learning pipeline are treated as input, which are treated as preprocessing, and which are treated as downstream prediction. An (attributed) graph is a triple G = (V, E, X) with |V| = n and |E| = m, where X denotes the collection of node/edge features (and, when convenient, a graph-level feature vector). We allow directed or undirected edges and allow multigraphs, but our examples and locality statements are most naturally read for simple undirected graphs. An instance family 𝒢 is any set of graphs satisfying the restrictions we wish to impose (e.g., connectivity, bounded degree, planarity, or a fixed feature schema). All locality and budget statements are parametrized by 𝒢; without such a family, even the meaning of ``bounded radius'' claims is typically vacuous.

A graph prediction task (or property) is a mapping P : 𝒢 → 𝒴, with three canonical output types:
node-level: 𝒴 = {0, 1}V,   edge-level: 𝒴 = {0, 1}E,   graph-level: 𝒴 ∈ {0, 1} or ℝ.
We write P(G)u for the node label of u, P(G)(u, v) for the edge label, and P(G) for graph-level outputs. Nothing in the setup requires Boolean labels; multiclass labels or real-valued regression targets can be handled by replacing {0, 1} with an appropriate codomain. We emphasize that P is a task specification; it need not be efficiently computable, although the complexity TP(G) in a centralized model will play a role in our leakage upper bounds.

A preprocessing procedure is a function
fpre : 𝒢 → 𝒢,   G ↦ G′ = (V′, E′, X′).
We allow fpre to change topology (add/delete edges), to add auxiliary nodes, and to replace or augment features. In particular, fpre may encode global information in X′ (e.g., distance matrices, spectral quantities, motif counts) or may create shortcut edges to reduce distances. To compare outputs to the original task P(G), we assume that fpre provides an explicit identification of the original prediction items inside the derived graph, e.g., an injective map ιV : V ↪ V′ for node tasks and an injective map ιE : E ↪ E′ for edge tasks, or, equivalently, a binary ``is-original'' tag plus a stable ordering. This is a mild bookkeeping requirement: without it, one could trivially ``solve'' a task by permuting items and changing the meaning of coordinates.

For a graph G = (V, E, X), we write NkG(u) for the k-hop neighborhood of a node u ∈ V, and for an edge (u, v) ∈ E we use NkG((u, v)) := NkG(u) ∪ NkG(v). The corresponding labeled induced subgraph is denoted G[NkG(u)] (or G[NkG((u, v))]), including all node/edge features restricted to the induced subgraph. In all definitions below, isomorphisms are required to preserve features exactly; thus locality is always a statement about the feature schema X.

We formalize a local post-preprocessing rule as a family of functions that read only bounded-radius induced neighborhoods of the derived graph. For node-level prediction on G′, a rule gk assigns to each u ∈ V an output
gk(G′, u) = ψ (G′[NkG′(ιV(u))], ιV(u))
for some function ψ invariant under rooted isomorphism of the labeled neighborhood. For edge-level tasks we analogously require
gk(G′, (u, v)) = ψ (G′[NkG′(ιE(u, v))], ιE(u, v)).
For graph-level tasks, we adopt a uniform convention that the preprocessor may introduce (or designate) a readout node ρ ∈ V′ and that the output is computed as gk(G′) = ψ(G′[NkG′(ρ)], ρ). This convention is without loss for our purposes: any constant-size global readout (a special token, a pooled feature vector, or an appended ``graph node'') is representable by preprocessing, and our objective is precisely to attribute such representational power to fpre.

We say that P ∘ fpre is k-local on 𝒢 if there exists a k-local rule gk such that for all G ∈ 𝒢, the outputs agree on the original prediction items:
P(G)u = gk(fpre(G), u)  ∀u ∈ V,   P(G)(u, v) = gk(fpre(G), (u, v))  ∀(u, v) ∈ E,
and similarly for graph-level tasks. We postpone to the next section the definition of the minimal such k, which will be our leakage radius rpre(P, fpre).

To compare preprocessors that perform qualitatively different operations (distance computations, spectral routines, oracle calls, external solvers), we associate to each fpre a budget Tpre(G). Our default choice is worst-case running time in the word-RAM model with O(log n)-bit words, with the understanding that feature values written to X must be representable at the declared precision. When preprocessing uses additional resources, we absorb them additively into Tpre, e.g.,
Tpre(G) = tRAM(G) + ∑iαiqi(G) + β ⋅ size(X(G)),
where qi(G) counts the number of queries to an oracle/subroutine i (such as a shortest-path oracle, a linear-system solver, or a #SubGI oracle), and size(X′(G)) measures the number of machine words written into node/edge features (and, if relevant, the number of auxiliary nodes/edges created). The coefficients αi, β are bookkeeping weights that can be set to 1 when all costs are measured in a common unit; we keep them explicit only to emphasize that ``time'' and ``feature bandwidth'' are both part of preprocessing power. In many of our statements we suppress polylogarithmic factors and write Õ(⋅).
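
A minimal bookkeeping sketch of this budget (our own illustration, with all weights αi, β set to 1 and wall-clock time standing in for RAM time; the class and method names below are not prescribed by the text):

```python
import time
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class PreprocessingBudget:
    """Tracks T_pre(G) = t_RAM(G) + sum_i q_i(G) + size(X'(G)), with unit weights."""
    t_ram_seconds: float = 0.0                                 # proxy for t_RAM(G)
    oracle_queries: Counter = field(default_factory=Counter)   # q_i(G), keyed by oracle name
    words_written: int = 0                                      # size(X'(G)) in machine words

    def timed(self, fn, *args, **kwargs):
        """Run a preprocessing step and charge its wall-clock time."""
        start = time.perf_counter()
        out = fn(*args, **kwargs)
        self.t_ram_seconds += time.perf_counter() - start
        return out

    def oracle(self, name, fn, *args, **kwargs):
        """Run an external solver/oracle call, counting it under `name`."""
        self.oracle_queries[name] += 1
        return self.timed(fn, *args, **kwargs)

    def write_features(self, num_words):
        """Charge the bandwidth of features written into the derived graph."""
        self.words_written += num_words

    def report(self):
        return {"t_ram_s": self.t_ram_seconds,
                "oracle_queries": dict(self.oracle_queries),
                "words_written": self.words_written}
```

A preprocessor that writes, say, a full distance vector at every node would then be charged n words per node via write_features, making the Θ(n2) bandwidth term visible in the report.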

Independently of learning, we fix a centralized complexity measure TP(G), defined as the worst-case running time of an algorithm computing P(G) in the same computational model. The pair (Tpre, TP) will serve as the comparison that makes leakage statements meaningful: if Tpre is allowed to match TP up to Õ(1) factors, then fpre can in principle compute the labels and encode them locally; conversely, if Tpre is restricted, then nontrivial lower bounds on post-preprocessing locality become possible.

Finally, to distinguish ``local but computationally unbounded'' readout rules from local rules that plausibly correspond to small neural modules, we parameterize local rules by a verifier class VClass (e.g., TIME(poly), TC0, or a fixed-depth MLP applied to an adjacency-encoded ego graph). This parameter will enter our reporting protocol but does not affect the definition of k-locality itself: locality constrains which information is accessible (bounded-radius induced neighborhoods), while VClass constrains how that information may be processed.

With these conventions in place, we can define leakage as the minimal post-preprocessing radius needed to decide P ∘ fpre, and we can report it together with the preprocessing budget and the verifier class, independently of the downstream architecture.


4. Defining leakage: the post-preprocessing locality radius r_pre and the leakage report tuple (T_pre, r_pre, verifier class); invariance to architecture choice (GNN vs Transformer)

Fix an instance family 𝒢, a task P : 𝒢 → 𝒴, and a preprocessing function fpre : 𝒢 → 𝒢.
The preceding definitions single out a notion of locality after preprocessing (bounded-radius induced neighborhoods in G′) and a notion of correctness (agreement with P(G) on the original prediction items via the identification maps).
We now compress these into a single numerical quantity measuring how much of the task can be decided after preprocessing.

For each k ≥ 0, recall that ``P ∘ fpre is k-local on 𝒢’’ means that there exists a k-local decision rule gk whose outputs coincide with P(G) on all original nodes/edges (or on the designated graph readout) for every G ∈ 𝒢.
We define the post-preprocessing locality radius as
rpre(P, fpre; 𝒢) := inf { k ∈ ℕ : P ∘ fpre is k-local on 𝒢 },
with the convention rpre(P, fpre; 𝒢) = ∞ if no finite k suffices.
When 𝒢 is fixed throughout, we suppress it and write rpre(P, fpre).

Two features of this definition are deliberate.
First, rpre is a worst-case radius over 𝒢: it asserts that a single locality bound works uniformly for all instances in the family, rather than only on average.
Second, rpre is measured after preprocessing, i.e., neighborhoods are taken in G′ = fpre(G) and include the augmented feature schema X′.
Thus rpre attributes any locality reduction caused by shortcut edges, auxiliary nodes, or globally-computed features to the preprocessor, not to the downstream model.

The definition quantifies over the existence of some local rule gk (with the required rooted-isomorphism invariance on labeled ego graphs).
We do not assume that a learner will find gk, nor that gk is statistically estimable from limited data.
This is intentional: leakage is meant to be an existence statement.
If rpre(P, fpre) is small, then there exists a bounded-radius procedure that recovers the labels from G′; the downstream architecture, optimization, and sample size only govern whether this procedure is realized in practice.

Locality alone does not specify how costly it is to process the induced neighborhood.
A rule that reads G[Nk(⋅)] might still solve an NP-hard subproblem within that neighborhood, or might require arithmetic with extremely high-precision real features written by preprocessing.
Accordingly, we parameterize our leakage statements by a verifier class VClass, a computational class in which gk is required to lie (e.g., TIME(poly), TC0, or a fixed-size neural module applied to an adjacency-encoded ego graph).
Formally, given VClass we may define the restricted radius
rpreVClass(P, fpre) := inf { k : ∃ gk ∈ VClass such that gk witnesses k-locality of P ∘ fpre },
again taking the value ∞ if no such witness exists.
In the absence of an explicit VClass, our default is the existence-based rpre above.

We package the preprocessing cost, the induced locality, and the computational nature of the local rule into a single report:
Leak(P, fpre) = (Tpre, rpre, VClass, notes).
Here Tpre(G) is the declared preprocessing budget (time/oracle/bandwidth, as specified in the setup), while rpre is either the exact radius or a certified upper bound rpre ≤ k obtained by exhibiting an explicit k-local rule.
The field notes is reserved for the bookkeeping that determines whether the report is meaningful and comparable across works: which instance family 𝒢 is in force, how auxiliary nodes/edges are tagged, whether feature values are exact or approximate, and whether the stated radius is proved or only observed empirically (e.g., via ego-graph baselines constrained to radius k).
This tuple is the object we will later compare across preprocessors; in particular, two pipelines with identical downstream architectures but different fpre should not be viewed as comparable unless both are accompanied by such a report.

A central point is that Leak(P, fpre) is defined independently of whether the subsequent learner is a message-passing GNN, a graph Transformer with global attention, or any other architecture.
The reason is structural: our leakage question is not ``can a given model class compute P from G′?'' but rather ``is P already locally decidable on the representation G′?''
If the answer is yes with small radius, then any sufficiently expressive architecture (including ones with global access) has, in principle, an easy path: it may simply implement (or approximate) the local rule gk and ignore long-range computation.
Conversely, if the answer is no (a large rpre, or a large restricted radius within a plausible VClass), then even a globally attentive model is not being helped by hidden label information in X′ at small radii, and any success must come from genuine global inference or from nonlocal access patterns not captured by our locality model.

We emphasize that this invariance is not a claim that all architectures behave the same; rather, it is a claim about attribution.
Whatever advantage a graph Transformer gains from receiving G′ rather than G is, by definition, part of preprocessing power whenever it is witnessed by a small rpre.
In this sense, rpre is an architecture-agnostic measure of ``how much of the task has been compiled into the input.’’

The quantity rpre should be read as a locality-collapsing effect of preprocessing: rpre = 0 means the labels are readable from single nodes/edges (or from a constant-size readout token), while rpre = 1 means a node’s label can be verified from its immediate neighborhood in G, and so on.
When rpre is small and Tpre is large, it becomes difficult to attribute downstream performance to ``expressiveness’’ of the learner; the task may have been made trivial by preprocessing.
The next section formalizes this intuition by showing that, when Tpre is allowed to match the centralized complexity TP, locality collapse is unavoidable in the strongest possible sense.


5. General locality-collapse theorem: any efficiently computable task can be made locally trivial if preprocessing budget matches task complexity; implications for ‘expressiveness’ claims

We now formalize the strongest possible version of the leakage phenomenon: once preprocessing is permitted to spend essentially the same resources as a centralized solver for the task, it can compile the answers (or checkable certificates) into the representation, so that the remaining decision is local in the sense of rpre.
The conclusion is not that typical feature-engineering pipelines behave this way, but rather that expressiveness claims are uninterpretable without an explicit preprocessing budget, because otherwise the preprocessor can always trivialize the remaining computation.

Theorem~1 makes precise the following implication: if the preprocessing stage is permitted to spend a budget comparable to that of solving the task centrally, then the post-preprocessing locality radius can always be forced down to a constant.
In that regime, any subsequent ``learning architecture’’ receives an input on which the task is already locally decidable.
Thus, downstream success cannot be attributed to long-range message passing, global attention, or higher-order aggregation—the nontrivial part of the computation has already occurred inside fpre.

Formally, consider any model class 𝒜 operating on G after preprocessing.
If we allow Tpre(G) ≥ Ω̃(TP(G)), then there exists a pipeline in which 𝒜 is superfluous: one may replace it by the constant-radius rule gk from the proof (with k ∈ {0, 1} depending on task type) while preserving exact correctness on all instances.
This is precisely the sense in which expressiveness statements become vacuous without an explicit preprocessing budget: any architectural advantage can be simulated by preprocessing that simply writes the answers into the representation.
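
The mechanism is elementary enough to spell out. The sketch below is our own illustration (not the proof of the theorem), using networkx and bridge detection as a stand-in task: fpre solves the task centrally and writes the answers as edge features, after which a 0-local readout reproduces the labels exactly.

```python
import networkx as nx

def f_pre_bridges(G):
    """Preprocessing with budget comparable to T_P: solve the task centrally
    and store the answer as an edge feature of the derived graph."""
    Gp = G.copy()
    bridges = {tuple(sorted(e)) for e in nx.bridges(G)}        # linear-time centralized solver
    nx.set_edge_attributes(
        Gp, {e: tuple(sorted(e)) in bridges for e in Gp.edges}, "is_bridge")
    return Gp

def g0(Gp, e):
    """0-local verifier: reads a single edge feature and ignores all topology."""
    return Gp.edges[e]["is_bridge"]

G = nx.barbell_graph(5, 3)      # two cliques joined by a path; the path edges are bridges
Gp = f_pre_bridges(G)
truth = {tuple(sorted(b)) for b in nx.bridges(G)}
assert all(g0(Gp, e) == (tuple(sorted(e)) in truth) for e in G.edges)
```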

Two clarifications are worth recording.
First, the collapse does not require large neighborhoods or subtle coding; the features required are only the label bits (or a bounded-precision real value for regression), placed on the relevant prediction items.
Second, the collapse is compatible with augmentations that substantially change topology: the definition of rpre measures locality in the derived graph G′, so adding shortcut edges, hub nodes, or globally-computed features is exactly the mechanism by which leakage is expressed and must be accounted for.

In summary, Theorem~1 provides an unconditional upper bound on rpre as a function of preprocessing resources: once Tpre is allowed to approach TP, locality can be made constant, and comparisons of downstream architectures become uninterpretable unless one simultaneously reports (and restricts) Tpre and the permitted feature/topology augmentations.


6. Conditional lower bounds / non-triviality: when can a polynomial-time preprocessor not collapse locality (NP-hardness and oracle separations); what must be assumed to avoid vacuity

Theorem~1 is an unconditional upper-bound statement: with a budget comparable to a centralized solver, preprocessing can always force rpre down to a constant by compiling answers (or checkable certificates) into the representation.
To obtain a framework in which rpre is not automatically vacuous, we must therefore work in regimes where such compilation would contradict standard complexity assumptions, or where compilation would require oracle power that we explicitly forbid or budget for.
We record two complementary barriers: NP-hardness (as a coarse-grained obstruction) and #P/PH oracle separations (as an obstruction to ``feature oracles’’ such as unrestricted motif counting).

Suppose P is NP-hard under polynomial-time many-one reductions.
If there existed a polynomial-time preprocessor fpre such that P ∘ fpre were 0-local with a polynomial-time verifier g, then the composite procedure
G ↦ fpre(G) ↦ g(fpre(G))
would decide P in polynomial time.
This yields the conditional barrier formalized by Theorem~3 in our theorem list: under the assumption P ≠ NP, no polynomial-time preprocessing scheme can reduce an NP-hard decision task to a trivial 0-local readout with an efficient verifier.
In other words, for NP-hard tasks, locality collapse is possible only by paying one of the following prices: (i) super-polynomial Tpre, (ii) a verifier g whose computational power is itself beyond polynomial time, or (iii) a relaxation of the task notion (approximation, promises, or distributional/average-case correctness).

Two remarks are necessary for correct interpretation.
First, this is a worst-case obstruction only in the weak sense that it rules out a single polynomial-time fpre that works on all inputs G ∈ 𝒢; it does not forbid heuristics that collapse locality on restricted families or typical instances.
Second, the statement is about exact worst-case solving: in learning settings, it remains possible (and common) that preprocessing makes labels predictable from small-radius neighborhoods without making the task algorithmically easy in the worst case.
This is precisely why our reporting protocol allows an empirical estimate alongside certified bounds.

The preceding NP-hardness barrier is easily circumvented if we silently grant preprocessing access to computationally powerful oracles.
Many ``feature engineering’’ pipelines do precisely this in practice (e.g., calls to ILP/SAT solvers, exact substructure enumeration, or external chemistry toolkits), even when the downstream model is presented as a purely local learner.
Accordingly, our budget Tpre must be understood as oracle-inclusive: it must include not only RAM time, but also the number and type of external oracle calls.

A particularly clean separation arises for subgraph- and motif-counting features.
If fpre is permitted to output exact values of #SubGI(H ↪ G) for an unrestricted pattern family ℋ, then (by Theorem~4 in our theorem list) preprocessing is #P-hard in general; moreover, even a single adaptive #P query together with polynomial-time computation suffices to decide all languages in PH (via standard Toda-style inclusions).
Thus, in the presence of such an oracle, locality claims become uninformative: the preprocessor can encode extremely hard global information into node/edge features, after which a constant-radius downstream rule can appear to have ``global expressiveness’’ while in fact consuming #P-level power upstream.
The appropriate resolution is not to ban all global features, but to declare and budget them: if exact motif or subgraph counts are used, the leakage report must reflect oracle complexity (or restrict to fixed-size patterns where exact counting is tractable on the relevant graph class).

Locality is an informational restriction (bounded radius in G′), not a computational one.
Even with small rpre, one could in principle place all computational hardness into the local decision rule gk by allowing gk to be arbitrarily expensive.
This is why our leakage tuple includes an explicit verifier class VClass.
At the extreme, if VClass = TIME(exp), then any property becomes 0-local by simply copying the input G into a large feature string at a distinguished node and letting g0 solve the problem there.
To avoid this degeneracy, one must either (i) constrain VClass to a neural-plausible or circuit-like class (e.g. small MLP, TC0, low-depth circuits), or (ii) explicitly include the evaluation time Tgk in the accounting.
Theorem~5 captures the relevant simulation principle: an exact k-local certificate P(G) = gk(fpre(G)) implies a centralized algorithm with runtime Õ(Tpre(G) + Tgk(G)).
Consequently, any (assumed) lower bound on TP transfers to a lower bound on Tpre for achieving small rpre with efficient verifiers.

The above discussion yields a minimal set of methodological commitments.
To interpret downstream performance as evidence of model expressiveness rather than preprocessing leakage, we must (a) specify Tpre including external calls and oracle access; (b) bound the output bandwidth of preprocessing (feature dimension, bit precision, and any O(n2) objects such as SPD/RD tables) since bandwidth is itself a channel for leakage; and (c) declare a verifier class VClass so that ``local but computationally heavy'' readouts are not conflated with ``local and easy'' ones.
Under these declarations, the non-triviality claims become conditional but operational: if P is NP-hard, exact locality collapse with polynomial budgets would imply P = NP; if preprocessing uses #P-level primitives, then any resulting locality collapse must be attributed to that oracle power.
These principles guide the case studies that follow, where we certify tight rpre bounds for concrete, commonly used preprocessors and explicitly relate those bounds to their computational and informational costs.


7. Case studies with tight locality bounds: (a) effective resistance and biconnectivity, (b) shortest-path distances and distance-definable tasks, (c) positional encodings, (d) motif/subgraph counting preprocessors and their complexity

We now instantiate the (Tpre, rpre)-lens on several preprocessors that are routinely used (explicitly or implicitly) to ``globalize'' local architectures. Our goal in each case study is to (i) identify a natural task family P whose apparent non-locality can be collapsed by the preprocessor, (ii) give an explicit constant-radius decision rule on G′ = fpre(G) (hence a certified upper bound on rpre), and (iii) make the required preprocessing budget and bandwidth explicit.

Let fpre augment G with exact effective resistance values R(u, v) between relevant pairs of vertices. In the strongest form, fpre stores all-pairs resistances, i.e. a matrix R ∈ ℝn × n whose entries are written into node features (e.g. node u carries the vector R(u, ⋅)). This representation is expensive in both computation and bandwidth: exact all-pairs resistances require access to the Laplacian pseudoinverse and, in general, incur Ω(n2) output size and at least Ω(n2) work simply to write the features. Nevertheless, once such features exist, several classical global connectivity predicates become constant-radius.

Concretely, by Theorem~2, bridge detection becomes 0-local: for an edge (u, v) ∈ E, one checks whether R(u, v) = 1. No neighborhood exploration is necessary; the label depends only on the edge feature R(u, v) (or equivalently on the endpoints’ feature vectors restricted to the coordinate v and u). Cut-vertex detection becomes 1-local: a node u is a cut vertex if and only if there exist neighbors s, t ∈ N(u) such that
R(s, t) = R(s, u) + R(u, t),
which can be verified by inspecting the induced subgraph on N1G(u). The tightness aspect is equally important: on plain graphs without such resistance features, both bridge and cut-vertex detection are global in the locality sense (one can witness non-locality on path-versus-cycle families where arbitrarily large neighborhoods are indistinguishable around a given edge/node while the global label flips). Thus the preprocessor has collapsed rpre from Ω(D) to 0/1, and the only plausible attribution for any downstream ``global’’ success is the resistance augmentation itself.

A second canonical preprocessor computes shortest-path distances. Let D denote the all-pairs shortest-path matrix (SPD). If fpre writes to each node u the vector D(u, ⋅) ∈ ℕn, then any node-level statistic that is a function of distances from u becomes 0-local. For example, eccentricity, closeness centrality, and t-reachability labels admit 0-local readout:
ecc(u) = maxv ∈ VD(u, v),   close(u) = (∑v ∈ VD(u, v))−1,   1{∃vD(u, v) ≤ t ∧ ϕ(v)}.
Similarly, graph-level quantities such as diameter diam(G) = maxu, vD(u, v) become 0-local once we permit a designated readout location (e.g. a ``virtual'' graph node whose features contain the entire matrix, or a convention that graph-level labels are stored in a global feature slot). Absent such conventions, one can still obtain 1-locality by adding a virtual node z adjacent to all vertices and storing global summaries at z.
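
A sketch of this preprocessor and its 0-local readouts (our own illustration; the attribute name dist and the example graph are our choices):

```python
import networkx as nx

def f_pre_spd(G):
    """Preprocessor: store the full distance vector D(u, .) at each node (Theta(n^2) words)."""
    Gp = G.copy()
    for u in Gp.nodes:
        Gp.nodes[u]["dist"] = dict(nx.single_source_shortest_path_length(G, u))
    return Gp

# 0-local readouts: each label is a function of the feature vector stored at u alone.
def ecc_0local(Gp, u):
    return max(Gp.nodes[u]["dist"].values())

def close_0local(Gp, u):
    return 1.0 / sum(Gp.nodes[u]["dist"].values())

def reach_0local(Gp, u, t, phi):
    return any(d <= t and phi(v) for v, d in Gp.nodes[u]["dist"].items())

G = nx.cycle_graph(9)
Gp = f_pre_spd(G)
assert all(ecc_0local(Gp, u) == nx.eccentricity(G, u) for u in G.nodes)
```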

The cost story is immediate: computing exact SPD is typically O(nm) in unweighted graphs by running BFS from each node, and the output bandwidth is Θ(n2) even before precision/encoding considerations. Moreover, rpre depends subtly on the verifier class. If the preprocessor stores ecc(u) directly as a scalar, the 0-local verifier can lie in a very weak class (constant-depth arithmetic on O(log n)-bit words). If instead it stores the entire vector D(u, ⋅), then the 0-local verifier must compute a maximum over n values, which is still polynomial-time but not ``neural-plausible'' in the strongest circuit-theoretic senses unless we explicitly allow such aggregation. This illustrates why rpre must be reported together with VClass: the same distance features can be either a trivial leakage channel (when they already contain the answer) or merely an intermediate representation (when substantial computation remains in the verifier).

Positional encodings (PEs) are often presented as ``features,’’ but many of them behave, for locality, like topology modifications. For instance, Laplacian eigenvector encodings and related spectral coordinates require global computation; random-walk structural encodings require solving linear systems or simulating walks at scale. Once written into X, however, they can reduce the effective radius needed to separate nodes or to verify global certificates (e.g. by giving each node a nearly unique coordinate system).

More importantly, several PE pipelines add edges determined by global information (e.g. k-nearest-neighbor edges in the PE space, diffusion edges, or a global ``CLS''/virtual node). In our framework these are bona fide topology changes from G to G′ = (V′, E′, X′). Adding a universal virtual node z connected to all u ∈ V immediately makes many graph-level predicates 1-local at z, because G′[N1G′(z)] contains all vertices (and often their full feature payload). Similarly, adding long-range PE-induced edges can reduce the diameter of G′ and hence lower rpre even if no explicit labels are compiled. Thus PE design must be audited not only for runtime but also for induced locality: an architecture may appear to have ``long-range reasoning'' while in fact operating locally on a graph whose long-range edges were inserted upstream.
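
The induced-locality effect of a virtual node is easy to see concretely (our own illustration; the node label __virtual__ is arbitrary):

```python
import networkx as nx

def add_virtual_node(G):
    """Topology-level preprocessing: attach a universal readout node z to every vertex."""
    Gp = G.copy()
    z = "__virtual__"
    Gp.add_node(z, is_readout=True)
    Gp.add_edges_from((z, u) for u in G.nodes)
    return Gp, z

G = nx.path_graph(100)                        # diameter 99 before preprocessing
Gp, z = add_virtual_node(G)
ego = nx.ego_graph(Gp, z, radius=1)           # the 1-hop view at z already contains every vertex
print(ego.number_of_nodes(), nx.diameter(Gp))   # 101 2
```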

Finally, motif-counting features provide a sharp illustration of oracle-priced leakage. If fpre outputs exact counts #SubGI(H ↪ G) for an unrestricted pattern family ℋ, then the preprocessor can encode extremely hard global information, and by Theorem~4 this capability is #P-hard in general (with standard consequences for PH under oracle access). From a locality perspective, once such counts are placed into X′, the downstream task is frequently 0-local: a node-level label ``does u participate in any copy of H?'' becomes a threshold test on a single feature coordinate, and graph-level existence/counting predicates become constant-radius tests at an agreed readout location. In contrast, when ℋ is restricted to fixed-size motifs (e.g. triangles, 4-cycles), exact counting may be tractable on specific graph classes, and the correct report is then a quantitative one: locality has been reduced, but only by paying a preprocessing cost that must be disclosed (and which may scale as n|V(H)| in the worst case if implemented naïvely).

Across effective resistance, SPD features, positional encodings, and motif counts, the same structural phenomenon recurs: once global invariants are written into X (or once E is augmented to shrink graph distances), the post-preprocessing radius rpre for many ``global’’ tasks becomes constant, often 0 or 1. The only scientifically meaningful question is therefore not whether a downstream learner can exploit these representations, but how much computational and informational power was consumed upstream to create them. This motivates the next section, where we present concrete leakage-auditing procedures that either certify small rpre analytically or diagnose practical locality collapse empirically via strictly local ego-graph baselines.


8. Leakage auditing algorithms: (i) certificate-based analytic auditing (prove r_pre upper bounds), (ii) empirical auditing via ego-graph baselines and feature-only baselines; recommendations for benchmark design

We now make the (Tpre, rpre)-report operational by giving auditing procedures that (a) certify small post-preprocessing locality whenever an explicit local rule exists, and (b) diagnose practical locality collapse when formal certificates are unavailable or inconvenient. Both procedures treat preprocessing as part of the computational budget and therefore allow us to attribute performance either to upstream computation or to downstream inference.

The analytic audit is a proof obligation: we exhibit an explicit k-local decision rule gk such that
P(G) = gk (fpre(G))   for all G ∈ 𝒢,
where gk is constrained to read only G′[NkG′(u)] for node/edge labels (or an agreed readout location for graph labels) and lies in a stated verifier class VClass. Once such a rule is written down, we have a certified bound rpre(P, fpre) ≤ k by definition. Crucially, we insist that the certificate uses information present in G′; the audit is not a proof about the original G, but about what the preprocessor has already placed into the derived topology/features.

In practice, useful certificate patterns range from direct label encodings, through locally checkable equalities over globally computed features (as in the resistance-based rules above), to certificates that rely on auxiliary topology such as designated readout nodes.

For reporting, we recommend that a paper claiming improved ``long-range'' reasoning from preprocessed features include (a) the smallest k for which a certificate is known, (b) a short description of gk, and (c) an upper bound on the total preprocessing work Tpre(G), including any calls to solvers/oracles and the cost of writing Θ(n2)-size outputs when applicable. When such a certificate exists, Theorem~5 further implies an attribution statement: if rpre is small with an efficient gk, then the pipeline implicitly contains an efficient centralized algorithm for P, and any purported ``expressiveness'' must be charged against preprocessing.

Analytic certificates are not always available for modern feature pipelines (e.g. learned positional encodings, tool-augmented preprocessors, or proprietary feature extractors). In such cases we propose an empirical audit that tests for locality collapse by training baselines that are restricted to k-hop ego-graphs of G′. The guiding principle is: if a classifier that cannot see beyond NkG′(u) achieves near-saturated performance, then the task has effectively become k-local after preprocessing, regardless of whether we can prove it.

Concretely, for node-level tasks we consider, for each k ∈ {0, 1, …, Kmax}, a predictor hk that takes as input only the induced subgraph G′[NkG′(u)] with its features and outputs a label for u. For edge-level tasks we analogously use G′[NkG′((u, v))]. The case k = 0 is the feature-only baseline: no adjacency information is visible except what is encoded in node/edge attributes. We then define an estimated collapse radius
r̂pre := min {k ≤ Kmax : Acc(hk) ≥ τ},
for a task-dependent threshold τ (e.g. within ε of the best reported model, or within a confidence band around the best attainable performance on the benchmark).

To ensure the baseline is genuinely local, we enforce two invariants. First, the data pipeline must materialize ego-graphs explicitly and forbid global operations (no batch normalization whose statistics mix nodes from different ego-graphs; no attention across a batch; no global positional indices that reintroduce node identities). Second, when the preprocessor produces graph-level artifacts (e.g. an n × n distance matrix distributed across nodes), we must specify what is actually visible to hk: if node u contains D(u, ⋅), then k = 0 already includes n numbers, and the baseline is allowed to use them, but this must be recorded as bandwidth-heavy leakage rather than interpreted as ``local generalization.’’

A minimal audit can be summarized as follows.
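
One possible instantiation of such an audit loop is sketched below (our own illustration of the ego-graph procedure described above; fit and predict stand for any learner and inference routine restricted to the materialized ego-graph input, and are deliberately left unspecified):

```python
import networkx as nx

def ego_dataset(Gp, items, labels, k):
    """Materialize (ego-graph, root, label) examples; locality is enforced by construction."""
    return [(nx.ego_graph(Gp, u, radius=k), u, labels[u]) for u in items]

def audit_collapse_radius(Gp, labels, train_nodes, test_nodes, fit, predict,
                          k_max=3, tau=0.95):
    """Estimate the collapse radius: the smallest k <= k_max with Acc(h_k) >= tau.
    Returns the whole curve k -> accuracy alongside the estimate."""
    curve = {}
    for k in range(k_max + 1):
        model = fit(ego_dataset(Gp, train_nodes, labels, k))   # h_k sees only G'[N_k(u)]
        test = ego_dataset(Gp, test_nodes, labels, k)
        curve[k] = sum(predict(model, ego, u) == y for ego, u, y in test) / len(test)
    r_hat = min((k for k, acc in curve.items() if acc >= tau), default=None)
    return r_hat, curve
```
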
Concrete recommendations that make leakage quantifiable and comparisons meaningful are collected in the reporting guidelines of the next section.

These audits supply the bridge to our implementation guidelines: once (Tpre, rpre, VClass) are reported with the above ablations, experiments can explicitly demonstrate (or refute) locality collapse and thereby prevent conflating preprocessing power with downstream architectural expressiveness.


9. Implementation & experiments (recommended): standardized ablations showing locality collapse; tool-augmented preprocessing scenarios; reporting guidelines

We now describe experimental practices that make the leakage report Leak(P, fpre) = (Tpre, rpre, VClass, notes) empirically meaningful and comparable across pipelines. The goal is not to propose new benchmarks, but to standardize ablations that (i) expose locality collapse when it occurs (as predicted by Theorem~1), (ii) quantify the extent to which performance is attributable to preprocessing rather than downstream inference, and (iii) make tool-augmented or solver-augmented preprocessors auditable.

For each reported model trained on a fixed preprocessed instance family fpre(𝒢), we recommend reporting a small set of baselines that isolate what information is already present at bounded radius in G. At a minimum, for node- and edge-level tasks, one should include:
(i) a k = 0 baseline that uses only node/edge attributes and no adjacency beyond what is encoded as features;
(ii) a small set of ego-graph baselines for k ∈ {1, 2, 3} (or up to a task-dependent cap);
and (iii) a full-graph baseline in the same VClass as a sanity check that the learner is not underpowered.
We emphasize that the ego-graph baselines should be implemented by explicit subgraph extraction G[Nk(⋅)] followed by a predictor that receives only this induced subgraph as input, so that locality is enforced by construction rather than by an architectural heuristic.

In addition to reporting the resulting accuracies, it is informative to report the full collapse curve k ↦ Acc(hk) and the corresponding r̂pre for a fixed threshold τ. We recommend choosing τ relative to the best reported pipeline for the benchmark, e.g. τ = Accbest − ε with a small ε (or an uncertainty-corrected band), and reporting sensitivity over ε. In our experience, the curve is often more diagnostic than a single number: a sharp jump from k = 0 to k = 1 suggests that preprocessing has produced a locally checkable certificate, whereas a smooth increase may indicate partial leakage through approximate global summaries.

To detect inadvertent label encoding through precomputed features, we recommend a set of controlled corruptions that preserve marginal distributions while destroying alignment with graph structure. Concretely, if X decomposes into blocks X = (Xraw, Xpre), where Xpre is produced by fpre, then one can (i) permute Xpre(u) across nodes u, (ii) permute only within degree classes or within connected components (to preserve coarse statistics), or (iii) resample Xpre from other graphs in the dataset with matched size. A large performance drop under such perturbations indicates that Xpre is carrying instance-specific information that is not recoverable from local topology alone. These tests are particularly important when fpre outputs high-bandwidth objects (e.g. per-node distance vectors, spectral coordinates, or motif histograms), where a seemingly ``local’’ readout may in fact be exploiting globally computed identifiers.
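
A sketch of the simplest such control (our own illustration; it assumes the preprocessed block is stored under a node attribute we call x_pre):

```python
import random
from collections import defaultdict

def permute_pre_features(Gp, key="x_pre", within_degree=False, seed=0):
    """Return a copy of the derived graph with the preprocessed feature block
    shuffled across nodes (optionally only within degree classes)."""
    rng = random.Random(seed)
    Gc = Gp.copy()
    groups = defaultdict(list)
    for u in Gc.nodes:
        groups[Gc.degree(u) if within_degree else 0].append(u)
    for nodes in groups.values():
        feats = [Gc.nodes[u][key] for u in nodes]
        rng.shuffle(feats)
        for u, x in zip(nodes, feats):
            Gc.nodes[u][key] = x
    return Gc
```

Re-evaluating the trained pipeline on permute_pre_features(Gp), and on the within-degree variant, yields the performance drop referred to above.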

We recommend that every experimental table involving preprocessing include an explicit budget line with at least the following quantities:
Tpre(G) (measured wall-clock and asymptotic bound),   Sout(G) (output size in scalars),   b (numeric precision / bits per scalar).
When fpre computes all-pairs objects (e.g. SPD or effective resistance), one should report whether the representation is stored explicitly (Θ(n2) scalars) or sparsified/quantized, since this materially changes both bandwidth and the effective power of the local readout. If the preprocessor makes external calls (linear solvers, MILP solvers, SAT solvers, database queries), then Tpre must include both the number and type of calls and the total time; if calls are nondeterministic or time-capped, this should be stated and treated as part of the experimental condition rather than hidden as ``data preparation.’’

A growing class of pipelines computes features using tools that are best modeled as oracles: exact/approximate solvers, retrieval systems, or foundation-model calls. Our recommendation is to treat such tools as part of fpre and to report them in the leakage tuple via an explicit budget accounting:
Tpre(G) = (CPU/GPU time, #oracle queries, total tokens or bytes, tool class).
This makes explicit when a pipeline is implicitly leveraging #P-hard or NP-hard computation (cf. Theorem~4) or when it is simply amortizing heavy computation outside the learning loop. In tool-augmented settings, we also recommend a control experiment: rerun the downstream learner with the tool outputs replaced by outputs generated without access to the test instance (e.g. generic templates, unconditional samples, or outputs from an unrelated graph). If performance remains high, then the tool output is not carrying instance information; if performance collapses, then the tool output is an instance-dependent channel and must be audited for locality and budget.

Locality audits are fragile to seemingly innocuous implementation choices, so the constraints of the previous section (explicit ego-graph materialization, no statistics or attention shared across ego-graphs, no global positional indices) should be restated whenever k-local baselines or certificates are reported. In addition, we recommend that each experimental configuration include a short, standardized disclosure block covering the instance family, the preprocessing budget and bandwidth, the verifier class, and the certified or estimated rpre.
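
One possible instantiation of such a block (our own illustration: the field names and placeholder values are not prescribed by the text and should be adapted to the benchmark at hand):

```python
disclosure = {
    "instance_family": "<graph class, size range, feature schema>",
    "preprocessing": {
        "description": "<e.g. all-pairs shortest-path features D(u, .)>",
        "T_pre": {"asymptotic": "<e.g. O(n*m)>", "wallclock_s": "<measured>",
                  "oracle_calls": "<count and type, if any>"},
        "output_bandwidth": {"S_out_scalars": "<e.g. Theta(n^2)>", "bits_per_scalar": "<b>"},
        "topology_changes": "<added edges / virtual nodes / none>",
    },
    "locality": {
        "r_pre": {"certified_upper_bound": "<k or unknown>",
                  "empirical_r_hat": "<from the ego-graph audit>", "tau": "<threshold>"},
        "verifier_class": "<VClass of the local rule or baseline>",
    },
    "notes": "<exact vs approximate features; proved vs observed radius; identification maps>",
}
```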

This block is intentionally brief but forces the separation of upstream computation from downstream inference. In particular, when a pipeline achieves strong ``long-range’’ performance, the disclosure should make it evident whether the gain is explained by small rpre (locality collapse) together with large Tpre, or whether it persists in regimes where preprocessing is genuinely local and low-budget.


10. Discussion and open problems: measuring ‘partial leakage’, approximate distances, public randomness, and extending audits to foundation-model pretraining pipelines

We conclude by isolating several directions in which the leakage tuple Leak(P, fpre) = (Tpre, rpre, VClass, notes) is presently too coarse, and by formulating open problems whose resolution would make locality audits both more faithful and harder to game.

Our definition of rpre is qualitative: either P ∘ fpre is k-local (exactly) or it is not. Empirically, however, preprocessing often yields features that are neither a full label encoding nor purely local statistics; instead, they offer a partial signal toward the correct answer that a downstream learner can amplify. This suggests that one should measure leakage on a continuum. One natural extension is an ε-locality radius
rpre(ε)(P, fpre) := min {k: ∃ k-local gk s.t. Pr [gk(G[Nk(⋅)]) = P(G)] ≥ 1 − ε},
with the probability taken over an explicit distribution on 𝒢 (or over a worst-case instance family if one adopts distribution-free agnostic guarantees). This raises two immediate questions. First, what is the correct error criterion for node/edge-level tasks: per-label error, average error over nodes/edges, or an adversarially chosen index? Second, what is the correct reference point for ε: should we compare to Bayes risk under the distribution, or to the accuracy of a full-graph learner in a fixed VClass? Even in the simplest setting, one expects a separation between ``information leakage'' (existence of a k-local predictor with small ε) and ``computational leakage'' (existence of such a predictor within a restricted VClass). A principled report would therefore include a pair (rpre(ε), VClass) rather than a single integer radius, together with an explicit definition of the error criterion.

A complementary approach is to define a local advantage of radius k, e.g.
Advk := supg ∈ 𝒢k(Pr [g(G[Nk(⋅)]) = P(G)] − Pr [g0(G[N0(⋅)]) = P(G)]),
where 𝒢k denotes a fixed hypothesis class of k-local predictors. This makes explicit that leakage is a performance gain attributable to increasing radius given the same VClass, aligning with the collapse curves advocated in Sections 8–9. The open problem is to connect such operational quantities to information-theoretic measures (conditional mutual information between P(G) and G′[Nk] given G′[Nk − 1], or conditional entropy decay), in a way that is invariant under benign feature reparameterizations.

Theorems~2–4 highlight that global objects (e.g. all-pairs effective resistance, all-pairs shortest paths, or exact motif counts) can collapse locality dramatically, but exactness is rarely used in practice. Many pipelines compute approximate shortest-path distances, approximate resistances via Laplacian solvers, low-rank spectral embeddings, or quantized distance bins. Formally, these are still global functions of G; nevertheless, their computational cost is reduced, and their approximation error may destroy the crisp characterizations underlying constant-radius certificates.

This motivates a refined accounting in which the preprocessor output is constrained not only by time Tpre but also by an information budget: output size Sout(G) and numeric precision b. One can then ask for a tradeoff curve: for a fixed task P, what minimal (Sout, b) suffices to achieve rpre(ε) ≤ k for small k? Even for concrete tasks such as bridge detection, it is unclear how robust the 0-local rule ``R(u, v) = 1'' is under approximation. The immediate technical challenge is to control false positives/negatives when resistances are perturbed by ±δ, or when shortest-path distances are only available up to additive/multiplicative distortion. The broader open problem is to characterize which classes of approximate global summaries admit locally checkable certificates despite approximation error, and which do not. Such results would separate ``approximate leakage'' that still yields constant-radius decision rules from approximate features that merely help learning in a non-certificate way.

Our definitions are deterministic: after preprocessing, a label is a function of the radius-k induced neighborhood. In distributed computing and property testing, it is standard to allow a local algorithm access to public randomness. If we allow the local verifier gk to use a shared random seed ρ, then one can define a randomized radius rprerand by requiring Prρ[gk(⋅; ρ) = P(G)] ≥ 1 − ε. This is not merely a technicality: public randomness can simulate hashing-based certificates, local sampling, or randomized sketch verification, potentially reducing the radius needed to validate a global summary computed by fpre.

The associated open questions are definitional and complexity-theoretic. Definitional: what randomness is ``free’’ in a learning pipeline audit? If the seed is fixed across all instances, it can become an implicit identifier channel; if it is per-instance, it must be budgeted as part of the output. Complexity-theoretic: can randomized locality meaningfully reduce the preprocessing budget needed to collapse locality for natural problems, without simply reintroducing Theorem~1 by another route (namely, by allowing the preprocessor to write a randomized encoding of the labels)? Any satisfactory theory must treat randomness as a resource with explicit accounting, analogously to time and output bandwidth.

A practical frontier is that fpre is increasingly implemented by a model (graph foundation model, LLM-based tool, retrieval system) that consumes a representation of the entire graph and emits node/edge/graph embeddings. In our framework, such a system is an oracle-augmented preprocessor with budget components that do not fit the word-RAM abstraction: token budgets, retrieval calls, and (often hidden) training compute and training data access. The immediate methodological requirement is to state whether the preprocessor is recomputed per test instance, whether it is transductive (trained on the test graph), and whether it can access labels or label-correlated metadata during pretraining.

A more conceptual open problem is to define an analogue of rpre when fpre is amortized across many instances. If a foundation model has memorized a distribution of graphs, then the per-instance runtime may be small while the total computational budget is enormous. Leakage reports that ignore this amortization can be misleading in the opposite direction of Theorem~1: they may understate the effective preprocessing power. We therefore view it as necessary to extend Leak(P, fpre) with provenance fields (training data access, transductive vs inductive, and whether the preprocessor parameters depend on the evaluation instance) and with an amortized budget model. Establishing a standard here is an open problem in its own right, but without it, comparisons across ``feature-free'' learners and tool-augmented pipelines are not well-defined.

Finally, our strongest negative statements are conditional (Theorem~3) or complexity-based (Theorem~4). A missing piece is a set of lower bounds showing that, for specific tasks P, any preprocessor that reduces rpre(ε) below a threshold must output Ω(n) or Ω(n2) bits, regardless of time. Such bounds would formalize the intuition that certain global properties cannot be made locally decidable without transmitting essentially the whole graph (or a near-complete distance matrix) through features. Proving these bounds, even for restricted instance families, appears to require combining locality arguments with communication complexity or streaming lower bounds. We regard this as the most direct route to a theory in which (Tpre, Sout, b, rpre) jointly constrain leakage in a way that cannot be circumvented by simply relocating computation upstream.