A bridge lemma: above the compression threshold, selection is forced.
If semantic load exceeds capacity and relevance can move, competence requires non-trivial selection.
Equivalently:
Selection is not an architectural choice above threshold; it is a mathematical necessity.
This lemma supplies the missing link in the chain:
Compression threshold ⇒ Selection forced ⇒ Closure forced.
Let the world expose an input vector x_t ∈ ℝ^n and a target y_t = f_t(x_t), where the relevant coordinates (or factors) that determine y_t may vary over time.
We model this by a relevance set
R_t ⊆ {1, …, n}, |R_t| ≤ r,
such that y_t depends only on {x_t(j) : j ∈ R_t}.
Novelty / moving relevance: R_t may change over time according to an admissible process class 𝒫. Crucially, 𝒫 includes sequences where the relevant coordinates are not fixed.
(Optional robustness note: if correlations exist, interpret n as the number of effective residual degrees of freedom after conditioning on what the agent can represent.)
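For concreteness, here is a minimal, illustrative sketch of such an environment (the function name, the rotation schedule, and the ±1 coordinate distribution are assumptions of this sketch, not part of the formal model): relevance rotates through the coordinates, and y_t is determined only by x_t|R_t.

```python
import numpy as np

def moving_relevance_process(n=16, r=1, period=10, T=1000, seed=0):
    """One admissible process in P: the relevance set R_t rotates every `period`
    steps, and y_t depends only on the coordinates x_t(j) with j in R_t."""
    rng = np.random.default_rng(seed)
    for t in range(T):
        x_t = rng.choice([-1.0, 1.0], size=n)               # i.i.d. ±1 coordinates
        R_t = {(t // period + i) % n for i in range(r)}      # moving relevance, |R_t| <= r
        y_t = 1.0 if sum(x_t[j] for j in R_t) > 0 else 0.0   # target uses only x_t|R_t
        yield t, x_t, R_t, y_t
```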
At each step, the system has a representational / computational budget that allows it to process at most k degrees of freedom with high fidelity:
k < n.
This is the formal “compression” condition.
A selection policy is any mechanism that chooses which components of x_t (or derived features) receive processing resources:
A_t = Select(x_t, m_t) ⊆ {1, …, n}, |A_t| ≤ k,
where m_t denotes internal state/memory. The selected subvector is x_t|A_t.
The system outputs ŷ_t and incurs loss ℓ_t = ℓ(ŷ_t, y_t).
We say the system is competent under novelty if it maintains uniformly low expected loss across all admissible relevance processes:
sup_{P ∈ 𝒫} 𝔼_P[ℓ_t] ≤ ϵ for all sufficiently large t.
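A small interface sketch may help fix ideas (the class and function names are illustrative, not a proposed architecture): a selection policy exposes Select(x_t, m_t), and competence is checked empirically by taking the worst average loss over a set of admissible processes.

```python
from typing import Iterable, Protocol, Set
import numpy as np

class SelectionPolicy(Protocol):
    """Abstract Select(x_t, m_t): return at most k coordinate indices."""
    def select(self, x_t: np.ndarray, m_t: object) -> Set[int]: ...

class FixedSelection:
    """Trivial policy A_t ≡ A: ignores the input and any internal state."""
    def __init__(self, A: Iterable[int]):
        self.A = set(A)
    def select(self, x_t: np.ndarray, m_t: object = None) -> Set[int]:
        return self.A

def worst_case_loss(predict, processes, T=2000) -> float:
    """Empirical stand-in for sup_{P in P} E_P[l_t]: run the system on each
    admissible process and report the largest average 0-1 loss."""
    worst = 0.0
    for make_process in processes:
        losses = [abs(predict(x_t) - y_t)           # 0-1 loss for binary targets
                  for _, x_t, _, y_t in make_process(T=T)]
        worst = max(worst, float(np.mean(losses)))
    return worst

# Usage hint (reusing the moving_relevance_process sketch above):
# policy = FixedSelection(range(4))
# err = worst_case_loss(
#     lambda x: 1.0 if sum(x[j] for j in policy.select(x)) > 0 else 0.0,
#     [moving_relevance_process])
```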
Lemma (Selection-for-Competence).
Suppose:
1) the environment has n effective degrees of freedom with moving relevance R_t,
2) the system has per-step capacity k < n,
3) competence requires sup_{P ∈ 𝒫} 𝔼[ℓ_t] ≤ ϵ for small ϵ.
Then the system must implement non-trivial selection: it must choose A_t in a way that depends on x_t and/or the internal state m_t, so that A_t adapts to relevance.
In particular, any architecture that applies a fixed, relevance-independent projection A_t ≡ A (constant), or that processes all coordinates only through a fixed bottleneck of size k without adaptive allocation, cannot maintain competence across 𝒫.
Proof. We show that without adaptive selection, there exists an admissible relevance process that forces persistent error.
Assume, for contradiction, that the system does not implement non-trivial selection. Formally, assume one of the following holds:
(i) A_t ≡ A is a fixed, relevance-independent set, or
(ii) all coordinates are processed only through a fixed bottleneck of size k, with no adaptive allocation.
Either way, there exists a set of “unattended” coordinates at each time:
U_t := {1, …, n} \ A_t, with |U_t| ≥ n − k > 0.
Because 𝒫 contains moving relevance, pick a process P⋆ ∈ 𝒫 such that at each time R_t ⊆ U_t.
That is: the relevant coordinates always lie in the system’s unselected set.
This is possible whenever:
- relevance is allowed to vary, and
- the system’s attended set is limited to k < n.
(If the adversarial process is required to be non-adaptive: one can choose a fixed relevance set R disjoint from a fixed A, or a time-scheduled rotation independent of the system’s internal randomness. The conclusion still holds.)
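As a sketch of this step (assuming the ±1 environment from the earlier sketch and, for the non-adaptive case, a fixed attended set A), the process simply confines relevance to the complement of A:

```python
import numpy as np

def adversarial_process(A, n=16, T=1000, seed=0):
    """Construct P*: relevance stays inside the unattended set U = {1,...,n} \\ A,
    so the selected subvector x_t|A carries no information about y_t."""
    U = sorted(set(range(n)) - set(A))
    assert U, "requires |A| <= k < n, so the unattended set is non-empty"
    rng = np.random.default_rng(seed)
    for t in range(T):
        x_t = rng.choice([-1.0, 1.0], size=n)
        j = U[t % len(U)]                   # rotate relevance within the unattended set
        y_t = 1.0 if x_t[j] > 0 else 0.0    # y_t is a balanced bit of an unselected coordinate
        yield t, x_t, {j}, y_t
```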
Let the target be any function that depends only on x_t through R_t:
y_t = f_t(x_t|R_t), where R_t ⊆ U_t.
By construction, the system’s selected input x_t|A_t contains no information about the coordinates determining y_t.
Because the system’s output ŷ_t depends only on what it processes (at most k coordinates plus its internal state), and the relevant information is excluded, the system cannot predict y_t better than a baseline.
Under mild conditions (e.g., the excluded coordinates have non-degenerate variance and are not deterministically encoded in the selected ones), we obtain a non-zero error lower bound:
𝔼_{P⋆}[ℓ_t] ≥ c for some constant c > 0 independent of t.
(Example intuition: if y_t is a bit determined by an unselected coordinate with balanced probability, any predictor without access to it cannot beat chance.)
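For the balanced-bit case the constant can be made explicit. A short derivation, assuming 0-1 loss and that the bit is independent of everything the system processes (these assumptions specialize the “mild conditions” above):

```latex
% Balanced-bit lower bound under 0-1 loss.
% Assume y_t = B_t \in \{0,1\} with \Pr[B_t = 1] = 1/2, and let \hat{y}_t be any
% (possibly randomized) function of (x_t|_{A_t}, m_t) that is independent of B_t.
\[
  \mathbb{E}_{P^{\star}}[\ell_t]
    = \Pr[\hat{y}_t \neq B_t]
    = \mathbb{E}\bigl[\Pr[B_t \neq \hat{y}_t \mid \hat{y}_t]\bigr]
    = \tfrac{1}{2},
  \qquad\text{so the bound holds with } c = \tfrac{1}{2}.
\]
```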
Thus:
sup_{P ∈ 𝒫} 𝔼_P[ℓ_t] ≥ 𝔼_{P⋆}[ℓ_t] ≥ c,
so the competence criterion fails for any ϵ < c. This contradicts the assumption that the system remains competent across 𝒫.
Therefore, to satisfy competence under novelty, the system must implement non-trivial selection that can track relevance:
A_t = Select(x_t, m_t) (adaptive).
▫
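A minimal numerical illustration of the contradiction, reusing the adversarial_process sketch above (the oracle-adaptive selector, which is simply told R_t, stands in for any mechanism capable of tracking relevance; it is not a claim about how selection should be implemented):

```python
import numpy as np

def run(selector, process):
    """Average 0-1 loss of a predictor that only sees the selected coordinates."""
    losses = []
    for t, x_t, R_t, y_t in process:
        A_t = selector(x_t, R_t)                     # the attended set for this step
        s = sum(x_t[j] for j in A_t)                 # prediction uses x_t|A_t only
        y_hat = 1.0 if s > 0 else 0.0
        losses.append(abs(y_hat - y_t))
    return float(np.mean(losses))

A_fixed  = set(range(4))                             # fixed attended set, k = 4 < n = 16
fixed    = lambda x_t, R_t: A_fixed                  # non-adaptive: ignores relevance
adaptive = lambda x_t, R_t: set(R_t)                 # oracle-adaptive: tracks R_t

err_fixed    = run(fixed,    adversarial_process(A_fixed, n=16, T=2000))
err_adaptive = run(adaptive, adversarial_process(A_fixed, n=16, T=2000))
print(err_fixed, err_adaptive)   # ≈ 0.5 (chance) vs 0.0: only the adaptive policy is competent
```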
“Selection” here is an abstract necessity class, not a specific module.
Any of the following count:
- attention,
- routing,
- gating,
- internal search branching control,
- memory write policies,
- dynamic feature construction.
The lemma only forces resource allocation that can adapt, not a particular implementation.
If the unselected coordinates are perfectly predictable from the selected ones, the lower bound can fail.
That is why the correct statement is in terms of effective residual degrees of freedom: the independent, task-relevant uncertainty that remains after conditioning on what the system processes.
The compression threshold ℛ > τ_critical should be read as precisely this residual overload condition.
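One way to make this residual reading precise (an illustrative, information-theoretic formalization; not necessarily how ℛ and τ_critical are defined elsewhere):

```latex
% Illustrative residual condition (one possible formalization, assumed here):
% the error lower bound needs strictly positive task-relevant uncertainty to
% remain after conditioning on the processed subvector and the internal state.
\[
  H\bigl(\, y_t \;\bigm|\; x_t|_{A_t},\, m_t \,\bigr) \;>\; 0 .
\]
```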
Once selection is forced, the next necessity result activates:
Corollary.
Above the compression threshold, architectures must perform selection.
Therefore, the competence question becomes: what governs selection?
This is where evaluative closure enters as the next forced property.
This lemma supplies the missing bridge:
Compression ⇒ Selection ⇒ Closure.
When relevance can move and capacity is bounded, any competent system must choose what to process—selection is forced by compression.