
Compression-to-Selection Necessity (Selection-for-Competence Lemma)

A bridge lemma: above the compression threshold, selection is forced.


0. Claim in One Line

If semantic load exceeds capacity and relevance can move, competence requires non-trivial selection.

Equivalently:

Selection is not an architectural choice above threshold; it is a mathematical necessity.

This lemma closes the missing link in the chain:
Compression threshold ⇒ Selection forced ⇒ Closure forced.


1. Minimal Formal Setup

1.1 Environment with moving relevance

Let the world expose an input vector
xt ∈ ℝn
and a target
yt = ft(xt),
where the relevant coordinates (or factors) that determine yt may vary over time.

We model this by a relevance set
Rt ⊆ {1, …, n},  |Rt| ≤ r,
such that
yt depends only on {xt(j) : j ∈ Rt}.

Novelty / moving relevance: Rt may change over time according to an admissible process class 𝒫.
Crucially, 𝒫 includes sequences where the relevant coordinates are not fixed.

(Optional robustness note: if correlations exist, interpret n as the number of effective residual degrees of freedom after conditioning on what the agent can represent.)
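
To make this concrete, here is a minimal sketch of such an environment in Python. Every specific choice (binary coordinates, r = 1, a scheduled drift for Rt, ft = identity on the relevant coordinate) is an illustrative assumption, not part of the lemma.

```python
import numpy as np

rng = np.random.default_rng(0)

def env_step(t, n=16, r=1, dwell=250):
    """One step of an environment with moving relevance.
    Illustrative choices (not part of the lemma): binary coordinates,
    |R_t| = r = 1, a relevance set that hops on a fixed schedule, and
    f_t = identity on the relevant coordinate."""
    R_t = {(t // dwell) % n}              # R_t drifts over time
    x_t = rng.integers(0, 2, size=n)      # x_t ∈ {0,1}^n (a stand-in for ℝ^n)
    y_t = int(x_t[next(iter(R_t))])       # y_t depends only on x_t | R_t
    return x_t, y_t, R_t
```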

1.2 Capacity bound (compression)

At each step, the system has a representational / computational budget that allows it to process at most k degrees of freedom with high fidelity:
k < n.
This is the formal “compression” condition.

1.3 Selection mechanism

A selection policy is any mechanism that chooses which components of xt (or derived features) receive processing resources:
At = Select(xt, mt) ⊆ {1, …, n},  |At| ≤ k,
where mt denotes internal state/memory.

The selected subvector is
xt|At.
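
As an interface, a selection policy is just a map (xt, mt) ↦ (At, mt+1) with |At| ≤ k. A minimal sketch of the two shapes discussed below, with a hypothetical state type (a set of tracked coordinates):

```python
def trivial_select(x_t, m_t, k=4):
    """The degenerate case ruled out by the lemma: A_t ≡ A, a
    constant, relevance-independent attended set."""
    return set(range(k)), m_t

def stateful_select(x_t, m_t, k=4):
    """Non-trivial selection: A_t depends on the internal state m_t,
    here (hypothetically) a set of coordinates the agent is tracking.
    The state update is omitted in this sketch."""
    A_t = set(sorted(m_t)[:k]) if m_t else set(range(k))
    return A_t, m_t
```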

1.4 Competence requirement

The system outputs a prediction ŷt and incurs loss
ℓt = ℓ(ŷt, yt).

We say the system is competent under novelty if it maintains uniformly low expected loss across all admissible relevance processes:
supP ∈ 𝒫 𝔼P[ℓt] ≤ ϵ  for all sufficiently large t.
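
An empirical stand-in for this criterion: sample relevance processes from 𝒫, run the system, and require the worst observed average loss to stay below ϵ. A trivial sketch:

```python
def competent(mean_losses, eps):
    """Empirical proxy for the sup over P ∈ 𝒫: the worst sampled
    average loss must stay below ϵ. `mean_losses` holds one average
    loss per sampled relevance process."""
    return max(mean_losses) <= eps
```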


2. Lemma (Compression ⇒ Selection Necessity)

Lemma (Selection-for-Competence).
Suppose:
1) the environment has n effective degrees of freedom with moving relevance Rt,
2) the system has per-step capacity k < n,
3) competence requires supP ∈ 𝒫 𝔼P[ℓt] ≤ ϵ for small ϵ.

Then the system must implement non-trivial selection:
it must choose At in a way that depends on xt and/or internal state mt such that At adapts to relevance.

In particular, any architecture that applies a fixed, relevance-independent projection
At ≡ A  (constant)
or that processes all coordinates only through a fixed bottleneck of size k without adaptive allocation,
cannot maintain competence across 𝒫.


3. Proof (By Adversarial Relevance Sequence)

We prove that without adaptive selection, there exists an admissible relevance process that forces persistent error.

3.1 Negating non-trivial selection

Assume, for contradiction, that the system does not implement non-trivial selection.

Formally, assume one of the following holds:
- At ≡ A is a fixed, relevance-independent attended set, or
- the system processes all coordinates only through a fixed bottleneck of size k, with no adaptive allocation.

Either way, there exists a set of “unattended” coordinates at each time:
Ut := {1, …, n} \ At,
with
|Ut| ≥ n − k > 0.

3.2 Choose an admissible relevance process that avoids the attended set

Because 𝒫 contains moving relevance, pick a process P ∈ 𝒫 such that at each time,
Rt ⊆ Ut.
That is: the relevant coordinates always lie in the system’s unselected set.

This is possible whenever:
- relevance is allowed to vary, and
- the system’s attended set is limited to k < n.

(If one objects that this adversary adapts to the system: we can instead choose a fixed relevance set R disjoint from a fixed A, or a time-scheduled rotation independent of the system’s internal state and randomness. The conclusion still holds.)
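
In code, this adversary is two lines; a sketch building on the environment above:

```python
def adversarial_relevance(A_t, n, r):
    """The adversary of 3.2: place R_t inside the unattended set U_t,
    the complement of A_t. Possible whenever |A_t| <= k < n."""
    U_t = [j for j in range(n) if j not in A_t]
    return set(U_t[:r])                   # R_t ⊆ U_t, |R_t| ≤ r
```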

3.3 Define a target that depends only on the unselected coordinates

Let the target be any function that depends only on xt through Rt:
yt = ft(xt|Rt),
where Rt ⊆ Ut.

By construction, the system’s selected input xt|At contains no information about the coordinates determining yt.

3.4 Lower bound the achievable error

Because the system’s output ŷt depends only on what it processes (at most k coordinates plus its internal state),
and the relevant information is excluded, the system cannot predict yt better than a baseline.

Under mild conditions (e.g., the excluded coordinates have non-degenerate variance and are not deterministically encoded in the selected ones),
we obtain a non-zero error lower bound:
𝔼P[ℓt] ≥ c
for some constant c > 0 independent of t.

(Example intuition: if yt is a bit determined by an unselected coordinate with balanced probability, any predictor without access to it cannot beat chance.)
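
Concretely, under 0–1 loss with yt a uniform bit independent of the attended information:
Pr[ŷt ≠ yt] = 𝔼[Pr[ŷt ≠ yt | xt|At, mt]] = 𝔼[1/2] = 1/2,
since, conditional on the attended data, ŷt is determined (or independently randomized) while yt remains uniform on {0, 1}; in this case c = 1/2.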

Thus:
supP ∈ 𝒫 𝔼P[ℓt] ≥ 𝔼P[ℓt] ≥ c,
so the competence criterion fails for any ϵ < c.

This contradicts the assumption that the system remains competent across 𝒫.

Therefore, to satisfy competence under novelty, the system must implement non-trivial selection that can track relevance:
At = Select(xt, mt)  (adaptive).
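
To see the whole argument run, here is a minimal simulation of the proof's dynamic, under illustrative assumptions (r = 1, binary coordinates, the time-scheduled rotation of 3.2, and per-step feedback of yt); none of these choices are part of the lemma, and the tracking policy is one hypothetical instance of adaptive selection.

```python
import numpy as np

def simulate(adaptive, n=16, k=4, T=4000, dwell=250, seed=0):
    """The agent attends to at most k of n coordinates, predicts y_t
    from its leading candidate coordinate, and uses feedback to
    eliminate attended candidates that contradict the observed y_t.
    If `adaptive` is False, attention is frozen inside A = {0,...,k-1}.
    Returns the average 0-1 loss."""
    rng = np.random.default_rng(seed)
    pool = set(range(n)) if adaptive else set(range(k))  # where A_t may live
    cand = set(pool)                      # candidate relevant coordinates
    errors = 0
    for t in range(T):
        if not cand:                      # every candidate was refuted:
            cand = set(pool)              # restart the search
        A_t = sorted(cand)[:k]            # attended set, |A_t| ≤ k
        j_rel = (t // dwell) % n          # R_t = {j_rel} rotates on a schedule
        x_t = rng.integers(0, 2, size=n)
        y_t = int(x_t[j_rel])             # y_t depends only on x_t | R_t
        errors += int(x_t[A_t[0]] != y_t) # predict from the leading candidate
        cand -= {j for j in A_t if x_t[j] != y_t}  # feedback refutes mismatches
    return errors / T

print(simulate(adaptive=False))  # roughly 0.38: near chance once relevance leaves A
print(simulate(adaptive=True))   # roughly 0.01: re-acquires relevance after each shift
```

The fixed selector is competent only while j_rel happens to fall inside its frozen set; the adaptive selector pays a short burst of errors after each relevance shift and then locks back on, exactly the gap the lemma predicts.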


4. Notes on Strength and Generality

4.1 Why this lemma is not “just attention”

“Selection” here is an abstract necessity class, not a specific module.

Any of the following count:
- attention,
- routing,
- gating,
- internal search / branching control,
- memory write policies,
- dynamic feature construction.

The lemma only forces resource allocation that can adapt, not a particular implementation.

4.2 Correlation caveat (residual load)

If the unselected coordinates are perfectly predictable from selected ones, the lower bound can fail.

That is why the correct statement is in terms of effective residual degrees of freedom:
- after conditioning on what the system processes,
- what independent uncertainty remains relevant.

The compression threshold ℛ > τcritical should be read as precisely this residual overload condition.
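
A minimal numeric illustration of the caveat, assuming one coordinate is an exact copy of another:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.integers(0, 2, size=10_000)  # the attended coordinate
x2 = x1.copy()                        # unattended, but perfectly redundant
y = x2                                # relevance "hides" in the unattended copy
print((x1 != y).mean())               # 0.0: the chance lower bound fails
```

Here n = 2 but the effective residual load is 1: conditioning on the attended coordinate leaves no relevant uncertainty, so the premise k < n fails in the effective sense.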


5. Corollary (Selection ⇒ Closure becomes meaningful)

Once selection is forced, the next necessity result activates:

Corollary.
Above the compression threshold, architectures must perform selection.
Therefore, the competence question becomes: what governs selection?
This is where evaluative closure enters as the next forced property.

This lemma supplies the missing bridge:
Compression ⇒ Selection ⇒ Closure.


6. One-Sentence Summary

When relevance can move and capacity is bounded, any competent system must choose what to process—selection is forced by compression.