A bridge lemma: above the compression threshold, selection is forced.
If semantic load exceeds capacity and relevance can move, competence requires non-trivial selection.
Equivalently:
Selection is not an architectural choice above threshold; it is a mathematical necessity.
This lemma supplies the missing link in the chain:
Compression threshold ⇒ Selection forced ⇒ Closure forced.
Let the world expose an input vector x_t ∈ ℝ^n and a target y_t = f_t(x_t), where the relevant coordinates (or factors) that determine y_t may vary over time.
We model this by a relevance set
R_t ⊆ {1, …, n}, |R_t| ≤ r,
such that y_t depends only on {x_t(j) : j ∈ R_t}.
Novelty / moving relevance: R_t may change over time according to an admissible process class 𝒫. Crucially, 𝒫 includes sequences where the relevant coordinates are not fixed.
(Optional robustness note: if correlations exist, interpret n as the number of effective residual degrees of freedom after conditioning on what the agent can represent.)
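For concreteness, here is a minimal, illustrative sketch of such an environment (the function name, the rotation schedule, and the ±1 coordinate distribution are assumptions of this sketch, not part of the formal model): relevance rotates through the coordinates, and y_t is determined only by x_t|R_t.

```python
import numpy as np

def moving_relevance_process(n=16, r=1, period=10, T=1000, seed=0):
    """One admissible process in P: the relevance set R_t rotates every `period`
    steps, and y_t depends only on the coordinates x_t(j) with j in R_t."""
    rng = np.random.default_rng(seed)
    for t in range(T):
        x_t = rng.choice([-1.0, 1.0], size=n)               # i.i.d. ±1 coordinates
        R_t = {(t // period + i) % n for i in range(r)}      # moving relevance, |R_t| <= r
        y_t = 1.0 if sum(x_t[j] for j in R_t) > 0 else 0.0   # target uses only x_t|R_t
        yield t, x_t, R_t, y_t
```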
At each step, the system has a representational / computational budget that allows it to process at most k degrees of freedom with high fidelity:
k < n.
This is the formal “compression” condition.
A selection policy is any mechanism that chooses which components of x_t (or derived features) receive processing resources:
A_t = Select(x_t, m_t) ⊆ {1, …, n}, |A_t| ≤ k,
where m_t denotes internal state/memory. The selected subvector is x_t|A_t.
The system outputs ŷ_t and incurs loss ℓ_t = ℓ(ŷ_t, y_t).
We say the system is competent under novelty if it maintains uniformly low expected loss across all admissible relevance processes:
sup_{P ∈ 𝒫} 𝔼_P[ℓ_t] ≤ ϵ for all sufficiently large t.
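A small interface sketch may help fix ideas (the class and function names are illustrative, not a proposed architecture): a selection policy exposes Select(x_t, m_t), and competence is checked empirically by taking the worst average loss over a set of admissible processes.

```python
from typing import Iterable, Protocol, Set
import numpy as np

class SelectionPolicy(Protocol):
    """Abstract Select(x_t, m_t): return at most k coordinate indices."""
    def select(self, x_t: np.ndarray, m_t: object) -> Set[int]: ...

class FixedSelection:
    """Trivial policy A_t ≡ A: ignores the input and any internal state."""
    def __init__(self, A: Iterable[int]):
        self.A = set(A)
    def select(self, x_t: np.ndarray, m_t: object = None) -> Set[int]:
        return self.A

def worst_case_loss(predict, processes, T=2000) -> float:
    """Empirical stand-in for sup_{P in P} E_P[l_t]: run the system on each
    admissible process and report the largest average 0-1 loss."""
    worst = 0.0
    for make_process in processes:
        losses = [abs(predict(x_t) - y_t)           # 0-1 loss for binary targets
                  for _, x_t, _, y_t in make_process(T=T)]
        worst = max(worst, float(np.mean(losses)))
    return worst

# Usage hint (reusing the moving_relevance_process sketch above):
# policy = FixedSelection(range(4))
# err = worst_case_loss(
#     lambda x: 1.0 if sum(x[j] for j in policy.select(x)) > 0 else 0.0,
#     [moving_relevance_process])
```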
Lemma (Selection-for-Competence).
Suppose:
1) the environment has n effective degrees of freedom with moving relevance R_t,
2) the system has per-step capacity k < n,
3) competence requires sup_{P ∈ 𝒫} 𝔼[ℓ_t] ≤ ϵ for small ϵ.
Then the system must implement non-trivial selection: it must choose A_t in a way that depends on x_t and/or the internal state m_t, so that A_t adapts to relevance.
In particular, any architecture that applies a fixed, relevance-independent projection A_t ≡ A (constant), or that processes all coordinates only through a fixed bottleneck of size k without adaptive allocation, cannot maintain competence across 𝒫.
Proof. We show that without adaptive selection, there exists an admissible relevance process that forces persistent error.
Assume, for contradiction, that the system does not implement non-trivial selection. Formally, assume one of the following holds:
(i) A_t ≡ A is a fixed, relevance-independent set, or
(ii) all coordinates are processed only through a fixed bottleneck of size k, with no adaptive allocation.
Either way, there exists a set of “unattended” coordinates at each time:
U_t := {1, …, n} \ A_t, with |U_t| ≥ n − k > 0.
Because 𝒫 contains moving relevance, pick a process P⋆ ∈ 𝒫 such that at each time R_t ⊆ U_t.
That is: the relevant coordinates always lie in the system’s unselected set.
This is possible whenever:
- relevance is allowed to vary, and
- the system’s attended set is limited to k < n.
(If the adversarial process is required to be non-adaptive: one can choose a fixed relevance set R disjoint from a fixed A, or a time-scheduled rotation independent of the system’s internal randomness. The conclusion still holds.)
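As a sketch of this step (assuming the ±1 environment from the earlier sketch and, for the non-adaptive case, a fixed attended set A), the process simply confines relevance to the complement of A:

```python
import numpy as np

def adversarial_process(A, n=16, T=1000, seed=0):
    """Construct P*: relevance stays inside the unattended set U = {1,...,n} \\ A,
    so the selected subvector x_t|A carries no information about y_t."""
    U = sorted(set(range(n)) - set(A))
    assert U, "requires |A| <= k < n, so the unattended set is non-empty"
    rng = np.random.default_rng(seed)
    for t in range(T):
        x_t = rng.choice([-1.0, 1.0], size=n)
        j = U[t % len(U)]                   # rotate relevance within the unattended set
        y_t = 1.0 if x_t[j] > 0 else 0.0    # y_t is a balanced bit of an unselected coordinate
        yield t, x_t, {j}, y_t
```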
Let the target be any function that depends only on x_t through R_t:
y_t = f_t(x_t|R_t), where R_t ⊆ U_t.
By construction, the system’s selected input x_t|A_t contains no information about the coordinates determining y_t.
Because the system’s output ŷ_t depends only on what it processes (at most k coordinates plus its internal state), and the relevant information is excluded, the system cannot predict y_t better than a baseline.
Under mild conditions (e.g., the excluded coordinates have non-degenerate variance and are not deterministically encoded in the selected ones), we obtain a non-zero error lower bound:
𝔼_{P⋆}[ℓ_t] ≥ c for some constant c > 0 independent of t.
(Example intuition: if y_t is a bit determined by an unselected coordinate with balanced probability, any predictor without access to it cannot beat chance.)
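For the balanced-bit case the constant can be made explicit. A short derivation, assuming 0-1 loss and that the bit is independent of everything the system processes (these assumptions specialize the “mild conditions” above):

```latex
% Balanced-bit lower bound under 0-1 loss.
% Assume y_t = B_t \in \{0,1\} with \Pr[B_t = 1] = 1/2, and let \hat{y}_t be any
% (possibly randomized) function of (x_t|_{A_t}, m_t) that is independent of B_t.
\[
  \mathbb{E}_{P^{\star}}[\ell_t]
    = \Pr[\hat{y}_t \neq B_t]
    = \mathbb{E}\bigl[\Pr[B_t \neq \hat{y}_t \mid \hat{y}_t]\bigr]
    = \tfrac{1}{2},
  \qquad\text{so the bound holds with } c = \tfrac{1}{2}.
\]
```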
Thus:
sup_{P ∈ 𝒫} 𝔼_P[ℓ_t] ≥ 𝔼_{P⋆}[ℓ_t] ≥ c,
so the competence criterion fails for any ϵ < c. This contradicts the assumption that the system remains competent across 𝒫.
Therefore, to satisfy competence under novelty, the system must implement non-trivial selection that can track relevance:
A_t = Select(x_t, m_t) (adaptive).
▫
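A minimal numerical illustration of the contradiction, reusing the adversarial_process sketch above (the oracle-adaptive selector, which is simply told R_t, stands in for any mechanism capable of tracking relevance; it is not a claim about how selection should be implemented):

```python
import numpy as np

def run(selector, process):
    """Average 0-1 loss of a predictor that only sees the selected coordinates."""
    losses = []
    for t, x_t, R_t, y_t in process:
        A_t = selector(x_t, R_t)                     # the attended set for this step
        s = sum(x_t[j] for j in A_t)                 # prediction uses x_t|A_t only
        y_hat = 1.0 if s > 0 else 0.0
        losses.append(abs(y_hat - y_t))
    return float(np.mean(losses))

A_fixed  = set(range(4))                             # fixed attended set, k = 4 < n = 16
fixed    = lambda x_t, R_t: A_fixed                  # non-adaptive: ignores relevance
adaptive = lambda x_t, R_t: set(R_t)                 # oracle-adaptive: tracks R_t

err_fixed    = run(fixed,    adversarial_process(A_fixed, n=16, T=2000))
err_adaptive = run(adaptive, adversarial_process(A_fixed, n=16, T=2000))
print(err_fixed, err_adaptive)   # ≈ 0.5 (chance) vs 0.0: only the adaptive policy is competent
```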
“Selection” here is an abstract necessity class, not a specific module.
Any of the following count:
- attention,
- routing,
- gating,
- internal search branching control,
- memory write policies,
- dynamic feature construction.
The lemma only forces resource allocation that can adapt, not a particular implementation.
If the unselected coordinates are perfectly predictable from the selected ones, the lower bound can fail.
That is why the correct statement is in terms of effective residual degrees of freedom: the independent, task-relevant uncertainty that remains after conditioning on what the system processes.
The compression threshold ℛ > τ_critical should be read as precisely this residual overload condition.
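One way to make this residual reading precise (an illustrative, information-theoretic formalization; not necessarily how ℛ and τ_critical are defined elsewhere):

```latex
% Illustrative residual condition (one possible formalization, assumed here):
% the error lower bound needs strictly positive task-relevant uncertainty to
% remain after conditioning on the processed subvector and the internal state.
\[
  H\bigl(\, y_t \;\bigm|\; x_t|_{A_t},\, m_t \,\bigr) \;>\; 0 .
\]
```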
Once selection is forced, the next necessity result activates:
Corollary.
Above the compression threshold, architectures must perform selection.
Therefore, the competence question becomes: what governs selection?
This is where evaluative closure enters as the next forced property.
This lemma supplies the missing bridge:
Compression ⇒ Selection ⇒ Closure.
When relevance can move and capacity is bounded, any competent system must choose what to process—selection is forced by compression.