Under bounded resources, active selection (attention, routing, memory write, control allocation) is not merely helpful—it becomes increasingly forced as the environment’s semantic load rises relative to the system’s representational bandwidth.
Unlike some necessity results that admit sharp thresholds (e.g., coupling-dominance regimes), the compression/selection transition is typically graded: selection pressure increases smoothly with load and only appears “threshold-like” when constraints and success criteria are hard.
We formalize this with a small set of pressure parameters and show why a continuous “consciousness-likeness / phenomenal intensity” proxy naturally emerges.
A bounded agent interacts with a stream of situations $x_1, x_2, \ldots$. Internally, the agent maintains a compressed state $z_t$ with limited capacity and uses a selection mechanism $\alpha_t$ (attention/routing/allocation) to decide what information influences the next update and action.
A convenient abstraction:
$$z_t = \mathrm{Enc}_{\alpha}(h_t) \quad \text{with} \quad \mathrm{cap}(z_t) \le k,$$
where $h_t$ is the agent's internal history (or a summary thereof), $k$ is a resource bound (bits, slots, tokens, activations), and $\alpha$ parameterizes selection.
The agent then chooses
$$u_t = \pi(z_t).$$
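As a concrete, purely illustrative sketch of this loop (the `encode`/`policy` callables and the slot-based reading of the capacity bound are assumptions, not part of the note):

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class BoundedAgent:
    """Minimal sketch of the abstraction above; the `encode`/`policy` callables
    and the slot-based reading of the capacity bound are illustrative choices."""
    encode: Callable[[List[Any], int], List[Any]]  # Enc_alpha: (h_t, k) -> z_t
    policy: Callable[[List[Any]], Any]             # pi: z_t -> u_t
    capacity: int                                  # k, here counted in retained slots
    history: List[Any] = field(default_factory=list)

    def step(self, observation: Any) -> Any:
        self.history.append(observation)                 # h_t grows without bound...
        z_t = self.encode(self.history, self.capacity)   # ...but z_t must not
        assert len(z_t) <= self.capacity                 # hard resource bound cap(z_t) <= k
        return self.policy(z_t)                          # u_t = pi(z_t)

# Example "selection": keep the k most recent observations; the policy echoes the last one.
agent = BoundedAgent(encode=lambda h, k: h[-k:], policy=lambda z: z[-1], capacity=4)
print(agent.step("x0"), agent.step("x1"))                # -> x0 x1
```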
To avoid dependence on differential calculus, we measure selection impact via informational and causal dependence.
Semantic load is the amount of information about $x_t$ that must be preserved (directly or indirectly) to achieve robust performance across novelty.
A principled proxy is the task-relevant information required for near-optimal control:
$$\mathcal{L} := \inf_{\pi}\, I(X; Z) \quad \text{s.t.} \quad \mathbb{E}\big[R(\pi(Z))\big] \ge R^{*} - \epsilon.$$
Interpretation:
- $I(X; Z)$ is the number of bits about the world the agent must internally retain.
- $R^{*}$ is the best achievable return.
- $\epsilon$ is the tolerated performance gap.
This is an information bottleneck / rate–distortion style definition: to keep the performance shortfall within $\epsilon$, the agent needs at least $\mathcal{L}$ bits of task-relevant information.
Compression ratio:
$$\rho := \frac{\mathcal{L}}{k} \quad (\rho > 1 \text{ means demand exceeds capacity}).$$
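For a rough sense of scale, here is a toy accounting of $\mathcal{L}$ and $\rho$; the 8-coordinate, 16-level task and the 5-bit budget are made-up numbers for illustration only:

```python
import numpy as np

# Illustrative accounting only: suppose near-optimal control requires knowing
# which of 8 coordinates is currently task-relevant (3 bits) plus its value to
# 16 levels (4 bits), and the compressed state holds about 5 bits.
L_bits = np.log2(8) + np.log2(16)   # semantic load L: ~7 task-relevant bits
k_bits = 5.0                        # capacity k of the compressed state, in bits
rho = L_bits / k_bits
print(rho)                          # 1.4 > 1: demand exceeds capacity, so selection is forced
```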
Selection pressure is “how hard the system must work to decide what matters.”
A minimal, architecture-neutral measure is how much the selection variable $\alpha_t$ influences the next state or policy:
Define leverage as conditional mutual information:
$$L := I(\alpha_t;\, Z_{t+1} \mid H_t),$$
or equivalently (often more operational):
$$L := I(\alpha_t;\, U_{t+1} \mid H_t),$$
where $H_t$ summarizes the history available before selection.
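Leverage can be estimated from logged rollouts. The sketch below uses a plug-in conditional mutual information estimate for discrete samples of $\alpha_t$, $Z_{t+1}$, and $H_t$; the estimator choice and the XOR sanity check are assumptions for illustration:

```python
import numpy as np
from collections import Counter

def conditional_mi(alpha, z_next, h):
    """Plug-in estimate of I(alpha_t; Z_{t+1} | H_t) in bits from paired samples
    of discrete variables; any consistent CMI estimator would serve the same role."""
    n = len(alpha)
    c_haz = Counter(zip(h, alpha, z_next))
    c_h, c_ha, c_hz = Counter(h), Counter(zip(h, alpha)), Counter(zip(h, z_next))
    mi = 0.0
    for (hv, av, zv), c in c_haz.items():
        # p(h,a,z) * log2[ p(h,a,z) p(h) / (p(h,a) p(h,z)) ], written with raw counts
        mi += (c / n) * np.log2(c * c_h[hv] / (c_ha[(hv, av)] * c_hz[(hv, zv)]))
    return mi

# Sanity check: z_next = h XOR alpha, so selection carries one full bit of leverage given h.
rng = np.random.default_rng(0)
h = rng.integers(0, 2, 20000)
alpha = rng.integers(0, 2, 20000)
z_next = (h + alpha) % 2
print(conditional_mi(alpha, z_next, h))   # ~1.0 bit
```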
Let $\ell(x, u)$ be a loss. Define the minimal achievable distortion under capacity $k$:
$$D(k) := \inf_{p(z \mid x)\,:\, I(X; Z) \le k}\; \mathbb{E}\big[\ell(X, \pi(Z))\big].$$
Selection pressure rises with the slope |D′(k)|: if small increases in capacity sharply reduce loss, then the system is in a sensitive regime where “what you keep” matters a lot.
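As a standard closed-form illustration (not specific to this note), a Gaussian source with variance $\sigma^2$ under squared-error distortion has
$$D(k) = \sigma^2\, 2^{-2k}, \qquad |D'(k)| = 2\ln 2 \cdot \sigma^2\, 2^{-2k} = 2\ln 2 \cdot D(k),$$
so achievable loss is most sensitive to capacity exactly where residual distortion is still high, i.e., where the budget is most binding.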
Let λ be a novelty rate: the probability, per step, that the task-relevant coordinate changes. High λ prevents caching one fixed summary forever.
A simple proxy is how quickly the identity of the optimal feature switches:
$$\lambda := \Pr\big[\operatorname{argmax}\, \mathrm{relevance}(t+1) \neq \operatorname{argmax}\, \mathrm{relevance}(t)\big].$$
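Given a log of per-step relevance scores (how those scores are obtained is left open here), the proxy is a one-liner; the drifting-score example is illustrative:

```python
import numpy as np

def novelty_rate(relevance):
    """relevance: array of shape (T, n_features) of per-step relevance scores.
    Returns the fraction of consecutive steps at which the top feature switches."""
    top = np.argmax(relevance, axis=1)
    return float(np.mean(top[1:] != top[:-1]))

# Illustrative log: the relevant coordinate re-draws with probability 0.05 per step.
rng = np.random.default_rng(0)
T, n = 5000, 8
scores = np.zeros((T, n))
idx = 0
for t in range(T):
    if rng.random() < 0.05:
        idx = rng.integers(n)
    scores[t, idx] = 1.0
print(novelty_rate(scores))   # ~0.04-0.05 (a re-draw sometimes lands on the same feature)
```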
In many environments, the performance curve as a function of capacity and selection strength is smooth; this smoothness softens what would otherwise be a crisp impossibility boundary.
Assume:
1. The agent’s internal representation obeys a capacity constraint I(X; Z) ≤ k.
2. Required information for ϵ-optimality is ℒ(ϵ).
3. Novelty causes the relevant features to drift at rate λ.
Then selection pressure (measured via leverage L or distortion sensitivity |D′(k)|) is generally increasing in ρ = ℒ/k and λ, and becomes “threshold-like” only in the limit of hard success criteria and adversarial novelty.
Intuition: when ρ ≪ 1, almost anything fits; selection barely matters. As ρ → 1, capacity becomes binding; selection choices become consequential. When ρ > 1, the agent cannot retain all relevant bits and must actively select a subset, making leverage necessary for performance.
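A toy simulation makes the intuition concrete. In the entirely illustrative environment below, one of $n$ binary features determines the correct action and its identity drifts at rate λ; an agent that actively selects what to retain is compared against one that retains a random subset of size $k$. The advantage of selection grows as $k/n$ shrinks (i.e., as ρ grows), while novelty erodes the value of stale selections:

```python
import numpy as np

rng = np.random.default_rng(0)

def accuracy(n=16, k=4, lam=0.05, steps=20000, selective=True):
    """Illustrative environment: one of n binary features decides the correct
    action each step, and its identity switches with probability lam per step.
    The agent can retain only k features; selective=True tracks the feature that
    was relevant on the previous step, selective=False keeps a random subset."""
    relevant = rng.integers(n)
    believed = relevant
    hits = 0
    for _ in range(steps):
        if rng.random() < lam:                       # novelty: relevance drifts
            relevant = rng.integers(n)
        x = rng.integers(0, 2, n)                    # current observation
        if selective:
            fillers = rng.choice([i for i in range(n) if i != believed], k - 1, replace=False)
            kept = set(fillers) | {believed}
        else:
            kept = set(rng.choice(n, k, replace=False))
        guess = x[relevant] if relevant in kept else rng.integers(0, 2)
        hits += int(guess == x[relevant])
        believed = relevant                          # feedback reveals current relevance
    return hits / steps

for k in (2, 4, 8, 16):   # rho falls as k rises; selection matters less and less
    print(f"k={k:2d}  selective={accuracy(k=k):.3f}  random={accuracy(k=k, selective=False):.3f}")
```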
Let D(R) be the rate–distortion function for the task loss ℓ: the smallest expected distortion achievable with rate R bits.
If the agent’s effective rate is R = k (capacity k bits), then expected loss is bounded below:
$$\mathbb{E}[\ell] \ge D(k).$$
Suppose the target loss corresponds to distortion D*. If D(k) > D*, the target is unachievable regardless of selection. If D(k) ≈ D*, then small changes in what is encoded can move you above/below target—this is precisely where selection pressure becomes large.
The gradedness comes from the fact that typical D(R) curves are smooth and convex: the marginal utility of extra bits rises steadily as the operating point approaches the knee, rather than jumping at a single threshold.
A useful pressure proxy:
$$\Pi_k := -\frac{d}{dk} D(k).$$
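One way to see $\Pi_k$ numerically is to trace out the distortion–rate trade-off for a small discrete source with the standard Blahut–Arimoto iteration and take finite differences along the resulting curve; the four-symbol source and Hamming distortion below are assumptions for illustration:

```python
import numpy as np

def distortion_rate_points(p_x, d, betas, iters=300):
    """Blahut-Arimoto sweep: for each beta, return one (rate_bits, distortion)
    point on the optimal trade-off curve for source p_x and distortion d[x, xhat]."""
    points = []
    for beta in betas:
        q_out = np.full(d.shape[1], 1.0 / d.shape[1])          # output marginal q(xhat)
        for _ in range(iters):
            w = q_out * np.exp(-beta * d)                      # unnormalized q(xhat | x)
            q_cond = w / w.sum(axis=1, keepdims=True)
            q_out = p_x @ q_cond
        joint = p_x[:, None] * q_cond                          # p(x) q(xhat | x)
        distortion = float((joint * d).sum())
        rate = float((joint * np.log2(q_cond / q_out)).sum())  # I(X; Xhat) in bits
        points.append((rate, distortion))
    return sorted(points)

# Toy source: 4 symbols with skewed probabilities, Hamming distortion.
p_x = np.array([0.4, 0.3, 0.2, 0.1])
d = 1.0 - np.eye(4)
pts = distortion_rate_points(p_x, d, betas=np.linspace(1.0, 8.0, 30))
rates = np.array([r for r, _ in pts])
dists = np.array([dd for _, dd in pts])

# Finite-difference estimate of the pressure proxy Pi_k = -dD/dk along the curve:
# steepest (highest pressure) at low capacity, flattening as capacity grows.
pi_k = -np.diff(dists) / (np.diff(rates) + 1e-12)
for r, p in list(zip(rates[:-1], pi_k))[::6]:
    print(f"k ~ {r:.2f} bits   Pi_k ~ {p:.2f}")
```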
When the system has multiple operators or subsystems that must coordinate (planning, memory, perception, action), coupling strength γ amplifies selection pressure.
Heuristically, expected loss decomposes into:
$$\text{Loss} \approx D(k) + \gamma \cdot \Pr[\text{coordination error}] + \text{noise terms}.$$
When γ is large, even modest coordination error dominates, and selection must not only compress but also coordinate what is kept and how it is used across modules.
This is the bridge to a global evaluative broadcast: globality becomes more necessary as γ increases.
Let’s define a scalar proxy ℐ that increases when the system is deep in the regime where active selection is doing real work.
A simple monotone form:
$$\mathcal{I} = \sigma\big(a \log \rho + b \log \gamma + c \log(L + \varepsilon) + d \log(\lambda + \varepsilon) - \theta\big),$$
where:
- $\sigma$ is a sigmoid (saturating nonlinearity),
- $\varepsilon$ prevents singularities,
- $a, b, c, d \ge 0$ weight the pressures,
- $\theta$ sets a reference point ("fuzzy boundary").
Interpretation:
- If ρ is small (plenty of slack), intensity is low.
- As ρ grows (capacity binds), intensity rises.
- As coupling γ grows, global coordination pressure rises.
- As leverage L grows, evaluation/selection has real causal control.
- As novelty λ grows, fixed summaries fail; selection must stay online.
This supports the framing:
Phenomenal intensity ∝ f(compression ratio, γ, leverage strength, λ, …)
rather than a single crisp threshold.
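A minimal numerical sketch of the proxy, with illustrative (not prescribed) weights and a logistic choice of σ:

```python
import math

def intensity(rho, gamma, leverage, lam, a=1.0, b=1.0, c=1.0, d=1.0, theta=0.0, eps=1e-6):
    """Scalar intensity proxy from the formula above; the weights a..d, the
    reference point theta, and the logistic choice of sigma are illustrative."""
    x = (a * math.log(rho) + b * math.log(gamma)
         + c * math.log(leverage + eps) + d * math.log(lam + eps) - theta)
    return 1.0 / (1.0 + math.exp(-x))            # sigma: logistic squashing

# Low pressure (slack capacity, weak coupling, little novelty) vs. high pressure:
print(intensity(rho=0.2, gamma=0.5, leverage=0.05, lam=0.01))   # ~0: selection barely matters
print(intensity(rho=2.0, gamma=4.0, leverage=1.5, lam=0.3))     # ~0.8: selection is doing real work
```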
This note does not claim phenomenology in any metaphysical sense. It provides a quantitative complement to the binary necessity chain.
Compression pressure explains why closure becomes non-negotiable under bounded resources and novelty, but the transition is commonly graded.
You get near-binary transitions when:
- capacity is a hard cap (not penalized),
- novelty is adversarial (worst-case),
- the success criterion is strict (beat chance by a fixed margin at all times),
- and the environment can select which coordinate matters.
In that limit, smooth curves collapse into “possible/impossible” boundaries and selection becomes provably necessary in the strongest sense.