Under bounded resources, active selection (attention, routing, memory write, control allocation) is not merely helpful—it becomes increasingly forced as the environment’s semantic load rises relative to the system’s representational bandwidth.
Unlike some necessity results that admit sharp thresholds (e.g., coupling-dominance regimes), the compression/selection transition is typically graded: selection pressure increases smoothly with load and only appears “threshold-like” when constraints and success criteria are hard.
We formalize this with a small set of pressure parameters and show why a continuous “consciousness-likeness / phenomenal intensity” proxy naturally emerges.
A bounded agent interacts with a stream of situations $x_1, x_2, \ldots$. Internally, the agent maintains a compressed state $z_t$ with limited capacity and uses a selection mechanism $\alpha_t$ (attention/routing/allocation) to decide what information influences the next update and action.
A convenient abstraction:
$$z_t = \mathrm{Enc}_{\alpha}(h_t) \quad \text{with} \quad \mathrm{cap}(z_t) \le k,$$
where $h_t$ is the agent's internal history (or a summary thereof), $k$ is a resource bound (bits, slots, tokens, activations), and $\alpha$ parameterizes selection.
The agent then chooses
$$u_t = \pi(z_t).$$
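As a concrete, purely illustrative sketch of this loop (the `encode`/`policy` callables and the slot-based reading of the capacity bound are assumptions, not part of the note):

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class BoundedAgent:
    """Minimal sketch of the abstraction above; the `encode`/`policy` callables
    and the slot-based reading of the capacity bound are illustrative choices."""
    encode: Callable[[List[Any], int], List[Any]]  # Enc_alpha: (h_t, k) -> z_t
    policy: Callable[[List[Any]], Any]             # pi: z_t -> u_t
    capacity: int                                  # k, here counted in retained slots
    history: List[Any] = field(default_factory=list)

    def step(self, observation: Any) -> Any:
        self.history.append(observation)                 # h_t grows without bound...
        z_t = self.encode(self.history, self.capacity)   # ...but z_t must not
        assert len(z_t) <= self.capacity                 # hard resource bound cap(z_t) <= k
        return self.policy(z_t)                          # u_t = pi(z_t)

# Example "selection": keep the k most recent observations; the policy echoes the last one.
agent = BoundedAgent(encode=lambda h, k: h[-k:], policy=lambda z: z[-1], capacity=4)
print(agent.step("x0"), agent.step("x1"))                # -> x0 x1
```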
To avoid dependence on differential calculus, we measure selection impact via informational and causal dependence.
Semantic load is the amount of information about $x_t$ that must be preserved (directly or indirectly) to achieve robust performance across novelty.
A principled proxy is the task-relevant information required for near-optimal control:
$$\mathcal{L} := \inf_{\pi}\, I(X; Z) \quad \text{s.t.} \quad \mathbb{E}\big[R(\pi(Z))\big] \ge R^{*} - \epsilon.$$
Interpretation:
- $I(X; Z)$ is the number of bits about the world the agent must internally retain.
- $R^{*}$ is the best achievable return.
- $\epsilon$ is the tolerated performance gap.
This is an information bottleneck / rate–distortion style definition: to keep the performance shortfall within $\epsilon$, the agent needs at least $\mathcal{L}$ bits of task-relevant information.
Compression ratio:
$$\rho := \frac{\mathcal{L}}{k} \quad (\rho > 1 \text{ means demand exceeds capacity}).$$
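For a rough sense of scale, here is a toy accounting of $\mathcal{L}$ and $\rho$; the 8-coordinate, 16-level task and the 5-bit budget are made-up numbers for illustration only:

```python
import numpy as np

# Illustrative accounting only: suppose near-optimal control requires knowing
# which of 8 coordinates is currently task-relevant (3 bits) plus its value to
# 16 levels (4 bits), and the compressed state holds about 5 bits.
L_bits = np.log2(8) + np.log2(16)   # semantic load L: ~7 task-relevant bits
k_bits = 5.0                        # capacity k of the compressed state, in bits
rho = L_bits / k_bits
print(rho)                          # 1.4 > 1: demand exceeds capacity, so selection is forced
```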
Selection pressure is “how hard the system must work to decide what matters.”
A minimal, architecture-neutral measure is how much the selection variable $\alpha_t$ influences the next state or policy:
Define leverage as conditional mutual information:
$$L := I(\alpha_t;\, Z_{t+1} \mid H_t),$$
or equivalently (often more operational):
$$L := I(\alpha_t;\, U_{t+1} \mid H_t),$$
where $H_t$ summarizes the history available before selection.
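Leverage can be estimated from logged rollouts. The sketch below uses a plug-in conditional mutual information estimate for discrete samples of $\alpha_t$, $Z_{t+1}$, and $H_t$; the estimator choice and the XOR sanity check are assumptions for illustration:

```python
import numpy as np
from collections import Counter

def conditional_mi(alpha, z_next, h):
    """Plug-in estimate of I(alpha_t; Z_{t+1} | H_t) in bits from paired samples
    of discrete variables; any consistent CMI estimator would serve the same role."""
    n = len(alpha)
    c_haz = Counter(zip(h, alpha, z_next))
    c_h, c_ha, c_hz = Counter(h), Counter(zip(h, alpha)), Counter(zip(h, z_next))
    mi = 0.0
    for (hv, av, zv), c in c_haz.items():
        # p(h,a,z) * log2[ p(h,a,z) p(h) / (p(h,a) p(h,z)) ], written with raw counts
        mi += (c / n) * np.log2(c * c_h[hv] / (c_ha[(hv, av)] * c_hz[(hv, zv)]))
    return mi

# Sanity check: z_next = h XOR alpha, so selection carries one full bit of leverage given h.
rng = np.random.default_rng(0)
h = rng.integers(0, 2, 20000)
alpha = rng.integers(0, 2, 20000)
z_next = (h + alpha) % 2
print(conditional_mi(alpha, z_next, h))   # ~1.0 bit
```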
Let $\ell(x, u)$ be a loss. Define the minimal achievable distortion under capacity $k$:
$$D(k) := \inf_{p(z \mid x)\,:\, I(X; Z) \le k}\; \mathbb{E}\big[\ell(X, \pi(Z))\big].$$
Selection pressure rises with the slope |D′(k)|: if small increases in capacity sharply reduce loss, then the system is in a sensitive regime where “what you keep” matters a lot.
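As a standard closed-form illustration (not specific to this note), a Gaussian source with variance $\sigma^2$ under squared-error distortion has
$$D(k) = \sigma^2\, 2^{-2k}, \qquad |D'(k)| = 2\ln 2 \cdot \sigma^2\, 2^{-2k} = 2\ln 2 \cdot D(k),$$
so achievable loss is most sensitive to capacity exactly where residual distortion is still high, i.e., where the budget is most binding.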
Let λ be a novelty rate: the probability, per step, that the task-relevant coordinate changes. High λ prevents caching one fixed summary forever.
A simple proxy is how quickly the identity of the optimal feature switches:
$$\lambda := \Pr\big[\operatorname{argmax}\, \mathrm{relevance}(t+1) \neq \operatorname{argmax}\, \mathrm{relevance}(t)\big].$$
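Given a log of per-step relevance scores (how those scores are obtained is left open here), the proxy is a one-liner; the drifting-score example is illustrative:

```python
import numpy as np

def novelty_rate(relevance):
    """relevance: array of shape (T, n_features) of per-step relevance scores.
    Returns the fraction of consecutive steps at which the top feature switches."""
    top = np.argmax(relevance, axis=1)
    return float(np.mean(top[1:] != top[:-1]))

# Illustrative log: the relevant coordinate re-draws with probability 0.05 per step.
rng = np.random.default_rng(0)
T, n = 5000, 8
scores = np.zeros((T, n))
idx = 0
for t in range(T):
    if rng.random() < 0.05:
        idx = rng.integers(n)
    scores[t, idx] = 1.0
print(novelty_rate(scores))   # ~0.04-0.05 (a re-draw sometimes lands on the same feature)
```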
In many environments, the performance curve as a function of capacity and selection strength is smooth; this smoothness softens what would otherwise be a crisp impossibility boundary.
Assume:
1. The agent’s internal representation obeys a capacity constraint I(X; Z) ≤ k.
2. Required information for ϵ-optimality is ℒ(ϵ).
3. Novelty causes the relevant features to drift at rate λ.
Then selection pressure (measured via leverage L or distortion sensitivity |D′(k)|) is generally increasing in ρ = ℒ/k and λ, and becomes “threshold-like” only in the limit of hard success criteria and adversarial novelty.
Intuition: when ρ ≪ 1, almost anything fits; selection barely matters. As ρ → 1, capacity becomes binding; selection choices become consequential. When ρ > 1, the agent cannot retain all relevant bits and must actively select a subset, making leverage necessary for performance.
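A toy simulation makes the intuition concrete. In the entirely illustrative environment below, one of $n$ binary features determines the correct action and its identity drifts at rate λ; an agent that actively selects what to retain is compared against one that retains a random subset of size $k$. The advantage of selection grows as $k/n$ shrinks (i.e., as ρ grows), while novelty erodes the value of stale selections:

```python
import numpy as np

rng = np.random.default_rng(0)

def accuracy(n=16, k=4, lam=0.05, steps=20000, selective=True):
    """Illustrative environment: one of n binary features decides the correct
    action each step, and its identity switches with probability lam per step.
    The agent can retain only k features; selective=True tracks the feature that
    was relevant on the previous step, selective=False keeps a random subset."""
    relevant = rng.integers(n)
    believed = relevant
    hits = 0
    for _ in range(steps):
        if rng.random() < lam:                       # novelty: relevance drifts
            relevant = rng.integers(n)
        x = rng.integers(0, 2, n)                    # current observation
        if selective:
            fillers = rng.choice([i for i in range(n) if i != believed], k - 1, replace=False)
            kept = set(fillers) | {believed}
        else:
            kept = set(rng.choice(n, k, replace=False))
        guess = x[relevant] if relevant in kept else rng.integers(0, 2)
        hits += int(guess == x[relevant])
        believed = relevant                          # feedback reveals current relevance
    return hits / steps

for k in (2, 4, 8, 16):   # rho falls as k rises; selection matters less and less
    print(f"k={k:2d}  selective={accuracy(k=k):.3f}  random={accuracy(k=k, selective=False):.3f}")
```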
Let D(R) be the rate–distortion function for the task loss ℓ: the smallest expected distortion achievable with rate R bits.
If the agent’s effective rate is R = k (capacity k bits), then expected loss is bounded below:
$$\mathbb{E}[\ell] \ge D(k).$$
Suppose the target loss corresponds to distortion D*. If D(k) > D*, the target is unachievable regardless of selection. If D(k) ≈ D*, then small changes in what is encoded can move you above/below target—this is precisely where selection pressure becomes large.
The gradedness comes from the fact that typical D(R) curves are smooth and convex: the marginal utility of extra bits rises steadily as the operating point approaches the knee, rather than jumping at a single threshold.
A useful pressure proxy:
$$\Pi_k := -\frac{d}{dk} D(k).$$
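One way to see $\Pi_k$ numerically is to trace out the distortion–rate trade-off for a small discrete source with the standard Blahut–Arimoto iteration and take finite differences along the resulting curve; the four-symbol source and Hamming distortion below are assumptions for illustration:

```python
import numpy as np

def distortion_rate_points(p_x, d, betas, iters=300):
    """Blahut-Arimoto sweep: for each beta, return one (rate_bits, distortion)
    point on the optimal trade-off curve for source p_x and distortion d[x, xhat]."""
    points = []
    for beta in betas:
        q_out = np.full(d.shape[1], 1.0 / d.shape[1])          # output marginal q(xhat)
        for _ in range(iters):
            w = q_out * np.exp(-beta * d)                      # unnormalized q(xhat | x)
            q_cond = w / w.sum(axis=1, keepdims=True)
            q_out = p_x @ q_cond
        joint = p_x[:, None] * q_cond                          # p(x) q(xhat | x)
        distortion = float((joint * d).sum())
        rate = float((joint * np.log2(q_cond / q_out)).sum())  # I(X; Xhat) in bits
        points.append((rate, distortion))
    return sorted(points)

# Toy source: 4 symbols with skewed probabilities, Hamming distortion.
p_x = np.array([0.4, 0.3, 0.2, 0.1])
d = 1.0 - np.eye(4)
pts = distortion_rate_points(p_x, d, betas=np.linspace(1.0, 8.0, 30))
rates = np.array([r for r, _ in pts])
dists = np.array([dd for _, dd in pts])

# Finite-difference estimate of the pressure proxy Pi_k = -dD/dk along the curve:
# steepest (highest pressure) at low capacity, flattening as capacity grows.
pi_k = -np.diff(dists) / (np.diff(rates) + 1e-12)
for r, p in list(zip(rates[:-1], pi_k))[::6]:
    print(f"k ~ {r:.2f} bits   Pi_k ~ {p:.2f}")
```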
When the system has multiple operators or subsystems that must coordinate (planning, memory, perception, action), coupling strength γ amplifies selection pressure.
Heuristically, expected loss decomposes into:
$$\text{Loss} \approx D(k) + \gamma \cdot \Pr[\text{coordination error}] + \text{noise terms}.$$
When γ is large, even modest coordination error dominates, and selection must not only compress but also coordinate what is kept and how it is used across modules.
This is the bridge to a global evaluative broadcast: globality becomes more necessary as γ increases.
Let’s define a scalar proxy ℐ that increases when the system is deep in the regime where active selection is doing real work.
A simple monotone form:
$$\mathcal{I} = \sigma\big(a \log \rho + b \log \gamma + c \log(L + \varepsilon) + d \log(\lambda + \varepsilon) - \theta\big),$$
where:
- $\sigma$ is a sigmoid (saturating nonlinearity),
- $\varepsilon$ prevents singularities,
- $a, b, c, d \ge 0$ weight the pressures,
- $\theta$ sets a reference point ("fuzzy boundary").
Interpretation:
- If ρ is small (plenty of slack), intensity is low.
- As ρ grows (capacity binds), intensity rises.
- As coupling γ grows, global coordination pressure rises.
- As leverage L grows, evaluation/selection has real causal control.
- As novelty λ grows, fixed summaries fail; selection must stay online.
This supports the framing:
Phenomenal intensity ∝ f(compression ratio, γ, leverage strength, λ, …)
rather than a single crisp threshold.
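A minimal numerical sketch of the proxy, with illustrative (not prescribed) weights and a logistic choice of σ:

```python
import math

def intensity(rho, gamma, leverage, lam, a=1.0, b=1.0, c=1.0, d=1.0, theta=0.0, eps=1e-6):
    """Scalar intensity proxy from the formula above; the weights a..d, the
    reference point theta, and the logistic choice of sigma are illustrative."""
    x = (a * math.log(rho) + b * math.log(gamma)
         + c * math.log(leverage + eps) + d * math.log(lam + eps) - theta)
    return 1.0 / (1.0 + math.exp(-x))            # sigma: logistic squashing

# Low pressure (slack capacity, weak coupling, little novelty) vs. high pressure:
print(intensity(rho=0.2, gamma=0.5, leverage=0.05, lam=0.01))   # ~0: selection barely matters
print(intensity(rho=2.0, gamma=4.0, leverage=1.5, lam=0.3))     # ~0.8: selection is doing real work
```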
This note does not claim phenomenology in any metaphysical sense. It provides a quantitative complement to the binary necessity chain.
Compression pressure explains why closure becomes non-negotiable under bounded resources and novelty, but the transition is commonly graded.
You get near-binary transitions when:
- capacity is a hard cap (not penalized),
- novelty is adversarial (worst-case),
- the success criterion is strict (beat chance by a fixed margin at all times),
- and the environment can select which coordinate matters.
In that limit, smooth curves collapse into “possible/impossible” boundaries and selection becomes provably necessary in the strongest sense.