
Semantic Load and Selection Pressure (Gradient Regime)

Thesis

Under bounded resources, active selection (attention, routing, memory write, control allocation) is not merely helpful—it becomes increasingly forced as the environment’s semantic load rises relative to the system’s representational bandwidth.

Unlike some necessity results that admit sharp thresholds (e.g., coupling-dominance regimes), the compression/selection transition is typically graded: selection pressure increases smoothly with load and only appears “threshold-like” when constraints and success criteria are hard.

We formalize this with a small set of pressure parameters and show why a continuous “consciousness-likeness / phenomenal intensity” proxy naturally emerges.


1. Minimal setup

A bounded agent interacts with a stream of situations:

$$x_1, x_2, x_3, \dots$$

Internally the agent maintains a compressed state zt with limited capacity and uses a selection mechanism αt (attention/routing/allocation) to decide what information influences the next update and action.

A convenient abstraction:

$$z_t = \mathrm{Enc}_\alpha(h_t) \quad \text{with} \quad \mathrm{cap}(z_t) \le k,$$

where ht is the agent’s internal history (or a summary thereof), k is a resource bound (bits, slots, tokens, activations), and α parameterizes selection.

The agent then chooses

$$u_t = \pi(z_t).$$

To avoid dependence on differential calculus, we’ll measure selection impact using information / causal dependence.
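
A minimal runnable sketch of this abstraction, assuming a toy top-k encoder and a trivial sign-readout policy (the names, the Gaussian data, and the readout are all illustrative, not part of the formalism):

```python
import numpy as np

rng = np.random.default_rng(0)

def enc(h, alpha, k):
    """Selection: keep the k coordinates of the history summary h
    that the selection weights alpha score highest; drop the rest."""
    keep = np.argsort(alpha)[-k:]      # indices chosen by alpha
    z = np.zeros_like(h)
    z[keep] = h[keep]                  # cap(z) <= k nonzero slots
    return z

def pi(z):
    """Toy policy: act on the sign of a fixed readout of z."""
    return float(np.sign(z.sum()))

# one interaction step: summarize, compress under capacity k, act
d, k = 16, 4
h = rng.normal(size=d)                 # internal history summary h_t
alpha = rng.random(d)                  # selection variable alpha_t
z = enc(h, alpha, k)                   # z_t = Enc_alpha(h_t)
u = pi(z)                              # u_t = pi(z_t)
```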


2. Semantic load

Semantic load is the amount of information about xt that must be preserved (directly or indirectly) to achieve robust performance across novelty.

A principled proxy is the task-relevant information required for near-optimal control:

$$\mathcal{L} := \inf_{\pi}\, I(X; Z) \quad \text{s.t.} \quad \mathbb{E}[R(\pi(Z))] \ge R^* - \epsilon.$$

Interpretation:
- I(X; Z) is the number of bits about the world the agent must internally retain.
- R* is best achievable return.
- ϵ is the tolerated performance gap.

This is an information bottleneck / rate–distortion style definition: to keep the performance gap ≤ ϵ, the agent must retain at least ℒ bits of task-relevant information.

Compression ratio:

$$\rho := \frac{\mathcal{L}}{k} \quad (\rho > 1 \text{ means demand exceeds capacity}).$$
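
For a small discrete task, ℒ can be brute-forced directly from the definition: enumerate deterministic encoders Z = f(X), keep those whose best decoder reaches the return target, and take the minimum I(X; Z). A sketch under those assumptions (the task, labels, and capacity value are invented for illustration):

```python
import itertools
import numpy as np

X = np.arange(4)                  # world states, uniform
label = np.array([0, 0, 1, 1])    # task-relevant coordinate of x
eps = 0.0                         # tolerated performance gap
R_star = 1.0                      # best achievable return

def mutual_info(f):
    """I(X; Z) in bits for deterministic Z = f(X); equals H(Z)."""
    _, counts = np.unique([f[x] for x in X], return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_return(f):
    """Expected reward of the best decoder pi(Z) guessing label(X)."""
    total = 0.0
    for zval in set(f):
        xs = [x for x in X if f[x] == zval]
        total += max(np.sum(label[xs] == u) for u in (0, 1))
    return total / len(X)

# semantic load: least informative encoder that still hits the target
L_sem = min(mutual_info(f)
            for f in itertools.product(range(4), repeat=4)
            if best_return(f) >= R_star - eps)
k = 2.0                           # capacity in bits (assumed)
print(L_sem, L_sem / k)           # 1.0 0.5 -> rho < 1: slack remains
```

Here ℒ comes out to exactly one bit, the entropy of the binary label, as the definition predicts.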


3. Selection pressure

Selection pressure is “how hard the system must work to decide what matters.”

A minimal, architecture-neutral measure is how much the selection variable αt influences the next state or policy:

3.1 Leverage strength

Define leverage as conditional mutual information:

$$L := I(\alpha_t;\, Z_{t+1} \mid H_t),$$

or equivalently (often more operational):

$$L := I(\alpha_t;\, U_{t+1} \mid H_t),$$

where Ht summarizes history available before selection.
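
Leverage can be estimated from logged runs with a plug-in conditional mutual information estimator over discretized variables (a sketch; plug-in estimators of this kind are biased upward on small samples):

```python
import numpy as np
from collections import Counter

def cond_mutual_info(a, z, h):
    """Plug-in estimate of I(A; Z | H) in bits from parallel arrays
    of discrete samples, via I = H(A,H) + H(Z,H) - H(A,Z,H) - H(H)."""
    n = len(a)
    def H(*cols):
        counts = Counter(zip(*cols))
        p = np.array(list(counts.values())) / n
        return float(-(p * np.log2(p)).sum())
    return H(a, h) + H(z, h) - H(a, z, h) - H(h)

# toy check: alpha gates which function of h reaches the next state,
# so selection should carry measurable leverage
rng = np.random.default_rng(1)
h = rng.integers(0, 4, size=5000)            # history summary H_t
a = rng.integers(0, 2, size=5000)            # selection alpha_t
z = np.where(a == 1, h % 2, h // 2)          # Z_{t+1} depends on alpha
print(cond_mutual_info(a, z, h))             # ~0.5 bits of leverage
```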

3.2 Distortion sensitivity

Let ℓ(x, u) be a loss. Define the minimal achievable distortion under capacity k:

$$D(k) := \inf_{p(z \mid x)\,:\; I(X;Z) \le k}\; \mathbb{E}[\ell(X, \pi(Z))].$$

Selection pressure rises with the slope |D′(k)|: if small increases in capacity sharply reduce loss, the system is in a sensitive regime where “what you keep” matters a great deal.
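
This sensitivity can be probed empirically by sweeping capacity in a toy selection problem and differencing the resulting loss curve (the linear-Gaussian setup below is an assumption of the sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 16, 2000
w = rng.normal(size=d)                  # true relevance weights
X = rng.normal(size=(n, d))             # situations; target is X @ w

def loss_at_capacity(k):
    """Best squared loss when only k coordinates of x may be kept;
    optimal selection keeps the largest-|w| coordinates, so the
    residual is the energy of the dropped part."""
    keep = np.argsort(np.abs(w))[-k:]
    dropped = np.delete(np.arange(d), keep)
    return float(np.mean((X[:, dropped] @ w[dropped]) ** 2))

ks = np.arange(1, d + 1)
D_curve = np.array([loss_at_capacity(k) for k in ks])
sensitivity = -np.gradient(D_curve, ks)   # proxy for |D'(k)|
print(sensitivity.round(3))               # steepest where capacity is scarce
```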

3.3 Novelty hazard / shifting relevance

Let λ be a novelty rate: the per-step probability that the task-relevant coordinate changes. High λ prevents caching one fixed summary forever.

A simple proxy is how quickly the identity of the optimal feature switches:

$$\lambda := \Pr\left[\operatorname{argmax}\,\mathrm{relevance}(t+1) \neq \operatorname{argmax}\,\mathrm{relevance}(t)\right].$$
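
From a logged run, λ is just the switch frequency of the top-relevance feature (a sketch; the per-feature relevance scores are assumed given):

```python
import numpy as np

def novelty_rate(relevance):
    """relevance: array of shape (T, d), per-step feature scores.
    Returns the empirical rate at which the argmax feature switches."""
    top = relevance.argmax(axis=1)       # identity of the key feature
    return float(np.mean(top[1:] != top[:-1]))

rng = np.random.default_rng(3)
rel = rng.random((1000, 8))              # i.i.d. scores: maximal churn
print(novelty_rate(rel))                 # ~ 7/8 for 8 exchangeable features
```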


4. Why the transition is typically graded

In many environments the performance curve as a function of capacity and selection strength is smooth: the capacity constraint is penalized rather than a hard cap, novelty is stochastic rather than adversarial, and the success criterion tolerates a gap ϵ > 0 (the appendix lists the converse conditions under which sharpness returns).

These soften what would otherwise be a crisp impossibility boundary.

Proposition (graded selection pressure)

Assume:
1. The agent’s internal representation obeys a capacity constraint I(X; Z) ≤ k.
2. Required information for ϵ-optimality is ℒ(ϵ).
3. Novelty causes the relevant features to drift at rate λ.

Then selection pressure (measured via leverage L or distortion sensitivity |D′(k)|) is generally increasing in ρ = ℒ/k and λ, and becomes “threshold-like” only in the limit of hard success criteria and adversarial novelty.

Intuition: when ρ ≪ 1, almost anything fits; selection barely matters. As ρ → 1, capacity becomes binding; selection choices become consequential. When ρ > 1, the agent cannot retain all relevant bits and must actively select a subset, making leverage necessary for performance.
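
One crude way to see the gradient, assuming loss is simply the number of relevant coordinates an agent drops: the spread between the best and worst k-subset grows smoothly with load and stays maximal once capacity binds (all quantities illustrative):

```python
d, k = 32, 8          # feature count, capacity slots

def selection_spread(m):
    """m relevant coords among d; loss of a k-subset = relevant coords
    it drops. Returns loss(worst subset) - loss(best subset)."""
    best = max(m - k, 0)       # keep as many relevant coords as fit
    worst = min(m, d - k)      # a bad subset can dodge relevant coords
    return worst - best

for m in (1, 2, 4, 8, 16, 24):
    print(f"rho = {m / k:.2f}  spread = {selection_spread(m)}")
    # spread rises with rho and saturates at k once capacity binds
```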


5. A quantitative toy bound (rate–distortion form)

Let D(R) be the rate–distortion function for the task loss ℓ: the smallest expected distortion achievable with rate R bits.

If the agent’s effective rate is R = k (capacity k bits), then expected loss is bounded below:

$$\mathbb{E}[\ell] \ge D(k).$$

Suppose the target loss corresponds to distortion D*. If D(k) > D*, the target is unachievable regardless of selection. If D(k) ≈ D*, then small changes in what is encoded can move you above/below target—this is precisely where selection pressure becomes large.

The gradedness comes from the fact that typical D(R) curves are smooth, convex, and decreasing: the marginal value of an extra bit grows steadily as capacity shrinks toward the knee, rather than jumping at a threshold.

A useful pressure proxy:

$$\Pi_k := -\frac{d}{dk} D(k).$$
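
For a concrete D(R) curve, the standard Blahut–Arimoto iteration traces rate–distortion points for a binary source under Hamming distortion, and Π then follows by numerical differentiation (a sketch; the source, distortion matrix, and β grid are illustrative):

```python
import numpy as np

def blahut_arimoto(p_x, dist, beta, iters=300):
    """One rate-distortion point for source p_x and distortion matrix
    dist[x, z] at trade-off beta. Returns (rate in bits, distortion)."""
    n, m = dist.shape
    q_z = np.full(m, 1.0 / m)
    for _ in range(iters):
        q_zx = q_z * np.exp(-beta * dist)        # p(z|x), unnormalized
        q_zx /= q_zx.sum(axis=1, keepdims=True)
        q_z = p_x @ q_zx                          # updated marginal
    D = float(np.sum(p_x[:, None] * q_zx * dist))
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(q_zx > 0, q_zx / q_z, 1.0)
    R = float(np.sum(p_x[:, None] * q_zx * np.log2(ratio)))
    return R, D

p_x = np.array([0.5, 0.5])                        # uniform binary source
dist = 1.0 - np.eye(2)                            # Hamming distortion
pts = sorted(blahut_arimoto(p_x, dist, b) for b in np.linspace(0.1, 8, 40))
R = np.array([r for r, _ in pts])
D = np.array([d for _, d in pts])
Pi = -np.gradient(D, R)                           # pressure proxy
print(Pi.round(2))                                # largest at low rate
```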


6. Coupling interacts with compression

When the system has multiple operators or subsystems that must coordinate (planning, memory, perception, action), coupling strength γ amplifies selection pressure.

Heuristically, expected loss decomposes into:

$$\text{Loss} \approx D(k) + \gamma \cdot \Pr[\text{coordination error}] + \text{noise terms}.$$

When γ is large, even modest coordination error dominates, and selection must not only compress but also coordinate what is kept and how it is used across modules.

This is the bridge to a global evaluative broadcast: globality becomes more necessary as γ increases.
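
A small simulation of this decomposition, comparing independent module selections against a shared (globally broadcast) one (the D(k) floor and the error model are assumptions of the sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
d, k, trials = 16, 4, 5000

def coord_error(shared):
    """Pr[two modules disagree on whether the key feature is kept].
    shared=True models one global selection broadcast to both."""
    errs = 0
    for _ in range(trials):
        key = rng.integers(d)                  # task-relevant coordinate
        s1 = rng.choice(d, k, replace=False)   # module 1's selection
        s2 = s1 if shared else rng.choice(d, k, replace=False)
        errs += (key in s1) != (key in s2)
    return errs / trials

D_k = 0.1                                      # assumed compression floor
for gamma in (0.1, 1.0, 10.0):
    for shared in (False, True):
        loss = D_k + gamma * coord_error(shared)
        print(f"gamma={gamma:<4} shared={shared}: loss ~ {loss:.3f}")
```

As γ grows, the uncoordinated loss is dominated by the coupling term while the shared-selection loss stays at the compression floor, which is the quantitative sense in which globality becomes more necessary.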


7. A continuous “consciousness-likeness / intensity” proxy

Let’s define a scalar proxy that increases when the system is deep in the regime where active selection is doing real work.

A simple monotone form (sketched in code after the definitions below):

$$\mathcal{I} = \sigma\!\left(a \log \rho + b \log \gamma + c \log(L + \varepsilon) + d \log(\lambda + \varepsilon) - \theta\right),$$

where:
- σ is a sigmoid (saturating nonlinearity),
- ε prevents singularities,
- a, b, c, d ≥ 0 weight pressures,
- θ sets a reference point (“fuzzy boundary”).
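
A direct transcription of this proxy, with σ taken as the logistic sigmoid (the weights, offset, and test values below are illustrative defaults, not calibrated):

```python
import numpy as np

def intensity(rho, gamma, L, lam, a=1.0, b=1.0, c=1.0, d=1.0,
              theta=0.0, eps=1e-6):
    """Scalar intensity proxy: sigmoid of weighted log-pressure terms."""
    x = (a * np.log(rho) + b * np.log(gamma)
         + c * np.log(L + eps) + d * np.log(lam + eps) - theta)
    return 1.0 / (1.0 + np.exp(-x))            # logistic sigmoid

print(intensity(rho=0.2, gamma=0.5, L=0.1, lam=0.05))  # slack regime: ~0
print(intensity(rho=2.0, gamma=4.0, L=1.5, lam=0.60))  # loaded regime: ~0.9
```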

Interpretation:
- If ρ is small (plenty of slack), intensity is low.
- As ρ grows (capacity binds), intensity rises.
- As coupling γ grows, global coordination pressure rises.
- As leverage L grows, evaluation/selection has real causal control.
- As novelty λ grows, fixed summaries fail; selection must stay online.

This supports the framing:

Phenomenal intensity ∝ f(compression ratio ρ, coupling γ, leverage L, novelty λ, …)

rather than a single crisp threshold.


8. Relationship to the Desmocycle necessity stack

This note does not claim phenomenology in any metaphysical sense. It provides a quantitative complement to the binary necessity chain.

Compression pressure explains why closure becomes non-negotiable under bounded resources and novelty, but the transition is commonly graded.


9. Takeaways

  1. Semantic load can be formalized as the minimal task-relevant information required for ϵ-optimality.
  2. Compression ratio ρ = ℒ/k is a natural driver of selection pressure.
  3. Selection pressure is best modeled as a continuum (leverage, distortion sensitivity), not a binary switch.
  4. Coupling γ and novelty λ amplify selection pressure.
  5. A scalar intensity proxy captures the “how consciousness-like” gradient across regimes.

Appendix: when does it become sharp?

You get near-binary transitions when:
- capacity is a hard cap (not penalized),
- novelty is adversarial (worst-case),
- the success criterion is strict (beat chance by a fixed margin at every step),
- and the environment can select which coordinate matters.

In that limit, smooth curves collapse into “possible/impossible” boundaries and selection becomes provably necessary in the strongest sense.