In 2026, the operational bottleneck for industrial IoT (IIoT) federated learning (FL) is no longer “can we train at the edge?” but rather “who pays for the externalities created by training at the edge, and how do those prices reshape the engineering choices we thought were purely technical?” Two developments make this shift salient. First, carbon accounting has moved from a reputational concern to an explicit budget constraint: industrial firms increasingly face internal carbon shadow prices, sectoral emissions trading exposure, and procurement rules that treat compute energy as a priced input. Second, connectivity for mission-critical IIoT—private 5G, Wi‑Fi 7, time-sensitive networking, and shared backhaul—has become sufficiently loaded that network usage is often priced (or rationed) via congestion-aware tariffs, slicing fees, or capacity penalties. When edge devices participate in FL, they consume both energy (local training) and network resources (uplink model updates). Carbon and congestion pricing therefore enter the FL design problem as first-order economic primitives, not after-the-fact implementation details.
This paper’s starting point is that IIoT FL is ultimately procured: a server (an OEM, platform provider, or plant operator) solicits timely, useful updates from a population of heterogeneous devices. What the server cares about is not “accuracy in the limit” but steady operational value—predictive maintenance quality, anomaly detection, process control robustness—delivered under tight timeliness constraints. In this environment, stale information is costly even if it is statistically “good.” This is why age-of-information (AoI) and latency concepts, originally developed for communication systems, have become central to the economics of IIoT learning. A model update arriving late can be worse than no update at all if it causes the global model to lag behind fast-moving machine states. Conversely, excessive communication to avoid staleness can overload the network and create the very latency the system seeks to prevent.
These realities motivate a simple but economically sharp abstraction: each node chooses (or is instructed to choose) how often to communicate and how much to compute per communication. The former is summarized by an update frequency (or equivalently an update cycle length), while the latter is summarized by local computation intensity, such as the number of local epochs per update. This two-dimensional control captures a canonical engineering substitution in FL: we can communicate more frequently with lighter local computation (communication-heavy, “more FedSGD-like”), or communicate less frequently while doing more work locally (compute-heavy, “more FedAvg-like”). In practice, this substitution is not merely about bandwidth; it also affects stability, drift under non-iid data, and robustness to intermittent connectivity. Our goal is not to reproduce all of those dynamics, but to isolate the economic tradeoff that pricing makes unavoidable: communication-heavy regimes pay more congestion charges and potentially induce more network delay, while compute-heavy regimes pay more carbon/energy charges and risk higher device-side energy draw.
A second modeling motivation comes from how industrial systems are governed. IIoT deployments are rarely egalitarian, volunteer networks; they are managed ecosystems where participation is engineered and incentivized. Devices differ in energy intensity (hardware generation, thermal constraints, battery health), and their data are unevenly valuable (rare fault modes, critical asset classes). The server therefore faces a procurement problem: it must decide which devices should participate and at what intensities, taking prices as given. In many deployments, the server can implement these choices through posted contracts (e.g., reimbursements, service credits, or internal transfer prices) because devices are owned by the firm or operate under enforceable service-level agreements. Even when devices are not fully controlled, the direction of travel is toward verifiability via device attestation and digital twins that report energy use, duty cycles, and communication volumes. This institutional detail matters because it makes full-information benchmark allocations informative: they describe what a sophisticated operator would implement absent hidden information frictions, and they provide a yardstick for later mechanism-design extensions.
Carbon and congestion prices matter here not only because they raise costs, but because they reshape optimal mixing of compute and communication. A naïve view says: if carbon becomes expensive, do less local training; if congestion becomes expensive, communicate less. The more subtle point is that FL’s value is produced jointly by (i) update freshness and (ii) update quality. Freshness is governed by how often information is incorporated; quality is amplified by local computation, but with diminishing returns. In IIoT, diminishing returns are particularly plausible: after a small number of local epochs, additional passes over streaming or highly correlated sensor data add little incremental generalization but still consume energy. Likewise, timeliness costs are not linear in frequency: updating too rarely causes staleness, while updating too often can create congestion-induced delay and scheduling overhead. Once we recognize these curvature properties, prices do more than scale down activity—they change which margin is worth paying for.
This is where the “MEFL satisfaction” framing is useful. Rather than modeling end-to-end accuracy trajectories, we represent the per-unit-time contribution of a device as a satisfaction or value flow that increases in (effective) learning progress and decreases in timeliness costs. The server’s problem becomes a steady-state control problem: choose device intensities to maximize surplus net of priced resource use. This reframing is pragmatic. Industrial operators frequently optimize steady-state KPIs—downtime risk, false-alarm rates, latency budgets—under recurring energy and network charges, rather than solving a one-off training problem. Moreover, the steady-state view aligns with how carbon and congestion prices are applied: they are assessed per unit energy and per unit data, period after period. By matching the model’s objective to the billing and budgeting realities, we can speak directly to the “new bottleneck” facing IIoT FL procurement teams.
The engineering counterpart to this economic framing is the compute–communication substitution embodied by FedAvg-like architectures. FedAvg can be read as a response to expensive communication: increase local work per round so that fewer rounds are needed. In 2026, however, we must ask: expensive relative to what? If communication is priced but compute is cheap and clean (e.g., devices plugged into low-carbon grids), the classic FedAvg intuition strengthens. If compute is carbon-priced and devices are energy-inefficient, the same shift toward local computation becomes socially and privately costly. Put differently, the same algorithmic knob—number of local epochs—has an economic meaning that depends on carbon prices and device heterogeneity. Our framework makes this dependence explicit and yields testable comparative statics: holding everything else fixed, a higher congestion price tilts the optimal design toward fewer updates; a higher carbon price tilts it toward less local computation, and, under a natural condition, toward fewer updates as well because updates become less valuable when each update embodies less effective learning.
This pricing-centric perspective also helps interpret practical phenomena that operators increasingly report. First, “network is the bottleneck” complaints often coexist with surprisingly conservative update schedules. Our model rationalizes this: if increasing update frequency triggers congestion-induced latency (queueing, scheduling contention, retransmissions), timeliness can deteriorate even as nominal update frequency rises, making the marginal update unattractive. Second, carbon budgets can lead to an apparently paradoxical outcome: an operator may reduce local computation and also reduce communication, not because both inputs are complements technologically, but because the value of communicating a weaker update falls when local computation is curtailed. Third, heterogeneous devices will not be treated symmetrically: carbon pricing effectively penalizes energy-inefficient nodes, changing the composition of participants and potentially biasing the learned model toward devices that are “cheap to run,” a fairness and representativeness concern that practitioners increasingly flag.
We emphasize what this introduction does not claim. We do not model the full nonconvex training dynamics of deep networks, nor do we endogenize carbon and congestion prices from equilibrium in electricity or telecommunications markets. We also abstract from some engineering constraints—uplink scheduling, packet loss, intermittent connectivity—that matter in specific deployments. Our claim is narrower: even with conservative proxies for AoI/latency and a reduced-form representation of learning returns, the procurement problem already produces sharp economic predictions and policy-relevant thresholds. In particular, when the server must choose between discrete architectural regimes—communication-heavy versus compute-heavy—pricing generates a switching rule: there exists a carbon price level above which the compute-heavy regime is no longer worth its energy cost, and this threshold itself shifts with congestion pricing. This kind of threshold logic is precisely what practitioners want when they must set organization-wide defaults (“use FedAvg with EH unless carbon price exceeds X”) while retaining the ability to fine-tune per-device intensities.
Finally, the policy and practice implications are immediate. Carbon pricing, whether explicit (tax/ETS) or internal (shadow prices in capital planning), will influence not only how much edge learning occurs but how it is architected. Congestion pricing and network tariffs will analogously influence whether “communication-efficient” FL is actually efficient in an economic sense. Our model provides a structured way to translate prices into design parameters—update cycles and local epochs—and thus into implementable procurement guidelines. It also clarifies where additional institutional design is needed: if firms care about representativeness or resilience, they may need mechanisms that counteract the selection effects induced by pricing. The remainder of the paper formalizes these tradeoffs, derives the comparative statics that connect prices to optimal FL intensities, and then uses a discrete architecture restriction to make the compute–communication substitution and its regime-switching implications transparent.
Work at the intersection of IIoT communications, federated learning, and economic design has grown rapidly, but it is still fragmented across communities that often treat “timeliness,” “communication efficiency,” and “energy/carbon cost” as separate engineering objectives. Our contribution is to place these elements in a single procurement-oriented surplus maximization problem in which carbon and congestion prices enter as explicit primitives. This section situates that move relative to four closely related literatures.
A first thread studies age-of-information (AoI), latency, and timeliness incentives in networked control and industrial analytics, and more recently in IIoT FL. AoI was developed to formalize how “fresh” a received signal is, with a central insight that the welfare-relevant notion of timeliness is not simply throughput but the distribution of staleness over time. In IIoT settings—predictive maintenance, anomaly detection, and process control—the economic loss from stale information can be highly convex: an update that arrives “a bit late” may be nearly useless if the underlying machine state evolves quickly. The communications literature has therefore emphasized scheduling, sampling, and queueing policies that minimize AoI or AoI-like penalties under capacity constraints (e.g., sampling policies, peak AoI vs average AoI, and variants that incorporate service times and congestion). Parallel work in learning has adopted timeliness-aware objectives for FL—sometimes under labels such as “mobile/edge FL with AoI,” “timeliness-aware FL,” or “MEFL”—where the goal is to prioritize clients with fresher or more relevant updates and to account for straggling and intermittent connectivity. What we take from this literature is not a particular AoI formula, but the modeling discipline: timeliness is a first-order performance dimension, and it interacts nonlinearly with update frequency because higher update rates can themselves create queueing delay. Where our approach differs is that we treat timeliness as an economic cost that enters the server’s objective alongside priced resource usage, rather than as a standalone network metric to be minimized subject to fixed learning requirements. This shift matters in procurement environments because the server’s choice is rarely “minimize AoI at any cost”; it is “how much timeliness is worth paying for” once energy and network usage are budgeted and priced.
A second related line concerns communication-efficient FL and the compute–communication substitution embodied by FedAvg and its descendants. In the FL literature, FedAvg is commonly motivated by limited uplink capacity or device participation constraints: do more local SGD steps per round to reduce the number of rounds, possibly combined with compression, sparsification, quantization, partial participation, and asynchronous protocols. A large theory and systems literature studies how local steps affect convergence rates, stability under heterogeneous data, and sensitivity to client drift; empirically, practitioners tune local epochs as a systems knob trading communication rounds against local computation and potential accuracy degradation. Our model is deliberately agnostic about detailed convergence dynamics, but it is tightly aligned with the core engineering substitution: local computation can increase the “effective quality” of an update, while communication frequency determines how quickly the global state incorporates new information. We differ from much of the communication-efficient FL literature in two ways. First, we make the substitution priced: each update carries an explicit network cost, and each additional local epoch carries an explicit energy/carbon cost. Second, we emphasize that the best choice of “communication-efficient” architecture is not invariant; it depends on the relative shadow prices and on curvature—diminishing returns to extra local computation and increasing marginal timeliness loss from aggressive update rates. In practice, FedAvg-like defaults are often justified by bandwidth limitations alone; our framework clarifies when those defaults remain optimal once carbon accounting is added, and when an operator should instead shift toward fewer local epochs (even if it increases communication) because the marginal carbon cost dominates.
A third literature studies green and cost-aware edge AI, including energy-efficient training/inference, carbon-aware scheduling, and “sustainable ML.” This work ranges from device-level power modeling and DVFS-based energy control, to data-center carbon-aware load shifting, to system-wide accounting frameworks that translate energy use into carbon using grid intensity. In edge AI and IIoT, the focus is often on prolonging battery life, respecting thermal constraints, or meeting device duty-cycle budgets, sometimes with multi-objective optimization that trades accuracy against energy. More recent work also considers carbon pricing or carbon budgets, either explicitly (as a monetary penalty) or implicitly (as a constraint). We build on the same premise—that energy is not free and increasingly is not even “just a technical constraint”—but we embed it into a surplus maximization problem that also values timeliness. This matters because “green” policies implemented inside firms are frequently expressed as internal shadow prices applied uniformly across business units, while network charges are implemented as tariffs or capacity penalties; both are naturally modeled as per-unit prices. Our model is therefore closer in spirit to cost-accounting and procurement practice than to purely engineering-centric energy minimization. At the same time, we acknowledge a limitation relative to the green AI literature: we do not attempt to model the physical power curve of devices, thermal throttling, or time-varying grid carbon intensity. Those features can be layered in later, but our goal in the baseline is to isolate the economic comparative statics that already arise under linear per-epoch energy costs and a reduced-form learning return.
A fourth foundational thread comes from congestion pricing and queueing economics, which provides the conceptual backbone for treating network load as a priced externality. Classic results in transport and telecommunications show that congestion creates a wedge between private and social marginal costs, motivating Pigouvian tolls or priority pricing. In packet networks and wireless systems, congestion manifests through queueing delays, contention, retransmissions, and scheduling overhead—effects that are often negligible at low load but rise sharply near capacity. The queueing literature supplies both the analytical objects (latency as a function of arrival rates) and the economic logic (pricing to internalize congestion). In the IIoT FL setting, the “arrival rate” is induced by the server’s chosen update frequencies across many devices, and the congestion cost is experienced as increased end-to-end latency and/or explicit usage charges. Our reduced-form latency proxy is meant to capture this increasing marginal delay without committing to a specific queueing model or MAC-layer protocol. The point of contact with congestion pricing is twofold: (i) it justifies treating per-update communication as carrying a marginal social cost beyond the device’s own energy use, and (ii) it highlights that pushing frequency too high can be self-defeating because it increases delay and therefore timeliness loss. In that sense, our framework complements purely algorithmic approaches that attempt to “speed up” FL by increasing communication: even if more frequent updates can help learning, their value must be weighed against congestion-induced latency and priced network costs.
Across these literatures, our primary synthesis is to treat IIoT FL as a procurement problem with two priced inputs—energy/carbon and network usage—producing a value flow that is increasing in effective learning progress and decreasing in timeliness loss. This perspective helps organize several practical questions that are difficult to answer within any single prior strand: when do carbon budgets push an operator toward “lighter” local training, and when does congestion pricing push toward less frequent communication? When does it become optimal to switch discretely between FedAvg-like and more communication-heavy regimes? And how do these incentives interact with heterogeneous device energy intensities in a way that changes who participates? By positioning our analysis relative to AoI/MEFL, communication-efficient FL, green edge AI, and congestion pricing, we aim to provide a common economic language that is directly usable for design and policy: prices translate into optimal update cycles and local epoch choices, and into threshold rules that procurement teams can operationalize—even when the underlying learning dynamics are complex.
We now formalize the procurement environment that underlies the comparative statics in the remainder of the paper. The goal is not to reproduce a full learning-theoretic convergence analysis, but to write down a steady-state per-unit-time objective that puts (i) learning “quality” from local computation, (ii) timeliness losses from stale/slow information, and (iii) priced resource usage (carbon/energy and network congestion) into a single surplus metric that a procuring server can optimize.
We consider a server (principal) coordinating federated learning across a set of IIoT devices (agents) indexed by i ∈ ℐ, with |ℐ| = I. Each node generates model updates at a chosen cycle length θi > 0, equivalently at frequency fi ≡ 1/θi updates per unit time. We interpret fi broadly: it can represent literal periodic reporting, expected update arrivals under randomized scheduling, or an effective rate induced by partial participation. The key modeling discipline is that increasing fi improves freshness and (typically) learning progress, but it also loads the network and can increase congestion-induced delay.
In addition to deciding how often a node communicates, the server also chooses the node’s local computation intensity Ei ≥ 0, interpreted as the number of local epochs (or, more generally, the amount of local optimization) performed per communicated update. This is the canonical FedAvg lever: holding communication fixed, more local work can make each transmitted update “more informative,” but with diminishing returns and potentially higher resource cost. To capture this, we assume that the per-update effectiveness of local computation is summarized by a function ϕ(E) satisfying: ϕ(0) = 0, ϕ′(E) ≥ 0, ϕ″(E) ≤ 0, and ϕ ∈ C2. Concavity is the reduced-form way to encode that the first few local epochs may substantially improve an update, while additional epochs yield smaller incremental gains.
A central object in IIoT is the cost of operating on stale
information. Rather than modeling detailed queueing dynamics and control
losses, we use a standard AoI-inspired proxy that is analytically
transparent and captures the nonlinearity that matters for incentives.
If updates arrive periodically at rate f, the age of information follows a sawtooth that resets at each update; its time average over a cycle of length 1/f is 1/(2f). We therefore take the
baseline AoI proxy as
$$
A(f)=\frac{1}{2f}.
$$
In many IIoT settings, “timeliness” is not only about staleness from
infrequent sampling; it is also about latency from
congestion, contention, and queueing. To reflect that, we augment the
1/(2f) term with a
reduced-form congestion penalty that rises with the induced load. We
write a latency proxy
$$
L(f)=\frac{1}{2f}+\kappa f,\qquad \kappa>0,
$$
where κf is a
tractable way to represent the idea that pushing update frequency higher
can be self-defeating: it increases arrival rates, which increases delay
(or the probability of missing deadlines), which erodes the usefulness
of the updates. We do not claim κf is a literal queueing
formula; rather, it is a parsimonious approximation that ensures
increasing marginal timeliness costs at high frequencies—precisely the
force that rules out “infinite frequency” corner solutions and makes the
economics of congestion pricing visible.
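For reference, a direct calculation on the proxy alone (before any learning value or prices enter) gives the cadence that would minimize timeliness losses in isolation; the notation f_time is introduced purely for illustration:
$$
L'(f)=-\frac{1}{2f^{2}}+\kappa=0
\quad\Longrightarrow\quad
f_{\mathrm{time}}=\frac{1}{\sqrt{2\kappa}},
\qquad
L\big(f_{\mathrm{time}}\big)=\sqrt{2\kappa}.
$$
This purely timeliness-optimal cadence is only a reference point; the priced objective below moves the optimum away from it, but it already shows how a larger κ (heavier congestion-induced latency) pushes toward slower cadences.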
We weight timeliness losses by λ > 0, which converts the latency/AoI proxy into the same units as learning value. In applications, λ can be interpreted as the marginal business cost of delay/staleness: higher for fast-moving processes (tight control loops, safety monitoring) and lower for slowly evolving tasks (long-horizon condition monitoring).
The server values learning progress, but in steady state we model
this progress as a flow rather than as a finite-horizon
trajectory. Specifically, we assume that node i’s update stream contributes value
proportional to the rate of effective updates:
effective update flow from node
i ∝ ϕ(Ei) fi.
Let τ > 0 scale the
marginal value of effective learning progress. Then the per-unit-time
gross contribution from node i, net of timeliness losses but
before monetary resource costs, is
$$
G_i(f_i,E_i)=\tau\,\phi(E_i)\,f_i-\lambda\,L(f_i)
=\tau\phi(E_i)f_i-\lambda\left(\frac{1}{2f_i}+\kappa f_i\right).
$$
This term intentionally abstracts from the full dynamics of optimization
(nonconvexity, client drift, and so forth). The maintained
interpretation is that, at the margin, more frequent updates and more
effective per-update computation increase the rate at which the global
model improves (or the rate at which the deployed model tracks a
changing environment), while timeliness losses reduce the realized value
of that improvement.
Procurement environments differ from pure engineering optimization because input usage is priced—either through literal market prices/tariffs or through internal shadow prices tied to budgets and reporting. We therefore introduce two per-unit prices:
A carbon/energy price pC ≥ 0 applied to local computation. Node i has an energy intensity ei > 0 per local epoch per update. If node i performs Ei epochs per update, its energy per update is eiEi, and its monetary energy (or carbon-accounting) cost per update is pCeiEi.
A network congestion price pN ≥ 0 applied to communication. Each update has payload size m > 0 (e.g., a model delta), taken constant for tractability. The monetary network cost per update is then pNm. One can interpret pN as an access tariff set by a network operator, or as the server’s internal congestion shadow price capturing the externality that one device’s traffic imposes on others.
Because both costs are incurred per update, per-unit-time costs scale
with frequency. Thus node i’s
priced resource cost per unit time is
Ci(fi, Ei) = (pCeiEi + pNm)fi.
This linear-in-fi structure is
not an innocuous simplification: it is exactly what makes “communication
is expensive when pN is high” a
sharp statement, and it maps directly to how cloud/network bills and
carbon accounting are typically booked (usage × price).
We assume the server has quasilinear preferences and aggregates value
across nodes. Let β > 0 be
a scale factor converting satisfaction Gi into the
server’s value (or, equivalently, an importance weight on the FL task
within a broader business objective). The server’s steady-state
per-unit-time objective is
W = ∑i ∈ ℐ[β Gi(fi, Ei) − Ci(fi, Ei)].
We emphasize two interpretations of W. First, it is a social
surplus measure under the common procurement benchmark in which
transfers between the server and nodes net out, so only real resource
costs (energy and network usage) remain. Second, it is also a
reduced-form profit objective if pC and pN are literal
input prices faced by the server (possibly passed through in
contracts).
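For concreteness, the flow accounting behind W can be written as a few lines of code. The following Python sketch is purely illustrative (the function and node names are ours, and the parameter values are arbitrary placeholders); it evaluates Gi, Ci, and W for a given profile of intensities, mirroring the usage × price bookkeeping described above.

```python
import math

def phi(E):
    """Concave per-update effectiveness; log(1+E) is the example used later in the paper."""
    return math.log1p(E)

def latency_proxy(f, kappa):
    """AoI/latency proxy L(f) = 1/(2f) + kappa*f."""
    return 1.0 / (2.0 * f) + kappa * f

def node_gross_value(f, E, tau, lam, kappa):
    """G_i(f, E) = tau*phi(E)*f - lam*L(f): effective-update flow net of timeliness losses."""
    return tau * phi(E) * f - lam * latency_proxy(f, kappa)

def node_resource_cost(f, E, p_C, p_N, e, m):
    """C_i(f, E) = (p_C*e*E + p_N*m)*f: carbon-priced compute plus congestion-priced traffic."""
    return (p_C * e * E + p_N * m) * f

def fleet_surplus(profile, beta, tau, lam, kappa, p_C, p_N, m):
    """W = sum_i [beta*G_i - C_i], where profile maps node id -> (f_i, E_i, e_i)."""
    total = 0.0
    for f, E, e in profile.values():
        total += beta * node_gross_value(f, E, tau, lam, kappa)
        total -= node_resource_cost(f, E, p_C, p_N, e, m)
    return total

if __name__ == "__main__":
    profile = {"press_01": (0.9, 0.3, 2.0), "robot_07": (0.5, 1.0, 0.8)}  # (f_i, E_i, e_i), placeholders
    print(fleet_surplus(profile, beta=1.0, tau=4.0, lam=1.0, kappa=0.3, p_C=1.5, p_N=0.5, m=1.0))
```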
Why steady state? In practice, FL involves transient learning curves, changing participation, and time-varying network conditions. Our baseline deliberately compresses these dynamics into a stationary flow problem for three reasons. (i) In many IIoT deployments, FL is not a one-shot training run but an ongoing process of adaptation, making per-unit-time welfare a natural primitive. (ii) The key policy instruments we want to interpret—carbon prices, congestion tariffs—are themselves typically expressed per unit energy and per unit data, i.e., in flow terms. (iii) The comparative statics of interest (how optimal Ei and fi move with pC and pN) already appear in this stationary formulation; adding richer learning dynamics is valuable, but it should not be necessary to see the basic economic logic.
At the same time, we acknowledge what this abstraction leaves out. The function ϕ(E) subsumes complex algorithmic effects: client drift under many local steps, heterogeneity in data quality, and nonstationary environments. The payload m can in reality depend on compression or sparsification choices. And the latency proxy L(f) is not derived from a specific queueing discipline. We view these as extensions rather than flaws: the baseline is meant to be a transparent mapping from prices to operating choices (update cycles and local epochs). The next section takes this objective as given and solves the per-node problem under full information, providing closed-form solutions for common ϕ and clean threshold rules that can be operationalized in procurement settings.
We begin with the benchmark in which the server observes each node’s
energy intensity ei and can
directly implement (or contract on) the pair (fi, Ei).
Because the objective is additively separable across nodes, the server’s
problem decomposes into I
independent per-node optimizations. Fix a node i and suppress the subscript when no
confusion arises. The per-node program is
$$
\max_{f\ge 0,\;E\ge 0}\; w(f,E)
=\beta\Big(\tau\phi(E)f-\lambda\Big(\frac{1}{2f}+\kappa
f\Big)\Big)-\big(p_C e\,E+p_N m\big)f.
$$
The structure is economically transparent: f scales both the benefits from “effective updates” and the per-update monetary costs, while the timeliness term disciplines both corners. Its staleness component λ/(2f) blows up as f → 0, ruling out “never update,” and its congestion component λκf, together with the priced per-update costs, makes ever-higher frequency unattractive whenever those per-update costs exceed the per-update benefit βτϕ(E).
Consider first an interior candidate with f > 0 and E > 0. The FOC with respect to
local computation E is
$$
\frac{\partial w}{\partial E}
=\beta\tau\,\phi'(E)\,f - p_C e\,f
= f\big(\beta\tau\phi'(E)-p_C e\big)=0,
$$
so any interior optimum must satisfy the simple marginal condition
$$
\boxed{\;\beta\tau\phi'(E^*)=p_C e.\;}
\tag{E-FOC}
$$
Two points are worth highlighting. First, f factors out, so E* (when interior) is
pinned down solely by the tradeoff between the marginal learning value
of an extra local epoch and its carbon-priced energy cost. Second,
because ϕ is concave, ϕ′ is weakly decreasing,
so the equation has at most one solution; if it fails (because pCe is
too large), the optimum is at the boundary E* = 0.
The FOC with respect to frequency f is
$$
\frac{\partial w}{\partial f}
=\beta\tau\phi(E)-\beta\lambda\kappa-\big(p_C eE+p_N m\big)
+\frac{\beta\lambda}{2f^2}=0.
$$
Rearranging yields
$$
\frac{\beta\lambda}{2f^{2}}
=\big(p_C eE+p_N m\big)+\beta\lambda\kappa-\beta\tau\phi(E).
\tag{f-FOC}
$$
Whenever the right-hand side is strictly positive, we obtain the
closed-form interior frequency
$$
\boxed{\;
f^*(E)
=\sqrt{\frac{\beta\lambda}{2\Big[\big(p_C eE+p_N
m\big)+\beta\lambda\kappa-\beta\tau\phi(E)\Big]}}
\;}.
\tag{1}
$$
If the bracketed term is nonpositive, the per-node objective is increasing in f: the net linear-in-f benefit per update is nonnegative while the −βλ/(2f) penalty vanishes as f grows, so no interior optimizer exists. In that case the solution is a
boundary choice determined by feasibility (e.g., an exogenous maximum
feasible update rate, or a network-imposed cap). In the remainder, we
focus on the economically relevant region where parameters imply the
bracketed term is positive at the optimal E, so that a finite f* exists.
Putting the two FOCs together gives a convenient solution procedure: (i) solve for E* from (E-FOC) with a nonnegativity truncation, then (ii) plug E* into (1) to obtain f*.
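The two-step procedure can be implemented numerically for any concave ϕ. The sketch below is ours and is only a sketch under the stated assumptions: it brackets the root of (E-FOC) with SciPy's brentq, applies the nonnegativity truncation, and then evaluates (1), returning None when the bracketed term is nonpositive (the boundary case discussed above). The example parameters are placeholders.

```python
import math
from scipy.optimize import brentq

def solve_E_star(dphi, beta, tau, p_C, e, E_max=1e6):
    """Solve beta*tau*phi'(E) = p_C*e (E-FOC), truncated at E = 0."""
    marginal = lambda E: beta * tau * dphi(E) - p_C * e
    if marginal(0.0) <= 0.0:      # even the first epoch is not worth its carbon-priced energy
        return 0.0
    if marginal(E_max) > 0.0:     # diminishing returns never bind on [0, E_max]
        return E_max
    return brentq(marginal, 0.0, E_max)

def solve_f_star(E, phi, beta, tau, lam, kappa, p_C, p_N, e, m):
    """Evaluate the interior frequency (1); return None when no interior optimum exists."""
    D = (p_C * e * E + p_N * m) + beta * lam * kappa - beta * tau * phi(E)
    if D <= 0.0:
        return None               # objective increasing in f: a feasibility cap would bind instead
    return math.sqrt(beta * lam / (2.0 * D))

if __name__ == "__main__":
    phi = lambda E: math.log1p(E)       # the closed-form example used below
    dphi = lambda E: 1.0 / (1.0 + E)
    pars = dict(beta=1.0, tau=4.0, lam=1.0, kappa=0.3, p_C=1.5, p_N=0.5, e=2.0, m=1.0)
    E_star = solve_E_star(dphi, pars["beta"], pars["tau"], pars["p_C"], pars["e"])
    print(E_star, solve_f_star(E_star, phi, **pars))   # ~0.33 and ~0.88 with these placeholders
```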
The per-node objective is strictly concave in f on f > 0 because
$$
\frac{\partial^2 w}{\partial f^2}=-\frac{\beta\lambda}{f^3}<0.
$$
It is concave in E because ϕ is concave and the only other E-dependence, the priced energy term (pCeE)f, enters linearly:
$$
\frac{\partial^2 w}{\partial E^2}=\beta\tau\phi''(E)\,f\le 0.
$$
Intuitively, concavity in E comes from diminishing returns to local epochs, while strict concavity in f is generated by the staleness term −βλ/(2f); the linear congestion penalty −βλκf adds no curvature, but it raises the marginal cost of frequency. The role of κ > 0 is therefore practical as well as technical: it ensures that “more frequent reporting” eventually becomes unattractive because it creates delay/overload, thereby ruling out knife-edge parameter cases in which the best response is to push communication to the maximum possible rate.
These margin-by-margin properties already pin down the optimum: because f factors out of the E-FOC, the maximizing E is the same E* for every f > 0, and w(f, E*) is strictly concave in f on f > 0. Hence the optimizer (f*, E*) is unique up to boundary truncation, and the FOCs above are not merely necessary but (with the usual KKT conditions) sufficient for optimality.
To make the comparative statics operational, it is useful to
specialize to a standard effectiveness function. Let
$$
\phi(E)=\log(1+E),\qquad \phi'(E)=\frac{1}{1+E}.
$$
Then the interior condition βτϕ′(E) = pCe
becomes
$$
\frac{\beta\tau}{1+E^*}=p_C e
\quad\Rightarrow\quad
E^*=\frac{\beta\tau}{p_C e}-1.
$$
Imposing E ≥ 0 yields the
truncated closed form
$$
\boxed{\;
E^*(p_C)=\max\Big\{\frac{\beta\tau}{p_C e}-1,\;0\Big\}.
\;}
\tag{2}
$$
This expression already captures the core “procurement” message: the
server substitutes away from local computation as carbon becomes
expensive, and low-efficiency devices (high e) are assigned systematically fewer
local epochs.
Given E*, the
frequency expression (1) becomes
$$
\boxed{\;
f^*(p_C,p_N)
=\sqrt{\frac{\beta\lambda}{2\Big[\big(p_C eE^*(p_C)+p_N
m\big)+\beta\lambda\kappa-\beta\tau\log(1+E^*(p_C))\Big]}}
\;}
\tag{3}
$$
provided the denominator is positive. Two special cases help
interpretation:
High carbon price (or inefficient device): If
pCe ≥ βτ,
then E* = 0
and
$$
f^*=\sqrt{\frac{\beta\lambda}{2\big[p_N m+\beta\lambda\kappa\big]}}.
$$
Communication is then chosen purely to balance timeliness gains against
congestion pricing and congestion-induced latency; the device does no
extra local work because its carbon-priced compute is too
costly.
Low carbon price: If pCe < βτ, then E* > 0 and both the value and the cost per update shift with pC. Equation (3) makes clear that carbon pricing affects f* only through the net per-update term βτϕ(E*) − (pCeE*), which will be the object driving the comparative statics in the next section.
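For reference, the closed forms (2) and (3), including the truncation at E* = 0 and the positivity check on the denominator, translate into a short routine of the kind a procurement tool could call directly. The sketch is ours and the numbers in the example are arbitrary placeholders.

```python
import math

def E_star_log(p_C, beta, tau, e):
    """Equation (2): E*(p_C) = max{beta*tau/(p_C*e) - 1, 0} for phi(E) = log(1+E)."""
    return max(beta * tau / (p_C * e) - 1.0, 0.0)

def f_star_log(p_C, p_N, beta, tau, lam, kappa, e, m):
    """Equation (3): interior update frequency, or None when the denominator is nonpositive."""
    E = E_star_log(p_C, beta, tau, e)
    denom = (p_C * e * E + p_N * m) + beta * lam * kappa - beta * tau * math.log1p(E)
    if denom <= 0.0:
        return None   # no interior optimum; an exogenous cap on f would bind instead
    return math.sqrt(beta * lam / (2.0 * denom))

if __name__ == "__main__":
    pars = dict(beta=1.0, tau=4.0, lam=1.0, kappa=0.3, e=2.0, m=1.0)
    # Low carbon price: interior E* > 0, and p_C affects f* through the net per-update term.
    print(E_star_log(1.5, 1.0, 4.0, 2.0), f_star_log(1.5, 0.5, **pars))
    # High carbon price (p_C*e >= beta*tau): E* = 0 and f* balances timeliness against congestion only.
    print(E_star_log(5.0, 1.0, 4.0, 2.0), f_star_log(5.0, 0.5, **pars))
```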
Because our benchmark assumes full information and quasilinear
payoffs, the allocation (fi*, Ei*)
can be implemented by a simple posted contract for each node: reimburse
verifiable resource usage at the prevailing prices (or shadow prices)
and add a fixed transfer to satisfy participation. Concretely, a
contract can specify a required (fi, Ei),
verify performance via telemetry/attestation, and pay
Ti = (pCeiEi + pNm)fi + (fixed
fee),
so the node is indifferent over the mandated action while the server
internalizes the true resource costs in its choice of (fi, Ei).
This benchmark is useful not because it describes every deployment, but
because it cleanly separates (i) the economic logic of carbon and
congestion pricing from (ii) informational frictions that we can layer
on later via menus and self-selection.
In sum, the baseline delivers a unique per-node prescription: choose Ei* by equating marginal learning value to marginal carbon-priced energy cost, then choose fi* to trade off timeliness benefits against both monetary per-update costs and congestion-induced latency. The next section uses these characterizations to derive monotonicity results and the regime-switching thresholds that are most directly interpretable for policy and system design.
Our first-order characterization delivers clean monotonicity predictions once we treat (pC, pN) as policy-relevant shadow prices. The guiding intuition is standard in procurement: the server substitutes away from inputs whose unit prices rise, but here substitution is mediated by a timeliness externality—higher frequency reduces staleness (AoI) yet increases congestion-induced delay through κf. This section formalizes three comparative statics: (i) carbon pricing reduces local computation intensity E*; (ii) congestion pricing reduces update frequency f*; and (iii) carbon pricing also weakly reduces f* once we account for the fact that the “effective value per update” is itself optimized over E.
Start from the interior epoch condition (E-FOC),
βτϕ′(E*) = pCe.
Because ϕ is concave, ϕ′ is weakly decreasing,
so the mapping pC ↦ E*(pC)
is (weakly) decreasing. The formal derivative follows from the implicit
function theorem on the interior region where E* > 0 and ϕ″(E*) < 0:
$$
\beta\tau \phi''(E^*)\frac{\partial E^*}{\partial p_C}=e
\quad\Rightarrow\quad
\frac{\partial E^*}{\partial p_C}=\frac{e}{\beta\tau\phi''(E^*)}\le 0.
$$
At high carbon prices the interior equation may cease to have a
nonnegative solution; then the KKT conditions imply a boundary choice
E* = 0. Thus E*(pC)
is globally weakly decreasing: it falls smoothly when interior and
becomes flat at zero once carbon-priced compute is sufficiently
expensive.
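As a consistency check under the log specification ϕ(E) = log(1 + E) used above, the implicit-function derivative agrees with what one obtains by differentiating the closed form (2) directly on the interior region:
$$
E^*=\frac{\beta\tau}{p_C e}-1
\;\Longrightarrow\;
\frac{\partial E^*}{\partial p_C}=-\frac{\beta\tau}{p_C^{2}e},
\qquad
\frac{e}{\beta\tau\,\phi''(E^*)}
=-\frac{e\,(1+E^*)^{2}}{\beta\tau}
=-\frac{\beta\tau}{p_C^{2}e},
$$
where the last equality uses 1 + E* = βτ/(pCe).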
Two economic implications are immediate. First, heterogeneity in energy intensity ei acts exactly like heterogeneous effective carbon exposure: devices with high ei are assigned fewer local epochs, even under full information, because their marginal epoch is “taxed” more heavily. Second, in the baseline model E* is independent of the congestion price pN, reflecting separability: pN prices communication per update, while the epoch decision trades off only marginal learning returns βτϕ′(E) against marginal carbon cost pCe. (In richer formulations where accuracy per unit time must meet a target, we would expect substitution from communication toward compute when pN rises; we return to this idea in the discrete regime model.)
Next consider the frequency decision. Holding E fixed, the interior solution takes
the closed form (1),
$$
f^*(E)=\sqrt{\frac{\beta\lambda}{2D(E;p_C,p_N)}},\qquad
D(E;p_C,p_N)\equiv \big(p_C eE+p_N
m\big)+\beta\lambda\kappa-\beta\tau\phi(E),
$$
defined when D > 0.
Differentiating with respect to pN at an
interior solution yields a strict monotonicity result whenever m > 0:
$$
\frac{\partial f^*}{\partial p_N}
=-\frac{1}{2}\sqrt{\frac{\beta\lambda}{2}}\,D^{-3/2}\cdot m
= -\frac{m}{2D}\,f^*<0.
$$
Thus congestion pricing reduces the efficient update rate: as each
transmission becomes monetarily more expensive, the server economizes on
communication by increasing the update cycle length θ = 1/f.
This monotonicity has a straightforward “Pigouvian” interpretation. The term pNm can be read either as an explicit price charged by a network operator or as the server’s own shadow price for network capacity (e.g., an internal congestion cost). In either case, a higher pN shifts up the per-update marginal cost of frequency, and the optimal response is to reduce f, especially when m (payload size) is large.
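For reporting purposes it is sometimes convenient to restate this response as a dimensionless elasticity, which follows immediately from the derivative above:
$$
\frac{\partial \ln f^*}{\partial \ln p_N}=-\frac{p_N m}{2D},
$$
so the proportional decline in the optimal cadence equals half the ratio of the per-update congestion charge pNm to the wedge D.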
Carbon pricing enters the frequency condition both directly (through the term pCeE) and indirectly (through the induced reduction in E*). A convenient way to organize the effect is to separate the “per-update net value of compute” from the purely frequency-driven timeliness tradeoff.
Define the optimized per-update net learning value (in utility units)
as
Ψ(pC) ≡ maxE ≥ 0{βτϕ(E) − pCeE} = βτϕ(E*(pC)) − pCeE*(pC).
Then the denominator in the frequency formula can be rewritten at the
epoch optimum as
D(E*(pC); pC, pN) = pNm + βλκ − Ψ(pC),
so that, whenever the interior f* exists,
$$
f^*(p_C,p_N)
=\sqrt{\frac{\beta\lambda}{2\big[p_N
m+\beta\lambda\kappa-\Psi(p_C)\big]}}.
\tag{CS-f}
$$
This expression makes the economics transparent: communication is more
frequent when the per-update net value Ψ(pC)
is high (updates are “worth more”), and less frequent when either
congestion costs (pNm)
or congestion-induced latency (βλκ) are
high.
The remaining question is how Ψ(pC)
moves with pC. Under our
maintained assumptions, Ψ is
weakly decreasing in pC; indeed,
where the epoch choice is interior, the envelope theorem gives an
especially simple derivative:
$$
\frac{d\Psi}{dp_C}
=\frac{\partial}{\partial p_C}\Big(\beta\tau\phi(E^*)-p_C eE^*\Big)
=-eE^*(p_C)\le 0.
$$
Intuitively, when the carbon price rises, the server optimally trims
epochs, but even after this adjustment the best achievable
per-update net learning value falls—by exactly the current energy
exposure eE*. Combining
this with (CS-f), we obtain
$$
\frac{\partial f^*}{\partial p_C}
=\frac{f^*}{2D}\,\frac{d\Psi}{dp_C}
= -\frac{eE^*(p_C)}{2D}\,f^*
\le 0,
$$
with strict inequality whenever E* > 0 and the
interior f*
solution applies. When carbon pricing is high enough that E* = 0, Ψ(pC) = 0
locally and ∂f*/∂pC = 0:
carbon policy no longer distorts frequency because the server has
already shut down discretionary local computation.
This result is best read as an interaction between input substitution and a timeliness externality. Carbon pricing directly discourages compute (lower E*), which mechanically makes each update less effective; because frequency is chosen to balance timeliness gains against per-update costs, lower per-update effectiveness reduces the incentive to communicate frequently. The κf term is central here: it ensures that frequency is not “free” in latency terms, so a decline in per-update value translates into a lower optimal cadence rather than an attempt to compensate by updating ever more often.
Finally, we note a practical corollary that will matter for architecture choice. Since pN moves f* directly while pC moves both E* and f*, the joint policy environment (pC, pN) shapes whether the system should rely on “communication-heavy” learning (high f, low E) or “compute-heavy” learning (lower f, higher E per update). The next section makes this tradeoff explicit by restricting E to two discrete regimes and deriving the carbon threshold pC†(pN) at which the server switches architectures.
The continuous-choice benchmark is useful for comparative statics,
but it overstates the server’s flexibility. In most IIoT FL deployments
the “architecture” is implemented by protocol and software defaults (or
certification constraints) that effectively pin local computation to a
small set of admissible epoch regimes. A natural stylization is
therefore to restrict the epoch choice to two discrete levels,
E ∈ {EL, EH}, 0 ≤ EL < EH,
where EL
represents a communication-heavy design (many rounds, little local work
per round) and EH represents a
compute-heavy, FedAvg-like design (more local work per round).
With E discrete, the server’s problem becomes a two-step program: for each regime r ∈ {L, H}, (i) choose the optimal update frequency f given E = Er, and then (ii) pick the regime delivering the higher optimized per-node value. We can then ask a policy-relevant question: for given congestion pricing pN, at what carbon price pC does the server switch from compute-heavy to communication-heavy?
Fix a node i and suppress
the index to simplify notation. For a given epoch regime Er, the
per-unit-time objective reduces to a one-dimensional problem
Vr(pC, pN) ≡ maxf > 0 w(f, Er),
with
$$
w(f,E_r)
=\beta\Big(\tau\phi(E_r)f-\lambda\big(\tfrac{1}{2f}+\kappa f\big)\Big)
-\big(p_C eE_r+p_N m\big)f.
$$
As in the interior analysis, define the “effective per-unit-frequency
wedge”
Dr(pC, pN) ≡ (pCeEr + pNm) + βλκ − βτϕ(Er).
Whenever Dr > 0, the
objective is strictly concave in f and the unique interior maximizer
is
$$
f_r^*(p_C,p_N)
=\sqrt{\frac{\beta\lambda}{2D_r(p_C,p_N)}}.
$$
Intuitively, Dr aggregates
all terms that make frequent communication unattractive in regime r: carbon-priced compute per unit
time (pCeEr)f,
congestion-priced traffic (pNm)f,
and congestion-induced latency βλκf, net
of the learning value βτϕ(Er)f.
A higher Er raises both
the carbon exposure pCeEr
and the per-update value βτϕ(Er);
which force dominates determines whether the compute-heavy regime
induces a higher or lower optimal cadence.
Plugging the optimizer back into w yields the regime value (for Dr > 0):
$$
V_r(p_C,p_N)
=w\big(f_r^*(p_C,p_N),E_r\big)
=-\sqrt{2\beta\lambda\,D_r(p_C,p_N)}.
$$
We emphasize that, in this reduced-form steady-state proxy, Vr is strictly
decreasing in Dr: any force
that raises the effective wedge against frequent updating reduces the
optimized value within that regime.
The server prefers the compute-heavy regime EH whenever
VH(pC, pN) ≥ VL(pC, pN).
On the interior region where both regime-specific frequency choices are
well defined, the monotonicity of Vr in Dr implies that
the comparison is equivalent to comparing wedges:
VH ≥ VL ⇔ DH(pC, pN) ≤ DL(pC, pN).
Therefore, a (candidate) switching threshold solves DH = DL,
i.e.,
pCe(EH − EL) = βτ(ϕ(EH) − ϕ(EL)).
This delivers a clean closed form:
$$
p_C^{\dagger}
=\frac{\beta\tau\big(\phi(E_H)-\phi(E_L)\big)}{e\,(E_H-E_L)}.
$$
The economic content is straightforward: the server switches away from
compute-heavy training once the carbon price makes the marginal
carbon cost of moving from EL to EH exceed the
corresponding marginal learning gain, measured in the same
utility units.
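A worked number makes the threshold concrete. Under the log specification and with illustrative regimes E_L = 1 and E_H = 5 (chosen purely as an example, not calibrated):
$$
p_C^{\dagger}
=\frac{\beta\tau\big(\log 6-\log 2\big)}{e\,(5-1)}
=\frac{\beta\tau\,\log 3}{4e}
\approx \frac{0.275\,\beta\tau}{e},
$$
so, for instance, doubling a node’s energy intensity e halves the carbon price at which the compute-heavy regime stops being worthwhile for that node.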
At the same time, this knife-edge clarity reveals a limitation: under the maintained baseline in which (i) the communication payload m is the same across regimes and (ii) congestion enters only through the linear per-update term (pNm)f (and the same κ applies across regimes), the congestion price pN cancels out of the regime comparison. Congestion pricing still depresses VH and VL by lowering fr*, but it does so in a way that does not tilt the ranking between EH and EL on the interior region.
This is exactly the sense in which the two-regime exercise is informative: it isolates which modeling ingredients are needed for congestion to shift architectural choice, rather than merely scaling down communication intensity.
In operational FL systems, regimes typically differ not only in local
epochs but also in the network footprint per “round”—through protocol
overhead, compression choices, error-feedback state, or richer metadata
in compute-heavy designs. A parsimonious way to capture this is to allow
the payload to be regime-dependent, mr (with mH not
necessarily equal to mL). Repeating
the analysis with mr yields
Dr(pC, pN) = (pCeEr + pNmr) + βλκ − βτϕ(Er),
and the indifference condition DH = DL
becomes
$$
p_C^{\dagger}(p_N)
=\frac{\beta\tau\big(\phi(E_H)-\phi(E_L)\big)-p_N\,(m_H-m_L)}{e\,(E_H-E_L)}.
$$
This expression makes the congestion-shift channel transparent:
$$
\frac{\partial p_C^{\dagger}}{\partial p_N}
=-\frac{m_H-m_L}{e\,(E_H-E_L)}.
$$
Hence, pC†(pN)
is weakly decreasing in pN whenever the
compute-heavy regime carries a weakly larger network footprint per
update, mH ≥ mL.
Economically, if compute-heavy training is also “heavier” on the
network (for instance, because it transmits a less-compressed model
delta, additional state, or higher-precision updates to capitalize on
more local work), then congestion pricing penalizes the compute-heavy
regime disproportionately, so the server requires cheaper
carbon (a lower pC) to justify
it.
Conversely, if mH < mL—the case one might associate with aggressive compression or fewer bytes per effective unit of learning under FedAvg-like designs—then congestion makes compute-heavy training more attractive and the threshold increases. We can therefore read the sign of ∂pC†/∂pN as an empirical question about which regime is more “bytes intensive” per update, holding fixed the update definition.
A second, closely related way congestion can shift the threshold is through feasibility/boundary effects. Even if m is identical across regimes, large pN can push one regime into a corner where the interior expression for fr* no longer applies (e.g., due to protocol-imposed bounds on admissible cadences or because the implied optimizer would require an unrealistically small/large θ). Once one regime is effectively constrained while the other remains interior, the regime ranking inherits a dependence on pN through which constraint binds, and the switching set generically tilts with congestion.
Taken together, these considerations justify treating the threshold as a function pC†(pN) in applied work: congestion affects architectural choice precisely to the extent that architectures differ in their induced network footprint or in which operating constraints become active under high congestion. This is also the sense in which policy instruments interact: carbon pricing primarily rotates the compute margin, while congestion pricing (or capacity scarcity) rotates the communication margin, and the resulting architecture choice depends on how protocols map “more local work” into transmitted bytes and admissible frequencies.
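The regime comparison and the threshold pC†(pN) with regime-dependent payloads reduce to a few lines of code of the kind a procurement policy engine could embed. The sketch below is ours; the regime levels, payloads, and prices in the example are illustrative placeholders, and the comparison-by-wedges shortcut presumes both regimes admit interior frequencies (Dr > 0).

```python
import math

def wedge(E, m_r, p_C, p_N, beta, tau, lam, kappa, e, phi=math.log1p):
    """D_r = (p_C*e*E_r + p_N*m_r) + beta*lam*kappa - beta*tau*phi(E_r)."""
    return (p_C * e * E + p_N * m_r) + beta * lam * kappa - beta * tau * phi(E)

def carbon_threshold(p_N, E_L, E_H, m_L, m_H, beta, tau, e, phi=math.log1p):
    """Carbon price at which D_H = D_L; above it, the communication-heavy regime is preferred."""
    return (beta * tau * (phi(E_H) - phi(E_L)) - p_N * (m_H - m_L)) / (e * (E_H - E_L))

def choose_regime(p_C, p_N, E_L, E_H, m_L, m_H, beta, tau, lam, kappa, e):
    """Pick the regime with the smaller wedge (the larger V_r), assuming both wedges are positive."""
    D_L = wedge(E_L, m_L, p_C, p_N, beta, tau, lam, kappa, e)
    D_H = wedge(E_H, m_H, p_C, p_N, beta, tau, lam, kappa, e)
    return "compute-heavy (E_H)" if D_H <= D_L else "communication-heavy (E_L)"

if __name__ == "__main__":
    pars = dict(beta=1.0, tau=4.0, lam=1.0, kappa=3.0, e=2.0)
    regimes = dict(E_L=1.0, E_H=5.0, m_L=1.0, m_H=1.5)  # compute-heavy regime also heavier on the network
    print(carbon_threshold(0.0, **regimes, beta=1.0, tau=4.0, e=2.0))  # ~0.55
    print(carbon_threshold(1.0, **regimes, beta=1.0, tau=4.0, e=2.0))  # ~0.49: threshold falls with p_N
    print(choose_regime(0.4, 1.0, **regimes, **pars))                  # below threshold: compute-heavy
    print(choose_regime(0.6, 1.0, **regimes, **pars))                  # above threshold: communication-heavy
```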
The discrete-regime exercise treats a representative node. In practice, IIoT fleets are heterogeneous along two economically central dimensions: (i) energy intensity ei (older devices, weaker accelerators, or harsher thermal envelopes require more Joules per local epoch), and (ii) learning productivity/value (some nodes observe richer data, higher variance states, or more decision-relevant operating regimes). We capture the second dimension with a multiplicative shifter vi > 0 in the learning term, so that node i’s per-unit-time contribution becomes τviϕ(Ei)fi.
With this heterogeneity, the server’s per-node surplus (still in
steady state, ignoring transfers) is
$$
w_i(f_i,E_i;v_i,e_i)
=\beta\Big(\tau v_i\phi(E_i)f_i-\lambda\big(\tfrac{1}{2f_i}+\kappa
f_i\big)\Big)
-\big(p_C e_i E_i+p_N m\big)f_i,
$$
and the server may also choose a participation set 𝒮 ⊆ ℐ (equivalently, set fi = 0 by not
enrolling node i at all, so
that no AoI/latency terms are incurred for that node).
Conditional on including node i, the interior FOCs take the same
form as before, with τ
replaced by τvi. In
particular, the local-epoch choice equates marginal learning returns to
marginal carbon-priced compute:
βτvi ϕ′(Ei*) = pCei,
so Ei*
is increasing in vi and
decreasing in (pC, ei).
The update-frequency choice can be written compactly by defining the
net per-update learning surplus from compute (measured in the
server’s utility units),
ψi(pC; vi, ei) ≡ βτviϕ(Ei*) − pCeiEi*,
so that the frequency FOC becomes
$$
\frac{\beta\lambda}{2(f_i^*)^2}
=\big(p_N m+\beta\lambda\kappa\big)-\psi_i(p_C;v_i,e_i),
$$
whenever the right-hand side is positive (otherwise the interior
expression breaks and a boundary/feasibility constraint is needed). This
representation makes the selection logic transparent: nodes with high
vi and low
ei
generate larger ψi, which
relaxes the “wedge” against frequent updating created by congestion
pricing pN
and congestion-latency κ.
Let the server’s outside option from excluding node i be normalized to 0. Then the server includes node i iff its optimized per-node surplus
is nonnegative:
i ∈ 𝒮 ⇔ maxf ≥ 0, E ≥ 0 wi(f, E; vi, ei) ≥ 0.
This “participation” formulation is the natural full-information
benchmark: with quasilinear node utilities and verifiable costs, the
server can always implement (fi*, Ei*)
through a cost-reimbursing contract plus (if needed) a fixed transfer
that satisfies individual rationality. In richer environments (private
ei,
private vi, limited
verification), the right-hand side becomes a virtual surplus
and participation becomes a screening problem; we return to this only
briefly below.
Two comparative-static implications follow immediately. First, increasing either price pC or pN weakly shrinks 𝒮, because higher prices lower the maximized value of wi. Second, selection becomes more stringent for energy-inefficient nodes as carbon prices rise: even if such nodes remain useful at low pC, they are “priced out” earlier.
To make the cutoff operational, it is useful to exhibit an explicit
index in a canonical concave form. Take
ϕ(E) = log (1 + E),
so $\phi'(E)=\frac{1}{1+E}$. The epoch
FOC yields
$$
E_i^*(p_C)=\max\Big\{\frac{\beta\tau v_i}{p_C e_i}-1,\;0\Big\}.
$$
Thus the compute decision depends on the value per unit energy
ratio vi/ei
and the carbon price pC. When Ei* > 0,
we can also compute the net per-update surplus term:
$$
\psi_i(p_C;v_i,e_i)
=\beta\tau v_i\log\!\Big(\frac{\beta\tau v_i}{p_C
e_i}\Big)-\big(\beta\tau v_i-p_C e_i\big).
$$
Inserting ψi into the
frequency condition shows that higher vi/ei
raises ψi,
which (holding feasibility fixed) increases the optimal frequency fi*
and raises the node’s optimized surplus. This yields an “index”
interpretation: nodes can be approximately ranked by
$$
\text{index}_i \;\approx\; \frac{v_i}{e_i},
$$
with the participation set characterized by a cutoff that moves downward
in (pC, pN).
Formally, for any fixed (pC, pN),
there exists a threshold $\underline{\rho}(p_C,p_N)$ such that nodes
with sufficiently large vi/ei
are (weakly) more likely to satisfy maxf, Ewi ≥ 0.
In applied work, this is the key empirical implication: carbon pricing
shifts the selected fleet toward “high learning value per Joule,” while
congestion pricing shifts it toward “high learning value per transmitted
byte” (here proxied by vi since m is constant in the baseline).
A limitation is worth flagging. Because our steady-state proxy penalizes staleness via the λ/(2f) term, the optimized interior level takes the form −√(2βλDi) with Di ≡ (pNm + βλκ) − ψi, exactly as in the regime analysis above, and is therefore negative whenever the interior solution applies; levels, as opposed to rankings, should not be read literally. In realistic deployments, feasibility bounds on f (protocol limits), additional baseline benefits from participation, or a finite-horizon objective typically regularize levels; nonetheless, the index ranking and price comparative statics remain informative for which nodes are dropped first.
Up to now, transfers cancel in the server’s surplus W. Procurement in the field,
however, is often subject to a hard budget (or accounting) constraint:
the server cannot reimburse all induced energy and network expenditures.
A parsimonious way to model this is to impose a per-unit-time spending
cap B on reimbursements:
∑i ∈ 𝒮Ti ≤ B,
where under full-information cost reimbursement a natural lower bound is
Ti ≥ (pCeiEi + pNm)fi
(plus any required fixed participation payment if nodes have nonzero
outside options).
The main analytical implication is that a binding budget introduces a
Lagrange multiplier μ ≥ 0 that
effectively scales up resource prices. In the simplest case where
transfers equal verifiable resource costs, the constrained problem is
equivalent to the unconstrained problem with augmented
prices
pCeff = (1 + μ)pC, pNeff = (1 + μ)pN.
Hence budget scarcity amplifies the impact of both carbon and congestion
pricing on (i) intensive margins (fi, Ei)
and (ii) the extensive margin 𝒮.
Practically, this predicts “double rationing” in high-price
environments: not only does the server reduce frequencies and epochs for
enrolled nodes, it also drops marginal nodes entirely.
When nodes’ outside options differ or information rents matter, the budget constraint becomes closer to a knapsack problem: select the subset of nodes with the highest (virtual) surplus per dollar of required transfer. Even then, the same qualitative index logic reappears: the server tilts enrollment toward high vi and low ei, and prices (pC, pN) rotate the cutoff.
Finally, heterogeneity highlights why measurement and verification matter for policy. If ei is privately known and difficult to audit, a menu of contracts over (f, E) can be used to screen devices, but carbon pricing then interacts with informational distortions: the server may deliberately depress E or f to reduce rents for energy-inefficient types. Conversely, if secure attestation or digital twins make ei contractible, the clean full-information cutoffs above become a reasonable approximation, and one can interpret carbon pricing as transparently shifting participation toward energy-efficient IIoT nodes rather than merely reshuffling transfers.
Up to this point, we have treated the congestion price pN as an
exogenous “market” parameter faced by the server when it selects
per-node frequencies fi. In many IIoT
deployments, however, this is an incomplete description: network costs
are load dependent. When more devices push updates more
frequently, backhaul and access networks experience higher utilization,
queueing, packet loss, and ultimately a higher shadow cost of bandwidth.
A natural extension is therefore to let the congestion price be an
increasing function of the aggregate update load
F ≡ ∑i ∈ ℐfi, pN = P(F), P′(F) > 0,
where P(⋅) is interpreted
either as (i) a tariff schedule posted by a network operator, (ii) a
reduced-form mapping from utilization to “effective” per-MB cost, or
(iii) a Pigouvian toll that internalizes the congestion externality
created by higher update traffic.
The key economic change is that the server now faces a feedback loop: higher chosen frequencies raise F, which raises pN, which in turn lowers the privately optimal frequencies. Equilibrium frequencies are therefore characterized as a fixed point.
A convenient starting point is a parametric schedule such as
P(F) = p̄N + ηF (η > 0),
or more generally P(F) = p̄N + ηFα
with α ≥ 1. These forms
capture the idea that marginal bandwidth cost is increasing in
utilization, and they are analytically tractable in comparative statics
even when they do not admit closed forms.
A simple microfoundation is to suppose the network operator incurs a
convex operating cost K(F) (e.g., energy for
radio resources, peering/transit costs, or capacity expansion cost
amortized per unit time) and charges a per-unit-data price equal to
marginal cost:
pN = K′(F)/m,
where m is the per-update
payload. Alternatively, if the operator chooses pN to manage
delay, one can interpret P(F) as implementing a
service-level objective: higher load implies higher expected waiting
time and thus higher effective “price” per MB faced by the server.
With endogenous pN, the server’s problem is still separable across nodes conditional on a conjectured congestion price. For any candidate price pN, we can compute each node’s optimal pair (fi*(pN), Ei*). In the baseline structure, the epoch choice Ei* continues to depend on carbon cost parameters (and node energy intensity) but not on pN, whereas the frequency choice is decreasing in pN whenever the interior solution applies.
Define the aggregate frequency induced by a given congestion
price:
$$\mathcal{F}(p_N) \;\equiv\; \sum_{i\in\mathcal{I}} f_i^*(p_N).$$
An equilibrium with endogenous congestion pricing is then a pair (F*, pN*)
such that
$$F^* = \mathcal{F}(p_N^*), \qquad p_N^* = P(F^*).$$
Equivalently, it is a fixed point in F:
$$F^* = \Gamma(F^*) \;\equiv\; \mathcal{F}\big(P(F^*)\big).$$
This representation makes the logic transparent: P(⋅) maps load into a congestion
price, and ℱ(⋅) maps that congestion
price back into the server’s optimal aggregate load.
Existence is typically straightforward under mild regularity. If P(⋅) is continuous and ℱ(⋅) is continuous (which follows from the continuity of each fi*(pN) away from boundary truncations), then Γ(⋅) is continuous. Moreover, Γ(F) ≥ 0, and under standard parameterizations Γ(F) is bounded above because sufficiently large pN makes updates prohibitively expensive, driving fi* toward 0. Hence a fixed point exists by a standard intermediate-value argument.
Uniqueness hinges on the strength of the feedback. Since P is increasing and each fi* is decreasing in pN (interior), ℱ is decreasing and thus Γ(F) = ℱ(P(F)) is decreasing in F. Because the identity map is strictly increasing, a continuous decreasing Γ can cross the 45-degree line at most once; thus, away from kinks created by boundary solutions, we typically obtain a unique equilibrium load F*. When boundary truncations matter (e.g., some nodes optimally set fi* = 0 at high pN), Γ may become piecewise smooth; uniqueness still often holds but requires checking that the piecewise segments do not generate multiple crossings. Economically, multiplicity is most plausible when the congestion schedule is very steep and the participation/frequency responses have sharp thresholds; under such conditions, “low-load/low-price” and “high-load/high-price” regimes can both appear self-consistent.
Endogenizing pN changes
comparative statics because a primitive parameter affects F both directly (through the
server’s best response) and indirectly (through the induced change in
congestion price). Formally, if we write the fixed point as H(F; ξ) = F − Γ(F; ξ) = 0
where ξ is a parameter such as
pC or
κ, then (when differentiable)
the implicit function theorem gives
$$
\frac{\partial F^*}{\partial \xi}
=
\frac{-\frac{\partial H}{\partial \xi}}{\frac{\partial H}{\partial F}}
=
\frac{\frac{\partial \Gamma}{\partial \xi}}{1-\Gamma_F},
\qquad
\Gamma_F \equiv \frac{\partial \Gamma}{\partial F}.
$$
Because Γ is decreasing in
F, we have ΓF < 0 and
thus 1 − ΓF > 1.
This inequality has a useful interpretation: feedback dampens
shocks that would otherwise change aggregate frequency. For
example, an increase in the congestion sensitivity η or the payload m directly reduces optimal fi, lowering
F; but the resulting lower
load also reduces pN, partially
offsetting the initial drop. Symmetrically, anything that raises the
server’s private incentive to update more (higher τ or β, lower κ) increases F but also raises congestion prices,
which pushes back against the increase.
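To see the dampening in the simplest case, consider a purely illustrative linear best-response map (not the model’s actual Γ):
$$
\Gamma(F;\xi) \;=\; A(\xi) - bF,\ \ b>0
\quad\Longrightarrow\quad
F^* \;=\; \frac{A(\xi)}{1+b},
\qquad
\frac{\partial F^*}{\partial \xi} \;=\; \frac{A'(\xi)}{1+b},
$$
so the no-feedback response A′(ξ) is scaled down by the factor 1/(1 − ΓF) = 1/(1 + b) < 1.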
This is also where closed forms frequently fail. Even if each fi*(pN) is available in closed form, plugging pN = P(F) typically yields a nonlinear equation in F. With heterogeneity across nodes (different ei, possibly different vi, and thus different Ei* and different intercept terms in the f-FOC), ℱ(pN) becomes a sum of square-root-type terms, and the fixed point generally does not simplify.
There are narrow cases where a closed form can be recovered—e.g., homogeneous nodes and a linear P(F) may reduce the fixed point to a solvable polynomial—but these are better viewed as pedagogical special cases than robust quantitative tools.
In applied settings, the most reliable approach is numerical. A convenient algorithm exploits monotonicity:
Because Γ(F) is typically continuous and decreasing, one can solve F − Γ(F) = 0 via bisection on an interval [0, F̄] where F̄ is an upper bound computed from the zero-congestion benchmark (e.g., pN = p̄N) or from protocol limits. If differentiability holds, Newton or quasi-Newton methods are faster, but bisection is robust to kinks created by node dropouts (fi* = 0).
A closely related method is price iteration: start from a guess pN(0), compute F(t) = ℱ(pN(t)), update pN(t + 1) = P(F(t)), and iterate. When the composite mapping pN ↦ P(ℱ(pN)) is a contraction (a “small gain” condition equivalent to sufficiently weak feedback), this converges quickly and also provides a natural notion of stability.
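The following sketch implements both approaches under assumed functional forms: a placeholder square-root frequency rule (consistent in shape with the interior solution quoted in the predictions section, but with illustrative constants c_i and λ standing in for the exact first-order-condition terms) and the linear schedule P(F) = p̄N + ηF. It is meant as a template, not a calibrated solver.

```python
import numpy as np

def f_star(p_N, m, lam, c):
    """Placeholder interior frequency rule of square-root form,
    f_i = sqrt(lam / (2 * (p_N * m + c_i))); substitute the model's exact FOC here."""
    return np.sqrt(lam / (2.0 * (p_N * m + c)))

def F_agg(p_N, m, lam, c):
    """Aggregate update load F(p_N) = sum_i f_i*(p_N)."""
    return f_star(p_N, m, lam, c).sum()

def P_schedule(F, pbar_N, eta):
    """Linear congestion schedule P(F) = pbar_N + eta * F."""
    return pbar_N + eta * F

def solve_by_bisection(m, lam, c, pbar_N, eta, tol=1e-10):
    """Solve H(F) = F - Gamma(F) = 0 by bisection; H is increasing because Gamma is decreasing.
    Bisection remains valid even if F_agg has kinks (e.g., from node dropouts)."""
    lo, hi = 0.0, F_agg(pbar_N, m, lam, c)   # zero-congestion benchmark bounds the load
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - F_agg(P_schedule(mid, pbar_N, eta), m, lam, c) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def solve_by_price_iteration(m, lam, c, pbar_N, eta, iters=200):
    """Iterate p -> P(F(p)); converges when the feedback (small-gain condition) is weak enough."""
    p = pbar_N
    for _ in range(iters):
        p = P_schedule(F_agg(p, m, lam, c), pbar_N, eta)
    return F_agg(p, m, lam, c), p

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = rng.lognormal(mean=-1.0, sigma=0.5, size=500)   # heterogeneous per-node cost terms (illustrative)
    m, lam, pbar_N, eta = 1.0, 1.0, 1e-3, 1e-4
    F_bis = solve_by_bisection(m, lam, c, pbar_N, eta)
    F_it, p_star = solve_by_price_iteration(m, lam, c, pbar_N, eta)
    print(F_bis, F_it, p_star)                           # the two solutions should coincide
```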
Endogenous congestion pricing highlights a practical policy point: carbon pricing and network pricing are not independent instruments once congestion is load-determined. A carbon-price increase that reduces compute (and often reduces communication) can indirectly lower congestion prices, mitigating the communication reduction and changing the incidence across nodes. Conversely, stronger congestion pricing can shift the server toward architectures or operating points that economize on communication, potentially increasing the relative attractiveness of compute-heavy local training—an effect that becomes especially salient when we move beyond the baseline separability and allow explicit substitution between local epochs and update frequency.
At the same time, we should acknowledge the limits of the reduced-form P(F). Real networks exhibit capacity regions, discrete scheduling, and time-of-day variability; moreover, congestion may enter not only through monetary costs but also through the latency term itself. These features strengthen the case for numerical solution and for calibration/estimation—precisely the direction we take in the quantitative illustration that follows.
Our purpose in this section is not to claim a “one-size-fits-all” numerical forecast for IIoT federated learning (FL), but rather to show how the primitives of the model map into measurable engineering quantities, to provide plausible calibration ranges, and to spell out a small set of falsifiable comparative-static predictions (elasticities and threshold effects) that one can take to data or to a digital-twin simulation.
Energy per local epoch, ei. A convenient operational definition is
$$e_i = \bar{P}_i^{\mathrm{train}} \times t_i^{\mathrm{epoch}},$$
where the first factor is the average incremental power draw during local training (above idle) and the second is the time per local epoch. In IIoT settings, both terms are highly heterogeneous: low-power microcontrollers may draw an incremental 0.5–5 W during training, while edge gateways or GPU-enabled controllers can sit at 10–150 W; epoch times range from fractions of a second (tiny models, small local datasets) to tens of seconds (larger models, slower storage). Translating to energy, it is common to see per-epoch values spanning roughly ei ∈ [10⁻¹, 10³] J/epoch (i.e., ≈ [3 × 10⁻⁸, 3 × 10⁻⁴] kWh/epoch). The upper end is not pathological: a 50 W device taking 20 s per epoch consumes 1000 J per epoch.
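The unit bookkeeping is simple but worth making explicit (the device numbers below are just the illustrative endpoints of the ranges above):

```python
J_PER_KWH = 3.6e6   # 1 kWh = 3.6 million joules

def energy_per_epoch(power_watts, epoch_seconds):
    """e_i = incremental training power (W) x time per local epoch (s), returned in J and kWh."""
    joules = power_watts * epoch_seconds
    return joules, joules / J_PER_KWH

print(energy_per_epoch(0.5, 0.2))    # microcontroller-class: 0.1 J per epoch (~2.8e-8 kWh)
print(energy_per_epoch(50.0, 20.0))  # gateway/GPU-class: 1000 J per epoch (~2.8e-4 kWh)
```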
Carbon price per unit energy, pC. The model treats pC as a monetary price per kWh (or per J). In practice, carbon costs are often stated in $/tCO2, so we convert using the local grid emissions factor g (tCO2/kWh):
$$p_C^{\mathrm{kWh}} = p_C^{\mathrm{tCO_2}} \times g.$$
With carbon prices of 50–200 $/tCO2 (typical regulatory or internal shadow prices) and g ∈ [0.0002, 0.0007] tCO2/kWh (roughly 200–700 gCO2/kWh), we obtain effective energy prices of roughly 0.01–0.14 $/kWh.
This range is economically meaningful relative to energy costs that
already vary across sites and time-of-day; it is also precisely why
measuring when and where training occurs matters for carbon
accounting.
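The conversion itself is a one-liner; what requires care is feeding it a time- and location-specific emissions factor (the values below are just the endpoints of the ranges above):

```python
def carbon_price_per_kwh(price_per_tco2, grid_intensity_tco2_per_kwh):
    """Effective carbon price per kWh: p_C^kWh = p_C^tCO2 x g."""
    return price_per_tco2 * grid_intensity_tco2_per_kwh

print(carbon_price_per_kwh(50, 0.0002))   # ~0.01 $/kWh: low carbon price, clean grid
print(carbon_price_per_kwh(200, 0.0007))  # ~0.14 $/kWh: high carbon price, carbon-intensive grid
```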
Network congestion price, pN. We
interpret pN as $/MB (or
$/GB) of update traffic, encompassing explicit tariffs, SLA shadow
prices, and/or congestion tolls. Enterprise and industrial connectivity
costs vary widely: metered cellular can be a few $/GB, while private
backhaul or negotiated transit can be far lower in accounting terms but
still exhibit high shadow costs when capacity is scarce. A
practical range is
pN ∈ [5 × 10⁻⁵, 10⁻²] $/MB (i.e., [0.05, 10] $/GB).
Payload per update, m. For FL, m depends on model size and compression/quantization. In IIoT applications we often see m ∈ [0.1, 20] MB per round (from aggressively compressed shallow models to moderately sized neural nets). This parameter matters because it scales the marginal cost of frequency one-for-one through the term pNmfi.
Timeliness penalty parameters, λ and κ. These are less directly observed because they summarize the business cost of staleness and queueing. One pragmatic strategy is to calibrate λ to a willingness-to-pay for reducing latency or AoI (e.g., using downtime costs or defect-detection value) and calibrate κ using empirical delay-vs-load curves from the network (or a queueing model). We emphasize that this is where “engineering” and “economics” meet: κ can be estimated from telemetry, while λ is a valuation parameter.
Effectiveness from local computation, ϕ(E). For quantitative illustration we recommend a one-parameter concave family such as ϕ(E) = log (1 + E) or $\phi(E)=\sqrt{E}$. The log case is particularly transparent because it implies a simple inverse relationship between optimal epochs and carbon price.
A minimal simulation that matches the model’s logic proceeds as follows.
Choose a population of devices. Fix I (e.g., I = $10^2$–$10^4$) and draw heterogeneity in energy intensity ei (lognormal is natural) and, if desired, heterogeneity in learning productivity vi that scales the benefit term (replace τ by τvi). One can tie ei mechanically to device classes (sensor, PLC, gateway) to mirror a real fleet.
Fix prices and technical parameters. Select (pC, pN, m), and either (i) treat pN as fixed, or (ii) make it load-dependent through a schedule pN = P(F) that can be estimated from observed throughput/delay or from an operator tariff.
Compute per-node optimal intensities. For a chosen ϕ, compute Ei*(pC) from the epoch first-order condition (with truncation at 0 if needed), then compute fi* from the frequency first-order condition (again truncating if the interior formula implies infeasibility). In the endogenous congestion version, solve the scalar fixed point for F and thus pN, then recover {fi*}.
Report outcomes and counterfactuals. Aggregate statistics—mean frequency, distribution of update cycles θi, total traffic m∑ifi, energy use ∑ieiEifi, and surplus—are computed under baseline and under price shocks (e.g., doubling pC, increasing pN, changing m via compression). For the discrete-architecture variant E ∈ {EL, EH}, compute regime shares and the implied threshold behavior in pC.
This design is intentionally modular: a practitioner can swap in a measured P(F), a different ϕ fitted from learning curves, or device-specific mi if payloads differ.
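A minimal end-to-end sketch of steps 1–4, assuming ϕ(E) = log(1 + E) (so the epoch rule is the closed form used in the predictions below) and a placeholder square-root frequency rule with illustrative constants; all parameter values are illustrative rather than calibrated, and pN is held fixed here (the fixed-point solver above handles the load-dependent case):

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: device population with heterogeneous energy intensity (J/epoch -> kWh/epoch).
I = 1000
e = rng.lognormal(mean=2.0, sigma=1.5, size=I) / 3.6e6      # roughly 0.1..1000 J/epoch, in kWh

# Step 2: prices and technical parameters (illustrative placeholders).
p_C, p_N, m = 0.05, 2e-3, 2.0          # $/kWh, $/MB, MB per update
beta_tau = 5e-7                        # value scale in the epoch FOC (placeholder)
lam, kappa = 1e-2, 1e-3                # timeliness parameters (placeholders)

def optimal_E(p_C):
    """Epoch rule for phi(E) = log(1+E):  E* = max{beta*tau / (p_C * e_i) - 1, 0}."""
    return np.maximum(beta_tau / (p_C * e) - 1.0, 0.0)

def optimal_f(p_C, E):
    """Placeholder square-root frequency rule; substitute the model's exact FOC if available."""
    return np.sqrt(lam / (2.0 * (p_N * m + p_C * e * E + lam * kappa)))

def aggregates(E, f):
    """Step 4 outcomes: mean frequency, total traffic, total training energy per unit time."""
    return {"mean_f": f.mean(),
            "traffic_MB": m * f.sum(),
            "energy_kWh": (e * E * f).sum()}

# Step 3: per-node optima at baseline prices, then a counterfactual doubling of p_C.
E0, E1 = optimal_E(p_C), optimal_E(2 * p_C)
f0, f1 = optimal_f(p_C, E0), optimal_f(2 * p_C, E1)
print("baseline:      ", aggregates(E0, f0))
print("doubled carbon:", aggregates(E1, f1))
```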
The model generates several predictions that can be tested using FL logs (round frequency, local steps), network telemetry (utilization, delay), and carbon/energy accounting (site-specific grid intensity).
(i) Update-frequency elasticity w.r.t. congestion price. In the interior solution,
$$f_i^* \;\propto\; \big[\,(p_N m) + \text{(other terms)}\,\big]^{-1/2}.$$
Hence a robust qualitative prediction is ∂fi*/∂pN < 0.
Quantitatively, the log-elasticity takes the form
$$
\frac{\partial \log f_i^*}{\partial \log p_N}
\;=\;
-\frac{1}{2}\cdot \frac{p_N m}{(p_N m)+\text{(other
terms)}}\;\in\;(-1/2,0),
$$
approaching −1/2 when network cost
dominates the denominator. This delivers an empirical target: in
high-congestion regimes, a 10% increase in effective $/MB should reduce
update frequency by roughly 5% (modulo truncations and any protocol
constraints).
(ii) Epoch elasticity w.r.t. carbon price (log
case). If ϕ(E) = log (1 + E),
then
$$
E_i^*=\max\left\{\frac{\beta\tau}{p_C e_i}-1,\,0\right\}.
$$
Whenever Ei* > 0,
we obtain a stark prediction: epochs fall approximately inversely with
the carbon price (and with device energy intensity). In particular, since
1 + Ei* = βτ/(pCei) in the interior,
$$
\frac{\partial \log(1+E_i^*)}{\partial \log p_C}=-1,
\qquad
\frac{\partial \log(1+E_i^*)}{\partial \log e_i}=-1.
$$
Thus, if a site’s carbon price doubles (holding everything else fixed),
the model predicts a halving of 1 + Ei*
for devices that remain interior.
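A two-line check of this prediction (parameter values chosen only for readability):

```python
def epochs_log_case(beta_tau, p_C, e_i):
    """E* = max{beta*tau / (p_C * e_i) - 1, 0} for phi(E) = log(1+E)."""
    return max(beta_tau / (p_C * e_i) - 1.0, 0.0)

# Doubling the carbon price halves 1 + E* for an interior device (log-elasticity of -1).
E_lo = epochs_log_case(beta_tau=1e-6, p_C=0.05, e_i=2e-6)
E_hi = epochs_log_case(beta_tau=1e-6, p_C=0.10, e_i=2e-6)
print(1 + E_lo, 1 + E_hi)   # approximately 10 and 5
```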
(iii) Participation/activation margins under heterogeneous devices. With heterogeneity, some nodes optimally shut down (choose fi* = 0) when prices are high or value is low. The falsifiable implication is selection on energy efficiency: conditional on task value, higher-ei devices should reduce frequency earlier and exit sooner as pC rises. In data, this appears as a compositional shift of training participation toward more energy-efficient hardware when carbon accounting tightens.
(iv) Architecture switching as a threshold phenomenon. Under the two-regime restriction E ∈ {EL, EH}, the model predicts a sharp switch at a threshold pC†(pN). Empirically, this translates into a discontinuous change in observed local-steps-per-round (and often an accompanying change in frequency) around policy-induced carbon-cost changes—especially when organizations adopt internal carbon prices or when regulators change reporting requirements.
(v) Feedback attenuates elasticities when pN is load-dependent. When congestion prices respond endogenously to aggregate updates, any primitive shock that would change ∑ifi is partially offset by the induced movement in pN. The testable implication is that measured elasticities of aggregate traffic with respect to, say, compression (lower m) or valuation shocks (higher τ) should be smaller in networks with steeper utilization-based pricing or tighter capacity—precisely because prices “push back.”
Taken together, these predictions discipline what we should see in real IIoT FL deployments: (a) frequency responds strongly and concavely to network pricing and payload size; (b) local computation responds strongly to carbon pricing and device energy intensity; and (c) policy or tariff changes can induce discrete regime shifts in “compute-heavy vs communication-heavy” operation. The next section returns to design guidance and the practical limits of what this reduced-form calibration can capture.
Our model is intentionally spare: we reduce an IIoT federated learning (FL) deployment to two choice variables per node—update frequency fi and local compute intensity Ei—and two externally facing prices—carbon/energy pC and network congestion pN. The payoff from doing so is that the design problem becomes legible as a pair of “equalize-at-the-margin” rules: choose local epochs until the marginal learning gain equals the marginal carbon cost, and choose communication frequency until the marginal value of freshness equals the marginal cost of updates (including congestion and latency). The discussion below translates these rules into operational guidance and highlights what must be measured—and audited—if carbon-aware FL is to be more than a slogan.
(i) Separate decisions: compute per round vs rounds per time. In the baseline model, the first-order condition for Ei depends on pCei but not on pN, while the condition for fi depends on both pN and the net value of an update. This separability is a practical feature: teams can often assign “how many local steps per round” to an on-device training module, and “how often to communicate” to a scheduler that is network-aware. In other words, carbon-aware tuning can be implemented without redesigning the whole FL stack: estimate ei, pick Ei using a carbon price signal, and then schedule fi using network conditions and timeliness requirements.
(ii) Implement carbon-aware local training as a marginal rule, not a static cap. A common practice is to impose a hard energy budget or to throttle training uniformly. The model suggests a more efficient approach: treat carbon as a shadow price that changes with location and time, and adapt Ei accordingly. In the log illustration ϕ(E) = log (1 + E), optimal epochs follow an inverse rule 1 + Ei* ∝ 1/(pCei), so a device running during a low-carbon window should rationally do more work per round than the same device during a high-carbon window. Practically, this points toward carbon-aware “bursting”: concentrate local epochs when the effective pC is low (cleaner grid or lower internal carbon charge), and reduce epochs otherwise.
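A minimal device-side sketch of this bursting rule, assuming the training module receives a time-varying effective carbon price signal; the cap E_max, the parameter names, and the numbers are illustrative:

```python
def epochs_for_window(beta_tau, e_i, p_C_effective, E_max=32):
    """Carbon-aware epochs per round under phi(E) = log(1+E):
    1 + E* is inversely proportional to p_C * e_i, truncated to [0, E_max]."""
    E_star = beta_tau / (p_C_effective * e_i) - 1.0
    return int(min(E_max, max(0.0, round(E_star))))

e_i = 2e-6   # kWh per local epoch for an illustrative device
print(epochs_for_window(beta_tau=1e-6, e_i=e_i, p_C_effective=0.02))  # low-carbon window: more epochs
print(epochs_for_window(beta_tau=1e-6, e_i=e_i, p_C_effective=0.12))  # high-carbon window: fewer epochs
```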
(iii) Treat update frequency as the primary congestion control knob. Communication costs scale linearly with fi, while the timeliness cost is convex in fi: the staleness term falls with frequency while the queueing term κfi rises with it. This makes frequency the natural control for network externalities. When a deployment experiences congestion, shrinking payload m and lowering fi are substitutes; the model clarifies when each is preferred. If compression/quantization can reduce m without materially lowering the “value per update,” then reducing m is a first-best fix because it cuts congestion costs without increasing staleness. When reducing m is costly (e.g., accuracy loss, engineering constraints), the next best response is to lower fi, accepting higher AoI but relieving network load and queueing delays.
(iv) Use heterogeneity rather than fighting it. Real IIoT fleets include ultra-low-power sensors, PLCs, gateways, and GPU-capable controllers. A uniform policy (“all nodes do E epochs and update every θ seconds”) is almost never surplus-maximizing once ei varies. The mechanism design implication is simple even in the full-information baseline: compute-inefficient nodes should either (a) compute less per update (lower Ei), (b) update less often (lower fi), or (c) drop out when prices spike. In operations, this becomes a set of defaults: an “energy class” table for devices, per-class E schedules indexed by pC, and per-class maximum frequencies indexed by pN and observed delay.
(v) Architecture choice is a policy lever when communication is scarce. The discrete variant E ∈ {EL, EH} captures an engineering reality: organizations often choose between a communication-heavy configuration (small local work, frequent rounds) and a compute-heavy configuration (more local work, fewer rounds). The threshold pC†(pN) is a useful organizing concept for practice: it says there is a region of environments where compute-heavy FL is the rational response to expensive communication, and a region where it is not worth the carbon cost. Even if the true frontier is continuous, teams can operationalize “regime switching” by selecting from a small menu of FL profiles (e.g., (E, f) presets) rather than tuning continuously.
(vi) Contracts and internal chargebacks can implement the optimum. Many deployments rely on business units that “own” devices, energy budgets, and connectivity plans. The model’s quasilinear structure suggests a straightforward governance design: reimburse verifiable resource costs and pay a bonus indexed to the value of contributions (or a proxy such as participation at prescribed (fi, Ei)). Even if exact value is hard to measure, a posted menu of profiles can induce self-selection: energy-efficient nodes pick higher-E or higher-f options because their private cost is lower.
The comparative statics in the model are only as credible as the measurement system that feeds pC, pN, and ei into the control policy.
(i) Carbon accounting must be granular in time and place. Treating pC as a single number is analytically convenient but operationally misleading. Grid intensity varies by region and hour; internal shadow prices vary by division and reporting regime. If training is scheduled without location- and time-specific emissions factors, the system will systematically misallocate compute. A practical policy implication is that “carbon-aware FL” requires an auditable pipeline from (a) node location (or power domain), (b) marginal emissions factors (or contractual renewable claims), and (c) an internal carbon price to an effective pC that is applied at the time of training.
(ii) Device energy telemetry should be treated as regulated measurement, not an afterthought. Our key heterogeneity parameter ei is not merely a device spec; it depends on workload, temperature, and implementation. If ei is mismeasured, carbon-aware policies can perversely shift load toward devices that are actually inefficient. This motivates an engineering-policy recommendation: organizations should maintain a calibration protocol for ei (benchmarks, periodic re-estimation, and attestation), analogous to how industrial systems calibrate sensors that drive safety-critical controls.
(iii) Network pricing is de facto congestion policy. Even when pN is not an explicit tariff, it exists as an internal scarcity cost: SLA risk, delayed control decisions, or foregone throughput. Making pN explicit—via chargebacks, quotas, or congestion tolls—turns an unpriced externality into a controllable input. Conversely, if pN is set administratively at zero, the system will tend to over-communicate, shifting the burden into latency and operational risk (captured in the model by κ).
We view the model as a disciplined “accounting identity with incentives,” not a full representation of learning dynamics.
First, we compress learning progress into ϕ(E) and linear-in-frequency value τϕ(E)f. Real learning curves depend on data non-i.i.d.-ness, optimizer state, and drift; the value of an additional update may decay with time or depend on global model quality. A next step is to embed the pricing-and-scheduling problem into a finite-horizon learning curve, where the server trades off early rapid improvement against later diminishing returns.
Second, we use stylized AoI/latency proxies A(f) = 1/(2f) and L(f) = 1/(2f) + κf. These capture the core convexity but abstract from burstiness, deadline constraints, and multi-class queueing. Empirically grounded queueing models (or learned delay predictors) can replace L(⋅) while preserving the same economic logic.
Third, the baseline treats pN as exogenous, and our extension endogenizes it only through a reduced-form schedule P(F) faced by a single decision-maker. In richer settings, congestion introduces strategic interactions across nodes (each node’s frequency imposes latency on others), not just a fixed-point problem for the server. This is precisely where mechanism design becomes central: efficient outcomes generally require congestion pricing or rationing.
Fourth, we abstract from privacy risk, fairness across sites, and constraints such as minimum update rates for safety-critical monitoring. These constraints can be incorporated as KKT multipliers that effectively shift pC, pN, or λ, but doing so explicitly is important for governance.
Despite these limitations, the central conclusion is robust: in IIoT FL, carbon and congestion prices are not peripheral “cost accounting” details; they are policy-relevant signals that determine the optimal balance between local computation and communication. The practical agenda is therefore clear: measure ei credibly, expose meaningful pC and pN signals to the scheduler, and implement adaptive (Ei, fi) policies—possibly via simple regime menus—that make the tradeoff transparent and auditable.