In 2026, the operational bottleneck for industrial IoT (IIoT) federated learning (FL) is no longer “can we train at the edge?” but rather “who pays for the externalities created by training at the edge, and how do those prices reshape the engineering choices we thought were purely technical?” Two developments make this shift salient. First, carbon accounting has moved from a reputational concern to an explicit budget constraint: industrial firms increasingly face internal carbon shadow prices, sectoral emissions trading exposure, and procurement rules that treat compute energy as a priced input. Second, connectivity for mission-critical IIoT—private 5G, Wi‑Fi 7, time-sensitive networking, and shared backhaul—has become sufficiently loaded that network usage is often priced (or rationed) via congestion-aware tariffs, slicing fees, or capacity penalties. When edge devices participate in FL, they consume both energy (local training) and network resources (uplink model updates). Carbon and congestion pricing therefore enter the FL design problem as first-order economic primitives, not after-the-fact implementation details.
This paper’s starting point is that IIoT FL is ultimately procured: a server (an OEM, platform provider, or plant operator) solicits timely, useful updates from a population of heterogeneous devices. What the server cares about is not “accuracy in the limit” but steady operational value—predictive maintenance quality, anomaly detection, process control robustness—delivered under tight timeliness constraints. In this environment, stale information is costly even if it is statistically “good.” This is why age-of-information (AoI) and latency concepts, originally developed for communication systems, have become central to the economics of IIoT learning. A model update arriving late can be worse than no update at all if it causes the global model to lag behind fast-moving machine states. Conversely, excessive communication to avoid staleness can overload the network and create the very latency the system seeks to prevent.
These realities motivate a simple but economically sharp abstraction: each node chooses (or is instructed to choose) how often to communicate and how much to compute per communication. The former is summarized by an update frequency (or equivalently an update cycle length), while the latter is summarized by local computation intensity, such as the number of local epochs per update. This two-dimensional control captures a canonical engineering substitution in FL: we can communicate more frequently with lighter local computation (communication-heavy, “more FedSGD-like”), or communicate less frequently while doing more work locally (compute-heavy, “more FedAvg-like”). In practice, this substitution is not merely about bandwidth; it also affects stability, drift under non-iid data, and robustness to intermittent connectivity. Our goal is not to reproduce all of those dynamics, but to isolate the economic tradeoff that pricing makes unavoidable: communication-heavy regimes pay more congestion charges and potentially induce more network delay, while compute-heavy regimes pay more carbon/energy charges and risk higher device-side energy draw.
A second modeling motivation comes from how industrial systems are governed. IIoT deployments are rarely egalitarian, volunteer networks; they are managed ecosystems where participation is engineered and incentivized. Devices differ in energy intensity (hardware generation, thermal constraints, battery health), and their data are unevenly valuable (rare fault modes, critical asset classes). The server therefore faces a procurement problem: it must decide which devices should participate and at what intensities, taking prices as given. In many deployments, the server can implement these choices through posted contracts (e.g., reimbursements, service credits, or internal transfer prices) because devices are owned by the firm or operate under enforceable service-level agreements. Even when devices are not fully controlled, the direction of travel is toward verifiability via device attestation and digital twins that report energy use, duty cycles, and communication volumes. This institutional detail matters because it makes full-information benchmark allocations informative: they describe what a sophisticated operator would implement absent hidden information frictions, and they provide a yardstick for later mechanism-design extensions.
Carbon and congestion prices matter here not only because they raise costs, but because they reshape optimal mixing of compute and communication. A naïve view says: if carbon becomes expensive, do less local training; if congestion becomes expensive, communicate less. The more subtle point is that FL’s value is produced jointly by (i) update freshness and (ii) update quality. Freshness is governed by how often information is incorporated; quality is amplified by local computation, but with diminishing returns. In IIoT, diminishing returns are particularly plausible: after a small number of local epochs, additional passes over streaming or highly correlated sensor data add little incremental generalization but still consume energy. Likewise, timeliness costs are not linear in frequency: updating too rarely causes staleness, while updating too often can create congestion-induced delay and scheduling overhead. Once we recognize these curvature properties, prices do more than scale down activity—they change which margin is worth paying for.
This is where the “MEFL satisfaction” framing is useful. Rather than modeling end-to-end accuracy trajectories, we represent the per-unit-time contribution of a device as a satisfaction or value flow that increases in (effective) learning progress and decreases in timeliness costs. The server’s problem becomes a steady-state control problem: choose device intensities to maximize surplus net of priced resource use. This reframing is pragmatic. Industrial operators frequently optimize steady-state KPIs—downtime risk, false-alarm rates, latency budgets—under recurring energy and network charges, rather than solving a one-off training problem. Moreover, the steady-state view aligns with how carbon and congestion prices are applied: they are assessed per unit energy and per unit data, period after period. By matching the model’s objective to the billing and budgeting realities, we can speak directly to the “new bottleneck” facing IIoT FL procurement teams.
The engineering counterpart to this economic framing is the compute–communication substitution embodied by FedAvg-like architectures. FedAvg can be read as a response to expensive communication: increase local work per round so that fewer rounds are needed. In 2026, however, we must ask: expensive relative to what? If communication is priced but compute is cheap and clean (e.g., devices plugged into low-carbon grids), the classic FedAvg intuition strengthens. If compute is carbon-priced and devices are energy-inefficient, the same shift toward local computation becomes socially and privately costly. Put differently, the same algorithmic knob—number of local epochs—has an economic meaning that depends on carbon prices and device heterogeneity. Our framework makes this dependence explicit and yields testable comparative statics: holding everything else fixed, a higher congestion price tilts the optimal design toward fewer updates; a higher carbon price tilts it toward less local computation, and, under a natural condition, toward fewer updates as well because updates become less valuable when each update embodies less effective learning.
This pricing-centric perspective also helps interpret practical phenomena that operators increasingly report. First, “network is the bottleneck” complaints often coexist with surprisingly conservative update schedules. Our model rationalizes this: if increasing update frequency triggers congestion-induced latency (queueing, scheduling contention, retransmissions), timeliness can deteriorate even as nominal update frequency rises, making the marginal update unattractive. Second, carbon budgets can lead to an apparently paradoxical outcome: an operator may reduce local computation and also reduce communication, not because both inputs are complements technologically, but because the value of communicating a weaker update falls when local computation is curtailed. Third, heterogeneous devices will not be treated symmetrically: carbon pricing effectively penalizes energy-inefficient nodes, changing the composition of participants and potentially biasing the learned model toward devices that are “cheap to run,” a fairness and representativeness concern that practitioners increasingly flag.
We emphasize what this introduction does not claim. We do not model the full nonconvex training dynamics of deep networks, nor do we endogenize carbon and congestion prices from equilibrium in electricity or telecommunications markets. We also abstract from some engineering constraints—uplink scheduling, packet loss, intermittent connectivity—that matter in specific deployments. Our claim is narrower: even with conservative proxies for AoI/latency and a reduced-form representation of learning returns, the procurement problem already produces sharp economic predictions and policy-relevant thresholds. In particular, when the server must choose between discrete architectural regimes—communication-heavy versus compute-heavy—pricing generates a switching rule: there exists a carbon price level above which the compute-heavy regime is no longer worth its energy cost, and this threshold itself shifts with congestion pricing. This kind of threshold logic is precisely what practitioners want when they must set organization-wide defaults (“use FedAvg with EH unless carbon price exceeds X”) while retaining the ability to fine-tune per-device intensities.
Finally, the policy and practice implications are immediate. Carbon pricing, whether explicit (tax/ETS) or internal (shadow prices in capital planning), will influence not only how much edge learning occurs but how it is architected. Congestion pricing and network tariffs will analogously influence whether “communication-efficient” FL is actually efficient in an economic sense. Our model provides a structured way to translate prices into design parameters—update cycles and local epochs—and thus into implementable procurement guidelines. It also clarifies where additional institutional design is needed: if firms care about representativeness or resilience, they may need mechanisms that counteract the selection effects induced by pricing. The remainder of the paper formalizes these tradeoffs, derives the comparative statics that connect prices to optimal FL intensities, and then uses a discrete architecture restriction to make the compute–communication substitution and its regime-switching implications transparent.
Work at the intersection of IIoT communications, federated learning, and economic design has grown rapidly, but it is still fragmented across communities that often treat “timeliness,” “communication efficiency,” and “energy/carbon cost” as separate engineering objectives. Our contribution is to place these elements in a single procurement-oriented surplus maximization problem in which carbon and congestion prices enter as explicit primitives. This section situates that move relative to four closely related literatures.
A first thread studies age-of-information (AoI), latency, and timeliness incentives in networked control and industrial analytics, and more recently in IIoT FL. AoI was developed to formalize how “fresh” a received signal is, with a central insight that the welfare-relevant notion of timeliness is not simply throughput but the distribution of staleness over time. In IIoT settings—predictive maintenance, anomaly detection, and process control—the economic loss from stale information can be highly convex: an update that arrives “a bit late” may be nearly useless if the underlying machine state evolves quickly. The communications literature has therefore emphasized scheduling, sampling, and queueing policies that minimize AoI or AoI-like penalties under capacity constraints (e.g., sampling policies, peak AoI vs average AoI, and variants that incorporate service times and congestion). Parallel work in learning has adopted timeliness-aware objectives for FL—sometimes under labels such as “mobile/edge FL with AoI,” “timeliness-aware FL,” or “MEFL”—where the goal is to prioritize clients with fresher or more relevant updates and to account for straggling and intermittent connectivity. What we take from this literature is not a particular AoI formula, but the modeling discipline: timeliness is a first-order performance dimension, and it interacts nonlinearly with update frequency because higher update rates can themselves create queueing delay. Where our approach differs is that we treat timeliness as an economic cost that enters the server’s objective alongside priced resource usage, rather than as a standalone network metric to be minimized subject to fixed learning requirements. This shift matters in procurement environments because the server’s choice is rarely “minimize AoI at any cost”; it is “how much timeliness is worth paying for” once energy and network usage are budgeted and priced.
A second related line concerns communication-efficient FL and the compute–communication substitution embodied by FedAvg and its descendants. In the FL literature, FedAvg is commonly motivated by limited uplink capacity or device participation constraints: do more local SGD steps per round to reduce the number of rounds, possibly combined with compression, sparsification, quantization, partial participation, and asynchronous protocols. A large theory and systems literature studies how local steps affect convergence rates, stability under heterogeneous data, and sensitivity to client drift; empirically, practitioners tune local epochs as a systems knob trading communication rounds against local computation and potential accuracy degradation. Our model is deliberately agnostic about detailed convergence dynamics, but it is tightly aligned with the core engineering substitution: local computation can increase the “effective quality” of an update, while communication frequency determines how quickly the global state incorporates new information. We differ from much of the communication-efficient FL literature in two ways. First, we make the substitution priced: each update carries an explicit network cost, and each additional local epoch carries an explicit energy/carbon cost. Second, we emphasize that the best choice of “communication-efficient” architecture is not invariant; it depends on the relative shadow prices and on curvature—diminishing returns to extra local computation and increasing marginal timeliness loss from aggressive update rates. In practice, FedAvg-like defaults are often justified by bandwidth limitations alone; our framework clarifies when those defaults remain optimal once carbon accounting is added, and when an operator should instead shift toward fewer local epochs (even if it increases communication) because the marginal carbon cost dominates.
A third literature studies green and cost-aware edge AI, including energy-efficient training/inference, carbon-aware scheduling, and “sustainable ML.” This work ranges from device-level power modeling and DVFS-based energy control, to data-center carbon-aware load shifting, to system-wide accounting frameworks that translate energy use into carbon using grid intensity. In edge AI and IIoT, the focus is often on prolonging battery life, respecting thermal constraints, or meeting device duty-cycle budgets, sometimes with multi-objective optimization that trades accuracy against energy. More recent work also considers carbon pricing or carbon budgets, either explicitly (as a monetary penalty) or implicitly (as a constraint). We build on the same premise—that energy is not free and increasingly is not even “just a technical constraint”—but we embed it into a surplus maximization problem that also values timeliness. This matters because “green” policies implemented inside firms are frequently expressed as internal shadow prices applied uniformly across business units, while network charges are implemented as tariffs or capacity penalties; both are naturally modeled as per-unit prices. Our model is therefore closer in spirit to cost-accounting and procurement practice than to purely engineering-centric energy minimization. At the same time, we acknowledge a limitation relative to the green AI literature: we do not attempt to model the physical power curve of devices, thermal throttling, or time-varying grid carbon intensity. Those features can be layered in later, but our goal in the baseline is to isolate the economic comparative statics that already arise under linear per-epoch energy costs and a reduced-form learning return.
A fourth foundational thread comes from congestion pricing and queueing economics, which provides the conceptual backbone for treating network load as a priced externality. Classic results in transport and telecommunications show that congestion creates a wedge between private and social marginal costs, motivating Pigouvian tolls or priority pricing. In packet networks and wireless systems, congestion manifests through queueing delays, contention, retransmissions, and scheduling overhead—effects that are often negligible at low load but rise sharply near capacity. The queueing literature supplies both the analytical objects (latency as a function of arrival rates) and the economic logic (pricing to internalize congestion). In the IIoT FL setting, the “arrival rate” is induced by the server’s chosen update frequencies across many devices, and the congestion cost is experienced as increased end-to-end latency and/or explicit usage charges. Our reduced-form latency proxy is meant to capture this increasing marginal delay without committing to a specific queueing model or MAC-layer protocol. The point of contact with congestion pricing is twofold: (i) it justifies treating per-update communication as carrying a marginal social cost beyond the device’s own energy use, and (ii) it highlights that pushing frequency too high can be self-defeating because it increases delay and therefore timeliness loss. In that sense, our framework complements purely algorithmic approaches that attempt to “speed up” FL by increasing communication: even if more frequent updates can help learning, their value must be weighed against congestion-induced latency and priced network costs.
Across these literatures, our primary synthesis is to treat IIoT FL as a procurement problem with two priced inputs—energy/carbon and network usage—producing a value flow that is increasing in effective learning progress and decreasing in timeliness loss. This perspective helps organize several practical questions that are difficult to answer within any single prior strand: when do carbon budgets push an operator toward “lighter” local training, and when does congestion pricing push toward less frequent communication? When does it become optimal to switch discretely between FedAvg-like and more communication-heavy regimes? And how do these incentives interact with heterogeneous device energy intensities in a way that changes who participates? By positioning our analysis relative to AoI/MEFL, communication-efficient FL, green edge AI, and congestion pricing, we aim to provide a common economic language that is directly usable for design and policy: prices translate into optimal update cycles and local epoch choices, and into threshold rules that procurement teams can operationalize—even when the underlying learning dynamics are complex.
We now formalize the procurement environment that underlies the comparative statics in the remainder of the paper. The goal is not to reproduce a full learning-theoretic convergence analysis, but to write down a steady-state per-unit-time objective that puts (i) learning “quality” from local computation, (ii) timeliness losses from stale/slow information, and (iii) priced resource usage (carbon/energy and network congestion) into a single surplus metric that a procuring server can optimize.
We consider a server (principal) coordinating federated learning across a set of IIoT devices (agents) indexed by i ∈ ℐ, with |ℐ| = I. Each node generates model updates at a chosen cycle length θi > 0, equivalently at frequency fi ≡ 1/θi updates per unit time. We interpret fi broadly: it can represent literal periodic reporting, expected update arrivals under randomized scheduling, or an effective rate induced by partial participation. The key modeling discipline is that increasing fi improves freshness and (typically) learning progress, but it also loads the network and can increase congestion-induced delay.
In addition to deciding how often a node communicates, the server also chooses the node’s local computation intensity Ei ≥ 0, interpreted as the number of local epochs (or, more generally, the amount of local optimization) performed per communicated update. This is the canonical FedAvg lever: holding communication fixed, more local work can make each transmitted update “more informative,” but with diminishing returns and potentially higher resource cost. To capture this, we assume that the per-update effectiveness of local computation is summarized by a function ϕ(E) satisfying: ϕ(0) = 0, ϕ′(E) ≥ 0, ϕ″(E) ≤ 0, and ϕ ∈ C2. Concavity is the reduced-form way to encode that the first few local epochs may substantially improve an update, while additional epochs yield smaller incremental gains.
A central object in IIoT is the cost of operating on stale
information. Rather than modeling detailed queueing dynamics and control
losses, we use a standard AoI-inspired proxy that is analytically
transparent and captures the nonlinearity that matters for incentives.
If updates arrive periodically at rate f, the age of information follows a sawtooth that resets at each update; its time average over a cycle of length 1/f is 1/(2f). We therefore take the
baseline AoI proxy as
$$
A(f)=\frac{1}{2f}.
$$
In many IIoT settings, “timeliness” is not only about staleness from
infrequent sampling; it is also about latency from
congestion, contention, and queueing. To reflect that, we augment the
1/(2f) term with a
reduced-form congestion penalty that rises with the induced load. We
write a latency proxy
$$
L(f)=\frac{1}{2f}+\kappa f,\qquad \kappa>0,
$$
where κf is a
tractable way to represent the idea that pushing update frequency higher
can be self-defeating: it increases arrival rates, which increases delay
(or the probability of missing deadlines), which erodes the usefulness
of the updates. We do not claim κf is a literal queueing
formula; rather, it is a parsimonious approximation that ensures
increasing marginal timeliness costs at high frequencies—precisely the
force that rules out “infinite frequency” corner solutions and makes the
economics of congestion pricing visible.
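For reference, a direct calculation on the proxy alone (before any learning value or prices enter) gives the cadence that would minimize timeliness losses in isolation; the notation f_time is introduced purely for illustration:
$$
L'(f)=-\frac{1}{2f^{2}}+\kappa=0
\quad\Longrightarrow\quad
f_{\mathrm{time}}=\frac{1}{\sqrt{2\kappa}},
\qquad
L\big(f_{\mathrm{time}}\big)=\sqrt{2\kappa}.
$$
This purely timeliness-optimal cadence is only a reference point; the priced objective below moves the optimum away from it, but it already shows how a larger κ (heavier congestion-induced latency) pushes toward slower cadences.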
We weight timeliness losses by λ > 0, which converts the latency/AoI proxy into the same units as learning value. In applications, λ can be interpreted as the marginal business cost of delay/staleness: higher for fast-moving processes (tight control loops, safety monitoring) and lower for slowly evolving tasks (long-horizon condition monitoring).
The server values learning progress, but in steady state we model
this progress as a flow rather than as a finite-horizon
trajectory. Specifically, we assume that node i’s update stream contributes value
proportional to the rate of effective updates:
effective update flow from node
i ∝ ϕ(Ei) fi.
Let τ > 0 scale the
marginal value of effective learning progress. Then the per-unit-time
gross contribution from node i, net of timeliness losses but
before monetary resource costs, is
$$
G_i(f_i,E_i)=\tau\,\phi(E_i)\,f_i-\lambda\,L(f_i)
=\tau\phi(E_i)f_i-\lambda\left(\frac{1}{2f_i}+\kappa f_i\right).
$$
This term intentionally abstracts from the full dynamics of optimization
(nonconvexity, client drift, and so forth). The maintained
interpretation is that, at the margin, more frequent updates and more
effective per-update computation increase the rate at which the global
model improves (or the rate at which the deployed model tracks a
changing environment), while timeliness losses reduce the realized value
of that improvement.
Procurement environments differ from pure engineering optimization because input usage is priced—either through literal market prices/tariffs or through internal shadow prices tied to budgets and reporting. We therefore introduce two per-unit prices:
A carbon/energy price pC ≥ 0 applied to local computation. Node i has an energy intensity ei > 0 per local epoch per update. If node i performs Ei epochs per update, its energy per update is eiEi, and its monetary energy (or carbon-accounting) cost per update is pCeiEi.
A network congestion price pN ≥ 0 applied to communication. Each update has payload size m > 0 (e.g., a model delta), taken constant for tractability. The monetary network cost per update is then pNm. One can interpret pN as an access tariff set by a network operator, or as the server’s internal congestion shadow price capturing the externality that one device’s traffic imposes on others.
Because both costs are incurred per update, per-unit-time costs scale
with frequency. Thus node i’s
priced resource cost per unit time is
Ci(fi, Ei) = (pCeiEi + pNm)fi.
This linear-in-fi structure is
not an innocuous simplification: it is exactly what makes “communication
is expensive when pN is high” a
sharp statement, and it maps directly to how cloud/network bills and
carbon accounting are typically booked (usage × price).
We assume the server has quasilinear preferences and aggregates value
across nodes. Let β > 0 be
a scale factor converting satisfaction Gi into the
server’s value (or, equivalently, an importance weight on the FL task
within a broader business objective). The server’s steady-state
per-unit-time objective is
W = ∑i ∈ ℐ[β Gi(fi, Ei) − Ci(fi, Ei)].
We emphasize two interpretations of W. First, it is a social
surplus measure under the common procurement benchmark in which
transfers between the server and nodes net out, so only real resource
costs (energy and network usage) remain. Second, it is also a
reduced-form profit objective if pC and pN are literal
input prices faced by the server (possibly passed through in
contracts).
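For concreteness, the flow accounting behind W can be written as a few lines of code. The following Python sketch is purely illustrative (the function and node names are ours, and the parameter values are arbitrary placeholders); it evaluates Gi, Ci, and W for a given profile of intensities, mirroring the usage × price bookkeeping described above.

```python
import math

def phi(E):
    """Concave per-update effectiveness; log(1+E) is the example used later in the paper."""
    return math.log1p(E)

def latency_proxy(f, kappa):
    """AoI/latency proxy L(f) = 1/(2f) + kappa*f."""
    return 1.0 / (2.0 * f) + kappa * f

def node_gross_value(f, E, tau, lam, kappa):
    """G_i(f, E) = tau*phi(E)*f - lam*L(f): effective-update flow net of timeliness losses."""
    return tau * phi(E) * f - lam * latency_proxy(f, kappa)

def node_resource_cost(f, E, p_C, p_N, e, m):
    """C_i(f, E) = (p_C*e*E + p_N*m)*f: carbon-priced compute plus congestion-priced traffic."""
    return (p_C * e * E + p_N * m) * f

def fleet_surplus(profile, beta, tau, lam, kappa, p_C, p_N, m):
    """W = sum_i [beta*G_i - C_i], where profile maps node id -> (f_i, E_i, e_i)."""
    total = 0.0
    for f, E, e in profile.values():
        total += beta * node_gross_value(f, E, tau, lam, kappa)
        total -= node_resource_cost(f, E, p_C, p_N, e, m)
    return total

if __name__ == "__main__":
    profile = {"press_01": (0.9, 0.3, 2.0), "robot_07": (0.5, 1.0, 0.8)}  # (f_i, E_i, e_i), placeholders
    print(fleet_surplus(profile, beta=1.0, tau=4.0, lam=1.0, kappa=0.3, p_C=1.5, p_N=0.5, m=1.0))
```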
Why steady state? In practice, FL involves transient learning curves, changing participation, and time-varying network conditions. Our baseline deliberately compresses these dynamics into a stationary flow problem for three reasons. (i) In many IIoT deployments, FL is not a one-shot training run but an ongoing process of adaptation, making per-unit-time welfare a natural primitive. (ii) The key policy instruments we want to interpret—carbon prices, congestion tariffs—are themselves typically expressed per unit energy and per unit data, i.e., in flow terms. (iii) The comparative statics of interest (how optimal Ei and fi move with pC and pN) already appear in this stationary formulation; adding richer learning dynamics is valuable, but it should not be necessary to see the basic economic logic.
At the same time, we acknowledge what this abstraction leaves out. The function ϕ(E) subsumes complex algorithmic effects: client drift under many local steps, heterogeneity in data quality, and nonstationary environments. The payload m can in reality depend on compression or sparsification choices. And the latency proxy L(f) is not derived from a specific queueing discipline. We view these as extensions rather than flaws: the baseline is meant to be a transparent mapping from prices to operating choices (update cycles and local epochs). The next section takes this objective as given and solves the per-node problem under full information, providing closed-form solutions for common ϕ and clean threshold rules that can be operationalized in procurement settings.
We begin with the benchmark in which the server observes each node’s
energy intensity ei and can
directly implement (or contract on) the pair (fi, Ei).
Because the objective is additively separable across nodes, the server’s
problem decomposes into I
independent per-node optimizations. Fix a node i and suppress the subscript when no
confusion arises. The per-node program is
$$
\max_{f\ge 0,\;E\ge 0}\; w(f,E)
=\beta\Big(\tau\phi(E)f-\lambda\Big(\frac{1}{2f}+\kappa
f\Big)\Big)-\big(p_C e\,E+p_N m\big)f.
$$
The structure is economically transparent: f scales both the benefits from “effective updates” and the per-update monetary costs, while the timeliness term disciplines both corners. Its staleness component λ/(2f) blows up as f → 0, ruling out “never update,” and its congestion component λκf, together with the priced per-update costs, makes ever-higher frequency unattractive whenever those per-update costs exceed the per-update benefit βτϕ(E).
Consider first an interior candidate with f > 0 and E > 0. The FOC with respect to
local computation E is
$$
\frac{\partial w}{\partial E}
=\beta\tau\,\phi'(E)\,f - p_C e\,f
= f\big(\beta\tau\phi'(E)-p_C e\big)=0,
$$
so any interior optimum must satisfy the simple marginal condition
$$
\boxed{\;\beta\tau\phi'(E^*)=p_C e.\;}
\tag{E-FOC}
$$
Two points are worth highlighting. First, f factors out, so E* (when interior) is
pinned down solely by the tradeoff between the marginal learning value
of an extra local epoch and its carbon-priced energy cost. Second,
because ϕ is concave, ϕ′ is weakly decreasing,
so the equation has at most one solution; if it fails (because pCe is
too large), the optimum is at the boundary E* = 0.
The FOC with respect to frequency f is
$$
\frac{\partial w}{\partial f}
=\beta\tau\phi(E)-\beta\lambda\kappa-\big(p_C eE+p_N m\big)
+\frac{\beta\lambda}{2f^2}=0.
$$
Rearranging yields
$$
\frac{\beta\lambda}{2f^{2}}
=\big(p_C eE+p_N m\big)+\beta\lambda\kappa-\beta\tau\phi(E).
\tag{f-FOC}
$$
Whenever the right-hand side is strictly positive, we obtain the
closed-form interior frequency
$$
\boxed{\;
f^*(E)
=\sqrt{\frac{\beta\lambda}{2\Big[\big(p_C eE+p_N
m\big)+\beta\lambda\kappa-\beta\tau\phi(E)\Big]}}
\;}.
\tag{1}
$$
If the bracketed term is nonpositive, the per-node objective is increasing in f: the net linear-in-f benefit per update is nonnegative while the −βλ/(2f) penalty vanishes as f grows, so no interior optimizer exists. In that case the solution is a
boundary choice determined by feasibility (e.g., an exogenous maximum
feasible update rate, or a network-imposed cap). In the remainder, we
focus on the economically relevant region where parameters imply the
bracketed term is positive at the optimal E, so that a finite f* exists.
Putting the two FOCs together gives a convenient solution procedure: (i) solve for E* from (E-FOC) with a nonnegativity truncation, then (ii) plug E* into (1) to obtain f*.
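The two-step procedure can be implemented numerically for any concave ϕ. The sketch below is ours and is only a sketch under the stated assumptions: it brackets the root of (E-FOC) with SciPy's brentq, applies the nonnegativity truncation, and then evaluates (1), returning None when the bracketed term is nonpositive (the boundary case discussed above). The example parameters are placeholders.

```python
import math
from scipy.optimize import brentq

def solve_E_star(dphi, beta, tau, p_C, e, E_max=1e6):
    """Solve beta*tau*phi'(E) = p_C*e (E-FOC), truncated at E = 0."""
    marginal = lambda E: beta * tau * dphi(E) - p_C * e
    if marginal(0.0) <= 0.0:      # even the first epoch is not worth its carbon-priced energy
        return 0.0
    if marginal(E_max) > 0.0:     # diminishing returns never bind on [0, E_max]
        return E_max
    return brentq(marginal, 0.0, E_max)

def solve_f_star(E, phi, beta, tau, lam, kappa, p_C, p_N, e, m):
    """Evaluate the interior frequency (1); return None when no interior optimum exists."""
    D = (p_C * e * E + p_N * m) + beta * lam * kappa - beta * tau * phi(E)
    if D <= 0.0:
        return None               # objective increasing in f: a feasibility cap would bind instead
    return math.sqrt(beta * lam / (2.0 * D))

if __name__ == "__main__":
    phi = lambda E: math.log1p(E)       # the closed-form example used below
    dphi = lambda E: 1.0 / (1.0 + E)
    pars = dict(beta=1.0, tau=4.0, lam=1.0, kappa=0.3, p_C=1.5, p_N=0.5, e=2.0, m=1.0)
    E_star = solve_E_star(dphi, pars["beta"], pars["tau"], pars["p_C"], pars["e"])
    print(E_star, solve_f_star(E_star, phi, **pars))   # ~0.33 and ~0.88 with these placeholders
```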
The per-node objective is strictly concave in f on f > 0 because
$$
\frac{\partial^2 w}{\partial f^2}=-\frac{\beta\lambda}{f^3}<0.
$$
It is concave in E because ϕ is concave and the only other E-dependence, the priced energy term (pCeE)f, enters linearly:
$$
\frac{\partial^2 w}{\partial E^2}=\beta\tau\phi''(E)\,f\le 0.
$$
Intuitively, concavity in E comes from diminishing returns to local epochs, while strict concavity in f is generated by the staleness term −βλ/(2f); the linear congestion penalty −βλκf adds no curvature, but it raises the marginal cost of frequency. The role of κ > 0 is therefore practical as well as technical: it ensures that “more frequent reporting” eventually becomes unattractive because it creates delay/overload, thereby ruling out knife-edge parameter cases in which the best response is to push communication to the maximum possible rate.
These margin-by-margin properties already pin down the optimum: because f factors out of the E-FOC, the maximizing E is the same E* for every f > 0, and w(f, E*) is strictly concave in f on f > 0. Hence the optimizer (f*, E*) is unique up to boundary truncation, and the FOCs above are not merely necessary but (with the usual KKT conditions) sufficient for optimality.
To make the comparative statics operational, it is useful to
specialize to a standard effectiveness function. Let
$$
\phi(E)=\log(1+E),\qquad \phi'(E)=\frac{1}{1+E}.
$$
Then the interior condition βτϕ′(E) = pCe
becomes
$$
\frac{\beta\tau}{1+E^*}=p_C e
\quad\Rightarrow\quad
E^*=\frac{\beta\tau}{p_C e}-1.
$$
Imposing E ≥ 0 yields the
truncated closed form
$$
\boxed{\;
E^*(p_C)=\max\Big\{\frac{\beta\tau}{p_C e}-1,\;0\Big\}.
\;}
\tag{2}
$$
This expression already captures the core “procurement” message: the
server substitutes away from local computation as carbon becomes
expensive, and low-efficiency devices (high e) are assigned systematically fewer
local epochs.
Given E*, the
frequency expression (1) becomes
$$
\boxed{\;
f^*(p_C,p_N)
=\sqrt{\frac{\beta\lambda}{2\Big[\big(p_C eE^*(p_C)+p_N
m\big)+\beta\lambda\kappa-\beta\tau\log(1+E^*(p_C))\Big]}}
\;}
\tag{3}
$$
provided the denominator is positive. Two special cases help
interpretation:
High carbon price (or inefficient device): If
pCe ≥ βτ,
then E* = 0
and
$$
f^*=\sqrt{\frac{\beta\lambda}{2\big[p_N m+\beta\lambda\kappa\big]}}.
$$
Communication is then chosen purely to balance timeliness gains against
congestion pricing and congestion-induced latency; the device does no
extra local work because its carbon-priced compute is too
costly.
Low carbon price: If pCe < βτ, then E* > 0 and both the value and the cost per update shift with pC. Equation (3) makes clear that carbon pricing affects f* only through the net per-update term βτϕ(E*) − (pCeE*), which will be the object driving the comparative statics in the next section.
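For reference, the closed forms (2) and (3), including the truncation at E* = 0 and the positivity check on the denominator, translate into a short routine of the kind a procurement tool could call directly. The sketch is ours and the numbers in the example are arbitrary placeholders.

```python
import math

def E_star_log(p_C, beta, tau, e):
    """Equation (2): E*(p_C) = max{beta*tau/(p_C*e) - 1, 0} for phi(E) = log(1+E)."""
    return max(beta * tau / (p_C * e) - 1.0, 0.0)

def f_star_log(p_C, p_N, beta, tau, lam, kappa, e, m):
    """Equation (3): interior update frequency, or None when the denominator is nonpositive."""
    E = E_star_log(p_C, beta, tau, e)
    denom = (p_C * e * E + p_N * m) + beta * lam * kappa - beta * tau * math.log1p(E)
    if denom <= 0.0:
        return None   # no interior optimum; an exogenous cap on f would bind instead
    return math.sqrt(beta * lam / (2.0 * denom))

if __name__ == "__main__":
    pars = dict(beta=1.0, tau=4.0, lam=1.0, kappa=0.3, e=2.0, m=1.0)
    # Low carbon price: interior E* > 0, and p_C affects f* through the net per-update term.
    print(E_star_log(1.5, 1.0, 4.0, 2.0), f_star_log(1.5, 0.5, **pars))
    # High carbon price (p_C*e >= beta*tau): E* = 0 and f* balances timeliness against congestion only.
    print(E_star_log(5.0, 1.0, 4.0, 2.0), f_star_log(5.0, 0.5, **pars))
```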
Because our benchmark assumes full information and quasilinear
payoffs, the allocation (fi*, Ei*)
can be implemented by a simple posted contract for each node: reimburse
verifiable resource usage at the prevailing prices (or shadow prices)
and add a fixed transfer to satisfy participation. Concretely, a
contract can specify a required (fi, Ei),
verify performance via telemetry/attestation, and pay
Ti = (pCeiEi + pNm)fi + (fixed
fee),
so the node is indifferent over the mandated action while the server
internalizes the true resource costs in its choice of (fi, Ei).
This benchmark is useful not because it describes every deployment, but
because it cleanly separates (i) the economic logic of carbon and
congestion pricing from (ii) informational frictions that we can layer
on later via menus and self-selection.
In sum, the baseline delivers a unique per-node prescription: choose Ei* by equating marginal learning value to marginal carbon-priced energy cost, then choose fi* to trade off timeliness benefits against both monetary per-update costs and congestion-induced latency. The next section uses these characterizations to derive monotonicity results and the regime-switching thresholds that are most directly interpretable for policy and system design.
Our first-order characterization delivers clean monotonicity predictions once we treat (pC, pN) as policy-relevant shadow prices. The guiding intuition is standard in procurement: the server substitutes away from inputs whose unit prices rise, but here substitution is mediated by a timeliness externality—higher frequency reduces staleness (AoI) yet increases congestion-induced delay through κf. This section formalizes three comparative statics: (i) carbon pricing reduces local computation intensity E*; (ii) congestion pricing reduces update frequency f*; and (iii) carbon pricing also weakly reduces f* once we account for the fact that the “effective value per update” is itself optimized over E.
Start from the interior epoch condition (E-FOC),
βτϕ′(E*) = pCe.
Because ϕ is concave, ϕ′ is weakly decreasing,
so the mapping pC ↦ E*(pC)
is (weakly) decreasing. The formal derivative follows from the implicit
function theorem on the interior region where E* > 0 and ϕ″(E*) < 0:
$$
\beta\tau \phi''(E^*)\frac{\partial E^*}{\partial p_C}=e
\quad\Rightarrow\quad
\frac{\partial E^*}{\partial p_C}=\frac{e}{\beta\tau\phi''(E^*)}\le 0.
$$
At high carbon prices the interior equation may cease to have a
nonnegative solution; then the KKT conditions imply a boundary choice
E* = 0. Thus E*(pC)
is globally weakly decreasing: it falls smoothly when interior and
becomes flat at zero once carbon-priced compute is sufficiently
expensive.
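As a consistency check under the log specification ϕ(E) = log(1 + E) used above, the implicit-function derivative agrees with what one obtains by differentiating the closed form (2) directly on the interior region:
$$
E^*=\frac{\beta\tau}{p_C e}-1
\;\Longrightarrow\;
\frac{\partial E^*}{\partial p_C}=-\frac{\beta\tau}{p_C^{2}e},
\qquad
\frac{e}{\beta\tau\,\phi''(E^*)}
=-\frac{e\,(1+E^*)^{2}}{\beta\tau}
=-\frac{\beta\tau}{p_C^{2}e},
$$
where the last equality uses 1 + E* = βτ/(pCe).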
Two economic implications are immediate. First, heterogeneity in energy intensity ei acts exactly like heterogeneous effective carbon exposure: devices with high ei are assigned fewer local epochs, even under full information, because their marginal epoch is “taxed” more heavily. Second, in the baseline model E* is independent of the congestion price pN, reflecting separability: pN prices communication per update, while the epoch decision trades off only marginal learning returns βτϕ′(E) against marginal carbon cost pCe. (In richer formulations where accuracy per unit time must meet a target, we would expect substitution from communication toward compute when pN rises; we return to this idea in the discrete regime model.)
Next consider the frequency decision. Holding E fixed, the interior solution takes
the closed form (1),
$$
f^*(E)=\sqrt{\frac{\beta\lambda}{2D(E;p_C,p_N)}},\qquad
D(E;p_C,p_N)\equiv \big(p_C eE+p_N
m\big)+\beta\lambda\kappa-\beta\tau\phi(E),
$$
defined when D > 0.
Differentiating with respect to pN at an
interior solution yields a strict monotonicity result whenever m > 0:
$$
\frac{\partial f^*}{\partial p_N}
=-\frac{1}{2}\sqrt{\frac{\beta\lambda}{2}}\,D^{-3/2}\cdot m
= -\frac{m}{2D}\,f^*<0.
$$
Thus congestion pricing reduces the efficient update rate: as each
transmission becomes monetarily more expensive, the server economizes on
communication by increasing the update cycle length θ = 1/f.
This monotonicity has a straightforward “Pigouvian” interpretation. The term pNm can be read either as an explicit price charged by a network operator or as the server’s own shadow price for network capacity (e.g., an internal congestion cost). In either case, a higher pN shifts up the per-update marginal cost of frequency, and the optimal response is to reduce f, especially when m (payload size) is large.
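For reporting purposes it is sometimes convenient to restate this response as a dimensionless elasticity, which follows immediately from the derivative above:
$$
\frac{\partial \ln f^*}{\partial \ln p_N}=-\frac{p_N m}{2D},
$$
so the proportional decline in the optimal cadence equals half the ratio of the per-update congestion charge pNm to the wedge D.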
Carbon pricing enters the frequency condition both directly (through the term pCeE) and indirectly (through the induced reduction in E*). A convenient way to organize the effect is to separate the “per-update net value of compute” from the purely frequency-driven timeliness tradeoff.
Define the optimized per-update net learning value (in utility units)
as
Ψ(pC) ≡ maxE ≥ 0{βτϕ(E) − pCeE} = βτϕ(E*(pC)) − pCeE*(pC).
Then the denominator in the frequency formula can be rewritten at the
epoch optimum as
D(E*(pC); pC, pN) = pNm + βλκ − Ψ(pC),
so that, whenever the interior f* exists,
$$
f^*(p_C,p_N)
=\sqrt{\frac{\beta\lambda}{2\big[p_N
m+\beta\lambda\kappa-\Psi(p_C)\big]}}.
\tag{CS-f}
$$
This expression makes the economics transparent: communication is more
frequent when the per-update net value Ψ(pC)
is high (updates are “worth more”), and less frequent when either
congestion costs (pNm)
or congestion-induced latency (βλκ) are
high.
The remaining question is how Ψ(pC)
moves with pC. Under our
maintained assumptions, Ψ is
weakly decreasing in pC; indeed,
where the epoch choice is interior, the envelope theorem gives an
especially simple derivative:
$$
\frac{d\Psi}{dp_C}
=\frac{\partial}{\partial p_C}\Big(\beta\tau\phi(E^*)-p_C eE^*\Big)
=-eE^*(p_C)\le 0.
$$
Intuitively, when the carbon price rises, the server optimally trims
epochs, but even after this adjustment the best achievable
per-update net learning value falls—by exactly the current energy
exposure eE*. Combining
this with (CS-f), we obtain
$$
\frac{\partial f^*}{\partial p_C}
=\frac{f^*}{2D}\,\frac{d\Psi}{dp_C}
= -\frac{eE^*(p_C)}{2D}\,f^*
\le 0,
$$
with strict inequality whenever E* > 0 and the
interior f*
solution applies. When carbon pricing is high enough that E* = 0, Ψ(pC) = 0
locally and ∂f*/∂pC = 0:
carbon policy no longer distorts frequency because the server has
already shut down discretionary local computation.
This result is best read as an interaction between input substitution and a timeliness externality. Carbon pricing directly discourages compute (lower E*), which mechanically makes each update less effective; because frequency is chosen to balance timeliness gains against per-update costs, lower per-update effectiveness reduces the incentive to communicate frequently. The κf term is central here: it ensures that frequency is not “free” in latency terms, so a decline in per-update value translates into a lower optimal cadence rather than an attempt to compensate by updating ever more often.
Finally, we note a practical corollary that will matter for architecture choice. Since pN moves f* directly while pC moves both E* and f*, the joint policy environment (pC, pN) shapes whether the system should rely on “communication-heavy” learning (high f, low E) or “compute-heavy” learning (lower f, higher E per update). The next section makes this tradeoff explicit by restricting E to two discrete regimes and deriving the carbon threshold pC†(pN) at which the server switches architectures.
The continuous-choice benchmark is useful for comparative statics,
but it overstates the server’s flexibility. In most IIoT FL deployments
the “architecture” is implemented by protocol and software defaults (or
certification constraints) that effectively pin local computation to a
small set of admissible epoch regimes. A natural stylization is
therefore to restrict the epoch choice to two discrete levels,
E ∈ {EL, EH}, 0 ≤ EL < EH,
where EL
represents a communication-heavy design (many rounds, little local work
per round) and EH represents a
compute-heavy, FedAvg-like design (more local work per round).
With E discrete, the server’s problem becomes a two-step program: for each regime r ∈ {L, H}, (i) choose the optimal update frequency f given E = Er, and then (ii) pick the regime delivering the higher optimized per-node value. We can then ask a policy-relevant question: for given congestion pricing pN, at what carbon price pC does the server switch from compute-heavy to communication-heavy?
Fix a node i and suppress
the index to simplify notation. For a given epoch regime Er, the
per-unit-time objective reduces to a one-dimensional problem
Vr(pC, pN) ≡ maxf > 0 w(f, Er),
with
$$
w(f,E_r)
=\beta\Big(\tau\phi(E_r)f-\lambda\big(\tfrac{1}{2f}+\kappa f\big)\Big)
-\big(p_C eE_r+p_N m\big)f.
$$
As in the interior analysis, define the “effective per-unit-frequency
wedge”
Dr(pC, pN) ≡ (pCeEr + pNm) + βλκ − βτϕ(Er).
Whenever Dr > 0, the
objective is strictly concave in f and the unique interior maximizer
is
$$
f_r^*(p_C,p_N)
=\sqrt{\frac{\beta\lambda}{2D_r(p_C,p_N)}}.
$$
Intuitively, Dr aggregates
all terms that make frequent communication unattractive in regime r: carbon-priced compute per unit
time (pCeEr)f,
congestion-priced traffic (pNm)f,
and congestion-induced latency βλκf, net
of the learning value βτϕ(Er)f.
A higher Er raises both
the carbon exposure pCeEr
and the per-update value βτϕ(Er);
which force dominates determines whether the compute-heavy regime
induces a higher or lower optimal cadence.
Plugging the optimizer back into w yields the regime value (for Dr > 0):
$$
V_r(p_C,p_N)
=w\big(f_r^*(p_C,p_N),E_r\big)
=-\sqrt{2\beta\lambda\,D_r(p_C,p_N)}.
$$
We emphasize that, in this reduced-form steady-state proxy, Vr is strictly
decreasing in Dr: any force
that raises the effective wedge against frequent updating reduces the
optimized value within that regime.
The server prefers the compute-heavy regime EH whenever
VH(pC, pN) ≥ VL(pC, pN).
On the interior region where both regime-specific frequency choices are
well defined, the monotonicity of Vr in Dr implies that
the comparison is equivalent to comparing wedges:
VH ≥ VL ⇔ DH(pC, pN) ≤ DL(pC, pN).
Therefore, a (candidate) switching threshold solves DH = DL,
i.e.,
pCe(EH − EL) = βτ(ϕ(EH) − ϕ(EL)).
This delivers a clean closed form:
$$
p_C^{\dagger}
=\frac{\beta\tau\big(\phi(E_H)-\phi(E_L)\big)}{e\,(E_H-E_L)}.
$$
The economic content is straightforward: the server switches away from
compute-heavy training once the carbon price makes the marginal
carbon cost of moving from EL to EH exceed the
corresponding marginal learning gain, measured in the same
utility units.
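A worked number makes the threshold concrete. Under the log specification and with illustrative regimes E_L = 1 and E_H = 5 (chosen purely as an example, not calibrated):
$$
p_C^{\dagger}
=\frac{\beta\tau\big(\log 6-\log 2\big)}{e\,(5-1)}
=\frac{\beta\tau\,\log 3}{4e}
\approx \frac{0.275\,\beta\tau}{e},
$$
so, for instance, doubling a node’s energy intensity e halves the carbon price at which the compute-heavy regime stops being worthwhile for that node.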
At the same time, this knife-edge clarity reveals a limitation: under the maintained baseline in which (i) the communication payload m is the same across regimes and (ii) congestion enters only through the linear per-update term (pNm)f (and the same κ applies across regimes), the congestion price pN cancels out of the regime comparison. Congestion pricing still depresses VH and VL by lowering fr*, but it does so in a way that does not tilt the ranking between EH and EL on the interior region.
This is exactly the sense in which the two-regime exercise is informative: it isolates which modeling ingredients are needed for congestion to shift architectural choice, rather than merely scaling down communication intensity.
In operational FL systems, regimes typically differ not only in local
epochs but also in the network footprint per “round”—through protocol
overhead, compression choices, error-feedback state, or richer metadata
in compute-heavy designs. A parsimonious way to capture this is to allow
the payload to be regime-dependent, mr (with mH not
necessarily equal to mL). Repeating
the analysis with mr yields
Dr(pC, pN) = (pCeEr + pNmr) + βλκ − βτϕ(Er),
and the indifference condition DH = DL
becomes
$$
p_C^{\dagger}(p_N)
=\frac{\beta\tau\big(\phi(E_H)-\phi(E_L)\big)-p_N\,(m_H-m_L)}{e\,(E_H-E_L)}.
$$
This expression makes the congestion-shift channel transparent:
$$
\frac{\partial p_C^{\dagger}}{\partial p_N}
=-\frac{m_H-m_L}{e\,(E_H-E_L)}.
$$
Hence, pC†(pN)
is weakly decreasing in pN whenever the
compute-heavy regime carries a weakly larger network footprint per
update, mH ≥ mL.
Economically, if compute-heavy training is also “heavier” on the
network (for instance, because it transmits a less-compressed model
delta, additional state, or higher-precision updates to capitalize on
more local work), then congestion pricing penalizes the compute-heavy
regime disproportionately, so the server requires cheaper
carbon (a lower pC) to justify
it.
Conversely, if mH < mL—the case one might associate with aggressive compression or fewer bytes per effective unit of learning under FedAvg-like designs—then congestion makes compute-heavy training more attractive and the threshold increases. We can therefore read the sign of ∂pC†/∂pN as an empirical question about which regime is more “bytes intensive” per update, holding fixed the update definition.
A second, closely related way congestion can shift the threshold is through feasibility/boundary effects. Even if m is identical across regimes, large pN can push one regime into a corner where the interior expression for fr* no longer applies (e.g., due to protocol-imposed bounds on admissible cadences or because the implied optimizer would require an unrealistically small/large θ). Once one regime is effectively constrained while the other remains interior, the regime ranking inherits a dependence on pN through which constraint binds, and the switching set generically tilts with congestion.
Taken together, these considerations justify treating the threshold as a function pC†(pN) in applied work: congestion affects architectural choice precisely to the extent that architectures differ in their induced network footprint or in which operating constraints become active under high congestion. This is also the sense in which policy instruments interact: carbon pricing primarily rotates the compute margin, while congestion pricing (or capacity scarcity) rotates the communication margin, and the resulting architecture choice depends on how protocols map “more local work” into transmitted bytes and admissible frequencies.
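The regime comparison and the threshold pC†(pN) with regime-dependent payloads reduce to a few lines of code of the kind a procurement policy engine could embed. The sketch below is ours; the regime levels, payloads, and prices in the example are illustrative placeholders, and the comparison-by-wedges shortcut presumes both regimes admit interior frequencies (Dr > 0).

```python
import math

def wedge(E, m_r, p_C, p_N, beta, tau, lam, kappa, e, phi=math.log1p):
    """D_r = (p_C*e*E_r + p_N*m_r) + beta*lam*kappa - beta*tau*phi(E_r)."""
    return (p_C * e * E + p_N * m_r) + beta * lam * kappa - beta * tau * phi(E)

def carbon_threshold(p_N, E_L, E_H, m_L, m_H, beta, tau, e, phi=math.log1p):
    """Carbon price at which D_H = D_L; above it, the communication-heavy regime is preferred."""
    return (beta * tau * (phi(E_H) - phi(E_L)) - p_N * (m_H - m_L)) / (e * (E_H - E_L))

def choose_regime(p_C, p_N, E_L, E_H, m_L, m_H, beta, tau, lam, kappa, e):
    """Pick the regime with the smaller wedge (the larger V_r), assuming both wedges are positive."""
    D_L = wedge(E_L, m_L, p_C, p_N, beta, tau, lam, kappa, e)
    D_H = wedge(E_H, m_H, p_C, p_N, beta, tau, lam, kappa, e)
    return "compute-heavy (E_H)" if D_H <= D_L else "communication-heavy (E_L)"

if __name__ == "__main__":
    pars = dict(beta=1.0, tau=4.0, lam=1.0, kappa=3.0, e=2.0)
    regimes = dict(E_L=1.0, E_H=5.0, m_L=1.0, m_H=1.5)  # compute-heavy regime also heavier on the network
    print(carbon_threshold(0.0, **regimes, beta=1.0, tau=4.0, e=2.0))  # ~0.55
    print(carbon_threshold(1.0, **regimes, beta=1.0, tau=4.0, e=2.0))  # ~0.49: threshold falls with p_N
    print(choose_regime(0.4, 1.0, **regimes, **pars))                  # below threshold: compute-heavy
    print(choose_regime(0.6, 1.0, **regimes, **pars))                  # above threshold: communication-heavy
```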
The discrete-regime exercise treats a representative node. In practice, IIoT fleets are heterogeneous along two economically central dimensions: (i) energy intensity ei (older devices, weaker accelerators, or harsher thermal envelopes require more Joules per local epoch), and (ii) learning productivity/value (some nodes observe richer data, higher variance states, or more decision-relevant operating regimes). We capture the second dimension with a multiplicative shifter vi > 0 in the learning term, so that node i’s per-unit-time contribution becomes τviϕ(Ei)fi.
With this heterogeneity, the server’s per-node surplus (still in
steady state, ignoring transfers) is
$$
w_i(f_i,E_i;v_i,e_i)
=\beta\Big(\tau v_i\phi(E_i)f_i-\lambda\big(\tfrac{1}{2f_i}+\kappa
f_i\big)\Big)
-\big(p_C e_i E_i+p_N m\big)f_i,
$$
and the server may also choose a participation set 𝒮 ⊆ ℐ (equivalently, set fi = 0 by not
enrolling node i at all, so
that no AoI/latency terms are incurred for that node).
Conditional on including node i, the interior FOCs take the same
form as before, with τ
replaced by τvi. In
particular, the local-epoch choice equates marginal learning returns to
marginal carbon-priced compute:
βτvi ϕ′(Ei*) = pCei,
so Ei*
is increasing in vi and
decreasing in (pC, ei).
The update-frequency choice can be written compactly by defining the
net per-update learning surplus from compute (measured in the
server’s utility units),
ψi(pC; vi, ei) ≡ βτviϕ(Ei*) − pCeiEi*,
so that the frequency FOC becomes
$$
\frac{\beta\lambda}{2(f_i^*)^2}
=\big(p_N m+\beta\lambda\kappa\big)-\psi_i(p_C;v_i,e_i),
$$
whenever the right-hand side is positive (otherwise the interior
expression breaks and a boundary/feasibility constraint is needed). This
representation makes the selection logic transparent: nodes with high
vi and low
ei
generate larger ψi, which
relaxes the “wedge” against frequent updating created by congestion
pricing pN
and congestion-latency κ.
Let the server’s outside option from excluding node i be normalized to 0. Then the server includes node i iff its optimized per-node surplus
is nonnegative:
i ∈ 𝒮 ⇔ maxf ≥ 0, E ≥ 0 wi(f, E; vi, ei) ≥ 0.
This “participation” formulation is the natural full-information
benchmark: with quasilinear node utilities and verifiable costs, the
server can always implement (fi*, Ei*)
through a cost-reimbursing contract plus (if needed) a fixed transfer
that satisfies individual rationality. In richer environments (private
ei,
private vi, limited
verification), the right-hand side becomes a virtual surplus
and participation becomes a screening problem; we return to this only
briefly below.
Two comparative-static implications follow immediately. First, increasing either price pC or pN weakly shrinks 𝒮, because higher prices lower the maximized value of wi. Second, selection becomes more stringent for energy-inefficient nodes as carbon prices rise: even if such nodes remain useful at low pC, they are “priced out” earlier.
To make the cutoff operational, it is useful to exhibit an explicit
index in a canonical concave form. Take
ϕ(E) = log (1 + E),
so $\phi'(E)=\frac{1}{1+E}$. The epoch
FOC yields
$$
E_i^*(p_C)=\max\Big\{\frac{\beta\tau v_i}{p_C e_i}-1,\;0\Big\}.
$$
Thus the compute decision depends on the value per unit energy
ratio vi/ei
and the carbon price pC. When Ei* > 0,
we can also compute the net per-update surplus term:
$$
\psi_i(p_C;v_i,e_i)
=\beta\tau v_i\log\!\Big(\frac{\beta\tau v_i}{p_C
e_i}\Big)-\big(\beta\tau v_i-p_C e_i\big).
$$
Inserting ψi into the
frequency condition shows that higher vi/ei
raises ψi,
which (holding feasibility fixed) increases the optimal frequency fi*
and raises the node’s optimized surplus. This yields an “index”
interpretation: nodes can be approximately ranked by
$$
\text{index}_i \;\approx\; \frac{v_i}{e_i},
$$
with the participation set characterized by a cutoff that moves downward
in (pC, pN).
Formally, for any fixed (pC, pN),
there exists a threshold $\underline{\rho}(p_C,p_N)$ such that nodes
with sufficiently large vi/ei
are (weakly) more likely to satisfy maxf, Ewi ≥ 0.
In applied work, this is the key empirical implication: carbon pricing
shifts the selected fleet toward “high learning value per Joule,” while
congestion pricing shifts it toward “high learning value per transmitted
byte” (here proxied by vi since m is constant in the baseline).
A limitation is worth flagging. Because our steady-state proxy penalizes staleness via the λ/(2f) term, the optimized interior level takes the form −√(2βλDi) with Di ≡ (pNm + βλκ) − ψi, exactly as in the regime analysis above, and is therefore negative whenever the interior solution applies; levels, as opposed to rankings, should not be read literally. In realistic deployments, feasibility bounds on f (protocol limits), additional baseline benefits from participation, or a finite-horizon objective typically regularize levels; nonetheless, the index ranking and price comparative statics remain informative for which nodes are dropped first.
Up to now, transfers cancel in the server’s surplus W. Procurement in the field,
however, is often subject to a hard budget (or accounting) constraint:
the server cannot reimburse all induced energy and network expenditures.
A parsimonious way to model this is to impose a per-unit-time spending
cap B on reimbursements:
∑i ∈ 𝒮Ti ≤ B,
where under full-information cost reimbursement a natural lower bound is
Ti ≥ (pCeiEi + pNm)fi
(plus any required fixed participation payment if nodes have nonzero
outside options).
The main analytical implication is that a binding budget introduces a
Lagrange multiplier μ ≥ 0 that
effectively scales up resource prices. In the simplest case where
transfers equal verifiable resource costs, the constrained problem is
equivalent to the unconstrained problem with augmented
prices
pCeff = (1 + μ)pC, pNeff = (1 + μ)pN.
Hence budget scarcity amplifies the impact of both carbon and congestion
pricing on (i) intensive margins (fi, Ei)
and (ii) the extensive margin 𝒮.
Practically, this predicts “double rationing” in high-price
environments: not only does the server reduce frequencies and epochs for
enrolled nodes, it also drops marginal nodes entirely.
When nodes’ outside options differ or information rents matter, the budget constraint becomes closer to a knapsack problem: select the subset of nodes with the highest (virtual) surplus per dollar of required transfer. Even then, the same qualitative index logic reappears: the server tilts enrollment toward high vi and low ei, and prices (pC, pN) rotate the cutoff.
Finally, heterogeneity highlights why measurement and verification matter for policy. If ei is privately known and difficult to audit, a menu of contracts over (f, E) can be used to screen devices, but carbon pricing then interacts with informational distortions: the server may deliberately depress E or f to reduce rents for energy-inefficient types. Conversely, if secure attestation or digital twins make ei contractible, the clean full-information cutoffs above become a reasonable approximation, and one can interpret carbon pricing as transparently shifting participation toward energy-efficient IIoT nodes rather than merely reshuffling transfers.
Up to this point, we have treated the congestion price pN as an
exogenous “market” parameter faced by the server when it selects
per-node frequencies fi. In many IIoT
deployments, however, this is an incomplete description: network costs
are load dependent. When more devices push updates more
frequently, backhaul and access networks experience higher utilization,
queueing, packet loss, and ultimately a higher shadow cost of bandwidth.
A natural extension is therefore to let the congestion price be an
increasing function of the aggregate update load
F ≡ ∑i ∈ ℐfi, pN = P(F), P′(F) > 0,
where P(⋅) is interpreted
either as (i) a tariff schedule posted by a network operator, (ii) a
reduced-form mapping from utilization to “effective” per-MB cost, or
(iii) a Pigouvian toll that internalizes the congestion externality
created by higher update traffic.
The key economic change is that the server now faces a feedback loop: higher chosen frequencies raise F, which raises pN, which in turn lowers the privately optimal frequencies. Equilibrium frequencies are therefore characterized as a fixed point.
A convenient starting point is a parametric schedule such as
P(F) = p̄N + ηF (η > 0),
or more generally P(F) = p̄N + ηFα
with α ≥ 1. These forms
capture the idea that marginal bandwidth cost is increasing in
utilization, and they are analytically tractable in comparative statics
even when they do not admit closed forms.
A simple microfoundation is to suppose the network operator incurs a
convex operating cost K(F) (e.g., energy for
radio resources, peering/transit costs, or capacity expansion cost
amortized per unit time) and charges a per-unit-data price equal to
marginal cost:
pN = K′(F)/m,
where m is the per-update
payload. Alternatively, if the operator chooses pN to manage
delay, one can interpret P(F) as implementing a
service-level objective: higher load implies higher expected waiting
time and thus higher effective “price” per MB faced by the server.
With endogenous pN, the server’s problem is still separable across nodes conditional on a conjectured congestion price. For any candidate price pN, we can compute each node’s optimal pair (fi*(pN), Ei*). In the baseline structure, the epoch choice Ei* continues to depend on carbon cost parameters (and node energy intensity) but not on pN, whereas the frequency choice is decreasing in pN whenever the interior solution applies.
Define the aggregate frequency induced by a given congestion
price:
$$\mathcal{F}(p_N) \;\equiv\; \sum_{i\in\mathcal{I}} f_i^*(p_N).$$
An equilibrium with endogenous congestion pricing is then a pair (F*, pN*)
such that
$$F^* = \mathcal{F}(p_N^*), \qquad p_N^* = P(F^*).$$
Equivalently, it is a fixed point in F:
$$F^* = \Gamma(F^*) \;\equiv\; \mathcal{F}\big(P(F^*)\big).$$
This representation makes the logic transparent: P(⋅) maps load into a congestion
price, and ℱ(⋅) maps that congestion
price back into the server’s optimal aggregate load.
Existence is typically straightforward under mild regularity. If P(⋅) is continuous and ℱ(⋅) is continuous (which follows from the continuity of each fi*(pN) away from boundary truncations), then Γ(⋅) is continuous. Moreover, Γ(F) ≥ 0, and under standard parameterizations Γ(F) is bounded above because sufficiently large pN makes updates prohibitively expensive, driving fi* toward 0. Hence a fixed point exists by a standard intermediate-value argument.
Uniqueness hinges on the strength of the feedback. Since P is increasing and each fi* is decreasing in pN (interior), ℱ is decreasing and thus Γ(F) = ℱ(P(F)) is decreasing in F. Because the identity map is strictly increasing, a continuous decreasing Γ can cross the 45-degree line at most once; thus, away from kinks created by boundary solutions, we typically obtain a unique equilibrium load F*. When boundary truncations matter (e.g., some nodes optimally set fi* = 0 at high pN), Γ may become piecewise smooth; uniqueness still often holds but requires checking that the piecewise segments do not generate multiple crossings. Economically, multiplicity is most plausible when the congestion schedule is very steep and the participation/frequency responses have sharp thresholds; under such conditions, “low-load/low-price” and “high-load/high-price” regimes can both appear self-consistent.
Endogenizing pN changes
comparative statics because a primitive parameter affects F both directly (through the
server’s best response) and indirectly (through the induced change in
congestion price). Formally, if we write the fixed point as H(F; ξ) = F − Γ(F; ξ) = 0
where ξ is a parameter such as
pC or
κ, then (when differentiable)
the implicit function theorem gives
$$
\frac{\partial F^*}{\partial \xi}
=
\frac{-\frac{\partial H}{\partial \xi}}{\frac{\partial H}{\partial F}}
=
\frac{\frac{\partial \Gamma}{\partial \xi}}{1-\Gamma_F},
\qquad
\Gamma_F \equiv \frac{\partial \Gamma}{\partial F}.
$$
Because Γ is decreasing in
F, we have ΓF < 0 and
thus 1 − ΓF > 1.
This inequality has a useful interpretation: feedback dampens
shocks that would otherwise change aggregate frequency. For
example, an increase in the congestion sensitivity η or the payload m directly reduces optimal fi, lowering
F; but the resulting lower
load also reduces pN, partially
offsetting the initial drop. Symmetrically, anything that raises the
server’s private incentive to update more (higher τ or β, lower κ) increases F but also raises congestion prices,
which pushes back against the increase.
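To see the dampening in the simplest case, consider a purely illustrative linear best-response map (not the model’s actual Γ):
$$
\Gamma(F;\xi) \;=\; A(\xi) - bF,\ \ b>0
\quad\Longrightarrow\quad
F^* \;=\; \frac{A(\xi)}{1+b},
\qquad
\frac{\partial F^*}{\partial \xi} \;=\; \frac{A'(\xi)}{1+b},
$$
so the no-feedback response A′(ξ) is scaled down by the factor 1/(1 − ΓF) = 1/(1 + b) < 1.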
This is also where closed forms frequently fail. Even if each fi*(pN) is available in closed form, plugging pN = P(F) typically yields a nonlinear equation in F. With heterogeneity across nodes (different ei, possibly different vi, and thus different Ei* and different intercept terms in the f-FOC), ℱ(pN) becomes a sum of square-root-type terms, and the fixed point generally does not simplify.
There are narrow cases where a closed form can be recovered—e.g., homogeneous nodes and a linear P(F) may reduce the fixed point to a solvable polynomial—but these are better viewed as pedagogical special cases than robust quantitative tools.
In applied settings, the most reliable approach is numerical. A convenient algorithm exploits monotonicity:
Because Γ(F) is typically continuous and decreasing, one can solve F − Γ(F) = 0 via bisection on an interval [0, F̄] where F̄ is an upper bound computed from the zero-congestion benchmark (e.g., pN = p̄N) or from protocol limits. If differentiability holds, Newton or quasi-Newton methods are faster, but bisection is robust to kinks created by node dropouts (fi* = 0).
A closely related method is price iteration: start from a guess pN(0), compute F(t) = ℱ(pN(t)), update pN(t + 1) = P(F(t)), and iterate. When the composite mapping pN ↦ P(ℱ(pN)) is a contraction (a “small gain” condition equivalent to sufficiently weak feedback), this converges quickly and also provides a natural notion of stability.
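The following sketch implements both approaches under assumed functional forms: a placeholder square-root frequency rule (consistent in shape with the interior solution quoted in the predictions section, but with illustrative constants c_i and λ standing in for the exact first-order-condition terms) and the linear schedule P(F) = p̄N + ηF. It is meant as a template, not a calibrated solver.

```python
import numpy as np

def f_star(p_N, m, lam, c):
    """Placeholder interior frequency rule of square-root form,
    f_i = sqrt(lam / (2 * (p_N * m + c_i))); substitute the model's exact FOC here."""
    return np.sqrt(lam / (2.0 * (p_N * m + c)))

def F_agg(p_N, m, lam, c):
    """Aggregate update load F(p_N) = sum_i f_i*(p_N)."""
    return f_star(p_N, m, lam, c).sum()

def P_schedule(F, pbar_N, eta):
    """Linear congestion schedule P(F) = pbar_N + eta * F."""
    return pbar_N + eta * F

def solve_by_bisection(m, lam, c, pbar_N, eta, tol=1e-10):
    """Solve H(F) = F - Gamma(F) = 0 by bisection; H is increasing because Gamma is decreasing.
    Bisection remains valid even if F_agg has kinks (e.g., from node dropouts)."""
    lo, hi = 0.0, F_agg(pbar_N, m, lam, c)   # zero-congestion benchmark bounds the load
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - F_agg(P_schedule(mid, pbar_N, eta), m, lam, c) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def solve_by_price_iteration(m, lam, c, pbar_N, eta, iters=200):
    """Iterate p -> P(F(p)); converges when the feedback (small-gain condition) is weak enough."""
    p = pbar_N
    for _ in range(iters):
        p = P_schedule(F_agg(p, m, lam, c), pbar_N, eta)
    return F_agg(p, m, lam, c), p

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = rng.lognormal(mean=-1.0, sigma=0.5, size=500)   # heterogeneous per-node cost terms (illustrative)
    m, lam, pbar_N, eta = 1.0, 1.0, 1e-3, 1e-4
    F_bis = solve_by_bisection(m, lam, c, pbar_N, eta)
    F_it, p_star = solve_by_price_iteration(m, lam, c, pbar_N, eta)
    print(F_bis, F_it, p_star)                           # the two solutions should coincide
```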
Endogenous congestion pricing highlights a practical policy point: carbon pricing and network pricing are not independent instruments once congestion is load-determined. A carbon-price increase that reduces compute (and often reduces communication) can indirectly lower congestion prices, mitigating the communication reduction and changing the incidence across nodes. Conversely, stronger congestion pricing can shift the server toward architectures or operating points that economize on communication, potentially increasing the relative attractiveness of compute-heavy local training—an effect that becomes especially salient when we move beyond the baseline separability and allow explicit substitution between local epochs and update frequency.
At the same time, we should acknowledge the limits of the reduced-form P(F). Real networks exhibit capacity regions, discrete scheduling, and time-of-day variability; moreover, congestion may enter not only through monetary costs but also through the latency term itself. These features strengthen the case for numerical solution and for calibration/estimation—precisely the direction we take in the quantitative illustration that follows.
Our purpose in this section is not to claim a “one-size-fits-all” numerical forecast for IIoT federated learning (FL), but rather to show how the primitives of the model map into measurable engineering quantities, to provide plausible calibration ranges, and to spell out a small set of falsifiable comparative-static predictions (elasticities and threshold effects) that one can take to data or to a digital-twin simulation.
Energy per local epoch, ei. A convenient operational definition is
$$e_i = \bar{P}_i^{\mathrm{train}} \times t_i^{\mathrm{epoch}},$$
where the first factor is the average incremental power draw during local training (above idle) and the second is the time per local epoch. In IIoT settings, both terms are highly heterogeneous: low-power microcontrollers may draw an incremental 0.5–5 W during training, while edge gateways or GPU-enabled controllers can sit at 10–150 W; epoch times range from fractions of a second (tiny models, small local datasets) to tens of seconds (larger models, slower storage). Translating to energy, it is common to see per-epoch values spanning roughly ei ∈ [10⁻¹, 10³] J/epoch (i.e., ≈ [3 × 10⁻⁸, 3 × 10⁻⁴] kWh/epoch). The upper end is not pathological: a 50 W device taking 20 s per epoch consumes 1000 J per epoch.
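The unit bookkeeping is simple but worth making explicit (the device numbers below are just the illustrative endpoints of the ranges above):

```python
J_PER_KWH = 3.6e6   # 1 kWh = 3.6 million joules

def energy_per_epoch(power_watts, epoch_seconds):
    """e_i = incremental training power (W) x time per local epoch (s), returned in J and kWh."""
    joules = power_watts * epoch_seconds
    return joules, joules / J_PER_KWH

print(energy_per_epoch(0.5, 0.2))    # microcontroller-class: 0.1 J per epoch (~2.8e-8 kWh)
print(energy_per_epoch(50.0, 20.0))  # gateway/GPU-class: 1000 J per epoch (~2.8e-4 kWh)
```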
Carbon price per unit energy, pC. The model treats pC as a monetary price per kWh (or per J). In practice, carbon costs are often stated in $/tCO2, so we convert using the local grid emissions factor g (tCO2/kWh):
$$p_C^{\mathrm{kWh}} = p_C^{\mathrm{tCO_2}} \times g.$$
With carbon prices of 50–200 $/tCO2 (typical regulatory or internal shadow prices) and g ∈ [0.0002, 0.0007] tCO2/kWh (roughly 200–700 gCO2/kWh), we obtain effective energy prices of roughly 0.01–0.14 $/kWh.
This range is economically meaningful relative to energy costs that
already vary across sites and time-of-day; it is also precisely why
measuring when and where training occurs matters for carbon
accounting.
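The conversion itself is a one-liner; what requires care is feeding it a time- and location-specific emissions factor (the values below are just the endpoints of the ranges above):

```python
def carbon_price_per_kwh(price_per_tco2, grid_intensity_tco2_per_kwh):
    """Effective carbon price per kWh: p_C^kWh = p_C^tCO2 x g."""
    return price_per_tco2 * grid_intensity_tco2_per_kwh

print(carbon_price_per_kwh(50, 0.0002))   # ~0.01 $/kWh: low carbon price, clean grid
print(carbon_price_per_kwh(200, 0.0007))  # ~0.14 $/kWh: high carbon price, carbon-intensive grid
```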
Network congestion price, pN. We
interpret pN as $/MB (or
$/GB) of update traffic, encompassing explicit tariffs, SLA shadow
prices, and/or congestion tolls. Enterprise and industrial connectivity
costs vary widely: metered cellular can be a few $/GB, while private
backhaul or negotiated transit can be far lower in accounting terms but
still exhibit high shadow costs when capacity is scarce. A
practical range is
pN ∈ [5 × 10⁻⁵, 10⁻²] $/MB (i.e., [0.05, 10] $/GB).
Payload per update, m. For FL, m depends on model size and compression/quantization. In IIoT applications we often see m ∈ [0.1, 20] MB per round (from aggressively compressed shallow models to moderately sized neural nets). This parameter matters because it scales the marginal cost of frequency one-for-one through the term pNmfi.
Timeliness penalty parameters, λ and κ. These are less directly observed because they summarize the business cost of staleness and queueing. One pragmatic strategy is to calibrate λ to a willingness-to-pay for reducing latency or AoI (e.g., using downtime costs or defect-detection value) and calibrate κ using empirical delay-vs-load curves from the network (or a queueing model). We emphasize that this is where “engineering” and “economics” meet: κ can be estimated from telemetry, while λ is a valuation parameter.
Effectiveness from local computation, ϕ(E). For quantitative illustration we recommend a one-parameter concave family such as ϕ(E) = log (1 + E) or $\phi(E)=\sqrt{E}$. The log case is particularly transparent because it implies a simple inverse relationship between optimal epochs and carbon price.
A minimal simulation that matches the model’s logic proceeds as follows.
Choose a population of devices. Fix I (e.g., I = $10^2$–$10^4$) and draw heterogeneity in energy intensity ei (lognormal is natural) and, if desired, heterogeneity in learning productivity vi that scales the benefit term (replace τ by τvi). One can tie ei mechanically to device classes (sensor, PLC, gateway) to mirror a real fleet.
Fix prices and technical parameters. Select (pC, pN, m), and either (i) treat pN as fixed, or (ii) make it load-dependent through a schedule pN = P(F) that can be estimated from observed throughput/delay or from an operator tariff.
Compute per-node optimal intensities. For a chosen ϕ, compute Ei*(pC) from the epoch first-order condition (with truncation at 0 if needed), then compute fi* from the frequency first-order condition (again truncating if the interior formula implies infeasibility). In the endogenous congestion version, solve the scalar fixed point for F and thus pN, then recover {fi*}.
Report outcomes and counterfactuals. Aggregate statistics—mean frequency, distribution of update cycles θi, total traffic m∑ifi, energy use ∑ieiEifi, and surplus—are computed under baseline and under price shocks (e.g., doubling pC, increasing pN, changing m via compression). For the discrete-architecture variant E ∈ {EL, EH}, compute regime shares and the implied threshold behavior in pC.
This design is intentionally modular: a practitioner can swap in a measured P(F), a different ϕ fitted from learning curves, or device-specific mi if payloads differ.
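A minimal end-to-end sketch of steps 1–4, assuming ϕ(E) = log(1 + E) (so the epoch rule is the closed form used in the predictions below) and a placeholder square-root frequency rule with illustrative constants; all parameter values are illustrative rather than calibrated, and pN is held fixed here (the fixed-point solver above handles the load-dependent case):

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: device population with heterogeneous energy intensity (J/epoch -> kWh/epoch).
I = 1000
e = rng.lognormal(mean=2.0, sigma=1.5, size=I) / 3.6e6      # roughly 0.1..1000 J/epoch, in kWh

# Step 2: prices and technical parameters (illustrative placeholders).
p_C, p_N, m = 0.05, 2e-3, 2.0          # $/kWh, $/MB, MB per update
beta_tau = 5e-7                        # value scale in the epoch FOC (placeholder)
lam, kappa = 1e-2, 1e-3                # timeliness parameters (placeholders)

def optimal_E(p_C):
    """Epoch rule for phi(E) = log(1+E):  E* = max{beta*tau / (p_C * e_i) - 1, 0}."""
    return np.maximum(beta_tau / (p_C * e) - 1.0, 0.0)

def optimal_f(p_C, E):
    """Placeholder square-root frequency rule; substitute the model's exact FOC if available."""
    return np.sqrt(lam / (2.0 * (p_N * m + p_C * e * E + lam * kappa)))

def aggregates(E, f):
    """Step 4 outcomes: mean frequency, total traffic, total training energy per unit time."""
    return {"mean_f": f.mean(),
            "traffic_MB": m * f.sum(),
            "energy_kWh": (e * E * f).sum()}

# Step 3: per-node optima at baseline prices, then a counterfactual doubling of p_C.
E0, E1 = optimal_E(p_C), optimal_E(2 * p_C)
f0, f1 = optimal_f(p_C, E0), optimal_f(2 * p_C, E1)
print("baseline:      ", aggregates(E0, f0))
print("doubled carbon:", aggregates(E1, f1))
```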
The model generates several predictions that can be tested using FL logs (round frequency, local steps), network telemetry (utilization, delay), and carbon/energy accounting (site-specific grid intensity).
(i) Update-frequency elasticity w.r.t. congestion price. In the interior solution,
$$f_i^* \;\propto\; \big[\,(p_N m) + \text{(other terms)}\,\big]^{-1/2}.$$
Hence a robust qualitative prediction is ∂fi*/∂pN < 0.
Quantitatively, the log-elasticity takes the form
$$
\frac{\partial \log f_i^*}{\partial \log p_N}
\;=\;
-\frac{1}{2}\cdot \frac{p_N m}{(p_N m)+\text{(other
terms)}}\;\in\;(-1/2,0),
$$
approaching −1/2 when network cost
dominates the denominator. This delivers an empirical target: in
high-congestion regimes, a 10% increase in effective $/MB should reduce
update frequency by roughly 5% (modulo truncations and any protocol
constraints).
(ii) Epoch elasticity w.r.t. carbon price (log
case). If ϕ(E) = log (1 + E),
then
$$
E_i^*=\max\left\{\frac{\beta\tau}{p_C e_i}-1,\,0\right\}.
$$
Whenever Ei* > 0,
we obtain a stark prediction: epochs fall approximately inversely with
the carbon price (and with device energy intensity). In particular, since
1 + Ei* = βτ/(pCei) in the interior,
$$
\frac{\partial \log(1+E_i^*)}{\partial \log p_C}=-1,
\qquad
\frac{\partial \log(1+E_i^*)}{\partial \log e_i}=-1.
$$
Thus, if a site’s carbon price doubles (holding everything else fixed),
the model predicts a halving of 1 + Ei*
for devices that remain interior.
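A two-line check of this prediction (parameter values chosen only for readability):

```python
def epochs_log_case(beta_tau, p_C, e_i):
    """E* = max{beta*tau / (p_C * e_i) - 1, 0} for phi(E) = log(1+E)."""
    return max(beta_tau / (p_C * e_i) - 1.0, 0.0)

# Doubling the carbon price halves 1 + E* for an interior device (log-elasticity of -1).
E_lo = epochs_log_case(beta_tau=1e-6, p_C=0.05, e_i=2e-6)
E_hi = epochs_log_case(beta_tau=1e-6, p_C=0.10, e_i=2e-6)
print(1 + E_lo, 1 + E_hi)   # approximately 10 and 5
```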
(iii) Participation/activation margins under heterogeneous devices. With heterogeneity, some nodes optimally shut down (choose fi* = 0) when prices are high or value is low. The falsifiable implication is selection on energy efficiency: conditional on task value, higher-ei devices should reduce frequency earlier and exit sooner as pC rises. In data, this appears as a compositional shift of training participation toward more energy-efficient hardware when carbon accounting tightens.
(iv) Architecture switching as a threshold phenomenon. Under the two-regime restriction E ∈ {EL, EH}, the model predicts a sharp switch at a threshold pC†(pN). Empirically, this translates into a discontinuous change in observed local-steps-per-round (and often an accompanying change in frequency) around policy-induced carbon-cost changes—especially when organizations adopt internal carbon prices or when regulators change reporting requirements.
(v) Feedback attenuates elasticities when pN is load-dependent. When congestion prices respond endogenously to aggregate updates, any primitive shock that would change ∑ifi is partially offset by the induced movement in pN. The testable implication is that measured elasticities of aggregate traffic with respect to, say, compression (lower m) or valuation shocks (higher τ) should be smaller in networks with steeper utilization-based pricing or tighter capacity—precisely because prices “push back.”
Taken together, these predictions discipline what we should see in real IIoT FL deployments: (a) frequency responds strongly and concavely to network pricing and payload size; (b) local computation responds strongly to carbon pricing and device energy intensity; and (c) policy or tariff changes can induce discrete regime shifts in “compute-heavy vs communication-heavy” operation. The next section returns to design guidance and the practical limits of what this reduced-form calibration can capture.
Our model is intentionally spare: we reduce an IIoT federated learning (FL) deployment to two choice variables per node—update frequency fi and local compute intensity Ei—and two externally facing prices—carbon/energy pC and network congestion pN. The payoff from doing so is that the design problem becomes legible as a pair of “equalize-at-the-margin” rules: choose local epochs until the marginal learning gain equals the marginal carbon cost, and choose communication frequency until the marginal value of freshness equals the marginal cost of updates (including congestion and latency). The discussion below translates these rules into operational guidance and highlights what must be measured—and audited—if carbon-aware FL is to be more than a slogan.
(i) Separate decisions: compute per round vs rounds per time. In the baseline model, the first-order condition for Ei depends on pCei but not on pN, while the condition for fi depends on both pN and the net value of an update. This separability is a practical feature: teams can often assign “how many local steps per round” to an on-device training module, and “how often to communicate” to a scheduler that is network-aware. In other words, carbon-aware tuning can be implemented without redesigning the whole FL stack: estimate ei, pick Ei using a carbon price signal, and then schedule fi using network conditions and timeliness requirements.
(ii) Implement carbon-aware local training as a marginal rule, not a static cap. A common practice is to impose a hard energy budget or to throttle training uniformly. The model suggests a more efficient approach: treat carbon as a shadow price that changes with location and time, and adapt Ei accordingly. In the log illustration ϕ(E) = log (1 + E), optimal epochs follow an inverse rule 1 + Ei* ∝ 1/(pCei), so a device running during a low-carbon window should rationally do more work per round than the same device during a high-carbon window. Practically, this points toward carbon-aware “bursting”: concentrate local epochs when the effective pC is low (cleaner grid or lower internal carbon charge), and reduce epochs otherwise.
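A minimal device-side sketch of this bursting rule, assuming the training module receives a time-varying effective carbon price signal; the cap E_max, the parameter names, and the numbers are illustrative:

```python
def epochs_for_window(beta_tau, e_i, p_C_effective, E_max=32):
    """Carbon-aware epochs per round under phi(E) = log(1+E):
    1 + E* is inversely proportional to p_C * e_i, truncated to [0, E_max]."""
    E_star = beta_tau / (p_C_effective * e_i) - 1.0
    return int(min(E_max, max(0.0, round(E_star))))

e_i = 2e-6   # kWh per local epoch for an illustrative device
print(epochs_for_window(beta_tau=1e-6, e_i=e_i, p_C_effective=0.02))  # low-carbon window: more epochs
print(epochs_for_window(beta_tau=1e-6, e_i=e_i, p_C_effective=0.12))  # high-carbon window: fewer epochs
```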
(iii) Treat update frequency as the primary congestion control knob. Communication costs scale linearly with fi, while the timeliness cost is convex in fi: the staleness term falls with frequency while the queueing term κfi rises with it. This makes frequency the natural control for network externalities. When a deployment experiences congestion, shrinking payload m and lowering fi are substitutes; the model clarifies when each is preferred. If compression/quantization can reduce m without materially lowering the “value per update,” then reducing m is a first-best fix because it cuts congestion costs without increasing staleness. When reducing m is costly (e.g., accuracy loss, engineering constraints), the next best response is to lower fi, accepting higher AoI but relieving network load and queueing delays.
(iv) Use heterogeneity rather than fighting it. Real IIoT fleets include ultra-low-power sensors, PLCs, gateways, and GPU-capable controllers. A uniform policy (“all nodes do E epochs and update every θ seconds”) is almost never surplus-maximizing once ei varies. The mechanism design implication is simple even in the full-information baseline: compute-inefficient nodes should either (a) compute less per update (lower Ei), (b) update less often (lower fi), or (c) drop out when prices spike. In operations, this becomes a set of defaults: an “energy class” table for devices, per-class E schedules indexed by pC, and per-class maximum frequencies indexed by pN and observed delay.
(v) Architecture choice is a policy lever when communication is scarce. The discrete variant E ∈ {EL, EH} captures an engineering reality: organizations often choose between a communication-heavy configuration (small local work, frequent rounds) and a compute-heavy configuration (more local work, fewer rounds). The threshold pC†(pN) is a useful organizing concept for practice: it says there is a region of environments where compute-heavy FL is the rational response to expensive communication, and a region where it is not worth the carbon cost. Even if the true frontier is continuous, teams can operationalize “regime switching” by selecting from a small menu of FL profiles (e.g., (E, f) presets) rather than tuning continuously.
(vi) Contracts and internal chargebacks can implement the optimum. Many deployments rely on business units that “own” devices, energy budgets, and connectivity plans. The model’s quasilinear structure suggests a straightforward governance design: reimburse verifiable resource costs and pay a bonus indexed to the value of contributions (or a proxy such as participation at prescribed (fi, Ei)). Even if exact value is hard to measure, a posted menu of profiles can induce self-selection: energy-efficient nodes pick higher-E or higher-f options because their private cost is lower.
The comparative statics in the model are only as credible as the measurement system that feeds pC, pN, and ei into the control policy.
(i) Carbon accounting must be granular in time and place. Treating pC as a single number is analytically convenient but operationally misleading. Grid intensity varies by region and hour; internal shadow prices vary by division and reporting regime. If training is scheduled without location- and time-specific emissions factors, the system will systematically misallocate compute. A practical policy implication is that “carbon-aware FL” requires an auditable pipeline from (a) node location (or power domain), (b) marginal emissions factors (or contractual renewable claims), and (c) an internal carbon price to an effective pC that is applied at the time of training.
(ii) Device energy telemetry should be treated as regulated measurement, not an afterthought. Our key heterogeneity parameter ei is not merely a device spec; it depends on workload, temperature, and implementation. If ei is mismeasured, carbon-aware policies can perversely shift load toward devices that are actually inefficient. This motivates an engineering-policy recommendation: organizations should maintain a calibration protocol for ei (benchmarks, periodic re-estimation, and attestation), analogous to how industrial systems calibrate sensors that drive safety-critical controls.
(iii) Network pricing is de facto congestion policy. Even when pN is not an explicit tariff, it exists as an internal scarcity cost: SLA risk, delayed control decisions, or foregone throughput. Making pN explicit—via chargebacks, quotas, or congestion tolls—turns an unpriced externality into a controllable input. Conversely, if pN is set administratively at zero, the system will tend to over-communicate, shifting the burden into latency and operational risk (captured in the model by κ).
We view the model as a disciplined “accounting identity with incentives,” not a full representation of learning dynamics.
First, we compress learning progress into ϕ(E) and linear-in-frequency value τϕ(E)f. Real learning curves depend on data non-i.i.d.-ness, optimizer state, and drift; the value of an additional update may decay with time or depend on global model quality. A next step is to embed the pricing-and-scheduling problem into a finite-horizon learning curve, where the server trades off early rapid improvement against later diminishing returns.
Second, we use stylized AoI/latency proxies A(f) = 1/(2f) and L(f) = 1/(2f) + κf. These capture the core convexity but abstract from burstiness, deadline constraints, and multi-class queueing. Empirically grounded queueing models (or learned delay predictors) can replace L(⋅) while preserving the same economic logic.
Third, the baseline treats pN as exogenous, and our extension endogenizes it only through a reduced-form schedule P(F) faced by a single decision-maker. In richer settings, congestion introduces strategic interactions across nodes (each node’s frequency imposes latency on others), not just a fixed-point problem for the server. This is precisely where mechanism design becomes central: efficient outcomes generally require congestion pricing or rationing.
Fourth, we abstract from privacy risk, fairness across sites, and constraints such as minimum update rates for safety-critical monitoring. These constraints can be incorporated as KKT multipliers that effectively shift pC, pN, or λ, but doing so explicitly is important for governance.
Despite these limitations, the central conclusion is robust: in IIoT FL, carbon and congestion prices are not peripheral “cost accounting” details; they are policy-relevant signals that determine the optimal balance between local computation and communication. The practical agenda is therefore clear: measure ei credibly, expose meaningful pC and pN signals to the scheduler, and implement adaptive (Ei, fi) policies—possibly via simple regime menus—that make the tradeoff transparent and auditable.