Federated learning (FL) procurement has matured into an operational
market: multiple model owners
(``requesters'') compete for participation from a large population of heterogeneous devices and users (``workers'')
who incur private costs when training, communicating, and exposing their
devices to privacy and reliability risks. In 2026, the binding
constraints in this market are no longer purely monetary. Two additional
frictions have become first-order in practice and, we argue, must be
modeled explicitly if one wants mechanisms that remain implementable
outside stylized laboratory settings: (i) hard sustainability
constraints (energy/carbon budgets) imposed by regulation, internal
accounting, or contractual commitments, and (ii) compatibility
constraints arising from interference and shared infrastructure, which
induce group-based caps on how many workers can be simultaneously served
by a requester.
The first friction is straightforward to observe in deployments.
Large-scale FL often runs at the edge (phones, IoT, on-prem GPUs) but is
orchestrated by cloud services that are increasingly subject to carbon
accounting. Many requesters now face explicit emissions ceilings for a
quarter or a product cycle, not merely a shadow price for carbon. This
hard-cap logic appears in diverse forms: procurement policies that
allocate a fixed number of ``energy credits'' to teams; sustainability clauses in enterprise contracts; and compliance regimes that require auditable disclosure of energy use and emissions. In such environments, paying more money cannot necessarily buy more participation once the carbon envelope is exhausted. Mechanism design that treats sustainability as an afterthought—e.g., by optimizing a monetary budget and then performing an ex post ``carbon check''—will frequently produce allocations that are infeasible, or it
will require ad hoc clipping rules that break incentive guarantees.
The second friction—compatibility—reflects the operational reality that workers are not interchangeable across requesters once we account for shared bottlenecks. Edge participants share bandwidth, radio spectrum, regional backhaul, charging infrastructure, and even human attention; they also share risk exposure when multiple requesters target the same device populations. These interactions generate negative externalities that are naturally modeled as congestion or interference. In procurement terms, it is often not acceptable to allocate too many workers from the same ``incompatibility group’’ to a given requester: e.g., too many devices in a single cell tower region; too many clients behind the same NAT or ISP; too many workers who rely on the same vulnerable library or hardware supply chain; or too many participants whose data distributions are nearly identical (which can destabilize training and undermine statistical guarantees). Operators increasingly encode these considerations as caps derived from reliability or failure-probability requirements: if the probability of congestion-induced failure grows steeply with the number of concurrently served workers in a group, then a chance constraint can be translated into a deterministic cap. This translation is not merely technical convenience; it mirrors how SRE and ML platform teams implement policy—via simple capacity limits that are enforceable in real time.
These two frictions interact. Carbon constraints shrink the feasible set of hires even when monetary budgets are slack, while compatibility caps can prevent a requester from fully spending either monetary or carbon budgets by blocking access to available workers in a congested group. As a result, the market-clearing logic underlying single-budget procurement mechanisms becomes incomplete: it is no longer enough to find a price that balances money-based demand and supply. We must simultaneously respect a second (non-monetary) feasibility dimension and a combinatorial structure induced by incompatibility. Our aim in this paper is to make this dual-feasibility environment precise, and then show that the key desiderata of mechanism design—dominant-strategy incentive compatibility (DSIC), ex post individual rationality (IR), and budget feasibility—remain achievable with polynomial-time computation and provable welfare guarantees.
Why is dual budgeting the right modeling choice, rather than, say, an internal carbon price that can be folded into monetary costs? The answer is institutional rather than philosophical. Internal carbon prices are common, but they typically coexist with hard caps: teams are charged a carbon price for accounting, yet they still face a quarterly emissions ceiling that cannot be exceeded. Moreover, carbon prices fluctuate (and differ across organizations), while the mechanism designer (the platform) often faces a simpler requirement: never schedule training that violates an externally declared carbon allotment. This hard-cap view makes the sustainability constraint structurally analogous to a second budget, but crucially it is not transferable across requesters and not relaxable by additional payments. In other words, money and carbon are not perfect substitutes at the mechanism level, even if they are related in corporate planning.
Existing reputation-based and budget-feasible mechanisms for FL procurement illustrate the gap. A representative approach—typified by CARE-style designs—assigns workers based on reported costs and observable quality/reputation, ensuring that payments fit within a monetary budget. Such mechanisms are attractive because they blend economic incentives (truthful cost reporting) with ML-relevant signals (quality weights), and they often inherit clean threshold-payment logic from single-parameter auction theory. However, in a carbon-constrained environment, a purely monetary mechanism can fail in two ways. First, it can simply over-allocate: the mechanism may select a set of workers that is affordable in dollars but exceeds the requester’s energy/carbon cap once training actually occurs. Second, even when ex post clipping is applied (e.g., ``drop some winners until carbon fits’’), the resulting allocation rule is typically no longer monotone in bids, undermining DSIC and inducing strategic misreporting. The platform then faces a choice between feasibility and incentive guarantees—an undesirable tradeoff if the goal is a robust market protocol.
The sustainability dimension also changes what it means for an allocation to be ``efficient.’’ In single-budget settings, spending more budget typically increases welfare by allowing more hires or higher-quality hires. Under a hard carbon cap, marginal dollars may not translate into marginal participation; instead, they may only shift which workers are selected within a fixed carbon envelope. This creates a new role for pricing: prices must clear a market subject to a non-monetary quantity limit. Put differently, carbon introduces a binding quantity constraint that truncates monetary demand. Ignoring this truncation leads to miscalibrated price signals and, in multi-requester markets, can distort competition across requesters: one requester with abundant money but scarce carbon may bid up participation that it cannot ultimately use, crowding out a requester who could.
Compatibility constraints further complicate the picture because they convert what would be a separable per-requester selection problem into a coupled allocation problem. In many FL systems, a worker can plausibly be assigned to at most one requester’s training job in a given period, and the platform must respect group-level caps. This resembles matching with additional side constraints—an environment where naive greedy selection can be far from optimal and, more importantly for incentives, can be non-monotone. The compatibility structure thus cannot be relegated to an engineering afterthought (``we’ll just enforce it in scheduling’’) if we also want truthful bidding and interpretable payments.
Our modeling choices are guided by implementability. We assume that per-worker energy usage per training contract is publicly verifiable (via telemetry or standardized measurement) and approximately constant at the level relevant for procurement. This assumption is not innocuous, but it captures a common contractual arrangement: organizations often procure participation for a standardized training job (same number of rounds, similar device requirements), and they can instrument energy use at the orchestrator level. Under this assumption, a carbon/energy budget becomes a hard cap on the number of hires a requester can sustain, which integrates cleanly with monetary affordability. We also treat compatibility through group-requester caps derived from a chance constraint; again, the goal is to encode a complex reliability requirement into an enforceable and auditable rule that can be checked at allocation time.
Within this structured environment, our contribution is conceptual as much as technical. Conceptually, we argue that sustainability constraints in FL procurement should be treated as first-class feasibility constraints—akin to budgets—not as a soft preference or a post hoc reporting exercise. Mechanism design provides a language for doing so while preserving truthfulness and transparency. Technically, the key message is that adding a carbon cap does not necessarily destroy the tractability and incentive properties that made single-budget mechanisms appealing. When carbon enters as a per-requester cardinality cap (induced by constant, verifiable per-worker energy), we can extend price-based employability ideas to a dual-budget notion of demand, compute allocations via standard flow constructions under compatibility constraints, and retain threshold-style payments that support DSIC and ex post IR.
At the same time, we emphasize the limits of this baseline model. Energy usage is not truly constant across workers, and measurement can be noisy or manipulable if instrumentation is weak. When energy footprints are heterogeneous, carbon feasibility becomes knapsack-like and the allocation problem inherits computational hardness; in such regimes, one must rely on approximation, discretization, or randomized mechanisms, and the clean deterministic DSIC story becomes more delicate. Similarly, compatibility groups are an abstraction: real interference patterns can be time-varying and network-dependent, and caps may need periodic recalibration. We therefore view our results as illuminating a tradeoff: stronger sustainability and reliability guarantees can be incorporated into truthful procurement, but doing so requires careful attention to what is measurable and enforceable.
This introduction motivates why the dual-budget, compatibility-constrained procurement problem is the right object for 2026-era FL systems. The next section situates our approach relative to the literatures on budget-feasible procurement and multi-buyer mechanisms, FL incentives and reputation, sustainability constraints in ML systems, and interference-constrained markets.
Our setting sits at the intersection of four literatures that are often studied separately: budget-feasible procurement and multi-buyer mechanisms, incentive design for federated learning (FL), sustainability constraints in ML systems, and allocation under compatibility/interference constraints. A recurring theme across these areas is that the operational constraints that make a protocol deployable (budgets, capacity limits, reliability requirements) are exactly the constraints that tend to break the clean monotonicity and separability properties on which dominant-strategy mechanism design relies. Our goal is to leverage the parts of these literatures that compose—price-based selection, threshold payments, and flow-based allocation—while being explicit about what must be assumed (and verified) for this composition to remain valid.
The classic point of departure is budget-feasible mechanism design for procurement: a buyer with a hard budget seeks to purchase items (or services) from strategic sellers with private costs, while maximizing a value function subject to the budget. Beginning with early work by Singer and a large sequence of refinements, this literature establishes that one can obtain constant-factor approximations under truthfulness and budget feasibility for a variety of valuation classes (additive, submodular) using posted prices, random sampling, and carefully designed thresholds (see surveys and follow-ups in the same line). Two aspects are especially relevant to us. First, many truthful budget-feasible mechanisms rely on single-parameter structure on the seller side (each agent has one private cost) and consequently use monotone allocation rules plus threshold payments. Second, a substantial subset of these mechanisms can be interpreted as variants of “market-clearing” at a uniform price that is computed from bids and the budget, which is precisely the logic behind price-elicitation approaches.
However, most canonical budget-feasible results are for a single buyer (or a single aggregate budget) purchasing from many sellers, whereas FL procurement platforms increasingly intermediate among multiple requesters with separate budgets and heterogeneous constraints. Multi-buyer procurement brings the problem closer to matching or assignment markets, and raises additional design issues: competition across requesters, feasibility coupling (a worker cannot be assigned to multiple requesters simultaneously within a period), and the need for per-requester budget compliance rather than a single global constraint. Related work in multi-unit and multi-buyer environments includes double auctions, two-sided matching markets with prices, and “clinching”-style procedures under budgets; yet DSIC with hard budgets on the buyer side often requires structure (e.g., monotone demand, gross substitutes) that is absent once we add combinatorial side constraints. In practice, platforms therefore fall back on posted prices or heuristic ranking rules that are easy to implement but come with limited incentive guarantees.
A further strand studies budget-feasible mechanisms under feasibility constraints beyond money, such as matroid constraints, matching constraints, or general packing constraints. These constraints matter because they mirror real engineering restrictions (capacity, interference, limited parallelism), but they also tend to make naive greedy allocation non-monotone. The mechanisms most relevant to our approach are those that (i) compute a critical price (or a small menu of prices), (ii) restrict attention to agents below that price, and (iii) choose a feasible subset maximizing cardinality or additive value under the side constraints. This is the conceptual template behind PEA- and CARE-style designs that we build upon; our contribution is to show how a second hard budget (carbon) can be folded into the same template without sacrificing polynomial-time implementability or threshold-based truthfulness, provided the sustainability constraint reduces to a cardinality cap.
FL has generated a large literature on incentives, broadly motivated by the fact that participation is costly, devices are heterogeneous, and data owners may strategically respond to reward schemes. Many papers propose reverse auctions or contract-theoretic schemes in which a single model owner purchases computation or data from devices, sometimes incorporating quality signals (e.g., sample size, past accuracy contributions) or robustness considerations (e.g., adversarial or unreliable participants). A parallel thread emphasizes contribution evaluation—often via Shapley-value approximations, marginal accuracy improvements, or gradient-based metrics—and then uses these scores to distribute rewards. Reputation systems and repeated-interaction models attempt to address long-run reliability and sybil resistance, complementing one-shot payment rules.
Two gaps remain salient for our purposes. First, much of the FL incentive literature assumes a single requester and optimizes a single resource dimension (typically monetary), whereas modern FL platforms increasingly multiplex many concurrent training jobs with separate budgets and policies. Second, even when energy or battery constraints are mentioned, they are usually treated as (i) private costs folded into ci, (ii) soft penalties in an objective, or (iii) device-side constraints that the platform does not explicitly enforce. In contrast, our focus is on sustainability constraints that are auditable and non-transferable across requesters. This distinction matters: if carbon is a hard feasibility constraint rather than a component of worker disutility, then simply “paying more” cannot restore feasibility, and ad hoc post-processing of monetary winners (to satisfy carbon) can break the monotonicity needed for DSIC.
We also note that several FL mechanisms incorporate observable “quality weights” (reputation, reliability, or proxy utility) and then solve a weighted procurement problem subject to budgets. Our baseline treatment aligns with this practice: the mechanism may maximize additive value ∑vixij with public vi, while strategic behavior enters only through reported costs. The novelty, relative to existing reputation-weighted procurement rules, is that we enforce a second hard cap per requester and maintain threshold payments in a multi-requester assignment environment with group-based compatibility limits.
Outside mechanism design, there is now a mature systems-and-policy literature on the carbon footprint of training and deploying ML models. Work on “Green AI,” carbon accounting dashboards, and carbon-aware scheduling has made energy consumption measurable and increasingly actionable: cloud providers expose region-specific emissions intensities; operators shift workloads in time and space to exploit renewable availability; and organizations adopt internal carbon prices or caps to govern compute usage. In parallel, regulators and procurement contracts have moved from aspirational sustainability goals toward enforceable commitments, which in practice often take the form of fixed quarterly budgets for energy or emissions.
Most of this literature is optimization- or scheduling-driven rather than incentive-driven: it presumes the operator can control workloads and does not contend with strategic reporting of costs by edge participants. Where economics enters, it is frequently through carbon pricing (a shadow cost per unit energy) or multi-objective tradeoffs (accuracy versus energy). These approaches are valuable, but they do not directly address our institutional premise: in many deployments, the platform is required to satisfy a hard cap (a compliance constraint), not merely to trade off carbon against dollars in an objective. From a mechanism-design perspective, this suggests modeling sustainability as a second feasibility constraint rather than as a price. Our framework is complementary to carbon pricing: one can still interpret the carbon cap as reflecting an organizational willingness-to-pay for emissions, but the mechanism must operate under the constraint that the cap cannot be violated ex post.
A key modeling fork concerns whether energy usage is homogeneous and verifiable. Systems work highlights that energy can vary across devices, networks, and training regimes. We acknowledge this: when energy footprints differ meaningfully across workers, the sustainability constraint becomes knapsack-like and exact allocation with additional constraints is computationally hard. This observation links our “constant per-contract energy” assumption to the sustainability literature’s emphasis on instrumentation and standardization: if platforms can standardize FL jobs (fixed rounds, fixed client workloads) and verify energy via telemetry, then the sustainability constraint simplifies into a tractable cap. If not, then one must turn to discretization, approximation, or randomized policies—approaches more commonly seen in systems optimization than in deterministic DSIC mechanism design.
Finally, our compatibility constraints connect to allocation problems with interference, congestion, or conflict constraints. In wireless networking and spectrum markets, feasibility is often modeled via conflict graphs or interference sets; in cloud and edge scheduling, feasibility is governed by shared bottlenecks (bandwidth, backhaul, local compute, or failure domains). Mechanism design in such settings is challenging because feasible sets are rarely matroids, and because “local” constraints can induce global non-monotonicities. A pragmatic response—mirroring how operators implement policy—is to replace complex interference interactions with enforceable caps derived from reliability targets, e.g., chance constraints on congestion or failure probabilities. This yields deterministic group-based limits that are easy to audit and enforce at allocation time.
From an algorithmic standpoint, group-based caps and per-requester capacities lead naturally to flow and b-matching formulations, which are among the few combinatorial primitives that are both expressive enough to capture real constraints and computationally tractable at scale. This algorithmic tractability is not merely a convenience: it is also a mechanism-design enabler, because polynomial-time winner determination under a monotone “eligible set” (agents below a price) makes threshold payments implementable. Our work therefore aligns with a broader trend in constrained mechanism design: restrict the allocation problem to a structure (flows, matchings, partition caps) where feasibility can be certified efficiently and where the economic logic (monotonicity and thresholds) can be preserved.
In summary, prior work gives us three largely separate toolkits: (i) truthful budget-feasible procurement at a critical price, (ii) FL-specific incentive heuristics incorporating reputation and contribution signals, and (iii) systems-level carbon accounting and interference-aware scheduling. What is missing is a unified procurement mechanism that treats sustainability as a hard per-requester budget, respects operational compatibility limits, and still provides DSIC and ex post IR with polynomial-time computation in a multi-requester market. Our dual-budget extension is intended to fill precisely this gap, while making explicit where the design is robust (constant-factor guarantees under enforceable caps) and where it is inherently limited (heterogeneous or non-verifiable energy footprints).
We study a procurement market for federated-learning (FL) training in which a platform matches a set of strategic worker devices to multiple concurrent requesters (training jobs). The modeling choice that disciplines the analysis is that requesters face two hard feasibility constraints: a monetary budget and an auditable sustainability (energy/carbon) budget. This reflects common operational practice: budgets are committed ex ante (e.g., a project’s dollar budget and a compliance-driven energy cap), and the platform must remain feasible with respect to both, rather than merely trading off dollars against emissions in an objective.
There are two types of agents. The set of workers is S with |S| = n, indexed by i. Each worker represents a device (or data owner) that can be recruited to participate in one FL training contract for one requester during the relevant period. The set of requesters is A with |A| = m, indexed by j. Each requester corresponds to a training job (or a model owner) that wishes to hire a subset of workers subject to its feasibility constraints. The platform is a mechanism designer: it collects reports, computes an assignment, and specifies payments.
Each worker i has a private participation cost ci ≥ 0 (covering local computation, privacy risk, opportunity cost, and other device-side burdens). Workers submit a report (bid) bi ∈ ℝ+ interpreted as the minimum payment they claim to require. We emphasize the single-parameter structure because it is the main lever that enables dominant-strategy incentive compatibility (DSIC) with threshold payments. In particular, we do not allow workers to report multi-dimensional types (e.g., separate costs for energy, data quality, and latency); we treat these dimensions as either embedded in ci or managed through observables and feasibility constraints described below.
Optionally, each worker has a publicly observable reputation/quality weight vi ≥ 0 (e.g., historical reliability, contribution proxy, device class). This captures the practical fact that platforms often have some non-strategic score used to prioritize participants even when costs are privately known. When we focus on maximizing the number of hires, we set vi ≡ 1; when we model quality-sensitive procurement, the platform’s objective becomes additive in vi.
Each requester j comes with a monetary budget Bj > 0 and a carbon/energy budget Ej > 0. The monetary budget is standard: the total payments made by requester j cannot exceed Bj. The sustainability budget is modeled as a hard cap on energy use attributable to that requester’s training contract within the period. We interpret Ej broadly (it may be an energy cap, an emissions cap converted into energy units using an accounting factor, or an internal carbon budget denominated in “energy-equivalent” units), but in all cases it is a non-transferable feasibility constraint at the requester level.
A key simplifying assumption, motivated by standardized FL jobs, is
that energy usage per worker-contract is constant and verifiable: each
recruited worker consumes e > 0 units of energy for
participating in the contract. This is plausible when the platform fixes
the training recipe (number of rounds, local epochs, batch sizes) and
audits consumption via telemetry or provider accounting. Under this
assumption, the carbon/energy constraint for requester j reduces to a cap:
$$
e\sum_i x_{ij}\le E_j \quad \Longleftrightarrow \quad \sum_i x_{ij}\le
K_j:=\left\lfloor \frac{E_j}{e}\right\rfloor,
$$
so Kj is
the maximum number of workers that requester j can hire under its sustainability
budget. This reduction—energy to a per-requester hire cap—is central: it
preserves computational tractability and, importantly, keeps the
sustainability constraint bid-independent, which is crucial for clean incentive
arguments.
Beyond budgets, the platform must respect operational constraints
that capture interference, correlated failures, or “do-not-co-locate”
policies. We model these via incompatibility groups. The worker set
S is partitioned into L groups G1, …, GL
(e.g., devices sharing a network bottleneck, a geographic region with
limited backhaul, or a common failure domain). For each requester j and group ℓ, the platform enforces a cap τℓj ∈ ℤ+
limiting how many workers from group Gℓ may be
simultaneously assigned to requester j:
∑i ∈ Gℓxij ≤ τℓj ∀ℓ ∈ {1, …, L}, ∀j ∈ A.
We view τℓj as
an enforceable proxy for more complex stochastic congestion interactions. Concretely, let
qℓ(k)
denote a primitive “failure/congestion probability” when k workers from group ℓ are activated in parallel for the
same requester, and let δ ∈ (0, 1) be an operator-chosen
tolerance. A canonical mapping is
τℓj := max {k ∈ ℤ+ : qℓ(k) ≤ δ},
although our mechanism will treat τℓj as
an input once derived or estimated. This modeling choice mirrors
practice: operators rarely enforce fully state-dependent interference
constraints in real time; instead they certify policies in the form of
auditable caps that guarantee reliability targets.
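As a minimal illustration of this mapping (the failure-probability curve below is hypothetical, standing in for a telemetry-estimated qℓ), the cap can be computed by a simple scan:

```python
import math

def group_cap(q, delta, k_max):
    """tau_{l j}: the largest k with q(k) <= delta, where q is a (hypothetical)
    nondecreasing congestion/failure probability estimated from telemetry."""
    cap = 0
    for k in range(1, k_max + 1):
        if q(k) <= delta:
            cap = k
        else:
            break                      # q is nondecreasing, so we can stop early
    return cap

# Illustrative curve: failure probability rises steeply around 20 parallel workers.
q = lambda k: 1.0 / (1.0 + math.exp(-(k - 20) / 3.0))
tau = group_cap(q, delta=0.05, k_max=100)   # -> 11 workers tolerated per group
```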
An allocation specifies, for each worker–requester pair, an indicator
xij ∈ {0, 1}
equal to 1 if worker i is assigned to requester j. We write Sj := {i : xij = 1}
for the set assigned to requester j. Workers are unit-supply in our
baseline: each worker can serve at most one requester in the
period,
∑j ∈ Axij ≤ 1 ∀i ∈ S,
reflecting the fact that a device cannot meaningfully participate in
multiple FL jobs simultaneously (or the platform disallows it to avoid
privacy/accounting complications). The mechanism also specifies payments
pij ≥ 0
from requester j to worker
i, and we define total payment
to worker i as pi := ∑jpij.
The platform must output (x, p) satisfying three families of constraints: (i) monetary budget feasibility, ∑ipij ≤ Bj for every requester j; (ii) sustainability feasibility, ∑ixij ≤ Kj for every requester j; and (iii) operational feasibility, namely unit-supply for every worker and the group caps ∑i ∈ Gℓxij ≤ τℓj for every group–requester pair.
In what follows, we often combine the first two constraints at the level of workers a requester can afford at a uniform per-worker price. Intuitively, if workers are each paid (at most) a common price r, then requester j can fund at most ⌊Bj/r⌋ workers monetarily and at most Kj workers environmentally; the binding cap is their minimum. This “dual budget” structure is what will later allow us to integrate sustainability caps into price-based procurement without altering the threshold-payment logic.
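A minimal sketch of this dual-budget accounting (budget numbers below are hypothetical) is:

```python
import math

def employability(B_j, K_j, r):
    """kappa_j(r) = min(floor(B_j / r), K_j): hires requester j can fund
    monetarily and certify environmentally at a uniform per-worker price r."""
    return min(math.floor(B_j / r), K_j)

def aggregate_employability(B, K, r):
    """E(r) = sum over requesters j of kappa_j(r)."""
    return sum(employability(B[j], K[j], r) for j in B)

# Hypothetical budgets: B = {"j1": 100.0, "j2": 60.0}, K = {"j1": 8, "j2": 3}.
# At r = 10: kappa_j1 = min(10, 8) = 8, kappa_j2 = min(6, 3) = 3, so E(10) = 11.
```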
Workers are quasi-linear. If worker i is hired by some requester, it
incurs cost ci once;
otherwise it incurs no cost. Thus utility is
ui(b) = pi(b) − ci ⋅ 1{∃j: xij(b) = 1}.
We target two standard economic properties. First, dominant-strategy incentive compatibility (DSIC): truthful reporting bi = ci should maximize ui for each worker regardless of others’ reports. Second, ex post individual rationality (IR): under truthful reporting, selected workers receive nonnegative utility. These are
natural for platform procurement in which workers are one-shot
participants and cannot be assumed to play Bayes-Nash equilibria.
The platform seeks to maximize requester-side value from recruited
participants subject to feasibility. In the simplest “throughput”
formulation, value is the total number of recruited workers, i.e.,
max ∑i, jxij.
When quality weights are used, the objective becomes additive in vi:
max ∑i, jvixij.
We stress that vi is taken as
observable and non-strategic in the baseline; strategic behavior enters
only through bi. This aligns
with many deployed systems that treat reputation or device class as
platform-determined while eliciting (explicitly or implicitly) a
compensation requirement from participants.
The constant-and-verifiable energy assumption is both a strength and a limitation. It is a strength because it lets us treat sustainability as a hard per-requester hire cap Kj, which composes cleanly with compatibility caps and supports polynomial-time allocation. It is a limitation because real energy footprints vary across devices and network conditions. Our interpretation is that this assumption is plausible when the platform standardizes training workloads (so that per-participant energy is approximately fixed) and audits consumption; when this is not feasible, one must confront knapsack-like feasibility and the resulting computational and incentive complications. We return to this issue explicitly as an extension.
Before introducing our mechanism, it is useful to understand why a number of seemingly natural ``reductions'' either fail feasibility (especially the hard carbon cap) or fail incentive compatibility once we insist on \emph{ex post} guarantees. The overarching lesson is that dual budgets are not just a matter of adding one more constraint: they change the geometry of what a ``price-based'' procurement step can safely do, and they expose where single-parameter truthfulness is fragile.
A first impulse—common in policy discussions—is to convert carbon into dollars via an internal carbon price λ and define an ``effective’’ monetary budget B̃j := Bj + λEj (or, dually, to treat Ej as a soft penalty in the objective). One then runs a standard budget-feasible procurement procedure under B̃j and ignores carbon thereafter.
This is conceptually appealing but mismatched to our premise that Ej is a compliance constraint. If carbon is a hard cap, then feasibility must hold for realized allocation: no choice of λ can guarantee that a dollar-feasible outcome is also carbon-feasible, because the mapping from dollars to energy depends on the realized set of hires. In our baseline model, where each hire consumes verifiable energy e, carbon-feasibility is simply |Sj| ≤ Kj. Any reduction that does not explicitly enforce |Sj| ≤ Kj risks selecting |Sj| > Kj whenever B̃j is large relative to Bj or when λ is tuned too low. Conversely, tuning λ to be ``conservative’’ can force the mechanism to under-hire even when abundant monetary budget is available, which is precisely the kind of systematic inefficiency operators wish to avoid when Ej is slack.
More subtly, a Lagrangian-style collapse also undermines the interpretability of guarantees: it may yield carbon feasibility only ``on average’’ (or only for the design point at which λ was calibrated). In the operational settings we have in mind—auditability, reporting, and non-transferable sustainability allocations—this is typically not acceptable.
A second common engineering pattern is sequential feasibility:
compute a money-feasible allocation and payments (e.g., by a
budget-feasible auction or by a PEA-style critical price for Bj), and then,
if |Sj| > Kj,
discard some winners until |Sj| = Kj.
The truncation can be ``drop highest bids,'' ``drop lowest quality,'' or some ad hoc fairness rule.
The feasibility problem is immediate: once we start removing winners, the original payment computation is no longer coherent. If the original payments were computed as thresholds for the money-only allocation, they need not be valid thresholds for the truncated allocation. In particular, a worker can face a discontinuity where (i) they would have been selected and paid under the money-only rule, but (ii) they are removed by the carbon truncation for reasons unrelated to their own bid. This breaks the monotonicity logic that underwrites DSIC in single-parameter environments: even if the money-only allocation is monotone in bi, the composition ``money-only selection → carbon truncation’’ need not be. Intuitively, when carbon binds, selection depends on which workers happened to be selected, so a worker’s probability of surviving truncation can move in the wrong direction as they change their bid (through how their bid affects the set of provisional winners and the truncation rule’s tie situations).
The budget-feasibility problem is equally severe if one attempts to restore DSIC by paying threshold bids after truncation. Thresholds in the truncated problem are typically higher than in the money-only problem because the truncation induces stronger competition for fewer slots. Paying these higher thresholds can easily violate ∑ipij ≤ Bj unless the allocation rule was designed with a uniform price cap in mind. Put differently: ``truncate after the fact'' is not merely a feasibility patch; it changes the market-clearing price logic, and reintroduces exactly the budget-feasibility issues that budget-feasible mechanisms work hard to avoid.
A third baseline, ubiquitous in FL systems, is to sort workers by a cost-per-quality score (e.g., bi/vi) and greedily accept workers while constraints remain. This can look compelling because it resembles a knapsack heuristic and because it often performs well in unconstrained simulations.
From a mechanism-design perspective, however, two obstacles appear.
If we pay each selected worker their bid (a common ``reverse auction’’ engineering choice), then workers optimally inflate bids up to the point of being rejected. This is not a minor technicality: the selection rule itself encourages strategic overstatement, and the resulting equilibrium can be arbitrarily inefficient.
In single-parameter settings, any monotone allocation can be paired with threshold payments to obtain DSIC. Greedy rules in constrained matching environments, however, are often not monotone once we incorporate unit-supply across requesters, compatibility caps, and the fact that requesters have different binding constraints at different prices. Even when a greedy rule is monotone, the induced threshold payments can exceed budgets because the threshold that keeps worker i in the selected set can be determined by a different constraint than the one that was binding at the realized bids. This is the procurement analogue of why naive thresholding in budgeted environments can explode payments: the marginal excluded competitor can sit at a price that the requester cannot afford for all accepted slots.
The deeper point is that our constraints are not ``one-dimensional.’’ A greedy rule that is locally sensible for money alone (or carbon alone) can become globally non-monotone once the feasibility region is shaped by both ⌊Bj/r⌋ and Kj, and further intersected with the group caps τℓj. In practice, platforms often respond by hard-coding tie-breaking rules, post-processing allocations, or adjusting payments ex post. Each of these steps is precisely where DSIC typically fails.
Another plausible reduction is to treat each requester independently: run a budget-feasible procurement for each j (respecting that requester’s Bj and Kj), and then resolve conflicts when the same worker is selected by multiple requesters (since workers are unit-supply). Conflict resolution might assign the worker to the highest-paying requester, the requester with the tightest deadline, or simply the earliest requester in a fixed order.
The problem is that the conflict-resolution stage couples the auctions in a way that is generally not incentive-compatible. A worker’s chance of being finally allocated depends not only on their own bid relative to others in one requester’s auction, but also on whether they are simultaneously attractive elsewhere and on how the platform breaks ties across requesters. Thus, even if each per-requester auction is DSIC in isolation, their composition with an ex post conflict rule typically is not DSIC globally. From a feasibility standpoint, per-requester designs also struggle to respect group caps τℓj when those caps are derived from shared operational resources and must be enforced at the platform level with consistent accounting.
Up to this point, our criticism has been about naive reductions. There is also a genuine source of hardness lurking behind the scenes: if energy usage is not constant-and-verifiable, the problem ceases to be cleanly single-parameter.
Suppose instead each worker has an energy footprint ei that is private information (or manipulable) and enters feasibility as ∑i ∈ Sjei ≤ Ej. A worker’s type becomes (ci, ei), which is multi-dimensional. At that point, the canonical ``monotone allocation + threshold payment'' recipe no longer applies: DSIC in multi-parameter domains requires much stronger structural conditions (cycle-monotonicity), and simple posted-price or threshold constructions can fail even when the allocation problem is computationally easy.
Computationally, heterogeneous energy also transforms carbon feasibility into a knapsack constraint, and the allocation problem becomes NP-hard even without strategic behavior. This creates a three-way tension that is hard to escape simultaneously: (i) we would like exact or near-exact welfare (or value) maximization, (ii) we would like DSIC, and (iii) we would like polynomial-time computation and budget feasibility. Standard tools each break somewhere: VCG-style mechanisms demand exact welfare maximization and do not naturally respect hard budgets; knapsack approximations can be made truthful only with careful maximal-in-range restrictions or randomization; and any reliance on reported ei threatens feasibility if reports are not verifiable.
This is precisely why we built the baseline model around a verifiable constant e. It is not that heterogeneous energy is unimportant—it is operationally central—but rather that, without verifiability, the sustainability constraint is no longer ``just another cap.’’ It becomes both a computational bottleneck and an incentive bottleneck.
These baselines clarify what we need from a workable mechanism in our setting. We need an allocation rule that (a) enforces carbon feasibility as a hard cap (rather than through pricing or truncation), (b) preserves single-parameter incentive structure by keeping the sustainability constraint bid-independent, and (c) controls payments in a way that is compatible with hard monetary budgets even in the presence of compatibility caps and multiple requesters. The next section builds exactly this: a dual-budget, virtual-price extension of PEA in which both budgets enter through a unified ``employability at price r’’ notion, while feasibility under τℓj is handled algorithmically via flow.
We now describe the mechanism that resolves the tension highlighted above: we want a monotone, price-based procurement rule (so that DSIC can be obtained via monotonicity and thresholds), while simultaneously enforcing two feasibility systems—monetary budgets and carbon budgets—together with platform-level compatibility caps. The key design move is to treat carbon not as a priced externality, but as a hard quantity cap that is folded into the same ``how many hires are affordable at price r'' accounting that drives price-based procurement.
In our baseline model, each selected worker consumes a verifiable and
constant amount of energy e > 0 per training contract.
Therefore the carbon/energy constraint for requester j,
e∑ixij ≤ Ej,
is equivalent to a cardinality cap:
$$
\sum_i x_{ij} \le K_j := \Big\lfloor \frac{E_j}{e}\Big\rfloor.
$$
This reduction is not merely notational: it is the sense in which the
sustainability constraint can be enforced and audited. From here onward, carbon
enters the mechanism only through Kj.
As in price-equilibrating procurement, we consider posted-price
outcomes indexed by a per-worker price r. At price r, requester j can afford at most ⌊Bj/r⌋
workers under money, and at most Kj workers under
carbon. We therefore define the employability of requester j at price r as
$$
\kappa_j(r) \;:=\; \min\Big\{\big\lfloor \tfrac{B_j}{r}\big\rfloor,\;
K_j\Big\}.
$$
Summing across requesters yields the aggregate employability (i.e., the total number of
worker-slots the demand side can fund and certify at price r):
$$
E(r) \;:=\; \sum_{j\in A} \kappa_j(r) \;=\; \sum_{j\in
A}\min\Big\{\big\lfloor \tfrac{B_j}{r}\big\rfloor,\; K_j\Big\}.
$$
On the supply side, workers with bids at most r are available:
S(r) := { i ∈ S : bi ≤ r }.
Two features are worth emphasizing for intuition. First, E(r) is entirely determined
by budgets and r; bids affect
only which workers are in S(r). Second, carbon
affects E(r) only by
truncating each requester’s money-based employability at Kj. When Kj is slack
(relative to ⌊Bj/r⌋),
the mechanism reduces to its money-only counterpart; when Kj is tight, it
becomes a hard limit on how many workers the requester may ever hire,
regardless of monetary slack.
We restrict attention to a finite set of candidate virtual prices
$$
R_b \;:=\; \Big\{\frac{B_j}{t} : j\in A,\ t\in\{1,\dots,K_j\}\Big\},
$$
which is exactly the set of prices at which some requester’s money-based
capacity ⌊Bj/r⌋
can change while remaining relevant under the carbon cap (there is no
point considering t > Kj
since carbon forbids more than Kj hires
anyway). Operationally, Rb is a
discretization induced by budgets; it will let us compute the critical
price by scanning a polynomial-size list.
Even if the demand side can employ E(r) workers in aggregate,
the platform may be unable to realize E(r) hires because of
unit-supply (each worker can serve at most one requester) and
compatibility constraints. For each incompatibility group Gℓ and requester
j, we require
∑i ∈ Gℓxij ≤ τℓj.
At a fixed price r, we
therefore define the per-price selection problem (for the cardinality objective; the vi-weighted
variant is discussed below) as:
OSP(r): max ∑i ∈ S(r), j ∈ Axij
subject to (i) unit-supply ∑jxij ≤ 1
for all i, (ii) compatibility
caps ∑i ∈ Gℓxij ≤ τℓj
for all (ℓ, j), and
(iii) per-requester dual-budget caps ∑ixij ≤ κj(r)
for all j. Let Mf(r)
denote the optimal objective value of OSP(r), i.e., the maximum number of
workers we can actually allocate at price r given all hard constraints.
To solve OSP(r) we use a standard flow construction: a source s connects to each worker i ∈ S(r) with capacity 1; worker i connects, for each requester j, to a group–requester node (ℓ(i), j) with capacity 1; each node (ℓ, j) connects to requester j with capacity τℓj; and each requester j connects to the sink t with capacity κj(r). A maximum s–t flow in this network corresponds to a feasible allocation, and its value equals the number of assigned workers. Thus Mf(r) can be computed in polynomial time for each r.
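A minimal sketch of this construction (using networkx, with hypothetical input containers; a deployment would add the tie-breaking refinement discussed below) is:

```python
import math
import networkx as nx

def solve_osp(workers, bids, groups, B, K, tau, r):
    """Maximum number of workers assignable at uniform price r.
    workers: iterable of ids; bids[i], groups[i]: bid and group of worker i;
    B[j], K[j]: monetary and carbon budgets; tau[(g, j)]: group-requester caps."""
    G = nx.DiGraph()
    requesters = list(B.keys())
    n = len(list(workers))
    for i in workers:
        if bids[i] <= r:                                   # only workers in S(r)
            G.add_edge("s", ("w", i), capacity=1)
            for j in requesters:
                G.add_edge(("w", i), ("g", groups[i], j), capacity=1)
    for j in requesters:
        for g in set(groups.values()):
            # missing caps default to "unconstrained" for this sketch
            G.add_edge(("g", g, j), ("r", j), capacity=tau.get((g, j), n))
        kappa = min(math.floor(B[j] / r), K[j])            # dual-budget cap
        G.add_edge(("r", j), "t", capacity=kappa)
    if "s" not in G or "t" not in G:
        return 0                                           # S(r) or A is empty
    value, _ = nx.maximum_flow(G, "s", "t")
    return value
```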
The purpose of the critical price is to find the smallest uniform
price at which the platform can realize all employable slots. Formally,
we seek a price r* ∈ Rb
at which
Mf(r*) = E(r*),
and we take r* to
be the smallest such price in Rb (ties broken
deterministically). Intuitively, as we raise r, each requester’s money-based
capacity ⌊Bj/r⌋
weakly decreases and is always truncated by Kj, so
employability E(r)
weakly decreases. Meanwhile, higher r expands the available worker set
S(r). When r is very low, S(r) can be too small (or
too ``imbalanced’’ across groups) to fill all employable slots, so Mf(r) < E(r).
When r is high enough, the
platform can fill whatever employability remains, and we reach equality.
The smallest price at which equality holds is the point at which
supply-side feasibility (under compatibility) meets demand-side
employability (under dual budgets).
Algorithmically, DB-PEA computes r* by sorting Rb in increasing
order, and for each candidate r:
(i) computing κj(r)
and E(r), (ii)
solving OSP(r) via max-flow to
obtain Mf(r),
and (iii) stopping at the first r with Mf(r) = E(r).
Because |Rb| = ∑jKj
and each max-flow is polynomial, this entire step is polynomial.
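A compact sketch of this scan (reusing the solve_osp helper from the earlier sketch; floating-point ties are glossed over):

```python
import math

def critical_price(workers, bids, groups, B, K, tau):
    """Smallest r in R_b with M_f(r) = E(r), following the DB-PEA scan."""
    R_b = sorted({B[j] / t for j in B for t in range(1, K[j] + 1)})
    for r in R_b:
        E_r = sum(min(math.floor(B[j] / r), K[j]) for j in B)   # employability E(r)
        if solve_osp(workers, bids, groups, B, K, tau, r) == E_r:
            return r, E_r              # critical price and number of employable slots
    return None                        # no clearing price found in R_b
```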
At the critical price r*, we compute a feasible allocation of size Mf(r*) by solving OSP(r*). Since max-flow may have multiple optimal solutions, we fix a deterministic tie-breaking rule to make the allocation . This is not cosmetic: DSIC payments are defined via thresholds, which require that the allocation be well-defined for every bid profile.
A clean approach is to impose a lexicographic priority ordering over workers that is consistent with bids (e.g., sort workers by increasing bi and then by a fixed identifier), and then choose, among all maximum-cardinality feasible allocations, the one that is minimum-lex in that order. Practically, we implement this by converting the max-flow to a min-cost max-flow: we first maximize flow value, and subject to that, minimize a cost that assigns each worker an infinitesimal cost according to the priority ordering (equivalently, use integer costs with sufficiently separated magnitudes). This yields a unique allocation while preserving feasibility and the critical-price logic.
The output of this step is the allocation matrix (xij),
equivalently the assigned sets Sj = {i : xij = 1},
satisfying all constraints by construction:
∑ixij ≤ κj(r*) ≤ Kj, ∑i ∈ Gℓxij ≤ τℓj.
Finally, we compute payments so that truthful bidding is a dominant strategy. Given a deterministic monotone allocation rule, the canonical payment is each winner’s threshold bid: the largest bid at which the worker would still be selected holding others’ bids fixed.
Concretely, for each worker i who is allocated under the above
rule, define
θi := sup { z ≥ 0 : i
is selected by DB-PEA when bi = z
and b−i fixed }.
We then pay pi = θi
(and pi = 0 for
non-winners), with the payment routed from the requester j to whom i is assigned, i.e., pij = pi
if xij = 1
and 0 otherwise. Because only prices in
Rb can
change capacities, and because the worker’s inclusion depends only on
whether bi ≤ r
for candidate prices and on deterministic tie-breaking, θi can be
computed by searching over a finite set (e.g., by scanning or binary
searching over Rb while
recomputing the induced outcome with bi set to the
candidate). This yields polynomial-time payment computation.
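A sketch of this threshold search (the selection oracle `selected(i, profile)` is an assumed helper that re-runs the full DB-PEA allocation, with its deterministic tie-break, on a trial bid profile):

```python
def threshold_payment(i, bids, candidate_bids, selected):
    """theta_i: the largest trial bid z at which worker i is still selected,
    holding all other bids fixed (0 if i is never selected)."""
    theta = 0.0
    for z in sorted(candidate_bids):          # e.g., scan the finite set R_b
        trial = dict(bids)
        trial[i] = z
        if selected(i, trial):
            theta = max(theta, z)
    return theta                              # pay theta to winner i, 0 to losers
```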
A useful operational interpretation is that DB-PEA behaves like a posted-price market-clearing rule at a bid-dependent price level, but it pays each winner the posted price at which they become pivotal for feasibility under compatibility and dual budgets. The next section formalizes that this payment is exactly what restores DSIC while keeping payments within requesters’ hard monetary budgets.
Our core mechanism description above maximizes the number of hires (equivalently, takes vi ≡ 1). If the platform instead wants to maximize ∑i, jvixij with publicly observable vi, we can incorporate vi into the tie-breaking/selection objective at r* by solving a max-cost flow (or min-cost with negative costs) on the same network. Because vi is not privately controlled by the worker, this refinement changes welfare but does not create new strategic dimensions; the incentive analysis continues to pivot on monotonicity in bi and the existence of thresholds. We return to the formal guarantees (and to the approximation dependence on compatibility tightness) in the next section.
We now formalize what DB-PEA buys us, conceptually and technically: a procurement environment with hard budget systems (money and carbon) and a compatibility system. The main point is that carbon enters the mechanism only through the bid-independent hire caps Kj, so the incentive and feasibility arguments largely decouple: incentives hinge on monotonicity in bids, while feasibility is enforced by capacities in the allocation subproblem.
DB-PEA is a deterministic, direct-revelation mechanism in a single-parameter domain (each worker has a private cost ci and submits a bid bi). The standard route to dominant-strategy truthfulness is therefore to (i) ensure the allocation rule is monotone in each agent’s bid, and (ii) pay threshold payments.
We rely on the fact that bids affect the mechanism only through membership in the eligible sets S(r) of the subproblems OSP(r): worker
i is present in S(r) iff bi ≤ r.
Lowering bi can only add
i to more sets S(r) (never remove it), and
does not change any requester’s capacities κj(r) = min {⌊Bj/r⌋, Kj}
or any compatibility caps τℓj.
Hence, for every candidate virtual price r ∈ Rb,
the feasible region of OSP(r)
weakly expands when bi
decreases.
The only remaining subtlety is tie-breaking: max-flow can admit multiple optimal allocations. We therefore fix a deterministic, bid-consistent tie-break (e.g., min-lex relative to a priority order that refines increasing bids by a fixed id). Under such a rule, when i lowers its bid it cannot lose to a higher-bid worker in the priority order; thus selection is monotone. Given monotonicity and determinism, threshold payments θi implement DSIC (the worker’s utility is maximized by bidding truthfully) and ex post IR (a truthful winner is paid at least its cost, while a non-winner pays/receives nothing). Formally, for truthful reporting bi = ci, we have ui = pi − ci ≥ 0 for winners because θi ≥ ci by definition of the threshold, and ui = 0 for non-winners.
The mechanism must satisfy three hard constraint families simultaneously: (i) requester monetary budgets, (ii) requester carbon budgets, and (iii) compatibility caps. Compatibility is enforced directly by the flow construction, so it holds by construction. Carbon and money feasibility require slightly more care because payments are individualized thresholds rather than a single posted price. Concretely, for every requester j we need
$$
\sum_i p_{ij}\le B_j
\qquad\text{and}\qquad
e\sum_i x_{ij}\le E_j \ \Longleftrightarrow\ \sum_i x_{ij}\le K_j .
$$
Carbon feasibility is immediate: the allocation enforces ∑ixij ≤ κj(r*) ≤ Kj
for every requester j, and
Kj = ⌊Ej/e⌋
is exactly the carbon cap under constant per-worker energy e.
Monetary feasibility uses two ingredients. First, DB-PEA never
assigns more than ⌊Bj/r*⌋
workers to requester j because
∑ixij ≤ κj(r*) ≤ ⌊Bj/r*⌋.
Second, each winning worker’s threshold payment is capped by the
critical price: θi ≤ r*
(either as a property of the threshold under the critical-price
construction, or explicitly by paying min {θi, r*},
which preserves DSIC because the cap is bid-independent from the
worker’s perspective once r* is fixed).
Therefore,
$$
\sum_i p_{ij}
=\sum_{i:x_{ij}=1}\theta_i
\le \sum_{i:x_{ij}=1} r^*
= r^*\cdot |S_j|
\le r^*\cdot \Big\lfloor \frac{B_j}{r^*}\Big\rfloor
\le B_j.
$$
The same logic also clarifies why the mechanism treats carbon as a
quantity cap rather than a price: since Kj affects only how many slots requester j may fill, it
does not introduce any new payment externalities that could jeopardize
budget balance.
DB-PEA is computationally tractable precisely because, under constant e, carbon feasibility reduces to a cardinality constraint per requester. This keeps the allocation subproblem within the scope of max-flow (or b-matching).
The dependence on ∑jKj
is economically natural: a tighter carbon regime (smaller Kj) reduces the
virtual price set and typically lowers computational burden, while a
very slack carbon regime reverts to the money-only setting. In practice,
one can often accelerate computation further by reusing residual graphs
across nearby r values, but
such engineering is orthogonal to the theoretical tractability
claim.
We next summarize the welfare guarantee, stated in the simplest cardinality form (vi ≡ 1). Let OPT denote the maximum number of workers that can be feasibly allocated under budgets and compatibility constraints when the platform knows true costs. Let ALG be DB-PEA’s number of allocated workers.
$$
\textsf{ALG}\ \ge\ \frac{1}{2\alpha+1}\ \textsf{OPT},
\qquad
\alpha:=\min\Big\{m,\ \max_{\ell,j}\Big\lceil \frac{|G_\ell|}{\tau_{\ell
j}}\Big\rceil\Big\}.
$$
The parameter α captures how
constraining the compatibility system is relative to the partition
sizes. When τℓj is
large (weak incompatibilities), α is small and the approximation is
strong; when some group-requester cap is extremely tight relative to the
group size, α grows and any
uniform-price, monotone rule can be forced to leave feasible value on
the table due to bottlenecks. Importantly, α is independent of carbon: it depends only on
how the interference/congestion structure restricts routing workers to
requesters.
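For concreteness, a hypothetical instance with m = 5 requesters whose tightest group–requester pair has |Gℓ| = 30 and τℓj = 10 gives
$$
\alpha=\min\Big\{5,\ \Big\lceil\tfrac{30}{10}\Big\rceil\Big\}=3,
\qquad
\textsf{ALG}\ \ge\ \frac{1}{2\cdot 3+1}\,\textsf{OPT}=\frac{1}{7}\,\textsf{OPT}.
$$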
Carbon enters only via the per-requester caps Kj and hence
only shrinks the feasible set for both OPT and ALG.
The approximation statement is relative to OPT under the same dual-budget and compatibility constraints, so tightening carbon budgets reduces the absolute achievable allocation but does not introduce new combinatorial pathologies beyond restricting the feasible region. Put
differently, DB-PEA’s loss relative to the full-information optimum is
controlled by compatibility bottlenecks (through α) rather than by the mere presence
of a second (quantity) budget.
From a policy perspective, this monotone effect is desirable: stricter carbon limits should not be undermined by strategic bidding. In DB-PEA they are not, because Kj is exogenous and enforced directly in the allocation.
If the platform’s objective is ∑i, jvixij with publicly observable vi, the same incentive and feasibility arguments go through so long as the allocation rule remains monotone in bids (e.g., maximize total weight subject to the critical-price capacity constraints, with deterministic tie-breaking). What can change is the approximation statement: bounding performance relative to the true weighted optimum typically requires additional structure (e.g., bucketing vi values and applying the PEA/CARE analysis per bucket), which may introduce an additional logarithmic factor. We flag this as a welfare-refinement rather than an incentive or feasibility complication: the carbon and budget logic remains unchanged.
Our baseline analysis leans on a deliberately strong simplification: per-contract energy is constant and verifiable (e), so the carbon budget Ej reduces to a hard cardinality cap Kj = ⌊Ej/e⌋. This is not only analytically convenient; it is also the knife-edge that keeps the allocation subproblem in max-flow territory and allows carbon feasibility to be enforced without entering the payment logic. In practice, however, energy footprints vary across devices, training workloads, and network conditions, and interference is rarely a clean partition into groups. We therefore flag several extensions that are economically natural but technically more delicate, and we indicate (i) where numerical methods or dynamic programming become unavoidable, and (ii) how incentive/approximation guarantees typically weaken.
Suppose each worker i has a
publicly known (or auditable) energy footprint ei > 0 per
contract, while budgets remain (Bj, Ej).
Carbon feasibility for requester j becomes
∑ieixij ≤ Ej,
which is a knapsack constraint even when vi ≡ 1. The
allocation problem at a fixed virtual price r is no longer a flow with simple
capacities: each requester now faces a resource constraint in as well as
a slot constraint in (through ⌊Bj/r⌋),
and the compatibility constraints couple requesters through shared
groups. This intersection typically destroys total unimodularity and
makes exact optimization NP-hard (a standard reduction embeds knapsack
by taking a single requester, no compatibility constraints, and asking
for a maximum-cardinality subset under ∑ieixij ≤ Ej).
Two practical approaches are worth separating.
If ei take values in a small set (or can be coarsened into H bins), we can treat energy as constant within each class. Concretely, let classes h ∈ {1, …, H} have representative energies ēh and let S(h)(r) = {i ∈ S(r) : ei ∈ class h}. A simple implementation is a sequential fill: for each requester j, allocate across classes subject to per-class pseudo-caps Kjh such that ∑hēhKjh ≤ Ej and Kjh ∈ ℤ+. Conditional on these integer pseudo-caps, the within-class assignment reverts to the baseline max-flow form with requester capacity ∑hKjh and group caps τℓj, and we can search over (Kjh) via a small DP per requester when H is small.
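A minimal sketch of the per-requester search over integer pseudo-caps (names and numbers are hypothetical; a small dynamic program does the same job when H is small):

```python
from itertools import product

def best_pseudo_caps(E_j, class_energy, class_avail):
    """Integer pseudo-caps (K_jh) maximizing total slots subject to
    sum_h e_bar_h * K_jh <= E_j and K_jh <= workers available in class h.
    Brute-force product search over the small number of classes H."""
    best_total, best_caps = 0, tuple(0 for _ in class_avail)
    for caps in product(*(range(a + 1) for a in class_avail)):
        energy = sum(e * k for e, k in zip(class_energy, caps))
        if energy <= E_j and sum(caps) > best_total:
            best_total, best_caps = sum(caps), caps
    return best_caps

# Hypothetical instance: E_j = 10.0, class energies (1.0, 2.5), with 6 and 3
# workers available -> the search returns caps (5, 2): 7 slots, 10.0 energy units.
```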
The economic price of discretization is twofold. First, feasibility becomes approximate: rounding ei to ēh requires either a conservative choice (guaranteeing ∑ieixij ≤ Ej but potentially under-allocating) or an ex post correction step (dropping some winners when realized energy exceeds the budget). Second, the clean constant-factor welfare bound can pick up an additional factor that depends on the coarsening error or on H (e.g., via a loss term from conservative rounding).
Holding r fixed, one can decouple across requesters only after accounting for compatibility. A pragmatic compromise is to keep the routing structure (compatibility) in a flow-like object, but to replace the requester capacity edge by a knapsack subroutine. For example, for each requester j and each group ℓ, we know we can accept at most τℓj workers from Gℓ; selecting which workers to take among the available ones then becomes a knapsack variant when workers are partitioned by groups, or a general knapsack if not. This is where numerical methods enter: pseudo-polynomial DP in Ej (after scaling) or an FPTAS is natural, but it replaces polynomial-time max-flow with a complexity that is polynomial in n and 1/ε but depends on scaled budgets.
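As an illustration of the pseudo-polynomial route, the following sketch solves the per-requester cardinality knapsack by dynamic programming over an integer-scaled energy budget; the scaling step and the function name are assumptions, and a value-weighted variant would add vi instead of 1.

```python
def max_hires_under_energy(energies, E_j):
    """0/1 knapsack DP, pseudo-polynomial in the (integer-scaled) budget:
    dp[b] = max number of hires using at most b energy units.  A weighted
    variant would add v_i instead of 1; an FPTAS would scale values."""
    dp = [0] * (E_j + 1)
    for e_i in energies:
        for b in range(E_j, e_i - 1, -1):  # backwards: each worker used once
            dp[b] = max(dp[b], dp[b - e_i] + 1)
    return dp[E_j]

# Example: integer-scaled energies, budget of 10 units.
print(max_hires_under_energy([3, 4, 2, 6, 1], 10))  # -> 4 (e.g. {3, 4, 2, 1})
```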
On incentives, the lesson is that approximation and truthfulness must be designed together. If we implement an approximate allocation rule, DSIC is no longer automatic under threshold payments unless the approximation is itself monotone in bids. A standard way to retain truthful behavior is to restrict the mechanism to a maximal-in-range family: we precommit to a finite set of feasible allocations (e.g., those representable by a discretized DP), select the best in that range at price r*, and compute thresholds within the range. This restores DSIC but typically weakens welfare by an additional approximation factor from the restricted range. Alternatively, one can move to posted prices by energy class (a menu of per-class prices πh), which preserves strategyproofness by construction but generally yields weaker approximation guarantees than critical-price PEA-style selection.
Even if each worker has a nominal footprint êi, realized
energy usage can be random due to device throttling, network retries, or
workload heterogeneity. If requester j faces a risk constraint
Pr (∑ieixij ≤ Ej) ≥ 1 − ηj,
then carbon feasibility is inherently probabilistic. There are two
conceptually different modeling choices.
If ei
are independent with known means/variances (from telemetry), we can
replace the chance constraint by a deterministic sufficient condition
using concentration bounds, e.g.,
$$
\sum_i \mathbb{E}[e_i]x_{ij} \ +\ z_{\eta_j}\sqrt{\sum_i
\mathrm{Var}(e_i)x_{ij}}\ \le\ E_j,
$$
or a simpler robust bound ∑i(𝔼[ei] + λi)xij ≤ Ej.
This restores a deterministic knapsack-type constraint with modified
weights, so the computational story reduces to the heterogeneous-energy
case above. Welfare is reduced by the conservatism margin; importantly,
incentives remain well-behaved because the safety margin is computed
from bid-independent primitives (telemetry estimates), not from
bids.
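A minimal sketch of this conservative test follows, assuming independent energies with telemetry-estimated means and variances and a normal approximation for the z-quantile; the function and parameter names are illustrative.

```python
import math
from statistics import NormalDist

def conservative_energy_ok(means, variances, E_j, eta_j):
    """Sufficient condition for Pr(sum_i e_i <= E_j) >= 1 - eta_j under a
    normal approximation: sum of means + z_{eta_j} * pooled std <= E_j.
    All inputs are bid-independent telemetry estimates."""
    z = NormalDist().inv_cdf(1.0 - eta_j)
    return sum(means) + z * math.sqrt(sum(variances)) <= E_j

# Example: 20 selected workers, mean 1.0 and variance 0.04 energy units each,
# budget 22 units, tolerance eta_j = 5%.
print(conservative_energy_ok([1.0] * 20, [0.04] * 20, 22.0, 0.05))  # True
```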
Alternatively, we can allow occasional budget overruns but enforce ex post corrections: if realized energy exceeds Ej, the platform can (i) drop some training jobs, (ii) delay them to a later carbon window, or (iii) charge carbon penalties/tokens. This moves the model closer to a stochastic scheduling problem; DB-PEA’s hard feasibility guarantee becomes an in-expectation or high-probability statement, depending on the chosen recourse. Mechanism-theoretically, recourse must be specified carefully: if workers anticipate that high-energy outcomes trigger job cancellation, they may face ex post risk that breaks ex post IR unless compensated (e.g., with partial payments upon attempted participation).
Our baseline treats energy as verifiable. If it is not, workers could under-report energy-related attributes or deviate during training (e.g., by running extra local epochs) in ways that increase carbon usage. This is not a single-parameter cost report problem; it is a moral-hazard problem that requires monitoring and enforcement.
A minimal extension is ex post metering with penalties: the platform measures realized energy ẽi and charges a penalty if ẽi exceeds a declared or contracted level. For example, worker i posts a refundable deposit di and pays a penalty ϕ(ẽi − ē) if over budget. To preserve the incentive logic on bids bi, we want the penalty rule to be independent of the bid bi; then the worker’s strategic choice over bi remains one-dimensional (win/lose) and threshold-style DSIC can survive. However, ex post IR becomes contingent: if penalties can exceed payments, a truthful low-cost worker could obtain negative utility after an adverse realization. This is not a bug but a modeling choice: either (a) we accept risk-bearing by workers, or (b) we design limited liability into the contract (e.g., cap penalties and correspondingly tighten carbon budgets conservatively). From a policy perspective, this highlights a real tradeoff: strict carbon compliance with noisy measurements typically requires either slack (fewer hires) or explicit risk transfers.
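The following sketch illustrates one possible bid-independent settlement rule with a deposit cap (limited liability); the tolerance band, penalty rate, and function name are hypothetical, not a prescribed contract.

```python
def settle(payment, deposit, e_realized, e_contract, phi=2.0, tol=0.05):
    """Ex post settlement that depends only on verifiable execution metrics,
    never on the bid b_i: a linear penalty of phi per energy unit of overage
    beyond a small tolerance band, capped at the refundable deposit."""
    overage = max(0.0, e_realized - e_contract * (1.0 + tol))
    penalty = min(phi * overage, deposit)        # limited liability via the cap
    return payment - penalty, deposit - penalty  # net payment, returned deposit

# Example: threshold payment 5.0, deposit 1.0, contracted energy 1.0 unit,
# realized energy 1.3 units (0.25 beyond the 5% tolerance band).
print(settle(5.0, 1.0, 1.3, 1.0))  # (4.5, 0.5)
```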
A different enforcement route is to impose device-level technical controls (e.g., OS-level energy quotas) so that deviations cannot violate carbon constraints. This restores hard feasibility but may reduce model accuracy if such controls affect learning quality; economically, this is an additional non-monetary constraint that can be folded into compatibility/eligibility (workers whose devices cannot enforce caps are not in S(r)).
The partition structure {Gℓ}ℓ = 1L implies a laminar, matroid-like feasibility: each worker belongs to exactly one group, and we impose per-(group, requester) caps τℓj. Many wireless and systems interference models are not partitions. A more realistic abstraction is a graph H = (S, ℰ) where an edge (i, i′) indicates that workers i and i′ cannot both be assigned to the same requester (or more generally, that assigning both induces failure probability above δ). Then requester j must choose an independent set among its assigned workers, subject to money/carbon constraints. Even at a fixed price r, maximizing hires with an interference graph is a maximum independent set problem and is NP-hard in general.
There are three ways we can proceed, each with distinct guarantee profiles.
If H is bipartite with bounded degree, or if interference is captured by a collection of cliques with bounded overlap (a ``conflict hypergraph’’ with small rank), then we can replace the graph by a manageable family of packing constraints (clique inequalities) and recover an approximate flow/LP formulation. This is the closest analogue to our original group caps: we are effectively learning a small set of bottleneck constraints and treating them as τ-caps.
Without special structure, we can use greedy or local-ratio algorithms to obtain constant-factor approximations for independent-set-like constraints under additional assumptions (e.g., bounded inductive independence). Embedding such approximations inside a truthful mechanism is nontrivial: many natural greedy rules are not monotone in bids. A common resolution is to use randomized rounding or maximal-in-distributional-range schemes that yield truthfulness in expectation, at the cost of weaker welfare and feasibility guarantees (typically in expectation, and with approximation constants depending on graph parameters).
Practically, one may estimate congestion probabilities q(⋅) on neighborhoods and derive caps τℓj by clustering the interference graph into groups (graph partitioning) or by identifying high-conflict communities. This sacrifices modeling fidelity but restores the polynomial-time and DSIC properties of DB-PEA. The loss is then interpretability: α and our approximation bound are with respect to the constraint system, not the original graph, so the guarantee is only as meaningful as the calibration quality.
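As a rough illustration of this calibration route, the sketch below clusters an estimated interference graph into groups (here simply connected components, a deliberately coarse choice) and applies the chance-constraint cap; it assumes networkx, and the congestion curves are placeholders.

```python
import networkx as nx

def groups_and_caps(conflict_edges, workers, q_by_group, delta):
    """Cluster an estimated interference graph into groups (here: connected
    components, a deliberately coarse choice) and derive each group's cap
    tau_l = max{k : q_l(k) <= delta} from its estimated congestion curve."""
    H = nx.Graph()
    H.add_nodes_from(workers)
    H.add_edges_from(conflict_edges)
    groups = [sorted(c) for c in nx.connected_components(H)]
    caps = []
    for idx, g in enumerate(groups):
        q = q_by_group(idx)  # callable: concurrency level k -> failure prob
        k = 0
        while k + 1 <= len(g) and q(k + 1) <= delta:
            k += 1
        caps.append(k)
    return groups, caps

# Example: 6 workers, two conflict clusters, quadratic congestion curve.
groups, caps = groups_and_caps(
    [(0, 1), (1, 2), (3, 4)], range(6),
    q_by_group=lambda idx: (lambda k: 0.01 * k * k), delta=0.05)
print(groups, caps)  # e.g. [[0, 1, 2], [3, 4], [5]] with caps [2, 2, 1]
```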
Across all extensions, the common pattern is that once carbon (or interference) stops being representable as simple capacities, the mechanism must trade among three desiderata: (1) exact feasibility, (2) polynomial-time computation, and (3) strong (deterministic) DSIC with clean constant-factor welfare. Under heterogeneous/stochastic energy or general interference graphs, we can typically keep two of these at once, but not all three without additional structure. Our baseline should therefore be read as a transparent, policy-relevant benchmark: it isolates the consequences of adding a second, non-monetary hard budget (carbon) without conflating them with the computational and incentive complications that arise when measurement, heterogeneity, and interference are fully general.
Our theoretical results isolate a clean benchmark: when per-contract energy is constant and verifiable, a hard carbon budget is operationally a cardinality cap, and the platform can incorporate it without entangling feasibility with incentives. The goal of our experiments is to (i) stress-test this benchmark under realistic heterogeneity in data quality and interference, (ii) quantify the welfare–learning tradeoffs induced by the carbon cap Kj = ⌊Ej/e⌋ and compatibility caps τℓj, and (iii) compare DB-PEA to natural baselines used in FL procurement (single-budget mechanisms and carbon-agnostic heuristics). We emphasize outcomes that are legible to practitioners and policy audiences: accuracy per dollar and accuracy per unit of carbon, alongside feasibility and runtime.
We organize the evaluation around four questions. (Q1) Does carbon-aware selection outperform carbon-agnostic selection followed by truncation? Our hypothesis is that carbon-agnostic rules may over-hire (or select a low-price set) that is feasible monetarily but infeasible in carbon, requiring ad hoc truncation that harms accuracy. (Q2) How costly are tight compatibility caps? Here we expect the approximation loss to track the tightness proxy α, but we want to see the empirical magnitude of this loss under realistic group imbalance. (Q3) How do outcomes respond to budget relaxations? We expect monotone comparative statics: larger Bj or Ej weakly increase hires and learning performance until compatibility binds. (Q4) Is the mechanism operationally practical? We measure max-flow runtime, payment levels, and the extent to which computed allocations saturate budgets versus leaving slack.
For each run, we record: (i) learning outcomes: final test accuracy,
area-under-curve (AUC) for accuracy vs. communication rounds, and
stability (variance over random seeds); (ii) economic outcomes: total
payments ∑i, jpij,
per-requester spend ∑ipij,
winner utilities ui = pi − ci
under truthful bids, and the distribution of payments across groups;
(iii) resource outcomes: total carbon usage per requester e|Sj|
and utilization ratio e|Sj|/Ej;
(iv) efficiency ratios,
$$
\text{Acc/\$} := \frac{\text{final accuracy}}{\sum_{i,j} p_{ij}},\qquad
\text{Acc/CO}_2 := \frac{\text{final accuracy}}{\sum_j e|S_j|},
$$
and analogues based on AUC; and (v) feasibility/robustness: frequency of
violations (should be zero for DB-PEA by construction), slack in
(BM)/(BC)/(COMP), and runtime per instance as a function of n, m, L.
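For concreteness, here is a minimal sketch of how the efficiency ratios are computed from per-run records under the constant-e contract; the field names are illustrative.

```python
def efficiency_ratios(final_acc, payments, hires_per_requester, e):
    """Accuracy per dollar and per unit of carbon for one run, under the
    constant-e contract (carbon usage = e * |S_j| summed over requesters)."""
    total_spend = sum(payments)             # sum of all p_ij
    total_energy = e * sum(hires_per_requester)
    return {
        "acc_per_dollar": final_acc / total_spend if total_spend else float("nan"),
        "acc_per_co2": final_acc / total_energy if total_energy else float("nan"),
    }

# Example: accuracy 0.84, total payments 42.0, hires (10, 5), e = 1 energy unit.
print(efficiency_ratios(0.84, [42.0], [10, 5], 1.0))
# -> roughly 0.02 accuracy per dollar and 0.056 per energy unit
```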
We generate synthetic procurement instances to cover a wide range of regimes while keeping the mechanism inputs transparent. Workers i ∈ S have costs ci drawn from distributions that can be tuned to represent competitive or scarce markets (e.g., lognormal or mixture models). Bids are truthful in the baseline (bi = ci) to evaluate allocative performance; we additionally include a stress test where a random fraction of workers shade bids (e.g., bi = γci with γ > 1) to empirically verify that outcomes track the DSIC prediction (allocation monotonicity and nonnegative utility for truthful participants). When we optimize weighted value, we set observable weights vi using a proxy for expected contribution to learning (described below); we also report the cardinality objective case vi ≡ 1.
Requesters j ∈ A draw monetary budgets Bj and energy budgets Ej from controlled ranges. We vary the ratio Ej/Bj to span three regimes: money-binding, carbon-binding, and jointly binding. Per-contract energy is held constant at e in the main experiments (consistent with the baseline model), so the requester’s carbon-limited hiring capacity is Kj = ⌊Ej/e⌋. To connect with practice, we calibrate e to represent a fixed FL training contract (e.g., a fixed number of local epochs and batch size), and we report results both in normalized ``energy units’’ and in a concrete mapping (e.g., Wh) in the appendix.
We generate incompatibility groups {Gℓ}ℓ = 1L as a partition of workers. Group structure can represent shared wireless channels, colocation constraints, or correlated failure modes; empirically, we consider both balanced groups and highly imbalanced groups (heavy-tailed |Gℓ|) because imbalance is where caps are most distortionary. We derive τℓj from a chance constraint mapping τℓj = max {k : qℓ(k) ≤ δ} using parametric families qℓ(k) that capture a range of congestion steepness (e.g., logistic or polynomial growth in k). We vary δ to move from permissive compatibility to tight interference constraints, and we report how outcomes covary with the implied α = min {m, maxℓ, j⌈|Gℓ|/τℓj⌉}.
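A minimal sketch of this instance-generation step follows, using a logistic congestion curve to derive caps and then computing the implied tightness proxy α; the parametric form and parameter values are illustrative choices.

```python
import math

def tau_from_logistic(group_size, delta, k0=8.0, s=1.0):
    """tau = max{k : q(k) <= delta} for a logistic congestion curve
    q(k) = 1 / (1 + exp(-(k - k0) / s))."""
    tau = 0
    for k in range(1, group_size + 1):
        if 1.0 / (1.0 + math.exp(-(k - k0) / s)) <= delta:
            tau = k
        else:
            break
    return tau

def tightness_alpha(group_sizes, taus, m):
    """alpha = min{m, max_l ceil(|G_l| / tau_l)}; caps are shared across
    requesters here for simplicity."""
    ratios = [math.ceil(g / t) for g, t in zip(group_sizes, taus) if t > 0]
    return min(m, max(ratios)) if ratios else m

sizes = [40, 12, 7]                                  # imbalanced groups
taus = [tau_from_logistic(g, delta=0.05) for g in sizes]
print(taus, tightness_alpha(sizes, taus, m=3))       # e.g. [5, 5, 5] and alpha = 3
```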
We implement DB-PEA exactly as specified: compute the virtual price set Rb = {Bj/t : t ∈ {1, …, Kj}}, find r* such that E(r*) = Mf(r*), solve the max-flow instance for OSP(r*), apply a deterministic tie-break (min-lex) to select winners, and pay threshold bids capped by r* (a code sketch of the fixed-price allocation step appears after the comparator descriptions below). For comparators, we include:
A direct analogue that ignores carbon by replacing caps with ⌊Bj/r⌋ only (effectively Kj = ∞) and then post-hoc truncates to satisfy carbon, either by (a) dropping the highest-payment winners until |Sj| ≤ Kj or (b) random truncation. This isolates whether incorporating carbon into the critical-price step matters.
Greedy selection by lowest bid (or best vi/bi), subject to monetary budgets and compatibility, followed by truncation to meet carbon. We also test a ``carbon-first’’ heuristic that enforces |Sj| ≤ Kj first and then spends remaining money, to show that naive carbon handling can leave monetary slack and reduce learning quality.
Fixed per-worker posted price π (common or requester-specific), hiring all workers with bi ≤ π up to feasibility. This is a simple, implementable benchmark and a useful foil for the role of the endogenous critical price r*.
All methods are evaluated under identical compatibility and budget primitives; when a baseline produces an infeasible set, we apply the same deterministic truncation policy to restore feasibility so comparisons are meaningful.
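To make the allocation step of DB-PEA concrete, the sketch below solves the fixed-price subproblem OSP(r) as a max-flow instance, assuming networkx; the node naming, the single-τ-per-(group, requester) input format, and the omission of the search over Rb and of threshold payments are simplifications for illustration.

```python
import math
import networkx as nx

def max_hires_at_price(r, bids, group_of, B, E, e, tau):
    """Fixed-price allocation subproblem as max-flow:
    src -> worker i (cap 1, only if b_i <= r) -> (group, requester) node
    (cap 1) -> requester j (cap tau[l][j]) -> sink (cap min(floor(B_j/r), K_j)).
    Returns the maximum number of feasible hires; extracting winners and
    searching for r* over R_b = {B_j / t} are omitted for brevity."""
    G = nx.DiGraph()
    requesters = range(len(B))
    for j in requesters:
        K_j = math.floor(E[j] / e)                        # carbon cap
        G.add_edge(("req", j), "sink",
                   capacity=min(math.floor(B[j] / r), K_j))
    for i, b_i in enumerate(bids):
        if b_i > r:
            continue                     # only workers willing to work at price r
        G.add_edge("src", ("w", i), capacity=1)
        l = group_of[i]
        for j in requesters:
            G.add_edge(("w", i), ("grp", l, j), capacity=1)
            G.add_edge(("grp", l, j), ("req", j), capacity=tau[l][j])
    value, _ = nx.maximum_flow(G, "src", "sink")
    return value

# Example: 4 workers in 2 groups, one requester with B=10, E=3, e=1, tau=2.
print(max_hires_at_price(r=3.0, bids=[1.0, 2.0, 2.5, 4.0],
                         group_of=[0, 0, 1, 1],
                         B=[10.0], E=[3.0], e=1.0,
                         tau=[[2], [2]]))  # -> 3
```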
To connect procurement decisions to learning outcomes, we run federated learning on standard benchmarks (FMNIST and CIFAR-10), using FedAvg with a fixed number of communication rounds and a fixed local training contract per selected worker (e.g., E local epochs, fixed batch size, fixed optimizer). Each selected worker corresponds to a client with local data. To reflect non-IID conditions, we assign data via a Dirichlet split with concentration parameter β (small β yields highly heterogeneous clients). We treat the requester’s objective as either (i) maximizing the number of hired clients (cardinality, vi ≡ 1), or (ii) maximizing an additive proxy vi that predicts the client’s marginal contribution (e.g., validation accuracy gain from a small pilot, or a historical reputation score). We then report FL accuracy as the downstream outcome and analyze how well vi aligns with realized learning contribution.
We consider both single-requester and multi-requester FL. In the multi-requester setting, each requester trains an independent model (separate FL tasks) on its own selected clients, which cleanly matches the mechanism’s per-requester budgets and caps. This avoids confounding from cross-requester model coupling and allows direct interpretation of per-requester accuracy-per-resource.
In the baseline experiments, carbon consumption per selected worker is exactly e by design, so realized usage equals planned usage and (BC) holds deterministically. We nonetheless log device-side proxies (e.g., runtime, number of batches) to show that a constant-e abstraction is not vacuous: it corresponds to a contract that fixes training work and thus energy up to small noise. In a secondary robustness check (explicitly labeled as outside the baseline theorem), we introduce mild random perturbations ẽi = e(1 + ξi) to illustrate how a deterministic cap can be violated under measurement noise, and we compare conservative choices of Kj (e.g., using e(1 + ξ̄)) to ex post truncation. This robustness check is not used to claim feasibility guarantees; it is meant to quantify how much slack would be needed in practice.
We run systematic sweeps over: (i) carbon tightness via Ej (hence Kj), (ii) monetary budgets Bj, (iii) congestion tolerance δ (hence τℓj), (iv) group imbalance (variance of |Gℓ|), and (v) bid/cost dispersion. For each sweep, we plot allocation size, r*, budget utilization, and learning metrics, with special attention to regimes where the binding constraint switches (money ↔︎ carbon ↔︎ compatibility). We also report how often baselines require truncation and how truncation degrades learning relative to carbon-aware selection.
Beyond ``wins/losses’’ on accuracy, the experiments are designed to make the mechanism’s economic logic visible. When carbon is tight, we expect DB-PEA to internalize the reduced employability E(r) = ∑jmin {⌊Bj/r⌋, Kj} and select a set that is coherent with both budgets, leading to higher accuracy-per-carbon than carbon-agnostic methods that hire too many cheap workers and then discard them. When congestion caps tighten, we expect all methods to lose hires, but DB-PEA should degrade smoothly and avoid pathological concentration of winners in a few unconstrained groups (a failure mode of naive greedy rules). Finally, by reporting the full resource frontier (accuracy vs. dollars vs. carbon), we aim to provide an empirical bridge to the next section’s implementation discussion: how one might set Ej in practice, and what telemetry and auditing would be required for the constant-e abstraction to be a reliable contract.
The theoretical appeal of a hard carbon budget in our setting is that, when per-contract energy is constant and verifiable, carbon feasibility reduces to a simple per-requester hiring cap Kj = ⌊Ej/e⌋. The implementation question is therefore not whether we can procure under carbon constraints, but whether we can make the contractual abstraction—``each selected worker consumes e energy units’’—credible enough that (BC) is meaningful to both requesters and auditors. In practice, we view the mechanism as one component of a broader governance stack: (i) ex ante specification of a training contract, (ii) measurement and verification (M&V) that maps contract execution into energy units, and (iii) a compliance layer that translates energy into carbon reporting and internal controls.
A requester’s input Ej is best interpreted as an energy budget for a specific procurement episode (e.g., one FL training run or one model update cycle). To map organizational carbon commitments into Ej, a natural pipeline is: start from a carbon allotment in tCO2e, convert to energy via a grid emissions factor, and then allocate across ML activities.
Concretely, if ηt, loc denotes
an emissions factor (e.g., kgCO2e/kWh) indexed by time and
location, then a carbon budget C̄j corresponds
to an energy budget
$$
E_j \;=\; \frac{\bar
C_j}{\eta_{t,\mathrm{loc}}}\quad\text{(location-based accounting),}
$$
or to a more conservative Ej using the
maximum feasible η over the
execution window (market-based accounting can be incorporated
analogously if renewable certificates are used). This interpretation is
operationally useful because the platform’s enforcement target is energy
usage, while the requester’s reporting target is carbon. In settings
where clients are geographically dispersed and η varies by location, we recommend
either (i) normalizing to energy (our baseline) and performing the
CO2 conversion at reporting
time, or (ii) defining multiple energy budgets by region and running
separate procurements; the latter improves accounting fidelity but
complicates the market design.
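A minimal sketch of the conversion step, assuming emissions factors are available per accounting window; the conservative option divides by the maximum factor over the window, as described above.

```python
def energy_budget_from_carbon(carbon_kgco2e, eta_by_window, conservative=True):
    """Convert a carbon allotment (kgCO2e) into an energy budget (kWh) via
    grid emissions factors eta (kgCO2e/kWh) over the execution window; the
    conservative option uses the maximum factor so the carbon commitment
    holds no matter when training actually runs."""
    eta = max(eta_by_window) if conservative else sum(eta_by_window) / len(eta_by_window)
    return carbon_kgco2e / eta

# Example: 50 kgCO2e allotment, hourly factors between 0.25 and 0.40 kgCO2e/kWh.
print(energy_budget_from_carbon(50.0, [0.25, 0.32, 0.40]))  # 125.0 kWh
```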
The constant-e assumption should be understood as a standardized training contract: fixed model architecture, fixed number of local steps/epochs, fixed batch size, fixed precision, and a specified device class (or a bounded set of device classes). Under such a contract, energy variation is not eliminated, but it becomes predictable enough that we can set e as a conservative upper envelope rather than an average. In implementation, we can define e using a conservative quantile,
$$
e := Q_{0.95}(\text{energy consumption per contract}) \quad\Rightarrow\quad \Pr(\tilde e_i \le e) \approx 0.95,
$$
and then choose Kj = ⌊Ej/e⌋ to reduce the risk of ex post carbon overrun. This ``engineering margin’’ is the simplest way to preserve the hard-feasibility interpretation of (BC) when metering noise or device heterogeneity cannot be eliminated.
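A minimal sketch of setting e from telemetry via a conservative quantile and deriving Kj, assuming numpy; the telemetry distribution in the example is synthetic.

```python
import numpy as np

def contract_energy_and_cap(telemetry_kwh, E_j, quantile=0.95):
    """Set per-contract energy e as a conservative quantile of measured
    consumption, then K_j = floor(E_j / e).  Raising the quantile trades
    hires for a lower risk of ex post carbon overrun."""
    e = float(np.quantile(telemetry_kwh, quantile))
    return e, int(E_j // e)

# Example: synthetic telemetry around 0.02 kWh per contract, E_j = 1 kWh.
rng = np.random.default_rng(0)
samples = rng.normal(0.02, 0.002, size=1000).clip(min=0.001)
e, K_j = contract_energy_and_cap(samples, E_j=1.0)
print(round(e, 4), K_j)  # roughly e ≈ 0.023 kWh and K_j around 42-43
```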
For measurement, the platform can rely on a hierarchy of M&V options, each with different assurance and cost. At one extreme, when workers are data centers or managed devices, direct power metering (PDUs, on-board energy counters) provides high assurance. For edge devices, M&V often relies on OS-level battery discharge models, CPU/GPU utilization counters, and runtime telemetry. To prevent strategic manipulation of reported telemetry, one can use signed measurement logs, remote attestation (e.g., TPM/TEE-based), and spot audits. Importantly, our incentive results do not require workers to report energy; they only require that the platform and requesters agree on a verifiable contract so that e is a policy parameter rather than a private type.
Even with conservative e, deviations can occur (background load, thermal throttling, network retries). A practical enforcement design is to separate selection from settlement. Selection is done using Kj = ⌊Ej/e⌋ to ensure planned feasibility. Settlement can incorporate (i) a small tolerance band (no adjustment within ±ϵ), (ii) penalties for repeated overages (temporary exclusion or lower priority), or (iii) make-good requirements where the requester reduces future Ej by the measured overage. We emphasize that if settlement rules are allowed to depend on worker bids, one must re-audit DSIC; the cleanest approach is to keep settlement dependent only on verifiable execution metrics, not on bi.
Although our main mechanism enforces carbon through a hard cap Kj, many organizations prefer an internal ``token’’ model for governance: each requester receives a token endowment 𝒯j denominated in energy units; each selected worker consumes e tokens; and tokens are debited automatically at execution time. Conceptually, this is equivalent to choosing Ej = 𝒯j in our model, but the token framing adds two practical advantages. First, it supports centralized reallocation: a central sustainability office can allocate and reallocate 𝒯j over time based on priorities and observed learning returns. Second, it creates a natural audit trail: the platform can publish per-requester token debits alongside model training logs, simplifying internal controls and external reporting.
If tokens are made tradable across requesters (an internal cap-and-trade), then Ej becomes endogenous to organizational decisions. Our mechanism can accommodate this as an outer loop: requesters trade tokens, then run procurement with the resulting (Bj, Ej). We do not claim incentive properties for the token market itself; rather, we view it as a policy layer that determines budgets, while DB-PEA implements procurement conditional on those budgets.
We recommend treating Ej as an explicit knob that encodes an organization’s marginal rate of substitution between learning performance and carbon. Three implementation heuristics are common. (i) Proportional allocation: set Ej proportional to historical spend or expected business value, updated quarterly. (ii) Utilization targeting: choose Ej so that the implied hiring cap Kj hits a desired utilization rate (e.g., 80–95%) under typical markets; this reduces frequent binding and instability. (iii) Empirical frontier sweeps: in early deployments, deliberately sweep Ej over a range and record accuracy-per-carbon; then select an operating point where marginal accuracy gains per additional energy flatten. The third approach aligns well with procurement governance because it produces a transparent empirical rationale for carbon budgets, rather than treating them as arbitrary constraints.
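As an illustration of the third heuristic, here is a small sketch that selects the operating point where marginal accuracy gains per additional energy unit flatten; the threshold and the sweep data are placeholders.

```python
def pick_energy_budget(sweep, min_marginal_gain=0.002):
    """Given sweep results [(E_j, accuracy)] sorted by E_j, return the
    smallest budget at which the marginal accuracy gain per additional
    energy unit falls below a threshold (the flattening point)."""
    sweep = sorted(sweep)
    for (e0, a0), (e1, a1) in zip(sweep, sweep[1:]):
        if (a1 - a0) / (e1 - e0) < min_marginal_gain:
            return e0
    return sweep[-1][0]

# Example: accuracy gains flatten after 40 energy units.
print(pick_energy_budget([(10, 0.70), (20, 0.78), (40, 0.83), (80, 0.84)]))  # 40
```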
Compatibility caps are where engineering reliability constraints
enter the allocation problem. Our modeling choice,
τℓj := max {k ∈ ℤ+ : qℓ(k) ≤ δ},
is deliberately modular: it says that if group ℓ experiences failure (or
unacceptable degradation) with probability qℓ(k)
when k workers share a
constrained resource, then we hire at most τℓj
from that group for requester j to satisfy tolerance δ. This is implementable provided we
can (i) define what constitutes ``failure’’ (dropout rate, packet loss,
straggler probability, interference), and (ii) estimate qℓ(⋅) from logs
or experiments.
In deployments, qℓ(k)
is rarely known exactly; it is estimated with uncertainty and may drift
over time. We therefore recommend computing caps using an upper
confidence bound:
τℓj := max {k: q̂ℓ(k) + UCBℓ(k) ≤ δ},
where UCBℓ(k) is
chosen to reflect sampling error and nonstationarity (e.g., using
Hoeffding-style bounds or Bayesian credible intervals). This
conservative capping is the congestion analogue of choosing a
conservative e for carbon: it
preserves feasibility guarantees (low failure probability) at the cost
of potentially leaving performance on the table when the environment is
better than expected.
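A minimal sketch of the UCB-based cap, using a Hoeffding-style bound on the empirical failure rate at each concurrency level; the confidence level and telemetry counts are illustrative.

```python
import math

def ucb_cap(failures_by_k, trials_by_k, delta, conf=0.95):
    """tau = max{k : q_hat(k) + UCB(k) <= delta}, with a Hoeffding-style
    bound sqrt(log(1/(1-conf)) / (2 n_k)) on the empirical failure rate
    observed at each concurrency level k."""
    tau = 0
    for k in sorted(trials_by_k):
        n_k = trials_by_k[k]
        q_hat = failures_by_k.get(k, 0) / n_k
        bound = math.sqrt(math.log(1.0 / (1.0 - conf)) / (2.0 * n_k))
        if q_hat + bound <= delta:
            tau = k
        else:
            break
    return tau

# Example: 500 observations at each concurrency level 1..4, delta = 0.10.
print(ucb_cap({1: 0, 2: 2, 3: 9, 4: 30},
              {1: 500, 2: 500, 3: 500, 4: 500}, delta=0.10))  # -> 3
```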
The tolerance parameter δ is, in effect, a reliability policy. Smaller δ yields smaller τℓj, tighter compatibility, and lower allocation sizes; larger δ permits more aggressive parallelism but increases the risk of FL underperformance due to dropouts or interference. Because the welfare loss from tightening caps is mediated by the tightness proxy α = min {m, maxℓ, j⌈|Gℓ|/τℓj⌉}, we recommend setting δ at the level of the training run’s criticality: safety- or compliance-critical training runs should use stricter tolerances, while exploratory runs can tolerate higher failure probabilities. For transparency, the platform should publish (internally) the implied τℓj and the empirical basis for qℓ(⋅); otherwise, stakeholders cannot distinguish an intentional reliability policy from ad hoc throttling.
A recurring governance failure mode in resource-constrained ML systems is the presence of ``hidden constraints’’ that are applied after the fact (e.g., manual truncation to meet energy limits or to avoid interference). Such truncation is precisely what our design aims to avoid, because it breaks the link between declared constraints and realized allocations. We therefore advocate an audit log that records, per requester and run: the declared (Bj, Ej), the derived caps Kj and τℓj, the critical price r*, the selected set Sj, and the realized energy and reliability metrics. This log is also where distributional concerns can be monitored: compatibility groups may correlate with geography or device class, and repeated binding of τℓj could systematically exclude certain populations. While our mechanism is not a fairness mechanism per se, the explicit representation of group caps makes it easier to (i) detect disparities and (ii) adjust policy parameters (δ, group definitions, or infrastructure investments) in a principled way.
Two limitations are worth stating plainly. First, if per-worker energy footprints are heterogeneous and cannot be bounded tightly by a common e, carbon feasibility becomes a knapsack-style constraint (as flagged in our extension), and exact polynomial-time allocation is no longer available. In that case, one should either discretize into energy classes with posted prices or accept approximation/randomization with correspondingly weaker operational guarantees. Second, our framework treats Bj and Ej as hard budgets that are not strategically chosen within the mechanism. In real organizations, budgets are outcomes of internal politics and planning; the mechanism can improve procurement conditional on budgets, but it cannot by itself resolve the upstream question of how budgets should be allocated across projects.
Taken together, these implementation considerations suggest a pragmatic reading of our results: the mechanism is most valuable when it is embedded in a disciplined contracting and M&V process. When that process exists, carbon and congestion constraints can be enforced through transparent caps (Kj and τℓj) rather than through opaque truncation, aligning economic incentives with operational feasibility and audit-ready carbon accounting.
We began from a practical tension that is becoming central to federated and distributed learning deployments: organizations want to procure many small training contributions, but they must do so under hard constraints that are not well captured by a single budget. The first is monetary—the familiar limitation that total payments cannot exceed a project’s spend authority. The second is physical and policy-driven—an energy (or carbon) budget that constrains the number of contracts that can be executed within an episode. In parallel, operational reliability imposes compatibility constraints: allocating too many workers that share a bottleneck (network segment, device class, region, or other interference domain) can degrade training or increase failure risk. Our model isolates these frictions in a way that is deliberately modular: monetary budgets enter through standard payment feasibility, energy budgets enter as a hard cap on hires when per-contract energy is standardized and verifiable, and reliability enters through per-requester group caps derived from chance constraints.
The main conceptual contribution is to show that, in this structured environment, we can keep the appealing features of classical procurement design—dominant-strategy truthfulness, ex post individual rationality, and computational tractability—without treating carbon as a soft objective or an after-the-fact adjustment. When per-worker energy is constant at a verifiable level e, each requester’s carbon budget Ej induces a simple hiring cap Kj = ⌊Ej/e⌋, and the two-budget constraint collapses to a single per-requester cap at the level of feasibility: for any price r, requester j can afford at most ⌊Bj/r⌋ workers financially and at most Kj workers physically, so the operative demand at price r is min {⌊Bj/r⌋, Kj}. This observation is economically important because it restores a one-dimensional “price-per-worker” logic even in the presence of an additional non-monetary budget: carbon does not need to be monetized to be enforced.
On the mechanism-design side, we extend the price-based, single-parameter procurement template of PEA/CARE to a setting with compatibility constraints. The mechanism identifies a critical price r* from a finite candidate set (derived from the requesters’ budgets) such that aggregate employability at that price matches the maximum feasible allocation subject to compatibility caps and per-requester caps. It then selects a feasible set of winners among those whose bids are at most r* and pays threshold payments. Two aspects are worth emphasizing. First, the existence of a critical price is not merely a technical device; it is the point at which the market “clears” in the limited sense that raising the price further would not increase the number of workers we can allocate. Second, computational feasibility is not an afterthought: for any fixed r, the allocation problem reduces to a max-flow (or b-matching) instance once we express group caps and requester caps as capacities. In this sense, the mechanism is not only incentive compatible but also implementable at the scale required by modern FL markets.
Our approximation guarantee formalizes the efficiency loss that is
unavoidable when we insist on DSIC, hard feasibility for two budgets,
and compatibility constraints that can be tight. The key driver of the
bound is the tightness parameter
α = min {m, maxℓ, j⌈|Gℓ|/τℓj⌉},
which measures how constraining the interference structure can be
relative to available parallelism. Intuitively, when group caps are
loose (large τℓj),
the market behaves closer to a standard multi-buyer procurement problem
and the mechanism’s allocation approaches the full-information optimum.
When group caps are tight, feasibility becomes the limiting factor and
no mechanism can “buy” its way out with higher prices alone; the
approximation factor degrades in a controlled way that is transparent in
α. A practical reading is that
investments which relax bottlenecks (increasing τℓj)
generate returns: they improve not only realized training reliability
but also allocative efficiency under budgets.
We also view the policy and implementation layer as part of the contribution, because it clarifies which assumptions are essential for the clean mechanism guarantees. The constant-e abstraction is not a claim that devices are identical; it is a claim that the procurement contract can be standardized and verified well enough that energy becomes a policy parameter rather than a private type. Similarly, the cap mapping τℓj from a congestion chance constraint is meant to separate engineering estimation from economic allocation: once the platform commits to caps derived from reliability tolerances, allocation and payments can be performed transparently without discretionary truncation. This separation matters for governance. It makes budgets and reliability policies auditable, and it reduces the temptation (or necessity) to enforce carbon constraints ex post in ways that are opaque to participants and difficult to reconcile with incentive guarantees.
Several limitations delineate a natural agenda for future work. The most immediate is heterogeneous energy usage. If each worker has a known but non-uniform footprint ei, carbon feasibility becomes a knapsack constraint and exact allocation becomes NP-hard even absent strategic behavior. Our extension discussion outlines pragmatic paths—energy class discretization, FPTAS-style subroutines, or posted-price variants—but there is a genuine frontier here: designing mechanisms that remain strongly budget-feasible, computationally practical, and robust to strategic bidding when carbon is a multidimensional resource. A second limitation is dynamics. Many FL deployments are repeated, with budgets replenished, reputations updated, and congestion conditions changing over time. Extending the critical-price logic to a multi-period setting raises questions about intertemporal incentives (e.g., bid shading today to improve selection odds tomorrow), as well as about how to amortize carbon budgets across episodes in a way that is both efficient and accountable.
A third set of open questions concerns heterogeneity in value. We focused on additive objectives (cardinality or reputation-weighted value) because they align with the single-parameter cost model and admit tractable allocation. Yet FL performance gains can be submodular, state-dependent, or require diversity constraints that are not captured by static group caps. Incorporating richer value models will likely require maximal-in-range designs, approximate VCG variants, or carefully structured posted prices; each direction interacts nontrivially with budget feasibility. Closely related is the role of fairness and representation. Group caps make interference constraints explicit, but groups may also correlate with protected characteristics or with regions that an organization wants to include for equity reasons. A principled integration of fairness constraints—beyond the reliability-driven τℓj—would require additional structure (e.g., lower bounds, parity constraints, or outcome-based audits) and may change both approximation behavior and incentive properties.
Finally, we emphasize a broader takeaway: treating carbon and reliability constraints as hard feasibility primitives leads to cleaner economics and better operational outcomes than treating them as soft penalties. In procurement markets, it is tempting to “optimize a weighted sum” of spend and emissions, or to allocate first and then trim to meet carbon limits. Our analysis suggests that, when budgets are genuinely hard and audit-relevant, such approaches are fragile. By contrast, enforcing energy feasibility through explicit caps and running a DSIC threshold-payment mechanism inside that feasible region yields allocations that are predictable, auditable, and strategically robust. The mechanism does not eliminate the policy choices embedded in Ej and δ; rather, it makes those choices explicit and traceable, allowing organizations to debate and revise them without simultaneously changing the economic rules of the market.
In sum, the model illuminates a concrete tradeoff faced by practitioners: more aggressive parallelism and broader participation are valuable, but they are constrained by money, by carbon, and by interference. Our dual-budget extension of PEA provides a disciplined way to navigate this tradeoff. It offers a tractable mechanism with strong incentive and feasibility guarantees under realistic operational constraints, while making clear where the hard problems begin—heterogeneous energy, dynamics, and richer objectives. We hope this framework helps move carbon-aware learning procurement from ad hoc throttling toward transparent, policy-compliant market design.