Federated learning (FL) procurement has matured into an operational
market: multiple model owners
(``requesters'') compete for participation from a large population of heterogeneous devices and users (``workers'')
who incur private costs when training, communicating, and exposing their
devices to privacy and reliability risks. In 2026, the binding
constraints in this market are no longer purely monetary. Two additional
frictions have become first-order in practice and, we argue, must be
modeled explicitly if one wants mechanisms that remain implementable
outside stylized laboratory settings: (i) hard sustainability
constraints (energy/carbon budgets) imposed by regulation, internal
accounting, or contractual commitments, and (ii) compatibility
constraints arising from interference and shared infrastructure, which
induce group-based caps on how many workers can be simultaneously served
by a requester.
The first friction is straightforward to observe in deployments.
Large-scale FL often runs at the edge (phones, IoT, on-prem GPUs) but is
orchestrated by cloud services that are increasingly subject to carbon
accounting. Many requesters now face explicit emissions ceilings for a
quarter or a product cycle, not merely a shadow price for carbon. This
hard-cap logic appears in diverse forms: procurement policies that
allocate a fixed number of ``energy credits'' to teams; sustainability clauses in enterprise contracts; and compliance regimes that require auditable disclosure of energy use and emissions. In such environments, paying more money cannot necessarily buy more participation once the carbon envelope is exhausted. Mechanism design that treats sustainability as an afterthought—e.g., by optimizing a monetary budget and then performing an ex post ``carbon check''—will frequently produce allocations that are infeasible, or it
will require ad hoc clipping rules that break incentive guarantees.
The second friction—compatibility—reflects the operational reality that workers are not interchangeable across requesters once we account for shared bottlenecks. Edge participants share bandwidth, radio spectrum, regional backhaul, charging infrastructure, and even human attention; they also share risk exposure when multiple requesters target the same device populations. These interactions generate negative externalities that are naturally modeled as congestion or interference. In procurement terms, it is often not acceptable to allocate too many workers from the same ``incompatibility group’’ to a given requester: e.g., too many devices in a single cell tower region; too many clients behind the same NAT or ISP; too many workers who rely on the same vulnerable library or hardware supply chain; or too many participants whose data distributions are nearly identical (which can destabilize training and undermine statistical guarantees). Operators increasingly encode these considerations as caps derived from reliability or failure-probability requirements: if the probability of congestion-induced failure grows steeply with the number of concurrently served workers in a group, then a chance constraint can be translated into a deterministic cap. This translation is not merely technical convenience; it mirrors how SRE and ML platform teams implement policy—via simple capacity limits that are enforceable in real time.
These two frictions interact. Carbon constraints shrink the feasible set of hires even when monetary budgets are slack, while compatibility caps can prevent a requester from fully spending either monetary or carbon budgets by blocking access to available workers in a congested group. As a result, the market-clearing logic underlying single-budget procurement mechanisms becomes incomplete: it is no longer enough to find a price that balances money-based demand and supply. We must simultaneously respect a second (non-monetary) feasibility dimension and a combinatorial structure induced by incompatibility. Our aim in this paper is to make this dual-feasibility environment precise, and then show that the key desiderata of mechanism design—dominant-strategy incentive compatibility (DSIC), ex post individual rationality (IR), and budget feasibility—remain achievable with polynomial-time computation and provable welfare guarantees.
Why is dual budgeting the right modeling choice, rather than, say, an internal carbon price that can be folded into monetary costs? The answer is institutional rather than philosophical. Internal carbon prices are common, but they typically coexist with hard caps: teams are charged a carbon price for accounting, yet they still face a quarterly emissions ceiling that cannot be exceeded. Moreover, carbon prices fluctuate (and differ across organizations), while the mechanism designer (the platform) often faces a simpler requirement: never schedule training that violates an externally declared carbon allotment. This hard-cap view makes the sustainability constraint structurally analogous to a second budget, but crucially it is not transferable across requesters and not relaxable by additional payments. In other words, money and carbon are not perfect substitutes at the mechanism level, even if they are related in corporate planning.
Existing reputation-based and budget-feasible mechanisms for FL procurement illustrate the gap. A representative approach—typified by CARE-style designs—assigns workers based on reported costs and observable quality/reputation, ensuring that payments fit within a monetary budget. Such mechanisms are attractive because they blend economic incentives (truthful cost reporting) with ML-relevant signals (quality weights), and they often inherit clean threshold-payment logic from single-parameter auction theory. However, in a carbon-constrained environment, a purely monetary mechanism can fail in two ways. First, it can simply over-allocate: the mechanism may select a set of workers that is affordable in dollars but exceeds the requester’s energy/carbon cap once training actually occurs. Second, even when ex post clipping is applied (e.g., ``drop some winners until carbon fits’’), the resulting allocation rule is typically no longer monotone in bids, undermining DSIC and inducing strategic misreporting. The platform then faces a choice between feasibility and incentive guarantees—an undesirable tradeoff if the goal is a robust market protocol.
The sustainability dimension also changes what it means for an allocation to be ``efficient.’’ In single-budget settings, spending more budget typically increases welfare by allowing more hires or higher-quality hires. Under a hard carbon cap, marginal dollars may not translate into marginal participation; instead, they may only shift which workers are selected within a fixed carbon envelope. This creates a new role for pricing: prices must clear a market subject to a non-monetary quantity limit. Put differently, carbon introduces a binding quantity constraint that truncates monetary demand. Ignoring this truncation leads to miscalibrated price signals and, in multi-requester markets, can distort competition across requesters: one requester with abundant money but scarce carbon may bid up participation that it cannot ultimately use, crowding out a requester who could.
Compatibility constraints further complicate the picture because they convert what would be a separable per-requester selection problem into a coupled allocation problem. In many FL systems, a worker can plausibly be assigned to at most one requester’s training job in a given period, and the platform must respect group-level caps. This resembles matching with additional side constraints—an environment where naive greedy selection can be far from optimal and, more importantly for incentives, can be non-monotone. The compatibility structure thus cannot be relegated to an engineering afterthought (``we’ll just enforce it in scheduling’’) if we also want truthful bidding and interpretable payments.
Our modeling choices are guided by implementability. We assume that per-worker energy usage per training contract is publicly verifiable (via telemetry or standardized measurement) and approximately constant at the level relevant for procurement. This assumption is not innocuous, but it captures a common contractual arrangement: organizations often procure participation for a standardized training job (same number of rounds, similar device requirements), and they can instrument energy use at the orchestrator level. Under this assumption, a carbon/energy budget becomes a hard cap on the number of hires a requester can sustain, which integrates cleanly with monetary affordability. We also treat compatibility through group-requester caps derived from a chance constraint; again, the goal is to encode a complex reliability requirement into an enforceable and auditable rule that can be checked at allocation time.
Within this structured environment, our contribution is conceptual as much as technical. Conceptually, we argue that sustainability constraints in FL procurement should be treated as first-class feasibility constraints—akin to budgets—not as a soft preference or a post hoc reporting exercise. Mechanism design provides a language for doing so while preserving truthfulness and transparency. Technically, the key message is that adding a carbon cap does not necessarily destroy the tractability and incentive properties that made single-budget mechanisms appealing. When carbon enters as a per-requester cardinality cap (induced by constant, verifiable per-worker energy), we can extend price-based employability ideas to a dual-budget notion of demand, compute allocations via standard flow constructions under compatibility constraints, and retain threshold-style payments that support DSIC and ex post IR.
At the same time, we emphasize the limits of this baseline model. Energy usage is not truly constant across workers, and measurement can be noisy or manipulable if instrumentation is weak. When energy footprints are heterogeneous, carbon feasibility becomes knapsack-like and the allocation problem inherits computational hardness; in such regimes, one must rely on approximation, discretization, or randomized mechanisms, and the clean deterministic DSIC story becomes more delicate. Similarly, compatibility groups are an abstraction: real interference patterns can be time-varying and network-dependent, and caps may need periodic recalibration. We therefore view our results as illuminating a tradeoff: stronger sustainability and reliability guarantees can be incorporated into truthful procurement, but doing so requires careful attention to what is measurable and enforceable.
This introduction motivates why the dual-budget, compatibility-constrained procurement problem is the right object for 2026-era FL systems. The next section situates our approach relative to the literatures on budget-feasible procurement and multi-buyer mechanisms, FL incentives and reputation, sustainability constraints in ML systems, and interference-constrained markets.
Our setting sits at the intersection of four literatures that are often studied separately: budget-feasible procurement and multi-buyer mechanisms, incentive design for federated learning (FL), sustainability constraints in ML systems, and allocation under compatibility/interference constraints. A recurring theme across these areas is that the operational constraints that make a protocol deployable (budgets, capacity limits, reliability requirements) are exactly the constraints that tend to break the clean monotonicity and separability properties on which dominant-strategy mechanism design relies. Our goal is to leverage the parts of these literatures that compose—price-based selection, threshold payments, and flow-based allocation—while being explicit about what must be assumed (and verified) for this composition to remain valid.
The classic point of departure is budget-feasible mechanism design for procurement: a buyer with a hard budget seeks to purchase items (or services) from strategic sellers with private costs, while maximizing a value function subject to the budget. Beginning with early work by Singer and a large sequence of refinements, this literature establishes that one can obtain constant-factor approximations under truthfulness and budget feasibility for a variety of valuation classes (additive, submodular) using posted prices, random sampling, and carefully designed thresholds (see surveys and follow-ups in the same line). Two aspects are especially relevant to us. First, many truthful budget-feasible mechanisms rely on single-parameter structure on the seller side (each agent has one private cost) and consequently use monotone allocation rules plus threshold payments. Second, a substantial subset of these mechanisms can be interpreted as variants of “market-clearing” at a uniform price that is computed from bids and the budget, which is precisely the logic behind price-elicitation approaches.
However, most canonical budget-feasible results are for a single buyer (or a single aggregate budget) purchasing from many sellers, whereas FL procurement platforms increasingly intermediate among multiple requesters with separate budgets and heterogeneous constraints. Multi-buyer procurement brings the problem closer to matching or assignment markets, and raises additional design issues: competition across requesters, feasibility coupling (a worker cannot be assigned to multiple requesters simultaneously within a period), and the need for per-requester budget compliance rather than a single global constraint. Related work in multi-unit and multi-buyer environments includes double auctions, two-sided matching markets with prices, and “clinching”-style procedures under budgets; yet DSIC with hard budgets on the buyer side often requires structure (e.g., monotone demand, gross substitutes) that is absent once we add combinatorial side constraints. In practice, platforms therefore fall back on posted prices or heuristic ranking rules that are easy to implement but come with limited incentive guarantees.
A further strand studies budget-feasible mechanisms under feasibility constraints beyond money, such as matroid constraints, matching constraints, or general packing constraints. These constraints matter because they mirror real engineering restrictions (capacity, interference, limited parallelism), but they also tend to make naive greedy allocation non-monotone. The mechanisms most relevant to our approach are those that (i) compute a critical price (or a small menu of prices), (ii) restrict attention to agents below that price, and (iii) choose a feasible subset maximizing cardinality or additive value under the side constraints. This is the conceptual template behind PEA- and CARE-style designs that we build upon; our contribution is to show how a second hard budget (carbon) can be folded into the same template without sacrificing polynomial-time implementability or threshold-based truthfulness, provided the sustainability constraint reduces to a cardinality cap.
FL has generated a large literature on incentives, broadly motivated by the fact that participation is costly, devices are heterogeneous, and data owners may strategically respond to reward schemes. Many papers propose reverse auctions or contract-theoretic schemes in which a single model owner purchases computation or data from devices, sometimes incorporating quality signals (e.g., sample size, past accuracy contributions) or robustness considerations (e.g., adversarial or unreliable participants). A parallel thread emphasizes contribution evaluation—often via Shapley-value approximations, marginal accuracy improvements, or gradient-based metrics—and then uses these scores to distribute rewards. Reputation systems and repeated-interaction models attempt to address long-run reliability and sybil resistance, complementing one-shot payment rules.
Two gaps remain salient for our purposes. First, much of the FL incentive literature assumes a single requester and optimizes a single resource dimension (typically monetary), whereas modern FL platforms increasingly multiplex many concurrent training jobs with separate budgets and policies. Second, even when energy or battery constraints are mentioned, they are usually treated as (i) private costs folded into ci, (ii) soft penalties in an objective, or (iii) device-side constraints that the platform does not explicitly enforce. In contrast, our focus is on sustainability constraints that are auditable and non-transferable across requesters. This distinction matters: if carbon is a hard feasibility constraint rather than a component of worker disutility, then simply “paying more” cannot restore feasibility, and ad hoc post-processing of monetary winners (to satisfy carbon) can break the monotonicity needed for DSIC.
We also note that several FL mechanisms incorporate observable “quality weights” (reputation, reliability, or proxy utility) and then solve a weighted procurement problem subject to budgets. Our baseline treatment aligns with this practice: the mechanism may maximize additive value ∑vixij with public vi, while strategic behavior enters only through reported costs. The novelty, relative to existing reputation-weighted procurement rules, is that we enforce a second hard cap per requester and maintain threshold payments in a multi-requester assignment environment with group-based compatibility limits.
Outside mechanism design, there is now a mature systems-and-policy literature on the carbon footprint of training and deploying ML models. Work on “Green AI,” carbon accounting dashboards, and carbon-aware scheduling has made energy consumption measurable and increasingly actionable: cloud providers expose region-specific emissions intensities; operators shift workloads in time and space to exploit renewable availability; and organizations adopt internal carbon prices or caps to govern compute usage. In parallel, regulators and procurement contracts have moved from aspirational sustainability goals toward enforceable commitments, which in practice often take the form of fixed quarterly budgets for energy or emissions.
Most of this literature is optimization- or scheduling-driven rather than incentive-driven: it presumes the operator can control workloads and does not contend with strategic reporting of costs by edge participants. Where economics enters, it is frequently through carbon pricing (a shadow cost per unit energy) or multi-objective tradeoffs (accuracy versus energy). These approaches are valuable, but they do not directly address our institutional premise: in many deployments, the platform is required to satisfy a hard cap (a compliance constraint), not merely to trade off carbon against dollars in an objective. From a mechanism-design perspective, this suggests modeling sustainability as a second feasibility constraint rather than as a price. Our framework is complementary to carbon pricing: one can still interpret the carbon cap as reflecting an organizational willingness-to-pay for emissions, but the mechanism must operate under the constraint that the cap cannot be violated ex post.
A key modeling fork concerns whether energy usage is homogeneous and verifiable. Systems work highlights that energy can vary across devices, networks, and training regimes. We acknowledge this: when energy footprints differ meaningfully across workers, the sustainability constraint becomes knapsack-like and exact allocation with additional constraints is computationally hard. This observation links our “constant per-contract energy” assumption to the sustainability literature’s emphasis on instrumentation and standardization: if platforms can standardize FL jobs (fixed rounds, fixed client workloads) and verify energy via telemetry, then the sustainability constraint simplifies into a tractable cap. If not, then one must turn to discretization, approximation, or randomized policies—approaches more commonly seen in systems optimization than in deterministic DSIC mechanism design.
Finally, our compatibility constraints connect to allocation problems with interference, congestion, or conflict constraints. In wireless networking and spectrum markets, feasibility is often modeled via conflict graphs or interference sets; in cloud and edge scheduling, feasibility is governed by shared bottlenecks (bandwidth, backhaul, local compute, or failure domains). Mechanism design in such settings is challenging because feasible sets are rarely matroids, and because “local” constraints can induce global non-monotonicities. A pragmatic response—mirroring how operators implement policy—is to replace complex interference interactions with enforceable caps derived from reliability targets, e.g., chance constraints on congestion or failure probabilities. This yields deterministic group-based limits that are easy to audit and enforce at allocation time.
From an algorithmic standpoint, group-based caps and per-requester capacities lead naturally to flow and b-matching formulations, which are among the few combinatorial primitives that are both expressive enough to capture real constraints and computationally tractable at scale. This algorithmic tractability is not merely a convenience: it is also a mechanism-design enabler, because polynomial-time winner determination under a monotone “eligible set” (agents below a price) makes threshold payments implementable. Our work therefore aligns with a broader trend in constrained mechanism design: restrict the allocation problem to a structure (flows, matchings, partition caps) where feasibility can be certified efficiently and where the economic logic (monotonicity and thresholds) can be preserved.
In summary, prior work gives us three largely separate toolkits: (i) truthful budget-feasible procurement at a critical price, (ii) FL-specific incentive heuristics incorporating reputation and contribution signals, and (iii) systems-level carbon accounting and interference-aware scheduling. What is missing is a unified procurement mechanism that treats sustainability as a hard per-requester budget, respects operational compatibility limits, and still provides DSIC and ex post IR with polynomial-time computation in a multi-requester market. Our dual-budget extension is intended to fill precisely this gap, while making explicit where the design is robust (constant-factor guarantees under enforceable caps) and where it is inherently limited (heterogeneous or non-verifiable energy footprints).
We study a procurement market for federated-learning (FL) training in which a platform matches a set of strategic worker devices to multiple concurrent requesters (training jobs). The modeling choice that disciplines the analysis is that requesters face two hard feasibility constraints: a monetary budget and an auditable sustainability (energy/carbon) budget. This reflects common operational practice: budgets are committed ex ante (e.g., a project’s dollar budget and a compliance-driven energy cap), and the platform must remain feasible with respect to both, rather than merely trading off dollars against emissions in an objective.
There are two types of agents. The set of workers is S with |S| = n, indexed by i. Each worker represents a device (or data owner) that can be recruited to participate in one FL training contract for one requester during the relevant period. The set of requesters is A with |A| = m, indexed by j. Each requester corresponds to a training job (or a model owner) that wishes to hire a subset of workers subject to its feasibility constraints. The platform is a mechanism designer: it collects reports, computes an assignment, and specifies payments.
Each worker i has a private participation cost ci ≥ 0 (covering local computation, privacy risk, opportunity cost, and other device-side burdens). Workers submit a report (bid) bi ∈ ℝ+ interpreted as the minimum payment they claim to require. We emphasize the single-parameter structure because it is the main lever that enables dominant-strategy incentive compatibility (DSIC) with threshold payments. In particular, we do not allow workers to report multi-dimensional types (e.g., separate costs for energy, data quality, and latency); we treat these dimensions as either embedded in ci or managed through observables and feasibility constraints described below.
Optionally, each worker has a publicly observable reputation/quality weight vi ≥ 0 (e.g., historical reliability, contribution proxy, device class). This captures the practical fact that platforms often have some non-strategic score used to prioritize participants even when costs are privately known. When we focus on maximizing the number of hires, we set vi ≡ 1; when we model quality-sensitive procurement, the platform’s objective becomes additive in vi.
Each requester j comes with a monetary budget Bj > 0 and a carbon/energy budget Ej > 0. The monetary budget is standard: the total payments made by requester j cannot exceed Bj. The sustainability budget is modeled as a hard cap on energy use attributable to that requester’s training contract within the period. We interpret Ej broadly (it may be an energy cap, an emissions cap converted into energy units using an accounting factor, or an internal carbon budget denominated in “energy-equivalent” units), but in all cases it is a non-transferable feasibility constraint at the requester level.
A key simplifying assumption, motivated by standardized FL jobs, is
that energy usage per worker-contract is constant and verifiable: each
recruited worker consumes e > 0 units of energy for
participating in the contract. This is plausible when the platform fixes
the training recipe (number of rounds, local epochs, batch sizes) and
audits consumption via telemetry or provider accounting. Under this
assumption, the carbon/energy constraint for requester j reduces to a cap:
$$
e\sum_i x_{ij}\le E_j \quad \Longleftrightarrow \quad \sum_i x_{ij}\le
K_j:=\left\lfloor \frac{E_j}{e}\right\rfloor,
$$
so Kj is
the maximum number of workers that requester j can hire under its sustainability
budget. This reduction—energy to a per-requester hire cap—is central: it
preserves computational tractability and, importantly, keeps the
sustainability constraint bid-independent, which is crucial for clean incentive
arguments.
Beyond budgets, the platform must respect operational constraints
that capture interference, correlated failures, or “do-not-co-locate”
policies. We model these via incompatibility groups. The worker set
S is partitioned into L groups G1, …, GL
(e.g., devices sharing a network bottleneck, a geographic region with
limited backhaul, or a common failure domain). For each requester j and group ℓ, the platform enforces a cap τℓj ∈ ℤ+
limiting how many workers from group Gℓ may be
simultaneously assigned to requester j:
∑i ∈ Gℓxij ≤ τℓj ∀ℓ ∈ {1, …, L}, ∀j ∈ A.
We view τℓj as
an enforceable proxy for more complex stochastic congestion interactions. Concretely, let
qℓ(k)
denote a primitive “failure/congestion probability” when k workers from group ℓ are activated in parallel for the
same requester, and let δ ∈ (0, 1) be an operator-chosen
tolerance. A canonical mapping is
τℓj := max {k ∈ ℤ+ : qℓ(k) ≤ δ},
although our mechanism will treat τℓj as
an input once derived or estimated. This modeling choice mirrors
practice: operators rarely enforce fully state-dependent interference
constraints in real time; instead they certify policies in the form of
auditable caps that guarantee reliability targets.
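As a minimal illustration of this mapping (the failure-probability curve below is hypothetical, standing in for a telemetry-estimated qℓ), the cap can be computed by a simple scan:

```python
import math

def group_cap(q, delta, k_max):
    """tau_{l j}: the largest k with q(k) <= delta, where q is a (hypothetical)
    nondecreasing congestion/failure probability estimated from telemetry."""
    cap = 0
    for k in range(1, k_max + 1):
        if q(k) <= delta:
            cap = k
        else:
            break                      # q is nondecreasing, so we can stop early
    return cap

# Illustrative curve: failure probability rises steeply around 20 parallel workers.
q = lambda k: 1.0 / (1.0 + math.exp(-(k - 20) / 3.0))
tau = group_cap(q, delta=0.05, k_max=100)   # -> 11 workers tolerated per group
```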
An allocation specifies, for each worker–requester pair, an indicator
xij ∈ {0, 1}
equal to 1 if worker i is assigned to requester j. We write Sj := {i : xij = 1}
for the set assigned to requester j. Workers are unit-supply in our
baseline: each worker can serve at most one requester in the
period,
∑j ∈ Axij ≤ 1 ∀i ∈ S,
reflecting the fact that a device cannot meaningfully participate in
multiple FL jobs simultaneously (or the platform disallows it to avoid
privacy/accounting complications). The mechanism also specifies payments
pij ≥ 0
from requester j to worker
i, and we define total payment
to worker i as pi := ∑jpij.
The platform must output (x, p) satisfying three families of constraints: (i) monetary budget feasibility, ∑ipij ≤ Bj for every requester j; (ii) sustainability feasibility, ∑ixij ≤ Kj for every requester j; and (iii) operational feasibility, namely unit-supply for every worker and the group caps ∑i ∈ Gℓxij ≤ τℓj for every group–requester pair.
In what follows, we often combine the first two constraints at the level of workers a requester can afford at a uniform per-worker price. Intuitively, if workers are each paid (at most) a common price r, then requester j can fund at most ⌊Bj/r⌋ workers monetarily and at most Kj workers environmentally; the binding cap is their minimum. This “dual budget” structure is what will later allow us to integrate sustainability caps into price-based procurement without altering the threshold-payment logic.
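A minimal sketch of this dual-budget accounting (budget numbers below are hypothetical) is:

```python
import math

def employability(B_j, K_j, r):
    """kappa_j(r) = min(floor(B_j / r), K_j): hires requester j can fund
    monetarily and certify environmentally at a uniform per-worker price r."""
    return min(math.floor(B_j / r), K_j)

def aggregate_employability(B, K, r):
    """E(r) = sum over requesters j of kappa_j(r)."""
    return sum(employability(B[j], K[j], r) for j in B)

# Hypothetical budgets: B = {"j1": 100.0, "j2": 60.0}, K = {"j1": 8, "j2": 3}.
# At r = 10: kappa_j1 = min(10, 8) = 8, kappa_j2 = min(6, 3) = 3, so E(10) = 11.
```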
Workers are quasi-linear. If worker i is hired by some requester, it
incurs cost ci once;
otherwise it incurs no cost. Thus utility is
ui(b) = pi(b) − ci ⋅ 1{∃j: xij(b) = 1}.
We target two standard economic properties. First, dominant-strategy incentive compatibility (DSIC): truthful reporting bi = ci should maximize ui for each worker regardless of others’ reports. Second, ex post individual rationality (IR): under truthful reporting, selected workers receive nonnegative utility. These are
natural for platform procurement in which workers are one-shot
participants and cannot be assumed to play Bayes-Nash equilibria.
The platform seeks to maximize requester-side value from recruited
participants subject to feasibility. In the simplest “throughput”
formulation, value is the total number of recruited workers, i.e.,
max ∑i, jxij.
When quality weights are used, the objective becomes additive in vi:
max ∑i, jvixij.
We stress that vi is taken as
observable and non-strategic in the baseline; strategic behavior enters
only through bi. This aligns
with many deployed systems that treat reputation or device class as
platform-determined while eliciting (explicitly or implicitly) a
compensation requirement from participants.
The constant-and-verifiable energy assumption is both a strength and a limitation. It is a strength because it lets us treat sustainability as a hard per-requester hire cap Kj, which composes cleanly with compatibility caps and supports polynomial-time allocation. It is a limitation because real energy footprints vary across devices and network conditions. Our interpretation is that this assumption is plausible when the platform standardizes training workloads (so that per-participant energy is approximately fixed) and audits consumption; when this is not feasible, one must confront knapsack-like feasibility and the resulting computational and incentive complications. We return to this issue explicitly as an extension.
Before introducing our mechanism, it is useful to understand why a number of seemingly natural ``reductions'' either fail feasibility (especially the hard carbon cap) or fail incentive compatibility once we insist on \emph{ex post} guarantees. The overarching lesson is that dual budgets are not just a matter of adding one more constraint: they change the geometry of what a ``price-based'' procurement step can safely do, and they expose where single-parameter truthfulness is fragile.
A first impulse—common in policy discussions—is to convert carbon into dollars via an internal carbon price λ and define an ``effective’’ monetary budget B̃j := Bj + λEj (or, dually, to treat Ej as a soft penalty in the objective). One then runs a standard budget-feasible procurement procedure under B̃j and ignores carbon thereafter.
This is conceptually appealing but mismatched to our premise that Ej is a compliance constraint. If carbon is a hard cap, then feasibility must hold for realized allocation: no choice of λ can guarantee that a dollar-feasible outcome is also carbon-feasible, because the mapping from dollars to energy depends on the realized set of hires. In our baseline model, where each hire consumes verifiable energy e, carbon-feasibility is simply |Sj| ≤ Kj. Any reduction that does not explicitly enforce |Sj| ≤ Kj risks selecting |Sj| > Kj whenever B̃j is large relative to Bj or when λ is tuned too low. Conversely, tuning λ to be ``conservative’’ can force the mechanism to under-hire even when abundant monetary budget is available, which is precisely the kind of systematic inefficiency operators wish to avoid when Ej is slack.
More subtly, a Lagrangian-style collapse also undermines the interpretability of guarantees: it may yield carbon feasibility only ``on average’’ (or only for the design point at which λ was calibrated). In the operational settings we have in mind—auditability, reporting, and non-transferable sustainability allocations—this is typically not acceptable.
A second common engineering pattern is sequential feasibility:
compute a money-feasible allocation and payments (e.g., by a
budget-feasible auction or by a PEA-style critical price for Bj), and then,
if |Sj| > Kj,
discard some winners until |Sj| = Kj.
The truncation can be ``drop highest bids,'' ``drop lowest quality,'' or some ad hoc fairness rule.
The feasibility problem is immediate: once we start removing winners, the original payment computation is no longer coherent. If the original payments were computed as thresholds for the money-only allocation, they need not be valid thresholds for the truncated allocation. In particular, a worker can face a discontinuity where (i) they would have been selected and paid under the money-only rule, but (ii) they are removed by the carbon truncation for reasons unrelated to their own bid. This breaks the monotonicity logic that underwrites DSIC in single-parameter environments: even if the money-only allocation is monotone in bi, the composition ``money-only selection → carbon truncation’’ need not be. Intuitively, when carbon binds, selection depends on which workers happened to be selected, so a worker’s probability of surviving truncation can move in the wrong direction as they change their bid (through how their bid affects the set of provisional winners and the truncation rule’s tie situations).
The budget-feasibility problem is equally severe if one attempts to restore DSIC by paying threshold bids after truncation. Thresholds in the truncated problem are typically higher than in the money-only problem because the truncation induces stronger competition for fewer slots. Paying these higher thresholds can easily violate ∑ipij ≤ Bj unless the allocation rule was designed with a uniform price cap in mind. Put differently: ``truncate after the fact'' is not merely a feasibility patch; it changes the market-clearing price logic, and reintroduces exactly the budget-feasibility issues that budget-feasible mechanisms work hard to avoid.
A third baseline, ubiquitous in FL systems, is to sort workers by a cost-per-quality score (e.g., bi/vi) and greedily accept workers while constraints remain. This can look compelling because it resembles a knapsack heuristic and because it often performs well in unconstrained simulations.
From a mechanism-design perspective, however, two obstacles appear.
If we pay each selected worker their bid (a common ``reverse auction’’ engineering choice), then workers optimally inflate bids up to the point of being rejected. This is not a minor technicality: the selection rule itself encourages strategic overstatement, and the resulting equilibrium can be arbitrarily inefficient.
In single-parameter settings, any monotone allocation can be paired with threshold payments to obtain DSIC. Greedy rules in constrained matching environments, however, are often not monotone once we incorporate unit-supply across requesters, compatibility caps, and the fact that requesters have different binding constraints at different prices. Even when a greedy rule is monotone, the induced threshold payments can exceed budgets because the threshold that keeps worker i in the selected set can be determined by a different constraint than the one that was binding at the realized bids. This is the procurement analogue of why naive thresholding in budgeted environments can explode payments: the marginal excluded competitor can sit at a price that the requester cannot afford for all accepted slots.
The deeper point is that our constraints are not ``one-dimensional.’’ A greedy rule that is locally sensible for money alone (or carbon alone) can become globally non-monotone once the feasibility region is shaped by both ⌊Bj/r⌋ and Kj, and further intersected with the group caps τℓj. In practice, platforms often respond by hard-coding tie-breaking rules, post-processing allocations, or adjusting payments ex post. Each of these steps is precisely where DSIC typically fails.
Another plausible reduction is to treat each requester independently: run a budget-feasible procurement for each j (respecting that requester’s Bj and Kj), and then resolve conflicts when the same worker is selected by multiple requesters (since workers are unit-supply). Conflict resolution might assign the worker to the highest-paying requester, the requester with the tightest deadline, or simply the earliest requester in a fixed order.
The problem is that the conflict-resolution stage couples the auctions in a way that is generally not incentive-compatible. A worker’s chance of being finally allocated depends not only on their own bid relative to others in one requester’s auction, but also on whether they are simultaneously attractive elsewhere and on how the platform breaks ties across requesters. Thus, even if each per-requester auction is DSIC in isolation, their composition with an ex post conflict rule typically is not DSIC globally. From a feasibility standpoint, per-requester designs also struggle to respect group caps τℓj when those caps are derived from shared operational resources and must be enforced at the platform level with consistent accounting.
Up to this point, our criticism has been about naive reductions. There is also a genuine source of hardness lurking behind the scenes: if energy usage is not constant-and-verifiable, the problem ceases to be cleanly single-parameter.
Suppose instead each worker has an energy footprint ei that is private information (or manipulable) and enters feasibility as ∑i ∈ Sjei ≤ Ej. A worker’s type becomes (ci, ei), which is multi-dimensional. At that point, the canonical ``monotone allocation + threshold payment'' recipe no longer applies: DSIC in multi-parameter domains requires much stronger structural conditions (cycle-monotonicity), and simple posted-price or threshold constructions can fail even when the allocation problem is computationally easy.
Computationally, heterogeneous energy also transforms carbon feasibility into a knapsack constraint, and the allocation problem becomes NP-hard even without strategic behavior. This creates a three-way tension that is hard to escape simultaneously: (i) we would like exact or near-exact welfare (or value) maximization, (ii) we would like DSIC, and (iii) we would like polynomial-time computation and budget feasibility. Standard tools each break somewhere: VCG-style mechanisms demand exact welfare maximization and do not naturally respect hard budgets; knapsack approximations can be made truthful only with careful maximal-in-range restrictions or randomization; and any reliance on reported ei threatens feasibility if reports are not verifiable.
This is precisely why we built the baseline model around a verifiable constant e. It is not that heterogeneous energy is unimportant—it is operationally central—but rather that, without verifiability, the sustainability constraint is no longer ``just another cap.’’ It becomes both a computational bottleneck and an incentive bottleneck.
These baselines clarify what we need from a workable mechanism in our setting. We need an allocation rule that (a) enforces carbon feasibility as a hard cap (rather than through pricing or truncation), (b) preserves single-parameter incentive structure by keeping the sustainability constraint bid-independent, and (c) controls payments in a way that is compatible with hard monetary budgets even in the presence of compatibility caps and multiple requesters. The next section builds exactly this: a dual-budget, virtual-price extension of PEA in which both budgets enter through a unified ``employability at price r’’ notion, while feasibility under τℓj is handled algorithmically via flow.
We now describe the mechanism that resolves the tension highlighted above: we want a monotone, price-based procurement rule (so that DSIC can be obtained via monotonicity and thresholds), while simultaneously enforcing two feasibility systems—monetary budgets and carbon budgets—together with platform-level compatibility caps. The key design move is to treat carbon not as a priced externality, but as a hard quantity cap that is folded into the same ``how many hires are affordable at price r'' accounting that drives price-based procurement.
In our baseline model, each selected worker consumes a verifiable and
constant amount of energy e > 0 per training contract.
Therefore the carbon/energy constraint for requester j,
e∑ixij ≤ Ej,
is equivalent to a cardinality cap:
$$
\sum_i x_{ij} \le K_j := \Big\lfloor \frac{E_j}{e}\Big\rfloor.
$$
This reduction is not merely notational: it is the sense in which the
sustainability constraint can be enforced and audited. From here onward, carbon
enters the mechanism only through Kj.
As in price-equilibrating procurement, we consider posted-price
outcomes indexed by a per-worker price r. At price r, requester j can afford at most ⌊Bj/r⌋
workers under money, and at most Kj workers under
carbon. We therefore define the employability of requester j at price r as
$$
\kappa_j(r) \;:=\; \min\Big\{\big\lfloor \tfrac{B_j}{r}\big\rfloor,\;
K_j\Big\}.
$$
Summing across requesters yields the aggregate employability (i.e., the total number of
worker-slots the demand side can fund and certify at price r):
$$
E(r) \;:=\; \sum_{j\in A} \kappa_j(r) \;=\; \sum_{j\in
A}\min\Big\{\big\lfloor \tfrac{B_j}{r}\big\rfloor,\; K_j\Big\}.
$$
On the supply side, workers with bids at most r are available:
S(r) := { i ∈ S : bi ≤ r }.
Two features are worth emphasizing for intuition. First, E(r) is entirely determined
by budgets and r; bids affect
only which workers are in S(r). Second, carbon
affects E(r) only by
truncating each requester’s money-based employability at Kj. When Kj is slack
(relative to ⌊Bj/r⌋),
the mechanism reduces to its money-only counterpart; when Kj is tight, it
becomes a hard limit on how many workers the requester may ever hire,
regardless of monetary slack.
We restrict attention to a finite set of candidate virtual prices
$$
R_b \;:=\; \Big\{\frac{B_j}{t} : j\in A,\ t\in\{1,\dots,K_j\}\Big\},
$$
which is exactly the set of prices at which some requester’s money-based
capacity ⌊Bj/r⌋
can change while remaining relevant under the carbon cap (there is no
point considering t > Kj
since carbon forbids more than Kj hires
anyway). Operationally, Rb is a
discretization induced by budgets; it will let us compute the critical
price by scanning a polynomial-size list.
Even if the demand side can employ E(r) workers in aggregate,
the platform may be unable to realize E(r) hires because of
unit-supply (each worker can serve at most one requester) and
compatibility constraints. For each incompatibility group Gℓ and requester
j, we require
∑i ∈ Gℓxij ≤ τℓj.
At a fixed price r, we
therefore define the per-price selection problem (for the cardinality objective; the vi-weighted
variant is discussed below) as:
OSP(r): max ∑i ∈ S(r), j ∈ Axij
subject to (i) unit-supply ∑jxij ≤ 1
for all i, (ii) compatibility
caps ∑i ∈ Gℓxij ≤ τℓj
for all (ℓ, j), and
(iii) per-requester dual-budget caps ∑ixij ≤ κj(r)
for all j. Let Mf(r)
denote the optimal objective value of OSP(r), i.e., the maximum number of
workers we can actually allocate at price r given all hard constraints.
To solve OSP(r) we use a standard flow construction: a source s connects to each worker i ∈ S(r) with capacity 1; worker i connects, for each requester j, to a group–requester node (ℓ(i), j) with capacity 1; each node (ℓ, j) connects to requester j with capacity τℓj; and each requester j connects to the sink t with capacity κj(r). A maximum s–t flow in this network corresponds to a feasible allocation, and its value equals the number of assigned workers. Thus Mf(r) can be computed in polynomial time for each r.
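A minimal sketch of this construction (using networkx, with hypothetical input containers; a deployment would add the tie-breaking refinement discussed below) is:

```python
import math
import networkx as nx

def solve_osp(workers, bids, groups, B, K, tau, r):
    """Maximum number of workers assignable at uniform price r.
    workers: iterable of ids; bids[i], groups[i]: bid and group of worker i;
    B[j], K[j]: monetary and carbon budgets; tau[(g, j)]: group-requester caps."""
    G = nx.DiGraph()
    requesters = list(B.keys())
    n = len(list(workers))
    for i in workers:
        if bids[i] <= r:                                   # only workers in S(r)
            G.add_edge("s", ("w", i), capacity=1)
            for j in requesters:
                G.add_edge(("w", i), ("g", groups[i], j), capacity=1)
    for j in requesters:
        for g in set(groups.values()):
            # missing caps default to "unconstrained" for this sketch
            G.add_edge(("g", g, j), ("r", j), capacity=tau.get((g, j), n))
        kappa = min(math.floor(B[j] / r), K[j])            # dual-budget cap
        G.add_edge(("r", j), "t", capacity=kappa)
    if "s" not in G or "t" not in G:
        return 0                                           # S(r) or A is empty
    value, _ = nx.maximum_flow(G, "s", "t")
    return value
```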
The purpose of the critical price is to find the smallest uniform
price at which the platform can realize all employable slots. Formally,
we seek a price r* ∈ Rb
at which
Mf(r*) = E(r*),
and we take r* to
be the smallest such price in Rb (ties broken
deterministically). Intuitively, as we raise r, each requester’s money-based
capacity ⌊Bj/r⌋
weakly decreases and is always truncated by Kj, so
employability E(r)
weakly decreases. Meanwhile, higher r expands the available worker set
S(r). When r is very low, S(r) can be too small (or
too ``imbalanced’’ across groups) to fill all employable slots, so Mf(r) < E(r).
When r is high enough, the
platform can fill whatever employability remains, and we reach equality.
The smallest price at which equality holds is the point at which
supply-side feasibility (under compatibility) meets demand-side
employability (under dual budgets).
Algorithmically, DB-PEA computes r* by sorting Rb in increasing
order, and for each candidate r:
(i) computing κj(r)
and E(r), (ii)
solving OSP(r) via max-flow to
obtain Mf(r),
and (iii) stopping at the first r with Mf(r) = E(r).
Because |Rb| = ∑jKj
and each max-flow is polynomial, this entire step is polynomial.
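A compact sketch of this scan (reusing the solve_osp helper from the earlier sketch; floating-point ties are glossed over):

```python
import math

def critical_price(workers, bids, groups, B, K, tau):
    """Smallest r in R_b with M_f(r) = E(r), following the DB-PEA scan."""
    R_b = sorted({B[j] / t for j in B for t in range(1, K[j] + 1)})
    for r in R_b:
        E_r = sum(min(math.floor(B[j] / r), K[j]) for j in B)   # employability E(r)
        if solve_osp(workers, bids, groups, B, K, tau, r) == E_r:
            return r, E_r              # critical price and number of employable slots
    return None                        # no clearing price found in R_b
```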
At the critical price r*, we compute a feasible allocation of size Mf(r*) by solving OSP(r*). Since max-flow may have multiple optimal solutions, we fix a deterministic tie-breaking rule to make the allocation . This is not cosmetic: DSIC payments are defined via thresholds, which require that the allocation be well-defined for every bid profile.
A clean approach is to impose a lexicographic priority ordering over workers that is consistent with bids (e.g., sort workers by increasing bi and then by a fixed identifier), and then choose, among all maximum-cardinality feasible allocations, the one that is minimum-lex in that order. Practically, we implement this by converting the max-flow to a min-cost max-flow: we first maximize flow value, and subject to that, minimize a cost that assigns each worker an infinitesimal cost according to the priority ordering (equivalently, use integer costs with sufficiently separated magnitudes). This yields a unique allocation while preserving feasibility and the critical-price logic.
The output of this step is the allocation matrix (xij),
equivalently the assigned sets Sj = {i : xij = 1},
satisfying all constraints by construction:
∑ixij ≤ κj(r*) ≤ Kj, ∑i ∈ Gℓxij ≤ τℓj.
Finally, we compute payments so that truthful bidding is a dominant strategy. Given a deterministic monotone allocation rule, the canonical payment is each winner’s threshold bid: the largest bid at which the worker would still be selected holding others’ bids fixed.
Concretely, for each worker i who is allocated under the above
rule, define
θi := sup { z ≥ 0 : i
is selected by DB-PEA when bi = z
and b−i fixed }.
We then pay pi = θi
(and pi = 0 for
non-winners), with the payment routed from the requester j to whom i is assigned, i.e., pij = pi
if xij = 1
and 0 otherwise. Because only prices in
Rb can
change capacities, and because the worker’s inclusion depends only on
whether bi ≤ r
for candidate prices and on deterministic tie-breaking, θi can be
computed by searching over a finite set (e.g., by scanning or binary
searching over Rb while
recomputing the induced outcome with bi set to the
candidate). This yields polynomial-time payment computation.
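A sketch of this threshold search (the selection oracle `selected(i, profile)` is an assumed helper that re-runs the full DB-PEA allocation, with its deterministic tie-break, on a trial bid profile):

```python
def threshold_payment(i, bids, candidate_bids, selected):
    """theta_i: the largest trial bid z at which worker i is still selected,
    holding all other bids fixed (0 if i is never selected)."""
    theta = 0.0
    for z in sorted(candidate_bids):          # e.g., scan the finite set R_b
        trial = dict(bids)
        trial[i] = z
        if selected(i, trial):
            theta = max(theta, z)
    return theta                              # pay theta to winner i, 0 to losers
```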
A useful operational interpretation is that DB-PEA behaves like a posted-price market-clearing rule at a bid-dependent price level, but it pays each winner the posted price at which they become pivotal for feasibility under compatibility and dual budgets. The next section formalizes that this payment is exactly what restores DSIC while keeping payments within requesters’ hard monetary budgets.
Our core mechanism description above maximizes the number of hires (equivalently, takes vi ≡ 1). If the platform instead wants to maximize ∑i, jvixij with publicly observable vi, we can incorporate vi into the tie-breaking/selection objective at r* by solving a max-cost flow (or min-cost with negative costs) on the same network. Because vi is not privately controlled by the worker, this refinement changes welfare but does not create new strategic dimensions; the incentive analysis continues to pivot on monotonicity in bi and the existence of thresholds. We return to the formal guarantees (and to the approximation dependence on compatibility tightness) in the next section.
We now formalize what DB-PEA buys us, conceptually and technically: a procurement environment with hard budget systems (money and carbon) and a compatibility system. The main point is that carbon enters the mechanism only through the bid-independent hire caps Kj, so the incentive and feasibility arguments largely decouple: incentives hinge on monotonicity in bids, while feasibility is enforced by capacities in the allocation subproblem.
DB-PEA is a deterministic, direct-revelation mechanism in a single-parameter domain (each worker has a private cost ci and submits a bid bi). The standard route to dominant-strategy truthfulness is therefore to (i) ensure the allocation rule is monotone in each agent’s bid, and (ii) pay threshold payments.
We rely on the fact that bids affect the mechanism only through membership in the eligible sets S(r) of the subproblems OSP(r): worker
i is present in S(r) iff bi ≤ r.
Lowering bi can only add
i to more sets S(r) (never remove it), and
does not change any requester’s capacities κj(r) = min {⌊Bj/r⌋, Kj}
or any compatibility caps τℓj.
Hence, for every candidate virtual price r ∈ Rb,
the feasible region of OSP(r)
weakly expands when bi
decreases.
The only remaining subtlety is tie-breaking: max-flow can admit multiple optimal allocations. We therefore fix a deterministic, bid-consistent tie-break (e.g., min-lex relative to a priority order that refines increasing bids by a fixed id). Under such a rule, when i lowers its bid it cannot lose to a higher-bid worker in the priority order; thus selection is monotone. Given monotonicity and determinism, threshold payments θi implement DSIC (the worker’s utility is maximized by bidding truthfully) and ex post IR (a truthful winner is paid at least its cost, while a non-winner pays/receives nothing). Formally, for truthful reporting bi = ci, we have ui = pi − ci ≥ 0 for winners because θi ≥ ci by definition of the threshold, and ui = 0 for non-winners.
The mechanism must satisfy three hard constraint families simultaneously: (i) requester monetary budgets, (ii) requester carbon budgets, and (iii) compatibility caps. Compatibility is enforced directly by the flow construction, so it holds by construction. Carbon and money feasibility require slightly more care because payments are individualized thresholds rather than a single posted price. Concretely, for every requester j we need
$$
\sum_i p_{ij}\le B_j
\qquad\text{and}\qquad
e\sum_i x_{ij}\le E_j \ \Longleftrightarrow\ \sum_i x_{ij}\le K_j .
$$
Carbon feasibility is immediate: the allocation enforces ∑ixij ≤ κj(r*) ≤ Kj
for every requester j, and
Kj = ⌊Ej/e⌋
is exactly the carbon cap under constant per-worker energy e.
Monetary feasibility uses two ingredients. First, DB-PEA never
assigns more than ⌊Bj/r*⌋
workers to requester j because
∑ixij ≤ κj(r*) ≤ ⌊Bj/r*⌋.
Second, each winning worker’s threshold payment is capped by the
critical price: θi ≤ r*
(either as a property of the threshold under the critical-price
construction, or explicitly by paying min {θi, r*},
which preserves DSIC because the cap is bid-independent from the
worker’s perspective once r* is fixed).
Therefore,
$$
\sum_i p_{ij}
=\sum_{i:x_{ij}=1}\theta_i
\le \sum_{i:x_{ij}=1} r^*
= r^*\cdot |S_j|
\le r^*\cdot \Big\lfloor \frac{B_j}{r^*}\Big\rfloor
\le B_j.
$$
The same logic also clarifies why the mechanism treats carbon as a
quantity cap rather than a price: since Kj affects only how many slots requester j may fill, it
does not introduce any new payment externalities that could jeopardize
budget balance.
DB-PEA is computationally tractable precisely because, under constant e, carbon feasibility reduces to a cardinality constraint per requester. This keeps the allocation subproblem within the scope of max-flow (or b-matching).
The dependence on ∑jKj
is economically natural: a tighter carbon regime (smaller Kj) reduces the
virtual price set and typically lowers computational burden, while a
very slack carbon regime reverts to the money-only setting. In practice,
one can often accelerate computation further by reusing residual graphs
across nearby r values, but
such engineering is orthogonal to the theoretical tractability
claim.
We next summarize the welfare guarantee, stated in the simplest cardinality form (vi ≡ 1). Let OPT denote the maximum number of workers that can be feasibly allocated under budgets and compatibility constraints when the platform knows true costs. Let ALG be DB-PEA’s number of allocated workers.
$$
\textsf{ALG}\ \ge\ \frac{1}{2\alpha+1}\ \textsf{OPT},
\qquad
\alpha:=\min\Big\{m,\ \max_{\ell,j}\Big\lceil \frac{|G_\ell|}{\tau_{\ell
j}}\Big\rceil\Big\}.
$$
The parameter α captures how
constraining the compatibility system is relative to the partition
sizes. When τℓj is
large (weak incompatibilities), α is small and the approximation is
strong; when some group-requester cap is extremely tight relative to the
group size, α grows and any
uniform-price, monotone rule can be forced to leave feasible value on
the table due to bottlenecks. Importantly, α is independent of carbon: it depends only on
how the interference/congestion structure restricts routing workers to
requesters.
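For concreteness, a hypothetical instance with m = 5 requesters whose tightest group–requester pair has |Gℓ| = 30 and τℓj = 10 gives
$$
\alpha=\min\Big\{5,\ \Big\lceil\tfrac{30}{10}\Big\rceil\Big\}=3,
\qquad
\textsf{ALG}\ \ge\ \frac{1}{2\cdot 3+1}\,\textsf{OPT}=\frac{1}{7}\,\textsf{OPT}.
$$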
Carbon enters only via the per-requester caps Kj and hence
only shrinks the feasible set for both OPT and ALG.
The approximation statement is relative to OPT under the same dual-budget and compatibility constraints, so tightening carbon budgets reduces the absolute achievable allocation but does not introduce new combinatorial pathologies beyond restricting the feasible region. Put
differently, DB-PEA’s loss relative to the full-information optimum is
controlled by compatibility bottlenecks (through α) rather than by the mere presence
of a second (quantity) budget.
From a policy perspective, this monotone effect is desirable: stricter carbon limits should not be undermined by strategic bidding. In DB-PEA they are not, because Kj is exogenous and enforced directly in the allocation.
If the platform’s objective is ∑i, jvixij with publicly observable vi, the same incentive and feasibility arguments go through so long as the allocation rule remains monotone in bids (e.g., maximize total weight subject to the critical-price capacity constraints, with deterministic tie-breaking). What can change is the approximation statement: bounding performance relative to the true weighted optimum typically requires additional structure (e.g., bucketing vi values and applying the PEA/CARE analysis per bucket), which may introduce an additional logarithmic factor. We flag this as a welfare-refinement rather than an incentive or feasibility complication: the carbon and budget logic remains unchanged.
Our baseline analysis leans on a deliberately strong simplification: per-contract energy is constant and verifiable (e), so the carbon budget Ej reduces to a hard cardinality cap Kj = ⌊Ej/e⌋. This is not only analytically convenient; it is also the knife-edge that keeps the allocation subproblem in max-flow territory and allows carbon feasibility to be enforced without entering the payment logic. In practice, however, energy footprints vary across devices, training workloads, and network conditions, and interference is rarely a clean partition into groups. We therefore flag several extensions that are economically natural but technically more delicate, and we indicate (i) where numerical methods or dynamic programming become unavoidable, and (ii) how incentive/approximation guarantees typically weaken.
Suppose each worker i has a
publicly known (or auditable) energy footprint ei > 0 per
contract, while budgets remain (Bj, Ej).
Carbon feasibility for requester j becomes
∑ieixij ≤ Ej,
which is a knapsack constraint even when vi ≡ 1. The
allocation problem at a fixed virtual price r is no longer a flow with simple
capacities: each requester now faces a resource constraint in as well as
a slot constraint in (through ⌊Bj/r⌋),
and the compatibility constraints couple requesters through shared
groups. This intersection typically destroys total unimodularity and
makes exact optimization NP-hard (a standard reduction embeds knapsack
by taking a single requester, no compatibility constraints, and asking
for a maximum-cardinality subset under ∑ieixij ≤ Ej).
Two practical approaches are worth separating.
If ei take values in a small set (or can be coarsened into H bins), we can treat energy as constant within each class. Concretely, let classes h ∈ {1, …, H} have representative energies ēh and let S(h)(r) = {i ∈ S(r) : ei ∈ class h}. A simple implementation is a sequential fill: for each requester j, allocate across classes subject to per-class pseudo-caps Kjh such that ∑hēhKjh ≤ Ej and Kjh ∈ ℤ+. Conditional on these integer pseudo-caps, the within-class assignment reverts to the baseline max-flow form with requester capacity ∑hKjh and group caps τℓj, and we can search over (Kjh) via a small DP per requester when H is small.
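A minimal sketch of the per-requester search over integer pseudo-caps (names and numbers are hypothetical; a small dynamic program does the same job when H is small):

```python
from itertools import product

def best_pseudo_caps(E_j, class_energy, class_avail):
    """Integer pseudo-caps (K_jh) maximizing total slots subject to
    sum_h e_bar_h * K_jh <= E_j and K_jh <= workers available in class h.
    Brute-force product search over the small number of classes H."""
    best_total, best_caps = 0, tuple(0 for _ in class_avail)
    for caps in product(*(range(a + 1) for a in class_avail)):
        energy = sum(e * k for e, k in zip(class_energy, caps))
        if energy <= E_j and sum(caps) > best_total:
            best_total, best_caps = sum(caps), caps
    return best_caps

# Hypothetical instance: E_j = 10.0, class energies (1.0, 2.5), with 6 and 3
# workers available -> the search returns caps (5, 2): 7 slots, 10.0 energy units.
```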
The economic price of discretization is twofold. First, feasibility becomes approximate: rounding ei to ēh requires either a conservative choice (guaranteeing ∑ieixij ≤ Ej but potentially under-allocating) or an ex post correction step (dropping some winners when realized energy exceeds the budget). Second, the clean constant-factor welfare bound can pick up an additional factor that depends on the coarsening error or on H (e.g., via a loss term from conservative rounding).
Holding r fixed, one can decouple across requesters only after accounting for compatibility. A pragmatic compromise is to keep the routing structure (compatibility) in a flow-like object, but to replace the requester capacity edge by a knapsack subroutine. For example, for each requester j and each group ℓ, we know we can accept at most τℓj workers from Gℓ; selecting which workers to take among the available ones then becomes a knapsack variant when workers are partitioned by groups, or a general knapsack if not. This is where numerical methods enter: pseudo-polynomial DP in Ej (after scaling) or an FPTAS is natural, but it replaces polynomial-time max-flow with a complexity that is polynomial in n and 1/ε but depends on scaled budgets.
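As an illustration of the pseudo-polynomial route, the following sketch solves the per-requester cardinality knapsack by dynamic programming over an integer-scaled energy budget; the scaling step and the function name are assumptions, and a value-weighted variant would add vi instead of 1.

```python
def max_hires_under_energy(energies, E_j):
    """0/1 knapsack DP, pseudo-polynomial in the (integer-scaled) budget:
    dp[b] = max number of hires using at most b energy units.  A weighted
    variant would add v_i instead of 1; an FPTAS would scale values."""
    dp = [0] * (E_j + 1)
    for e_i in energies:
        for b in range(E_j, e_i - 1, -1):  # backwards: each worker used once
            dp[b] = max(dp[b], dp[b - e_i] + 1)
    return dp[E_j]

# Example: integer-scaled energies, budget of 10 units.
print(max_hires_under_energy([3, 4, 2, 6, 1], 10))  # -> 4 (e.g. {3, 4, 2, 1})
```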
On incentives, the lesson is that approximation and truthfulness must be designed together. If we implement an approximate allocation rule, DSIC is no longer automatic under threshold payments unless the approximation is itself monotone in bids. A standard way to retain truthful behavior is to restrict the mechanism to a maximal-in-range family: we precommit to a finite set of feasible allocations (e.g., those representable by a discretized DP), select the best in that range at price r*, and compute thresholds within the range. This restores DSIC but typically weakens welfare by an additional approximation factor from the restricted range. Alternatively, one can move to posted prices by energy class (a menu of per-class prices πh), which preserves strategyproofness by construction but generally yields weaker approximation guarantees than critical-price PEA-style selection.
Even if each worker has a nominal footprint êi, realized
energy usage can be random due to device throttling, network retries, or
workload heterogeneity. If requester j faces a risk constraint
Pr (∑ieixij ≤ Ej) ≥ 1 − ηj,
then carbon feasibility is inherently probabilistic. There are two
conceptually different modeling choices.
If ei
are independent with known means/variances (from telemetry), we can
replace the chance constraint by a deterministic sufficient condition
using concentration bounds, e.g.,
$$
\sum_i \mathbb{E}[e_i]x_{ij} \ +\ z_{\eta_j}\sqrt{\sum_i
\mathrm{Var}(e_i)x_{ij}}\ \le\ E_j,
$$
or a simpler robust bound ∑i(𝔼[ei] + λi)xij ≤ Ej.
This restores a deterministic knapsack-type constraint with modified
weights, so the computational story reduces to the heterogeneous-energy
case above. Welfare is reduced by the conservatism margin; importantly,
incentives remain well-behaved because the safety margin is computed
from bid-independent primitives (telemetry estimates), not from
bids.
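A minimal sketch of this conservative test follows, assuming independent energies with telemetry-estimated means and variances and a normal approximation for the z-quantile; the function and parameter names are illustrative.

```python
import math
from statistics import NormalDist

def conservative_energy_ok(means, variances, E_j, eta_j):
    """Sufficient condition for Pr(sum_i e_i <= E_j) >= 1 - eta_j under a
    normal approximation: sum of means + z_{eta_j} * pooled std <= E_j.
    All inputs are bid-independent telemetry estimates."""
    z = NormalDist().inv_cdf(1.0 - eta_j)
    return sum(means) + z * math.sqrt(sum(variances)) <= E_j

# Example: 20 selected workers, mean 1.0 and variance 0.04 energy units each,
# budget 22 units, tolerance eta_j = 5%.
print(conservative_energy_ok([1.0] * 20, [0.04] * 20, 22.0, 0.05))  # True
```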
Alternatively, we can allow occasional budget overruns but enforce ex post corrections: if realized energy exceeds Ej, the platform can (i) drop some training jobs, (ii) delay them to a later carbon window, or (iii) charge carbon penalties/tokens. This moves the model closer to a stochastic scheduling problem; DB-PEA’s hard feasibility guarantee becomes an in-expectation or high-probability statement, depending on the chosen recourse. Mechanism-theoretically, recourse must be specified carefully: if workers anticipate that high-energy outcomes trigger job cancellation, they may face ex post risk that breaks ex post IR unless compensated (e.g., with partial payments upon attempted participation).
Our baseline treats energy as verifiable. If it is not, workers could under-report energy-related attributes or deviate during training (e.g., by running extra local epochs) in ways that increase carbon usage. This is not a single-parameter cost report problem; it is a moral-hazard problem that requires monitoring and enforcement.
A minimal extension is ex post metering with penalties: the platform measures realized energy ẽi and charges a penalty if ẽi exceeds a declared or contracted level. For example, worker i posts a refundable deposit di and pays a penalty ϕ(ẽi − ē) if over budget. To preserve the incentive logic on bids bi, we want the penalty rule to be independent of the bid bi; then the worker’s strategic choice over bi remains one-dimensional (win/lose) and threshold-style DSIC can survive. However, ex post IR becomes contingent: if penalties can exceed payments, a truthful low-cost worker could obtain negative utility after an adverse realization. This is not a bug but a modeling choice: either (a) we accept risk-bearing by workers, or (b) we design limited liability into the contract (e.g., cap penalties and correspondingly tighten carbon budgets conservatively). From a policy perspective, this highlights a real tradeoff: strict carbon compliance with noisy measurements typically requires either slack (fewer hires) or explicit risk transfers.
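The following sketch illustrates one possible bid-independent settlement rule with a deposit cap (limited liability); the tolerance band, penalty rate, and function name are hypothetical, not a prescribed contract.

```python
def settle(payment, deposit, e_realized, e_contract, phi=2.0, tol=0.05):
    """Ex post settlement that depends only on verifiable execution metrics,
    never on the bid b_i: a linear penalty of phi per energy unit of overage
    beyond a small tolerance band, capped at the refundable deposit."""
    overage = max(0.0, e_realized - e_contract * (1.0 + tol))
    penalty = min(phi * overage, deposit)        # limited liability via the cap
    return payment - penalty, deposit - penalty  # net payment, returned deposit

# Example: threshold payment 5.0, deposit 1.0, contracted energy 1.0 unit,
# realized energy 1.3 units (0.25 beyond the 5% tolerance band).
print(settle(5.0, 1.0, 1.3, 1.0))  # (4.5, 0.5)
```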
A different enforcement route is to impose device-level technical controls (e.g., OS-level energy quotas) so that deviations cannot violate carbon constraints. This restores hard feasibility but may reduce model accuracy if such controls affect learning quality; economically, this is an additional non-monetary constraint that can be folded into compatibility/eligibility (workers whose devices cannot enforce caps are not in S(r)).
The partition structure {Gℓ}ℓ = 1L implies a laminar, matroid-like feasibility: each worker belongs to exactly one group, and we impose per-(group, requester) caps τℓj. Many wireless and systems interference models are not partitions. A more realistic abstraction is a graph H = (S, ℰ) where an edge (i, i′) indicates that workers i and i′ cannot both be assigned to the same requester (or more generally, that assigning both induces failure probability above δ). Then requester j must choose an independent set among its assigned workers, subject to money/carbon constraints. Even at a fixed price r, maximizing hires with an interference graph is a maximum independent set problem and is NP-hard in general.
There are three ways we can proceed, each with distinct guarantee profiles.
If H is bipartite with bounded degree, or if interference is captured by a collection of cliques with bounded overlap (a ``conflict hypergraph’’ with small rank), then we can replace the graph by a manageable family of packing constraints (clique inequalities) and recover an approximate flow/LP formulation. This is the closest analogue to our original group caps: we are effectively learning a small set of bottleneck constraints and treating them as τ-caps.
Without special structure, we can use greedy or local-ratio algorithms to obtain constant-factor approximations for independent-set-like constraints under additional assumptions (e.g., bounded inductive independence). Embedding such approximations inside a truthful mechanism is nontrivial: many natural greedy rules are not monotone in bids. A common resolution is to use randomized rounding or maximal-in-distributional-range schemes that yield truthfulness in expectation, at the cost of weaker welfare and feasibility guarantees (typically in expectation, and with approximation constants depending on graph parameters).
Practically, one may estimate congestion probabilities q(⋅) on neighborhoods and derive caps τℓj by clustering the interference graph into groups (graph partitioning) or by identifying high-conflict communities. This sacrifices modeling fidelity but restores the polynomial-time and DSIC properties of DB-PEA. The loss is then interpretability: α and our approximation bound are with respect to the constraint system, not the original graph, so the guarantee is only as meaningful as the calibration quality.
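As a rough illustration of this calibration route, the sketch below clusters an estimated interference graph into groups (here simply connected components, a deliberately coarse choice) and applies the chance-constraint cap; it assumes networkx, and the congestion curves are placeholders.

```python
import networkx as nx

def groups_and_caps(conflict_edges, workers, q_by_group, delta):
    """Cluster an estimated interference graph into groups (here: connected
    components, a deliberately coarse choice) and derive each group's cap
    tau_l = max{k : q_l(k) <= delta} from its estimated congestion curve."""
    H = nx.Graph()
    H.add_nodes_from(workers)
    H.add_edges_from(conflict_edges)
    groups = [sorted(c) for c in nx.connected_components(H)]
    caps = []
    for idx, g in enumerate(groups):
        q = q_by_group(idx)  # callable: concurrency level k -> failure prob
        k = 0
        while k + 1 <= len(g) and q(k + 1) <= delta:
            k += 1
        caps.append(k)
    return groups, caps

# Example: 6 workers, two conflict clusters, quadratic congestion curve.
groups, caps = groups_and_caps(
    [(0, 1), (1, 2), (3, 4)], range(6),
    q_by_group=lambda idx: (lambda k: 0.01 * k * k), delta=0.05)
print(groups, caps)  # e.g. [[0, 1, 2], [3, 4], [5]] with caps [2, 2, 1]
```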
Across all extensions, the common pattern is that once carbon (or interference) stops being representable as simple capacities, the mechanism must trade among three desiderata: (1) exact feasibility, (2) polynomial-time computation, and (3) strong (deterministic) DSIC with clean constant-factor welfare. Under heterogeneous/stochastic energy or general interference graphs, we can typically keep two of these at once, but not all three without additional structure. Our baseline should therefore be read as a transparent, policy-relevant benchmark: it isolates the consequences of adding a second, non-monetary hard budget (carbon) without conflating them with the computational and incentive complications that arise when measurement, heterogeneity, and interference are fully general.
Our theoretical results isolate a clean benchmark: when per-contract energy is constant and verifiable, a hard carbon budget is operationally a cardinality cap, and the platform can incorporate it without entangling feasibility with incentives. The goal of our experiments is to (i) stress-test this benchmark under realistic heterogeneity in data quality and interference, (ii) quantify the welfare–learning tradeoffs induced by the carbon cap Kj = ⌊Ej/e⌋ and compatibility caps τℓj, and (iii) compare DB-PEA to natural baselines used in FL procurement (single-budget mechanisms and carbon-agnostic heuristics). We emphasize outcomes that are legible to practitioners and policy audiences: accuracy per dollar and accuracy per unit of carbon, alongside feasibility and runtime.
We organize the evaluation around four questions. (Q1) Does carbon-aware selection outperform carbon-agnostic selection followed by truncation? Our hypothesis is that carbon-agnostic rules may over-hire (or select a low-price set) that is feasible monetarily but infeasible in carbon, requiring ad hoc truncation that harms accuracy. (Q2) How costly are tight compatibility caps? Here we expect the approximation loss to track the tightness proxy α, but we want to see the empirical magnitude of this loss under realistic group imbalance. (Q3) How do outcomes respond to budget relaxations? We expect monotone comparative statics: larger Bj or Ej weakly increase hires and learning performance until compatibility binds. (Q4) Is the mechanism operationally practical? We measure max-flow runtime, payment levels, and the extent to which computed allocations saturate budgets versus leaving slack.
For each run, we record: (i) learning outcomes: final test accuracy,
area-under-curve (AUC) for accuracy vs. communication rounds, and
stability (variance over random seeds); (ii) economic outcomes: total
payments ∑i, jpij,
per-requester spend ∑ipij,
winner utilities ui = pi − ci
under truthful bids, and the distribution of payments across groups;
(iii) resource outcomes: total carbon usage per requester e|Sj|
and utilization ratio e|Sj|/Ej;
(iv) efficiency ratios,
$$
\text{Acc/\$} := \frac{\text{final accuracy}}{\sum_{i,j} p_{ij}},\qquad
\text{Acc/CO}_2 := \frac{\text{final accuracy}}{\sum_j e|S_j|},
$$
and analogues based on AUC; and (v) feasibility/robustness: frequency of
violations (should be zero for DB-PEA by construction), slack in
(BM)/(BC)/(COMP), and runtime per instance as a function of n, m, L.
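For concreteness, here is a minimal sketch of how the efficiency ratios are computed from per-run records under the constant-e contract; the field names are illustrative.

```python
def efficiency_ratios(final_acc, payments, hires_per_requester, e):
    """Accuracy per dollar and per unit of carbon for one run, under the
    constant-e contract (carbon usage = e * |S_j| summed over requesters)."""
    total_spend = sum(payments)             # sum of all p_ij
    total_energy = e * sum(hires_per_requester)
    return {
        "acc_per_dollar": final_acc / total_spend if total_spend else float("nan"),
        "acc_per_co2": final_acc / total_energy if total_energy else float("nan"),
    }

# Example: accuracy 0.84, total payments 42.0, hires (10, 5), e = 1 energy unit.
print(efficiency_ratios(0.84, [42.0], [10, 5], 1.0))
# -> roughly 0.02 accuracy per dollar and 0.056 per energy unit
```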
We generate synthetic procurement instances to cover a wide range of regimes while keeping the mechanism inputs transparent. Workers i ∈ S have costs ci drawn from distributions that can be tuned to represent competitive or scarce markets (e.g., lognormal or mixture models). Bids are truthful in the baseline (bi = ci) to evaluate allocative performance; we additionally include a stress test where a random fraction of workers shade bids (e.g., bi = γci with γ > 1) to empirically verify that outcomes track the DSIC prediction (allocation monotonicity and nonnegative utility for truthful participants). When we optimize weighted value, we set observable weights vi using a proxy for expected contribution to learning (described below); we also report the cardinality objective case vi ≡ 1.
Requesters j ∈ A draw monetary budgets Bj and energy budgets Ej from controlled ranges. We vary the ratio Ej/Bj to span three regimes: money-binding, carbon-binding, and jointly binding. Per-contract energy is held constant at e in the main experiments (consistent with the baseline model), so the requester’s carbon-limited hiring capacity is Kj = ⌊Ej/e⌋. To connect with practice, we calibrate e to represent a fixed FL training contract (e.g., a fixed number of local epochs and batch size), and we report results both in normalized ``energy units’’ and in a concrete mapping (e.g., Wh) in the appendix.
We generate incompatibility groups {Gℓ}ℓ = 1L as a partition of workers. Group structure can represent shared wireless channels, colocation constraints, or correlated failure modes; empirically, we consider both balanced groups and highly imbalanced groups (heavy-tailed |Gℓ|) because imbalance is where caps are most distortionary. We derive τℓj from a chance constraint mapping τℓj = max {k : qℓ(k) ≤ δ} using parametric families qℓ(k) that capture a range of congestion steepness (e.g., logistic or polynomial growth in k). We vary δ to move from permissive compatibility to tight interference constraints, and we report how outcomes covary with the implied α = min {m, maxℓ, j⌈|Gℓ|/τℓj⌉}.
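A minimal sketch of this instance-generation step follows, using a logistic congestion curve to derive caps and then computing the implied tightness proxy α; the parametric form and parameter values are illustrative choices.

```python
import math

def tau_from_logistic(group_size, delta, k0=8.0, s=1.0):
    """tau = max{k : q(k) <= delta} for a logistic congestion curve
    q(k) = 1 / (1 + exp(-(k - k0) / s))."""
    tau = 0
    for k in range(1, group_size + 1):
        if 1.0 / (1.0 + math.exp(-(k - k0) / s)) <= delta:
            tau = k
        else:
            break
    return tau

def tightness_alpha(group_sizes, taus, m):
    """alpha = min{m, max_l ceil(|G_l| / tau_l)}; caps are shared across
    requesters here for simplicity."""
    ratios = [math.ceil(g / t) for g, t in zip(group_sizes, taus) if t > 0]
    return min(m, max(ratios)) if ratios else m

sizes = [40, 12, 7]                                  # imbalanced groups
taus = [tau_from_logistic(g, delta=0.05) for g in sizes]
print(taus, tightness_alpha(sizes, taus, m=3))       # e.g. [5, 5, 5] and alpha = 3
```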
We implement DB-PEA exactly as specified: compute the virtual price set Rb = {Bj/t : t ∈ {1, …, Kj}}, find r* such that E(r*) = Mf(r*), solve the max-flow instance for OSP(r*), apply a deterministic tie-break (min-lex) to select winners, and pay threshold bids capped by r* (a code sketch of the fixed-price allocation step appears after the comparator descriptions below). For comparators, we include:
A direct analogue that ignores carbon by replacing caps with ⌊Bj/r⌋ only (effectively Kj = ∞) and then post-hoc truncates to satisfy carbon, either by (a) dropping the highest-payment winners until |Sj| ≤ Kj or (b) random truncation. This isolates whether incorporating carbon into the critical-price step matters.
Greedy selection by lowest bid (or best vi/bi), subject to monetary budgets and compatibility, followed by truncation to meet carbon. We also test a ``carbon-first’’ heuristic that enforces |Sj| ≤ Kj first and then spends remaining money, to show that naive carbon handling can leave monetary slack and reduce learning quality.
Fixed per-worker posted price π (common or requester-specific), hiring all workers with bi ≤ π up to feasibility. This is a simple, implementable benchmark and a useful foil for the role of the endogenous critical price r*.
All methods are evaluated under identical compatibility and budget primitives; when a baseline produces an infeasible set, we apply the same deterministic truncation policy to restore feasibility so comparisons are meaningful.
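To make the allocation step of DB-PEA concrete, the sketch below solves the fixed-price subproblem OSP(r) as a max-flow instance, assuming networkx; the node naming, the single-τ-per-(group, requester) input format, and the omission of the search over Rb and of threshold payments are simplifications for illustration.

```python
import math
import networkx as nx

def max_hires_at_price(r, bids, group_of, B, E, e, tau):
    """Fixed-price allocation subproblem as max-flow:
    src -> worker i (cap 1, only if b_i <= r) -> (group, requester) node
    (cap 1) -> requester j (cap tau[l][j]) -> sink (cap min(floor(B_j/r), K_j)).
    Returns the maximum number of feasible hires; extracting winners and
    searching for r* over R_b = {B_j / t} are omitted for brevity."""
    G = nx.DiGraph()
    requesters = range(len(B))
    for j in requesters:
        K_j = math.floor(E[j] / e)                        # carbon cap
        G.add_edge(("req", j), "sink",
                   capacity=min(math.floor(B[j] / r), K_j))
    for i, b_i in enumerate(bids):
        if b_i > r:
            continue                     # only workers willing to work at price r
        G.add_edge("src", ("w", i), capacity=1)
        l = group_of[i]
        for j in requesters:
            G.add_edge(("w", i), ("grp", l, j), capacity=1)
            G.add_edge(("grp", l, j), ("req", j), capacity=tau[l][j])
    value, _ = nx.maximum_flow(G, "src", "sink")
    return value

# Example: 4 workers in 2 groups, one requester with B=10, E=3, e=1, tau=2.
print(max_hires_at_price(r=3.0, bids=[1.0, 2.0, 2.5, 4.0],
                         group_of=[0, 0, 1, 1],
                         B=[10.0], E=[3.0], e=1.0,
                         tau=[[2], [2]]))  # -> 3
```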
To connect procurement decisions to learning outcomes, we run federated learning on standard benchmarks (FMNIST and CIFAR-10), using FedAvg with a fixed number of communication rounds and a fixed local training contract per selected worker (e.g., E local epochs, fixed batch size, fixed optimizer). Each selected worker corresponds to a client with local data. To reflect non-IID conditions, we assign data via a Dirichlet split with concentration parameter β (small β yields highly heterogeneous clients). We treat the requester’s objective as either (i) maximizing the number of hired clients (cardinality, vi ≡ 1), or (ii) maximizing an additive proxy vi that predicts the client’s marginal contribution (e.g., validation accuracy gain from a small pilot, or a historical reputation score). We then report FL accuracy as the downstream outcome and analyze how well vi aligns with realized learning contribution.
We consider both single-requester and multi-requester FL. In the multi-requester setting, each requester trains an independent model (separate FL tasks) on its own selected clients, which cleanly matches the mechanism’s per-requester budgets and caps. This avoids confounding from cross-requester model coupling and allows direct interpretation of per-requester accuracy-per-resource.
In the baseline experiments, carbon consumption per selected worker is exactly e by design, so realized usage equals planned usage and (BC) holds deterministically. We nonetheless log device-side proxies (e.g., runtime, number of batches) to show that a constant-e abstraction is not vacuous: it corresponds to a contract that fixes training work and thus energy up to small noise. In a secondary robustness check (explicitly labeled as outside the baseline theorem), we introduce mild random perturbations ẽi = e(1 + ξi) to illustrate how a deterministic cap can be violated under measurement noise, and we compare conservative choices of Kj (e.g., using e(1 + ξ̄)) to ex post truncation. This robustness check is not used to claim feasibility guarantees; it is meant to quantify how much slack would be needed in practice.
We run systematic sweeps over: (i) carbon tightness via Ej (hence Kj), (ii) monetary budgets Bj, (iii) congestion tolerance δ (hence τℓj), (iv) group imbalance (variance of |Gℓ|), and (v) bid/cost dispersion. For each sweep, we plot allocation size, r*, budget utilization, and learning metrics, with special attention to regimes where the binding constraint switches (money ↔︎ carbon ↔︎ compatibility). We also report how often baselines require truncation and how truncation degrades learning relative to carbon-aware selection.
Beyond ``wins/losses’’ on accuracy, the experiments are designed to make the mechanism’s economic logic visible. When carbon is tight, we expect DB-PEA to internalize the reduced employability E(r) = ∑jmin {⌊Bj/r⌋, Kj} and select a set that is coherent with both budgets, leading to higher accuracy-per-carbon than carbon-agnostic methods that hire too many cheap workers and then discard them. When congestion caps tighten, we expect all methods to lose hires, but DB-PEA should degrade smoothly and avoid pathological concentration of winners in a few unconstrained groups (a failure mode of naive greedy rules). Finally, by reporting the full resource frontier (accuracy vs. dollars vs. carbon), we aim to provide an empirical bridge to the next section’s implementation discussion: how one might set Ej in practice, and what telemetry and auditing would be required for the constant-e abstraction to be a reliable contract.
The theoretical appeal of a hard carbon budget in our setting is that, when per-contract energy is constant and verifiable, carbon feasibility reduces to a simple per-requester hiring cap Kj = ⌊Ej/e⌋. The implementation question is therefore not whether we can procure under carbon constraints, but whether we can make the contractual abstraction—``each selected worker consumes e energy units’’—credible enough that (BC) is meaningful to both requesters and auditors. In practice, we view the mechanism as one component of a broader governance stack: (i) ex ante specification of a training contract, (ii) measurement and verification (M&V) that maps contract execution into energy units, and (iii) a compliance layer that translates energy into carbon reporting and internal controls.
A requester’s input Ej is best interpreted as an energy budget for a specific procurement episode (e.g., one FL training run or one model update cycle). To map organizational carbon commitments into Ej, a natural pipeline is: start from a carbon allotment in tCO2e, convert to energy via a grid emissions factor, and then allocate across ML activities.
Concretely, if ηt, loc denotes
an emissions factor (e.g., kgCO2e/kWh) indexed by time and
location, then a carbon budget C̄j corresponds
to an energy budget
$$
E_j \;=\; \frac{\bar
C_j}{\eta_{t,\mathrm{loc}}}\quad\text{(location-based accounting),}
$$
or to a more conservative Ej using the
maximum feasible η over the
execution window (market-based accounting can be incorporated
analogously if renewable certificates are used). This interpretation is
operationally useful because the platform’s enforcement target is energy
usage, while the requester’s reporting target is carbon. In settings
where clients are geographically dispersed and η varies by location, we recommend
either (i) normalizing to energy (our baseline) and performing the
CO2 conversion at reporting
time, or (ii) defining multiple energy budgets by region and running
separate procurements; the latter improves accounting fidelity but
complicates the market design.
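A minimal sketch of the conversion step, assuming emissions factors are available per accounting window; the conservative option divides by the maximum factor over the window, as described above.

```python
def energy_budget_from_carbon(carbon_kgco2e, eta_by_window, conservative=True):
    """Convert a carbon allotment (kgCO2e) into an energy budget (kWh) via
    grid emissions factors eta (kgCO2e/kWh) over the execution window; the
    conservative option uses the maximum factor so the carbon commitment
    holds no matter when training actually runs."""
    eta = max(eta_by_window) if conservative else sum(eta_by_window) / len(eta_by_window)
    return carbon_kgco2e / eta

# Example: 50 kgCO2e allotment, hourly factors between 0.25 and 0.40 kgCO2e/kWh.
print(energy_budget_from_carbon(50.0, [0.25, 0.32, 0.40]))  # 125.0 kWh
```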
The constant-e assumption should be understood as a standardized training contract: fixed model architecture, fixed number of local steps/epochs, fixed batch size, fixed precision, and a specified device class (or a bounded set of device classes). Under such a contract, energy variation is not eliminated, but it becomes predictable enough that we can set e as a conservative upper envelope rather than an average. In implementation, we can define e using a conservative quantile,
$$
e := Q_{0.95}(\text{energy consumption per contract}) \quad\Rightarrow\quad \Pr(\tilde e_i \le e) \approx 0.95,
$$
and then choose Kj = ⌊Ej/e⌋ to reduce the risk of ex post carbon overrun. This ``engineering margin’’ is the simplest way to preserve the hard-feasibility interpretation of (BC) when metering noise or device heterogeneity cannot be eliminated.
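A minimal sketch of setting e from telemetry via a conservative quantile and deriving Kj, assuming numpy; the telemetry distribution in the example is synthetic.

```python
import numpy as np

def contract_energy_and_cap(telemetry_kwh, E_j, quantile=0.95):
    """Set per-contract energy e as a conservative quantile of measured
    consumption, then K_j = floor(E_j / e).  Raising the quantile trades
    hires for a lower risk of ex post carbon overrun."""
    e = float(np.quantile(telemetry_kwh, quantile))
    return e, int(E_j // e)

# Example: synthetic telemetry around 0.02 kWh per contract, E_j = 1 kWh.
rng = np.random.default_rng(0)
samples = rng.normal(0.02, 0.002, size=1000).clip(min=0.001)
e, K_j = contract_energy_and_cap(samples, E_j=1.0)
print(round(e, 4), K_j)  # roughly e ≈ 0.023 kWh and K_j around 42-43
```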
For measurement, the platform can rely on a hierarchy of M&V options, each with different assurance and cost. At one extreme, when workers are data centers or managed devices, direct power metering (PDUs, on-board energy counters) provides high assurance. For edge devices, M&V often relies on OS-level battery discharge models, CPU/GPU utilization counters, and runtime telemetry. To prevent strategic manipulation of reported telemetry, one can use signed measurement logs, remote attestation (e.g., TPM/TEE-based), and spot audits. Importantly, our incentive results do not require workers to report energy; they only require that the platform and requesters agree on a verifiable contract so that e is a policy parameter rather than a private type.
Even with conservative e, deviations can occur (background load, thermal throttling, network retries). A practical enforcement design is to separate selection from settlement. Selection is done using Kj = ⌊Ej/e⌋ to ensure planned feasibility. Settlement can incorporate (i) a small tolerance band (no adjustment within ±ϵ), (ii) penalties for repeated overages (temporary exclusion or lower priority), or (iii) make-good requirements where the requester reduces future Ej by the measured overage. We emphasize that if settlement rules are allowed to depend on worker bids, one must re-audit DSIC; the cleanest approach is to keep settlement dependent only on verifiable execution metrics, not on bi.
Although our main mechanism enforces carbon through a hard cap Kj, many organizations prefer an internal ``token’’ model for governance: each requester receives a token endowment 𝒯j denominated in energy units; each selected worker consumes e tokens; and tokens are debited automatically at execution time. Conceptually, this is equivalent to choosing Ej = 𝒯j in our model, but the token framing adds two practical advantages. First, it supports centralized reallocation: a central sustainability office can allocate and reallocate 𝒯j over time based on priorities and observed learning returns. Second, it creates a natural audit trail: the platform can publish per-requester token debits alongside model training logs, simplifying internal controls and external reporting.
If tokens are made tradable across requesters (an internal cap-and-trade), then Ej becomes endogenous to organizational decisions. Our mechanism can accommodate this as an outer loop: requesters trade tokens, then run procurement with the resulting (Bj, Ej). We do not claim incentive properties for the token market itself; rather, we view it as a policy layer that determines budgets, while DB-PEA implements procurement conditional on those budgets.
We recommend treating Ej as an explicit knob that encodes an organization’s marginal rate of substitution between learning performance and carbon. Three implementation heuristics are common. (i) Proportional allocation: set Ej proportional to historical spend or expected business value, updated quarterly. (ii) Utilization targeting: choose Ej so that the implied hiring cap Kj hits a desired utilization rate (e.g., 80–95%) under typical markets; this reduces frequent binding and instability. (iii) Empirical frontier sweeps: in early deployments, deliberately sweep Ej over a range and record accuracy-per-carbon; then select an operating point where marginal accuracy gains per additional energy flatten. The third approach aligns well with procurement governance because it produces a transparent empirical rationale for carbon budgets, rather than treating them as arbitrary constraints.
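As an illustration of the third heuristic, here is a small sketch that selects the operating point where marginal accuracy gains per additional energy unit flatten; the threshold and the sweep data are placeholders.

```python
def pick_energy_budget(sweep, min_marginal_gain=0.002):
    """Given sweep results [(E_j, accuracy)] sorted by E_j, return the
    smallest budget at which the marginal accuracy gain per additional
    energy unit falls below a threshold (the flattening point)."""
    sweep = sorted(sweep)
    for (e0, a0), (e1, a1) in zip(sweep, sweep[1:]):
        if (a1 - a0) / (e1 - e0) < min_marginal_gain:
            return e0
    return sweep[-1][0]

# Example: accuracy gains flatten after 40 energy units.
print(pick_energy_budget([(10, 0.70), (20, 0.78), (40, 0.83), (80, 0.84)]))  # 40
```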
Compatibility caps are where engineering reliability constraints
enter the allocation problem. Our modeling choice,
τℓj := max {k ∈ ℤ+ : qℓ(k) ≤ δ},
is deliberately modular: it says that if group ℓ experiences failure (or
unacceptable degradation) with probability qℓ(k)
when k workers share a
constrained resource, then we hire at most τℓj
from that group for requester j to satisfy tolerance δ. This is implementable provided we
can (i) define what constitutes ``failure’’ (dropout rate, packet loss,
straggler probability, interference), and (ii) estimate qℓ(⋅) from logs
or experiments.
In deployments, qℓ(k)
is rarely known exactly; it is estimated with uncertainty and may drift
over time. We therefore recommend computing caps using an upper
confidence bound:
τℓj := max {k: q̂ℓ(k) + UCBℓ(k) ≤ δ},
where UCBℓ(k) is
chosen to reflect sampling error and nonstationarity (e.g., using
Hoeffding-style bounds or Bayesian credible intervals). This
conservative capping is the congestion analogue of choosing a
conservative e for carbon: it
preserves feasibility guarantees (low failure probability) at the cost
of potentially leaving performance on the table when the environment is
better than expected.
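A minimal sketch of the UCB-based cap, using a Hoeffding-style bound on the empirical failure rate at each concurrency level; the confidence level and telemetry counts are illustrative.

```python
import math

def ucb_cap(failures_by_k, trials_by_k, delta, conf=0.95):
    """tau = max{k : q_hat(k) + UCB(k) <= delta}, with a Hoeffding-style
    bound sqrt(log(1/(1-conf)) / (2 n_k)) on the empirical failure rate
    observed at each concurrency level k."""
    tau = 0
    for k in sorted(trials_by_k):
        n_k = trials_by_k[k]
        q_hat = failures_by_k.get(k, 0) / n_k
        bound = math.sqrt(math.log(1.0 / (1.0 - conf)) / (2.0 * n_k))
        if q_hat + bound <= delta:
            tau = k
        else:
            break
    return tau

# Example: 500 observations at each concurrency level 1..4, delta = 0.10.
print(ucb_cap({1: 0, 2: 2, 3: 9, 4: 30},
              {1: 500, 2: 500, 3: 500, 4: 500}, delta=0.10))  # -> 3
```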
The tolerance parameter δ is, in effect, a reliability policy. Smaller δ yields smaller τℓj, tighter compatibility, and lower allocation sizes; larger δ permits more aggressive parallelism but increases the risk of FL underperformance due to dropouts or interference. Because the welfare loss from tightening caps is mediated by the tightness proxy α = min {m, maxℓ, j⌈|Gℓ|/τℓj⌉}, we recommend setting δ at the level of the training run’s criticality: safety- or compliance-critical training runs should use stricter tolerances, while exploratory runs can tolerate higher failure probabilities. For transparency, the platform should publish (internally) the implied τℓj and the empirical basis for qℓ(⋅); otherwise, stakeholders cannot distinguish an intentional reliability policy from ad hoc throttling.
A recurring governance failure mode in resource-constrained ML systems is the presence of ``hidden constraints’’ that are applied after the fact (e.g., manual truncation to meet energy limits or to avoid interference). Such truncation is precisely what our design aims to avoid, because it breaks the link between declared constraints and realized allocations. We therefore advocate an audit log that records, per requester and run: the declared (Bj, Ej), the derived caps Kj and τℓj, the critical price r*, the selected set Sj, and the realized energy and reliability metrics. This log is also where distributional concerns can be monitored: compatibility groups may correlate with geography or device class, and repeated binding of τℓj could systematically exclude certain populations. While our mechanism is not a fairness mechanism per se, the explicit representation of group caps makes it easier to (i) detect disparities and (ii) adjust policy parameters (δ, group definitions, or infrastructure investments) in a principled way.
Two limitations are worth stating plainly. First, if per-worker energy footprints are heterogeneous and cannot be bounded tightly by a common e, carbon feasibility becomes a knapsack-style constraint (as flagged in our extension), and exact polynomial-time allocation is no longer available. In that case, one should either discretize into energy classes with posted prices or accept approximation/randomization with correspondingly weaker operational guarantees. Second, our framework treats Bj and Ej as hard budgets that are not strategically chosen within the mechanism. In real organizations, budgets are outcomes of internal politics and planning; the mechanism can improve procurement conditional on budgets, but it cannot by itself resolve the upstream question of how budgets should be allocated across projects.
Taken together, these implementation considerations suggest a pragmatic reading of our results: the mechanism is most valuable when it is embedded in a disciplined contracting and M&V process. When that process exists, carbon and congestion constraints can be enforced through transparent caps (Kj and τℓj) rather than through opaque truncation, aligning economic incentives with operational feasibility and audit-ready carbon accounting.
We began from a practical tension that is becoming central to federated and distributed learning deployments: organizations want to procure many small training contributions, but they must do so under hard constraints that are not well captured by a single budget. The first is monetary—the familiar limitation that total payments cannot exceed a project’s spend authority. The second is physical and policy-driven—an energy (or carbon) budget that constrains the number of contracts that can be executed within an episode. In parallel, operational reliability imposes compatibility constraints: allocating too many workers that share a bottleneck (network segment, device class, region, or other interference domain) can degrade training or increase failure risk. Our model isolates these frictions in a way that is deliberately modular: monetary budgets enter through standard payment feasibility, energy budgets enter as a hard cap on hires when per-contract energy is standardized and verifiable, and reliability enters through per-requester group caps derived from chance constraints.
The main conceptual contribution is to show that, in this structured environment, we can keep the appealing features of classical procurement design—dominant-strategy truthfulness, ex post individual rationality, and computational tractability—without treating carbon as a soft objective or an after-the-fact adjustment. When per-worker energy is constant at a verifiable level e, each requester’s carbon budget Ej induces a simple hiring cap Kj = ⌊Ej/e⌋, and the two-budget constraint collapses to a single per-requester cap at the level of feasibility: for any price r, requester j can afford at most ⌊Bj/r⌋ workers financially and at most Kj workers physically, so the operative demand at price r is min {⌊Bj/r⌋, Kj}. This observation is economically important because it restores a one-dimensional “price-per-worker” logic even in the presence of an additional non-monetary budget: carbon does not need to be monetized to be enforced.
On the mechanism-design side, we extend the price-based, single-parameter procurement template of PEA/CARE to a setting with compatibility constraints. The mechanism identifies a critical price r* from a finite candidate set (derived from the requesters’ budgets) such that aggregate employability at that price matches the maximum feasible allocation subject to compatibility caps and per-requester caps. It then selects a feasible set of winners among those whose bids are at most r* and pays threshold payments. Two aspects are worth emphasizing. First, the existence of a critical price is not merely a technical device; it is the point at which the market “clears” in the limited sense that raising the price further would not increase the number of workers we can allocate. Second, computational feasibility is not an afterthought: for any fixed r, the allocation problem reduces to a max-flow (or b-matching) instance once we express group caps and requester caps as capacities. In this sense, the mechanism is not only incentive compatible but also implementable at the scale required by modern FL markets.
Our approximation guarantee formalizes the efficiency loss that is
unavoidable when we insist on DSIC, hard feasibility for two budgets,
and compatibility constraints that can be tight. The key driver of the
bound is the tightness parameter
α = min {m, maxℓ, j⌈|Gℓ|/τℓj⌉},
which measures how constraining the interference structure can be
relative to available parallelism. Intuitively, when group caps are
loose (large τℓj),
the market behaves closer to a standard multi-buyer procurement problem
and the mechanism’s allocation approaches the full-information optimum.
When group caps are tight, feasibility becomes the limiting factor and
no mechanism can “buy” its way out with higher prices alone; the
approximation factor degrades in a controlled way that is transparent in
α. A practical reading is that
investments which relax bottlenecks (increasing τℓj)
generate returns: they improve not only realized training reliability
but also allocative efficiency under budgets.
We also view the policy and implementation layer as part of the contribution, because it clarifies which assumptions are essential for the clean mechanism guarantees. The constant-e abstraction is not a claim that devices are identical; it is a claim that the procurement contract can be standardized and verified well enough that energy becomes a policy parameter rather than a private type. Similarly, the cap mapping τℓj from a congestion chance constraint is meant to separate engineering estimation from economic allocation: once the platform commits to caps derived from reliability tolerances, allocation and payments can be performed transparently without discretionary truncation. This separation matters for governance. It makes budgets and reliability policies auditable, and it reduces the temptation (or necessity) to enforce carbon constraints ex post in ways that are opaque to participants and difficult to reconcile with incentive guarantees.
Several limitations delineate a natural agenda for future work. The most immediate is heterogeneous energy usage. If each worker has a known but non-uniform footprint ei, carbon feasibility becomes a knapsack constraint and exact allocation becomes NP-hard even absent strategic behavior. Our extension discussion outlines pragmatic paths—energy class discretization, FPTAS-style subroutines, or posted-price variants—but there is a genuine frontier here: designing mechanisms that remain strongly budget-feasible, computationally practical, and robust to strategic bidding when carbon is a multidimensional resource. A second limitation is dynamics. Many FL deployments are repeated, with budgets replenished, reputations updated, and congestion conditions changing over time. Extending the critical-price logic to a multi-period setting raises questions about intertemporal incentives (e.g., bid shading today to improve selection odds tomorrow), as well as about how to amortize carbon budgets across episodes in a way that is both efficient and accountable.
A third set of open questions concerns heterogeneity in value. We focused on additive objectives (cardinality or reputation-weighted value) because they align with the single-parameter cost model and admit tractable allocation. Yet FL performance gains can be submodular, state-dependent, or require diversity constraints that are not captured by static group caps. Incorporating richer value models will likely require maximal-in-range designs, approximate VCG variants, or carefully structured posted prices; each direction interacts nontrivially with budget feasibility. Closely related is the role of fairness and representation. Group caps make interference constraints explicit, but groups may also correlate with protected characteristics or with regions that an organization wants to include for equity reasons. A principled integration of fairness constraints—beyond the reliability-driven τℓj—would require additional structure (e.g., lower bounds, parity constraints, or outcome-based audits) and may change both approximation behavior and incentive properties.
Finally, we emphasize a broader takeaway: treating carbon and reliability constraints as hard feasibility primitives leads to cleaner economics and better operational outcomes than treating them as soft penalties. In procurement markets, it is tempting to “optimize a weighted sum” of spend and emissions, or to allocate first and then trim to meet carbon limits. Our analysis suggests that, when budgets are genuinely hard and audit-relevant, such approaches are fragile. By contrast, enforcing energy feasibility through explicit caps and running a DSIC threshold-payment mechanism inside that feasible region yields allocations that are predictable, auditable, and strategically robust. The mechanism does not eliminate the policy choices embedded in Ej and δ; rather, it makes those choices explicit and traceable, allowing organizations to debate and revise them without simultaneously changing the economic rules of the market.
In sum, the model illuminates a concrete tradeoff faced by practitioners: more aggressive parallelism and broader participation are valuable, but they are constrained by money, by carbon, and by interference. Our dual-budget extension of PEA provides a disciplined way to navigate this tradeoff. It offers a tractable mechanism with strong incentive and feasibility guarantees under realistic operational constraints, while making clear where the hard problems begin—heterogeneous energy, dynamics, and richer objectives. We hope this framework helps move carbon-aware learning procurement from ad hoc throttling toward transparent, policy-compliant market design.