Deploying learning-augmented procurement mechanisms is often less constrained by the design of the underlying allocation rule than by a more prosaic question: how much should we trust the prediction, and when? In our setting, each round brings a fresh population of strategic suppliers, a hard per-round budget B, and a buyer valuation vt(⋅) (e.g., submodular coverage or diversity). A learning-augmented mechanism Mτ can exploit a prediction signal ωt (such as a forecast of OPTt or an appropriate scale parameter) to set posted prices, thresholds, or acceptance rules that perform well when the prediction is accurate. Yet these mechanisms typically expose a trust parameter τ: at τ ≈ 1 the mechanism leans aggressively on ωt and is highly “consistent” with accurate predictions; at τ ≈ 0 it becomes conservative and “robust” to errors, reverting toward worst-case guarantees. In practice, choosing τ is the main deployment bottleneck. A single fixed choice is rarely appropriate across changing market conditions, shifting supplier pools, and nonstationary model quality. At the same time, allowing τ to adapt freely to observed bids risks breaking dominant-strategy incentives and undermining the very procurement guarantees (budget feasibility, individual rationality) that motivated the mechanism in the first place.
This paper studies the governance problem of selecting τ over time, under constraints that are natural in regulated or audited procurement environments. We imagine a platform or “monitor” that is organizationally separated from the bidding process: before suppliers report costs in round t, the monitor observes an exogenous diagnostic zt measuring prediction quality (for example, a holdout loss, a drift statistic, a conformal radius, or a model-card style risk score). Crucially, zt is independent of the current bids and is observable prior to the round’s strategic interaction. The monitor must choose τt based only on these diagnostics (and their history), and the mechanism then runs Mτt with the chosen trust level. This separation is not merely a modeling convenience. It mirrors common governance constraints—firewalls between prediction systems and procurement decisions, documentation requirements for why a model is trusted, and restrictions on conditioning policy parameters on the content of submitted bids. Our aim is to show that such constraints are compatible with strong welfare performance, and in fact can be leveraged to obtain clean, auditable guarantees.
The technical starting point is that, for each fixed τ, the learning-augmented mechanism comes with an error-dependent approximation factor α(τ, εt), where εt summarizes the prediction error in round t. This captures a familiar tradeoff: more trust yields better performance when εt is small but can degrade sharply when εt is large. The challenge is that εt is not observed at decision time, and ωt alone does not certify its own accuracy. Our key move is to route adaptivity through calibrated diagnostics. We assume that zt induces a confidence set ℰ(zt) ⊆ [0, 1) such that Pr [εt ∈ ℰ(zt)] ≥ 1 − δ. This transforms the problem from “guess εt” to “optimize against a set of plausible errors,” a form that naturally supports worst-case reasoning and accountability: a policymaker can point to ℰ(zt) as an explicit, statistically justified uncertainty statement.
Conceptually, this lets us define for each τ a calibrated worst-case
factor
$$
\overline{\alpha}(\tau,z_t)\ :=\ \sup_{\varepsilon\in\mathcal{E}(z_t)}
\alpha(\tau,\varepsilon),
$$
which is computable from the diagnostic zt (given the
known performance profile α)
and yields a round-specific notion of how risky it is to trust the
prediction at level τ. The
monitor’s problem becomes an online choice over a finite menu 𝒯: select τt to maximize
welfare lower bounds implied by $\overline{\alpha}(\tau,z_t)$, subject to the
imperative that τt not depend on
contemporaneous bids. This is where the economic logic meets online
learning: we can view each τ ∈ 𝒯 as an “expert,” and each round
supplies (through zt) a prediction
of how well that expert would perform in a worst-case sense. The monitor
then updates its choice rule over time, balancing exploration and
exploitation, but doing so entirely outside the strategic channel.
Our contributions are threefold. First, we formalize a meta-mechanism architecture in which the selection of τt is measurably independent of current reports, while the within-round procurement remains a standard universally truthful, individually rational, ex post budget-feasible mechanism. This yields an end-to-end procedure that preserves dominant-strategy incentives despite adaptivity over time. The key intuition is simple: dominant strategies are robust to randomization and to history dependence, but not to within-round feedback loops that let a bidder manipulate the policy parameter governing her own acceptance or payment. By insisting that τt is chosen before bids arrive and as a function of the diagnostic history (z1, …, zt) only, we eliminate precisely that manipulation channel.
Second, we provide a welfare guarantee that is “best-in-hindsight” over the trust menu. Rather than comparing to a single fixed τ chosen ex ante, we show that the meta-procedure tracks the best static choice in 𝒯 evaluated under the calibrated worst-case factors $\overline{\alpha}(\tau,z_t)$, up to a standard regret term scaling as $O(\sqrt{T\log|\mathcal{T}|})$ times an appropriate welfare scale V. Economically, this means the monitor can be conservative when diagnostics indicate low reliability, and opportunistic when diagnostics certify high reliability, without paying much more than the unavoidable learning cost for not knowing ex ante which trust level will be most appropriate across the realized sequence of environments.
Third, we clarify a negative result that motivates our governance constraint: if τt is allowed to depend on even a single agent’s current report, dominant-strategy truthfulness can fail. This observation is practically important because “data-driven policy knobs” are often tuned using real-time bid information (e.g., adjusting thresholds when bids seem high). Our framework explains why such tuning is not innocuous in strategic environments: it changes the mapping from an agent’s report to the mechanism’s rule, creating profitable deviations even when each Mτ is truthful in isolation.
We view these results as an argument for a particular division of labor in learning-augmented procurement: predictions can and should inform allocation, but the degree of trust should be mediated by exogenous, auditable calibration signals rather than endogenous bid data. The model does not claim that all relevant information about prediction quality is captured by zt, nor that calibration can be achieved without cost; rather, it isolates a tractable and institutionally plausible route to combine learning-augmentation with incentive compatibility. In doing so, it turns the “choose τ” bottleneck from an ad hoc deployment choice into a principled, analyzable component of the mechanism.
A first strand of work that motivates our approach studies learning-augmented algorithms, where an online algorithm is given a prediction ω (or advice) and is evaluated along two dimensions: it should be consistent—performing near-optimally when the prediction is accurate—and robust—not degrading too much when the prediction is wrong. This consistency/robustness paradigm appears across caching, paging, scheduling, and covering problems, often via an explicit interpolation parameter that mixes a prediction-driven policy with a worst-case baseline (e.g., “follow the advice” with probability τ, otherwise revert to a classical competitive algorithm). Conceptually, our trust parameter τ plays the same role: at higher τ, the mechanism is more consistent with ωt, while at lower τ it becomes robust to error. A key difference is that procurement introduces a strategic layer: unlike many learning-augmented algorithmic settings, the input is not passively revealed but is reported by self-interested agents, so the standard ways of tuning τ using observed outcomes can create incentive problems. This is precisely why we route adaptivity through an exogenous diagnostic zt rather than through bids.
A second, closely related literature is learning-augmented mechanism design, which asks how predictions about values, costs, or future arrivals can improve welfare, revenue, or budget utilization while maintaining incentive constraints. Here, the relevant object is not only an allocation rule but a truthful payment scheme (e.g., posted prices, threshold payments, or variants of Myerson-style monotonicity arguments). Several papers show how predictions can be incorporated while preserving dominant strategies, typically by ensuring that the allocation is monotone in each agent’s report and payments are defined by critical thresholds. In our environment, we take as a primitive a finite menu {Mτ}τ ∈ 𝒯 of such truthful, budget-feasible mechanisms, each equipped with an error-dependent guarantee summarized by α(τ, ε). Our contribution is not to re-derive truthfulness for a particular Mτ, but to clarify an additional governance layer: even if each mechanism is DSIC “in isolation,” adapting the mechanism choice to contemporaneous reports can destroy DSIC. This observation connects to a broader theme in learning-augmented mechanism design: preserving incentives often requires strict separations between what is learned, what is predicted, and what is optimized on the basis of strategic reports.
A third body of work concerns online budget-feasible procurement mechanisms, especially for monotone submodular valuations under a hard budget. Classic results develop posted-price and threshold-based online mechanisms that are universally truthful, individually rational, and ex post budget-feasible, often with constant-factor approximations under random arrival or secretary-style assumptions, and with weaker guarantees in fully adversarial orders. More recent work augments these mechanisms with predictions—e.g., forecasts of OPTt, of the distribution of costs, or of an appropriate threshold/price level—to improve welfare when predictions are accurate. These designs naturally introduce a mixing knob (our τ) that interpolates between a “predicted threshold” and a “safe threshold” that guarantees feasibility and a baseline approximation. Our paper is aligned with this line: we explicitly model a prediction signal ωt that can be informative but unreliable, and we summarize the resulting welfare via a function α(τ, εt). The distinctive aspect of our framework is that we separate mechanism design (within a round) from policy selection (across rounds): the within-round mechanism is held fixed once τt is chosen, while τt itself is selected by a monitor that is restricted to use only exogenous diagnostics. This separation matches how prediction-assisted procurement is often deployed: a model-risk or compliance function may certify when a predictive model is “in scope,” while the procurement engine must remain DSIC and budget-feasible regardless of such certification.
A fourth ingredient is the literature on calibration and predictive uncertainty as a tool for decision-making. While calibration is usually studied for probabilistic forecasts, conformal prediction, or uncertainty quantification, the economic role is similar: a diagnostic zt induces a set of plausible errors ℰ(zt) with a coverage guarantee 1 − δ. Our use of calibration is deliberately minimalistic. We do not require that ωt be unbiased or that errors be i.i.d.; we only require an exogenous statement of uncertainty that is measurable before the strategic interaction. This framing lets us convert the unobserved error εt into a worst-case factor $\overline{\alpha}(\tau,z_t)=\sup_{\varepsilon\in\mathcal{E}(z_t)}\alpha(\tau,\varepsilon)$, which is the key quantity fed into the meta-level selection rule. In this sense, we import a familiar idea from robust optimization—optimize against a confidence region—into a repeated mechanism-selection problem where incentives constrain what may be conditioned on.
Finally, our meta-mechanism draws on online learning with expert advice. If each τ ∈ 𝒯 is viewed as an expert, then each round provides a signal-dependent proxy for the welfare one could guarantee by choosing that expert. Standard regret-minimization tools (e.g., multiplicative weights) imply that the cumulative performance of the selector competes with the best fixed expert in hindsight up to $O(\sqrt{T\log|\mathcal{T}|})$ regret, provided the per-round feedback is well-defined and bounded. What is novel in our setting is the interplay with strategic behavior: we must ensure that the “losses” used by the learning rule are computed without peeking at bids in a way that would make agents pivotal in the choice of τt. Put differently, in a non-strategic online learning problem it is innocuous to define feedback using the realized instance; in procurement, using bid-dependent feedback to set τt can endogenize the mechanism itself and create profitable deviations. Our restriction that τt be measurable with respect to σ(z1, …, zt) can be seen as an incentive-compatible analogue of the non-anticipation constraints common in online learning, tailored to preserve dominant strategies.
Taken together, these strands suggest a common lesson: learning-augmentation is most powerful when it provides a “fast lane” under accuracy, but it requires an explicit “guardrail” under error. Our contribution is to formalize a guardrail that is simultaneously statistical (through calibration sets), economic (through DSIC and budget feasibility), and algorithmic (through expert-style selection over τ).
We study a repeated online procurement problem over rounds t = 1, …, T. In each round a single buyer (the principal) wishes to procure a subset of suppliers subject to a hard budget constraint B that applies within the round. The buyer’s benefit from a set of procured suppliers St ⊆ Nt is given by a monotone submodular valuation function vt : 2Nt → ℝ ≥ 0, accessed via a value oracle. The submodularity assumption captures diminishing returns and allows us to interpret the buyer’s objective as a welfare-maximization problem with combinatorial structure, rather than a pure “cheapest wins” procurement.
Agents and private information. In round t, a set Nt of suppliers
arrives (potentially random, but observed by the mechanism within the
round). Each supplier i ∈ Nt
has a private cost ci, t ≥ 0
for being selected. Suppliers are single-parameter: their type is fully
summarized by this cost. They submit bids bi, t
to the platform. If selected, supplier i receives a payment pi, t
and obtains utility
$$
u_{i,t}=
\begin{cases}
p_{i,t}-c_{i,t}, & i\in S_t,\\
0, & i\notin S_t.
\end{cases}
$$
The buyer’s realized welfare in round t is Wt := vt(St),
subject to an ex post budget constraint ∑i ∈ Stpi, t ≤ B.
Benchmark and per-round optimality. To evaluate
performance we compare against the offline budget-feasible optimum
within each round:
OPTt := max {vt(S) : S ⊆ Nt, ∑i ∈ Sci, t ≤ B}.
This benchmark is intentionally demanding: it knows all realized costs
in round t and solves the
knapsack-like submodular maximization problem under the budget. Our
mechanisms will compete with OPTt only up to
approximation factors, reflecting both online constraints and incentive
constraints.
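To make the structure of this benchmark concrete, the following is a minimal sketch of a greedy marginal-value-per-cost heuristic for the knapsack-like submodular problem that OPTt solves exactly; the function names are ours, and the sketch is neither the truthful online mechanism analyzed below nor, by itself, a constant-factor approximation (that typically also requires comparing with the best affordable singleton).

```python
from typing import Callable, Dict, List, Set

def greedy_benchmark(suppliers: List[str],
                     cost: Dict[str, float],
                     value: Callable[[Set[str]], float],
                     budget: float) -> Set[str]:
    """Greedy marginal-value-per-cost heuristic for the offline benchmark OPT_t."""
    selected: Set[str] = set()
    spent = 0.0
    remaining = set(suppliers)
    while True:
        best, best_ratio = None, 0.0
        for i in remaining:
            if spent + cost[i] > budget:
                continue  # cannot afford supplier i under the hard budget B
            gain = value(selected | {i}) - value(selected)
            ratio = gain / cost[i] if cost[i] > 0 else float("inf")
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            return selected
        selected.add(best)
        spent += cost[best]
        remaining.remove(best)
```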
A menu of learning-augmented mechanisms. Rather than designing a single mechanism, we assume the platform has access to a finite menu {Mτ : τ ∈ 𝒯} indexed by a trust parameter τ. Each Mτ is an online procurement mechanism that, for a fixed τ, is (i) dominant-strategy incentive compatible (DSIC) within the round, (ii) individually rational (IR), and (iii) ex post budget-feasible. Moreover, each Mτ is learning-augmented: it is allowed to use an externally provided prediction signal ωt in round t (e.g., a forecast of OPTt, or a forecast of a good posted-price/threshold level).
We summarize the benefit of ωt through an
error-dependent welfare guarantee. Specifically, there is an
unobserved prediction error parameter εt ∈ [0, 1) (we
will later instantiate it with a concrete relation such as ωt = (1 − εt)OPTt
for underestimation). The key property is that for each τ and each round t,
$$
\mathbb{E}\big[v_t(S_t)\,\big|\,\tau_t=\tau\big]\ \ge\
\frac{1}{\alpha(\tau,\varepsilon_t)}\ \mathrm{OPT}_t,
$$
where α(τ, ε) ≥ 1 is an
approximation/competitive factor capturing the usual
consistency–robustness tradeoff: for more “trusting” τ, α(τ, ε) can be
small when ε is small, but may
grow when ε is large; for more
conservative τ, α(τ, ε) is flatter
but may never be as small under perfect predictions.
Calibration signals and uncertainty sets. A central modeling choice is that the platform does not directly observe εt, nor can it reliably infer εt from bids without risking endogeneity. Instead, before bids are collected in round t, the platform observes an exogenous calibration/quality signal zt. This signal could represent, for instance, a holdout loss of the predictive model on recent non-strategic data, a conformal prediction radius, or a model-monitoring statistic used in practice by a risk/compliance function.
From zt, the platform
constructs a confidence set ℰ(zt) ⊆ [0, 1)
of plausible prediction errors such that
Pr [εt ∈ ℰ(zt)] ≥ 1 − δ,
for a prescribed failure probability δ. We emphasize what we do
not assume: we do not need εt to be i.i.d.
over time, nor ωt to be
unbiased; the calibration guarantee is allowed to be distribution-free
and time-varying as long as it is valid at the stated level.
This calibration device lets us convert the unobserved εt into a
worst-case welfare factor that is computable from zt alone:
$$
\overline{\alpha}(\tau,z_t)\ :=\ \sup_{\varepsilon\in
\mathcal{E}(z_t)}\alpha(\tau,\varepsilon).
$$
Intuitively, $\overline{\alpha}(\tau,z_t)$ is the
“model-risk-adjusted” approximation factor for choosing trust level
τ in a round whose diagnostic
is zt.
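For concreteness, a minimal sketch of how $\overline{\alpha}(\tau,z_t)$ could be computed when the analytical profile α(τ, ε) is available in closed form and ℰ(zt) is an interval. The particular functional form of α below (a mixture-style bound) and its constants are hypothetical placeholders for illustration, not results of this paper.

```python
import numpy as np

def alpha(tau: float, eps: float, c_rob: float = 4.0) -> float:
    """Hypothetical error-dependent factor alpha(tau, eps) >= 1.

    Shaped like the schematic mixture bound min{robust/(1-tau), consistency(eps)/tau}:
    trusting the prediction (large tau) is good for small eps, risky for large eps.
    """
    robust = c_rob / (1.0 - tau) if tau < 1.0 else float("inf")
    consistent = (1.0 / (1.0 - eps)) / tau if tau > 0.0 else float("inf")
    return max(1.0, min(robust, consistent))

def alpha_bar(tau: float, eps_interval: tuple, grid_size: int = 256) -> float:
    """Calibrated worst-case factor: sup of alpha(tau, .) over the set E(z_t)."""
    lo, hi = eps_interval
    grid = np.linspace(lo, min(hi, 1.0 - 1e-9), grid_size)
    return float(max(alpha(tau, e) for e in grid))

# A tight diagnostic favors trusting the prediction; a loose one does not.
print(alpha_bar(0.9, (0.0, 0.05)), alpha_bar(0.9, (0.0, 0.95)))
print(alpha_bar(0.1, (0.0, 0.05)), alpha_bar(0.1, (0.0, 0.95)))
```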
Timing and exogeneity restrictions. The round-t sequence is:
1. The platform observes (ωt, zt)
and chooses τt.
2. Suppliers in Nt submit bids
bi, t.
3. The platform runs Mτt
online, making irrevocable accept/reject decisions and payments,
producing St and {pi, t}.
4. Welfare vt(St)
is realized.
The crucial exogeneity conditions are:
- zt (and
hence ℰ(zt)) is
independent of current-round bids b⋅, t.
- The chosen trust parameter τt is measurable
with respect to σ(z1, …, zt)
(and any public context), but not a function of the current
bids.
This restriction is not cosmetic. It formalizes a separation-of-duties principle: model monitoring and mechanism operation can interact, but the monitor does not condition the mechanism choice on the strategic reports it will later elicit. In applications, this matches the idea that the platform may certify whether the predictive model is “in policy” based on diagnostics that are not manipulable by suppliers within the procurement event.
Performance benchmark across rounds. Because the
platform chooses τt adaptively,
we evaluate it against a natural “best fixed trust level in hindsight”
benchmark that accounts for calibration. In each round, choosing τ can guarantee at least $\mathrm{OPT}_t/\overline{\alpha}(\tau,z_t)$
whenever εt ∈ ℰ(zt).
Summing across rounds, the benchmark is
$$
\max_{\tau\in\mathcal{T}}\ \sum_{t=1}^T
\frac{1}{\overline{\alpha}(\tau,z_t)}\ \mathrm{OPT}_t,
$$
which selects a single τ that
would have performed best given the realized sequence of diagnostics.
Our meta-level goal is then to achieve cumulative welfare close to this
benchmark up to a standard regret term (depending on T and |𝒯|) and a scale parameter V that bounds per-round values such
as OPTt.
We view this benchmark as striking the right balance between ambition and feasibility: it credits the platform for using uncertainty information zt, but it does not allow hindsight tuning of τ to the realized bids or realized εt. In particular, it reflects the practical constraint that the platform can pre-commit to a monitoring-and-selection policy but cannot safely “peek” into strategic inputs to decide how much it trusts the prediction in that same round.
At the core of our approach is a deliberate modularity: we do not hard-code a single learning-augmented procurement rule, but instead assume the platform has access to a menu {Mτ : τ ∈ 𝒯} of candidate online mechanisms indexed by a scalar “trust” knob τ. The economic motivation is straightforward. In procurement with predictions, there is typically no uniformly best way to incorporate ωt: leaning heavily on ωt can yield near-optimal welfare when predictions are accurate, yet it can be fragile when the signal is miscalibrated or stale. Conversely, conservative rules are robust but leave value on the table when the model is performing well. The menu representation turns this modeling tension into an explicit design object, and allows the meta-layer (Section 5) to arbitrate among mechanisms without ever conditioning on strategic reports within the round.
Formally, each Mτ is an online, budget-feasible, truthful mechanism for the single-parameter environment induced by costs. We treat the within-round arrival process as arbitrary (and potentially adaptive), and require the mechanism to make irrevocable accept/reject decisions as suppliers arrive. The trust parameter τ is fixed at the start of the round, publicly known, and then the mechanism runs to completion.
We impose three non-negotiable properties on each Mτ.
(i) Universal truthfulness (within a round). Each Mτ may randomize, but its randomization must be “outside” the strategic interaction in the standard strong sense: Mτ is a distribution over deterministic mechanisms such that, for every realization of the mechanism’s internal randomness and every realized arrival sequence, truthful bidding is a dominant strategy. Concretely, if we write Mτ as drawing a seed r and then running a deterministic rule Mτr, we require that for all r, Mτr is DSIC in the single-parameter sense: the allocation rule is monotone in each bid and payments are the corresponding critical values. This property matters because the meta-layer will adapt τ over time; universal truthfulness ensures that the mechanism remains DSIC even when the selector’s randomness and history-dependent choices are common knowledge.
(ii) Ex post budget feasibility. For every bid
profile bt
and every realization of internal randomness, total payments must
satisfy
∑i ∈ Ntpi, t(bt) ≤ B.
We emphasize the ex post nature: the budget constraint is not
only in expectation, but holds pathwise. This aligns with procurement
practice, where payments must be authorized within a fixed spending
limit rather than amortized across rounds.
(iii) Individual rationality. If a supplier is
selected, then the payment covers its bid, i.e.,
i ∈ St(bt) ⇒ pi, t(bt) ≥ bi, t.
Given DSIC, IR can be interpreted as ensuring nonnegative utility for
truthful bidders, which is essential for participation in repeated
settings.
These properties are standard, but they strongly restrict how prediction signals can be used. In particular, while ωt can influence posted prices or acceptance thresholds, it cannot enter in a way that makes an agent’s acceptance depend non-monotonically on its own bid, nor can it induce budget violations when many agents accept simultaneously.
What distinguishes the menu {Mτ} from an
arbitrary collection of truthful mechanisms is a common,
prediction-aware performance guarantee. We assume that in each round
there is an (unobserved) error parameter εt ∈ [0, 1)
summarizing the quality of ωt, and that
each Mτ
admits an error-dependent approximation factor α(τ, ε) ≥ 1 such
that
$$
\mathbb{E}\big[v_t(S_t)\,\big|\,\tau_t=\tau\big]\ \ge\
\frac{1}{\alpha(\tau,\varepsilon_t)}\ \mathrm{OPT}_t.
$$
The expectation is taken over the mechanism’s internal randomness and
any randomness in the arrival process (holding fixed the realized
round-t instance).
Economically, α(τ, ε) encodes
the consistency–robustness frontier: for larger τ (more trust in predictions), α(τ, ε) may be
close to 1 when ε is small, but can degrade rapidly
when ε is large; for smaller
τ, α is typically flatter in ε, reflecting robust performance
even under poor predictions.
We do not require a particular functional form for α. What we will rely on later is simply that (a) α(τ, ε) is known to the platform for each menu element (as an analytical guarantee), and (b) it can be evaluated under uncertainty sets ℰ(zt) to produce the worst-case calibrated factor $\overline{\alpha}(\tau,z_t)$.
To ground the abstraction, it is helpful to keep in mind a common template that satisfies the above constraints: learning-augmented posted-price mechanisms implemented as mixtures of a prediction-driven rule and a robust fallback.
A prototypical “Mechanism 1(τ)” proceeds as follows. At the
start of round t, it uses
ωt to
compute a target acceptance threshold—for example, a
marginal-value-per-dollar cutoff λtpred
intended to mimic the Lagrange multiplier of the offline budget-feasible
optimum. Independently, it also computes a conservative cutoff λtrob
that does not rely on ωt (or relies on
it only through safe bounds). Then it randomizes:
$$
\lambda_t\ =\
\begin{cases}
\lambda^{\mathrm{pred}}_t,& \text{with prob. }\tau,\\
\lambda^{\mathrm{rob}}_t,& \text{with prob. }1-\tau.
\end{cases}
$$
Given λt,
the mechanism implements a posted-price/threshold rule while suppliers
arrive: it accepts supplier i
if its bid is below a critical price derived from its marginal
contribution relative to λt and the
remaining budget. In submodular procurement, one convenient
representation is to evaluate the marginal value Δi = vt(S ∪ {i}) − vt(S)
at the current accepted set S,
and offer a take-it-or-leave-it price proportional to Δi/λt,
truncated to respect remaining budget. Payments are exactly these posted
prices, so each supplier faces a single-parameter take-it-or-leave-it
decision, which yields DSIC and IR immediately.
Budget feasibility is ensured mechanically by construction: because prices are posted and each acceptance debits the remaining budget, total payments never exceed B ex post. Universal truthfulness follows because the only randomness is the initial draw of λt; conditional on that draw, the posted-price mechanism is deterministic and truthful.
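To fix ideas, here is a minimal sketch of such a mixture mechanism under the properties above. The price rule Δi/λ and the way λpred would be derived from ωt are schematic, and the helper names are ours rather than a specific mechanism from the literature.

```python
import random
from typing import Callable, Dict, Iterable, Set, Tuple

def run_mixture_mechanism(arrivals: Iterable[Tuple[str, float]],
                          value: Callable[[Set[str]], float],
                          budget: float,
                          lam_pred: float,
                          lam_rob: float,
                          tau: float,
                          rng: random.Random) -> Tuple[Set[str], Dict[str, float]]:
    """Sketch of 'Mechanism 1(tau)': a posted-price rule with a random threshold.

    The only randomness is the initial draw of lambda_t (lam_pred w.p. tau,
    lam_rob otherwise).  Conditional on the draw, each arriving supplier faces
    a take-it-or-leave-it price that does not depend on its own bid, so the
    rule is DSIC and IR; debiting the remaining budget at each acceptance
    makes it ex post budget-feasible.
    """
    lam = lam_pred if rng.random() < tau else lam_rob
    selected: Set[str] = set()
    payments: Dict[str, float] = {}
    remaining = budget
    for supplier, bid in arrivals:          # irrevocable online decisions
        marginal = value(selected | {supplier}) - value(selected)
        price = min(marginal / lam, remaining)
        if bid <= price:                    # supplier accepts the posted price
            selected.add(supplier)
            payments[supplier] = price
            remaining -= price
    return selected, payments
```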
The welfare analysis of this family typically has the following
structure. When predictions are accurate (small εt), λtpred
approximates the “right” tradeoff between marginal values and costs, so
the τ-branch attains a
constant-factor approximation close to 1 (up to the inherent online/submodular
loss). When predictions are inaccurate, the robust branch protects
welfare with a worst-case constant approximation that does not
deteriorate with εt. One then
obtains an error-dependent bound of the schematic form
$$
\alpha(\tau,\varepsilon)\ \lesssim\ \min\left\{\frac{\text{robust constant}}{1-\tau},\ \frac{\text{consistency term increasing in }\varepsilon}{\tau}\right\},
$$
capturing precisely the tradeoff governed by τ. While the constants and the
dependence on ε depend on the
specific instantiation (e.g., whether ωt predicts
OPTt, a multiplier
λ*, or a good price
ladder), the overarching point is that mixtures let us separate
incentive/budget feasibility (enforced pointwise) from prediction usage
(which enters only through the posted-price level).
From an implementation perspective, the menu 𝒯 can be read as a small set of platform policies: “aggressive,” “balanced,” and “conservative” ways of turning a model score into procurement prices. This is often closer to practice than a continuously tuned parameter, and it simplifies governance: each Mτ can be audited for DSIC and budget compliance, while the meta-layer merely chooses among pre-approved policies based on exogenous diagnostics.
The limitation is equally clear. Our assumptions place substantial weight on the existence of analytically certified α(τ, ε) bounds for each menu element. This is realistic for posted-price/threshold mechanisms where DSIC and budget feasibility are transparent, but it is more restrictive for sophisticated allocation rules whose incentive properties are delicate. The menu abstraction is therefore best viewed as a disciplined interface: it packages complex within-round mechanism design into a small number of vetted options, enabling principled cross-round adaptation without reopening the incentive constraints each time we update how much we “trust” the model.
Having fixed a vetted menu {Mτ : τ ∈ 𝒯}, the remaining design question is how the platform selects τt over time. Conceptually, τ is a policy choice—how aggressively we incorporate the prediction ωt. The difficulty is that this choice must be made (i) online and (ii) in a way that does not entangle the within-round strategic game. Our guiding restriction is therefore measurability: τt must be chosen using only exogenous information observed before bids in round t (in particular, zt and past history), never as a function of current reports b⋅, t.
We present two meta-selection templates, corresponding to whether the platform can compute, prior to bidding, informative per-round “scores” for each τ. The first is a full-information selector based on calibrated lower bounds derived from zt. The second is a bandit selector that learns from realized welfare outcomes.
In some environments, the calibration signal zt is rich
enough that—before seeing bids—we can bound how well each candidate
mechanism will perform in that round up to the (unknown) OPTt. The key object is
the worst-case approximation factor under the calibrated error
set,
$$
\overline{\alpha}(\tau,z_t)\ :=\
\sup_{\varepsilon\in\mathcal{E}(z_t)}\alpha(\tau,\varepsilon),
$$
which is computable from the known guarantee α(τ, ε) and the
confidence set ℰ(zt).
This suggests a natural “expert advice” viewpoint: each τ ∈ 𝒯 is an expert, and in round
t expert τ offers a certified welfare
fraction $1/\overline{\alpha}(\tau,z_t)$ of the
(unobserved) OPTt.
Of course, OPTt
itself is unknown, and we cannot score experts by $\mathrm{OPT}_t/\overline{\alpha}(\tau,z_t)$
directly. We therefore introduce a scale proxy Lt(τ)
that is computable from zt alone and is
monotone in $1/\overline{\alpha}(\tau,z_t)$. A convenient
generic choice is
$$
L_t(\tau)\ :=\ \frac{1}{\overline{\alpha}(\tau,z_t)}\cdot \widehat{V}_t,
$$
where V̂t
is any nonnegative magnitude estimate available ex ante (for instance, a
conservative upper bound V, or
a context-dependent bound on attainable welfare). When ωt predicts
value (e.g., ωt ≈ OPTt
up to multiplicative error), we may set V̂t = ωt
to sharpen adaptation; when ωt is not
value-calibrated, we fall back to a uniform scale V̂t = V
to obtain a scale-stable selector.
Given scores {Lt(τ)}τ ∈ 𝒯,
we can run a standard full-information online learning rule such as
multiplicative weights (Hedge). Writing wt(τ)
for the weight on expert τ at
the start of round t, one
instantiation is
$$
p_t(\tau)\ =\ \frac{w_t(\tau)}{\sum_{\tau'}w_t(\tau')},\qquad
w_{t+1}(\tau)\ =\ w_t(\tau)\exp\big(\eta\, L_t(\tau)/V\big),
$$
and then we sample τt ∼ pt
and run Mτt.
The normalization by V is
purely to keep learning rates well-scaled; any boundedness assumption
that makes Lt(τ)/V ∈ [0, 1]
suffices.
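A minimal sketch of this selector, assuming the per-round scores Lt(τ) ∈ [0, V] have already been computed from zt (for instance via a calibrated $\overline{\alpha}$ routine as above); the class and variable names are illustrative.

```python
import numpy as np

class HedgeSelector:
    """Full-information multiplicative-weights selection of the trust level."""

    def __init__(self, menu, horizon: int, V: float):
        self.menu = list(menu)
        self.V = V
        self.eta = np.sqrt(np.log(len(self.menu)) / max(horizon, 1))
        self.log_weights = np.zeros(len(self.menu))

    def probabilities(self) -> np.ndarray:
        w = np.exp(self.log_weights - self.log_weights.max())  # stable softmax
        return w / w.sum()

    def select(self, rng: np.random.Generator) -> float:
        # tau_t is sampled before any bids are observed in round t.
        return self.menu[rng.choice(len(self.menu), p=self.probabilities())]

    def update(self, scores: dict) -> None:
        # scores[tau] = L_t(tau), computed from z_t only (never from bids).
        for k, tau in enumerate(self.menu):
            self.log_weights[k] += self.eta * scores[tau] / self.V
```

In a deployment loop, select() corresponds to step 1 of the round-t timing above and update() is called once zt has been processed; bids never enter either call.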
Economically, this rule says: when zt indicates the prediction is reliable (so ℰ(zt) is small), experts with larger τ tend to have smaller $\overline{\alpha}(\tau,z_t)$, hence larger Lt(τ), and accumulate weight. When zt flags low reliability, the robust mechanisms are rewarded because their $\overline{\alpha}(\tau,z_t)$ deteriorates less. Crucially, this arbitration happens entirely “outside” the bidding game.
This full-information route is appropriate when zt can be interpreted operationally—for example, as a conformal prediction radius, a drift statistic, or an out-of-sample loss—and when the platform can credibly map it to a set ℰ(zt) that controls worst-case degradation.
In other settings, the platform may not trust any ex ante scoring rule beyond a very weak bound, even if it has a diagnostic signal zt. Or the platform may prefer to learn directly from outcomes because the mapping zt ↦ ℰ(zt) is conservative. In such cases we can treat τ-selection as a bandit problem: in each round we choose one τt, observe the realized welfare Wt = vt(St), and update.
To apply adversarial bandit algorithms (e.g., Exp3), we need an unbiased per-expert reward
estimate. Let 𝒯 be finite and suppose
we sample τt from a mixed
strategy pt with full
support (often enforced via explicit exploration). After observing Wt, define the
importance-weighted estimate
$$
\widehat{g}_t(\tau)\ :=\ \frac{\mathbf{1}\{\tau_t=\tau\}\,
W_t}{p_t(\tau)}.
$$
Then 𝔼[ĝt(τ) ∣ history] = 𝔼[Wt ∣ τt = τ],
so ĝt(τ)
is an unbiased estimator of expert τ’s expected welfare in that round
(conditional on the realized instance). We can therefore update weights
similarly to multiplicative weights, replacing Lt(τ)
with ĝt(τ)
(again normalized by V to
control learning rates).
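A minimal sketch of this bandit variant, assuming realized welfare Wt ∈ [0, V] is observed after the round; the names and the explicit-exploration parameterization are illustrative.

```python
import numpy as np

class Exp3Selector:
    """Bandit selection of tau_t from realized welfare via importance weighting."""

    def __init__(self, menu, eta: float, gamma: float, V: float):
        self.menu, self.eta, self.gamma, self.V = list(menu), eta, gamma, V
        self.log_weights = np.zeros(len(self.menu))
        self._probs = None
        self._arm = None

    def select(self, rng: np.random.Generator) -> float:
        w = np.exp(self.log_weights - self.log_weights.max())
        q = w / w.sum()
        # Mix with uniform exploration so every tau keeps probability >= gamma/|T|.
        self._probs = (1.0 - self.gamma) * q + self.gamma / len(self.menu)
        self._arm = rng.choice(len(self.menu), p=self._probs)
        return self.menu[self._arm]

    def update(self, realized_welfare: float) -> None:
        # Importance-weighted estimate g_hat(tau) = 1{tau_t = tau} W_t / p_t(tau).
        g_hat = realized_welfare / self._probs[self._arm]
        self.log_weights[self._arm] += self.eta * g_hat / self.V
```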
Two practical remarks matter here. First, bandit feedback uses realized welfare, which depends on bids and arrivals; nevertheless, the selector’s action τt is still chosen before bids, so the informational dependence does not create strategic manipulation within the round. Second, bandit learning is typically noisier and requires exploration: if we never try a given τ, we cannot learn its value. This exploration can be interpreted as deliberate policy experimentation, which is often acceptable in repeated procurement so long as each tested mechanism remains budget-feasible and truthful.
When should we prefer the bandit route? It is attractive when (i) welfare can be reliably computed at the end of each round (via a value oracle for vt), (ii) the platform distrusts the ex ante calibration mapping or expects model performance to be nonstationary in ways zt does not capture, or (iii) the platform seeks robustness to misspecification of α(τ, ε) by learning which τ works empirically. The cost is statistical: bandit regret scales worse than full-information regret in general, reflecting that we only observe the payoff of the chosen policy.
A pragmatic compromise is to use zt to shape
exploration. For instance, we can set a prior or baseline
distribution qt(τ) ∝ exp (ηLt(τ)/V)
from calibrated scores, and then mix it with uniform exploration to
ensure support:
pt(τ) = (1 − γ) qt(τ) + γ/|𝒯|.
This hybrid retains the safety interpretation of calibration (we do not
over-commit to brittle τ when
zt looks
unfavorable) while still allowing the platform to learn from realized
welfare in regimes where the diagnostics are imperfect.
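A minimal sketch of this hybrid distribution, with the same illustrative naming as above:

```python
import numpy as np

def hybrid_distribution(scores: dict, menu: list, eta: float,
                        V: float, gamma: float) -> np.ndarray:
    """Baseline q_t from exogenous calibrated scores, mixed with uniform exploration."""
    logits = np.array([eta * scores[tau] / V for tau in menu])
    q = np.exp(logits - logits.max())
    q = q / q.sum()
    return (1.0 - gamma) * q + gamma / len(menu)
```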
Across all variants, the meta-layer plays a narrow but crucial role: it adapts among pre-approved truthful mechanisms using only exogenous signals and past outcomes, thereby improving cumulative welfare without reintroducing within-round incentive problems.
We now formalize the guarantees delivered by the meta-layer: selecting τt using only exogenous information preserves the per-round incentive properties of the base mechanisms, and standard online-learning analysis converts per-round certified performance into an end-to-end welfare bound relative to the best fixed τ ∈ 𝒯. We also make explicit how the calibration failure probability δ enters the welfare guarantee.
The first result is essentially a “conditioning” argument: since τt is chosen before bids and independently of them, the within-round strategic game faced by agents is exactly the one induced by the chosen mechanism Mτt.
Theorem 6.1. Fix any (possibly randomized) meta-rule Meta such that, in each round t, τt is measurable
with respect to σ(z1, …, zt)
(and any public context) and is independent of the current bid profile
b⋅, t.
Suppose that for every τ ∈ 𝒯,
the mechanism Mτ is
universally truthful (DSIC), individually rational, and ex post
budget-feasible in each round. Then the composed procedure that
samples/chooses τt via Meta and runs Mτt
in round t is also universally
truthful, individually rational, and ex post budget-feasible in each
round.
Proof. Condition on any realization of τt = τ and any arrival realization within round t. By assumption, the induced mechanism is Mτ, which is DSIC, IR, and budget-feasible. Since τt is selected independently of b⋅, t, an agent cannot affect the distribution over τt by misreporting in round t. Therefore, truthful bidding weakly dominates deviations when we take expectations over τt (universal truthfulness follows because the composition is a distribution over deterministic DSIC mechanisms, indexed by τt and internal randomness). Budget feasibility and IR hold pointwise for the realized τt, hence for the mixture as well. ▫
Economically, Theorem 6.1 tells us that we can treat τ as a platform-side “policy knob” that is tuned using diagnostics zt and past outcomes, without reopening incentive issues—provided the knob is set before agents play and is not responsive to their current reports.
Next we connect the meta-layer’s learning dynamics to welfare. The key observation is that each τ comes with a calibrated worst-case approximation factor $\overline{\alpha}(\tau,z_t)=\sup_{\varepsilon\in\mathcal{E}(z_t)}\alpha(\tau,\varepsilon)$, and hence a per-round $1/\overline{\alpha}(\tau,z_t)$ of OPTt on the event {εt ∈ ℰ(zt)}. When we run a full-information learner on any bounded score Lt(τ) that is monotone in $1/\overline{\alpha}(\tau,z_t)$ (e.g., $L_t(\tau)=\widehat V_t/\overline{\alpha}(\tau,z_t)$), the regret bound translates into a welfare bound.
Theorem 6.2. Assume 0 ≤ Lt(τ) ≤ V
for all t, τ. Let
Meta be multiplicative weights (Hedge)
applied to the experts τ ∈ 𝒯
with these scores, producing a distribution pt and sampling
τt ∼ pt.
Then for any sequence of signals (zt)t,
$$
\sum_{t=1}^T \mathbb{E}\big[L_t(\tau_t)\big]\ \ge\
\max_{\tau\in\mathcal{T}}\sum_{t=1}^T L_t(\tau)\ -\ R_T,
\qquad
R_T\ =\ O\!\left(V\sqrt{T\log|\mathcal{T}|}\right),
$$
where the expectation is over the meta-randomization (and any internal
randomness of the base mechanisms, if Lt is itself
random but bounded).
The meta-layer competes with the best trust parameter τ chosen in hindsight, in terms of the certified lower-bound proxy. This is the right comparator when τ represents a stable policy choice (e.g., a procurement agency’s “degree of reliance” on predictions).
To turn this into a welfare statement, we use the base guarantee: for
each t and each realized εt,
$$
\mathbb{E}[W_t\mid \tau_t=\tau]\ \ge\
\frac{1}{\alpha(\tau,\varepsilon_t)}\,\mathrm{OPT}_t.
$$
On the calibrated event {εt ∈ ℰ(zt)},
we have $\alpha(\tau,\varepsilon_t)\le
\overline{\alpha}(\tau,z_t)$, hence
$$
\mathbb{E}[W_t\mid \tau_t=\tau,\, \varepsilon_t\in\mathcal{E}(z_t)]\
\ge\ \frac{1}{\overline{\alpha}(\tau,z_t)}\,\mathrm{OPT}_t.
$$
Summing and comparing to the best τ yields the next corollary.
Corollary 6.3. Suppose OPTt ≤ V almost
surely. Under the conditions of Theorem 6.2 and calibration Pr [εt ∈ ℰ(zt)] ≥ 1 − δ,
$$
\sum_{t=1}^T \mathbb{E}[W_t]\ \ge\ \max_{\tau\in\mathcal{T}}\sum_{t=1}^T
\frac{1}{\overline{\alpha}(\tau,z_t)}\,\mathbb{E}[\mathrm{OPT}_t]\ -\
R_T\ -\ \delta T V.
$$
The additive δTV term is a
conservative price for calibration failure: in the worst case, we lose
at most V welfare in a round
where the prediction error falls outside ℰ(zt). (If one
has a nontrivial fallback guarantee for α(τ, ε) even off
the calibrated set, this term can be sharpened accordingly.)
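To make explicit where the δTV term comes from, a one-line derivation (a sketch; a fully formal version would track the conditioning on zt): since Wt ≥ 0 and $\mathrm{OPT}_t/\overline{\alpha}(\tau,z_t)\le \mathrm{OPT}_t\le V$,
$$
\mathbb{E}[W_t\mid \tau_t=\tau]\ \ge\ \Pr\!\big[\varepsilon_t\in\mathcal{E}(z_t)\big]\,
\mathbb{E}\big[W_t\mid \tau_t=\tau,\ \varepsilon_t\in\mathcal{E}(z_t)\big]\ \ge\
(1-\delta)\,\frac{\mathrm{OPT}_t}{\overline{\alpha}(\tau,z_t)}\ \ge\
\frac{\mathrm{OPT}_t}{\overline{\alpha}(\tau,z_t)}\ -\ \delta V,
$$
and summing over t and combining with the regret bound of Theorem 6.2 yields the stated inequality.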
When scores Lt(τ) are unavailable or distrusted, we can instead learn from realized welfare via importance-weighted estimators and obtain the standard adversarial bandit tradeoff.
Assume 0 ≤ Wt ≤ V
almost surely and that Meta uses an
adversarial bandit algorithm (e.g. Exp3) over 𝒯
with exploration ensuring pt(τ) ≥ γ/|𝒯|.
Then
$$
\sum_{t=1}^T \mathbb{E}[W_t]\ \ge\ \max_{\tau\in\mathcal{T}}\sum_{t=1}^T
\mathbb{E}[W_t\mid \tau_t=\tau]\ -\
O\!\left(V\sqrt{T|\mathcal{T}|\log|\mathcal{T}|}\right).
$$
Combined with the calibrated per-round guarantee (as in Corollary 6.3),
this yields an end-to-end bound relative to $\max_{\tau}\sum_t
\mathrm{OPT}_t/\overline{\alpha}(\tau,z_t)$, with a larger regret
term reflecting partial feedback.
Taken together, these results make the practical message precise: once we restrict the meta-choice τt to be exogenous to current bids, we can safely “wrap” truthful budget-feasible mechanisms in standard learning machinery, obtaining cumulative welfare close to the best calibrated trust level, while explicitly accounting for both learning regret and the residual miscalibration risk δ.
The preceding guarantees hinge on a sharp separation between exogenous diagnostics and strategic reports: the meta-layer may condition on the former, never on the latter. This separation is not merely a technical convenience: once the meta-layer is allowed to look at bids—either contemporaneously within the round, or retrospectively across rounds when agents are forward-looking—dominant-strategy truthfulness can fail in stark, low-dimensional examples. We view these negative results as design guidance: they delineate which forms of adaptivity are “safe” in procurement markets and which forms effectively create a new (and generally non-truthful) mechanism.
In single-parameter environments, DSIC is equivalent to monotonicity of the allocation rule in each agent’s own bid (holding others fixed), together with threshold payments. Each Mτ satisfies this property by assumption. However, if the selector chooses τt as a function of the bid profile b⋅, t, the allocation rule is no longer the monotone allocation rule of any fixed Mτ; it is a spliced rule whose behavior can be non-monotone in bi, t.
A minimal counterexample uses a single agent and posted prices.
There exist two deterministic DSIC posted-price mechanisms Mτ(1)
and Mτ(2)
and a meta-rule τt(bi, t)
such that the composed mechanism is not DSIC.
One agent supplies a single item. The buyer’s value is v({i}) = 1 and the
per-round budget is B = 1.
Define two truthful posted-price mechanisms:
Mτ(1): offer price p1 = 0.4;  Mτ(2): offer price p2 = 0.6.
Each is DSIC and IR: the agent accepts iff c ≤ pk.
Now define a bid-dependent selector
$$
\tau_t(b_i)=
\begin{cases}
\tau^{(2)}, & b_i\le 0.4,\\
\tau^{(1)}, & b_i>0.4.
\end{cases}
$$
Consider an agent with true cost c = 0.5. Truthful reporting bi = 0.5
triggers τ(1), the
price 0.4, and hence rejection and
utility 0. By misreporting bi′ = 0.4,
the agent triggers τ(2), receives price
0.6, accepts, and obtains utility 0.6 − 0.5 = 0.1 > 0. Thus truth-telling is
not a dominant strategy.
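The deviation can be checked mechanically; a minimal sketch reproducing the numbers above:

```python
def composed_utility(bid: float, cost: float) -> float:
    """Single-agent utility under the bid-dependent selector of the example.

    Bids <= 0.4 trigger the 0.6-price mechanism; higher bids trigger the
    0.4-price mechanism.  Within the chosen mechanism the supplier is accepted
    iff its bid is at most the posted price, and is paid that price.
    """
    price = 0.6 if bid <= 0.4 else 0.4
    return (price - cost) if bid <= price else 0.0

print(composed_utility(0.5, 0.5))  # truthful bid: rejected, utility 0.0
print(composed_utility(0.4, 0.5))  # misreport: accepted at 0.6, utility 0.1
```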
The economic point is that bid-dependent selection endogenizes the “policy knob” τ. From the agent’s perspective, the bid no longer only affects acceptance within a fixed mechanism; it also affects which mechanism, and hence which posted price, the agent faces. This creates profitable manipulations even when every mechanism in the menu is truthful in isolation.
One can generalize the pathology: any selection rule that makes τt respond discontinuously (or merely monotonically) to bi, t can introduce non-monotonic acceptance regions, so the composed mechanism fails the standard characterization of truthful single-parameter mechanisms. In that sense, “choosing among truthful mechanisms based on bids” is not an innocuous mixture; it is itself a new mechanism whose incentive properties must be re-verified from scratch.
A more subtle failure mode arises when τt is allowed to depend on bids or outcomes that are functions of past bids. If each round draws a fresh set of agents who never return, then dependence on b⋅, 1, …, b⋅, t − 1 does not create strategic leverage for round-t agents: they cannot affect the past, and their current bid does not affect τt by construction. In many online procurement applications (e.g., one-off microtasks, spot markets with a large anonymous pool), this “fresh population” approximation is reasonable.
However, if agents are long-lived, expect repeat interaction, or can coordinate across identities, then past-bid dependence creates incentives: an agent may sacrifice current-round utility to move the platform toward a more favorable τ in future rounds.
There exist a two-round instance with a single agent participating in
both rounds, a menu {Mτ} of
per-round DSIC mechanisms, and a meta-update rule τ2 = ϕ(b1)
such that truthful bidding in round 1 is not optimal for the agent once
it internalizes the effect on round 2.
Let Mτ be a posted-price mechanism that offers price τ (still DSIC in each round for fixed τ). Suppose the platform updates τ2 = b1 (or $\tau_2=\min\{B,\max\{\underline\tau,b_1\}\}$ to respect budgets). The agent has constant cost c = 0.2 in both rounds and discount factor β ∈ (0, 1]. If it bids truthfully b1 = 0.2, then τ2 = 0.2 and second-round utility is 0. If instead it overbids to b1′ = 0.6, it may reduce or eliminate first-round trade, but it induces τ2 = 0.6 and then earns second-round utility 0.6 − 0.2 = 0.4, which can outweigh any first-round loss when β is sufficiently large. Thus the platform’s learning rule becomes a manipulable state variable.
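A minimal numeric check of this example, assuming (for concreteness only) a first-round posted price τ1 = 0.2:

```python
def two_round_utility(b1: float, cost: float = 0.2, beta: float = 1.0,
                      tau1: float = 0.2) -> float:
    """Total discounted utility when the platform sets tau_2 = b1.

    Round 1: posted price tau1; the agent trades iff b1 <= tau1.
    Round 2: posted price tau_2 = b1; bidding truthfully, it trades iff cost <= b1.
    """
    u1 = (tau1 - cost) if b1 <= tau1 else 0.0
    u2 = (b1 - cost) if cost <= b1 else 0.0
    return u1 + beta * u2

print(two_round_utility(0.2))  # truthful: 0.0 total
print(two_round_utility(0.6))  # overbid: forgoes round 1, earns 0.4 in round 2
```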
This example is intentionally stark, but the mechanism-design lesson is robust: whenever the platform uses bid-influenced statistics (acceptance rates, paid prices, realized welfare) to choose future τ, strategic agents with intertemporal incentives will treat those statistics as targets to steer. Put differently, “no-regret learning” at the platform level can be compatible with DSIC only under a fresh-population or commitment assumption that rules out profitable intertemporal manipulation.
These negative results clarify the role of our modeling restrictions. To avoid within-round failures, we require τt to be chosen before bids and to be independent of b⋅, t; this is the minimal condition that prevents agents from influencing the mechanism they face in the same round. To avoid cross-round manipulation, we need at least one of the following (application-dependent) safeguards: (i) the meta-rule conditions only on exogenous diagnostics zt, never on bid-influenced statistics; (ii) the platform commits in advance to a published update schedule for τ; or (iii) the supplier population is effectively short-lived, so that no agent expects to face a future τ it can steer through its current report. In practice, these correspond to concrete institutional choices: using holdout-based forecast diagnostics rather than bid-derived KPIs to tune trust; committing to published update schedules; and treating any adaptivity based on market reports as a potential incentive channel that must be explicitly mechanism-designed rather than “bolted on” at the meta-layer.
Having separated “safe” exogenous diagnostics from strategic reports, we can use the welfare bound to derive practical comparative statics: how to pick the menu 𝒯, how much effort to spend tightening ℰ(zt), and how to interpret the resulting behavior of the meta-layer.
The meta-layer’s benefit comes from competing with the best fixed
τ ∈ 𝒯 in hindsight,
round-by-round weighted by the calibration-adjusted factor $\overline{\alpha}(\tau,z_t)$. The cost is
the regret term, which scales as
$$
R_T \;=\; O\!\left(V\sqrt{T\log|\mathcal{T}|}\right).
$$
Two immediate comparative statics follow.
First, enlarging the menu improves the benchmark $\max_{\tau\in\mathcal{T}}\sum_t \mathrm{OPT}_t/\overline{\alpha}(\tau,z_t)$ but only logarithmically worsens the regret penalty. This makes it tempting to add many trust levels. However, the relevant object is not cardinality per se but effective diversity: if two parameters τ, τ′ yield nearly identical $\overline{\alpha}(\cdot,z)$ for all plausible z, then adding both mostly increases log |𝒯| without improving the benchmark. Operationally, we prefer menus that are well-separated, in the sense that for each plausible error regime there is a distinct best τ, but not so fine-grained that adjacent τ’s are redundant.
Second, the scale V matters as much as |𝒯|. If OPTt varies substantially across rounds (e.g., varying arrival volumes nt), then choosing a normalization that makes per-round proxy losses comparable can dramatically tighten constants. In implementations, we often recommend either (i) normalizing the learning objective by a known upper bound on OPTt, or (ii) using scale-free updates where the proxy Lt(τ) is itself in [0, 1]. The menu can then be slightly larger without making exploration “too expensive” in high-value rounds.
Our welfare guarantee depends on $\overline{\alpha}(\tau,z_t)=\sup_{\varepsilon\in\mathcal{E}(z_t)}\alpha(\tau,\varepsilon)$. Holding the family α(τ, ε) fixed, shrinking ℰ(zt) weakly decreases $\overline{\alpha}$ and thus improves the per-round welfare benchmark. This is the basic “value of information” effect: more informative diagnostics zt allow the meta-layer to select more aggressive trust when predictions are likely accurate.
But the guarantee is only as credible as the coverage property Pr [εt ∈ ℰ(zt)] ≥ 1 − δ. A smaller set with the same nominal δ typically requires stronger modeling assumptions (e.g., stationarity of forecast residuals conditional on zt). In other words, ℰ(zt) embodies a tradeoff between sharpness (performance) and coverage (robustness). From a mechanism-design perspective, this tradeoff has a concrete welfare interpretation: occasional calibration failure effectively drops us into a regime where the realized approximation factor can be worse than $\overline{\alpha}(\tau,z_t)$, and if the selector tends to choose aggressive τ precisely when it believes the error is small, failures can be concentrated in high-stakes rounds.
This suggests a conservative design heuristic: when deploying in settings where distribution shift is plausible, we should spend “statistical slack” on maintaining coverage (smaller δ, more conservative ℰ) before spending it on sharpness.
The mapping zt ↦ ℰ(zt) is where statistical practice enters the economic guarantee. We highlight three templates that preserve exogeneity (constructed from holdout data, conformal scores, or external sensors, not from bids).
If the learning-augmented mechanism is primarily sensitive to underestimation (e.g.,
ωt = (1 − εt)OPTt),
a natural choice is
ℰ(zt) = [0, ε̂(zt)],
where ε̂(z) is a
high-quantile bound on underestimation conditional on z. Conformal prediction offers an
off-the-shelf way to set ε̂(z) with finite-sample
coverage under exchangeability assumptions.
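A minimal sketch of one way to set ε̂(z) from a holdout set of past (prediction, realized value) pairs, in the spirit of split conformal prediction; the residual convention matches ωt = (1 − εt)OPTt, the function names are ours, and the coverage claim assumes the holdout rounds are exchangeable with the current one.

```python
import numpy as np

def underestimation_bound(holdout_omegas: np.ndarray,
                          holdout_opts: np.ndarray,
                          delta: float) -> float:
    """Empirical (1 - delta) bound on the underestimation error eps = 1 - omega/OPT.

    With n exchangeable holdout residuals, returning the
    ceil((n + 1) * (1 - delta))-th order statistic gives the standard
    split-conformal coverage guarantee for the next round's error.
    """
    eps = np.clip(1.0 - holdout_omegas / holdout_opts, 0.0, 1.0 - 1e-9)
    n = len(eps)
    k = int(np.ceil((n + 1) * (1.0 - delta)))
    if k > n:
        return 1.0 - 1e-9            # too few holdout rounds: fall back to the trivial bound
    return float(np.sort(eps)[k - 1])

# The induced confidence set is E(z_t) = [0, eps_hat].
```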
If both under- and overestimation matter (e.g., mechanisms that can
“overspend” welfare opportunity when ωt is too high),
we can use an interval
$$
\mathcal{E}(z_t)=[\underline{\varepsilon}(z_t),\overline{\varepsilon}(z_t)]\cap[0,1),
$$
paired with a mechanism analysis that treats α(τ, ε) as
increasing in |ε| or
separately in ε’s sign. This
is especially relevant when the prediction signal ωt is a model
output that can drift upward as well as downward.
In many platforms, zt is a coarse quality flag (e.g., “model version,” “data freshness,” “coverage radius”). Then ℰ(zt) can be pre-specified for each tier, yielding a transparent policy: the platform publicly commits to which trust parameters are admissible under each tier. This reduces the perceived arbitrariness of adaptivity and simplifies auditing.
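As an illustration of such a tiered policy (all tier names and numbers are made up), the public commitment could be as simple as a lookup table:

```python
# Each exogenous quality tier pre-commits to a confidence set E and to the
# admissible trust parameters; the table is published before any bidding.
TIER_POLICY = {
    "fresh_model":  {"eps_set": (0.0, 0.10), "admissible_tau": (0.5, 0.9)},
    "stale_model":  {"eps_set": (0.0, 0.40), "admissible_tau": (0.2, 0.5)},
    "out_of_scope": {"eps_set": (0.0, 0.99), "admissible_tau": (0.0,)},
}
```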
Across templates, a key operational point is how the failure probability δ is budgeted across rounds. If we want an anytime guarantee over T rounds, a union bound suggests per-round δ = O(1/T). If instead we accept rare failures in exchange for sharper sets, we should explicitly interpret the resulting guarantee as “high probability per round” rather than “uniformly over the horizon.”
Because $\overline{\alpha}(\tau,z_t)$ is a worst-case factor over ℰ(zt), the selector effectively chooses τt to minimize $\overline{\alpha}(\tau,z_t)$ (or maximize a proxy Lt(τ) derived from it). This yields a clean design dichotomy.
A conservative menu includes trust parameters whose α(τ, ε) degrades slowly in ε, even if they are never the very best under tiny error. This is desirable when (a) calibration sets are wide (weak diagnostics), (b) the cost of miscalibration is high (large V or high variance in OPTt), or (c) the environment is nonstationary. In procurement terms, conservative menus correspond to mechanisms that rely less on ωt and more on robust posted-price or threshold rules that guarantee a baseline fraction of OPTt regardless of prediction quality.
An aggressive menu adds parameters that can outperform under small error but deteriorate quickly as ε grows. These become attractive when zt is informative enough that ℰ(zt) is frequently tight. Put plainly: aggressive options pay off when we can reliably certify that “today’s forecast is good.”
A practical compromise is to ensure the menu contains (i) at least one robust baseline τrob with a uniform guarantee, and (ii) a small number of aggressive candidates whose advantage is material only when zt indicates high confidence. This structure makes the meta-layer’s behavior interpretable: it defaults to robustness and “switches on” trust only when diagnostics justify it.
These comparative statics do not eliminate the need for domain knowledge: the shape of α(τ, ε) is mechanism-specific, and the informativeness of zt is application-specific. What our framework contributes is a disciplined way to translate those objects into transparent design choices—how large a menu to deploy, how to invest in calibration, and when to privilege conservative robustness over aggressive exploitation—while preserving the incentive guarantees that motivate the separation between bids and diagnostics.
Our baseline model makes three choices that keep the incentive and learning layers cleanly separable: (i) per-round objectives are monotone submodular, (ii) there is a single hard budget B for a single buyer each round, and (iii) arrivals and the calibration signal zt are treated as exogenous to current bids. These assumptions are natural in many procurement settings (crowdsourcing with diminishing returns; a single program budget; diagnostics computed from holdout data), but they are not essential. What is essential is that the meta-layer’s choice τt remain measurable with respect to non-strategic information in round t, so that we can compose universal truthfulness of Mτ with an online selection rule without reintroducing bid-dependence.
In some applications, adding a supplier can reduce value because of congestion, cannibalization, or diversity penalties. Formally, vt may be submodular but non-monotone, or even neither. In such cases, the relevant benchmark OPTt = max {vt(S) : ∑i ∈ Sci, t ≤ B} remains meaningful, but the algorithmic core changes: monotonicity is what allows many posted-price/threshold mechanisms to align incentives with welfare maximization.
A pragmatic extension is to restrict attention to non-monotone submodular vt, where there exist approximation algorithms based on randomization and contention-resolution ideas. Translating these into budget-feasible mechanisms typically requires additional structure (e.g., mechanisms that sample a subset of agents to estimate thresholds, then apply an oblivious rule to the remainder). The learning-augmented component can still enter through a prediction ωt about OPTt or about the marginal value distribution, but the approximation factor α(τ, ε) will often worsen and may become explicitly dependent on extra parameters (e.g., sampling rates) needed to preserve incentive properties. The meta-layer remains valid as long as each candidate Mτ is universally truthful and admits an error-dependent guarantee; however, in non-monotone settings we should expect α(τ, ε) to be less “smooth” in ε, making calibration sets ℰ(zt) and conservative menu design comparatively more important.
Many public-sector procurement problems feature multiple accounting lines (e.g., regional budgets, program earmarks) or knapsack-like constraints. A clean generalization is to replace the single constraint ∑ipi, t ≤ B with m constraints ∑ipi, t(j) ≤ B(j) for j = 1, …, m. The welfare benchmark becomes a multi-knapsack optimum, and the base mechanisms Mτ must be redesigned accordingly (often via Lagrangian relaxations or sequential posted prices with per-constraint pacing).
From our perspective, the meta-layer is largely unchanged: we can index τ by a vector of trust and pacing parameters, and define α(τ, ε) relative to the multi-constraint OPTt. What changes is the scale entering the regret bound: the natural V upper bound may depend on minjB(j) or on the curvature of the Lagrangian surrogate, and poorly normalized proxy losses can lead to unstable adaptation across constraints. This is one place where numerical tuning (learning rates, normalization constants, clipping) is not merely cosmetic: it is part of ensuring that the selector does not overreact to noisy Lt(τ) when some constraints are tight and others slack.
If there are multiple buyers with budgets B(k) competing for the same suppliers, the problem becomes a dynamic assignment/auction with externalities. A direct extension of per-round DSIC procurement is typically impossible without changing the market design: suppliers may face different offered prices, and buyers’ selections interact. One promising direction is to treat each round as a matching market with a truthful clearing rule (e.g., a truthful double auction variant or posted-price market clearing) and to let learning augmentation enter via predictions about demand/supply balance.
In such multi-buyer settings, our “meta chooses τt, then runs Mτt” composition still applies, but the base family {Mτ} must be truthful with respect to whichever side is strategic (suppliers, buyers, or both). Moreover, the calibration signal zt must remain exogenous to all strategic reports in the relevant round; otherwise, strategic manipulation can occur indirectly by influencing the diagnostic (e.g., if zt is computed from acceptance rates that agents can affect).
We have allowed nt and Nt to be random, and the within-round online nature of Mτ already accommodates adversarial orderings in the strongest models. Still, many platforms face “semi-random” arrivals: an adversary chooses a set, then an independent random order is realized, or arrivals depend on past posted prices but not on current bids. These intermediate models can be exploited to sharpen α(τ, ε) (e.g., better constants under random order) and, correspondingly, to justify more aggressive trust parameters when diagnostics are strong.
The main conceptual caution is that if arrivals respond to mechanism choices, then the meta-layer becomes part of the environment that shapes future Nt. This does not break per-round DSIC—truthfulness is defined holding the mechanism fixed—but it does complicate welfare guarantees because the hindsight comparator $\max_{\tau}\sum_t \mathrm{OPT}_t/\overline{\alpha}(\tau,z_t)$ now depends on a counterfactual arrival process. Addressing this rigorously would require a model of platform-induced feedback and an appropriate benchmark (e.g., policy regret rather than external regret).
A frequent practical requirement is that procurement satisfy fairness constraints: minimum spend shares for disadvantaged groups, quotas, or caps to prevent over-concentration. Formally, we might impose linear constraints on selected sets, such as $\ell_g \le \sum_{i \in S_t} a_{i,g} \le u_g$ for groups $g$, or require approximate proportionality over time (e.g., requiring that $\sum_{t \le T}\sum_{i \in S_t} a_{i,g}$ tracks targets). There are two distinct issues.
First, fairness constraints change the algorithmic core of Mτ: feasibility is no longer a single knapsack, and posted-price rules may need to be group-aware. Second, fairness interacts with the learning augmentation: predictions ωt about the unconstrained OPTt may be misleading if binding fairness constraints reduce attainable welfare. A more coherent design is to predict a benchmark $\mathrm{OPT}_t^{\mathrm{fair}}$ and calibrate errors relative to it. When this is not possible, the calibration set ℰ(zt) should be widened to reflect constraint-induced model mismatch, again pushing the selector toward conservative τ.
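A minimal sketch of the corresponding group-level feasibility check (group contributions a_{i,g} and the bounds are illustrative placeholders) is:

```python
# Sketch of the fairness feasibility check l_g <= sum_{i in S_t} a_{i,g} <= u_g;
# the dict layout for memberships and bounds is an assumption for the example.

def fair_feasible(selected, a, lower, upper):
    """selected: supplier ids; a[i][g] = contribution of i to group g;
    lower[g], upper[g] = l_g, u_g. Returns True iff every group constraint holds."""
    groups = lower.keys()
    totals = {g: 0.0 for g in groups}
    for i in selected:
        for g in groups:
            totals[g] += a[i].get(g, 0.0)
    return all(lower[g] <= totals[g] <= upper[g] for g in groups)

a = {1: {"g1": 1.0}, 2: {"g2": 1.0}, 3: {"g1": 1.0}}
print(fair_feasible({1, 2}, a, {"g1": 1.0, "g2": 1.0}, {"g1": 2.0, "g2": 2.0}))  # True
print(fair_feasible({1, 3}, a, {"g1": 1.0, "g2": 1.0}, {"g1": 2.0, "g2": 2.0}))  # False: g2 gets 0 < 1
```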
The incentive properties in our composition argument are structural: as long as τt does not depend on current bids and each Mτ is universally truthful, per-round DSIC and ex post budget feasibility follow without tuning. By contrast, three components are inherently quantitative and typically require tuning in implementations: (i) the construction of proxy losses Lt(τ) from zt (including normalization to a stable range), (ii) the online learning update (learning rate schedules, exploration parameters if bandit feedback is used), and (iii) the calibration mapping zt ↦ ℰ(zt) (choice of δ, window lengths, and robustness to drift). These choices do not alter the basic logic of the framework, but they do determine whether the meta-layer’s adaptivity is a modest refinement or a material welfare improvement in finite samples.
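To make the quantitative layer concrete, the following sketch (not the paper's implementation; the τ grid, learning-rate schedule, and reward construction are assumptions to be tuned) shows a Hedge-style selector over a finite grid of trust levels, driven by signal-only proxy rewards:

```python
# Minimal Hedge/exponential-weights selector over trust levels; all constants
# here (grid, eta) are illustrative tuning choices, not prescribed values.
import numpy as np

class TrustSelector:
    def __init__(self, tau_grid, eta=0.1):
        self.tau_grid = np.asarray(tau_grid)
        self.eta = eta                        # learning rate (tuning parameter)
        self.log_w = np.zeros(len(tau_grid))  # log-weights for numerical stability

    def probabilities(self):
        w = np.exp(self.log_w - self.log_w.max())
        return w / w.sum()

    def choose(self, rng):
        """Draw tau_t from the current distribution, *before* any bids are seen."""
        p = self.probabilities()
        idx = rng.choice(len(self.tau_grid), p=p)
        return idx, self.tau_grid[idx], p[idx]

    def update(self, proxy_rewards):
        """proxy_rewards[k] = L_t(tau_k), computed from z_t only (full information)."""
        self.log_w += self.eta * np.asarray(proxy_rewards)

# Usage sketch:
# sel = TrustSelector(np.linspace(0.0, 1.0, 11))
# idx, tau_t, prob = sel.choose(np.random.default_rng(0))
```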
Taken together, these extensions reinforce a simple message: learning augmentation is most valuable when it is statistically disciplined (through zt and calibration) and strategically insulated (through bid-independent meta-choices). Departures from monotonicity, single budgets, or exogenous arrivals are feasible, but they raise the premium on careful mechanism engineering and on diagnostics that remain credibly outside strategic control.
This appendix records the formal steps that are implicit in the main text: (i) why the meta-layer preserves dominant-strategy incentives and budget feasibility when it is bid-independent, (ii) how to construct unbiased (or conservatively biased) welfare proxies that support learning over τ, and (iii) why allowing τt to depend on current bids can destroy DSIC even if every Mτ is DSIC when run in isolation.
Fix a round t. Let Meta select τt ∈ 𝒯 as a (possibly randomized) function of (z1, …, zt) only, and then run Mτt on the reported bids $b_{\cdot,t}$. We treat arrivals and any internal randomness of Mτ in the standard “universal truthfulness” way: for each τ, Mτ is a distribution over deterministic DSIC mechanisms, conditional on the realized arrival sequence.
Fix any agent i, any true cost ci, t, and any bids b−i, t of others. Condition on the realized signal history (z1, …, zt) and on the realized τt. Under this conditioning, the allocation/payment rule faced by agent i is exactly Mτt, which is DSIC by assumption, hence truthful reporting maximizes ui, t for that fixed τt. Taking expectations over the randomness in τt (and any internal randomization) preserves the inequality because τt is independent of bi, t under the conditioning; there is no channel through which i can influence which DSIC mechanism is selected within the round. ▫
For any realized τt, the executed mechanism is Mτt, which respects the budget ex post. ▫
These are “structural” statements: they do not rely on tuning, stochastic assumptions on values, or any smoothness of α(τ, ε); the only substantive requirement is that the selection step not be a function of current bids.
For each τ, the base
guarantee takes the form
$$
\mathbb{E}[W_t\mid \tau_t=\tau]\ \ge\
\frac{1}{\alpha(\tau,\varepsilon_t)}\,\mathrm{OPT}_t.
$$
Calibration yields a high-probability event {εt ∈ ℰ(zt)}.
Define the conservative (signal-only) factor
$$
\overline{\alpha}(\tau,z_t)\ :=\
\sup_{\varepsilon\in\mathcal{E}(z_t)}\alpha(\tau,\varepsilon).
$$
Then on the calibration event we have $\alpha(\tau,\varepsilon_t)\le
\overline{\alpha}(\tau,z_t)$, hence
$$
\mathbb{E}[W_t\mid \tau_t=\tau,\ \varepsilon_t\in\mathcal{E}(z_t)]\ \ge\
\frac{1}{\overline{\alpha}(\tau,z_t)}\,\mathrm{OPT}_t.
$$
This motivates a proxy “reward” Lt(τ)
that depends only on zt (and possibly
ωt) and
lower-bounds $\mathrm{OPT}_t/\overline{\alpha}(\tau,z_t)$
up to a known scale factor. In the full-information idealization, one
may take $L_t(\tau)=V/\overline{\alpha}(\tau,z_t)$
when OPTt ≤ V, which
yields a monotone transformation suitable for regret minimization over
τ. In practice, we often
replace V by an online
normalization (e.g., a clipped estimate of OPTt) to avoid
sensitivity to loose global bounds.
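The following sketch makes this construction concrete under a hypothetical error-dependent factor α(τ, ε); the functional form, the grid standing in for ℰ(zt), and the normalizer V are placeholders rather than the paper's choices:

```python
# Sketch: from a calibration set to the conservative factor alpha_bar and a
# signal-only proxy reward. alpha(), the eps grid, and V are illustrative only.
import numpy as np

def alpha(tau, eps):
    # Hypothetical factor: more trust helps when eps is small, hurts when large.
    return (2.0 - tau) + 8.0 * tau * eps

def alpha_bar(tau, eps_set):
    """Conservative factor: sup of alpha(tau, eps) over the calibrated set."""
    return max(alpha(tau, eps) for eps in eps_set)

def proxy_reward(tau, eps_set, V=1.0):
    """L_t(tau) = V / alpha_bar(tau, z_t), with V an upper bound (or a clipped
    online estimate) of OPT_t."""
    return V / alpha_bar(tau, eps_set)

eps_set = np.linspace(0.0, 0.1, 21)  # a tight confidence set: small errors plausible
for tau in (0.0, 0.5, 1.0):
    print(tau, round(alpha_bar(tau, eps_set), 3), round(proxy_reward(tau, eps_set), 3))
# With a tight set, higher trust gives a smaller alpha_bar and a larger proxy
# reward; widening eps_set (e.g., up to 0.5) reverses this ordering.
```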
A practical complication is that Wt = vt(St) is only observed for the chosen τt. When the selector randomizes over τ with probabilities πt(τ) > 0, we can form an inverse-propensity (IPS) estimate of the welfare that each τ would have obtained.
Let Wt(τ)
denote the (random) welfare that would result in round t if we ran Mτ on the
realized bids and arrivals (including Mτ’s own
randomization). The logging policy draws τt ∼ πt
and observes Wt := Wt(τt).
Define, for each τ,
$$
\widehat{W}_t(\tau)\ :=\
\frac{\mathbf{1}\{\tau_t=\tau\}}{\pi_t(\tau)}\,W_t.
$$
Conditional on ℱt − 1, τt is drawn from πt independently of the within-round outcomes. Thus
$$
\mathbb{E}[\widehat{W}_t(\tau)\mid \mathcal{F}_{t-1}]
=\sum_{\tau'}\pi_t(\tau')\frac{\mathbf{1}\{\tau'=\tau\}}{\pi_t(\tau)}\mathbb{E}[W_t(\tau')\mid
\mathcal{F}_{t-1}]
=\mathbb{E}[W_t(\tau)\mid \mathcal{F}_{t-1}].
$$
▫
The limitation is variance: if πt(τ)
is small, the importance weight explodes. A standard fix is
clipping:
$$
\widehat{W}^{\mathrm{clip}}_t(\tau)\ :=\
\frac{\mathbf{1}\{\tau_t=\tau\}}{\max\{\pi_t(\tau),\gamma\}}\,W_t,
$$
which is biased downward (hence conservative) for γ > 0. This conservatism is often
acceptable here because our regret-style guarantees compare against a
best-in-hindsight conservative benchmark $\mathrm{OPT}_t/\overline{\alpha}(\tau,z_t)$,
and because the mechanism design layer does not require unbiasedness for
truthfulness—only the learning layer’s stability does.
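A minimal sketch of the IPS and clipped-IPS estimators (the logging probabilities, the τ grid, and the clip level γ are illustrative):

```python
# Sketch of the estimators defined above; values are placeholders for illustration.
import numpy as np

def ips_estimates(tau_grid, chosen_idx, realized_welfare, probs, gamma=None):
    """Return W_hat_t(tau) for every tau in the grid.

    chosen_idx:       index of the tau actually run in round t
    realized_welfare: observed W_t = W_t(tau_{chosen})
    probs:            logging probabilities pi_t(tau), all > 0
    gamma:            optional clip level; clipping biases the estimate downward
    """
    est = np.zeros(len(tau_grid))
    denom = probs[chosen_idx] if gamma is None else max(probs[chosen_idx], gamma)
    est[chosen_idx] = realized_welfare / denom
    return est

tau_grid = [0.0, 0.5, 1.0]
probs = np.array([0.1, 0.3, 0.6])
print(ips_estimates(tau_grid, 0, 0.8, probs))              # unbiased, high variance: 8.0 at tau=0
print(ips_estimates(tau_grid, 0, 0.8, probs, gamma=0.25))  # clipped, conservative: 3.2 at tau=0
```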
We give an explicit single-round example showing that even if Mτ is DSIC for every τ, the composite rule need not be DSIC if τ is chosen as a function of the current bids.
Consider one item to be procured (so v({i}) = 1 and v(∅) = 0), budget B = 1, and two candidate posted-price mechanisms: MτH offers a posted price of 0.9 and MτL offers a posted price of 0.4, in each case approaching agents in a fixed order (agent 1 first) and purchasing the first agent whose bid is at most the posted price. Both mechanisms are DSIC: the allocation is monotone in an agent’s bid and the payment is a bid-independent posted price conditional on acceptance.
Now define a (bid-dependent) selector rule: choose τH if $\min_i b_i \le 0.5$, else choose τL. Intuitively, “if bids look cheap, trust the predictor and pay more aggressively,” but crucially the trigger is a current bid.
Let agent 1’s true cost be c1 = 0.45 and agent 2’s bid be fixed at b2 = 0.6. If agent 1 bids truthfully b1 = 0.45, then min {b1, b2} = 0.45 ≤ 0.5, so the selector picks MτH. Agent 1 accepts the offered price (0.45 ≤ 0.9) and is purchased at 0.9, yielding utility u1 = 0.9 − 0.45 = 0.45.
If instead agent 1 shades upward to b1′ = 0.55, then min {b1′, b2} = 0.55 > 0.5, so the selector picks MτL. Both bids now exceed the posted price 0.4, so no one is purchased and agent 1’s utility is 0. This deviation is unprofitable, so DSIC is not yet violated.
To generate a profitable deviation, reverse the trigger: choose τH if $\min_i b_i > 0.5$, else choose τL. With the same costs and b2 = 0.6, truthful bidding b1 = 0.45 makes min b = 0.45 ≤ 0.5, so τL is selected; both bids exceed the posted price 0.4, so agent 1 is not purchased and obtains utility 0. By misreporting upward to b1′ = 0.55, we get min b = 0.55 > 0.5, so τH is selected and agent 1 is purchased at price 0.9, obtaining utility 0.9 − 0.45 = 0.45, a strict improvement. Hence truthful reporting is not a dominant strategy.
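The following few lines (an illustrative check, with the fixed offer order assumed as above) reproduce the reversed-trigger calculation:

```python
# Sanity check of the counterexample: posted prices 0.9 (tau_H) and 0.4 (tau_L),
# offers made in a fixed order, and the bid-dependent selector "tau_H iff min bid > 0.5".

def posted_price_outcome(price, bids):
    """Offer `price` to agents in index order; buy the first whose bid <= price.
    Returns the purchased agent's index (or None) and the payment."""
    for i, b in enumerate(bids):
        if b <= price:
            return i, price
    return None, 0.0

def agent1_utility(bid1, cost1=0.45, bid2=0.6):
    price = 0.9 if min(bid1, bid2) > 0.5 else 0.4   # reversed bid-dependent trigger
    winner, payment = posted_price_outcome(price, [bid1, bid2])
    return (payment - cost1) if winner == 0 else 0.0

print(agent1_utility(0.45))  # truthful: tau_L selected, no purchase -> 0.0
print(agent1_utility(0.55))  # misreport: tau_H selected, purchased at 0.9 -> 0.45
```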
The economic point is that bid-dependent selection creates a new lever: an agent can manipulate not only whether she wins given a mechanism, but also which mechanism (i.e., which pricing rule) is run. This breaks the standard monotonicity/payment logic that underpins DSIC, even when every candidate mechanism is itself DSIC.