Deploying learning-augmented procurement mechanisms is often less constrained by the design of the underlying allocation rule than by a more prosaic question: how much should we trust the prediction, and when? In our setting, each round brings a fresh population of strategic suppliers, a hard per-round budget B, and a buyer valuation vt(⋅) (e.g., submodular coverage or diversity). A learning-augmented mechanism Mτ can exploit a prediction signal ωt (such as a forecast of OPTt or an appropriate scale parameter) to set posted prices, thresholds, or acceptance rules that perform well when the prediction is accurate. Yet these mechanisms typically expose a trust parameter τ: at τ ≈ 1 the mechanism leans aggressively on ωt and is highly “consistent” with accurate predictions; at τ ≈ 0 it becomes conservative and “robust” to errors, reverting toward worst-case guarantees. In practice, choosing τ is the main deployment bottleneck. A single fixed choice is rarely appropriate across changing market conditions, shifting supplier pools, and nonstationary model quality. At the same time, allowing τ to adapt freely to observed bids risks breaking dominant-strategy incentives and undermining the very procurement guarantees (budget feasibility, individual rationality) that motivated the mechanism in the first place.
This paper studies the governance problem of selecting τ over time, under constraints that are natural in regulated or audited procurement environments. We imagine a platform or “monitor” that is organizationally separated from the bidding process: before suppliers report costs in round t, the monitor observes an exogenous diagnostic zt measuring prediction quality (for example, a holdout loss, a drift statistic, a conformal radius, or a model-card style risk score). Crucially, zt is independent of the current bids and is observable prior to the round’s strategic interaction. The monitor must choose τt based only on these diagnostics (and their history), and the mechanism then runs Mτt with the chosen trust level. This separation is not merely a modeling convenience. It mirrors common governance constraints—firewalls between prediction systems and procurement decisions, documentation requirements for why a model is trusted, and restrictions on conditioning policy parameters on the content of submitted bids. Our aim is to show that such constraints are compatible with strong welfare performance, and in fact can be leveraged to obtain clean, auditable guarantees.
The technical starting point is that, for each fixed τ, the learning-augmented mechanism comes with an error-dependent approximation factor α(τ, εt), where εt summarizes the prediction error in round t. This captures a familiar tradeoff: more trust yields better performance when εt is small but can degrade sharply when εt is large. The challenge is that εt is not observed at decision time, and ωt alone does not certify its own accuracy. Our key move is to route adaptivity through calibrated diagnostics. We assume that zt induces a confidence set ℰ(zt) ⊆ [0, 1) such that Pr [εt ∈ ℰ(zt)] ≥ 1 − δ. This transforms the problem from “guess εt” to “optimize against a set of plausible errors,” a form that naturally supports worst-case reasoning and accountability: a policymaker can point to ℰ(zt) as an explicit, statistically justified uncertainty statement.
Conceptually, this lets us define for each τ a calibrated worst-case
factor
$$
\overline{\alpha}(\tau,z_t)\ :=\ \sup_{\varepsilon\in\mathcal{E}(z_t)}
\alpha(\tau,\varepsilon),
$$
which is computable from the diagnostic zt (given the
known performance profile α)
and yields a round-specific notion of how risky it is to trust the
prediction at level τ. The
monitor’s problem becomes an online choice over a finite menu 𝒯: select τt to maximize
welfare lower bounds implied by $\overline{\alpha}(\tau,z_t)$, subject to the
imperative that τt not depend on
contemporaneous bids. This is where the economic logic meets online
learning: we can view each τ ∈ 𝒯 as an “expert,” and each round
supplies (through zt) a prediction
of how well that expert would perform in a worst-case sense. The monitor
then updates its choice rule over time, balancing exploration and
exploitation, but doing so entirely outside the strategic channel.
Our contributions are threefold. First, we formalize a meta-mechanism architecture in which the selection of τt is measurably independent of current reports, while the within-round procurement remains a standard universally truthful, individually rational, ex post budget-feasible mechanism. This yields an end-to-end procedure that preserves dominant-strategy incentives despite adaptivity over time. The key intuition is simple: dominant strategies are robust to randomization and to history dependence, but not to within-round feedback loops that let a bidder manipulate the policy parameter governing her own acceptance or payment. By insisting that τt is chosen before bids arrive and as a function of the diagnostic history (z1, …, zt) only, we eliminate precisely that manipulation channel.
Second, we provide a welfare guarantee that is “best-in-hindsight” over the trust menu. Rather than comparing to a single fixed τ chosen ex ante, we show that the meta-procedure tracks the best static choice in 𝒯 evaluated under the calibrated worst-case factors $\overline{\alpha}(\tau,z_t)$, up to a standard regret term scaling as $O(\sqrt{T\log|\mathcal{T}|})$ times an appropriate welfare scale V. Economically, this means the monitor can be conservative when diagnostics indicate low reliability, and opportunistic when diagnostics certify high reliability, without paying much more than the unavoidable learning cost for not knowing ex ante which trust level will be most appropriate across the realized sequence of environments.
Third, we clarify a negative result that motivates our governance constraint: if τt is allowed to depend on even a single agent’s current report, dominant-strategy truthfulness can fail. This observation is practically important because “data-driven policy knobs” are often tuned using real-time bid information (e.g., adjusting thresholds when bids seem high). Our framework explains why such tuning is not innocuous in strategic environments: it changes the mapping from an agent’s report to the mechanism’s rule, creating profitable deviations even when each Mτ is truthful in isolation.
We view these results as an argument for a particular division of labor in learning-augmented procurement: predictions can and should inform allocation, but the degree of trust should be mediated by exogenous, auditable calibration signals rather than endogenous bid data. The model does not claim that all relevant information about prediction quality is captured by zt, nor that calibration can be achieved without cost; rather, it isolates a tractable and institutionally plausible route to combine learning-augmentation with incentive compatibility. In doing so, it turns the “choose τ” bottleneck from an ad hoc deployment choice into a principled, analyzable component of the mechanism.
A first strand of work that motivates our approach studies learning-augmented algorithms, where an online algorithm is given a prediction ω (or advice) and is evaluated along two dimensions: it should be consistent—performing near-optimally when the prediction is accurate—and robust—not degrading too much when the prediction is wrong. This consistency/robustness paradigm appears across caching, paging, scheduling, and covering problems, often via an explicit interpolation parameter that mixes a prediction-driven policy with a worst-case baseline (e.g., “follow the advice” with probability τ, otherwise revert to a classical competitive algorithm). Conceptually, our trust parameter τ plays the same role: at higher τ, the mechanism is more consistent with ωt, while at lower τ it becomes robust to error. A key difference is that procurement introduces a strategic layer: unlike many learning-augmented algorithmic settings, the input is not passively revealed but is reported by self-interested agents, so the standard ways of tuning τ using observed outcomes can create incentive problems. This is precisely why we route adaptivity through an exogenous diagnostic zt rather than through bids.
A second, closely related literature is learning-augmented mechanism design, which asks how predictions about values, costs, or future arrivals can improve welfare, revenue, or budget utilization while maintaining incentive constraints. Here, the relevant object is not only an allocation rule but a truthful payment scheme (e.g., posted prices, threshold payments, or variants of Myerson-style monotonicity arguments). Several papers show how predictions can be incorporated while preserving dominant strategies, typically by ensuring that the allocation is monotone in each agent’s report and payments are defined by critical thresholds. In our environment, we take as a primitive a finite menu {Mτ}τ ∈ 𝒯 of such truthful, budget-feasible mechanisms, each equipped with an error-dependent guarantee summarized by α(τ, ε). Our contribution is not to re-derive truthfulness for a particular Mτ, but to clarify an additional governance layer: even if each mechanism is DSIC “in isolation,” adapting the mechanism choice to contemporaneous reports can destroy DSIC. This observation connects to a broader theme in learning-augmented mechanism design: preserving incentives often requires strict separations between what is learned, what is predicted, and what is optimized on the basis of strategic reports.
A third body of work concerns online budget-feasible procurement mechanisms, especially for monotone submodular valuations under a hard budget. Classic results develop posted-price and threshold-based online mechanisms that are universally truthful, individually rational, and ex post budget-feasible, often with constant-factor approximations under random arrival or secretary-style assumptions, and with weaker guarantees in fully adversarial orders. More recent work augments these mechanisms with predictions—e.g., forecasts of OPTt, of the distribution of costs, or of an appropriate threshold/price level—to improve welfare when predictions are accurate. These designs naturally introduce a mixing knob (our τ) that interpolates between a “predicted threshold” and a “safe threshold” that guarantees feasibility and a baseline approximation. Our paper is aligned with this line: we explicitly model a prediction signal ωt that can be informative but unreliable, and we summarize the resulting welfare via a function α(τ, εt). The distinctive aspect of our framework is that we separate mechanism design (within a round) from policy selection (across rounds): the within-round mechanism is held fixed once τt is chosen, while τt itself is selected by a monitor that is restricted to use only exogenous diagnostics. This separation matches how prediction-assisted procurement is often deployed: a model-risk or compliance function may certify when a predictive model is “in scope,” while the procurement engine must remain DSIC and budget-feasible regardless of such certification.
A fourth ingredient is the literature on calibration and predictive uncertainty as a tool for decision-making. While calibration is usually studied for probabilistic forecasts, conformal prediction, or uncertainty quantification, the economic role is similar: a diagnostic zt induces a set of plausible errors ℰ(zt) with a coverage guarantee 1 − δ. Our use of calibration is deliberately minimalistic. We do not require that ωt be unbiased or that errors be i.i.d.; we only require an exogenous statement of uncertainty that is measurable before the strategic interaction. This framing lets us convert the unobserved error εt into a worst-case factor $\overline{\alpha}(\tau,z_t)=\sup_{\varepsilon\in\mathcal{E}(z_t)}\alpha(\tau,\varepsilon)$, which is the key quantity fed into the meta-level selection rule. In this sense, we import a familiar idea from robust optimization—optimize against a confidence region—into a repeated mechanism-selection problem where incentives constrain what may be conditioned on.
Finally, our meta-mechanism draws on online learning with expert advice. If each τ ∈ 𝒯 is viewed as an expert, then each round provides a signal-dependent proxy for the welfare one could guarantee by choosing that expert. Standard regret-minimization tools (e.g., multiplicative weights) imply that the cumulative performance of the selector competes with the best fixed expert in hindsight up to $O(\sqrt{T\log|\mathcal{T}|})$ regret, provided the per-round feedback is well-defined and bounded. What is novel in our setting is the interplay with strategic behavior: we must ensure that the “losses” used by the learning rule are computed without peeking at bids in a way that would make agents pivotal in the choice of τt. Put differently, in a non-strategic online learning problem it is innocuous to define feedback using the realized instance; in procurement, using bid-dependent feedback to set τt can endogenize the mechanism itself and create profitable deviations. Our restriction that τt be measurable with respect to σ(z1, …, zt) can be seen as an incentive-compatible analogue of the non-anticipation constraints common in online learning, tailored to preserve dominant strategies.
Taken together, these strands suggest a common lesson: learning-augmentation is most powerful when it provides a “fast lane” under accuracy, but it requires an explicit “guardrail” under error. Our contribution is to formalize a guardrail that is simultaneously statistical (through calibration sets), economic (through DSIC and budget feasibility), and algorithmic (through expert-style selection over τ).
We study a repeated online procurement problem over rounds t = 1, …, T. In each round a single buyer (the principal) wishes to procure a subset of suppliers subject to a hard budget constraint B that applies within the round. The buyer’s benefit from a set of procured suppliers St ⊆ Nt is given by a monotone submodular valuation function vt : 2Nt → ℝ ≥ 0, accessed via a value oracle. The submodularity assumption captures diminishing returns and allows us to interpret the buyer’s objective as a welfare-maximization problem with combinatorial structure, rather than a pure “cheapest wins” procurement.
Agents and private information. In round t, a set Nt of suppliers
arrives (potentially random, but observed by the mechanism within the
round). Each supplier i ∈ Nt
has a private cost ci, t ≥ 0
for being selected. Suppliers are single-parameter: their type is fully
summarized by this cost. They submit bids bi, t
to the platform. If selected, supplier i receives a payment pi, t
and obtains utility
$$
u_{i,t}=
\begin{cases}
p_{i,t}-c_{i,t}, & i\in S_t,\\
0, & i\notin S_t.
\end{cases}
$$
The buyer’s realized welfare in round t is Wt := vt(St),
subject to an ex post budget constraint ∑i ∈ Stpi, t ≤ B.
Benchmark and per-round optimality. To evaluate
performance we compare against the offline budget-feasible optimum
within each round:
OPTt := max {vt(S) : S ⊆ Nt, ∑i ∈ Sci, t ≤ B}.
This benchmark is intentionally demanding: it knows all realized costs
in round t and solves the
knapsack-like submodular maximization problem under the budget. Our
mechanisms will compete with OPTt only up to
approximation factors, reflecting both online constraints and incentive
constraints.
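To make the structure of this benchmark concrete, the following is a minimal sketch of a greedy marginal-value-per-cost heuristic for the knapsack-like submodular problem that OPTt solves exactly; the function names are ours, and the sketch is neither the truthful online mechanism analyzed below nor, by itself, a constant-factor approximation (that typically also requires comparing with the best affordable singleton).

```python
from typing import Callable, Dict, List, Set

def greedy_benchmark(suppliers: List[str],
                     cost: Dict[str, float],
                     value: Callable[[Set[str]], float],
                     budget: float) -> Set[str]:
    """Greedy marginal-value-per-cost heuristic for the offline benchmark OPT_t."""
    selected: Set[str] = set()
    spent = 0.0
    remaining = set(suppliers)
    while True:
        best, best_ratio = None, 0.0
        for i in remaining:
            if spent + cost[i] > budget:
                continue  # cannot afford supplier i under the hard budget B
            gain = value(selected | {i}) - value(selected)
            ratio = gain / cost[i] if cost[i] > 0 else float("inf")
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            return selected
        selected.add(best)
        spent += cost[best]
        remaining.remove(best)
```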
A menu of learning-augmented mechanisms. Rather than designing a single mechanism, we assume the platform has access to a finite menu {Mτ : τ ∈ 𝒯} indexed by a trust parameter τ. Each Mτ is an online procurement mechanism that, for a fixed τ, is (i) dominant-strategy incentive compatible (DSIC) within the round, (ii) individually rational (IR), and (iii) ex post budget-feasible. Moreover, each Mτ is learning-augmented: it is allowed to use an externally provided prediction signal ωt in round t (e.g., a forecast of OPTt, or a forecast of a good posted-price/threshold level).
We summarize the benefit of ωt through an
error-dependent welfare guarantee. Specifically, there is an
unobserved prediction error parameter εt ∈ [0, 1) (we
will later instantiate it with a concrete relation such as ωt = (1 − εt)OPTt
for underestimation). The key property is that for each τ and each round t,
$$
\mathbb{E}\big[v_t(S_t)\,\big|\,\tau_t=\tau\big]\ \ge\
\frac{1}{\alpha(\tau,\varepsilon_t)}\ \mathrm{OPT}_t,
$$
where α(τ, ε) ≥ 1 is an
approximation/competitive factor capturing the usual
consistency–robustness tradeoff: for more “trusting” τ, α(τ, ε) can be
small when ε is small, but may
grow when ε is large; for more
conservative τ, α(τ, ε) is flatter
but may never be as small under perfect predictions.
Calibration signals and uncertainty sets. A central modeling choice is that the platform does not directly observe εt, nor can it reliably infer εt from bids without risking endogeneity. Instead, before bids are collected in round t, the platform observes an exogenous calibration/quality signal zt. This signal could represent, for instance, a holdout loss of the predictive model on recent non-strategic data, a conformal prediction radius, or a model-monitoring statistic used in practice by a risk/compliance function.
From zt, the platform
constructs a confidence set ℰ(zt) ⊆ [0, 1)
of plausible prediction errors such that
Pr [εt ∈ ℰ(zt)] ≥ 1 − δ,
for a prescribed failure probability δ. We emphasize what we do
not assume: we do not need εt to be i.i.d.
over time, nor ωt to be
unbiased; the calibration guarantee is allowed to be distribution-free
and time-varying as long as it is valid at the stated level.
This calibration device lets us convert the unobserved εt into a
worst-case welfare factor that is computable from zt alone:
$$
\overline{\alpha}(\tau,z_t)\ :=\ \sup_{\varepsilon\in
\mathcal{E}(z_t)}\alpha(\tau,\varepsilon).
$$
Intuitively, $\overline{\alpha}(\tau,z_t)$ is the
“model-risk-adjusted” approximation factor for choosing trust level
τ in a round whose diagnostic
is zt.
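For concreteness, a minimal sketch of how $\overline{\alpha}(\tau,z_t)$ could be computed when the analytical profile α(τ, ε) is available in closed form and ℰ(zt) is an interval. The particular functional form of α below (a mixture-style bound) and its constants are hypothetical placeholders for illustration, not results of this paper.

```python
import numpy as np

def alpha(tau: float, eps: float, c_rob: float = 4.0) -> float:
    """Hypothetical error-dependent factor alpha(tau, eps) >= 1.

    Shaped like the schematic mixture bound min{robust/(1-tau), consistency(eps)/tau}:
    trusting the prediction (large tau) is good for small eps, risky for large eps.
    """
    robust = c_rob / (1.0 - tau) if tau < 1.0 else float("inf")
    consistent = (1.0 / (1.0 - eps)) / tau if tau > 0.0 else float("inf")
    return max(1.0, min(robust, consistent))

def alpha_bar(tau: float, eps_interval: tuple, grid_size: int = 256) -> float:
    """Calibrated worst-case factor: sup of alpha(tau, .) over the set E(z_t)."""
    lo, hi = eps_interval
    grid = np.linspace(lo, min(hi, 1.0 - 1e-9), grid_size)
    return float(max(alpha(tau, e) for e in grid))

# A tight diagnostic favors trusting the prediction; a loose one does not.
print(alpha_bar(0.9, (0.0, 0.05)), alpha_bar(0.9, (0.0, 0.95)))
print(alpha_bar(0.1, (0.0, 0.05)), alpha_bar(0.1, (0.0, 0.95)))
```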
Timing and exogeneity restrictions. The round-t sequence is:
1. The platform observes (ωt, zt)
and chooses τt.
2. Suppliers in Nt submit bids
bi, t.
3. The platform runs Mτt
online, making irrevocable accept/reject decisions and payments,
producing St and {pi, t}.
4. Welfare vt(St)
is realized.
The crucial exogeneity conditions are:
- zt (and
hence ℰ(zt)) is
independent of current-round bids b⋅, t.
- The chosen trust parameter τt is measurable
with respect to σ(z1, …, zt)
(and any public context), but not a function of the current
bids.
This restriction is not cosmetic. It formalizes a separation-of-duties principle: model monitoring and mechanism operation can interact, but the monitor does not condition the mechanism choice on the strategic reports it will later elicit. In applications, this matches the idea that the platform may certify whether the predictive model is “in policy” based on diagnostics that are not manipulable by suppliers within the procurement event.
Performance benchmark across rounds. Because the
platform chooses τt adaptively,
we evaluate it against a natural “best fixed trust level in hindsight”
benchmark that accounts for calibration. In each round, choosing τ can guarantee at least $\mathrm{OPT}_t/\overline{\alpha}(\tau,z_t)$
whenever εt ∈ ℰ(zt).
Summing across rounds, the benchmark is
$$
\max_{\tau\in\mathcal{T}}\ \sum_{t=1}^T
\frac{1}{\overline{\alpha}(\tau,z_t)}\ \mathrm{OPT}_t,
$$
which selects a single τ that
would have performed best given the realized sequence of diagnostics.
Our meta-level goal is then to achieve cumulative welfare close to this
benchmark up to a standard regret term (depending on T and |𝒯|) and a scale parameter V that bounds per-round values such
as OPTt.
We view this benchmark as striking the right balance between ambition and feasibility: it credits the platform for using uncertainty information zt, but it does not allow hindsight tuning of τ to the realized bids or realized εt. In particular, it reflects the practical constraint that the platform can pre-commit to a monitoring-and-selection policy but cannot safely “peek” into strategic inputs to decide how much it trusts the prediction in that same round.
At the core of our approach is a deliberate modularity: we do not hard-code a single learning-augmented procurement rule, but instead assume the platform has access to a menu {Mτ : τ ∈ 𝒯} of candidate online mechanisms indexed by a scalar “trust” knob τ. The economic motivation is straightforward. In procurement with predictions, there is typically no uniformly best way to incorporate ωt: leaning heavily on ωt can yield near-optimal welfare when predictions are accurate, yet it can be fragile when the signal is miscalibrated or stale. Conversely, conservative rules are robust but leave value on the table when the model is performing well. The menu representation turns this modeling tension into an explicit design object, and allows the meta-layer (Section 5) to arbitrate among mechanisms without ever conditioning on strategic reports within the round.
Formally, each Mτ is an online, budget-feasible, truthful mechanism for the single-parameter environment induced by costs. We treat the within-round arrival process as arbitrary (and potentially adaptive), and require the mechanism to make irrevocable accept/reject decisions as suppliers arrive. The trust parameter τ is fixed at the start of the round, publicly known, and then the mechanism runs to completion.
We impose three non-negotiable properties on each Mτ.
(i) Universal truthfulness (within a round). Each Mτ may randomize, but its randomization must be “outside” the strategic interaction in the standard strong sense: Mτ is a distribution over deterministic mechanisms such that, for every realization of the mechanism’s internal randomness and every realized arrival sequence, truthful bidding is a dominant strategy. Concretely, if we write Mτ as drawing a seed r and then running a deterministic rule Mτr, we require that for all r, Mτr is DSIC in the single-parameter sense: the allocation rule is monotone in each bid and payments are the corresponding critical values. This property matters because the meta-layer will adapt τ over time; universal truthfulness ensures that the mechanism remains DSIC even when the selector’s randomness and history-dependent choices are common knowledge.
(ii) Ex post budget feasibility. For every bid
profile bt
and every realization of internal randomness, total payments must
satisfy
∑i ∈ Ntpi, t(bt) ≤ B.
We emphasize the ex post nature: the budget constraint is not
only in expectation, but holds pathwise. This aligns with procurement
practice, where payments must be authorized within a fixed spending
limit rather than amortized across rounds.
(iii) Individual rationality. If a supplier is
selected, then the payment covers its bid, i.e.,
i ∈ St(bt) ⇒ pi, t(bt) ≥ bi, t.
Given DSIC, IR can be interpreted as ensuring nonnegative utility for
truthful bidders, which is essential for participation in repeated
settings.
These properties are standard, but they strongly restrict how prediction signals can be used. In particular, while ωt can influence posted prices or acceptance thresholds, it cannot enter in a way that makes an agent’s acceptance depend non-monotonically on its own bid, nor can it induce budget violations when many agents accept simultaneously.
What distinguishes the menu {Mτ} from an
arbitrary collection of truthful mechanisms is a common,
prediction-aware performance guarantee. We assume that in each round
there is an (unobserved) error parameter εt ∈ [0, 1)
summarizing the quality of ωt, and that
each Mτ
admits an error-dependent approximation factor α(τ, ε) ≥ 1 such
that
$$
\mathbb{E}\big[v_t(S_t)\,\big|\,\tau_t=\tau\big]\ \ge\
\frac{1}{\alpha(\tau,\varepsilon_t)}\ \mathrm{OPT}_t.
$$
The expectation is taken over the mechanism’s internal randomness and
any randomness in the arrival process (holding fixed the realized
round-t instance).
Economically, α(τ, ε) encodes
the consistency–robustness frontier: for larger τ (more trust in predictions), α(τ, ε) may be
close to 1 when ε is small, but can degrade rapidly
when ε is large; for smaller
τ, α is typically flatter in ε, reflecting robust performance
even under poor predictions.
We do not require a particular functional form for α. What we will rely on later is simply that (a) α(τ, ε) is known to the platform for each menu element (as an analytical guarantee), and (b) it can be evaluated under uncertainty sets ℰ(zt) to produce the worst-case calibrated factor $\overline{\alpha}(\tau,z_t)$.
To ground the abstraction, it is helpful to keep in mind a common template that satisfies the above constraints: learning-augmented posted-price mechanisms implemented as mixtures of a prediction-driven rule and a robust fallback.
A prototypical “Mechanism 1(τ)” proceeds as follows. At the
start of round t, it uses
ωt to
compute a target acceptance threshold—for example, a
marginal-value-per-dollar cutoff λtpred
intended to mimic the Lagrange multiplier of the offline budget-feasible
optimum. Independently, it also computes a conservative cutoff λtrob
that does not rely on ωt (or relies on
it only through safe bounds). Then it randomizes:
$$
\lambda_t\ =\
\begin{cases}
\lambda^{\mathrm{pred}}_t,& \text{with prob. }\tau,\\
\lambda^{\mathrm{rob}}_t,& \text{with prob. }1-\tau.
\end{cases}
$$
Given λt,
the mechanism implements a posted-price/threshold rule while suppliers
arrive: it accepts supplier i
if its bid is below a critical price derived from its marginal
contribution relative to λt and the
remaining budget. In submodular procurement, one convenient
representation is to evaluate the marginal value Δi = vt(S ∪ {i}) − vt(S)
at the current accepted set S,
and offer a take-it-or-leave-it price proportional to Δi/λt,
truncated to respect remaining budget. Payments are exactly these posted
prices, so each supplier faces a single-parameter take-it-or-leave-it
decision, which yields DSIC and IR immediately.
Budget feasibility is ensured mechanically by construction: because prices are posted and each acceptance debits the remaining budget, total payments never exceed B ex post. Universal truthfulness follows because the only randomness is the initial draw of λt; conditional on that draw, the posted-price mechanism is deterministic and truthful.
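To fix ideas, here is a minimal sketch of such a mixture mechanism under the properties above. The price rule Δi/λ and the way λpred would be derived from ωt are schematic, and the helper names are ours rather than a specific mechanism from the literature.

```python
import random
from typing import Callable, Dict, Iterable, Set, Tuple

def run_mixture_mechanism(arrivals: Iterable[Tuple[str, float]],
                          value: Callable[[Set[str]], float],
                          budget: float,
                          lam_pred: float,
                          lam_rob: float,
                          tau: float,
                          rng: random.Random) -> Tuple[Set[str], Dict[str, float]]:
    """Sketch of 'Mechanism 1(tau)': a posted-price rule with a random threshold.

    The only randomness is the initial draw of lambda_t (lam_pred w.p. tau,
    lam_rob otherwise).  Conditional on the draw, each arriving supplier faces
    a take-it-or-leave-it price that does not depend on its own bid, so the
    rule is DSIC and IR; debiting the remaining budget at each acceptance
    makes it ex post budget-feasible.
    """
    lam = lam_pred if rng.random() < tau else lam_rob
    selected: Set[str] = set()
    payments: Dict[str, float] = {}
    remaining = budget
    for supplier, bid in arrivals:          # irrevocable online decisions
        marginal = value(selected | {supplier}) - value(selected)
        price = min(marginal / lam, remaining)
        if bid <= price:                    # supplier accepts the posted price
            selected.add(supplier)
            payments[supplier] = price
            remaining -= price
    return selected, payments
```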
The welfare analysis of this family typically has the following
structure. When predictions are accurate (small εt), λtpred
approximates the “right” tradeoff between marginal values and costs, so
the τ-branch attains a
constant-factor approximation close to 1 (up to the inherent online/submodular
loss). When predictions are inaccurate, the robust branch protects
welfare with a worst-case constant approximation that does not
deteriorate with εt. One then
obtains an error-dependent bound of the schematic form
$$
\alpha(\tau,\varepsilon)\ \lesssim\ \min\left\{\frac{\text{robust constant}}{1-\tau},\ \frac{\text{consistency term increasing in }\varepsilon}{\tau}\right\},
$$
capturing precisely the tradeoff governed by τ. While the constants and the
dependence on ε depend on the
specific instantiation (e.g., whether ωt predicts
OPTt, a multiplier
λ*, or a good price
ladder), the overarching point is that mixtures let us separate
incentive/budget feasibility (enforced pointwise) from prediction usage
(which enters only through the posted-price level).
From an implementation perspective, the menu 𝒯 can be read as a small set of platform policies: “aggressive,” “balanced,” and “conservative” ways of turning a model score into procurement prices. This is often closer to practice than a continuously tuned parameter, and it simplifies governance: each Mτ can be audited for DSIC and budget compliance, while the meta-layer merely chooses among pre-approved policies based on exogenous diagnostics.
The limitation is equally clear. Our assumptions place substantial weight on the existence of analytically certified α(τ, ε) bounds for each menu element. This is realistic for posted-price/threshold mechanisms where DSIC and budget feasibility are transparent, but it is more restrictive for sophisticated allocation rules whose incentive properties are delicate. The menu abstraction is therefore best viewed as a disciplined interface: it packages complex within-round mechanism design into a small number of vetted options, enabling principled cross-round adaptation without reopening the incentive constraints each time we update how much we “trust” the model.
Having fixed a vetted menu {Mτ : τ ∈ 𝒯}, the remaining design question is how the platform selects τt over time. Conceptually, τ is a policy choice—how aggressively we incorporate the prediction ωt. The difficulty is that this choice must be made (i) online and (ii) in a way that does not entangle the within-round strategic game. Our guiding restriction is therefore measurability: τt must be chosen using only exogenous information observed before bids in round t (in particular, zt and past history), never as a function of current reports b⋅, t.
We present two meta-selection templates, corresponding to whether the platform can compute, prior to bidding, informative per-round “scores” for each τ. The first is a full-information selector based on calibrated lower bounds derived from zt. The second is a bandit selector that learns from realized welfare outcomes.
In some environments, the calibration signal zt is rich
enough that—before seeing bids—we can bound how well each candidate
mechanism will perform in that round up to the (unknown) OPTt. The key object is
the worst-case approximation factor under the calibrated error
set,
$$
\overline{\alpha}(\tau,z_t)\ :=\
\sup_{\varepsilon\in\mathcal{E}(z_t)}\alpha(\tau,\varepsilon),
$$
which is computable from the known guarantee α(τ, ε) and the
confidence set ℰ(zt).
This suggests a natural “expert advice” viewpoint: each τ ∈ 𝒯 is an expert, and in round
t expert τ offers a certified welfare
fraction $1/\overline{\alpha}(\tau,z_t)$ of the
(unobserved) OPTt.
Of course, OPTt
itself is unknown, and we cannot score experts by $\mathrm{OPT}_t/\overline{\alpha}(\tau,z_t)$
directly. We therefore introduce a scale proxy Lt(τ)
that is computable from zt alone and is
monotone in $1/\overline{\alpha}(\tau,z_t)$. A convenient
generic choice is
$$
L_t(\tau)\ :=\ \frac{1}{\overline{\alpha}(\tau,z_t)}\cdot \widehat{V}_t,
$$
where V̂t
is any nonnegative magnitude estimate available ex ante (for instance, a
conservative upper bound V, or
a context-dependent bound on attainable welfare). When ωt predicts
value (e.g., ωt ≈ OPTt
up to multiplicative error), we may set V̂t = ωt
to sharpen adaptation; when ωt is not
value-calibrated, we fall back to a uniform scale V̂t = V
to obtain a scale-stable selector.
Given scores {Lt(τ)}τ ∈ 𝒯,
we can run a standard full-information online learning rule such as
multiplicative weights (Hedge). Writing wt(τ)
for the weight on expert τ at
the start of round t, one
instantiation is
$$
p_t(\tau)\ =\ \frac{w_t(\tau)}{\sum_{\tau'}w_t(\tau')},\qquad
w_{t+1}(\tau)\ =\ w_t(\tau)\exp\big(\eta\, L_t(\tau)/V\big),
$$
and then we sample τt ∼ pt
and run Mτt.
The normalization by V is
purely to keep learning rates well-scaled; any boundedness assumption
that makes Lt(τ)/V ∈ [0, 1]
suffices.
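A minimal sketch of this selector, assuming the per-round scores Lt(τ) ∈ [0, V] have already been computed from zt (for instance via a calibrated $\overline{\alpha}$ routine as above); the class and variable names are illustrative.

```python
import numpy as np

class HedgeSelector:
    """Full-information multiplicative-weights selection of the trust level."""

    def __init__(self, menu, horizon: int, V: float):
        self.menu = list(menu)
        self.V = V
        self.eta = np.sqrt(np.log(len(self.menu)) / max(horizon, 1))
        self.log_weights = np.zeros(len(self.menu))

    def probabilities(self) -> np.ndarray:
        w = np.exp(self.log_weights - self.log_weights.max())  # stable softmax
        return w / w.sum()

    def select(self, rng: np.random.Generator) -> float:
        # tau_t is sampled before any bids are observed in round t.
        return self.menu[rng.choice(len(self.menu), p=self.probabilities())]

    def update(self, scores: dict) -> None:
        # scores[tau] = L_t(tau), computed from z_t only (never from bids).
        for k, tau in enumerate(self.menu):
            self.log_weights[k] += self.eta * scores[tau] / self.V
```

In a deployment loop, select() corresponds to step 1 of the round-t timing above and update() is called once zt has been processed; bids never enter either call.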
Economically, this rule says: when zt indicates the prediction is reliable (so ℰ(zt) is small), experts with larger τ tend to have smaller $\overline{\alpha}(\tau,z_t)$, hence larger Lt(τ), and accumulate weight. When zt flags low reliability, the robust mechanisms are rewarded because their $\overline{\alpha}(\tau,z_t)$ deteriorates less. Crucially, this arbitration happens entirely “outside” the bidding game.
This full-information route is appropriate when zt can be interpreted operationally—for example, as a conformal prediction radius, a drift statistic, or an out-of-sample loss—and when the platform can credibly map it to a set ℰ(zt) that controls worst-case degradation.
In other settings, the platform may not trust any ex ante scoring rule beyond a very weak bound, even if it has a diagnostic signal zt. Or the platform may prefer to learn directly from outcomes because the mapping zt ↦ ℰ(zt) is conservative. In such cases we can treat τ-selection as a bandit problem: in each round we choose one τt, observe the realized welfare Wt = vt(St), and update.
To apply adversarial bandit algorithms (e.g., Exp3), we need an unbiased per-expert reward
estimate. Let 𝒯 be finite and suppose
we sample τt from a mixed
strategy pt with full
support (often enforced via explicit exploration). After observing Wt, define the
importance-weighted estimate
$$
\widehat{g}_t(\tau)\ :=\ \frac{\mathbf{1}\{\tau_t=\tau\}\,
W_t}{p_t(\tau)}.
$$
Then 𝔼[ĝt(τ) ∣ history] = 𝔼[Wt ∣ τt = τ],
so ĝt(τ)
is an unbiased estimator of expert τ’s expected welfare in that round
(conditional on the realized instance). We can therefore update weights
similarly to multiplicative weights, replacing Lt(τ)
with ĝt(τ)
(again normalized by V to
control learning rates).
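A minimal sketch of this bandit variant, assuming realized welfare Wt ∈ [0, V] is observed after the round; the names and the explicit-exploration parameterization are illustrative.

```python
import numpy as np

class Exp3Selector:
    """Bandit selection of tau_t from realized welfare via importance weighting."""

    def __init__(self, menu, eta: float, gamma: float, V: float):
        self.menu, self.eta, self.gamma, self.V = list(menu), eta, gamma, V
        self.log_weights = np.zeros(len(self.menu))
        self._probs = None
        self._arm = None

    def select(self, rng: np.random.Generator) -> float:
        w = np.exp(self.log_weights - self.log_weights.max())
        q = w / w.sum()
        # Mix with uniform exploration so every tau keeps probability >= gamma/|T|.
        self._probs = (1.0 - self.gamma) * q + self.gamma / len(self.menu)
        self._arm = rng.choice(len(self.menu), p=self._probs)
        return self.menu[self._arm]

    def update(self, realized_welfare: float) -> None:
        # Importance-weighted estimate g_hat(tau) = 1{tau_t = tau} W_t / p_t(tau).
        g_hat = realized_welfare / self._probs[self._arm]
        self.log_weights[self._arm] += self.eta * g_hat / self.V
```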
Two practical remarks matter here. First, bandit feedback uses realized welfare, which depends on bids and arrivals; nevertheless, the selector’s action τt is still chosen before bids, so the informational dependence does not create strategic manipulation within the round. Second, bandit learning is typically noisier and requires exploration: if we never try a given τ, we cannot learn its value. This exploration can be interpreted as deliberate policy experimentation, which is often acceptable in repeated procurement so long as each tested mechanism remains budget-feasible and truthful.
When should we prefer the bandit route? It is attractive when (i) welfare can be reliably computed at the end of each round (via a value oracle for vt), (ii) the platform distrusts the ex ante calibration mapping or expects model performance to be nonstationary in ways zt does not capture, or (iii) the platform seeks robustness to misspecification of α(τ, ε) by learning which τ works empirically. The cost is statistical: bandit regret scales worse than full-information regret in general, reflecting that we only observe the payoff of the chosen policy.
A pragmatic compromise is to use zt to shape
exploration. For instance, we can set a prior or baseline
distribution qt(τ) ∝ exp (ηLt(τ)/V)
from calibrated scores, and then mix it with uniform exploration to
ensure support:
pt(τ) = (1 − γ) qt(τ) + γ/|𝒯|.
This hybrid retains the safety interpretation of calibration (we do not
over-commit to brittle τ when
zt looks
unfavorable) while still allowing the platform to learn from realized
welfare in regimes where the diagnostics are imperfect.
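A minimal sketch of this hybrid distribution, with the same illustrative naming as above:

```python
import numpy as np

def hybrid_distribution(scores: dict, menu: list, eta: float,
                        V: float, gamma: float) -> np.ndarray:
    """Baseline q_t from exogenous calibrated scores, mixed with uniform exploration."""
    logits = np.array([eta * scores[tau] / V for tau in menu])
    q = np.exp(logits - logits.max())
    q = q / q.sum()
    return (1.0 - gamma) * q + gamma / len(menu)
```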
Across all variants, the meta-layer plays a narrow but crucial role: it adapts among pre-approved truthful mechanisms using only exogenous signals and past outcomes, thereby improving cumulative welfare without reintroducing within-round incentive problems.
We now formalize the guarantees delivered by the meta-layer: selecting τt using only exogenous information preserves the per-round incentive properties of the base mechanisms, and standard online-learning analysis converts per-round certified performance into an end-to-end welfare bound relative to the best fixed τ ∈ 𝒯. We also make explicit how the calibration failure probability δ enters the welfare guarantee.
The first result is essentially a “conditioning” argument: since τt is chosen before bids and independently of them, the within-round strategic game faced by agents is exactly the one induced by the chosen mechanism Mτt.
Theorem 6.1. Fix any (possibly randomized) meta-rule Meta such that, in each round t, τt is measurable
with respect to σ(z1, …, zt)
(and any public context) and is independent of the current bid profile
b⋅, t.
Suppose that for every τ ∈ 𝒯,
the mechanism Mτ is
universally truthful (DSIC), individually rational, and ex post
budget-feasible in each round. Then the composed procedure that
samples/chooses τt via Meta and runs Mτt
in round t is also universally
truthful, individually rational, and ex post budget-feasible in each
round.
Proof. Condition on any realization of τt = τ and any arrival realization within round t. By assumption, the induced mechanism is Mτ, which is DSIC, IR, and budget-feasible. Since τt is selected independently of b⋅, t, an agent cannot affect the distribution over τt by misreporting in round t. Therefore, truthful bidding weakly dominates deviations when we take expectations over τt (universal truthfulness follows because the composition is a distribution over deterministic DSIC mechanisms, indexed by τt and internal randomness). Budget feasibility and IR hold pointwise for the realized τt, hence for the mixture as well. ▫
Economically, Theorem 6.1 tells us that we can treat τ as a platform-side “policy knob” that is tuned using diagnostics zt and past outcomes, without reopening incentive issues—provided the knob is set before agents play and is not responsive to their current reports.
Next we connect the meta-layer’s learning dynamics to welfare. The key observation is that each τ comes with a calibrated worst-case approximation factor $\overline{\alpha}(\tau,z_t)=\sup_{\varepsilon\in\mathcal{E}(z_t)}\alpha(\tau,\varepsilon)$, and hence a per-round $1/\overline{\alpha}(\tau,z_t)$ of OPTt on the event {εt ∈ ℰ(zt)}. When we run a full-information learner on any bounded score Lt(τ) that is monotone in $1/\overline{\alpha}(\tau,z_t)$ (e.g., $L_t(\tau)=\widehat V_t/\overline{\alpha}(\tau,z_t)$), the regret bound translates into a welfare bound.
Theorem 6.2. Assume 0 ≤ Lt(τ) ≤ V
for all t, τ. Let
Meta be multiplicative weights (Hedge)
applied to the experts τ ∈ 𝒯
with these scores, producing a distribution pt and sampling
τt ∼ pt.
Then for any sequence of signals (zt)t,
$$
\sum_{t=1}^T \mathbb{E}\big[L_t(\tau_t)\big]\ \ge\
\max_{\tau\in\mathcal{T}}\sum_{t=1}^T L_t(\tau)\ -\ R_T,
\qquad
R_T\ =\ O\!\left(V\sqrt{T\log|\mathcal{T}|}\right),
$$
where the expectation is over the meta-randomization (and any internal
randomness of the base mechanisms, if Lt is itself
random but bounded).
The meta-layer competes with the best trust parameter τ chosen in hindsight, in terms of the certified lower-bound proxy. This is the right comparator when τ represents a stable policy choice (e.g., a procurement agency’s “degree of reliance” on predictions).
To turn this into a welfare statement, we use the base guarantee: for
each t and each realized εt,
$$
\mathbb{E}[W_t\mid \tau_t=\tau]\ \ge\
\frac{1}{\alpha(\tau,\varepsilon_t)}\,\mathrm{OPT}_t.
$$
On the calibrated event {εt ∈ ℰ(zt)},
we have $\alpha(\tau,\varepsilon_t)\le
\overline{\alpha}(\tau,z_t)$, hence
$$
\mathbb{E}[W_t\mid \tau_t=\tau,\, \varepsilon_t\in\mathcal{E}(z_t)]\
\ge\ \frac{1}{\overline{\alpha}(\tau,z_t)}\,\mathrm{OPT}_t.
$$
Summing and comparing to the best τ yields the next corollary.
Corollary 6.3. Suppose OPTt ≤ V almost
surely. Under the conditions of Theorem 6.2 and calibration Pr [εt ∈ ℰ(zt)] ≥ 1 − δ,
$$
\sum_{t=1}^T \mathbb{E}[W_t]\ \ge\ \max_{\tau\in\mathcal{T}}\sum_{t=1}^T
\frac{1}{\overline{\alpha}(\tau,z_t)}\,\mathbb{E}[\mathrm{OPT}_t]\ -\
R_T\ -\ \delta T V.
$$
The additive δTV term is a
conservative price for calibration failure: in the worst case, we lose
at most V welfare in a round
where the prediction error falls outside ℰ(zt). (If one
has a nontrivial fallback guarantee for α(τ, ε) even off
the calibrated set, this term can be sharpened accordingly.)
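To make explicit where the δTV term comes from, a one-line derivation (a sketch; a fully formal version would track the conditioning on zt): since Wt ≥ 0 and $\mathrm{OPT}_t/\overline{\alpha}(\tau,z_t)\le \mathrm{OPT}_t\le V$,
$$
\mathbb{E}[W_t\mid \tau_t=\tau]\ \ge\ \Pr\!\big[\varepsilon_t\in\mathcal{E}(z_t)\big]\,
\mathbb{E}\big[W_t\mid \tau_t=\tau,\ \varepsilon_t\in\mathcal{E}(z_t)\big]\ \ge\
(1-\delta)\,\frac{\mathrm{OPT}_t}{\overline{\alpha}(\tau,z_t)}\ \ge\
\frac{\mathrm{OPT}_t}{\overline{\alpha}(\tau,z_t)}\ -\ \delta V,
$$
and summing over t and combining with the regret bound of Theorem 6.2 yields the stated inequality.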
When scores Lt(τ) are unavailable or distrusted, we can instead learn from realized welfare via importance-weighted estimators and obtain the standard adversarial bandit tradeoff.
Assume 0 ≤ Wt ≤ V
almost surely and that Meta uses an
adversarial bandit algorithm (e.g. Exp3) over 𝒯
with exploration ensuring pt(τ) ≥ γ/|𝒯|.
Then
$$
\sum_{t=1}^T \mathbb{E}[W_t]\ \ge\ \max_{\tau\in\mathcal{T}}\sum_{t=1}^T
\mathbb{E}[W_t\mid \tau_t=\tau]\ -\
O\!\left(V\sqrt{T|\mathcal{T}|\log|\mathcal{T}|}\right).
$$
Combined with the calibrated per-round guarantee (as in Corollary 6.3),
this yields an end-to-end bound relative to $\max_{\tau}\sum_t
\mathrm{OPT}_t/\overline{\alpha}(\tau,z_t)$, with a larger regret
term reflecting partial feedback.
Taken together, these results make the practical message precise: once we restrict the meta-choice τt to be exogenous to current bids, we can safely “wrap” truthful budget-feasible mechanisms in standard learning machinery, obtaining cumulative welfare close to the best calibrated trust level, while explicitly accounting for both learning regret and the residual miscalibration risk δ.
The preceding guarantees hinge on a sharp separation between exogenous diagnostics and strategic reports: the meta-layer may condition on the former, never on the latter. This separation is not merely a technical convenience: once the meta-layer is allowed to look at bids—either contemporaneously within the round, or retrospectively across rounds when agents are forward-looking—dominant-strategy truthfulness can fail in stark, low-dimensional examples. We view these negative results as design guidance: they delineate which forms of adaptivity are “safe” in procurement markets and which forms effectively create a new (and generally non-truthful) mechanism.
In single-parameter environments, DSIC is equivalent to monotonicity of the allocation rule in each agent’s own bid (holding others fixed), together with threshold payments. Each Mτ satisfies this property by assumption. However, if the selector chooses τt as a function of the bid profile b⋅, t, the allocation rule is no longer the monotone allocation rule of any fixed Mτ; it is a spliced rule whose behavior can be non-monotone in bi, t.
A minimal counterexample uses a single agent and posted prices.
There exist two deterministic DSIC posted-price mechanisms Mτ(1)
and Mτ(2)
and a meta-rule τt(bi, t)
such that the composed mechanism is not DSIC.
One agent supplies a single item. The buyer’s value is v({i}) = 1 and the
per-round budget is B = 1.
Define two truthful posted-price mechanisms:
Mτ(1): offer price p1 = 0.4;  Mτ(2): offer price p2 = 0.6.
Each is DSIC and IR: the agent accepts iff c ≤ pk.
Now define a bid-dependent selector
$$
\tau_t(b_i)=
\begin{cases}
\tau^{(2)}, & b_i\le 0.4,\\
\tau^{(1)}, & b_i>0.4.
\end{cases}
$$
Consider an agent with true cost c = 0.5. Truthful reporting bi = 0.5
triggers τ(1), the
price 0.4, and hence rejection and
utility 0. By misreporting bi′ = 0.4,
the agent triggers τ(2), receives price
0.6, accepts, and obtains utility 0.6 − 0.5 = 0.1 > 0. Thus truth-telling is
not a dominant strategy.
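The deviation can be checked mechanically; a minimal sketch reproducing the numbers above:

```python
def composed_utility(bid: float, cost: float) -> float:
    """Single-agent utility under the bid-dependent selector of the example.

    Bids <= 0.4 trigger the 0.6-price mechanism; higher bids trigger the
    0.4-price mechanism.  Within the chosen mechanism the supplier is accepted
    iff its bid is at most the posted price, and is paid that price.
    """
    price = 0.6 if bid <= 0.4 else 0.4
    return (price - cost) if bid <= price else 0.0

print(composed_utility(0.5, 0.5))  # truthful bid: rejected, utility 0.0
print(composed_utility(0.4, 0.5))  # misreport: accepted at 0.6, utility 0.1
```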
The economic point is that bid-dependent selection endogenizes the “policy knob” τ. From the agent’s perspective, the bid no longer only affects acceptance within a fixed mechanism; it also affects which mechanism, and hence which posted price, the agent faces. This creates profitable manipulations even when every mechanism in the menu is truthful in isolation.
One can generalize the pathology: any selection rule that makes τt respond discontinuously (or merely monotonically) to bi, t can introduce non-monotonic acceptance regions, so the composed mechanism fails the standard characterization of truthful single-parameter mechanisms. In that sense, “choosing among truthful mechanisms based on bids” is not an innocuous mixture; it is itself a new mechanism whose incentive properties must be re-verified from scratch.
A more subtle failure mode arises when τt is allowed to depend on bids or outcomes that are functions of past bids. If each round draws a fresh set of agents who never return, then dependence on b⋅, 1, …, b⋅, t − 1 does not create strategic leverage for round-t agents: they cannot affect the past, and their current bid does not affect τt by construction. In many online procurement applications (e.g., one-off microtasks, spot markets with a large anonymous pool), this “fresh population” approximation is reasonable.
However, if agents are long-lived, expect repeat interaction, or can coordinate across identities, then past-bid dependence creates incentives: an agent may sacrifice current-round utility to move the platform toward a more favorable τ in future rounds.
There exist a two-round instance with a single agent participating in
both rounds, a menu {Mτ} of
per-round DSIC mechanisms, and a meta-update rule τ2 = ϕ(b1)
such that truthful bidding in round 1 is not optimal for the agent once
it internalizes the effect on round 2.
Let Mτ be a posted-price mechanism that offers price τ (still DSIC in each round for fixed τ). Suppose the platform updates τ2 = b1 (or $\tau_2=\min\{B,\max\{\underline\tau,b_1\}\}$ to respect budgets). The agent has constant cost c = 0.2 in both rounds and discount factor β ∈ (0, 1]. If it bids truthfully b1 = 0.2, then τ2 = 0.2 and second-round utility is 0. If instead it overbids to b1′ = 0.6, it may reduce or eliminate first-round trade, but it induces τ2 = 0.6 and then earns second-round utility 0.6 − 0.2 = 0.4, which can outweigh any first-round loss when β is sufficiently large. Thus the platform’s learning rule becomes a manipulable state variable.
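A minimal numeric check of this example, assuming (for concreteness only) a first-round posted price τ1 = 0.2:

```python
def two_round_utility(b1: float, cost: float = 0.2, beta: float = 1.0,
                      tau1: float = 0.2) -> float:
    """Total discounted utility when the platform sets tau_2 = b1.

    Round 1: posted price tau1; the agent trades iff b1 <= tau1.
    Round 2: posted price tau_2 = b1; bidding truthfully, it trades iff cost <= b1.
    """
    u1 = (tau1 - cost) if b1 <= tau1 else 0.0
    u2 = (b1 - cost) if cost <= b1 else 0.0
    return u1 + beta * u2

print(two_round_utility(0.2))  # truthful: 0.0 total
print(two_round_utility(0.6))  # overbid: forgoes round 1, earns 0.4 in round 2
```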
This example is intentionally stark, but the mechanism-design lesson is robust: whenever the platform uses bid-influenced statistics (acceptance rates, paid prices, realized welfare) to choose future τ, strategic agents with intertemporal incentives will treat those statistics as targets to steer. Put differently, “no-regret learning” at the platform level can be compatible with DSIC only under a fresh-population or commitment assumption that rules out profitable intertemporal manipulation.
These negative results clarify the role of our modeling restrictions. To avoid within-round failures, we require τt to be chosen before bids and to be independent of b⋅, t; this is the minimal condition that prevents agents from influencing the mechanism they face in the same round. To avoid cross-round manipulation, we need at least one of the following (application-dependent) safeguards: (i) the meta-rule conditions only on exogenous diagnostics zt, never on bid-influenced statistics; (ii) the platform commits in advance to a published update schedule for τ; or (iii) the supplier population is effectively short-lived, so that no agent expects to face a future τ it can steer through its current report. In practice, these correspond to concrete institutional choices: using holdout-based forecast diagnostics rather than bid-derived KPIs to tune trust; committing to published update schedules; and treating any adaptivity based on market reports as a potential incentive channel that must be explicitly mechanism-designed rather than “bolted on” at the meta-layer.
Having separated “safe” exogenous diagnostics from strategic reports, we can use the welfare bound to derive practical comparative statics: how to pick the menu 𝒯, how much effort to spend tightening ℰ(zt), and how to interpret the resulting behavior of the meta-layer.
The meta-layer’s benefit comes from competing with the best fixed
τ ∈ 𝒯 in hindsight,
round-by-round weighted by the calibration-adjusted factor $\overline{\alpha}(\tau,z_t)$. The cost is
the regret term, which scales as
$$
R_T \;=\; O\!\left(V\sqrt{T\log|\mathcal{T}|}\right).
$$
Two immediate comparative statics follow.
First, enlarging the menu improves the benchmark $\max_{\tau\in\mathcal{T}}\sum_t \mathrm{OPT}_t/\overline{\alpha}(\tau,z_t)$ but only logarithmically worsens the regret penalty. This makes it tempting to add many trust levels. However, the relevant object is not cardinality per se but effective diversity: if two parameters τ, τ′ yield nearly identical $\overline{\alpha}(\cdot,z)$ for all plausible z, then adding both mostly increases log |𝒯| without improving the benchmark. Operationally, we prefer menus that are well-separated, in the sense that for each plausible error regime there is a distinct best τ, but not so fine-grained that adjacent τ’s are redundant.
Second, the scale V matters as much as |𝒯|. If OPTt varies substantially across rounds (e.g., varying arrival volumes nt), then choosing a normalization that makes per-round proxy losses comparable can dramatically tighten constants. In implementations, we often recommend either (i) normalizing the learning objective by a known upper bound on OPTt, or (ii) using scale-free updates where the proxy Lt(τ) is itself in [0, 1]. The menu can then be slightly larger without making exploration “too expensive” in high-value rounds.
Our welfare guarantee depends on $\overline{\alpha}(\tau,z_t)=\sup_{\varepsilon\in\mathcal{E}(z_t)}\alpha(\tau,\varepsilon)$. Holding the family α(τ, ε) fixed, shrinking ℰ(zt) weakly decreases $\overline{\alpha}$ and thus improves the per-round welfare benchmark. This is the basic “value of information” effect: more informative diagnostics zt allow the meta-layer to select more aggressive trust when predictions are likely accurate.
But the guarantee is only as credible as the coverage property Pr [εt ∈ ℰ(zt)] ≥ 1 − δ. A smaller set with the same nominal δ typically requires stronger modeling assumptions (e.g., stationarity of forecast residuals conditional on zt). In other words, ℰ(zt) embodies a tradeoff between sharpness (performance) and coverage (robustness). From a mechanism-design perspective, this tradeoff has a concrete welfare interpretation: occasional calibration failure effectively drops us into a regime where the realized approximation factor can be worse than $\overline{\alpha}(\tau,z_t)$, and if the selector tends to choose aggressive τ precisely when it believes the error is small, failures can be concentrated in high-stakes rounds.
This suggests a conservative design heuristic: when deploying in settings where distribution shift is plausible, we should spend “statistical slack” on maintaining coverage (smaller δ, more conservative ℰ) before spending it on sharpness.
The mapping zt ↦ ℰ(zt) is where statistical practice enters the economic guarantee. We highlight three templates that preserve exogeneity (constructed from holdout data, conformal scores, or external sensors, not from bids).
If the learning-augmented mechanism is primarily sensitive to underestimation (e.g.,
ωt = (1 − εt)OPTt),
a natural choice is
ℰ(zt) = [0, ε̂(zt)],
where ε̂(z) is a
high-quantile bound on underestimation conditional on z. Conformal prediction offers an
off-the-shelf way to set ε̂(z) with finite-sample
coverage under exchangeability assumptions.
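A minimal sketch of one way to set ε̂(z) from a holdout set of past (prediction, realized value) pairs, in the spirit of split conformal prediction; the residual convention matches ωt = (1 − εt)OPTt, the function names are ours, and the coverage claim assumes the holdout rounds are exchangeable with the current one.

```python
import numpy as np

def underestimation_bound(holdout_omegas: np.ndarray,
                          holdout_opts: np.ndarray,
                          delta: float) -> float:
    """Empirical (1 - delta) bound on the underestimation error eps = 1 - omega/OPT.

    With n exchangeable holdout residuals, returning the
    ceil((n + 1) * (1 - delta))-th order statistic gives the standard
    split-conformal coverage guarantee for the next round's error.
    """
    eps = np.clip(1.0 - holdout_omegas / holdout_opts, 0.0, 1.0 - 1e-9)
    n = len(eps)
    k = int(np.ceil((n + 1) * (1.0 - delta)))
    if k > n:
        return 1.0 - 1e-9            # too few holdout rounds: fall back to the trivial bound
    return float(np.sort(eps)[k - 1])

# The induced confidence set is E(z_t) = [0, eps_hat].
```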
If both under- and overestimation matter (e.g., mechanisms that can
“overspend” welfare opportunity when ωt is too high),
we can use an interval
$$
\mathcal{E}(z_t)=[\underline{\varepsilon}(z_t),\overline{\varepsilon}(z_t)]\cap[0,1),
$$
paired with a mechanism analysis that treats α(τ, ε) as
increasing in |ε| or
separately in ε’s sign. This
is especially relevant when the prediction signal ωt is a model
output that can drift upward as well as downward.
In many platforms, zt is a coarse quality flag (e.g., “model version,” “data freshness,” “coverage radius”). Then ℰ(zt) can be pre-specified for each tier, yielding a transparent policy: the platform publicly commits to which trust parameters are admissible under each tier. This reduces the perceived arbitrariness of adaptivity and simplifies auditing.
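As an illustration of such a tiered policy (all tier names and numbers are made up), the public commitment could be as simple as a lookup table:

```python
# Each exogenous quality tier pre-commits to a confidence set E and to the
# admissible trust parameters; the table is published before any bidding.
TIER_POLICY = {
    "fresh_model":  {"eps_set": (0.0, 0.10), "admissible_tau": (0.5, 0.9)},
    "stale_model":  {"eps_set": (0.0, 0.40), "admissible_tau": (0.2, 0.5)},
    "out_of_scope": {"eps_set": (0.0, 0.99), "admissible_tau": (0.0,)},
}
```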
Across templates, a key operational point is how the failure probability δ is budgeted across rounds. If we want an anytime guarantee over T rounds, a union bound suggests per-round δ = O(1/T). If instead we accept rare failures in exchange for sharper sets, we should explicitly interpret the resulting guarantee as “high probability per round” rather than “uniformly over the horizon.”
Because $\overline{\alpha}(\tau,z_t)$ is a worst-case factor over ℰ(zt), the selector effectively chooses τt to minimize $\overline{\alpha}(\tau,z_t)$ (or maximize a proxy Lt(τ) derived from it). This yields a clean design dichotomy.
A conservative menu includes trust parameters whose α(τ, ε) degrades slowly in ε, even if they are never the very best under tiny error. This is desirable when (a) calibration sets are wide (weak diagnostics), (b) the cost of miscalibration is high (large V or high variance in OPTt), or (c) the environment is nonstationary. In procurement terms, conservative menus correspond to mechanisms that rely less on ωt and more on robust posted-price or threshold rules that guarantee a baseline fraction of OPTt regardless of prediction quality.
An aggressive menu adds parameters that can outperform under small error but deteriorate quickly as ε grows. These become attractive when zt is informative enough that ℰ(zt) is frequently tight. Put plainly: aggressive options pay off when we can reliably certify that “today’s forecast is good.”
A practical compromise is to ensure the menu contains (i) at least one robust baseline τrob with a uniform guarantee, and (ii) a small number of aggressive candidates whose advantage is material only when zt indicates high confidence. This structure makes the meta-layer’s behavior interpretable: it defaults to robustness and “switches on” trust only when diagnostics justify it.
These comparative statics do not eliminate the need for domain knowledge: the shape of α(τ, ε) is mechanism-specific, and the informativeness of zt is application-specific. What our framework contributes is a disciplined way to translate those objects into transparent design choices—how large a menu to deploy, how to invest in calibration, and when to privilege conservative robustness over aggressive exploitation—while preserving the incentive guarantees that motivate the separation between bids and diagnostics.
Our baseline model makes three choices that keep the incentive and learning layers cleanly separable: (i) per-round objectives are monotone submodular, (ii) there is a single hard budget B for a single buyer each round, and (iii) arrivals and the calibration signal zt are treated as exogenous to current bids. These assumptions are natural in many procurement settings (crowdsourcing with diminishing returns; a single program budget; diagnostics computed from holdout data), but they are not essential. What is essential is that the meta-layer’s choice τt remain measurable with respect to non-strategic information in round t, so that we can compose universal truthfulness of Mτ with an online selection rule without reintroducing bid-dependence.
In some applications, adding a supplier can reduce value because of congestion, cannibalization, or diversity penalties. Formally, vt may be submodular but non-monotone, or even neither. In such cases, the relevant benchmark OPTt = max {vt(S) : ∑i ∈ Sci, t ≤ B} remains meaningful, but the algorithmic core changes: monotonicity is what allows many posted-price/threshold mechanisms to align incentives with welfare maximization.
A pragmatic extension is to restrict attention to non-monotone submodular vt, where there exist approximation algorithms based on randomization and contention-resolution ideas. Translating these into budget-feasible mechanisms typically requires additional structure (e.g., mechanisms that sample a subset of agents to estimate thresholds, then apply an oblivious rule to the remainder). The learning-augmented component can still enter through a prediction ωt about OPTt or about the marginal value distribution, but the approximation factor α(τ, ε) will often worsen and may become explicitly dependent on extra parameters (e.g., sampling rates) needed to preserve incentive properties. The meta-layer remains valid as long as each candidate Mτ is universally truthful and admits an error-dependent guarantee; however, in non-monotone settings we should expect α(τ, ε) to be less “smooth” in ε, making calibration sets ℰ(zt) and conservative menu design comparatively more important.
Many public-sector procurement problems feature multiple accounting lines (e.g., regional budgets, program earmarks) or knapsack-like constraints. A clean generalization is to replace the single constraint ∑ipi, t ≤ B with m constraints ∑ipi, t(j) ≤ B(j) for j = 1, …, m. The welfare benchmark becomes a multi-knapsack optimum, and the base mechanisms Mτ must be redesigned accordingly (often via Lagrangian relaxations or sequential posted prices with per-constraint pacing).
From our perspective, the meta-layer is largely unchanged: we can index τ by a vector of trust and pacing parameters, and define α(τ, ε) relative to the multi-constraint OPTt. What changes is the scale entering the regret bound: the natural V upper bound may depend on minjB(j) or on the curvature of the Lagrangian surrogate, and poorly normalized proxy losses can lead to unstable adaptation across constraints. This is one place where numerical tuning (learning rates, normalization constants, clipping) is not merely cosmetic: it is part of ensuring that the selector does not overreact to noisy Lt(τ) when some constraints are tight and others slack.
If there are multiple buyers with budgets B(k) competing for the same suppliers, the problem becomes a dynamic assignment/auction with externalities. A direct extension of per-round DSIC procurement is typically impossible without changing the market design: suppliers may face different offered prices, and buyers’ selections interact. One promising direction is to treat each round as a matching market with a truthful clearing rule (e.g., a truthful double auction variant or posted-price market clearing) and to let learning augmentation enter via predictions about demand/supply balance.
In such multi-buyer settings, our “meta chooses τt, then runs Mτt” composition still applies, but the base family {Mτ} must be truthful with respect to whichever side is strategic (suppliers, buyers, or both). Moreover, the calibration signal zt must remain exogenous to all strategic reports in the relevant round; otherwise, strategic manipulation can occur indirectly by influencing the diagnostic (e.g., if zt is computed from acceptance rates that agents can affect).
We have allowed nt and Nt to be random, and the within-round online nature of Mτ already accommodates adversarial orderings in the strongest models. Still, many platforms face “semi-random” arrivals: an adversary chooses a set, then an independent random order is realized, or arrivals depend on past posted prices but not on current bids. These intermediate models can be exploited to sharpen α(τ, ε) (e.g., better constants under random order) and, correspondingly, to justify more aggressive trust parameters when diagnostics are strong.
The main conceptual caution is that if arrivals respond to mechanism choices, then the meta-layer becomes part of the environment that shapes future Nt. This does not break per-round DSIC—truthfulness is defined holding the mechanism fixed—but it does complicate welfare guarantees because the hindsight comparator $\max_{\tau}\sum_t \mathrm{OPT}_t/\overline{\alpha}(\tau,z_t)$ now depends on a counterfactual arrival process. Addressing this rigorously would require a model of platform-induced feedback and an appropriate benchmark (e.g., policy regret rather than external regret).
A frequent practical requirement is that procurement satisfy fairness constraints: minimum spend shares for disadvantaged groups, quotas, or caps to prevent over-concentration. Formally, we might impose linear constraints on selected sets, such as $\ell_g \le \sum_{i \in S_t} a_{i,g} \le u_g$ for groups $g$, or require approximate proportionality over time (e.g., requiring that $\sum_{t \le T}\sum_{i \in S_t} a_{i,g}$ tracks targets). There are two distinct issues.
First, fairness constraints change the algorithmic core of Mτ: feasibility is no longer a single knapsack, and posted-price rules may need to be group-aware. Second, fairness interacts with the learning augmentation: predictions ωt about the unconstrained OPTt may be misleading if binding fairness constraints reduce attainable welfare. A more coherent design is to predict a benchmark $\mathrm{OPT}_t^{\mathrm{fair}}$ and calibrate errors relative to it. When this is not possible, the calibration set ℰ(zt) should be widened to reflect constraint-induced model mismatch, again pushing the selector toward conservative τ.
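A minimal sketch of the corresponding group-level feasibility check (group contributions a_{i,g} and the bounds are illustrative placeholders) is:

```python
# Sketch of the fairness feasibility check l_g <= sum_{i in S_t} a_{i,g} <= u_g;
# the dict layout for memberships and bounds is an assumption for the example.

def fair_feasible(selected, a, lower, upper):
    """selected: supplier ids; a[i][g] = contribution of i to group g;
    lower[g], upper[g] = l_g, u_g. Returns True iff every group constraint holds."""
    groups = lower.keys()
    totals = {g: 0.0 for g in groups}
    for i in selected:
        for g in groups:
            totals[g] += a[i].get(g, 0.0)
    return all(lower[g] <= totals[g] <= upper[g] for g in groups)

a = {1: {"g1": 1.0}, 2: {"g2": 1.0}, 3: {"g1": 1.0}}
print(fair_feasible({1, 2}, a, {"g1": 1.0, "g2": 1.0}, {"g1": 2.0, "g2": 2.0}))  # True
print(fair_feasible({1, 3}, a, {"g1": 1.0, "g2": 1.0}, {"g1": 2.0, "g2": 2.0}))  # False: g2 gets 0 < 1
```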
The incentive properties in our composition argument are structural: as long as τt does not depend on current bids and each Mτ is universally truthful, per-round DSIC and ex post budget feasibility follow without tuning. By contrast, three components are inherently quantitative and typically require tuning in implementations: (i) the construction of proxy losses Lt(τ) from zt (including normalization to a stable range), (ii) the online learning update (learning rate schedules, exploration parameters if bandit feedback is used), and (iii) the calibration mapping zt ↦ ℰ(zt) (choice of δ, window lengths, and robustness to drift). These choices do not alter the basic logic of the framework, but they do determine whether the meta-layer’s adaptivity is a modest refinement or a material welfare improvement in finite samples.
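To make the quantitative layer concrete, the following sketch (not the paper's implementation; the τ grid, learning-rate schedule, and reward construction are assumptions to be tuned) shows a Hedge-style selector over a finite grid of trust levels, driven by signal-only proxy rewards:

```python
# Minimal Hedge/exponential-weights selector over trust levels; all constants
# here (grid, eta) are illustrative tuning choices, not prescribed values.
import numpy as np

class TrustSelector:
    def __init__(self, tau_grid, eta=0.1):
        self.tau_grid = np.asarray(tau_grid)
        self.eta = eta                        # learning rate (tuning parameter)
        self.log_w = np.zeros(len(tau_grid))  # log-weights for numerical stability

    def probabilities(self):
        w = np.exp(self.log_w - self.log_w.max())
        return w / w.sum()

    def choose(self, rng):
        """Draw tau_t from the current distribution, *before* any bids are seen."""
        p = self.probabilities()
        idx = rng.choice(len(self.tau_grid), p=p)
        return idx, self.tau_grid[idx], p[idx]

    def update(self, proxy_rewards):
        """proxy_rewards[k] = L_t(tau_k), computed from z_t only (full information)."""
        self.log_w += self.eta * np.asarray(proxy_rewards)

# Usage sketch:
# sel = TrustSelector(np.linspace(0.0, 1.0, 11))
# idx, tau_t, prob = sel.choose(np.random.default_rng(0))
```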
Taken together, these extensions reinforce a simple message: learning augmentation is most valuable when it is statistically disciplined (through zt and calibration) and strategically insulated (through bid-independent meta-choices). Departures from monotonicity, single budgets, or exogenous arrivals are feasible, but they raise the premium on careful mechanism engineering and on diagnostics that remain credibly outside strategic control.
This appendix records the formal steps that are implicit in the main text: (i) why the meta-layer preserves dominant-strategy incentives and budget feasibility when it is bid-independent, (ii) how to construct unbiased (or conservatively biased) welfare proxies that support learning over τ, and (iii) why allowing τt to depend on current bids can destroy DSIC even if every Mτ is DSIC when run in isolation.
Fix a round t. Let Meta select τt ∈ 𝒯 as a (possibly randomized) function of (z1, …, zt) only, and then run Mτt on the reported bids $b_{\cdot,t}$. We treat arrivals and any internal randomness of Mτ in the standard “universal truthfulness” way: for each τ, Mτ is a distribution over deterministic DSIC mechanisms, conditional on the realized arrival sequence.
Fix any agent i, any true cost ci, t, and any bids b−i, t of others. Condition on the realized signal history (z1, …, zt) and on the realized τt. Under this conditioning, the allocation/payment rule faced by agent i is exactly Mτt, which is DSIC by assumption, hence truthful reporting maximizes ui, t for that fixed τt. Taking expectations over the randomness in τt (and any internal randomization) preserves the inequality because τt is independent of bi, t under the conditioning; there is no channel through which i can influence which DSIC mechanism is selected within the round. ▫
For any realized τt, the executed mechanism is Mτt, which respects the budget ex post. ▫
These are “structural” statements: they do not rely on tuning, stochastic assumptions on values, or any smoothness of α(τ, ε); the only substantive requirement is that the selection step not be a function of current bids.
For each τ, the base
guarantee takes the form
$$
\mathbb{E}[W_t\mid \tau_t=\tau]\ \ge\
\frac{1}{\alpha(\tau,\varepsilon_t)}\,\mathrm{OPT}_t.
$$
Calibration yields a high-probability event {εt ∈ ℰ(zt)}.
Define the conservative (signal-only) factor
$$
\overline{\alpha}(\tau,z_t)\ :=\
\sup_{\varepsilon\in\mathcal{E}(z_t)}\alpha(\tau,\varepsilon).
$$
Then on the calibration event we have $\alpha(\tau,\varepsilon_t)\le
\overline{\alpha}(\tau,z_t)$, hence
$$
\mathbb{E}[W_t\mid \tau_t=\tau,\ \varepsilon_t\in\mathcal{E}(z_t)]\ \ge\
\frac{1}{\overline{\alpha}(\tau,z_t)}\,\mathrm{OPT}_t.
$$
This motivates a proxy “reward” Lt(τ)
that depends only on zt (and possibly
ωt) and
lower-bounds $\mathrm{OPT}_t/\overline{\alpha}(\tau,z_t)$
up to a known scale factor. In the full-information idealization, one
may take $L_t(\tau)=V/\overline{\alpha}(\tau,z_t)$
when OPTt ≤ V, which
yields a monotone transformation suitable for regret minimization over
τ. In practice, we often
replace V by an online
normalization (e.g., a clipped estimate of OPTt) to avoid
sensitivity to loose global bounds.
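The following sketch makes this construction concrete under a hypothetical error-dependent factor α(τ, ε); the functional form, the grid standing in for ℰ(zt), and the normalizer V are placeholders rather than the paper's choices:

```python
# Sketch: from a calibration set to the conservative factor alpha_bar and a
# signal-only proxy reward. alpha(), the eps grid, and V are illustrative only.
import numpy as np

def alpha(tau, eps):
    # Hypothetical factor: more trust helps when eps is small, hurts when large.
    return (2.0 - tau) + 8.0 * tau * eps

def alpha_bar(tau, eps_set):
    """Conservative factor: sup of alpha(tau, eps) over the calibrated set."""
    return max(alpha(tau, eps) for eps in eps_set)

def proxy_reward(tau, eps_set, V=1.0):
    """L_t(tau) = V / alpha_bar(tau, z_t), with V an upper bound (or a clipped
    online estimate) of OPT_t."""
    return V / alpha_bar(tau, eps_set)

eps_set = np.linspace(0.0, 0.1, 21)  # a tight confidence set: small errors plausible
for tau in (0.0, 0.5, 1.0):
    print(tau, round(alpha_bar(tau, eps_set), 3), round(proxy_reward(tau, eps_set), 3))
# With a tight set, higher trust gives a smaller alpha_bar and a larger proxy
# reward; widening eps_set (e.g., up to 0.5) reverses this ordering.
```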
A practical complication is that Wt = vt(St) is only observed for the chosen τt. When the selector randomizes over τ with probabilities πt(τ) > 0, we can form an inverse-propensity (IPS) estimate of the welfare that each τ would have obtained.
Let Wt(τ)
denote the (random) welfare that would result in round t if we ran Mτ on the
realized bids and arrivals (including Mτ’s own
randomization). The logging policy draws τt ∼ πt
and observes Wt := Wt(τt).
Define, for each τ,
$$
\widehat{W}_t(\tau)\ :=\
\frac{\mathbf{1}\{\tau_t=\tau\}}{\pi_t(\tau)}\,W_t.
$$
Conditional on ℱt − 1, τt is drawn from πt independently of the within-round outcomes. Thus
$$
\mathbb{E}[\widehat{W}_t(\tau)\mid \mathcal{F}_{t-1}]
=\sum_{\tau'}\pi_t(\tau')\frac{\mathbf{1}\{\tau'=\tau\}}{\pi_t(\tau)}\mathbb{E}[W_t(\tau')\mid
\mathcal{F}_{t-1}]
=\mathbb{E}[W_t(\tau)\mid \mathcal{F}_{t-1}].
$$
▫
The limitation is variance: if πt(τ)
is small, the importance weight explodes. A standard fix is
clipping:
$$
\widehat{W}^{\mathrm{clip}}_t(\tau)\ :=\
\frac{\mathbf{1}\{\tau_t=\tau\}}{\max\{\pi_t(\tau),\gamma\}}\,W_t,
$$
which is biased downward (hence conservative) for γ > 0. This conservatism is often
acceptable here because our regret-style guarantees compare against a
best-in-hindsight conservative benchmark $\mathrm{OPT}_t/\overline{\alpha}(\tau,z_t)$,
and because the mechanism design layer does not require unbiasedness for
truthfulness—only the learning layer’s stability does.
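A minimal sketch of the IPS and clipped-IPS estimators (the logging probabilities, the τ grid, and the clip level γ are illustrative):

```python
# Sketch of the estimators defined above; values are placeholders for illustration.
import numpy as np

def ips_estimates(tau_grid, chosen_idx, realized_welfare, probs, gamma=None):
    """Return W_hat_t(tau) for every tau in the grid.

    chosen_idx:       index of the tau actually run in round t
    realized_welfare: observed W_t = W_t(tau_{chosen})
    probs:            logging probabilities pi_t(tau), all > 0
    gamma:            optional clip level; clipping biases the estimate downward
    """
    est = np.zeros(len(tau_grid))
    denom = probs[chosen_idx] if gamma is None else max(probs[chosen_idx], gamma)
    est[chosen_idx] = realized_welfare / denom
    return est

tau_grid = [0.0, 0.5, 1.0]
probs = np.array([0.1, 0.3, 0.6])
print(ips_estimates(tau_grid, 0, 0.8, probs))              # unbiased, high variance: 8.0 at tau=0
print(ips_estimates(tau_grid, 0, 0.8, probs, gamma=0.25))  # clipped, conservative: 3.2 at tau=0
```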
We give an explicit single-round example showing that even if Mτ is DSIC for every τ, the composite rule need not be DSIC if τ is chosen as a function of the current bids.
Consider one item to be procured (so v({i}) = 1 and v(∅) = 0), budget B = 1, and two candidate posted-price mechanisms: MτH offers a posted price of 0.9 and MτL offers a posted price of 0.4, in each case approaching agents in a fixed order (agent 1 first) and purchasing the first agent whose bid is at most the posted price. Both mechanisms are DSIC: the allocation is monotone in an agent’s bid and the payment is a bid-independent posted price conditional on acceptance.
Now define a (bid-dependent) selector rule: choose τH if $\min_i b_i \le 0.5$, else choose τL. Intuitively, “if bids look cheap, trust the predictor and pay more aggressively,” but crucially the trigger is a current bid.
Let agent 1’s true cost be c1 = 0.45 and agent 2’s bid be fixed at b2 = 0.6. If agent 1 bids truthfully b1 = 0.45, then min {b1, b2} = 0.45 ≤ 0.5, so the selector picks MτH. Agent 1 accepts the offered price (0.45 ≤ 0.9) and is purchased at 0.9, yielding utility u1 = 0.9 − 0.45 = 0.45.
If instead agent 1 shades upward to b1′ = 0.55, then min {b1′, b2} = 0.55 > 0.5, so the selector picks MτL. Both bids now exceed the posted price 0.4, so no one is purchased and agent 1’s utility is 0. This deviation is unprofitable, so DSIC is not yet violated.
To generate a profitable deviation, reverse the trigger: choose τH if $\min_i b_i > 0.5$, else choose τL. With the same costs and b2 = 0.6, truthful bidding b1 = 0.45 makes min b = 0.45 ≤ 0.5, so τL is selected; both bids exceed the posted price 0.4, so agent 1 is not purchased and obtains utility 0. By misreporting upward to b1′ = 0.55, we get min b = 0.55 > 0.5, so τH is selected and agent 1 is purchased at price 0.9, obtaining utility 0.9 − 0.45 = 0.45, a strict improvement. Hence truthful reporting is not a dominant strategy.
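The following few lines (an illustrative check, with the fixed offer order assumed as above) reproduce the reversed-trigger calculation:

```python
# Sanity check of the counterexample: posted prices 0.9 (tau_H) and 0.4 (tau_L),
# offers made in a fixed order, and the bid-dependent selector "tau_H iff min bid > 0.5".

def posted_price_outcome(price, bids):
    """Offer `price` to agents in index order; buy the first whose bid <= price.
    Returns the purchased agent's index (or None) and the payment."""
    for i, b in enumerate(bids):
        if b <= price:
            return i, price
    return None, 0.0

def agent1_utility(bid1, cost1=0.45, bid2=0.6):
    price = 0.9 if min(bid1, bid2) > 0.5 else 0.4   # reversed bid-dependent trigger
    winner, payment = posted_price_outcome(price, [bid1, bid2])
    return (payment - cost1) if winner == 0 else 0.0

print(agent1_utility(0.45))  # truthful: tau_L selected, no purchase -> 0.0
print(agent1_utility(0.55))  # misreport: tau_H selected, purchased at 0.9 -> 0.45
```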
The economic point is that bid-dependent selection creates a new lever: an agent can manipulate not only whether she wins given a mechanism, but also which mechanism (i.e., which pricing rule) is run. This breaks the standard monotonicity/payment logic that underpins DSIC, even when every candidate mechanism is itself DSIC.