Digital advertising auctions are no longer populated by bidders who ``choose a bid’’ in the classical sense. By 2026, a typical advertiser participates through an agent that ingests platform-predicted conversion values, enforces spend constraints, and updates bids continuously using feedback from recent auctions. From the advertiser’s perspective, the relevant control knob is not a one-shot bid but a learning-and-pacing system; from the platform’s perspective, the auction mechanism is only one component in a larger dynamical loop that includes prediction, eligibility, reserve policies, and bidder automation. In this environment, the traditional equilibrium objects of auction theory—Bayes–Nash equilibrium, symmetric equilibria, or even static best responses—are often the wrong abstraction. They can be misleading about what is achievable, what is stable, and what is socially efficient.
The first reason is non-stationarity. The equilibrium paradigm assumes a fixed distribution and a sufficiently long period for behavior to converge (implicitly or explicitly) to a steady state. Real ad markets drift: seasonality, product launches, query mix, and platform model updates can change values on timescales comparable to campaign horizons. When the underlying value sequence moves, there may be no stationary equilibrium to converge to, and convergence itself is not a sensible desideratum. What matters operationally is whether the coupled system (mechanism plus learning agents) can track a moving benchmark and how much efficiency is lost due to adaptation lags. This suggests measuring performance through regret-like quantities and stability terms that degrade gracefully with market drift, rather than asking whether play converges.
The second reason is binding campaign-level constraints. Many advertisers face strict budget limits and return-on-spend (RoS) targets that bind at the campaign level, not auction-by-auction. These constraints change the meaning of a bid: a bidder is not simply trading off value against payment in a quasilinear utility; it is attempting to allocate a limited spend across opportunities while maintaining a minimum value-per-dollar ratio. In practice, this induces systematic bid shading and pacing behavior that cannot be captured by unconstrained equilibrium analyses. Moreover, welfare benchmarks based on raw value can become inappropriate when advertisers are liquidity constrained. A large nominal value is not meaningful if it cannot be realized under the budget or if it would violate an RoS policy. This motivates liquid welfare as a benchmark: it respects the idea that each bidder’s contribution to welfare is capped by what it can feasibly finance, and it aligns more closely with what constrained advertisers can plausibly achieve.
A third reason equilibrium is the wrong object is heterogeneity in bidder automation. Autobidders differ in update rules, exploration, smoothing, and risk controls. Even if each agent is ``optimizing,’’ it may do so against a surrogate objective dictated by internal constraints and platform reporting. The appropriate behavioral model is thus not that bidders play an exact best response in every round, but that they run online algorithms with performance guarantees relative to a comparator class. This is both more realistic and more informative: it allows us to reason about the welfare outcomes of a market populated by learning systems without presuming a particular equilibrium selection.
These observations point to a different question: what welfare guarantees can a repeated auction market deliver when it is populated by constrained, adaptive learning agents? Our goal is to provide a welfare theory for repeated auctions with autobidding that is robust to (i) decentralized online learning, (ii) binding budgets and RoS constraints, and (iii) non-stationary values. Conceptually, we replace equilibrium efficiency guarantees (price of anarchy at a fixed equilibrium) with learning-robust guarantees: bounds that hold along the realized play path of the learning process, for finite horizons, and that degrade smoothly with the difficulty of online adaptation.
A key methodological choice is to use the smoothness framework as a bridge
between individual incentives and social welfare. Smoothness arguments
yield variational inequalities: they compare the welfare of the realized
outcome to the welfare of a carefully chosen deviation, and then convert
``no-regret'' behavior into welfare bounds. In unconstrained settings, this has become a standard route from learning dynamics to efficiency. Our setting requires two extensions. First, the welfare objective must reflect budgets and RoS constraints; second, the bidders' incentives must incorporate these constraints in a way that is compatible with no-regret learning. We therefore work with Lagrangian surrogates: each bidder is evaluated on a payoff that augments value and payment with multiplier-weighted constraint terms. This is not merely a proof device: it mirrors how real pacing systems operate. Practical autobidders routinely maintain internal ``shadow
prices’’ for spend and performance constraints, updating them as
constraints tighten or relax, and then translate these shadow prices
into bid multipliers or throttling decisions.
Non-stationarity raises a further complication. Even if a mechanism is smooth and bidders have small regret, the benchmark deviation used in smoothness may itself change over time as values drift. In such environments, the relevant notion is dynamic regret: the ability of the learning algorithm to compete with a sequence of comparators that is allowed to move, but only at a rate commensurate with market drift. This captures the operational requirement that autobidders should not merely learn an average best response; they should track changing opportunities while respecting long-run constraints. Correspondingly, our welfare bounds include explicit drift-dependent terms (via a bounded-variation measure) that quantify the inherent difficulty of tracking a moving target. This makes the theory predictive in a practical sense: when markets are calm, the welfare guarantee approaches the static benchmark; when markets are turbulent, the bound worsens in a way that is proportional to measurable variation.
The platform also plays an active role through prediction and policy, and this creates an additional lever for efficiency. In modern ad systems, the platform often provides value proxies (conversion predictions, quality scores) and uses them to set reserves, apply boosts, or adjust eligibility. We treat these interventions as machine-learned advice that can be imperfect but informative. The central economic point is that even coarse multiplicative information about relative values can be used to improve robustness: it can dampen the misallocations created when constraint-driven bidders with high shadow prices crowd out genuinely high-value bidders. In our analysis, better advice leads to better constants in the smoothness inequality, and hence to improved welfare guarantees under learning. This connects mechanism design to ML system quality in a clean, quantitative way: prediction improvements are not only about revenue or relevance; they can tighten worst-case efficiency guarantees when bidders are constrained and adaptive.
We emphasize what this approach does and does not assume. We do not require convergence to any equilibrium, nor do we assume bidders are truthful or have stable private types drawn i.i.d. from a distribution. We also do not require the platform to know bidders’ budgets or RoS targets exactly, beyond what is needed to define feasible outcomes. Instead, we assume that bidder-side learning achieves standard online performance guarantees on the relevant surrogate payoff sequence and that constraint violations are controlled (or eliminated) by pacing rules. These are testable, engineering-aligned assumptions: they can be verified by inspecting algorithmic update rules and by monitoring realized constraint satisfaction.
At the same time, we acknowledge limitations. Liquid welfare is an appropriate benchmark for constrained bidders, but it is not the only plausible one, and it can understate welfare when budgets reflect accounting rules rather than true liquidity. Our drift measure is deliberately simple, and richer forms of non-stationarity (strategic feedback loops, delayed attribution, or correlated shocks across bidders) may require sharper stability notions. Finally, ML advice can fail in adversarial ways; our guarantees improve with advice quality but do not eliminate the need for conservative mechanism design when predictions are biased or non-calibrated.
Overall, the model illuminates a central tradeoff faced by platforms and advertisers: automation and constraints are indispensable for managing spend and performance at scale, yet they create dynamic interactions that make equilibrium an unreliable guide. A learning-robust welfare lens—grounded in smoothness, regret, and drift—offers a tractable alternative that speaks directly to how autobidding systems behave and how policy interventions (such as reserve/boost design and prediction improvements) translate into efficiency in repeated auction markets.
Our analysis sits at the intersection of four literatures that are often treated separately in auction theory: (i) budget-feasible bidding via LP and dual variables in (approximately) truthful auctions, (ii) online primal–dual learning for long-run constraints, (iii) price-of-anarchy (PoA) guarantees for auctions with liquidity constraints and the associated liquid-welfare benchmark, and (iv) learning dynamics in markets that need not converge, especially under non-stationarity. A fifth, more recent strand concerns how platform-side prediction can be interpreted as ``advice’’ that changes the effective mechanism via reserves, boosts, and eligibility rules. We briefly situate our approach in each strand and clarify what is new in the way we combine them.
A canonical way to understand budgeted bidding in repeated auctions is through a linear-programming relaxation of the advertiser’s allocation problem and its Lagrangian dual. In a stylized model where the bidder faces exogenous per-round prices (or a truthful allocation rule), the bidder’s optimal policy often takes a simple threshold or multiplicative form: bid proportional to value with a pacing multiplier that acts as a shadow price on spend. This perspective is widespread in the advertising literature on pacing and budget smoothing, including work that characterizes ``pacing equilibria’’ and relates them to competitive equilibria or Eisenberg–Gale-type convex programs; see, e.g., . The economic content is that budgets turn quasilinear utility maximization into a resource-allocation problem, and shadow prices encode intertemporal tradeoffs: spending today tightens the constraint tomorrow, so bids are shaded.
Our setting preserves this dual intuition but departs from the
classical pacing models in two ways. First, we allow an RoS (or ROI)
constraint in addition to a spend budget. This creates a shadow price
that penalizes allocations with insufficient value-per-dollar, aligning
with how many practical autobidders operate (they maintain both a ``budget
pacer'' and a ``performance guardrail’’). Second, we do not
assume the bidder faces exogenous prices: prices are endogenously
generated by a strategic population of other paced learners, and values
drift over time. This pushes us away from equilibrium characterizations
and toward welfare guarantees that hold along the realized learning
path.
A large body of work in online optimization studies how to make
sequential decisions subject to constraints that couple decisions across
time (``long-term constraints''). Standard techniques include online mirror descent with Lagrange-multiplier updates, online saddle-point methods, and primal--dual algorithms that achieve sublinear regret while ensuring vanishing average constraint violation; see, among many others, \cite{Zinkevich2003,Hazan2016}. These tools have been adapted to online auction participation, where the ``primal’’
action is a bid (or a policy parameter) and the ``dual’’ variables are
pacing multipliers. In truthful mechanisms, the mapping from bids to
outcomes is sufficiently regular that simple multiplier updates yield
explicit bidding rules, and the analysis reduces to regret bounds for
online convex optimization on a surrogate payoff.
We build directly on this line, but we emphasize a performance notion that is particularly suited to drifting markets: dynamic regret against a comparator sequence with bounded path length. Dynamic regret has become a standard lens for non-stationary online learning, where the right benchmark is not a single fixed action but a slowly moving best action; see, e.g., . In ad markets, this is not merely a mathematical convenience: seasonality and model updates imply that the ``right’’ pacing multiplier itself changes. Accordingly, we treat primal–dual pacing as a tracking problem, and we make the dependence on market variation explicit in the welfare loss.
On the mechanism-design side, our welfare analysis uses the smoothness framework for bounding the PoA of games induced by auctions . Smoothness provides a robust route from local deviation inequalities to global welfare guarantees and, crucially for our purposes, it composes naturally with no-regret learning: if players have small regret, the time-average outcome is approximately efficient relative to the smoothness benchmark . This is the conceptual bridge that lets us reason about markets populated by learning agents without requiring equilibrium convergence.
Budgets complicate the welfare objective itself. In the presence of hard budgets, raw social welfare can be an unrealistic benchmark because a bidder with large value but small liquidity cannot actually realize that value at the required payments. The liquid-welfare benchmark formalizes this intuition by capping each bidder’s welfare contribution by its budget (or, more generally, by what is feasible under its financial constraints). Liquid welfare has been studied extensively in the context of budget-feasible mechanism design and PoA analysis for auctions with budgets; see, e.g., . Our contribution here is not to propose a new benchmark, but to tailor the smoothness method to a setting with campaign-level constraints (budget and RoS) and to do so in a way that is compatible with learning. Technically, this leads us to a constrained-smoothness statement written for Lagrangian surrogate payoffs: the same multipliers that appear in bidder-side pacing also appear in the inequality that underwrites the welfare bound.
A parallel literature in learning in games emphasizes that many adaptive dynamics do not converge to Nash equilibrium, even in simple auction environments, yet they can still yield meaningful welfare guarantees when they have no-regret properties . This observation is particularly salient in ad auctions, where heterogeneity in bidder algorithms, delayed feedback, and periodic resets make steady-state notions empirically fragile. The smoothness/no-regret paradigm is now a standard way to extract efficiency conclusions without convergence assumptions, and it has been applied broadly in auctions and market design .
Our focus differs in that we incorporate non-stationarity explicitly, rather than treating payoffs as adversarial but time-invariant in their comparator structure. When values drift, the deviation that certifies smoothness (and the ``best response’’ it approximates) also drifts. This is where dynamic regret becomes essential: it quantifies how well the learning dynamics track a moving benchmark, and it gives a welfare bound that degrades smoothly with drift. Conceptually, we view non-stationarity not as an adversary that destroys guarantees, but as a measurable feature of the environment that enters additively through a variation term.
Modern ad platforms do not implement a bare second-price or VCG mechanism; they implement a mechanism wrapped in prediction-driven policy layers: personalized reserves, quality-score boosts, and eligibility filters. These interventions are often motivated by revenue, relevance, or policy concerns, but they also change the strategic landscape faced by autobidders. A growing literature studies mechanism design with predictions (or ``advice’’), including learning-augmented and robust-auction-design results showing how approximate value information can improve efficiency or revenue while retaining worst-case guarantees when predictions are imperfect; see, e.g., . In parallel, the PoA literature has documented that appropriately chosen reserves can improve welfare guarantees in auctions, and that such improvements can be quantified in smoothness constants.
We adopt a deliberately reduced-form view: the platform has access to multiplicative approximations to bidder values and uses them to set reserves/boosts. The technical role of advice is to tighten the constrained-smoothness parameters, which then tightens the learning-robust welfare bound. The economic role is to mitigate a specific failure mode of constraint-driven bidding: when some bidders have high shadow prices (from tight constraints), they may bid aggressively even on modest-value opportunities, crowding out genuinely high-value bidders. Advice-driven reserves/boosts act as a corrective by reintroducing value information into the effective allocation rule, improving worst-case efficiency even when bidders are shading.
Putting these strands together, we can view the market as a coupled primal–dual control system: bidders update shadow prices to satisfy campaign constraints; the mechanism maps bids to allocations and payments; and platform-side prediction shapes the mapping itself. Our contribution is to provide a welfare guarantee that is simultaneously (i) consistent with liquid-welfare benchmarks under budgets and RoS constraints, (ii) robust to decentralized learning without convergence, (iii) stable under bounded market drift via dynamic regret and explicit variation terms, and (iv) quantitatively responsive to the quality of ML advice through improved smoothness constants. With this context in place, we next formalize the repeated-auction model, constraints, and benchmarks that the welfare theory will reference.
We study a horizon of T repeated, single-slot ad auctions indexed by t ∈ [T]. In each round a public context (query, user, page, time-of-day, etc.) arrives, and each advertiser i ∈ [n] observes a scalar value vi, t ∈ [0, v̄] for winning the impression in that context. We interpret vi, t as an expected downstream value (e.g., expected conversions times value-per-conversion) conditional on the information available to bidder i at the time of bidding. Treating values as predictions is empirically natural in ad markets, and it also clarifies what is (and is not) being guaranteed: our welfare benchmark is computed with respect to the same vi, t sequence that bidders act on, rather than an unobserved ``true’’ value.
In every round t, bidders
submit bids bi, t
from a bounded action space ℬi (e.g., [0, b̄]), forming a bid profile bt = (b1, t, …, bn, t).
The platform runs a fixed mechanism M that maps bt to an
allocation and payments:
xi, t(bt) ∈ [0, 1], pi, t(bt) ≥ 0,
with the single-slot feasibility constraint $\sum_{i=1}^n x_{i,t}(\mathbf{b}_t) \le 1$
for all t. Allowing xi, t(bt) ∈ [0, 1]
accommodates randomization (e.g., tie-breaking or probabilistic
eligibility), and it is convenient for the convex-analytic arguments
used later. We keep M
reduced-form: it may correspond to a second-price auction with reserves,
VCG with boosts, or another allocation/payment rule deployed by the
platform; the welfare theorems will only require an efficiency property
of M (formalized via
constrained smoothness in the next section), not truthfulness.
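For concreteness, the following minimal sketch instantiates one possible reduced-form choice of M, a single-slot second-price auction with a reserve; the function name, signature, and reserve handling are illustrative assumptions rather than part of the model.

```python
from typing import Sequence, Tuple
import numpy as np

def second_price_with_reserve(bids: Sequence[float],
                              reserve: float = 0.0) -> Tuple[np.ndarray, np.ndarray]:
    """One possible reduced-form mechanism M for a single slot: allocate to the
    highest bid if it clears the reserve, charge the larger of the reserve and
    the highest competing bid. Returns (x, p) with x in [0,1]^n and sum(x) <= 1."""
    b = np.asarray(bids, dtype=float)
    x = np.zeros(b.size)
    p = np.zeros(b.size)
    winner = int(np.argmax(b))
    if b[winner] >= reserve:
        x[winner] = 1.0
        competing = np.delete(b, winner)
        p[winner] = max(reserve, float(competing.max())) if competing.size else reserve
    return x, p
```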
Given outcomes (xi, t, pi, t),
bidder i realizes value
Vi, t := xi, t(bt) vi, t.
We allow advertisers to interpolate between pure value maximization and
quasilinear utility by introducing a parameter λi ∈ [0, 1] and
writing the (unconstrained) per-round objective as Vi, t − λipi, t(bt).
This is a simple way to capture heterogeneous bidding goals seen in
practice: some campaigns are almost purely volume-driven (small λi), while
others behave closer to profit maximizers (larger λi).
Importantly, the hard constraints below will be enforced at the campaign level,
so the learning problem is not myopic even when λi = 0.
Each bidder i faces two
horizon-coupled constraints. First, a hard spend budget Bi:
$$
\sum_{t=1}^T p_{i,t}(\mathbf{b}_t) \;\le\; B_i.
$$
Second, a hard return-on-spend (RoS) target τi > 0,
requiring cumulative value to be at least τi times
cumulative spend:
$$
\sum_{t=1}^T x_{i,t}(\mathbf{b}_t)\, v_{i,t} \;\ge\; \tau_i \sum_{t=1}^T
p_{i,t}(\mathbf{b}_t).
$$
This RoS constraint is a stylized version of ROI/CPA guardrails used by
many autobidders: the campaign is permitted to spend only insofar as it
is supported by predicted value at a target rate. We emphasize that the
constraint couples allocation and payments, hence it cannot generally be
enforced by a static per-auction bid cap alone; it requires
intertemporal pacing and, from an analytical perspective, it motivates
Lagrangian dual variables that evolve over time.
While we state the constraints in hard form, we will allow learning algorithms to incur a small cumulative violation Violi(T) (and we will later highlight how primal–dual pacing can make Violi(T) = 0 with appropriate safeguards). This is both technically convenient and practically aligned with the fact that platforms often permit small transient deviations while avoiding systematic overspend or underperformance.
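As a sanity check on these definitions, the sketch below computes the cumulative budget and RoS violations for a single campaign trajectory; the helper name and array-based interface are illustrative assumptions.

```python
import numpy as np

def constraint_violations(x, p, v, B_i, tau_i):
    """Cumulative violations for one campaign: x, p, v are length-T arrays of
    allocations x_{i,t}, payments p_{i,t}, and values v_{i,t}; B_i is the budget
    and tau_i the RoS target. Both outputs are zero when the hard constraints hold."""
    spend = float(np.sum(p))
    value = float(np.sum(np.asarray(x) * np.asarray(v)))
    viol_budget = max(0.0, spend - B_i)          # excess of sum_t p_{i,t} over B_i
    viol_ros = max(0.0, tau_i * spend - value)   # shortfall of value below tau_i * spend
    return viol_budget, viol_ros
```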
With liquidity constraints, conventional social welfare ∑i, txi, tvi, t can be misleading as an efficiency yardstick: allocating many high-value impressions to a bidder with a tight budget (or strict RoS) may be infeasible to support at the required payments. We therefore evaluate outcomes using a benchmark that caps each bidder’s contribution by what its constraints can rationally finance.
For a realized allocation sequence {xi, t}t = 1T,
define bidder i’s liquid welfare contribution as
$$
\min\!\left\{\, B_i,\; \frac{1}{\tau_i}\sum_{t=1}^T x_{i,t}
v_{i,t}\right\}.
$$
The term (1/τi)∑txi, tvi, t
is the maximum spend that could be justified by the realized value at
RoS τi,
whereas Bi
is the hard liquidity cap; the minimum of the two is the amount of
``effective welfare’’ bidder i
can contribute. Summing across bidders yields cumulative liquid
welfare,
$$
\mathrm{Lwel}(\{x_{i,t}\}) \;:=\; \sum_{i=1}^n \min\!\left\{\, B_i,\;
\frac{1}{\tau_i}\sum_{t=1}^T x_{i,t} v_{i,t}\right\}.
$$
This objective has two virtues for our purposes. Economically, it
respects the fact that budgets and performance constraints limit how
much value can be translated into feasible payments and hence into
implementable allocations. Analytically, it aligns with the kinds of
inequalities that relate revenue, payments, and feasible surplus in
auction games, which is what ultimately makes price-of-anarchy style
reasoning possible under constraints.
Because Lwel(⋅) is defined over the entire horizon, it is not inherently additive across rounds. Nevertheless, it is convenient to speak of per-round contributions Lwelt when stating learning guarantees; any decomposition that sums to the realized Lwel (e.g., via a telescoping construction) suffices for the reductions used later. The key point is that the benchmark is campaign-level and constraint-aware.
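The liquid-welfare objective is straightforward to evaluate on a realized trajectory; a minimal sketch (with an assumed array layout) follows.

```python
import numpy as np

def liquid_welfare(x, v, B, tau):
    """Cumulative liquid welfare: sum_i min(B_i, (1/tau_i) sum_t x_{i,t} v_{i,t}).
    x and v have shape (n, T); B and tau have length n."""
    x, v = np.asarray(x, dtype=float), np.asarray(v, dtype=float)
    realized_value = np.sum(x * v, axis=1)   # per-bidder sum_t x_{i,t} v_{i,t}
    return float(np.sum(np.minimum(np.asarray(B, dtype=float),
                                   realized_value / np.asarray(tau, dtype=float))))
```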
To measure inefficiency, we compare the realized outcome to an
offline planner who knows the entire value sequence {vi, t}
in advance and chooses feasible allocations to maximize liquid welfare.
Formally, letting 𝒳 denote the set of
sequences {xi, t}
satisfying per-round feasibility (xi, t ∈ [0, 1]
and ∑ixi, t ≤ 1),
we define
OPTT := max{xi, t} ∈ 𝒳 Lwel({xi, t}).
This benchmark is intentionally mechanism-agnostic: it does not require
that the allocation be generated by equilibrium bidding in M, only that it be feasible in the
single-slot sense and evaluated under the same (Bi, τi)
caps. We view this as the appropriate notion of ``what was achievable’’
from the perspective of a platform designer or policy analyst: it
captures the best constraint-feasible use of impression supply given
realized predicted values, abstracting from strategic and learning
frictions. A limitation of this choice is that it may be unattainable
under a particular mechanism or information structure; the role of
constrained smoothness will be precisely to relate attainable outcomes
under M to this offline ideal
up to constant factors.
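Because the min{·, ·} terms are concave and piecewise linear, OPT_T can be computed by a linear program with auxiliary variables w_i ≤ B_i and w_i ≤ (1/τ_i)∑_t v_{i,t} x_{i,t}. The sketch below assumes SciPy's linprog is available; the encoding and helper name are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def offline_liquid_welfare_opt(v, B, tau):
    """OPT_T: maximum liquid welfare over allocations with x_{i,t} in [0,1] and
    sum_i x_{i,t} <= 1. Decision vector is [x flattened as (i, t), then w (n)]."""
    v = np.asarray(v, dtype=float)                     # shape (n, T)
    B = np.asarray(B, dtype=float)
    tau = np.asarray(tau, dtype=float)
    n, T = v.shape
    num_x = n * T
    c = np.concatenate([np.zeros(num_x), -np.ones(n)])  # minimize -sum_i w_i

    A_ub, b_ub = [], []
    for t in range(T):                                 # single-slot feasibility per round
        row = np.zeros(num_x + n)
        row[t:num_x:T] = 1.0                           # picks x_{i,t} for every i
        A_ub.append(row); b_ub.append(1.0)
    for i in range(n):                                 # w_i <= (1/tau_i) sum_t v_{i,t} x_{i,t}
        row = np.zeros(num_x + n)
        row[i * T:(i + 1) * T] = -v[i] / tau[i]
        row[num_x + i] = 1.0
        A_ub.append(row); b_ub.append(0.0)

    bounds = [(0.0, 1.0)] * num_x + [(0.0, float(B[i])) for i in range(n)]  # w_i <= B_i
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=bounds, method="highs")
    return -res.fun
```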
A central feature of ad markets is that values drift: conversion
rates change with seasonality, user mix, and product availability;
prediction models are retrained; and competing bidders enter or exit. To
model this without committing to a stochastic process, we treat {vi, t}
as an arbitrary bounded sequence and quantify its drift via a
bounded-variation measure,
$$
\mathrm{Var}_T \;:=\; \sum_{t=2}^T \max_{i\in[n]}
\bigl|v_{i,t}-v_{i,t-1}\bigr|.
$$
This scalar proxy is deliberately coarse, but it captures the stability
property that matters for learning guarantees: when VarT is small, the
``right’’ bid shading and pacing multipliers should also move slowly,
making tracking feasible with small dynamic regret. When VarT is large, no
algorithm can reliably track a rapidly shifting environment without
incurring additional loss, so we expect (and later obtain) welfare
bounds that degrade additively in VarT.
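The drift statistic is a one-line computation; the sketch below assumes the same (n, T) array layout as before.

```python
import numpy as np

def value_drift(v):
    """Var_T = sum_{t=2}^T max_i |v_{i,t} - v_{i,t-1}| for a value matrix of shape (n, T)."""
    v = np.asarray(v, dtype=float)
    return float(np.sum(np.max(np.abs(np.diff(v, axis=1)), axis=0)))
```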
Finally, we specify feedback only at the level needed to define the learning problem: bidders observe vi, t before bidding and receive enough post-auction information to update their learning rules (e.g., payments and whether they won). We state our main results for clean full-information or semi-bandit feedback so that standard primal–dual regret bounds apply; extending to more limited feedback is possible but would primarily change the regret rates, which then enter the welfare guarantees as explicit error terms.
To connect strategic bidding in a fixed mechanism M to the offline benchmark OPTT, we use a smoothness-style inequality adapted to (i) hard intertemporal constraints (budgets and RoS) and (ii) a constraint-aware welfare objective (liquid welfare). Standard smoothness arguments were developed for quasilinear utilities and unconstrained social welfare; in our setting both of those ingredients fail in an essential way. Constrained smoothness is the device that restores a single ``one-shot’’ variational inequality that can later be aggregated over time and combined with regret bounds.
The core idea is to evaluate unilateral deviations using a Lagrangian
surrogate rather than the advertiser’s raw per-round objective. Fix
nonnegative multipliers μi, νi ≥ 0
corresponding to the budget and RoS constraints. For a given round
(suppressing t), we define
bidder i’s Lagrangian surrogate payoff at bid profile
b as
$$
u_i(\mathbf{b};\mu_i,\nu_i) \;:=\; x_i(\mathbf{b})\,v_i \;-\; \lambda_i\, p_i(\mathbf{b}) \;-\; \mu_i\!\left(p_i(\mathbf{b})-\frac{B_i}{T}\right) \;+\; \nu_i\bigl(x_i(\mathbf{b})\,v_i-\tau_i\, p_i(\mathbf{b})\bigr).
$$
The additive constant μiBi/T
is immaterial for best responses but convenient when we sum over time.
What matters is that the surrogate can be rewritten as a quasilinear payoff with
constraint-dependent coefficients:
$$
u_i(\mathbf{b};\mu_i,\nu_i) \;=\; (1+\nu_i)\,x_i(\mathbf{b})\,v_i \;-\; (\lambda_i+\mu_i+\tau_i\nu_i)\, p_i(\mathbf{b}) \;+\; \frac{\mu_i B_i}{T}.
$$
Thus, conditional on the multipliers, bidder i behaves as if it had value (1 + νi)vi
and a ``price weight’’ (λi + μi + τiνi)
on payments. This observation is what allows us to import one-shot
smoothness reasoning into a constrained, horizon-coupled problem: the
constraints are not enforced inside the deviation itself, but they are
priced into the surrogate utility.
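To make the rescaling concrete, a small sketch (with illustrative function names) evaluates the surrogate payoff and the induced effective value for given multipliers; it assumes a strictly positive price weight.

```python
def lagrangian_payoff(x_i, p_i, v_i, lam_i, mu_i, nu_i, B_i, tau_i, T):
    """Per-round Lagrangian surrogate: (1+nu_i) * x_i * v_i minus the price weight
    (lam_i + mu_i + tau_i * nu_i) times p_i, plus the constant mu_i * B_i / T."""
    return ((1.0 + nu_i) * x_i * v_i
            - (lam_i + mu_i + tau_i * nu_i) * p_i
            + mu_i * B_i / T)

def effective_value(v_i, lam_i, mu_i, nu_i, tau_i):
    """Effective value obtained by dividing the surrogate by the price weight, so
    that the bidder behaves as a standard quasilinear bidder with this value."""
    return (1.0 + nu_i) * v_i / (lam_i + mu_i + tau_i * nu_i)
```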
Let LW(⋅) denote the relevant
liquid-welfare benchmark for the one-shot instance (or, equivalently,
the per-round component induced by the offline optimum; we keep the
definition abstract since the repeated-game reduction will handle the
horizon coupling). We say that M is (α, β)-constrained smooth if for every value profile
v, every constraint
profile {(Bi, τi)}i,
every nonnegative multiplier profile {(μi, νi)}i,
and every bid profile b, there exists a (possibly
randomized) deviation mapping bi′ = δi(vi, Bi, τi, μi, νi)
such that
$$
\sum_{i=1}^n \mathbb{E}\Bigl[u_i\bigl((b_i', \mathbf{b}_{-i});\, \mu_i, \nu_i\bigr)\Bigr] \;\ge\; \frac{1}{\alpha}\,\mathrm{LW}^{\star} \;-\; \beta\,\mathrm{Rev}(\mathbf{b}),
$$
where Rev(b) := ∑ipi(b)
and LW⋆ is the (one-shot)
optimal liquid welfare for the instance. The expectation is over any
randomness in the deviations and/or mechanism.
Two aspects of this definition are worth emphasizing. First, the deviation may depend on the bidder’s own constraints and multipliers; this is unavoidable because those parameters determine how aggressively the bidder should trade off value against spend. Second, the right-hand side contains revenue, not welfare, because later we will upper bound revenue by a constraint-aware welfare quantity (intuitively, if constraints are respected, the platform cannot extract more than what campaigns can finance).
In the unconstrained, quasilinear setting, smoothness is typically stated with utilities vixi − pi and benchmark SW⋆ = maxx∑ivixi. The deviation is often a simple shading rule such as ``bid a random fraction of value.’’ Our constrained version differs in three structural ways.
We do not require the deviation bids themselves to satisfy budgets or RoS constraints. Instead, feasibility is enforced by the learning dynamics through the multipliers, while the analysis evaluates deviations in the Lagrangian surrogate payoff. This is analytically crucial: hard constraints are intertemporal and would otherwise destroy the one-shot nature of smoothness.
Social welfare can be artificially inflated by allocating to a bidder whose constraints prevent it from paying for those allocations at the implied prices. Liquid welfare corrects this by capping each bidder’s effective contribution by the tighter of its budget and its RoS-implied spend capacity. Smoothness must therefore be proven with respect to a benchmark that already ``knows’’ about liquidity.
In standard smoothness, revenue is subtracted because payments are
transfers that reduce quasilinear utility. In our repeated constrained
setting, the same subtraction appears inside the smoothness inequality, but the repeated-game
reduction will additionally use that, for any realized trajectory
satisfying the constraints,
$$
\sum_{t=1}^T \mathrm{Rev}_t \;=\; \sum_{i=1}^n\sum_{t=1}^T p_{i,t}(\mathbf{b}_t) \;\le\; \sum_{i=1}^n \min\!\left\{B_i,\;\frac{1}{\tau_i}\sum_{t=1}^T x_{i,t}\,v_{i,t}\right\} \;=\; \mathrm{Lwel}(\{x_{i,t}\}).
$$
This inequality is the hinge that turns constrained smoothness into a welfare bound: revenue is
not an arbitrary negative term, but something that can be dominated by a
constraint-aware welfare measure when campaigns are kept feasible.
Because the surrogate payoff is quasilinear after rescaling, a useful way to think about
the deviation δi is that it
bids proportional to a shadow value. Concretely, many auction formats admit
deviations of the form
$$
b_i' \;=\; \kappa_i\, v_i, \qquad \kappa_i \;=\; \kappa(\lambda_i,\mu_i,\nu_i,\tau_i),
$$
where κi
decreases when the budget multiplier μi rises (the
bidder is ``pacing’’ spend) and also adjusts with νi to respect
the RoS target (the bidder is effectively raising the bar on value
relative to spend). Smoothness then asserts that if each bidder
unilaterally adopted its shadow-value deviation against an arbitrary
profile b, the sum of the
resulting Lagrangian payoffs would cover a constant fraction of the
liquid-welfare optimum, up to a revenue term.
The definition is general, but it is not vacuous: it is satisfied by canonical ad-auction mechanisms under mild regularity conditions.
In truthful formats, allocations are monotone in bids and payments are threshold-like, which makes proportional deviations particularly effective. Via the quasilinear rescaling above, we can often reduce constrained smoothness to the usual smoothness proof for the same mechanism with rescaled values and payments. Moreover, when the platform incorporates ML advice through personalized reserves or boosts, the deviation analysis can be tightened: roughly, better screening of low effective values prevents constraint-driven ``high-multiplier’’ bidders from winning at the expense of truly high vi bidders, improving the constant α (and hence the welfare approximation).
First-price auctions are smooth in the standard sense with a constant strictly worse than 1 (reflecting bid shading), and a similar argument can be carried out with Lagrangian payoffs provided payments remain bid-bounded and allocations are monotone. The resulting constrained constants are typically weaker than in second-price/VCG, which matches the practical intuition that first-price introduces additional strategic friction on top of pacing and RoS considerations.
Constrained smoothness is a property of the mechanism and is not guaranteed in arbitrary designs.
If small bid changes can flip allocations in ways that destroy threshold structure, then proportional (or randomized shading) deviations may fail to secure any fraction of LW⋆ uniformly over b.
The reduction relies on being able to relate revenue to liquid welfare along the realized path (cf. the revenue-to-liquid-welfare inequality above). If the mechanism can charge large payments unrelated to bids (or if bidders can be forced into negative payoffs even when following conservative bids), then revenue may no longer be bounded by a constraint-aware welfare quantity, and the revenue subtraction in the smoothness inequality cannot be absorbed.
Our deviations are allowed to depend on vi and on (Bi, τi, μi, νi), but not on rivals’ values. If the only deviations that certify efficiency require knowledge of others’ types or bids beyond what the model permits, then smoothness is not the right tool.
These limitations are not merely technical: they delineate when we can hope for robust, learning-compatible welfare guarantees. In the next section we exploit constrained smoothness as the ``mechanism layer’’ inequality, and we specify learning dynamics (primal–dual pacing with dynamic regret control) that make the multiplier process and constraint violations behave well enough to turn this one-shot inequality into a horizon-level liquid-welfare guarantee.
The constrained-smoothness inequality is a purely one-shot statement: it compares payoffs under a particular deviation to the optimal liquid-welfare benchmark for the same instance. To convert it into a horizon-level guarantee, we need a complementary learning layer that explains how bidders choose bids over time while respecting hard budgets and RoS targets. We emphasize a practical modeling choice: we do not assume bidders can solve the constrained horizon problem offline. Instead, we require only that each bidder runs an online algorithm whose performance can be summarized by two quantities that will enter the final welfare bound: a regret term (static or dynamic) and a constraint-violation term.
Fix bidder i and consider
the Lagrangian payoff defined above, now indexed by time:
$$
u_{i,t}(\mathbf{b}_t;\mu_i,\nu_i) \;:=\; (1+\nu_i)\,x_{i,t}(\mathbf{b}_t)\,v_{i,t} \;-\; (\lambda_i+\mu_i+\tau_i\nu_i)\, p_{i,t}(\mathbf{b}_t) \;+\; \frac{\mu_i B_i}{T}.
$$
For the purpose of regret analysis, it is convenient to view ui, t
as a function of bi, t
holding rivals fixed, i.e.,
ui, t(bi, t) := ui, t((bi, t, b−i, t); μi, νi).
Under full-information feedback (or semi-bandit feedback sufficient to
infer threshold prices), bidder i can evaluate ui, t(b)
for alternative bids b ∈ ℬi and run
standard online-optimization updates. Conceptually, the bidder is not
directly learning in the original constrained objective; it learns
against the surrogate sequence ui, t(⋅),
where constraint pressures are encoded by the multipliers.
A simple and widely-used class of bidding rules is proportional (paced) bidding: bids are
proportional to predicted value,
$$
b_{i,t} \;=\; \kappa_{i,t}\, v_{i,t},
$$
with κi, t
updated over time to control spend and RoS. The proportional form is not
imposed by our welfare analysis, but it is a useful sufficient condition
because it reduces the bidder’s action space from bids to a single
scalar. In truthful single-slot formats (second-price/VCG-like with
monotone allocation and threshold payments), the transformed quasilinear
representation suggests a closed-form mapping from multipliers to an
``effective shading’’ level. A canonical choice, consistent with the effective-value rescaling above, is
$$
\kappa_{i,t} \;=\; \min\!\left\{1,\; \frac{1+\nu_{i,t}}{\lambda_i+\mu_{i,t}+\tau_i\,\nu_{i,t}}\right\},
$$
which increases bids when the RoS constraint is slack (small νi, t)
and decreases bids when the budget constraint tightens (large μi, t).
While is mechanism-dependent as an exact best response, it captures the
economic direction of all primal–dual pacing loops: dual variables are
shadow prices, and the primal bid is a monotone function of these
prices.
We update multipliers by (projected) subgradient ascent on the dual
of the horizon constraints, using realized per-round spend and
value:
$$
\mu_{i,t+1} \;=\; \Bigl[\mu_{i,t} + \eta_\mu\Bigl(p_{i,t}(\mathbf{b}_t)-\tfrac{B_i}{T}\Bigr)\Bigr]_+,
\qquad
\nu_{i,t+1} \;=\; \Bigl[\nu_{i,t} + \eta_\nu\bigl(\tau_i\, p_{i,t}(\mathbf{b}_t)- x_{i,t}(\mathbf{b}_t)\, v_{i,t}\bigr)\Bigr]_+,
$$
optionally with projection onto a bounded box [0, μ̄] × [0, ν̄] to obtain
uniform Lipschitz constants. These updates are interpretable: μi, t
increases when the bidder is overspending relative to its average
per-round budget, and νi, t
increases when realized value falls short of τi times spend.
In practice, such dual updates correspond closely to the
``pacing multipliers'' and ``RoS multipliers’’ implemented in
autobidding systems.
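The sketch below combines the proportional bid rule and the projected dual updates into one pacing loop for a single bidder; the function and parameter names (including `mechanism_round`, the multiplier caps, and the hard budget stop) are illustrative assumptions, not a prescribed implementation.

```python
import numpy as np

def primal_dual_pacing(values, mechanism_round, B_i, tau_i, lam_i, T,
                       eta_mu=None, eta_nu=None, mu_bar=10.0, nu_bar=10.0):
    """One bidder's primal-dual pacing loop. Each round: bid a paced fraction of
    value, observe (x_t, p_t) from mechanism_round(bid, t), then adjust the budget
    multiplier mu and the RoS multiplier nu by projected subgradient steps."""
    eta_mu = 1.0 / np.sqrt(T) if eta_mu is None else eta_mu
    eta_nu = 1.0 / np.sqrt(T) if eta_nu is None else eta_nu
    mu, nu, spend = 0.0, 0.0, 0.0
    history = []
    for t in range(T):
        v_t = values[t]
        denom = lam_i + mu + tau_i * nu
        kappa = 1.0 if denom <= 0.0 else min(1.0, (1.0 + nu) / denom)  # conservative pacing
        bid = kappa * v_t if spend < B_i else 0.0    # hard stop once the budget is spent
        x_t, p_t = mechanism_round(bid, t)           # allocation in [0,1], payment >= 0
        spend += p_t
        # Dual ascent: mu rises when spend exceeds the per-round budget B_i/T,
        # nu rises when realized value falls short of tau_i times spend.
        mu = min(mu_bar, max(0.0, mu + eta_mu * (p_t - B_i / T)))
        nu = min(nu_bar, max(0.0, nu + eta_nu * (tau_i * p_t - x_t * v_t)))
        history.append((bid, x_t, p_t, mu, nu))
    return history
```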
Our welfare theorem will allow small (or zero) violations, so we make
explicit what is being measured. Define cumulative violations
$$
\mathrm{Viol}^{\mathrm{bud}}_i(T) \;:=\; \Bigl(\sum_{t=1}^T p_{i,t}(\mathbf{b}_t)-B_i\Bigr)_+,
\qquad
\mathrm{Viol}^{\mathrm{RoS}}_i(T) \;:=\; \Bigl(\tau_i\sum_{t=1}^T p_{i,t}(\mathbf{b}_t)-\sum_{t=1}^T x_{i,t}(\mathbf{b}_t)\, v_{i,t}\Bigr)_+,
$$
and let Violi(T) denote
either their sum or any upper bound used by the analysis. In many ad
platforms the budget constraint is enforced mechanically (spend cannot
exceed the remaining budget), which corresponds to Violibud(T) = 0
pathwise. RoS is often enforced by throttling or pausing when the
observed value-to-spend ratio drops; this again yields a hard (or nearly
hard) constraint in operational terms. Our analysis accommodates both
cases: when constraints are enforced strictly, the violation term
disappears; when constraints are enforced softly, the welfare bound
degrades linearly with Violi(T) scaled
by a Lipschitz constant Lipi capturing the
largest per-round sensitivity of payoffs to constraint slack.
Given a sequence of per-round payoff functions ui, t(⋅)
(as above), external regret compares the realized sequence (bi, t)t ≤ T
to the best fixed bid in hindsight:
$$
\mathrm{Reg}_i(T) := \max_{b\in\mathcal{B}_i} \sum_{t=1}^T u_{i,t}(b) -
\sum_{t=1}^T u_{i,t}(b_{i,t}).
$$
In non-stationary markets, the relevant benchmark is typically not
fixed. We therefore use dynamic regret against a drifting comparator sequence (bi, t⋆)t ≤ T
with bounded path length:
$$
\mathrm{DR}_i(T;P) \;:=\; \max_{(b^\star_{i,t})_{t\le T}:\; \sum_{t=2}^T |b^\star_{i,t}-b^\star_{i,t-1}|\le P}\;\; \sum_{t=1}^T u_{i,t}(b^\star_{i,t}) \;-\; \sum_{t=1}^T u_{i,t}(b_{i,t}).
$$
The comparator restriction is essential: without limiting drift, dynamic
regret is vacuous. Economically, the path length P captures how quickly the ``right’’
pacing level moves as the market moves. In our setting, we will align
P with the variation measure
VarT (or a
mechanism-specific proxy): when values and market conditions drift
slowly, so do the best-response deviations used in smoothness, and a
bidder that can track such drift incurs only a small additional
loss.
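For diagnostics, dynamic regret against any one candidate comparator sequence is easy to evaluate (the true quantity maximizes over all comparators with path length at most P); the helper below is an illustrative sketch.

```python
import numpy as np

def dynamic_regret_against(payoff_fns, played_bids, comparator_bids):
    """payoff_fns[t] maps a bid to the surrogate payoff u_{i,t}(b). Returns the regret
    of the played bids against the comparator sequence, together with the comparator's
    path length sum_t |b*_t - b*_{t-1}|."""
    played = sum(payoff_fns[t](b) for t, b in enumerate(played_bids))
    compared = sum(payoff_fns[t](b) for t, b in enumerate(comparator_bids))
    path_length = float(np.sum(np.abs(np.diff(np.asarray(comparator_bids, dtype=float)))))
    return compared - played, path_length
```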
For the main welfare reduction, we do not need the bidder’s update rule to be exactly the pacing rule above; we only need verifiable performance guarantees. A clean set of sufficient conditions (stated informally here) is: (C1) the dynamic regret DRi(T; P) on the Lagrangian payoff sequence is sublinear in T for path lengths P commensurate with market drift; (C2) the constraint violations Violi(T) are either zero (by hard enforcement) or sublinear in T; (C3) the multipliers (μi, t, νi, t) remain in a bounded box, so that per-round Lagrangian payoffs are uniformly bounded and Lipschitz.
These conditions are intentionally modular: they isolate what the learning layer must deliver, independently of the particular mechanism-specific smoothness proof.
In second-price/VCG-like mechanisms, the mapping from bids to allocations and threshold payments is monotone and piecewise smooth, which makes the induced Lagrangian payoff well behaved as a function of a one-dimensional pacing parameter κi, t. When bidders receive (at least) the clearing price conditional on winning, they can implement online updates on κi, t using either full-information evaluation (testing counterfactual bids) or semi-bandit gradient estimation. On the dual side, the updates are standard projected subgradient steps for a convex dual objective; with step sizes $\eta_\mu,\eta_\nu \asymp 1/\sqrt{T}$ and bounded iterates, they yield $O(\sqrt{T})$ stability and, when coupled with hard stopping rules (e.g., cease bidding once remaining budget is exhausted), achieve Violibud(T) = 0 exactly.
Two caveats are worth flagging. First, the regret bounds above are cleanest under convexity/concavity assumptions and sufficiently rich feedback; in highly discontinuous auction rules or with severely censored feedback, dynamic regret can deteriorate, and our welfare guarantees weaken proportionally. Second, RoS targets create a subtle identification issue: if realized value xi, tvi, t is noisy or delayed (as in conversion attribution), then the dual update for νi, t must be modified (e.g., using unbiased estimators or delayed-gradient corrections). Our framework still applies as long as the resulting algorithm admits a regret/violation summary of the form (C1)–(C3); the constants simply absorb the additional estimation error.
This concludes the learning layer: bidders need only ensure that their bidding dynamics are no-regret (or low dynamic-regret) with respect to the Lagrangian payoff sequence and that any constraint violations are either zero by enforcement or quantitatively controlled. In the next section, we combine these algorithmic guarantees with constrained smoothness to obtain a learning-robust liquid-welfare bound with explicit dependence on ∑iDRi(T) and VarT.
We now state the reduction that turns the one-shot constrained-smoothness inequality into a horizon-level efficiency guarantee when bidders learn online. The message is that we can treat the repeated auction as a sequence of (possibly drifting) instances: smoothness provides, in each round, a that would certify good liquid welfare if bidders were willing to play it; dynamic regret ensures bidders track, up to a controllable loss, the cumulative payoff of that drifting deviation; and the only additional degradation comes from non-stationarity (captured by VarT) and any constraint violations that are not enforced mechanically.
Under (H1)–(H3) together with the learning conditions (C1)–(C3), the realized outcome satisfies a bound of the form
$$
\sum_{t=1}^T \mathrm{Lwel}_t \;\ge\; \frac{1}{\alpha+\beta}\,\mathrm{OPT}_T \;-\; \sum_{i=1}^n \mathrm{DR}_i(T;\, c_i\,\mathrm{Var}_T) \;-\; C_{\mathrm{drift}}\cdot\mathrm{Var}_T \;-\; C_{\mathrm{viol}}\sum_{i=1}^n \mathrm{Lip}_i\,\mathrm{Viol}_i(T).
$$
This inequality decomposes welfare loss into three economically interpretable components.
First, the multiplicative constant $\frac{1}{\alpha+\beta}$ is the familiar term induced by the one-shot smoothness inequality: when bidders best-respond (or no-regret learn) relative to the deviation family, aggregate welfare is at least a constant fraction of the offline benchmark. The β term appears because smoothness typically subtracts realized revenue, which we then control via feasibility and liquid-welfare dominance (in many constrained environments, revenue is upper bounded by liquid welfare once budgets/RoS bind).
Second, ∑iDRi(T; ciVarT) captures the cost of decentralized learning. If bidders could instantly track the drifting comparator bids singled out by the smoothness proof, this term would vanish. With standard online optimization rates, DRi is sublinear in T for moderate drift, implying that the per-round welfare loss from learning decays.
Third, Cdrift ⋅ VarT is the direct cost of non-stationarity in the reduction itself. Even if bidders could perfectly track payoffs, the smoothness deviation we use changes as the instance changes; bounding the welfare impact of splicing together per-round deviations requires a stability argument that scales with how much the environment moves.
Finally, the violation term isolates the operational choice of soft versus hard enforcement. When budgets are enforced by the platform and RoS is enforced via throttling/pausing, we typically have Violi(T) = 0 pathwise, and the last term disappears entirely. When constraints are treated as soft, violations translate linearly into welfare loss with slope Lipi, reflecting the worst-case sensitivity of the Lagrangian payoff to constraint slack.
Although we have stated the guarantee in terms of a few structural terms, it is useful to
make explicit what it does and does not hide in the constants. The factors Cdrift and Cviol do not scale with
T; they depend on boundedness
of values and payments and on any projection ranges for multipliers
(which ensure the per-round Lagrangian payoffs are uniformly Lipschitz).
Thus, under a standard regime where DRi(T; ⋅) = o(T)
and VarT = o(T),
the theorem implies an asymptotic efficiency guarantee:
$$
\frac{1}{T}\sum_{t=1}^T \mathrm{Lwel}_t
\;\ge\;
\frac{1}{\alpha+\beta}\cdot \frac{\mathrm{OPT}_T}{T}
\;-\; o(1).
$$
This makes the tradeoff transparent: longer horizons wash out transient
learning errors, while faster market drift and richer constraint
management (via large multipliers or soft constraints) can increase the
constant factors.
The theorem also clarifies how platform-side design choices enter. If
the platform uses ML advice to implement reserves/boosts and thereby
improves the constrained smoothness constant from α to a smaller α(γ) (as in the
instantiation discussed in the global context), then the leading
constant improves mechanically:
$$
\frac{1}{\alpha+\beta}
\quad\leadsto\quad
\frac{1}{\alpha(\gamma)+\beta},
$$
and all lower-order terms remain of the same qualitative form. In
particular, better advice quality γ improves welfare in the precise
sense that it strengthens the multiplicative approximation to OPTT without requiring
any change in bidder learning dynamics.
The argument is the standard smoothness-to-welfare reduction, with two modifications reflecting our setting. First, we apply the constrained-smoothness inequality round-by-round, summing over t = 1, …, T and using the deviation mapping supplied by (H1). Second, rather than comparing bidders to a fixed deviation (which would yield static regret), we compare them to the deviation sequence and invoke dynamic regret (H2). The cost of moving from a per-round deviation to an admissible comparator sequence is controlled by (H3): the path length of the deviation sequence is bounded by a constant multiple of VarT. Finally, we convert the resulting bound on summed Lagrangian payoffs into liquid welfare by (i) canceling the dual terms via feasibility (or bounding them by LipiVioli(T) under soft constraints), and (ii) absorbing revenue terms using the β-penalty and the usual revenue-to-welfare relationship that holds under budget/RoS feasibility. Randomization is handled by taking expectations at the end, since every inequality in the chain is linear in realized payoffs.
The theorem is intentionally modular: it separates what the mechanism must certify (a constrained-smoothness inequality with explicit (α, β)) from what the learning layer must deliver (dynamic regret and controlled violations), and it isolates non-stationarity through a single drift statistic VarT. The price we pay is that the drift dependence is necessarily coarse: VarT is a worst-case scalar summary, and in applications one can often replace it by a finer, mechanism-specific stability measure that yields smaller constants. Moreover, the bound is only as good as the feedback model allows: if bidders cannot estimate counterfactual payoffs well enough to obtain sublinear dynamic regret, the learning term dominates and the welfare guarantee correspondingly weakens.
With the main reduction in hand, we next instantiate (H1) and the corresponding constants for commonly used ad auction formats (truthful and non-truthful), including how reserves/boosts and randomization affect (α, β) and the comparator stability needed for (H3).
We now make hypothesis (H1) concrete by recording (α, β) parameters for
canonical single-slot mechanisms and explaining how the budget/RoS
constraints enter the smoothness proof through the Lagrangian
transformation. The guiding observation is that our per-round Lagrangian
payoff is an affine reweighting of the usual quasi-linear utility, so
standard smoothness arguments can be imported essentially as a black box
once we identify the induced ``virtual value’’ and ``virtual
bid’’.
Fix a bidder i and
multipliers (μi, νi) ≥ 0.
Up to the constant term μiBi/T,
the per-round Lagrangian payoff can be written as
$$
u_{i,t}(\mathbf{b}_t;\mu_i,\nu_i) \;=\; (1+\nu_i)\,x_{i,t}(\mathbf{b}_t)\,v_{i,t} \;-\; w_i\, p_{i,t}(\mathbf{b}_t),
\qquad w_i \;:=\; \lambda_i+\mu_i+\tau_i\,\nu_i.
$$
Thus, for fixed multipliers, bidder i behaves as if she had (i) a value (1 + νi)vi, t
for allocation and (ii) a weight wi on money.
Dividing by wi (which does
not affect best responses) shows that maximizing the surrogate is equivalent to
maximizing a quasi-linear utility with effective value
$$
\widetilde{v}_{i,t} \;:=\; \frac{1+\nu_i}{\lambda_i+\mu_i+\tau_i\,\nu_i}\; v_{i,t}
$$
and standard payment pi, t.
This reduction is the technical reason constrained smoothness remains
tractable: we can prove smoothness for the induced instance with values
(ṽi, t)
and then translate back, with constants depending only on
boundedness/projection radii for multipliers (to keep ṽi, t
bounded and Lipschitz).
In a single-slot environment, VCG coincides with the second-price auction (SPA) up to tie-breaking and reserve handling, so we state results in SPA language. The key property is that, under conservative bidding (no overbidding relative to the relevant value proxy), SPA admits a deviation guaranteeing a constant fraction of welfare without needing to subtract revenue.
Concretely, consider the standard deviation that bids a fixed
fraction of value. For our constrained setting, the natural deviation
is
$$
b'_{i,t} \;=\; \theta\,\widetilde{v}_{i,t} \qquad \text{for a fixed fraction } \theta\in(0,1],
$$
where ṽi, t
is defined above. Since the SPA allocation is monotone in bids and the
winner pays the highest competing bid, the classical smoothness
calculation (adapted to allow randomized tie-breaking) implies an
inequality of the form
$$
\sum_{i=1}^n \mathbb{E}\Bigl[u_{i,t}\bigl((b'_{i,t}, \mathbf{b}_{-i,t});\, \mu_i, \nu_i\bigr)\Bigr] \;\ge\; \tfrac{1}{2}\,\mathrm{OPT}^{\mathrm{eff}}_t,
$$
where OPTteff
denotes the per-round optimal welfare computed using the effective
values (ṽi, t).
Translating back to liquid welfare requires only the same bookkeeping as
in the reduction: the multipliers are already internalized by the
Lagrangian, and feasibility plus constraint handling ensures we can
compare OPTteff to the
liquid-welfare benchmark for the original instance. In the notation of
(H1), this corresponds to (α, β) = (2, 0) for SPA/VCG
under conservative bidding relative to ṽi, t.
Two remarks clarify what is assumed and what is gained.
Because bidder utilities in the Lagrangian surrogate penalize payments by a weight wi ≥ λi,
the relevant
``no-overbidding'' condition is $b_{i,t}\le \widetilde{v}_{i,t}$ (or the corresponding per-click analogue). In primal-dual pacing for truthful formats, bids typically take the form $b_{i,t}=\kappa_{i,t} v_{i,t}$ with $\kappa_{i,t}\in[0,1]$ generated by multipliers; this automatically enforces conservativeness once the mapping between $\kappa_{i,t}$ and $(\mu_{i,t},\nu_{i,t})$ is chosen consistently with \eqref{eq:effective_value}. Operationally, this is precisely the regime in which platforms are comfortable interpreting bids as ``shaded
values’’ rather than arbitrary strategic signals.
If we allow arbitrary overbidding, one typically obtains a smoothness inequality with a revenue subtraction (i.e., β > 0). In our framework this is still admissible because the reduction ultimately controls revenue by liquid welfare under hard feasibility/constraints, but the leading constant degrades from 1/(α + β) = 1/2 to a weaker fraction. We therefore treat (2, 0) as the clean baseline for truthful auctions when the learning layer enforces conservativeness, and we view the β > 0 variants as robustness add-ons rather than the economically salient case.
For the first-price auction (FPA), the winner pays her own bid, so
the per-round payoff is nonlinear in the bid even after the Lagrangian
reweighting. Nonetheless, the mechanism remains smooth in the standard
sense: there exists a randomized deviation (often based on bidding a
random fraction of value) that guarantees a constant fraction of the
optimum minus the realized revenue. Plugging the effective values ṽi, t
into the classical deviation construction yields, for each round t,
$$
\sum_{i=1}^n \mathbb{E}\Bigl[u_{i,t}\bigl((b'_{i,t}, \mathbf{b}_{-i,t});\, \mu_i, \nu_i\bigr)\Bigr] \;\ge\; \Bigl(1-\tfrac{1}{e}\Bigr)\,\mathrm{OPT}^{\mathrm{eff}}_t \;-\; \mathrm{Rev}_t,
$$
which corresponds to $(\alpha,\beta)=\bigl(\tfrac{e}{e-1},\,1\bigr)$
in our (1/α, β)
parameterization.
Two economic takeaways are worth emphasizing.
Because β = 1 in the FPA inequality, our reduction uses a revenue-to-liquid-welfare comparison to absorb the −Revt term. This step is harmless in regimes where budgets/RoS bind (revenue is then effectively ``liquid’’), but it can be loose when many bidders are slack. Said differently, non-truthful formats shift part of the efficiency analysis from allocation geometry to payment geometry; in a constrained environment this is not a dealbreaker, but it makes the constants more sensitive to how often constraints bind.
In SPA/VCG, the mapping bi, t = κi, tvi, t together with truthful pricing makes the per-round Lagrangian payoff effectively linear in the decision variable κi, t, enabling clean regret guarantees via standard online convex optimization. In FPA, the payoff as a function of bid is typically only unimodal and depends on the distribution of competing bids; even with full-information feedback on the winning threshold, obtaining dynamic regret bounds may require richer bidder-side estimation or explicit exploration. Our welfare theorem still applies verbatim once (H2) is verified, but in practice FPA shifts burden from mechanism design to bidder algorithm design.
Finally, we record a robustness principle that will matter both for practical auction implementations (where randomization is ubiquitous) and for proof hygiene (where randomized deviations are standard).
Suppose the mechanism randomizes in a way that preserves feasibility almost surely (e.g., randomized tie-breaking among equal bids; random selection among a shortlist; probabilistic throttling that never allocates more than one slot). Then any one-shot smoothness inequality proved for deterministic outcomes extends immediately by taking expectations over the mechanism’s internal randomness, because both sides of the inequality are linear in allocations and payments. In particular, if a deterministic SPA is (2, 0)-constrained-smooth under conservative bidding, then an SPA with randomized tie-breaking is also (2, 0)-constrained-smooth in expectation.
For FPA and for several reserve-based or boosted-score auctions, the best known deviations are randomized. Our constrained smoothness definition allows randomized deviations by design, so no additional machinery is required: we view the deviation mapping as producing a distribution over bids conditional on (vi, t, μi, νi), and we interpret the smoothness inequality in expectation over that deviation randomness (holding the realized bid profile bt fixed).
Randomization only affects (α, β) when it changes the underlying allocation/payment rule in a way that effectively ``dilutes’’ competition, such as δ-exploration mechanisms that allocate the slot uniformly at random with probability δ and run the base auction with probability 1 − δ. In such cases, smoothness typically degrades gracefully by a factor on the order of 1/(1 − δ) in the welfare term, reflecting the mechanically lost efficiency from forced exploration. This gives a clean way to account for such exploration, knowing that, in learning-heavy deployments, some randomization may be intentionally introduced for calibration or experimentation.
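A δ-exploration wrapper of the kind described here can be sketched as follows; allocating the exploration impression at zero payment is an assumption made purely for illustration.

```python
import numpy as np

def delta_exploration(base_mechanism, bids, delta, rng=None):
    """With probability delta allocate the slot uniformly at random at zero payment,
    otherwise run the base auction; smoothness-type guarantees typically dilute by
    roughly a 1/(1-delta) factor."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(bids)
    if rng.random() < delta:
        x = np.zeros(n)
        x[rng.integers(n)] = 1.0
        return x, np.zeros(n)
    return base_mechanism(bids)
```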
Taken together, these instantiations justify treating (H1) as a mild, mechanism-level condition: for truthful formats we obtain clean constants with minimal caveats, while for non-truthful and randomized formats we can still certify constrained smoothness at the cost of either a revenue term (β > 0) or a controlled loss due to explicit exploration. The next step is to show how platform-side ML advice, implemented via reserves/boosts, improves the leading constant by strengthening the effective competition among high-value bidders.
The smoothness constants recorded above treat the auction as purely bid-driven, and therefore inherit the classical factor-2 barrier for single-slot allocation under conservative behavior. In constrained autobidding, this barrier is often not the economically right baseline: bidders shade because of budgets and RoS, so bids partially reflect dual prices rather than underlying marginal value. A natural platform response—already common in practice—is to fold platform-side predictions into the allocation rule via bidder-specific reserves or score boosts. In our framework, such advice enters only through (H1): it modifies the mechanism M, and hence the parameters (α, β) that feed into the learning-to-welfare reduction.
Fix a round t. Suppose the
platform has an advice signal v̂i, t
satisfying a multiplicative guarantee
$$
\gamma\, v_{i,t} \;\le\; \hat{v}_{i,t} \;\le\; v_{i,t}, \qquad \gamma\in(0,1].
$$
The upper bound v̂i, t ≤ vi, t
is a substantive modeling choice: it is precisely the regime in which
using v̂i, t
for reserves/boosts is ``safe’’ from an individual-rationality
standpoint under conservative bidding, because it never forces a bidder
to pay (or effectively commit to pay) above her realized value. The
lower bound v̂i, t ≥ γvi, t
is the quantitative content: when γ is larger, advice is closer to the
truth and can be used more aggressively without sacrificing welfare
guarantees.
Because our smoothness arguments operate on the effective values ṽi, t defined above, it is convenient to note that the advice guarantee transfers immediately to the effective scale for any fixed multipliers: defining $\widehat{\widetilde v}_{i,t}:=\frac{1+\nu_i}{\lambda_i+\mu_i+\tau_i\nu_i}\hat v_{i,t}$ yields $\widehat{\widetilde v}_{i,t}\in[\gamma \widetilde v_{i,t},\,\widetilde v_{i,t}]$. Thus, advice quality is preserved by the Lagrangian reweighting.
One convenient way to incorporate advice while retaining clean
analysis is an SPA run on bidder scores that add an advice-based
constant:
$$
s_{i,t} \;:=\; b_{i,t} + \zeta\,\hat{v}_{i,t}, \qquad \text{allocate the slot to } i^\star \in \arg\max_i s_{i,t},
$$
with the threshold payment
$$
p_{i^\star,t} \;=\; \Bigl(\max_{j\neq i^\star} s_{j,t} \;-\; \zeta\,\hat{v}_{i^\star,t}\Bigr)_+
$$
and pi, t = 0
for losers. Since the boost term is independent of bi, t,
this boosted-score rule is simply VCG applied to a shifted bid space; in particular, the
threshold payment never exceeds the winner’s bid because winning implies
bi, t ≥ maxj ≠ isj, t − ζv̂i, t.
Economically, the boost says the platform is willing to ``credit’’ bidders with a
portion of their predicted value when ranking ads, counteracting shading
induced by pacing.
The parameter ζ ≥ 0 controls how strongly advice influences allocation. For our purposes, we only need that ζ is chosen as a (known) function of γ so that boosts are strong enough to prevent gross misallocations but not so strong as to make bids irrelevant. One can view this boosted SPA as a stylized proxy for the many score-based implementations used in ad platforms (including reserve-like floors, quality-score boosts, and bid multipliers), all of which preserve ex post feasibility and can be analyzed with the same smoothness vocabulary.
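A minimal sketch of the boosted-score rule (names illustrative) makes the allocation and the threshold payment explicit.

```python
import numpy as np

def boosted_second_price(bids, advice, zeta):
    """Advice-boosted SPA: rank by s_i = b_i + zeta * vhat_i and charge the winner
    max_{j != i} s_j - zeta * vhat_i, floored at zero; winning implies this never
    exceeds the winner's own bid."""
    b = np.asarray(bids, dtype=float)
    vhat = np.asarray(advice, dtype=float)
    s = b + zeta * vhat
    x = np.zeros(b.size)
    p = np.zeros(b.size)
    winner = int(np.argmax(s))
    x[winner] = 1.0
    competing = np.delete(s, winner)
    threshold = (float(competing.max()) if competing.size else 0.0) - zeta * vhat[winner]
    p[winner] = max(0.0, threshold)
    return x, p
```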
The effect of advice is to strengthen the deviation guarantee in the smoothness inequality: when a high-value bidder loses under the deviation, it must be because someone else has either (i) sufficiently high bid pressure or (ii) sufficiently high advice. Case (ii) is exactly where the advice lower bound bites, because losing to a high-v̂ bidder means losing to someone whose value is not too small relative to the optimum. This improves the worst-case inefficiency relative to a purely bid-driven ranking, where the winner can in principle have arbitrarily low value.
Formally, adapting the robust-auction-design calculations for single-slot score auctions to our effective-value instance yields the following per-round constrained smoothness inequality for an advice-augmented SPA of the form above:
$$
\sum_{i=1}^{n} \widehat u_{i,t}\bigl(b^{\star}_{i,t},\mathbf b_{-i,t};\mu_i,\nu_i\bigr)
\;\ge\;
\frac{1+\gamma}{2}\,\max_{i}\,\widetilde v_{i,t},
$$
where max_i ṽi, t is the per-round effective-welfare optimum in the single-slot case, for an appropriate deviation mapping (a fixed-fraction bid in the effective scale, as in the baseline construction, but with the fraction chosen to be compatible with the boost strength). Comparing to the baseline constant, we see that advice improves the welfare coefficient from 1/2 to (1 + γ)/2. In the notation of (H1), this corresponds to
$$
\alpha(\gamma) \;=\; \frac{2}{1+\gamma}, \qquad \beta(\gamma) \;=\; 0,
$$
so that the leading welfare guarantee in our learning theorem improves monotonically with γ.
It is worth pausing on the economics behind this improvement. The classical 1/2 arises because, absent additional
structure, the deviation can only guarantee that either the deviating
optimal bidder wins (earning utility) or else the competing threshold is
high (which, in more general mechanisms, is paid as revenue and is then
related back to welfare). Advice effectively creates a third
possibility: even if the deviator does not win and even if the competing
bids are not informative, the winner’s advice-based component certifies
that her true value is not too small. The worst case therefore
interpolates between
``competition-based’’ and ``advice-certified’’ allocation,
producing exactly the arithmetic average (1 + γ)/2.
The formula α(γ) = 2/(1 + γ) delivers a clean comparative static: as γ increases, α(γ) decreases, and therefore the multiplicative welfare loss due to strategic behavior and learning shrinks. Two limiting cases are informative. If γ ↓ 0, advice can be arbitrarily weak, and we revert to the baseline α(0) = 2 (no improvement over standard SPA/VCG). If γ ↑ 1, advice is essentially exact and the bound approaches α(1) = 1, consistent with the intuition that sufficiently informative platform-side signals can nearly pin down the welfare-maximizing allocation even when bids are heavily paced.
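As a worked instance of this comparative static (the numbers are purely illustrative):
$$
\gamma = 0.5:\qquad \alpha(0.5) \;=\; \frac{2}{1+0.5} \;=\; \frac{4}{3},
\qquad \frac{1}{\alpha(0.5)} \;=\; \frac{1+0.5}{2} \;=\; 0.75,
$$
so advice guaranteed to recover at least half of each value lifts the leading welfare coefficient from 1/2 to 3/4, before accounting for the additive learning and drift terms.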
From a policy-and-practice perspective, this comparative static says that investments in prediction quality have a direct welfare payoff. The improvement is not merely statistical; it is mechanism-theoretic. The platform is using prediction to reshape the strategic environment so that the smoothness deviation captures more of the offline optimum.
Several caveats are worth noting. First, the conservative orientation v̂i, t ≤ vi, t matters: if advice can overestimate values, then reserves/boosts based on v̂i, t can induce over-allocation to low true-value bidders and can also break the conservativeness conditions that underlie clean smoothness constants. In practice, this motivates the use of calibrated lower-confidence bounds (or other conservative corrections) when turning ML predictions into mechanism parameters.
Second, γ should be interpreted as a multiplicative quality parameter. If advice quality is heterogeneous across bidders or contexts, one can still apply the same analysis by taking γ to be the minimal guarantee over the relevant slice of traffic, at the cost of conservatism in the bound. This is a feature rather than a bug: it makes explicit that the welfare guarantee is only as strong as the weakest region in which prediction is deployed.
Finally, while we focused on SPA/VCG-like formats where advice yields β = 0, analogous ideas can be applied to first-price and other non-truthful auctions; there the main effect is typically a reduction in α combined with some control over β (the revenue subtraction), with the exact tradeoff depending on how boosts interact with bid shading. For our purposes, the key message is that incorporating advice through reserves/boosts is a mechanism-level way to improve the constants in (H1), and therefore to tighten the welfare guarantees that persist under dynamic learning and drift.
Our baseline presentation focuses on single-slot allocation, a linear ``value–spend’’ objective, and feedback rich enough to support closed-form primal–dual pacing. We briefly discuss how each of these choices can be relaxed while keeping the same organizing logic: (i) fold the relevant business or engineering feature into the per-round primitives (xi, t, pi, t, vi, t) and the benchmark OPTT, and (ii) re-check a constrained-smoothness inequality (H1) and a learning guarantee (H2) for the induced Lagrangian payoff sequence.
In many ad markets, each round t allocates K ≥ 1 slots with position weights
(e.g., click-through rates) a1 ≥ a2 ≥ ⋯ ≥ aK ≥ 0.
A convenient extension is to let the mechanism output an assignment
xi, tk(bt) ∈ [0, 1]
indicating whether bidder i
receives slot k, with
feasibility ∑ixi, tk ≤ 1
for each k and ∑kxi, tk ≤ 1
for each i. If vi, t
denotes value per click (or per unit exposure), then bidder i’s realized value becomes
$$
V_{i,t} \;=\; \sum_{k=1}^K a_k\,x^{k}_{i,t}(\mathbf{b}_t)\,v_{i,t}.
$$
Budgets and RoS constraints remain unchanged with Vi, t
substituted, and liquid welfare naturally generalizes to
$$
\mathrm{Lwel} \;=\; \sum_{i=1}^n
\min\Bigl\{B_i,\;\frac{1}{\tau_i}\sum_{t=1}^T \sum_{k=1}^K a_k
x^{k}_{i,t} v_{i,t}\Bigr\}.
$$
From an analytical standpoint, nothing essential hinges on K = 1: the role of constrained
smoothness is to certify that for each per-round instance there exists a
deviation profile whose aggregate Lagrangian payoff lower bounds the
(effective) welfare benchmark up to constants. For VCG position auctions
(or variants that are VCG in the space of scores), the standard
smoothness constructions for assignment games extend by scaling each
bidder’s value by ak and using
deviations that mimic truthful bidding on the induced effective values.
The constants (α, β)
will generally depend on the position weights through the same
parameters that appear in classical price-of-anarchy analyses of
position auctions (e.g., ``semi-smoothness’’ constants), but our
learning-to-welfare reduction is unchanged once (H1) is verified.
Likewise, if the platform introduces advice-driven score adjustments
slot-by-slot, the same multiplicative-quality parameter γ can be used to improve α provided the advice certifies
value along the relevant margin (winning a high-weight slot rather than
a low-weight slot).
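For concreteness, here is a minimal sketch of how the position-auction primitives feed the liquid-welfare computation above; the greedy score-based assignment and the function names are illustrative assumptions, and any feasible assignment rule can be substituted.

```python
import numpy as np

def position_round(scores, weights):
    """Greedy score-based assignment: slot k goes to the k-th highest score.

    scores: length-n ranking scores; weights: nonincreasing a_1 >= ... >= a_K.
    Returns an (n, K) 0/1 matrix x with x[i, k] = 1 if bidder i receives slot k.
    """
    n, K = len(scores), len(weights)
    x = np.zeros((n, K))
    for k, i in enumerate(np.argsort(-scores)[:K]):
        x[i, k] = 1.0
    return x

def liquid_welfare(values, assignments, weights, budgets, tau):
    """Lwel = sum_i min{ B_i, (1/tau_i) * sum_t sum_k a_k x^k_{i,t} v_{i,t} }."""
    realized = np.zeros_like(budgets, dtype=float)
    for t, x_t in enumerate(assignments):          # assignments: list of (n, K) matrices
        realized += (x_t @ weights) * values[t]    # values: (T, n) array of v_{i,t}
    return float(np.minimum(budgets, realized / tau).sum())
```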
Two practical caveats are worth recording. First, market drift may be
better captured by variation in values akvi, t
or in the induced threshold prices for each slot; one can define
$$
\mathrm{Var}_T^{\mathrm{pos}} \;:=\; \sum_{t=2}^T \max_{i}\max_{k}\,
\bigl|a_k v_{i,t} - a_k v_{i,t-1}\bigr|
\;=\;\Bigl(\max_k a_k\Bigr)\mathrm{Var}_T,
$$
so the drift penalty simply rescales. Second, real systems often couple
the positions through reserve policies or ``top-of-page’’ constraints;
in such cases the feasibility set 𝒳 is
more structured, but the constrained-smoothness vocabulary is expressly
designed to be mechanism-agnostic: one re-derives the deviation
inequality for the actual allocation rule and proceeds identically.
We allow each bidder to trade off value and spend via λi ∈ [0, 1], ranging from pure value maximization (λi = 0) to quasilinear utility (λi = 1). In practice, such mixed objectives arise because some advertisers optimize value subject to tight budgets (small λi), while others treat spend as intrinsically costly (large λi) or incorporate additional penalties (e.g., risk or variance terms) that are approximately linear in spend at the margin.
Formally, the key observation is that λi enters only
through the Lagrangian surrogate. Writing the per-round Lagrangian
payoff as
$$
\widehat{u}_{i,t}(b_{i,t},\mathbf{b}_{-i,t};\mu_i,\nu_i)
=
(1+\nu_i)V_{i,t} \;-\; (\lambda_i+\mu_i+\tau_i\nu_i)p_{i,t} \;+\;
\mu_i\frac{B_i}{T},
$$
we see that the learning problem remains a no-regret (or dynamic-regret)
problem on a sequence of linear payoffs in (Vi, t, pi, t)
with bidder-specific coefficients. In particular, the induced
``effective value’’ scaling
$$
\widetilde v_{i,t} \;:=\;
\frac{1+\nu_i}{\lambda_i+\mu_i+\tau_i\nu_i}\,v_{i,t}
$$
continues to summarize incentives, so any smoothness proof that is
stated in terms of values can be re-used by substituting ṽi, t.
This makes mixed objectives essentially free in our framework:
heterogeneity in λi affects the
constants through bidder-specific Lipschitz bounds (how much utility
changes when payments move) but not the structure of the welfare
guarantee.
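The reduction is mechanical enough to state in a few lines; the sketch below (the function names are ours, not canonical) evaluates the effective value and the per-round Lagrangian surrogate for given multipliers.

```python
def effective_value(v, lam, mu, nu, tau):
    """v_tilde = (1 + nu) / (lam + mu + tau * nu) * v  (per bidder, per round)."""
    return (1.0 + nu) / (lam + mu + tau * nu) * v

def lagrangian_payoff(value, payment, budget_per_round, lam, mu, nu, tau):
    """u_hat = (1 + nu) * V - (lam + mu + tau * nu) * p + mu * B / T."""
    return (1.0 + nu) * value - (lam + mu + tau * nu) * payment + mu * budget_per_round
```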
Two extensions sometimes requested in applications are also compatible. First, advertisers may change λi over time (e.g., during promotions). If λi, t varies with bounded variation, it can be folded into the drift term by treating the effective values ṽi, t as the primitive sequence whose variation drives the comparator path length. Second, one can incorporate a platform objective that is itself a convex combination of welfare and revenue by modifying the benchmark (and, correspondingly, the deviation mapping in (H1)); the same learning reduction applies so long as the benchmark is the one certified by the constrained-smoothness inequality.
Return-on-spend constraints appear in several operational forms, not all identical to the cumulative inequality ∑tVi, t ≥ τi∑tpi, t. We emphasize that our choice is not merely for convenience: the cumulative form is (i) economically interpretable as a long-run ROI target and (ii) analytically compatible with convex dual methods and smoothness reductions.
That said, three common variants can be accommodated.
Often vi, t is an expectation (e.g., predicted conversion value), while realized value is a random variable Yi, t observed with delay. If the constraint is intended to hold in expectation, one simply replaces vi, t by 𝔼[Yi, t ∣ context], which is already our modeling primitive. If instead the constraint must hold with high probability on realized outcomes, then the dual updates need to incorporate concentration (martingale) bounds; operationally, this leads to ``conservative’’ pacing rules that include a safety buffer, which can be analyzed by adding an extra slack term to the RoS constraint and tracking its contribution to Violi(T).
Many systems enforce RoS over rolling windows (e.g., last W auctions or a day). This can be modeled by introducing constraints indexed by window endpoints, yielding multiple dual variables {νi, s} that update on different clocks. While the smoothness inequality remains per-round, the learning problem becomes a multi-constraint online optimization problem; standard primal–dual methods still yield sublinear regret, but the constants depend on the number of active windows. A pragmatic compromise is to treat windows as ``soft’’ constraints and penalize violations in the objective; in our notation this simply enlarges the Lagrangian surrogate and increases the Lipschitz term multiplying Violi(T).
Some teams phrase RoS as a ratio constraint $\frac{\sum_t V_{i,t}}{\sum_t p_{i,t}}\ge \tau_i$ (with a convention when spend is zero). This is equivalent to our linear form when ∑tpi, t > 0 and is therefore already covered; the linear form is preferable because it avoids degeneracy and keeps dual updates well-defined even when spend is small early in the horizon.
Across these variants, the recurring technical point is that the welfare theorem penalizes constraint handling only through Violi(T), so any enforcement architecture that keeps violations controlled—hard stopping rules, conservative buffers, or explicit penalties—can be plugged in without changing the core mechanism-level analysis.
Our cleanest corollaries rely on feedback rich enough to compute (Vi, t, pi, t) and, in truthful auctions, to map dual prices to proportional bids in closed form. Real platforms can deviate from this ideal in two ways: bidders may not observe informative counterfactuals (e.g., the threshold price when they lose), and the auction may be non-truthful or ``black-box’’ enough that the dependence of outcomes on bids is not analytically tractable.
When feedback is bandit (the bidder observes only her own xi, t and pi, t), the natural replacement is a bandit or semi-bandit online learning algorithm on ℬi (or on a parametric policy class Πi). Regret rates typically weaken from $O(\sqrt{T})$ to O(T2/3) (or worse) depending on smoothness/convexity assumptions and whether gradients can be estimated. In our welfare bound this translates directly into a larger additive loss term through ∑iDRi(T). The comparative static is therefore sharp: limited transparency is not merely a learning inconvenience; it is quantitatively a welfare cost.
When the mapping from bids to outcomes lacks a closed form (e.g., first-price dynamics, complex pacing layers, or unknown reserve logic), a practical approach is a numerical approximation of the primal–dual loop. Concretely, bidders can maintain multipliers (μi, t, νi, t) as before, but estimate the bid response bi, t(μi, t, νi, t) by (i) local perturbations of bids to estimate the marginal tradeoff between expected value and spend, or (ii) fitting a low-dimensional response model from historical data and updating it online. In either case, the theory guides what should be calibrated: the effective-value ratio (1 + ν)/(λ + μ + τν) is the sufficient statistic in truthful settings, so numerical routines can target this one-dimensional mapping even when the underlying auction is complex. The limitation is equally clear: exploration needed for estimation creates short-run inefficiency and may itself induce constraint violations unless additional safeguards are imposed. This is precisely the regime where simulation becomes informative, motivating the experimental plan we turn to next.
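Before turning to experiments, here is a minimal sketch of the perturbation-based estimation just described; run_auction is a hypothetical black-box simulator of the deployed (possibly non-truthful) mechanism, and the routine only estimates the local value/spend tradeoff that the effective-value ratio is meant to target.

```python
import numpy as np

def estimate_marginals(run_auction, bid, eps=0.05, n_samples=200):
    """Finite-difference estimate of (dValue/dBid, dSpend/dBid) around `bid`.

    run_auction(b) -> (realized_value, payment) for one auction draw; the
    estimate averages n_samples draws at the perturbed bids bid * (1 +/- eps).
    """
    up = np.zeros(2)
    down = np.zeros(2)
    for _ in range(n_samples):
        up += np.asarray(run_auction(bid * (1.0 + eps)), dtype=float)
        down += np.asarray(run_auction(bid * (1.0 - eps)), dtype=float)
    # central difference on the empirical means
    d_value, d_spend = (up - down) / (n_samples * 2.0 * eps * bid)
    return float(d_value), float(d_spend)
```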
Our theory isolates two quantitative drivers of performance in repeated auctions with constrained autobidders: (i) the extent of non-stationarity, captured by VarT, and (ii) mechanism-side improvements in the constrained-smoothness constant, captured by α(γ) when ML advice is used to set reserves/boosts. A focused simulation suite can therefore serve two purposes: it can (a) validate that the realized welfare loss is approximately linear in VarT in regimes where dynamic regret is controlled, and (b) demonstrate the predicted improvement in welfare as the advice quality parameter γ increases.
We recommend simulating single-slot auctions with n advertisers over T rounds, with per-round values
{vi, t}
generated by a controlled drift process. In each round, bidders submit
bids via an autobidding policy (e.g., primal–dual pacing) and the
platform runs a fixed mechanism M (e.g., SPA/VCG with optional
reserves/boosts). For each run we record: (i) cumulative liquid welfare
$\sum_{t=1}^T \mathrm{Lwel}_t$, (ii)
realized revenue $\sum_{t=1}^T \sum_i
p_{i,t}$, (iii) constraint satisfaction (budget spend and RoS),
and (iv) learning diagnostics such as DRi(T) (or its
empirical proxy) and the realized path length of bids. To benchmark
efficiency, we estimate OPTT by solving an
offline planning problem using the realized {vi, t}:
in the single-slot case, an LP relaxation over allocations {xi, t ∈ [0, 1]}
with feasibility ∑ixi, t ≤ 1,
and with bidder-side cumulative constraints mirroring the simulated
budgets and RoS targets. The principal plot is then the welfare
ratio
$$
\mathrm{Eff}(T)\;:=\;\frac{\sum_{t=1}^T
\mathrm{Lwel}_t}{\mathrm{OPT}_T},
$$
reported with confidence intervals across random seeds.
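One concrete way to compute the OPT_T estimate described above is the LP below; this is a sketch under our reading of the benchmark (each bidder's liquid contribution is capped by the budget and by the RoS-discounted allocated value), and the constraint set should mirror whatever is enforced in the simulation.

```python
import numpy as np
from scipy.optimize import linprog

def estimate_opt(values, budgets, tau):
    """Offline LP estimate of the liquid-welfare benchmark OPT_T.

    values: (T, n) realized values; budgets, tau: length-n arrays.
    Variables: allocations x[t, i] in [0, 1] (flattened row-major) and liquid
    contributions w[i], maximizing sum_i w_i subject to
      sum_i x[t, i] <= 1                       (one slot per round)
      w_i <= (1 / tau_i) * sum_t v[t, i] x[t, i]
      0 <= w_i <= B_i.
    """
    T, n = values.shape
    nx = T * n
    c = np.concatenate([np.zeros(nx), -np.ones(n)])     # minimize -(sum of w)

    A_slot = np.zeros((T, nx + n))
    for t in range(T):
        A_slot[t, t * n:(t + 1) * n] = 1.0
    A_liq = np.zeros((n, nx + n))
    for i in range(n):
        A_liq[i, i:nx:n] = -values[:, i] / tau[i]
        A_liq[i, nx + i] = 1.0
    A_ub = np.vstack([A_slot, A_liq])
    b_ub = np.concatenate([np.ones(T), np.zeros(n)])
    bounds = [(0.0, 1.0)] * nx + [(0.0, float(B)) for B in budgets]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return -res.fun
```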
To make the VarT dependence visible (rather than washed out by i.i.d. noise), we recommend drift processes in which VarT can be dialed up while keeping the marginal distribution of values approximately fixed. Three convenient families are:
Block drift. Partition [T] into S blocks of length L = T/S. In each
block s, draw a baseline value
vector (θ1, s, …, θn, s)
(e.g., lognormal) and then set vi, t = θi, s + ϵi, t
within the block. Drift is introduced by changing θ⋅, s between
blocks; in this design,
$$
\mathrm{Var}_T \approx \sum_{s=2}^S \max_i
|\theta_{i,s}-\theta_{i,s-1}|,
$$
so increasing either the number of switches S or the jump sizes increases VarT predictably.
Drift-budget paths. Fix a total drift budget D and generate values recursively via vi, t = vi, t − 1 + Δi, t where Δi, t is chosen to satisfy ∑t ≥ 2maxi|Δi, t| ≤ D while inducing ``hard’’ changes (e.g., alternating which bidder is highest value). This construction is useful for stress-testing tracking: it tends to force large shifts in the identity of the efficient winner without changing the average scale of values.
Sinusoidal drift. Use vi, t = mi + Ai sin (2πt/P) + ϵi, t
with period P and amplitude
Ai. Here
VarT scales like
(T/P)maxiAi,
making it straightforward to separate
``slow drift’’ (large P) from ``fast drift’’ (small P) while keeping the same
amplitude.
Across designs, we recommend reporting the realized VarT computed from the generated vi, t sequence to confirm that the intended drift control is reflected in the actual statistic used by the theorem.
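A sketch of two of the drift families above, together with the realized VarT statistic we recommend reporting; the distributional choices are illustrative defaults, and the drift-budget family can be generated analogously by allocating the jump budget across rounds.

```python
import numpy as np

def realized_var(values):
    """Var_T = sum_{t >= 2} max_i |v_{i,t} - v_{i,t-1}| for a (T, n) value matrix."""
    return float(np.abs(np.diff(values, axis=0)).max(axis=1).sum())

def block_drift(T, n, n_blocks, noise=0.05, seed=0):
    """Piecewise-constant baselines theta_{i,s}, redrawn each block, plus small noise."""
    rng = np.random.default_rng(seed)
    L = -(-T // n_blocks)                               # ceil(T / n_blocks)
    theta = rng.lognormal(mean=0.0, sigma=0.5, size=(n_blocks, n))
    base = np.repeat(theta, L, axis=0)[:T]
    return base + noise * rng.standard_normal((T, n))

def sinusoidal_drift(T, n, period, amplitude=0.5, noise=0.05, seed=0):
    """v_{i,t} = m_i + A_i * sin(2*pi*t/period) + eps_{i,t}."""
    rng = np.random.default_rng(seed)
    m = rng.uniform(0.5, 1.5, size=n)
    A = amplitude * rng.uniform(0.5, 1.0, size=n)
    t = np.arange(T)[:, None]
    return m + A * np.sin(2.0 * np.pi * t / period) + noise * rng.standard_normal((T, n))
```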
Budgets and RoS targets should be heterogeneous to reflect realistic
pacing pressure. A simple parameterization is to draw Bi proportional
to a target spend rate (e.g., Bi/T
uniform in a range) and to set τi so that RoS
is binding for some bidders but slack for others. We then implement
primal–dual pacing with bids of the form bi, t = κi, tvi, t
(or the corresponding mapping in the chosen auction), with dual
updates
$$
\mu_{i,t+1} \;=\; \bigl[\mu_{i,t} + \eta\,(p_{i,t} - B_i/T)\bigr]_{+},
\qquad
\nu_{i,t+1} \;=\; \bigl[\nu_{i,t} + \eta\,(\tau_i p_{i,t} - x_{i,t} v_{i,t})\bigr]_{+},
$$
and with a hard stopping rule that sets bi, t = 0
once spend hits Bi to ensure
Violi(T) = 0 by
construction. To connect directly to the dynamic guarantee, one can
optionally include an explicit tracking component (e.g., optimistic
mirror descent on the dual, or a slowly varying stepsize) and measure an
empirical dynamic regret proxy against a sliding-window best
response.
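The pacing loop just described fits in a few lines; in the sketch below, run_round is a hypothetical interface to the fixed mechanism M (returning per-bidder allocation indicators and payments), the multiplier κ uses the effective-value ratio from the truthful case, and a small floor avoids division by zero before the duals move.

```python
import numpy as np

def primal_dual_pacing(values, run_round, budgets, tau, lam, eta):
    """Primal-dual pacing with multiplicative bids and a hard budget stop.

    values: (T, n) per-round values; run_round(bids, t) -> (x_t, p_t), the
    allocation indicators and payments charged by the fixed mechanism M.
    """
    T, n = values.shape
    mu = np.zeros(n)                  # budget duals
    nu = np.zeros(n)                  # RoS duals
    spend = np.zeros(n)
    allocated_value = np.zeros(n)
    for t in range(T):
        kappa = (1.0 + nu) / np.maximum(lam + mu + tau * nu, 1e-6)
        bids = kappa * values[t]
        bids[spend >= budgets] = 0.0  # hard stop: budget violations are zero by construction
        x_t, p_t = run_round(bids, t)
        spend += p_t
        allocated_value += x_t * values[t]
        # projected dual (subgradient) steps
        mu = np.maximum(0.0, mu + eta * (p_t - budgets / T))
        nu = np.maximum(0.0, nu + eta * (tau * p_t - x_t * values[t]))
    return allocated_value, spend, mu, nu
```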
The most informative experiment is a sweep over drift levels while
holding fixed n, T, and the learning rule. For each
drift family above, vary a single knob (jump size, number of switches,
or period P) to span a range
of VarT values. For
each setting, report the welfare gap
$$
\mathrm{Gap}(T)\;:=\;\mathrm{OPT}_T-\sum_{t=1}^T \mathrm{Lwel}_t.
$$
Our prediction is that, once T
is large enough that per-round regret terms are small relative to T, the gap grows approximately
linearly with VarT
(and the slope changes modestly with mechanism and constraint
tightness). A particularly transparent visualization is a scatter plot
of Gap(T) versus VarT across all seeds and
drift settings, together with a fitted line. To make the comparison
meaningful, we recommend ensuring that the total scale of values (e.g.,
$\frac{1}{T}\sum_t \max_i v_{i,t}$) is
held approximately constant across the sweep; otherwise, changes in
OPTT confound the
drift effect.
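A skeleton of the sweep itself, assuming run_market is a hypothetical wrapper around the simulation and benchmark routines sketched above that returns the generated values, the cumulative liquid welfare, and the OPT_T estimate for one configuration.

```python
import numpy as np

def drift_sweep(run_market, drift_levels, seeds):
    """Collect (Var_T, Gap(T)) pairs across drift settings and fit a line."""
    var_t, gap = [], []
    for level in drift_levels:
        for seed in seeds:
            values, lwel_total, opt_total = run_market(level, seed)
            var_t.append(np.abs(np.diff(values, axis=0)).max(axis=1).sum())
            gap.append(opt_total - lwel_total)
    var_t, gap = np.asarray(var_t), np.asarray(gap)
    slope, intercept = np.polyfit(var_t, gap, deg=1)   # fitted line for the scatter plot
    return var_t, gap, slope, intercept
```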
To test the advice channel, we generate platform predictions v̂i, t satisfying the multiplicative guarantee v̂i, t ∈ [γvi, t, vi, t] by construction; for example, take v̂i, t = vi, t ⋅ ζi, t where ζi, t ∼ Unif([γ, 1]), or a more structured model in which ζi, t depends on the context and bidder. The platform then uses v̂i, t to set a personalized reserve ri, t or a score boost si, t (depending on the auction format) in a way that is monotone in v̂i, t. Concretely, in an SPA-with-scores implementation one can rank bidders by a score bi, t + si, t and impose a reserve ri, t so that a bidder must clear bi, t ≥ ri, t to be eligible; the policy choice (how ri, t and si, t depend on v̂i, t) should be fixed across the sweep so that only γ changes.
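A minimal sketch of the advice channel: the multiplicative noise model is the one described above, while the boost/reserve mapping is an illustrative stand-in for whatever fixed, monotone policy is held constant across the sweep.

```python
import numpy as np

def make_advice(values, gamma, seed=0):
    """Synthetic predictions satisfying v_hat in [gamma * v, v] by construction."""
    rng = np.random.default_rng(seed)
    return values * rng.uniform(gamma, 1.0, size=values.shape)

def advice_policy(advice, boost_strength=0.5, reserve_fraction=0.5):
    """Fixed, monotone mapping from advice to per-bidder score boosts and reserves."""
    return boost_strength * advice, reserve_fraction * advice
```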
The recommended experiment is then a grid over γ ∈ {0.2, 0.4, 0.6, 0.8, 1.0} crossed with a representative set of drift levels. For each cell, report Eff(T) and also the distribution of winners by true value quantiles (to verify the intended mechanism-side correction: higher-v bidders should win more often as γ increases). The predicted pattern is monotone improvement in welfare with γ, and the gains should be largest when constraints are tight (because that is when advice-driven reserves/boosts most effectively prevent low-value/high-multiplier bidding from crowding out efficient allocations).
To interpret results, we recommend including three baseline variants: (i) i.i.d. values as a sanity check (the gap should then be largely explained by learning transients), (ii) γ = 0 (or, operationally, reserves/boosts turned off), and (iii) a pacing heuristic (e.g., fixed κi chosen to hit the budget in expectation) to highlight the role of dynamic tracking. Robustness checks that are often informative in practice include: heavy-tailed values (lognormal with large variance), correlated values across bidders (common shocks), and mis-specified advice where v̂i, t violates the multiplicative bound for a fraction of rounds (to assess how sensitive the gains are to departures from the γ model).
For reproducibility, we suggest reporting (n, T), the distributional family for values, drift parameters and realized VarT, budget and RoS distributions, stepsizes η, and the stopping/enforcement logic. Each figure should include (i) welfare ratio Eff(T), (ii) constraint satisfaction (fraction of runs with any violation and average slack), and (iii) a decomposition of the welfare gap into a component attributable to drift (variation sweep) versus a component attributable to learning noise (no-drift control). The objective of this section is not to ``fit’’ constants, but to verify the comparative statics emphasized by the theory: linear-in-VarT degradation in difficult drifting environments, and systematic welfare improvement as γ increases under advice-driven mechanism adjustments.
We studied repeated single-slot ad auctions in which the frictions that dominate practice—hard budgets, hard return-on-spend (RoS) targets, and non-stationary values—are front and center. Our organizing device was a constrained-smoothness inequality written for a liquid-welfare benchmark, combined with learning guarantees stated in terms of (dynamic) regret on a Lagrangian payoff sequence. This perspective does two things at once. First, it separates mechanism-side efficiency losses (captured by (α, β), and improved to α(γ) when the platform uses sufficiently accurate ML advice) from behavior-side learning losses (captured by ∑iDRi(T) and by the drift proxy VarT). Second, it yields a welfare bound that is meaningful for large markets and long horizons: the leading term scales with OPTT while the error terms are additive and interpretable as tracking and non-stationarity penalties.
At a conceptual level, the model illuminates a tradeoff that is often implicit in deployed autobidding systems. On the one hand, advertiser-side learning rules (e.g., primal–dual pacing) can enforce budgets and RoS targets nearly exactly, but doing so induces bid shading that can be allocatively harmful in any fixed auction format. On the other hand, platform-side adjustments (reserves, boosts, scoring) can partially undo this distortion, but only to the extent that they are informed by reliable value proxies. The constrained-smoothness constant α (and its advice-refined counterpart α(γ)) is a natural quantitative summary of how well a mechanism ``tolerates’’ constraint-driven bidding behavior while still producing liquid welfare.
We also view our results as a step toward a common language for three communities that typically reason separately: auction theory (which often assumes equilibrium and stationary types), online learning (which often abstracts away from mechanism design), and production ads systems (which internalize constraints and feedback limitations but rarely obtain end-to-end welfare guarantees). That said, several limitations remain, and they lead to a concrete open-problem agenda.
Our dynamic guarantee degrades additively with VarT (and with dynamic
regret). An important theoretical question is whether an O(VarT) penalty
is unavoidable for any algorithm–mechanism pair under our information
structure, or whether it is an artifact of the proof technique (in
particular, of the stability step that compares drifting deviation maps
over time). A crisp direction is to construct adversarial drift
instances in which, even under full-information feedback and even for
``benign’’ mechanisms such as VCG-like rules, the welfare gap
$$
\mathrm{OPT}_T-\sum_{t=1}^T \mathrm{Lwel}_t
$$
is provably Ω(VarT) for any
no-violation policy class that respects hard budgets and RoS
constraints. Such a lower bound would formalize a practical intuition:
when the identity of the efficient winner changes rapidly, any algorithm
that paces spend to satisfy horizon-wide constraints must either (i)
delay adaptation and lose welfare, or (ii) overshoot constraints and be
forced to stop bidding later. Conversely, it would be equally valuable
to identify settings in which VarT can be replaced by a
smaller, instance-dependent complexity (e.g., a predictability or margin
condition on values) without contradicting worst-case impossibility.
We emphasized that ML advice can improve efficiency via a better
smoothness constant, e.g.,
$$
\alpha(\gamma)=\frac{2}{1+\gamma}
$$
in VCG/SPA-like settings under a multiplicative signal model. Two kinds
of tightness questions remain. First, for a fixed mechanism family (say,
SPA with bidder-specific reserves and score boosts), what is the best achievable α(γ) as a function of γ under liquid-welfare objectives
and hard RoS constraints? Second, how do budgets and RoS targets
interact with the revenue term in smoothness (the role of β) once we restrict attention to
feasible (no-violation) bidding dynamics rather than arbitrary bid
profiles? Answering these requires instance constructions that witness
inefficiency and also careful accounting of how liquid welfare
upper-bounds revenue when constraints bind. From a design standpoint,
tight constants matter: a factor improvement in α is often the difference between
``near-optimal in aggregate’’ and ``systematically leaving value
on the table’’ in constrained, competitive markets.
Our framework treats the mechanism as fixed and evaluates welfare
given learning behavior. A natural next step is to invert the problem:
choose M (within an
implementable class) to minimize α subject to operational constraints
(simplicity, monotonicity, robustness, revenue floors), anticipating
that bidders run primal–dual pacing or related dynamics. This is not the
classic
``optimal auction’’ problem; the objective is not revenue, and the solution concept is not a static equilibrium. Rather, we want mechanisms whose constrained-smoothness inequality is as tight as possible uniformly over the Lagrangian multipliers that learning induces. One plausible design principle is to build the scoring/reserve rule to be ``dual-compatible’’:
if bidders shade bids as bi, t ≈ κi, tvi, t,
the platform should adjust scores to approximate ordering by vi, t,
not by the shaded proxy. Making this precise could lead to a
convex-analytic characterization of the optimal boost/reserve mapping in
terms of the distribution of multipliers (μi, νi)
induced by constraints.
We modeled advice by a clean multiplicative parameter γ and treated it as exogenous. In practice, the quality of value predictions depends on the auction rules (which affect data), on the bidding policies (which affect selection), and on platform measurement (which affects the value labels themselves). A more complete theory would endogenize γ as an equilibrium object of a data-generating process, potentially creating feedback loops: changing reserves changes who wins, which changes observed conversions, which changes the next period’s predictions. Even short of full endogeneity, a pressing question is robustness: if v̂i, t violates v̂i, t ∈ [γvi, t, vi, t] on a nontrivial fraction of rounds, can we replace γ with an ``effective’’ quality parameter that degrades gracefully with the violation rate? Such results would better match the policy question that practitioners face: not whether predictions are perfect, but whether they are reliable enough to justify mechanism-side intervention.
Our learning guarantees are cleanest under feedback that allows bidders to evaluate (or estimate accurately) Lagrangian payoffs. Many ad platforms provide only partial information (e.g., no counterfactual prices, delayed conversions, noisy attribution). Extending the welfare theorem to these environments requires bandit or semi-bandit dynamic regret bounds with constraints, which are technically subtle even without strategic interactions. A particularly relevant target is to quantify how welfare scales with the delay distribution of value feedback and with the variance of conversion measurement. The mechanism-side constants (α, β) would remain the same, but the behavior-side terms could degrade from $O(\sqrt{T})$ to worse rates; identifying the sharp dependence would help translate measurement investments into welfare predictions.
Single-slot auctions are a useful primitive, but many ad markets are multi-slot, multi-format, and paced across campaigns. Extending constrained smoothness and liquid welfare to settings with multiple simultaneous constraints (multiple budgets, multiple RoS-like targets, frequency caps) and with richer feasibility constraints (position auctions, knapsack-like constraints) is an open direction. Likewise, advertisers often interpolate between value maximization and profit maximization (our λi), or optimize non-linear objectives (risk, attribution-weighted conversions). A general theory should identify which classes of objectives preserve the Lagrangian reduction and which break it, and how the definition of liquid welfare should be modified to remain an economically meaningful benchmark.
In sum, our results suggest a practical agenda: (i) understand when drift truly limits efficiency and when it can be mitigated; (ii) quantify the best constants achievable by advice-aware mechanisms; and (iii) design auction rules that are not merely incentive-compatible in a static sense, but welfare-robust under the constrained dynamics that real autobidders implement. The broader implication is that efficiency in modern ad markets is jointly produced by mechanism design, prediction quality, and online learning—and improving any one component is most valuable when it is explicitly coordinated with the other two.