Digital advertising auctions are no longer populated by bidders who ``choose a bid’’ in the classical sense. By 2026, a typical advertiser participates through an agent that ingests platform-predicted conversion values, enforces spend constraints, and updates bids continuously using feedback from recent auctions. From the advertiser’s perspective, the relevant control knob is not a one-shot bid but a learning-and-pacing system; from the platform’s perspective, the auction mechanism is only one component in a larger dynamical loop that includes prediction, eligibility, reserve policies, and bidder automation. In this environment, the traditional equilibrium objects of auction theory—Bayes–Nash equilibrium, symmetric equilibria, or even static best responses—are often the wrong abstraction. They can be misleading about what is achievable, what is stable, and what is socially efficient.
The first reason is non-stationarity. The equilibrium paradigm assumes a fixed distribution and a sufficiently long period for behavior to converge (implicitly or explicitly) to a steady state. Real ad markets drift: seasonality, product launches, query mix, and platform model updates can change values on timescales comparable to campaign horizons. When the underlying value sequence moves, there may be no stationary equilibrium to converge to, and convergence itself is not a sensible desideratum. What matters operationally is whether the coupled system (mechanism plus learning agents) can track a moving benchmark and how much efficiency is lost due to adaptation lags. This suggests measuring performance through regret-like quantities and stability terms that degrade gracefully with market drift, rather than asking whether play converges.
The second reason is binding campaign-level constraints. Many advertisers face strict budget limits and return-on-spend (RoS) targets that bind at the campaign level, not auction-by-auction. These constraints change the meaning of a bid: a bidder is not simply trading off value against payment in a quasilinear utility; it is attempting to allocate a limited spend across opportunities while maintaining a minimum value-per-dollar ratio. In practice, this induces systematic bid shading and pacing behavior that cannot be captured by unconstrained equilibrium analyses. Moreover, welfare benchmarks based on raw value can become inappropriate when advertisers are liquidity constrained. A large nominal value is not meaningful if it cannot be realized under the budget or if it would violate an RoS policy. This motivates liquid welfare as a benchmark: it respects the idea that each bidder’s contribution to welfare is capped by what it can feasibly finance, and it aligns more closely with what constrained advertisers can plausibly achieve.
A third reason equilibrium is the wrong object is heterogeneity in bidder automation. Autobidders differ in update rules, exploration, smoothing, and risk controls. Even if each agent is ``optimizing,’’ it may do so against a surrogate objective dictated by internal constraints and platform reporting. The appropriate behavioral model is thus not that bidders play an exact best response in every round, but that they run online algorithms with performance guarantees relative to a comparator class. This is both more realistic and more informative: it allows us to reason about the welfare outcomes of a market populated by learning systems without presuming a particular equilibrium selection.
These observations point to a different question: what welfare guarantees can a repeated auction market deliver when it is populated by constrained, adaptive learning agents? Our goal is to provide a welfare theory for repeated auctions with autobidding that is robust to (i) decentralized online learning, (ii) binding budgets and RoS constraints, and (iii) non-stationary values. Conceptually, we replace equilibrium efficiency guarantees (price of anarchy at a fixed equilibrium) with learning-robust guarantees: bounds that hold along the realized play path of the learning process, for finite horizons, and that degrade smoothly with the difficulty of online adaptation.
A key methodological choice is to use the smoothness framework as a bridge
between individual incentives and social welfare. Smoothness arguments
yield variational inequalities: they compare the welfare of the realized
outcome to the welfare of a carefully chosen deviation, and then convert
``no-regret'' behavior into welfare bounds. In unconstrained settings, this has become a standard route from learning dynamics to efficiency. Our setting requires two extensions. First, the welfare objective must reflect budgets and RoS constraints; second, the bidders' incentives must incorporate these constraints in a way that is compatible with no-regret learning. We therefore work with Lagrangian surrogates: each bidder is evaluated on a payoff that augments value and payment with multiplier-weighted constraint terms. This is not merely a proof device: it mirrors how real pacing systems operate. Practical autobidders routinely maintain internal ``shadow
prices’’ for spend and performance constraints, updating them as
constraints tighten or relax, and then translate these shadow prices
into bid multipliers or throttling decisions.
Non-stationarity raises a further complication. Even if a mechanism is smooth and bidders have small regret, the benchmark deviation used in smoothness may itself change over time as values drift. In such environments, the relevant notion is dynamic regret: the ability of the learning algorithm to compete with a sequence of comparators that is allowed to move, but only at a rate commensurate with market drift. This captures the operational requirement that autobidders should not merely learn an average best response; they should track changing opportunities while respecting long-run constraints. Correspondingly, our welfare bounds include explicit drift-dependent terms (via a bounded-variation measure) that quantify the inherent difficulty of tracking a moving target. This makes the theory predictive in a practical sense: when markets are calm, the welfare guarantee approaches the static benchmark; when markets are turbulent, the bound worsens in a way that is proportional to measurable variation.
The platform also plays an active role through prediction and policy, and this creates an additional lever for efficiency. In modern ad systems, the platform often provides value proxies (conversion predictions, quality scores) and uses them to set reserves, apply boosts, or adjust eligibility. We treat these interventions as machine-learned advice that can be imperfect but informative. The central economic point is that even coarse multiplicative information about relative values can be used to improve robustness: it can dampen the misallocations created when constraint-driven bidders with high shadow prices crowd out genuinely high-value bidders. In our analysis, better advice leads to better constants in the smoothness inequality, and hence to improved welfare guarantees under learning. This connects mechanism design to ML system quality in a clean, quantitative way: prediction improvements are not only about revenue or relevance; they can tighten worst-case efficiency guarantees when bidders are constrained and adaptive.
We emphasize what this approach does and does not assume. We do not require convergence to any equilibrium, nor do we assume bidders are truthful or have stable private types drawn i.i.d. from a distribution. We also do not require the platform to know bidders’ budgets or RoS targets exactly, beyond what is needed to define feasible outcomes. Instead, we assume that bidder-side learning achieves standard online performance guarantees on the relevant surrogate payoff sequence and that constraint violations are controlled (or eliminated) by pacing rules. These are testable, engineering-aligned assumptions: they can be verified by inspecting algorithmic update rules and by monitoring realized constraint satisfaction.
At the same time, we acknowledge limitations. Liquid welfare is an appropriate benchmark for constrained bidders, but it is not the only plausible one, and it can understate welfare when budgets reflect accounting rules rather than true liquidity. Our drift measure is deliberately simple, and richer forms of non-stationarity (strategic feedback loops, delayed attribution, or correlated shocks across bidders) may require sharper stability notions. Finally, ML advice can fail in adversarial ways; our guarantees improve with advice quality but do not eliminate the need for conservative mechanism design when predictions are biased or non-calibrated.
Overall, the model illuminates a central tradeoff faced by platforms and advertisers: automation and constraints are indispensable for managing spend and performance at scale, yet they create dynamic interactions that make equilibrium an unreliable guide. A learning-robust welfare lens—grounded in smoothness, regret, and drift—offers a tractable alternative that speaks directly to how autobidding systems behave and how policy interventions (such as reserve/boost design and prediction improvements) translate into efficiency in repeated auction markets.
Our analysis sits at the intersection of four literatures that are often treated separately in auction theory: (i) budget-feasible bidding via LP and dual variables in (approximately) truthful auctions, (ii) online primal–dual learning for long-run constraints, (iii) price-of-anarchy (PoA) guarantees for auctions with liquidity constraints and the associated liquid-welfare benchmark, and (iv) learning dynamics in markets that need not converge, especially under non-stationarity. A fifth, more recent strand concerns how platform-side prediction can be interpreted as ``advice’’ that changes the effective mechanism via reserves, boosts, and eligibility rules. We briefly situate our approach in each strand and clarify what is new in the way we combine them.
A canonical way to understand budgeted bidding in repeated auctions is through a linear-programming relaxation of the advertiser’s allocation problem and its Lagrangian dual. In a stylized model where the bidder faces exogenous per-round prices (or a truthful allocation rule), the bidder’s optimal policy often takes a simple threshold or multiplicative form: bid proportional to value with a pacing multiplier that acts as a shadow price on spend. This perspective is widespread in the advertising literature on pacing and budget smoothing, including work that characterizes ``pacing equilibria’’ and relates them to competitive equilibria or Eisenberg–Gale-type convex programs; see, e.g., . The economic content is that budgets turn quasilinear utility maximization into a resource-allocation problem, and shadow prices encode intertemporal tradeoffs: spending today tightens the constraint tomorrow, so bids are shaded.
Our setting preserves this dual intuition but departs from the
classical pacing models in two ways. First, we allow an RoS (or ROI)
constraint in addition to a spend budget. This creates a shadow price
that penalizes allocations with insufficient value-per-dollar, aligning
with how many practical autobidders operate (they maintain both a ``budget
pacer'' and a ``performance guardrail’’). Second, we do not
assume the bidder faces exogenous prices: prices are endogenously
generated by a strategic population of other paced learners, and values
drift over time. This pushes us away from equilibrium characterizations
and toward welfare guarantees that hold along the realized learning
path.
A large body of work in online optimization studies how to make
sequential decisions subject to constraints that couple decisions across
time (``long-term constraints''). Standard techniques include online mirror descent with Lagrange-multiplier updates, online saddle-point methods, and primal--dual algorithms that achieve sublinear regret while ensuring vanishing average constraint violation; see, among many others, \cite{Zinkevich2003,Hazan2016}. These tools have been adapted to online auction participation, where the ``primal’’
action is a bid (or a policy parameter) and the ``dual’’ variables are
pacing multipliers. In truthful mechanisms, the mapping from bids to
outcomes is sufficiently regular that simple multiplier updates yield
explicit bidding rules, and the analysis reduces to regret bounds for
online convex optimization on a surrogate payoff.
We build directly on this line, but we emphasize a performance notion that is particularly suited to drifting markets: dynamic regret against a comparator sequence with bounded path length. Dynamic regret has become a standard lens for non-stationary online learning, where the right benchmark is not a single fixed action but a slowly moving best action; see, e.g., . In ad markets, this is not merely a mathematical convenience: seasonality and model updates imply that the ``right’’ pacing multiplier itself changes. Accordingly, we treat primal–dual pacing as a tracking problem, and we make the dependence on market variation explicit in the welfare loss.
On the mechanism-design side, our welfare analysis uses the smoothness framework for bounding the PoA of games induced by auctions . Smoothness provides a robust route from local deviation inequalities to global welfare guarantees and, crucially for our purposes, it composes naturally with no-regret learning: if players have small regret, the time-average outcome is approximately efficient relative to the smoothness benchmark . This is the conceptual bridge that lets us reason about markets populated by learning agents without requiring equilibrium convergence.
Budgets complicate the welfare objective itself. In the presence of hard budgets, raw social welfare can be an unrealistic benchmark because a bidder with large value but small liquidity cannot actually realize that value at the required payments. The liquid-welfare benchmark formalizes this intuition by capping each bidder’s welfare contribution by its budget (or, more generally, by what is feasible under its financial constraints). Liquid welfare has been studied extensively in the context of budget-feasible mechanism design and PoA analysis for auctions with budgets; see, e.g., . Our contribution here is not to propose a new benchmark, but to tailor the smoothness method to a setting with campaign-level constraints (budget and RoS) and to do so in a way that is compatible with learning. Technically, this leads us to a constrained-smoothness statement written for Lagrangian surrogate payoffs: the same multipliers that appear in bidder-side pacing also appear in the inequality that underwrites the welfare bound.
A parallel literature in learning in games emphasizes that many adaptive dynamics do not converge to Nash equilibrium, even in simple auction environments, yet they can still yield meaningful welfare guarantees when they have no-regret properties . This observation is particularly salient in ad auctions, where heterogeneity in bidder algorithms, delayed feedback, and periodic resets make steady-state notions empirically fragile. The smoothness/no-regret paradigm is now a standard way to extract efficiency conclusions without convergence assumptions, and it has been applied broadly in auctions and market design .
Our focus differs in that we incorporate non-stationarity explicitly, rather than treating payoffs as adversarial but time-invariant in their comparator structure. When values drift, the deviation that certifies smoothness (and the ``best response’’ it approximates) also drifts. This is where dynamic regret becomes essential: it quantifies how well the learning dynamics track a moving benchmark, and it gives a welfare bound that degrades smoothly with drift. Conceptually, we view non-stationarity not as an adversary that destroys guarantees, but as a measurable feature of the environment that enters additively through a variation term.
Modern ad platforms do not implement a bare second-price or VCG mechanism; they implement a mechanism wrapped in prediction-driven policy layers: personalized reserves, quality-score boosts, and eligibility filters. These interventions are often motivated by revenue, relevance, or policy concerns, but they also change the strategic landscape faced by autobidders. A growing literature studies mechanism design with predictions (or ``advice’’), including learning-augmented and robust-auction-design results showing how approximate value information can improve efficiency or revenue while retaining worst-case guarantees when predictions are imperfect; see, e.g., . In parallel, the PoA literature has documented that appropriately chosen reserves can improve welfare guarantees in auctions, and that such improvements can be quantified in smoothness constants.
We adopt a deliberately reduced-form view: the platform has access to multiplicative approximations to bidder values and uses them to set reserves/boosts. The technical role of advice is to tighten the constrained-smoothness parameters, which then tightens the learning-robust welfare bound. The economic role is to mitigate a specific failure mode of constraint-driven bidding: when some bidders have high shadow prices (from tight constraints), they may bid aggressively even on modest-value opportunities, crowding out genuinely high-value bidders. Advice-driven reserves/boosts act as a corrective by reintroducing value information into the effective allocation rule, improving worst-case efficiency even when bidders are shading.
Putting these strands together, we can view the market as a coupled primal–dual control system: bidders update shadow prices to satisfy campaign constraints; the mechanism maps bids to allocations and payments; and platform-side prediction shapes the mapping itself. Our contribution is to provide a welfare guarantee that is simultaneously (i) consistent with liquid-welfare benchmarks under budgets and RoS constraints, (ii) robust to decentralized learning without convergence, (iii) stable under bounded market drift via dynamic regret and explicit variation terms, and (iv) quantitatively responsive to the quality of ML advice through improved smoothness constants. With this context in place, we next formalize the repeated-auction model, constraints, and benchmarks that the welfare theory will reference.
We study a horizon of T repeated, single-slot ad auctions indexed by t ∈ [T]. In each round a public context (query, user, page, time-of-day, etc.) arrives, and each advertiser i ∈ [n] observes a scalar value vi, t ∈ [0, v̄] for winning the impression in that context. We interpret vi, t as an expected downstream value (e.g., expected conversions times value-per-conversion) conditional on the information available to bidder i at the time of bidding. Treating values as predictions is empirically natural in ad markets, and it also clarifies what is (and is not) being guaranteed: our welfare benchmark is computed with respect to the same vi, t sequence that bidders act on, rather than an unobserved ``true’’ value.
In every round t, bidders
submit bids bi, t
from a bounded action space ℬi (e.g., [0, b̄]), forming a bid profile bt = (b1, t, …, bn, t).
The platform runs a fixed mechanism M that maps bt to an
allocation and payments:
xi, t(bt) ∈ [0, 1], pi, t(bt) ≥ 0,
with the single-slot feasibility constraint $\sum_{i=1}^n x_{i,t}(\mathbf{b}_t) \le 1$
for all t. Allowing xi, t(bt) ∈ [0, 1]
accommodates randomization (e.g., tie-breaking or probabilistic
eligibility), and it is convenient for the convex-analytic arguments
used later. We keep M
reduced-form: it may correspond to a second-price auction with reserves,
VCG with boosts, or another allocation/payment rule deployed by the
platform; the welfare theorems will only require an efficiency property
of M (formalized via
constrained smoothness in the next section), not truthfulness.
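For concreteness, the following minimal sketch instantiates one possible reduced-form choice of M, a single-slot second-price auction with a reserve; the function name, signature, and reserve handling are illustrative assumptions rather than part of the model.

```python
from typing import Sequence, Tuple
import numpy as np

def second_price_with_reserve(bids: Sequence[float],
                              reserve: float = 0.0) -> Tuple[np.ndarray, np.ndarray]:
    """One possible reduced-form mechanism M for a single slot: allocate to the
    highest bid if it clears the reserve, charge the larger of the reserve and
    the highest competing bid. Returns (x, p) with x in [0,1]^n and sum(x) <= 1."""
    b = np.asarray(bids, dtype=float)
    x = np.zeros(b.size)
    p = np.zeros(b.size)
    winner = int(np.argmax(b))
    if b[winner] >= reserve:
        x[winner] = 1.0
        competing = np.delete(b, winner)
        p[winner] = max(reserve, float(competing.max())) if competing.size else reserve
    return x, p
```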
Given outcomes (xi, t, pi, t),
bidder i realizes value
Vi, t := xi, t(bt) vi, t.
We allow advertisers to interpolate between pure value maximization and
quasilinear utility by introducing a parameter λi ∈ [0, 1] and
writing the (unconstrained) per-round objective as Vi, t − λipi, t(bt).
This is a simple way to capture heterogeneous bidding goals seen in
practice: some campaigns are almost purely volume-driven (small λi), while
others behave closer to profit maximizers (larger λi).
Importantly, the hard constraints below will be enforced at the campaign level,
so the learning problem is not myopic even when λi = 0.
Each bidder i faces two
horizon-coupled constraints. First, a hard spend budget Bi:
$$
\sum_{t=1}^T p_{i,t}(\mathbf{b}_t) \;\le\; B_i.
$$
Second, a hard return-on-spend (RoS) target τi > 0,
requiring cumulative value to be at least τi times
cumulative spend:
$$
\sum_{t=1}^T x_{i,t}(\mathbf{b}_t)\, v_{i,t} \;\ge\; \tau_i \sum_{t=1}^T
p_{i,t}(\mathbf{b}_t).
$$
This RoS constraint is a stylized version of ROI/CPA guardrails used by
many autobidders: the campaign is permitted to spend only insofar as it
is supported by predicted value at a target rate. We emphasize that the
constraint couples allocation and payments, hence it cannot generally be
enforced by a static per-auction bid cap alone; it requires
intertemporal pacing and, from an analytical perspective, it motivates
Lagrangian dual variables that evolve over time.
While we state the constraints in hard form, we will allow learning algorithms to incur a small cumulative violation Violi(T) (and we will later highlight how primal–dual pacing can make Violi(T) = 0 with appropriate safeguards). This is both technically convenient and practically aligned with the fact that platforms often permit small transient deviations while avoiding systematic overspend or underperformance.
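As a sanity check on these definitions, the sketch below computes the cumulative budget and RoS violations for a single campaign trajectory; the helper name and array-based interface are illustrative assumptions.

```python
import numpy as np

def constraint_violations(x, p, v, B_i, tau_i):
    """Cumulative violations for one campaign: x, p, v are length-T arrays of
    allocations x_{i,t}, payments p_{i,t}, and values v_{i,t}; B_i is the budget
    and tau_i the RoS target. Both outputs are zero when the hard constraints hold."""
    spend = float(np.sum(p))
    value = float(np.sum(np.asarray(x) * np.asarray(v)))
    viol_budget = max(0.0, spend - B_i)          # excess of sum_t p_{i,t} over B_i
    viol_ros = max(0.0, tau_i * spend - value)   # shortfall of value below tau_i * spend
    return viol_budget, viol_ros
```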
With liquidity constraints, conventional social welfare ∑i, txi, tvi, t can be misleading as an efficiency yardstick: allocating many high-value impressions to a bidder with a tight budget (or strict RoS) may be infeasible to support at the required payments. We therefore evaluate outcomes using a benchmark that caps each bidder’s contribution by what its constraints can rationally finance.
For a realized allocation sequence {xi, t}t = 1T,
define bidder i’s liquid welfare contribution as
$$
\min\!\left\{\, B_i,\; \frac{1}{\tau_i}\sum_{t=1}^T x_{i,t}
v_{i,t}\right\}.
$$
The term (1/τi)∑txi, tvi, t
is the maximum spend that could be justified by the realized value at
RoS τi,
whereas Bi
is the hard liquidity cap; the minimum of the two is the amount of
``effective welfare’’ bidder i
can contribute. Summing across bidders yields cumulative liquid
welfare,
$$
\mathrm{Lwel}(\{x_{i,t}\}) \;:=\; \sum_{i=1}^n \min\!\left\{\, B_i,\;
\frac{1}{\tau_i}\sum_{t=1}^T x_{i,t} v_{i,t}\right\}.
$$
This objective has two virtues for our purposes. Economically, it
respects the fact that budgets and performance constraints limit how
much value can be translated into feasible payments and hence into
implementable allocations. Analytically, it aligns with the kinds of
inequalities that relate revenue, payments, and feasible surplus in
auction games, which is what ultimately makes price-of-anarchy style
reasoning possible under constraints.
Because Lwel(⋅) is defined over the entire horizon, it is not inherently additive across rounds. Nevertheless, it is convenient to speak of per-round contributions Lwelt when stating learning guarantees; any decomposition that sums to the realized Lwel (e.g., via a telescoping construction) suffices for the reductions used later. The key point is that the benchmark is campaign-level and constraint-aware.
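The liquid-welfare objective is straightforward to evaluate on a realized trajectory; a minimal sketch (with an assumed array layout) follows.

```python
import numpy as np

def liquid_welfare(x, v, B, tau):
    """Cumulative liquid welfare: sum_i min(B_i, (1/tau_i) sum_t x_{i,t} v_{i,t}).
    x and v have shape (n, T); B and tau have length n."""
    x, v = np.asarray(x, dtype=float), np.asarray(v, dtype=float)
    realized_value = np.sum(x * v, axis=1)   # per-bidder sum_t x_{i,t} v_{i,t}
    return float(np.sum(np.minimum(np.asarray(B, dtype=float),
                                   realized_value / np.asarray(tau, dtype=float))))
```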
To measure inefficiency, we compare the realized outcome to an
offline planner who knows the entire value sequence {vi, t}
in advance and chooses feasible allocations to maximize liquid welfare.
Formally, letting 𝒳 denote the set of
sequences {xi, t}
satisfying per-round feasibility (xi, t ∈ [0, 1]
and ∑ixi, t ≤ 1),
we define
OPTT := max{xi, t} ∈ 𝒳 Lwel({xi, t}).
This benchmark is intentionally mechanism-agnostic: it does not require
that the allocation be generated by equilibrium bidding in M, only that it be feasible in the
single-slot sense and evaluated under the same (Bi, τi)
caps. We view this as the appropriate notion of ``what was achievable’’
from the perspective of a platform designer or policy analyst: it
captures the best constraint-feasible use of impression supply given
realized predicted values, abstracting from strategic and learning
frictions. A limitation of this choice is that it may be unattainable
under a particular mechanism or information structure; the role of
constrained smoothness will be precisely to relate attainable outcomes
under M to this offline ideal
up to constant factors.
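Because the min{·, ·} terms are concave and piecewise linear, OPT_T can be computed by a linear program with auxiliary variables w_i ≤ B_i and w_i ≤ (1/τ_i)∑_t v_{i,t} x_{i,t}. The sketch below assumes SciPy's linprog is available; the encoding and helper name are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def offline_liquid_welfare_opt(v, B, tau):
    """OPT_T: maximum liquid welfare over allocations with x_{i,t} in [0,1] and
    sum_i x_{i,t} <= 1. Decision vector is [x flattened as (i, t), then w (n)]."""
    v = np.asarray(v, dtype=float)                     # shape (n, T)
    B = np.asarray(B, dtype=float)
    tau = np.asarray(tau, dtype=float)
    n, T = v.shape
    num_x = n * T
    c = np.concatenate([np.zeros(num_x), -np.ones(n)])  # minimize -sum_i w_i

    A_ub, b_ub = [], []
    for t in range(T):                                 # single-slot feasibility per round
        row = np.zeros(num_x + n)
        row[t:num_x:T] = 1.0                           # picks x_{i,t} for every i
        A_ub.append(row); b_ub.append(1.0)
    for i in range(n):                                 # w_i <= (1/tau_i) sum_t v_{i,t} x_{i,t}
        row = np.zeros(num_x + n)
        row[i * T:(i + 1) * T] = -v[i] / tau[i]
        row[num_x + i] = 1.0
        A_ub.append(row); b_ub.append(0.0)

    bounds = [(0.0, 1.0)] * num_x + [(0.0, float(B[i])) for i in range(n)]  # w_i <= B_i
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=bounds, method="highs")
    return -res.fun
```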
A central feature of ad markets is that values drift: conversion
rates change with seasonality, user mix, and product availability;
prediction models are retrained; and competing bidders enter or exit. To
model this without committing to a stochastic process, we treat {vi, t}
as an arbitrary bounded sequence and quantify its drift via a
bounded-variation measure,
$$
\mathrm{Var}_T \;:=\; \sum_{t=2}^T \max_{i\in[n]}
\bigl|v_{i,t}-v_{i,t-1}\bigr|.
$$
This scalar proxy is deliberately coarse, but it captures the stability
property that matters for learning guarantees: when VarT is small, the
``right’’ bid shading and pacing multipliers should also move slowly,
making tracking feasible with small dynamic regret. When VarT is large, no
algorithm can reliably track a rapidly shifting environment without
incurring additional loss, so we expect (and later obtain) welfare
bounds that degrade additively in VarT.
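The drift statistic is a one-line computation; the sketch below assumes the same (n, T) array layout as before.

```python
import numpy as np

def value_drift(v):
    """Var_T = sum_{t=2}^T max_i |v_{i,t} - v_{i,t-1}| for a value matrix of shape (n, T)."""
    v = np.asarray(v, dtype=float)
    return float(np.sum(np.max(np.abs(np.diff(v, axis=1)), axis=0)))
```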
Finally, we specify feedback only at the level needed to define the learning problem: bidders observe vi, t before bidding and receive enough post-auction information to update their learning rules (e.g., payments and whether they won). We state our main results for clean full-information or semi-bandit feedback so that standard primal–dual regret bounds apply; extending to more limited feedback is possible but would primarily change the regret rates, which then enter the welfare guarantees as explicit error terms.
To connect strategic bidding in a fixed mechanism M to the offline benchmark OPTT, we use a smoothness-style inequality adapted to (i) hard intertemporal constraints (budgets and RoS) and (ii) a constraint-aware welfare objective (liquid welfare). Standard smoothness arguments were developed for quasilinear utilities and unconstrained social welfare; in our setting both of those ingredients fail in an essential way. Constrained smoothness is the device that restores a single ``one-shot’’ variational inequality that can later be aggregated over time and combined with regret bounds.
The core idea is to evaluate unilateral deviations using a Lagrangian
surrogate rather than the advertiser’s raw per-round objective. Fix
nonnegative multipliers μi, νi ≥ 0
corresponding to the budget and RoS constraints. For a given round
(suppressing t), we define
bidder i’s Lagrangian surrogate payoff at bid profile
b as
$$
u_i(\mathbf{b};\mu_i,\nu_i) \;:=\; x_i(\mathbf{b})\,v_i \;-\; \lambda_i\, p_i(\mathbf{b}) \;-\; \mu_i\!\left(p_i(\mathbf{b})-\frac{B_i}{T}\right) \;+\; \nu_i\bigl(x_i(\mathbf{b})\,v_i-\tau_i\, p_i(\mathbf{b})\bigr).
$$
The additive constant μiBi/T
is immaterial for best responses but convenient when we sum over time.
What matters is that the surrogate can be rewritten as a quasilinear payoff with
constraint-dependent coefficients:
$$
u_i(\mathbf{b};\mu_i,\nu_i) \;=\; (1+\nu_i)\,x_i(\mathbf{b})\,v_i \;-\; (\lambda_i+\mu_i+\tau_i\nu_i)\, p_i(\mathbf{b}) \;+\; \frac{\mu_i B_i}{T}.
$$
Thus, conditional on the multipliers, bidder i behaves as if it had value (1 + νi)vi
and a ``price weight’’ (λi + μi + τiνi)
on payments. This observation is what allows us to import one-shot
smoothness reasoning into a constrained, horizon-coupled problem: the
constraints are not enforced inside the deviation itself, but they are
priced into the surrogate utility.
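To make the rescaling concrete, a small sketch (with illustrative function names) evaluates the surrogate payoff and the induced effective value for given multipliers; it assumes a strictly positive price weight.

```python
def lagrangian_payoff(x_i, p_i, v_i, lam_i, mu_i, nu_i, B_i, tau_i, T):
    """Per-round Lagrangian surrogate: (1+nu_i) * x_i * v_i minus the price weight
    (lam_i + mu_i + tau_i * nu_i) times p_i, plus the constant mu_i * B_i / T."""
    return ((1.0 + nu_i) * x_i * v_i
            - (lam_i + mu_i + tau_i * nu_i) * p_i
            + mu_i * B_i / T)

def effective_value(v_i, lam_i, mu_i, nu_i, tau_i):
    """Effective value obtained by dividing the surrogate by the price weight, so
    that the bidder behaves as a standard quasilinear bidder with this value."""
    return (1.0 + nu_i) * v_i / (lam_i + mu_i + tau_i * nu_i)
```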
Let LW(⋅) denote the relevant
liquid-welfare benchmark for the one-shot instance (or, equivalently,
the per-round component induced by the offline optimum; we keep the
definition abstract since the repeated-game reduction will handle the
horizon coupling). We say that M is (α, β)-constrained smooth if for every value profile
v, every constraint
profile {(Bi, τi)}i,
every nonnegative multiplier profile {(μi, νi)}i,
and every bid profile b, there exists a (possibly
randomized) deviation mapping bi′ = δi(vi, Bi, τi, μi, νi)
such that
$$
\sum_{i=1}^n \mathbb{E}\Bigl[u_i\bigl((b_i', \mathbf{b}_{-i});\, \mu_i, \nu_i\bigr)\Bigr] \;\ge\; \frac{1}{\alpha}\,\mathrm{LW}^{\star} \;-\; \beta\,\mathrm{Rev}(\mathbf{b}),
$$
where Rev(b) := ∑ipi(b)
and LW⋆ is the (one-shot)
optimal liquid welfare for the instance. The expectation is over any
randomness in the deviations and/or mechanism.
Two aspects of this definition are worth emphasizing. First, the deviation may depend on the bidder’s own constraints and multipliers; this is unavoidable because those parameters determine how aggressively the bidder should trade off value against spend. Second, the right-hand side contains revenue, not welfare, because later we will upper bound revenue by a constraint-aware welfare quantity (intuitively, if constraints are respected, the platform cannot extract more than what campaigns can finance).
In the unconstrained, quasilinear setting, smoothness is typically stated with utilities vixi − pi and benchmark SW⋆ = maxx∑ivixi. The deviation is often a simple shading rule such as ``bid a random fraction of value.’’ Our constrained version differs in three structural ways.
We do not require the deviation bids themselves to satisfy budgets or RoS constraints. Instead, feasibility is enforced by the learning dynamics through the multipliers, while the analysis evaluates deviations in the Lagrangian surrogate payoff. This is analytically crucial: hard constraints are intertemporal and would otherwise destroy the one-shot nature of smoothness.
Social welfare can be artificially inflated by allocating to a bidder whose constraints prevent it from paying for those allocations at the implied prices. Liquid welfare corrects this by capping each bidder’s effective contribution by the tighter of its budget and its RoS-implied spend capacity. Smoothness must therefore be proven with respect to a benchmark that already ``knows’’ about liquidity.
In standard smoothness, revenue is subtracted because payments are
transfers that reduce quasilinear utility. In our repeated constrained
setting, the same subtraction appears inside the smoothness inequality, but the repeated-game
reduction will additionally use that, for any realized trajectory
satisfying the constraints,
$$
\sum_{t=1}^T \mathrm{Rev}_t \;=\; \sum_{i=1}^n\sum_{t=1}^T p_{i,t}(\mathbf{b}_t) \;\le\; \sum_{i=1}^n \min\!\left\{B_i,\;\frac{1}{\tau_i}\sum_{t=1}^T x_{i,t}\,v_{i,t}\right\} \;=\; \mathrm{Lwel}(\{x_{i,t}\}).
$$
This inequality is the hinge that turns constrained smoothness into a welfare bound: revenue is
not an arbitrary negative term, but something that can be dominated by a
constraint-aware welfare measure when campaigns are kept feasible.
Because the surrogate payoff is quasilinear after rescaling, a useful way to think about
the deviation δi is that it
bids proportional to a shadow value. Concretely, many auction formats admit
deviations of the form
$$
b_i' \;=\; \kappa_i\, v_i, \qquad \kappa_i \;=\; \kappa(\lambda_i,\mu_i,\nu_i,\tau_i),
$$
where κi
decreases when the budget multiplier μi rises (the
bidder is ``pacing’’ spend) and also adjusts with νi to respect
the RoS target (the bidder is effectively raising the bar on value
relative to spend). Smoothness then asserts that if each bidder
unilaterally adopted its shadow-value deviation against an arbitrary
profile b, the sum of the
resulting Lagrangian payoffs would cover a constant fraction of the
liquid-welfare optimum, up to a revenue term.
The definition is general, but it is not vacuous: it is satisfied by canonical ad-auction mechanisms under mild regularity conditions.
In truthful formats, allocations are monotone in bids and payments are threshold-like, which makes proportional deviations particularly effective. Via the quasilinear rescaling above, we can often reduce constrained smoothness to the usual smoothness proof for the same mechanism with rescaled values and payments. Moreover, when the platform incorporates ML advice through personalized reserves or boosts, the deviation analysis can be tightened: roughly, better screening of low effective values prevents constraint-driven ``high-multiplier’’ bidders from winning at the expense of truly high vi bidders, improving the constant α (and hence the welfare approximation).
First-price auctions are smooth in the standard sense with a constant strictly worse than 1 (reflecting bid shading), and a similar argument can be carried out with Lagrangian payoffs provided payments remain bid-bounded and allocations are monotone. The resulting constrained constants are typically weaker than in second-price/VCG, which matches the practical intuition that first-price introduces additional strategic friction on top of pacing and RoS considerations.
Constrained smoothness is a property of the mechanism and is not guaranteed in arbitrary designs.
If small bid changes can flip allocations in ways that destroy threshold structure, then proportional (or randomized shading) deviations may fail to secure any fraction of LW⋆ uniformly over b.
The reduction relies on being able to relate revenue to liquid welfare along the realized path (cf. the revenue-to-liquid-welfare inequality above). If the mechanism can charge large payments unrelated to bids (or if bidders can be forced into negative payoffs even when following conservative bids), then revenue may no longer be bounded by a constraint-aware welfare quantity, and the revenue subtraction in the smoothness inequality cannot be absorbed.
Our deviations are allowed to depend on vi and on (Bi, τi, μi, νi), but not on rivals’ values. If the only deviations that certify efficiency require knowledge of others’ types or bids beyond what the model permits, then smoothness is not the right tool.
These limitations are not merely technical: they delineate when we can hope for robust, learning-compatible welfare guarantees. In the next section we exploit constrained smoothness as the ``mechanism layer’’ inequality, and we specify learning dynamics (primal–dual pacing with dynamic regret control) that make the multiplier process and constraint violations behave well enough to turn this one-shot inequality into a horizon-level liquid-welfare guarantee.
The constrained-smoothness inequality is a purely one-shot statement: it compares payoffs under a particular deviation to the optimal liquid-welfare benchmark for the same instance. To convert it into a horizon-level guarantee, we need a complementary learning layer that explains how bidders choose bids over time while respecting hard budgets and RoS targets. We emphasize a practical modeling choice: we do not assume bidders can solve the constrained horizon problem offline. Instead, we require only that each bidder runs an online algorithm whose performance can be summarized by two quantities that will enter the final welfare bound: a regret term (static or dynamic) and a constraint-violation term.
Fix bidder i and consider
the Lagrangian payoff defined above, now indexed by time:
$$
u_{i,t}(\mathbf{b}_t;\mu_i,\nu_i) \;:=\; (1+\nu_i)\,x_{i,t}(\mathbf{b}_t)\,v_{i,t} \;-\; (\lambda_i+\mu_i+\tau_i\nu_i)\, p_{i,t}(\mathbf{b}_t) \;+\; \frac{\mu_i B_i}{T}.
$$
For the purpose of regret analysis, it is convenient to view ui, t
as a function of bi, t
holding rivals fixed, i.e.,
ui, t(bi, t) := ui, t((bi, t, b−i, t); μi, νi).
Under full-information feedback (or semi-bandit feedback sufficient to
infer threshold prices), bidder i can evaluate ui, t(b)
for alternative bids b ∈ ℬi and run
standard online-optimization updates. Conceptually, the bidder is not
directly learning in the original constrained objective; it learns
against the surrogate sequence ui, t(⋅),
where constraint pressures are encoded by the multipliers.
A simple and widely-used class of bidding rules is proportional (paced) bidding: bids are
proportional to predicted value,
$$
b_{i,t} \;=\; \kappa_{i,t}\, v_{i,t},
$$
with κi, t
updated over time to control spend and RoS. The proportional form is not
imposed by our welfare analysis, but it is a useful sufficient condition
because it reduces the bidder’s action space from bids to a single
scalar. In truthful single-slot formats (second-price/VCG-like with
monotone allocation and threshold payments), the transformed quasilinear
representation suggests a closed-form mapping from multipliers to an
``effective shading’’ level. A canonical choice, consistent with the effective-value rescaling above, is
$$
\kappa_{i,t} \;=\; \min\!\left\{1,\; \frac{1+\nu_{i,t}}{\lambda_i+\mu_{i,t}+\tau_i\,\nu_{i,t}}\right\},
$$
which increases bids when the RoS constraint is slack (small νi, t)
and decreases bids when the budget constraint tightens (large μi, t).
While is mechanism-dependent as an exact best response, it captures the
economic direction of all primal–dual pacing loops: dual variables are
shadow prices, and the primal bid is a monotone function of these
prices.
We update multipliers by (projected) subgradient ascent on the dual
of the horizon constraints, using realized per-round spend and
value:
$$
\mu_{i,t+1} \;=\; \Bigl[\mu_{i,t} + \eta_\mu\Bigl(p_{i,t}(\mathbf{b}_t)-\tfrac{B_i}{T}\Bigr)\Bigr]_+,
\qquad
\nu_{i,t+1} \;=\; \Bigl[\nu_{i,t} + \eta_\nu\bigl(\tau_i\, p_{i,t}(\mathbf{b}_t)- x_{i,t}(\mathbf{b}_t)\, v_{i,t}\bigr)\Bigr]_+,
$$
optionally with projection onto a bounded box [0, μ̄] × [0, ν̄] to obtain
uniform Lipschitz constants. These updates are interpretable: μi, t
increases when the bidder is overspending relative to its average
per-round budget, and νi, t
increases when realized value falls short of τi times spend.
In practice, such dual updates correspond closely to the
``pacing multipliers'' and ``RoS multipliers’’ implemented in
autobidding systems.
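The sketch below combines the proportional bid rule and the projected dual updates into one pacing loop for a single bidder; the function and parameter names (including `mechanism_round`, the multiplier caps, and the hard budget stop) are illustrative assumptions, not a prescribed implementation.

```python
import numpy as np

def primal_dual_pacing(values, mechanism_round, B_i, tau_i, lam_i, T,
                       eta_mu=None, eta_nu=None, mu_bar=10.0, nu_bar=10.0):
    """One bidder's primal-dual pacing loop. Each round: bid a paced fraction of
    value, observe (x_t, p_t) from mechanism_round(bid, t), then adjust the budget
    multiplier mu and the RoS multiplier nu by projected subgradient steps."""
    eta_mu = 1.0 / np.sqrt(T) if eta_mu is None else eta_mu
    eta_nu = 1.0 / np.sqrt(T) if eta_nu is None else eta_nu
    mu, nu, spend = 0.0, 0.0, 0.0
    history = []
    for t in range(T):
        v_t = values[t]
        denom = lam_i + mu + tau_i * nu
        kappa = 1.0 if denom <= 0.0 else min(1.0, (1.0 + nu) / denom)  # conservative pacing
        bid = kappa * v_t if spend < B_i else 0.0    # hard stop once the budget is spent
        x_t, p_t = mechanism_round(bid, t)           # allocation in [0,1], payment >= 0
        spend += p_t
        # Dual ascent: mu rises when spend exceeds the per-round budget B_i/T,
        # nu rises when realized value falls short of tau_i times spend.
        mu = min(mu_bar, max(0.0, mu + eta_mu * (p_t - B_i / T)))
        nu = min(nu_bar, max(0.0, nu + eta_nu * (tau_i * p_t - x_t * v_t)))
        history.append((bid, x_t, p_t, mu, nu))
    return history
```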
Our welfare theorem will allow small (or zero) violations, so we make
explicit what is being measured. Define cumulative violations
$$
\mathrm{Viol}^{\mathrm{bud}}_i(T) \;:=\; \Bigl(\sum_{t=1}^T p_{i,t}(\mathbf{b}_t)-B_i\Bigr)_+,
\qquad
\mathrm{Viol}^{\mathrm{RoS}}_i(T) \;:=\; \Bigl(\tau_i\sum_{t=1}^T p_{i,t}(\mathbf{b}_t)-\sum_{t=1}^T x_{i,t}(\mathbf{b}_t)\, v_{i,t}\Bigr)_+,
$$
and let Violi(T) denote
either their sum or any upper bound used by the analysis. In many ad
platforms the budget constraint is enforced mechanically (spend cannot
exceed the remaining budget), which corresponds to Violibud(T) = 0
pathwise. RoS is often enforced by throttling or pausing when the
observed value-to-spend ratio drops; this again yields a hard (or nearly
hard) constraint in operational terms. Our analysis accommodates both
cases: when constraints are enforced strictly, the violation term
disappears; when constraints are enforced softly, the welfare bound
degrades linearly with Violi(T) scaled
by a Lipschitz constant Lipi capturing the
largest per-round sensitivity of payoffs to constraint slack.
Given a sequence of per-round payoff functions ui, t(⋅)
(as above), external regret compares the realized sequence (bi, t)t ≤ T
to the best fixed bid in hindsight:
$$
\mathrm{Reg}_i(T) := \max_{b\in\mathcal{B}_i} \sum_{t=1}^T u_{i,t}(b) -
\sum_{t=1}^T u_{i,t}(b_{i,t}).
$$
In non-stationary markets, the relevant benchmark is typically not
fixed. We therefore use dynamic regret against a drifting comparator sequence (bi, t⋆)t ≤ T
with bounded path length:
$$
\mathrm{DR}_i(T;P) \;:=\; \max_{(b^\star_{i,t})_{t\le T}:\; \sum_{t=2}^T |b^\star_{i,t}-b^\star_{i,t-1}|\le P}\;\; \sum_{t=1}^T u_{i,t}(b^\star_{i,t}) \;-\; \sum_{t=1}^T u_{i,t}(b_{i,t}).
$$
The comparator restriction is essential: without limiting drift, dynamic
regret is vacuous. Economically, the path length P captures how quickly the ``right’’
pacing level moves as the market moves. In our setting, we will align
P with the variation measure
VarT (or a
mechanism-specific proxy): when values and market conditions drift
slowly, so do the best-response deviations used in smoothness, and a
bidder that can track such drift incurs only a small additional
loss.
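For diagnostics, dynamic regret against any one candidate comparator sequence is easy to evaluate (the true quantity maximizes over all comparators with path length at most P); the helper below is an illustrative sketch.

```python
import numpy as np

def dynamic_regret_against(payoff_fns, played_bids, comparator_bids):
    """payoff_fns[t] maps a bid to the surrogate payoff u_{i,t}(b). Returns the regret
    of the played bids against the comparator sequence, together with the comparator's
    path length sum_t |b*_t - b*_{t-1}|."""
    played = sum(payoff_fns[t](b) for t, b in enumerate(played_bids))
    compared = sum(payoff_fns[t](b) for t, b in enumerate(comparator_bids))
    path_length = float(np.sum(np.abs(np.diff(np.asarray(comparator_bids, dtype=float)))))
    return compared - played, path_length
```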
For the main welfare reduction, we do not need the bidder’s update rule to be exactly the pacing rule above; we only need verifiable performance guarantees. A clean set of sufficient conditions (stated informally here) is: (C1) the dynamic regret DRi(T; P) on the Lagrangian payoff sequence is sublinear in T for path lengths P commensurate with market drift; (C2) the constraint violations Violi(T) are either zero (by hard enforcement) or sublinear in T; (C3) the multipliers (μi, t, νi, t) remain in a bounded box, so that per-round Lagrangian payoffs are uniformly bounded and Lipschitz.
These conditions are intentionally modular: they isolate what the learning layer must deliver, independently of the particular mechanism-specific smoothness proof.
In second-price/VCG-like mechanisms, the mapping from bids to allocations and threshold payments is monotone and piecewise smooth, which makes the induced Lagrangian payoff well behaved as a function of a one-dimensional pacing parameter κi, t. When bidders receive (at least) the clearing price conditional on winning, they can implement online updates on κi, t using either full-information evaluation (testing counterfactual bids) or semi-bandit gradient estimation. On the dual side, the updates are standard projected subgradient steps for a convex dual objective; with step sizes $\eta_\mu,\eta_\nu \asymp 1/\sqrt{T}$ and bounded iterates, they yield $O(\sqrt{T})$ stability and, when coupled with hard stopping rules (e.g., cease bidding once remaining budget is exhausted), achieve Violibud(T) = 0 exactly.
Two caveats are worth flagging. First, the regret bounds above are cleanest under convexity/concavity assumptions and sufficiently rich feedback; in highly discontinuous auction rules or with severely censored feedback, dynamic regret can deteriorate, and our welfare guarantees weaken proportionally. Second, RoS targets create a subtle identification issue: if realized value xi, tvi, t is noisy or delayed (as in conversion attribution), then the dual update for νi, t must be modified (e.g., using unbiased estimators or delayed-gradient corrections). Our framework still applies as long as the resulting algorithm admits a regret/violation summary of the form (C1)–(C3); the constants simply absorb the additional estimation error.
This concludes the learning layer: bidders need only ensure that their bidding dynamics are no-regret (or low dynamic-regret) with respect to the Lagrangian payoff sequence and that any constraint violations are either zero by enforcement or quantitatively controlled. In the next section, we combine these algorithmic guarantees with constrained smoothness to obtain a learning-robust liquid-welfare bound with explicit dependence on ∑iDRi(T) and VarT.
We now state the reduction that turns the one-shot constrained-smoothness inequality into a horizon-level efficiency guarantee when bidders learn online. The message is that we can treat the repeated auction as a sequence of (possibly drifting) instances: smoothness provides, in each round, a that would certify good liquid welfare if bidders were willing to play it; dynamic regret ensures bidders track, up to a controllable loss, the cumulative payoff of that drifting deviation; and the only additional degradation comes from non-stationarity (captured by VarT) and any constraint violations that are not enforced mechanically.
Under (H1)–(H3) together with the learning conditions (C1)–(C3), the realized outcome satisfies a bound of the form
$$
\sum_{t=1}^T \mathrm{Lwel}_t \;\ge\; \frac{1}{\alpha+\beta}\,\mathrm{OPT}_T \;-\; \sum_{i=1}^n \mathrm{DR}_i(T;\, c_i\,\mathrm{Var}_T) \;-\; C_{\mathrm{drift}}\cdot\mathrm{Var}_T \;-\; C_{\mathrm{viol}}\sum_{i=1}^n \mathrm{Lip}_i\,\mathrm{Viol}_i(T).
$$
This inequality decomposes welfare loss into three economically interpretable components.
First, the multiplicative constant $\frac{1}{\alpha+\beta}$ is the familiar term induced by the one-shot smoothness inequality: when bidders best-respond (or no-regret learn) relative to the deviation family, aggregate welfare is at least a constant fraction of the offline benchmark. The β term appears because smoothness typically subtracts realized revenue, which we then control via feasibility and liquid-welfare dominance (in many constrained environments, revenue is upper bounded by liquid welfare once budgets/RoS bind).
Second, ∑iDRi(T; ciVarT) captures the cost of decentralized learning. If bidders could instantly track the drifting comparator bids singled out by the smoothness proof, this term would vanish. With standard online optimization rates, DRi is sublinear in T for moderate drift, implying that the per-round welfare loss from learning decays.
Third, Cdrift ⋅ VarT is the direct cost of non-stationarity in the reduction itself. Even if bidders could perfectly track payoffs, the smoothness deviation we use changes as the instance changes; bounding the welfare impact of splicing together per-round deviations requires a stability argument that scales with how much the environment moves.
Finally, the violation term isolates the operational choice of soft versus hard enforcement. When budgets are enforced by the platform and RoS is enforced via throttling/pausing, we typically have Violi(T) = 0 pathwise, and the last term disappears entirely. When constraints are treated as soft, violations translate linearly into welfare loss with slope Lipi, reflecting the worst-case sensitivity of the Lagrangian payoff to constraint slack.
Although we have stated the guarantee in terms of a few structural terms, it is useful to
make explicit what it does and does not hide in the constants. The factors Cdrift and Cviol do not scale with
T; they depend on boundedness
of values and payments and on any projection ranges for multipliers
(which ensure the per-round Lagrangian payoffs are uniformly Lipschitz).
Thus, under a standard regime where DRi(T; ⋅) = o(T)
and VarT = o(T),
the theorem implies an asymptotic efficiency guarantee:
$$
\frac{1}{T}\sum_{t=1}^T \mathrm{Lwel}_t
\;\ge\;
\frac{1}{\alpha+\beta}\cdot \frac{\mathrm{OPT}_T}{T}
\;-\; o(1).
$$
This makes the tradeoff transparent: longer horizons wash out transient
learning errors, while faster market drift and richer constraint
management (via large multipliers or soft constraints) can increase the
constant factors.
The theorem also clarifies how platform-side design choices enter. If
the platform uses ML advice to implement reserves/boosts and thereby
improves the constrained smoothness constant from α to a smaller α(γ) (as in the
instantiation discussed in the global context), then the leading
constant improves mechanically:
$$
\frac{1}{\alpha+\beta}
\quad\leadsto\quad
\frac{1}{\alpha(\gamma)+\beta},
$$
and all lower-order terms remain of the same qualitative form. In
particular, better advice quality γ improves welfare in the precise
sense that it strengthens the multiplicative approximation to OPTT without requiring
any change in bidder learning dynamics.
The argument is the standard smoothness-to-welfare reduction, with two modifications reflecting our setting. First, we apply the constrained-smoothness inequality round-by-round, summing over t = 1, …, T and using the deviation mapping supplied by (H1). Second, rather than comparing bidders to a fixed deviation (which would yield static regret), we compare them to the deviation sequence and invoke dynamic regret (H2). The cost of moving from a per-round deviation to an admissible comparator sequence is controlled by (H3): the path length of the deviation sequence is bounded by a constant multiple of VarT. Finally, we convert the resulting bound on summed Lagrangian payoffs into liquid welfare by (i) canceling the dual terms via feasibility (or bounding them by LipiVioli(T) under soft constraints), and (ii) absorbing revenue terms using the β-penalty and the usual revenue-to-welfare relationship that holds under budget/RoS feasibility. Randomization is handled by taking expectations at the end, since every inequality in the chain is linear in realized payoffs.
The theorem is intentionally modular: it separates what the mechanism must certify (a constrained-smoothness inequality with explicit (α, β)) from what the learning layer must deliver (dynamic regret and controlled violations), and it isolates non-stationarity through a single drift statistic VarT. The price we pay is that the drift dependence is necessarily coarse: VarT is a worst-case scalar summary, and in applications one can often replace it by a finer, mechanism-specific stability measure that yields smaller constants. Moreover, the bound is only as good as the feedback model allows: if bidders cannot estimate counterfactual payoffs well enough to obtain sublinear dynamic regret, the learning term dominates and the welfare guarantee correspondingly weakens.
With the main reduction in hand, we next instantiate (H1) and the corresponding constants for commonly used ad auction formats (truthful and non-truthful), including how reserves/boosts and randomization affect (α, β) and the comparator stability needed for (H3).
We now make hypothesis (H1) concrete by recording (α, β) parameters for
canonical single-slot mechanisms and explaining how the budget/RoS
constraints enter the smoothness proof through the Lagrangian
transformation. The guiding observation is that our per-round Lagrangian
payoff is an affine reweighting of the usual quasi-linear utility, so
standard smoothness arguments can be imported essentially as a black box
once we identify the induced ``virtual value’’ and ``virtual
bid’’.
Fix a bidder i and
multipliers (μi, νi) ≥ 0.
Up to the constant term μiBi/T,
the per-round Lagrangian payoff can be written as
$$
u_{i,t}(\mathbf{b}_t;\mu_i,\nu_i) \;=\; (1+\nu_i)\,x_{i,t}(\mathbf{b}_t)\,v_{i,t} \;-\; w_i\, p_{i,t}(\mathbf{b}_t),
\qquad w_i \;:=\; \lambda_i+\mu_i+\tau_i\,\nu_i.
$$
Thus, for fixed multipliers, bidder i behaves as if she had (i) a value (1 + νi)vi, t
for allocation and (ii) a weight wi on money.
Dividing by wi (which does
not affect best responses) shows that maximizing the surrogate is equivalent to
maximizing a quasi-linear utility with effective value
$$
\widetilde{v}_{i,t} \;:=\; \frac{1+\nu_i}{\lambda_i+\mu_i+\tau_i\,\nu_i}\; v_{i,t}
$$
and standard payment pi, t.
This reduction is the technical reason constrained smoothness remains
tractable: we can prove smoothness for the induced instance with values
(ṽi, t)
and then translate back, with constants depending only on
boundedness/projection radii for multipliers (to keep ṽi, t
bounded and Lipschitz).
In a single-slot environment, VCG coincides with the second-price auction (SPA) up to tie-breaking and reserve handling, so we state results in SPA language. The key property is that, under conservative bidding (no overbidding relative to the relevant value proxy), SPA admits a deviation guaranteeing a constant fraction of welfare without needing to subtract revenue.
Concretely, consider the standard deviation that bids a fixed
fraction of value. For our constrained setting, the natural deviation
is
$$
b'_{i,t} \;=\; \theta\,\widetilde{v}_{i,t} \qquad \text{for a fixed fraction } \theta\in(0,1],
$$
where ṽi, t
is defined above. Since the SPA allocation is monotone in bids and the
winner pays the highest competing bid, the classical smoothness
calculation (adapted to allow randomized tie-breaking) implies an
inequality of the form
$$
\sum_{i=1}^n \mathbb{E}\Bigl[u_{i,t}\bigl((b'_{i,t}, \mathbf{b}_{-i,t});\, \mu_i, \nu_i\bigr)\Bigr] \;\ge\; \tfrac{1}{2}\,\mathrm{OPT}^{\mathrm{eff}}_t,
$$
where OPTteff
denotes the per-round optimal welfare computed using the effective
values (ṽi, t).
Translating back to liquid welfare requires only the same bookkeeping as
in the reduction: the multipliers are already internalized by the
Lagrangian, and feasibility plus constraint handling ensures we can
compare OPTteff to the
liquid-welfare benchmark for the original instance. In the notation of
(H1), this corresponds to (α, β) = (2, 0) for SPA/VCG
under conservative bidding relative to ṽi, t.
Two remarks clarify what is assumed and what is gained.
Because bidder utilities in the Lagrangian surrogate penalize payments by a weight wi ≥ λi,
the relevant
``no-overbidding'' condition is $b_{i,t}\le \widetilde{v}_{i,t}$ (or the corresponding per-click analogue). In primal-dual pacing for truthful formats, bids typically take the form $b_{i,t}=\kappa_{i,t} v_{i,t}$ with $\kappa_{i,t}\in[0,1]$ generated by multipliers; this automatically enforces conservativeness once the mapping between $\kappa_{i,t}$ and $(\mu_{i,t},\nu_{i,t})$ is chosen consistently with \eqref{eq:effective_value}. Operationally, this is precisely the regime in which platforms are comfortable interpreting bids as ``shaded
values’’ rather than arbitrary strategic signals.
If we allow arbitrary overbidding, one typically obtains a smoothness inequality with a revenue subtraction (i.e., β > 0). In our framework this is still admissible because the reduction ultimately controls revenue by liquid welfare under hard feasibility/constraints, but the leading constant degrades from 1/(α + β) = 1/2 to a weaker fraction. We therefore treat (2, 0) as the clean baseline for truthful auctions when the learning layer enforces conservativeness, and we view the β > 0 variants as robustness add-ons rather than the economically salient case.
For the first-price auction (FPA), the winner pays her own bid, so
the per-round payoff is nonlinear in the bid even after the Lagrangian
reweighting. Nonetheless, the mechanism remains smooth in the standard
sense: there exists a randomized deviation (often based on bidding a
random fraction of value) that guarantees a constant fraction of the
optimum minus the realized revenue. Plugging the effective values ṽi, t
into the classical deviation construction yields, for each round t,
$$
\sum_{i=1}^n \mathbb{E}\Bigl[u_{i,t}\bigl((b'_{i,t}, \mathbf{b}_{-i,t});\, \mu_i, \nu_i\bigr)\Bigr] \;\ge\; \Bigl(1-\tfrac{1}{e}\Bigr)\,\mathrm{OPT}^{\mathrm{eff}}_t \;-\; \mathrm{Rev}_t,
$$
which corresponds to $(\alpha,\beta)=\bigl(\tfrac{e}{e-1},\,1\bigr)$
in our (1/α, β)
parameterization.
Two economic takeaways are worth emphasizing.
Because β = 1 in the FPA inequality, our reduction uses a revenue-to-liquid-welfare comparison to absorb the −Revt term. This step is harmless in regimes where budgets/RoS bind (revenue is then effectively ``liquid’’), but it can be loose when many bidders are slack. Said differently, non-truthful formats shift part of the efficiency analysis from allocation geometry to payment geometry; in a constrained environment this is not a dealbreaker, but it makes the constants more sensitive to how often constraints bind.
In SPA/VCG, the mapping bi, t = κi, tvi, t together with truthful pricing makes the per-round Lagrangian payoff effectively linear in the decision variable κi, t, enabling clean regret guarantees via standard online convex optimization. In FPA, the payoff as a function of bid is typically only unimodal and depends on the distribution of competing bids; even with full-information feedback on the winning threshold, obtaining dynamic regret bounds may require richer bidder-side estimation or explicit exploration. Our welfare theorem still applies verbatim once (H2) is verified, but in practice FPA shifts burden from mechanism design to bidder algorithm design.
Finally, we record a robustness principle that will matter both for practical auction implementations (where randomization is ubiquitous) and for proof hygiene (where randomized deviations are standard).
Suppose the mechanism randomizes in a way that preserves feasibility almost surely (e.g., randomized tie-breaking among equal bids; random selection among a shortlist; probabilistic throttling that never allocates more than one slot). Then any one-shot smoothness inequality proved for deterministic outcomes extends immediately by taking expectations over the mechanism’s internal randomness, because both sides of the inequality are linear in allocations and payments. In particular, if a deterministic SPA is (2, 0)-constrained-smooth under conservative bidding, then an SPA with randomized tie-breaking is also (2, 0)-constrained-smooth in expectation.
For FPA and for several reserve-based or boosted-score auctions, the best known deviations are randomized. Our constrained smoothness definition allows randomized deviations by design, so no additional machinery is required: we view the deviation mapping as producing a distribution over bids conditional on (vi, t, μi, νi), and we interpret the smoothness inequality in expectation over that deviation randomness (holding the realized bid profile bt fixed).
Randomization only affects (α, β) when it changes the underlying allocation/payment rule in a way that effectively ``dilutes’’ competition, such as δ-exploration mechanisms that allocate the slot uniformly at random with probability δ and run the base auction with probability 1 − δ. In such cases, smoothness typically degrades gracefully by a factor on the order of 1/(1 − δ) in the welfare term, reflecting the mechanically lost efficiency from forced exploration. This gives a clean way to account for such exploration, knowing that, in learning-heavy deployments, some randomization may be intentionally introduced for calibration or experimentation.
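A δ-exploration wrapper of the kind described here can be sketched as follows; allocating the exploration impression at zero payment is an assumption made purely for illustration.

```python
import numpy as np

def delta_exploration(base_mechanism, bids, delta, rng=None):
    """With probability delta allocate the slot uniformly at random at zero payment,
    otherwise run the base auction; smoothness-type guarantees typically dilute by
    roughly a 1/(1-delta) factor."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(bids)
    if rng.random() < delta:
        x = np.zeros(n)
        x[rng.integers(n)] = 1.0
        return x, np.zeros(n)
    return base_mechanism(bids)
```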
Taken together, these instantiations justify treating (H1) as a mild, mechanism-level condition: for truthful formats we obtain clean constants with minimal caveats, while for non-truthful and randomized formats we can still certify constrained smoothness at the cost of either a revenue term (β > 0) or a controlled loss due to explicit exploration. The next step is to show how platform-side ML advice, implemented via reserves/boosts, improves the leading constant by strengthening the effective competition among high-value bidders.
The smoothness constants recorded above treat the auction as purely bid-driven, and therefore inherit the classical factor-2 barrier for single-slot allocation under conservative behavior. In constrained autobidding, this barrier is often not the economically right baseline: bidders shade because of budgets and RoS, so bids partially reflect dual prices rather than underlying marginal value. A natural platform response—already common in practice—is to fold platform-side predictions into the allocation rule via bidder-specific reserves or score boosts. In our framework, such advice enters only through (H1): it modifies the mechanism M, and hence the parameters (α, β) that feed into the learning-to-welfare reduction.
Fix a round t. Suppose the
platform has an advice signal v̂i, t
satisfying a multiplicative guarantee
$$
\gamma\, v_{i,t} \;\le\; \hat{v}_{i,t} \;\le\; v_{i,t}, \qquad \gamma\in(0,1].
$$
The upper bound v̂i, t ≤ vi, t
is a substantive modeling choice: it is precisely the regime in which
using v̂i, t
for reserves/boosts is ``safe’’ from an individual-rationality
standpoint under conservative bidding, because it never forces a bidder
to pay (or effectively commit to pay) above her realized value. The
lower bound v̂i, t ≥ γvi, t
is the quantitative content: when γ is larger, advice is closer to the
truth and can be used more aggressively without sacrificing welfare
guarantees.
Because our smoothness arguments operate on the effective values ṽi, t defined above, it is convenient to note that the advice guarantee transfers immediately to the effective scale for any fixed multipliers: defining $\widehat{\widetilde v}_{i,t}:=\frac{1+\nu_i}{\lambda_i+\mu_i+\tau_i\nu_i}\hat v_{i,t}$ yields $\widehat{\widetilde v}_{i,t}\in[\gamma \widetilde v_{i,t},\,\widetilde v_{i,t}]$. Thus, advice quality is preserved by the Lagrangian reweighting.
One convenient way to incorporate advice while retaining clean
analysis is an SPA run on bidder scores that add an advice-based
constant:
$$
s_{i,t} \;:=\; b_{i,t} + \zeta\,\hat{v}_{i,t}, \qquad \text{allocate the slot to } i^\star \in \arg\max_i s_{i,t},
$$
with the threshold payment
$$
p_{i^\star,t} \;=\; \Bigl(\max_{j\neq i^\star} s_{j,t} \;-\; \zeta\,\hat{v}_{i^\star,t}\Bigr)_+
$$
and pi, t = 0
for losers. Since the boost term is independent of bi, t,
this boosted-score rule is simply VCG applied to a shifted bid space; in particular, the
threshold payment never exceeds the winner’s bid because winning implies
bi, t ≥ maxj ≠ isj, t − ζv̂i, t.
Economically, the boost says the platform is willing to ``credit’’ bidders with a
portion of their predicted value when ranking ads, counteracting shading
induced by pacing.
The parameter ζ ≥ 0 controls how strongly advice influences allocation. For our purposes, we only need that ζ is chosen as a (known) function of γ so that boosts are strong enough to prevent gross misallocations but not so strong as to make bids irrelevant. One can view this boosted SPA as a stylized proxy for the many score-based implementations used in ad platforms (including reserve-like floors, quality-score boosts, and bid multipliers), all of which preserve ex post feasibility and can be analyzed with the same smoothness vocabulary.
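A minimal sketch of the boosted-score rule (names illustrative) makes the allocation and the threshold payment explicit.

```python
import numpy as np

def boosted_second_price(bids, advice, zeta):
    """Advice-boosted SPA: rank by s_i = b_i + zeta * vhat_i and charge the winner
    max_{j != i} s_j - zeta * vhat_i, floored at zero; winning implies this never
    exceeds the winner's own bid."""
    b = np.asarray(bids, dtype=float)
    vhat = np.asarray(advice, dtype=float)
    s = b + zeta * vhat
    x = np.zeros(b.size)
    p = np.zeros(b.size)
    winner = int(np.argmax(s))
    x[winner] = 1.0
    competing = np.delete(s, winner)
    threshold = (float(competing.max()) if competing.size else 0.0) - zeta * vhat[winner]
    p[winner] = max(0.0, threshold)
    return x, p
```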
The effect of advice is to strengthen the deviation guarantee in the smoothness inequality: when a high-value bidder loses under the deviation, it must be because someone else has either (i) sufficiently high bid pressure or (ii) sufficiently high advice. Case (ii) is exactly where the advice lower bound bites, because losing to a high-v̂ bidder means losing to someone whose value is not too small relative to the optimum. This improves the worst-case inefficiency relative to a purely bid-driven ranking, where the winner can in principle have arbitrarily low value.
Formally, adapting the robust-auction-design calculations for single-slot score auctions to our effective-value instance yields the following per-round constrained smoothness inequality for an advice-augmented SPA of the form above:
$$
\sum_{i=1}^{n} \widehat u_{i,t}\bigl(b^{\star}_{i,t},\mathbf b_{-i,t};\mu_i,\nu_i\bigr)
\;\ge\;
\frac{1+\gamma}{2}\,\max_{i}\,\widetilde v_{i,t},
$$
where max_i ṽi, t is the per-round effective-welfare optimum in the single-slot case, for an appropriate deviation mapping (a fixed-fraction bid in the effective scale, as in the baseline construction, but with the fraction chosen to be compatible with the boost strength). Comparing to the baseline constant, we see that advice improves the welfare coefficient from 1/2 to (1 + γ)/2. In the notation of (H1), this corresponds to
$$
\alpha(\gamma) \;=\; \frac{2}{1+\gamma}, \qquad \beta(\gamma) \;=\; 0,
$$
so that the leading welfare guarantee in our learning theorem improves monotonically with γ.
It is worth pausing on the economics behind this improvement. The classical 1/2 arises because, absent additional
structure, the deviation can only guarantee that either the deviating
optimal bidder wins (earning utility) or else the competing threshold is
high (which, in more general mechanisms, is paid as revenue and is then
related back to welfare). Advice effectively creates a third
possibility: even if the deviator does not win and even if the competing
bids are not informative, the winner’s advice-based component certifies
that her true value is not too small. The worst case therefore
interpolates between
``competition-based’’ and ``advice-certified’’ allocation,
producing exactly the arithmetic average (1 + γ)/2.
The formula α(γ) = 2/(1 + γ) delivers a clean comparative static: as γ increases, α(γ) decreases, and therefore the multiplicative welfare loss due to strategic behavior and learning shrinks. Two limiting cases are informative. If γ ↓ 0, advice can be arbitrarily weak, and we revert to the baseline α(0) = 2 (no improvement over standard SPA/VCG). If γ ↑ 1, advice is essentially exact and the bound approaches α(1) = 1, consistent with the intuition that sufficiently informative platform-side signals can nearly pin down the welfare-maximizing allocation even when bids are heavily paced.
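As a worked instance of this comparative static (the numbers are purely illustrative):
$$
\gamma = 0.5:\qquad \alpha(0.5) \;=\; \frac{2}{1+0.5} \;=\; \frac{4}{3},
\qquad \frac{1}{\alpha(0.5)} \;=\; \frac{1+0.5}{2} \;=\; 0.75,
$$
so advice guaranteed to recover at least half of each value lifts the leading welfare coefficient from 1/2 to 3/4, before accounting for the additive learning and drift terms.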
From a policy-and-practice perspective, this comparative static says that investments in prediction quality have a direct welfare payoff. The improvement is not merely statistical; it is mechanism-theoretic. The platform is using prediction to reshape the strategic environment so that the smoothness deviation captures more of the offline optimum.
Several caveats are worth noting. First, the conservative orientation v̂i, t ≤ vi, t matters: if advice can overestimate values, then reserves/boosts based on v̂i, t can induce over-allocation to low true-value bidders and can also break the conservativeness conditions that underlie clean smoothness constants. In practice, this motivates the use of calibrated lower-confidence bounds (or other conservative corrections) when turning ML predictions into mechanism parameters.
Second, γ should be interpreted as a multiplicative quality parameter. If advice quality is heterogeneous across bidders or contexts, one can still apply the same analysis by taking γ to be the minimal guarantee over the relevant slice of traffic, at the cost of conservatism in the bound. This is a feature rather than a bug: it makes explicit that the welfare guarantee is only as strong as the weakest region in which prediction is deployed.
Finally, while we focused on SPA/VCG-like formats where advice yields β = 0, analogous ideas can be applied to first-price and other non-truthful auctions; there the main effect is typically a reduction in α combined with some control over β (the revenue subtraction), with the exact tradeoff depending on how boosts interact with bid shading. For our purposes, the key message is that incorporating advice through reserves/boosts is a mechanism-level way to improve the constants in (H1), and therefore to tighten the welfare guarantees that persist under dynamic learning and drift.
Our baseline presentation focuses on single-slot allocation, a linear ``value–spend’’ objective, and feedback rich enough to support closed-form primal–dual pacing. We briefly discuss how each of these choices can be relaxed while keeping the same organizing logic: (i) fold the relevant business or engineering feature into the per-round primitives (xi, t, pi, t, vi, t) and the benchmark OPTT, and (ii) re-check a constrained-smoothness inequality (H1) and a learning guarantee (H2) for the induced Lagrangian payoff sequence.
In many ad markets, each round t allocates K ≥ 1 slots with position weights
(e.g., click-through rates) a1 ≥ a2 ≥ ⋯ ≥ aK ≥ 0.
A convenient extension is to let the mechanism output an assignment
xi, tk(bt) ∈ [0, 1]
indicating whether bidder i
receives slot k, with
feasibility ∑ixi, tk ≤ 1
for each k and ∑kxi, tk ≤ 1
for each i. If vi, t
denotes value per click (or per unit exposure), then bidder i’s realized value becomes
$$
V_{i,t} \;=\; \sum_{k=1}^K a_k\,x^{k}_{i,t}(\mathbf{b}_t)\,v_{i,t}.
$$
Budgets and RoS constraints remain unchanged with Vi, t
substituted, and liquid welfare naturally generalizes to
$$
\mathrm{Lwel} \;=\; \sum_{i=1}^n
\min\Bigl\{B_i,\;\frac{1}{\tau_i}\sum_{t=1}^T \sum_{k=1}^K a_k
x^{k}_{i,t} v_{i,t}\Bigr\}.
$$
From an analytical standpoint, nothing essential hinges on K = 1: the role of constrained
smoothness is to certify that for each per-round instance there exists a
deviation profile whose aggregate Lagrangian payoff lower bounds the
(effective) welfare benchmark up to constants. For VCG position auctions
(or variants that are VCG in the space of scores), the standard
smoothness constructions for assignment games extend by scaling each
bidder’s value by ak and using
deviations that mimic truthful bidding on the induced effective values.
The constants (α, β)
will generally depend on the position weights through the same
parameters that appear in classical price-of-anarchy analyses of
position auctions (e.g., ``semi-smoothness’’ constants), but our
learning-to-welfare reduction is unchanged once (H1) is verified.
Likewise, if the platform introduces advice-driven score adjustments
slot-by-slot, the same multiplicative-quality parameter γ can be used to improve α provided the advice certifies
value along the relevant margin (winning a high-weight slot rather than
a low-weight slot).
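For concreteness, here is a minimal sketch of how the position-auction primitives feed the liquid-welfare computation above; the greedy score-based assignment and the function names are illustrative assumptions, and any feasible assignment rule can be substituted.

```python
import numpy as np

def position_round(scores, weights):
    """Greedy score-based assignment: slot k goes to the k-th highest score.

    scores: length-n ranking scores; weights: nonincreasing a_1 >= ... >= a_K.
    Returns an (n, K) 0/1 matrix x with x[i, k] = 1 if bidder i receives slot k.
    """
    n, K = len(scores), len(weights)
    x = np.zeros((n, K))
    for k, i in enumerate(np.argsort(-scores)[:K]):
        x[i, k] = 1.0
    return x

def liquid_welfare(values, assignments, weights, budgets, tau):
    """Lwel = sum_i min{ B_i, (1/tau_i) * sum_t sum_k a_k x^k_{i,t} v_{i,t} }."""
    realized = np.zeros_like(budgets, dtype=float)
    for t, x_t in enumerate(assignments):          # assignments: list of (n, K) matrices
        realized += (x_t @ weights) * values[t]    # values: (T, n) array of v_{i,t}
    return float(np.minimum(budgets, realized / tau).sum())
```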
Two practical caveats are worth recording. First, market drift may be
better captured by variation in values akvi, t
or in the induced threshold prices for each slot; one can define
$$
\mathrm{Var}_T^{\mathrm{pos}} \;:=\; \sum_{t=2}^T \max_{i}\max_{k}\,
\bigl|a_k v_{i,t} - a_k v_{i,t-1}\bigr|
\;=\;\Bigl(\max_k a_k\Bigr)\mathrm{Var}_T,
$$
so the drift penalty simply rescales. Second, real systems often couple
the positions through reserve policies or ``top-of-page’’ constraints;
in such cases the feasibility set 𝒳 is
more structured, but the constrained-smoothness vocabulary is expressly
designed to be mechanism-agnostic: one re-derives the deviation
inequality for the actual allocation rule and proceeds identically.
We allow each bidder to trade off value and spend via λi ∈ [0, 1], ranging from pure value maximization (λi = 0) to quasilinear utility (λi = 1). In practice, such mixed objectives arise because some advertisers optimize value subject to tight budgets (small λi), while others treat spend as intrinsically costly (large λi) or incorporate additional penalties (e.g., risk or variance terms) that are approximately linear in spend at the margin.
Formally, the key observation is that λi enters only
through the Lagrangian surrogate. Writing the per-round Lagrangian
payoff as
$$
\widehat{u}_{i,t}(b_{i,t},\mathbf{b}_{-i,t};\mu_i,\nu_i)
=
(1+\nu_i)V_{i,t} \;-\; (\lambda_i+\mu_i+\tau_i\nu_i)p_{i,t} \;+\;
\mu_i\frac{B_i}{T},
$$
we see that the learning problem remains a no-regret (or dynamic-regret)
problem on a sequence of linear payoffs in (Vi, t, pi, t)
with bidder-specific coefficients. In particular, the induced
``effective value’’ scaling
$$
\widetilde v_{i,t} \;:=\;
\frac{1+\nu_i}{\lambda_i+\mu_i+\tau_i\nu_i}\,v_{i,t}
$$
continues to summarize incentives, so any smoothness proof that is
stated in terms of values can be re-used by substituting ṽi, t.
This makes mixed objectives essentially free in our framework:
heterogeneity in λi affects the
constants through bidder-specific Lipschitz bounds (how much utility
changes when payments move) but not the structure of the welfare
guarantee.
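The reduction is mechanical enough to state in a few lines; the sketch below (the function names are ours, not canonical) evaluates the effective value and the per-round Lagrangian surrogate for given multipliers.

```python
def effective_value(v, lam, mu, nu, tau):
    """v_tilde = (1 + nu) / (lam + mu + tau * nu) * v  (per bidder, per round)."""
    return (1.0 + nu) / (lam + mu + tau * nu) * v

def lagrangian_payoff(value, payment, budget_per_round, lam, mu, nu, tau):
    """u_hat = (1 + nu) * V - (lam + mu + tau * nu) * p + mu * B / T."""
    return (1.0 + nu) * value - (lam + mu + tau * nu) * payment + mu * budget_per_round
```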
Two extensions sometimes requested in applications are also compatible. First, advertisers may change λi over time (e.g., during promotions). If λi, t varies with bounded variation, it can be folded into the drift term by treating the effective values ṽi, t as the primitive sequence whose variation drives the comparator path length. Second, one can incorporate a platform objective that is itself a convex combination of welfare and revenue by modifying the benchmark (and, correspondingly, the deviation mapping in (H1)); the same learning reduction applies so long as the benchmark is the one certified by the constrained-smoothness inequality.
Return-on-spend constraints appear in several operational forms, not all identical to the cumulative inequality ∑tVi, t ≥ τi∑tpi, t. We emphasize that our choice is not merely for convenience: the cumulative form is (i) economically interpretable as a long-run ROI target and (ii) analytically compatible with convex dual methods and smoothness reductions.
That said, three common variants can be accommodated.
Often vi, t is an expectation (e.g., predicted conversion value), while realized value is a random variable Yi, t observed with delay. If the constraint is intended to hold in expectation, one simply replaces vi, t by 𝔼[Yi, t ∣ context], which is already our modeling primitive. If instead the constraint must hold with high probability on realized outcomes, then the dual updates need to incorporate concentration (martingale) bounds; operationally, this leads to ``conservative’’ pacing rules that include a safety buffer, which can be analyzed by adding an extra slack term to the RoS constraint and tracking its contribution to Violi(T).
Many systems enforce RoS over rolling windows (e.g., last W auctions or a day). This can be modeled by introducing constraints indexed by window endpoints, yielding multiple dual variables {νi, s} that update on different clocks. While the smoothness inequality remains per-round, the learning problem becomes a multi-constraint online optimization problem; standard primal–dual methods still yield sublinear regret, but the constants depend on the number of active windows. A pragmatic compromise is to treat windows as ``soft’’ constraints and penalize violations in the objective; in our notation this simply enlarges the Lagrangian surrogate and increases the Lipschitz term multiplying Violi(T).
Some teams phrase RoS as a ratio constraint $\frac{\sum_t V_{i,t}}{\sum_t p_{i,t}}\ge \tau_i$ (with a convention when spend is zero). This is equivalent to our linear form when ∑tpi, t > 0 and is therefore already covered; the linear form is preferable because it avoids degeneracy and keeps dual updates well-defined even when spend is small early in the horizon.
Across these variants, the recurring technical point is that the welfare theorem penalizes constraint handling only through Violi(T), so any enforcement architecture that keeps violations controlled—hard stopping rules, conservative buffers, or explicit penalties—can be plugged in without changing the core mechanism-level analysis.
Our cleanest corollaries rely on feedback rich enough to compute (Vi, t, pi, t) and, in truthful auctions, to map dual prices to proportional bids in closed form. Real platforms can deviate from this ideal in two ways: bidders may not observe informative counterfactuals (e.g., the threshold price when they lose), and the auction may be non-truthful or ``black-box’’ enough that the dependence of outcomes on bids is not analytically tractable.
When feedback is bandit (the bidder observes only her own xi, t and pi, t), the natural replacement is a bandit or semi-bandit online learning algorithm on ℬi (or on a parametric policy class Πi). Regret rates typically weaken from $O(\sqrt{T})$ to O(T2/3) (or worse) depending on smoothness/convexity assumptions and whether gradients can be estimated. In our welfare bound this translates directly into a larger additive loss term through ∑iDRi(T). The comparative static is therefore sharp: limited transparency is not merely a learning inconvenience; it is quantitatively a welfare cost.
When the mapping from bids to outcomes lacks a closed form (e.g., first-price dynamics, complex pacing layers, or unknown reserve logic), a practical approach is a numerical approximation of the primal–dual loop. Concretely, bidders can maintain multipliers (μi, t, νi, t) as before, but estimate the bid response bi, t(μi, t, νi, t) by (i) local perturbations of bids to estimate the marginal tradeoff between expected value and spend, or (ii) fitting a low-dimensional response model from historical data and updating it online. In either case, the theory guides what should be calibrated: the effective-value ratio (1 + ν)/(λ + μ + τν) is the sufficient statistic in truthful settings, so numerical routines can target this one-dimensional mapping even when the underlying auction is complex. The limitation is equally clear: exploration needed for estimation creates short-run inefficiency and may itself induce constraint violations unless additional safeguards are imposed. This is precisely the regime where simulation becomes informative, motivating the experimental plan we turn to next.
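Before turning to experiments, here is a minimal sketch of the perturbation-based estimation just described; run_auction is a hypothetical black-box simulator of the deployed (possibly non-truthful) mechanism, and the routine only estimates the local value/spend tradeoff that the effective-value ratio is meant to target.

```python
import numpy as np

def estimate_marginals(run_auction, bid, eps=0.05, n_samples=200):
    """Finite-difference estimate of (dValue/dBid, dSpend/dBid) around `bid`.

    run_auction(b) -> (realized_value, payment) for one auction draw; the
    estimate averages n_samples draws at the perturbed bids bid * (1 +/- eps).
    """
    up = np.zeros(2)
    down = np.zeros(2)
    for _ in range(n_samples):
        up += np.asarray(run_auction(bid * (1.0 + eps)), dtype=float)
        down += np.asarray(run_auction(bid * (1.0 - eps)), dtype=float)
    # central difference on the empirical means
    d_value, d_spend = (up - down) / (n_samples * 2.0 * eps * bid)
    return float(d_value), float(d_spend)
```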
Our theory isolates two quantitative drivers of performance in repeated auctions with constrained autobidders: (i) the extent of non-stationarity, captured by VarT, and (ii) mechanism-side improvements in the constrained-smoothness constant, captured by α(γ) when ML advice is used to set reserves/boosts. A focused simulation suite can therefore serve two purposes: it can (a) validate that the realized welfare loss is approximately linear in VarT in regimes where dynamic regret is controlled, and (b) demonstrate the predicted improvement in welfare as the advice quality parameter γ increases.
We recommend simulating single-slot auctions with n advertisers over T rounds, with per-round values
{vi, t}
generated by a controlled drift process. In each round, bidders submit
bids via an autobidding policy (e.g., primal–dual pacing) and the
platform runs a fixed mechanism M (e.g., SPA/VCG with optional
reserves/boosts). For each run we record: (i) cumulative liquid welfare
$\sum_{t=1}^T \mathrm{Lwel}_t$, (ii)
realized revenue $\sum_{t=1}^T \sum_i
p_{i,t}$, (iii) constraint satisfaction (budget spend and RoS),
and (iv) learning diagnostics such as DRi(T) (or its
empirical proxy) and the realized path length of bids. To benchmark
efficiency, we estimate OPTT by solving an
offline planning problem using the realized {vi, t}:
in the single-slot case, an LP relaxation over allocations {xi, t ∈ [0, 1]}
with feasibility ∑ixi, t ≤ 1,
and with bidder-side cumulative constraints mirroring the simulated
budgets and RoS targets. The principal plot is then the welfare
ratio
$$
\mathrm{Eff}(T)\;:=\;\frac{\sum_{t=1}^T
\mathrm{Lwel}_t}{\mathrm{OPT}_T},
$$
reported with confidence intervals across random seeds.
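One concrete way to compute the OPT_T estimate described above is the LP below; this is a sketch under our reading of the benchmark (each bidder's liquid contribution is capped by the budget and by the RoS-discounted allocated value), and the constraint set should mirror whatever is enforced in the simulation.

```python
import numpy as np
from scipy.optimize import linprog

def estimate_opt(values, budgets, tau):
    """Offline LP estimate of the liquid-welfare benchmark OPT_T.

    values: (T, n) realized values; budgets, tau: length-n arrays.
    Variables: allocations x[t, i] in [0, 1] (flattened row-major) and liquid
    contributions w[i], maximizing sum_i w_i subject to
      sum_i x[t, i] <= 1                       (one slot per round)
      w_i <= (1 / tau_i) * sum_t v[t, i] x[t, i]
      0 <= w_i <= B_i.
    """
    T, n = values.shape
    nx = T * n
    c = np.concatenate([np.zeros(nx), -np.ones(n)])     # minimize -(sum of w)

    A_slot = np.zeros((T, nx + n))
    for t in range(T):
        A_slot[t, t * n:(t + 1) * n] = 1.0
    A_liq = np.zeros((n, nx + n))
    for i in range(n):
        A_liq[i, i:nx:n] = -values[:, i] / tau[i]
        A_liq[i, nx + i] = 1.0
    A_ub = np.vstack([A_slot, A_liq])
    b_ub = np.concatenate([np.ones(T), np.zeros(n)])
    bounds = [(0.0, 1.0)] * nx + [(0.0, float(B)) for B in budgets]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return -res.fun
```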
To make the VarT dependence visible (rather than washed out by i.i.d. noise), we recommend drift processes in which VarT can be dialed up while keeping the marginal distribution of values approximately fixed. Three convenient families are:
Block drift. Partition [T] into S blocks of length L = T/S. In each
block s, draw a baseline value
vector (θ1, s, …, θn, s)
(e.g., lognormal) and then set vi, t = θi, s + ϵi, t
within the block. Drift is introduced by changing θ⋅, s between
blocks; in this design,
$$
\mathrm{Var}_T \approx \sum_{s=2}^S \max_i
|\theta_{i,s}-\theta_{i,s-1}|,
$$
so increasing either the number of switches S or the jump sizes increases VarT predictably.
Drift-budget paths. Fix a total drift budget D and generate values recursively via vi, t = vi, t − 1 + Δi, t where Δi, t is chosen to satisfy ∑t ≥ 2maxi|Δi, t| ≤ D while inducing ``hard’’ changes (e.g., alternating which bidder is highest value). This construction is useful for stress-testing tracking: it tends to force large shifts in the identity of the efficient winner without changing the average scale of values.
Sinusoidal drift. Use vi, t = mi + Ai sin (2πt/P) + ϵi, t
with period P and amplitude
Ai. Here
VarT scales like
(T/P)maxiAi,
making it straightforward to separate
``slow drift’’ (large P) from ``fast drift’’ (small P) while keeping the same
amplitude.
Across designs, we recommend reporting the realized VarT computed from the generated vi, t sequence to confirm that the intended drift control is reflected in the actual statistic used by the theorem.
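A sketch of two of the drift families above, together with the realized VarT statistic we recommend reporting; the distributional choices are illustrative defaults, and the drift-budget family can be generated analogously by allocating the jump budget across rounds.

```python
import numpy as np

def realized_var(values):
    """Var_T = sum_{t >= 2} max_i |v_{i,t} - v_{i,t-1}| for a (T, n) value matrix."""
    return float(np.abs(np.diff(values, axis=0)).max(axis=1).sum())

def block_drift(T, n, n_blocks, noise=0.05, seed=0):
    """Piecewise-constant baselines theta_{i,s}, redrawn each block, plus small noise."""
    rng = np.random.default_rng(seed)
    L = -(-T // n_blocks)                               # ceil(T / n_blocks)
    theta = rng.lognormal(mean=0.0, sigma=0.5, size=(n_blocks, n))
    base = np.repeat(theta, L, axis=0)[:T]
    return base + noise * rng.standard_normal((T, n))

def sinusoidal_drift(T, n, period, amplitude=0.5, noise=0.05, seed=0):
    """v_{i,t} = m_i + A_i * sin(2*pi*t/period) + eps_{i,t}."""
    rng = np.random.default_rng(seed)
    m = rng.uniform(0.5, 1.5, size=n)
    A = amplitude * rng.uniform(0.5, 1.0, size=n)
    t = np.arange(T)[:, None]
    return m + A * np.sin(2.0 * np.pi * t / period) + noise * rng.standard_normal((T, n))
```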
Budgets and RoS targets should be heterogeneous to reflect realistic
pacing pressure. A simple parameterization is to draw Bi proportional
to a target spend rate (e.g., Bi/T
uniform in a range) and to set τi so that RoS
is binding for some bidders but slack for others. We then implement
primal–dual pacing with bids of the form bi, t = κi, tvi, t
(or the corresponding mapping in the chosen auction), with dual
updates
$$
\mu_{i,t+1} \;=\; \bigl[\mu_{i,t} + \eta\,(p_{i,t} - B_i/T)\bigr]_{+},
\qquad
\nu_{i,t+1} \;=\; \bigl[\nu_{i,t} + \eta\,(\tau_i p_{i,t} - x_{i,t} v_{i,t})\bigr]_{+},
$$
and with a hard stopping rule that sets bi, t = 0
once spend hits Bi to ensure
Violi(T) = 0 by
construction. To connect directly to the dynamic guarantee, one can
optionally include an explicit tracking component (e.g., optimistic
mirror descent on the dual, or a slowly varying stepsize) and measure an
empirical dynamic regret proxy against a sliding-window best
response.
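The pacing loop just described fits in a few lines; in the sketch below, run_round is a hypothetical interface to the fixed mechanism M (returning per-bidder allocation indicators and payments), the multiplier κ uses the effective-value ratio from the truthful case, and a small floor avoids division by zero before the duals move.

```python
import numpy as np

def primal_dual_pacing(values, run_round, budgets, tau, lam, eta):
    """Primal-dual pacing with multiplicative bids and a hard budget stop.

    values: (T, n) per-round values; run_round(bids, t) -> (x_t, p_t), the
    allocation indicators and payments charged by the fixed mechanism M.
    """
    T, n = values.shape
    mu = np.zeros(n)                  # budget duals
    nu = np.zeros(n)                  # RoS duals
    spend = np.zeros(n)
    allocated_value = np.zeros(n)
    for t in range(T):
        kappa = (1.0 + nu) / np.maximum(lam + mu + tau * nu, 1e-6)
        bids = kappa * values[t]
        bids[spend >= budgets] = 0.0  # hard stop: budget violations are zero by construction
        x_t, p_t = run_round(bids, t)
        spend += p_t
        allocated_value += x_t * values[t]
        # projected dual (subgradient) steps
        mu = np.maximum(0.0, mu + eta * (p_t - budgets / T))
        nu = np.maximum(0.0, nu + eta * (tau * p_t - x_t * values[t]))
    return allocated_value, spend, mu, nu
```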
The most informative experiment is a sweep over drift levels while
holding fixed n, T, and the learning rule. For each
drift family above, vary a single knob (jump size, number of switches,
or period P) to span a range
of VarT values. For
each setting, report the welfare gap
$$
\mathrm{Gap}(T)\;:=\;\mathrm{OPT}_T-\sum_{t=1}^T \mathrm{Lwel}_t.
$$
Our prediction is that, once T
is large enough that per-round regret terms are small relative to T, the gap grows approximately
linearly with VarT
(and the slope changes modestly with mechanism and constraint
tightness). A particularly transparent visualization is a scatter plot
of Gap(T) versus VarT across all seeds and
drift settings, together with a fitted line. To make the comparison
meaningful, we recommend ensuring that the total scale of values (e.g.,
$\frac{1}{T}\sum_t \max_i v_{i,t}$) is
held approximately constant across the sweep; otherwise, changes in
OPTT confound the
drift effect.
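A skeleton of the sweep itself, assuming run_market is a hypothetical wrapper around the simulation and benchmark routines sketched above that returns the generated values, the cumulative liquid welfare, and the OPT_T estimate for one configuration.

```python
import numpy as np

def drift_sweep(run_market, drift_levels, seeds):
    """Collect (Var_T, Gap(T)) pairs across drift settings and fit a line."""
    var_t, gap = [], []
    for level in drift_levels:
        for seed in seeds:
            values, lwel_total, opt_total = run_market(level, seed)
            var_t.append(np.abs(np.diff(values, axis=0)).max(axis=1).sum())
            gap.append(opt_total - lwel_total)
    var_t, gap = np.asarray(var_t), np.asarray(gap)
    slope, intercept = np.polyfit(var_t, gap, deg=1)   # fitted line for the scatter plot
    return var_t, gap, slope, intercept
```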
To test the advice channel, we generate platform predictions v̂i, t satisfying the multiplicative guarantee v̂i, t ∈ [γvi, t, vi, t] by construction; for example, take v̂i, t = vi, t ⋅ ζi, t where ζi, t ∼ Unif([γ, 1]), or a more structured model in which ζi, t depends on the context and bidder. The platform then uses v̂i, t to set a personalized reserve ri, t or a score boost si, t (depending on the auction format) in a way that is monotone in v̂i, t. Concretely, in an SPA-with-scores implementation one can rank bidders by a score bi, t + si, t and impose a reserve ri, t so that a bidder must clear bi, t ≥ ri, t to be eligible; the policy choice (how ri, t and si, t depend on v̂i, t) should be fixed across the sweep so that only γ changes.
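A minimal sketch of the advice channel: the multiplicative noise model is the one described above, while the boost/reserve mapping is an illustrative stand-in for whatever fixed, monotone policy is held constant across the sweep.

```python
import numpy as np

def make_advice(values, gamma, seed=0):
    """Synthetic predictions satisfying v_hat in [gamma * v, v] by construction."""
    rng = np.random.default_rng(seed)
    return values * rng.uniform(gamma, 1.0, size=values.shape)

def advice_policy(advice, boost_strength=0.5, reserve_fraction=0.5):
    """Fixed, monotone mapping from advice to per-bidder score boosts and reserves."""
    return boost_strength * advice, reserve_fraction * advice
```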
The recommended experiment is then a grid over γ ∈ {0.2, 0.4, 0.6, 0.8, 1.0} crossed with a representative set of drift levels. For each cell, report Eff(T) and also the distribution of winners by true value quantiles (to verify the intended mechanism-side correction: higher-v bidders should win more often as γ increases). The predicted pattern is monotone improvement in welfare with γ, and the gains should be largest when constraints are tight (because that is when advice-driven reserves/boosts most effectively prevent low-value/high-multiplier bidding from crowding out efficient allocations).
To interpret results, we recommend including three baseline variants: (i) i.i.d. values as a sanity check (the gap should then be largely explained by learning transients), (ii) γ = 0 (or, operationally, reserves/boosts turned off), and (iii) a pacing heuristic (e.g., fixed κi chosen to hit the budget in expectation) to highlight the role of dynamic tracking. Robustness checks that are often informative in practice include: heavy-tailed values (lognormal with large variance), correlated values across bidders (common shocks), and mis-specified advice where v̂i, t violates the multiplicative bound for a fraction of rounds (to assess how sensitive the gains are to departures from the γ model).
For reproducibility, we suggest reporting (n, T), the distributional family for values, drift parameters and realized VarT, budget and RoS distributions, stepsizes η, and the stopping/enforcement logic. Each figure should include (i) welfare ratio Eff(T), (ii) constraint satisfaction (fraction of runs with any violation and average slack), and (iii) a decomposition of the welfare gap into a component attributable to drift (variation sweep) versus a component attributable to learning noise (no-drift control). The objective of this section is not to ``fit’’ constants, but to verify the comparative statics emphasized by the theory: linear-in-VarT degradation in difficult drifting environments, and systematic welfare improvement as γ increases under advice-driven mechanism adjustments.
We studied repeated single-slot ad auctions in which the frictions that dominate practice—hard budgets, hard return-on-spend (RoS) targets, and non-stationary values—are front and center. Our organizing device was a constrained-smoothness inequality written for a liquid-welfare benchmark, combined with learning guarantees stated in terms of (dynamic) regret on a Lagrangian payoff sequence. This perspective does two things at once. First, it separates mechanism-side efficiency losses (captured by (α, β), and improved to α(γ) when the platform uses sufficiently accurate ML advice) from behavior-side learning losses (captured by ∑iDRi(T) and by the drift proxy VarT). Second, it yields a welfare bound that is meaningful for large markets and long horizons: the leading term scales with OPTT while the error terms are additive and interpretable as tracking and non-stationarity penalties.
At a conceptual level, the model illuminates a tradeoff that is often implicit in deployed autobidding systems. On the one hand, advertiser-side learning rules (e.g., primal–dual pacing) can enforce budgets and RoS targets nearly exactly, but doing so induces bid shading that can be allocatively harmful in any fixed auction format. On the other hand, platform-side adjustments (reserves, boosts, scoring) can partially undo this distortion, but only to the extent that they are informed by reliable value proxies. The constrained-smoothness constant α (and its advice-refined counterpart α(γ)) is a natural quantitative summary of how well a mechanism ``tolerates’’ constraint-driven bidding behavior while still producing liquid welfare.
We also view our results as a step toward a common language for three communities that typically reason separately: auction theory (which often assumes equilibrium and stationary types), online learning (which often abstracts away from mechanism design), and production ads systems (which internalize constraints and feedback limitations but rarely obtain end-to-end welfare guarantees). That said, several limitations remain, and they lead to a concrete open-problem agenda.
Our dynamic guarantee degrades additively with VarT (and with dynamic
regret). An important theoretical question is whether an O(VarT) penalty
is unavoidable for any algorithm–mechanism pair under our information
structure, or whether it is an artifact of the proof technique (in
particular, of the stability step that compares drifting deviation maps
over time). A crisp direction is to construct adversarial drift
instances in which, even under full-information feedback and even for
``benign’’ mechanisms such as VCG-like rules, the welfare gap
$$
\mathrm{OPT}_T-\sum_{t=1}^T \mathrm{Lwel}_t
$$
is provably Ω(VarT) for any
no-violation policy class that respects hard budgets and RoS
constraints. Such a lower bound would formalize a practical intuition:
when the identity of the efficient winner changes rapidly, any algorithm
that paces spend to satisfy horizon-wide constraints must either (i)
delay adaptation and lose welfare, or (ii) overshoot constraints and be
forced to stop bidding later. Conversely, it would be equally valuable
to identify settings in which VarT can be replaced by a
smaller, instance-dependent complexity (e.g., a predictability or margin
condition on values) without contradicting worst-case impossibility.
We emphasized that ML advice can improve efficiency via a better
smoothness constant, e.g.,
$$
\alpha(\gamma)=\frac{2}{1+\gamma}
$$
in VCG/SPA-like settings under a multiplicative signal model. Two kinds
of tightness questions remain. First, for a fixed mechanism family (say,
SPA with bidder-specific reserves and score boosts), what is the best achievable α(γ) as a function of γ under liquid-welfare objectives
and hard RoS constraints? Second, how do budgets and RoS targets
interact with the revenue term in smoothness (the role of β) once we restrict attention to
feasible (no-violation) bidding dynamics rather than arbitrary bid
profiles? Answering these requires instance constructions that witness
inefficiency and also careful accounting of how liquid welfare
upper-bounds revenue when constraints bind. From a design standpoint,
tight constants matter: a factor improvement in α is often the difference between
``near-optimal in aggregate’’ and ``systematically leaving value
on the table’’ in constrained, competitive markets.
Our framework treats the mechanism as fixed and evaluates welfare
given learning behavior. A natural next step is to invert the problem:
choose M (within an
implementable class) to minimize α subject to operational constraints
(simplicity, monotonicity, robustness, revenue floors), anticipating
that bidders run primal–dual pacing or related dynamics. This is not the
classic
``optimal auction’’ problem; the objective is not revenue, and the solution concept is not a static equilibrium. Rather, we want mechanisms whose constrained-smoothness inequality is as tight as possible uniformly over the Lagrangian multipliers that learning induces. One plausible design principle is to build the scoring/reserve rule to be ``dual-compatible’’:
if bidders shade bids as bi, t ≈ κi, tvi, t,
the platform should adjust scores to approximate ordering by vi, t,
not by the shaded proxy. Making this precise could lead to a
convex-analytic characterization of the optimal boost/reserve mapping in
terms of the distribution of multipliers (μi, νi)
induced by constraints.
We modeled advice by a clean multiplicative parameter γ and treated it as exogenous. In practice, the quality of value predictions depends on the auction rules (which affect data), on the bidding policies (which affect selection), and on platform measurement (which affects the value labels themselves). A more complete theory would endogenize γ as an equilibrium object of a data-generating process, potentially creating feedback loops: changing reserves changes who wins, which changes observed conversions, which changes the next period’s predictions. Even short of full endogeneity, a pressing question is robustness: if v̂i, t violates v̂i, t ∈ [γvi, t, vi, t] on a nontrivial fraction of rounds, can we replace γ with an ``effective’’ quality parameter that degrades gracefully with the violation rate? Such results would better match the policy question that practitioners face: not whether predictions are perfect, but whether they are reliable enough to justify mechanism-side intervention.
Our learning guarantees are cleanest under feedback that allows bidders to evaluate (or estimate accurately) Lagrangian payoffs. Many ad platforms provide only partial information (e.g., no counterfactual prices, delayed conversions, noisy attribution). Extending the welfare theorem to these environments requires bandit or semi-bandit dynamic regret bounds with constraints, which are technically subtle even without strategic interactions. A particularly relevant target is to quantify how welfare scales with the delay distribution of value feedback and with the variance of conversion measurement. The mechanism-side constants (α, β) would remain the same, but the behavior-side terms could degrade from $O(\sqrt{T})$ to worse rates; identifying the sharp dependence would help translate measurement investments into welfare predictions.
Single-slot auctions are a useful primitive, but many ad markets are multi-slot, multi-format, and paced across campaigns. Extending constrained smoothness and liquid welfare to settings with multiple simultaneous constraints (multiple budgets, multiple RoS-like targets, frequency caps) and with richer feasibility constraints (position auctions, knapsack-like constraints) is an open direction. Likewise, advertisers often interpolate between value maximization and profit maximization (our λi), or optimize non-linear objectives (risk, attribution-weighted conversions). A general theory should identify which classes of objectives preserve the Lagrangian reduction and which break it, and how the definition of liquid welfare should be modified to remain an economically meaningful benchmark.
In sum, our results suggest a practical agenda: (i) understand when drift truly limits efficiency and when it can be mitigated; (ii) quantify the best constants achievable by advice-aware mechanisms; and (iii) design auction rules that are not merely incentive-compatible in a static sense, but welfare-robust under the constrained dynamics that real autobidders implement. The broader implication is that efficiency in modern ad markets is jointly produced by mechanism design, prediction quality, and online learning—and improving any one component is most valuable when it is explicitly coordinated with the other two.