
Liquid Welfare without Convergence under (ε, δ)-Approximate Core Auctions

Table of Contents

  1. Introduction: why ‘core’ is too idealized for 2026 ad auctions; desiderata (safety, efficiency without convergence, auditable mechanism assumptions); contributions and diagnostic angle.
  2. Related Work: pacing equilibria vs learning dynamics; core auctions in ad markets; robust PoA/approximate smoothness; constrained bandits; links to Gaitonde et al. (2026) and Lucier et al. (2024).
  3. Model: repeated auction environment; budgets and optional ROI; allocation feasibility (downward-closed X); bandit feedback; definition of liquid welfare and ex-ante benchmark.
  4. (ε, δ)-Approximate Core Auctions: exact core inequality used in prior proofs; new approximate condition; discussion of how reserves/quality can induce violations; closed-form reductions for single-item auctions.
  5. Pacing Dynamics and Feasibility: generalized pacing algorithms; constraints satisfied ex post; epoch/threshold decomposition reused from prior work; what needs to hold for the welfare proof (event-feasibility).
  6. Main Welfare Theorem under Approximate Core: statement for budget-only; extension to budget+ROI; explicit dependence on (ε, δ) and horizon; proof roadmap with a single ‘core step’ replaced by approximate core.
  7. Empirical Diagnostics: log-based estimators/bounds for (ε, δ); restricted deviations (top-m allocations, single-item deviations); sampling coalitions; confidence intervals; when randomized bid perturbations are needed.
  8. Extensions: weak non-stationarity (bounded drift), approximate welfare-maximization in allocation, heterogeneous step-sizes; discussion of how each changes the error terms.
  9. Discussion and Policy Implications: interpreting measured core violation; design knobs (reserve/quality) vs welfare guarantees; auditability; limitations and open problems.

Content

1. Introduction: why ‘core’ is too idealized for 2026 ad auctions; desiderata (safety, efficiency without convergence, auditable mechanism assumptions); contributions and diagnostic angle.

Modern ad markets are increasingly mediated by autobidders: algorithmic agents that translate an advertiser’s value signals into bids while respecting hard spending constraints (budgets) and, in many applications, soft or hard performance constraints (e.g., minimum return on investment). By 2026, it is difficult to maintain the classical fiction that an advertiser strategically chooses a bid in a stationary equilibrium of a one-shot mechanism. Instead, bids are the byproduct of a feedback loop: the platform runs an auction each round; the autobidder observes its own allocation and payment; and it updates internal pacing parameters that throttle bids to manage long-run constraints. The relevant economic question is therefore not only whether the per-round mechanism has desirable equilibrium properties, but whether the overall system formed by a fixed auction coupled with learning-and-pacing dynamics is safe, efficient, and predictable over a finite horizon.

A prominent line of recent work has proposed a crisp answer under strong per-round assumptions: if each round’s outcome lies in the (revealed) core of the reported bids, then welfare guarantees can be obtained even without convergence of the learning dynamics. The core condition is appealing because it provides a direct charging argument: if a coalition of ``good’’ bidders is underserved relative to some benchmark allocation, then bidders outside the coalition must be paying enough to justify that loss. This style of analysis is particularly well suited to pacing, where individual bids are endogenous, evolve over time, and may not settle down to a fixed point. However, the exact core is also an idealization. In the auction formats deployed at scale—including single-item auctions with reserves, multi-slot variants with practical tie-breaking and pricing rules, and mechanisms with additional business constraints—the core can fail in small but systematic ways. Moreover, even when a mechanism is conceptually core-selecting, the system may deviate due to discretization, approximation, and incomplete enforcement of theoretical feasibility constraints.

Our starting premise is that insisting on exact core outcomes is too brittle as a modeling assumption for diagnostics and accountability in production ad auctions. In practice, platform designers and external auditors rarely have the leverage to certify exact core selection across all coalitions and all feasible deviations, and advertisers do not experience the system as a crisp cooperative solution concept. What practitioners can often do, however, is quantify how badly a mechanism violates a core-like inequality on the realized traffic. This suggests a more operational question: if per-round outcomes are only approximately core, how does this approximation propagate through the pacing-and-learning pipeline into end-to-end welfare outcomes?

We answer this question by replacing exact core with an (εcore, δcore)-approximate core condition, parameterizing two distinct failure modes. A multiplicative slack εcore captures proportional weakening of the blocking power of outside payments, while an additive leakage δcore captures per-round ``unchargeable’’ gaps that cannot be supported even after scaling. This decomposition is not merely technical. Multiplicative violations correspond naturally to systematic pricing distortions (e.g., under-collection relative to bid-weighted welfare), while additive violations capture absolute frictions such as reserves, rounding, or omitted constraints. In applied settings, separating these channels matters: a small multiplicative violation can be tolerable at high scale, whereas a small additive leakage may accumulate linearly over time and meaningfully depress welfare for long-horizon campaigns.

This paper is organized around three desiderata that we view as central for theory that aims to inform the design and monitoring of large-scale ad auctions.

Advertisers and regulators care first about constraint satisfaction: budgets should not be exceeded, and when ROI targets are enforced, spending should be aligned with realized value. The pacing algorithms used in practice are engineered precisely to deliver such safety. Our welfare analysis is built to be compatible with that engineering reality: we treat the pacing rule as an online procedure that never overbids and enforces the relevant constraints round-by-round (or, in the ROI case, by maintaining appropriate dual variables). This perspective shifts attention away from equilibrium existence and toward the robustness of welfare guarantees under the actual control loops deployed in autobidding systems.

Even if a market has a stable equilibrium in principle, the learning dynamics may not converge within the horizon of a campaign, and the environment may be nonstationary at the time scales relevant for advertisers. We therefore adopt the ``welfare without convergence’’ viewpoint developed in recent work: welfare should be high on average over the horizon, on the realized path of play generated by pacing updates, without requiring that multipliers settle to a fixed point. This is particularly important when there are many bidders with heterogeneous constraints, where convergence can be slow or sensitive to step sizes, feedback delays, and measurement noise. The question becomes: what properties of the per-round auction are sufficient to guarantee good welfare along the realized path of play?

A theoretical condition is only as useful as our ability to check it in situ. Exact core is computationally and informationally demanding: it quantifies over all coalitions and all feasible reallocations, which is infeasible to validate from logs in realistic markets. Our goal is to articulate assumptions that can be estimated, or at least upper bounded, empirically using the data that platforms routinely record (bids, allocations, payments). This leads naturally to approximate core as a diagnostic interface: rather than claiming a mechanism is core, we estimate how far it is from core on a targeted family of deviations relevant to the welfare proof, and translate that estimate into a predicted welfare loss.

Within this framing, our contributions are threefold.

Two caveats are worth emphasizing. First, approximate core is not a panacea: if violations are large or adversarially structured, no black-box welfare guarantee can be expected. Our point is instead that the margins by which core fails are often small and stable in mature systems, and those margins can be exploited analytically and empirically. Second, log-based estimation cannot hope to quantify over all coalitions and all feasible reallocations; any operational test necessarily restricts attention to a tractable set. We view this not as a weakness but as a reflection of how welfare proofs function: the analysis invokes only specific deviations (e.g., those induced by a benchmark allocation and a pacing-defined coalition), so validating approximate core on that restricted class is both sufficient for the guarantee and feasible to implement.

Ultimately, the model illuminates a tradeoff faced by ad platforms: one can design mechanisms that are closer to core but operationally complex, or accept small, measurable deviations due to reserves, simplifications, or policy constraints. Our results provide a quantitative bridge between these choices and their welfare implications under realistic pacing dynamics.


2. Related Work: pacing equilibria vs learning dynamics; core auctions in ad markets; robust PoA/approximate smoothness; constrained bandits; links to Gaitonde et al. (2026) and Lucier et al. (2024).

Our paper sits at the intersection of three literatures that have largely developed in parallel: (i) equilibrium models of pacing and budget management in ad auctions, (ii) ``welfare without convergence’’ analyses that treat pacing as an online learning dynamic, and (iii) cooperative and robust-inefficiency frameworks (core selection, smoothness, and their approximate variants) that convert per-round structural properties of an auction into multi-round welfare guarantees. We briefly situate our contribution in each thread and highlight how our use of (εcore, δcore)-approximate core can be interpreted as a bridge between mechanism design assumptions that are theoretically crisp and assumptions that are auditable in production systems.

A classical way to model budget-constrained bidders in repeated ad auctions is to posit a stationary environment and study a fixed point in which bidders scale their bids by bidder-specific multipliers so as to exactly exhaust budgets in expectation (or per unit time). This idea appears in early work on budget smoothing and throttling in sponsored search and display advertising, and has been refined in modern equilibrium notions such as multiplicative pacing equilibria (MPE) and their variants. The equilibrium viewpoint is attractive because it yields comparative statics (e.g., how budget changes affect effective bids, prices, and allocations) and sometimes supports existence and (approximate) efficiency claims under regularity conditions. It also aligns with how platforms often describe pacing operationally: each advertiser has a ``pacing multiplier’’ that acts as a knob translating internal value signals into auction bids.

However, equilibrium analyses typically require either convergence to a fixed point or an assumption that the market is in steady state. In practice, ad markets are transient at the time scale of many campaigns: budgets reset, targeting changes, and value signals drift. Moreover, even when a static equilibrium exists, the implemented control loop is a stochastic online procedure with delayed and bandit feedback. These observations motivate a complementary line of work that treats pacing not as an equilibrium selection device but as a learning-and-control dynamic whose welfare properties must be understood over a finite horizon.

Recent work has developed a ``welfare without convergence’’ approach for paced bidding: rather than proving that multipliers converge to an equilibrium, one proves that the realized sequence of allocations under a pacing algorithm achieves high welfare on average, even if bids never settle. The key methodological move is to combine online learning invariants (e.g., approximate satisfaction of budget constraints over epochs; no-overbidding; complementary slackness-style statements for dual variables) with a per-round charging inequality that converts welfare shortfalls of a subset of bidders into payments made by others. This is a natural fit for autobidding systems because it mirrors how such systems are engineered: the primary requirement is constraint safety, and performance is evaluated over the campaign horizon rather than at a putative steady state.

Our contribution is directly in the spirit of this line of work, but relaxes an assumption that is often too brittle for deployed mechanisms: prior analyses typically rely on each per-round auction outcome being in the (revealed) core of the reported bids, or satisfying an equivalent core-support inequality used in the charging step. We keep the same overall proof architecture but replace exact core with a quantitative, two-parameter approximation that degrades the charging argument smoothly.

Core-selecting auctions have a long history in combinatorial allocation problems, motivated by the desire to avoid outcomes that would be blocked by coalitions when VCG prices fall outside the core. In ad markets, core-like ideas arise because the platform is effectively selling a package of feasibility constraints (inventory, slots, targeting, pacing eligibility, and sometimes additional policy constraints) and because pricing rules must balance revenue, stability, and perceived fairness. Core-selecting or ``core-support’’ payment rules have been studied as alternatives to VCG in settings where VCG revenue can be low or where coalition deviations are a practical concern.

At the same time, ad-auction implementations often deviate from ideal core selection for reasons that are mundane but important: reserves, rounding, discrete bid grids, ad quality adjustments, and business constraints that are not fully captured by the modeled feasibility set. Our use of (εcore, δcore)-approximate core is designed to acknowledge these realities. The multiplicative term εcore captures systematic slack in the ability of outside payments to block deviations (e.g., under-collection relative to bid-weighted surplus), while the additive term δcore captures absolute frictions (e.g., reserve-induced gaps) that do not scale with bid magnitudes. This decomposition is also useful for diagnostics: it suggests distinct operational levers for platforms (pricing calibration versus rule-induced leakage).

A different but related tradition bounds welfare loss via price-of-anarchy arguments, using the smoothness framework and its variants. Smoothness has been extended to budgeted settings through the concept of liquid welfare and the associated liquid price of anarchy. These results are powerful because they require minimal assumptions on learning (often only no-regret) and can handle broad classes of games. However, the canonical smoothness approach typically treats each round as a game induced by the auction and focuses on equilibrium or no-regret outcomes in that game. In autobidding markets, the strategic object is not a fixed bid but a controller (pacing) that couples rounds through a hard budget constraint; the induced dynamics are therefore not well captured by per-round equilibrium concepts alone.

The welfare-without-convergence line can be viewed as complementary: it uses structural properties of the mechanism (core-like inequalities) plus properties of the controller (pacing invariants) to obtain finite-horizon welfare guarantees. Our approximate-core relaxation plays a role analogous to ``approximate smoothness’’ in robust PoA: it weakens a per-round inequality in a way that yields a controlled degradation of the final welfare bound. The difference is conceptual as well as technical: the inequality we relax is cooperative (a core blocking constraint) rather than noncooperative (a deviation inequality defining smoothness), and our ultimate benchmark is ex-ante liquid welfare rather than a Nash or coarse correlated equilibrium welfare.

From a learning perspective, pacing resembles a primal–dual method for online resource allocation: multipliers act as dual variables for budget (and ROI) constraints, and bid shading implements the primal action. There is a substantial literature on bandits with knapsacks and online convex optimization with long-term constraints, which provides regret guarantees and constraint violation bounds under bandit feedback. Our model deliberately adopts a simplified but operationally faithful abstraction: bidders observe only their own realized outcomes and update pacing multipliers via a gradient-like rule. The novelty is not the existence of such updates but their interaction with the mechanism’s per-round allocation and payment rules.

In particular, standard constrained-bandit analyses typically assume a fixed reward function chosen by the learner, whereas in auctions the mapping from actions (bids) to outcomes (allocations and payments) is endogenous and depends on other learners’ actions. Welfare-without-convergence results exploit additional structure—notably no-overbidding and core-like support inequalities—to control this strategic coupling. Our approximate-core condition can be interpreted as a way to make that structural requirement robust to implementation artifacts and policy constraints that are invisible to the learners but present in the platform.

Technically, our analysis is closest to the pacing welfare guarantees of Gaitonde et al. (budget-only) and Lucier et al. (budget plus ROI constraints). Both works share a common proof skeleton: decompose time into epochs where pacing multipliers are stable enough to relate realized spend to the per-round budget rate; define a ``good’’ set of bidders or rounds where bids are sufficiently informative; and then invoke a per-round core inequality to charge the coalition’s welfare deficit to payments of the complement. Our main conceptual step is to localize exactly where core is used and replace it with an approximate variant that introduces (i) a multiplicative loss in the charging rate and (ii) an additive per-round leakage. This yields an end-to-end welfare degradation model that is continuous in measured core violations, rather than an all-or-nothing dependence on exact core.

Finally, our emphasis on empirically estimable bounds connects to an emerging empirical and systems-oriented literature that seeks to validate auction properties from logged data. While full core verification is intractable in large markets, platforms routinely record the primitives needed to test core-like inequalities on restricted deviation classes (e.g., deviations corresponding to allocating to the highest bidders, or to structured alternative slot assignments in separable multi-slot environments). Our restricted-deviation perspective is motivated by how the welfare proofs actually function: they require only the deviations constructed by the analysis, not an arbitrary coalition’s best response. This stands in contrast to classical cooperative certification, but is closer to how mechanism monitoring and accountability are performed in practice.

In sum, existing work provides compelling welfare guarantees either by assuming strong per-round structural properties (exact core) or by applying robust PoA frameworks under equilibrium/no-regret conditions. We complement these approaches by showing that small, measurable departures from core can be propagated through realistic pacing dynamics to yield quantitative, finite-horizon welfare predictions.


3. Model: repeated auction environment; budgets and optional ROI; allocation feasibility (downward-closed X); bandit feedback; definition of liquid welfare and ex-ante benchmark.

We study a repeated auction environment meant to capture the control loop implemented by modern autobidders. Time is divided into T auction rounds indexed by t ∈ {1, …, T}. In each round t, a value profile vt = (v1, t, …, vn, t) ∈ [0, v̄]n is realized, where vk, t is bidder k’s value-per-unit (or an effective value signal) for the relevant impression opportunity in that round. We assume vt ∼ F i.i.d. across t, allowing arbitrary correlation across bidders within a round. The i.i.d. assumption is a modeling convenience that lets us state clean ex-ante benchmarks and apply standard concentration arguments; in practice values drift, but the quantities we use (epoch-average spend, and benchmark-average value) are often empirically stable over moderate horizons, and one can view F as the stationary distribution of a slowly mixing process.

Each round features a fixed auction mechanism M = (x, p). Given a bid profile bt = (b1, t, …, bn, t), the mechanism selects an allocation vector xt = x(bt) ∈ 𝒳 ⊆ [0, 1]n and a payment vector pt = p(bt) ∈ ℝ+n. The feasible set 𝒳 is assumed to be downward-closed: if x ∈ 𝒳 and 0 ≤ x′ ≤ x coordinate-wise, then x′ ∈ 𝒳. This captures the idea that the platform can always withhold allocation without violating feasibility. Downward-closed constraints subsume many ad-auction environments, including single-item allocation (∑kxk ≤ 1), multi-slot assignment with separable click rates, and more general packing constraints used to model inventory and targeting.

We interpret xk, t ∈ [0, 1] as the (possibly fractional) amount of ``service’’ delivered to bidder k in round t, for example a click probability or an impression fraction after eligibility filtering. The realized gross value in round t is then vk, txk, t, and the payment is pk, t. We do not assume quasilinear utility maximization at the per-round level; instead, we model bidders as running a controller to maximize delivered value subject to long-run constraints. Nonetheless, we will use the standard individual rationality (IR) property,
$$ 0\le p_{k}(b)\le b_{k}\,x_{k}(b)\qquad \forall k, $$
as a convenient envelope condition relating payments to declared values. In most common ad auctions, IR holds (possibly after appropriate normalization of bids and quality scores), and it plays an important role in translating ``declared’’ quantities expressed in bids into realized payments.

Each bidder k has a total budget Bk over the horizon. We write Bk = ρkT, where ρk is the corresponding per-round budget rate. The budget constraint is hard and must be satisfied ex post:
$$ \sum_{t=1}^T p_{k,t}\le B_k\qquad \forall k. $$
To accommodate performance constraints used in practice (e.g., return-on-ad-spend targets), we also allow an optional ROI constraint. Fix γk ≥ 1. The ROI requirement is that the bidder’s cumulative value must cover γk times cumulative spend:
$$ \gamma_k\sum_{t=1}^T p_{k,t}\le \sum_{t=1}^T v_{k,t}x_{k,t}\qquad \forall k. $$
When γk = 1, this reduces to ``value at least spend.’’ While ROI constraints can be expressed in many equivalent forms, this one is particularly convenient for primal–dual pacing dynamics: it yields a linear constraint in payments and delivered value and can be enforced by maintaining a nonnegative dual variable.

In each round t, bidder k observes its own value signal vk, t and submits a bid bk, t ≤ vk, t derived by shading value through a pacing multiplier. The basic (budget-only) form is
$$ b_{k,t}=\frac{v_{k,t}}{1+\mu_{k,t}}, $$
where μk, t ≥ 0 is a multiplier updated over time. In the budget+ROI setting, we allow two multipliers μk, tB and μk, tR and define μk, t = max {μk, tB, μk, tR, 0} so that
$$ b_{k,t}=\frac{v_{k,t}}{1+\mu_{k,t}}. $$
This form captures the operational reality that an autobidder typically has multiple active constraints and chooses the tightest one to determine how aggressively to bid.

The evolution of μk, t is governed by a gradient-like update using bandit feedback. Specifically, after round t bidder k observes (vk, t, bk, t, xk, t, pk, t) (and does not observe other bidders’ bids or values), and updates its multiplier(s) so as to push realized spend toward ρk and, when applicable, push realized ROI slack toward zero. One representative budget-only update is the projected rule
$$ \mu_{k,t+1} \;=\; \Pi_{[0,\mu_{\max}]}\bigl(\mu_{k,t}-\eta_k(\rho_k-p_{k,t})\bigr), $$
and an ROI-aware variant updates μB in the direction of pk, t − ρk and μR in the direction of γkpk, t − vk, txk, t. Our welfare statements do not rely on a particular parameterization of the update, but rather on three high-level invariants that are standard in the pacing literature: (i) no-overbidding (bk, t ≤ vk, t always), (ii) ex-post feasibility (budget, and ROI if present, are satisfied ex post over the realized trajectory), and (iii) no unnecessary pacing (when a constraint is slack, the algorithm does not maintain an artificially high multiplier that would systematically suppress bids). These conditions formalize the intended role of pacing as a dual controller rather than as a strategic instrument to manipulate per-round outcomes.
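To fix ideas, the following minimal sketch implements the projected budget-only rule above together with a two-multiplier ROI-aware variant; the step size, multiplier cap, and the placeholder auction outcome are illustrative assumptions, not a prescription for production pacing.

```python
import numpy as np

def pace_bid(v, mu):
    """Multiplicative pacing: shade the value signal by the active multiplier."""
    return v / (1.0 + mu)

def update_budget_multiplier(mu, payment, rho, eta, mu_max):
    """Projected subgradient step on the budget residual p_t - rho."""
    return float(np.clip(mu - eta * (rho - payment), 0.0, mu_max))

def update_roi_multiplier(mu_r, payment, value_obtained, gamma, eta, mu_max):
    """Projected step on the ROI residual gamma * p_t - v_t * x_t."""
    return float(np.clip(mu_r + eta * (gamma * payment - value_obtained), 0.0, mu_max))

# One round of a budget+ROI bidder (hypothetical parameter values).
rho, gamma, eta, mu_max = 0.5, 1.0, 0.05, 10.0
mu_b, mu_r = 0.2, 0.0
v_t = 1.3
b_t = pace_bid(v_t, max(mu_b, mu_r, 0.0))   # effective bid, never above v_t
x_t, p_t = 1.0, 0.9                          # outcome returned by the auction (placeholder)
mu_b = update_budget_multiplier(mu_b, p_t, rho, eta, mu_max)
mu_r = update_roi_multiplier(mu_r, p_t, v_t * x_t, gamma, eta, mu_max)
print(b_t, mu_b, mu_r)
```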

Given a profile of pacing algorithms (one per bidder) and the mechanism M, the interaction induces a (random) allocation sequence Φ = (x1, …, xT). We evaluate performance in expectation over v1, …, vT ∼ F (and over any internal randomness in the pacing updates, if present).

Budgets and ROI constraints prevent standard social welfare ∑k, tvk, txk, t from being an appropriate efficiency yardstick, because value that cannot be supported by feasible spend is not implementable. Following prior work, we therefore use liquid welfare, which caps each bidder’s contribution by the binding resource. In the budget-only setting we define bidder k’s liquid value as
$$ W_k(x,v)=\min\left\{B_k,\sum_{t=1}^T v_{k,t}x_{k,t}\right\}, $$
and total liquid welfare as $W(x,v)=\sum_{k=1}^n W_k(x,v)$. Under budget+ROI constraints, the effective cap changes because each unit of spend must be backed by γk units of value; a convenient normalization is
$$ W_k(x,v)=\min\left\{B_k,\frac{1}{\gamma_k}\sum_{t=1}^T v_{k,t}x_{k,t}\right\},\qquad W(x,v)=\sum_{k=1}^n W_k(x,v). $$
This objective has a clear operational interpretation: it measures the maximum ``constraint-feasible’’ value that could be certified for each bidder, up to the point where additional delivered value no longer relaxes the binding constraint. Liquid welfare is therefore the natural quantity to compare mechanisms and dynamics when budgets (and ROI) are first-class constraints rather than soft penalties.

To state welfare guarantees without assuming convergence of bids or multipliers, we compare the realized liquid welfare of Φ to an ex-ante benchmark that does not depend on the realized trajectory. Consider a single-round allocation rule y : [0, v̄]n → 𝒳 that maps a value profile v to a feasible allocation y(v). If we were to apply y independently in each round and ignore strategic and feasibility coupling across time, bidder k would obtain expected per-round value 𝔼v ∼ F[yk(v)vk]. The per-round contribution to liquid welfare is capped by the per-round budget rate ρk = Bk/T (and scaled by 1/γk under ROI). This yields the ex-ante liquid welfare
$$ W(y,F)=\sum_{k=1}^n T\cdot \min\left\{\rho_k,\;\frac{1}{\gamma_k}\,\mathbb E_{v\sim F}\!\left[y_k(v)\,v_k\right]\right\}, $$
where setting γk ≡ 1 recovers the budget-only definition. We then define the optimal benchmark as
$$ \mathrm{OPT}_{\text{ex-ante}}(F) \;=\; \sup_{y:[0,\bar v]^n\to\mathcal X} W(y,F). $$
This benchmark is deliberately coarse: it abstracts away from the intricate intertemporal coupling created by hard budgets and from the endogenous mapping from bids to outcomes. Precisely because it is coarse, it is stable and interpretable: it answers the question, ``if we could choose, in each round, a feasible allocation as a function of values, how much liquid welfare could we generate on average given the budget rates?’’ Our main results will show that realistic pacing dynamics, interacting with an auction that approximately supports outcomes against coalition deviations, achieve a constant fraction of this ex-ante optimum up to terms that vanish per round as T grows.
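For reference, here is a minimal sketch of the two welfare quantities just defined: realized (budget+ROI) liquid welfare computed from per-round logs, and a Monte Carlo estimate of the ex-ante benchmark W(y, F) for a supplied single-round rule. The greedy highest-value rule and the uniform value sampler are placeholders, not part of the model.

```python
import numpy as np

def liquid_welfare(values, allocs, budgets, gammas):
    """Realized liquid welfare: sum_k min{B_k, (1/gamma_k) sum_t v_{k,t} x_{k,t}}.
    values, allocs: arrays of shape (T, n); budgets, gammas: arrays of shape (n,)."""
    delivered = (values * allocs).sum(axis=0)
    return float(np.minimum(budgets, delivered / gammas).sum())

def ex_ante_liquid_welfare(rule, sample_values, rhos, gammas, T, n_samples=5000):
    """Monte Carlo estimate of W(y, F) = sum_k T * min{rho_k, (1/gamma_k) E[y_k(v) v_k]}."""
    per_round_value = np.zeros(len(rhos))
    for _ in range(n_samples):
        v = sample_values()          # one value profile v ~ F
        per_round_value += rule(v) * v
    per_round_value /= n_samples
    return float(T * np.minimum(rhos, per_round_value / gammas).sum())

def highest_value_rule(v):
    """Illustrative single-item rule: allocate everything to the highest value."""
    y = np.zeros_like(v)
    y[int(np.argmax(v))] = 1.0
    return y

rng = np.random.default_rng(0)
rhos, gammas = np.array([0.3, 0.2, 0.1]), np.ones(3)
print(ex_ante_liquid_welfare(highest_value_rule, lambda: rng.uniform(0.0, 1.0, size=3),
                             rhos, gammas, T=1000))
```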


4. (ε, δ)-Approximate Core Auctions: exact core inequality used in prior proofs; new approximate condition; discussion of how reserves/quality can induce violations; closed-form reductions for single-item auctions.

The welfare guarantees we ultimately seek are ``without convergence’’: we do not assume bids or multipliers settle to a fixed point, and we do not model bidders as best-responding each round. Instead, we follow a line of argument in which auction structure is used to support an intertemporal charging scheme. In the existing analyses for paced bidding (notably, the budget-only proof of Gaitonde et al. and the budget+ROI proof of Lucier et al.), the per-round mechanism is assumed to satisfy an exact core condition relative to the reported bids. Our first goal is conceptual: we isolate precisely how this core condition enters those proofs. Our second goal is practical: we relax exact core to an approximate condition that better matches modern ad auctions, and we show that the relaxation perturbs the welfare bound in a controlled (and measurable) way.

The core is a coalition-stability property: it says that no group of bidders can jointly reassign the allocation among themselves and be strictly better off, given what everyone else pays. In our repeated-auction setting, core is not invoked to argue equilibrium existence or incentive properties. Rather, it plays a single structural role: it upper-bounds the amount of declared welfare that a chosen coalition could gain in the current round by switching to a benchmark allocation y, in terms of the payments made by bidders outside that coalition in the same round. Informally, when the proof identifies a set Ut of ``currently important’’ bidders (e.g., those who are under-spending and hence should be bidding aggressively), it compares the realized allocation xt to a hypothetical comparator y(vt). If Ut loses declared welfare relative to y, core lets us ``charge’’ that loss to someone else: the bidders not in Ut must be paying enough to support the outcome. This is exactly the inequality one needs to transform a sequence of per-round deviations into a bound in terms of total revenue, and then (via IR and budget/ROI feasibility) into a bound in terms of liquid welfare.

Exact core is a strong assumption, and in many platform auctions it fails in systematic ways (we discuss several below). We therefore work with an approximate core inequality. Fix any bid profile b, any coalition S ⊆ [n], and any feasible alternative allocation y ∈ 𝒳. Let x(b) and p(b) be the mechanism outcome. Our condition requires
$$ \sum_{i\notin S} p_i(b) \;\ge\; (1-\varepsilon_{\mathrm{core}})\sum_{i\in S} b_i\,y_i \;-\; \sum_{i\in S} b_i\,x_i(b) \;-\; \delta_{\mathrm{core}}. $$
The right-hand side is the coalition’s (discounted) declared value under the deviation y, net of its declared value under the realized allocation x(b). The left-hand side is the total payment collected from the complement coalition. When εcore = δcore = 0, this reduces to the exact inequality used in prior proofs. When εcore > 0, we permit a shortfall: outside payments may cover only a (1 − εcore) fraction of the coalition’s declared gain. When δcore > 0, we permit an additive leakage that does not scale with bids.

This particular parameterization is not cosmetic: it matches how the charging argument degrades. Multiplicative slack weakens the constant-factor approximation (the 1/2 becomes (1 − εcore)/2), while additive slack accumulates linearly over time, contributing an O(δcoreT) term. Put differently, εcore governs the rate at which revenue can justify a welfare loss, whereas δcore captures per-round violations that cannot be amortized away.

Exact core typically presumes that the mechanism outcome can be justified purely by payments against feasible reallocations. In ad auctions, several ubiquitous design choices create systematic gaps.

Hard reserves (or other platform-side eligibility constraints) can prevent reallocations that would otherwise be feasible under 𝒳. In our inequality, the deviation y is any allocation in 𝒳, not necessarily one that would clear the reserve. If the mechanism withholds allocation when the top bid is below a reserve, then coalitions that include the highest bidder can propose y that allocates to themselves; outside payments may be zero, so the inequality fails unless δcore is large. One can partially reconcile this by explicitly incorporating a seller-side value for withholding (treating the reserve as a proxy for platform value), but absent that modeling choice reserves appear exactly as additive core violations.

Many platforms rank by ``effective bids’’ (bid times quality) and charge prices that depend on predicted click-through or conversion rates. If the bids bi in our model are already effective values, then core may hold approximately; but if logs contain a mixture of raw bids, quality multipliers, and per-impression pricing, the coalition comparisons induced by y can misalign with realized payments. This is another source of apparent (ε, δ) slack: the mechanism may be stable against deviations measured in one set of units, but our analysis charges welfare losses measured in another. In empirical diagnostics, this motivates a careful normalization so that bixi is commensurate with payments (consistent with IR).

Even in mechanisms that are ``core-like’’ in an idealized model, implementation details—bid rounding, reserve bucketing, delayed budget checks, partial participation, and forecasting errors in click value—create small but non-negligible deviations. These are naturally captured by a small δcore that scales with the unit of account (e.g., cents-per-click) rather than with bidder values.

The full approximate-core condition quantifies over all coalitions S and all y ∈ 𝒳, which can be computationally daunting. In single-item environments, however, it reduces to a small set of economically interpretable inequalities. Suppose 𝒳 = {x ∈ [0, 1]n : ∑ixi ≤ 1} and the mechanism always allocates the item to at most one bidder. Let i denote the winner under x(b), and let b(1) ≥ b(2) ≥ ⋯ denote the order statistics of bids.

Consider a losing bidder j ≠ i and the coalition S = {j} with deviation y that allocates the item to j. Since xj(b) = 0 and yj = 1, the approximate-core inequality becomes
$$ \sum_{i\ne j} p_i(b) \;\ge\; (1-\varepsilon_{\mathrm{core}})\,b_j \;-\; \delta_{\mathrm{core}}. $$
In a single-item auction only the winner pays, so this reads
$$ p_i(b) \;\ge\; (1-\varepsilon_{\mathrm{core}})\,b_j \;-\; \delta_{\mathrm{core}} \qquad \forall j\ne i. $$
Taking j to be the highest losing bidder yields the familiar ``winner pays (approximately) the second bid’’ implication:
$$ p_i(b) \;\ge\; (1-\varepsilon_{\mathrm{core}})\,b_{(2)} \;-\; \delta_{\mathrm{core}}. $$
Thus, in this canonical case, approximate core can be checked by a per-round inequality comparing the observed payment to the (scaled) second-highest bid, up to an additive tolerance. Conversely, if the mechanism sometimes allocates to a bidder who is not the highest bidder (e.g., due to quality or reserve effects), then the same inequality with j equal to the highest bidder forces an even larger payment to maintain core; violations will register as larger inferred δcore or εcore.

This closed-form reduction matters for two reasons. First, it clarifies interpretation: in single-item settings, ``core-ness’’ is essentially a lower bound on revenue relative to competitive pressure from losers. Second, it underpins tractable diagnostics: from auction logs one can directly compute the per-round shortfall ((1 − ε)b(2) − pi)+ and take maxima or high quantiles as operational estimates of δcore for a chosen ε.
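A minimal log-based version of this check, assuming each log row records the bid vector and the winner's payment (the field layout is a hypothetical schema): compute the per-round shortfall and report its maximum and a high quantile as candidate values of δ for a chosen ε.

```python
import numpy as np

def single_item_core_shortfall(bids, winner_payment, eps):
    """Per-round shortfall ((1 - eps) * b_(2) - p_winner)_+ for a single-item auction."""
    if len(bids) < 2:
        return 0.0                     # no competing bid, nothing to check
    second_bid = np.sort(bids)[-2]
    return max((1.0 - eps) * second_bid - winner_payment, 0.0)

def estimate_delta(rounds, eps, quantile=0.99):
    """rounds: iterable of (bids, winner_payment) pairs from the auction log."""
    shortfalls = np.array([single_item_core_shortfall(b, p, eps) for b, p in rounds])
    return {"max": float(shortfalls.max()), "q": float(np.quantile(shortfalls, quantile))}

# Tiny synthetic log; the second round under-collects relative to the second bid.
log = [(np.array([1.0, 0.8, 0.3]), 0.75), (np.array([0.6, 0.5]), 0.40)]
print(estimate_delta(log, eps=0.05))
```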

We emphasize two limitations. We do not claim that approximate core is necessary for good welfare, nor that real platforms explicitly optimize for it. Rather, we use approximate core as a structural condition that connects per-round auction design to long-run welfare under pacing. Moreover, while the definition quantifies over all coalitions and all feasible reallocations, our welfare proof will only instantiate it on a restricted set of coalitions and deviations arising from the charging argument. This is precisely what enables the log-based tests described later: we can measure the relevant core slack along the deviations that the proof actually uses, and translate that measured slack into a predicted welfare degradation.


5. Pacing Dynamics and Feasibility: generalized pacing algorithms; constraints satisfied ex post; epoch/threshold decomposition reused from prior work; what needs to hold for the welfare proof (event-feasibility).

Before we state the welfare theorem, we make explicit what we require from the bidders’ pacing dynamics. We deliberately do not require convergence of bids or multipliers, nor do we assume bidders best-respond. Instead, we view pacing as an online primal–dual method for enforcing intertemporal constraints in a repeated per-round mechanism. The welfare argument will only use a small set of robust consequences of this view: (i) bids are always feasible in the sense of no-overbidding, (ii) realized play satisfies budget (and ROI, if present) constraints ex post, and (iii) the pacing updates imply that, on any time interval in which a bidder remains ``similarly paced,’’ cumulative spend is close to the per-round budget rate up to a finite-horizon error.

We allow any algorithmic variant that produces a per-round effective bid of the form
$$ b_{k,t}\;=\;\frac{v_{k,t}}{1+\mu_{k,t}} \qquad\text{(budget-only)} $$
or, with ROI,
$$ b_{k,t}\;=\;\frac{v_{k,t}}{1+\max\{\mu^B_{k,t},\mu^R_{k,t},0\}}. $$
The multipliers are updated by a (possibly projected) subgradient step using only bandit feedback, e.g.,
$$ \mu_{k,t+1} \;=\; \Pi_{[0,\mu_{\max}]}\bigl(\mu_{k,t}-\eta_k(\rho_k-p_{k,t})\bigr) $$
or the two-multiplier update of Lucier et al. that separately tracks budget and ROI slacks. The particular parametrization is not important; what matters is that the update is monotone in the constraint residual: when a bidder spends above its target rate in a round, its multiplier weakly increases (making future bids smaller), and when it spends below target, its multiplier weakly decreases.

Our first requirement is that the algorithm never causes a bidder to violate its constraints (and the platform’s standard budget enforcement). Concretely, we assume:

For all k, t, bk, t ≤ vk, t. Under the multiplicative pacing forms above and μ ≥ 0, this holds mechanically.

The mechanism is IR with respect to bids: 0 ≤ pk, t ≤ bk, txk, t. Together with no-overbidding, this implies pk, t ≤ vk, txk, t, a key inequality when we later compare payments to value obtained.

Budgets Bk (and ROI constraints, when present) are satisfied on the realized path. For budgets, this can be ensured either by the platform’s hard stop (no participation after budget exhaustion) or by a pacing scheme that guarantees tpk, t ≤ Bk by construction; our analysis is agnostic. For ROI, the Lucier et al. two-multiplier scheme enforces γktpk, t ≤ ∑tvk, txk, t by treating the ROI residual γkpk, t − vk, txk, t as a constraint violation that triggers additional shading.

We refer to these properties collectively as event-feasibility: on every realized history, the bidder never overbids and never violates the relevant intertemporal constraints.

The welfare analysis in the next section will repeatedly condition on a ``good’’ event under which empirical sums behave regularly across many adaptively chosen time intervals. The proof never uses that the multipliers converge; it uses only that (a) multipliers move in the correct direction when a bidder is over- or under-spending, and (b) because values and payments are bounded, the cumulative deviation from the target rate cannot be large on intervals during which the multiplier stays in a fixed range.

To make this concrete, let us focus on a single bidder k and define the per-round budget residual
$$ g_{k,t} \;:=\; p_{k,t}-\rho_k. $$
A standard projected subgradient argument yields a pathwise inequality of the form
$$ \sum_{t=1}^T g_{k,t} \;\le\; \frac{\mu_{\max}}{\eta_k} \;+\; \eta_k \sum_{t=1}^T g_{k,t}^2, $$
up to constants depending on the projection and initialization. Since pk, t ∈ [0, v̄] by IR and no-overbidding (and xk, t ≤ 1), we have |gk, t| ≤ max {v̄, ρk} ≤ v̄, so ∑tgk, t2 ≤ Tv̄2. Thus, with an appropriate choice of step size $\eta_k\asymp 1/\sqrt{T}$, the budget mismatch ∑t(pk, t − ρk) is $O(\bar v\sqrt{T})$. This type of bound is the first finite-horizon ingredient: it implies that a bidder cannot systematically under-spend or over-spend relative to ρk without driving its multiplier to a boundary, which the projection prevents.
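A small simulation illustrates the claim that with ηk ≈ 1/√T the cumulative budget mismatch stays on the order of √T; the second-price environment, uniform values, and all numeric parameters below are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, rho, mu_max = 20_000, 0.10, 10.0
eta = 1.0 / np.sqrt(T)

mu, spend = 0.0, 0.0
for t in range(T):
    v = rng.uniform(0.0, 1.0)                 # bidder's value this round
    d = rng.uniform(0.0, 1.0)                 # highest competing bid (exogenous here)
    b = v / (1.0 + mu)                        # paced bid, never above v
    p = d if b > d else 0.0                   # second-price payment on a win
    spend += p
    mu = float(np.clip(mu - eta * (rho - p), 0.0, mu_max))

print(f"spend = {spend:.1f}, budget = {rho * T:.1f}, "
      f"mismatch / sqrt(T) = {(spend - rho * T) / np.sqrt(T):.2f}")
```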

The welfare proof does not charge mismatches only over the full horizon; it charges them over carefully chosen subsets of rounds in which the proof treats a bidder as ``aggressive’’ or ``conservative.’’ Following prior work, we therefore reuse an epoch decomposition based on multiplier scale. For budget-only pacing, define threshold bins indexed by j ∈ {0, 1, …, J},
$$ I_j \;:=\; [2^j, 2^{j+1}) \quad \text{for } j < J, \qquad I_J \;:=\; [2^J, \mu_{\max}], $$
where J = ⌈log2(1 + μmax)⌉. An epoch for bidder k is a maximal contiguous set of rounds on which μk, t ∈ Ij for some fixed j. Intuitively, within an epoch the bidder’s shading factor 1/(1 + μk, t) is within a constant factor, so bids remain comparable to values and the bidder’s ``intended’’ spend level is stable. The proof constructs coalitions Ut by selecting bidders whose multipliers are below a threshold (i.e., those who should be bidding close to value because they are under-spending), and the epoch partition is what lets us sum per-round inequalities over long stretches without losing control of constants.

In the ROI setting, one performs the same decomposition with μk, t = max {μk, tB, μk, tR, 0}, while separately tracking which multiplier is active. The important point is that each bidder has only O(log (1 + μmax)) multiplier scales, so union bounds over epochs remain tractable.
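Operationally, the epoch decomposition is a simple pass over a logged multiplier trajectory; the sketch below groups rounds into maximal contiguous blocks of constant dyadic scale, using the index j = ⌊log2(1 + μ)⌋, an assumed convention that covers μ = 0 and yields O(log(1 + μmax)) scales.

```python
import numpy as np

def epoch_decomposition(mu_path):
    """Split a multiplier trajectory into maximal contiguous epochs of constant
    dyadic scale, where the scale of round t is floor(log2(1 + mu_t))."""
    bins = np.floor(np.log2(1.0 + np.asarray(mu_path, dtype=float))).astype(int)
    epochs, start = [], 0
    for t in range(1, len(bins) + 1):
        if t == len(bins) or bins[t] != bins[start]:
            epochs.append({"rounds": (start, t - 1), "scale": int(bins[start])})
            start = t
    return epochs

# Example: a trajectory that drifts upward and then settles.
print(epoch_decomposition([0.0, 0.2, 0.6, 1.4, 2.7, 3.1, 3.0, 6.5]))
```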

We next state the qualitative form of the epoch property we need. Fix a bidder k and an epoch E ⊆ {1, …, T}. Because the update responds to pk, t − ρk, if μk, t stays within a bin Ij throughout E, then the cumulative drift of μk, t over E is limited. Telescoping the update implies
$$ \Bigl|\sum_{t\in E} (p_{k,t}-\rho_k)\Bigr| \;\le\; O\!\left(\frac{2^{j+1}}{\eta_k}\right) \;+\;\text{(projection slack)}. $$
With $\eta_k\asymp 1/\sqrt{T}$, this gives an $O(2^j\sqrt{T})$ deterministic control. In addition, since the realized process is stochastic through vt ∼ F, we also require a control ensuring that empirical sums over the (random, history-dependent) epochs do not fluctuate too much beyond their conditional expectations. Because each pk, t ∈ [0, v̄], standard martingale concentration yields that, with probability at least 1 − δ,
$$ \sup_{k}\;\sup_{E\in\mathcal E_k}\; \Bigl|\sum_{t\in E} (p_{k,t}-\mathbb E[p_{k,t}\mid \mathcal F_{t-1}])\Bigr| \;\le\; \tilde O\!\bigl(\bar v\sqrt{|E|}\bigr), $$
where ℰk is the set of epochs for bidder k and ℱt − 1 is the history up to round t − 1. The Õ(⋅) hides logarithmic factors in n, T, 1/δ, arising from a union bound over bidders and epochs. Combining this with the deterministic drift control yields the ``approximately matches ρk per round’’ property that prior pacing analyses use.

A final pacing property is needed to relate bids back to values in the welfare comparison. If a bidder is persistently under-spending, a well-designed pacing algorithm should not keep it heavily shaded; otherwise we could not argue that its declared welfare tracks its true welfare benchmark. We therefore assume a mild ``no unnecessary pacing’’ condition: whenever a bidder’s budget (and ROI, if present) is slack along the realized path, its multiplier is driven toward 0, so that bk, t ≈ vk, t. Formally, one can phrase this as: if over an interval the bidder’s realized spend is below ρk by more than the finite-horizon error terms, then μk, t = 0 on most rounds of that interval. This is exactly what the negative feedback in the update enforces, and it is what allows the proof to treat low-μ bidders as a coalition that is ``trying’’ to buy the benchmark allocation.

In the next section, we will plug approximate core into an otherwise standard charging proof. All of the pacing content above is used only to (i) ensure feasibility and permit the conversion from payments to obtained value (via IR and no-overbidding), and (ii) justify the epoch-based decomposition that controls how much a bidder can be under- or over-spending while maintaining a given multiplier scale. The resulting finite-horizon errors are precisely the $\tilde O(n\bar v\sqrt{T})$ terms that will appear in the main theorem. Crucially, none of these steps depends on core or on equilibrium reasoning; they are purely dynamical properties of the pacing update coupled with bounded per-round outcomes.


6. Main Welfare Theorem under Approximate Core: statement for budget-only; extension to budget+ROI; explicit dependence on (ε, δ) and horizon; proof roadmap with a single ‘core step’ replaced by approximate core.

The welfare guarantees in prior pacing analyses are often phrased as if they required a strong cooperative stability property of the per-round auction (exact core). Our starting point is that, for the repeated-auction welfare comparison, the core is not used as a global equilibrium concept; it is used as a per-round charging device that lets us ``pay for’’ a coalition’s missed declared welfare by charging it to the payments collected from the complement in that same round. This observation suggests a natural robustness question for practice: if the implemented auction is only approximately core (because of reserves, discretization, throttling, or other platform constraints), do we still obtain a meaningful welfare bound, and how does the loss scale with the measured violation?

We answer this by parameterizing the mechanism’s deviation from core by (εcore, δcore) and propagating these parameters through the charging argument. The resulting bound is intentionally explicit in horizon T: multiplicative slack εcore degrades the $\tfrac12$ approximation factor, while additive slack δcore creates a per-round ``leak’’ that accumulates linearly.

Fix budgets Bk = ρkT. Suppose bidders follow any generalized pacing dynamics satisfying event-feasibility and the epoch rate-matching properties used in the standard pacing analyses (as formalized in the preceding section). Suppose also that each per-round mechanism M = (x, p) is IR with respect to bids and satisfies (εcore, δcore)-approximate core: for every bid profile b, every coalition S ⊆ [n], and every feasible allocation y ∈ 𝒳,
$$ \sum_{i\notin S} p_i(b) \;\ge\; (1-\varepsilon_{\mathrm{core}})\sum_{i\in S} b_i\,y_i \;-\; \sum_{i\in S} b_i\,x_i(b) \;-\; \delta_{\mathrm{core}}. $$
Let Φ denote the induced allocation-sequence rule when all bidders pace. Then for any single-round allocation rule y : [0, v̄]n → 𝒳,
$$ \mathbb E\bigl[W(\Phi,v)\bigr] \;\ge\; \frac{1-\varepsilon_{\mathrm{core}}}{2}\,W(y,F) \;-\; O\!\bigl(n\,\delta_{\mathrm{core}}\,T\bigr) \;-\; \tilde O\!\bigl(n\,\bar v\,\sqrt{T}\bigr), $$
and hence the same bound holds with W(y, F) replaced by OPTex-ante(F) = supy W(y, F).

Two features of this bound are worth emphasizing. First, the approximation factor is $\tfrac12$ when εcore = 0, matching the exact-core results; approximate core simply multiplies the factor by (1 − εcore). Second, δcore enters additively and scales as T: this is unavoidable because an additive violation is, by definition, a per-round shortfall in the inequality that cannot be amortized away by averaging over time.

Now suppose each bidder k additionally has an ROI constraint with parameter γk ≥ 1, and bidders use a two-multiplier pacing rule that enforces both budget and ROI ex post (e.g., the Lucier et al. update). In this case we evaluate performance using the ROI-scaled liquid welfare,
$$ W_k(x,v)\;=\;\min\Bigl\{B_k,\;\frac{1}{\gamma_k}\sum_{t=1}^T v_{k,t}x_{k,t}\Bigr\}, \qquad W(x,v)\;=\;\sum_k W_k(x,v), $$
and the corresponding ex-ante benchmark W(y, F) (with the same γk scaling). Under the same per-round (εcore, δcore)-approximate core and IR assumptions, the same guarantee as above holds after replacing W(⋅) by this budget+ROI liquid welfare. Intuitively, ROI affects the definition of what counts as ``liquid’’ value, but it does not alter the role played by approximate core: we still need only a per-round inequality that converts a coalition’s missed surplus into payments made by others.

We next describe how approximate core enters the welfare argument. The proof is a modification of the standard charging template, and we isolate the only point where core is used.


Fix any benchmark rule y. Round-by-round, we compare the realized allocation xt to the benchmark allocation y(vt). The basic object is the benchmark-improvement term
$$ \sum_{k=1}^n b_{k,t}\bigl(y_k(v_t)-x_{k,t}\bigr), $$
which measures how much additional declared welfare (at the current bids) a coalition could obtain by switching the allocation from xt to y(vt). The pacing analysis constructs, for each round t, a coalition Ut of bidders who are ``under-paced’’ in that round (low multiplier, hence bidding close to value). For these bidders, declared welfare tracks true welfare up to controlled finite-horizon errors; for the complement Utc, we will charge losses to their payments.


Under exact core, one obtains an inequality of the form
$$ \sum_{k\in U_t} b_{k,t}\bigl(y_k(v_t)-x_{k,t}\bigr)_+ \;\le\; \sum_{k\notin U_t} p_{k,t}, $$
for the particular choice of Ut and y appearing in the proof. With (εcore, δcore)-approximate core, this becomes the approximate charging inequality
$$ (1-\varepsilon_{\mathrm{core}})\sum_{k\in U_t} b_{k,t}\,y_k(v_t) \;-\; \sum_{k\in U_t} b_{k,t}\,x_{k,t} \;\le\; \sum_{k\notin U_t} p_{k,t} \;+\; \delta_{\mathrm{core}}. $$
This is exactly where εcore and δcore enter, and it is the only place they enter: all other steps use only pacing invariants, boundedness, and concentration.


Summing the approximate charging inequality over t yields a bound on the total missed declared welfare of the ``good’’ bidders {Ut} in terms of total payments made by the ``bad’’ bidders {Utc}, plus an additive δcoreT/(1 − εcore) term. The pacing epoch decomposition is then used to show that bidders spend close to their target rate when their multipliers lie in the relevant ranges, and that bidders who are persistently under-spending must have small multipliers and hence bid close to value on most rounds. This is the step that contributes the finite-horizon term $\tilde O(n\bar v\sqrt T)$: it is the price of working without convergence and with adaptively chosen epochs.


IR and no-overbidding imply pk, t ≤ bk, txk, t ≤ vk, txk, t. Summing and applying the definition of liquid welfare yields ∑tpk, t ≤ Wk(x, v) up to truncation at Bk; aggregating across bidders gives ∑k, tpk, t ≤ W(x, v). This is the second accounting identity in the $\tfrac12$ argument: revenue cannot exceed liquid welfare.


The canonical structure is: (i) missed (benchmark) welfare for the under-paced coalition is at most (a constant multiple of) payments from the complement, and (ii) payments from the complement are at most realized liquid welfare. Approximate core weakens (i) by a factor 1/(1 − εcore) and adds δcore/(1 − εcore) per round; carrying these through yields the theorem. In particular, the multiplicative factor becomes (1 − εcore)/2 after rearrangement, and the additive leakage accumulates as O(nδcoreT) when we sum across the (random) coalitions and apply the union bounds already present in the epoch analysis.

The theorem provides an end-to-end translation from a mechanism property that can, in principle, be (εcore, δcore) into a welfare prediction under realistic nonconvergent pacing dynamics. At the same time, the bound is intentionally conservative. If δcore is not small on the scale of per-round values, the linear-in-T leakage dominates and the guarantee becomes vacuous; this reflects a real failure of the charging logic rather than an artifact of the proof. Likewise, if εcore is close to 1, the multiplicative guarantee collapses because the mechanism can no longer reliably block profitable coalition deviations. These caveats motivate the next section: we need operational diagnostics that estimate (εcore, δcore) in the specific deviation directions that the proof actually uses, rather than insisting on an intractable worst-case check over all coalitions and all feasible reallocations.


7. Empirical Diagnostics: log-based estimators/bounds for (ε, δ); restricted deviations (top-m allocations, single-item deviations); sampling coalitions; confidence intervals; when randomized bid perturbations are needed.

The welfare bound becomes actionable only if we can attach data-driven magnitudes to εcore and δcore. A direct worst-case verification of approximate core is infeasible in large ad auctions because the definition quantifies over all coalitions S ⊆ [n] and all feasible reallocations y ∈ 𝒳. Our key observation is that the repeated-auction welfare proof does not require this full quantification: it invokes the core inequality only for particular deviation directions y and particular coalitions S (constructed from bids and pacing states) that appear in the charging argument. This suggests a diagnostic pipeline that is deliberately restricted: we select a restricted family of coalitions 𝒮 and deviations 𝒴 sufficient for the analysis, and we estimate the smallest (ε, δ) for which the mechanism appears (ε, δ)-approximately core on 𝒮 × 𝒴 along the realized bid path.

Formally, for each observed round t with log (bt, xt, pt), coalition S ∈ 𝒮, and deviation y ∈ 𝒴, define the one-round statistic
$$ V_t(S,y;\varepsilon) \;:=\; \Bigl[(1-\varepsilon)\sum_{i\in S} b_{i,t}\,y_i \;-\; \sum_{i\in S} b_{i,t}\,x_{i,t} \;-\; \sum_{i\notin S} p_{i,t}\Bigr]_+, $$
where [⋅]+ = max {⋅, 0}. By construction, the mechanism is (ε, δ)-approximately core on (𝒮, 𝒴) in round t if and only if Vt(S, y; ε) ≤ δ for all (S, y) ∈ 𝒮 × 𝒴. This statistic uses only bids, allocations, and payments; crucially, it does not require counterfactual simulation of the mechanism under alternative bids, because the core inequality compares the realized outcome (xt, pt) to an arbitrary feasible allocation y evaluated at the same bid profile.
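The statistic can be evaluated directly from one log row; in the sketch below the coalition and deviation are passed in explicitly as plain arrays, which is an assumed interface rather than any platform's logging API.

```python
import numpy as np

def core_violation(bids, alloc, payments, coalition, deviation, eps):
    """V_t(S, y; eps) = [(1 - eps) sum_{i in S} b_i y_i - sum_{i in S} b_i x_i
                         - sum_{i not in S} p_i]_+  for one logged round."""
    bids, alloc, pay = map(lambda a: np.asarray(a, dtype=float), (bids, alloc, payments))
    in_s = np.zeros(len(bids), dtype=bool)
    in_s[list(coalition)] = True
    gain = (1.0 - eps) * np.sum(bids[in_s] * np.asarray(deviation, dtype=float)[in_s])
    realized = np.sum(bids[in_s] * alloc[in_s])
    outside_pay = np.sum(pay[~in_s])
    return max(gain - realized - outside_pay, 0.0)

# Example: the item went to bidder 0 at price 0.7; coalition {1} proposes taking it.
print(core_violation(bids=[1.0, 0.9, 0.4], alloc=[1, 0, 0], payments=[0.7, 0, 0],
                     coalition={1}, deviation=[0, 1, 0], eps=0.05))
# 0.95 * 0.9 - 0 - 0.7 = 0.155
```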

To obtain a computable diagnostic, we restrict to deviation maps y(⋅) that (i) are feasible under 𝒳, (ii) can be evaluated quickly from bids, and (iii) capture the benchmark reallocations used in charging arguments. Two canonical examples cover many ad-auction implementations.

In the single-item case, 𝒳 = {x ∈ {0, 1}n : ∑ixi ≤ 1}, and a natural deviation is to reallocate the item to the highest bidder in a chosen set. For any m ∈ {1, …, n}, define ytop-m(b) to allocate the item to the highest bidder among the top m bids (or, more generally, to the highest bidder within a coalition S). Because the welfare proof typically compares against an ex-ante benchmark that behaves like a welfare maximizer subject to budgets, the only reallocations that matter are those that move the item from a lower bid to a higher bid among a designated group of “good” bidders; ytop-m and its coalition-restricted variants cover precisely these moves.

Suppose 𝒳 is a polymatroid feasibility set induced by positions with click rates α1 ≥ α2 ≥ ⋯, and allocations are fractional click shares xi ∈ [0, 1] satisfying feasibility. A tractable deviation family is to compute, for each bid profile b, the allocation that maximizes ibixi within a restricted alternative ordering (e.g., permuting only a prefix of bidders or only the bidders in a candidate coalition). In polymatroid environments this can be implemented via greedy sorting by bid and assigning along the rank function; thus y(b) is computable from logs without simulating the auction’s payment rule. Empirically, we find that restricting to a small menu of such deviations (top-m reorderings, single-swap deviations, or “move a bidder up one slot”) already identifies the dominant sources of non-core behavior such as reserves and discretized pricing.
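One concrete instantiation of such a restricted deviation, sketched below under assumed inputs (a slot assignment per bidder and separable click rates): reassign the slots currently held by a candidate coalition so that the coalition's highest bid gets the best of those slots, leaving bidders outside the coalition untouched.

```python
import numpy as np

def coalition_swap_deviation(bids, slot_of_bidder, click_rates, coalition):
    """Return the deviated click-share allocation y in which the slots currently held
    by coalition members are reassigned within the coalition in decreasing bid order;
    bidders outside the coalition keep their slots."""
    bids = np.asarray(bids, dtype=float)
    y = np.zeros(len(bids))
    for k, s in enumerate(slot_of_bidder):
        if s is not None:
            y[k] = click_rates[s]
    members = sorted(coalition, key=lambda k: -bids[k])
    held = sorted((click_rates[slot_of_bidder[k]] for k in coalition
                   if slot_of_bidder[k] is not None), reverse=True)
    held += [0.0] * (len(members) - len(held))       # unallocated members receive nothing
    for k, alpha in zip(members, held):
        y[k] = alpha
    return y

# Example: slots with click rates (0.5, 0.3); bidder 2 outbids bidder 1 but holds no slot.
print(coalition_swap_deviation(bids=[1.0, 0.4, 0.8], slot_of_bidder=[0, 1, None],
                               click_rates=[0.5, 0.3], coalition={1, 2}))
# -> [0.5, 0.0, 0.3]: the 0.3 slot moves from bidder 1 to the higher-bidding bidder 2.
```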

The second obstacle is the exponential number of coalitions. Here again the proof structure guides a reduction. In the welfare analysis, the coalitions Ut are not arbitrary: they are defined by pacing states or by “bidding close to value” conditions (under-paced bidders). In practice, we can operationalize 𝒮 as the family of coalitions produced by the same construction applied to observed behavior—for example,
$$ U_t(\tau) \;=\; \{k : \mu_{k,t}\le \tau\}, $$
for a grid of thresholds τ, or, when multipliers are not logged, proxies such as “high bid-to-spend” bidders or “bidders below their target spend rate over the last window.” This produces a small, interpretable set of coalitions that correspond to the coalitions whose welfare losses we attempt to charge to revenue.

When we do want additional coverage beyond proof-generated coalitions, we can sample coalitions randomly. A simple approach is to sample L subsets S(1), …, S(L) each round (or each day) from a distribution over subsets (e.g., fixed size m, or independent inclusion with probability q), and to compute maxℓ, yVt(S(ℓ), y; ε). This does not certify worst-case approximate core, but it yields a controlled, distribution-dependent notion of “typical” violations that aligns with how the repeated process visits bid profiles.

Given finite sets 𝒮 and 𝒴, an operational estimator for δ at a fixed ε is the empirical maximum violation
$$ \hat\delta(\varepsilon) \;:=\; \max_{1\le t\le T}\;\max_{S\in\mathcal S,\,y\in\mathcal Y}\; V_t(S,y;\varepsilon). $$
This statistic is intentionally conservative: it treats the worst observed violation as binding. If we prefer robustness to rare outliers (e.g., instrumentation glitches, atypical throttling), we can instead report a high quantile, such as the (1 − α)-quantile over t of maxS, yVt(S, y; ε), and interpret the result as an “(ε, δ) guarantee on 1 − α fraction of rounds.” This relaxation often matches operational needs because welfare predictions are themselves average-case.

To estimate ε, we can sweep a grid $\varepsilon \in \{0, \varepsilon_1, \dots, \varepsilon_J\}$ and report the resulting curve δ̂(ε). This curve makes the multiplicative–additive tradeoff explicit: allowing a slightly larger multiplicative slack can dramatically shrink the additive leakage, especially in environments where violations scale roughly proportionally with the size of $\sum_{i\in S} b_{i,t}\,y_i$. When desired, we can invert the relationship to find the smallest ε such that δ̂(ε) ≤ δ0 for a target tolerance δ0 (chosen, for instance, as a small fraction of per-round revenue).
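A sketch of the sweep and its inversion, assuming a callable that recomputes the length-T vector of per-round maximal violations at each slack value:

```python
def delta_curve(round_max_violations_at, eps_grid):
    """Map eps -> empirical delta_hat(eps), where round_max_violations_at(eps)
    returns the per-round maximal violations at that multiplicative slack."""
    return {eps: float(max(round_max_violations_at(eps))) for eps in eps_grid}

def smallest_feasible_eps(curve, delta0):
    """Smallest eps on the grid with delta_hat(eps) <= delta0 (None if no eps works)."""
    ok = [eps for eps, d in sorted(curve.items()) if d <= delta0]
    return ok[0] if ok else None
```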

For statistical uncertainty, we emphasize that δ̂(ε) is a functional of the realized path and therefore depends on the stochastic process generating bids. Under i.i.d. rounds (or weaker mixing), and with $|V_t| \le n\bar v$ by bounded bids and feasibility, we can attach high-probability bounds to the violation for any fixed (S, y) using Bernstein/Hoeffding inequalities, and then union bound over |𝒮||𝒴|. Concretely, letting $\overline V(S,y;\varepsilon)=\frac1T\sum_t V_t(S,y;\varepsilon)$, with probability at least 1 − β,
$$ \max_{S\in\mathcal S,y\in\mathcal Y}\Bigl(\mathbb E[V_t(S,y;\varepsilon)]-\overline V(S,y;\varepsilon)\Bigr) \;\le\; O\!\left(n\bar v\sqrt{\frac{\log(|\mathcal S||\mathcal Y|/\beta)}{T}}\right). $$
This gives a principled confidence interval for an averaged (in-expectation) version of δ. While our welfare theorem is stated with a per-round δcore, in practice it is often more meaningful to calibrate the additive leakage to the average frequency and magnitude of violations along the operating regime induced by pacing; the above bound provides one route to do so. When rounds are weakly non-stationary, block bootstrap or martingale concentration can be substituted, and we return to such extensions in the next section.
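A sketch of the resulting confidence bound, assuming i.i.d. rounds, the range bound $|V_t| \le n\bar v$, and a dictionary of empirical mean violations keyed by tested (S, y) pair; the constants are the crude Hoeffding ones, not optimized.

```python
import numpy as np

def union_bound_slack(T, n_pairs, n_bidders, v_bar, beta):
    """Hoeffding deviation term for the mean violation of one tested (S, y) pair,
    union-bounded over n_pairs = |S_menu| * |Y_menu| pairs, using range n*v_bar."""
    return n_bidders * v_bar * np.sqrt(np.log(n_pairs / beta) / (2.0 * T))

def delta_upper_confidence(mean_violations, T, n_bidders, v_bar, beta):
    """Upper confidence bound on max_{S,y} E[V_t(S, y; eps)] from empirical means."""
    means = np.asarray(list(mean_violations.values()), dtype=float)
    return float(means.max()) + union_bound_slack(T, len(means), n_bidders, v_bar, beta)
```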

A fundamental limitation of any purely observational test is support: logs reveal violations only on bid profiles that actually occur. If the mechanism has sharp discontinuities (bid discretization, reserve cliffs, quota-based throttling), the worst violations may live in narrow regions that the natural pacing dynamics rarely explore. Moreover, some deviation families y that are theoretically relevant may depend on latent objects not in standard logs (e.g., value-based benchmarks or ROI-relevant transformations), in which case we can only approximate them with bid-based surrogates.

In such cases, small randomized bid perturbations can be valuable, not to “force” violations, but to ensure local coverage around the operating point and to reduce sensitivity to tie-breaking and discretization. Practically, this can be implemented as occasional controlled shading/noising of submitted bids within a narrow band that preserves no-overbidding and budget feasibility, combined with careful accounting so that the platform’s standard objectives are not materially affected. The diagnostic benefit is that it generates additional bid realizations near critical thresholds (reserves, minimum increments), allowing us to estimate whether the observed (ε, δ) are stable properties of the mechanism or artifacts of limited exploration.
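A minimal sketch of such a perturbation, assuming a downward-only multiplicative jitter is acceptable: shading down preserves no-overbidding whenever the submitted bids already respect it, while the effect on spend depends on the payment rule and should be checked separately.

```python
import numpy as np

def jitter_bids_down(bids, values, band=0.02, seed=0):
    """Occasional, downward-only multiplicative jitter within a narrow band,
    for local coverage around reserves and other thresholds."""
    rng = np.random.default_rng(seed)
    factor = 1.0 - band * rng.random(len(bids))   # factors in (1 - band, 1]
    return np.minimum(bids * factor, values)       # value cap as a safeguard
```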

The output of this section is thus not a single number, but a compact set of empirical objects: a deviation menu 𝒴, a coalition menu 𝒮, and an estimated tradeoff curve ε ↦ δ̂(ε) (with uncertainty bands when possible). Plugging these estimates into the welfare bound yields an end-to-end diagnostic of the form “measured core leakage ⇒ predicted welfare loss,” which can be tracked over mechanism changes (reserves, pricing tweaks, throttling policies) and over time. The next section discusses how this diagnostic must be adjusted when the environment is only weakly stationary, when the per-round allocation is only approximately welfare-maximizing, and when pacing updates are heterogeneous across bidders.


8. Extensions: weak non-stationarity (bounded drift), approximate welfare-maximization in allocation, heterogeneous step-sizes; discussion of how each changes the error terms.

Our baseline theorem is intentionally modular: the only place where the environment enters is through (i) concentration needed to relate realized averages to expectations under F, and (ii) the pacing invariants that prevent systematic under- or over-spending. This modularity lets us accommodate three common departures from the clean i.i.d. model that arise in practice: (a) slow drift in the value distribution, (b) allocation rules that are only approximately welfare-maximizing in bids (due to quality factors, discretization, or heuristics), and (c) heterogeneous (and possibly time-varying) pacing step-sizes across bidders. In each case, the structure of the welfare guarantee is unchanged—a constant-factor approximation up to additive error terms—but the additive terms acquire extra components that can be estimated from logs in the same spirit as δ̂(ε).

Seasonality, product launches, and shifting query mixes mean that the per-round distribution over value profiles is rarely exactly i.i.d. over long horizons. A tractable relaxation is to assume independent but non-identically distributed rounds: $v_t \sim F_t$ with $F_t$ slowly varying. To make ``slowly’’ operational, we can measure drift by a variation budget
$$ \mathsf{Var}(F_{1:T}) \;:=\; \sum_{t=1}^{T-1} d(F_t,F_{t+1}), $$
where d(⋅, ⋅) is any metric that upper-bounds changes in expectations of bounded functions (e.g., total variation distance). Because bids and feasible allocations are bounded, the relevant class of functions has range at most $n\bar v$, so drift translates directly into error in replacing time-averages by expectations.
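One plug-in estimate of the variation budget, assuming per-day samples of a bounded one-dimensional summary (for example, one bidder's values rescaled to $[0,\bar v]$) and histogram total variation as the metric d; this is a surrogate for drift of the full joint distribution, not an exact measurement.

```python
import numpy as np

def tv(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * float(np.abs(np.asarray(p, float) - np.asarray(q, float)).sum())

def variation_budget(daily_samples, bins=50, v_bar=1.0):
    """Plug-in estimate of sum_t d(F_t, F_{t+1}) from per-day samples of a
    bounded scalar summary, via total variation between day-level histograms."""
    hists = []
    for s in daily_samples:
        h, _ = np.histogram(np.asarray(s, float), bins=bins, range=(0.0, v_bar))
        hists.append(h / max(h.sum(), 1))
    return sum(tv(hists[i], hists[i + 1]) for i in range(len(hists) - 1))
```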

Under bounded drift, the right benchmark is no longer a single ex-ante optimum $\mathrm{OPT}^{\mathrm{ex\mbox{-}ante}}(F)$. Instead we compare to a benchmark that knows the sequence $(F_t)$ but must commit to a single-round rule y:
$$ \mathrm{OPT}^{\mathrm{ex\mbox{-}ante}}(F_{1:T}) \;:=\; \sup_{y:[0,\bar v]^n\to\mathcal X}\; \sum_{t=1}^T \sum_{k=1}^n \min\Bigl\{\rho_k,\tfrac{1}{\gamma_k}\,\mathbb E_{v\sim F_t}[y_k(v)\,v_k]\Bigr\}. $$
The proof then proceeds exactly as before on each round, but whenever we invoke concentration (or empirical-to-expectation comparisons) we incur an additional drift term. Concretely, one obtains a bound of the schematic form
$$ \mathbb E[W(\Phi,F_{1:T})] \;\ge\; \frac{1-\varepsilon_{\mathrm{core}}}{2}\, \mathrm{OPT}^{\mathrm{ex\mbox{-}ante}}(F_{1:T}) \;-\; \tilde O(n\bar v\sqrt T) \;-\; O(n\delta_{\mathrm{core}}T) \;-\; O\!\bigl(n\bar v\,T\,\mathsf{Var}(F_{1:T})\bigr), $$
where the last term captures the price of non-stationarity. The dependence on $T\,\mathsf{Var}(F_{1:T})$ is pessimistic but interpretable: if the distribution moves substantially each day, then any single fixed rule y is necessarily miscalibrated, and our repeated-auction argument cannot ``average out’’ the mismatch. When drift is genuinely mild (e.g., $\mathsf{Var}(F_{1:T}) = o(1)$), the additional error is lower order. Methodologically, we also note that the same drift control can be applied to the diagnostic statistics $V_t(S, y; \varepsilon)$: if violations are stable over time, then the measured δ̂(ε) remains meaningful despite changing $F_t$.

Many ad auction implementations deliberately deviate from exact bid-welfare maximization: ranking may include predicted quality, reserve-induced discontinuities, or discrete price grids; feasibility may be approximate due to latency constraints; and multi-stage pipelines may perform filtering before the final allocation. These features can be modeled by assuming that the allocation x(⋅) is only approximately optimal for the declared objective $\sum_i b_i x_i$. A convenient condition is that for all bid profiles b,
$$ \sum_{i=1}^n b_i\,x_i(b) \;\ge\; (1-\varepsilon_{\mathrm{alloc}})\,\max_{y\in\mathcal X}\sum_{i=1}^n b_i\,y_i \;-\; \delta_{\mathrm{alloc}}, $$
for some εalloc ∈ [0, 1) and δalloc ≥ 0. This is a purely bid-based guarantee and therefore testable from logs in the same way as our restricted-deviation core tests: the right-hand side depends only on bids and feasibility, not on payments.
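This condition can be checked round-by-round from logs. A sketch under position-style feasibility (a single item corresponds to a single click rate of 1); the positive part returned is the per-round additive leakage needed to make the displayed inequality hold at the chosen εalloc.

```python
import numpy as np

def alloc_leakage(bids, x, alphas, eps_alloc):
    """Per-round additive leakage implied by approximate bid-welfare maximization:
    positive part of (1 - eps_alloc) * max_y sum_i b_i y_i  -  sum_i b_i x_i,
    with the optimum computed by the bid-greedy assignment to click rates."""
    b = np.asarray(bids, float)
    a = np.sort(np.asarray(alphas, float))[::-1]
    k = min(len(b), len(a))
    opt = float(np.sort(b)[::-1][:k] @ a[:k])
    return max((1.0 - eps_alloc) * opt - float(b @ np.asarray(x, float)), 0.0)
```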

How does approximate allocation interact with approximate core? The key observation is that our welfare analysis ultimately needs to upper-bound bid-weighted allocation gaps of the form $\sum_{i\in U} b_{i,t}\,(y_i - x_{i,t})_+$ for particular U and y. Approximate core bounds these gaps by (outside) payments plus δcore, while approximate allocation bounds them by the intrinsic inefficiency of x relative to the best feasible reallocation on bids. Thus we can take the tighter of the two bounds round-by-round, yielding a guarantee that degrades gracefully when one of the properties is weaker.

At the level of aggregate rates, a conservative way to record this is to treat εalloc as an additional multiplicative slack (reducing the constant approximation factor) and δalloc as an additional per-round additive leakage. The resulting welfare statement takes the same shape as before but with
$$ \frac{1-\varepsilon_{\mathrm{core}}}{2} \quad\leadsto\quad \frac{(1-\varepsilon_{\mathrm{core}})(1-\varepsilon_{\mathrm{alloc}})}{2}, \qquad O(n\delta_{\mathrm{core}}T) \quad\leadsto\quad O\bigl(n(\delta_{\mathrm{core}}+\delta_{\mathrm{alloc}})T\bigr). $$
This formulation is intentionally ``accounting-like’’: it separates welfare loss attributable to coalition-blocking failures (core leakage) from welfare loss attributable to allocation heuristics (approximate maximization). In practice, we often expect δalloc to be concentrated around known discontinuities (e.g., reserve cliffs), which makes it useful as a mechanism-debugging metric even when it is small in aggregate.

Our baseline $\tilde O(n\bar v\sqrt T)$ term hides the dependence on learning rates and on the stability of the pacing multipliers. In operational systems, bidders choose different step-sizes (or the platform imposes different smoothing) due to heterogeneous volatility and risk tolerance. We can accommodate this by tracking error terms bidder-by-bidder.

Consider, for simplicity, budget-only pacing with projected subgradient updates
$$ \mu_{k,t+1} \;=\; \Pi_{[0,\mu_{\max}]}\bigl(\mu_{k,t} - \eta_k(\rho_k - p_{k,t})\bigr), $$
with bidder-specific $\eta_k > 0$. Standard online convex optimization bounds imply that each bidder satisfies a spend-tracking inequality of the form
$$ \sum_{t=1}^T (\rho_k-p_{k,t}) \;\le\; O\!\left(\frac{\mu_{\max}^2}{\eta_k}+\eta_k T \bar v^2\right)^{1/2}, $$
up to logarithmic factors when we sharpen this to a high-probability statement. When inserted into the epoch decomposition used in the welfare proof, this implies that the global finite-horizon term scales as
$$ \tilde O\!\left(\sum_{k=1}^n \bar v\sqrt{\frac{T}{\eta_k}}\right) \quad\text{or more conservatively}\quad \tilde O\!\left(\sum_{k=1}^n \left(\frac{\mu_{\max}^2}{\eta_k}+\eta_k T \bar v^2\right)^{1/2}\right), $$
rather than $\tilde O(n\bar v\sqrt T)$. The interpretation matches practice: aggressive updates (large ηk) reduce bias from slow adaptation but increase variance and boundary-crossing events; conservative updates do the reverse. If $\eta_k\propto 1/\sqrt T$ uniformly, we recover the baseline rate.
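For reference, the update itself is a one-line projected step; the sketch below shows it with bidder-specific step sizes, and the commented loop indicates where it sits in a simulation. The value-to-bid shading map and the auction call are placeholders, not the paper's specification.

```python
import numpy as np

def pacing_step(mu, eta, rho, spend, mu_max):
    """One projected-subgradient pacing update per bidder, with bidder-specific
    step sizes: mu <- clip(mu - eta * (rho - spend), 0, mu_max)."""
    return np.clip(mu - eta * (rho - spend), 0.0, mu_max)

# Illustrative loop (run_auction and the shading map are placeholders):
# for t in range(T):
#     bids = values[t] / (1.0 + mu)        # one common multiplicative shading map
#     x, p = run_auction(bids)             # platform's allocation and payments
#     mu = pacing_step(mu, eta, rho, p, mu_max)
```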

Time-varying step-sizes $\eta_{k,t}$ can be handled similarly by replacing $\eta_k T$ with $\sum_{t=1}^T \eta_{k,t}$ and $\mu_{\max}^2/\eta_k$ with $\mu_{\max}^2/\eta_{k,1}$ (or the appropriate telescoping potential term). This extension is particularly relevant when pacing is tuned adaptively (e.g., smaller steps during stable periods and larger steps during shocks). The main limitation is conceptual rather than technical: heterogeneous learning dynamics mean that the ``proof-aligned’’ coalitions $U_t$ may evolve differently across bidders, so empirical diagnostics should condition coalition construction on bidder-specific smoothing windows to avoid mistaking slow adaptation for systematic welfare loss.

Taken together, these extensions reinforce the same overarching message: the constant-factor welfare guarantee is robust, but the additive terms provide the right language for engineering tradeoffs. Drift, heuristic allocation, and heterogeneous learning do not invalidate the framework; they simply move mass from the approximation factor into measurable, auditable leakage terms that we can attribute to specific system components.


9. Discussion and Policy Implications: interpreting measured core violation; design knobs (reserve/quality) vs welfare guarantees; auditability; limitations and open problems.

Our results suggest a practical way to interpret ``measured core violation’’ as an engineering-relevant notion of welfare risk. The core inequalities are not merely axioms of cooperative stability; in the pacing-based welfare proofs, they are the single device that converts a bidder-side welfare shortfall (relative to a benchmark allocation) into a statement about who must have paid for that shortfall in the same round. When core holds exactly, this charging step is tight; when core holds only approximately, the charging step leaks by precisely the measured parameters $(\varepsilon_{\mathrm{core}},\delta_{\mathrm{core}})$. The conceptual takeaway is that $\varepsilon_{\mathrm{core}}$ and $\delta_{\mathrm{core}}$ are not abstract ``game-theoretic error bars’’: they are interpretable summaries of how often, and by how much, the mechanism produces outcomes that cannot be supported by contemporaneous outside payments.

This interpretation is especially useful because it turns core-like structure into an auditable diagnostic. In many auction deployments, it is unrealistic to assert that a complex allocation and pricing stack is exactly core, even when the underlying idealized mechanism would be. Yet, from logs, we can still compute a worst-case violation over the deviation family that matters for the welfare argument. Operationally, this yields a simple workflow: (i) select the deviation class 𝒴 and coalitions 𝒮 dictated by the proof skeleton (e.g., ``high-bid’’ coalitions and a benchmark allocation), (ii) compute the empirical violation statistic round-by-round, and (iii) aggregate to obtain a high-probability upper bound on δcore at a chosen εcore. If the resulting δ̂(ε) is small, then the same analysis that underwrites the welfare theorem predicts only a small additive welfare loss attributable to non-core behavior. If it is large, we obtain a concrete target for mechanism debugging: find and explain which design elements create the observed leakage.

Design choices that are natural in product terms often map cleanly onto these two parameters. Reserves are the canonical example. A strict reserve can create rounds in which a coalition of bidders could, in principle, reallocate the item among themselves at bids above the reserve and still increase their declared surplus, but the mechanism refuses to allocate (or extracts payment in a way that breaks the relevant inequality). In our language, such reserve-induced non-core effects appear as an additive gap δcore unless the reserve is explicitly incorporated into the benchmark as a seller value or outside option. This distinction matters for policy: a reserve that is justified as a platform-side value (e.g., to protect user experience or to reflect an opportunity cost of showing an ad) can be made compatible with the welfare interpretation by augmenting the objective, whereas a reserve that is merely a pricing heuristic will generally register as leakage. The same logic applies to quality adjustments. Quality scores can be welfare-enhancing when they are proxies for true value (e.g., expected clicks) and are reflected consistently in bids and payments; but if quality is used as a discretionary ranking knob without a corresponding pricing interpretation, the measured violations can increase even if short-run revenue improves. In this sense, δ̂(ε) offers a unifying ``exchange rate’’ between product choices and welfare guarantees: any design knob that makes outcomes harder to support by payments tends to move mass from the approximation factor into the leakage term.

From a platform-governance perspective, the main policy implication is that a mechanism can be evaluated not only by point estimates of revenue and click metrics, but also by a falsifiable stability-and-welfare diagnostic. In regulated or high-stakes settings, it is often infeasible to disclose full mechanism details, and it is equally infeasible for an external auditor to solve for equilibria in a learning, budget-constrained environment. Our framework points to a middle ground: audit the inequality that the welfare proof needs. Concretely, an auditor with access to logs (bt, xt, pt) and an agreed-upon deviation family can compute an empirical upper bound on the mechanism’s effective non-core behavior, and therefore an interpretable upper bound on the welfare degradation attributable to those deviations. This is not a substitute for full transparency, but it does provide a mechanism-agnostic test that is harder to game than purely outcome-based reporting, since it checks a structural relationship between allocations and payments.

At the same time, we emphasize two limitations that are intrinsic rather than merely technical. First, any tractable measurement must restrict the set of coalitions and deviations. Full core quantification is combinatorially intractable, and our diagnostic inherits this fact. The justification for restriction is that the welfare proofs only ever invoke certain coalitions $U_t$ and certain deviations y. This makes the diagnostic targeted rather than universal: a small measured violation over the tested family certifies the welfare guarantee that relies on that family, but does not certify cooperative stability in the classical sense. Second, the diagnostic is only as meaningful as the mapping from the implemented system to the logged primitives. In many ad markets, bids are the output of internal pacing and shading logic; payments may reflect multi-stage corrections; and the effective feasibility set 𝒳 may depend on latency and policy filters. If the logs omit the relevant constraints, then an apparent violation may reflect missing state rather than a true ``blocking gap.’’ Thus, for auditability, a key practical requirement is to define an explicit accounting boundary: which filters and constraints are treated as part of feasibility (and therefore part of the benchmark), and which are treated as mechanism-induced distortions (and therefore contribute to leakage).

A further conceptual caveat is that our welfare guarantees are stated relative to liquid welfare benchmarks that treat budgets (and optionally ROI) as hard constraints. This is appropriate for many advertisers, but it is not a complete normative theory of welfare in ad markets. Budgets can be endogenous, driven by internal finance rules, and ROI targets can encode risk management rather than true preferences. Moreover, the liquid welfare objective does not directly internalize user-side or publisher-side externalities, nor does it capture distributional goals (e.g., supporting small advertisers) except insofar as these are represented by modified feasibility or value models. For policy discussions, the right reading is therefore conditional: given that the platform has committed to this liquid-welfare benchmark as a proxy for advertiser value subject to constraints, measured core violation provides a disciplined way to bound welfare loss within that proxy.

These limitations point naturally to open problems. On the measurement side, an important direction is to characterize minimal deviation families 𝒴 that are simultaneously (i) sufficient for welfare charging arguments, and (ii) rich enough to detect the main sources of non-core behavior in modern auction stacks (reserves, throttling, quality filters, and multi-stage pipelines). On the theory side, it would be valuable to replace per-round approximate core with conditions tailored to repeated interaction, such as a time-averaged or dynamic core notion that accounts for the fact that budgets couple rounds and that bidders learn. Another open question is robustness to strategic bid shading in the presence of pacing: our analysis treats effective bids as the relevant declared values, but in some systems bidders can manipulate the mapping from values to bids through their own learning layers. Finally, there is a mechanism-design question that is directly suggested by our accounting perspective: can we design allocation and pricing rules that explicitly minimize an empirical estimate of δ̂(ε) (subject to revenue or policy constraints), thereby turning ``approximate core’’ from a post hoc diagnostic into a design objective?

In summary, the policy message is not that platforms should mechanically optimize for core. Rather, the message is that approximate core violation provides a concrete, measurable intermediate quantity that links mechanism design choices to welfare guarantees in budget-constrained, learning-driven markets. When a platform tightens a reserve, changes a quality score, or introduces a heuristic allocation stage, it is also implicitly choosing a point on a tradeoff frontier: revenue and policy objectives on one axis, and core-consistent welfare certification on the other. Our framework supplies the language, and the logging-based estimators supply the measurement, to make that frontier explicit.