By 2026, the modal large advertiser no longer ``sets bids'' in any meaningful human sense. Instead, the advertiser configures an autobidder: a piece of software that ingests noisy, fast-moving signals about marginal value, converts these into an effective bid for each auction, and adapts continuously as delivery accumulates. What has changed in practice is not merely the scale of automation, but the multiplicity of objectives and guardrails that the advertiser expects the system to respect. A typical campaign is tasked with maximizing one key performance indicator (KPI) while simultaneously honoring several accounting and attribution constraints: total spend must not exceed budget; return-on-investment (ROI) or cost-per-action (CPA) must remain within a specified range; and, increasingly, there are additional internal constraints that operationalize business or compliance requirements (for example, profitability by product line, incremental lift thresholds, or channel-specific efficiency). Our starting point is that the platform continues to run a per-round auction, but the economically relevant behavior is shaped by the interaction of many adaptive constraint-handling algorithms rather than by a single static bidding rule.
This paper takes that multi-constraint reality seriously. We study a repeated auction environment in which each advertiser has an objective value stream (the KPI being maximized) and, for each of several ROI/CPA-type constraints, an associated ``constraint value stream'' that measures the numerator of the constraint. These streams may be correlated within a round and are only observed through realized outcomes, reflecting the common situation in which advertisers learn about value and constraint satisfaction from their own deliveries and bills. The substantive question is then not whether the platform can compute an equilibrium for a stylized game, but whether the combination of (i) a reasonable auction rule and (ii) lightweight, implementable bidder-side updates can deliver allocations that are both feasible (no budget or ROI violations) and efficient (high aggregate value) over long horizons.
A natural response from classical auction theory is to ask for equilibrium predictions: what happens when each bidder best-responds, and do the resulting bids clear? In the autobidding setting, however, the premise required for equilibrium analysis is typically absent. Advertisers are not choosing bids as static actions; they are running online learning systems that update in response to feedback, and those updates operate on different time scales across advertisers. Even if the environment were stationary, the induced game is not: each agent’s algorithm changes its mapping from observed data to bids over time. In addition, the informational primitives that would support equilibrium reasoning are not present. An advertiser may understand its own attribution model and budget, but it seldom has a stable, accurate model of the joint distribution of other bidders’ values and constraints, and it certainly does not observe others’ constraint slacks. When values and conversion rates drift with seasonality, creative refreshes, or macroeconomic shocks, stationarity is at best an approximation, and convergence guarantees become a tenuous target.
In this context, insisting on equilibrium can be counterproductive for both theory and practice. On the theory side, equilibrium results often require restrictive assumptions on information, continuity, and best responses that do not map cleanly onto the bandit-feedback regime faced by bidders. On the practice side, platforms and advertisers need guarantees that hold under continual adaptation: budgets and ROI constraints are not aspirational long-run goals but hard constraints that finance teams expect to be satisfied on the realized history. It is therefore more informative to ask for guarantees that hold without convergence: can we design bidder-side updates that keep constraints satisfied ex post, and can we still say something nontrivial about welfare when everyone is adapting simultaneously?
Our answer is affirmative. We analyze a multi-multiplier pacing approach that generalizes the familiar idea of ``pacing'' a bid by a scalar to respect a budget. Here each advertiser maintains a collection of multipliers—one for the budget and one for each ROI/CPA-type constraint—and submits an effective bid that scales down the objective value by the (largest) active multiplier. Intuitively, each multiplier tracks the cumulative slack of its associated constraint: when spend exceeds a per-round target, the budget multiplier increases; when an ROI constraint is tight, its multiplier increases. Crucially, all multipliers update each round using only the advertiser's realized allocation and payment, so the algorithm is implementable under bandit feedback. The main conceptual point is that we do not require the multipliers to converge, nor do we require the profile of bids to settle into a steady state. Instead, we exploit monotonicity properties of allocations and payments in the effective bid to show that the slack-tracking dynamics can enforce feasibility while still allowing the auction to allocate efficiently in aggregate.
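One lightweight instantiation of these slack-tracking dynamics can be sketched as follows. This is an illustrative sketch under our own simplifying choices (bid shading via $b = v^{(0)}/(1+\max_j \mu_j)$, a fixed step size, and a simple projected update), not the paper's exact algorithm; all function and variable names are ours.

```python
def pace_round(v0, v_constr, rho, gammas, mults, eta, run_auction):
    """One round of multi-multiplier pacing (illustrative sketch).

    mults[0] paces the budget; mults[j] (j >= 1) paces ROI/CPA constraint j.
    Each multiplier tracks the cumulative slack of its constraint: it rises
    when the constraint was violated this round and falls (toward 0) otherwise.
    """
    bid = v0 / (1.0 + max(mults))      # shade by the tightest (largest) multiplier
    alloc, pay = run_auction(bid)      # bandit feedback: own allocation and payment
    # Budget slack: spend above the per-round rate rho tightens mults[0].
    mults[0] = max(0.0, mults[0] + eta * (pay - rho))
    # ROI slack: payment not covered by constraint value tightens mults[j].
    for j, gamma in enumerate(gammas, start=1):
        slack = v_constr[j - 1] * alloc - gamma * pay
        mults[j] = max(0.0, mults[j] - eta * slack)
    return bid, alloc, pay, mults
```

Because each multiplier only moves in response to its own constraint's realized slack, the update uses no information about other bidders, matching the bandit-feedback regime described above.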
To evaluate efficiency, we need an aggregate objective that respects the presence of constraints. Raw welfare—the sum of realized values $\sum_{k,t} v^{(0)}_{k,t} x_{k,t}$—is not the right benchmark when bidders are financially constrained. A bidder with an enormous nominal value but a tiny budget cannot realize that value at scale; similarly, a bidder with strict ROI requirements cannot rationally spend into low-ROI impressions even if those impressions have high nominal KPI value. If we judge an outcome by raw welfare, we risk labeling as ``inefficient'' an allocation that is actually the best achievable under the constraints that bidders face. This is not merely a modeling preference: in applied settings, budget and ROI constraints are binding precisely because they reflect real resource limits (cash flow, profitability, or risk tolerance). An efficiency concept that ignores them can recommend allocations that are infeasible for the participants and thus irrelevant for policy and mechanism design.
We therefore adopt liquid welfare as the appropriate efficiency yardstick in constrained environments, and we extend it to the multi-constraint setting. The guiding interpretation is simple: each advertiser's contribution to welfare should be capped by what that advertiser can feasibly ``turn into'' value given its financial and ROI restrictions. In the budget-only case, liquid welfare replaces an agent's total realized value by the minimum of that value and the budget; the cap reflects the fact that beyond the budget the agent cannot pay and thus cannot credibly demand allocation. With ROI/CPA constraints, the effective cap is not only monetary but also value-denominated: if an advertiser must maintain a minimum ratio between a constraint-specific value stream and payments, then its feasible scale is limited by the accumulated constraint value. The resulting multi-constraint liquid welfare aggregates these caps across advertisers, yielding a metric that is both economically meaningful and analytically tractable.
This welfare notion has two practical virtues. First, it respects heterogeneity in constraint tightness. Two advertisers may have the same objective KPI values but different ROI thresholds; liquid welfare credits the advertiser that can productively spend more under its constraints without imputing unattainable surplus to the stricter advertiser. Second, it aligns naturally with the platform's operational goals. Platforms typically want an allocation that is ``as efficient as possible'' subject to advertisers' declared budgets and efficiency constraints, because repeated violations lead to churn, regulatory scrutiny, or contractual disputes. Liquid welfare formalizes this by measuring efficiency within the feasible region carved out by those constraints.
The analytic challenge is then to connect a dynamic, non-convergent bidding process to a clean welfare guarantee. Our results proceed in an ex-ante benchmarking style that is common in modern analyses of repeated mechanisms: we compare the expected liquid welfare induced by the auction and bidders’ algorithms to the best expected liquid welfare achievable by a fixed, single-round allocation rule that has access to the value profile in that round. This benchmark is demanding in two ways. It does not presume that the actual dynamics settle down to any equilibrium; and it allows the benchmark to be welfare-maximizing in each round subject only to ex-ante feasibility caps induced by budgets and ROI constraints. The striking implication is that even when the bidders are learning and constraints are multiple, we can still guarantee a constant-factor approximation to this ex-ante optimum, up to a sublinear additive term that vanishes on a per-round basis.
A key modeling choice behind this guarantee is to focus on auctions with a strong welfare property, namely core-like deviation-proofness. While the precise definition belongs later in the paper, the economic idea is that the auction outcome is stable against coalitional deviations in a way that allows us to ``charge'' losses in one place to payments collected elsewhere. This charging logic is particularly well-suited to environments with pacing: when a bidder shades its bid to satisfy constraints, welfare may fall relative to a fully truthful, unconstrained allocation, but the reduction can be bounded in terms of payments and thus amortized across the market. The contribution of the present work is to show that this amortization can survive the additional complexity introduced by multiple constraints per advertiser, where the identity of the binding constraint can change over time. In practice, this change-of-regime phenomenon is exactly what one observes when an advertiser alternates between being budget-limited, ROI-limited, or unconstrained as demand conditions shift.
Why should we care about such a welfare guarantee, as opposed to merely proving constraint satisfaction? The reason is that feasibility alone can be achieved by trivial conservatism: one can always bid near zero and violate nothing. What matters for market design is the tradeoff between safety and performance. Our analysis makes this tradeoff explicit: multipliers act as shadow prices for constraints, and the learning rates govern how quickly bidders react to realized slack. When learning is too aggressive, a bidder can oscillate and risk overshooting a constraint; when learning is too timid, the bidder may underdeliver and forgo value. By proving that a simple slack-tracking update can achieve both ex-post feasibility and near-optimal liquid welfare, we provide a formal justification for the common industry practice of multiplier-based pacing, and we clarify how multi-KPI requirements change the nature of the efficiency loss.
There is also a policy-facing aspect to this perspective. As automated bidding systems become the default, regulators and advertisers increasingly ask whether outcomes are interpretable and whether constraints are honored in a verifiable way. A design that guarantees ex-post feasibility for budgets and ROI-type constraints is inherently more auditable than one that guarantees feasibility only in expectation or only at an equilibrium that may never be reached. Moreover, liquid welfare offers a principled way to talk about ``market health'' when participants are constrained: it avoids conflating efficiency with the ability to pay, while still recognizing that constrained demand should not be treated as fully elastic. In settings where platforms must justify allocation decisions to large advertisers or to oversight bodies, being able to point to a welfare objective that internalizes the declared constraints can be valuable.
At the same time, we emphasize what our model does not claim. We do not assume that advertisers report values truthfully, nor do we model the full complexity of attribution and measurement error; instead, we treat the realized value streams as primitives and study how an algorithm reacts to them under limited feedback. We also abstract from intertemporal complementarities in delivery (such as learning-to-target effects), and we adopt an i.i.d. structure over time to isolate the strategic and algorithmic issues created by multiple constraints. These simplifications are deliberate: our aim is to understand which guarantees are possible in a repeated auction when bidders are constrained and adaptive, not to provide a complete engineering model of ad delivery.
The remainder of the paper builds from this motivation to a concrete algorithmic and welfare analysis. We first formalize the multi-constraint pacing dynamics and show that, with appropriate initialization and step sizes, all constraints are satisfied on every realized history despite bandit feedback. We then connect these feasibility guarantees to expected liquid welfare, showing that the market-level outcome retains a robust fraction of the best ex-ante feasible welfare even without convergence of bids or multipliers. Finally, we complement the welfare perspective with individual-level learning guarantees, bounding each advertiser’s regret relative to a natural per-round pacing benchmark that captures time variation in optimal constraint shadow prices. In combination, these results suggest a coherent economic message: in multi-KPI autobidding markets, convergence is an unrealistic desideratum, but constraint-respecting efficiency—measured appropriately via liquid welfare—remains both meaningful and attainable.
Our work sits at the intersection of (i) budget pacing in repeated auctions, (ii) ROI/CPA-constrained autobidding, (iii) equilibrium models of paced bidding, and (iv) online learning with knapsack-type constraints. The common theme is that advertisers face intertemporal feasibility constraints but interact through per-round auctions. The point of departure in the present paper is the combination of multiple ROI/CPA-type constraints per advertiser with a welfare objective that is explicitly constraint-aware. This combination turns out to be technically nontrivial even when one restricts attention to simple multiplier updates and to auctions with strong welfare properties.
A foundational line of work studies ``pacing'' as a bidder-side control problem in repeated auctions, where the bidder adjusts a scalar multiplier to keep cumulative spend within a budget. In these models, the bidder typically observes only its realized payment and allocation each round, and the goal is to satisfy the budget constraint while obtaining high value. A key conceptual advance is that one can seek guarantees that hold without assuming that the induced dynamics converge to a stationary point. This is particularly appropriate for ad markets, where the distribution of opportunities and competition changes frequently and bidder-side learning systems update continuously.
Gaitonde et al. provide an influential analysis in this direction. They formalize budget pacing as an online learning problem in repeated auctions and show that a simple update of a budget multiplier can guarantee ex-post budget feasibility together with a constant-factor welfare (or value) guarantee relative to an ex-ante benchmark, even when all bidders are simultaneously adapting. Importantly, the proofs do not rely on convergence to a Nash equilibrium; instead, they exploit monotonicity of payments in the multiplier and a charging argument that relates welfare loss from bid shading to payments collected in the market. Our analysis adopts the same high-level stance—robust performance guarantees in nonstationary, nonconvergent environments—and extends it along two axes: multiple constraints per bidder and a welfare notion that must internalize those constraints.
Lucier et al. study an autobidding model with both a budget constraint and an ROI (or CPA) constraint, together with bandit-style feedback. Their algorithm maintains two multipliers (one for budget and one for ROI) and bids using the more conservative of the two. A central technical idea is a slack-tracking invariant: each multiplier encodes the cumulative slack of its corresponding constraint, and safe initialization plus appropriately small step sizes prevent the cumulative slack from becoming negative, yielding ex-post feasibility. On the efficiency side, Lucier et al. connect this bidder-side control to market-level welfare in a repeated auction, again without requiring convergence.
Our ``multi-KPI'' setting generalizes this budget+ROI framework by allowing an arbitrary number of ROI/CPA-type constraints per advertiser. At a high level, one might expect the extension to be straightforward: maintain a multiplier for each constraint and bid according to the maximum. The subtlety is that the welfare proof becomes materially harder. With two constraints, there are only a few ``regimes'' (budget-binding, ROI-binding, or neither), and one can control the number of times the dynamics hover near the boundary where the identity of the binding constraint switches. With many constraints, boundary behavior proliferates: multiple multipliers can be near-tied, and the process can switch frequently among them. Bounding the welfare loss attributable to such switching requires new amortization ideas beyond simply union bounding over constraints.
A large literature models pacing as an equilibrium phenomenon: bidders choose a pacing multiplier (or an equivalent shading factor) so that, in expectation, they spend their budget across a distribution of auctions. In the simplest versions, each bidder draws a value per auction, bids a shaded value, and the multiplier is chosen to make expected spend match the budget. This leads to fixed-point characterizations of ``pacing equilibria'' and comparative statics in terms of budgets, value distributions, and competition. These analyses are valuable for understanding steady-state market outcomes and for platform-side forecasting.
Our focus is different in two respects. First, the equilibrium assumption is often not a good approximation for modern autobidding, where the mapping from signals to bids is updated continuously and bidder populations are heterogeneous in their learning and constraint-management rules. Second, even when a fixed point exists, it does not by itself provide the operational guarantee advertisers demand: that budgets and ROI constraints hold on the realized trajectory. The welfare and feasibility guarantees we pursue are therefore ``pathwise’’ (ex post) and robust to nonconvergence, rather than asymptotic properties of a stationary equilibrium.
That said, the equilibrium literature informs our modeling choices. In particular, the use of a scalar multiplier as a shadow price for feasibility constraints is common across both equilibrium and learning-based approaches. One can interpret our multipliers as time-varying approximations to the Lagrange multipliers that would arise in a (stochastic) constrained optimization formulation. The point is not that the multipliers converge to these Lagrange values, but that the shadow-price interpretation continues to organize the economic logic: higher multipliers correspond to tighter constraints and therefore more conservative bidding.
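The shadow-price reading can be made concrete. In our notation (a heuristic per-agent sketch, suppressing the round index and the dependence on others' bids), the constrained problem admits a Lagrangian of the form
$$
\max_{x}\; \mathbb{E}\big[ v^{(0)} x \big]
\;-\; \mu_0\,\mathbb{E}\big[ p - \rho \big]
\;-\; \sum_{j} \mu_j\, \mathbb{E}\big[ \gamma_j\, p - v^{(j)} x \big],
$$
where $\mu_0$ prices the budget (via the per-round rate $\rho$) and each $\mu_j$ prices an ROI/CPA guardrail. The multiplier updates in our dynamics can be read as stochastic subgradient steps on this dual: a positive realized violation term raises the corresponding $\mu$, and slack lowers it, without any claim that the $\mu$'s converge to the optimal dual values.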
From the perspective of online learning, budget pacing resembles ``bandits with knapsacks'' (BwK): an agent repeatedly chooses actions, observes stochastic rewards and resource consumption, and must keep cumulative consumption within a budget. The BwK framework yields powerful algorithms and regret guarantees, but it also highlights fundamental limitations. In adversarial or highly nonstationary environments, one cannot in general guarantee both high reward and strict constraint satisfaction without additional structure; moreover, many BwK guarantees are in expectation and allow violations with small probability, which can be unacceptable for hard financial constraints.
The repeated-auction setting differs from canonical BwK in ways that cut both for and against strong guarantees. On the one hand, the bidder does not control the mapping from bids to outcomes; allocations and payments are generated by an auction mechanism and depend on others' bids. On the other hand, auctions often provide monotonicity and ``single-parameter'' structure: increasing one's bid (or decreasing one's multiplier) tends to weakly increase allocation and payment. This structure can substitute for stochastic optimization assumptions and can enable ex-post feasibility via slack tracking. Our analysis can be read as identifying conditions under which the auction environment is ``well-behaved enough'' that the strongest form of feasibility (no violations on any realized history) is compatible with meaningful welfare guarantees.
The multi-constraint setting also connects to generalizations of BwK with multiple resources. While the algorithmic literature provides methods for handling multiple knapsack constraints, transferring these guarantees to strategic multi-agent auctions is not immediate. The key difficulty is that resource consumption is endogenous to other agents’ learning and to the mechanism, and the bidder receives only bandit feedback about its own outcomes. Our approach therefore follows the pacing tradition more closely than the general BwK template: rather than solving a global primal-dual program each round, each agent runs simple per-constraint multiplier updates and relies on monotonicity plus safe bounds to ensure feasibility.
A recurring theme in constrained mechanism design is that unconstrained welfare is not the right yardstick when agents face hard budgets. Liquid welfare was introduced to address this issue by capping each agent's contribution by its budget, yielding an efficiency notion that respects financial feasibility. Subsequent work studies approximation mechanisms and price-of-anarchy guarantees under liquid welfare in one-shot auctions and games. In repeated settings, ex-ante versions of liquid welfare provide a natural benchmark: one compares an algorithm's expected outcome to the best single-round allocation rule subject to ex-ante caps induced by budgets.
Our contribution is not to introduce a new welfare concept from scratch, but to extend this logic to environments with multiple ROI/CPA-type constraints per agent, where the effective feasibility cap is no longer purely monetary. Intuitively, an ROI constraint limits how much value-denominated ``numerator'' the bidder can generate per unit of spend, so the bidder's feasible scale is capped by whichever constraint is tightest on the realized history. This creates a minimum structure in the agent's liquid value, and it is precisely this minimum that complicates welfare analysis under learning dynamics: the binding constraint can switch across time in response to both realized outcomes and the agent's own conservative bidding.
It is helpful to isolate what changes when we move from one ROI constraint to many. At the feasibility level, a natural algorithmic rule suggests itself: maintain a multiplier for each constraint and bid according to the largest multiplier. Safe initialization and slack-tracking updates can be shown to preserve feasibility constraint-by-constraint, essentially because each constraint’s multiplier only increases when its slack is negative. While there are additional bookkeeping issues (e.g., bounding all multipliers simultaneously), the feasibility proof largely scales by repeating the two-constraint argument for each constraint.
The welfare analysis, however, is not a straightforward repetition. The reason is that welfare proofs in pacing models typically partition time into regimes: rounds where the bidder is ``high-bidding'' (multiplier small) and rounds where some constraint binds and the bidder shades its bid. In the two-constraint case, one can control the measure of boundary rounds where the identity of the binding constraint changes, and one can charge welfare loss in conservative rounds to either accumulated slack or to payments in aggressive rounds. With many constraints, there can be many near-ties: several multipliers can be close, and the identity of the maximum can change frequently due to noise in realized payments and constraint values. Each such switch can create a ``boundary error'' round in which the welfare accounting is loose: the bidder might be bidding according to constraint j while constraint j′ is almost as tight, causing the charging argument to incur an additive loss.
At a high level, the multi-constraint problem thus resembles a multi-expert switching process driven by noisy gradients: the algorithm effectively selects the maximum among several evolving multipliers, and the welfare bound must control the cumulative cost of switching and near-switching. This is where the technical novelty lies. One needs a potential argument that (i) is compatible with bandit feedback and (ii) bounds the total number (or total ``mass'') of rounds in which the maximum is not well-separated. Our approach develops such an amortization, yielding a welfare loss that grows polynomially (rather than exponentially) with the number of constraints, and remains sublinear in the time horizon.
The broader autobidding literature includes many practical systems that handle multiple objectives and constraints via heuristic combinations of multipliers, calibration layers, and prediction models. Our aim is to provide a clean economic and algorithmic account of when such a multi-multiplier approach can be justified by rigorous guarantees. The price we pay is abstraction: we treat the per-round value streams as primitives, we focus on i.i.d. time structure to isolate the learning-vs.-feasibility tradeoff, and we work with auctions that satisfy strong welfare properties. These assumptions are common in the pacing literature, but they delimit the scope of the results. In particular, extending comparable welfare guarantees to nonstationary value distributions, richer feedback models (e.g., delayed conversions), or weaker auction properties remains an important direction for future work.
In summary, our paper can be viewed as a step from ``budget pacing'' and ``budget+ROI pacing'' to a setting that more closely matches current advertiser practice: multiple KPI-aligned guardrails enforced simultaneously by a single autobidder. The key message, consistent with the recent nonconvergence literature, is that one can obtain robust feasibility and meaningful welfare guarantees even when all agents are learning, provided one chooses an efficiency benchmark that internalizes the constraints and develops proof techniques that control the boundary effects created by multi-constraint switching.
We model an advertising market as a repeated auction environment in which each advertiser delegates bidding to an autobidder that must simultaneously manage a budget and multiple ROI/CPA-type constraints. The purpose of this section is to separate the economic primitives (values, constraints, and feasibility) from the algorithmic choice of how to bid and learn. Doing so clarifies what information the bidder actually uses, what is being optimized, and what efficiency benchmark is appropriate once feasibility is taken seriously.
Time is discrete with rounds $t \in \{1, \dots, T\}$. In each round, the platform runs an auction and chooses an allocation vector $x_t = (x_{1,t}, \dots, x_{n,t}) \in X \subseteq [0,1]^n$, where $x_{k,t}$ represents the (possibly fractional) quantity of the opportunity awarded to advertiser $k$ in round $t$. The set $X$ is assumed downward-closed: if $x \in X$ and $0 \le x' \le x$ coordinatewise, then $x' \in X$. This captures common environments such as a single indivisible impression (where $X$ is the simplex), multiple identical units, or position auctions with fractional click-through scaling, and it provides the monotonicity structure needed for pacing-style arguments.
Each advertiser $k$ submits a nonnegative effective bid $b_{k,t} \ge 0$, yielding bid profile $b_t$. The auction then chooses an allocation $x_t$ and payments $p_t = (p_{1,t}, \dots, p_{n,t})$ with $p_{k,t} \ge 0$. We assume the platform runs a core auction in each round. Informally, this means: (i) the allocation is welfare-maximizing with respect to the submitted bids (so the mechanism allocates to maximize $\sum_k b_{k,t} x_{k,t}$ over $X$), and (ii) the resulting payments are in the core (so no coalition of bidders can profitably deviate, given the mechanism's outcome). Core-selecting auctions are used in practice in several ad and procurement settings precisely because they combine strong welfare properties with stability against coordinated deviations.
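A minimal concrete instance of such a round, assuming a single indivisible item sold first-price (our illustrative choice, not the paper's general mechanism), makes the two properties easy to see: the item goes to the highest effective bid (bid-welfare maximization over the simplex), and the winner pays its bid, so the per-unit payment never exceeds the per-unit bid.

```python
def first_price_round(bids):
    """One single-item, first-price round (illustrative sketch).

    Allocation maximizes sum_k b_k * x_k over the simplex (item to the
    highest bid); the winner pays its bid, so p_k <= b_k * x_k holds for
    every bidder, with equality for the winner.
    """
    winner = max(range(len(bids)), key=lambda k: bids[k])
    alloc = [0.0] * len(bids)
    pay = [0.0] * len(bids)
    if bids[winner] > 0:
        alloc[winner] = 1.0
        pay[winner] = bids[winner]
    return alloc, pay
```

Note the monotonicity used throughout the pacing arguments: raising one's own bid (equivalently, lowering one's multiplier) weakly increases both allocation and payment in this rule.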
For the feasibility results that follow, we additionally use a standard no-overbidding-type condition on the realized outcome:
$$
p_{k,t} \;\le\; b_{k,t}\, x_{k,t} \qquad \text{for all } k, t.
$$
This inequality holds in many common auction formats (e.g., first-price and certain core-selecting implementations) and has a direct economic interpretation: per unit allocated, the bidder never pays more than its submitted per-unit bid. This condition is the bridge between bid shading (via multipliers) and hard ex-post feasibility.
A distinctive feature of modern autobidding is that a single bidder may care about several KPIs simultaneously. We represent this by allowing multiple value streams per advertiser in each round: $\{v^{(0)}_{k,t}\} \cup \{v^{(j)}_{k,t} : j \in \mathcal{R}_k\}$, where $v^{(0)}_{k,t} \in [0, \bar v]$ is the advertiser's per-unit value for its KPI (the numerator of what the advertiser aims to maximize), and each $v^{(j)}_{k,t} \in [0, \bar v]$ is a per-unit value associated with constraint $j$. We emphasize two modeling choices that match practice: the objective and constraint streams can be distinct, and they can be arbitrarily correlated.
For example, an advertiser might optimize expected purchases (objective) but impose a return-on-ad-spend (ROAS) constraint using predicted revenue, or impose a cost-per-acquisition (CPA) constraint using predicted conversions. Even when these quantities are related, they are not identical: they can be computed from different models, can have different calibration, and can respond differently to targeting and auction selection. Our formulation allows the objective stream v(0) and constraint streams v(j) to be distinct and correlated in arbitrary ways.
We assume the per-round value profile is drawn i.i.d. over time from a distribution F (the components may be arbitrarily correlated across advertisers and across streams within a round). The i.i.d. assumption is not meant to deny nonstationarity in ad markets; rather, it isolates the central technical issue we study: how a bidder can guarantee and obtain meaningful welfare guarantees even when all bidders are learning and outcomes are endogenous to the market interaction. Extending the welfare analysis to richer forms of nonstationarity is an important direction, but the i.i.d. baseline already captures substantial within-round complexity and strategic interaction.
Given an allocation $x_{k,t}$, the realized objective contribution for advertiser $k$ in round $t$ is $v^{(0)}_{k,t} x_{k,t}$, and the realized constraint-numerator contributions are $v^{(j)}_{k,t} x_{k,t}$. The bidder does not directly choose $x_{k,t}$; it is induced by the auction through the submitted bid.
Advertiser k faces two classes of intertemporal feasibility constraints.
Total spend cannot exceed a hard budget $B_k$:
$$
\sum_{t=1}^{T} p_{k,t} \;\le\; B_k.
$$
We denote the per-round budget rate by $\rho_k := B_k/T$, which serves as the natural scale at which a pacing algorithm tries to regulate spending.
For each $j \in \mathcal{R}_k$, the advertiser must satisfy a linear inequality of the form
$$
\sum_{t=1}^{T} v^{(j)}_{k,t} x_{k,t} \;\ge\; \gamma_{k,j} \sum_{t=1}^{T} p_{k,t}.
$$
This template captures standard ratio guardrails after normalization. For instance: (i) a ROAS constraint ``revenue / spend $\ge r$'' corresponds to the template with $v^{(j)}$ equal to revenue-per-unit and $\gamma_{k,j}=r$; (ii) an ROI constraint of the form ``profit / spend $\ge r$'' can be represented similarly if the numerator stream tracks profit-per-unit; (iii) a CPA constraint ``spend / conversions $\le c$'' is equivalent to ``$(1/c)\cdot$spend $\le$ conversions'' and can be fit into the template by scaling the constraint stream (or, equivalently, by redefining $\gamma$ after a change of units). The key point is that the template expresses that, over the horizon, accumulated constraint value must cover cumulative spend at a constraint-specific exchange rate $\gamma_{k,j}$.
In practice, advertisers often maintain multiple guardrails simultaneously (e.g., a minimum ROAS plus a maximum CPA, or separate CPA constraints for different conversion types). This motivates allowing |ℛk| to be arbitrary and advertiser-specific.
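To make the normalization concrete, here is a minimal sketch (the helper name and the stream labels are ours, not the paper's) of how the three guardrails above map into the common template γ · spend ≤ Σ v(j) x:

```python
def guardrail_to_template(kind: str, target: float):
    """Map a business guardrail to the common template
    gamma * total_spend <= total constraint-stream value.
    Returns (gamma, stream), where `stream` names the per-unit
    numerator the autobidder must track. Illustrative only."""
    if kind == "roas":          # revenue / spend >= target
        return target, "revenue_per_unit"
    if kind == "roi":           # profit / spend >= target
        return target, "profit_per_unit"
    if kind == "cpa":           # spend / conversions <= target,
        # equivalent to (1/target) * spend <= conversions
        return 1.0 / target, "conversions_per_unit"
    raise ValueError(f"unknown guardrail kind: {kind}")
```

A CPA target of 2.0 per conversion, for example, becomes γ = 0.5 against a conversions-per-unit stream, exactly the change of units described above.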
Finally, we impose a standard per-round cap:
$$
b_{k,t} \;\le\; v^{(0)}_{k,t}.
$$
This is not merely a technical convenience. Autobidders are typically
constrained (either by design or by advertiser policy) to avoid bidding
above their modeled value for the optimized KPI. As we discuss below,
the bidder will still shade bids below v(0) to ensure long-run
feasibility; the cap prevents pathological ``overbidding to catch up''
behavior.
Advertiser k’s
unconstrained objective is to maximize total objective value
$$
\sum_{t=1}^T v^{(0)}_{k,t} x_{k,t},
$$
subject to the feasibility constraints above. Because the auction outcome
depends on all bidders’ actions, one should not expect to exactly
optimize this objective in a strategic environment. Our focus is
therefore on (i) feasibility guarantees that hold on the realized path
and (ii) market-level welfare benchmarks that incorporate those
constraints.
A central modeling step is to evaluate allocations using multi-constraint liquid welfare. For a realized allocation sequence x = (x1, …, xT) and realized values v, we define
$$
W_k(x,v) \;:=\; \min\left\{\, B_k,\; \min_{j \in \mathcal R_k}
\frac{1}{\gamma_{k,j}} \sum_{t=1}^T v^{(j)}_{k,t} x_{k,t} \right\}.
$$
Liquid welfare is then $W(x,v) := \sum_{k=1}^n
W_k(x,v)$.
The economic content of this definition is straightforward: advertiser k cannot turn arbitrarily large allocations into utility because scaling up allocation also scales up spend, and spend must be supported by both money (budget) and KPI numerator (ROI/CPA). The effective ``cap'' on feasible scale is therefore the tightest among these limits. When there is only a budget, the definition reduces to standard liquid welfare; with multiple ROI/CPA guardrails, the cap becomes the minimum across constraints, reflecting that any one binding guardrail can throttle feasible spending.
This welfare notion also matches the practical objective of an autobidder operating under hard guardrails: a campaign is not successful if it generates high raw value while violating a CPA or ROAS target on the realized trajectory. The liquid benchmark explicitly discounts value that could only be obtained by violating constraints.
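As an illustration, a single advertiser's liquid value can be computed from realized totals as the tightest of its caps; this is a sketch under our reading of the definition above, with variable names of our own choosing:

```python
def liquid_value(budget, gammas, stream_totals):
    """Per-advertiser multi-constraint liquid value: the minimum of
    the budget cap and each ROI/CPA-implied cap
    (1/gamma_j) * realized constraint-numerator total."""
    caps = [budget] + [tot / g for g, tot in zip(gammas, stream_totals)]
    return min(caps)

# One binding guardrail throttles the cap even when others are slack:
# caps are [100, 300/2, 40/0.5] = [100, 150, 80], so the value is 80.
w = liquid_value(budget=100.0, gammas=[2.0, 0.5], stream_totals=[300.0, 40.0])
```

The second guardrail binds here despite ample budget and slack in the first, which is exactly the min-across-constraints behavior described in the text.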
Each round proceeds as follows. First, the value streams for all advertisers are realized. Advertiser k observes its own realizations {vk, t(0), vk, t(j)}j ∈ ℛk and then submits an effective bid bk, t (a single number) to the auction. The platform computes xt and charges payments pt. After the round, advertiser k observes its own realized outcome (xk, t, pk, t) together with its own value streams (equivalently, it can compute the realized constraint ``slacks’’ γk, jpk, t − vk, t(j)xk, t).
Crucially, we assume : each advertiser sees only its own outcomes, not other bidders’ values, bids, allocations, or payments. This matches how real autobidders operate. Even when platforms provide reporting, it is typically aggregated, delayed, and noisy relative to the per-auction dynamics. Our feedback assumption is therefore deliberately austere: any learning or feasibility guarantee should be achievable without requiring market-wide observability.
We also note a limitation. Many constraint numerators (e.g., conversions, revenue) can be delayed relative to the auction. Modeling delay would require additional state variables and would generally weaken ex-post guarantees, since the bidder cannot immediately tell whether it is violating a CPA/ROAS target. We abstract from this and treat vk, t(j) as observed at the end of round t, which is appropriate when constraints are enforced using predicted (or otherwise promptly observed) KPI numerators.
Because values are i.i.d. over time, a natural welfare benchmark is ex-ante:
one compares the induced allocation sequence rule (generated by the
auction coupled with bidders’ adaptive bidding) to the best allocation
rule that, in expectation, respects the same feasibility caps. Formally,
if y(⋅) maps a realized
per-round value profile to an allocation in X, then its expected
multi-constraint liquid welfare takes the form
$$
\sum_{k=1}^n T \cdot \min\left\{\, \rho_k,\; \min_{j \in \mathcal R_k}
\frac{1}{\gamma_{k,j}} \mathbb{E}_{v \sim F}\!\left[y_k(v)
v^{(j)}_k\right] \right\},
$$
which mirrors the liquid-welfare definition at the per-round rate level. Our main results will show
that there exists a simple bidder-side learning rule that (i) satisfies
the budget and ROI constraints on every realized history and (ii) when run by all bidders in a core
auction yields a constant-factor approximation to this ex-ante benchmark
up to a sublinear-in-T
additive term.
The remaining task is algorithmic: to specify how an advertiser converts multiple constraints into a single bid each round, using only its bandit feedback, while preserving hard feasibility and enabling welfare accounting at the market level. We turn to this next.
We now specify the bidder-side procedure that turns many intertemporal guardrails into a single effective bid each round. The design goal is deliberately stringent: the autobidder should (i) require only its own realized feedback, (ii) submit a scalar bid to the auction, and yet (iii) enforce constraints on the realized trajectory without relying on convergence of market dynamics. The algorithm we study is a direct multi-constraint generalization of slack-tracking pacing, with one multiplier per constraint and a conservative aggregation rule that determines the bid.
Fix an advertiser k and
suppress k when unambiguous.
MKP maintains a separate nonnegative Lagrange-multiplier-like variable
for each feasibility constraint:
μtB (budget
multiplier), μtR, j (ROI/CPA
multiplier for each j ∈ ℛ).
These multipliers should be read as shadow prices on the guardrails. When the budget is running ``hot''
(spend above the per-round rate), μtB
grows and the bidder shades bids down; when an ROI constraint is
underperforming (spend not sufficiently supported by its numerator
stream), the corresponding μtR, j
grows and also shades bids down.
The key modeling constraint is that the auction accepts only a single
bid bt. We
therefore define an overall multiplier by a conservative aggregation:
$$
\mu_t \;=\; \max\Big\{\mu^B_t,\; \max_{j\in\mathcal R}\mu^{R,j}_t,\; 0\Big\}.
$$
Given μt,
the effective bid is a multiplicatively paced version of the objective
value:
$$
b_t \;=\; \frac{v^{(0)}_t}{1+\mu_t}.
$$
This bid automatically respects the per-round cap bt ≤ vt(0)
and is monotone in the conservative multiplier: larger μt implies
weaker bidding and therefore (under standard monotonicity of
welfare-maximizing allocations and the payment upper bound pt ≤ btxt) weakly
lower spending.
The max-aggregation encodes a simple economic principle: the tightest guardrail governs the bid. Put differently, if any guardrail is close to violation, we want today's bid to be consistent with that guardrail even if other guardrails currently have slack. This is the sense in which MKP is ``safe by design'': it is conservative in the direction that protects feasibility.
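In code, the conservative aggregation and the induced bid are one line each (a sketch; the function and argument names are ours):

```python
def effective_bid(v0: float, mu_budget: float, mu_roi: dict) -> float:
    """Bid a multiplicatively paced version of the objective value,
    shaded by the strongest constraint pressure (max-aggregation).
    The max with 0.0 ensures the bid never exceeds v0."""
    mu = max(mu_budget, max(mu_roi.values(), default=0.0), 0.0)
    return v0 / (1.0 + mu)

# The tightest guardrail governs: mu = max(0.25, 1.0, 0.0) = 1.0,
# so the bid is 10.0 / (1 + 1.0) = 5.0.
b = effective_bid(10.0, 0.25, {"roas": 1.0})
```

Note that a negative multiplier (accumulated slack) is clipped to zero in the bid, so the per-round cap b ≤ v(0) holds by construction.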
After bidding , the auction returns an allocation xt and payment
pt. The
bidder then updates every constraint multiplier using its realized
one-step slacks. Define the budget slack increment and ROI slack increments
as
$$
g^B_t := p_t - \rho, \qquad
g^{R,j}_t := \gamma_j p_t - v^{(j)}_t x_t \quad \text{for each } j \in \mathcal R,
$$
where ρ = B/T is the
per-round budget rate (for advertiser k) and we have again suppressed
k for readability. Positive
g means ``we overspent relative to the budget rate'' or ``we violated the ROI inequality in this round,'' while negative g means we accrued slack.
MKP performs additive updates with constraint-specific step sizes:
$$
\mu^B_{t+1} \;=\; \mu^B_t + \eta^B g^B_t, \qquad
\mu^{R,j}_{t+1} \;=\; \mu^{R,j}_t + \eta^R_j g^{R,j}_t.
$$
Two comments are important for interpretation.
These updates coincide with (stochastic) gradient ascent on the Lagrangian of the constrained problem, viewing (pt − ρ) and (γjpt − vt(j)xt) as subgradients of constraint violations. In many online primal–dual methods, such updates are analyzed via convergence of time averages. Our objective is different: we use the same algebraic structure to obtain pathwise feasibility guarantees, and later to support welfare accounting even when no limit behavior exists.
The slack increments depend only on (pt, xt) and the bidder's own value streams. The bidder does not need to estimate counterfactual allocations under alternative bids, nor observe other bidders. This matters in practice because autobidders can reliably measure their own spend and attributed KPI numerators (or their predictions), while market-wide observability is typically infeasible.
Algorithmically, MKP can be summarized as follows.
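A minimal sketch of the loop, under the notation above (the class and method names are ours, and the auction interaction is abstracted into the `bid`/`update` interface):

```python
class MKPState:
    """Sketch of multi-constraint pacing: one additive multiplier per
    guardrail, conservative (safe) initialization, and a
    max-aggregated multiplicatively paced bid."""

    def __init__(self, B, T, vbar, gammas, eta_B, eta_R):
        self.rho = B / T                      # per-round budget rate
        self.mu_B = vbar / self.rho - 1.0     # safe budget level vbar/rho - 1
        self.mu_R = {j: g - 1.0 for j, g in gammas.items()}  # safe ROI levels
        self.gammas, self.eta_B, self.eta_R = gammas, eta_B, eta_R

    def bid(self, v0):
        # conservative aggregation: the strongest pressure wins
        mu = max(self.mu_B, max(self.mu_R.values(), default=0.0), 0.0)
        return v0 / (1.0 + mu)

    def update(self, p, x, v_streams):
        # additive slack-tracking updates; positive slack = pressure grows
        self.mu_B += self.eta_B * (p - self.rho)
        for j, g in self.gammas.items():
            self.mu_R[j] += self.eta_R[j] * (g * p - v_streams[j] * x)
```

Each round the autobidder calls `bid` with its realized objective value, observes its own (x, p) and constraint streams, and calls `update`; no market-wide information is needed.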
With multiple constraints, one might wonder why we do not combine multipliers additively (as in a classical Lagrangian), e.g., bid with μt = μtB + ∑jμtR, j. The reason is that the bidder has only one control (the scalar bid), while feasibility must hold against a minimum across constraints. In particular, the liquid-value cap is min {B, minj(⋅)}, and ex-post feasibility requires satisfying every inequality simultaneously. When constraints are coupled through a single bid, max-aggregation is the most robust reduction: if any constraint is tight, we want it to dominate the shading decision. Additive aggregation can perversely ``average'' constraints, bidding too aggressively when one constraint is nearly binding but others are slack.
The max rule also yields a clean ``regime’’ structure: in any round, there is an active constraint (possibly more than one in the case of ties) that determines μt. This regime viewpoint will later be useful in the welfare proof, where we account separately for rounds in which the budget is effectively pacing versus rounds in which an ROI guardrail is effectively pacing.
The updates can in principle drive multipliers upward or downward depending on realized slack. For feasibility, it is crucial that multipliers start in a safe region and that step sizes are small enough that the process cannot ``jump'' past safety thresholds.
We initialize at the conservative levels
$$
\mu^B_1 \;=\; \frac{\bar v}{\rho} - 1, \qquad
\mu^{R,j}_1 \;=\; \gamma_j - 1 \quad \text{for each } j \in \mathcal R.
$$
These choices have an intuitive interpretation. If μB = v̄/ρ − 1,
then bt ≤ vt(0)/(1 + μB) ≤ v̄/(1 + μB) = ρ,
so the bid is never above the per-round budget rate scale. Similarly, if
μR, j = γj − 1,
then bt ≤ vt(0)/γj,
i.e., the bidder behaves as though each unit must clear an exchange rate
consistent with ROI strictness γj. Under the
payment upper bound pt ≤ btxt,
these initializations are the algebraic starting point for an inductive
argument that cumulative violations cannot occur.
We will impose the following sufficient conditions
(advertiser-specific, constraint-specific):
$$
\eta^B \;\le\; \frac{1}{\bar v}, \qquad
\eta^R_j \;\le\; \frac{1}{\bar v} \quad \text{for each } j \in \mathcal R.
$$
Economically, these inequalities ensure that a single unusually large
payment (or unusually poor ROI realization) cannot force the algorithm
to take a destabilizing step. Technically, they will guarantee two
related properties: (i) multipliers remain uniformly bounded above by
their safe initial levels, and (ii) the cumulative slacks never become positive in a way that would correspond to a realized constraint violation.
We allow μtB and μtR, j to drift below 0 in principle; this simply represents that the bidder has accumulated slack and could bid more aggressively. However, the actual bid uses μt = max {⋅, 0}, so the bidder never amplifies its bid above vt(0). This separation is convenient: multipliers can be treated as unconstrained slack-tracking state variables, while the bid remains economically interpretable and respects the per-round cap by construction.
MKP is best viewed as a conservative controller: it treats each guardrail as generating a ``pressure'' (a multiplier), and then bids according to the strongest pressure. This is consistent with how practitioners often describe multi-goal autobidding systems.
Two limitations are worth emphasizing. First, MKP relies on the payment upper bound pt ≤ btxt to connect bid shading to realized spend; if the auction format can charge above bid-per-unit (e.g., due to certain nonstandard pricing rules), one must either modify the algorithm (e.g., incorporate an upper-confidence correction) or accept weaker guarantees. Second, the ROI updates use vt(j)xt as the realized numerator contribution in round t. This is appropriate when constraint numerators are promptly observable or reliably predicted at bidding time; with delayed or highly noisy attribution, one would need additional machinery (e.g., delayed-feedback OCO) and would generally lose the strongest ex-post statements.
With the algorithm now specified, the next step is to show that these simple updates indeed enforce constraints on every realized history. The proof will formalize the slack-tracking intuition: once a constraint approaches tightness, its multiplier becomes (and remains) large enough that the realized one-step slack cannot be negative, which inductively rules out cumulative violations.
We now formalize the ``safety'' claim implicit in slack-tracking: MKP enforces intertemporal constraints ex post. The point is not merely that violations vanish in expectation or in the limit, but that they do not occur at all (with probability 1 over the realized values), even though the bidder observes only its own bandit feedback and the market need not converge.
The ex-post feasibility proof is intentionally modular: it relies only on a small set of per-round inequalities that link bids to payments and to the KPI numerators used in ROI constraints.
For each advertiser and round, the auction outcome satisfies
$$
p_{k,t} \;\le\; b_{k,t}\, x_{k,t}.
$$
This is a standard property in many position auctions and
welfare-maximizing core mechanisms with per-unit pricing: the bidder is
never charged more than its bid per unit times the allocated amount.
For each ROI/CPA constraint j ∈ ℛ, the realized per-unit
``numerator value'' dominates the objective per-unit value:
$$
v^{(j)}_t \;\ge\; v^{(0)}_t \quad \text{for all } t.
$$
Economically, this is a calibration requirement: the stream v(j) used in the constraint should be measured in the same units as the value used to form bids, or more conservative ones. In practice, platforms often implement
ROI/CPA guardrails by choosing a constraint numerator that is
deliberately conservative relative to the objective (e.g., predicted
attributed value net of adjustments), precisely to ensure that
satisfying the algorithmic inequality implies satisfying the business
constraint.
If instead one has vt(j) ≥ αjvt(0) for known αj ∈ (0, 1], then all statements below extend by replacing the safe ROI level γj − 1 with γj/αj − 1. We keep αj = 1 to match the initialization above.
Notably, the core property of the auction is not used for feasibility; it will enter only in the efficiency argument in the next section. Likewise, we do not require convergence or any stationarity beyond the realized per-round inequalities above.
The multiplier updates imply an exact accounting identity: each multiplier is the initial safe level plus a scaled cumulative slack. For any prefix length t,
$$
\mu^B_{t+1} \;=\; \mu^B_1 + \eta^B \sum_{s=1}^{t} (p_s - \rho), \qquad
\mu^{R,j}_{t+1} \;=\; \mu^{R,j}_1 + \eta^R_j \sum_{s=1}^{t} \big(\gamma_j p_s - v^{(j)}_s x_s\big).
$$
Thus, to prove the feasibility inequalities, it suffices to prove a uniform upper bound on the multipliers: if we can show that μtB ≤ μ1B and μtR, j ≤ μ1R, j = γj − 1 for all t, then the corresponding cumulative slacks must be nonpositive for every prefix, which is exactly ex-post feasibility.
This is the sense in which slack-tracking is a barrier design: rather than proving convergence of (μt), we prove that the process cannot cross a safety boundary.
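The accounting identity is easy to sanity-check numerically; this small script (our own illustration, with arbitrary slack draws) tracks a multiplier and its cumulative slack side by side and confirms they always agree through the affine relation:

```python
import random

# The multiplier equals its safe initial level plus the step size
# times the cumulative slack, so bounding the multiplier by its
# initial level is the same as keeping cumulative slack nonpositive.
rng = random.Random(0)
eta, mu_init = 0.1, 4.0
mu, cum_slack = mu_init, 0.0
for _ in range(1000):
    g = rng.uniform(-1.0, 1.0)    # one-step slack increment
    mu += eta * g
    cum_slack += g
    assert abs(mu - (mu_init + eta * cum_slack)) < 1e-9
```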
Although our proofs below are algebraic, they embody a simple
monotonicity logic. The max-aggregation rule ensures that the realized
bid bt is at most the bid that would be implied by any individual constraint multiplier considered in isolation. Formally, since μt ≥ μtB and μt ≥ μtR, j,
$$
b_t \;=\; \frac{v^{(0)}_t}{1+\mu_t} \;\le\; \frac{v^{(0)}_t}{1+\mu^B_t}
\quad\text{and}\quad
b_t \;\le\; \frac{v^{(0)}_t}{1+\mu^{R,j}_t} \ \ \text{for each } j \in \mathcal R.
$$
Therefore, whenever we upper bound a violation term using only bt (via the payment bound pt ≤ btxt), the max rule can only help: bidding ``more conservatively than necessary'' cannot create a payment explosion, and under the calibration condition vt(j) ≥ vt(0) it cannot create an ROI shortfall either, because a sufficiently small bid makes each unit purchased cheap enough to satisfy γjp ≤ v(j)x on that round.
We next show that, under the step-size conditions above, each multiplier remains at or below its safe initial level. Intuitively, as a multiplier approaches its safe level from below, the induced bid becomes so conservative that the corresponding one-step slack becomes nonpositive (or, more generally, too small to push the multiplier across the boundary in one step).
Assume the payment bound pt ≤ btxt and ηB ≤ 1/v̄. Then along every realized history, μtB ≤ μ1B for all t.
Fix a round t and suppose μtB ≤ μ1B. By the payment bound and the definition of the bid,
$$
p_t \;\le\; b_t x_t \;\le\; b_t \;\le\; \frac{v^{(0)}_t}{1+\mu^B_t}
\;\le\; \frac{\bar v}{1+\mu^B_t}.
$$
Hence
$$
p_t - \rho \;\le\; \frac{\bar v}{1+\mu^B_t} - \rho.
$$
By the definition of μ1B = v̄/ρ − 1,
we have ρ = v̄/(1 + μ1B),
so the right-hand side above equals
$$
\frac{\bar v}{1+\mu^B_t} - \frac{\bar v}{1+\mu^B_1}.
$$
The function f(μ) = v̄/(1 + μ)
has derivative satisfying |f′(μ)| = v̄/(1 + μ)2 ≤ v̄ for μ ≥ 0,
so by the mean value theorem,
$$
\frac{\bar v}{1+\mu^B_t} - \frac{\bar v}{1+\mu^B_1}
\;\le\;
\bar v(\mu^B_1-\mu^B_t).
$$
Combining with the update μt + 1B = μtB + ηB(pt − ρ)
yields
μt + 1B ≤ μtB + ηBv̄(μ1B − μtB) ≤ μtB + (μ1B − μtB) = μ1B,
where we used ηB ≤ 1/v̄.
This proves the inductive step; the base case holds by initialization.
▫
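The inductive step can be exercised numerically; the following sketch (our own, with arbitrary parameters) draws payments respecting p_t ≤ v̄/(1 + max(μ_t, 0)) with η = 1/v̄ and confirms the budget multiplier never crosses its safe initial level:

```python
import random

# Numeric check of the barrier argument: with eta <= 1/vbar and
# payments capped by the paced bid, the budget multiplier can
# approach but never exceed its safe initial level vbar/rho - 1.
vbar, rho, T = 1.0, 0.2, 10_000
eta = 1.0 / vbar
mu1 = vbar / rho - 1.0            # safe level = 4.0
mu = mu1
rng = random.Random(0)
for _ in range(T):
    # payment respects the bid cap vbar / (1 + max(mu, 0))
    p = rng.uniform(0.0, vbar / (1.0 + max(mu, 0.0)))
    mu += eta * (p - rho)         # additive slack-tracking update
    assert mu <= mu1 + 1e-9       # barrier never crossed
```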
Fix j ∈ ℛ. Assume the payment bound pt ≤ btxt, the calibration condition vt(j) ≥ vt(0), and ηjR ≤ 1/v̄. Then along every realized history, μtR, j ≤ γj − 1 for all t.
Fix t and suppose μtR, j ≤ γj − 1. Using the payment bound and μt ≥ μtR, j,
$$
\gamma_j p_t - v^{(j)}_t x_t \;\le\; \gamma_j b_t x_t - v^{(j)}_t x_t
\;\le\; \left(\gamma_j \frac{v^{(0)}_t}{1+\mu^{R,j}_t} - v^{(0)}_t\right) x_t,
$$
where the last inequality uses vt(j) ≥ vt(0).
Rearranging,
$$
\gamma_j \frac{v^{(0)}_t}{1+\mu^{R,j}_t} - v^{(0)}_t
\;=\;
v^{(0)}_t \left(\frac{\gamma_j}{1+\mu^{R,j}_t}-1\right)
\;=\;
v^{(0)}_t \cdot \frac{\gamma_j-1-\mu^{R,j}_t}{1+\mu^{R,j}_t}
\;\le\;
\bar v \,(\gamma_j-1-\mu^{R,j}_t),
$$
using xt ≤ 1 and 1 + μtR, j ≥ 1.
Therefore,
γjpt − vt(j)xt ≤ v̄(γj − 1 − μtR, j).
Plugging into the update μt + 1R, j = μtR, j + ηjR(γjpt − vt(j)xt)
gives
μt + 1R, j ≤ μtR, j + ηjRv̄(γj − 1 − μtR, j) ≤ μtR, j + (γj − 1 − μtR, j) = γj − 1,
where we used ηjR ≤ 1/v̄.
Again the base case is the initialization μ1R, j = γj − 1.
▫
We can now state the pathwise feasibility conclusion.
Under the payment bound and the calibration condition, and with step sizes satisfying ηB, ηjR ≤ 1/v̄, MKP guarantees for every realized sequence of values and auction outcomes that
$$
\sum_{t=1}^T p_t \;\le\; B,
\qquad
\gamma_j \sum_{t=1}^T p_t \;\le\; \sum_{t=1}^T v^{(j)}_t x_t \quad
\forall j\in\mathcal R.
$$
Equivalently, constraints hold with probability 1 over the randomness in the
i.i.d. draws.
By Lemma~1, μT + 1B ≤ μ1B.
Using the accounting identity and ηB > 0,
$$
\sum_{t=1}^{T} (p_t-\rho)
\;=\;
\frac{\mu^B_{T+1}-\mu^B_1}{\eta^B}
\;\le\; 0,
$$
so $\sum_{t=1}^T p_t \le T\rho = B$.
Similarly, by Lemma~2 for each j,
$$
\sum_{t=1}^{T} (\gamma_j p_t - v^{(j)}_t x_t)
\;=\;
\frac{\mu^{R,j}_{T+1}-\mu^{R,j}_1}{\eta^{R}_j}
\;\le\; 0,
$$
and since μ1R, j = γj − 1,
we obtain $\gamma_j \sum_{t=1}^T p_t \le
\sum_{t=1}^T v^{(j)}_t x_t$ for all j. These arguments are deterministic
for any realized path, hence the ``probability 1’’ statement. ▫
Two interpretive points are worth emphasizing.
The proposition is not a statement about limiting behavior: multipliers may oscillate, constraints may alternate in being ``active,’’ and the market may never settle. Feasibility is nevertheless guaranteed because MKP enforces an upper barrier on each constraint-specific multiplier, and the multiplier is an affine transform of cumulative slack. This is precisely the operational notion of a guardrail: it must hold on the realized trajectory, not just on average.
The role of the calibration condition vt(j) ≥ vt(0) is to make the safe ROI level γj − 1 meaningful: when μt ≥ γj − 1, we have bt ≤ vt(0)/γj ≤ vt(j)/γj, and then the payment bound implies γjpt ≤ vt(j)xt. If the ROI numerator stream is noisier, delayed, or systematically smaller than the bidding value stream, then one must either (i) adjust the safe level as noted earlier, or (ii) accept weaker, typically expectation-based, feasibility notions. We view this as a substantive modeling choice rather than a technicality: ex-post safety is only as strong as the measurement system implementing the constraint.
Having established that MKP is safe on every realized history, we can now turn to the efficiency question: even without convergence, does conservative constraint-tracking preserve welfare relative to a suitable ex-ante benchmark, and how do we control the additional error created by switching among many potentially binding constraints?
Our next goal is to understand whether slack-tracking is merely a safety device or whether it also preserves allocative efficiency. The subtlety is that MKP need not converge: the identity of the binding constraint for a bidder may change repeatedly over time, and the resulting bids may be conservative in ways that are not well approximated by any fixed multiplier. Nevertheless, because values are i.i.d. over time and because the auction is assumed to be core-selecting (hence admits a per-round welfare-to-revenue charging argument), we can prove a constant-factor welfare guarantee relative to a natural ex-ante benchmark, up to an additive $\tilde O(\sqrt{T})$ term.
Because the primitives are i.i.d. over rounds, it is natural to
benchmark the realized sequence rule Φ induced by the auction plus the
bidders' algorithms against an ex-ante single-round allocation policy.
Concretely, let y : [0, v̄]n(1 + ∑kmk) → X
be any (possibly randomized) single-round allocation rule mapping the
realized per-round value profile v to a feasible allocation y(v) ∈ X. We
measure its ex-ante liquid welfare as
$$
W(y,F) \;:=\; \sum_{k=1}^n T \cdot \min\left\{\, \rho_k,\; \min_{j \in \mathcal R_k}
\frac{1}{\gamma_{k,j}}\, \mathbb{E}_{v \sim F}\!\left[y_k(v)\, v^{(j)}_k\right] \right\}.
$$
This benchmark is deliberately conservative: bidder k cannot contribute more than ρk per round on
average because of its total budget Bk = Tρk,
and cannot contribute more than its weakest ROI/CPA-implied per-round
cap $\min_j \frac{1}{\gamma_{k,j}}\mathbb
E[y_k(v) v^{(j)}_k]$.
Given the induced allocation-sequence rule Φ produced when all bidders run MKP
and the platform runs the core auction each round, we analogously
define
W(Φ, F) := 𝔼v ∼ FT[W(x(v), v)], x(v) = Φ(v),
where W(⋅, ⋅) is the
multi-constraint liquid welfare defined earlier. The main question in
this section is whether W(Φ, F) is
comparable to W(y, F) for every
benchmark y.
The efficiency proof begins from a per-round inequality implied by
core outcomes, which we use as a black box. Fix a round t and suppress t from notation. Let b = (b1, …, bn)
be the realized bid profile and (x, p) be the outcome of
the core auction. The relevant implication of core-ness is that no
feasible allocation rule y(⋅)
can be implemented by a blocking coalition that makes all its members
weakly better off while paying the seller. In its simplest aggregate
form, this yields the inequality
$$
\sum_{k=1}^n b_k\, y_k(v) \;\le\; \sum_{k=1}^n b_k x_k \;+\; \sum_{k=1}^n p_k.
$$
Combining with the payment upper bound pk ≤ bkxk
(Assumption (A1) in the previous section) immediately gives a familiar
``half approximation'' for bid welfare:
$$
\sum_{k=1}^n b_k\, y_k(v) \;\le\; 2\sum_{k=1}^n b_k x_k.
$$
This inequality is the per-round engine behind the global theorem: it says that, even though bidders shade their bids via multipliers, the auction outcome is never too inefficient in bid terms. The remaining work is to translate bid welfare back into liquid welfare, and to quantify the additional loss created by the fact that each bidder uses a hard maximum over many evolving constraint multipliers.
For each advertiser k and
round t, recall that MKP forms
an overall multiplier
$$
\mu_{k,t}=\max\Big\{\mu^B_{k,t},\;\max_{j\in\mathcal
R_k}\mu^{R,j}_{k,t},\;0\Big\},
\qquad
b_{k,t}=\frac{v^{(0)}_{k,t}}{1+\mu_{k,t}}.
$$
It is useful to label the identity of the maximizing component.
Define the active index
ak, t ∈ arg max {μk, tB, {μk, tR, j}j ∈ ℛk, 0},
breaking ties arbitrarily but measurably. This partitions time into regimes for
bidder k: budget-binding
rounds (ak, t = B),
ROI-binding rounds (ak, t = j),
and unconstrained rounds (ak, t = 0).
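For concreteness, the regime label can be computed as follows (a sketch; the string keys are our own labels):

```python
def active_index(mu_B: float, mu_roi: dict) -> str:
    """Return which component attains the max defining the overall
    multiplier: 'B' (budget), an ROI key, or '0' (unconstrained).
    Ties are broken by dict order, i.e., arbitrarily but fixed."""
    candidates = {"B": mu_B, **mu_roi, "0": 0.0}
    return max(candidates, key=candidates.get)
```

For example, `active_index(0.5, {"j1": 0.2})` is a budget-binding round, while `active_index(-1.0, {"j1": -0.5})` is unconstrained because all multipliers are below zero.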
If there were only one constraint, or if the active index never changed, we could proceed almost exactly as in the two-constraint analyses: we would use the per-round core inequality to compare allocations to any benchmark y, and then use slack-tracking to argue that shading is ``appropriate'' in the sense that the bidder either (i) bids close to value (hence loses little welfare) or (ii) is near its liquid cap (hence cannot contribute much more welfare anyway).
With many constraints, the technical obstacle is the time variation of ak, t. When constraints are close, MKP uses the most conservative multiplier, which may be slightly larger than the ``right'' one for the welfare comparison in that round. This conservatism is safe (by construction) but potentially wasteful. We therefore need an accounting device that bounds the total welfare loss due to frequent near-ties and switches.
We introduce a simple way to control the cumulative error created by
taking a hard maximum over many evolving multipliers. Fix a resolution
parameter Δ > 0. For each
bidder k, define the set of boundary rounds
$$
\mathcal B_k(\Delta) \;:=\; \Big\{\, t \;:\; \mu^{a_{k,t}}_{k,t} - \mu^{c}_{k,t} \le \Delta
\ \text{for some } c \neq a_{k,t} \Big\},
$$
where c ranges over the
components {B} ∪ ℛk ∪ {0}
and we write μk, t0 ≡ 0.
Intuitively, outside ℬk(Δ) there is a clear margin: the active multiplier exceeds every other candidate by at least Δ, so small perturbations in the identity of the max would not change bids much.
Two facts make this construction useful.
Because bids are of the form v(0)/(1 + μ),
the mapping μ ↦ 1/(1 + μ) is 1-Lipschitz on [0, ∞) in the sense that
$$
\left|\frac{1}{1+\mu}-\frac{1}{1+\mu'}\right|
\;\le\;
|\mu-\mu'|.
$$
Thus, if two candidate multipliers differ by at most Δ, then the induced bids differ by
at most Δ ⋅ v(0) ≤ Δv̄.
In boundary rounds, even if MKP ``chooses the wrong’’ constraint to be
conservative about, the welfare loss we attribute to this conservatism
can be upper bounded by O(Δv̄) per bidder,
hence O(nΔv̄)
per round. This is the first lever: near-ties are cheap on a per-round basis.
Each component multiplier evolves by an additive update with bounded increments. Indeed, for budget we have μk, t + 1B − μk, tB = ηkB(pk, t − ρk) and for ROI constraints μk, t + 1R, j − μk, tR, j = ηk, jR(γk, jpk, t − vk, t(j)xk, t). Under the standing bounds pk, t ≤ v̄ and vk, t(j)xk, t ≤ v̄, each increment has magnitude at most O(ηv̄).
Now consider, for any two components c ≠ c′, the gap
process gk, tc, c′ := μk, tc − μk, tc′.
Each step changes this gap by at most O(ηv̄), so for the
max identity to switch between c and c′ (or for c′ to come within Δ of c), the gap must traverse an
interval of width Δ. A
standard discretization argument then yields a bound of the form
$$
|\mathcal B_k(\Delta)| \;\lesssim\; \mathrm{poly}(m_k)\, \frac{\eta_k\, \bar v\, T}{\Delta},
$$
where ηk
denotes a representative step size scale and the poly(mk) factor
comes from taking a union bound over the O(mk2)
component pairs that can generate near-ties. The economic content of is
that a bidder cannot spend too many rounds in ``indecision’’ unless its
multipliers move enough to repeatedly erase and recreate gaps of size
Δ, and the updates are too
small to allow this at a high frequency.
Putting (i) and (ii) together, the welfare loss due to
multi-constraint switching can be bounded, up to constants, by a term of the form
$$
n\,\mathrm{poly}(m)\,\bar v\,\big(\Delta\,T \;+\; \eta\,\bar v\,T\big),
$$
where m = maxkmk.
Choosing $\eta\asymp 1/\sqrt{T}$ and a
matching $\Delta\asymp 1/\sqrt{T}$
yields an additive $\tilde
O\big(n\,\mathrm{poly}(m)\,\bar v\sqrt{T}\big)$ term, which is
the scale appearing in the main theorem below. While this accounting suppresses
constants and some bookkeeping, it captures the key design logic: we
trade off a small per-round ``approximation’’ error Δ against a bound on how often the
process can hover near boundaries at that resolution.
We now state the efficiency guarantee informally at the level needed for the remainder of the paper; the proof follows the template above, combining the per-round half-welfare inequality, the ex-post feasibility result from the previous section, and the multi-boundary accounting sketched above.
Fix any distribution F over
per-round value profiles, i.i.d. over time, and suppose the platform
runs a core auction each round satisfying the payment upper bound pk, t ≤ bk, txk, t.
Suppose all advertisers run MKP with step sizes on the order of $\eta\lesssim \sqrt{\log(\bar v n T)/T}$
(with constants depending polynomially on m = maxk|ℛk|).
Then, for any single-round benchmark allocation rule y,
$$
W(\Phi, F) \;\ge\; \tfrac{1}{2}\, W(y, F) \;-\; \tilde O\big(n\,\mathrm{poly}(m)\,\bar v\,\sqrt{T}\big).
$$
This inequality says that, even though bidders are learning and may be alternately budget- or ROI-constrained over time, the resulting market outcome achieves a constant fraction of any ex-ante feasible liquid benchmark. The 1/2 factor is inherited from the core-based charging argument and is tight for this proof technique; improving it would require either additional structure (e.g., stronger revenue-covering properties) or a different benchmark. The additive term is the finite-horizon price of learning and of managing many guardrails at once: it grows as $\sqrt{T}$ (so it vanishes on a per-round basis) and degrades polynomially with the number of constraints through the boundary accounting.
The theorem is not a convergence statement and does not imply that the market stabilizes; rather, it provides an operational welfare guarantee that holds regardless of equilibrium selection. From a policy and platform-design standpoint, the message is that adding multiple ROI/CPA guardrails need not destroy efficiency, but it does introduce a quantitative tradeoff: more constraints increase the frequency of near-tie situations in which bidders shade for the most conservative constraint, and bounding the welfare cost of this conservatism requires either smaller learning rates or accepting a larger finite-T error term. This is consistent with practice: multi-objective autobidders are typically stable but may appear ``slower'' to adapt when many KPIs are enforced simultaneously.
Having established an aggregate welfare guarantee, we next turn to individual learning guarantees. In particular, we will define a generalized per-round constraint-pacing benchmark and show that each advertiser achieves sublinear dynamic regret against that benchmark, with rates that depend on the total path length of the benchmark multipliers across constraints.
The welfare theorem above is deliberately agnostic about equilibrium selection: it certifies that the market performs well even if bidders do not converge. For practitioners, however, a complementary question is whether an individual autobidder is ``doing the right thing'' from its own perspective, given its constraints and the (possibly evolving) competitive environment. In this subsection we formalize such an individual guarantee via a dynamic regret benchmark that generalizes classical budget pacing to many ROI/CPA-type constraints.
Fix an advertiser k and
view the remainder of the system—the other bidders’ algorithms together
with the platform’s core auction—as part of the environment. For any
scalar multiplier μ ≥ 0,
consider the counterfactual in which bidder k shades its objective value in
round t by bidding
$$
b_{k,t}(\mu)\;=\;\frac{v^{(0)}_{k,t}}{1+\mu},
$$
while all other inputs are held fixed. Let xk, t(μ)
and pk, t(μ)
denote the resulting (random) allocation and payment to bidder k in that round. We define the associated expected spend and ROI-slack functions:
$$
Z^B_t(\mu) \;:=\; \mathbb E\big[p_{k,t}(\mu)\big], \qquad
Z^{R,j}_t(\mu) \;:=\; \mathbb E\big[\gamma_{k,j}\, p_{k,t}(\mu) - v^{(j)}_{k,t}\, x_{k,t}(\mu)\big].
$$
The interpretation is direct. A per-round budget target ρk = Bk/T
corresponds to the condition ZtB(μ) ≤ ρk,
while the jth ROI/CPA
constraint corresponds to ZtR, j(μ) ≤ 0.
The benchmark we use is a per-round comparator that, in each round, chooses the ``right'' multiplier for that round in expectation. Formally, define μtB, * to be any solution to
$$
Z^B_t(\mu) = \rho_k
$$
when such a solution exists, and otherwise set μtB, * := 0 (the case where even unshaded bidding does not spend at rate ρk). For each ROI constraint j ∈ ℛk, define μtR, j, * to be any solution in [0, γk, j − 1] to
$$
Z^{R,j}_t(\mu) = 0
$$
when such a solution exists, and otherwise set μtR, j, * := 0. (Restricting to [0, γk, j − 1] matches the usual ``safe'' region suggested by ROI pacing: beyond γk, j − 1 the bidder is so conservative that the constraint is automatically satisfied under mild alignment conditions between billed value and the jth constraint value stream. The analysis can be extended to a larger interval when one has global Lipschitz control.) Finally, define the overall per-round comparator multiplier as
$$
\mu^*_t \;:=\; \max\Big\{\mu^{B,*}_t,\; \max_{j\in\mathcal R_k} \mu^{R,j,*}_t\Big\}.
$$
This benchmark has two appealing economic properties. First, it is decomposable: each μtR, j, * captures how aggressively bidder k would have to shade in round t if it were only pacing for constraint j, while μtB, * captures pure budget pacing. Second, it is time-varying: the sequence {μt*}t ≤ T may drift over time as competition or value distributions change, which is exactly what we see in real auction markets when seasonality, product changes, or competitor entry shift the mapping from bids to spend and performance.
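When Zt is continuous and nonincreasing in μ (as when higher shading weakly lowers spend), the per-round comparator can be found by bisection; this is an illustrative sketch with the expected-spend map Z supplied by the caller, and it assumes Z eventually falls below the target as μ grows:

```python
def solve_pacing_multiplier(Z, target, mu_hi, tol=1e-9):
    """Bisection for the per-round comparator: find mu >= 0 with
    Z(mu) = target, assuming Z is continuous and nonincreasing.
    Returns 0.0 when even unshaded bidding does not reach the
    target (the convention used in the text)."""
    if Z(0.0) <= target:
        return 0.0
    lo, hi = 0.0, mu_hi
    while Z(hi) > target:          # grow the bracket until Z falls below target
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Z(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: if expected spend is 1/(1+mu), pacing to a rate of 0.2
# requires mu = 4 (since 1/(1+4) = 0.2).
mu_star = solve_pacing_multiplier(lambda mu: 1.0 / (1.0 + mu), 0.2, 1.0)
```

The monotonicity assumption is what makes a scalar search well-posed; with it, each component comparator μtB, * and μtR, j, * is a one-dimensional root-finding problem.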
To evaluate learning, we compare the realized objective value under
MKP to the objective value that bidder k would obtain if it could play the
benchmark sequence $\{\mu^*_t\}$ defined above. Let
$U_{k,t}(\mu) := v_{k,t}(0)\,x_{k,t}(\mu)$
denote bidder $k$'s round-$t$ realized objective value under multiplier $\mu$. We define the (expected) dynamic regret as
\[
\mathrm{Reg}_k(T) \;:=\; \mathbb E\Big[\sum_{t=1}^{T}\big(U_{k,t}(\mu^*_t)-U_{k,t}(\mu_{k,t})\big)\Big],
\]
where $\mu_{k,t}$
is the overall multiplier chosen by MKP. The expectation is over the
realized values, the auction randomness (if any), and any randomness in
tie-breaking. Importantly, MKP guarantees ex-post feasibility of all
constraints (Section~), so regret is a statement about performance .
The benchmark is deliberately conservative: it does not compare against an unconstrained best response (which may be ill-posed under budgets and ROIs), but instead against a pacing-style policy parameterized by a single scalar multiplier each round. This matches what industrial autobidders can actually implement at low latency, and it avoids conflating learning performance with the impossibility of anticipating future values under hard constraints.
A technical obstacle in proving the regret bound is that MKP updates constraint-specific multipliers every round, even when they are not the active component of the maximum defining $\mu_{k,t}$. Thus the update for a non-active constraint is computed using feedback generated under a larger (more conservative) multiplier than that constraint would itself imply. This produces what one may call a ``wrong-gradient'' effect.
The reason this is tolerable is economic: the error is one-sided. Updating a non-binding constraint using outcomes from a more conservative bid tends to overstate slack (because spending and allocation fall when we increase $\mu$), which can only make the corresponding multiplier smaller and hence less conservative in the future. Since feasibility is enforced by the maximum over multipliers and by safe initialization, such errors do not threaten constraint satisfaction; they only influence how quickly the algorithm approaches the appropriate level of conservatism.
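The update pattern just described — every multiplier adjusted each round from realized feedback, with the bid shaded by the maximum — can be sketched as follows. This is a simplified illustration under assumed step-size and projection choices, not the paper's exact MKP pseudocode.

```python
def mkp_round(vR, x, p, mu_B, mu_R, rho, gamma, eta):
    """One illustrative slack-tracking step (simplified; not exact MKP).

    mu_B is the budget multiplier and mu_R a list of ROI multipliers.
    Every multiplier is updated from the realized round, including the
    ones that are not active in the max -- the 'wrong-gradient' case
    discussed above.
    """
    # Budget pacing: spend above the per-round target rho raises mu_B.
    mu_B = max(0.0, mu_B + eta * (p - rho))
    # ROI constraint j: realized slack v_j*x - gamma_j*p.  Positive
    # slack relaxes the multiplier, negative slack tightens it; the
    # projection onto [0, gamma_j - 1] mirrors the 'safe' region.
    for j in range(len(mu_R)):
        slack = vR[j] * x - gamma[j] * p
        mu_R[j] = min(max(0.0, mu_R[j] - eta * slack), gamma[j] - 1.0)
    return mu_B, mu_R

def effective_bid(v0, mu_B, mu_R):
    """Shade the objective value by the most conservative multiplier."""
    mu = max([mu_B] + list(mu_R))
    return v0 / (1.0 + mu)
```

Note the one-sidedness in code form: when the realized round was generated under a larger maximum multiplier, `slack` is biased upward, so the non-active `mu_R[j]` can only be pushed down, never unsafely up.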
To translate this intuition into a quantitative regret bound, we impose mild regularity on the environment faced by bidder k.
There exist constants $L^B$ and $\{L^{R,j}\}_{j\in\mathcal R_k}$ such that for all $\mu,\mu'\ge 0$ and all rounds $t$,
\[
|Z^B_t(\mu)-Z^B_t(\mu')|\le L^B|\mu-\mu'|, \qquad |Z^{R,j}_t(\mu)-Z^{R,j}_t(\mu')|\le L^{R,j}|\mu-\mu'|.
\]
This assumption is satisfied, for instance, when the distribution of
effective competing bids has a bounded density and the allocation rule
is monotone in bids; it is also the natural continuity condition needed
for any per-round ``solve for $\mu$'' comparator such as \eqref{eq:muB-star}--\eqref{eq:muR-star} to be
well-behaved.
We now state the individual learning guarantee at the level needed for our applications. For each comparator sequence, define its path length:
\[
P^B_T := \sum_{t=2}^T \big|\mu^{B,*}_t-\mu^{B,*}_{t-1}\big|, \qquad P^{R,j}_T := \sum_{t=2}^T \big|\mu^{R,j,*}_t-\mu^{R,j,*}_{t-1}\big| \quad (j\in\mathcal R_k).
\]
These quantities measure how non-stationary the ``right’’ pacing levels
are. Small path length corresponds to a stable market (or stable
objectives), while large path length corresponds to drift, entry, or
regime shifts.
Fix an advertiser $k$. Under boundedness of values in $[0,\bar v]$, the payment upper bound $p_{k,t}\le b_{k,t}x_{k,t}$, and the Lipschitz response assumption above, MKP with appropriately tuned step sizes guarantees
\[
\mathbb E\big[\mathrm{Reg}_k(T)\big] \;\le\; \tilde O\Big(m_k\,T^{7/8} \;+\; T^{3/4}\big(1 + P^B_T + \textstyle\sum_{j\in\mathcal R_k} P^{R,j}_T\big)\Big),
\]
where the $\tilde O(\cdot)$ notation suppresses logarithmic factors in $(n,T,\bar v)$ and polynomial dependence on the Lipschitz constants and on $m_k=|\mathcal R_k|$.
Several comments help interpret this guarantee. First, regret is sublinear in $T$ even when the comparator moves, as long as its cumulative movement (path length) is not too large. Second, the dependence on $\{P^{R,j}_T\}$ is additive across ROI constraints: each additional guardrail introduces an additional drifting target that the algorithm tracks, and the bound makes this cost explicit. Third, the exponents reflect the fact that we are solving a constrained tracking problem under bandit feedback and with a max-coupling across constraints; sharper rates are possible under stronger feedback (e.g., full-information counterfactuals) or stronger curvature assumptions, but the rate above is sufficient for an ``operational'' guarantee that improves with scale.
When the environment is stationary in the sense that the expected response functions $Z^B_t$ and $Z^{R,j}_t$ do not depend on $t$ (as in our i.i.d. model), we may take $\mu^{B,*}_t \equiv \mu^{B,*}$ and $\mu^{R,j,*}_t \equiv \mu^{R,j,*}$, so that $P^B_T = P^{R,j}_T = 0$ for all $j$. In this case the bound simplifies to
\[
\mathbb E\big[\mathrm{Reg}_k(T)\big] \;\le\; \tilde O\big(m_k\,T^{7/8} + T^{3/4}\big),
\]
and hence the average per-round regret vanishes:
$$
\frac{1}{T}\mathbb E\big[\mathrm{Reg}_k(T)\big]\to 0.
$$
Equivalently, MKP asymptotically matches the performance of the best
pacing multiplier (subject to the induced constraints) in the stationary
limit, while still guaranteeing ex-post constraint satisfaction along
the entire finite sample path. This is the sense in which slack-tracking
is not merely a safety mechanism: it is also an individually rational
learning rule that adapts to market conditions without requiring
convergence of bids or multipliers.
From a platform-design perspective, the corollary helps explain a stylized empirical fact: when markets are stable, multiplier-based autobidders tend to settle into approximately constant pacing levels, and performance differences across algorithms are largely transient. When markets are unstable, the relevant question is not whether the algorithm converges, but whether it tracks the moving ``right'' level quickly enough; the path-length terms in the regret bound provide a quantitative handle on that tracking problem.
Having established both an aggregate welfare guarantee and an individual learning guarantee, we next discuss extensions and limitations of the analysis, including approximate-core auctions and modest forms of non-stationarity.
Our analysis is intentionally modular: the feasibility guarantees are driven by the slack-tracking updates and require only minimal per-round accounting, while the welfare and regret guarantees rely on additional structural properties of the auction and of the environment faced by a bidder. This modularity makes it natural to ask what happens when we relax the cleanest assumptions (exact core, i.i.d. values, and purely monetary constraints). We discuss several such extensions and, equally importantly, where the current proof techniques appear to reach their limits.
The welfare proof uses a core-type inequality to ``charge’’ welfare loss to payments (or to foregone payments) when bidders shade. In practice, however, large ad platforms often implement allocation and pricing rules that are only approximately welfare-maximizing, and core-like properties may hold only up to approximation. Sources of approximation include discretization (bid rounding and minimum bid increments), reserve prices and other business rules, throttling and eligibility constraints, and engineering constraints that require decomposing a global allocation problem into independent subproblems.
A useful way to formalize these frictions is to assume the auction is $(\alpha,\varepsilon)$-approximately core-stable in the following high-level sense: for every realized round and every feasible alternative allocation $\tilde x_t \in X$, the payments in the realized outcome cannot be jointly ``undercut'' by the coalition that would benefit from $\tilde x_t$, except up to a multiplicative factor and/or an additive slack. Concretely, one can posit that there exist parameters $\alpha\ge 1$ and $\varepsilon\ge 0$ such that the inequality used in the charging argument holds with a weakening of the form \[ \text{(exact core inequality)}\quad\Longrightarrow\quad \text{(approximate inequality)}\;\;\ge\;\;\frac{1}{\alpha}\cdot(\cdot)\;-\;\varepsilon. \] While the exact algebra depends on the particular core notion adopted by the mechanism, the economic implication is robust: the multiplicative constant in the welfare theorem degrades by at most a factor $\alpha$, and an additional additive term of order $\varepsilon T$ appears. Put differently, approximate-core auctions still support a meaningful welfare guarantee, but the platform ``pays''
for deviations from exact core either by reducing the welfare fraction
that can be certified or by introducing an error that accumulates
linearly over time.
This extension is important for practice because it shifts attention to which frictions are consequential. Small discretization errors correspond to a small ε and are essentially harmless at scale, whereas systematic distortions (e.g., aggressive reserves) effectively increase α by weakening the ability to charge welfare losses to others’ payments. From a design standpoint, our framework therefore suggests a concrete engineering objective: if the platform cannot (or does not wish to) implement an exact core outcome, it can still preserve the welfare logic by keeping the implied (α, ε) parameters small in the regions of the bid space where autobidders most often operate.
A further limitation is interpretability across formats. Many real-world auctions combine multiple ad formats and constraints (sponsored search, shopping, display) in ways that blur the feasible set X and complicate any single notion of ``core.’’ Our welfare argument is most compelling when the platform can credibly view each round as selecting an outcome from a well-defined downward-closed feasibility set and when prices are linked to bids via a monotonic mechanism. When these conditions fail, approximate-core is still a reasonable diagnostic, but verifying it becomes an empirical question rather than a purely theoretical one.
We imposed i.i.d. values over time primarily to cleanly define an ex-ante welfare benchmark and to avoid conflating learning guarantees with adversarial drift. That said, the individual regret guarantee already points to a more realistic model of market dynamics: the mapping from a bidder's multiplier to its spend and ROI outcomes can drift as competitors enter, as budgets renew, or as user composition changes. The path-length terms explicitly quantify this drift at the level of the comparator multipliers, and this is precisely the notion of non-stationarity that practitioners tend to care about.
Two observations are worth emphasizing. First, feasibility is unaffected by drift. The budget and ROI constraints are enforced along the realized sample path, and the slack-tracking invariant does not require any probabilistic stability. Thus, even in a highly non-stationary environment, MKP continues to guarantee no violations provided the per-round accounting conditions hold. Second, the right benchmark is a moving one. When the ``right'' multiplier changes from week to week, a fixed-multiplier comparator is no longer the appropriate yardstick; the per-round comparator sequence $\{\mu^*_t\}$ is the right object, and its path length becomes the measure of difficulty.
At the market level, extending the welfare theorem to non-stationarity is more delicate because our benchmark there is ex-ante and distributional. One natural extension replaces a single distribution F by a sequence {Ft} and compares to the sum of per-round ex-ante liquid welfare under Ft. Conceptually, the same charging argument can be applied round-by-round, but the proof must also control the additional boundary terms created by frequent switching of the binding constraint when the environment drifts. Our current analysis already incurs such boundary terms even in stationary settings (through the ``switching’’ amortization), and non-stationarity effectively increases their magnitude by making switches more frequent. This suggests a practical take-away: if the platform anticipates strong seasonality, it may be beneficial to reduce the frequency of hard regime changes in the allocation rule (for example, by smoothing reserve adjustments), since abrupt platform-side changes mechanically increase the difficulty of tracking for all bidders.
A complementary algorithmic response is to introduce explicit restarts or adaptive step sizes. In drifting environments, constant step sizes can react faster but may amplify noise, while decaying step sizes stabilize but adapt slowly. Restarting (e.g., monthly) can approximate the best of both worlds when drift is episodic. These modifications are well understood in online optimization, but incorporating them while preserving our clean ex-post feasibility guarantee requires care because restarts reset multipliers and therefore reset accumulated slack. One safe approach is to restart only downward (i.e., to reduce conservatism) when one can certify that cumulative slack is sufficiently positive; formalizing such ``safe restarts'' is a promising direction.
We assumed vk, t(j) ∈ [0, v̄] to keep both feasibility and regret statements transparent. In many performance advertising settings, however, the realized value of an impression (especially for conversion-related KPIs) can be heavy-tailed. While one can interpret vk, t(j) as a clipped or winsorized value (as is common operationally), it is useful to understand what the theory is buying from boundedness.
For feasibility, boundedness primarily enters through step-size constraints that prevent multipliers from overshooting their ``safe’’ region in a single update. With heavy tails, a single outlier could cause a large update unless the algorithm uses robust gradients (e.g., truncation) or adaptive step sizes that cap per-round change. For regret, heavy tails require concentration arguments that are not available under mere bounded second moments, particularly in the presence of bandit feedback. Practically, platforms and bidders already implement various forms of smoothing and cap-and-floor logic on both spend and performance signals; our analysis can be viewed as certifying that such logic is not merely heuristic but structurally necessary for worst-case safety.
A related limitation concerns measurement error: the constraint values vk, t(j) may themselves be estimated (e.g., predicted conversions) rather than observed truth. In that case, the algorithm can be feasible with respect to the constraint while violating the true one. A robust approach is to incorporate statistical margins (confidence intervals) directly into the constraint updates, effectively tightening γk, j in the updates to account for uncertainty. Doing so preserves the no-violation spirit but introduces an explicit conservatism–efficiency tradeoff that must be calibrated.
Budgets and ROI/CPA constraints are ``monetary’’ in the sense that they are naturally expressed in terms of payments and value-per-impression streams. Many real campaigns also face non-monetary constraints: frequency caps (do not show too many ads to the same user), minimum delivery or reach requirements, share-of-voice targets, brand-safety exclusions, and pacing constraints at finer granularities (hourly or geo-level budgets). These constraints typically couple across impressions and depend on user-level state, making them qualitatively different from the aggregate constraints we study.
Nevertheless, the slack-tracking template suggests a plausible route to integration. A frequency cap, for instance, can be written as a constraint on a cumulative exposure count. If the bidder observes its realized exposures (even with delay), one can add an additional multiplier whose update tracks exposure slack. The key difference is how the multiplier enters bidding: for budgets and ROI, a scalar multiplier naturally scales down an objective value. For frequency caps, the ``cost’’ of showing an additional impression depends on the current user-level exposure state, so a single scalar multiplier may be too coarse unless the platform provides a state-aggregated signal (e.g., an expected marginal frequency cost). In other words, extending the model requires either (i) richer bid adjustments (a context-dependent multiplier), or (ii) platform-side aggregation that maps user-level constraints into an effective per-impression shadow price.
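As a thought experiment, the exposure-slack idea can be written in the same update template as budgets and ROI. The function below is hypothetical — the text only argues such an extension is plausible — and the `cap_rate` signal is an assumed platform-provided aggregation of user-level state:

```python
def exposure_multiplier_update(mu_F, exposures_t, cap_rate, eta):
    """Illustrative frequency-cap tracking (hypothetical extension).

    exposures_t : realized exposures this round (possibly delayed);
    cap_rate    : per-round exposure budget implied by the cap.
    Under-delivery against the cap (positive slack) relaxes the
    multiplier; over-delivery tightens it.
    """
    slack = cap_rate - exposures_t
    return max(0.0, mu_F - eta * slack)
```

The scalar form makes the limitation in the text visible: a single `mu_F` ignores which users were exposed, which is precisely why a context-dependent multiplier or platform-side shadow price may be needed.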
This reveals a broader design lesson. Constraints that are naturally dualized at the impression level (payments, predicted value streams) fit cleanly into multiplier bidding. Constraints that are inherently combinatorial or stateful (frequency, reach) require additional structure—either more information to the bidder or more responsibility on the platform to enforce feasibility. Our current results therefore do not directly justify the common industry practice of bundling all controls into a single multiplier; rather, they clarify when such bundling is principled and when it is an approximation that should be validated empirically.
Finally, many KPI streams are delayed: conversions may be observed hours or days after an impression, and attribution rules can reassign credit retroactively. Delays fundamentally change the feedback model. Payments are typically known immediately, so the budget update remains well posed, but ROI/CPA updates depend on vk, t(j)xk, t, which may be revealed only after a random delay. One can treat this as online learning with delayed feedback, for which regret bounds usually incur an additional term scaling with the total delay. The more pressing issue for our setting is feasibility: if the bidder cannot observe ROI slack in real time, it may overspend relative to the true constraint before the multiplier reacts.
A conservative solution is to update ROI multipliers using predicted or lower-confidence-bound values until true outcomes arrive. This mirrors how production systems often operate: they pace to predicted conversions and then correct ex post. The theoretical challenge is to quantify the welfare loss induced by this conservatism and to characterize when the correction dynamics remain stable. We view this as an important direction for future work because delays are the norm rather than the exception in performance advertising, and a theory that explicitly accounts for them would substantially strengthen the operational relevance of slack tracking.
Taken together, these extensions delineate a boundary between what is structurally robust in our approach and what is contingent on modeling choices. Ex-post feasibility is remarkably robust: it survives approximate auctions, non-stationarity, and many forms of noise so long as the per-round accounting remains valid. Welfare and regret guarantees are more sensitive: they depend on the auction’s degree of core-ness, on continuity of response to bid shading, and on how rapidly the ``right’’ pacing levels drift. These sensitivities motivate the empirical questions we turn to next: when multiple constraints compete to bind, how often do switches occur in realistic environments, how large are the resulting transient losses, and how do these effects compare across competing autobidding rules.
While our theory is deliberately worst-case, it is useful to complement it with a controlled numerical study that isolates the two phenomena that are hardest to see from the statements alone: (i) how conservative bidding induced by ROI-type constraints interacts with competition in a repeated auction, and (ii) how often the identity of the binding constraint switches in regimes where several constraints are simultaneously ``near tight.’’ Our goal here is not to claim empirical realism, but to provide a sanity check and a set of diagnostics that practitioners can reproduce with their own calibrated simulators.
We simulate a repeated single-slot auction with a downward-closed feasibility set: in each round t, at most one advertiser receives the impression, so xk, t ∈ {0, 1} and ∑kxk, t ≤ 1. The allocation rule is the standard welfare-maximizing rule with respect to submitted bids: the impression is allocated to the highest effective bid bk, t, and the winner pays the second-highest bid (ties broken uniformly). This choice has two practical advantages: it respects the per-round accounting condition pk, t ≤ bk, txk, t, and it cleanly separates strategic shading (through multipliers) from the platform-side allocation logic.
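A minimal implementation of this round-level mechanic (illustrative; it assumes at least two bidders and breaks ties uniformly, as stated):

```python
import random

def run_round(bids):
    """One single-slot second-price round (assumes >= 2 bidders).

    The highest effective bid wins, the winner pays the second-highest
    bid, and ties for the top bid are broken uniformly at random.
    The payment never exceeds the winner's bid, so p <= b * x holds.
    """
    top = max(bids)
    winners = [k for k, b in enumerate(bids) if b == top]
    k_win = random.choice(winners)
    payment = max(b for k, b in enumerate(bids) if k != k_win)
    return k_win, payment
```

Because the payment is the second-highest bid, the per-round accounting condition $p_{k,t}\le b_{k,t}x_{k,t}$ holds by construction in every sample path.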
Values are i.i.d. across rounds, but can be correlated across streams
within a bidder. Concretely, we draw a latent quality $q_{k,t}\sim\mathrm{Unif}[0,1]$ and then set
\[
v_{k,t}(0) = \bar v\, q_{k,t}, \qquad v_{k,t}(j) = \bar v\,\big(\lambda_j\, q_{k,t} + (1-\lambda_j)\,\xi_{k,t}(j)\big),
\]
where $\xi_{k,t}(j)\sim\mathrm{Unif}[0,1]$ are independent and $\lambda_j\in[0,1]$
controls alignment between the objective stream and constraint stream
j. This generates the two intuitive cases: when λj ≈ 1,
objective and constraint values move together (ROI is ``predictable’’
from the objective), whereas when λj ≈ 0, ROI is
noisy and switching across constraints is more likely.
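The value-generation step is a direct transcription of these formulas, vectorized over bidders (function and argument names are illustrative):

```python
import numpy as np

def draw_values(rng, n, m, v_bar, lam):
    """Draw one round of objective and constraint value streams.

    q ~ Unif[0,1] is the latent quality per bidder;
    v(0) = v_bar * q, and
    v(j) = v_bar * (lam_j * q + (1 - lam_j) * xi_j), xi_j ~ Unif[0,1].
    lam is an array of shape (m,) of alignment parameters.
    """
    q = rng.uniform(size=n)                 # latent quality per bidder
    xi = rng.uniform(size=(n, m))           # independent constraint noise
    v0 = v_bar * q
    vR = v_bar * (lam[None, :] * q[:, None] + (1.0 - lam[None, :]) * xi)
    return v0, vR
```

Setting `lam[j] = 1.0` makes stream `j` coincide with the objective stream, the fully aligned case; `lam[j] = 0.0` makes it pure noise, the switching-prone case.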
Each advertiser k has a total budget Bk = ρkT and mk ROI constraints indexed by ℛk, with strictness parameters γk, j ≥ 1. To expose switching, we vary three knobs: the constraint strictness γk, j, the alignment λj, and the number of ROI constraints mk.
Unless otherwise stated we take $\bar v = 1$, $n \in \{10, 50\}$, and $T \in \{10^4,\, 10^5,\, 5\cdot 10^5\}$. For each configuration we run 50 independent seeds and report means and interquartile ranges.
All bidders participate simultaneously (self-play). We compare three bidding rules, each using only bandit feedback of its realized $(x_{k,t}, p_{k,t})$ and realized $v_{k,t}(j)$: (i) MKP, the max-aggregated slack-tracking rule analyzed above; (ii) an active-only heuristic that updates only the constraint currently attaining the maximum multiplier; and (iii) a sum-aggregator variant that shades by the sum of the multipliers rather than their maximum.
Step sizes are chosen to satisfy the safe-update conditions in our feasibility propositions; in practice we set $\eta \propto T^{-1/2}$ and then scale by a small constant so that one-step changes are visibly smooth.
We track three families of outcomes.
For budget we report $\max\{0,\sum_{t=1}^T p_{k,t}-B_k\}$, and for each ROI constraint we report $\max\{0,\ \gamma_{k,j}\sum_t p_{k,t} - \sum_t v_{k,t}(j)\,x_{k,t}\}$. We also report the fraction of runs with any positive violation.
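These violation metrics can be computed directly from per-bidder logs; the sketch below is a transcription of the formulas above (argument names are illustrative):

```python
def violations(payments, values_R, x, B, gamma):
    """Ex-post violation magnitudes for one bidder.

    payments, x  : per-round payments and allocations;
    values_R[j]  : realized j-th constraint value stream over rounds;
    B, gamma     : total budget and ROI strictness parameters.
    """
    spend = sum(payments)
    # Budget: positive part of total spend over budget.
    budget_viol = max(0.0, spend - B)
    # ROI j: positive part of gamma_j * spend minus collected value.
    roi_viol = [max(0.0, g * spend - sum(v * xi for v, xi in zip(vs, x)))
                for g, vs in zip(gamma, values_R)]
    return budget_viol, roi_viol
```

Reporting the fraction of seeds with any positive entry then gives the second diagnostic mentioned above.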
We compute realized liquid welfare W(x, v) = ∑kWk(x, v) and normalize it by an offline benchmark computed on the realized sample path: the best feasible allocation sequence subject to all advertisers’ budgets and ROI constraints, solved as a mixed-integer program for the single-slot environment. (This benchmark is stronger than the ex-ante benchmark used in the theorem, so ratios here should be interpreted as descriptive rather than a direct test of the bound.)
Exact dynamic regret against the per-round comparator μt* is difficult to compute in self-play because each agent’s comparator depends on others’ actions. We therefore use two proxies: (i) regret against the best constant multiplier in hindsight for each bidder (computed by replay with logged values), and (ii) regret against a slowly varying comparator obtained by fitting, for each bidder, the multiplier path that would have made its constraints tight in expectation under the observed opponent bid distribution (estimated in sliding windows).
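Proxy (i) can be sketched by replaying logged rounds under a grid of constant multipliers. For brevity, this illustration omits the constraint-feasibility filtering that a full replay would include, and all names are illustrative:

```python
import numpy as np

def best_constant_regret(v0, opp_max, realized_value, mu_grid):
    """Regret proxy: best constant multiplier in hindsight via replay.

    v0[t]          : bidder's logged objective values;
    opp_max[t]     : highest competing effective bid in round t;
    realized_value : objective value the bidder actually collected.
    A constant mu wins round t iff v0[t]/(1+mu) > opp_max[t].
    """
    best = -np.inf
    for mu in mu_grid:
        wins = v0 / (1.0 + mu) > opp_max
        best = max(best, float(v0[wins].sum()))
    return best - realized_value
```

In a second-price round the replayed allocation depends only on whether the shaded bid clears the top competing bid, which is why logged `opp_max` values suffice for this proxy.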
Across all configurations we tested, MKP produced zero observed violations (budget and all ROI constraints) up to numerical tolerance, consistent with the ex-post nature of the guarantee. In contrast, the active-only update heuristic exhibits violations precisely in the regimes that motivate multi-constraint tracking: when the identity of the binding constraint changes, the inactive multipliers fail to ``accumulate slack’’ and can lag far behind. Violations are especially pronounced when λj is small (misalignment), because the realized ROI slack becomes noisy and the active constraint can flip frequently even when average strictness parameters are similar.
The sum-aggregator variant almost never violates constraints, but for a different reason: it is so conservative that it often underspends the budget by a wide margin. This mirrors a common platform observation: ``safety’’ can be purchased by conservative control, but at a potentially large welfare cost.
To connect welfare outcomes to the theoretical discussion about
boundary rounds and regime switching, we measure, for each bidder,
$$
S_k := \sum_{t=2}^T \mathbf{1}\{a_{k,t}\neq a_{k,t-1}\},
$$
where ak, t ∈ {B} ∪ ℛk
denotes which constraint attains the maximum multiplier in round t. Two patterns are robust.
First, when there is a clear ``dominant’’ constraint (e.g., one ROI constraint has much larger γk, j than the rest, or the budget is extremely tight), switching is rare (Sk/T below 1%), and MKP’s welfare is close to the oracle constant-multiplier baseline. In these cases, the max-aggregator behaves like a single-constraint pacer most of the time, and the additional multipliers simply certify feasibility.
Second, when several constraints are near tight and moderately misaligned (e.g., two ROI constraints with similar γ but different λ), switching becomes prevalent (Sk/T in the 5%–20% range, increasing with mk). Exactly in these configurations, the welfare gap between MKP and the oracle baseline widens, but in a way that is informative: most of the loss is concentrated in a small subset of ``switch rounds’’ in which the bidder’s effective bid changes abruptly because a different constraint becomes active. This is the empirical analogue of the boundary terms in the proof outline: the algorithm remains safe, but the price of safety is paid in transient conservatism when constraints compete to bind.
When we plot the best-constant-multiplier regret proxy as a function of T, MKP's per-round regret decays roughly like $T^{-1/2}$ in the aligned regimes (large λ), and more slowly in the misaligned switching regimes. Active-only updates display the opposite tradeoff: they can appear competitive on objective value early on (because they are less conservative), but the accumulated constraint violations grow with T, making them infeasible as a long-run policy. The sum-aggregator has low regret in the constrained sense (it rarely approaches violations) but high regret in objective value due to persistent underbidding.
A practical interpretation is that switching is not merely a proof artifact: it is a genuine hardness parameter for online control with multiple constraints. In environments where switching is endemic (multiple comparable constraints, noisy KPI streams, frequent competitive shocks), bidders should expect slower adaptation, and platforms should expect larger transient efficiency losses from any safe pacing rule.
Increasing n makes the auction ``thicker’’ and mechanically increases the marginal value of precise shading: small differences in bk, t more often change who wins the impression. In our simulations, thicker markets amplify both the upside of well-tuned multipliers (higher welfare relative to naive budget-only pacing) and the downside of switching (larger welfare drops on switch rounds). This is consistent with the economic intuition that in competitive auctions, conservative errors are more costly because they more frequently flip the allocation margin.
We emphasize three limitations. First, the single-slot environment abstracts away from rich feasibility sets and from approximate-core issues; multi-slot and combinatorial settings are likely to increase switching because different inventory segments can activate different constraints. Second, we assumed immediate observation of vk, t(j)xk, t; delayed attribution would weaken the ability to react, and we expect the gap between safe and aggressive heuristics to widen. Third, our offline benchmark is not the same as the ex-ante benchmark in the theorem; it is included as a diagnostic rather than a direct test.
With those caveats, the numerical study supports three takeaways that align with the theory. (i) Slack tracking across constraints is not just a technical convenience; it materially reduces violations relative to ``update-only-the-active-constraint’’ heuristics. (ii) The welfare cost of multi-constraint safety is driven disproportionately by switching episodes, suggesting that measuring and predicting switching is valuable for both bidders and platforms. (iii) Conservative aggregation rules (like summing multipliers) can buy safety but at an efficiency cost that is visible even in simple stationary environments. These observations motivate two operational diagnostics: logging the active-constraint identity ak, t and its switch rate, and stress-testing pacing controllers specifically in synthetic regimes engineered to induce frequent switching.