By 2026, the modal large advertiser no longer ``sets bids'' in any meaningful human sense. Instead, the advertiser configures an autobidder: a piece of software that ingests noisy, fast-moving signals about marginal value, converts these into an effective bid for each auction, and adapts continuously as delivery accumulates. What has changed in practice is not merely the scale of automation, but the multiplicity of objectives and guardrails that the advertiser expects the system to respect. A typical campaign is tasked with maximizing one key performance indicator (KPI) while simultaneously honoring several accounting and attribution constraints: total spend must not exceed budget; return-on-investment (ROI) or cost-per-action (CPA) must remain within a specified range; and, increasingly, there are additional internal constraints that operationalize business or compliance requirements (for example, profitability by product line, incremental lift thresholds, or channel-specific efficiency). Our starting point is that the platform continues to run a per-round auction, but the economically relevant behavior is shaped by the interaction of many adaptive constraint-handling algorithms rather than by a single static bidding rule.
This paper takes that multi-constraint reality seriously. We study a repeated auction environment in which each advertiser has an objective value stream (the KPI being maximized) and, for each of several ROI/CPA-type constraints, an associated ``constraint value stream'' that measures the numerator of the constraint. These streams may be correlated within a round and are only observed through realized outcomes, reflecting the common situation in which advertisers learn about value and constraint satisfaction from their own deliveries and bills. The substantive question is then not whether the platform can compute an equilibrium for a stylized game, but whether the combination of (i) a reasonable auction rule and (ii) lightweight, implementable bidder-side updates can deliver allocations that are both feasible (no budget or ROI violations) and efficient (high aggregate value) over long horizons.
A natural response from classical auction theory is to ask for equilibrium predictions: what happens when each bidder best-responds, and do the resulting bids clear? In the autobidding setting, however, the premise required for equilibrium analysis is typically absent. Advertisers are not choosing bids as static actions; they are running online learning systems that update in response to feedback, and those updates operate on different time scales across advertisers. Even if the environment were stationary, the induced game is not: each agent’s algorithm changes its mapping from observed data to bids over time. In addition, the informational primitives that would support equilibrium reasoning are not present. An advertiser may understand its own attribution model and budget, but it seldom has a stable, accurate model of the joint distribution of other bidders’ values and constraints, and it certainly does not observe others’ constraint slacks. When values and conversion rates drift with seasonality, creative refreshes, or macroeconomic shocks, stationarity is at best an approximation, and convergence guarantees become a tenuous target.
In this context, insisting on equilibrium can be counterproductive for both theory and practice. On the theory side, equilibrium results often require restrictive assumptions on information, continuity, and best responses that do not map cleanly onto the bandit-feedback regime faced by bidders. On the practice side, platforms and advertisers need guarantees that hold under continual adaptation: budgets and ROI constraints are not aspirational long-run goals but hard constraints that finance teams expect to be satisfied on the realized history. It is therefore more informative to ask for guarantees that hold without convergence: can we design bidder-side updates that keep constraints satisfied ex post, and can we still say something nontrivial about welfare when everyone is adapting simultaneously?
Our answer is affirmative. We analyze a multi-multiplier pacing approach that generalizes the familiar idea of ``pacing'' a bid by a scalar to respect a budget. Here each advertiser maintains a collection of multipliers—one for the budget and one for each ROI/CPA-type constraint—and submits an effective bid that scales down the objective value by the (largest) active multiplier. Intuitively, each multiplier tracks the cumulative slack of its associated constraint: when spend exceeds a per-round target, the budget multiplier increases; when an ROI constraint is tight, its multiplier increases. Crucially, all multipliers update each round using only the advertiser's realized allocation and payment, so the algorithm is implementable under bandit feedback. The main conceptual point is that we do not require the multipliers to converge, nor do we require the profile of bids to settle into a steady state. Instead, we exploit monotonicity properties of allocations and payments in the effective bid to show that the slack-tracking dynamics can enforce feasibility while still allowing the auction to allocate efficiently in aggregate.
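One lightweight instantiation of these slack-tracking dynamics can be sketched as follows. This is an illustrative sketch under our own simplifying choices (bid shading via $b = v^{(0)}/(1+\max_j \mu_j)$, a fixed step size, and a simple projected update), not the paper's exact algorithm; all function and variable names are ours.

```python
def pace_round(v0, v_constr, rho, gammas, mults, eta, run_auction):
    """One round of multi-multiplier pacing (illustrative sketch).

    mults[0] paces the budget; mults[j] (j >= 1) paces ROI/CPA constraint j.
    Each multiplier tracks the cumulative slack of its constraint: it rises
    when the constraint was violated this round and falls (toward 0) otherwise.
    """
    bid = v0 / (1.0 + max(mults))      # shade by the tightest (largest) multiplier
    alloc, pay = run_auction(bid)      # bandit feedback: own allocation and payment
    # Budget slack: spend above the per-round rate rho tightens mults[0].
    mults[0] = max(0.0, mults[0] + eta * (pay - rho))
    # ROI slack: payment not covered by constraint value tightens mults[j].
    for j, gamma in enumerate(gammas, start=1):
        slack = v_constr[j - 1] * alloc - gamma * pay
        mults[j] = max(0.0, mults[j] - eta * slack)
    return bid, alloc, pay, mults
```

Because each multiplier only moves in response to its own constraint's realized slack, the update uses no information about other bidders, matching the bandit-feedback regime described above.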
To evaluate efficiency, we need an aggregate objective that respects the presence of constraints. Raw welfare—the sum of realized values $\sum_{k,t} v^{(0)}_{k,t} x_{k,t}$—is not the right benchmark when bidders are financially constrained. A bidder with an enormous nominal value but a tiny budget cannot realize that value at scale; similarly, a bidder with strict ROI requirements cannot rationally spend into low-ROI impressions even if those impressions have high nominal KPI value. If we judge an outcome by raw welfare, we risk labeling as ``inefficient'' an allocation that is actually the best achievable under the constraints that bidders face. This is not merely a modeling preference: in applied settings, budget and ROI constraints are binding precisely because they reflect real resource limits (cash flow, profitability, or risk tolerance). An efficiency concept that ignores them can recommend allocations that are infeasible for the participants and thus irrelevant for policy and mechanism design.
We therefore adopt liquid welfare as the appropriate efficiency yardstick in constrained environments, and we extend it to the multi-constraint setting. The guiding interpretation is simple: each advertiser's contribution to welfare should be capped by what that advertiser can feasibly ``turn into'' value given its financial and ROI restrictions. In the budget-only case, liquid welfare replaces an agent's total realized value by the minimum of that value and the budget; the cap reflects the fact that beyond the budget the agent cannot pay and thus cannot credibly demand allocation. With ROI/CPA constraints, the effective cap is not only monetary but also value-denominated: if an advertiser must maintain a minimum ratio between a constraint-specific value stream and payments, then its feasible scale is limited by the accumulated constraint value. The resulting multi-constraint liquid welfare aggregates these caps across advertisers, yielding a metric that is both economically meaningful and analytically tractable.
This welfare notion has two practical virtues. First, it respects heterogeneity in constraint tightness. Two advertisers may have the same objective KPI values but different ROI thresholds; liquid welfare credits the advertiser that can productively spend more under its constraints without imputing unattainable surplus to the stricter advertiser. Second, it aligns naturally with the platform's operational goals. Platforms typically want an allocation that is ``as efficient as possible'' subject to advertisers' declared budgets and efficiency constraints, because repeated violations lead to churn, regulatory scrutiny, or contractual disputes. Liquid welfare formalizes this by measuring efficiency within the feasible region carved out by those constraints.
The analytic challenge is then to connect a dynamic, non-convergent bidding process to a clean welfare guarantee. Our results proceed in an ex-ante benchmarking style that is common in modern analyses of repeated mechanisms: we compare the expected liquid welfare induced by the auction and bidders’ algorithms to the best expected liquid welfare achievable by a fixed, single-round allocation rule that has access to the value profile in that round. This benchmark is demanding in two ways. It does not presume that the actual dynamics settle down to any equilibrium; and it allows the benchmark to be welfare-maximizing in each round subject only to ex-ante feasibility caps induced by budgets and ROI constraints. The striking implication is that even when the bidders are learning and constraints are multiple, we can still guarantee a constant-factor approximation to this ex-ante optimum, up to a sublinear additive term that vanishes on a per-round basis.
A key modeling choice behind this guarantee is to focus on auctions with a strong welfare property, namely core-like deviation-proofness. While the precise definition belongs later in the paper, the economic idea is that the auction outcome is stable against coalitional deviations in a way that allows us to ``charge'' losses in one place to payments collected elsewhere. This charging logic is particularly well-suited to environments with pacing: when a bidder shades its bid to satisfy constraints, welfare may fall relative to a fully truthful, unconstrained allocation, but the reduction can be bounded in terms of payments and thus amortized across the market. The contribution of the present work is to show that this amortization can survive the additional complexity introduced by multiple constraints per advertiser, where the identity of the binding constraint can change over time. In practice, this change-of-regime phenomenon is exactly what one observes when an advertiser alternates between being budget-limited, ROI-limited, or unconstrained as demand conditions shift.
Why should we care about such a welfare guarantee, as opposed to merely proving constraint satisfaction? The reason is that feasibility alone can be achieved by trivial conservatism: one can always bid near zero and violate nothing. What matters for market design is the tradeoff between safety and performance. Our analysis makes this tradeoff explicit: multipliers act as shadow prices for constraints, and the learning rates govern how quickly bidders react to realized slack. When learning is too aggressive, a bidder can oscillate and risk overshooting a constraint; when learning is too timid, the bidder may underdeliver and forgo value. By proving that a simple slack-tracking update can achieve both ex-post feasibility and near-optimal liquid welfare, we provide a formal justification for the common industry practice of multiplier-based pacing, and we clarify how multi-KPI requirements change the nature of the efficiency loss.
There is also a policy-facing aspect to this perspective. As automated bidding systems become the default, regulators and advertisers increasingly ask whether outcomes are interpretable and whether constraints are honored in a verifiable way. A design that guarantees ex-post feasibility for budgets and ROI-type constraints is inherently more auditable than one that guarantees feasibility only in expectation or only at an equilibrium that may never be reached. Moreover, liquid welfare offers a principled way to talk about ``market health'' when participants are constrained: it avoids conflating efficiency with the ability to pay, while still recognizing that constrained demand should not be treated as fully elastic. In settings where platforms must justify allocation decisions to large advertisers or to oversight bodies, being able to point to a welfare objective that internalizes the declared constraints can be valuable.
At the same time, we emphasize what our model does not claim. We do not assume that advertisers report values truthfully, nor do we model the full complexity of attribution and measurement error; instead, we treat the realized value streams as primitives and study how an algorithm reacts to them under limited feedback. We also abstract from intertemporal complementarities in delivery (such as learning-to-target effects), and we adopt an i.i.d. structure over time to isolate the strategic and algorithmic issues created by multiple constraints. These simplifications are deliberate: our aim is to understand which guarantees are possible in a repeated auction when bidders are constrained and adaptive, not to provide a complete engineering model of ad delivery.
The remainder of the paper builds from this motivation to a concrete algorithmic and welfare analysis. We first formalize the multi-constraint pacing dynamics and show that, with appropriate initialization and step sizes, all constraints are satisfied on every realized history despite bandit feedback. We then connect these feasibility guarantees to expected liquid welfare, showing that the market-level outcome retains a robust fraction of the best ex-ante feasible welfare even without convergence of bids or multipliers. Finally, we complement the welfare perspective with individual-level learning guarantees, bounding each advertiser’s regret relative to a natural per-round pacing benchmark that captures time variation in optimal constraint shadow prices. In combination, these results suggest a coherent economic message: in multi-KPI autobidding markets, convergence is an unrealistic desideratum, but constraint-respecting efficiency—measured appropriately via liquid welfare—remains both meaningful and attainable.
Our work sits at the intersection of (i) budget pacing in repeated auctions, (ii) ROI/CPA-constrained autobidding, (iii) equilibrium models of paced bidding, and (iv) online learning with knapsack-type constraints. The common theme is that advertisers face intertemporal feasibility constraints but interact through per-round auctions. The point of departure in the present paper is the combination of multiple ROI/CPA-type constraints per advertiser with a welfare objective that is explicitly constraint-aware. This combination turns out to be technically nontrivial even when one restricts attention to simple multiplier updates and to auctions with strong welfare properties.
A foundational line of work studies ``pacing'' as a bidder-side control problem in repeated auctions, where the bidder adjusts a scalar multiplier to keep cumulative spend within a budget. In these models, the bidder typically observes only its realized payment and allocation each round, and the goal is to satisfy the budget constraint while obtaining high value. A key conceptual advance is that one can seek guarantees that hold without assuming that the induced dynamics converge to a stationary point. This is particularly appropriate for ad markets, where the distribution of opportunities and competition changes frequently and bidder-side learning systems update continuously.
Gaitonde et al. provide an influential analysis in this direction. They formalize budget pacing as an online learning problem in repeated auctions and show that a simple update of a budget multiplier can guarantee ex-post budget feasibility together with a constant-factor welfare (or value) guarantee relative to an ex-ante benchmark, even when all bidders are simultaneously adapting. Importantly, the proofs do not rely on convergence to a Nash equilibrium; instead, they exploit monotonicity of payments in the multiplier and a charging argument that relates welfare loss from bid shading to payments collected in the market. Our analysis adopts the same high-level stance—robust performance guarantees in nonstationary, nonconvergent environments—and extends it along two axes: multiple constraints per bidder and a welfare notion that must internalize those constraints.
Lucier et al. study an autobidding model with both a budget constraint and an ROI (or CPA) constraint, together with bandit-style feedback. Their algorithm maintains two multipliers (one for budget and one for ROI) and bids using the more conservative of the two. A central technical idea is a slack-tracking invariant: each multiplier encodes the cumulative slack of its corresponding constraint, and safe initialization plus appropriately small step sizes prevent the cumulative slack from becoming negative, yielding ex-post feasibility. On the efficiency side, Lucier et al. connect this bidder-side control to market-level welfare in a repeated auction, again without requiring convergence.
Our ``multi-KPI'' setting generalizes this budget+ROI framework by allowing an arbitrary number of ROI/CPA-type constraints per advertiser. At a high level, one might expect the extension to be straightforward: maintain a multiplier for each constraint and bid according to the maximum. The subtlety is that the welfare proof becomes materially harder. With two constraints, there are only a few ``regimes'' (budget-binding, ROI-binding, or neither), and one can control the number of times the dynamics hover near the boundary where the identity of the binding constraint switches. With many constraints, boundary behavior proliferates: multiple multipliers can be near-tied, and the process can switch frequently among them. Bounding the welfare loss attributable to such switching requires new amortization ideas beyond simply union bounding over constraints.
A large literature models pacing as an equilibrium phenomenon: bidders choose a pacing multiplier (or an equivalent shading factor) so that, in expectation, they spend their budget across a distribution of auctions. In the simplest versions, each bidder draws a value per auction, bids a shaded value, and the multiplier is chosen to make expected spend match the budget. This leads to fixed-point characterizations of ``pacing equilibria'' and comparative statics in terms of budgets, value distributions, and competition. These analyses are valuable for understanding steady-state market outcomes and for platform-side forecasting.
Our focus is different in two respects. First, the equilibrium assumption is often not a good approximation for modern autobidding, where the mapping from signals to bids is updated continuously and bidder populations are heterogeneous in their learning and constraint-management rules. Second, even when a fixed point exists, it does not by itself provide the operational guarantee advertisers demand: that budgets and ROI constraints hold on the realized trajectory. The welfare and feasibility guarantees we pursue are therefore ``pathwise’’ (ex post) and robust to nonconvergence, rather than asymptotic properties of a stationary equilibrium.
That said, the equilibrium literature informs our modeling choices. In particular, the use of a scalar multiplier as a shadow price for feasibility constraints is common across both equilibrium and learning-based approaches. One can interpret our multipliers as time-varying approximations to the Lagrange multipliers that would arise in a (stochastic) constrained optimization formulation. The point is not that the multipliers converge to these Lagrange values, but that the shadow-price interpretation continues to organize the economic logic: higher multipliers correspond to tighter constraints and therefore more conservative bidding.
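The shadow-price reading can be made concrete. In our notation (a heuristic per-agent sketch, suppressing the round index and the dependence on others' bids), the constrained problem admits a Lagrangian of the form
$$
\max_{x}\; \mathbb{E}\big[ v^{(0)} x \big]
\;-\; \mu_0\,\mathbb{E}\big[ p - \rho \big]
\;-\; \sum_{j} \mu_j\, \mathbb{E}\big[ \gamma_j\, p - v^{(j)} x \big],
$$
where $\mu_0$ prices the budget (via the per-round rate $\rho$) and each $\mu_j$ prices an ROI/CPA guardrail. The multiplier updates in our dynamics can be read as stochastic subgradient steps on this dual: a positive realized violation term raises the corresponding $\mu$, and slack lowers it, without any claim that the $\mu$'s converge to the optimal dual values.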
From the perspective of online learning, budget pacing resembles ``bandits with knapsacks'' (BwK): an agent repeatedly chooses actions, observes stochastic rewards and resource consumption, and must keep cumulative consumption within a budget. The BwK framework yields powerful algorithms and regret guarantees, but it also highlights fundamental limitations. In adversarial or highly nonstationary environments, one cannot in general guarantee both high reward and strict constraint satisfaction without additional structure; moreover, many BwK guarantees are in expectation and allow violations with small probability, which can be unacceptable for hard financial constraints.
The repeated-auction setting differs from canonical BwK in ways that cut both for and against strong guarantees. On the one hand, the bidder does not control the mapping from bids to outcomes; allocations and payments are generated by an auction mechanism and depend on others' bids. On the other hand, auctions often provide monotonicity and ``single-parameter'' structure: increasing one's bid (or decreasing one's multiplier) tends to weakly increase allocation and payment. This structure can substitute for stochastic optimization assumptions and can enable ex-post feasibility via slack tracking. Our analysis can be read as identifying conditions under which the auction environment is ``well-behaved enough'' that the strongest form of feasibility (no violations on any realized history) is compatible with meaningful welfare guarantees.
The multi-constraint setting also connects to generalizations of BwK with multiple resources. While the algorithmic literature provides methods for handling multiple knapsack constraints, transferring these guarantees to strategic multi-agent auctions is not immediate. The key difficulty is that resource consumption is endogenous to other agents’ learning and to the mechanism, and the bidder receives only bandit feedback about its own outcomes. Our approach therefore follows the pacing tradition more closely than the general BwK template: rather than solving a global primal-dual program each round, each agent runs simple per-constraint multiplier updates and relies on monotonicity plus safe bounds to ensure feasibility.
A recurring theme in constrained mechanism design is that unconstrained welfare is not the right yardstick when agents face hard budgets. Liquid welfare was introduced to address this issue by capping each agent's contribution by its budget, yielding an efficiency notion that respects financial feasibility. Subsequent work studies approximation mechanisms and price-of-anarchy guarantees under liquid welfare in one-shot auctions and games. In repeated settings, ex-ante versions of liquid welfare provide a natural benchmark: one compares an algorithm's expected outcome to the best single-round allocation rule subject to ex-ante caps induced by budgets.
Our contribution is not to introduce a new welfare concept from scratch, but to extend this logic to environments with multiple ROI/CPA-type constraints per agent, where the effective feasibility cap is no longer purely monetary. Intuitively, an ROI constraint limits how much value-denominated ``numerator'' the bidder can generate per unit of spend, so the bidder's feasible scale is capped by whichever constraint is tightest on the realized history. This creates a minimum structure in the agent's liquid value, and it is precisely this minimum that complicates welfare analysis under learning dynamics: the binding constraint can switch across time in response to both realized outcomes and the agent's own conservative bidding.
It is helpful to isolate what changes when we move from one ROI constraint to many. At the feasibility level, a natural algorithmic rule suggests itself: maintain a multiplier for each constraint and bid according to the largest multiplier. Safe initialization and slack-tracking updates can be shown to preserve feasibility constraint-by-constraint, essentially because each constraint’s multiplier only increases when its slack is negative. While there are additional bookkeeping issues (e.g., bounding all multipliers simultaneously), the feasibility proof largely scales by repeating the two-constraint argument for each constraint.
The welfare analysis, however, is not a straightforward repetition. The reason is that welfare proofs in pacing models typically partition time into regimes: rounds where the bidder is ``high-bidding'' (multiplier small) and rounds where some constraint binds and the bidder shades its bid. In the two-constraint case, one can control the measure of boundary rounds where the identity of the binding constraint changes, and one can charge welfare loss in conservative rounds to either accumulated slack or to payments in aggressive rounds. With many constraints, there can be many near-ties: several multipliers can be close, and the identity of the maximum can change frequently due to noise in realized payments and constraint values. Each such switch can create a ``boundary error'' round in which the welfare accounting is loose: the bidder might be bidding according to constraint j while constraint j′ is almost as tight, causing the charging argument to incur an additive loss.
At a high level, the multi-constraint problem thus resembles a multi-expert switching process driven by noisy gradients: the algorithm effectively selects the maximum among several evolving multipliers, and the welfare bound must control the cumulative cost of switching and near-switching. This is where the technical novelty lies. One needs a potential argument that (i) is compatible with bandit feedback and (ii) bounds the total number (or total ``mass'') of rounds in which the maximum is not well-separated. Our approach develops such an amortization, yielding a welfare loss that grows polynomially (rather than exponentially) with the number of constraints, and remains sublinear in the time horizon.
The broader autobidding literature includes many practical systems that handle multiple objectives and constraints via heuristic combinations of multipliers, calibration layers, and prediction models. Our aim is to provide a clean economic and algorithmic account of when such a multi-multiplier approach can be justified by rigorous guarantees. The price we pay is abstraction: we treat the per-round value streams as primitives, we focus on i.i.d. time structure to isolate the learning-vs.-feasibility tradeoff, and we work with auctions that satisfy strong welfare properties. These assumptions are common in the pacing literature, but they delimit the scope of the results. In particular, extending comparable welfare guarantees to nonstationary value distributions, richer feedback models (e.g., delayed conversions), or weaker auction properties remains an important direction for future work.
In summary, our paper can be viewed as a step from ``budget pacing'' and ``budget+ROI pacing'' to a setting that more closely matches current advertiser practice: multiple KPI-aligned guardrails enforced simultaneously by a single autobidder. The key message, consistent with the recent nonconvergence literature, is that one can obtain robust feasibility and meaningful welfare guarantees even when all agents are learning, provided one chooses an efficiency benchmark that internalizes the constraints and develops proof techniques that control the boundary effects created by multi-constraint switching.
We model an advertising market as a repeated auction environment in which each advertiser delegates bidding to an autobidder that must simultaneously manage a budget and multiple ROI/CPA-type constraints. The purpose of this section is to separate the economic primitives (values, constraints, and feasibility) from the algorithmic choice of how to bid and learn. Doing so clarifies what information the bidder actually uses, what is being optimized, and what efficiency benchmark is appropriate once feasibility is taken seriously.
Time is discrete with rounds $t \in \{1, \dots, T\}$. In each round, the platform runs an auction and chooses an allocation vector $x_t = (x_{1,t}, \dots, x_{n,t}) \in X \subseteq [0,1]^n$, where $x_{k,t}$ represents the (possibly fractional) quantity of the opportunity awarded to advertiser $k$ in round $t$. The set $X$ is assumed downward-closed: if $x \in X$ and $0 \le x' \le x$ coordinatewise, then $x' \in X$. This captures common environments such as a single indivisible impression (where $X$ is the simplex), multiple identical units, or position auctions with fractional click-through scaling, and it provides the monotonicity structure needed for pacing-style arguments.
Each advertiser $k$ submits a nonnegative effective bid $b_{k,t} \ge 0$, yielding bid profile $b_t$. The auction then chooses an allocation $x_t$ and payments $p_t = (p_{1,t}, \dots, p_{n,t})$ with $p_{k,t} \ge 0$. We assume the platform runs a core auction in each round. Informally, this means: (i) the allocation is welfare-maximizing with respect to the submitted bids (so the mechanism allocates to maximize $\sum_k b_{k,t} x_{k,t}$ over $X$), and (ii) the resulting payments are in the core (so no coalition of bidders can profitably deviate, given the mechanism's outcome). Core-selecting auctions are used in practice in several ad and procurement settings precisely because they combine strong welfare properties with stability against coordinated deviations.
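A minimal concrete instance of such a round, assuming a single indivisible item sold first-price (our illustrative choice, not the paper's general mechanism), makes the two properties easy to see: the item goes to the highest effective bid (bid-welfare maximization over the simplex), and the winner pays its bid, so the per-unit payment never exceeds the per-unit bid.

```python
def first_price_round(bids):
    """One single-item, first-price round (illustrative sketch).

    Allocation maximizes sum_k b_k * x_k over the simplex (item to the
    highest bid); the winner pays its bid, so p_k <= b_k * x_k holds for
    every bidder, with equality for the winner.
    """
    winner = max(range(len(bids)), key=lambda k: bids[k])
    alloc = [0.0] * len(bids)
    pay = [0.0] * len(bids)
    if bids[winner] > 0:
        alloc[winner] = 1.0
        pay[winner] = bids[winner]
    return alloc, pay
```

Note the monotonicity used throughout the pacing arguments: raising one's own bid (equivalently, lowering one's multiplier) weakly increases both allocation and payment in this rule.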
For the feasibility results that follow, we additionally use a standard no-overbidding-type condition on the realized outcome:
$$
p_{k,t} \;\le\; b_{k,t}\, x_{k,t} \qquad \text{for all } k, t.
$$
This inequality holds in many common auction formats (e.g., first-price and certain core-selecting implementations) and has a direct economic interpretation: per unit allocated, the bidder never pays more than its submitted per-unit bid. This condition is the bridge between bid shading (via multipliers) and hard ex-post feasibility.
A distinctive feature of modern autobidding is that a single bidder may care about several KPIs simultaneously. We represent this by allowing multiple value streams per advertiser in each round: $\{v^{(0)}_{k,t}\} \cup \{v^{(j)}_{k,t} : j \in \mathcal{R}_k\}$, where $v^{(0)}_{k,t} \in [0, \bar v]$ is the advertiser's per-unit value for its KPI (the numerator of what the advertiser aims to maximize), and each $v^{(j)}_{k,t} \in [0, \bar v]$ is a per-unit value associated with constraint $j$. We emphasize two modeling choices that match practice: the objective and constraint streams can be distinct, and they can be arbitrarily correlated.
For example, an advertiser might optimize expected purchases (objective) but impose a return-on-ad-spend (ROAS) constraint using predicted revenue, or impose a cost-per-acquisition (CPA) constraint using predicted conversions. Even when these quantities are related, they are not identical: they can be computed from different models, can have different calibration, and can respond differently to targeting and auction selection. Our formulation allows the objective stream v(0) and constraint streams v(j) to be distinct and correlated in arbitrary ways.
We assume the per-round value profile is drawn i.i.d. over time from a distribution F (the components may be arbitrarily correlated across advertisers and across streams within a round). The i.i.d. assumption is not meant to deny nonstationarity in ad markets; rather, it isolates the central technical issue we study: how a bidder can guarantee and obtain meaningful welfare guarantees even when all bidders are learning and outcomes are endogenous to the market interaction. Extending the welfare analysis to richer forms of nonstationarity is an important direction, but the i.i.d. baseline already captures substantial within-round complexity and strategic interaction.
Given an allocation $x_{k,t}$, the realized objective contribution for advertiser $k$ in round $t$ is $v^{(0)}_{k,t} x_{k,t}$, and the realized constraint-numerator contributions are $v^{(j)}_{k,t} x_{k,t}$. The bidder does not directly choose $x_{k,t}$; it is induced by the auction through the submitted bid.
Advertiser k faces two classes of intertemporal feasibility constraints.
Total spend cannot exceed a hard budget $B_k$:
$$
\sum_{t=1}^{T} p_{k,t} \;\le\; B_k.
$$
We denote the per-round budget rate by $\rho_k := B_k/T$, which serves as the natural scale at which a pacing algorithm tries to regulate spending.
For each $j \in \mathcal{R}_k$, the advertiser must satisfy a linear inequality of the form
$$
\sum_{t=1}^{T} v^{(j)}_{k,t} x_{k,t} \;\ge\; \gamma_{k,j} \sum_{t=1}^{T} p_{k,t}.
$$
This template captures standard ratio guardrails after normalization. For instance: (i) a ROAS constraint ``revenue / spend $\ge r$'' corresponds to the template with $v^{(j)}$ equal to revenue-per-unit and $\gamma_{k,j}=r$; (ii) an ROI constraint of the form ``profit / spend $\ge r$'' can be represented similarly if the numerator stream tracks profit-per-unit; (iii) a CPA constraint ``spend / conversions $\le c$'' is equivalent to ``$(1/c)\cdot$spend $\le$ conversions'' and can be fit into the template by scaling the constraint stream (or, equivalently, by redefining $\gamma$ after a change of units). The key point is that the template expresses that, over the horizon, accumulated constraint value must cover cumulative spend at a constraint-specific exchange rate $\gamma_{k,j}$.
In practice, advertisers often maintain multiple guardrails simultaneously (e.g., a minimum ROAS plus a maximum CPA, or separate CPA constraints for different conversion types). This motivates allowing |ℛk| to be arbitrary and advertiser-specific.
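To make the normalization concrete, here is a minimal sketch (the helper name and the stream labels are ours, not the paper's) of how the three guardrails above map into the common template γ · spend ≤ Σ v(j) x:

```python
def guardrail_to_template(kind: str, target: float):
    """Map a business guardrail to the common template
    gamma * total_spend <= total constraint-stream value.
    Returns (gamma, stream), where `stream` names the per-unit
    numerator the autobidder must track. Illustrative only."""
    if kind == "roas":          # revenue / spend >= target
        return target, "revenue_per_unit"
    if kind == "roi":           # profit / spend >= target
        return target, "profit_per_unit"
    if kind == "cpa":           # spend / conversions <= target,
        # equivalent to (1/target) * spend <= conversions
        return 1.0 / target, "conversions_per_unit"
    raise ValueError(f"unknown guardrail kind: {kind}")
```

A CPA target of 2.0 per conversion, for example, becomes γ = 0.5 against a conversions-per-unit stream, exactly the change of units described above.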
Finally, we impose a standard per-round cap:
$$
b_{k,t} \;\le\; v^{(0)}_{k,t}.
$$
This is not merely a technical convenience. Autobidders are typically
constrained (either by design or by advertiser policy) to avoid bidding
above their modeled value for the optimized KPI. As we discuss below,
the bidder will still shade bids below v(0) to ensure long-run
feasibility; the cap prevents pathological ``overbidding to catch up''
behavior.
Advertiser k’s
unconstrained objective is to maximize total objective value
$$
\sum_{t=1}^T v^{(0)}_{k,t} x_{k,t},
$$
subject to the feasibility constraints above. Because the auction outcome
depends on all bidders’ actions, one should not expect to exactly
optimize this objective in a strategic environment. Our focus is
therefore on (i) feasibility guarantees that hold on the realized path
and (ii) market-level welfare benchmarks that incorporate those
constraints.
A central modeling step is to evaluate allocations using multi-constraint liquid welfare. For a realized allocation sequence x = (x1, …, xT) and realized values v, we define
$$
W_k(x,v) \;:=\; \min\left\{\, B_k,\; \min_{j \in \mathcal R_k}
\frac{1}{\gamma_{k,j}} \sum_{t=1}^T v^{(j)}_{k,t} x_{k,t} \right\}.
$$
Liquid welfare is then $W(x,v) := \sum_{k=1}^n
W_k(x,v)$.
The economic content of this definition is straightforward: advertiser k cannot turn arbitrarily large allocations into utility because scaling up allocation also scales up spend, and spend must be supported by both money (budget) and KPI numerator (ROI/CPA). The effective ``cap'' on feasible scale is therefore the tightest among these limits. When there is only a budget, the definition reduces to standard liquid welfare; with multiple ROI/CPA guardrails, the cap becomes the minimum across constraints, reflecting that any one binding guardrail can throttle feasible spending.
This welfare notion also matches the practical objective of an autobidder operating under hard guardrails: a campaign is not successful if it generates high raw value while violating a CPA or ROAS target on the realized trajectory. The liquid benchmark explicitly discounts value that could only be obtained by violating constraints.
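As an illustration, a single advertiser's liquid value can be computed from realized totals as the tightest of its caps; this is a sketch under our reading of the definition above, with variable names of our own choosing:

```python
def liquid_value(budget, gammas, stream_totals):
    """Per-advertiser multi-constraint liquid value: the minimum of
    the budget cap and each ROI/CPA-implied cap
    (1/gamma_j) * realized constraint-numerator total."""
    caps = [budget] + [tot / g for g, tot in zip(gammas, stream_totals)]
    return min(caps)

# One binding guardrail throttles the cap even when others are slack:
# caps are [100, 300/2, 40/0.5] = [100, 150, 80], so the value is 80.
w = liquid_value(budget=100.0, gammas=[2.0, 0.5], stream_totals=[300.0, 40.0])
```

The second guardrail binds here despite ample budget and slack in the first, which is exactly the min-across-constraints behavior described in the text.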
Each round proceeds as follows. First, the value streams for all advertisers are realized. Advertiser k observes its own realizations {vk, t(0), vk, t(j)}j ∈ ℛk and then submits an effective bid bk, t (a single number) to the auction. The platform computes xt and charges payments pt. After the round, advertiser k observes its own realized outcome (xk, t, pk, t) together with its own value streams (equivalently, it can compute the realized constraint ``slacks’’ γk, jpk, t − vk, t(j)xk, t).
Crucially, we assume : each advertiser sees only its own outcomes, not other bidders’ values, bids, allocations, or payments. This matches how real autobidders operate. Even when platforms provide reporting, it is typically aggregated, delayed, and noisy relative to the per-auction dynamics. Our feedback assumption is therefore deliberately austere: any learning or feasibility guarantee should be achievable without requiring market-wide observability.
We also note a limitation. Many constraint numerators (e.g., conversions, revenue) can be delayed relative to the auction. Modeling delay would require additional state variables and would generally weaken ex-post guarantees, since the bidder cannot immediately tell whether it is violating a CPA/ROAS target. We abstract from this and treat vk, t(j) as observed at the end of round t, which is appropriate when constraints are enforced using predicted (or otherwise promptly observed) KPI numerators.
Because values are i.i.d. over time, a natural welfare benchmark is ex-ante:
one compares the induced allocation sequence rule (generated by the
auction coupled with bidders’ adaptive bidding) to the best allocation
rule that, in expectation, respects the same feasibility caps. Formally,
if y(⋅) maps a realized
per-round value profile to an allocation in X, then its expected
multi-constraint liquid welfare takes the form
$$
\sum_{k=1}^n T \cdot \min\left\{\, \rho_k,\; \min_{j \in \mathcal R_k}
\frac{1}{\gamma_{k,j}} \mathbb{E}_{v \sim F}\!\left[y_k(v)
v^{(j)}_k\right] \right\},
$$
which mirrors the liquid-welfare definition at the per-round rate level. Our main results will show
that there exists a simple bidder-side learning rule that (i) satisfies
the budget and ROI constraints on every realized history and (ii) when run by all bidders in a core
auction yields a constant-factor approximation to this ex-ante benchmark
up to a sublinear-in-T
additive term.
The remaining task is algorithmic: to specify how an advertiser converts multiple constraints into a single bid each round, using only its bandit feedback, while preserving hard feasibility and enabling welfare accounting at the market level. We turn to this next.
We now specify the bidder-side procedure that turns many intertemporal guardrails into a single effective bid each round. The design goal is deliberately stringent: the autobidder should (i) require only its own realized feedback, (ii) submit a scalar bid to the auction, and yet (iii) enforce constraints on the realized trajectory without relying on convergence of market dynamics. The algorithm we study is a direct multi-constraint generalization of slack-tracking pacing, with one multiplier per constraint and a conservative aggregation rule that determines the bid.
Fix an advertiser k and
suppress k when unambiguous.
MKP maintains a separate nonnegative Lagrange-multiplier-like variable
for each feasibility constraint:
μtB (budget
multiplier), μtR, j (ROI/CPA
multiplier for each j ∈ ℛ).
These multipliers should be read as shadow prices on the guardrails. When the budget is running ``hot''
(spend above the per-round rate), μtB
grows and the bidder shades bids down; when an ROI constraint is
underperforming (spend not sufficiently supported by its numerator
stream), the corresponding μtR, j
grows and also shades bids down.
The key modeling constraint is that the auction accepts only a single
bid bt. We
therefore define an overall multiplier by a conservative aggregation:
$$
\mu_t \;=\; \max\Big\{\mu^B_t,\; \max_{j\in\mathcal R}\mu^{R,j}_t,\; 0\Big\}.
$$
Given μt,
the effective bid is a multiplicatively paced version of the objective
value:
$$
b_t \;=\; \frac{v^{(0)}_t}{1+\mu_t}.
$$
This bid automatically respects the per-round cap bt ≤ vt(0)
and is monotone in the conservative multiplier: larger μt implies
weaker bidding and therefore (under standard monotonicity of
welfare-maximizing allocations and the payment upper bound pt ≤ btxt) weakly
lower spending.
The max-aggregation encodes a simple economic principle: the tightest guardrail governs the bid. Put differently, if any guardrail is close to violation, we want today's bid to be consistent with that guardrail even if other guardrails currently have slack. This is the sense in which MKP is ``safe by design'': it is conservative in the direction that protects feasibility.
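In code, the conservative aggregation and the induced bid are one line each (a sketch; the function and argument names are ours):

```python
def effective_bid(v0: float, mu_budget: float, mu_roi: dict) -> float:
    """Bid a multiplicatively paced version of the objective value,
    shaded by the strongest constraint pressure (max-aggregation).
    The max with 0.0 ensures the bid never exceeds v0."""
    mu = max(mu_budget, max(mu_roi.values(), default=0.0), 0.0)
    return v0 / (1.0 + mu)

# The tightest guardrail governs: mu = max(0.25, 1.0, 0.0) = 1.0,
# so the bid is 10.0 / (1 + 1.0) = 5.0.
b = effective_bid(10.0, 0.25, {"roas": 1.0})
```

Note that a negative multiplier (accumulated slack) is clipped to zero in the bid, so the per-round cap b ≤ v(0) holds by construction.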
After bidding , the auction returns an allocation xt and payment
pt. The
bidder then updates every constraint multiplier using its realized
one-step slacks. Define the budget slack increment and ROI slack increments
as
$$
g^B_t := p_t - \rho, \qquad
g^{R,j}_t := \gamma_j p_t - v^{(j)}_t x_t \quad \text{for each } j \in \mathcal R,
$$
where ρ = B/T is the
per-round budget rate (for advertiser k) and we have again suppressed
k for readability. Positive
g means ``we overspent relative to the budget rate'' or ``we violated the ROI inequality in this round,'' while negative g means we accrued slack.
MKP performs additive updates with constraint-specific step sizes:
$$
\mu^B_{t+1} \;=\; \mu^B_t + \eta^B g^B_t, \qquad
\mu^{R,j}_{t+1} \;=\; \mu^{R,j}_t + \eta^R_j g^{R,j}_t.
$$
Two comments are important for interpretation.
These updates coincide with (stochastic) gradient ascent on the Lagrangian of the constrained problem, viewing (pt − ρ) and (γjpt − vt(j)xt) as subgradients of constraint violations. In many online primal–dual methods, such updates are analyzed via convergence of time averages. Our objective is different: we use the same algebraic structure to obtain pathwise feasibility guarantees, and later to support welfare accounting even when no limit behavior exists.
The slack increments depend only on (pt, xt) and the bidder's own value streams. The bidder does not need to estimate counterfactual allocations under alternative bids, nor observe other bidders. This matters in practice because autobidders can reliably measure their own spend and attributed KPI numerators (or their predictions), while market-wide observability is typically infeasible.
Algorithmically, MKP can be summarized as follows.
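A minimal sketch of the loop, under the notation above (the class and method names are ours, and the auction interaction is abstracted into the `bid`/`update` interface):

```python
class MKPState:
    """Sketch of multi-constraint pacing: one additive multiplier per
    guardrail, conservative (safe) initialization, and a
    max-aggregated multiplicatively paced bid."""

    def __init__(self, B, T, vbar, gammas, eta_B, eta_R):
        self.rho = B / T                      # per-round budget rate
        self.mu_B = vbar / self.rho - 1.0     # safe budget level vbar/rho - 1
        self.mu_R = {j: g - 1.0 for j, g in gammas.items()}  # safe ROI levels
        self.gammas, self.eta_B, self.eta_R = gammas, eta_B, eta_R

    def bid(self, v0):
        # conservative aggregation: the strongest pressure wins
        mu = max(self.mu_B, max(self.mu_R.values(), default=0.0), 0.0)
        return v0 / (1.0 + mu)

    def update(self, p, x, v_streams):
        # additive slack-tracking updates; positive slack = pressure grows
        self.mu_B += self.eta_B * (p - self.rho)
        for j, g in self.gammas.items():
            self.mu_R[j] += self.eta_R[j] * (g * p - v_streams[j] * x)
```

Each round the autobidder calls `bid` with its realized objective value, observes its own (x, p) and constraint streams, and calls `update`; no market-wide information is needed.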
With multiple constraints, one might wonder why we do not combine multipliers additively (as in a classical Lagrangian), e.g., bid with μt = μtB + ∑jμtR, j. The reason is that the bidder has only one control (the scalar bid), while feasibility must hold against a minimum across constraints. In particular, the liquid-value cap is min {B, minj(⋅)}, and ex-post feasibility requires satisfying every inequality simultaneously. When constraints are coupled through a single bid, max-aggregation is the most robust reduction: if any constraint is tight, we want it to dominate the shading decision. Additive aggregation can perversely ``average'' constraints, bidding too aggressively when one constraint is nearly binding but others are slack.
The max rule also yields a clean ``regime’’ structure: in any round, there is an active constraint (possibly more than one in the case of ties) that determines μt. This regime viewpoint will later be useful in the welfare proof, where we account separately for rounds in which the budget is effectively pacing versus rounds in which an ROI guardrail is effectively pacing.
The updates can in principle drive multipliers upward or downward depending on realized slack. For feasibility, it is crucial that multipliers start in a safe region and that step sizes are small enough that the process cannot ``jump'' past safety thresholds.
We initialize at the conservative levels
$$
\mu^B_1 \;=\; \frac{\bar v}{\rho} - 1, \qquad
\mu^{R,j}_1 \;=\; \gamma_j - 1 \quad \text{for each } j \in \mathcal R.
$$
These choices have an intuitive interpretation. If μB = v̄/ρ − 1,
then bt ≤ vt(0)/(1 + μB) ≤ v̄/(1 + μB) = ρ,
so the bid is never above the per-round budget rate scale. Similarly, if
μR, j = γj − 1,
then bt ≤ vt(0)/γj,
i.e., the bidder behaves as though each unit must clear an exchange rate
consistent with ROI strictness γj. Under the
payment upper bound pt ≤ btxt,
these initializations are the algebraic starting point for an inductive
argument that cumulative violations cannot occur.
We will impose the following sufficient conditions
(advertiser-specific, constraint-specific):
$$
\eta^B \;\le\; \frac{1}{\bar v}, \qquad
\eta^R_j \;\le\; \frac{1}{\bar v} \quad \text{for each } j \in \mathcal R.
$$
Economically, these inequalities ensure that a single unusually large
payment (or unusually poor ROI realization) cannot force the algorithm
to take a destabilizing step. Technically, they will guarantee two
related properties: (i) multipliers remain uniformly bounded above by
their safe initial levels, and (ii) the cumulative slacks never become positive in a way that would correspond to a realized constraint violation.
We allow μtB and μtR, j to drift below 0 in principle; this simply represents that the bidder has accumulated slack and could bid more aggressively. However, the actual bid uses μt = max {⋅, 0}, so the bidder never amplifies its bid above vt(0). This separation is convenient: multipliers can be treated as unconstrained slack-tracking state variables, while the bid remains economically interpretable and respects the per-round cap by construction.
MKP is best viewed as a conservative controller: it treats each guardrail as generating a ``pressure'' (a multiplier), and then bids according to the strongest pressure. This is consistent with how practitioners often describe multi-goal autobidding systems.
Two limitations are worth emphasizing. First, MKP relies on the payment upper bound pt ≤ btxt to connect bid shading to realized spend; if the auction format can charge above bid-per-unit (e.g., due to certain nonstandard pricing rules), one must either modify the algorithm (e.g., incorporate an upper-confidence correction) or accept weaker guarantees. Second, the ROI updates use vt(j)xt as the realized numerator contribution in round t. This is appropriate when constraint numerators are promptly observable or reliably predicted at bidding time; with delayed or highly noisy attribution, one would need additional machinery (e.g., delayed-feedback OCO) and would generally lose the strongest ex-post statements.
With the algorithm now specified, the next step is to show that these simple updates indeed enforce constraints on every realized history. The proof will formalize the slack-tracking intuition: once a constraint approaches tightness, its multiplier becomes (and remains) large enough that the realized one-step slack cannot be negative, which inductively rules out cumulative violations.
We now formalize the ``safety'' claim implicit in slack-tracking: MKP enforces intertemporal constraints ex post. The point is not merely that violations vanish in expectation or in the limit, but that they do not occur at all (with probability 1 over the realized values), even though the bidder observes only its own bandit feedback and the market need not converge.
The ex-post feasibility proof is intentionally modular: it relies only on a small set of per-round inequalities that link bids to payments and to the KPI numerators used in ROI constraints.
For each advertiser and round, the auction outcome satisfies
$$
p_{k,t} \;\le\; b_{k,t}\, x_{k,t}.
$$
This is a standard property in many position auctions and
welfare-maximizing core mechanisms with per-unit pricing: the bidder is
never charged more than its bid per unit times the allocated amount.
For each ROI/CPA constraint j ∈ ℛ, the realized per-unit
``numerator value'' dominates the objective per-unit value:
$$
v^{(j)}_t \;\ge\; v^{(0)}_t \quad \text{for all } t.
$$
Economically, this is a calibration requirement: the stream v(j) used in the constraint should be measured in the same units as the value used to form bids, or more conservative ones. In practice, platforms often implement
ROI/CPA guardrails by choosing a constraint numerator that is
deliberately conservative relative to the objective (e.g., predicted
attributed value net of adjustments), precisely to ensure that
satisfying the algorithmic inequality implies satisfying the business
constraint.
If instead one has vt(j) ≥ αjvt(0) for known αj ∈ (0, 1], then all statements below extend by replacing the safe ROI level γj − 1 with γj/αj − 1. We keep αj = 1 to match the initialization above.
Notably, the core property of the auction is not used for feasibility; it will enter only in the efficiency argument in the next section. Likewise, we do not require convergence or any stationarity beyond the realized per-round inequalities above.
The multiplier updates imply an exact accounting identity: each multiplier is the initial safe level plus a scaled cumulative slack. For any prefix length t,
$$
\mu^B_{t+1} \;=\; \mu^B_1 + \eta^B \sum_{s=1}^{t} (p_s - \rho), \qquad
\mu^{R,j}_{t+1} \;=\; \mu^{R,j}_1 + \eta^R_j \sum_{s=1}^{t} \big(\gamma_j p_s - v^{(j)}_s x_s\big).
$$
Thus, to prove the feasibility inequalities, it suffices to prove a uniform upper bound on the multipliers: if we can show that μtB ≤ μ1B and μtR, j ≤ μ1R, j = γj − 1 for all t, then the corresponding cumulative slacks must be nonpositive for every prefix, which is exactly ex-post feasibility.
This is the sense in which slack-tracking is a barrier design: rather than proving convergence of (μt), we prove that the process cannot cross a safety boundary.
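The accounting identity is easy to sanity-check numerically; this small script (our own illustration, with arbitrary slack draws) tracks a multiplier and its cumulative slack side by side and confirms they always agree through the affine relation:

```python
import random

# The multiplier equals its safe initial level plus the step size
# times the cumulative slack, so bounding the multiplier by its
# initial level is the same as keeping cumulative slack nonpositive.
rng = random.Random(0)
eta, mu_init = 0.1, 4.0
mu, cum_slack = mu_init, 0.0
for _ in range(1000):
    g = rng.uniform(-1.0, 1.0)    # one-step slack increment
    mu += eta * g
    cum_slack += g
    assert abs(mu - (mu_init + eta * cum_slack)) < 1e-9
```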
Although our proofs below are algebraic, they embody a simple
monotonicity logic. The max-aggregation rule ensures that the realized
bid bt is at most the bid that would be implied by any individual constraint multiplier considered in isolation. Formally, since μt ≥ μtB and μt ≥ μtR, j,
$$
b_t \;=\; \frac{v^{(0)}_t}{1+\mu_t} \;\le\; \frac{v^{(0)}_t}{1+\mu^B_t}
\quad\text{and}\quad
b_t \;\le\; \frac{v^{(0)}_t}{1+\mu^{R,j}_t} \ \ \text{for each } j \in \mathcal R.
$$
Therefore, whenever we upper bound a violation term using only bt (via the payment bound pt ≤ btxt), the max rule can only help: bidding ``more conservatively than necessary'' cannot create a payment explosion, and under the calibration condition vt(j) ≥ vt(0) it cannot create an ROI shortfall either, because a sufficiently small bid makes each unit purchased cheap enough to satisfy γjp ≤ v(j)x on that round.
We next show that, under the step-size conditions above, each multiplier remains at or below its safe initial level. Intuitively, as a multiplier approaches its safe level from below, the induced bid becomes so conservative that the corresponding one-step slack becomes nonpositive (or, more generally, too small to push the multiplier across the boundary in one step).
Assume the payment bound pt ≤ btxt and ηB ≤ 1/v̄. Then along every realized history, μtB ≤ μ1B for all t.
Fix a round t and suppose μtB ≤ μ1B. By the payment bound and the definition of the bid,
$$
p_t \;\le\; b_t x_t \;\le\; b_t \;\le\; \frac{v^{(0)}_t}{1+\mu^B_t}
\;\le\; \frac{\bar v}{1+\mu^B_t}.
$$
Hence
$$
p_t - \rho \;\le\; \frac{\bar v}{1+\mu^B_t} - \rho.
$$
By the definition of μ1B = v̄/ρ − 1,
we have ρ = v̄/(1 + μ1B),
so the right-hand side above equals
$$
\frac{\bar v}{1+\mu^B_t} - \frac{\bar v}{1+\mu^B_1}.
$$
The function f(μ) = v̄/(1 + μ)
has derivative satisfying |f′(μ)| = v̄/(1 + μ)2 ≤ v̄ for μ ≥ 0,
so by the mean value theorem,
$$
\frac{\bar v}{1+\mu^B_t} - \frac{\bar v}{1+\mu^B_1}
\;\le\;
\bar v(\mu^B_1-\mu^B_t).
$$
Combining with the update μt + 1B = μtB + ηB(pt − ρ)
yields
μt + 1B ≤ μtB + ηBv̄(μ1B − μtB) ≤ μtB + (μ1B − μtB) = μ1B,
where we used ηB ≤ 1/v̄.
This proves the inductive step; the base case holds by initialization.
▫
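The inductive step can be exercised numerically; the following sketch (our own, with arbitrary parameters) draws payments respecting p_t ≤ v̄/(1 + max(μ_t, 0)) with η = 1/v̄ and confirms the budget multiplier never crosses its safe initial level:

```python
import random

# Numeric check of the barrier argument: with eta <= 1/vbar and
# payments capped by the paced bid, the budget multiplier can
# approach but never exceed its safe initial level vbar/rho - 1.
vbar, rho, T = 1.0, 0.2, 10_000
eta = 1.0 / vbar
mu1 = vbar / rho - 1.0            # safe level = 4.0
mu = mu1
rng = random.Random(0)
for _ in range(T):
    # payment respects the bid cap vbar / (1 + max(mu, 0))
    p = rng.uniform(0.0, vbar / (1.0 + max(mu, 0.0)))
    mu += eta * (p - rho)         # additive slack-tracking update
    assert mu <= mu1 + 1e-9       # barrier never crossed
```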
Fix j ∈ ℛ. Assume the payment bound pt ≤ btxt, the calibration condition vt(j) ≥ vt(0), and ηjR ≤ 1/v̄. Then along every realized history, μtR, j ≤ γj − 1 for all t.
Fix t and suppose μtR, j ≤ γj − 1. Using the payment bound and μt ≥ μtR, j,
$$
\gamma_j p_t - v^{(j)}_t x_t \;\le\; \gamma_j b_t x_t - v^{(j)}_t x_t
\;\le\; \left(\gamma_j \frac{v^{(0)}_t}{1+\mu^{R,j}_t} - v^{(0)}_t\right) x_t,
$$
where the last inequality uses vt(j) ≥ vt(0).
Rearranging,
$$
\gamma_j \frac{v^{(0)}_t}{1+\mu^{R,j}_t} - v^{(0)}_t
\;=\;
v^{(0)}_t \left(\frac{\gamma_j}{1+\mu^{R,j}_t}-1\right)
\;=\;
v^{(0)}_t \cdot \frac{\gamma_j-1-\mu^{R,j}_t}{1+\mu^{R,j}_t}
\;\le\;
\bar v \,(\gamma_j-1-\mu^{R,j}_t),
$$
using xt ≤ 1 and 1 + μtR, j ≥ 1.
Therefore,
γjpt − vt(j)xt ≤ v̄(γj − 1 − μtR, j).
Plugging into the update μt + 1R, j = μtR, j + ηjR(γjpt − vt(j)xt)
gives
μt + 1R, j ≤ μtR, j + ηjRv̄(γj − 1 − μtR, j) ≤ μtR, j + (γj − 1 − μtR, j) = γj − 1,
where we used ηjR ≤ 1/v̄.
Again the base case is the initialization μ1R, j = γj − 1.
▫
We can now state the pathwise feasibility conclusion.
Under the payment bound and the calibration condition, and with step sizes satisfying ηB, ηjR ≤ 1/v̄, MKP guarantees for every realized sequence of values and auction outcomes that
$$
\sum_{t=1}^T p_t \;\le\; B,
\qquad
\gamma_j \sum_{t=1}^T p_t \;\le\; \sum_{t=1}^T v^{(j)}_t x_t \quad
\forall j\in\mathcal R.
$$
Equivalently, constraints hold with probability 1 over the randomness in the
i.i.d. draws.
By Lemma~1, μT + 1B ≤ μ1B.
Using the accounting identity and ηB > 0,
$$
\sum_{t=1}^{T} (p_t-\rho)
\;=\;
\frac{\mu^B_{T+1}-\mu^B_1}{\eta^B}
\;\le\; 0,
$$
so $\sum_{t=1}^T p_t \le T\rho = B$.
Similarly, by Lemma~2 for each j,
$$
\sum_{t=1}^{T} (\gamma_j p_t - v^{(j)}_t x_t)
\;=\;
\frac{\mu^{R,j}_{T+1}-\mu^{R,j}_1}{\eta^{R}_j}
\;\le\; 0,
$$
and since μ1R, j = γj − 1,
we obtain $\gamma_j \sum_{t=1}^T p_t \le
\sum_{t=1}^T v^{(j)}_t x_t$ for all j. These arguments are deterministic
for any realized path, hence the ``probability 1’’ statement. ▫
Two interpretive points are worth emphasizing.
The proposition is not a statement about limiting behavior: multipliers may oscillate, constraints may alternate in being ``active,’’ and the market may never settle. Feasibility is nevertheless guaranteed because MKP enforces an upper barrier on each constraint-specific multiplier, and the multiplier is an affine transform of cumulative slack. This is precisely the operational notion of a guardrail: it must hold on the realized trajectory, not just on average.
The role of the calibration condition vt(j) ≥ vt(0) is to make the safe ROI level γj − 1 meaningful: when μt ≥ γj − 1, we have bt ≤ vt(0)/γj ≤ vt(j)/γj, and then the payment bound implies γjpt ≤ vt(j)xt. If the ROI numerator stream is noisier, delayed, or systematically smaller than the bidding value stream, then one must either (i) adjust the safe level as noted earlier, or (ii) accept weaker, typically expectation-based, feasibility notions. We view this as a substantive modeling choice rather than a technicality: ex-post safety is only as strong as the measurement system implementing the constraint.
Having established that MKP is safe on every realized history, we can now turn to the efficiency question: even without convergence, does conservative constraint-tracking preserve welfare relative to a suitable ex-ante benchmark, and how do we control the additional error created by switching among many potentially binding constraints?
Our next goal is to understand whether slack-tracking is merely a safety device or whether it also preserves allocative efficiency. The subtlety is that MKP need not converge: the identity of the binding constraint for a bidder may change repeatedly over time, and the resulting bids may be conservative in ways that are not well approximated by any fixed multiplier. Nevertheless, because values are i.i.d. over time and because the auction is assumed to be core-selecting (hence admits a per-round welfare-to-revenue charging argument), we can prove a constant-factor welfare guarantee relative to a natural ex-ante benchmark, up to an additive $\tilde O(\sqrt{T})$ term.
Because the primitives are i.i.d. over rounds, it is natural to
benchmark the realized sequence rule Φ induced by the auction plus the
bidders' algorithms against an ex-ante single-round allocation policy.
Concretely, let y : [0, v̄]n(1 + ∑kmk) → X
be any (possibly randomized) single-round allocation rule mapping the
realized per-round value profile v to a feasible allocation y(v) ∈ X. We
measure its ex-ante liquid welfare as
$$
W(y,F) \;:=\; \sum_{k=1}^n T \cdot \min\left\{\, \rho_k,\; \min_{j \in \mathcal R_k}
\frac{1}{\gamma_{k,j}}\, \mathbb{E}_{v \sim F}\!\left[y_k(v)\, v^{(j)}_k\right] \right\}.
$$
This benchmark is deliberately conservative: bidder k cannot contribute more than ρk per round on
average because of its total budget Bk = Tρk,
and cannot contribute more than its weakest ROI/CPA-implied per-round
cap $\min_j \frac{1}{\gamma_{k,j}}\mathbb
E[y_k(v) v^{(j)}_k]$.
Given the induced allocation-sequence rule Φ produced when all bidders run MKP
and the platform runs the core auction each round, we analogously
define
W(Φ, F) := 𝔼v ∼ FT[W(x(v), v)], x(v) = Φ(v),
where W(⋅, ⋅) is the
multi-constraint liquid welfare defined earlier. The main question in
this section is whether W(Φ, F) is
comparable to W(y, F) for every
benchmark y.
The efficiency proof begins from a per-round inequality implied by
core outcomes, which we use as a black box. Fix a round t and suppress t from notation. Let b = (b1, …, bn)
be the realized bid profile and (x, p) be the outcome of
the core auction. The relevant implication of core-ness is that no
feasible allocation rule y(⋅)
can be implemented by a blocking coalition that makes all its members
weakly better off while paying the seller. In its simplest aggregate
form, this yields the inequality
$$
\sum_{k=1}^n b_k\, y_k(v) \;\le\; \sum_{k=1}^n b_k x_k \;+\; \sum_{k=1}^n p_k.
$$
Combining with the payment upper bound pk ≤ bkxk
(Assumption (A1) in the previous section) immediately gives a familiar
``half approximation'' for bid welfare:
$$
\sum_{k=1}^n b_k\, y_k(v) \;\le\; 2\sum_{k=1}^n b_k x_k.
$$
This inequality is the per-round engine behind the global theorem: it says that, even though bidders shade their bids via multipliers, the auction outcome is never too inefficient in bid terms. The remaining work is to translate bid welfare back into liquid welfare, and to quantify the additional loss created by the fact that each bidder uses a hard maximum over many evolving constraint multipliers.
For each advertiser k and
round t, recall that MKP forms
an overall multiplier
$$
\mu_{k,t}=\max\Big\{\mu^B_{k,t},\;\max_{j\in\mathcal
R_k}\mu^{R,j}_{k,t},\;0\Big\},
\qquad
b_{k,t}=\frac{v^{(0)}_{k,t}}{1+\mu_{k,t}}.
$$
It is useful to label the identity of the maximizing component.
Define the active index
ak, t ∈ arg max {μk, tB, {μk, tR, j}j ∈ ℛk, 0},
breaking ties arbitrarily but measurably. This partitions time into regimes for
bidder k: budget-binding
rounds (ak, t = B),
ROI-binding rounds (ak, t = j),
and unconstrained rounds (ak, t = 0).
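For concreteness, the regime label can be computed as follows (a sketch; the string keys are our own labels):

```python
def active_index(mu_B: float, mu_roi: dict) -> str:
    """Return which component attains the max defining the overall
    multiplier: 'B' (budget), an ROI key, or '0' (unconstrained).
    Ties are broken by dict order, i.e., arbitrarily but fixed."""
    candidates = {"B": mu_B, **mu_roi, "0": 0.0}
    return max(candidates, key=candidates.get)
```

For example, `active_index(0.5, {"j1": 0.2})` is a budget-binding round, while `active_index(-1.0, {"j1": -0.5})` is unconstrained because all multipliers are below zero.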
If there were only one constraint, or if the active index never changed, we could proceed almost exactly as in the two-constraint analyses: we would use the per-round core inequality to compare allocations to any benchmark y, and then use slack-tracking to argue that shading is ``appropriate'' in the sense that the bidder either (i) bids close to value (hence loses little welfare) or (ii) is near its liquid cap (hence cannot contribute much more welfare anyway).
With many constraints, the technical obstacle is the time variation of ak, t. When constraints are close, MKP uses the most conservative multiplier, which may be slightly larger than the ``right'' one for the welfare comparison in that round. This conservatism is safe (by construction) but potentially wasteful. We therefore need an accounting device that bounds the total welfare loss due to frequent near-ties and switches.
We introduce a simple way to control the cumulative error created by
taking a hard maximum over many evolving multipliers. Fix a resolution
parameter Δ > 0. For each
bidder k, define the set of boundary rounds
$$
\mathcal B_k(\Delta) \;:=\; \Big\{\, t \;:\; \mu^{a_{k,t}}_{k,t} - \mu^{c}_{k,t} \le \Delta
\ \text{for some } c \neq a_{k,t} \Big\},
$$
where c ranges over the
components {B} ∪ ℛk ∪ {0}
and we write μk, t0 ≡ 0.
Intuitively, outside ℬk(Δ) there is a clear margin: the active multiplier exceeds every other candidate by at least Δ, so small perturbations in the identity of the max would not change bids much.
Two facts make this construction useful.
Because bids are of the form v(0)/(1 + μ),
the mapping μ ↦ 1/(1 + μ) is 1-Lipschitz on [0, ∞) in the sense that
$$
\left|\frac{1}{1+\mu}-\frac{1}{1+\mu'}\right|
\;\le\;
|\mu-\mu'|.
$$
Thus, if two candidate multipliers differ by at most Δ, then the induced bids differ by
at most Δ ⋅ v(0) ≤ Δv̄.
In boundary rounds, even if MKP ``chooses the wrong’’ constraint to be
conservative about, the welfare loss we attribute to this conservatism
can be upper bounded by O(Δv̄) per bidder,
hence O(nΔv̄)
per round. This is the first lever: near-ties are cheap on a per-round basis.
Each component multiplier evolves by an additive update with bounded increments. Indeed, for budget we have μk, t + 1B − μk, tB = ηkB(pk, t − ρk) and for ROI constraints μk, t + 1R, j − μk, tR, j = ηk, jR(γk, jpk, t − vk, t(j)xk, t). Under the standing bounds pk, t ≤ v̄ and vk, t(j)xk, t ≤ v̄, each increment has magnitude at most O(ηv̄).
Now consider, for any two components c ≠ c′, the gap
process gk, tc, c′ := μk, tc − μk, tc′.
Each step changes this gap by at most O(ηv̄), so for the
max identity to switch between c and c′ (or for c′ to come within Δ of c), the gap must traverse an
interval of width Δ. A
standard discretization argument then yields a bound of the form
$$
|\mathcal B_k(\Delta)| \;\lesssim\; \mathrm{poly}(m_k)\, \frac{\eta_k\, \bar v\, T}{\Delta},
$$
where ηk
denotes a representative step size scale and the poly(mk) factor
comes from taking a union bound over the O(mk2)
component pairs that can generate near-ties. The economic content of is
that a bidder cannot spend too many rounds in ``indecision’’ unless its
multipliers move enough to repeatedly erase and recreate gaps of size
Δ, and the updates are too
small to allow this at a high frequency.
Putting (i) and (ii) together, the welfare loss due to
multi-constraint switching can be bounded, up to constants, by a term of the form
$$
n\,\mathrm{poly}(m)\,\bar v\,\big(\Delta\,T \;+\; \eta\,\bar v\,T\big),
$$
where m = maxkmk.
Choosing $\eta\asymp 1/\sqrt{T}$ and a
matching $\Delta\asymp 1/\sqrt{T}$
yields an additive $\tilde
O\big(n\,\mathrm{poly}(m)\,\bar v\sqrt{T}\big)$ term, which is
the scale appearing in the main theorem below. While this accounting suppresses
constants and some bookkeeping, it captures the key design logic: we
trade off a small per-round ``approximation’’ error Δ against a bound on how often the
process can hover near boundaries at that resolution.
We now state the efficiency guarantee informally at the level needed for the remainder of the paper; the proof follows the template above, combining the per-round half-welfare inequality, the ex-post feasibility result from the previous section, and the multi-boundary accounting sketched above.
Fix any distribution F over
per-round value profiles, i.i.d. over time, and suppose the platform
runs a core auction each round satisfying the payment upper bound pk, t ≤ bk, txk, t.
Suppose all advertisers run MKP with step sizes on the order of $\eta\lesssim \sqrt{\log(\bar v n T)/T}$
(with constants depending polynomially on m = maxk|ℛk|).
Then, for any single-round benchmark allocation rule y,
$$
W(\Phi, F) \;\ge\; \tfrac{1}{2}\, W(y, F) \;-\; \tilde O\big(n\,\mathrm{poly}(m)\,\bar v\,\sqrt{T}\big).
$$
This inequality says that, even though bidders are learning and may be alternately budget- or ROI-constrained over time, the resulting market outcome achieves a constant fraction of any ex-ante feasible liquid benchmark. The 1/2 factor is inherited from the core-based charging argument and is tight for this proof technique; improving it would require either additional structure (e.g., stronger revenue-covering properties) or a different benchmark. The additive term is the finite-horizon price of learning and of managing many guardrails at once: it grows as $\sqrt{T}$ (so it vanishes on a per-round basis) and degrades polynomially with the number of constraints through the boundary accounting.
The theorem is not a convergence statement and does not imply that the market stabilizes; rather, it provides an operational welfare guarantee that holds regardless of equilibrium selection. From a policy and platform-design standpoint, the message is that adding multiple ROI/CPA guardrails need not destroy efficiency, but it does introduce a quantitative tradeoff: more constraints increase the frequency of near-tie situations in which bidders shade for the most conservative constraint, and bounding the welfare cost of this conservatism requires either smaller learning rates or accepting a larger finite-T error term. This is consistent with practice: multi-objective autobidders are typically stable but may appear ``slower'' to adapt when many KPIs are enforced simultaneously.
Having established an aggregate welfare guarantee, we next turn to individual learning guarantees. In particular, we will define a generalized per-round constraint-pacing benchmark and show that each advertiser achieves sublinear dynamic regret against that benchmark, with rates that depend on the total path length of the benchmark multipliers across constraints.
The welfare theorem above is deliberately agnostic about equilibrium selection: it certifies that the market performs well even if bidders do not converge. For practitioners, however, a complementary question is whether an individual autobidder is ``doing the right thing'' from its own perspective, given its constraints and the (possibly evolving) competitive environment. In this subsection we formalize such an individual guarantee via a dynamic regret benchmark that generalizes classical budget pacing to many ROI/CPA-type constraints.
Fix an advertiser k and
view the remainder of the system—the other bidders’ algorithms together
with the platform’s core auction—as part of the environment. For any
scalar multiplier μ ≥ 0,
consider the counterfactual in which bidder k shades its objective value in
round t by bidding
$$
b_{k,t}(\mu)\;=\;\frac{v^{(0)}_{k,t}}{1+\mu},
$$
while all other inputs are held fixed. Let xk, t(μ)
and pk, t(μ)
denote the resulting (random) allocation and payment to bidder k in that round. We define the associated expected spend and ROI-slack functions:
$$
Z^B_t(\mu) \;:=\; \mathbb E\big[p_{k,t}(\mu)\big], \qquad
Z^{R,j}_t(\mu) \;:=\; \mathbb E\big[\gamma_{k,j}\, p_{k,t}(\mu) - v^{(j)}_{k,t}\, x_{k,t}(\mu)\big].
$$
The interpretation is direct. A per-round budget target ρk = Bk/T
corresponds to the condition ZtB(μ) ≤ ρk,
while the jth ROI/CPA
constraint corresponds to ZtR, j(μ) ≤ 0.
The benchmark we use is a per-round comparator that, in each round, chooses the ``right'' multiplier for that round in expectation. Formally, define μtB, * to be any solution to
$$
Z^B_t(\mu) = \rho_k
$$
when such a solution exists, and otherwise set μtB, * := 0 (the case where even unshaded bidding does not spend at rate ρk). For each ROI constraint j ∈ ℛk, define μtR, j, * to be any solution in [0, γk, j − 1] to
$$
Z^{R,j}_t(\mu) = 0
$$
when such a solution exists, and otherwise set μtR, j, * := 0. (Restricting to [0, γk, j − 1] matches the usual ``safe'' region suggested by ROI pacing: beyond γk, j − 1 the bidder is so conservative that the constraint is automatically satisfied under mild alignment conditions between billed value and the jth constraint value stream. The analysis can be extended to a larger interval when one has global Lipschitz control.) Finally, define the overall per-round comparator multiplier as
$$
\mu^*_t \;:=\; \max\Big\{\mu^{B,*}_t,\; \max_{j\in\mathcal R_k} \mu^{R,j,*}_t\Big\}.
$$
This benchmark has two appealing economic properties. First, it is decomposable: each μtR, j, * captures how aggressively bidder k would have to shade in round t if it were only pacing for constraint j, while μtB, * captures pure budget pacing. Second, it is time-varying: the sequence {μt*}t ≤ T may drift over time as competition or value distributions change, which is exactly what we see in real auction markets when seasonality, product changes, or competitor entry shift the mapping from bids to spend and performance.
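When Zt is continuous and nonincreasing in μ (as when higher shading weakly lowers spend), the per-round comparator can be found by bisection; this is an illustrative sketch with the expected-spend map Z supplied by the caller, and it assumes Z eventually falls below the target as μ grows:

```python
def solve_pacing_multiplier(Z, target, mu_hi, tol=1e-9):
    """Bisection for the per-round comparator: find mu >= 0 with
    Z(mu) = target, assuming Z is continuous and nonincreasing.
    Returns 0.0 when even unshaded bidding does not reach the
    target (the convention used in the text)."""
    if Z(0.0) <= target:
        return 0.0
    lo, hi = 0.0, mu_hi
    while Z(hi) > target:          # grow the bracket until Z falls below target
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Z(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: if expected spend is 1/(1+mu), pacing to a rate of 0.2
# requires mu = 4 (since 1/(1+4) = 0.2).
mu_star = solve_pacing_multiplier(lambda mu: 1.0 / (1.0 + mu), 0.2, 1.0)
```

The monotonicity assumption is what makes a scalar search well-posed; with it, each component comparator μtB, * and μtR, j, * is a one-dimensional root-finding problem.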
To evaluate learning, we compare the realized objective value under
MKP to the objective value that bidder k would obtain if it could play the
benchmark sequence $\{\mu^*_t\}$ defined above. Let
$U_{k,t}(\mu) := v_{k,t}(0)\,x_{k,t}(\mu)$
denote bidder $k$'s round-$t$ realized objective value under multiplier $\mu$. We define the (expected) dynamic regret as
\[
\mathrm{Reg}_k(T) \;:=\; \mathbb E\Big[\sum_{t=1}^{T}\big(U_{k,t}(\mu^*_t)-U_{k,t}(\mu_{k,t})\big)\Big],
\]
where $\mu_{k,t}$
is the overall multiplier chosen by MKP. The expectation is over the
realized values, the auction randomness (if any), and any randomness in
tie-breaking. Importantly, MKP guarantees ex-post feasibility of all
constraints (Section~), so regret is a statement about performance .
The benchmark is deliberately conservative: it does not compare against an unconstrained best response (which may be ill-posed under budgets and ROIs), but instead against a pacing-style policy parameterized by a single scalar multiplier each round. This matches what industrial autobidders can actually implement at low latency, and it avoids conflating learning performance with the impossibility of anticipating future values under hard constraints.
A technical obstacle in proving the regret bound is that MKP updates constraint-specific multipliers every round, even when they are not the active component of the maximum defining $\mu_{k,t}$. Thus the update for a non-active constraint is computed using feedback generated under a larger (more conservative) multiplier than that constraint would itself imply. This produces what one may call a ``wrong-gradient'' effect.
The reason this is tolerable is economic: the error is one-sided. Updating a non-binding constraint using outcomes from a more conservative bid tends to overstate slack (because spending and allocation fall when we increase $\mu$), which can only make the corresponding multiplier smaller and hence less conservative in the future. Since feasibility is enforced by the maximum over multipliers and by safe initialization, such errors do not threaten constraint satisfaction; they only influence how quickly the algorithm approaches the appropriate level of conservatism.
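The update pattern just described — every multiplier adjusted each round from realized feedback, with the bid shaded by the maximum — can be sketched as follows. This is a simplified illustration under assumed step-size and projection choices, not the paper's exact MKP pseudocode.

```python
def mkp_round(vR, x, p, mu_B, mu_R, rho, gamma, eta):
    """One illustrative slack-tracking step (simplified; not exact MKP).

    mu_B is the budget multiplier and mu_R a list of ROI multipliers.
    Every multiplier is updated from the realized round, including the
    ones that are not active in the max -- the 'wrong-gradient' case
    discussed above.
    """
    # Budget pacing: spend above the per-round target rho raises mu_B.
    mu_B = max(0.0, mu_B + eta * (p - rho))
    # ROI constraint j: realized slack v_j*x - gamma_j*p.  Positive
    # slack relaxes the multiplier, negative slack tightens it; the
    # projection onto [0, gamma_j - 1] mirrors the 'safe' region.
    for j in range(len(mu_R)):
        slack = vR[j] * x - gamma[j] * p
        mu_R[j] = min(max(0.0, mu_R[j] - eta * slack), gamma[j] - 1.0)
    return mu_B, mu_R

def effective_bid(v0, mu_B, mu_R):
    """Shade the objective value by the most conservative multiplier."""
    mu = max([mu_B] + list(mu_R))
    return v0 / (1.0 + mu)
```

Note the one-sidedness in code form: when the realized round was generated under a larger maximum multiplier, `slack` is biased upward, so the non-active `mu_R[j]` can only be pushed down, never unsafely up.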
To translate this intuition into a quantitative regret bound, we impose mild regularity on the environment faced by bidder k.
There exist constants $L^B$ and $\{L^{R,j}\}_{j\in\mathcal R_k}$ such that for all $\mu,\mu'\ge 0$ and all rounds $t$,
\[
|Z^B_t(\mu)-Z^B_t(\mu')|\le L^B|\mu-\mu'|, \qquad |Z^{R,j}_t(\mu)-Z^{R,j}_t(\mu')|\le L^{R,j}|\mu-\mu'|.
\]
This assumption is satisfied, for instance, when the distribution of
effective competing bids has a bounded density and the allocation rule
is monotone in bids; it is also the natural continuity condition needed
for any per-round ``solve for $\mu$'' comparator such as \eqref{eq:muB-star}--\eqref{eq:muR-star} to be
well-behaved.
We now state the individual learning guarantee at the level needed for our applications. For each comparator sequence, define its path length:
\[
P^B_T := \sum_{t=2}^T \big|\mu^{B,*}_t-\mu^{B,*}_{t-1}\big|, \qquad P^{R,j}_T := \sum_{t=2}^T \big|\mu^{R,j,*}_t-\mu^{R,j,*}_{t-1}\big| \quad (j\in\mathcal R_k).
\]
These quantities measure how non-stationary the ``right’’ pacing levels
are. Small path length corresponds to a stable market (or stable
objectives), while large path length corresponds to drift, entry, or
regime shifts.
Fix an advertiser $k$. Under boundedness of values in $[0,\bar v]$, the payment upper bound $p_{k,t}\le b_{k,t}x_{k,t}$, and the Lipschitz response assumption above, MKP with appropriately tuned step sizes guarantees
\[
\mathbb E\big[\mathrm{Reg}_k(T)\big] \;\le\; \tilde O\Big(m_k\,T^{7/8} \;+\; T^{3/4}\big(1 + P^B_T + \textstyle\sum_{j\in\mathcal R_k} P^{R,j}_T\big)\Big),
\]
where the $\tilde O(\cdot)$ notation suppresses logarithmic factors in $(n,T,\bar v)$ and polynomial dependence on the Lipschitz constants and on $m_k=|\mathcal R_k|$.
Several comments help interpret this guarantee. First, regret is sublinear in $T$ even when the comparator moves, as long as its cumulative movement (path length) is not too large. Second, the dependence on $\{P^{R,j}_T\}$ is additive across ROI constraints: each additional guardrail introduces an additional drifting target that the algorithm tracks, and the bound makes this cost explicit. Third, the exponents reflect the fact that we are solving a constrained tracking problem under bandit feedback and with a max-coupling across constraints; sharper rates are possible under stronger feedback (e.g., full-information counterfactuals) or stronger curvature assumptions, but the rate above is sufficient for an ``operational'' guarantee that improves with scale.
When the environment is stationary in the sense that the expected response functions $Z^B_t$ and $Z^{R,j}_t$ do not depend on $t$ (as in our i.i.d. model), we may take $\mu^{B,*}_t \equiv \mu^{B,*}$ and $\mu^{R,j,*}_t \equiv \mu^{R,j,*}$, so that $P^B_T = P^{R,j}_T = 0$ for all $j$. In this case the bound simplifies to
\[
\mathbb E\big[\mathrm{Reg}_k(T)\big] \;\le\; \tilde O\big(m_k\,T^{7/8} + T^{3/4}\big),
\]
and hence the average per-round regret vanishes:
$$
\frac{1}{T}\mathbb E\big[\mathrm{Reg}_k(T)\big]\to 0.
$$
Equivalently, MKP asymptotically matches the performance of the best
pacing multiplier (subject to the induced constraints) in the stationary
limit, while still guaranteeing ex-post constraint satisfaction along
the entire finite sample path. This is the sense in which slack-tracking
is not merely a safety mechanism: it is also an individually rational
learning rule that adapts to market conditions without requiring
convergence of bids or multipliers.
From a platform-design perspective, the corollary helps explain a stylized empirical fact: when markets are stable, multiplier-based autobidders tend to settle into approximately constant pacing levels, and performance differences across algorithms are largely transient. When markets are unstable, the relevant question is not whether the algorithm converges, but whether it tracks the moving ``right'' level quickly enough; the path-length terms in the regret bound provide a quantitative handle on that tracking problem.
Having established both an aggregate welfare guarantee and an individual learning guarantee, we next discuss extensions and limitations of the analysis, including approximate-core auctions and modest forms of non-stationarity.
Our analysis is intentionally modular: the feasibility guarantees are driven by the slack-tracking updates and require only minimal per-round accounting, while the welfare and regret guarantees rely on additional structural properties of the auction and of the environment faced by a bidder. This modularity makes it natural to ask what happens when we relax the cleanest assumptions (exact core, i.i.d. values, and purely monetary constraints). We discuss several such extensions and, equally importantly, where the current proof techniques appear to reach their limits.
The welfare proof uses a core-type inequality to ``charge’’ welfare loss to payments (or to foregone payments) when bidders shade. In practice, however, large ad platforms often implement allocation and pricing rules that are only approximately welfare-maximizing, and core-like properties may hold only up to approximation. Sources of approximation include discretization (bid rounding and minimum bid increments), reserve prices and other business rules, throttling and eligibility constraints, and engineering constraints that require decomposing a global allocation problem into independent subproblems.
A useful way to formalize these frictions is to assume the auction is $(\alpha,\varepsilon)$-approximately core-stable in the following high-level sense: for every realized round and every feasible alternative allocation $\tilde x_t \in X$, the payments in the realized outcome cannot be jointly ``undercut'' by the coalition that would benefit from $\tilde x_t$, except up to a multiplicative factor and/or an additive slack. Concretely, one can posit that there exist parameters $\alpha\ge 1$ and $\varepsilon\ge 0$ such that the inequality used in the charging argument holds with a weakening of the form \[ \text{(exact core inequality)}\quad\Longrightarrow\quad \text{(approximate inequality)}\;\;\ge\;\;\frac{1}{\alpha}\cdot(\cdot)\;-\;\varepsilon. \] While the exact algebra depends on the particular core notion adopted by the mechanism, the economic implication is robust: the multiplicative constant in the welfare theorem degrades by at most a factor $\alpha$, and an additional additive term of order $\varepsilon T$ appears. Put differently, approximate-core auctions still support a meaningful welfare guarantee, but the platform ``pays''
for deviations from exact core either by reducing the welfare fraction
that can be certified or by introducing an error that accumulates
linearly over time.
This extension is important for practice because it shifts attention to which frictions are consequential. Small discretization errors correspond to a small ε and are essentially harmless at scale, whereas systematic distortions (e.g., aggressive reserves) effectively increase α by weakening the ability to charge welfare losses to others’ payments. From a design standpoint, our framework therefore suggests a concrete engineering objective: if the platform cannot (or does not wish to) implement an exact core outcome, it can still preserve the welfare logic by keeping the implied (α, ε) parameters small in the regions of the bid space where autobidders most often operate.
A further limitation is interpretability across formats. Many real-world auctions combine multiple ad formats and constraints (sponsored search, shopping, display) in ways that blur the feasible set X and complicate any single notion of ``core.’’ Our welfare argument is most compelling when the platform can credibly view each round as selecting an outcome from a well-defined downward-closed feasibility set and when prices are linked to bids via a monotonic mechanism. When these conditions fail, approximate-core is still a reasonable diagnostic, but verifying it becomes an empirical question rather than a purely theoretical one.
We imposed i.i.d. values over time primarily to cleanly define an ex-ante welfare benchmark and to avoid conflating learning guarantees with adversarial drift. That said, the individual regret guarantee already points to a more realistic model of market dynamics: the mapping from a bidder's multiplier to its spend and ROI outcomes can drift as competitors enter, as budgets renew, or as user composition changes. The path-length terms explicitly quantify this drift at the level of the comparator multipliers, and this is precisely the notion of non-stationarity that practitioners tend to care about.
Two observations are worth emphasizing. First, feasibility is unaffected by drift. The budget and ROI constraints are enforced along the realized sample path, and the slack-tracking invariant does not require any probabilistic stability. Thus, even in a highly non-stationary environment, MKP continues to guarantee no violations provided the per-round accounting conditions hold. Second, the right benchmark is a moving one. When the ``right'' multiplier changes from week to week, a fixed-multiplier comparator is no longer the appropriate yardstick; the per-round comparator sequence $\{\mu^*_t\}$ is the right object, and its path length becomes the measure of difficulty.
At the market level, extending the welfare theorem to non-stationarity is more delicate because our benchmark there is ex-ante and distributional. One natural extension replaces a single distribution F by a sequence {Ft} and compares to the sum of per-round ex-ante liquid welfare under Ft. Conceptually, the same charging argument can be applied round-by-round, but the proof must also control the additional boundary terms created by frequent switching of the binding constraint when the environment drifts. Our current analysis already incurs such boundary terms even in stationary settings (through the ``switching’’ amortization), and non-stationarity effectively increases their magnitude by making switches more frequent. This suggests a practical take-away: if the platform anticipates strong seasonality, it may be beneficial to reduce the frequency of hard regime changes in the allocation rule (for example, by smoothing reserve adjustments), since abrupt platform-side changes mechanically increase the difficulty of tracking for all bidders.
A complementary algorithmic response is to introduce explicit restarts or adaptive step sizes. In drifting environments, constant step sizes can react faster but may amplify noise, while decaying step sizes stabilize but adapt slowly. Restarting (e.g., monthly) can approximate the best of both worlds when drift is episodic. These modifications are well understood in online optimization, but incorporating them while preserving our clean ex-post feasibility guarantee requires care because restarts reset multipliers and therefore reset accumulated slack. One safe approach is to restart only downward (i.e., to reduce conservatism) when one can certify that cumulative slack is sufficiently positive; formalizing such ``safe restarts'' is a promising direction.
We assumed vk, t(j) ∈ [0, v̄] to keep both feasibility and regret statements transparent. In many performance advertising settings, however, the realized value of an impression (especially for conversion-related KPIs) can be heavy-tailed. While one can interpret vk, t(j) as a clipped or winsorized value (as is common operationally), it is useful to understand what the theory is buying from boundedness.
For feasibility, boundedness primarily enters through step-size constraints that prevent multipliers from overshooting their ``safe’’ region in a single update. With heavy tails, a single outlier could cause a large update unless the algorithm uses robust gradients (e.g., truncation) or adaptive step sizes that cap per-round change. For regret, heavy tails require concentration arguments that are not available under mere bounded second moments, particularly in the presence of bandit feedback. Practically, platforms and bidders already implement various forms of smoothing and cap-and-floor logic on both spend and performance signals; our analysis can be viewed as certifying that such logic is not merely heuristic but structurally necessary for worst-case safety.
A related limitation concerns measurement error: the constraint values vk, t(j) may themselves be estimated (e.g., predicted conversions) rather than observed truth. In that case, the algorithm can be feasible with respect to the constraint while violating the true one. A robust approach is to incorporate statistical margins (confidence intervals) directly into the constraint updates, effectively tightening γk, j in the updates to account for uncertainty. Doing so preserves the no-violation spirit but introduces an explicit conservatism–efficiency tradeoff that must be calibrated.
Budgets and ROI/CPA constraints are ``monetary’’ in the sense that they are naturally expressed in terms of payments and value-per-impression streams. Many real campaigns also face non-monetary constraints: frequency caps (do not show too many ads to the same user), minimum delivery or reach requirements, share-of-voice targets, brand-safety exclusions, and pacing constraints at finer granularities (hourly or geo-level budgets). These constraints typically couple across impressions and depend on user-level state, making them qualitatively different from the aggregate constraints we study.
Nevertheless, the slack-tracking template suggests a plausible route to integration. A frequency cap, for instance, can be written as a constraint on a cumulative exposure count. If the bidder observes its realized exposures (even with delay), one can add an additional multiplier whose update tracks exposure slack. The key difference is how the multiplier enters bidding: for budgets and ROI, a scalar multiplier naturally scales down an objective value. For frequency caps, the ``cost’’ of showing an additional impression depends on the current user-level exposure state, so a single scalar multiplier may be too coarse unless the platform provides a state-aggregated signal (e.g., an expected marginal frequency cost). In other words, extending the model requires either (i) richer bid adjustments (a context-dependent multiplier), or (ii) platform-side aggregation that maps user-level constraints into an effective per-impression shadow price.
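As a thought experiment, the exposure-slack idea can be written in the same update template as budgets and ROI. The function below is hypothetical — the text only argues such an extension is plausible — and the `cap_rate` signal is an assumed platform-provided aggregation of user-level state:

```python
def exposure_multiplier_update(mu_F, exposures_t, cap_rate, eta):
    """Illustrative frequency-cap tracking (hypothetical extension).

    exposures_t : realized exposures this round (possibly delayed);
    cap_rate    : per-round exposure budget implied by the cap.
    Under-delivery against the cap (positive slack) relaxes the
    multiplier; over-delivery tightens it.
    """
    slack = cap_rate - exposures_t
    return max(0.0, mu_F - eta * slack)
```

The scalar form makes the limitation in the text visible: a single `mu_F` ignores which users were exposed, which is precisely why a context-dependent multiplier or platform-side shadow price may be needed.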
This reveals a broader design lesson. Constraints that are naturally dualized at the impression level (payments, predicted value streams) fit cleanly into multiplier bidding. Constraints that are inherently combinatorial or stateful (frequency, reach) require additional structure—either more information to the bidder or more responsibility on the platform to enforce feasibility. Our current results therefore do not directly justify the common industry practice of bundling all controls into a single multiplier; rather, they clarify when such bundling is principled and when it is an approximation that should be validated empirically.
Finally, many KPI streams are delayed: conversions may be observed hours or days after an impression, and attribution rules can reassign credit retroactively. Delays fundamentally change the feedback model. Payments are typically known immediately, so the budget update remains well posed, but ROI/CPA updates depend on vk, t(j)xk, t, which may be revealed only after a random delay. One can treat this as online learning with delayed feedback, for which regret bounds usually incur an additional term scaling with the total delay. The more pressing issue for our setting is feasibility: if the bidder cannot observe ROI slack in real time, it may overspend relative to the true constraint before the multiplier reacts.
A conservative solution is to update ROI multipliers using predicted or lower-confidence-bound values until true outcomes arrive. This mirrors how production systems often operate: they pace to predicted conversions and then correct ex post. The theoretical challenge is to quantify the welfare loss induced by this conservatism and to characterize when the correction dynamics remain stable. We view this as an important direction for future work because delays are the norm rather than the exception in performance advertising, and a theory that explicitly accounts for them would substantially strengthen the operational relevance of slack tracking.
Taken together, these extensions delineate a boundary between what is structurally robust in our approach and what is contingent on modeling choices. Ex-post feasibility is remarkably robust: it survives approximate auctions, non-stationarity, and many forms of noise so long as the per-round accounting remains valid. Welfare and regret guarantees are more sensitive: they depend on the auction’s degree of core-ness, on continuity of response to bid shading, and on how rapidly the ``right’’ pacing levels drift. These sensitivities motivate the empirical questions we turn to next: when multiple constraints compete to bind, how often do switches occur in realistic environments, how large are the resulting transient losses, and how do these effects compare across competing autobidding rules.
While our theory is deliberately worst-case, it is useful to complement it with a controlled numerical study that isolates the two phenomena that are hardest to see from the statements alone: (i) how conservative bidding induced by ROI-type constraints interacts with competition in a repeated auction, and (ii) how often the identity of the binding constraint switches in regimes where several constraints are simultaneously ``near tight.’’ Our goal here is not to claim empirical realism, but to provide a sanity check and a set of diagnostics that practitioners can reproduce with their own calibrated simulators.
We simulate a repeated single-slot auction with a downward-closed feasibility set: in each round t, at most one advertiser receives the impression, so xk, t ∈ {0, 1} and ∑kxk, t ≤ 1. The allocation rule is the standard welfare-maximizing rule with respect to submitted bids: the impression is allocated to the highest effective bid bk, t, and the winner pays the second-highest bid (ties broken uniformly). This choice has two practical advantages: it respects the per-round accounting condition pk, t ≤ bk, txk, t, and it cleanly separates strategic shading (through multipliers) from the platform-side allocation logic.
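A minimal implementation of this round-level mechanic (illustrative; it assumes at least two bidders and breaks ties uniformly, as stated):

```python
import random

def run_round(bids):
    """One single-slot second-price round (assumes >= 2 bidders).

    The highest effective bid wins, the winner pays the second-highest
    bid, and ties for the top bid are broken uniformly at random.
    The payment never exceeds the winner's bid, so p <= b * x holds.
    """
    top = max(bids)
    winners = [k for k, b in enumerate(bids) if b == top]
    k_win = random.choice(winners)
    payment = max(b for k, b in enumerate(bids) if k != k_win)
    return k_win, payment
```

Because the payment is the second-highest bid, the per-round accounting condition $p_{k,t}\le b_{k,t}x_{k,t}$ holds by construction in every sample path.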
Values are i.i.d. across rounds, but can be correlated across streams
within a bidder. Concretely, we draw a latent quality $q_{k,t}\sim\mathrm{Unif}[0,1]$ and then set
\[
v_{k,t}(0) = \bar v\, q_{k,t}, \qquad v_{k,t}(j) = \bar v\,\big(\lambda_j\, q_{k,t} + (1-\lambda_j)\,\xi_{k,t}(j)\big),
\]
where $\xi_{k,t}(j)\sim\mathrm{Unif}[0,1]$ are independent and $\lambda_j\in[0,1]$
controls alignment between the objective stream and constraint stream
j. This generates the two intuitive cases: when λj ≈ 1,
objective and constraint values move together (ROI is ``predictable’’
from the objective), whereas when λj ≈ 0, ROI is
noisy and switching across constraints is more likely.
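The value-generation step is a direct transcription of these formulas, vectorized over bidders (function and argument names are illustrative):

```python
import numpy as np

def draw_values(rng, n, m, v_bar, lam):
    """Draw one round of objective and constraint value streams.

    q ~ Unif[0,1] is the latent quality per bidder;
    v(0) = v_bar * q, and
    v(j) = v_bar * (lam_j * q + (1 - lam_j) * xi_j), xi_j ~ Unif[0,1].
    lam is an array of shape (m,) of alignment parameters.
    """
    q = rng.uniform(size=n)                 # latent quality per bidder
    xi = rng.uniform(size=(n, m))           # independent constraint noise
    v0 = v_bar * q
    vR = v_bar * (lam[None, :] * q[:, None] + (1.0 - lam[None, :]) * xi)
    return v0, vR
```

Setting `lam[j] = 1.0` makes stream `j` coincide with the objective stream, the fully aligned case; `lam[j] = 0.0` makes it pure noise, the switching-prone case.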
Each advertiser k has a total budget Bk = ρkT and mk ROI constraints indexed by ℛk, with strictness parameters γk, j ≥ 1. To expose switching, we vary three knobs: the constraint strictness γk, j, the alignment λj, and the number of ROI constraints mk.
Unless otherwise stated we take $\bar v = 1$, $n \in \{10, 50\}$, and $T \in \{10^4,\, 10^5,\, 5\cdot 10^5\}$. For each configuration we run 50 independent seeds and report means and interquartile ranges.
All bidders participate simultaneously (self-play). We compare three bidding rules, each using only bandit feedback of its realized $(x_{k,t}, p_{k,t})$ and realized $v_{k,t}(j)$: (i) MKP, the max-aggregated slack-tracking rule analyzed above; (ii) an active-only heuristic that updates only the constraint currently attaining the maximum multiplier; and (iii) a sum-aggregator variant that shades by the sum of the multipliers rather than their maximum.
Step sizes are chosen to satisfy the safe-update conditions in our feasibility propositions; in practice we set $\eta \propto T^{-1/2}$ and then scale by a small constant so that one-step changes are visibly smooth.
We track three families of outcomes.
For budget we report $\max\{0,\sum_{t=1}^T p_{k,t}-B_k\}$, and for each ROI constraint we report $\max\{0,\ \gamma_{k,j}\sum_t p_{k,t} - \sum_t v_{k,t}(j)\,x_{k,t}\}$. We also report the fraction of runs with any positive violation.
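These violation metrics can be computed directly from per-bidder logs; the sketch below is a transcription of the formulas above (argument names are illustrative):

```python
def violations(payments, values_R, x, B, gamma):
    """Ex-post violation magnitudes for one bidder.

    payments, x  : per-round payments and allocations;
    values_R[j]  : realized j-th constraint value stream over rounds;
    B, gamma     : total budget and ROI strictness parameters.
    """
    spend = sum(payments)
    # Budget: positive part of total spend over budget.
    budget_viol = max(0.0, spend - B)
    # ROI j: positive part of gamma_j * spend minus collected value.
    roi_viol = [max(0.0, g * spend - sum(v * xi for v, xi in zip(vs, x)))
                for g, vs in zip(gamma, values_R)]
    return budget_viol, roi_viol
```

Reporting the fraction of seeds with any positive entry then gives the second diagnostic mentioned above.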
We compute realized liquid welfare W(x, v) = ∑kWk(x, v) and normalize it by an offline benchmark computed on the realized sample path: the best feasible allocation sequence subject to all advertisers’ budgets and ROI constraints, solved as a mixed-integer program for the single-slot environment. (This benchmark is stronger than the ex-ante benchmark used in the theorem, so ratios here should be interpreted as descriptive rather than a direct test of the bound.)
Exact dynamic regret against the per-round comparator μt* is difficult to compute in self-play because each agent’s comparator depends on others’ actions. We therefore use two proxies: (i) regret against the best constant multiplier in hindsight for each bidder (computed by replay with logged values), and (ii) regret against a slowly varying comparator obtained by fitting, for each bidder, the multiplier path that would have made its constraints tight in expectation under the observed opponent bid distribution (estimated in sliding windows).
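Proxy (i) can be sketched by replaying logged rounds under a grid of constant multipliers. For brevity, this illustration omits the constraint-feasibility filtering that a full replay would include, and all names are illustrative:

```python
import numpy as np

def best_constant_regret(v0, opp_max, realized_value, mu_grid):
    """Regret proxy: best constant multiplier in hindsight via replay.

    v0[t]          : bidder's logged objective values;
    opp_max[t]     : highest competing effective bid in round t;
    realized_value : objective value the bidder actually collected.
    A constant mu wins round t iff v0[t]/(1+mu) > opp_max[t].
    """
    best = -np.inf
    for mu in mu_grid:
        wins = v0 / (1.0 + mu) > opp_max
        best = max(best, float(v0[wins].sum()))
    return best - realized_value
```

In a second-price round the replayed allocation depends only on whether the shaded bid clears the top competing bid, which is why logged `opp_max` values suffice for this proxy.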
Across all configurations we tested, MKP produced zero observed violations (budget and all ROI constraints) up to numerical tolerance, consistent with the ex-post nature of the guarantee. In contrast, the active-only update heuristic exhibits violations precisely in the regimes that motivate multi-constraint tracking: when the identity of the binding constraint changes, the inactive multipliers fail to ``accumulate slack’’ and can lag far behind. Violations are especially pronounced when λj is small (misalignment), because the realized ROI slack becomes noisy and the active constraint can flip frequently even when average strictness parameters are similar.
The sum-aggregator variant almost never violates constraints, but for a different reason: it is so conservative that it often underspends the budget by a wide margin. This mirrors a common platform observation: ``safety’’ can be purchased by conservative control, but at a potentially large welfare cost.
To connect welfare outcomes to the theoretical discussion about
boundary rounds and regime switching, we measure, for each bidder,
$$
S_k := \sum_{t=2}^T \mathbf{1}\{a_{k,t}\neq a_{k,t-1}\},
$$
where ak, t ∈ {B} ∪ ℛk
denotes which constraint attains the maximum multiplier in round t. Two patterns are robust.
First, when there is a clear ``dominant’’ constraint (e.g., one ROI constraint has much larger γk, j than the rest, or the budget is extremely tight), switching is rare (Sk/T below 1%), and MKP’s welfare is close to the oracle constant-multiplier baseline. In these cases, the max-aggregator behaves like a single-constraint pacer most of the time, and the additional multipliers simply certify feasibility.
Second, when several constraints are near tight and moderately misaligned (e.g., two ROI constraints with similar γ but different λ), switching becomes prevalent (Sk/T in the 5%–20% range, increasing with mk). Exactly in these configurations, the welfare gap between MKP and the oracle baseline widens, but in a way that is informative: most of the loss is concentrated in a small subset of ``switch rounds’’ in which the bidder’s effective bid changes abruptly because a different constraint becomes active. This is the empirical analogue of the boundary terms in the proof outline: the algorithm remains safe, but the price of safety is paid in transient conservatism when constraints compete to bind.
When we plot the best-constant-multiplier regret proxy as a function of T, MKP's per-round regret decays roughly like $T^{-1/2}$ in the aligned regimes (large λ), and more slowly in the misaligned switching regimes. Active-only updates display the opposite tradeoff: they can appear competitive on objective value early on (because they are less conservative), but the accumulated constraint violations grow with T, making them infeasible as a long-run policy. The sum-aggregator has low regret in the constrained sense (it rarely approaches violations) but high regret in objective value due to persistent underbidding.
A practical interpretation is that switching is not merely a proof artifact: it is a genuine hardness parameter for online control with multiple constraints. In environments where switching is endemic (multiple comparable constraints, noisy KPI streams, frequent competitive shocks), bidders should expect slower adaptation, and platforms should expect larger transient efficiency losses from any safe pacing rule.
Increasing n makes the auction ``thicker’’ and mechanically increases the marginal value of precise shading: small differences in bk, t more often change who wins the impression. In our simulations, thicker markets amplify both the upside of well-tuned multipliers (higher welfare relative to naive budget-only pacing) and the downside of switching (larger welfare drops on switch rounds). This is consistent with the economic intuition that in competitive auctions, conservative errors are more costly because they more frequently flip the allocation margin.
We emphasize three limitations. First, the single-slot environment abstracts away from rich feasibility sets and from approximate-core issues; multi-slot and combinatorial settings are likely to increase switching because different inventory segments can activate different constraints. Second, we assumed immediate observation of vk, t(j)xk, t; delayed attribution would weaken the ability to react, and we expect the gap between safe and aggressive heuristics to widen. Third, our offline benchmark is not the same as the ex-ante benchmark in the theorem; it is included as a diagnostic rather than a direct test.
With those caveats, the numerical study supports three takeaways that align with the theory. (i) Slack tracking across constraints is not just a technical convenience; it materially reduces violations relative to ``update-only-the-active-constraint’’ heuristics. (ii) The welfare cost of multi-constraint safety is driven disproportionately by switching episodes, suggesting that measuring and predicting switching is valuable for both bidders and platforms. (iii) Conservative aggregation rules (like summing multipliers) can buy safety but at an efficiency cost that is visible even in simple stationary environments. These observations motivate two operational diagnostics: logging the active-constraint identity ak, t and its switch rate, and stress-testing pacing controllers specifically in synthetic regimes engineered to induce frequent switching.