Digital advertising markets are increasingly mediated by : algorithmic agents that translate an advertiser’s objectives (reach, conversions, revenue) and constraints (budget, ROI/CPA targets, pacing smoothness) into per-auction bids at millisecond timescales. In practice, these agents operate in a regime that is simultaneously data-rich and information-poor. They observe abundant signals—estimated conversion rates, predicted values per impression, or contextual relevance scores—yet the business outcomes that justify spend often arrive only after substantial delay. This delay is not incidental: it is an institutional feature of modern measurement and attribution, shaped by user-level privacy protections, platform policies, and the operational realities of conversion tracking.
The 2026 measurement landscape sharpens this tension. First, conversion attribution is commonly governed by fixed windows (e.g., view-through or click-through horizons) that mechanically postpone the moment at which a conversion can be confidently assigned to an impression. Second, privacy-preserving measurement systems frequently , , or outcome reporting. Whether one thinks of on-device aggregation, delayed postbacks, or randomized reporting, the practical effect is the same: the value of an impression is learned after the auction has cleared, and often after many subsequent auctions have already been played. These frictions make it difficult to run a controller that is both aggressive (to capture value) and safe (to respect ROI/CPA requirements), because ``safety’’ is defined in terms of realized outcomes that are not yet observed.
Two constraints are especially salient in deployed autobidding. The first is the hard budget constraint: the advertiser cannot, ex post, exceed a spending cap over a horizon (a day, a week, a campaign). Violating this constraint is operationally unacceptable, and platforms routinely enforce it with conservative accounting and throttling. The second is a return constraint, which we interpret broadly to include ROI, CPA, or cost-per-incremental-action targets. While budget constraints depend only on payments and therefore can be checked immediately, ROI/CPA constraints depend on realized conversions and hence are intrinsically delayed. In a setting with delayed and noisy attribution, an algorithm that na"ively spends against point estimates of conversion value can easily drift into an ROI violation before the feedback arrives, at which point the violation cannot be undone.
Our goal is to formalize and resolve this practical dilemma: how can an autobidder participate in a repeated auction while ensuring (i) budget feasibility, (ii) ROI/CPA feasibility despite delayed outcome feedback, and (iii) nontrivial guarantees? The third criterion deserves emphasis. The repeated-auction environment is not a single-agent optimization problem; it is a strategic, multi-agent system in which many advertisers simultaneously run learning and control algorithms. The platform also plays an active role by determining allocation and payments as a function of bids. Even if each advertiser uses a ``reasonable’’ controller, the joint dynamics need not converge to a steady state, particularly when values are nonstationary, budgets bind, and agents update at different timescales. Yet from the perspective of policy and platform design, we care about aggregate outcomes—who gets served, how much value is generated, and how much welfare is left on the table—even when the system never settles.
This motivates a welfare notion that is compatible with constraints and non-convergence. Classical social welfare is ill-suited when advertisers face hard budgets and ROI requirements: a bidder with high value but a tight ROI target cannot simply pay arbitrarily to obtain impressions. The appropriate benchmark is a notion of welfare, which caps each agent’s contribution by what it can feasibly ``liquidate’’ into spend under its constraints. In the repeated-auction context, this benchmark is naturally stated in terms of an ex-ante proxy built from predictive value signals (the quantities an autobidder actually sees at decision time), and then related back to realized value using concentration. Conceptually, we want a guarantee that says: even if bids are generated by learning dynamics that oscillate, as long as the auction satisfies a mild deviation-proof condition (a core-like property) and advertisers pace in a disciplined way, the system achieves a constant fraction of the best feasible liquid welfare, up to unavoidable statistical and learning error terms.
The central technical challenge is the interaction between delayed outcomes and ROI feasibility. A budget controller can be myopic: it observes spend and updates a multiplier to keep cumulative payments below the cap. An ROI controller cannot do this directly, because the ``resource’’ it is balancing against (realized conversions) is not contemporaneously observed. The temptation in practice is to substitute predicted conversions or values, but this can be unsafe when predictions are biased or simply noisy at the granularity of individual auctions. Our modeling stance is that predictions can be unbiased conditional on the information available at bidding time, but realized conversions remain bounded and stochastic. Under this assumption, safety requires a : an algorithm must behave as if its available conversion value were a lower confidence bound rather than a point estimate. Delay enlarges the required buffer near the horizon, because the most recent outcomes remain unresolved and must be treated pessimistically.
A second challenge is the systems-level requirement of guarantees. Campaigns run for finite horizons, and the most costly failures occur early, when multipliers have not stabilized and when the algorithm has not yet accumulated evidence about realized ROI. Thus, we aim for statements that hold for all horizons and that quantify explicitly how safety and welfare degrade with the confidence level and the delay length. This emphasis also reflects accountability pressures faced by platforms and large advertisers: one often needs a knob (a confidence parameter) that trades performance for a certified bound on constraint violations, and one needs to understand how changes in measurement latency or attribution policy translate into conservative bidding and welfare loss.
Our contribution is therefore conceptual as much as technical. We
show how to separate the controller into two timescales: one that
enforces hard budget feasibility using immediate payment feedback, and
another that enforces ROI feasibility against a conservative,
statistically justified proxy of cumulative realized value. The proxy is
constructed so that it is predictable with respect to the bidding-time
information and so that, with high probability, it never overstates the
realized value that will eventually be observed. This design mirrors how
practitioners already think about guardrails'' andsafety
margins,’’ but it makes the guardrail explicit and analyzable: we can
trace how much efficiency is sacrificed in exchange for a chosen level
of safety, and how that sacrifice grows with delay and outcome
noise.
We also acknowledge what this perspective does not capture. Real-world attribution can be biased, adversarially manipulated, or subject to complex cross-campaign interference; moreover, delays are not always fixed, and conversions may be censored rather than merely postponed. Our focus is narrower: we isolate the clean economic tradeoff between delayed feedback and feasible bidding under ROI constraints, in a model that remains close enough to practice to inform platform design. The upshot is a framework in which ex-post safety and market-level efficiency are not competing desiderata to be handled heuristically, but jointly attainable guarantees—even when the underlying learning dynamics fail to converge.
Our starting point is the literature on in online advertising, in which an advertiser (or platform) modulates bids over time to respect a spending cap while competing in a sequence of auctions. Classic budget pacing mechanisms use a multiplicative ``pacing multiplier’’ to shade bids when cumulative spend threatens to exceed the budget, and they are tightly connected to dual variables of a Lagrangian relaxation of the budget constraint (see, e.g., ). A parallel line studies equilibrium structure and platform design under pacing, including how pacing interacts with first-price versus second-price payment rules and how it can be implemented through throttling or bid scaling . This body of work largely treats the constraint as : spend is observed at the moment of the auction outcome, so feasibility can be enforced ex post by accounting alone.
More recent work moves beyond budget-only control to incorporate , often expressed as ROI, CPA, or cost-per-action targets. Algorithmically, these constraints are naturally represented by an additional dual variable that tracks the gap between spend and value, producing a two-multiplier (or two-timescale) controller . Analytically, ROI constraints are more delicate than budgets because feasibility depends on realized outcomes and attribution logic, not merely on payments. Most existing analyses either (i) treat value feedback as contemporaneously observed, (ii) enforce ROI only in expectation (or in steady state), or (iii) abstract away from the strategic interaction among many simultaneously learning advertisers. Our contribution fits into this pacing tradition but isolates the friction that is central in practice: , and the constraint is naturally a rather than an asymptotic or average-case desideratum.
A second thread we build on is the use of as an efficiency benchmark in markets with hard financial constraints. Liquid welfare, originally introduced to capture the idea that an agent cannot contribute more than it can feasibly pay for , has since become a standard yardstick for auctions and matching markets under budgets . In repeated auction environments with pacing, liquid welfare is particularly attractive because it aligns with the economic object that pacing is implicitly optimizing: a bidder’s effective willingness-to-pay is clipped by its constraints. Recent ``welfare without convergence’’ results show that even when learning dynamics do not settle to an equilibrium, one can still certify a constant-factor approximation to an ex-ante liquid-welfare benchmark under mild auction conditions such as core-like deviation-proofness . Our analysis follows this philosophy but must confront an additional gap: with delayed conversions, the natural welfare proxy is formed from , whereas feasibility is defined in terms of (delayed) outcomes.
The algorithmic tools in our work are also related to (BwK) and more broadly to online learning with resource constraints. BwK studies a single decision-maker that repeatedly chooses arms with stochastic rewards and consumptions, aiming to maximize reward while keeping cumulative consumption within a budget . The primal–dual viewpoint in BwK is close in spirit to pacing: a dual variable prices the resource and induces conservative choices when the resource becomes scarce. However, BwK typically assumes the reward is observed (perhaps with noise) immediately after each pull, and it does not include strategic externalities induced by an auction that maps bids into allocations and payments. Moreover, ROI/CPA constraints are ratio-type constraints that couple reward and spend, rather than a single knapsack constraint on spend alone. Our controller can be viewed as a dual-based method for a ratio constraint, but the proof burden is different because we need feasibility under delayed, bounded outcomes rather than expected feasibility.
Delayed feedback has been studied extensively in online learning and bandits . The key observation is that delay primarily affects the : updates are based on stale feedback, which slows learning and typically worsens regret by an additive or multiplicative term depending on the delay model. While these results provide useful intuition, they do not directly address our setting for two reasons. First, we require a for a constraint defined in terms of delayed outcomes; the central question is not only how fast we learn, but how we avoid violating ROI before the evidence arrives. Second, delayed bandit analyses are usually single-agent performance comparisons against an optimal fixed action, whereas our primary efficiency statement is a welfare guarantee that holds under simultaneous play by many adapting agents and does not rely on convergence.
Our work is also connected to constrained online convex optimization and partial monitoring. There is a large literature on online algorithms that maintain feasibility of long-run constraints via primal–dual updates or virtual queues, often guaranteeing that constraint violation is sublinear while regret is near-optimal . In the advertising context, one can interpret pacing multipliers as such virtual queues. Yet two features of our environment push beyond the standard templates. First, the feedback is : advertisers observe allocations and payments immediately but observe conversion value only after delay, and the value itself is stochastic even conditional on bidding-time information. Second, the constraint of interest (ROI/CPA) is operationally closer to a requirement: platforms and advertisers typically want a knob that delivers an explicit bound on the probability of violation, not only an asymptotic vanishing violation rate.
What is new in our paper is therefore the synthesis of three ingredients that have largely been treated separately. (i) We incorporate into a repeated-auction pacing model in a way that is compatible with the informational reality of autobidding: bids are chosen using predictive signals, while realized value is revealed later through attribution. (ii) We provide a ROI/CPA feasibility guarantee by explicitly using martingale concentration to construct a lower confidence bound on cumulative realized value and updating the ROI multiplier against this conservative proxy. This makes the safety margin transparent and quantifiable in terms of the delay length and the desired confidence level. (iii) We retain the welfare perspective: when all advertisers run the algorithm in a core auction, we certify a constant-factor approximation to an ex-ante liquid-welfare benchmark, with an additional, interpretable penalty that reflects statistical conservatism induced by delay. In practical terms, our results formalize how measurement latency and privacy-induced reporting frictions translate into systematically more conservative bidding, and they do so in a way that preserves market-level efficiency guarantees rather than focusing solely on single-agent regret.
Delayed conversions force a conceptual choice about what it means to
be
ROI-feasible'' \emph{while bidding}. If conversion values were observed immediately, one could directly enforce the cumulative constraint \[ \sum_{t=1}^T p_{k,t}\;\le\;\frac{1}{\gamma_k}\sum_{t=1}^T Y_{k,t} \] either ex post (by stopping spend once the inequality becomes tight) or in expectation (by controlling the conditional drift of the left-minus-right side). With a fixed delay $D$, however, the right-hand side is not observable at decision time, so an advertiser cannot literally condition on the realized ROI numerator when choosing bids. In practice, autobidders instead act on \emph{predictive} value signals, and then later face an accounting reconciliation when conversions are reported. Our goal is to formalize a notion of feasibility that (i) is meaningful from a compliance perspective (we
violate with probability at most δ’’), and (ii) can be enforced using
information available at bid time.
Fix an advertiser k. Write
cumulative spend and (true) cumulative value as
$$
P_{k,t}\;=\;\sum_{\tau=1}^t p_{k,\tau},
\qquad
Y_{k,t}^{\mathrm{cum}}\;=\;\sum_{\tau=1}^t Y_{k,\tau}.
$$
The ROI/CPA constraint can be rewritten as a condition on cumulative
slack:
Gk, T ≡ γkPk, T − Yk, Tcum ≤ 0.
With delay, Gk, T
is unknown at time T, let
alone at intermediate times. What is known online is the predictable
proxy vk, txk, t,
which satisfies the martingale-unbiasedness condition 𝔼[Yk, t ∣ ℱt] = vk, txk, t.
This suggests tracking the slack
$$
\widehat G_{k,t}^{\mathrm{proxy}}
\;\equiv\;
\gamma_k P_{k,t}-\sum_{\tau=1}^t v_{k,\tau}x_{k,\tau},
$$
but feasibility in terms of Ĝk, tproxy
is not, by itself, a safety guarantee, because realized outcomes can
fall below their conditional means. The natural conservative correction
is therefore to replace realized value by a (LCB) on cumulative realized
value, yielding a sufficient condition for ROI feasibility.
For each k, consider the
martingale difference sequence ξk, t = Yk, t − vk, txk, t
with 𝔼[ξk, t ∣ ℱt] = 0
and ξk, t ∈ [−v̄, v̄].
A standard way to express concentration uniformly over horizons is to
define a radius function radk(t, δ)
such that, with probability at least 1 − δ,
We emphasize two features of . First, it is : it holds simultaneously
for all t, which is the right
form when an algorithm adapts bids over time. Second, it is compatible
with fixed delay: the inequality is a statement about the realized
trajectory, not about what is observable at time t. The delay affects (one cannot
verify the bound online) and (at time T there are D outcomes not yet revealed), but it
does not change the underlying martingale structure.
To make observability explicit, it is useful to separate revealed and
pending outcomes at time t.
Let
$$
Y_{k,t}^{\mathrm{revealed}}
\;=\;
\sum_{\tau=1}^{t-D} Y_{k,\tau},
\qquad
Y_{k,t}^{\mathrm{pending}}
\;=\;
\sum_{\tau=t-D+1}^{t} Y_{k,\tau},
$$
where the pending sum is interpreted as 0 if t ≤ D. At time t, only Yk, trevealed
is known, while Yk, tpending ∈ [0, Dv̄]
is unknown but nonnegative. This simple nonnegativity is already
informative: it implies that any lower bound on Yk, trevealed
is also a lower bound on Yk, tcum.
We now define the LCB process that will serve as the conservative
accounting numerator. For each k and time t, define
The quantity $\underline
Y_{k}(t;\delta)$ is ℱt-measurable because it
depends only on value signals, allocations, and a deterministic radius.
On the high-probability event in , it satisfies $\underline Y_{k}(t;\delta)\le
Y_{k,t}^{\mathrm{cum}}$ for all t. This is the key monotonicity we
need: spending against $\underline Y$
is, by construction, safe against the (unknown) realized Y up to a probability δ.
When delay is operationally salient (e.g., at the end of an
accounting period), one may further define an LCB based only on
information revealed by time t:
which lower-bounds Yk, trevealed
(and hence Yk, tcum)
on the same event. We view as the analytically clean object for
guaranteeing final-period feasibility, and as the practical real-time
diagnostic when one wants to know ``how safe are we given what has been
measured so far?’’
Using the LCB, we define a conservative feasibility notion that mirrors ex-post feasibility but accommodates delay and uncertainty.
A bidding (and allocation/payment) trajectory is for advertiser k if
ℙ (γkPk, T ≤ Yk, Tcum) ≥ 1 − δ.
It is if it satisfies the deterministic inequality
On the event , implies γkPk, T ≤ Yk, Tcum,
so LCB-feasibility is a sufficient condition for (1 − δ)-ROI-feasibility.
Two comments clarify the interpretation. First, this is ``ex-post’’ in the sense that it is a statement about the ROI numerator, not merely about conditional expectations. Second, it is probabilistic: because realized conversions are stochastic (even conditional on bid-time information), no nontrivial algorithm can guarantee ROI surely without assuming deterministic outcomes. The confidence level δ therefore plays the role of an explicit compliance knob.
A common alternative in learning-with-constraints is to enforce ROI ,
e.g.,
𝔼 [γkPk, T − Yk, Tcum] ≤ 0,
or to guarantee sublinear expected violation 𝔼[(γkPk, T − Yk, Tcum)+] = o(T).
These guarantees are mathematically convenient but economically
misaligned with many advertising objectives. An expectation guarantee
permits rare but severe shortfalls in realized conversions—exactly the
events that can trigger budget resets, credit limits, or contractual
disputes. Likewise, sublinear violation does not preclude an $\Theta(\sqrt{T})$ overspend relative to
realized value, which may be operationally large over long horizons.
Our LCB-based notion makes this tradeoff transparent. It replaces a potentially fragile expectation condition by a robust high-probability condition, at the cost of a statistical ``safety margin’’ radk(T, δ) that effectively reduces the value one is willing to spend against. This is the precise channel through which delay, noise, and stringent confidence requirements translate into conservative bidding and (as we later quantify) an additive welfare penalty.
Delayed conversion reporting forces us to separate two questions that are effectively identical in the no-delay case: what does it mean to be ROI/CPA-feasible , and what can an autobidder enforce ? When realized conversion value is observed immediately, one can treat the cumulative ROI constraint as an online accounting identity and simply ``stop spending’’ once the constraint becomes tight. With a fixed delay D, the right-hand side is revealed only after decisions have already been made, so any meaningful feasibility notion must explicitly confront uncertainty about the (as yet unobserved) realized value.
Fix advertiser k and define
cumulative spend and realized value up to time t:
$$
P_{k,t}\;=\;\sum_{\tau=1}^t p_{k,\tau},
\qquad
Y_{k,t}^{\mathrm{cum}}\;=\;\sum_{\tau=1}^t Y_{k,\tau}.
$$
The ROI/CPA requirement $\sum_{t=1}^T
p_{k,t}\le \frac{1}{\gamma_k}\sum_{t=1}^T Y_{k,t}$ is
equivalently a constraint on :
Gk, t ≡ γkPk, t − Yk, tcum, and
the target
is Gk, T ≤ 0.
In a delayed-feedback environment, Gk, t
is not observable at time t
because Yk, tcum
includes outcomes that have not yet been measured. The right modeling
move, therefore, is to identify an for Yk, tcum
and then guard it with a statistically valid safety margin.
The key primitive is the martingale-unbiasedness condition
𝔼 [Yk, t ∣ ℱt] = vk, txk, t,
so ∑τ ≤ tvk, τxk, τ
is the natural predictable proxy for cumulative realized value. This
suggests tracking the
$$
\widehat G_{k,t}^{\mathrm{proxy}}
\;\equiv\;
\gamma_k P_{k,t}-\sum_{\tau=1}^t v_{k,\tau}x_{k,\tau}.
$$
However, proxy slack alone is not a compliance guarantee: even if Ĝk, Tproxy ≤ 0,
realized conversions can fall below their conditional means along an
unlucky sample path, leading to Gk, T > 0
and an ROI violation. For operational constraints that are evaluated on
realized outcomes, the conservative object is therefore not the
conditional mean, but rather a (LCB) on cumulative realized value.
Let ξk, t = Yk, t − vk, txk, t.
Then (ξk, t)t ≥ 1
is a martingale difference sequence with respect to (ℱt), bounded in [−v̄, v̄]. Standard
concentration tools (Azuma–Hoeffding under boundedness, Freedman under a
variance proxy) yield a function radk(t, δ)
such that, with probability at least 1 − δ,
We emphasize the form: holds simultaneously for all t, which is essential because the
bidding process adapts to past observations and thus selects stopping
times endogenously.
A fixed delay D affects what is at time t, but it does not destroy the martingale structure underlying . What changes is the algorithmic interpretation: near the end of the horizon, there are outcomes that are not yet measurable, so any implementable diagnostic must treat these terms conservatively.
To isolate what can be verified online, decompose the cumulative
realized value at time t into
revealed and pending components:
$$
Y_{k,t}^{\mathrm{revealed}}
\;=\;
\sum_{\tau=1}^{t-D} Y_{k,\tau},
\qquad
Y_{k,t}^{\mathrm{pending}}
\;=\;
\sum_{\tau=t-D+1}^{t} Y_{k,\tau},
$$
with the convention that Yk, trevealed = 0
when t ≤ D. At time
t, Yk, trevealed
is known, while Yk, tpending
is unknown but satisfies the deterministic bound 0 ≤ Yk, tpending ≤ Dv̄.
This observation is simple but useful: any lower bound on Yk, trevealed
is automatically a lower bound on Yk, tcum.
We define the (signal-based) LCB process that will serve as a
conservative accounting numerator. For each advertiser k and time t, let
By construction, $\underline
Y_k(t;\delta)$ is ℱt-measurable: it depends
only on bids/allocations (immediately observed), value signals, and a
deterministic radius. On the event , it is a pathwise lower bound on
realized cumulative value, i.e.,
$$
\underline Y_k(t;\delta)\;\le\;Y_{k,t}^{\mathrm{cum}}\qquad\forall t.
$$
For real-time monitoring under delay, it is also natural to define a
revealed-value version,
which lower-bounds Yk, trevealed
on the same high-probability event. We view as the analytically clean
object for end-of-horizon feasibility statements, and as the
operationally meaningful ``what we can certify right now’’ quantity.
We now state the compliance notion we target.
A trajectory is for advertiser k if
ℙ (γkPk, T ≤ Yk, Tcum) ≥ 1 − δ.
Because Yk, Tcum
is random, a nontrivial sure guarantee is generally impossible without
additional determinism assumptions. Our approach is to enforce a
stronger, pathwise inequality against the LCB:
On the event , implies γkPk, T ≤ Yk, Tcum,
and hence it is a sufficient condition for (1 − δ)-ROI-feasibility. The
parameter δ becomes an
explicit compliance knob: tightening δ widens radk(T, δ)
and makes bidding more conservative.
A common alternative is to impose ROI constraints only in
expectation, for example
𝔼 [γkPk, T − Yk, Tcum] ≤ 0,
or to bound expected positive-part violation, 𝔼[(γkPk, T − Yk, Tcum)+].
Such formulations are attractive analytically, but they are a poor match
for how ROI/CPA constraints are enforced in practice. An expectation
guarantee permits rare but economically severe shortfalls in realized
conversions; these are precisely the events that trigger billing
disputes, credit holds, or internal risk limits. Likewise, a sublinear
expected violation can still correspond to a large absolute overspend
(e.g., $\Theta(\sqrt{T})$) relative to
realized value, which may be operationally material over long
horizons.
Our LCB-based notion makes the tradeoff explicit: to obtain a high-probability statement about realized ROI, we must withhold a statistical margin radk(T, δ) from the value we are willing to ``spend against.’’ This margin is the conduit through which delay, noise, and stringent compliance requirements translate into conservative bidding—and, as we quantify later, into an additive efficiency loss. The next section shows how to embed into a pacing-style update rule that remains implementable under delayed measurements.
We now describe a pacing rule that operationalizes the conservative feasibility idea from the previous section: rather than spending against realized conversion value (which is delayed), the bidder spends against a predictable lower confidence bound (LCB) on cumulative value. The design goal is deliberately modest: we do not attempt to learn a stationary best-response or reach a fixed point. Instead, we maintain while continuing to bid myopically on the current value signal vk, t, throttled by multipliers that track (i) budget tightness and (ii) ROI/CPA tightness.
Fix an advertiser k. TSP-DC
maintains two nonnegative pacing multipliers:
μk, tB ≥ 0 (budget
multiplier), μk, tR ≥ 0 (ROI/CPA
multiplier).
In each round t, after
observing the value signal vk, t,
the effective multiplier is the most conservative one,
and the bidder submits the shaded bid
This is the same ``multiplicative pacing’’ form used in budget pacing: a
higher multiplier reduces the bid proportionally, which is convenient
both analytically (it corresponds to a Lagrangian relaxation) and
practically (it is easy to implement inside an autobidder).
Because the budget constraint is required to hold surely, TSP-DC
includes a deterministic cap based on remaining budget. Let
$$
B^{\mathrm{rem}}_{k,t}\;\equiv\;B_k-\sum_{\tau=1}^{t-1} p_{k,\tau}
$$
denote remaining budget at the start of round t. If Bk, trem ≤ 0,
we set bk, t = 0
(the bidder drops out). Otherwise, we cap the bid as
This cap is conservative but robust: by individual rationality of the
auction, pk, t ≤ bk, txk, t ≤ bk, t,
so implies pk, t ≤ Bk, trem
and hence $\sum_{t=1}^T p_{k,t}\le B_k$
deterministically. (In settings where xk, t ∈ {0, 1},
the implication is immediate; for fractional allocations, xk, t ≤ 1
suffices.)
After bids are submitted and the auction clears, the bidder observes
the allocation and payment (xk, t, pk, t)
immediately. TSP-DC then updates its multipliers using two (possibly
different) learning rates ηB, ηR > 0.
The budget update uses only immediate spend and targets the per-round
budget rate ρk = Bk/T:
where [z]+ = max {z, 0}
denotes projection onto ℝ ≥ 0. This is the standard
pacing-as-dual-ascent update: if the bidder spends above its target rate
in round t, μB increases and
future bids are shaded more; if it underspends, μB relaxes.
The ROI controller is conceptually similar but must be more
conservative because realized value is delayed and noisy. We update
μR using a
per-round slack increment. Define
with the convention radk(0, δ) = 0.
The ROI multiplier update is
Several features are worth emphasizing.
(i) The term vk, txk, t
is observable at the end of round t (it depends on the signal and the
allocation), while Δk, t
is deterministic given (t, δ) and the chosen
confidence radius. This is the key implementability point under delay:
the ROI controller reacts immediately to spend and to the proxy for
value.
(ii) Because Δk, t
is defined as the difference of radii, the cumulative padded slack
satisfies
$$
\sum_{\tau=1}^t \tilde s_{k,\tau}
\;=\;
\gamma_k\sum_{\tau=1}^t p_{k,\tau}
-
\sum_{\tau=1}^t v_{k,\tau}x_{k,\tau}
+
\mathrm{rad}_k(t,\delta).
$$
Thus, μR
is effectively tracking (a projected, scaled version of) the
conservative proxy slack $\gamma_k
P_{k,t}-\underline Y_k(t;\delta)$, without requiring the bidder
to explicitly store or recompute long sums.
(iii) Any predictable function radk(t, δ)
that upper-bounds the deviation between realized and proxy cumulative
value (accounting for fixed delay D) can be used in . Operationally,
one can inflate the radius near the horizon to reflect that the last
D outcomes are pending. We
postpone the exact functional form and its dependence on (D, δ) to the next section;
for the algorithm, the only requirement is that radk(t, δ)
be nondecreasing in t, so that
Δk, t ≥ 0.
Using μk, t as in is a simple but important design choice. Both constraints reduce willingness-to-pay: a tight budget calls for spending less overall, and a tight ROI/CPA calls for spending less per unit of value. Since larger multipliers reduce bids, taking the maximum ensures that dominates bidding. This avoids pathological interactions where one controller relaxes while the other tightens: the bidder always uses the stricter throttle, while both multipliers continue to update in the background based on their own accounting signals.
For clarity, we summarize TSP-DC for advertiser k as the following per-round routine:The ``two-timescale’’ terminology reflects that the platform-facing hard budget constraint is typically managed more aggressively than ROI/CPA: budgets are contractual and immediately enforceable, whereas ROI is statistical and delayed. In our analysis, it is convenient (though not strictly necessary) to allow ηB and ηR to differ, e.g., with ηB larger so that μB reacts faster to overspend. Practitioners often also impose upper bounds $\mu^B_{k,t},\mu^R_{k,t}\le \overline\mu$ to prevent extreme bid shading when signals spike; such caps can be added as an additional projection without changing the logic of the update.
Finally, we stress what the algorithm do: it does not attempt to infer or predict other bidders, and it does not require that the induced dynamics converge. Each advertiser needs only its own (vk, t, xk, t, pk, t) stream and a pre-specified radius schedule. The next section formalizes the resulting feasibility guarantees and makes explicit how the required conservatism scales with delay D and confidence level δ.
We now state and prove the safety properties that motivate TSP-DC. The budget constraint is enforced deterministically (and hence does not interact with the conversion delay), while the ROI/CPA constraint is enforced against a predictable (LCB) on cumulative value. The main technical choice is therefore an explicit confidence radius schedule that (i) is valid for the martingale difference noise in realized conversions and (ii) is implementable under a fixed observation delay D.
Fix advertiser k and recall
ξk, t ≡ Yk, t − vk, txk, t,
where 𝔼[ξk, t ∣ ℱt] = 0
and ξk, t ∈ [−v̄, v̄].
A convenient (worst-case) Freedman/Azuma-style confidence radius for
partial sums is
which satisfies, for any fixed m,
$$
\mathbb{P}\!\left(\sum_{t=1}^{m} Y_{k,t}\;\ge\;\sum_{t=1}^{m}
v_{k,t}x_{k,t}-\mathrm{rad}^{\mathrm{basic}}(m,\alpha)\right)\;\ge\;1-\alpha.
$$
To obtain a statement that holds simultaneously over horizons and that
remains predictable when conversions are delayed, we combine two
standard reductions: (i) a union bound over m ∈ {0, 1, …, T} and (ii) a
worst-case bound on the D
(unrevealed) rounds. Concretely, define the delayed, time-uniform
radius
where (t − D)+ = max {t − D, 0}.
The first term controls statistical fluctuations on the revealed prefix
of length (t − D)+, and
the additive Dv̄ is
the conservative penalty for the D most recent (unrevealed) values,
each of which could be as low as 0 in
the worst case.
Given , we define the predictable cumulative value LCB
Importantly, $\underline{Y}_{k,t}(\delta)$ is computable
by the bidder at the end of round t from (vk, τ, xk, τ)τ ≤ t
and known constants (v̄, D, δ), without
needing any realized conversions.
The multiplier update is designed to the conservative slack over
time. For a clean feasibility statement that is robust to arbitrary
bid-to-payment mappings consistent with individual rationality, we
additionally use the same idea as the remaining-budget cap, now applied
to conservative ROI credit. Define the conservative remaining ROI spend
capacity at the start of round t as
and interpret Rk, trem(δ) ≤ 0
as ``no safe spend remains.’’ We then cap the bid by
in addition to the budget cap . This cap is operationally simple (it
depends only on the bidder’s own accounting) and can be viewed as the
hard-safety analogue of the ROI multiplier: when conservative credit is
scarce, bidding is throttled all the way to zero regardless of the
multiplier state.
We can now state the main safety guarantee.
The budget part is immediate from the earlier accounting argument: after applying we have pk, t ≤ bk, t ≤ Bk, trem, so the cumulative spend cannot exceed Bk.
For ROI, the cap ensures an analogous deterministic invariant with
respect to the LCB. Indeed, by individual rationality, pk, t ≤ bk, txk, t ≤ bk, t,
and therefore implies pk, t ≤ [Rk, trem(δ)]+.
Summing over rounds yields the telescoping inequality
which holds surely (it is purely algebraic given the cap and the
definition of $\underline{Y}$).
The second step is statistical: by construction of radk(⋅, δ) in ,
a union bound over k and t implies that with probability at
least 1 − δ,
Combining and yields the desired ROI/CPA feasibility:
$$
\sum_{t=1}^T p_{k,t}
\;\le\;
\frac{1}{\gamma_k}\left(\sum_{t=1}^T
v_{k,t}x_{k,t}-\mathrm{rad}_k(T,\delta)\right)
\;\le\;
\frac{1}{\gamma_k}\sum_{t=1}^T Y_{k,t}.
$$
Equation makes the safety–efficiency tradeoff transparent. The statistical term scales as $\bar v\sqrt{(T-D)\log(nT/\delta)}$ (plus a lower-order v̄log (nT/δ) term), while the delay contributes an additive Dv̄ that is unavoidable without additional structure: if the last D outcomes have not yet been observed, maintaining a one-sided guarantee forces us to treat them pessimistically. Practically, smaller δ (stricter safety) and larger D (longer attribution windows) both tighten the conservative LCB and therefore reduce admissible spend through ; this is precisely the policy-relevant channel through which measurement latency depresses feasible bidding even when expected conversions are unchanged.
We now turn from individual safety to a market-level guarantee: even though (i) conversion feedback is delayed and (ii) bidders update multipliers online without converging to an equilibrium, the induced outcomes are approximately efficient in the standard sense, up to a statistical conservatism penalty that is explicit in (D, δ). The key message is that delay does not break the familiar proof; it only changes the effective willingness-to-pay that we are allowed to treat as feasible when ROI/CPA must hold with high probability.
To make this precise, let Φ
denote the (random) allocation-sequence rule induced when all
advertisers run TSP-DC and bid bk, t = vk, t/(1 + μk, t)
(with the budget and ROI remaining-credit caps applied). Recall that our
ex-ante benchmark is defined in terms of the observable signals v and a single-round rule y(⋅) ∈ X:
$$
W^{\mathrm{ex\text{-}ante}}(y,F)
\;=\;
\sum_{k=1}^n T\cdot
\min\Big\{\rho_k,\;\frac{1}{\gamma_k}\,\mathbb{E}_{v\sim
F}\!\big[y_k(v)\,v_k\big]\Big\}.
$$
Likewise, Wex-ante(Φ, F)
denotes the welfare of the induced play, evaluated against the same
proxy values v.
The proof follows the same two-step structure as in pacing-without-convergence analyses: first, we relate the welfare loss to auction revenue using core-ness; second, we upper bound total revenue by an appropriate notion of that is feasible for each bidder. The only substantive change under delay is that the feasible liquid value is no longer $\frac{1}{\gamma_k}\sum_t v_{k,t}x_{k,t}$, but rather its counterpart $\frac{1}{\gamma_k}\underline{Y}_{k,T}(\delta)$, which is smaller by radk(T, δ)/γk. This is exactly the additional penalty term in .
To see where the factor 1/2 comes from, recall that pacing bids have the multiplicative form bk, t = vk, t/(1 + μk, t). When a bidder is (say μk, t ≤ 1), we have bk, t ≥ vk, t/2, so the auction is, in effect, running on bids that preserve at least half of the signal values. In contrast, when a bidder is (μk, t > 1 or the remaining-credit caps bind), the bidder is expressing low willingness-to-pay not because the impression is intrinsically low value, but because some constraint is becoming tight. In that regime, the classic charging argument reallocates the would-be welfare loss to the fact that the bidder must be spending close to their liquid limit.
Formally, fix an arbitrary comparator y(⋅) ∈ X and apply the core
condition round-by-round with S chosen as the set of advertisers
whose bids are “small” relative to the comparator’s suggested
allocation. Summing the resulting inequalities over rounds yields a
bound of the form
$$
\sum_{t=1}^T \sum_{k=1}^n b_{k,t}y_{k,t}
\;\le\;
\sum_{t=1}^T \sum_{k=1}^n b_{k,t}x_{k,t}
\;+\;
\sum_{t=1}^T \sum_{k=1}^n p_{k,t},
$$
where we have used individual rationality to control the deviation term
by revenue. The left-hand side is a proxy for the comparator’s total
value evaluated at our bids, while the first term on the right is our
realized “bid-weighted” welfare, and the second is total revenue.
At this point, the pacing form of bids provides the welfare
approximation. Since bk, t = vk, t/(1 + μk, t),
we can write
vk, txk, t = (1 + μk, t) bk, txk, t ≥ bk, txk, t,
and, more importantly, whenever μk, t ≤ 1
we have vk, txk, t ≥ 2 bk, txk, t.
Aggregating across bidders and rounds, one shows that at least half of
the comparator’s proxy welfare is either (i) directly realized as proxy
welfare under Φ, or (ii) can
be charged to revenue collected during rounds in which some bidder is
significantly paced.
Finally, we upper bound total revenue by . By Theorem~, for each
advertiser k we have
deterministically
$$
\sum_{t=1}^T p_{k,t}
\;\le\;
\min\Big\{B_k,\;\frac{1}{\gamma_k}\underline{Y}_{k,T}(\delta)\Big\}
\;=\;
\min\Big\{B_k,\;\frac{1}{\gamma_k}\sum_{t=1}^T v_{k,t}x_{k,t}\Big\}
\;-\;
O\!\Big(\frac{1}{\gamma_k}\mathrm{rad}_k(T,\delta)\Big),
$$
where the last step simply isolates the loss from enforcing ROI/CPA
against an LCB rather than the raw proxy sum. Summing over k bounds total revenue by the sum of
conservative liquid values; substituting this revenue bound into the
core charging inequality yields . The appearance of D is entirely through radk(T, δ):
with a fixed D-round delay,
the confidence radius contains an unavoidable additive Dv̄ term (cf. ), which
translates linearly into a liquid-welfare penalty.
Equation is then a direct corollary of the concentration event : once the realized cumulative values dominate their LCBs, the true liquid value $\frac{1}{\gamma_k}\sum_t Y_{k,t}$ dominates the conservative proxy liquid value $\frac{1}{\gamma_k}\underline{Y}_{k,T}(\delta)$, so any ex-ante welfare lower bound transfers to realized welfare up to the same radk(T, δ)/γk loss. This makes the practical tradeoff transparent: longer attribution windows and stricter safety levels mechanically depress welfare by forcing bidders to act as if the last D outcomes might be zero, even when expected conversions are unchanged.
The welfare bound above is deliberately : it does not require the multipliers to converge, only that they remain safe and that the auction is core. In some applications, however, one also wants an learning interpretation: if an advertiser were optimizing in a stationary (or slowly drifting) environment, how close is TSP-DC to the best fixed (or slowly varying) pacing choice that could have been selected in hindsight? We briefly outline such a guarantee here. The main conceptual point is that delayed conversion feedback does not merely slow learning; because we insist on high-probability ROI/CPA feasibility, delay forces the comparator itself to be via a padded ROI constraint.
To avoid distractions from strategic interaction, fix an advertiser
k and view the rest of the
market and the auction mechanism as inducing, for each (effective)
multiplier μ ≥ 0, a
distribution over per-round outcomes (x(μ), p(μ))
conditional on the advertiser’s signal v. Under i.i.d. signals and a
stationary environment, define the expected proxy value and expected
spend at multiplier μ by
V(μ) = 𝔼[v x(μ)], P(μ) = 𝔼[p(μ)],
where the expectation is over the signal v and any randomness in the
auction/competition. We assume a mild regularity condition used in prior
pacing regret analyses: V(μ) and P(μ) are Lipschitz (or
piecewise Lipschitz) in μ, and
the per-round surrogate losses are convex in μ after the standard
reparameterization (equivalently, the mapping μ ↦ 1/(1 + μ) is smooth and
the auction response is not too discontinuous in bids). These
assumptions are not innocuous—in particular, single-item auctions can
create kinks when bid orderings swap—but they are a reasonable
approximation in thick markets or under small bid perturbations.
With delayed and noisy realized values Y, we cannot safely benchmark
against the multiplier that would be optimal under the ROI constraint
𝔼[γp] ≤ 𝔼[Y]
unless we allow a nonzero violation probability. Instead, the natural
comparator is the best multiplier that satisfies a ROI constraint,
reflecting the fact that the algorithm must act as if cumulative
realized value might undershoot its proxy by a confidence radius.
Concretely, for a horizon T
and confidence level δ, define
a per-round padding
$$
\epsilon_T(D,\delta)\;\asymp\;\frac{\mathrm{rad}_k(T,\delta)}{T},
$$
where radk(T, δ)
includes the fixed-delay conservatism (e.g., an additive Dv̄ term plus the usual
$\bar v\sqrt{T\log(1/\delta)}$
concentration scale). We then consider the
ℳT = {μ ≥ 0: P(μ) ≤ ρk, γkP(μ) ≤ V(μ) − ϵT(D, δ)}.
Let μT⋆ ∈ arg maxμ ∈ ℳTV(μ)
be the best fixed feasible multiplier under this padded notion of ROI.
The dependence on T is
important: as T grows, the
padding ϵT(D, δ)
shrinks, so the comparator becomes less conservative, matching the
intuition that long campaigns can afford tighter statistical
calibration.
Let the algorithm’s realized proxy value be $\sum_{t=1}^T v_{k,t}x_{k,t}$ (which is what
the bidder can optimize online, given that Y is delayed). Define the (proxy)
regret against μT⋆
as
$$
\mathrm{Reg}(T)
\;=\;
T\cdot V(\mu^\star_T)\;-\;\sum_{t=1}^T
\mathbb{E}\big[v_{k,t}x_{k,t}\big].
$$
Under the same step-size choices used in pacing-without-convergence
analyses (two timescales, with the budget multiplier updated more
aggressively than the ROI multiplier), and under the Lipschitz/convexity
surrogates described above, one can adapt standard online convex
optimization arguments to show a sublinear regret bound of the
form
Reg(T) ≤ Õ (T3/4) + Õ (radk(T, δ)),
where the first term is the familiar non-asymptotic price of
simultaneously learning and maintaining feasibility with multiplicative
bidding (the exponent 3/4 is typical
when one combines smoothing, two constraints, and a non-smooth auction
response), and the second term is the explicit statistical cost of
delay/noise through the padded ROI comparator. Importantly, delay does
not appear as an additional difficulty beyond this padding: because the
ROI controller is driven by the predictable proxy term vk, txk, t
plus an explicit confidence increment, the update does not require
waiting for Yk, t
to arrive in order to remain safe.
A more flexible benchmark allows the environment to drift slowly (e.g.,
seasonality in competition), and compares TSP-DC to a {μt⋆}t = 1T
with bounded path length
$$
\mathrm{PL}_T
\;=\;
\sum_{t=2}^T \big|\mu_t^\star-\mu_{t-1}^\star\big|.
$$
The appropriate comparator class is again padded: at each time t, the comparator must satisfy
instantaneous analogues of the budget-rate and padded ROI constraints in
expectation under the time-t
environment. Under bounded drift and the same regularity conditions, one
obtains a dynamic regret guarantee of the schematic form
$$
\mathrm{DReg}(T)
\;=\;
\sum_{t=1}^T \Big(V_t(\mu_t^\star)-V_t(\mu_t)\Big)
\;\le\;
\tilde O\!\Big(T^{3/4}\big(1+\mathrm{PL}_T\big)\Big)
\;+\;
\tilde O\!\big(\mathrm{rad}_k(T,\delta)\big),
$$
where Vt(⋅) denotes
the time-t expected proxy
value under the drifting environment. The dependence on PLT formalizes the usual
intuition: tracking a rapidly moving optimum is costly, but if the
optimal pacing multiplier changes slowly, the algorithm can shadow it
with only a mild additional penalty.
At a high level, we treat the multiplier updates as online gradient (or
mirror) descent on surrogate constraint losses. For the budget
constraint, the natural loss is proportional to pk, t − ρk
and is observed immediately. For ROI, we use the padded slack signal
(proxy value minus γ-scaled
spend, plus the predictable confidence increment), so the ROI update is
also driven by a quantity available at time t even though Yk, t
is not. The confidence padding is doing double duty: it ensures
high-probability feasibility, and it converts the delayed/noisy ROI
constraint into an online-learnable surrogate. The remaining technical
work is to relate regret in the surrogate losses to regret in proxy
value, and to control the non-smoothness introduced by the auction
response via Lipschitzness or smoothing arguments.
This optional regret view clarifies what delay and change. It does not
fundamentally prevent learning a good pacing policy in a stationary
market; rather, it forces the bidder to compete against a more
conservative policy class that builds in statistical slack of order
ϵT(D, δ).
On the other hand, the guarantee rests on conditions that may fail in
practice: discontinuities in allocation as bids cross can break
Lipschitzness, and fully adversarial opponents can induce linear regret
for any algorithm. For these reasons, we view the regret statement as a
useful complement—connecting to classical online optimization—but not as
the core robustness guarantee, which remains the feasibility and
aggregate welfare bound that holds without convergence and without
stationarity assumptions.
We close by discussing several modeling and implementation issues that arise in practice, and how they interface with the feasibility–efficiency logic above. The unifying theme is that our guarantees rely on two ingredients: (i) an that links the bidder’s controllable quantities (payments and allocations) to a predictable proxy for value (the signal term vk, txk, t), and (ii) a (the confidence radius) that shields ROI feasibility against delayed and noisy realizations. Each extension can be read as modifying one of these two ingredients.
In many advertising systems the realized value Yk, t
is not observed at the impression level. Instead, one sees (with delay)
an aggregated report over a time window (e.g., daily) and possibly a
report due to privacy thresholds, de-duplication, or measurement loss.
Let 𝒲j ⊆ {1, …, T}
denote the set of rounds whose conversions are reported in aggregate
bucket j, and let the report
be
Ŷk, j = ∑t ∈ 𝒲jYk, t + ζk, j,
where ζk, j
captures noise (e.g., privacy perturbation) and censoring can be
represented by Ŷk, j = 1{∑t ∈ 𝒲jYk, t ≥ τ} ⋅ (∑t ∈ 𝒲jYk, t + ζk, j)
for a threshold τ. The key
question is whether we retain an unbiasedness condition suitable for
concentration. If the platform provides an unbiased (or conservative)
estimator Ỹk, j
of ∑t ∈ 𝒲jYk, t
conditional on the bidder’s filtration, then we can re-run Proposition~1
at the by defining martingale differences
ξk, j = Ỹk, j − ∑t ∈ 𝒲jvk, txk, t.
The concentration radius then scales with the number of buckets rather
than T, but the range (and
variance proxy) per bucket increases with |𝒲j|. This captures the
operational tradeoff: coarser aggregation yields fewer feedback points
(slower statistical tightening) but potentially less per-point noise if
aggregation reduces variance.
If censoring destroys unbiasedness, we can still preserve by working with a lower bound $\underline Y_{k,j}\le \sum_{t\in\mathcal{W}_j}Y_{k,t}$ that is valid almost surely (for instance, treating censored buckets as zero). This is conservative—it tightens the ROI controller—but it fits our design philosophy: the ROI multiplier update only needs a lower confidence bound. Empirically, platforms often can provide monotone corrections (e.g., ``at least τ conversions occurred’’) that directly translate into such $\underline Y$ terms and reduce conservatism.
Fixed delay D is
analytically convenient but not essential. Suppose each realized value
Yk, t
is revealed after a random delay Dk, t ∈ {0, 1, 2, …},
possibly dependent on observables (e.g., conversion type) but
conditionally independent of future randomness given ℱt. Let ℱt now include all values
whose revelation times are ≤ t. The concentration argument can
be adapted by treating the revealed terms as a stopped martingale: for
any horizon T, decompose
$$
\sum_{t=1}^T Y_{k,t}
\;=\;
\sum_{t:\, t+\mathsf{D}_{k,t}\le T} Y_{k,t}
\;+\;
\sum_{t:\, t+\mathsf{D}_{k,t}> T} Y_{k,t}.
$$
The first sum admits a Freedman/Azuma bound as before (since
unbiasedness is with respect to ℱt, not revelation time),
and the second sum is bounded by v̄ times the number of pending items.
Thus a natural random-delay analogue of radk(T, δ)
is
$$
\mathrm{rad}_k^{\mathrm{rand}}(T,\delta)
\;=\;
O\!\Big(\bar v\sqrt{T\log(1/\delta)}\Big)
\;+\;
\bar v\cdot N_{k}^{\mathrm{pend}}(T),
$$
where Nkpend(T)
counts unresolved contributions at T. If we only have a
high-probability bound on Nkpend(T)
derived from a delay distribution estimate, we can fold that into the
same overall failure probability budget (splitting δ across events). Conceptually,
stochastic delay affects performance through the of the delay
distribution rather than its mean: rare long delays behave like an
additional adversarial uncertainty that must be padded against.
Modern attribution often decomposes value into components (clicks,
installs, purchases) with different delay profiles and different ROI
semantics. A convenient representation is
$$
Y_{k,t} \;=\; \sum_{m=1}^M Y^{(m)}_{k,t},
\qquad
\mathbb{E}\!\left[Y^{(m)}_{k,t}\mid \mathcal{F}_t\right] \;=\;
v^{(m)}_{k,t}x_{k,t},
$$
where each component m has its
own delay D(m) (or random
delay distribution). If the ROI constraint references total value, we
can simply apply concentration to the sum with a radius that adds across
components (or uses a joint variance proxy). If instead the constraint
is (e.g., ``CPA on purchases must hold even if clicks are plentiful’’),
then the controller must track M separate lower confidence bounds
and update a vector multiplier μR ∈ ℝ ≥ 0M.
The bidding rule becomes
$$
b_{k,t}
\;=\;
\frac{v_{k,t}}{1+\mu^B_{k,t}+\sum_{m=1}^M \lambda_m \mu^{R,(m)}_{k,t}},
$$
for suitable weights λm encoding how
each constraint throttles spend. The qualitative message persists:
feasibility is obtained by ensuring spend is covered by confidence
bounds on each relevant value component, and heterogeneous delays only
enter through component-wise padding terms.
The welfare argument uses core-ness to
charge'' welfare loss to revenue. Real auctions may only satisfy an approximate core condition due to reserve prices, discretization, quality adjustments, or throttling. A standard relaxation is an $\alpha$-core (or additive-$\varepsilon$ core) inequality: for all coalitions $S$ and feasible $y\in X$, \[ \sum_{i\notin S} p_i(b) \;\ge\; \frac{1}{\alpha}\sum_{i\in S}\big(b_i y_i- b_i x_i(b)\big)\;-\;\varepsilon. \] Carrying this through the charging proof typically degrades the $\tfrac{1}{2}$ factor to a constant depending on $\alpha$ and adds $T\varepsilon$ to the welfare loss. This is not merely a technicality: if the auction can systematically violate core-ness, then a bidder's pacing can become lessself-financing’’
in the welfare accounting, and aggregate efficiency can drop even if
every bidder is individually feasible. Practically, this suggests an
empirical diagnostic: measuring the degree of approximate-core violation
(under logged bids) can predict how tight an efficiency guarantee one
should expect from pacing alone.
The theory takes v̄ and radk(⋅, δ) as given, but implementing TSP-DC requires choosing these inputs. We view this as a calibration problem rather than a purely statistical one: the goal is to pick radii that are conservative enough to meet the desired violation probability, but not so conservative that they collapse bids.
A pragmatic approach is to use an empirical Bernstein/Freedman radius
with an estimated conditional variance proxy. Concretely, with
historical logs one can estimate (possibly per-campaign) bounds on Y or winsorize the tail to enforce a
bounded range, then set
$$
\mathrm{rad}_k(T,\delta)
\;\approx\;
c_1 \sqrt{\widehat V_k(T)\log(1/\delta)} \;+\; c_2 \bar v\log(1/\delta)
\;+\; \bar v\cdot \widehat N_k^{\mathrm{pend}}(T),
$$
where V̂k(T)
is an empirical variance proxy for ∑t(Yk, t − vk, txk, t).
Likewise, delay enters through an estimate of the pending mass N̂pend. Importantly, this
calibration can be done using replay or semi-synthetic augmentation, and
then deployed online with periodic re-estimation. The remaining
parameters (ηB, ηR)
are then tuned to control responsiveness versus oscillation; their
choice affects regret and transient welfare but does not change the
underlying safety logic as long as the invariant behind Proposition~2 is
maintained.
Several quantities that appear natural in the analysis are not available in closed form in realistic auctions. Most notably, the mapping μ ↦ (V(μ), P(μ)) depends on the joint distribution of opponents’ bids and the auction’s quality adjustments; even in single-item auctions it is generally piecewise-defined with kinks. Thus, computing a ``best’’ feasible multiplier (even the padded comparator) is typically a numerical task, using bid landscape estimation, simulation-based fixed point methods, or offline policy optimization. Similarly, if we extend to multi-component constraints (multiple μR, (m)), then projecting onto a feasible region or selecting step sizes may require adaptive methods (e.g., diagonal preconditioning) and empirical smoothing to mitigate discontinuities. Our view is that this is appropriate: the theory clarifies must be controlled (spend relative to a conservative value proxy) and uncertainty and delay enter (through padding), while the exact quantitative best response of the market is a domain where numerical estimation and engineering judgment are unavoidable.
While our results are stated abstractly, the underlying feasibility–efficiency logic is amenable to a fairly concrete empirical stress test. We therefore propose a semi-synthetic simulation pipeline whose purpose is not to ``prove’’ the theorems numerically, but to (i) quantify the magnitude of the conservatism terms induced by delay and aggregation, (ii) compare safety mechanisms under realistic feedback imperfections, and (iii) isolate which parts of the welfare loss are attributable to budget pacing, ROI pacing, or auction frictions.
We need a mapping from bids to allocations and payments in each round. A convenient starting point is a single-item auction with $x_{k,t} \in [0,1]$, where the highest bid wins (possibly with randomized tie-breaking) and the payment lies between the first and second price, ensuring individual rationality $p_{k,t} \le b_{k,t} x_{k,t}$ and approximate core-ness. To make the exercise semi-synthetic rather than purely parametric, we can estimate an empirical ``bid landscape'' from logs: for each advertiser (or segment) we fit functions that map an effective bid to (a) win probability and (b) conditional payment given winning. The simulator then produces $(x_{k,t}, p_{k,t})$ by sampling from these estimated outcome distributions given the bid vector $b_t$. This approach preserves non-linearities (reserves, discretization, quality multipliers) that matter for pacing stability while keeping full control over the value-generation process.
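A minimal sketch of such a round-level simulator, in the purely parametric single-item form (payment interpolated between the second- and first-highest bid via a parameter rho); the empirical bid-landscape variant would replace the explicit auction with sampled win/payment outcomes.

```python
import numpy as np

def run_auction(bids, rho=0.5, rng=None):
    """Single-item round: highest bid wins (randomized tie-breaking); the winner
    pays p = second + rho * (first - second), interpolating between second-price
    (rho = 0) and first-price (rho = 1), so p <= b_i * x_i holds by construction."""
    rng = np.random.default_rng() if rng is None else rng
    bids = np.asarray(bids, dtype=float)
    n = len(bids)
    order = rng.permutation(n)                   # randomize tie-breaking
    winner = order[np.argmax(bids[order])]
    second = np.partition(bids, -2)[-2] if n > 1 else 0.0
    x = np.zeros(n)
    p = np.zeros(n)
    x[winner] = 1.0
    p[winner] = second + rho * (bids[winner] - second)
    return x, p
```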
For each round $t$ and advertiser $k$, we generate a value signal $v_{k,t} \in [0, \bar v]$ and then a realized conversion value $Y_{k,t} \in [0, \bar v]$ satisfying the key accounting condition $\mathbb{E}[Y_{k,t} \mid \mathcal{F}_t] = v_{k,t} x_{k,t}$. A simple construction that enforces boundedness and unbiasedness is
$$
Y_{k,t} \;=\; \bar v \cdot
\mathrm{Bernoulli}\!\left(\frac{v_{k,t}x_{k,t}}{\bar v}\right),
$$
which yields $\mathbb{E}[Y_{k,t} \mid \mathcal{F}_t] = v_{k,t} x_{k,t}$ and captures a stylized ``conversion/no conversion'' mechanism with value $\bar v$. To study heavier-tailed but bounded noise, we can instead take $Y_{k,t} = \Pi_{[0,\bar v]}(v_{k,t} x_{k,t} + \epsilon_{k,t})$, where $\epsilon_{k,t}$ is mean-zero conditional on $\mathcal{F}_t$ and $\Pi_{[0,\bar v]}$ denotes clipping. By varying the distribution and scale of $\epsilon_{k,t}$ we directly control the size of the confidence radius required for safe ROI pacing, and can empirically validate the comparative statics with respect to $\bar v$ (or an empirical variance proxy).
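Both constructions fit in a few lines; the sketch below assumes a shared random generator and treats the noise scale for the clipped variant as a free parameter.

```python
import numpy as np

def realize_value(v, x, v_bar, rng, mode="bernoulli", noise_scale=0.0):
    """Draw Y_{k,t} with conditional mean v * x.

    mode="bernoulli": Y = v_bar * Bernoulli(v*x / v_bar), the all-or-nothing
                      conversion model in the text (exactly unbiased).
    mode="clipped":   Y = clip(v*x + eps, 0, v_bar) with bounded mean-zero eps;
                      clipping can introduce a small boundary bias, so in practice
                      the noise support should keep v*x + eps inside [0, v_bar]."""
    mean = v * x
    if mode == "bernoulli":
        return v_bar * rng.binomial(1, mean / v_bar)
    eps = rng.uniform(-noise_scale, noise_scale)
    return float(np.clip(mean + eps, 0.0, v_bar))
```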
To model fixed delay D, we
maintain a queue: after generating Yk, t
at round t, we withhold it
from the advertiser until time t + D. The bidder’s
filtration ℱt thus
includes all outcomes and payments up to t and all realized values up to
t − D. In the
simulation, this matters operationally because the ROI controller cannot
react to short-run over-spend on value until the delayed signal arrives.
We can additionally examine random delays by drawing Dk, t from a
calibrated distribution (e.g., a shifted geometric or log-normal,
truncated for bounded support in experiments), revealing Yk, t
at t + Dk, t.
This allows us to measure how rare long delays translate into larger
pending mass near the horizon and therefore a larger effective padding
requirement.
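Operationally, the fixed- and random-delay variants can share one queue abstraction; the sketch below tracks reveal times explicitly, and its pending count is exactly the quantity that feeds the delay term in the padded radius.

```python
class DelayQueue:
    """Withholds realized outcomes until their reveal time.

    push(t, y, delay) schedules y for reveal at time t + delay;
    pop(t) returns (and removes) everything revealed by time t;
    pending(t) counts still-hidden items."""

    def __init__(self):
        self._buf = []                       # list of (reveal_time, value) pairs

    def push(self, t, y, delay):
        self._buf.append((t + delay, y))

    def pop(self, t):
        revealed = [y for (r, y) in self._buf if r <= t]
        self._buf = [(r, y) for (r, y) in self._buf if r > t]
        return revealed

    def pending(self, t):
        return sum(1 for (r, _) in self._buf if r > t)
```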
We partition the horizon into reporting buckets $\mathcal{W}_1, \mathcal{W}_2, \ldots$ (e.g., daily windows) and produce reports of the form $\widehat Y_{k,j} = \sum_{t \in \mathcal{W}_j} Y_{k,t} + \zeta_{k,j}$, with $\zeta_{k,j}$ drawn from a noise distribution representing privacy perturbation (e.g., Laplace noise calibrated to a target privacy level). To represent threshold-based censoring, we additionally allow
$$
\widehat Y_{k,j} \;=\; \mathbf{1}\Big\{\textstyle\sum_{t \in \mathcal{W}_j} Y_{k,t} \ge \tau\Big\} \cdot \Big(\textstyle\sum_{t \in \mathcal{W}_j} Y_{k,t} + \zeta_{k,j}\Big),
$$
for a chosen threshold $\tau$. From the bidder's perspective we then consider two operational modes: (i) an unbiased mode, in which the platform provides a debiased proxy $\widetilde Y_{k,j}$ for $\sum_{t \in \mathcal{W}_j} Y_{k,t}$ (when feasible), and (ii) a conservative mode, in which censored buckets are treated as zero (or as $\tau$ if the platform reveals ``at least $\tau$''), yielding a conservative $\underline Y_{k,j}$. Both modes can be plugged into the same controller logic, but their performance differs: unbiasedness tightens radii faster, whereas lower bounds preserve safety at the cost of throttling.
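A sketch of the bucket-level reporting step covering both operational modes; the debiasing in the unbiased mode is idealized here (censored buckets are simply dropped), and the padding in the conservative mode is a crude stand-in for a proper lower confidence bound on the noise.

```python
import numpy as np

def report_bucket(y_bucket, noise_scale, tau, rng, mode="unbiased"):
    """Bucket-level report: hat{Y} = sum(Y) + Laplace noise, censored below tau.

    mode="unbiased":     consume the noisy report directly; censored buckets are
                         dropped here, an idealization of platform debiasing.
    mode="conservative": treat censored buckets as zero and subtract the noise
                         scale as crude padding (a proper lower bound would use
                         a quantile of the noise distribution)."""
    total = float(np.sum(y_bucket))
    if total < tau:                          # threshold-based censoring
        return 0.0
    report = total + rng.laplace(0.0, noise_scale)
    if mode == "unbiased":
        return report
    return max(0.0, report - noise_scale)
```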
To avoid confounding by step-size tuning, we sweep $(\eta_B, \eta_R)$ over a grid and report both best-achieved performance and robustness (sensitivity) across the sweep.
We vary $D$, bucket size $|\mathcal{W}_j|$, censoring threshold $\tau$, and the noise scale in $\zeta$, as well as $\gamma_k$ and budget tightness $\rho_k$. The central empirical prediction is a smooth tradeoff: TSP-DC should exhibit near-zero budget violation and ROI violation near the target $\delta$, with welfare decreasing as delay and privacy constraints tighten, while naive ROI pacing should retain higher welfare in benign regimes but exhibit sharp ROI failures when $D$ or censoring increases. A limitation of any semi-synthetic study is that the bid-to-outcome simulator may not satisfy exact core-ness and may omit strategic response by other bidders; we therefore treat these experiments as characterizing the safety–efficiency tradeoff implied by our model, not as a full equilibrium validation.
We studied a repeated-auction environment in which advertisers bid using per-round value signals while the conversions that justify those bids arrive only after a fixed delay. The central modeling tension is practical: modern bidding stacks are asked to satisfy spend constraints and ROI/CPA constraints, yet the relevant value feedback is incomplete at precisely the times when the controller must decide how aggressively to bid. Our results formalize a resolution of this tension. By combining a fast budget-pacing loop (driven by immediate payments) with a slower ROI/CPA loop (driven by a conservative lower confidence bound on cumulative value), we can guarantee ex-post budget feasibility and high-probability ROI feasibility under delayed, noisy conversion realization, while retaining a constant-factor efficiency guarantee in a broad class of core auctions. Importantly, the welfare guarantee does not rely on convergence of the learning dynamics; it follows from a charging argument that treats pacing as an online feasibility mechanism rather than an equilibrium-selection device.
From a systems perspective, the most salient message is that delay is an information problem, not a feasibility problem. Delayed conversion feedback does not break the accounting identity that underlies ROI feasibility—spend must be justified by value—but it does change the time at which evidence arrives, thereby forcing the algorithm to act on a predictable proxy (the signal value $v_{k,t} x_{k,t}$) plus a padding term that immunizes against worst-case noise and missing tail information near the horizon. This yields a concrete and interpretable tradeoff: tighter safety (smaller $\delta$), larger delays $D$, or noisier measurement (larger $\bar v$ or larger conditional variance) widen the confidence radius, effectively shrinking willingness-to-pay and reducing welfare. In this sense, the theory provides a disciplined way to reason about an engineering phenomenon that is often treated heuristically: when attribution is delayed or privacy aggregation is coarse, conservative pacing is not merely ``risk aversion'' but the price of maintaining a probabilistic constraint with finite-sample guarantees.
At the same time, our framework deliberately abstracts from many details that matter in production markets, and it therefore suggests several open problems. We group them into three themes: improving constants (and thereby narrowing the safety–efficiency gap), weakening stochastic assumptions on values and feedback, and moving beyond the linear pacing/bid-shading form that we analyze.
Our welfare and feasibility guarantees inherit an additive penalty through $\mathrm{rad}_k(T, \delta)$ that is of order $\bar v\sqrt{T\log(1/\delta)}$ (plus a delay-induced handling of
unresolved terms). This rate is information-theoretically natural under
bounded noise, but the leading constants can be consequential at
marketplace scales, and the delay correction is pessimistic when the
pending mass is typically small. A first direction is to replace
worst-case bounded-difference radii with empirical-variance
(Freedman-style) or self-normalized confidence sequences, so that the
controller adapts when realized variance is low. A second direction is
to integrate the delay directly into the confidence sequence rather than
adding a separate ``pending $D$'' term; for instance, one can treat unrevealed increments as censored observations and develop predictable lower bounds that exploit the fact that only the last $D$ terms are unknown at time $T$. A third, more structural, question is whether one can prove welfare bounds in which the conservatism enters multiplicatively (as a small shrinkage in effective value) rather than additively, at least in regimes where budgets scale linearly with $T$. Such refinements would make
the theory closer to the empirical reality that large campaigns average
out noise quickly, even when per-impression conversion is sparse.
Although our core feasibility arguments are martingale-based and thus
compatible with substantial adaptivity, several benchmark and efficiency
statements are most transparent under stationarity (or i.i.d. value
signals) and unbiasedness of the delayed measurement process. In
practice, both assumptions are fragile. Value signals drift with user
mix, seasonality, and learning on the platform side; attribution
pipelines may be biased by privacy thresholds, cross-device matching, or
conversion deduplication; and delay is rarely fixed, often exhibiting
heavy tails. A natural next step is to develop pacing guarantees under
adversarial or nonstationary value processes, perhaps measuring
performance against a dynamic comparator with bounded variation, while
maintaining hard budget feasibility and probabilistic ROI feasibility.
This pushes us toward tools from online learning with delays and
drifting constraints, where one must decide how quickly the ROI
controller should ``forget’’ stale evidence without creating
oscillations. Similarly, replacing the strict unbiasedness condition
$\mathbb{E}[Y_{k,t} \mid \mathcal{F}_t] = v_{k,t} x_{k,t}$
with a calibrated approximation (e.g., bounded bias or multiplicative
distortion) would align the model with privacy-preserving reporting.
Here the open problem is conceptual as well as technical: what is the
right notion of ROI feasibility when the reported $Y$ is a noisy proxy for a latent
economic value, and how should the pacing rule trade off safety against
systematic under-reporting? Resolving this would strengthen the policy
relevance of the theory, because privacy constraints are not incidental
frictions; they are often binding design requirements.
We modeled bidding as a scaled form of the value signal, $b_{k,t} = v_{k,t}/(1 + \mu_{k,t})$, which captures a ubiquitous ``linear shading'' implementation. This
choice is analytically convenient and operationally common, but it is
not obviously optimal once we admit richer auction formats,
heterogeneous objectives, or nonlinear risk/return tradeoffs. One open direction is to study pacing rules that are nonlinear in the value signal (e.g., piecewise-linear or quantile-based shading) so that the bidder can spend budget on the highest-quality opportunities while throttling low-quality tail traffic more aggressively. Another is to incorporate multi-goal constraints
(brand safety, reach, frequency caps) in a principled way, turning
pacing into a multi-dimensional online dual-control problem with delayed
feedback. On the mechanism side, core-ness is a powerful abstraction,
but many display and sponsored-search auctions include quality
adjustments, reserves, and non-separable click models; extending the
welfare-charging argument to these settings, or identifying the minimal
auction properties needed for a similar guarantee, remains open.
Finally, our analysis treats each bidder as a price-taker who reacts to
outcomes but does not explicitly model strategic interaction among
learning bidders. Establishing whether the welfare guarantee is robust
to strategic misreports of $v_{k,t}$,
or to adversarial pacing dynamics that attempt to manipulate competitors
through transient bid shading, is an important step toward a more
game-theoretic understanding of safe pacing.
Stepping back, we view the model as illuminating a broader design principle: when economic constraints (budgets, ROI/CPA) must be satisfied under delayed and imperfect measurement, the right unit of analysis is not a static equilibrium but an online safety–efficiency tradeoff governed by concentration and information flow. Making this tradeoff sharper—statistically, behaviorally, and institutionally—is the central agenda for future work.