Digital advertising markets are increasingly mediated by : algorithmic agents that translate an advertiser’s objectives (reach, conversions, revenue) and constraints (budget, ROI/CPA targets, pacing smoothness) into per-auction bids at millisecond timescales. In practice, these agents operate in a regime that is simultaneously data-rich and information-poor. They observe abundant signals—estimated conversion rates, predicted values per impression, or contextual relevance scores—yet the business outcomes that justify spend often arrive only after substantial delay. This delay is not incidental: it is an institutional feature of modern measurement and attribution, shaped by user-level privacy protections, platform policies, and the operational realities of conversion tracking.
The 2026 measurement landscape sharpens this tension. First, conversion attribution is commonly governed by fixed windows (e.g., view-through or click-through horizons) that mechanically postpone the moment at which a conversion can be confidently assigned to an impression. Second, privacy-preserving measurement systems frequently , , or outcome reporting. Whether one thinks of on-device aggregation, delayed postbacks, or randomized reporting, the practical effect is the same: the value of an impression is learned after the auction has cleared, and often after many subsequent auctions have already been played. These frictions make it difficult to run a controller that is both aggressive (to capture value) and safe (to respect ROI/CPA requirements), because ``safety’’ is defined in terms of realized outcomes that are not yet observed.
Two constraints are especially salient in deployed autobidding. The first is the hard budget constraint: the advertiser cannot, ex post, exceed a spending cap over a horizon (a day, a week, a campaign). Violating this constraint is operationally unacceptable, and platforms routinely enforce it with conservative accounting and throttling. The second is a return constraint, which we interpret broadly to include ROI, CPA, or cost-per-incremental-action targets. While budget constraints depend only on payments and therefore can be checked immediately, ROI/CPA constraints depend on realized conversions and hence are intrinsically delayed. In a setting with delayed and noisy attribution, an algorithm that na"ively spends against point estimates of conversion value can easily drift into an ROI violation before the feedback arrives, at which point the violation cannot be undone.
Our goal is to formalize and resolve this practical dilemma: how can an autobidder participate in a repeated auction while ensuring (i) budget feasibility, (ii) ROI/CPA feasibility despite delayed outcome feedback, and (iii) nontrivial guarantees? The third criterion deserves emphasis. The repeated-auction environment is not a single-agent optimization problem; it is a strategic, multi-agent system in which many advertisers simultaneously run learning and control algorithms. The platform also plays an active role by determining allocation and payments as a function of bids. Even if each advertiser uses a ``reasonable’’ controller, the joint dynamics need not converge to a steady state, particularly when values are nonstationary, budgets bind, and agents update at different timescales. Yet from the perspective of policy and platform design, we care about aggregate outcomes—who gets served, how much value is generated, and how much welfare is left on the table—even when the system never settles.
This motivates a welfare notion that is compatible with constraints and non-convergence. Classical social welfare is ill-suited when advertisers face hard budgets and ROI requirements: a bidder with high value but a tight ROI target cannot simply pay arbitrarily to obtain impressions. The appropriate benchmark is a notion of welfare, which caps each agent’s contribution by what it can feasibly ``liquidate’’ into spend under its constraints. In the repeated-auction context, this benchmark is naturally stated in terms of an ex-ante proxy built from predictive value signals (the quantities an autobidder actually sees at decision time), and then related back to realized value using concentration. Conceptually, we want a guarantee that says: even if bids are generated by learning dynamics that oscillate, as long as the auction satisfies a mild deviation-proof condition (a core-like property) and advertisers pace in a disciplined way, the system achieves a constant fraction of the best feasible liquid welfare, up to unavoidable statistical and learning error terms.
The central technical challenge is the interaction between delayed outcomes and ROI feasibility. A budget controller can be myopic: it observes spend and updates a multiplier to keep cumulative payments below the cap. An ROI controller cannot do this directly, because the ``resource’’ it is balancing against (realized conversions) is not contemporaneously observed. The temptation in practice is to substitute predicted conversions or values, but this can be unsafe when predictions are biased or simply noisy at the granularity of individual auctions. Our modeling stance is that predictions can be unbiased conditional on the information available at bidding time, but realized conversions remain bounded and stochastic. Under this assumption, safety requires a : an algorithm must behave as if its available conversion value were a lower confidence bound rather than a point estimate. Delay enlarges the required buffer near the horizon, because the most recent outcomes remain unresolved and must be treated pessimistically.
A second challenge is the systems-level requirement of guarantees. Campaigns run for finite horizons, and the most costly failures occur early, when multipliers have not stabilized and when the algorithm has not yet accumulated evidence about realized ROI. Thus, we aim for statements that hold for all horizons and that quantify explicitly how safety and welfare degrade with the confidence level and the delay length. This emphasis also reflects accountability pressures faced by platforms and large advertisers: one often needs a knob (a confidence parameter) that trades performance for a certified bound on constraint violations, and one needs to understand how changes in measurement latency or attribution policy translate into conservative bidding and welfare loss.
Our contribution is therefore conceptual as much as technical. We
show how to separate the controller into two timescales: one that
enforces hard budget feasibility using immediate payment feedback, and
another that enforces ROI feasibility against a conservative,
statistically justified proxy of cumulative realized value. The proxy is
constructed so that it is predictable with respect to the bidding-time
information and so that, with high probability, it never overstates the
realized value that will eventually be observed. This design mirrors how
practitioners already think about guardrails'' andsafety
margins,’’ but it makes the guardrail explicit and analyzable: we can
trace how much efficiency is sacrificed in exchange for a chosen level
of safety, and how that sacrifice grows with delay and outcome
noise.
We also acknowledge what this perspective does not capture. Real-world attribution can be biased, adversarially manipulated, or subject to complex cross-campaign interference; moreover, delays are not always fixed, and conversions may be censored rather than merely postponed. Our focus is narrower: we isolate the clean economic tradeoff between delayed feedback and feasible bidding under ROI constraints, in a model that remains close enough to practice to inform platform design. The upshot is a framework in which ex-post safety and market-level efficiency are not competing desiderata to be handled heuristically, but jointly attainable guarantees—even when the underlying learning dynamics fail to converge.
Our starting point is the literature on in online advertising, in which an advertiser (or platform) modulates bids over time to respect a spending cap while competing in a sequence of auctions. Classic budget pacing mechanisms use a multiplicative ``pacing multiplier’’ to shade bids when cumulative spend threatens to exceed the budget, and they are tightly connected to dual variables of a Lagrangian relaxation of the budget constraint (see, e.g., ). A parallel line studies equilibrium structure and platform design under pacing, including how pacing interacts with first-price versus second-price payment rules and how it can be implemented through throttling or bid scaling . This body of work largely treats the constraint as : spend is observed at the moment of the auction outcome, so feasibility can be enforced ex post by accounting alone.
More recent work moves beyond budget-only control to incorporate , often expressed as ROI, CPA, or cost-per-action targets. Algorithmically, these constraints are naturally represented by an additional dual variable that tracks the gap between spend and value, producing a two-multiplier (or two-timescale) controller . Analytically, ROI constraints are more delicate than budgets because feasibility depends on realized outcomes and attribution logic, not merely on payments. Most existing analyses either (i) treat value feedback as contemporaneously observed, (ii) enforce ROI only in expectation (or in steady state), or (iii) abstract away from the strategic interaction among many simultaneously learning advertisers. Our contribution fits into this pacing tradition but isolates the friction that is central in practice: , and the constraint is naturally a rather than an asymptotic or average-case desideratum.
A second thread we build on is the use of as an efficiency benchmark in markets with hard financial constraints. Liquid welfare, originally introduced to capture the idea that an agent cannot contribute more than it can feasibly pay for , has since become a standard yardstick for auctions and matching markets under budgets . In repeated auction environments with pacing, liquid welfare is particularly attractive because it aligns with the economic object that pacing is implicitly optimizing: a bidder’s effective willingness-to-pay is clipped by its constraints. Recent ``welfare without convergence’’ results show that even when learning dynamics do not settle to an equilibrium, one can still certify a constant-factor approximation to an ex-ante liquid-welfare benchmark under mild auction conditions such as core-like deviation-proofness . Our analysis follows this philosophy but must confront an additional gap: with delayed conversions, the natural welfare proxy is formed from , whereas feasibility is defined in terms of (delayed) outcomes.
The algorithmic tools in our work are also related to (BwK) and more broadly to online learning with resource constraints. BwK studies a single decision-maker that repeatedly chooses arms with stochastic rewards and consumptions, aiming to maximize reward while keeping cumulative consumption within a budget . The primal–dual viewpoint in BwK is close in spirit to pacing: a dual variable prices the resource and induces conservative choices when the resource becomes scarce. However, BwK typically assumes the reward is observed (perhaps with noise) immediately after each pull, and it does not include strategic externalities induced by an auction that maps bids into allocations and payments. Moreover, ROI/CPA constraints are ratio-type constraints that couple reward and spend, rather than a single knapsack constraint on spend alone. Our controller can be viewed as a dual-based method for a ratio constraint, but the proof burden is different because we need feasibility under delayed, bounded outcomes rather than expected feasibility.
Delayed feedback has been studied extensively in online learning and bandits . The key observation is that delay primarily affects the : updates are based on stale feedback, which slows learning and typically worsens regret by an additive or multiplicative term depending on the delay model. While these results provide useful intuition, they do not directly address our setting for two reasons. First, we require a for a constraint defined in terms of delayed outcomes; the central question is not only how fast we learn, but how we avoid violating ROI before the evidence arrives. Second, delayed bandit analyses are usually single-agent performance comparisons against an optimal fixed action, whereas our primary efficiency statement is a welfare guarantee that holds under simultaneous play by many adapting agents and does not rely on convergence.
Our work is also connected to constrained online convex optimization and partial monitoring. There is a large literature on online algorithms that maintain feasibility of long-run constraints via primal–dual updates or virtual queues, often guaranteeing that constraint violation is sublinear while regret is near-optimal . In the advertising context, one can interpret pacing multipliers as such virtual queues. Yet two features of our environment push beyond the standard templates. First, the feedback is : advertisers observe allocations and payments immediately but observe conversion value only after delay, and the value itself is stochastic even conditional on bidding-time information. Second, the constraint of interest (ROI/CPA) is operationally closer to a requirement: platforms and advertisers typically want a knob that delivers an explicit bound on the probability of violation, not only an asymptotic vanishing violation rate.
What is new in our paper is therefore the synthesis of three ingredients that have largely been treated separately. (i) We incorporate into a repeated-auction pacing model in a way that is compatible with the informational reality of autobidding: bids are chosen using predictive signals, while realized value is revealed later through attribution. (ii) We provide a ROI/CPA feasibility guarantee by explicitly using martingale concentration to construct a lower confidence bound on cumulative realized value and updating the ROI multiplier against this conservative proxy. This makes the safety margin transparent and quantifiable in terms of the delay length and the desired confidence level. (iii) We retain the welfare perspective: when all advertisers run the algorithm in a core auction, we certify a constant-factor approximation to an ex-ante liquid-welfare benchmark, with an additional, interpretable penalty that reflects statistical conservatism induced by delay. In practical terms, our results formalize how measurement latency and privacy-induced reporting frictions translate into systematically more conservative bidding, and they do so in a way that preserves market-level efficiency guarantees rather than focusing solely on single-agent regret.
Delayed conversions force a conceptual choice about what it means to
be
ROI-feasible'' \emph{while bidding}. If conversion values were observed immediately, one could directly enforce the cumulative constraint \[ \sum_{t=1}^T p_{k,t}\;\le\;\frac{1}{\gamma_k}\sum_{t=1}^T Y_{k,t} \] either ex post (by stopping spend once the inequality becomes tight) or in expectation (by controlling the conditional drift of the left-minus-right side). With a fixed delay $D$, however, the right-hand side is not observable at decision time, so an advertiser cannot literally condition on the realized ROI numerator when choosing bids. In practice, autobidders instead act on \emph{predictive} value signals, and then later face an accounting reconciliation when conversions are reported. Our goal is to formalize a notion of feasibility that (i) is meaningful from a compliance perspective (we
violate with probability at most δ’’), and (ii) can be enforced using
information available at bid time.
Fix an advertiser k. Write
cumulative spend and (true) cumulative value as
$$
P_{k,t}\;=\;\sum_{\tau=1}^t p_{k,\tau},
\qquad
Y_{k,t}^{\mathrm{cum}}\;=\;\sum_{\tau=1}^t Y_{k,\tau}.
$$
The ROI/CPA constraint can be rewritten as a condition on cumulative
slack:
Gk, T ≡ γkPk, T − Yk, Tcum ≤ 0.
With delay, Gk, T
is unknown at time T, let
alone at intermediate times. What is known online is the predictable
proxy vk, txk, t,
which satisfies the martingale-unbiasedness condition 𝔼[Yk, t ∣ ℱt] = vk, txk, t.
This suggests tracking the slack
$$
\widehat G_{k,t}^{\mathrm{proxy}}
\;\equiv\;
\gamma_k P_{k,t}-\sum_{\tau=1}^t v_{k,\tau}x_{k,\tau},
$$
but feasibility in terms of Ĝk, tproxy
is not, by itself, a safety guarantee, because realized outcomes can
fall below their conditional means. The natural conservative correction
is therefore to replace realized value by a (LCB) on cumulative realized
value, yielding a sufficient condition for ROI feasibility.
For each k, consider the
martingale difference sequence ξk, t = Yk, t − vk, txk, t
with 𝔼[ξk, t ∣ ℱt] = 0
and ξk, t ∈ [−v̄, v̄].
A standard way to express concentration uniformly over horizons is to
define a radius function radk(t, δ)
such that, with probability at least 1 − δ,
We emphasize two features of . First, it is : it holds simultaneously
for all t, which is the right
form when an algorithm adapts bids over time. Second, it is compatible
with fixed delay: the inequality is a statement about the realized
trajectory, not about what is observable at time t. The delay affects (one cannot
verify the bound online) and (at time T there are D outcomes not yet revealed), but it
does not change the underlying martingale structure.
To make observability explicit, it is useful to separate revealed and
pending outcomes at time t.
Let
$$
Y_{k,t}^{\mathrm{revealed}}
\;=\;
\sum_{\tau=1}^{t-D} Y_{k,\tau},
\qquad
Y_{k,t}^{\mathrm{pending}}
\;=\;
\sum_{\tau=t-D+1}^{t} Y_{k,\tau},
$$
where the pending sum is interpreted as 0 if t ≤ D. At time t, only Yk, trevealed
is known, while Yk, tpending ∈ [0, Dv̄]
is unknown but nonnegative. This simple nonnegativity is already
informative: it implies that any lower bound on Yk, trevealed
is also a lower bound on Yk, tcum.
We now define the LCB process that will serve as the conservative
accounting numerator. For each k and time t, define
The quantity $\underline
Y_{k}(t;\delta)$ is ℱt-measurable because it
depends only on value signals, allocations, and a deterministic radius.
On the high-probability event in , it satisfies $\underline Y_{k}(t;\delta)\le
Y_{k,t}^{\mathrm{cum}}$ for all t. This is the key monotonicity we
need: spending against $\underline Y$
is, by construction, safe against the (unknown) realized Y up to a probability δ.
When delay is operationally salient (e.g., at the end of an
accounting period), one may further define an LCB based only on
information revealed by time t:
which lower-bounds Yk, trevealed
(and hence Yk, tcum)
on the same event. We view as the analytically clean object for
guaranteeing final-period feasibility, and as the practical real-time
diagnostic when one wants to know ``how safe are we given what has been
measured so far?’’
Using the LCB, we define a conservative feasibility notion that mirrors ex-post feasibility but accommodates delay and uncertainty.
A bidding (and allocation/payment) trajectory is for advertiser k if
ℙ (γkPk, T ≤ Yk, Tcum) ≥ 1 − δ.
It is if it satisfies the deterministic inequality
On the event , implies γkPk, T ≤ Yk, Tcum,
so LCB-feasibility is a sufficient condition for (1 − δ)-ROI-feasibility.
Two comments clarify the interpretation. First, this is ``ex-post’’ in the sense that it is a statement about the ROI numerator, not merely about conditional expectations. Second, it is probabilistic: because realized conversions are stochastic (even conditional on bid-time information), no nontrivial algorithm can guarantee ROI surely without assuming deterministic outcomes. The confidence level δ therefore plays the role of an explicit compliance knob.
A common alternative in learning-with-constraints is to enforce ROI ,
e.g.,
𝔼 [γkPk, T − Yk, Tcum] ≤ 0,
or to guarantee sublinear expected violation 𝔼[(γkPk, T − Yk, Tcum)+] = o(T).
These guarantees are mathematically convenient but economically
misaligned with many advertising objectives. An expectation guarantee
permits rare but severe shortfalls in realized conversions—exactly the
events that can trigger budget resets, credit limits, or contractual
disputes. Likewise, sublinear violation does not preclude an $\Theta(\sqrt{T})$ overspend relative to
realized value, which may be operationally large over long horizons.
Our LCB-based notion makes this tradeoff transparent. It replaces a potentially fragile expectation condition by a robust high-probability condition, at the cost of a statistical ``safety margin’’ radk(T, δ) that effectively reduces the value one is willing to spend against. This is the precise channel through which delay, noise, and stringent confidence requirements translate into conservative bidding and (as we later quantify) an additive welfare penalty.
Delayed conversion reporting forces us to separate two questions that are effectively identical in the no-delay case: what does it mean to be ROI/CPA-feasible , and what can an autobidder enforce ? When realized conversion value is observed immediately, one can treat the cumulative ROI constraint as an online accounting identity and simply ``stop spending’’ once the constraint becomes tight. With a fixed delay D, the right-hand side is revealed only after decisions have already been made, so any meaningful feasibility notion must explicitly confront uncertainty about the (as yet unobserved) realized value.
Fix advertiser k and define
cumulative spend and realized value up to time t:
$$
P_{k,t}\;=\;\sum_{\tau=1}^t p_{k,\tau},
\qquad
Y_{k,t}^{\mathrm{cum}}\;=\;\sum_{\tau=1}^t Y_{k,\tau}.
$$
The ROI/CPA requirement $\sum_{t=1}^T
p_{k,t}\le \frac{1}{\gamma_k}\sum_{t=1}^T Y_{k,t}$ is
equivalently a constraint on :
Gk, t ≡ γkPk, t − Yk, tcum, and
the target
is Gk, T ≤ 0.
In a delayed-feedback environment, Gk, t
is not observable at time t
because Yk, tcum
includes outcomes that have not yet been measured. The right modeling
move, therefore, is to identify an for Yk, tcum
and then guard it with a statistically valid safety margin.
The key primitive is the martingale-unbiasedness condition
𝔼 [Yk, t ∣ ℱt] = vk, txk, t,
so ∑τ ≤ tvk, τxk, τ
is the natural predictable proxy for cumulative realized value. This
suggests tracking the
$$
\widehat G_{k,t}^{\mathrm{proxy}}
\;\equiv\;
\gamma_k P_{k,t}-\sum_{\tau=1}^t v_{k,\tau}x_{k,\tau}.
$$
However, proxy slack alone is not a compliance guarantee: even if Ĝk, Tproxy ≤ 0,
realized conversions can fall below their conditional means along an
unlucky sample path, leading to Gk, T > 0
and an ROI violation. For operational constraints that are evaluated on
realized outcomes, the conservative object is therefore not the
conditional mean, but rather a (LCB) on cumulative realized value.
Let ξk, t = Yk, t − vk, txk, t.
Then (ξk, t)t ≥ 1
is a martingale difference sequence with respect to (ℱt), bounded in [−v̄, v̄]. Standard
concentration tools (Azuma–Hoeffding under boundedness, Freedman under a
variance proxy) yield a function radk(t, δ)
such that, with probability at least 1 − δ,
We emphasize the form: holds simultaneously for all t, which is essential because the
bidding process adapts to past observations and thus selects stopping
times endogenously.
A fixed delay D affects what is at time t, but it does not destroy the martingale structure underlying . What changes is the algorithmic interpretation: near the end of the horizon, there are outcomes that are not yet measurable, so any implementable diagnostic must treat these terms conservatively.
To isolate what can be verified online, decompose the cumulative
realized value at time t into
revealed and pending components:
$$
Y_{k,t}^{\mathrm{revealed}}
\;=\;
\sum_{\tau=1}^{t-D} Y_{k,\tau},
\qquad
Y_{k,t}^{\mathrm{pending}}
\;=\;
\sum_{\tau=t-D+1}^{t} Y_{k,\tau},
$$
with the convention that Yk, trevealed = 0
when t ≤ D. At time
t, Yk, trevealed
is known, while Yk, tpending
is unknown but satisfies the deterministic bound 0 ≤ Yk, tpending ≤ Dv̄.
This observation is simple but useful: any lower bound on Yk, trevealed
is automatically a lower bound on Yk, tcum.
We define the (signal-based) LCB process that will serve as a
conservative accounting numerator. For each advertiser k and time t, let
By construction, $\underline
Y_k(t;\delta)$ is ℱt-measurable: it depends
only on bids/allocations (immediately observed), value signals, and a
deterministic radius. On the event , it is a pathwise lower bound on
realized cumulative value, i.e.,
$$
\underline Y_k(t;\delta)\;\le\;Y_{k,t}^{\mathrm{cum}}\qquad\forall t.
$$
For real-time monitoring under delay, it is also natural to define a
revealed-value version,
which lower-bounds Yk, trevealed
on the same high-probability event. We view as the analytically clean
object for end-of-horizon feasibility statements, and as the
operationally meaningful ``what we can certify right now’’ quantity.
We now state the compliance notion we target.
A trajectory is for advertiser k if
ℙ (γkPk, T ≤ Yk, Tcum) ≥ 1 − δ.
Because Yk, Tcum
is random, a nontrivial sure guarantee is generally impossible without
additional determinism assumptions. Our approach is to enforce a
stronger, pathwise inequality against the LCB:
On the event , implies γkPk, T ≤ Yk, Tcum,
and hence it is a sufficient condition for (1 − δ)-ROI-feasibility. The
parameter δ becomes an
explicit compliance knob: tightening δ widens radk(T, δ)
and makes bidding more conservative.
A common alternative is to impose ROI constraints only in
expectation, for example
𝔼 [γkPk, T − Yk, Tcum] ≤ 0,
or to bound expected positive-part violation, 𝔼[(γkPk, T − Yk, Tcum)+].
Such formulations are attractive analytically, but they are a poor match
for how ROI/CPA constraints are enforced in practice. An expectation
guarantee permits rare but economically severe shortfalls in realized
conversions; these are precisely the events that trigger billing
disputes, credit holds, or internal risk limits. Likewise, a sublinear
expected violation can still correspond to a large absolute overspend
(e.g., $\Theta(\sqrt{T})$) relative to
realized value, which may be operationally material over long
horizons.
Our LCB-based notion makes the tradeoff explicit: to obtain a high-probability statement about realized ROI, we must withhold a statistical margin radk(T, δ) from the value we are willing to ``spend against.’’ This margin is the conduit through which delay, noise, and stringent compliance requirements translate into conservative bidding—and, as we quantify later, into an additive efficiency loss. The next section shows how to embed into a pacing-style update rule that remains implementable under delayed measurements.
We now describe a pacing rule that operationalizes the conservative feasibility idea from the previous section: rather than spending against realized conversion value (which is delayed), the bidder spends against a predictable lower confidence bound (LCB) on cumulative value. The design goal is deliberately modest: we do not attempt to learn a stationary best-response or reach a fixed point. Instead, we maintain while continuing to bid myopically on the current value signal vk, t, throttled by multipliers that track (i) budget tightness and (ii) ROI/CPA tightness.
Fix an advertiser k. TSP-DC
maintains two nonnegative pacing multipliers:
μk, tB ≥ 0 (budget
multiplier), μk, tR ≥ 0 (ROI/CPA
multiplier).
In each round t, after
observing the value signal vk, t,
the effective multiplier is the most conservative one,
and the bidder submits the shaded bid
This is the same ``multiplicative pacing’’ form used in budget pacing: a
higher multiplier reduces the bid proportionally, which is convenient
both analytically (it corresponds to a Lagrangian relaxation) and
practically (it is easy to implement inside an autobidder).
Because the budget constraint is required to hold surely, TSP-DC
includes a deterministic cap based on remaining budget. Let
$$
B^{\mathrm{rem}}_{k,t}\;\equiv\;B_k-\sum_{\tau=1}^{t-1} p_{k,\tau}
$$
denote remaining budget at the start of round t. If Bk, trem ≤ 0,
we set bk, t = 0
(the bidder drops out). Otherwise, we cap the bid as
This cap is conservative but robust: by individual rationality of the
auction, pk, t ≤ bk, txk, t ≤ bk, t,
so implies pk, t ≤ Bk, trem
and hence $\sum_{t=1}^T p_{k,t}\le B_k$
deterministically. (In settings where xk, t ∈ {0, 1},
the implication is immediate; for fractional allocations, xk, t ≤ 1
suffices.)
After bids are submitted and the auction clears, the bidder observes
the allocation and payment (xk, t, pk, t)
immediately. TSP-DC then updates its multipliers using two (possibly
different) learning rates ηB, ηR > 0.
The budget update uses only immediate spend and targets the per-round
budget rate ρk = Bk/T:
where [z]+ = max {z, 0}
denotes projection onto ℝ ≥ 0. This is the standard
pacing-as-dual-ascent update: if the bidder spends above its target rate
in round t, μB increases and
future bids are shaded more; if it underspends, μB relaxes.
The ROI controller is conceptually similar but must be more
conservative because realized value is delayed and noisy. We update
μR using a
per-round slack increment. Define
with the convention radk(0, δ) = 0.
The ROI multiplier update is
Several features are worth emphasizing.
(i) The term vk, txk, t
is observable at the end of round t (it depends on the signal and the
allocation), while Δk, t
is deterministic given (t, δ) and the chosen
confidence radius. This is the key implementability point under delay:
the ROI controller reacts immediately to spend and to the proxy for
value.
(ii) Because Δk, t
is defined as the difference of radii, the cumulative padded slack
satisfies
$$
\sum_{\tau=1}^t \tilde s_{k,\tau}
\;=\;
\gamma_k\sum_{\tau=1}^t p_{k,\tau}
-
\sum_{\tau=1}^t v_{k,\tau}x_{k,\tau}
+
\mathrm{rad}_k(t,\delta).
$$
Thus, μR
is effectively tracking (a projected, scaled version of) the
conservative proxy slack $\gamma_k
P_{k,t}-\underline Y_k(t;\delta)$, without requiring the bidder
to explicitly store or recompute long sums.
(iii) Any predictable function radk(t, δ)
that upper-bounds the deviation between realized and proxy cumulative
value (accounting for fixed delay D) can be used in . Operationally,
one can inflate the radius near the horizon to reflect that the last
D outcomes are pending. We
postpone the exact functional form and its dependence on (D, δ) to the next section;
for the algorithm, the only requirement is that radk(t, δ)
be nondecreasing in t, so that
Δk, t ≥ 0.
Using μk, t as in is a simple but important design choice. Both constraints reduce willingness-to-pay: a tight budget calls for spending less overall, and a tight ROI/CPA calls for spending less per unit of value. Since larger multipliers reduce bids, taking the maximum ensures that dominates bidding. This avoids pathological interactions where one controller relaxes while the other tightens: the bidder always uses the stricter throttle, while both multipliers continue to update in the background based on their own accounting signals.
For clarity, we summarize TSP-DC for advertiser k as the following per-round routine:The ``two-timescale’’ terminology reflects that the platform-facing hard budget constraint is typically managed more aggressively than ROI/CPA: budgets are contractual and immediately enforceable, whereas ROI is statistical and delayed. In our analysis, it is convenient (though not strictly necessary) to allow ηB and ηR to differ, e.g., with ηB larger so that μB reacts faster to overspend. Practitioners often also impose upper bounds $\mu^B_{k,t},\mu^R_{k,t}\le \overline\mu$ to prevent extreme bid shading when signals spike; such caps can be added as an additional projection without changing the logic of the update.
Finally, we stress what the algorithm do: it does not attempt to infer or predict other bidders, and it does not require that the induced dynamics converge. Each advertiser needs only its own (vk, t, xk, t, pk, t) stream and a pre-specified radius schedule. The next section formalizes the resulting feasibility guarantees and makes explicit how the required conservatism scales with delay D and confidence level δ.
We now state and prove the safety properties that motivate TSP-DC. The budget constraint is enforced deterministically (and hence does not interact with the conversion delay), while the ROI/CPA constraint is enforced against a predictable (LCB) on cumulative value. The main technical choice is therefore an explicit confidence radius schedule that (i) is valid for the martingale difference noise in realized conversions and (ii) is implementable under a fixed observation delay D.
Fix advertiser k and recall
ξk, t ≡ Yk, t − vk, txk, t,
where 𝔼[ξk, t ∣ ℱt] = 0
and ξk, t ∈ [−v̄, v̄].
A convenient (worst-case) Freedman/Azuma-style confidence radius for
partial sums is
which satisfies, for any fixed m,
$$
\mathbb{P}\!\left(\sum_{t=1}^{m} Y_{k,t}\;\ge\;\sum_{t=1}^{m}
v_{k,t}x_{k,t}-\mathrm{rad}^{\mathrm{basic}}(m,\alpha)\right)\;\ge\;1-\alpha.
$$
To obtain a statement that holds simultaneously over horizons and that
remains predictable when conversions are delayed, we combine two
standard reductions: (i) a union bound over m ∈ {0, 1, …, T} and (ii) a
worst-case bound on the D
(unrevealed) rounds. Concretely, define the delayed, time-uniform
radius
where (t − D)+ = max {t − D, 0}.
The first term controls statistical fluctuations on the revealed prefix
of length (t − D)+, and
the additive Dv̄ is
the conservative penalty for the D most recent (unrevealed) values,
each of which could be as low as 0 in
the worst case.
Given , we define the predictable cumulative value LCB
Importantly, $\underline{Y}_{k,t}(\delta)$ is computable
by the bidder at the end of round t from (vk, τ, xk, τ)τ ≤ t
and known constants (v̄, D, δ), without
needing any realized conversions.
The multiplier update is designed to the conservative slack over
time. For a clean feasibility statement that is robust to arbitrary
bid-to-payment mappings consistent with individual rationality, we
additionally use the same idea as the remaining-budget cap, now applied
to conservative ROI credit. Define the conservative remaining ROI spend
capacity at the start of round t as
and interpret Rk, trem(δ) ≤ 0
as ``no safe spend remains.’’ We then cap the bid by
in addition to the budget cap . This cap is operationally simple (it
depends only on the bidder’s own accounting) and can be viewed as the
hard-safety analogue of the ROI multiplier: when conservative credit is
scarce, bidding is throttled all the way to zero regardless of the
multiplier state.
We can now state the main safety guarantee.
The budget part is immediate from the earlier accounting argument: after applying we have pk, t ≤ bk, t ≤ Bk, trem, so the cumulative spend cannot exceed Bk.
For ROI, the cap ensures an analogous deterministic invariant with
respect to the LCB. Indeed, by individual rationality, pk, t ≤ bk, txk, t ≤ bk, t,
and therefore implies pk, t ≤ [Rk, trem(δ)]+.
Summing over rounds yields the telescoping inequality
which holds surely (it is purely algebraic given the cap and the
definition of $\underline{Y}$).
The second step is statistical: by construction of radk(⋅, δ) in ,
a union bound over k and t implies that with probability at
least 1 − δ,
Combining and yields the desired ROI/CPA feasibility:
$$
\sum_{t=1}^T p_{k,t}
\;\le\;
\frac{1}{\gamma_k}\left(\sum_{t=1}^T
v_{k,t}x_{k,t}-\mathrm{rad}_k(T,\delta)\right)
\;\le\;
\frac{1}{\gamma_k}\sum_{t=1}^T Y_{k,t}.
$$
Equation makes the safety–efficiency tradeoff transparent. The statistical term scales as $\bar v\sqrt{(T-D)\log(nT/\delta)}$ (plus a lower-order v̄log (nT/δ) term), while the delay contributes an additive Dv̄ that is unavoidable without additional structure: if the last D outcomes have not yet been observed, maintaining a one-sided guarantee forces us to treat them pessimistically. Practically, smaller δ (stricter safety) and larger D (longer attribution windows) both tighten the conservative LCB and therefore reduce admissible spend through ; this is precisely the policy-relevant channel through which measurement latency depresses feasible bidding even when expected conversions are unchanged.
We now turn from individual safety to a market-level guarantee: even though (i) conversion feedback is delayed and (ii) bidders update multipliers online without converging to an equilibrium, the induced outcomes are approximately efficient in the standard sense, up to a statistical conservatism penalty that is explicit in (D, δ). The key message is that delay does not break the familiar proof; it only changes the effective willingness-to-pay that we are allowed to treat as feasible when ROI/CPA must hold with high probability.
To make this precise, let Φ
denote the (random) allocation-sequence rule induced when all
advertisers run TSP-DC and bid bk, t = vk, t/(1 + μk, t)
(with the budget and ROI remaining-credit caps applied). Recall that our
ex-ante benchmark is defined in terms of the observable signals v and a single-round rule y(⋅) ∈ X:
$$
W^{\mathrm{ex\text{-}ante}}(y,F)
\;=\;
\sum_{k=1}^n T\cdot
\min\Big\{\rho_k,\;\frac{1}{\gamma_k}\,\mathbb{E}_{v\sim
F}\!\big[y_k(v)\,v_k\big]\Big\}.
$$
Likewise, Wex-ante(Φ, F)
denotes the welfare of the induced play, evaluated against the same
proxy values v.
The proof follows the same two-step structure as in pacing-without-convergence analyses: first, we relate the welfare loss to auction revenue using core-ness; second, we upper bound total revenue by an appropriate notion of that is feasible for each bidder. The only substantive change under delay is that the feasible liquid value is no longer $\frac{1}{\gamma_k}\sum_t v_{k,t}x_{k,t}$, but rather its counterpart $\frac{1}{\gamma_k}\underline{Y}_{k,T}(\delta)$, which is smaller by radk(T, δ)/γk. This is exactly the additional penalty term in .
To see where the factor 1/2 comes from, recall that pacing bids have the multiplicative form bk, t = vk, t/(1 + μk, t). When a bidder is (say μk, t ≤ 1), we have bk, t ≥ vk, t/2, so the auction is, in effect, running on bids that preserve at least half of the signal values. In contrast, when a bidder is (μk, t > 1 or the remaining-credit caps bind), the bidder is expressing low willingness-to-pay not because the impression is intrinsically low value, but because some constraint is becoming tight. In that regime, the classic charging argument reallocates the would-be welfare loss to the fact that the bidder must be spending close to their liquid limit.
Formally, fix an arbitrary comparator y(⋅) ∈ X and apply the core
condition round-by-round with S chosen as the set of advertisers
whose bids are “small” relative to the comparator’s suggested
allocation. Summing the resulting inequalities over rounds yields a
bound of the form
$$
\sum_{t=1}^T \sum_{k=1}^n b_{k,t}y_{k,t}
\;\le\;
\sum_{t=1}^T \sum_{k=1}^n b_{k,t}x_{k,t}
\;+\;
\sum_{t=1}^T \sum_{k=1}^n p_{k,t},
$$
where we have used individual rationality to control the deviation term
by revenue. The left-hand side is a proxy for the comparator’s total
value evaluated at our bids, while the first term on the right is our
realized “bid-weighted” welfare, and the second is total revenue.
At this point, the pacing form of bids provides the welfare
approximation. Since bk, t = vk, t/(1 + μk, t),
we can write
vk, txk, t = (1 + μk, t) bk, txk, t ≥ bk, txk, t,
and, more importantly, whenever μk, t ≤ 1
we have vk, txk, t ≥ 2 bk, txk, t.
Aggregating across bidders and rounds, one shows that at least half of
the comparator’s proxy welfare is either (i) directly realized as proxy
welfare under Φ, or (ii) can
be charged to revenue collected during rounds in which some bidder is
significantly paced.
Finally, we upper bound total revenue by . By Theorem~, for each
advertiser k we have
deterministically
$$
\sum_{t=1}^T p_{k,t}
\;\le\;
\min\Big\{B_k,\;\frac{1}{\gamma_k}\underline{Y}_{k,T}(\delta)\Big\}
\;=\;
\min\Big\{B_k,\;\frac{1}{\gamma_k}\sum_{t=1}^T v_{k,t}x_{k,t}\Big\}
\;-\;
O\!\Big(\frac{1}{\gamma_k}\mathrm{rad}_k(T,\delta)\Big),
$$
where the last step simply isolates the loss from enforcing ROI/CPA
against an LCB rather than the raw proxy sum. Summing over k bounds total revenue by the sum of
conservative liquid values; substituting this revenue bound into the
core charging inequality yields . The appearance of D is entirely through radk(T, δ):
with a fixed D-round delay,
the confidence radius contains an unavoidable additive Dv̄ term (cf. ), which
translates linearly into a liquid-welfare penalty.
Equation is then a direct corollary of the concentration event : once the realized cumulative values dominate their LCBs, the true liquid value $\frac{1}{\gamma_k}\sum_t Y_{k,t}$ dominates the conservative proxy liquid value $\frac{1}{\gamma_k}\underline{Y}_{k,T}(\delta)$, so any ex-ante welfare lower bound transfers to realized welfare up to the same radk(T, δ)/γk loss. This makes the practical tradeoff transparent: longer attribution windows and stricter safety levels mechanically depress welfare by forcing bidders to act as if the last D outcomes might be zero, even when expected conversions are unchanged.
The welfare bound above is deliberately : it does not require the multipliers to converge, only that they remain safe and that the auction is core. In some applications, however, one also wants an learning interpretation: if an advertiser were optimizing in a stationary (or slowly drifting) environment, how close is TSP-DC to the best fixed (or slowly varying) pacing choice that could have been selected in hindsight? We briefly outline such a guarantee here. The main conceptual point is that delayed conversion feedback does not merely slow learning; because we insist on high-probability ROI/CPA feasibility, delay forces the comparator itself to be via a padded ROI constraint.
To avoid distractions from strategic interaction, fix an advertiser
k and view the rest of the
market and the auction mechanism as inducing, for each (effective)
multiplier μ ≥ 0, a
distribution over per-round outcomes (x(μ), p(μ))
conditional on the advertiser’s signal v. Under i.i.d. signals and a
stationary environment, define the expected proxy value and expected
spend at multiplier μ by
V(μ) = 𝔼[v x(μ)], P(μ) = 𝔼[p(μ)],
where the expectation is over the signal v and any randomness in the
auction/competition. We assume a mild regularity condition used in prior
pacing regret analyses: V(μ) and P(μ) are Lipschitz (or
piecewise Lipschitz) in μ, and
the per-round surrogate losses are convex in μ after the standard
reparameterization (equivalently, the mapping μ ↦ 1/(1 + μ) is smooth and
the auction response is not too discontinuous in bids). These
assumptions are not innocuous—in particular, single-item auctions can
create kinks when bid orderings swap—but they are a reasonable
approximation in thick markets or under small bid perturbations.
With delayed and noisy realized values Y, we cannot safely benchmark
against the multiplier that would be optimal under the ROI constraint
𝔼[γp] ≤ 𝔼[Y]
unless we allow a nonzero violation probability. Instead, the natural
comparator is the best multiplier that satisfies a ROI constraint,
reflecting the fact that the algorithm must act as if cumulative
realized value might undershoot its proxy by a confidence radius.
Concretely, for a horizon T
and confidence level δ, define
a per-round padding
$$
\epsilon_T(D,\delta)\;\asymp\;\frac{\mathrm{rad}_k(T,\delta)}{T},
$$
where radk(T, δ)
includes the fixed-delay conservatism (e.g., an additive Dv̄ term plus the usual
$\bar v\sqrt{T\log(1/\delta)}$
concentration scale). We then consider the
ℳT = {μ ≥ 0: P(μ) ≤ ρk, γkP(μ) ≤ V(μ) − ϵT(D, δ)}.
Let μT⋆ ∈ arg maxμ ∈ ℳTV(μ)
be the best fixed feasible multiplier under this padded notion of ROI.
The dependence on T is
important: as T grows, the
padding ϵT(D, δ)
shrinks, so the comparator becomes less conservative, matching the
intuition that long campaigns can afford tighter statistical
calibration.
Let the algorithm’s realized proxy value be $\sum_{t=1}^T v_{k,t}x_{k,t}$ (which is what
the bidder can optimize online, given that Y is delayed). Define the (proxy)
regret against μT⋆
as
$$
\mathrm{Reg}(T)
\;=\;
T\cdot V(\mu^\star_T)\;-\;\sum_{t=1}^T
\mathbb{E}\big[v_{k,t}x_{k,t}\big].
$$
Under the same step-size choices used in pacing-without-convergence
analyses (two timescales, with the budget multiplier updated more
aggressively than the ROI multiplier), and under the Lipschitz/convexity
surrogates described above, one can adapt standard online convex
optimization arguments to show a sublinear regret bound of the
form
Reg(T) ≤ Õ (T3/4) + Õ (radk(T, δ)),
where the first term is the familiar non-asymptotic price of
simultaneously learning and maintaining feasibility with multiplicative
bidding (the exponent 3/4 is typical
when one combines smoothing, two constraints, and a non-smooth auction
response), and the second term is the explicit statistical cost of
delay/noise through the padded ROI comparator. Importantly, delay does
not appear as an additional difficulty beyond this padding: because the
ROI controller is driven by the predictable proxy term vk, txk, t
plus an explicit confidence increment, the update does not require
waiting for Yk, t
to arrive in order to remain safe.
A more flexible benchmark allows the environment to drift slowly (e.g.,
seasonality in competition), and compares TSP-DC to a {μt⋆}t = 1T
with bounded path length
$$
\mathrm{PL}_T
\;=\;
\sum_{t=2}^T \big|\mu_t^\star-\mu_{t-1}^\star\big|.
$$
The appropriate comparator class is again padded: at each time t, the comparator must satisfy
instantaneous analogues of the budget-rate and padded ROI constraints in
expectation under the time-t
environment. Under bounded drift and the same regularity conditions, one
obtains a dynamic regret guarantee of the schematic form
$$
\mathrm{DReg}(T)
\;=\;
\sum_{t=1}^T \Big(V_t(\mu_t^\star)-V_t(\mu_t)\Big)
\;\le\;
\tilde O\!\Big(T^{3/4}\big(1+\mathrm{PL}_T\big)\Big)
\;+\;
\tilde O\!\big(\mathrm{rad}_k(T,\delta)\big),
$$
where Vt(⋅) denotes
the time-t expected proxy
value under the drifting environment. The dependence on PLT formalizes the usual
intuition: tracking a rapidly moving optimum is costly, but if the
optimal pacing multiplier changes slowly, the algorithm can shadow it
with only a mild additional penalty.
At a high level, we treat the multiplier updates as online gradient (or
mirror) descent on surrogate constraint losses. For the budget
constraint, the natural loss is proportional to pk, t − ρk
and is observed immediately. For ROI, we use the padded slack signal
(proxy value minus γ-scaled
spend, plus the predictable confidence increment), so the ROI update is
also driven by a quantity available at time t even though Yk, t
is not. The confidence padding is doing double duty: it ensures
high-probability feasibility, and it converts the delayed/noisy ROI
constraint into an online-learnable surrogate. The remaining technical
work is to relate regret in the surrogate losses to regret in proxy
value, and to control the non-smoothness introduced by the auction
response via Lipschitzness or smoothing arguments.
This optional regret view clarifies what delay and change. It does not
fundamentally prevent learning a good pacing policy in a stationary
market; rather, it forces the bidder to compete against a more
conservative policy class that builds in statistical slack of order
ϵT(D, δ).
On the other hand, the guarantee rests on conditions that may fail in
practice: discontinuities in allocation as bids cross can break
Lipschitzness, and fully adversarial opponents can induce linear regret
for any algorithm. For these reasons, we view the regret statement as a
useful complement—connecting to classical online optimization—but not as
the core robustness guarantee, which remains the feasibility and
aggregate welfare bound that holds without convergence and without
stationarity assumptions.
We close by discussing several modeling and implementation issues that arise in practice, and how they interface with the feasibility–efficiency logic above. The unifying theme is that our guarantees rely on two ingredients: (i) an that links the bidder’s controllable quantities (payments and allocations) to a predictable proxy for value (the signal term vk, txk, t), and (ii) a (the confidence radius) that shields ROI feasibility against delayed and noisy realizations. Each extension can be read as modifying one of these two ingredients.
In many advertising systems the realized value Yk, t
is not observed at the impression level. Instead, one sees (with delay)
an aggregated report over a time window (e.g., daily) and possibly a
report due to privacy thresholds, de-duplication, or measurement loss.
Let 𝒲j ⊆ {1, …, T}
denote the set of rounds whose conversions are reported in aggregate
bucket j, and let the report
be
Ŷk, j = ∑t ∈ 𝒲jYk, t + ζk, j,
where ζk, j
captures noise (e.g., privacy perturbation) and censoring can be
represented by Ŷk, j = 1{∑t ∈ 𝒲jYk, t ≥ τ} ⋅ (∑t ∈ 𝒲jYk, t + ζk, j)
for a threshold τ. The key
question is whether we retain an unbiasedness condition suitable for
concentration. If the platform provides an unbiased (or conservative)
estimator Ỹk, j
of ∑t ∈ 𝒲jYk, t
conditional on the bidder’s filtration, then we can re-run Proposition~1
at the by defining martingale differences
ξk, j = Ỹk, j − ∑t ∈ 𝒲jvk, txk, t.
The concentration radius then scales with the number of buckets rather
than T, but the range (and
variance proxy) per bucket increases with |𝒲j|. This captures the
operational tradeoff: coarser aggregation yields fewer feedback points
(slower statistical tightening) but potentially less per-point noise if
aggregation reduces variance.
If censoring destroys unbiasedness, we can still preserve by working with a lower bound $\underline Y_{k,j}\le \sum_{t\in\mathcal{W}_j}Y_{k,t}$ that is valid almost surely (for instance, treating censored buckets as zero). This is conservative—it tightens the ROI controller—but it fits our design philosophy: the ROI multiplier update only needs a lower confidence bound. Empirically, platforms often can provide monotone corrections (e.g., ``at least τ conversions occurred’’) that directly translate into such $\underline Y$ terms and reduce conservatism.
Fixed delay D is
analytically convenient but not essential. Suppose each realized value
Yk, t
is revealed after a random delay Dk, t ∈ {0, 1, 2, …},
possibly dependent on observables (e.g., conversion type) but
conditionally independent of future randomness given ℱt. Let ℱt now include all values
whose revelation times are ≤ t. The concentration argument can
be adapted by treating the revealed terms as a stopped martingale: for
any horizon T, decompose
$$
\sum_{t=1}^T Y_{k,t}
\;=\;
\sum_{t:\, t+\mathsf{D}_{k,t}\le T} Y_{k,t}
\;+\;
\sum_{t:\, t+\mathsf{D}_{k,t}> T} Y_{k,t}.
$$
The first sum admits a Freedman/Azuma bound as before (since
unbiasedness is with respect to ℱt, not revelation time),
and the second sum is bounded by v̄ times the number of pending items.
Thus a natural random-delay analogue of radk(T, δ)
is
$$
\mathrm{rad}_k^{\mathrm{rand}}(T,\delta)
\;=\;
O\!\Big(\bar v\sqrt{T\log(1/\delta)}\Big)
\;+\;
\bar v\cdot N_{k}^{\mathrm{pend}}(T),
$$
where Nkpend(T)
counts unresolved contributions at T. If we only have a
high-probability bound on Nkpend(T)
derived from a delay distribution estimate, we can fold that into the
same overall failure probability budget (splitting δ across events). Conceptually,
stochastic delay affects performance through the of the delay
distribution rather than its mean: rare long delays behave like an
additional adversarial uncertainty that must be padded against.
Modern attribution often decomposes value into components (clicks,
installs, purchases) with different delay profiles and different ROI
semantics. A convenient representation is
$$
Y_{k,t} \;=\; \sum_{m=1}^M Y^{(m)}_{k,t},
\qquad
\mathbb{E}\!\left[Y^{(m)}_{k,t}\mid \mathcal{F}_t\right] \;=\;
v^{(m)}_{k,t}x_{k,t},
$$
where each component m has its
own delay D(m) (or random
delay distribution). If the ROI constraint references total value, we
can simply apply concentration to the sum with a radius that adds across
components (or uses a joint variance proxy). If instead the constraint
is (e.g., ``CPA on purchases must hold even if clicks are plentiful’’),
then the controller must track M separate lower confidence bounds
and update a vector multiplier μR ∈ ℝ ≥ 0M.
The bidding rule becomes
$$
b_{k,t}
\;=\;
\frac{v_{k,t}}{1+\mu^B_{k,t}+\sum_{m=1}^M \lambda_m \mu^{R,(m)}_{k,t}},
$$
for suitable weights λm encoding how
each constraint throttles spend. The qualitative message persists:
feasibility is obtained by ensuring spend is covered by confidence
bounds on each relevant value component, and heterogeneous delays only
enter through component-wise padding terms.
The welfare argument uses core-ness to
charge'' welfare loss to revenue. Real auctions may only satisfy an approximate core condition due to reserve prices, discretization, quality adjustments, or throttling. A standard relaxation is an $\alpha$-core (or additive-$\varepsilon$ core) inequality: for all coalitions $S$ and feasible $y\in X$, \[ \sum_{i\notin S} p_i(b) \;\ge\; \frac{1}{\alpha}\sum_{i\in S}\big(b_i y_i- b_i x_i(b)\big)\;-\;\varepsilon. \] Carrying this through the charging proof typically degrades the $\tfrac{1}{2}$ factor to a constant depending on $\alpha$ and adds $T\varepsilon$ to the welfare loss. This is not merely a technicality: if the auction can systematically violate core-ness, then a bidder's pacing can become lessself-financing’’
in the welfare accounting, and aggregate efficiency can drop even if
every bidder is individually feasible. Practically, this suggests an
empirical diagnostic: measuring the degree of approximate-core violation
(under logged bids) can predict how tight an efficiency guarantee one
should expect from pacing alone.
The theory takes v̄ and radk(⋅, δ) as given, but implementing TSP-DC requires choosing these inputs. We view this as a calibration problem rather than a purely statistical one: the goal is to pick radii that are conservative enough to meet the desired violation probability, but not so conservative that they collapse bids.
A pragmatic approach is to use an empirical Bernstein/Freedman radius
with an estimated conditional variance proxy. Concretely, with
historical logs one can estimate (possibly per-campaign) bounds on Y or winsorize the tail to enforce a
bounded range, then set
$$
\mathrm{rad}_k(T,\delta)
\;\approx\;
c_1 \sqrt{\widehat V_k(T)\log(1/\delta)} \;+\; c_2 \bar v\log(1/\delta)
\;+\; \bar v\cdot \widehat N_k^{\mathrm{pend}}(T),
$$
where V̂k(T)
is an empirical variance proxy for ∑t(Yk, t − vk, txk, t).
Likewise, delay enters through an estimate of the pending mass N̂pend. Importantly, this
calibration can be done using replay or semi-synthetic augmentation, and
then deployed online with periodic re-estimation. The remaining
parameters (ηB, ηR)
are then tuned to control responsiveness versus oscillation; their
choice affects regret and transient welfare but does not change the
underlying safety logic as long as the invariant behind Proposition~2 is
maintained.
Several quantities that appear natural in the analysis are not available in closed form in realistic auctions. Most notably, the mapping μ ↦ (V(μ), P(μ)) depends on the joint distribution of opponents’ bids and the auction’s quality adjustments; even in single-item auctions it is generally piecewise-defined with kinks. Thus, computing a ``best’’ feasible multiplier (even the padded comparator) is typically a numerical task, using bid landscape estimation, simulation-based fixed point methods, or offline policy optimization. Similarly, if we extend to multi-component constraints (multiple μR, (m)), then projecting onto a feasible region or selecting step sizes may require adaptive methods (e.g., diagonal preconditioning) and empirical smoothing to mitigate discontinuities. Our view is that this is appropriate: the theory clarifies must be controlled (spend relative to a conservative value proxy) and uncertainty and delay enter (through padding), while the exact quantitative best response of the market is a domain where numerical estimation and engineering judgment are unavoidable.
While our results are stated abstractly, the underlying feasibility–efficiency logic is amenable to a fairly concrete empirical stress test. We therefore propose a semi-synthetic simulation pipeline whose purpose is not to ``prove’’ the theorems numerically, but to (i) quantify the magnitude of the conservatism terms induced by delay and aggregation, (ii) compare safety mechanisms under realistic feedback imperfections, and (iii) isolate which parts of the welfare loss are attributable to budget pacing, ROI pacing, or auction frictions.
We need a mapping from bids to allocations and payments in each round. A convenient starting point is a single-item auction with $x_{k,t} \in [0,1]$, where the highest bid wins (possibly with randomized tie-breaking) and the payment lies between the first and second price, ensuring individual rationality $p_{k,t} \le b_{k,t} x_{k,t}$ and approximate core-ness. To make the exercise semi-synthetic rather than purely parametric, we can estimate an empirical ``bid landscape'' from logs: for each advertiser (or segment) we fit functions that map an effective bid to (a) win probability and (b) conditional payment given winning. The simulator then produces $(x_{k,t}, p_{k,t})$ by sampling from these estimated outcome distributions given the bid vector $b_t$. This approach preserves non-linearities (reserves, discretization, quality multipliers) that matter for pacing stability while keeping full control over the value-generation process.
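A minimal sketch of such a round-level simulator, in the purely parametric single-item form (payment interpolated between the second- and first-highest bid via a parameter rho); the empirical bid-landscape variant would replace the explicit auction with sampled win/payment outcomes.

```python
import numpy as np

def run_auction(bids, rho=0.5, rng=None):
    """Single-item round: highest bid wins (randomized tie-breaking); the winner
    pays p = second + rho * (first - second), interpolating between second-price
    (rho = 0) and first-price (rho = 1), so p <= b_i * x_i holds by construction."""
    rng = np.random.default_rng() if rng is None else rng
    bids = np.asarray(bids, dtype=float)
    n = len(bids)
    order = rng.permutation(n)                   # randomize tie-breaking
    winner = order[np.argmax(bids[order])]
    second = np.partition(bids, -2)[-2] if n > 1 else 0.0
    x = np.zeros(n)
    p = np.zeros(n)
    x[winner] = 1.0
    p[winner] = second + rho * (bids[winner] - second)
    return x, p
```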
For each round $t$ and advertiser $k$, we generate a value signal $v_{k,t} \in [0, \bar v]$ and then a realized conversion value $Y_{k,t} \in [0, \bar v]$ satisfying the key accounting condition $\mathbb{E}[Y_{k,t} \mid \mathcal{F}_t] = v_{k,t} x_{k,t}$. A simple construction that enforces boundedness and unbiasedness is
$$
Y_{k,t} \;=\; \bar v \cdot
\mathrm{Bernoulli}\!\left(\frac{v_{k,t}x_{k,t}}{\bar v}\right),
$$
which yields $\mathbb{E}[Y_{k,t} \mid \mathcal{F}_t] = v_{k,t} x_{k,t}$ and captures a stylized ``conversion/no conversion'' mechanism with value $\bar v$. To study heavier-tailed but bounded noise, we can instead take $Y_{k,t} = \Pi_{[0,\bar v]}(v_{k,t} x_{k,t} + \epsilon_{k,t})$, where $\epsilon_{k,t}$ is mean-zero conditional on $\mathcal{F}_t$ and $\Pi_{[0,\bar v]}$ denotes clipping. By varying the distribution and scale of $\epsilon_{k,t}$ we directly control the size of the confidence radius required for safe ROI pacing, and can empirically validate the comparative statics with respect to $\bar v$ (or an empirical variance proxy).
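Both constructions fit in a few lines; the sketch below assumes a shared random generator and treats the noise scale for the clipped variant as a free parameter.

```python
import numpy as np

def realize_value(v, x, v_bar, rng, mode="bernoulli", noise_scale=0.0):
    """Draw Y_{k,t} with conditional mean v * x.

    mode="bernoulli": Y = v_bar * Bernoulli(v*x / v_bar), the all-or-nothing
                      conversion model in the text (exactly unbiased).
    mode="clipped":   Y = clip(v*x + eps, 0, v_bar) with bounded mean-zero eps;
                      clipping can introduce a small boundary bias, so in practice
                      the noise support should keep v*x + eps inside [0, v_bar]."""
    mean = v * x
    if mode == "bernoulli":
        return v_bar * rng.binomial(1, mean / v_bar)
    eps = rng.uniform(-noise_scale, noise_scale)
    return float(np.clip(mean + eps, 0.0, v_bar))
```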
To model fixed delay D, we
maintain a queue: after generating Yk, t
at round t, we withhold it
from the advertiser until time t + D. The bidder’s
filtration ℱt thus
includes all outcomes and payments up to t and all realized values up to
t − D. In the
simulation, this matters operationally because the ROI controller cannot
react to short-run over-spend on value until the delayed signal arrives.
We can additionally examine random delays by drawing Dk, t from a
calibrated distribution (e.g., a shifted geometric or log-normal,
truncated for bounded support in experiments), revealing Yk, t
at t + Dk, t.
This allows us to measure how rare long delays translate into larger
pending mass near the horizon and therefore a larger effective padding
requirement.
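Operationally, the fixed- and random-delay variants can share one queue abstraction; the sketch below tracks reveal times explicitly, and its pending count is exactly the quantity that feeds the delay term in the padded radius.

```python
class DelayQueue:
    """Withholds realized outcomes until their reveal time.

    push(t, y, delay) schedules y for reveal at time t + delay;
    pop(t) returns (and removes) everything revealed by time t;
    pending(t) counts still-hidden items."""

    def __init__(self):
        self._buf = []                       # list of (reveal_time, value) pairs

    def push(self, t, y, delay):
        self._buf.append((t + delay, y))

    def pop(self, t):
        revealed = [y for (r, y) in self._buf if r <= t]
        self._buf = [(r, y) for (r, y) in self._buf if r > t]
        return revealed

    def pending(self, t):
        return sum(1 for (r, _) in self._buf if r > t)
```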
We partition the horizon into reporting buckets $\mathcal{W}_1, \mathcal{W}_2, \ldots$ (e.g., daily windows) and produce reports of the form $\widehat Y_{k,j} = \sum_{t \in \mathcal{W}_j} Y_{k,t} + \zeta_{k,j}$, with $\zeta_{k,j}$ drawn from a noise distribution representing privacy perturbation (e.g., Laplace noise calibrated to a target privacy level). To represent threshold-based censoring, we additionally allow
$$
\widehat Y_{k,j} \;=\; \mathbf{1}\Big\{\textstyle\sum_{t \in \mathcal{W}_j} Y_{k,t} \ge \tau\Big\} \cdot \Big(\textstyle\sum_{t \in \mathcal{W}_j} Y_{k,t} + \zeta_{k,j}\Big),
$$
for a chosen threshold $\tau$. From the bidder's perspective we then consider two operational modes: (i) an unbiased mode, in which the platform provides a debiased proxy $\widetilde Y_{k,j}$ for $\sum_{t \in \mathcal{W}_j} Y_{k,t}$ (when feasible), and (ii) a conservative mode, in which censored buckets are treated as zero (or as $\tau$ if the platform reveals ``at least $\tau$''), yielding a conservative $\underline Y_{k,j}$. Both modes can be plugged into the same controller logic, but their performance differs: unbiasedness tightens radii faster, whereas lower bounds preserve safety at the cost of throttling.
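A sketch of the bucket-level reporting step covering both operational modes; the debiasing in the unbiased mode is idealized here (censored buckets are simply dropped), and the padding in the conservative mode is a crude stand-in for a proper lower confidence bound on the noise.

```python
import numpy as np

def report_bucket(y_bucket, noise_scale, tau, rng, mode="unbiased"):
    """Bucket-level report: hat{Y} = sum(Y) + Laplace noise, censored below tau.

    mode="unbiased":     consume the noisy report directly; censored buckets are
                         dropped here, an idealization of platform debiasing.
    mode="conservative": treat censored buckets as zero and subtract the noise
                         scale as crude padding (a proper lower bound would use
                         a quantile of the noise distribution)."""
    total = float(np.sum(y_bucket))
    if total < tau:                          # threshold-based censoring
        return 0.0
    report = total + rng.laplace(0.0, noise_scale)
    if mode == "unbiased":
        return report
    return max(0.0, report - noise_scale)
```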
To avoid confounding by step-size tuning, we sweep $(\eta_B, \eta_R)$ over a grid and report both best-achieved performance and robustness (sensitivity) across the sweep.
We vary $D$, bucket size $|\mathcal{W}_j|$, censoring threshold $\tau$, and the noise scale in $\zeta$, as well as $\gamma_k$ and budget tightness $\rho_k$. The central empirical prediction is a smooth tradeoff: TSP-DC should exhibit near-zero budget violation and ROI violation near the target $\delta$, with welfare decreasing as delay and privacy constraints tighten, while naive ROI pacing should retain higher welfare in benign regimes but exhibit sharp ROI failures when $D$ or censoring increases. A limitation of any semi-synthetic study is that the bid-to-outcome simulator may not satisfy exact core-ness and may omit strategic response by other bidders; we therefore treat these experiments as characterizing the safety–efficiency tradeoff implied by our model, not as a full equilibrium validation.
We studied a repeated-auction environment in which advertisers bid using per-round value signals while the conversions that justify those bids arrive only after a fixed delay. The central modeling tension is practical: modern bidding stacks are asked to satisfy spend constraints and ROI/CPA constraints, yet the relevant value feedback is incomplete at precisely the times when the controller must decide how aggressively to bid. Our results formalize a resolution of this tension. By combining a fast budget-pacing loop (driven by immediate payments) with a slower ROI/CPA loop (driven by a conservative lower confidence bound on cumulative value), we can guarantee ex-post budget feasibility and high-probability ROI feasibility under delayed, noisy conversion realization, while retaining a constant-factor efficiency guarantee in a broad class of core auctions. Importantly, the welfare guarantee does not rely on convergence of the learning dynamics; it follows from a charging argument that treats pacing as an online feasibility mechanism rather than an equilibrium-selection device.
From a systems perspective, the most salient message is that delay is an information problem, not a feasibility problem. Delayed conversion feedback does not break the accounting identity that underlies ROI feasibility—spend must be justified by value—but it does change the time at which evidence arrives, thereby forcing the algorithm to act on a predictable proxy (the signal value $v_{k,t} x_{k,t}$) plus a padding term that immunizes against worst-case noise and missing tail information near the horizon. This yields a concrete and interpretable tradeoff: tighter safety (smaller $\delta$), larger delays $D$, or noisier measurement (larger $\bar v$ or larger conditional variance) widen the confidence radius, effectively shrinking willingness-to-pay and reducing welfare. In this sense, the theory provides a disciplined way to reason about an engineering phenomenon that is often treated heuristically: when attribution is delayed or privacy aggregation is coarse, conservative pacing is not merely ``risk aversion'' but the price of maintaining a probabilistic constraint with finite-sample guarantees.
At the same time, our framework deliberately abstracts from many details that matter in production markets, and it therefore suggests several open problems. We group them into three themes: improving constants (and thereby narrowing the safety–efficiency gap), weakening stochastic assumptions on values and feedback, and moving beyond the linear pacing/bid-shading form that we analyze.
Our welfare and feasibility guarantees inherit an additive penalty through $\mathrm{rad}_k(T, \delta)$ that is of order $\bar v\sqrt{T\log(1/\delta)}$ (plus a delay-induced handling of
unresolved terms). This rate is information-theoretically natural under
bounded noise, but the leading constants can be consequential at
marketplace scales, and the delay correction is pessimistic when the
pending mass is typically small. A first direction is to replace
worst-case bounded-difference radii with empirical-variance
(Freedman-style) or self-normalized confidence sequences, so that the
controller adapts when realized variance is low. A second direction is
to integrate the delay directly into the confidence sequence rather than
adding a separate ``pending $D$'' term; for instance, one can treat unrevealed increments as censored observations and develop predictable lower bounds that exploit the fact that only the last $D$ terms are unknown at time $T$. A third, more structural, question is whether one can prove welfare bounds in which the conservatism enters multiplicatively (as a small shrinkage in effective value) rather than additively, at least in regimes where budgets scale linearly with $T$. Such refinements would make
the theory closer to the empirical reality that large campaigns average
out noise quickly, even when per-impression conversion is sparse.
Although our core feasibility arguments are martingale-based and thus
compatible with substantial adaptivity, several benchmark and efficiency
statements are most transparent under stationarity (or i.i.d. value
signals) and unbiasedness of the delayed measurement process. In
practice, both assumptions are fragile. Value signals drift with user
mix, seasonality, and learning on the platform side; attribution
pipelines may be biased by privacy thresholds, cross-device matching, or
conversion deduplication; and delay is rarely fixed, often exhibiting
heavy tails. A natural next step is to develop pacing guarantees under
adversarial or nonstationary value processes, perhaps measuring
performance against a dynamic comparator with bounded variation, while
maintaining hard budget feasibility and probabilistic ROI feasibility.
This pushes us toward tools from online learning with delays and
drifting constraints, where one must decide how quickly the ROI
controller should ``forget’’ stale evidence without creating
oscillations. Similarly, replacing the strict unbiasedness condition
$\mathbb{E}[Y_{k,t} \mid \mathcal{F}_t] = v_{k,t} x_{k,t}$
with a calibrated approximation (e.g., bounded bias or multiplicative
distortion) would align the model with privacy-preserving reporting.
Here the open problem is conceptual as well as technical: what is the
right notion of ROI feasibility when the reported $Y$ is a noisy proxy for a latent
economic value, and how should the pacing rule trade off safety against
systematic under-reporting? Resolving this would strengthen the policy
relevance of the theory, because privacy constraints are not incidental
frictions; they are often binding design requirements.
We modeled bidding as a scaled form of the value signal, $b_{k,t} = v_{k,t}/(1 + \mu_{k,t})$, which captures a ubiquitous ``linear shading'' implementation. This
choice is analytically convenient and operationally common, but it is
not obviously optimal once we admit richer auction formats,
heterogeneous objectives, or nonlinear risk/return tradeoffs. One open direction is to study pacing rules that are nonlinear in the value signal (e.g., piecewise-linear or quantile-based shading) so that the bidder can spend budget on the highest-quality opportunities while throttling low-quality tail traffic more aggressively. Another is to incorporate multi-goal constraints
(brand safety, reach, frequency caps) in a principled way, turning
pacing into a multi-dimensional online dual-control problem with delayed
feedback. On the mechanism side, core-ness is a powerful abstraction,
but many display and sponsored-search auctions include quality
adjustments, reserves, and non-separable click models; extending the
welfare-charging argument to these settings, or identifying the minimal
auction properties needed for a similar guarantee, remains open.
Finally, our analysis treats each bidder as a price-taker who reacts to
outcomes but does not explicitly model strategic interaction among
learning bidders. Establishing whether the welfare guarantee is robust
to strategic misreports of $v_{k,t}$,
or to adversarial pacing dynamics that attempt to manipulate competitors
through transient bid shading, is an important step toward a more
game-theoretic understanding of safe pacing.
Stepping back, we view the model as illuminating a broader design principle: when economic constraints (budgets, ROI/CPA) must be satisfied under delayed and imperfect measurement, the right unit of analysis is not a static equilibrium but an online safety–efficiency tradeoff governed by concentration and information flow. Making this tradeoff sharper—statistically, behaviorally, and institutionally—is the central agenda for future work.