Mechanism learning has become a practical response to a familiar tension in multi-item auctions: the designer would like to tailor allocation and pricing to a complex environment, yet the classical templates with strong incentive guarantees (e.g., VCG with rich bidding languages) can be operationally misaligned with platform objectives, computationally heavy, or simply too rigid for the constraints of modern marketplaces. Parametric mechanisms—often neural, differentiable, and trained in simulation—promise to bridge this gap by directly optimizing revenue or welfare subject to incentive and individual-rationality considerations. In this paradigm, incentive compatibility is rarely imposed as an exact constraint. Instead, it is enforced approximately via a regret penalty: we train parameters to make profitable deviations hard to find, at least under the deviation-search procedure used during training.
A central observation of this paper is that this assumption has become materially weaker by 2026. The change is not conceptual but technological: bidders are increasingly agentic. Rather than submitting bids manually or following a fixed, hand-coded bidding heuristic, sophisticated bidders can now deploy autonomous bidding agents that (i) observe outcomes at scale, (ii) run nontrivial optimization loops, and (iii) invoke large language models and tool-use pipelines to search over rich misreport strategies. This matters even in one-shot sealed-bid settings because a bidder can treat the bid as a decision variable and run an offline ``best response’’ computation conditional on the mechanism description. In a repeated marketplace, the same agents can further leverage feedback to refine deviations, amortizing compute across time. In short, the set of plausible deviations is no longer well-approximated by the low-budget local searches that are common in current training pipelines.
This shift creates a security problem. Weak regret training is analogous to robustness training against a narrow family of adversarial perturbations: it can yield a mechanism that looks incentive-stable under the training attack but is brittle to unseen deviation procedures. Put differently, incentive constraints become a surface for exploitation. When we deploy a learned mechanism without a credible stress test, we are effectively assuming that bidders will not invest in discovering profitable misreports, even when the payoff stakes and automation tools make such investment rational. The relevant question is therefore not merely whether the trained mechanism exhibits low regret, but whether it remains stable against a broader, stronger, and potentially adaptive class of deviation searches.
The standard training loop illustrates the issue. Designers typically
estimate regret by computing, for each bidder i and valuation profile v, a candidate deviation v̂i produced by
some heuristic (e.g., gradient ascent on a differentiable surrogate),
and then penalize the utility gain
ui(vi; (v̂i, v−i); θ) − ui(vi; v; θ).
If the deviation-search heuristic fails to find the bidder’s true best
response, then the penalty can be small even when a large profitable
deviation exists. This is not a benign estimation error: it is exactly
the failure mode that strategic bidders exploit. The econometrician’s
perspective makes the point crisp. The designer cares about expected
regret under the type distribution, yet the training signal is mediated
by an oracle. The quality of the oracle, and the compute budget that
supports it, become first-order determinants of whether ``low regret’’
is meaningful.
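To make the dependence on the oracle concrete, the following minimal sketch assembles the regret penalty for one sampled profile; `utility` and `deviation_oracle` are hypothetical callables standing in for ui(·; ·; θ) and the deviation heuristic, and the profile v is a list of per-bidder reports (e.g., NumPy arrays). Whatever deviations the oracle fails to find simply never enter this penalty.

```python
# Minimal sketch (illustrative, not this paper's training code):
# the regret penalty is only as informative as the oracle that produces v_hat_i.

def regret_penalty(utility, deviation_oracle, v):
    """Sum over bidders of the utility gain at the oracle's candidate misreport.

    utility(i, true_vi, bids) plays the role of u_i(v_i; b; theta);
    deviation_oracle(i, v) plays the role of the deviation heuristic.
    """
    penalty = 0.0
    for i, v_i in enumerate(v):
        b = list(v)                         # start from the truthful profile
        b[i] = deviation_oracle(i, v)       # swap in the candidate misreport for bidder i
        gain = utility(i, v_i, b) - utility(i, v_i, v)
        penalty += max(gain, 0.0)           # only profitable deviations contribute
    return penalty
```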
We therefore adopt an explicitly adversarial framing. We model deviation computation as an explicit attacker: an oracle with an attack budget (steps, rollouts, or calls to a tool-using LLM) and a deviation class (e.g., local gradient-based deviations versus learned policies that propose structured misreports). This is not a literal claim that bidders are malicious; it is a worst-case diagnostic stance that is standard in security engineering and increasingly standard in machine learning robustness. In our context, it simply reflects that a bidder’s objective is to increase her own utility, and that modern tooling lowers the cost of searching for misreports. The conceptual benefit is that it cleanly separates two questions: (i) what the mechanism would withstand under truly optimal strategic behavior, and (ii) what our training and evaluation procedures can certify given limited deviation search.
This paper makes three contributions along these lines, each aimed at turning ``regret training’’ into a more interpretable robustness statement.
First, we formalize a certification link between oracle-measured regret and true regret via an ε-approximate deviation oracle. The key idea is simple: if the deviation oracle returns a deviation whose utility is within ε of the best-response utility for almost all valuation profiles, then the regret computed at that deviation upper-bounds true regret up to ε. This turns an algorithmic property (how good is the deviation search?) into an economic guarantee (how close is the mechanism to being DSIC in expectation?). Practically, it clarifies what must be demonstrated for regret numbers to be meaningful: not only that the mechanism has low measured regret, but that the attacker class and budget are strong enough that ε is plausibly small.
Second, we develop a monotonicity perspective using nested oracle classes. In practice, we can often order deviation procedures by richness: a gradient-based local search is a subset of a policy oracle that can propose global, structured misreports; an LLM-guided search with tool use can strictly dominate both. When we train and evaluate with a richer oracle class, we may observe higher regret during training (because the attack is stronger), but we obtain a tighter certificate: the gap between measured and true regret is controlled by a smaller ε. This viewpoint suggests a concrete engineering discipline: upgrade the attacker, not to ``break’’ the mechanism for its own sake, but to prevent a false sense of security induced by weak deviation optimization. It also predicts a design tension that we indeed see: stronger attackers can force the mechanism away from brittle revenue gains that rely on undetected deviations.
Third, we connect this robustness notion to an explicit
revenue–robustness tradeoff via the regret-penalized objective. The
penalty weight λ plays the
role of a Lagrange multiplier: increasing λ shifts the solution toward lower
oracle-measured regret, typically at some revenue cost. Our contribution
here is not to claim a universal frontier—the outcome depends on the
mechanism class and the valuation environment—but to give a clean
interpretive bound: the penalized objective provides an explicit control
knob for how much revenue the designer is willing to trade for incentive
stability under the chosen attacker model. This is the language that
platform stakeholders often need: not ``DSIC or not,’’ but ``how much revenue are we buying with how much strategic risk?’’
A practical implication of these contributions is an operational evaluation protocol that parallels adversarial robustness testing in ML. We advocate training against an attacker class 𝒜train and then evaluating against a disjoint, stronger held-out class 𝒜test (or simply a larger budget B). The resulting transfer gap—how much regret increases under stronger, unseen deviations—is an interpretable vulnerability indicator. A mechanism that is genuinely incentive-robust should exhibit small transfer gaps as attacker coverage increases; a mechanism that is merely overfit to a particular deviation heuristic will not. This protocol is especially relevant in light of agentic bidders, because the bidder’s deployed deviation procedure is unlikely to match the designer’s training heuristic.
We close the introduction with two limitations that we treat transparently. First, our strongest statements rely on an approximation property of the deviation oracle that is generally unobserved and environment-dependent. We can and do increase confidence by scaling budgets, diversifying oracle families, and performing held-out attack evaluation, but there is no free lunch: certifying near-best responses in high-dimensional combinatorial bid spaces remains difficult. Second, our worst-case diagnostic grants the attacker access to (i, v, θ) in evaluation. This is deliberately conservative and serves as a stress test rather than a literal description of bidder information. The policy lesson is nonetheless robust: if a mechanism fails under a generous attacker model, it is unlikely to be safe in the field where sophisticated bidders can approximate such capabilities via computation, experimentation, and reverse engineering.
The remainder of the paper develops these ideas formally and uses them to interpret training dynamics and deployment risk. We begin by situating our approach within the literatures on learned mechanisms, regret-based incentive metrics, and adversarial robustness.
Our work sits at the intersection of learned mechanism design, approximate incentive metrics based on regret, and adversarial robustness. We briefly review the closest strands and explain how our framing—deviation search as an explicit oracle class with a measurable approximation gap—helps interpret results that are often reported as purely empirical ``low regret’’ claims.
A central catalyst for modern neural mechanism design is the regret-regularization paradigm introduced in RegretNet and related systems. The core idea is to parameterize allocation and payment rules by differentiable function classes, and to train them to optimize revenue (or welfare) while penalizing estimated regret computed via a deviation search routine. This approach made two conceptual moves that remain influential. First, it replaced hard incentive constraints with a continuous training signal, enabling gradient-based optimization at scale. Second, it operationalized approximate incentive compatibility using an ex-post regret metric, which is natural for quasi-linear utilities and directly aligned with the DSIC inequalities.
RochetNet and subsequent ``IC-by-construction’’ architectures pursue a complementary route: rather than penalizing regret, they encode structural conditions (e.g., convexity/monotonicity) that imply truthfulness for certain domains, often drawing on Rochet's characterization in single-parameter or additive settings \cite{Rochet1987}. This line highlights a persistent theme: we can sometimes trade expressive power for guarantees by restricting the mechanism class or the valuation domain. In multi-item combinatorial environments, however, fully general DSIC characterization is elusive, so regret-based training remains a pragmatic default. Our perspective does not argue against regret-based training; instead, it clarifies that the meaning of ``regret’’ is inseparable from how deviations are computed. Put differently, these systems implicitly assume a best-response oracle, and our contribution is to make that assumption explicit and to treat oracle quality as a first-class parameter of the guarantee.
A recurring empirical observation in this literature is that regret minimization can be optimization-sensitive and attack-sensitive: measured regret depends on hyperparameters of the deviation search (initializations, steps, learning rates, and surrogate losses), and mechanisms can appear stable under one deviation routine while exhibiting larger regret under another. This motivates our emphasis on nested oracle classes and held-out attacks as an evaluation discipline rather than a one-off implementation detail.
A second relevant thread concerns how learned mechanisms enforce feasibility constraints in multi-item allocation problems. Early approaches (including RegretNet-style designs) typically use continuous relaxations (e.g., soft assignments) with penalty terms or projection-like layers to approximately satisfy feasibility, sometimes combined with rounding at inference. More recent architectures such as CANet and CAFormer aim to incorporate feasibility more natively via structured parameterizations and attention-based representations that respect permutation symmetries and can better capture complementarities across items. These designs are motivated by both accuracy (higher attainable revenue/welfare) and stability (reduced constraint violation and better generalization across environments).
From our standpoint, these architectural choices interact with incentive robustness in two ways. First, feasibility constraints shape the utility landscape faced by a deviating bidder; smoother relaxations may be easier to exploit by gradient-based deviations, while hard feasibility or discrete rounding can create non-smoothness that defeats certain local searches but may not deter more global, tool-assisted deviation procedures. Second, richer allocation models can expand the mechanism class enough that training dynamics become more brittle, increasing the risk of ``overfitting’’ to the particular deviation heuristic used in training. This is one reason we find it useful to separate (i) the mechanism class and its feasibility guarantees from (ii) the attacker class used to certify incentive stability. The model we develop is compatible with either design choice: we only require that feasibility and ex-post IR are satisfied by construction, so that regret measures reflect strategic considerations rather than constraint violations.
Our adversarial framing draws a direct analogy to robust optimization and adversarial training in machine learning. In that literature, robustness guarantees are typically conditional on an assumed threat model (e.g., an ℓp threat set) and an attack algorithm (e.g., PGD). A well-known lesson is that training against weak or narrow attacks can yield illusory robustness: models appear robust under the training attack but fail under stronger or different attacks, and ``transfer’’ evaluation across attack families has become standard practice. The same logic applies to regret-based incentive training: if the deviation routine is weak, the resulting low measured regret need not correspond to genuine incentive stability.
Our framework imports two robustness ideas in a way that is economically interpretable. First, we introduce an explicit approximation gap between the deviation oracle and the true best response (our ε), playing the role of a certificate slack. Second, we emphasize evaluation against disjoint, stronger attacker families and larger budgets, paralleling robustness transfer tests. While this analogy is conceptually straightforward, we view it as practically consequential: it encourages mechanism designers to report not only regret numbers but also the attacker class, budget, and transfer gaps, much as ML practitioners report robustness under multiple attacks.
Our work also connects to robust mechanism design and the broader study of incentive compatibility under uncertainty or constraints. Classical robust mechanism design asks for mechanisms that perform well under ambiguity about beliefs or higher-order information, often leading to worst-case objectives and minimax formulations. Although our uncertainty is not about beliefs per se, there is a parallel: we seek mechanisms that remain stable under a broad set of deviation procedures, not merely under the particular deviation computation used in training.
There is also a large literature on truthfulness and regret-based incentive metrics, including ε-DSIC notions and algorithmic implementations where exact DSIC is computationally infeasible or too restrictive. In multi-dimensional environments, approximate IC is often the only tractable target, and regret penalties naturally implement a Lagrangian relaxation of incentive constraints. Our novelty is not the existence of such a tradeoff, but a clean way to interpret it operationally: the penalty weight determines the revenue–robustness point, and the attacker model determines how meaningful the measured regret is as a proxy for true regret.
Finally, our motivation is aligned with a growing body of evidence that bidding is increasingly automated and strategic computation is increasingly cheap. In sponsored search and other platform markets, algorithmic bidding tools (often proprietary, sometimes RL-based) already mediate a substantial fraction of bids, and the platform-facing literature documents how bidder-side optimization can reshape equilibrium behavior and platform revenue. More broadly, the economics of ``strategic agents’’ emphasizes that when agents can deploy computation, they may exploit mechanism details in ways that are not captured by simple behavioral models. Recent advances in tool-using LLM agents further lower the barrier to running structured searches over complex action spaces, including combinatorial bid languages and constraint-satisfying misreports.
This trend matters for learned mechanisms because deviation computation is precisely the task that can be automated: a bidder can treat the published mechanism (or an inferred surrogate) as an objective function and invest compute to search for profitable misreports. The natural implication is that the relevant deviation set in deployment may strictly exceed the one used in training. Our focus on nested oracle classes and attack budgets is meant to capture this operational reality without requiring a fully specified model of bidder learning dynamics. In that sense, we view our setting as a conservative stress test: if a mechanism is brittle to stronger deviation oracles in simulation, it is difficult to argue that it will remain stable in the field as bidder tooling improves.
Taken together, these literatures suggest a common lesson: empirical incentive stability claims should be conditioned on, and accompanied by, a transparent description of the deviation search procedure. Our model in the next section formalizes this dependence by treating deviation search as an oracle with an explicit approximation radius, thereby separating what can be proven from what must be established numerically via stronger and more diverse attacks.
Incentive constraints in multi-dimensional auctions are conceptually simple but operationally slippery: truth-telling is optimal if and only if no bidder can improve utility by deviating to an alternative report. In learned mechanism design, we rarely search over the full misreport space; instead, we run a deviation routine and compute an oracle-measured regret. Our goal in this section is to separate these two objects cleanly. We write down (i) the economic environment and the ``true’’ regret notion that corresponds to DSIC, (ii) the oracle-measured regret actually used in training and evaluation, and (iii) a single approximation condition—an ε-approximate oracle—that makes explicit what can be certified analytically and what must be established numerically.
We fix a finite set of bidders N and items M, and let K denote the family of bundles over which bidders can express values (either the full power set minus ∅, or a restricted bidding language). A bidder i’s type is a valuation vector vi = (viS)S ∈ K ∈ Vi, and the valuation profile is v = (v1, …, vn) drawn from some distribution F that is unknown to the econometrician and, in applications, only accessible through samples or a simulator.
A mechanism is parameterized by θ ∈ Θ and consists of an
allocation rule gθ and payment
rule pθ.
For a bid profile b = (b1, …, bn),
the mechanism outputs a (possibly randomized) feasible allocation gθ(b)
and payments pθ(b) = (pθ, 1(b), …, pθ, n(b)).
We work with quasi-linear utilities, ui(vi; b; θ) = vi(gθ, i(b)) − pθ, i(b) (in expectation over any randomization), where gθ, i(b)
denotes bidder i’s (possibly
random) bundle. Throughout, we take feasibility and ex-post individual
rationality (IR) to be satisfied by construction, so that any incentive
violations we measure reflect strategic manipulation rather than
allocation infeasibility or negative truthful utility.
The standard DSIC inequalities require that truthful reporting bi = vi
maximizes bidder i’s utility
pointwise for every profile of others’ bids. A convenient scalar measure of how far a mechanism is from DSIC is expected ex-post regret. For each realized valuation profile v, define the pointwise deviation gain
Δi(v; θ) = maxv̂i ∈ Vi ui(vi; (v̂i, v−i); θ) − ui(vi; v; θ).
By construction, Δi(v; θ) ≥ 0, and Δi(v; θ) = 0 for all i and all v (up to measure-zero events) corresponds to DSIC. The expected regret is then
rgti(θ) = 𝔼v ∼ F[Δi(v; θ)].
This is the object we ultimately care about: it is defined in terms of the true best response over the entire feasible report space Vi. Importantly, it depends only on the mechanism and the environment; it does not depend on any particular deviation algorithm.
The difficulty is computational rather than conceptual: even when ui is evaluable, the maximization over Vi is typically high-dimensional, non-convex, and constrained by the bidding language. For combinatorial K, the dimension of vi can be exponential in m, and even for restricted languages the mapping from reports to allocation and payments can create a rugged utility surface. As a result, practical systems approximate Δi(v; θ) by searching for a profitable deviation rather than solving for the true maximizer.
We model deviation search as an explicit oracle class 𝒜 = {aϕ : ϕ ∈ Φ}. Given bidder identity i, valuation profile v, and mechanism parameters θ, the oracle outputs a misreport v̂i = aϕ(i, v, θ) ∈ Vi (possibly stochastically, e.g., through randomized restarts or policy sampling). This abstraction covers a range of concrete deviation procedures: gradient-based ascent on a differentiable surrogate, coordinate search or evolutionary strategies in bid space, learned deviation policies trained by reinforcement learning, and tool-using agents that propose structured misreports subject to language constraints.
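As a minimal sketch of this abstraction (names are illustrative, not taken from an existing codebase), any deviation procedure can be wrapped behind a single callable interface; the trivial truthful oracle below is the degenerate member of the class and a reminder that low measured regret under a weak oracle proves nothing by itself.

```python
from typing import Protocol, Sequence
import numpy as np

class DeviationOracle(Protocol):
    """Interface assumed for a_phi: (bidder, profile, mechanism parameters) -> misreport."""
    def __call__(self, i: int, v: Sequence[np.ndarray], theta) -> np.ndarray:
        ...

def truthful_oracle(i: int, v: Sequence[np.ndarray], theta) -> np.ndarray:
    # Weakest possible "attacker": report truthfully, so measured regret is zero
    # by construction regardless of the mechanism's true incentive properties.
    return np.asarray(v[i]).copy()
```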
Given an oracle aϕ, we define the oracle-measured deviation gain and regret as
Δ̂i(v; θ, ϕ) = ui(vi; (aϕ(i, v, θ), v−i); θ) − ui(vi; v; θ),  rgtiϕ(θ) = 𝔼v ∼ F[Δ̂i(v; θ, ϕ)].
These quantities are directly computable given simulation access to ui under the mechanism, and they are precisely what is minimized (often implicitly) when papers report ``low regret’’ for a learned mechanism. The key point is that rgtiϕ(θ) is not, by itself, an incentive guarantee: it is an oracle-dependent diagnostic, and it can understate the true regret whenever the oracle fails to find a strong deviation.
To connect oracle-measured regret back to the economic benchmark, we impose a single approximation requirement. We say that an oracle aϕ is ε-approximate (for a given mechanism θ) if for all bidders i and almost all valuation profiles v,
ui(vi; (aϕ(i, v, θ), v−i); θ) ≥ maxv̂i ∈ Vi ui(vi; (v̂i, v−i); θ) − ε.
This condition is deliberately stated in utility units: it says the oracle finds a deviation whose utility is within ε of the best possible deviation utility at that (i, v, θ). When it holds, the gap between true regret and oracle-measured regret is controlled in the simplest possible way. Indeed, subtracting the truthful utility ui(vi; v; θ) from both sides yields the pointwise bound
Δi(v; θ) ≤ Δ̂i(v; θ, ϕ) + ε,
and taking expectations gives
rgti(θ) ≤ rgtiϕ(θ) + ε.
Thus, if we can make rgtiϕ(θ) small and we can argue that ε is small, then we obtain an economically interpretable approximate DSIC statement in expected regret.
Two clarifications matter in practice. First, ε-approximation is a pointwise (almost-everywhere) requirement; it is stronger than merely having small expected suboptimality, and it is the right object if we want worst-case-style certification. Second, ε is not a structural constant of the environment; it is an operational summary of the attacker class, its budget, and the non-convexity of the induced best-response problem. Increasing the attacker budget (more steps, more restarts, more rollouts) or enriching the oracle class can reduce ε, but it need not drive it to zero.
The role of the model is to make a clean division between analytical implications and empirical workload.
On the ``provable’’ side, statements like the certificate rgti(θ) ≤ rgtiϕ(θ) + ε follow immediately from ε-approximation and do not depend on the internal form of gθ or pθ, beyond utilities being well-defined. Similarly, once we treat the oracle as an explicit object, monotonicity claims become transparent: evaluating with a richer oracle class or a larger budget can only increase measured regret (it finds at least as good deviations), while simultaneously tightening the interpretation of that regret as a proxy for the true best response by reducing the plausible ε.
On the ``numerical’’ side, the central unresolved question is whether a given deviation procedure is in fact ε-approximate, and what ε should be. There is no generic theorem that a particular heuristic (say, a fixed-step gradient ascent) achieves a uniform approximation guarantee in these non-convex problems. As a consequence, we view ε not as something we can assume away, but as something to be stress-tested: by expanding the attacker class, increasing compute, and performing transfer-style evaluations across disjoint deviation families. If measured regret rises substantially under stronger held-out attacks, that is direct evidence that the original oracle had a large approximation gap in the relevant regions of type space, and that prior ``low regret’’ claims were conditional on a narrow deviation model.
This perspective also clarifies the interpretation of negative results. If a mechanism appears manipulable under a stronger oracle, that does not necessarily mean the mechanism is ``bad’’ in an absolute sense; it means that, relative to the strategic capabilities represented by that oracle, the mechanism fails an incentive robustness test. Conversely, if regret remains low across increasingly strong and diverse oracles, we gain confidence (though not a formal proof absent a quantified ε) that the mechanism is not merely exploiting weaknesses of a particular deviation routine.
We now build on this oracle-based view to formulate training itself as a two-player game between a designer choosing θ and an attacker choosing ϕ, making explicit how revenue maximization and incentive robustness are traded off through a regret penalty.
The oracle view naturally suggests that ``mechanism training’’ is not a single-agent optimization problem but a strategic interaction between two algorithmic players. On one side, the designer (seller or platform) selects mechanism parameters θ ∈ Θ to increase expected revenue subject to incentive robustness. On the other side, an attacker selects oracle parameters ϕ ∈ Φ to uncover profitable deviations and thereby increase measured regret. This framing is useful because it forces us to be explicit about what is optimized, against which deviation model, and with which computational budget.
Fix a mechanism class ℳ = {(gθ, pθ) : θ ∈ Θ}
and an oracle class 𝒜 = {aϕ : ϕ ∈ Φ}.
For a valuation profile v
drawn from F, the designer
evaluates revenue at truthful bids, rev(θ; v) = ∑ipθ, i(v),
while incentive robustness is proxied by oracle-measured regret rgtiϕ(θ).
A canonical penalized objective combines these two terms,
L(θ, ϕ) = −𝔼v ∼ F[rev(θ; v)] + λ∑i ∈ N rgtiϕ(θ),
where λ > 0 is a regret-penalty weight chosen by the designer. The designer seeks to minimize L with respect to θ, while the attacker seeks to maximize L with respect to ϕ (equivalently, maximize total measured regret):
minθ ∈ Θ maxϕ ∈ Φ L(θ, ϕ).
Operationally, this min–max problem captures the intended meaning of robustness: a mechanism is not ``good’’ because it has low regret under a fixed deviation heuristic, but because it maintains low regret even when the deviation procedure itself is chosen adversarially within a permitted class.
Two modeling choices are embedded in this objective. First, we evaluate revenue under truthful reports; this keeps the designer objective aligned with the standard auction benchmark and avoids conflating revenue with equilibrium selection under misreports. Second, we penalize measured regret under the attacker oracle, rather than imposing an explicit constraint. This reflects practice in learned mechanism design: penalization is easier to optimize than hard constraints and admits smooth tradeoffs controlled by λ.
Because F is unknown, both
players operate with sample-based estimates. Given a batch {v(s)}s = 1S
and a fixed (θ, ϕ),
we form the empirical loss
L̂(θ, ϕ) = −(1/S)∑s = 1S rev(θ; v(s)) + λ∑i ∈ N (1/S)∑s = 1S Δ̂i(v(s); θ, ϕ).
This expression makes clear what the attacker controls: the mapping
aϕ(i, v, θ)
that determines which misreport is evaluated. In turn, the attack budget B (steps of a search routine, number
of random restarts, rollout count, or tool-augmented calls) governs how
well aϕ
can optimize bidder utility and thus how informative the regret signal
is. In the limit of a perfect best-response oracle, measured regret
coincides with true regret; in finite compute regimes, the attacker is
an approximation to the best-response correspondence, and training is
best understood as adversarial training under a bounded attack
model.
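A sketch of the empirical loss for a fixed (θ, ϕ) is below; `revenue` and `utility` are assumed differentiable callables closing over θ, `attack` closes over ϕ, and averaging over bidders rather than summing changes the loss only by a constant factor.

```python
import torch

def empirical_loss(revenue, utility, attack, batch, lam):
    """batch: list of valuation profiles; each profile is a list of 1-D tensors."""
    rev_terms, rgt_terms = [], []
    for v in batch:
        rev_terms.append(revenue(v))                      # rev(theta; v) at truthful bids
        for i, v_i in enumerate(v):
            b = list(v)
            b[i] = attack(i, v)                           # oracle misreport a_phi(i, v, theta)
            gain = utility(i, v_i, b) - utility(i, v_i, v)
            rgt_terms.append(torch.clamp(gain, min=0.0))  # per-sample measured regret
    rev = torch.stack(rev_terms).mean()
    rgt = torch.stack(rgt_terms).mean()
    return -rev + lam * rgt                               # minimized in theta, maximized via rgt in phi
```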
In implementations, we rarely solve this min–max problem in one shot. Instead we alternate
between (i) updating the attacker to increase measured regret against
the current mechanism, and (ii) updating the mechanism to reduce
measured regret while preserving revenue. Abstractly, for iterations
t = 0, 1, 2, …, an alternation scheme takes the form
ϕ(t + 1) ≈ arg maxϕ ∈ Φ L̂(θ(t), ϕ),  θ(t + 1) ≈ arg minθ ∈ Θ L̂(θ, ϕ(t + 1)).
Here ``≈’’ is not merely a
computational footnote: the attacker step is itself a nonconvex
optimization problem (find a high-utility deviation), and the defense
step is a nonconvex learning problem (fit a parametric mechanism).
Consequently, the right equilibrium concept is also approximate. One
convenient definition is an η-saddle point for the population
loss: a pair (θ̂, ϕ̂) such that
L(θ̂, ϕ̂) ≤ L(θ, ϕ̂) + η for all θ ∈ Θ, and L(θ̂, ϕ̂) ≥ L(θ̂, ϕ) − η for all ϕ ∈ Φ.
In words, neither player can improve its objective by more than η by deviating unilaterally.
Empirically, we often monitor analogous conditions on held-out samples
and report best-response gaps against a suite of attackers, which is a
practical proxy for the second inequality in this definition.
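The control flow of the alternation can be sketched as follows, assuming `loss_fn(mech_params, attack_params, batch)` builds the empirical loss as a differentiable function of both parameter sets; learning rates, step counts, and the inner/outer ratio are placeholders rather than recommendations.

```python
import torch

def alternate(loss_fn, mech_params, attack_params, sample_batch,
              iters=1000, attack_steps=5, lr_theta=1e-3, lr_phi=1e-3):
    opt_theta = torch.optim.Adam(mech_params, lr=lr_theta)
    opt_phi = torch.optim.Adam(attack_params, lr=lr_phi)
    for t in range(iters):
        batch = sample_batch()
        # (i) attacker step: ascend L in phi (equivalently, increase measured regret)
        for _ in range(attack_steps):
            opt_phi.zero_grad()
            (-loss_fn(mech_params, attack_params, batch)).backward()
            opt_phi.step()
        # (ii) designer step: descend L in theta against the strengthened attacker
        opt_theta.zero_grad()
        loss_fn(mech_params, attack_params, batch).backward()
        opt_theta.step()
```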
This game-theoretic lens clarifies a common failure mode. If the attacker update is weak (small B, narrow oracle class, poor optimization), then training may converge to mechanisms that are only ``robust’’ to that particular weakness. In contrast, if the attacker is strong, then the designer is forced to remove brittle revenue sources that depend on unexploited manipulations, and the learned mechanism moves toward genuinely incentive-compatible behavior within the expressivity of Θ.
It is often conceptually helpful to start from the constrained benchmark
maxθ ∈ Θ 𝔼v ∼ F[rev(θ; v)] subject to rgti(θ) = 0 for all i ∈ N,
or, more realistically, rgti(θ) ≤ δ for a tolerance level δ > 0. The penalized objective L(θ, ϕ) can be read as a Lagrangian relaxation of this constrained program, with λ acting as a shadow price on
incentive violations. In classical convex settings, one might introduce
bidder-specific multipliers and update them by dual ascent. In our
nonconvex, sample-based environment, strict duality guarantees are
unavailable, but the analogy remains valuable: λ determines how much revenue we are
willing to sacrifice to reduce manipulation incentives, and changing
λ traces out a
revenue–robustness frontier.
This also suggests why augmented-Lagrangian-style heuristics are attractive in practice. Instead of holding λ fixed, one can increase the penalty when measured regret remains high, or combine linear penalties with quadratic terms that more aggressively punish large violations. From our perspective, such schemes are not merely optimization tricks: they are policy knobs that encode how a platform values incentive robustness relative to revenue, and they make that tradeoff explicit and auditable.
Architectures such as CAFormer explicitly train a mechanism network with multiple objectives—a revenue objective and a regret objective—that share parameters and can compete in gradient space. The penalized objective L(θ, ϕ) is exactly a two-task scalarization, and many practical stabilization techniques can be interpreted through this lens. For example, if regret gradients dominate, the mechanism may collapse toward overly conservative allocations or payments; if revenue gradients dominate, the mechanism may exploit irregularities that are easy for a weak attacker to miss. Methods that normalize gradient magnitudes across tasks, schedule λ over training, or alternate between revenue-focused and regret-focused updates can be interpreted as attempts to control this multi-task interference.
The two-player formulation adds a further ingredient: the regret term is itself generated by an adaptive opponent. Thus, the ``regret task’’ is nonstationary even when θ is fixed, because the attacker improves over time. This nonstationarity is not a bug; it is the mechanism by which training discovers incentive violations. But it does mean that convergence diagnostics should be adversarial rather than purely loss-based: what matters is not only whether the current measured regret is low, but whether it remains low under strengthened oracles and increased budgets.
Taken together, the min–max objective, the approximation-aware alternation scheme, and the Lagrangian interpretation provide a clean template for what ``robust mechanism training’’ is doing. In the next section we state the main guarantees that become available once we quantify oracle approximation error, and we separate what follows by algebra from what requires regularity assumptions or empirical validation.
Our goal in this section is to separate two distinct questions that are sometimes conflated in empirical work. First, certification: given a mechanism θ, what does the regret computed using a particular deviation procedure actually tell us about the true best-response regret? Second, the revenue–robustness tradeoff: when we optimize revenue subject to a regret penalty, what can we say about the implied revenue–robustness compromise? The central message is that, once we quantify how well an oracle approximates best responses, we obtain clean certification statements that are largely algebraic. What remains difficult—and therefore must be treated explicitly as an assumption or an empirical question—is whether our training dynamics actually produce mechanisms that are near-optimal for the chosen objective, and whether our oracle family is rich enough to make ε small.
Fix θ ∈ Θ and bidder i. For each valuation profile v, define the pointwise deviation gain as
Δi(v; θ) = maxv̂i ∈ Vi ui(vi; (v̂i, v−i); θ) − ui(vi; v; θ).
True expected ex-post regret is rgti(θ) = 𝔼v ∼ F[Δi(v; θ)]. In contrast, the oracle-measured counterpart evaluates utility at a single deviation suggested by aϕ:
rgtiϕ(θ) = 𝔼v ∼ F[ui(vi; (aϕ(i, v, θ), v−i); θ) − ui(vi; v; θ)].
The key modeling input is an approximation radius ε ≥ 0: for almost all v,
ui(vi; (aϕ(i, v, θ), v−i); θ) ≥ maxv̂i ∈ Vi ui(vi; (v̂i, v−i); θ) − ε.
This condition says that the oracle may fail to find the true best response, but it does not miss it by more than ε in utility terms. This is the right notion for certification because it is stated directly in the payoff units that define incentive constraints.
Intuitively, the approximation condition implies that the oracle deviation is nearly as good as the best deviation; therefore the gap between true regret (which uses a maximum) and measured regret (which uses the oracle deviation) cannot be large. Formally, subtract ui(vi; v; θ) from both sides of the condition to obtain the pointwise inequality
Δi(v; θ) ≤ ui(vi; (aϕ(i, v, θ), v−i); θ) − ui(vi; v; θ) + ε.
Taking expectations over v ∼ F yields the promised certificate:
rgti(θ) ≤ rgtiϕ(θ) + ε.
Two remarks are worth stressing. First, the certificate is not an equilibrium claim; it is a purely diagnostic implication of oracle quality. Second, the bound is one-sided: low oracle regret certifies low true regret, but high oracle regret does not imply high true regret unless we also know the oracle does not systematically overstate utilities (which typically fails for approximate maximizers).
If a mechanism θ̂ achieves
rgtiϕ(θ̂) ≤ δ
for all i, then by the certificate, rgti(θ̂) ≤ δ + ε for all i.
Thus, in the expected ex-post regret metric, θ̂ is (δ + ε)-DSIC. When δ is driven near zero by training
and ε is small because the
oracle is strong, we recover the familiar implication that (approximate)
regret minimization corresponds to (approximate) incentive
compatibility. Conversely, if ε is uncontrolled, then small
measured regret may simply reflect a weak deviation search, and the
certificate becomes vacuous; this is precisely why we treat attacker
strength as a first-class modeling choice rather than a training
detail.
In many applications we can order deviation procedures by richness:
for example, a few steps of gradient ascent is a strict subset of what a
learned policy with exploration can implement, which itself may be a
subset of a black-box search with restarts and adaptive proposals. Let
𝒜1 ⊆ 𝒜2 be two
such classes. Suppose that, for a given mechanism θ, the best oracle in 𝒜k is εk-approximate
in the sense of the ε-approximation condition, with ε1 ≥ ε2.
Then any regret certificate derived from 𝒜2 is weakly tighter than the one
derived from 𝒜1: if training
achieves rgtiϕk(θ̂k) ≤ δ
using some ϕk ∈ 𝒜k,
we obtain
rgti(θ̂k) ≤ δ + εk.
The substance of this comparison is monotonicity: a richer deviation model can only reduce the
certificate gap between what we measure and what we care about.
Importantly, this does not assert that a particular optimizer will find
a better θ̂2 than
θ̂1; nonconvex
training can fail in either case. What the result does tell us is how to
interpret empirical robustness claims: if robustness does not transfer
from a weaker oracle family to a held-out stronger family, the most
plausible explanation is not that incentive constraints changed, but
that the original evaluation oracle left profitable deviations
undiscovered.
A second set of guarantees clarifies what is (and is not) implied by
optimizing a penalized objective. Fix an attacker ϕ and consider the designer problem of minimizing
Lϕ(θ) = −𝔼v ∼ F[rev(θ; v)] + λ∑i ∈ N rgtiϕ(θ).
Let θ̂ be any minimizer of Lϕ, and let θrev ∈ arg maxθ ∈ Θ 𝔼[rev(θ; v)] be a revenue-optimal mechanism in the same class (ignoring incentive considerations). Since ∑i rgtiϕ(θ̂) ≥ 0, optimality of θ̂ implies
−𝔼[rev(θ̂; v)] ≤ Lϕ(θ̂) ≤ Lϕ(θrev) = −𝔼[rev(θrev; v)] + λ∑i ∈ N rgtiϕ(θrev),
which rearranges to the revenue lower bound
𝔼[rev(θ̂; v)] ≥ 𝔼[rev(θrev; v)] − λ∑i ∈ N rgtiϕ(θrev).
This bound formalizes the Lagrangian intuition: the only way penalization reduces revenue relative to the best-in-class revenue benchmark is through the manipulation incentives we insist on removing, scaled by λ. In particular, the revenue sacrificed relative to θrev is at most λ∑i rgtiϕ(θrev), the measured regret carried by the unconstrained revenue optimum. At the same time, the bound also highlights a limitation: it is expressed in terms of oracle-measured regret, not true regret. To translate it into a statement about true robustness, one must combine it with the certification bound rgti(θ) ≤ rgtiϕ(θ) + ε, which again places the burden on controlling ε.
The preceding implications are intentionally modest in what they require, but several regularity conditions are still doing work behind the scenes. First, we assume utilities are well-defined and integrable so that expectations exist and (when needed) can be interchanged with maximization and approximation arguments; bounded valuations or bounded payments are sufficient in most implementations. Second, ε-approximation is a pointwise condition over almost all v and all bidders. In practice, the best we can often justify is an in-expectation or high-probability analogue under the sampling distribution used to train and evaluate the oracle; the resulting certificates then inherit the same probabilistic qualification. Third, the oracle is granted access to (i, v, θ), which is appropriate for worst-case diagnostics but is stronger than what real bidders possess. We view this as a feature: if a mechanism only appears robust because deviations are hard to find even with privileged information, then its incentive properties are fragile.
Finally, none of the bounds above resolve the algorithmic question of whether alternating updates actually find a good minimax point, nor whether the chosen parametric class Θ contains a mechanism that simultaneously achieves high revenue and low regret. These are empirical questions that depend on optimization landscapes, architecture capacity, and the attacker budget B. The theory therefore guides practice in a specific way: to make the certificates meaningful, we must invest in deviation procedures that credibly shrink ε and that generalize across attack families. This motivates the next section, where we make the oracle classes concrete and discuss how different practical oracles trade off search strength, coverage of the deviation space, and computational cost.
The certification logic of the preceding section treats the deviation oracle aϕ as an abstract maximization routine over misreports. In implementations, however, the oracle is a concrete algorithm with a concrete interface: it receives inputs (what information it is allowed to use), it searches over an action space (what reports it can produce), and it consumes a computational budget B (how hard it tries). These design choices matter because they determine the effective approximation radius ε in the certification condition, and therefore whether low measured regret is genuinely informative or merely an artifact of a weak search.
We organize practical oracles along two axes. First is information access: in worst-case diagnostics we allow the oracle to condition on (i, v, θ), including v−i and the mechanism parameters, even though real bidders do not observe these objects. This is intentionally conservative: if we cannot find profitable deviations even with privileged access, then incentive issues are less likely to arise from ``obvious’’ strategic manipulations. Second is search modality: differentiable, learned, black-box, and hybrid methods make different tradeoffs between speed, coverage, and brittleness.
A shared implementation detail across all oracles is the report parameterization. In combinatorial settings, a report is a vector bi = (biS)S ∈ K over bundles in the chosen bidding language. To ensure feasibility (e.g., nonnegativity, upper bounds, or monotonicity constraints when imposed), we typically optimize over an unconstrained vector z ∈ ℝ|K| and map it to a feasible report via a projection or link function, such as biS = softplus(zS) for nonnegativity, clipping to a known range, or projecting onto a polytope that encodes a restricted language. The action space for the oracle is then the domain of z (or a lower-dimensional parameterization when we restrict bids to, say, additive or XOR forms). This seemingly mundane choice can dominate performance: if the parameterization cannot express the relevant profitable deviations, no amount of compute will make ε small.
The baseline oracle treats deviation search as direct optimization of
bidder i’s utility with
respect to a continuous report vector. Given inputs (i, v, θ), it
approximately solves
maxbi ∈ Vi ui(vi; (bi, v−i); θ),
holding opponents truthful. When gθ and pθ are
differentiable (as in many neural mechanism architectures) and the
report parameterization is differentiable, we can compute ∇zui(vi; (π(z), v−i); θ)
and take T ascent steps,
optionally with momentum or Adam. The computational budget B is naturally measured as the
number of gradient steps times the cost of one forward/backward pass
through the mechanism. To reduce sensitivity to initialization, we often
use multiple random restarts (so B is split across restarts) and keep
the best found deviation.
This oracle is fast and easy to scale, which makes it attractive for inner-loop training. Its limitations are equally clear. First, the objective is typically nonconcave in bi, so gradients can get stuck in local optima; second, the search is sensitive to the bid parameterization and step-size schedules; third, for discrete bidding languages, gradients are either unavailable or require relaxations that can be systematically biased. For these reasons, gradient ascent is best viewed as a fast baseline for deviation search rather than a robustness guarantee.
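A sketch of this baseline oracle is given below: it optimizes an unconstrained vector z, maps it to a nonnegative report through a softplus link, and keeps the best misreport across random restarts. The callable `utility(i, true_vi, bids)` is assumed to be a differentiable torch function that closes over θ; step counts and the learning rate are illustrative.

```python
import torch
import torch.nn.functional as F

def grad_ascent_oracle(utility, i, v, steps=50, restarts=5, lr=0.1):
    v_i = v[i]
    best_bid, best_util = v_i.clone(), utility(i, v_i, list(v)).item()   # truthful baseline
    for _ in range(restarts):
        # initialize z near the inverse-softplus of the truthful report, plus noise
        z = (torch.log(torch.expm1(v_i.clamp(min=1e-4)))
             + 0.1 * torch.randn_like(v_i)).detach().requires_grad_(True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            b = list(v)
            b[i] = F.softplus(z)             # feasible (nonnegative) report
            u = utility(i, v_i, b)
            opt.zero_grad()
            (-u).backward()                  # ascend bidder i's utility
            opt.step()
        with torch.no_grad():
            b = list(v)
            b[i] = F.softplus(z)
            u = utility(i, v_i, b).item()
        if u > best_util:
            best_util, best_bid = u, b[i].clone()
    return best_bid
```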
A learned oracle replaces per-instance optimization with a policy
aϕ that
amortizes the search across many valuation profiles. Concretely, aϕ is a neural
network that maps a feature representation of (i, v, θ) to a
distribution over reports, written v̂i ∼ aϕ(i, v, θ).
The action space may be (a) direct continuous bids, (b) parameters of a
structured bid (e.g., additive weights), or (c) a sequence of edits
applied to an initial truthful bid
(``increase bundle {1, 3} by Δ,’’ ``shade all singleton bids by α,’’ etc.).
The RL objective is to maximize expected regret, i.e.,
maxϕ 𝔼v ∼ F[ui(vi; (aϕ(i, v, θ), v−i); θ) − ui(vi; v; θ)],
summed over bidders in training, typically with entropy regularization
to encourage exploration.
The budget B here is measured in environment interactions: the number of sampled profiles v, the number of rollouts per profile, and the length of any action sequence. The advantage of amortization is that, for a fixed per-instance compute at evaluation time, a well-trained policy can propose strong deviations immediately, often discovering systematic misreports that gradient ascent misses (e.g., coordinated shading across related bundles). The disadvantage is the standard one in RL: training instability, sensitivity to reward scaling, and the risk of overfitting to idiosyncrasies of the training distribution and the current θ. In our setting, these drawbacks are not merely engineering concerns: they translate directly into uncertainty about ε and about transfer to held-out mechanisms.
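A sketch of such a learned oracle, trained with plain REINFORCE, is shown below; for brevity the policy conditions only on bidder i's own valuation, actions are Gaussian in an unconstrained space and mapped to feasible reports by softplus, and `utility(i, true_vi, bids)` only needs to be simulable (no gradients through the mechanism are required).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeviationPolicy(nn.Module):
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 2 * dim))

    def forward(self, v_i):
        mean, log_std = self.body(v_i).chunk(2, dim=-1)
        return torch.distributions.Normal(mean, log_std.clamp(-4.0, 1.0).exp())

def reinforce_step(policy, optimizer, utility, profiles, entropy_coef=1e-3):
    """One policy-gradient step; the reward is the measured utility gain (regret)."""
    losses = []
    for v in profiles:
        for i, v_i in enumerate(v):
            dist = policy(v_i)
            z = dist.sample()                                  # unconstrained action
            with torch.no_grad():
                b = list(v)
                b[i] = F.softplus(z)                           # feasible misreport
                reward = utility(i, v_i, b) - utility(i, v_i, v)
            logp = dist.log_prob(z).sum()
            losses.append(-(reward * logp) - entropy_coef * dist.entropy().sum())
    optimizer.zero_grad()
    torch.stack(losses).mean().backward()
    optimizer.step()
```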
When differentiability is unavailable, unreliable, or suspected to induce blind spots, we can treat the mechanism as a black box: given bi, we can simulate gθ and pθ and evaluate ui, but we do not use gradients. A black-box oracle specifies (1) a proposal distribution over candidate misreports and (2) a selection rule that returns the best candidate found under the budget. Examples include random search with adaptive scaling, coordinate-wise hill climbing, cross-entropy methods, CMA-ES, or Bayesian optimization in a low-dimensional structured bid space.
The key design decision is the action space dimension. Searching over all |K| bundle bids can be prohibitively expensive, so black-box methods are often paired with a structured language (e.g., additive bids, low-rank bundle embeddings, or a small menu of bundles). The computational budget B is then the number of utility evaluations (mechanism simulations), which can be large even when each evaluation is cheap, and can be parallelized naturally. Black-box search tends to be slower than gradient ascent per instance, but it can be more robust to non-smoothness and can uncover deviations that exploit discontinuities in allocation rules (e.g., threshold effects) where gradients are uninformative.
A practical limitation is that black-box methods may ``waste’’ evaluations on implausible or infeasible bids unless feasibility is built into the parameterization. Moreover, because these methods often optimize stochastically, regret estimates can have higher variance unless we allocate additional compute to resampling outcomes when gθ is randomized.
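A sketch of one such gradient-free oracle is below: adaptive Gaussian hill climbing around the truthful report in a per-item additive bid space, with the mechanism treated as a black box through a scalar-valued `utility(i, true_vi, bids)` callable; the budget counts utility evaluations.

```python
import numpy as np

def random_search_oracle(utility, i, v, budget=200, init_scale=0.25, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    v_i = np.asarray(v[i], dtype=float)
    best_bid = v_i.copy()
    best_util = utility(i, v_i, list(v))
    scale = init_scale * (1.0 + v_i)            # per-coordinate proposal scale
    for _ in range(budget):
        candidate = np.clip(best_bid + scale * rng.standard_normal(v_i.shape), 0.0, None)
        b = list(v)
        b[i] = candidate
        u = utility(i, v_i, b)
        if u > best_util:
            best_util, best_bid = u, candidate  # accept improving moves
            scale *= 1.1                        # expand the search on success...
        else:
            scale *= 0.99                       # ...and shrink it slowly otherwise
    return best_bid
```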
LLM-guided oracles sit between structured reasoning and black-box evaluation. The core idea is to use a language model to propose candidate misreports—often as structured edits or as a sparse set of salient bundles to manipulate—and then to verify and refine these proposals by simulating the mechanism. Operationally, we provide the LLM with (a) a description of the bidding language (what a valid bi looks like), (b) the bidder’s true valuation summary vi (possibly compressed, e.g., top bundles), (c) outcome information under truthful bidding (allocation and payments), and (d) optionally mechanism-level information about θ or observed allocation patterns. The LLM outputs one or more candidate deviations, which we map into feasible reports and score by realized utility. We can iterate this procedure: feed back the best-performing deviation and the resulting outcome, request variations, and keep the best. The budget B can be counted as the number of LLM calls plus the number of mechanism evaluations for verification.
The comparative advantage of LLM guidance is : it can propose
qualitatively different deviation families without requiring us to
pre-specify them, such as
``inflate a complementary bundle while shading its subsets’’ or ``shift mass from dominated bundles to pivotal ones.’’ This is especially useful
when the mechanism is complex enough that profitable deviations are not
small perturbations around truthful bids. At the same time, LLM guidance
introduces its own failure modes. First, the LLM may generate invalid
bids unless we enforce strict output schemas and post-processing
projections. Second, the LLM can be sensitive to prompt design and can
hallucinate mechanism properties; therefore, we treat the LLM as a proposer and
rely on simulation-based verification as the ground truth. Third,
because LLM calls are expensive, the effective B is often small, which makes
careful orchestration (candidate batching, structured action spaces, and
reuse of past successful deviations) essential.
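The propose-and-verify loop can be sketched as follows; `llm_propose` and `project` are hypothetical callables (not a real API): the former returns raw candidate reports or edits given the valuation and feedback history, the latter enforces the bid schema and feasibility, and every candidate is scored only by simulating the mechanism.

```python
import copy

def llm_guided_oracle(utility, llm_propose, project, i, v, rounds=3, k=8):
    """Propose-and-verify: the LLM only proposes; the simulator decides."""
    truthful_util = utility(i, v[i], list(v))
    best_bid, best_util = copy.deepcopy(v[i]), truthful_util
    history = []                                    # (proposal, realized gain) fed back as context
    for _ in range(rounds):
        proposals = llm_propose(valuation=v[i], best_so_far=best_bid,
                                history=history, k=k)
        for raw in proposals:
            bid = project(raw)                      # enforce output schema / feasibility
            b = list(v)
            b[i] = bid
            u = utility(i, v[i], b)                 # ground-truth verification by simulation
            history.append((raw, u - truthful_util))
            if u > best_util:
                best_util, best_bid = u, bid
    return best_bid
```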
We view these four oracle families as complementary rather than mutually exclusive. Gradient ascent is the workhorse for rapid training-time pressure; learned policies provide amortized, adaptive deviations; black-box methods serve as a non-differentiable check against gradient blind spots; and LLM-guided search broadens the deviation space toward human-like strategic patterns that are hard to enumerate. In policy and platform settings, the relevant question is not whether any single oracle is ``realistic’’ for bidders, but whether the deployed mechanism remains approximately incentive compatible under a sufficiently rich family of deviations that includes the most plausible and the most damaging manipulations. This is why we emphasize budgets and held-out oracle families: robustness claims are only as credible as the strongest deviations we can find under a transparent computational accounting of B.
These considerations motivate the algorithmic choices in the next section: we warm-start with cheap gradient attacks, then escalate to richer learned and search-based oracles, and we evaluate via transfer to held-out attack families to diagnose overfitting to a particular deviation procedure.
Our goal in training is to find a mechanism parameter θ that attains high expected revenue
while remaining approximately incentive compatible with respect to the
strongest deviations we can generate under a transparent computational
budget. Operationally, we implement the saddle-point problem for the
designer–attacker loss
L(θ, ϕ) = −𝔼v[rev(θ; v)] + λ∑i ∈ Nrgtiϕ(θ),
as an alternating procedure: for each mechanism update, we update (or
query) an oracle that proposes deviations intended to increase the
measured regret. Because both the deviation search and the mechanism
class are typically nonconvex and because regret is defined by a maximization over
reports, the central algorithmic question is not whether we can optimize
perfectly, but whether we can increase the pressure of the inner
maximization over time without destabilizing the outer minimization.
In early training, the mechanism is far from incentive compatible, and many profitable deviations are ``local’’ in the report space. In this regime, simple gradient-based deviation search is not only inexpensive but also informative: it quickly identifies large regret violations that the designer can correct with a small number of updates. We therefore begin with a warm-start phase in which the attacker is a gradient-ascent oracle with modest compute (few steps, few restarts). Concretely, for each sampled profile v and bidder i, we approximately compute a deviation v̂i by ascending in a continuous parameterization and record the corresponding utility improvement. The designer then updates θ by descending on an empirical estimate of L(θ, ϕ), with the regret term computed at the found deviations.
As training progresses, the mechanism adapts to these local attacks: measured regret can become small not because the mechanism is robust, but because the attacker has insufficient coverage (large ε). At this point, continuing to invest solely in gradient steps yields diminishing returns and risks overfitting to a narrow deviation family. We therefore employ an explicit attacker-escalation schedule: we gradually replace (or augment) the baseline oracle with a stronger learned oracle that amortizes search and proposes more global manipulations. Practically, we implement this escalation in one of two ways. The first is a staged handoff: after T0 designer iterations, we freeze the baseline attacker and begin training a learned policy oracle aϕ to maximize measured regret against the current θ, while simultaneously continuing designer updates. The second is a mixture schedule: at each iteration we sample an attacker type from a distribution that shifts mass from cheap gradient attacks to learned (and, when used, black-box or LLM-guided) attacks as training proceeds. The mixture approach is often more stable, because it preserves a supply of easy-to-find violations while the stronger attacker learns, and it reduces the chance that the designer chases a moving target produced by a rapidly changing oracle.
A key practical detail is that learned attackers can become strong only if they see a sufficiently diverse set of mechanisms during training; otherwise they may specialize to a single θ and fail to generalize even within the same mechanism class. We therefore train the attacker against a replay buffer of recent mechanism snapshots {θ(t − τ), …, θ(t)} (or an exponential moving average of parameters), sampling a snapshot uniformly when generating attacker rollouts. This simple device makes the attacker more robust to nonstationarity and empirically reduces cycling behavior in the outer loop.
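A sketch of the mixture-style escalation together with the snapshot buffer is below; the attacker-type probabilities and the buffer size are illustrative placeholders.

```python
import random
from collections import deque

class AttackerCurriculum:
    def __init__(self, total_iters, snapshot_capacity=10):
        self.total_iters = total_iters
        self.snapshots = deque(maxlen=snapshot_capacity)   # recent mechanism parameters

    def record_snapshot(self, theta_state):
        self.snapshots.append(theta_state)

    def sample(self, t):
        # shift mass from cheap gradient attacks toward stronger oracles over training
        frac = min(1.0, t / max(1, self.total_iters))
        weights = {"grad": 1.0 - 0.7 * frac, "learned": 0.5 * frac, "blackbox": 0.2 * frac}
        kinds, probs = zip(*weights.items())
        attacker_kind = random.choices(kinds, weights=probs, k=1)[0]
        snapshot = random.choice(list(self.snapshots)) if self.snapshots else None
        return attacker_kind, snapshot   # learned attackers roll out against a sampled snapshot
```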
A recurrent instability in regret-penalized training is that the regret term can produce rare but extremely large gradients. This can happen when a deviation triggers a discontinuous change in allocation, when payments create sharp thresholds, or when the learned attacker discovers an ``outlier’’ report that yields very large utility gains for one bidder on a small set of profiles. If we naively backpropagate through such events, the designer updates can become dominated by a handful of samples, leading to oscillation or collapse in revenue.
To control this, we use per-sample regret capping: we cap the contribution of any single
bidder–profile pair to the regret penalty. Writing the per-sample regret
estimate as
Δ̂i(v; θ, ϕ) = ui(vi; (aϕ(i, v, θ), v−i); θ) − ui(vi; v; θ),
we replace it in the loss with a capped version min {Δ̂i, Γ}
for a budget parameter Γ > 0 (or a smooth approximation
such as Γtanh (Δ̂i/Γ)).
The interpretation is not that we disregard incentive violations above
Γ, but rather that we prevent
a single extreme violation from dictating the entire update direction.
In parallel, we apply standard gradient clipping on the designer update,
and we monitor both the capped and uncapped regret statistics to ensure
that capping is not hiding systematic problems.
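A sketch of the capped penalty is below, showing both the hard cap min(Δ̂i, Γ) and the smooth surrogate Γ tanh(Δ̂i/Γ); uncapped statistics should still be logged alongside it, as noted above.

```python
import torch

def capped_regret(per_sample_regret: torch.Tensor, gamma: float, smooth: bool = True) -> torch.Tensor:
    r = torch.clamp(per_sample_regret, min=0.0)    # regret contributions are nonnegative
    if smooth:
        return gamma * torch.tanh(r / gamma)       # differentiable everywhere, saturates at gamma
    return torch.clamp(r, max=gamma)               # hard cap min(r, gamma)
```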
We also find it useful to schedule the penalty weight λ rather than fixing it throughout training. Early in training, a large λ can prevent the mechanism from learning a sensible allocation rule at all; late in training, a small λ can leave ``residual’’ incentive violations unaddressed. A simple schedule that increases λ over time (or increases it when a moving average of measured regret falls below a target threshold) tends to yield a smoother revenue–regret tradeoff than any single fixed value.
Many parametric mechanism classes (especially neural architectures) exhibit sensitivity to the scale of intermediate representations: small parameter changes can induce large changes in allocations and payments, which in turn produces brittle gradients for both revenue and regret. We mitigate this with scale control methods that are agnostic to the specific architecture. First, we use weight normalization (or spectral normalization, when compatible with the implementation) on the parts of the network that map bids to allocation scores. The practical aim is to constrain Lipschitz-like behavior so that the mechanism does not create unnecessarily sharp decision boundaries that are easy for attackers to exploit and hard for the designer to smooth out. Second, we normalize inputs (bids) consistently across training phases and across oracle types, so that the learned attacker does not exploit trivial scaling mismatches. Third, when payments are represented by a separate network head, we regularize their magnitude and enforce structural constraints needed for ex-post IR by construction, rather than by penalty, to avoid a fragile dependence on the tuning of λ.
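A sketch of these scale controls, with a placeholder architecture rather than the exact network used here, is given below: spectral normalization on the bid-to-allocation layers and a fixed bid normalization shared by designer updates and by every oracle.

```python
import torch
import torch.nn as nn

def make_allocation_head(in_dim: int, hidden: int, out_dim: int) -> nn.Module:
    # spectral normalization constrains the layer's Lipschitz-like behavior
    return nn.Sequential(
        nn.utils.spectral_norm(nn.Linear(in_dim, hidden)),
        nn.ReLU(),
        nn.utils.spectral_norm(nn.Linear(hidden, out_dim)),
    )

def normalize_bids(bids: torch.Tensor, scale: float) -> torch.Tensor:
    # identical preprocessing for training and for all deviation oracles
    return bids / scale
```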
Training-time measured regret is only meaningful insofar as it transfers to deviations outside the particular search procedure used in the inner loop. We therefore treat evaluation as an adversarial ``generalization’’ problem and adopt a transfer-attack protocol mirroring robustness testing in machine learning. Concretely, we partition the oracle space into a training class 𝒜train and a disjoint held-out class 𝒜test that is strictly richer in search modalities and/or budget. For example, 𝒜train may include gradient ascent with small B and a learned policy trained on recent snapshots, while 𝒜test may include black-box search over a structured bid language, larger budgets, alternative initializations, and (when available) LLM-guided proposal families.
The evaluation protocol is then: (i) fix a trained mechanism θ̂; (ii) draw a large held-out set of
valuation profiles v; (iii)
for each v and each bidder
i, compute regret estimates
under each attacker family in 𝒜train and 𝒜test under matched or explicitly
reported budgets B; and (iv)
report both the worst-case and distributional summaries (e.g., maxirgtiϕ,
average across bidders, and high quantiles across profiles). The central
diagnostic is the transfer gap
maxirgti𝒜test(θ̂) − maxirgti𝒜train(θ̂),
a large value of which we interpret as evidence of unmodeled strategic behavior not
captured by the training oracle. A small gap does not prove DSIC, but it
increases our confidence that the mechanism is not merely ``robust’’ to
a single attack family.
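As a concrete illustration of step (iv) and the gap diagnostic, the sketch below computes the transfer gap from per-bidder, per-profile regret estimates; the array shapes and names are assumptions for this example.

```python
import numpy as np

def transfer_gap(regret_train, regret_test):
    """Transfer-gap diagnostic from per-bidder regret estimates.

    regret_train, regret_test: arrays of shape (num_profiles, n) holding the best
    deviation gain found for each bidder on each held-out profile, under the
    training-class and held-out-class attackers respectively.
    """
    worst_train = regret_train.mean(axis=0).max()  # max over bidders of average regret, A_train
    worst_test = regret_test.mean(axis=0).max()    # max over bidders of average regret, A_test
    return worst_test - worst_train
```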
Two practicalities matter for credible transfer evaluation. First, we ensure that each oracle in 𝒜test is itself run with multiple random seeds or restarts, reporting the best deviation found under the allotted budget; otherwise, a weak run can understate regret and create a false sense of robustness. Second, we avoid tuning hyperparameters on 𝒜test: the held-out attackers are meant to approximate genuinely new deviation procedures, so performance against them should be reported without ex post adaptation. When we do explore alternative budgets or attack variants, we present them explicitly as stress tests rather than as calibrated evaluation.
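The restart requirement can be made mechanical. The sketch below estimates a single attacker's regret for one bidder and one profile by taking the best deviation over several restarts; the `mechanism.utility` and `attacker.propose` interfaces are hypothetical placeholders.

```python
def audit_regret(mechanism, attacker, i, v, num_restarts=8):
    """Regret estimate for bidder i at profile v under a single attacker, taking
    the best deviation found over random restarts (hypothetical interfaces)."""
    u_truth = mechanism.utility(i, true_vals=v, reports=v)
    best_gain = 0.0
    for restart in range(num_restarts):
        v_hat_i = attacker.propose(i, v, mechanism, seed=restart)  # feasible misreport in V_i
        reports = list(v)
        reports[i] = v_hat_i
        gain = mechanism.utility(i, true_vals=v, reports=reports) - u_truth
        best_gain = max(best_gain, float(gain))
    return best_gain
```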
Taken together, the curriculum and the transfer protocol reflect a single design principle: low regret is informative only when it is achieved against attacks that are both strong and diverse, and when robustness is validated by held-out deviation families that probe for blind spots. This principle guides the experimental design in the next section, where we evaluate not only revenue and in-sample regret, but also attacker generalization under distribution shift and systematic ablations over attacker capacity and compute.
We design experiments to answer a practical question that sits behind the formal guarantees in the preceding sections: when we train a parametric auction against a deviation oracle, do we obtain mechanisms whose low measured regret is (i) informative about strategic robustness beyond the training attacker and (ii) stable under reasonable forms of distribution shift? The premise is that, in realistic deployments, we rarely know F and we never know the bidders' deviation procedures; what we can control is the attacker class exposed during training and the audit class used at evaluation. Our experimental protocol therefore couples standard revenue–regret reporting with explicit transfer-attack audits and systematic ablations over attacker capacity, compute budget, and mechanism architecture.
We begin with two canonical synthetic families in which economic structure is transparent, and then move to combinatorial settings where complementarities make incentives and learning harder.
The first family is additive: each bidder i has per-item values $(v_{ij})_{j \in M}$ and values a bundle S at $v_i(S) = \sum_{j \in S} v_{ij}$. We draw $v_{ij}$ i.i.d. from simple distributions (e.g., Unif[0, 1] and log-normal variants) to probe sensitivity to tail behavior. This family is useful because incentive violations are often ``smooth'' in the report space, making it an informative setting for curriculum training and for comparing against economically interpretable baselines (e.g., VCG, or itemwise Myerson-style reserves when applicable).
The second family is unit-demand: each bidder wants at most one item, so $v_i(S) = \max_{j \in S} v_{ij}$. This environment introduces a structural nonlinearity in S even though primitives remain per-item, and it is a convenient stress test for allocation rules that must coordinate competition across items while maintaining approximate DSIC.
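A minimal sampler for these two families might look as follows; the distribution parameters are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_item_values(n, m, dist="uniform"):
    """Per-item values v_ij for n bidders and m items (illustrative distributions)."""
    if dist == "uniform":
        return rng.uniform(0.0, 1.0, size=(n, m))
    if dist == "lognormal":
        return rng.lognormal(mean=0.0, sigma=0.5, size=(n, m))
    raise ValueError(f"unknown distribution: {dist}")

def additive_value(v_i, bundle):
    # v_i(S) = sum_{j in S} v_ij
    return float(v_i[list(bundle)].sum()) if bundle else 0.0

def unit_demand_value(v_i, bundle):
    # v_i(S) = max_{j in S} v_ij
    return float(v_i[list(bundle)].max()) if bundle else 0.0
```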
To elicit richer deviations, we consider settings where bidders can express complementarities. We instantiate vi through a pairwise-complementarity model
$$v_i(S) = \sum_{j \in S} \alpha_{ij} + \sum_{\{j,k\} \subseteq S} \beta_{i,jk},$$
with $\alpha_{ij} \ge 0$ and $\beta_{i,jk}$ drawn from distributions that control complementarity strength and sparsity. We evaluate two bidding languages: (i) full bundle reporting for all $S \in K = 2^M \setminus \{\emptyset\}$ for small m, and (ii) restricted languages for larger m (e.g., singletons plus a small set of candidate bundles) to reflect practical constraints in combinatorial auctions. The combinatorial setting is where we most expect ``global'' deviations: misreports that reshape relative bundle rankings rather than merely scaling a few coordinates.
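A sketch of this valuation model, with placeholder distributions for complementarity strength and sparsity:

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(1)

def sample_complementarity_params(m, beta_scale=0.2, sparsity=0.3):
    """Draw alpha_ij >= 0 and a sparse set of pairwise terms beta_{i,jk}
    (distributions and constants are illustrative placeholders)."""
    alpha = rng.uniform(0.0, 1.0, size=m)
    beta = {pair: beta_scale * rng.uniform(0.0, 1.0)
            for pair in combinations(range(m), 2) if rng.random() < sparsity}
    return alpha, beta

def bundle_value(alpha, beta, bundle):
    """v_i(S) = sum_{j in S} alpha_ij + sum_{{j,k} subset of S} beta_{i,jk}."""
    items = sorted(bundle)
    value = sum(alpha[j] for j in items)
    value += sum(beta.get((j, k), 0.0) for j, k in combinations(items, 2))
    return value
```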
Across all environments, we compare two parametric families that differ in expressive capacity and inductive bias: a baseline CANet-style architecture (feedforward score-based allocation and payment heads) and a CAFormer-style architecture with attention over items/bundles and bidders. Both are implemented to satisfy feasibility and ex-post IR by construction; thus, the dominant strategic concern is incentive compatibility rather than participation. We train both with the alternating designer–attacker procedure described previously, using matched hyperparameter budgets and multiple random seeds, and we report not only averages but also dispersion (since instability is itself an important practical outcome).
To make robustness claims operational, we explicitly separate the
oracle class used in training, 𝒜train, from the held-out audit
class, 𝒜test. In all
experiments, 𝒜train includes
the curriculum mixture of (a) a gradient-ascent deviation search in a
continuous parameterization of bids, and (b) a learned policy oracle
trained to propose misreports that increase measured regret against a
buffer of recent mechanism snapshots. The held-out class 𝒜test then augments this with
search modalities intended to be qualitatively different: larger-budget
multi-start attacks, coordinate-wise or evolutionary perturbations over
structured bidding languages, and (when the bidding language is
discrete) black-box search that does not rely on differentiability of
ui. The
core evaluation statistic is the transfer gap,
$$\mathrm{Gap}(\hat\theta) = \max_i \mathrm{rgt}_i^{\mathcal{A}_{\mathrm{test}}}(\hat\theta) - \max_i \mathrm{rgt}_i^{\mathcal{A}_{\mathrm{train}}}(\hat\theta),$$
reported alongside revenue and regret levels. In our interpretation, a
small gap is not a proof of DSIC, but a necessary condition for
believing that low measured regret is not merely an artifact of a narrow
attacker.
In additive and unit-demand environments, the curriculum behaves as economic intuition suggests. Early in training, cheap gradient attacks reveal large, local incentive violations, and the designer rapidly reduces measured regret without large revenue sacrifice. As the mechanism improves, however, the same cheap attacker begins to ``miss'' profitable deviations, and the learned oracle becomes essential for continuing progress. Quantitatively, mechanisms trained only against the baseline gradient oracle frequently exhibit low measured regret under 𝒜train yet nontrivial regret under the held-out 𝒜test, i.e., a pronounced transfer gap. Introducing the stronger learned attacker in the curriculum reduces this gap substantially, at the cost of a modest reduction in revenue relative to the weakest-attacker optimum. This pattern is consistent with the nested-oracle intuition: richer deviation classes reduce the slack between measured and true regret (a smaller effective ε), forcing the mechanism away from brittle revenue gains that rely on undetected strategic moves.
In combinatorial settings, two qualitative effects become salient. First, we observe that profitable deviations are often global: bidders can gain by reshaping the reported complementarities across bundles rather than by small perturbations of a single bundle value. Consequently, purely gradient-based attackers become less reliable even when given more steps, because the objective landscape in report space is multimodal and sensitive to discrete bundle interactions. Second, the mechanism architecture matters more: CAFormer-style mechanisms tend to achieve a strictly better revenue–regret frontier than CANet-style mechanisms at comparable training budgets, particularly when m grows and bidder heterogeneity increases. Our interpretation is not that attention is intrinsically ``truthful,'' but that the additional capacity and relational inductive bias let the mechanism represent smoother allocation boundaries and more coherent payment adjustments in the regions where incentive constraints bind.
A deployable auction must tolerate that the deployed type distribution differs from the simulated or historically inferred distribution used in training. We therefore evaluate each trained θ̂ under a family of shifts: changes in marginal scales (e.g., multiplying values by a factor), shifts in heterogeneity (increasing variance across bidders), changes in correlation across items (from independent to positively correlated values), and in combinatorial settings, changes in complementarity strength and sparsity (altering the distribution of βi, jk). Two patterns emerge. (i) Regret is typically more shift-sensitive than revenue: even when expected revenue degrades gracefully, incentive violations can spike under shifts that move probability mass toward regions where the allocation rule has sharper thresholds. (ii) Training against stronger and more diverse attackers partially mitigates this sensitivity, because the designer learns to smooth out exploitable boundaries that are likely to be hit under many distributions, not just the training one. At the same time, we emphasize a limitation: oracle-robust training cannot, by itself, eliminate the effects of severe distributional mismatch; it can only reduce reliance on narrow, distribution-specific quirks.
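For reference, the following sketch shows the kind of audit-time transformations we have in mind; the functional forms and constants are illustrative and are not the shifts that generated any reported number.

```python
import numpy as np

def shift_values(values, scale=1.0, hetero=0.0, item_corr=0.0, rng=None):
    """Apply simple, illustrative shifts to an (n, m) value matrix: global scaling,
    extra bidder heterogeneity, and positive cross-item correlation via a shared
    bidder-level shock. These are audit-time transformations, not the training F."""
    rng = rng or np.random.default_rng()
    n, _ = values.shape
    bidder_scale = np.abs(1.0 + hetero * rng.standard_normal((n, 1)))  # per-bidder multiplier
    common = rng.uniform(0.0, 1.0, size=(n, 1))                        # shared shock per bidder
    mixed = (1.0 - item_corr) * values + item_corr * common            # induces item correlation
    return np.clip(scale * bidder_scale * mixed, 0.0, None)
```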
A central practical concern is whether the learned attacker in 𝒜train itself overfits to the training trajectory. We therefore evaluate attacker generalization in two directions: (a) across mechanism snapshots not seen during the attacker’s training window, and (b) across mechanism architectures (training an attacker on CANet snapshots and auditing CAFormer, and vice versa). We find that snapshot buffering improves within-class generalization, but cross-architecture transfer remains imperfect, which motivates our use of disjoint audit families. From an auditability perspective, this reinforces the principle that robustness claims should be supported by multiple, heterogeneous deviation procedures, ideally developed independently of the mechanism training pipeline.
Finally, we run controlled ablations that vary one dimension at a time: the attacker mixture, the attack budget B, and the mechanism architecture.
Replacing the curriculum mixture with a single weak attacker increases revenue slightly but increases both audit regret and transfer gaps, especially in combinatorial settings. Adding richer deviation modalities (e.g., black-box or structured-language search) reduces transfer gaps, consistent with the guarantee-level monotonicity discussed earlier.
Increasing B (more restarts/rollouts/steps) monotonically strengthens the deviations we can find at evaluation, and during training it makes the inner maximization harder. Empirically, higher B yields mechanisms with lower realized regret under audits, though with diminishing returns and occasional optimization instability if the designer update is not stabilized (e.g., without regret capping).
At fixed attacker strength, CAFormer typically attains higher revenue at the same regret, or lower regret at the same revenue, but it can be more sensitive to optimizer settings; in several regimes, the main benefit of CAFormer is not raw revenue but a smaller transfer gap under held-out attacks, suggesting smoother strategic behavior.
These experiments set up the discussion that follows: robustness in learned auctions is less about a single regret number and more about transparent auditing, i.e., explicitly reporting what attacker families and budgets were used, how transfer gaps behave, and how performance changes under plausible distribution shifts.
Our results point to a pragmatic takeaway: in learned multi-dimensional auctions, approximate incentive compatibility is not a property of the mechanism alone, but a joint property of the mechanism and
the audit regime used to stress it. In principle, DSIC is a universal
quantifier over deviations; in practice, any training loop replaces that
quantifier with an inner maximization solved approximately by an oracle.
The right question for deployment is therefore not
``is the mechanism truthful?'' but ``what strategic behaviors
were exposed during training and how much undetected slack might
remain?’’ The transfer-gap perspective makes this operational: if regret
remains low only under the training oracle class, then we have learned a
mechanism that is brittle with respect to unmodeled strategic
behavior.
A deployable auction needs an audit story that can be communicated to
stakeholders who do not want to adjudicate neural network internals. We
can treat an ``oracle class'' as an analogue of a security model: it specifies a family of deviations (and budgets) against which we certify approximate incentive compatibility. Concretely, it becomes meaningful to report an incentive robustness sheet alongside revenue: (i) the bidder language K used in deployment; (ii) the set of attacker modalities used in training and evaluation (gradient, black-box, structured search, learned policies); (iii) compute budgets B and restart policies; (iv) observed transfer gaps; and (v) sensitivity under distribution shifts. This style of reporting does not prove DSIC, but it does create a falsifiable, standardized notion of ``robustness to deviations up to budget B in class 𝒜.''
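As an illustration only, such a sheet could be serialized as a simple structured record; every field name and numeric value below is a hypothetical placeholder, not a measured result.

```python
# Illustrative "incentive robustness sheet"; all names and numbers are placeholders.
robustness_sheet = {
    "bidder_language": "singletons plus candidate bundles",
    "train_attackers": ["gradient_ascent", "learned_policy(snapshot_buffer)"],
    "audit_attackers": ["multi_start_gradient", "evolutionary_search", "black_box_bundle_search"],
    "budgets": {"train_B": 32, "audit_B": 256, "restarts": 8},
    "measured_regret": {"train_max_over_bidders": 1.2e-3, "audit_max_over_bidders": 2.9e-3},
    "transfer_gap": 1.7e-3,
    "shift_sensitivity": {"value_scale_x2": 4.1e-3, "item_corr_0.5": 3.3e-3},
}
```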
This audit framing also clarifies how third-party evaluation should work. A platform that trains a mechanism should not be the sole author of its audits. Independent ``red teams'' can be modeled as producing additional held-out classes 𝒜test; under nested classes, stronger audits can only increase measured regret, so reporting how regret changes as 𝒜test grows is itself informative. From a governance standpoint, this parallels penetration testing: we do not ask whether a system is unhackable, but whether it withstands diverse, well-specified threat models.
A mechanism in our class outputs allocations and payments; deploying it requires that these outputs be implementable in a real market. Three issues matter.
First, randomization. Many learned allocation rules are naturally stochastic (or become effectively stochastic due to exploration, dropout, or sampling-based approximations). Randomized allocations are economically legitimate—they are lotteries over feasible deterministic allocations—but their acceptance depends on the application. In ad auctions, randomized tie-breaking is common; in procurement or high-stakes allocation, randomness may be politically or legally constrained. If randomization is allowed, we must implement it transparently: publish the lottery seed policy, ensure verifiable randomness, and preserve ex-post IR at the realized outcome level, not merely in expectation (a minimal seed-policy sketch appears after the third issue below). When randomization is disallowed, we can either (i) restrict gθ to deterministic mappings, or (ii) treat the mechanism as defining a distribution and then apply a deterministic rounding scheme. The second option is delicate: rounding can introduce incentive violations not seen during training unless the rounding rule is included in the mechanism map that the oracle attacks.
Second, computational cost and latency. Even if gθ is cheap to evaluate, payments can be expensive if they require counterfactual recomputation (as in VCG-style rules). One practical advantage of parametric payment heads is constant-time evaluation, but this creates a new audit burden: we must ensure that the payment rule does not encode brittle, discontinuous dependence on bids that attackers can exploit. In applications with strict latency constraints (real-time ad exchanges), payments and allocations must be computed within milliseconds; this pushes us toward architectures whose forward pass is predictable and whose dependence on K scales gracefully.
Third, the bidding language. In combinatorial environments, full bundle reporting is rarely feasible. Deployment typically fixes a restricted language K (e.g., a set of allowed packages or a proxy bidding interface), and this choice is itself part of mechanism design: restricting K reduces expressiveness and may create opportunities for manipulation through the interface. Our framework can incorporate restricted K directly, but the interpretation changes: we certify approximate DSIC relative to the allowed deviation space Vi over that language. This is not a weakness so much as a reminder that truthfulness is always defined with respect to what agents can report; designing K is therefore as important as designing (gθ, pθ).
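Returning to the first issue (randomization), the sketch below illustrates one way to publish a reproducible lottery: the realized deterministic allocation is drawn from randomness derived from a pre-committed public seed and a digest of the submitted bids. This is only a sketch of transparent seeding; a deployment would additionally need commitments or an external randomness beacon so that no party can steer the draw.

```python
import hashlib
import numpy as np

def realized_allocation(alloc_probs, public_seed, bids_digest):
    """Draw the realized deterministic allocation from the mechanism's lottery
    using a published seed policy, so the draw is reproducible and auditable."""
    material = f"{public_seed}:{bids_digest}".encode()
    seed = int.from_bytes(hashlib.sha256(material).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    k = len(alloc_probs)                 # number of feasible deterministic allocations
    return rng.choice(k, p=alloc_probs)  # index of the realized allocation
```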
One promising path for deployment is hybridization: rather than learning an auction ``from scratch,’’ we can learn components around an interpretable core. For example, in additive or unit-demand settings, we can begin from a VCG allocation (or a Myerson-style allocation with reserves) and learn (i) bidder-specific reserve policies, (ii) smoothing or ironing transformations, or (iii) constrained payment adjustments, while preserving feasibility and ex-post IR by construction. The benefit is twofold: (a) the induced allocation and payment maps are easier to reason about and to explain, and (b) the attacker has fewer degrees of freedom to exploit, which can reduce both measured regret and transfer gaps. The limitation is that such structure may leave revenue on the table in complex combinatorial domains; the right balance likely depends on market thickness, regulatory constraints, and tolerance for opacity.
Our regret estimates are intrinsically lower bounds on the true best-response regret, because they evaluate only deviations discovered by some oracle. The ε-approximation proposition explains how to convert a lower bound into an upper bound when we can bound ε, but in practice ε is rarely known. A major open problem is therefore to build certificates of approximate DSIC for high-dimensional mechanisms. One route is to develop dual bounds on the inner maximization (e.g., convex relaxations or Lipschitz-based global optimization bounds in restricted languages). Another is to design mechanism classes whose utility landscape in reports is provably well-behaved (e.g., concave in appropriate bid parameterizations), so that best responses can be found reliably. A third is statistical: treat deviations as a hypothesis class and bound the generalization gap between empirical and population regret, though this still leaves the computational best-response gap.
Scaling remains a fundamental challenge. As m grows, |K| explodes, complementarities create multimodal deviation landscapes, and attention-based mechanisms—while more expressive—become harder to train stably against strong attackers. From a systems viewpoint, increasing attacker compute B improves robustness but can make the designer update noisy and brittle. From an economic viewpoint, large m also increases the scope for deviations that are ``semantic’’ rather than local (reshuffling relative rankings of many bundles). Addressing this likely requires joint advances: richer structured bidding languages; attacker curricula that cover qualitatively distinct deviation modes; and training objectives that stabilize the min–max game (e.g., regret capping, optimistic updates, or explicit regularization of allocation boundaries).
Even a mechanism that is robust under a strong audit regime can fail
under severe distribution shift: incentive constraints bind where the
allocation rule changes sharply, and a shift can move probability mass
precisely to those regions. This suggests a deployment posture closer to
``monitoring and patching'' than to ``train once and ship.'' In
practice, a platform can (i) log bids and outcomes, (ii) run periodic
offline audits against updated attacker suites, and (iii) retrain with a
widened training distribution or domain randomization. Economically,
this resembles adaptive mechanism design under evolving environments;
operationally, it resembles post-deployment security updates.
Finally, we have focused on unilateral deviations and quasi-linear utilities. Real markets introduce collusion, repeated interaction, bidder budgets, and platform-side constraints (fairness, transparency, legal compliance). These features interact with incentive properties in ways our present framework does not capture. Nonetheless, the core message carries over: when mechanisms are learned, incentive properties must be audited against explicit threat models rather than assumed. Treating deviation search as an explicit adversary, separating training from held-out audits, and reporting transfer gaps are practical steps toward mechanisms that can be defended not only in code, but in economic and institutional terms.
This appendix serves three purposes. First, we give self-contained proofs of the certification-style statements used in the main text. Second, we document an experimental protocol that makes the reported regret and transfer gaps interpretable as claims relative to an explicit audit regime (i.e., a well-specified threat model). Third, we provide implementable pseudocode for deviation oracles, emphasizing the engineering details that materially affect the strength of an ``inner maximization'' in practice.
Fix a bidder i and
mechanism θ. For each
valuation profile v, define
the best-response gain from deviating:
$$\Delta_i(v; \theta) = \max_{v_i' \in V_i} u_i\big(v_i; (v_i', v_{-i}); \theta\big) - u_i(v_i; v; \theta).$$
By definition, $\mathrm{rgt}_i(\theta) = \mathbb{E}_{v \sim F}[\Delta_i(v; \theta)]$. Assume the oracle $a_{\phi^\star}$ is ε-approximate almost surely in v:
$$\max_{v_i' \in V_i} u_i\big(v_i; (v_i', v_{-i}); \theta\big) \le u_i\big(v_i; (a_{\phi^\star}(i, v, \theta), v_{-i}); \theta\big) + \varepsilon.$$
Subtracting $u_i(v_i; v; \theta)$ from both sides yields the pointwise bound
$$\Delta_i(v; \theta) \le \Big(u_i\big(v_i; (a_{\phi^\star}(i, v, \theta), v_{-i}); \theta\big) - u_i(v_i; v; \theta)\Big) + \varepsilon.$$
Taking expectations over v ∼ F and using the definition of oracle-measured regret gives
$$\mathrm{rgt}_i(\theta) = \mathbb{E}[\Delta_i(v; \theta)] \le \mathrm{rgt}_i^{\phi^\star}(\theta) + \varepsilon.$$
Consequently, if a mechanism θ̂ achieves $\mathrm{rgt}_i^{\phi^\star}(\hat\theta) \le \delta$ for all i, then it is (δ + ε)-DSIC in expected ex-post regret, i.e., $\mathrm{rgt}_i(\hat\theta) \le \delta + \varepsilon$ for all i.
Let 𝒜1 ⊆ 𝒜2 be oracle classes, and let εk denote the best (worst-case) approximation radius achievable within 𝒜k, with ε1 ≥ ε2. Suppose training with class 𝒜k produces a pair (θ̂k, ϕk) such that ϕk ∈ 𝒜k and $\mathrm{rgt}_i^{\phi_k}(\hat\theta_k) \le \delta$ for all i.
Applying the certification bound above to (θ̂k, ϕk) yields
$$\mathrm{rgt}_i(\hat\theta_k) \le \mathrm{rgt}_i^{\phi_k}(\hat\theta_k) + \varepsilon_k \le \delta + \varepsilon_k.$$
In particular,
$$\mathrm{rgt}_i(\hat\theta_2) \le \delta + \varepsilon_2 \le \delta + \varepsilon_1,$$
which establishes that the guarantee level (measured regret plus approximation slack) is weakly better under the richer oracle class. This statement is intentionally agnostic about optimization error: it compares the guarantees attainable at the same measured-regret target δ.
Fix an oracle ϕ and consider the designer objective
$$J(\theta) = -\,\mathbb{E}[\mathrm{rev}(\theta; v)] + \lambda \sum_{i \in N} \mathrm{rgt}_i^{\phi}(\theta), \qquad \lambda > 0.$$
Let $\hat\theta \in \arg\min_{\theta \in \Theta} J(\theta)$, and let $\theta_{\mathrm{rev}} \in \arg\max_{\theta \in \Theta} \mathbb{E}[\mathrm{rev}(\theta; v)]$ (assumed to exist for simplicity). Optimality of θ̂ implies
$$-\,\mathbb{E}[\mathrm{rev}(\hat\theta; v)] + \lambda \sum_i \mathrm{rgt}_i^{\phi}(\hat\theta) \le -\,\mathbb{E}[\mathrm{rev}(\theta_{\mathrm{rev}}; v)] + \lambda \sum_i \mathrm{rgt}_i^{\phi}(\theta_{\mathrm{rev}}).$$
Dropping the nonnegative term $\lambda \sum_i \mathrm{rgt}_i^{\phi}(\hat\theta) \ge 0$ on the left-hand side and rearranging gives
$$\mathbb{E}[\mathrm{rev}(\hat\theta; v)] \ge \mathbb{E}[\mathrm{rev}(\theta_{\mathrm{rev}}; v)] - \lambda \sum_i \mathrm{rgt}_i^{\phi}(\theta_{\mathrm{rev}}),$$
which is the stated revenue bound. Interpreting λ as a Lagrange multiplier clarifies the economics: the designer ``pays'' in revenue at most λ times the measured regret of the unconstrained revenue benchmark.
A central methodological point is that regret is not an intrinsic scalar attached to θ; it is an expectation of an inner maximization whose numerical value depends on how that maximization is approximated. For this reason, we recommend reporting the following items whenever presenting learned auctions.
We split the attacker toolchain into (i) a class 𝒜train used inside the min–max loop and (ii) a strictly held-out suite 𝒜test used only for auditing. The evaluation suite should be disjoint in method (e.g., if training uses gradient ascent, evaluation includes black-box search, structured perturbations, and learned policies) and strictly stronger in budget whenever feasible.
We treat the attack budget B as a first-class parameter. Practically, B includes (a) the number of optimization steps, (b) the number of random restarts, and (c) any additional rollouts needed when gθ is randomized. Restarts matter because utility in the report space is frequently non-concave; without restarts, many ``low regret’’ claims are artifacts of local search failure.
When the bidding language restricts reports to Vi, attacks must incorporate a projection or parameterization that preserves feasibility. For example, if bids must be nonnegative and monotone over bundles, then an unconstrained gradient step can propose infeasible vectors that artificially inflate measured utility before being clipped; we therefore either (i) parameterize bids via a differentiable map ψ(⋅) that always outputs feasible reports or (ii) include an explicit projection operator ΠVi inside the attack loop and audit its effect.
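Both options can be implemented in a few lines; the sketch below assumes nonnegativity (and, optionally, box) constraints and a PyTorch-based attack loop.

```python
import torch
import torch.nn.functional as F

def feasible_report(z):
    """Differentiable parameterization psi: map unconstrained attack variables z to a
    nonnegative report via softplus. For monotone values along a chain of nested
    bundles, a cumulative sum of nonnegative increments is one simple option."""
    return F.softplus(z)

def project_report(v_hat, lo=0.0, hi=None):
    """Alternative: an explicit projection onto box constraints, applied inside the
    attack loop (and audited, since clipping changes the effective search)."""
    return torch.clamp(v_hat, min=lo, max=hi)
```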
We report (i) $\max_i \mathrm{rgt}^{\mathcal{A}}_i(\theta)$ and $\frac{1}{n}\sum_i \mathrm{rgt}^{\mathcal{A}}_i(\theta)$, (ii) the transfer gap between 𝒜train and 𝒜test, and (iii) the empirical distribution of pointwise regret $\Delta_i(v; \theta)$ rather than only its mean (tail risk is often the practical concern).
We present two representative families: a differentiable (white-box) ascent oracle and a hybrid search oracle that can wrap learned policies or language-model-driven proposal generators. In all cases, the oracle receives (i, v, θ) and returns a feasible misreport v̂i ∈ Vi.
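The white-box family can be summarized by the following sketch of a multi-start gradient-ascent oracle. The `mechanism.utility(i, true_vals, reports)` interface and the softplus feasibility map are assumptions chosen for illustration (and must be differentiable in the reports); the hybrid search oracle replaces the inner loop with black-box or policy-driven proposal generation.

```python
import torch
import torch.nn.functional as F

def gradient_ascent_oracle(mechanism, i, v, steps=50, restarts=4, lr=0.05):
    """White-box deviation oracle: multi-start gradient ascent on bidder i's utility
    gain as a function of her own report, holding v_{-i} fixed. Mechanism
    parameters are treated as fixed during the attack."""
    with torch.no_grad():
        u_truth = mechanism.utility(i, true_vals=v, reports=v)
    best_report, best_gain = v[i], 0.0
    for _ in range(restarts):
        z = torch.randn_like(v[i], requires_grad=True)   # random restart, unconstrained
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            reports = list(v)
            reports[i] = F.softplus(z)                   # feasible (nonnegative) misreport
            gain = mechanism.utility(i, true_vals=v, reports=reports) - u_truth
            opt.zero_grad()
            (-gain).backward()                           # ascend the utility gain
            opt.step()
        with torch.no_grad():
            reports = list(v)
            reports[i] = F.softplus(z)
            gain = float(mechanism.utility(i, true_vals=v, reports=reports) - u_truth)
        if gain > best_gain:
            best_gain, best_report = gain, F.softplus(z).detach()
    return best_report, best_gain
```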
Even with careful implementation, these procedures produce lower bounds on true best-response regret unless ε can be bounded. The purpose of the appendix is therefore not to claim exact DSIC, but to make the audit regime explicit and reproducible, so that stronger oracles and larger budgets can be used to stress the mechanism and quantify how robustness transfers across attacker families.