Inference-Time Feasibility in Neural Auctions via Bregman Projection onto Fairness/Quota Polytopes

Metadata

Total Words: 10,481
Export Date: 2026-01-16 02:04:19
Description: Deep-learning approaches to mechanism design (e.g., RegretNet/ProportionNet and related penalty-based methods surveyed in the source material) typically enforce incentive, fairness, and business constraints through soft penalties during training. In practice, these penalties are fragile under distribution shift and difficult to audit: a mechanism can appear fair in-sample yet violate regulatory or contractual constraints at deployment. We propose a clean alternative: enforce feasibility and fairness/business constraints exactly at inference time. The mechanism first produces unconstrained allocation scores using any differentiable network, then applies a differentiable projection layer (Bregman/entropic projection or constrained optimal transport) onto a polyhedral constraint set encoding feasibility plus quotas/parity constraints (and total-variation style constraints via standard linearization). This yields allocations that are fair-by-construction for every bid profile, eliminating the need for tuning Lagrange multipliers to hit compliance targets. We provide a projection-dominance analysis: projection produces zero constraint violation and incurs a bounded objective loss proportional to the projection gap and Lipschitz constants. We also show that approximate incentive compatibility measured by regret degrades smoothly when the payment rule is Lipschitz in the projected allocation. Empirically, we compare inference-time feasibility against penalty-only baselines under varying n, m and under distribution shift, finding substantially improved worst-case compliance with small revenue tradeoffs.

1. Introduction and motivation: why penalty-based fairness/business constraints are brittle; operational/audit need for hard constraints; positioning relative to RegretNet/ProportionNet and differentiable matching.
2. Model: unit-demand assignment auction with contextual bids; feasibility constraints; linear fairness and business constraints (quotas, parity bands; brief note on total-variation linearization).
3. Mechanism architecture: (i) score network producing P; (ii) inference-time projection layer Π_{}^{ψ}; (iii) payment rule options (learned payment network or structured payments); definition of regret and IR constraints (trained as soft constraints).
4. Projection operators: KL/entropic projection (Sinkhorn-like), Euclidean projection, and generic separable Bregman projections; differentiability and computational complexity; when closed form exists vs when numerical solvers are needed.
5. Main theoretical results: exact feasibility/fairness; revenue-loss bounds; regret-stability bounds; conditions under which projection commutes with symmetry/equivariance.
6. Learning objective and training protocol: end-to-end differentiable training with projection in the loop; optional two-stage training (train P then calibrate projection temperature); monitoring and certification at deployment.
7. Experiments: synthetic multi-item auctions and a quota-constrained ad-allocation proxy; compare penalty-only vs projection-based across (i) constraint violations, (ii) revenue, (iii) regret, (iv) robustness under distribution shift; ablations for projection temperature and constraint tightness.
8. Discussion and extensions: richer fairness (total variation fairness), procurement/business constraints, variable n/m; limitations (DSIC not guaranteed), and paths to certified approximate IC.
9. Conclusion: inference-time feasibility as a new design primitive for 2026 compliance-first mechanism learning.

Content

1. Introduction and motivation: why penalty-based fairness/business constraints are brittle; operational/audit need for hard constraints; positioning relative to RegretNet/ProportionNet and differentiable matching.

Modern marketplaces increasingly rely on algorithmic allocation rules that must navigate two forces that are often in tension: strategic bidding (which pushes us toward incentive-aware design) and externally imposed constraints (which push us toward policy compliance). In sponsored search, labor platforms, procurement, and public-sector auctions, the mechanism is not evaluated solely by efficiency or revenue; it is also judged by whether it respects operational rules (e.g., capacity, eligibility, and exposure limits) and fairness or parity requirements (e.g., minimum participation of protected groups, banded representation, or contractual quotas). In practice, these constraints are not ``nice-to-have’’ regularizers. They are frequently binding, audited, and enforced at the level of each realized allocation, not merely on average across time.

A common engineering response is to append penalty terms to the seller’s objective or to run a Lagrangian-style training procedure. Concretely, one optimizes a learned allocation rule against expected revenue (or welfare) minus weighted violations of fairness/business constraints. While attractive for its simplicity, this penalty-based paradigm is brittle in several ways. First, it requires tuning multipliers that trade off revenue against constraint satisfaction, often in a regime where the optimization landscape is nonconvex and sensitive to hyperparameters. Small changes in the multiplier can yield qualitatively different solutions: either the penalty is too weak, in which case violations persist, or too strong, in which case the learned rule collapses into overly conservative behavior that sacrifices revenue even when constraints are slack. Second, penalties certify nothing at deployment. Even if training-time averages look acceptable, the learned policy can violate constraints on particular instances, especially under distribution shift in bids or context. Third, penalties are difficult to audit: when a constraint is violated, the mechanism designer has no simple, ex-post explanation other than ``the model traded it off.’’

These weaknesses matter because many fairness and business constraints are operationally . A regulator or internal policy unit may specify that exposure to a protected class must lie within a band for each item, each campaign, or each auction; a procurement policy may impose minimum award shares to a supplier category; a platform may need to guarantee that certain bidders never receive certain items (eligibility) or that some items are never co-assigned. In such settings, a penalty-based approach is risky: even a small rate of violations can trigger legal exposure, reputational costs, or contractual breach. The mechanism designer therefore faces an applied version of a classic problem in economic design: how to incorporate side constraints without forfeiting tractability and without relying on post-hoc ``fixes’’ that undermine incentives.

We view inference-time constraint satisfaction as the appropriate locus of enforcement. The core economic intuition is simple: when constraints represent rather than , they should be treated as feasibility conditions of the allocation problem. In the same way that we would never tolerate an auction that occasionally allocates the same item to two bidders (a feasibility violation), we should not tolerate a mechanism that occasionally violates a mandated quota band. This perspective suggests an architectural separation between (i) a learned component that proposes high-quality allocations based on bids and context, and (ii) a deterministic compliance layer that maps any proposal into the feasible-and-fair set. The advantage of this separation is governance: policy makers can modify constraints without redesigning the entire learning objective, and auditors can verify compliance by inspecting the projection step rather than re-litigating a training loss.

Our point of departure is that a large class of business and fairness constraints can be written as linear inequalities over the allocation matrix (possibly after standard linearizations). When the feasible set is a polytope, the compliance layer can be implemented as a projection operator under an appropriate geometry (e.g., Euclidean or entropic). This design is ``fair-by-construction’’ in the literal sense: regardless of how the learned network behaves, the deployed allocation is always feasible. The economic tradeoff is not whether constraints are satisfied, but how much objective value is lost when we enforce them. This shifts attention toward interpretable quantities, such as how far the raw proposal lies from the constraint set, and how sensitive revenue and incentive properties are to small perturbations in allocations.

This approach also clarifies why certain forms of ``differentiable matching’’ are helpful but incomplete for policy compliance. Techniques based on Sinkhorn balancing and related differentiable optimal transport layers are often used to enforce assignment feasibility (row/column sums) and to produce smooth, approximately integral matchings. However, feasibility alone does not encode fairness or business quotas, and adding them via soft regularizers reintroduces the brittleness discussed above. More generally, differentiable relaxations of combinatorial assignment can ensure that the output resembles a matching, but they do not, by themselves, provide the kind of hard, instance-wise guarantees that are typically demanded by compliance regimes.

Relative to learning-based mechanism design frameworks such as , our emphasis is complementary. -style methods prioritize approximate incentive compatibility by optimizing a revenue objective subject to a regret penalty, using neural networks to represent allocations and payments. Fairness constraints are often appended as additional penalties or moment constraints during training. Our concern is that this couples two difficult tasks into a single optimization: (i) learning an approximately DSIC mechanism and (ii) learning to satisfy operational constraints under all realized inputs. In contrast, we advocate decoupling soft'' economic desiderata (revenue, regret) fromhard’’ policy constraints (feasibility, quotas) by enforcing the latter through a projection layer at deployment. This decoupling is particularly valuable when constraint sets evolve over time: one can update the constraint polytope to reflect new policy without re-tuning the entire regret–revenue–fairness penalty stack.

We also relate to approaches in ranking and allocation that enforce group proportions through normalization layers (sometimes referred to as -type constructions). These methods can be effective when fairness is naturally expressed as a smooth proportionality target and when approximate satisfaction is acceptable. Our setting differs in two economically salient ways. First, auctions are strategic: small changes in allocation and payments can alter bidding incentives, so the compliance layer must interact predictably with incentive considerations. Second, many marketplace constraints are not best expressed as equality-of-proportions objectives; they arise as bands, floors, and ceilings that reflect contracts, risk controls, or legal requirements. A polyhedral representation accommodates these forms naturally and supports exact satisfaction rather than approximate adherence.

We are careful, however, not to oversell projection as a universal remedy. The tractability of hard compliance depends on the constraint set. Convex polytopes admit scalable projections and preserve differentiability properties needed for gradient-based training (via implicit differentiation or unrolled solvers), but nonconvex business rules—such as a minimum number of distinct winners, bundle restrictions, or discrete eligibility that cannot be convexified—may require relaxations, mixed-integer subroutines, or randomized rounding. Moreover, hard enforcement can induce distortion: when constraints bind tightly, the projection can move the allocation materially away from the unconstrained revenue-maximizing choice. Our goal is to make this distortion explicit and quantifiable, thereby turning an opaque penalty tradeoff into a transparent compliance–performance tradeoff that can be discussed with policy stakeholders.

With this motivation in place, we now formalize the unit-demand contextual assignment environment, specify the feasibility and linear fairness/business constraints of interest (including quota bands and standard linearizations such as total-variation-style constraints), and define the projected allocation rule that we deploy.

2. Model: unit-demand assignment auction with contextual bids; feasibility constraints; linear fairness and business constraints (quotas, parity bands; brief note on total-variation linearization).

We study a contextual unit-demand assignment environment with a finite set of bidders (agents) and a finite set of items. Let N = {1, …, n} index bidders and M = {1, …, m} index items. Each auction instance is accompanied by observable context x, which aggregates all nonstrategic inputs available to the mechanism at the time of allocation—for example, item attributes, bidder eligibility flags, protected-class labels, and market-state features. We treat x as exogenous and publicly known at the time of bidding, and we allow the constraint set described below to depend on x through right-hand sides or eligibility restrictions.

Bidder i has private values v_ij ∈ ℝ₊ for receiving item j. The unit-demand assumption means that bidder i derives value from receiving at most one item; accordingly, if the mechanism randomizes over matchings (or outputs a fractional assignment), bidder i’s expected value is linear in the allocation probabilities. Let b_ij ∈ ℝ₊ denote the report (bid) submitted by bidder i for item j, and let b ∈ ℝ₊^n × m denote the bid matrix. We do not restrict the message space beyond nonnegativity, though in applications one may impose bid caps or reserve-price induced truncation.

An outcome specifies an allocation matrix P ∈ ℝ₊^n × m and a payment vector t ∈ ℝ₊ⁿ. We interpret P_ij as either an indicator of whether item j is assigned to bidder i (deterministic allocation) or, more generally, as the probability that bidder i receives item j under a randomized allocation rule. Under this interpretation, bidder i’s quasi-linear utility for allocation P and payments t is

The seller’s (expected) revenue is Rev(P, t) = ∑_i ∈ Nt_i. The unit-demand structure enters through feasibility: bidder i can receive at most one item, and each item can be assigned to at most one bidder.

The basic feasibility constraints are the standard row/column constraints for (sub)stochastic assignment matrices:

When P is integral, describes a (partial) matching. When P is fractional, describes a randomized assignment (or a convex relaxation of matchings) and is operationally meaningful in settings where the platform can randomize, where impressions are divisible, or where fractional allocations represent time-sharing across repeated micro-allocations. In either case, captures the unit-demand and capacity logic that items cannot be over-assigned and bidders cannot receive more than one item.

Eligibility and exclusion restrictions fit naturally into this framework. If bidder i is ineligible for item j given context x, we impose

Such constraints are pervasive in practice (e.g., targeting requirements, contractual exclusions, geographic restrictions) and are economically akin to removing edges from the underlying bipartite graph.

Beyond feasibility, we impose operational constraints that reflect policy mandates or business rules. We focus on constraints that are linear in P (possibly after introducing auxiliary variables), because they define a polytope and therefore admit tractable enforcement and transparent auditing.

A canonical family is . Let G be a finite set of protected groups or business classes, and let g(i) ∈ G map each bidder to its group. For each item j and group g, we may require that the total probability mass allocated to group g lies within a band:

The lower bound ℓ_gj(x) captures minimum representation (a floor), while the upper bound u_gj(x) captures caps (e.g., exposure limits or concentration controls). Context-dependence is natural: for example, the set of eligible bidders may vary with x, in which case feasible bounds should adjust to avoid infeasibility. When ℓ_gj = 0 and u_gj = 1, is vacuous; when bounds are tight, can bind and meaningfully reshape the set of admissible assignments.

More general linear constraints encode business requirements such as (i) minimum or maximum total allocation to a subset of bidders S ⊆ N,

(ii) item-level pacing or throttling across categories (e.g., limiting the share of inventory allocated to a class), and (iii) parity bands across collections of items (e.g., requiring group exposure on a set J^′ ⊆ M to fall in a band). These constraints are linear because they amount to upper and lower bounds on sums of entries of P.

Some fairness policies are most naturally expressed as a bound on an L₁ (total-variation) distance between a realized group-exposure vector and a target. For a fixed item j, define group exposure

A typical requirement is that E_⋅j(P) remain close to a prescribed target distribution ρ_⋅j(x) (or to a baseline distribution determined by eligibility), for instance

Although is not linear as written, it admits a standard linearization. Introduce auxiliary variables z_gj ≥ 0 satisfying

and impose

Constraints – define a polyhedral set in the augmented variable space (P, z) and ensure . Analogous constructions can bound pairwise disparities (e.g., |E_gj(P) − E_g^′j(P)| ≤ δ) or enforce smoothness of exposure across neighboring items or time periods by applying the same absolute-value linearization to differences of linear forms in P.

Collecting feasibility , eligibility , and the chosen family of business/fairness constraints (e.g., , , and/or –), we obtain a constraint set 𝒬 of admissible allocations. In our baseline treatment, 𝒬 ⊆ ℝ₊^n × m is a nonempty compact polytope. Compactness is typically ensured by the row/column upper bounds in ; nonemptiness is a substantive feasibility requirement on the policy parameters (e.g., quota bands must be consistent with eligibility and capacity). This polyhedral representation is the key modeling choice: it captures many practically relevant compliance requirements while remaining amenable to algorithmic enforcement and to comparative statics with respect to constraint tightness.

3. Mechanism architecture: (i) score network producing P; (ii) inference-time projection layer Π_{}^{ψ}; (iii) payment rule options (learned payment network or structured payments); definition of regret and IR constraints (trained as soft constraints).

Our design separates the mechanism into three components—a , an enforcing 𝒬, and a . The guiding idea is to hard-code feasibility and policy compliance, while learning the parts of the mapping from reports and context to outcomes that are most environment-specific. This division reflects a practical constraint: many fairness and business rules are nonnegotiable at deployment, whereas incentive properties and revenue are typically optimized subject to those rules and are therefore better treated as objectives (or soft constraints) during training.

Given bids and context, the first stage produces an unconstrained nonnegative score matrix

where θ are learned parameters. We interpret P̃_θ as rather than as a final allocation: it may violate feasibility (row/column sums), eligibility zero constraints, or fairness/business constraints. Economically, this stage plays the role of a reduced-form ``planner’’ that maps messages and context into a tentative assignment. Operationally, it can ingest rich features (including bidder and item embeddings, eligibility masks, and market-state covariates) and capture systematic patterns in bids across contexts that would be difficult to encode analytically.

Because P̃_θ is not required to lie in 𝒬, we can choose a convenient parameterization (e.g., elementwise softplus to ensure nonnegativity) and focus learning capacity on ranking and responsiveness, leaving hard compliance to the next stage. This decoupling is particularly valuable when 𝒬 changes across markets or policy regimes: the score network can remain unchanged while the constraint description is updated.

The deployed allocation is obtained by projecting scores onto the feasible-and-fair polytope 𝒬:

where D_ψ is a strictly convex separable Bregman divergence. At an intuitive level, selects the policy-compliant allocation to the network output under a geometry induced by ψ. This projection is carried out at inference time for each realized (b, x), so compliance is enforced instance-by-instance rather than only in expectation.

We emphasize the economic role of the projection: it converts the mechanism from learned and possibly noncompliant'' tocompliant by construction.’’ This also clarifies the tradeoff we study. If P̃_θ(b, x) already approximately satisfies the constraints, then P̂(b, x) will be close, and the projection induces little distortion. If constraints are tight, the projection may significantly alter the outcome, and any revenue or welfare objective must internalize that distortion.

Given (b, x) and the final allocation P̂(b, x), we compute payments using a rule

where ϕ denotes additional parameters if payments are learned. We consider two broad design choices.

encode economic priors (and often improve stability). For example, one can use a differentiable analogue of threshold/critical payments that charges bidder i an amount linked to the smallest report that would have preserved its allocated probability mass, holding b_−i fixed. In assignment settings with fractional allocations, one can also adapt VCG-style externality payments to the randomized allocation, possibly with reserves or pacing multipliers. These rules reduce the space of admissible behaviors and can make incentive properties less fragile, but they may be conservative under complex constraints.

treat t_ϕ as a flexible function approximator that is trained jointly with P̃_θ (while still depending on P̂). This is useful when the economically optimal payment mapping is complicated by context and constraints. To retain basic plausibility, we typically impose architectural restrictions such as nonnegativity, dependence on own report in a monotone manner, or explicit regularization toward simple baselines. In either case, we compute payments projection, so the payment rule internalizes the fact that the implemented allocation is P̂ ∈ 𝒬 rather than the raw scores.

A technical property that will matter for our incentive analysis is to allocation perturbations. We will appeal to Lipschitz-type bounds of the form

which captures the idea that small changes in allocation probabilities should not create arbitrarily large payment jumps. This condition is not automatic, but it can be encouraged via smooth parameterizations and gradient penalties.

While feasibility and fairness are enforced exactly through P̂(b, x) ∈ 𝒬, dominant-strategy incentive compatibility and individual rationality are generally not achievable simultaneously with rich constraints and high-dimensional contexts without sacrificing performance. We therefore adopt an approach that is standard in learning-based mechanism design: we train the mechanism to minimize and IR violations as soft constraints.

Fix a valuation profile v (used only in training or evaluation), realized bids b, and context x. Bidder i’s ex-post utility under the deployed allocation is

We measure deviations from dominant-strategy truthfulness via ex-post regret,

where ℬ_i is a chosen deviation set (e.g., bounded bids, multiplicative shading, or a grid). In practice, we approximate the maximization in by (i) sampling candidate deviations, (ii) performing a few gradient-ascent steps on b_i^′ when the mechanism is differentiable, or (iii) restricting to economically interpretable deviations (uniform bid scaling) when robustness is the goal.

Similarly, we quantify ex-post IR violations by

and require these to be small on the training distribution. Treating IR softly is appropriate when allocations are fractional or when the platform can tolerate rare small violations that can later be corrected by rebates or participation subsidies; nonetheless, we will report worst-case and tail statistics since average guarantees can mask problematic instances.

Putting the pieces together, we train parameters (θ, ϕ) to optimize expected revenue subject to small regret and IR violations. A representative sample-based objective is

for penalty weights λ, μ ≥ 0, with expectations taken over the training distribution of (v, b, x) (or over v with bids set to b = v when training under truthful-play simulations). We view as implementing a clear division of labor: (a) P̂ ensures satisfaction of 𝒬 for every instance, and (b) the penalties encourage incentive compatibility and participation on the distribution of interest.

This architecture also makes explicit a limitation. Because the projection can move allocations, it can perturb utilities and thereby affect regret; accordingly, incentive properties must be evaluated with respect to the mapping b ↦ (P̂, t_ϕ(⋅, P̂)). At the same time, the projection provides a transparent lever for policy: changing 𝒬 changes outcomes predictably and audibly, without requiring retraining to restore compliance. The next section formalizes how different projection operators behave computationally and how their geometry influences both tractability and learning dynamics.

4. Projection operators: KL/entropic projection (Sinkhorn-like), Euclidean projection, and generic separable Bregman projections; differentiability and computational complexity; when closed form exists vs when numerical solvers are needed.

The projection layer in is the main engineering and modeling choice in our architecture: by selecting the generator ψ and representing 𝒬, we jointly determine (i) how the learned scores are ``corrected’’ to satisfy constraints, (ii) whether the correction is numerically stable at scale, and (iii) whether gradients can be propagated through the deployed mapping during training. In this section we summarize three families that cover most practical deployments—(a) entropic/KL projections computed by Sinkhorn-like scaling, (b) Euclidean projections computed by quadratic programming or simplex-style routines, and (c) generic separable Bregman projections solved by iterative dual methods. We also clarify when closed forms exist and when numerical solvers are unavoidable.

When 𝒬 is a nonempty compact polytope and ψ is strictly convex and separable, the projection problem is a finite-dimensional convex optimization problem with linear constraints:

Because 𝒬 is polyhedral, the solution is characterized by KKT conditions with Lagrange multipliers on row sums, column sums, and fairness/business constraints. This is economically useful: the multipliers admit a shadow-price interpretation of which constraints are binding and how costly they are in the local geometry induced by ψ. Computationally, the same structure allows us to choose between (i) specialized matrix-scaling algorithms when constraints are close to ``transportation-like,’’ and (ii) general-purpose convex solvers (QP/LP-conic) when constraints are more heterogeneous.

A particularly convenient choice is the (scaled) negative entropy generator

which yields the generalized KL divergence D_{ψ_τ}(P∥P̃) = τ∑_i, jP_ijlog (P_ij/P̃_ij) − τ∑_i, j(P_ij − P̃_ij). The parameter τ plays the role of a : larger τ penalizes deviations in a way that encourages smoother, more entropic allocations (and typically improves numerical conditioning), whereas smaller τ pushes toward sharper, near-integer assignments when the feasible region allows it.

When 𝒬 contains only nonnegativity and row/column sum constraints (a transportation polytope), KL projection has a multiplicative structure. Writing dual variables α_i for row constraints and β_j for column constraints, the KKT conditions imply

so the solution can be obtained by iterative row and column rescaling (Sinkhorn). Each iteration is 𝒪(nm) and uses only matrix-vector operations, making it attractive in large markets and on accelerators. With additional fairness quotas (e.g., group-item totals), the same multiplicative form persists but with extra dual terms; one can implement this through generalized iterative scaling, Dykstra-like alternating Bregman projections, or dual coordinate ascent. In all cases, KL geometry tends to be robust when P̃ is sparse or when eligibility masks enforce structural zeros, because multiplicative updates naturally respect zeros (when implemented with care).

Two caveats matter for deployment. First, KL projection is typically computed approximately by stopping an iterative method at a tolerance; if constraints must hold exactly, we must set tolerances sufficiently tight and monitor primal feasibility (and not only objective decrease). Second, KL requires positive arguments for the logarithm; in practice we either (i) ensure P̃_ij ≥ δ by construction, or (ii) treat ineligible entries as fixed zeros handled by constraints and compute KL only over eligible indices.

A second canonical choice is the Euclidean generator $\psi(P)=\frac{1}{2}\|P\|_F^2$, which turns into the familiar Euclidean projection

a strictly convex quadratic program with linear constraints. Euclidean projection is attractive when we want corrections that behave additively (rather than multiplicatively), and when the constraint polytope 𝒬 is expressed in a general linear form that is not well-suited to matrix scaling.

Closed forms exist in a few important special cases. For example, projecting a vector onto the simplex or onto box constraints admits sorting-based formulas; in our matrix setting, if 𝒬 decomposes into independent row-wise simplices (or column-wise simplices), we can apply these row by row (or column by column). However, the moment we couple rows and columns simultaneously (as in assignment feasibility) or add cross-cutting fairness constraints, closed forms generally disappear and we rely on numerical QP solvers. Modern differentiable QP layers solve by interior-point methods or operator splitting; their per-instance cost can be higher than Sinkhorn, but they can handle arbitrary linear constraints directly and can be quite stable when nm is moderate.

A practical issue is that Euclidean projection can produce exact zeros and kinks when the active set changes. While the projection onto a closed convex set is always 1-Lipschitz, it is only differentiable almost everywhere. For training, this is typically sufficient (stochastic gradients rarely land exactly on nondifferentiable boundaries), but it motivates either (i) adding mild smoothing (e.g., entropic regularization) or (ii) using implicit differentiation that handles active-set changes through the KKT system with appropriate safeguards.

Between these extremes lies the broader class of separable strictly convex generators

which includes KL as a special case but also covers, for instance, squared-norm variants with weights, Itakura–Saito divergences, or hybrid geometries designed to penalize changes in certain entries more than others (e.g., to protect reserved capacity). For generic ψ, the KKT conditions imply that ∇ψ(P̂) equals ∇ψ(P̃) shifted by a linear combination of constraint normals. This suggests solving in the space via iterative updates of multipliers (mirror descent / dual coordinate ascent), or applying Bregman versions of Dykstra’s algorithm when 𝒬 is an intersection of simpler sets.

From a computational standpoint, generic separable Bregman projections tend to share two properties. First, each iteration often remains 𝒪(nm) when constraint operators are sparse and structured (row/column sums, group totals). Second, convergence rates depend on the conditioning induced by ψ and on the geometry of 𝒬; in practice we typically tune stopping rules to reach feasibility tolerances that are tight enough for policy compliance while keeping latency acceptable.

For training, we need gradients of loss functions that depend on P̂(b, x) with respect to network parameters θ (and possibly payment parameters). There are two standard routes. The first is : we run a fixed number of projection iterations (Sinkhorn scaling steps, Dykstra updates, or QP iterations) and backpropagate through the computation graph. This is simple and often effective, but memory usage scales with the number of unrolled steps and gradients can become biased if we stop far from the true projector.

The second is through the KKT conditions of . Under constraint qualifications and strict convexity, the solution map is locally differentiable, and the Jacobian of P̂ with respect to P̃ can be obtained by solving a linear system involving the Hessian of the objective and the constraint matrix. This approach can be substantially cheaper at high accuracy and avoids backpropagating through long iteration traces, but it requires careful implementation to handle degeneracy (nearly active constraints) and numerical regularization.

Finally, we stress a boundary that is easy to miss when adding ``business logic’’ to 𝒬. As long as 𝒬 is convex and described by linear inequalities (including auxiliary-variable linearizations for convex fairness metrics), projection remains a tractable convex problem and admits the algorithmic choices above. In contrast, constraints such as minimum number of winners, exact cardinality rules, or other integer restrictions make 𝒬 nonconvex; projection then becomes combinatorial and can be NP-hard in the worst case. In such settings we typically (i) relax the constraint to a convex surrogate and project exactly onto the relaxation, (ii) compute a fractional solution and apply randomized or deterministic rounding with ex-post repair, or (iii) embed a mixed-integer optimization layer when instance sizes are small enough.

This computational discussion sets up the main results that follow: the more tightly and frequently projection must correct P̃_θ, the more important it is to use a projector that is both stable and differentiable, since that correction is precisely what governs (i) revenue loss via the size of the adjustment and (ii) incentive degradation via payment sensitivity to changes in P.

5. Main theoretical results: exact feasibility/fairness; revenue-loss bounds; regret-stability bounds; conditions under which projection commutes with symmetry/equivariance.

Our main theoretical message is that the projection layer cleanly separates from . Feasibility and fairness/business constraints are enforced at inference, while revenue and incentive properties are controlled by how far the projection must move the raw score output. This viewpoint is practically relevant: in policy-constrained markets we typically cannot tolerate even rare constraint violations, but we can tolerate small efficiency losses provided they are predictable and monitorable.

Fix any bid/context pair (b, x), and consider the deployed allocation

where 𝒬 is a nonempty compact polytope encoding feasibility and linear fairness/business constraints, and ψ is strictly convex and separable. The first result is deliberately ``boring’’ but economically central: Because D_ψ(⋅∥P̃) is strictly convex in its first argument and 𝒬 is closed and convex, the projection exists and is unique; hence the mapping (b, x) ↦ P̂(b, x) is single-valued. By definition of the argmin, we also have P̂(b, x) ∈ 𝒬 for every (b, x), so all feasibility constraints (row/column sums, nonnegativity, eligibility masks encoded as linear constraints) and all linear fairness constraints (e.g., group-item quota bands) hold exactly at deployment.

This is the key compliance guarantee: regardless of what the score network outputs—including out-of-distribution situations—the projection acts as a . In audit terms, constraint satisfaction is certified by the solver residuals rather than by generalization claims about a learned predictor.

Compliance may force us to deviate from the raw allocation scores, and the natural question is how costly that deviation can be. We therefore study the gap between the objective value under P̃_θ(b, x) and under its projection P̂(b, x). Let Rev(P; v) denote the ex-post revenue (or more generally the seller objective induced by a fixed payment rule) as a function of the allocation matrix P and the realized valuation profile v. We assume a Lipschitz property in the allocation argument: there exists L < ∞ such that for all P, Q and all v,

Then, pointwise for each (b, x, v),

The economic content is straightforward: When the learned scores already lie close to 𝒬 (because the network internalizes constraints approximately, or because the realized instance is slack), the correction term ∥P̂ − P̃∥₁ is small and so is the implied revenue loss bound. Taking expectations over the data-generating process yields the analogous expected bound

which is the quantity we can estimate from held-out samples at training time.

While is an assumption, it is often mild in bounded environments. For instance, if payments are bounded and depend smoothly on allocations (as in many differentiable payment parameterizations), then L can be interpreted as a worst-case payment sensitivity to reallocating marginal probability mass. The bound does claim that projection is free; rather, it identifies a measurable statistic—projection displacement—that mediates the compliance–revenue tradeoff.

We next address incentives. Because projection modifies allocations, it can in principle change strategic behavior even when payments are fixed. Our goal is therefore not to assert DSIC, but to show a property: if we trained a mechanism with small regret before projection, and if payments are not too sensitive to allocation perturbations, then regret remains controlled after projection.

Let t_ϕ(b, x, P) be the payment rule, and suppose it is Lipschitz in P:

Fix bidder i and valuation profile v. Write u_i(P, t; v) = ∑_jP_ijv_ij − t_i, and define the ex-post regret of the projected mechanism as

with an analogous definition rgt_i^raw for P̃_θ. Under , the increase in regret induced by projection can be bounded by a term proportional to ∥P̂ − P̃∥₁ (up to constants). A convenient sufficient bound (when values are also bounded, say v_ij ∈ [0, v̄]) is

where the factor 2 comes from comparing truthful and deviating utilities under the two allocations and applying the triangle inequality. The interpretation matches practice: projection is a compliance step that may introduce additional ``noise’’ from the bidders’ perspective, but if the compliance correction is small (or payments are insensitive), then any incentive degradation is correspondingly small. Conversely, if projection routinely makes large moves, we should not expect approximate-DSIC training to survive unchanged; the bound helps diagnose when this is likely to occur.

Finally, we formalize a structural property that matters both for statistical efficiency and for perceived fairness: . Many assignment environments are exchangeable within classes of bidders or items, and we often impose architectural equivariances on P̃_θ (e.g., permutation equivariance over bidders). A natural concern is whether projection breaks these symmetries.

Let σ be a permutation of bidders and π a permutation of items, and let (σ, π) ⋅ P denote the corresponding row/column permuted matrix. If (i) 𝒬 is invariant under the same permutations (i.e., P ∈ 𝒬 ⇒ (σ, π) ⋅ P ∈ 𝒬), and (ii) the generator is entrywise identical (so D_ψ is permutation-invariant), then uniqueness of the projection implies the commutation property

Thus, if the score network is equivariant and the constraints respect the same symmetry, the deployed mechanism remains equivariant. This is particularly relevant for symmetry: group-based quota constraints typically break full bidder-permutation invariance, but they preserve invariance to relabeling each protected group. Projection respects exactly the symmetries that are respected by 𝒬; it does not introduce arbitrary asymmetries of its own.

Taken together, these results motivate the training protocol that follows. Projection gives us exact compliance. The remaining question is how to choose and train P̃_θ (and t_ϕ) so that (i) ∥P̂ − P̃∥₁ is small on the relevant distribution, and (ii) payment sensitivity is controlled, ensuring that revenue and regret degrade gracefully under the compliance correction. In the next section we translate these theoretical dependencies into an end-to-end objective, a practical two-stage option, and monitoring/certification procedures that report constraint satisfaction, projection displacement, and regret proxies on held-out data.

6. Learning objective and training protocol: end-to-end differentiable training with projection in the loop; optional two-stage training (train P then calibrate projection temperature); monitoring and certification at deployment.

We now translate the comparative statics from Section~ into a training and deployment protocol. The guiding idea is simple: since feasibility and fairness/business constraints are guaranteed by the projection , learning should focus on (i) achieving high revenue (or low cost) the projected allocation, (ii) controlling incentives via regret penalties, and (iii) reducing the expected so that compliance is not purchased with large and unpredictable distortion.

Because the deployed allocation is P̂(b, x) = Π_𝒬^ψ(P̃_θ(b, x)), we train , rather than on the raw scores. Concretely, given samples (x, v) from an offline distribution (and possibly simulated bids b from a behavioral model or past logs), we consider an empirical objective of the form

where $\widehat{\mathrm{rgt}}_i$ is a differentiable (or subdifferentiable) proxy for ex-post regret and Disp(P̃, P̂) is a statistic that encourages the score network to output near-feasible allocations. Two practical choices for Disp are the Bregman divergence induced by the same generator, Disp(P̃, P̂) = D_ψ(P̂∥P̃), or a norm-based proxy such as ∥P̂ − P̃∥₁ (often smoothed). The economic role of the displacement term is to operationalize the bounds in and : we directly penalize the quantity that mediates the compliance–revenue–incentives tradeoff.

The regret proxy is computed by approximating the maximization over deviations. Given a finite candidate set 𝒮_i(b_i, x) ⊆ 𝒞_i (e.g., random perturbations, grid points, or outputs of a learned best-response network), we use

While is only an approximation to the true regret, it aligns training pressure with the relevant strategic failure modes (e.g., bid shading, misreporting across items). In our experience, it is important to diversify 𝒮_i beyond small local moves: projection can create nonlocal effects (especially under tight quotas), so purely gradient-based local deviations may understate exploitable manipulations.

The objective requires gradients through the mapping P̃_θ ↦ P̂ = Π_𝒬^ψ(P̃_θ). For the convex polytopes we emphasize, this is standard. When ψ is entropic and 𝒬 is a transportation polytope with additional linear quotas, we can compute P̂ by generalized Sinkhorn/Bregman iterative scaling and backpropagate through a fixed number of iterations. More generally, since P̂ solves a strictly convex program with linear constraints, we may differentiate via the KKT system (implicit differentiation). The practical implication is that we can train (θ, ϕ) with standard stochastic gradient methods, treating the projection as a differentiable layer whose Jacobian is obtained either by unrolling or by solving a linear system associated with the KKT conditions. We emphasize a limitation: if 𝒬 includes genuinely nonconvex business rules (e.g., minimum number of winners with integrality), then exact projection is intractable and end-to-end differentiation must proceed through a convex relaxation or a surrogate layer; in such cases, the compliance guarantee applies to the relaxed constraint set.

A convenient protocol is to alternate between (i) updating θ to improve the quality of the score output under the allocation and (ii) updating ϕ to align payments with revenue and incentive goals under the same projected allocation. In each minibatch we (a) compute P̃_θ(b, x), (b) project to obtain P̂(b, x), (c) evaluate revenue and regret proxies using t_ϕ(b, x, P̂), and (d) backpropagate through the entire pipeline. Two implementation details matter for stability. First, we typically clip or smooth bids/values and ensure P̃_θ(b, x) remains strictly positive when using KL-type divergences, avoiding numerical issues at the boundary. Second, we anneal λ_rgt and λ_disp: early in training we emphasize displacement reduction (so the network learns the geometry of 𝒬), and later we shift weight toward regret and revenue, when the projection corrections are already moderate.

Although end-to-end training is conceptually clean, a two-stage alternative is often attractive in production, where teams may want modularity between prediction'' andcompliance.’’ In Stage~1, we train P̃_θ (and optionally t_ϕ) to optimize revenue and regret with a constraint signal—either by adding penalty terms for violations of 𝒬 or by using a fast approximate projection during training. In Stage~2, we freeze (θ, ϕ) and calibrate the projection geometry/temperature to achieve a targeted distortion profile. For entropic projections, one can introduce a temperature parameter τ > 0 into the generator (equivalently scaling the divergence), producing smoother allocations for larger τ and more ``peaked’’ assignments for smaller τ. We select τ by a held-out sweep that reports three quantities jointly: realized revenue, average and tail projection displacement (e.g., 𝔼[∥P̂ − P̃∥₁] and its 95th percentile), and regret proxies. This calibration is economically meaningful: τ controls how aggressively we force near-integer allocations versus how much we spread probability mass to satisfy quotas with minimal displacement. The two-stage approach also makes it easier to update compliance behavior (via τ or via updated quotas) without retraining the score network, at the cost of potentially higher distortion if P̃_θ has not internalized the constraint geometry.

Because projection enforces constraints by construction, deployment monitoring should focus on rather than violation detection. We recommend logging (i) solver residuals and feasibility certificates (which should be near machine precision), (ii) the realized displacement ∥P̂ − P̃∥₁ (or D_ψ(P̂∥P̃)), and (iii) dual variables (shadow prices) associated with the fairness/business constraints. Dual variables are particularly informative for policy: persistent high multipliers indicate that quotas are binding and costly, which can motivate revisiting the constraint design or increasing supply in constrained categories.

For incentive monitoring, we cannot observe true values v in most applications, so we use two complementary approaches. First, we compute offline regret proxies on synthetic or estimated valuation models, including stress tests that concentrate on instances with high displacement (where Proposition~3 predicts larger degradation). Second, we track behavioral indicators in logs (e.g., bid shading patterns, abnormal item-switching) as a coarse early-warning system. Finally, to address distribution shift in x, we maintain dashboards of displacement quantiles and dual-variable distributions across time and across segments. A salient operational rule is: if displacement shifts upward materially, we should expect revenue and regret performance to drift even though feasibility and fairness remain exact; the projection layer turns compliance risk into a measurable distortion risk that can be monitored and governed.

7. Experiments: synthetic multi-item auctions and a quota-constrained ad-allocation proxy; compare penalty-only vs projection-based across (i) constraint violations, (ii) revenue, (iii) regret, (iv) robustness under distribution shift; ablations for projection temperature and constraint tightness.

We evaluate the projection-based design in two complementary settings. The first is a controlled synthetic environment where we can measure ex-post outcomes against known values and systematically vary the geometry of 𝒬. The second is a quota-constrained ad-allocation proxy that mirrors common production constraints (eligibility, pacing-like caps, and group-based share rules) and stresses the mechanism under distribution shift in context x. Across both settings we compare (i) constraint violations, (ii) revenue, (iii) incentive proxies (regret), and (iv) robustness, and we report ablations for projection temperature and constraint tightness.

We generate i.i.d. rounds with n bidders and m items. Each bidder belongs to a protected group g(i) ∈ G and has unit-demand values v_ij drawn from a parametric family that depends on observed features (e.g., item quality, bidder–item match, and group effects). Bids are set either equal to values (truthful training/evaluation) or drawn from a simple shading model to reflect behavioral heterogeneity; the goal is not to claim a full equilibrium model, but to expose the learned mechanism to plausible deviations. The constraint set 𝒬 includes standard feasibility (row/column sums) and linear group quotas of the form ℓ_gj ≤ ∑_{i : g(i) = g}P_ij ≤ u_gj, with ℓ_gj, u_gj chosen so that 𝒬 is nonempty. By sweeping the width u_gj − ℓ_gj we can directly study the comparative statics predicted in Section~: tighter bands increase the projection displacement and (weakly) reduce attainable revenue.

We also consider a stylized ad allocation problem in which items j correspond to impression opportunities with context x_j (user features, page metadata), and bidders i correspond to campaigns with eligibility constraints (some (i, j) pairs are forbidden). Values v_ij depend on relevance and predicted conversion, with group structure induced either by campaign attributes (e.g., advertiser category) or by regulated segments of traffic. The constraint set 𝒬 combines feasibility with (a) per-group share constraints over subsets of impressions (e.g., a minimum and maximum share within a traffic slice), and (b) business constraints such as per-campaign exposure caps (implemented as additional linear inequalities on row sums over time windows). This proxy is useful because it creates sparse, heterogeneous constraint activity: some quotas bind only in certain contexts, making it easy to see whether a method is robust when the binding set changes over time.

We compare three approaches that isolate the role of projection. (1) : we train an unconstrained network output P̃_θ and add hinge penalties for each linear constraint violation (plus feasibility penalties), but deploy P̃_θ directly. (2) : we train as in (1) but deploy P̂ = Π_𝒬^ψ(P̃_θ) at inference without training the model to anticipate projection. (3) (ours): we train and evaluate on P̂ with an explicit displacement term, as in . This separation is important in practice: many teams can add penalties easily, and many teams can enforce constraints at inference, but the economic question is whether integrating projection into training reduces distortion costs and stabilizes incentives.

Constraint violations are summarized by the maximum and average absolute slack across all linear constraints, and by tail risk (e.g., 95th percentile violation), since policy failures are typically driven by rare but large breaches. Revenue is measured as realized seller revenue under the learned payment rule, evaluated on held-out rounds. For incentives, we report an ex-post regret proxy using diversified deviation sets 𝒮_i (including nonlocal deviations across items), and we stratify regret by instances with high displacement ∥P̂ − P̃∥₁ to test the qualitative content of Proposition~3. Finally, to diagnose robustness, we track the distribution of projection displacement and the distribution of dual multipliers on fairness constraints; persistent or shifting multipliers provide a policy-relevant indicator of which constraints are economically binding.

As expected, projection-based deployment yields zero violations up to numerical tolerances in all regimes where 𝒬 is feasible. Penalty-only methods can be tuned to achieve small violations, but we consistently observe heavier tails: rare contexts create large breaches when constraints become locally tight (e.g., when group composition shifts or eligibility sparsifies the feasible set). This gap widens under distribution shift, where penalty-only models often extrapolate poorly and violate quotas by economically meaningful margins. Post-hoc projection fixes violations at inference, but at the cost of potentially large and unanticipated corrections, which show up in the revenue and regret comparisons below.

In-distribution, projection-in-the-loop attains revenue close to penalty-only when constraints are loose, and dominates as constraints tighten. The mechanism-level intuition is that penalties must trade off objective improvement against satisfying constraints in expectation, whereas projection enforces constraints pointwise; once penalties become large enough to reduce tail violations, they also reshape the learned score landscape in ways that can reduce revenue even in regions where constraints are slack. By contrast, training with the projection layer internalizes the geometry of 𝒬, reducing the expected displacement D_ψ(P̂∥P̃) and thereby reducing the ``distortion tax’’ paid at inference. The post-hoc projection baseline is particularly informative: even when its revenue is similar on average, it exhibits larger variance and worse downside performance because P̃_θ is not optimized for the projected outcome, so P̂ can be far from the raw argmax in precisely those rounds where fairness constraints bind.

Across both environments, regret proxies are strongly increasing in displacement, consistent with the stability logic in Proposition~3. Penalty-only models can have low regret in regimes where violations are tolerated, but once penalty weights are raised to control violations, regret tends to rise: the learned allocation becomes sensitive to small bid changes near constraint boundaries, creating exploitable discontinuities in the effective assignment after soft constraints ``almost bind.’’ Projection-in-the-loop mitigates this by (i) guaranteeing feasibility/fairness exactly, which removes strategic opportunities that stem from constraint-breaking allocations, and (ii) reducing the typical size of the projection correction. In particular, when we stratify evaluation rounds by displacement quantiles, the top tail accounts for a disproportionate share of regret for post-hoc projection, while projection-in-the-loop shifts mass away from that tail, yielding more uniform incentive performance.

We consider shifts that are operationally natural: changes in group proportions in x (e.g., an influx of a protected segment), shifts in value distributions (e.g., higher variance or different bidder heterogeneity), and shifts in eligibility sparsity (e.g., more forbiddances due to policy updates). Penalty-only models degrade sharply in constraint compliance and often in revenue, reflecting that penalty tuning is implicitly calibrated to the training distribution. Projection-based deployment preserves compliance by construction, so the relevant robustness question becomes how much revenue and regret drift as displacement grows. Empirically, displacement is a sensitive early-warning metric: when the environment shifts toward tighter effective feasibility, we see ∥P̂ − P̃∥₁ increase and revenue decline smoothly, rather than sudden failures. This supports the governance perspective that projection transforms a compliance problem into a measurable distortion problem.

We vary the entropic temperature τ in the KL geometry used for Π_𝒬^ψ. Larger τ yields smoother allocations with smaller sensitivity but can reduce revenue when near-integer assignments are valuable; smaller τ produces sharper allocations but can increase numerical stiffness and amplify displacement when quotas are tight. In both environments, we observe an interior ``sweet spot’’: moderate τ improves optimization stability and reduces regret without materially harming revenue, while extreme values expose the expected tradeoff between sharpness and robustness. Separately, tightening quota bands predictably increases displacement and lowers revenue; importantly, projection-in-the-loop exhibits a more graceful slope because the model learns to place P̃_θ(b, x) closer to the tightening set 𝒬, whereas post-hoc projection pays the full cost of tightening at inference.

The experiments support the core economic claim of our design: enforcing feasibility and fairness by construction is not merely a compliance convenience, but a way to make the efficiency–incentives–policy tradeoff transparent and controllable. When we must satisfy constraints exactly, the central object becomes the projection displacement, which predicts revenue loss and regret degradation and can be monitored under shift. At the same time, we do not interpret these results as a guarantee of dominant-strategy truthfulness: rather, they motivate the extensions in the next section, where we discuss richer fairness constraints and paths toward certified approximate incentive compatibility.

8. Discussion and extensions: richer fairness (total variation fairness), procurement/business constraints, variable n/m; limitations (DSIC not guaranteed), and paths to certified approximate IC.

Our experiments treat 𝒬 as a generic linear-constraint polytope and emphasize the it enables: we can enforce feasibility and policy constraints exactly by projection, while using learning to optimize revenue and (approximate) incentives within that compliant region. Here we discuss how far this separation extends, what breaks when constraints depart from convexity, and what we view as practical paths toward approximate incentive compatibility.

Quota bands ℓ_gj ≤ ∑_{i : g(i) = g}P_ij ≤ u_gj are interpretable but narrow: they control marginal exposure by item (or impression class) and group, yet they do not prevent large across items that preserves quotas. A natural strengthening is to constrain the between group-conditioned allocation profiles. For example, let
a_gj = ∑_{i : g(i) = g}P_ij, a_g ∈ ℝ₊^m.
Total-variation (TV) fairness can be expressed as ∥a_g − a_g^′∥₁ ≤ Δ_gg^′ for selected pairs (g, g^′), optionally after normalizing by group mass in the eligible pool (to avoid penalizing groups with systematically fewer feasible edges). Because the ℓ₁ norm is linearizable, TV constraints remain compatible with our inference-time projection framework: introduce auxiliary variables z_gg^′j ≥ 0 and impose
z_gg^′j ≥ a_gj − a_g^′j, z_gg^′j ≥ −(a_gj − a_g^′j), ∑_j ∈ Mz_gg^′j ≤ Δ_gg^′.
This turns a distributional parity goal into a set of linear constraints on P (via a_gj), keeping 𝒬 convex. Substantively, TV constraints shift the policy question from how much of item $j$ goes to group $g$'' tohow differently do two groups experience the platform overall,’’ which is often closer to regulatory intent in ranking and exposure settings. The limitation is equally clear: TV fairness is only as meaningful as the definition of the item partition and eligibility mask; if item categories are coarse or eligibility is itself correlated with group status, then the constraint can either be too weak (hiding disparities) or infeasible (demanding parity where the marketplace cannot supply it).

Many production constraints are naturally defined over of items or time windows (e.g., shares within a traffic slice, or per-group exposure over a day). These can be incorporated by indexing constraints by a slice s determined by context x and by aggregating allocation mass over the items in that slice, e.g.,
ℓ_gs ≤ ∑_{j ∈ M_s(x)}∑_{i : g(i) = g}P_ij ≤ u_gs.
When slices overlap, feasibility can become delicate: nonemptiness of 𝒬 is no longer automatic, and diagnosing infeasibility becomes a governance problem in its own right. A practical extension is to add with explicit penalties in the objective while maintaining hard feasibility (row/column sums), thereby producing a ``least-violating’’ allocation when policy targets are mutually inconsistent. Importantly, this changes the compliance story: the mechanism remains feasibility-safe, but fairness becomes an optimization target rather than a pointwise guarantee, and the resulting slack can be audited.

Our notation is symmetric enough to cover procurement settings (e.g., allocating tasks to suppliers) where the seller minimizes expected payments or costs. Conceptually, the projection layer is unchanged: we still compute P̂ = Π_𝒬^ψ(P̃_θ) and then apply a payment rule t(b, x, P̂). What changes is the economic meaning of constraints: fairness may require minimum awards to small suppliers, geographic parity, or risk diversification, while feasibility may include capacity limits and quality thresholds. In these settings, nonnegativity and unit-demand constraints can be supplemented with linear constraints of the form ∑_jP_ij ≤ κ_i (supplier capacity) or slice-based minimums $\sum_{j\in \mathcal{J}} P_{ij}\ge \underline{\kappa}_{i\mathcal{J}}$. The same comparative statics apply: tighter constraints increase ∥P̂ − P̃∥₁ and thus, via Propositions~2–3, predictably increase distortion in the objective and degrade incentive robustness when payments are allocation-sensitive.

Some widely requested rules are inherently combinatorial: at least $K$ distinct winners,''no bidder wins more than one item in a category,’’ or ``exactly one winner per item’’ (integrality). Once 𝒬 includes such nonconvexities, exact projection can become intractable (as emphasized by Proposition~4). We view two relax-and-repair patterns as promising. First, convex relaxations (fractional P with entropic geometry) produce a allocation that respects linear policy constraints; if downstream systems require integral assignments, randomized rounding can be applied with additional safeguards to preserve quotas in expectation. Second, mixed-integer projection layers can be used for small subproblems (e.g., a top-k shortlist), but we caution that end-to-end learning with integer solvers tends to be brittle and may undermine the operational simplicity that motivates projection in the first place.

Real marketplaces rarely have fixed numbers of bidders and items. At the modeling level, P̃_θ(b, x) can be produced by permutation-equivariant set architectures (e.g., attention over bidder and item embeddings) that output an n × m score matrix for the current instance. At the constraint level, 𝒬 must be constructed from the eligibility graph and policy metadata in x (e.g., group labels and slice membership). The projection then operates on a varying-dimension polytope, but the core guarantee remains: for each realized instance, P̂(b, x) ∈ 𝒬(x). Practically, batching requires padding and masking; conceptually, it requires care in defining group constraints when groups are absent in an instance (to avoid vacuous or infeasible constraints). This is a governance-relevant detail: fairness targets stated at the population level may be ill-posed at the micro-auction level unless they are conditioned on sufficient participation.

A central limitation is that compliance-by-construction does not imply dominant-strategy incentive compatibility. Projection changes allocations in a context-dependent, potentially non-monotone way, and even if P̃_θ were monotone in bids, P̂ need not be. Our theory therefore treats incentives via (approximate) regret, with stability controlled by displacement and payment sensitivity. This is an honest trade: in complex constrained environments, enforcing strict DSIC may be either impossible or excessively costly in revenue, and many production systems already rely on behavioral assumptions plus monitoring rather than formal equilibrium guarantees. Still, it is important to avoid over-claiming: projection is a , not an incentive certificate.

We see three complementary directions toward stronger guarantees. (i) : for restricted deviation sets 𝒮_i (e.g., multiplicative shading, item-wise scaling), one can compute worst-case utility gains by solving a differentiable best-response optimization against the deployed mechanism and then upper bound errors via Lipschitz constants and numerical tolerances. (ii) : when P̂ is computed by a convex program (e.g., KL projection onto linear 𝒬), KKT conditions provide dual variables that characterize local sensitivity of P̂ to changes in P̃; combined with bounds on how P̃_θ changes with bids, this can yield instance-wise upper bounds on utility improvement from small deviations. (iii) : designing payment rules t_ϕ(b, x, P) to be less sensitive in P (smaller L_t) directly strengthens Proposition~3-style stability bounds, even if it sacrifices some revenue. Each approach involves explicit modeling choices about which deviations matter operationally, but that is arguably appropriate: certification should reflect the manipulation surface actually available to bidders.

Finally, projection makes the policy tradeoff legible but not painless. If constraints are frequently binding, displacement becomes a recurring ``shadow price’’ of compliance, and it may indicate that the policy targets are misaligned with market primitives (eligibility, participation, or heterogeneity in values). In that sense, the projection layer is also an auditing tool: it exposes, via displacement and dual multipliers, when fairness goals require systematic distortion rather than occasional correction. This diagnostic role is part of why we view inference-time feasibility not merely as an engineering trick, but as a mechanism-design primitive that supports measurable governance.

9. Conclusion: inference-time feasibility as a new design primitive for 2026 compliance-first mechanism learning.

We have argued that —implemented as a projection of a learned, unconstrained allocation score matrix onto an explicit constraint polytope—should be viewed as a genuine mechanism-design primitive rather than a post-hoc engineering patch. The practical motivation is straightforward: in many marketplaces the binding requirements are not the objective itself (e.g., revenue) but rather a heterogeneous set of constraints arising from policy, platform integrity, and business governance. In 2026, this ``compliance-first’’ reality is only becoming more pronounced: regulations increasingly demand auditable guarantees, internal risk frameworks require predictable behavior under distribution shift, and operational teams need the ability to change constraints quickly without retraining large models. Our central message is that projection-based deployment offers a disciplined way to separate what must be from what can be , while retaining the flexibility of learning.

At the technical level, the separation is clean. We allow a differentiable model to output an unconstrained allocation score P̃_θ(b, x) ∈ ℝ₊^n × m, which can be as expressive as needed to exploit context x and strategic reports b. We then deploy the allocation
P̂(b, x) ∈ arg min_{P ∈ 𝒬}D_ψ (P ∥ P̃_θ(b, x)),
where 𝒬 encodes feasibility and linear policy constraints. This step does not ``encourage’’ compliance; it it. Because the projection is defined with respect to a strictly convex separable Bregman divergence, the resulting P̂ is well-defined and unique whenever 𝒬 is nonempty and closed, and it satisfies all constraints pointwise. The mechanism therefore inherits a form of robustness that purely penalty-based training lacks: even when the learned scores drift, even when the bid distribution shifts, and even when new slices of context appear at inference, the deployed allocation remains inside the vetted policy set by construction.

At the same time, we have been careful to make explicit the of this safeguard and the variables that govern it. Projection can distort the allocation away from what the score network wanted'' to do, and the right yardstick for that distortion is the displacement $\|\hat P-\tilde P_\theta\|_1$ (or the corresponding Bregman divergence). Under mild Lipschitz assumptions on the seller objective, this displacement directly controls the worst-case revenue loss from enforcing constraints. Under Lipschitz assumptions on payments as a function of allocations, the same displacement controls the degradation of incentive properties when moving from $\tilde P_\theta$ to $\hat P$. In other words, projection does not make constrained mechanism design easy, but it makes the tradeoff \emph{measurable}: we can monitor, instance-by-instance and in expectation, how much compliancecosts’’ and how much it perturbs strategic incentives. This measurability is valuable in practice precisely because it supports governance: it turns what is often an implicit and fragile compromise (soft constraints with unclear failure modes) into an explicit object that can be audited and stress-tested.

Our broader conceptual contribution is to frame projection as a modular interface between three parties whose objectives are often in tension. The policy maker or regulator specifies 𝒬 and therefore determines the admissible set of allocations; the learning system chooses θ to perform well that admissible region, anticipating that the projection will be applied; and the mechanism designer selects payment rules and training objectives to manage incentives, cognizant that strict DSIC may be unattainable or undesirable under rich constraints. This modularity is not only computationally convenient. It also has a normative benefit: it forces us to encode fairness and business rules as transparent constraints rather than as opaque terms in a neural loss. When a constraint is infeasible, frequently binding, or unexpectedly costly, this becomes visible through projection displacement and dual multipliers, and that visibility enables informed renegotiation of the targets rather than silent degradation of system performance.

We also emphasize what this primitive does solve. Projection is not a substitute for incentive analysis, and it should not be misinterpreted as a certificate of truthfulness. The allocation mapping b ↦ P̂(b, x) can fail monotonicity even when P̃_θ is well-behaved, and linear constraints can introduce kinks that amplify strategic externalities across bidders. For that reason, our preferred notion of incentive robustness is approximate: ex-post regret evaluated over relevant deviation sets, coupled with stability bounds that tie regret changes to the projection displacement and payment sensitivity. This perspective aligns with how many real systems are operated: they do not rely exclusively on equilibrium theory, but instead combine mechanism rules with monitoring, throttling, bidder policies, and enforcement. Projection fits naturally into that operational worldview because it delivers compliance while still allowing the incentive layer to be tuned and validated empirically.

Looking forward, we view three research directions as particularly consequential for making compliance-first mechanism learning both principled and deployable. First, we need better methods for when policy constraints are specified across slices, time windows, or multi-round budgets. Even when each single-auction polytope is feasible, temporal coupling can induce infeasibility or volatile shadow prices; projection remains useful, but the correct object becomes a dynamic feasibility set and a state-dependent projection. Second, we need scalable tools for : bounding regret, bounding sensitivity, and diagnosing when constraints are responsible for performance loss versus when the score network is under-trained. Dual information from convex projection is a promising foundation here, because it provides a language for explaining outcomes in terms that are legible to governance teams (``which constraint bound and by how much’’). Third, we need a more systematic understanding of itself: how to translate high-level fairness mandates into linear constraints that are meaningful under eligibility masks and sparse participation, and how to choose the granularity of item partitions and group definitions so that the resulting 𝒬 is both feasible and aligned with welfare.

The main practical takeaway is that projection allows organizations to treat compliance as a first-class artifact: a version-controlled constraint set 𝒬 that can be reviewed, stress-tested, and modified without retraining the entire mechanism. That is a different operating model from end-to-end ``learned compliance,’’ where a shift in data or an innocuous hyperparameter change can reintroduce violations. In that sense, inference-time feasibility is not merely a mathematical convenience; it is an institutional technology that aligns mechanism learning with auditability and policy iteration. Our model illuminates the core tradeoff: tighter constraints induce larger displacement, which can reduce revenue and complicate incentives, but the system remains safe in the dimensions that matter most for compliance. We therefore see projection-based deployment as a natural foundation for the next generation of constrained marketplaces: systems that optimize aggressively where they are allowed to, and that behave predictably where they must.