Regulated Linear Contracts in Multitasking: Projection Characterizations and Instrumental Learning

Table of Contents

  1. Introduction: platform-era motivation (caps, transparency, fairness), why linear and why constraints; preview projection and IV learning results.
  2. Baseline multitask model: agent effort, noisy signals, linear contracts, limited liability; recap of uniformity result β* = θ*/k and its interpretation.
  3. Regulated contract design: define convex policy sets C (caps, monotonicity, equality/ratio constraints); formulate the principal's constrained optimization.
  4. Closed-form projection theorem (quadratic costs): derive a(β) = Qβ and show constrained optimum is weighted projection; give KKT and examples (caps, monotone, group-equality).
  5. Beyond quadratic: Bregman-projection characterization under k-homogeneous costs; existence/uniqueness; when numerical convex optimization is required; approximation under ε-homogeneity.
  6. Learning θ* under measurement error with policy constraints: IV/GMM estimator; plug-in projection β̂; finite-sample error bounds; weak-IV diagnostics tied to eigenvalues of design matrices.
  7. Online learning with compliance: projected explore-then-commit regret bound; extension to repeated measurements + diversity (projection preserves exploration-free guarantees).
  8. Implementation notes: choosing C from policy; computational routines for projections (QP for quadratic case; convex solvers for Bregman case); monitoring and auditing instruments.
  9. Extensions and discussion: heterogeneous k, delayed outcomes, partial participation; empirical pathways and what data platforms must collect.
  10. Conclusion.

Content

1. Introduction: platform-era motivation (caps, transparency, fairness), why linear and why constraints; preview projection and IV learning results.

Many of the most consequential compensation schemes today are designed not by a single employer and a single worker, but by platforms that mediate millions of transactions and that continuously revise their rules. Drivers are paid through formulas that combine acceptance rates, completion rates, and passenger ratings; creators are rewarded through mixtures of watch time, engagement, and policy compliance; sellers face fee schedules that depend on shipping speed and customer satisfaction. In each of these settings, the platform does not merely "measure" performance: it chooses which dimensions are rewarded and by how much. The modern policy challenge is that these incentive weights are rarely unconstrained. Platforms face regulatory caps, transparency requirements, and internal commitments to fairness and interpretability, and they must enforce these constraints while still eliciting productive effort across multiple tasks.

We study this tension in a multitask principal–agent model tailored to the platform context. The economic force is familiar: whenever one activity is easier to measure than another, or whenever some behaviors are rewarded and others are not, the agent reallocates effort in response. What is distinctive in platform environments is the combination of (i) high-dimensional, noisy measurement; (ii) limited liability and related restrictions that prevent punitive "negative" incentives; (iii) explicit constraints on the incentive weights arising from law, product design, or reputational considerations; and (iv) the fact that the platform often does not know the true marginal value of each task ex ante and must learn it from data generated under its own past contracts. Our goal is to provide a unified, tractable description of optimal contract design under such constraints and of how a platform can learn to implement it over time.

We focus on linear contracts—payments that are weighted sums of observable task signals—not because linearity is always optimal in a frictionless environment, but because it is the natural language of implementable, auditable policy in large-scale systems. Linear rules are simple to communicate and to verify; they fit the engineering reality that platforms often score participants using a finite set of metrics; and they interact cleanly with institutional constraints such as caps ("no more than $X per unit"), monotonicity ("do not penalize higher output"), and parity constraints ("equalize weights across protected groups or across task categories"). Moreover, linear contracts provide a disciplined benchmark for asking how far regulation pushes a platform away from the unconstrained incentive scheme, and for quantifying the welfare cost of compliance.

Our approach treats regulation and platform commitments as a set of feasible incentive weights. This is intentionally broad. A cap on bonuses corresponds to upper bounds on weights; limited liability corresponds to nonnegativity; transparency and interpretability often lead to sparsity or simple shape constraints; and fairness policies can impose linear equalities or inequalities that couple weights across tasks. A key modeling choice is to encode these requirements as a closed, convex set of admissible weights. Convexity reflects the idea that mixtures of compliant policies remain compliant, and it makes the problem computationally meaningful: a platform can search over the feasible set using standard convex optimization tools. At the same time, this abstraction forces us to confront an economic question: when the feasible set is binding, what form does the optimal distortion take? Does regulation generate qualitatively new incentive patterns, or does it act as a predictable "truncation" of an otherwise natural contract?

The first insight we develop is that, absent regulation, a large class of multitask technologies leads to a strikingly simple incentive rule. When the agent's cost of effort exhibits a common degree of homogeneity across tasks—capturing the idea that scaling up all activities becomes disproportionately more expensive when effort is convex—the platform's optimal linear weights are proportional to the marginal values of the tasks. The proportionality factor is determined by the curvature of the technology: stronger convexity translates into weaker incentives, uniformly across dimensions. This result formalizes an intuitive principle that is often invoked informally in platform debates: if every task becomes "hard" at the same rate as the agent scales effort, then the platform need not differentially shade incentives across tasks in order to manage distortions; the optimal policy is to scale down the entire vector of marginal values.

Regulation breaks this proportionality, but not in an arbitrary way. Our second insight is that, in an important benchmark with quadratic costs, the regulated optimum can be characterized as a geometric projection of the unconstrained ideal onto the feasible set. In this case, compliance does not require re-solving the full principal–agent problem from scratch; instead, the platform can interpret regulation as choosing the closest compliant vector of weights to the unconstrained target, where "closeness" is measured in a metric induced by the effort technology. This projection perspective is useful for two reasons. Economically, it clarifies how different constraints bind: a cap clips extreme weights; parity constraints average weights across constrained dimensions; monotonicity restricts the direction of adjustments. Computationally, it reduces contract design to a standard convex program with a unique solution, making comparative statics transparent and implementation feasible at scale.

Beyond the quadratic benchmark, the same geometric idea persists, but the relevant notion of distance changes. For general homogeneous and strictly convex effort costs, the distortion created by regulation is naturally measured not by an ordinary Euclidean norm but by a divergence induced by the agent's technology. In other words, the "projection" occurs in a geometry determined by how incentives translate into effort. This matters in practice: in some environments, increasing a weight by a given amount changes behavior much more in one task than in another, and it is precisely this asymmetry that should govern how the platform trades off compliance against performance. Our framework accommodates this by linking the regulated optimum to a Bregman-type projection, which remains computationally tractable even when closed forms are unavailable.

A third motivation for our analysis is learning. Platforms rarely know the true marginal value of each task—for example, how much an additional unit of response speed improves retention, or how much additional content moderation reduces long-run churn. These values must be inferred from data, but the data are endogenous: the platform changes incentives, agents change behavior, and observed task signals are noisy and often incomplete. In such settings, naive regression of outcomes on observed signals can be badly biased because the signal is measured with error and because the platform's own choice of incentives affects both the level and composition of effort. We therefore take seriously the econometric problem faced by a platform that must learn θ* from data generated under its own contracts.

Our learning results use a simple principle: posted incentive weights themselves can serve as instruments for the unobserved effort that drives outcomes. Because the platform chooses the weights and observes the resulting noisy signals and outcomes, it can form moment conditions that remain valid despite measurement error in signals. This yields an IV/GMM approach to estimating task values from observational data generated by past contracts. Importantly, once we estimate task values, we can enforce regulatory constraints by projecting the estimated ideal contract onto the feasible set. This plug-in projection step is stable: small estimation errors in values translate into proportionally small errors in the implemented contract, so compliance does not amplify statistical noise.

Finally, we connect these pieces in an online setting where the platform repeatedly interacts with agents and updates its contract over time. Here the platform faces the classic exploration–exploitation tradeoff, complicated by compliance: exploration must itself remain within the regulated set of contracts. We show that a simple explore-then-commit approach, combined with IV estimation and projection, achieves sublinear regret, and that the projection step preserves rates because it is nonexpansive in the relevant metric. In environments with repeated measurements and sufficient "diversity" in signals, strong identification can arise even without deliberate experimentation, implying that compliance and learning need not be in tension when the data-generating process is rich enough.

Throughout, we emphasize what our model does and does not capture. By restricting attention to linear contracts, we abstract from potentially welfare-improving nonlinearities and from richer dynamic incentive schemes. By focusing on a representative agent technology, we suppress heterogeneity that may be central in labor markets and creator ecosystems. These choices are deliberate: our aim is to isolate how a platform's choice of incentive weights is shaped by convex effort costs, regulatory constraints, and learning under measurement error. The resulting framework yields a set of sharp predictions—notably, proportional incentives without regulation and projection-style distortions with regulation—that are both interpretable for policy analysis and actionable for mechanism design in large-scale systems. In the next section, we lay out the baseline multitask model and the equilibrium relationship between incentive weights and induced effort, which will allow us to formalize these ideas.


2. Baseline multitask model: agent effort, noisy signals, linear contracts, limited liability; recap of uniformity result β* = θ*/k and its interpretation.

We begin with a static multitask environment that isolates the mapping from incentives to induced effort. There are d tasks, indexed by i = 1, …, d. A representative agent chooses an effort vector a ∈ ℝ+d, where ai denotes effort allocated to task i. We interpret the nonnegativity constraint as a limited-liability or "no sabotage" restriction: the agent can increase productive activities but cannot be compelled to exert negative effort on any task. The agent incurs cost c(a), where c : ℝ+d → ℝ+ is strictly convex and differentiable. Our central technological assumption is homogeneity: there exists k > 1 such that
$$ c(\rho a) \;=\; \rho^{k}\, c(a) \qquad \text{for all } \rho>0 \text{ and } a\in \mathbb{R}^d_{+}. $$
This captures the idea that scaling up all activities becomes increasingly expensive at a common rate. The parameter k is a reduced-form measure of curvature (or "returns to scale" in disutility): higher k means marginal costs rise faster when we proportionally expand effort in all dimensions.

The principal (platform) does not observe a directly. Instead it observes a vector of noisy task signals x ∈ ℝd:
x = a + ε,   𝔼[ε ∣ a] = 0,
so that 𝔼[x ∣ a] = a. Signals may represent measured completion rates, engagement counts, or other operational metrics; the key friction is that they are imperfect proxies for true effort.

The principal derives value from the agent’s (unobserved) effort through an outcome y ∈ ℝ satisfying
𝔼[y ∣ a] = ⟨θ*, a⟩,
where θ* ∈ ℝ+d is a vector of marginal values of each task to the principal. In the static design problem of this section, we treat θ* as known. (In later sections, we allow θ* to be unknown and learned from data generated under the principal’s own contracts.)

The principal restricts attention to linear contracts of the form
w(x) = ⟨β, x⟩,   β ∈ ℝ+d.
Nonnegativity of β is a natural limited-liability counterpart on the contract side: the platform can reward measured performance but does not impose explicit per-unit penalties. Because 𝔼[x ∣ a] = a, the agent's expected payment under effort a is 𝔼[w(x) ∣ a] = ⟨β, a⟩; thus, additive measurement noise affects realized payments but not the agent's incentives.

Given β, the agent chooses a to maximize expected utility
UA(a; β) = 𝔼[w(x) ∣ a] − c(a) = ⟨β, a⟩ − c(a)   subject to a ∈ ℝ+d.
Strict convexity of c implies that the agent’s problem has a unique solution, which we denote by a(β). The necessary and sufficient optimality conditions take the Kuhn–Tucker form
$$ \nabla c(a(\beta)) - \beta \;\ge\; 0, \qquad a(\beta) \;\ge\; 0, \qquad \big(\nabla c(a(\beta)) - \beta\big) \odot a(\beta) \;=\; 0, $$
where ⊙ denotes componentwise multiplication. When the interior condition a(β) ∈ ℝ++d holds, these reduce to the familiar first-order condition ∇c(a(β)) = β: incentives equal marginal costs, task by task.

It is often useful to represent the best response via convex duality. Let c* denote the convex conjugate,
c*(β) = supa ≥ 0{⟨β, a⟩ − c(a)}.
Under differentiability and regularity, the envelope theorem implies a(β) = ∇c*(β) on the region where the maximizer is interior. This dual viewpoint will later be convenient for computation and for understanding how the geometry of the technology shapes constrained optima.

The principal chooses β anticipating the agent response. Since 𝔼[y ∣ a] = ⟨θ*, a⟩ and 𝔼[w(x) ∣ a] = ⟨β, a⟩, the principal's expected utility under β is
u(β) = ⟨θ*, a(β)⟩ − ⟨β, a(β)⟩ = ⟨θ* − β, a(β)⟩.
This expression highlights the basic wedge: raising β increases effort (beneficial) but also increases expected transfers (costly), and both effects operate through the same induced action a(β). In the absence of additional restrictions on β, the principal solves
maxβ ∈ ℝ+du(β).
The purpose of this section is to characterize the solution to this unconstrained benchmark and to interpret its structure.

Our homogeneity assumption implies a particularly sharp result: the principal optimally sets incentives proportional to the value vector θ*, with a single proportionality constant determined by the curvature parameter k. Formally, when the feasible set is all of ℝ+d, the unique optimal linear contract is
$$ \beta^* \;=\; \frac{1}{k}\,\theta^*. $$
The result is striking because it holds in high dimensions and does not depend on the correlation structure of the measurement noise ε: only the mean condition 𝔼[x ∣ a] = a matters for expected incentives.

We can see the economic logic most transparently by rewriting the principal’s problem in terms of the action a that the contract induces. When the agent chooses an interior effort vector, implementability is summarized by β = ∇c(a). Substituting into the principal’s payoff yields
u(β) = ⟨θ* − ∇c(a), a⟩.
Now apply Euler’s theorem for k-homogeneous functions: for differentiable c,
⟨∇c(a), a⟩ = kc(a).
Therefore, the principal’s payoff from inducing action a can be written as
⟨θ*, a⟩ − k c(a).
This is a concave maximization over a because c is convex. The first-order condition for the optimal induced action a* is
θ* = k ∇c(a*).
Recalling that the incentive vector required to implement a* is β* = ∇c(a*), we obtain the proportionality rule β* = θ*/k. Strict convexity of c pins down a* uniquely and thus yields uniqueness of β*.

The proportionality factor 1/k admits a simple interpretation. The technology parameter k summarizes how quickly marginal costs grow when we scale effort. A larger k means that inducing additional effort is more expensive at the margin, and the principal responds by uniformly tempering incentives. Importantly, the scaling is uniform: unconstrained optimality does not require tilting incentives away from any particular dimension, even though effort allocation is multidimensional. Instead, the principal "shrinks" the entire vector of task values, preserving relative weights:
$$ \frac{\beta^*_i}{\beta^*_j} \;=\; \frac{\theta^*_i}{\theta^*_j}\qquad \text{for all } i,j. $$
In this sense, the benchmark provides a discipline for discussing distortions: any deviation from proportional weights must come from constraints on β (or from failures of the common-homogeneity assumption), not from multitask considerations per se.

A familiar example illustrates the role of k. If costs are separable power functions, $c(a)=\sum_{i=1}^d a_i^k/k$, then $\nabla c(a) = (a_1^{k-1}, \dots, a_d^{k-1})$, and β = ∇c(a) implies $a_i(\beta) = \beta_i^{1/(k-1)}$. The principal's problem aggregates these responses, yet the optimal contract still collapses to β* = θ*/k. Thus the result is not an artifact of quadratic structure; it is a consequence of homogeneity and convexity.
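To see the mechanics end to end, the following minimal sketch (Python with NumPy; the values of k and θ* are hypothetical) implements the power-cost best response and checks numerically that β = θ*/k dominates nearby nonnegative alternatives:

    import numpy as np

    k = 3.0                                  # curvature of c(a) = sum_i a_i^k / k
    theta = np.array([2.0, 1.0, 0.5])        # hypothetical marginal task values

    def best_response(beta):
        # interior first-order condition grad c(a) = beta  =>  a_i = beta_i^(1/(k-1))
        return np.maximum(beta, 0.0) ** (1.0 / (k - 1.0))

    def principal_value(beta):
        a = best_response(beta)
        return (theta - beta) @ a            # <theta* - beta, a(beta)>

    beta_star = theta / k                    # the proportionality rule
    rng = np.random.default_rng(0)
    perturbed = np.abs(beta_star + 0.1 * rng.standard_normal((1000, 3)))
    assert all(principal_value(b) <= principal_value(beta_star) + 1e-9 for b in perturbed)
    print(beta_star, principal_value(beta_star))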

Two limitations are worth flagging before we move to regulated design. First, although noise does not affect expected incentives in our formulation, it can matter for realized payments and risk; we abstract from risk aversion and participation constraints to focus on incentive weights and compliance. Second, the unconstrained optimum β* = θ*/k relies on the availability of all nonnegative weights. In practice, platforms often face caps, monotonicity, parity, or other restrictions that couple components of β and prevent exact proportionality. The next step is therefore to introduce a general convex feasible set of policies and to study how the optimal contract is distorted relative to this benchmark.


3. Regulated contract design: define convex policy sets C (caps, monotonicity, equality/ratio constraints); formulate the principal’s constrained optimization.

The proportionality rule β* = θ*/k is a useful benchmark, but it is often infeasible once we incorporate policy or operational constraints on how incentive weights may vary across tasks. We therefore model regulation by restricting the feasible contracts to a set
C ⊆ ℝ+d,
interpreted broadly to include hard legal constraints (e.g., caps on piece rates), platform policy (e.g., monotone reward schedules across tiers), and fairness or compliance requirements (e.g., equal treatment across protected groups). Throughout, we assume that C is nonempty, closed, and convex. Convexity is not merely technical: it is a natural reduced form for constraints that are implemented as mixtures of admissible policies (randomization or averaging across business units), and it is the key property that will later allow us to characterize the regulated optimum as a projection in an appropriate geometry.

Given a feasible set C, the principal’s static design problem becomes
$$ \max_{\beta\in C}\ u(\beta) \;=\; \big\langle \theta^*-\beta,\ a(\beta)\big\rangle, $$
where a(β) is the agent’s best response from the previous subsection. When the agent’s optimum is interior, we can equivalently write a(β) = ∇c*(β) and hence
u(β) = ⟨θ* − β, ∇c*(β)⟩.
This representation emphasizes that the regulated problem is a finite-dimensional optimization over β, but with curvature and cross-task interactions inherited from the technology c (via its conjugate c*). In general, the solution need not preserve the benchmark proportionality across tasks: constraints can force the principal to reallocate incentive intensity across dimensions, creating distortions that are entirely attributable to feasibility rather than to the multitask nature of the agent's choice problem per se.

Many real constraints take the form of linear inequalities and equalities, which generate closed convex sets automatically. More generally, convexity captures the idea that if two contracts are compliant, then any convex combination is also compliant (e.g., a regulator may approve either of two pay schedules; a platform can interpolate between them by scaling and mixing, or by implementing them for different subpopulations and averaging over time). From a computational standpoint, convexity will allow us to treat the regulated problem using standard tools: KKT conditions, projection operators, and (in the learning section) nonexpansive mappings that preserve estimation rates.

Closedness ensures the optimum is not "lost at the boundary" through limit points that are infeasible. Nonemptiness simply rules out inconsistent regulation. When C is also compact, existence of βreg is immediate by continuity of u(β) (which holds under mild regularity inherited from c). If C is unbounded, existence may still hold because the objective typically becomes unfavorable for very large weights (the principal pays for induced effort), but the exact conditions depend on the growth of c; we return to concrete sufficient conditions in the quadratic and separable-power cases where coercivity is transparent.

To make C concrete, it is helpful to list a few constraint classes that appear repeatedly in applications. All examples below are subsets of ℝ+d and are convex and closed.

Caps and floors.
A common regulatory intervention is to cap piece rates to limit "over-incentivization" on any single metric:
$$ C_{\mathrm{cap}} \;=\;\{\beta\in\mathbb{R}^d_{+}:\ 0\le \beta_i \le \overline{\beta}_i \ \forall i\}. $$
Caps may reflect consumer protection (e.g., limits on how strongly a platform may steer sellers via fees), labor regulation (limits on penalties or extreme bonuses), or internal policy (avoid gaming a particular metric). Floors can be handled similarly and are useful when some baseline incentive must be maintained for safety or quality tasks.

Budget and aggregate-intensity bounds.
Platforms sometimes restrict the overall "strength" of incentives because of budget, risk, or volatility concerns. A simple convex form is an ℓ1 bound,
Cbudget = {β ∈ ℝ+d : ∥β∥1 ≤ B},
which limits total marginal payment per unit of measured output across tasks. Alternative convex choices include ∥β∥2 ≤ B (stability in Euclidean norm) or weighted norms that encode heterogeneous compliance costs across tasks.

Monotonicity and ordering constraints.
When tasks correspond to ordered categories (e.g., seniority tiers, quality levels, or funnel stages), a regulator or platform policy may require monotone incentives:
Cmono = {β ∈ ℝ+d : β1 ≥ β2 ≥ ⋯ ≥ βd}.
More generally, if tasks are nodes in a partially ordered set (e.g., content categories with an "at least as safe" ordering), we can impose $\beta_i \ge \beta_j$ for specified pairs $(i,j)$. These are linear inequalities, hence convex. Such constraints formalize "no perverse ranking" requirements: higher-quality or safer actions should not be incentivized less than lower-quality ones.

Group parity (equal treatment).
Fairness and compliance policies often require that certain tasks receive identical incentive weights, e.g., because they correspond to the same underlying activity measured in different ways, or because they proxy for attributes where differential incentives are disallowed. Let 𝒢1, …, 𝒢m be a partition of tasks into groups that must be treated equally. Then
Cparity = {β ∈ ℝ+d : βi = βj whenever i, j ∈ 𝒢ℓ,  ℓ = 1, …, m}
is an affine subspace intersected with ℝ+d, hence closed and convex. Economically, these constraints induce "pooling" of weights and thus force the platform to sacrifice fine task-by-task tailoring.

Ratio bounds.
A softer alternative to equality is to bound relative differences:
$$ \beta_i \le r\,\beta_j \quad \text{and/or}\quad \beta_i \ge \underline{r}\,\beta_j, $$
for specified pairs (i, j) and constants r ≥ 1, $\underline{r}\in(0,1]$. These are still linear inequalities (e.g., βi − rβj ≤ 0), hence convex. Ratio bounds capture policies like "no task may be incentivized more than twice as strongly as a safety task" or "weights across demographic proxies cannot differ by more than a fixed multiplicative factor."

Stability (trust-region) constraints.
In many organizations, a policy team may be allowed to adjust incentives only within a limited distance of a previously approved schedule β0:
Cstable = {β ∈ ℝ+d: ∥β − β0∥ ≤ δ}.
Such sets are convex balls intersected with +d. They formalize a form of regulatory inertia or internal governance that prevents abrupt changes which might induce manipulation or distributional shocks.

Most of the preceding examples can be written compactly as
C = {β ∈ ℝ+d : Aβ ≤ b, Eβ = f},
for appropriate matrices A, E and vectors b, f. This form is flexible enough to encode caps, monotonicity, parity, ratio bounds, and many budget constraints (after introducing slack variables if needed). It also clarifies how different policy requirements interact: combining constraints corresponds to intersecting convex sets, preserving closedness and convexity. In particular, adding regulation can only (weakly) reduce the principal's maximum value, and it can only (weakly) increase the "distance" between the regulated optimum and the unconstrained benchmark in the sense that β* may lie outside C.
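As an illustration, here is a minimal sketch (Python with NumPy; the dimension, caps, and particular constraints are hypothetical) that assembles caps, one monotonicity requirement, and a two-task parity constraint into the matrices (A, b, E, f) of this compact form:

    import numpy as np

    d = 4
    beta_bar = np.array([1.0, 1.0, 0.5, 0.5])      # hypothetical caps

    # Inequalities A @ beta <= b: caps plus one ordering constraint
    rows, rhs = [], []
    for i in range(d):                             # beta_i <= beta_bar_i
        e = np.zeros(d); e[i] = 1.0
        rows.append(e); rhs.append(beta_bar[i])
    mono = np.zeros(d); mono[2], mono[3] = -1.0, 1.0   # beta_4 <= beta_3
    rows.append(mono); rhs.append(0.0)
    A, b = np.vstack(rows), np.array(rhs)

    # Equality E @ beta = f: parity between tasks 1 and 2
    E = np.zeros((1, d)); E[0, 0], E[0, 1] = 1.0, -1.0
    f = np.zeros(1)

    def is_feasible(beta, tol=1e-9):
        return (beta >= -tol).all() and (A @ beta <= b + tol).all() \
            and np.allclose(E @ beta, f, atol=tol)

    print(is_feasible(np.array([0.4, 0.4, 0.3, 0.2])))   # True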

The regulated problem highlights a disciplined notion of distortion. In the benchmark, the platform would like to set β proportional to θ*, uniformly scaled by 1/k. Regulation restricts the feasible directions in which β can move, and thus forces the platform to select the closest feasible approximation to that proportionality rule given the agent's technology. Which tasks become under- or over-incentivized depends jointly on (i) which constraints bind and (ii) how incentives translate into effort via a(β). This is precisely why it is valuable to maintain an explicit model of behavior rather than treating a(β) as an exogenous response curve.

At this level of generality, the regulated problem is a constrained nonlinear program. We can still characterize βreg by first-order optimality conditions in the usual variational-inequality form: at an optimum,
⟨∇u(βreg), β − βreg⟩ ≤ 0   for all β ∈ C,
whenever u is differentiable at βreg. However, this characterization is not yet operational until we specify how u behaves, and, crucially, whether u is concave so that these conditions are sufficient. The next subsection therefore specializes to a quadratic benchmark in which a(β) becomes linear and the constrained optimum admits a sharp geometric interpretation as a weighted projection onto C.


4. Closed-form projection theorem (quadratic costs): derive a(β) = Qβ and show constrained optimum is weighted projection; give KKT and examples (caps, monotone, group-equality).

We now specialize to the benchmark in which the agent’s technology is quadratic:
$$ c(a) \;=\; \tfrac{1}{2}\, a^\top Q^{-1} a, \qquad Q \succ 0. $$
This assumption is not meant to be universally descriptive; rather, it isolates a particularly transparent case in which the mapping from incentives to effort is linear, and the principal's constrained design problem reduces to a familiar geometric operation. In applications, the quadratic cost can be viewed as a second-order (local) approximation to a smooth cost function, or as the exact form arising from Gaussian perturbations of effort around a baseline in certain continuous-control models.

Fix a feasible contract β ∈ C ⊆ ℝ+d. Ignoring the nonnegativity constraint on a for a moment, the agent solves
$$ \max_{a\in\mathbb{R}^d}\ \langle \beta,a\rangle-\frac{1}{2}a^\top Q^{-1}a, $$
with first-order condition β − Q−1a = 0, hence
$$ a(\beta) \;=\; Q\beta. $$
When we impose a ∈ ℝ+d, the exact solution is characterized by the usual complementarity conditions. A sufficient condition for a(β) = Qβ to remain valid is that Qβ ≥ 0 for all β ∈ C (for example, when Q is diagonal with positive entries, or more generally when Q maps ℝ+d into itself). Because the regulated-design insights we emphasize concern how constraints on β distort incentives, we proceed under this maintained condition, so that induced effort is given by a(β) = Qβ throughout the feasible region.

Economically, Q summarizes how responsive each task’s effort is to marginal incentives, allowing for cross-task spillovers. If Q is diagonal, tasks respond independently. If Q has off-diagonal terms, incentives on one task can induce effort shifts in others, a reduced-form way to capture complementarities or joint production.

Substituting a(β) = Qβ into the principal's expected utility gives
$$ u(\beta) \;=\; \langle \theta^*-\beta,\ Q\beta\rangle \;=\; (\theta^*)^\top Q\beta - \beta^\top Q\beta. $$
Since Q ≻ 0, the term βᵀQβ is strictly convex, so u(β) is a strictly concave quadratic function of β. The key step is to rewrite u(β) by completing the square:
$$ (\theta^*)^\top Q\beta-\beta^\top Q\beta \;=\;-\Big(\beta-\frac{\theta^*}{2}\Big)^\top Q\Big(\beta-\frac{\theta^*}{2}\Big) \;+\;\frac{1}{4}(\theta^*)^\top Q\theta^*. $$
Define the Q-weighted norm ∥v∥Q² := vᵀQv. Then
$$ u(\beta) \;=\; -\,\big\|\beta-\theta^*/2\big\|_{Q}^{2} \;+\; \big\|\theta^*/2\big\|_{Q}^{2}. $$
Because the second term in this expression does not depend on β, maximizing u(β) over a feasible set C is equivalent to minimizing a weighted squared distance to the target vector θ*/2.

This identity implies the following sharp characterization: the regulated optimum is the Q-weighted Euclidean projection of θ*/2 onto C,
$$ \beta^{\mathrm{reg}} \;=\; \mathrm{Proj}^{Q}_{C}\big(\theta^*/2\big) \;:=\; \arg\min_{\beta\in C}\ \big\|\beta-\theta^*/2\big\|_{Q}^{2}. $$
Uniqueness follows immediately because ∥β − θ*/2∥Q² is strictly convex in β and C is convex and closed. Two economic points are worth emphasizing.

First, regulation does not create a new "first-best" target; the unconstrained benchmark still points to θ*/2, but feasibility forces the principal to choose the closest admissible contract in a geometry dictated by Q. In this sense, constraints distort incentives only through a geometric truncation/averaging operation.

Second, the welfare loss from regulation admits a literal distance interpretation. Evaluating the objective at βreg yields
u(βreg) = ∥θ*/2∥Q2 − ∥βreg − θ*/2∥Q2,
so the reduction in attainable value relative to the unconstrained optimum is exactly the squared Q-distance from θ*/2 to the feasible set.

The projection formulation also yields a convenient optimality condition. Let NC(β) denote the normal cone to C at β. Then βreg solves the projection problem if and only if
$$ Q\big(\theta^*/2-\beta^{\mathrm{reg}}\big) \;\in\; N_C\big(\beta^{\mathrm{reg}}\big). $$
Equivalently, for all β ∈ C,
$$ \big(\theta^*/2-\beta^{\mathrm{reg}}\big)^\top Q\,\big(\beta-\beta^{\mathrm{reg}}\big) \;\le\; 0, $$
which is the variational-inequality form of the projection property.

When C is described by linear inequalities and equalities,
C = {β ∈ ℝd : Aβ ≤ b,  Eβ = f,  β ≥ 0},
the KKT conditions for take an explicit multiplier form: there exist μ ≥ 0 (for Aβ ≤ b), ν (for Eβ = f), and s ≥ 0 (for β ≥ 0) such that
$$ 2Q\big(\beta^{\mathrm{reg}}-\theta^*/2\big) + A^\top\mu + E^\top\nu - s \;=\; 0, \qquad \mu\odot\big(A\beta^{\mathrm{reg}}-b\big) \;=\; 0, \qquad s\odot\beta^{\mathrm{reg}} \;=\; 0. $$
These conditions make clear how regulation "tilts" the contract: binding constraints contribute shadow prices that shift βreg away from the unconstrained target θ*/2.
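Computationally, the projection is a small QP. A minimal sketch (using SciPy's general-purpose SLSQP routine; Q, θ*, and the cap values are hypothetical) that computes βreg = arg minβ∈C (β − θ*/2)ᵀQ(β − θ*/2):

    import numpy as np
    from scipy.optimize import minimize

    Q = np.diag([2.0, 1.0, 1.0])             # hypothetical responsiveness matrix
    theta = np.array([1.5, 0.8, 0.2])        # hypothetical task values
    target = theta / 2.0
    A = np.eye(3); b = np.array([0.5, 0.5, 0.5])   # caps: beta_i <= 0.5

    def obj(beta):
        r = beta - target
        return r @ Q @ r                     # squared Q-distance to the target

    def grad(beta):
        return 2.0 * Q @ (beta - target)

    cons = [{'type': 'ineq', 'fun': lambda beta: b - A @ beta}]   # A beta <= b
    res = minimize(obj, x0=np.zeros(3), jac=grad, bounds=[(0, None)] * 3,
                   constraints=cons, method='SLSQP')
    print(res.x)   # approx [0.5, 0.4, 0.1]: diagonal Q and a box give clipping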

The projection viewpoint is especially useful because many canonical constraint classes correspond to well-studied projection operators.

Caps (box constraints).
If $C=\{\beta\in\mathbb{R}^d_+:\ \beta_i\le \overline{\beta}_i\}$ and Q = I, then the projection reduces to componentwise clipping:
$$ \beta^{\mathrm{reg}}_i\;=\;\min\Big\{\overline{\beta}_i,\ \max\{0,\ \theta^*_i/2\}\Big\}. $$
Thus caps bind exactly on tasks with large marginal value θi* (relative to the cap), producing a transparent notion of distortion: high-value dimensions are truncated, while low-value dimensions remain at their unconstrained levels (here, simply θi*/2). When Q ≠ I but diagonal, Q = diag(qi), the same clipping result holds because the objective is separable:
minβ ∈ C ∑iqi(βi − θi*/2)2,
so the weights qi affect the value but not the argmin under pure box constraints.

Monotonicity (isotonic projection).
Suppose tasks are ordered and C = {β ∈ ℝ+d : β1 ≥ β2 ≥ ⋯ ≥ βd}. With Q = I, the projection is exactly the isotonic regression of the vector θ*/2 under a nonincreasing shape restriction (and truncation at zero). The solution pools adjacent coordinates whenever the target violates monotonicity: if θi*/2 < θi+1*/2, then the projection forces βireg and βi+1reg toward a common level. With diagonal Q = diag(qi), we obtain weighted isotonic regression, in which coordinates with larger qi are penalized more heavily for deviating from θi*/2; operationally, the pooled level becomes a weighted average. This is attractive in practice because it yields fast algorithms (e.g., the pool-adjacent-violators method, sketched below) and a clear policy interpretation: monotonicity constraints create local averaging of incentives across tiers.
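The pool-adjacent-violators step is short enough to inline; a minimal sketch (plain NumPy; the target vector is hypothetical) of weighted isotonic regression onto the nonincreasing cone, with truncation at zero applied afterwards:

    import numpy as np

    def pav_nonincreasing(z, q):
        """Weighted isotonic regression of z onto {b1 >= b2 >= ... >= bd}
        by pool-adjacent-violators; nonnegativity enforced by truncation."""
        levels, weights, counts = [], [], []
        for zi, qi in zip(z, q):
            lv, w, c = float(zi), float(qi), 1
            # pool with earlier blocks while the nonincreasing order is violated
            while levels and levels[-1] < lv:
                lv = (weights[-1] * levels[-1] + w * lv) / (weights[-1] + w)
                w += weights[-1]; c += counts[-1]
                levels.pop(); weights.pop(); counts.pop()
            levels.append(lv); weights.append(w); counts.append(c)
        fit = np.concatenate([np.full(c, lv) for lv, c in zip(levels, counts)])
        return np.maximum(fit, 0.0)

    target = np.array([0.3, 0.5, 0.2, -0.1])      # hypothetical theta*/2
    print(pav_nonincreasing(target, np.ones(4)))  # [0.4 0.4 0.2 0. ]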

Group parity (pooling).
Let 𝒢1, …, 𝒢m be task groups required to share a common weight. Then C imposes βi = βj for all i, j ∈ 𝒢ℓ, together with β ≥ 0 (and possibly other constraints). In the simplest case with Q = I and no additional constraints, the projection averages within each group:
$$ \beta^{\mathrm{reg}}_i\;=\;\max\Big\{0,\ \frac{1}{|\mathcal{G}_\ell|}\sum_{j\in\mathcal{G}_\ell}\frac{\theta^*_j}{2}\Big\}\qquad \text{for } i\in\mathcal{G}_\ell. $$
Thus parity requirements induce explicit pooling: the platform replaces heterogeneous task-by-task marginal values by group averages, reflecting the informational and allocative cost of "equal treatment." If Q = diag(qi), the group value becomes a weighted average, ∑j∈𝒢ℓ qj(θj*/2) / ∑j∈𝒢ℓ qj, emphasizing again that the technology determines which deviations from the target are more costly.
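With diagonal Q the parity projection is a weighted average per group; a minimal sketch with hypothetical groups and weights:

    import numpy as np

    def parity_projection(target, q, groups):
        """Projection onto equal-within-group weights for Q = diag(q),
        truncated at zero; `groups` is a partition of the task indices."""
        beta = np.empty_like(target)
        for g in groups:
            g = np.asarray(g)
            pooled = np.dot(q[g], target[g]) / q[g].sum()   # weighted group mean
            beta[g] = max(pooled, 0.0)
        return beta

    theta_half = np.array([0.6, 0.2, 0.3, 0.5])   # hypothetical theta*/2
    q = np.array([1.0, 3.0, 1.0, 1.0])
    print(parity_projection(theta_half, q, [[0, 1], [2, 3]]))   # [0.3 0.3 0.4 0.4]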

In the quadratic benchmark, computing βreg is no harder than solving a strictly convex quadratic program—and in many common cases (boxes, monotonicity, parity with diagonal Q) it reduces to closed-form clipping or fast projection routines. Conceptually, the result also tells us what regulation can and cannot do in this environment: it cannot change the fundamental target θ*/2, but it can force the platform to approximate that target under a constraint-induced geometry. This precise geometric structure will be the template we generalize in the next section, where quadratic costs are replaced by general k-homogeneous technologies and Euclidean projections become Bregman projections induced by c*.


5. Beyond quadratic: Bregman-projection characterization under k-homogeneous costs; existence/uniqueness; when numerical convex optimization is required; approximation under ε-homogeneity.

We now drop the quadratic benchmark and return to the maintained primitives: c : ℝ+d → ℝ+ is strictly convex, differentiable, and k-homogeneous with k > 1. The central benefit of this class is that it still delivers a clean representation of behavior. Let the convex conjugate of c be
c*(β) = supa ≥ 0 ⟨β, a⟩ − c(a).
Under standard regularity (e.g., c essentially smooth on ℝ++d), the agent's best response can be written as
$$ a(\beta) \;=\; \nabla c^*(\beta), $$
with the usual caveat that corners may arise when some components of β are zero and the nonnegativity constraint binds. Economically, c* summarizes the "supply side" of effort: it tells us how the agent converts marginal rewards β into a chosen effort vector.

A key structural implication is that homogeneity is preserved (up to a change of degree) under conjugacy. If c is k-homogeneous, then c* is k*-homogeneous with
$$ k^*\;=\;\frac{k}{k-1}. $$
By Euler’s theorem for homogeneous functions, for all β ∈ ℝ++d we have
$$ \big\langle \nabla c^*(\beta),\ \beta \big\rangle \;=\; k^*\, c^*(\beta). $$
Substituting the best response into the principal's expected utility gives
$$ u(\beta) \;=\; \big\langle \theta^*,\ \nabla c^*(\beta)\big\rangle - \big\langle \beta,\ \nabla c^*(\beta)\big\rangle \;=\; \big\langle \theta^*,\ \nabla c^*(\beta)\big\rangle - k^*\, c^*(\beta), $$
where the second equality uses the Euler identity. This expression is the natural analog of "completing the square" in the quadratic case: the technology enters only through c* and its gradient.

We view the regulated design problem as
$$ \max_{\beta\in C}\ u(\beta) \;=\; \big\langle \theta^*,\ \nabla c^*(\beta)\big\rangle - k^*\, c^*(\beta), $$
where C ⊆ ℝ+d is nonempty, closed, and convex. Two observations are useful for practice. First, because u(β) is continuous whenever c* is differentiable, an optimum exists whenever C is compact. Compactness is not merely technical: in many policy environments, caps, budget constraints, or "no extreme incentives" rules naturally impose boundedness. When C is closed but unbounded, existence can still be ensured by mild coercivity conditions (for instance, c*(β) → ∞ sufficiently fast along rays in C so that the negative term −k*c*(β) in the objective dominates), but the compact case is the cleanest statement.

Second, strict convexity of c implies differentiability and strict convexity of c* on the interior of its domain, which typically yields strict concavity of u over ℝ++d for common classes (notably separable power costs and many CES-type technologies). In those cases, the maximizer over convex C is unique. We emphasize, however, that full generality requires care: without additional curvature conditions on c* (e.g., positive definiteness of its Hessian on the relevant region), u may be concave but not strictly concave, and uniqueness can fail on faces of C.

Even when a closed form is unavailable, the optimal contract is characterized by a familiar first-order condition. Let NC(β) denote the normal cone to C at β. If u is differentiable at βreg, then βreg solves the regulated problem if and only if
$$ \big\langle \nabla u(\beta^{\mathrm{reg}}),\ \beta-\beta^{\mathrm{reg}} \big\rangle \;\le\; 0 \qquad \text{for all } \beta\in C, $$
equivalently −∇u(βreg) ∈ NC(βreg). Differentiating and writing Hc*(β) for the Hessian of c* yields
$$ \nabla u(\beta) \;=\; H_{c^*}(\beta)\,\big(\theta^*-\beta\big) \;-\; \nabla c^*(\beta). $$
The economic content of these conditions mirrors the quadratic projection story: the unconstrained "target" points toward θ*, but regulation introduces normal-cone forces that shift the chosen β until the marginal gain direction is orthogonal (in a generalized sense) to all feasible deviations.

The quadratic benchmark delivered a Euclidean projection in a Q-weighted geometry. With general k-homogeneous costs, the geometry becomes non-Euclidean: distances are measured by a divergence induced by the technology. A convenient choice of potential is
$$ \psi(\beta) \;=\; k^*\, c^*(\beta) \;=\; \frac{k}{k-1}\, c^*(\beta), $$
which is convex and differentiable wherever c* is. The associated Bregman divergence is
Dψ(β, z) = ψ(β) − ψ(z) − ⟨∇ψ(z), β − z⟩.
In many canonical specifications, maximizing u(β) over C is equivalent to a Bregman projection of a scaled value vector onto C:
$$ \beta^{\mathrm{reg}} \;\in\; \arg\min_{\beta\in C}\ D_\psi\big(\beta,\ \theta^*/k\big). $$
This representation should be read as a disciplined generalization of the projection theorem: regulation chooses the feasible contract closest to the unconstrained benchmark θ*/k, but "closest" is measured in the divergence generated by the agent's technology, not necessarily by squared Euclidean distance. The substantive message is that the policy-induced distortions (pooling, truncation, averaging) persist, yet they occur in a nonlinear coordinate system aligned with effort supply.

We also flag a limitation: the Bregman-projection representation is most transparent when c* has a separable or otherwise well-structured form so that ∇ψ is monotone and the KKT system coincides with the Bregman-projection optimality conditions. For an arbitrary strictly convex c, the variational inequality above is always valid, while the exact reduction to a Bregman projection may require additional regularity (or may only hold locally).
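In the separable power family the objects are fully explicit: c(a) = Σi ai^k/k gives c*(β) = ((k−1)/k) Σi βi^{k/(k−1)} on β ≥ 0, hence ψ(β) = Σi βi^{k*}. A minimal sketch (SciPy's bound-constrained L-BFGS-B; k, θ*, and the cap are hypothetical) of the resulting Bregman projection onto a box:

    import numpy as np
    from scipy.optimize import minimize

    k = 3.0
    ks = k / (k - 1.0)                        # conjugate degree k* = 1.5
    theta = np.array([1.2, 0.9, 0.3])         # hypothetical task values
    z = theta / k                             # unconstrained benchmark theta*/k
    cap = 0.25                                # hypothetical uniform cap

    def bregman_div(beta):
        # D_psi(beta, z) with psi(beta) = sum_i beta_i^{k*}, summed coordinatewise
        return np.sum(beta**ks - z**ks - ks * z**(ks - 1.0) * (beta - z))

    res = minimize(bregman_div, x0=np.full(3, 0.1),
                   bounds=[(0.0, cap)] * 3, method='L-BFGS-B')
    print(res.x)   # coordinates of z above the cap are pulled down to it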

From an implementation perspective, the design problem remains tractable: under the maintained conditions ensuring concavity of u, the regulated problem is a concave maximization over a convex set, hence solvable by standard methods. In practice we can proceed in at least three ways: (i) exploit closed forms in the quadratic and separable special cases; (ii) solve the concave program directly with a general-purpose convex solver; or (iii) compute the Bregman projection with mirror-descent-type routines that exploit the structure of ψ.

Closed forms are the exception rather than the rule: beyond quadratic and a few separable cases (where constraints like boxes or group equalities can still be handled almost analytically), regulation typically forces a numerical step. The model's practical takeaway is therefore not "we can always write βreg explicitly," but rather "we can always compute βreg as a well-behaved convex optimization problem once the technology is specified."

Finally, we briefly discuss robustness when homogeneity is only approximate. In many environments, costs are not exactly k-homogeneous, but may satisfy an ε-homogeneity condition on the relevant region of efforts: for all ρ ≥ 1 and a in a set of interest,
$$ (1-\varepsilon)\,\rho^{k}\, c(a) \;\le\; c(\rho a) \;\le\; (1+\varepsilon)\,\rho^{k}\, c(a). $$
This formulation captures technologies that are "locally power-like" up to small multiplicative distortions. Under this condition and standard smoothness, the Euler identity used to obtain clean scaling relations becomes approximate, and the unconstrained benchmark β* = θ*/k is no longer exact. Nonetheless, the same logic implies a quantitative stability result: the optimal contract under approximate homogeneity remains close to the homogeneous prediction, with deviations on the order of ε once we restrict attention to bounded C (so that the relevant gradients are Lipschitz on C). In words, small departures from exact returns-to-scale do not overturn the geometric interpretation of regulation; they perturb the target and the projection geometry slightly, but do not change the fundamental fact that constraints operate by pulling incentives toward a feasible approximation of an economically meaningful benchmark.

This robustness perspective also guides empirical work. When a platform’s internal model of effort costs is only an approximation, insisting on exact closed forms is less important than ensuring that the regulated design problem is stable and computable. In the next section, we turn to the complementary practical challenge: even if the structure above is correct, the marginal value vector θ* is typically unknown and must be learned from noisy measurements under precisely the same policy constraints embodied in C.


6. Learning θ* under measurement error with policy constraints: IV/GMM estimator; plug-in projection ˆβ; finite-sample error bounds; weak-IV diagnostics tied to eigenvalues of design matrices.

We now treat the marginal value vector θ* as unknown. The principal observes only posted contracts and noisy performance data, and must infer θ* while remaining compliant with the regulatory set C. The key econometric difficulty is that the natural regressor—the observed signal vector x—is contaminated by measurement error, so naïve regression of outcomes on x is generally biased (attenuation and more complex distortions in the multitask setting). Our maintained structure delivers a clean fix: because the contract weights β are chosen by the principal and affect effort, they serve as natural instruments for x.

Suppose we collect T observations (βt, xt, yt) with βt ∈ C. Let at = a(βt) denote the (unobserved) induced effort, and write the measurement and outcome equations as
$$ x_t \;=\; a_t + \varepsilon_t, \qquad y_t \;=\; \langle \theta^*,\ a_t\rangle + \eta_t, $$
where εt ∈ ℝd and ηt ∈ ℝ are mean-zero noises conditional on at (and hence conditional on βt), with subgaussian tails of scale σ.
If we regress yt on xt, then xt contains εt, which is mechanically correlated with the regression residual yt − ⟨θ*, xt⟩ = ηt − ⟨θ*, εt⟩, producing bias even when the principal's choices are exogenous. This is the canonical "errors-in-variables" problem.

The contract weights βt affect at through the agent’s best response, but (by design and by timing) are independent of the realized measurement noise. This yields an orthogonality condition:
$$ \mathbb{E}\big[\beta_t\,\big(y_t - \langle \theta^*, x_t\rangle\big)\big] \;=\; 0. $$
Equivalently, rearranging gives the linear IV identity
$$ \mathbb{E}\big[\beta_t\, y_t\big] \;=\; \mathbb{E}\big[\beta_t\, x_t^\top\big]\,\theta^*. $$
The economic interpretation is simple: we use variation in incentives—which shifts effort but is not contaminated by measurement noise—to identify the mapping from effort to value.

Stack the data into matrices
$$ B_T \;=\;\begin{bmatrix}\beta_1^\top\\ \vdots\\ \beta_T^\top\end{bmatrix}\in\mathbb{R}^{T\times d}, \qquad X_T \;=\;\begin{bmatrix}x_1^\top\\ \vdots\\ x_T^\top\end{bmatrix}\in\mathbb{R}^{T\times d}, \qquad Y_T \;=\;\begin{bmatrix}y_1\\ \vdots\\ y_T\end{bmatrix}\in\mathbb{R}^{T}. $$
The sample analog of this identity is $B_T^\top Y_T \approx B_T^\top X_T\,\theta^*$, motivating the estimator
$$ \widehat{\theta}_T \;=\; \big(B_T^\top X_T\big)^{-1} B_T^\top Y_T, $$
when $B_T^\top X_T$ is invertible. This is the exactly-identified IV estimator (and can be viewed as a special case of GMM with moment vector $\frac{1}{T}\sum_{t=1}^T \beta_t(y_t-\langle \theta,x_t\rangle)$).
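A small simulation makes the bias correction visible. The sketch below (NumPy; all parameters hypothetical, with quadratic costs so that a(β) = β) generates data from the measurement and outcome equations and compares naïve least squares with the IV estimator:

    import numpy as np

    rng = np.random.default_rng(1)
    T, d, sigma = 5000, 3, 0.5
    theta = np.array([1.0, 0.6, 0.3])

    B = rng.uniform(0.1, 1.0, size=(T, d))     # posted contracts (instruments)
    A_eff = B                                  # quadratic costs: a(beta) = beta
    X = A_eff + sigma * rng.standard_normal((T, d))   # noisy signals
    Y = A_eff @ theta + sigma * rng.standard_normal(T)

    theta_ols = np.linalg.lstsq(X, Y, rcond=None)[0]   # attenuated by noise in X
    theta_iv = np.linalg.solve(B.T @ X, B.T @ Y)       # IV: instruments remove the bias
    print(theta_ols, theta_iv)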

Two requirements deserve emphasis.
First, exogeneity is built into our timing: conditional mean-zero noise and the principal's commitment to βt before (xt, yt) are realized deliver the orthogonality condition.
Second, relevance is nontrivial under constraints: $B_T^\top X_T$ must be well-conditioned, which requires that the sequence {βt}t ≤ T span enough directions that translate into meaningfully different effort choices (and hence different xt).

To understand how quickly we learn, it is useful to express the estimation error in a form that isolates relevance. Define the sample "first-stage" matrix
$$ M_T \;:=\;\frac{1}{T}B_T^\top X_T. $$
Under the measurement and outcome equations, we can write
$$ \frac{1}{T}B_T^\top Y_T \;=\; \frac{1}{T}B_T^\top A_T\,\theta^* \;+\; \frac{1}{T}B_T^\top \eta, \qquad M_T \;=\; \frac{1}{T}B_T^\top A_T \;+\; \frac{1}{T}B_T^\top E, $$
where AT stacks at, E stacks εt, and η stacks ηt. Substituting into the estimator yields (when MT is invertible)
$$ \widehat{\theta}_T - \theta^* \;=\; M_T^{-1}\Big(\frac{1}{T}B_T^\top \eta \;-\; \frac{1}{T}B_T^\top E\,\theta^*\Big). $$
The two terms correspond to outcome noise and measurement noise, respectively. Both are amplified by MT−1, which is exactly where weak-IV problems enter: when MT has a small minimum singular value, the inversion step magnifies sampling fluctuations.

A convenient (and operational) high-probability bound can be stated in terms of σmin(MT). If ∥βt∥2 ≤ b almost surely and (εt, ηt) are conditionally σ-subgaussian, then standard matrix concentration implies that with probability at least 1 − δ,
$$ \big\|\widehat{\theta}_T-\theta^*\big\|_2 \;\le\; \frac{\sigma b}{\sigma_{\min}(M_T)}\left(c_1\sqrt{\frac{d+\log(1/\delta)}{T}} \;+\; c_2\,\big\|\theta^*\big\|_2\sqrt{\frac{d+\log(1/\delta)}{T}}\right), $$
for universal constants c1, c2 (the second term reflects the propagation of measurement error through θ* in the error decomposition above). The important comparative static is transparent: holding noise and scale fixed, learning speed is governed by the minimum singular value of the design induced by the feasible and actually deployed contracts.

Once we have θ̂T, we can translate it into a compliant contract by applying the same mapping from values to incentives used in the complete-information benchmark, followed by projection onto C. In the simplest homogeneous "uniformity" case, the unconstrained target is θ*/k, so we define
$$ \widehat{\beta}_T \;=\; \mathrm{Proj}_C\big(\widehat{\theta}_T/k\big). $$
In the quadratic benchmark with known Q ≻ 0, the target is θ*/2 in the Q-metric, giving
$$ \widehat{\beta}_T \;=\; \mathrm{Proj}^{Q}_C\big(\widehat{\theta}_T/2\big). $$
Both constructions enforce regulation automatically: every iterate is in C regardless of estimation error.

The payoff from this "estimate then project" approach is stability. Euclidean projection onto a closed convex set is nonexpansive, hence
$$ \big\|\widehat{\beta}_T-\beta^{\mathrm{reg}}\big\|_2 \;\le\; \frac{1}{k}\big\|\widehat{\theta}_T-\theta^*\big\|_2, $$
when βreg = ProjC(θ*/k). In the quadratic case, the same logic holds in the weighted norm:
$$ \big\|\widehat{\beta}_T-\beta^{\mathrm{reg}}\big\|_{Q} \;\le\; \frac{1}{2}\big\|\widehat{\theta}_T-\theta^*\big\|_{Q}. $$
Thus, regulation does not "blow up" statistical error; it only truncates or pools incentives according to the geometry of C.

This bound makes clear that the principal's statistical problem is largely summarized by $\sigma_{\min}(M_T)=\sigma_{\min}(\frac{1}{T}B_T^\top X_T)$. In practice, this quantity (or its close cousins such as $\sigma_{\min}(\frac{1}{T}B_T^\top B_T)$ in simplified first-stage approximations) plays the role of a weak-IV diagnostic: if it is small, confidence regions for θ* are necessarily wide, and plug-in contracts will be imprecise.
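The diagnostic itself is a single singular-value computation; a minimal helper (the threshold is a hypothetical tuning choice, not a universal constant):

    import numpy as np

    def weak_iv_diagnostic(B, X, threshold=0.1):
        """sigma_min of M_T = B^T X / T; flags weak identification."""
        M = B.T @ X / B.shape[0]
        smin = np.linalg.svd(M, compute_uv=False).min()
        return smin, smin < threshold   # if flagged: widen CIs, redesign exploration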

This perspective also clarifies how compliance constraints interact with learning. If C is narrow—for instance, if regulation forces many components of β to be equal, caps all components tightly, or limits variation to a low-dimensional face—then BT may have low effective rank, and σmin(MT) can remain small even as T grows. Economically, regulation can inadvertently reduce the "experimental variation" in incentives available for identification. The appropriate response is not to violate constraints, but to design informative variation within them: when C permits, the principal can choose a sequence of compliant contracts that spans the feasible tangent directions (e.g., rotating weight across groups within allowed caps, or cycling among extremal points of C). When C truly collapses the feasible set to a low-dimensional subspace, then no estimator can recover all coordinates of θ*; in that case the relevant object becomes the projection of θ* onto the identifiable subspace, and welfare comparisons should be made accordingly.

The offline IV/GMM formulation delivers three practical takeaways. First, measurement error in task signals is not a nuisance that vanishes with sample size if we use the wrong estimator; it requires instruments. Second, compliance is not merely a constraint in the optimization stage: it shapes the instrument set and therefore the speed at which values can be learned. Third, projection-based implementation is robust: once we have a value estimate, mapping it into C is stable and interpretable, and finite-sample contract error inherits the same eigenvalue-driven rates as value estimation. These observations set the stage for the dynamic problem, where the principal must post compliant contracts βt ∈ C while simultaneously using the resulting data to learn θ*.


7. Online learning with compliance: projected explore-then-commit regret bound; extension to repeated measurements + diversity (projection preserves exploration-free guarantees).

In the dynamic problem the principal must choose a sequence of contracts (βt)t ≤ T with each βt ∈ C, and faces a familiar feedback loop: contracts generate data, data determine estimates of θ*, and estimates determine future contracts. A useful organizing device is to benchmark against the (unknown) regulated optimum βreg = βreg(θ*), and measure how far our deployed sequence is from that target. In the quadratic benchmark, this distance has a direct welfare meaning because u(β) is a concave quadratic; indeed,
$$ u(\beta^{\mathrm{reg}})-u(\beta) \;=\; \|\beta-\beta^{\mathrm{reg}}\|_{Q}^{2} \qquad \text{when }c(a)=\tfrac12 a^\top Q^{-1}a \text{ and } \beta^{\mathrm{reg}}=\mathrm{Proj}_{C}^{Q}(\theta^*/2). $$
Accordingly, we track the squared-distance regret
$$ \mathrm{Reg}_T \;=\; \sum_{t\le T}\big\|\beta_t-\beta^{\mathrm{reg}}\big\|_2^2, $$
or, in the quadratic case, its natural weighted analog ∑t ≤ T ∥βt − βreg∥Q². This choice keeps the economics transparent: regret is exactly the cumulative misalignment between implemented incentives and the best compliant incentives.

The simplest way to manage the feedback loop is to decouple the two roles of βt:
during an initial exploration phase, we choose βt primarily to make the IV/GMM design well-conditioned; during exploitation, we choose βt to be close to βreg given our current estimate. Because compliance is nonnegotiable, both phases must use only contracts in C.

Concretely, fix an exploration length τ and select a finite set of compliant contracts {β̄(1), …, β̄(m)} ⊆ C that spans the feasible directions we can vary.
We then cycle or randomize among these contracts for t ≤ τ, collect (βt, xt, yt), compute an IV estimate θ̂τ, and finally commit for t > τ to the projected plug-in contract
$$ \widehat{\beta} \;=\; \mathrm{Proj}_C\big(\widehat{\theta}_\tau/k\big). $$
Two features matter. First, projection makes the algorithm compliant at every round. Second, because projection is nonexpansive in the appropriate geometry, estimation error in θ̂τ translates into contract error in β̂ without amplification.

Exploration is not about choosing "high-variance" actions in the abstract; it is about increasing the minimum singular value of the empirical first-stage matrix (the relevance object). Let
$$ M_\tau \;:=\;\frac{1}{\tau}B_\tau^\top X_\tau, \qquad\text{where }B_\tau=\begin{bmatrix}\beta_1^\top\\ \vdots\\ \beta_\tau^\top\end{bmatrix},\; X_\tau=\begin{bmatrix}x_1^\top\\ \vdots\\ x_\tau^\top\end{bmatrix}. $$
All IV-style error bounds scale like 1/σmin(Mτ). Thus, the principal's exploration problem is naturally an experimental-design problem: pick a compliant sequence β1, …, βτ ∈ C that makes σmin(Mτ) large. Regulation can severely restrict this: if C forces βt to lie on a low-dimensional face, then no amount of time yields a full-rank Mτ in dimension d. The best we can do is to maximize eigenvalues in the identifiable subspace, and accept that learning (and hence contract adaptation) is only possible along those directions.

The next statement summarizes the baseline performance one gets from combining (i) an exploration design that ensures relevance and (ii) a projection-based commit step.
Proposition (informal). Suppose ∥βt∥2 ≤ b for all t, the noises are conditionally σ-subgaussian, and the exploration design guarantees σmin(Mτ) ≥ λ0 > 0. Then projected explore-then-commit with τ ≍ √T attains RegT = Õ(d√T).

The argument is a direct decomposition. During exploration, we may incur regret as large as ∑t ≤ τ ∥βt − βreg∥² ≤ 4b²τ simply because we are not optimizing; this is the exploration cost. During exploitation, βt = β̂ for all t > τ, so
$$ \sum_{t=\tau+1}^T \|\beta_t-\beta^{\mathrm{reg}}\|^2 \;=\; (T-\tau)\,\|\widehat{\beta}-\beta^{\mathrm{reg}}\|^2. $$
By nonexpansiveness of projection, ∥β̂ − βreg∥ ≤ (1/k)∥θ̂τ − θ*∥ in the uniformity geometry (and similarly in Q-norm in the quadratic case). Hence exploitation regret is controlled by the IV estimation error, which scales like $\widetilde{O}(\sqrt{d/\tau})$ under the stated eigenvalue growth. Choosing $\tau\asymp \sqrt{T}$ balances the linear exploration cost τ against the inverse exploration benefit T/τ, producing the $\widetilde{O}(d\sqrt{T})$ rate. Economically, we trade off short-run inefficiency (cycling through informative but potentially suboptimal incentives) against long-run efficiency (quickly converging to the best compliant contract).
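Putting the pieces together, here is a compact simulation sketch of projected explore-then-commit (NumPy; the environment, the box constraint C = [0, 0.4]^d, and all parameters are hypothetical):

    import numpy as np

    rng = np.random.default_rng(2)
    T, d, k, sigma = 2000, 3, 2.0, 0.3
    theta = np.array([1.0, 0.6, 0.3])
    cap = 0.4

    def project_C(beta):                     # projection onto the box C = [0, cap]^d
        return np.clip(beta, 0.0, cap)

    def play(beta):                          # one round of the environment
        a = beta ** (1.0 / (k - 1.0))        # power-cost best response
        x = a + sigma * rng.standard_normal(d)
        y = a @ theta + sigma * rng.standard_normal()
        return x, y

    tau = int(np.sqrt(T))                    # exploration length of order sqrt(T)
    probes = [project_C(0.3 + 0.1 * np.eye(d)[i]) for i in range(d)]  # compliant, spanning
    B, X, Y = [], [], []
    for t in range(tau):                     # explore: cycle the compliant probes
        beta = probes[t % d]
        x, y = play(beta)
        B.append(beta); X.append(x); Y.append(y)
    B, X, Y = np.array(B), np.array(X), np.array(Y)
    theta_hat = np.linalg.solve(B.T @ X, B.T @ Y)   # IV estimate from exploration
    beta_commit = project_C(theta_hat / k)          # projected plug-in contract
    for t in range(tau, T):                  # commit: stay at the compliant plug-in
        play(beta_commit)
    print(theta_hat, beta_commit)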

The explore–then–commit template is conservative: it treats relevance as something created only by varying βt. In many platforms, however, we observe repeated measurements of performance within a round (or across closely related channels), and these can supply instruments even when incentives do not vary much. A canonical structure is: in round t we observe two conditionally independent signal vectors
xt(1) = at + εt(1),   xt(2) = at + εt(2),
with εt(1) ⟂ εt(2) given at. Then xt(2) is a valid instrument for xt(1) in the moment condition 𝔼[xt(2)(yt − ⟨θ, xt(1)⟩)] = 0, because it is correlated with effort at but orthogonal to the measurement error in xt(1). More generally, averaging many within-round measurements yields a low-noise proxy x̃t whose covariance is driven by heterogeneity in at rather than by our contract variation.

What, then, replaces exploration? A diversity condition: the induced efforts (across agents and environments) must exhibit sufficiently rich variation even when we are exploiting. Formally, one convenient requirement is a uniform lower bound λ0 > 0 such that the design matrix based on the repeated-measurement instrument grows linearly,
$$ \sigma_{\min}\!\left(\sum_{t=1}^T \widetilde{x}_t \widetilde{x}_t^\top\right)\;\ge\; \lambda_0 T \quad\text{with high probability.} $$
Under such a condition, we can update θ̂t online using the repeated-measurement IV moments while choosing βt greedily (e.g., βt = ProjC(θ̂t−1/k) each round). The resulting contract error shrinks like $\widetilde{O}(\sqrt{d/(\lambda_0 T)})$, and summing ∥βt − βreg∥² over time yields an exploration-free regret bound of order $\widetilde{O}(d/\lambda_0^2)$ (logarithmic dependence on T is absorbed by the tilde). The substantive message is that when the environment supplies exogenous variation in effort (diverse agents, diverse tasks, or repeated independent measurement), deliberate experimentation becomes unnecessary: we can remain near the best compliant contract throughout and still identify θ*.
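A minimal sketch of the repeated-measurement moment (NumPy; the diversity in effort is simulated exogenously and all parameters are hypothetical): the second measurement instruments the first, and the smallest singular value of the cross-moment matrix plays the role of λ0.

    import numpy as np

    rng = np.random.default_rng(3)
    T, d, sigma = 5000, 3, 0.5
    theta = np.array([1.0, 0.6, 0.3])

    # effort varies across rounds for exogenous reasons (agent/task diversity),
    # even though the posted contract is held fixed throughout
    A_eff = rng.uniform(0.2, 1.0, size=(T, d))
    X1 = A_eff + sigma * rng.standard_normal((T, d))   # first measurement
    X2 = A_eff + sigma * rng.standard_normal((T, d))   # independent second measurement
    Y = A_eff @ theta + sigma * rng.standard_normal(T)

    # moment E[x2 (y - <theta, x1>)] = 0  =>  theta = (X2^T X1)^{-1} X2^T Y
    theta_hat = np.linalg.solve(X2.T @ X1, X2.T @ Y)
    lam0 = np.linalg.svd(X2.T @ X1 / T, compute_uv=False).min()   # diversity check
    print(theta_hat, lam0)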

Across both regimes—explicit exploration or diversity-driven learning—the projection step plays a benign role. It never harms feasibility, and it does not worsen statistical rates because it is Lipschitz in the relevant norm. In effect, regulation enters the learning problem through two channels: it can restrict instrument variation (making eigenvalues small), but it also provides a stable map from estimated values to implemented incentives. This is precisely the tradeoff the model is designed to illuminate: constraints may reduce what we can learn from our own interventions, yet the best-response geometry ensures that once we learn, compliance can be enforced without amplifying error.


8. Implementation notes: choosing C from policy; computational routines for projections (QP for quadratic case; convex solvers for Bregman case); monitoring and auditing instruments.

In applications the most delicate modeling step is often not the cost function, but the translation of policy constraints into a set C ⊆ ℝ+d that can be enforced mechanically. The guiding principle is to express compliance requirements as linear (in)equalities whenever possible, both because these are auditable and because they preserve fast projection routines. Common examples include: (i) floors and caps on incentives, $\underline{\beta}\le \beta\le \overline{\beta}$ (box constraints); (ii) budget or exposure limits, 1ᵀβ ≤ B; (iii) monotonicity and ordering constraints across tasks, βi ≥ βj (e.g., do not reward clicks more than verified purchases); and (iv) parity or equal-treatment requirements that force equal weights across a subset of tasks, βi = βj for i, j in a regulated group. Each of these yields a polyhedron
C = {β ∈ ℝ+d : Aβ ≤ b, Eβ = f},
which is closed and convex, and is directly consumable by modern quadratic and conic solvers.

Two practical lessons are worth stating explicitly. First, if C is unbounded (for instance, only nonnegativity constraints), then existence of an optimum can fail under some technologies; in deployed systems this is typically addressed by adding explicit caps $\overline{\beta}$ justified as risk controls, payment predictability, or anti-manipulation safeguards. Second, when policy creates a low-dimensional feasible region (e.g., many equalities), the platform should treat the resulting loss of identifiability as a feature of the policy design, not a bug to patch: it clarifies which aspects of θ* are learnable and which are fundamentally pooled by design.

When costs are quadratic, regulated optimization and compliant learning repeatedly call a weighted projection of the form
$$ \mathrm{Proj}_C^Q(z) \;\in\; \arg\min_{\beta \in C}\,(\beta - z)^\top Q\,(\beta - z), \qquad Q \succ 0. $$
Operationally this is a strictly convex quadratic program (QP). If C is specified by linear constraints, we can write it in standard QP form and solve it with off-the-shelf routines (interior-point methods for high accuracy; operator-splitting methods such as ADMM for speed and warm-starting). Warm-starting is particularly valuable online: successive targets zt (coming from successive estimates θ̂t) move gradually, so reusing the previous primal–dual iterate can reduce solve times by an order of magnitude.
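A minimal sketch of this projection as a QP, using cvxpy as a stand-in for whatever solver stack the platform runs; the polyhedral data (A, b, E, e) are as assembled above, and all numbers are illustrative.

```python
import cvxpy as cp
import numpy as np

def project_Q(z, Q, A, b, E, e):
    """Weighted projection Proj_C^Q(z) onto a polyhedral C, posed as a QP."""
    d = Q.shape[0]
    beta = cp.Variable(d, nonneg=True)
    objective = cp.Minimize(cp.quad_form(beta - z, Q))
    constraints = [A @ beta <= b, E @ beta == e]
    prob = cp.Problem(objective, constraints)
    prob.solve(warm_start=True)  # warm starts pay off when successive z_t drift slowly
    return beta.value
```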

A simple computational trick reduces weighted projection to Euclidean projection. Let L be such that Q = L⊤L (e.g., from a Cholesky factorization). Then
$$ (\beta - z)^\top Q\,(\beta - z) \;=\; \|L(\beta - z)\|_2^2. $$
If we change variables γ = Lβ, we can re-express the projection as a Euclidean projection onto the transformed set LC := {Lβ : β ∈ C}. This is not always simpler (it can densify constraints), but it is useful when Q is diagonal or sparse, and it clarifies numerical conditioning: badly scaled Q produces poorly scaled QPs, so a preconditioning step (or simply rescaling task units) can materially improve stability.
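A quick numerical check of this identity, assuming Q = L⊤L with L obtained from a Cholesky factorization:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
Q = M.T @ M + np.eye(3)          # a generic positive definite Q
L = np.linalg.cholesky(Q).T      # upper-triangular L with Q = L^T L
beta, z = rng.standard_normal(3), rng.standard_normal(3)

lhs = (beta - z) @ Q @ (beta - z)
rhs = np.linalg.norm(L @ (beta - z)) ** 2
assert np.isclose(lhs, rhs)      # weighted form equals Euclidean form after transform
```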

In many policy-driven cases C is an intersection of simple sets (boxes, monotone cones, affine subspaces). Then one can avoid general-purpose QP solvers and use specialized projection algorithms such as Dykstra’s method or alternating projections, which are easy to implement, easy to audit, and often fast enough for real-time contract updates. The platform-facing implication is that a “projection-friendly” description of regulation (e.g., constraints that decompose across coordinates or across a small number of groups) is not merely a convenience; it is an enabler of frequent, transparent compliance updates.
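A sketch of Dykstra's method for such an intersection, here a box and a group-equality subspace, both of whose component projections are closed form; the bounds, group indices, and iteration count are illustrative.

```python
import numpy as np

def dykstra(z, projections, n_iter=200):
    """Euclidean projection onto an intersection of convex sets (Dykstra's method)."""
    x = np.asarray(z, dtype=float).copy()
    corrections = [np.zeros_like(x) for _ in projections]
    for _ in range(n_iter):
        for i, proj in enumerate(projections):
            y = proj(x + corrections[i])
            corrections[i] = x + corrections[i] - y  # per-set correction term
            x = y
    return x

proj_box = lambda v: np.clip(v, 0.0, 0.5)   # caps justified as risk controls

def proj_equal(v, group=(2, 3)):            # regulated equal-weight group
    out = v.copy()
    out[list(group)] = out[list(group)].mean()
    return out

beta_reg = dykstra(np.array([0.9, 0.1, 0.6, 0.2]), [proj_box, proj_equal])
```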

For general k-homogeneous costs, the regulated solution is typically computable but not closed form. The key object becomes a Bregman divergence Dψ(β, z) for a convex potential ψ derived from c* (for instance, ψ(β) = (k/(k − 1))c*(β) in separable power families). Implementationally, computing
$$ \arg\min_{\beta \in C} D_\psi(\beta, z) $$
is a convex optimization problem; hence the main questions are which convex solver to use and how to maintain reliability under changing data.

When C is polyhedral and ψ is smooth and strongly convex on ℝ++d, a robust approach is to run a mirror-descent (or dual-averaging) routine with projections onto C in an easy geometry. In the special but important separable case $c(a)=\sum_i \frac{1}{k}a_i^k$ (up to scaling), c*(β) is also separable and ∇c* is inexpensive to evaluate coordinatewise; then each iteration requires only evaluating ψ and projecting onto C (often Euclidean). This is precisely the situation in which Bregman methods shine: the “curvature” is absorbed into the mirror map rather than into the projection operator. Conversely, if c* is not separable or has a dense Hessian, we should expect heavier computation and should consider solving the convex program directly (e.g., via a second-order conic solver) at lower frequency, while interpolating contracts between solves.
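As a sketch of the separable case, the Bregman projection can be computed by projected gradient descent on β ↦ Dψ(β, z), whose gradient is ∇ψ(β) − ∇ψ(z); the potential ψ(β) = Σi βiq/q with q = k/(k − 1), the constant step size, and the small floor guarding the boundary are assumptions of this illustration.

```python
import numpy as np

def bregman_project(z, proj_C, q, step=0.1, n_iter=1000):
    """argmin_{beta in C} D_psi(beta, z) for separable psi(beta) = sum_i beta_i^q / q."""
    grad_psi = lambda v: np.maximum(v, 1e-12) ** (q - 1.0)  # coordinatewise gradient
    g_z = grad_psi(z)
    beta = proj_C(np.asarray(z, dtype=float).copy())
    for _ in range(n_iter):
        # gradient of the Bregman divergence in beta is grad_psi(beta) - grad_psi(z)
        beta = proj_C(beta - step * (grad_psi(beta) - g_z))
    return beta
```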

A limitation we emphasize in deployments is boundary behavior. Many ψ induced by c* become steep near βi ↓ 0, which is economically natural (zero incentives can collapse effort) but numerically tricky. In practice we can add a small lower bound β ≥ ε·1 if policy permits, or use barrier-aware solvers and explicit tolerances. The broader point is that the constraint set C should be chosen with both policy and numerics in mind: extremely sharp corners and near-infeasible equalities are a recipe for brittle optimization and thus brittle compliance.

Because learning hinges on IV/GMM relevance, we recommend treating instrument diagnostics as a first-class compliance object. Concretely, the platform can track the smallest singular value (or a regularized surrogate) of the accumulated first-stage matrix, e.g.,
$\lambda_{\min}\!\big(B_t^\top X_t\big)$  or  $\lambda_{\min}\!\big(\sum_{s \le t} \widetilde{x}_s \widetilde{x}_s^\top\big)$ in repeated-measurement designs.
These quantities can be logged in real time, thresholded, and audited ex post. If the diagnostic falls below a policy-set floor, the system can automatically trigger an “identification mode”: either lengthen exploration, widen the randomized support of contracts within C, or increase reliance on repeated-measurement instruments.
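A minimal monitor of this kind, tracking the smallest eigenvalue of the accumulated repeated-measurement design and returning a trigger flag; the class name and the floor convention are illustrative.

```python
import numpy as np

class RelevanceMonitor:
    """Track lambda_min of the accumulated design and flag weak identification."""
    def __init__(self, d, floor):
        self.M = np.zeros((d, d))
        self.floor = floor

    def update(self, x_tilde):
        self.M += np.outer(x_tilde, x_tilde)
        lam_min = np.linalg.eigvalsh(self.M)[0]   # eigvalsh sorts ascending
        # second element: True means enter "identification mode"
        return lam_min, lam_min < self.floor
```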

Validity is subtler than relevance. Using βt as an instrument rests on the exclusion restriction that βt affects yt only through effort at, not through direct channels (e.g., changing who participates, changing demand, or changing downstream moderation). While this is a modeling assumption, we can still impose operational checks: pre-register which product changes are allowed during learning windows; run placebo moments (does βt predict outcomes that should be unaffected?); and use holdout policies where some cohorts face fixed contracts to detect drift in the environment. These checks do not “prove” validity, but they create an auditable trail that the identifying assumptions were not obviously violated by concurrent interventions.

Repeated-measurement identification depends on conditional independence of measurement errors across channels. In platform terms, this is an engineering requirement: measurement pipelines should be de-correlated (different logging systems, different sampling noise, different post-processing) so that errors are not shared. When this is infeasible, we can sometimes construct approximately independent views by splitting traffic, time-slicing logs, or using separate raters. The cost is redundancy; the benefit is that the platform can learn while staying close to the best compliant contract, rather than paying an exploration tax.

Even when independence holds, instrumentation can be weakened by strategic behavior: if agents can differentially manipulate one measurement channel, the repeated-measurement IV becomes endogenous. This suggests an institutional complement to the formal algorithm: periodically rotate or refresh measurement channels, maintain tamper-evident logs, and audit for discontinuities around known thresholds. In our setting these are not merely “trust and safety” add-ons; they are part of the statistical apparatus that makes learning possible.

A practical deployment typically follows a loop: (i) fix a policy-derived C with a documented rationale and an auditable representation; (ii) run an estimator (IV/GMM or repeated-measurement IV) with explicit relevance diagnostics and regularization; (iii) map θ̂ to an implementable contract via a projection routine whose numerical tolerances are logged; and (iv) monitor both economic outcomes (payouts, participation) and identifying health (eigenvalues, residual moments). The central implementation message is that the constraint set does triple duty: C determines which contracts are legal, which directions are identifiable, and which projections are computationally stable. Designing C is therefore not a purely legal step; it is part of mechanism design, econometrics, and systems engineering all at once.


9. Extensions and discussion: heterogeneous k, delayed outcomes, partial participation; empirical pathways and what data platforms must collect.

Our baseline analysis isolates a clean geometric message: once policy constraints are encoded as a closed convex set C, the economic problem of ``regulated contract design’’ reduces to a projection (Euclidean under quadratic costs; Bregman in more general homogeneous technologies), and learning can be layered on top via IV/GMM without breaking compliance. In practice, three frictions are especially salient for platforms: (i) agents differ in their effective returns-to-scale parameter k (and more broadly in cost curvature); (ii) outcomes y arrive with delay; and (iii) participation is endogenous. Each friction preserves part of the structure, but each also changes what must be modeled, randomized, and logged.

A convenient interpretation of k is as an ``elasticity of marginal cost under scaling.’’ Empirically, this elasticity can differ across agents (creators with different production processes), across tasks (some tasks saturate quickly), or across time (seasonality, policy shocks). A simple way to capture this is to index costs by type τ, writing cτ as strictly convex, differentiable, and kτ-homogeneous. If the principal observes a type proxy s (e.g., a cohort label, tenure bin, or verified business status) and is permitted to condition contracts on it, then the regulated problem decomposes by segment: for each s we solve
$$ \max_{\beta \in C(s)} u_s(\beta) \;=\; \langle \theta_s^* - \beta,\, a_s(\beta)\rangle, \qquad a_s(\beta) = \nabla c_s^*(\beta), $$
and implement βreg(s) via the corresponding projection map (weighted in the quadratic benchmark, or Bregman in the general homogeneous case). Conceptually, regulation then acts separately within each segment, while heterogeneity is handled by conditioning.

When type is unobserved (or conditioning is restricted by policy), the platform effectively chooses a single β to maximize an objective 𝔼τ[uτ(β)]. Two points matter. First, the resulting objective typically remains concave in β under the same primitives (since it is an expectation of concave functions), so existence and uniqueness under compact C are not the obstacle; rather, the obstacle is that the simple scaling rule β = θ*/k no longer applies with a single k. Second, the welfare consequences of ``one-size-fits-all’’ incentives become legible: if high-k types are more costly to motivate, a contract calibrated to low-k types may induce excessive payouts without proportional benefit for high-k types, while a contract calibrated to high-k types may under-incentivize low-k types. In operational terms, this is a strong argument for (policy-permitting) coarse segmentation, because it can dramatically reduce the efficiency loss while preserving auditability.

A conservative alternative is robust design over a plausible set of curvatures, e.g. k ∈ [kmin, kmax]. In the unconstrained idealization one would choose β closer to θ*/kmax, reflecting the fact that over-incentivizing is costlier than under-incentivizing when the platform is uncertainty-averse about curvature. Under regulation, the same logic translates into choosing β as the maximizer of mink ∈ [kmin, kmax]uk(β) over C, which is still a tractable concave–convex saddle problem but typically requires numerical methods.
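A sketch of this max-min problem over a finite grid of curvatures, assuming separable power costs c(a) = Σi aik/k (so a(β) = β1/(k−1) coordinatewise) and an illustrative cap defining C; under these assumptions each uk is concave in β, so the grid max-min is a disciplined convex program.

```python
import cvxpy as cp
import numpy as np

theta_hat = np.array([1.0, 0.8, 0.5])   # estimated marginal values (illustrative)
cap = 0.6                               # policy cap defining C (illustrative)
beta = cp.Variable(theta_hat.size, nonneg=True)

utilities = []
for k in [2.0, 2.5, 3.0]:               # grid over plausible curvatures
    p = 1.0 / (k - 1.0)                 # induced effort: a_i(beta) = beta_i ** p
    # u_k(beta) = sum_i (theta_i - beta_i) * beta_i**p, concave for theta >= 0
    utilities.append(theta_hat @ cp.power(beta, p) - cp.sum(cp.power(beta, 1 + p)))

prob = cp.Problem(cp.Maximize(cp.minimum(*utilities)), [beta <= cap])
prob.solve()
beta_robust = beta.value
```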

Many economically meaningful outcomes are delayed: refunds arrive weeks later, long-run retention is only observed after a horizon, and quality audits take time. Delays do not change the static regulated optimum, but they do change the learning problem because the moment condition
𝔼 [βt(yt − ⟨θ*, xt⟩)] = 0
cannot be evaluated at time t if yt is not yet observed. The practical implication is that the platform must run a learning pipeline with asynchronous updates: contracts βt ∈ C are posted continuously, while the estimator of θ* is updated whenever outcomes mature. Algorithmically, this corresponds to online learning with delayed feedback; in regret bounds, one typically pays an additive (or multiplicative) penalty in the delay magnitude, reflecting that decisions are made using stale sufficient statistics.
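A sketch of such a pipeline with a fixed maturation delay: contracts are posted (and projected) every round, while IV moments are ingested only once outcomes mature; `post_contract` and `fetch_outcome` are hypothetical platform hooks, and the ridge seed is for early invertibility.

```python
import numpy as np
from collections import deque

def delayed_loop(post_contract, fetch_outcome, proj_C, d, T, delay, k=2.0):
    """Asynchronous learning with delayed outcomes; every posted contract is compliant."""
    pending = deque()                     # rounds awaiting a matured outcome
    G, h = 1e-3 * np.eye(d), np.zeros(d)
    theta_hat = np.zeros(d)
    for t in range(T):
        beta_t = proj_C(theta_hat / k)    # compliant even while statistics are stale
        x_t = post_contract(beta_t)
        pending.append((t, beta_t, x_t))
        while pending and pending[0][0] <= t - delay:
            s, beta_s, x_s = pending.popleft()
            y_s = fetch_outcome(s)        # outcome for round s has matured
            G += np.outer(beta_s, x_s)    # contract-as-instrument moments
            h += beta_s * y_s
            theta_hat = np.linalg.solve(G, h)
    return theta_hat
```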

Two mitigations are worth highlighting. First, many platforms observe intermediate proxies quickly (e.g. early engagement or complaint rates). Even if these are imperfect substitutes for y, they can serve as auxiliary moments or as triggers for safety constraints that shrink C temporarily (a ``circuit breaker’’) while waiting for final outcomes. Second, delays interact with compliance in an important way: when policy requires immediate enforcement (e.g. payout caps), we can remain compliant because projection depends only on the current θ̂, not on observing y instantly. The cost is slower adaptation, not violation of constraints.

Our baseline model implicitly assumes a fixed agent who always supplies effort. Platforms instead face endogenous participation responses: agents can choose whether to participate, how much to participate, and which tasks to attempt. A parsimonious extension adds an outside option r and a participation indicator p ∈ {0, 1}. The agent participates if the value of the effort problem exceeds r:
$$ p(\beta) \;=\; \mathbb{1}\Big\{ \max_{a \ge 0}\, \langle \beta, a \rangle - c(a) \;\ge\; r \Big\} \;=\; \mathbb{1}\{ c^*(\beta) \ge r \}. $$
This highlights an economically intuitive channel: increasing β raises not only induced effort a(β) but also the ``rent’’ c*(β), hence participation.
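A small illustration of this channel, assuming separable power costs c(a) = Σi aik/k, for which the rent is c*(β) = ((k − 1)/k) Σi βik/(k−1); a random outside option with a continuous CDF smooths the participation indicator into a probability (the exponential CDF below is purely illustrative).

```python
import numpy as np

def rent(beta, k):
    """Agent rent c*(beta) for separable power costs c(a) = sum_i a_i**k / k."""
    return (k - 1.0) / k * np.sum(beta ** (k / (k - 1.0)))

def participation_prob(beta, k, r_cdf):
    """Smoothed participation: P(r <= c*(beta)) for a random outside option r."""
    return r_cdf(rent(beta, k))

# illustrative outside option: exponential with mean 0.1
r_cdf = lambda v: 1.0 - np.exp(-v / 0.1)
p = participation_prob(np.array([0.4, 0.3]), k=2.0, r_cdf=r_cdf)
```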

Selection has two consequences. On the objective side, the principal now maximizes expected utility multiplied by participation, e.g. 𝔼[p(β)⟨θ* − β, a(β)⟩] (possibly minus fixed onboarding costs). Even if u(β) is concave, the product with a discontinuous (or sharply changing) p(β) can create non-smoothness and local kinks, making the geometry less projection-like unless one smooths participation (e.g. random r with a continuous distribution). On the identification side, using βt as an instrument becomes more delicate because βt changes who appears in the data. Then βt may correlate with unobserved determinants of yt through composition, violating the exclusion restriction in a literal sense. This is not a fatal flaw, but it forces an explicit design choice: either (i) model participation and estimate a joint system (participation equation plus outcome equation), or (ii) restrict attention to regimes where participation is stable (e.g. for incumbents with low churn), treating entry/exit as an additional outcome to be monitored rather than a confounder to be ignored.

A practical compromise is to treat participation as a first-class endogenous variable and to collect moments for it. For example, if pt is observed, one can augment the GMM system with instruments for participation (or impose a policy that randomizes a participation-neutral component of the contract), and test whether residual moments remain stable across participation strata. This does not eliminate selection, but it makes the modeling assumptions auditable.

These extensions make clear that the main empirical bottleneck is rarely the optimization routine; it is whether the platform has the right data to support identification under its own policy constraints. At minimum, the platform should log: the posted contract βt and the exact constraint regime in force (a versioned representation of C); the raw task signals xt (ideally at the most disaggregated level, before post-processing); the realized outcome yt with timestamps to measure delays; and agent identifiers so repeated observations can be linked. To address heterogeneity, the platform must record the segmentation variables it is allowed to condition on, as well as protected-class indicators when fairness constraints depend on them (with appropriate governance). To address participation, it must log offer exposure and acceptance (who saw which βt, who opted in, who churned), not just outcomes for participants. To address delayed outcomes, it must log maturation windows and backfills (late-arriving y updates), since these determine which observations are ``final’’ at any point in time.

Finally, if the platform relies on IV variation induced by randomized or quasi-random changes in βt, it must log the randomization protocol (seed, assignment unit, stratification variables) and all concurrent product interventions that could create direct channels from βt to yt. Without this provenance, one can compute an estimator, but one cannot defend the identifying assumptions when policy or litigation demands an explanation.

We view the projection characterization as a robust organizing principle, not as a claim that linear contracts and homogeneous costs are empirically exact. Heterogeneous curvature, delays, and selection already push us toward richer models (contextual contracts, delayed-feedback learning, joint participation–outcome systems). The remaining challenge is to expand expressiveness while preserving the two deployment virtues that motivated linearity in the first place: transparency (auditors can read β) and mechanical compliance (every update lies in C). Piecewise-linear or groupwise-linear contracts, and constrained contextual policies β(s) ∈ C(s), are promising intermediate steps: they enlarge the feasible policy class while keeping optimization convex and enforcement straightforward.


10. Conclusion.

We began from a practical tension that many platforms and regulators now face. On the one hand, the platform would like to tailor incentives across many measurable dimensions of performance. On the other hand, policy constraints—caps, monotonicity requirements, groupwise equalities, and other notions of ``do no harm’’—restrict how contracts may depend on those dimensions. Our goal was to show that this tension has a clean economic resolution once we adopt two modeling commitments that are both analytically transparent and empirically testable: linear contracts over observable signals, and convex restrictions on the contract weights.

The first takeaway is geometric. When the agent’s technology is well behaved—strictly convex and homogeneous in effort—the principal’s problem of regulated contract design can be reorganized around a single object: the projection of the unconstrained benchmark onto the feasible policy set. In the canonical quadratic-cost benchmark, the platform’s objective reduces to a concave quadratic in the contract weights, and regulation acts by mapping the unconstrained target θ*/2 to the nearest feasible point in a weighted norm:
$$ \beta^{\mathrm{reg}} \;=\; \mathrm{Proj}_C^Q(\theta^*/2). $$
We emphasize the economic interpretation behind this algebra. Regulation does not introduce a qualitatively new motive; it changes which deviations from the first-best are permitted. The projection representation makes this explicit: the feasible set C encodes compliance, while the metric induced by Q encodes how strongly incentives translate into effort along each task dimension. A cap, for example, prevents movement beyond a bound in one direction; an equality constraint collapses two directions into one; a monotonicity constraint restricts movement to a cone. The optimal regulated contract is simply the closest compliant point to the benchmark in the technology-relevant geometry.

The second takeaway is conceptual rather than algebraic: the same ``projection logic’’ survives beyond quadratic costs. For general k-homogeneous costs, the principal no longer projects in Euclidean space. Instead, the agent’s technology induces a dual geometry through the convex conjugate c*, and the regulated optimum can be expressed (under standard regularity) as a Bregman-type projection onto C around the unconstrained target θ*/k. Even when this characterization does not collapse to a closed form, it still turns regulated contract design into a disciplined convex program: we can compute βreg reliably, and we can interpret binding constraints through KKT conditions in exactly the way auditors and policy teams need (which constraint binds, in which direction, and with what shadow value). In short, moving away from quadratic costs changes the geometry, not the tractability.

The third takeaway concerns identification and learning. In many applied settings the platform does not know the marginal values θ*, and the observable task signals are themselves noisy or strategically distorted. The central empirical message is that policy-compliant optimization and causal learning can be layered. When effort is latent and signals are measured with error, naive regression of outcomes on signals fails. But the platform’s own contract variation provides an instrument: under the maintained exclusion (that βt affects yt only through induced effort), the moment condition
𝔼 [βt(yt − ⟨θ*, xt⟩)] = 0
enables consistent estimation of θ* via IV/GMM. Once θ̂ is in hand, compliance is restored mechanically by projecting the plug-in unconstrained contract back into C. This separation is operationally valuable: it allows an experimentation or inference team to focus on identification and variance control, while a policy/compliance team can insist that every posted contract satisfy constraints by construction.

The projection step is not merely cosmetic. Because projection onto a closed convex set is nonexpansive (in the appropriate norm), it provides a stability guarantee: estimation error in θ̂ does not get amplified into larger policy error in β̂. This is the key reason that we can obtain regret bounds in online learning that look familiar—on the order of $\widetilde{O}(d\sqrt{T})$ under standard instrument-strength conditions—even though we require every action to remain in C at every time. The algorithmic content is modest (explore to ensure relevance, estimate by IV, project to enforce compliance), but the modeling implication is substantial: learning and compliance need not be at odds, as long as the platform is willing to represent policy as convex restrictions in contract space.

A useful way to summarize the paper is to separate three layers of structure that are often conflated. The first layer is the technology, summarized by c (or Q in the quadratic benchmark), which determines how incentives translate into effort. The second layer is the policy, summarized by the feasible set C, which determines what the platform is allowed to do. The third layer is the statistical environment, summarized by the signal and outcome equations and the strength of available instruments, which determines what the platform can learn. Our results show that these layers interact in a modular way: c determines the metric (Euclidean or Bregman), C determines the projection target, and the data determine how quickly we approach that target.

For practice, this modularity clarifies what must be specified—and defended—when the platform claims to be optimizing subject to regulation. The platform must (i) state the constraint set C precisely enough that an auditor could verify membership, (ii) justify a mapping from contract weights to induced effort that is consistent with observed behavior (or at least robust to misspecification), and (iii) document the source of identifying variation used to estimate θ*, including what is randomized, what is held fixed, and what concurrent interventions might violate exclusion. The value of the projection framing is that it forces these commitments into explicit objects. A compliance review can ask ``what is $C$ and how is it enforced?’’ An economics review can ask ``what is the implied responsiveness matrix or conjugate geometry?’’ An inference review can ask ``how strong are the instruments and how stable are the moments?’’ These questions are separable, and the answers can be logged and versioned.

At the same time, it is important to be clear about what the framework does not resolve. Linear contracts are attractive because they are transparent, easy to compute, and easy to enforce, but they are not always behaviorally or institutionally adequate. The assumption that β affects outcomes only through effort is likewise a modeling claim, not a theorem; it can fail if, for instance, incentives change composition, induce gaming that directly impacts measured outcomes, or interact with platform-side ranking and matching rules. Homogeneity is a tractable abstraction that captures returns to scale in a parsimonious way, yet real production systems may feature fixed costs, discontinuities, and complementarities that violate it. Our view is that the contribution here is not to deny these complexities, but to provide a baseline in which the economics of regulation, the mechanics of compliance, and the statistics of learning can be analyzed jointly and cleanly.

Several research directions follow naturally. One is to expand the contract class while preserving convex computability and mechanical compliance—for example, piecewise-linear weights, groupwise-linear rules, or constrained contextual policies that map observable covariates into a feasible set C(s). Another is to integrate richer dynamics: reputation and relational incentives, intertemporal substitution of effort, and delayed or multi-stage outcomes that require explicit state variables. A third is to treat fairness constraints not only as restrictions on β but also as restrictions on induced allocations and welfare, which may require coupling the projection step with equilibrium participation and matching. Each direction raises new identification challenges, but the organizing principle remains the same: represent policy as a feasible region, represent behavior as a response map, and ensure learning uses variation whose provenance can be explained.

The broader lesson is that regulation changes the geometry of optimization, not the need for optimization. Once we accept that compliance must be enforced at the level of deployed policies, the right question is not whether the platform can optimize, but whether it can do so in a way that is transparent, auditable, and statistically grounded. In our setting, convex constraints and projection-based updates provide a direct affirmative answer. The remaining work is to bring richer empirical content to the primitives—to measure responsiveness, to validate exclusion, to characterize heterogeneity and selection—while preserving the two deployment virtues that linear, constrained contracts deliver: every policy is interpretable, and every policy is compliant by construction.