Many of the most consequential compensation schemes today are designed not by a single employer and a single worker, but by platforms that mediate millions of transactions and that continuously revise their rules. Drivers are paid through formulas that combine acceptance rates, completion rates, and passenger ratings; creators are rewarded through mixtures of watch time, engagement, and policy compliance; sellers face fee schedules that depend on shipping speed and customer satisfaction. In each of these settings, the platform does not merely ``measure'' performance: it chooses which dimensions are rewarded and by how much. The modern policy challenge is that these incentive weights are rarely unconstrained. Platforms face regulatory caps, transparency requirements, and internal commitments to fairness and interpretability, and they must enforce these constraints while still eliciting productive effort across multiple tasks.
We study this tension in a multitask principal–agent model tailored to the platform context. The economic force is familiar: whenever one activity is easier to measure than another, or whenever some behaviors are rewarded and others are not, the agent reallocates effort in response. What is distinctive in platform environments is the combination of (i) high-dimensional, noisy measurement; (ii) limited liability and related restrictions that prevent punitive ``negative'' incentives; (iii) explicit constraints on the incentive weights arising from law, product design, or reputational considerations; and (iv) the fact that the platform often does not know the true marginal value of each task ex ante and must learn it from data generated under its own past contracts. Our goal is to provide a unified, tractable description of optimal contract design under such constraints and of how a platform can learn to implement it over time.
We focus on linear contracts—payments that are weighted sums of
observable task signals—not because linearity is always optimal in a
frictionless environment, but because it is the natural language of
implementable, auditable policy in large-scale systems. Linear rules are
simple to communicate and to verify; they fit the engineering reality
that platforms often score participants using a finite set of metrics;
and they interact cleanly with institutional constraints such as caps
(``no more than \$X per unit''), monotonicity (``do not
penalize higher output''), and parity constraints (``equalize weights
across protected groups or across task categories''). Moreover, linear
contracts provide a disciplined benchmark for asking how far regulation
pushes a platform away from the unconstrained incentive scheme, and for
quantifying the welfare cost of compliance.
Our approach treats regulation and platform commitments as a set of feasible incentive weights. This is intentionally broad. A cap on bonuses corresponds to upper bounds on weights; limited liability corresponds to nonnegativity; transparency and interpretability often lead to sparsity or simple shape constraints; and fairness policies can impose linear equalities or inequalities that couple weights across tasks. A key modeling choice is to encode these requirements as a closed, convex set of admissible weights. Convexity reflects the idea that mixtures of compliant policies remain compliant, and it makes the problem computationally meaningful: a platform can search over the feasible set using standard convex optimization tools. At the same time, this abstraction forces us to confront an economic question: when the feasible set is binding, what form does the optimal distortion take? Does regulation generate qualitatively new incentive patterns, or does it act as a predictable ``truncation'' of an otherwise natural contract?
The first insight we develop is that, absent regulation, a large class of multitask technologies leads to a strikingly simple incentive rule. When the agent’s cost of effort exhibits a common degree of homogeneity across tasks—capturing the idea that scaling up all activities becomes disproportionately more expensive when effort is convex—the platform’s optimal linear weights are proportional to the marginal values of the tasks. The proportionality factor is determined by the curvature of the technology: stronger convexity translates into weaker incentives, uniformly across dimensions. This result formalizes an intuitive principle that is often invoked informally in platform debates: if every task becomes ``hard'' at the same rate as the agent scales effort, then the platform need not differentially shade incentives across tasks in order to manage distortions; the optimal policy is to scale down the entire vector of marginal values.
Regulation breaks this proportionality, but not in an arbitrary way. Our second insight is that, in an important benchmark with quadratic costs, the regulated optimum can be characterized as a geometric projection of the unconstrained ideal onto the feasible set. In this case, compliance does not require re-solving the full principal–agent problem from scratch; instead, the platform can interpret regulation as choosing the closest compliant vector of weights to the unconstrained target, where ``closeness'' is measured in a metric induced by the effort technology. This projection perspective is useful for two reasons. Economically, it clarifies how different constraints bind: a cap clips extreme weights; parity constraints average weights across constrained dimensions; monotonicity restricts the direction of adjustments. Computationally, it reduces contract design to a standard convex program with a unique solution, making comparative statics transparent and implementation feasible at scale.
Beyond the quadratic benchmark, the same geometric idea persists, but the relevant notion of distance changes. For general homogeneous and strictly convex effort costs, the distortion created by regulation is naturally measured not by an ordinary Euclidean norm but by a divergence induced by the agent’s technology. In other words, the ``projection'' occurs in a geometry determined by how incentives translate into effort. This matters in practice: in some environments, increasing a weight by a given amount changes behavior much more in one task than in another, and it is precisely this asymmetry that should govern how the platform trades off compliance against performance. Our framework accommodates this by linking the regulated optimum to a Bregman-type projection, which remains computationally tractable even when closed forms are unavailable.
A third motivation for our analysis is learning. Platforms rarely know the true marginal value of each task—for example, how much an additional unit of response speed improves retention, or how much additional content moderation reduces long-run churn. These values must be inferred from data, but the data are endogenous: the platform changes incentives, agents change behavior, and observed task signals are noisy and often incomplete. In such settings, naive regression of outcomes on observed signals can be badly biased because the signal is measured with error and because the platform’s own choice of incentives affects both the level and composition of effort. We therefore take seriously the econometric problem faced by a platform that must learn θ* from data generated under its own past contracts.
Our learning results use a simple principle: posted incentive weights themselves can serve as instruments for the unobserved effort that drives outcomes. Because the platform chooses the weights and observes the resulting noisy signals and outcomes, it can form moment conditions that remain valid despite measurement error in signals. This yields an IV/GMM approach to estimating task values from observational data generated by past contracts. Importantly, once we estimate task values, we can enforce regulatory constraints by projecting the estimated ideal contract onto the feasible set. This plug-in projection step is stable: small estimation errors in values translate into proportionally small errors in the implemented contract, so compliance does not amplify statistical noise.
Finally, we connect these pieces in an online setting where the platform repeatedly interacts with agents and updates its contract over time. Here the platform faces the classic exploration–exploitation tradeoff, complicated by compliance: exploration must itself remain within the regulated set of contracts. We show that a simple explore-then-commit approach, combined with IV estimation and projection, achieves sublinear regret, and that the projection step preserves rates because it is nonexpansive in the relevant metric. In environments with repeated measurements and sufficient ``diversity'' in signals, strong identification can arise even without deliberate experimentation, implying that compliance and learning need not be in tension when the data-generating process is rich enough.
Throughout, we emphasize what our model does and does not capture. By restricting attention to linear contracts, we abstract from potentially welfare-improving nonlinearities and from richer dynamic incentive schemes. By focusing on a representative agent technology, we suppress heterogeneity that may be central in labor markets and creator ecosystems. These choices are deliberate: our aim is to isolate how a platform’s choice of incentive weights is shaped by convex effort costs, regulatory constraints, and learning under measurement error. The resulting framework yields a set of sharp predictions—notably, proportional incentives without regulation and projection-style distortions with regulation—that are both interpretable for policy analysis and actionable for mechanism design in large-scale systems. In the next section, we lay out the baseline multitask model and the equilibrium relationship between incentive weights and induced effort, which will allow us to formalize these ideas.
We begin with a static multitask environment that isolates the
mapping from incentive weights to induced effort. There are d
tasks, indexed by i = 1, …, d. A
representative agent chooses an effort vector a ∈ ℝ+d,
where ai
denotes effort allocated to task i. We interpret the nonnegativity
constraint as a limited-liability or ``no sabotage'' restriction: the
agent can increase productive activities but cannot be compelled to
exert negative effort on any task. The agent incurs cost $c(a)$, where
$c:\mathbb{R}^d_{+}\to \mathbb{R}_{+}$ is strictly convex and
differentiable. Our central technological assumption is homogeneity:
there exists $k>1$ such that
$$
c(\rho a) \;=\; \rho^{k}\, c(a)\qquad \text{for all } \rho>0 \text{ and } a\in \mathbb{R}^d_{+}.
$$
This captures the idea that scaling up all activities becomes
increasingly expensive at a common rate. The parameter $k$ is a
reduced-form measure of curvature (or ``returns to scale'' in
disutility): higher k means marginal costs rise faster
when we proportionally expand effort in all dimensions.
The principal (platform) does not observe a directly. Instead it observes a
vector of noisy task signals x ∈ ℝd:
x = a + ε, 𝔼[ε ∣ a] = 0,
so that 𝔼[x ∣ a] = a.
Signals may represent measured completion rates, engagement counts, or
other operational metrics; the key friction is that they are imperfect
proxies for true effort.
The principal derives value from the agent’s (unobserved) effort
through an outcome y ∈ ℝ
satisfying
𝔼[y ∣ a] = ⟨θ*, a⟩,
where θ* ∈ ℝ+d
is a vector of marginal values of each task to the principal. In the
static design problem of this section, we treat θ* as known. (In later
sections, we allow θ* to be unknown and
learned from data generated under the principal’s own contracts.)
The principal restricts attention to linear contracts of the form
w(x) = ⟨β, x⟩, β ∈ ℝ+d.
Nonnegativity of β is a
natural limited-liability counterpart on the contract side: the platform
can reward measured performance but does not impose explicit per-unit
penalties. Because 𝔼[x ∣ a] = a, the
agent’s expected payment under effort a is 𝔼[w(x) ∣ a] = ⟨β, a⟩;
thus, additive measurement noise affects realized payments but not the
agent’s incentives.
Given β, the agent chooses
a to maximize expected
utility
UA(a; β) = 𝔼[w(x) ∣ a] − c(a) = ⟨β, a⟩ − c(a) subject
to a ∈ ℝ+d.
Strict convexity of c implies
that the agent’s problem has a unique solution, which we denote by a(β). The necessary and
sufficient optimality conditions take the Kuhn–Tucker form
$$
\nabla c(a(\beta)) \;\ge\; \beta, \qquad a(\beta) \;\ge\; 0, \qquad a(\beta)\odot\big(\nabla c(a(\beta))-\beta\big) \;=\; 0,
$$
where ⊙ denotes componentwise
multiplication. When the interior condition a(β) ∈ ℝ++d
holds, these reduce to the familiar first-order condition ∇c(a(β)) = β:
incentives equal marginal costs, task by task.
It is often useful to represent the best response via convex duality.
Let c* denote the
convex conjugate,
c*(β) = supa ≥ 0{⟨β, a⟩ − c(a)}.
Under differentiability and regularity, the envelope theorem implies
a(β) = ∇c*(β)
on the region where the maximizer is interior. This dual viewpoint will
later be convenient for computation and for understanding how the
geometry of the technology shapes constrained optima.
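For concreteness, the dual representation can be checked numerically. The sketch below is our own illustration, assuming the separable power cost $c(a)=\sum_i a_i^k/k$ (for which $\nabla c^*(\beta)$ has the closed form $\beta_i^{1/(k-1)}$); it compares direct maximization of the agent's utility against the conjugate-gradient formula.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative sketch: verify a(beta) = grad c*(beta) for the separable
# power cost c(a) = sum_i a_i^k / k, whose conjugate gradient is
# a_i(beta) = beta_i^(1/(k-1)).
k = 3.0
beta = np.array([0.5, 1.0, 2.0])

def neg_agent_utility(a):
    # The agent maximizes <beta, a> - c(a); we minimize the negative.
    return -(beta @ a - np.sum(a ** k) / k)

res = minimize(neg_agent_utility, x0=np.ones(3), bounds=[(0.0, None)] * 3)
print(res.x)                      # numerical best response a(beta)
print(beta ** (1.0 / (k - 1.0)))  # closed-form grad c*(beta); should agree
```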
The principal chooses β
anticipating the agent response. Since 𝔼[y ∣ a] = ⟨θ*, a⟩
and 𝔼[w(x) ∣ a] = ⟨β, a⟩,
the principal’s expected utility under β is
u(β) = ⟨θ*, a(β)⟩ − ⟨β, a(β)⟩ = ⟨θ* − β, a(β)⟩.
This expression highlights the basic wedge: raising β increases effort (beneficial) but
also increases expected transfers (costly), and both effects operate
through the same induced action a(β). In the absence of
additional restrictions on β,
the principal solves
maxβ ∈ ℝ+du(β).
The purpose of this section is to characterize the solution to this
unconstrained benchmark and to interpret its structure.
Our homogeneity assumption implies a particularly sharp result: the
principal optimally sets incentives proportional to the value vector
θ*, with a single
proportionality constant determined by the curvature parameter k. Formally, when the feasible set
is all of ℝ+d, the
unique optimal linear contract is
$$
\beta^* \;=\; \frac{1}{k}\,\theta^*.
$$
The result is striking because it holds in high dimensions and does not
depend on the correlation structure of the measurement noise ε: only the mean condition 𝔼[x ∣ a] = a
matters for expected incentives.
We can see the economic logic most transparently by rewriting the
principal’s problem in terms of the action a that the contract induces. When
the agent chooses an interior effort vector, implementability is
summarized by β = ∇c(a).
Substituting into the principal’s payoff yields
u(β) = ⟨θ* − ∇c(a), a⟩.
Now apply Euler’s theorem for k-homogeneous functions: for
differentiable c,
⟨∇c(a), a⟩ = k c(a).
Therefore, the principal’s payoff from inducing action a can be written as
⟨θ*, a⟩ − k c(a).
This is a concave maximization over a because c is convex. The first-order
condition for the optimal induced action a* is
θ* = k ∇c(a*).
Recalling that the incentive vector required to implement a* is β* = ∇c(a*),
we obtain the proportionality rule β* = θ*/k.
Strict convexity of c pins
down a* uniquely
and thus yields uniqueness of β*.
The proportionality factor 1/k admits a simple interpretation.
The technology parameter k
summarizes how quickly marginal costs grow when we scale effort. A
larger k means that inducing
additional effort is more expensive at the margin, and the principal
responds by uniformly tempering incentives. Importantly, the scaling is
uniform: unconstrained optimality does not require tilting incentives away from
any particular dimension, even though effort allocation is
multidimensional. Instead, the principal ``shrinks'' the entire vector
of task values, preserving relative weights:
$$
\frac{\beta^*_i}{\beta^*_j} \;=\; \frac{\theta^*_i}{\theta^*_j}\qquad
\text{for all } i,j.
$$
In this sense, the benchmark provides a discipline for discussing
distortions: any deviation from proportional weights must come from
constraints on β (or from
failures of the common-homogeneity assumption), not from multitask
considerations per se.
A familiar example illustrates the role of k. If costs are separable power functions, $c(a)=\sum_{i=1}^d a_i^k/k$, then ∇c(a) = (a1k − 1, …, adk − 1), and β = ∇c(a) implies ai(β) = βi1/(k − 1). The principal’s problem aggregates these responses, yet the optimal contract still collapses to β* = θ*/k. Thus the result is not an artifact of quadratic structure; it is a consequence of homogeneity and convexity.
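As a numerical check of the proportionality rule (our own sketch, with hypothetical values of θ* and k), one can maximize u(β) = ⟨θ* − β, a(β)⟩ directly for the separable power cost and compare the optimizer to θ*/k.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: numerically recover beta* = theta*/k for c(a) = sum_i a_i^k / k,
# where the agent's best response is a_i(beta) = beta_i^(1/(k-1)).
k = 3.0
theta = np.array([1.0, 2.0, 4.0])

def neg_principal_utility(beta):
    a = beta ** (1.0 / (k - 1.0))  # agent best response
    return -((theta - beta) @ a)   # u(beta) = <theta* - beta, a(beta)>

res = minimize(neg_principal_utility, x0=np.full(3, 0.1),
               bounds=[(1e-9, None)] * 3)
print(res.x)       # numerical optimum
print(theta / k)   # proportionality rule beta* = theta*/k
```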
Two limitations are worth flagging before we move to regulated design. First, although noise does not affect expected incentives in our formulation, it can matter for realized payments and risk; we abstract from risk aversion and participation constraints to focus on incentive weights and compliance. Second, the unconstrained optimum β* = θ*/k relies on the availability of all nonnegative weights. In practice, platforms often face caps, monotonicity, parity, or other restrictions that couple components of β and prevent exact proportionality. The next step is therefore to introduce a general convex feasible set of policies and to study how the optimal contract is distorted relative to this benchmark.
The proportionality rule β* = θ*/k
is a useful benchmark, but it is often infeasible once we incorporate
policy or operational constraints on how incentive weights may vary
across tasks. We therefore model regulation by restricting the feasible
contracts to a set
C ⊆ ℝ+d,
interpreted broadly to include hard legal constraints (e.g., caps on
piece rates), platform policy (e.g., monotone reward schedules across
tiers), and fairness or compliance requirements (e.g., equal treatment
across protected groups). Throughout, we assume that C is nonempty, closed, and convex. Convexity is not merely
technical: it is a natural reduced form for constraints that are
implemented as mixtures of admissible policies (randomization or
averaging across business units), and it is the key property that will
later allow us to characterize the regulated optimum as a projection in
an appropriate geometry.
Given a feasible set C, the
principal’s static design problem becomes
$$
\max_{\beta\in C}\ u(\beta) \;=\; \langle \theta^*-\beta,\ a(\beta)\rangle,
$$
where a(β) is the
agent’s best response from the previous subsection. When the agent’s
optimum is interior, we can equivalently write a(β) = ∇c*(β)
and hence
u(β) = ⟨θ* − β, ∇c*(β)⟩.
This representation emphasizes that the regulated problem is an optimization over β alone, but with curvature and cross-task
interactions inherited from the technology c (via its conjugate c*). In general, the
solution need not preserve the benchmark proportionality across tasks:
constraints can force the principal to reallocate incentive intensity
across dimensions, creating distortions that are entirely attributable
to feasibility rather than to the multitask nature of the agent’s choice
problem per se.
Many real constraints take the form of linear inequalities and equalities, which generate closed convex sets automatically. More generally, convexity captures the idea that if two contracts are compliant, then any convex combination is also compliant (e.g., a regulator may approve either of two pay schedules; a platform can interpolate between them by scaling and mixing, or by implementing them for different subpopulations and averaging over time). From a computational standpoint, convexity will allow us to treat the design problem using standard tools: KKT conditions, projection operators, and (in the learning section) nonexpansive mappings that preserve estimation rates.
Closedness ensures the optimum is not ``lost at the boundary'' through limit points that are infeasible. Nonemptiness simply rules out inconsistent regulation. When C is also compact, existence of βreg is immediate by continuity of u(β) (which holds under mild regularity inherited from c). If C is unbounded, existence may still hold because the objective typically becomes unfavorable for very large weights (the principal pays for induced effort), but the exact conditions depend on the growth of c; we return to concrete sufficient conditions in the quadratic and separable-power cases where coercivity is transparent.
To make this concrete, it is helpful to list a few constraint classes that appear repeatedly in applications. All examples below are subsets of ℝ+d and are convex and closed.
A common regulatory intervention is to cap piece rates to limit
``over-incentivization'' on any single metric:
$$
C_{\mathrm{cap}} \;=\;\{\beta\in\mathbb{R}^d_{+}:\ 0\le \beta_i \le
\overline{\beta}_i \ \forall i\}.
$$
Caps may reflect consumer protection (e.g., limits on how strongly a
platform may steer sellers via fees), labor regulation (limits on
penalties or extreme bonuses), or internal policy (avoid gaming a
particular metric). Floors can be handled similarly and are useful when
some baseline incentive must be maintained for safety or quality
tasks.
Platforms sometimes restrict the overall ``strength'' of incentives
because of budget, risk, or volatility concerns. A simple convex form is
an ℓ1 bound,
Cbudget = {β ∈ ℝ+d: ∥β∥1 ≤ B},
which limits total marginal payment per unit of measured output across
tasks. Alternative convex choices include ∥β∥2 ≤ B
(stability in Euclidean norm) or weighted norms that encode
heterogeneous compliance costs across tasks.
When tasks correspond to ordered categories (e.g., seniority tiers,
quality levels, or funnel stages), a regulator or platform policy may
require monotone incentives:
Cmono = {β ∈ ℝ+d: β1 ≥ β2 ≥ ⋯ ≥ βd}.
More generally, if tasks are nodes in a partially ordered set (e.g.,
content categories with an ``at least as safe'' ordering), we can
impose $\beta_i \ge \beta_j$ for specified pairs $(i,j)$. These are
linear inequalities, hence convex. Such constraints formalize ``no
perverse ranking'' requirements: higher-quality or safer actions should
not be incentivized less than lower-quality ones.
Fairness and compliance policies often require that certain tasks
receive identical incentive weights, e.g., because they correspond to
the same underlying activity measured in different ways, or because they
proxy for attributes where differential incentives are disallowed. Let
𝒢1, …, 𝒢m be a
partition of tasks into groups that must be treated equally. Then
Cparity = {β ∈ ℝ+d: βi = βj
whenever
i, j ∈ 𝒢ℓ, ℓ = 1, …, m}
is an affine subspace intersected with ℝ+d, hence
closed and convex. Economically, these constraints induce ``pooling'' of
weights and thus force the platform to sacrifice fine task-by-task
tailoring.
A softer alternative to equality is to bound relative differences:
$$
\beta_i \le r\,\beta_j \quad \text{and/or}\quad \beta_i \ge
\underline{r}\,\beta_j,
$$
for specified pairs (i, j) and constants r ≥ 1, $\underline{r}\in(0,1]$. These are still
linear inequalities (e.g., βi − rβj ≤ 0),
hence convex. Ratio bounds capture policies like ``no task may be
incentivized more than twice as strongly as a safety task'' or
``weights across demographic proxies cannot differ by more than a fixed
multiplicative factor.''
In many organizations, a policy team may be allowed to adjust incentives
only within a limited distance of a previously approved schedule β0:
Cstable = {β ∈ ℝ+d: ∥β − β0∥ ≤ δ}.
Such sets are convex balls intersected with ℝ+d. They
formalize a form of regulatory inertia or internal governance that
prevents abrupt changes which might induce manipulation or
distributional shocks.
Most of the preceding examples can be written compactly as
C = {β ∈ ℝ+d: Aβ ≤ b, Eβ = f},
for appropriate matrices A, E and vectors b, f. This form is flexible
enough to encode caps, monotonicity, parity, ratio bounds, and many
budget constraints (after introducing slack variables if needed). It
also clarifies how different policy requirements interact: combining
constraints corresponds to intersecting convex sets, preserving
closedness and convexity. In particular, adding regulation can only
(weakly) reduce the principal’s maximum value, and it can only (weakly)
increase the ``distance'' between the regulated optimum and the
unconstrained benchmark in the sense that β* may lie outside C.
The regulated problem highlights a disciplined notion of distortion. In the benchmark, the platform would like to set β proportional to θ*, uniformly scaled by 1/k. Regulation restricts the feasible directions in which β can move, and thus forces the platform to select the closest feasible alternative to that proportionality rule given the agent’s technology. Which tasks become under- or over-incentivized depends jointly on (i) which constraints bind and (ii) how incentives translate into effort via a(β). This is precisely why it is valuable to maintain an explicit model of behavior rather than treating a(β) as an exogenous response curve.
At this level of generality, the regulated problem is a constrained nonlinear program. We
can still characterize βreg by first-order
optimality conditions in the usual variational-inequality form: at an
optimum,
⟨∇u(βreg), β − βreg⟩ ≤ 0 for
all β ∈ C,
whenever u is differentiable
at βreg. However,
this characterization is not yet operational until we specify how ∇u behaves, and, crucially, whether
u is concave so that these
conditions are sufficient. The next subsection therefore specializes to
a quadratic benchmark in which a(β) becomes linear and the
constrained optimum admits a sharp geometric interpretation as a
weighted projection onto C.
We now specialize to the benchmark in which the agent’s technology is
quadratic:
$$
c(a) \;=\; \tfrac{1}{2}\, a^\top Q^{-1} a, \qquad Q \succ 0.
$$
This assumption is not meant to be universally descriptive; rather, it
isolates a particularly transparent case in which the mapping from
incentives to effort is linear, and the principal’s constrained design
problem reduces to a familiar geometric operation. In applications, the quadratic form can
be viewed as a second-order (local) approximation to a smooth cost
function, or as the exact form arising from Gaussian perturbations of
effort around a baseline in certain continuous-control models.
Fix a feasible contract β ∈ C ⊆ ℝ+d.
Ignoring the nonnegativity constraint on a for a moment, the agent
solves
$$
\max_{a\in\mathbb{R}^d}\ \langle \beta,a\rangle-\frac{1}{2}a^\top
Q^{-1}a,
$$
with first-order condition β − Q−1a = 0,
hence
$$
a(\beta) \;=\; Q\beta.
$$
When we impose a ∈ ℝ+d,
the exact solution is characterized by the usual complementarity
conditions. A sufficient condition for the interior solution a(β) = Qβ to remain valid is that Qβ ≥ 0 for all β ∈ C (for example, when
Q is diagonal with positive
entries, or more generally when Q maps ℝ+d into
itself). Because the regulated-design insights we emphasize concern how
constraints on β distort
incentives, we proceed under this maintained condition, so that induced
effort is given by a(β) = Qβ throughout the feasible region.
Economically, Q summarizes how responsive each task’s effort is to marginal incentives, allowing for cross-task spillovers. If Q is diagonal, tasks respond independently. If Q has off-diagonal terms, incentives on one task can induce effort shifts in others, a reduced-form way to capture complementarities or joint production.
Substituting a(β) = Qβ into the principal’s expected utility gives
$$
u(\beta) \;=\; \langle \theta^*-\beta,\ Q\beta\rangle \;=\; (\theta^*)^\top Q\beta \;-\; \beta^\top Q\beta.
$$
Since Q ≻ 0, the term β⊤Qβ is
strictly convex, so u(β) is a strictly concave
quadratic function of β. The
key step is to rewrite u(β) by completing the square:
$$
(\theta^*)^\top Q\beta-\beta^\top Q\beta
\;=\;-\Big(\beta-\frac{\theta^*}{2}\Big)^\top
Q\Big(\beta-\frac{\theta^*}{2}\Big)
\;+\;\frac{1}{4}(\theta^*)^\top Q\theta^*.
$$
Define the Q-weighted norm
∥v∥Q2 := v⊤Qv.
Then
$$
u(\beta) \;=\; \Big\|\frac{\theta^*}{2}\Big\|_Q^{2} \;-\; \Big\|\beta-\frac{\theta^*}{2}\Big\|_Q^{2}.
$$
Because the first term does not depend on β, maximizing u(β) over a feasible set
C is equivalent to minimizing
a weighted squared distance to the target vector θ*/2.
This identity implies the following sharp characterization: the regulated
optimum is the Q-weighted
Euclidean projection of θ*/2 onto C,
$$
\beta^{\mathrm{reg}} \;=\; \mathrm{Proj}^{Q}_{C}\Big(\frac{\theta^*}{2}\Big) \;:=\; \arg\min_{\beta\in C}\ \Big\|\beta-\frac{\theta^*}{2}\Big\|_Q^{2}.
$$
Uniqueness follows immediately because ∥β − θ*/2∥Q2
is strictly convex in β and
C is convex and closed. Two
economic points are worth emphasizing.
First, regulation does not create a new ``first-best'' target; the unconstrained benchmark still points to θ*/2, but feasibility forces the principal to choose the closest admissible contract in a geometry dictated by Q. In this sense, constraints distort incentives only through a geometric truncation/averaging operation.
Second, the welfare loss from regulation admits a literal distance
interpretation. Evaluating at βreg yields
u(βreg) = ∥θ*/2∥Q2 − ∥βreg − θ*/2∥Q2,
so the reduction in attainable value relative to the unconstrained
optimum is exactly the squared Q-distance from θ*/2 to the feasible
set.
The projection formulation also yields a convenient optimality
condition. Let NC(β)
denote the normal cone to C at
β. Then βreg solves the projection problem if and only
if
$$
Q\Big(\frac{\theta^*}{2}-\beta^{\mathrm{reg}}\Big) \;\in\; N_C(\beta^{\mathrm{reg}}).
$$
Equivalently,
$$
\Big\langle Q\Big(\frac{\theta^*}{2}-\beta^{\mathrm{reg}}\Big),\ \beta-\beta^{\mathrm{reg}}\Big\rangle \;\le\; 0 \qquad \text{for all } \beta\in C,
$$
which is the variational-inequality form of the projection property.
When C is described by
linear inequalities and equalities,
C = {β ∈ ℝd: Aβ ≤ b, Eβ = f, β ≥ 0},
the KKT conditions for the projection take an explicit multiplier form: there exist
μ ≥ 0 (for Aβ ≤ b), ν (for Eβ = f), and s ≥ 0 (for β ≥ 0) such that
$$
Q\Big(\beta^{\mathrm{reg}}-\frac{\theta^*}{2}\Big) + A^\top\mu + E^\top\nu - s \;=\; 0,
\qquad
\mu\odot\big(A\beta^{\mathrm{reg}}-b\big) \;=\; 0,
\qquad
s\odot\beta^{\mathrm{reg}} \;=\; 0.
$$
These conditions make clear how regulation ``tilts'' the contract:
binding constraints contribute shadow prices that shift βreg away from the
unconstrained target θ*/2.
The projection viewpoint is especially useful because many canonical constraint classes correspond to well-studied projection operators.
If $C=\{\beta\in\mathbb{R}^d_+:\ \beta_i\le
\overline{\beta}_i\}$ and Q = I, then the projection reduces to
componentwise clipping:
$$
\beta^{\mathrm{reg}}_i\;=\;\min\Big\{\overline{\beta}_i,\ \max\{0,\
\theta^*_i/2\}\Big\}.
$$
Thus caps bind exactly on tasks with large marginal value θi*
(relative to the cap), producing a transparent notion of distortion:
high-value dimensions are truncated, while low-value dimensions remain
at their unconstrained levels (here, simply θi*/2).
When Q ≠ I but
diagonal, Q = diag(qi),
the same clipping result holds because the objective is separable:
minβ ∈ C ∑iqi(βi − θi*/2)2,
so the weights qi affect the
value but not the argmin under pure box constraints.
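A minimal sketch of this clipping operation (hypothetical marginal values and caps chosen for illustration):

```python
import numpy as np

# Sketch: under box constraints, the Q-weighted projection is componentwise
# clipping of the unconstrained target theta*/2 (for diagonal Q the weights
# change the objective value but not the argmin).
theta = np.array([0.2, 1.5, 3.0, 0.8])     # hypothetical marginal values
cap = np.array([1.0, 1.0, 1.0, 1.0])       # hypothetical caps beta_bar_i
beta_reg = np.clip(theta / 2.0, 0.0, cap)  # Proj_C(theta*/2)
print(beta_reg)  # high-value tasks are truncated; others stay at theta_i/2
```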
Suppose tasks are ordered and C = {β ∈ ℝ+d: β1 ≥ β2 ≥ ⋯ ≥ βd}.
With Q = I, the projection is
exactly the isotonic regression of the vector θ*/2 under a
nonincreasing shape restriction (and truncation at zero). The solution
pools adjacent coordinates whenever the target violates monotonicity: if
θi*/2 < θi + 1*/2,
then the projection forces βireg
and βi + 1reg
toward a common level. With diagonal Q = diag(qi),
we obtain weighted isotonic regression, in which coordinates with larger qi are penalized
more heavily for deviating from θi*/2;
operationally, the pooled level becomes a weighted average. This is
attractive in practice because it yields fast algorithms (e.g., the
pool-adjacent-violators method) and a clear policy interpretation:
monotonicity constraints create local averaging of incentives across
tiers.
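A compact implementation of the weighted pool-adjacent-violators routine (our own sketch; the nonincreasing variant used here, with truncation at zero to respect β ≥ 0):

```python
import numpy as np

def pav_nonincreasing(y, w):
    # Weighted pool-adjacent-violators: projects y onto the monotone cone
    # {b : b_1 >= ... >= b_d} in the diag(w)-weighted Euclidean norm.
    levels, weights, counts = [], [], []
    for yi, wi in zip(y, w):
        levels.append(float(yi)); weights.append(float(wi)); counts.append(1)
        # pool adjacent blocks while the nonincreasing shape is violated
        while len(levels) > 1 and levels[-2] < levels[-1]:
            w2 = weights[-2] + weights[-1]
            m2 = (weights[-2] * levels[-2] + weights[-1] * levels[-1]) / w2
            c2 = counts[-2] + counts[-1]
            levels[-2:], weights[-2:], counts[-2:] = [m2], [w2], [c2]
    out = np.concatenate([np.full(c, m) for m, c in zip(levels, counts)])
    return np.maximum(out, 0.0)  # truncate at zero (beta >= 0)

# Hypothetical targets theta*/2 violating monotonicity in the middle:
print(pav_nonincreasing([1.0, 0.4, 0.9, 0.3], [1.0, 1.0, 1.0, 1.0]))
# -> [1.0, 0.65, 0.65, 0.3]: the violating pair is pooled to its average.
```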
Let 𝒢1, …, 𝒢m be
task groups required to share a common weight. Then C imposes βi = βj
for all i, j ∈ 𝒢ℓ,
together with β ≥ 0 (and
possibly other constraints). In the simplest case with Q = I and no additional
constraints, the projection averages within each group:
$$
\beta^{\mathrm{reg}}_i\;=\;\max\Big\{0,\
\frac{1}{|\mathcal{G}_\ell|}\sum_{j\in\mathcal{G}_\ell}\frac{\theta^*_j}{2}\Big\}\qquad
\text{for } i\in\mathcal{G}_\ell.
$$
Thus parity requirements induce explicit pooling: the platform replaces
heterogeneous task-by-task marginal values by group averages, reflecting
the informational and allocative cost of ``equal treatment.'' If Q = diag(qi),
the group value becomes a weighted average, ∑j ∈ 𝒢ℓqj(θj*/2)/∑j ∈ 𝒢ℓqj,
emphasizing again that the technology determines which deviations from
the target are more costly.
In the quadratic benchmark, computing βreg is no harder than solving a strictly convex quadratic program—and in many common cases (boxes, monotonicity, parity with diagonal Q) it reduces to closed-form clipping or fast projection routines. Conceptually, the result also tells us what regulation can and cannot do in this environment: it cannot change the fundamental target θ*/2, but it can force the platform to approximate that target under a constraint-induced geometry. This precise geometric structure will be the template we generalize in the next section, where quadratic costs are replaced by general k-homogeneous technologies and Euclidean projections become Bregman projections induced by c*.
We now drop the quadratic benchmark and return to the maintained
primitives: c : ℝ+d → ℝ+
is strictly convex, differentiable, and k-homogeneous with k > 1. The central benefit of
this class is that it still delivers a clean representation of behavior.
Let the convex conjugate of c
be
c*(β) = supa ≥ 0 ⟨β, a⟩ − c(a).
Under standard regularity (e.g., c essentially smooth on ℝ++d), the
agent’s best response can be written as
$$
a(\beta) \;=\; \nabla c^{*}(\beta),
$$
with the usual caveat that corners may arise when some components of
β are zero and the
nonnegativity constraint binds. Economically, c* summarizes the
``supply side'' of effort: it tells us how the agent converts
marginal rewards β into a
chosen effort vector.
A key structural implication is that homogeneity is preserved (up to
a change of degree) under conjugacy. If c is k-homogeneous, then c* is k*-homogeneous with
$$
k^*\;=\;\frac{k}{k-1}.
$$
By Euler’s theorem for homogeneous functions, for all β ∈ ℝ++d
we have
$$
\langle \beta,\ \nabla c^{*}(\beta)\rangle \;=\; k^{*}\, c^{*}(\beta).
$$
Substituting into the principal’s expected utility gives
$$
u(\beta) \;=\; \langle \theta^{*}-\beta,\ \nabla c^{*}(\beta)\rangle \;=\; \langle \theta^{*},\ \nabla c^{*}(\beta)\rangle \;-\; k^{*}\, c^{*}(\beta),
$$
where the second equality uses the Euler identity. This expression is the natural analog
of ``completing the square'' in the quadratic case: the technology
enters only through c* and its gradient.
We view the regulated design problem as
$$
\max_{\beta\in C}\ u(\beta),
$$
where C ⊆ ℝ+d
is nonempty, closed, and convex. Two observations are useful for
practice. First, because u(β) is continuous whenever
c* is
differentiable, an optimum exists whenever C is compact. Compactness is not
merely technical: in many policy environments, caps, budget constraints,
or ``no extreme incentives'' rules naturally impose boundedness. When
C is closed but unbounded,
existence can still be ensured by mild coercivity conditions (for
instance, c*(β) → ∞
sufficiently fast along rays in C so that the negative term in
dominates), but the compact case is the cleanest statement.
Second, strict convexity of c implies differentiability and strict convexity of c* on the interior of its domain, which typically yields strict concavity of u over ℝ++d for common classes (notably separable power costs and many CES-type technologies). In those cases, the maximizer over convex C is unique. We emphasize, however, that full generality requires care: without additional curvature conditions on c* (e.g., positive definiteness of its Hessian on the relevant region), u may be concave but not strictly concave, and uniqueness can fail on faces of C.
Even when a closed form is unavailable, the optimal contract is
characterized by a familiar first-order condition. Let NC(β)
denote the normal cone to C at
β. If u is differentiable at βreg, then βreg solves the regulated problem if and only
if
$$
\langle \nabla u(\beta^{\mathrm{reg}}),\ \beta-\beta^{\mathrm{reg}}\rangle \;\le\; 0 \qquad \text{for all } \beta\in C,
$$
equivalently −∇u(βreg) ∈ NC(βreg).
Differentiating u(β) = ⟨θ* − β, ∇c*(β)⟩ and writing Hc*(β)
for the Hessian of c* yields
$$
\nabla u(\beta) \;=\; H_{c^{*}}(\beta)\,\big(\theta^{*}-\beta\big) \;-\; \nabla c^{*}(\beta).
$$
The economic content of these conditions mirrors the quadratic projection story: the
unconstrained ``target’’ points toward θ*, but regulation
introduces normal-cone forces that shift the chosen β until the marginal gain direction
is orthogonal (in a generalized sense) to all feasible deviations.
The quadratic benchmark delivered a Euclidean projection in a Q-weighted geometry. With general
k-homogeneous costs, the
geometry becomes non-Euclidean. A convenient choice of potential is
$$
\psi(\beta) \;:=\; \frac{k}{k-1}\, c^{*}(\beta),
$$
which is convex and differentiable wherever c* is. The associated
Bregman divergence is
Dψ(β, z) = ψ(β) − ψ(z) − ⟨∇ψ(z), β − z⟩.
In many canonical specifications, maximizing u(β) over C is equivalent to a Bregman projection of a scaled
value vector onto C:
$$
\beta^{\mathrm{reg}} \;\in\; \arg\min_{\beta\in C}\ D_{\psi}\big(\beta,\ \theta^{*}/k\big).
$$
This representation should be read as a disciplined generalization of
the projection theorem: regulation chooses the feasible contract closest
to the unconstrained benchmark θ*/k, but
``closest’’ is measured in the divergence generated by the agent’s
technology, not necessarily by squared Euclidean distance. The
substantive message is that the policy-induced distortions (pooling,
truncation, averaging) persist, yet they occur in a nonlinear coordinate
system aligned with effort supply.
We also flag a limitation: the Bregman-projection representation is most transparent when c* has a separable or otherwise well-structured form so that ∇ψ is monotone and the KKT system coincides with Bregman projection optimality conditions. For an arbitrary strictly convex c, the variational inequality is always valid, while the exact reduction to a Bregman projection may require additional regularity (or may only hold locally).
From an implementation perspective, the design problem remains tractable: under the maintained conditions ensuring concavity of u, the regulated problem is a concave maximization over a convex set, hence solvable by standard methods. Closed forms are the exception rather than the rule: beyond quadratic
and a few separable cases (where constraints like boxes or group
equalities can still be handled almost analytically), regulation
typically forces a numerical step. The model’s practical takeaway is
therefore not ``we can always write $\beta^{\mathrm{reg}}$
explicitly,'' but rather ``we
can always compute βreg as a well-behaved
convex optimization problem once the technology is specified.''
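To illustrate this computational message, the sketch below (our own example: separable power cost, hypothetical caps, and one parity equality) solves the regulated problem with an off-the-shelf solver and compares the result to the unconstrained benchmark θ*/k.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch under assumptions: c(a) = sum_i a_i^k / k, so a(beta) = beta^(1/(k-1));
# feasible set C = {0 <= beta_i <= 0.9, beta_1 = beta_2} (hypothetical).
k = 3.0
theta = np.array([1.0, 2.0, 4.0])

def neg_u(beta):
    a = beta ** (1.0 / (k - 1.0))  # agent best response
    return -((theta - beta) @ a)   # u(beta) = <theta* - beta, a(beta)>

bounds = [(0.0, 0.9)] * 3                                 # caps
parity = ({'type': 'eq', 'fun': lambda b: b[0] - b[1]},)  # beta_1 = beta_2
res = minimize(neg_u, x0=np.full(3, 0.3), bounds=bounds,
               constraints=parity, method='SLSQP')
print(res.x)       # regulated optimum: parity pools tasks 1-2, cap binds on 3
print(theta / k)   # unconstrained benchmark theta*/k for comparison
```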
Finally, we briefly discuss robustness when homogeneity is only
approximate. In many environments, costs are not exactly k-homogeneous, but may satisfy an
ε-homogeneity condition on the
relevant region of efforts: for all ρ ≥ 1 and a in a set of interest,
$$
(1-\varepsilon)\,\rho^{k}\, c(a) \;\le\; c(\rho a) \;\le\; (1+\varepsilon)\,\rho^{k}\, c(a).
$$
This formulation captures technologies that are ``locally power-like''
up to small multiplicative distortions. Under this condition and standard smoothness,
the Euler identity used to obtain clean scaling relations becomes
approximate, and the unconstrained benchmark β* = θ*/k
is no longer exact. Nonetheless, the same logic implies a quantitative
stability result: the optimal contract under approximate homogeneity
remains close to the homogeneous prediction, with deviations on the
order of ε once we restrict
attention to bounded C (so
that the relevant gradients are Lipschitz on C). In words, small departures from
exact returns-to-scale do not overturn the geometric interpretation of
regulation; they perturb the target and the projection geometry
slightly, but do not change the fundamental fact that constraints
operate by pulling incentives toward a feasible approximation of an
economically meaningful benchmark.
This robustness perspective also guides empirical work. When a platform’s internal model of effort costs is only an approximation, insisting on exact closed forms is less important than ensuring that the regulated design problem is stable and computable. In the next section, we turn to the complementary practical challenge: even if the structure above is correct, the marginal value vector θ* is typically unknown and must be learned from noisy measurements under precisely the same policy constraints embodied in C.
We now treat the marginal value vector θ* as unknown. The principal observes only posted contracts and noisy performance data, and must infer θ* while remaining compliant with the regulatory set C. The key econometric difficulty is that the natural regressor—the observed signal vector x—is contaminated by measurement error, so naive regression of outcomes on x is generally biased (attenuation and more complex distortions in the multitask setting). Our maintained structure delivers a clean fix: because the contract weights β are chosen by the principal and affect effort, they serve as natural instruments for x.
Suppose we collect T
observations (βt, xt, yt)
with βt ∈ C.
Let at = a(βt)
denote the (unobserved) induced effort, and write the measurement and
outcome equations as
$$
x_t \;=\; a_t + \varepsilon_t, \qquad y_t \;=\; \langle \theta^{*}, a_t\rangle + \eta_t,
$$
where εt ∈ ℝd
and ηt ∈ ℝ
are mean-zero noises conditional on at (and hence
conditional on βt), with
subgaussian tails of scale σ.
If we regress yt on xt, then xt contains
εt which
is mechanically correlated with the regression residual yt − ⟨θ*, xt⟩ = ηt − ⟨θ*, εt⟩,
producing bias even when the principal’s choices are exogenous. This is
the canonical ``errors-in-variables'' problem.
The contract weights βt affect at through the
agent’s best response, but (by design and by timing) are independent of
the realized measurement noise. This yields an orthogonality
condition:
$$
\mathbb{E}\big[\beta_t\big(y_t-\langle \theta^{*}, x_t\rangle\big)\big] \;=\; 0.
$$
Equivalently, rearranging gives the linear IV identity
$$
\mathbb{E}[\beta_t\, y_t] \;=\; \mathbb{E}\big[\beta_t\, x_t^{\top}\big]\,\theta^{*}.
$$
The economic interpretation is simple: we use variation in
incentives—which shifts effort but is not contaminated by measurement
noise—to identify the mapping from effort to value.
Stack the data into matrices
$$
B_T \;=\;\begin{bmatrix}\beta_1^\top\\ \vdots\\
\beta_T^\top\end{bmatrix}\in\mathbb{R}^{T\times d},
\qquad
X_T \;=\;\begin{bmatrix}x_1^\top\\ \vdots\\
x_T^\top\end{bmatrix}\in\mathbb{R}^{T\times d},
\qquad
Y_T \;=\;\begin{bmatrix}y_1\\ \vdots\\
y_T\end{bmatrix}\in\mathbb{R}^{T}.
$$
The sample analog of the IV identity is BT⊤YT ≈ BT⊤XT θ*,
motivating the estimator
$$
\widehat{\theta}_T \;=\; \big(B_T^\top X_T\big)^{-1} B_T^\top Y_T
$$
when BT⊤XT
is invertible. This is the exactly-identified IV estimator (and can be
viewed as a special case of GMM with moment vector $\frac{1}{T}\sum_{t=1}^T \beta_t(y_t-\langle
\theta,x_t\rangle)$).
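A small simulation (our own sketch; quadratic cost with Q = I, so that a(β) = β) illustrates both the attenuation bias of the naive regression and the consistency of the IV estimator with βt as instrument.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, sigma = 3, 20000, 0.5
theta = np.array([1.0, 2.0, 4.0])

# Past contracts must span d directions (relevance); quadratic cost with
# Q = I gives induced effort a(beta) = beta.
B = rng.uniform(0.1, 1.0, size=(T, d))          # posted weights beta_t
A = B                                           # induced effort a(beta_t)
X = A + sigma * rng.standard_normal((T, d))     # noisy signals x_t
Y = A @ theta + sigma * rng.standard_normal(T)  # outcomes y_t

theta_ols = np.linalg.lstsq(X, Y, rcond=None)[0]  # biased: errors-in-variables
theta_iv = np.linalg.solve(B.T @ X, B.T @ Y)      # IV with beta_t as instrument
print(theta_ols)  # attenuated toward zero
print(theta_iv)   # close to theta*
```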
Two requirements deserve emphasis.
First, exogeneity is built into our timing: conditional mean-zero noise and the
principal’s commitment to βt before (xt, yt)
are realized deliver the orthogonality condition.
Second, relevance is nontrivial under constraints: BT⊤XT
must be well-conditioned, which requires that the sequence {βt}t ≤ T
span enough directions that translate into meaningfully different effort
choices (and hence different xt).
To understand how quickly we learn, it is useful to express the
estimation error in a form that isolates relevance. Define the sample
``first-stage'' matrix
$$
M_T \;:=\;\frac{1}{T}B_T^\top X_T.
$$
Under the measurement and outcome equations, we can write
$$
\frac{1}{T}B_T^\top Y_T
\;=\;
\frac{1}{T}B_T^\top A_T\,\theta^*
\;+\;
\frac{1}{T}B_T^\top \eta,
\qquad
M_T
\;=\;
\frac{1}{T}B_T^\top A_T
\;+\;
\frac{1}{T}B_T^\top E,
$$
where AT
stacks at,
E stacks εt, and η stacks ηt. Substituting
into the estimator yields (when MT is invertible)
$$
\widehat{\theta}_T-\theta^{*} \;=\; M_T^{-1}\Big(\frac{1}{T}B_T^\top \eta \;-\; \frac{1}{T}B_T^\top E\,\theta^{*}\Big).
$$
The two terms correspond to outcome noise and measurement noise,
respectively. Both are amplified by ∥MT−1∥,
which is exactly where weak-IV problems enter: when MT has a small
minimum singular value, the inversion step magnifies sampling
fluctuations.
A convenient (and operational) high-probability bound can be stated
in terms of σmin(MT).
If ∥βt∥2 ≤ b
almost surely and (εt, ηt)
are conditionally σ-subgaussian, then standard matrix
concentration implies that with probability at least 1 − δ,
$$
\big\|\widehat{\theta}_T-\theta^{*}\big\|_2 \;\le\; \frac{\sigma b}{\sigma_{\min}(M_T)}\,\big(c_1 + c_2\,\|\theta^{*}\|_2\big)\sqrt{\frac{d+\log(1/\delta)}{T}},
$$
for universal constants c1, c2
(the second term reflects the propagation of measurement error through
θ*). The
important comparative static is transparent: holding noise and scale
fixed, learning speed is governed by the conditioning of the design induced by the
feasible and actually deployed contracts.
Once we have θ̂T, we can
translate it into a compliant contract by applying the same mapping from
values to incentives used in the complete-information benchmark,
followed by projection onto C.
In the simplest homogeneous ``uniformity'' case, the unconstrained
target is θ*/k, so we
define
$$
\widehat{\beta}_T \;=\; \mathrm{Proj}_{C}\big(\widehat{\theta}_T/k\big).
$$
In the quadratic benchmark with known Q ≻ 0, the target is θ*/2 in the Q-metric, giving
$$
\widehat{\beta}_T \;=\; \mathrm{Proj}^{Q}_{C}\big(\widehat{\theta}_T/2\big).
$$
Both constructions enforce regulation automatically: every iterate is in
error.
The payoff from this ``estimate then project'' approach is stability.
Euclidean projection onto a closed convex set is nonexpansive, hence
$$
\big\|\widehat{\beta}_T-\beta^{\mathrm{reg}}\big\|_2 \;\le\; \frac{1}{k}\big\|\widehat{\theta}_T-\theta^{*}\big\|_2
$$
when $\beta^{\mathrm{reg}}=\mathrm{Proj}_C(\theta^*/k)$. In the quadratic case, the same logic holds in the weighted norm:
$$
\big\|\widehat{\beta}_T-\beta^{\mathrm{reg}}\big\|_{Q} \;\le\; \frac{1}{2}\big\|\widehat{\theta}_T-\theta^{*}\big\|_{Q}.
$$
Thus, regulation does not ``blow up'' statistical error; it only truncates or pools incentives according
to the geometry of C.
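A standalone sketch of the plug-in projection step and this nonexpansiveness bound (hypothetical numbers; box feasible set, k = 2):

```python
import numpy as np

# Sketch: "estimate then project" with a box C = {0 <= beta <= cap}.
k = 2.0
theta_hat = np.array([1.1, 2.3, 3.6])    # hypothetical IV estimate
theta_true = np.array([1.0, 2.0, 4.0])   # hypothetical truth
cap = np.array([0.6, 0.6, 0.6])

proj = lambda z: np.clip(z, 0.0, cap)    # Euclidean projection onto C
beta_hat = proj(theta_hat / k)           # implemented compliant contract
beta_reg = proj(theta_true / k)          # oracle target, for comparison

# Nonexpansiveness: contract error <= (1/k) * estimation error.
print(np.linalg.norm(beta_hat - beta_reg),
      np.linalg.norm(theta_hat - theta_true) / k)
```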
The error bound above makes clear that the principal’s statistical problem is largely summarized by $\sigma_{\min}(M_T)=\sigma_{\min}(\frac{1}{T}B_T^\top X_T)$. In practice, this quantity (or its close cousins such as $\sigma_{\min}(\frac{1}{T}B_T^\top B_T)$ in simplified first-stage approximations) plays the role of a weak-IV diagnostic: if it is small, confidence regions for θ* are necessarily wide, and plug-in contracts will be imprecise.
This perspective also clarifies how compliance constraints interact with learning. If C is narrow—for instance, if regulation forces many components of β to be equal, caps all components tightly, or limits variation to a low-dimensional face—then BT may have low effective rank, and σmin(MT) can remain small even as T grows. Economically, regulation can inadvertently reduce the ``experimental variation'' in incentives available for identification. The appropriate response is not to violate constraints, but to explore within the feasible set: when C permits, the principal can choose a sequence of compliant contracts that spans the feasible tangent directions (e.g., rotating weight across groups within allowed caps, or cycling among extremal points of C). When C truly collapses the feasible set to a low-dimensional subspace, then no estimator can recover all coordinates of θ*; in that case the relevant object becomes the projection of θ* onto the identifiable subspace, and welfare comparisons should be made accordingly.
The offline IV/GMM formulation delivers three practical takeaways. First, measurement error in task signals is not a nuisance that vanishes with sample size if we use the wrong estimator; it requires instruments. Second, compliance is not merely a constraint in the optimization stage: it shapes the instrument set and therefore the speed at which values can be learned. Third, projection-based implementation is robust: once we have a value estimate, mapping it into C is stable and interpretable, and finite-sample contract error inherits the same eigenvalue-driven rates as value estimation. These observations set the stage for the dynamic problem, where the principal must post compliant contracts βt ∈ C while simultaneously using the resulting data to learn θ*.
In the dynamic problem the principal must choose a sequence of contracts (βt)t ≤ T
with each βt ∈ C,
and faces a familiar feedback loop: contracts generate data, data
determine estimates of θ*, and estimates
determine future contracts. A useful organizing device is to benchmark
against the (unknown) βreg = βreg(θ*),
and measure how far our deployed sequence is from that target. In the
quadratic benchmark, this distance has a direct welfare meaning because
u(β) is a concave
quadratic; indeed,
$$
u(\beta^{\mathrm{reg}})-u(\beta)
\;\ge\;
\|\beta-\beta^{\mathrm{reg}}\|_{Q}^{2}
\qquad
\text{when }c(a)=\tfrac12 a^\top Q^{-1}a \text{ and }
\beta^{\mathrm{reg}}=\mathrm{Proj}_{C}^{Q}(\theta^*/2),
$$
with equality when θ*/2 ∈ C or C is affine; in general, the projection’s variational inequality delivers the stated direction.
Accordingly, we track the squared-distance regret
$$
R_T \;=\; \sum_{t\le T}\big\|\beta_t-\beta^{\mathrm{reg}}\big\|_{2}^{2},
$$
or, in the quadratic case, its natural weighted analog ∑t ≤ T∥βt − βreg∥Q2.
This choice keeps the economics transparent: regret is exactly the
cumulative misalignment between implemented incentives and the best
compliant incentives.
The simplest way to manage the feedback loop is to decouple the two
roles of βt:
during an initial exploration phase, we choose βt primarily to
make the IV/GMM design well-conditioned; during the subsequent commit phase, we choose βt to be close
to βreg given our
current estimate. Because compliance is nonnegotiable, both phases must
use only contracts in C.
Concretely, fix an exploration length τ and select a finite set of
compliant contracts {β̄(1), …, β̄(m)} ⊆ C
that spans the feasible directions we can vary.
We then cycle or randomize among these contracts for t ≤ τ, collect (βt, xt, yt),
compute an IV estimate θ̂τ, and finally
commit for t > τ to the
projected plug-in contract
$$
\widehat{\beta} \;=\; \mathrm{Proj}_{C}\big(\widehat{\theta}_\tau/k\big).
$$
Two features matter. First, projection makes the algorithm compliant at every
round. Second, because projection is nonexpansive in the appropriate
geometry, estimation error in θ̂τ translates
into contract error in β̂
without amplification.
Exploration is not about choosing ``high-variance'' actions in the
abstract; it is about increasing the minimum singular value of the
empirical first-stage matrix (the relevance object). Let
$$
M_\tau \;:=\;\frac{1}{\tau}B_\tau^\top X_\tau,
\qquad\text{where }B_\tau=\begin{bmatrix}\beta_1^\top\\ \vdots\\
\beta_\tau^\top\end{bmatrix},\;
X_\tau=\begin{bmatrix}x_1^\top\\ \vdots\\ x_\tau^\top\end{bmatrix}.
$$
All IV-style error bounds scale like 1/σmin(Mτ).
Thus, the principal’s exploration problem is naturally an experimental-design problem: pick a
compliant sequence β1, …, βτ ∈ C
that makes σmin(Mτ)
large. Regulation can severely restrict this: if C forces βt to lie on a
low-dimensional face, then no amount of time yields full-rank Mτ in ℝd. The best we can do is
to maximize eigenvalues in the identifiable subspace, and accept that
learning (and hence contract adaptation) is only possible along those
directions.
The next statement summarizes the baseline performance one gets from combining (i) an exploration design that ensures relevance and (ii) a projection-based commit step: with exploration length τ ≍ √T and eigenvalue growth sufficient for relevance, explore-then-commit attains squared-distance regret of order $\widetilde{O}(d\sqrt{T})$.
The argument is a direct decomposition. During exploration, we may
incur regret as large as $\sum_{t\le\tau}\|\beta_t-\beta^{\mathrm{reg}}\|^{2} \le 4b^{2}\tau$
simply because we are not optimizing; this is the exploration cost. During exploitation,
βt = β̂
for all t > τ,
so
$$
\sum_{t=\tau+1}^T \|\beta_t-\beta^{\mathrm{reg}}\|^2
\;=\;
(T-\tau)\,\|\widehat{\beta}-\beta^{\mathrm{reg}}\|^2.
$$
By nonexpansiveness of projection, ∥β̂ − βreg∥ ≤ (1/k)∥θ̂τ − θ*∥
in the uniformity geometry (and similarly in Q-norm in the quadratic case). Hence
exploitation regret is controlled by the IV estimation error, which
scales like $\widetilde{O}(\sqrt{d/\tau})$ under the
stated eigenvalue growth. Choosing $\tau\asymp
\sqrt{T}$ balances the linear exploration cost τ against the inverse exploration
benefit T/τ,
producing the $\widetilde{O}(d\sqrt{T})$ rate.
Economically, we trade off short-run inefficiency (cycling through
informative but potentially suboptimal incentives) against long-run
efficiency (quickly converging to the best compliant contract).
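The full loop can be summarized in a short self-contained simulation (our sketch; hypothetical parameters, quadratic cost with Q = I, box feasible set):

```python
import numpy as np

rng = np.random.default_rng(1)
d, T, k, sigma = 3, 5000, 2.0, 0.5
theta = np.array([1.0, 2.0, 4.0])
cap = np.full(d, 0.8)
proj = lambda z: np.clip(z, 0.0, cap)  # Euclidean projection onto box C
beta_reg = proj(theta / k)             # oracle best compliant contract
tau = int(np.sqrt(T))                  # exploration length ~ sqrt(T)

B, X, Y, beta_hat, regret = [], [], [], None, 0.0
for t in range(T):
    if t < tau:
        beta = 0.5 * cap * np.eye(d)[t % d]  # cycle compliant probe contracts
    else:
        if beta_hat is None:                 # commit once after exploring
            Bm, Xm, Ym = map(np.array, (B, X, Y))
            theta_hat = np.linalg.solve(Bm.T @ Xm, Bm.T @ Ym)  # IV estimate
            beta_hat = proj(theta_hat / k)   # projected plug-in contract
        beta = beta_hat
    a = beta                                 # quadratic cost: a(beta) = beta
    x = a + sigma * rng.standard_normal(d)
    y = a @ theta + sigma * rng.standard_normal()
    B.append(beta); X.append(x); Y.append(y)
    regret += np.sum((beta - beta_reg) ** 2)

print(tau, regret)  # squared-distance regret, sublinear in T
```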
The explore–then–commit template is conservative: it treats relevance
as something created only by varying βt. In many
platforms, however, we observe repeated measurements of performance within a round (or across
closely related channels), and these can supply instruments even when
incentives do not vary much. A canonical structure is: in round t we observe two conditionally
independent signal vectors
xt(1) = at + εt(1), xt(2) = at + εt(2),
with εt(1) ⟂ εt(2)
given at.
Then xt(2)
is a valid instrument for xt(1)
in the moment condition 𝔼[xt(2)(yt − ⟨θ, xt(1)⟩)] = 0,
because it is correlated with effort at but
orthogonal to the measurement error in xt(1).
More generally, averaging many within-round measurements yields a
low-noise proxy x̃t whose
covariance is driven by heterogeneity in at rather than
by our contract variation.
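A minimal simulation of this moment condition (our sketch; effort variation here comes from hypothetical agent diversity rather than contract variation):

```python
import numpy as np

rng = np.random.default_rng(2)
d, T, sigma = 3, 20000, 0.5
theta = np.array([1.0, 2.0, 4.0])

# Diverse induced efforts across rounds, with two conditionally independent
# measurement channels per round.
A = rng.uniform(0.2, 1.0, size=(T, d))
X1 = A + sigma * rng.standard_normal((T, d))   # first noisy view
X2 = A + sigma * rng.standard_normal((T, d))   # independent second view
Y = A @ theta + sigma * rng.standard_normal(T)

# Moment condition E[x2 (y - <theta, x1>)] = 0: X2 instruments X1.
theta_iv = np.linalg.solve(X2.T @ X1, X2.T @ Y)
print(theta_iv)  # close to theta* despite measurement error in both views
```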
What, then, replaces exploration? A diversity condition: the induced efforts
(across agents and environments) must exhibit sufficiently rich
variation even when we are exploiting. Formally, one convenient
requirement is a uniform lower bound λ0 > 0 such that the
design matrix based on the repeated-measurement instrument grows
linearly,
$$
\sigma_{\min}\!\left(\sum_{t=1}^T \widetilde{x}_t
\widetilde{x}_t^\top\right)\;\ge\; \lambda_0 T
\quad\text{with high probability.}
$$
Under such a condition, we can update θ̂t online using
the repeated-measurement IV moments while choosing βt (e.g., βt = ProjC(θ̂t − 1/k)
each round). The resulting contract error shrinks like $\widetilde{O}(\sqrt{d/(\lambda_0 T)})$, and
summing ∥βt − βreg∥2
over time yields an exploration-free regret bound of order Õ(d/λ02)
(logarithmic dependence on T
is absorbed by the tilde). The substantive message is that when the
environment supplies exogenous variation in effort (diverse agents,
diverse tasks, or repeated independent measurement), deliberate exploration becomes unnecessary: we can remain
near the best compliant contract throughout and still identify θ*.
Across both regimes—explicit exploration or diversity-driven learning—the projection step plays a benign role. It never harms feasibility, and it does not worsen statistical rates because it is Lipschitz in the relevant norm. In effect, regulation enters the learning problem through two channels: it can restrict instrument variation (making eigenvalues small), but it also provides a stable map from estimated values to implemented incentives. This is precisely the tradeoff the model is designed to illuminate: constraints may reduce what we can learn from our own interventions, yet the best-response geometry ensures that once we learn, compliance can be enforced without amplifying error.
In applications the most delicate modeling step is often not the cost
function, but the translation of policy constraints into a set C ⊆ ℝ+d
that can be enforced mechanically. The guiding principle is to express
compliance requirements as linear (in)equalities whenever possible, both
because these are auditable and because they preserve fast projection
routines. Common examples include: (i) floors and caps on incentives, $\underline{\beta}\le \beta\le
\overline{\beta}$ (box constraints); (ii) budget or total-intensity limits, 1⊤β ≤ B;
(iii) monotonicity and ordering constraints across tasks, βi ≥ βj
(e.g., do not reward clicks more than verified purchases); and (iv) parity
requirements that force equal weights across a subset of tasks, βi = βj
for i, j in a
regulated group. Each of these yields a polyhedron
C = {β ∈ ℝ+d : Aβ ≤ b, Eβ = e},
which is closed and convex, and is directly consumable by modern
quadratic and conic solvers.
Two practical lessons are worth stating explicitly. First, if C is unbounded (for instance, only nonnegativity constraints), then existence of an optimum can fail under some technologies; in deployed systems this is typically addressed by adding explicit caps $\overline{\beta}$ justified as risk controls, payment predictability, or anti-manipulation safeguards. Second, when policy creates a low-dimensional feasible region (e.g., many equalities), the platform should treat the resulting loss of identifiability as a design consequence, not a bug to patch: it clarifies which aspects of θ* are learnable and which are fundamentally pooled by design.
When costs are quadratic, regulated optimization and compliant
learning repeatedly call a weighted projection of the form
ProjCQ(z) ∈ arg minβ ∈ C(β − z)⊤Q(β − z), Q ≻ 0.
Operationally this is a strictly convex quadratic program (QP). If C is specified by linear
constraints, we can write it in standard QP form and solve it with
off-the-shelf routines (interior-point methods for high accuracy;
operator-splitting methods such as ADMM for speed and warm-starting).
Warm-starting is particularly valuable online: successive targets zt (coming from
successive estimates θ̂t) move
gradually, so reusing the previous primal–dual iterate can reduce solve
times by an order of magnitude.
A simple computational trick reduces weighted projection to Euclidean
projection. Let L be such that
Q = L⊤L
(e.g., Cholesky). Then
(β − z)⊤Q(β − z) = ∥L(β − z)∥22.
If we change variables γ = Lβ, we can
re-express the projection as a Euclidean projection onto the transformed
set LC := {Lβ : β ∈ C}.
This is not always simpler (it can densify constraints), but it is
useful when Q is diagonal or
sparse, and it clarifies numerical conditioning: badly scaled Q produces poorly scaled QPs, so a
preconditioning step (or simply rescaling task units) can materially
improve stability.
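The change of variables can be checked directly in a small example (our sketch; hypothetical Q, target, and box constraints):

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: weighted projection Proj_C^Q(z) as a small QP, plus the Cholesky
# identity (beta - z)' Q (beta - z) = ||L (beta - z)||_2^2 with Q = L'L.
Q = np.array([[2.0, 0.3], [0.3, 1.0]])
L = np.linalg.cholesky(Q).T        # upper-triangular factor: Q = L^T L
z = np.array([1.2, -0.1])          # hypothetical target (e.g., theta*/2)
cap = np.array([0.8, 0.8])

obj = lambda b: (b - z) @ Q @ (b - z)
res = minimize(obj, x0=np.zeros(2), bounds=[(0.0, c) for c in cap])
print(res.x)  # Proj_C^Q(z) for the box C

# Same objective value in transformed coordinates gamma = L beta:
print(np.linalg.norm(L @ (res.x - z)) ** 2, obj(res.x))
```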
In many policy-driven cases C is an intersection of simple sets (boxes, monotone cones, affine subspaces). Then one can avoid general-purpose QP solvers and use specialized projection algorithms such as Dykstra’s method or alternating projections, which are easy to implement, easy to audit, and often fast enough for real-time contract updates. The platform-facing implication is that a “projection-friendly” description of regulation (e.g., constraints that decompose across coordinates or across a small number of groups) is not merely a convenience; it is an enabler of frequent, transparent compliance updates.
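As an illustration, Dykstra's method for the intersection of a box with a parity subspace needs only two elementary projections (our sketch; hypothetical target and groups):

```python
import numpy as np

def proj_box(v, lo, hi):
    return np.clip(v, lo, hi)

def proj_parity(v, groups):
    # Projection onto {b : b_i = b_j within each group} is within-group averaging.
    out = v.copy()
    for g in groups:
        out[g] = v[g].mean()
    return out

def dykstra(z, lo, hi, groups, iters=200):
    # Dykstra's algorithm: converges to the projection of z onto
    # (box) ∩ (parity subspace), unlike plain alternating projections.
    x, p, q = z.copy(), np.zeros_like(z), np.zeros_like(z)
    for _ in range(iters):
        y = proj_box(x + p, lo, hi);    p = x + p - y
        x = proj_parity(y + q, groups); q = y + q - x
    return x

z = np.array([1.4, 0.2, 0.9, 0.5])             # hypothetical target theta*/2
groups = [np.array([0, 1]), np.array([2, 3])]  # parity-pooled task groups
print(dykstra(z, 0.0, 1.0, groups))            # -> [0.8, 0.8, 0.7, 0.7]
```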
For general k-homogeneous
costs, the regulated solution is typically computable but not closed
form. The key object becomes a Bregman divergence Dψ(β, z)
for a convex potential ψ
derived from c*
(for instance, ψ(β) = (k/(k − 1))c*(β)
in separable power families). Implementationally, computing
arg minβ ∈ CDψ(β, z)
is a convex optimization problem; hence the main questions are which convex
solver to use and how to maintain reliability under changing data.
When C is polyhedral and ψ is smooth and strongly convex on ℝ++d, a robust approach is to run a mirror-descent (or dual-averaging) routine with projections onto C in an easy geometry. In the special but important separable case $c(a)=\sum_i \frac{1}{k}a_i^k$ (up to scaling), c*(β) is also separable and ∇c* is inexpensive coordinatewise; then each iteration requires only evaluating ∇ψ and projecting onto C (often Euclidean). This is precisely the situation in which Bregman methods shine: the “curvature” is absorbed into the mirror map rather than into the projection operator. Conversely, if c* is not separable or has a dense Hessian, we should expect heavier computation and should consider solving the convex program directly (e.g., via a second-order conic solver) at lower frequency, while interpolating contracts between solves.
A limitation we emphasize in deployments is behavior near the boundary. Many ψ induced by $c^*$ become steep near $\beta_i \downarrow 0$, which is economically natural (zero incentives can collapse effort) but numerically tricky. In practice we can add a small lower bound $\beta \ge \varepsilon \mathbf{1}$ if policy permits, or use barrier-aware solvers and explicit tolerances. The broader point is that the constraint set C should be chosen with both policy and numerics in mind: extremely sharp corners and near-infeasible equalities are a recipe for brittle optimization and thus brittle compliance.
Because learning hinges on IV/GMM relevance, we recommend treating instrument diagnostics as a first-class compliance object. Concretely, the platform can track the smallest singular value (or a regularized surrogate) of the accumulated first-stage matrix, e.g., $\lambda_{\min}(B_t^\top X_t)$ or $\lambda_{\min}\big(\sum_{s \le t} \tilde{x}_s \tilde{x}_s^\top\big)$ in repeated-measurement designs.
These quantities can be logged in real time, thresholded, and audited ex
post. If the diagnostic falls below a policy-set floor, the system can
automatically trigger an “identification mode”: either lengthen
exploration, widen the randomized support of contracts within C, or increase reliance on
repeated-measurement instruments.
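A sketch of such a diagnostic follows; the threshold rule and names are illustrative policy choices, not prescriptions from the paper.

```python
import numpy as np

def relevance_check(instruments, floor_per_obs=1e-2):
    # Accumulate the first-stage Gram matrix and compare its smallest
    # eigenvalue against a policy-set floor that scales with t.
    G = instruments.T @ instruments        # sum_s x_s x_s^T
    lam_min = float(np.linalg.eigvalsh(G)[0])
    if lam_min < floor_per_obs * len(instruments):
        return lam_min, "identification_mode"   # e.g., widen exploration
    return lam_min, "ok"

rng = np.random.default_rng(1)
print(relevance_check(rng.standard_normal((200, 4))))
```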
Validity is subtler than relevance. Using βt as an instrument rests on the exclusion restriction that βt affects yt only through effort at, not through direct channels (e.g., changing who participates, changing demand, or changing downstream moderation). While this is a modeling assumption, we can still impose operational checks: pre-register which product changes are allowed during learning windows; run placebo moments (does βt predict outcomes that should be unaffected?); and use holdout policies where some cohorts face fixed contracts to detect drift in the environment. These checks do not “prove” validity, but they create an auditable trail that the identifying assumptions were not obviously violated by concurrent interventions.
Repeated-measurement identification depends on conditional independence of measurement errors across channels. In platform terms, this is an engineering requirement: measurement pipelines should be de-correlated (different logging systems, different sampling noise, different post-processing) so that errors are not shared. When this is infeasible, we can sometimes construct approximately independent views by splitting traffic, time-slicing logs, or using separate raters. The cost is redundancy; the benefit is that the platform can learn while staying close to the best compliant contract, rather than paying an exploration tax.
Even when independence holds, instrumentation can be weakened by strategic behavior: if agents can differentially manipulate one measurement channel, the repeated-measurement IV becomes endogenous. This suggests an institutional complement to the formal algorithm: periodically rotate or refresh measurement channels, maintain tamper-evident logs, and audit for discontinuities around known thresholds. In our setting these are not merely “trust and safety” add-ons; they are part of the statistical apparatus that makes learning possible.
A practical deployment typically follows a loop: (i) fix a policy-derived C with a documented rationale and an auditable representation; (ii) run an estimator (IV/GMM or repeated-measurement IV) with explicit relevance diagnostics and regularization; (iii) map θ̂ to an implementable contract via a projection routine whose numerical tolerances are logged; and (iv) monitor both economic outcomes (payouts, participation) and identifying health (eigenvalues, residual moments). The central implementation message is that the constraint set C does triple duty: it determines which contracts are legal, which directions are identifiable, and which projections are computationally stable. Designing C is therefore not a purely legal step; it is part of mechanism design, econometrics, and systems engineering all at once.
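A toy, self-contained rendering of this loop under the quadratic benchmark, where $a(\beta) = \beta$ and the plug-in target is θ̂/2; every number, noise level, and update cadence below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d, T = 3, 400
theta_star = np.array([1.0, 0.6, 0.3])        # unknown to the platform
proj_C = lambda v: np.clip(v, 0.0, 0.75)      # (i) caps + limited liability

B, X, y = [], [], []
beta = np.full(d, 0.4)
for t in range(T):
    beta_t = proj_C(beta + 0.1 * rng.standard_normal(d))   # compliant exploration
    a_t = beta_t                                           # quadratic costs: a(beta) = beta
    x_t = a_t + 0.1 * rng.standard_normal(d)               # noisy task signals
    y_t = theta_star @ a_t + 0.1 * rng.standard_normal()   # realized outcome
    B.append(beta_t); X.append(x_t); y.append(y_t)
    if (t + 1) % 100 == 0:                                 # (ii) periodic IV step
        Bm, Xm, ym = np.array(B), np.array(X), np.array(y)
        theta_hat = np.linalg.solve(Bm.T @ Xm, Bm.T @ ym)  # IV normal equations
        beta = proj_C(theta_hat / 2.0)                     # (iii) project to comply
        lam = float(np.linalg.eigvalsh((Bm.T @ Xm + Xm.T @ Bm) / 2)[0])
        print(t + 1, np.round(beta, 3), round(lam, 1))     # (iv) log identifying health
```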
Our baseline analysis isolates a clean geometric message: once policy constraints are encoded as a closed convex set C, the economic problem of ``regulated contract design’’ reduces to a projection (Euclidean under quadratic costs; Bregman in more general homogeneous technologies), and learning can be layered on top via IV/GMM without breaking compliance. In practice, three frictions are especially salient for platforms: (i) agents differ in their effective returns-to-scale parameter k (and more broadly in cost curvature); (ii) outcomes y arrive with delay; and (iii) participation is endogenous. Each friction preserves part of the structure, but each also changes what must be modeled, randomized, and logged.
A convenient interpretation of k is as an ``elasticity of marginal
cost under scaling.’’ Empirically, this elasticity can differ across
agents (creators with different production processes), across tasks
(some tasks saturate quickly), or across time (seasonality, policy
shocks). A simple way to capture this is to index costs by type τ, writing $c_\tau$ as strictly convex, differentiable, and $k_\tau$-homogeneous. If the principal observes a type proxy s (e.g., a cohort label, tenure bin, or verified business status) and is permitted to condition contracts on it, then the regulated problem decomposes by segment: for each s we solve
$$\max_{\beta \in C(s)} u_s(\beta) = \langle \theta_s^* - \beta,\, a_s(\beta)\rangle, \qquad a_s(\beta) = \nabla c_s^*(\beta),$$
and implement $\beta^{\mathrm{reg}}(s)$ via the corresponding projection map (weighted in the quadratic benchmark, or Bregman in the general homogeneous case). Conceptually, regulation then acts within each segment, while heterogeneity is handled by conditioning.
When type is unobserved (or conditioning is restricted by policy), the platform effectively chooses a single β to maximize an objective 𝔼τ[uτ(β)]. Two points matter. First, the resulting objective typically remains concave in β under the same primitives (since it is an expectation of concave functions), so existence and uniqueness under compact C are not the obstacle; rather, the obstacle is that the simple scaling rule β = θ*/k no longer applies with a single k. Second, the welfare consequences of ``one-size-fits-all’’ incentives become legible: if high-k types are more costly to motivate, a contract calibrated to low-k types may induce excessive payouts without proportional benefit for high-k types, while a contract calibrated to high-k types may under-incentivize low-k types. In operational terms, this is a strong argument for (policy-permitting) coarse segmentation, because it can dramatically reduce the efficiency loss while preserving auditability.
A conservative alternative is robust design over a plausible set of curvatures, e.g. $k \in [k_{\min}, k_{\max}]$. In the unconstrained idealization one would choose β closer to $\theta^*/k_{\max}$, reflecting the fact that over-incentivizing is costlier than under-incentivizing when the platform is uncertainty-averse about curvature. Under regulation, the same logic translates into choosing β as the maximizer of $\min_{k \in [k_{\min}, k_{\max}]} u_k(\beta)$ over C, which is still a tractable concave–convex saddle problem but typically requires numerical methods.
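A grid-based sketch of the saddle problem, assuming the separable power family (for which $a_i(\beta) = \beta_i^{1/(k-1)}$ and the unconstrained optimum is $\theta^*/k$); the curvature grid, box constraints, and starting point are all illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

theta = np.array([1.2, 0.8, 0.5])
k_grid = np.linspace(2.0, 4.0, 9)        # plausible curvature range

def u(beta, k):
    # Separable power family: a_i(beta) = beta_i^(1/(k-1)),
    # hence u_k(beta) = <theta - beta, a(beta)>.
    a = np.power(np.maximum(beta, 1e-12), 1.0 / (k - 1.0))
    return float((theta - beta) @ a)

worst_case = lambda beta: min(u(beta, k) for k in k_grid)

# Maximize the worst case over a box stand-in for C.
res = minimize(lambda b: -worst_case(b), x0=theta / 3.0,
               bounds=[(1e-6, 0.9)] * theta.size)
print(np.round(res.x, 3), round(worst_case(res.x), 4))
```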
Many economically meaningful outcomes are delayed: refunds arrive
weeks later, long-run retention is only observed after a horizon, and
quality audits take time. Delays do not change the static regulated
optimum, but they do change the learning problem because the moment
condition
$$\mathbb{E}\big[\beta_t\,(y_t - \langle \theta^*, x_t\rangle)\big] = 0$$
cannot be evaluated at time t if $y_t$ is not yet observed. The practical implication is that the platform must run a learning pipeline with two clocks: contracts $\beta_t \in C$ are posted continuously, while the estimator of θ* is updated whenever outcomes mature. Algorithmically, this corresponds to online learning with delayed feedback; in regret bounds, one typically pays an additive (or multiplicative) penalty in the delay magnitude, reflecting that decisions are made using stale sufficient statistics.
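The following sketch (class and method names are our own) separates the two clocks: posting happens every period, while the IV update consumes only matured observations, including late backfills.

```python
import numpy as np
from collections import deque

class DelayedMomentBuffer:
    """Hold (beta_t, x_t) until y_t matures, then release it to the
    IV estimator. Contracts post every period; estimates update only
    when outcomes arrive."""
    def __init__(self):
        self.pending = {}              # t -> (beta_t, x_t)
        self.matured = deque()         # (beta_t, x_t, y_t)

    def post(self, t, beta_t, x_t):
        self.pending[t] = (beta_t, x_t)

    def backfill(self, t, y_t):
        if t in self.pending:          # late-arriving outcome
            beta_t, x_t = self.pending.pop(t)
            self.matured.append((beta_t, x_t, y_t))

    def iv_estimate(self):
        if len(self.matured) < 10:
            return None                # stale statistics: keep old contract
        B, X, y = map(np.asarray, zip(*self.matured))
        return np.linalg.solve(B.T @ X, B.T @ y)
```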
Two mitigations are worth highlighting. First, many platforms observe intermediate proxies quickly (e.g. early engagement or complaint rates). Even if these are imperfect substitutes for y, they can serve as auxiliary moments or as triggers for safety constraints that shrink C temporarily (a ``circuit breaker’’) while waiting for final outcomes. Second, delays interact with compliance in an important way: when policy requires immediate enforcement (e.g. payout caps), we can remain compliant because projection depends only on the current θ̂, not on observing y instantly. The cost is slower adaptation, not violation of constraints.
Our baseline model implicitly assumes a fixed agent who always supplies effort. Platforms instead face extensive-margin responses: agents can choose whether to participate, how much to participate, and which tasks to attempt. A parsimonious extension adds an outside option r and a participation indicator $p \in \{0, 1\}$. The agent participates if the value of the effort problem exceeds r:
$$p(\beta) = \mathbf{1}\Big\{\max_{a \ge 0}\, \langle \beta, a\rangle - c(a) \ge r\Big\} = \mathbf{1}\{c^*(\beta) \ge r\}.$$
This highlights an economically intuitive channel: increasing β raises not only induced effort a(β) but also the ``rent'' $c^*(\beta)$, hence participation.
Selection has two consequences. On the objective side, the principal now maximizes expected utility multiplied by participation, e.g. $\mathbb{E}[p(\beta)\langle \theta^* - \beta, a(\beta)\rangle]$ (possibly minus fixed onboarding costs). Even if u(β) is concave, the product with a discontinuous (or sharply changing) p(β) can create non-smoothness and local kinks, making the geometry less projection-like unless one smooths participation (e.g. random r with a continuous distribution). On the identification side, using $\beta_t$ as an instrument becomes more delicate because $\beta_t$ changes who appears in the data. Then $\beta_t$ may correlate with unobserved determinants of $y_t$ through composition, violating the exclusion restriction in a literal sense. This is not a fatal flaw, but it forces an explicit design choice: either (i) model participation and estimate a joint system (participation equation plus outcome equation), or (ii) restrict attention to regimes where participation is stable (e.g. for incumbents with low churn), treating entry/exit as an additional outcome to be monitored rather than a confounder to be ignored.
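As an illustration of the smoothing device, suppose r is random with a logistic distribution (an assumption we make here purely for tractability): participation becomes a smooth sigmoid in the agent's rent, and the participation-weighted objective is differentiable again. Quadratic benchmark, illustrative numbers.

```python
import numpy as np

def smoothed_objective(beta, theta, mu=0.2, s=0.05):
    # With r ~ Logistic(mu, s): E[p(beta)] = sigmoid((c*(beta) - mu)/s).
    # Quadratic benchmark: c*(beta) = ||beta||^2 / 2 and a(beta) = beta.
    rent = 0.5 * float(beta @ beta)
    participation = 1.0 / (1.0 + np.exp(-(rent - mu) / s))
    return participation * float((theta - beta) @ beta)

theta = np.array([1.0, 0.6])
for scale in (0.4, 0.5, 0.6):
    print(scale, round(smoothed_objective(scale * theta, theta), 4))
```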
A practical compromise is to treat participation as a first-class endogenous variable and to collect moments for it. For example, if pt is observed, one can augment the GMM system with instruments for participation (or impose a policy that randomizes a participation-neutral component of the contract), and test whether residual moments remain stable across participation strata. This does not eliminate selection, but it makes the modeling assumptions auditable.
These extensions make clear that the main empirical bottleneck is rarely the optimization routine; it is whether the platform has the right data to support identification under its own policy constraints. At minimum, the platform should log: the posted contract βt and the exact constraint regime in force (a versioned representation of C); the raw task signals xt (ideally at the most disaggregated level, before post-processing); the realized outcome yt with timestamps to measure delays; and agent identifiers so repeated observations can be linked. To address heterogeneity, the platform must record the segmentation variables it is allowed to condition on, as well as protected-class indicators when fairness constraints depend on them (with appropriate governance). To address participation, it must log offer exposure and acceptance (who saw which βt, who opted in, who churned), not just outcomes for participants. To address delayed outcomes, it must log maturation windows and backfills (late-arriving y updates), since these determine which observations are ``final’’ at any point in time.
Finally, if the platform relies on IV variation induced by randomized or quasi-random changes in βt, it must log the randomization protocol (seed, assignment unit, stratification variables) and all concurrent product interventions that could create direct channels from βt to yt. Without this provenance, one can compute an estimator, but one cannot defend the identifying assumptions when policy or litigation demands an explanation.
We view the projection characterization as a robust organizing principle, not as a claim that linear contracts and homogeneous costs are empirically exact. Heterogeneous curvature, delays, and selection already push us toward richer models (contextual contracts, delayed-feedback learning, joint participation–outcome systems). The remaining challenge is to expand expressiveness while preserving the two deployment virtues that motivated linearity in the first place: transparency (auditors can read β) and mechanical compliance (every update lies in C). Piecewise-linear or groupwise-linear contracts, and constrained contextual policies β(s) ∈ C(s), are promising intermediate steps: they enlarge the feasible policy class while keeping optimization convex and enforcement straightforward.
We began from a practical tension that many platforms and regulators now face. On the one hand, the platform would like to tailor incentives across many measurable dimensions of performance. On the other hand, policy constraints—caps, monotonicity requirements, groupwise equalities, and other notions of ``do no harm’’—restrict how contracts may depend on those dimensions. Our goal was to show that this tension has a clean economic resolution once we adopt two modeling commitments that are both analytically transparent and empirically testable: linear contracts over observable signals, and convex restrictions on the contract weights.
The first takeaway is geometric. When the agent's technology is well behaved—strictly convex and homogeneous in effort—the principal's problem of regulated contract design can be reorganized around a single object: the projection of the unconstrained benchmark onto the feasible policy set. In the canonical quadratic-cost benchmark, the platform's objective reduces to a concave quadratic in the contract weights, and regulation acts by mapping the unconstrained target $\theta^*/2$ to the nearest feasible point in a weighted norm:
$$\beta^{\mathrm{reg}} = \mathrm{Proj}_C^{Q}(\theta^*/2).$$
We emphasize the economic interpretation behind this algebra. Regulation does not introduce a qualitatively new motive; it changes which departures from the first-best are permitted. The projection representation makes this
explicit: the feasible set C
encodes compliance, while the metric induced by Q encodes how strongly incentives
translate into effort along each task dimension. A cap, for example,
prevents movement in a direction; an equality constraint collapses
directions; a monotonicity constraint creates a cone. The optimal
regulated contract is simply the closest compliant point to the
benchmark in the technology-relevant geometry.
The second takeaway is conceptual rather than algebraic: the same ``projection logic'' survives beyond quadratic costs. For general k-homogeneous costs, the principal no longer projects in Euclidean space. Instead, the agent's technology induces a dual geometry through the convex conjugate $c^*$, and the regulated optimum can be expressed (under standard regularity) as a Bregman-type projection onto C around the unconstrained target $\theta^*/k$. Even when this characterization does not collapse to a closed form, it still turns regulated contract design into a disciplined convex program: we can compute $\beta^{\mathrm{reg}}$ reliably, and we can interpret binding constraints through KKT conditions in exactly the way auditors and policy teams need (which constraint binds, in which direction, and with what shadow value). In short, moving away from quadratic costs changes the geometry, not the tractability.
The third takeaway concerns identification and learning. In many
applied settings the platform does not know the marginal values θ*, and the observable
task signals are themselves noisy or strategically distorted. The
central empirical message is that policy-compliant optimization and
causal learning can be layered. When effort is latent and signals are
measured with error, naive regression of outcomes on signals fails. But
the platform’s own contract variation provides an instrument: under the
maintained exclusion (that βt affects yt only through
induced effort), the moment condition
$$\mathbb{E}\big[\beta_t\,(y_t - \langle \theta^*, x_t\rangle)\big] = 0$$
enables consistent estimation of θ* via IV/GMM. Once θ̂ is in hand, compliance is restored
mechanically by projecting the plug-in unconstrained contract back into
C. This separation is
operationally valuable: it allows an experimentation or inference team
to focus on identification and variance control, while a
policy/compliance team can insist that every posted contract satisfy
constraints by construction.
The projection step is not merely cosmetic. Because projection onto a closed convex set is nonexpansive (in the appropriate norm), it provides a stability guarantee: estimation error in θ̂ does not get amplified into larger policy error in β̂. This is the key reason that we can obtain regret bounds in online learning that look familiar—on the order of $\widetilde{O}(d\sqrt{T})$ under standard instrument-strength conditions—even though we require every action to remain in C at every time. The algorithmic content is modest (explore to ensure relevance, estimate by IV, project to enforce compliance), but the modeling implication is substantial: learning and compliance are compatible by construction, as long as the platform is willing to represent policy as convex restrictions in contract space.
A useful way to summarize the paper is to separate three layers of structure that are often conflated. The first layer is the technology, summarized by c (or Q in the quadratic benchmark), which determines how incentives translate into effort. The second layer is the policy, summarized by the feasible set C, which determines what the platform is allowed to do. The third layer is the data environment, summarized by the signal and outcome equations and the strength of available instruments, which determines what the platform can learn. Our results show that these layers interact in a modular way: c determines the metric (Euclidean or Bregman), C determines the projection target, and the data determine how quickly we approach that target.
For practice, this modularity clarifies what must be specified—and
defended—when the platform claims to be optimizing subject to
regulation. The platform must (i) state the constraint set C precisely enough that an auditor
could verify membership, (ii) justify a mapping from contract weights to
induced effort that is consistent with observed behavior (or at least
robust to misspecification), and (iii) document the source of
identifying variation used to estimate θ*, including what is
randomized, what is held fixed, and what concurrent interventions might
violate exclusion. The value of the projection framing is that it forces
these commitments into explicit objects. A compliance review can ask
``what is $C$ and how is it enforced?'' An economics review can ask ``what is the implied responsiveness matrix or conjugate geometry?'' An
inference review can ask ``how strong are the instruments and how stable
are the moments?’’ These questions are separable, and the answers can be
logged and versioned.
At the same time, it is important to be clear about what the framework does not resolve. Linear contracts are attractive because they are transparent, easy to compute, and easy to enforce, but they are not always behaviorally or institutionally adequate. The assumption that β affects outcomes only through effort is likewise a modeling claim, not a theorem; it can fail if, for instance, incentives change composition, induce gaming that directly impacts measured outcomes, or interact with platform-side ranking and matching rules. Homogeneity is a tractable abstraction that captures returns to scale in a parsimonious way, yet real production systems may feature fixed costs, discontinuities, and complementarities that violate it. Our view is that the contribution here is not to deny these complexities, but to provide a baseline in which the economics of regulation, the mechanics of compliance, and the statistics of learning can be analyzed jointly and cleanly.
Several research directions follow naturally. One is to expand the contract class while preserving convex computability and mechanical compliance—for example, piecewise-linear weights, groupwise-linear rules, or constrained contextual policies that map observable covariates into a feasible set C(s). Another is to integrate richer dynamics: reputation and relational incentives, intertemporal substitution of effort, and delayed or multi-stage outcomes that require explicit state variables. A third is to treat fairness constraints not only as restrictions on β but also as restrictions on induced allocations and welfare, which may require coupling the projection step with equilibrium participation and matching. Each direction raises new identification challenges, but the organizing principle remains the same: represent policy as a feasible region, represent behavior as a response map, and ensure learning uses variation whose provenance can be explained.
The broader lesson is that regulation changes the geometry of optimization, not the need for optimization. Once we accept that compliance must be enforced at the level of deployed policies, the right question is not whether the platform can optimize, but whether it can do so in a way that is transparent, auditable, and statistically grounded. In our setting, convex constraints and projection-based updates provide a direct affirmative answer. The remaining work is to bring richer empirical content to the primitives—to measure responsiveness, to validate exclusion, to characterize heterogeneity and selection—while preserving the two deployment virtues that linear, constrained contracts deliver: every policy is interpretable, and every policy is compliant by construction.