We study a Stackelberg model of strategic classification with heterogeneous groups, group-dependent information (peer learning), and monotone manipulation costs. In a one-dimensional threshold setting we derive closed-form equilibrium acceptance probabilities and show a sharp dichotomy: fairness repairs achieved by increasing selectivity on the advantaged group are generically unstable—there exist environments where demographic parity holds before adaptation yet fails by a constant amount in equilibrium. In contrast, repairs achieved by inclusiveness—expanding access for the disadvantaged group—are equilibrium-stable, with post-adaptation disparities bounded by simple functions of cost and information heterogeneity.
Our results synthesize and extend the incentive-aware ML literature (strategic classification, partial information, and fairness reversal) by turning an empirical warning into a design rule: if fairness is required in strategic environments, constraints should be posed and optimized on equilibrium outcomes, and, when solutions are non-unique, the inclusive branch is robust. We also provide an efficient algorithm (binary search over thresholds) to compute inclusive equilibrium-fair rules and validate the theory with simulations on benchmark datasets under both continuous manipulation and manipulation graphs.
Modern decision systems routinely face a tension between two desiderata that are each compelling in isolation: we want simple, auditable rules (often threshold-based), and we want those rules to satisfy formal notions of fairness across protected groups. Yet these systems are deployed into environments in which individuals respond strategically. When a decision rule is predictable enough to be accountable, it is also predictable enough to be exploited. The central motivation of this paper is that the relevant object for fairness assessment is therefore not the rule “on paper,” nor even its performance under truthful reporting, but rather the rule’s equilibrium consequences after agents adapt.
A canonical example comes from consumer credit. Lenders increasingly rely on score-like indices—traditional credit scores, bank-account “cash-flow scores,” or proprietary underwriting models—that are then converted into accept/reject decisions through relatively transparent cutoffs. At the same time, both regulators and internal risk teams monitor disparities in acceptance rates across groups and may impose demographic-parity-style targets. A natural operational response is to “repair” disparities by adjusting thresholds: tighten the cutoff for an advantaged group, loosen it for a disadvantaged group, or combine both. However, the deployment of a repaired threshold changes incentives. If there are accessible actions that raise the relevant score (opening certain credit lines, paying for a credit-builder product, reallocating payments to optimize utilization, disputing items, or simply learning which behaviors the model rewards), then agents near the cutoff have a reason to take them. Importantly, the ability and willingness to do so is not uniform. Some populations have better access to financial products, more stable liquidity, or stronger peer networks that transmit actionable information. As a result, a threshold change that equalizes acceptance under baseline scores can be followed by a behavioral response that partially or fully undoes the equalization. In practice, one observes what we call a fairness reversal: an intervention that appears parity-improving ex ante becomes parity-worsening ex post.
Hiring provides a second motivating domain. Resume-screening systems and assessment tests are often summarized by a scalar score, and the operational decision is frequently a cutoff. Firms then face legal and reputational pressure to ensure that selection rates do not diverge too sharply across groups. But applicants can and do invest in targeted preparation: resume keyword optimization, coaching for psychometric screens, practice on leaked question banks, or the purchase of services that specifically advertise “beating the ATS.” Again, these investments are not equally available. Some candidates have access to better advising, more time to prepare, or communities that share tactical information. Moreover, the relevant strategic lever is typically threshold-crossing: an applicant slightly below the cutoff has the strongest incentive to engage in whatever action yields a marginal score increase, because the payoff from crossing the cutoff is discrete (an interview, a job offer, a chance to proceed). This creates a predictable pattern of manipulation concentrated in a band just below the decision boundary. A fairness repair that operates by tightening selection for an advantaged group can therefore backfire if that group is also the one more capable of “just clearing” the tightened boundary.
These examples illustrate why we insist on distinguishing pre-adaptation from post-adaptation disparity. A principal may compute acceptance rates under truthful reporting and conclude that a rule satisfies demographic parity. But if the principal’s deployment induces a subset of agents to alter their reported score, then the realized acceptance rates reflect both the statistical properties of the underlying distributions and the strategic environment in which the rule is embedded. The appropriate fairness object is thus an equilibrium acceptance probability, i.e., the probability that an agent is accepted after accounting for the agent’s optimal response to the rule. In this sense, equilibrium fairness is not an exotic refinement; it is the minimal standard needed for fairness claims to be robust to predictable behavioral feedback.
Our modeling approach is designed to formalize this robustness concern while remaining close to how threshold rules are used in practice. We take seriously the fact that agents are heterogeneous not only in their baseline qualification (a latent “true” score) but also in their ability to respond to the mechanism and in their access to information about how the mechanism works. Two simple primitives capture these ideas. First, we allow group-dependent costs of score improvement, which we summarize by a manipulation budget: how much an agent can raise the score before the marginal cost exceeds the unit value of acceptance. Second, we allow group-dependent informational responsiveness: not all agents will discover, understand, or trust the rule well enough to optimize against it. Some will behave strategically, others will not. In many real settings, this informational channel is shaped by opacity and peer learning. If the rule is opaque, fewer agents can target it; if communities share knowledge effectively, more agents can. Either way, informational differences can translate directly into disparities in who is able to exploit the decision boundary.
This perspective also clarifies why certain fairness “fixes” are particularly fragile. A common repair heuristic is what one might call selective: to reduce the acceptance rate of an advantaged group by raising its threshold, holding the disadvantaged group’s threshold fixed. Selective repairs are attractive because they preserve standards for the disadvantaged group and often look defensible as a procedural change. But they also create a newly valuable region just below the raised threshold, precisely where manipulation is cheapest and most effective. If the advantaged group is also more able or more informed, then the principal’s attempt to reduce its acceptance rate can be offset by increased manipulation, leaving the principal with a rule that is fair on paper but unfair in equilibrium. By contrast, repairs that operate by becoming more inclusive toward the disadvantaged group—loosening its threshold—often interact differently with strategic response: they increase baseline acceptance directly, and the residual disparity from manipulation can be bounded by how sharply the score distributions change near the boundary and by how different the groups are in responsiveness and manipulation capacity. The broad lesson is not that inclusiveness is always optimal, but that equilibrium stability depends on where the repair moves mass relative to the decision boundary and on which populations are best positioned to respond.
The policy relevance of equilibrium fairness is immediate. Regulators increasingly contemplate transparency requirements, adverse-action explanations, and standardized model documentation. These interventions may improve accountability, but they also change the strategic environment by making the rule easier to learn and target. Similarly, firms invest in “fairness through awareness” programs and external audits that may inadvertently publicize operational thresholds. Our framework does not take a normative stance against transparency; rather, it highlights a tradeoff: holding fixed the underlying distributions, increased learnability can amplify behavioral responses, and if learnability is uneven across groups, it can amplify disparity. This suggests that fairness governance cannot be separated from the information structure of the environment. In practice, a fairness assessment should specify whether it is conditional on truthful behavior, on partial strategic response, or on full adaptation—and which populations are expected to be able to adapt.
We close this introduction with a note on scope. We deliberately study a stylized, one-dimensional threshold environment, in which agents can only increase a scalar score and the decision-maker’s action is a cutoff (possibly group-conditional). This abstraction fits many operational pipelines—scores are ubiquitous, thresholds are common, and “gaming” often takes the form of raising a measured feature. At the same time, we do not claim that all strategic behavior is captured by additive score manipulation, nor that group differences can be reduced to a single cost and information parameter. Real settings feature multidimensional actions, general-equilibrium spillovers, dynamic learning by the principal, and welfare consequences beyond acceptance probabilities. Our aim is more modest and, we believe, complementary: to isolate a tractable mechanism by which fairness constraints that are satisfied under baseline behavior can fail after deployment, and to provide conditions under which certain classes of repairs are predictably more stable. In doing so, the model illuminates a practical lesson: fairness is not only a property of an algorithm, but also a property of the strategic environment the algorithm creates.
Our analysis sits at the intersection of three literatures: strategic classification and “gaming” of predictive systems; algorithmic fairness under behavioral response; and the role of information, transparency, and social context (including the distribution of burdens induced by compliance or manipulation).
A first thread studies prediction and classification when individuals can change inputs in response to a decision rule. Early computer-science work on adversarial classification and strategic manipulation emphasizes that when decision boundaries are known or can be inferred, agents will shift features in order to obtain favorable outcomes (e.g., in spam, fraud, or access-control settings). Subsequent work in strategic classification formalizes this feedback in economic terms: agents choose actions to maximize utility net of costs, and the principal’s deployed classifier shapes the induced action distribution. This literature clarifies a point we take as foundational: the object a principal “sees” after deployment is an endogenous distribution of reported features, not the baseline distribution of latent types. Our model adopts the canonical separation between a latent scalar “qualification” (here, a base score z) and a manipulable report ẑ, and it uses a standard best-response logic with monotone costs. Where we depart from much of the strategic-classification literature is in placing fairness constraints—not only accuracy or robustness—at the center of the principal’s problem, and in making group-dependent strategic responsiveness a primitive rather than an equilibrium artifact of a fully specified learning environment.
A second thread focuses directly on fairness constraints and their interaction with incentives. In the fairness literature, demographic parity and related criteria are typically defined with respect to outcomes under a fixed data-generating process. A large body of work explores the impossibility of simultaneously satisfying multiple fairness definitions, and the statistical tradeoffs between error rates across groups. More recent work asks what happens when the mechanism itself changes behavior: fairness constraints can shift incentives, and those incentive shifts can be heterogeneous across groups. One line of papers analyzes “fairness and effort” settings in which individuals invest effort to improve an evaluation score; fairness constraints can reallocate effort, sometimes increasing total welfare but sometimes imposing disproportionate burdens. Another line studies “recourse” and the feasibility of improving one’s outcome under a model: even when a classifier is formally fair, the set of attainable changes can differ by group, implying unequal access to favorable outcomes. Our formulation is intentionally simpler than these frameworks—we treat acceptance as a binary benefit and manipulation as an additive increase in a single score—but the simplification is purposeful: it isolates the threshold-crossing logic that makes behavioral response particularly sharp near a cutoff, and it allows us to express post-deployment acceptance rates in closed form. This tractability lets us distinguish repairs that operate by excluding (raising thresholds) from repairs that operate by including (lowering thresholds), and to show how these choices map into equilibrium disparities when manipulation ability and information differ.
A closely related behavioral-feedback literature studies performative prediction: when a predictor is deployed, it changes the environment that generates future data, so a model that is optimal on historical data may be suboptimal once it influences behavior. Our setting can be read as a one-shot, Stackelberg version of this idea: the principal commits to thresholds, agents respond, and realized outcomes reflect an endogenous distribution. The performative prediction perspective highlights a methodological lesson that motivates our equilibrium fairness metric: evaluating a rule “offline” on baseline data can be systematically misleading when the rule itself is a causal intervention. Our contribution is to bring this logic into a fairness-repair question that is operationally common—adjusting thresholds to equalize selection rates—and to characterize when such repairs are stable to strategic response.
A third thread concerns information, transparency, and the social processes by which agents learn how to respond to algorithms. The interpretability and transparency literatures often frame explanations as unambiguously beneficial: they improve accountability, enable contestability, and support compliance. At the same time, a mechanism-design perspective emphasizes that greater transparency can increase manipulability. Several recent papers study this tension formally, showing that revealing decision rules can induce gaming and reduce predictive validity, while also potentially improving welfare by enabling productive investment. Our reduced-form parameter λg is meant to capture precisely the “learnability” channel that connects transparency, peer learning, and strategic response. Rather than model information acquisition explicitly (e.g., through costly experimentation, social networks, or repeated play), we allow a fraction λg of each group to best-respond. This choice is a limitation—λg compresses many mechanisms into one scalar—but it is also a strength: it lets us ask how fairness repairs behave under plausible comparative statics (higher λg due to better access to advice, higher exposure to the system, or stronger peer networks), without committing to a particular microfoundation. In this sense, λg bridges two empirical observations that are often discussed separately: first, that some communities have more effective channels for disseminating “what the model wants,” and second, that transparency interventions can differentially amplify those channels.
Our discussion also relates to work on disparate impact as a burden rather than (or in addition to) a disparity in selection rates. Even when groups achieve similar acceptance probabilities, the cost of achieving acceptance can differ if one group must invest more (time, money, stress, or foregone opportunities) to cross the decision boundary. This concern appears in scholarship on the “hidden costs” of algorithmic systems and in formal models of effort-based fairness. While our core results focus on demographic parity in acceptance rates—because that is a common compliance metric and analytically tractable—our framework naturally speaks to burden through the manipulation cost function cg(⋅). In particular, a rule that equalizes equilibrium acceptance probabilities may still be inequitable if it induces systematically higher manipulation expenditure for one group. We do not fully pursue welfare or burden-optimal policy here, but we view the ability to compute induced manipulation regions and costs as a useful byproduct of the equilibrium characterization, and as a stepping stone to richer objectives.
Finally, our emphasis on repair policies connects to an applied literature on post-processing and threshold adjustment to satisfy fairness constraints. Much of that literature treats threshold changes as a direct mapping from score distributions to acceptance rates. Our results caution that, in strategic settings, the mapping is itself policy-dependent: moving a threshold changes not only who is accepted under baseline scores, but also who finds it worthwhile to manipulate. This matters especially for the common “selective repair” heuristic of tightening standards for an advantaged group. The key conceptual link to prior work is that fairness constraints can create incentives to concentrate effort (or gaming) around decision boundaries; our added point is that when this concentration is group-skewed—through either differential ability to manipulate (bg) or differential ability to learn (λg)—repairs that look parity-restoring under truthful reporting can be parity-destabilizing after deployment.
In sum, we borrow from strategic classification the idea that deployed rules induce behavioral response; from performative prediction the idea that evaluation must be equilibrium-consistent; and from fairness and recourse the idea that both outcomes and burdens matter and can differ by group. Our contribution is to combine these insights in a parsimonious threshold environment and to use that parsimony to obtain sharp, policy-relevant distinctions between exclusionary and inclusionary fairness repairs. We now turn to the baseline model that delivers this tractable equilibrium mapping from thresholds to post-adaptation acceptance rates.
We study a one-shot, Stackelberg model of strategic classification with two observable groups g ∈ {A, B}. The principal (decision-maker) commits to a simple threshold rule, agents respond by possibly manipulating their reported score, and the principal’s realized selection rates reflect this endogenous response. The purpose of the baseline model is not realism in every detail, but tractability: we want a setting in which (i) strategic behavior concentrates around a cutoff in an analytically transparent way, and (ii) differences in groups’ ability to respond can be represented by a small number of interpretable primitives.
Latent scores. Each agent belongs to a group g and draws a one-dimensional, latent “base score” z ∼ Dg. We take Dg as a primitive of the environment, with continuous CDF Fg and density fg. The base score can be interpreted as a composite index of underlying qualification or risk—e.g., a creditworthiness index, a standardized test score, or an internal risk score—whose distribution differs across groups due to historical and structural factors. Our emphasis is on how a fixed decision rule interacts with these baseline differences once agents can adapt.
The one-dimensionality assumption is standard in threshold models and is motivated by practice: many high-stakes decisions are operationalized via a scalar score followed by a cutoff. More importantly, a scalar score isolates a key mechanism: when acceptance is determined by whether one crosses a single boundary, incentives to act are sharply localized near that boundary. This localization drives our main comparative statics and, ultimately, the contrast between exclusionary (raising thresholds) and inclusionary (lowering thresholds) “repairs.”
Decision rule (thresholds). The principal chooses a
vector of group-specific thresholds t = (tA, tB) ∈ ℝ².
After observing the agent’s reported score ẑ and group g, the principal applies
ft(ẑ, g) = 1{ẑ ≥ tg}.
Allowing tA ≠ tB
is meant to encompass two cases: (i) explicit group-conditional policies
(which may be legally or normatively constrained in some domains), and
(ii) an analytical proxy for post-processing or “repair” rules that
effectively implement different cutoffs across groups through other
instruments. Our results are most naturally read as statements about
what kind of threshold movement—tightening versus
loosening—tends to be stable once behavior adjusts.
Manipulation technology. After observing z, agents may increase their
reported score. We model manipulation as an additive increase Δ ≥ 0, so that
ẑ = z + Δ.
This captures many forms of “gaming” or costly improvement that
primarily move an individual along the model’s scoring dimension: paying
for test prep, hiring a consultant to optimize an application,
reallocating transactions to improve a credit score, or selectively
disclosing information. We abstract from manipulation that decreases
scores (it is never privately optimal here) and from multidimensional
feature changes (which complicate identification of the relevant
margin). Additivity is a simplification, but it delivers a transparent
threshold-crossing logic: to change an outcome, an agent must move just
enough to cross tg.
Manipulation is costly. A group-g agent who increases by Δ pays cg(Δ), where cg : ℝ+ → ℝ+ is strictly increasing, continuous, and normalized by cg(0) = 0. We interpret cg broadly as the private burden of attempting to improve one’s score along the relevant dimension, including time, money, stress, and foregone opportunities. Allowing cg to vary by group is crucial: some groups may have systematically better access to credit-building products, coaching, documentation, or networks that reduce the cost of strategic response.
Agents obtain a normalized value of 1 from being accepted. Thus, conditional on
(z, g) and thresholds
t, utility is
Ug(z, Δ; t) = 1{z + Δ ≥ tg} − cg(Δ).
The “value = 1” normalization is
without loss of generality for our purposes: it pins down units and
makes costs immediately interpretable as fractions of the acceptance
benefit. What matters for the behavior we study is the binary nature of
acceptance and the monotonicity of costs, which together imply that
manipulation decisions are governed by a simple comparison between the
gain from switching the classification outcome and the cost of doing
so.
A useful summary of manipulation ability is the budget (or
maximal worthwhile shift)
bg := cg−1(1).
This is the largest score increase an agent in group g would ever be willing to pay for
if acceptance yields value 1. When
bA > bB,
group A can, in this
reduced-form sense, “move farther” for the same benefit. We will see
that differences in (bA, bB)
translate directly into differences in how much mass each group can push
over a threshold.
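As a simple illustration (ours, not a restriction of the model), consider a power cost family:
$$
c_g(\Delta)=\Big(\tfrac{\Delta}{\beta_g}\Big)^{\alpha_g},\quad \alpha_g>0
\qquad\Longrightarrow\qquad
b_g=c_g^{-1}(1)=\beta_g .
$$
Here the curvature αg governs how costly small adjustments are, but the budget depends only on the scale βg, so bA > bB simply encodes that group A can profitably move farther.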
Information heterogeneity (limited strategic responsiveness). A second central primitive is that not all agents know how to respond effectively. In many real systems, the decision rule is partially opaque: agents may not know the precise cutoff, may be uncertain which actions map into score changes, or may lack the sophistication to optimize. At the same time, some agents learn through repeated interaction, advice markets, social networks, or formal disclosures. We capture these channels parsimoniously by assuming that, within group g, an agent best-responds with probability λg ∈ [0, 1], and otherwise does not manipulate (reports ẑ = z).
We interpret λg as a reduced-form responsiveness rate: it summarizes both access to information and the ability to translate information into effective action. Differences in λg can arise from differential exposure to the system (more frequent applicants learn faster), differential access to counselors or online forums, or heterogeneous trust and engagement with institutions. This modeling choice deliberately avoids committing to a microfoundation (e.g., costly information acquisition or network diffusion), which would add detail but obscure the comparative statics we care about. In equilibrium, λg functions as an “opacity/transparency” lever: holding all else fixed, higher λg makes the deployed rule more gameable by a larger share of group g.
Timing and commitment. The interaction is a one-shot Stackelberg game. The principal first commits to thresholds t. Then each agent observes their group g and base score z. With probability λg, the agent is informed and chooses Δ to maximize Ug; with probability 1 − λg, the agent does not manipulate. The agent reports ẑ = z + Δ, and the principal applies ft.
This timing mirrors “offline deployment” of a policy: a lender posts underwriting standards, a university sets an admissions cutoff, or an agency implements an eligibility threshold; applicants then adapt. It also clarifies what we mean by evaluating a fairness repair: a repair is not assessed against the historical distribution of z alone, but against the post-deployment distribution of ẑ, which is policy-dependent.
Principal’s objective and the role of fairness. We keep the principal’s objective deliberately flexible. In some applications the principal cares about predictive accuracy or expected loss with respect to an outcome y (e.g., repayment), and thresholds are chosen to trade off errors. In others, the principal faces explicit constraints on selection rates. Our analysis treats threshold choice as the policy instrument and focuses on how fairness constraints map into thresholds when behavior responds. Concretely, our later results will compare two types of threshold adjustments that are common in practice: tightening standards for one group versus loosening standards for another. The baseline model’s role is to make these adjustments comparable within a single environment in which both manipulation ability (bg) and information (λg) can differ.
We emphasize at the outset a limitation that also motivates our focus: because manipulation is costly, equalizing acceptance probabilities need not equalize burdens. Even if a repair achieves similar selection rates across groups after adaptation, one group may pay more (in expected manipulation cost) to attain those outcomes. Our core metric will be disparity in acceptance rates because it is operationally prominent and analytically clean, but the model keeps track of costs, enabling burden calculations as a natural extension.
With primitives (Dg, cg, λg) and a threshold vector t in place, we can now characterize equilibrium behavior and derive closed-form acceptance probabilities as a function of effective thresholds—an equilibrium mapping that will allow us to distinguish “fair on paper” (pre-adaptation) from “fair in equilibrium” (post-adaptation) policies.
Fix a threshold vector t = (tA, tB). Because the principal’s decision is binary and manipulation only enters through the reported score ẑ = z + Δ, an agent’s strategic problem has a particularly stark structure: conditional on being strategic, the agent either (i) does nothing, or (ii) manipulates exactly enough to cross the relevant cutoff. This “threshold-crossing” logic is the engine behind all of our subsequent stability results, because it lets us map any deployed threshold rule into an analytically tractable, policy-dependent acceptance rate.
Consider a group-g agent
with base score z. If the
agent is among the fraction λg who can
best-respond, they solve
maxΔ ≥ 0 1{z + Δ ≥ tg} − cg(Δ).
There are two cases.
If z ≥ tg: the agent is already accepted without manipulation. Since cg(Δ) is strictly increasing and cg(0) = 0, any Δ > 0 strictly reduces utility. Hence Δg*(z; t) = 0.
If z < tg: acceptance requires choosing Δ ≥ tg − z. Because costs are strictly increasing, among all Δ that secure acceptance, the cheapest is Δ = tg − z. Thus the agent compares two options: reject with Δ = 0 (utility 0) versus accept by just crossing (utility 1 − cg(tg − z)). The agent manipulates if and only if cg(tg − z) ≤ 1. Using the manipulation budget bg := cg−1(1), this condition is equivalent to tg − z ≤ bg, i.e., z ≥ tg − bg.
Putting these cases together, the strategic best response is
$$
\Delta_g^*(z;\mathbf t)=
\begin{cases}
0, & z\ge t_g,\\
t_g-z, & z\in[t_g-b_g,\ t_g),\\
0, & z<t_g-b_g.
\end{cases}
$$
Two immediate implications are worth highlighting. First, manipulation
is localized to an interval of width bg just below
the threshold—exactly the region where outcomes are “within reach.”
Second, for strategic agents the deployed cutoff tg is
behaviorally transformed into an effective cutoff tg − bg:
anyone with z ≥ tg − bg
can ensure acceptance by paying a cost weakly below 1.
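For concreteness, the best response admits a three-line implementation; the function name and signature below are ours and serve only as a sketch of the case analysis above.

```python
def delta_star(z: float, t_g: float, b_g: float) -> float:
    """Best response of a strategic group-g agent with base score z.

    Manipulation is worthwhile only in the band [t_g - b_g, t_g): below it,
    crossing costs more than the unit acceptance value; at or above t_g the
    agent is accepted anyway and any positive Delta only adds cost.
    """
    if t_g - b_g <= z < t_g:
        return t_g - z   # pay just enough to cross the cutoff
    return 0.0           # already accepted, or acceptance is out of reach
```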
The model’s information heterogeneity enters only through whether the agent is able to implement the best response. With probability λg, the agent is strategic and plays Δg*(z; t); with probability 1 − λg, the agent does not manipulate and reports ẑ = z. We can interpret the realized population as a mixture of two “types” within each group: a strategic subpopulation that effectively faces cutoff tg − bg, and a non-strategic subpopulation that faces the literal cutoff tg.
Although one could characterize the full distribution of reported scores ẑ, for our purposes it is enough to compute acceptance probabilities. Let Sg(u) := 1 − Fg(u) denote the survival function of Dg. Under our continuity assumption, there is no mass exactly at the cutoff, so acceptance events can be treated cleanly using Sg.
Let pg(t) denote the equilibrium (post-adaptation) acceptance probability in group g. Condition on whether the agent is strategic: with probability 1 − λg the agent reports ẑ = z and is accepted if and only if z ≥ tg, which has probability Sg(tg); with probability λg the agent plays Δg*(z; t) and, by the best response above, is accepted if and only if z ≥ tg − bg, which has probability Sg(tg − bg).
Therefore,
pg(t) = (1 − λg) Sg(tg) + λg Sg(tg − bg).
This expression makes transparent how behavior mediates the effect of
thresholds. Relative to the “truthful” acceptance rate Sg(tg),
the deployed policy receives an endogenous boost equal to
pg(t) − Sg(tg) = λg(Sg(tg − bg) − Sg(tg)),
which is increasing in both responsiveness λg and
manipulation ability bg, and is
amplified when there is substantial probability mass near the cutoff
(i.e., when fg is large
around tg).
Several monotonicities follow immediately and will be repeatedly used later. For each group g, pg(t) is strictly decreasing in its own threshold tg (since Sg is strictly decreasing under a continuous density), weakly increasing in λg, and weakly increasing in bg. In words: raising a cutoff reduces acceptance, but some of that reduction is “undone” by strategic response, especially in groups that are better able to manipulate or better informed about how to do so.
It is also helpful to emphasize what does not appear in pg(t): we do not need the full functional form of cg(⋅), only the single-index summary bg = cg−1(1). This is a direct consequence of binary payoffs from acceptance. More elaborate value structures (e.g., heterogeneous values of acceptance) would generally require richer summaries of costs; we return to this limitation in our discussion, but the binary case already isolates the core stability mechanism.
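The closed form for pg(t) is equally direct to evaluate; the Gaussian survival function below is an illustrative assumption, not part of the model.

```python
from scipy.stats import norm

def p_eq(t_g: float, lam_g: float, b_g: float, survival) -> float:
    """Equilibrium acceptance: (1 - lam_g) * S_g(t_g) + lam_g * S_g(t_g - b_g)."""
    return (1.0 - lam_g) * survival(t_g) + lam_g * survival(t_g - b_g)

# Example with an assumed Gaussian base-score distribution for one group.
S_g = lambda u: norm.sf(u, loc=0.5, scale=1.0)
print(p_eq(t_g=1.0, lam_g=0.6, b_g=0.4, survival=S_g))
```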
Because acceptance depends on the reported score ẑ, fairness assessed “on paper” using the base distribution of z can diverge from fairness “in equilibrium” after agents adapt. We therefore distinguish two demographic-parity gaps associated with the same deployed thresholds t.
Pre-adaptation (truthful) acceptance. If all agents
reported ẑ = z,
group-g acceptance would
be
Pr [ft(z, g) = 1 ∣ g] = Pr [z ≥ tg ∣ g] = Sg(tg),
and the corresponding demographic-parity gap is
Δpre(t) := |SA(tA) − SB(tB)|.
Equilibrium (post-adaptation) acceptance. Under
strategic response with limited information, group-g acceptance is pg(t)
as above, and the equilibrium demographic-parity gap is
Δeq(t) := |pA(t) − pB(t)| = |(1 − λA)SA(tA) + λASA(tA − bA) − (1 − λB)SB(tB) − λBSB(tB − bB)|.
The key conceptual point is that Δeq(t) is not a property of (DA, DB) and thresholds alone: it is jointly determined by thresholds and the groups’ response parameters (λg, bg). A policy can therefore be calibrated to satisfy Δpre(t) = 0 and yet fail to satisfy Δeq(t) = 0 once deployed, simply because the underlying incentives to cross a cutoff are uneven across groups.
Finally, while our primary fairness object is acceptance-rate disparity, the equilibrium characterization also pins down manipulation incidence and burden. For example, among strategic group-g agents, manipulation occurs exactly when z ∈ [tg − bg, tg); thus the manipulation rate is λg(Fg(tg) − Fg(tg − bg)), and expected manipulation cost can be written as an integral over this interval. These quantities will matter when we later discuss the welfare consequences of “repairs” that equalize acceptance rates at the expense of unequal (or inefficient) effort costs.
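The sketch below assembles these quantities, Δpre, Δeq, manipulation rates, and expected manipulation costs, for a pair of illustrative Gaussian groups with linear costs; all parameter values are assumptions made for the example.

```python
from scipy.stats import norm
from scipy.integrate import quad

# Illustrative primitives (not calibrated values).
D = {"A": norm(1.0, 1.0), "B": norm(0.0, 1.0)}                 # base-score distributions D_g
lam = {"A": 0.7, "B": 0.3}                                     # responsiveness lambda_g
b = {"A": 0.6, "B": 0.2}                                       # budgets b_g
cost = {g: (lambda d, bg=bg: d / bg) for g, bg in b.items()}   # linear cost, c_g(b_g) = 1
t = {"A": 1.2, "B": 0.8}                                       # deployed thresholds

S = {g: D[g].sf for g in D}
p_pre = {g: S[g](t[g]) for g in D}
p_eq = {g: (1 - lam[g]) * S[g](t[g]) + lam[g] * S[g](t[g] - b[g]) for g in D}

delta_pre = abs(p_pre["A"] - p_pre["B"])     # gap under truthful reporting
delta_eq = abs(p_eq["A"] - p_eq["B"])        # gap after adaptation

def manipulation_stats(g):
    """Manipulation rate and expected manipulation cost in group g."""
    rate = lam[g] * (D[g].cdf(t[g]) - D[g].cdf(t[g] - b[g]))
    burden, _ = quad(lambda z: cost[g](t[g] - z) * D[g].pdf(z), t[g] - b[g], t[g])
    return rate, lam[g] * burden

print(delta_pre, delta_eq, manipulation_stats("A"), manipulation_stats("B"))
```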
With the equilibrium mapping t ↦ (pA(t), pB(t)) in hand, we can now study policy classes that practitioners use to enforce parity—loosening cutoffs for a disadvantaged group versus tightening cutoffs for an advantaged group—and ask which classes remain fair after adaptation rather than only under truthful reporting.
In applied settings, demographic parity (and related group constraints) is rarely implemented by redesigning the underlying score; instead, it is typically implemented by post-processing a fixed score with group-conditional cutoffs. This is true both in “human-in-the-loop” systems (where caseworkers are instructed to apply a different bar across groups) and in automated screening (where the platform’s accept/reject threshold is adjusted). Our model is tailored to this reality: the principal’s policy instrument is the threshold vector t = (tA, tB), and the only difference across common repair heuristics is which direction we move each cutoff relative to some baseline.
To make this precise, we start from a benchmark threshold t that the principal would deploy absent fairness considerations. Conceptually, t can be the accuracy-optimal common threshold under truthful reporting, or simply an institutionally “default” cutoff (e.g., a credit score standard) that is treated as fixed for legal or operational reasons. A fairness repair is then any mapping from t into a group-specific pair (tA, tB) intended to reduce an acceptance-rate gap. The key choice is whether we seek parity by tightening standards for the group with higher base acceptance, or by loosening standards for the group with lower base acceptance.
We call a repair selective if it holds fixed the
disadvantaged group’s cutoff and raises the advantaged group’s cutoff.
In our two-group notation (where group A is often, though not always, the
higher-scoring group in the monotone likelihood ratio (MLR) sense), the selective class anchored at
t is
𝒯sel(t) := {(tA, tB): tB = t, tA ≥ t}.
The interpretation is straightforward: we “repair” a disparity by
excluding additional marginal applicants from group A, leaving group B’s standard untouched. This
captures a family of policies that are common in practice when
decision-makers are reluctant (for legal, political, or risk-management
reasons) to explicitly lower standards for the lower-acceptance group.
It also aligns with a popular algorithmic instinct: if one group is
over-represented among accepted cases, raise that group’s bar until the
numbers match.
Under truthful reporting, the selective strategy is mechanically effective whenever SA(t) exceeds SB(t): increasing tA decreases SA(tA) continuously, so one can often find tA such that SA(tA) = SB(t). Our equilibrium characterization, however, suggests an immediate tension: selective repairs create a larger manipulation incentive margin for group A exactly near the new, higher cutoff. Because acceptance in equilibrium is governed by the effective cutoff tA − bA for strategic agents, the question is not only whether selectivity equalizes outcomes “on paper,” but whether it remains fair after the advantaged group responds to the newly created incentive.
We call a repair inclusive if it holds fixed the advantaged
group’s cutoff and lowers the disadvantaged group’s cutoff. The
inclusive class anchored at t
is
𝒯inc(t) := {(tA, tB): tA = t, tB ≤ t}.
Here the repair equalizes acceptance by including more marginal
applicants from group B. This
captures policies often described as affirmative action, targeted
outreach coupled with relaxed screening, or “equal opportunity”
adjustments implemented at the decision boundary rather than in the
score construction.
Under truthful reporting, inclusiveness similarly offers a direct route to parity: decreasing tB increases SB(tB), and one can typically choose tB so that SB(tB) = SA(t). Importantly, inclusive repairs change the incentive environment differently than selective repairs. Lowering tB shrinks the mass of group B agents who are “just below” the threshold and hence shrinks the set for whom manipulation is pivotal. In our equilibrium formula, this manifests through the terms SB(tB) and SB(tB − bB) moving together as tB changes. Intuitively, inclusiveness reduces reliance on agents “gaming” the boundary to reach acceptance, because more of the desired acceptance is achieved directly through the policy choice.
Many real interventions adjust both thresholds, either explicitly (a
regulator sets group targets and allows both standards to move) or
implicitly (internal tuning changes a common threshold while also
applying a group-specific offset). We therefore define a mixed class
anchored at t as any set
allowing both movements:
𝒯mix(t) := {(tA, tB): tA ≥ t, tB ≤ t}.
Mixed repairs can be viewed as convex combinations of selectivity and
inclusiveness: we partly tighten group A and partly loosen group B. In a non-strategic world this
class is attractive because it can reduce the efficiency cost of
parity—one need not move a single group’s cutoff as aggressively. In a
strategic world, however, the effect is ambiguous ex ante: tightening
A potentially increases
manipulation by A, while
loosening B potentially
decreases manipulation by B.
Which force dominates depends on local densities near the cutoffs and on
heterogeneity in (bg, λg).
Our main theory focuses on the polar cases 𝒯sel and 𝒯inc because they isolate the two canonical implementation logics used in policy debates (“raise standards” versus “lower standards”). Mixed rules then inherit features of both, and we return to them when discussing design implications.
A crucial modeling choice is when parity is evaluated. In
the simplest compliance narrative, the principal selects t to satisfy a
pre-adaptation (truthful) demographic-parity constraint,
Δpre(t) = |SA(tA) − SB(tB)| ≤ ε,
treating the score distribution as fixed. This is the natural target if
the principal calibrates using historical data under an earlier policy
or if manipulation is ignored in the audit.
In contrast, if parity is assessed on realized deployment outcomes
(as it often is in ex post monitoring), the relevant constraint is
equilibrium parity,
Δeq(t) = |pA(t) − pB(t)| ≤ ε,
where pg(t) = (1 − λg)Sg(tg) + λgSg(tg − bg).
The difference between these constraints is not a technicality: Δeq
depends on the strategic parameters (λg, bg),
and hence on opacity, access to coaching, and the technology of
manipulation. A repair that “solves” parity in the pre-adaptation sense
can therefore fail once agents respond, and our taxonomy is designed to
ask which repair directions are robust to that feedback.
Fairness constraints are rarely the principal’s only concern. In our
framework, accuracy can be represented by a standard loss ℓ(y, ŷ), where
ŷ = ft(ẑ, g)
is the decision and y is a
latent label linked monotonically to z (e.g., y = 1{z ≥ τg}).
The principal’s canonical problem is then
mint 𝔼[ℓ(y, ft(ẑ, g))] s.t. Δ(t) ≤ ε,
where Δ is either Δpre
(a “static” audit) or Δeq (a
“behavior-aware” audit). Because ẑ is endogenous, even the accuracy
term depends on t
through equilibrium behavior.
Beyond predictive accuracy, strategic classification raises an
additional welfare margin that is often central in policy discussions:
wasted effort. Manipulation costs cg(Δ)
are borne by agents and (in many applications) are socially unproductive
expenditures on coaching, paperwork, or gaming. A natural reduced-form
welfare criterion therefore augments loss with expected manipulation
cost,
mint 𝔼[ℓ(y, ft(ẑ, g))] + κ∑g ∈ {A, B}𝔼[cg(Δg*(z; t))]  s.t.  Δ(t) ≤ ε,
for some κ ≥ 0 capturing how
much the principal (or regulator) values reducing wasteful burden. This
objective makes explicit an important practical distinction between
repairs: selective repairs can equalize acceptance partly by
inducing costly responses near a higher cutoff, whereas
inclusive repairs can equalize acceptance partly by removing
the need to manipulate for the disadvantaged group. Our subsequent
results can be read as identifying when that intuitive welfare advantage
of inclusiveness is also a stability advantage.
We emphasize, finally, what our repair taxonomy does and does not cover. It focuses on threshold adjustments, which are a leading special case of post-processing and are analytically transparent in one-dimensional scores. More complex interventions—retraining the score, providing subsidies that change cg, or changing information structures that shift λg—are clearly relevant in practice, but they operate through different levers. By isolating threshold repairs, we can cleanly ask a narrow question: holding the score technology fixed, which direction of boundary movement is robust to strategic feedback, and which direction is prone to fairness reversal once the policy is deployed?
Selective repairs are often defended as the “safe” way to satisfy demographic parity: rather than lowering standards for the lower-acceptance group, the principal simply tightens the cutoff for the higher-acceptance group until acceptance rates match in the historical (truthful) data. Our first main result says that this intuition can fail in a strategically stark way. The reason is not subtle: raising tA creates a thicker manipulation margin for group A—a band of agents who are newly rejected but can cheaply move across the cutoff—so the very act of “excluding” marginal A applicants can induce an equilibrium rebound in A’s acceptance.
Fix the anchor t and
consider a selective move tA > t
with tB = t.
Under truthful reporting, acceptance rates are SA(tA)
and SB(t),
so pre-adaptation parity is achieved by choosing tA such
that
SA(tA) = SB(t).
In a non-strategic world, this is essentially the end of the story: once
we raise tA enough, the
acceptance gap closes.
With manipulation, however, the relevant object for the strategic
fraction λA is not the
survival at tA but the
survival at tA − bA.
Lemma 1 implies that the newly pivotal set is precisely the
interval
[tA − bA, tA),
whose mass is FA(tA) − FA(tA − bA).
If group A has substantial
density near tA, then this
mass can be large even when bA is modest.
The selective repair therefore has two countervailing effects on
equilibrium acceptance for group A: it lowers SA(tA)
for non-strategic agents but raises acceptance for strategic agents by
effectively “moving the bar” down by bA. When bA is larger
than bB
and/or λA
exceeds λB, this second
force can dominate in relative terms, producing a fairness reversal:
parity holds on paper, but disparity reappears after deployment.
This mechanism is especially relevant in applied domains where the “cost” of manipulation reflects access to coaching, appeals, or documentation, and where such access is uneven. A selective tightening can unintentionally create a premium on those resources near the boundary, precisely where many decisions are made.
We now formalize the above logic by showing that pre-adaptation parity within 𝒯sel(t) does not imply any meaningful bound on Δeq. In fact, for any target tolerance ϵ ≥ 0, we can construct primitives such that a selective repair achieves Δpre = 0 yet violates equilibrium parity by more than ϵ.
Concretely, recall that equilibrium acceptance is
pg(t) = (1 − λg)Sg(tg) + λgSg(tg − bg).
Under the selective restriction (tA, t),
the equilibrium gap is
Δeq(tA, t) = |(1 − λA)SA(tA) + λASA(tA − bA) − (1 − λB)SB(t) − λBSB(t − bB)|.
If we pick tA to enforce
pre-parity SA(tA) = SB(t),
the leading (truthful) terms cancel, leaving the post-adaptation gap
driven by differences in strategic uplift:
Δeq(tA, t) = |λA(SA(tA − bA) − SA(tA)) − λB(SB(t − bB) − SB(t))|.
A selective repair is therefore safe only if the incremental mass of
A in [tA − bA, tA),
weighted by λA, is not much
larger than the analogous mass for B at [t − bB, t),
weighted by λB. Our result
shows that selective repairs provide no general guarantee of this type:
one can satisfy pre-parity while making the first term large and the
second term small.
The simplest way to see the instability is to construct distributions with the following qualitative features:
(i) Pre-parity is achievable by raising tA: group A has higher survival at the baseline anchor t, so there exists tA > t with SA(tA) = SB(t).
(ii) High local density for group A near tA: there is substantial probability mass in the band [tA − bA, tA). This ensures a large manipulation-induced uplift SA(tA − bA) − SA(tA).
(iii) Low local density for group B near t (or smaller bB, smaller λB): the analogous uplift for B, SB(t − bB) − SB(t), is comparatively small.
A canonical instance is to let DA place a “bump” of density just below the eventual cutoff tA, while letting DB be flatter (or shifted) around t. Because tA is chosen endogenously to satisfy SA(tA) = SB(t), the construction ensures that the truthful acceptances match exactly, even though the local geometry around the thresholds differs sharply.
Using only continuity of densities (no parametric assumptions), we can translate "high mass in the manipulation band" into an explicit lower bound. For any interval I ⊆ [tA − bA, tA],
$$
S_A(t_A-b_A)-S_A(t_A)=\int_{t_A-b_A}^{t_A} f_A(u)\,du \;\ge\; |I|\cdot \inf_{u\in I} f_A(u),
$$
and similarly
$$
S_B(t-b_B)-S_B(t)=\int_{t-b_B}^{t} f_B(u)\,du.
$$
Hence, under pre-parity SA(tA) = SB(t), we obtain the lower bound
$$
\Delta_{\mathrm{eq}}(t_A,t)\;\ge\;\lambda_A\int_{t_A-b_A}^{t_A} f_A(u)\,du-\lambda_B\int_{t-b_B}^{t} f_B(u)\,du.
$$
In our constructive instances we can make the first integral bounded away from zero (by concentrating A-mass near tA) while making the second arbitrarily small (by making B-density small near t, or by taking bB small, or λB low). This yields a fairness reversal of size at least a constant c > 0 and, by choosing the construction so that c exceeds the target, violates any prescribed tolerance ϵ.
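The following numerical sketch instantiates one such environment; the mixture-of-Gaussians "bump" for group A and every parameter value are illustrative choices of ours, not a unique or calibrated witness.

```python
from scipy.stats import norm
from scipy.optimize import brentq

t = 0.5                                    # anchor cutoff, kept for group B
lam_A, lam_B = 0.8, 0.2                    # responsiveness
b_A, b_B = 0.3, 0.1                        # manipulation budgets

# Group B: standard Gaussian. Group A: half its mass in a narrow "bump" that
# sits just below the cutoff the selective repair will end up choosing.
S_B = lambda u: norm.sf(u, 0.0, 1.0)
S_A = lambda u: 0.5 * norm.sf(u, 0.0, 1.0) + 0.5 * norm.sf(u, 1.2, 0.1)

# Selective repair: raise t_A until truthful acceptance rates match exactly.
t_A = brentq(lambda u: S_A(u) - S_B(t), t, 4.0)

p_A = (1 - lam_A) * S_A(t_A) + lam_A * S_A(t_A - b_A)
p_B = (1 - lam_B) * S_B(t) + lam_B * S_B(t - b_B)

print(f"pre-adaptation gap:  {abs(S_A(t_A) - S_B(t)):.3f}")   # 0 by construction
print(f"post-adaptation gap: {abs(p_A - p_B):.3f}")           # bounded away from 0
```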
The takeaway is that selective repairs are intrinsically local: they depend on the density of the advantaged group near the new cutoff, and that is exactly where strategic response is most consequential. Put differently, selectivity “moves the goalpost” into a region where manipulation is both pivotal and (by assumption) feasible up to bA.
Two clarifications are important.
First, the result is not that selectivity always fails, but that it is not equilibrium-stable as a class: absent strong additional restrictions tying down (DA, DB) and (bg, λg), pre-adaptation parity via tA↑ provides no uniform control of post-adaptation disparity. In practice, this means that an institution that audits fairness using historical, non-strategic outcomes can be surprised by a systematic acceptance rebound for the group whose cutoff it tightened, especially when that group has better access to manipulation technologies or information.
Second, our construction highlights a policy-relevant asymmetry: selective repairs place the burden of adjustment on the group that is allegedly advantaged, but the capacity to respond strategically may be even more skewed toward that same group (higher bA, higher λA). In such environments, selective repairs can inadvertently reward strategic sophistication rather than equalize opportunity.
These observations motivate the next section, where we ask whether the opposite direction—inclusive repairs—admits stability guarantees under economically meaningful regularity conditions (smooth densities and MLR-type orderings) and under modest heterogeneity in manipulation ability and information.
The previous section showed that tightening the advantaged group’s cutoff can be brittle because it creates a thick set of newly rejected-but-nearby agents who have strong incentives to manipulate. We now ask whether moving in the opposite direction—lowering the disadvantaged group’s cutoff—admits any robust control of post-deployment disparities. The answer is yes: while inclusiveness is not a free lunch (it can sacrifice accuracy, and it can still be distorted by manipulation), it is the direction in which we can obtain a uniform equilibrium stability guarantee under economically standard regularity.
Fix an anchor threshold t
for group A, and consider the
inclusive family
𝒯inc(t) := {(t, tB) : tB ≤ t},
so the principal equalizes acceptance by lowering B’s bar rather than raising A’s. Under truthful reporting,
pre-parity is achieved by selecting tB such
that
SA(t) = SB(tB).
In the non-strategic world, this is exactly the demographic-parity
repair.
With strategic response, the key difference from selectivity is geometric: inclusiveness does not “pile up” A-mass just below a new and potentially high-density cutoff. Instead, it moves B’s cutoff left, typically into a region where B already has more mass. Intuitively, this does two stabilizing things at once:
(i) It reduces the marginal importance of manipulation for closing the gap. Since parity is achieved by directly increasing B’s baseline acceptance, the equilibrium correction due to manipulation is a perturbation around an already-equalized base rate.
(ii) It aligns incentives with the intended direction of repair. Any strategic uplift that B experiences (from agents just below tB moving up) tends to reinforce the inclusive adjustment rather than undo it.
Of course, if group A can manipulate substantially more (bA ≫ bB) or is much better informed (λA ≫ λB), then even an inclusive rule can be pulled away from parity. The point is that—unlike selectivity—this pull admits a clean bound that depends only on smoothness of the score distributions and the heterogeneity in (bg, λg).
Under Lemma 1, equilibrium acceptance in group g is
pg(t) = (1 − λg)Sg(tg) + λgSg(tg − bg),
so for an inclusive repair (t, tB)
the equilibrium gap can be written as
pA − pB = [(1 − λA)SA(t) + λASA(t − bA)] − [(1 − λB)SB(tB) + λBSB(tB − bB)].
Imposing pre-parity SA(t) = SB(tB)
cancels the leading (truthful) terms and leaves only the difference in
“strategic uplift”:
pA − pB = λA(SA(t − bA) − SA(t)) − λB(SB(tB − bB) − SB(tB)).
Thus, inclusive repairs are stable precisely when the
manipulation-induced increments
Sg(⋅ − bg) − Sg(⋅)
do not differ too much across groups after weighting by λg.
To control these increments uniformly, we use a smoothness condition encoded by the density bound L := supu max{fA(u), fB(u)} < ∞. This implies a Lipschitz property of survival functions: for any δ ≥ 0,
$$
|S_g(u-\delta)-S_g(u)|=\int_{u-\delta}^{u} f_g(v)\,dv \;\le\; L\,\delta.
$$
The remaining issue is that the two increments are evaluated at
different thresholds (t for A and tB for B). Here the
monotone-likelihood-ratio / single-crossing structure plays its role:
when A stochastically
dominates B in the usual sense
(the common empirical case motivating demographic parity repairs), the
equality SA(t) = SB(tB)
forces tB ≤ t;
lowering B’s cutoff moves
B to a region where small
left-shifts (of size bB) typically
pick up at least as much mass as the corresponding shift at the
higher cutoff. This makes it conservative to treat B’s strategic uplift at (tB, bB)
as the natural benchmark, and to attribute instability primarily to (i)
A’s extra
manipulation range bA − bB
and (ii) differential strategic responsiveness |λA − λB|.
Formally, one can decompose A’s uplift into a “common-width”
part plus an “excess-width” part and then apply the Lipschitz
bound:
$$
S_A(t-b_A)-S_A(t)
=
\underbrace{\big(S_A(t-b_A)-S_A(t-b_B)\big)}_{\le L(b_A-b_B)}
+\underbrace{\big(S_A(t-b_B)-S_A(t)\big)}_{\text{common-width uplift}}.
$$
The first term is controlled directly by L(bA − bB).
Under the single-crossing/MLR ordering and the pre-parity alignment
SA(t) = SB(tB),
the second (common-width) term is not larger than the corresponding
uplift available to group B at
(tB, bB).
Combining these steps and applying the triangle inequality yields the
uniform stability bound reported in Proposition 3:
Δeq(t, tB) ≤ L ⋅ (λA(bA − bB) + |λA − λB|⋅bB).
Two features are worth emphasizing. First, the bound is
distribution-free beyond smoothness: the entire effect
of strategic response is summarized by the local density cap L and the heterogeneity in budgets
and information. Second, the bound is first-order in
heterogeneity: if bA ≈ bB
and λA ≈ λB,
then the equilibrium gap remains small even if manipulation itself is
common.
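As a sanity check (with illustrative Gaussian primitives of our choosing, which satisfy the MLR ordering), the bound can be verified numerically:

```python
import math
from scipy.stats import norm
from scipy.optimize import brentq

# Assumed primitives: A ~ N(1, 1), B ~ N(0, 1); unit-variance Gaussians cap the density.
S_A = lambda u: norm.sf(u, 1.0, 1.0)
S_B = lambda u: norm.sf(u, 0.0, 1.0)
L = 1.0 / math.sqrt(2.0 * math.pi)

t = 1.0                                     # anchor cutoff for group A
lam_A, lam_B, b_A, b_B = 0.6, 0.4, 0.3, 0.2

# Inclusive pre-parity: lower t_B until truthful acceptance rates match.
t_B = brentq(lambda u: S_B(u) - S_A(t), -5.0, t)

gap_eq = abs(lam_A * (S_A(t - b_A) - S_A(t)) - lam_B * (S_B(t_B - b_B) - S_B(t_B)))
bound = L * (lam_A * (b_A - b_B) + abs(lam_A - lam_B) * b_B)
print(f"equilibrium gap {gap_eq:.4f} <= bound {bound:.4f}")
```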
Economically, the bound says that inclusive repairs are stable when manipulation technologies and access to information are not too uneven across groups. In applied terms, bg can reflect access to coaching, documentation, legal support, or optimization effort, while λg reflects whether the decision rule is learnable through social networks, platform feedback, or repeated interaction. Inclusiveness does not eliminate these channels—but it prevents them from being amplified by the repair itself.
The same expression also clarifies when inclusive repairs can fail. If the advantaged group both (i) can manipulate more and (ii) is more likely to be strategic, then the right-hand side can be large, so parity can be materially violated post-deployment even though it held in the audit. Likewise, if densities are extremely steep near the relevant thresholds (large L), then even small budget differences translate into sizable acceptance differences. This is a real limitation: in domains with discretized scores or sharp bunching, one should expect larger strategic distortions than the smooth model predicts.
Finally, inclusiveness interacts with accuracy in a qualitatively different way than selectivity. Lowering tB mechanically increases acceptance for B, which may increase false positives if labels are correlated with z. The present section therefore should not be read as endorsing inclusive repairs unconditionally, but rather as isolating a stability advantage: if demographic parity is a binding policy constraint, then repairing by inclusion provides a tractable and robust handle on post-adaptation outcomes.
A further reason inclusiveness is attractive is algorithmic. Because pg is monotone in its own threshold, the equilibrium parity condition pA(t) = pB(tB) becomes a one-dimensional root-finding problem in tB once t is fixed. Under continuity and strict monotonicity of FB, the solution is unique (Proposition 4), which means equilibrium-fair inclusive thresholds can be computed reliably and audited transparently. This computational tractability is not incidental: it is the operational counterpart of the stability logic above, and it is what we turn to next.
Our stability result for inclusive repairs is not only conceptual; it also delivers a practical computational dividend. Once we accept that deployed rules elicit strategic response, the relevant fairness constraint is equilibrium demographic parity, pA(t) = pB(t), not its pre-adaptation analogue. The question then becomes operational: how do we set thresholds to satisfy equilibrium parity using only data and a small set of behavioral primitives?
Under Lemma 1, acceptance in group $g$ takes the closed form
$$p_g(t_g) \;=\; (1 - \lambda_g)\, S_g(t_g) \;+\; \lambda_g\, S_g(t_g - b_g),$$
where $S_g(u) = 1 - F_g(u)$. For inclusive rules we fix $t_A = t$ and search over $t_B \le t$. Equilibrium demographic parity is the scalar equation $p_B(t_B) = p_A(t)$.
The key implementation fact is monotonicity: since $S_B(\cdot)$ is decreasing and $b_B \ge 0$, the mapping $t_B \mapsto p_B(t_B)$ is continuous and strictly decreasing whenever $F_B$ is continuous and strictly increasing. Hence there is a unique solution $t_B(t)$ (Proposition 4), and we can compute it by bisection.
Concretely, define the function
$$\phi(t_B; t) \;:=\; p_B(t_B) - p_A(t).$$
Then $\phi(\cdot\,; t)$ is strictly decreasing in $t_B$. We bracket the root on an interval $[\,\underline t_B,\ t\,]$, where $\underline t_B$ is low enough that $p_B(\underline t_B) \ge p_A(t)$. In practice we can take $\underline t_B$ to be a small quantile of observed scores in group B, or simply decrease it until the inequality holds. Bisection then finds $t_B$ such that $|\phi(t_B; t)| \le \eta$ in $O(\log(1/\eta))$ iterations.
This is the algorithmic counterpart of the theory: inclusive restrictions turn an equilibrium-fairness problem into a reliable, auditable one-dimensional search, rather than a brittle multi-parameter tuning exercise.
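As a concrete illustration, the following minimal sketch implements this bisection in Python. It assumes oracle access to the group survival functions (passed in as callables `S_A`, `S_B`) and to the behavioral primitives $(\lambda_g, b_g)$; the function names and the bracketing heuristic are ours for illustration, not part of any released implementation.

```python
def p_eq(S, t_g, lam, b):
    """Lemma 1 closed form: p_g(t_g) = (1 - lam) * S_g(t_g) + lam * S_g(t_g - b_g)."""
    return (1.0 - lam) * S(t_g) + lam * S(t_g - b)


def inclusive_threshold(S_A, S_B, t, lam_A, b_A, lam_B, b_B, eta=1e-6, max_iter=200):
    """Bisection for t_B <= t solving p_B(t_B) = p_A(t).

    Assumes F_B is continuous and strictly increasing, and that group B is
    (weakly) disadvantaged at the anchor, i.e. p_B(t) <= p_A(t).
    """
    target = p_eq(S_A, t, lam_A, b_A)                 # p_A(t) is fixed once t is chosen
    lo, hi = t, t
    step = 1.0
    while p_eq(S_B, lo, lam_B, b_B) < target:         # lower the bracket until p_B(lo) >= p_A(t)
        lo -= step
        step *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        phi = p_eq(S_B, mid, lam_B, b_B) - target     # phi is strictly decreasing in t_B
        if abs(phi) <= eta:
            return mid
        if phi > 0.0:
            lo = mid                                  # still too inclusive: move t_B up
        else:
            hi = mid                                  # not inclusive enough: move t_B down
    return 0.5 * (lo + hi)
```

The doubling step that lowers the bracket endpoint plays the role of "decrease it until the inequality holds" in the text; any sufficiently small quantile of group B's scores would serve equally well as $\underline t_B$.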
In most applications $F_g$ is unknown and must be estimated from samples $\{z_i : g_i = g\}$. The most direct approach is plug-in: let $\hat F_g$ be the empirical CDF and $\hat S_g(u) = 1 - \hat F_g(u)$. Then define the estimated equilibrium acceptance curve
$$\hat p_g(t_g) \;=\; (1 - \lambda_g)\, \hat S_g(t_g) \;+\; \lambda_g\, \hat S_g(t_g - b_g),$$
and solve $\hat p_B(t_B) = \hat p_A(t)$ by bisection over $t_B$.
Two practical points matter.
(i) Discreteness and non-uniqueness in finite samples. Empirical survival functions are step functions, so p̂B(tB) can be flat over intervals. There may be a set of solutions tB that achieve the same p̂B. A transparent tie-breaking rule is to choose the largest such tB (minimizing inclusiveness subject to the constraint), or the smallest (maximizing inclusiveness). Either choice should be pre-registered because it affects downstream accuracy.
(ii) Interpolation / smoothing. If thresholds are required to lie on a grid (credit scores, test scores), we can restrict bisection to that grid. If not, we can linearly interpolate F̂g between order statistics, which restores an “effectively continuous” monotone map and avoids numerical artifacts. Kernel smoothing can also be used, but it introduces bandwidth choices that should be justified and stress-tested.
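For completeness, a minimal plug-in sketch in the same spirit: it builds empirical survival functions from hypothetical sample arrays `z_A`, `z_B`, restricts the search to a score grid as in point (ii), and applies the least-inclusive tie-break from point (i), i.e., the largest $t_B$ with $\hat p_B(t_B) \ge \hat p_A(t)$. The names are placeholders rather than parts of a released codebase.

```python
import numpy as np

def S_hat(z):
    """Empirical survival function: u -> fraction of sample points z_i with z_i >= u."""
    z = np.sort(np.asarray(z, dtype=float))
    n = z.size
    return lambda u: (n - np.searchsorted(z, u, side="left")) / n

def p_hat(S, t_g, lam, b):
    """Plug-in equilibrium acceptance curve built from the estimated survival function."""
    return (1.0 - lam) * S(t_g) + lam * S(t_g - b)

def inclusive_threshold_plugin(z_A, z_B, t, lam_A, b_A, lam_B, b_B, grid):
    """Largest grid threshold t_B <= t with p_hat_B(t_B) >= p_hat_A(t) (least-inclusive tie-break)."""
    SA, SB = S_hat(z_A), S_hat(z_B)
    target = p_hat(SA, t, lam_A, b_A)
    feasible = [tb for tb in grid if tb <= t and p_hat(SB, tb, lam_B, b_B) >= target]
    return max(feasible) if feasible else None
```

Returning `None` flags an infeasible grid rather than silently extrapolating beyond the permissible thresholds.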
A useful feature of the plug-in approach is that fairness error can be bounded in a simple way. By the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality, for each group $g$,
$$\Pr\Bigl(\sup_u \bigl|\hat F_g(u) - F_g(u)\bigr| > \epsilon\Bigr) \;\le\; 2 e^{-2 n_g \epsilon^2},$$
so with high probability $\sup_u |\hat S_g(u) - S_g(u)| \le \epsilon$ as well. Because $p_g$ is an affine combination of two survival evaluations, we get the uniform bound
$$\sup_{t_g} \bigl|\hat p_g(t_g) - p_g(t_g)\bigr| \;\le\; \epsilon.$$
Thus, if we solve $|\hat p_B(t_B) - \hat p_A(t)| \le \eta$, then (with high probability)
$$\bigl|p_B(t_B) - p_A(t)\bigr| \;\le\; 2\epsilon + \eta.$$
This gives an implementation-ready recipe: pick a target tolerance η for numerical solving, choose
sample sizes (or confidence levels) so that ϵ is small enough, and report the
implied fairness certificate 2ϵ + η.
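As a worked example of that recipe, the DKW exponent can be inverted to turn a confidence level into an $\epsilon$; the helper below is a sketch under the assumption that the failure probability is split evenly across the two groups, which makes the reported certificate slightly conservative.

```python
import numpy as np

def dkw_epsilon(n_g, delta):
    """Smallest eps with 2 * exp(-2 * n_g * eps**2) <= delta, i.e. sqrt(log(2 / delta) / (2 * n_g))."""
    return np.sqrt(np.log(2.0 / delta) / (2.0 * n_g))

def fairness_certificate(n_A, n_B, delta=0.05, eta=1e-3):
    """Conservative certificate: with prob. >= 1 - delta, |p_B(t_B) - p_A(t)| <= 2 * eps + eta."""
    eps = max(dkw_epsilon(n_A, delta / 2.0), dkw_epsilon(n_B, delta / 2.0))
    return 2.0 * eps + eta
```

For instance, with $n_A = n_B = 10{,}000$, $\delta = 0.05$, and $\eta = 10^{-3}$, the certificate is roughly $0.03$.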
What this does not cover are errors in $(\lambda_g, b_g)$. In practice, these parameters may be estimated from past deployments, randomized audits, or quasi-experimental variation (e.g., sudden policy changes that reveal strategic shifts). A conservative alternative is to specify uncertainty sets $\lambda_g \in [\underline\lambda_g, \overline\lambda_g]$, $b_g \in [\underline b_g, \overline b_g]$ and enforce robust parity:
$$\sup_{\lambda \in \Lambda,\ b \in B}\ \bigl|p_A(t) - p_B(t_B)\bigr| \;\le\; \varepsilon.$$
Under monotonicity, this robustification can still be handled by bisection by replacing $p_g$ with its worst-case (upper/lower) envelope over the uncertainty set, at the cost of more inclusiveness.
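Because $p_g$ is increasing in both $\lambda_g$ (the extra term $S_g(t_g - b_g) - S_g(t_g)$ is nonnegative) and $b_g$, the envelopes over a box uncertainty set are attained at its corners. The sketch below encodes that observation; the box endpoints are hypothetical inputs, not estimates we provide.

```python
def p_envelope(S, t_g, lam_lo, lam_hi, b_lo, b_hi):
    """p_g is increasing in lam_g and in b_g, so the box extremes sit at the corners."""
    p_min = (1.0 - lam_lo) * S(t_g) + lam_lo * S(t_g - b_lo)
    p_max = (1.0 - lam_hi) * S(t_g) + lam_hi * S(t_g - b_hi)
    return p_min, p_max

def robust_parity_gap(S_A, S_B, t, t_B, box_A, box_B):
    """Worst-case |p_A(t) - p_B(t_B)| over independent (lam_g, b_g) boxes for the two groups."""
    pA_min, pA_max = p_envelope(S_A, t, *box_A)
    pB_min, pB_max = p_envelope(S_B, t_B, *box_B)
    return max(pA_max - pB_min, pB_max - pA_min)
```

In the typical case where group B is disadvantaged at the anchor, the binding direction is $p_A^{\max}(t) - p_B^{\min}(t_B)$, which is monotone in $t_B$, so the same bisection applies with the envelope in place of $p_B$.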
So far we treated t = tA as fixed. If the principal also wants to optimize an accuracy proxy (or any monotone objective), inclusive computation still helps: we can reduce the problem to a single outer search over t. For each candidate t, compute the unique tB(t) that equalizes equilibrium acceptance; then evaluate the objective at (t, tB(t)). This turns a two-threshold constrained optimization into an unconstrained one-dimensional search. Even simple methods (grid search, golden-section) are typically adequate, and the fairness constraint is satisfied by construction rather than by penalty tuning.
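A sketch of that outer loop, reusing the `inclusive_threshold` routine from the earlier sketch; `objective` is a placeholder for whichever accuracy proxy the principal adopts.

```python
def optimize_anchor(objective, S_A, S_B, lam_A, b_A, lam_B, b_B, t_grid):
    """One-dimensional outer search over t; equilibrium parity holds by construction via t_B(t)."""
    best = None
    for t in t_grid:
        t_B = inclusive_threshold(S_A, S_B, t, lam_A, b_A, lam_B, b_B)
        value = objective(t, t_B)
        if best is None or value > best[0]:
            best = (value, t, t_B)
    return best  # (best objective value, anchor t, matching t_B(t))
```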
The computational simplicity above relies on the one-dimensional score and threshold structure. Several natural extensions break this, and it is helpful to be explicit about where bisection stops working.
Multiple features and non-threshold classifiers. If ẑ arises from a model score s(x) and agents manipulate features x rather than the score directly, then strategic response changes the distribution of scores endogenously. Acceptance probabilities may no longer have the closed form Sg(tg − bg), and t ↦ pg(t) can become non-smooth or even non-monotone. Computing equilibrium then becomes a fixed-point problem, often handled by simulation (agent best responses) combined with numerical root finding on fairness constraints.
Equalized odds (label-conditional constraints). If we require parity separately for y = 0 and y = 1, we effectively impose two equations (one per label) and typically introduce additional degrees of freedom (e.g., label-conditional thresholds or randomized decisions). This is inherently multi-dimensional; one uses coupled bisection on monotone maps when available, or Newton / quasi-Newton methods otherwise.
Discrete/bunched score distributions. In heavily discretized environments, small changes in thresholds can cause large jumps in acceptance. Then the “unique root” logic weakens; randomized thresholds (accepting a fraction at the cutoff) can restore exact feasibility, but require careful governance because randomization is itself a policy choice.
These caveats motivate our experimental section: even when theory points to inclusiveness as stable and computable, numerical behavior depends on how sharply scores bunch, how heterogeneous (bg, λg) are, and whether the environment is truly one-dimensional. In the simulations that follow, we will reproduce fairness reversal under selective repairs, and then show how equilibrium-fair inclusive computation behaves under controlled perturbations of information and manipulation gaps.
Our theoretical results are intentionally stylized, so we complement them with simulations designed to (i) reproduce the “fairness reversal” phenomenon under selective repairs, (ii) verify that inclusive repairs computed for equilibrium parity behave stably, and (iii) map how sensitive post-adaptation disparities are to manipulation-ability and information gaps. Throughout, we treat the experiments as mechanism checks: the goal is not to fit a particular application, but to show that the comparative statics implied by Lemma 1 and the propositions manifest transparently in finite-sample environments with estimation noise and discretization.
We simulate two groups $g \in \{A, B\}$ with base scores $z \sim D_g$. Unless otherwise stated, we take $D_g$ to be Gaussian with potentially different means and variances (e.g., $z \mid A \sim \mathcal{N}(\mu_A, \sigma_A^2)$, $z \mid B \sim \mathcal{N}(\mu_B, \sigma_B^2)$) and truncate to a plausible score range to mimic operational scorecards.
For costs we use a parsimonious class consistent with our budget
parameterization, such as
$$
c_g(\Delta)=\left(\frac{\Delta}{b_g}\right)^\kappa,\qquad \kappa\ge 1,
$$
so that $c_g(b_g) = 1$ and $b_g$ directly indexes the maximal “worthwhile” manipulation distance. When $\kappa = 1$, costs are linear and the best response is exactly the threshold-crossing behavior described earlier; for $\kappa > 1$, the same qualitative structure holds (manipulate only if close enough), but the boundary is slightly softened. We implement the exact Lemma 1 behavior in the baseline and use $\kappa > 1$ as a robustness check.
Information is captured by $\lambda_g$: independently for each agent, with probability $\lambda_g$ the agent best-responds to the deployed rule; otherwise the agent reports $\hat z = z$. Given thresholds $t = (t_A, t_B)$, we compute equilibrium acceptance in two ways: (a) directly by simulation (drawing many agents and applying the behavioral rule), and (b) via the closed form
$$p_g(t) \;=\; (1 - \lambda_g)\, S_g(t_g) \;+\; \lambda_g\, S_g(t_g - b_g).$$
As a sanity check, these coincide up to Monte Carlo error, and the
discrepancy shrinks at the usual $1/\sqrt{n}$ rate. This validation is useful
because it makes clear that the reversals we highlight are not numerical
artifacts but mechanical consequences of mass near the cutoff combined
with heterogeneous bg and λg.
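The sanity check in (a) versus (b) fits in a few lines; the sketch below uses illustrative parameter values (not those of the reported experiments), clips rather than strictly truncates the Gaussian scores, and implements the exact $\kappa = 1$ best response.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_acceptance(mu, sigma, lam, b, t, n=200_000, lo=300.0, hi=850.0):
    """Monte Carlo acceptance under the behavioral rule vs. the Lemma 1 closed form."""
    z = np.clip(rng.normal(mu, sigma, n), lo, hi)              # clipped stand-in for truncation
    informed = rng.random(n) < lam                             # strategic with probability lam
    crosses = informed & (z >= t - b) & (z < t)                # within budget of the cutoff
    z_hat = np.where(crosses, t, z)                            # informed agents just clear the bar
    monte_carlo = float(np.mean(z_hat >= t))
    S = lambda u: float(np.mean(z >= u))                       # survival of the same simulated scores
    closed_form = (1.0 - lam) * S(t) + lam * S(t - b)
    return monte_carlo, closed_form

print(simulate_acceptance(mu=620.0, sigma=60.0, lam=0.6, b=25.0, t=650.0))
```

The two numbers agree up to the Monte Carlo noise in which agents happen to be informed, which is exactly the $1/\sqrt{n}$ behavior described above.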
To isolate the mechanism in Proposition 2, we impose a selective repair: fix a baseline threshold $t$ and set $t_B = t$ while raising $t_A > t$ to equalize pre-adaptation acceptance, i.e., $S_A(t_A) = S_B(t)$, so that $\Delta_{\mathrm{pre}}(t_A, t) = 0$ under truthful reporting. Intuitively, this enforces parity by “holding back” group A at the boundary.
We then deploy the rule and allow manipulation. When $b_A > b_B$ and/or $\lambda_A > \lambda_B$, the post-adaptation acceptance in group A depends not only on $S_A(t_A)$ but on the additional mass in $[t_A - b_A, t_A)$ that can profitably cross. The simulations reproduce exactly this logic: even when $\Delta_{\mathrm{pre}} = 0$ by construction, the equilibrium gap
$$\Delta_{\mathrm{eq}}(t_A, t) \;=\; \bigl|p_A(t_A) - p_B(t)\bigr|$$
becomes strictly positive and can be large when there is substantial density near $t_A$. In a representative calibration with moderate density around the cutoff, we observe a sharp increase from essentially zero pre-gap to a visibly nontrivial equilibrium disparity, driven by the term $\lambda_A\bigl(S_A(t_A - b_A) - S_A(t_A)\bigr)$, which has no analogue in the pre-adaptation constraint.
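The reversal can be reproduced directly from the closed forms. The sketch below uses exact Gaussian survival functions via `scipy.stats.norm`, illustrative parameter values on a credit-score-like scale, and no truncation, so the printed numbers are indicative only.

```python
from scipy.stats import norm

# Illustrative primitives: group A is advantaged in both budget and information.
mu_A, sd_A, mu_B, sd_B = 640.0, 50.0, 600.0, 50.0
lam_A, b_A, lam_B, b_B = 0.7, 30.0, 0.3, 10.0
t = 630.0                                       # baseline threshold; selective repair keeps t_B = t

S_A = lambda u: norm.sf(u, mu_A, sd_A)
S_B = lambda u: norm.sf(u, mu_B, sd_B)
p = lambda S, t_g, lam, b: (1.0 - lam) * S(t_g) + lam * S(t_g - b)

# Selective repair: raise t_A so that pre-adaptation acceptance matches, S_A(t_A) = S_B(t).
t_A = norm.isf(S_B(t), mu_A, sd_A)

delta_pre = abs(S_A(t_A) - S_B(t))              # zero by construction
delta_eq = abs(p(S_A, t_A, lam_A, b_A) - p(S_B, t, lam_B, b_B))
print(t_A, delta_pre, delta_eq)                 # delta_eq > 0: the fairness reversal
```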
Two comparative patterns are especially robust across distributions. First, holding t fixed, increasing the selectivity intensity tA − t has an ambiguous effect, consistent with our discussion: initially the gap can worsen as more A agents are placed just below a higher bar (creating a larger manipulable band), while very large increases eventually reduce acceptance once the local density thins. Second, the effect is highly localized: when we shift the cutoff to a region where fA(⋅) is small, the reversal attenuates, whereas placing the cutoff near a dense region (a “bunching” point in a discretized score) amplifies the reversal.
Next we implement inclusive repair computation as an operational procedure: fix an anchor $t_A = t$ and choose $t_B \le t$ to solve the equilibrium parity equation $p_B(t_B) = p_A(t)$. In the oracle case (known $F_g$), bisection on $\phi(t_B; t) = p_B(t_B) - p_A(t)$ converges rapidly and yields a unique $t_B(t)$ whenever $F_B$ is continuous and strictly increasing. In the data-driven case, we replace $S_g$ by the empirical $\hat S_g$ and solve $\hat p_B(t_B) = \hat p_A(t)$ on a grid of permissible thresholds.
The key experimental finding is not merely that we can hit parity by construction—this is tautological—but that the solution is stable under modest misspecification and sampling noise. When we perturb (bg, λg) slightly away from the values used to compute tB, the induced parity error tracks the smooth bound from Proposition 3: the deviation scales approximately with λA(bA − bB) and |λA − λB|bB, and it is amplified when the score density near the relevant thresholds is high. By contrast, when we compute a pre-adaptation parity threshold pair (ignoring manipulation) and then evaluate it under strategic response, the resulting gap can move in either direction and is often substantially larger for the same perturbation size.
We also stress-test discretization. When scores are rounded (e.g., integer scores), exact feasibility may require randomization at the cutoff. Implementing the simplest tie-breaking convention—choosing the largest tB that weakly satisfies p̂B(tB) ≥ p̂A(t)—produces near-parity outcomes, while explicit randomization at a single boundary point recovers exact parity at the cost of additional policy complexity. The general message is that inclusiveness maintains tractability even when the data generating process is “messy” in the ways real score systems often are.
To visualize how behavioral heterogeneity drives outcomes, we run grid experiments over (bA − bB, λA − λB) holding fixed a baseline t and comparable score distributions. For each grid point we compute: (i) a selective repair tA that equalizes pre-adaptation acceptance with tB = t, and (ii) an inclusive equilibrium-fair tB(t) with tA = t. We then plot Δeq under each design.
The resulting “phase diagram” has an interpretable structure. Under selective repair, Δeq increases roughly linearly in both gaps for small heterogeneity, with curvature in regions where density changes quickly around the cutoff. Under inclusive equilibrium-fair design, the level of Δeq is near zero by construction when parameters are correct, and the sensitivity to misspecification concentrates in the same directions predicted by the bound: the design is robust when groups have similar bg and λg, and fragility emerges when one group is both more capable and more informed.
Finally, we illustrate one way λg can arise endogenously: information diffuses through a social or professional network. We generate a random graph within each group (e.g., Erdős–Rényi with edge probability πg) and seed a small fraction of “initially informed” agents. Information spreads for a fixed number of rounds, and λg is the realized informed fraction at deployment. By varying πg and the seed rate, we obtain environments where λA ≥ λB emerges without assuming it ex ante.
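A minimal sketch of this diffusion mechanism, assuming an Erdős–Rényi graph within each group, synchronous spreading for a fixed number of rounds, and hypothetical values for the edge probabilities and seed rate.

```python
import numpy as np

def realized_lambda(n, pi_g, seed_rate, rounds, rng):
    """Fraction informed after `rounds` of spreading on an Erdos-Renyi graph G(n, pi_g)."""
    upper = np.triu(rng.random((n, n)) < pi_g, k=1)
    adj = (upper | upper.T).astype(float)                      # undirected adjacency, no self-loops
    informed = (rng.random(n) < seed_rate).astype(float)       # initially informed seeds
    for _ in range(rounds):
        informed = np.minimum(informed + adj @ informed, 1.0)  # learn from any informed neighbor
    return float(informed.mean())

rng = np.random.default_rng(1)
lam_A = realized_lambda(n=2000, pi_g=0.004, seed_rate=0.02, rounds=3, rng=rng)  # denser network
lam_B = realized_lambda(n=2000, pi_g=0.001, seed_rate=0.02, rounds=3, rng=rng)  # sparser network
```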
When we plug the realized λg into the equilibrium-fair inclusive computation, parity is restored as expected; but when we compute thresholds using anticipated λg and then allow diffusion to increase λA post hoc, we observe predictable parity drift. This case study clarifies a policy-relevant point: interventions that change transparency, peer learning, coaching markets, or platform “tips” effectively shift λg, and fairness audits that ignore these dynamics can systematically misstate deployed disparities.
Taken together, these simulations reinforce the central practical lesson: enforcing fairness pre-adaptation can be misleading precisely in the environments where strategic behavior is most plausible, while inclusive, equilibrium-aware design remains computationally simple and empirically stable whenever behavioral primitives are not too asymmetric.
Our core model is intentionally spare: a one-dimensional score, a binary accept/reject decision, and a reduced-form way to capture strategic response through (bg, λg). The benefit of this parsimony is that it makes the “equilibrium vs. pre-adaptation” distinction transparent, and it yields closed-form objects that can be stress-tested empirically. The cost, of course, is that real deployments raise additional desiderata (e.g., equalized odds), richer action spaces (multi-dimensional manipulation), and institutional constraints (limits on group-conditional rules). We briefly discuss how the logic extends, and where it does not.
Demographic parity is a natural first target when the decision is allocative and the score is an omnibus measure. In settings where labels $y$ are meaningful (e.g., repayment, recidivism), practitioners often prefer equalized odds, which requires acceptance rates to match conditional on the label:
$$\Pr\bigl[f_t(\hat z, g) = 1 \mid y = 1, g = A\bigr] \;=\; \Pr\bigl[f_t(\hat z, g) = 1 \mid y = 1, g = B\bigr],$$
and analogously for $y = 0$ (equal false-positive rates).
The conceptual extension is straightforward: we posit conditional score distributions $z \sim D_{g,y}$ with CDF $F_{g,y}$ and survival $S_{g,y}(u) = 1 - F_{g,y}(u)$. Under the same threshold-crossing behavior, the equilibrium acceptance probability conditional on $(g, y)$ takes the same mixture form as in Lemma 1:
$$p_{g,y}(t) \;=\; (1 - \lambda_g)\, S_{g,y}(t_g) \;+\; \lambda_g\, S_{g,y}(t_g - b_g),$$
where we have kept $\lambda_g$ and $b_g$ group-specific (not label-specific) for simplicity. Equalized odds on equilibrium outcomes is then the pair of constraints $p_{A,1} = p_{B,1}$ and $p_{A,0} = p_{B,0}$.
Two points are worth emphasizing. First, the feasibility and stability issues become sharper. Equalized odds imposes two moment-matching conditions, and with a single scalar instrument per group (one threshold tg) we may not be able to satisfy both without randomization at the boundary or without moving beyond threshold rules. This is not a technicality: even absent strategic behavior, deterministic thresholds generally cannot equalize both TPR and FPR unless the conditional distributions have a special alignment. Strategic behavior tightens this further because the “effective” acceptance is evaluated at both tg and tg − bg, so local shape differences in both Dg, 1 and Dg, 0 matter.
Second, equilibrium awareness is arguably more important under equalized odds than under demographic parity. When agents manipulate, they mechanically change the mapping from latent ability z to observed ẑ. If the label is linked to z (as in monotone models y = 1{z ≥ τg} or probabilistic links), then post-manipulation acceptance no longer corresponds to the same conditional error rates as under truthful reporting. Put differently, a policy tuned to equalize pre-adaptation TPR/FPR can drift in either direction after deployment, and this drift is mediated by exactly the same behavioral gaps (bA − bB, λA − λB) highlighted in our demographic-parity analysis. In applications where equalized odds is motivated by error-rate equity, that is precisely the object we should evaluate after strategic response.
Real score systems are rarely one-dimensional; even when a platform produces a scalar score, it is typically a function of a feature vector x ∈ ℝd (income, utilization, test preparation, documentation quality, etc.). Extending the model to d > 1 raises two separate questions: (i) what is the agent’s action space (which features can be changed, and at what cost), and (ii) what is the principal’s policy class (linear rules, trees, neural scores, hand-built rubrics).
A tractable extension that preserves our economic logic is to assume the classifier is a linear threshold f(x, g) = 1{w⊤x ≥ tg} and that an informed agent can choose a manipulation vector δ to maximize acceptance minus cost, with group-dependent costs cg(δ). When costs are convex and depend on ∥δ∥ through a norm (e.g., quadratic effort or a weighted ℓ1 cost capturing “hard-to-change” coordinates), the agent’s best response is to move in the direction that most efficiently increases w⊤x, i.e., along the normal vector w scaled by the inverse cost metric. In that sense, many multi-dimensional settings collapse to an effective one-dimensional distance-to-boundary problem, where bg becomes a group-dependent maximal movement in the decision-relevant direction.
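As a worked special case (not the general cost class used elsewhere in the paper), suppose an informed agent faces a quadratic cost $c_g(\delta) = \tfrac{1}{2}\,\delta^\top A_g \delta$ with $A_g \succ 0$ and an acceptance benefit normalized to 1. The cheapest way to close a score shortfall $s = t_g - w^\top x > 0$ is then
$$\delta^\star \;=\; \frac{s}{w^\top A_g^{-1} w}\, A_g^{-1} w, \qquad c_g(\delta^\star) \;=\; \frac{s^2}{2\, w^\top A_g^{-1} w},$$
so the agent manipulates exactly when $s \le b_g := \sqrt{2\, w^\top A_g^{-1} w}$: the multi-dimensional problem reduces to a scalar distance-to-boundary rule, with the effective budget determined by how cheap the cost metric makes the decision-relevant direction $A_g^{-1} w$.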
What changes relative to the one-dimensional case is that the geometry of feasible manipulation can differ substantially across groups even when the value of acceptance is the same. For example, if one group has access to coaching that targets precisely the features with the highest weight (say, résumé keywords), then its cost metric is effectively lower along high-impact directions; this is an economically meaningful source of bA > bB that need not show up as a uniform shift in a scalar score distribution. Likewise, the information parameter λg can become feature-specific (some agents learn “what matters” but not “how to change it”), which suggests that equilibrium design should be robust not only to different manipulation magnitudes but also to different manipulation directions. Our main qualitative lesson survives: policies that engineer parity by “tightening” on the advantaged side can be precisely those that create the largest manipulable region in feature space.
A recurring practical objection is that group-conditional thresholds (tA, tB) may be legally restricted, politically unacceptable, or operationally infeasible. In many jurisdictions, explicitly using a protected attribute at decision time triggers heightened scrutiny, even if the intent is corrective. How does our analysis inform policy when the principal must use a common threshold tA = tB = t (or, more generally, a group-blind rule)?
Within our framework, group blindness collapses the instrument set: we can still compute equilibrium acceptance
$$p_g(t) \;=\; (1 - \lambda_g)\, S_g(t) \;+\; \lambda_g\, S_g(t - b_g),$$
but we typically cannot force $p_A(t) = p_B(t)$ unless the primitives satisfy a knife-edge condition. This yields an important reframing: under group-blind constraints, the relevant question is not “can we achieve parity?” but “how does the chosen policy interact with endogenous response to shape disparities?” Here, the comparative statics are policy-relevant even without the ability to repair. For instance, if disparities are amplified primarily by $\lambda$-gaps (information diffusion) rather than by $b$-gaps (capability), then interventions targeted at transparency, coaching markets, or platform guidance may be more effective than threshold tuning.
A second workaround, common in practice, is implicit group conditioning: the principal avoids using g directly but uses correlated features or post-processing. Our model cautions that such proxies can be strategically targeted as well; they may simply relocate the manipulable margin rather than eliminate it. A more principled alternative is to allow limited randomization at the cutoff (already standard in some allocation problems), which can restore feasibility for constraints like equalized odds while reducing the incentive to concentrate manipulation on a deterministic boundary. That said, randomization carries its own legitimacy and communication challenges.
We view the model as suggesting three broad, implementable takeaways.
First, fairness auditing should be equilibrium-aware whenever strategic response is plausible. If stakeholders can improve measured scores through effort, advice, or gaming, then pre-deployment parity calculations are, at best, partial-equilibrium forecasts.
Second, “inclusive” adjustments—loosely, helping the disadvantaged group clear the bar rather than raising the bar on the advantaged—tend to be more stable in equilibrium when behavioral primitives are not too asymmetric. This is not an ethical claim; it is a mechanism claim about where manipulation mass accumulates.
Third, behavioral parameters are policy levers. Subsidies, counseling, simplification of documentation, and anti-fraud enforcement all change effective bg. Transparency, interpretability, and the ecosystem of third-party advice change λg. Importantly, these levers can move in opposite directions: transparency can improve accountability while also increasing strategic responsiveness. Our framework makes that tradeoff explicit.
We close by acknowledging what we do not model. We abstract from dynamic interactions (agents learning over time), from endogenous entry (who applies), from feedback to Dg (investments that shift base scores), and from richer principal objectives beyond threshold accuracy proxies. Each of these forces can matter in practice, and incorporating them is not merely a technical extension—it can change which interventions are desirable. Our goal here is narrower: to isolate a basic equilibrium logic that is easy to compute, easy to simulate, and hard to ignore once seen.