
Equilibrium-Stable Fairness in Strategic Classification: Selectivity Fails, Inclusiveness Survives

Metadata

We study a Stackelberg model of strategic classification with heterogeneous groups, group-dependent information (peer learning), and monotone manipulation costs. In a one-dimensional threshold setting we derive closed-form equilibrium acceptance probabilities and show a sharp dichotomy: fairness repairs achieved by increasing selectivity on the advantaged group are generically unstable—there exist environments where demographic parity holds before adaptation yet fails by a constant amount in equilibrium. In contrast, repairs achieved by inclusiveness—expanding access for the disadvantaged group—are equilibrium-stable, with post-adaptation disparities bounded by simple functions of cost and information heterogeneity.

Our results synthesize and extend the incentive-aware ML literature (strategic classification, partial information, and fairness reversal) by turning an empirical warning into a design rule: if fairness is required in strategic environments, constraints should be posed and optimized on equilibrium outcomes, and, when solutions are non-unique, the inclusive branch is robust. We also provide an efficient algorithm (binary search over thresholds) to compute inclusive equilibrium-fair rules and validate the theory with simulations on benchmark datasets under both continuous manipulation and manipulation graphs.

Table of Contents

  1. Introduction and motivating examples (fairness reversal; strategic adaptation in credit/hiring; why equilibrium fairness matters).
  2. Related work (strategic classification; fairness under strategic agents; opacity/peer learning; social burden).
  3. Baseline model: scores, manipulation, information heterogeneity, and threshold decision rules (one-dimensional tractable core).
  4. Equilibrium characterization: best responses, effective thresholds, and closed-form acceptance rates; definition of pre- vs post-adaptation fairness metrics.
  5. Taxonomy of fairness repairs: selective vs inclusive (and mixed); formal definitions and welfare/accuracy objectives.
  6. Main theory I: instability of selectivity (constructive lower bound / fairness reversal instances).
  7. Main theory II: stability of inclusiveness (uniform upper bounds under MLR + single-crossing; interpretation).
  8. Algorithmic implications: computing equilibrium-fair inclusive thresholds; statistical estimation from samples; when numerical methods are needed (multi-dimensional extensions).
  9. Experiments/simulations: reproduce fairness reversal; demonstrate equilibrium-stable designs; sensitivity to cost and information gaps; optional manipulation-graph case study.
  10. Discussion and extensions: equalized odds; multi-dimensional scores; constraints on using group-conditional thresholds; policy implications.

Content

1. Introduction and motivating examples (fairness reversal; strategic adaptation in credit/hiring; why equilibrium fairness matters).

Modern decision systems routinely face a tension between two desiderata that are each compelling in isolation: we want simple, auditable rules (often threshold-based), and we want those rules to satisfy formal notions of fairness across protected groups. Yet these systems are deployed into environments in which individuals respond strategically. When a decision rule is predictable enough to be accountable, it is also predictable enough to be exploited. The central motivation of this paper is that the relevant object for fairness assessment is therefore not the rule “on paper,” nor even its performance under truthful reporting, but rather the rule’s equilibrium consequences after agents adapt.

A canonical example comes from consumer credit. Lenders increasingly rely on score-like indices—traditional credit scores, bank-account “cash-flow scores,” or proprietary underwriting models—that are then converted into accept/reject decisions through relatively transparent cutoffs. At the same time, both regulators and internal risk teams monitor disparities in acceptance rates across groups and may impose demographic-parity-style targets. A natural operational response is to “repair” disparities by adjusting thresholds: tighten the cutoff for an advantaged group, loosen it for a disadvantaged group, or combine both. However, the deployment of a repaired threshold changes incentives. If there are accessible actions that raise the relevant score (opening certain credit lines, paying for a credit-builder product, reallocating payments to optimize utilization, disputing items, or simply learning which behaviors the model rewards), then agents near the cutoff have a reason to take them. Importantly, the ability and willingness to do so is not uniform. Some populations have better access to financial products, more stable liquidity, or stronger peer networks that transmit actionable information. As a result, a threshold change that equalizes acceptance under baseline scores can be followed by a behavioral response that partially or fully undoes the equalization. In practice, one observes what we call a fairness reversal: an intervention that appears parity-improving ex ante becomes parity-worsening ex post.

Hiring provides a second motivating domain. Resume-screening systems and assessment tests are often summarized by a scalar score, and the operational decision is frequently a cutoff. Firms then face legal and reputational pressure to ensure that selection rates do not diverge too sharply across groups. But applicants can and do invest in targeted preparation: resume keyword optimization, coaching for psychometric screens, practice on leaked question banks, or the purchase of services that specifically advertise “beating the ATS.” Again, these investments are not equally available. Some candidates have access to better advising, more time to prepare, or communities that share tactical information. Moreover, the relevant strategic lever is typically threshold-crossing: an applicant slightly below the cutoff has the strongest incentive to engage in whatever action yields a marginal score increase, because the payoff from crossing the cutoff is discrete (an interview, a job offer, a chance to proceed). This creates a predictable pattern of manipulation concentrated in a band just below the decision boundary. A fairness repair that operates by tightening selection for an advantaged group can therefore backfire if that group is also the one more capable of “just clearing” the tightened boundary.

These examples illustrate why we insist on distinguishing pre-adaptation from post-adaptation disparity. A principal may compute acceptance rates under truthful reporting and conclude that a rule satisfies demographic parity. But if the principal’s deployment induces a subset of agents to alter their reported score, then the realized acceptance rates reflect both the statistical properties of the underlying distributions and the strategic environment in which the rule is embedded. The appropriate fairness object is thus an equilibrium acceptance probability, i.e., the probability that an agent is accepted after accounting for the agent’s optimal response to the rule. In this sense, equilibrium fairness is not an exotic refinement; it is the minimal standard needed for fairness claims to be robust to predictable behavioral feedback.

Our modeling approach is designed to formalize this robustness concern while remaining close to how threshold rules are used in practice. We take seriously the fact that agents are heterogeneous not only in their baseline qualification (a latent “true” score) but also in their ability to respond to the mechanism and in their access to information about how the mechanism works. Two simple primitives capture these ideas. First, we allow group-dependent costs of score improvement, which we summarize by a manipulation budget: how much an agent can raise the score before the marginal cost exceeds the unit value of acceptance. Second, we allow group-dependent informational responsiveness: not all agents will discover, understand, or trust the rule well enough to optimize against it. Some will behave strategically, others will not. In many real settings, this informational channel is shaped by opacity and peer learning. If the rule is opaque, fewer agents can target it; if communities share knowledge effectively, more agents can. Either way, informational differences can translate directly into disparities in who is able to exploit the decision boundary.

This perspective also clarifies why certain fairness “fixes” are particularly fragile. A common repair heuristic is what one might call selective: to reduce the acceptance rate of an advantaged group by raising its threshold, holding the disadvantaged group’s threshold fixed. Selective repairs are attractive because they preserve standards for the disadvantaged group and often look defensible as a procedural change. But they also create a newly valuable region just below the raised threshold, precisely where manipulation is cheapest and most effective. If the advantaged group is also more able or more informed, then the principal’s attempt to reduce its acceptance rate can be offset by increased manipulation, leaving the principal with a rule that is fair on paper but unfair in equilibrium. By contrast, repairs that operate by becoming more inclusive toward the disadvantaged group—loosening its threshold—often interact differently with strategic response: they increase baseline acceptance directly, and the residual disparity from manipulation can be bounded by how sharply the score distributions change near the boundary and by how different the groups are in responsiveness and manipulation capacity. The broad lesson is not that inclusiveness is always optimal, but that equilibrium stability depends on where the repair moves mass relative to the decision boundary and on which populations are best positioned to respond.

The policy relevance of equilibrium fairness is immediate. Regulators increasingly contemplate transparency requirements, adverse-action explanations, and standardized model documentation. These interventions may improve accountability, but they also change the strategic environment by making the rule easier to learn and target. Similarly, firms invest in “fairness through awareness” programs and external audits that may inadvertently publicize operational thresholds. Our framework does not take a normative stance against transparency; rather, it highlights a tradeoff: holding fixed the underlying distributions, increased learnability can amplify behavioral responses, and if learnability is uneven across groups, it can amplify disparity. This suggests that fairness governance cannot be separated from the information structure of the environment. In practice, a fairness assessment should specify whether it is conditional on truthful behavior, on partial strategic response, or on full adaptation—and which populations are expected to be able to adapt.

We close this introduction with a note on scope. We deliberately study a stylized, one-dimensional threshold environment, in which agents can only increase a scalar score and the decision-maker’s action is a cutoff (possibly group-conditional). This abstraction fits many operational pipelines—scores are ubiquitous, thresholds are common, and “gaming” often takes the form of raising a measured feature. At the same time, we do not claim that all strategic behavior is captured by additive score manipulation, nor that group differences can be reduced to a single cost and information parameter. Real settings feature multidimensional actions, general-equilibrium spillovers, dynamic learning by the principal, and welfare consequences beyond acceptance probabilities. Our aim is more modest and, we believe, complementary: to isolate a tractable mechanism by which fairness constraints that are satisfied under baseline behavior can fail after deployment, and to provide conditions under which certain classes of repairs are predictably more stable. In doing so, the model illuminates a practical lesson: fairness is not only a property of an algorithm, but also a property of the strategic environment the algorithm creates.


2. Related work

Our analysis sits at the intersection of three literatures: strategic classification and “gaming” of predictive systems; algorithmic fairness under behavioral response; and the role of information, transparency, and social context (including the distribution of burdens induced by compliance or manipulation).

A first thread studies prediction and classification when individuals can change inputs in response to a decision rule. Early computer-science work on adversarial classification and strategic manipulation emphasizes that when decision boundaries are known or can be inferred, agents will shift features in order to obtain favorable outcomes (e.g., in spam, fraud, or access-control settings). Subsequent work in strategic classification formalizes this feedback in economic terms: agents choose actions to maximize utility net of costs, and the principal’s deployed classifier shapes the induced action distribution. This literature clarifies a point we take as foundational: the object a principal “sees” after deployment is an endogenous distribution of reported features, not the baseline distribution of latent types. Our model adopts the canonical separation between a latent scalar “qualification” (here, a base score z) and a manipulable report ẑ, and it uses a standard best-response logic with monotone costs. Where we depart from much of the strategic-classification literature is in placing fairness constraints—not only accuracy or robustness—at the center of the principal’s problem, and in making group-dependent strategic responsiveness a primitive rather than an equilibrium artifact of a fully specified learning environment.

A second thread focuses directly on fairness constraints and their interaction with incentives. In the fairness literature, demographic parity and related criteria are typically defined with respect to outcomes under a fixed data-generating process. A large body of work explores the impossibility of simultaneously satisfying multiple fairness definitions, and the statistical tradeoffs between error rates across groups. More recent work asks what happens when the mechanism itself changes behavior: fairness constraints can shift incentives, and those incentive shifts can be heterogeneous across groups. One line of papers analyzes “fairness and effort” settings in which individuals invest effort to improve an evaluation score; fairness constraints can reallocate effort, sometimes increasing total welfare but sometimes imposing disproportionate burdens. Another line studies “recourse” and the feasibility of improving one’s outcome under a model: even when a classifier is formally fair, the set of attainable changes can differ by group, implying unequal access to favorable outcomes. Our formulation is intentionally simpler than these frameworks—we treat acceptance as a binary benefit and manipulation as an additive increase in a single score—but the simplification is purposeful: it isolates the threshold-crossing logic that makes behavioral response particularly sharp near a cutoff, and it allows us to express post-deployment acceptance rates in closed form. This tractability lets us distinguish repairs that operate by excluding (raising thresholds) from repairs that operate by including (lowering thresholds), and to show how these choices map into equilibrium disparities when manipulation ability and information differ.

A closely related behavioral-feedback literature studies performative prediction: when a predictor is deployed, it changes the environment that generates future data, so a model that is optimal on historical data may be suboptimal once it influences behavior. Our setting can be read as a one-shot, Stackelberg version of this idea: the principal commits to thresholds, agents respond, and realized outcomes reflect an endogenous distribution. The performative prediction perspective highlights a methodological lesson that motivates our equilibrium fairness metric: evaluating a rule “offline” on baseline data can be systematically misleading when the rule itself is a causal intervention. Our contribution is to bring this logic into a fairness-repair question that is operationally common—adjusting thresholds to equalize selection rates—and to characterize when such repairs are stable to strategic response.

A third thread concerns information, transparency, and the social processes by which agents learn how to respond to algorithms. The interpretability and transparency literatures often frame explanations as unambiguously beneficial: they improve accountability, enable contestability, and support compliance. At the same time, a mechanism-design perspective emphasizes that greater transparency can increase manipulability. Several recent papers study this tension formally, showing that revealing decision rules can induce gaming and reduce predictive validity, while also potentially improving welfare by enabling productive investment. Our reduced-form parameter λg is meant to capture precisely the “learnability” channel that connects transparency, peer learning, and strategic response. Rather than model information acquisition explicitly (e.g., through costly experimentation, social networks, or repeated play), we allow a fraction λg of each group to best-respond. This choice is a limitation—λg compresses many mechanisms into one scalar—but it is also a strength: it lets us ask how fairness repairs behave under plausible comparative statics (higher λg due to better access to advice, higher exposure to the system, or stronger peer networks), without committing to a particular microfoundation. In this sense, λg bridges two empirical observations that are often discussed separately: first, that some communities have more effective channels for disseminating “what the model wants,” and second, that transparency interventions can differentially amplify those channels.

Our discussion also relates to work on disparate impact as a burden rather than (or in addition to) a disparity in selection rates. Even when groups achieve similar acceptance probabilities, the cost of achieving acceptance can differ if one group must invest more (time, money, stress, or foregone opportunities) to cross the decision boundary. This concern appears in scholarship on the “hidden costs” of algorithmic systems and in formal models of effort-based fairness. While our core results focus on demographic parity in acceptance rates—because that is a common compliance metric and analytically tractable—our framework naturally speaks to burden through the manipulation cost function cg(⋅). In particular, a rule that equalizes equilibrium acceptance probabilities may still be inequitable if it induces systematically higher manipulation expenditure for one group. We do not fully pursue welfare or burden-optimal policy here, but we view the ability to compute induced manipulation regions and costs as a useful byproduct of the equilibrium characterization, and as a stepping stone to richer objectives.

Finally, our emphasis on repair policies connects to an applied literature on post-processing and threshold adjustment to satisfy fairness constraints. Much of that literature treats threshold changes as a direct mapping from score distributions to acceptance rates. Our results caution that, in strategic settings, the mapping is itself policy-dependent: moving a threshold changes not only who is accepted under baseline scores, but also who finds it worthwhile to manipulate. This matters especially for the common “selective repair” heuristic of tightening standards for an advantaged group. The key conceptual link to prior work is that fairness constraints can create incentives to concentrate effort (or gaming) around decision boundaries; our added point is that when this concentration is group-skewed—through either differential ability to manipulate (bg) or differential ability to learn (λg)—repairs that look parity-restoring under truthful reporting can be parity-destabilizing after deployment.

In sum, we borrow from strategic classification the idea that deployed rules induce behavioral response; from performative prediction the idea that evaluation must be equilibrium-consistent; and from fairness and recourse the idea that both outcomes and burdens matter and can differ by group. Our contribution is to combine these insights in a parsimonious threshold environment and to use that parsimony to obtain sharp, policy-relevant distinctions between exclusionary and inclusionary fairness repairs. We now turn to the baseline model that delivers this tractable equilibrium mapping from thresholds to post-adaptation acceptance rates.


3. Baseline model: scores, manipulation, information heterogeneity, and threshold decision rules

We study a one-shot, Stackelberg model of strategic classification with two observable groups g ∈ {A, B}. The principal (decision-maker) commits to a simple threshold rule, agents respond by possibly manipulating their reported score, and the principal’s realized selection rates reflect this endogenous response. The purpose of the baseline model is not realism in every detail, but tractability: we want a setting in which (i) strategic behavior concentrates around a cutoff in an analytically transparent way, and (ii) differences in groups’ ability to respond can be represented by a small number of interpretable primitives.

Latent scores. Each agent belongs to a group g and draws a one-dimensional, latent “base score” z ∼ Dg. We take Dg as a primitive of the environment, with continuous CDF Fg and density fg. The base score can be interpreted as a composite index of underlying qualification or risk—e.g., a creditworthiness index, a standardized test score, or an internal risk score—whose distribution differs across groups due to historical and structural factors. Our emphasis is on how a fixed decision rule interacts with these baseline differences once agents can adapt.

The one-dimensionality assumption is standard in threshold models and is motivated by practice: many high-stakes decisions are operationalized via a scalar score followed by a cutoff. More importantly, a scalar score isolates a key mechanism: when acceptance is determined by whether one crosses a single boundary, incentives to act are sharply localized near that boundary. This localization drives our main comparative statics and, ultimately, the contrast between exclusionary (raising thresholds) and inclusionary (lowering thresholds) “repairs.”

Decision rule (thresholds). The principal chooses a vector of group-specific thresholds t = (tA, tB) ∈ ℝ2. After observing the agent’s reported score ẑ and group g, the principal applies
ft(ẑ, g) = 1{ẑ ≥ tg}.
Allowing tA ≠ tB is meant to encompass two cases: (i) explicit group-conditional policies (which may be legally or normatively constrained in some domains), and (ii) an analytical proxy for post-processing or “repair” rules that effectively implement different cutoffs across groups through other instruments. Our results are most naturally read as statements about what kind of threshold movement—tightening versus loosening—tends to be stable once behavior adjusts.

Manipulation technology. After observing z, agents may increase their reported score. We model manipulation as an additive increase Δ ≥ 0, so that
ẑ = z + Δ.
This captures many forms of “gaming” or costly improvement that primarily move an individual along the model’s scoring dimension: paying for test prep, hiring a consultant to optimize an application, reallocating transactions to improve a credit score, or selectively disclosing information. We abstract from manipulation that decreases scores (it is never privately optimal here) and from multidimensional feature changes (which complicate identification of the relevant margin). Additivity is a simplification, but it delivers a transparent threshold-crossing logic: to change an outcome, an agent must move just enough to cross tg.

Manipulation is costly. A group-g agent who increases the reported score by Δ pays cg(Δ), where cg : ℝ+ → ℝ+ is strictly increasing, continuous, and normalized by cg(0) = 0. We interpret cg broadly as the private burden of attempting to improve one’s score along the relevant dimension, including time, money, stress, and foregone opportunities. Allowing cg to vary by group is crucial: some groups may have systematically better access to credit-building products, coaching, documentation, or networks that reduce the cost of strategic response.

Agents obtain a normalized value of 1 from being accepted. Thus, conditional on (z, g) and thresholds t, utility is
Ug(z, Δ; t) = 1{z + Δ ≥ tg} − cg(Δ).
The “value = 1” normalization is without loss of generality for our purposes: it pins down units and makes costs immediately interpretable as fractions of the acceptance benefit. What matters for the behavior we study is the binary nature of acceptance and the monotonicity of costs, which together imply that manipulation decisions are governed by a simple comparison between the gain from switching the classification outcome and the cost of doing so.

A useful summary of manipulation ability is the budget (or maximal worthwhile shift)
bg := cg−1(1).
This is the largest score increase an agent in group g would ever be willing to pay for if acceptance yields value 1. When bA > bB, group A can, in this reduced-form sense, “move farther” for the same benefit. We will see that differences in (bA, bB) translate directly into differences in how much mass each group can push over a threshold.

Information heterogeneity (limited strategic responsiveness). A second central primitive is that not all agents know how to respond effectively. In many real systems, the decision rule is partially opaque: agents may not know the precise cutoff, may be uncertain which actions map into score changes, or may lack the sophistication to optimize. At the same time, some agents learn through repeated interaction, advice markets, social networks, or formal disclosures. We capture these channels parsimoniously by assuming that, within group g, an agent best-responds with probability λg ∈ [0, 1], and otherwise does not manipulate (reports ẑ = z).

We interpret λg as a reduced-form responsiveness rate: it summarizes both access to information and the ability to translate information into effective action. Differences in λg can arise from differential exposure to the system (more frequent applicants learn faster), differential access to counselors or online forums, or heterogeneous trust and engagement with institutions. This modeling choice deliberately avoids committing to a microfoundation (e.g., costly information acquisition or network diffusion), which would add detail but obscure the comparative statics we care about. In equilibrium, λg functions as an “opacity/transparency” lever: holding all else fixed, higher λg makes the deployed rule more gameable by a larger share of group g.

Timing and commitment. The interaction is a one-shot Stackelberg game. The principal first commits to thresholds t. Then each agent observes their group g and base score z. With probability λg, the agent is informed and chooses Δ to maximize Ug; with probability 1 − λg, the agent does not manipulate. The agent reports ẑ = z + Δ, and the principal applies ft.

This timing mirrors “offline deployment” of a policy: a lender posts underwriting standards, a university sets an admissions cutoff, or an agency implements an eligibility threshold; applicants then adapt. It also clarifies what we mean by evaluating a fairness repair: a repair is not assessed against the historical distribution of z alone, but against the post-deployment distribution of ẑ, which is policy-dependent.

Principal’s objective and the role of fairness. We keep the principal’s objective deliberately flexible. In some applications the principal cares about predictive accuracy or expected loss with respect to an outcome y (e.g., repayment), and thresholds are chosen to trade off errors. In others, the principal faces explicit constraints on selection rates. Our analysis treats threshold choice as the policy instrument and focuses on how fairness constraints map into thresholds when behavior responds. Concretely, our later results will compare two types of threshold adjustments that are common in practice: tightening standards for one group versus loosening standards for another. The baseline model’s role is to make these adjustments comparable within a single environment in which both manipulation ability (bg) and information (λg) can differ.

We emphasize at the outset a limitation that also motivates our focus: because manipulation is costly, equalizing acceptance probabilities need not equalize burdens. Even if a repair achieves similar selection rates across groups after adaptation, one group may pay more (in expected manipulation cost) to attain those outcomes. Our core metric will be disparity in acceptance rates because it is operationally prominent and analytically clean, but the model keeps track of costs, enabling burden calculations as a natural extension.

With primitives (Dg, cg, λg) and a threshold vector t in place, we can now characterize equilibrium behavior and derive closed-form acceptance probabilities as a function of effective thresholds—an equilibrium mapping that will allow us to distinguish “fair on paper” (pre-adaptation) from “fair in equilibrium” (post-adaptation) policies.


4. Equilibrium characterization: best responses, effective thresholds, and closed-form acceptance rates; pre- vs post-adaptation fairness

Fix a threshold vector t = (tA, tB). Because the principal’s decision is binary and manipulation only enters through the reported score ẑ = z + Δ, an agent’s strategic problem has a particularly stark structure: conditional on being strategic, the agent either (i) does nothing, or (ii) manipulates exactly enough to cross the relevant cutoff. This “threshold-crossing” logic is the engine behind all of our subsequent stability results, because it lets us map any deployed threshold rule into an analytically tractable, policy-dependent acceptance rate.

4.1 Best responses: “do nothing” or “just cross”

Consider a group-g agent with base score z. If the agent is among the fraction λg who can best-respond, they solve
maxΔ ≥ 0 1{z + Δ ≥ tg} − cg(Δ).
There are two cases.

If z ≥ tg: the agent is already accepted without manipulation. Since cg(Δ) is strictly increasing and cg(0) = 0, any Δ > 0 strictly reduces utility. Hence Δg*(z; t) = 0.

If z < tg: acceptance requires choosing Δ ≥ tg − z. Because costs are strictly increasing, among all Δ that secure acceptance, the cheapest is Δ = tg − z. Thus the agent compares two options: reject with Δ = 0 (utility 0) versus accept by just crossing (utility 1 − cg(tg − z)). The agent manipulates if and only if cg(tg − z) ≤ 1. Using the manipulation budget bg := cg−1(1), this condition is equivalent to tg − z ≤ bg, i.e., z ≥ tg − bg.

Putting these cases together, the strategic best response is
$$ \Delta_g^*(z;\mathbf t)= \begin{cases} 0, & z\ge t_g,\\ t_g-z, & z\in[t_g-b_g,\ t_g),\\ 0, & z<t_g-b_g. \end{cases} $$
Two immediate implications are worth highlighting. First, manipulation is localized to an interval of width bg just below the threshold—exactly the region where outcomes are “within reach.” Second, for strategic agents the deployed cutoff tg is behaviorally transformed into an effective cutoff tg − bg: anyone with z ≥ tg − bg can ensure acceptance by paying a cost weakly below 1.
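
As a computational aside, this best response is only a few lines of code. The sketch below is a minimal Python rendering; the function and argument names are our own, and b_g stands in for the budget bg = cg−1(1).

```python
def best_response_delta(z, t_g, b_g):
    """Optimal manipulation Delta* for a strategic group-g agent.

    z   : base score
    t_g : deployed cutoff for the agent's group
    b_g : manipulation budget b_g = c_g^{-1}(1)
    """
    if z >= t_g:            # already accepted: any Delta > 0 only adds cost
        return 0.0
    if z >= t_g - b_g:      # in the band [t_g - b_g, t_g): just cross the cutoff
        return t_g - z
    return 0.0              # too far below: crossing costs more than the benefit
```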

4.2 Limited information as a mixture: who gets to play the game?

The model’s information heterogeneity enters only through whether the agent is able to implement the best response. With probability λg, the agent is strategic and plays Δg*(z; t); with probability 1 − λg, the agent does not manipulate and reports ẑ = z. We can interpret the realized population as a mixture of two “types” within each group: a strategic subpopulation that effectively faces cutoff tg − bg, and a non-strategic subpopulation that faces the literal cutoff tg.

Although one could characterize the full distribution of reported scores ẑ, for our purposes it is enough to compute acceptance probabilities. Let Sg(u) := 1 − Fg(u) denote the survival function of Dg. Under our continuity assumption, there is no mass exactly at the cutoff, so acceptance events can be treated cleanly using Sg.

4.3 Closed-form equilibrium acceptance rates

Let pg(t) denote the equilibrium (post-adaptation) acceptance probability in group g. Condition on whether the agent is strategic: with probability 1 − λg the agent reports truthfully and is accepted if and only if z ≥ tg, while with probability λg the agent best-responds and, by the effective-cutoff logic above, is accepted if and only if z ≥ tg − bg. Therefore,
pg(t) = (1 − λg) Sg(tg) + λgSg(tg − bg).
This expression makes transparent how behavior mediates the effect of thresholds. Relative to the “truthful” acceptance rate Sg(tg), the deployed policy receives an endogenous boost equal to
pg(t) − Sg(tg) = λg(Sg(tg − bg) − Sg(tg)),
which is increasing in both responsiveness λg and manipulation ability bg, and is amplified when there is substantial probability mass near the cutoff (i.e., when fg is large around tg).

Several monotonicities follow immediately and will be repeatedly used later. For each group g, pg(t) is strictly decreasing in its own threshold tg (since Sg is strictly decreasing under a continuous density), weakly increasing in λg, and weakly increasing in bg. In words: raising a cutoff reduces acceptance, but some of that reduction is “undone” by strategic response, especially in groups that are better able to manipulate or better informed about how to do so.

It is also helpful to emphasize what does not appear in pg(t): we do not need the full functional form of cg(⋅), only the single-index summary bg = cg−1(1). This is a direct consequence of binary payoffs from acceptance. More elaborate value structures (e.g., heterogeneous values of acceptance) would generally require richer summaries of costs; we return to this limitation in our discussion, but the binary case already isolates the core stability mechanism.
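
For concreteness, the closed form translates directly into code. A minimal sketch in Python (scipy.stats supplies the survival function; the distribution and parameter values in the example are illustrative assumptions, not calibrated quantities):

```python
from scipy import stats

def acceptance_rate(dist_g, t_g, lam_g, b_g):
    """Equilibrium acceptance p_g(t) = (1 - lam_g) S_g(t_g) + lam_g S_g(t_g - b_g)."""
    S = dist_g.sf                      # survival function S_g(u) = 1 - F_g(u)
    return (1.0 - lam_g) * S(t_g) + lam_g * S(t_g - b_g)

# Example: a standard-normal base-score distribution with illustrative parameters.
p_g = acceptance_rate(stats.norm(0.0, 1.0), t_g=1.0, lam_g=0.6, b_g=0.5)
```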

4.4 Pre-adaptation vs post-adaptation fairness metrics

Because acceptance depends on the reported score ẑ, fairness assessed “on paper” using the base distribution of z can diverge from fairness “in equilibrium” after agents adapt. We therefore distinguish two demographic-parity gaps associated with the same deployed thresholds t.

Pre-adaptation (truthful) acceptance. If all agents reported ẑ = z, group-g acceptance would be
Pr [ft(z, g) = 1 ∣ g] = Pr [z ≥ tg ∣ g] = Sg(tg),
and the corresponding demographic-parity gap is
Δpre(t) := |SA(tA) − SB(tB)|.

Equilibrium (post-adaptation) acceptance. Under strategic response with limited information, group-g acceptance is pg(t) as above, and the equilibrium demographic-parity gap is
Δeq(t) := |pA(t) − pB(t)| = |(1 − λA)SA(tA) + λASA(tA − bA) − (1 − λB)SB(tB) − λBSB(tB − bB)|.

The key conceptual point is that Δeq(t) is not a property of (DA, DB) and thresholds alone: it is jointly determined by thresholds and the groups’ response parameters (λg, bg). A policy can therefore be calibrated to satisfy Δpre(t) = 0 and yet fail to satisfy Δeq(t) = 0 once deployed, simply because the underlying incentives to cross a cutoff are uneven across groups.

Finally, while our primary fairness object is acceptance-rate disparity, the equilibrium characterization also pins down manipulation incidence and burden. For example, among strategic group-g agents, manipulation occurs exactly when z ∈ [tg − bg, tg); thus the manipulation rate is λg(Fg(tg) − Fg(tg − bg)), and expected manipulation cost can be written as an integral over this interval. These quantities will matter when we later discuss the welfare consequences of “repairs” that equalize acceptance rates at the expense of unequal (or inefficient) effort costs.
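
These quantities are straightforward to compute once the primitives are specified. The sketch below (Python, scipy) assumes, purely for illustration, a linear cost cg(Δ) = Δ/bg so that the expected-burden integral is explicit; the helper name and dictionary layout are our own.

```python
from scipy import stats, integrate

def gaps_and_burden(dist, t, lam, b):
    """Pre- vs post-adaptation parity gaps plus manipulation rate and expected cost.

    dist, t, lam, b : dicts keyed by group label ('A', 'B').
    Assumes (for illustration only) a linear cost c_g(d) = d / b_g.
    """
    def p_eq(g):
        S = dist[g].sf
        return (1 - lam[g]) * S(t[g]) + lam[g] * S(t[g] - b[g])

    delta_pre = abs(dist['A'].sf(t['A']) - dist['B'].sf(t['B']))
    delta_eq = abs(p_eq('A') - p_eq('B'))

    manip_rate, exp_cost = {}, {}
    for g in ('A', 'B'):
        lo, hi = t[g] - b[g], t[g]
        manip_rate[g] = lam[g] * (dist[g].cdf(hi) - dist[g].cdf(lo))
        # expected manipulation cost: lam_g * E[c_g(t_g - z); z in the band]
        integrand = lambda z, g=g: ((t[g] - z) / b[g]) * dist[g].pdf(z)
        exp_cost[g] = lam[g] * integrate.quad(integrand, lo, hi)[0]
    return delta_pre, delta_eq, manip_rate, exp_cost

# Example usage with illustrative normal score distributions.
out = gaps_and_burden({'A': stats.norm(1.0, 1.0), 'B': stats.norm(0.0, 1.0)},
                      t={'A': 0.8, 'B': 0.8}, lam={'A': 0.6, 'B': 0.4},
                      b={'A': 0.5, 'B': 0.3})
```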

With the equilibrium mapping t ↦ (pA(t), pB(t)) in hand, we can now study policy classes that practitioners use to enforce parity—loosening cutoffs for a disadvantaged group versus tightening cutoffs for an advantaged group—and ask which classes remain fair after adaptation rather than only under truthful reporting.


5. Taxonomy of fairness repairs: selective vs. inclusive (and mixed); formal definitions and welfare/accuracy objectives

In applied settings, demographic parity (and related group constraints) is rarely implemented by redesigning the underlying score; instead, it is typically implemented by post-processing a fixed score with group-conditional cutoffs. This is true both in “human-in-the-loop” systems (where caseworkers are instructed to apply a different bar across groups) and in automated screening (where the platform’s accept/reject threshold is adjusted). Our model is tailored to this reality: the principal’s policy instrument is the threshold vector t = (tA, tB), and the only difference across common repair heuristics is which direction we move each cutoff relative to some baseline.

To make this precise, we start from a benchmark threshold t that the principal would deploy absent fairness considerations. Conceptually, t can be the accuracy-optimal common threshold under truthful reporting, or simply an institutionally “default” cutoff (e.g., a credit score standard) that is treated as fixed for legal or operational reasons. A fairness repair is then any mapping from t into a group-specific pair (tA, tB) intended to reduce an acceptance-rate gap. The key choice is whether we seek parity by tightening standards for the group with higher base acceptance, or by loosening standards for the group with lower base acceptance.

5.1 Selective repairs (tighten the advantaged group)

We call a repair selective if it holds fixed the disadvantaged group’s cutoff and raises the advantaged group’s cutoff. In our two-group notation (where group A is often, though not always, the higher-scoring group in the MLR sense), the selective class anchored at t is
𝒯sel(t) := {(tA, tB): tB = t, tA ≥ t}.
The interpretation is straightforward: we “repair” a disparity by excluding additional marginal applicants from group A, leaving group B’s standard untouched. This captures a family of policies that are common in practice when decision-makers are reluctant (for legal, political, or risk-management reasons) to explicitly lower standards for the lower-acceptance group. It also aligns with a popular algorithmic instinct: if one group is over-represented among accepted cases, raise that group’s bar until the numbers match.

Under truthful reporting, the selective strategy is mechanically effective whenever SA(t) exceeds SB(t): increasing tA decreases SA(tA) continuously, so one can often find tA such that SA(tA) = SB(t). Our equilibrium characterization, however, suggests an immediate tension: selective repairs create a larger manipulation incentive margin for group A exactly near the new, higher cutoff. Because acceptance in equilibrium is governed by the effective cutoff tA − bA for strategic agents, the question is not only whether selectivity equalizes outcomes “on paper,” but whether it remains fair after the advantaged group responds to the newly created incentive.

5.2 Inclusive repairs (loosen the disadvantaged group)

We call a repair inclusive if it holds fixed the advantaged group’s cutoff and lowers the disadvantaged group’s cutoff. The inclusive class anchored at t is
𝒯inc(t) := {(tA, tB): tA = t, tB ≤ t}.
Here the repair equalizes acceptance by including more marginal applicants from group B. This captures policies often described as affirmative action, targeted outreach coupled with relaxed screening, or “equal opportunity” adjustments implemented at the decision boundary rather than in the score construction.

Under truthful reporting, inclusiveness similarly offers a direct route to parity: decreasing tB increases SB(tB), and one can typically choose tB so that SB(tB) = SA(t). Importantly, inclusive repairs change the incentive environment differently than selective repairs. Lowering tB shrinks the mass of group B agents who are “just below” the threshold and hence shrinks the set for whom manipulation is pivotal. In our equilibrium formula, this manifests through the terms SB(tB) and SB(tB − bB) moving together as tB changes. Intuitively, inclusiveness reduces reliance on agents “gaming” the boundary to reach acceptance, because more of the desired acceptance is achieved directly through the policy choice.

5.3 Mixed repairs (move both cutoffs)

Many real interventions adjust both thresholds, either explicitly (a regulator sets group targets and allows both standards to move) or implicitly (internal tuning changes a common threshold while also applying a group-specific offset). We therefore define a mixed class anchored at t as any set allowing both movements:
𝒯mix(t) := {(tA, tB): tA ≥ t, tB ≤ t}.
Mixed repairs can be viewed as convex combinations of selectivity and inclusiveness: we partly tighten group A and partly loosen group B. In a non-strategic world this class is attractive because it can reduce the efficiency cost of parity—one need not move a single group’s cutoff as aggressively. In a strategic world, however, the effect is ambiguous ex ante: tightening A potentially increases manipulation by A, while loosening B potentially decreases manipulation by B. Which force dominates depends on local densities near the cutoffs and on heterogeneity in (bg, λg).

Our main theory focuses on the polar cases 𝒯sel and 𝒯inc because they isolate the two canonical implementation logics used in policy debates (“raise standards” versus “lower standards”). Mixed rules then inherit features of both, and we return to them when discussing design implications.
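
As a computational companion to this taxonomy, the truthful-parity member of each polar class is the root of a one-dimensional survival-function equation. A minimal sketch in Python (scipy.optimize.brentq), assuming group A has the higher truthful acceptance at the anchor t so that both roots are bracketed; function names and bracketing endpoints are our own:

```python
from scipy import optimize

def selective_repair(dist_A, dist_B, t, hi=10.0):
    """Raise t_A until S_A(t_A) = S_B(t): the pre-adaptation selective repair."""
    t_A = optimize.brentq(lambda u: dist_A.sf(u) - dist_B.sf(t), t, hi)
    return t_A, t                      # (t_A, t_B) with t_B held at the anchor

def inclusive_repair(dist_A, dist_B, t, lo=-10.0):
    """Lower t_B until S_B(t_B) = S_A(t): the pre-adaptation inclusive repair."""
    t_B = optimize.brentq(lambda u: dist_B.sf(u) - dist_A.sf(t), lo, t)
    return t, t_B                      # (t_A, t_B) with t_A held at the anchor
```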

5.4 What is the repair trying to equalize: pre-adaptation vs. equilibrium parity

A crucial modeling choice is when parity is evaluated. In the simplest compliance narrative, the principal selects t to satisfy a pre-adaptation (truthful) demographic-parity constraint,
Δpre(t) = |SA(tA) − SB(tB)| ≤ ε,
treating the score distribution as fixed. This is the natural target if the principal calibrates using historical data under an earlier policy or if manipulation is ignored in the audit.

In contrast, if parity is assessed on realized deployment outcomes (as it often is in ex post monitoring), the relevant constraint is equilibrium parity,
Δeq(t) = |pA(t) − pB(t)| ≤ ε,
where pg(t) = (1 − λg)Sg(tg) + λgSg(tg − bg). The difference between these constraints is not a technicality: Δeq depends on the strategic parameters (λg, bg), and hence on opacity, access to coaching, and the technology of manipulation. A repair that “solves” parity in the pre-adaptation sense can therefore fail once agents respond, and our taxonomy is designed to ask which repair directions are robust to that feedback.

5.5 Accuracy and welfare objectives with strategic response

Fairness constraints are rarely the principal’s only concern. In our framework, accuracy can be represented by a standard loss ℓ(y, ŷ), where ŷ = ft(ẑ, g) is the decision and y is a latent label linked monotonically to z (e.g., y = 1{z ≥ τg}). The principal’s canonical problem is then
mint 𝔼[ℓ(y, ft(ẑ, g))]  s.t.  Δ(t) ≤ ε,
where Δ is either Δpre (a “static” audit) or Δeq (a “behavior-aware” audit). Because ẑ is endogenous, even the accuracy term depends on t through equilibrium behavior.

Beyond predictive accuracy, strategic classification raises an additional welfare margin that is often central in policy discussions: wasted effort. Manipulation costs cg(Δ) are borne by agents and (in many applications) are socially unproductive expenditures on coaching, paperwork, or gaming. A natural reduced-form welfare criterion therefore augments loss with expected manipulation cost,
mint 𝔼[ℓ(y, ft(ẑ, g))] + κ ∑g ∈ {A, B} 𝔼[cg(Δg*(z; t))]  s.t. fairness,
for some κ ≥ 0 capturing how much the principal (or regulator) values reducing wasteful burden. This objective makes explicit an important practical distinction between repairs: selective repairs can equalize acceptance partly by inducing costly responses near a higher cutoff, whereas inclusive repairs can equalize acceptance partly by removing the need to manipulate for the disadvantaged group. Our subsequent results can be read as identifying when that intuitive welfare advantage of inclusiveness is also a stability advantage.
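
To make the augmented objective concrete, the sketch below estimates the per-group loss-plus-burden term by Monte Carlo, under two illustrative assumptions flagged in the comments: the latent label y = 1{z ≥ τg} mentioned above, and a linear cost cg(Δ) = Δ/bg.

```python
import numpy as np

rng = np.random.default_rng(0)

def group_objective(sample_z, t_g, lam_g, b_g, tau_g, kappa):
    """Monte Carlo estimate of E[0-1 loss] + kappa * E[manipulation cost] for one group.

    sample_z : draws of the base score z from D_g
    tau_g    : latent-label cutoff, y = 1{z >= tau_g}   (illustrative label model)
    Assumes a linear cost c_g(d) = d / b_g, purely for illustration.
    """
    z = np.asarray(sample_z)
    strategic = rng.random(z.shape) < lam_g
    in_band = strategic & (z < t_g) & (z >= t_g - b_g)   # agents who "just cross"
    delta = np.where(in_band, t_g - z, 0.0)
    y = z >= tau_g
    y_hat = (z + delta) >= t_g
    loss = np.mean(y != y_hat)
    burden = np.mean(delta / b_g)                        # E[c_g(Delta*)] under linear cost
    return loss + kappa * burden

# Example usage with an illustrative normal base-score sample.
obj_A = group_objective(rng.normal(1.0, 1.0, 200_000),
                        t_g=0.8, lam_g=0.6, b_g=0.5, tau_g=0.7, kappa=0.5)
```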

We emphasize, finally, what our repair taxonomy does and does not cover. It focuses on threshold adjustments, which are a leading special case of post-processing and are analytically transparent in one-dimensional scores. More complex interventions—retraining the score, providing subsidies that change cg, or changing information structures that shift λg—are clearly relevant in practice, but they operate through different levers. By isolating threshold repairs, we can cleanly ask a narrow question: holding the score technology fixed, which direction of boundary movement is robust to strategic feedback, and which direction is prone to fairness reversal once the policy is deployed?


6. Main theory I: instability of selectivity (constructive lower bound / fairness reversal instances)

Selective repairs are often defended as the “safe” way to satisfy demographic parity: rather than lowering standards for the lower-acceptance group, the principal simply tightens the cutoff for the higher-acceptance group until acceptance rates match in the historical (truthful) data. Our first main result says that this intuition can fail in a strategically stark way. The reason is not subtle: raising tA creates a thicker manipulation margin for group A—a band of agents who are newly rejected but can cheaply move across the cutoff—so the very act of “excluding” marginal A applicants can induce an equilibrium rebound in A’s acceptance.

6.1 The key mechanism: selectivity concentrates incentives exactly where density is high

Fix the anchor t and consider a selective move tA > t with tB = t. Under truthful reporting, acceptance rates are SA(tA) and SB(t), so pre-adaptation parity is achieved by choosing tA such that
SA(tA) = SB(t).
In a non-strategic world, this is essentially the end of the story: once we raise tA enough, the acceptance gap closes.

With manipulation, however, the relevant object for the strategic fraction λA is not the survival at tA but the survival at tA − bA. Lemma 1 implies that the newly pivotal set is precisely the interval
[tA − bA, tA),
whose mass is FA(tA) − FA(tA − bA). If group A has substantial density near tA, then this mass can be large even when bA is modest. The selective repair therefore has two countervailing effects on equilibrium acceptance for group A: it lowers SA(tA) for non-strategic agents but raises acceptance for strategic agents by effectively “moving the bar” down by bA. When bA is larger than bB and/or λA exceeds λB, this second force can dominate in relative terms, producing a fairness reversal: parity holds on paper, but disparity reappears after deployment.

This mechanism is especially relevant in applied domains where the “cost” of manipulation reflects access to coaching, appeals, or documentation, and where such access is uneven. A selective tightening can unintentionally create a premium on those resources near the boundary, precisely where many decisions are made.

6.2 Formal statement: selective pre-parity can imply arbitrarily large equilibrium gaps

We now formalize the above logic by showing that pre-adaptation parity within 𝒯sel(t) does not imply any meaningful bound on Δeq. In fact, for any target tolerance ϵ ≥ 0, we can construct primitives such that a selective repair achieves Δpre = 0 yet violates equilibrium parity by more than ϵ.

Concretely, recall that equilibrium acceptance is
pg(t) = (1 − λg)Sg(tg) + λgSg(tg − bg).
Under the selective restriction (tA, t), the equilibrium gap is
Δeq(tA, t) = |(1 − λA)SA(tA) + λASA(tA − bA) − (1 − λB)SB(t) − λBSB(t − bB)|.
If we pick tA to enforce pre-parity SA(tA) = SB(t), the leading (truthful) terms cancel, leaving the post-adaptation gap driven by differences in strategic uplift:
Δeq(tA, t) = |λA(SA(tA − bA) − SA(tA)) − λB(SB(t − bB) − SB(t))|.
A selective repair is therefore safe only if the incremental mass of A in [tA − bA, tA), weighted by λA, is not much larger than the analogous mass for B at [t − bB, t), weighted by λB. Our result shows that selective repairs provide no general guarantee of this type: one can satisfy pre-parity while making the first term large and the second term small.

6.3 Constructive lower bound: placing mass where selectivity creates a “gaming band”

The simplest way to see the instability is to construct distributions with the following qualitative features:

  1. Pre-parity is achievable by raising tA: group A has higher survival at the baseline anchor t, so there exists tA > t with SA(tA) = SB(t).

  2. High local density for group A near tA: there is substantial probability mass in the band [tA − bA, tA). This ensures a large manipulation-induced uplift SA(tA − bA) − SA(tA).

  3. Low local density for group B near t (or smaller bB, smaller λB): the analogous uplift for B, SB(t − bB) − SB(t), is comparatively small.

A canonical instance is to let DA place a “bump” of density just below the eventual cutoff tA, while letting DB be flatter (or shifted) around t. Because tA is chosen endogenously to satisfy SA(tA) = SB(t), the construction ensures that the truthful acceptances match exactly, even though the local geometry around the thresholds differs sharply.

Using only continuity of densities (no parametric assumptions), we can translate “high mass in the manipulation band” into an explicit lower bound. The uplift for group A satisfies
$$ S_A(t_A-b_A)-S_A(t_A)=\int_{t_A-b_A}^{t_A} f_A(u)\,du \;\ge\; b_A\cdot \inf_{u\in[t_A-b_A,\,t_A]} f_A(u), $$
and similarly
$$ S_B(t-b_B)-S_B(t)=\int_{t-b_B}^{t} f_B(u)\,du. $$
Hence, under pre-parity SA(tA) = SB(t), we obtain the lower bound
$$ \Delta_{\mathrm{eq}}(t_A,t) \;\ge\; \lambda_A \int_{t_A-b_A}^{t_A} f_A(u)\,du \;-\; \lambda_B \int_{t-b_B}^{t} f_B(u)\,du. $$
In our constructive instances we can make the first integral bounded away from zero (by concentrating A-mass near tA) while making the second arbitrarily small (by making B-density small near t, or by taking bB small, or λB low). This yields a fairness reversal of size at least a constant c > 0, and thus violates any prescribed ϵ.
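
A numerical sketch of this construction may be useful. In the Python snippet below, DA is a two-component normal mixture whose narrow second component supplies the bump of mass in the gaming band, DB is a flatter normal, and every parameter value is an illustrative assumption; in a careful instance one tunes the bump so that it lands just below the resulting pre-parity cutoff tA.

```python
from scipy import stats, optimize

# Illustrative primitives (assumptions of this sketch, not calibrated to any data).
w, bump = 0.3, stats.norm(0.9, 0.05)            # narrow A-bump intended to sit inside the gaming band
base_A = stats.norm(1.5, 1.0)
dist_B = stats.norm(0.0, 1.5)                   # comparatively flat B distribution near the anchor
lam_A, lam_B, b_A, b_B = 0.8, 0.2, 0.4, 0.1     # A is more responsive and can move farther
t_anchor = 0.0

def S_A(u):                                     # survival function of the mixture D_A
    return (1 - w) * base_A.sf(u) + w * bump.sf(u)

S_B = dist_B.sf

# Selective pre-parity: raise t_A until S_A(t_A) = S_B(t_anchor).
t_A = optimize.brentq(lambda u: S_A(u) - S_B(t_anchor), t_anchor, 10.0)

# Equilibrium acceptance under the selective repair (t_A, t_anchor).
p_A = (1 - lam_A) * S_A(t_A) + lam_A * S_A(t_A - b_A)
p_B = (1 - lam_B) * S_B(t_anchor) + lam_B * S_B(t_anchor - b_B)

print("pre-adaptation gap :", abs(S_A(t_A) - S_B(t_anchor)))   # zero by construction
print("post-adaptation gap:", abs(p_A - p_B))                  # reopens when A's band is thick
```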

The takeaway is that selective repairs are intrinsically local: they depend on the density of the advantaged group near the new cutoff, and that is exactly where strategic response is most consequential. Put differently, selectivity “moves the goalpost” into a region where manipulation is both pivotal and (by assumption) feasible up to bA.

6.4 Interpretation and limitations: when does the pathology matter?

Two clarifications are important.

First, the result is not that selectivity always fails, but that it is not equilibrium-stable as a class: absent strong additional restrictions tying down (DA, DB) and (bg, λg), pre-adaptation parity via tA provides no uniform control of post-adaptation disparity. In practice, this means that an institution that audits fairness using historical, non-strategic outcomes can be surprised by a systematic acceptance rebound for the group whose cutoff it tightened, especially when that group has better access to manipulation technologies or information.

Second, our construction highlights a policy-relevant asymmetry: selective repairs place the burden of adjustment on the group that is allegedly advantaged, but the capacity to respond strategically may be even more skewed toward that same group (higher bA, higher λA). In such environments, selective repairs can inadvertently reward strategic sophistication rather than equalize opportunity.

These observations motivate the next section, where we ask whether the opposite direction—inclusive repairs—admits stability guarantees under economically meaningful regularity conditions (smooth densities and MLR-type orderings) and under modest heterogeneity in manipulation ability and information.


7. Main theory II: stability of inclusiveness (uniform upper bounds under MLR + single-crossing)

The previous section showed that tightening the advantaged group’s cutoff can be brittle because it creates a thick set of newly rejected-but-nearby agents who have strong incentives to manipulate. We now ask whether moving in the opposite direction—lowering the disadvantaged group’s cutoff—admits any robust control of post-deployment disparities. The answer is yes: while inclusiveness is not a free lunch (it can sacrifice accuracy, and it can still be distorted by manipulation), it is the direction in which we can obtain a uniform equilibrium stability guarantee under economically standard regularity.

7.1 Why inclusive repairs change the game: shifting the cutoff away from the “knife edge”

Fix an anchor threshold t for group A, and consider the inclusive family
𝒯inc(t) := {(t, tB) : tB ≤ t},
so the principal equalizes acceptance by lowering B’s bar rather than raising A’s. Under truthful reporting, pre-parity is achieved by selecting tB such that
SA(t) = SB(tB).
In the non-strategic world, this is exactly the demographic-parity repair.

With strategic response, the key difference from selectivity is geometric: inclusiveness does not “pile up” A-mass just below a new and potentially high-density cutoff. Instead, it moves B’s cutoff left, typically into a region where B already has more mass. Intuitively, this does two stabilizing things at once:

  1. It reduces the marginal importance of manipulation for closing the gap. Since parity is achieved by directly increasing B’s baseline acceptance, the equilibrium correction due to manipulation is a perturbation around an already-equalized base rate.

  2. It aligns incentives with the intended direction of repair. Any strategic uplift that B experiences (from agents just below tB moving up) tends to reinforce the inclusive adjustment rather than undo it.

Of course, if group A can manipulate substantially more (bA ≫ bB) or is much better informed (λA ≫ λB), then even an inclusive rule can be pulled away from parity. The point is that—unlike selectivity—this pull admits a clean bound that depends only on smoothness of the score distributions and the heterogeneity in (bg, λg).

7.2 A uniform post-adaptation bound

Under Lemma 1, equilibrium acceptance in group g is
pg(t) = (1 − λg)Sg(tg) + λgSg(tg − bg),
so for an inclusive repair (t, tB) the equilibrium gap can be written as
pA − pB = [(1 − λA)SA(t) + λASA(t − bA)] − [(1 − λB)SB(tB) + λBSB(tB − bB)].
Imposing pre-parity SA(t) = SB(tB) cancels the leading (truthful) terms and leaves only the difference in “strategic uplift”:
pA − pB = λA(SA(t − bA) − SA(t)) − λB(SB(tB − bB) − SB(tB)).
Thus, inclusive repairs are stable precisely when the manipulation-induced increments
Sg(⋅ − bg) − Sg(⋅)  do not differ too much across groups after weighting by λg.

To control these increments uniformly, we use a smoothness condition encoded by the density bound
L := supu max{fA(u), fB(u)} < ∞.
This implies a Lipschitz property of survival functions: for any δ ≥ 0,
$$ \lvert S_g(u-\delta)-S_g(u)\rvert=\int_{u-\delta}^{u} f_g(v)\,dv \;\le\; L\,\delta. $$
The remaining issue is that the two increments are evaluated at different thresholds (t for A and tB for B). Here the monotone-likelihood-ratio / single-crossing structure plays its role: when A stochastically dominates B in the usual sense (the common empirical case motivating demographic parity repairs), the equality SA(t) = SB(tB) forces tB ≤ t; lowering B’s cutoff moves B to a region where small left-shifts (of size bB) typically pick up at least as much mass as the corresponding shift at the higher cutoff. This makes it conservative to treat B’s strategic uplift at (tB, bB) as the natural benchmark, and to attribute instability primarily to (i) A’s extra manipulation range bA − bB and (ii) differential strategic responsiveness |λA − λB|.

Formally, one can decompose A’s uplift into a “common-width” part plus an “excess-width” part and then apply the Lipschitz bound:
$$ S_A(t-b_A)-S_A(t) = \underbrace{\big(S_A(t-b_A)-S_A(t-b_B)\big)}_{\le L(b_A-b_B)} +\underbrace{\big(S_A(t-b_B)-S_A(t)\big)}_{\text{common-width uplift}}. $$
The first term is controlled directly by L(bA − bB). Under the single-crossing/MLR ordering and the pre-parity alignment $S_A(t) = S_B(t_B)$, the second (common-width) term is not larger than the corresponding uplift available to group B at (tB, bB). Combining these steps and applying the triangle inequality yields the uniform stability bound reported in Proposition 3:
$$ \Delta_{\mathrm{eq}}(t, t_B) \;\le\; L\cdot\big(\lambda_A (b_A - b_B) + |\lambda_A - \lambda_B|\, b_B\big). $$
Two features are worth emphasizing. First, the bound is distribution-free beyond smoothness: the entire effect of strategic response is summarized by the local density cap L and the heterogeneity in budgets and information. Second, the bound is first-order in heterogeneity: if bA ≈ bB and λA ≈ λB, then the equilibrium gap remains small even if manipulation itself is common.
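As a concrete check of this logic, the sketch below evaluates the closed-form equilibrium gap and the right-hand side of Proposition 3 for Gaussian score distributions. All numbers (means, budgets, information rates, the anchor threshold) are illustrative choices of ours, not calibrated quantities.

```python
# Minimal numerical check of the Proposition 3 bound; parameters are illustrative.
from math import erfc, sqrt, pi

def surv(u, mu, sigma):
    """Gaussian survival function S(u) = P(z >= u)."""
    return 0.5 * erfc((u - mu) / (sigma * sqrt(2.0)))

# Assumed group primitives: A is advantaged, better informed, larger budget.
muA, sA, muB, sB = 0.5, 1.0, 0.0, 1.0
lamA, lamB = 0.6, 0.4
bA, bB = 0.30, 0.20

t = 1.0                     # anchor threshold for group A
tB = t - (muA - muB)        # inclusive pre-parity repair (equal variances: mean shift)

def p(tg, mu, sigma, lam, b):
    """Equilibrium acceptance from the Lemma 1 closed form."""
    return (1 - lam) * surv(tg, mu, sigma) + lam * surv(tg - b, mu, sigma)

gap = abs(p(t, muA, sA, lamA, bA) - p(tB, muB, sB, lamB, bB))
L = 1.0 / (min(sA, sB) * sqrt(2 * pi))   # density cap for Gaussian scores
bound = L * (lamA * (bA - bB) + abs(lamA - lamB) * bB)
print(f"equilibrium gap = {gap:.4f}, Proposition-3 bound = {bound:.4f}")
```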

7.3 Interpretation: what inclusiveness buys (and what it does not)

Economically, the bound says that inclusive repairs are stable when manipulation technologies and access to information are not too uneven across groups. In applied terms, bg can reflect access to coaching, documentation, legal support, or optimization effort, while λg reflects whether the decision rule is learnable through social networks, platform feedback, or repeated interaction. Inclusiveness does not eliminate these channels—but it prevents them from being amplified by the repair itself.

The same expression also clarifies when inclusive repairs can fail. If the advantaged group both (i) can manipulate more and (ii) is more likely to be strategic, then the right-hand side can be large, so parity can be materially violated post-deployment even though it held in the audit. Likewise, if densities are extremely steep near the relevant thresholds (large L), then even small budget differences translate into sizable acceptance differences. This is a real limitation: in domains with discretized scores or sharp bunching, one should expect larger strategic distortions than the smooth model predicts.

Finally, inclusiveness interacts with accuracy in a qualitatively different way than selectivity. Lowering tB mechanically increases acceptance for B, which may increase false positives if labels are correlated with z. The present section therefore should not be read as endorsing inclusive repairs unconditionally, but rather as isolating a stability advantage: if demographic parity is a binding policy constraint, then repairing by inclusion provides a tractable and robust handle on post-adaptation outcomes.

7.4 From theory to implementation: why inclusiveness enables computation

A further reason inclusiveness is attractive is algorithmic. Because pg is monotone in its own threshold, the equilibrium parity condition pA(t) = pB(tB) becomes a one-dimensional root-finding problem in tB once t is fixed. Under continuity and strict monotonicity of FB, the solution is unique (Proposition 4), which means equilibrium-fair inclusive thresholds can be computed reliably and audited transparently. This computational tractability is not incidental: it is the operational counterpart of the stability logic above, and it is what we turn to next.


8. Algorithmic implications: computing equilibrium-fair inclusive thresholds; statistical estimation from samples; when numerical methods are needed (multi-dimensional extensions).


Our stability result for inclusive repairs is not only conceptual; it also delivers a practical computational dividend. Once we accept that deployed rules elicit strategic response, the relevant fairness constraint is equilibrium demographic parity, pA(tA) = pB(tB), not its pre-adaptation analogue. The question then becomes operational: how do we set thresholds to satisfy equilibrium parity using only data and a small set of behavioral primitives?

8.1 Equilibrium parity under inclusive rules reduces to one-dimensional root finding

Under Lemma 1, acceptance in group g takes the closed form
$$ p_g(t_g) = (1-\lambda_g)\,S_g(t_g) + \lambda_g\,S_g(t_g - b_g), $$
where $S_g(u) = 1 - F_g(u)$. For inclusive rules we fix tA = t and search over tB ≤ t. Equilibrium demographic parity is the scalar equation
$$ p_B(t_B) = p_A(t). $$
The key implementation fact is monotonicity: since SB(⋅) is decreasing and bB ≥ 0, the mapping tB ↦ pB(tB) is continuous and strictly decreasing whenever FB is continuous and strictly increasing. Hence there is a unique solution tB(t) (Proposition 4), and we can compute it by bisection.

Concretely, define the function
$$ \phi(t_B;\, t) := p_B(t_B) - p_A(t). $$
Then ϕ(⋅; t) is strictly decreasing in tB. We bracket the root on an interval $[\,\underline t_B,\ t\,]$ where $\underline t_B$ is low enough that $p_B(\underline t_B)\ge p_A(t)$. In practice we can take $\underline t_B$ as a small quantile of observed scores in group B, or simply decrease it until the inequality holds. Bisection then finds tB such that |ϕ(tB; t)| ≤ η in O(log (1/η)) iterations.
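The following is a minimal sketch of this bisection, assuming the acceptance curves are available as Python callables (for instance the Lemma 1 closed form with plug-in primitives); the function name and signature are ours.

```python
def solve_inclusive_threshold(p_A, p_B, t, tB_low, eta=1e-6, max_iter=200):
    """Bisection for the inclusive repair: find t_B <= t with p_B(t_B) ~= p_A(t).

    Assumes p_B is continuous and strictly decreasing in its threshold, and that
    the bracket satisfies p_B(tB_low) >= p_A(t) >= p_B(t).
    """
    target = p_A(t)
    lo, hi = tB_low, t
    if not (p_B(lo) >= target >= p_B(hi)):
        raise ValueError("bracket [tB_low, t] does not contain a root")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        val = p_B(mid)
        if abs(val - target) <= eta:
            return mid
        if val > target:
            lo = mid   # acceptance still above target: raise the cutoff
        else:
            hi = mid   # acceptance below target: lower the cutoff
    return 0.5 * (lo + hi)
```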

This is the algorithmic counterpart of the theory: inclusive restrictions turn an equilibrium-fairness problem into a reliable, auditable one-dimensional search, rather than a brittle multi-parameter tuning exercise.

8.2 Data-driven computation: plug-in estimation of Sg and pg

In most applications Fg is unknown and must be estimated from samples $\{z_i : g_i = g\}$. The most direct approach is plug-in: let $\hat F_g$ be the empirical CDF and $\hat S_g(u) = 1 - \hat F_g(u)$. Then define the estimated equilibrium acceptance curve
$$ \hat p_g(t_g) = (1-\lambda_g)\,\hat S_g(t_g) + \lambda_g\,\hat S_g(t_g - b_g), $$
and solve $\hat p_B(t_B) = \hat p_A(t)$ by bisection over tB.
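A minimal plug-in sketch (numpy-based, with helper names of our own choosing) that builds $\hat S_g$ from samples and returns the estimated acceptance curve $\hat p_g$:

```python
import numpy as np

def empirical_survival(samples):
    """hat-S(u) = fraction of observed scores with z >= u (a step function)."""
    z = np.sort(np.asarray(samples, dtype=float))
    return lambda u: 1.0 - np.searchsorted(z, u, side="left") / z.size

def estimated_acceptance(samples, lam, b):
    """hat-p_g(t_g) = (1 - lambda_g) * hat-S_g(t_g) + lambda_g * hat-S_g(t_g - b_g)."""
    S_hat = empirical_survival(samples)
    return lambda tg: (1.0 - lam) * S_hat(tg) + lam * S_hat(tg - b)
```

Solving $\hat p_B(t_B) = \hat p_A(t)$ then reduces to the same bisection as above, subject to the two practical points below.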

Two practical points matter.

(i) Discreteness and non-uniqueness in finite samples. Empirical survival functions are step functions, so $\hat p_B(t_B)$ can be flat over intervals. There may be a set of solutions tB that achieve the same $\hat p_B$. A transparent tie-breaking rule is to choose the largest such tB (minimizing inclusiveness subject to the constraint), or the smallest (maximizing inclusiveness). Either choice should be pre-registered because it affects downstream accuracy.

(ii) Interpolation / smoothing. If thresholds are required to lie on a grid (credit scores, test scores), we can restrict bisection to that grid. If not, we can linearly interpolate $\hat S_g$ between order statistics, which restores an “effectively continuous” monotone map and avoids numerical artifacts. Kernel smoothing can also be used, but it introduces bandwidth choices that should be justified and stress-tested.

8.3 Statistical guarantees: how estimation error translates into fairness error

A useful feature of the plug-in approach is that fairness error can be bounded in a simple way. By the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality, for each group g,
$$ \Pr\Big(\sup_u \big|\hat F_g(u) - F_g(u)\big| > \epsilon\Big) \;\le\; 2e^{-2 n_g \epsilon^2}, $$
so with high probability $\sup_u |\hat S_g(u) - S_g(u)| \le \epsilon$ as well. Because pg is an affine combination of two survival evaluations, we get the uniform bound
$$ \sup_{t_g} \big|\hat p_g(t_g) - p_g(t_g)\big| \;\le\; \epsilon. $$
Thus, if we solve $|\hat p_B(t_B) - \hat p_A(t)| \le \eta$, then (with high probability)
$$ \big|p_B(t_B) - p_A(t)\big| \;\le\; 2\epsilon + \eta. $$
This gives an implementation-ready recipe: pick a target tolerance η for numerical solving, choose sample sizes (or confidence levels) so that ϵ is small enough, and report the implied fairness certificate 2ϵ + η.
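For example, the certificate can be reported directly from the DKW inversion; the sample sizes and confidence level below are invented for illustration only.

```python
# Back-of-envelope fairness certificate from the DKW inequality:
# with probability >= 1 - delta per group, sup_u |hat-F_g - F_g| <= sqrt(log(2/delta) / (2 n_g)).
from math import log, sqrt

def dkw_epsilon(n, delta=0.05):
    return sqrt(log(2.0 / delta) / (2.0 * n))

nA, nB, eta = 20_000, 5_000, 1e-3          # assumed sample sizes and solver tolerance
eps = max(dkw_epsilon(nA), dkw_epsilon(nB))  # conservative common epsilon
print(f"fairness certificate: |p_A - p_B| <= {2 * eps + eta:.4f} with high probability")
```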

What this does not cover are errors in (λg, bg). In practice, these parameters may be estimated from past deployments, randomized audits, or quasi-experimental variation (e.g., sudden policy changes that reveal strategic shifts). A conservative alternative is to specify uncertainty sets $\lambda_g\in[\underline\lambda_g,\overline\lambda_g]$, $b_g\in[\underline b_g,\overline b_g]$ and enforce robust parity:
$$ \sup_{\lambda \in \Lambda,\; b \in B}\; \big|p_A(t) - p_B(t_B)\big| \;\le\; \varepsilon. $$
Under monotonicity, this robustification can still be handled by bisection by replacing pg with its worst-case (upper/lower) envelope over the uncertainty set—at the cost of more inclusiveness.
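A sketch of that robust check, relying only on the fact that the closed-form p_g is nondecreasing in both λg and bg (since Sg is decreasing), so box uncertainty sets attain their extremes at corners; the helper names are ours.

```python
def p_eq(S, tg, lam, b):
    """Lemma-1 closed form for a given survival function S."""
    return (1.0 - lam) * S(tg) + lam * S(tg - b)

def robust_parity_gap(SA, SB, t, tB, lamA_box, lamB_box, bA_box, bB_box):
    """Worst-case |p_A(t) - p_B(t_B)| over independent boxes (lo, hi) for lambda_g and b_g."""
    pA_hi = p_eq(SA, t,  lamA_box[1], bA_box[1])   # p_g is maximized at the upper corner
    pA_lo = p_eq(SA, t,  lamA_box[0], bA_box[0])   # and minimized at the lower corner
    pB_hi = p_eq(SB, tB, lamB_box[1], bB_box[1])
    pB_lo = p_eq(SB, tB, lamB_box[0], bB_box[0])
    return max(pA_hi - pB_lo, pB_hi - pA_lo, 0.0)
```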

8.4 Choosing the anchor t: fairness-constrained optimization remains tractable

So far we treated t = tA as fixed. If the principal also wants to optimize an accuracy proxy (or any monotone objective), inclusive computation still helps: we can reduce the problem to a single outer search over t. For each candidate t, compute the unique tB(t) that equalizes equilibrium acceptance; then evaluate the objective at (t, tB(t)). This turns a two-threshold constrained optimization into an unconstrained one-dimensional search. Even simple methods (grid search, golden-section) are typically adequate, and the fairness constraint is satisfied by construction rather than by penalty tuning.
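Schematically, the outer loop looks as follows, reusing the bisection sketch from Section 8.1; `objective` stands for whatever accuracy proxy the principal maximizes, and the names are ours.

```python
def optimize_anchor(objective, p_A, p_B, t_grid, tB_low, eta=1e-6):
    """Grid search over the anchor t; the inner bisection enforces equilibrium parity."""
    best = None
    for t in t_grid:
        tB = solve_inclusive_threshold(p_A, p_B, t, tB_low, eta)  # Section 8.1 sketch
        value = objective(t, tB)
        if best is None or value > best[0]:
            best = (value, t, tB)
    return best  # (objective value, anchor t, matched t_B(t))
```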

8.5 When numerical methods are genuinely necessary: multidimensional extensions

The computational simplicity above relies on the one-dimensional score and threshold structure. Several natural extensions break this, and it is helpful to be explicit about where bisection stops working.

  1. Multiple features and non-threshold classifiers. If the score arises from an underlying model s(x) and agents manipulate features x rather than the score directly, then strategic response changes the distribution of scores endogenously. Acceptance probabilities may no longer have the closed form Sg(tg − bg), and t ↦ pg(t) can become non-smooth or even non-monotone. Computing equilibrium then becomes a fixed-point problem, often handled by simulation (agent best responses) combined with numerical root finding on fairness constraints.

  2. Equalized odds (label-conditional constraints). If we require parity separately for y = 0 and y = 1, we effectively impose two equations (one per label) and typically introduce additional degrees of freedom (e.g., label-conditional thresholds or randomized decisions). This is inherently multi-dimensional; one uses coupled bisection on monotone maps when available, or Newton / quasi-Newton methods otherwise.

  3. Discrete/bunched score distributions. In heavily discretized environments, small changes in thresholds can cause large jumps in acceptance. Then the “unique root” logic weakens; randomized thresholds (accepting a fraction at the cutoff) can restore exact feasibility, but require careful governance because randomization is itself a policy choice.

These caveats motivate our experimental section: even when theory points to inclusiveness as stable and computable, numerical behavior depends on how sharply scores bunch, how heterogeneous (bg, λg) are, and whether the environment is truly one-dimensional. In the simulations that follow, we will reproduce fairness reversal under selective repairs, and then show how equilibrium-fair inclusive computation behaves under controlled perturbations of information and manipulation gaps.


9. Experiments/simulations: reproduce fairness reversal; demonstrate equilibrium-stable designs; sensitivity to cost and information gaps; optional manipulation-graph case study.


Our theoretical results are intentionally stylized, so we complement them with simulations designed to (i) reproduce the “fairness reversal” phenomenon under selective repairs, (ii) verify that inclusive repairs computed for equilibrium parity behave stably, and (iii) map how sensitive post-adaptation disparities are to manipulation-ability and information gaps. Throughout, we treat the experiments as mechanism checks: the goal is not to fit a particular application, but to show that the comparative statics implied by Lemma 1 and the propositions manifest transparently in finite-sample environments with estimation noise and discretization.

9.1 Baseline simulation environment and validation of the closed-form equilibrium

We simulate two groups g ∈ {A, B} with base scores z ∼ Dg. Unless otherwise stated, we take Dg to be Gaussian with potentially different means and variances (e.g., $z \mid A \sim \mathcal{N}(\mu_A, \sigma_A^2)$, $z \mid B \sim \mathcal{N}(\mu_B, \sigma_B^2)$) and truncate to a plausible score range to mimic operational scorecards. For costs we use a parsimonious class consistent with our budget parameterization, such as
$$ c_g(\Delta)=\left(\frac{\Delta}{b_g}\right)^\kappa,\qquad \kappa\ge 1, $$
so that cg(bg) = 1 and bg directly indexes the maximal “worthwhile” manipulation distance. When κ = 1, costs are linear and the best response is exactly the threshold-crossing behavior described earlier; for κ > 1, the same qualitative structure holds (manipulate only if close enough), but the boundary is slightly softened. We implement the exact Lemma 1 behavior in the baseline and use κ > 1 as a robustness check.

Information is captured by λg: independently for each agent, with probability λg the agent best-responds to the deployed rule, otherwise the agent reports $\hat z = z$. Given thresholds t = (tA, tB), we compute equilibrium acceptance in two ways: (a) directly by simulation (drawing many agents and applying the behavioral rule), and (b) via the closed form
$$ p_g(t) = (1-\lambda_g)\,S_g(t_g) + \lambda_g\,S_g(t_g - b_g). $$
As a sanity check, these coincide up to Monte Carlo error, and the discrepancy shrinks at the usual $1/\sqrt{n}$ rate. This validation is useful because it makes clear that the reversals we highlight are not numerical artifacts but mechanical consequences of mass near the cutoff combined with heterogeneous bg and λg.
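A compact version of check (a) versus (b), with κ = 1 (linear costs) so that informed agents within bg of the cutoff cross exactly; the Gaussian parameters are placeholders rather than the paper's calibration, and truncation is omitted for brevity.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)

def simulate_acceptance(mu, sigma, lam, b, t, n=200_000):
    """Monte Carlo equilibrium acceptance under the Lemma-1 behavioral rule."""
    z = rng.normal(mu, sigma, size=n)
    informed = rng.random(n) < lam
    # Informed agents within b of the cutoff manipulate just enough to cross it.
    z_hat = np.where(informed & (z < t) & (z >= t - b), t, z)
    return float(np.mean(z_hat >= t))

def closed_form(mu, sigma, lam, b, t):
    S = lambda u: 0.5 * erfc((u - mu) / (sigma * sqrt(2)))
    return (1 - lam) * S(t) + lam * S(t - b)

mu, sigma, lam, b, t = 0.0, 1.0, 0.5, 0.25, 1.0
print(simulate_acceptance(mu, sigma, lam, b, t), closed_form(mu, sigma, lam, b, t))
```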

9.2 Reproducing fairness reversal under selective repairs

To isolate the mechanism in Proposition 2, we impose a selective repair: fix a baseline threshold t and set tB = t while raising tA > t to equalize pre-adaptation acceptance, i.e.,
$$ S_A(t_A) = S_B(t), $$
so that Δpre(tA, t) = 0 under truthful reporting. Intuitively, this enforces parity by “holding back” group A at the boundary.

We then deploy the rule and allow manipulation. When bA > bB and/or λA > λB, the post-adaptation acceptance in group A depends not only on SA(tA) but on the additional mass in [tA − bA, tA) that can profitably cross. The simulations reproduce exactly this logic: even when Δpre = 0 by construction, the equilibrium gap
$$ \Delta_{\mathrm{eq}}(t_A, t) = \big|p_A(t_A) - p_B(t)\big| $$
becomes strictly positive and can be large when there is substantial density near tA. In a representative calibration with moderate density around the cutoff, we observe a sharp increase from essentially zero pre-gap to a visibly nontrivial equilibrium disparity driven by the term λA(SA(tA − bA) − SA(tA)), which has no analogue in the pre-adaptation constraint.

Two comparative patterns are especially robust across distributions. First, holding t fixed, increasing the selectivity intensity tA − t has an ambiguous effect, consistent with our discussion: initially the gap can worsen as more A agents are placed just below a higher bar (creating a larger manipulable band), while very large increases eventually reduce acceptance once the local density thins. Second, the effect is highly localized: when we shift the cutoff to a region where fA(⋅) is small, the reversal attenuates, whereas placing the cutoff near a dense region (a “bunching” point in a discretized score) amplifies the reversal.
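The mechanism can be reproduced in a few lines; the calibration below is a stylized stand-in for the experiments, not the exact settings reported in the figures.

```python
# Selective repair under strategic response: zero pre-adaptation gap, positive equilibrium gap.
from math import erfc, sqrt

S = lambda u, mu, sig: 0.5 * erfc((u - mu) / (sig * sqrt(2)))   # Gaussian survival

muA, muB, sig = 0.5, 0.0, 1.0        # assumed score distributions
lamA, lamB = 0.7, 0.3                # group A is better informed
bA, bB = 0.4, 0.1                    # and can manipulate more
t = 0.8                              # baseline cutoff, t_B = t

# Selective repair: raise t_A so pre-adaptation acceptance matches group B.
# With equal variances, S_A(t_A) = S_B(t) reduces to t_A = t + (muA - muB).
tA = t + (muA - muB)
pre_gap = abs(S(tA, muA, sig) - S(t, muB, sig))

p = lambda tg, mu, lam, b: (1 - lam) * S(tg, mu, sig) + lam * S(tg - b, mu, sig)
eq_gap = abs(p(tA, muA, lamA, bA) - p(t, muB, lamB, bB))
print(f"pre-adaptation gap = {pre_gap:.4f}, equilibrium gap = {eq_gap:.4f}")
```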

9.3 Demonstrating equilibrium-stable inclusive designs

Next we implement inclusive repair computation as an operational procedure: fix an anchor tA = t and choose tB ≤ t to solve the equilibrium parity equation
$$ p_B(t_B) = p_A(t). $$
In the oracle case (known Fg), bisection on $\phi(t_B; t) = p_B(t_B) - p_A(t)$ converges rapidly and yields a unique tB(t) whenever FB is continuous and strictly increasing. In the data-driven case, we replace Sg by the empirical $\hat S_g$ and solve $\hat p_B(t_B) = \hat p_A(t)$ on a grid of permissible thresholds.

The key experimental finding is not merely that we can hit parity by construction—this is tautological—but that the solution is stable under modest misspecification and sampling noise. When we perturb (bg, λg) slightly away from the values used to compute tB, the induced parity error tracks the smooth bound from Proposition 3: the deviation scales approximately with λA(bA − bB) and |λA − λB|bB, and it is amplified when the score density near the relevant thresholds is high. By contrast, when we compute a pre-adaptation parity threshold pair (ignoring manipulation) and then evaluate it under strategic response, the resulting gap can move in either direction and is often substantially larger for the same perturbation size.

We also stress-test discretization. When scores are rounded (e.g., integer scores), exact feasibility may require randomization at the cutoff. Implementing the simplest tie-breaking convention—choosing the largest tB that weakly satisfies $\hat p_B(t_B) \ge \hat p_A(t)$—produces near-parity outcomes, while explicit randomization at a single boundary point recovers exact parity at the cost of additional policy complexity. The general message is that inclusiveness maintains tractability even when the data generating process is “messy” in the ways real score systems often are.

9.4 Sensitivity maps: separating manipulation ability from information

To visualize how behavioral heterogeneity drives outcomes, we run grid experiments over (bA − bB, λA − λB) holding fixed a baseline t and comparable score distributions. For each grid point we compute: (i) a selective repair tA that equalizes pre-adaptation acceptance with tB = t, and (ii) an inclusive equilibrium-fair tB(t) with tA = t. We then plot Δeq under each design.

The resulting “phase diagram” has an interpretable structure. Under selective repair, Δeq increases roughly linearly in both gaps for small heterogeneity, with curvature in regions where density changes quickly around the cutoff. Under inclusive equilibrium-fair design, the level of Δeq is near zero by construction when parameters are correct, and the sensitivity to misspecification concentrates in the same directions predicted by the bound: the design is robust when groups have similar bg and λg, and fragility emerges when one group is both more capable and more informed.

9.5 Optional case study: endogenizing λg via a manipulation graph

Finally, we illustrate one way λg can arise endogenously: information diffuses through a social or professional network. We generate a random graph within each group (e.g., Erdős–Rényi with edge probability πg) and seed a small fraction of “initially informed” agents. Information spreads for a fixed number of rounds, and λg is the realized informed fraction at deployment. By varying πg and the seed rate, we obtain environments where λA ≥ λB emerges without assuming it ex ante.
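One simple sketch of this diffusion step (Erdős–Rényi within each group, a fixed number of broadcast rounds); all parameter values are illustrative and the helper name is ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def realized_lambda(n, edge_prob, seed_frac, rounds):
    """Fraction of agents informed after a fixed number of diffusion rounds."""
    adj = rng.random((n, n)) < edge_prob
    adj = np.triu(adj, 1)
    A = (adj | adj.T).astype(np.int64)            # symmetric adjacency, no self-loops
    informed = rng.random(n) < seed_frac          # initially informed seeds
    for _ in range(rounds):
        reached = A @ informed.astype(np.int64)   # learn from any informed neighbor
        informed = informed | (reached > 0)
    return float(informed.mean())

lam_A = realized_lambda(n=2_000, edge_prob=0.004, seed_frac=0.02, rounds=3)
lam_B = realized_lambda(n=2_000, edge_prob=0.001, seed_frac=0.02, rounds=3)
print(f"realized lambda_A = {lam_A:.2f}, lambda_B = {lam_B:.2f}")
```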

When we plug the realized λg into the equilibrium-fair inclusive computation, parity is restored as expected; but when we compute thresholds using anticipated λg and then allow diffusion to increase λA post hoc, we observe predictable parity drift. This case study clarifies a policy-relevant point: interventions that change transparency, peer learning, coaching markets, or platform “tips” effectively shift λg, and fairness audits that ignore these dynamics can systematically misstate deployed disparities.

Taken together, these simulations reinforce the central practical lesson: enforcing fairness pre-adaptation can be misleading precisely in the environments where strategic behavior is most plausible, while inclusive, equilibrium-aware design remains computationally simple and empirically stable whenever behavioral primitives are not too asymmetric.


10. Discussion and extensions: equalized odds; multi-dimensional scores; constraints on using group-conditional thresholds; policy implications.


Our core model is intentionally spare: a one-dimensional score, a binary accept/reject decision, and a reduced-form way to capture strategic response through (bg, λg). The benefit of this parsimony is that it makes the “equilibrium vs. pre-adaptation” distinction transparent, and it yields closed-form objects that can be stress-tested empirically. The cost, of course, is that real deployments raise additional desiderata (e.g., equalized odds), richer action spaces (multi-dimensional manipulation), and institutional constraints (limits on group-conditional rules). We briefly discuss how the logic extends, and where it does not.

10.1 From demographic parity to equalized odds (and why equilibrium matters even more)

Demographic parity is a natural first target when the decision is allocative and the score is an omnibus measure. In settings where labels y are meaningful (e.g., repayment, recidivism), practitioners often prefer equalized odds, which requires acceptance rates to match conditional on the label:
$$ \Pr\big[f_t(\hat z, g) = 1 \mid y = 1,\, g = A\big] \;=\; \Pr\big[f_t(\hat z, g) = 1 \mid y = 1,\, g = B\big], $$
and analogously for y = 0 (equal false-positive rates).

The conceptual extension is straightforward: we posit conditional score distributions $z \sim D_{g,y}$ with CDF $F_{g,y}$ and survival $S_{g,y}(u) = 1 - F_{g,y}(u)$. Under the same threshold-crossing behavior, the equilibrium acceptance probability conditional on (g, y) takes the same mixture form as in Lemma 1:
$$ p_{g,y}(t) = (1-\lambda_g)\,S_{g,y}(t_g) + \lambda_g\,S_{g,y}(t_g - b_g), $$
where we have kept λg and bg group-specific (not label-specific) for simplicity. Equalized odds on equilibrium outcomes is then the pair of constraints $p_{A,1} = p_{B,1}$ and $p_{A,0} = p_{B,0}$.
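The pair of constraints can be checked by evaluating the same closed form on label-conditional survival functions; the Gaussian conditionals and parameter values below are assumptions for illustration only.

```python
# Equilibrium equalized-odds check on label-conditional survival functions (illustrative).
from math import erfc, sqrt

def S_gauss(u, mu, sigma):
    return 0.5 * erfc((u - mu) / (sigma * sqrt(2)))

def p_eq(S_gy, tg, lam, b):
    """p_{g,y}(t) = (1 - lambda_g) S_{g,y}(t_g) + lambda_g S_{g,y}(t_g - b_g)."""
    return (1 - lam) * S_gy(tg) + lam * S_gy(tg - b)

# Assumed label-conditional score distributions: y = 1 shifts the mean upward.
S_A1 = lambda u: S_gauss(u, 1.0, 1.0);  S_A0 = lambda u: S_gauss(u, 0.0, 1.0)
S_B1 = lambda u: S_gauss(u, 0.8, 1.0);  S_B0 = lambda u: S_gauss(u, -0.2, 1.0)

tA, tB, lamA, lamB, bA, bB = 1.0, 0.8, 0.5, 0.4, 0.3, 0.2
tpr_gap = p_eq(S_A1, tA, lamA, bA) - p_eq(S_B1, tB, lamB, bB)
fpr_gap = p_eq(S_A0, tA, lamA, bA) - p_eq(S_B0, tB, lamB, bB)
print(f"equilibrium TPR gap = {tpr_gap:.3f}, FPR gap = {fpr_gap:.3f}")
```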

Two points are worth emphasizing. First, the feasibility and stability issues become sharper. Equalized odds imposes two moment-matching conditions, and with a single scalar instrument per group (one threshold tg) we may not be able to satisfy both without randomization at the boundary or without moving beyond threshold rules. This is not a technicality: even absent strategic behavior, deterministic thresholds generally cannot equalize both TPR and FPR unless the conditional distributions have a special alignment. Strategic behavior tightens this further because the “effective” acceptance is evaluated at both tg and tg − bg, so local shape differences in both Dg, 1 and Dg, 0 matter.

Second, equilibrium awareness is arguably more important under equalized odds than under demographic parity. When agents manipulate, they mechanically change the mapping from latent ability z to the observed $\hat z$. If the label is linked to z (as in monotone models $y = \mathbf{1}\{z \ge \tau_g\}$ or probabilistic links), then post-manipulation acceptance no longer corresponds to the same conditional error rates as under truthful reporting. Put differently, a policy tuned to equalize pre-adaptation TPR/FPR can drift in either direction after deployment, and this drift is mediated by exactly the same behavioral gaps (bA − bB, λA − λB) highlighted in our demographic-parity analysis. In applications where equalized odds is motivated by error-rate equity, that is precisely the object we should evaluate after strategic response.

10.2 Multi-dimensional scores and feature manipulation

Real score systems are rarely one-dimensional; even when a platform produces a scalar score, it is typically a function of a feature vector $x \in \mathbb{R}^d$ (income, utilization, test preparation, documentation quality, etc.). Extending the model to d > 1 raises two separate questions: (i) what is the agent’s action space (which features can be changed, and at what cost), and (ii) what is the principal’s policy class (linear rules, trees, neural scores, hand-built rubrics).

A tractable extension that preserves our economic logic is to assume the classifier is a linear threshold $f(x, g) = \mathbf{1}\{w^\top x \ge t_g\}$ and that an informed agent can choose a manipulation vector δ to maximize acceptance minus cost, with group-dependent costs cg(δ). When costs are convex and depend on δ through a norm (e.g., quadratic effort or a weighted $\ell_1$ cost capturing “hard-to-change” coordinates), the agent’s best response is to move in the direction that most efficiently increases $w^\top x$, i.e., along the normal vector w scaled by the inverse cost metric. In that sense, many multi-dimensional settings collapse to an effective one-dimensional distance-to-boundary problem, where bg becomes a group-dependent maximal movement in the decision-relevant direction.
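For the quadratic case, the collapse to a distance-to-boundary problem is explicit: with cost $\delta^\top A_g \delta$ for a positive definite $A_g$, the cheapest crossing direction is $A_g^{-1} w$. A minimal sketch under those assumptions (the function name and the unit cost budget are ours):

```python
import numpy as np

def best_response(x, w, t, A, budget=1.0):
    """Quadratic-cost best response to the linear rule 1{w^T x >= t}.

    Returns the manipulated feature vector, or x unchanged if already accepted
    or if crossing is too costly. A must be positive definite.
    """
    gap = t - float(w @ x)
    if gap <= 0:
        return x                                  # already accepted: no manipulation
    direction = np.linalg.solve(A, w)             # A^{-1} w: most cost-efficient direction
    delta = direction * (gap / float(w @ direction))  # scale so w^T (x + delta) = t exactly
    cost = float(delta @ A @ delta)               # = gap^2 / (w^T A^{-1} w)
    return x + delta if cost <= budget else x
```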

What changes relative to the one-dimensional case is that the geometry of feasible manipulation can differ substantially across groups even when the value of acceptance is the same. For example, if one group has access to coaching that targets precisely the features with the highest weight (say, résumé keywords), then its cost metric is effectively lower along high-impact directions; this is an economically meaningful source of bA > bB that need not show up as a uniform shift in a scalar score distribution. Likewise, the information parameter λg can become feature-specific (some agents learn “what matters” but not “how to change it”), which suggests that equilibrium design should be robust not only to different manipulation magnitudes but also to different manipulation directions. Our main qualitative lesson survives: policies that engineer parity by “tightening” on the advantaged side can be precisely those that create the largest manipulable region in feature space.

10.3 When group-conditional thresholds are constrained

A recurring practical objection is that group-conditional thresholds (tA, tB) may be legally restricted, politically unacceptable, or operationally infeasible. In many jurisdictions, explicitly using a protected attribute at decision time triggers heightened scrutiny, even if the intent is corrective. How does our analysis inform policy when the principal must use a common threshold tA = tB = t (or, more generally, a group-blind rule)?

Within our framework, group blindness collapses the instrument set: we can still compute equilibrium acceptance
$$ p_g(t) = (1-\lambda_g)\,S_g(t) + \lambda_g\,S_g(t - b_g), $$
but we typically cannot force pA(t) = pB(t) unless the primitives satisfy a knife-edge condition. This yields an important reframing: under group-blind constraints, the relevant question is not “can we achieve parity?” but “how does the chosen policy interact with endogenous response to shape disparities?” Here, the comparative statics are policy-relevant even without the ability to repair. For instance, if disparities are amplified primarily by λ-gaps (information diffusion) rather than by b-gaps (capability), then interventions targeted at transparency, coaching markets, or platform guidance may be more effective than threshold tuning.

A second workaround, common in practice, is implicit group conditioning: the principal avoids using g directly but uses correlated features or post-processing. Our model cautions that such proxies can be strategically targeted as well; they may simply relocate the manipulable margin rather than eliminate it. A more principled alternative is to allow limited randomization at the cutoff (already standard in some allocation problems), which can restore feasibility for constraints like equalized odds while reducing the incentive to concentrate manipulation on a deterministic boundary. That said, randomization carries its own legitimacy and communication challenges.

10.4 Policy implications and limitations

We view the model as suggesting three broad, implementable takeaways.

First, fairness auditing should be equilibrium-aware whenever strategic response is plausible. If stakeholders can improve measured scores through effort, advice, or gaming, then pre-deployment parity calculations are, at best, partial-equilibrium forecasts.

Second, “inclusive” adjustments—loosely, helping the disadvantaged group clear the bar rather than raising the bar on the advantaged—tend to be more stable in equilibrium when behavioral primitives are not too asymmetric. This is not an ethical claim; it is a mechanism claim about where manipulation mass accumulates.

Third, behavioral parameters are policy levers. Subsidies, counseling, simplification of documentation, and anti-fraud enforcement all change effective bg. Transparency, interpretability, and the ecosystem of third-party advice change λg. Importantly, these levers can move in opposite directions: transparency can improve accountability while also increasing strategic responsiveness. Our framework makes that tradeoff explicit.

We close by acknowledging what we do not model. We abstract from dynamic interactions (agents learning over time), from endogenous entry (who applies), from feedback to Dg (investments that shift base scores), and from richer principal objectives beyond threshold accuracy proxies. Each of these forces can matter in practice, and incorporating them is not merely a technical extension—it can change which interventions are desirable. Our goal here is narrower: to isolate a basic equilibrium logic that is easy to compute, easy to simulate, and hard to ignore once seen.