Digital retail in 2026 looks less like a posted-price storefront and
more like a continuously tuned interaction between a platform, its
sellers, and consumers who carry increasingly explicit price memories. Two
developments are especially salient. First, platforms have made
``reference prices''---the benchmarks against which a current offer is framed---both more visible and more standardized. A product page routinely displays a ``was'' price, a ``typical'' price, a struck-through list price, or a percentage discount badge, and these labels are often computed using a stated lookback rule (e.g., ``lowest price in the last 30 days''). Second, the data environment has expanded
the horizon of consumer recall: wish lists, price-tracking extensions,
persistent carts, and algorithmically curated ``price drop’’
notifications all effectively extend the set of past prices a buyer can
retrieve at the moment of purchase. These forces make it difficult to
treat reference dependence as a short-memory phenomenon and, at the same
time, difficult to justify the opposite extreme in which the entire
history is always equally salient.
This paper is motivated by the practical gap between two canonical
modeling approaches to reference formation. In the exponentially
smoothed model (ESM), the reference state updates as a constant-weight
average,
$$
r_{t+1}=\zeta r_t+(1-\zeta)p_t,
$$
so the influence of prices posted $m$ periods ago decays geometrically in $m$. This is a convenient
approximation when the environment is stationary and consumers ``forget'' at a constant hazard. Yet it sits uneasily with platform practices that use \emph{calendar} lookbacks (e.g., $30$ days) or with consumer behavior that becomes more anchored as experience accumulates: a buyer who has repeatedly seen a product at \$100 for months does not revise her notion of the ``normal'' price as quickly as a first-time visitor. At the other pole, the
arithmetic reference model (ARM) makes the reference a running average,
which corresponds to
$$
r_{t+1}=\frac{t}{t+1}r_t+\frac{1}{t+1}p_t,
$$
so the weight on each past price is roughly 1/t and the effective memory grows
linearly with age. This specification captures a kind of
learning-by-accumulation, but it can overstate the salience of very old
prices in contexts where products change, competitors enter, and
consumer attention is episodic rather than archival. In short, ESM can be too forgetful and ARM can be too retentive.
Our starting point is that platforms and consumers alike appear to
operate with an intermediate regime: the reference is built from past
prices, but the rate at which it incorporates new information declines
over time. We capture this with a time-varying step size,
$$
r_{t+1}=(1-\delta_t)r_t+\delta_t p_t,
\qquad
\delta_t=\frac{1-\zeta}{(t+1)^\alpha},\qquad \alpha\in[0,1],\
\zeta\in[0,1).
$$
The parameter α is a memory knob. When
α = 0 we recover ESM with
constant updating intensity 1 − ζ. When (α, ζ) = (1, 0) we recover
ARM. For intermediate α, the
consumer reference becomes increasingly inert: early in the horizon, new
posted prices move the benchmark materially; later, the same price
change moves the benchmark only slightly. This feature is not merely a
technical interpolation. It encodes the idea that a consumer (or a
platform rule) gradually ``locks in’’ a notion of what the regular price
should be.
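To make the interpolation concrete, here is a minimal simulation sketch (parameter values are illustrative, not taken from the model) of the update $r_{t+1}=(1-\delta_t)r_t+\delta_t p_t$ with $\delta_t=(1-\zeta)/(t+1)^{\alpha}$, contrasting α = 0 (ESM-like), an intermediate α, and α = 1 (ARM-like).

```python
import numpy as np

def reference_path(prices, r1, alpha, zeta):
    """Iterate r_{t+1} = (1 - delta_t) r_t + delta_t p_t with
    delta_t = (1 - zeta) / (t + 1)**alpha for t = 1, 2, ..."""
    r = np.empty(len(prices) + 1)
    r[0] = r1
    for t, p in enumerate(prices, start=1):
        delta = (1.0 - zeta) / (t + 1) ** alpha
        r[t] = (1.0 - delta) * r[t - 1] + delta * p
    return r

# Illustrative scenario: a long stretch at a "regular" price of 100,
# then a one-week promotion at 80.
prices = np.array([100.0] * 50 + [80.0] * 7 + [100.0] * 13)

for alpha in (0.0, 0.5, 1.0):
    r = reference_path(prices, r1=100.0, alpha=alpha, zeta=0.2)
    print(f"alpha={alpha}: reference right after the promotion = {r[57]:.2f}")
```

With α = 0 the benchmark chases the promotional price almost immediately, whereas for larger α it stays close to the established regular price, matching the ``locks in'' intuition above.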
Why should updating intensity decline with time? We see at least three complementary interpretations. The first is statistical learning. As a buyer sees more instances of a product price, the estimate of a ``typical'' price becomes statistically more precise, and rational updating would naturally place less weight on any single new observation. The second is costly recall. Digital tools lower the cost of recalling recent prices, but they do not make retrieval of the entire price history free; rather, they create a recency-biased cache whose span can expand as the consumer engages repeatedly with the product category. The third is institutional anchoring by platforms and regulation. Discount labels frequently reference a prescribed window, and sellers can anticipate that early posted prices may serve as a future anchor in those compliance calculations or in consumers' mental comparisons. In each interpretation, it is plausible that a reference is moved quickly when it is still being formed, and slowly once it is established.
The step-size path $\delta_t \asymp (t+1)^{-\alpha}$ implies a precise prediction about effective memory length. Intuitively, the impact of a past price $p_{t-m}$ on today's reference $r_t$ is roughly proportional to a product of terms of the form $(1-\delta_k)$, which behaves like an exponential in the cumulative step sizes $\sum_{k=t-m}^{t}\delta_k$. When $\delta_k$ scales as $k^{-\alpha}$, this cumulative sum over the last $m$ periods is approximately $m/t^\alpha$. Consequently, prices older than order $t^\alpha$ periods carry exponentially small weight, while prices within the most recent order $t^\alpha$ periods account for most of the reference. We therefore obtain a growing, but sublinear, memory span: the relevant lookback is not fixed (as in ESM), and it is not proportional to the entire age of the product (as in ARM), but rather expands at rate $t^\alpha$. This delivers a disciplined way to talk about long lookbacks in mature products without forcing the model to keep essentially the entire early history on the margin.
This ``growing memory'' perspective clarifies a common platform phenomenon: the same promotional tactic can feel potent early in a product's lifecycle and muted later. If a seller launches a product and cycles through prices in the first few weeks, consumers may quickly revise what they consider normal. Months later, after repeated exposures at a stable regular price, a one-week promotion may generate a sharp spike in conversion while leaving the reference largely intact; the promotion is perceived as a \emph{deal} rather than a redefinition of the baseline. Our dynamics reproduce this pattern mechanically: in later periods, $\delta_t$ is small, so the reference reacts little to the temporary discount, preserving the ``was/now'' wedge that drives deal-seeking demand. Conversely, when α is small (fast forgetting), deep
discounts may rapidly reset the reference downward, reducing the
profitability of frequent promotions and aligning with the practical
advice that constant discounting can ``train customers’’ to expect low
prices.
A second motivation comes from the design of deal labels and their strategic use. Sellers often care not only about the contemporaneous conversion effect of a price cut, but also about the credibility of a discount badge. If the platform’s badge is computed from some past benchmark, then raising today’s posted price can increase tomorrow’s perceived discount even if the seller plans to reduce price later. This intertemporal tradeoff is central to reference-dependent demand and is precisely where the choice between ESM and ARM matters. Under constant-memory ESM, any attempt to ``set an anchor’’ dissipates at a constant rate. Under full-history ARM, anchors are persistent but also very hard to move once enough history has accumulated. By allowing intermediate α, we can represent environments in which anchors are persistent enough to matter (long lookback), yet still responsive enough to be strategically shaped (nontrivial dynamic optimization).
The economics of such shaping interacts sharply with the well-documented asymmetry between perceived gains and losses. Consumers often respond more favorably to prices below their reference than they respond unfavorably to prices above it, or vice versa, depending on category and context. Deal labels can amplify this asymmetry: a discount tag makes gains salient, whereas a price increase without a tag may be less salient or may trigger distrust. Our demand specification will allow different slopes for gain and loss regions, and the time-varying reference update translates these local demand sensitivities into global pricing patterns. In particular, when gains are especially effective, one expects a high-to-low (markdown) pricing path: early high prices elevate the benchmark; later lower prices harvest ``gain’’ demand. When losses are especially painful, one expects pricing rules that avoid pricing above the benchmark or that move the reference upward cautiously. What our framework adds is a parameterized notion of how long such intertemporal effects last and how they scale with the horizon.
The parameter α therefore has a direct set of empirical and managerial implications. Holding other primitives fixed, larger α implies slower forgetting and a longer effective lookback, which raises the return to early anchoring and increases the potential wedge between optimal dynamic pricing and a naïve fixed-price strategy. At the same time, larger α also makes the reference harder to move later, so it can increase the cost of correcting a mistaken anchor. This creates a nuanced prediction: environments with durable price memory (high α) should exhibit both stronger incentives for early price setting—including high introductory ``regular’’ prices and staged markdowns—and stronger path dependence in subsequent pricing. Conversely, in settings where consumers are volatile or where platform interfaces emphasize only short-term comparisons (low α), the benefit of sophisticated intertemporal pricing is limited, and static pricing heuristics may perform relatively well.
Beyond managerial relevance, the memory knob matters for policy discussions around pricing transparency and discount regulation. When rules mandate that a discount be computed relative to a historical benchmark, they implicitly choose a memory regime. A longer mandated lookback can protect consumers from artificial ``was’’ prices, but it can also increase the incentive to set high initial prices that then serve as durable anchors. Our model provides a way to articulate this tradeoff: regulation that effectively raises α (by extending the relevant memory) can reduce short-run manipulation via rapid reference resetting, yet increase the long-run value of early anchoring. We do not claim this is the only channel, but it is a disciplined channel that connects a platform design choice to a measurable change in dynamic pricing incentives.
Finally, we emphasize what we are doing. We do not model forward-looking consumers who strategically time purchases, nor do we model competition among sellers or inventory constraints that can dominate pricing in many categories. We also treat the reference as a one-dimensional state updated deterministically from posted prices, which abstracts from heterogeneous memories, advertising, and external reference points (e.g., competitor prices). Our goal is narrower: to isolate how a simple, time-varying reference updating rule—interpretable as a long lookback with growing effective span—changes the structure and value of dynamic pricing relative to the two canonical extremes. Within that scope, the model illuminates a basic tradeoff: the same mechanism that makes prices ``stick’’ in consumers’ minds also creates an intertemporal instrument for sellers, and the strength of that instrument is governed by the single parameter α.
We study a monopolist seller who posts a price over a finite selling horizon t ∈ {1, …, T} for a single product. The key state variable is a reference state that summarizes how buyers benchmark the current offer against past prices. Our goal in this section is to (i) specify the reference-dependent demand system with asymmetric responses to gains and losses, (ii) define the seller's feasible actions and the reference dynamics with time-varying updating intensity, and (iii) formulate the seller's dynamic revenue maximization problem and the associated value function. Throughout, we treat consumers as myopic in the sense that demand in period t depends on (pt, rt) but not directly on expected future prices; all intertemporal considerations enter through the evolution of the reference state.
Time is discrete. At the beginning of period t, the seller observes the current reference state rt ∈ [0, p̄] and then chooses a posted price pt ∈ [0, p̄], where p̄ is an exogenous maximum feasible price (e.g., an institutional cap, a menu bound, or a range in which demand remains well-defined). Demand then realizes, revenue is collected, and the reference state updates deterministically according to the realized posted price.
The reference update rule is a convex combination of the current reference and the posted price, $r_{t+1}=(1-\delta_t)r_t+\delta_t p_t$, with step size $\delta_t=(1-\zeta)/(t+1)^{\alpha}$.
Because δt ∈ (0, 1] for all t (given ζ < 1 and (t + 1)^α ≥ 1), the reference remains in [0, p̄] whenever r1, pt ∈ [0, p̄].
The parameter ζ acts as a
baseline inertia term: increasing ζ shrinks all step sizes
proportionally, making the reference harder to move at every date. The
parameter α governs how
quickly updating intensity decays with age. When α is larger, the reference becomes
increasingly inert as t grows,
reflecting the idea that the benchmark ``locks in’’ with accumulated
exposure.
It is useful to interpret the update rule in two equivalent ways. First, rt + 1 − rt = δt(pt − rt), so δt directly scales the one-period adjustment toward the current posted price. Second, iterating shows that rt is a weighted average of past prices and the initial condition, with weights determined by products of (1 − δs). We postpone the explicit weight representation and its consequences to the next section, but we will already use here the basic intuition: because δt declines like (t + 1)^{−α}, early prices can influence the benchmark for longer when α is larger.
Demand in period t is a
reference-dependent function of (pt, rt)
plus an additive shock: $D_t=D(p_t,r_t)+\varepsilon_t$, where the shocks $\varepsilon_t$ are i.i.d., mean-zero, and bounded.
The boundedness assumption is a technical convenience that guarantees
integrability of revenues and avoids pathological corner solutions; none
of our qualitative insights rely on heavy tails.
The deterministic component D(p, r)
incorporates a standard base-demand term and a piecewise-linear
reference adjustment: $D(p,r)=b-ap+\eta_+(r-p)_+-\eta_-(p-r)_+$,
where (x)+ := max {x, 0}.
The parameters a > 0 and
b ≥ 0 govern the base linear
demand curve, while η+, η− ≥ 0
capture the strength of reference effects in the gain and loss regions,
respectively. When p ≤ r, the price is perceived as a gain relative to the benchmark and demand increases by η+(r − p). When p ≥ r, the price is perceived as a loss and demand decreases by η−(p − r). This specification nests the symmetric, loss-neutral case η+ = η− =: η
and allows us to separate two empirically plausible environments:
gain-seeking categories (large η+ relative to η−) and loss-averse
categories (large η− relative to η+).
Two modeling remarks are important for interpretation. First, the reference enters demand only through the difference r − p and only via the sign-dependent slope; r is not a separate ``quality’’ shifter. This isolates framing and comparison effects. Second, the piecewise linearity deliberately emphasizes tractability and the economic logic of gain/loss asymmetry. In particular, within each region (p ≤ r or p ≥ r), period demand is affine in both p and r, which will allow us later to characterize optimal pricing paths via relatively simple recursions rather than full-blown numerical dynamic programming.
We also implicitly impose feasibility conditions ensuring demand does not become negative over the price range of interest. Because the specification can in principle yield negative values at very high prices (or if η− is large), one can either interpret D(p, r) as an approximation valid on [0, p̄] with p̄ chosen so that D(p, r) is nonnegative for relevant (p, r), or replace D with max {D, 0} without changing the central intertemporal mechanism. For clarity, we maintain the linear specification and assume parameters are such that expected revenue is well-defined and the seller's optimum is attained in [0, p̄].
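For concreteness, a minimal sketch of the demand primitive just described (the optional truncation reflects the max {D, 0} interpretation mentioned above; parameter values are illustrative):

```python
def demand(p, r, b, a, eta_plus, eta_minus, truncate=False):
    """Deterministic demand D(p, r) = b - a*p + eta_+ (r - p)_+ - eta_- (p - r)_+."""
    gain = max(r - p, 0.0)   # price below the benchmark: perceived gain
    loss = max(p - r, 0.0)   # price above the benchmark: perceived loss
    d = b - a * p + eta_plus * gain - eta_minus * loss
    return max(d, 0.0) if truncate else d

# A discount below the reference lifts demand; a price above it depresses demand.
print(demand(p=90.0, r=100.0, b=200.0, a=1.0, eta_plus=0.5, eta_minus=1.5))   # 115.0
print(demand(p=110.0, r=100.0, b=200.0, a=1.0, eta_plus=0.5, eta_minus=1.5))  # 75.0
```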
Given a price p and
reference state r, expected
period revenue (conditioning on the state and action) is $\Rev(p,r):=p\,D(p,r)$,
since 𝔼[Dt ∣ p, r] = D(p, r) by the mean-zero shock assumption. The seller is risk-neutral and chooses prices to maximize expected total revenue over the horizon, $\sup_{\pi}\ \mathbb{E}\big[\sum_{t=1}^{T}\Rev(p_t,r_t)\big]$,
where a (possibly history-dependent) policy π selects pt based on the
information available at time t. Because the reference state
evolves deterministically from past posted prices via the update rule, and demand
shocks are i.i.d. and additively separable, the problem is Markov in
(rt, t):
the entire payoff-relevant history is summarized by the current
reference level and the time index.
Accordingly, we define the continuation value at time t as
$$
V^*(r,t)=\sup_{\pi}\ \mathbb{E}\Big[\sum_{s=t}^T \text{Rev}(p_s,r_s)\
\Big|\ r_t=r\Big],
\qquad
V^*(r,T+1)=0.
$$
The Bellman equation takes the familiar form
$$
V^*(r,t)=\max_{p\in[0,\bar p]}\Big\{\Rev(p,r)+V^*\big((1-\delta_t)r+\delta_t p,\ t+1\big)\Big\}.
$$
The economic content of the Bellman equation is immediate: posting a higher price today
raises current revenue mechanically, but also shifts tomorrow’s
reference upward, which can be valuable (if it makes future prices look
like gains) or costly (if it makes future prices look like losses, or if
it pushes the seller into a region where demand is more sensitive). The
time-varying δt determines
how strongly this intertemporal channel operates at each date.
A useful benchmark for interpretation is the best fixed-price policy,
where the seller chooses a constant p and commits to pt ≡ p
for all t. Under such a
policy, the reference follows the update recursion with pt = p,
so rt
converges toward p at a rate
controlled by (δt). We write
the best fixed-price expected revenue from r1 as
$$
V^{\text{fix}}(r_1):=\max_{p\in[0,\bar p]}\ \sum_{t=1}^T
\mathbb{E}[\text{Rev}(p,r_t)].
$$
Comparing V*(r1)
and Vfix(r1)
will let us quantify the value of dynamic price paths that intentionally
shape the reference over time.
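As a numerical illustration of this comparison (a brute-force sketch, not the solution method developed later: it discretizes the reference state onto a grid, uses illustrative parameter values, and truncates demand at zero), the following code approximates $V^*(r_1)$ by backward induction and $V^{\text{fix}}(r_1)$ by enumerating fixed prices.

```python
import numpy as np

T, r1 = 40, 60.0
b, a, eta_p, eta_m = 200.0, 1.0, 0.8, 0.8      # loss-neutral for simplicity
alpha, zeta, p_bar = 0.7, 0.2, 120.0

grid = np.linspace(0.0, p_bar, 241)            # shared grid for prices and references

def delta(t):
    return (1.0 - zeta) / (t + 1) ** alpha

def rev(p, r):
    d = b - a * p + eta_p * np.maximum(r - p, 0.0) - eta_m * np.maximum(p - r, 0.0)
    return p * np.maximum(d, 0.0)

# Backward induction for V*(r, t) on the discretized reference grid.
V = np.zeros(len(grid))                        # V(., T+1) = 0
for t in range(T, 0, -1):
    Vnew = np.empty_like(V)
    for i, r in enumerate(grid):
        r_next = (1.0 - delta(t)) * r + delta(t) * grid          # one r' per candidate price
        idx = np.clip(np.rint(r_next / p_bar * (len(grid) - 1)).astype(int),
                      0, len(grid) - 1)                          # nearest grid point
        Vnew[i] = np.max(rev(grid, r) + V[idx])                  # best price at (r, t)
    V = Vnew
print("approx V*(r1)    :", round(V[np.argmin(np.abs(grid - r1))], 1))

# Best fixed-price value: simulate the deterministic reference path for each candidate.
def fixed_value(p):
    r, total = r1, 0.0
    for t in range(1, T + 1):
        total += rev(p, r)
        r = (1.0 - delta(t)) * r + delta(t) * p
    return total

print("approx V^fix(r1) :", round(max(fixed_value(p) for p in grid), 1))
```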
The specification embeds two canonical reference formation rules as
special cases. When α = 0, we
have constant updating intensity δt = 1 − ζ
and the state recursion becomes
rt + 1 = ζrt + (1 − ζ)pt,
the exponentially smoothed model (ESM). ESM implies that the impact of a
past price decays at a constant geometric rate, so the consumer’s
effective lookback is roughly constant over time. When (α, ζ) = (1, 0), we have
δt = 1/(t + 1)
and obtain the arithmetic reference model (ARM),
$$
r_{t+1}=\frac{t}{t+1}r_t+\frac{1}{t+1}p_t,
$$
under which the reference is essentially a running average of all past
prices, and very old prices retain nontrivial influence.
For intermediate α ∈ (0, 1), the model formalizes a
middle regime: the benchmark incorporates new prices, but the rate of
incorporation declines as the product ages. Although we defer formal
kernel calculations, it is worth stating the basic mechanism that will
underlie our results. If we perturb a past price ps slightly, its
influence on a later reference rt is multiplied
forward through the dynamics by a sequence of ``retention’’ factors
(1 − δk)
for k = s + 1, …, t − 1.
When δk is
small, retention is high; when δk is large,
retention is low. With δk ≍ k^{−α},
the cumulative attenuation from s to t is approximately
$$
\prod_{k=s+1}^{t-1}(1-\delta_k)\ \approx\
\exp\Big(-\sum_{k=s+1}^{t-1}\delta_k\Big),
$$
so what matters is the partial sum of step sizes over the intervening
dates. For α > 0, these
step sizes decline, meaning that (holding the gap t − s fixed) late-stage
references are harder to move than early-stage references. Conversely,
for a fixed current time t, the relevant part of the past is the range of s for which $\sum_{k=s+1}^{t-1}\delta_k$ is not too
large; outside that range, the exponential attenuation makes the weight
negligible. This is the sense in which α acts as a memory knob: larger
α slows the growth of the
partial sums and thereby extends the span of past prices that still
matter for today’s benchmark.
This interpretation connects directly to practice. A platform rule that effectively uses a longer lookback window, or a consumer population that accumulates a more stable ``fair price’’ belief over repeated exposures, corresponds in our reduced form to a larger α (and/or a larger ζ). The seller then faces a sharper dynamic tradeoff: early prices have a longer shadow on future perceived gains and losses, but later corrective moves are less effective because the benchmark has become inert. In the next section we make this shadow precise by deriving the closed-form memory kernel and bounding how quickly its mass concentrates on recent prices as a function of α and ζ.
A central object in our analysis is the mapping from the posted-price history (p1, …, pt − 1) to the current reference rt. Because the state update is linear, rt admits a closed-form representation as a weighted sum of past prices (plus the initial condition). This representation plays two roles. First, it gives an explicit ``memory kernel’’ that quantifies how strongly each lagged price affects today’s benchmark. Second, it allows us to translate properties of the step-size schedule (δt)—in particular, its power-law decay governed by α—into quantitative bounds on how long the seller can benefit from shaping the reference.
Iterating the update rule yields, for each t ≥ 2,
$$
r_t=\Big(\prod_{k=1}^{t-1}(1-\delta_k)\Big)r_1+\sum_{s=1}^{t-1}\delta_s\Big(\prod_{k=s+1}^{t-1}(1-\delta_k)\Big)p_s.
$$
We will write this more compactly as
$$
r_t=w_{t,0}\,r_1+\sum_{s=1}^{t-1}w_{t,s}\,p_s,
\qquad
w_{t,0}:=\prod_{k=1}^{t-1}(1-\delta_k),
\quad
w_{t,s}:=\delta_s\prod_{k=s+1}^{t-1}(1-\delta_k).
$$
The coefficients {wt, s} are nonnegative and sum to one: $w_{t,0}+\sum_{s=1}^{t-1}w_{t,s}=1$.
Thus rt is
a convex combination of r1 and the past posted
prices. The ``memory kernel'' at time t is the vector $(w_{t,s})_{s\in\{0,1,\ldots,t-1\}}$,
which depends only on the step-size sequence (δk) and not on
the realized prices.
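A quick numerical companion (a minimal sketch with illustrative parameters, consistent with the weight formulas above): compute the kernel at one date, verify that it is a convex combination, and see how its mass concentrates on recent lags.

```python
import numpy as np

def kernel_weights(t, alpha, zeta):
    """Weights (w_{t,0}, ..., w_{t,t-1}) with r_t = w_{t,0} r_1 + sum_s w_{t,s} p_s."""
    delta = (1.0 - zeta) / (np.arange(1, t) + 1.0) ** alpha        # delta_1 .. delta_{t-1}
    w = np.empty(t)
    w[0] = np.prod(1.0 - delta)                                     # weight on r_1
    for s in range(1, t):
        w[s] = delta[s - 1] * np.prod(1.0 - delta[s:])              # weight on p_s
    return w

w = kernel_weights(t=500, alpha=0.6, zeta=0.2)
window = int(np.ceil(500 ** 0.6))                                   # ~ t^alpha most recent lags
print("sum of weights           :", round(w.sum(), 6))              # 1.0 (convex combination)
print("mass on last t^alpha lags:", round(w[-window:].sum(), 4))
print("mass on older lags       :", round(w[:-window].sum(), 4))
```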
Two immediate sensitivity identities follow from this representation. For any s ≤ t − 1, $\partial r_t/\partial p_s=w_{t,s}$ and $\partial r_t/\partial r_1=w_{t,0}$.
Hence the weights are not merely an algebraic device: they are exactly
the marginal impacts of earlier posted prices (or of the initial anchor)
on the current benchmark. This observation will let us connect the time
scale on which weights decay to the time scale on which a seller can
create an economically meaningful gap between rt and pt.
The weights also give a useful Lipschitz-type inequality that we will
use repeatedly when comparing two price paths. Consider two sequences
(ps)s ≥ 1
and (ps′)s ≥ 1,
with corresponding reference paths (rt) and (rt′)
starting from r1
and r1′.
Then for each t, $|r_t-r_t'|\le w_{t,0}\,|r_1-r_1'|+\sum_{s=1}^{t-1}w_{t,s}\,|p_s-p_s'|$.
This inequality formalizes a theme we return to later: differences in
the distant past matter only insofar as the kernel assigns them
non-negligible mass.
To understand the shape of (wt, s)
under $\delta_k=\frac{1-\zeta}{(k+1)^\alpha}$, it
is helpful to approximate products of the form $\prod_{k=u}^{v}(1-\delta_k)$ by
exponentials. Since log (1 − x) ≤ −x for x ∈ (0, 1), we have $\prod_{k=u}^{v}(1-\delta_k)\le\exp\big(-\sum_{k=u}^{v}\delta_k\big)$.
A matching lower bound can be obtained when δk is small
(e.g., for moderate and large k), using log (1 − x) ≥ −x − x²;
for our purposes, the upper bound already captures the key concentration
phenomenon.
The partial sums of δk depend sharply on α. For α ∈ (0, 1), $\sum_{k=1}^{t}\delta_k\approx\frac{1-\zeta}{1-\alpha}\,t^{1-\alpha}$, while for α = 1, $\sum_{k=1}^{t}\delta_k\approx(1-\zeta)\log t$.
Substituting these estimates into the kernel representation yields a transparent description of how far back the kernel reaches. Roughly, wt, s
is of order δs times an
exponential of (minus) the cumulative step sizes between s and t. When α > 0, δk shrinks with
k, so retention (1 − δk) rises
with k, and the reference
becomes increasingly inert in late periods.
We now formalize the ``effective memory length’’ induced by the power-law schedule. Fix a time t and consider how much total kernel mass lies on prices older than t − m, i.e., on indices s ≤ t − m − 1. Because rt is a convex combination, this tail mass directly bounds the extent to which the distant past can affect the current reference.
A convenient way to quantify tail mass is to track the survival of any perturbation that occurred at or before time t − m − 1 as it propagates to time t. The key step is that products of retentions can be controlled by sums of step sizes on the intervening window.
The content of this bound is that the relevant time scale for the decay of memory is t^α: once we go back more than a constant multiple of t^α periods, the cumulative step size over the last m dates is order m/t^α, and retention decays like exp(−Θ(m/t^α)). In particular, for any fixed c > 0, the kernel mass on lags larger than c·t^α is exponentially small in c, which is the sense in which weights concentrate on the most recent Õ(t^α) prices (where Õ(⋅) hides logarithmic factors when we invert such bounds).
We bound the mass on indices at most t − m − 1 by the maximum
possible survival of any weight from time t − m onward. Using and
nonnegativity of weights, it suffices to control the retention factor
from t − m to t, namely $\prod_{k=t-m}^{t-1}(1-\delta_k)$. Applying the product-to-exponential bound gives
$$
\prod_{k=t-m}^{t-1}(1-\delta_k)\le
\exp\Big(-\sum_{k=t-m}^{t-1}\delta_k\Big).
$$
For α ∈ (0, 1], the sum $\sum_{k=t-m}^{t-1}(k+1)^{-\alpha}$ is
well-approximated by an integral and satisfies $\sum_{k=t-m}^{t-1}(k+1)^{-\alpha}\ge c\,
m/t^\alpha$ for a constant c > 0 as long as m is not too large relative to t (and in fact the bound extends
uniformly with appropriate constants). Multiplying by (1 − ζ) yields the claimed bound.
A useful corollary is an explicit effective lookback window as a function of an accuracy level ε.
In words, to recover all but an ε fraction of the kernel mass at time t, it suffices to look back on the order of t^α log(1/ε) periods. This scaling is the technical underpinning of the phase-transition statements we establish later: with α larger, the seller can create a reference shift that persists for more future periods, and thus dynamic policies can outperform fixed pricing by an amount that grows with T^α.
We next translate the concentration results into bounds on how strongly early decisions can affect late references. While the sensitivity identity already tells us that ∂rt/∂ps = wt, s, it is useful to have simple upper bounds on wt, s that depend on the gap (t − s) and the time scale t^α.
The second inequality uses δs ≤ (1 − ζ)/(s + 1)^α ≤ (1 − ζ)/t^α for s ≤ t. The key implication is that, as a function of the lag ℓ := t − s, the kernel behaves like an exponentially decaying profile when plotted against the rescaled lag ℓ/t^α. Thus the seller's ability to influence rt via a deviation at time s is substantial only when t − s = O(t^α).
A particularly important special case is the sensitivity to the
initial reference. Since $w_{t,0}=\prod_{k=1}^{t-1}(1-\delta_k)$, the
same product-to-exponential argument yields $w_{t,0}\le\exp\big(-\sum_{k=1}^{t-1}\delta_k\big)$.
When α < 1, the influence of r1 dies out extremely fast (stretched-exponentially in t^{1−α}). When α = 1, the decay is polynomial, reflecting the running-average flavor of ARM-like dynamics.
This distinction matters for how quickly the system ``forgets’’ an
initially low reference: under α < 1 the initial anchor becomes
essentially irrelevant relatively early in the horizon, whereas under
α = 1 it continues to exert a
long tail of influence.
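A one-line numerical check of this contrast (an illustrative sketch with ζ = 0, so the ARM case is exact): the initial-condition weight $w_{t,0}=\prod_{k=1}^{t-1}(1-\delta_k)$ decays stretched-exponentially when α < 1 but only like 1/t when α = 1.

```python
import numpy as np

def w_t0(t, alpha, zeta=0.0):
    """Weight of the initial reference r_1 in r_t: prod_{k=1}^{t-1} (1 - delta_k)."""
    k = np.arange(1, t)
    return np.prod(1.0 - (1.0 - zeta) / (k + 1.0) ** alpha)

for t in (10, 100, 1000):
    print(f"t={t:5d}  alpha=0.5: {w_t0(t, 0.5):.3e}   "
          f"alpha=1 (ARM): {w_t0(t, 1.0):.3e}  (exact value 1/t = {1.0 / t:.3e})")
```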
Finally, combining the kernel representation with the tail bound yields a convenient stability bound:
perturbations to prices far in the past have a uniformly small effect on
today’s reference, regardless of the magnitude of the perturbation (as
long as prices remain feasible). Concretely, if two price paths coincide
on the last m periods before
t, then the Lipschitz inequality and the tail bound imply $|r_t-r_t'|\le\bar p\,\exp\big(-\Theta(m/t^\alpha)\big)$, so matching only the recent Õ(t^α) prices suffices to match the reference state up to exponentially small error.
We can now summarize the economic content of the kernel bounds in a way that will guide the pricing results that follow. The power-law schedule δt ≍ t^{−α} implies that, at time t, the reference behaves as if consumers ``remember'' primarily the last on the order of t^α prices, with exponentially small weight on earlier prices. When α is larger, this effective lookback grows, so a high early price can lift the benchmark for more subsequent periods; when α is smaller, the lift dissipates more quickly. At the same time, because δt itself shrinks with t for α > 0, later references become harder to move, so the seller faces a sharper intertemporal tradeoff between exploiting current demand and shaping future comparisons.
These facts are exactly what we need for the next step of the analysis. In the loss-neutral linear-demand benchmark, we will construct price paths that intentionally create a persistent ``reference wedge'' over approximately T^α periods, yielding a revenue advantage over any fixed price of order Ω(T^α). Conversely, we will show that no dynamic policy can generate more than Õ(T^α) total incremental value from reference shaping, because the kernel prevents any single deviation from influencing more than about t^α future dates in a material way.
Fixed-price policies are a natural benchmark in both theory and
practice: they are simple to implement, easy to communicate, and often
justified by the idea that consumers eventually ``get used to'' a stable price. Our model clarifies when this intuition is (approximately) correct and when it can be badly misleading. The key driver is the \emph{effective memory length} induced by the power-law step size $\delta_t \asymp t^{-\alpha}$: when consumers retain a nontrivial imprint of earlier prices over $\tilde O(t^\alpha)$ future periods, a seller can profitably invest in a high early price to lift the future reference and later ``harvest'' demand at lower prices. When memory is short, such investments have only
transient payoff and fixed pricing becomes comparatively robust.
In this section we focus on the cleanest environment for quantifying
the gap: loss-neutral reference effects, η+ = η− =: η, with linear base demand H(p) = b − ap.
In that case the kink in D(p, r) disappears
and demand becomes globally linear: $D(p,r)=H(p)+\eta(r-p)=b-ap+\eta(r-p)$.
Revenue is therefore
a concave quadratic in p for
each fixed r, with the only
intertemporal channel entering through the bilinear term ηrp. Put
differently, expected revenue can be decomposed as $p\,D(p,r)=pH(p)+\eta\,p(r-p)$, where pH(p) = bp − ap² is the ``static'' component and ηp(r − p) is a reference premium that is positive precisely when the posted price sits below the prevailing benchmark.
We show that for every α ∈ (0, 1] there exist feasible parameters and an initial reference r1 such that any fixed price is polynomially suboptimal: the optimal dynamic policy earns an additional Ω(T^α) expected revenue. The construction is deliberately simple—a two-block policy with one price used to raise the reference and a second price used to harvest it—and the scaling emerges directly from the kernel concentration results of the preceding section.
Take an initial reference equal to the (interior) static monopoly price under the base demand curve, $p^{\circ}:=\arg\max_{p}\,pH(p)=b/(2a)$, assuming p∘ ∈ (0, p̄) and
parameters are such that demand is nonnegative at relevant prices. Under
any fixed-price policy pt ≡ p,
the reference recursion implies rt → p,
so the wedge rt − p
is transient. In particular, if we choose p = p∘ and r1 = p∘,
then rt = p∘
for all t and the reference
premium is identically zero. Thus the best fixed-price benchmark is
essentially pinned down by the concave static term, leaving no mechanism
for a fixed price to generate a persistent gain wedge.
Consider the policy that posts a slightly higher ``investment'' price for L periods and then drops to the base-optimal level: $p_t=p_H:=p_L+\Delta$ for $t\le L$ and $p_t=p_L:=p^{\circ}$ for $t>L$, with $\Delta>0$.
Because the state update is linear, once we enter the second block the
reference evolves as a contraction toward pL. Writing
xt := rt − pL
for t ≥ L + 1, the
recursion gives $x_{t+1}=(1-\delta_t)\,x_t$.
Moreover, after L consecutive
periods at pH, the
reference at the start of the second block is $r_{L+1}=p_H-\Delta\prod_{k=1}^{L}(1-\delta_k)$, equivalently $x_{L+1}=\Delta\big(1-\prod_{k=1}^{L}(1-\delta_k)\big)$.
For α ∈ (0, 1], the cumulative
step size $\sum_{k=1}^L \delta_k$
diverges with L, implying
$\prod_{k=1}^L(1-\delta_k)$ becomes
small once L is moderately
large. In particular, choosing L to grow with T ensures xL + 1 is a
constant fraction of Δ for
large horizons.
Fix L of order T^α and focus on the first m periods of the second block, where m is also of order T^α. By the matching lower bound on retention products (applied to the survival factor $\prod_{k=L+1}^{t-1}(1-\delta_k)$), the reference does not immediately collapse to pL; rather, it remains elevated for roughly t^α steps. Concretely, for t = L + 1 + j with j ≤ m,
$$
x_{t}=x_{L+1}\prod_{k=L+1}^{t-1}(1-\delta_k)\ \ge\ x_{L+1}\exp\!\big(-c'\,j/L^{\alpha}\big),
$$
for constants c, c′ > 0 (depending on α, ζ) and using δk ≍ k^{−α}.
Taking m = ⌊κL^α⌋ with κ > 0 sufficiently small yields a uniform lower bound $x_{L+1+j}\ge c_1\,\Delta$ for all $j\le m$,
where c1 ∈ (0, 1)
is a constant independent of T. This is the core
``memory-to-revenue’’ translation: by investing in an early price
increase, we create a wedge of constant magnitude that persists for
Θ(T^α)
subsequent periods.
Under the loss-neutral demand, the marginal value of an increase in the reference state at time t is $\partial_r\big(p\,D(p,r)\big)=\eta p$. Therefore, during the second block (where pt = pL), the additional revenue attributable to an elevated reference is exactly linear in the wedge: it equals $\eta\,p_L\,x_t$ in period t. Summing over the first m = Θ(T^α) periods of the second block and using the uniform wedge bound gives a gain of order $\Omega\big(\eta\,p_L\,c_1\Delta\,T^{\alpha}\big)=\Omega(T^{\alpha})$.
The only remaining issue is the ``investment’’ cost incurred during
the first block by charging pH = pL + Δ
instead of pL. Because
pL is the
maximizer of pH(p) and the
static component is strongly concave (quadratic), the baseline loss per
period from deviating by Δ is O(Δ²). Meanwhile, the reference premium created by posting pH when rt is near pL is at most linear in Δ and does not overturn the conclusion that the net first-block effect is O(LΔ²) for small Δ. Choosing Δ as a sufficiently small constant and taking L = Θ(T^α) yields a second-block gain of order Ω(T^α Δ) against a first-block cost of order O(T^α Δ²), so the overall net improvement is Ω(T^α).
Since the optimal policy dominates this explicit two-block policy, we
obtain an instance-dependent lower bound $V^*(r_1)-V^{\text{fix}}(r_1)=\Omega(T^{\alpha})$ for each α ∈ (0, 1].
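The two-block construction is easy to check numerically. The sketch below (loss-neutral demand, illustrative parameters and an arbitrary Δ; not a calibration) compares the two-block policy against the best fixed price on a coarse grid for several horizons; the gap should grow roughly like T^α.

```python
import numpy as np

b, a, eta = 200.0, 1.0, 0.8
alpha, zeta = 0.6, 0.2
p_circ = b / (2.0 * a)                    # static monopoly price under H(p) = b - a p

def total_revenue(prices, r1):
    """Expected revenue of a deterministic price path under the reference dynamics."""
    r, total = r1, 0.0
    for t, p in enumerate(prices, start=1):
        total += p * (b - a * p + eta * (r - p))          # loss-neutral linear demand
        r += (1.0 - zeta) / (t + 1) ** alpha * (p - r)
    return total

for T in (200, 400, 800, 1600):
    L = int(T ** alpha)                   # "investment" block of length ~ T^alpha
    two_block = [p_circ + 5.0] * L + [p_circ] * (T - L)   # raise the reference, then harvest
    v_dyn = total_revenue(two_block, r1=p_circ)
    v_fix = max(total_revenue([p] * T, r1=p_circ) for p in np.linspace(80.0, 120.0, 161))
    print(f"T={T:5d}: two-block gain over best fixed price = {v_dyn - v_fix:9.1f}")
```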
When α = 0, the step size is constant, δt ≡ 1 − ζ, so the effective memory length does not grow with t. Repeating the above argument yields only a window m = Θ(1) on which the wedge remains non-negligible, leading to $V^*(r_1)-V^{\text{fix}}(r_1)=\Omega(1)$ at best. This is the sense in which α > 0 creates a qualitative change: even though δt → 0 and the reference becomes increasingly inert, the horizon-wide value of shaping the reference accumulates over a growing number of future periods.
We now argue that the preceding lower bound is essentially the correct order: no instance in the loss-neutral linear class can exhibit a fixed-price gap larger than Õ(T^α). The proof strategy is to (i) isolate the part of revenue that can benefit from intertemporal reference effects and (ii) use kernel concentration to show that any policy can only maintain a meaningful wedge rt − pt over Õ(t^α) periods following any ``movement'' in the posted price.
Using the revenue decomposition, the total expected revenue under any policy π can be written as $\sum_{t=1}^{T}\mathbb{E}[p_tH(p_t)]+\eta\sum_{t=1}^{T}\mathbb{E}[p_t(r_t-p_t)]$,
where the first term is purely ``static’’ and the second term captures
all benefits of inducing rt ≠ pt.
Since pH(p) is concave in p and prices are bounded in [0, p̄], the static term cannot exceed $T\max_{p\in[0,\bar p]}pH(p)$ by more than a constant depending on the initial transient under a fixed price (and in particular it cannot create a superlinear advantage over the best fixed price). Thus, to upper bound $V^*(r_1)-V^{\text{fix}}(r_1)$ it suffices to upper bound, up to additive constants, the maximal attainable magnitude of the reference-premium term $\eta\sum_{t=1}^{T}\mathbb{E}[p_t(r_t-p_t)]$.
Because rt is a convex
combination of recent prices (by the kernel representation of the previous section), the wedge rt − pt can be controlled by how much prices have moved in the recent past. Fix a window length m and split the kernel into a ``recent'' part and a ``tail'' part. The tail term is uniformly small when m ≫ t^α by the tail bound, because the total tail mass $\sum_{s=0}^{t-m-1}w_{t,s}$ decays like exp(−Θ(m/t^α)).
The recent term can be bounded by telescoping price increments:
Substituting this into the recent part and swapping sums yields a bound of the form
$$
|r_t-p_t|\ \le\ \bar p\sum_{s=0}^{t-m-1}w_{t,s}\ +\ \sum_{k=t-m}^{t-1}\gamma_{t,k}\,\big|p_{k+1}-p_k\big|,
$$
for coefficients γt, k ∈ [0, 1]
that depend only on the kernel weights. Intuitively, only price changes
within the last m periods can
keep rt
separated from pt; older
changes have exponentially small influence.
Set m = m(t) := ⌈c t^α log T⌉ for a large enough constant c. Then the tail term is at most T^{−2}, say, and hence negligible when summed over t ≤ T. The remaining contribution depends on the price increments weighted by γt, k. A single increment |pk + 1 − pk| can affect |rt − pt| only for t in a window of length Õ(k^α) after time k, because for later t it falls into the kernel tail. Therefore, when we sum over t = 1, …, T, each increment is ``counted'' at most Õ(k^α) times. This yields the aggregate bound
where Õ(⋅) hides logarithmic
factors inherited from the choice of m(t).
Finally, to translate this into a bound on the fixed-price gap, we use feasibility and interiority to control the total variation of near-optimal policies: when the static revenue component is strongly concave and demand must remain nonnegative, excessively oscillatory pricing is dominated because it sacrifices pH(p) without creating commensurate reference premia (the premium is at most linear in the wedge, while the static loss from moving prices is quadratic). Formalizing this tradeoff bounds the total price variation of near-optimal policies, and hence yields $V^*(r_1)-V^{\text{fix}}(r_1)=\tilde O(T^{\alpha})$.
The lower bound construction shows Ω(T^α) is attainable by an explicit, economically interpretable strategy: raise the reference, then harvest. The upper bound shows that this is essentially the most one can do in the worst case: power-law memory permits a seller to profit from reference shaping over Õ(t^α) future periods, but not more, and aggregating these opportunities over the horizon yields at most Õ(T^α). The remaining logarithmic slack comes from converting exponential tail decay into a uniform ``effective window'' across all t.
Taken together, the lower and upper bounds justify describing a phase transition in α: when α = 0 the maximal advantage of dynamic pricing over fixed pricing is bounded (in order), whereas for any α > 0 the worst-case fixed-price gap grows polynomially as T^α, reflecting an expanding effective memory that makes early reference shaping increasingly valuable.
We now turn from whether dynamic pricing can matter to a more operational question: how to price in the presence of reference effects with time-varying updating. A canonical pattern in many retail categories is a markdown schedule—prices drift down over the season, often with occasional discrete ``promotional'' drops. In our model, markdowns arise for a simple economic reason: earlier prices influence later perceived gains through the reference state, so placing relatively high prices earlier can be interpreted as an ``investment''
in a higher benchmark against which later prices are evaluated.
Formally, we call a policy a markdown policy if its realized posted prices
satisfy
p1 ≥ p2 ≥ ⋯ ≥ pT.
The central message of this section is twofold. First, when consumers are gain-seeking in the sense that gains are at least as salient as losses (η+ ≥ η−), markdown policies are without loss of optimality: among all optimal policies there exists one that is monotone non-increasing. Second, even when consumers are loss-averse (η+ < η−) so that aggressive early anchoring can be costly, we can still justify markdowns as a robust design: there exists a markdown policy whose expected revenue is within an O(T^α log T) additive gap of the fully optimal policy. The time-varying step size δt = (1 − ζ)/(t + 1)^α matters quantitatively through the effective memory length, but it does
not overturn the qualitative monotonicity logic.
The key technical object is the (time-varying) reference kernel
implied by the linear update. Iterating the recursion yields, for each
t ≥ 2, the kernel representation of rt as a convex combination of r1 and the past posted prices. Thus each posted price ps affects all
future references rt for t > s, with an influence
weight that decays in the time distance through products of (1 − δk).
Compared to constant-step exponential smoothing, the only change is that
these influence weights are no longer geometric; however, they remain
nonnegative, sum to at most one, and (by the kernel lemma) concentrate on a window of length Õ(t^α).
To see why markdowns are optimal when η+ ≥ η−, it is useful to focus on inversions—adjacent times t for which pt < pt + 1. A standard way to enforce monotonicity in finite-horizon scheduling problems is an exchange (swap) argument: we show that whenever such an inversion occurs, swapping the two prices (leaving all other prices unchanged) weakly increases the seller's expected objective. Repeating this swap until no inversions remain yields a non-increasing (markdown) sequence without reducing revenue.
The economic force behind the swap argument is that, holding fixed the multiset of posted prices, placing a higher price earlier (i) raises the next-period reference more, thereby making the subsequent lower price more likely to be perceived as a gain, and (ii) shifts any ``loss’’ perceptions (pricing above the reference) earlier, when they have fewer future spillovers. When η+ ≥ η−, the model rewards re-timing in exactly this direction: demand reacts at least as strongly to creating gains later as it does to avoiding losses later.
The time-varying update affects the swap calculus only through how
the swap perturbs the reference path. Consider two policies that
coincide up to time t − 1, and
that differ only by swapping the pair (pt, pt + 1) = (x, y)
with x < y:
$$
\pi:\ (p_t,p_{t+1})=(x,y),\qquad
\pi^{\swap}:\ (p_t,p_{t+1})=(y,x),
$$
with all later decisions held fixed for the moment. Because the state
evolution is linear, the induced references under the two policies can
be coupled pathwise (for every shock realization), and the difference in
the reference at any future time s ≥ t + 1 is an affine
function of (y − x).
In particular, rt + 1 shifts up by the immediate step size: $r_{t+1}^{\swap}-r_{t+1}=\delta_t\,(y-x)>0$,
while for s ≥ t + 2
the difference is still proportional to (y − x) with a coefficient
obtained from the kernel representation. What matters for the argument is not the
exact closed form of this coefficient (which depends on δt and δt + 1), but
rather that the effect of the swap on any rs is (a) linear
in (y − x) and (b)
confined to a decaying tail, because the state recursion is a
contraction at each step.
The revenue comparison leverages a monotonicity property of
single-period payoffs with respect to the reference. Since D(p, r) is
increasing in r for every
fixed p (both gain and loss
slopes are nonnegative), we have that $\Rev(p,r)$ is nondecreasing in r for any fixed p.
Moreover, when η+ ≥ η−,
the value of an increase in the reference is weakly larger in situations
where the price is more likely to be evaluated as a gain. Indeed,
wherever r ≠ p the
derivative $\partial_r \Rev(p,r)$
exists and equals η+p in the gain
region r > p and
η−p in the
loss region r < p.
Thus higher r is particularly
valuable when it flips a future period from ``loss'' to ``gain''
(or expands the gain wedge), and the inequality η+ ≥ η−
ensures that, at the margin, such flips are beneficial in the right
direction for markdowns.
Putting these elements together yields the following conclusion.
Assume η+ ≥ η−. Then for any initial reference r1 and any horizon T, there exists an optimal policy whose posted prices are non-increasing over time.
Fix any optimal policy and locate an inversion pt < pt + 1.
We compare it to a modified policy that swaps (pt, pt + 1)
and then (crucially) from time t + 2 onward follows an optimal continuation for the new
induced state. Because the continuation value is the supremum over all
future actions, this re-optimization dominates holding the original
continuation fixed, so it suffices to show that the two-period portion
of the objective (periods t
and t + 1) weakly increases
under the swap once we account for the induced change in rt + 1. The
gain-seeking condition η+ ≥ η−
ensures that shifting ``gain opportunities'' later (and ``loss exposures'' earlier) cannot decrease the two-period expected revenue;
the time-varying steps δt simply scale
the magnitude of the induced reference change but do not alter its sign
at t + 1. Iterating swaps
eliminates all inversions without reducing optimality, producing a
markdown price path.
Two remarks help interpret this result in practice. First, the statement is existential: there may be multiple optimal policies, and some need not be markdown, but at least one markdown policy is optimal. Second, the argument is robust to the particular power-law form of δt; we only use that (i) the update is a convex combination and (ii) the impact of a single price change on the reference state propagates forward linearly and contracts over time.
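As a numerical illustration of the re-timing logic (not a verification of the proposition, which concerns optimal policies and re-optimized continuations), the sketch below evaluates the same multiset of prices in markdown order versus the reverse order under gain-seeking parameters; the markdown ordering typically earns weakly more. Parameter values are illustrative.

```python
import numpy as np

b, a, eta_p, eta_m = 200.0, 1.0, 1.2, 0.4        # gain-seeking: eta_+ >= eta_-
alpha, zeta, r1 = 0.6, 0.2, 100.0

def total_revenue(prices):
    r, total = r1, 0.0
    for t, p in enumerate(prices, start=1):
        d = b - a * p + eta_p * max(r - p, 0.0) - eta_m * max(p - r, 0.0)
        total += p * d
        r += (1.0 - zeta) / (t + 1) ** alpha * (p - r)
    return total

prices = list(np.linspace(90.0, 110.0, 21))       # the same multiset of posted prices
print("markdown (high-to-low):", round(total_revenue(sorted(prices, reverse=True)), 1))
print("markup   (low-to-high):", round(total_revenue(sorted(prices)), 1))
```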
When η+ < η−, pricing above the current reference can be severely punished by demand reductions. This makes the ``investment’’ phase of an aggressive two-block strategy potentially unattractive: raising prices early may create losses in the very periods when the reference is still low. In such environments it is not generally true that an optimal policy must be a markdown, because the seller may prefer to avoid losses by staying close to the state and only gradually adjusting prices. Nonetheless, markdown policies remain a useful approximation, and we can quantify this approximation error in a way that matches the effective-memory scaling.
The key tool is a control on the value function with respect to the reference state. Intuitively, if we perturb the starting reference by Δr, then (because rt is a weighted average of the initial condition and recent prices) the induced change in future references is at most Δr times the total remaining kernel mass. Under power-law steps, that mass is on the order of the effective memory length, yielding an O(T^α log T) bound after taking a worst-case uniform window across all times.
To formalize, note that the envelope recursion implies, whenever ∂rV*(⋅, t) exists, $\partial_rV^*(r,t)=\Rev_r(p_t^*,r)+(1-\delta_t)\,\partial_rV^*(r_{t+1},t+1)$.
Because pt ∈ [0, p̄] and D(p, r) has reference slopes bounded by max {η+, η−}, we have the uniform bound $0\le\Rev_r(p,r)\le\bar p\,\max\{\eta_+,\eta_-\}$, where the first inequality uses that higher r weakly increases demand and prices are nonnegative. Iterating the envelope recursion and using this bound gives
The resulting sum is precisely the remaining ``survival mass'' of the
reference, and it can be bounded using the same exponential-product
estimates as in the kernel lemma. Choosing a uniform effective window of length m(t) = Θ(t^α log T) makes the tail negligible and yields $0\le\partial_rV^*(r,t)\le C\,T^{\alpha}\log T$ for a constant C, and hence a global Lipschitz property $|V^*(r,t)-V^*(r',t)|\le C\,T^{\alpha}\log T\,|r-r'|$.
This bound immediately implies a near-optimal markdown construction. Let r̄ := p̄ and let πr̄ be an optimal policy when the initial reference equals r̄. Starting from the maximal feasible benchmark makes it easy to avoid losses: by restricting attention to actions pt ≤ rt, we ensure (pt − rt)+ = 0 so loss aversion does not bite. Along such loss-free paths the demand function reduces to the gain-side linear form H(p) + η+(r − p), and the earlier markdown optimality logic applies to deliver an optimal continuation that can be taken to be non-increasing. Thus we may select πr̄ to be a markdown policy.
Now apply the markdown policy πr̄ starting from
an arbitrary r1 ∈ [0, p̄].
Since the only difference is the initial reference, the Lipschitz property at t = 1 yields
where we also used that V*(⋅) is nondecreasing in
r (higher references weakly
increase demand for any fixed action sequence, hence also under the
optimal sequence). Because (r̄ − r1) ≤ p̄,
the right-hand side is O(T^α log T).
In words: even under loss aversion, we can identify a markdown policy
whose additive suboptimality is controlled by the effective memory
length.
This near-optimality guarantee is intentionally worst-case and additive. It does not claim that markdowns are always nearly optimal in relative terms, nor that the specific constructed markdown is unique or easily interpretable in every instance. Rather, it provides a robustness statement: with time-varying reference updating, restricting attention to monotone price paths cannot cost more than the amount of value that any policy could plausibly extract from reference shaping over an effective memory window.
Having established when monotone structure is exact (gain-seeking) and when it is a controlled approximation (loss aversion), we next exploit the loss-neutral linear-demand benchmark to derive an explicit characterization of the optimal markdown segment via a generalized Euler-type recursion under the time-varying steps δt.
We now specialize to the loss-neutral case η+ = η− = : η,
which plays the role of a tractable benchmark. The key simplification is
that the kink in the reference term disappears: for every (p, r),
(r − p)+ − (p − r)+ = r − p,
so demand is globally linear in (p, r): $D(p,r)=b-ap+\eta(r-p)$. Hence the single-period expected revenue is a smooth concave quadratic in p, $\Rev(p,r)=bp+\eta rp-(a+\eta)p^{2}$, and the dynamic problem becomes a finite-horizon linear–quadratic control problem with linear state transition rt + 1 = (1 − δt)rt + δtpt.
The time variation in δt breaks
stationarity, but it does not destroy tractability: the optimal policy
remains affine in the reference state, and the realized optimal price
path satisfies a generalized Euler-type recursion whose coefficients
inherit the variation in (δt)t ≤ T.
Because shocks enter demand additively with mean zero, they do not
affect expected revenue beyond their mean; throughout this subsection we
work with expected demand D(p, r). The Bellman recursion is as stated above. Under the usual interiority conditions (so the optimizer lies in (0, p̄)), differentiability yields the one-step FOC $\Rev_p(p_t,r_t)+\delta_t\,\partial_rV^*(r_{t+1},t+1)=0$.
In the loss-neutral linear model, $\Rev_p$ is affine in (p, r) and $\Rev_r(p,r)=\eta p$ is affine in p. This makes it possible to push
the characterization further by eliminating ∂rV*
through a two-step recursion, producing a generalized ``Euler equation’’
coupling (pt, pt + 1).
Combining the envelope recursion with the one-step FOC gives the two-step condition stated earlier; in the present benchmark it becomes fully explicit.
Using
$$
\Rev_p(p,r)=b+\eta r-2(a+\eta)p,
\qquad
\Rev_r(p,r)=\eta p,
$$
the two-step condition at time t ≤ T − 1 reads
$$
b+\eta r_t-2(a+\eta)p_t+\delta_t\,\eta\,p_{t+1}
-\kappa_t\big[b+\eta r_{t+1}-2(a+\eta)p_{t+1}\big]=0,
$$
with the state recursion $r_{t+1}=(1-\delta_t)r_t+\delta_t p_t$.
This is a generalized Euler equation: it equates the current
marginal revenue loss from raising pt (the term
−2(a + η)pt
inside $\Rev_p$) to the discounted
marginal continuation benefit that comes from shifting the next reference via the state transition. The coefficient
$$
\kappa_t:=\delta_t\frac{1-\delta_{t+1}}{\delta_{t+1}}
$$
captures how today’s reference investment trades off against the next
period’s incentive to adjust price, and it is precisely here that
time-varying updating enters.
Substituting the state recursion into the Euler condition yields a recursion in (pt, rt, pt + 1) with time-varying coefficients. After straightforward algebra one can write it equivalently as
where
Thus, whenever At > 0 (which
holds in the standard parameter region a + η > 0 with δt ∈ (0, 1]),
the optimal pt is an affine
function of (rt, pt + 1):
The terminal condition is the static first-order condition in the last period, $b+\eta r_T-2(a+\eta)p_T=0$, again assuming an interior solution.
This recursion should be read as the time-varying analogue of the constant-coefficient recursion that appears under exponential smoothing or under the ARM update: we still obtain a linear relationship across adjacent prices, but the coefficients now drift over time through (δt, δt + 1).
A complementary (and often computationally cleaner) characterization
proceeds directly from dynamic programming. Because the period revenue is quadratic in (r, p) and the state evolves linearly, the value function is quadratic in r and the optimal price is affine in r. Concretely, for each t there exist scalars (At, Bt, Ct) such that $V^*(r,t)=A_t r^{2}+B_t r+C_t$, and the optimal policy can be written as $p_t^{*}=\beta_t r+\gamma_t$, with time-varying coefficients (βt, γt) determined by backward induction. To see the structure, take the quadratic form as an induction hypothesis at time t + 1 and differentiate the period-t objective. The FOC becomes a linear equation in (p, r):
$$
b+\eta r-2(a+\eta)p+\delta_t\Big(2A_{t+1}\big((1-\delta_t)r+\delta_t p\big)+B_{t+1}\Big)=0,
$$
so
$$
p_t^{*}=\frac{b+B_{t+1}\delta_t+\big(\eta+2A_{t+1}\delta_t(1-\delta_t)\big)r}{2\big(a+\eta-A_{t+1}\delta_t^{2}\big)}.
$$
The denominator $a+\eta-A_{t+1}\delta_t^{2}$ is the (positive) curvature of the period-t objective in p; it is the natural interiority condition ensuring strict concavity in p. Substituting back into the Bellman recursion yields a
backward recursion for (At, Bt, Ct)
(a time-varying Riccati-type update). While the closed-form expression
for (At, Bt, Ct)
is algebraically tedious, the key point is that it is one-dimensional
and can be computed in O(T) time.
The affine form also induces a linear evolution of the state under the optimal policy, $r_{t+1}=\big((1-\delta_t)+\delta_t\beta_t\big)r_t+\delta_t\gamma_t$.
Given (βt, γt)
and r1, we can
therefore compute the entire optimal expected price path by a forward
pass. In this benchmark, the ``optimal markdown’’ is not imposed as an
external restriction; rather, the monotonicity result from the previous
subsection guarantees that among optimal policies we may select one with
non-increasing realized prices, and the affine policy representation
provides a direct way to compute such an optimal path (subject to the
caveat below about feasibility constraints).
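A minimal sketch of the backward (Riccati-type) pass and the forward pass in the loss-neutral benchmark, following the affine-policy logic above (interior solutions are assumed and prices are merely clipped to [0, p̄] as a safeguard; parameter values are illustrative):

```python
import numpy as np

def solve_loss_neutral(T, b, a, eta, alpha, zeta, r1, p_bar):
    """Backward pass for V*(r,t) = A_t r^2 + B_t r + const and the affine policy
    p_t = beta_t r + gamma_t, followed by a forward pass for the price path."""
    delta = (1.0 - zeta) / (np.arange(1, T + 1) + 1.0) ** alpha    # delta_1 .. delta_T
    A = np.zeros(T + 2)
    B = np.zeros(T + 2)                                            # A_{T+1} = B_{T+1} = 0
    beta = np.zeros(T + 1)
    gamma = np.zeros(T + 1)
    for t in range(T, 0, -1):
        d = delta[t - 1]
        curv = 2.0 * (a + eta - A[t + 1] * d * d)                  # curvature in p (> 0)
        beta[t] = (eta + 2.0 * A[t + 1] * d * (1.0 - d)) / curv
        gamma[t] = (b + B[t + 1] * d) / curv
        m = (1.0 - d) + d * beta[t]                                # r_{t+1} = m r + n at the optimum
        n = d * gamma[t]
        A[t] = eta * beta[t] - (a + eta) * beta[t] ** 2 + A[t + 1] * m * m
        B[t] = (b * beta[t] + eta * gamma[t] - 2.0 * (a + eta) * beta[t] * gamma[t]
                + 2.0 * A[t + 1] * m * n + B[t + 1] * m)
    r, prices = r1, []                                             # forward pass
    for t in range(1, T + 1):
        p = float(np.clip(beta[t] * r + gamma[t], 0.0, p_bar))
        prices.append(p)
        r = (1.0 - delta[t - 1]) * r + delta[t - 1] * p
    return np.array(prices)

path = solve_loss_neutral(T=60, b=200.0, a=1.0, eta=0.8, alpha=0.6,
                          zeta=0.2, r1=100.0, p_bar=150.0)
print(path[:5].round(2), "...", path[-3:].round(2))   # typically a markdown-shaped path
```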
The two representations above—the two-step Euler equation and the affine-feedback form—lead to efficient computation, but they emphasize different objects.
If one wishes to compute the optimal price sequence $(p_t)_{t\le T}$ directly for a given r1, it is often convenient to treat the unknowns as the pair $(p_t,r_t)_{t=1}^{T}$ and solve a sparse linear system. Indeed, the optimality conditions consist of the Euler equations, the terminal first-order condition, and the state equations. Each Euler equation links only variables at times t and t + 1, and each state equation links only (rt, pt, rt + 1). As a result, the overall linear system has a banded structure (block-tridiagonal when written in the stacked vector (r2, p1, r3, p2, …, rT, pT − 1, pT)). This can be solved in O(T) time by standard banded Gaussian elimination. The advantage of this formulation is that it stays close to the Euler-equation intuition: time-varying updating simply makes the coefficients of the banded matrix time dependent.
Alternatively, if one prefers to compute a policy that maps any realized rt to pt, then the Riccati recursion implied by the affine-feedback form is the more natural route: we compute (At, Bt) backward once, and then apply the affine map forward. This viewpoint will be useful in the learning discussion that follows, because it exhibits a low-dimensional object—the affine coefficients (βt, γt)—that summarizes the optimal action rule in the benchmark.
The recursion is analytically solvable only in special cases where the time dependence in (δt) disappears or simplifies.
If α = 0 (exponential smoothing), then δt ≡ 1 − ζ is constant. In that case the Euler equation has constant coefficients, and the induced second-order difference equation for prices has a solution in terms of the roots of a characteristic polynomial; equivalently, the Riccati recursion converges to a stationary fixed point away from the horizon boundary, yielding an approximately geometric markdown in long horizons.
If (α, ζ) = (1, 0) (ARM), then δt = 1/(t + 1) varies but does so in a very structured way; one can exploit cancellations in products of (1 − δt) and harmonic sums to obtain explicit formulas for the reference kernel and, in turn, explicit representations of the optimal markdown segment in terms of simple backward recursions with closed-form coefficients.
For general α ∈ (0, 1] and ζ ∈ [0, 1), the coefficients depend on both t and t + 1 through (t + 1)^{−α} and (t + 2)^{−α}. This time dependence typically precludes a clean closed form, but it does not impede computation: because the problem remains linear–quadratic, one can compute the optimal policy to machine precision in time linear in T via either the Riccati recursion or banded linear solves.
Two qualifications are worth keeping in view. First, our derivations above use interior first-order conditions; when the optimal unconstrained price lies outside [0, p̄], the true optimum is obtained by projection to the boundary, and the affine policy becomes piecewise-affine. Second, while loss-neutrality removes the kink in the reference term, other constraints (such as demand nonnegativity, inventory, or price-adjustment costs) would reintroduce nonlinearities. In that sense, the present characterization should be understood as a benchmark that isolates the role of time-varying memory in the cleanest possible environment.
In the next section we leverage precisely this structure: the loss-neutral benchmark suggests that the optimal decision rule can be summarized by a small number of time-varying coefficients (or, equivalently, by a banded linear relation across adjacent prices). This low-dimensionality is what makes it plausible to learn the underlying demand and reference parameters online without ever ``resetting’’ reference effects, and it guides the design of learning algorithms whose regret depends on α through the effective memory length.
A practical motivation for allowing time-varying updating is that it makes rapid resetting of the reference increasingly implausible. When δt decays, pushing the reference from its current level toward a target price requires sustained pricing for on the order of t^α periods, precisely the effective-memory phenomenon captured by the kernel lemma. This creates a basic tension for learning: exploration today can contaminate the reference state for many future periods, and the number of affected periods grows with α. At the same time, the analysis above reveals an offsetting simplification: the state is one-dimensional and observed (given the known update rule), and in the loss-neutral benchmark the optimal policy is summarized by a small number of time-varying coefficients. We can therefore design learning rules that respect reference effects and nevertheless exploit the structure.
There are two closely related summaries of the control problem that are essentially invariant to α in form (though not in numerical values).
First, the myopic best response to a given reference, i.e., the maximizer of $p\,D(p,r)$ over p for fixed r, is a one-dimensional map from r to p. In the loss-neutral linear benchmark it is affine, $p=\frac{b+\eta r}{2(a+\eta)}$, independent of (δt) except through the evolution of rt. This map is
useful because, for large t
when δt is
small, the dynamic correction to the myopic rule is of order δt in the Euler
equation, so myopic pricing becomes a natural baseline even when the
true optimum is dynamic.
Second, the optimal policy in the benchmark is affine in the state, $p_t^{*}=\beta_t r_t+\gamma_t$, where the entire policy is encoded by the 2T scalars (βt, γt)t ≤ T.
Importantly, the dimension of the unknown economic environment is also
constant: in the loss-neutral linear model, all primitive uncertainty
can be collected into a parameter vector such as
$$
\theta:=(b,a,\eta)\in\R^3,
$$
while the time variation in δt is known.
Thus, learning can be framed as estimating a fixed low-dimensional θ, and repeatedly converting θ̂ into the low-dimensional policy
coefficients (β̂t, γ̂t)
via the Riccati recursion from the previous section.
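To make this conversion concrete, the following sketch (ours, not code from a replication package) implements the backward recursion for the loss-neutral benchmark by completing the square in p at each stage; the function name policy_coefficients and all parameter values are illustrative.
\begin{verbatim}
import numpy as np

def policy_coefficients(a, b, eta, delta, T):
    # Loss-neutral benchmark: Rev(p, r) = p*(b + eta*r - (a+eta)*p),
    # reference dynamics r_{t+1} = (1-delta[t])*r_t + delta[t]*p_t.
    # Returns (beta, gamma) with p_t*(r) = beta[t]*r + gamma[t].
    A, B = 0.0, 0.0          # V_t(r) = A*r^2 + B*r + const, terminal value 0
    beta = np.zeros(T)
    gamma = np.zeros(T)
    for t in reversed(range(T)):
        d = delta[t]
        q = (a + eta) - A * d**2            # curvature in p (assumed positive)
        l1 = eta + 2.0 * A * d * (1.0 - d)  # coefficient on r in the FOC
        l0 = b + B * d                      # constant term in the FOC
        beta[t], gamma[t] = l1 / (2.0 * q), l0 / (2.0 * q)
        # Plug the maximizer back in to update the value-function coefficients.
        A, B = l1**2 / (4.0 * q) + A * (1.0 - d)**2, \
               l0 * l1 / (2.0 * q) + B * (1.0 - d)
    return beta, gamma

# Illustrative usage with delta_t = (1 - zeta) / (t + 1)^alpha, t = 1, ..., T.
T, alpha, zeta = 200, 0.5, 0.2
delta = (1.0 - zeta) / (np.arange(1, T + 1) + 1.0)**alpha
beta, gamma = policy_coefficients(a=1.0, b=10.0, eta=0.8, delta=delta, T=T)
\end{verbatim}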
Because the seller observes (pt, Dt)
and knows (δt), she can
reconstruct (rt) exactly
from r1 and posted
prices. This removes a key obstacle relative to models in which
reference is latent. In particular, in the loss-neutral benchmark the
conditional mean demand is linear in observables:
Writing Dt = xt⊤θ′ + εt
with a suitable reparameterization (e.g., θ′ := (b, η, a + η)
and xt := (1, rt, −pt)),
we obtain a standard linear regression problem with adaptively chosen (endogenous) regressors: rt depends on past prices, which were chosen using past data. Nevertheless, under
bounded shocks and mild excitation conditions, self-normalized
concentration for adaptive linear regression applies, yielding
confidence ellipsoids around the least-squares estimate.
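As a minimal illustration of this observability, the sketch below reconstructs the reference path from posted prices and the known update rule and then forms the ridge estimate of θ′ = (b, η, a + η) from the regressors xt = (1, rt, −pt); the function name and the regularization value are illustrative assumptions.
\begin{verbatim}
import numpy as np

def ridge_estimate(prices, demands, r1, delta, lam=1.0):
    # Reconstruct r_t exactly from r_1, posted prices, and the known (delta_t).
    T = len(prices)
    r = np.empty(T)
    r[0] = r1
    for t in range(T - 1):
        r[t + 1] = (1.0 - delta[t]) * r[t] + delta[t] * prices[t]
    # Ridge regression of D_t on x_t = (1, r_t, -p_t).
    X = np.column_stack([np.ones(T), r, -np.asarray(prices, dtype=float)])
    A = X.T @ X + lam * np.eye(3)
    theta_hat = np.linalg.solve(A, X.T @ np.asarray(demands, dtype=float))
    return theta_hat, r
\end{verbatim}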
The same idea extends to asymmetric reference effects. The demand
model
D(p, r) = b − ap + η+(r − p)+ − η−(p − r)+
is linear in the parameter vector θ := (b, a, η+, η−)
with feature vector xt := (1, −pt, (rt − pt)+, −(pt − rt)+), so one can estimate θ by least squares as well, albeit with a policy computation step that is no longer globally linear–quadratic.
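Under asymmetry the only change on the estimation side is the feature map; a minimal sketch of the corresponding design row (our notation) is:
\begin{verbatim}
import numpy as np

def asymmetric_features(p, r):
    # D = b - a*p + eta_plus*(r - p)_+ - eta_minus*(p - r)_+,
    # linear in theta = (b, a, eta_plus, eta_minus).
    return np.array([1.0, -p, max(r - p, 0.0), -max(p - r, 0.0)])
\end{verbatim}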
We sketch a learning rule that (i) estimates θ online by ridge regression, (ii) computes a candidate dynamic policy by plugging an optimistic parameter into the Riccati recursion, and (iii) adds a vanishing dithering term to ensure persistent excitation without inducing large, long-lived distortions in the reference.
Fix a regularization λ > 0 and a confidence level δ ∈ (0, 1). Let θ̂t be the ridge estimate based on data up to t − 1, and let 𝒞t be an ellipsoidal confidence set for θ constructed from the design matrix. At time t, the seller selects an optimistic parameter from 𝒞t, converts it into affine policy coefficients via the Riccati recursion, posts the resulting price plus the dithering term ξt, observes demand, and updates θ̂t + 1 and 𝒞t + 1. Two remarks clarify why this rule is aligned with the economics.
First, it uses the same low-dimensional object throughout: the coefficients. The algorithm never searches over arbitrary price paths; it searches over a constant-dimensional parameter vector θ, and each candidate θ induces a unique (time-varying) affine control via the Riccati recursion.
Second, dithering can be chosen to respect memory. Because the influence of pt on future references is concentrated over roughly tα periods, the cumulative distortion generated by a bounded perturbation ξt is also localized in time. This suggests shrinking ξt just slowly enough to guarantee identification, but fast enough that the induced reference drift does not dominate revenue.
Let the (expected) regret be
$$
\Reg_T
~:=~
V^*(r_1)-\mathbb{E}\Big[\sum_{t=1}^T \Rev(p_t,r_t)\Big].
$$
The kernel lemma suggests a useful mental model: the controlled system has an effective memory mt ≍ tα, meaning that price or estimation errors at time t materially affect only the next mt periods. In online learning problems with bounded memory m, a common heuristic is that regret interpolates between $\tilde O(\sqrt{T})$ (no memory) and $\tilde O(\sqrt{mT})$ (memory-m coupling). Translating this to our setting by taking m ≍ Tα yields the benchmark scaling
$$
\Reg_T=\tilde O\big(T^{(1+\alpha)/2}\big),
$$
which reduces to $\tilde O(\sqrt{T})$ under exponential smoothing (α = 0) and becomes nearly linear as α → 1 (ARM-like long memory), reflecting the intuition that early experimentation has long-lasting consequences when references are hard to move.
One can also interpret this scaling as an information–distortion tradeoff: to identify η one needs variation in rt, but inducing such variation requires sustained price movements whose opportunity cost accumulates over the effective memory window. Larger α increases this window and therefore raises the cost of informative experimentation.
The proof strategy we have in mind mirrors standard OFU analyses but requires one additional ingredient tailored to reference dynamics: a stability/sensitivity bound showing that, uniformly over admissible parameters, the value function is Lipschitz in the reward parameters with a Lipschitz constant proportional to the effective memory length induced by (δt). The kernel lemma supplies exactly this ingredient by quantifying how deviations in prices (and hence in inferred demand parameters through the algorithm’s choices) propagate into future references.
We emphasize that Theorem~ should be read as a guidepost rather than a fully optimized statement. In particular, sharper rates may be achievable by (i) separating the learning of the static demand slope a from the learning of reference sensitivity η (which requires movement in rt), and (ii) exploiting the fact that δt → 0, so the system becomes progressively less sensitive to control later in the horizon.
Several extensions are conceptually straightforward but technically nontrivial.
With η+ ≠ η−, the demand regression remains linear in parameters via the feature vector above, but the control problem is no longer globally quadratic due to kinks. One pragmatic approach is to combine regression with a restricted policy class motivated by our structural results, e.g., markdown policies parameterized by a small number of breakpoints. This yields a low-dimensional optimization problem each period even when the exact Bellman equation is not tractable.
We have treated α and ζ as known primitives. In applications they may be uncertain. Because the reference update is deterministic given prices, one can in principle estimate (α, ζ) from observed reference proxies (surveys, posted competitor prices, or repeated-purchase panels), but without direct reference observations the identification problem becomes substantially harder. Our results suggest that mis-specifying α is economically meaningful because it changes the effective memory length and therefore the returns to early anchoring.
Learning algorithms that rely on first-order characterizations should be robust to projection onto [0, p̄] and to demand nonnegativity constraints. In finite samples, optimistic parameters can push computed prices toward the boundary; careful clipping and conservative confidence sets are needed to avoid pathological behavior, especially when α is large and the reference is slow to revert.
Overall, the main lesson is that time-varying reference updating does not merely complicate the control problem; it also offers a principled way to think about learning. The same effective-memory phenomenon that drives the Tα phase transition in optimal pricing should also govern the difficulty of learning: as α rises, the number of observations is still T, but the cost of generating informative variation is amortized over increasingly long reference windows. This motivates the numerical exploration in the next section, where we can visualize both the optimal policy shapes and the way the effective-memory scaling manifests in finite horizons.
We use this section to visualize three implications of the analysis: (i) the revenue advantage of dynamic pricing over the best fixed price grows on the order of Tα (up to logarithms) in the loss-neutral linear benchmark; (ii) the optimal price path deforms smoothly as we vary the updating exponent α, interpolating between a nearly myopic regime (fast forgetting) and a strongly intertemporal regime (slow forgetting); and (iii) the qualitative structure is robust—but not identical—when reference effects are asymmetric (η+ ≠ η−), where the Bellman objective is kinked and the linear–quadratic reduction no longer applies globally.
Throughout, we take p̄ large enough that the optimal prices are interior in the benchmark experiments (we verify ex post that truncation does not bind). Because shocks enter additively and satisfy $\E[\varepsilon_t]=0$, expected revenue depends only on the conditional mean demand, so the value of any deterministic policy can be evaluated without Monte Carlo. When we report realized paths, we simulate i.i.d. bounded shocks only to illustrate that the qualitative policies are not artifacts of expectation.
In the case η+ = η− = η
and H(p) = b − ap,
the expected one-period revenue can be written (on the no-loss region
p ≤ r) as
$$
\Rev(p,r)=p\big(b+\eta r-(a+\eta)p\big),
$$
and, as discussed earlier, the optimal feedback rule is affine in r:
pt*(r) = βtr + γt.
For any fixed parameter triple (a, b, η) and
known (δt), we compute
(βt, γt)
by backward recursion (equivalently, the one-dimensional Riccati
system). This yields an exact solution for the benchmark runs, so
numerical error arises only from floating-point arithmetic.
To compute the best fixed price, we evaluate
$$
V^{\text{fix}}(r_1)=\max_{p\in[0,\bar p]} \sum_{t=1}^T
\Rev\big(p,r_t(p)\big),
\qquad
r_{t+1}(p)=(1-\delta_t)r_t(p)+\delta_t p,
$$
by one-dimensional search (the objective is smooth and unimodal for our
parameter ranges). This comparison isolates the intertemporal value of
shaping rt
from the static curvature of $\Rev(\cdot,\cdot)$.
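The fixed-price benchmark is cheap to evaluate because the reference still drifts deterministically toward the constant price; a minimal sketch with illustrative parameter values is:
\begin{verbatim}
import numpy as np

def fixed_price_value(p, r1, delta, a, b, eta):
    # Sum of loss-neutral revenue Rev(p, r_t) along the induced reference path.
    r, total = r1, 0.0
    for d in delta:
        total += p * (b + eta * r - (a + eta) * p)
        r = (1.0 - d) * r + d * p
    return total

# One-dimensional search over a grid (refine locally if needed).
a, b, eta, r1, p_bar = 1.0, 10.0, 0.8, 2.0, 20.0
T, alpha, zeta = 200, 0.5, 0.2
delta = (1.0 - zeta) / (np.arange(1, T + 1) + 1.0)**alpha
grid = np.linspace(0.0, p_bar, 401)
values = [fixed_price_value(p, r1, delta, a, b, eta) for p in grid]
p_fix, V_fix = grid[int(np.argmax(values))], max(values)
\end{verbatim}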
When η+ ≠ η−,
the period objective is piecewise quadratic in p with a kink at p = r, and the Bellman
recursion is no longer globally quadratic. We therefore compute the
dynamic optimum by discretizing the state r ∈ [0, p̄] on a fine grid
and performing backward induction:
$$
V(r,t)=\max_{p\in[0,\bar p]}\Big\{\Rev(p,r)+V\big((1-\delta_t)r+\delta_t
p,t+1\big)\Big\}.
$$
The maximization over p is
one-dimensional; we implement it by combining a coarse grid search with
local refinement on each side of the kink at p = r. This routine is fast
because the state is one-dimensional and the horizon is finite.
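A direct implementation of this backward induction, with linear interpolation of the continuation value on the reference grid (the local refinement around the kink is omitted for brevity), might look like this; grid sizes and parameters are illustrative assumptions.
\begin{verbatim}
import numpy as np

def asymmetric_dp(delta, a, b, eta_plus, eta_minus, p_bar, n_r=300, n_p=300):
    T = len(delta)
    r_grid = np.linspace(0.0, p_bar, n_r)
    p_grid = np.linspace(0.0, p_bar, n_p)
    V = np.zeros(n_r)                        # terminal value V(., T+1) = 0
    policy = np.zeros((T, n_r))
    for t in reversed(range(T)):
        V_new = np.empty(n_r)
        for i, r in enumerate(r_grid):
            gain = np.maximum(r - p_grid, 0.0)
            loss = np.maximum(p_grid - r, 0.0)
            rev = p_grid * (b - a * p_grid + eta_plus * gain - eta_minus * loss)
            r_next = (1.0 - delta[t]) * r + delta[t] * p_grid
            cont = np.interp(r_next, r_grid, V)   # interpolated continuation
            j = int(np.argmax(rev + cont))
            V_new[i], policy[t, i] = rev[j] + cont[j], p_grid[j]
        V = V_new
    return policy, V
\end{verbatim}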
In addition, to connect to the structural results, we compute a restricted policy class even under asymmetry: we optimize over non-increasing sequences with a small number of breakpoints (e.g., K ∈ {3, 5, 7} segments), which produces an interpretable policy and serves as a stress test for the claim that markdowns remain near-optimal when η+ < η−.
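Evaluating a candidate segmented markdown is equally simple, which is what makes searching over a handful of breakpoints practical; the helper below (our sketch) scores one candidate, and an outer loop over non-increasing level vectors and segment lengths completes the restricted optimization.
\begin{verbatim}
import numpy as np

def segmented_markdown_value(levels, lengths, r1, delta,
                             a, b, eta_plus, eta_minus):
    # levels: non-increasing price levels; lengths: segment lengths summing to T.
    path = np.repeat(levels, lengths)
    r, total = r1, 0.0
    for t, p in enumerate(path):
        gain, loss = max(r - p, 0.0), max(p - r, 0.0)
        total += p * (b - a * p + eta_plus * gain - eta_minus * loss)
        r = (1.0 - delta[t]) * r + delta[t] * p
    return total
\end{verbatim}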
Our first experiment is designed to highlight the phase transition in
the magnitude of dynamic gains as the memory exponent α varies. We fix (a, b, η) and vary
T and α while holding ζ constant. For each (T, α), we compute the
gap
Δ(T, α) := V*(r1) − Vfix(r1),
where V* is the
optimal dynamic value in the loss-neutral benchmark. We then examine the
log–log slope of Δ in T.
A convenient parameterization that yields clear separation between dynamic and fixed pricing is one in which (i) the static optimum is interior and (ii) gains from a temporarily elevated reference are not swamped by the base-demand loss from higher prices. Concretely, we set a moderate (so base demand is not too elastic), choose η comparable to a (so reference effects are first-order), and initialize r1 below the static monopoly price (so there is room for ``anchoring’’ upward). In this regime, the computed pt* typically features an early phase with elevated prices relative to the myopic map pmyop(rt), followed by a markdown that harvests gains when pt falls below the elevated rt.
Figure~ plots Δ(T, α) for α ∈ {0, 0.25, 0.5, 0.75, 1} over horizons T spanning one to two orders of magnitude. The key visual pattern is that the curves steepen with α. A simple least-squares fit of log Δ on log T (excluding very small T where finite-horizon end effects dominate) yields slopes close to α; the fit is tightest for intermediate α, where Δ is neither essentially constant (as for α = 0) nor dominated by truncation at t = T (as can happen when α is very close to 1 and T is small).
An informative normalization is Δ(T, α)/Tα: across the same runs, this ratio remains approximately flat in T up to a slow drift that is consistent with the logarithmic factor in the upper bound. In other words, the computations match the qualitative message that the effective memory length tα is the unit of intertemporal coupling, and hence the appropriate unit of dynamic advantage.
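The scaling diagnostic itself reduces to a log–log regression; the sketch below (reusing policy_coefficients and fixed_price_value from the earlier sketches, with illustrative parameters) computes Δ(T, α) across horizons and fits the slope that the experiment compares to α.
\begin{verbatim}
import numpy as np

def dynamic_gap(T, alpha, zeta, a, b, eta, r1, p_bar):
    delta = (1.0 - zeta) / (np.arange(1, T + 1) + 1.0)**alpha
    beta, gamma = policy_coefficients(a, b, eta, delta, T)
    r, v_dyn = r1, 0.0
    for t in range(T):                  # roll the optimal affine policy forward
        p = beta[t] * r + gamma[t]      # interiority assumed (checked ex post)
        v_dyn += p * (b + eta * r - (a + eta) * p)
        r = (1.0 - delta[t]) * r + delta[t] * p
    grid = np.linspace(0.0, p_bar, 401)   # best fixed price by grid search
    v_fix = max(fixed_price_value(p, r1, delta, a, b, eta) for p in grid)
    return v_dyn - v_fix

Ts = np.array([100, 200, 400, 800, 1600])
gaps = np.array([dynamic_gap(T, 0.5, 0.2, a=1.0, b=10.0, eta=0.8,
                             r1=2.0, p_bar=20.0) for T in Ts])
slope = np.polyfit(np.log(Ts), np.log(gaps), 1)[0]   # compare with alpha
\end{verbatim}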
Holding α fixed and increasing ζ uniformly shrinks δt, making the reference harder to move. In the computations, this appears as a downward shift in Δ(T, α) with minimal change in the log–log slope. This matches the theory: α governs the rate of forgetting (effective memory growth), while ζ governs the level of responsiveness. Practically, one can think of ζ as controlling how much ``effort'' (sustained high pricing) is required to raise the reference by a given amount.
We next fix a horizon T and plot optimal price and reference trajectories (pt*, rt*) for a grid of α values. The goal is not to claim a universal shape (this depends on primitives and boundary constraints), but to show how the same economic forces manifest differently under short versus long effective memory.
When α = 0 (exponential smoothing with constant δt = 1 − ζ), the optimal policy typically resembles a smoothed version of myopic pricing: pt* tracks pmyop(rt) closely, with a modest dynamic correction early in the horizon. Intuitively, because the reference adjusts geometrically, any single price affects the distant future only weakly; the seller has little incentive to accept a large current sacrifice to influence far-ahead demand.
As α increases, the adjustment slows down, and the optimal policy becomes more front-loaded. Two robust deformations appear in the numerics: early-horizon prices sit further above the myopic map, and the subsequent markdown begins later and proceeds more gradually. These patterns are exactly what the kernel lemma suggests: when the system remembers roughly tα periods, the return to lifting rt today is amortized over that window, so the optimal control allocates more ``effort'' to shaping rt early.
Figure~ overlays price paths for several α under gain-seeking (or loss-neutral) effects. The paths are non-increasing, consistent with the markdown optimality result. More interestingly, the timing of the markdown changes: for small α the path drops relatively quickly toward its long-run level; for larger α the price remains elevated longer before declining, reflecting the fact that the reference is harder to move later in time (smaller δt) and thus must be influenced earlier if it is to matter.
To connect more directly to the theory, we compute the gain duration
Dur := |{t : rt* − pt* ≥ κ}|
for a small threshold κ > 0. Across our parameter grid,
Dur increases with α holding T fixed, and for fixed α it increases with T at a rate consistent with Tα. This
diagnostic is useful because it separates the economic channel (time
spent harvesting gains) from level effects (the overall magnitude of
prices).
Finally, we examine how the policy and the dynamic advantage change when consumers are loss-averse (η− > η+) or strongly gain-seeking (η+ > η−). This is the empirically relevant case, and it is precisely where our analysis emphasizes structure (markdown optimality or near-optimality) rather than closed-form solutions.
When η+ > η−, the computed dynamic optimum from the discretized Bellman recursion is a markdown in all instances we tested: any local non-monotonicity disappears as we refine the state grid and the price maximization. Quantitatively, relative to the loss-neutral benchmark, the optimal policy becomes more aggressive in creating and exploiting the gain region p ≤ r. The anchoring phase is slightly shorter but higher, followed by a more pronounced markdown, because the marginal return to generating perceived gains is larger.
The dynamic advantage over fixed pricing also increases in level, but its scaling in T remains tied to α. This is consistent with the idea that η+ changes the level of the wedge while α changes its scaling in T.
When η− > η+, the computed optimal policy can exhibit a qualitatively different early-horizon behavior depending on r1. If r1 is low, pricing above r1 triggers a loss penalty, making a sharp anchoring move unattractive. In this case, the dynamic optimum often begins near p = r (or slightly below) to avoid losses, and then transitions into a markdown as rt is gradually pulled upward. If instead r1 is high (or the seller inherits a high ``usual price’’), the optimal path resembles the gain-seeking case: the seller can stay below the reference immediately and harvest gains without paying the loss penalty.
These patterns underscore a practical point: under loss aversion, the initial reference r1 is first-order, not merely a transient. Since δt decays, a low initial reference can persist for a long time, and the opportunity cost of moving it is concentrated early.
To connect to our near-optimality claim, we compare the true dynamic optimum under loss aversion (from discretized DP) to the best segmented-markdown policy with K breakpoints. Across a broad set of instances, the segmented markdown achieves a large fraction of the dynamic advantage relative to fixed pricing, and the residual gap grows slowly with T (consistent with an O(Tαlog T) envelope rather than a linear-in-T degradation). Empirically, increasing K yields diminishing returns: most of the benefit is captured by allowing an early plateau (to avoid losses while nudging rt upward) and a later decline (to harvest gains).
The main limitation we observe is not that markdowns become poor, but that applying the loss-neutral affine policy can be very suboptimal when η− ≫ η+. In such cases, even a small episode of pricing above reference can erase much of the long-run benefit of raising rt, because losses are immediate while gains are diluted over the effective memory window. This reinforces the modeling lesson: asymmetry matters not only for welfare interpretation but also for algorithm design, since it changes which regions of the state space are economically safe to explore.
The numerical exercises reinforce three messages. First, the effective-memory exponent α is visible in finite-horizon computations: dynamic gains scale approximately like Tα and policies become more front-loaded as α increases. Second, the policy shapes are interpretable through a simple anchoring-versus-harvesting narrative that survives beyond the loss-neutral benchmark. Third, while asymmetry introduces kinks and complicates exact computation, it does not eliminate the organizing role of markdowns; rather, it shifts the early-horizon incentives by making it costly to price above a low inherited reference.
These patterns motivate the broader questions we turn to next: if a platform or regulator can influence the effective memory of consumers (through information design, displayed ``usual prices,’’ or rules governing reference disclosures), what are the implications for revenue, welfare, and fairness, and how should we think about policy that effectively chooses α and ζ?
Our results and simulations emphasize that ``how long consumers remember'' is not a merely descriptive parameter: it is an object that platforms, firms, and regulators can influence through information design. In the model, the pair $(\alpha,\zeta)$ governs the effective span and intensity of the reference mechanism, and the main comparative statics run through the induced memory length on the order of $t^{\alpha}$ (with overall responsiveness scaled by $1-\zeta$). This mapping suggests a useful translation for practice: policies that make older prices salient (or that stabilize the notion of a ``usual price'') effectively reduce forgetting and thereby raise α (or lower ζ), whereas policies that foreground
the most recent transaction and de-emphasize history effectively
increase forgetting and thereby lower α (or raise ζ). In this section we discuss how
these levers interact with revenue, welfare, and fairness, and we
outline open questions that arise once one moves beyond our stylized
environment.
Many retail transactions are mediated by platforms that choose what
price history is displayed, what comparison is suggested, and how
discounts are framed. Common design choices include: whether to show a
strike-through ``was'' price, whether to show a price range over the last $30$ or $90$ days, whether to display the buyer's own past purchase price, and whether to provide cross-seller comparisons. Each of these design elements can be interpreted as shaping the weights consumers implicitly place on past prices. In our notation, the kernel lemma makes this precise: the reference state is a weighted average of recent posted prices, with mass concentrated on the most recent $\tilde{O}(t^{\alpha})$ periods. A platform that forces the comparison set to be short (e.g., ``lowest price in the last 7 days'') effectively truncates the memory kernel and
pushes behavior toward a smaller effective α; conversely, a platform that
highlights a longer lookback window or reinforces a stable ``regular
price’’ pushes toward a larger effective α.
This framing matters because the seller’s optimal policy reacts sharply to memory. When effective memory is long, early prices have persistent influence, and we obtain both larger dynamic gains and stronger incentives to front-load high prices and then markdown. When effective memory is short, the intertemporal benefit of shaping reference is small, and pricing becomes closer to static optimization. Thus, even absent any change in preferences, a platform can amplify or dampen incentives for intertemporal price manipulation by changing how price history is curated.
A practical implication is that the platform’s objective function is pivotal. A platform that is paid via ad valorem fees may prefer designs that increase seller revenue, which in our environment may correspond to a larger effective α (or smaller ζ) when reference effects are important, because the dynamic advantage scales like Tα up to logarithms. A platform concerned with consumer satisfaction and long-run retention may prefer the opposite if consumers dislike feeling manipulated, if high initial prices reduce conversion, or if perceived unfairness increases churn. The model therefore provides a language for an empirically grounded question: which UI and disclosure policies shift effective memory parameters, and how do those shifts translate into seller revenue and consumer outcomes?
Revenue comparisons alone are not welfare statements. Reference dependence is often interpreted as a behavioral distortion, and policy discussions frequently treat the reference as something that can be exploited. Our analysis is consistent with that interpretation in the sense that dynamic pricing creates wedges (periods where rt − pt > 0) that raise demand without improving intrinsic product value. Yet the welfare consequences are ambiguous once one accounts for heterogeneous consumers, participation decisions, and the possibility that consumers derive genuine utility from perceived gains (even if reference-dependent).
A helpful decomposition separates three channels. First, there is the standard allocative channel: higher prices reduce quantity relative to marginal cost and thus may reduce total surplus in a competitive benchmark. Second, there is a volatility channel: front-loaded pricing and markdowns can shift when consumers purchase, potentially creating congestion or stockouts (in richer models with capacity) and increasing the variance of consumer expenditure over time. Third, there is a behavioral channel: if perceived gains and losses are not mere framing but correspond to experienced utility, then creating gain episodes can increase consumer welfare even when the seller extracts more revenue. In our reduced-form demand specification, these channels are not separately identified; the demand response captures the net effect of perceived gains and losses on purchasing behavior, not the underlying welfare primitives.
Nonetheless, the comparative statics deliver a clear policy lesson: interventions that lengthen memory (larger α) increase the seller’s ability to shift demand intertemporally by shaping reference, so any welfare analysis should treat memory as a lever. If a regulator believes that reference manipulation is harmful, then limiting the salience of long price histories (or standardizing what constitutes a valid comparison) can reduce the long-horizon intertemporal coupling and attenuate the incentive for aggressive anchoring. Conversely, if a regulator believes that consumers benefit from stable expectations and from being protected against short-term price spikes, then requiring longer lookback windows for advertised ``discounts’’ may be appropriate even if it increases the seller’s scope for dynamic extraction along other margins.
Reference effects are likely heterogeneous: some consumers track prices closely, others rely heavily on platform-provided cues, and still others face attention or liquidity constraints that make them more susceptible to high initial prices. In such settings, the same shift in effective memory can have uneven impacts across groups. For example, making a long price history salient may help sophisticated consumers avoid overpaying, but it may also create stronger reference anchoring that harms inattentive consumers who enter mid-horizon and treat displayed ``regular prices’’ as informative. Similarly, under loss aversion (η− > η+), a low inherited reference can lock in a long period of relatively low prices (benefiting later consumers) but may also penalize early adopters if the seller avoids pricing above reference and delays product availability or promotion.
Fairness concerns also arise from segmentation. Platforms can personalize displayed comparisons (e.g., ``you saved relative to your usual price'') or vary the reference window across users. In our language, this is equivalent to assigning different (α, ζ) or even different reference states rt across consumer cohorts. Such design can be efficiency-enhancing if it better matches information to preferences, but it can also function as a form of behaviorally mediated price discrimination. Importantly, discrimination can occur even when the posted price pt is uniform: if different groups carry different references, then the same price generates different perceived gains/losses and hence different effective demand elasticities. This suggests a novel fairness diagnostic: disparate outcomes may arise not only from different prices but from different reference states induced by platform design.
A regulator or platform auditor therefore faces two related questions. The first is transparency: are consumers told how ``usual price’’ benchmarks are computed and over what window? The second is parity: are benchmark rules consistent across users and sellers, or are they optimized to increase conversion and revenue in ways that systematically disadvantage certain groups? Our framework does not resolve these questions normatively, but it helps clarify what should be measured: not only realized prices, but also the informational environment that shapes reference.
Many jurisdictions regulate strike-through pricing and ``was/now'' claims by specifying a lookback period or requiring that the reference price be a genuine prevailing price for a nontrivial duration. These rules can be interpreted as imposing constraints on the reference signal available to consumers. In our setting, such regulation can be viewed as an external choice of an effective memory window: a requirement that the ``was''
price be the lowest price in the last L days pushes consumer comparisons
toward a longer and more conservative benchmark, while allowing a seller
to cite any recent high price pushes comparisons toward a shorter and
potentially manipulable benchmark.
This interpretation yields two nuanced implications. First, stricter ``usual price'' rules can reduce deceptive anchoring (in the sense of preventing a seller from briefly posting an inflated price solely to create a reference), but they can also make legitimate markdown strategies more potent by stabilizing the reference once it has been earned. Second, the welfare effect depends on whether the regulation changes only the information consumers receive about past prices or also the price path via seller incentives. Our model focuses on the latter: changing effective memory changes optimal pricing, so regulation can have equilibrium effects rather than merely informational effects.
From a policy design perspective, the scaling results provide a way to anticipate magnitude. If the effective memory exponent rises, the potential revenue gain from dynamic reference shaping rises roughly like Tα. Even if the absolute horizon T is not literally the number of days a product is sold, the same logic applies to the length of the period over which consumers consider the product and over which sellers can run promotions. For seasonal goods, subscriptions, or durable purchase cycles, that window can be long, making the choice of reference policy consequential.
Several limitations temper how literally one should take the policy mapping from (α, ζ) to design choices. First, our consumers are myopic and summarized by a reduced-form demand curve; strategic waiting, stockpiling, and forward-looking inference about future discounts are excluded. These forces can either reinforce markdowns (as in classic durable goods intuition) or dampen them (if consumers anticipate discounts and delay purchase). Second, the seller is a monopolist with no inventory constraints. In many environments, inventory and replenishment create additional motives for dynamic pricing that interact with reference effects, and competition can discipline the ability to sustain high early prices. Third, the reference update rule is deterministic and common knowledge; in practice, consumers may have noisy memories and may learn the seller’s pricing regime over time.
These limitations matter for welfare claims. In particular, if consumers are forward-looking, then reference manipulation may be partially undone by expectations, and the welfare effect of longer memory could reverse. Similarly, in competitive markets, longer memory could intensify price competition by making deviations from a low-price norm more costly in demand terms, which could reduce prices and increase consumer surplus. Thus, we view the present model as an organizing device: it isolates a clean channel (time-varying effective memory) and shows how it scales incentives, but it is not a complete regulatory impact model.
Several research directions appear especially important for connecting theory to platform policy.
The parameters (α, ζ) are not directly observed. Estimating them requires panel data on prices, demand, and potentially on platform display policies, with careful attention to endogeneity (prices respond to anticipated demand). Natural experiments that change disclosure rules or UI elements offer a promising route, as they may shift memory without directly changing costs or product quality.
Real consumers may form references from multiple sources: their own past purchases, competitor prices, list prices, and platform-provided benchmarks. A multi-reference model would endogenize which reference is salient and allow platforms to choose salience weights. Conceptually, this generalizes the scalar state rt to a vector, with platform policy selecting a mapping from price histories to the salient component.
In many applications the seller does not know (a, b, η+, η−) ex ante and must learn. Reference dependence complicates exploration because prices affect both current demand and future state. Designing learning algorithms with regret guarantees that reflect the effective memory length (and that respect monotonicity or ``no gouging’’ constraints) remains largely open, especially under asymmetric kinks.
A normative analysis requires a model of utility with reference dependence, not just demand. Embedding our dynamics in a utility-based framework would allow one to separate experienced gains/losses from purely behavioral distortions and to quantify when policies that reduce manipulation also reduce consumer enjoyment from discounts.
Finally, platform policy is rarely chosen in a monopoly vacuum. Competing platforms may choose different reference disclosures to attract consumers, and sellers may multi-home. Understanding equilibrium in disclosure and reference design, and its interaction with dynamic pricing, is essential for translating our comparative statics into market-level predictions.
Taken together, these discussion points reinforce the main conceptual takeaway: reference dependence makes the informational environment a first-class determinant of dynamic pricing incentives. By interpreting UI choices and ``usual price’’ rules as choices over effective memory, we obtain a coherent way to link platform design, regulation, revenue, and distributional outcomes, and we identify a set of measurable objects—memory span, responsiveness, and asymmetry—that should anchor empirical and policy work going forward.