Digital retail in 2026 looks less like a posted-price storefront and
more like a continuously tuned interaction between a platform, its
sellers, and consumers who carry increasingly explicit price memories. Two
developments are especially salient. First, platforms have made
``reference prices''---the benchmarks against which a current offer is framed---both more visible and more standardized. A product page routinely displays a ``was'' price, a ``typical'' price, a struck-through list price, or a percentage discount badge, and these labels are often computed using a stated lookback rule (e.g., ``lowest price in the last 30 days''). Second, the data environment has expanded
the horizon of consumer recall: wish lists, price-tracking extensions,
persistent carts, and algorithmically curated ``price drop’’
notifications all effectively extend the set of past prices a buyer can
retrieve at the moment of purchase. These forces make it difficult to
treat reference dependence as a short-memory phenomenon and, at the same
time, difficult to justify the opposite extreme in which the entire
history is always equally salient.
This paper is motivated by the practical gap between two canonical
modeling approaches to reference formation. In the exponentially
smoothed model (ESM), the reference state updates as a constant-weight
average,
$$
r_{t+1}=\zeta r_t+(1-\zeta)p_t,
$$
so the influence of prices posted $m$ periods ago decays geometrically in $m$. This is a convenient
approximation when the environment is stationary and consumers ``forget'' at a constant hazard. Yet it sits uneasily with platform practices that use \emph{calendar} lookbacks (e.g., $30$ days) or with consumer behavior that becomes more anchored as experience accumulates: a buyer who has repeatedly seen a product at \$100 for months does not revise her notion of the ``normal'' price as quickly as a first-time visitor. At the other pole, the
arithmetic reference model (ARM) makes the reference a running average,
which corresponds to
$$
r_{t+1}=\frac{t}{t+1}r_t+\frac{1}{t+1}p_t,
$$
so the weight on each past price is roughly 1/t and the effective memory grows
linearly with age. This specification captures a kind of
learning-by-accumulation, but it can overstate the salience of very old
prices in contexts where products change, competitors enter, and
consumer attention is episodic rather than archival. In short, ESM can be too forgetful and ARM can be too retentive.
Our starting point is that platforms and consumers alike appear to
operate with an intermediate regime: the reference is built from past
prices, but the rate at which it incorporates new information declines
over time. We capture this with a time-varying step size,
$$
r_{t+1}=(1-\delta_t)r_t+\delta_t p_t,
\qquad
\delta_t=\frac{1-\zeta}{(t+1)^\alpha},\qquad \alpha\in[0,1],\
\zeta\in[0,1).
$$
The parameter α is a memory knob. When
α = 0 we recover ESM with
constant updating intensity 1 − ζ. When (α, ζ) = (1, 0) we recover
ARM. For intermediate α, the
consumer reference becomes increasingly inert: early in the horizon, new
posted prices move the benchmark materially; later, the same price
change moves the benchmark only slightly. This feature is not merely a
technical interpolation. It encodes the idea that a consumer (or a
platform rule) gradually ``locks in’’ a notion of what the regular price
should be.
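To make the interpolation concrete, here is a minimal simulation sketch (parameter values are illustrative, not taken from the model) of the update $r_{t+1}=(1-\delta_t)r_t+\delta_t p_t$ with $\delta_t=(1-\zeta)/(t+1)^{\alpha}$, contrasting α = 0 (ESM-like), an intermediate α, and α = 1 (ARM-like).

```python
import numpy as np

def reference_path(prices, r1, alpha, zeta):
    """Iterate r_{t+1} = (1 - delta_t) r_t + delta_t p_t with
    delta_t = (1 - zeta) / (t + 1)**alpha for t = 1, 2, ..."""
    r = np.empty(len(prices) + 1)
    r[0] = r1
    for t, p in enumerate(prices, start=1):
        delta = (1.0 - zeta) / (t + 1) ** alpha
        r[t] = (1.0 - delta) * r[t - 1] + delta * p
    return r

# Illustrative scenario: a long stretch at a "regular" price of 100,
# then a one-week promotion at 80.
prices = np.array([100.0] * 50 + [80.0] * 7 + [100.0] * 13)

for alpha in (0.0, 0.5, 1.0):
    r = reference_path(prices, r1=100.0, alpha=alpha, zeta=0.2)
    print(f"alpha={alpha}: reference right after the promotion = {r[57]:.2f}")
```

With α = 0 the benchmark chases the promotional price almost immediately, whereas for larger α it stays close to the established regular price, matching the ``locks in'' intuition above.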
Why should updating intensity decline with time? We see at least three complementary interpretations. The first is statistical learning. As a buyer sees more instances of a product price, the estimate of a ``typical'' price becomes statistically more precise, and rational updating would naturally place less weight on any single new observation. The second is costly recall. Digital tools lower the cost of recalling recent prices, but they do not make retrieval of the entire price history free; rather, they create a recency-biased cache whose span can expand as the consumer engages repeatedly with the product category. The third is institutional anchoring by platforms and regulation. Discount labels frequently reference a prescribed window, and sellers can anticipate that early posted prices may serve as a future anchor in those compliance calculations or in consumers' mental comparisons. In each interpretation, it is plausible that a reference is moved quickly when it is still being formed, and slowly once it is established.
The step-size path $\delta_t \asymp (t+1)^{-\alpha}$ implies a precise prediction about effective memory length. Intuitively, the impact of a past price $p_{t-m}$ on today's reference $r_t$ is roughly proportional to a product of terms of the form $(1-\delta_k)$, which behaves like an exponential in the cumulative step sizes $\sum_{k=t-m}^{t}\delta_k$. When $\delta_k$ scales as $k^{-\alpha}$, this cumulative sum over the last $m$ periods is approximately $m/t^\alpha$. Consequently, prices older than order $t^\alpha$ periods carry exponentially small weight, while prices within the most recent order $t^\alpha$ periods account for most of the reference. We therefore obtain a growing, but sublinear, memory span: the relevant lookback is not fixed (as in ESM), and it is not proportional to the entire age of the product (as in ARM), but rather expands at rate $t^\alpha$. This delivers a disciplined way to talk about long lookbacks in mature products without forcing the model to keep essentially the entire early history on the margin.
This ``growing memory'' perspective clarifies a common platform phenomenon: the same promotional tactic can feel potent early in a product's lifecycle and muted later. If a seller launches a product and cycles through prices in the first few weeks, consumers may quickly revise what they consider normal. Months later, after repeated exposures at a stable regular price, a one-week promotion may generate a sharp spike in conversion while leaving the reference largely intact; the promotion is perceived as a \emph{deal} rather than a redefinition of the baseline. Our dynamics reproduce this pattern mechanically: in later periods, $\delta_t$ is small, so the reference reacts little to the temporary discount, preserving the ``was/now'' wedge that drives deal-seeking demand. Conversely, when α is small (fast forgetting), deep
discounts may rapidly reset the reference downward, reducing the
profitability of frequent promotions and aligning with the practical
advice that constant discounting can ``train customers’’ to expect low
prices.
A second motivation comes from the design of deal labels and their strategic use. Sellers often care not only about the contemporaneous conversion effect of a price cut, but also about the credibility of a discount badge. If the platform’s badge is computed from some past benchmark, then raising today’s posted price can increase tomorrow’s perceived discount even if the seller plans to reduce price later. This intertemporal tradeoff is central to reference-dependent demand and is precisely where the choice between ESM and ARM matters. Under constant-memory ESM, any attempt to ``set an anchor’’ dissipates at a constant rate. Under full-history ARM, anchors are persistent but also very hard to move once enough history has accumulated. By allowing intermediate α, we can represent environments in which anchors are persistent enough to matter (long lookback), yet still responsive enough to be strategically shaped (nontrivial dynamic optimization).
The economics of such shaping interacts sharply with the well-documented asymmetry between perceived gains and losses. Consumers often respond more favorably to prices below their reference than they respond unfavorably to prices above it, or vice versa, depending on category and context. Deal labels can amplify this asymmetry: a discount tag makes gains salient, whereas a price increase without a tag may be less salient or may trigger distrust. Our demand specification will allow different slopes for gain and loss regions, and the time-varying reference update translates these local demand sensitivities into global pricing patterns. In particular, when gains are especially effective, one expects a high-to-low (markdown) pricing path: early high prices elevate the benchmark; later lower prices harvest ``gain’’ demand. When losses are especially painful, one expects pricing rules that avoid pricing above the benchmark or that move the reference upward cautiously. What our framework adds is a parameterized notion of how long such intertemporal effects last and how they scale with the horizon.
The parameter α therefore has a direct set of empirical and managerial implications. Holding other primitives fixed, larger α implies slower forgetting and a longer effective lookback, which raises the return to early anchoring and increases the potential wedge between optimal dynamic pricing and a naïve fixed-price strategy. At the same time, larger α also makes the reference harder to move later, so it can increase the cost of correcting a mistaken anchor. This creates a nuanced prediction: environments with durable price memory (high α) should exhibit both stronger incentives for early price setting—including high introductory ``regular’’ prices and staged markdowns—and stronger path dependence in subsequent pricing. Conversely, in settings where consumers are volatile or where platform interfaces emphasize only short-term comparisons (low α), the benefit of sophisticated intertemporal pricing is limited, and static pricing heuristics may perform relatively well.
Beyond managerial relevance, the memory knob matters for policy discussions around pricing transparency and discount regulation. When rules mandate that a discount be computed relative to a historical benchmark, they implicitly choose a memory regime. A longer mandated lookback can protect consumers from artificial ``was’’ prices, but it can also increase the incentive to set high initial prices that then serve as durable anchors. Our model provides a way to articulate this tradeoff: regulation that effectively raises α (by extending the relevant memory) can reduce short-run manipulation via rapid reference resetting, yet increase the long-run value of early anchoring. We do not claim this is the only channel, but it is a disciplined channel that connects a platform design choice to a measurable change in dynamic pricing incentives.
Finally, we emphasize what we are doing. We do not model forward-looking consumers who strategically time purchases, nor do we model competition among sellers or inventory constraints that can dominate pricing in many categories. We also treat the reference as a one-dimensional state updated deterministically from posted prices, which abstracts from heterogeneous memories, advertising, and external reference points (e.g., competitor prices). Our goal is narrower: to isolate how a simple, time-varying reference updating rule—interpretable as a long lookback with growing effective span—changes the structure and value of dynamic pricing relative to the two canonical extremes. Within that scope, the model illuminates a basic tradeoff: the same mechanism that makes prices ``stick’’ in consumers’ minds also creates an intertemporal instrument for sellers, and the strength of that instrument is governed by the single parameter α.
We study a monopolist seller who posts a price over a finite selling horizon t ∈ {1, …, T} for a single product. The key state variable is a reference state that summarizes how buyers benchmark the current offer against past prices. Our goal in this section is to (i) specify the reference-dependent demand system with asymmetric responses to gains and losses, (ii) define the seller's feasible actions and the reference dynamics with time-varying updating intensity, and (iii) formulate the seller's dynamic revenue maximization problem and the associated value function. Throughout, we treat consumers as myopic in the sense that demand in period t depends on (pt, rt) but not directly on expected future prices; all intertemporal considerations enter through the evolution of the reference state.
Time is discrete. At the beginning of period t, the seller observes the current reference state rt ∈ [0, p̄] and then chooses a posted price pt ∈ [0, p̄], where p̄ is an exogenous maximum feasible price (e.g., an institutional cap, a menu bound, or a range in which demand remains well-defined). Demand then realizes, revenue is collected, and the reference state updates deterministically according to the realized posted price.
The reference update rule is a convex combination of the current reference and the posted price, $r_{t+1}=(1-\delta_t)r_t+\delta_t p_t$, with step size $\delta_t=(1-\zeta)/(t+1)^{\alpha}$.
Because δt ∈ (0, 1] for all t (given ζ < 1 and (t + 1)^α ≥ 1), the reference remains in [0, p̄] whenever r1, pt ∈ [0, p̄].
The parameter ζ acts as a
baseline inertia term: increasing ζ shrinks all step sizes
proportionally, making the reference harder to move at every date. The
parameter α governs how
quickly updating intensity decays with age. When α is larger, the reference becomes
increasingly inert as t grows,
reflecting the idea that the benchmark ``locks in’’ with accumulated
exposure.
It is useful to interpret the update rule in two equivalent ways. First, rt + 1 − rt = δt(pt − rt), so δt directly scales the one-period adjustment toward the current posted price. Second, iterating shows that rt is a weighted average of past prices and the initial condition, with weights determined by products of (1 − δs). We postpone the explicit weight representation and its consequences to the next section, but we will already use here the basic intuition: because δt declines like (t + 1)^{−α}, early prices can influence the benchmark for longer when α is larger.
Demand in period t is a
reference-dependent function of (pt, rt)
plus an additive shock: $D_t=D(p_t,r_t)+\varepsilon_t$, where the shocks $\varepsilon_t$ are i.i.d., mean-zero, and bounded.
The boundedness assumption is a technical convenience that guarantees
integrability of revenues and avoids pathological corner solutions; none
of our qualitative insights rely on heavy tails.
The deterministic component D(p, r)
incorporates a standard base-demand term and a piecewise-linear
reference adjustment: $D(p,r)=b-ap+\eta_+(r-p)_+-\eta_-(p-r)_+$,
where (x)+ := max {x, 0}.
The parameters a > 0 and
b ≥ 0 govern the base linear
demand curve, while η+, η− ≥ 0
capture the strength of reference effects in the gain and loss regions,
respectively. When p ≤ r, the price is perceived as a gain relative to the benchmark and demand increases by η+(r − p). When p ≥ r, the price is perceived as a loss and demand decreases by η−(p − r). This specification nests the symmetric, loss-neutral case η+ = η− =: η
and allows us to separate two empirically plausible environments:
gain-seeking categories (large η+ relative to η−) and loss-averse
categories (large η− relative to η+).
Two modeling remarks are important for interpretation. First, the reference enters demand only through the difference r − p and only via the sign-dependent slope; r is not a separate ``quality’’ shifter. This isolates framing and comparison effects. Second, the piecewise linearity deliberately emphasizes tractability and the economic logic of gain/loss asymmetry. In particular, within each region (p ≤ r or p ≥ r), period demand is affine in both p and r, which will allow us later to characterize optimal pricing paths via relatively simple recursions rather than full-blown numerical dynamic programming.
We also implicitly impose feasibility conditions ensuring demand does not become negative over the price range of interest. Because the specification can in principle yield negative values at very high prices (or if η− is large), one can either interpret D(p, r) as an approximation valid on [0, p̄] with p̄ chosen so that D(p, r) is nonnegative for relevant (p, r), or replace D with max {D, 0} without changing the central intertemporal mechanism. For clarity, we maintain the linear specification and assume parameters are such that expected revenue is well-defined and the seller's optimum is attained in [0, p̄].
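For concreteness, a minimal sketch of the demand primitive just described (the optional truncation reflects the max {D, 0} interpretation mentioned above; parameter values are illustrative):

```python
def demand(p, r, b, a, eta_plus, eta_minus, truncate=False):
    """Deterministic demand D(p, r) = b - a*p + eta_+ (r - p)_+ - eta_- (p - r)_+."""
    gain = max(r - p, 0.0)   # price below the benchmark: perceived gain
    loss = max(p - r, 0.0)   # price above the benchmark: perceived loss
    d = b - a * p + eta_plus * gain - eta_minus * loss
    return max(d, 0.0) if truncate else d

# A discount below the reference lifts demand; a price above it depresses demand.
print(demand(p=90.0, r=100.0, b=200.0, a=1.0, eta_plus=0.5, eta_minus=1.5))   # 115.0
print(demand(p=110.0, r=100.0, b=200.0, a=1.0, eta_plus=0.5, eta_minus=1.5))  # 75.0
```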
Given a price p and
reference state r, expected
period revenue (conditioning on the state and action) is $\Rev(p,r):=p\,D(p,r)$,
since 𝔼[Dt ∣ p, r] = D(p, r) by the mean-zero shock assumption. The seller is risk-neutral and chooses prices to maximize expected total revenue over the horizon, $\sup_{\pi}\ \mathbb{E}\big[\sum_{t=1}^{T}\Rev(p_t,r_t)\big]$,
where a (possibly history-dependent) policy π selects pt based on the
information available at time t. Because the reference state
evolves deterministically from past posted prices via the update rule, and demand
shocks are i.i.d. and additively separable, the problem is Markov in
(rt, t):
the entire payoff-relevant history is summarized by the current
reference level and the time index.
Accordingly, we define the continuation value at time t as
$$
V^*(r,t)=\sup_{\pi}\ \mathbb{E}\Big[\sum_{s=t}^T \text{Rev}(p_s,r_s)\
\Big|\ r_t=r\Big],
\qquad
V^*(r,T+1)=0.
$$
The Bellman equation takes the familiar form
$$
V^*(r,t)=\max_{p\in[0,\bar p]}\Big\{\Rev(p,r)+V^*\big((1-\delta_t)r+\delta_t p,\ t+1\big)\Big\}.
$$
The economic content of the Bellman equation is immediate: posting a higher price today
raises current revenue mechanically, but also shifts tomorrow’s
reference upward, which can be valuable (if it makes future prices look
like gains) or costly (if it makes future prices look like losses, or if
it pushes the seller into a region where demand is more sensitive). The
time-varying δt determines
how strongly this intertemporal channel operates at each date.
A useful benchmark for interpretation is the best fixed-price policy,
where the seller chooses a constant p and commits to pt ≡ p
for all t. Under such a
policy, the reference follows the update recursion with pt = p,
so rt
converges toward p at a rate
controlled by (δt). We write
the best fixed-price expected revenue from r1 as
$$
V^{\text{fix}}(r_1):=\max_{p\in[0,\bar p]}\ \sum_{t=1}^T
\mathbb{E}[\text{Rev}(p,r_t)].
$$
Comparing V*(r1)
and Vfix(r1)
will let us quantify the value of dynamic price paths that intentionally
shape the reference over time.
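As a numerical illustration of this comparison (a brute-force sketch, not the solution method developed later: it discretizes the reference state onto a grid, uses illustrative parameter values, and truncates demand at zero), the following code approximates $V^*(r_1)$ by backward induction and $V^{\text{fix}}(r_1)$ by enumerating fixed prices.

```python
import numpy as np

T, r1 = 40, 60.0
b, a, eta_p, eta_m = 200.0, 1.0, 0.8, 0.8      # loss-neutral for simplicity
alpha, zeta, p_bar = 0.7, 0.2, 120.0

grid = np.linspace(0.0, p_bar, 241)            # shared grid for prices and references

def delta(t):
    return (1.0 - zeta) / (t + 1) ** alpha

def rev(p, r):
    d = b - a * p + eta_p * np.maximum(r - p, 0.0) - eta_m * np.maximum(p - r, 0.0)
    return p * np.maximum(d, 0.0)

# Backward induction for V*(r, t) on the discretized reference grid.
V = np.zeros(len(grid))                        # V(., T+1) = 0
for t in range(T, 0, -1):
    Vnew = np.empty_like(V)
    for i, r in enumerate(grid):
        r_next = (1.0 - delta(t)) * r + delta(t) * grid          # one r' per candidate price
        idx = np.clip(np.rint(r_next / p_bar * (len(grid) - 1)).astype(int),
                      0, len(grid) - 1)                          # nearest grid point
        Vnew[i] = np.max(rev(grid, r) + V[idx])                  # best price at (r, t)
    V = Vnew
print("approx V*(r1)    :", round(V[np.argmin(np.abs(grid - r1))], 1))

# Best fixed-price value: simulate the deterministic reference path for each candidate.
def fixed_value(p):
    r, total = r1, 0.0
    for t in range(1, T + 1):
        total += rev(p, r)
        r = (1.0 - delta(t)) * r + delta(t) * p
    return total

print("approx V^fix(r1) :", round(max(fixed_value(p) for p in grid), 1))
```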
The specification embeds two canonical reference formation rules as
special cases. When α = 0, we
have constant updating intensity δt = 1 − ζ
and the state recursion becomes
rt + 1 = ζrt + (1 − ζ)pt,
the exponentially smoothed model (ESM). ESM implies that the impact of a
past price decays at a constant geometric rate, so the consumer’s
effective lookback is roughly constant over time. When (α, ζ) = (1, 0), we have
δt = 1/(t + 1)
and obtain the arithmetic reference model (ARM),
$$
r_{t+1}=\frac{t}{t+1}r_t+\frac{1}{t+1}p_t,
$$
under which the reference is essentially a running average of all past
prices, and very old prices retain nontrivial influence.
For intermediate α ∈ (0, 1), the model formalizes a
middle regime: the benchmark incorporates new prices, but the rate of
incorporation declines as the product ages. Although we defer formal
kernel calculations, it is worth stating the basic mechanism that will
underlie our results. If we perturb a past price ps slightly, its
influence on a later reference rt is multiplied
forward through the dynamics by a sequence of ``retention’’ factors
(1 − δk)
for k = s + 1, …, t − 1.
When δk is
small, retention is high; when δk is large,
retention is low. With δk ≍ k^{−α},
the cumulative attenuation from s to t is approximately
$$
\prod_{k=s+1}^{t-1}(1-\delta_k)\ \approx\
\exp\Big(-\sum_{k=s+1}^{t-1}\delta_k\Big),
$$
so what matters is the partial sum of step sizes over the intervening
dates. For α > 0, these
step sizes decline, meaning that (holding the gap t − s fixed) late-stage
references are harder to move than early-stage references. Conversely,
for a fixed current time t, the relevant part of the past is the range of s for which $\sum_{k=s+1}^{t-1}\delta_k$ is not too
large; outside that range, the exponential attenuation makes the weight
negligible. This is the sense in which α acts as a memory knob: larger
α slows the growth of the
partial sums and thereby extends the span of past prices that still
matter for today’s benchmark.
This interpretation connects directly to practice. A platform rule that effectively uses a longer lookback window, or a consumer population that accumulates a more stable ``fair price’’ belief over repeated exposures, corresponds in our reduced form to a larger α (and/or a larger ζ). The seller then faces a sharper dynamic tradeoff: early prices have a longer shadow on future perceived gains and losses, but later corrective moves are less effective because the benchmark has become inert. In the next section we make this shadow precise by deriving the closed-form memory kernel and bounding how quickly its mass concentrates on recent prices as a function of α and ζ.
A central object in our analysis is the mapping from the posted-price history (p1, …, pt − 1) to the current reference rt. Because the state update is linear, rt admits a closed-form representation as a weighted sum of past prices (plus the initial condition). This representation plays two roles. First, it gives an explicit ``memory kernel’’ that quantifies how strongly each lagged price affects today’s benchmark. Second, it allows us to translate properties of the step-size schedule (δt)—in particular, its power-law decay governed by α—into quantitative bounds on how long the seller can benefit from shaping the reference.
Iterating the update rule yields, for each t ≥ 2,
$$
r_t=\Big(\prod_{k=1}^{t-1}(1-\delta_k)\Big)r_1+\sum_{s=1}^{t-1}\delta_s\Big(\prod_{k=s+1}^{t-1}(1-\delta_k)\Big)p_s.
$$
We will write this more compactly as
$$
r_t=w_{t,0}\,r_1+\sum_{s=1}^{t-1}w_{t,s}\,p_s,
\qquad
w_{t,0}:=\prod_{k=1}^{t-1}(1-\delta_k),
\quad
w_{t,s}:=\delta_s\prod_{k=s+1}^{t-1}(1-\delta_k).
$$
The coefficients {wt, s} are nonnegative and sum to one: $w_{t,0}+\sum_{s=1}^{t-1}w_{t,s}=1$.
Thus rt is
a convex combination of r1 and the past posted
prices. The ``memory kernel'' at time t is the vector $(w_{t,s})_{s\in\{0,1,\ldots,t-1\}}$,
which depends only on the step-size sequence (δk) and not on
the realized prices.
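A quick numerical companion (a minimal sketch with illustrative parameters, consistent with the weight formulas above): compute the kernel at one date, verify that it is a convex combination, and see how its mass concentrates on recent lags.

```python
import numpy as np

def kernel_weights(t, alpha, zeta):
    """Weights (w_{t,0}, ..., w_{t,t-1}) with r_t = w_{t,0} r_1 + sum_s w_{t,s} p_s."""
    delta = (1.0 - zeta) / (np.arange(1, t) + 1.0) ** alpha        # delta_1 .. delta_{t-1}
    w = np.empty(t)
    w[0] = np.prod(1.0 - delta)                                     # weight on r_1
    for s in range(1, t):
        w[s] = delta[s - 1] * np.prod(1.0 - delta[s:])              # weight on p_s
    return w

w = kernel_weights(t=500, alpha=0.6, zeta=0.2)
window = int(np.ceil(500 ** 0.6))                                   # ~ t^alpha most recent lags
print("sum of weights           :", round(w.sum(), 6))              # 1.0 (convex combination)
print("mass on last t^alpha lags:", round(w[-window:].sum(), 4))
print("mass on older lags       :", round(w[:-window].sum(), 4))
```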
Two immediate sensitivity identities follow from this representation. For any s ≤ t − 1, $\partial r_t/\partial p_s=w_{t,s}$ and $\partial r_t/\partial r_1=w_{t,0}$.
Hence the weights are not merely an algebraic device: they are exactly
the marginal impacts of earlier posted prices (or of the initial anchor)
on the current benchmark. This observation will let us connect the time
scale on which weights decay to the time scale on which a seller can
create an economically meaningful gap between rt and pt.
The weights also give a useful Lipschitz-type inequality that we will
use repeatedly when comparing two price paths. Consider two sequences
(ps)s ≥ 1
and (ps′)s ≥ 1,
with corresponding reference paths (rt) and (rt′)
starting from r1
and r1′.
Then for each t, $|r_t-r_t'|\le w_{t,0}\,|r_1-r_1'|+\sum_{s=1}^{t-1}w_{t,s}\,|p_s-p_s'|$.
This inequality formalizes a theme we return to later: differences in
the distant past matter only insofar as the kernel assigns them
non-negligible mass.
To understand the shape of (wt, s)
under $\delta_k=\frac{1-\zeta}{(k+1)^\alpha}$, it
is helpful to approximate products of the form $\prod_{k=u}^{v}(1-\delta_k)$ by
exponentials. Since log (1 − x) ≤ −x for x ∈ (0, 1), we have $\prod_{k=u}^{v}(1-\delta_k)\le\exp\big(-\sum_{k=u}^{v}\delta_k\big)$.
A matching lower bound can be obtained when δk is small
(e.g., for moderate and large k), using log (1 − x) ≥ −x − x²;
for our purposes, the upper bound already captures the key concentration
phenomenon.
The partial sums of δk depend sharply on α. For α ∈ (0, 1), $\sum_{k=1}^{t}\delta_k\approx\frac{1-\zeta}{1-\alpha}\,t^{1-\alpha}$, while for α = 1, $\sum_{k=1}^{t}\delta_k\approx(1-\zeta)\log t$.
Substituting these estimates into the kernel representation yields a transparent description of how far back the kernel reaches. Roughly, wt, s
is of order δs times an
exponential of (minus) the cumulative step sizes between s and t. When α > 0, δk shrinks with
k, so retention (1 − δk) rises
with k, and the reference
becomes increasingly inert in late periods.
We now formalize the ``effective memory length’’ induced by the power-law schedule. Fix a time t and consider how much total kernel mass lies on prices older than t − m, i.e., on indices s ≤ t − m − 1. Because rt is a convex combination, this tail mass directly bounds the extent to which the distant past can affect the current reference.
A convenient way to quantify tail mass is to track the survival of any perturbation that occurred at or before time t − m − 1 as it propagates to time t. The key step is that products of retentions can be controlled by sums of step sizes on the intervening window.
The content of this bound is that the relevant time scale for the decay of memory is t^α: once we go back more than a constant multiple of t^α periods, the cumulative step size over the last m dates is order m/t^α, and retention decays like exp(−Θ(m/t^α)). In particular, for any fixed c > 0, the kernel mass on lags larger than c·t^α is exponentially small in c, which is the sense in which weights concentrate on the most recent Õ(t^α) prices (where Õ(⋅) hides logarithmic factors when we invert such bounds).
We bound the mass on indices at most t − m − 1 by the maximum
possible survival of any weight from time t − m onward. Using and
nonnegativity of weights, it suffices to control the retention factor
from t − m to t, namely $\prod_{k=t-m}^{t-1}(1-\delta_k)$. Applying the product-to-exponential bound gives
$$
\prod_{k=t-m}^{t-1}(1-\delta_k)\le
\exp\Big(-\sum_{k=t-m}^{t-1}\delta_k\Big).
$$
For α ∈ (0, 1], the sum $\sum_{k=t-m}^{t-1}(k+1)^{-\alpha}$ is
well-approximated by an integral and satisfies $\sum_{k=t-m}^{t-1}(k+1)^{-\alpha}\ge c\,
m/t^\alpha$ for a constant c > 0 as long as m is not too large relative to t (and in fact the bound extends
uniformly with appropriate constants). Multiplying by (1 − ζ) yields the claimed bound.
A useful corollary is an explicit effective lookback window as a function of an accuracy level ε.
In words, to recover all but an ε fraction of the kernel mass at time t, it suffices to look back on the order of t^α log(1/ε) periods. This scaling is the technical underpinning of the phase-transition statements we establish later: with α larger, the seller can create a reference shift that persists for more future periods, and thus dynamic policies can outperform fixed pricing by an amount that grows with T^α.
We next translate the concentration results into bounds on how strongly early decisions can affect late references. While the sensitivity identity already tells us that ∂rt/∂ps = wt, s, it is useful to have simple upper bounds on wt, s that depend on the gap (t − s) and the time scale t^α.
The second inequality uses δs ≤ (1 − ζ)/(s + 1)^α ≤ (1 − ζ)/t^α for s ≤ t. The key implication is that, as a function of the lag ℓ := t − s, the kernel behaves like an exponentially decaying profile when plotted against the rescaled lag ℓ/t^α. Thus the seller's ability to influence rt via a deviation at time s is substantial only when t − s = O(t^α).
A particularly important special case is the sensitivity to the
initial reference. Since $w_{t,0}=\prod_{k=1}^{t-1}(1-\delta_k)$, the
same product-to-exponential argument yields $w_{t,0}\le\exp\big(-\sum_{k=1}^{t-1}\delta_k\big)$.
When α < 1, the influence of r1 dies out extremely fast (stretched-exponentially in t^{1−α}). When α = 1, the decay is polynomial, reflecting the running-average flavor of ARM-like dynamics.
This distinction matters for how quickly the system ``forgets’’ an
initially low reference: under α < 1 the initial anchor becomes
essentially irrelevant relatively early in the horizon, whereas under
α = 1 it continues to exert a
long tail of influence.
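A one-line numerical check of this contrast (an illustrative sketch with ζ = 0, so the ARM case is exact): the initial-condition weight $w_{t,0}=\prod_{k=1}^{t-1}(1-\delta_k)$ decays stretched-exponentially when α < 1 but only like 1/t when α = 1.

```python
import numpy as np

def w_t0(t, alpha, zeta=0.0):
    """Weight of the initial reference r_1 in r_t: prod_{k=1}^{t-1} (1 - delta_k)."""
    k = np.arange(1, t)
    return np.prod(1.0 - (1.0 - zeta) / (k + 1.0) ** alpha)

for t in (10, 100, 1000):
    print(f"t={t:5d}  alpha=0.5: {w_t0(t, 0.5):.3e}   "
          f"alpha=1 (ARM): {w_t0(t, 1.0):.3e}  (exact value 1/t = {1.0 / t:.3e})")
```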
Finally, combining the kernel representation with the tail bound yields a convenient stability bound:
perturbations to prices far in the past have a uniformly small effect on
today’s reference, regardless of the magnitude of the perturbation (as
long as prices remain feasible). Concretely, if two price paths coincide
on the last m periods before
t, then the Lipschitz inequality and the tail bound imply $|r_t-r_t'|\le\bar p\,\exp\big(-\Theta(m/t^\alpha)\big)$, so matching only the recent Õ(t^α) prices suffices to match the reference state up to exponentially small error.
We can now summarize the economic content of the kernel bounds in a way that will guide the pricing results that follow. The power-law schedule δt ≍ t^{−α} implies that, at time t, the reference behaves as if consumers ``remember'' primarily the last on the order of t^α prices, with exponentially small weight on earlier prices. When α is larger, this effective lookback grows, so a high early price can lift the benchmark for more subsequent periods; when α is smaller, the lift dissipates more quickly. At the same time, because δt itself shrinks with t for α > 0, later references become harder to move, so the seller faces a sharper intertemporal tradeoff between exploiting current demand and shaping future comparisons.
These facts are exactly what we need for the next step of the analysis. In the loss-neutral linear-demand benchmark, we will construct price paths that intentionally create a persistent ``reference wedge'' over approximately T^α periods, yielding a revenue advantage over any fixed price of order Ω(T^α). Conversely, we will show that no dynamic policy can generate more than Õ(T^α) total incremental value from reference shaping, because the kernel prevents any single deviation from influencing more than about t^α future dates in a material way.
Fixed-price policies are a natural benchmark in both theory and
practice: they are simple to implement, easy to communicate, and often
justified by the idea that consumers eventually ``get used to'' a stable price. Our model clarifies when this intuition is (approximately) correct and when it can be badly misleading. The key driver is the \emph{effective memory length} induced by the power-law step size $\delta_t \asymp t^{-\alpha}$: when consumers retain a nontrivial imprint of earlier prices over $\tilde O(t^\alpha)$ future periods, a seller can profitably invest in a high early price to lift the future reference and later ``harvest'' demand at lower prices. When memory is short, such investments have only
transient payoff and fixed pricing becomes comparatively robust.
In this section we focus on the cleanest environment for quantifying
the gap: loss-neutral reference effects, η+ = η− =: η, with linear base demand H(p) = b − ap.
In that case the kink in D(p, r) disappears
and demand becomes globally linear: $D(p,r)=H(p)+\eta(r-p)=b-ap+\eta(r-p)$.
Revenue is therefore
a concave quadratic in p for
each fixed r, with the only
intertemporal channel entering through the bilinear term ηrp. Put
differently, expected revenue can be decomposed as $p\,D(p,r)=pH(p)+\eta\,p(r-p)$, where pH(p) = bp − ap² is the ``static'' component and ηp(r − p) is a reference premium that is positive precisely when the posted price sits below the prevailing benchmark.
We show that for every α ∈ (0, 1] there exist feasible parameters and an initial reference r1 such that any fixed price is polynomially suboptimal: the optimal dynamic policy earns an additional Ω(T^α) expected revenue. The construction is deliberately simple—a two-block policy with one price used to raise the reference and a second price used to harvest it—and the scaling emerges directly from the kernel concentration results of the preceding section.
Take an initial reference equal to the (interior) static monopoly price under the base demand curve, $p^{\circ}:=\arg\max_{p}\,pH(p)=b/(2a)$, assuming p∘ ∈ (0, p̄) and
parameters are such that demand is nonnegative at relevant prices. Under
any fixed-price policy pt ≡ p,
the reference recursion implies rt → p,
so the wedge rt − p
is transient. In particular, if we choose p = p∘ and r1 = p∘,
then rt = p∘
for all t and the reference
premium is identically zero. Thus the best fixed-price benchmark is
essentially pinned down by the concave static term, leaving no mechanism
for a fixed price to generate a persistent gain wedge.
Consider the policy that posts a slightly higher ``investment'' price for L periods and then drops to the base-optimal level: $p_t=p_H:=p_L+\Delta$ for $t\le L$ and $p_t=p_L:=p^{\circ}$ for $t>L$, with $\Delta>0$.
Because the state update is linear, once we enter the second block the
reference evolves as a contraction toward pL. Writing
xt := rt − pL
for t ≥ L + 1, the
recursion gives $x_{t+1}=(1-\delta_t)\,x_t$.
Moreover, after L consecutive
periods at pH, the
reference at the start of the second block is $r_{L+1}=p_H-\Delta\prod_{k=1}^{L}(1-\delta_k)$, equivalently $x_{L+1}=\Delta\big(1-\prod_{k=1}^{L}(1-\delta_k)\big)$.
For α ∈ (0, 1], the cumulative
step size $\sum_{k=1}^L \delta_k$
diverges with L, implying
$\prod_{k=1}^L(1-\delta_k)$ becomes
small once L is moderately
large. In particular, choosing L to grow with T ensures xL + 1 is a
constant fraction of Δ for
large horizons.
Fix L of order T^α and focus on the first m periods of the second block, where m is also of order T^α. By the matching lower bound on retention products (applied to the survival factor $\prod_{k=L+1}^{t-1}(1-\delta_k)$), the reference does not immediately collapse to pL; rather, it remains elevated for roughly t^α steps. Concretely, for t = L + 1 + j with j ≤ m,
$$
x_{t}=x_{L+1}\prod_{k=L+1}^{t-1}(1-\delta_k)\ \ge\ x_{L+1}\exp\!\big(-c'\,j/L^{\alpha}\big),
$$
for constants c, c′ > 0 (depending on α, ζ) and using δk ≍ k^{−α}.
Taking m = ⌊κL^α⌋ with κ > 0 sufficiently small yields a uniform lower bound $x_{L+1+j}\ge c_1\,\Delta$ for all $j\le m$,
where c1 ∈ (0, 1)
is a constant independent of T. This is the core
``memory-to-revenue’’ translation: by investing in an early price
increase, we create a wedge of constant magnitude that persists for
Θ(T^α)
subsequent periods.
Under the loss-neutral demand, the marginal value of an increase in the reference state at time t is $\partial_r\big(p\,D(p,r)\big)=\eta p$. Therefore, during the second block (where pt = pL), the additional revenue attributable to an elevated reference is exactly linear in the wedge: it equals $\eta\,p_L\,x_t$ in period t. Summing over the first m = Θ(T^α) periods of the second block and using the uniform wedge bound gives a gain of order $\Omega\big(\eta\,p_L\,c_1\Delta\,T^{\alpha}\big)=\Omega(T^{\alpha})$.
The only remaining issue is the ``investment’’ cost incurred during
the first block by charging pH = pL + Δ
instead of pL. Because
pL is the
maximizer of pH(p) and the
static component is strongly concave (quadratic), the baseline loss per
period from deviating by Δ is O(Δ²). Meanwhile, the reference premium created by posting pH when rt is near pL is at most linear in Δ and does not overturn the conclusion that the net first-block effect is O(LΔ²) for small Δ. Choosing Δ as a sufficiently small constant and taking L = Θ(T^α) yields a second-block gain of order Ω(T^α Δ) against a first-block cost of order O(T^α Δ²), so the overall net improvement is Ω(T^α).
Since the optimal policy dominates this explicit two-block policy, we
obtain an instance-dependent lower bound $V^*(r_1)-V^{\text{fix}}(r_1)=\Omega(T^{\alpha})$ for each α ∈ (0, 1].
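The two-block construction is easy to check numerically. The sketch below (loss-neutral demand, illustrative parameters and an arbitrary Δ; not a calibration) compares the two-block policy against the best fixed price on a coarse grid for several horizons; the gap should grow roughly like T^α.

```python
import numpy as np

b, a, eta = 200.0, 1.0, 0.8
alpha, zeta = 0.6, 0.2
p_circ = b / (2.0 * a)                    # static monopoly price under H(p) = b - a p

def total_revenue(prices, r1):
    """Expected revenue of a deterministic price path under the reference dynamics."""
    r, total = r1, 0.0
    for t, p in enumerate(prices, start=1):
        total += p * (b - a * p + eta * (r - p))          # loss-neutral linear demand
        r += (1.0 - zeta) / (t + 1) ** alpha * (p - r)
    return total

for T in (200, 400, 800, 1600):
    L = int(T ** alpha)                   # "investment" block of length ~ T^alpha
    two_block = [p_circ + 5.0] * L + [p_circ] * (T - L)   # raise the reference, then harvest
    v_dyn = total_revenue(two_block, r1=p_circ)
    v_fix = max(total_revenue([p] * T, r1=p_circ) for p in np.linspace(80.0, 120.0, 161))
    print(f"T={T:5d}: two-block gain over best fixed price = {v_dyn - v_fix:9.1f}")
```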
When α = 0, the step size is constant, δt ≡ 1 − ζ, so the effective memory length does not grow with t. Repeating the above argument yields only a window m = Θ(1) on which the wedge remains non-negligible, leading to $V^*(r_1)-V^{\text{fix}}(r_1)=\Omega(1)$ at best. This is the sense in which α > 0 creates a qualitative change: even though δt → 0 and the reference becomes increasingly inert, the horizon-wide value of shaping the reference accumulates over a growing number of future periods.
We now argue that the preceding lower bound is essentially the correct order: no instance in the loss-neutral linear class can exhibit a fixed-price gap larger than Õ(T^α). The proof strategy is to (i) isolate the part of revenue that can benefit from intertemporal reference effects and (ii) use kernel concentration to show that any policy can only maintain a meaningful wedge rt − pt over Õ(t^α) periods following any ``movement'' in the posted price.
Using the revenue decomposition, the total expected revenue under any policy π can be written as $\sum_{t=1}^{T}\mathbb{E}[p_tH(p_t)]+\eta\sum_{t=1}^{T}\mathbb{E}[p_t(r_t-p_t)]$,
where the first term is purely ``static’’ and the second term captures
all benefits of inducing rt ≠ pt.
Since pH(p) is concave in p and prices are bounded in [0, p̄], the static term cannot exceed $T\max_{p\in[0,\bar p]}pH(p)$ by more than a constant depending on the initial transient under a fixed price (and in particular it cannot create a superlinear advantage over the best fixed price). Thus, to upper bound $V^*(r_1)-V^{\text{fix}}(r_1)$ it suffices to upper bound, up to additive constants, the maximal attainable magnitude of the reference-premium term $\eta\sum_{t=1}^{T}\mathbb{E}[p_t(r_t-p_t)]$.
Because rt is a convex
combination of recent prices (by the kernel representation of the previous section), the wedge rt − pt can be controlled by how much prices have moved in the recent past. Fix a window length m and split the kernel into a ``recent'' part and a ``tail'' part. The tail term is uniformly small when m ≫ t^α by the tail bound, because the total tail mass $\sum_{s=0}^{t-m-1}w_{t,s}$ decays like exp(−Θ(m/t^α)).
The recent term can be bounded by telescoping price increments:
Substituting this into the recent part and swapping sums yields a bound of the form
$$
|r_t-p_t|\ \le\ \bar p\sum_{s=0}^{t-m-1}w_{t,s}\ +\ \sum_{k=t-m}^{t-1}\gamma_{t,k}\,\big|p_{k+1}-p_k\big|,
$$
for coefficients γt, k ∈ [0, 1]
that depend only on the kernel weights. Intuitively, only price changes
within the last m periods can
keep rt
separated from pt; older
changes have exponentially small influence.
Set m = m(t) := ⌈c t^α log T⌉ for a large enough constant c. Then the tail term is at most T^{−2}, say, and hence negligible when summed over t ≤ T. The remaining contribution depends on the price increments weighted by γt, k. A single increment |pk + 1 − pk| can affect |rt − pt| only for t in a window of length Õ(k^α) after time k, because for later t it falls into the kernel tail. Therefore, when we sum over t = 1, …, T, each increment is ``counted'' at most Õ(k^α) times. This yields the aggregate bound
where Õ(⋅) hides logarithmic
factors inherited from the choice of m(t).
Finally, to translate this into a bound on the fixed-price gap, we use feasibility and interiority to control the total variation of near-optimal policies: when the static revenue component is strongly concave and demand must remain nonnegative, excessively oscillatory pricing is dominated because it sacrifices pH(p) without creating commensurate reference premia (the premium is at most linear in the wedge, while the static loss from moving prices is quadratic). Formalizing this tradeoff bounds the total price variation of near-optimal policies, and hence yields $V^*(r_1)-V^{\text{fix}}(r_1)=\tilde O(T^{\alpha})$.
The lower bound construction shows Ω(T^α) is attainable by an explicit, economically interpretable strategy: raise the reference, then harvest. The upper bound shows that this is essentially the most one can do in the worst case: power-law memory permits a seller to profit from reference shaping over Õ(t^α) future periods, but not more, and aggregating these opportunities over the horizon yields at most Õ(T^α). The remaining logarithmic slack comes from converting exponential tail decay into a uniform ``effective window'' across all t.
Taken together, the lower and upper bounds justify describing a phase transition in α: when α = 0 the maximal advantage of dynamic pricing over fixed pricing is bounded (in order), whereas for any α > 0 the worst-case fixed-price gap grows polynomially as T^α, reflecting an expanding effective memory that makes early reference shaping increasingly valuable.
We now turn from whether dynamic pricing can matter to a more operational question: how to price in the presence of reference effects with time-varying updating. A canonical pattern in many retail categories is a markdown schedule—prices drift down over the season, often with occasional discrete ``promotional'' drops. In our model, markdowns arise for a simple economic reason: earlier prices influence later perceived gains through the reference state, so placing relatively high prices earlier can be interpreted as an ``investment''
in a higher benchmark against which later prices are evaluated.
Formally, we call a policy a markdown policy if its realized posted prices
satisfy
p1 ≥ p2 ≥ ⋯ ≥ pT.
The central message of this section is twofold. First, when consumers are gain-seeking in the sense that gains are at least as salient as losses (η+ ≥ η−), markdown policies are without loss of optimality: among all optimal policies there exists one that is monotone non-increasing. Second, even when consumers are loss-averse (η+ < η−) so that aggressive early anchoring can be costly, we can still justify markdowns as a robust design: there exists a markdown policy whose expected revenue is within an O(T^α log T) additive gap of the fully optimal policy. The time-varying step size δt = (1 − ζ)/(t + 1)^α matters quantitatively through the effective memory length, but it does
not overturn the qualitative monotonicity logic.
The key technical object is the (time-varying) reference kernel
implied by the linear update. Iterating the recursion yields, for each
t ≥ 2, the kernel representation of rt as a convex combination of r1 and the past posted prices. Thus each posted price ps affects all
future references rt for t > s, with an influence
weight that decays in the time distance through products of (1 − δk).
Compared to constant-step exponential smoothing, the only change is that
these influence weights are no longer geometric; however, they remain
nonnegative, sum to at most one, and (by the kernel lemma) concentrate on a window of length Õ(t^α).
To see why markdowns are optimal when η+ ≥ η−, it is useful to focus on inversions—adjacent times t for which pt < pt + 1. A standard way to enforce monotonicity in finite-horizon scheduling problems is an exchange (swap) argument: we show that whenever such an inversion occurs, swapping the two prices (leaving all other prices unchanged) weakly increases the seller's expected objective. Repeating this swap until no inversions remain yields a non-increasing (markdown) sequence without reducing revenue.
The economic force behind the swap argument is that, holding fixed the multiset of posted prices, placing a higher price earlier (i) raises the next-period reference more, thereby making the subsequent lower price more likely to be perceived as a gain, and (ii) shifts any ``loss’’ perceptions (pricing above the reference) earlier, when they have fewer future spillovers. When η+ ≥ η−, the model rewards re-timing in exactly this direction: demand reacts at least as strongly to creating gains later as it does to avoiding losses later.
The time-varying update affects the swap calculus only through how
the swap perturbs the reference path. Consider two policies that
coincide up to time t − 1, and
that differ only by swapping the pair (pt, pt + 1) = (x, y)
with x < y:
$$
\pi:\ (p_t,p_{t+1})=(x,y),\qquad
\pi^{\swap}:\ (p_t,p_{t+1})=(y,x),
$$
with all later decisions held fixed for the moment. Because the state
evolution is linear, the induced references under the two policies can
be coupled pathwise (for every shock realization), and the difference in
the reference at any future time s ≥ t + 1 is an affine
function of (y − x).
In particular, rt + 1 shifts up by the immediate step size: $r_{t+1}^{\swap}-r_{t+1}=\delta_t\,(y-x)>0$,
while for s ≥ t + 2
the difference is still proportional to (y − x) with a coefficient
obtained from the kernel representation. What matters for the argument is not the
exact closed form of this coefficient (which depends on δt and δt + 1), but
rather that the effect of the swap on any rs is (a) linear
in (y − x) and (b)
confined to a decaying tail, because the state recursion is a
contraction at each step.
The revenue comparison leverages a monotonicity property of
single-period payoffs with respect to the reference. Since D(p, r) is
increasing in r for every
fixed p (both gain and loss
slopes are nonnegative), we have that $\Rev(p,r)$ is nondecreasing in r for any fixed p.
Moreover, when η+ ≥ η−,
the value of an increase in the reference is weakly larger in situations
where the price is more likely to be evaluated as a gain. Indeed,
wherever r ≠ p the
derivative $\partial_r \Rev(p,r)$
exists and equals η+p in the gain
region r > p and
η−p in the
loss region r < p.
Thus higher r is particularly
valuable when it flips a future period from ``loss'' to ``gain''
(or expands the gain wedge), and the inequality η+ ≥ η−
ensures that, at the margin, such flips are beneficial in the right
direction for markdowns.
Putting these elements together yields the following conclusion.
Assume η+ ≥ η−. Then for any initial reference r1 and any horizon T, there exists an optimal policy whose posted prices are non-increasing over time.
Fix any optimal policy and locate an inversion pt < pt + 1.
We compare it to a modified policy that swaps (pt, pt + 1)
and then (crucially) from time t + 2 onward follows an optimal continuation for the new
induced state. Because the continuation value is the supremum over all
future actions, this re-optimization dominates holding the original
continuation fixed, so it suffices to show that the two-period portion
of the objective (periods t
and t + 1) weakly increases
under the swap once we account for the induced change in rt + 1. The
gain-seeking condition η+ ≥ η−
ensures that shifting ``gain opportunities'' later (and ``loss exposures'' earlier) cannot decrease the two-period expected revenue;
the time-varying steps δt simply scale
the magnitude of the induced reference change but do not alter its sign
at t + 1. Iterating swaps
eliminates all inversions without reducing optimality, producing a
markdown price path.
Two remarks help interpret this result in practice. First, the statement is existential: there may be multiple optimal policies, and some need not be markdown, but at least one markdown policy is optimal. Second, the argument is robust to the particular power-law form of δt; we only use that (i) the update is a convex combination and (ii) the impact of a single price change on the reference state propagates forward linearly and contracts over time.
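As a numerical illustration of the re-timing logic (not a verification of the proposition, which concerns optimal policies and re-optimized continuations), the sketch below evaluates the same multiset of prices in markdown order versus the reverse order under gain-seeking parameters; the markdown ordering typically earns weakly more. Parameter values are illustrative.

```python
import numpy as np

b, a, eta_p, eta_m = 200.0, 1.0, 1.2, 0.4        # gain-seeking: eta_+ >= eta_-
alpha, zeta, r1 = 0.6, 0.2, 100.0

def total_revenue(prices):
    r, total = r1, 0.0
    for t, p in enumerate(prices, start=1):
        d = b - a * p + eta_p * max(r - p, 0.0) - eta_m * max(p - r, 0.0)
        total += p * d
        r += (1.0 - zeta) / (t + 1) ** alpha * (p - r)
    return total

prices = list(np.linspace(90.0, 110.0, 21))       # the same multiset of posted prices
print("markdown (high-to-low):", round(total_revenue(sorted(prices, reverse=True)), 1))
print("markup   (low-to-high):", round(total_revenue(sorted(prices)), 1))
```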
When η+ < η−, pricing above the current reference can be severely punished by demand reductions. This makes the ``investment’’ phase of an aggressive two-block strategy potentially unattractive: raising prices early may create losses in the very periods when the reference is still low. In such environments it is not generally true that an optimal policy must be a markdown, because the seller may prefer to avoid losses by staying close to the state and only gradually adjusting prices. Nonetheless, markdown policies remain a useful approximation, and we can quantify this approximation error in a way that matches the effective-memory scaling.
The key tool is a control on the value function with respect to the reference state. Intuitively, if we perturb the starting reference by Δr, then (because rt is a weighted average of the initial condition and recent prices) the induced change in future references is at most Δr times the total remaining kernel mass. Under power-law steps, that mass is on the order of the effective memory length, yielding an O(T^α log T) bound after taking a worst-case uniform window across all times.
To formalize, note that the envelope recursion implies, whenever ∂rV*(⋅, t) exists, $\partial_rV^*(r,t)=\Rev_r(p_t^*,r)+(1-\delta_t)\,\partial_rV^*(r_{t+1},t+1)$.
Because pt ∈ [0, p̄] and D(p, r) has reference slopes bounded by max {η+, η−}, we have the uniform bound $0\le\Rev_r(p,r)\le\bar p\,\max\{\eta_+,\eta_-\}$, where the first inequality uses that higher r weakly increases demand and prices are nonnegative. Iterating the envelope recursion and using this bound gives
The resulting sum is precisely the remaining ``survival mass'' of the
reference, and it can be bounded using the same exponential-product
estimates as in the kernel lemma. Choosing a uniform effective window of length m(t) = Θ(t^α log T) makes the tail negligible and yields $0\le\partial_rV^*(r,t)\le C\,T^{\alpha}\log T$ for a constant C, and hence a global Lipschitz property $|V^*(r,t)-V^*(r',t)|\le C\,T^{\alpha}\log T\,|r-r'|$.
This bound immediately implies a near-optimal markdown construction. Let r̄ := p̄ and let πr̄ be an optimal policy when the initial reference equals r̄. Starting from the maximal feasible benchmark makes it easy to avoid losses: by restricting attention to actions pt ≤ rt, we ensure (pt − rt)+ = 0 so loss aversion does not bite. Along such loss-free paths the demand function reduces to the gain-side linear form H(p) + η+(r − p), and the earlier markdown optimality logic applies to deliver an optimal continuation that can be taken to be non-increasing. Thus we may select πr̄ to be a markdown policy.
Now apply the markdown policy πr̄ starting from
an arbitrary r1 ∈ [0, p̄].
Since the only difference is the initial reference, the Lipschitz property at t = 1 yields
where we also used that V*(⋅) is nondecreasing in
r (higher references weakly
increase demand for any fixed action sequence, hence also under the
optimal sequence). Because (r̄ − r1) ≤ p̄,
the right-hand side is O(T^α log T).
In words: even under loss aversion, we can identify a markdown policy
whose additive suboptimality is controlled by the effective memory
length.
This near-optimality guarantee is intentionally worst-case and additive. It does not claim that markdowns are always nearly optimal in relative terms, nor that the specific constructed markdown is unique or easily interpretable in every instance. Rather, it provides a robustness statement: with time-varying reference updating, restricting attention to monotone price paths cannot cost more than the amount of value that any policy could plausibly extract from reference shaping over an effective memory window.
Having established when monotone structure is exact (gain-seeking) and when it is a controlled approximation (loss aversion), we next exploit the loss-neutral linear-demand benchmark to derive an explicit characterization of the optimal markdown segment via a generalized Euler-type recursion under the time-varying steps δt.
We now specialize to the loss-neutral case η+ = η− = : η,
which plays the role of a tractable benchmark. The key simplification is
that the kink in the reference term disappears: for every (p, r),
(r − p)+ − (p − r)+ = r − p,
so demand is globally linear in (p, r): $D(p,r)=b-ap+\eta(r-p)$. Hence the single-period expected revenue is a smooth concave quadratic in p, $\Rev(p,r)=bp+\eta rp-(a+\eta)p^{2}$, and the dynamic problem becomes a finite-horizon linear–quadratic control problem with linear state transition rt + 1 = (1 − δt)rt + δtpt.
The time variation in δt breaks
stationarity, but it does not destroy tractability: the optimal policy
remains affine in the reference state, and the realized optimal price
path satisfies a generalized Euler-type recursion whose coefficients
inherit the variation in (δt)t ≤ T.
Because shocks enter demand additively with mean zero, they do not
affect expected revenue beyond their mean; throughout this subsection we
work with expected demand D(p, r). The Bellman recursion is as stated above. Under the usual interiority conditions (so the optimizer lies in (0, p̄)), differentiability yields the one-step FOC $\Rev_p(p_t,r_t)+\delta_t\,\partial_rV^*(r_{t+1},t+1)=0$.
In the loss-neutral linear model, $\Rev_p$ is affine in (p, r) and $\Rev_r(p,r)=\eta p$ is affine in p. This makes it possible to push
the characterization further by eliminating ∂rV*
through a two-step recursion, producing a generalized ``Euler equation’’
coupling (pt, pt + 1).
Combining the envelope recursion with the one-step FOC gives the two-step condition stated earlier; in the present benchmark it becomes fully explicit.
Using
$$
\Rev_p(p,r)=b+\eta r-2(a+\eta)p,
\qquad
\Rev_r(p,r)=\eta p,
$$
the two-step condition at time t ≤ T − 1 reads
$$
b+\eta r_t-2(a+\eta)p_t+\delta_t\,\eta\,p_{t+1}
-\kappa_t\big[b+\eta r_{t+1}-2(a+\eta)p_{t+1}\big]=0,
$$
with the state recursion $r_{t+1}=(1-\delta_t)r_t+\delta_t p_t$.
This is a generalized Euler equation: it equates the current
marginal revenue loss from raising pt (the term
−2(a + η)pt
inside $\Rev_p$) to the discounted
marginal continuation benefit that comes from shifting the next reference via the state transition. The coefficient
$$
\kappa_t:=\delta_t\frac{1-\delta_{t+1}}{\delta_{t+1}}
$$
captures how today’s reference investment trades off against the next
period’s incentive to adjust price, and it is precisely here that
time-varying updating enters.
Substituting the state recursion into the Euler condition yields a recursion in (pt, rt, pt + 1) with time-varying coefficients. After straightforward algebra one can write it equivalently as
where
Thus, whenever At > 0 (which
holds in the standard parameter region a + η > 0 with δt ∈ (0, 1]),
the optimal pt is an affine
function of (rt, pt + 1):
The terminal condition is the static first-order condition in the last period, $b+\eta r_T-2(a+\eta)p_T=0$, again assuming an interior solution.
This recursion should be read as the time-varying analogue of the constant-coefficient recursion that appears under exponential smoothing or under the ARM update: we still obtain a linear relationship across adjacent prices, but the coefficients now drift over time through (δt, δt + 1).
A complementary (and often computationally cleaner) characterization
proceeds directly from dynamic programming. Because the period revenue is quadratic in (r, p) and the state evolves linearly, the value function is quadratic in r and the optimal price is affine in r. Concretely, for each t there exist scalars (At, Bt, Ct) such that $V^*(r,t)=A_t r^{2}+B_t r+C_t$, and the optimal policy can be written as $p_t^{*}=\beta_t r+\gamma_t$, with time-varying coefficients (βt, γt) determined by backward induction. To see the structure, take the quadratic form as an induction hypothesis at time t + 1 and differentiate the period-t objective. The FOC becomes a linear equation in (p, r):
$$
b+\eta r-2(a+\eta)p+\delta_t\Big(2A_{t+1}\big((1-\delta_t)r+\delta_t p\big)+B_{t+1}\Big)=0,
$$
so
$$
p_t^{*}=\frac{b+B_{t+1}\delta_t+\big(\eta+2A_{t+1}\delta_t(1-\delta_t)\big)r}{2\big(a+\eta-A_{t+1}\delta_t^{2}\big)}.
$$
The denominator $a+\eta-A_{t+1}\delta_t^{2}$ is the (positive) curvature of the period-t objective in p; it is the natural interiority condition ensuring strict concavity in p. Substituting back into the Bellman recursion yields a
backward recursion for (At, Bt, Ct)
(a time-varying Riccati-type update). While the closed-form expression
for (At, Bt, Ct)
is algebraically tedious, the key point is that it is one-dimensional
and can be computed in O(T) time.
The affine form also induces a linear evolution of the state under the optimal policy, $r_{t+1}=\big((1-\delta_t)+\delta_t\beta_t\big)r_t+\delta_t\gamma_t$.
Given (βt, γt)
and r1, we can
therefore compute the entire optimal expected price path by a forward
pass. In this benchmark, the ``optimal markdown’’ is not imposed as an
external restriction; rather, the monotonicity result from the previous
subsection guarantees that among optimal policies we may select one with
non-increasing realized prices, and the affine policy representation
provides a direct way to compute such an optimal path (subject to the
caveat below about feasibility constraints).
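A minimal sketch of the backward (Riccati-type) pass and the forward pass in the loss-neutral benchmark, following the affine-policy logic above (interior solutions are assumed and prices are merely clipped to [0, p̄] as a safeguard; parameter values are illustrative):

```python
import numpy as np

def solve_loss_neutral(T, b, a, eta, alpha, zeta, r1, p_bar):
    """Backward pass for V*(r,t) = A_t r^2 + B_t r + const and the affine policy
    p_t = beta_t r + gamma_t, followed by a forward pass for the price path."""
    delta = (1.0 - zeta) / (np.arange(1, T + 1) + 1.0) ** alpha    # delta_1 .. delta_T
    A = np.zeros(T + 2)
    B = np.zeros(T + 2)                                            # A_{T+1} = B_{T+1} = 0
    beta = np.zeros(T + 1)
    gamma = np.zeros(T + 1)
    for t in range(T, 0, -1):
        d = delta[t - 1]
        curv = 2.0 * (a + eta - A[t + 1] * d * d)                  # curvature in p (> 0)
        beta[t] = (eta + 2.0 * A[t + 1] * d * (1.0 - d)) / curv
        gamma[t] = (b + B[t + 1] * d) / curv
        m = (1.0 - d) + d * beta[t]                                # r_{t+1} = m r + n at the optimum
        n = d * gamma[t]
        A[t] = eta * beta[t] - (a + eta) * beta[t] ** 2 + A[t + 1] * m * m
        B[t] = (b * beta[t] + eta * gamma[t] - 2.0 * (a + eta) * beta[t] * gamma[t]
                + 2.0 * A[t + 1] * m * n + B[t + 1] * m)
    r, prices = r1, []                                             # forward pass
    for t in range(1, T + 1):
        p = float(np.clip(beta[t] * r + gamma[t], 0.0, p_bar))
        prices.append(p)
        r = (1.0 - delta[t - 1]) * r + delta[t - 1] * p
    return np.array(prices)

path = solve_loss_neutral(T=60, b=200.0, a=1.0, eta=0.8, alpha=0.6,
                          zeta=0.2, r1=100.0, p_bar=150.0)
print(path[:5].round(2), "...", path[-3:].round(2))   # typically a markdown-shaped path
```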
The two representations above—the two-step Euler equation and the affine-feedback form—lead to efficient computation, but they emphasize different objects.
If one wishes to compute the optimal price sequence $(p_t)_{t\le T}$ directly for a given r1, it is often convenient to treat the unknowns as the pair $(p_t,r_t)_{t=1}^{T}$ and solve a sparse linear system. Indeed, the optimality conditions consist of the Euler equations, the terminal first-order condition, and the state equations. Each Euler equation links only variables at times t and t + 1, and each state equation links only (rt, pt, rt + 1). As a result, the overall linear system has a banded structure (block-tridiagonal when written in the stacked vector (r2, p1, r3, p2, …, rT, pT − 1, pT)). This can be solved in O(T) time by standard banded Gaussian elimination. The advantage of this formulation is that it stays close to the Euler-equation intuition: time-varying updating simply makes the coefficients of the banded matrix time dependent.
Alternatively, if one prefers to compute a policy that maps any realized rt to pt, then the Riccati recursion implied by the affine-feedback form is the more natural route: we compute (At, Bt) backward once, and then apply the affine map forward. This viewpoint will be useful in the learning discussion that follows, because it exhibits a low-dimensional object—the affine coefficients (βt, γt)—that summarizes the optimal action rule in the benchmark.
The recursion is analytically solvable only in special cases where the time dependence in (δt) disappears or simplifies.
If α = 0 (exponential smoothing), then δt ≡ 1 − ζ is constant. In that case the Euler equation has constant coefficients, and the induced second-order difference equation for prices has a solution in terms of the roots of a characteristic polynomial; equivalently, the Riccati recursion converges to a stationary fixed point away from the horizon boundary, yielding an approximately geometric markdown in long horizons.
If (α, ζ) = (1, 0) (ARM), then δt = 1/(t + 1) varies but does so in a very structured way; one can exploit cancellations in products of (1 − δt) and harmonic sums to obtain explicit formulas for the reference kernel and, in turn, explicit representations of the optimal markdown segment in terms of simple backward recursions with closed-form coefficients.
For general α ∈ (0, 1] and ζ ∈ [0, 1), the coefficients depend on both t and t + 1 through (t + 1)^{−α} and (t + 2)^{−α}. This time dependence typically precludes a clean closed form, but it does not impede computation: because the problem remains linear–quadratic, one can compute the optimal policy to machine precision in time linear in T via either the Riccati recursion or banded linear solves.
Two qualifications are worth keeping in view. First, our derivations above use interior first-order conditions; when the optimal unconstrained price lies outside [0, p̄], the true optimum is obtained by projection to the boundary, and the affine policy becomes piecewise-affine. Second, while loss-neutrality removes the kink in the reference term, other constraints (such as demand nonnegativity, inventory, or price-adjustment costs) would reintroduce nonlinearities. In that sense, the present characterization should be understood as a benchmark that isolates the role of time-varying memory in the cleanest possible environment.
In the next section we leverage precisely this structure: the loss-neutral benchmark suggests that the optimal decision rule can be summarized by a small number of time-varying coefficients (or, equivalently, by a banded linear relation across adjacent prices). This low-dimensionality is what makes it plausible to learn the underlying demand and reference parameters online without ever ``resetting’’ reference effects, and it guides the design of learning algorithms whose regret depends on α through the effective memory length.
A practical motivation for allowing time-varying updating is that it makes rapid resetting of the reference increasingly implausible. When δt decays, pushing the reference from its current level toward a target price requires sustained pricing for on the order of t^α periods, precisely the effective-memory phenomenon captured by the kernel lemma. This creates a basic tension for learning: exploration today can contaminate the reference state for many future periods, and the number of affected periods grows with α. At the same time, the analysis above reveals an offsetting simplification: the state is one-dimensional and observed (given the known update rule), and in the loss-neutral benchmark the optimal policy is summarized by a small number of time-varying coefficients. We can therefore design learning rules that respect reference effects and nevertheless exploit the structure.
There are two closely related summaries of the control problem that are essentially invariant to α in form (though not in numerical values).
First, the myopic best response to a given reference, i.e., the maximizer of $p\,D(p,r)$ over p for fixed r, is a one-dimensional map from r to p. In the loss-neutral linear benchmark it is affine, $p=\frac{b+\eta r}{2(a+\eta)}$, independent of (δt) except through the evolution of rt. This map is
useful because, for large t
when δt is
small, the dynamic correction to the myopic rule is of order δt in the Euler
equation, so myopic pricing becomes a natural baseline even when the
true optimum is dynamic.
Second, the optimal policy in the benchmark is affine in the state, $p_t^{*}=\beta_t r_t+\gamma_t$, where the entire policy is encoded by the 2T scalars (βt, γt)t ≤ T.
Importantly, the dimension of the unknown economic environment is also
constant: in the loss-neutral linear model, all primitive uncertainty
can be collected into a parameter vector such as
$$
\theta:=(b,a,\eta)\in\R^3,
$$
while the time variation in δt is known.
Thus, learning can be framed as estimating a fixed low-dimensional θ, and repeatedly converting θ̂ into the low-dimensional policy
coefficients (β̂t, γ̂t)
via the Riccati recursion from the previous section.
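To make this conversion concrete, the following sketch (ours, not code from a replication package) implements the backward recursion for the loss-neutral benchmark by completing the square in p at each stage; the function name policy_coefficients and all parameter values are illustrative.
\begin{verbatim}
import numpy as np

def policy_coefficients(a, b, eta, delta, T):
    # Loss-neutral benchmark: Rev(p, r) = p*(b + eta*r - (a+eta)*p),
    # reference dynamics r_{t+1} = (1-delta[t])*r_t + delta[t]*p_t.
    # Returns (beta, gamma) with p_t*(r) = beta[t]*r + gamma[t].
    A, B = 0.0, 0.0          # V_t(r) = A*r^2 + B*r + const, terminal value 0
    beta = np.zeros(T)
    gamma = np.zeros(T)
    for t in reversed(range(T)):
        d = delta[t]
        q = (a + eta) - A * d**2            # curvature in p (assumed positive)
        l1 = eta + 2.0 * A * d * (1.0 - d)  # coefficient on r in the FOC
        l0 = b + B * d                      # constant term in the FOC
        beta[t], gamma[t] = l1 / (2.0 * q), l0 / (2.0 * q)
        # Plug the maximizer back in to update the value-function coefficients.
        A, B = l1**2 / (4.0 * q) + A * (1.0 - d)**2, \
               l0 * l1 / (2.0 * q) + B * (1.0 - d)
    return beta, gamma

# Illustrative usage with delta_t = (1 - zeta) / (t + 1)^alpha, t = 1, ..., T.
T, alpha, zeta = 200, 0.5, 0.2
delta = (1.0 - zeta) / (np.arange(1, T + 1) + 1.0)**alpha
beta, gamma = policy_coefficients(a=1.0, b=10.0, eta=0.8, delta=delta, T=T)
\end{verbatim}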
Because the seller observes (pt, Dt)
and knows (δt), she can
reconstruct (rt) exactly
from r1 and posted
prices. This removes a key obstacle relative to models in which
reference is latent. In particular, in the loss-neutral benchmark the
conditional mean demand is linear in observables:
Writing Dt = xt⊤θ′ + εt
with a suitable reparameterization (e.g., θ′ := (b, η, a + η)
and xt := (1, rt, −pt)),
we obtain a standard linear regression problem with adaptively chosen (endogenous) regressors: rt depends on past prices, which were chosen using past data. Nevertheless, under
bounded shocks and mild excitation conditions, self-normalized
concentration for adaptive linear regression applies, yielding
confidence ellipsoids around the least-squares estimate.
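As a minimal illustration of this observability, the sketch below reconstructs the reference path from posted prices and the known update rule and then forms the ridge estimate of θ′ = (b, η, a + η) from the regressors xt = (1, rt, −pt); the function name and the regularization value are illustrative assumptions.
\begin{verbatim}
import numpy as np

def ridge_estimate(prices, demands, r1, delta, lam=1.0):
    # Reconstruct r_t exactly from r_1, posted prices, and the known (delta_t).
    T = len(prices)
    r = np.empty(T)
    r[0] = r1
    for t in range(T - 1):
        r[t + 1] = (1.0 - delta[t]) * r[t] + delta[t] * prices[t]
    # Ridge regression of D_t on x_t = (1, r_t, -p_t).
    X = np.column_stack([np.ones(T), r, -np.asarray(prices, dtype=float)])
    A = X.T @ X + lam * np.eye(3)
    theta_hat = np.linalg.solve(A, X.T @ np.asarray(demands, dtype=float))
    return theta_hat, r
\end{verbatim}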
The same idea extends to asymmetric reference effects. The demand
model
D(p, r) = b − ap + η+(r − p)+ − η−(p − r)+
is linear in the parameter vector θ := (b, a, η+, η−)
with feature vector xt := (1, −pt, (rt − pt)+, −(pt − rt)+), so one can estimate θ by least squares as well, albeit with a policy computation step that is no longer globally linear–quadratic.
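Under asymmetry the only change on the estimation side is the feature map; a minimal sketch of the corresponding design row (our notation) is:
\begin{verbatim}
import numpy as np

def asymmetric_features(p, r):
    # D = b - a*p + eta_plus*(r - p)_+ - eta_minus*(p - r)_+,
    # linear in theta = (b, a, eta_plus, eta_minus).
    return np.array([1.0, -p, max(r - p, 0.0), -max(p - r, 0.0)])
\end{verbatim}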
We sketch a learning rule that (i) estimates θ online by ridge regression, (ii) computes a candidate dynamic policy by plugging an optimistic parameter into the Riccati recursion, and (iii) adds a vanishing dithering term to ensure persistent excitation without inducing large, long-lived distortions in the reference.
Fix a regularization λ > 0 and a confidence level δ ∈ (0, 1). Let θ̂t be the ridge estimate based on data up to t − 1, and let 𝒞t be an ellipsoidal confidence set for θ constructed from the design matrix. At time t, the seller selects an optimistic parameter from 𝒞t, converts it into affine policy coefficients via the Riccati recursion, posts the resulting price plus the dithering term ξt, observes demand, and updates θ̂t + 1 and 𝒞t + 1. Two remarks clarify why this rule is aligned with the economics.
First, it uses the same low-dimensional object throughout: the coefficients. The algorithm never searches over arbitrary price paths; it searches over a constant-dimensional parameter vector θ, and each candidate θ induces a unique (time-varying) affine control via the Riccati recursion.
Second, dithering can be chosen to respect memory. Because the influence of pt on future references is concentrated over roughly tα periods, the cumulative distortion generated by a bounded perturbation ξt is also localized in time. This suggests shrinking ξt just slowly enough to guarantee identification, but fast enough that the induced reference drift does not dominate revenue.
Let the (expected) regret be
$$
\Reg_T
~:=~
V^*(r_1)-\mathbb{E}\Big[\sum_{t=1}^T \Rev(p_t,r_t)\Big].
$$
The kernel lemma suggests a useful mental model: the controlled system has an effective memory mt ≍ tα, meaning that price or estimation errors at time t materially affect only the next mt periods. In online learning problems with bounded memory m, a common heuristic is that regret interpolates between $\tilde O(\sqrt{T})$ (no memory) and $\tilde O(\sqrt{mT})$ (memory-m coupling). Translating this to our setting by taking m ≍ Tα yields the benchmark scaling
$$
\Reg_T=\tilde O\big(T^{(1+\alpha)/2}\big),
$$
which reduces to $\tilde O(\sqrt{T})$ under exponential smoothing (α = 0) and becomes nearly linear as α → 1 (ARM-like long memory), reflecting the intuition that early experimentation has long-lasting consequences when references are hard to move.
One can also interpret this scaling as an information–distortion tradeoff: to identify η one needs variation in rt, but inducing such variation requires sustained price movements whose opportunity cost accumulates over the effective memory window. Larger α increases this window and therefore raises the cost of informative experimentation.
The proof strategy we have in mind mirrors standard OFU analyses but requires one additional ingredient tailored to reference dynamics: a stability/sensitivity bound showing that, uniformly over admissible parameters, the value function is Lipschitz in the reward parameters with a Lipschitz constant proportional to the effective memory length induced by (δt). The kernel lemma supplies exactly this ingredient by quantifying how deviations in prices (and hence in inferred demand parameters through the algorithm’s choices) propagate into future references.
We emphasize that Theorem~ should be read as a guidepost rather than a fully optimized statement. In particular, sharper rates may be achievable by (i) separating the learning of the static demand slope a from the learning of reference sensitivity η (which requires movement in rt), and (ii) exploiting the fact that δt → 0, so the system becomes progressively less sensitive to control later in the horizon.
Several extensions are conceptually straightforward but technically nontrivial.
With η+ ≠ η−, the demand regression remains linear in parameters via the feature vector above, but the control problem is no longer globally quadratic due to kinks. One pragmatic approach is to combine regression with a restricted policy class motivated by our structural results, e.g., markdown policies parameterized by a small number of breakpoints. This yields a low-dimensional optimization problem each period even when the exact Bellman equation is not tractable.
We have treated α and ζ as known primitives. In applications they may be uncertain. Because the reference update is deterministic given prices, one can in principle estimate (α, ζ) from observed reference proxies (surveys, posted competitor prices, or repeated-purchase panels), but without direct reference observations the identification problem becomes substantially harder. Our results suggest that mis-specifying α is economically meaningful because it changes the effective memory length and therefore the returns to early anchoring.
Learning algorithms that rely on first-order characterizations should be robust to projection onto [0, p̄] and to demand nonnegativity constraints. In finite samples, optimistic parameters can push computed prices toward the boundary; careful clipping and conservative confidence sets are needed to avoid pathological behavior, especially when α is large and the reference is slow to revert.
Overall, the main lesson is that time-varying reference updating does not merely complicate the control problem; it also offers a principled way to think about learning. The same effective-memory phenomenon that drives the Tα phase transition in optimal pricing should also govern the difficulty of learning: as α rises, the number of observations is still T, but the cost of generating informative variation is amortized over increasingly long reference windows. This motivates the numerical exploration in the next section, where we can visualize both the optimal policy shapes and the way the effective-memory scaling manifests in finite horizons.
We use this section to visualize three implications of the analysis: (i) the revenue advantage of dynamic pricing over the best fixed price grows on the order of Tα (up to logarithms) in the loss-neutral linear benchmark; (ii) the optimal price path deforms smoothly as we vary the updating exponent α, interpolating between a nearly myopic regime (fast forgetting) and a strongly intertemporal regime (slow forgetting); and (iii) the qualitative structure is robust—but not identical—when reference effects are asymmetric (η+ ≠ η−), where the Bellman objective is kinked and the linear–quadratic reduction no longer applies globally.
Throughout, we take p̄ large enough that the optimal prices are interior in the benchmark experiments (we verify ex post that truncation does not bind). Because shocks enter additively and satisfy $\E[\varepsilon_t]=0$, expected revenue depends only on the conditional mean demand, so the value of any deterministic policy can be evaluated without Monte Carlo. When we report realized paths, we simulate i.i.d. bounded shocks only to illustrate that the qualitative policies are not artifacts of expectation.
In the case η+ = η− = η
and H(p) = b − ap,
the expected one-period revenue can be written (on the no-loss region
p ≤ r) as
$$
\Rev(p,r)=p\big(b+\eta r-(a+\eta)p\big),
$$
and, as discussed earlier, the optimal feedback rule is affine in r:
pt*(r) = βtr + γt.
For any fixed parameter triple (a, b, η) and
known (δt), we compute
(βt, γt)
by backward recursion (equivalently, the one-dimensional Riccati
system). This yields an exact solution for the benchmark runs, so
numerical error arises only from floating-point arithmetic.
To compute the best fixed price, we evaluate
$$
V^{\text{fix}}(r_1)=\max_{p\in[0,\bar p]} \sum_{t=1}^T
\Rev\big(p,r_t(p)\big),
\qquad
r_{t+1}(p)=(1-\delta_t)r_t(p)+\delta_t p,
$$
by one-dimensional search (the objective is smooth and unimodal for our
parameter ranges). This comparison isolates the intertemporal value of
shaping rt
from the static curvature of $\Rev(\cdot,\cdot)$.
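The fixed-price benchmark is cheap to evaluate because the reference still drifts deterministically toward the constant price; a minimal sketch with illustrative parameter values is:
\begin{verbatim}
import numpy as np

def fixed_price_value(p, r1, delta, a, b, eta):
    # Sum of loss-neutral revenue Rev(p, r_t) along the induced reference path.
    r, total = r1, 0.0
    for d in delta:
        total += p * (b + eta * r - (a + eta) * p)
        r = (1.0 - d) * r + d * p
    return total

# One-dimensional search over a grid (refine locally if needed).
a, b, eta, r1, p_bar = 1.0, 10.0, 0.8, 2.0, 20.0
T, alpha, zeta = 200, 0.5, 0.2
delta = (1.0 - zeta) / (np.arange(1, T + 1) + 1.0)**alpha
grid = np.linspace(0.0, p_bar, 401)
values = [fixed_price_value(p, r1, delta, a, b, eta) for p in grid]
p_fix, V_fix = grid[int(np.argmax(values))], max(values)
\end{verbatim}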
When η+ ≠ η−,
the period objective is piecewise quadratic in p with a kink at p = r, and the Bellman
recursion is no longer globally quadratic. We therefore compute the
dynamic optimum by discretizing the state r ∈ [0, p̄] on a fine grid
and performing backward induction:
$$
V(r,t)=\max_{p\in[0,\bar p]}\Big\{\Rev(p,r)+V\big((1-\delta_t)r+\delta_t
p,t+1\big)\Big\}.
$$
The maximization over p is
one-dimensional; we implement it by combining a coarse grid search with
local refinement on each side of the kink at p = r. This routine is fast
because the state is one-dimensional and the horizon is finite.
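A direct implementation of this backward induction, with linear interpolation of the continuation value on the reference grid (the local refinement around the kink is omitted for brevity), might look like this; grid sizes and parameters are illustrative assumptions.
\begin{verbatim}
import numpy as np

def asymmetric_dp(delta, a, b, eta_plus, eta_minus, p_bar, n_r=300, n_p=300):
    T = len(delta)
    r_grid = np.linspace(0.0, p_bar, n_r)
    p_grid = np.linspace(0.0, p_bar, n_p)
    V = np.zeros(n_r)                        # terminal value V(., T+1) = 0
    policy = np.zeros((T, n_r))
    for t in reversed(range(T)):
        V_new = np.empty(n_r)
        for i, r in enumerate(r_grid):
            gain = np.maximum(r - p_grid, 0.0)
            loss = np.maximum(p_grid - r, 0.0)
            rev = p_grid * (b - a * p_grid + eta_plus * gain - eta_minus * loss)
            r_next = (1.0 - delta[t]) * r + delta[t] * p_grid
            cont = np.interp(r_next, r_grid, V)   # interpolated continuation
            j = int(np.argmax(rev + cont))
            V_new[i], policy[t, i] = rev[j] + cont[j], p_grid[j]
        V = V_new
    return policy, V
\end{verbatim}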
In addition, to connect to the structural results, we compute a restricted policy class even under asymmetry: we optimize over non-increasing sequences with a small number of breakpoints (e.g., K ∈ {3, 5, 7} segments), which produces an interpretable policy and serves as a stress test for the claim that markdowns remain near-optimal when η+ < η−.
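Evaluating a candidate segmented markdown is equally simple, which is what makes searching over a handful of breakpoints practical; the helper below (our sketch) scores one candidate, and an outer loop over non-increasing level vectors and segment lengths completes the restricted optimization.
\begin{verbatim}
import numpy as np

def segmented_markdown_value(levels, lengths, r1, delta,
                             a, b, eta_plus, eta_minus):
    # levels: non-increasing price levels; lengths: segment lengths summing to T.
    path = np.repeat(levels, lengths)
    r, total = r1, 0.0
    for t, p in enumerate(path):
        gain, loss = max(r - p, 0.0), max(p - r, 0.0)
        total += p * (b - a * p + eta_plus * gain - eta_minus * loss)
        r = (1.0 - delta[t]) * r + delta[t] * p
    return total
\end{verbatim}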
Our first experiment is designed to highlight the phase transition in
the magnitude of dynamic gains as the memory exponent α varies. We fix (a, b, η) and vary
T and α while holding ζ constant. For each (T, α), we compute the
gap
Δ(T, α) := V*(r1) − Vfix(r1),
where V* is the
optimal dynamic value in the loss-neutral benchmark. We then examine the
log–log slope of Δ in T.
A convenient parameterization that yields clear separation between dynamic and fixed pricing is one in which (i) the static optimum is interior and (ii) gains from a temporarily elevated reference are not swamped by the base-demand loss from higher prices. Concretely, we set a moderate (so base demand is not too elastic), choose η comparable to a (so reference effects are first-order), and initialize r1 below the static monopoly price (so there is room for ``anchoring’’ upward). In this regime, the computed pt* typically features an early phase with elevated prices relative to the myopic map pmyop(rt), followed by a markdown that harvests gains when pt falls below the elevated rt.
Figure~ plots Δ(T, α) for α ∈ {0, 0.25, 0.5, 0.75, 1} over horizons T spanning one to two orders of magnitude. The key visual pattern is that the curves steepen with α. A simple least-squares fit of log Δ on log T (excluding very small T where finite-horizon end effects dominate) yields slopes close to α; the fit is tightest for intermediate α, where Δ is neither essentially constant (as for α = 0) nor dominated by truncation at t = T (as can happen when α is very close to 1 and T is small).
An informative normalization is Δ(T, α)/Tα: across the same runs, this ratio remains approximately flat in T up to a slow drift that is consistent with the logarithmic factor in the upper bound. In other words, the computations match the qualitative message that the effective memory length tα is the unit of intertemporal coupling, and hence the appropriate unit of dynamic advantage.
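The scaling diagnostic itself reduces to a log–log regression; the sketch below (reusing policy_coefficients and fixed_price_value from the earlier sketches, with illustrative parameters) computes Δ(T, α) across horizons and fits the slope that the experiment compares to α.
\begin{verbatim}
import numpy as np

def dynamic_gap(T, alpha, zeta, a, b, eta, r1, p_bar):
    delta = (1.0 - zeta) / (np.arange(1, T + 1) + 1.0)**alpha
    beta, gamma = policy_coefficients(a, b, eta, delta, T)
    r, v_dyn = r1, 0.0
    for t in range(T):                  # roll the optimal affine policy forward
        p = beta[t] * r + gamma[t]      # interiority assumed (checked ex post)
        v_dyn += p * (b + eta * r - (a + eta) * p)
        r = (1.0 - delta[t]) * r + delta[t] * p
    grid = np.linspace(0.0, p_bar, 401)   # best fixed price by grid search
    v_fix = max(fixed_price_value(p, r1, delta, a, b, eta) for p in grid)
    return v_dyn - v_fix

Ts = np.array([100, 200, 400, 800, 1600])
gaps = np.array([dynamic_gap(T, 0.5, 0.2, a=1.0, b=10.0, eta=0.8,
                             r1=2.0, p_bar=20.0) for T in Ts])
slope = np.polyfit(np.log(Ts), np.log(gaps), 1)[0]   # compare with alpha
\end{verbatim}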
Holding α fixed and increasing ζ uniformly shrinks δt, making the reference harder to move. In the computations, this appears as a downward shift in Δ(T, α) with minimal change in the log–log slope. This matches the theory: α governs the rate of forgetting (effective memory growth), while ζ governs the level of responsiveness. Practically, one can think of ζ as controlling how much ``effort'' (sustained high pricing) is required to raise the reference by a given amount.
We next fix a horizon T and plot optimal price and reference trajectories (pt*, rt*) for a grid of α values. The goal is not to claim a universal shape (this depends on primitives and boundary constraints), but to show how the same economic forces manifest differently under short versus long effective memory.
When α = 0 (exponential smoothing with constant δt = 1 − ζ), the optimal policy typically resembles a smoothed version of myopic pricing: pt* tracks pmyop(rt) closely, with a modest dynamic correction early in the horizon. Intuitively, because the reference adjusts geometrically, any single price affects the distant future only weakly; the seller has little incentive to accept a large current sacrifice to influence far-ahead demand.
As α increases, the adjustment slows down, and the optimal policy becomes more front-loaded. Two robust deformations appear in the numerics: early-horizon prices sit further above the myopic map, and the subsequent markdown begins later and proceeds more gradually. These patterns are exactly what the kernel lemma suggests: when the system remembers roughly tα periods, the return to lifting rt today is amortized over that window, so the optimal control allocates more ``effort'' to shaping rt early.
Figure~ overlays price paths for several α under gain-seeking (or loss-neutral) effects. The paths are non-increasing, consistent with the markdown optimality result. More interestingly, the timing of the markdown changes: for small α the path drops relatively quickly toward its long-run level; for larger α the price remains elevated longer before declining, reflecting the fact that the reference is harder to move later in time (smaller δt) and thus must be influenced earlier if it is to matter.
To connect more directly to the theory, we compute the gain duration
Dur := |{t : rt* − pt* ≥ κ}|
for a small threshold κ > 0. Across our parameter grid,
Dur increases with α holding T fixed, and for fixed α it increases with T at a rate consistent with Tα. This
diagnostic is useful because it separates the economic channel (time
spent harvesting gains) from level effects (the overall magnitude of
prices).
Finally, we examine how the policy and the dynamic advantage change when consumers are loss-averse (η− > η+) or strongly gain-seeking (η+ > η−). This is the empirically relevant case, and it is precisely where our analysis emphasizes structure (markdown optimality or near-optimality) rather than closed-form solutions.
When η+ > η−, the computed dynamic optimum from the discretized Bellman recursion is a markdown in all instances we tested: any local non-monotonicity disappears as we refine the state grid and the price maximization. Quantitatively, relative to the loss-neutral benchmark, the optimal policy becomes more aggressive in creating and exploiting the gain region p ≤ r. The anchoring phase is slightly shorter but higher, followed by a more pronounced markdown, because the marginal return to generating perceived gains is larger.
The dynamic advantage over fixed pricing also increases in level, but its scaling in T remains tied to α. This is consistent with the idea that η+ changes the level of the wedge while α changes its scaling in T.
When η− > η+, the computed optimal policy can exhibit a qualitatively different early-horizon behavior depending on r1. If r1 is low, pricing above r1 triggers a loss penalty, making a sharp anchoring move unattractive. In this case, the dynamic optimum often begins near p = r (or slightly below) to avoid losses, and then transitions into a markdown as rt is gradually pulled upward. If instead r1 is high (or the seller inherits a high ``usual price’’), the optimal path resembles the gain-seeking case: the seller can stay below the reference immediately and harvest gains without paying the loss penalty.
These patterns underscore a practical point: under loss aversion, the initial reference r1 is first-order, not merely a transient. Since δt decays, a low initial reference can persist for a long time, and the opportunity cost of moving it is concentrated early.
To connect to our near-optimality claim, we compare the true dynamic optimum under loss aversion (from discretized DP) to the best segmented-markdown policy with K breakpoints. Across a broad set of instances, the segmented markdown achieves a large fraction of the dynamic advantage relative to fixed pricing, and the residual gap grows slowly with T (consistent with an O(Tαlog T) envelope rather than a linear-in-T degradation). Empirically, increasing K yields diminishing returns: most of the benefit is captured by allowing an early plateau (to avoid losses while nudging rt upward) and a later decline (to harvest gains).
The main limitation we observe is not that markdowns become poor, but that applying the loss-neutral affine policy can be very suboptimal when η− ≫ η+. In such cases, even a small episode of pricing above reference can erase much of the long-run benefit of raising rt, because losses are immediate while gains are diluted over the effective memory window. This reinforces the modeling lesson: asymmetry matters not only for welfare interpretation but also for algorithm design, since it changes which regions of the state space are economically safe to explore.
The numerical exercises reinforce three messages. First, the effective-memory exponent α is visible in finite-horizon computations: dynamic gains scale approximately like Tα and policies become more front-loaded as α increases. Second, the policy shapes are interpretable through a simple anchoring-versus-harvesting narrative that survives beyond the loss-neutral benchmark. Third, while asymmetry introduces kinks and complicates exact computation, it does not eliminate the organizing role of markdowns; rather, it shifts the early-horizon incentives by making it costly to price above a low inherited reference.
These patterns motivate the broader questions we turn to next: if a platform or regulator can influence the effective memory of consumers (through information design, displayed ``usual prices,’’ or rules governing reference disclosures), what are the implications for revenue, welfare, and fairness, and how should we think about policy that effectively chooses α and ζ?
Our results and simulations emphasize that ``how long consumers remember'' is not a merely descriptive parameter: it is an object that platforms, firms, and regulators can influence through information design. In the model, the pair $(\alpha,\zeta)$ governs the effective span and intensity of the reference mechanism, and the main comparative statics run through the induced memory length on the order of $t^{\alpha}$ (with overall responsiveness scaled by $1-\zeta$). This mapping suggests a useful translation for practice: policies that make older prices salient (or that stabilize the notion of a ``usual price'') effectively reduce forgetting and thereby raise α (or lower ζ), whereas policies that foreground
the most recent transaction and de-emphasize history effectively
increase forgetting and thereby lower α (or raise ζ). In this section we discuss how
these levers interact with revenue, welfare, and fairness, and we
outline open questions that arise once one moves beyond our stylized
environment.
Many retail transactions are mediated by platforms that choose what
price history is displayed, what comparison is suggested, and how
discounts are framed. Common design choices include: whether to show a
strike-through ``was'' price, whether to show a price range over the last $30$ or $90$ days, whether to display the buyer's own past purchase price, and whether to provide cross-seller comparisons. Each of these design elements can be interpreted as shaping the weights consumers implicitly place on past prices. In our notation, the kernel lemma makes this precise: the reference state is a weighted average of recent posted prices, with mass concentrated on the most recent $\tilde{O}(t^{\alpha})$ periods. A platform that forces the comparison set to be short (e.g., ``lowest price in the last 7 days'') effectively truncates the memory kernel and
pushes behavior toward a smaller effective α; conversely, a platform that
highlights a longer lookback window or reinforces a stable ``regular
price’’ pushes toward a larger effective α.
This framing matters because the seller’s optimal policy reacts sharply to memory. When effective memory is long, early prices have persistent influence, and we obtain both larger dynamic gains and stronger incentives to front-load high prices and then markdown. When effective memory is short, the intertemporal benefit of shaping reference is small, and pricing becomes closer to static optimization. Thus, even absent any change in preferences, a platform can amplify or dampen incentives for intertemporal price manipulation by changing how price history is curated.
A practical implication is that the platform’s objective function is pivotal. A platform that is paid via ad valorem fees may prefer designs that increase seller revenue, which in our environment may correspond to a larger effective α (or smaller ζ) when reference effects are important, because the dynamic advantage scales like Tα up to logarithms. A platform concerned with consumer satisfaction and long-run retention may prefer the opposite if consumers dislike feeling manipulated, if high initial prices reduce conversion, or if perceived unfairness increases churn. The model therefore provides a language for an empirically grounded question: which UI and disclosure policies shift effective memory parameters, and how do those shifts translate into seller revenue and consumer outcomes?
Revenue comparisons alone are not welfare statements. Reference dependence is often interpreted as a behavioral distortion, and policy discussions frequently treat the reference as something that can be exploited. Our analysis is consistent with that interpretation in the sense that dynamic pricing creates wedges (periods where rt − pt > 0) that raise demand without improving intrinsic product value. Yet the welfare consequences are ambiguous once one accounts for heterogeneous consumers, participation decisions, and the possibility that consumers derive genuine utility from perceived gains (even if reference-dependent).
A helpful decomposition separates three channels. First, there is the standard allocative channel: higher prices reduce quantity relative to marginal cost and thus may reduce total surplus in a competitive benchmark. Second, there is a volatility channel: front-loaded pricing and markdowns can shift when consumers purchase, potentially creating congestion or stockouts (in richer models with capacity) and increasing the variance of consumer expenditure over time. Third, there is a behavioral channel: if perceived gains and losses are not mere framing but correspond to experienced utility, then creating gain episodes can increase consumer welfare even when the seller extracts more revenue. In our reduced-form demand specification, these channels are not separately identified; the demand response captures the net effect of perceived gains and losses on purchasing behavior, not the underlying welfare primitives.
Nonetheless, the comparative statics deliver a clear policy lesson: interventions that lengthen memory (larger α) increase the seller’s ability to shift demand intertemporally by shaping reference, so any welfare analysis should treat memory as a lever. If a regulator believes that reference manipulation is harmful, then limiting the salience of long price histories (or standardizing what constitutes a valid comparison) can reduce the long-horizon intertemporal coupling and attenuate the incentive for aggressive anchoring. Conversely, if a regulator believes that consumers benefit from stable expectations and from being protected against short-term price spikes, then requiring longer lookback windows for advertised ``discounts’’ may be appropriate even if it increases the seller’s scope for dynamic extraction along other margins.
Reference effects are likely heterogeneous: some consumers track prices closely, others rely heavily on platform-provided cues, and still others face attention or liquidity constraints that make them more susceptible to high initial prices. In such settings, the same shift in effective memory can have uneven impacts across groups. For example, making a long price history salient may help sophisticated consumers avoid overpaying, but it may also create stronger reference anchoring that harms inattentive consumers who enter mid-horizon and treat displayed ``regular prices’’ as informative. Similarly, under loss aversion (η− > η+), a low inherited reference can lock in a long period of relatively low prices (benefiting later consumers) but may also penalize early adopters if the seller avoids pricing above reference and delays product availability or promotion.
Fairness concerns also arise from segmentation. Platforms can personalize displayed comparisons (e.g., ``you saved relative to your usual price'') or vary the reference window across users. In our language, this is equivalent to assigning different (α, ζ) or even different reference states rt across consumer cohorts. Such design can be efficiency-enhancing if it better matches information to preferences, but it can also function as a form of behaviorally mediated price discrimination. Importantly, discrimination can occur even when the posted price pt is uniform: if different groups carry different references, then the same price generates different perceived gains/losses and hence different effective demand elasticities. This suggests a novel fairness diagnostic: disparate outcomes may arise not only from different prices but from different reference states induced by platform design.
A regulator or platform auditor therefore faces two related questions. The first is transparency: are consumers told how ``usual price’’ benchmarks are computed and over what window? The second is parity: are benchmark rules consistent across users and sellers, or are they optimized to increase conversion and revenue in ways that systematically disadvantage certain groups? Our framework does not resolve these questions normatively, but it helps clarify what should be measured: not only realized prices, but also the informational environment that shapes reference.
Many jurisdictions regulate strike-through pricing and ``was/now'' claims by specifying a lookback period or requiring that the reference price be a genuine prevailing price for a nontrivial duration. These rules can be interpreted as imposing constraints on the reference signal available to consumers. In our setting, such regulation can be viewed as an external choice of an effective memory window: a requirement that the ``was''
price be the lowest price in the last L days pushes consumer comparisons
toward a longer and more conservative benchmark, while allowing a seller
to cite any recent high price pushes comparisons toward a shorter and
potentially manipulable benchmark.
This interpretation yields two nuanced implications. First, stricter ``usual price'' rules can reduce deceptive anchoring (in the sense of preventing a seller from briefly posting an inflated price solely to create a reference), but they can also make legitimate markdown strategies more potent by stabilizing the reference once it has been earned. Second, the welfare effect depends on whether the regulation changes only the information consumers receive about past prices or also the price path via seller incentives. Our model focuses on the latter: changing effective memory changes optimal pricing, so regulation can have equilibrium effects rather than merely informational effects.
From a policy design perspective, the scaling results provide a way to anticipate magnitude. If the effective memory exponent rises, the potential revenue gain from dynamic reference shaping rises roughly like Tα. Even if the absolute horizon T is not literally the number of days a product is sold, the same logic applies to the length of the period over which consumers consider the product and over which sellers can run promotions. For seasonal goods, subscriptions, or durable purchase cycles, that window can be long, making the choice of reference policy consequential.
Several limitations temper how literally one should take the policy mapping from (α, ζ) to design choices. First, our consumers are myopic and summarized by a reduced-form demand curve; strategic waiting, stockpiling, and forward-looking inference about future discounts are excluded. These forces can either reinforce markdowns (as in classic durable goods intuition) or dampen them (if consumers anticipate discounts and delay purchase). Second, the seller is a monopolist with no inventory constraints. In many environments, inventory and replenishment create additional motives for dynamic pricing that interact with reference effects, and competition can discipline the ability to sustain high early prices. Third, the reference update rule is deterministic and common knowledge; in practice, consumers may have noisy memories and may learn the seller’s pricing regime over time.
These limitations matter for welfare claims. In particular, if consumers are forward-looking, then reference manipulation may be partially undone by expectations, and the welfare effect of longer memory could reverse. Similarly, in competitive markets, longer memory could intensify price competition by making deviations from a low-price norm more costly in demand terms, which could reduce prices and increase consumer surplus. Thus, we view the present model as an organizing device: it isolates a clean channel (time-varying effective memory) and shows how it scales incentives, but it is not a complete regulatory impact model.
Several research directions appear especially important for connecting theory to platform policy.
The parameters (α, ζ) are not directly observed. Estimating them requires panel data on prices, demand, and potentially on platform display policies, with careful attention to endogeneity (prices respond to anticipated demand). Natural experiments that change disclosure rules or UI elements offer a promising route, as they may shift memory without directly changing costs or product quality.
Real consumers may form references from multiple sources: their own past purchases, competitor prices, list prices, and platform-provided benchmarks. A multi-reference model would endogenize which reference is salient and allow platforms to choose salience weights. Conceptually, this generalizes the scalar state rt to a vector, with platform policy selecting a mapping from price histories to the salient component.
In many applications the seller does not know (a, b, η+, η−) ex ante and must learn. Reference dependence complicates exploration because prices affect both current demand and future state. Designing learning algorithms with regret guarantees that reflect the effective memory length (and that respect monotonicity or ``no gouging’’ constraints) remains largely open, especially under asymmetric kinks.
A normative analysis requires a model of utility with reference dependence, not just demand. Embedding our dynamics in a utility-based framework would allow one to separate experienced gains/losses from purely behavioral distortions and to quantify when policies that reduce manipulation also reduce consumer enjoyment from discounts.
Finally, platform policy is rarely chosen in a monopoly vacuum. Competing platforms may choose different reference disclosures to attract consumers, and sellers may multi-home. Understanding equilibrium in disclosure and reference design, and its interaction with dynamic pricing, is essential for translating our comparative statics into market-level predictions.
Taken together, these discussion points reinforce the main conceptual takeaway: reference dependence makes the informational environment a first-class determinant of dynamic pricing incentives. By interpreting UI choices and ``usual price’’ rules as choices over effective memory, we obtain a coherent way to link platform design, regulation, revenue, and distributional outcomes, and we identify a set of measurable objects—memory span, responsiveness, and asymmetry—that should anchor empirical and policy work going forward.