Pricing Heuristics Against Non-human Transaction Orchestration Mechanisms

IE University
Bachelor's Thesis 2025
Advisor: Alberto Martín Izquierdo
Figure: PHANTOM teaser diagram connecting vulnerability, behavioral signal, and robust control.

Abstract

Dynamic pricing extracts margin by exploiting the gap between what a platform knows and what a buyer knows. A user who browses a hotel across several sessions signals intent, and the platform raises the price accordingly. That information asymmetry, the Cost of Information (COI), is the economic engine behind session-based pricing in travel, hospitality, and e-commerce.

LLM agents break this engine. An agent that performs reconnaissance in isolated sessions accumulates no demand signal, then routes the purchase through a clean session at the floor price. As the number of independent querying agents grows, the realizable price converges to the minimum order statistic of the quoted prices, and COI collapses to zero. This is not a speculative future risk; it is a structural failure mode of any pricing system that treats sessions independently.
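The erosion argument can be made concrete with a toy model. This sketch assumes, purely for illustration, that each clean session returns an independent quote drawn uniformly between a floor p_min and a ceiling p_max. It also treats COI as the expected markup above the floor that a buyer still pays after taking the cheapest of n quotes. Both the uniform quote model and this COI definition belong to the example, not to the thesis's formal setup. Under these assumptions the expected minimum is p_min + (p_max - p_min) / (n + 1), so COI decays like 1 / (n + 1).

```python
import random


def expected_min_uniform(p_min: float, p_max: float, n: int) -> float:
    """Closed-form E[min] of n i.i.d. Uniform(p_min, p_max) price quotes."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return p_min + (p_max - p_min) / (n + 1)


def coi(p_min: float, p_max: float, n: int) -> float:
    """Illustrative COI (an assumption): expected markup above the floor
    that survives when the buyer takes the cheapest of n clean quotes."""
    return expected_min_uniform(p_min, p_max, n) - p_min


def simulated_min(p_min: float, p_max: float, n: int,
                  trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[min of n quotes], used to check the closed form."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(rng.uniform(p_min, p_max) for _ in range(n))
    return total / trials


# COI shrinks like (p_max - p_min) / (n + 1) as more agents query independently.
for n in (1, 2, 5, 10, 100):
    print(f"n={n:>3}  COI={coi(100.0, 180.0, n):6.2f}")
```

With quotes spread between 100 and 180, the toy COI falls from 40.0 with a single session to about 0.79 with 100 independent sessions.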

PHANTOM formalizes the failure, measures it on real human and agent interaction data, and builds a defense. We prove the COI erosion theorem, collect 29 labeled sessions (13 human, 16 agent) across hotel and airline storefronts under goal-driven tasks, learn class-specific Markov transition kernels, and train a Distributionally Robust RL pricing policy over a Wasserstein ambiguity set. Behavioral separability is statistically significant (Mann–Whitney U = 2.0, p = 0.0006). The per-session agent probability signal f(τ) feeds directly into the robust policy reward as a COI-leakage penalty.

How it works

The methodology runs in three stages: observe, distinguish, defend.

Stage 1 — Observe

Both human participants and LLM agents are assigned goal-driven tasks on a live, instrumented storefront (hotel or airline mode). Every interaction is logged as a timestamped event tuple (action, item, timestamp). Actions are partitioned into four semantic categories (cart, dwell, navigation, filter) with decreasing signal weights (4.0, 2.0, 1.0, 0.5), calibrated by the KL divergence between the human and agent transition rows. Price quotes are streamed to a separate Kafka topic, which enables joint analysis of behavior and pricing exposure. During collection, the platform runs a surge-discount heuristic so that participants are exposed to state-dependent prices.
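The logging and weighting step can be sketched as follows. The four category weights come from the text. Everything else is a hypothetical choice made for illustration: the raw action names (add_to_cart, view_item, and so on), the action-to-category mapping, the timestamp unit, and the idea of summing weights into a single session-level score. The sketch does not model the KL-based calibration of the weights or the Kafka price-quote stream.

```python
from collections import Counter
from dataclasses import dataclass

# Signal weights per semantic category, as listed in the text.
CATEGORY_WEIGHTS = {"cart": 4.0, "dwell": 2.0, "navigation": 1.0, "filter": 0.5}

# Hypothetical mapping from raw storefront actions to the four categories.
ACTION_CATEGORY = {
    "add_to_cart": "cart",
    "remove_from_cart": "cart",
    "view_item": "dwell",
    "open_page": "navigation",
    "go_back": "navigation",
    "apply_filter": "filter",
    "sort_results": "filter",
}


@dataclass(frozen=True)
class Event:
    """One logged interaction: the (action, item, timestamp) tuple."""
    action: str
    item: str
    timestamp: float  # assumed unit: seconds since session start


def categorize(event: Event) -> str:
    """Map a raw action to its semantic category (raises KeyError if unmapped)."""
    return ACTION_CATEGORY[event.action]


def category_counts(events: list[Event]) -> Counter:
    """How often each category occurs in a session."""
    return Counter(categorize(e) for e in events)


def weighted_signal(events: list[Event]) -> float:
    """Hypothetical session-level intent score: the sum of category weights."""
    return sum(CATEGORY_WEIGHTS[categorize(e)] for e in events)


session = [
    Event("open_page", "hotel-42", 0.0),
    Event("apply_filter", "hotel-42", 3.5),
    Event("view_item", "hotel-42", 9.0),
    Event("add_to_cart", "hotel-42", 41.2),
]
print(category_counts(session), weighted_signal(session))
```

Raising on unmapped actions, rather than silently assigning a default category, makes it easier to spot gaps in the mapping when new storefront events appear.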

Stage 2 — Distinguish

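From the labeled session trajectories, we estimate class-specific Markov transition kernels H (human) and A (agent) by maximum likelihood. Their transition matrices $\bar{T}_H$ and $\bar{T}_A$ serve as class prototypes. For a new partial trajectory $\tau'$ with empirical transition matrix $\hat{T}'$, we compute the KL divergence to each prototype: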

$$\Delta_H = D_{\mathrm{KL}}\big(\hat{T}' \,\|\, \bar{T}_H\big), \qquad \Delta_A = D_{\mathrm{KL}}\big(\hat{T}' \,\|\, \bar{T}_A\big)$$
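The estimation and comparison step can be sketched in plain Python. None of the following assumptions come from the thesis. The states are taken to be the four action categories. Both the prototypes and the partial-trajectory matrix use add-one smoothing so that every KL term stays finite. The divergence between two transition matrices is a weighted sum of row-wise KL divergences, with each row weighted by how often τ' leaves that state, which is a common plug-in choice for comparing Markov chains. The toy human and agent trajectories are invented only to exercise the code.

```python
import math

STATES = ("cart", "dwell", "navigation", "filter")
INDEX = {s: i for i, s in enumerate(STATES)}


def estimate_kernel(trajectories, smoothing=1.0):
    """Row-normalised transition counts; the plain MLE when smoothing=0.

    smoothing > 0 adds a pseudo-count to every cell (an assumption here)
    so that no transition has zero probability and KL stays finite.
    """
    k = len(STATES)
    counts = [[smoothing] * k for _ in range(k)]
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[INDEX[a]][INDEX[b]] += 1.0
    kernel = []
    for row in counts:
        total = sum(row)
        # A never-visited row with no smoothing falls back to uniform.
        kernel.append([c / total for c in row] if total > 0 else [1.0 / k] * k)
    return kernel


def row_weights(traj):
    """Empirical frequency of each state as the source of a transition in traj."""
    k = len(STATES)
    w = [0.0] * k
    for a in traj[:-1]:
        w[INDEX[a]] += 1.0
    total = sum(w)
    return [x / total for x in w] if total > 0 else [1.0 / k] * k


def kernel_kl(p, q, weights):
    """sum_i w_i * KL(p_i || q_i); q must be positive wherever p is."""
    d = 0.0
    for w, p_row, q_row in zip(weights, p, q):
        if w == 0.0:
            continue
        for p_ij, q_ij in zip(p_row, q_row):
            if p_ij > 0.0:
                d += w * p_ij * math.log(p_ij / q_ij)
    return d


def kl_to_prototypes(partial, t_human, t_agent, smoothing=1.0):
    """Return (Delta_H, Delta_A) for a partial trajectory tau'."""
    t_hat = estimate_kernel([partial], smoothing)
    w = row_weights(partial)
    return kernel_kl(t_hat, t_human, w), kernel_kl(t_hat, t_agent, w)


# Invented toy data: humans dwell, agents alternate filter/navigation.
humans = [["navigation", "dwell", "dwell", "navigation", "dwell", "dwell", "cart"]] * 3
agents = [["filter", "navigation"] * 10 + ["cart"]] * 3
t_h, t_a = estimate_kernel(humans), estimate_kernel(agents)
delta_h, delta_a = kl_to_prototypes(["filter", "navigation"] * 10, t_h, t_a)
print(f"Delta_H={delta_h:.3f}  Delta_A={delta_a:.3f}")
```

On this toy data, an agent-like partial trajectory lands much closer to the agent prototype: Delta_A comes out well below Delta_H.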

The gap score $g(\tau') = \Delta_H - \Delta_A$ is mapped to a weak agent probability by a temperature-controlled logistic function, $f(\tau') = \sigma\big(g(\tau')/T\big)$ with temperature $T > 0$. Positive gaps, where $\tau'$ sits closer to the agent prototype than to the human one, push $f$ above 1/2. This is a continuous signal, not a binary bot flag. A Mann–Whitney test on the gap scores of the 13-human and 16-agent cohorts yields U = 2.0, p = 0.0006, so the two behavioral distributions are well separated.
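The gap score, the logistic mapping, and the U statistic are short enough to write out directly. The function names and the default temperature of 1.0 in this sketch are hypothetical. For the U statistic, the value computed for one sample plus the value computed for the other always equals n_H * n_A, which is 13 * 16 = 208 here. A reported U = 2.0 therefore means that only two of the 208 cross-cohort pairs (ties counting as half) are ordered against the overall separation. For p-values, `scipy.stats.mannwhitneyu` returns the same statistic for its first sample alongside a p-value.

```python
import math


def gap_score(delta_h: float, delta_a: float) -> float:
    """g(tau') = Delta_H - Delta_A; positive when tau' is closer to the agent prototype."""
    return delta_h - delta_a


def agent_probability(delta_h: float, delta_a: float, temperature: float = 1.0) -> float:
    """f(tau') = sigmoid(g(tau') / T): a soft agent probability in (0, 1)."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    z = gap_score(delta_h, delta_a) / temperature
    # Numerically stable logistic: never exponentiate a large positive number.
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mann_whitney_u(x, y) -> float:
    """U statistic of sample x against y: pairs with x_i > y_j, ties counted as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Illustrative gap scores only, not thesis data.
human_gaps = [-0.8, -0.5, -0.3]
agent_gaps = [0.4, 0.9, -0.4]
print(mann_whitney_u(human_gaps, agent_gaps))
```

Lowering the temperature sharpens f toward a hard decision. Raising it keeps the signal soft, which suits a reward term that should degrade gracefully on ambiguous sessions.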

Stage 3 — Defend

A contamination generator G(α) mixes real human trajectories with synthetic agent trajectories sampled from the agent kernel A, producing training distributions at any contamination level α ∈ [0, 1]. The pricing policy is trained as a Stackelberg leader against a Wasserstein ambiguity set centered on the generator's empirical distribution, minimizing worst-case regret over plausible demand shifts. The per-step reward penalizes COI leakage, weighted by f(τ'), while a UX index bounds the harm to legitimate users. Sweeps ran on 384 TPU chips (v4, v5e, and v6e Trillium) and covered six contamination levels and several algorithm variants (PPO, A2C, DQN, and Q-table).
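The generator and the reward can be sketched as follows, under heavy assumptions. The following are all hypothetical:

- the mixing rule, in which each sample is an agent rollout with probability α and otherwise a resampled human session;
- the fixed rollout length and start state;
- the reward form: revenue minus λ * f * leakage, plus a hinge penalty when a UX index falls below a floor;
- names such as `contaminate`, `shaped_reward`, `lam`, `ux_floor`, and `ux_penalty`.

The sketch does not show the Wasserstein ambiguity set, the Stackelberg training loop, or the RL algorithms themselves.

```python
import random

STATES = ("cart", "dwell", "navigation", "filter")


def sample_trajectory(kernel, start, length, rng):
    """Roll out a Markov chain over STATES from `start` using a row-stochastic kernel."""
    index = {s: i for i, s in enumerate(STATES)}
    traj = [start]
    for _ in range(length - 1):
        row = kernel[index[traj[-1]]]
        traj.append(rng.choices(STATES, weights=row, k=1)[0])
    return traj


def contaminate(human_trajs, agent_kernel, alpha, n, rng,
                length=10, start="navigation"):
    """Hypothetical G(alpha): n labelled trajectories, each a synthetic agent w.p. alpha.

    Label 1 marks a synthetic agent rollout; label 0 marks a resampled real human session.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    batch = []
    for _ in range(n):
        if rng.random() < alpha:
            batch.append((sample_trajectory(agent_kernel, start, length, rng), 1))
        else:
            batch.append((list(rng.choice(human_trajs)), 0))
    return batch


def shaped_reward(revenue, coi_leakage, f_agent, ux_index,
                  lam=1.0, ux_floor=0.0, ux_penalty=10.0):
    """Hypothetical per-step reward.

    Revenue minus COI leakage scaled by the agent probability f, minus a hinge
    penalty whenever the UX index drops below `ux_floor`.
    """
    reward = revenue - lam * f_agent * coi_leakage
    reward -= ux_penalty * max(0.0, ux_floor - ux_index)
    return reward


rng = random.Random(0)
uniform = [[0.25] * 4 for _ in STATES]  # placeholder agent kernel
humans = [["navigation", "dwell", "dwell", "cart"]]
batch = contaminate(humans, uniform, alpha=0.3, n=1000, rng=rng)
print(sum(label for _, label in batch) / len(batch))  # close to alpha
```

In this form, a session with f near 0 keeps essentially its full revenue, so the leakage penalty concentrates on trajectories that look agent-like.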

BibTeX

@thesis{Rosel2025PHANTOM,
  title={Pricing Heuristics Against Non-human Transaction Orchestration Mechanisms},
  author={R{\"o}sel, Daniel},
  school={IE University},
  year={2025},
  address={Madrid, Spain},
  type={Bachelor's Thesis},
  note={Advisor: Alberto Mart{\'i}n Izquierdo}
}