A New Playbook for Wholesale Pricing

03-02-2026

In today’s data-driven supply chains, it’s not just customers who are learning — your partners are learning too. When your retailers constantly update their inventory models and pricing, your own wholesale pricing strategy has to learn to keep up.

A study coauthored by Daniels School Associate Professor William Haskell, “Learning to Price Supply Chain Contracts Against a Learning Retailer,” looks at a common but underexplored situation: a supplier sells through a retailer (such as a consumer brand selling via Amazon), but only sees the retailer’s orders, not end-customer demand.

Each period, the supplier sets a wholesale price, while the retailer, using its own data and algorithms, chooses an order quantity. Customers then generate random demand, which only the retailer observes.

Crucially, neither party knows the true demand distribution in advance. The retailer uses some data-driven inventory policy (which may change over time), but the supplier does not know that policy and must infer market conditions only from order quantities.
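To make the setup concrete, here is a minimal Python sketch of the repeated interaction. Everything specific in it is an assumption for illustration and is not taken from the study: the fixed retail price, the unit production cost, the uniform demand draw, and the hypothetical function names saa_order and simulate. The retailer is modeled with one simple sample-average rule (ordering an empirical quantile of its private demand history), whereas the study allows for other policies that may change over time.

```python
import math
import random


def saa_order(demand_history, wholesale_price, retail_price):
    """Hypothetical retailer rule: order the empirical critical-fractile quantile.

    The critical fractile (retail - wholesale) / retail is the textbook
    newsvendor target; using it here is an assumption, not the study's model.
    """
    fractile = (retail_price - wholesale_price) / retail_price
    if not demand_history or fractile <= 0:
        return 0
    ordered = sorted(demand_history)
    # Smallest observed demand whose empirical CDF reaches the fractile.
    k = max(0, math.ceil(fractile * len(ordered)) - 1)
    return ordered[min(k, len(ordered) - 1)]


def simulate(prices, retail_price=10.0, unit_cost=2.0, seed=0):
    """Run one period per wholesale price; return what the supplier can see."""
    rng = random.Random(seed)
    # The retailer's private demand data (assumed uniform on 0..20).
    demand_history = [rng.randint(0, 20) for _ in range(5)]
    supplier_log = []
    for w in prices:
        q = saa_order(demand_history, w, retail_price)
        # The supplier observes only its own price, the order, and its profit.
        supplier_log.append((w, q, (w - unit_cost) * q))
        # Realized demand is seen by the retailer alone and updates its data.
        demand_history.append(rng.randint(0, 20))
    return supplier_log


if __name__ == "__main__":
    for w, q, profit in simulate([3.0, 5.0, 7.0, 5.0]):
        print(f"price={w:.2f} order={q} supplier_profit={profit:.2f}")
```

Note that the supplier's log never contains a demand observation. Any pricing policy in this setting has to work from the price and order columns alone.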

The core question: Can a supplier still do nearly as well as an “all-knowing” supplier who fully understands demand and the retailer’s policy? The answer, under realistic conditions, is yes.

Key findings

The research shows how suppliers can “learn to price” against a learning retailer. The authors design pricing policies that approach the performance of an ideal clairvoyant supplier. Over time, the average per-period performance gap shrinks, meaning the supplier’s cumulative losses from not knowing the environment grow more slowly than the number of periods instead of piling up at a constant rate.
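One common way to formalize a shrinking performance gap is through regret. The notation below is an assumption introduced here for illustration (unit cost c, wholesale price w_t chosen in period t, and the retailer's period-t order response q_t(w)), and the study's exact benchmark may be defined differently.

```latex
\pi_t(w) = (w - c)\, q_t(w), \qquad
R_T = \sum_{t=1}^{T} \Bigl( \max_{w} \pi_t(w) - \pi_t(w_t) \Bigr), \qquad
\frac{R_T}{T} \to 0 \ \text{ as } T \to \infty .
```

Here R_T is the cumulative profit the supplier gives up relative to a clairvoyant who knows each q_t. The average gap R_T / T vanishes exactly when R_T grows more slowly than T.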

Importantly, the approach works for both discrete and continuous demand. Whether products are sold in units like pallets and cases or in continuous quantities such as tons or gallons, the framework can be adapted.

The study also finds that suppliers do not need direct visibility into customer demand. The algorithm relies only on the supplier’s own historical prices, the retailer’s resulting orders and basic knowledge of feasible demand ranges. This is important because most suppliers lack access to real-time point-of-sale data.

The authors warn that off-the-shelf bandit methods and generic online-learning tools can perform poorly in this setting. In realistic scenarios — such as when the retailer uses standard sample-average inventory policies — these tools can generate losses that grow linearly over time. In simulations, the tailored pricing policy consistently outperforms these general-purpose algorithms.

The study also introduces a smarter way to measure how much the environment is changing. Instead of tracking how profits fluctuate, the authors track how the retailer’s implied demand estimates shift over time. This better captures how retailer learning affects the supplier’s world and allows robust pricing even when the retailer switches or mixes decision policies.

Practical implications

For suppliers and upstream leaders, three messages stand out:

  • Don’t treat your retailer as a black box; treat them as a learning agent. The retailer’s orders reflect not only customer demand but also the retailer’s own algorithms and model updates. Monitoring how order patterns change over time can reveal when the retailer’s model or behavior has shifted.
  • You can build pricing that learns, even with limited visibility. You don’t need direct POS data to get smarter. A disciplined, experiment‑friendly pricing process — occasionally testing different wholesale prices and tracking resulting orders — can move you closer to “clairvoyant” performance over time.
  • Custom, structure‑aware methods beat generic tools. If your data-science team is using generic online-learning or bandit tools for wholesale pricing, there’s likely value left on the table. Approaches that exploit the specific economics of wholesale contracts and inventory behavior can deliver meaningfully better performance.

Actionable insights

Here are concrete moves to consider if you’re a supplier or run an upstream business unit:

  • Instrument your retailer interactions. Capture, period by period: wholesale prices, order quantities, lead times, and any known constraints (capacity, minimum order quantities). Ensure this data is clean, time‑stamped and easily accessible for analytics.
  • Build a “learning wholesale pricing” capability. Start with a simple, structured experiment design: periodically vary wholesale prices within an acceptable band to learn how the retailer reacts. Use this to estimate how sensitive their orders are to price and how that sensitivity changes over time.
  • Watch for non-stationarity signals. Large, sudden changes in order quantities at similar prices often signal that the retailer has changed their model, policy or expectations. Have a playbook: when such shifts are detected, temporarily increase experimentation to relearn the landscape.
  • Design contracts with learning in mind. Include clauses or mechanisms that give you visibility or allow controlled price experimentation over the contract term. Where possible, align incentives so that the retailer’s learning doesn’t inadvertently hurt joint profits.
  • Upgrade analytics from “learn demand” to “learn others’ learning.” Traditional analytics focuses on estimating customer demand. This work suggests you should also model how your partners learn and respond. In multi‑agent environments (platforms, marketplaces, ecosystems), treating counterparties as adaptive learners and designing your pricing algorithms accordingly will become a key competitive advantage.
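As a starting point for the instrumentation, experimentation and shift-monitoring steps above, the following Python sketch works on a simple log of (wholesale price, order quantity) pairs. It is an illustrative heuristic under stated assumptions, not the pricing algorithm from the study: the hypothetical function names, the least-squares slope as a sensitivity estimate, the price tolerance, the window size and the 30% threshold are all choices made here for the example.

```python
from statistics import mean


def price_sensitivity(records):
    """Least-squares slope of order quantity on wholesale price.

    records is a list of (wholesale_price, order_quantity) pairs.
    Returns None when prices never varied, since no slope can be estimated.
    """
    prices = [p for p, _ in records]
    orders = [q for _, q in records]
    p_bar, q_bar = mean(prices), mean(orders)
    variation = sum((p - p_bar) ** 2 for p in prices)
    if variation == 0:
        return None
    covariation = sum((p - p_bar) * (q - q_bar) for p, q in records)
    return covariation / variation


def order_shift_detected(records, ref_price, price_tol=0.25,
                         window=4, rel_threshold=0.3):
    """Flag a large change in orders placed at prices close to ref_price.

    Compares the mean of the most recent `window` orders against the mean
    of all earlier orders at similar prices (assumed heuristic).
    """
    similar = [q for p, q in records if abs(p - ref_price) <= price_tol]
    if len(similar) < 2 * window:
        return False  # not enough comparable periods yet
    baseline = mean(similar[:-window])
    recent = mean(similar[-window:])
    if baseline == 0:
        return recent != 0
    return abs(recent - baseline) / baseline > rel_threshold


if __name__ == "__main__":
    experiment = [(4.0, 12), (5.0, 10), (6.0, 8)]
    print("estimated slope:", price_sensitivity(experiment))
    history = [(5.0, 10)] * 4 + [(5.0, 16)] * 4
    print("shift detected:", order_shift_detected(history, ref_price=5.0))
```

A flag from order_shift_detected could serve as the trigger for temporarily widening the price band being tested. How wide and for how long is a business decision that this sketch does not address.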

Ultimately, this research shows that suppliers don’t have to be passive price‑takers in a world of sophisticated, data‑driven retailers. With the right learning‑based pricing strategies, you can remain agile, protect margins and keep pace with partners whose algorithms are changing just as fast as the market itself.