Companies May Be Overspending on AI

06-08-2026

Companies rushing to deploy artificial intelligence may be burning money on expensive frontier models when cheaper alternatives could do the job, according to a new working paper from the Daniels School of Business’ Chen-An Lin.

Lin, an assistant professor in the Supply Chain and Operations Management Department, and fellow authors Yuan Guo of George Washington University and Stefanus Jasin of the University of Michigan examined how organizations can strategically invest in large language models without overspending on tokens from the most powerful systems. Their paper, “Congestion-Aware Static LLM Cascades,” reveals that firms don't need complex chains of expensive AI models because the marginal value of deeper routing can diminish quickly.

Rethinking AI as a service system

“Practitioners have to change their mindset to think about AI as not just a judge. It's a service,” Lin says. “And for high-quality service, it needs high accuracy. And at the same time, it should be quick.”

Most companies focus solely on accuracy when selecting AI models, Lin explains. But in production environments, response time matters just as much. Consider healthcare: when a doctor asks an AI system for diagnostic support in a high-stakes situation, the system needs to respond quickly while maintaining accuracy because a person’s life may be on the line. Lin notes that as AI usage increases, response times can grow longer, creating what researchers call “elastic service latency.”

The same principle applies in financial services, where firms need to detect fraudulent transactions rapidly. “Due to a large volume of transactions, they want to reduce or they want to have a quick lead time to get a response,” says Lin. “So basically, if the waiting time gets longer to get the response back from the model, the customer experience will be hurt.”

The congestion problem companies ignore

The research addresses a critical gap in how companies currently approach AI deployment. Existing frameworks like FrugalGPT and RouteLLM focus on balancing accuracy against token costs but treat latency as fixed. They miss how routing decisions create congestion in finite-capacity systems.

Lin's team modeled LLM inference as a multiclass queueing network where delay comes from an overloaded system. When companies send more traffic to a seemingly inexpensive model, that model becomes slow, increasing delay costs for all jobs using it.
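The congestion effect can be illustrated with a far simpler model than the paper's multiclass queueing network. The sketch below assumes a single M/M/1 queue (one server, random arrivals and service times), for which the mean time in system is 1 / (service rate - arrival rate). The service rate and traffic levels are hypothetical numbers chosen for illustration, not figures from the paper.

```python
def mean_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """Mean time a job spends waiting plus in service in an M/M/1 queue.

    This is a textbook simplification, not the paper's model.
    """
    if arrival_rate >= service_rate:
        return float("inf")  # overloaded: the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical capacity: jobs per second the "cheap" model can serve.
SERVICE_RATE = 10.0

# Mean delay at increasing traffic levels routed to the same model.
delays = {
    load: mean_time_in_system(load, SERVICE_RATE)
    for load in (5.0, 8.0, 9.0, 9.9)
}

for load, delay in delays.items():
    print(f"arrival rate {load:>4} jobs/s -> mean delay {delay:.2f} s")
```

Under these assumptions, delay does not grow in proportion to traffic: it climbs slowly at first and then sharply as the model approaches capacity, which is why a model that looks inexpensive per token can become costly once delay is priced in.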

The paper demonstrates that routing must be treated as a design decision in its own right in AI deployments; otherwise, it can create operational fragility. The researchers show that classical congestion-blind routing from programming literature naturally funnels traffic into accuracy bottlenecks, causing queue lengths to explode.

A simpler, shallower approach

One of the study's most practical findings is that companies don't need to invest in numerous LLMs or build deep orchestration chains.

“Our results robustly show that to build an infrastructure, you do not need to build a complex, very long LLM orchestration,” Lin says. “Basically, a shallow layer — maybe we can just utilize a small set of carefully routed LLMs. These can capture the most benefits of AI [for a company].”

The research introduces a scoring method based on what the paper calls “idle cost,” which combines query fees, classification accuracy and service speed. This metric helps companies rank models and determine which ones to use for different task types.
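The article does not give the paper's formula for idle cost, so the sketch below uses a hypothetical stand-in score to show how a ranking of this kind might work. The score `illustrative_score`, its weights `error_penalty` and `delay_cost`, and all model names and numbers are assumptions made for illustration only. The score adds the query fee, a penalty for expected errors, and a charge for service time, then keeps the two best-ranked models for a task type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    fee: float           # hypothetical cost per query
    accuracy: float      # hypothetical probability of a correct answer
    service_rate: float  # hypothetical queries served per second


def illustrative_score(m: Model, error_penalty: float, delay_cost: float) -> float:
    """Hypothetical per-query cost; NOT the paper's idle-cost formula."""
    return m.fee + error_penalty * (1.0 - m.accuracy) + delay_cost / m.service_rate


def shortlist(models, error_penalty: float, delay_cost: float, depth: int = 2):
    """Rank models by the illustrative score (lower is better), keep the top `depth`."""
    ranked = sorted(models, key=lambda m: illustrative_score(m, error_penalty, delay_cost))
    return [m.name for m in ranked[:depth]]


MODELS = [
    Model("small", fee=0.001, accuracy=0.80, service_rate=20.0),
    Model("medium", fee=0.010, accuracy=0.90, service_rate=8.0),
    Model("frontier", fee=0.100, accuracy=0.97, service_rate=2.0),
]

# Low-stakes task (for example, routine email replies): errors are cheap.
email_shortlist = shortlist(MODELS, error_penalty=0.05, delay_cost=0.02)

# High-stakes task (for example, complex research): errors are expensive.
research_shortlist = shortlist(MODELS, error_penalty=5.0, delay_cost=0.02)

print("email:", email_shortlist)
print("research:", research_shortlist)
```

With these made-up numbers, the lighter models rank first when mistakes are cheap, and the frontier model moves to the top only when the cost of an error is high. The ranking depends on the task type, and a short list of two models is kept in each case.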

Lin explains the framework, saying, “We don't need to have the most expensive, heavy-duty tool for every single task.” Just as a craftsperson selects appropriate tools for each job, companies should match AI models to task complexity.

For instance, simple email responses should route to lighter, less expensive models, while complex research tasks might justify frontier models. “For different types of tasks, in our model, we can give a ranking of those models. Then, for that type of job, you may just need to have access to the top two of the LLMs in the rank list.”

Strategic implementation guidance

The research provides concrete decision-making frameworks for executives considering AI infrastructure investments. “Let's say there's a company that has never used AI before, and right now the CEO decides to invest a huge amount of money into their AI infrastructure,” says Lin. “Then, how many models should they acquire? Our research helps executives reason through questions about how large and complex their AI routing infrastructure needs to be.”

Once companies acquire a set of models, the framework shows how to effectively use them to achieve high accuracy while minimizing spending and reducing poor customer experiences from delays.

For companies ready to transition from current cost-per-token routing strategies to congestion-aware approaches, Lin says awareness is the first step.

Thought Leadership from Purdue's Business School Daniels Insights