01-10-2025
Large language models (LLMs) like ChatGPT have surged in popularity across industries, generating excitement in finance for their analytical capabilities. When asked to forecast expected returns based on historical stock market data, however, AI models tend to make the same mistakes as the humans who trained them.
In their paper “What Does ChatGPT Make of Historical Stock Returns? Extrapolation and Miscalibration in LLM Stock Return Forecasts,” Shuaiyu Chen and Huseyin Gulen from the Mitch Daniels School of Business collaborate with fellow researchers T. Clifton Green and Dexin Zhou to examine whether ChatGPT shows the same behavioral tendencies as humans when predicting stock returns.
The question is whether LLMs like ChatGPT, trained on large sets of data that largely originate from humans, will share the same cognitive biases as the people that data comes from.
To test this, Chen and Gulen compare stock return forecasts generated by ChatGPT with those made by humans, both produced from historical data. For instance, the researchers analyze how ChatGPT ranks a selection of ten stocks when given 12 weeks of historical return data and compare its rankings with the way humans rank the same stocks.
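To make the setup concrete, here is a minimal sketch of what such a ranking prompt might look like; the model name, tickers, return figures, and prompt wording are illustrative assumptions, not the prompt the researchers actually used.

```python
# Illustrative sketch of prompting an LLM to rank stocks from past returns.
# The model name, tickers, and prompt wording are assumptions for illustration,
# not the setup used in the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical sample: 12 weekly returns (in percent) for each stock.
weekly_returns = {
    "AAA": [1.2, -0.4, 0.8, 2.1, -1.0, 0.3, 1.5, -0.2, 0.9, 1.1, -0.6, 2.4],
    "BBB": [-0.8, 0.5, -1.2, 0.7, 1.9, -0.3, 0.4, 1.0, -1.5, 0.6, 0.2, -0.9],
    # ... eight more tickers would follow in a full ten-stock example
}

lines = [
    f"{ticker}: " + ", ".join(f"{r:+.1f}%" for r in rets)
    for ticker, rets in weekly_returns.items()
]
prompt = (
    "Below are 12 weeks of historical returns for several stocks.\n"
    + "\n".join(lines)
    + "\nRank the stocks from highest to lowest expected return next week."
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model; the paper's choice may differ
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```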
In this ranking exercise, humans and ChatGPT use past data in strikingly similar ways. Humans tend to place too much weight on the most recent returns, which hurts the accuracy of their forecasts, and ChatGPT generally makes the same error, overweighting recent trends and producing similarly flawed predictions.
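One simple way to gauge this tendency, assuming we have a model's rankings alongside the underlying return histories, is to compare how strongly the rankings track the most recent week's returns versus the full 12-week record. The sketch below uses made-up numbers purely for illustration.

```python
# Illustrative check for extrapolation: do the rankings track recent returns
# more closely than the longer history? All figures below are hypothetical.
from scipy.stats import spearmanr

# Hypothetical LLM ranking (1 = highest expected return) for ten stocks.
llm_rank = {"S1": 1, "S2": 2, "S3": 3, "S4": 4, "S5": 5,
            "S6": 6, "S7": 7, "S8": 8, "S9": 9, "S10": 10}

# Hypothetical last-week returns and 12-week average returns (percent).
last_week = {"S1": 3.1, "S2": 2.4, "S3": 1.8, "S4": 1.2, "S5": 0.9,
             "S6": 0.4, "S7": -0.2, "S8": -0.8, "S9": -1.5, "S10": -2.3}
avg_12w = {"S1": 0.3, "S2": 1.1, "S3": -0.2, "S4": 0.8, "S5": 1.4,
           "S6": 0.1, "S7": 0.9, "S8": 0.5, "S9": 1.2, "S10": 0.7}

tickers = list(llm_rank)
ranks = [llm_rank[t] for t in tickers]

# Negative correlation means better-ranked stocks had higher returns.
rho_recent, _ = spearmanr(ranks, [last_week[t] for t in tickers])
rho_history, _ = spearmanr(ranks, [avg_12w[t] for t in tickers])

print(f"rank vs. last-week returns: rho = {rho_recent:+.2f}")
print(f"rank vs. 12-week average:   rho = {rho_history:+.2f}")
# If the rankings line up much more tightly with last week's returns than
# with the 12-week record, they lean on recent performance, which is the
# extrapolation pattern described above.
```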
In another test, the researchers compare survey responses from CFOs asked to forecast stock returns with forecasts generated by ChatGPT. While ChatGPT's forecasts are generally more accurate than the human predictions, they tend to be overly optimistic about expected performance.
ChatGPT generally shows stronger math and risk-assessment skills, but its overoptimism may be a mistake inherited from humans. Since research shows that people tend to be overly optimistic across a variety of settings, that optimism may be reflected in the data used to train LLMs like ChatGPT.
Chen and Gulen’s findings suggest that ChatGPT and similar AI models, much like the humans who trained them, may not always interpret data in a fully rational manner. Their analysis illustrates the importance of critically evaluating the role of AI in finance, reminding us not to place unquestioning trust in AI outputs, no matter how intelligent they may seem.