Research on the Integration Mechanism of Large Language Models and Analyst Consensus: An Empirical Study Based on Financial Forecasting

Jiqi Li

doi:10.23977/infse.2026.070108

Author(s)

Jiqi Li ¹

Affiliation(s)

¹ School of Economics and Management, Tongji University, Shanghai, China

Corresponding Author

Jiqi Li

ABSTRACT

When both a human analyst signal and a large language model (LLM) judgment are available for a financial forecasting task, what governs the quality of the result is the way the two are combined rather than the mere fact of combining them. This paper compares three prompt-defined integration mechanisms implemented through a single commercial LLM over 6,363 U.S. firm-year observations. The mechanisms differ only in how each handles the analyst signal. The signal-free baseline withholds it entirely and asks the model to read the financial statements on its own. Structured integration supplies the consensus alongside simple reliability metadata. Critical integration goes further and requires the model to challenge that signal through an explicit multi-step deliberative protocol. The engaged-use-of-AI principle from the human-AI collaboration literature predicts that the most deliberative mechanism should perform best, yet the evidence points the other way. Structured integration attains the highest accuracy at about 59%. Critical integration attains the lowest at about 54%, below even the signal-free baseline at about 55%, and the gaps are statistically significant. The structured advantage concentrates in financially healthier and non-loss firms, where analyst signals are most reliable. Viewed alongside the emerging literature on LLM overthinking and the long-standing verbal-overshadowing effect, this reversal marks a boundary condition of the engaged-use principle. Critical engagement adds value when a human expert interrogates an AI, because the expert brings independent domain knowledge to the critique; transposed onto an LLM asked to scrutinize a human signal, the same protocol supplies no new information and introduces deliberative interference instead.
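As a rough illustration of what "prompt-defined" mechanisms that "differ only in how each handles the analyst signal" could look like, the sketch below builds three prompt variants from one shared task description. It is not the paper's actual prompt text. The binary direction task, the `AnalystSignal` fields (analyst count and forecast dispersion as "reliability metadata"), the wording of the deliberative steps, and every function name are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnalystSignal:
    """Hypothetical container; the fields are assumed, not taken from the paper."""
    direction: str     # consensus call, e.g. "increase" or "decrease"
    n_analysts: int    # assumed reliability metadata: analysts behind the consensus
    dispersion: float  # assumed reliability metadata: spread of individual forecasts


# Assumed task wording: a binary direction forecast from financial statements.
TASK = (
    "Based on the financial statements below, predict whether next year's "
    "earnings will increase or decrease."
)

# Assumed wording for an explicit multi-step deliberative protocol.
CRITICAL_STEPS = (
    "Before answering:\n"
    "1. State the strongest reason the analyst consensus could be wrong.\n"
    "2. State the strongest reason it could be right.\n"
    "3. Weigh both against the statements and give your final answer."
)


def build_prompt(mechanism: str, statements: str,
                 signal: Optional[AnalystSignal] = None) -> str:
    """Return the prompt for 'baseline', 'structured', or 'critical'."""
    parts = [TASK, "Financial statements:\n" + statements]

    # Baseline: the analyst signal is withheld entirely.
    if mechanism == "baseline":
        return "\n\n".join(parts)

    if mechanism not in ("structured", "critical"):
        raise ValueError(f"unknown mechanism: {mechanism}")
    if signal is None:
        raise ValueError(f"mechanism '{mechanism}' needs an analyst signal")

    # Structured: consensus plus simple reliability metadata.
    parts.append(
        f"Analyst consensus: {signal.direction} "
        f"(analysts: {signal.n_analysts}, "
        f"forecast dispersion: {signal.dispersion:.2f})"
    )

    # Critical: same inputs, plus the instruction to challenge the signal.
    if mechanism == "critical":
        parts.append(CRITICAL_STEPS)

    return "\n\n".join(parts)


if __name__ == "__main__":
    example = AnalystSignal(direction="increase", n_analysts=8, dispersion=0.12)
    for name in ("baseline", "structured", "critical"):
        print(f"--- {name} ---")
        print(build_prompt(name, "Revenue: 100; Net income: 10", example))
```

Building all three variants from the same `TASK` and statement text keeps the analyst-signal handling as the only thing that changes between prompts, which is the property the comparison relies on.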

KEYWORDS

Large Language Models, Human-AI Collaboration, Financial Forecasting, Prompt Engineering, Analyst Consensus

CITE THIS PAPER

Jiqi Li. Research on the Integration Mechanism of Large Language Models and Analyst Consensus: An Empirical Study Based on Financial Forecasting. Information Systems and Economics (2026), Vol. 7, No. 1, 68-73. DOI: http://dx.doi.org/10.23977/infse.2026.070108.

