
A Transferable Retrieval-Augmented Generation Framework for Vertical-Domain Question Answering: From Academic Competitions to Cultural Tourism and Financial Technology


DOI: 10.23977/acss.2026.100104

Author(s)

Yongye Huang 1

Affiliation(s)

1 School of Mathematics and Statistics, Hanshan Normal University, Chaozhou, Guangdong, China

Corresponding Author

Yongye Huang

ABSTRACT

Vertical-domain question answering often relies on domain-specific retrieval pipelines and prompt designs, which limits robustness when systems are transferred across heterogeneous domains. This paper presents a transferable Retrieval-Augmented Generation (RAG) framework, in which external knowledge retrieval is integrated with large language model generation to produce grounded answers. The framework targets cross-domain transfer from academic competition problem solving to cultural tourism services and financial technology applications by unifying query normalization, hybrid retrieval, and citation-consistent generation. Specifically, a domain router predicts an inference policy that adaptively configures sparse retrieval, dense retrieval, and neural re-ranking, while a query rewriting module converts user questions into a structured canonical form to reduce domain shift. Retrieved evidence is then standardized through evidence canonicalization, providing a consistent input schema for downstream generation. To improve reliability, the generation module incorporates evidence alignment and post-generation verification, reducing unsupported statements and improving citation correctness. A transfer-oriented training strategy combines contrastive retrieval learning, lightweight domain adaptation, and domain-invariant regularization, enabling effective adaptation under limited target-domain supervision. Experiments across three representative scenarios show that the framework improves answer accuracy, evidence recall, and citation consistency in both in-domain and few-shot transfer settings, indicating strong transferability and practical value for deployable vertical-domain question answering systems.
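
For illustration, the sketch below composes the inference stages named in the abstract: domain routing to an inference policy, query rewriting into a canonical form, hybrid sparse/dense retrieval with score-based re-ranking, evidence canonicalization, and citation-checked generation. It is a minimal, self-contained Python example; all names (route_domain, canonicalize_evidence, the toy scoring functions) and the keyword-based routing rules are assumptions made for this sketch, not the paper's implementation.

```python
# Illustrative sketch of the inference pipeline described in the abstract.
# All names and routing rules are assumptions for this example only.
from collections import Counter
from math import sqrt

def route_domain(question: str) -> dict:
    """Toy domain router: picks an inference policy (retrieval weights, top-k)."""
    q = question.lower()
    if any(w in q for w in ("museum", "festival", "itinerary")):
        return {"domain": "cultural_tourism", "sparse_w": 0.6, "dense_w": 0.4, "top_k": 3}
    if any(w in q for w in ("loan", "risk", "payment")):
        return {"domain": "fintech", "sparse_w": 0.4, "dense_w": 0.6, "top_k": 3}
    return {"domain": "academic", "sparse_w": 0.5, "dense_w": 0.5, "top_k": 3}

def rewrite_query(question: str, policy: dict) -> str:
    """Toy canonical form: lowercase, strip punctuation, prefix with domain tag."""
    tokens = "".join(c if c.isalnum() or c.isspace() else " " for c in question).lower().split()
    return f"[{policy['domain']}] " + " ".join(tokens)

def sparse_score(query: str, doc: str) -> float:
    """Keyword-overlap stand-in for a sparse (BM25-style) retriever."""
    q, d = set(query.split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def dense_score(query: str, doc: str) -> float:
    """Bag-of-words cosine stand-in for a dense retriever."""
    qc, dc = Counter(query.split()), Counter(doc.lower().split())
    dot = sum(qc[t] * dc[t] for t in qc)
    norm = sqrt(sum(v * v for v in qc.values())) * sqrt(sum(v * v for v in dc.values()))
    return dot / norm if norm else 0.0

def hybrid_retrieve(query: str, corpus: list[str], policy: dict) -> list[dict]:
    """Blend sparse and dense scores per the routed policy, then rank by score."""
    scored = [
        {"doc_id": i, "text": doc,
         "score": policy["sparse_w"] * sparse_score(query, doc)
                  + policy["dense_w"] * dense_score(query, doc)}
        for i, doc in enumerate(corpus)
    ]
    return sorted(scored, key=lambda e: e["score"], reverse=True)[: policy["top_k"]]

def canonicalize_evidence(hits: list[dict]) -> list[dict]:
    """Normalize retrieved passages into a fixed schema for the generator."""
    return [{"cite_id": rank + 1, "source": f"doc-{h['doc_id']}", "text": h["text"].strip()}
            for rank, h in enumerate(hits)]

def generate_with_citations(question: str, evidence: list[dict]) -> str:
    """Placeholder generator: quote the top passage and attach its citation marker."""
    top = evidence[0]
    answer = f"{top['text']} [{top['cite_id']}]"
    # Post-generation verification: every cited id must exist in the evidence set.
    valid_ids = {e["cite_id"] for e in evidence}
    assert all(int(tok.strip("[]")) in valid_ids
               for tok in answer.split() if tok.startswith("[") and tok.endswith("]"))
    return answer

if __name__ == "__main__":
    corpus = [
        "The Guangji Bridge hosts a lantern festival every spring.",
        "Loan risk models combine repayment history with income features.",
        "Dynamic programming solves many academic competition problems.",
    ]
    question = "When does the lantern festival at the Guangji Bridge take place?"
    policy = route_domain(question)
    query = rewrite_query(question, policy)
    evidence = canonicalize_evidence(hybrid_retrieve(query, corpus, policy))
    print(generate_with_citations(question, evidence))
```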
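
The transfer-oriented training strategy can likewise be illustrated with a toy objective: an in-batch contrastive loss for retrieval plus a domain-invariant regularization term on query embeddings. This is a minimal PyTorch sketch under assumed shapes and hyperparameters; the actual losses, adapters, and weighting used in the paper may differ.

```python
# Illustrative transfer-oriented retrieval objective: InfoNCE-style contrastive
# loss over (query, positive passage) pairs plus a simple regularizer pulling
# source- and target-domain query embeddings together. Shapes, the regularizer's
# form, and the weight lam are assumptions for this sketch.
import torch
import torch.nn.functional as F

def contrastive_retrieval_loss(q_emb: torch.Tensor, p_emb: torch.Tensor,
                               temperature: float = 0.05) -> torch.Tensor:
    """In-batch InfoNCE: the i-th passage is the positive for the i-th query."""
    q = F.normalize(q_emb, dim=-1)
    p = F.normalize(p_emb, dim=-1)
    logits = q @ p.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(q.size(0))          # diagonal entries are positives
    return F.cross_entropy(logits, targets)

def domain_invariant_penalty(src_q: torch.Tensor, tgt_q: torch.Tensor) -> torch.Tensor:
    """Squared distance between mean source and target query embeddings."""
    return ((src_q.mean(dim=0) - tgt_q.mean(dim=0)) ** 2).sum()

if __name__ == "__main__":
    torch.manual_seed(0)
    dim, batch = 128, 8
    # Stand-ins for encoder outputs on source-domain and target-domain batches.
    src_queries, src_passages = torch.randn(batch, dim), torch.randn(batch, dim)
    tgt_queries = torch.randn(batch, dim)

    lam = 0.1  # weight of the domain-invariant regularizer (assumed hyperparameter)
    loss = contrastive_retrieval_loss(src_queries, src_passages) \
           + lam * domain_invariant_penalty(src_queries, tgt_queries)
    print(float(loss))
```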

KEYWORDS

Retrieval-Augmented Generation; Transfer Learning; Vertical-Domain Question Answering; Hybrid Retrieval; Domain Routing; Evidence Canonicalization; Citation Consistency

CITE THIS PAPER

Yongye Huang. A Transferable Retrieval-Augmented Generation Framework for Vertical-Domain Question Answering: From Academic Competitions to Cultural Tourism and Financial Technology. Advances in Computer, Signals and Systems (2026) Vol. 10: 27-38. DOI: http://dx.doi.org/10.23977/acss.2026.100104.


