The Training Process and Methods for LLMs Using an Own Knowledge Base
DOI: 10.23977/jaip.2024.070306
Author(s)
Sheng Zhiyuan 1
Affiliation(s)
1 Wuhan Foreign Languages School, Wuhan, Hubei, China
Corresponding Author
Sheng Zhiyuan
ABSTRACT
This paper explores the development of frameworks and training methods for large language models (LLMs), focusing on the importance of self-built data (own data, or an own knowledge base), the specific processes of model pre-training and fine-tuning, and model performance evaluation and deployment. By analysing the advantages and disadvantages of mainstream large language models (such as GPT-4, BERT, LLaMA, and Mistral), we illustrate their strengths and limitations in natural language processing tasks. The paper particularly emphasises the critical role of self-built data in enhancing a model's domain expertise and accuracy, discussing methods for data collection and processing. We detail the steps of model pre-training and their impact on model performance, explore the necessity and implementation of model fine-tuning, and validate the effectiveness of the proposed framework and training method through performance-evaluation metrics and actual deployment results.
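The data collection and processing stage described above, preparing self-built documents as a knowledge base for fine-tuning or retrieval, could be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; all function names, parameters, and the overlapping word-level chunking strategy are assumptions introduced here.

```python
# Minimal sketch: turning raw documents from an own knowledge base
# into fixed-size, overlapping text chunks suitable for LLM fine-tuning
# or retrieval. All names and defaults are illustrative assumptions.

def chunk_document(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split a document into overlapping word-level chunks.

    Overlap preserves context that would otherwise be cut at chunk
    boundaries (assumes overlap < chunk_size).
    """
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

def build_knowledge_base(documents: dict[str, str]) -> list[dict]:
    """Attach source metadata to each chunk so provenance is retained."""
    records = []
    for doc_id, text in documents.items():
        for i, chunk in enumerate(chunk_document(text)):
            records.append({"source": doc_id, "chunk_id": i, "text": chunk})
    return records
```

Keeping the source identifier on each record lets later stages trace a fine-tuning example or retrieved passage back to the original document, which matters when curating a domain-specific corpus.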
KEYWORDS
Large Language Models, LLMs, Own Data, Own Knowledge Base, Pre-Training, Fine-Tuning, Performance Evaluation, NLP, CV
CITE THIS PAPER
Sheng Zhiyuan, The Training Process and Methods for LLMs Using an Own Knowledge Base. Journal of Artificial Intelligence Practice (2024) Vol. 7: 41-47. DOI: http://dx.doi.org/10.23977/jaip.2024.070306.