TSPPT: Two-Stage Prompt Pre-Train to Promote Few-Shot Learning Performance
DOI: 10.23977/acss.2024.080107
Author(s)
Feng Jiang 1, Chengguo Lv 1
Affiliation(s)
1 Department of Computer Science, Heilongjiang University, Harbin, China
Corresponding Author
Chengguo Lv
ABSTRACT
The Pretrained Language Model (PLM) has achieved dominance in the field of Natural Language Processing (NLP), and prompt learning further enhances its impact by aligning the pre-training tasks of the language model with downstream tasks. However, compared with traditional fine-tuning, prompt learning suffers from disadvantages such as lower absolute accuracy, low training efficiency, and poor robustness, especially when the language model itself is small or the training data are insufficient. A large number of studies have shown that the main defect of prompt learning (PL) at the present stage is that the quality of the prompt itself plays an important role in model performance, and existing prompt initialization methods are often not optimal. Therefore, we propose Two-Stage Prompt Pre-Train (TSPPT): using special pre-training tasks, obtained by constructing or reformulating raw texts and downstream tasks, to pre-train two sub-prompts, a Task-oriented Sub-Prompt (TSP) and a Universal Sub-Prompt (USP), in two successive stages. By concatenating the USP and TSP as the prompt initialization with which the language model is prompt-tuned on downstream tasks, TSPPT improves overall performance in terms of robustness, accuracy, and generalization. Experiments show that TSPPT can match or even exceed the performance of traditional fine-tuning while retaining the advantage of freezing the language model parameters and tuning only a few prompt parameters.
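Since the abstract only outlines the mechanism, the following is a minimal PyTorch sketch of the core idea as we read it: a soft prompt is initialized by concatenating a pre-trained USP and TSP, prepended to the input embeddings, and left as the only trainable tensor while the PLM stays frozen. The class name TSPPTPrompt, the prompt lengths, and the hidden size are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the TSPPT initialization described in the abstract,
# not the authors' released code. All names, shapes, and lengths below
# are illustrative assumptions.
import torch
import torch.nn as nn


class TSPPTPrompt(nn.Module):
    def __init__(self, usp_init: torch.Tensor, tsp_init: torch.Tensor):
        # usp_init: [usp_len, hidden], tsp_init: [tsp_len, hidden],
        # each produced by one of the two pre-training stages.
        super().__init__()
        # Concatenate USP and TSP as the prompt initialization; this
        # parameter is the only thing tuned on the downstream task.
        self.prompt = nn.Parameter(torch.cat([usp_init, tsp_init], dim=0))

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend the soft prompt to every sequence in the batch.
        # input_embeds: [batch, seq_len, hidden]
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)


# Usage sketch: random tensors stand in for the two pre-trained
# sub-prompts; in TSPPT they would be loaded from the pre-training stages.
hidden = 768
usp = torch.randn(20, hidden)  # placeholder for the pre-trained USP
tsp = torch.randn(20, hidden)  # placeholder for the pre-trained TSP
soft_prompt = TSPPTPrompt(usp, tsp)

# The PLM's parameters would be frozen (requires_grad_(False)); only the
# concatenated prompt receives gradients during prompt-tuning.
optimizer = torch.optim.AdamW(soft_prompt.parameters(), lr=3e-2)
```

This keeps the trainable parameter count at (usp_len + tsp_len) x hidden, which is the parameter-efficiency advantage the abstract claims over full fine-tuning.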
KEYWORDS
Artificial Intelligence, Natural Language Processing, Few-Shot Learning, Prompt Learning
CITE THIS PAPER
Feng Jiang, Chengguo Lv, TSPPT: Two-Stage Prompt Pre-Train to Promote Few-Shot Learning Performance. Advances in Computer, Signals and Systems (2024) Vol. 8: 63-71. DOI: http://dx.doi.org/10.23977/acss.2024.080107.