Open Access

Research on Graph-based Text Summarization Extraction Algorithm


DOI: 10.23977/acss.2024.080603

Author(s)

Junhong Chen 1,2, Kaihui Peng 3

Affiliation(s)

1 School of Software Engineering, South China University of Technology, Guangzhou, China
2 LeiHuo Studio, NetEase, Hangzhou, China
3 Faculty of Business and Economics, University of Malaya, Kuala Lumpur, Malaysia

Corresponding Author

Junhong Chen

ABSTRACT

This paper proposes a graph-based extractive text summarization algorithm. The algorithm builds a directed graph over the sentences of a document, which allows sentence position information to be incorporated into the computation. When computing the edge weights between nodes in the directed graph, it uses a pre-trained model trained with negative sampling, which not only extracts deeper semantic features but also strengthens the relevance between contextual sentences in the article. The algorithm further introduces a weighting mechanism that adjusts the extraction priority of sentences according to the article's theme, so that the extracted summary sentences are of higher quality and represent the key information of the text as faithfully as possible. As a result, the algorithm captures the key information in the text, reduces the impact of irrelevant content on the summary's semantics, and serves as an effective form of text compression.
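The approach described in the abstract can be illustrated with a minimal sketch of graph-based extractive summarization. This is not the authors' implementation: the token-overlap `similarity` function is a hypothetical stand-in for the pre-trained-model edge weights, the backward-pointing edges are one simple way to encode sentence position in a directed graph, and `theme_terms` is a placeholder for the paper's theme-weighting mechanism.

```python
# Sketch of directed-graph extractive summarization (TextRank-style).
# Stand-ins for the paper's components are noted in the comments.
import math

def similarity(a, b):
    """Token-overlap similarity; a placeholder for a pre-trained
    sentence encoder's cosine similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / (math.log(len(ta) + 1) + math.log(len(tb) + 1))

def summarize(sentences, theme_terms=(), top_k=2, d=0.85, iters=50):
    n = len(sentences)
    # Directed edges: each sentence points back to the sentences before it,
    # so sentence position shapes how scores propagate through the graph.
    w = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j):
            w[j][i] = similarity(sentences[j], sentences[i])
    # PageRank-style power iteration over the weighted directed graph.
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                out = sum(w[j])
                if w[j][i] > 0 and out > 0:
                    rank += w[j][i] / out * scores[j]
            new.append((1 - d) / n + d * rank)
        scores = new
    # Theme weighting: boost sentences that mention the article's theme.
    for i, s in enumerate(sentences):
        if any(t.lower() in s.lower() for t in theme_terms):
            scores[i] *= 1.5
    # Return the top-k sentences in their original document order.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(ranked)]
```

Swapping `similarity` for embedding-based scores from a pre-trained model would recover the deeper semantic features the paper relies on; the graph structure and ranking loop stay the same.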

KEYWORDS

Text Summarization; Keyword Extraction; Pre-trained Model

CITE THIS PAPER

Junhong Chen, Kaihui Peng, Research on Graph-based Text Summarization Extraction Algorithm. Advances in Computer, Signals and Systems (2024) Vol. 8: 13-22. DOI: http://dx.doi.org/10.23977/acss.2024.080603.



All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.