
Exploration of Formalization Techniques for Geometric Entities in Planar Geometry Proposition Texts


DOI: 10.23977/jaip.2025.080105

Author(s)

Huiying Tang 1, Xijia Wu 2, Lian Chen 2

Affiliation(s)

1 School of Modern Information Industry, Guangzhou College of Commerce, Guangzhou, China
2 School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China

Corresponding Author

Lian Chen

ABSTRACT

This paper explores formalization techniques for geometric entities in planar geometry proposition texts and proposes a formal definition of planar geometric entities. The study investigates the formalization of planar geometry proposition texts using named entity recognition (NER) technology and recent large language models. The classic NER model BiLSTM-CRF is used for sequence labeling and training to achieve geometric entity recognition, followed by dictionary-based formalization. In addition, a prompt-based approach built on Baidu's ERNIE-Bot and iFlytek's Spark large language models is employed for geometric entity recognition and subsequent formalization. The results show that the BiLSTM-CRF model achieves an accuracy of 98% in geometric entity recognition, demonstrating stable performance but limited flexibility. ERNIE-Bot performs exceptionally well, with a recognition accuracy of 99%, although it struggles with certain special geometric entities. The Spark model performs slightly worse, with issues mainly involving repeated or missed entity recognition. While the classic deep learning model is highly stable on common expressions in planar geometry proposition texts, it adapts poorly to new and complex expressions absent from the training set. Large language models excel at simple tasks owing to their strong language processing and contextual understanding capabilities; however, their accuracy depends heavily on the precision of prompt design, and their commercial use entails significant costs. This paper provides technical insights into the formalization of geometric entities in planar geometry proposition texts and offers a useful reference for applications in intelligent education, particularly geometry teaching and automated proof generation.
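As a rough illustration of the pipeline described above (NER tagging followed by dictionary-based formalization), the Python sketch below assumes BIO-style tags produced by a tagger such as BiLSTM-CRF and a small hand-written entity-to-template dictionary. The tag names, dictionary entries, and the extract_entities/formalize helpers are hypothetical and only make the two stages concrete; they are not the authors' actual implementation.

# A minimal sketch of post-NER, dictionary-based formalization (assumptions noted above).
from typing import List, Tuple

# Hypothetical output of a BiLSTM-CRF tagger on the text
# "线段AB交圆O于点C" ("segment AB intersects circle O at point C")
chars = ["线", "段", "A", "B", "交", "圆", "O", "于", "点", "C"]
tags  = ["B-SEG", "I-SEG", "I-SEG", "I-SEG", "O",
         "B-CIR", "I-CIR", "O", "B-PT", "I-PT"]

# Hypothetical dictionary mapping entity types to formal templates.
FORMAL_DICT = {
    "SEG": lambda pts: f"Segment({', '.join(pts)})",
    "CIR": lambda pts: f"Circle({', '.join(pts)})",
    "PT":  lambda pts: f"Point({', '.join(pts)})",
}

def extract_entities(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (type, surface string) pairs from BIO-tagged characters."""
    entities, current_type, buf = [], None, []
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if current_type:
                entities.append((current_type, "".join(buf)))
            current_type, buf = tag[2:], [ch]
        elif tag.startswith("I-") and current_type == tag[2:]:
            buf.append(ch)
        else:
            if current_type:
                entities.append((current_type, "".join(buf)))
            current_type, buf = None, []
    if current_type:
        entities.append((current_type, "".join(buf)))
    return entities

def formalize(entities: List[Tuple[str, str]]) -> List[str]:
    """Map each recognized entity to a formal expression via the dictionary."""
    formal = []
    for etype, surface in entities:
        # Uppercase letters in the surface string are treated as point labels.
        points = [c for c in surface if c.isupper()]
        formal.append(FORMAL_DICT[etype](points))
    return formal

if __name__ == "__main__":
    ents = extract_entities(chars, tags)
    print(ents)              # [('SEG', '线段AB'), ('CIR', '圆O'), ('PT', '点C')]
    print(formalize(ents))   # ['Segment(A, B)', 'Circle(O)', 'Point(C)']

The same dictionary lookup could be applied to entities returned by a prompted large language model instead of a BiLSTM-CRF tagger; only the recognition stage changes, not the formalization stage.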

KEYWORDS

LLMs, NER, BiLSTM, Text Formalization

CITE THIS PAPER

Huiying Tang, Xijia Wu, Lian Chen, Exploration of Formalization Techniques for Geometric Entities in Planar Geometry Proposition Texts. Journal of Artificial Intelligence Practice (2025) Vol. 8: 37-47. DOI: http://dx.doi.org/10.23977/jaip.2025.080105.

REFERENCES

[1] Zhang, J., & Ji, L. (2018). Artificial Intelligence Enhanced Education or Education for Development of Intelligence: An Analysis on Connotation and Target of Intelligent Education in the Age of Artificial Intelligence. Modern Distance Education Research, (02), 14-23.
[2] Chen, W., Yin, M., Ku, M., Lu, P., Wan, Y., Ma, X., ... & Xia, T. (2023, December). TheoremQA: A theorem-driven question answering dataset. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 7889-7901).
[3] Liu, W., Hu, H., Zhou, J., Ding, Y., Li, J., Zeng, J., ... & He, L. (2023). Mathematical language models: A survey. arXiv preprint arXiv:2312.07622.
[4] Chen, J., Tang, J., Qin, J., Liang, X., Liu, L., Xing, E. P., & Lin, L. (2021). GeoQA: A geometric question answering benchmark towards multimodal numerical reasoning. arXiv preprint arXiv:2105.14517.
[5] Gan, W. (2018). Research on Automatically Solving Plane Geometry Problems (Doctoral dissertation, Central China Normal University). https://kns.cnki.net/kcms2/article/abstract?v=x0sJsLzXXbifstQ4abWx-ScKx680f7nGI4SknzZ1b77oYizZSu6djQB2unmRDyZ1Blilue9fFj172EN0ftOI8z-Jr3z4yEyB3xVgdT0hXwgt8lgm0IQvxvcciqxI_szmpqGpcoNZ4aryBp8Pnz9hlnrARuCbiQTdi5AL9IHyMsSxO6d10vvwiyViFAcWrHij0tD0Lm2r-9M=&uniplatform=NZKPT&language=CHS
[6] Lample, G., & Charton, F. (2019). Deep learning for symbolic mathematics. arXiv preprint arXiv:1912.01412.
[7] Ning, M., Wang, Q. F., Huang, K., & Huang, X. (2023, October). A symbolic characters aware model for solving geometry problems. In Proceedings of the 31st ACM International Conference on Multimedia (pp. 7767-7775).
[8] Zhang, J., & Li, Y. (2009). Thirty years of machine proof of geometric theorems. Journal of Systems Science and Mathematical Sciences, 29(9), 1155.
[9] Xia, R., Li, M., Ye, H., Wu, W., Zhou, H., Yuan, J., ... & Zhang, B. (2024). GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training. arXiv preprint arXiv:2412.11863.
[10] Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., ... & Kasneci, G. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and individual differences, 103, 102274.
[11] Lafferty, J., McCallum, A., & Pereira, F. (2001, June). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning (ICML 2001) (pp. 282-289).
[12] Huang, Z., Xu, W., & Yu, K. (2015). Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991.
[13] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019, June). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT (Vol. 1, pp. 4171-4186).
[14] Joshi, M., Chen, D., Liu, Y., Weld, D. S., Zettlemoyer, L., & Levy, O. (2020). Spanbert: Improving pre-training by representing and predicting spans. Transactions of the association for computational linguistics, 8, 64-77.
[15] He, P., Liu, X., Gao, J., & Chen, W. (2020). Deberta: Decoding-enhanced bert with disentangled attention. arXiv preprint arXiv:2006.03654. 
[16] Koroteev, M. V. (2021). BERT: a review of applications in natural language processing and understanding. arXiv preprint arXiv:2103.11943.
[17] Zhu, W., & Cheung, D. (2021). Lex-bert: Enhancing bert based ner with lexicons. arXiv preprint arXiv:2101.00396.
[18] Welsby, P., & Cheung, B. M. (2023). ChatGPT. Postgraduate Medical Journal, 99(1176), 1047-1048. 
[19] Cosentino, R., & Shekkizhar, S. (2024). Reasoning in large language models: A geometric perspective. arXiv preprint arXiv:2407.02678.
[20] Zhang, Y. (2018). Design and creation of a formalized geometric knowledge base. Information Technology, (05), 139-144. doi:10.13274/j.cnki.hdzj.2018.05.031.
[21] Fu, F. (2019). The Formal Proof of Geometric Theorem in Coq Case Study: The Formal Proof of Yanglu Theorem (Master's dissertation, Beijing University of Posts and Telecommunications). https://kns.cnki.net/kcms2/article/abstract?v=x0sJsLzXXbhUTm9QAaVaWw0sxdbtua_YJ9gcAi8jIlaLNl3yqiyusHcIUMmCTdFwI5NQAHM2rdOcrTLIfx4jl8ZNaxl_58h6iqFYAlzXRh16GVwHzA49UUasLg_QJccHbT7y2V4rB_goSEOpCRVEoQ3ZtEz5ibSWBGy5Ze3udB5eeFaC6K__cF4znQ2l2FqCuPX-usCsPwU=&uniplatform=NZKPT&language=CHS
[22] Lignos, C., Kruse, M., & Rueda, A. (2023, December). Improving NER Research Workflows with SeqScore. In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023) (pp. 147-152). 
[23] Qiu, Q., Xie, Z., Wu, L., Tao, L., & Li, W. (2019). BiLSTM-CRF for geological named entity recognition from the geoscience literature. Earth Science Informatics, 12, 565-579.
[24] Shao, J., Li, L., & Nie, Y. (2022). A Chinese entity extraction method based on BILSTM_CRF. China CIO News, (10), 124-127.
[25] Ji, Y., & Gao, S. (2023). Evaluating the effectiveness of large language models in representing textual descriptions of geometry and spatial relations. arXiv preprint arXiv:2307.03678.
[26] Gao, J., Pi, R., Zhang, J., Ye, J., Zhong, W., Wang, Y., ... & Kong, L. (2023). G-LLaVA: Solving geometric problem with multi-modal large language model. arXiv preprint arXiv:2312.11370.




All published work is licensed under a Creative Commons Attribution 4.0 International License.
