Interpretable Machine Learning in Enzyme Classification: A SHAP-Guided Analysis of Global Structural Features
DOI: 10.23977/jaip.2026.090113
Author(s)
Donghan Li 1
Affiliation(s)
1 Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
Corresponding Author
Donghan Li
ABSTRACT
Accurate computational annotation of protein function addresses a critical bottleneck in bioinformatics. This study presents an explainable machine learning framework that predicts a protein's functional class from its physicochemical profile. A Random Forest model, trained on four structural descriptors obtained from the Protein Data Bank (PDB), classifies proteins into three major enzyme classes: Hydrolase, Oxidoreductase, and Transferase. The model achieved a test accuracy of 71.42%, and SHAP (SHapley Additive exPlanations) analysis identified molecular weight as the most discriminative feature. To support practical use and reproducibility, the model is deployed as an interactive Shiny application and released as an open-source R package, making it accessible to the community.
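The abstract does not state which SHAP implementation or which four descriptors were used. As a hedged illustration of what a SHAP attribution and a "primary feature" ranking mean, the Python sketch below computes exact Shapley values by enumerating every coalition of four features, replacing absent features with a single baseline value, and then ranks features by mean absolute attribution. The scoring function `toy_score`, the feature names, and all inputs are hypothetical stand-ins, not the paper's trained Random Forest or data; practical SHAP tools compute these values more efficiently and usually average over a background dataset instead of one baseline.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

# Hypothetical feature names; only molecular weight is named in the abstract.
FEATURES = ["molecular_weight", "feature_b", "feature_c", "feature_d"]


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for one input by enumerating all feature coalitions.

    Features outside a coalition take their baseline value (a single-reference
    simplification; SHAP libraries usually average over a background set).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                present = set(coalition)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def mean_abs_shap(model: Callable[[Sequence[float]], float],
                  rows: Sequence[Sequence[float]],
                  baseline: Sequence[float]) -> List[float]:
    """Global importance: mean |Shapley value| of each feature over many rows."""
    per_row = [shapley_values(model, r, baseline) for r in rows]
    return [sum(abs(p[k]) for p in per_row) / len(per_row) for k in range(len(baseline))]


def toy_score(v: Sequence[float]) -> float:
    """Hypothetical class score: two linear terms plus one interaction."""
    return 0.5 * v[0] + 0.1 * v[1] + 0.05 * v[2] * v[3]


if __name__ == "__main__":
    baseline = [0.0, 0.0, 0.0, 0.0]
    x = [2.0, 1.0, 1.0, 1.0]
    phi = shapley_values(toy_score, x, baseline)
    # Efficiency property: attributions sum to f(x) - f(baseline).
    print(f"sum = {sum(phi):.3f}, f(x) - f(baseline) = {toy_score(x) - toy_score(baseline):.3f}")
    rows = [x, [1.0, 2.0, 0.0, 1.0]]
    importance = mean_abs_shap(toy_score, rows, baseline)
    for name, value in sorted(zip(FEATURES, importance), key=lambda p: -p[1]):
        print(f"{name}: {value:.4f}")
```

In this toy example `molecular_weight` ranks first only because `toy_score` was written to weight it most heavily; the sketch mirrors the form of the paper's finding but does not reproduce it. Brute-force enumeration costs on the order of n·2^n model calls, which is affordable for four descriptors but is why tree-specific algorithms are preferred for real forests.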
KEYWORDS
R language; Database; Machine learning; Bioinformatics; Shiny Web; Protein prediction
CITE THIS PAPER
Donghan Li. Interpretable Machine Learning in Enzyme Classification: A SHAP-Guided Analysis of Global Structural Features. Journal of Artificial Intelligence Practice (2026). Vol. 9, No. 1, 108-121. DOI: http://dx.doi.org/10.23977/jaip.2026.090113.