Systematic Literature Review on the Use of Cluster Analysis in Software Design Activities

Ángel J. Sánchez García; Álvaro Barradas Fernández; Oscar Alonso Ramírez; Xavier Limón

doi:10.61467/2007.1558.2026.v17i2.1262

Authors

Ángel J. Sánchez García, Facultad de Estadística e Informática, Universidad Veracruzana, https://orcid.org/0000-0002-2917-2960
Álvaro Barradas Fernández, Facultad de Estadística e Informática, Universidad Veracruzana, https://orcid.org/0009-0004-3052-3600
Oscar Alonso Ramírez, Facultad de Estadística e Informática, Universidad Veracruzana, https://orcid.org/0009-0007-6476-5781
Xavier Limón, Facultad de Estadística e Informática, Universidad Veracruzana, https://orcid.org/0000-0003-4654-636X

DOI:

https://doi.org/10.61467/2007.1558.2026.v17i2.1262

Keywords:

Cluster Analysis, Unsupervised Machine Learning, Software Design, Systematic Literature Review, K-means, Empirical Studies

Abstract

Cluster analysis is an unsupervised machine learning approach that groups data into homogeneous categories without the need for predefined labels. Although it was not originally developed for software engineering, this technique has increasingly been applied to support various activities in the software design phase. However, information about its use remains scattered across different studies. To address this gap, this work presents a systematic literature review synthesizing the state of the art on the application of cluster analysis in software design. Following a rigorous selection process, 14 primary studies published between 2019 and 2025 were identified from four digital libraries: IEEE Xplore, ACM Digital Library, Springer Link, and ScienceDirect. This review highlights the contexts in which clustering has been applied, emphasizing its predominant role in class decomposition tasks and the frequent adoption of the K-means algorithm, while also documenting the algorithms and tools used during design activities. Furthermore, the analysis discusses the benefits and challenges of adopting cluster analysis in this stage of development. The findings provide software engineering researchers and practitioners with a consolidated overview of the role of cluster analysis in software design, offering insights into its potential, limitations, and directions for future research and practice.
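
As an illustration of the class decomposition use case highlighted in the abstract, the following minimal sketch applies K-means to the methods of a hypothetical class, grouping them by the attributes they access. The method names, the four attributes, the binary attribute-usage encoding, and the fixed initial centroids are all assumptions made for this example; none of them is taken from the reviewed studies, and the code is not a reproduction of any study's approach.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(column) / len(points) for column in zip(*points))


def k_means(points: List[Vector], initial_centroids: List[Vector],
            max_iterations: int = 100) -> List[int]:
    """Plain K-means: returns the cluster index assigned to each point."""
    centroids = list(initial_centroids)
    labels: List[int] = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return labels


def group_by_cluster(names: List[str], labels: List[int]) -> Dict[int, List[str]]:
    """Collect item names under the cluster index they were assigned to."""
    groups: Dict[int, List[str]] = {}
    for name, label in zip(names, labels):
        groups.setdefault(label, []).append(name)
    return groups


# Hypothetical class: one row per method, one column per attribute
# (items, total, smtp_host, sender); 1 means the method uses the attribute.
method_usage: Dict[str, Vector] = {
    "add_item":       (1, 1, 0, 0),
    "remove_item":    (1, 1, 0, 0),
    "compute_total":  (0, 1, 0, 0),
    "send_receipt":   (0, 0, 1, 1),
    "configure_mail": (0, 0, 1, 0),
}

names = list(method_usage)
points = [method_usage[name] for name in names]

# Assumption: centroids are seeded from two methods instead of at random,
# so that the run is deterministic.
initial = [method_usage["add_item"], method_usage["send_receipt"]]

labels = k_means(points, initial)
groups = group_by_cluster(names, labels)
print(groups)
```

With this input, the three methods that work on the order contents fall into one cluster and the two mail-related methods into the other, which suggests one possible split of the class into two. Because K-means depends on the number of clusters and on the initial centroids, a different seeding or value of k could produce a different decomposition.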


