Pretrained Generative Transformer (PGT) for the estimation of the response variable using causal relationships with fast convergence

Authors

Baeza-Serrato, R.

DOI:

https://doi.org/10.61467/2007.1558.2026.v17i2.1271

Keywords:

GPT, estimation, encoder, decoder, multilayer neural network, dependencies

Abstract

The transformer architecture has been used very successfully for translating text between languages. It has also been adapted to vision systems and, more recently, to estimating remaining useful life across various industrial fields. In a conventional artificial neural network, the training patterns carry no modeled relationships among the input variables; they are simply used to estimate the response variable with a training algorithm such as backpropagation or one of its variants. In the present research, a transformer structure for accurate estimation is proposed that develops causal relationships between the input variables and their respective training patterns and identifies the importance and relevance of the causal input-output relationships. The complete encoder-decoder structure is used, and the input training patterns are fed into both the encoder and the decoder. The input data is therefore processed by three self-attention blocks, one in the encoder and two in the decoder, allowing the model to capture dependencies between patterns and strengthening the base of training patterns. The system's output is the required estimate. One of the main contributions of this research is the addition of two linear transformations at the end of the structure to identify the critical and relevant input variables and training patterns. A transformer structure that exploits variable dependencies converges faster and estimates more accurately than a multilayer neural network trained on patterns that are unrelated to each other. Multiple simulations were carried out, and the proposed structure reached 100% estimation accuracy in only two epochs using the Levenberg-Marquardt algorithm, unlike the multilayer network. The architecture presented can be applied in any production sector to produce estimates and identify the most significant variables.
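
A minimal sketch of the kind of encoder-decoder estimator the abstract describes may help fix the idea: the same training patterns are fed into both the encoder and the decoder, and two linear transformations at the end map the attended representation to the estimate. The sketch below uses PyTorch; the layer sizes, head count, class and parameter names, and the ReLU between the final linear layers are illustrative assumptions, not the configuration reported in the paper.

import torch
import torch.nn as nn

class CausalPatternEstimator(nn.Module):
    # Hypothetical sketch: encoder-decoder attention over training patterns,
    # followed by two linear transformations producing the estimate.
    def __init__(self, n_features: int, d_model: int = 32, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)              # project each pattern into model space
        self.encoder = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)    # one self-attention block
        self.decoder = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)    # self-attention plus cross-attention
        self.relevance = nn.Linear(d_model, d_model)             # first added linear transformation
        self.estimate = nn.Linear(d_model, 1)                    # second: scalar estimate per pattern

    def forward(self, patterns: torch.Tensor) -> torch.Tensor:
        # patterns: (batch, n_patterns, n_features); the same tensor enters encoder and decoder
        x = self.embed(patterns)
        memory = self.encoder(x)
        attended = self.decoder(x, memory)
        return self.estimate(torch.relu(self.relevance(attended))).squeeze(-1)

# Usage: 8 training patterns with 5 input variables each -> one estimate per pattern
model = CausalPatternEstimator(n_features=5)
estimates = model(torch.randn(1, 8, 5))   # shape (1, 8)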

 



Published

2026-02-16

How to Cite

Baeza-Serrato, R. (2026). Pretrained Generative Transformer (PGT) for the estimation of the response variable using causal relationships with fast convergence. International Journal of Combinatorial Optimization Problems and Informatics, 17(2), 256–274. https://doi.org/10.61467/2007.1558.2026.v17i2.1271

Section

Articles