Robust Webcam-Based Gaze Tracking Using Double-Eye Convolutional Networks with Kalman Filtering and Clustered Calibration
DOI: https://doi.org/10.61467/2007.1558.2026.v17i2.1304
Keywords: Gaze estimation, Eye tracking, Convolutional neural networks, Webcam-based systems, Human–Computer Interaction
Abstract
Webcam-based gaze tracking has emerged as a cost-effective alternative to infrared eye-tracking systems, but its robustness under real-world conditions remains limited, especially when monocular (single-eye) models are employed. This study presents a systematic comparison of single-eye (SET) and double-eye (DET) convolutional neural network architectures for appearance-based gaze estimation with standard webcams. A lightweight CNN processes normalized eye-region images to predict gaze coordinates on the screen and is evaluated through two complementary protocols: (i) a static 9-point calibration and (ii) a dynamic trajectory-following task (“blue-ball”) that traverses screen edges and corners. To enhance generalization, the dataset incorporates controlled data augmentation (pose variation, illumination changes, blur/noise, and partial occlusions) and employs a participant-stratified experimental design. Robustness and temporal consistency are further improved by a Kalman filter for trajectory smoothing and a DBSCAN-based clustering calibration stage that suppresses outliers and stabilizes gaze estimates. Performance is measured in pixel error, angular error, and trajectory stability. Under identical training and evaluation conditions, the DET model consistently outperforms the SET model, attaining lower spatial error and smoother trajectories, particularly under asymmetric lighting and partial occlusion. Using only a consumer-grade webcam and a lightweight CNN, the proposed method delivers real-time performance without infrared sensors or proprietary licenses, providing an accessible, reproducible, and robust solution for gaze estimation in human–computer interaction applications.
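The abstract describes, but does not reproduce, the network that maps two normalized eye patches to on-screen gaze coordinates. The following PyTorch sketch illustrates the double-eye (DET) idea under stated assumptions: a shared convolutional trunk applied to each eye, concatenation of the two feature vectors, and a small regression head. The patch size (36×60 grayscale), layer widths, and names such as DoubleEyeNet are illustrative assumptions, not the authors' architecture.

# Minimal DET sketch (assumptions: 36x60 grayscale patches, shared trunk).
import torch
import torch.nn as nn

class DoubleEyeNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared convolutional trunk applied to each eye patch.
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        feat = 32 * 9 * 15  # a 36x60 patch pooled twice yields 9x15 maps
        # Concatenating both eyes' features is what lets the regressor
        # exploit binocular redundancy under asymmetric lighting/occlusion.
        self.head = nn.Sequential(
            nn.Linear(2 * feat, 128), nn.ReLU(),
            nn.Linear(128, 2),  # (x, y) gaze point in screen pixels
        )

    def forward(self, left, right):
        feats = torch.cat([self.trunk(left), self.trunk(right)], dim=1)
        return self.head(feats)

model = DoubleEyeNet()
xy = model(torch.randn(1, 1, 36, 60), torch.randn(1, 1, 36, 60))
print(xy.shape)  # torch.Size([1, 2])

A SET baseline comparable to the paper's can be obtained by feeding the same trunk a single patch and halving the head's input width, which keeps the SET/DET comparison architecture-controlled.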
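The two stabilization stages named in the abstract, Kalman smoothing and DBSCAN-based calibration clustering, can be sketched as follows. This assumes a constant-velocity state model for the gaze point and scikit-learn's DBSCAN; the noise covariances, eps, and min_samples are placeholder values, not the paper's settings.

# (i) Constant-velocity Kalman filter over raw CNN gaze estimates.
# (ii) DBSCAN pruning of outlier samples per calibration target.
import numpy as np
from sklearn.cluster import DBSCAN

class GazeKalman:
    def __init__(self, dt=1/30, q=5.0, r=30.0):
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt   # state: (x, y, vx, vy)
        self.H = np.eye(2, 4)              # we observe (x, y) only
        self.Q = q * np.eye(4)             # process noise (assumed)
        self.R = r * np.eye(2)             # measurement noise (assumed)
        self.x = np.zeros(4)
        self.P = 1e3 * np.eye(4)

    def update(self, z):
        # Predict, then correct with the raw gaze estimate z = (x, y).
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(z) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]                  # smoothed gaze point

def calibration_center(samples, eps=40.0, min_samples=5):
    # Keep only the densest DBSCAN cluster of gaze samples collected while
    # the user fixates one calibration dot, then average it; label -1 marks
    # noise points that would otherwise bias the calibration mapping.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(samples)
    kept = labels[labels >= 0]
    if kept.size == 0:                     # every sample flagged as noise
        return samples.mean(axis=0)
    best = np.bincount(kept).argmax()
    return samples[labels == best].mean(axis=0)

Filtering each 9-point target this way before fitting the screen mapping is one plausible reading of the clustered-calibration stage; the paper's exact procedure may differ.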
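The abstract reports both pixel and angular error without giving the conversion between them; a standard geometric approximation, assuming a known pixel pitch and viewing distance (both placeholders below), relates the two.

import math

def angular_error_deg(pixel_error, pixel_pitch_m=0.000277, distance_m=0.60):
    # 2*atan(half the on-screen error over the viewing distance); the
    # pitch (24-inch 1080p panel) and 60 cm distance are assumptions.
    return math.degrees(2 * math.atan(pixel_error * pixel_pitch_m / (2 * distance_m)))

print(f"{angular_error_deg(50):.2f} deg")  # ~1.32 deg for a 50 px error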
License
Copyright (c) 2026 International Journal of Combinatorial Optimization Problems and Informatics

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.