Comparative Study of Recognition for Student Attention Analysis: YOLO-Based Face Detection in Classroom Environments
H. Ogawa*, A. Sang-ngenchai and M. Nakazawa
*: corresponding author
Published on: October 20, 2025
Abstract
This study compares three YOLO (You Only Look Once) face detection models, YOLOv5, YOLOv7, and YOLOv11, for real-time recognition and analysis of student attention in classroom environments. The experiment was conducted in a Japanese middle school classroom using two strategically positioned high-resolution cameras that provided complementary perspectives. Despite this placement, challenges persisted in detecting and accurately analyzing students seated farther from the cameras, whose images suffered from reduced visibility, diminished clarity, and partial occlusion. To improve detection speed and accuracy under these constraints, we used a custom dataset of labeled images of students from the participating school, annotated around the head and upper body regions. Evaluation criteria included detection accuracy, computational speed, robustness under varying environmental conditions (e.g., fluctuations in lighting and occlusion scenarios), and reliability for partially obscured or distant faces. In our evaluations, YOLOv11 offered the best balance between accuracy and efficiency, achieving high detection performance (mean Average Precision: mAP@0.5 = 0.977, mAP@0.5:0.95 = 0.91) at a real-time-capable inference speed of 75 fps (frames per second). We also found YOLOv11 to perform well for multimodal detection using CiRA Core. Furthermore, this paper explores the integration of face detection systems within classroom management frameworks, notably Positive Behavior Support (PBS), while critically addressing the ethical considerations and privacy implications of their deployment.
Future research will expand on these findings by integrating body position detection methodologies, thus achieving a more comprehensive analysis of classroom attention dynamics and advancing overall system efficacy and reliability.
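For readers unfamiliar with the two mAP figures quoted above, the following minimal Python sketch illustrates the difference between them. It assumes the common convention in which mAP@0.5 counts a detection as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5, while mAP@0.5:0.95 averages over the IoU thresholds 0.50, 0.55, ..., 0.95. The function names and the example boxes are hypothetical and are not taken from the paper's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Assumed convention: ten IoU thresholds from 0.50 to 0.95 in steps of 0.05.
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred_box, truth_box):
    """Return the IoU thresholds at which pred_box counts as a true positive."""
    overlap = iou(pred_box, truth_box)
    return [t for t in THRESHOLDS if overlap >= t]


# Hypothetical head boxes: the prediction covers only the upper half of the
# ground-truth box, so it passes the 0.5 threshold but none of the stricter ones.
truth = (0, 0, 10, 10)
pred = (0, 0, 10, 5)
print(iou(pred, truth))
print(matched_thresholds(pred, truth))
```

Under this convention, a loosely localized box can still contribute to mAP@0.5 while being penalized under mAP@0.5:0.95, which is why the second figure is typically the lower of the two.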
DOI: https://doi.org/10.22323/1.488.0019