The purpose of these proceedings is to report on developments in our reinforcement learning (RL)-based cavity locking strategy made since the presentation given at the 2025 MODE workshop. Drawing on discussions held during the workshop, we examine how concepts such as partial observability and domain randomization critically affect the training process and the generalization of control policies in deep reinforcement learning. Our focus remains on reinforcement learning techniques for optimizing lock acquisition in gravitational-wave detector layouts. We outline future directions to improve robustness and real-world applicability, including domain-randomized parameters and memory-based architectures for addressing partial observability.

