Person Identification in the FHNW Social Robot Project

As part of an R&D project funded by Innosuisse, I worked on the development and integration of a humanoid social robot for the care sector. The aim of the project is to relieve nursing staff of non-nursing tasks while improving the quality of life of people in need of care through social interaction. Given the staff shortage and high workload in the care sector, the project addresses a problem of significant social and economic relevance.

A central technical focus of my work was image-based person identification for social robotics applications. For this purpose, I implemented a standalone software module based on ROS 2 and PyTorch, deployed on an NVIDIA Jetson platform. A limitation of my initial implementation was that the robot could identify people only by their faces, which made searching for people in real-life situations difficult. I therefore additionally fine-tuned a deep learning model with an angular margin loss to learn robust full-body representations for short-term re-identification, so that people can be recognized even when they are facing away or their faces are obscured. The goal was not classic face recognition under laboratory conditions, but reliable person re-identification in real care environments with changing perspectives, distances, and partial occlusions.

In addition, I used facial keypoint models to estimate head orientation. This allowed different angles and poses to be captured deliberately, yielding a diverse set of semantically distinct representations of each person. These representations compensated for variations in appearance that a single face encoding could not cover, leading to stable recognition over time.
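The angular margin idea can be sketched in a few lines. The following is a minimal NumPy illustration of an ArcFace-style margin, not the project's actual training code (function name and the values of `s` and `m` are illustrative): the margin `m` is added to the angle between an embedding and its ground-truth class center, forcing embeddings of the same person to cluster more tightly than a plain softmax would.

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """Compute ArcFace-style logits: s * cos(theta + m) on the target class.

    embeddings: (batch, dim) feature vectors
    weights:    (classes, dim) class-center vectors
    labels:     (batch,) ground-truth class indices
    """
    # L2-normalize embeddings and class centers so dot products are cosines
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = e @ w.T
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    # Add the angular margin m only on the ground-truth class of each sample
    margin = np.zeros_like(cos)
    margin[np.arange(len(labels)), labels] = m
    return s * np.cos(theta + margin)
```

During training, these logits are fed into a standard cross-entropy loss; because the target logit is penalized by the margin, the network must push same-identity embeddings closer together on the unit hypersphere to compensate.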

To improve the robot’s visual perception, I implemented YOLO-based object detection and integrated it into the existing ROS-based system architecture. The goal was to reliably detect relevant objects in indoor environments in real time, providing the robot system with semantic context for navigation, interaction, and task planning. The developed modules were evaluated both in Gazebo simulation and on the real robot system to ensure consistency between simulated and real perception.
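One recurring step in such a pipeline is turning raw detector output into semantic context for the rest of the system. The sketch below is a hypothetical post-processing stage (the `Detection` record, class list, and threshold are illustrative, not the project's actual ROS 2 message definitions): it keeps only confident detections of object classes relevant to indoor navigation before they are published downstream.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """Simplified detection record; fields mirror typical YOLO output."""
    label: str
    confidence: float
    bbox: tuple  # (x_min, y_min, x_max, y_max) in pixels

# Example set of classes relevant to an indoor care environment
RELEVANT_CLASSES = {"person", "chair", "bed", "door", "wheelchair"}

def filter_detections(detections, min_conf=0.5):
    """Keep only confident detections of navigation-relevant classes."""
    return [
        d for d in detections
        if d.label in RELEVANT_CLASSES and d.confidence >= min_conf
    ]
```

In a ROS 2 node, a stage like this would typically sit between the detector callback and the publisher, so that planners and interaction modules only receive detections they can act on.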

To manage the collected personal and identification data, I extended an existing PostgreSQL/PostGIS database into the central data store for the perception system. The focus was not purely on data storage, but on creating a comprehensible, extensible data model for visual identities: personal identities and their associated feature vectors were stored together with contextual metadata such as capture angles, poses, and timestamps. This structure formed the basis for systematic analysis, targeted retraining of the models, and a well-founded evaluation of recognition performance over time and under varying operating conditions.
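How such a gallery of stored identities is queried at runtime can be sketched as follows. This is an in-memory stand-in for the database table, with assumed field names (`person_id`, `embedding`, `yaw_deg`, `captured_at`) and an assumed similarity threshold: a query embedding is matched against all stored feature vectors by cosine similarity, and the best identity above the threshold is returned.

```python
import numpy as np
from dataclasses import dataclass
from datetime import datetime

@dataclass
class GalleryEntry:
    """Illustrative stand-in for one row of the identity table."""
    person_id: str
    embedding: np.ndarray   # L2-normalized feature vector
    yaw_deg: float          # head orientation at capture time (metadata)
    captured_at: datetime   # timestamp of the capture (metadata)

def identify(query, gallery, threshold=0.6):
    """Return the best-matching person_id, or None if no match clears the threshold."""
    q = query / np.linalg.norm(query)
    best_id, best_sim = None, threshold
    for entry in gallery:
        sim = float(q @ entry.embedding)  # cosine similarity (both unit vectors)
        if sim > best_sim:
            best_id, best_sim = entry.person_id, sim
    return best_id
```

Keeping pose and timestamp metadata alongside each vector is what enables the analyses described above, e.g. slicing recognition performance by capture angle or by how much time has passed since enrollment.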