‘Smart’ experiment unravels cooking behaviour

By GREGORY WICKY

Few environments reveal the mechanics of human movement as clearly as a kitchen. Pots, pans, countertops, ovens, and fridges form the stage of a new project led by Alexander Mathis, Assistant Professor at EPFL’s Brain Mind and Neuro-X Institutes.

In collaboration with colleagues from EPFL, ETH Zurich and the Microsoft Joint Swiss Research Center, the computational neuroscientist recently introduced the EPFL Smart Kitchen-30 dataset at NeurIPS in San Diego (US).

The Neural Information Processing Systems Foundation is a non-profit corporation whose purpose is to foster the exchange of research advances in artificial intelligence (AI) and machine learning (ML).

The dataset is a uniquely comprehensive, multi-angle recording of meal-preparation gestures. It lays the groundwork for better monitoring of how neuro-rehabilitation carries over into daily life and for developing more effective strategies for motor rehabilitation and assistance.

Wide applications

The project set out to follow, in a non-invasive way, how people perform everyday actions in situations that come as close as possible to real life. By modelling both the motor and cognitive components of these gestures, the researchers aim to better understand how movement, coordination and action planning are structured.

Potential applications are wide-ranging, from basic and translational neuroscience to machine learning, including the medical field.

Why the kitchen? “The first reason is a matter of privacy,” explains Alexander Mathis. “Of all the rooms in a home, the kitchen raises the fewest concerns.”

The second reason is more scientific. “While cooking, people perform an enormous variety of movements: walking, standing on tiptoe, opening cupboards, handling knives, pots, wrappers… We get to observe eye-hand coordination, planning (all ingredients need to be ready at the right time) and even expressions of people’s personal style. It truly mobilizes the entire body and brain,” Alexander adds.

The team built a fully instrumented kitchen at Campus Biotech for the project – “it’s been on the cooker for a while,” Alexander jokes. It relied on a unique motion-capture platform: nine fixed RGB-D cameras positioned around the room so that the participants’ hands were always visible from several angles; a HoloLens 2 headset recording from a first-person perspective while also tracking gaze; and inertial measurement units capturing body and hand movements.
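One way to picture the resulting recordings (a purely illustrative Python sketch; the stream names, sensor placements and file layout are assumptions, not the published data format) is as time-aligned samples drawn from each stream:

from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class KitchenSample:
    """One time-aligned slice across the (hypothetical) recording streams."""
    timestamp_s: float                    # common clock, in seconds
    rgbd_frames: Dict[str, str]           # camera id -> path to an RGB-D frame
    egocentric_frame: str                 # path to the HoloLens first-person frame
    gaze_xy: Tuple[float, float]          # gaze point in egocentric image coordinates
    imu_readings: Dict[str, List[float]]  # sensor id -> accelerometer/gyroscope values

sample = KitchenSample(
    timestamp_s=12.34,
    rgbd_frames={f"cam{i}": f"cam{i}/frame_001234.png" for i in range(9)},
    egocentric_frame="hololens/frame_001234.png",
    gaze_xy=(0.48, 0.52),
    imu_readings={"right_wrist": [0.1, -0.3, 9.8, 0.01, 0.00, 0.02]},
)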

“Some elements of the kitchen itself were also instrumented,” the researcher says. “We placed an accelerometer on the fridge door, which allowed us to measure how fast it opened and how smooth or hesitant the gesture was.”
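As a rough illustration of how such a signal can be turned into numbers (a hypothetical sketch, not the team’s actual processing pipeline), a door-opening event can be summarised by its peak speed and by the jerk of the acceleration trace, a common smoothness proxy:

import numpy as np

def door_opening_metrics(accel, fs=100.0):
    """Summarise a fridge-door opening from a 1-D accelerometer trace.

    accel : acceleration samples along the door's swing axis (m/s^2)
    fs    : sampling rate in Hz (100 Hz is an assumed value)
    """
    dt = 1.0 / fs
    # Integrate acceleration to get an approximate speed profile.
    velocity = np.cumsum(accel) * dt
    # Jerk (rate of change of acceleration) is a common smoothness proxy:
    # hesitant, stop-and-go openings produce larger jerk values.
    jerk = np.gradient(accel, dt)
    return {
        "peak_speed": float(np.max(np.abs(velocity))),
        "mean_abs_jerk": float(np.mean(np.abs(jerk))),
    }

# Example with a synthetic, smooth opening profile.
t = np.linspace(0, 1, 100)
smooth_opening = np.sin(np.pi * t)   # one gentle push
print(door_opening_metrics(smooth_opening))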

Distinct actions

Altogether, the dataset comprises almost 30 hours of recordings. All 16 participants – women and men aged 20 to 46 years – prepared four different recipes, each repeated several times. This enabled the team to observe how gestures evolve with practice.

On the menu: an omelette with a salad, ratatouille, and pad Thai. “Pad Thai was a good choice,” notes Alexander. “It was a new dish for some participants, especially the older ones, and required some getting used to.”

Each dish combined simple actions with strict timing constraints: monitoring a pan while preparing a sauce, anticipating the next step, adapting to the unexpected.

One of the project’s strengths lies in the precision of its annotations. Each session was analysed by human annotators, who continuously described the participant’s actions. Some 768 distinct types of actions were defined, ranging from very concrete gestures to more general categories. The result is more than 30 action segments per minute.
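To give a sense of what such annotations could look like in practice (an illustrative sketch; the field names and labels are assumptions rather than the dataset’s actual schema), each segment can be stored with a start time, an end time and an action label, from which the density of segments per minute follows directly:

from dataclasses import dataclass

@dataclass
class ActionSegment:
    start_s: float   # segment start, in seconds
    end_s: float     # segment end, in seconds
    label: str       # one of the defined action types, e.g. "open fridge"

# A few hypothetical segments from the start of a session.
segments = [
    ActionSegment(0.0, 1.8, "walk to counter"),
    ActionSegment(1.8, 3.1, "open fridge"),
    ActionSegment(3.1, 5.0, "take out eggs"),
    ActionSegment(5.0, 6.4, "close fridge"),
]

def segments_per_minute(segs):
    """Average number of annotated segments per minute of recording."""
    duration_min = (segs[-1].end_s - segs[0].start_s) / 60.0
    return len(segs) / duration_min

print(f"{segments_per_minute(segments):.1f} segments per minute")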

AI and ML

This data fuelled four major benchmarks designed to test the capabilities of AI models: vision-language understanding, multi-modal action recognition, pose-based segmentation, and text-to-motion generation. The latter involves linking verbal instructions to 3D motion trajectories.

Learning this connection between language and movement is essential if assistive systems or robots are to truly understand what they are being asked to perform.

Results show the challenge is far from being met: the best current AI models reach only about 40% accuracy.
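For the recognition-style benchmarks, that figure can be read roughly as a top-1 metric: the fraction of clips whose predicted action label matches the human annotation. A toy Python illustration (not the benchmark’s actual evaluation code):

def top1_accuracy(predictions, ground_truth):
    """Fraction of clips whose predicted action label matches the annotation."""
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)

# Toy example: 2 out of 5 clips classified correctly -> 40% accuracy.
preds = ["open fridge", "cut onion", "stir pan", "stir pan",  "wash hands"]
truth = ["open fridge", "crack egg", "stir pan", "cut onion", "pour oil"]
print(f"top-1 accuracy: {top1_accuracy(preds, truth):.0%}")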

Beyond health-related recovery issues, Alexander is also interested in what differentiates ordinary gestures from those of experts. “How do you cook like a chef? How do you play the guitar like an exceptional musician? Between patients undergoing recovery and experts, there is a vast continuum of motor control that we would like to describe.” A second study, already in preparation, will involve a larger number of participants and will place a particular focus on expertise.
