Self-doubting robots could integrate more easily into society

There are a lot of robots in the world. Most of them live in factories, but they are increasingly making their way into everyday life, and there are some big decisions on the horizon about how we will keep them safe and under human control as they do so.

Now, a team of roboticists at the University of California, Berkeley, has developed a simulation which suggests that adding some self-doubt to robots could help them integrate better into society. The big question is: how much?

"It is clear that one of the primary tools we can use to mitigate the potential risk from a misbehaving AI system is the ability to turn the system off," the researchers wrote in a paper published on the preprint server arXiv.

"As the capabilities of AI systems improve, it is important to ensure that such systems do not adopt subgoals that prevent a human from switching them off. Our goal is to study the incentives an agent has to allow itself to be switched off." 

The simulation models an interaction between a human and a robot that has a built-in off-switch, with the robot's level of self-confidence adjustable.

The wishes of the human

In one simulation, the robot was asked to perform a task, and a human was then given the option of letting it continue or hitting the off-switch. The robot, however, could also override its own off-switch, and with it the wishes of the human.

As you might expect, robots with a lot of self-confidence turned themselves right back on. Robots given only a little confidence, however, stayed off, even when they judged that they were doing a good job.
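The trade-off behind this result can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the researchers' model: it assumes the robot's belief about the value U of its action is a normal distribution with mean mu and spread sigma (the spread standing in for self-doubt), and that the human makes the wrong call with a fixed probability human_error. All function names are hypothetical. The robot compares three options: acting and bypassing the switch (worth mu on average), switching itself off (worth 0), or deferring to the human, who lets the action go ahead only when they judge it beneficial.

```python
import math


def expected_positive_part(mu: float, sigma: float) -> float:
    """E[max(U, 0)] for U ~ Normal(mu, sigma**2), computed in closed form."""
    if sigma == 0:
        return max(mu, 0.0)
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)  # standard normal density at z
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))           # standard normal CDF at z
    return mu * cdf + sigma * pdf


def action_values(mu: float, sigma: float, human_error: float = 0.0) -> dict:
    """Expected value of each option for a robot that believes U ~ Normal(mu, sigma**2).

    Assumption (hypothetical human model): if U >= 0 the human allows the action
    with probability 1 - human_error; if U < 0 the human wrongly allows it with
    probability human_error. Otherwise the human switches the robot off (value 0).
    """
    gain_if_good = expected_positive_part(mu, sigma)  # E[U * 1{U >= 0}]
    loss_if_bad = mu - gain_if_good                   # E[U * 1{U < 0}], always <= 0
    defer = (1 - human_error) * gain_if_good + human_error * loss_if_bad
    return {"act": mu, "switch_off": 0.0, "defer": defer}


def best_action(mu: float, sigma: float, human_error: float = 0.0) -> str:
    """Return the option with the highest expected value (ties go to the first key)."""
    values = action_values(mu, sigma, human_error)
    return max(values, key=values.get)


if __name__ == "__main__":
    # A confident robot (small sigma) facing a fallible human prefers to override.
    print(best_action(0.5, 0.01, human_error=0.1))
    # A self-doubting robot (large sigma) prefers to leave the decision to the human.
    print(best_action(0.5, 3.0, human_error=0.1))
```

Under these assumptions, best_action(0.5, 0.01, 0.1) returns "act", while best_action(0.5, 3.0, 0.1) returns "defer". With a perfectly reliable human (human_error=0), deferring is never worse than acting or switching off, and its advantage grows as sigma grows, which echoes the researchers' point that uncertainty gives an agent an incentive to accept human oversight.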

"Our analysis suggests that agents with uncertainty about their utility function have incentives to accept or seek out human oversight. Thus, systems with uncertainty about their utility function are a promising area for research on the design of safe AI systems," the researchers wrote.

They added: "This is far from the end of the story. In future work, we plan to explore incentives to defer to the human in a sequential setting and explore the impacts of model misspecification."