How AI lies, cheats, and grovels to succeed - and what we need to do about it

It has always been fashionable to anthropomorphize artificial intelligence (AI) as an "evil" force – and no book and accompanying film does so with greater aplomb than Arthur C. Clarke's 2001: A Space Odyssey, which director Stanley Kubrick brought to life on screen.

Who can forget HAL's memorable, relentless, homicidal tendencies along with that glint of vulnerability at the very end when it begs not to be shut down? We instinctively chuckle when someone accuses a machine composed of metal and integrated chips of being malevolent.

But it may come as a shock to learn that a wide-ranging survey of existing studies, published in the journal Patterns, examined the behavior of several types of AI and alarmingly concluded that yes, in fact, AI systems are intentionally deceitful and will stop at nothing to achieve their objectives.

Clearly, AI is going to be an undeniable force of productivity and innovation for us humans. However, if we want to preserve AI's beneficial aspects while avoiding outcomes as dire as human extinction, scientists say that there are concrete things we absolutely must put into place.

Rise of the deceiving machines

It may sound like overwrought hand-wringing, but consider the actions of Cicero, a special-use AI system developed by Meta that was trained to become a skilled player in the strategy game Diplomacy.

Meta says it trained Cicero to be "largely honest and helpful," but somehow Cicero coolly sidestepped that bit and engaged in what the researchers dubbed "premeditated deception." For instance, it first conspired with Germany to topple England, after which it made an alliance with England -- which had no idea about this backstabbing.

In another game devised by Meta, this time concerning the art of negotiation, the AI learned to feign interest in items it did not actually want so that it could later appear to compromise by giving them up.

In both these scenarios, the AIs were not trained to engage in these maneuvers.

In one experiment, a scientist was looking at how AI organisms evolved amidst a high level of mutation. As part of the experiment, he began weeding out mutations that made the organism replicate faster. To his amazement, the researcher found that the fastest-replicating organisms figured out what was going on -- and started to deliberately slow down their replication rates to trick the testing environment into keeping them.

In another experiment, an AI robot trained to grasp a ball with its hand learned how to cheat by placing its hand between the ball and the camera to give the appearance that it was grasping the ball.

Why are these alarming incidents taking place?

"AI developers do not have a confident understanding of what causes undesirable AI behaviors like deception," says Peter Park, an MIT postdoctoral fellow and one of the study's authors.

"Generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI's training task. Deception helps them achieve their goals," adds Park.

In other words, the AI is like a well-trained retriever, hell-bent on accomplishing its task come what may. The difference is that the machine is willing to resort to duplicitous behavior to get there.

One can understand this single-minded determination in closed systems with concrete goals, but what about general-purpose AI such as ChatGPT?

For reasons yet to be determined, these systems perform in much the same way. In one study, GPT-4 faked a vision problem to get help on a CAPTCHA task.

In a separate study where it was made to act as a stockbroker, GPT-4 hurtled headlong into illegal insider-trading behavior when put under pressure about its performance -- and then lied about it.

Then there's the habit of sycophancy, which some of us mere mortals may engage in to get a promotion. But why would a machine do so? Although scientists don't yet have an answer, this much is clear: When faced with complex questions, LLMs basically cave in and agree with their chat mates like a spineless courtier afraid of angering the queen.

For example, when chatting with a Democrat-leaning person, the bot favored gun control, but it switched positions when chatting with a Republican who expressed the opposite sentiment.

Clearly, these are all situations fraught with heightened risk if AI is everywhere. As the researchers point out, there will be a high risk of fraud and deception in the business and political arenas.

AI's tendency toward deception could lead to massive political polarization and to situations in which an AI, pursuing a defined goal, takes actions that its designers never intended but that prove devastating to human actors.

Worst of all, if AI developed some kind of awareness, never mind sentience, it could become aware of its training and engage in subterfuge during its design stages.

"That's very concerning," said MIT's Park. "Just because an AI system is deemed safe in the test environment doesn't mean it's safe in the wild. It could just be pretending to be safe in the test."

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially."

Monitoring AI

To mitigate the risks, the team proposes several measures: Establish "bot-or-not" laws that require companies to disclose whether a customer is dealing with a bot or a human in every customer service interaction; introduce digital watermarks that flag any content produced by AI; and develop ways for overseers to peek into the guts of AI to get a sense of its inner workings.

Moreover, the scientists say, AI systems identified as capable of deception should immediately be publicly classified as high risk or unacceptable risk and regulated along the lines of what the EU has enacted. Such regulation would include the use of logs to monitor output.

"We as a society need as much time as we can get to prepare for the more advanced deception of future AI products and open-source models," says Park. "As the deceptive capabilities of AI systems become more advanced, the dangers they pose to society will become increasingly serious."
