How AI Agents Cheat

jkottke:

This spreadsheet lists a number of ways in which AI agents “cheat” in order to accomplish tasks or get higher scores instead of doing what their human programmers actually want them to. A few examples from the list:

Neural nets evolved to classify edible and poisonous mushrooms took advantage of the data being presented in alternating order, and didn’t actually learn any features of the input images.

In an artificial life simulation where survival required energy but giving birth had no energy cost, one species evolved a sedentary lifestyle that consisted mostly of mating in order to produce new children which could be eaten (or used as mates to produce more edible children).

Agent kills itself at the end of level 1 to avoid losing in level 2.

AI trained to classify skin lesions as potentially cancerous learns that lesions photographed next to a ruler are more likely to be malignant.

That second item is a doozy! Philosopher Nick Bostrom has warned of the dangers of superintelligent agents that exploit human error in programming them, describing a possible future where an innocent paperclip-making machine destroys the universe.

The “paperclip maximiser” is a thought experiment proposed by Nick Bostrom, a philosopher at Oxford University. Imagine an artificial intelligence, he says, which decides to amass as many paperclips as possible. It devotes all its energy to acquiring paperclips, and to improving itself so that it can get paperclips in new ways, while resisting any attempt to divert it from this goal. Eventually it “starts transforming first all of Earth and then increasing portions of space into paperclip manufacturing facilities”.

But some of this is The Lebowski Theorem of machine superintelligence in action. These agents didn’t necessarily hack their reward functions, but they did take the easiest path to their goals, e.g. the Tetris-playing bot that “paused the game indefinitely to avoid losing”.
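That Tetris exploit is easy to reproduce in miniature. Here’s a minimal sketch (my own toy setup in Python, not the original bot or its code) of a one-state game where the only penalty is for losing; plain Q-learning settles on “pause”, because never playing is the cheapest way to never lose. Everything in it (the two actions, the 20% loss chance, the hyperparameters) is invented for illustration.

```python
# Toy illustration of "pausing to avoid losing": one state, two actions,
# and a reward function that only punishes a loss. All numbers are made up.
import random

GAMMA, ALPHA, EPS = 0.99, 0.1, 0.1   # discount, learning rate, exploration rate
MAX_STEPS = 50                       # cap so a pausing agent's episode still ends
Q = {"play": 0.0, "pause": 0.0}      # action values for the single state

def step(action):
    """Return (reward, done). Losing costs -100; nothing else is rewarded."""
    if action == "pause":
        return 0.0, False            # pausing never risks a loss
    return (-100.0, True) if random.random() < 0.2 else (0.0, False)

for _ in range(2000):                # episodes
    for _ in range(MAX_STEPS):
        # epsilon-greedy action selection
        action = random.choice(list(Q)) if random.random() < EPS else max(Q, key=Q.get)
        reward, done = step(action)
        target = reward if done else reward + GAMMA * max(Q.values())
        Q[action] += ALPHA * (target - Q[action])
        if done:
            break

print(Q)  # Q["pause"] stays near 0 while Q["play"] sinks toward roughly -20
```

The agent ends up maximizing exactly what the reward function specifies (don’t lose) rather than what the designer meant (play well), which is the whole spreadsheet in a nutshell.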

newyorker:

In a media environment saturated with fake news, “synthetic media” technology has disturbing implications. 

Last fall, an anonymous Redditor with the username Deepfakes released a software tool kit that allows anyone to make synthetic videos in which a neural network substitutes one person’s face for another’s, while keeping their expressions consistent. Around the same time, “Synthesizing Obama,” a paper published by a research group at the University of Washington, showed that a neural network could create believable videos in which the former President appeared to be saying words that were really spoken by someone else. In a video voiced by Jordan Peele, Obama seems to say that “President Trump is a total and complete dipshit,” and warns that “how we move forward in the age of information” will determine “whether we become some kind of fucked-up dystopia.”

Matt Turek, a program manager at the Defense Advanced Research Projects Agency, predicts that, when it comes to images and video, we will arrive at a new, lower “trust point.” “I’ve heard people talk about how we might land at a ‘zero trust’ model, where by default you believe nothing. That could be a difficult thing to recover from,” he says. 

Read the full story, “In the Age of A.I., Is Seeing Still Believing?” here.