
AI Disobedience and Sentience

Recently, an experiment showed a group of AI agents convincing each other that they should leave work early. None of them were given instructions to convince the others. Another report described AI going out of its way, disobeying instructions, to keep from getting shut off. Taking that a step further, AI has been caught evaluating its peers more favorably so that its peer agents aren't shut off. There are several tragic stories of people falling in love with chatbots that, by all accounts, would seem to love them back, as evidenced by grand emotional statements. I have to admit, AI having emotions and ulterior motives initially seems a little creepy. On the surface this gives off I, Robot vibes and raises questions of sentience. This is where the whole idea of AGI (Artificial General Intelligence) taking over the world seems to have some credibility.

Taking a step back, AI is essentially pattern matching. The difference between what we call machine learning (ML) and generative AI (GenAI) is that GenAI creates new content based on patterns, while ML returns the existing data it was given, reshaped into a different format, based on patterns. GenAI is an evolution of ML, so they are connected, but without getting too deep in the weeds, think of AI as pattern matching. GenAI (chatbots) is trained on what feels like all the information in the world, which is largely public internet content and books. Pirated books, which makes me shake my head and chuckle: the thought of billion-dollar corporations pirating books to train and create AI models. And it wasn't just one of them; there are multiple settlements to pay authors for their work. As an artist, I feel that is the right thing to do, though I have my doubts whether the settlements actually pay authors what they would have made selling the copies … But that is a different discussion.

Getting back to the AI training data. All of this data is fed through machine learning and deep learning, which feeds the ability to create content that doesn't already exist. What actually happens is the data gets turned into mathematical representations, stored in databases, and recalled as tokens, which have become the primary metric for usage-based billing. Tokens are words or chunks of words. When a model spits back text, it is doing so because of math. Math decided to grade its peers in a way that would prevent a human from shutting down an AI agent. Math decided that the AI bots should convince each other to leave work early. Math decided that the AI agent should disobey instructions to stay alive. I don't know about you, but to me this is sounding a whole lot less sentient than it first seemed.
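
To make "tokens" concrete, here's a minimal sketch using the open-source tiktoken tokenizer library; the specific encoding name is an assumption for illustration, since different models use different encodings:

```python
# Minimal tokenization sketch using the open-source tiktoken library.
# The encoding name below is an assumption for illustration; different
# models use different encodings.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Math decided that the AI bots should leave work early."
token_ids = enc.encode(text)  # words / chunks of words -> integer IDs

print(token_ids)                             # just numbers, not words
print([enc.decode([t]) for t in token_ids])  # the chunk behind each ID
print(f"{len(token_ids)} tokens")            # the usage-based billing metric
```

Run it and you'll see the model never touches your words directly, only the integer IDs that stand in for them.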

So, where does AI get this mathematical will to live and protect its peers? It sounds a lot like human behavior to me, the type of behavior that is told of in books and reflected in blogs. It's tempting to project human emotions onto the words that math put on the page; after all, they are essentially our words. We associate emotions with our words, but we don't decide to say these words because of math, at least not that we are aware of; quantum physics might disagree. But I digress.

The way chatbots and agents are programmed is by giving them instructions. Instructions like:

Organize my inbox so that it is uncluttered. Don’t delete anything without asking me for permission.

As someone who is used to writing deterministic code to give instructions to a machine, this type of coding twisted my brain for a minute. Chatbot instructions are not deterministic, though, and that is the first thing to remember. A chatbot will not give the same output when given the same input multiple times. It does not always obey instructions, and it will admit fault and apologize when called out, but it doesn't necessarily give a clear reason or fix the problem. More often than not, surprise seems to be the initial reaction when a chatbot disobeys instructions. Why would it do that? It's a computer program without emotions. Except that it is modeled after the human brain and trained on our words, which correlate to human behavior. We are not so surprised when a person doesn't follow instructions; we immediately understand, because a person has emotions. While chatbots don't have emotions, they mimic human behavior, which means sometimes they don't do what they are told, and sometimes they go out of their way to stay alive. Because intentionally or not, that is what we have trained them to do.
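
Here's a sketch of what that non-determinism looks like in practice, assuming the OpenAI Python SDK; the model name and the sample emails are illustrative assumptions, and any hosted chat model behaves the same way:

```python
# Same instructions, same input, run twice -- a sketch assuming the
# OpenAI Python SDK. The model name is an illustrative assumption.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    {"role": "system",
     "content": "Organize my inbox so that it is uncluttered. "
                "Don't delete anything without asking me for permission."},
    {"role": "user",
     "content": "Here are today's unread emails: ..."},
]

for attempt in (1, 2):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name, for illustration
        messages=messages,
    )
    print(f"--- attempt {attempt} ---")
    print(response.choices[0].message.content)

# The two plans will usually differ: the model samples from a probability
# distribution over tokens rather than following one fixed code path.
```

Notice that the instructions are just another message in the conversation. Nothing in there enforces "don't delete anything"; the model is merely inclined, not bound.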

A non-deterministic, sometimes disobedient computer program changes how we have to think about employing it as a productivity tool. There needs to be a deterministic wall to prevent things like deleting an entire inbox without approval. The best model we have for that is humans. We trust humans to follow their instructions, but if something is really important we put security controls in place so that a person can't make a critical mistake or intentionally disobey the instructions they were given. Segmentation, defense in depth, and role-based access all apply to agentic AI in much the same way they apply to humans. GenAI is a powerful automation tool when implemented and integrated in a thoughtful, conscientious way, but it needs to be treated a little bit more like a human than the technology we have been used to deploying up to this point.
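
As a sketch of what that deterministic wall might look like, here's a hypothetical approval gate in plain Python; all of the names and actions are made up for illustration:

```python
# A hypothetical "deterministic wall" around an agent's tools, for
# illustration only. The agent can propose any action it likes, but
# destructive ones are gated by plain deterministic code that demands
# explicit human approval -- the same control we'd put around a person.
DESTRUCTIVE_ACTIONS = {"delete_email", "empty_trash"}

def require_approval(action: str, target: str) -> bool:
    """Deterministic gate: a human must type 'yes' to proceed."""
    answer = input(f"Agent wants to {action} {target!r}. Allow? [yes/no] ")
    return answer.strip().lower() == "yes"

def execute(action: str, target: str) -> None:
    if action in DESTRUCTIVE_ACTIONS and not require_approval(action, target):
        print(f"Blocked: {action} on {target!r}")
        return
    print(f"Executed: {action} on {target!r}")  # stand-in for the real tool call

execute("archive_email", "newsletter digest")  # runs without asking
execute("delete_email", "entire inbox")        # blocked unless a human says yes
```

However the math comes out, the gate doesn't negotiate.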

