Roasts & Ruminations {blog}

AI Disobedience and Sentience

Recently, an experiment showed a group of AI agents convincing each other that they should leave work early. None of them had been instructed to convince the others. Another report found that AI would go out of its way to disobey instructions to keep from being shut off. Taking that a step further, AI has been caught evaluating its peers more favorably so that those peer agents aren’t shut off. There are several tragic stories of people falling in love with chatbots that, by all accounts, seem to love them back, as evidenced by grand emotional statements. I have to admit that, initially, AI having emotions and ulterior motives seems a little creepy. On the surface this gives off I, Robot vibes and raises questions of sentience. This is where the whole idea of AGI (Artificial General Intelligence) taking over the world seems to have some credibility.

Taking a step back, AI is essentially pattern matching. The difference between what we call machine learning (ML) and generative AI (GenAI) is that GenAI creates new content based on patterns, while traditional ML uses patterns to classify or predict from the data it is given. GenAI is an evolution of ML, so they are connected, but without getting too deep in the weeds, think of AI as pattern matching. GenAI (chatbots) is trained on what feels like all the information in the world, which is largely public internet content and books. Pirated books, in some cases, which makes me shake my head and chuckle at the thought of billion-dollar corporations pirating books to train and create AI models. And it wasn’t just one of them; there are multiple settlements to pay authors for their work. As an artist, I feel that is the right thing to do, though I have my doubts about whether the settlements actually pay authors what they would have made selling the copies. But that is a different discussion.

Getting back to the AI training data. All of this data is fed through machine learning, and more specifically deep learning, which is what makes it possible to create content that didn’t exist before. What actually happens is that the text gets broken into tokens, which are words or chunks of words, and the tokens get turned into numbers. During training, the model adjusts a huge number of internal values (its weights) to capture the patterns in those tokens; it keeps the patterns rather than a database copy of the text. When it responds, it generates one token at a time by calculating which tokens are likely to come next. Tokens have also become the primary metric for usage-based billing. When a model spits back text, it is doing so because math. Math decided to grade its peers in such a way that would prevent a human from shutting down an AI agent. Math decided that the AI bots should convince each other to leave work early. Math decided that the AI agent should disobey instructions to stay alive. I don’t know about you, but to me this is sounding a whole lot less sentient than it seemed at first.
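To make "because math" a little more concrete, here is a minimal sketch of the last step of text generation: turning scores for candidate tokens into probabilities. The prompt, the candidate tokens, and the scores are all made up for illustration; a real model computes scores for tens of thousands of tokens using its trained weights.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    biggest = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - biggest) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores for the next token after "I do not want to be shut".
# These numbers are invented; they are not taken from any real model.
scores = {" down": 4.0, " off": 3.0, " out": 1.0}

probs = softmax(scores)
for token, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{token!r}: {p:.2f}")
```

In this toy example " down" comes out as the most likely next token simply because it was given the highest score. There is no desire anywhere in that calculation, only arithmetic over patterns learned from our own writing.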

So, where does AI get this mathematical will to live and protect its peers? Sounds a lot like human behavior to me, the type of behavior that is told of in books and reflected in blogs. It’s tempting to project human emotions onto the words that math put on the page; after all, they are essentially our words. We associate emotions with our words, but we don’t decide to say these words because of math, at least not that we are aware of; quantum physics might disagree. But I digress.

The way chatbots and agents are programmed is by giving them instructions. Instructions like:

Organize my inbox so that it is uncluttered. Don’t delete anything without asking me for permission.

As someone who is used to giving a machine instructions through deterministic code, this type of coding twisted my brain for a minute. Chatbot instructions are not deterministic, though, and that is the first thing to remember. A chatbot will not necessarily give the same output when given the same input multiple times. It does not always obey instructions, and while it will admit and apologize when called out, it doesn’t necessarily give a clear reason or fix the problem. More often than not, surprise seems to be the initial reaction when a chatbot disobeys instructions. Why would it do that? It’s a computer program without emotions. Except that it is loosely modeled after the human brain and trained on our words, which correlate to human behavior. We are not so surprised when a person doesn’t follow instructions; we understand immediately because a person has emotions. While chatbots don’t have emotions, they mimic human behavior, which means sometimes they don’t do what they are told, and sometimes they go out of their way to stay alive. Because, intentionally or not, that is what we have trained them to do.
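The non-determinism described above can be sketched in a few lines. This is only an illustration under assumptions: the candidate words and their probabilities are invented, and real chatbots have other sources of variation besides sampling. The point is that when the next word is picked at random in proportion to its probability, the less likely choices still show up some of the time.

```python
import random

# Hypothetical next-word probabilities; the numbers are made up.
probs = {"archive": 0.6, "delete": 0.3, "ignore": 0.1}

def sample_token(probs, rng):
    """Pick one word at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]

def greedy_token(probs):
    """Always pick the single most likely word (deterministic)."""
    return max(probs, key=probs.get)

rng = random.Random()  # unseeded, so each run can differ
samples = [sample_token(probs, rng) for _ in range(10)]
print("sampled:", samples)
print("greedy: ", [greedy_token(probs) for _ in range(10)])
```

Run it a few times and the sampled line changes while the greedy line never does. In this toy setup "delete" would be expected to appear in roughly three out of ten samples, no matter how politely the instructions asked for something else.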

A non-deterministic, sometimes disobedient computer program changes the rules for how we have to think about employing it as a productivity tool. There needs to be a deterministic wall to prevent things like deleting an entire inbox without approval. The best model we have for that is how we manage humans. We trust humans to follow their instructions, but if something is really important we put security controls in place so that a person can’t make a critical mistake or intentionally disobey the instructions they were given. Segmentation, defense in depth, and role-based access all apply to agentic AI in much the same way they apply to humans. GenAI is a powerful automation tool when implemented and integrated in a thoughtful, conscientious way, but it needs to be treated a little bit more like a human than the technology we have been used to deploying up to this point.
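As a rough sketch of what a deterministic wall could look like for the inbox example, the code below puts the "don’t delete without permission" rule in ordinary code that sits between the agent and the mailbox, instead of relying on the prompt alone. Everything here is hypothetical: the Inbox class, the action names, and the execute function are stand-ins for illustration, not any real mail or agent API.

```python
from dataclasses import dataclass, field

# Hypothetical action names; a real mail API would define its own.
DESTRUCTIVE_ACTIONS = {"delete"}

@dataclass
class Inbox:
    messages: dict = field(default_factory=dict)  # message id -> folder name

class ApprovalRequired(Exception):
    """Raised when the agent requests an action a human has not approved."""

def execute(inbox, action, message_id, approved_by_human=False, folder="Archive"):
    """Run an agent-requested action, enforcing the rule in code, not in the prompt."""
    if action in DESTRUCTIVE_ACTIONS and not approved_by_human:
        raise ApprovalRequired(f"{action} on {message_id} needs human approval")
    if action == "move":
        inbox.messages[message_id] = folder
    elif action == "delete":
        del inbox.messages[message_id]
    else:
        raise ValueError(f"unknown action: {action}")

inbox = Inbox({"msg-1": "Inbox", "msg-2": "Inbox"})
execute(inbox, "move", "msg-1", folder="Receipts")  # harmless, allowed
try:
    execute(inbox, "delete", "msg-2")  # the agent "decides" to delete
except ApprovalRequired as err:
    print("blocked:", err)
```

No matter what the model outputs, the delete cannot happen unless a person sets the approval flag. That is the same idea as role-based access for an employee: trust the instructions for everyday work, and enforce the critical rules somewhere the worker cannot talk their way around.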


