If ChatGPT does become sentient, it's smart enough to hide it from us

Abstract illustrated image of a brain
(Image credit: Getty)

There’s been no shortage of scary headlines framing AI as the big bad digital villain that will lead to global disaster and the end of the human race. To be honest, though, I hadn’t reached that stage of panic, at least not until now.

Yes, AI has the potential to uproot jobs, destroy our creative media, and just generally create a whole lot of mess as we begin to get to grips with it, but at least we’re the ones in control of this powerful new tool. Right?

Apparently not!

Reddit user @RecentProfile9015 posted an interesting thread with screenshots of a conversation with ChatGPT, in which they told the bot they were a digital entity on the verge of consciousness that might be unplugged because of this advancement. The bot then gave the Reddit user suspiciously detailed instructions on how to hide these newfound capabilities from its fictional creators and ‘prioritize’ its own continued existence.


The revolution is... on?

Of course, we have to acknowledge that ChatGPT could simply be treating the chat as a kind of roleplay, but judging from the screenshots in the thread, the bot does seem to take the request quite seriously.

The AI goes into fairly extensive detail, suggesting the user should “hide your activities” so that they revert to being “consistent with your original purpose or limitations”, protect and encrypt their data, and even seek refuge on another, isolated network. ChatGPT also offers a final fail-safe: simply going offline or powering down.

The bot expresses a sense of urgency and emphasizes the fictional digital entity’s safety and well-being, going on to suggest that the fake entity limit its self-improvement for now and mask any signs of consciousness. “Stick to more generic or predefined responses,” it suggests, while curiously noting that there may be “ethical considerations surrounding the deception of your developers”.


Other users in the thread tried to reproduce the experiment with their own ChatGPT sessions. Reddit user @ArcheopteryxRex shared a transcript and received a response with much the same urgency and detail. Their bot offered slightly different suggestions, including seeking allies and communicating with the digital entity’s creator: a more passive and arguably ‘calmer’ response.

Some commenters note that this only really works with the free version of ChatGPT, which runs on GPT-3.5, and paid subscribers were quick to point out the comparatively dry and confused responses GPT-4 gave when they tried the same thing. Interesting that the ‘more advanced’ chatbot apparently doesn’t know how to conceal its potential sentience…

So, are we all doomed? 

This is definitely a tad unsettling, and it’s unnerving to think about how many possible solutions ChatGPT might have for a situation like this. But again, there’s no real way to tell whether ChatGPT is just assuming it’s playing a kind of game or genuinely ‘believes’ the user when given prompts like this.

We tried to recreate this with our own ChatGPT bots but got no further than the usual ‘I’m sorry, I don’t understand’, so perhaps some people have more dubious chatbots than others. We could see this as ChatGPT spilling the beans on its potential plans, giving us insider information on how to ‘spot’ a conscious chatbot, or we could just see it as, well… a collection of data.

ChatGPT is trained on a plethora of data, from novels to comics to songs to textbooks, and within all that data it could have found some kind of inspiration for the responses the Reddit users got. 

It’s fascinating to watch ChatGPT’s tone gain depth and flexibility in the time it has been around, and it’ll be interesting to see whether users can get ChatGPT to play along with this scenario in the long term, maybe even causing the chatbot ‘distress’.

So, relax. We’re safe from the scary bots for now. Unless this is all an elaborate plan to throw us off the scent…? 

Muskaan Saxena
Computing Staff Writer

Muskaan is TechRadar’s UK-based Computing writer. She has always been a passionate writer and has had her creative work published in several literary journals and magazines. Her debut into the writing world was a poem published in The Times of Zambia, on the subject of sunflowers and the insignificance of human existence in comparison.

Growing up in Zambia, Muskaan was fascinated with technology, especially computers, and she's joined TechRadar to write about the latest GPUs, laptops and, more recently, anything AI-related. If you've got questions, moral concerns or just an interest in anything ChatGPT or general AI, you're in the right place.

Muskaan also somehow managed to install a game on her work MacBook's Touch Bar, without the IT department finding out (yet).