ChatGPT almost got me fired by mistake: Learn from my ordeal

AI-generated image (Image credit: Future)

The above image was generated by an online AI image generator tool. It was designed to reflect the disconnection between human and machine.

Things were starting to pick back up after the Christmas slump on what was an otherwise unassuming Monday in January when I received an alarming email from my editor. 

ChatGPT had just begun to emerge as the next technology blockbuster, and everybody was excited about the possibilities of what it could do… and more importantly, what it could write. 

The email was particularly concerning as my editor wanted to know whether I had used OpenAI’s large language model (LLM) to produce one of my articles.

To my horror, an AI detector tool had rated that article as having a high degree of supposed ‘fakeness’. Mission accepted: it was time to investigate some of the unintended consequences of the artificial intelligence weaving its way into our everyday lives. I wanted to get to the bottom of this and understand what threat it might pose to us writers going forward. As part of that, I also wanted to look at how tools like ChatGPT could affect other industries, to get a sense of where the LLM chatbot is headed.

I took to the Internet to see whether anyone else had had a similar experience, but because we are still in the early days of widespread, publicly available artificial intelligence technology, nobody seemed to have written about one. It was at this point that I put my investigative hat on and started copying and pasting extracts from numerous other articles online into the same tool that had been used against my own.

In a short space of time, I had already found two other recent articles that were reportedly written by AI, even though I was pretty certain that they weren’t. I didn’t want the authors to catch wind of what I was doing, but I needed to ascertain that the content they had written was indeed genuine and human-produced. It was time to turn back the clock; what if I could find some ‘fake’ articles that predated ChatGPT’s public preview launch in November 2022?

I searched for older articles and applied the same method. Before long, I had found another two that were reportedly ‘fake’. It’s worth noting, too, that the popular AI detector tool I used requires a minimum of 50 tokens for its responses to be valid; any fewer and it’s unable to get an accurate enough reading. I took that figure and doubled it, only counting results with over 100 tokens (one of them had more than 200). Had I settled for the lower, recommended limit, I’m sure I would have found more. Similarly, had I spent more time copying and pasting articles into the tool, I would have uncovered even more ‘fake’ articles, but ultimately, tallying how many articles get flagged as AI-written was not my goal. It was just a foundation for my work going forward.
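If you wanted to replicate the experiment, the routine is simple enough to script. Below is a minimal Python sketch of the process described above, assuming the freely available RoBERTa-based GPT-2 output detector on Hugging Face (roberta-base-openai-detector) stands in for the unnamed commercial tool I used, and applying the same 100-token threshold.

```python
# A minimal sketch of the checking routine described above.
# Assumption: the open "roberta-base-openai-detector" model stands in for the
# commercial detector tool used in this article, which is not named here.
from transformers import AutoTokenizer, pipeline

MODEL = "roberta-base-openai-detector"
MIN_TOKENS = 100  # double the detector's recommended 50-token minimum

tokenizer = AutoTokenizer.from_pretrained(MODEL)
detector = pipeline("text-classification", model=MODEL)

def check_extract(text: str):
    """Run an article extract through the detector, skipping short samples."""
    n_tokens = len(tokenizer.encode(text))
    if n_tokens < MIN_TOKENS:
        return None  # too few tokens for a reliable reading
    verdict = detector(text, truncation=True)[0]  # e.g. {'label': ..., 'score': ...}
    return {"tokens": n_tokens, "label": verdict["label"], "score": verdict["score"]}

print(check_extract("Paste a 100-plus-token extract of the article here..."))
```

The exact labels and scores will of course differ from tool to tool; the point is the workflow, not the specific model.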

Of the four articles that I’d found, three were considered to be fake with at least 99% certainty. The fourth was just a touch behind at over 98%.

The detector tool

I needed to get to the bottom of this, so I posed the question to Richard Ford, Chief Technology Officer at cybersecurity solutions company Praetorian, who has a quarter of a century of experience in offensive and defensive computer security.

He explained that machine learning’s involvement in our workload is nothing new, and in fact, the “thin end of the wedge has been being driven in for some time” - including the humble spell checker he used to compose his email to me, and the grammar tool I’m using to help compose this piece. 

On that note, as I sit here typing away at my computer, I wonder to myself whether content that’s been through an AI grammar checker would be considered to be more ‘fake’ than the first draft - although I suspect that’s really a discussion for another day. 

I wanted to know why a number of the articles I’d found were coming back as fake, even though my own human judgment, plus some simple checking of their publication dates, told me they were genuine, human-written pieces.

Ford described AI detection as an “arms race” that will inevitably become more advanced over time, but for now, he said, “my gut tells me that current detection isn’t great.” Ultimately, AI detectors aren’t highly skilled people sitting in a room determining the authenticity of content; they are AI in their own right. From this, I could surmise that the long chain of non-human activity has several points at which it can fail, and the more AI we add to that chain, the less accurate it may become.

There has already been talk of AI tools applying watermarks to make it clear that content was not produced by a human being; however, Ford said that for this to work, everyone would need to be “playing nicely in the sandbox”. Several standardized processes would likely have to be in place, and only the companies that adhere to them would be able to cooperate.

While I firmly believe that AI content should be clearly distinguished wherever it appears, be it with a digital watermark or a note on a physical copy, having seen how poorly rival companies interoperate outside the realms of AI, my expectations of this happening anytime soon are next to zero. Virtually every technology company thinks it has found the way, which generally leaves it unwilling to compromise and share technology - the result being several solutions to the same problem, none of which are interoperable.

For the time being, we’ll have to rely on our own, in-built detectors. I reached out to Robb Wilson, founder of OneReach.ai with over two decades’ experience unlocking hyperautomation. The company helps its customers (including Nike, DHL, and Unilever) design and deploy complex conversational applications. 

Over email, Wilson noted that we’re already equipped with the tools and understanding to detect AI-written content ourselves, to a certain degree; it’s just a matter of reframing what we already know. We can already identify scammers over the phone, a skill that has served us well in many cases when picking out malicious actors on the Internet, too. That leaves some room for error, but the same applies to determining what’s genuine content and what’s not.

I think this is a really valid claim, but I’m concerned that we may soon be incapable of detecting artificially written content by ourselves. GPT-2, an earlier model and the one that underpins today’s AI detector tools, has 1.5 billion parameters. That may sound like a lot, but GPT-3, the next generation and the one behind what we know as ChatGPT, has 175 billion parameters, and an upcoming generation expected to launch in 2023 is rumored to have 100 trillion.

While many have called this multi-trillion figure frankly ludicrous speculation (because that’s all it is for now), what is almost certain is that future models will draw on even more data and become even more capable. Back to my point: being able to call on so much data will make these tools very lifelike, and our success rate at spotting what’s authentic is likely to suffer as they improve.

Then again, something that Ford says really resonates, namely “knowing something had an AI-assist might not really be that useful data in practice”. If that’s the case, then what sort of implications could AI content creators have on my job, my industry, and so many others?

Problems with AI

“Whether we know it or not - whether we like it or not - AI is going to affect us all,” Wilson says, adding that he believes this technology has unlocked a whole host of ethical dilemmas, from dealing with the toxicity it produces and licensing its output, to its implications for free speech and how not to leave people behind as others are propelled forward in an increasingly digital era.

As we design these tools (because remember, it’s us who are designing them) - and before it’s too late - we need to think deeply about the unintended outcomes, and to build into the development process a healthy relationship that keeps humans in control.

Wilson says that while regulators and businesses will play an important role going forward, all people will need to speak up and act, “in order for AI to become the powerful ally it promises to be.”

Beerud Sheth, CEO of the business-oriented conversational messaging platform Gupshup, explains how conversational AI tools are “generative in nature” - while they can make up a coherent, plausible answer, it may not be all that accurate. In fact, he told me that insiders regularly (jokingly) refer to these sorts of models as “very good BSers”! He agrees that there’s still work to be done on training LLMs with the right data to fine-tune their responses.

Let’s throw it back to Google’s recent (and rather “unfortunate”) Bard demo video. A few months late to the party, and several weeks after Microsoft had already started integrating AI into its own products following a multi-billion dollar investment in OpenAI, what Google did can only be described as rushed. The hasty attempt to show off Bard resulted in a product demo surfacing inaccurate information; Google’s market value dropped sharply and the company suffered considerable embarrassment, to say the least.

A Cornell University study done in partnership with OpenAI aimed to uncover how credible GPT-2’s output was. The 1.5-billion-parameter model, which underpins many of the AI detector tools we’re accustomed to today, was awarded a credibility score of 6.91 out of 10.

That credibility is especially problematic when we consider how easy it is to fine-tune such models to our own tastes. The Middlebury Institute of International Studies’ Center on Terrorism, Extremism, and Counterterrorism focused on four ideological positions - white supremacy, Marxism, jihadist Islamism, and anarchism - and found that it was possible to “generate synthetic propaganda for these ideologies”, illustrating how easily and quickly people with bad intentions can spread detrimental messages.

OpenAI itself acknowledged in a 2019 post that “content-based detection of synthetic text is a long-term challenge” which needs to be combined with metadata-based approaches, human judgment, and public education. While this sounds perfect in theory, in practice it will require communication and interoperability between AI companies - the very ‘playing nicely in the sandbox’ scenario that’s unlikely to happen.

To me, this indicates that AI detector tools may forever be inaccurate, though they will likely continue to serve as a great starting point for many. Beyond this, it looks like human judgment is irreplaceable.

AI’s threat to our jobs

If there are still so many unanswerable questions about artificial intelligence, LLMs, and related conversational tools, then does that leave an element of threat to my role and numerous others going forward?

Ford’s prediction, for now at least, is that content would become devalued as the widespread adoption of AI gets into full swing. “If I can write something that’s 90% as good as your hand-crafted piece but for 10% of the effort, that seems like a pretty inevitable outcome.”

Sheth agrees, stating that a variety of industries with a heavy dependence on language, conversations, and communications, including journalism, customer support, legal, and education, will all be “disrupted” by AI tools - although this may not be as bad as it’s made out to be.

Nevertheless, by that illustration the future looks somewhat concerning, and questions remain about the effects on SEO, marketing, journalism, and social media. For now, Ford says the repercussions are “anyone’s guess”.

For me, I can only hope that people will continue to value the human touches that give each and every story its own edge, character, and identity. I love the view of Josh Tyson, Wilson’s co-author on ‘Age of Invisible Machines’. As a writer, he sees ChatGPT as nothing more than a modern-day Wikipedia - a tool that’s been around almost since the start of the millennium.

What he means by this is that generative AI may be a great starting point for some professionals. They may want to use ChatGPT or one of its alternatives to pose some questions, or to gauge interest in a particular topic. But to actually create a tangible, valuable piece of writing, writers will remain at the forefront, verifying the authenticity of information and applying their own human touches.

However, he goes on to theorize that a publication could eventually feed an LLM its entire back catalog, and train it with the required technical knowledge, to produce content in the publication’s own style.

I feel like this journey is a pendulum: I’m faced with a disturbing outlook for the future of my livelihood, only for it to be countered by an opposing thought. It seems that pendulum is still swinging, because while algorithms could be employed to find stories of public interest, a reporter’s instincts, judgment, and creativity will always be essential.

The future of AI

Then again, what if GPT’s future isn’t in literature? Wilson said that, while the headlines have enjoyed talking about how ChatGPT can produce anything from poetry to legal advice and even code in just seconds, its real power has been somewhat obscured. What it has done, so far, though, is demonstrate just how willing and eager people are to use it.

From what I can see, ChatGPT has so far primarily been used to produce content, be it written pieces or material that goes on to form the basis of videos, podcasts, and other media.

The next step for these types of conversational tools, Wilson told me, would be a whole lot more personal, and actually pretty useful. This is where I bring that ‘disruption’ back up from a few paragraphs ago, because I’m not convinced it means our jobs are entirely at risk. Wilson reckons that checking your flight status, calling your ride to the airport, and texting your colleagues with your ETA are all tasks these tools will handle in the coming years, and I’d be pleased if we’re heading in that direction.

Rather than taking existing jobs, by that outlook, conversational AI tools could serve more as personal assistants that plug the numerous gaps in our lives that aren’t already covered by human jobs. Such utility from a chatbot, though, will require ecosystems to sequence the associated data and technology, and that’s the work of 2023 and beyond.

This brings me back to my earlier point that companies will have to be willing to share data, and we know how that goes. People will absolutely love these reimagined personal assistant-like tools, but the chances of them all working together are fairly limited. Instead, we might see a number of different chatbots working with different companies. Still, if we could run an entire sequence like the one above without having to log in to three or four different apps, I think we’d be on to a winner.
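To make that idea of ‘sequencing’ concrete, here’s a purely hypothetical sketch of how one conversational front end might chain those three steps behind a single request. Every service and method name below is invented for illustration; no real airline, ride-hailing, or messaging provider exposes exactly these interfaces.

```python
# A purely hypothetical sketch of the task sequencing described above.
# AirlineAPI, RideService, and Messenger are invented stand-ins, not real APIs.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class FlightStatus:
    number: str
    departs: datetime
    gate: str

class AirlineAPI:          # hypothetical flight-status integration
    def get_status(self, flight_number: str) -> FlightStatus:
        return FlightStatus(flight_number, datetime.now() + timedelta(hours=4), "B22")

class RideService:         # hypothetical ride-hailing integration
    def book(self, pickup: str, arrive_by: datetime) -> str:
        return f"Ride booked from {pickup} to the airport, arriving by {arrive_by:%H:%M}"

class Messenger:           # hypothetical messaging integration
    def send(self, to: str, text: str) -> None:
        print(f"-> {to}: {text}")

def handle_request(flight_number: str, home: str, colleague: str) -> None:
    """One conversational request fans out into three sequenced services."""
    status = AirlineAPI().get_status(flight_number)
    leave_by = status.departs - timedelta(hours=2)
    print(RideService().book(home, arrive_by=leave_by))
    Messenger().send(colleague, f"On flight {status.number}, departing gate {status.gate}; ETA to follow.")

handle_request("BA117", "Home", "Alex")
```

The point isn’t the toy implementations; it’s that a single chatbot request fans out across several otherwise separate ecosystems, which is exactly where the data-sharing problem comes in.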

Protection for people

In the past few months, we’ve seen how an alarmingly capable piece of tech can go from zero to hero in a really short space of time. If there’s one thing we know about the human race, it’s that we’re designed to find the most efficient (read: easy) way of doing something. Chances are that this year we’re going to see AI really take off, whether that’s for creating text, audio, or visuals, and we’re all going to be using it more noticeably in our lives.

Along with it, we’ve seen a growing number of negative use cases, such as AI’s ability to write malicious code that could be used in cyberattacks. I asked Sheth whether general members of the public should be protected, or limited, from such powerful tools.

“Yes, indeed,” he said. But how this should be done isn’t so clear. He suggested that entire communities should come together to discuss how to move forward, with voices representing AI developers, regulators, and broader society - all of this “before the genie gets out of the bottle.”

This has me very concerned, and leaves me no option but to bring up, for what seems like the hundredth time, the fact that companies and other bodies will need to cooperate moving forward. Easily done, you’d think, but the reality is that many companies are simply unwilling to come together.

Conclusion

During my investigation, I’ve posed several questions and answered as many of them as thoroughly as I could; however, one remains unanswered: how is genuine content being flagged as ‘fake’? The precise answer isn’t really known, and unless we get access to the billions of data points these models are built on, we may never know.

Moving forward, however, I have become increasingly aware of how inaccurate artificial intelligence can be, and of the huge need for human intervention. Frankly, that’s good news, because the threat to us and our jobs remains virtually nil for now - and the truth of the matter is that I’m really excited to see AI have its moment this year and make our lives even easier.

I don’t have huge expectations, because I know that companies will never play in that sandbox together, but even if we end up with several strains of very similar chatbots, chances are we’ll be able to get through our lives more easily, giving us more time to do what we want - whether that’s work, spending time with the family, or getting out more.

Craig Hale

Craig has several years’ experience freelancing in tech and automotive circles, with specific interests in technology designed to better our lives, including AI and ML, productivity aids, and smart fitness. He is also passionate about cars and the decarbonisation of personal transportation. An avid bargain-hunter, he makes sure that any deal he finds is top value!