    How Good are People at AI Detection (Hint: It’s a Coin Flip at Best)

    Are humans good at AI detection? Turns out no, about as good as a coin flip, showing the need for AI detection tools.

    From teachers to security officers, people have grown concerned about the negative effects of generative AI. Is essay writing extinct? Are manual reviews of selfie pictures obsolete? With the rise of AI-generated content across text, image, video, and audio, it's worth asking whether any of it is detectable at all.

    Since the beginning of time, we have trusted our sight and hearing to determine what's real. Can we trust those same senses to tell whether what we see and hear is AI or not?

    Generative AI has given us the ability to create content that mirrors human intelligence in text, image, audio, and video forms. These breakthroughs also come with risks. Political misinformation could spread via deepfakes. Product sentiment gets manipulated through synthetic reviews. Security protocols become vulnerable to identity fraud. The list of risks compounds as algorithms' ability to mimic reality grows disturbingly convincing.

    How reliably can human judgement differentiate what's been created or altered by algorithms from what was created organically? Do human AI detectors actually work? Or, with recent model advances, has the content become undetectable? Can AI be used to detect itself?

    The results may surprise you. Despite our innate observational skills sharpened over millennia, accurately separating AI-generated content from human work proved challenging for people.

    People vs Machines for AI Text Detection

    We've used chatbots to converse with us during customer support sessions, write emails and essays, answer questions, and check our own writing, to name a few uses. As these language models grow more capable of producing human-like text, how good are we at determining the source of what we're reading?

    In a recent Stanford study, 'participants in the study could only distinguish between human or AI text with 50-52% accuracy.'

    If literate individuals struggle to tell AI-generated text from the real thing, imagine the potential for abuse through misinformation, falsified statements, and other fraudulent acts designed to deceive. Political propaganda could be manufactured at scale, personalized on a one-to-one basis. Catfishing scams will target users on social platforms with emotionally engaging dialogue. And the sanctity of essay writing in schools is undermined: at best, cheating goes undetected; at worst, unreliable plagiarism detectors falsely flag students' genuine work as machine-generated.

    So if the human eye cannot consistently spot AI text despite our years of immersion in language, do we need algorithms to detect other algorithmic content? Let's go to OpenAI, creator of the most popular tool, ChatGPT, for their thoughts. In their FAQs, they don't think it is possible. Per their own comments:

    Do AI detectors work?
    In short, no, not in our experience. Our research into detectors didn't show them to be reliable enough given that educators could be making judgments about students with potentially lasting consequences. While other developers have released detection tools, we cannot comment on their utility.

    Furthermore, to try to combat this problem, OpenAI created, and later discontinued, its very own AI checker. Their free detector was only able to correctly identify AI-written content 26% of the time. And this is coming from the company whose models produce the majority of AI-generated text. Scary.
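    To ground what a figure like 26% means: detection accuracy is simply the fraction of samples a judge (human or algorithm) labels correctly. Here is a minimal sketch in Python, using made-up verdicts rather than real study data:

        # Score a detector against known-origin samples.
        # These (true_origin, verdict) pairs are invented for illustration,
        # not data from the studies discussed above.
        samples = [
            ("ai",    "ai"),     # AI text, correctly flagged
            ("ai",    "human"),  # AI text that slipped past the detector
            ("human", "human"),  # human text, correctly cleared
            ("ai",    "human"),  # another miss
        ]

        # Accuracy = correct verdicts / total samples.
        correct = sum(truth == verdict for truth, verdict in samples)
        print(f"accuracy: {correct / len(samples):.0%}")  # -> accuracy: 50%

    On a toy set like this, a judge guessing at random would hover around 50%, which is exactly where the human studies landed.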

    People vs Machines for AI Image Detection

    We're all lifelong learning machines, and having seen countless pictures over our lives, we tend to believe we can reliably detect AI imagery ourselves. We live in an era where the line between reality and digitally constructed content is increasingly blurred, thanks in part to the rapid advancement of generative AI technologies. Unfortunately, we may be overestimating our human AI detection capabilities.

    Humans are naturally equipped with remarkable pattern recognition capabilities, honed over thousands of years. These capabilities enable us to make sense of the world around us, including differentiating between objects, faces, roads, buildings and landscapes. However, the explosion of sophisticated AI-generated images challenges these innate abilities.

    In one study, people achieved only about 50% accuracy at judging whether an image was AI-generated or not, and that was when they were explicitly prompted to look for such content. How would people fare viewing pictures without being warned to watch for AI? One might conclude that the accuracy rate could drop even lower, as the general public may not be actively looking for subtle cues that signify some kind of tampering.

    This carries concerning implications for the spread of misinformation campaigns, fraud using fake identity documents, legal deception via doctored evidence, fabricated accident images for insurance fraud, and more. If humans cannot consistently spot GenAI imagery, bad actors have a low-friction avenue to undermine truth on a massive scale before companies catch up.

    In contrast to both human judgment and AI detection tools for text, image AI detectors have consistently demonstrated higher accuracy rates. In the wild, AI or Not consistently achieves accuracy rates above 98%.

    The reason behind the models' superior performance lies in their ability to analyze and compare vast datasets, learning from each example to refine their detection algorithms. These tools scrutinize images for inconsistencies and patterns that are characteristic of AI generation, such as peculiar texture patterns, unnatural lighting, or anomalous shadows, which might not be obvious to the human observer.
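    As a rough illustration of how such a detector is wired up (a minimal sketch, not AI or Not's actual, proprietary system), the following scores one image with a binary classifier in PyTorch; the checkpoint file, backbone choice, and 0.5 decision threshold are all assumptions made for the example:

        # Sketch: score one image with a trained binary classifier.
        # "detector_weights.pt" is a hypothetical checkpoint; a real system
        # would train on large datasets of AI-generated and organic images.
        import torch
        from torchvision import models, transforms
        from PIL import Image

        # Standard ImageNet-style preprocessing for the backbone.
        preprocess = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])

        model = models.resnet18()
        model.fc = torch.nn.Linear(model.fc.in_features, 1)  # one logit: "is it AI?"
        model.load_state_dict(torch.load("detector_weights.pt"))  # hypothetical weights
        model.eval()

        def is_ai_generated(path: str, threshold: float = 0.5) -> bool:
            """Return True if the model's AI-probability exceeds the threshold."""
            image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
            with torch.no_grad():
                prob = torch.sigmoid(model(image)).item()
            return prob >= threshold

        print(is_ai_generated("suspect.jpg"))

    The learned weights are what encode the texture, lighting, and shadow regularities described above; the surrounding code is just plumbing.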

    The lines between human-created content and AI-generated content have become increasingly blurred. Generative AI has rapidly advanced to a point where machines can now produce synthetic text and visuals convincing to humans. As AI quality improves, are we entering an era of widespread synthetic media production? And in the process, are we being left susceptible to abuse by bad actors?

    Consider first that participants in controlled studies could only accurately identify AI-generated text around 50-52% of the time on average, barely better than a coin flip despite our innate fluency with written words. AI text checkers, even the one made by OpenAI, the company behind ChatGPT, did even worse than humans at 26% accuracy. Not believing everything you read is not a new concept, but it becomes far more concerning as generative AI usage continues to grow.

    For images, people did similarly at verifying whether content was generated by AI: about 50% accuracy when asked if an image was AI or not. For image AI detection, algorithmic accuracy rates are consistently above 90%, showing that AI is currently better at detecting its own output than humans are. The strengths of AI detection tools complement our human limitations, offering an additional layer of verification and trust for everything we see online.

    While humans possess remarkable learning and adaptation abilities, the rise of generative AI necessitates the integration of AI detection tools to safeguard the integrity of digital content. So while people alone may struggle to separate AI-made from human-made content, the convergence of human and machine intelligence will keep us ahead of the curve, promoting a safer and more trustworthy digital future for all.