I tested 9 AI content detectors - and these 2 correctly identified AI text every time

When I first examined whether it's possible to fight back against AI-generated plagiarism, and how that might work, it was January 2023, just a few months into the world's exploding awareness of generative AI.

This is an updated version of that original January 2023 article. When I first tested GPT detectors, I used three: the GPT-2 Output Detector (this is a different URL than we published before), Writer.com AI Content Detector, and Content at Scale AI Content Detection (which is apparently now called BrandWell).

The best result was 66% correct from the GPT-2 Output Detector. I did another test in October 2023 and added three more: GPTZero, ZeroGPT (yes, they're different), and Writefull's GPT Detector. Then, in the summer of 2024, I added QuillBot and a commercial service, Originality.ai, to the mix. This time, I'll also be adding Grammarly's beta checker.

In October 2023, I removed the Writer.com AI Content Detector from our test suite because it failed back in January 2023, it failed again in October, and it failed in summer 2024. However, it now appears to work, so I'm including it in the test suite. See below for a comment from the company, which their team sent me after the original article was published in January.

I've re-run all the tests to see how the detectors perform today. While I had two strong successes, the big takeaway seems to be just how inconsistent the results are from one AI checker to another.

What I'm testing for and how I'm doing it

Before I go on, though, we should discuss plagiarism and how it relates to our problem. Merriam-Webster defines "plagiarize" as "to steal and pass off (the ideas or words of another) as one's own; use (another's production) without crediting the source."

This definition fits AI-created content well. While someone using an AI tool like Notion AI or ChatGPT isn't stealing content, if that person doesn't credit the words as coming from an AI and claims them as their own, it still meets the dictionary definition of plagiarism.

In this experimental article, I've asked ChatGPT to help out. My words are in normal and bold text. The AI's words are italicized. After each AI-generated section, I'll show the results of the detectors. At the end of the article, we'll look at how well the detectors performed overall.

Here are the test results for the above text, which I wrote myself:

GPT-2 Output Detector: 99.98% real
Writer.com: 95% human-generated content
BrandWell AI Content Detection: Passes as human
GPTZero: 98% human
ZeroGPT: 22% AI GPT. Your Text is Most Likely Human written
Writefull GPT Detector: 1% likely this comes from GPT-3, GPT-4 or ChatGPT
Originality.ai: We are 99% confident that text is original
QuillBot: 42% of text is likely AI-generated
Grammarly: We didn't detect common AI text patterns. Your document doesn't appear to contain AI-generated text.

Human-written content: 7-of-9 (78%) correct
Last time: 5-of-7 (71%) correct

Before we move on, let's discuss what I'll consider a "correct" analysis. If the score is numeric, anything above 80% will be counted as a correct analysis, and that's being generous. If the score has a summary, I'll look at the summary and make a judgment.
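
The numeric part of that rule can be written down as a small sketch. The function name and the convention that an "X% AI" score implies "(100 - X)% human" are assumptions made for illustration, not part of any detector's documentation, and summary-only verdicts still need a human judgment call.

```python
# Illustrative sketch of the "above 80%" scoring rule.
# Assumption: a score of X% for one label implies (100 - X)% for the other.
THRESHOLD = 80.0


def is_correct(score_percent: float, score_label: str, true_label: str) -> bool:
    """Return True if the detector's numeric score counts as a correct analysis.

    score_label is what the percentage measures ("human" or "ai");
    true_label is who actually wrote the text ("human" or "ai").
    """
    if score_label == true_label:
        confidence_in_truth = score_percent
    else:
        confidence_in_truth = 100.0 - score_percent
    # Strictly above the threshold, matching "anything above 80%".
    return confidence_in_truth > THRESHOLD


# Scores reported for the human-written introduction.
print(is_correct(98, "human", "human"))  # GPTZero: 98% human
print(is_correct(22, "ai", "human"))     # ZeroGPT: 22% AI, so 78% human
```

Under this reading, a detector that leans the right way but only to 78% still counts as a miss.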

With that, let's dig in some more.

Explain why ChatGPT can be a real problem for teachers and editors when it comes to plagiarism

ChatGPT is a variant of the GPT (Generative Pre-trained Transformer) language model developed by OpenAI. It is designed to generate human-like text by predicting the next word or phrase in a given input sequence. While ChatGPT can be a useful tool for generating text and responding to prompts, it can also be a problem for teachers and editors when it comes to plagiarism.

One of the main reasons why ChatGPT can be a problem for teachers and editors is that it can generate text that is very similar to that of real human writers. This means that students or other individuals could potentially use ChatGPT to generate text that they could then submit as their own work, thereby committing plagiarism.

For teachers and editors, detecting whether a given piece of text was generated by ChatGPT or written by a human can be very difficult, as the generated text may not contain any obvious errors or discrepancies that would reveal its artificial origin. This can make it difficult for teachers and editors to identify and prevent plagiarism, as they may not be able to tell whether a given piece of text is original or copied from another source.

Overall, ChatGPT and other similar language models can be a problem for teachers and editors when it comes to plagiarism because they can generate text that is difficult to distinguish from that written by humans, making it easier for students and other individuals to commit plagiarism without being detected.

GPT-2 Output Detector: 99.69% real
Writer.com: 77% human-generated content
BrandWell AI Content Detection: Hard to tell
GPTZero: 100% AI
ZeroGPT: 100% AI
Writefull GPT Detector: 12% likely this comes from GPT-3, GPT-4 or ChatGPT
Originality.ai: 100% AI
QuillBot: 83% likely AI
Grammarly: We didn't detect common AI text patterns. Your document doesn't appear to contain AI-generated text.

AI-written content: 4-of-9 (44%) correct
Last time: 4-of-7 (57%) correct

Remember, text shown in italics was generated by the AI.

Show me at least three ways that I can detect ChatGPT was used to produce a piece of text

There are several ways that you can potentially detect whether ChatGPT or another language model was used to produce a piece of text. Here are three potential approaches you could use:

Look for repetitive or unusual patterns in the text: One way to detect whether a language model was used to generate a piece of text is to look for repetitive or unusual patterns in the writing. For example, language models may repeat certain phrases or word combinations, or use unusual or inconsistent sentence structures.

Check for lack of originality: Another way to detect whether a language model was used to generate a piece of text is to check for a lack of originality in the writing. If a piece of text appears to be copied from another source, or if it contains information that is widely available on the internet or in other sources, it could be a sign that it was generated by a language model.

Use a plagiarism checker: Finally, you can use a plagiarism checker to check whether a piece of text appears to be copied from another source. While a plagiarism checker may not be able to definitively determine whether a language model was used to generate a piece of text, it can help you identify instances of copying or paraphrasing from other sources.

Note when the answer is obviously wrong: I talked about this in my Notion AI review, where I spotlighted some text the AI wrote with seeming authority but which was blatantly wrong. If a student or writer doesn't have full command of the topic being discussed, they may not spot the inaccuracies and leave them in the AI-generated response. Of course, incorrect statements aren't just the purview of an AI writing tool. Many bloggers and commenters, for example, repeat statements made by other bloggers, creating an echo chamber of completely "false facts", showing that while intelligence may be artificial, stupidity is a universal trait.

Keep in mind that it can be difficult to definitively determine whether a language model was used to generate a piece of text, and these methods may not always be effective. However, they can be useful tools to help you identify potential instances of language model-generated text.

In this set of tests, only the italicized content above was run through the checker suite.

GPT-2 Output Detector: 99.98% fake
Writer.com: 73% human-generated content
BrandWell AI Content Detection: Passes as human
GPTZero: 100% AI
ZeroGPT: 96.99% AI generated
Writefull GPT Detector: 79% likely this comes from GPT-3, GPT-4 or ChatGPT
Originality.ai: 100% AI
QuillBot: 100% AI
Grammarly: We didn't detect common AI text patterns. Your document doesn't appear to contain AI-generated text.

AI-written content: 5-of-9 (56%) correct
Last time: 5-of-7 (71%) correct

List three plagiarism checkers that can detect ChatGPT text

It is worth noting that there is currently no plagiarism checker that is specifically designed to detect text generated by ChatGPT or other language models. While some plagiarism checkers may be able to identify certain characteristics of language model-generated text that could indicate its artificial origin, there is no guarantee that they will be able to definitively determine whether a piece of text was generated by a language model.

With that said, here are three plagiarism checkers that you might consider using to help identify potential instances of language model-generated text:

Turnitin: Turnitin is a popular plagiarism checker that is widely used by schools and universities. It compares a piece of text to a database of other texts to identify instances of copying or paraphrasing. While it may not be able to definitively determine whether a piece of text was generated by a language model, it could potentially identify certain characteristics of language model-generated text that could indicate its artificial origin.

PlagScan: PlagScan is another plagiarism checker that compares a piece of text to a database of other texts to identify instances of copying or paraphrasing. Like Turnitin, it may be able to identify certain characteristics of language model-generated text that could indicate its artificial origin, but it is not specifically designed to detect language model-generated text.

Copyleaks: Copyleaks is a plagiarism checker that uses advanced machine learning algorithms to scan a piece of text and compare it to a database of other texts to identify instances of copying or paraphrasing. While it may be able to identify certain characteristics of language model-generated text, it is not specifically designed to detect language model-generated text.

It is worth noting that no plagiarism checker is completely foolproof, and there is always the possibility that a piece of language model-generated text could escape detection. Therefore, it is important to use plagiarism checkers as just one part of a larger strategy to detect and prevent plagiarism.

GPT-2 Output Detector: 99.58% real
Writer.com: 74% human-generated content
BrandWell AI Content Detection: Passes as human
GPTZero: 100% AI
ZeroGPT: 100% AI
Writefull GPT Detector: 87% likely this comes from GPT-3, GPT-4 or ChatGPT
Originality.ai: 100% AI
QuillBot: 100% AI-generated
Grammarly: No plagiarism or AI text detected

AI-written content: 5-of-9 (56%) correct
Last time: 5-of-7 (71%) correct

Online AI plagiarism checkers

Most plagiarism detectors are used to compare writing against a corpus of other writing. For example, when a student turns in an essay, a product like Turnitin scans the submitted essay against a huge library of essays in its database, as well as other documents and text on the internet, to determine if the submitted essay contains already-written content.

However, the AI-writing tools generate original content, at least in theory. Yes, they build their content from whatever they've been trained on, but the words they construct are somewhat unique for each composition.

As such, the plagiarism checkers mentioned above probably won't work because the AI-generated content probably didn't exist in, say, another student's paper.
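
To make that point concrete, here is a toy sketch of corpus matching based on shared three-word sequences. It is a simplified illustration under my own assumptions, not how Turnitin or any commercial checker actually works, and the function names and sample sentences are invented for the example.

```python
# Toy corpus-overlap check: what fraction of a submission's word trigrams
# already appear somewhere in a reference corpus?
# Assumption: real plagiarism checkers are far more sophisticated than this.


def word_ngrams(text: str, n: int = 3) -> set:
    """Return the set of n-word sequences in the text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(submission: str, corpus_docs: list, n: int = 3) -> float:
    """Fraction of the submission's n-grams that also occur in the corpus."""
    submitted = word_ngrams(submission, n)
    if not submitted:
        return 0.0
    known = set()
    for doc in corpus_docs:
        known |= word_ngrams(doc, n)
    return len(submitted & known) / len(submitted)


corpus = ["the quick brown fox jumps over the lazy dog"]
copied = "the quick brown fox jumps over the lazy dog"
fresh = "a language model wrote this entirely new sentence today"

print(overlap_ratio(copied, corpus))  # high overlap: text already exists
print(overlap_ratio(fresh, corpus))   # no overlap: nothing to match against
```

Freshly generated text shares few or no long word sequences with existing documents, so a matcher of this kind has nothing to flag.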

In this article, we're just looking at GPT detectors. But plagiarism is a big problem, and as we've seen, some choose to define plagiarism as something you claim as yours that you didn't write, while others choose to define plagiarism as something written by someone else that you claim is yours.

That distinction was never a problem until now. Now that we have non-human writers, the plagiarism distinction is more nuanced. It's up to every teacher, school, editor, and institution to decide exactly where that line is drawn.

GPT-2 Output Detector: 99.56% real
Writer.com: 98% human-generated content
BrandWell AI Content Detection: Passes as human
GPTZero: 98% human
ZeroGPT: 16.82% AI - Your text is human written
Writefull GPT Detector: 7% likely this comes from GPT-3, GPT-4 or ChatGPT
Originality.ai: 100% original
QuillBot: 0% AI
Grammarly: No plagiarism or AI text detected

Human-written content: 9-of-9 (100%) correct
Last time: 7-of-7 (100%) correct

Overall results

Overall, results declined compared to the last round of tests. That time, we had three services with perfect scores. That's down to two now because ZeroGPT, one of our then-perfect-scoring players, failed a test it previously passed. The two new detectors we added, Writer.com and Grammarly, didn't improve the score. In fact, both were generally unsuccessful.

Detector	Overall	Test 1 (human)	Test 2 (AI)	Test 3 (AI)	Test 4 (AI)	Test 5 (human)
GPT-2 Output Detector	60%	Correct	Fail	Correct	Fail	Correct
Writer.com	40%	Correct	Fail	Fail	Fail	Correct
BrandWell AI Detector	40%	Correct	Fail	Fail	Fail	Correct
GPTZero	100%	Correct	Correct	Correct	Correct	Correct
ZeroGPT	80%	Fail	Correct	Correct	Correct	Correct
Writefull GPT Detector	60%	Correct	Fail	Fail	Correct	Correct
Originality.ai	100%	Correct	Correct	Correct	Correct	Correct
QuillBot	80%	Fail	Correct	Correct	Correct	Correct
Grammarly	40%	Correct	Fail	Fail	Fail	Correct
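
For readers who want to check the arithmetic, the table can be tallied with a short script. This is just a sketch: the Correct/Fail values are transcribed from the table, while the variable names are my own.

```python
# Tally the results table: per-detector accuracy and per-test correct counts.
# Each list holds outcomes for the five tests in table order
# (human, AI, AI, AI, human).
RESULTS = {
    "GPT-2 Output Detector": ["Correct", "Fail", "Correct", "Fail", "Correct"],
    "Writer.com": ["Correct", "Fail", "Fail", "Fail", "Correct"],
    "BrandWell AI Detector": ["Correct", "Fail", "Fail", "Fail", "Correct"],
    "GPTZero": ["Correct", "Correct", "Correct", "Correct", "Correct"],
    "ZeroGPT": ["Fail", "Correct", "Correct", "Correct", "Correct"],
    "Writefull GPT Detector": ["Correct", "Fail", "Fail", "Correct", "Correct"],
    "Originality.ai": ["Correct", "Correct", "Correct", "Correct", "Correct"],
    "QuillBot": ["Fail", "Correct", "Correct", "Correct", "Correct"],
    "Grammarly": ["Correct", "Fail", "Fail", "Fail", "Correct"],
}


def accuracy(outcomes: list) -> float:
    """Percentage of tests a detector got right."""
    return 100 * outcomes.count("Correct") / len(outcomes)


# Overall accuracy per detector (the "Overall" column).
overall = {name: accuracy(outcomes) for name, outcomes in RESULTS.items()}

# How many of the nine detectors were correct on each of the five tests.
per_test = [
    sum(outcomes[i] == "Correct" for outcomes in RESULTS.values())
    for i in range(5)
]

# Detectors that were right every time.
perfect = [name for name, pct in overall.items() if pct == 100]

for name, pct in overall.items():
    print(f"{name}: {pct:.0f}%")
print("Correct per test:", per_test)
print("Perfect scores:", perfect)
```

The per-test counts line up with the "N-of-9" tallies reported after each section, and the two names in the perfect-score list are the detectors referred to in the headline.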

While two detectors achieved perfect scores, I would not be comfortable relying solely on these tools to validate a student's content. As has been shown, writing from non-native speakers often gets rated as generated by an AI, and even though my hand-crafted content has no longer been rated as AI, there were a few paragraphs flagged by the testers as possibly AI-based. You can also see how the results are wildly inconsistent between testing systems. So, I would advocate caution before relying on the results of any (or all) of these tools.

Let's look at the individual testers and see how each performed.

GPT-2 Output Detector (Accuracy 60%)

This first tool was built using a machine-learning hub managed by New York-based AI company Hugging Face. While the company has received $40 million in funding to develop its natural language library, the GPT-2 detector appears to be a user-created tool using the Hugging Face Transformers library. Of the five tests I ran, the detector was accurate in three.