9 detection engines

Metrics explained

Every flag CleanOutput raises is grounded in a measurable linguistic property. Here's exactly what we're measuring and why it matters.

METRIC 01

High-Risk Vocabulary

Lexical Leaks

Highlight color: ● Red underline

Max penalty: −25 points

Why AI vocabulary is detectable

LLMs are trained on internet text and reinforced with human feedback. This creates a statistical bias toward certain words and phrases that humans rate as "good writing" — words that sound sophisticated, professional, and authoritative. The problem is that models converge on the same set of words repeatedly.

CleanOutput maintains a curated dictionary of 200+ such terms, grouped into tiers:

Tier 1 (highest risk): "delve," "tapestry," "nuanced," "multifaceted," "leverage," "navigate" — these appear in AI output at dramatically higher rates than in human writing.
Tier 2 (elevated risk): Words like "comprehensive," "holistic," "robust," "seamless," "empower" — common in formal human writing but overused by AI to the point of being statistical flags.
Tier 3 (contextual): Phrases like "it is worth noting," "it goes without saying," "in the realm of" — formulaic constructions that feel natural in isolation but become obvious when clustered.

Each flagged word is offered 2–4 context-appropriate alternatives drawn from our alternatives dictionary. The alternatives are chosen to preserve meaning while breaking the AI statistical pattern.

Common examples

AI Word	Human Alternatives
`delve`	explore, dig into, look at, examine
`leverage`	use, apply, draw on, tap into
`multifaceted`	complex, layered, varied, many-sided
`tapestry`	mix, blend, combination, range
`synergy`	teamwork, cooperation, joint effect
`paradigm`	model, framework, approach, system
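
As a rough illustration of the lookup described above, the sketch below flags dictionary words and returns their alternatives. The entries are copied from the table; the data structure, the `flag_vocabulary` name, and the exact-match rule (inflected forms such as "delves" are not caught) are assumptions, not CleanOutput's actual implementation.

```python
import re

# Hypothetical excerpt of the alternatives dictionary. The entries come from
# the table above; the structure itself is an assumption.
ALTERNATIVES = {
    "delve": ["explore", "dig into", "look at", "examine"],
    "leverage": ["use", "apply", "draw on", "tap into"],
    "multifaceted": ["complex", "layered", "varied", "many-sided"],
    "tapestry": ["mix", "blend", "combination", "range"],
    "synergy": ["teamwork", "cooperation", "joint effect"],
    "paradigm": ["model", "framework", "approach", "system"],
}

# Words, optionally hyphenated (e.g. "many-sided").
WORD_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def flag_vocabulary(text):
    """Return (word, start_offset, alternatives) for each dictionary hit."""
    hits = []
    for match in WORD_RE.finditer(text):
        word = match.group().lower()
        if word in ALTERNATIVES:
            hits.append((word, match.start(), ALTERNATIVES[word]))
    return hits


for word, offset, options in flag_vocabulary("We delve into the tapestry."):
    print(f"{word!r} at {offset}: try {', '.join(options)}")
```

The offsets are what a highlighter would need to draw the red underline in the right place.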

METRIC 02

Sentence Burstiness

Cadence / SD

Highlight color: ● Purple tint on paragraphs

Max penalty: −20 points

Burstiness: the rhythm fingerprint

Burstiness, the variation in sentence length across a passage, is the single most reliable statistical indicator of AI text, and low burstiness is the warning sign. Human writers naturally vary their sentence length: a long, winding sentence followed by a short one. Then another long one. Three words. Then a paragraph-long thought that builds and builds before landing on its conclusion.

AI models, because they optimise token-by-token, tend to output sentences of strikingly uniform length — typically 18–26 words per sentence with very little deviation.

CleanOutput measures this as the standard deviation (SD) of sentence word-counts:

SD < 3: Extremely uniform — strong AI signal
SD 3–6: Mildly uniform — possible AI editing
SD 6–10: Natural variation — likely human
SD > 10: High burstiness — strong human signal

To improve your score here: deliberately mix short punchy sentences with longer ones. Use fragments occasionally. Let one sentence run long. Then cut. This is natural writing.
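
The SD bands above can be sketched in a few lines. This is a minimal illustration, not CleanOutput's code: the sentence splitter is naive, the choice of population standard deviation is an assumption (the text does not say which variant is used), and the handling of the shared boundaries (exactly 3 counts as mildly uniform, exactly 6 as natural) is also an assumption.

```python
import re
import statistics


def sentence_lengths(text):
    """Word count of each sentence, splitting naively after ., ! or ?"""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness_band(text):
    """Return (sd, label) using the bands listed above."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return None, "not enough sentences"
    sd = statistics.pstdev(lengths)  # population SD: an assumption
    if sd < 3:
        label = "Extremely uniform"
    elif sd < 6:
        label = "Mildly uniform"
    elif sd <= 10:
        label = "Natural variation"
    else:
        label = "High burstiness"
    return sd, label


sample = "One two three four. Five six seven eight. Nine ten eleven twelve."
print(burstiness_band(sample))
```

Three four-word sentences have no spread at all, so the sample falls in the most uniform band.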

METRIC 03

Passive Voice Ratio

Grammatical

Highlight color: ● Blue underline

Max penalty: −15 points

Why AI defaults to passive voice

Passive voice ("the report was written," "the decision was made") is grammatically safer — it avoids assigning agency, which AI finds comfortable. It also sounds formal and authoritative, which training feedback often rewards.

CleanOutput detects passive constructions using eight regex patterns covering all common tense forms: simple present/past passive, perfect passive, modal passives (can be, will be, should be, must be, would be), and infinitive passives.

A passive voice ratio above 25% of sentences is flagged. Expert writing style guides recommend keeping passive voice below 15% for most prose genres. Academic writing may legitimately run higher — adjust your expectations accordingly.
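
A simplified sketch of the ratio check follows. It uses a single heuristic pattern (a form of "to be", an optional -ly adverb, then a participle), not the eight patterns CleanOutput uses; the irregular participle list and the function names are hypothetical. A pattern this loose will produce false positives such as "was tired".

```python
import re

# Small, hypothetical sample of irregular past participles.
IRREGULAR = "written|made|done|given|taken|seen|known|found|built|sent|held|told|shown"

# be-verb + optional adverb + participle. "been" and "be" also cover perfect
# and modal passives ("has been written", "can be done").
PASSIVE_RE = re.compile(
    r"\b(?:am|is|are|was|were|be|been|being)\s+(?:\w+ly\s+)?(?:\w+ed|" + IRREGULAR + r")\b",
    re.IGNORECASE,
)


def passive_ratio(text):
    """Fraction of sentences containing at least one passive-looking match."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    passive = sum(1 for s in sentences if PASSIVE_RE.search(s))
    return passive / len(sentences)


def is_flagged(text, threshold=0.25):
    """Apply the 25% threshold stated above."""
    return passive_ratio(text) > threshold


text = "The analysis was conducted by the team. The team wrote the report."
print(passive_ratio(text), is_flagged(text))
```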

The fix is almost always straightforward: identify the actor (who did the thing?) and make them the subject of the sentence.

Passive: "The analysis was conducted by the team."

Active: "The team conducted the analysis."

METRIC 04

Transition Chains

Perplexity

Highlight color: ● Amber underline

Max penalty: −10 points

Predictable transition patterns

AI writing is highly predictable in how it transitions between ideas. This is related to the linguistic concept of perplexity — how "surprising" each word is given the preceding context. AI outputs low-perplexity text; each word is exactly what you'd expect next.

Transition phrases are the most obvious manifestation: "Furthermore," "Moreover," "In conclusion," "It is worth noting," "Having said that," "In other words." These act as structural scaffolding that AI leans on heavily because they always work — they're never wrong, so they're always predicted as likely.

CleanOutput flags 35+ such transition phrases and cliché structural markers including:

Conclusion openers: "In conclusion," "To summarize," "In summary," "Overall"
Additive transitions: "Furthermore," "Moreover," "Additionally," "In addition"
Certainty markers: "It is clear that," "Without a doubt," "Needless to say"
Reformulation phrases: "In other words," "Put simply," "Simply put"
Cliché openers: "In today's world," "Throughout history," "Have you ever wondered"
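
The categories above map naturally onto a phrase table. The sketch below is one possible way to scan for them; the phrases are the ones listed, while the `flag_transitions` function, the case-insensitive matching, and the longest-first ordering are assumptions. The full 35+ phrase list is not reproduced.

```python
import re

# Phrases quoted in the list above, grouped by category.
TRANSITIONS = {
    "conclusion opener": ["In conclusion", "To summarize", "In summary", "Overall"],
    "additive transition": ["Furthermore", "Moreover", "Additionally", "In addition"],
    "certainty marker": ["It is clear that", "Without a doubt", "Needless to say"],
    "reformulation phrase": ["In other words", "Put simply", "Simply put"],
    "cliché opener": ["In today's world", "Throughout history", "Have you ever wondered"],
}


def build_pattern(phrases):
    """One alternation per category, longest phrase first."""
    ordered = sorted(phrases, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


PATTERNS = {category: build_pattern(p) for category, p in TRANSITIONS.items()}


def flag_transitions(text):
    """Return (offset, matched_text, category) tuples in document order."""
    hits = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), match.group(), category))
    return sorted(hits)


sample = "Furthermore, the plan works. In conclusion, it is clear that we agree."
for offset, phrase, category in flag_transitions(sample):
    print(offset, phrase, category)
```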

METRIC 05

Semantic Redundancy

Hedging

Highlight color: ● Pink underline

Max penalty: −10 points

Redundancy and hedging language

AI writing is often semantically redundant — it says things twice in slightly different ways, or adds qualifiers that don't add information. This happens because the model is rewarded for thoroughness and completeness, so it hedges and doubles back.

Redundant phrases pair words where one already implies the other: "absolutely essential" (something essential is already absolute), "past history" (history is already past), "completely eliminate" (to eliminate is already to remove completely), "unexpected surprise" (a surprise is by definition unexpected).

Hedging language adds qualifiers that weaken statements without purpose: "It is worth noting that," "One might argue," "It could be suggested," "In many ways," "For the most part." This over-qualification is an AI safety behavior leaking into prose.

The fix: cut the redundant word or drop the hedge entirely and state your point directly. Readers trust confident prose.
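
Both checks can be sketched with small lookup tables. The phrases are the examples quoted above; the function names are hypothetical, and the rewrite does not restore capitalisation when a redundant phrase opens a sentence.

```python
import re

# Redundant pair -> the single word that carries the meaning.
REDUNDANT = {
    "absolutely essential": "essential",
    "past history": "history",
    "completely eliminate": "eliminate",
    "unexpected surprise": "surprise",
}

# Hedges quoted above.
HEDGES = [
    "It is worth noting that",
    "One might argue",
    "It could be suggested",
    "In many ways",
    "For the most part",
]


def tighten(text):
    """Replace each redundant pair with its core word."""
    for phrase, core in REDUNDANT.items():
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", core, text, flags=re.IGNORECASE)
    return text


def find_hedges(text):
    """Return the hedges present in the text, case-insensitively."""
    lowered = text.lower()
    return [hedge for hedge in HEDGES if hedge.lower() in lowered]


print(tighten("We must completely eliminate past history."))
print(find_hedges("In many ways, this works."))
```

Hedges are only reported, not removed, because dropping one usually means rewording the rest of the sentence by hand.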

METRIC 06

Padded Verbs

Nominalisation

Highlight color: ● Pink underline

Nominalisations and padded verb phrases

Nominalisation is the habit of turning verbs into nouns and then adding a weak verb: "make a decision" instead of "decide," "conduct an investigation" instead of "investigate," "give consideration to" instead of "consider." This inflates word count without adding meaning.

AI uses these constructions because they appear frequently in formal writing that was part of its training data, and formal-sounding text is often rated more highly by human feedback. The result is bloated, indirect prose.

CleanOutput flags 20+ padded verb patterns and encourages direct substitution with the root verb. This almost always makes sentences shorter, stronger, and more human.
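
A minimal sketch of the substitution table, using only the three examples given above. The `suggest_verbs` name is hypothetical, and only the base form of each phrase is matched; inflected forms such as "made a decision" would need extra patterns.

```python
import re

# Padded phrase -> root verb, taken from the examples above.
PADDED_VERBS = {
    "make a decision": "decide",
    "conduct an investigation": "investigate",
    "give consideration to": "consider",
}


def suggest_verbs(text):
    """Return (offset, padded_phrase, root_verb) for each match, in order."""
    suggestions = []
    for phrase, root in PADDED_VERBS.items():
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            suggestions.append((match.start(), match.group(), root))
    return sorted(suggestions)


sample = "We will conduct an investigation and then make a decision."
for offset, phrase, root in suggest_verbs(sample):
    print(f"{phrase!r} at {offset} -> {root!r}")
```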

METRIC 07

Structural Uniformity

Paragraph Balance

Highlight color: ● Subtle purple tint

Max penalty: −10 points

Paragraph-level structural patterns

AI outputs tend to have eerily similar paragraph lengths — typically 60–100 words, three to five sentences each, always with an opening statement, supporting detail, and a mini-conclusion. This is not because the content demands it; it's because the model has learned a template and applies it consistently.

CleanOutput measures the standard deviation of paragraph word counts. When all paragraphs are within 20 words of each other in length, that structural symmetry is flagged. Real writing has a two-line paragraph right after a fifteen-line one. It has orphan sentences and sprawling asides.
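
The 20-word rule can be sketched as a spread check on paragraph word counts. This is an illustration under assumptions: paragraphs are taken to be separated by blank lines, "within 20 words" is read as a spread (longest minus shortest) of at most 20, and the function names are hypothetical.

```python
import re
import statistics


def paragraph_lengths(text):
    """Word count of each blank-line-separated paragraph."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [len(p.split()) for p in paragraphs]


def paragraph_sd(text):
    """Population SD of paragraph word counts (which SD variant is an assumption)."""
    lengths = paragraph_lengths(text)
    return statistics.pstdev(lengths) if len(lengths) >= 2 else None


def is_structurally_uniform(text, max_spread=20):
    """True when every paragraph is within max_spread words of every other."""
    lengths = paragraph_lengths(text)
    if len(lengths) < 2:
        return False
    return max(lengths) - min(lengths) <= max_spread


even = "one two three\n\nfour five six seven\n\neight"
uneven = " ".join(["w"] * 5) + "\n\n" + " ".join(["w"] * 40)
print(is_structurally_uniform(even), is_structurally_uniform(uneven))
```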

This metric also detects over-formatted hierarchical structure — bullet-point lists where prose would serve better, numbered sections in response to a question that didn't ask for a list, and artificially parallel structure throughout.

METRIC 08

Readability Score

Flesch-Kincaid

Bonus/neutral metric

Bonus: +3 at extremes

Flesch-Kincaid readability analysis

The Flesch Reading Ease formula, the 0–100 companion to the Flesch-Kincaid grade-level formula, scores text based on average sentence length and average syllables per word; higher scores mean easier reading. A score of 60–70 is standard for plain English; 30–50 is academic; 70–80 is conversational.

AI tends to cluster around 45–65 — the mid-range that its training data and RLHF rewards optimise for. Scores at the extremes are more likely to indicate genuine human style: either a deliberately simple, accessible writer or a dense academic one.

CleanOutput gives a small bonus (+3) for Flesch Reading Ease scores outside the 40–70 band, because AI output rarely lands at these extremes. This metric is informational: it doesn't penalise scores in the middle, only rewards clear outliers.
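
For reference, the sketch below computes the standard Flesch Reading Ease score, 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), and applies the bonus band described above. The syllable counter is a rough vowel-group heuristic and the function names are hypothetical; CleanOutput's own counting rules may differ. Very simple text can score above 100.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text):
    """Standard Flesch Reading Ease; returns None for empty input."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return None
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def readability_bonus(score, low=40, high=70, bonus=3):
    """+3 outside the 40-70 band, 0 inside it (values from the text above)."""
    if score is None:
        return 0
    return bonus if score < low or score > high else 0


score = flesch_reading_ease("The cat sat on the mat.")
print(round(score, 3), readability_bonus(score))
```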

METRIC 09

Typography & Spelling

Character-Level

Highlight color: ● Amber underline

Character-level anomalies and spelling

AI sometimes introduces subtle character-level artifacts: smart quotes (“ ”) vs. straight quotes (" "), em dashes without surrounding spaces, non-breaking spaces, or Unicode lookalike characters. These are artifacts of how model outputs are post-processed and can cause issues in plain-text contexts.

CleanOutput also maintains a dictionary of the 25 most common English spelling errors (e.g. "recieve" → "receive," "seperate" → "separate," "definately" → "definitely"). While AI rarely makes spelling mistakes, human editing sessions sometimes introduce them.
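
The character-level checks and the spelling lookup could look roughly like the sketch below. The exact patterns are assumptions: lookalikes are approximated as words that mix Latin letters with another script, and only the three misspellings quoted above are included, not the full list of 25.

```python
import re
import unicodedata

# Character-level checks named in the text; the patterns are assumptions.
CHECKS = {
    "smart quote": re.compile("[\u2018\u2019\u201c\u201d]"),
    "unspaced em dash": re.compile(r"(?<=\S)\u2014(?=\S)"),
    "non-breaking space": re.compile("\u00a0"),
}

# The three corrections quoted above.
SPELLING = {"recieve": "receive", "seperate": "separate", "definately": "definitely"}


def typography_flags(text):
    """Return (label, offset) for each artifact, in document order."""
    flags = []
    for label, pattern in CHECKS.items():
        flags.extend((label, m.start()) for m in pattern.finditer(text))
    return sorted(flags, key=lambda flag: flag[1])


def mixed_script_words(text):
    """Words mixing Latin letters with another script (e.g. a Cyrillic 'a')."""
    flagged = []
    for word in re.findall(r"\w+", text):
        scripts = {
            unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in word
            if ch.isalpha()
        }
        if "LATIN" in scripts and len(scripts) > 1:
            flagged.append(word)
    return flagged


def spelling_fixes(text):
    """Return (misspelling, correction) for each dictionary hit."""
    words = re.findall(r"[A-Za-z]+", text)
    return [(w, SPELLING[w.lower()]) for w in words if w.lower() in SPELLING]


print(typography_flags("\u201cHi\u201d there\u2014ok"))
print(mixed_script_words("p\u0430ypal is fine"))
print(spelling_fixes("Please recieve this"))
```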

Note on plagiarism: True plagiarism detection requires comparing your text against an indexed database of published content — this cannot run in a browser without transmitting your text to a server, which we refuse to do. CleanOutput does flag repeated phrases within a single document (self-plagiarism / repetition), but for external comparison, we recommend Copyscape or Grammarly's plagiarism tool.
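
The within-document repetition check mentioned in the note can be approximated by counting repeated word n-grams. The sketch below is one way to do it; the four-word window, the minimum count of two, and the `repeated_phrases` name are all assumptions, since the note does not say how CleanOutput defines a repeated phrase.

```python
import re
from collections import Counter


def repeated_phrases(text, n=4, min_count=2):
    """Return {phrase: count} for every n-word phrase seen at least min_count times."""
    words = re.findall(r"[a-z']+", text.lower())
    grams = Counter(
        " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
    )
    return {gram: count for gram, count in grams.items() if count >= min_count}


sample = "At the end of the day we win, and at the end of the day we lose."
for phrase, count in repeated_phrases(sample).items():
    print(count, phrase)
```

Everything runs on the text already in memory, which is consistent with the no-server constraint described in the note.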
