Metrics explained
Every flag CleanOutput raises is grounded in a measurable linguistic property. Here's exactly what we're measuring and why it matters.
High-Risk Vocabulary
Highlight color: ● Red underline
Max penalty: −25 points
Why AI vocabulary is detectable
LLMs are trained on internet text and reinforced with human feedback. This creates a statistical bias toward certain words and phrases that humans rate as "good writing" — words that sound sophisticated, professional, and authoritative. The problem is that models converge on the same set of words repeatedly.
CleanOutput maintains a curated dictionary of 200+ such terms, grouped into tiers:
- Tier 1 (highest risk): "delve," "tapestry," "nuanced," "multifaceted," "leverage," "navigate" — these appear in AI output at dramatically higher rates than in human writing.
- Tier 2 (elevated risk): Words like "comprehensive," "holistic," "robust," "seamless," "empower" — common in formal human writing but overused by AI to the point of being statistical flags.
- Tier 3 (contextual): Phrases like "it is worth noting," "it goes without saying," "in the realm of" — formulaic constructions that feel natural in isolation but become obvious when clustered.
For each flagged word, CleanOutput offers 2–4 context-appropriate alternatives drawn from our alternatives dictionary. The alternatives are chosen to preserve meaning while breaking the AI statistical pattern.
Common examples
| AI Word | Human Alternatives |
|---|---|
| delve | explore, dig into, look at, examine |
| leverage | use, apply, draw on, tap into |
| multifaceted | complex, layered, varied, many-sided |
| tapestry | mix, blend, combination, range |
| synergy | teamwork, cooperation, joint effect |
| paradigm | model, framework, approach, system |
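As a rough illustration of how a dictionary lookup like this could work, here is a minimal Python sketch. The `ALTERNATIVES` excerpt and the `flag_vocabulary` function are hypothetical names invented for this example, not CleanOutput's actual code, and the sketch only matches exact word forms (it would miss "delves" or "leveraging").

```python
import re

# Hypothetical excerpt of an alternatives dictionary (the real one is
# described as holding 200+ terms). Entries are copied from the table above.
ALTERNATIVES = {
    "delve": ["explore", "dig into", "look at", "examine"],
    "leverage": ["use", "apply", "draw on", "tap into"],
    "multifaceted": ["complex", "layered", "varied", "many-sided"],
    "tapestry": ["mix", "blend", "combination", "range"],
}


def flag_vocabulary(text):
    """Return (word, start_offset, alternatives) for each dictionary hit."""
    hits = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group().lower()
        if word in ALTERNATIVES:
            hits.append((word, match.start(), ALTERNATIVES[word]))
    return hits
```

Returning the character offset alongside each hit is a design choice made here so that a caller could place an underline at the right position; handling inflected forms would need stemming or extra dictionary entries.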
Sentence Burstiness
Highlight color: ● Purple tint on paragraphs
Max penalty: −20 points
Burstiness: the rhythm fingerprint
Burstiness is the single most reliable statistical indicator of AI text. Human writers naturally vary their sentence length — a long, winding sentence followed by a short one. Then another long one. Three words. Then a paragraph-long thought that builds and builds before landing on its conclusion.
AI models, because they optimise token-by-token, tend to output sentences of strikingly uniform length — typically 18–26 words per sentence with very little deviation.
CleanOutput measures this as the standard deviation (SD) of sentence word-counts:
- SD < 3: Extremely uniform — strong AI signal
- SD 3–6: Mildly uniform — possible AI editing
- SD 6–10: Natural variation — likely human
- SD > 10: High burstiness — strong human signal
To improve your score here: deliberately mix short punchy sentences with longer ones. Use fragments occasionally. Let one sentence run long. Then cut. This is natural writing.
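The SD bands above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CleanOutput's implementation: the sentence splitter is a naive regex, the population standard deviation (`statistics.pstdev`) is assumed because the text does not say whether population or sample SD is used, and a value of exactly 6 is assigned to the "natural variation" band because the listed ranges overlap at that point.

```python
import re
import statistics


def sentence_lengths(text):
    """Split on sentence-ending punctuation and count words per sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness_band(text):
    """Return (sd, label), or (None, reason) when there is too little text."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return None, "not enough sentences"
    sd = statistics.pstdev(lengths)  # assumption: population SD
    if sd < 3:
        label = "extremely uniform"
    elif sd < 6:  # assumption: exactly 6 falls in the next band
        label = "mildly uniform"
    elif sd <= 10:
        label = "natural variation"
    else:
        label = "high burstiness"
    return sd, label
```

For example, a one-word sentence followed by a 25-word sentence gives lengths of 1 and 25, a mean of 13, and a population SD of 12, which lands in the "high burstiness" band.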
Passive Voice Ratio
Highlight color: ● Blue underline
Max penalty: −15 points
Why AI defaults to passive voice
Passive voice ("the report was written," "the decision was made") is grammatically safer — it avoids assigning agency, which AI finds comfortable. It also sounds formal and authoritative, which training feedback often rewards.
CleanOutput detects passive constructions using eight regex patterns covering all common tense forms: simple present/past passive, perfect passive, modal passives (can be, will be, should be, must be, would be), and infinitive passives.
A text is flagged when more than 25% of its sentences contain a passive construction. Expert writing style guides recommend keeping passive voice below 15% for most prose genres. Academic writing may legitimately run higher, so adjust your expectations accordingly.
The fix is almost always straightforward: identify the actor (who did the thing?) and make them the subject of the sentence.
Passive: "The analysis was conducted by the team."
Active: "The team conducted the analysis."
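To make the ratio concrete, here is a simplified Python sketch of regex-based passive detection. It does not reproduce CleanOutput's eight patterns: it collapses them into one assumed pattern (a form of "to be" followed by a participle), and the short irregular-participle list is illustrative. A pattern this crude produces false positives on adjectives such as "was red" or "is excited".

```python
import re

# Assumed, simplified stand-in for the eight patterns: any form of "to be",
# an optional adverb, then a regular "-ed" participle or a common irregular.
AUX = r"(?:am|is|are|was|were|be|been|being)"
PARTICIPLE = r"(?:\w+ed|written|made|done|given|taken|seen|known|found|built|sent|held)"
PASSIVE = re.compile(rf"\b{AUX}\s+(?:\w+ly\s+)?{PARTICIPLE}\b", re.IGNORECASE)

FLAG_THRESHOLD = 0.25  # from the text: flagged above 25% of sentences


def passive_ratio(text):
    """Fraction of sentences containing at least one passive-looking match."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    passive = sum(1 for s in sentences if PASSIVE.search(s))
    return passive / len(sentences)


def is_flagged(text):
    return passive_ratio(text) > FLAG_THRESHOLD
```

Because "be" and "been" are in the auxiliary group, modal passives ("should be made") and perfect passives ("has been written") match without separate patterns.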
Transition Chains
Highlight color: ● Amber underline
Max penalty: −10 points
Predictable transition patterns
AI writing is highly predictable in how it transitions between ideas. This is related to perplexity, a language-modeling measure of how "surprising" each word is given the preceding context. AI outputs low-perplexity text: each word is close to what you'd expect next.
Transition phrases are the most obvious manifestation: "Furthermore," "Moreover," "In conclusion," "It is worth noting," "Having said that," "In other words." These act as structural scaffolding that AI leans on heavily because they always work — they're never wrong, so they're always predicted as likely.
CleanOutput flags 35+ such transition phrases and cliché structural markers including:
- Conclusion openers: "In conclusion," "To summarize," "In summary," "Overall"
- Additive transitions: "Furthermore," "Moreover," "Additionally," "In addition"
- Certainty markers: "It is clear that," "Without a doubt," "Needless to say"
- Reformulation phrases: "In other words," "Put simply," "Simply put"
- Cliché openers: "In today's world," "Throughout history," "Have you ever wondered"
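The categories above lend themselves to a simple phrase scan. The following Python sketch is illustrative only: `TRANSITIONS`, `flag_transitions`, and `category_counts` are hypothetical names, the phrase lists contain just the examples quoted in this section rather than the full set, and curly apostrophes (as in "today’s") are not normalised.

```python
import re
from collections import Counter

# Phrases copied from the list above; the full set is described as 35+.
TRANSITIONS = {
    "conclusion opener": ["in conclusion", "to summarize", "in summary", "overall"],
    "additive transition": ["furthermore", "moreover", "additionally", "in addition"],
    "certainty marker": ["it is clear that", "without a doubt", "needless to say"],
    "reformulation phrase": ["in other words", "put simply", "simply put"],
    "cliché opener": ["in today's world", "throughout history", "have you ever wondered"],
}


def flag_transitions(text):
    """Return sorted (offset, phrase, category) tuples for every match."""
    hits = []
    for category, phrases in TRANSITIONS.items():
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            for match in re.finditer(pattern, text, re.IGNORECASE):
                hits.append((match.start(), phrase, category))
    return sorted(hits)


def category_counts(text):
    """Count how many flagged phrases fall into each category."""
    return Counter(category for _, _, category in flag_transitions(text))
```

Counting per category is an assumption about what might be useful to a writer: several additive transitions in a row is a different problem from a single "In conclusion".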
Semantic Redundancy
Highlight color: ● Pink underline
Max penalty: −10 points
Redundancy and hedging language
AI writing is often semantically redundant — it says things twice in slightly different ways, or adds qualifiers that don't add information. This happens because the model is rewarded for thoroughness and completeness, so it hedges and doubles back.
Redundant phrases pair words where one already implies the other: "absolutely essential" (something essential is already a necessity), "past history" (history is already past), "completely eliminate" (to eliminate is already to remove completely), "unexpected surprise" (a surprise is by definition unexpected).
Hedging language adds qualifiers that weaken statements without purpose: "It is worth noting that," "One might argue," "It could be suggested," "In many ways," "For the most part." This over-qualification is an AI safety behavior leaking into prose.
The fix: cut the redundant word or drop the hedge entirely and state your point directly. Readers trust confident prose.
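The "cut the redundant word" fix is mechanical enough to sketch in Python. The dictionaries below contain only the examples quoted in this section, and `tighten` and `find_hedges` are hypothetical helper names. Hedges are only reported, not removed, because dropping one usually means rewording the sentence by hand.

```python
import re

# Redundant pair -> the single word that already carries the meaning.
REDUNDANT = {
    "absolutely essential": "essential",
    "past history": "history",
    "completely eliminate": "eliminate",
    "unexpected surprise": "surprise",
}

HEDGES = [
    "it is worth noting that",
    "one might argue",
    "it could be suggested",
    "in many ways",
    "for the most part",
]


def tighten(text):
    """Replace each redundant pair, keeping a leading capital if present."""
    for phrase, fix in REDUNDANT.items():
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        text = pattern.sub(
            lambda m, fix=fix: fix.capitalize() if m.group()[0].isupper() else fix,
            text,
        )
    return text


def find_hedges(text):
    """Return the hedging phrases that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [hedge for hedge in HEDGES if hedge in lowered]
```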
Padded Verbs
Highlight color: ● Pink underline
Nominalisations and padded verb phrases
Nominalisation is the habit of turning verbs into nouns and then adding a weak verb: "make a decision" instead of "decide," "conduct an investigation" instead of "investigate," "give consideration to" instead of "consider." This inflates word count without adding meaning.
AI uses these constructions because they appear frequently in formal writing that was part of its training data, and formal-sounding text is often rated more highly by human feedback. The result is bloated, indirect prose.
CleanOutput flags 20+ padded verb patterns and encourages direct substitution with the root verb. This almost always makes sentences shorter, stronger, and more human.
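A padded-verb check differs from a plain phrase lookup in one respect: the weak verb can be inflected ("made a decision", "giving consideration to"). The Python sketch below shows one way to handle that, assuming hand-written inflection lists. The three patterns come from the examples above; `PADDED_VERBS` and `flag_padded_verbs` are names invented for this illustration.

```python
import re

# (pattern, root verb) pairs. Inflections of the weak verb are listed by hand.
PADDED_VERBS = [
    (r"\b(?:make|makes|made|making) a decision\b", "decide"),
    (r"\b(?:conduct|conducts|conducted|conducting) an investigation\b", "investigate"),
    (r"\b(?:give|gives|gave|given|giving) consideration to\b", "consider"),
]


def flag_padded_verbs(text):
    """Return sorted (offset, matched_phrase, suggested_root_verb) tuples."""
    hits = []
    for pattern, root in PADDED_VERBS:
        for match in re.finditer(pattern, text, re.IGNORECASE):
            hits.append((match.start(), match.group(), root))
    return sorted(hits)
```

The sketch suggests the root verb in its base form only; conjugating it to fit the sentence ("made a decision" becoming "decided") is left to the writer.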
Structural Uniformity
Highlight color: ● Subtle purple tint
Max penalty: −10 points
Paragraph-level structural patterns
AI outputs tend to have eerily similar paragraph lengths — typically 60–100 words, three to five sentences each, always with an opening statement, supporting detail, and a mini-conclusion. This is not because the content demands it; it's because the model has learned a template and applies it consistently.
CleanOutput measures the standard deviation of paragraph word counts. When all paragraphs are within 20 words of each other in length, that structural symmetry is flagged. Real writing has a two-line paragraph right after a fifteen-line one. It has orphan sentences and sprawling asides.
This metric also detects over-formatted hierarchical structure — bullet-point lists where prose would serve better, numbered sections in response to a question that didn't ask for a list, and artificially parallel structure throughout.
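The paragraph-length check can be sketched as follows. Several details are assumptions made for this example: paragraphs are taken to be separated by blank lines, "within 20 words of each other" is read as the longest and shortest paragraph differing by at most 20 words, and a minimum of three paragraphs is required before flagging. The list-and-heading detection described above is not covered.

```python
import statistics


def paragraph_lengths(text):
    """Word counts of paragraphs, assuming blank lines separate paragraphs."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    return [len(p.split()) for p in paragraphs]


def paragraph_sd(text):
    """Population SD of paragraph word counts (0.0 for fewer than two)."""
    lengths = paragraph_lengths(text)
    return statistics.pstdev(lengths) if len(lengths) >= 2 else 0.0


def is_structurally_uniform(text, max_spread=20, min_paragraphs=3):
    """True when every paragraph is within max_spread words of every other."""
    lengths = paragraph_lengths(text)
    if len(lengths) < min_paragraphs:  # assumption: too few to judge
        return False
    return max(lengths) - min(lengths) <= max_spread
```

Checking the spread between the longest and shortest paragraph is equivalent to checking every pair, which keeps the test to a single pass over the lengths.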
Readability Score
Bonus/neutral metric
Bonus: +3 at extremes
Flesch-Kincaid readability analysis
The Flesch Reading Ease formula (one of the Flesch-Kincaid readability tests) scores text on a nominal 0–100 scale based on average sentence length and average syllables per word; higher scores mean easier reading. A score of 60–70 is standard for plain English; 30–50 is academic; 70–80 is conversational.
AI tends to cluster around 45–65, the mid-range that its training data and RLHF rewards optimise for. Scores at the extremes are more likely to indicate genuine human style: either a deliberately simple, accessible writer or a dense academic one.
CleanOutput gives a small bonus for Reading Ease scores outside the 40–70 band, reflecting the observation that AI rarely inhabits these extremes. This metric is informational: it doesn't penalise scores in the middle, only rewards clear outliers.
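For reference, the standard Flesch Reading Ease formula is 206.835 − 1.015 × (words ÷ sentences) − 84.6 × (syllables ÷ words). The Python sketch below applies it with a deliberately crude syllable counter (one syllable per run of vowels), which is an assumption of this example. The `readability_bonus` helper is a hypothetical name that applies the +3 bonus described above.

```python
import re


def count_syllables(word):
    """Rough heuristic: one syllable per run of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Standard Flesch Reading Ease score, or None for empty input."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        return None
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


def readability_bonus(score):
    """+3 for scores outside the 40-70 band, otherwise 0 (never a penalty)."""
    if score is None:
        return 0
    return 3 if score < 40 or score > 70 else 0
```

Very simple text can score above 100 and very dense text below 0, so callers should not assume the result is clamped to the nominal range.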
Typography & Spelling
Highlight color: ● Amber underline
Character-level anomalies and spelling
AI sometimes introduces subtle character-level artifacts: smart quotes (“ ”) vs. straight quotes (" "), em dashes without surrounding spaces, non-breaking spaces, or Unicode lookalike characters. These are artifacts of how model outputs are post-processed and can cause issues in plain-text contexts.
CleanOutput also maintains a dictionary of the 25 most common English spelling errors (e.g. "recieve" → "receive," "seperate" → "separate," "definately" → "definitely"). While AI rarely makes spelling mistakes, human editing sessions sometimes introduce them.
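Both checks can be sketched with Python's standard library. Everything here is illustrative: the function names are invented, the lookalike test simply flags any non-ASCII letter whose Unicode name does not start with "LATIN" (so it would also flag legitimate Greek or Cyrillic text), and the spelling dictionary holds only the three examples quoted above.

```python
import re
import unicodedata

FLAGGED_CHARS = {
    "\u201c": "left double smart quote",
    "\u201d": "right double smart quote",
    "\u2018": "left single smart quote",
    "\u2019": "right single smart quote",
    "\u00a0": "non-breaking space",
}

# An em dash with a non-space character directly on both sides.
TIGHT_EM_DASH = re.compile(r"(?<=\S)\u2014(?=\S)")

# Excerpt only; the real dictionary is described as 25 entries.
MISSPELLINGS = {"recieve": "receive", "seperate": "separate", "definately": "definitely"}


def find_typography_artifacts(text):
    """Return sorted (offset, description) tuples for suspicious characters."""
    hits = []
    for i, ch in enumerate(text):
        if ch in FLAGGED_CHARS:
            hits.append((i, FLAGGED_CHARS[ch]))
        elif ch.isalpha() and ord(ch) > 127:
            name = unicodedata.name(ch, "")
            if not name.startswith("LATIN"):  # crude lookalike heuristic
                hits.append((i, "possible lookalike: " + (name or "unnamed")))
    for match in TIGHT_EM_DASH.finditer(text):
        hits.append((match.start(), "em dash without surrounding spaces"))
    return sorted(hits)


def find_misspellings(text):
    """Return (offset, found, suggestion) for each known misspelling."""
    return [
        (m.start(), m.group(), MISSPELLINGS[m.group().lower()])
        for m in re.finditer(r"[A-Za-z]+", text)
        if m.group().lower() in MISSPELLINGS
    ]
```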
Note on plagiarism: True plagiarism detection requires comparing your text against an indexed database of published content — this cannot run in a browser without transmitting your text to a server, which we refuse to do. CleanOutput does flag repeated phrases within a single document (self-plagiarism / repetition), but for external comparison, we recommend Copyscape or Grammarly's plagiarism tool.