Under the hood

How CleanOutput works

A plain-English walkthrough of the nine-engine linguistic linter that powers your Clean Output Score.

The core concept

What makes writing "sound AI"?

Large language models are trained to predict the most statistically probable next token. This makes them extraordinarily consistent, and that consistency is exactly what sets their output apart from human writing. Real human prose is messy, varied, and idiosyncratic.

CleanOutput measures the dimensions where AI consistency is most detectable: word choice, sentence rhythm, structural balance, and grammatical patterns. The further your text deviates from these norms in a human direction, the higher your Clean Output Score.

All analysis runs locally in your browser using JavaScript. No text is transmitted anywhere.

Score formula (simplified)

Start at 100
− Vocabulary penalty (up to 25)
− Passive voice penalty (up to 15)
− Transition penalty (up to 10)
− Burstiness penalty (up to 20)
− Redundancy penalty (up to 10)
− Paragraph uniformity penalty (up to 10)
+ First-person bonus (up to 5)
+ Readability variance bonus (up to 3)

= Clean Output Score (0–100)

75+: Mostly Human
40–74: Mixed / Edited
< 40: Strong AI Signal
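
The formula and score bands above can be sketched in a few lines of JavaScript. This is a minimal illustration, not CleanOutput's actual code: the caps and band thresholds come from the formula, while the function names, the object shape, and the final clamp to 0–100 are assumptions made for the example.

```javascript
// Caps taken from the simplified formula; the key names are hypothetical.
const PENALTY_CAPS = {
  vocabulary: 25,
  passiveVoice: 15,
  transitions: 10,
  burstiness: 20,
  redundancy: 10,
  paragraphUniformity: 10,
};
const BONUS_CAPS = { firstPerson: 5, readabilityVariance: 3 };

// Keep a raw value between 0 and its cap; missing values count as 0.
function cap(value, max) {
  return Math.min(Math.max(value || 0, 0), max);
}

function cleanOutputScore(penalties, bonuses) {
  let score = 100;
  for (const [name, max] of Object.entries(PENALTY_CAPS)) {
    score -= cap(penalties[name], max);
  }
  for (const [name, max] of Object.entries(BONUS_CAPS)) {
    score += cap(bonuses[name], max);
  }
  // Assumption: the result is clamped so bonuses cannot push it past 100.
  return Math.round(Math.min(Math.max(score, 0), 100));
}

function scoreBand(score) {
  if (score >= 75) return "Mostly Human";
  if (score >= 40) return "Mixed / Edited";
  return "Strong AI Signal";
}
```

For example, a vocabulary penalty that would exceed its cap is held at 25, so a text with maximum vocabulary and passive voice penalties and a full first-person bonus lands at 65, in the Mixed / Edited band.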

The analysis pipeline

Text tokenization

Your input is split into sentences (by punctuation), words (by whitespace and boundary rules), and paragraphs (by double line breaks). Character-level indices are tracked throughout so every finding can be pinpointed to an exact position.
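
As a rough sketch of this step, the snippet below splits text into paragraphs, sentences, and words while keeping character offsets for each piece. The regular expressions and the names `spans` and `tokenize` are assumptions for illustration; the real boundary rules are not described here and are likely more careful (this version will, for instance, break on abbreviations such as "e.g.").

```javascript
// Hypothetical helper: every regex match with its character span.
function spans(text, pattern) {
  return Array.from(text.matchAll(pattern), (m) => ({
    text: m[0],
    start: m.index,
    end: m.index + m[0].length,
  }));
}

function tokenize(text) {
  return {
    // Paragraphs: text up to the next blank line or the end of input.
    paragraphs: spans(text, /\S[\s\S]*?(?=\n\s*\n|\s*$)/g),
    // Sentences: text up to a run of ., ! or ? (or the end of input).
    sentences: spans(text, /[^.!?\s][^.!?]*(?:[.!?]+|$)/g),
    // Words: letters and digits, allowing inner apostrophes and hyphens.
    words: spans(text, /[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*/g),
  };
}
```

Because every token carries `start` and `end`, a later stage can recover the exact source text with `text.slice(token.start, token.end)`.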

Lexical scan

Every word and phrase is checked against our tiered vocabulary database of 200+ AI-signature terms. Each match is recorded with its exact index, length, and a suggested replacement from our alternatives dictionary.
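
A dictionary scan of this kind might look like the sketch below. The three entries are made-up placeholders, not taken from the real database, and the field names (`term`, `tier`, `alternatives`) are assumptions.

```javascript
// Hypothetical sample entries; the real vocabulary database is not shown here.
const AI_TERMS = [
  { term: "delve into", tier: 1, alternatives: ["dig into", "look at"] },
  { term: "leverage", tier: 2, alternatives: ["use"] },
  { term: "tapestry", tier: 1, alternatives: ["mix"] },
];

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function lexicalScan(text, terms = AI_TERMS) {
  const findings = [];
  for (const { term, tier, alternatives } of terms) {
    // Whole-word, case-insensitive match for each term or phrase.
    const pattern = new RegExp(`\\b${escapeRegExp(term)}\\b`, "gi");
    for (const m of text.matchAll(pattern)) {
      findings.push({
        index: m.index,
        length: m[0].length,
        term,
        tier,
        suggestion: alternatives[0],
      });
    }
  }
  // Return findings in reading order.
  return findings.sort((a, b) => a.index - b.index);
}
```

Calling `lexicalScan("We leverage data to delve into it.")` with these sample entries yields two findings, at index 3 (length 8) and index 20 (length 10).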

Grammatical pattern matching

Eight regular-expression patterns scan for passive voice constructions (e.g. "is written," "was established," "can be achieved"). Padded verb phrases and nominalisations are caught by a separate phrase-level dictionary.
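
To give a feel for this kind of matching, here is a single illustrative pattern. It is not one of the eight patterns mentioned above, which are not listed here; the regex, the small exclusion list, and the name `findPassive` are assumptions. It looks for a form of "to be" followed by a word ending in -ed or -en, so it misses irregular participles such as "built".

```javascript
// Illustrative only: a form of "to be", an optional -ly adverb, then a word
// ending in -ed or -en.
const PASSIVE = /\b(?:am|is|are|was|were|be|been|being)\s+(?:\w+ly\s+)?(\w+(?:ed|en))\b/gi;

// Common words that end in -en but are not participles (hypothetical list).
const NOT_PARTICIPLES = new Set(["often", "even", "then", "seven", "ten", "when"]);

function findPassive(text) {
  const findings = [];
  for (const m of text.matchAll(PASSIVE)) {
    if (NOT_PARTICIPLES.has(m[1].toLowerCase())) continue;
    findings.push({ index: m.index, length: m[0].length, match: m[0] });
  }
  return findings;
}
```

On the examples from the text, this flags "is written", "was established", and the "be achieved" part of "can be achieved".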

Cadence analysis (burstiness)

Word counts per sentence are extracted and their standard deviation (SD) is computed. A low SD (e.g. < 3) indicates robotic uniformity: AI tends to write every sentence with 18–24 words. Human writing swings far more widely, so a high SD suggests natural rhythm.
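
The calculation itself is short. The sketch below uses a naive sentence split and the population standard deviation; both choices, and the function names, are assumptions, since the exact variant is not specified here.

```javascript
// Word count of each sentence, using a naive split on ., ! and ?.
function sentenceLengths(text) {
  return text
    .split(/[.!?]+/)
    .map((s) => s.trim().split(/\s+/).filter(Boolean).length)
    .filter((n) => n > 0);
}

// Population standard deviation (assumption: not the sample variant).
function standardDeviation(values) {
  if (values.length === 0) return 0;
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((a, v) => a + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance);
}

function burstiness(text) {
  return standardDeviation(sentenceLengths(text));
}
```

Three sentences of three words each give an SD of 0. A one-word sentence followed by an eleven-word sentence gives a mean of 6 and an SD of 5.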

Structural uniformity check

Paragraph-level word counts are measured and their standard deviation computed. AI tends to produce evenly balanced paragraphs of 60–100 words each. Significant variance here reduces the uniformity penalty and pushes your score up.
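
The same idea applied one level up can be sketched as follows. Paragraphs are taken to be blocks separated by a blank line, as in the tokenization step; the name `paragraphSpread` and the returned shape are assumptions.

```javascript
// Word count of each paragraph; paragraphs are separated by a blank line.
function paragraphWordCounts(text) {
  return text
    .split(/\n\s*\n/)
    .map((p) => p.trim().split(/\s+/).filter(Boolean).length)
    .filter((n) => n > 0);
}

// Mean and population standard deviation of the paragraph lengths.
function paragraphSpread(text) {
  const counts = paragraphWordCounts(text);
  if (counts.length === 0) return { counts, mean: 0, sd: 0 };
  const mean = counts.reduce((a, b) => a + b, 0) / counts.length;
  const variance =
    counts.reduce((a, n) => a + (n - mean) ** 2, 0) / counts.length;
  return { counts, mean, sd: Math.sqrt(variance) };
}
```

Returning the raw counts alongside the SD makes it easy to check whether most paragraphs fall inside a narrow band.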

Readability scoring

A Flesch Reading Ease calculation is applied. AI tends to calibrate to mid-range readability, so scores at the extremes (very easy or very complex) are treated as a sign of human authorship.
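
For reference, the standard Flesch Reading Ease formula is 206.835 − 1.015 × (words ÷ sentences) − 84.6 × (syllables ÷ words). The sketch below applies it with a rough vowel-group syllable counter; that counter and the function names are assumptions, and a production implementation would probably count syllables more carefully.

```javascript
// Rough syllable estimate: count vowel groups, ignoring a trailing "e".
function countSyllables(word) {
  const w = word.toLowerCase().replace(/[^a-z]/g, "");
  if (w.length === 0) return 0;
  const groups = w.replace(/e$/, "").match(/[aeiouy]+/g);
  return Math.max(groups ? groups.length : 0, 1);
}

function fleschReadingEase(text) {
  const words = text.match(/[A-Za-z]+(?:'[A-Za-z]+)*/g) || [];
  if (words.length === 0) return null;
  const sentences = Math.max(
    text.split(/[.!?]+/).filter((s) => s.trim()).length,
    1
  );
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  return (
    206.835 -
    1.015 * (words.length / sentences) -
    84.6 * (syllables / words.length)
  );
}
```

Higher values mean easier text. Very short, simple sentences can score above 100, which is the "very easy" extreme.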

First-person and voice analysis

The ratio of first-person pronouns ("I") to depersonalising language ("one," "the reader") is measured. AI typically avoids committing to a personal voice, or overcorrects into formal impersonality.
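
One way to sketch this measurement is shown below. The word lists extend the examples in the text ("I", "one", "the reader") with a few obvious relatives and are assumptions, as is the choice to report a bounded share rather than a raw ratio so that text with no impersonal terms does not divide by zero. Note that "one" used as a numeral will be counted too.

```javascript
// "I" is matched case-sensitively so that "i.e." is not counted.
const FIRST_PERSON = /\bI\b|\b(?:[Mm]e|[Mm]y|[Mm]ine|[Mm]yself)\b/g;
// Depersonalising terms from the text; a real list would be longer.
const IMPERSONAL = /\b(?:one|the reader)\b/gi;

function voiceRatio(text) {
  const firstPerson = (text.match(FIRST_PERSON) || []).length;
  const impersonal = (text.match(IMPERSONAL) || []).length;
  const total = firstPerson + impersonal;
  // Share of personal voice among all voice markers, from 0 to 1.
  const share = total === 0 ? 0 : firstPerson / total;
  return { firstPerson, impersonal, share };
}
```

For "I think my draft works, but the reader may disagree." this counts two first-person markers and one impersonal marker, a share of about 0.67.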

Score computation & deduplication

Weighted penalties are subtracted from a baseline of 100. Overlapping findings are merged and deduplicated so no position is double-counted. The result is a single integer: your Clean Output Score.
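
The merging step can be pictured as a standard interval merge. In the sketch below, findings are assumed to be objects with `index`, `length`, and `category` fields (names chosen for illustration); overlapping spans are combined into one so that no character position is counted twice.

```javascript
function mergeFindings(findings) {
  // Sort by start position; for equal starts, longer spans come first.
  const sorted = [...findings].sort(
    (a, b) => a.index - b.index || b.length - a.length
  );
  const merged = [];
  for (const f of sorted) {
    const last = merged[merged.length - 1];
    const end = f.index + f.length;
    if (last && f.index < last.index + last.length) {
      // Overlap: extend the previous span and record a new category once.
      last.length = Math.max(last.index + last.length, end) - last.index;
      if (!last.categories.includes(f.category)) {
        last.categories.push(f.category);
      }
    } else {
      merged.push({
        index: f.index,
        length: f.length,
        categories: [f.category],
      });
    }
  }
  return merged;
}
```

Spans that merely touch (one ends where the next begins) are kept separate, since they do not share any position.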

Visual rendering

Findings are rendered as color-coded inline highlights over your text. Each highlight is interactive: click any flagged phrase to see why it was caught and choose from 2–4 humanized alternatives. Filters let you isolate any category of issue.
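
A minimal sketch of how findings could be turned into highlight markup is shown below. It assumes the findings are already sorted and non-overlapping, and the `hl-` class prefix, the `data-index` attribute, and the function names are hypothetical rather than the product's actual markup.

```javascript
// Escape text so user input cannot inject markup.
function escapeHtml(s) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return s.replace(/[&<>"']/g, (c) => map[c]);
}

// Assumes findings are sorted by index and do not overlap.
function renderHighlights(text, findings) {
  let html = "";
  let cursor = 0;
  for (const f of findings) {
    // Plain text between the previous highlight and this one.
    html += escapeHtml(text.slice(cursor, f.index));
    const flagged = escapeHtml(text.slice(f.index, f.index + f.length));
    html += `<mark class="hl-${f.category}" data-index="${f.index}">${flagged}</mark>`;
    cursor = f.index + f.length;
  }
  return html + escapeHtml(text.slice(cursor));
}
```

A click handler could then read `data-index` from the clicked element to look up the matching finding and its suggested alternatives.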

Ready to try it?

Paste any text and get your score in under a second.

Open the Scanner →