//textpurify
Now in public beta

Clean, analyze and optimize AI text

TextPurify removes hidden Unicode artifacts, scores readability and SEO structure, analyzes sentiment, and checks semantic relevance — entirely in your browser.

// before → after
INPUT

It's important to note that AI-generated content contains various Unicode artifacts -- invisible characters, non-standard hyphens, and smart quotes.

Detected:
zero-width space, smart quotes x2, non-breaking hyphens x2, em dash
OUTPUT

It's important to note that AI-generated content contains various Unicode artifacts -- invisible characters, non-standard hyphens, and smart quotes.

5 artifacts removed, 0 invisible chars, 100% text preserved
// TOOLS

Everything you need in one place

Six specialized tools built for teams that work with AI-generated content every day.

Cleaner

Detect and remove zero-width characters, homoglyphs, non-standard spaces, and typographic artifacts from any text.

Inspector

Explore every character in your text with code point details, Unicode category, and script origin.
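
A character inspector of this kind can be sketched in a few lines. The snippet below is an illustration, not TextPurify's actual code: the `inspect` function, the `CharInfo` shape, and the coarse category labels are all assumed names, and it relies on standard Unicode property escapes in JavaScript regular expressions. Script origin is left out to keep the sketch short.

```typescript
// Hypothetical sketch: names and category labels are assumptions.
interface CharInfo {
  char: string;
  codePoint: string; // formatted like "U+200B"
  category: string; // coarse Unicode general category label
}

// Coarse general categories, tested with Unicode property escapes.
const CATEGORY_TESTS: Array<[string, RegExp]> = [
  ["Letter", /^\p{L}$/u],
  ["Mark", /^\p{M}$/u],
  ["Number", /^\p{N}$/u],
  ["Punctuation", /^\p{P}$/u],
  ["Symbol", /^\p{S}$/u],
  ["Separator", /^\p{Z}$/u],
  ["Format (invisible)", /^\p{Cf}$/u],
  ["Control", /^\p{Cc}$/u],
];

function inspect(text: string): CharInfo[] {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(text).map((char) => {
    const cp = char.codePointAt(0) ?? 0;
    const hit = CATEGORY_TESTS.find(([, re]) => re.test(char));
    return {
      char,
      codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
      category: hit ? hit[0] : "Other",
    };
  });
}
```

Calling `inspect("a\u200B")` reports a Latin letter followed by an invisible format character, which is exactly the kind of thing that is hard to spot by eye.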

SEO & Health

Readability scores (Flesch-Kincaid / LIX), keyword density, water word analysis, passage structure, and a composite Health Score.
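
Keyword density is the simplest of these metrics: the share of words in a text that match a given keyword. A minimal sketch, assuming single-word keywords and case-insensitive exact matching (the `keywordDensity` name and the tokenization rule are illustrative assumptions, not the app's implementation):

```typescript
// Hypothetical sketch: tokenization and matching rules are assumptions.
function keywordDensity(text: string, keyword: string): number {
  // Treat runs of letters, digits, and apostrophes as words.
  const tokens = text.toLowerCase().match(/[\p{L}\p{N}']+/gu) ?? [];
  if (tokens.length === 0) return 0;
  const target = keyword.toLowerCase();
  const hits = tokens.filter((t) => t === target).length;
  return hits / tokens.length; // fraction between 0 and 1
}
```

For example, "Unicode artifacts hide in unicode text" contains the keyword "unicode" twice in six words, a density of one third.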

Bulk Upload

Process dozens of files at once. Upload .txt or .md files and get a clean version of each with an artifact report.

API & Keys

Integrate Unicode cleaning into your pipeline. Generate API keys, call the REST endpoint, and process text programmatically.

History

Every text you process is saved locally on your device. Restore any previous version with one click — no account required.

Open the app
// AI INTELLIGENCE

Deeper analysis, in your browser

On-device ML models run locally — your text never leaves your machine.

Health Score

A single 0–100 composite score combining artifact cleanliness, readability, water density, and passage structure. Know at a glance whether your text is publish-ready.
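
One common way to build a composite like this is a weighted average of the sub-scores. The sketch below assumes each sub-score is already on a 0 to 100 scale where higher is better; the weights and the `healthScore` name are invented for illustration and are not TextPurify's published formula.

```typescript
// Hypothetical sketch: the weights below are assumptions, not the real formula.
interface SubScores {
  cleanliness: number; // 0-100, higher means fewer artifacts
  readability: number; // 0-100
  waterDensity: number; // 0-100, higher means fewer filler words
  passageStructure: number; // 0-100
}

const WEIGHTS: SubScores = {
  cleanliness: 0.4,
  readability: 0.25,
  waterDensity: 0.15,
  passageStructure: 0.2,
};

function healthScore(s: SubScores): number {
  const raw =
    s.cleanliness * WEIGHTS.cleanliness +
    s.readability * WEIGHTS.readability +
    s.waterDensity * WEIGHTS.waterDensity +
    s.passageStructure * WEIGHTS.passageStructure;
  // Clamp to 0-100 and round to a whole number for display.
  return Math.round(Math.min(100, Math.max(0, raw)));
}
```

Because the assumed weights sum to 1, a text that scores 100 on every component gets a Health Score of 100.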

Sentiment Analysis

An on-device DistilBERT model classifies your text as Positive, Negative, or Neutral with a confidence score. It runs in your browser, so no text is sent to a server.

Semantic Relevance

Enter a topic or query and see how semantically relevant your text is. Sentence embeddings find the most and least relevant paragraphs.

Passage & AEO Score

Measures heading structure, question headings, answer-first patterns, and paragraph length — the factors that determine whether your content gets surfaced by AI overviews.
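
As an illustration of one of these signals, the share of headings phrased as questions can be computed directly from Markdown source. This is a rough sketch under assumptions: it only recognizes ATX-style `#` headings, and `questionHeadingRatio` is a hypothetical name rather than part of the product.

```typescript
// Hypothetical sketch: only ATX-style Markdown headings are recognized.
function questionHeadingRatio(markdown: string): number {
  const headings = markdown
    .split(/\r?\n/)
    .filter((line) => /^#{1,6}\s+\S/.test(line));
  if (headings.length === 0) return 0;
  // A heading counts as a question if it ends with a question mark.
  const questions = headings.filter((h) => h.trim().endsWith("?")).length;
  return questions / headings.length;
}
```

A real passage score would combine this with the other factors listed above, such as paragraph length and answer-first patterns.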

// what we catch

Four categories of Unicode artifacts.

01

Zero-Width Characters

Invisible characters with no width — U+200B, U+FEFF, U+200C and more. AI tools insert them silently and they can break search, clipboard copy, and text processing.
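
Removing these characters is straightforward once they are listed. The following is a minimal sketch, not the product's detection table: the character set is a small illustrative selection and `stripZeroWidth` is an assumed name. Note that U+200C and U+200D carry meaning in some scripts and in emoji sequences, so a production cleaner would need to be more careful than this.

```typescript
// Hypothetical sketch: an illustrative subset of zero-width code points.
// U+200B zero-width space, U+200C non-joiner, U+200D joiner,
// U+2060 word joiner, U+FEFF byte order mark / zero-width no-break space.
const ZERO_WIDTH = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

function stripZeroWidth(text: string): { cleaned: string; removed: number } {
  const matches = text.match(ZERO_WIDTH);
  return {
    cleaned: text.replace(ZERO_WIDTH, ""),
    removed: matches ? matches.length : 0,
  };
}
```

For the input `"zero\u200Bwidth\uFEFF"` this returns the cleaned string `"zerowidth"` and a removal count of 2.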

02

Non-Standard Spaces

Unicode has more than a dozen space characters beyond U+0020. Non-breaking spaces (U+00A0), em spaces (U+2003), and thin spaces (U+2009) are easy to mistake for an ordinary space but differ in width, line-breaking behavior, and how parsers treat them.
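
A simple way to normalize these is to map every character in the Unicode "space separator" category (Zs) to a plain U+0020. The sketch below assumes that replacing all of them is acceptable, which is not always true (a non-breaking space is sometimes intentional); `normalizeSpaces` is a hypothetical name.

```typescript
// Hypothetical sketch: maps every Zs character to an ASCII space.
// \p{Zs} also matches U+0020 itself, which is replaced with itself.
const SPACE_SEPARATORS = /\p{Zs}/gu;

function normalizeSpaces(text: string): string {
  return text.replace(SPACE_SEPARATORS, " ");
}
```

With this, `"a\u00A0b\u2003c\u2009d"` becomes `"a b c d"`.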

03

Typographic Substitutions

Smart quotes (“”), em dashes (—), and ellipses (…) replace their ASCII equivalents in AI output. They can break code, CSV parsing, and plain-text pipelines.
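
Reverting these substitutions amounts to a lookup table. A minimal sketch, assuming the mapping shown here (em dash to a double hyphen, as in the before/after example at the top of the page); the table and the `toAsciiPunctuation` name are illustrative, not the product's actual rules.

```typescript
// Hypothetical sketch: an assumed mapping from typographic to ASCII punctuation.
const ASCII_EQUIVALENTS: Record<string, string> = {
  "\u2018": "'", // left single quote
  "\u2019": "'", // right single quote / apostrophe
  "\u201C": '"', // left double quote
  "\u201D": '"', // right double quote
  "\u2013": "-", // en dash
  "\u2014": "--", // em dash
  "\u2026": "...", // horizontal ellipsis
};

function toAsciiPunctuation(text: string): string {
  return text.replace(
    /[\u2018\u2019\u201C\u201D\u2013\u2014\u2026]/g,
    (ch) => ASCII_EQUIVALENTS[ch] ?? ch,
  );
}
```

Only the listed code points are touched, so accented letters and other legitimate non-ASCII text pass through unchanged.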

04

Homoglyphs

Cyrillic letters like а (U+0430) look identical to Latin a but are different code points. Used for obfuscation, SEO spam, or introduced accidentally by multilingual AI models.
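
One heuristic for spotting homoglyphs is to flag words that mix Latin and Cyrillic letters, since legitimate words rarely do. The sketch below is an assumption about how such a check could work, not a description of TextPurify's detector, and `findMixedScriptWords` is a made-up name. It will miss words written entirely in look-alike letters.

```typescript
// Hypothetical sketch: flags words containing both Latin and Cyrillic letters.
function findMixedScriptWords(text: string): string[] {
  const words = text.match(/[\p{L}\p{M}]+/gu) ?? [];
  return words.filter(
    (w) => /\p{Script=Latin}/u.test(w) && /\p{Script=Cyrillic}/u.test(w),
  );
}
```

A word like `"p\u0430yment"` (with a Cyrillic а in second position) is flagged, while a word written fully in Cyrillic or fully in Latin is not.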

05

Not an AI detector

TextPurify does not try to guess whether text was written by AI. It detects specific Unicode artifacts that AI tools happen to leave behind. Works on any text, from any source.

06

Your text never leaves your browser

Detection and cleaning run entirely in your browser using JavaScript. Nothing is sent to a server for analysis. Your content stays private.

// BLOG

Learn about text quality

Deep dives into Unicode, readability, AEO, and why AI text is full of hidden characters.

View all articles
// PRIVACY BY DESIGN

Your text never leaves your browser

Every computation — from Unicode scanning to sentiment analysis — runs locally. Open DevTools and watch the network tab: no text is ever transmitted.

Unicode detection & cleaning

Pure TypeScript, runs synchronously in the main thread.

The entire artifact detection engine is a ~8 KB JS module. It scans every character against Unicode category tables without any network calls. Results appear in under 5 ms for texts up to 100,000 characters.

Readability & SEO metrics

Flesch-Kincaid, LIX, Gunning Fog, passage analysis — all computed locally.

All scoring formulas are implemented in plain TypeScript. No external API is called. The keyword and water-word analysis runs against static dictionaries bundled with the app.
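
For reference, two of these formulas are short enough to show in full. LIX is words per sentence plus the percentage of words longer than six letters; the Flesch-Kincaid grade level is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below uses the standard formulas, but the tokenizer, the sentence splitter, and the vowel-group syllable counter are simplifying assumptions, and all function names are hypothetical.

```typescript
// Hypothetical sketch: standard formulas with a deliberately naive tokenizer.
function tokenize(text: string): string[] {
  return text.match(/[\p{L}']+/gu) ?? [];
}

function countSentences(text: string): number {
  const parts = text.split(/[.!?]+/).filter((s) => s.trim().length > 0);
  return Math.max(parts.length, 1);
}

// Rough English-only estimate: one syllable per group of vowels.
function countSyllables(word: string): number {
  const groups = word.toLowerCase().match(/[aeiouy]+/g);
  return Math.max(groups ? groups.length : 0, 1);
}

function lix(text: string): number {
  const words = tokenize(text);
  if (words.length === 0) return 0;
  const longWords = words.filter((w) => w.length > 6).length;
  return words.length / countSentences(text) + (100 * longWords) / words.length;
}

function fleschKincaidGrade(text: string): number {
  const words = tokenize(text);
  if (words.length === 0) return 0;
  const syllables = words.reduce((n, w) => n + countSyllables(w), 0);
  return (
    0.39 * (words.length / countSentences(text)) +
    11.8 * (syllables / words.length) -
    15.59
  );
}
```

For "The cat sat. The dog ran." (six one-syllable words in two sentences) this gives a LIX of 3 and a grade level of about -2.62, which shows how low both scores go for very simple text.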

Sentiment analysis

DistilBERT model running via WebAssembly (ONNX Runtime Web).

The ~67 MB model weights are downloaded once from the Hugging Face CDN and cached in your browser's IndexedDB. After the first load, inference is fully offline. Your text is passed only to the local WASM runtime, never to a remote endpoint.

Semantic relevance

all-MiniLM-L6-v2 sentence embeddings, computed in a Web Worker.

A ~23 MB embedding model runs in a background Web Worker so it never blocks the UI. Cosine similarity between your query and each paragraph is computed in-memory. The Worker is terminated when you leave the tool.
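
The similarity step itself is plain arithmetic once the embeddings exist. A minimal sketch, assuming the embeddings arrive as equal-length number arrays; `cosineSimilarity` and `rankParagraphs` are illustrative names, and the model loading and Web Worker plumbing are omitted.

```typescript
// Hypothetical sketch: cosine similarity over precomputed embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / Math.sqrt(normA * normB);
}

// Returns paragraph indices ordered from most to least similar to the query.
function rankParagraphs(query: number[], paragraphs: number[][]): number[] {
  return paragraphs
    .map((vec, index) => ({ index, score: cosineSimilarity(query, vec) }))
    .sort((x, y) => y.score - x.score)
    .map((r) => r.index);
}
```

The first and last entries of the ranking correspond to the most and least relevant paragraphs for the query.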

// how it works

Three steps. Zero complexity.

1

Paste your text

Drop in any AI-generated or AI-assisted text. Blog posts, emails, reports — anything.

2

Detect artifacts

TextPurify scans for patterns across four Unicode artifact categories, flagging every issue instantly.

3

Get clean text

Copy your cleaned text with one click. Same words, zero hidden characters, no more surprises.

Stop the Unicode aftertaste.

Free tier: up to 5,000 characters without an account. Sign up free for 10,000.

Open TextPurify