Best Practice: A 'Clean Content' Workflow for Web Content Editors

Text copied from a word processor or an AI tool often carries hidden formatting code with it. This hidden code, or HTML cruft, is invisible to editors but creates messy markup behind the scenes. HTML cruft can break how your content displays in the browser, block Content Central exports, or trigger errors.

Follow these four steps to add clean, well-formatted content to your pages.

Step 1: Compose your content

Draft your content in Word, Notepad, or your preferred text editor or word processor.

Step 2: Scrub your text

Before pasting content into a Text Box or Basic Text content block, remove all formatting and HTML cruft using one of these methods:

  • Plain text editor: Paste your text into Notepad (Windows) or TextEdit in plain text mode (Mac), then copy the cleaned text back out.
     
  • Online tool: Several free tools are available. Paste your text in, clear the formatting, and copy the cleaned text out.
    • Word to HTML — paste text into the 'Word' screen, click to the 'HTML' screen, select all, click the Clear Formatting button 
    • HTML Cleaner — paste text, click the Clean HTML button
    • Word to Clean HTML — strips invalid and proprietary tags, leaves clean web-safe text
    • HTML Washer — strips inline styles, scripts, and unnecessary markup in one click; no account required
    • WordToHTML.net — paste or upload a .docx file and convert to clean HTML
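
For editors who are comfortable running a script, the scrubbing step can also be sketched in a few lines of Python. This is an illustration only, not part of SiteMasonry CMS or any of the tools listed above: the class name TextOnly and the function name scrub_to_plain_text are made up for this example, and it relies only on Python's standard html.parser module. It keeps the visible text and throws away every tag, attribute, script, and style block.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Hypothetical helper: collects visible text, discards all tags and attributes."""

    # Tags that should start a new line in the plain-text output.
    BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}
    # Tags whose contents are never visible text.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self.skip_depth == 0:
            self.parts.append(data)


def scrub_to_plain_text(html: str) -> str:
    """Return the text of an HTML fragment with all markup removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace and drop blank lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    pasted = (
        '<p class="MsoNormal" style="margin:0">'
        '<span style="font-family:Calibri">Hello <b>world</b></span></p>'
        "<p>Second paragraph</p>"
    )
    print(scrub_to_plain_text(pasted))
```

The result is the same kind of plain text you get from the Notepad method: all formatting is gone, so headings, bold, and lists still need to be reapplied in Step 4.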

Step 3: Paste into SiteMasonry CMS

Paste the cleaned text into a Body text area or Basic Text Block.

Step 4: Format using the WYSIWYG toolbar

Use the text block's WYSIWYG toolbar to apply formatting such as headings, bold, and lists. Formatting applied in Word or any other word processor will not carry over correctly, which is why formatting directly in SiteMasonry CMS is the final step.

A note on AI tools

AI tools such as Claude or ChatGPT can be useful for drafting content, but do not paste output from an AI tool directly into SiteMasonry CMS. Always run AI-generated text through the scrubbing step first. Here is why:

  • AI tools can leave hidden metadata in copied text. Copying formatted text from an AI tool can carry invisible code markers into your HTML, including data-start and data-end attributes embedded in paragraph tags. These attributes are AI residue and serve no purpose on a published web page.
     
  • It bloats your markup. Content copied from AI tools often carries hidden baggage including inline style attributes and empty span wrappers in addition to AI-specific markers.
     
  • It applies to any CMS with a visual editor. This is not a SiteMasonry CMS issue. Any platform that allows you to paste formatted content and publish it is affected.
     
  • It can signal AI-generated content to search engines and auditors. Search engines crawl source code, not just what is visible on screen, so leftover AI attributes are exposed to them. The attributes also add noise to your markup and may raise flags during content audits.
     
  • The problem is invisible in the editor. The hidden attributes do not appear in the visual editor or on the published page. You have to inspect the source code to find them.
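
To make the residue described above easier to picture, here is a rough Python sketch that scans a pasted HTML fragment and reports the three kinds of cruft mentioned: data-start and data-end attributes, inline style attributes, and empty span wrappers. It is an illustration under stated assumptions, not a SiteMasonry CMS feature: the names CruftFinder and find_cruft are hypothetical, "empty span" is taken here to mean a span with no text or child elements inside it, and the sample markup is invented for the example.

```python
from html.parser import HTMLParser

# Attribute names mentioned in this article as AI residue.
AI_MARKER_ATTRS = {"data-start", "data-end"}


class CruftFinder(HTMLParser):
    """Hypothetical helper: records cruft found while reading an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.findings = []
        # One flag per open <span>: has it contained text or a child element yet?
        self._span_has_content = []

    def handle_starttag(self, tag, attrs):
        for name, _value in attrs:
            if name in AI_MARKER_ATTRS:
                self.findings.append(f"<{tag}> carries {name}")
            elif name == "style":
                self.findings.append(f"<{tag}> carries an inline style")
        # Any tag opened inside a span counts as content for that span.
        if self._span_has_content:
            self._span_has_content[-1] = True
        if tag == "span":
            self._span_has_content.append(False)

    def handle_data(self, data):
        # Non-whitespace text inside a span counts as content.
        if self._span_has_content and data.strip():
            self._span_has_content[-1] = True

    def handle_endtag(self, tag):
        if tag == "span" and self._span_has_content:
            if not self._span_has_content.pop():
                self.findings.append("empty <span> wrapper")


def find_cruft(html: str) -> list:
    """Return a list of human-readable cruft findings for an HTML fragment."""
    finder = CruftFinder()
    finder.feed(html)
    finder.close()
    return finder.findings


if __name__ == "__main__":
    # Invented sample of what pasted AI output might look like in source view.
    sample = (
        '<p data-start="0" data-end="42">'
        '<span style="color: black">Hello</span><span></span></p>'
    )
    for finding in find_cruft(sample):
        print(finding)
```

A fragment that has been through the scrubbing step and then formatted with the WYSIWYG toolbar should produce no findings from a check like this.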

Treat AI-generated content the same as Word content: compose, scrub, paste, then format.
