How to strip tags in the Rich Text Editor – Not breaking Sitecore

This is the third post in the series about the Rich Text Editor (RTE) in Sitecore. The first post, which covered the basics of setting up the RTE, can be found here.

Copy-paste wrecks your day

So, you have configured your editor and given access to the HTML editing buttons only to the appropriate people. If you think that is the end of it, I'm sorry to disappoint you. Editors are a little bit lazy and don't want to type a lot, so they prefer to copy and paste. And that is where our RTE can lose its head.

Let’s imagine that editors prefer to prepare content in Microsoft Word (in my experience, this happens very often). When they paste Word-formatted text into the RTE, the result is rather ugly: the markup gets cluttered with Word-specific tags and inline styles.

Word-formatted HTML (it rhymes with ‘Hell’ for a reason)

Luckily, the Rich Text Editor is a powerful tool and comes with a mechanism for stripping tags on paste.

How to strip tags automatically

Under the hood of the RTE lies the Telerik RadEditor. This editor can clean up content pasted from MS Word, as described in the following article: http://docs.telerik.com/devtools/aspnet-ajax/controls/editor/managing-content/pasting-content/clean-ms-word-formatting

The article makes it clear how we can handle pasting: RadEditor has a StripFormattingOptions property that controls which formatting is stripped from pasted content. It saves us from the shame.

What NOT to do

Some guides on the internet will tell you that you can simply go into the file system, to “www\sitecore\shell\Controls\Rich Text Editor”, and start changing the markup and scripts of the editor. While this might work, it is NOT the way to do it. These files ship with Sitecore and can be overwritten at any time, whenever you upgrade Sitecore or install a Sitecore patch. As a rule, you should never edit files that ship with Sitecore; following this rule in real projects will save you many headaches in the future. Use a component-based approach instead: write custom code in well-defined components that are added on top of Sitecore. This way we don’t break Sitecore, and the solution can still be upgraded or patched.

Using the EditorConfiguration class and SetupEditor method

Sitecore supplies a way to configure the editor without changing the files that come with Sitecore. You can extend the ‘Sitecore.Shell.Controls.RichTextEditor.EditorConfiguration’ class and override the ‘SetupEditor’ method, where you can apply all the settings you need to the editor. This method runs every time the editor is loaded.

To strip all Word formatting, simply do this:

using Telerik.Web.UI;
using Sitecore.Data.Items;
 
namespace CodeBuildPlay
{
  public class RichTextEditorConfiguration : Sitecore.Shell.Controls.RichTextEditor.EditorConfiguration
  {
    public RichTextEditorConfiguration(Item profile)
      : base(profile)
    {
    }
 
    protected override void SetupEditor()
    {
      base.SetupEditor();
      // Strip Word-specific markup, including fonts and text sizes, from pasted content.
      Editor.StripFormattingOptions = EditorStripFormattingOptions.MSWordRemoveAll;
      // Set additional properties of the Editor here if needed.
    }
  }
}
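The class above only takes effect once Sitecore is told to use it. A minimal sketch of one way to do that is an include patch file in App_Config\Include (the file name is up to you) that points the HtmlEditor.DefaultConfigurationType setting at the new type. The assembly name CodeBuildPlay is an assumption here; replace it with the name of the assembly your class is compiled into.

```xml
<?xml version="1.0"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Use the custom configuration class for the Rich Text Editor.
           "CodeBuildPlay" as the assembly name is an assumption. -->
      <setting name="HtmlEditor.DefaultConfigurationType">
        <patch:attribute name="value">CodeBuildPlay.RichTextEditorConfiguration, CodeBuildPlay</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```

Because this changes the default, it affects every RTE profile. If only some fields should strip Word formatting, the configuration type can also be overridden for an individual HTML editor profile instead.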

Alas, a caveat

All this applies to the Rich Text Editor. Unfortunately, it is still possible to paste HTML directly into the Page Editor in Sitecore. Anything can be pasted into the inline (web edit) version of a Rich Text field, and nothing gets stripped. It is beyond me how Sitecore missed this, but in a support case I opened they confirmed that there is no way around it at the moment.

Text pasted from Word in the Page Editor

Enjoy!
