guardian / scribe

DEPRECATED: A rich text editor framework for the web platform

Home Page:http://guardian.github.io/scribe/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Break tags inserted everywhere without sanitizer plug-in

mindplay-dk opened this issue · comments

Trying out Scribe for the first time, I immediately ran into this problem: upon making the first change, such as selecting some text and making it bold, break tags are inserted everywhere.

This problem is easy to repeat - start with the scribe demo and simply comment out the scribe.use(scribePluginSanitizer({...})) call and the scribe.setContent() call at the end, and insert some longer content - I used this:

    <div class="scribe">
        <h2>This is the first editor instance</h2>
        <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec facilisis diam in odio iaculis blandit. Nunc eu
            mauris sit amet purus viverra gravida ut a dui. Lorem ipsum dolor sit amet, consectetur adipiscing elit.
            Donec facilisis diam in odio iaculis blandit. Nunc eu mauris sit amet purus viverra gravida ut a dui.</p>
        <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec facilisis diam in odio iaculis blandit. Nunc eu
            mauris sit amet purus viverra gravida ut a dui.</p>
        <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec facilisis diam in odio iaculis blandit. Nunc eu
            mauris sit amet purus viverra gravida ut a dui. Lorem ipsum dolor sit amet, consectetur adipiscing elit.
            Donec facilisis diam in odio iaculis blandit. Nunc eu mauris sit amet purus viverra gravida ut a dui.</p>
    </div>

After I make the first change, I have this:

<p>
        <br></p><h2>This is the first editor instance</h2><p>
        <br></p><p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec facilisis diam in odio iaculis blandit. Nunc eu
            mauris sit amet purus viverra gravida ut a dui. Lorem ipsum dolor sit amet, consectetur adipiscing elit.
            Donec facilisis diam in odio iaculis blandit. Nunc eu mauris sit amet purus viverra gravida ut a dui.</p><p>
        <br></p><p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec facilisis diam in odio iaculis blandit. Nunc eu
            mauris sit amet purus viverra gravida ut a dui.</p><p>
        <br></p><p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec facilisis diam in odio iaculis blandit. Nunc eu
            mauris sit amet purus viverra gravida ut a dui. Lorem ipsum dolor sit amet, consectetur adipiscing elit.
            Donec facilisis diam in odio iaculis blandit. Nunc eu mauris sit amet purus viverra gravida ut a dui.</p><p>
    <br></p>

When it starts up, it's fine - highlight some text, make it bold, and <br> tags appear everywhere.

I tried every 2.x release and back to the last 1.x release, all exhibit the same problem.

If the editor doesn't work correctly without this plug-in, shouldn't it be bundled as a core plug-in?

I'm using the current version of Chrome on Win10.

The Scribe Sanitizer is not a mandatory plugin and shouldn't be needed unless you want to control the markup. However if you cut and paste into the Scribe instance without sanitising though you will get the default browser behaviour which is likely to not be what you were expecting. Chrome will create a lot of blocks containing only break return elements.

I do not feel we can easily say what the core behaviour of the base shim should be regarding this, we're unlikely to get it right and end up mediating requests between users. Even agreeing what is a block level element hasn't been easy! Using a highly configurable plugin seems the right thing to do here.

I will add a suitable test page to the project to help test the behaviour and compare it to the base content editable.

If you want to try and submit a new core plugin so we can discuss what the basic rules should be please do so. We'd be happy to review it.

This isn't about the behavior of the base shim - I consider it more of a framework for editor features than an editor in it's own right.

But in this case, I'm using a simple, well-structured piece of content, and performing a simple operation on it, making some text bold - and I have every plugin from the example page enabled, except for the sanitizer. The problem is also repeatable in a bare editor with no plug-ins.

Any of so far 6 other editors tested exhibited no behavior like this - ranging from mammoth editors like CK down to bare contenteditable, I haven't seen editor behavior like this.

Even the core editor has a lot of builtin plug-ins though, so most likely, this is caused by one of those, and the sanitizer fixes it? I will try to narrow it down by disabling core plug-ins...

Thanks!

Remember to compare the behaviour against the raw content editable as well.

Remember to compare the behaviour against the raw content editable as well.

I did.

Anyways, there is a bug here, I'm pretty sure of it.

I narrowed it down to the enforce-p-elements plug-in - remove this and it works normally, without the sanitizer plug-in to clean things up after the fact.

Looks like there's a problem with the filter() operation in wrapChildNodes() - try adding a console.log() statement like this:

          .filter(function(node) {
            console.log("filter", node.nodeType === Node.TEXT_NODE || !nodeHelpers.isBlockElement(node), node);
            return node.nodeType === Node.TEXT_NODE || !nodeHelpers.isBlockElement(node);
          })

It looks from the console output like you're selecting all-whitespace text nodes between the block elements - and then wrapping them in <p> tags.

I don't think that was the intention?

I changed the filter() call as follows:

          .filter(function(node) {
            return node.nodeType === Node.TEXT_NODE
              ? !/^\s+$/.test(node.nodeValue)
              : !nodeHelpers.isBlockElement(node);
          })

This will strip all-whitespace text nodes, rather than wrapping them.

Doing a console.log(bin.innerHTML) after the wrapChildNodes() and this appears to fix it.

Does this look like an acceptable solution?

I can't think why we'd want to retain the whitespace, so I'd be happy to give this a go.

Cool.

Just one thing I'm not 100% sure about - is this the right place to do that?

Would it make more sense to clean the content initially, when the editor is created? Or does it make more sense the way I've done it here, cleaning redundant nodes on-the-fly, while editing?

Oh, I'm also not 100% sure about the reg-ex, should it be ^\s+$ or ^\s*$?

I think that question is only relevant if we go with the on-fly-solution, but technically we might have either consecutive text-nodes, or empty (zero length) text nodes - those can occur when you make changes to a document.

A possible solution to that problem, would be to add bin.normalize() before wrapChildNodes(bin), to make sure we're working with normalized HTML. (note that this won't remove all-whitespace text nodes, only zero-length text nodes, and consolidating consecutive text-nodes.)

It'll be easier for future readers to put it in the filter and do the work on the fly.

Normalising the HTML seems sensible when I think about it but I'd probably need to see some code and give it a go myself. Consolidation, then filtering seems sensible.

Sounds reasonable.

Can you take it from here?

I would have liked to submit a patch, but I'm on Windows, and the build scripts have a lot of (binary) dependencies - I haven't been able to get the project to build...

Yep, I'll take a look

Thanks :-)

Fixed in v2.2.0

Awesome, will test in the morning, thanks!