css4j / css4j

CSS parser with Event and Object Model APIs, a DOM wrapper and a CSS-aware DOM implementation. Written in the Java™ language.

Home Page: https://css4j.github.io/

[DOM] Add a getter for innerText

carlosame opened this issue

The innerText DOM property was introduced by MS Internet Explorer and later adopted, without much enthusiasm, by the other major browsers. It nevertheless has its defenders and its use cases. See "The poor, misunderstood innerText" by Juriy Zaytsev for background information.

The property is specified in the HTML standard, although I did not use that specification to write the code:

https://html.spec.whatwg.org/multipage/dom.html#the-innertext-idl-attribute

As the aforementioned blog post says, a typical use case is rich text editing in a browser. However, the property is useful in any application that works with a rich-text representation of a document fragment (with all its HTML tags and styles) but also needs a plain text version of that content, for example to store in a database field or a plain text document.

This library focuses on non-browser use cases, and this implementation is not intended to be completely equivalent to what web browsers do. For example, I try to avoid some of the empty lines that often appear in the innerText given by browsers. A white space is also added before a list item, as suggested by J. Zaytsev's blog post (which has been very useful), although only if the list-style-position CSS property is set to inside.
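To illustrate the general idea of a plain text rendering that breaks lines at block boundaries without leaving empty lines, here is a minimal sketch. It is not the code from the patch and does not use the css4j API: it relies only on the JDK's `org.w3c.dom` classes. The class `InnerTextSketch`, its methods `parse` and `innerText`, and the hard-coded `BLOCKS` set are hypothetical names introduced for this example. A CSS-aware implementation would presumably look at computed styles (such as `display`) instead of a fixed list of element names, and the list item spacing described above is not covered here.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class InnerTextSketch {

    // Assumption: a fixed set of element names stands in for the computed
    // 'display' value that a CSS-aware DOM would consult.
    private static final Set<String> BLOCKS =
            Set.of("div", "p", "ul", "ol", "li", "h1", "h2", "h3");

    /** Parses a well-formed XML/XHTML fragment and returns its root element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** Returns a plain text rendering with one line per block of text. */
    static String innerText(Element root) {
        StringBuilder buf = new StringBuilder();
        append(root, buf);
        return buf.toString().trim();
    }

    private static void append(Node node, StringBuilder buf) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            // Collapse runs of white space into a single space.
            String text = node.getNodeValue().replaceAll("\\s+", " ");
            boolean atLineStart = buf.length() == 0
                    || Character.isWhitespace(buf.charAt(buf.length() - 1));
            if (atLineStart && text.startsWith(" ")) {
                text = text.substring(1); // no leading or doubled spaces
            }
            buf.append(text);
            return;
        }
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // comments, processing instructions, etc. are skipped
        }
        boolean block = BLOCKS.contains(node.getNodeName());
        if (block) {
            lineBreak(buf);
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            append(c, buf);
        }
        if (block) {
            lineBreak(buf);
        }
    }

    // Ends the current line, never producing an empty line.
    private static void lineBreak(StringBuilder buf) {
        int len = buf.length();
        while (len > 0 && buf.charAt(len - 1) == ' ') {
            len--; // drop trailing spaces before the break
        }
        buf.setLength(len);
        if (len > 0 && buf.charAt(len - 1) != '\n') {
            buf.append('\n');
        }
    }
}
```

With the fragment `<div><p>Hello <b>world</b></p><ul><li>one</li><li>two</li></ul></div>`, the standard `getTextContent()` runs all the text together, while the sketch puts the paragraph and each list item on its own line.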

The fe-innertext branch contains the patch intended for merging.

Two of the tests use the documents innertext.html and innertext.xhtml, which are based on a sample by @kangax (see https://kangax.github.io/jstests/innerText/), as mentioned in the comment at the top of the files. I thank him (and Aryeh Gregor) for making it available.

Commit merged.