danfickle / neoflyingsaucer

[Deprecated - Please use openhtmltopdf at link] An attempt to modernize flyingsaucer, the HTML and CSS 2.1 renderer in pure Java

Home Page: https://github.com/danfickle/openhtmltopdf

Use w3c Document?

hrj opened this issue

commented

The *Panel classes (XHTMLPanel, etc.) use the jsoup Document class. But this class does not implement org.w3c.dom.Document, which is the standard Java interface for working with a DOM.

The biggest missing feature in the jsoup Document class is that it doesn't support DOM events, while w3c DOM does.
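
For illustration, here is a minimal Java sketch of what DOM event support looks like on the w3c side. It assumes the DOM implementation bundled with the JDK, whose nodes (as far as I know) also implement the optional org.w3c.dom.events interfaces; other implementations may not, so the code checks before casting. The class name DomEventsSketch, the element names, and the ids are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.events.DocumentEvent;
import org.w3c.dom.events.Event;
import org.w3c.dom.events.EventListener;
import org.w3c.dom.events.EventTarget;

final class DomEventsSketch {

    /** Fires a synthetic "click" at a button and returns the ids of the elements whose listeners ran, in order. */
    static List<String> dispatchClick() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element body = doc.createElement("body");
        body.setAttribute("id", "body");
        Element button = doc.createElement("button");
        button.setAttribute("id", "ok");
        doc.appendChild(body);
        body.appendChild(button);

        // Assumption: event support is an optional DOM feature, so check before casting.
        if (!(doc instanceof DocumentEvent) || !(button instanceof EventTarget)) {
            throw new UnsupportedOperationException("DOM implementation lacks event support");
        }

        List<String> visited = new ArrayList<>();
        // The listener records the id of the element it is currently being run on.
        EventListener recorder =
            evt -> visited.add(((Element) evt.getCurrentTarget()).getAttribute("id"));
        ((EventTarget) button).addEventListener("click", recorder, false);
        ((EventTarget) body).addEventListener("click", recorder, false);

        // Create a bubbling, cancelable event and fire it at the button.
        Event click = ((DocumentEvent) doc).createEvent("Events");
        click.initEvent("click", true, true);
        ((EventTarget) button).dispatchEvent(click);
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(dispatchClick());
    }
}
```

The listener on the button runs first and the event then bubbles up to the enclosing element, which is the kind of behaviour a browser-style component would build on.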

Using w3c DOM also makes it easy to interoperate with other libraries.
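
As one example of that interoperability, the JDK's own javax.xml.xpath API accepts any org.w3c.dom.Node as its context, so a w3c Document can be queried without an adapter. The sketch below is only illustrative: DomInteropSketch, sampleDocument, and countMatches are hypothetical names, and the tiny document is built by hand rather than converted from jsoup.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class DomInteropSketch {

    /** Builds a small document: body containing a plain p and a p with class="note". */
    static Document sampleDocument() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element body = doc.createElement("body");
        doc.appendChild(body);
        Element plain = doc.createElement("p");
        Element note = doc.createElement("p");
        note.setAttribute("class", "note");
        body.appendChild(plain);
        body.appendChild(note);
        return doc;
    }

    /** Counts the nodes selected by an XPath expression, using the document as context. */
    static int countMatches(Document doc, String expression) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = sampleDocument();
        System.out.println(countMatches(doc, "//p"));
        System.out.println(countMatches(doc, "//p[@class='note']"));
    }
}
```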

Are you okay with switching to the w3c Document for the Panel APIs? Note that jsoup can still be used for parsing. The idea is to convert its output to a w3c Document once parsing is done.

The conversion can be done with a small amount of code. Here is sample code (in Scala) to do the conversion:

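  import java.util.{HashMap, Map}
  import javax.xml.parsers.DocumentBuilderFactory
  import org.w3c.dom.{Document, Node}
  import scala.collection.JavaConverters._

  // Builds an empty w3c Document and copies the jsoup tree into it.
  def jsoup2DOM(jsoupDocument: org.jsoup.nodes.Document): Document = {
    val docBuilderFactory = DocumentBuilderFactory.newInstance()
    val docBuilder = docBuilderFactory.newDocumentBuilder()
    val document = docBuilder.newDocument()
    createDOM(jsoupDocument, document, document, new HashMap[String, String]())
    document
  }

  // Assumed helpers (not shown in the original post): split a "prefix:local" attribute name.
  private def getNSPrefix(name: String): String = {
    val i = name.indexOf(':')
    if (i > 0) name.substring(0, i) else null
  }

  private def getLocalName(name: String): String =
    name.substring(name.indexOf(':') + 1)

  // Recursively copies `node` under `out`; `ns` collects the xmlns:prefix declarations seen so far.
  private def createDOM(node: org.jsoup.nodes.Node, out: Node, doc: Document, ns: Map[String, String]): Unit = {
    node match {
      case d: org.jsoup.nodes.Document =>
        for (n <- d.childNodes().asScala) {
          createDOM(n, out, doc, ns)
        }
      case e: org.jsoup.nodes.Element =>
        val _e = doc.createElement(e.tagName())
        out.appendChild(_e)
        val atts = e.attributes()
        for (a <- atts.asScala) {
          var attName = a.getKey
          // A bare xmlns attribute is dropped; prefixed declarations are recorded in `ns`.
          if (attName != "xmlns") {
            val attPrefix = getNSPrefix(attName)
            if (attPrefix != null) {
              if (attPrefix == "xmlns") {
                ns.put(getLocalName(attName), a.getValue)
              } else if (attPrefix != "xml") {
                val namespace = ns.get(attPrefix)
                if (namespace == null) {
                  // Unknown prefix: make the name safe by replacing the colon.
                  attName = attName.replace(':', '_')
                }
              }
            }
            _e.setAttribute(attName, a.getValue)
            if (attName == "id") {
              // Mark it as an ID attribute so that getElementById can find it.
              _e.setIdAttribute(attName, true)
            }
          }
        }
        for (n <- e.childNodes().asScala) {
          createDOM(n, _e, doc, ns)
        }
      case t: org.jsoup.nodes.TextNode =>
        // A w3c Document cannot have text nodes as direct children.
        if (!(out.isInstanceOf[Document])) {
          out.appendChild(doc.createTextNode(t.text()))
        }
      case d: org.jsoup.nodes.DataNode =>
        out.appendChild(doc.createCDATASection(d.getWholeData()))
      case dt: org.jsoup.nodes.DocumentType =>
        println("Doc type: " + dt)
        // TODO
      case _ => throw new Exception("Unexpected node: " + node.getClass) // TODO: should be a no-op
    }
  }

commented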

Just a heads up: I was working on this today in a tentative branch. Most of the code changes are done. I still need to check that it works at run time.

I suggest we, for performance reasons, only convert if a w3c document is specifically requested.
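
A rough Java sketch of that idea: keep the parsed source and run the conversion only on first request, then cache the result so it happens at most once. Everything here is hypothetical (the class LazyConversionSketch and its method names are invented for the example), and the type parameters are generic so the snippet runs without jsoup on the classpath.

```java
import java.util.function.Function;

/** Holds a source value and converts it lazily, at most once. Not thread-safe. */
final class LazyConversionSketch<S, T> {
    private final S source;
    private final Function<S, T> converter;
    private T cached; // stays null until the converted form is first requested

    LazyConversionSketch(S source, Function<S, T> converter) {
        this.source = source;
        this.converter = converter;
    }

    /** Callers that are happy with the original form never pay for a conversion. */
    S source() {
        return source;
    }

    /** Converts on first call and returns the cached result afterwards. */
    T converted() {
        if (cached == null) {
            cached = converter.apply(source);
        }
        return cached;
    }

    public static void main(String[] args) {
        // Stand-in types: a String source and its length as the "converted" form.
        LazyConversionSketch<String, Integer> lazy =
            new LazyConversionSketch<>("<p>hi</p>", String::length);
        System.out.println(lazy.converted());
    }
}
```

In this thread's terms, S would be the jsoup Document, T would be org.w3c.dom.Document, and the converter would be something like the jsoup2DOM function from the sample above.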

commented

Hmm, are you worried about the time lost in conversion, or about the w3c Document itself being slower than the jsoup Document?

If it is the latter then your suggestion makes sense.

If it is the former (conversion overhead) then it is a little trickier. I am thinking about DOM modifications through JavaScript. In that scenario, converting once to a w3c DOM would be better than converting after every modification.

I wasn't aware you wanted JavaScript support. If you do, we'll definitely need the w3c interfaces, or extensions of them, to present to user code.

There will be some major challenges in supporting JavaScript. The most notable is a security sandbox. Another problem is that we don't have incremental or partial layout for when JavaScript alters the DOM.

commented

Psst.. I am working on a full-fledged (and soon to be open-source) browser :)

I was hoping that flyingsaucer already had some incremental layout support. But that is not a blocker for me. I just want to get to a functional browser first and worry about performance later.

Cool! I'll trust you to best understand the Swing side of this project, as your PRs have been good and you may be the main user of the Swing components. I think most users are interested in flying saucer as a static renderer.