antchfx / xpath

XPath package for Go that supports querying HTML, XML, and JSON documents.

normalize-space does not conform to the W3C recommendation

vovchynniko opened this issue.

Hi @zhengchun,

I was working with your library and noticed that the normalize-space function does not replace internal runs of whitespace with a single space, as the spec requires:

https://www.w3.org/TR/1999/REC-xpath-19991116/#function-normalize-space

The normalize-space function returns the argument string with whitespace normalized by stripping leading and trailing whitespace and replacing sequences of whitespace characters by a single space. [...]
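For reference, the behaviour the spec describes can be sketched in a few lines of Go. This is a minimal illustration, not the library's actual fix: the helper names isXPathSpace and normalizeSpace are hypothetical. It assumes XPath 1.0's definition of whitespace (grammar production S: space, tab, carriage return, line feed), which is why it uses strings.FieldsFunc rather than strings.Fields. The latter would also split on other Unicode spaces such as U+00A0.

```go
package main

import (
	"fmt"
	"strings"
)

// isXPathSpace reports whether r is one of the four characters the XPath 1.0
// grammar treats as whitespace (production S: #x20, #x9, #xD, #xA).
// Hypothetical helper, written for illustration only.
func isXPathSpace(r rune) bool {
	return r == ' ' || r == '\t' || r == '\r' || r == '\n'
}

// normalizeSpace strips leading and trailing whitespace and collapses every
// internal run of whitespace into a single space. FieldsFunc drops the empty
// fields, so the outer whitespace disappears and the inner runs become one " ".
func normalizeSpace(s string) string {
	return strings.Join(strings.FieldsFunc(s, isXPathSpace), " ")
}

func main() {
	// The div's text node from the report: a newline plus indentation sits between the words.
	fmt.Printf("%q\n", normalizeSpace("Match\n           me"))
}
```

Applied to the div's text from the reproduction below, this yields "Match me", so a conforming implementation should select that div.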

Here's the minimal Go code that reproduces the issue:

package main

import (
	"fmt"
	"os"
	"strings"

	"github.com/antchfx/htmlquery"
)

// mustSucceed aborts the program if err is non-nil.
func mustSucceed(err error) {
	if err != nil {
		fmt.Fprintf(os.Stderr, "Error: %v\n", err)
		os.Exit(1)
	}
}

func main() {
	htmlDoc := `<!doctype html>
<html lang=en>
   <head>
       <meta charset=utf-8>
       <title></title>
   </head>
   <body>
       <div>Match
           me</div>
   </body>
</html>`

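	root, err := htmlquery.Parse(strings.NewReader(htmlDoc))
	mustSucceed(err)

	// Per the spec, normalize-space turns the div's text "Match\n           me"
	// into "Match me", so this expression should select the div.
	node := htmlquery.FindOne(root, `//div[normalize-space(text())="Match me"]`)
	if node == nil {
		fmt.Println("No matches")
		return
	}
	fmt.Println("Matched:", htmlquery.InnerText(node))
}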

Running it prints "No matches", even though the div's text should normalize to "Match me" and be selected.

Thank you for your time and effort!

Fixed, thanks.