normalize-space does not conform to the W3C recommendation
vovchynniko opened this issue · comments
Vadym Ovchynnikov commented
Hi @zhengchun,
I was working with your library and noticed that the normalize-space
function does not replace internal runs of whitespace with a single space, as the spec requires:
https://www.w3.org/TR/1999/REC-xpath-19991116/#function-normalize-space
The normalize-space function returns the argument string with whitespace normalized by stripping leading and trailing whitespace and replacing sequences of whitespace characters by a single space. [...]
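As a rough illustration of the behavior the spec describes, here is a minimal sketch in plain Go. The names normalizeSpace and isXMLSpace are hypothetical and are not part of the library; the sketch assumes XPath 1.0 whitespace means the four XML whitespace characters (space, tab, carriage return, line feed).

```go
package main

import (
	"fmt"
	"strings"
)

// isXMLSpace reports whether r is one of the four XML whitespace
// characters: space, tab, line feed, or carriage return.
func isXMLSpace(r rune) bool {
	return r == ' ' || r == '\t' || r == '\n' || r == '\r'
}

// normalizeSpace is a hypothetical helper (not the library's code).
// It splits s on runs of XML whitespace, which drops leading and
// trailing whitespace, then joins the pieces with a single space.
func normalizeSpace(s string) string {
	return strings.Join(strings.FieldsFunc(s, isXMLSpace), " ")
}

func main() {
	fmt.Printf("%q\n", normalizeSpace("  Match\n me  ")) // prints "Match me"
}
```

With this definition, the text of the div in the reproduction below ("Match", a line break, then "me") would normalize to "Match me".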
Here's the minimal Go code that reproduces the issue:
package main

import (
	"fmt"
	"os"
	"strings"

	"github.com/antchfx/htmlquery"
)

// mustSucceed prints the error and exits if err is non-nil.
func mustSucceed(err error) {
	if err != nil {
		fmt.Printf("Error: %v\n", err)
		os.Exit(1)
	}
}

func main() {
	// The div's text contains a line break between "Match" and "me".
	htmlDoc := `<!doctype html>
<html lang=en>
<head>
<meta charset=utf-8>
<title></title>
</head>
<body>
<div>Match
me</div>
</body>
</html>`
	root, err := htmlquery.Parse(strings.NewReader(htmlDoc))
	mustSucceed(err)
	// After normalization the text should equal "Match me", so the div should match.
	node := htmlquery.FindOne(root, "//div[normalize-space(text())=\"Match me\"]")
	if node == nil {
		fmt.Println("No matches")
	}
}
This prints "No matches", even though the div's text should normalize to "Match me".
Thank you for your time and effort!
zhengchun commented
Fixed, thanks.