tid-kijyun / Kanna

Kanna(鉋) is an XML/HTML parser for Swift.

XPath and children: unexpected behaviour.

iDevPro opened this issue · comments

Description:

Installation method:

  • Carthage
  • CocoaPods
  • [*] Swift Package Manager
  • Manually
  • other: ()

Kanna version (or commit hash):

5.2.2 with fix from 4.0.0 (name of module)

swift --version

Apple Swift version 5.2 (swiftlang-1103.0.32.1 clang-1103.0.32.29)
Target: x86_64-apple-darwin19.4.0

Xcode version (optional):

Version 11.4 (11E146)

I found a strange issue that shows up when you need to find several objects
of the same kind (or an object with children) and iterate over them. If you call
.xpath() inside the loop to find items within the current element, you don't get
the right result, because .xpath() returns the first matching subitem from the
root item instead.

For example:

// I try to find all books with this XPath, which returns a collection
// of "brow-data" items:
static let userBookXPath = "//*[@id = 'booklist']//div[@class='brow-data']"

// This XPath is meant to find the book name
static let browBookNameXPath = "//a[contains(@class, 'brow-book-name')]"

// Next I want to iterate over it:
let books = try HTML(url: pageUrl, encoding: .utf8)
    .xpath(Constants.userBookXPath)
    .makeIterator()

while let book = books.next() {
    // Parse each book here, like this (this is an example).
    // What am I doing wrong here?
    print(book.xpath(Constants.browBookNameXPath).first?.content)

    // expected:
    //   Optional("Book title one ")
    //   Optional("Book title two ")
    //   Optional("Book title three ")

    // actual:
    //   Optional("Book title one ")
    //   Optional("Book title one ")
    //   Optional("Book title one ")

    // Additional:
    // I have 20 books per page and want to iterate over all 20,
    // but the result here is strange.
    // I expected book.xpath(Constants.browBookNameXPath).count to return 1,
    // but I get 20 ))
}

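To define a relative path, you have to use dot notation (.//). An XPath that starts with // is evaluated from the document root even when it is called on a child node, whereas .// searches only the descendants of the current node.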

static let browBookNameXPath = ".//a[contains(@class, 'brow-book-name')]"
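
The difference can be reproduced with a small self-contained sketch. The markup in sampleHTML and the helper bookTitles(in:nameXPath:) are hypothetical and exist only for illustration; the XPath strings are the ones from this thread. It assumes Kanna 5.x and its HTML(html:encoding:) entry point, so treat it as an outline rather than a definitive test case.

```swift
import Foundation
import Kanna

// Hypothetical sample markup standing in for the real page.
let sampleHTML = """
<html><body>
<div id="booklist">
  <div class="brow-data"><a class="brow-book-name" href="#">Book title one</a></div>
  <div class="brow-data"><a class="brow-book-name" href="#">Book title two</a></div>
  <div class="brow-data"><a class="brow-book-name" href="#">Book title three</a></div>
</div>
</body></html>
"""

let userBookXPath = "//*[@id = 'booklist']//div[@class='brow-data']"
// Starts at the document root, regardless of the node it is called on.
let absoluteNameXPath = "//a[contains(@class, 'brow-book-name')]"
// Starts at the current node and searches only its descendants.
let relativeNameXPath = ".//a[contains(@class, 'brow-book-name')]"

/// Hypothetical helper: for every "brow-data" element, collects the content
/// of the first node matched by `nameXPath` when queried on that element.
func bookTitles(in html: String, nameXPath: String) throws -> [String] {
    let doc = try HTML(html: html, encoding: .utf8)
    var titles: [String] = []
    for book in doc.xpath(userBookXPath) {
        if let title = book.xpath(nameXPath).first?.content {
            titles.append(title)
        }
    }
    return titles
}

do {
    // Absolute path: every iteration matches from the root again.
    print(try bookTitles(in: sampleHTML, nameXPath: absoluteNameXPath))
    // Relative path: each iteration is scoped to its own book element.
    print(try bookTitles(in: sampleHTML, nameXPath: relativeNameXPath))
} catch {
    print("Parsing failed: \(error)")
}
```

With the absolute path, each iteration should repeat the first title, which is the behaviour reported above; with the relative path, each book should yield its own title.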

I think this can be closed now, because I moved to .css("pattern") and that works perfectly :)