Selector list

Question

Lyokovic opened this issue 7 years ago

Hi,

I started using Lambda Soup and found that it does not seem to support selector lists, like ".bg1, .bg3".
I need to parse an HTML document containing various <div> elements with the classes bg2, bg1, bgbc, and bg3, and I want to keep only the bg1 and bg3 ones while preserving their order.

I am wondering if it would be easy to implement this feature?

Anton Bachin · Answer 1 · Tue Jan 30 2018 01:05:41 GMT+0800 (China Standard Time)

Yes, it should be fairly straightforward. One would have to:

1. Extend the grammar of selectors with one more level: https://github.com/aantron/lambda-soup/blob/8084d5b86ce8f1223271fc1e67398ac618dacbda/src/soup.ml#L489

   A simple_selector is something like .class-foo or [attribute-bar]; combinators are >, +, etc. So this grammar can already represent things like .class-foo > [attribute-bar]. It needs one more level of list to be able to represent comma-separated lists of these.

2. Change the parser's top-level function: https://github.com/aantron/lambda-soup/blob/8084d5b86ce8f1223271fc1e67398ac618dacbda/src/soup.ml#L896-L913

   It should no longer be the top-level function, but a parser for a single comma-delimited item. A new top-level function then needs to wrap it, reading the commas and calling the current parser on everything in between.

3. Change the select code: https://github.com/aantron/lambda-soup/blob/8084d5b86ce8f1223271fc1e67398ac618dacbda/src/soup.ml#L611-L647

   Its logic needs to be wrapped in a new top-level loop that tries additional selectors from the new top-level list if the preceding ones didn't yield a match.
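
To make the shape of these three changes concrete, here is a minimal, self-contained sketch on a toy model. It is not Lambda Soup's actual code: the names element, item, selector_list, parse_item, parse_selector_list, matches_item, and select are all hypothetical, an item is reduced to a conjunction of class names (no attributes or combinators), and a document is a flat list of elements in document order.

```ocaml
(* Toy model, NOT Lambda Soup's real types: an element is an id plus classes. *)
type element = { id : string; classes : string list }

(* Step 1: the grammar gains one level. [item] stands in for the existing
   selector type; here it is only a conjunction of class names (".a.b"). *)
type item = string list
type selector_list = item list

(* Step 2a: the former top-level parser, now parsing a single item. *)
let parse_item (s : string) : item =
  match String.split_on_char '.' (String.trim s) with
  | "" :: (_ :: _ as classes) when not (List.mem "" classes) -> classes
  | _ -> failwith ("cannot parse selector item: " ^ s)

(* Step 2b: the new top-level parser splits on commas and delegates. *)
let parse_selector_list (s : string) : selector_list =
  List.map parse_item (String.split_on_char ',' s)

(* The former matching logic, for a single item. *)
let matches_item (e : element) (it : item) : bool =
  List.for_all (fun c -> List.mem c e.classes) it

(* Step 3: for each element, try the items in turn until one matches.
   Filtering the document itself keeps the results in document order. *)
let select (selector : string) (doc : element list) : element list =
  let items = parse_selector_list selector in
  List.filter (fun e -> List.exists (matches_item e) items) doc

let () =
  let doc = [
    { id = "d1"; classes = ["bg2"] };
    { id = "d2"; classes = ["bg1"] };
    { id = "d3"; classes = ["bgbc"] };
    { id = "d4"; classes = ["bg3"] };
    { id = "d5"; classes = ["bg1"] };
  ] in
  (* Prints d2, d4, d5, one per line. *)
  select ".bg1, .bg3" doc |> List.iter (fun e -> print_endline e.id)
```

The point of looping over elements on the outside and over list items on the inside is that each element is reported at most once and in its original position, which is what the use case above (keeping bg1 and bg3 in order) needs.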

Lyokovic · Answer 2 · Wed Jan 31 2018 18:11:44 GMT+0800 (China Standard Time)

Thanks, I'll take a look ASAP.