Kozea / tinycss2

A tiny CSS parser

Home Page: https://courtbouillon.org/tinycss2

Global cleaning

jnothman opened this issue

I like the API design of tinycss2, but I find it frustrating that whenever I interpret some CSS I have to be ready to find a ParseError (or indeed a Comment) anywhere, even after I have lost the context in which it was parsed, and with it the information needed to report the error to the user.

While I understand the benefits of keeping parse errors local, I would propose one or both of:

  • a function to traverse all token lists found within a parse, so that they can be cleaned or otherwise handled;
  • an option to attach the original input CSS or similar to each ParseError node.
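
For the second option, here is a rough sketch of what keeping the input alongside an error could look like from the caller's side. `format_error` is a hypothetical helper, not part of tinycss2, and the stand-in error object only assumes 1-based `source_line` and `source_column` attributes plus a `message`:

```python
from types import SimpleNamespace


def format_error(css, error):
    """Render an error with the offending source line and a caret under the column."""
    line = css.splitlines()[error.source_line - 1]
    caret = ' ' * (error.source_column - 1) + '^'
    return '{}:{}: {}\n{}\n{}'.format(
        error.source_line, error.source_column, error.message, line, caret)


css = 'a { color: red }\nb { width 1px }'
# Stand-in for a parse error node (assumption: positions are 1-based).
error = SimpleNamespace(source_line=2, source_column=5, message='invalid declaration')
print(format_error(css, error))
```

This only works while the original string is still at hand, which is exactly what gets lost once nodes are passed on to other code.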

In particular, it becomes a much greater nuisance to test that my CSS interpretation code works with comments in any part of the CSS.

Or would you recommend just sticking with tinycss, @SimonSapin?

parse_* functions have a skip_comments parameter. Does this help?

I don’t quite understand what you’re asking for, sorry. What do you mean by "the context in which it was parsed"? Comment and ParseError inherit from Node. All nodes have source_line and source_column attributes, as well as a serialize() method.
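
A small sketch of the points above, assuming tinycss2 is installed. `skip_whitespace`, `tinycss2.serialize` and the `message` attribute of error nodes are not mentioned in this thread; they are used here on the assumption that the installed version provides them:

```python
import tinycss2

css = '/* note */ a { color: red }'

# By default, top-level comments and whitespace are kept as nodes.
print([node.type for node in tinycss2.parse_stylesheet(css)])

# With both flags set, only the rule is left.
rules = tinycss2.parse_stylesheet(css, skip_comments=True, skip_whitespace=True)
print([node.type for node in rules])

# An invalid declaration comes back as an 'error' node that still knows its position.
nodes = tinycss2.parse_declaration_list('color red; width: 1px', skip_whitespace=True)
for node in nodes:
    if node.type == 'error':
        print(node.source_line, node.source_column, node.message)
    else:
        print(node.name, '=', tinycss2.serialize(node.value))
```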

Could you give an example with some code?

As to tinycss, I think it has significant design flaws. But if it works for you ¯\_(ツ)_/¯

Thanks, I somehow overlooked skip_comments. Yes, that helps.

To be explicit, I'm writing a library that involves transforming CSS declarations, but many of them are left untransformed and consumed by a client library. My library will land up raising warnings for the parse errors it encounters directly. But then the client is obliged to also catch and handle parse errors, or else trip on them.

I prefer the tinycss2 interface, thanks :)

It sounds like maybe you want a tree traversal/rewriting mechanism.

Here is an untested attempt:

_NESTED = {
    'qualified-rule': ['prelude', 'content'],
    'at-rule': ['prelude', 'content'],
    'declaration': ['value'],
    '() block': ['content'],
    '[] block': ['content'],
    '{} block': ['content'],
    'function': ['arguments'],
}

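def _apply_to(node, callback):
    # Rewrite each nested token list of this node in place.
    for attr in _NESTED.get(node.type, []):
        nested_nodes = getattr(node, attr)
        # An at-rule without a {} block has content set to None.
        if nested_nodes is not None:
            new_nested_nodes = fold(nested_nodes, callback)
            setattr(node, attr, new_nested_nodes)

def _fold_iter(nodes, callback):
    for node in nodes:
        # The callback returns a replacement node, or None to drop the node.
        replacement = callback(node)
        if replacement is not None:
            _apply_to(replacement, callback)
            yield replacement

def fold(nodes, callback):
    return list(_fold_iter(nodes, callback))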

Which could be used like this:

def remove_errors_callback(node):
    if node.type == 'error':
        print_error(node)
        # Implicit: return None
    else:
        return node

stylesheet = tinycss2.parse_stylesheet(…)
stylesheet = fold(stylesheet, remove_errors_callback)

Yes, that sort of thing, though I'd design it as a generator, with an interface more like os.walk.

I.e. replacement would occur in-place in lists and nodes.

_CHILD_ATTRS = {
    'qualified-rule': ['prelude', 'content'],
    'at-rule': ['prelude', 'content'],
    'declaration': ['value'],
    '() block': ['content'],
    '[] block': ['content'],
    '{} block': ['content'],
    'function': ['arguments'],
}


def walk(nodes):
    '''
    >>> import tinycss2, pprint
    >>> pprint.pprint(list(walk(tinycss2.parse_declaration_list('font: rgb(1,2,3) bold; background: red'))))
    [((), [<Declaration font: …>, <WhitespaceToken>, <Declaration background: …>]),
     ((0,), <Declaration font: …>),
     ((0, 'value'),
      [<WhitespaceToken>,
       <FunctionBlock rgb( … )>,
       <WhitespaceToken>,
       <IdentToken bold>]),
     ((0, 'value', 0), <WhitespaceToken>),
     ((0, 'value', 1), <FunctionBlock rgb( … )>),
     ((0, 'value', 1, 'arguments'),
      [<NumberToken 1>,
       <LiteralToken ,>,
       <NumberToken 2>,
       <LiteralToken ,>,
       <NumberToken 3>]),
     ((0, 'value', 1, 'arguments', 0), <NumberToken 1>),
     ((0, 'value', 1, 'arguments', 1), <LiteralToken ,>),
     ((0, 'value', 1, 'arguments', 2), <NumberToken 2>),
     ((0, 'value', 1, 'arguments', 3), <LiteralToken ,>),
     ((0, 'value', 1, 'arguments', 4), <NumberToken 3>),
     ((0, 'value', 2), <WhitespaceToken>),
     ((0, 'value', 3), <IdentToken bold>),
     ((1,), <WhitespaceToken>),
     ((2,), <Declaration background: …>),
     ((2, 'value'), [<WhitespaceToken>, <IdentToken red>]),
     ((2, 'value', 0), <WhitespaceToken>),
     ((2, 'value', 1), <IdentToken red>)]
    '''
    if isinstance(nodes, list):
        yield (), nodes
        for i, node in enumerate(nodes):
            for path, descendant in walk(node):
                yield (i,) + path, descendant
    else:
        node = nodes
        yield (), node
        for attr in _CHILD_ATTRS.get(node.type, []):
            for path, descendant in walk(getattr(node, attr)):
                yield (attr,) + path, descendant
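
Because each list is yielded before it is iterated, a caller can prune it in place, much like editing `dirnames` during `os.walk`. The following is a minimal sketch of that idea. It uses stand-in nodes instead of real tinycss2 objects, `walk` is repeated with a trimmed attribute table so the block runs on its own, and `drop_errors` is a hypothetical helper name:

```python
from types import SimpleNamespace as Node

# Trimmed table: only the two container types used in the demo tree.
_CHILD_ATTRS = {'declaration': ['value'], 'function': ['arguments']}


def walk(nodes):
    """Yield (path, item) for every list and node, parents before children."""
    if isinstance(nodes, list):
        yield (), nodes
        for i, node in enumerate(nodes):
            for path, descendant in walk(node):
                yield (i,) + path, descendant
    else:
        yield (), nodes
        for attr in _CHILD_ATTRS.get(nodes.type, []):
            for path, descendant in walk(getattr(nodes, attr)):
                yield (attr,) + path, descendant


def drop_errors(nodes):
    """Remove 'error' nodes at every depth; return their paths as seen when visited."""
    removed = []
    for path, item in walk(nodes):
        if isinstance(item, list):
            removed += [path + (i,) for i, n in enumerate(item) if n.type == 'error']
            # Slice assignment mutates the list that walk() is about to iterate.
            item[:] = [n for n in item if n.type != 'error']
    return removed


tree = [
    Node(type='declaration', value=[
        Node(type='ident'),
        Node(type='function', arguments=[Node(type='error'), Node(type='number')]),
        Node(type='error'),
    ]),
    Node(type='error'),
]
removed = drop_errors(tree)
print(removed)
```

The reported paths are the positions at the moment each list was visited, so indices in already-pruned parent lists reflect the pruned state.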

Just to let you know: I’m not working on WeasyPrint or tinycss2 anymore. While I don’t mind chatting about it, filing issues is unlikely to get the project moving, and I’m unlikely to spend time reviewing even pull requests.

A lot of this is not part of tinycss2 on purpose: to make fallback work, only the set of properties and values that are supported (e.g. in layout code) should be parsed.

@jnothman A lot of things have changed since 2017, are you still interested in this issue?

> A lot of this is not part of tinycss2 on purpose: to make fallback work, only the set of properties and values that are supported (e.g. in layout code) should be parsed.

I hadn't known of weasyprint. It's not surprising to find it does a lot of what I've been working on in weasyprint.css...

Yes, I think that most of the features you want are not in TinyCSS2's scope.

@jnothman Feel free to reopen if you want!