common-workflow-language / schema_salad

Semantic Annotations for Linked Avro Data

Home Page: https://www.commonwl.org/v1.2/SchemaSalad.html

Upgrade mistune to v3

dotlambda opened this issue

@fmigneault Want to do the upgrade to mistune 3.0.x? The Debian package maintainer for mistune pinged me about this.

@mr-c I can try looking into it, but it is not on my top-priority list at the moment.

I tried a first pass at the upgrade.
After removing the type stubs for mistune and patching some imports, the only remaining typing issue related to makedoc is the following error (other typing errors are reported on https://github.com/common-workflow-language/schema_salad/tree/main, but they come from other modules):

> make mypy
schema_salad/makedoc.py: note: In member "render_type" of class "RenderType":
schema_salad/makedoc.py:602:22: error: Call to untyped function "MyRenderer" in typed context  [no-untyped-call]
                renderer=MyRenderer(escape=escape),
                         ^~~~~~~~~~~~~~~~~~~~~~~~~

I'm not quite sure what causes this error, since MyRenderer class is defined in the same file.

Running the checks for the output HTML, I get the following diffs from the lists:

❯ make check-metaschema-diff
docker run \
        -v "/home/francis/dev/schema_salad/schema_salad/metaschema:/tmp/:ro" \
        "quay.io/commonwl/cwltool_module:latest" \
        schema-salad-doc /tmp/metaschema.yml \
        > /tmp/metaschema.orig.html
schema-salad-doc \
        "/home/francis/dev/schema_salad/schema_salad/metaschema/metaschema.yml" \
        > /tmp/metaschema.new.html
diff -a --color /tmp/metaschema.orig.html /tmp/metaschema.new.html || true
208a209
> 
248c249,250
< <li>The <code>$mixin</code> feature has been removed from the specification, as it is poorly documented, not included in conformance testing,
---
> <li>The <code>$mixin</code> feature has been removed from the specification, as it
> is poorly documented, not included in conformance testing,
318c320,321
< <li><p><code>$base</code>: Must be a string.  Set the base URI for the document used to resolve relative references.</p>
---
> <li><p><code>$base</code>: Must be a string.  Set the base URI for the document used to
> resolve relative references.</p>
320c323,324
< <li><p><code>$namespaces</code>: Must be an object with strings as values.  The keys of the object are namespace prefixes used in the document; the values of
---
> <li><p><code>$namespaces</code>: Must be an object with strings as values.  The keys of
> the object are namespace prefixes used in the document; the values of
323c327,328
< <li><p><code>$schemas</code>: Must be an array of strings.  This field may list URI references to documents in RDF-XML format which will be queried for RDF
---
> <li><p><code>$schemas</code>: Must be an array of strings.  This field may list URI
> references to documents in RDF-XML format which will be queried for RDF
339c344,345
< <li><p>At least one record definition object which defines valid fields that make up a record type.  Record field definitions include the valid types
---
> <li><p>At least one record definition object which defines valid fields that
> make up a record type.  Record field definitions include the valid types
344c350,351
< <li><p>Any number of enumerated type objects which define a set of finite set of symbols that are valid value of the type.</p>
---
> <li><p>Any number of enumerated type objects which define a set of finite set of symbols that are
> valid value of the type.</p>
356c363,364
< <li><p>If the value of <code>jsonldPredicate</code> is <code>@id</code>, the field is an identifier field.</p>
---
> <li><p>If the value of <code>jsonldPredicate</code> is <code>@id</code>, the field is an identifier
> field.</p>
358c366,367
< <li><p>If the value of <code>jsonldPredicate</code> is an object, and that object contains the field <code>_type</code> with the value <code>@id</code>, the
---
> <li><p>If the value of <code>jsonldPredicate</code> is an object, and that
> object contains the field <code>_type</code> with the value <code>@id</code>, the
364c373,374
< <li><p>If the value of <code>jsonldPredicate</code> is an object which contains the field <code>_type</code> with the value <code>@vocab</code>, the field value is subject to
---
> <li><p>If the value of <code>jsonldPredicate</code> is an object which contains the
> field <code>_type</code> with the value <code>@vocab</code>, the field value is subject to
414,417c424,431
< <li><p>If a field name URI begins with a namespace prefix declared in the document context (<code>@context</code>) followed by a colon <code>:</code>, the prefix and
< colon must be replaced by the namespace declared in <code>@context</code>.</p>
< </li>
< <li><p>If there is a vocabulary term which maps to the URI of a resolved field, the field name must be replace with the vocabulary term.</p>
---
> <li>If a field name URI begins with a namespace prefix declared in the
> document context (<code>@context</code>) followed by a colon <code>:</code>, the prefix and</li>
> </ul>
> colon must be replaced by the namespace declared in `@context`.
> 
> <ul>
> <li><p>If there is a vocabulary term which maps to the URI of a resolved
> field, the field name must be replace with the vocabulary term.</p>
419c433,434
< <li><p>If a field name URI is an absolute URI consisting of a scheme and path and is not part of the vocabulary, no processing occurs.</p>
---
> <li><p>If a field name URI is an absolute URI consisting of a scheme and path
> and is not part of the vocabulary, no processing occurs.</p>
465c480,481
< <li><p>If an identifier URI begins with <code>#</code> it is a current document fragment identifier.  It is resolved relative to the base URI by
---
> <li><p>If an identifier URI begins with <code>#</code> it is a current document
> fragment identifier.  It is resolved relative to the base URI by
468c484,485
< <li><p>If an identifier URI contains <code>#</code> in some other position it is a relative URI with fragment identifier.  It is resolved relative
---
> <li><p>If an identifier URI contains <code>#</code> in some other position it is a
> relative URI with fragment identifier.  It is resolved relative
472c489,490
< <li><p>If an identifier URI does not contain a scheme and does not contain <code>#</code> it is a parent relative fragment identifier.</p>
---
> <li><p>If an identifier URI does not contain a scheme and does not
> contain <code>#</code> it is a parent relative fragment identifier.</p>
474c492,493
< <li><p>If an identifier URI is a parent relative fragment identifier and the base URI does not contain a document fragment, set the
---
> <li><p>If an identifier URI is a parent relative fragment identifier
> and the base URI does not contain a document fragment, set the
477c496,497
< <li><p>If an identifier URI is a parent relative fragment identifier and the object containing this identifier is assigned to a
---
> <li><p>If an identifier URI is a parent relative fragment identifier
> and the object containing this identifier is assigned to a
483c503,504
< <li><p>If an identifier URI is a parent relative fragment identifier and the base URI contains a document fragment, append a slash
---
> <li><p>If an identifier URI is a parent relative fragment identifier
> and the base URI contains a document fragment, append a slash
487c508,509
< <li><p>If an identifier URI begins with a namespace prefix declared in <code>$namespaces</code> followed by a colon <code>:</code>, the prefix and colon must be
---
> <li><p>If an identifier URI begins with a namespace prefix declared in
> <code>$namespaces</code> followed by a colon <code>:</code>, the prefix and colon must be
490c512,513
< <li><p>If an identifier URI is an absolute URI consisting of a scheme and path, no processing occurs.</p>
---
> <li><p>If an identifier URI is an absolute URI consisting of a scheme and path,
> no processing occurs.</p>
581c604,605
< <li><p>If a reference URI is prefixed with <code>#</code> it is a relative fragment identifier.  It is resolved relative to the base URI by setting
---
> <li><p>If a reference URI is prefixed with <code>#</code> it is a relative
> fragment identifier.  It is resolved relative to the base URI by setting
584c608,609
< <li><p>If a reference URI does not contain a scheme and is not prefixed with <code>#</code> it is a path relative reference.  If the reference URI contains <code>#</code> in any
---
> <li><p>If a reference URI does not contain a scheme and is not prefixed with <code>#</code>
> it is a path relative reference.  If the reference URI contains <code>#</code> in any
595c620,621
< <li><p>If a reference URI begins with a namespace prefix declared in <code>$namespaces</code> followed by a colon <code>:</code>, the prefix and colon must be replaced by the
---
> <li><p>If a reference URI begins with a namespace prefix declared in <code>$namespaces</code>
> followed by a colon <code>:</code>, the prefix and colon must be replaced by the
598c624,625
< <li><p>If a reference URI is an absolute URI consisting of a scheme and path, no processing occurs.</p>
---
> <li><p>If a reference URI is an absolute URI consisting of a scheme and path,
> no processing occurs.</p>
674c701,702
< <li>If a reference URI is a vocabulary field, and there is a vocabulary term which maps to the resolved URI, the reference must be replaced with
---
> <li>If a reference URI is a vocabulary field, and there is a vocabulary
> term which maps to the resolved URI, the reference must be replaced with
981c1009,1010
< <li>If the value ends with a question mark <code>?</code> the question mark is stripped off and the value of the field <code>required</code> is set to <code>False</code></li>
---
> <li>If the value ends with a question mark <code>?</code> the question mark is
> stripped off and the value of the field <code>required</code> is set to <code>False</code></li>
1050c1079,1080
< <li>The document root must be an object or a list.  If the document root is an object containing the field <code>$graph</code> (which must be a list of
---
> <li>The document root must be an object or a list.  If the document root is an
> object containing the field <code>$graph</code> (which must be a list of
1052,1054c1082,1087
< <li>For each object, attempt to validate as one of the record types flagged with <code>documentRoot: true</code>.</li>
< <li>To validate a record, go through <code>fields</code> and recursively validate each field of the object.</li>
< <li>For fields with a list of types (type union), go through each type in the list and recursively validate the type.  For the
---
> <li>For each object, attempt to validate as one of the record types
> flagged with <code>documentRoot: true</code>.</li>
> <li>To validate a record, go through <code>fields</code> and recursively
> validate each field of the object.</li>
> <li>For fields with a list of types (type union), go through each
> type in the list and recursively validate the type.  For the
1056,1057c1089,1092
< <li>Missing fields are considered <code>null</code>.  To validate, the allowed types for the field must include <code>null</code></li>
< <li>Primitive types are null, boolean, int, long, float, double, string.  To validate, the value in the document must have one
---
> <li>Missing fields are considered <code>null</code>.  To validate, the allowed types
> for the field must include <code>null</code></li>
> <li>Primitive types are null, boolean, int, long, float, double,
> string.  To validate, the value in the document must have one
1060c1095,1096
< <li>To validate an array, the value in the document must be a list, and each item in the list must recursively validate as a type
---
> <li>To validate an array, the value in the document must be a list,
> and each item in the list must recursively validate as a type
1062c1098,1099
< <li>To validate an enum, the value in the document be a string, and the value must be equal to the short name of one of the values
---
> <li>To validate an enum, the value in the document be a string, and
> the value must be equal to the short name of one of the values
1064c1101,1102
< <li>As a special case, a field with the <code>Expression</code> type validates string values which contain a CWL parameter reference or expression in the form
---
> <li>As a special case, a field with the <code>Expression</code> type validates string values
> which contain a CWL parameter reference or expression in the form
1358c1396,1397
< <li><p>If the value of this field is <code>@id</code> and <code>identity</code> is false or unspecified, the parent field must be resolved using the link
---
> <li><p>If the value of this field is <code>@id</code> and <code>identity</code> is false or
> unspecified, the parent field must be resolved using the link
1362c1401,1402
< <li><p>If the value of this field is <code>@vocab</code>, the parent field must be resolved using the vocabulary resolution rules.</p>
---
> <li><p>If the value of this field is <code>@vocab</code>, the parent field must be
> resolved using the vocabulary resolution rules.</p>

I do not recall whether we considered these "acceptable diffs", given that they should produce the same visual output in HTML.

My changes are here for the moment:
main...fmigneault:schema_salad:update-mistune-v3

I think it would be possible to remove my custom hook completely.
I obtain the same diff results above whether it is applied or not here:
https://github.com/fmigneault/schema_salad/blob/29f90c7a7541465934a0669edc4a87816b53b6e1/schema_salad/makedoc.py#L608

If the above diffs are considered "acceptable", then I can finish cleaning up this branch by removing the hook.

Thank you @fmigneault; it looks like the only change is additional newlines? If so, that shouldn't affect HTML rendering, so it is okay with me.
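One way to check the "additional newlines only" reading is to compare both sides of a hunk after collapsing whitespace. The following is a minimal sketch, not part of schema_salad: the helper names are hypothetical, the sample strings are copied from the `318c320,321` and `414,417c424,431` hunks above, and it assumes whitespace is not significant in the compared fragments (it would be inside `<pre>` blocks).

```python
import re


def collapse_whitespace(html: str) -> str:
    """Replace every run of whitespace (including newlines) with one space."""
    return re.sub(r"\s+", " ", html).strip()


def same_modulo_whitespace(old: str, new: str) -> bool:
    """True when the two fragments differ only in whitespace."""
    return collapse_whitespace(old) == collapse_whitespace(new)


# Fragment from hunk 318c320,321: the new output only adds a line break.
old_base = (
    "<li><p><code>$base</code>: Must be a string.  Set the base URI "
    "for the document used to resolve relative references.</p>"
)
new_base = (
    "<li><p><code>$base</code>: Must be a string.  Set the base URI "
    "for the document used to\nresolve relative references.</p>"
)

# Fragment from hunk 414,417c424,431: the markup itself differs.
old_ctx = (
    "colon must be replaced by the namespace declared in "
    "<code>@context</code>.</p>"
)
new_ctx = "colon must be replaced by the namespace declared in `@context`."

print(same_modulo_whitespace(old_base, new_base))
print(same_modulo_whitespace(old_ctx, new_ctx))
```

Under this check the `$base` hunk compares equal, while the `@context` hunk does not, because there the inline code is no longer wrapped in `<code>` tags and the list item is closed early.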

> I'm not quite sure what causes this error, since MyRenderer class is defined in the same file.

This is because the mistune3 package doesn't have type hints for mistune.renderers.html.HTMLRenderer nor its superclass mistune.renderers.core.BaseRenderer; so we'll need to update the type stubs, not simply delete them.
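The mechanism can be illustrated without mistune. The sketch below uses a hypothetical `BaseRendererStandIn` class (not the real `mistune.renderers.core.BaseRenderer`) whose `__init__` has no annotations. A subclass defined in the same file inherits that constructor unchanged, so a type checker configured to disallow untyped calls would presumably flag `MyRenderer(escape=...)` even though `MyRenderer` itself is local.

```python
class BaseRendererStandIn:
    """Hypothetical stand-in for an upstream base class."""

    def __init__(self, escape=True):  # deliberately unannotated
        self.escape = escape

    def text(self, text: str) -> str:  # annotated, like the other methods
        return text


class MyRenderer(BaseRendererStandIn):
    """Local subclass that does not define its own __init__."""

    def heading(self, text: str, level: int) -> str:
        return f"<h{level}>{text}</h{level}>"


# The subclass constructor is the very same function object as the base one,
# so it carries no annotations either.
print(MyRenderer.__init__ is BaseRendererStandIn.__init__)
print(MyRenderer.__init__.__annotations__)
print(MyRenderer.text.__annotations__)
```

The runtime behaviour is unaffected; only the static check complains, which is why typed stubs (or upstream annotations on `__init__`) resolve it.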

@fmigneault To unblock some things in Debian, I'm uploading a version of your patch there; it includes fixes to the mypy stubs: https://salsa.debian.org/python-team/packages/python-schema-salad/-/commit/ec7c0d176131ea007fcb1085f5205aae323e9686 (but maybe you want to do things differently in your PR)

> This is because the mistune3 package doesn't have type hints for mistune.renderers.html.HTMLRenderer nor its superclass mistune.renderers.core.BaseRenderer; so we'll need to update the type stubs, not simply delete them.

Oh, you are right. I did not even notice.
Maybe it would be worth opening a PR against mistune and targeting a subsequent patch release? I can see that HTMLRenderer and BaseRenderer have type hints for all their methods except their __init__. The omitted type hints were most probably unintentional. It would be easier to avoid using stubs; otherwise those stubs must redefine all the methods we use, along with their types.

I'm fine with the Debian patch you provided.

Yes, we should send the type hints to them; I don't mind keeping our stubs until theirs are fixed.

Can you open a PR here with your fixes? I'll handle the typing

Using lepture/mistune#375, only this change was necessary:

- doc = markdown2html(doc)
+ doc = cast(str, markdown2html(doc))

Here's the PR: #787
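For readers unfamiliar with `typing.cast`, here is a minimal sketch of what the one-line change above does. It assumes, as the diff suggests, that the typed `markdown2html` callable is annotated to return something broader than `str`; `markdown2html_standin` and its union return type are hypothetical and only imitate that situation.

```python
from typing import Any, Dict, List, Union, cast


def markdown2html_standin(text: str) -> Union[str, List[Dict[str, Any]]]:
    """Hypothetical renderer annotated with a union return type."""
    return f"<p>{text}</p>"


# cast() only informs the type checker; it performs no conversion or check.
doc = cast(str, markdown2html_standin("hello"))
print(doc.upper())

# Demonstration that cast() is a runtime no-op: the int stays an int.
unchanged = cast(str, 42)
print(type(unchanged).__name__)
```

Because `cast` does not validate anything at runtime, it is only safe when the caller already knows the concrete type being returned.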