pylti / lti

Learning Tools Interoperability for Python

`process_xml` doesn't correctly parse `icon` in all cases

cutz opened this issue

I've run into an issue where XML that contains both `icon` and `cartridge_icon` elements is not parsed correctly. Take, for example, the XML from the recently added test case:

>>> CC_LTI_OPTIONAL_PARAMS_XML = b'''<?xml version="1.0" encoding="UTF-8"?>
... <cartridge_basiclti_link xmlns:blti="http://www.imsglobal.org/xsd/imsbasiclti_v1p0" xmlns:lticm="http://www.imsglobal.org/xsd/imslticm_v1p0" xmlns:lticp="http://www.imsglobal.org/xsd/imslticp_v1p0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.imsglobal.org/xsd/imslticc_v1p0" xsi:schemaLocation="http://www.imsglobal.org/xsd/imslticc_v1p0 http://www.imsglobal.org/xsd/lti/ltiv1p0/imslticc_v1p0.xsd http://www.imsglobal.org/xsd/imsbasiclti_v1p0 http://www.imsglobal.org/xsd/lti/ltiv1p0/imsbasiclti_v1p0p1.xsd http://www.imsglobal.org/xsd/imslticm_v1p0 http://www.imsglobal.org/xsd/lti/ltiv1p0/imslticm_v1p0.xsd http://www.imsglobal.org/xsd/imslticp_v1p0 http://www.imsglobal.org/xsd/lti/ltiv1p0/imslticp_v1p0.xsd">
...   <blti:title>Test config</blti:title>
...   <blti:description/>
...   <blti:launch_url>http://www.example.com</blti:launch_url>
...   <blti:secure_launch_url>http://www.example.com</blti:secure_launch_url>
...   <blti:icon>http://wil.to/_/beardslap.gif</blti:icon>
...   <blti:vendor/>
...   <cartridge_icon identifierref="BLTI001_Icon"/>
... </cartridge_basiclti_link>
... '''
>>> from lti.tool_config import ToolConfig
>>> config = ToolConfig.create_from_xml(CC_LTI_OPTIONAL_PARAMS_XML)
>>> config.icon
>>> config.icon is None
True
>>> 

The issue appears to be the use of the `in` operator to check XML tag names. Presumably this is done to get around the fact that tag names include the namespace information.

>>> icon_node.tag
'{http://www.imsglobal.org/xsd/imsbasiclti_v1p0}icon'
>>> 

This leads to problems when one tag name is a substring of another. For example, with `icon` and `cartridge_icon`, the config's icon is set correctly when `process_xml` encounters the `icon` tag, but is then overwritten when it encounters the `cartridge_icon` tag.
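The overwrite can be reproduced outside the library with a minimal sketch. It assumes the parser loops over the root's children and tests `'icon' in child.tag`; `parse_icon_by_substring` is a hypothetical stand-in, not the actual `process_xml` code, and it uses the standard library's `xml.etree.ElementTree` rather than lxml so it runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the test XML: one namespaced icon, one cartridge_icon.
XML = b"""<cartridge_basiclti_link
    xmlns="http://www.imsglobal.org/xsd/imslticc_v1p0"
    xmlns:blti="http://www.imsglobal.org/xsd/imsbasiclti_v1p0">
  <blti:icon>http://wil.to/_/beardslap.gif</blti:icon>
  <cartridge_icon identifierref="BLTI001_Icon"/>
</cartridge_basiclti_link>"""


def parse_icon_by_substring(xml_bytes):
    """Hypothetical stand-in for a substring-based tag check."""
    icon = None
    for child in ET.fromstring(xml_bytes):
        # 'icon' is a substring of both '{...}icon' and '{...}cartridge_icon',
        # so the second element also matches and replaces the first value.
        if 'icon' in child.tag:
            icon = child.text
    return icon


# The empty cartridge_icon element has no text, so the URL is lost.
print(parse_icon_by_substring(XML))  # prints None
```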

As this is potentially a larger change, I thought I would raise an issue here first. I would like to propose using `lxml.etree.QName` to check for exact matches of the local tag name. The existing tag name checks in `process_xml` would be changed to something like:

from lxml.etree import QName


def _is_tag(node, name):
    # Compare only the local part of the tag, ignoring the namespace.
    return QName(node).localname == name

...

if _is_tag(child, 'icon'):
    self.icon = child.text

...
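For anyone who wants to try the exact-match idea without lxml installed, here is a rough standard-library equivalent. It is only a sketch: `local_name` and `parse_icon_exact` are hypothetical names, and stripping the `{namespace}` prefix by hand is a stand-in for what `QName(node).localname` does in the proposal above.

```python
import xml.etree.ElementTree as ET

XML = b"""<cartridge_basiclti_link
    xmlns="http://www.imsglobal.org/xsd/imslticc_v1p0"
    xmlns:blti="http://www.imsglobal.org/xsd/imsbasiclti_v1p0">
  <blti:icon>http://wil.to/_/beardslap.gif</blti:icon>
  <cartridge_icon identifierref="BLTI001_Icon"/>
</cartridge_basiclti_link>"""


def local_name(tag):
    """Return the tag without its '{namespace}' prefix, if it has one."""
    return tag.rsplit('}', 1)[-1]


def parse_icon_exact(xml_bytes):
    """Hypothetical parser that only accepts an exact local-name match."""
    icon = None
    for child in ET.fromstring(xml_bytes):
        # 'cartridge_icon' != 'icon', so it no longer overwrites the value.
        if local_name(child.tag) == 'icon':
            icon = child.text
    return icon


print(parse_icon_exact(XML))  # prints http://wil.to/_/beardslap.gif
```

With an exact comparison the result no longer depends on the order in which the elements appear in the document.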

There is some evidence that this has been accounted for in the past using a series of if/elif blocks, e.g.:

            if 'secure_launch_url' in child.tag:
                self.secure_launch_url = child.text
            elif 'launch_url' in child.tag:
                self.launch_url = child.text

That approach could be taken here as well, but it seems like this may warrant a more general solution. I'm happy to provide a PR for this change if you would like to move forward.
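To illustrate why the if/elif approach is fragile, the following sketch models the chain as an ordered list of substring checks. `classify` is a hypothetical helper written for this illustration only; it is not part of the library.

```python
def classify(tag, checks):
    """Return the first name in `checks` that is a substring of `tag`.

    Mimics an if/elif chain of `name in tag` tests, evaluated in order.
    """
    for name in checks:
        if name in tag:
            return name
    return None


BLTI = '{http://www.imsglobal.org/xsd/imsbasiclti_v1p0}'
CC = '{http://www.imsglobal.org/xsd/imslticc_v1p0}'

# Longer name checked first: the secure URL is recognised correctly.
print(classify(BLTI + 'secure_launch_url', ['secure_launch_url', 'launch_url']))
# prints secure_launch_url

# Shorter name checked first: the secure URL is mistaken for launch_url.
print(classify(BLTI + 'secure_launch_url', ['launch_url', 'secure_launch_url']))
# prints launch_url

# With no earlier branch for cartridge_icon, it falls through to 'icon'.
print(classify(CC + 'cartridge_icon', ['secure_icon', 'icon']))
# prints icon
```

Every new tag whose name contains an existing one needs its own branch placed ahead of the shorter name, which is the maintenance burden an exact-match check avoids.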

Your suggestion seems reasonable to me, so I'm happy to review PRs to this effect. It'll take me a bit longer to review because I'm not all that familiar with the XML libraries (most of this code was written by other authors). But a more correct check of the tag name seems entirely appropriate to me.

Thanks for the report, and a PR would be most welcome!

There is also a `secure_icon` tag (<blti:secure_icon>). It is not produced by the `to_xml()` method and is not covered by any tests, although it should be parsed by the `process_xml()` method. I was planning to implement it shortly; will `secure_icon` behave similarly to `icon` when the XML is parsed?

It makes sense to me that it would fall into the same trap as `secure_launch_url`. If you don't want to take on this bit, I suppose you could add it using the same careful ordering that was described for the launch URL.

It seems that secure_icon is parsed fine:

>>> CC_LTI_OPTIONAL_PARAMS_XML = b'''<?xml version="1.0" encoding="UTF-8"?>
... <cartridge_basiclti_link xmlns:blti="http://www.imsglobal.org/xsd/imsbasiclti_v1p0" xmlns:lticm="http://www.imsglobal.org/xsd/imslticm_v1p0" xmlns:lticp="http://www.imsglobal.org/xsd/imslticp_v1p0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.imsglobal.org/xsd/imslticc_v1p0" xsi:schemaLocation="http://www.imsglobal.org/xsd/imslticc_v1p0 http://www.imsglobal.org/xsd/lti/ltiv1p0/imslticc_v1p0.xsd http://www.imsglobal.org/xsd/imsbasiclti_v1p0 http://www.imsglobal.org/xsd/lti/ltiv1p0/imsbasiclti_v1p0p1.xsd http://www.imsglobal.org/xsd/imslticm_v1p0 http://www.imsglobal.org/xsd/lti/ltiv1p0/imslticm_v1p0.xsd http://www.imsglobal.org/xsd/imslticp_v1p0 http://www.imsglobal.org/xsd/lti/ltiv1p0/imslticp_v1p0.xsd">
...   <blti:title>Test config</blti:title>
...   <blti:description/>
...   <blti:launch_url>http://www.example.com</blti:launch_url>
...   <blti:secure_launch_url>http://www.example.com</blti:secure_launch_url>
...   <blti:icon>http://wil.to/_/beardslap.gif</blti:icon>
...   <blti:secure_icon>https://wil.to/_/beardslap.gif</blti:secure_icon>
...   <blti:vendor/>
...   <cartridge_icon identifierref="BLTI001_Icon"/>
... </cartridge_basiclti_link>
... '''
>>> config = ToolConfig.create_from_xml(CC_LTI_OPTIONAL_PARAMS_XML)
>>> 
>>> config.secure_icon
'https://wil.to/_/beardslap.gif'

I planned to add it just after the icon tag.

Oh, well, for the serialization to XML the order shouldn't matter, so after should be fine there. It's the fuzzy `in` matching when parsing that can cause issues. If it's already working for parsing, then cool!

It's the tag-order-dependent parsing in `process_xml`, caused by substring-based tag matching, that I'm running into. I should be able to get to a PR as proposed in the original post in an hour or two.

Closed by #60