CUNY-CL / wikipron

Massively multilingual pronunciation mining

Dialect specifier breakage

kylebgorman opened this issue

Investigate this breakage:

    @pytest.mark.skipif(not can_connect_to_wiktionary(), reason="need Internet")
    def test_american_english_dialect_selection():
        # Pick a word for which Wiktionary has dialect-specified pronunciations
        # for both US and non-US English.
        word = "mocha"
        html_session = requests_html.HTMLSession()
        response = html_session.get(
            _PAGE_TEMPLATE.format(word=word), headers=HTTP_HEADERS
        )
        # Construct two configs to demonstrate the US dialect (non-)selection.
        config_only_us = config_factory(key="en", dialect="US | American English")
        config_any_dialect = config_factory(key="en")
        # Apply each config's XPath selector.
        results_only_us = response.html.xpath(config_only_us.pron_xpath_selector)
        results_any_dialect = response.html.xpath(
            config_any_dialect.pron_xpath_selector
        )
>       assert (
            len(results_any_dialect)  # containing both US and non-US results
            > len(results_only_us)  # containing only the US result
            > 0
        )
E       AssertionError: assert 2 > 2
E        +  where 2 = len([<Element 'li' >, <Element 'li' >])
E        +  and   2 = len([<Element 'li' >, <Element 'li' >])

tests/test_wikipron/test_config.py:202: AssertionError

The breakage indicates that even with dialect selection set to US | American English, we still obtain all pronunciations. E.g., for the "mocha" page used in this test, we grab both elements under the Pronunciation header even though the second one does not match the dialect specification.

This is currently blocking #509.

Hi @jacksonllee, sorry to bother you. Any intuitions about what's going on here? I suspect the failure of Latin to grab anything in #509 is related too.

The issue seems to be that the dialect selector expects @class = "ib-content qualifier-content", but the page markup now uses just @class = "ib-content". I'll try updating the selector to match and report back in a few days.
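
For anyone following along, here is a rough sketch of how an exact class comparison could produce the `assert 2 > 2` failure above. It is not wikipron's actual selector: the markup is a hypothetical simplification, `select_dialect` is an illustrative helper, and the rule that entries with no qualifier span are kept is an assumption about the selector's behavior rather than something confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (not copied from Wiktionary).
OLD_MARKUP = (
    "<ul>"
    '<li><span class="ib-content qualifier-content">US</span> /a/</li>'
    '<li><span class="ib-content qualifier-content">UK</span> /b/</li>'
    "</ul>"
)
# Same page after the assumed markup change: the class is now just "ib-content".
NEW_MARKUP = OLD_MARKUP.replace("ib-content qualifier-content", "ib-content")


def select_dialect(markup, dialect, qualifier_class):
    """Keep <li> items tagged with `dialect`, plus items with no qualifier.

    The "no qualifier" fallback is an assumption made for this sketch.
    """
    kept = []
    for li in ET.fromstring(markup).iter("li"):
        # Exact string comparison against the whole class attribute.
        qualifiers = li.findall(f"span[@class='{qualifier_class}']")
        if not qualifiers or any(q.text == dialect for q in qualifiers):
            kept.append(li)
    return kept


old_class = "ib-content qualifier-content"
print(len(select_dialect(OLD_MARKUP, "US", old_class)))     # stale class, old markup
print(len(select_dialect(NEW_MARKUP, "US", old_class)))     # stale class, new markup
print(len(select_dialect(NEW_MARKUP, "US", "ib-content")))  # updated class, new markup
```

Under these assumptions, the stale class value matches no span in the new markup, so every entry looks unqualified and is kept. That would make the US-only result the same size as the any-dialect result, and updating the class value restores the filtering.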