suyashb95 / WiktionaryParser

A Python Wiktionary Parser

returns nothing for Thai

garfieldnate opened this issue

>>> from wiktionaryparser import WiktionaryParser
>>> parser = WiktionaryParser()
>>> word = parser.fetch('ฉลาด')
>>> word
[]

The page is clearly there on the website: https://en.wiktionary.org/wiki/%E0%B8%89%E0%B8%A5%E0%B8%B2%E0%B8%94. I'm trying to scrape the pronunciations.

The language defaults to English. Pass the language explicitly:

parser.fetch('ฉลาด', language='thai')

Ah, that gets it. The info returned is not quite right, though:

[
    {
        'etymology': 'From Khmer ឆ្លាត (chlaat, “clever”). Compare Lao ສະຫລາດ (sa lāt).\n',
        'definitions': [
            {
                'partOfSpeech': 'adjective',
                'text': ['ฉลาด • (chà-làat) (abstract noun ความฉลาด)', 'clever; smart; intelligent.'],
                'relatedWords': [],
                'examples': []
            }
        ], 
        'pronunciations': {
            'text': ['From Khmer ឆ្លាត (chlaat, “clever”). Compare Lao ສະຫລາດ (sa lāt).\n'], 
            'audio': []
        }
    }, 
    {
        'etymology': '', 
        'definitions': [
            {
                'partOfSpeech': 'noun', 
                'text': ['ฉลาด • (chà-làat)', 'Alternative form of สลาด (slàat)'], 
                'relatedWords': [], 
                'examples': []
            }
        ], 
        'pronunciations': {
            'text': ['From Khmer ឆ្លាត (chlaat, “clever”). Compare Lao ສະຫລາດ (sa lāt).\n'], 
            'audio': []
        }
    }
]

The etymology is in the pronunciation text, and the pronunciation is missing altogether.

Yeah, the pronunciation section on this page is formatted differently from most other words. I'm still working on it.
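
Until that is fixed, the symptom reported in this thread (etymology text showing up in the `pronunciations` field) can be detected on the caller's side. The following is only a rough sketch: `find_misfiled_pronunciations` is a hypothetical helper, not part of WiktionaryParser, and it assumes the result has the same dictionary layout as the output posted in this issue.

```python
def find_misfiled_pronunciations(entries):
    """Return indexes of entries whose pronunciation text repeats an etymology.

    Hypothetical helper; `entries` is assumed to be the list returned by
    parser.fetch(), shaped like the output quoted in this issue.
    """
    # Collect every non-empty etymology string found in the result.
    etymologies = {entry.get('etymology', '').strip() for entry in entries}
    etymologies.discard('')

    suspicious = []
    for index, entry in enumerate(entries):
        texts = entry.get('pronunciations', {}).get('text', [])
        # Flag the entry if any pronunciation line is really an etymology.
        if any(text.strip() in etymologies for text in texts):
            suspicious.append(index)
    return suspicious


# Trimmed copy of the output from this issue (definitions omitted).
ETYMOLOGY = 'From Khmer ឆ្លាត (chlaat, “clever”). Compare Lao ສະຫລາດ (sa lāt).\n'
sample = [
    {'etymology': ETYMOLOGY,
     'pronunciations': {'text': [ETYMOLOGY], 'audio': []}},
    {'etymology': '',
     'pronunciations': {'text': [ETYMOLOGY], 'audio': []}},
]

print(find_misfiled_pronunciations(sample))  # [0, 1]
```

Both entries are flagged here because each one carries the etymology sentence in its pronunciation text. The check only detects the mix-up; it does not recover the missing pronunciation.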