returns nothing for Thai
garfieldnate opened this issue · comments
Nathan Glenn commented
>>> from wiktionaryparser import WiktionaryParser
>>> parser = WiktionaryParser()
>>> word = parser.fetch('ฉลาด')
>>> word
[]
The page is clearly there on the website: https://en.wiktionary.org/wiki/%E0%B8%89%E0%B8%A5%E0%B8%B2%E0%B8%94. I'm trying to scrape the pronunciations.
Surkal commented
The language defaults to English, so you need to pass it explicitly:
parser.fetch('ฉลาด', language='thai')
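If you don't know in advance which language section a word lives under, one option is to try several in order. This is only a rough sketch: `fetch_first_match` is a hypothetical helper, not part of the library, and it assumes nothing beyond `fetch(word, language=...)` returning an empty list when the page has no section for that language, as seen above.

```python
def fetch_first_match(parser, word, languages=("english",)):
    """Try each language in turn; return (language, entries) for the first hit.

    `parser` is any object with a fetch(word, language=...) method, such as a
    WiktionaryParser instance. This helper is hypothetical, not library API.
    """
    for language in languages:
        entries = parser.fetch(word, language=language)
        if entries:  # an empty list means no section for this language
            return language, entries
    return None, []


# Intended usage (needs network access and the wiktionaryparser package):
#   from wiktionaryparser import WiktionaryParser
#   language, entries = fetch_first_match(
#       WiktionaryParser(), 'ฉลาด', ("english", "thai"))
```

The languages are tried in the order given, so put the most likely one first to avoid extra requests.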
Nathan Glenn commented
Ah, that gets it. The info returned is not quite right, though:
[
    {
        'etymology': 'From Khmer ឆ្លាត (chlaat, “clever”). Compare Lao ສະຫລາດ (sa lāt).\n',
        'definitions': [
            {
                'partOfSpeech': 'adjective',
                'text': ['ฉลาด • (chà-làat) (abstract noun ความฉลาด)', 'clever; smart; intelligent.'],
                'relatedWords': [],
                'examples': []
            }
        ],
        'pronunciations': {
            'text': ['From Khmer ឆ្លាត (chlaat, “clever”). Compare Lao ສະຫລາດ (sa lāt).\n'],
            'audio': []
        }
    },
    {
        'etymology': '',
        'definitions': [
            {
                'partOfSpeech': 'noun',
                'text': ['ฉลาด • (chà-làat)', 'Alternative form of สลาด (slàat)'],
                'relatedWords': [],
                'examples': []
            }
        ],
        'pronunciations': {
            'text': ['From Khmer ឆ្លាត (chlaat, “clever”). Compare Lao ສະຫລາດ (sa lāt).\n'],
            'audio': []
        }
    }
]
The etymology is in the pronunciation text, and the pronunciation is missing altogether.
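For anyone who wants to spot this symptom across many words, here is a small check. It is only a sketch: `find_misplaced_etymology` is a hypothetical helper, and it assumes the result is a list of dicts with the `etymology` and `pronunciations` keys shown in the output above.

```python
def find_misplaced_etymology(entries):
    """Return indices of entries whose pronunciation text repeats an etymology.

    `entries` is assumed to be a list of dicts shaped like the output above.
    """
    # Collect every non-empty etymology string on the page.
    etymologies = {entry.get("etymology", "").strip() for entry in entries}
    etymologies.discard("")

    flagged = []
    for index, entry in enumerate(entries):
        texts = entry.get("pronunciations", {}).get("text", [])
        # Flag the entry if any pronunciation line is really an etymology.
        if any(text.strip() in etymologies for text in texts):
            flagged.append(index)
    return flagged
```

The etymologies are gathered from the whole result rather than per entry, because the second entry above has an empty `etymology` yet still carries the first entry's etymology in its pronunciation text.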
Suyash commented
Yeah, the pronunciation section on this page is formatted differently from most other words. I'm still working on it.