Separate simplified and traditional Chinese versions needed
peter17ji opened this issue · comments
Some countries have different names in Simplified and Traditional Chinese, so the two should be collected separately. For example, Antigua and Barbuda (zh-tw: 安地卡及巴布達; zh-cn: 安提瓜和巴布达) and Cyprus (zh-tw: 賽普勒斯; zh-cn: 塞浦路斯).
Wikipedia provides several Chinese variants: Mainland China (zh-cn), Taiwan (zh-tw), Hong Kong (zh-hk, not to be confused with Cantonese), Macau (zh-mo), Singapore (zh-sg), and Malaysia (zh-my). zh-cn and zh-tw are the main ones needed, but covering all of these variants would be appreciated.
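To illustrate what "collected separately" could look like, here is a minimal sketch that keeps one name table per variant. The layout, the `NAMES` table, the `country_name` helper, and the use of ISO 3166-1 alpha-2 codes as keys are all assumptions for illustration, not this project's actual data format. The names themselves are the ones quoted above.

```python
# Hypothetical layout: one name table per Chinese variant,
# keyed by ISO 3166-1 alpha-2 country code (an assumption, not the project's format).
NAMES = {
    "zh-cn": {"AG": "安提瓜和巴布达", "CY": "塞浦路斯"},
    "zh-tw": {"AG": "安地卡及巴布達", "CY": "賽普勒斯"},
}


def country_name(variant, code):
    """Return the country name for the given variant, or None if it is not collected."""
    return NAMES.get(variant, {}).get(code)


if __name__ == "__main__":
    for variant in NAMES:
        print(variant, country_name(variant, "CY"))
```

Keeping the variants in separate tables means a name from one variant cannot end up in the other's list by accident.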
This is what I am currently using. Is this the simplified Chinese? Please give me a link to the other one and I'll definitely add it. Thanks!
No, it's currently a mix of the two. It contains both zh-cn names (克罗地亚; zh-tw: 克羅埃西亞) and zh-tw names (賽普勒斯; zh-cn: 塞浦路斯).
https://zh.wikipedia.org/zh-cn/Wikipedia:%E9%A6%96%E9%A1%B5
https://zh.wikipedia.org/zh-tw/Wikipedia:%E9%A6%96%E9%A1%B5
https://zh.wikipedia.org/zh-hk/Wikipedia:%E9%A6%96%E9%A1%B5
https://zh.wikipedia.org/zh-mo/Wikipedia:%E9%A6%96%E9%A1%B5
https://zh.wikipedia.org/zh-my/Wikipedia:%E9%A6%96%E9%A1%B5
https://zh.wikipedia.org/zh-sg/Wikipedia:%E9%A6%96%E9%A1%B5
Again, if you give me links to a correct and regularly updated list of countries (preferably from Wikipedia), I will very gladly add them. If the Wikipedia list is out of date or incorrect, by all means go and update it - that is what Wikipedia is about, after all! I did so for the list in Romanian.
Oops, I thought you were asking me for the Wikipedia home pages of the different Chinese variants.
https://zh.wikipedia.org/zh-cn/%E4%B8%96%E7%95%8C%E6%94%BF%E5%8D%80%E7%B4%A2%E5%BC%95
https://zh.wikipedia.org/zh-hk/%E4%B8%96%E7%95%8C%E6%94%BF%E5%8D%80%E7%B4%A2%E5%BC%95
https://zh.wikipedia.org/zh-mo/%E4%B8%96%E7%95%8C%E6%94%BF%E5%8D%80%E7%B4%A2%E5%BC%95
https://zh.wikipedia.org/zh-my/%E4%B8%96%E7%95%8C%E6%94%BF%E5%8D%80%E7%B4%A2%E5%BC%95
https://zh.wikipedia.org/zh-sg/%E4%B8%96%E7%95%8C%E6%94%BF%E5%8D%80%E7%B4%A2%E5%BC%95
https://zh.wikipedia.org/zh-tw/%E4%B8%96%E7%95%8C%E6%94%BF%E5%8D%80%E7%B4%A2%E5%BC%95
It is better to use headless Chrome than an HTML parser like Beautiful Soup, because the conversion between Chinese variants is done on the fly.
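The links above all follow the same pattern: the variant code, then the percent-encoded page title (世界政區索引). A rough sketch of generating them follows. The names `VARIANTS`, `BASE`, `variant_url`, and `all_variant_urls` are made up for this example, and it only builds the URLs; loading and rendering each page is left to whichever browser tool you choose.

```python
from urllib.parse import quote

# Variant codes taken from the links in this thread.
VARIANTS = ["zh-cn", "zh-hk", "zh-mo", "zh-my", "zh-sg", "zh-tw"]
BASE = "https://zh.wikipedia.org"


def variant_url(variant, title):
    """Build the URL of a Chinese Wikipedia page for one variant."""
    # quote() percent-encodes the UTF-8 bytes of the title.
    return f"{BASE}/{variant}/{quote(title)}"


def all_variant_urls(title):
    """Map each variant code to the URL of the page in that variant."""
    return {variant: variant_url(variant, title) for variant in VARIANTS}


if __name__ == "__main__":
    for variant, url in all_variant_urls("世界政區索引").items():
        print(variant, url)
```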
This is now fixed by #32 and #35 and is available in release 1.1.1.
You can also grab it from npm or customize your download