avi-perl / Hebrew

A python package with methods to handle the complexities of Hebrew text, calculate Gematria, and more.

Home page: https://hebrew.aviperl.me/

The Method `Hebrew.no_niqqud` is misleading

saroad2 opened this issue

Hi there!

Thank you for this awesome library. It is very useful! I'm currently using it to create a machine-learning model for automatic niqqud.

I looked at the method Hebrew.no_niqqud, and in its current form it is very misleading. This method strips all of the niqqud characters from the word, but stripping them alone doesn't produce the word as it would be written without niqqud.

Here is an example:
Look at the word אֹהֶל (tent). If I want to write this word without niqqud, I need to add the letter ו to it: אוהל.
Currently, the function Hebrew.no_niqqud turns the word into אהל, which is an incorrect spelling.
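To make the problem concrete, here is a minimal sketch of what a plain "strip niqqud" operation does. It is a standalone illustration, not the library's implementation. The `NIQQUD` set is an assumption: the Hebrew point code points U+05B0 to U+05BD plus U+05BF, U+05C1, U+05C2, U+05C4, U+05C5 and U+05C7. Cantillation marks (U+0591 to U+05AF) and punctuation such as maqaf (U+05BE) are deliberately left alone.

```python
# Assumed niqqud set: Hebrew points from the Unicode Hebrew block,
# excluding punctuation (maqaf U+05BE, paseq U+05C0, sof pasuq U+05C3).
NIQQUD = frozenset(
    [chr(cp) for cp in range(0x05B0, 0x05BE)]  # sheva (U+05B0) .. meteg (U+05BD)
    + ["\u05BF", "\u05C1", "\u05C2", "\u05C4", "\u05C5", "\u05C7"]
)


def strip_niqqud(text: str) -> str:
    """Remove niqqud marks; letters and all other characters are kept."""
    return "".join(ch for ch in text if ch not in NIQQUD)


# אֹהֶל: alef + holam, he + segol, lamed
word = "\u05D0\u05B9\u05D4\u05B6\u05DC"
print(strip_niqqud(word))  # prints אהל, not the ktiv male spelling אוהל
```

Stripping only removes marks, so it can never produce the ו that the O vowel needs in full spelling; that is the gap the proposed rename makes explicit.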

My suggestion:
Rename the method Hebrew.no_niqqud to Hebrew.strip_niqqud. This name describes what the method actually does and is far less misleading.
After that, create a new function named Hebrew.ctiv_male (full writing) that removes niqqud smartly, adding ו and י (vav and yud) whenever needed.

Let me know what you think about the idea!

Oh wow, thank you! This is really valuable feedback!
Adding strip_niqqud and possibly deprecating no_niqqud or removing it entirely is a great idea.

As for adding a method that intelligently adds letters, this will need some study. If we claim to do this, we have to do it correctly. Are there other replacements besides the ones you mentioned? Are there cases where you should not add them? And so on.

This should definitely be possible, thanks for bringing this up!

Hi Avi! Thank you for the kind words.

I'll make sure to add the new strip_niqqud method and a deprecation warning for the no_niqqud method. I'll hopefully do it next week.
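One common way to handle such a rename is to keep the old name as a thin alias that emits a `DeprecationWarning`. The sketch below uses a hypothetical, minimal `Hebrew` stand-in class (its constructor and `string` attribute are assumptions, not the library's actual API) just to show the pattern with the standard `warnings` module.

```python
import warnings


class Hebrew:
    """Hypothetical minimal stand-in for the library's Hebrew class."""

    # Assumed niqqud code points (Hebrew points, excluding punctuation).
    _NIQQUD = frozenset(
        [chr(cp) for cp in range(0x05B0, 0x05BE)]
        + ["\u05BF", "\u05C1", "\u05C2", "\u05C4", "\u05C5", "\u05C7"]
    )

    def __init__(self, string: str):
        self.string = string

    def strip_niqqud(self) -> "Hebrew":
        """Accurately named method: it only removes niqqud marks."""
        return Hebrew("".join(c for c in self.string if c not in self._NIQQUD))

    def no_niqqud(self) -> "Hebrew":
        """Deprecated alias, kept so existing callers keep working."""
        warnings.warn(
            "Hebrew.no_niqqud is deprecated; use Hebrew.strip_niqqud instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return self.strip_niqqud()
```

Delegating to the new method keeps the behavior identical during the deprecation period, so the old name can later be removed without any change to results.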

As for the Ktiv Male rules, you can find all of them in this link. As you can see, some of them are really simple (like changing every Kibutz to Vav), and some are more complicated (like when to add a Vav for the O vowel). If we implement this method, I think we'll have to do it step by step. That means some of the rules won't be implemented right away, and there will be gaps for a while.
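The simplest rule mentioned, Kibutz to Vav, could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed implementation: a "cluster" is assumed to be a base character followed by its combining marks (Unicode category Mn), the names `clusters` and `apply_kibutz_rule` are hypothetical, and only this single rule is applied. Every other mark, including any trop, is simply dropped, so the output is not correct Ktiv Male for words that need other rules.

```python
import unicodedata
from typing import List, Tuple

KIBUTZ = "\u05BB"  # HEBREW POINT QUBUTS
VAV = "\u05D5"     # HEBREW LETTER VAV


def clusters(text: str) -> List[Tuple[str, List[str]]]:
    """Split text into (base character, [combining marks]) pairs."""
    out: List[Tuple[str, List[str]]] = []
    for ch in text:
        if unicodedata.category(ch) == "Mn" and out:
            out[-1][1].append(ch)  # attach the mark to the preceding base
        else:
            out.append((ch, []))
    return out


def apply_kibutz_rule(text: str) -> str:
    """Replace letter+Kibutz with letter+Vav; drop every other mark.

    Only this one Ktiv Male rule is implemented here.
    """
    result = []
    for base, marks in clusters(text):
        result.append(base)
        if KIBUTZ in marks:
            result.append(VAV)
    return "".join(result)


# שֻׁלְחָן (table): shin + kibutz + shin dot, lamed + sheva, het + kamatz, final nun
print(apply_kibutz_rule("\u05E9\u05BB\u05C1\u05DC\u05B0\u05D7\u05B8\u05DF"))  # prints שולחן
```

The example word happens to need only this rule; a word with a Holam or Hiriq would come out under-spelled, which is exactly the kind of gap discussed here.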

Let me know what you think.

I don't think it's proper to publish a method that does not do everything it claims to; as users, we would be quite upset by that. I'm afraid it's all or nothing on the letter replacement.

But there's no reason not to roll out the deprecation separately.

Thank you so much for the Ktiv Male rules source; it looks like we have an excellent place to pull some unit tests from 😀

It'll take some careful design to implement that pattern, taking into account things like the other non-letter characters that are not niqqud, such as the trop. Can we keep those characters while adding letters? Should those characters be moved over to the new letter in some cases (in which case keeping them likely won't work)? Efficiency needs to be considered, and grapheme handling certainly needs to be considered throughout.
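One possible starting point for the trop question is to split each grapheme-like cluster into its niqqud and its cantillation marks, so that niqqud can drive letter insertion while taamim stay attached to their original letter. The sketch below assumes taamim are exactly U+0591 to U+05AF and treats every other combining mark (category Mn) as niqqud, which means meteg (U+05BD) is classed as niqqud here; `split_marks` and `strip_niqqud_keep_taamim` are hypothetical names.

```python
import unicodedata
from typing import List, Tuple

# (base character, niqqud marks, cantillation marks)
Cluster = Tuple[str, List[str], List[str]]


def is_taam(ch: str) -> bool:
    """Assumption: cantillation marks (trop) are U+0591..U+05AF."""
    return "\u0591" <= ch <= "\u05AF"


def split_marks(text: str) -> List[Cluster]:
    """Group each base character with its niqqud and its taamim separately."""
    out: List[Cluster] = []
    for ch in text:
        if unicodedata.category(ch) == "Mn" and out:
            _, niqqud, taamim = out[-1]
            (taamim if is_taam(ch) else niqqud).append(ch)
        else:
            out.append((ch, [], []))
    return out


def strip_niqqud_keep_taamim(text: str) -> str:
    """Drop niqqud but keep each ta'am on the letter it was attached to."""
    return "".join(base + "".join(taamim) for base, _, taamim in split_marks(text))


# בְּרֵאשִׁ֖ית with a tipcha (U+0596) on the shin
word = "\u05D1\u05BC\u05B0\u05E8\u05B5\u05D0\u05E9\u05C1\u05B4\u0596\u05D9\u05EA"
print(strip_niqqud_keep_taamim(word))
```

Keeping niqqud and taamim in separate lists per cluster leaves the "should the trop move to the new letter?" decision open: a converter could insert a Vav or Yud after a cluster and still choose which letter the taamim end up on.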

I'll admit that I have zero knowledge about Taamim, so I'm not sure how to answer your question, and I don't want to say anything that might mislead you.

As for your comment about the all-or-nothing approach to the ktiv_male method, my only concern is the case of the Kamatz.

Kamatz can be either Kamatz Gadol, which is pronounced as an A sound (meaning no added letter is needed), or Kamatz Katan, which is pronounced as an O sound (meaning we need to add a Vav after the letter carrying the Kamatz). While there are two different characters for Kamatz Gadol (Unicode: U+05B8) and Kamatz Katan (Unicode: U+05C7), I'm afraid some users may use the Kamatz Gadol character for ALL Kamatz occurrences, which will lead to wrong results in the ktiv_male method.
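A converter cannot fix this ambiguity, but it could at least detect it. The sketch below is a hypothetical heuristic (the name `kamatz_is_ambiguous` is an illustration, not an existing API): if a text contains U+05B8 but never U+05C7, it may be using U+05B8 for every Kamatz, so any O-vowel Vav insertion for Kamatz would be unreliable. The example uses חָכְמָה (wisdom), whose first Kamatz is Katan and whose Ktiv Male spelling is חוכמה.

```python
import unicodedata

QAMATS = "\u05B8"        # HEBREW POINT QAMATS (Kamatz Gadol)
QAMATS_QATAN = "\u05C7"  # HEBREW POINT QAMATS QATAN (Kamatz Katan)


def kamatz_is_ambiguous(text: str) -> bool:
    """Heuristic: U+05B8 is present but U+05C7 never appears.

    Such text may be using the Kamatz Gadol character for every Kamatz,
    so a Ktiv Male converter cannot tell which ones are Katan.
    """
    return QAMATS in text and QAMATS_QATAN not in text


# חָכְמָה typed with U+05B8 on both letters vs. with U+05C7 on the het
typed_with_gadol_only = "\u05D7\u05B8\u05DB\u05B0\u05DE\u05B8\u05D4"
typed_with_katan = "\u05D7\u05C7\u05DB\u05B0\u05DE\u05B8\u05D4"

for ch in (QAMATS, QAMATS_QATAN):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
print(kamatz_is_ambiguous(typed_with_gadol_only), kamatz_is_ambiguous(typed_with_katan))
```

This only flags a whole text; a single word correctly typed with U+05B8 (a genuine Kamatz Gadol) would also trigger it, so it is a warning signal rather than a decision rule.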

It would be reasonable to decide not to implement the Ktiv Male method altogether, as it is not an easy method to implement, especially in an all-or-nothing situation. I leave it to you, as the owner of the library, to decide whether to go through with it. If you choose to, I'll do my best to help you with that.

P.S. If you still live in Florida, you and I are in the same time zone. Maybe it would be easier to schedule a Zoom meeting to discuss all of the possible issues we might face implementing Ktiv Male. Feel free to email me so we can talk :)