A set of quick samples that I wrote for the purpose of experimentation of using Python language on handling Tamil text. These were written around 2019.
Details are in the following blog posts:
-
Python and Google Cloud Vision for Tamil text [https://venkatarangan.com/blog/2019/09/python-and-google-cloud-vision-for-tamil-text/]
-
Tools and Applications in Tamil [https://venkatarangan.com/blog/2019/09/tools-applications-available-for-tamil/]
-
Python code snippets for Speech in Tamil [https://venkatarangan.com/blog/2019/08/python-code-snippets-for-speech-in-tamil/]
These are simple examples and trials. Please treat them as as such. Some might be useful too, but that's an unintended consequence.
While randomly checking my GitHub account on 25 May 2024, I came across this repository and noticed it lacked basic context and comments, I have added the same and made the repository public.
This project is licensed under the MIT License.
This project was an experiment and is not maintained. Just take it as it is. Thanks.
Venkatarangan Thirumalai venkatarangan.com
I got the base code from here. You need to install SpeechRecognition package, available through PIP and PyAudio (available through PIP in Linux, but on Windows, you need to install appropriate package from here.
In 2019, when I posted the speech to text code, I got a reader request for a code that does the reverse – to speak out loud a sentence of given Tamil text. I got the base code from here. You need to install gTTS package, available through PIP. The code uses os.system to play the output audio file, this command works out of the box in Windows, in Linux you may need a command-line player like MPG321 (which can be installed using sudo apt-get install mpg321) You can listen to the output audio that’s produced by the code in the sample output MP3 file titled: output-tamil-audio.mp3.
The above were possible, thanks, to the numerous readymade packages that are available for free, and, the magic of cloud – in these two cases I am using Google Cloud, which required no configuration or key for trial runs. You may use Bing if you have an Azure API key.
This is a Quick 'n' Dirty code snippet that uses Google Cloud Vision OCR to extract the Tamil text from a given image containing Tamil text, then uses Google Cloud Translate to translate the Tamil text to English. The text parsing needs a lot more work to be better.
Tamil Text present in the image file as recognized by the Cloud Vision OCR:
புதிய இடத்தில் அதெல்லாம் பலிக்குமா?
பெரியவர் ஆறுமுகத்தோடு மல்லிகா நாகப்பட்டினத்
தில் கப்பலேறியபோது வழியனுப்ப வந்தவர்கள் வாய்
வார்த்தையின்றி அழுதழுது கண்ணைக் கடலாக்கிக்
கொண்டார்கள்.
அவர்களை அவள்தான் தேற்ற வேண்டியிருந்தது
"மாமாவின் கடைசி ஆசையை நான் நிறைவேற்ற
வேண்டாமா? அந்த ஆத்மாவுக்குச் சாந்தி கிடைக்க
வேண்டாமா? கோடை விடுமுறையில் தானே போகிறேன்.
The output English translation:
Will it be sacrificed in the new place? When Mallika sailed to Nagapattinam with the eldest, the passersby cried without a word. "Do I not fulfill my uncle's last wish? Do I not find peace with that soul? I am going on summer vacation.
This project uses the Grapheme package, which is a wonderful package to perform string operations on language recognized characters.
In 2004, when I faced the issue of counting of number of characters for a given Tamil string, the standard library functions were not useful for non-latin scripts, so I had to write a paper for a conference and submitted code written in .NET, Perl and VB.NET to solve this problem, specific for Tamil. Now, Grapheme makes it super easy and is updated upto Unicode 12.0.0 standard.