In Natural Language Processing, there are numerous applications for searching for information in texts written in
languages around the world. For Arabic, however, such applications remain rare because of the complexity of Arabic
text processing and the language's unique characteristics.
In this project, three pattern matching algorithms were implemented, tested, and compared on Arabic text: Brute-Force (Naïve), Boyer-Moore-Horspool, and Knuth-Morris-Pratt.
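The three algorithms can be sketched in Java as follows. This is a minimal illustration, not the project's own code: the class name `PatternSearch` and its method names are hypothetical, each method returns every starting index of the pattern in the text, and characters are compared one UTF-16 `char` at a time (Arabic letters lie in the Basic Multilingual Plane, so one letter is one `char`). The Horspool shift table uses a `HashMap` instead of the usual 256-entry array because Arabic code points fall outside that range.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: three exact-match algorithms, each returning all start indices. */
final class PatternSearch {

    // Brute-Force (Naive): try every alignment, compare left to right.
    static List<Integer> bruteForce(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        int n = text.length(), m = pattern.length();
        if (m == 0) return hits;
        for (int i = 0; i + m <= n; i++) {
            int j = 0;
            while (j < m && text.charAt(i + j) == pattern.charAt(j)) j++;
            if (j == m) hits.add(i);
        }
        return hits;
    }

    // Boyer-Moore-Horspool: compare right to left, then shift by the bad-character
    // distance of the text char aligned with the pattern's last position.
    static List<Integer> horspool(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        int n = text.length(), m = pattern.length();
        if (m == 0) return hits;
        Map<Character, Integer> shift = new HashMap<>(); // chars absent from the map shift by m
        for (int k = 0; k < m - 1; k++) shift.put(pattern.charAt(k), m - 1 - k);
        int i = 0;
        while (i + m <= n) {
            int j = m - 1;
            while (j >= 0 && text.charAt(i + j) == pattern.charAt(j)) j--;
            if (j < 0) hits.add(i);
            i += shift.getOrDefault(text.charAt(i + m - 1), m);
        }
        return hits;
    }

    // Knuth-Morris-Pratt: the failure table lets the scan never move backwards in the text.
    static List<Integer> kmp(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        int m = pattern.length();
        if (m == 0) return hits;
        int[] fail = new int[m]; // fail[q] = length of the longest proper border of pattern[0..q]
        for (int q = 1, k = 0; q < m; q++) {
            while (k > 0 && pattern.charAt(q) != pattern.charAt(k)) k = fail[k - 1];
            if (pattern.charAt(q) == pattern.charAt(k)) k++;
            fail[q] = k;
        }
        for (int i = 0, q = 0; i < text.length(); i++) {
            while (q > 0 && text.charAt(i) != pattern.charAt(q)) q = fail[q - 1];
            if (text.charAt(i) == pattern.charAt(q)) q++;
            if (q == m) {
                hits.add(i - m + 1);
                q = fail[q - 1]; // keep going to find overlapping matches
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Text "ktb ktb" and pattern "ktb", written as Unicode escapes so any source encoding works.
        String text = "\u0643\u062A\u0628 \u0643\u062A\u0628";
        String pattern = "\u0643\u062A\u0628";
        System.out.println(bruteForce(text, pattern)); // [0, 4]
        System.out.println(horspool(text, pattern));   // [0, 4]
        System.out.println(kmp(text, pattern));        // [0, 4]
    }
}
```

All three return the same positions; they differ only in cost. Brute-Force is O(nm) in the worst case, Knuth-Morris-Pratt is O(n + m), and Boyer-Moore-Horspool is O(nm) in the worst case but often skips ahead by up to the pattern length in practice.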
The goal of this project is to solve the problem of searching for information in Arabic texts, whether or not the text
contains diacritical marks. This platform uses Artificial Intelligence and Natural Language Processing algorithms not only
to let the user easily search for information in Arabic texts, but also to determine how often the search term occurs and
to locate it precisely within the text.
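One way to search regardless of diacritics while still reporting exact positions in the original text is to strip the diacritics from both the text and the query, and to keep a map from each remaining character back to its original index. The sketch below assumes that only the eight basic harakat (U+064B to U+0652, fathatan through sukun) count as diacritics, and it uses `String.indexOf` in place of whichever of the three matching algorithms is selected. `DiacriticInsensitiveSearch`, `strip`, and `find` are hypothetical names, not the project's `Strip_shakeel` function.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: diacritic-insensitive search that reports spans in the original text. */
final class DiacriticInsensitiveSearch {

    // Assumption: only the basic harakat U+064B..U+0652 are treated as removable diacritics.
    static boolean isHaraka(char c) {
        return c >= '\u064B' && c <= '\u0652';
    }

    static String strip(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            if (!isHaraka(s.charAt(i))) sb.append(s.charAt(i));
        }
        return sb.toString();
    }

    /** Returns a [start, end) span in the ORIGINAL text for every match. */
    static List<int[]> find(String text, String query) {
        StringBuilder bare = new StringBuilder(text.length());
        int[] toOriginal = new int[text.length()]; // index in bare text -> index in original text
        for (int i = 0; i < text.length(); i++) {
            if (!isHaraka(text.charAt(i))) {
                toOriginal[bare.length()] = i;
                bare.append(text.charAt(i));
            }
        }
        String needle = strip(query);
        List<int[]> spans = new ArrayList<>();
        if (needle.isEmpty()) return spans;
        String hay = bare.toString();
        for (int s = hay.indexOf(needle); s >= 0; s = hay.indexOf(needle, s + 1)) {
            int start = toOriginal[s];
            int end = toOriginal[s + needle.length() - 1] + 1;
            // Extend the span over diacritics attached to the last matched letter.
            while (end < text.length() && isHaraka(text.charAt(end))) end++;
            spans.add(new int[] {start, end});
        }
        return spans;
    }

    public static void main(String[] args) {
        // Text: fully vowelled "kataba", a space, then bare "ktb" (Unicode escapes).
        String text = "\u0643\u064E\u062A\u064E\u0628\u064E \u0643\u062A\u0628";
        String query = "\u0643\u062A\u0628"; // bare "ktb", no diacritics
        List<int[]> spans = find(text, query);
        System.out.println(spans.size() + " occurrence(s)"); // 2 occurrence(s)
        for (int[] sp : spans) {
            System.out.println("[" + sp[0] + ", " + sp[1] + ")"); // [0, 6) then [7, 10)
        }
    }
}
```

The occurrence count is the number of spans returned, and each span can be used to highlight the match in the original, fully vowelled text.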
The project uses various algorithms and methods for pattern matching in the Arabic language, including:
- Pattern Matching Algorithms (Brute-Force, Boyer-Moore-Horspool, Knuth-Morris-Pratt)
- Token Count Vectorizer
- Arabic shakeel Function
- Strip_shakeel Function
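A token count vectorizer turns each text into a vector of term frequencies over a fixed vocabulary. The sketch below is a hypothetical minimal version, not the project's implementation: it assumes whitespace tokenization and normalizes each token by removing the harakat U+064B to U+0652, so vowelled and unvowelled spellings of a word are counted as the same token, in the spirit of the Strip_shakeel step listed above. The class and method names (`TokenCountVectorizer`, `fit`, `transform`) are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical minimal token count vectorizer (bag of words) for Arabic text. */
final class TokenCountVectorizer {
    private final Map<String, Integer> vocabulary = new TreeMap<>(); // token -> column index

    // Assumption: whitespace tokenization; harakat U+064B..U+0652 are removed from each token.
    static List<String> tokenize(String doc) {
        List<String> tokens = new ArrayList<>();
        for (String raw : doc.trim().split("\\s+")) {
            String token = raw.replaceAll("[\\u064B-\\u0652]", "");
            if (!token.isEmpty()) tokens.add(token);
        }
        return tokens;
    }

    // Learn the vocabulary: every distinct normalized token, indexed in sorted order.
    void fit(List<String> docs) {
        TreeSet<String> terms = new TreeSet<>();
        for (String doc : docs) terms.addAll(tokenize(doc));
        vocabulary.clear();
        for (String term : terms) vocabulary.put(term, vocabulary.size());
    }

    // Count how often each vocabulary token occurs in doc; tokens unseen during fit are ignored.
    int[] transform(String doc) {
        int[] counts = new int[vocabulary.size()];
        for (String token : tokenize(doc)) {
            Integer column = vocabulary.get(token);
            if (column != null) counts[column]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // doc1: vowelled "kataba" then bare "ktb"; doc2: "qr'" then "ktb" (Unicode escapes).
        String doc1 = "\u0643\u064E\u062A\u064E\u0628\u064E \u0643\u062A\u0628";
        String doc2 = "\u0642\u0631\u0623 \u0643\u062A\u0628";
        TokenCountVectorizer vectorizer = new TokenCountVectorizer();
        vectorizer.fit(List.of(doc1, doc2));
        System.out.println(Arrays.toString(vectorizer.transform(doc1))); // [0, 2]
        System.out.println(Arrays.toString(vectorizer.transform(doc2))); // [1, 1]
    }
}
```

Because both spellings in `doc1` normalize to the same token, its count vector records two occurrences of that single term.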
The project is written in Java (J2EE), runs on Apache Tomcat, and uses HTML5, CSS3, JavaScript, and MySQL.
This project demonstrates the potential of Artificial Intelligence and Natural Language Processing algorithms for searching Arabic texts, whether or not diacritical marks are present. Together, the algorithms and methods used in the project provide a comprehensive approach that not only lets users easily search for information in Arabic texts, but also determines how often the search term occurs and locates it precisely within the text.