nipunsadvilkar / pySBD

🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection module that works out-of-the-box.

Arabic sentence split on the Arabic comma

ymoslem opened this issue

Describe the bug
The Arabic segmenter splits a sentence at the Arabic comma (،), even though the comma does not end a sentence.

To Reproduce
Steps to reproduce the behavior:

import pysbd
text = "هذه تجربة، للغة العربية"
seg = pysbd.Segmenter(language="ar", clean=True)
print(seg.segment(text))

Actual output: ['هذه تجربة،', 'للغة العربية']

Expected behavior
The text should not be split on the Arabic comma.
Expected output: ['هذه تجربة، للغة العربية']

Additional context
I fixed it locally by editing pysbd/lang/arabic.py and removing ، (the Arabic comma) from SENTENCE_BOUNDARY_REGEX.
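
To illustrate why dropping the comma from the boundary pattern changes the result, here is a minimal, self-contained sketch. The two patterns and the split_sentences helper are hypothetical simplifications written for this example; they are not the actual SENTENCE_BOUNDARY_REGEX from pysbd/lang/arabic.py, which may be structured differently.

```python
import re

# Hypothetical, simplified boundary patterns (NOT the real pySBD regex).
# Each matches a run of non-terminator characters plus any trailing terminators.
WITH_COMMA = re.compile(r"[^.!?؟،]+[.!?؟،]*")     # treats the Arabic comma as a terminator
WITHOUT_COMMA = re.compile(r"[^.!?؟،]+[.!?؟]*|،")  # placeholder, replaced below for clarity

# Simpler equivalent for the fixed case: the comma is just an ordinary character.
WITHOUT_COMMA = re.compile(r"[^.!?؟]+[.!?؟]*")


def split_sentences(text, pattern):
    """Return the stripped, non-empty matches of `pattern` in `text`."""
    return [m.group().strip() for m in pattern.finditer(text) if m.group().strip()]


text = "هذه تجربة، للغة العربية"

# With the comma in the terminator set, the text is cut after the comma.
print(split_sentences(text, WITH_COMMA))

# With the comma removed from the terminator set, the text stays in one piece.
print(split_sentences(text, WITHOUT_COMMA))
```

In this sketch the first call yields two segments and the second yields one, which mirrors the reported and expected outputs above. Whether the same one-character removal is sufficient in pySBD itself depends on how the library's other Arabic rules interact with the comma.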