intersystems / iknow

Community development repository for iKnow

Mark questions with a different end-marker

ISC-SDE opened this issue

Each sentence gets two marker-lexreps: B with label SBegin at the beginning, and E with label SEnd at the end.
For linguistic processing, it would be helpful to know whether the input is an affirmative sentence or a question. Therefore, I would like to get different markers for questions, especially at the end of the sentence.

Proposal:
affirmative sentence: SBegin - SEnd
question in Spanish (with an inverted question mark at the beginning of the sentence): QBegin - QEnd
question in other Western languages: SBegin or QBegin (whatever is easier to implement) - QEnd
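
To make the proposal concrete, here is a minimal Python sketch of the mapping from sentence punctuation to marker labels. It is only an illustration, not iKnow code: the function name `proposed_marker_labels` is hypothetical, and for "other Western languages" it assumes the SBegin variant of the begin marker.

```python
def proposed_marker_labels(sentence: str) -> tuple[str, str]:
    """Return the (begin, end) marker labels suggested in the proposal.

    Hypothetical helper for illustration; the engine itself assigns
    labels to the B and E marker-lexreps internally.
    """
    text = sentence.strip()
    opens_with_inverted_mark = text.startswith("¿")  # Spanish-style question
    ends_with_question_mark = text.endswith("?")

    # Assumption: only an inverted question mark triggers QBegin.
    begin = "QBegin" if opens_with_inverted_mark else "SBegin"
    # Any question, Spanish or otherwise, gets QEnd instead of SEnd.
    is_question = opens_with_inverted_mark or ends_with_question_mark
    end = "QEnd" if is_question else "SEnd"
    return begin, end


print(proposed_marker_labels("This is a statement."))
print(proposed_marker_labels("¿Dónde está la biblioteca?"))
print(proposed_marker_labels("Are you a little confused ?"))
```

Using QBegin for all questions instead would only change the `begin` line; the proposal leaves that choice to whatever is easier to implement.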

The proposal sounds good.
In Japanese, questions can end with "?" as in English, but they can also end with "。" like non-questions; normally the sentence structure determines whether a sentence is a question. Still, it would probably be helpful to have the same implementation as proposed for "other Western languages", i.e., if a sentence ends with a question mark, it gets an end-of-sentence label other than SEnd.

Branch Issue#183 has been created to test this new functionality. At present, the following labels are assigned:

(text input is "Are you a little confused ?")

+[8] "LexrepIdentified:<lexrep id=1 type=Nonrelevant value="" index="B" labels="SBegin;QBegin;" />;" std::string
+[15] "LexrepIdentified:<lexrep id=10 type=Nonrelevant value="" index="E" labels="SEnd;QEnd;" />;" std::string

This has been done to preserve the current test material, but it offers the possibility to write rules referencing the new "QBegin" and "QEnd" labels.
I can remove the "SBegin" and "SEnd" labels later if needed.
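
As a rough illustration of what the combined labels allow, the Python sketch below reads the `labels` attribute from a trace line like the ones shown and checks for the new label alongside the old one. The helper `parse_labels` is hypothetical and is not part of iKnow; it only assumes the semicolon-separated `labels="..."` format visible in the trace.

```python
import re


def parse_labels(trace_line: str) -> list[str]:
    """Extract the semicolon-separated labels from a LexrepIdentified trace line.

    Hypothetical helper: it relies only on the labels="..." attribute
    format shown in the trace output.
    """
    match = re.search(r'labels="([^"]*)"', trace_line)
    if match is None:
        return []
    # The attribute ends with a trailing ";", so drop empty fragments.
    return [label for label in match.group(1).split(";") if label]


end_marker = ('LexrepIdentified:<lexrep id=10 type=Nonrelevant value="" '
              'index="E" labels="SEnd;QEnd;" />;')
labels = parse_labels(end_marker)

print(labels)            # ['SEnd', 'QEnd']
print("SEnd" in labels)  # True: existing rules and tests still match
print("QEnd" in labels)  # True: new rules can target questions
```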

Thank you for the implementation.
As a test, I added some rules using QBegin and QEnd to the language models, and they worked as expected, so the branch is ready to be merged into master.

Ok, I'll merge the branch into master.