medzuslovjansky / database-engine

Motor za rabotu s: https://github.com/medzuslovjansky/database

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

@interslavic/database-engine

npm version Build Status Coverage Status

Utilities to flavorize dictionary entries and calculate a smarter distance between languages.

To install in your project, use:

npm install @interslavic/database-engine --save

Flavorization rules

At the moment, rules are stored in CSV files under __fixture__/rules directory.

To calculate written intelligibility between Interslavic words and words from another Slavic language, this repository relies on the following algorithm:

  1. Every Interslavic entry gets processed in three separate ways:
    1. Etymological. The entry goes through the replacement rules, that have E letter in their respective flavorizationLevel column.
    2. Standard. Only rules with S letter in the flavorizationLevel column will be applied for the transformation.
    3. Mistaken. Same as above, but for M letter.
  2. The translation variants undergo all transformations with R (i.e., Reverse) letter defined in their flavorization level column.

In the end, there will be multiple variants to compare, e.g.:

Interslavic word: jęčmenny
Russian translation: ячменный (jačmennyj)

Mistaken reading (distance = 6):
jęčmenny -> jecmenny -> джецменну (džecmennu)

Standard reading (distance = 2):
jęčmenny -> ječmenny -> йечменны -> ечменны

Etymological reading (distance = 0):
jęčmenny (adj.) -> jačmenny -> jačmennyj -> йачменный -> ячменный

The algorithm uses Levenshtein's editing distance to tell how close the Interslavic word and its translation are. If the base letters are the same except for diacrictics, the editing distance is considered to be 0.5 characters instead of 1.

About

Motor za rabotu s: https://github.com/medzuslovjansky/database

License:GNU General Public License v3.0


Languages

Language:JavaScript 57.5%Language:TypeScript 42.5%Language:Shell 0.0%