akosbalasko / yarle

Yarle - The ultimate converter of Evernote notes to Markdown

Home Page:https://github.com/akosbalasko/yarle

Feature: apply links based on Levenshtein distance

akosbalasko opened this issue

matboehmer's great idea: to increase the number of links that can be resolved to their target notes, Yarle could calculate the Levenshtein distance between the text of the link and the titles of the existing notes, and apply the link to the note with the minimal distance.

If more than one note has the minimal distance, Yarle could, based on another setting, do one of the following:

  1. do not add a link to any of them (this results in link loss)
  2. link to all of the notes (this results in extra links which were not set in Evernote)
  3. link to the first of them (this may result in mixed-up links)

As an MVP I would implement case 3.
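
To make case 3 concrete, here is a minimal sketch of the idea, not Yarle's actual implementation. The function names `levenshtein` and `pickClosestTitle` are hypothetical and chosen only for illustration; the distance is the classic dynamic-programming version where an insertion, a deletion, and a substitution each cost 1.

```typescript
// Levenshtein distance between two strings, computed row by row.
function levenshtein(a: string, b: string): number {
  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Case 3 (hypothetical helper): return the FIRST title with the minimal distance.
// The strict "<" keeps the earlier title when two titles are equally close.
function pickClosestTitle(linkText: string, titles: string[]): string | undefined {
  let best: string | undefined;
  let bestDistance = Infinity;
  for (const title of titles) {
    const d = levenshtein(linkText, title);
    if (d < bestDistance) {
      bestDistance = d;
      best = title;
    }
  }
  return best;
}
```

Cases 1 and 2 would differ only in the tie handling: collect every title that shares the minimal distance, then either drop the link or link to all of them.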

Thanks! Really looking forward to this one. Happy to serve as a tester.

Hi @matboehmer !
I've created a pre-release with this Levenshtein-distance linking feature. Feel free to download it from here https://github.com/akosbalasko/yarle/releases/tag/v5.8.0 and test it.
Thanks a lot!

Thanks, great! How can I run the code using npx or any other way? I am not sure if npx -p yarle-evernote-to-md@5.8.0 yarle --configFile config.json uses the latest code.

@matboehmer
Yes, it should work as you wrote. Just extend your config.json with a new property:

useLevenshteinForLinks: true

Thanks, got it! However, it does not work for me. It seems like applyLinks in apply-links.js is only called once, and the if (options.useLevenshteinForLinks) block is also only entered once (I added a console output for debugging). However, in the test set I posted in #530 there are 4 links. So, from my understanding, the Levenshtein lookup should also be done 4 times?

Hm... it iterates over the recognized links and replaces the link URLs everywhere in the notes folder. Let me check.

It's hard to create a real test for multiple links, because Evernote currently fails to sync for me. So it will take a bit of time, sorry.

@matboehmer could you pls give it a try via the UI?
thanks a lot!

Same result; also does not work using the app UI. Does it work for you? Do you have some test data you could share?

It works for me with your data set, but not with the one I posted here #530 (comment)

I think the one you shared in that comment reflects a different issue, which cannot be resolved easily.
What I implemented is that the referenced note is recognized by the shortest Levenshtein distance to the link text.
For instance, if a note's title is mistyped, e.g. notA is typed instead of noteA, and there is no note whose name is more similar, then notA is going to be picked.

In my example data in #530 (comment) the wrong link is created as [[first-note|second note]] in both files, first-note and second-note. However, the link [[first-note|second note]] could be fixed to [[second-note|second note]] (i.e., replacing first-note with second-note) by looking up the proper link target using the Levenshtein distance.
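
The repair described above can be sketched roughly as follows. This is only an illustration under assumptions, not the code in apply-links.js: `editDistance` and `relinkByAlias` are hypothetical names, the links are assumed to have the [[target|alias]] form, and the comparison is assumed to be case-insensitive.

```typescript
// Levenshtein distance using a single reusable row.
function editDistance(a: string, b: string): number {
  const row: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // value from the previous row, column j-1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // value from the previous row, column j
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Hypothetical helper: for every [[target|alias]] link, replace the target
// with the note name closest to the alias (first one wins on ties).
function relinkByAlias(markdown: string, noteNames: string[]): string {
  const wikiLink = /\[\[([^\]|]+)\|([^\]]+)\]\]/g;
  return markdown.replace(wikiLink, (match: string, _target: string, alias: string) => {
    let best: string | undefined;
    let bestDistance = Infinity;
    for (const name of noteNames) {
      const d = editDistance(alias.toLowerCase(), name.toLowerCase());
      if (d < bestDistance) {
        bestDistance = d;
        best = name;
      }
    }
    // Leave the link untouched when there are no candidate notes.
    return best === undefined ? match : `[[${best}|${alias}]]`;
  });
}
```

With the note names first-note and second-note, the alias "second note" is one edit away from second-note (space versus hyphen), so the link target is rewritten from first-note to second-note.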

@matboehmer ,
Okay, I found a bug in the unique id recognizer that could cause links to overlap each other. It is fixed now. I checked with your example, and as far as I can see it fixes your issue, but please confirm.
Thanks a lot!

Great, thank you! Works perfectly now on my test data set and already really good on my real data set. Thank you very much for adding this feature!