LinkedDataFragments / HDT-Node

Native bindings for Node.js to access HDT compressed triple files.

Home page: http://ruben.verborgh.org/blog/2014/09/30/bringing-fast-triples-to-nodejs-with-hdt/

Prefix search for literals/strings

walling opened this issue.

The HDT-it! browser offers prefix search for literals as you type in its fields. It would be nice to support this in this module as well, along the lines of:

// Proposed API (not yet implemented): page through literals starting with '"nice '
hdtDocument.findLiteralsByPrefix(
  '"nice ',
  { offset: 0, limit: 10 },
  function (error, literals, totalCount) {
    console.log('Found approx. %s literals matching prefix pattern: %s', totalCount, literals.join(' '));
  }
);
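A minimal sketch of the proposed semantics, assuming the literals are available as a lexicographically sorted JavaScript array. The module exposes no such array; `findLiteralsByPrefix` here is a hypothetical standalone function, not the hdt API. Because all strings sharing a prefix form one contiguous run in sorted order, a binary search finds the start of the run, and `offset`/`limit` page through it:

```javascript
// Hypothetical sketch: prefix search over a sorted array of literal strings.
// Not part of the hdt module; all names are illustrative only.

// Returns the first index i such that sorted[i] >= key (lower bound).
function lowerBound(sorted, key) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Finds strings starting with `prefix`, paged by offset/limit, and passes the
// total number of matches to the callback, mirroring the proposed shape.
function findLiteralsByPrefix(sorted, prefix, { offset = 0, limit = Infinity } = {}, callback) {
  const start = lowerBound(sorted, prefix);
  let end = start;
  // Matches are contiguous in sorted order; scan until the prefix stops matching.
  while (end < sorted.length && sorted[end].startsWith(prefix)) end++;
  const totalCount = end - start;
  const from = start + offset;
  const to = Math.min(end, from + limit);
  callback(null, sorted.slice(from, to), totalCount);
}

// Usage: the array must be sorted with the same ordering that `<` uses.
const literals = ['"bad day"', '"nice day"', '"nice weather"', '"nicer"', '"other"'].sort();
findLiteralsByPrefix(literals, '"nice ', { offset: 0, limit: 10 }, (error, found, totalCount) => {
  console.log('Found %s literals: %s', totalCount, found.join(' '));
  // prints: Found 2 literals: "nice day" "nice weather"
});
```

Here `totalCount` is exact because the whole matching run is scanned; an index-backed implementation might only estimate it, which would fit the "approx." wording in the proposal.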

Do you think it's possible to implement?

Absolutely; I have added this to the list of future features.
Pull requests welcome 😉

Ha, I added substring search a while ago but forgot to mention this here.
Is this useful for what you want to do?

Sorry I haven't gotten back to you for so long. It is definitely useful. However, I get the error “The HDT document does not support literal search” when testing it on some HDT files downloaded from the rdfhdt.org website. Can I easily generate more up-to-date HDT files myself, or is there a good place to download some?

Btw. I think you can close this issue. :)

Sure! It's not about being up-to-date; you need to create an HDT file with support for literal search. Not all HDT files include this, because it adds overhead (for instance, a larger file size).

To create an HDT file with a literal index, execute:

rdf2hdt -c fmindex.hdtcfg -f turtle input.ttl output.hdt

where fmindex.hdtcfg is a configuration file that enables the literal index.

Ah, maybe there's a difference between substring search and prefix search; I think that is part of my confusion. In the HDT-it! app I can prefix search in all fields with the files I downloaded from the rdfhdt.org website, but as far as I know the app doesn't offer substring search. I'll play around with this feature some more. Thank you!

Ah yes indeed, there is a difference. That's probably why I didn't close this issue originally.
Does this help for your use case, or do you need more?
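To make the distinction concrete, here is an illustrative comparison over a plain array. It only shows the matching semantics; the module's literal search relies on an index built into the HDT file rather than a linear filter like this, and `prefixMatches`/`substringMatches` are hypothetical helpers.

```javascript
// Illustrative only: prefix matching (as in HDT-it! while typing) versus
// substring matching (as in the module's literal search). Linear filters,
// not the index-backed implementation.
const literals = ['"nice day"', '"a nice day"', '"nicely done"', '"other"'];

// Prefix search: the string must begin with the query.
const prefixMatches = (values, query) => values.filter((v) => v.startsWith(query));

// Substring search: the query may occur anywhere in the string.
const substringMatches = (values, query) => values.filter((v) => v.includes(query));

console.log(prefixMatches(literals, '"nice'));   // [ '"nice day"', '"nicely done"' ]
console.log(substringMatches(literals, 'nice')); // [ '"nice day"', '"a nice day"', '"nicely done"' ]
```

A prefix query can be answered from a sorted dictionary with a range lookup, whereas a substring query needs a dedicated index to avoid scanning every literal, which is one reason the two features behave differently across HDT files.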

Well, both are nice for different use cases. Tonight I started looking at implementing HdtDocument#suggestions(prefix, options, callback, self). Example:

doc.suggestions(
  'http://wiktionary.dbpedia.org/resource/linked_',
  { component: 'object',
    limit: 20 },
  console.log
);
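One way to read the proposed `suggestions` call is: collect the distinct values of one triple component that start with the prefix, sorted and capped at `limit`. The sketch below assumes an in-memory array of `{ subject, predicate, object }` triples; it is hypothetical, not the module's API or the C++ `getSuggestions` code, and the default component `'subject'` and the example triples are made up for illustration.

```javascript
// Hypothetical sketch of suggestions(prefix, options, callback) semantics
// over in-memory triples; not the hdt module's implementation.
const COMPONENTS = ['subject', 'predicate', 'object'];

function suggestions(triples, prefix, { component = 'subject', limit = 10 } = {}, callback) {
  if (!COMPONENTS.includes(component)) {
    callback(new Error(`Unknown component: ${component}`));
    return;
  }
  // Collect distinct values of the chosen component that start with the prefix.
  const seen = new Set();
  for (const triple of triples) {
    const value = triple[component];
    if (value.startsWith(prefix)) seen.add(value);
  }
  // Return the matches in sorted order, capped at `limit`.
  callback(null, [...seen].sort().slice(0, limit));
}

// Usage with made-up example triples.
const triples = [
  { subject: 'http://example.org/s1', predicate: 'http://example.org/p', object: 'http://wiktionary.dbpedia.org/resource/linked_list' },
  { subject: 'http://example.org/s2', predicate: 'http://example.org/p', object: 'http://wiktionary.dbpedia.org/resource/linked_data' },
  { subject: 'http://example.org/s3', predicate: 'http://example.org/p', object: 'http://wiktionary.dbpedia.org/resource/link' },
];
suggestions(triples, 'http://wiktionary.dbpedia.org/resource/linked_', { component: 'object', limit: 20 }, console.log);
```

Deduplicating with a `Set` matters here because the same object can appear in many triples, while a suggestion list should show each value once.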

I forked the repo, implemented it in C++, and started writing tests. However, the results array always comes back empty, and I haven't figured out why. I am invoking the C++ method getSuggestions. If you have any clues, or if you want to implement it yourself, please let me know. I just wanted to help out a bit, since I think this could be useful.

Alright, I'll have a look at this in the coming month.

Alright, I pushed my code if you want to have a look.

This is now implemented by #11