zhihu / rucene

Rust port of Lucene

Which index codecs are supported?

mooreniemi opened this issue.

Lucene has a lucene-backward-codecs library.

While trying to run a Lucene 8 shard, I hit:

Error: Error(CorruptIndex("index format either too new or too old: 4 <= 9 <= 6 doesn't hold"), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })

Are there any plans to expand the supported codecs? Could you document which codecs are currently supported? Based on the above, I assume formats 4 through 6.
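The error comes from a format-version range check done when a file header is read: the version stored in the file (9 here) has to fall between the minimum and maximum versions the reader supports (4 and 6 here). Note that these are file-format version numbers, not Lucene release numbers. A minimal sketch of that kind of check, assuming a simplified header; `HeaderError` and `check_format_version` are hypothetical names, not Rucene's actual API:

```rust
use std::fmt;

/// Hypothetical error type that reproduces the message from the report.
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    FormatOutOfRange { min: i32, actual: i32, max: i32 },
}

impl fmt::Display for HeaderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HeaderError::FormatOutOfRange { min, actual, max } => write!(
                f,
                "index format either too new or too old: {} <= {} <= {} doesn't hold",
                min, actual, max
            ),
        }
    }
}

/// Accept a file only if its stored format version lies in [min, max].
pub fn check_format_version(actual: i32, min: i32, max: i32) -> Result<i32, HeaderError> {
    if actual < min || actual > max {
        Err(HeaderError::FormatOutOfRange { min, actual, max })
    } else {
        Ok(actual)
    }
}

fn main() {
    // A reader that understands formats 4..=6 rejects a file written with format 9.
    let err = check_format_version(9, 4, 6).unwrap_err();
    println!("{}", err);
    assert!(check_format_version(5, 4, 6).is_ok());
}
```

Because the check is a closed range, supporting newer indexes means teaching the reader to decode the newer formats, not just raising the upper bound.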

We started Rucene three years ago, when Lucene 6.2 had the latest codec. After the long journey of migrating our ES-based search engine to Rucene, we no longer need to maintain compatibility with Lucene. Therefore we didn't upgrade the Lucene codec to a later version, and we made some binary-incompatible changes to implement features that are crucial to our scenarios, in-place updates for example.

I understand. The downside is that, given that the industry has moved on to Lucene 7 and 8, onboarding existing indices onto Rucene would mean downgrading them first. This limits its utility.

Have you documented the incompatible changes you made and why?

I may end up writing a codec reader in Rust for my purposes, and if it would be useful, I could contribute it back.

Unfortunately, due to limited resources, we made the incompatible changes to the only codec we have rather than to a new codec. We will try translating our internal documents about the changes and the rationale behind them, and will let you know when they are ready.

As for the idea of a codec for newer Lucene versions, that would be great. We really appreciate your kind help; let us know whenever you need help.

I've been taking a look at this, but without knowing what changes you made to the standard 6.x codec, it's a bit tricky to translate. Even a rough summary here would help.

What I may do instead is try translating a more recent codec from scratch.

Hi, we based Rucene on es-5.4 with lucene-core-6.4.18 and have not followed any later version.

Some of the special codec changes are:
doc values update: b9a43cc, 9c6e614
EF encoder: a700992
using SIMD: 6629d2f
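Assuming "EF encoder" refers to Elias-Fano encoding (the commit itself isn't described in this thread), the idea is to split each value of a non-decreasing sequence of n values below a bound u into low bits, stored verbatim in l = floor(log2(u / n)) bits each, and high bits, stored in unary in a bit vector. The sketch below is a textbook illustration with hypothetical names, not Rucene's encoder; low parts are kept unpacked for readability.

```rust
/// Textbook Elias-Fano encoding of a non-decreasing sequence of u64 values.
pub struct EliasFano {
    low_bits: u32,    // number of low bits kept per value
    lows: Vec<u64>,   // low parts (unpacked here for clarity)
    highs: Vec<bool>, // unary high parts: a 0 per bucket skipped, a 1 per value
}

impl EliasFano {
    pub fn encode(values: &[u64], universe: u64) -> Self {
        let n = values.len().max(1) as u64;
        // l = floor(log2(universe / n)), or 0 when universe < n.
        let ratio = universe / n;
        let low_bits = if ratio == 0 { 0 } else { 63 - ratio.leading_zeros() };
        let mask = if low_bits == 0 { 0 } else { (1u64 << low_bits) - 1 };
        let mut lows = Vec::with_capacity(values.len());
        let mut highs = Vec::new();
        let (mut prev, mut prev_high) = (0u64, 0u64);
        for &v in values {
            assert!(v >= prev, "values must be non-decreasing");
            prev = v;
            lows.push(v & mask);
            let high = v >> low_bits;
            // One 0-bit for every bucket we move past, then a 1-bit for this value.
            for _ in prev_high..high {
                highs.push(false);
            }
            highs.push(true);
            prev_high = high;
        }
        EliasFano { low_bits, lows, highs }
    }

    pub fn decode(&self) -> Vec<u64> {
        let mut out = Vec::with_capacity(self.lows.len());
        let (mut high, mut idx) = (0u64, 0usize);
        for &bit in &self.highs {
            if bit {
                out.push((high << self.low_bits) | self.lows[idx]);
                idx += 1;
            } else {
                high += 1;
            }
        }
        out
    }

    /// Space a bit-packed layout of this encoding would need.
    pub fn size_in_bits(&self) -> usize {
        self.lows.len() * self.low_bits as usize + self.highs.len()
    }
}

fn main() {
    let doc_ids = [2u64, 3, 5, 7, 11, 13, 24];
    let ef = EliasFano::encode(&doc_ids, 25);
    assert_eq!(ef.decode(), doc_ids.to_vec());
    println!("{} values in {} bits", doc_ids.len(), ef.size_in_bits());
}
```

The appeal for postings is that space stays close to 2 + log2(u / n) bits per value regardless of how the gaps are distributed.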

Thanks!

Is there a way to open an IndexReader without writing anything? I've now gotten as far as reading norms, and (1) the code assumes all fields have norms (not true for the ES _id field), so I have to work around that, and (2) when I read the index I'm corrupting it, even though I think I've commented out every place it tries to write...
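One general way to track down where reading triggers a write is to give the reader a directory wrapper that refuses every mutation and prints a backtrace naming the file involved. A minimal sketch, assuming a simplified, hypothetical `Directory` trait with `read_file`/`write_file`/`delete_file`; Rucene's real directory abstraction has a different and larger interface, so this only illustrates the technique:

```rust
use std::collections::HashMap;
use std::io;

/// Hypothetical, simplified stand-in for an index directory abstraction.
pub trait Directory {
    fn read_file(&self, name: &str) -> io::Result<Vec<u8>>;
    fn write_file(&mut self, name: &str, data: &[u8]) -> io::Result<()>;
    fn delete_file(&mut self, name: &str) -> io::Result<()>;
}

/// In-memory directory, used here only to exercise the wrapper.
#[derive(Default)]
pub struct MemDirectory {
    files: HashMap<String, Vec<u8>>,
}

impl Directory for MemDirectory {
    fn read_file(&self, name: &str) -> io::Result<Vec<u8>> {
        self.files
            .get(name)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, name.to_string()))
    }
    fn write_file(&mut self, name: &str, data: &[u8]) -> io::Result<()> {
        self.files.insert(name.to_string(), data.to_vec());
        Ok(())
    }
    fn delete_file(&mut self, name: &str) -> io::Result<()> {
        self.files
            .remove(name)
            .map(|_| ())
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, name.to_string()))
    }
}

/// Forwards reads to the inner directory and rejects every mutation.
pub struct ReadOnlyDirectory<D: Directory> {
    inner: D,
}

impl<D: Directory> ReadOnlyDirectory<D> {
    pub fn new(inner: D) -> Self {
        ReadOnlyDirectory { inner }
    }

    fn deny(op: &str, name: &str) -> io::Error {
        // The backtrace points at the call site that attempted the write.
        let bt = std::backtrace::Backtrace::force_capture();
        eprintln!("blocked {} of {}\n{}", op, name, bt);
        io::Error::new(io::ErrorKind::PermissionDenied, format!("read-only: {} {}", op, name))
    }
}

impl<D: Directory> Directory for ReadOnlyDirectory<D> {
    fn read_file(&self, name: &str) -> io::Result<Vec<u8>> {
        self.inner.read_file(name)
    }
    fn write_file(&mut self, name: &str, _data: &[u8]) -> io::Result<()> {
        Err(Self::deny("write", name))
    }
    fn delete_file(&mut self, name: &str) -> io::Result<()> {
        Err(Self::deny("delete", name))
    }
}

fn main() {
    let mut mem = MemDirectory::default();
    mem.write_file("segments_1", b"header").unwrap();
    let mut ro = ReadOnlyDirectory::new(mem);
    assert_eq!(ro.read_file("segments_1").unwrap(), b"header".to_vec());
    let err = ro.write_file("segments_2", b"x").unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::PermissionDenied);
}
```

Failing loudly on the first write, rather than silently allowing it, turns index corruption into an immediate error at the offending call.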

I'm afraid Rucene can currently only open indexes created by Rucene, not ES indexes, including those from 6.4.18. :(

I know; I was told that further up in the thread, but I'm pursuing opening a more recent index anyway. I have search and stored-field reading working. I may or may not try to get doc values working; I'm not sure I need them for my use case.

My main issue now is that Rucene seems to write to (and corrupt) the index somewhere while reading it, and I can't figure out where. I've disabled creating backup segments and every other write I can see. In my case I'm not dealing with a live index, so none of that is necessary. If you have any other pointers, they're appreciated; otherwise feel free to close this.

If I actually get to the doc values stuff I will submit a PR.

I found the call to write. All set now. :)