outr / lucene4s

Light-weight convenience wrapper around Lucene to simplify complex tasks and add Scala sugar.

Question - Creating a field for exact matches

hajime-moto opened this issue

I'm trying to create a field that must be matched exactly (case-sensitive, character by character), regardless of the field's content. I've set tokenized to false, but the query returns nothing when the text contains uppercase letters and/or spaces.

Here is my test.

import com.outr.lucene4s._
import com.outr.lucene4s.field.{Field, FieldType, IndexOption}

object LuceneTest extends App {
  val lucene: Lucene = new DirectLucene(uniqueFields = List.empty, defaultFullTextSearchable = true, autoCommit = true)

  val email: Field[String] =
    lucene.create.field[String](
      name = "uniqueId",
      fieldType =
        FieldType(
          indexOptions = Set(IndexOption.Documents, IndexOption.Frequencies, IndexOption.Positions, IndexOption.Offsets),
          tokenized = false,
          stored = true,
          frozen = true
        )
    )

  lucene.doc().fields(email("Has_UpperCase_No_Spaces@email.com")).index()
  lucene.doc().fields(email("has spaces@email.com")).index()
  lucene.doc().fields(email("no_upper_case@email.com")).index()

  // Prints no results, but I expected the email to be returned
  println(lucene.query().filter(exact(email("Has_UpperCase_No_Spaces@email.com"))).search().results.map(_ (email)))
  // Prints no results, but I expected the email to be returned
  println(lucene.query().filter(exact(email("has spaces@email.com"))).search().results.map(_ (email)))
  // Prints no_upper_case@email.com as expected
  println(lucene.query().filter(exact(email("no_upper_case@email.com"))).search().results.map(_ (email)))
}

Can this be achieved with Lucene?

Thank you

Hmmm, that's a good question. Much of lucene4s was designed around tokenized indexes, so it's possible this is a bug in the code. Let me do some testing and get back to you.
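One plausible reading of the symptoms, not confirmed anywhere in this thread, is that the field value is stored verbatim while the query term still passes through an analyzer that lowercases and splits on whitespace. The sketch below is plain Scala with no Lucene dependency; `analyze`, `indexedTerms`, `analyzedLookup`, and `verbatimLookup` are hypothetical names used only to illustrate that kind of mismatch, not a diagnosis of the actual lucene4s bug.

```scala
object ExactMatchSketch {
  // Hypothetical, simplified stand-in for an analyzer:
  // lowercases the input and splits it on whitespace.
  def analyze(text: String): List[String] =
    text.toLowerCase.split("\\s+").filter(_.nonEmpty).toList

  // An untokenized field keeps each value verbatim as a single term.
  val indexedTerms: List[String] = List(
    "Has_UpperCase_No_Spaces@email.com",
    "has spaces@email.com",
    "no_upper_case@email.com"
  )

  // Lookup where the query term is analyzed first: every resulting
  // token must be present in the index as a verbatim term.
  def analyzedLookup(query: String): Boolean = {
    val tokens = analyze(query)
    tokens.nonEmpty && tokens.forall(indexedTerms.contains)
  }

  // Lookup where the query term is compared as-is.
  def verbatimLookup(query: String): Boolean =
    indexedTerms.contains(query)

  def main(args: Array[String]): Unit =
    indexedTerms.foreach { value =>
      println(s"$value analyzed=${analyzedLookup(value)} verbatim=${verbatimLookup(value)}")
    }
}
```

Under these assumptions only the all-lowercase, space-free address survives the analyzed lookup, which mirrors the three results in the test above; the verbatim lookup finds all three.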

@hajime-moto, are you able to check out and build the latest from master? I'd like you to verify the issue is fixed before I do another release. Note that in the new version you can use FieldType.Untokenized instead of creating the field type manually.
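A minimal sketch of the original test rewritten around `FieldType.Untokenized`, assuming the rest of the API (`DirectLucene`, `lucene.create.field`, `exact`) keeps the signatures used in the test above. It requires a lucene4s build from master on the classpath, and the object name `UntokenizedTest` is made up for illustration.

```scala
import com.outr.lucene4s._
import com.outr.lucene4s.field.{Field, FieldType}

object UntokenizedTest extends App {
  val lucene: Lucene = new DirectLucene(uniqueFields = List.empty, defaultFullTextSearchable = true, autoCommit = true)

  // FieldType.Untokenized is the preset mentioned above; it is used here
  // in place of the hand-built FieldType(...) from the original test.
  val email: Field[String] =
    lucene.create.field[String](name = "uniqueId", fieldType = FieldType.Untokenized)

  lucene.doc().fields(email("Has_UpperCase_No_Spaces@email.com")).index()
  lucene.doc().fields(email("has spaces@email.com")).index()
  lucene.doc().fields(email("no_upper_case@email.com")).index()

  // Same three exact-match queries as in the original report.
  println(lucene.query().filter(exact(email("Has_UpperCase_No_Spaces@email.com"))).search().results.map(_(email)))
  println(lucene.query().filter(exact(email("has spaces@email.com"))).search().results.map(_(email)))
  println(lucene.query().filter(exact(email("no_upper_case@email.com"))).search().results.map(_(email)))
}
```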

Yep fixed! Just did a pull & build. This is awesome! Thank you for this amazing library.