eclipse / jnosql

Eclipse JNoSQL is a framework which has the goal to help Java developers to create Jakarta EE applications with NoSQL.

Faceted search API?

salesportal opened this issue · comments

Hi,

For hobby and learning purposes I am writing a simple sales portal that uses Elasticsearch (or Lucene locally) through an abstraction that should also work on top of Apache Solr or similar search engines.

It has annotated POJOs à la JPA, with annotations like @Facet and @Freetext on fields, and a simple search API on top.
I thought faceted search is such a common scenario that it ought to be standardised, and my search for such a standard led me to this project :)

From what I can see this is yet another scenario in addition to what you already have. You do have support for Elastic, but probably as a plain document DB, since it can store fields, retrieve them and search documents based on fields?

For the record I am not after standardising my own API, just thought the use case ought to be standardised.

That said, the search API I have now is shown below, to give an idea of what I am thinking of.

E.g. Elasticsearch can do much more than faceted search (it has nested aggregations, and facets are a special case of those), but faceted search is a concrete use case that, no matter what, should be expressed as a common API IMO.

Search API:
ItemAttribute comes from the metamodel (in JPA speak). Such searches are often generic, i.e. the sales portal shows all available attributes (fields) and facets, so one should use the metamodel instead of writing specific searches (attribute names are sent to the web page without it knowing that "this is the Make attribute, this is the Model attribute" and so on).

This means that this API is normally called from just one or at most a few places in such an application; however, the value of a standardised and stable API remains. The implementations of the API for Elasticsearch, Solr or Lucene (for tests or e.g. a desktop app with local storage) are complex enough to justify exposing a simplified API driven by an annotated data model.
The concept of an annotated data model goes well with existing tech like JPA, which you also utilise for documents.

       IndexSearchCursor search(
                         List<Class<? extends Item>> types,
                         String freeText,
                         List<Criterium> criteria,
                         List<SortAttributeAndOrder> sortOrder,
                         boolean returnSortAttributeValues,
                         Set<ItemAttribute> fieldAttributes,
                         Set<ItemAttribute> facetAttributes) throws ItemIndexException;

search parameters

  • types - Car, RentalApartment, Laptop or whatever you can think of in any ad or sales portal
  • freeText - most portals combine faceted search with a freetext filter
  • criteria - nested criteria for value match (make=Ford) or range (price $10000-$20000). Supports nesting, e.g. model under make (F-150).
  • sortOrder - like in SQL order by
  • returnSortAttributeValues - whether to also return the field values for sorted attributes from matching documents
  • fieldAttributes - fields to return from matching documents
  • facetAttributes - fields to facet on, must have been annotated with @Facet
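To make the criteria parameter a bit more concrete, here is a minimal sketch of how nested value and range criteria could be modelled and evaluated in memory. The real Criterium type is not shown in this issue, so ValueCriterium, RangeCriterium, matchesAll and the map-based item are all hypothetical stand-ins, not the actual API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins: the issue does not show the real Criterium type.
class CriteriaSketch {

    /** A condition an item must satisfy; items are modelled as attribute name -> value. */
    interface Criterium {
        boolean matches(Map<String, Object> item);
    }

    /** Exact value match (make=Ford) with optional nested criteria (model=F-150 under make). */
    static final class ValueCriterium implements Criterium {
        private final String attribute;
        private final Object value;
        private final List<Criterium> nested;

        ValueCriterium(String attribute, Object value, Criterium... nested) {
            this.attribute = attribute;
            this.value = value;
            this.nested = Arrays.asList(nested);
        }

        @Override
        public boolean matches(Map<String, Object> item) {
            if (!value.equals(item.get(attribute))) {
                return false;
            }
            // Nested criteria only narrow the match further.
            for (Criterium c : nested) {
                if (!c.matches(item)) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Inclusive numeric range, e.g. price between 10000 and 20000. */
    static final class RangeCriterium implements Criterium {
        private final String attribute;
        private final double min;
        private final double max;

        RangeCriterium(String attribute, double min, double max) {
            this.attribute = attribute;
            this.min = min;
            this.max = max;
        }

        @Override
        public boolean matches(Map<String, Object> item) {
            Object v = item.get(attribute);
            if (!(v instanceof Number)) {
                return false;
            }
            double d = ((Number) v).doubleValue();
            return d >= min && d <= max;
        }
    }

    /** All top-level criteria must hold (logical AND). */
    static boolean matchesAll(List<Criterium> criteria, Map<String, Object> item) {
        for (Criterium c : criteria) {
            if (!c.matches(item)) {
                return false;
            }
        }
        return true;
    }

    /** Small helper to build a sample item. */
    static Map<String, Object> item(String make, String model, int price) {
        Map<String, Object> m = new HashMap<>();
        m.put("make", make);
        m.put("model", model);
        m.put("price", price);
        return m;
    }

    public static void main(String[] args) {
        List<Criterium> criteria = Arrays.asList(
                new ValueCriterium("make", "Ford", new ValueCriterium("model", "F-150")),
                new RangeCriterium("price", 10000, 20000));
        System.out.println(matchesAll(criteria, item("Ford", "F-150", 15000)));
        System.out.println(matchesAll(criteria, item("Ford", "Focus", 15000)));
    }
}
```

A real implementation would translate such a criteria tree into an Elasticsearch, Solr or Lucene query rather than evaluate it in memory; the in-memory version only illustrates the shape of the data.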

The result is a cursor to iterate over matching documents and their field values and an object containing facet counts for all specified facet attributes. These can be nested, eg:

Make
   Ford (1253)
        Model
              F-150 (231)
              Focus (54)
   GM (632)
         Model
              Chevy Bolt (32)
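As a rough illustration of the nested facet counts above, the following sketch counts a facet and one sub-facet over an in-memory list of documents. The class and method names (FacetCountSketch, FacetValueCount, count) are made up for this example, and documents are simplified to string maps; a search index would compute these counts server-side.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names and the map-based document model are assumptions.
class FacetCountSketch {

    /** Count for one facet value plus the counts of the sub-facet nested under it. */
    static final class FacetValueCount {
        int count;
        final Map<String, Integer> subCounts = new TreeMap<>();
    }

    /** Counts documents per value of 'facet', and per value of 'subFacet' within each value. */
    static Map<String, FacetValueCount> count(
            List<Map<String, String>> docs, String facet, String subFacet) {
        Map<String, FacetValueCount> result = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(facet);
            if (value == null) {
                continue; // document has no value for this facet
            }
            FacetValueCount fc = result.computeIfAbsent(value, k -> new FacetValueCount());
            fc.count++;
            String sub = doc.get(subFacet);
            if (sub != null) {
                fc.subCounts.merge(sub, 1, Integer::sum);
            }
        }
        return result;
    }

    /** Small helper to build a sample document. */
    static Map<String, String> car(String make, String model) {
        Map<String, String> m = new HashMap<>();
        m.put("make", make);
        m.put("model", model);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = Arrays.asList(
                car("Ford", "F-150"), car("Ford", "F-150"),
                car("Ford", "Focus"), car("GM", "Chevy Bolt"));
        // Print the tree: each make with its count, models indented below.
        for (Map.Entry<String, FacetValueCount> e : count(docs, "make", "model").entrySet()) {
            System.out.println(e.getKey() + " (" + e.getValue().count + ")");
            for (Map.Entry<String, Integer> s : e.getValue().subCounts.entrySet()) {
                System.out.println("    " + s.getKey() + " (" + s.getValue() + ")");
            }
        }
    }
}
```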

Annotations to support this look something like the below. JPA annotations are included for good measure, and a JPA implementation of the API exists, but there facet counting must be done on the client side; it was just to test whether it was doable. There is no freetext search support in JPA.

@DisplayAttribute is for saying this should be shown in ad details view so not directly related.
@NumericAttributeFiltering says how to present filtering to the user, eg. checkbox for each value, range input form or a checkbox list with predefined ranges to select from.


@MappedSuperclass
public abstract class RetailItem extends PurchasableItem {

      @Sortable(priority=4)
      @Facet
      @DisplayAttribute("Make")
      @Column
      private String make;

      @Sortable(priority=3)
      @Facet(superAttribute="make")
      @DisplayAttribute("Model")
      @Column
      private String model;
      
      @Sortable(priority=2)
      @Facet
      @NumericAttributeFiltering(FacetFiltering.INPUT)
      @DisplayAttribute("Production year")
      @Column
      private Integer productionYear;
}        
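For illustration, here is a minimal sketch of how a @Facet annotation with a superAttribute could be declared and read back via reflection to build the facet hierarchy. The annotation definition is not shown in this issue, so the declaration below, the sample Car class and the facetHierarchy helper are assumptions, not the actual implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: the real @Facet definition is not shown in the issue.
class FacetMetadataSketch {

    /** Marks a field as facetable; superAttribute names the parent facet, if any. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Facet {
        String superAttribute() default "";
    }

    /** Sample item type, reduced to the facet-related annotations. */
    static class Car {
        @Facet
        private String make;

        @Facet(superAttribute = "make")
        private String model;

        private String description; // not a facet
    }

    /**
     * Maps each @Facet field name to the name of its parent facet
     * ("" for top-level facets), walking up the class hierarchy so that
     * fields declared in mapped superclasses are included.
     */
    static Map<String, String> facetHierarchy(Class<?> type) {
        Map<String, String> result = new TreeMap<>();
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                Facet facet = f.getAnnotation(Facet.class);
                if (facet != null) {
                    result.put(f.getName(), facet.superAttribute());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(facetHierarchy(Car.class));
    }
}
```

A lookup like this is roughly the information a metamodel would need to expose so that a generic search page can list facets without knowing the concrete item types.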

These features are excellent and we should have them. However, since they are specific to certain databases, we can provide them as an extension for those databases.

OK, makes sense. It is reusable and can be abstracted over more than one search index, but it is still somewhat on the side. It would not apply to all document stores, since not all of them can do faceting at fast enough speeds.

It would require exposing a metamodel though. I don't know whether that is done today, or whether one should expose javax.persistence.metamodel.Metamodel or a separate API with only the relevant information.
I would propose the latter: even if you reuse JPA annotations to annotate POJOs, have the metamodel return types that are specific to your project, so that it is not confusing which metamodel data is valid in the NoSQL case, and so that you can extend it, e.g. with facet information like above or other custom information not found in relational DBs.

I'll close this issue; we can work on it as a mapping extension if you wish:
https://github.com/eclipse/jnosql-mapping-extension