mosuka / phalanx

Phalanx is a cloud-native distributed search engine that provides endpoints through gRPC and traditional RESTful API.

Golang search ecosystem opportunity?

gedw99 opened this issue

Hey @mosuka, @prabhatsharma, and @mschoch,

I raised an issue that I would like you to have a look at, if you don't mind: opensearch-project/opensearch-go#82

As a gopher, I really like to run Golang everywhere, and I see a great synergy and opportunity here to get a strong search ecosystem going for Golang.

I want to stress that text search and Elasticsearch / opensearch-go have different needs, of course.
But they both need an easy-to-run solution for gophers.
So I wonder if there is a possibility of some happy harmony here?
Zinc is the API with a single-server, non-HA solution (maintainer is @prabhatsharma).
Phalanx has an HA solution for bluge (maintainer is @mosuka).

Zinc provides the API and a single-server solution.
Phalanx could match the Zinc API and so provide an HA solution. Phalanx would also still provide its faceted text search.

I am probably missing lots of detail here, I know, but it would be great to know your thoughts.
Maybe my proposed design is not optimal, but I think you can see my intent.

Hi @gedw99,
Thanks for your opinion.

First of all, let me explain why I started Phalanx.
I've been using Solr and Elasticsearch for years. They are very nice products. I'm still using Elasticsearch at work.

However, in the last few years, the environment in which they run has changed dramatically, due to the spread of public clouds and Kubernetes.
Solr and Elasticsearch were designed before public clouds and Kubernetes became popular, and Kubernetes requires a lot of care to run stateful applications that need data persistence, such as Solr and Elasticsearch. (This is my personal opinion based on my experience.)
I was serving hundreds of millions of documents with Elasticsearch on Kubernetes in a production environment, and most of the trouble occurred on the Elasticsearch data nodes (storage related) and master nodes. Sometimes those process failures would cause index corruption and bring down the cluster.

I was looking for a distributed search engine that would be less dependent on storage and less likely to break the cluster when problems occur in the search engine processes.
So I started a project for a master node-less distributed search engine, separating the computation layer for searching and indexing from the storage layer for persisting the index.
The major difference between Phalanx and Solr or Elasticsearch is that Phalanx is designed to use object storage such as S3 or GCS on public clouds.

So Phalanx is a completely different design from Elasticsearch, and it is not a copycat of Elasticsearch, nor is it intended to be.
Before Phalanx, I had developed a distributed search engine using Raft.
At that time, someone asked me to provide an API compatible with Elasticsearch because they wanted to use it as a backend for Kibana, but I refused.
It might be possible to mimic the request and response format of the search API, but it would be a lot of work to support all of the Elasticsearch APIs that Kibana uses in addition to the search API, and I'm not really into Elasticsearch enough to go that far.

Phalanx is still a nascent project. I may follow a standard specification in the future if it is formulated, but I don't think it's the right time yet. I have a lot of work to do before I can mimic the Elasticsearch API.

I have prepared a discussions space, so if there is anything else you would like to discuss, please use it. 😃

https://github.com/mosuka/phalanx/discussions