Anaethelion / gopher-hunting-elasticsearch

Go-ing Gopher Hunting with Elasticsearch and Go

This repository provides an introductory example of using the Elasticsearch Go client to find documents in Elasticsearch. Specifically, it covers three types of search:

  1. Traditional keyword search.
  2. Vector search, making use of the sentence-transformers/msmarco-MiniLM-L-12-v3 model from Hugging Face to generate the embeddings.
  3. Hybrid search combining the keyword and vector approaches.
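
As a rough illustration of how these three search types differ at the request level, the sketch below builds the JSON request bodies in Go using only the standard library. The `query`, `knn`, and `query_vector_builder` keys belong to the Elasticsearch search API; the field names (`title`, `text_embedding.predicted_value`), the model ID, and the `k` and `num_candidates` values are assumptions made for illustration and are not taken from this repository.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// Assumed names for illustration only; adjust them to your index mapping
// and to the ID of the model you deployed.
const (
	textField   = "title"
	vectorField = "text_embedding.predicted_value"
	modelID     = "sentence-transformers__msmarco-minilm-l-12-v3"
)

// searchBody builds a search request body for the given term.
// keyword adds a match query; vector adds a kNN clause that asks
// Elasticsearch to embed the term with the deployed model.
// Setting both flags yields a hybrid request.
func searchBody(term string, keyword, vector bool) map[string]any {
	body := map[string]any{}
	if keyword {
		body["query"] = map[string]any{
			"match": map[string]any{textField: term},
		}
	}
	if vector {
		body["knn"] = map[string]any{
			"field":          vectorField,
			"k":              10,  // assumed number of neighbours to return
			"num_candidates": 100, // assumed candidates examined per shard
			"query_vector_builder": map[string]any{
				"text_embedding": map[string]any{
					"model_id":   modelID,
					"model_text": term,
				},
			},
		}
	}
	return body
}

func main() {
	// Hybrid search: keyword and vector clauses in a single request.
	out, err := json.MarshalIndent(searchBody("gopher", true, true), "", "  ")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```

Passing `true, false` or `false, true` produces the keyword-only or vector-only body; the serialized result would then be sent as the body of a search request through the Go client.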

How to Run

Elasticsearch Instance Setup

The quickest way to set up your own cluster is to register for a free trial of Elastic Cloud. You'll then need to complete these additional steps:

  1. Note your Cloud ID
  2. Generate an API Key
  3. Populate your instance with data in the same format as those in the Sources section below
  4. Upload your model from Hugging Face using Eland
  5. Enrich your ingested documents using an ingest pipeline
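
The enrichment step might look something like the following: a minimal sketch that builds an ingest pipeline definition containing a single inference processor. The `inference` processor and its `model_id`, `target_field`, and `field_map` options are part of Elasticsearch; the helper name `inferencePipeline`, the source field `title`, the target field `text_embedding`, and the model ID are hypothetical choices for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// inferencePipeline returns an ingest pipeline definition with one
// inference processor. sourceField is the document field to embed;
// field_map renames it to "text_field", the input name that NLP models
// deployed in Elasticsearch expect by default. The embedding is written
// under targetField.
func inferencePipeline(modelID, sourceField, targetField string) map[string]any {
	return map[string]any{
		"description": "Generate text embeddings at ingest time",
		"processors": []any{
			map[string]any{
				"inference": map[string]any{
					"model_id":     modelID,
					"target_field": targetField,
					"field_map":    map[string]any{sourceField: "text_field"},
				},
			},
		},
	}
}

func main() {
	// All three arguments are assumptions; use your own model ID and fields.
	pipeline := inferencePipeline(
		"sentence-transformers__msmarco-minilm-l-12-v3",
		"title",
		"text_embedding",
	)
	out, err := json.MarshalIndent(pipeline, "", "  ")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```

The printed JSON is the kind of body you would submit when creating the pipeline (for example with `PUT _ingest/pipeline/<pipeline-id>`), after which documents indexed through that pipeline gain the embedding field.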

Pre-requisites

The server reads its connection settings from environment variables, which must be set before you run it. I recommend using something like direnv, configured via .envrc, with the variables defined in a top-level .env file. Alternatively, you can set the environment variables explicitly in your current session, using the method appropriate to your operating system.

The following environment variables are required:

  • ELASTIC_CLOUD_ID=<MY_INSTANCE_CLOUD_ID>
  • ELASTIC_API_KEY=<MY_API_KEY>

Starting the server

Running server.go will start a net/http server on port 80 that you can use to query Elasticsearch:

cd server
go run .

Navigate to the below URLs to obtain the Gopher search results for each search type:

Slides

The slides from the Women Who Go meetup @ Elastic are available in the docs/slides folder.

Sources

The following rodent-focused Wikipedia pages were extracted into Elasticsearch using the Elastic Web Crawler:

If you're new to Go and would like to build your own web crawler, I recommend trying the Web Crawler exercise in the Tour of Go, where you build a concurrent web crawler.

Resources

Check out the resources below to learn more about Elasticsearch, keyword search, and vector search.

Elasticsearch

  1. Elasticsearch
  2. Elasticsearch Go Client
  3. Understanding Analysis in Elasticsearch (Analyzers) by Bo Andersen | #CodingExplained

Vector Search

  1. code.sajari.com/word2vec
  2. huggingface | pkg.go.dev
  3. What is Vector Search | Elastic

LLMs and Natural Language Processing

  1. BERT 101: State Of The Art NLP Model Explained | Hugging Face
  2. sentence-transformers/msmarco-MiniLM-L-12-v3 | Hugging Face

About

License: Apache License 2.0

