Programming-from-A-to-Z / Save-Embeddings-JSON

Using bge-large-en-v1.5 to save embeddings to a local file


Saving Embeddings to JSON file

Overview

This example Node.js application processes a text corpus, generates embeddings for "chunks" of the text, and saves the embeddings to a local file. The embeddings can then be used in another application, such as a Retrieval-Augmented Generation (RAG) system or a 2D/3D clustering demonstration using UMAP dimensionality reduction.
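
As a rough illustration of how another application might use the saved embeddings for retrieval, here is a minimal sketch of a cosine-similarity lookup. It assumes each saved record looks like `{ text, embedding }`; that shape and the function names `cosineSimilarity` and `findNearest` are hypothetical, so adjust them to match what your embeddings.json actually contains.

```javascript
// Assumption: records are shaped like { text: string, embedding: number[] }.
// This shape is hypothetical; rename the properties to match your file.

// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k records most similar to the query vector, best match first.
function findNearest(records, queryEmbedding, k = 3) {
  return records
    .map((record) => ({
      text: record.text,
      score: cosineSimilarity(record.embedding, queryEmbedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The query vector has to come from the same model that produced the stored embeddings, otherwise the scores are not comparable.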

There are two main scripts in this project:

  • `embeddings-replicate.js`: Generates embeddings using the Llama model on Replicate.
  • `embeddings-transformers.js`: Generates embeddings using the bge-small model with transformers.js.

Both scripts output the embeddings to embeddings.json.
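
The write step can be sketched roughly as follows. The record shape (`{ text, embedding }`) and the helper names `saveEmbeddings` and `loadEmbeddings` are assumptions for illustration, not necessarily what the scripts in this repo do. One detail worth knowing: typed arrays such as `Float32Array` serialize to JSON as objects rather than arrays, so the sketch converts them with `Array.from` first.

```javascript
const fs = require('fs');

// Write records to disk as JSON.
// Assumption: each record is { text, embedding }, where embedding may be
// a plain array or a typed array such as Float32Array.
function saveEmbeddings(filename, records) {
  const plain = records.map((record) => ({
    text: record.text,
    // Typed arrays would otherwise serialize as {"0": ..., "1": ...}.
    embedding: Array.from(record.embedding),
  }));
  fs.writeFileSync(filename, JSON.stringify(plain, null, 2));
}

// Read the records back and do a basic sanity check on the top-level shape.
function loadEmbeddings(filename) {
  const records = JSON.parse(fs.readFileSync(filename, 'utf-8'));
  if (!Array.isArray(records)) {
    throw new Error('Expected the file to contain a JSON array');
  }
  return records;
}
```

For a large corpus, dropping the indentation argument to `JSON.stringify` keeps the file noticeably smaller.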

Replicate with Llama model

Using transformers.js with bge-small model

  • Uses the transformers.js package and bge-small model for embeddings generation.
  • embeddings-transformers.js: Script to process a text file and generate embeddings using the bge-small model.

A map of clustered p5.js function names

How-To

  1. Install dependencies:
npm install

For Replicate (embeddings-replicate.js)

  1. Set up the .env file with your Replicate API token:
REPLICATE_API_TOKEN=your_api_token_here
  2. Generate the embeddings.json file.

You'll need to hard-code a text filename and adjust how the text is split up depending on the format of your data.

const raw = fs.readFileSync('text-corpus.txt', 'utf-8');
let chunks = raw.split(/\n+/);

Then:

node embeddings-replicate.js
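
How you split the text depends on your data. Note that `raw.split(/\n+/)` can leave an empty string at the start or end if the file begins or ends with a newline. The two helpers below are a sketch of alternatives, one per paragraph and one by word count; the names `splitParagraphs` and `splitByWords` and the default of 100 words are illustrative assumptions.

```javascript
// Split on blank lines so each paragraph becomes one chunk,
// trimming whitespace and dropping empty chunks.
function splitParagraphs(raw) {
  return raw
    .split(/\n\s*\n/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

// Split into chunks of at most maxWords words, ignoring line breaks.
// Useful when the source has very long or very uneven paragraphs.
function splitByWords(raw, maxWords = 100) {
  const words = raw.split(/\s+/).filter((word) => word.length > 0);
  const chunks = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(' '));
  }
  return chunks;
}
```

Either function can replace the `raw.split(/\n+/)` line, for example `let chunks = splitParagraphs(raw);`.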

For transformers.js (embeddings-transformers.js)

  1. Generate the embeddings.json file. Adjust the text filename and splitting method as needed:

const raw = fs.readFileSync('text-corpus.txt', 'utf-8');
let chunks = raw.split(/\n+/);

Then:

node embeddings-transformers.js
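
The overall loop (embed each chunk, keep it paired with its text) might look something like the sketch below. The function `embedChunks` and the record shape are hypothetical. The embedding function is passed in as a parameter so the loop does not depend on a particular model, and the commented usage shows one way to plug in transformers.js, assuming the `@xenova/transformers` package and the `Xenova/bge-small-en-v1.5` checkpoint.

```javascript
// Embed each chunk in order and pair the vector with its source text.
// `embed` is any async function that maps a string to a numeric vector
// (a plain array or a typed array).
async function embedChunks(chunks, embed) {
  const records = [];
  for (const text of chunks) {
    const vector = await embed(text);
    // Convert typed arrays to plain arrays so the result serializes cleanly.
    records.push({ text, embedding: Array.from(vector) });
  }
  return records;
}

// Possible usage with transformers.js (assumption: @xenova/transformers is
// installed; the model name is an assumption, not taken from this repo):
//
//   const { pipeline } = await import('@xenova/transformers');
//   const extractor = await pipeline('feature-extraction', 'Xenova/bge-small-en-v1.5');
//   const embed = async (text) =>
//     (await extractor(text, { pooling: 'mean', normalize: true })).data;
//   const records = await embedChunks(chunks, embed);
```

Processing chunks one at a time is slow but simple; it also makes it easy to log progress or resume after a failure.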
