Saving Embeddings to a JSON File

Overview

This example Node.js application processes a text corpus, generates an embedding for each chunk of text, and saves the embeddings to a local file. The embeddings can then be used in another application, such as a Retrieval-Augmented Generation (RAG) system or a 2D/3D clustering demonstration that uses UMAP for dimensionality reduction.
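To illustrate the retrieval step of a RAG system, here is a minimal sketch that ranks stored chunks against a query embedding by cosine similarity. It assumes each record is an object of the form `{ text, embedding }` (a hypothetical shape; check what your `embeddings.json` actually contains) and that the query was embedded with the same model as the corpus. `cosineSimilarity` and `topK` are illustrative names, not part of this project.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k records (hypothetical { text, embedding } shape) most similar to the query.
function topK(records, queryEmbedding, k = 3) {
  return records
    .map((record) => ({ ...record, score: cosineSimilarity(record.embedding, queryEmbedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Tiny hand-made example with 2-dimensional vectors; real embeddings are much longer.
const records = [
  { text: 'cats purr', embedding: [1, 0] },
  { text: 'dogs bark', embedding: [0, 1] },
  { text: 'kittens meow', embedding: [0.9, 0.1] },
];
const results = topK(records, [1, 0], 2);
console.log(results.map((r) => r.text)); // [ 'cats purr', 'kittens meow' ]
```

Cosine similarity compares direction rather than magnitude; if your vectors are already normalized to unit length, it reduces to a plain dot product.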

There are two main scripts in this project:

`embeddings-replicate.js`: Generates embeddings using the Llama model on Replicate.
`embeddings-transformers.js`: Generates embeddings using the bge-small model with transformers.js.

Both scripts output the embeddings to embeddings.json.
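A minimal sketch of the save step, and of reading the file back in another application, assuming each entry is stored as `{ text, embedding }`. The helper names `saveEmbeddings` and `loadEmbeddings` and the record shape are hypothetical; the project's scripts may use a different layout. The sketch uses CommonJS `require`; in an ES module, use `import fs from 'fs'` (and likewise for `os` and `path`) instead.

```javascript
// Hypothetical record shape: { text, embedding }.
const fs = require('fs');
const os = require('os');
const path = require('path');

// Pair each chunk with its embedding vector and write the records as JSON.
function saveEmbeddings(filename, chunks, embeddings) {
  if (chunks.length !== embeddings.length) {
    throw new Error('Expected one embedding per chunk');
  }
  // Array.from turns typed arrays (e.g. Float32Array) into plain arrays for JSON.
  const records = chunks.map((text, i) => ({ text, embedding: Array.from(embeddings[i]) }));
  fs.writeFileSync(filename, JSON.stringify(records));
}

// Read the file back into an array of { text, embedding } records.
function loadEmbeddings(filename) {
  return JSON.parse(fs.readFileSync(filename, 'utf-8'));
}

// Usage: write to a temporary file rather than overwriting a real embeddings.json.
const file = path.join(os.tmpdir(), 'embeddings-demo.json');
saveEmbeddings(file, ['first chunk', 'second chunk'], [[0.1, 0.2], [0.3, 0.4]]);
const loaded = loadEmbeddings(file);
console.log(loaded.length); // 2
```

The `Array.from` call matters if embeddings arrive as typed arrays such as `Float32Array`: `JSON.stringify` serializes a typed array as an object with numeric keys, not as a JSON list.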

Setup

Install the dependencies:

```
npm install
```

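Replicate version

Provide your Replicate API token as the `REPLICATE_API_TOKEN` environment variable (for example in a `.env` file, if the script loads one):

```
REPLICATE_API_TOKEN=your_api_token_here
```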

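In `embeddings-replicate.js`, hard-code the path to your text file and adjust how the text is split into chunks to suit the format of your data. The snippet below treats each non-empty line as one chunk:

```javascript
// Read the whole corpus (assumes the script has already imported Node's fs module).
const raw = fs.readFileSync('text-corpus.txt', 'utf-8');
// One chunk per line; drop empty strings (e.g. from a trailing newline).
let chunks = raw.split(/\n+/).filter((chunk) => chunk.trim().length > 0);
```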
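Splitting on newlines suits line-oriented data. For prose, two common alternatives are splitting on blank lines (one chunk per paragraph) or cutting fixed-size word windows that overlap so context is not lost at chunk boundaries. Below is a sketch of both; the helper names and default sizes are hypothetical and chosen only for illustration. Either function's result could replace the `raw.split(...)` expression above.

```javascript
// Split on blank lines so each paragraph becomes one chunk.
function splitByParagraph(raw) {
  return raw
    .split(/\n\s*\n/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

// Split into windows of `size` words, where consecutive windows share `overlap` words.
function splitByWords(raw, size = 100, overlap = 20) {
  if (overlap >= size) {
    throw new Error('overlap must be smaller than size');
  }
  const words = raw.split(/\s+/).filter((w) => w.length > 0);
  const chunks = [];
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(' '));
    // Stop once a window has reached the last word.
    if (start + size >= words.length) break;
  }
  return chunks;
}

const sample = 'First paragraph,\nstill first.\n\nSecond paragraph.\n';
console.log(splitByParagraph(sample));
// [ 'First paragraph,\nstill first.', 'Second paragraph.' ]
console.log(splitByWords('a b c d e f g', 4, 2));
// [ 'a b c d', 'c d e f', 'e f g' ]
```

Overlapping windows cost a few extra embeddings but reduce the chance that a sentence relevant to a query is cut in half between two chunks.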

Then generate `embeddings.json`:

```
node embeddings-replicate.js
```

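Transformers.js version

In `embeddings-transformers.js`, adjust the text filename and splitting method in the same way:

```javascript
// Read the whole corpus (assumes the script has already imported Node's fs module).
const raw = fs.readFileSync('text-corpus.txt', 'utf-8');
// One chunk per line; drop empty strings (e.g. from a trailing newline).
let chunks = raw.split(/\n+/).filter((chunk) => chunk.trim().length > 0);
```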

Then generate `embeddings.json`:

```
node embeddings-transformers.js
```