Milvus Data Migration Tool

Overview

MilvusDM (Milvus Data Migration) is a data migration tool for Milvus. It supports importing Faiss and HDF5 data files into Milvus, migrating data between Milvus instances, and backing up Milvus data to local files in batches. MilvusDM helps developers use Milvus more efficiently and reduces operation and maintenance costs.

Getting started

Prerequisites

  • Operating system requirements
| Operating system | Supported versions |
| --- | --- |
| CentOS | 7.5 or higher |
| Ubuntu LTS | 18.04 or higher |
  • Software requirements
| Software | Version |
| --- | --- |
| Milvus | 0.10.x, 1.x, or 2.x |
| Python 3 | 3.7 or higher |
| pip3 | Corresponds to the Python version. |

Install

  • Configure environment variables

Add the following two lines to the ~/.bashrc file:

export MILVUSDM_PATH='/home/$user/milvusdm'
export LOGS_NUM=0

MILVUSDM_PATH: This parameter defines the working path of MilvusDM. Logs and data generated by MilvusDM are stored under this path. The default value is /home/$user/milvusdm.

LOGS_NUM: MilvusDM generates one log file per day. This parameter defines the number of log files to keep. The default value is 0, which means all log files are kept.

Apply the configured environment variables:

$ source ~/.bashrc
  • Install milvusdm by pip
$ pip3 install pymilvusdm==2.0

pymilvusdm 2.0 is used to migrate data from Milvus (0.10.x or 1.x) to Milvus 2.x.
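
If you want to confirm from Python that the package is installed (an optional check; milvusdm itself is run from the command line), you can query the installed distribution. A minimal sketch using pkg_resources, which ships with pip/setuptools on Python 3.7:

# Optional sanity check: confirm the pymilvusdm distribution is installed.
import pkg_resources

version = pkg_resources.get_distribution('pymilvusdm').version
print('pymilvusdm version:', version)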

How to use

Faiss to Milvus

Import one Faiss index file into a specified collection or partition in Milvus.

In the current version, only flat and ivf_flat indexes built on float vectors are supported.

  • Download the example yaml
$ wget https://raw.githubusercontent.com/milvus-io/milvus-tools/main/yamls/F2M.yaml
  • Config the yaml
F2M:
  milvus_version: 1.x
  data_path: '/home/data/faiss1.index'
  dest_host: '127.0.0.1'
  dest_port: 19530
  mode: 'append'
  dest_collection_name: 'test'
  dest_partition_name: ''
  collection_parameter:
    dimension: 256
    index_file_size: 1024
    metric_type: 'L2'
  1. Optional parameter: dest_partition_name
  2. The parameter mode can be set to append, skip, or overwrite. It takes effect only when the specified collection already exists in Milvus.

     append: Append data to the existing collection.

     skip: Skip the existing collection and do not perform any operation.

     overwrite: Delete the existing collection, create a new collection with the same name, and then import the data.

  • Parameter description
| parameter | description | example |
| --- | --- | --- |
| F2M | Task: export data from Faiss to Milvus. | |
| milvus_version | Version of Milvus. | 0.10.5 |
| data_path | Path to the Faiss index file. | '/home/user/data/faiss.index' |
| dest_host | Milvus server address. | '127.0.0.1' |
| dest_port | Milvus server port. | 19530 |
| mode | Mode of migration. | 'append' |
| dest_collection_name | Name of the collection to import data to. | 'test' |
| dest_partition_name | Name of the partition to import data to. (Optional) | 'partition' |
| collection_parameter | Collection-specific information such as vector dimension, index file size, and similarity metric. | dimension: 256<br>index_file_size: 1024<br>metric_type: 'L2' |
  • Usage
$ milvusdm --yaml F2M.yaml
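
After the import finishes, you can verify the result with a short pymilvus script. A minimal sketch, assuming a Milvus 1.x server, the pymilvus 1.x client, and the collection name 'test' from the yaml above:

# Sanity check after a Faiss-to-Milvus import (assumes Milvus 1.x and pymilvus 1.x).
from milvus import Milvus

client = Milvus(host='127.0.0.1', port='19530')

# Confirm the destination collection exists.
_, collections = client.list_collections()
print('collections:', collections)

# Count the entities imported into 'test'.
_, count = client.count_entities('test')
print('entities in test:', count)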

HDF5 to Milvus

Import one or more HDF5 files into a specified collection or partition in Milvus.

We provide example HDF5 files of float vectors (dim 100) and binary vectors (dim 512) together with their corresponding IDs.

  • Download the yaml
$ wget https://raw.githubusercontent.com/milvus-io/milvus-tools/main/yamls/H2M.yaml
  • Config the yaml
H2M:
  milvus_version: 1.x
  data_path:
    - /Users/zilliz/float_1.h5
    - /Users/zilliz/float_2.h5
  data_dir:
  dest_host: '127.0.0.1'
  dest_port: 19530
  mode: 'overwrite'        # 'skip/append/overwrite'
  dest_collection_name: 'test_float'
  dest_partition_name: 'partition_1'
  collection_parameter:
    dimension: 128
    index_file_size: 1024
    metric_type: 'L2'

or

H2M:
  milvus_version: 1.x
  data_path:
  data_dir: '/Users/zilliz/HDF5_data'
  dest_host: '127.0.0.1'
  dest_port: 19530
  mode: 'append'        # 'skip/append/overwrite'
  dest_collection_name: 'test_binary'
  dest_partition_name: 
  collection_parameter:
    dimension: 512
    index_file_size: 1024
    metric_type: 'HAMMING'
  1. Optional parameter: dest_partition_name
  2. Configure either data_path or data_dir and leave the other one empty.
  • Parameter description
| parameter | description | example |
| --- | --- | --- |
| H2M | Task: export data from HDF5 to Milvus. | |
| milvus_version | Version of Milvus. | 0.10.5 |
| data_path | Paths to the HDF5 files. | - /Users/zilliz/float_1.h5<br>- /Users/zilliz/float_2.h5 |
| data_dir | Directory of the HDF5 files. | /Users/zilliz/Desktop/HDF5_data |
| dest_host | Milvus server address. | '127.0.0.1' |
| dest_port | Milvus server port. | 19530 |
| mode | Mode of migration. | 'append' |
| dest_collection_name | Name of the collection to import data to. | 'test_float' |
| dest_partition_name | Name of the partition to import data to. (Optional) | 'partition_1' |
| collection_parameter | Collection-specific information such as vector dimension, index file size, and similarity metric. | dimension: 512<br>index_file_size: 1024<br>metric_type: 'HAMMING' |
  • Usage
$ milvusdm --yaml H2M.yaml
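
Before running the import, it can help to inspect an HDF5 file and confirm which datasets it contains. A minimal sketch using h5py; the file path is the example one from the yaml above:

# Inspect an HDF5 file before importing it (requires the h5py package).
import h5py

with h5py.File('/Users/zilliz/float_1.h5', 'r') as f:
    # Print every dataset in the file with its shape and dtype.
    def describe(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(describe)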

Milvus to Milvus

Note: MilvusDM does not support migrating data from Milvus 2.0 standalone to Milvus 2.0 cluster.

Copy a collection, or selected partitions of a collection, from the source Milvus (source_milvus) into the corresponding collection or partitions in the destination Milvus (dest_milvus).

  • Download the yaml
$ wget https://raw.githubusercontent.com/milvus-io/milvus-tools/main/yamls/M2M.yaml
  • Config the yaml
M2M:
  milvus_version: 1.x
  source_milvus_path: '/home/user/milvus'
  mysql_parameter:
    host: '127.0.0.1'
    user: 'root'
    port: 3306
    password: '123456'
    database: 'milvus'
  source_collection: # specify the 'partition_1' and 'partition_2' partitions of the 'test' collection.
    test:
      - 'partition_1'
      - 'partition_2'
  dest_host: '127.0.0.1'
  dest_port: 19530
  mode: 'skip' # 'skip/append/overwrite'

Or

M2M:
  milvus_version: 1.x
  source_milvus_path: '/home/user/milvus'
  mysql_parameter:
  source_collection: # specify the collection named 'test'
    test:
  dest_host: '127.0.0.1'
  dest_port: 19530
  mode: 'skip' # 'skip/append/overwrite'
  1. Required parameters: source_milvus_path, source_collection, dest_host, dest_port, and mode.
  2. If MySQL is used to manage the source Milvus metadata, configure mysql_parameter; leave it empty if SQLite is used.
  3. source_collection must specify a collection name; the partition names under it are optional, and multiple partitions can be listed.
  • Parameter description
| parameter | description | example |
| --- | --- | --- |
| M2M | Task: copy data from a source Milvus to a destination Milvus. | |
| milvus_version | Version of the destination Milvus. | 0.10.5 |
| source_milvus_path | Working directory of the source Milvus. | '/home/user/milvus' |
| mysql_parameter | MySQL settings of the source Milvus, including the host, user, port, password, and database parameters. | host: '127.0.0.1'<br>user: 'root'<br>port: 3306<br>password: '123456'<br>database: 'milvus' |
| source_collection | Names of the collection and its partitions in the source Milvus. | test:<br>- 'partition_1'<br>- 'partition_2' |
| dest_host | Destination Milvus server address. | '127.0.0.1' |
| dest_port | Destination Milvus server port. | 19530 |
| mode | Mode of migration. | 'skip' |
  • Usage

    It copies the specified collection data from source_milvus to dest_milvus.

$ milvusdm --yaml M2M.yaml
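
After the migration you can compare entity counts on both sides. A minimal sketch, assuming the source and destination are both running Milvus 1.x servers reachable via pymilvus 1.x; adjust the hosts, ports, and collection name to your setup (the destination address here is only an example):

# Compare entity counts after a Milvus-to-Milvus migration (assumes pymilvus 1.x).
from milvus import Milvus

source = Milvus(host='127.0.0.1', port='19530')    # source Milvus server
dest = Milvus(host='192.168.1.10', port='19530')   # destination Milvus server (example address)

collection = 'test'
_, source_count = source.count_entities(collection)
_, dest_count = dest.count_entities(collection)

print(collection, 'source:', source_count, 'dest:', dest_count)
assert source_count == dest_count, 'entity counts differ after migration'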

Milvus to HDF5

Export a Milvus collection, or multiple partitions of a collection, to local HDF5 files.

  • Download the yaml
$ wget https://raw.githubusercontent.com/milvus-io/milvus-tools/main/yamls/M2H.yaml
  • Config the yaml
M2H:
  milvus_version: 1.x
  source_milvus_path: '/home/user/milvus'
  mysql_parameter:
    host: '127.0.0.1'
    user: 'root'
    port: 3306
    password: '123456'
    database: 'milvus'
  source_collection: # specify the 'partition_1' and 'partition_2' partitions of the 'test' collection.
    test:
      - 'partition_1'
      - 'partition_2'
  data_dir: '/home/user/data'

Or

M2H:
  milvus_version: 1.x
  source_milvus_path: '/home/user/milvus'
  mysql_parameter:
  source_collection: # specify the collection named 'test'
    test:
  data_dir: '/home/user/data'
  1. Required parameters: source_milvus_path, source_collection, and data_dir.
  2. If MySQL is used to manage the source Milvus metadata, configure mysql_parameter; leave it empty if SQLite is used.
  3. source_collection must specify a collection name; the partition names under it are optional, and multiple partitions can be listed.
  • Parameter description
| parameter | description | example |
| --- | --- | --- |
| M2H | Task: export Milvus data to local HDF5 files. | |
| milvus_version | Version of the source Milvus. | 0.10.5 |
| source_milvus_path | Working directory of the source Milvus. | '/home/user/milvus' |
| mysql_parameter | MySQL settings of the source Milvus, including the host, user, port, password, and database parameters. | host: '127.0.0.1'<br>user: 'root'<br>port: 3306<br>password: '123456'<br>database: 'milvus' |
| source_collection | Names of the collection and its partitions in the source Milvus. | test:<br>- 'partition_1'<br>- 'partition_2' |
| data_dir | Directory to save the HDF5 files. | '/home/user/data' |
  • Usage

    It will generate the corresponding HDF5 files and an H2M configuration file in the data_dir directory.

$ milvusdm --yaml M2H.yaml
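
Once the export completes, the files in data_dir can be read back with h5py for downstream use. A minimal sketch that lists the exported files and their top-level keys, assuming the files use a .h5 suffix; the directory is the example one from the yaml above:

# List the HDF5 files produced by an M2H export and show their top-level keys.
from pathlib import Path
import h5py

data_dir = Path('/home/user/data')
for path in sorted(data_dir.rglob('*.h5')):
    with h5py.File(path, 'r') as f:
        print(path, '->', list(f.keys()))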

Code Structure

If you would like to contribute code to this project, here is an overview of the code structure:

  • pymilvusdm
    • core
      • milvus_client.py: Performs client operations in Milvus.
      • read_data.py: Reads the HDF5 files on your local drive. (Add your code here to support reading data files in other formats.)
      • read_faiss_data.py: Reads Faiss data files.
      • read_milvus_data.py: Reads Milvus data files.
      • read_milvus_meta.py: Reads Milvus metadata.
      • data_to_milvus.py: Creates collections or partitions as specified in .yaml files and imports vectors and the corresponding IDs into Milvus.
      • save_data.py: Saves data as HDF5 files.
      • write_logs.py: Writes debug/info/error logs during runtime.
    • faiss_to_milvus.py: Imports Faiss data into Milvus.
    • hdf5_to_milvus.py: Imports HDF5 files into Milvus.
    • milvus_to_milvus.py: Migrates data from a source Milvus to a target Milvus.
    • milvus_to_hdf5.py: Saves Milvus data as HDF5 files.
    • main.py: Executes tasks as specified by the received .yaml file.
    • setting.py: Stores configurations for MilvusDM operation.
  • setup.py: Creates and uploads pymilvusdm file packages to PyPI (Python Package Index).

Todo

  • Faiss to Milvus
    • Support binary vector data.
  • Milvus to Milvus
    • Specify multiple collections or multiple partitions based on blocklist and allowlist.
    • When source_collection='*', all Milvus data is exported.
    • Support merging multiple collections or partitions in source_milvus into one collection in dest_milvus.
  • Milvus Dump
  • Milvus Restore

License: Apache License 2.0

