sandrain / hpssix

A metadata indexing framework for large-scale storage systems, such as HPSS archival storage system.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

HPSSIX: HPSS Metadata Indexing

HPSSIX incrementally indexes metadata of files in running HPSS system and allows users to quickly locate files of interest with file metadata, including POSIX file attributes (or stat metadata) and POSIX extended attributes. HPSSIX also features the fulltext search for supported file types, e.g., Microsoft Office document files.

This framework enables fast file search operations.

## cp files to hpss
$ mkdir /var/hpss/mnt/home/hs2/demo
$ cp -r documents /var/hpss/mnt/home/hs2/demo

## setting some extended attributes
$ for f in /var/hpss/mnt/home/hs2/demo/documents/mtg.notes/*; do hpssix-tag set -n projid -v 100 $f; done

## show extended attributes
$ for f in /var/hpss/mnt/home/hs2/demo/documents/mtg.notes/*; do getfattr -d $f; done

## index -- this takes time (around 90sec), automatically runs daily
$ hpssixd.sh action

## search -- stat, name, path
$ hpssix-search --uid=`id -u` --count-only (--file-only) (--directory-only) (--verbose) (--limit)
$ hpssix-search --uid=`id -u` --name=pptx
$ hpssix-search --uid=`id -u` --path=papers (--file-only) (--directory-only)
$ hpssix-search --uid=`id -u` --name=pptx --date='>20190227-15:00:00'

## tagging
$ hpssix-search --tag="projid:" (--tag="projid:100", --tag="projid:0,200", --tag="projid:>=90.56")
$ hpssix-search --tag="projid:<=100"

## keyword
$ hpssix-search --keyword="parallel file system performance titan" --name='pptx'
$ hpssix-search --keyword='checkpoint burst buffers' --name=pptx

## document display
$ hpssix-meta --meta /var/hpss/mnt/home/hs2/tests/documents/projects/project-poster.pptx
$ hpssix-meta --text /var/hpss/mnt/home/hs2/tests/documents/projects/project-poster.pptx

The architecture is summarized in the following slides.

This work was presented in the following events.

  • Hyogi Sim, "Extracting Metadata from the ORNL HPSS Archive to Improve its Usability," Knowledge Is Power: Unleashing the Potential of Your Archives through Metadata, BoF in ACM/IEEE International Conferenece for HighPerformance Computing, Networking, Storage, and Analysis (SC), Denver, CO, November 2019
  • Hyogi Sim, "Making a Peta-Scale Archival Storage System Searchable," High Performance Storage Systems User Forum2019 (HUF 2019), Indiana University, Bloomington, IN, October 2019

About

A metadata indexing framework for large-scale storage systems, such as HPSS archival storage system.

License:Other


Languages

Language:C 89.8%Language:Shell 4.2%Language:Makefile 2.8%Language:PLpgSQL 1.9%Language:M4 1.2%