johncf / backblaze-proc

Backblaze Data Analysis

A collection of scripts to prepare and process data published by Backblaze in order to generate failure-rate curves as a function of disk age. The scripts needed for the final generation of plots are kept in the failure-analysis repo.
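
To make "failure rate over the age of a disk" concrete, here is a minimal Python sketch of one common way to compute it. It is not taken from these scripts. It assumes each record is one daily snapshot (day, serial_number, failed), where failed is 1 on a drive's last operational day before failing, which is how the failure column in Backblaze's CSVs is defined. It approximates a drive's age as the number of days since the drive first appears in the data; the real scripts may use a different age measure, such as SMART power-on hours. The function name failure_rate_by_age and the binning parameter are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def failure_rate_by_age(records, bin_days=30):
    """Return {age_bin: annualized failure rate} for daily drive snapshots.

    Hypothetical helper. `records` is an iterable of (day, serial_number, failed)
    tuples, one per drive per day, where `failed` is 1 on the drive's last
    operational day before failing and 0 otherwise. Age is approximated as the
    number of days since the drive first appears in `records`.
    """
    first_seen = {}
    drive_days = defaultdict(int)  # exposure (drive-days) per age bin
    failures = defaultdict(int)    # failure count per age bin
    # Process in date order so each drive's earliest snapshot fixes its start day.
    for day, serial, failed in sorted(records, key=lambda r: r[0]):
        start = first_seen.setdefault(serial, day)
        age_bin = (day - start).days // bin_days
        drive_days[age_bin] += 1
        failures[age_bin] += failed
    # Failures per drive-year of exposure (multiply by 100 for a percentage).
    return {b: failures[b] * 365 / drive_days[b] for b in sorted(drive_days)}


if __name__ == "__main__":
    d0 = date(2013, 4, 10)
    snapshots = [
        (d0, "A", 0), (d0 + timedelta(days=1), "A", 1),
        (d0, "B", 0), (d0 + timedelta(days=1), "B", 0),
    ]
    print(failure_rate_by_age(snapshots, bin_days=1))  # {0: 0.0, 1: 182.5}
```

Expressing the rate as failures per drive-year (drive-days / 365) matches the way Backblaze reports annualized failure rates. Age bins with few drive-days will be noisy, so wider bins trade resolution for stability.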

The steps described below assume that the data directory contains the subdirectories 2013, 2014, etc., holding Backblaze's Hard Drive Test Data in CSV format (i.e. already extracted from the downloaded archives).
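
The following minimal Python sketch shows how the expected layout could be checked and the CSV files enumerated before loading. It is not the repo's own code. It assumes the year directories are named with digits only (2013, 2014, ...) and that CSV files may sit at any depth beneath them. The name find_csv_files is hypothetical.

```python
from pathlib import Path


def find_csv_files(data_dir="data"):
    """Return every CSV under data/<year>/ (any depth), ordered by year then path.

    Hypothetical helper: assumes year directories are named by digits only,
    e.g. data/2013, data/2014; other entries in data/ are ignored.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"expected a '{data_dir}' directory with year subdirectories")
    # Only digit-named subdirectories are treated as year directories.
    year_dirs = sorted(p for p in root.iterdir() if p.is_dir() and p.name.isdigit())
    files = []
    for year_dir in year_dirs:
        # rglob descends into nested folders left over from extracting archives.
        files.extend(sorted(year_dir.rglob("*.csv")))
    return files


if __name__ == "__main__":
    for path in find_csv_files():
        print(path)
```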

Usage

  1. make db-init

    This creates a Postgres database named backblaze, processes the CSV files nested in the data directory, and loads them into the database.

  2. make plot-all

    Dependency chain: plot-all -> plot-metadata -> popular-models

    • popular-models queries the database for the 20 most popular disk models and creates a file that lists them.
    • plot-metadata processes the data for each model listed in the popular-models file and generates the CSV files required for plotting, along with a plot-metadata file that lists all of them.
    • plot-all uses the plot-metadata file to generate the plots.
  3. make Results.md

    Uses the plot-metadata file to create a markdown file that embeds all the generated plots.
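
The popular-models step can be illustrated with the minimal Python sketch below. It is not the repo's actual query. It uses an in-memory sqlite3 database as a stand-in for Postgres and a hypothetical drive_stats(date, serial_number, model, failure) table. It also assumes that "popular" means the largest number of distinct drives; the real scripts might count drive-days instead. The function names popular_models and write_model_list are hypothetical.

```python
import sqlite3

# Hypothetical ranking: models with the most distinct drives come first,
# ties broken alphabetically so the output is deterministic.
POPULAR_MODELS_SQL = """
    SELECT model, COUNT(DISTINCT serial_number) AS drives
    FROM drive_stats
    GROUP BY model
    ORDER BY drives DESC, model
    LIMIT ?
"""


def popular_models(conn, limit=20):
    """Return up to `limit` model names, most popular first."""
    return [model for model, _ in conn.execute(POPULAR_MODELS_SQL, (limit,))]


def write_model_list(models, path):
    """Write one model name per line, like a popular-models list file."""
    with open(path, "w") as f:
        f.writelines(model + "\n" for model in models)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE drive_stats (date TEXT, serial_number TEXT, model TEXT, failure INTEGER)"
    )
    conn.executemany(
        "INSERT INTO drive_stats VALUES (?, ?, ?, ?)",
        [("2013-04-10", "S1", "M-A", 0), ("2013-04-10", "S2", "M-A", 0),
         ("2013-04-10", "S3", "M-B", 0), ("2013-04-11", "S3", "M-B", 0)],
    )
    print(popular_models(conn))  # ['M-A', 'M-B']
```

Against Postgres through psycopg2, the same SQL should work, but the parameter placeholder is %s instead of ?.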

Results

See https://johncf.github.io/failure-analysis.html