Instructions

The purpose of this repository is to test Seeding a Time Series Instance Database with Historical Data

Extract the desired historical data

This sample conversion is using data from the OSIsoft Example Kit: Pump Condition Based Maintenance

Data is extracted from the PI System using the extract.ps1 which results in the sample.csv

Rules

Data must be flattened
Each row must at a minimum have
- The partition key value used as the Time Series ID
- The timestamp representing the time of the event yyy-MM-ddThh:mm:ss.fffZ
- The event measurements

Steps

RDP to the Virtual Training Environment
Copy and execute the powershell script extract.ps1
Copy the csv file results sample.csv

Convert the historical data to a Parquet file format

There are many different ways that this data can now be converted from CSV to Parquet file formats. This method is using a Databricks Notebook.

Rules

The Time Series Id column name must be the same as defined when creating the TSI environment
- The column name must have _string appended, e.g. “DeviceID_string” and must be of type StringType
- Only single key property is supported
The Event Time column must be named timestamp and be of type TimestampeType
All other column names must be appended with the time
- Supported Types: _string (UTF8), _bool, _datetime, _double
Compression should be Snappy

Steps

Load the CSV file into Blob Storage convert\sample.csv
Modify and import the CsvToParquet.scala notebook with the proper storage account information.

val storageAccountName = ""
val storageAccountAccessKey = ""

Download the converted file which is in the parquet folder. part-xxxx...snappy.parquet
Load the Parquet file into Blob Storage convert\sample.parquet
Execute and Run the CsvToParquet Notebook

Partion the historical data Parquet file

The historical data has to be partioned in a manner that is now compatable with TSI.

Rules

Databricks cluster should not use GPU Types

Steps

Modify the TSIPartioning.py notebook with the proper input storage account information

# Input files source information
inputStorageAccountName = ""
inputStorageAccountAccessKey = ""
inputContainerName = "conversion"

# blob path inside container
inputBlobPrefix = "convert/sample.parquet"

Modify the TSIPartioning.py notebook with the proper TSI environment storage account information

# Output files source information. This can be an environment's storage account, but not necessarily. 
# If you are using a storage account other than environment's storage account, you should manually(using AzCopy) move data to environment's storage account for loading metadata into TSI environment.
environmentStorageAccountName = ""
environmentStorageAccountAccessKey = ""
# The below container should exist and should be different that inputContainerName
environmentContainerName = "partition"

Modify the TSIPartioning.py notebook with the proper TSI Time Series Id Property

# timeSeriesIdProperty should be the name of column in input parquet file which will correspond to the TSID property given while creating TSI environment.
timeSeriesIdProperty = "deviceId_string"

Import and Execute the TSIPartioning Notebook

Validate the TSI Partioned Historical Data

The partioned data can now be validated as compatable with Time Series Instances.

Rules

Databricks cluster should not use GPU Types
Databricks cluster must have the following library installed Name: Azure Type: PyPI

Steps

Modify the TSIPartioningValidator.py notebook with the proper storage account information

# Databricks notebook source
storage_account_name = ""
storage_account_access_key = ""
outputContainerName = "partition"
timeSeriesIdProperty = "deviceId_string"

Import and Execute the TSIPartioningValidator Notebook

Send the required information to Microsoft Engineering

# Databricks notebook source
storage_account_name = ""
storage_account_access_key = ""
containerName = "partition"
tsiEnvContainerName = "env-111a1111-11d1-1111-1111-111111111111"

danielscholl / iot-tsi-seed

Instructions

Extract the desired historical data

Convert the historical data to a Parquet file format

Partion the historical data Parquet file

Validate the TSI Partioned Historical Data

Send the required information to Microsoft Engineering

About

Languages