PNDA Guide

PNDA is a simple, scalable, open big data platform supporting operational and business intelligence analysis for networks and services. This guide provides an overview of PNDA and explains how to set up and use it in your own environment.

This guide covers PNDA release 3.

Last updated: July 4, 2016

Version: 0.1.0

Quick Links

Overview

This chapter covers the main components of PNDA, including:

Data ingress using Logstash, OpenDaylight & the bulk ingest tool
Data distribution with Kafka & Zookeeper
High velocity stream processing with Spark Streaming
High volume batch processing with Spark
Free form data exploration with Jupyter
Structured query over big data with Impala
Handling time series with OpenTSDB & Grafana

Download Book

You can read the latest version of this guide online, or download the book in a number of formats.

Getting Started

This checklist will get you started setting up a fully operational PNDA cluster, with data flowing in and out.

Provisioning

This chapter describes how to provision a PNDA cluster, and includes some background information on SaltStack and OpenStack Heat.

Console

The PNDA console provides a real-time overview of all the components in a PNDA cluster. The home page shows health statistics for each component, color-coded by status. Components are grouped into categories such as data distribution, data processing, data storage, and applications.

Other pages on the console let you view detailed metrics, deploy packages, run applications, and set data retention policies.

Producers

Kafka is the "front door" of PNDA. It handles ingest of data streams from network sources and distributes data to all interested consumers. This chapter covers how to integrate and develop "producers", which feed data into Kafka.

Bulk Ingest

In addition to streaming ingest via Kafka producers, PNDA also provides an offline bulk ingest tool for those who would like to migrate pre-existing data into the PNDA platform.

Bulk-ingest tool

Consumers

Kafka has a simple, clean design that moves complexity traditionally found inside message brokers into its producers and consumers. A Kafka consumer pulls messages from one or more topics using Zookeeper for discovery, issuing fetch requests to the brokers leading the partitions it wants to consume. Rather than the broker maintaining state and controlling the flow of data, each consumer controls the rate at which it consumes messages.
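The pull model described above can be illustrated with a toy, in-memory sketch. It does not use Kafka or any Kafka client library; `PartitionLog` and `PullConsumer` are hypothetical names invented for this example. It only shows the idea that the log keeps no per-consumer state and that each consumer tracks its own offset and chooses how much to fetch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PartitionLog:
    """Toy stand-in for one partition: an append-only list of messages."""
    messages: List[bytes] = field(default_factory=list)

    def append(self, message: bytes) -> int:
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the message just written

    def fetch(self, offset: int, max_messages: int) -> List[bytes]:
        # The log holds no per-consumer state: it simply returns whatever
        # is stored from the requested offset onwards.
        return self.messages[offset:offset + max_messages]


@dataclass
class PullConsumer:
    """Toy consumer that owns its position and its fetch size."""
    log: PartitionLog
    offset: int = 0

    def poll(self, max_messages: int) -> List[bytes]:
        batch = self.log.fetch(self.offset, max_messages)
        self.offset += len(batch)  # the position is advanced by the consumer
        return batch


log = PartitionLog()
for i in range(5):
    log.append(f"event-{i}".encode())

# Two independent consumers read the same log at different rates.
slow = PullConsumer(log)
fast = PullConsumer(log)

slow_batch = slow.poll(max_messages=2)
fast_batch = fast.poll(max_messages=10)
```

Because the position lives in the consumer, a slow consumer never holds back a fast one; each simply asks for the next batch when it is ready.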

Packages & Applications

Packages are independently deployable units of application layer functionality, and applications are instances of packages. You can use the PNDA console to deploy packages and manage the application lifecycle. The Deployment Manager documentation explains the structure of packages, and the REST API used to deploy them.

Log Aggregation

Logs from the various component services that make up PNDA, and the applications that run on PNDA, are collected and stored on the logserver node.

Structured Query

Apache Impala is a parallel execution engine for SQL queries. It supports low-latency access and interactive exploration of data in HDFS and HBase. Because Impala aggregates at query time, data can be stored in its raw form with no upfront aggregation step.

Impala
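As a rough illustration of aggregation at query time, the sketch below stores raw samples and computes totals only when the query runs. It uses Python's built-in `sqlite3` module purely as a stand-in so the example is self-contained; it is not Impala, and the `interface_stats` table, its columns, and the sample rows are all made up for this example.

```python
import sqlite3

# In-memory database standing in for raw data at rest (assumption: in PNDA
# the raw data would live in HDFS or HBase and be queried through Impala).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interface_stats (host TEXT, ts INTEGER, bytes_in INTEGER)"
)

# Raw, unaggregated samples (hypothetical values).
rows = [
    ("router-a", 1000, 120),
    ("router-a", 1060, 80),
    ("router-b", 1000, 300),
    ("router-b", 1060, 50),
]
conn.executemany("INSERT INTO interface_stats VALUES (?, ?, ?)", rows)

# The aggregation happens here, when the query is issued, not at ingest time.
query = """
    SELECT host, SUM(bytes_in) AS total_bytes_in, COUNT(*) AS samples
    FROM interface_stats
    GROUP BY host
    ORDER BY host
"""
totals = conn.execute(query).fetchall()
```

Keeping the raw rows means a different question (for example, a per-minute maximum) can be answered later with a different query, without re-ingesting anything.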

Data Exploration

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. In PNDA, it supports exploration and presentation of data from HDFS and HBase.

Time Series

OpenTSDB is a scalable time series database that lets you store and serve massive amounts of time series data, without losing granularity. Grafana is a graph and dashboard builder for visualizing time series metrics.
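To make the shape of a time series data point concrete, here is a minimal sketch that builds a JSON body in the form accepted by OpenTSDB's HTTP `/api/put` endpoint (a metric name, a timestamp, a value, and at least one tag). The helper `make_datapoint`, the metric name, and the tag values are hypothetical, and the sketch only builds the payload; it does not send it anywhere.

```python
import json
import time
from typing import Dict, Optional


def make_datapoint(
    metric: str,
    value: float,
    tags: Dict[str, str],
    timestamp: Optional[int] = None,
) -> dict:
    """Build one data point as a dict with metric, timestamp, value and tags."""
    if not tags:
        # OpenTSDB expects every data point to carry at least one tag.
        raise ValueError("at least one tag is required per data point")
    return {
        "metric": metric,
        # Seconds since the epoch; defaults to "now" if not supplied.
        "timestamp": int(time.time()) if timestamp is None else timestamp,
        "value": value,
        "tags": dict(tags),
    }


# Hypothetical metric and tag, with a fixed timestamp so the output is stable.
point = make_datapoint(
    "interface.bytes_in", 120, {"host": "router-a"}, timestamp=1467590400
)

# /api/put accepts a single object or a list of them; a list is used here.
body = json.dumps([point])
```

Each distinct combination of metric and tags forms its own series, which is what a Grafana dashboard would then select and plot.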

Security

A big data infrastructure like PNDA involves a multitude of technologies and tools, and may be deployed in a multi-tenant environment. Providing enterprise-grade security for such a system is complex, and it is a primary concern for any production deployment. If you are implementing a client for a PNDA interface or developing a PNDA application, this chapter covers security guidelines that you should follow when working with individual components.

Repositories

The PNDA distribution consists of the following source code repositories and sub-projects:

Provisioning

platform-salt: provisioning logic for creating PNDA
platform-salt-cloud: cluster templates for creating PNDA with salt-cloud
pnda-heat-templates: cluster templates for creating PNDA with Heat
pnda-dib-elements: tools for building disk image templates
pnda-package-server-docker: tools for creating package server

Platform

platform-libraries: libraries for working with interactive notebooks
platform-tools: tools for operating a cluster
- bulkingest: tools for performing a bulk ingest of data
platform-console-frontend: “single pane of glass” giving operational overview and access to application and data management functions
platform-console-backend: APIs that provide data to the console frontend
- console-backend-data-logger: APIs to ingest data
- console-backend-data-manager: APIs to provide data
platform-testing: modules that test both the end to end platform and individual components and collect metrics
platform-deployment-manager: API to manage packages and application deployment and lifecycle
platform-data-mgmnt: tools to manage data retention
- data-service: API to set data retention policies
- hdfs-cleaner: cron job to clean up HDFS data
- oozie-templates: templates that archive or delete data
platform-package-repository: manages a simple package repository backed by OpenStack Swift

Forked Projects

gobblin: customized fork of the Gobblin data ingest framework

Producers

prod-odl-kafka: plugin to ingest data from OpenDaylight
prod-logstash-codec-avro: plugin to ingest data from Logstash

Examples

example-spark-batch: example batch data processing application
example-spark-streaming: example streaming data processing application
example-jupyter-notebooks: examples for working with Jupyter notebooks
example-kafka-clients: examples for working with Kafka clients
- java
- php
- python
example-kafka-spark-opentsdb-app: example consumer that feeds data to OpenTSDB
