This project is owned by the Puppet Solutions Architects team. It is in an early stage of development and only intended to be used by Puppet Enterprise customers, in close collaboration with its developers.
The Puppet Data Service (PDS) provides a centralized API-driven interface for Puppet node data and for Hiera data. PDS supports self-service use cases, and Puppet-as-a-Service (PUPaaS) use cases, providing a foundational mechanism for allowing service customer teams to get work done without requiring manual work to be performed by the PUPaaS team.
The PDS consists of:
- A database backend. At present PostgreSQL is supported. Backends are pluggable, and support for other backends is planned.
- An API service.
- A command-line interface (CLI) for human operators.
- Several Puppet integration components which let Puppet consume data from the API.
- A Puppet module (puppetlabs-puppet_data_service) to aid in setup and configuration.
Here is detailed information on how to install, configure, and run the service using the puppetlabs-puppet_data_service module.
The puppet_data_service module installs the whole PDS, including the PDS CLI, via its RPM, so you don't have to worry about operationalizing the PDS service itself, setting up the database, running migrations, and so on.
Required configuration parameters
puppet_data_service::database_host
puppet_data_service::pds_token
Optional configuration parameters
puppet_data_service::package_source
This setup helps you quickly configure the PDS on your existing PE server. Advanced Puppet users should review the Configure using roles and Hiera eyaml section.
- Add the puppetlabs-puppet_data_service module to your control repo
- Configure the two required application roles
  - The Database server
    - Add a new Node Group from the PE Console.
      - Parent name: PE Infrastructure
      - Group name: PDS Database
      - Environment: production
    - Add the class puppet_data_service::database to the PDS Database group created in the step above
    - Add your existing PE PostgreSQL server to the group using the Rules tab (it could be the primary server)
      - In case you want to test the PDS in a different server without PostgreSQL, you can apply the puppet_enterprise::profile::database class to your node before following these steps
    - Commit your changes
  - PDS API servers
    - In the PE Master node group
      - In the classes tab
        - Add the new class puppet_data_service::server
        - Include at least the following parameters:
          - database_host: The resolvable address (fqdn) of the PDS database host.
      - In the Configuration data tab:
        - Configure the sensitive pds_token parameter. You may generate and pass a UUID to initialize the "admin" account token in a new PDS system. If the PDS is already deployed, you must supply a valid PDS user token.
      - Commit your changes
- Run the Puppet Agent
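The pds_token step above suggests generating a UUID to initialize the "admin" account token. As a minimal sketch, assuming any random (version 4) UUID is acceptable, one way to produce such a value is with Python's standard library. The helper name `generate_pds_token` is hypothetical and not part of PDS.

```python
import uuid


def generate_pds_token() -> str:
    """Return a random version-4 UUID string (hypothetical helper, not part of PDS)."""
    # Upper-casing only mirrors the style of the example token used elsewhere
    # in this document; whether PDS is case-sensitive is not verified here.
    return str(uuid.uuid4()).upper()


if __name__ == "__main__":
    print(generate_pds_token())
```

Treat the printed value as a secret and store it only in the sensitive pds_token parameter.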
If you are an experienced Puppet practitioner, this alternative configuration option gives you the tools you need to build your own Puppet profiles.
Include the profile classes in the appropriate role class, if roles are being used.
We suggest that each compiler run an instance of the PDS API server.
The PDS Database server
Example profile
# control-repo/site-modules/profile/manifests/pds_database_server.pp
class profile::pds_database_server {
  include puppet_data_service::database
}
The PDS API server
Example profile
# control-repo/site-modules/profile/manifests/pds_api_server.pp
class profile::pds_api_server {
  include puppet_data_service::server

  # Set at least these parameter values in Hiera:
  # ---
  # puppet_data_service::server::database_host: 'database.example.com'
  # puppet_data_service::server::pds_token: 'C979C2A2-C031-4B7B-B271-08DFBD6C795D'
}
Since the pds_token is a sensitive parameter, it should ideally be encrypted using Hiera eyaml.
To use the Hiera data elements of PDS, your hiera.yaml must include a level for the Puppet Data Service. Instructions for configuring the hiera backend in your hiera.yaml are available here.
The PDS app folder contains detailed instructions for running the PDS API in a local development environment, and the CLI golang README file explains how to build and test the PDS CLI.
The following paths are included in the packaging of pds-server.
Paths:
/opt/puppetlabs/server/apps/pds-server
- Server application files
/etc/puppetlabs/pds
- Config files
/opt/puppetlabs/bin
- pds-cli executable
/opt/puppetlabs/sbin
- pds-ctl admin utility script
/etc/puppetlabs/puppet/trusted-external-commands/pds
- Wrapper to `pds-cli node get --trusted-external-command "$1"`
- fpm must be installed
- rpm build tools must be installed
- deb build-essential must be installed
To build the pds-server RPM package
- Check out the project repo on a host of the OS type you would like to build the package for, and change to that directory
- Run make clean
- Run make rpm
To build the pds-server DEB package
- Check out the project repo on a host of the OS type you would like to build the package for, and change to that directory
- Run make clean
- Run make deb
As a PDS user you currently have two options to interact with it.
- PDS CLI
- PDS API
This user guide focuses on the PDS CLI, but if you want to create your own PDS client (e.g. a web app), check the PDS API documentation.
The CLI offers you a convenient way to create and retrieve data from the PDS. You can interact with it by typing in your Puppet Server's terminal:
pds-cli
The PDS CLI documentation section has detailed instructions on the available options.
All Puppet Data Service (PDS) implementations have a Backend Storage Service (BSS), which is either a source of truth or a cached version of an external source of truth that facilitates better integration with Puppet. In both scenarios you must plan for regular backups of the BSS to reduce the Recovery Time Objective (RTO) and prevent data loss. Backup and restore operations should be initiated through the PDS API. The PDS API can bulk output and load JSON via flat files, making the backup and restore procedure independent of the technology chosen to implement BSS durable storage.
The RTO of a PDS deployment depends largely on the BSS implementation. The API service itself is fairly simple and deploys quickly. The data stored in the BSS is not complex: it consists primarily of key/value pairs and has no relationships.
This PDS implementation has the Puppet Enterprise customer in mind as its primary user. In that context the BSS is backed by PE PostgreSQL, so a loss of the PDS BSS likely coincides with a loss of PE services. To restore PDS services PE must be online, so the RTO is calculated as RTO of PE + PDS API deployment time + BSS data restore time. In a scenario where the user has a Large architecture deployment of PE, a 2 hour RTO is reasonable, but the Recovery Time Actual (RTA) will fluctuate depending on the time it takes to restore PE services and the quantity of data that was stored in the BSS.
Scenarios where the PDS does not depend on PE can usually be recovered in less time, simply because obtaining a database from PE after a disaster requires you to restore the entirety of PE. If PE is not a factor, an independent database can be provisioned to house the restored BSS data. Bringing a PDS service that does not depend on PE online takes roughly half the time needed for a PE-dependent installation. In most scenarios a 1 hour RTO is reasonable. This is calculated as BSS backend database deployment time + PDS API deployment time + BSS data restore time.
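Both RTO formulas above are plain sums of component times. The following is a minimal Python sketch of that arithmetic. The function name `estimate_rto` and every duration are illustrative assumptions, chosen only so the totals line up with the 2 hour and 1 hour figures mentioned in the text; they are not measurements from a real deployment.

```python
from datetime import timedelta


def estimate_rto(*components: timedelta) -> timedelta:
    """Sum the component recovery times into a single RTO estimate."""
    return sum(components, timedelta())


# Placeholder durations only; measure your own environment instead.
pe_dependent = estimate_rto(
    timedelta(minutes=90),  # assumed RTO of PE
    timedelta(minutes=10),  # assumed PDS API deployment time
    timedelta(minutes=20),  # assumed BSS data restore time
)
independent = estimate_rto(
    timedelta(minutes=30),  # assumed BSS backend database deployment time
    timedelta(minutes=10),  # assumed PDS API deployment time
    timedelta(minutes=20),  # assumed BSS data restore time
)
print(pe_dependent, independent)
```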
The Recovery Point Objective (RPO) of a PDS deployment depends on the rate of user modifications and on how well your backup schedule addresses that rate of change. The RPO is calculated as Timestamp of service loss - Timestamp of last backup. The result is the number of hours of lost data that you will need to re-enter manually or obtain from an external source of truth.
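The subtraction above can be sketched in Python as follows. The function name `data_loss_window_hours` and both timestamps are hypothetical and used purely for illustration.

```python
from datetime import datetime, timezone


def data_loss_window_hours(service_loss: datetime, last_backup: datetime) -> float:
    """Return the hours between the last backup and the loss of service."""
    if last_backup > service_loss:
        raise ValueError("last backup cannot be later than the service loss")
    return (service_loss - last_backup).total_seconds() / 3600


# Hypothetical timestamps: a midnight backup and a loss of service at 22:30 UTC.
last_backup = datetime(2022, 1, 25, 0, 0, tzinfo=timezone.utc)
service_loss = datetime(2022, 1, 25, 22, 30, tzinfo=timezone.utc)
print(data_loss_window_hours(service_loss, last_backup))  # 22.5
```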
How feasible it is to restore incrementally lost data depends on the rate of change. If you only see a few changes a day, daily backups will be sufficient, but if your change rate is dozens per hour, a more frequent backup schedule is ideal. Manually re-entering data not found in your latest backup through the PDS API could be time consuming, which affects your Recovery Time Actual (RTA).
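To make the trade-off concrete, the worst-case number of changes missing from the latest backup can be estimated as the change rate multiplied by the backup interval. The sketch below assumes a steady change rate; the function name and the example rates (3 changes per day, 24 changes per hour) are hypothetical readings of "a few a day" and "dozens per hour".

```python
def worst_case_lost_changes(changes_per_hour: float, backup_interval_hours: float) -> float:
    """Upper bound on changes made since the last backup, assuming a steady rate."""
    if changes_per_hour < 0 or backup_interval_hours < 0:
        raise ValueError("inputs must be non-negative")
    return changes_per_hour * backup_interval_hours


# Assumed low change rate (3 per day) with daily backups.
print(worst_case_lost_changes(3 / 24, 24))
# Assumed high change rate (24 per hour) with daily versus hourly backups.
print(worst_case_lost_changes(24, 24))
print(worst_case_lost_changes(24, 1))
```

Under these assumptions, moving from daily to hourly backups shrinks the worst-case manual re-entry from hundreds of changes to a couple of dozen.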
The basic procedure for backing up the PDS is to run the pds-cli list sub-command against each endpoint (once each for user, hiera, and node) to output all values as a single JSON blob, then transform that data into a structure suitable for a mass add through the pds-cli create sub-command at a later date. For the data dumped from a list to be valid for restoration, it must be contained in a JSON hash under the key resources.
Example CLI output:
[
{
"level": "common",
"key": "pds::nothing",
"value": null,
"created-at": "2022-01-25T22:27:42.761Z",
"updated-at": "2022-01-25T22:27:42.761Z"
},
{
"level": "common",
"key": "pds::color",
"value": "red",
"created-at": "2022-01-25T22:27:42.762Z",
"updated-at": "2022-01-25T22:27:42.762Z"
},
{
"level": "priority",
"key": "pds::color",
"value": "blue",
"created-at": "2022-01-25T22:27:42.764Z",
"updated-at": "2022-01-25T22:27:42.764Z"
}
]
Valid structure for CLI restore:
{
"resources": [
{
"level": "common",
"key": "pds::nothing",
"value": null,
"created-at": "2022-01-25T22:27:42.761Z",
"updated-at": "2022-01-25T22:27:42.761Z"
},
{
"level": "common",
"key": "pds::color",
"value": "red",
"created-at": "2022-01-25T22:27:42.762Z",
"updated-at": "2022-01-25T22:27:42.762Z"
},
{
"level": "priority",
"key": "pds::color",
"value": "blue",
"created-at": "2022-01-25T22:27:42.764Z",
"updated-at": "2022-01-25T22:27:42.764Z"
}
]
}
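The transformation between the two structures above amounts to wrapping the listed array in a hash under the resources key. A minimal Python sketch follows; it assumes the list output has already been captured as a JSON string, and the function name `wrap_for_restore` is hypothetical rather than part of the PDS tooling.

```python
import json


def wrap_for_restore(list_output: str) -> str:
    """Wrap the JSON array from a list command in a {"resources": [...]} hash."""
    records = json.loads(list_output)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return json.dumps({"resources": records}, indent=2)


if __name__ == "__main__":
    # Shortened sample record based on the example output above.
    sample = '[{"level": "common", "key": "pds::color", "value": "red"}]'
    print(wrap_for_restore(sample))
```

The exact pds-cli flags for capturing the list output and feeding the wrapped file back in are not shown here; consult the PDS CLI documentation for those.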
An example workflow showing how you can validate backup and restore functionality can be found here in the docs folder.