Contributors: Paul Au, Ryan Wallner, Johnathon Battiato, Kallio Prinewill
The barrier to entry for any technology, and for the community surrounding it, can often be quite high. The purpose of this document is to capture existing resources that we can leverage to build a roadmap for DoK (Data on Kubernetes) beginners.
The secondary purpose of this document is to capture gaps in existing content so we can, as a community, create content to fill those gaps.
Provide a set of resources that gives a complete beginner the baseline knowledge needed to start running data workloads on Kubernetes.
We will break the content into sections, in a logical order, to guide someone from DoK beginner to deploying their first stateful application on Kubernetes.
- Why Stateful on Kubernetes?
- Intro to Stateful
- Types of workloads
- Operators 101
- Common Tools
- Ecosystem 101
- Deploy your first database on Kubernetes
- Next Steps
According to past DoK reports, the following are some of the reasons to run data workloads on Kubernetes:
- Kubernetes has become a core part of IT: half of the respondents are running 50% or more of their production workloads on it, and they are very satisfied and more productive as a result. The most advanced users report 2x or greater productivity gains.
- Business demands are creating pressure for further adoption. The increasing importance of real-time data to competitive advantage will sharpen companies' need to run data on Kubernetes. A majority believe standards will improve data management and that data should become declarative.
- Standardization is the key driver for Kubernetes leaders.
Provide basic knowledge of what stateful means in Kubernetes.
- Kubernetes documentation on StatefulSets
- Stateful Workloads in Kubernetes: A Deep Dive - Kaslin Fields & Michelle Au, Google
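A core idea behind "stateful" in Kubernetes is that each StatefulSet pod keeps a stable identity across restarts and rescheduling: a fixed ordinal name, its own PersistentVolumeClaim, and a stable DNS record through a headless Service. The minimal sketch below just derives those names so the pattern is easy to see. It assumes the default cluster domain `cluster.local`, and the StatefulSet, Service, and claim-template names used in the example (`postgres`, `postgres-headless`, `data`) are hypothetical.

```python
# Minimal sketch: derive the stable names a StatefulSet gives its pods,
# their PersistentVolumeClaims, and their per-pod DNS records.
# Assumes a headless Service governs the StatefulSet and the default
# cluster domain "cluster.local"; all names used below are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class PodIdentity:
    pod_name: str
    pvc_name: str
    dns_name: str


def statefulset_identities(
    sts_name: str,
    service_name: str,
    claim_template: str,
    replicas: int,
    namespace: str = "default",
    cluster_domain: str = "cluster.local",
) -> list[PodIdentity]:
    """Return the identity each ordinal 0..replicas-1 keeps across restarts."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{sts_name}-{ordinal}"  # pod name: <statefulset>-<ordinal>
        identities.append(
            PodIdentity(
                pod_name=pod,
                # PVC name: <claim template>-<statefulset>-<ordinal>
                pvc_name=f"{claim_template}-{pod}",
                # stable DNS: <pod>.<service>.<namespace>.svc.<domain>
                dns_name=f"{pod}.{service_name}.{namespace}.svc.{cluster_domain}",
            )
        )
    return identities


if __name__ == "__main__":
    for ident in statefulset_identities("postgres", "postgres-headless", "data", 3):
        print(ident.pod_name, ident.pvc_name, ident.dns_name)
```

Because `postgres-1` always reattaches to `data-postgres-1` and keeps the same DNS name, a database replica can find its own data and its peers again after a restart, which is what a Deployment's interchangeable pods do not guarantee.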
Provide a list of stateful workloads that run on Kubernetes, with a description and examples of each.
Stateful Workloads:
- Databases (StatefulSets or CRDs)
- AI/ML (usually jobs) - https://developers.redhat.com/aiml/ai-workloads
- Batch processing jobs
- Stream processing
- Machine learning and AI workloads
- Data analytics
- ETL (Extract, Transform, Load) pipelines
- Data warehousing
- Distributed databases
- In-memory data grids
- Time series databases
- Search and indexing engines
Provide resources explaining what operators are and what role they play in running data workloads on Kubernetes.
- What is a Kubernetes Operator?
- What are Kubernetes Operators? (Operators 101: Part 1)
- Operator Pattern
- Custom Resource Definitions
- An Introduction to Custom Resource Definitions and Custom Resources (Operators 101: Part 2)
- OperatorHub, or see https://www.cncf.io/blog/2022/06/15/kubernetes-operators-what-are-they-some-examples/
- Celebrating 10 years of Kubernetes: The evolution of database operators
- CloudNativePG
- [Vitess](https://vitess.io/docs/20.0/get-started/operator/)
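At the heart of the operator pattern is a reconcile loop: the operator reads the desired state declared in a custom resource (an instance of a CRD) and repeatedly acts to make the cluster's actual state match it. The minimal sketch below runs that loop against an in-memory stand-in for the API server. The `Database` kind, its `spec.replicas` field, and the `FakeCluster` class are hypothetical illustrations, not the API of CloudNativePG, Vitess, or any other real operator.

```python
# Minimal sketch of the operator pattern's reconcile loop, run against an
# in-memory stand-in for the Kubernetes API. The "Database" custom resource,
# its fields, and FakeCluster are hypothetical, not any real operator's API.
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """Stands in for the API server: tracks which member pods exist."""
    pods: set[str] = field(default_factory=set)


def _ordinal(pod: str) -> int:
    # "orders-db-2" -> 2
    return int(pod.rsplit("-", 1)[1])


def reconcile(custom_resource: dict, cluster: FakeCluster) -> list[str]:
    """Drive actual state toward the desired state declared in the resource.

    Returns the actions taken. Calling it again with no changes does nothing,
    which is the idempotence real reconcile loops rely on.
    """
    name = custom_resource["metadata"]["name"]
    desired = custom_resource["spec"]["replicas"]
    wanted = {f"{name}-{i}" for i in range(desired)}
    actions = []
    # Create missing members, lowest ordinal first.
    for pod in sorted(wanted - cluster.pods, key=_ordinal):
        cluster.pods.add(pod)
        actions.append(f"create {pod}")
    # Remove members no longer declared (scale down), highest ordinal first.
    for pod in sorted(cluster.pods - wanted, key=_ordinal, reverse=True):
        cluster.pods.remove(pod)
        actions.append(f"delete {pod}")
    return actions


if __name__ == "__main__":
    db = {
        "apiVersion": "example.com/v1",  # hypothetical API group
        "kind": "Database",              # hypothetical custom resource kind
        "metadata": {"name": "orders-db"},
        "spec": {"replicas": 3},
    }
    cluster = FakeCluster()
    print(reconcile(db, cluster))  # creates orders-db-0, -1, -2
    db["spec"]["replicas"] = 1     # user edits the custom resource
    print(reconcile(db, cluster))  # deletes orders-db-2, then orders-db-1
    print(reconcile(db, cluster))  # nothing left to do
```

A real operator would watch the API server for changes and requeue on errors instead of being called by hand, and a database operator adds domain knowledge (backups, failover, replication) inside the same loop; frameworks such as Operator SDK and Kubebuilder exist to provide that scaffolding.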
Provide a list of common tools used when running data workloads on Kubernetes.
- minikube: used for running a Kubernetes cluster locally
List and describe open source projects that are a part of the DoK Ecosystem. This list is not comprehensive.
- Vitess - MySQL-compatible, horizontally scalable, cloud-native database solution
- Cassandra - Apache Cassandra is a highly-scalable partitioned row store. Rows are organized into tables with a required primary key.
- PostgreSQL - PostgreSQL is a powerful, open source object-relational database system that uses and extends the SQL language combined with many features that safely store and scale the most complicated data workloads.
- Rook - Rook is an open source cloud-native storage orchestrator, providing the platform, framework, and support for Ceph storage to natively integrate with cloud-native environments.
- CubeFS - CubeFS is a new generation cloud-native open source storage system that supports access protocols such as S3, HDFS, and POSIX.
- Longhorn - Longhorn is a lightweight, reliable and easy-to-use distributed block storage system for Kubernetes.
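- Kafka - Apache Kafka is an open source distributed event streaming platform.
- Spark - Apache Spark is a unified analytics engine for large-scale data processing.
- Flink - Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.
- Kubeflow - Kubeflow is a toolkit for deploying and running machine learning workflows on Kubernetes.
- Strimzi - Strimzi provides operators for running Apache Kafka clusters on Kubernetes.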
In this section, you'll learn how to use the knowledge you've accumulated to deploy a database to Kubernetes.
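As a possible starting point for this section, the sketch below builds manifests for a single-replica PostgreSQL StatefulSet plus the headless Service it needs, and prints them as a JSON `List`, which `kubectl apply -f` accepts alongside YAML. The names, the `postgres:16` image tag, the `1Gi` storage request, and the plain-text demo password are illustrative assumptions; a real deployment should store the password in a Secret.

```python
# Minimal sketch: build manifests for a single-replica PostgreSQL StatefulSet
# and its headless Service, then print them as JSON for kubectl.
# Names, image tag, storage size, and the demo password are assumptions.
import json

APP = "postgres"  # hypothetical name reused for labels, Service, StatefulSet


def headless_service() -> dict:
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": APP, "labels": {"app": APP}},
        "spec": {
            "clusterIP": "None",  # headless: each pod gets a stable DNS record
            "selector": {"app": APP},
            "ports": [{"name": "postgres", "port": 5432}],
        },
    }


def statefulset(replicas: int = 1, storage: str = "1Gi") -> dict:
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": APP},
        "spec": {
            "serviceName": APP,  # must match the headless Service name
            "replicas": replicas,
            "selector": {"matchLabels": {"app": APP}},
            "template": {
                "metadata": {"labels": {"app": APP}},
                "spec": {
                    "containers": [{
                        "name": "postgres",
                        "image": "postgres:16",
                        "ports": [{"name": "postgres", "containerPort": 5432}],
                        "env": [
                            # Demo only: plain-text password; use a Secret in practice.
                            {"name": "POSTGRES_PASSWORD", "value": "change-me"},
                            # Keep data in a subdirectory of the mounted volume.
                            {"name": "PGDATA", "value": "/var/lib/postgresql/data/pgdata"},
                        ],
                        "volumeMounts": [
                            {"name": "data", "mountPath": "/var/lib/postgresql/data"}
                        ],
                    }],
                },
            },
            # One PersistentVolumeClaim per pod, e.g. data-postgres-0.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


if __name__ == "__main__":
    manifest = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [headless_service(), statefulset()],
    }
    print(json.dumps(manifest, indent=2))
```

On a cluster with a default StorageClass, such as minikube's, one way to try it is to save the output to `postgres.json`, run `kubectl apply -f postgres.json`, and then connect with `kubectl exec -it postgres-0 -- psql -U postgres`. Raising `replicas` here would only create independent, unreplicated databases; for replication and failover, an operator such as CloudNativePG is the better fit.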
In this section, we'll list some resources to push you to the next level of understanding.