Theodolite Execution Framework
This directory contains the Theodolite framework for executing scalability benchmarks in a Kubernetes cluster. As Theodolite aims to execute benchmarks in realistic execution environments, some third-party components are required. After everything is installed and configured, you can move on to the execution of benchmarks.
Requirements
Kubernetes Cluster
For executing benchmarks, access to a Kubernetes cluster is required. We suggest creating a dedicated namespace for executing our benchmarks. The following services need to be available as well.
Prometheus
We suggest using the Prometheus Operator and creating a dedicated Prometheus instance for these benchmarks.
If the Prometheus Operator is not already available on your cluster, a convenient way to install it is via the unofficial Prometheus Operator Helm chart. As you may not need an entire cluster monitoring stack, you can use our Helm configuration to install only the operator:
helm install prometheus-operator stable/prometheus-operator -f infrastructure/prometheus/helm-values.yaml
After installation, you need to create a Prometheus instance:
kubectl apply -f infrastructure/prometheus/prometheus.yaml
You might also need to apply the ServiceAccount, ClusterRole and ClusterRoleBinding, depending on your cluster's security policies.
For the individual benchmarking components to be monitored, ServiceMonitors are used. See the corresponding sections below for how to install them.
Grafana
As with Prometheus, we suggest creating a dedicated Grafana instance. Grafana with our default configuration can be installed with Helm:
helm install grafana stable/grafana -f infrastructure/grafana/values.yaml
The official Grafana Helm Chart repository provides further documentation including a table of configuration options.
We provide ConfigMaps for a Grafana dashboard and a Grafana data source.
Create the ConfigMap for the dashboard:
kubectl apply -f infrastructure/grafana/dashboard-config-map.yaml
Create the ConfigMap for the data source:
kubectl apply -f infrastructure/grafana/prometheus-datasource-config-map.yaml
A Kafka cluster
One possible way to set up a Kafka cluster is via Confluent's Helm Charts. To use these Helm charts in conjunction with the Prometheus Operator (see above), we provide a patch for them. Note that this patch is only required for observation and not for the actual benchmark execution and evaluation.
Our patched Confluent Helm Charts
To use our patched Confluent Helm Charts, clone the chart's repository. We also provide a default configuration. If you do not want to deploy 10 Kafka and 3 ZooKeeper instances, alter the configuration file accordingly. To install Confluent's Kafka with this configuration:
helm install my-confluent <path-to-cp-helm-charts> -f infrastructure/kafka/values.yaml
To let Prometheus scrape Kafka metrics, deploy a ServiceMonitor:
kubectl apply -f infrastructure/kafka/service-monitor.yaml
Other options for Kafka
Other Kafka deployments, for example, using Strimzi, should work in a similar way.
A Kafka Client Pod
A permanently running pod used for Kafka configuration is started via:
kubectl apply -f infrastructure/kafka/kafka-client.yaml
A ZooKeeper Client Pod
Likewise, a permanently running pod for ZooKeeper access is started via:
kubectl apply -f infrastructure/zookeeper-client.yaml
The Kafka Lag Exporter
Lightbend's Kafka Lag Exporter can be installed via Helm. We also provide a default configuration. To install it:
helm install kafka-lag-exporter https://github.com/lightbend/kafka-lag-exporter/releases/download/v0.6.3/kafka-lag-exporter-0.6.3.tgz -f infrastructure/kafka-lag-exporter/values.yaml
Python 3.7
For executing benchmarks, a Python 3.7 installation is required. We suggest using a virtual environment placed in the .venv directory (in the Theodolite root directory). A set of requirements is needed. You can install them with the following command (make sure to be in your virtual environment if you use one):
pip install -r requirements.txt
Required Manual Adjustments
Depending on your setup, some additional adjustments may be necessary:
- Change the Kafka and ZooKeeper servers in the Kubernetes deployments (uc1-application etc.) and in the run_XX.sh scripts
- Change Prometheus' URL in lag_analysis.py
- Change the path to your Python 3.7 virtual environment in the run_XX.sh scripts (to find the venv's bin/activate)
- Change the name of your Kubernetes namespace for Prometheus' ClusterRoleBinding
- Please let us know if further adjustments are necessary
Execution
Please note that a Python 3.7 installation is required for executing Theodolite.
The ./theodolite.py is the entrypoint for all benchmark executions. It has to be called as follows:
./theodolite.sh <use-case> <wl-values> <instances> <partitions> <cpu-limit> <memory-limit> <commit-interval> <duration> <domain-restriction> <search-strategy>
- <use-case>: Stream processing use case to be benchmarked. Has to be one of 1, 2, 3 or 4.
- <wl-values>: Values for the workload generator to be tested, separated by commas and sorted in ascending order. For example 100000,200000,300000.
- <instances>: Numbers of instances to be benchmarked, separated by commas and sorted in ascending order. For example 1,2,3,4.
- <partitions>: Number of partitions for Kafka topics. Optional. Default 40.
- <cpu-limit>: Kubernetes CPU limit. Optional. Default 1000m.
- <memory-limit>: Kubernetes memory limit. Optional. Default 4Gi.
- <commit-interval>: Kafka Streams' commit interval in milliseconds. Optional. Default 100.
- <duration>: Duration in minutes subexperiments should be executed for. Optional. Default 5.
- <domain-restriction>: The domain restriction: restrict-domain to use domain restriction, no-domain-restriction to not use domain restriction. Default no-domain-restriction. For more details see Section Domain Restriction.
- <search-strategy>: The benchmarking search strategy. Can be set to check-all, linear-search or binary-search. Default check-all. For more details see Section Benchmarking Search Strategies.
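Because the arguments are positional, every optional argument that precedes the one you want to change has to be spelled out as well. The following is a minimal Python sketch of how the argument list could be assembled and checked before the call. The helper names parse_ascending_ints and build_command are hypothetical and not part of Theodolite. The script name is taken from the usage line above, and the defaults are the ones documented in the parameter list.

```python
from typing import List


def parse_ascending_ints(raw: str) -> List[int]:
    """Parse a comma-separated list such as "100000,200000,300000".

    Hypothetical helper: raises ValueError if an entry is not an integer
    or if the values are not sorted in ascending order.
    """
    values = [int(part) for part in raw.split(",")]
    if values != sorted(values):
        raise ValueError(f"values must be in ascending order: {raw!r}")
    return values


def build_command(use_case: str, wl_values: str, instances: str,
                  partitions: str = "40", cpu_limit: str = "1000m",
                  memory_limit: str = "4Gi", commit_interval: str = "100",
                  duration: str = "5",
                  domain_restriction: str = "no-domain-restriction",
                  search_strategy: str = "check-all",
                  script: str = "./theodolite.sh") -> List[str]:
    """Assemble the positional argument list in the documented order."""
    if use_case not in {"1", "2", "3", "4"}:
        raise ValueError("use case has to be one of 1, 2, 3 or 4")
    # Only validate the lists; they are passed on as the original strings.
    parse_ascending_ints(wl_values)
    parse_ascending_ints(instances)
    if domain_restriction not in {"restrict-domain", "no-domain-restriction"}:
        raise ValueError(f"unknown domain restriction: {domain_restriction}")
    if search_strategy not in {"check-all", "linear-search", "binary-search"}:
        raise ValueError(f"unknown search strategy: {search_strategy}")
    return [script, use_case, wl_values, instances, partitions, cpu_limit,
            memory_limit, commit_interval, duration, domain_restriction,
            search_strategy]


if __name__ == "__main__":
    # Print the command line for use case 1 with all optional defaults.
    print(" ".join(build_command("1", "100000,200000,300000", "1,2,3,4")))
```

The resulting list could be handed to subprocess.run, which avoids shell quoting issues. Whether the real script performs the same validation is not stated here.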
Domain Restriction
For each dimension value, we have a domain of the amounts of instances. As a consequence, for each dimension value the maximum number of lag experiments is equal to the size of the domain. How the domain is determined is defined by the following domain restriction strategies.
- no-domain-restriction: For each dimension value, the domain of instances is equal to the set of all amounts of instances.
- restrict-domain: For each dimension value, the domain is computed as follows:
  - If the dimension value is the smallest dimension value, the domain of the amounts of instances is equal to the set of all amounts of instances.
  - If the dimension value is not the smallest dimension value and N is the minimal amount of instances that was suitable for the last smaller dimension value, the domain for this dimension value contains all amounts of instances greater than or equal to N.
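The restrict-domain rule can be sketched in a few lines of Python. This is an illustration of the rule as described above, not Theodolite's actual code. The function name restricted_domain and its parameters are hypothetical. The text does not say what happens when no amount of instances was suitable for the previous dimension value, so the sketch does not cover that case.

```python
from typing import List, Optional


def restricted_domain(all_instances: List[int],
                      last_suitable: Optional[int]) -> List[int]:
    """Domain of instance amounts for one dimension value (restrict-domain).

    all_instances: all amounts of instances passed to the benchmark.
    last_suitable: N, the minimal amount of instances that was suitable
        for the last smaller dimension value, or None for the smallest
        dimension value (assumption made for this sketch).
    """
    if last_suitable is None:
        # Smallest dimension value: the domain is the full set.
        return list(all_instances)
    # Otherwise keep only amounts greater than or equal to N.
    return [n for n in all_instances if n >= last_suitable]


if __name__ == "__main__":
    print(restricted_domain([1, 2, 3, 4], None))
    print(restricted_domain([1, 2, 3, 4], 3))
```

With instances 1,2,3,4 and N = 3, the domain shrinks to 3 and 4, so at most two lag experiments remain for that dimension value instead of four.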
Benchmarking Search Strategies
The following benchmarking search strategies are available:
- check-all: For each dimension value, execute one lag experiment for every amount of instances within the current domain.
- linear-search: A heuristic which works as follows: For each dimension value, execute lag experiments for the amounts of instances within the current domain, in order from the lowest to the highest amount of instances. The execution for a dimension value is stopped when a suitable amount of instances is found or when all lag experiments for the dimension value were unsuccessful.
- binary-search: A heuristic which works as follows: For each dimension value, execute lag experiments for the amounts of instances within the current domain in a binary-search-like order. The execution is stopped when a suitable amount of instances is found or when all lag experiments for the dimension value were unsuccessful.
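The three strategies differ mainly in how many lag experiments they run for one dimension value. The Python sketch below illustrates this. It is not Theodolite's implementation. The callable is_suitable is a hypothetical stand-in for running one lag experiment and evaluating its result. The binary-search variant additionally assumes that if some amount of instances is suitable, every larger amount is suitable too, which the text above does not state explicitly.

```python
from typing import Callable, List, Optional


def check_all(domain: List[int],
              is_suitable: Callable[[int], bool]) -> Optional[int]:
    """Run an experiment for every amount; return the minimal suitable one."""
    results = [(n, is_suitable(n)) for n in sorted(domain)]
    suitable = [n for n, ok in results if ok]
    return suitable[0] if suitable else None


def linear_search(domain: List[int],
                  is_suitable: Callable[[int], bool]) -> Optional[int]:
    """Go from the lowest to the highest amount; stop at the first success."""
    for n in sorted(domain):
        if is_suitable(n):
            return n
    return None  # all lag experiments were unsuccessful


def binary_search(domain: List[int],
                  is_suitable: Callable[[int], bool]) -> Optional[int]:
    """Binary search for the minimal suitable amount (assumes monotonicity)."""
    candidates = sorted(domain)
    lo, hi = 0, len(candidates) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_suitable(candidates[mid]):
            best = candidates[mid]  # suitable; try to find a smaller amount
            hi = mid - 1
        else:
            lo = mid + 1  # not suitable; only larger amounts can work
    return best


if __name__ == "__main__":
    executed = []

    def fake_experiment(n: int) -> bool:
        """Hypothetical experiment: 3 or more instances handle the load."""
        executed.append(n)
        return n >= 3

    for strategy in (check_all, linear_search, binary_search):
        executed.clear()
        found = strategy([1, 2, 3, 4], fake_experiment)
        print(strategy.__name__, found, len(executed))
```

In this toy setup all three strategies find the same amount of instances, but check-all runs four experiments, linear-search three and binary-search two. Since each subexperiment runs for the configured duration, the number of experiments largely determines the total benchmark time.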