Add docs for load generator

2b48d30d · Sören Henning · d1de7b09 · 2b48d30d · 2b48d30d
Commit 2b48d30d authored Jan 19, 2022 by Sören Henning
--- a/docs/theodolite-benchmarks/index.md
+++ b/docs/theodolite-benchmarks/index.md
 ---
 title: Available Benchmarks
-has_children: false
+has_children: true
 nav_order: 7
 ---

 # Theodolite Benchmarks

-Theodolite comes with 4 application benchmarks, which are based on typical use cases for stream processing within microservices. For each benchmark, a corresponding load generator is provided. Currently, Theodolite provides benchmark implementations for Apache Kafka Streams and Apache Flink.
+Theodolite comes with 4 application benchmarks, which are based on typical use cases for stream processing within microservices. For each benchmark, a corresponding [load generator](load-generator) is provided. Currently, Theodolite provides benchmark implementations for Apache Kafka Streams and Apache Flink.


 Theodolite's benchmarks are based on typical use cases for stream processing within microservices. Specifically, all benchmarks represent some sort of microservice doing Industrial Internet of Things data analytics. 

--- a/docs/theodolite-benchmarks/load-generator.md
+++ b/docs/theodolite-benchmarks/load-generator.md
+---
+title: Load Generators
+parent: Available Benchmarks
+has_children: false
+nav_order: 1
+---
+
+# Load Generator Framework
+
+Theodolite's benchmarks come with a flexible load generator framework. It is used to create load on the [4 Theodolite benchmarks](#prebuild-container-images), but can also be applied to create [custom load generators](#creating-a-custom-load-generator).
+It is particularly designed for scalability: Just spin up multiple instances of the load generator and the instances automatically divide the load to be generated among themselves.
+
+## Prebuild container images
+
+For each benchmark, we provide a [load generator as OCI container image](https://github.com/orgs/cau-se/packages?tab=packages&q=workload-generator). These load generators simulate smart power meters in an industrial facility, which generate measurement records at a fixed rate. Records are published to an Apache Kafka topic (default) or sent as POST requests to an HTTP endpoint.
+
+You can simply run a load generator container, for example, for benchmark UC1 with:
+
+```sh
+docker run ghcr.io/cau-se/theodolite-uc1-workload-generator
+```
+
+### Message format
+
+Messages generated by the load generators represent a single measurement of [active power](https://en.wikipedia.org/wiki/AC_power#Active,_reactive,_apparent,_and_complex_power_in_sinusoidal_steady-state). The corresponding message type is specified as [`ActivePowerRecords`](https://github.com/cau-se/titan-ccp-common/blob/master/src/main/avro/ActivePower.avdl)
+defined with Avro. It consists of an identifier for simulated power sensor, a timestamp in epoch milliseconds and the actual measured (simulated) value in watts.
+
+When sending generated records via Apache Kafka, these records are serialized with the [Confluent Schema Registry](https://docs.confluent.io/platform/current/schema-registry).
+If the load generator is configured to send records as HTTP POST requests, records are serialized as JSON according to the following format:
+
+```json
+{
+  "identifier": "sensor-id",
+  "timestamp": 1645564942000,
+  "valueInW": 1234.56
+}
+```
+
+### Configuration
+
+The prebuild container images can be configured with the following environment variables:
+
+| Environment Variable | Description | Default |
+|:----|:----|:----|
+| `BOOTSTRAP_SERVER` | Address (`hostname:port`) of another load generator instance to form a cluster with. Can also be this instance. | `localhost:5701` |
+| `KUBERNETES_DNS_NAME` | Kubernetes service name to discover other load generators to form a cluster with. Must be a fully qualified domain name (FQDN), e.g., something like `<service>.<namespace>.svc.cluster.local`. * Requires `BOOTSTRAP_SERVER` not to be set. | |
+| `PORT` | Port used for for coordination among load generator instances. | 5701 |
+| `PORT_AUTO_INCREMENT` | If set to true and the specified PORT is already used, use the next higher one. Useful if multiple instances should run on the same host, without configuring each instance individually. | true |
+| `CLUSTER_NAME_PREFIX` | Only required if unrelated load generators form a cluster. | theodolite-load-generation |
+| `TARGET` | The target system the load generator send messages to. Valid values are: `kafka`, `http`. | `kafka` |
+| `KAFKA_BOOTSTRAP_SERVERS` | A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. See [Kafka producer config: `bootstrap.servers`](https://kafka.apache.org/documentation/#producerconfigs_bootstrap.servers) for more information. Only used if Kafka is set as `TARGET`. | `localhost:9092` |
+| `KAFKA_INPUT_TOPIC` | Name of the Kafka topic, which should receive the generated messages. Only used if Kafka is set as `TARGET`. | input |
+| `SCHEMA_REGISTRY_URL` | URL of the [Confluent Schema Registry](https://docs.confluent.io/platform/current/schema-registry). | `http://localhost:8081` |
+| `KAFKA_BATCH_SIZE` | Value for the Kafka producer configuration: [`batch.size`](https://kafka.apache.org/documentation/#producerconfigs_batch.size). Only used if Kafka is set as `TARGET`. | see Kafka producer config: [`batch.size`](https://kafka.apache.org/documentation/#producerconfigs_batch.size) |
+| `KAFKA_LINGER_MS` | Value for the Kafka producer configuration: [`linger.ms`](https://kafka.apache.org/documentation/#producerconfigs_linger.ms). Only used if Kafka is set as `TARGET`. | see Kafka producer config: [`linger.ms`](https://kafka.apache.org/documentation/#producerconfigs_linger.ms) |
+| `KAFKA_BUFFER_MEMORY` | Value for the Kafka producer configuration: [`buffer.memory`](https://kafka.apache.org/documentation/#producerconfigs_buffer.memory) Only used if Kafka is set as `TARGET`. | see Kafka producer config: [`buffer.memory`](https://kafka.apache.org/documentation/#producerconfigs_buffer.memory) |
+| `HTTP_URL` | The URL the load generator should post messages to. Only used if HTTP is set as `TARGET`. | |
+| `NUM_SENSORS` | The amount of simulated sensors. | 10 |
+| `PERIOD_MS` | The time in milliseconds between generating two messages for the same sensor. With our Theodolite benchmarks, we apply an [open workload model](https://www.usenix.org/legacy/event/nsdi06/tech/full_papers/schroeder/schroeder.pdf) in which new messages are generated at a fixed rate, without considering the think time of the target server nor the time required for generating a message. | 1000 |
+| `VALUE` | The constant `valueInW` of an `ActivePowerRecord`. | 10 |
+| `THREADS` | Number of worker threads used to generate the load. | 4 |
+
+Please note that there are some additional configuration options for benchmark [UC4's load generator](https://github.com/cau-se/theodolite/blob/master/theodolite-benchmarks/uc4-load-generator/src/main/java/theodolite/uc4/workloadgenerator/LoadGenerator.java).
+
+## Creating a custom load generator
+
+To create a custom load generator, you need to import the [load-generator-commons](https://github.com/cau-se/theodolite/tree/master/theodolite-benchmarks/load-generator-commons) project. You can then create an instance of the `LoadGenerator` object and call its `run` method:
+
+```java
+LoadGenerator loadGenerator = new LoadGenerator()
+    .setClusterConfig(clusterConfig)
+    .setLoadDefinition(new WorkloadDefinition(
+        new KeySpace(key_prefix, numSensors),
+        duration))
+    .setGeneratorConfig(new LoadGeneratorConfig(
+        recordGenerator,
+        recordSender))
+    .withThreads(threads);
+loadGenerator.run();
+```
+
+Alternatively, you can also start with a load generator populated with a default configuration or created from environment variables and then adjust the `LoadGenerator` as desired:
+
+```java
+LoadGenerator loadGeneratorFromDefaults = LoadGenerator.fromDefaults()
+LoadGenerator loadGeneratorFromEnv = LoadGenerator.fromEnvironment();
+```
\ No newline at end of file