Merge remote-tracking branch 'upstream' into 109-implement-kotlin-prototype

1d42d7f9 · Simon Ehrenstein · c69703f5 · 9aec1b00 · 1d42d7f9 · 1d42d7f9
Commit 1d42d7f9 authored Feb 8, 2021 by Simon Ehrenstein
--- a/analysis/README.md
+++ b/analysis/README.md
@@ -9,7 +9,7 @@ benchmark execution results and plotting. The following notebooks are provided:
 For legacy reasons, we also provide the following notebooks, which, however, are not documented:
 * [scalability-graph.ipynb](scalability-graph.ipynb): Creates a scalability graph for a certain benchmark execution.
-* [scalability-graph-final.ipynb](scalability-graph-final.ipynb): Combines the scalability graphs of multiple benchmark executions (e.g., for comparing different configurations).
+* [scalability-graph-plotter.ipynb](scalability-graph-plotter.ipynb): Combines the scalability graphs of multiple benchmark executions (e.g., for comparing different configurations).
 * [lag-trend-graph.ipynb](lag-trend-graph.ipynb): Visualizes the consumer lag evaluation over time along with the computed trend.
 ## Usage

--- a/analysis/demand-metric-plot.ipynb
+++ b/analysis/demand-metric-plot.ipynb
@@ -34,7 +34,7 @@
  },
  {
   "source": [
-    "We need to specify the directory, where the demand CSV files can be found, and a dictionary that maps a system description (e.g. its name) to the corresponding CSV file (prefix). "
+    "We need to specify the directory, where the demand CSV files can be found, and a dictionary that maps a system description (e.g. its name) to the corresponding CSV file (prefix). To use Unicode narrow non-breaking spaces in the description, format it as `u\"1000\\u202FmCPU\"`."
   ],
   "cell_type": "markdown",
   "metadata": {}

 %% Cell type:markdown id: tags:
 # Theodolite Analysis - Plotting the Demand Metric
 This notebook creates a plot, showing scalability as a function that maps load intensities to the resources required for processing them. It is able to combine multiple such plots in one figure, for example, to compare multiple systems or configurations.
 The notebook takes a CSV file for each plot, mapping load intensities to minimum required resources, as computed by the `demand-metric.ipynb` notebook.
 %% Cell type:markdown id: tags:
 First, we need to import some libraries, which are required for creating the plots.
 %% Cell type:code id: tags:
 ``` python
 import os
 import pandas as pd
 from functools import reduce
 import matplotlib.pyplot as plt
 from matplotlib.ticker import FuncFormatter
 from matplotlib.ticker import MaxNLocator
 ```
 %% Cell type:markdown id: tags:
-We need to specify the directory, where the demand CSV files can be found, and a dictionary that maps a system description (e.g. its name) to the corresponding CSV file (prefix).
+We need to specify the directory, where the demand CSV files can be found, and a dictionary that maps a system description (e.g. its name) to the corresponding CSV file (prefix). To use Unicode narrow non-breaking spaces in the description, format it as `u"1000\u202FmCPU"`.
 %% Cell type:code id: tags:
 ``` python
 results_dir = '<path-to>/results'
 experiments = {
    'System XYZ': 'exp200',
 }
 ```
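As a small illustration of the note on narrow non-breaking spaces, the following sketch builds an `experiments` dictionary whose descriptions contain U+202F. The second entry and both file prefixes are hypothetical; in Python 3 the `u` prefix is optional, so it is omitted here.

```python
# Hypothetical descriptions and CSV prefixes; only the \u202F escape comes from the note above.
experiments = {
    "1000\u202FmCPU": "exp200",
    "2000\u202FmCPU": "exp201",
}

for description, prefix in experiments.items():
    # U+202F is a single character, so number and unit are not split across lines in a legend.
    print(repr(description), "->", f"{prefix}_demand.csv")
```

The escape is resolved when the string literal is parsed, so the dictionary key already contains the actual space character.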
 %% Cell type:markdown id: tags:
 Now, we combine all systems described in `experiments`.
 %% Cell type:code id: tags:
 ``` python
 dataframes = [pd.read_csv(os.path.join(results_dir, f'{v}_demand.csv')).set_index('load').rename(columns={"resources": k}) for k, v in experiments.items()]
 df = reduce(lambda df1,df2: df1.join(df2,how='outer'), dataframes)
 ```
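To show what the outer join in this cell produces, here is a minimal sketch with two hypothetical in-memory demand tables instead of CSV files. The system names and all values are invented for illustration.

```python
from functools import reduce

import pandas as pd

# Hypothetical demand tables for two systems, tested at partly different loads.
a = pd.DataFrame({"load": [10000, 20000, 30000], "resources": [1, 2, 4]})
b = pd.DataFrame({"load": [10000, 30000, 40000], "resources": [1, 3, 5]})

# Same steps as in the notebook cell: index by load, name the column after the system.
frames = [
    d.set_index("load").rename(columns={"resources": name})
    for name, d in {"System A": a, "System B": b}.items()
]

# An outer join keeps every load that occurs in at least one table.
combined = reduce(lambda left, right: left.join(right, how="outer"), frames)
print(combined)
```

Loads that were tested for only one system end up as `NaN` in the other system's column, which is why dropping missing values per column before plotting is useful.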
 %% Cell type:markdown id: tags:
 We might want to display the mappings before we plot them.
 %% Cell type:code id: tags:
 ``` python
 df
 ```
 %% Cell type:markdown id: tags:
 The following code creates a Matplotlib figure showing the scalability plots for all specified systems. You might want to adjust its styling etc. according to your preferences. Make sure to also set a filename.
 %% Cell type:code id: tags:
 ``` python
 plt.style.use('ggplot')
 plt.rcParams['axes.facecolor']='w'
 plt.rcParams['axes.edgecolor']='555555'
 #plt.rcParams['ytick.color']='black'
 plt.rcParams['grid.color']='dddddd'
 plt.rcParams['axes.spines.top']='false'
 plt.rcParams['axes.spines.right']='false'
 plt.rcParams['legend.frameon']='true'
 plt.rcParams['legend.framealpha']='1'
 plt.rcParams['legend.edgecolor']='1'
 plt.rcParams['legend.borderpad']='1'
 @FuncFormatter
 def load_formatter(x, pos):
    return f'{(x/1000):.0f}k'
 markers = ['s', 'D', 'o', 'v', '^', '<', '>', 'p', 'X']
 def splitSerToArr(ser):
    return [ser.index, ser.to_numpy()]
 plt.figure()
 #plt.figure(figsize=(4.8, 3.6)) # For other plot sizes
 #ax = df.plot(kind='line', marker='o')
 for i, column in enumerate(df):
    plt.plot(df[column].dropna(), marker=markers[i], label=column)
 plt.legend()
 ax = plt.gca()
 #ax = df.plot(kind='line',x='dim_value', legend=False, use_index=True)
 ax.set_ylabel('number of instances')
 ax.set_xlabel('messages/second')
 ax.set_ylim(ymin=0)
 #ax.set_xlim(xmin=0)
 ax.yaxis.set_major_locator(MaxNLocator(integer=True))
 ax.xaxis.set_major_formatter(FuncFormatter(load_formatter))
 plt.savefig('temp.pdf', bbox_inches='tight')
 ```
 %% Cell type:code id: tags:
 ``` python
 ```

--- a/analysis/demand-metric.ipynb
+++ b/analysis/demand-metric.ipynb
@@ -4,7 +4,7 @@
   "source": [
    "# Theodolite Analysis - Demand Metric\n",
    "\n",
-    "This notebook allows applies Theodolite's *demand* metric to describe scalability of a SUT based on Theodolite measurement data.\n",
+    "This notebook applies Theodolite's *demand* metric to describe scalability of a SUT based on Theodolite measurement data.\n",
    "\n",
    "Theodolite's *demand* metric is a function, mapping load intensities to the minimum required resources (e.g., instances) that are required to process this load. With this notebook, the *demand* metric function is approximated by a map of tested load intensities to their minimum required resources.\n",
    "\n",

 %% Cell type:markdown id: tags:
 # Theodolite Analysis - Demand Metric
-This notebook allows applies Theodolite's *demand* metric to describe scalability of a SUT based on Theodolite measurement data.
+This notebook applies Theodolite's *demand* metric to describe scalability of a SUT based on Theodolite measurement data.
 Theodolite's *demand* metric is a function, mapping load intensities to the minimum required resources (e.g., instances) that are required to process this load. With this notebook, the *demand* metric function is approximated by a map of tested load intensities to their minimum required resources.
 The final output when running this notebook will be a CSV file providing this mapping. It can be used to create nice plots of a system's scalability using the `demand-metric-plot.ipynb` notebook.
 %% Cell type:markdown id: tags:
 In the following cell, we need to specify:
 * `exp_id`: The experiment id that is to be analyzed.
 * `warmup_sec`: The number of seconds which are to be ignored in the beginning of each experiment.
 * `max_lag_trend_slope`: The maximum tolerable increase in queued messages per second.
 * `measurement_dir`: The directory where the measurement data files are to be found.
 * `results_dir`: The directory where the computed demand CSV files are to be stored.
 %% Cell type:code id: tags:
 ``` python
 exp_id = 200
 warmup_sec = 60
 max_lag_trend_slope = 2000
 measurement_dir = '<path-to>/measurements'
 results_dir = '<path-to>/results'
 ```
 %% Cell type:markdown id: tags:
 With the following call, we compute our demand mapping.
 %% Cell type:code id: tags:
 ``` python
 from src.demand import demand
 demand = demand(exp_id, measurement_dir, max_lag_trend_slope, warmup_sec)
 ```
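The idea behind the demand mapping can be sketched as follows. This is not the implementation in `src.demand`; it is a simplified illustration that assumes the lag trend is the slope of a least-squares line through the lag samples after the warm-up period. The helper names `lag_trend_slope` and `demand_sketch`, the column name `slope`, and all numbers are hypothetical.

```python
import numpy as np
import pandas as pd


def lag_trend_slope(seconds, lag, warmup_sec):
    """Slope (messages per second) of a least-squares line through the lag after warm-up."""
    seconds = np.asarray(seconds, dtype=float)
    lag = np.asarray(lag, dtype=float)
    keep = seconds >= warmup_sec  # ignore samples from the warm-up period
    slope, _intercept = np.polyfit(seconds[keep], lag[keep], 1)
    return slope


def demand_sketch(slopes, max_lag_trend_slope):
    """Map each load to the smallest resource amount whose lag trend slope is tolerable."""
    suitable = slopes[slopes["slope"] <= max_lag_trend_slope]
    return suitable.groupby("load")["resources"].min().reset_index()


# Hypothetical lag series that grows by 3000 queued messages per second.
seconds = np.arange(0, 300, 5)
print(lag_trend_slope(seconds, 1000 + 3000 * seconds, warmup_sec=60))

# Hypothetical slopes per (load, resources) combination.
slopes = pd.DataFrame({
    "load": [10000, 10000, 20000, 20000, 30000, 30000],
    "resources": [1, 2, 1, 2, 1, 2],
    "slope": [50, 10, 5000, 100, 9000, 4000],
})
result = demand_sketch(slopes, max_lag_trend_slope=2000)
print(result)
```

In this sketch, a load for which no tested resource amount stays below the threshold (30000 here) is simply absent from the result.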
 %% Cell type:markdown id: tags:
 We might already want to plot a simple visualization here:
 %% Cell type:code id: tags:
 ``` python
 demand.plot(kind='line',x='load',y='resources')
 ```
 %% Cell type:markdown id: tags:
 Finally, we store the results in a CSV file.
 %% Cell type:code id: tags:
 ``` python
 import os
 demand.to_csv(os.path.join(results_dir, f'exp{exp_id}_demand.csv'), index=False)
 ```

--- a/analysis/scalability-graph-finish.ipynb
+++ b/analysis/scalability-graph-finish.ipynb
--- a/docs/README.md
+++ b/docs/README.md
+---
+title: Theodolite
+nav_order: 1
+permalink: /
+---
+# Theodolite
+> A theodolite is a precision optical instrument for measuring angles between designated visible points in the horizontal and vertical planes.  -- <cite>[Wikipedia](https://en.wikipedia.org/wiki/Theodolite)</cite>
+Theodolite is a framework for benchmarking the horizontal and vertical scalability of stream processing engines. It consists of three modules:
+## Theodolite Benchmarks
+Theodolite contains 4 application benchmarks, which are based on typical use cases for stream processing within microservices. For each benchmark, a corresponding workload generator is provided. Currently, this repository provides benchmark implementations for Kafka Streams.
+## Theodolite Execution Framework
+Theodolite aims to benchmark scalability of stream processing engines for real use cases. Microservices that apply stream processing techniques are usually deployed in elastic cloud environments. Hence, Theodolite's cloud-native benchmarking framework deploys as components in a cloud environment, orchestrated by Kubernetes. More information on how to execute scalability benchmarks can be found in [Theodolite execution framework](execution).
+## Theodolite Analysis Tools
+Theodolite's benchmarking method creates a *scalability graph* that allows drawing conclusions about the scalability of a stream processing engine or its deployment. A scalability graph shows how resource demand evolves with an increasing workload. Theodolite provides Jupyter notebooks for creating such scalability graphs based on benchmarking results from the execution framework. More information can be found in [Theodolite analysis tool](analysis).
--- a/docs/_config.yml
+++ b/docs/_config.yml
+title: "Theodolite"
+remote_theme: pmarsceill/just-the-docs
+#color_scheme: "dark"
+aux_links:
+    "Theodolite on GitHub":
+      - "//github.com/cau-se/theodolite"
\ No newline at end of file
--- a/docs/release-process.md
+++ b/docs/release-process.md
+---
+title: Release Process
+has_children: false
+nav_order: 2
+---
 # Release Process
 We assume that we are creating the release `v0.1.1`. Please make sure to adjust

--- a/execution/infrastructure/kafka/values.yaml
+++ b/execution/infrastructure/kafka/values.yaml
@@ -56,6 +56,7 @@ cp-kafka:
    "auto.create.topics.enable": false
    "log.retention.ms": "10000" # 10s
    # "log.retention.ms": "86400000" # 24h
+    # "group.initial.rebalance.delay.ms": "30000" # 30s
    "metrics.sample.window.ms": "5000" #5s
 ## ------------------------------------------------------