diff --git a/analysis/README.md b/analysis/README.md
index 5fc0179bf9d1e103783fc1bdb2b030aacbb4ed98..3c96cf0b6e67a60ebbb4c610ca69fcbcb27876a0 100644
--- a/analysis/README.md
+++ b/analysis/README.md
@@ -3,18 +3,24 @@
 This directory contains Jupyter notebooks for analyzing and visualizing benchmark execution results.
 The following notebooks are provided:
 
+* [demand-metric.ipynb](demand-metric.ipynb): Creates CSV files describing scalability according to the Theodolite `demand` metric.
+* [demand-metric-plot.ipynb](demand-metric-plot.ipynb): Creates plots based on such CSV files of the `demand` metric.
+
+For legacy reasons, we also provide the following notebooks, which, however, are not documented:
+
 * [scalability-graph.ipynb](scalability-graph.ipynb): Creates a scalability graph for a certain benchmark execution.
 * [scalability-graph-final.ipynb](scalability-graph-final.ipynb): Combines the scalability graphs of multiple benchmark executions (e.g., for comparing different configurations).
 * [lag-trend-graph.ipynb](lag-trend-graph.ipynb): Visualizes the consumer lag evaluation over time along with the computed trend.
 
 ## Usage
 
-Basically, the Theodolite Analysis Jupyter notebooks should be runnable by any Jupyter server. To make it a bit easier,
+In general, the Theodolite Analysis Jupyter notebooks should be runnable by any Jupyter server. To make it a bit easier,
 we provide introductions for running notebooks with Docker and with Visual Studio Code. These introductions may also be
 a good starting point for using other services.
 
-For analyzing and visualizing benchmark results, either Docker or a Jupyter installation with Python 3.7 or newer is
-required (e.g., in a virtual environment).
+For analyzing and visualizing benchmark results, either Docker or a Jupyter installation with Python 3.7 or 3.8 is
+required (e.g., in a virtual environment). **Please note that Python 3.9 does not seem to work yet, as not all of our
+dependencies have been ported to Python 3.9.**
 
 ### Running with Docker
 
diff --git a/analysis/demand-metric-plot.ipynb b/analysis/demand-metric-plot.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..95f371510bbcc8af785739c50bce42e969ea2b80
--- /dev/null
+++ b/analysis/demand-metric-plot.ipynb
@@ -0,0 +1,173 @@
+{
+ "cells": [
+  {
+   "source": [
+    "# Theodolite Analysis - Plotting the Demand Metric\n",
+    "\n",
+    "This notebook creates a plot showing scalability as a function that maps load intensities to the resources required for processing them. It is able to combine multiple such plots in one figure, for example, to compare multiple systems or configurations.\n",
+    "\n",
+    "The notebook takes a CSV file for each plot mapping load intensities to minimum required resources, computed by the `demand-metric.ipynb` notebook."
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
+  {
+   "source": [
+    "First, we need to import some libraries, which are required for creating the plots."
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import pandas as pd\n",
+    "from functools import reduce\n",
+    "import matplotlib.pyplot as plt\n",
+    "from matplotlib.ticker import FuncFormatter\n",
+    "from matplotlib.ticker import MaxNLocator"
+   ]
+  },
+  {
+   "source": [
+    "We need to specify the directory where the demand CSV files can be found and a dictionary that maps a system description (e.g., its name) to the corresponding CSV file (prefix)."
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "results_dir = '<path-to>/results'\n",
+    "\n",
+    "experiments = {\n",
+    "    'System XYZ': 'exp200',\n",
+    "}\n"
+   ]
+  },
+  {
+   "source": [
+    "Now, we combine all systems described in `experiments` into a single data frame."
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dataframes = [pd.read_csv(os.path.join(results_dir, f'{v}_demand.csv')).set_index('load').rename(columns={\"resources\": k}) for k, v in experiments.items()]\n",
+    "\n",
+    "df = reduce(lambda df1, df2: df1.join(df2, how='outer'), dataframes)"
+   ]
+  },
+  {
+   "source": [
+    "We might want to display the mappings before we plot them."
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df"
+   ]
+  },
+  {
+   "source": [
+    "The following code creates a Matplotlib figure showing the scalability plots for all specified systems. You might want to adjust its styling according to your preferences. Make sure to also set a file name."
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plt.style.use('ggplot')\n",
+    "plt.rcParams['axes.facecolor']='w'\n",
+    "plt.rcParams['axes.edgecolor']='555555'\n",
+    "#plt.rcParams['ytick.color']='black'\n",
+    "plt.rcParams['grid.color']='dddddd'\n",
+    "plt.rcParams['axes.spines.top']='false'\n",
+    "plt.rcParams['axes.spines.right']='false'\n",
+    "plt.rcParams['legend.frameon']='true'\n",
+    "plt.rcParams['legend.framealpha']='1'\n",
+    "plt.rcParams['legend.edgecolor']='1'\n",
+    "plt.rcParams['legend.borderpad']='1'\n",
+    "\n",
+    "# Format x axis ticks as thousands, e.g., 100000 -> '100k'\n",
+    "@FuncFormatter\n",
+    "def load_formatter(x, pos):\n",
+    "    return f'{(x/1000):.0f}k'\n",
+    "\n",
+    "markers = ['s', 'D', 'o', 'v', '^', '<', '>', 'p', 'X']\n",
+    "\n",
+    "def splitSerToArr(ser):  # currently unused helper\n",
+    "    return [ser.index, ser.to_numpy()]\n",
+    "\n",
+    "plt.figure()\n",
+    "#plt.figure(figsize=(4.8, 3.6)) # For other plot sizes\n",
+    "for i, column in enumerate(df):\n",
+    "    plt.plot(df[column].dropna(), marker=markers[i], label=column)\n",
+    "plt.legend()\n",
+    "ax = plt.gca()\n",
+    "ax.set_ylabel('number of instances')\n",
+    "ax.set_xlabel('messages/second')\n",
+    "ax.set_ylim(bottom=0)\n",
+    "#ax.set_xlim(left=0)\n",
+    "ax.yaxis.set_major_locator(MaxNLocator(integer=True))\n",
+    "ax.xaxis.set_major_formatter(load_formatter)\n",
+    "\n",
+    "plt.savefig('temp.pdf', bbox_inches='tight')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python",
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "version": "3.8.5-final"
+  },
+  "orig_nbformat": 2,
+  "file_extension": ".py",
+  "mimetype": "text/x-python",
+  "name": "python",
+  "npconvert_exporter": "python",
+  "pygments_lexer": "ipython3",
+  "version": 3,
+  "kernelspec": {
+   "name": "python37064bitvenvvenv6c432ee1239d4f3cb23f871068b0267d",
+   "display_name": "Python 3.7.0 64-bit ('.venv': venv)",
+   "language": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
\ No newline at end of file
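For readers who want to inspect or prepare such input by hand, the following sketch (plain Python, with made-up load and resource values) illustrates the CSV format that the plot notebook above expects and the per-system transformation it applies before joining:

```python
import pandas as pd
from io import StringIO

# Illustrative contents of a demand CSV as written by demand-metric.ipynb;
# the 'load' and 'resources' values below are made up for this example.
csv_content = """load,resources
100000,1
200000,2
400000,3
"""

# The same transformation the notebook applies per experiment:
# index by load and label the resources column with the system's name.
df = pd.read_csv(StringIO(csv_content)).set_index('load').rename(columns={'resources': 'System XYZ'})
print(df)
```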
diff --git a/analysis/demand-metric.ipynb b/analysis/demand-metric.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..525bde211afcabeecf52f1e88f3c91c02a77a152
--- /dev/null
+++ b/analysis/demand-metric.ipynb
@@ -0,0 +1,119 @@
+{
+ "cells": [
+  {
+   "source": [
+    "# Theodolite Analysis - Demand Metric\n",
+    "\n",
+    "This notebook applies Theodolite's *demand* metric to describe the scalability of a SUT based on Theodolite measurement data.\n",
+    "\n",
+    "Theodolite's *demand* metric is a function mapping load intensities to the minimum resources (e.g., instances) required to process this load. With this notebook, the *demand* metric function is approximated by a map of tested load intensities to their minimum required resources.\n",
+    "\n",
+    "The final output when running this notebook will be a CSV file providing this mapping. It can be used to create nice plots of a system's scalability using the `demand-metric-plot.ipynb` notebook."
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
+  {
+   "source": [
+    "In the following cell, we need to specify:\n",
+    "\n",
+    "* `exp_id`: The experiment id that is to be analyzed.\n",
+    "* `warmup_sec`: The number of seconds to be ignored at the beginning of each experiment.\n",
+    "* `max_lag_trend_slope`: The maximum tolerable increase in queued messages per second.\n",
+    "* `measurement_dir`: The directory where the measurement data files are to be found.\n",
+    "* `results_dir`: The directory where the computed demand CSV files are to be stored."
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "exp_id = 200\n",
+    "warmup_sec = 60\n",
+    "max_lag_trend_slope = 2000\n",
+    "measurement_dir = '<path-to>/measurements'\n",
+    "results_dir = '<path-to>/results'\n"
+   ]
+  },
+  {
+   "source": [
+    "With the following call, we compute our demand mapping."
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from src.demand import demand\n",
+    "\n",
+    "demand_df = demand(exp_id, measurement_dir, max_lag_trend_slope, warmup_sec)"
+   ]
+  },
+  {
+   "source": [
+    "We might already want to plot a simple visualization here:"
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "demand_df.plot(kind='line', x='load', y='resources')"
+   ]
+  },
+  {
+   "source": [
+    "Finally, we store the results in a CSV file."
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "demand_df.to_csv(os.path.join(results_dir, f'exp{exp_id}_demand.csv'), index=False)"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python",
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "version": "3.8.5-final"
+  },
+  "orig_nbformat": 2,
+  "file_extension": ".py",
+  "mimetype": "text/x-python",
+  "name": "python",
+  "npconvert_exporter": "python",
+  "pygments_lexer": "ipython3",
+  "version": 3,
+  "kernelspec": {
+   "name": "python37064bitvenvvenv6c432ee1239d4f3cb23f871068b0267d",
+   "display_name": "Python 3.7.0 64-bit ('.venv': venv)",
+   "language": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
\ No newline at end of file
diff --git a/analysis/scalability-graph.ipynb b/analysis/scalability-graph.ipynb
index 868f950dfea091b8fd6dbc78dc4b7471086c8947..8e4b3bd99ef032b75826535eaebd2b435ccf0881 100644
--- a/analysis/scalability-graph.ipynb
+++ b/analysis/scalability-graph.ipynb
@@ -245,7 +245,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-   "min_suitable_instances.to_csv(os.path.join(directory_out, f'../results-inst/exp{exp_id}_min-suitable-instances.csv'), index=False)"
+   "min_suitable_instances.to_csv(os.path.join(directory_out, f'exp{exp_id}_min-suitable-instances.csv'), index=False)"
  ]
 },
 {
diff --git a/analysis/src/demand.py b/analysis/src/demand.py
new file mode 100644
index 0000000000000000000000000000000000000000..dfb20c05af8e9a134eedd2cdb584c961a82369f5
--- /dev/null
+++ b/analysis/src/demand.py
@@ -0,0 +1,59 @@
+import os
+import pandas as pd
+from sklearn.linear_model import LinearRegression
+
+def demand(exp_id, directory, threshold, warmup_sec):
+    raw_runs = []
+
+    # Compute the SLO, i.e., the lag trend slope, for each tested configuration
+    filenames = [filename for filename in os.listdir(directory) if filename.startswith(f"exp{exp_id}") and filename.endswith("totallag.csv")]
+    for filename in filenames:
+        run_params = filename[:-4].split("_")
+        dim_value = run_params[2]
+        instances = run_params[3]
+
+        df = pd.read_csv(os.path.join(directory, filename))
+
+        # Convert timestamps into seconds since the start of the run
+        df['sec_start'] = df['timestamp'] - df['timestamp'].iloc[0]
+
+        # Ignore the warm-up period at the beginning of the run
+        regress = df.loc[df['sec_start'] >= warmup_sec]
+
+        # Fit a linear regression of the total lag over time; its slope is the
+        # increase in queued messages per second
+        X = regress['sec_start'].values.reshape(-1, 1)
+        Y = regress['value'].values.reshape(-1, 1)
+        linear_regressor = LinearRegression()
+        linear_regressor.fit(X, Y)
+        trend_slope = linear_regressor.coef_[0][0]
+
+        raw_runs.append({'load': int(dim_value), 'resources': int(instances), 'trend_slope': trend_slope})
+
+    runs = pd.DataFrame(raw_runs)
+
+    # A configuration is suitable if its SLO is met, i.e., its lag trend slope is below the threshold
+    runs['suitable'] = runs['trend_slope'] < threshold
+
+    # Sort the results table
+    runs = runs.sort_values(by=['load', 'resources'])
+
+    # Keep only suitable configurations
+    filtered = runs[runs['suitable']]
+
+    # Compute the demand, i.e., the minimum suitable resources, per load intensity
+    grouped = filtered.groupby(['load'])['resources'].min()
+    demand_per_load = grouped.to_frame().reset_index()
+
+    return demand_per_load
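Since `demand()` in `analysis/src/demand.py` is a plain function, it can also be invoked outside the notebooks, e.g., from a script. A minimal usage sketch, assuming it is run from the `analysis/` directory; the paths are placeholders and the benchmark name in the example file name is hypothetical:

```python
# Standalone usage of src/demand.py (run from the analysis/ directory).
from src.demand import demand

# Judging from the file name parsing above, measurement files are expected to
# be named like 'exp<exp_id>_<benchmark>_<load>_<instances>_totallag.csv',
# e.g., 'exp200_uc1_100000_1_totallag.csv' (benchmark name is hypothetical).
demand_df = demand(exp_id=200,
                   directory='<path-to>/measurements',
                   threshold=2000,  # max. tolerable lag trend slope (msgs/s)
                   warmup_sec=60)
demand_df.to_csv('exp200_demand.csv', index=False)
```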
diff --git a/execution/README.md b/execution/README.md
index d8f30d0742d6e2037840332ec597637619510c79..358ce270400d1e4e4947a8ef736feac74c314163 100644
--- a/execution/README.md
+++ b/execution/README.md
@@ -153,11 +153,11 @@ declarations for different volume types.
 Using a [hostPath volume](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath) is the easiest option when
 running Theodolite locally, e.g., with minikube or kind.
 
-Just modify `infrastructure/kubernetes/volumeSingle.yaml` by setting `path` to the directory on your host machine where
+Just modify `infrastructure/kubernetes/volume-hostpath.yaml` by setting `path` to the directory on your host machine where
 all benchmark results should be stored and run:
 
 ```sh
-kubectl apply -f infrastructure/kubernetes/volumeSingle.yaml
+kubectl apply -f infrastructure/kubernetes/volume-hostpath.yaml
 ```
 
 ##### *local* volume
 
@@ -166,12 +166,12 @@ A [local volume](https://kubernetes.io/docs/concepts/storage/volumes/#local) is
 access (e.g., via SSH) to one of your cluster nodes.
 
 You first need to create a directory on a selected node where all benchmark results should be stored. Next, modify
-`infrastructure/kubernetes/volumeCluster.yaml` by setting `<node-name>` to your selected node (this node will most
-likely also execute the job). Further, you have to set `path` to the directory on the node you just created. To deploy
+`infrastructure/kubernetes/volume-local.yaml` by setting `<node-name>` to your selected node. (This node will most
+likely also execute the [Theodolite job](#Execution).) Further, you have to set `path` to the directory on the node you just created. To deploy
 your volume run:
 
 ```sh
-kubectl apply -f infrastructure/kubernetes/volumeCluster.yaml
+kubectl apply -f infrastructure/kubernetes/volume-local.yaml
 ```
 
 ##### Other volumes
 
@@ -195,7 +195,7 @@ RBAC is enabled on your cluster (see installation of [Theodolite RBAC](#Theodoli
 To start the execution of a benchmark run (with `<your-theodolite-yaml>` being your job definition):
 
 ```sh
-kubectl apply -f <your-theodolite-yaml>
+kubectl create -f <your-theodolite-yaml>
 ```
 
 This will create a pod with a name such as `your-job-name-xxxxxx`. You can verify this via `kubectl get pods`. With
diff --git a/execution/infrastructure/kafka-lag-exporter/values.yaml b/execution/infrastructure/kafka-lag-exporter/values.yaml
index b83a911283a7e8264f982f9eb5d550ad5497ec9d..8e53454345df75b55d5d36799dd0b0f0f75233a0 100644
--- a/execution/infrastructure/kafka-lag-exporter/values.yaml
+++ b/execution/infrastructure/kafka-lag-exporter/values.yaml
@@ -1,3 +1,6 @@
+image:
+  pullPolicy: IfNotPresent
+
 clusters:
   - name: "my-confluent-cp-kafka"
     bootstrapBrokers: "my-confluent-cp-kafka:9092"
diff --git a/execution/infrastructure/kafka/values.yaml b/execution/infrastructure/kafka/values.yaml
index 1efbda0515d0a9c881552cb63293ca8cc28c98b2..e65a5fc567d39c7389479d406fa9e6d7156b0f0a 100644
--- a/execution/infrastructure/kafka/values.yaml
+++ b/execution/infrastructure/kafka/values.yaml
@@ -55,6 +55,7 @@ cp-kafka:
     # "min.insync.replicas": 2
     "auto.create.topics.enable": false
     "log.retention.ms": "10000" # 10s
+    #"log.retention.ms": "86400000" # 24h
     "metrics.sample.window.ms": "5000" #5s
 
 ## ------------------------------------------------------
diff --git a/execution/infrastructure/kubernetes/volumeSingle.yaml b/execution/infrastructure/kubernetes/volume-hostpath.yaml
similarity index 100%
rename from execution/infrastructure/kubernetes/volumeSingle.yaml
rename to execution/infrastructure/kubernetes/volume-hostpath.yaml
diff --git a/execution/infrastructure/kubernetes/volumeCluster.yaml b/execution/infrastructure/kubernetes/volume-local.yaml
similarity index 100%
rename from execution/infrastructure/kubernetes/volumeCluster.yaml
rename to execution/infrastructure/kubernetes/volume-local.yaml
diff --git a/execution/theodolite.yaml b/execution/theodolite.yaml
index 68d53386bcf5e77ce08d964f3c04eb000794575c..06d14a0f589b2ac7a16ebaaae4d1490b840ea57b 100644
--- a/execution/theodolite.yaml
+++ b/execution/theodolite.yaml
@@ -11,7 +11,7 @@ spec:
         claimName: theodolite-pv-claim
       containers:
         - name: theodolite
-          image: bvonheid/theodolite:latest
+          image: ghcr.io/cau-se/theodolite:latest
           # imagePullPolicy: Never # Used to pull "own" local image
           env:
             - name: UC # mandatory
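To make the SLO check in `analysis/src/demand.py` above more tangible, here is a small self-contained sketch, using synthetic lag measurements with illustrative slope and threshold values, of how the lag trend slope is fitted and compared against `max_lag_trend_slope`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic lag measurements: one sample per second over 5 minutes, with the
# queue growing by roughly 1500 messages per second plus some noise.
rng = np.random.default_rng(0)
seconds = np.arange(300)
total_lag = 1500 * seconds + rng.normal(0, 5000, seconds.size)

# Fit the lag trend, exactly as demand() does for each run's totallag.csv.
X = seconds.reshape(-1, 1)
Y = total_lag.reshape(-1, 1)
slope = LinearRegression().fit(X, Y).coef_[0][0]

max_lag_trend_slope = 2000  # threshold value from the notebook above
print(f"lag trend slope: {slope:.1f} msgs/s -> suitable: {slope < max_lag_trend_slope}")
```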