Commit 26d64c83 authored by Sören Henning

Merge branch '254-adjust-analysis' into 'master'

Update demand-metric notebooks in order to run with the new implementation

Closes #254

See merge request !190
parents 3f1b94c2 892ff732
%% Cell type:markdown id: tags:
# Theodolite Analysis - Plotting the Demand Metric
This notebook creates a plot showing scalability as a function that maps load intensities to the resources required for processing them. It can combine multiple such plots in one figure, for example, to compare multiple systems or configurations.
The notebook takes one CSV file per plot, mapping load intensities to minimum required resources, as computed by the `demand-metric.ipynb` notebook.
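Such a CSV file contains a `load` and a `resources` column, with one row per tested load intensity. The following values are purely illustrative:

```
load,resources
50000,1
100000,2
200000,3
```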
%% Cell type:markdown id: tags:
First, we need to import some libraries that are required for creating the plots.
%% Cell type:code id: tags:
``` python
import os
import pandas as pd
from functools import reduce
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
from matplotlib.ticker import MaxNLocator
```
%% Cell type:markdown id: tags:
We need to specify the directory where the demand CSV files can be found, as well as a dictionary that maps a system description (e.g., its name) to the corresponding CSV file (prefix). To use Unicode narrow non-breaking spaces in the description, format it as `u"1000\u202FmCPU"`.
%% Cell type:code id: tags:
``` python
results_dir = '<path-to>/results'
experiments = {
    'System XYZ': 'exp200',
}
```
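%% Cell type:markdown id: tags:
Adding further entries to `experiments` (for example, a hypothetical `'System ABC': 'exp201'`) draws one line per system in the same figure, which is how multiple systems or configurations can be compared.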
%% Cell type:markdown id: tags:
Now, we combine all systems described in `experiments` into a single dataframe.
%% Cell type:code id: tags:
``` python
dataframes = [
    pd.read_csv(os.path.join(results_dir, f'{v}_demand.csv'))
        .set_index('load')
        .rename(columns={'resources': k})
    for k, v in experiments.items()]

df = reduce(lambda df1, df2: df1.join(df2, how='outer'), dataframes)
```
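%% Cell type:markdown id: tags:
Using an outer join keeps load intensities that were only tested for some of the systems; the missing entries become `NaN` and are dropped per series via `dropna()` when plotting.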
%% Cell type:markdown id: tags:
We might want to display the combined mapping before we plot it.
%% Cell type:code id: tags:
``` python
df
```
%% Cell type:markdown id: tags:
The following code creates a Matplotlib figure showing the scalability plots for all specified systems. You might want to adjust its styling according to your preferences. Make sure to also set a filename for the exported figure.
%% Cell type:code id: tags:
``` python
plt.style.use('ggplot')
plt.rcParams['pdf.fonttype'] = 42 # TrueType fonts
plt.rcParams['ps.fonttype'] = 42 # TrueType fonts
plt.rcParams['axes.facecolor'] = 'w'
plt.rcParams['axes.edgecolor'] = '#555555'
#plt.rcParams['ytick.color'] = 'black'
plt.rcParams['grid.color'] = '#dddddd'
plt.rcParams['axes.spines.top'] = False
plt.rcParams['axes.spines.right'] = False
plt.rcParams['legend.frameon'] = True
plt.rcParams['legend.framealpha'] = 1
plt.rcParams['legend.edgecolor'] = '1'
plt.rcParams['legend.borderpad'] = 1

# Format load intensities as thousands, e.g., 100000 -> '100k'
@FuncFormatter
def load_formatter(x, pos):
    return f'{(x/1000):.0f}k'

markers = ['s', 'D', 'o', 'v', '^', '<', '>', 'p', 'X']

def splitSerToArr(ser):
    # Currently unused helper that splits a series into its index and values
    return [ser.index, ser.to_numpy()]

plt.figure()
#plt.figure(figsize=(4.8, 3.6)) # For other plot sizes

# Draw one line per system, skipping loads without a demand value
for i, column in enumerate(df):
    plt.plot(df[column].dropna(), marker=markers[i], label=column)

plt.legend()

ax = plt.gca()
#ax = df.plot(kind='line', x='dim_value', legend=False, use_index=True)
ax.set_ylabel('number of instances')
ax.set_xlabel('messages/second')
ax.set_ylim(bottom=0)
#ax.set_xlim(left=0)
ax.yaxis.set_major_locator(MaxNLocator(integer=True)) # Only integer instance counts
ax.xaxis.set_major_formatter(load_formatter) # load_formatter is already a FuncFormatter

plt.savefig('temp.pdf', bbox_inches='tight')
```
......
%% Cell type:markdown id: tags:
# Theodolite Analysis - Demand Metric
This notebook applies Theodolite's *demand* metric to describe the scalability of a SUT (system under test) based on Theodolite measurement data.
Theodolite's *demand* metric is a function mapping load intensities to the minimum resources (e.g., instances) required to process this load. With this notebook, the *demand* metric function is approximated by a map of tested load intensities to their minimum required resources.
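Expressed as a formula, with $R$ denoting the set of tested resource amounts, this description reads as:

$$\mathit{demand}(l) = \min\{r \in R \mid \text{the SUT meets its SLOs when processing load } l \text{ with } r \text{ resources}\}$$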
The final output when running this notebook will be a CSV file providing this mapping. It can be used to create plots of a system's scalability using the `demand-metric-plot.ipynb` notebook.
%% Cell type:markdown id: tags:
In the following cell, we need to specify:
* `exp_id`: The ID of the experiment that is to be analyzed.
* `warmup_sec`: The number of seconds to be ignored at the beginning of each experiment.
* `max_lag_trend_slope`: The maximum tolerable increase in queued messages per second.
* `measurement_dir`: The directory where the measurement data files are to be found.
* `results_dir`: The directory where the computed demand CSV files are to be stored.
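For example, with `max_lag_trend_slope = 2000`, a configuration meets the SLO if its number of queued messages grows, on average, by at most 2000 messages per second after the warm-up period.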
%% Cell type:code id: tags:
``` python
exp_id = 200
warmup_sec = 60
max_lag_trend_slope = 2000
measurement_dir = '<path-to>/measurements'
results_dir = '<path-to>/results'
```
%% Cell type:markdown id: tags:
With the following call, we compute our demand mapping.
%% Cell type:code id: tags:
``` python
# Import under an alias so that re-running this cell still works after
# the `demand` name is re-assigned to the resulting dataframe below
from src.demand import demand as compute_demand

demand = compute_demand(exp_id, measurement_dir, max_lag_trend_slope, warmup_sec)
```
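%% Cell type:markdown id: tags:
Based on the filename parsing in `src/demand.py` (shown below), the measurement files are expected to be named like `exp<exp_id>_<load>_<instances>_lag-trend.csv`, from which the tested load intensity and the number of instances are extracted.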
%% Cell type:markdown id: tags:
We might already want to plot a simple visualization here:
%% Cell type:code id: tags:
``` python
demand.plot(kind='line', x='load', y='resources')
```
%% Cell type:markdown id: tags:
Finally, we store the results in a CSV file.
%% Cell type:code id: tags:
``` python
import os
demand.to_csv(os.path.join(results_dir, f'exp{exp_id}_demand.csv'), index=False)
```
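%% Cell type:markdown id: tags:
For `exp_id = 200`, this produces `exp200_demand.csv`, which matches the file prefix (`'exp200'`) expected by the `demand-metric-plot.ipynb` notebook.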
......
import os
from datetime import datetime, timedelta, timezone
import pandas as pd
from pandas.core.frame import DataFrame
from sklearn.linear_model import LinearRegression

def demand(exp_id, directory, threshold, warmup_sec):
    raw_runs = []

    # Compute SLI, i.e., lag trend, for each tested configuration
    filenames = [filename for filename in os.listdir(directory) if filename.startswith(f"exp{exp_id}") and "lag-trend" in filename and filename.endswith(".csv")]
    for filename in filenames:
        # Extract the load intensity and the number of instances from the filename
        run_params = filename[:-4].split("_")
        dim_value = run_params[1]
        instances = run_params[2]

        df = pd.read_csv(os.path.join(directory, filename))
        #input = df.loc[df['topic'] == "input"]
        input = df

        # Compute seconds since the start of the run
        input['sec_start'] = input.loc[0:, 'timestamp'] - input.iloc[0]['timestamp']

        # Ignore the warm-up period
        regress = input.loc[input['sec_start'] >= warmup_sec]

        # .values converts the columns into NumPy arrays; reshape(-1, 1) turns them
        # into single-column matrices as expected by scikit-learn
        X = regress.iloc[:, 2].values.reshape(-1, 1)
        Y = regress.iloc[:, 3].values.reshape(-1, 1)

        # Fit a linear regression; its slope is the lag trend of this run
        linear_regressor = LinearRegression()
        linear_regressor.fit(X, Y)
        Y_pred = linear_regressor.predict(X) # make predictions (not used for the metric)
        trend_slope = linear_regressor.coef_[0][0]

        row = {'load': int(dim_value), 'resources': int(instances), 'trend_slope': trend_slope}
        raw_runs.append(row)

    runs = pd.DataFrame(raw_runs)

    # Group by load and resources to handle repetitions, and take the median of the
    # repetitions; for an even number of repetitions, the mean of the two middle values is used
    medians = runs.groupby(by=['load', 'resources'], as_index=False).median()

    # Set suitable = True if SLOs are met, i.e., lag trend slope is below threshold
    medians["suitable"] = medians.apply(lambda row: row['trend_slope'] < threshold, axis=1)
    suitable = medians[medians.apply(lambda x: x['suitable'], axis=1)]

    # Compute minimal demand per load intensity
    demand_per_load = suitable.groupby(by=['load'], as_index=False)['resources'].min()

    return demand_per_load
......
@@ -115,10 +115,10 @@ class TheodoliteExecutor(
         val ioHandler = IOHandler()
         val resultsFolder = ioHandler.getResultFolderURL()
         this.config.executionId = getAndIncrementExecutionID(resultsFolder + "expID.txt")
-        ioHandler.writeToJSONFile(this.config, "$resultsFolder${this.config.executionId}-execution-configuration")
+        ioHandler.writeToJSONFile(this.config, "${resultsFolder}exp${this.config.executionId}-execution-configuration")
         ioHandler.writeToJSONFile(
             kubernetesBenchmark,
-            "$resultsFolder${this.config.executionId}-benchmark-configuration"
+            "${resultsFolder}exp${this.config.executionId}-benchmark-configuration"
         )
         val config = buildConfig()
@@ -130,7 +130,7 @@ class TheodoliteExecutor(
         }
         ioHandler.writeToJSONFile(
             config.compositeStrategy.benchmarkExecutor.results,
-            "$resultsFolder${this.config.executionId}-result"
+            "${resultsFolder}exp${this.config.executionId}-result"
         )
     }
......