Commit 1e8e383f authored by Sören Henning

Add demand metric notebook

parent 6ba9f900
%% Cell type:markdown id: tags:
# Theodolite Analysis - Demand Metric
This notebook applies Theodolite's *demand* metric to describe the scalability of a SUT based on Theodolite measurement data.
Theodolite's *demand* metric is a function mapping load intensities to the minimum resources (e.g., instances) required to process that load. With this notebook, the *demand* metric function is approximated by a map of the tested load intensities to their minimum required resources.
The final output when running this notebook is a CSV file providing this mapping. It can be used to create plots of a system's scalability using the `demand-metric-plot.ipynb` notebook.
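%% Cell type:markdown id: tags:
For illustration, the resulting mapping has one row per tested load intensity and could look like the following sketch (the values are hypothetical, not actual measurement results):
%% Cell type:code id: tags:
``` python
import pandas as pd

# Hypothetical example of the notebook's output format: each row maps a tested
# load intensity to the minimum number of resources (instances) that met the SLO.
example_demand = pd.DataFrame({
    'load': [10000, 20000, 30000],
    'resources': [1, 2, 4],
})
example_demand
```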
%% Cell type:markdown id: tags:
In the following cell, we need to specify:
* `exp_id`: The experiment id that is to be analyzed.
* `warmup_sec`: The number of seconds at the beginning of each experiment that are to be ignored (warm-up period).
* `max_lag_trend_slope`: The maximum tolerable increase in queued messages per second.
* `measurement_dir`: The directory where the measurement data files are to be found.
* `results_dir`: The directory where the computed demand CSV files are to be stored.
%% Cell type:code id: tags:
``` python
exp_id = 200
warmup_sec = 60
max_lag_trend_slope = 2000
measurement_dir = '<path-to>/results'
results_dir = '<path-to>/results-inst'
```
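%% Cell type:markdown id: tags:
Optionally, we can verify that the measurement directory contains data for the chosen experiment. The sketch below assumes only the file naming scheme used by the `demand` function (files starting with `exp<exp_id>` and ending with `totallag.csv`):
%% Cell type:code id: tags:
``` python
import os

# Optional sanity check: list all total-lag measurement files for this experiment.
lag_files = [f for f in os.listdir(measurement_dir)
             if f.startswith(f'exp{exp_id}') and f.endswith('totallag.csv')]
print(f'Found {len(lag_files)} measurement files for experiment {exp_id}.')
```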
%% Cell type:markdown id: tags:
With the following call, we compute our demand mapping.
%% Cell type:code id: tags:
``` python
from src.demand import demand
demand = demand(exp_id, measurement_dir, max_lag_trend_slope, warmup_sec)
```
%% Cell type:markdown id: tags:
We might already want to plot a simple visualization here:
%% Cell type:code id: tags:
``` python
demand.plot(kind='line', x='load', y='resources')
```
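%% Cell type:markdown id: tags:
Depending on the Jupyter setup, the plot may not be rendered automatically. In that case, matplotlib (the plotting backend used by pandas) can be invoked explicitly; the following is an optional sketch:
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt

# Render the pandas plot explicitly and label the axes.
ax = demand.plot(kind='line', x='load', y='resources')
ax.set_xlabel('Load intensity')
ax.set_ylabel('Minimum required resources')
plt.show()
```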
%% Cell type:markdown id: tags:
Finally, we store the results in a CSV file.
%% Cell type:code id: tags:
``` python
import os

demand.to_csv(os.path.join(results_dir, f'exp{exp_id}_demand.csv'), index=False)
```
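%% Cell type:markdown id: tags:
If `results_dir` does not exist yet, the `to_csv` call above fails. As an optional safeguard, the directory can be created before writing:
%% Cell type:code id: tags:
``` python
import os

# Create the results directory if it does not exist yet.
os.makedirs(results_dir, exist_ok=True)
```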
The `demand` function used above is implemented in the `src.demand` module:

import os

import pandas as pd
from sklearn.linear_model import LinearRegression


def demand(exp_id, directory, threshold, warmup_sec):
    raw_runs = []

    # Compute the SLO metric, i.e., the lag trend, for each tested configuration.
    filenames = [filename for filename in os.listdir(directory)
                 if filename.startswith(f"exp{exp_id}") and filename.endswith("totallag.csv")]
    for filename in filenames:
        run_params = filename[:-4].split("_")
        dim_value = run_params[2]
        instances = run_params[3]

        input = pd.read_csv(os.path.join(directory, filename))

        # Compute the elapsed seconds since the start of the run.
        input['sec_start'] = input.loc[0:, 'timestamp'] - input.iloc[0]['timestamp']

        # Discard the warm-up period.
        regress = input.loc[input['sec_start'] >= warmup_sec]

        # Fit a linear regression to the queued messages over time;
        # its slope is the lag trend of this run.
        X = regress.iloc[:, 2].values.reshape(-1, 1)  # reshape into a single-column array for sklearn
        Y = regress.iloc[:, 3].values.reshape(-1, 1)
        linear_regressor = LinearRegression()
        linear_regressor.fit(X, Y)
        trend_slope = linear_regressor.coef_[0][0]

        row = {'load': int(dim_value), 'resources': int(instances), 'trend_slope': trend_slope}
        raw_runs.append(row)

    runs = pd.DataFrame(raw_runs)

    # A configuration is suitable, i.e., meets its SLO, if the lag trend is below the threshold.
    runs["suitable"] = runs['trend_slope'] < threshold

    # Sort the results table by load and resources.
    runs.columns = runs.columns.str.strip()
    runs = runs.sort_values(by=["load", "resources"])

    # Keep only the suitable configurations.
    filtered = runs[runs['suitable']]

    # The demand per load intensity is the minimum number of resources
    # among all suitable configurations for that load.
    grouped = filtered.groupby(['load'])['resources'].min()
    demand_per_load = grouped.to_frame().reset_index()
    return demand_per_load
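A minimal usage sketch of this function, mirroring the notebook configuration above (paths and values are placeholders, not actual measurement results):

``` python
from src.demand import demand

# Placeholder values taken from the notebook cells above; adjust to your setup.
demand_per_load = demand(exp_id=200,
                         directory='<path-to>/results',
                         threshold=2000,
                         warmup_sec=60)

# The result is a DataFrame with columns 'load' and 'resources', i.e., the
# minimum number of instances that met the SLO for each tested load intensity.
print(demand_per_load)
```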