Commit 0cfb8e4b authored by Simon Ehrenstein

Merge

parents 5421ed7f 67371596
Merge request !6: Add Distributed Workload Generator
Showing 299 additions and 130 deletions
configFilePath=config/checkstyle.xml
customModulesJarPaths=
eclipse.preferences.version=1
enabled=true
enabled=false
customRulesJars=
eclipse.preferences.version=1
enabled=true
enabled=false
ruleSetFilePath=config/pmd.xml
Current envisaged project structure:
# Theodolite
> A theodolite is a precision optical instrument for measuring angles between designated visible points in the horizontal and vertical planes. -- <cite>[Wikipedia](https://en.wikipedia.org/wiki/Theodolite)</cite>
* `scalability-benchmarking`
* `uc1-application-kstreams`
* `uc1-application-samza`
* `uc1-workload-generation`
* ...
Theodolite is a framework for benchmarking the horizontal and vertical scalability of stream processing engines. It consists of three modules:
## Theodolite Benchmarks
Theodolite contains 4 application benchmarks, which are based on typical use cases for stream processing within microservices. For each benchmark, a corresponding workload generator is provided. Currently, this repository provides benchmark implementations for Kafka Streams.
## Theodolite Execution Framework
Theodolite aims to benchmark scalability of stream processing engines for real use cases. Microservices that apply stream processing techniques are usually deployed in elastic cloud environments. Hence, Theodolite's cloud-native benchmarking framework is deployed as a set of components in a cloud environment, orchestrated by Kubernetes. More information on how to execute scalability benchmarks can be found in the [Theodolite execution framework](execution).
## Theodolite Analysis Tools
Theodolite's benchmarking method creates a *scalability graph*, which allows drawing conclusions about the scalability of a stream processing engine or its deployment. A scalability graph shows how resource demand evolves with increasing workload. Theodolite provides Jupyter notebooks for creating such scalability graphs based on benchmarking results from the execution framework. More information can be found in the [Theodolite analysis tools](analysis).
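For illustration, such a graph can be sketched with pandas and matplotlib. The numbers below are made up and only show the shape of the data the analysis notebooks operate on (minimum suitable instances per tested workload):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical benchmark results: workload dimension -> minimum suitable instances
demand = pd.DataFrame({
    'dim_value': [25000, 50000, 75000, 100000],  # tested workloads
    'instances': [1, 2, 2, 4],                   # minimum suitable instances
}).set_index('dim_value')

ax = demand.plot(kind='line', marker='o', legend=False)
ax.set_xlabel('workload (data sources)')
ax.set_ylabel('resource demand (instances)')
ax.set_ylim(ymin=0)
plt.show()
```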
# Theodolite Analysis
This directory contains Jupyter notebooks for analyzing and visualizing
benchmark execution results. The following notebooks are provided:
* [scalability-graph.ipynb](scalability-graph.ipynb): Creates a scalability graph for a certain benchmark execution.
* [scalability-graph-final.ipynb](scalability-graph-final.ipynb): Combines the scalability graphs of multiple benchmark executions (e.g., for comparing different configurations).
* [lag-trend-graph.ipynb](lag-trend-graph.ipynb): Visualizes the evolution of the consumer lag over time along with the computed trend.
## Usage
For executing benchmarks and analyzing their results, a **Python 3.7**
installation is required (e.g., in a virtual environment). Our notebooks require some
Python libraries, which can be installed via:
```sh
pip install -r requirements.txt
```
We have tested these
notebooks with [Visual Studio Code](https://code.visualstudio.com/docs/python/jupyter-support);
however, any other Jupyter server should work as well.
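For example, a plain Jupyter server (installed with the requirements above) can be started from this directory with:

```sh
jupyter notebook
```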
%% Cell type:code id: tags:
```
import os
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import matplotlib
```
%% Cell type:code id: tags:
```
directory = '<path-to>/results'
#filename = 'exp1002_uc3_75000_1_totallag.csv'
filename = 'exp1002_uc3_50000_2_totallag.csv'
warmup_sec = 60  # measurements within the first 60 s are treated as warm-up
threshold = 2000  # maximum accepted lag trend slope (queued messages per second)
```
%% Cell type:code id: tags:
```
df = pd.read_csv(os.path.join(directory, filename))
input = df.iloc[::3]  # keep every third sample
input['sec_start'] = input.loc[0:, 'timestamp'] - input.iloc[0]['timestamp']  # seconds since start
regress = input.loc[input['sec_start'] >= warmup_sec]  # exclude the warm-up phase
X = regress.iloc[:, 4].values.reshape(-1, 1)  # .values converts the column into a NumPy array
Y = regress.iloc[:, 3].values.reshape(-1, 1)  # -1 lets NumPy infer the number of rows; one column
linear_regressor = LinearRegression()
linear_regressor.fit(X, Y)  # fit the lag trend
Y_pred = linear_regressor.predict(X)
```
%% Cell type:code id: tags:
```
print(linear_regressor.coef_)
```
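The printed coefficient is the lag trend slope in queued messages per second. As a quick check (a sketch, assuming the `threshold` from the configuration cell above), the run can be classified directly:

```
# Sketch: classify this run by comparing the fitted slope to the threshold.
trend_slope = linear_regressor.coef_[0][0]
verdict = 'suitable' if trend_slope < threshold else 'overloaded'
print(f"trend slope: {trend_slope:.2f} messages/s -> {verdict}")
```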
%% Cell type:code id: tags:
```
plt.style.use('ggplot')
plt.rcParams['axes.facecolor']='w'
plt.rcParams['axes.edgecolor']='555555'
#plt.rcParams['ytick.color']='black'
plt.rcParams['grid.color']='dddddd'
plt.rcParams['axes.spines.top']='false'
plt.rcParams['axes.spines.right']='false'
plt.rcParams['legend.frameon']='true'
plt.rcParams['legend.framealpha']='1'
plt.rcParams['legend.edgecolor']='1'
plt.rcParams['legend.borderpad']='1'
#filename = f"exp{exp_id}_{benchmark}_{dim_value}_{instances}"
t_warmup = input.loc[input['sec_start'] <= warmup_sec].iloc[:, 4].values
y_warmup = input.loc[input['sec_start'] <= warmup_sec].iloc[:, 3].values
plt.figure()
#plt.figure(figsize=(4, 3))
plt.plot(X, Y, c="#348ABD", label="observed")
#plt.plot(t_warmup, y_warmup)
plt.plot(X, Y_pred, c="#E24A33", label="trend") # color='red')
#348ABD, 7A68A6, A60628, 467821, CF4457, 188487, E24A33
plt.gca().yaxis.set_major_formatter(matplotlib.ticker.FuncFormatter(lambda x, pos: '%1.0fK' % (x * 1e-3)))
plt.ylabel('queued messages')
plt.xlabel('seconds since start')
plt.legend()
#ax.set_ylim(ymin=0)
#ax.set_xlim(xmin=0)
plt.savefig("plot.pdf", bbox_inches='tight')
```
%% Cell type:code id: tags:
```
```
%% Cell type:code id: tags:
```
```
......
jupyter==1.0.0
matplotlib==3.2.0
pandas==1.0.1
scikit-learn==0.22.2.post1
%% Cell type:code id: tags:
```
import os
import pandas as pd
from functools import reduce
import matplotlib.pyplot as plt
```
%% Cell type:code id: tags:
```
directory = '<path-to>/results-inst'
experiments = {
'exp1003': 'exp1003',
'exp1025': 'exp1025',
}
```
%% Cell type:code id: tags:
```
dataframes = [
    pd.read_csv(os.path.join(directory, f'{v}_min-suitable-instances.csv'))
      .set_index('dim_value')
      .rename(columns={'instances': k})
    for k, v in experiments.items()]
df = reduce(lambda df1, df2: df1.join(df2, how='outer'), dataframes)
df
```
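The outer join aligns all experiments on the workload dimension; combinations missing in one experiment become NaN. A small illustration with made-up frames:

```
# Illustration (made-up data): an outer join keeps all dim_values from both runs.
import pandas as pd
from functools import reduce

a = pd.DataFrame({'dim_value': [25000, 50000], 'exp1003': [1, 2]}).set_index('dim_value')
b = pd.DataFrame({'dim_value': [50000, 75000], 'exp1025': [2, 3]}).set_index('dim_value')
print(reduce(lambda left, right: left.join(right, how='outer'), [a, b]))
#            exp1003  exp1025
# dim_value
# 25000          1.0      NaN
# 50000          2.0      2.0
# 75000          NaN      3.0
```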
%% Cell type:code id: tags:
```
plt.style.use('ggplot')
plt.rcParams['axes.facecolor']='w'
plt.rcParams['axes.edgecolor']='555555'
#plt.rcParams['ytick.color']='black'
plt.rcParams['grid.color']='dddddd'
plt.rcParams['axes.spines.top']='false'
plt.rcParams['axes.spines.right']='false'
plt.rcParams['legend.frameon']='true'
plt.rcParams['legend.framealpha']='1'
plt.rcParams['legend.edgecolor']='1'
plt.rcParams['legend.borderpad']='1'
plt.figure()
ax = df.plot(kind='line', marker='o')
#ax = df.plot(kind='line',x='dim_value', legend=False, use_index=True)
ax.set_ylabel('instances')
ax.set_xlabel('data sources')
ax.set_ylim(ymin=0)
#ax.set_xlim(xmin=0)
```
%% Cell type:code id: tags:
```
```
......
%% Cell type:code id: tags:
```
print("hello")
```
%% Cell type:code id: tags:
```
import os
import requests
from datetime import datetime, timedelta, timezone
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
```
%% Cell type:code id: tags:
```
os.getcwd()
```
%% Cell type:code id: tags:
```
exp_id = 2012
warmup_sec = 60  # measurements within the first 60 s are treated as warm-up
warmup_partitions_sec = 120
threshold = 2000  # maximum accepted lag trend slope (queued messages per second)
directory = '<path-to>/results'
directory_out = '<path-to>/results-inst'
```
%% Cell type:code id: tags:outputPrepend,outputPrepend
```
raw_runs = []
filenames = [filename for filename in os.listdir(directory) if filename.startswith(f"exp{exp_id}") and filename.endswith("totallag.csv")]
for filename in filenames:
    run_params = filename[:-4].split("_")  # e.g., exp1002_uc3_50000_2_totallag.csv
    dim_value = run_params[2]
    instances = run_params[3]
    df = pd.read_csv(os.path.join(directory, filename))
    input = df
    input['sec_start'] = input.loc[0:, 'timestamp'] - input.iloc[0]['timestamp']  # seconds since start
    regress = input.loc[input['sec_start'] >= warmup_sec]  # exclude the warm-up phase
    X = regress.iloc[:, 2].values.reshape(-1, 1)
    Y = regress.iloc[:, 3].values.reshape(-1, 1)
    linear_regressor = LinearRegression()
    linear_regressor.fit(X, Y)  # fit the lag trend for this run
    Y_pred = linear_regressor.predict(X)
    trend_slope = linear_regressor.coef_[0][0]
    row = {'dim_value': int(dim_value), 'instances': int(instances), 'trend_slope': trend_slope}
    raw_runs.append(row)
lags = pd.DataFrame(raw_runs)
```
%% Cell type:code id: tags:
```
lags.head()
```
%% Cell type:code id: tags:
```
raw_partitions = []
filenames = [filename for filename in os.listdir(directory) if filename.startswith(f"exp{exp_id}") and filename.endswith("partitions.csv")]
for filename in filenames:
    run_params = filename[:-4].split("_")
    dim_value = run_params[2]
    instances = run_params[3]
    df = pd.read_csv(os.path.join(directory, filename))
    input = df
    input['sec_start'] = input.loc[0:, 'timestamp'] - input.iloc[0]['timestamp']  # seconds since start
    input = input.loc[input['sec_start'] >= warmup_sec]  # exclude the warm-up phase
    input = input.loc[input['topic'] == 'input']  # restrict to the input topic
    mean = input['value'].mean()
    row = {'dim_value': int(dim_value), 'instances': int(instances), 'partitions': mean}
    raw_partitions.append(row)
partitions = pd.DataFrame(raw_partitions)
```
%% Cell type:code id: tags:
```
raw_obs_instances = []
filenames = [filename for filename in os.listdir(directory) if filename.startswith(f"exp{exp_id}") and filename.endswith("instances.csv")]
for filename in filenames:
    run_params = filename[:-4].split("_")
    dim_value = run_params[2]
    instances = run_params[3]
    df = pd.read_csv(os.path.join(directory, filename))
    if df.empty:
        continue
    input = df
    input['sec_start'] = input.loc[0:, 'timestamp'] - input.iloc[0]['timestamp']  # seconds since start
    input = input.loc[input['sec_start'] >= warmup_sec]  # exclude the warm-up phase
    mean = input['value'].mean()  # mean number of observed instances after warm-up
    row = {'dim_value': int(dim_value), 'instances': int(instances), 'obs_instances': mean}
    raw_obs_instances.append(row)
obs_instances = pd.DataFrame(raw_obs_instances)
obs_instances.head()
```
%% Cell type:code id: tags:
```
runs = lags
#runs = lags.join(partitions.set_index(['dim_value', 'instances']), on=['dim_value', 'instances'])#.join(obs_instances.set_index(['dim_value', 'instances']), on=['dim_value', 'instances'])
#runs["failed"] = runs.apply(lambda row: (abs(row['instances'] - row['obs_instances']) / row['instances']) > 0.1, axis=1)
#runs.loc[runs['failed']==True]
```
%% Cell type:code id: tags:
```
# A run is considered suitable if its lag trend slope stays below the threshold.
runs["suitable"] = runs.apply(lambda row: row['trend_slope'] < threshold, axis=1)
runs.columns = runs.columns.str.strip()
runs.sort_values(by=["dim_value", "instances"])
```
%% Cell type:code id: tags:
```
filtered = runs[runs['suitable']]  # keep only suitable runs
grouped = filtered.groupby(['dim_value'])['instances'].min()
min_suitable_instances = grouped.to_frame().reset_index()
min_suitable_instances
```
%% Cell type:code id: tags:
```
min_suitable_instances.to_csv(os.path.join(directory_out, f'exp{exp_id}_min-suitable-instances.csv'), index=False)
```
%% Cell type:code id: tags:
```
min_suitable_instances.plot(kind='line',x='dim_value',y='instances')
# min_suitable_instances.plot(kind='line',x='dim_value',y='instances', logy=True)
plt.show()
```
%% Cell type:code id: tags:
```
```
......
cleanup.add_default_serial_version_id=true
cleanup.add_generated_serial_version_id=false
cleanup.add_missing_annotations=true
cleanup.add_missing_deprecated_annotations=true
cleanup.add_missing_methods=false
cleanup.add_missing_nls_tags=false
cleanup.add_missing_override_annotations=true
cleanup.add_missing_override_annotations_interface_methods=true
cleanup.add_serial_version_id=false
cleanup.always_use_blocks=true
cleanup.always_use_parentheses_in_expressions=false
cleanup.always_use_this_for_non_static_field_access=true
cleanup.always_use_this_for_non_static_method_access=true
cleanup.convert_functional_interfaces=false
cleanup.convert_to_enhanced_for_loop=true
cleanup.correct_indentation=true
cleanup.format_source_code=true
cleanup.format_source_code_changes_only=false
cleanup.insert_inferred_type_arguments=false
cleanup.make_local_variable_final=true
cleanup.make_parameters_final=true
cleanup.make_private_fields_final=true
cleanup.make_type_abstract_if_missing_method=false
cleanup.make_variable_declarations_final=true
cleanup.never_use_blocks=false
cleanup.never_use_parentheses_in_expressions=true
cleanup.organize_imports=true
cleanup.qualify_static_field_accesses_with_declaring_class=false
cleanup.qualify_static_member_accesses_through_instances_with_declaring_class=true
cleanup.qualify_static_member_accesses_through_subtypes_with_declaring_class=true
cleanup.qualify_static_member_accesses_with_declaring_class=true
cleanup.qualify_static_method_accesses_with_declaring_class=false
cleanup.remove_private_constructors=true
cleanup.remove_redundant_modifiers=false
cleanup.remove_redundant_semicolons=false
cleanup.remove_redundant_type_arguments=true
cleanup.remove_trailing_whitespaces=true
cleanup.remove_trailing_whitespaces_all=true
cleanup.remove_trailing_whitespaces_ignore_empty=false
cleanup.remove_unnecessary_casts=true
cleanup.remove_unnecessary_nls_tags=true
cleanup.remove_unused_imports=true
cleanup.remove_unused_local_variables=false
cleanup.remove_unused_private_fields=true
cleanup.remove_unused_private_members=false
cleanup.remove_unused_private_methods=true
cleanup.remove_unused_private_types=true
cleanup.sort_members=false
cleanup.sort_members_all=false
cleanup.use_anonymous_class_creation=false
cleanup.use_blocks=true
cleanup.use_blocks_only_for_return_and_throw=false
cleanup.use_lambda=true
cleanup.use_parentheses_in_expressions=true
cleanup.use_this_for_non_static_field_access=true
cleanup.use_this_for_non_static_field_access_only_if_necessary=false
cleanup.use_this_for_non_static_method_access=true
cleanup.use_this_for_non_static_method_access_only_if_necessary=false
cleanup_profile=_CAU-SE-Style
cleanup_settings_version=2
eclipse.preferences.version=1
editor_save_participant_org.eclipse.jdt.ui.postsavelistener.cleanup=true
formatter_profile=_CAU-SE-Style
formatter_settings_version=15
org.eclipse.jdt.ui.ignorelowercasenames=true
org.eclipse.jdt.ui.importorder=;
org.eclipse.jdt.ui.ondemandthreshold=99
org.eclipse.jdt.ui.staticondemandthreshold=99
sp_cleanup.add_default_serial_version_id=true
sp_cleanup.add_generated_serial_version_id=false
sp_cleanup.add_missing_annotations=true
sp_cleanup.add_missing_deprecated_annotations=true
sp_cleanup.add_missing_methods=false
sp_cleanup.add_missing_nls_tags=false
sp_cleanup.add_missing_override_annotations=true
sp_cleanup.add_missing_override_annotations_interface_methods=true
sp_cleanup.add_serial_version_id=false
sp_cleanup.always_use_blocks=true
sp_cleanup.always_use_parentheses_in_expressions=false
sp_cleanup.always_use_this_for_non_static_field_access=true
sp_cleanup.always_use_this_for_non_static_method_access=true
sp_cleanup.convert_functional_interfaces=false
sp_cleanup.convert_to_enhanced_for_loop=true
sp_cleanup.correct_indentation=true
sp_cleanup.format_source_code=true
sp_cleanup.format_source_code_changes_only=false
sp_cleanup.insert_inferred_type_arguments=false
sp_cleanup.make_local_variable_final=true
sp_cleanup.make_parameters_final=true
sp_cleanup.make_private_fields_final=true
sp_cleanup.make_type_abstract_if_missing_method=false
sp_cleanup.make_variable_declarations_final=true
sp_cleanup.never_use_blocks=false
sp_cleanup.never_use_parentheses_in_expressions=true
sp_cleanup.on_save_use_additional_actions=true
sp_cleanup.organize_imports=true
sp_cleanup.qualify_static_field_accesses_with_declaring_class=false
sp_cleanup.qualify_static_member_accesses_through_instances_with_declaring_class=true
sp_cleanup.qualify_static_member_accesses_through_subtypes_with_declaring_class=true
sp_cleanup.qualify_static_member_accesses_with_declaring_class=true
sp_cleanup.qualify_static_method_accesses_with_declaring_class=false
sp_cleanup.remove_private_constructors=true
sp_cleanup.remove_redundant_modifiers=false
sp_cleanup.remove_redundant_semicolons=false
sp_cleanup.remove_redundant_type_arguments=true
sp_cleanup.remove_trailing_whitespaces=true
sp_cleanup.remove_trailing_whitespaces_all=true
sp_cleanup.remove_trailing_whitespaces_ignore_empty=false
sp_cleanup.remove_unnecessary_casts=true
sp_cleanup.remove_unnecessary_nls_tags=true
sp_cleanup.remove_unused_imports=true
sp_cleanup.remove_unused_local_variables=false
sp_cleanup.remove_unused_private_fields=true
sp_cleanup.remove_unused_private_members=false
sp_cleanup.remove_unused_private_methods=true
sp_cleanup.remove_unused_private_types=true
sp_cleanup.sort_members=false
sp_cleanup.sort_members_all=false
sp_cleanup.use_anonymous_class_creation=false
sp_cleanup.use_blocks=true
sp_cleanup.use_blocks_only_for_return_and_throw=false
sp_cleanup.use_lambda=true
sp_cleanup.use_parentheses_in_expressions=true
sp_cleanup.use_this_for_non_static_field_access=true
sp_cleanup.use_this_for_non_static_field_access_only_if_necessary=false
sp_cleanup.use_this_for_non_static_method_access=true
sp_cleanup.use_this_for_non_static_method_access_only_if_necessary=false
configFilePath=../config/checkstyle.xml
customModulesJarPaths=
eclipse.preferences.version=1
enabled=true
customRulesJars=
eclipse.preferences.version=1
enabled=true
ruleSetFilePath=../config/pmd.xml
package spesb.uc1.streamprocessing;
package theodolite.commons.kafkastreams;
import java.util.Objects;
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import titan.ccp.common.kafka.streams.PropertiesBuilder;
/**
* Builder for the Kafka Streams configuration.
*/
public class KafkaStreamsBuilder {
private static final String APPLICATION_NAME = "titan-ccp-history";
private static final String APPLICATION_VERSION = "0.0.1";
// private static final Logger LOGGER = LoggerFactory.getLogger(KafkaStreamsBuilder.class);
public abstract class KafkaStreamsBuilder {
// Kafka Streams application-specific configuration
private String applicationName; // NOPMD
private String applicationVersion; // NOPMD
private String bootstrapServers; // NOPMD
private String inputTopic; // NOPMD
private int numThreads = -1; // NOPMD
private int commitIntervalMs = -1; // NOPMD
private int cacheMaxBytesBuff = -1; // NOPMD
public KafkaStreamsBuilder inputTopic(final String inputTopic) {
this.inputTopic = inputTopic;
/**
* Sets the application name for the {@code KafkaStreams} application. It is used to create the
* application ID.
*
* @param applicationName Name of the application.
* @return this builder instance, for method chaining.
*/
public KafkaStreamsBuilder applicationName(final String applicationName) {
this.applicationName = applicationName;
return this;
}
/**
* Sets the application version for the {@code KafkaStreams} application. It is used to create the
* application ID.
*
* @param applicationVersion Version of the application.
* @return this builder instance, for method chaining.
*/
public KafkaStreamsBuilder applicationVersion(final String applicationVersion) {
this.applicationVersion = applicationVersion;
return this;
}
/**
* Sets the bootstrap servers for the {@code KafkaStreams} application.
*
* @param bootstrapServers String for a bootstrap server.
* @return this builder instance, for method chaining.
*/
public KafkaStreamsBuilder bootstrapServers(final String bootstrapServers) {
this.bootstrapServers = bootstrapServers;
return this;
@@ -35,6 +58,9 @@ public class KafkaStreamsBuilder {
/**
* Sets the Kafka Streams property for the number of threads (num.stream.threads). Can be minus
* one for using the default.
*
* @param numThreads Number of threads. -1 for using the default.
* @return this builder instance, for method chaining.
*/
public KafkaStreamsBuilder numThreads(final int numThreads) {
if (numThreads < -1 || numThreads == 0) {
@@ -48,6 +74,10 @@ public class KafkaStreamsBuilder {
* Sets the Kafka Streams property for the frequency with which to save the position (offsets in
* source topics) of tasks (commit.interval.ms). Must be zero for processing all records, for
* example, when processing records in bulk. Can be minus one for using the default.
*
* @param commitIntervalMs Frequency with which to save the position of tasks. In ms, -1 for using
*        the default.
* @return this builder instance, for method chaining.
*/
public KafkaStreamsBuilder commitIntervalMs(final int commitIntervalMs) {
if (commitIntervalMs < -1) {
@@ -61,6 +91,10 @@ public class KafkaStreamsBuilder {
* Sets the Kafka Streams property for maximum number of memory bytes to be used for record caches
* across all threads (cache.max.bytes.buffering). Must be zero for processing all records, for
* example, when processing records in bulk. Can be minus one for using the default.
*
* @param cacheMaxBytesBuffering Number of memory bytes to be used for record caches across all
*        threads. -1 for using the default.
* @return this builder instance, for method chaining.
*/
public KafkaStreamsBuilder cacheMaxBytesBuffering(final int cacheMaxBytesBuffering) {
if (cacheMaxBytesBuffering < -1) {
@@ -71,22 +105,38 @@
}
/**
* Builds the {@link KafkaStreams} instance.
* Method to implement a {@link Topology} for a {@code KafkaStreams} application.
*
* @return A {@code Topology} for a {@code KafkaStreams} application.
*/
public KafkaStreams build() {
Objects.requireNonNull(this.inputTopic, "Input topic has not been set.");
// TODO log parameters
final TopologyBuilder topologyBuilder = new TopologyBuilder(
this.inputTopic);
final Properties properties = PropertiesBuilder
protected abstract Topology buildTopology();
/**
* Builds the {@link Properties} for a {@code KafkaStreams} application.
*
* @return A {@code Properties} object.
*/
protected Properties buildProperties() {
return PropertiesBuilder
.bootstrapServers(this.bootstrapServers)
.applicationId(APPLICATION_NAME + '-' + APPLICATION_VERSION) // TODO as parameter
.applicationId(this.applicationName + '-' + this.applicationVersion)
.set(StreamsConfig.NUM_STREAM_THREADS_CONFIG, this.numThreads, p -> p > 0)
.set(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, this.commitIntervalMs, p -> p >= 0)
.set(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, this.cacheMaxBytesBuff, p -> p >= 0)
// .set(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, "DEBUG")
.build();
return new KafkaStreams(topologyBuilder.build(), properties);
}
/**
* Builds the {@link KafkaStreams} instance.
*/
public KafkaStreams build() {
// Check for required attributes for building properties.
Objects.requireNonNull(this.bootstrapServers, "Bootstrap server has not been set.");
Objects.requireNonNull(this.applicationName, "Application name has not been set.");
Objects.requireNonNull(this.applicationVersion, "Application version has not been set.");
// Create the Kafka streams instance.
return new KafkaStreams(this.buildTopology(), this.buildProperties());
}
}
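For illustration, a concrete builder only has to contribute a topology; the class and topic names below are hypothetical and not part of this commit:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;

// Hypothetical subclass (illustration only): concrete builders supply the
// topology, while the abstract base class validates and assembles the config.
public class ExampleKafkaStreamsBuilder extends KafkaStreamsBuilder {

  @Override
  protected Topology buildTopology() {
    final StreamsBuilder builder = new StreamsBuilder();
    // Pass records from an assumed input topic through unchanged.
    builder.stream("example-input").to("example-output");
    return builder.build();
  }
}
```

Calling `build()` on such a subclass then validates the required attributes and returns a ready-to-start `KafkaStreams` instance.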
@@ -12,9 +12,8 @@ buildscript {
// Variables used to distinguish different subprojects
def useCaseProjects = subprojects.findAll {it -> it.name.matches('uc(.)*')}
def useCaseApplications = subprojects.findAll {it -> it.name.matches('uc[0-9]+-application')}
def useCaseGenerators = subprojects.findAll {it -> it.name.matches('uc[0-9]+-workload-generator*')}
def commonProjects = subprojects.findAll {it -> it.name.matches('(.)*common(s?)(.)*')}
def commonProjects = subprojects.findAll {it -> it.name.matches('(.)*commons(.)*')}
// Plugins
allprojects {
@@ -58,8 +57,8 @@ allprojects {
}
}
// Dependencies for all use case applications
configure(useCaseApplications) {
// Dependencies for all use cases
configure(useCaseProjects) {
dependencies {
// These dependencies are used internally, and not exposed to consumers on their own compile classpath.
implementation('org.industrial-devops:titan-ccp-common:0.0.3-SNAPSHOT') { changing = true } // This branch still depends on an old version of titan-ccp-common
@@ -84,11 +83,7 @@ configure(useCaseGenerators) {
implementation 'com.google.guava:guava:24.1-jre'
implementation 'org.jctools:jctools-core:2.1.1'
implementation 'org.slf4j:slf4j-simple:1.6.1'
implementation project(':workload-generator-common')
// maintain build of generators
implementation 'net.kieker-monitoring:kieker:1.14-SNAPSHOT'
implementation('org.industrial-devops:titan-ccp-common:0.0.3-SNAPSHOT') { changing = true }
implementation project(':application-kafkastreams-commons')
// Use JUnit test framework
testImplementation 'junit:junit:4.12'
@@ -100,11 +95,10 @@ configure(commonProjects) {
dependencies {
// These dependencies are exported to consumers, that is to say, found on their compile classpath.
api 'org.apache.kafka:kafka-clients:2.4.0'
// These dependencies are used internally, and not exposed to consumers on their own compile classpath.
implementation 'org.slf4j:slf4j-simple:1.6.1'
implementation('org.industrial-devops:titan-ccp-common:0.0.3-SNAPSHOT') { changing = true } // This branch still depends on an old version of titan-ccp-common
implementation('org.industrial-devops:titan-ccp-common-kafka:0.1.0-SNAPSHOT') { changing = true }
implementation('org.industrial-devops:titan-ccp-common:0.0.3-SNAPSHOT') { changing = true }
// Use JUnit test framework
testImplementation 'junit:junit:4.12'
......
exp*
exp_counter.txt
# Theodolite Execution Framework
This directory contains the Theodolite framework for executing scalability
benchmarks in a Kubernetes cluster. As Theodolite aims to execute benchmarks
in realistic execution environments, some third-party components are [required](#requirements).
After everything is installed and configured, you can move on to the [execution of
benchmarks](#execution).
## Requirements

### Kubernetes Cluster
For executing benchmarks, access to a Kubernetes cluster is required. We suggest
creating a dedicated namespace for executing our benchmarks. The following
services need to be available as well.
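For example, a namespace can be created and set as the default for the current context as follows (the name `theodolite` is just an example):

```sh
# Create a dedicated namespace and use it by default (name is an assumption).
kubectl create namespace theodolite
kubectl config set-context --current --namespace=theodolite
```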
#### Prometheus
We suggest using the [Prometheus Operator](https://github.com/coreos/prometheus-operator)
and creating a dedicated Prometheus instance for these benchmarks.
@@ -34,7 +41,7 @@ depending on your cluster's security policies.
For the individual benchmarking components to be monitored, [ServiceMonitors](https://github.com/coreos/prometheus-operator#customresourcedefinitions)
are used. See the corresponding sections below for how to install them.
#### Grafana
As with Prometheus, we suggest creating a dedicated Grafana instance. Grafana
with our default configuration can be installed with Helm:
@@ -60,7 +67,7 @@ Create the Configmap for the data source:
```sh
kubectl apply -f infrastructure/grafana/prometheus-datasource-config-map.yaml
```
#### A Kafka cluster
One possible way to set up a Kafka cluster is via [Confluent's Helm Charts](https://github.com/confluentinc/cp-helm-charts).
For using these Helm charts in conjunction with the Prometheus Operator (see
@@ -68,7 +75,7 @@ below), we provide a [patch](https://github.com/SoerenHenning/cp-helm-charts)
for these helm charts. Note that this patch is only required for observation and
not for the actual benchmark execution and evaluation.
##### Our patched Confluent Helm Charts
To use our patched Confluent Helm Charts, clone the
[charts' repository](https://github.com/SoerenHenning/cp-helm-charts). We also
@@ -86,42 +93,42 @@ To let Prometheus scrape Kafka metrics, deploy a ServiceMonitor:
```sh
kubectl apply -f infrastructure/kafka/service-monitor.yaml
```
##### Other options for Kafka
Other Kafka deployments, for example, using Strimzi, should work in a similar way.
#### A Kafka Client Pod

A permanently running pod used for Kafka configuration is started via:

```sh
kubectl apply -f infrastructure/kafka/kafka-client.yaml
```

#### The Kafka Lag Exporter

[Lightbend's Kafka Lag Exporter](https://github.com/lightbend/kafka-lag-exporter)
can be installed via Helm. We also provide a [default configuration](infrastructure/kafka-lag-exporter/values.yaml).
To install it:

```sh
helm install kafka-lag-exporter https://github.com/lightbend/kafka-lag-exporter/releases/download/v0.6.0/kafka-lag-exporter-0.6.0.tgz -f infrastructure/kafka-lag-exporter/values.yaml
```

To let Prometheus scrape Kafka lag metrics, deploy a ServiceMonitor:

```sh
kubectl apply -f infrastructure/kafka-lag-exporter/service-monitor.yaml
```
### Python 3.7

For executing benchmarks, a **Python 3.7** installation is required. We suggest
using a virtual environment placed in the `.venv` directory (in the Theodolite
root directory). A set of requirements is needed for the analysis Jupyter
notebooks and the execution tool. You can install them with the following
command (make sure to be in your virtual environment if you use one):
```sh
pip install -r requirements.txt
```
### Required Manual Adjustments
Depending on your setup, some additional adjustments may be necessary:
......@@ -133,7 +140,7 @@ Depending on your setup, some additional adjustments may be necessary:
## Execution
The `run_loop.sh` script is the entry point for all benchmark executions. It has to be called as follows:
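A typical invocation looks as follows; the parameter meanings are inferred from the example scripts below and are assumptions, not a documented interface:

```sh
# Assumed positional parameters (inferred from the invocations below):
#   <use case id> "<workload dim values>" "<instance counts>" <partitions> \
#   [<cpu limit> <memory limit> <commit interval ms> <execution minutes>]
./run_loop.sh 1 "25000 50000" "1 2 3" 40 1000m 4Gi 100 5
```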
......
#!/bin/bash
./run_loop.sh 1 "25000 50000 75000 100000 125000 150000" "1 2 3 4 5" 40 # 6*5 runs, ~3 hours
sleep 5m
./run_loop.sh 2 "6 7 8 9" "1 2 3 4 6 8 10 12 14 16 18 20" 40 # 4*12 runs, ~5 hours
sleep 5m
./run_loop.sh 3 "25000 50000 75000 100000 125000 150000" "1 2 3 4 5 6" 40 # 6*6 runs, ~3.5 hours
sleep 5m
./run_loop.sh 4 "25000 50000 75000 100000 125000 150000" "1 2 4 6 8 10 12 14 16 18 20 30 40 50 60 70 80 90" 40 # 6*18 runs, ~11 hours
sleep 5m
./run_loop.sh 1 "25000 50000 75000 100000 125000 150000" "1 2 3 4 5" 400 # 6*5 runs, ~3 hours
sleep 5m
./run_loop.sh 2 "6 7 8 9" "1 2 3 4 6 8 10 12 14 16 18 20" 400 # 4*12 runs, ~5 hours
sleep 5m
./run_loop.sh 3 "25000 50000 75000 100000 125000 150000" "1 2 3 4 5 6" 400 # 6*6 runs, ~3.5 hours
sleep 5m
./run_loop.sh 4 "25000 50000 75000 100000 125000 150000" "1 2 4 6 8 10 12 14 16 18 20 30 40 50 60 70 80 90" 400 # 6*18 runs, ~11 hours
sleep 5m
./run_loop.sh 4 "150000" "100 110 120 130 140 150 160 170 180 190 200" 400 # 1*11 runs
sleep 5m
# For commit interval evaluation
./run_loop.sh 4 "5000 10000 15000 20000 25000 30000" "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15" 160
#!/bin/bash
#./run_loop.sh 1 "50000 100000 150000 200000 250000 300000" "1 2 3 4 5" 40 # ~3 hours
./run_loop.sh 1 "200000 250000 300000" "1 2 3 4 5" 40 1000m 4Gi 100 5 # ~1.5 hours
sleep 1m
#./run_loop.sh 1 "50000 100000 150000 200000 250000 300000" "1 2 3 4 5" 400 # ~3 hours
./run_loop.sh 1 "200000 250000 300000" "1 2 3 4 5" 400 1000m 4Gi 100 5 # ~1.5 hours
sleep 1m
#./run_loop.sh 3 "50000 100000 150000 200000 250000 300000" "1 2 3 4 5 6 7 8 9 10" 40 # ~6 hours
./run_loop.sh 3 "200000 250000 300000" "1 2 3 4 5 6 7 8 9 10" 40 1000m 4Gi 100 5 # ~3 hours
sleep 1m
#./run_loop.sh 3 "50000 100000 150000 200000 250000 300000" "1 2 3 4 5 6 7 8 9 10" 400 # ~6 hours
./run_loop.sh 3 "200000 250000 300000" "1 2 3 4 5 6 7 8 9 10" 400 1000m 4Gi 100 5 # ~3 hours
sleep 1m
./run_loop.sh 1 "50000 100000 150000 200000 250000 300000" "1 2 3 4 5" 40 500m 2Gi 100 5 # ~3 hours
1