Generic Graph Aggregator

Motivation

Graph aggregation is a common task in Kieker. It applies to trees, traces, and directed graphs, which are used to represent call traces, call trees, and call graphs, but also to model user and service interactions. In all cases, similar graphs need to be matched and their differences aggregated. Presently, a specialized aggregator must be implemented for each kind of graph. This is cumbersome, as most of that code is identical. Therefore, we propose a generic aggregation filter that covers this task.

Solution Idea

The filter should have an input port for graphs and an output port for aggregated graphs. The filter is stateful. Because aggregating all incoming information indefinitely may be undesirable, the filter must accept an additional input that defines a lower bound (a timestamp). All information older than this timestamp can be forgotten.

The internal data model of this filter is an aggregated graph comprising nodes and edges with attached time series values.
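How the lower bound could act on the stored values is sketched below in Java. This is only an illustration, not part of the proposed metamodel: the class name BoundedTimeSeries, its methods, and the assumptions that timestamps are long values arriving in non-decreasing order are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical time series that forgets values older than a lower bound. */
class BoundedTimeSeries<T> {

    /** One observed value together with the timestamp at which it was recorded. */
    static final class Entry<T> {
        final long timestamp;
        final T value;

        Entry(final long timestamp, final T value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Oldest entry at the head, newest at the tail.
    private final Deque<Entry<T>> entries = new ArrayDeque<>();

    /** Appends a value. Assumption: timestamps arrive in non-decreasing order. */
    void append(final long timestamp, final T value) {
        entries.addLast(new Entry<>(timestamp, value));
    }

    /** Removes all entries strictly older than lowerBound and returns how many were removed. */
    int cut(final long lowerBound) {
        int removed = 0;
        while (!entries.isEmpty() && entries.peekFirst().timestamp < lowerBound) {
            entries.removeFirst();
            removed++;
        }
        return removed;
    }

    /** Number of values currently retained. */
    int size() {
        return entries.size();
    }
}
```

Because entries are kept in arrival order, cutting only inspects the oldest entries and stops at the first one that is still within the bound. If timestamps can arrive out of order, a different structure (for example a sorted map) would be needed.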

Note: an overly generic metamodel for the internal data model (and the result) should be avoided, as it unnecessarily complicates the implementation and may result in a slow filter. Therefore, we recommend the following metamodel (xcore):

package graphmodel.model

interface NamedElement {
	String name
}

interface Module extends NamedElement {
	refers Node[] nodes
}

interface TimeSeries<T> extends NamedElement {
	contains T[] values
}

class Graph extends NamedElement {
	contains Node[] nodes
	contains Edge[] edges
}

class Node extends NamedElement {
	refers derived Edge[] outgoing // derived from 'from' connection of edge
	refers derived Edge[] incoming // derived from 'to' connection of edge
}

class Edge extends NamedElement {
	refers Node[] from
	refers Node[] to
}

The abstract implementation of the aggregator will have at least one abstract method to match incoming graphs against the existing aggregated graphs, and generic methods to append values to and cut values from the time series.
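A minimal Java sketch of such an abstract aggregator follows. It is an illustration under stated assumptions, not the Kieker implementation: the names AbstractGraphAggregator, Aggregate, matches, aggregate, and cut are hypothetical, the time series is reduced to a list of timestamps, and dropping aggregates that have no values left after a cut is one possible policy among several.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstract aggregator; G is the type of the incoming graphs. */
abstract class AbstractGraphAggregator<G> {

    /** An aggregated graph: a representative graph plus the timestamps of all matched inputs. */
    static final class Aggregate<G> {
        final G representative;
        final List<Long> timestamps = new ArrayList<>();

        Aggregate(final G representative) {
            this.representative = representative;
        }
    }

    private final List<Aggregate<G>> aggregates = new ArrayList<>();

    /** Graph-specific part: decides whether an incoming graph belongs to an existing aggregate. */
    protected abstract boolean matches(G existing, G incoming);

    /** Generic part: appends to the first matching aggregate or creates a new one. */
    final void aggregate(final G incoming, final long timestamp) {
        for (final Aggregate<G> candidate : aggregates) {
            if (matches(candidate.representative, incoming)) {
                candidate.timestamps.add(timestamp);
                return;
            }
        }
        final Aggregate<G> created = new Aggregate<>(incoming);
        created.timestamps.add(timestamp);
        aggregates.add(created);
    }

    /** Generic part: forgets all values older than lowerBound (assumed policy: drop empty aggregates). */
    final void cut(final long lowerBound) {
        for (final Aggregate<G> aggregate : aggregates) {
            aggregate.timestamps.removeIf(t -> t < lowerBound);
        }
        aggregates.removeIf(a -> a.timestamps.isEmpty());
    }

    /** Number of aggregated graphs currently held. */
    final int aggregateCount() {
        return aggregates.size();
    }
}

/** Toy subclass for illustration: graphs are plain names and match by equality. */
class NameMatchingAggregator extends AbstractGraphAggregator<String> {
    @Override
    protected boolean matches(final String existing, final String incoming) {
        return existing.equals(incoming);
    }
}
```

Only the matching logic differs between graph kinds in this sketch; appending and cutting are shared. A real subclass would compare nodes and edges instead of names, and a linear scan over all aggregates would likely be replaced by an index once the number of aggregates grows.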

Open Issues

It might not be the best idea to output the aggregated graph after every aggregation step. Instead, output could be triggered by a separate input, controlled by a configuration value, or defined by several constraints.
For most purposes, the output aggregated graph should only be read by subsequent filters, but it might be necessary to duplicate the information in case a subsequent filter wants to modify the data. This duplication could be realized by a separate filter. However, it must first be determined whether such modifications are relevant.