Regarding --fasterCopy, I think it's okay to enable it. You can find an evaluation of this configuration in Jan's thesis. You can also find some information on the metrics there. However, it matters whether these are Beam metrics, Flink/Samza metrics, or both.
Whether we have to set the parallelism is still open, right?
Why is the Flink Master address not needed in docker-compose? How do the taskmanagers coordinate?
Regarding the state backend in Flink: We should use the same one as for the native Flink benchmarks. However, we should support both backends and document this somewhere.
For the --maxSourceParallelism argument, we should simply set a high value because of:
I added --fasterCopy and disabled the metrics. They are disabled for both runners.
In Flink, whether we want to set the parallelism depends on the use case (testing or benchmarking). It can be set with parallelism.default: 1 in the docker-compose file.
The Flink Master address is set in the docker-compose file with jobmanager.rpc.address: benchmark-jobmanager. So far I have added it only to uc4-beam-flink, as an example. I think we should discuss the docker-compose file.
--maxSourceParallelism will be passed as --maxSourceParallelism=$maxSourceParallelism.
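For reference, the two Flink settings mentioned above could be wired into a docker-compose file roughly as follows. This is only a sketch: it assumes the official Flink Docker image and its FLINK_PROPERTIES environment variable, and the image tag and the taskmanager service name are placeholders that do not come from this thread. Only jobmanager.rpc.address: benchmark-jobmanager and parallelism.default: 1 are taken from the discussion.

```yaml
# Sketch under assumptions: official Flink image, hypothetical taskmanager service name.
services:
  benchmark-jobmanager:
    image: flink            # assumed image; pin a concrete version in practice
    command: jobmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: benchmark-jobmanager
        parallelism.default: 1
  benchmark-taskmanager:    # hypothetical service name
    image: flink            # assumed image
    command: taskmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: benchmark-jobmanager
        parallelism.default: 1
```

With this layout the taskmanagers locate the jobmanager through jobmanager.rpc.address, which resolves to the jobmanager's service name on the compose network.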
Lorenz Boguhn marked the checklist item Clear usage of samza configuration. Is this needed with multiple instances for the shared zookeeper state? as completed
Lorenz Boguhn marked the checklist item Clear usage of flink pipeline option --parallelism=$PARALLELISM as completed
Lorenz Boguhn marked the checklist item Clear usage of flink pipeline option --flinkMaster=flink-jobmanager:8081 as completed
Lorenz Boguhn marked the checklist item Clear usage of samza pipeline option --maxSourceParallelism=$MAX_SOURCE_PARALLELISM as completed
Lorenz Boguhn marked the checklist item Check when a TimestampPolicy is needed. as completed
Lorenz Boguhn marked the checklist item ConfigurationKeys not optimal as completed
The usage of Uc2ApplicationBeamNoFeedback.java is neither documented nor part of Jan's thesis. It is an old artefact of the development process, so we can delete it.
Lorenz Boguhn marked the checklist item Clear usage of Uc2ApplicationBeamNoFeedback.java (old use case 2 / new use case 4). as completed