Sure. What I have is a Spring Boot 2.2.7 app with an embedded Camunda 7.13 and our own library versions (according to the factory standards). Right now it’s just running locally.
I forgot to add more detail about the error I’m getting.
- What went wrong:
Execution failed for task ':compileJava'.
Could not resolve all files for configuration ':compileClasspath'.
Could not resolve commons-logging:commons-logging:1.2.
Required by:
project : > org.camunda.connect:camunda-connect-http-client:1.4.0 > org.apache.httpcomponents:httpclient:4.5.12
project : > com.github.StephenOTT:camunda-prometheus-process-engine-plugin:v1.8.0 > org.apache.httpcomponents:fluent-hc:4.5.12
Module 'commons-logging:commons-logging' has been rejected:
Cannot select module with conflict on capability 'logging:jcl-api-capability:0' also provided by [org.springframework:spring-jcl:5.2.6.RELEASE(compile)]
Could not resolve org.springframework:spring-jcl:5.2.6.RELEASE.
Required by:
project : > com.github.StephenOTT:camunda-prometheus-process-engine-plugin:v1.8.0 > org.springframework:spring-core:5.2.6.RELEASE
Module 'org.springframework:spring-jcl' has been rejected:
Cannot select module with conflict on capability 'logging:jcl-api-capability:0' also provided by [commons-logging:commons-logging:1.2(compile)]
Do you need more information?
Regards,
Diego
I guess so too.
I’ll investigate the error a little more. If I find a solution, can I open a PR against your repo?
Regards,
Diego
Sure. A quick test would be to update those deps and “hope” there are no breaking changes.
It wasn’t necessary.
I just had to add these lines to build.gradle:
configurations.all {
    // exclude the conflicting modules; they are pulled in transitively by other dependencies
    exclude module: 'httpclient'
    exclude module: 'commons-logging'
}
Those modules are pulled in by other dependencies; since spring-jcl already provides the JCL API capability, excluding commons-logging resolves the conflict.
Regards,
Diego
Hi @StephenOTT, after a couple of weeks I was able to adapt your implementation to Spring Boot 2.2.6 and Camunda 7.13.
If you’re interested, I could make a PR to your repo. But first, we should talk about the best way to organize the code according to your standards.
Cheers,
Diego
If you want to post it in the repo as a WIP (work in progress) PR, I would be interested to take a look.
Hello @StephenOTT
Thanks for building this plug-in. We have integrated it with our Camunda setup and it has helped us monitor the workflow metrics in a much better way. Are there any more features planned, such as alert rules and alert configurations?
@Sandeep_Yalamarthi can you give me some examples of features you are looking for?
Most alerts I had imagined would be Prometheus-specific configurations per implementation.
@StephenOTT We run an internal PaaS application with the process engine embedded in the application, on k8s in a single pod. We have observed that the pod/application crashes frequently due to a high number of asynchronous jobs (1,500,000) created in the background, which also means a high number of incidents caused by a faulty BPMN/workflow. To avoid this kind of self-inflicted DDoS, it would be better if we could have an alert when the number of background jobs crosses a certain threshold. The same applies to any metric that affects the application/engine health.
You should be able to set a threshold alert in Grafana; you should not need anything special. What is missing?
A new plugin has been developed that is a replacement:
The new plugin leverages Micrometer and Spring Boot Actuator.
You can use any of the supported Micrometer monitoring systems.
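For reference, the usual Micrometer/Actuator wiring in a Gradle-based Spring Boot project looks roughly like the sketch below (these are the standard Spring Boot/Micrometer coordinates, not taken from the new plugin’s docs — check its README for the exact setup):

// build.gradle: Actuator plus the Prometheus registry for Micrometer
implementation 'org.springframework.boot:spring-boot-starter-actuator'
implementation 'io.micrometer:micrometer-registry-prometheus'

// application.properties: expose the scrape endpoint
// management.endpoints.web.exposure.include=health,info,prometheus

Prometheus then scrapes the /actuator/prometheus endpoint that Actuator exposes.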
We are running Camunda 7.12 in production. A lot of our BPMNs are modelled with connector tasks making HTTP calls to other services. To monitor all these business metrics we tried to implement the earlier plugin,
StephenOTT/camunda-prometheus-process-engine-plugin, with the default scrape frequency of 5s. But it was causing high CPU utilisation on the DB server and we had to disable the plugin.
Will integrating the latest plugin solve our issue? Have there been any reports of performance issues when it is integrated into a heavily loaded application with a huge amount of data in the history database?
What are the ideal configurations to run this plugin?
If you’re running the queries every 5 seconds, then you are querying the database every 5 seconds. A high CPU load is expected if you are querying for large amounts of data.
Which specific metrics are you running?
Almost all of the metrics provided by the initial plugin. This is the list:
- camunda_metric_activity_instance_start
- camunda_metric_activity_instance_end
- camunda_metric_executed_decision_elements
- camunda_metric_job_successful
- camunda_metric_job_failed
- camunda_metric_job_acquisition_attempt
- camunda_metric_job_acquired_success
- camunda_metric_job_acquired_failure
- camunda_metric_job_execution_rejected
- camunda_metric_job_locked_exclusive
- camunda_process_definition_stats_instance_count
- camunda_message_event_subscription_count
- camunda_signal_event_subscription_count
- camunda_compensation_event_subscription_count
- camunda_conditional_event_subscription_count
- camunda_open_incidents_count
- camunda_resolved_incidents_count
- camunda_deleted_incidents_count
- camunda_active_process_instance_count
- camunda_active_user_tasks_count
- camunda_active_unassigned_user_tasks_count
- camunda_camunda_suspended_user_tasks_count
- camunda_active_timer_job_count
- camunda_suspended_timer_job_count
Although for a few of the metrics I have done some customisations to add an extra tenantId label. An example Groovy snippet for a customised counter metric is below.
import java.util.stream.Collectors

import org.camunda.bpm.engine.ProcessEngines
import org.camunda.bpm.engine.repository.ResourceDefinition

// Collect the tenant ids of all deployed process definitions once.
static {
    tenantsList = ProcessEngines.getDefaultProcessEngine()
            .getRepositoryService()
            .createProcessDefinitionQuery()
            .list()
            .stream()
            .map(ResourceDefinition::getTenantId)
            .collect(Collectors.toList());
}

// For each tenant, count its SIGNAL event subscriptions and report the value
// with the tenant id and engine name as labels.
// (processEngine, counter and engineName come from the surrounding context.)
tenantsList.forEach { tenantId ->
    long count = processEngine.getRuntimeService()
            .createEventSubscriptionQuery()
            .eventType("SIGNAL")
            .tenantIdIn(tenantId)
            .count();
    counter.setValue(count, Arrays.asList(tenantId, engineName));
}
I would recommend you disable 1, 2, and 11.
Always consider what level of detail you actually need visibility on. Each of those 24 items is a query being executed, some with 1+N scenarios such as #11, where it gets a list of definitions and then does further lookups for each one. This can be a lot of data to process, especially given you are executing every 5 seconds.
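To make the 1+N shape concrete, here is a hypothetical sketch (in the same Groovy style as the custom metric above, not the plugin’s actual source) of a per-definition instance count:

// 1 query: list every deployed process definition
def definitions = processEngine.getRepositoryService()
        .createProcessDefinitionQuery()
        .list()

// + N queries: one count per definition, repeated every scrape interval
definitions.each { definition ->
    long activeInstances = processEngine.getRuntimeService()
            .createProcessInstanceQuery()
            .processDefinitionId(definition.getId())
            .active()
            .count()
    // report activeInstances with the definition key as a label
}

With many definitions and a 5-second scrape interval, that quickly becomes a steady load on the database.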
Thanks for the suggestions @StephenOTT. For now we have increased the frequency to 15 minutes and disabled 1, 2, 11 and a few of the custom metrics, and things seem to have stabilised. I am also wondering if there is any other way of getting the full telemetry of the engine without the DB-scraping approach. Can Camunda push metrics to Prometheus collectors while creating/invoking the resources itself?
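The thread doesn’t answer this, but as a rough illustration of what event-driven (push-style) collection could look like — a sketch only, assuming Micrometer is on the classpath; the class name and wiring below are made up, and the listener would still need to be attached via the BPMN model or an engine plugin:

import io.micrometer.core.instrument.MeterRegistry
import org.camunda.bpm.engine.delegate.DelegateExecution
import org.camunda.bpm.engine.delegate.ExecutionListener

// Hypothetical listener: increments a counter each time it fires, so the metric
// is recorded at event time instead of by polling the database.
class ActivityMetricListener implements ExecutionListener {

    private final MeterRegistry registry

    ActivityMetricListener(MeterRegistry registry) {
        this.registry = registry
    }

    @Override
    void notify(DelegateExecution execution) {
        registry.counter('bpmn.activity.events',
                'processDefinitionId', execution.getProcessDefinitionId(),
                'activityId', execution.getCurrentActivityId())
                .increment()
    }
}

This avoids querying the history tables, but only covers events that pass through the listener; engine-internal metrics (job acquisition, incidents, etc.) would still have to come from the engine’s own metrics or queries.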