Reload data in Optimize / Elasticsearch

Hi there,

Optimize “crashed” last weekend because there was not enough disk space. For some reason, huge log files were produced, which took up all the space. On Monday we freed up space and started Optimize again. We didn’t know that Elasticsearch had automatically been switched to “read only”, so Optimize “was running” for about 2 days with Elasticsearch in a read-only state. As soon as we realized it, we set Elasticsearch back to its normal state. Since then, Optimize and Elasticsearch have been running, but we are missing the data of these 2 days.

Now we want to try to get this data back. Is it possible to retrieve it again? Can we set the import time back to Sunday, for example, somewhere in Optimize? Or can we delete the data since Monday in Elasticsearch so that Optimize reloads the whole week?

Best regards
Jean-Pierre

Hi @jpcam,

Thanks for reaching out! Just in case you’re not aware of it: if you’re a customer, feel free to open a support case. That ensures you’ll always receive a quick response, and you can share sensitive data without publishing it on the web.

Now we want to try to get this data back. Is it possible to retrieve it again? Can we set the import time back to Sunday, for example, somewhere in Optimize?

If Optimize can’t write data to Elasticsearch, it automatically retries (with a backoff) until writing is possible again. So once you’ve reset the Elasticsearch indexes to be writable, Optimize immediately starts persisting data again; you never need to worry about Optimize “jumping over” some of your data. Can you please validate that you have data for those two days? For instance, you could create a report with a static start date filter to check that.
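If you prefer to double-check directly in Elasticsearch rather than through a report, a count query over the instance start dates tells you the same thing. This is only a minimal sketch: the index name `optimize-process-instance`, the `startDate` field, and the dates are assumptions that vary between Optimize versions, so adjust them to your setup.

```python
# Minimal sketch: count the process instances Optimize stored for the two
# missing days. Assumptions: Elasticsearch on localhost:9200 and an index
# named "optimize-process-instance" with a "startDate" field -- both can
# differ between Optimize versions, so check `GET _cat/indices` first.
import json
import urllib.request

query = {
    "query": {
        "range": {
            "startDate": {
                "gte": "2020-06-01T00:00:00.000+0000",  # placeholder: Monday
                "lt": "2020-06-03T00:00:00.000+0000",   # placeholder: Wednesday
            }
        }
    }
}

req = urllib.request.Request(
    "http://localhost:9200/optimize-process-instance/_count",
    data=json.dumps(query).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["count"])  # 0 would confirm the two days are missing
```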

Or can we delete the data since Monday in Elasticsearch so that Optimize reloads the whole week?

You can also trigger a reimport of the engine data. However, it’s not easily possible to delete just part of the data.

Does that help?

Best
Johannes

Hi @JoHeinem

Thank you for your answer.

I created a report for these 2 days and there is no data. I looked in Camunda and there are thousands of instances (completed and running). I also realized that some data from the day before is missing, I think since Elasticsearch switched to read-only.

I think the reimport option would solve the problem. Do I get duplicates with this option, or is the data in Elasticsearch overwritten?
When I start Optimize, I start 3 services:

  1. camunda.service
  2. camunda-optimize-elastic.service
  3. camunda-optimize.service

Should I stop both Optimize services (2 & 3) or just the camunda-optimize.service?

Best
Jean-Pierre

Hey @jpcam,

I created a report for these 2 days and there is no data. I looked in Camunda and there are thousands of instances (completed and running). I also realized that some data from the day before is missing, I think since Elasticsearch switched to read-only.

That’s weird, and there’s probably another reason why the import is not continuing. To help you better, can you please:

  • attach the Optimize logs if possible?
  • tell me if you’ve unblocked the Optimize indexes? (One way to do that is sketched below.)
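For the unblocking part: when the disk flood-stage watermark is exceeded, Elasticsearch marks the affected indexes with a read-only block, and that block has to be cleared manually. A minimal sketch, assuming Elasticsearch on localhost:9200 and an `optimize-*` index prefix (the prefix can differ between versions):

```python
# Minimal sketch: clear the read-only block that Elasticsearch sets on
# indexes once the disk flood-stage watermark is hit. Assumptions:
# Elasticsearch on localhost:9200 and "optimize-*" matching your indexes.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:9200/optimize-*/_settings",
    data=json.dumps({"index.blocks.read_only_allow_delete": None}).encode(),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # {"acknowledged": true} on success
```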

Do I get duplicates with this option, or is the data in Elasticsearch overwritten?

If you use the reimport approach, the data gets deleted and everything is imported from scratch, so there is no duplication.
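For reference, here is roughly what such a reimport amounts to. This is a sketch under the assumption that the procedure for your version boils down to stopping Optimize, deleting its indexes, and restarting (please follow the reimport chapter of the docs for your exact version); the `optimize-*` prefix is also an assumption. Keep in mind that a reimport can only bring back what the engine still holds in its history.

```python
# Sketch of a full reimport: stop Optimize, delete its Elasticsearch
# indexes, start Optimize again so it imports everything from scratch.
# The "optimize-*" prefix is an assumption; check `GET _cat/indices`.
import urllib.request

# 1. Stop the camunda-optimize service before running this.
req = urllib.request.Request("http://localhost:9200/optimize-*", method="DELETE")
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # 200 means the Optimize indexes are gone
# 2. Start camunda-optimize again; it rebuilds its data from the engine.
```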

Should I stop both Optimize services (2 & 3) or just the camunda-optimize.service?

Yes, that’s correct.

Best
Johannes

Hi @JoHeinem

I can’t attach any logs; they were huge, almost 500 GB (rolling log problems), and as a quick fix we deleted them. The issue was probably a connection problem due to the “read only” state of Elasticsearch. Perhaps the current version is not affected by this problem; we are running an older version and are just migrating to 3.0.

Yes, Optimize is unblocked and has been collecting data since the switch back to the normal state. I read the chapter about reimporting data. Unfortunately, we can’t delete all the data in Elasticsearch; we need it going back several years. Otherwise it would be a nice feature. So it seems that we can’t get these 2–2.5 days back again.

Is this feature available in all versions?

Best
Jean-Pierre

Hey @jpcam,

I can’t attach any logs; they were huge, almost 500 GB (rolling log problems), and as a quick fix we deleted them.

It’s possible to keep that in check automatically. If you download a demo distribution of Optimize, you can have a look at the environment-logback.xml. A rolling log mechanism has already been configured there. You can define how often (by date) a new file is written and what the maximum size is. If I remember correctly, it also ensures that the total amount of log data is kept in check.
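For illustration, such a policy looks roughly like this in logback. This is a sketch, not the exact contents of environment-logback.xml, so treat the appender name, paths, and limits as placeholders:

```xml
<!-- Sketch of a logback rolling policy; the appender name, paths, and
     limits are placeholders, not the exact environment-logback.xml. -->
<appender name="OPTIMIZE_FILE"
          class="ch.qos.logback.core.rolling.RollingFileAppender">
  <file>logs/optimize.log</file>
  <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
    <!-- one file per day, split further when a file exceeds 100 MB -->
    <fileNamePattern>logs/optimize.%d{yyyy-MM-dd}.%i.log</fileNamePattern>
    <maxFileSize>100MB</maxFileSize>
    <!-- keep at most 30 days and 2 GB of logs in total -->
    <maxHistory>30</maxHistory>
    <totalSizeCap>2GB</totalSizeCap>
  </rollingPolicy>
  <encoder>
    <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
  </encoder>
</appender>
```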

Perhaps the current version is not affected by this problem; we are running an older version and are just migrating to 3.0.

Indeed, that could solve the problem. Optimize 3.0 disables the read-only setting in the Elasticsearch indexes by default.

Yes, Optimize is unblocked and has been collecting data since the switch back to the normal state. I read the chapter about reimporting data. Unfortunately, we can’t delete all the data in Elasticsearch; we need it going back several years. Otherwise it would be a nice feature. So it seems that we can’t get these 2–2.5 days back again.

I’m still confused that Optimize “lost” those two days. Optimize should always make sure that this doesn’t happen, even when it can’t write data to Elasticsearch.

Is this feature available in all versions?

Do you mean the reimporting feature? Yes, that’s available in all Optimize versions.

Best
Johannes

Hi @JoHeinem

Last night we got the same error we had last week.

```
ERROR o.c.o.s.e.i.EngineImportScheduler - Was not able to execute import of [ProcessDefinitionXmlEngineImportMediator]
java.lang.NullPointerException: null
```

As I already mentioned, we’re not running the latest version, but I wonder if this problem also happens in the current version. If so, what are the causes and the consequences? Would it be possible to get a message/email and stop the import, instead of Optimize carrying on and filling the log files?

Best
Jean-Pierre

Hi @jpcam,

Last night we got the same error we had last week.

Is it possible for you to provide the whole stack trace? Also, it would be great if you could tell me which version you’re using. Then I could see if the problem has already been solved.

If so, what are the causes and the consequences? Would it be possible to get a message/email and stop the import, instead of Optimize carrying on and filling the log files?

I’m not sure what the cause is. For that, I would need more information, e.g. the logs and the Optimize version. The consequence of the current behavior is that Optimize doesn’t continue to import. Right now, there’s no mechanism that notifies you when an error occurs; I’m sorry.

Without any additional information, I can take a shot in the dark: from the log statement you’ve given me, it seems that Optimize can’t import the XML for a process definition. That can happen if Optimize has imported the process definition information, then you delete the process definition, and then Optimize tries to import the XML. You can have a look at the documentation for more details on that.
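One way to test that theory is to ask the engine’s REST API whether the XML of a suspicious definition is still retrievable. A minimal sketch, assuming the engine REST API at localhost:8080/engine-rest and a hypothetical definition id:

```python
# Sketch: check whether the engine can still serve a definition's XML.
# Assumptions: engine REST API at localhost:8080/engine-rest; the
# definition id below is hypothetical, replace it with a real one.
import urllib.error
import urllib.request

definition_id = "invoice:1:abc123"  # hypothetical id
url = f"http://localhost:8080/engine-rest/process-definition/{definition_id}/xml"

try:
    with urllib.request.urlopen(url):
        print("XML still available; the definition exists")
except urllib.error.HTTPError as err:
    if err.code == 404:
        print("Definition is gone; importing its XML would fail")
    else:
        raise
```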

Does that help?

Best
Johannes

Hi @JoHeinem

After so many questions and posts, thank you for your help.

Without any additional information, I can take a shot in the dark: from the log statement you’ve given me, it seems that Optimize can’t import the XML for a process definition. That can happen if Optimize has imported the process definition information, then you delete the process definition, and then Optimize tries to import the XML. You can have a look at the documentation for more details on that.

I read the documentation you gave me. I don’t think any of the described cases apply here. The last process definitions we deleted were more than 2 years old. The cleanup job in Camunda is set to 14 days, so we don’t have a big history. In one process, instances can last more than 1 year, but we can’t delete a process definition while instances are still running. We never delete decision definitions explicitly; if some are deleted, it is because we deleted the process definition they belong to.

Is it possible for you to provide the whole stack trace? Also, it would be great if you could tell me which version you’re using. Then I could see if the problem has already been solved.

We’re using Optimize 2.1. There is nothing left in the Camunda Optimize log files besides that error message; the rolling log mechanism worked well :slight_smile:. I found an error in the Elasticsearch log, but I’m not sure if it occurred at the same time, because Optimize writes a lot within 1 millisecond according to the timestamps in the file. So I don’t know when it occurred the first time.

Elasticsearch had an OutOfMemory error. Could this be the cause of the problem? According to your answer and the documentation, I don’t think so. This error brings me to my next question. When this error occurs, we stop the Optimize service and Elasticsearch and start them again, nothing else. Both then run, and Elasticsearch has no memory problem anymore. It happens “regularly”, I mean every 2–3 weeks. Does this have any relation to the volume of data in Camunda? On average, about 40K–50K processes are started every day; is this a problem? In Camunda we have normal/primitive process variables (strings), and one variable is an “object” (JSON) which can be big (20 KB?). Could this be the cause of the exception? Or is this just a memory leak problem?

Best
Jean-Pierre

Hi @jpcam,

After so many questions and posts, thank you for your help.

I’m happy to help and hope that you’ll be able to fully work with Optimize without any further issues :slight_smile:

Elasticsearch had an OutOfMemory error. Could this be the cause of the problem?

Interesting! Without any logs, it’s really hard to say. Since you’re using Optimize 2.1, a lot of things could be the cause of those issues. At that time, Optimize was still at an early stage: some performance tests were missing and the queries were sometimes very memory-heavy. Since then, Optimize has grown much more mature. It’s now more bulletproof, copes with more data, is more performant in terms of import and query performance, and many, many bugs have been resolved. So I would highly suggest upgrading to the latest version (maybe even Optimize 3.1, which will be released in mid-July) and then checking whether the problem persists.

Does that help?

Best and have a nice weekend
Johannes

Hi @JoHeinem

Yes, that helps, thank you. I’ve started the upgrade; the current situation isn’t stable at all anymore. I’m really looking forward to the results.

Best
Jean-Pierre