Sadly, I have run into the problem again…
- zeebe-grpc 0.23.7.0
- grpcio 1.33.2
- protobuf 3.13.0
- Zeebe server 0.23.7
- Zeebe docker image: zeebe:0.23.7@sha256:8aa7418bf02e22fc52f897e03d5d8ee04b32a57e1b4e0961684d5ace8a2de9dc
Here is the output of my tool that analyzes the Zeebe data exported to Elasticsearch:
JOBS --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Datetime bpmnProcessId version type workflowInstanceKey elementId intent key
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2020-11-19T15:14:56.314000 test_SdzgjFj3FqSJmbOb8lAQ72uyQCTjWPsI 1 madam_scanfolder 2251799813691084 scanfolder_ba CREATED 2251799813691089
2020-11-19T15:14:56.373000 test_SdzgjFj3FqSJmbOb8lAQ72uyQCTjWPsI 1 madam_scanfolder 2251799813691084 scanfolder_ba ACTIVATED 2251799813691089
2020-11-19T15:14:56.471000 test_SdzgjFj3FqSJmbOb8lAQ72uyQCTjWPsI 1 madam_scanfolder 2251799813691084 scanfolder_ba COMPLETED 2251799813691089
2020-11-19T15:15:08.140000 test_SdzgjFj3FqSJmbOb8lAQ72uyQCTjWPsI 2 madam_scanfolder 2251799813691098 scanfolder_ba CREATED 2251799813691104
2020-11-19T15:15:09.561000 test_SdzgjFj3FqSJmbOb8lAQ72uyQCTjWPsI 2 madam_scanfolder 2251799813691098 scanfolder_ba ACTIVATED 2251799813691104
Here, the integration tests stopped at the second test after a 20-second timeout while waiting for the COMPLETED state, even though the task itself takes less than 5 to 10 seconds.
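The check the integration tests perform (a job that reaches ACTIVATED but never COMPLETED) can also be run directly on the tool's listing. The following is a minimal sketch. It assumes that the JOBS output is whitespace-separated in the column order of the header above and sorted by time. `summarize_jobs` is a hypothetical helper, not part of the original tool.

```python
from datetime import datetime

# Column order as printed in the JOBS header (assumed whitespace-separated).
COLUMNS = ["datetime", "bpmnProcessId", "version", "type",
           "workflowInstanceKey", "elementId", "intent", "key"]


def summarize_jobs(lines):
    """Return (durations, stuck) for a JOBS listing.

    durations maps job key -> seconds between ACTIVATED and COMPLETED.
    stuck lists job keys whose most recent intent is ACTIVATED.
    Assumes the rows are in chronological order.
    """
    events = {}       # job key -> {intent: timestamp}
    last_intent = {}  # job key -> last intent seen
    for line in lines:
        parts = line.split()
        # Skip the banner, the header row and the separator lines.
        if len(parts) != len(COLUMNS) or not parts[-1].isdigit():
            continue
        row = dict(zip(COLUMNS, parts))
        ts = datetime.fromisoformat(row["datetime"])
        events.setdefault(row["key"], {})[row["intent"]] = ts
        last_intent[row["key"]] = row["intent"]

    durations = {
        key: (ev["COMPLETED"] - ev["ACTIVATED"]).total_seconds()
        for key, ev in events.items()
        if "ACTIVATED" in ev and "COMPLETED" in ev
    }
    stuck = [key for key, intent in last_intent.items() if intent == "ACTIVATED"]
    return durations, stuck


if __name__ == "__main__":
    sample = [
        "2020-11-19T15:14:56.373000 test_x 1 madam_scanfolder 1 scanfolder_ba ACTIVATED 10",
        "2020-11-19T15:14:56.471000 test_x 1 madam_scanfolder 1 scanfolder_ba COMPLETED 10",
        "2020-11-19T15:15:09.561000 test_x 2 madam_scanfolder 2 scanfolder_ba ACTIVATED 11",
    ]
    print(summarize_jobs(sample))
```

On the first listing above, this flags job 2251799813691104 as stuck. It also shows roughly 0.1 s between activation and completion for job 2251799813691089.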
A second run stopped later, this time on a different BPMN workflow file with a different worker:
JOBS --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Datetime bpmnProcessId version type workflowInstanceKey elementId intent key
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2020-11-19T15:41:56.533000 test_6UmGJbyCTeggMBBAIuPKD2Y8wMqwsnWT 1 madam_scanfolder 2251799813691272 scanfolder_ba CREATED 2251799813691277
2020-11-19T15:41:56.667000 test_6UmGJbyCTeggMBBAIuPKD2Y8wMqwsnWT 1 madam_scanfolder 2251799813691272 scanfolder_ba ACTIVATED 2251799813691277
2020-11-19T15:41:57.320000 test_6UmGJbyCTeggMBBAIuPKD2Y8wMqwsnWT 1 madam_scanfolder 2251799813691272 scanfolder_ba COMPLETED 2251799813691277
2020-11-19T15:42:10.716000 test_6UmGJbyCTeggMBBAIuPKD2Y8wMqwsnWT 2 madam_scanfolder 2251799813691285 scanfolder_ba CREATED 2251799813691293
2020-11-19T15:42:10.776000 test_6UmGJbyCTeggMBBAIuPKD2Y8wMqwsnWT 2 madam_scanfolder 2251799813691285 scanfolder_ba ACTIVATED 2251799813691293
2020-11-19T15:42:18.052000 test_6UmGJbyCTeggMBBAIuPKD2Y8wMqwsnWT 2 madam_scanfolder 2251799813691285 scanfolder_ba COMPLETED 2251799813691293
2020-11-19T15:42:29.997000 test_iDLl7S92QqQDgvKpaJdgsfyQGq9vznb6 1 madam_scanfolder 2251799813691304 scanfolder_ba CREATED 2251799813691309
2020-11-19T15:42:30.036000 test_iDLl7S92QqQDgvKpaJdgsfyQGq9vznb6 1 madam_scanfolder 2251799813691304 scanfolder_ba ACTIVATED 2251799813691309
2020-11-19T15:42:30.406000 test_iDLl7S92QqQDgvKpaJdgsfyQGq9vznb6 1 madam_scanfolder 2251799813691304 scanfolder_ba COMPLETED 2251799813691309
2020-11-19T15:42:54.085000 test_iDLl7S92QqQDgvKpaJdgsfyQGq9vznb6 2 madam_scanfolder 2251799813691318 scanfolder_ba CREATED 2251799813691325
2020-11-19T15:42:54.113000 test_iDLl7S92QqQDgvKpaJdgsfyQGq9vznb6 2 madam_scanfolder 2251799813691318 scanfolder_ba ACTIVATED 2251799813691325
2020-11-19T15:42:56.484000 test_iDLl7S92QqQDgvKpaJdgsfyQGq9vznb6 2 madam_scanfolder 2251799813691318 scanfolder_ba COMPLETED 2251799813691325
2020-11-19T15:43:05.357000 test_wJkELaJoZIlqGh2gS2fhAO0S2TMR4jqk 1 madam_ffmpeg 2251799813691334 ffmpeg_create_video CREATED 2251799813691342
2020-11-19T15:43:05.853000 test_wJkELaJoZIlqGh2gS2fhAO0S2TMR4jqk 1 madam_ffmpeg 2251799813691334 ffmpeg_create_video ACTIVATED 2251799813691342
After a job is activated, the worker keeps looping and waiting for activated jobs, but the jobs for which no response ever arrives are lost in oblivion…
[EDIT]
Even when I run the Python worker client on my laptop (that is, not inside a Docker service on a Swarm network), I get the same problem…
I start a workflow instance for process ID test_FPzcry4ZP1YzPfOfE2qNvWchFnSmKKZy:
2020-11-20T00:02:04.357000 test_FPzcry4ZP1YzPfOfE2qNvWchFnSmKKZy 1 madam_scanfolder 2251799813695059 scanfolder_ba CREATED 2251799813695065
2020-11-20T00:02:06.824000 test_FPzcry4ZP1YzPfOfE2qNvWchFnSmKKZy 1 madam_scanfolder 2251799813695059 scanfolder_ba ACTIVATED 2251799813695065
The Python gRPC client receives no response, so the worker does nothing.
But tcpdump shows that a response for this particular process ID is received:
00:01:59.781542 IP 192.168.1.105.26500 > K72Jr.45696: Flags [P.], seq 50:129, ack 461, win 501, options [nop,nop,TS val 1325758811 ecr 1486111281], length 79
0x0000: 4500 0083 2f1c 4000 3e06 8932 c0a8 0169 E.../.@.>..2...i
0x0010: c0a8 016d 6784 b280 8108 2943 52e2 2512 ...mg.....)CR.%.
0x0020: 8018 01f5 a213 0000 0101 080a 4f05 795b ............O.y[
0x0030: 5894 4231 0000 4601 0400 0000 0188 5f10 X.B1..F......._.
0x0040: 6170 706c 6963 6174 696f 6e2f 6772 7063 application/grpc
0x0050: 400d 6772 7063 2d65 6e63 6f64 696e 6708 @.grpc-encoding.
0x0060: 6964 656e 7469 7479 4014 6772 7063 2d61 identity@.grpc-a
0x0070: 6363 6570 742d 656e 636f 6469 6e67 0467 ccept-encoding.g
0x0080: 7a69 70 zip
00:01:59.782102 IP 192.168.1.105.26500 > K72Jr.45696: Flags [P.], seq 129:226, ack 461, win 501, options [nop,nop,TS val 1325758812 ecr 1486111281], length 97
0x0000: 4500 0095 2f1d 4000 3e06 891f c0a8 0169 E.../.@.>......i
0x0010: c0a8 016d 6784 b280 8108 2992 52e2 2512 ...mg.....).R.%.
0x0020: 8018 01f5 d81d 0000 0101 080a 4f05 795c ............O.y\
0x0030: 5894 4231 0000 4000 0000 0000 0100 0000 X.B1..@.........
0x0040: 003b 08d1 cc80 8080 8080 0412 2574 6573 .;..........%tes
0x0050: 745f 4650 7a63 7279 345a 5031 597a 5066 t_FPzcry4ZP1YzPf
0x0060: 4f66 4532 714e 7657 6368 466e 536d 4b4b OfE2qNvWchFnSmKK
0x0070: 5a79 1801 20d3 cc80 8080 8080 0400 000f Zy..............
0x0080: 0105 0000 0001 400b 6772 7063 2d73 7461 ......@.grpc-sta
0x0090: 7475 7301 30 tus.0
The problem seems to be located in the Python client…
[EDIT]
NO! This response is empty; see my next post for a full response dump.
So the problem is on the Zeebe side…
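To check what a captured packet actually carries, you can decode the gRPC message inside the second packet of the tcpdump above by hand. On the wire, a gRPC message consists of a 1-byte compressed flag, then a 4-byte big-endian length, then the protobuf body.

The sketch below hard-codes those bytes from the dump, starting right after the 9-byte HTTP/2 DATA frame header (offset 0x3d, based on my reading of the frame boundaries). It walks the protobuf fields without needing the .proto file. It only handles varint and length-delimited fields, which is all this packet contains. `read_varint` and `decode_grpc_message` are hypothetical helper names.

```python
import struct

# gRPC message bytes copied from the second packet of the dump (offsets 0x3d to 0x7c).
GRPC_MESSAGE = bytes.fromhex(
    "00 0000"  # last three bytes of row 0x0030: compressed flag + start of length
    "003b 08d1 cc80 8080 8080 0412 2574 6573"
    "745f 4650 7a63 7279 345a 5031 597a 5066"
    "4f66 4532 714e 7657 6368 466e 536d 4b4b"
    "5a79 1801 20d3 cc80 8080 8080 04"
)


def read_varint(buf, pos):
    """Decode a base-128 protobuf varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_grpc_message(frame):
    """Split off the 5-byte gRPC prefix and return {field_number: value}.

    Only wire types 0 (varint) and 2 (length-delimited) are handled;
    a repeated field keeps only its last occurrence in this sketch.
    """
    compressed, length = struct.unpack(">BI", frame[:5])
    if compressed:
        raise ValueError("compressed messages are not handled here")
    body = frame[5:5 + length]
    if len(body) != length:
        raise ValueError("truncated gRPC message")

    fields, pos = {}, 0
    while pos < len(body):
        tag, pos = read_varint(body, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:
            value, pos = read_varint(body, pos)
        elif wire_type == 2:
            size, pos = read_varint(body, pos)
            value = bytes(body[pos:pos + size])
            pos += size
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields[number] = value
    return fields


if __name__ == "__main__":
    print(decode_grpc_message(GRPC_MESSAGE))
```

Run against the dump, this yields four fields:

- field 1 = 2251799813695057
- field 2 = b"test_FPzcry4ZP1YzPfOfE2qNvWchFnSmKKZy"
- field 3 = 1
- field 4 = 2251799813695059, which is the workflowInstanceKey in the job log above

Field 1 is a plain integer rather than an embedded job message. This layout therefore appears to match the reply to the workflow-instance creation request, not an ActivateJobs reply carrying a job.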