spring-cloud / spring-cloud-dataflow-acceptance-tests

Fix StreamPartitioning test on CF tile

ilayaperumalg opened this issue

The Stream partitioning test on the CF tile environment consistently fails with the following error:

[ERROR] org.springframework.cloud.dataflow.acceptance.test.DataFlowAT.streamPartitioning  Time elapsed: 93.604 s  <<< ERROR!
java.lang.IllegalStateException: Duplicate key feaff301-845b-4f36-a533-d5a46ea157d8 (attempted merging values {guid=feaff301-845b-4f36-a533-d5a46ea157d8, index=1, metrics.machine.cpu=0.0%, metrics.machine.disk=0.0%, metrics.machine.memory=0.0%, skipper.application.name=log, skipper.release.name=partitioning-test, skipper.release.version=14, url=http://xxx-test-log-v14.apps.santabarbara.cf-app.com, url.0=http://xxx-test-log-v14.apps.santabarbara.cf-app.com} and {guid=feaff301-845b-4f36-a533-d5a46ea157d8, index=0, metrics.machine.cpu=171.4%, metrics.machine.disk=16.1%, metrics.machine.memory=19.8%, skipper.application.name=log, skipper.release.name=partitioning-test, skipper.release.version=14, url=http://xxx-test-log-v14.apps.santabarbara.cf-app.com, url.0=http://xxx-test-log-v14.apps.santabarbara.cf-app.com})

This may have something to do with the way we handle collections in DataFlowIT.

I was able to reproduce the StreamPartitioning test failure by running the AT locally pointing at the CF AT environment.

By using the SCDF shell I was also able to see the multiple instances w/ the same GUID, which is expected for CF instances.

[Screenshot, 2022-02-22]

The app has a GUID and each instance has an index. However, the AT code collects into a Map using the GUID as the key, and that blows up with an IllegalStateException as soon as a duplicate key is detected.
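To make the failure mode concrete, here is a minimal sketch that reproduces the same kind of exception outside the AT. It assumes the collection is done with `Collectors.toMap` without a merge function (the "Duplicate key ... attempted merging values" message is what that collector produces). The class name `DuplicateGuidDemo`, the method `byGuid`, and the attribute maps are hypothetical stand-ins, not the real DataFlowIT code.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateGuidDemo {

    // Hypothetical helper: one map entry per instance, keyed by its "guid" attribute.
    // Collectors.toMap without a merge function throws IllegalStateException
    // when two elements map to the same key.
    static Map<String, Map<String, String>> byGuid(List<Map<String, String>> instances) {
        return instances.stream()
                .collect(Collectors.toMap(attrs -> attrs.get("guid"), attrs -> attrs));
    }

    public static void main(String[] args) {
        // Two instances of the same CF app: same GUID, different index.
        List<Map<String, String>> instances = List.of(
                Map.of("guid", "feaff301-845b-4f36-a533-d5a46ea157d8", "index", "0"),
                Map.of("guid", "feaff301-845b-4f36-a533-d5a46ea157d8", "index", "1"));
        try {
            byGuid(instances);
        } catch (IllegalStateException e) {
            System.out.println("Collect failed: " + e.getMessage());
        }
    }
}
```

With a single instance per app, or with a key that differs per instance, the same collect succeeds, which would explain why the test only breaks on a stream deployed with more than one instance.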

The question is: why is this happening now?

One theory is that the code that we use to report the "guid" has changed in some way.

Was "guid" not in instance attributes?
If there is no "guid" attribute in the instance attributes, then we fall back to the instance id, which is the index and would be unique, so we would not hit this duplicate-key issue.

Taken from org.springframework.cloud.dataflow.rest.resource.AppInstanceStatusResource#getGuid:

  if (this.getAttributes() != null && this.getAttributes().containsKey(StreamRuntimePropertyKeys.ATTRIBUTE_GUID)) {
    return this.getAttributes().get(StreamRuntimePropertyKeys.ATTRIBUTE_GUID);
  }
  return this.getInstanceId();
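
As a rough illustration of that fallback, the sketch below applies the same logic to plain maps. It assumes the attribute key is the literal "guid" (the real code uses the constant StreamRuntimePropertyKeys.ATTRIBUTE_GUID), and the class `GuidFallbackDemo`, the method `resolveGuid`, and the sample values are made up for the example.

```java
import java.util.Map;

public class GuidFallbackDemo {

    // Assumed key name; the real code refers to StreamRuntimePropertyKeys.ATTRIBUTE_GUID.
    static final String ATTRIBUTE_GUID = "guid";

    // Same shape as getGuid: prefer the "guid" attribute, otherwise use the instance id.
    static String resolveGuid(Map<String, String> attributes, String instanceId) {
        if (attributes != null && attributes.containsKey(ATTRIBUTE_GUID)) {
            return attributes.get(ATTRIBUTE_GUID);
        }
        return instanceId;
    }

    public static void main(String[] args) {
        // With a shared "guid" attribute, both instances resolve to the same value.
        System.out.println(resolveGuid(Map.of("guid", "app-guid"), "0"));
        System.out.println(resolveGuid(Map.of("guid", "app-guid"), "1"));
        // Without the attribute, each instance resolves to its own id.
        System.out.println(resolveGuid(Map.of(), "0"));
        System.out.println(resolveGuid(Map.of(), "1"));
    }
}
```

So whether the resolved value is unique per instance depends entirely on whether a "guid" attribute is present and on what the deployer puts in it.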

Or did the actual value we were using as "guid" change?
Was it previously something other than CF-INSTANCE-GUID, and now it simply is that?

> Was "guid" not in instance attributes?
> If there is no "guid" attribute in the instance attributes, then we fall back to the instance id, which is the index and would be unique, so we would not hit this duplicate-key issue.

This is most likely the case.

Yep, I remember @dturanski mentioning something about a bug in the CF GUID generation a while back and @ilayaperumalg found the commit:

We used to have: attributes.put("guid", applicationDetail.getName() + ":" + index);
https://github.com/spring-cloud/spring-cloud-deployer-cloudfoundry/commit/7e8e44b3b78515483e3421ad5997290efa5f4424#diff-879f24453d[…]32b4d168a34R971. 😂

I will adjust the AT code to account for this (not barf on same guid as the keys) tomorrow.
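
For reference, two common ways to make such a collect tolerate a shared GUID are sketched below. This is not the actual change made to the AT code; the class `TolerantGuidCollect`, both method names, and the attribute maps are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TolerantGuidCollect {

    // Option 1: group all instances that share a GUID into a list per GUID.
    static Map<String, List<Map<String, String>>> groupByGuid(List<Map<String, String>> instances) {
        return instances.stream()
                .collect(Collectors.groupingBy(attrs -> attrs.get("guid")));
    }

    // Option 2: keep one entry per instance by keying on GUID plus index.
    static Map<String, Map<String, String>> byGuidAndIndex(List<Map<String, String>> instances) {
        return instances.stream()
                .collect(Collectors.toMap(
                        attrs -> attrs.get("guid") + ":" + attrs.get("index"),
                        attrs -> attrs));
    }

    public static void main(String[] args) {
        List<Map<String, String>> instances = List.of(
                Map.of("guid", "app-guid", "index", "0"),
                Map.of("guid", "app-guid", "index", "1"));
        System.out.println(groupByGuid(instances).keySet());
        System.out.println(byGuidAndIndex(instances).keySet());
    }
}
```

Grouping fits when the test wants per-app data, while the composite key fits when it needs to address each instance separately.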

Leaving open until the test is re-enabled. I am not sure when this will get picked up, as the tests use the cloudfoundry-tile, which has a version of spring-cloud-deployer-cloudfoundry baked in. They are both at version 2.8.0-SNAPSHOT, but I am not sure if that matters for a tile (i.e. is the tile like an image that uses the lib version that was available at the time of creation? That seems not too helpful if so).

I went ahead and re-enabled this test on the build plan. Let's see what it does this afternoon when it runs.

This has now been verified on the ATs w/ a green run https://jenkins.spring.io/view/SCDF-Acceptance-Tests/job/Santabarbara-Kafka/138/. It is also re-enabled in the build plan going forward.

Great! Thank you @onobc