Azure / WALinuxAgent

Microsoft Azure Linux Guest Agent

Home Page: http://azure.microsoft.com/

Incomplete read?

ebatutis opened this issue · comments

I am porting the v2.8.0.11 agent to a new platform. Everything seemed to work across many deployments. However, on one deployment I hit the error below, and it did not go away (the agent kept retrying the goal state update hundreds of times):

```
INFO Daemon Daemon Protocol endpoint not found:
[ProtocolError] GET vmSettings [correlation ID: 5cfb8089-bdb9-4029-9d41-e3fedf5f6ba4 eTag: None]: Request failed: IncompleteRead(4145 bytes read, 147 more expected)
```

...and inside the Python debugger, I can see that the data coming from the WireServer is indeed truncated:

```
(Pdb) data
b'{"hostGAPluginVersion":"1.0.8.136","vmSettingsSchemaVersion":"0.0","activityId":"8107622a-9115-44ae-ab21-d72719a295b5","correlationId":"24f2bfcb-224f-43b7-a28d-dff8610b0bce","inSvdSeqNo":1,"extensionsLastModifiedTickCount":638098243051164566,"extensionGoalStatesSource":"Fabric","statusUploadBlob":{"statusBlobType":"PageBlob","value": ...
...indows.net/568bb00f-455e-32b8-8deb-0e1bf1636254/568bb00f-455e-32b8-8deb-0e1bf1636254_manifest.x'
```

...the data ends with ".x", when there should be a lot more after that.

Looking at the IO object in the debugger, I determined that there was no more data available. There is no chunking on this request.
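For anyone trying to reproduce the symptom locally, here is a minimal sketch of how Python's `http.client` ends up raising `IncompleteRead` on a non-chunked response: the server declares a `Content-Length` larger than the body it actually sends, then closes the connection. The local test server, the `fetch_truncated` helper, and the `/vmSettings` path are all made up for illustration; this is not the agent's code and not how the real endpoint behaves internally.

```python
import http.client
import socket
import threading


def serve_truncated(listener, body, declared_length):
    """Accept one connection and send fewer body bytes than Content-Length claims."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(65536)  # consume the (small) request before replying
        headers = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: application/json\r\n"
            "Content-Length: %d\r\n"
            "Connection: close\r\n"
            "\r\n" % declared_length
        ).encode("ascii")
        conn.sendall(headers + body)
    # Leaving the 'with' block closes the socket, so the client sees EOF early.


def fetch_truncated(body, declared_length):
    """Return (bytes_read, bytes_still_expected) as reported by IncompleteRead."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(
        target=serve_truncated, args=(listener, body, declared_length), daemon=True
    )
    server.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/vmSettings")  # hypothetical path, illustration only
        response = client.getresponse()
        try:
            data = response.read()
        except http.client.IncompleteRead as error:
            # error.partial holds what arrived; error.expected is what is still missing.
            return len(error.partial), error.expected
        return len(data), 0
    finally:
        client.close()
        server.join(timeout=5)
        listener.close()


if __name__ == "__main__":
    read, missing = fetch_truncated(b"x" * 10, declared_length=25)
    print("%d bytes read, %d more expected" % (read, missing))
```

The two numbers in the exception correspond to the "bytes read" and "more expected" figures in the log line above, so a mismatch like this points at the sender (or something in between) cutting the body short rather than at the client's parsing.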

Maybe the request should ask for compressed data? I don't know why that would be needed though.

It is not clear to me where the bug is. Maybe this particular WireServer is having a problem? It worked consistently on many other deployments. I am adding an issue here in case there is any advice out there.

@ebatutis this error does not seem to be related to your port of the agent. The error comes from the HostGAPlugin, which is similar to the WireServer. We tried looking at the logs for that node, but the node has been recycled since the issue happened, so we could not confirm the cause. We looked at telemetry, and the issue seems to have been transient.
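If a port wants to treat a truncated response as a transient condition, one generic pattern is a bounded retry with backoff around the read. The sketch below is only an illustration of that pattern: `read_with_retry` and the `fetch` callable are hypothetical names, and this is not a description of how the agent itself handles the error.

```python
import http.client
import time


def read_with_retry(fetch, attempts=3, delay=0.5):
    """Call fetch() until it returns, retrying only on IncompleteRead.

    fetch is any zero-argument callable that performs the request and
    returns the response body. The wait doubles after each failed attempt.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except http.client.IncompleteRead as error:
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay * (2 ** attempt))
    # Every attempt was truncated: surface the last error to the caller.
    raise last_error
```

Capping the number of attempts keeps a persistent truncation from turning into an endless retry loop, and re-raising the last error preserves the byte counts for diagnosis.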