dotnet / sdk-container-builds

Libraries and build tooling to create container images from .NET projects using MSBuild

Home page: https://learn.microsoft.com/en-us/dotnet/core/docker/publish-as-container

Azure Container Registry auth failure when using Managed Identity

baronfel opened this issue.

A user reported failures when attempting to use our tools while logged in to ACR with a managed identity through an Azure DevOps-managed service connection. We should verify that Managed Identity authentication works, and fix it if it doesn't.

I dug into how managed identities work to help triage this. From the user's report, it looks like we aren't failing to fetch the credentials; we are failing during the actual authentication. I can't see the value of the token or auth string that's retrieved.

Useful links:

We should investigate this logic and try to set up a test suite for Managed Identity support.

We've done some investigation and found that when a login happens with a Managed Identity, an identity token (IdentityToken) is returned, indicated by a username of <token>. This is supposed to signal that the client should perform an OAuth token authentication flow with the upstream registry, but right now we only perform the 'Token' authentication method. We need to support the OAuth Bearer authentication mechanism as well.
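A minimal Python sketch of the distinction described above, assuming a Docker config.json-style auths entry. This is not the SDK's C# implementation; RegistryCredential, credential_from_config_entry, choose_auth_flow and the flow labels are hypothetical names. An entry carrying an identitytoken is surfaced with the <token> username and routed to the OAuth2 refresh-token exchange, while a plain base64 username:password auth value goes through the basic-auth token request.

```python
import base64
from dataclasses import dataclass
from typing import Optional

# Docker convention: this username means the secret is an identity (refresh) token.
IDENTITY_TOKEN_USERNAME = "<token>"


@dataclass
class RegistryCredential:  # hypothetical type, for illustration only
    username: Optional[str]
    secret: str


def credential_from_config_entry(entry: dict) -> RegistryCredential:
    """Build a credential from one auths.<registry> entry of config.json."""
    if entry.get("identitytoken"):
        # An identity token wins over any "auth" value in the same entry.
        return RegistryCredential(IDENTITY_TOKEN_USERNAME, entry["identitytoken"])
    # Otherwise "auth" is base64("username:password"); split on the first colon.
    decoded = base64.b64decode(entry["auth"]).decode("utf-8")
    username, _, password = decoded.partition(":")
    return RegistryCredential(username, password)


def choose_auth_flow(cred: RegistryCredential) -> str:
    """Pick which bearer-token flow to run against the registry's realm."""
    if cred.username == IDENTITY_TOKEN_USERNAME:
        # Exchange the identity token via OAuth2 (grant_type=refresh_token).
        return "oauth2-refresh-token"
    # Username/password: the basic-auth GET token request.
    return "basic-token"
```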

Docs on the OAuth Bearer mechanism are here, and the Jib and regclient implementations of this are good references too.
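Docker's registry OAuth2 token documentation describes exchanging a refresh (identity) token by POSTing a form-encoded body with grant_type=refresh_token to the realm named in the registry's challenge. The sketch below only builds that request with Python's standard library and does not send it. build_refresh_token_request and the example-client client_id value are hypothetical, and whether a particular registry (ACR included) requires client_id is an assumption to check against its own docs.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_refresh_token_request(realm: str, service: str, scope: str,
                                identity_token: str,
                                client_id: str = "example-client") -> Request:
    """Build (but do not send) the OAuth2 token request for an identity token."""
    form = {
        "grant_type": "refresh_token",
        "service": service,
        "scope": scope,
        "client_id": client_id,  # hypothetical client identifier
        "refresh_token": identity_token,
    }
    body = urlencode(form).encode("ascii")
    return Request(
        realm,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

If the exchange succeeds, the JSON response's access_token field is what gets sent as Authorization: Bearer <access_token> on subsequent registry calls.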

Hello @baronfel, I believe I ran into this same issue. I'm not sure which authentication schemes a Managed Identity covers, but I hit the issue both with federated credentials in a GHA workflow and when logging in as a service principal with a clientId + clientSecret.

I implemented a temporary workaround by modifying $HOME/.docker/config.json: I overwrote auths.[your-registry].auth with the base64 encoding of 00000000-0000-0000-0000-000000000000:REDACTED JWT, and I also removed the identitytoken element.
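The workaround described above can be sketched in Python as follows. It assumes the standard config.json layout (an auths map keyed by registry host) and takes the JWT as an input, since the comment doesn't say how it was obtained; apply_workaround and patch_config_file are hypothetical helpers. Back up the file before trying anything like this, because it replaces the stored credential.

```python
import base64
import json
from pathlib import Path

# Placeholder username used in the workaround above for token-based ACR logins.
ACR_TOKEN_USERNAME = "00000000-0000-0000-0000-000000000000"


def apply_workaround(config: dict, registry: str, access_token: str) -> dict:
    """Return a patched copy of a Docker config; the input dict is not modified."""
    patched = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    entry = patched.setdefault("auths", {}).setdefault(registry, {})
    raw = f"{ACR_TOKEN_USERNAME}:{access_token}".encode("utf-8")
    entry["auth"] = base64.b64encode(raw).decode("ascii")
    entry.pop("identitytoken", None)  # drop the identity token so it isn't used
    return patched


def patch_config_file(path: Path, registry: str, access_token: str) -> None:
    """Rewrite a config.json in place (hypothetical helper; back it up first)."""
    config = json.loads(path.read_text(encoding="utf-8"))
    patched = apply_workaround(config, registry, access_token)
    path.write_text(json.dumps(patched, indent=2), encoding="utf-8")
```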

I put a lot of investigation, reproduction, and workaround effort into this. (I probably should have looked through these issues first, but I originally thought it was a Docker or ACR problem. Oh well.)

Is there a better place than this comment to provide these details? Perhaps @MichalPavlik or @vlada-shubina would like to read through it to see whether it covers the potential fix?

I wouldn't mind testing the fix as well. I reproduced the issue on Windows, Alpine Linux, and Ubuntu in a GHA workflow, so I can quickly verify whether an updated SDK solves the problem for me.

@hotfix-houdini that repro is amazing! Thank you so much for writing it up - one thing I had been unable to identify is how I could test the work Vlada did in the linked SDK PR over at baronfel/sdk-container-demo, and your repro instructions are perfectly suited for that.

I agree that the issue you've written up is the one we're describing here, and so I think you should be able to verify the fix using the following steps. Unfortunately this is a bit fiddly due to .NET 8 runtime incompatibilities between preview 6 and preview 7, so you currently must build the SDK itself:

  • Clone the SDK repo and check out Vlada's PR branch
  • Build the SDK via build.cmd or build.sh, depending on platform
  • Run the eng/dogfood.ps1 or eng/dogfood.sh script to make your current terminal session use the locally built SDK (and the containers tooling from Vlada's PR)
  • Log in to your Azure account with your managed identity
  • cd to your project directory and run the dotnet publish command you would normally use to publish via the SDK

We've had success with an internal-to-Microsoft partner team using this approach, so I'm hoping you are also able to have success using these steps.

Cool, I'm stepping away for a while but should be able to go through those steps within 24 hours, and I'll report back.

@hotfix-houdini I was just able to run these steps myself on the test ACR I have for baronfel/sdk-container-demo AND THEY WORKED! This is very exciting!

Woo! 🥳

I have not been so successful 😅. I see what you mean about things being fiddly. I could definitely be doing something wrong, so it might not be a problem if you're satisfied. I definitely built the wrong repo/branch several times. I still got the 401 Unauthorized error, and it does appear that I'm running the locally built SDK.

Here's exactly what I did on Windows:

  • open a new admin PowerShell terminal
  • git clone https://github.com/vlada-shubina/sdk.git
  • cd sdk
  • git checkout origin/containers-oauth
  • build.cmd
    • "Attempting to install 'sdk v8.0.100-preview.6.23330.14' from public location."
  • cd ..
  • dotnet new worker -o Worker -n DotNet.ContainerImage
  • cd Worker
  • manually add the container registry reference to the .csproj
  • dotnet add package Microsoft.NET.Build.Containers
  • az acr login --name registry
  • dotnet publish --os linux --arch x64 /t:PublishContainer -c Release
    • observe that it's running my global SDK (c:\Program Files\dotnet\sdk\7.0.400-preview.23225.8)
    • observe 401 error (expected)
  • verify the Docker credentials are good, though: docker push ".azurecr.io/hello-world:latest"
  • cd ../sdk
  • eng/dogfood.ps1
  • cd ../Worker
  • dotnet publish --os linux --arch x64 /t:PublishContainer -c Release
    • observe that it's running the locally built SDK (C:\sdkt\6\sdk\artifacts\bin\redist\Debug\dotnet\sdk\8.0.100-dev)
    • still observe 401 error :(
       C:\Users\name\.nuget\packages\microsoft.net.build.containers\7.0.306\build\Microsoft.NET.Build.Containers.targets(195,5): error CONTAINER1013: Failed to push to the output registry: System.Net.Http.HttpRequestException: Response status code does not indicate success: 401 (Unauthorized). [C:\sdkt\6\Worker\DotNet.ContainerImage.csproj]

At one point, when I was building the SDK from the wrong repo/branch, I got an error: SDK Stage 0 has more than one folder with templates with both \sdk\.dotnet/templates\6.0.0-rc.2.21420.26 and \sdk\.dotnet/templates\8.0.0-preview.5.23302.2. I deleted 6.0.0-rc.2.21420.26; maybe I was supposed to keep that one and delete 8.0.0-preview.5.23302.2 instead? Unfortunately I didn't save 6.0.0-rc.2.21420.26, and I haven't been prompted again, so I haven't been able to retest with that version in case that's my issue.

I switched over to my docker-in-docker Alpine Linux environment, repeated the steps, and got the same result. I was prompted again with the template choices and this time removed 8.0.0-preview.6.23329.11 from the templates directory, leaving 6.0.0-preview.5.21220.15. It didn't seem to make a difference. The dogfood script didn't seem to work for me (which dotnet still showed /usr/bin/dotnet), so I ran ./dotnet from /sdk/artifacts/bin/redist/Debug/dotnet directly to publish the .NET 7 app. With my hack applied to .docker/config.json, I was able to push. git status shows: HEAD detached at origin/containers-oauth. Here's the output from the Alpine Linux attempt in case it's helpful:
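The dogfood scripts are meant to make the current terminal session use the locally built SDK, so one quick check is to resolve which dotnet that session would actually run. A minimal Python sketch, assuming the locally built SDK lives under artifacts/bin/redist as in the paths shown in this thread; resolve_dotnet and is_locally_built_sdk are hypothetical helpers.

```python
import shutil
from typing import Optional


def resolve_dotnet(path_env: Optional[str] = None) -> Optional[str]:
    """Return the dotnet executable found on PATH (or path_env), or None."""
    return shutil.which("dotnet", path=path_env)


def is_locally_built_sdk(dotnet_path: Optional[str]) -> bool:
    """True if dotnet_path points into the SDK repo's artifacts output."""
    if not dotnet_path:
        return False
    # Normalize Windows separators so one substring check covers both OSes.
    normalized = dotnet_path.replace("\\", "/")
    return "/artifacts/bin/redist/" in normalized
```

Running print(is_locally_built_sdk(resolve_dotnet())) from the same terminal session should print True once the dogfood script has taken effect; a result of /usr/bin/dotnet, as described above, yields False.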

source/f1/sdk/artifacts/bin/redist/Debug/dotnet # /source/f1/sdk/artifacts/bin/redist/Debug/dotnet/dotnet publish --os linux --arch x64 /t:PublishContainer -c Release /source/f1/Worker/
MSBuild version 17.8.0-preview-23401-01+b3989dc43 for .NET
  Determining projects to restore...
  All projects are up-to-date for restore.
/source/f1/sdk/artifacts/bin/redist/Debug/dotnet/sdk/8.0.100-dev/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.RuntimeIdentifierInference.targets(311,5): message NETSDK1057: You are using a preview version of .NET. See: https://aka.ms/dotnet-support-policy [/source/f1/Worker/DotNet.ContainerImage.csproj]
  DotNet.ContainerImage -> /source/f1/Worker/bin/Release/net7.0/linux-x64/DotNet.ContainerImage.dll
  DotNet.ContainerImage -> /source/f1/Worker/bin/Release/net7.0/linux-x64/publish/
  Building image 'dotnet-containerimage' with tags 1.0.0 on top of base image mcr.microsoft.com:443/dotnet/runtime:7.0
  Uploading layer sha256:1d5252f66ea9b661aceca1027b3d7ca259a50608261a25b51148119ecf086932 to testacrrbmcundbyipszfj.azurecr.io
  Uploading layer sha256:75b8c875f256d7ad2c486a3a53c3bdf6b084bfd885e2ea845d54aab237e16b76 to testacrrbmcundbyipszfj.azurecr.io
  Uploading layer sha256:3b04a5dc83ef9b6e55e871d3f28543d55da237e04fe428844764759854f62b4a to testacrrbmcundbyipszfj.azurecr.io
  Uploading layer sha256:6703541983d9dc7ce7f7b2db5cc628f5c0b0f4a6e38db93d1a344f40c6f6f153 to testacrrbmcundbyipszfj.azurecr.io
  Uploading layer sha256:bf2ec28fb93d09a6e7e37100af827b4fe060656b630f1e13f57492bce38f05fd to testacrrbmcundbyipszfj.azurecr.io
/root/.nuget/packages/microsoft.net.build.containers/7.0.306/build/Microsoft.NET.Build.Containers.targets(195,5): error CONTAINER1013: Failed to push to the output registry: System.Net.Http.HttpRequestException: Response status code does not indicate success: 401 (Unauthorized). [/source/f1/Worker/DotNet.ContainerImage.csproj]
/root/.nuget/packages/microsoft.net.build.containers/7.0.306/build/Microsoft.NET.Build.Containers.targets(195,5): error CONTAINER1013:    at System.Net.Http.HttpResponseMessage.EnsureSuccessStatusCode() [/source/f1/Worker/DotNet.ContainerImage.csproj]
/root/.nuget/packages/microsoft.net.build.containers/7.0.306/build/Microsoft.NET.Build.Containers.targets(195,5): error CONTAINER1013:    at Microsoft.NET.Build.Containers.AuthHandshakeMessageHandler.GetAuthenticationAsync(String registry, String scheme, Uri realm, String service, String scope, CancellationToken cancellationToken) in /_/src/Containers/Microsoft.NET.Build.Containers/AuthHandshakeMessageHandler.cs:line 140 [/source/f1/Worker/DotNet.ContainerImage.csproj]
/root/.nuget/packages/microsoft.net.build.containers/7.0.306/build/Microsoft.NET.Build.Containers.targets(195,5): error CONTAINER1013:    at Microsoft.NET.Build.Containers.AuthHandshakeMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) in /_/src/Containers/Microsoft.NET.Build.Containers/AuthHandshakeMessageHandler.cs:line 182 [/source/f1/Worker/DotNet.ContainerImage.csproj]
/root/.nuget/packages/microsoft.net.build.containers/7.0.306/build/Microsoft.NET.Build.Containers.targets(195,5): error CONTAINER1013:    at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken) [/source/f1/Worker/DotNet.ContainerImage.csproj]
/root/.nuget/packages/microsoft.net.build.containers/7.0.306/build/Microsoft.NET.Build.Containers.targets(195,5): error CONTAINER1013:    at Microsoft.NET.Build.Containers.Registry.BlobAlreadyUploadedAsync(String repository, String digest, HttpClient client, CancellationToken cancellationToken) in /_/src/Containers/Microsoft.NET.Build.Containers/Registry.cs:line 497 [/source/f1/Worker/DotNet.ContainerImage.csproj]
/root/.nuget/packages/microsoft.net.build.containers/7.0.306/build/Microsoft.NET.Build.Containers.targets(195,5): error CONTAINER1013:    at Microsoft.NET.Build.Containers.Registry.<>c__DisplayClass51_0.<<PushAsync>b__0>d.MoveNext() in /_/src/Containers/Microsoft.NET.Build.Containers/Registry.cs:line 546 [/source/f1/Worker/DotNet.ContainerImage.csproj]
/root/.nuget/packages/microsoft.net.build.containers/7.0.306/build/Microsoft.NET.Build.Containers.targets(195,5): error CONTAINER1013: --- End of stack trace from previous location --- [/source/f1/Worker/DotNet.ContainerImage.csproj]
/root/.nuget/packages/microsoft.net.build.containers/7.0.306/build/Microsoft.NET.Build.Containers.targets(195,5): error CONTAINER1013:    at Microsoft.NET.Build.Containers.Registry.PushAsync(BuiltImage builtImage, ImageReference source, ImageReference destination, Action`1 logProgressMessage, CancellationToken cancellationToken) in /_/src/Containers/Microsoft.NET.Build.Containers/Registry.cs:line 575 [/source/f1/Worker/DotNet.ContainerImage.csproj]
/root/.nuget/packages/microsoft.net.build.containers/7.0.306/build/Microsoft.NET.Build.Containers.targets(195,5): error CONTAINER1013:    at Microsoft.NET.Build.Containers.Tasks.CreateNewImage.ExecuteAsync(CancellationToken cancellationToken) in /_/src/Containers/Microsoft.NET.Build.Containers/Tasks/CreateNewImage.cs:line 133 [/source/f1/Worker/DotNet.ContainerImage.csproj]
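The stack trace fails inside AuthHandshakeMessageHandler.GetAuthenticationAsync, whose scheme, realm, service, and scope parameters presumably come from the registry's 401 challenge (the WWW-Authenticate header described in Docker's token authentication specification). A minimal Python sketch for parsing such a challenge when inspecting raw responses while debugging; parse_bearer_challenge is hypothetical, and the regex is a simplification that does not handle escaped quotes.

```python
import re
from typing import Dict, Tuple

# key="value" pairs; values may contain commas (e.g. scope="repository:app:pull,push"),
# so quoted strings are matched explicitly instead of splitting on commas.
_PARAM = re.compile(r'(\w+)="([^"]*)"')


def parse_bearer_challenge(header: str) -> Tuple[str, Dict[str, str]]:
    """Split 'Bearer realm="...",service="...",scope="..."' into scheme and params."""
    scheme, _, rest = header.strip().partition(" ")
    return scheme, dict(_PARAM.findall(rest))
```

The returned realm is where the token request would be sent: a GET with basic credentials for the 'Token' method, or a form POST with grant_type=refresh_token when the credential is an identity token.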

Hopefully this is just a local environment issue on my end, or still the wrong repo/branch or something. I definitely hit snags on both OSes.

@hotfix-houdini there will be an SDK RC1 nightly build with this feature, probably tomorrow; would you mind retesting then? I was able to succeed with this set of changes, as were some teammates of mine last week, so we've merged the current implementation for inclusion in RC1.