openshift / hive

API driven OpenShift cluster provisioning and management

Azure: When using Managed DNS Hive fails to find a SOA record

m1kola opened this issue · comments

When using Managed DNS on Azure, Hive fails to find a SOA record for the cluster's child zone.

Steps to reproduce

  1. Create a DNS zone (I use hive.example.com for example)
  2. Create/update HiveConfig as per docs so that hive.example.com is included in managedDomains.
  3. Create a cluster
    1. In the cluster install config set baseDomain: cluster.hive.example.com

    2. In ClusterDeployment the spec should include the following:

      spec:
        manageDNS: true
        baseDomain: cluster.hive.example.com
      # ...
  4. Run oc logs -f deployment.apps/hive-controllers to watch for logs.

You should see in logs that the controller creates a child zone cluster.hive.example.com, but then fails to find a SOA record for that zone:

level=info msg="reconciling dns zone" controller=dnszone dnsZone=example-cluster/cluster-zone reconcileID=f7jzgsr2
level=debug msg="DNSZone is not involved in a relocate" controller=dnszone dnsZone=example-cluster/cluster-zone reconcileID=f7jzgsr2
level=info msg="Syncing DNS Zone" controller=dnszone currentGeneration=1 delta=0s dnsZone=example-cluster/cluster-zone lastSyncGeneration=1 reconcileID=f7jzgsr2
level=debug msg="Retrieving current state" controller=dnszone
level=debug msg="Fetching managed zone by zone name" controller=dnszone dnsZone=example-cluster/cluster-zone reconcileID=f7jzgsr2 zone=cluster.hive.example.com
level=debug msg="Found managed zone" controller=dnszone dnsZone=example-cluster/cluster-zone reconcileID=f7jzgsr2 zone=cluster.hive.example.com
level=info msg="Existing hosted zone found. Syncing with DNSZone resource" controller=dnszone
level=debug msg="found managed zone name servers" controller=dnszone dnsZone=example-cluster/cluster-zone nameservers="&[ns1-03.azure-dns.com. ns2-03.azure-dns.net. ns3-03.azure-dns.org. ns4-03.azure-dns.info.]" reconcileID=f7jzgsr2 zone=cluster.hive.example.com
level=info msg="looking up domain SOA record" controller=dnszone servers="[172.30.0.10:53]"
level=info msg="SOA query duration: 25.958524ms" server="172.30.0.10:53"
level=info msg="no answer for SOA record returned" controller=dnszone server="172.30.0.10:53"
level=info msg="SOA record for DNS zone not available" controller=dnszone
level=debug msg="Updating DNSZone status" controller=dnszone
level=info msg="reconcile complete" controller=dnszone dnsZone=example-cluster/cluster-zone elapsedMillis=702 elapsedMillisGT=0 outcome=unspecified reconcileID=f7jzgsr2

You should also be able to find the cluster.hive.example.com DNS zone resource in the Azure portal or with the az CLI.

Potential cause

The controller doesn't seem to create NS records in the parent zone hive.example.com to allow delegation of queries about cluster.hive.example.com to the right name server.

I believe this is why hive controller is not able to find a SOA record.

I'm not very savvy with the networking, so forgive me if this response is silly...

The first step in the doc you pointed to says:

Manually create a DNS zone for your "root" domain (i.e. hive.example.com in the example below) and ensure your DNS is operational.

Is that referring to the SOA record in question?

@2uasimojo my understanding of Hive's "native" support of DNS is as follows. We have two options:

  1. The first option is to create a ClusterDeployment with manageDNS set to false. In this case no delegation happens, and the records for all clusters deployed this way are created under the "root" DNS zone Azure resource (hive.example.com in my example).
  2. The second option is to create a ClusterDeployment with manageDNS set to true. In this case Hive creates a delegated child DNS zone for each cluster (cluster.hive.example.com in my example) under the "root" DNS zone resource (hive.example.com in my example).

Is my understanding correct?


So the issue I'm reporting is with the second option (manageDNS: true): Hive creates the delegated child DNS zone, but does not create NS records under the "root" DNS zone. Because of that, it cannot resolve the SOA record for the child domain.

The first step in the doc you pointed to says:

Manually create a DNS zone for your "root" domain (i.e. hive.example.com in the example below) and ensure your DNS is operational.

I did create the "root" domain: it is hive.example.com in the docs and in my example. And now Hive is failing to look up the SOA record for the child domain cluster.hive.example.com.

So the picture should be something like this when you do dig cluster.hive.example.com SOA +trace:

  1. Root NS servers
  2. NS servers for com.
  3. NS servers for example.com.
  4. NS servers for hive.example.com.
  5. NS servers for cluster.hive.example.com.
  6. SOA record for cluster.hive.example.com.

Instead it looks like this:

  1. Root NS servers
  2. NS servers for com.
  3. NS servers for example.com.
  4. NS servers for hive.example.com.
  5. SOA record for hive.example.com.

I believe the name server that serves hive.example.com. also answers for cluster.hive.example.com because there is no delegation (no NS records for the child zone in the parent zone).

level=info msg="Existing hosted zone found. Syncing with DNSZone resource" controller=dnszone

This is the problematic part, if I recall correctly from my outdated experience. Hive will only do the NS records if it is creating the zone for the cluster's base domain. If that zone already existed, then Hive must assume that the zone is already functional.

This is the problematic part, if I recall correctly from my outdated experience. Hive will only do the NS records if it is creating the zone for the cluster's base domain. If that zone already existed, then Hive must assume that the zone is already functional.

@staebler Hive did create a new zone cluster.hive.example.com as it did not exist prior to creation of ClusterDeployment, but did not create NS records. The logs I provided above are from Hive controller which is stuck in a reconciliation loop. Unfortunately I did not capture logs from the first attempt to reconcile the DNS zone (when it was actually created by the hive controller).

I had a very quick look at the Hive codebase and I do not see any code which sets NS records on DNS zone creation at all: I only see zone creation itself:

managedZone, err := a.azureClient.CreateOrUpdateZone(context.TODO(), resourceGroupName, zone)
if err != nil {
	logger.WithError(err).Error("Error creating managed zone")
	return err
}
logger.Debug("Managed zone successfully created")

And implementation of CreateOrUpdateZone:

func (c *azureClient) CreateOrUpdateZone(ctx context.Context, resourceGroupName string, zone string) (dns.Zone, error) {
	return c.zonesClient.CreateOrUpdate(ctx, resourceGroupName, zone, dns.Zone{
		Location: to.StringPtr("global"),
		ZoneProperties: &dns.ZoneProperties{
			ZoneType: dns.Public,
		},
	}, "", "")
}

I see nothing about NS records here or nearby, apart from the actuator reading the zone's name servers:

// GetNameServers implements the GetNameServers call of the actuator interface
func (a *AzureActuator) GetNameServers() ([]string, error) {
	if a.managedZone == nil {
		return nil, errors.New("managedZone is unpopulated")
	}
	logger := a.logger.WithField("zone", a.dnsZone.Spec.Zone)
	result := a.managedZone.NameServers
	logger.WithField("nameservers", result).Debug("found managed zone name servers")
	return *result, nil
}

I have a feeling that there was an assumption that the CreateOrUpdateZone API call would create a delegated DNS zone in Azure (with the appropriate NS records, etc.).

Are there any logs from the dnsendpoint controller? That is the controller responsible for adding the NS records.

There should be a ParentLinkCreated condition on the DNSZone that indicates whether Hive was able to create the NS records pointing from the managed DNS zone to the cluster's base-domain zone.

Ok it was my fault. At some point I messed up role assignments for the service principal which I used for DNS zone management. I'm sorry for the noise. Closing this one.