Azure: When using Managed DNS Hive fails to find a SOA record
m1kola opened this issue · comments
When using Managed DNS on Azure, Hive fails to find a SOA record.
Steps to reproduce
- Create a DNS zone (I use hive.example.com as an example).
- Create/update HiveConfig as per the docs so that hive.example.com is included in managedDomains.
- Create a cluster:
  - In the cluster install config, set baseDomain: cluster.hive.example.com
  - In ClusterDeployment, the spec should include the following:
      spec:
        manageDNS: true
        baseDomain: cluster.hive.example.com
        # ...
- Run oc logs -f deployment.apps/hive-controllers to watch for logs.
You should see in the logs that the controller creates a child zone, cluster.hive.example.com, but then fails to find a SOA record for that zone:
level=info msg="reconciling dns zone" controller=dnszone dnsZone=example-cluster/cluster-zone reconcileID=f7jzgsr2
level=debug msg="DNSZone is not involved in a relocate" controller=dnszone dnsZone=example-cluster/cluster-zone reconcileID=f7jzgsr2
level=info msg="Syncing DNS Zone" controller=dnszone currentGeneration=1 delta=0s dnsZone=example-cluster/cluster-zone lastSyncGeneration=1 reconcileID=f7jzgsr2
level=debug msg="Retrieving current state" controller=dnszone
level=debug msg="Fetching managed zone by zone name" controller=dnszone dnsZone=example-cluster/cluster-zone reconcileID=f7jzgsr2 zone=cluster.hive.example.com
level=debug msg="Found managed zone" controller=dnszone dnsZone=example-cluster/cluster-zone reconcileID=f7jzgsr2 zone=cluster.hive.example.com
level=info msg="Existing hosted zone found. Syncing with DNSZone resource" controller=dnszone
level=debug msg="found managed zone name servers" controller=dnszone dnsZone=example-cluster/cluster-zone nameservers="&[ns1-03.azure-dns.com. ns2-03.azure-dns.net. ns3-03.azure-dns.org. ns4-03.azure-dns.info.]" reconcileID=f7jzgsr2 zone=cluster.hive.example.com
level=info msg="looking up domain SOA record" controller=dnszone servers="[172.30.0.10:53]"
level=info msg="SOA query duration: 25.958524ms" server="172.30.0.10:53"
level=info msg="no answer for SOA record returned" controller=dnszone server="172.30.0.10:53"
level=info msg="SOA record for DNS zone not available" controller=dnszone
level=debug msg="Updating DNSZone status" controller=dnszone
level=info msg="reconcile complete" controller=dnszone dnsZone=example-cluster/cluster-zone elapsedMillis=702 elapsedMillisGT=0 outcome=unspecified reconcileID=f7jzgsr2
You should also be able to find the cluster.hive.example.com resource in the Azure portal or with the az CLI.
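For anyone scanning a longer log for the same loop, here is a minimal sketch that parses the key=value lines above and keeps only the entries reporting the missing SOA record. The helper names are mine, not part of Hive, and it assumes the log lines contain no stray apostrophes, since it relies on shell-style quote handling.

```python
import shlex


def parse_logfmt(line: str) -> dict[str, str]:
    """Split one key=value log line into a dict.

    shlex handles the double-quoted values (msg="...") that contain spaces.
    Assumption: lines contain no unbalanced quotes or apostrophes.
    """
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


def soa_failures(lines):
    """Return the parsed entries whose msg reports an unavailable SOA record."""
    needle = "SOA record for DNS zone not available"
    return [entry for entry in map(parse_logfmt, lines) if entry.get("msg") == needle]


# Three lines copied from the controller log above.
sample = [
    'level=info msg="looking up domain SOA record" controller=dnszone servers="[172.30.0.10:53]"',
    'level=info msg="no answer for SOA record returned" controller=dnszone server="172.30.0.10:53"',
    'level=info msg="SOA record for DNS zone not available" controller=dnszone',
]

for entry in soa_failures(sample):
    print(entry["level"], entry["controller"], entry["msg"])
```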
Potential cause
The controller doesn't seem to create NS records in the parent zone hive.example.com to delegate queries about cluster.hive.example.com to the right name servers.
I believe this is why the Hive controller is not able to find a SOA record.
I'm not very savvy with the networking, so forgive me if this response is silly...
The first step in the doc you pointed to says:
Manually create a DNS zone for your "root" domain (i.e. hive.example.com in the example below) and ensure your DNS is operational.
Is that referring to the SOA record in question?
@2uasimojo my understanding of Hive's "native" support of DNS is as follows. We have two options:
- The first option is when you create a ClusterDeployment with manageDNS set to false. In this case no delegation happens, and all the records for all clusters deployed this way are created under the "root" DNS zone Azure resource (hive.example.com in my example).
- The second option is when you create a ClusterDeployment with manageDNS set to true. In this case Hive creates a delegated child DNS zone for each cluster (cluster.hive.example.com in my example) under the "root" DNS zone resource (hive.example.com in my example).
Is my understanding correct?
So the issue I'm reporting is with the second option (manageDNS: true): Hive creates the delegated child DNS zone, but does not create NS records under the "root" DNS zone. Because of that, it cannot resolve the SOA record for the child domain.
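To make the expectation concrete, here is a rough sketch of the NS record set I would expect to see in the parent zone. The function name and the TTL of 3600 are my own assumptions for illustration; the name servers are the ones reported for the child zone in the log above.

```python
def delegation_records(parent_zone, child_zone, nameservers, ttl=3600):
    """Build the NS records that delegate child_zone from parent_zone.

    Returns (relative_name, ttl, type, target) tuples, where relative_name
    is the child zone name relative to the parent. The TTL is an arbitrary
    illustrative default, not a value taken from Hive or Azure.
    """
    parent = parent_zone.rstrip(".")
    child = child_zone.rstrip(".")
    suffix = "." + parent
    if not child.endswith(suffix):
        raise ValueError(f"{child_zone} is not a subdomain of {parent_zone}")
    relative = child[: -len(suffix)]
    return [(relative, ttl, "NS", ns) for ns in nameservers]


# Name servers taken from the "found managed zone name servers" log line.
child_nameservers = [
    "ns1-03.azure-dns.com.",
    "ns2-03.azure-dns.net.",
    "ns3-03.azure-dns.org.",
    "ns4-03.azure-dns.info.",
]

records = delegation_records(
    "hive.example.com", "cluster.hive.example.com", child_nameservers
)
for name, ttl, rtype, target in records:
    print(name, ttl, rtype, target)
```

Without a record set like this in hive.example.com, resolvers have no way to learn which servers are authoritative for the child zone.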
The first step in the doc you pointed to says:
Manually create a DNS zone for your "root" domain (i.e. hive.example.com in the example below) and ensure your DNS is operational.
I did create the "root" domain: it is hive.example.com in the docs and in my example. And now Hive is failing to look up the SOA record for the child domain cluster.hive.example.com.
So the picture should be something like this when you do dig cluster.hive.example.com SOA +trace:
- Root NS servers
- NS servers for com.
- NS servers for example.com.
- NS servers for hive.example.com.
- NS servers for cluster.hive.example.com.
SOA record for cluster.hive.example.com.
Instead it looks like this:
- Root NS servers
- NS servers for com.
- NS servers for example.com.
- NS servers for hive.example.com.
SOA record for hive.example.com.
I believe the name server that serves hive.example.com. also replies for cluster.hive.example.com due to the lack of delegation/NS records.
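The two traces can be modelled with a small sketch: a query ends up at the most specific zone that is actually reachable through delegation. This is a simplified model of resolution, not how dig or Hive implements it, and the function name is hypothetical.

```python
def answering_zone(name, delegated_zones):
    """Return the most specific delegated zone that encloses name.

    delegated_zones is the set of zones reachable by following NS records
    down from the root. Falls back to the root zone "." if nothing matches.
    """
    labels = name.rstrip(".").lower().split(".")
    zones = {zone.rstrip(".").lower() for zone in delegated_zones}
    # Try the full name first, then strip one leading label at a time.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate + "."
    return "."


query = "cluster.hive.example.com."

# Observed: the child zone exists in Azure but nothing delegates to it.
without_ns = {"com.", "example.com.", "hive.example.com."}
print(answering_zone(query, without_ns))

# Expected: NS records in hive.example.com. delegate to the child zone.
with_ns = without_ns | {"cluster.hive.example.com."}
print(answering_zone(query, with_ns))
```

In the first case the SOA in the response belongs to hive.example.com., which matches what I see in the trace.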
level=info msg="Existing hosted zone found. Syncing with DNSZone resource" controller=dnszone
This is the problematic part, if I recall correctly from my outdated experience. Hive will only do the NS records if it is creating the zone for the cluster's base domain. If that zone already existed, then Hive must assume that the zone is already functional.
@staebler Hive did create a new zone, cluster.hive.example.com, as it did not exist prior to the creation of the ClusterDeployment, but it did not create NS records. The logs I provided above are from the Hive controller, which is stuck in a reconciliation loop. Unfortunately, I did not capture logs from the first attempt to reconcile the DNS zone (when it was actually created by the Hive controller).
I had a very quick look at the Hive codebase and I do not see any code which sets NS records on DNS zone creation at all: I only see zone creation itself:
hive/pkg/controller/dnszone/azureactuator.go
Lines 69 to 75 in bfb69ae
And the implementation of CreateOrUpdateZone:
hive/pkg/azureclient/client.go
Lines 68 to 75 in bfb69ae
Nothing about NS records here or anywhere near apart from getting nameservers in the actuator here:
hive/pkg/controller/dnszone/azureactuator.go
Lines 147 to 157 in bfb69ae
I have a feeling that there was an assumption that the CreateOrUpdateZone API call would create a delegated DNS zone in Azure (with appropriate NS records, etc.).
Are there any logs from the dnsendpoint controller? That is the controller responsible for adding the NS records.
There should be a ParentLinkCreated condition on the DNSZone that indicates whether Hive was able to create the NS records pointing from the managed DNS zone to the cluster's base-domain zone.
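As a rough sketch of how to check that condition from a JSON dump of the DNSZone, something like the following should work. It assumes the usual Kubernetes layout of status.conditions as a list of objects with type and status fields; the reason and message values in the sample are made up for illustration and are not real Hive output.

```python
import json


def parent_link_status(dnszone):
    """Return the status of the ParentLinkCreated condition, or None if absent.

    Assumption: conditions follow the common Kubernetes shape, a list of
    objects under status.conditions with "type" and "status" keys.
    """
    conditions = dnszone.get("status", {}).get("conditions") or []
    for condition in conditions:
        if condition.get("type") == "ParentLinkCreated":
            return condition.get("status")
    return None


# Hypothetical DNSZone dump; only the condition type comes from this thread.
sample_dnszone = json.loads("""
{
  "kind": "DNSZone",
  "metadata": {"name": "cluster-zone", "namespace": "example-cluster"},
  "status": {
    "conditions": [
      {
        "type": "ParentLinkCreated",
        "status": "False",
        "reason": "ExampleReason",
        "message": "illustrative message only"
      }
    ]
  }
}
""")

print(parent_link_status(sample_dnszone))
```

A status other than "True" would suggest the NS records in the parent zone were never written, which is worth checking before digging into the zone-creation code.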
Ok it was my fault. At some point I messed up role assignments for the service principal which I used for DNS zone management. I'm sorry for the noise. Closing this one.