Metrics are not gathered after 1.5 > 3.6 upgrade
ksemaev opened this issue · comments
After upgrading from 1.5.1 to 3.6 by following this manual: https://docs.openshift.org/latest/install_config/upgrading/automated_upgrades.html#install-config-upgrading-automated-upgrades I tried to upgrade metrics as described there: https://docs.openshift.org/latest/install_config/upgrading/automated_upgrades.html#automated-upgrading-cluster-metrics
Everything installed smoothly, and there are no warnings or errors in the events. The logs look good, but no metrics are gathered. I can see only one error in webcli:
Metrics are not available.
An error occurred getting metrics for container container_name from https://metrics.elpass/hawkular/metrics.
Status code -1
https://metrics.elpass/hawkular/metrics is available.
Any idea how to debug it?
Additional info:
Inventory:
[OSEv3:vars]
openshift_metrics_install_metrics=true
openshift_metrics_image_version=v3.6.0
openshift_hosted_metrics_deployer_version=v3.6.0
openshift_metrics_hawkular_hostname=metrics.elpass
openshift_metrics_cassandra_storage_type=emptydir
oc get pods -n openshift-infra
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-m514l   1/1       Running   0          1d
hawkular-metrics-95mqx       1/1       Running   0          1d
heapster-t983l               1/1       Running   0          1d
git describe
openshift-ansible-3.6.173.0.18-1
oc version
oc v3.6.0+c4dd4cf
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO
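One way to narrow down the "Status code -1" error above: as far as I know, that status in the web console generally means the browser never received an HTTP response at all (for example because it does not trust the certificate on the metrics route), but this is an assumption and not something confirmed in this thread. A minimal sketch to run from the same workstation as the browser; `explain_curl_exit` and `probe_url` are hypothetical helper names, and the numbers are curl's documented exit codes:

```shell
#!/bin/sh
# Map a curl exit code to a likely cause. The codes are curl's documented
# exit statuses; the wording of each hint is only a suggestion.
explain_curl_exit() {
    case "$1" in
        0)  echo "HTTP response received" ;;
        6)  echo "could not resolve host (DNS)" ;;
        7)  echo "could not connect (firewall, router, or service down)" ;;
        28) echo "operation timed out" ;;
        35) echo "TLS handshake failed" ;;
        60) echo "server certificate not trusted by this client" ;;
        *)  echo "other curl failure, see 'man curl' for exit code $1" ;;
    esac
}

# Probe a URL without -k so that certificate problems stay visible.
# probe_url is a hypothetical helper, not part of oc or OpenShift.
probe_url() {
    url=$1
    status=$(curl --silent --show-error --output /dev/null \
                  --max-time 10 --write-out '%{http_code}' "$url")
    rc=$?
    echo "curl exit $rc: $(explain_curl_exit "$rc") (HTTP status: ${status:-none})"
}

# Usage:
#   probe_url https://metrics.elpass/hawkular/metrics
```

If the probe ends with exit code 60 while the same request with `curl -k` succeeds, the certificate is the likely culprit, and opening the metrics URL directly in the browser to accept or install the certificate would be the next thing to try.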
Hi,
Can you show services and routes?
oc get svc -n openshift-infra
oc get route -n openshift-infra
@mrGrab sure:
oc get svc -n openshift-infra
NAME                       CLUSTER-IP       EXTERNAL-IP   PORT(S)                               AGE
hawkular-cassandra         172.30.228.212   <none>        9042/TCP,9160/TCP,7000/TCP,7001/TCP   1d
hawkular-cassandra-nodes   None             <none>        9042/TCP,9160/TCP,7000/TCP,7001/TCP   1d
hawkular-metrics           172.30.234.234   <none>        443/TCP                               1d
heapster                   172.30.211.59    <none>        80/TCP                                1d
oc get route -n openshift-infra
NAME               HOST/PORT        PATH      SERVICES           PORT      TERMINATION   WILDCARD
hawkular-metrics   metrics.elpass             hawkular-metrics   <all>     reencrypt     None
It seems you have an issue with your own DNS server:
nslookup metrics.elpass 8.8.8.8
Server: 8.8.8.8
Address: 8.8.8.8#53
** server can't find metrics.elpass: NXDOMAIN
metrics.elpass should be resolvable from outside the OpenShift cluster.
Inside the cluster, use the service name hawkular-metrics (or hawkular-metrics.svc.openshift-infra.cluster.local).
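To check resolution with the resolver the machines actually use, rather than a public server, something like the sketch below may help. Two assumptions: `getent` is available (it is on typical glibc-based Linux hosts), and the cluster domain is the usual default `cluster.local`. Kubernetes service names normally follow the pattern `<service>.<namespace>.svc.<cluster-domain>`, so the long name in parentheses above may have its `svc` and namespace labels swapped. `svc_fqdn` and `resolves` are hypothetical helper names:

```shell
#!/bin/sh
# Build the conventional in-cluster DNS name for a service:
#   <service>.<namespace>.svc.<cluster-domain>
# cluster.local is the usual default domain; pass a third argument to override.
svc_fqdn() {
    echo "$1.$2.svc.${3:-cluster.local}"
}

# Check a name with the system resolver (the one applications use),
# rather than an external server such as 8.8.8.8.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Usage, from a node or pod inside the cluster:
#   for name in metrics.elpass "$(svc_fqdn hawkular-metrics openshift-infra)"; do
#       if resolves "$name"; then echo "$name: ok"; else echo "$name: NOT resolved"; fi
#   done
```

Running the same loop on the workstation that opens the web console shows whether the browser side can resolve metrics.elpass as well.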
@mrGrab you can't query Google's DNS about the .elpass domain; it is a private domain for our internal network.
As I said, in our network https://metrics.elpass/hawkular/metrics is available, and it shows that the service is started.
Inside the cluster, both hawkular-metrics.svc.openshift-infra.cluster.local and metrics.elpass resolve correctly.
Created #377