openshift / origin-metrics

Metrics are not gathered after 1.5 > 3.6 upgrade

ksemaev opened this issue

After upgrading from 1.5.1 to 3.6 by following this manual: https://docs.openshift.org/latest/install_config/upgrading/automated_upgrades.html#install-config-upgrading-automated-upgrades
I tried to upgrade metrics as described there: https://docs.openshift.org/latest/install_config/upgrading/automated_upgrades.html#automated-upgrading-cluster-metrics

Everything installed smoothly, and there are no warnings or errors in the events. The logs look good, but no metrics are gathered. The only error I can see is in the web console:

Metrics are not available.
An error occurred getting metrics for container container_name from https://metrics.elpass/hawkular/metrics.
Status code -1

https://metrics.elpass/hawkular/metrics is available.
Any idea how to debug this?

Additional info:

Inventory:
[OSEv3:vars]
openshift_metrics_install_metrics=true
openshift_metrics_image_version=v3.6.0
openshift_hosted_metrics_deployer_version=v3.6.0
openshift_metrics_hawkular_hostname=metrics.elpass
openshift_metrics_cassandra_storage_type=emptydir

oc get pods -n openshift-infra
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-m514l   1/1       Running   0          1d
hawkular-metrics-95mqx       1/1       Running   0          1d
heapster-t983l               1/1       Running   0          1d

git describe
openshift-ansible-3.6.173.0.18-1

oc version
oc v3.6.0+c4dd4cf
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Hi,
Can you show the services and routes?
oc get svc -n openshift-infra
oc get route -n openshift-infra

@mrGrab sure:

oc get svc -n openshift-infra
NAME                       CLUSTER-IP       EXTERNAL-IP   PORT(S)                               AGE
hawkular-cassandra         172.30.228.212   <none>        9042/TCP,9160/TCP,7000/TCP,7001/TCP   1d
hawkular-cassandra-nodes   None             <none>        9042/TCP,9160/TCP,7000/TCP,7001/TCP   1d
hawkular-metrics           172.30.234.234   <none>        443/TCP                               1d
heapster                   172.30.211.59    <none>        80/TCP                                1d
oc get route -n openshift-infra
NAME               HOST/PORT        PATH      SERVICES           PORT      TERMINATION   WILDCARD
hawkular-metrics   metrics.elpass             hawkular-metrics   <all>     reencrypt     None

It seems you have an issue with your own DNS server:

nslookup metrics.elpass 8.8.8.8
Server:		8.8.8.8
Address:	8.8.8.8#53

** server can't find metrics.elpass: NXDOMAIN

metrics.elpass should be resolvable from outside the OpenShift cluster.
Inside the cluster, use the service name hawkular-metrics (or hawkular-metrics.svc.openshift-infra.cluster.local).
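
To compare what each side actually resolves, a small check like the one below could be run once on a workstation and once inside a pod. It is only a sketch: the function name `resolves` and the injectable `lookup` parameter are illustrative choices, not part of any OpenShift tooling, and it uses whatever resolver the host is configured with rather than a specific DNS server.

```python
import socket


def resolves(hostname, lookup=socket.getaddrinfo):
    """Return the sorted unique addresses for `hostname`, or [] if the lookup fails.

    `lookup` is a hypothetical injection point so the function can be
    exercised without real DNS; by default it uses the system resolver.
    """
    try:
        results = lookup(hostname, None)
    except socket.gaierror:
        # The resolver could not find the name (for example NXDOMAIN).
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address as a string.
    return sorted({entry[4][0] for entry in results})
```

Calling `resolves("metrics.elpass")` and `resolves("hawkular-metrics")` from both locations shows which names each resolver knows about; an empty list means the name did not resolve there.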

@mrGrab of course you can't query Google's DNS about the .elpass domain; it is a private domain for our internal network.
As I said, in our network https://metrics.elpass/hawkular/metrics is available, and it shows that the service is started.

Inside the cluster, both hawkular-metrics.svc.openshift-infra.cluster.local and metrics.elpass resolve correctly.
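
Since name resolution appears to work, one way to narrow the problem further might be to request the URL from the machine where the browser runs and see whether the failure is at the DNS, TLS, or HTTP level. The sketch below is an assumption-laden helper, not something from the metrics tooling: `probe` and its `opener` parameter are hypothetical names, and it only labels the outcome of a single GET.

```python
import socket
import ssl
import urllib.error
import urllib.request


def probe(url, opener=urllib.request.urlopen, timeout=5):
    """Return a short label describing how a GET of `url` ends.

    `opener` is a hypothetical injection point; by default the standard
    library's urlopen is used with the system trust store.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return f"http {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return f"http {err.code}"
    except urllib.error.URLError as err:
        reason = err.reason
        if isinstance(reason, ssl.SSLError):
            # Includes certificate verification failures.
            return "tls error"
        if isinstance(reason, socket.gaierror):
            return "dns error"
        return f"connection error: {reason}"
    except OSError as err:
        # Anything else at the socket level, such as a read timeout.
        return f"connection error: {err}"
```

For example, `probe("https://metrics.elpass/hawkular/metrics")` returning "tls error" would point at the certificate presented by the route not being trusted by that machine, while an "http ..." label would mean the request reached the server.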

Created #377