openshift / origin-metrics

Metrics Scale Up Details?

lpsantil opened this issue

One of my former customers is having a hard time enabling HA for metrics by following the guidance in the OCP docs and some other suggestions we found here in the issues list.

Specifically, the scale-up command given in the OCP docs does not seem accurate. It doesn't scale up the Cassandra pods, even though the docs' discussion of storage requirements implies that it should. Manually scaling up the Cassandra pods is not effective either, and deploying from the template with CASSANDRA_NODES=3 doesn't bring up a running metrics instance. The one thing we haven't tried yet is the Ansible installer method; maybe the installer has some magic in there? After scaling up to 3 Cassandra nodes, executing the following fails:

oc exec <cassandra_pod> -- cqlsh -e "ALTER KEYSPACE hawkular_metrics WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '2'} AND durable_writes = true;"
oc exec <cassandra_pod> -- nodetool repair -full
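A minimal sketch of scripting those two steps across all Cassandra pods, assuming the pods run in the `openshift-infra` namespace and carry the label `metrics-infra=hawkular-cassandra` (both are assumptions; confirm them with `oc get pods --show-labels`). `build_alter_cql` and `check_rf` are hypothetical helpers, and depending on the Cassandra image, `cqlsh` may need extra connection or TLS options that are not shown here. The schema change is issued once, while the full repair runs on every node, which is the usual practice after raising a keyspace's replication factor.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the CQL that sets the keyspace replication factor
# (SimpleStrategy, matching the command in the issue).
build_alter_cql() {
  local keyspace=$1 rf=$2
  printf "ALTER KEYSPACE %s WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '%s'} AND durable_writes = true;" "$keyspace" "$rf"
}

# Hypothetical safety check: refuse a replication factor outside 1..node_count.
check_rf() {
  local rf=$1 nodes=$2
  if (( rf < 1 || rf > nodes )); then
    echo "replication factor $rf is invalid for $nodes node(s)" >&2
    return 1
  fi
}

apply_rf() {
  set -euo pipefail
  # Assumed defaults; override via environment variables.
  local ns=${METRICS_NAMESPACE:-openshift-infra}
  local selector=${CASSANDRA_SELECTOR:-metrics-infra=hawkular-cassandra}
  local rf=${REPLICATION_FACTOR:-2}
  local pods pod
  read -r -a pods <<< "$(oc get pods -n "$ns" -l "$selector" -o jsonpath='{.items[*].metadata.name}')"
  check_rf "$rf" "${#pods[@]}"
  # A schema change propagates cluster-wide, so one node is enough.
  oc exec -n "$ns" "${pods[0]}" -- cqlsh -e "$(build_alter_cql hawkular_metrics "$rf")"
  # Full repair on every node so existing data is streamed to the new replicas.
  for pod in "${pods[@]}"; do
    oc exec -n "$ns" "$pod" -- nodetool repair -full hawkular_metrics
  done
}

# Only touch the cluster when explicitly asked to.
if [[ "${1:-}" == "apply" ]]; then
  apply_rf
fi
```

Usage, with a hypothetical file name: `REPLICATION_FACTOR=2 ./scale-cassandra.sh apply`. Running the script without `apply` only defines the functions, so it can be sourced and inspected safely first.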

There are also concerns that the oc exec commands have only an ephemeral effect and would need to be repeated in a recovery scenario.
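One way to make the procedure safe to repeat during recovery is to check the current setting first and re-apply it only when it differs. The sketch below assumes the same hypothetical namespace as above, that `cqlsh -e "DESCRIBE KEYSPACE hawkular_metrics"` works in the Cassandra pod, and that its output contains a `'replication_factor': 'N'` entry; `extract_rf` and `ensure_rf` are illustrative names, not part of origin-metrics.

```shell
#!/usr/bin/env bash
# Print the replication_factor found in `DESCRIBE KEYSPACE` output on stdin
# (prints nothing if no SimpleStrategy replication_factor is present).
extract_rf() {
  sed -n "s/.*'replication_factor': '\([0-9][0-9]*\)'.*/\1/p" | head -n 1
}

# Usage: ensure_rf <desired_rf> <pod> [<pod>...]
# Re-applies the replication factor and repairs each pod only if needed.
ensure_rf() {
  local ns=${METRICS_NAMESPACE:-openshift-infra}  # assumed namespace
  local desired=$1; shift
  local current pod
  current=$(oc exec -n "$ns" "$1" -- cqlsh -e "DESCRIBE KEYSPACE hawkular_metrics" | extract_rf)
  if [[ "$current" == "$desired" ]]; then
    echo "hawkular_metrics already has replication_factor $desired"
    return 0
  fi
  oc exec -n "$ns" "$1" -- cqlsh -e "ALTER KEYSPACE hawkular_metrics WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '$desired'} AND durable_writes = true;"
  for pod in "$@"; do
    oc exec -n "$ns" "$pod" -- nodetool repair -full hawkular_metrics
  done
}

# Only touch the cluster when explicitly asked to, e.g. `./ensure-rf.sh ensure 2 pod-a pod-b pod-c`.
if [[ "${1:-}" == "ensure" ]]; then
  set -euo pipefail
  shift
  ensure_rf "$@"
fi
```

Because Cassandra stores keyspace definitions in its own on-disk schema tables, the setting should survive pod restarts as long as the data directories are on persistent storage; a check like this mainly matters when the Cassandra data itself has been lost and recreated.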

Any ideas? I have a Portal ticket number with details for those with access.

Ping?

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-bot: Closing this issue.
