logstash-plugins / logstash-input-kinesis

Logstash Plugin for AWS Kinesis Input

ARN role is assumed for all purposes

sihil opened this issue

#65 and #66 changed the behaviour of the library to assume the ARN role for all purposes. This is certainly not the correct behaviour in all circumstances.

In our case we have many AWS accounts and they opt in to central ELK collection by creating a Kinesis stream and a role that can pull data off that stream. Tracking the stream consumption and progress is related to the ELK stack, not to the applications that are producing log data. As such it makes sense to create the DynamoDB tables and CloudWatch metrics in the main account and only assume the role when talking to the Kinesis API.

Whilst I assumed that this was the obvious way of doing things, I didn't take into account the use case that @autarchprinceps had. Unfortunately we are now stuck on an old version again because of the breaking change, so this should really be made configurable in some way.
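For illustration, a sketch of the plugin configuration for this setup. The stream name, application name, and account ID are made up, and the `dynamodb_role_arn` option does not exist — it is a hypothetical name for the configurability being requested here:

```
input {
  kinesis {
    kinesis_stream_name => "central-logs"
    application_name    => "elk-consumer"
    region              => "eu-west-1"
    # cross-account role; in the behaviour sihil describes this would be
    # assumed only for Kinesis API calls
    role_arn            => "arn:aws:iam::111111111111:role/log-reader"
    # hypothetical separate option for the DynamoDB/CloudWatch side:
    # dynamodb_role_arn => "arn:aws:iam::222222222222:role/lease-table"
  }
}
```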

For what it is worth, I wouldn't necessarily disapprove of having two roles: one for Kinesis, one for DynamoDB. If it's the same for us, fine, but that would make it configurable.
We do have a Kinesis stream as an input for Logstash collecting multiple AWS accounts' worth of logs as well, but if I understand you correctly, we have a single central Kinesis stream that all AWS accounts report to, whereas you have one in each account, correct? Because if I understand Kinesis correctly, you need one DynamoDB table per Kinesis stream and application. They are therefore actually tied together as a pair, so I too assumed that our way was the natural and default way to handle this.
I don't quite understand what connection you see between the applications producing logs and the DynamoDB table. The latter is needed by applications reading from the Kinesis stream, not those writing to it, and this is a Logstash plugin, so it runs in the ELK stack and therefore in that account.

That might be a sane approach. We use multiple streams because we've had issues in the past where log storms saturate the write rate of the Kinesis stream. We don't want one AWS account to be able to stop log entries from other AWS accounts being recorded. As such, each Kinesis stream is maintained and sized by the team feeding it with events.

By keeping the DynamoDB table outside of their account, there is no way we can inadvertently impact any of their applications by accidentally choosing the same table name for tracking data.

Can see pros and cons for either option.


Hi @sihil,

I ran into the exact same issue you had. Did you figure it out, or are you still using the old version? If so, which version are you using?


UPDATE

2.0.11 seems to be the latest version where Kinesis and DynamoDB/Cloudwatch are decoupled.
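If you need to stay on that release, the plugin can be pinned at install time (assuming a standard Logstash install directory):

```sh
bin/logstash-plugin install --version 2.0.11 logstash-input-kinesis
```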

In some cases the ARN role has only a READ policy for the Kinesis stream in the other AWS account, with no WRITE/READ policy for DynamoDB.
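Such a cross-account role might carry a read-only Kinesis policy along the lines of the following sketch (the stream ARN and account ID are illustrative). Note the absence of any `dynamodb:*` actions, which is exactly the situation described above:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:DescribeStream",
        "kinesis:GetShardIterator",
        "kinesis:GetRecords",
        "kinesis:ListShards"
      ],
      "Resource": "arn:aws:kinesis:ap-northeast-2:111111111111:stream/central-logs"
    }
  ]
}
```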

Spark Streaming supports configuring a separate credential for each purpose of the consuming process:

```scala
val builder = KinesisInputDStream.builder
  .streamingContext(ssc)
  .kinesisCredentials(kinesisCredentials)
  .dynamoDBCredentials(defaultCredentials)
  .cloudWatchCredentials(defaultCredentials)
  .endpointUrl("kinesis.ap-northeast-2.amazonaws.com")
  .regionName("ap-northeast-2")
```

I think Logstash and the DynamoDB table are tied as a pair for consuming from the Kinesis stream;
DynamoDB is only the offset (checkpoint) storage for Logstash.

@higee
I have used version https://rubygems.org/gems/logstash-input-kinesis/versions/2.0.11-java,
which uses the ARN role only for the Kinesis stream.