logstash-plugins / logstash-input-kinesis

Logstash Plugin for AWS Kinesis Input

ARN role is assumed for all purposes

sihil opened this issue

#65 and #66 changed the behaviour of the library to assume the ARN role for all purposes. This is certainly not the correct behaviour in all circumstances.

In our case we have many AWS accounts and they opt in to central ELK collection by creating a Kinesis stream and a role that can pull data off that stream. Tracking the stream consumption and progress is related to the ELK stack, not to the applications that are producing log data. As such it makes sense to create the DynamoDB tables and CloudWatch metrics in the main account and only assume the role when talking to the Kinesis API.

Whilst I assumed that this was the obvious way of doing things, I didn't take into account the use case that @autarchprinceps had. Unfortunately we are now stuck on an old version again because of the breaking change, so this should really be made configurable in some way.
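For illustration, a sketch of the plugin configuration for this setup. The stream name, application name, and account ID are made up, and the `dynamodb_role_arn` option does not exist — it is a hypothetical name for the configurability being requested here:

```
input {
  kinesis {
    kinesis_stream_name => "central-logs"
    application_name    => "elk-consumer"
    region              => "eu-west-1"
    # cross-account role; in the behaviour sihil describes this would be
    # assumed only for Kinesis API calls
    role_arn            => "arn:aws:iam::111111111111:role/log-reader"
    # hypothetical separate option for the DynamoDB/CloudWatch side:
    # dynamodb_role_arn => "arn:aws:iam::222222222222:role/lease-table"
  }
}
```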

For what it is worth, I wouldn't necessarily disapprove of having two roles: one for Kinesis, one for DynamoDB. If it's the same for us, fine, but that would make it configurable.
We do have a Kinesis stream as an input for Logstash collecting multiple AWS accounts' worth of logs as well, but if I understand you correctly, we have a single central Kinesis stream that all AWS accounts report to, whereas you have one in each account, correct? Because if I understand Kinesis correctly, you need one DynamoDB table per Kinesis stream and application. They are therefore actually tied together as a pair, so I too assumed that our way was the natural and default way to handle this.
I don't quite understand what connection you see between the applications producing logs and the DynamoDB table. The latter is needed by applications reading from the Kinesis stream, not those writing to it, and this is a Logstash plugin, so it runs in the ELK stack and therefore in that account.

That might be a sane approach. We use multiple streams because we've had issues in the past where log storms saturate the write rate of the Kinesis stream. We don't want one AWS account to be able to stop log entries from other AWS accounts being recorded. As such, each Kinesis stream is maintained and sized by the team feeding it with events.

By keeping the DynamoDB table outside of their account, there is no way we can inadvertently impact any of their applications by accidentally choosing the same table name for tracking data.

Can see pros and cons for either option.


Hi @sihil,

I ran into the exact same issue you had. Did you figure it out, or are you still using the old version? If so, which version are you using?


UPDATE

2.0.11 seems to be the latest version where Kinesis and DynamoDB/Cloudwatch are decoupled.
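If you need to stay on that release, the plugin can be pinned at install time (assuming a standard Logstash install directory):

```sh
bin/logstash-plugin install --version 2.0.11 logstash-input-kinesis
```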

In some cases the ARN role has only a READ policy for the Kinesis stream in the other AWS account, with no WRITE/READ policy for DynamoDB.
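Such a cross-account role might carry a read-only Kinesis policy along the lines of the following sketch (the stream ARN and account ID are illustrative). Note the absence of any `dynamodb:*` actions, which is exactly the situation described above:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:DescribeStream",
        "kinesis:GetShardIterator",
        "kinesis:GetRecords",
        "kinesis:ListShards"
      ],
      "Resource": "arn:aws:kinesis:ap-northeast-2:111111111111:stream/central-logs"
    }
  ]
}
```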

Spark Streaming supports configuring a separate credential for each purpose of the consuming process:

```scala
val builder = KinesisInputDStream.builder
  .streamingContext(ssc)
  .kinesisCredentials(kinesisCredentials)
  .dynamoDBCredentials(defaultCredentials)
  .cloudWatchCredentials(defaultCredentials)
  .endpointUrl("kinesis.ap-northeast-2.amazonaws.com")
  .regionName("ap-northeast-2")
```

I think Logstash and the DynamoDB table are tied as a pair for consuming from the Kinesis stream;
DynamoDB is only the offset (checkpoint) storage for Logstash.

@higee
I have used version https://rubygems.org/gems/logstash-input-kinesis/versions/2.0.11-java,
which uses the ARN role only for the Kinesis stream.