aws / amazon-cloudwatch-logs-for-fluent-bit

A Fluent Bit output plugin for CloudWatch Logs

Cloudwatch plugin reception of string log causes Fluent Bit to crash

matthewfala opened this issue

Fluent Bit currently crashes if the CloudWatch Go plugin receives a plain string log record. This shouldn't affect the normal FireLens use case, because FireLens usually receives input from the Docker fluentd log driver, which always outputs an object rather than a raw string.

Here is my Fluent Bit configuration:

[SERVICE]
     Grace 30
     Log_Level trace

# Provide entry point for logs
[INPUT]
     Name http
     host 0.0.0.0
     port 8888
[OUTPUT]
     Name cloudwatch
     Match *
     log_stream_prefix x/
     log_group_name x/
     region us-west-2

Here is the input I send via HTTP request body to Fluent Bit:

POST http://localhost:8888/app.log

[
    "this is a small regular log."
]

Here is Fluent Bit's output:

Fluent Bit v1.9.0
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/10/28 22:58:47] [ info] Configuration:
[2021/10/28 22:58:47] [ info]  flush time     | 5.000000 seconds
[2021/10/28 22:58:47] [ info]  grace          | 30 seconds
[2021/10/28 22:58:47] [ info]  daemon         | 0
[2021/10/28 22:58:47] [ info] ___________
[2021/10/28 22:58:47] [ info]  inputs:
[2021/10/28 22:58:47] [ info]      http
[2021/10/28 22:58:47] [ info] ___________
[2021/10/28 22:58:47] [ info]  filters:
[2021/10/28 22:58:47] [ info] ___________
[2021/10/28 22:58:47] [ info]  outputs:
[2021/10/28 22:58:47] [ info]      cloudwatch.0
[2021/10/28 22:58:47] [ info] ___________
[2021/10/28 22:58:47] [ info]  collectors:
[2021/10/28 22:58:47] [ info] [engine] started (pid=24180)
[2021/10/28 22:58:47] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2021/10/28 22:58:47] [debug] [storage] [cio stream] new stream registered: http.0
[2021/10/28 22:58:47] [ info] [storage] version=1.1.4, initializing...
[2021/10/28 22:58:47] [ info] [storage] in-memory
[2021/10/28 22:58:47] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/10/28 22:58:47] [ info] [cmetrics] version=0.2.2
[2021/10/28 22:58:47] [ info] [input:http:http.0] listening on 0.0.0.0:8888
[2021/10/28 22:58:47] [debug] [cloudwatch:cloudwatch.0] created event channels: read=25 write=26
INFO[0000] [cloudwatch 0] plugin parameter log_group_name = '/x' 
INFO[0000] [cloudwatch 0] plugin parameter default_log_group_name = 'fluentbit-default' 
INFO[0000] [cloudwatch 0] plugin parameter log_stream_prefix = 'x/' 
INFO[0000] [cloudwatch 0] plugin parameter log_stream_name = '' 
INFO[0000] [cloudwatch 0] plugin parameter default_log_stream_name = '/fluentbit-default' 
INFO[0000] [cloudwatch 0] plugin parameter region = 'us-west-2' 
INFO[0000] [cloudwatch 0] plugin parameter log_key = '' 
INFO[0000] [cloudwatch 0] plugin parameter role_arn = '' 
INFO[0000] [cloudwatch 0] plugin parameter auto_create_group = 'false' 
INFO[0000] [cloudwatch 0] plugin parameter new_log_group_tags = '' 
INFO[0000] [cloudwatch 0] plugin parameter log_retention_days = '0' 
INFO[0000] [cloudwatch 0] plugin parameter endpoint = '' 
INFO[0000] [cloudwatch 0] plugin parameter sts_endpoint = '' 
INFO[0000] [cloudwatch 0] plugin parameter credentials_endpoint =  
INFO[0000] [cloudwatch 0] plugin parameter log_format = '' 
[2021/10/28 22:58:47] [trace] [router] input=http.0 tag=http.0
[2021/10/28 22:58:47] [debug] [router] match rule http.0:cloudwatch.0
[2021/10/28 22:58:47] [ info] [sp] stream processor started
[2021/10/28 22:58:52] [trace] [input:http:http.0 at build/plugins/in_http/CMakeFiles/flb-plugin-in_http.dir/compiler_depend.ts:49] new TCP connection arrived FD=30
[2021/10/28 22:58:52] [trace] [input:http:http.0 at build/plugins/in_http/CMakeFiles/flb-plugin-in_http.dir/compiler_depend.ts:79] read()=299 pre_len=0 now_len=299
[2021/10/28 22:58:56] [trace] [task 0x7fffb000a5a0] created (id=0)
[2021/10/28 22:58:56] [debug] [task] created task=0x7fffb000a5a0 id=0 OK
[2021/10/28 22:58:56] [trace] [GO] entering go_flush()
panic: interface conversion: interface {} is []uint8, not map[interface {}]interface {}

goroutine 17 [running, locked to thread]:
github.com/x/github.com/fluent/fluent-bit-go@v0.0.0-20201210173045-3fd1e0486df2/output/decoder.go:87 +0x2ea
main.FLBPluginFlushCtx(0x7fffb0007560, 0x7fffc43b9010, 0xc000000028, 0x7fffb000a710, 0x7ffff4339ca6)
        /home/x/amazon-cloudwatch-logs-for-fluent-bit/fluent-bit-cloudwatch.go:174 +0x1f2
main._cgoexpwrap_19a10b653c9e_FLBPluginFlushCtx(0x7fffb0007560, 0x7fffc43b9010, 0x28, 0x7fffb000a710, 0x7fffb0012860)
        _cgo_gotypes.go:90 +0x49

The easiest way to reproduce this may be to run Fluent Bit with the above config and send the following string payload with curl:

curl -X POST http://localhost:8888/app.log \
   -H 'Content-Type: application/json' \
   -d '["this is a small regular log."]'

The crash appears to be caused by the call to GetRecord() on the output plugin's input, which contains msgpack-encoded records from Fluent Bit's core.
https://github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/blob/mainline/fluent-bit-cloudwatch.go#L174

The scope of this problem is most likely all Fluent Bit Go plugins.

When fluent/fluent-bit-go/output/decoder.go#L87 decodes the record, a string log arrives as a byte slice ([]uint8). GetRecord() retrieves that value through the reflect Interface method and then type-asserts the result to the object type map[interface{}]interface{}. The Interface method is described as follows:

func (reflect.Value).Interface() (i interface{})

Interface returns v's current value as an interface{}. It is equivalent to:
var i interface{} = (v's underlying value)

It panics if the Value was obtained by accessing unexported struct fields.

Interface() itself succeeds here. It is the type assertion on its result that fails, because a []uint8 is not a map[interface{}]interface{}, and the resulting panic is what crashes Fluent Bit.

func GetRecord(dec *FLBDecoder) (ret int, ts interface{}, rec map[interface{}]interface{}) {
	var check error
	var m interface{}

	check = dec.mpdec.Decode(&m)
	if check != nil {
		return -1, 0, nil
	}

	slice := reflect.ValueOf(m)
	if slice.Kind() != reflect.Slice || slice.Len() != 2 {
		return -2, 0, nil
	}

	t := slice.Index(0).Interface()
	data := slice.Index(1)

	map_data := data.Interface().(map[interface{}]interface{})

	return 0, t, map_data
}

GetRecord(...) from fluent/fluent-bit-go package's /output/decoder.go
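
To make the failure mode concrete, here is a minimal sketch using only the standard library. It assumes, as the panic message above suggests, that a string log reaches the final assertion as a []uint8. The helper name asRecord is hypothetical and is not part of fluent-bit-go; it only shows that the two-value form of a type assertion reports a mismatch instead of panicking.

```go
package main

import "fmt"

// asRecord is a hypothetical helper (not part of fluent-bit-go). It performs
// the same type assertion as the last step of GetRecord, but in the two-value
// ("comma-ok") form, so a mismatch becomes an error rather than a panic.
func asRecord(data interface{}) (map[interface{}]interface{}, error) {
	rec, ok := data.(map[interface{}]interface{})
	if !ok {
		// %T prints the dynamic type, e.g. []uint8 for a string log.
		return nil, fmt.Errorf("record is %T, not map[interface {}]interface {}", data)
	}
	return rec, nil
}
```

Calling asRecord([]uint8("this is a small regular log.")) returns a non-nil error, whereas the single-value assertion used in GetRecord panics on the same input.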

There are several potential fixes to this problem:

  1. Update Fluent Bit's HTTP input plugin (and other similar plugins) so that strings are converted to objects upon reception.
  2. Update the GetRecord() function in the fluent/fluent-bit-go package's /output/decoder.go to accept strings and objects rather than just objects. GetRecord() would then return strings and objects.
    • This may be the closest solution to the Fluent Bit native plugins, as I think those plugins can receive msgpack string logs as well as object logs as input.
    • This would most likely require changing the type of the decoded input (map_data) that all Go plugins receive, unless the decoder returns some union-like type that stays backwards compatible with plugins expecting objects.
  3. Update GetRecord() to wrap strings in some kind of object such as {"value": <my_string>}
    • This may deviate from the C plugins, making this solution undesirable.
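
As a rough illustration of option 3, the sketch below wraps a non-object record in a map under the "value" key. Everything here is an assumption for discussion: normalizeRecord is a hypothetical helper, not an existing fluent-bit-go function, and it takes an already decoded [timestamp, record] pair rather than the decoder, so that the example depends only on the standard library.

```go
package main

import (
	"fmt"
	"reflect"
)

// normalizeRecord is a hypothetical helper (not part of fluent-bit-go). It
// takes one decoded [timestamp, record] pair and always returns the record as
// a map, wrapping string-like records as {"value": <record>}.
func normalizeRecord(m interface{}) (ts interface{}, rec map[interface{}]interface{}, err error) {
	entry := reflect.ValueOf(m)
	if entry.Kind() != reflect.Slice || entry.Len() != 2 {
		return nil, nil, fmt.Errorf("expected a [timestamp, record] pair, got %T", m)
	}

	ts = entry.Index(0).Interface()

	switch data := entry.Index(1).Interface().(type) {
	case map[interface{}]interface{}:
		// Object logs pass through unchanged.
		return ts, data, nil
	case []uint8, string:
		// String logs are wrapped so callers still receive a map.
		return ts, map[interface{}]interface{}{"value": data}, nil
	default:
		return nil, nil, fmt.Errorf("unsupported record type %T", data)
	}
}
```

With this shape, existing plugins that expect a map would keep working, at the cost of the "value" wrapper that the C plugins would not produce.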

We were seeing node failures in our Fargate clusters while on cloudwatch_logs and were told to use the cloudwatch plugin instead. Node stability has returned with the older plugin, and I'm wondering if this is the upstream issue to track for the newer variant.