graylog + compose for post-loading /var/log/messages* files from the centos 7 default logging config.

```sh
docker-compose -f graylog/docker-compose.yml up -d
./graylog/configure.sh
```
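configure.sh talks to the graylog REST API, which usually isn't up the instant `up -d` returns. a minimal retry sketch (the `wait_for` helper is hypothetical; admin:admin on port 9000 are the defaults assumed throughout this doc):

```sh
#!/usr/bin/env sh

# retry a command until it succeeds, sleeping between attempts
wait_for() {
    until "$@" >/dev/null 2>&1; do
        sleep 5
    done
}

# e.g. block until the api answers, then run the config script:
# wait_for curl -sf -u admin:admin http://127.0.0.1:9000/api/cluster
# ./graylog/configure.sh
```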
notes:
- had to use a pipeline because extractors can't set the timestamp (to the originally-logged time)
- graylog 3.0 content packs support pipelines and a bunch of other stuff, so this section may eventually simplify to "apply this content pack".
todo:
- multiline parsing for kernel splats and ooms.
- additional pipelines
Note: the pipeline rule configured in `configure.sh` needs to match the schema of the input.
assuming a directory structure like:

```
qa/vmware/manager/$HOSTNAME/messages.gz
prod/aws/worker/$HOSTNAME/messages.gz
```

do something like:
```sh
#!/usr/bin/env sh
node_id=$(curl -su admin:admin http://127.0.0.1:9000/api/cluster | jq -r '.[].node_id')

# glob depth matches the directory layout above; adjust it to your tree
for file in */*/*/*/*.gz; do
    echo "$file"
    infrastructure=$(echo "$file" | cut -d/ -f1)
    environment=$(echo "$file" | cut -d/ -f2)
    swarm_role=$(echo "$file" | cut -d/ -f3)

    # prefix each line with its line number and the path-derived fields,
    # rate-limit to 4MB/s with pv, and send to the raw tcp input on 5555
    tar -xOf "$file" var/log/messages \
        | nl -s" ${infrastructure} ${environment} ${swarm_role} " \
        | pv -brL 4M \
        | nc localhost 5555

    # wait until the journal drains to 10% or lower before the next file
    journal_utilization=$(curl -su admin:admin "http://127.0.0.1:9000/api/cluster/${node_id}/journal" \
        | jq '.journal_size/.journal_size_limit*100' | cut -d. -f1)
    while [ "$journal_utilization" -gt 10 ]; do
        echo "journal utilization ${journal_utilization}%. waiting for 10% or lower."
        sleep 10
        journal_utilization=$(curl -su admin:admin "http://127.0.0.1:9000/api/cluster/${node_id}/journal" \
            | jq '.journal_size/.journal_size_limit*100' | cut -d. -f1)
    done
done
```
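the pipeline rule has to parse lines in the shape the loop produces: `<lineno> <infrastructure> <environment> <swarm_role> <original syslog line>`. a quick local way to eyeball that shape, with made-up metadata values:

```sh
# simulate one extracted syslog line being prefixed the way the loop does
printf '%s\n' 'Oct  1 00:00:01 host1 sshd[123]: test' \
    | nl -s" qa vmware manager "
# emits something like:
#      1 qa vmware manager Oct  1 00:00:01 host1 sshd[123]: test
```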
notes:
- `lineno` is inserted to enable ordering of query results. neither syslog nor event-received timestamps are precise enough to reliably distinguish events arriving at 10kHz. using the file line number for ordering may be brittle across the transition between files from the same origin host.
- `pv` rate-limits to prevent drops due to journal overflow. beats can't introduce line numbers and the raw input doesn't support backpressure, so the script waits between files until the journal recovers to 10% before proceeding.
- tune the `pv` rate limit for your hardware if you like. journal depth can be monitored in the graylog web ui under System > Nodes > Details, or in a terminal:

  ```sh
  node_id=$(curl -su admin:admin http://127.0.0.1:9000/api/cluster | jq -r '.[].node_id')
  watch "curl -su admin:admin http://127.0.0.1:9000/api/cluster/${node_id}/journal | jq '100*.journal_size/.journal_size_limit'"
  ```

- the `pv` rate can be changed on a running instance with `pv -R $(ps h -o pid -C pv) -L ${NEW_RATE}M`
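the journal check in the script is just arithmetic over two fields of the journal endpoint's JSON; the jq expression can be sanity-checked offline against a canned response (the byte counts here are made up):

```sh
# hypothetical journal response: 2.5MB used of a 10MB limit
printf '%s\n' '{"journal_size":2500000,"journal_size_limit":10000000}' \
    | jq '.journal_size/.journal_size_limit*100' \
    | cut -d. -f1
# prints 25
```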
todo:
- develop an input strategy that supports realtime backpressure and file line numbering.
  - can beats read from a socket or named pipe?
  - or send the whole for loop through `pv`, monitor the journal, and slow `pv` down if the journal gets too deep
- test if one pipeline `set_fields` call is faster than individual `set_field` calls
- how hard is it to scale ES?

ui:
- http://localhost:9000
- login: `admin:admin`
why
- logstash supports sequencing internally: https://stackoverflow.com/a/49896736
- aimed kibana at the existing es with `docker run --link` and took it for a spin. for my simple use case the only area where the graylog UI beats kibana feature-wise is exporting queries. EDIT: graylog possibly beats logstash on pipeline development (rules can be tested in the UI).
research + test
- beats backpressure
  > For example, when the beats input encounters back pressure, it no longer accepts new connections and waits until the persistent queue has space to accept more events.

  ref: https://www.elastic.co/guide/en/logstash/current/persistent-queues.html#backpressure-persistent-queue
  - when does a new beats connection happen?
    - one connection per file line would backpressure nicely, but would be otherwise super socket-create intensive
    - one connection per (gigabytes-sized) file wouldn't backpressure nicely, but would be socket friendly
    - or maybe connections carry chunks of files?
  - does beats pause reading the input file, or does it read ahead into beats process memory?
do
- find elk compose
- convert pipeline rule to logstash filter