Unable to aggregate logs using logstash scheduler
nd-roy opened this issue
Hello,
I tried to launch Logstash as a Marathon app, but it does not seem to aggregate the logs from all nodes.
Mesosphere DCOS: v.1.6.1
Mesos: 0.27.1
Is there any information that could help me solve this problem?
Thank you
My configuration
{
  "id": "/logstash",
  "cpus": 1,
  "mem": 1024.0,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "mesos/logstash-scheduler:0.10-RC1",
      "network": "HOST"
    }
  },
  "env": {
    "MESOS_ZOOKEEPER_SERVER": "int.host:2181",
    "MESOS_MASTER": "host",
    "FRAMEWORK_NAME": "logstash",
    "MESOS_ROLE": "logstash",
    "MESOS_USER": "root",
    "LOGSTASH_HEAP_SIZE": "64",
    "LOGSTASH_ELASTICSEARCH_URL": "my-els-server",
    "EXECUTOR_CPUS": "0.5",
    "EXECUTOR_HEAP_SIZE": "128",
    "ENABLE_COLLECTD": "false",
    "ENABLE_SYSLOG": "true",
    "ENABLE_FILE": "true",
    "ENABLE_DOCKER": "true",
    "EXECUTOR_FILE_PATH": "/var/log/*, $MESOS_WORK_DIR/slaves/*/frameworks/*/executors/*/runs/*/stdout, $MESOS_WORK_DIR/slaves/*/frameworks/*/executors/*/runs/*/stderr"
  }
}
stdout
--container="mesos-f07f4be9-67be-47ed-8af5-75b7077b3223-S1.84a0a4b3-af1c-4623-9f87-d9131de99fe2" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --launcher_dir="/opt/mesosphere/packages/mesos--b012cc908778011b3c6b09b1ebaa06f5e0a93ccd/libexec/mesos" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/var/lib/mesos/slave/slaves/f07f4be9-67be-47ed-8af5-75b7077b3223-S1/frameworks/f07f4be9-67be-47ed-8af5-75b7077b3223-0000/executors/logstash.bf6cec48-f693-11e5-8707-0242bbe76e2d/runs/84a0a4b3-af1c-4623-9f87-d9131de99fe2" --stop_timeout="0ns"
--container="mesos-f07f4be9-67be-47ed-8af5-75b7077b3223-S1.84a0a4b3-af1c-4623-9f87-d9131de99fe2" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --launcher_dir="/opt/mesosphere/packages/mesos--b012cc908778011b3c6b09b1ebaa06f5e0a93ccd/libexec/mesos" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/var/lib/mesos/slave/slaves/f07f4be9-67be-47ed-8af5-75b7077b3223-S1/frameworks/f07f4be9-67be-47ed-8af5-75b7077b3223-0000/executors/logstash.bf6cec48-f693-11e5-8707-0242bbe76e2d/runs/84a0a4b3-af1c-4623-9f87-d9131de99fe2" --stop_timeout="0ns"
Registered docker executor on 10.0.0.17
Starting task logstash.bf6cec48-f693-11e5-8707-0242bbe76e2d
|\ /|
| \ / |
| / \ |
|/ \|
/ \ /|
/ \ / | .____ __ .__
\ | | | | ____ ____ _______/ |______ _____| |__
\ | | | | / _ \ / ___\/ ___/\ __\__ \ / ___/ | \
| | / | |__( <_> ) /_/ >___ \ | | / __ \_\___ \| Y \
| |/ |_______ \____/\___ /____ > |__| (____ /____ >___| /
| / \/ /_____/ \/ \/ \/ \/
|/ :: Running Spring Boot 0.1.0 ::
2016-03-30 16:23:35.356 INFO 1 --- [ main] o.a.m.logstash.scheduler.Application : Starting Application v0.1.0 on ip-10-0-0-17.us-west-2.compute.internal with PID 1 (/tmp/logstash-mesos-scheduler.jar started by root in /)
2016-03-30 16:23:35.361 INFO 1 --- [ main] o.a.m.logstash.scheduler.Application : No active profile set, falling back to default profiles: default
2016-03-30 16:23:39.117 INFO 1 --- [ main] o.a.m.logstash.scheduler.Application : Started Application in 4.775 seconds (JVM running for 5.236)
stderr
I0330 16:23:33.798660 32152 exec.cpp:134] Version: 0.27.1
I0330 16:23:33.800987 32180 exec.cpp:208] Executor registered on slave f07f4be9-67be-47ed-8af5-75b7077b3223-S1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/logstash-mesos-scheduler.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/tmp/logstash-mesos-scheduler.jar!/lib/logback-classic-1.1.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
2016-03-30 16:23:38,824:1(0x7fb815bff700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2016-03-30 16:23:38,825:1(0x7fb815bff700):ZOO_INFO@log_env@716: Client environment:host.name=ip-10-0-0-17.us-west-2.compute.internal
2016-03-30 16:23:38,825:1(0x7fb815bff700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2016-03-30 16:23:38,825:1(0x7fb815bff700):ZOO_INFO@log_env@724: Client environment:os.arch=4.1.7-coreos-r1
2016-03-30 16:23:38,825:1(0x7fb815bff700):ZOO_INFO@log_env@725: Client environment:os.version=#2 SMP Thu Nov 5 02:10:23 UTC 2015
2016-03-30 16:23:38,825:1(0x7fb815bff700):ZOO_INFO@log_env@733: Client environment:user.name=(null)
2016-03-30 16:23:38,825:1(0x7fb815bff700):ZOO_INFO@log_env@741: Client environment:user.home=/root
2016-03-30 16:23:38,825:1(0x7fb815bff700):ZOO_INFO@log_env@753: Client environment:user.dir=/
2016-03-30 16:23:38,825:1(0x7fb815bff700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=int.host:2181 sessionTimeout=1000 watcher=0x7fb81b4ad600 sessionId=0 sessionPasswd=<null> context=0x7fb804001ab0 flags=0
2016-03-30 16:23:38,890:1(0x7fb8113f6700):ZOO_INFO@check_events@1703: initiated connection to server [10.0.7.235:2181]
2016-03-30 16:23:38,893:1(0x7fb8113f6700):ZOO_INFO@check_events@1750: session establishment complete on server [10.0.7.235:2181], sessionId=0x353c1e9bc07000e, negotiated timeout=4000
Hi @AbdoulNdiaye. Thank you very much for your feedback. I've been experimenting a little with your parameters.
The issue is that the scheduler isn't registering successfully with Mesos. When registration does succeed, you'll see the following lines in STDOUT:
c.c.mesos.scheduler.UniversalScheduler : Framework registrered with frameworkId=37a00eb7-d7f4-4fe3-b31f-c1fe638fccb1-0001
c.c.m.s.state.StateRepositoryZookeeper : Received frameworkId=37a00eb7-d7f4-4fe3-b31f-c1fe638fccb1-0001
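As a rough sketch (not part of the framework), the registration message above can be searched for in a saved copy of the scheduler's STDOUT. The function name `find_framework_id` is hypothetical, and the pattern assumes the message wording shown above, accepting both the "registrered" spelling from the log and "registered" in case other versions spell it differently.

```python
import re

# Matches the scheduler's registration message. The optional "r" covers the
# "registrered" spelling seen in the log line as well as "registered".
REGISTERED = re.compile(r"Framework regist(?:r)?ered with frameworkId=(\S+)")


def find_framework_id(log_text):
    """Return the framework ID from scheduler output, or None if the
    registration message never appears (hypothetical helper)."""
    match = REGISTERED.search(log_text)
    return match.group(1) if match else None
```

If this returns None for the whole STDOUT, as it would for the output posted in this issue, the scheduler most likely never registered with the Mesos master.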
The first thing I noticed is that you're missing a port on your MESOS_MASTER. When I try without a port, I get the following parsing error:
c.c.mesos.scheduler.UniversalScheduler : Received error: Failed to create a master detector for '172.16.33.20': Failed to parse '172.16.33.20'
So I'm assuming it's a copy/paste thing?
Lastly, I tried with an inaccessible Mesos master, which came very close to the behaviour you're seeing. So my conclusion is: please check the following things:
- Make sure you have a port defined on your MESOS_MASTER environment variable, in the following format: host:5050
- Make sure the Mesos master host is actually resolvable and accessible. On DCOS I believe you can use mesos.leader?
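Both checks can be sketched in a few lines of Python, to be run from the node where the scheduler is deployed. This is only an illustration under assumptions: the helper names `split_host_port` and `is_reachable` are hypothetical, and a successful TCP connection only shows that something is listening on that port, not that it is a healthy Mesos master.

```python
import socket


def split_host_port(value):
    """Split a 'host:port' string into (host, port).

    Raises ValueError when the port is missing or not a valid number,
    which is the situation with a bare host or IP address.
    """
    host, sep, port = value.rpartition(":")
    if not sep or not host or not (port.isascii() and port.isdigit()):
        raise ValueError(f"expected host:port, got {value!r}")
    port_number = int(port)
    if not 0 < port_number < 65536:
        raise ValueError(f"port out of range in {value!r}")
    return host, port_number


def is_reachable(host, port, timeout=3.0):
    """Return True if the host resolves and accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failures, refused connections and timeouts.
        return False
```

For example, `split_host_port("host:5050")` gives `("host", 5050)`, while a value without a port raises `ValueError`. Passing the resulting pair to `is_reachable` then tells you whether the master can be contacted from that node.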
@AbdoulNdiaye If you still have issues, please reopen this report. Thanks.