dianping / cat

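As a foundational component for server-side projects, CAT provides clients in Java, C/C++, Node.js, Python, Go, and other languages. It is deeply integrated into Meituan-Dianping's infrastructure middleware (MVC framework, RPC framework, database framework, cache framework, message queue, configuration system, etc.) and gives Meituan-Dianping's business lines rich performance metrics, health status, real-time alerting, and more.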

Log View is empty after deploying the latest master (corresponding to v4.0-RC1)

lxil520 opened this issue

Opening Log View shows the following:
Sorry, the message is not there. It could be missing or archived.

The server configuration is as follows:


<?xml version="1.0" encoding="utf-8"?>
<server-config>
   <server id="default">
      <properties>
         <property name="local-mode" value="false"/>
         <property name="job-machine" value="false"/>
         <property name="send-machine" value="false"/>
         <property name="alarm-machine" value="false"/>
         <property name="hdfs-machine" value="false"/>
         <property name="remote-servers" value="192.168.1.71:2281,192.168.1.72:2281,192.168.1.73:2281"/>
      </properties>
      <storage local-base-dir="/data/appdatas/cat/bucket/" max-hdfs-storage-time="15" local-report-storage-time="7" local-logivew-storage-time="7" har-mode="true" upload-thread="5">
         <hdfs id="logview" max-size="128M" server-uri="hdfs://192.168.1.71/" base-dir="user/cat/logview"/>
         <hdfs id="dump" max-size="128M" server-uri="hdfs://192.168.1.71/" base-dir="user/cat/dump"/>
         <hdfs id="remote" max-size="128M" server-uri="hdfs://192.168.1.71/" base-dir="user/cat/remote"/>
      </storage>
      <consumer>
         <long-config default-url-threshold="1000" default-sql-threshold="100" default-service-threshold="50">
            <domain name="cat" url-threshold="500" sql-threshold="500"/>
            <domain name="OpenPlatformWeb" url-threshold="100" sql-threshold="500"/>
         </long-config>
      </consumer>
   </server>
   <server id="192.168.1.71">
      <properties>
         <property name="job-machine" value="true"/>
         <property name="alarm-machine" value="true"/>
         <property name="send-machine" value="true"/>
      </properties>
   </server>
</server-config>

Stepping through locally with a breakpoint, I found a problem in the decoder: the version it reads comes out garbled.

Same issue, fixed ?

No, I haven't solved it yet.

property name="remote-servers" value="192.168.1.71:2281,192.168.1.72:2281,192.168.1.73:2281"
Shouldn't the port here be CAT's web port, 8080?
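
To see at a glance which ports a remote-servers value actually names, the string can be split on commas and colons. This is only a minimal sketch: `RemoteServers` and `ports` are hypothetical helpers written for this thread, not part of CAT, and the sketch makes no claim about which port is the correct one.

```java
import java.util.ArrayList;
import java.util.List;

public class RemoteServers {
	// Splits a "host:port,host:port" string (the shape of the remote-servers
	// value in the config above) and returns the port of each entry in order.
	static List<Integer> ports(String value) {
		List<Integer> result = new ArrayList<>();
		for (String entry : value.split(",")) {
			String trimmed = entry.trim();
			int colon = trimmed.lastIndexOf(':');
			if (colon < 0) {
				throw new IllegalArgumentException("missing port in entry: " + trimmed);
			}
			result.add(Integer.parseInt(trimmed.substring(colon + 1)));
		}
		return result;
	}

	public static void main(String[] args) {
		String value = "192.168.1.71:2281,192.168.1.72:2281,192.168.1.73:2281";
		System.out.println(ports(value)); // prints [2281, 2281, 2281]
	}
}
```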

Is this running on ARM? On ARM a few places need to be changed.

Yes. What needs to be changed?
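
One way to confirm which architecture the server JVM reports is the standard `os.arch` system property. The sketch below is illustrative only: `ArchCheck` and `isArm` are hypothetical names, and the prefix test assumes the usual values such as `aarch64`, `arm`, `amd64`, or `x86_64`, which can vary between JVMs.

```java
import java.util.Locale;

public class ArchCheck {
	// Returns true when the os.arch string looks like an ARM architecture.
	// Assumption: ARM JVMs report a value starting with "arm" or "aarch".
	static boolean isArm(String osArch) {
		String arch = osArch.toLowerCase(Locale.ROOT);
		return arch.startsWith("arm") || arch.startsWith("aarch");
	}

	public static void main(String[] args) {
		String arch = System.getProperty("os.arch");
		System.out.println(arch + " -> ARM: " + isArm(arch));
	}
}
```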

property name="remote-servers" value="192.168.1.71:2281,192.168.1.72:2281,192.168.1.73:2281" Shouldn't the port here be CAT's web port, 8080?

Yes, that part is fine.

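1. Upgrade the snappy package (needed on ARM):

<dependency>
   <groupId>org.xerial.snappy</groupId>
   <artifactId>snappy-java</artifactId>
   <version>1.1.10.3</version>
</dependency>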

Without this upgrade, no dump files are produced, because reading the files fails on ARM. Execution hangs at org.unidal.cat.message.storage.internals.DefaultBlock#createOutputSteam, so the logview cannot be stored.

2. Change com.dianping.cat.report.page.logview.service.LocalMessageService#buildNewReport to the following:
private String buildNewReport(ModelRequest request, ModelPeriod period, String domain, ApiPayload payload)
      throws Exception {
	String messageId = payload.getMessageId();
	boolean waterfall = payload.isWaterfall();
	MessageId id = MessageId.parse(messageId);
	ByteBuf buf = m_finderManager.find(id);
	MessageTree tree = null;

	// Try the finder manager first.
	if (buf != null) {
		tree = CodecHandler.decode(changeBuf(buf));
	}

	// Otherwise fall back to the local bucket for this domain, host, and hour.
	if (tree == null) {
		Bucket bucket = m_bucketManager.getBucket(id.getDomain(),
		      NetworkInterfaceManager.INSTANCE.getLocalHostAddress(), id.getHour(), false);

		if (bucket != null) {
			bucket.flush();

			ByteBuf data = bucket.get(id);

			if (data != null) {
				tree = CodecHandler.decode(changeBuf(data));
			}
		}
	}

	if (tree != null) {
		ByteBuf content = ByteBufAllocator.DEFAULT.buffer(8192);

		// Render as a waterfall only for transactions when requested; otherwise as HTML.
		if (tree.getMessage() instanceof Transaction && waterfall) {
			m_waterfall.encode(tree, content);
		} else {
			m_html.encode(tree, content);
		}

		try {
			content.readInt(); // get rid of length
			return content.toString(Charset.forName("utf-8"));
		} catch (Exception e) {
			// ignore it
		}
	}

	return null;
}

private ByteBuf changeBuf(ByteBuf data) {
	// Peek at the 4-byte length prefix without consuming it.
	data.markReaderIndex();
	int length = data.readInt();
	data.resetReaderIndex();

	// Copy the whole frame (length prefix plus payload) into a new buffer.
	ByteBuf readBytes = data.readBytes(length + 4);

	// Mark the start of the frame, then skip the prefix so decoding starts at the payload.
	readBytes.markReaderIndex();
	readBytes.readInt();
	return readBytes;
}

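The key part is changeBuf. The first 4 bytes of the buffer are taken up by the length prefix, but the latest master branch does not account for this and starts reading from byte 0, so decoding fails.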
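
The effect of the length prefix can be reproduced without Netty. The sketch below uses `java.nio.ByteBuffer` rather than the `ByteBuf` from the fix, and everything in it is an assumption made for illustration: the `[4-byte length][payload]` frame layout, the placeholder version tag `NT1`, and the helper names `frame`, `peek`, and `skipLength`. It shows why a version read from byte 0 looks garbled, and why skipping the prefix first gives the expected tag.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class LengthPrefixDemo {
	// Builds a frame laid out as [4-byte length][payload bytes].
	static ByteBuffer frame(String payload) {
		byte[] body = payload.getBytes(StandardCharsets.UTF_8);
		ByteBuffer buf = ByteBuffer.allocate(4 + body.length);
		buf.putInt(body.length);
		buf.put(body);
		buf.flip();
		return buf;
	}

	// Decodes the next n bytes as text without moving the caller's position.
	static String peek(ByteBuffer buf, int n) {
		byte[] head = new byte[n];
		buf.duplicate().get(head);
		return new String(head, StandardCharsets.UTF_8);
	}

	// Returns a view positioned just past the length prefix, limited to the payload.
	static ByteBuffer skipLength(ByteBuffer buf) {
		ByteBuffer view = buf.duplicate();
		int length = view.getInt();
		view.limit(view.position() + length);
		return view;
	}

	public static void main(String[] args) {
		ByteBuffer buf = frame("NT1-payload");
		// Reading from byte 0 picks up length bytes, not the version tag.
		System.out.println("NT1".equals(peek(buf, 3)));             // prints false
		// After skipping the prefix, the version tag is read correctly.
		System.out.println("NT1".equals(peek(skipLength(buf), 3))); // prints true
	}
}
```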

Not on ARM here: a server deployed on x64 Linux also shows "Sorry, the message is not there. It could be missing or archived". The server configuration file is as follows:

<?xml version="1.0" encoding="utf-8"?>
<server-config>
   <server id="default">
      <properties>
         <property name="local-mode" value="true"/>
         <property name="job-machine" value="true"/>
         <property name="send-machine" value="true"/>
         <property name="alarm-machine" value="true"/>
         <property name="hdfs-enable" value="false"/>
         <property name="remote-servers" value="127.0.0.1:8080"/>
      </properties>
      <storage local-base-dir="/data/appdatas/cat/bucket/" max-hdfs-storage-time="15" local-report-storage-time="7" local-logivew-storage-time="7" har-mode="true" upload-thread="5">
         <hdfs id="logview" max-size="128M" server-uri="hdfs://127.0.0.1/" base-dir="user/cat/logview"/>
         <hdfs id="dump" max-size="128M" server-uri="hdfs://127.0.0.1/" base-dir="user/cat/dump"/>
         <hdfs id="remote" max-size="128M" server-uri="hdfs://127.0.0.1/" base-dir="user/cat/remote"/>
      </storage>
      <consumer>
         <long-config default-url-threshold="1000" default-sql-threshold="100" default-service-threshold="50">
            <domain name="cat" url-threshold="500" sql-threshold="500"/>
            <domain name="OpenPlatformWeb" url-threshold="100" sql-threshold="500"/>
         </long-config>
      </consumer>
   </server>
   <server id="127.0.0.1">
      <properties>
         <property name="job-machine" value="true"/>
         <property name="alarm-machine" value="true"/>
         <property name="send-machine" value="true"/>
      </properties>
   </server>
</server-config>