[Bug]: The column config of the elasticsearch reader does not take effect
imzhf opened this issue
Contact Details
No response
What happened?
2022-10-24 08:36:11.922 [0-0-0-writer] ERROR WriterRunner - Writer Runner Received Exceptions:
com.wgzhao.addax.common.exception.AddaxException: NOT_MATCHED_COLUMNS - Your item column has 100 , but the record has 77
at com.wgzhao.addax.common.exception.AddaxException.asAddaxException(AddaxException.java:50)
at com.wgzhao.addax.plugin.writer.kafkawriter.KafkaWriter$Task.startWrite(KafkaWriter.java:154)
at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "taskGroup-0" com.wgzhao.addax.common.exception.AddaxException: NOT_MATCHED_COLUMNS - Your item column has 100 , but the record has 77
at com.wgzhao.addax.common.exception.AddaxException.asAddaxException(AddaxException.java:50)
at com.wgzhao.addax.plugin.writer.kafkawriter.KafkaWriter$Task.startWrite(KafkaWriter.java:154)
at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
at java.lang.Thread.run(Thread.java:748)
2022-10-24 08:36:14.748 [ job-0] ERROR JobContainer - Error running the scheduler.
2022-10-24 08:36:14.751 [ job-0] INFO StandAloneJobContainerCommunicator - Total 3 records, 7629 bytes | Speed 7.45KB/s, 3 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 0.00%
2022-10-24 08:36:14.751 [ job-0] INFO EsReader$Job - ============elasticsearch reader job destroy=================
com.wgzhao.addax.common.exception.AddaxException: NOT_MATCHED_COLUMNS - Your item column has 100 , but the record has 77
at com.wgzhao.addax.common.exception.AddaxException.asAddaxException(AddaxException.java:50)
at com.wgzhao.addax.plugin.writer.kafkawriter.KafkaWriter$Task.startWrite(KafkaWriter.java:154)
at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
at java.lang.Thread.run(Thread.java:748)
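The NOT_MATCHED_COLUMNS error above comes from a writer-side guard that compares the configured column count against the number of columns actually present on each record. A minimal, hypothetical sketch of that kind of check (illustrative only, not the actual Addax code):

```java
import java.util.List;

public class ColumnCountCheck {
    // Hypothetical stand-in for the writer-side guard: the configured
    // column list must match the record's actual column count exactly.
    static void checkColumns(List<String> configured, int recordColumns) {
        if (configured.size() != recordColumns) {
            throw new IllegalStateException(
                    "NOT_MATCHED_COLUMNS - Your item column has " + configured.size()
                            + " , but the record has " + recordColumns);
        }
    }

    public static void main(String[] args) {
        checkColumns(List.of("clueId", "brandId", "sbsVid"), 3); // counts match: OK
        try {
            checkColumns(List.of("clueId", "brandId", "sbsVid"), 2); // mismatch: throws
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With 100 columns configured but only 77 fields coming back from ES, this guard fires on the very first record, which matches the log above.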
Version
4.0.9 (Default)
OS Type
No response
Java JDK Version
Oracle JDK 1.8.0
Relevant log output
No response
Both the elasticsearch reader and the kafka writer are configured with 100 columns.
com.wgzhao.addax.common.exception.AddaxException: NOT_MATCHED_COLUMNS - Your item column has 100 , but the record has 77
Judging from the error message, only 77 fields were read from ES, not 100. You can manually check whether the field count meets the requirement.
ES is read with a wildcard across multiple indices; the indices have different fields, so the configured columns are the union of all of them.
I also took a quick look at the source code: the configured column setting is indeed never used in the code.
Right now it seems the reader simply fetches all fields of the index; there is no way to select only a few of the columns.
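A reader-side fix would be to project each hit's `_source` onto the configured column list instead of emitting every field. A minimal sketch of that projection, assuming the `_source` arrives as a `Map` (the names here are illustrative, not Addax APIs):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SourceProjection {
    // Hypothetical projection: keep only the configured columns, in order.
    // A field missing from a given index becomes null, so every record
    // has the same column count regardless of which index the hit came from.
    static List<Object> project(Map<String, Object> source, List<String> columns) {
        List<Object> row = new ArrayList<>(columns.size());
        for (String col : columns) {
            row.add(source.get(col));
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> hit = Map.of("clueId", 42, "brandId", 7, "extra", "x");
        List<Object> row = project(hit, List.of("clueId", "clueCreateddate", "brandId", "sbsVid"));
        System.out.println(row); // [42, null, 7, null]
    }
}
```

This keeps the record width equal to the configured column count even when wildcard-matched indices disagree on their fields, which is exactly the mismatch that triggers NOT_MATCHED_COLUMNS.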
If convenient, please share the JSON config file you used for testing so I can debug it locally.
There are quite a few dependencies involved; the JSON config alone cannot run without the index. A simple way to reproduce it: as long as the columns configured in the elasticsearch reader are fewer than the number of fields in the index, this exception will occur.
"reader": {
"name": "elasticsearchreader",
"parameter": {
"endpoint": "http://******:9200",
"accessId": "*****",
"accessKey": "*****",
"index": "clueeslog-*",
"type": "_doc",
"searchType": "dfs_query_then_fetch",
"headers": {},
"scroll": "3m",
"search": [
{
"query": {
"bool": {
"must": [
{
"range": {
"clueCreateddate": {
"gte": "2000-01-01 00:00:00.000",
"lt": "2021-01-01 00:00:00.000"
}
}
}
]
}
}
}
],
"column": [
"clueId",
"clueCreateddate",
"brandId",
"sbsVid"
]
}
},
The code does not handle this at the moment, but since native ES search statements are supported, a temporary workaround is to add the `_source` filter keyword inside the `search` field. For example, your JSON file can be changed to:
"reader": {
"name": "elasticsearchreader",
"parameter": {
"endpoint": "http://******:9200",
"accessId": "*****",
"accessKey": "*****",
"index": "clueeslog-*",
"type": "_doc",
"searchType": "dfs_query_then_fetch",
"headers": {},
"scroll": "3m",
"search": [
{
"query": {
"bool": {
"must": [
{
"range": {
"clueCreateddate": {
"gte": "2000-01-01 00:00:00.000",
"lt": "2021-01-01 00:00:00.000"
}
}
}
]
},
"_source": {
"include": ["clusterId", "cluCreateddate", "brandId", "sbsVid"]
}
}
}
],
"column": [
"clueId",
"clueCreateddate",
"brandId",
"sbsVid"
]
}
},
For the elasticsearch reader plugin, replace the file plugin/reader/elasticsearchreader/elasticsearchreader-<version>.jar with the attached one, then test again.