wgzhao / Addax

Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.

Home Page: https://wgzhao.github.io/Addax/


[Bug]: The column config of the elasticsearch reader does not take effect

imzhf opened this issue · comments

commented

Contact Details

No response

What happened?

2022-10-24 08:36:11.922 [0-0-0-writer] ERROR WriterRunner - Writer Runner Received Exceptions:
com.wgzhao.addax.common.exception.AddaxException: NOT_MATCHED_COLUMNS - Your item column has 100 , but the record has 77
at com.wgzhao.addax.common.exception.AddaxException.asAddaxException(AddaxException.java:50)
at com.wgzhao.addax.plugin.writer.kafkawriter.KafkaWriter$Task.startWrite(KafkaWriter.java:154)
at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "taskGroup-0" com.wgzhao.addax.common.exception.AddaxException: NOT_MATCHED_COLUMNS - Your item column has 100 , but the record has 77
at com.wgzhao.addax.common.exception.AddaxException.asAddaxException(AddaxException.java:50)
at com.wgzhao.addax.plugin.writer.kafkawriter.KafkaWriter$Task.startWrite(KafkaWriter.java:154)
at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
at java.lang.Thread.run(Thread.java:748)
2022-10-24 08:36:14.748 [ job-0] ERROR JobContainer - Error running scheduler.
2022-10-24 08:36:14.751 [ job-0] INFO StandAloneJobContainerCommunicator - Total 3 records, 7629 bytes | Speed 7.45KB/s, 3 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 0.00%
2022-10-24 08:36:14.751 [ job-0] INFO EsReader$Job - ============elasticsearch reader job destroy=================
com.wgzhao.addax.common.exception.AddaxException: NOT_MATCHED_COLUMNS - Your item column has 100 , but the record has 77
at com.wgzhao.addax.common.exception.AddaxException.asAddaxException(AddaxException.java:50)
at com.wgzhao.addax.plugin.writer.kafkawriter.KafkaWriter$Task.startWrite(KafkaWriter.java:154)
at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
at java.lang.Thread.run(Thread.java:748)

Version

4.0.9 (Default)

OS Type

No response

Java JDK Version

Oracle JDK 1.8.0

Relevant log output

No response

commented

Both the elasticsearch reader and the kafka writer are configured with 100 columns.

com.wgzhao.addax.common.exception.AddaxException: NOT_MATCHED_COLUMNS - Your item column has 100 , but the record has 77

Judging from this error message, only 77 fields were read from ES, not 100. You can manually check whether the number of fields meets the requirement.
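The error is a simple width check in the writer: the configured column list is compared against the number of fields actually present in each record. Below is a minimal Python sketch of that check; the function name and shape are illustrative, not the actual Addax API (the real check lives in KafkaWriter.Task.startWrite).

```python
# Illustrative sketch (not Addax code) of the column-count check
# behind NOT_MATCHED_COLUMNS: if the configured column list and the
# record width differ, the writer aborts instead of guessing a mapping.

def check_columns(configured_columns, record_fields):
    """Pair configured column names with record values, or raise on a width mismatch."""
    if len(configured_columns) != len(record_fields):
        raise ValueError(
            f"NOT_MATCHED_COLUMNS - Your item column has "
            f"{len(configured_columns)} , but the record has {len(record_fields)}"
        )
    return dict(zip(configured_columns, record_fields))

# 100 configured columns, but only 77 fields came back from ES -> error,
# matching the stack trace in this issue.
columns = [f"col{i}" for i in range(100)]
fields = [f"v{i}" for i in range(77)]
try:
    check_columns(columns, fields)
except ValueError as e:
    print(e)
```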

commented

ES is read with a wildcard pattern matching multiple indices, and the fields differ between indices, so the configured column list is the union of all fields.

commented

I also took a quick look at the source code: the configured column list is indeed never used anywhere.

commented

So currently it just reads every field of the index; if I only want a few of the columns, that is not possible either.

If convenient, please provide the test JSON configuration file so I can debug it locally.

commented

There are quite a few related dependencies; with only the JSON config file and without the indices, it still won't run. Here is a simple way to reproduce it: as long as the number of columns configured in the elasticsearch reader does not match the number of fields in the index, this exception occurs.

commented
"reader": {
    "name": "elasticsearchreader",
    "parameter": {
        "endpoint": "http://******:9200",
        "accessId": "*****",
        "accessKey": "*****",
        "index": "clueeslog-*",
        "type": "_doc",
        "searchType": "dfs_query_then_fetch",
        "headers": {},
        "scroll": "3m",
        "search": [
            {
                "query": {
                    "bool": {
                        "must": [
                            {
                                "range": {
                                    "clueCreateddate": {
                                        "gte": "2000-01-01 00:00:00.000",
                                        "lt": "2021-01-01 00:00:00.000"
                                    }
                                }
                            }
                        ]
                    }
                }
            }
        ],
        "column": [
            "clueId",
            "clueCreateddate",
            "brandId",
            "sbsVid"
        ]
    }
},

The current code indeed does not handle this. However, since the plugin supports Elasticsearch's native search DSL, a temporary workaround is to add the `_source` filter keyword to the search body. For example, your JSON file can be changed to:

"reader": {
    "name": "elasticsearchreader",
    "parameter": {
        "endpoint": "http://******:9200",
        "accessId": "*****",
        "accessKey": "*****",
        "index": "clueeslog-*",
        "type": "_doc",
        "searchType": "dfs_query_then_fetch",
        "headers": {},
        "scroll": "3m",
        "search": [
            {
                "query": {
                    "bool": {
                        "must": [
                            {
                                "range": {
                                    "clueCreateddate": {
                                        "gte": "2000-01-01 00:00:00.000",
                                        "lt": "2021-01-01 00:00:00.000"
                                    }
                                }
                            }
                        ]
                    }
                },
                "_source": {
                    "include": ["clueId", "clueCreateddate", "brandId", "sbsVid"]
                }
            }
        ],
        "column": [
            "clueId",
            "clueCreateddate",
            "brandId",
            "sbsVid"
        ]
    }
},

For the elasticsearch reader plugin, you can replace the plugin/reader/elasticsearchreader/elasticsearchreader-<version>.jar file with the attached file and then test again.

elasticsearchreader-4.0.11-SNAPSHOT.jar.gz
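To illustrate what the `_source` include filter accomplishes, here is a small Python sketch (a hypothetical helper, not part of Addax or the Elasticsearch client) that applies the same include list to a hit's source document: only the listed fields survive, so the record width ends up matching the reader's configured column list.

```python
# Illustrative sketch (not Addax code): applying a _source-style
# include list to a hit keeps only the named fields, in list order,
# so the record width matches the configured column count.

def filter_source(hit_source, include):
    """Return only the fields named in `include`, preserving that order."""
    return {field: hit_source[field] for field in include if field in hit_source}

hit = {
    "clueId": "c-001",
    "clueCreateddate": "2020-05-01 00:00:00.000",
    "brandId": 7,
    "sbsVid": "v42",
    "extraField": "dropped",   # stands in for the other unconfigured fields
}
include = ["clueId", "clueCreateddate", "brandId", "sbsVid"]
record = filter_source(hit, include)
print(len(record))  # 4 fields, matching the 4 configured columns
```

With the filter applied server-side via `_source`, Elasticsearch returns only these fields per hit, which is why the NOT_MATCHED_COLUMNS check no longer fires.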