V-I-C-T-O-R / flume-kudu-sink

flume-ng kudu sink

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

flume-kudu-sink

flume关于kudu的sink组件

更新点

  • 衔接json字符串转换source
  • 根据传输的Json字段创建表

具体用法

  • 项目根目录下执行mvn clean package
  • 将tar/flume.kudu.sink-1.0-SNAPSHOT.jar文件拷贝到flume-ng根目录下的lib文件夹中
  • 启动Flume:nohup ./bin/flume-ng agent --conf ./conf --conf-file log_t.conf --name agent -Dflume.root.logger=INFO,console > /dev/null 2>&1 &
注意

-Dflume.root.logger传递的是log4j.properties文件中的参数,例如如果想每个实例一个文件,那么传递-Dflume.log.file=haha.log

用法示例

agent.channels = fileChannel
agent.sources = sqlSource
agent.sinks = kuduSink

agent.sources.sqlSource.type = org.keedio.flume.source.SQLSource

agent.sources.sqlSource.hibernate.connection.url = jdbc:mysql://127.0.0.1:3306/test?autoReconnect=true

agent.sources.sqlSource.hibernate.connection.user = test 
agent.sources.sqlSource.hibernate.connection.password = test
agent.sources.sqlSource.hibernate.connection.autocommit = true
agent.sources.sqlSource.hibernate.dialect = org.hibernate.dialect.MySQL5Dialect
agent.sources.sqlSource.hibernate.connection.driver_class = com.mysql.jdbc.Driver
agent.sources.sqlSource.hibernate.temp.use_jdbc_metadata_defaults=false

agent.sources.sqlSource.table = items
# Columns to import to kafka (default * import entire row)
agent.sources.sqlSource.columns.to.select = *
# Query delay, each configured milisecond the query will be sent
agent.sources.sqlSource.run.query.delay= 300000
# Status file is used to save last readed row
agent.sources.sqlSource.status.file.path = /data/flume_status
agent.sources.sqlSource.status.file.name = items

# Custom query
agent.sources.sqlSource.start.from = 0 
agent.sources.sqlSource.source.transfer.method = bulk

agent.sources.sqlSource.batch.size = 3000
agent.sources.sqlSource.max.rows = 10000

agent.sources.sqlSource.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider
agent.sources.sqlSource.hibernate.c3p0.min_size=1
agent.sources.sqlSource.hibernate.c3p0.max_size=3

# The channel can be defined as follows.
agent.channels.fileChannel.type = file
agent.channels.fileChannel.checkpointDir = /data/flume_checkpont/items
agent.channels.fileChannel.dataDirs = /data/flume_datadirs/items
agent.channels.fileChannel.capacity = 200000000
agent.channels.fileChannel.keep-alive = 180
agent.channels.fileChannel.write-timeout = 180
agent.channels.fileChannel.checkpoint-timeout = 300
agent.channels.fileChannel.transactionCapacity = 1000000

agent.sources.sqlSource.channels = fileChannel
agent.sinks.kuduSink.channel = fileChannel

######配置kudu sink ##############################
agent.sinks.kuduSink.type = com.flume.sink.kudu.KuduSink
agent.sinks.kuduSink.masterAddresses = 127.0.0.1:7051,127.0.0.2:7051,127.0.0.3:7051
agent.sinks.kuduSink.customKey = itemid
agent.sinks.kuduSink.tableName = items
agent.sinks.kuduSink.batchSize = 100
agent.sinks.kuduSink.namespace = test
agent.sinks.kuduSink.producer.operation = upsert
agent.sinks.kuduSink.producer = com.flume.sink.kudu.JsonKuduOperationProducer

Configuration of KuDu Sink:

Property Name Default Description
masterAddresses - kudu master地址
type - sink类型
customKey - 指定的表的主键
tableName - kudu表名
batchSize 100 sink每批次处理数
namespace - kudu表前缀
producer.operation - kudu表插入操作类型
producer com.flume.sink.kudu.JsonKuduOperationProducer kudu数据操作具体执行类

About

flume-ng kudu sink


Languages

Language:Java 100.0%