TopSpoofer / hbrdd

A library for bulk-importing data from Spark into HBase.


Exception after enabling checkpointing

hehuiyuan opened this issue · comments

18/08/22 15:50:41 ERROR Utils: Exception encountered
java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)

This is a Spark Streaming job:

stream.foreach { rdd => rdd.put2Hbase(....) }

The problem should be in the put2Hbase method: def put2Hbase(tableName: String)(implicit config: HbRddConfig)

which calls config.getHbaseConfig.

That method is implemented as: def getHbaseConfig = HBaseConfiguration.create(config)

Inside create, a Configuration conf = new Configuration() is constructed. It cannot be serialized, because Configuration does not implement Serializable.

What I don't understand is why the error only shows up with checkpointing: without a checkpoint everything works, but once checkpointing is enabled the exception is thrown.
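The checkpoint connection is that, with checkpointing enabled, Spark Streaming serializes the DStream graph (including the closures passed to foreach, and anything they capture, such as the implicit HbRddConfig) to the checkpoint directory with Java serialization. The failure mode itself can be reproduced without Spark or Hadoop; the sketch below uses a hypothetical FakeConfiguration and ConfigHolder as stand-ins for org.apache.hadoop.conf.Configuration and a wrapper that captures it:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for org.apache.hadoop.conf.Configuration:
// a class that does NOT implement java.io.Serializable.
class FakeConfiguration

// Stand-in for a config wrapper that holds the Configuration directly.
// Note: marking the holder Serializable is not enough.
class ConfigHolder(val conf: FakeConfiguration) extends Serializable

object SerializationDemo {
  // Attempt Java serialization and return the failure, if any.
  def trySerialize(obj: AnyRef): Option[Throwable] =
    try {
      val oos = new ObjectOutputStream(new ByteArrayOutputStream())
      oos.writeObject(obj)
      oos.close()
      None
    } catch { case e: Throwable => Some(e) }

  def main(args: Array[String]): Unit = {
    val failure = trySerialize(new ConfigHolder(new FakeConfiguration))
    // Java serialization walks the whole object graph, so the
    // non-serializable field fails even though the holder is Serializable.
    assert(failure.exists(_.isInstanceOf[NotSerializableException]))
    println("NotSerializableException reproduced")
  }
}
```

This mirrors the stack trace above: ObjectOutputStream.writeObject0 reaches the embedded Configuration and throws NotSerializableException.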

Solution:
object HbRddConfig {
  type configOption = (String, String)

  private[HbRddConfig] case class HbaseOption(name: String, value: String)

  def apply(config: Configuration): HbRddConfig = new HbRddConfig(config)

  def apply(configs: configOption*): HbRddConfig = {
    val hbConfig = HBaseConfiguration.create()

    for {
      option <- configs
      hbOption = HbaseOption(option._1, option._2) // the extra case class just makes the intent clearer
    } hbConfig.set(hbOption.name, hbOption.value)

    this.apply(hbConfig)
  }
}
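The idea behind the varargs apply is to let call sites describe the HBase settings as plain (key, value) string pairs, which are serializable, instead of passing a Configuration object around. A minimal, Hadoop-free sketch of that design (SketchConfig and materialize are hypothetical names; in the real library the materialized object would come from HBaseConfiguration.create()):

```scala
// Keep only serializable (key, value) pairs; build the actual
// configuration object lazily, where it is needed.
class SketchConfig(options: Seq[(String, String)]) extends Serializable {
  // A Map stands in for the Hadoop Configuration so the sketch
  // runs without Hadoop on the classpath.
  def materialize: Map[String, String] = options.toMap
}

object SketchConfig {
  def apply(options: (String, String)*): SketchConfig = new SketchConfig(options)
}

object SketchDemo {
  def main(args: Array[String]): Unit = {
    val cfg = SketchConfig(
      "hbase.zookeeper.quorum" -> "zk1,zk2,zk3",
      "hbase.zookeeper.property.clientPort" -> "2181"
    )
    assert(cfg.materialize("hbase.zookeeper.quorum") == "zk1,zk2,zk3")
    println("config materialized from serializable pairs")
  }
}
```

Because SketchConfig holds only strings, checkpointing a closure that captures it serializes cleanly.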