keedio / flume-ng-sql-source

Flume Source to import data from SQL Databases

[ERROR - org.keedio.flume.source.SQLSourceHelper.getStatusFileIndex(SQLSourceHelper.java:231)]

randidwiputra opened this issue

Exception reading status file, doing back up and creating new status file
Unexpected exception at position -1: null
	at org.keedio.flume.source.SQLSourceHelper.checkJsonValues(SQLSourceHelper.java:244)
	at org.keedio.flume.source.SQLSourceHelper.getStatusFileIndex(SQLSourceHelper.java:227)
	at org.keedio.flume.source.SQLSourceHelper.<init>(SQLSourceHelper.java:105)
	at org.keedio.flume.source.SQLSource.configure(SQLSource.java:66)
	at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
	at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:331)
	at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102)
	at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Hi randidwiputra,
The exception seems to be thrown while parsing the JSON, in checkJsonValues(), with a log.error trace that helps identify the problem, but I can't see it in your copy-pasted error. So please:

  • Which version of flume-sql are you using?
  • Could you copy and paste the content of the status file (JSON)?
  • When "Exception reading status file, doing back up and creating new status file" is caught, a backup status file should be created in the same path, named something like "sql1.status.bak.1517992774076"; could you please copy and paste it?
  • I am trying to reproduce such an exception with the latest flume-sql release but have not been successful. In the snapshot I force the "reading status file, doing backup..." exception by setting statusFileJsonMap to null and configuring "configuredStartValue" to -1, but for the value -1 the helper makes SQL fetch rows from position 0 in the table, so the "LastIndex" value does not seem to be the problem. Setting statusFileJsonMap to null throws an NPE, as expected.
    (screenshot: 2018-02-07 10:07:53)

best, Luis
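To make the recovery path concrete, here is a minimal sketch of the behavior that log message describes (parse failure → timestamped .bak file → fresh status file). It assumes Jackson for the JSON handling and uses hypothetical class and method names; it is not the actual SQLSourceHelper code.

```java
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.io.IOException;
import java.util.Map;

public class StatusFileRecovery {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    /** Load the JSON status file; on parse failure, back it up and start fresh. */
    @SuppressWarnings("unchecked")
    public static Map<String, Object> loadOrReset(File statusFile) throws IOException {
        try {
            // Happy path: the file holds a JSON object such as
            // {"Query":"...","LastIndex":"20000","SourceName":"sql1","URL":"..."}
            return MAPPER.readValue(statusFile, Map.class);
        } catch (IOException parseFailure) {
            // "Exception reading status file, doing back up and creating new
            // status file": keep the unreadable file around for inspection...
            File backup = new File(statusFile.getPath() + ".bak." + System.currentTimeMillis());
            if (!statusFile.renameTo(backup)) {
                throw new IOException("Could not back up " + statusFile, parseFailure);
            }
            // ...and restart indexing from the beginning of the table.
            Map<String, Object> fresh = Map.of("LastIndex", "0");
            MAPPER.writeValue(statusFile, fresh);
            return fresh;
        }
    }
}
```

The .bak.<timestamp> naming matches the backup files that appear later in this thread.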

Hi Luis,

Sorry for the late reply. I'm following this tutorial: https://www.toadworld.com/platforms/oracle/w/wiki/11121.collecting-indexing-and-searching-mysql-database-table-data-in-apache-solr

flume-mysql.conf:

agent.sources=sql-source
agent.sinks=sink1
agent.channels=ch1
agent.channels.ch1.type=memory

#agent.sources.sql-source.type=org.apache.flume.source.SQLSource
agent.sources.sql-source.type=org.keedio.flume.source.SQLSource
agent.sources.sql-source.channels=ch1

agent.sources.sql-source.connection.url=jdbc:mysql://localhost:3306/wlslog

# Database connection properties
agent.sources.sql-source.user=root
agent.sources.sql-source.password=12345
agent.sources.sql-source.table=wlslog
agent.sources.sql-source.database=wlslog

agent.sources.sql-source.columns.to.select = *

# Increment column properties
agent.sources.sql-source.incremental.column.name=id

# Increment value from which to start taking data from the table (0 will import the entire table)
agent.sources.sql-source.incremental.value=0

# Query delay: the query will be sent every configured milliseconds
agent.sources.sql-source.run.query.delay=10000

# Status file is used to save the last read row
agent.sources.sql-source.status.file.path=/var/lib/flume
agent.sources.sql-source.status.file.name=sql-source.status

agent.sinks.sink1.morphlineId=morphline1
agent.sinks.sink1.type=org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.sink1.channel=ch1
agent.sinks.sink1.morphlineFile=/opt/flume/conf/morphlines.conf

agent.sinks.sink1.batchSize=1
agent.sinks.sink1.batchDurationMillis=10000

agent.channels.ch1.capacity=100000

morphlines.conf:

SOLR_LOCATOR : {
  collection : collection1
  solrUrl : "http://localhost:8983/solr/"
  solrHomeDir : "/opt/solr/example/solr/collection1"
}

morphlines : [
  {
    id : morphline1
    importCommands : ["com.cloudera.**", "org.apache.solr.**", "org.kitesdk.**"]
    commands : [
      {
        readLine {
          charset : UTF-8
        }
      }
      {
        generateUUID {
          field : id
        }
      }
      {
        sanitizeUnknownSolrFields {
          solrLocator : ${SOLR_LOCATOR}
        }
      }
      { logDebug { format : "output record: {}", args : ["@{}"] } }
      {
        loadSolr {
          solrLocator : ${SOLR_LOCATOR}
        }
      }
    ]
  }
]

sql-source.status:
jdbc:mysql://localhost:3306/test wlslog id 0

sql-source.status.bak.1518407398120:
{"LastIndex":"5"}

Thanks for your attention, Luis.
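The sql-source.status content above is worth a second look: it is a plain space-separated line, not JSON, so a JSON parser can only fail on it, which matches the "Exception reading status file" path in the log. A quick check, sketched here with Jackson rather than whatever parser the source actually uses:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

public class StatusFileCheck {
    public static void main(String[] args) {
        ObjectMapper mapper = new ObjectMapper();
        String[] contents = {
            "jdbc:mysql://localhost:3306/test wlslog id 0", // posted sql-source.status
            "{\"LastIndex\":\"5\"}"                         // posted .bak content
        };
        for (String content : contents) {
            try {
                mapper.readTree(content); // throws on anything that is not JSON
                System.out.println("parses as JSON : " + content);
            } catch (Exception e) {
                System.out.println("not valid JSON : " + content);
            }
        }
    }
}
```

The first line fails outright; the backup parses, but it carries only LastIndex, not the full set of keys shown in the examples below.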

Hi randidwiputra,

  • I don't understand the relationship between the link (it doesn't work) and your problem with flume-sql.
  • There is no such property as "agent.sources.sql-source.incremental.value=0"; could you please update to 1.4.3 or 1.5.0?
  • Your status files look odd (a sketch of the expected validation appears after the configuration below).
    Example of sql1.status:
{"Query":"SELECT id,first_name FROM customers WHERE id > $@$","LastIndex":"20000","SourceName":"sql1","URL":"jdbc:mysql:\/\/127.0.0.1:3306\/testdb"}

Example of a corrupted sql1.status (it cannot be parsed, so sql1.status.bak.1518422137916 is generated):

{"Query":"SELECT id,first_name FROM customers WHERE id > $@$",**"LastIndex":-1**,"SourceName":"sql1","URL":"jdbc:mysql:\/\/127.0.0.1:3306\/testdb"}
  • This is a basic configuration file for flume-sql sinking to file; maybe it can help you reduce the complexity of your config file in your use case:
agent.sinks = k1
agent.channels = ch1
agent.sources = sql1

# For each one of the sources, the type is defined
agent.sources.sql1.type = org.keedio.flume.source.SQLSource
agent.sources.sql1.hibernate.connection.url = jdbc:mysql://127.0.0.1:3306/testdb

# Hibernate Database connection properties
agent.sources.sql1.hibernate.connection.user = root
agent.sources.sql1.hibernate.connection.password = root
agent.sources.sql1.hibernate.connection.autocommit = true
agent.sources.sql1.hibernate.dialect = org.hibernate.dialect.MySQL5Dialect
agent.sources.sql1.hibernate.connection.driver_class =  com.mysql.jdbc.Driver

agent.sources.sql1.table = customers
# Columns to import (default *, imports the entire row)
agent.sources.sql1.columns.to.select = *
# Query delay: the query will be sent every configured milliseconds
agent.sources.sql1.run.query.delay=10000

# Status file is used to save the last read row
agent.sources.sql1.status.file.path = /var/log/sqlflume/flume-out
agent.sources.sql1.status.file.name = sql1.status


agent.sources.sql1.batch.size = 1000
agent.sources.sql1.delimiter.entry = ~
agent.sources.sql1.enclose.by.quotes = false

agent.sources.sql1.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider
agent.sources.sql1.hibernate.c3p0.min_size=1
agent.sources.sql1.hibernate.c3p0.max_size=10

# The channel can be defined as follows.
agent.sources.sql1.channels = ch1

agent.sinks.k1.type = file_roll
agent.sinks.k1.sink.directory = /var/log
agent.sinks.k1.sink.rollInterval = 7200

agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 10000
agent.channels.ch1.transactionCapacity = 1000

agent.sinks.k1.channel = ch1
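To illustrate the validation Luis describes (a sketch of assumed behavior, not the actual checkJsonValues implementation): if every expected key must be present and hold a string value, the first status example above passes, while the corrupted one is rejected because its LastIndex is the bare number -1.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.List;

public class CheckJsonValuesSketch {

    // Keys taken from the sql1.status examples above.
    private static final List<String> EXPECTED_KEYS =
            List.of("Query", "LastIndex", "SourceName", "URL");

    /** True only if the content is a JSON object holding all expected keys as strings. */
    public static boolean isValidStatus(String json) {
        try {
            JsonNode root = new ObjectMapper().readTree(json);
            for (String key : EXPECTED_KEYS) {
                JsonNode value = root.get(key);
                if (value == null || !value.isTextual()) {
                    return false; // missing key, or value is not a JSON string
                }
            }
            return true;
        } catch (Exception parseFailure) {
            return false; // not parseable JSON at all
        }
    }

    public static void main(String[] args) {
        String ok = "{\"Query\":\"SELECT id,first_name FROM customers WHERE id > $@$\","
                + "\"LastIndex\":\"20000\",\"SourceName\":\"sql1\","
                + "\"URL\":\"jdbc:mysql://127.0.0.1:3306/testdb\"}";
        String corrupted = ok.replace("\"LastIndex\":\"20000\"", "\"LastIndex\":-1");
        System.out.println(isValidStatus(ok));        // true
        System.out.println(isValidStatus(corrupted)); // false
    }
}
```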

Hi Luis,

Thanks for your reply.

My Flume version is apache-flume-1.4.0-bin, my Solr version is solr-4.10.3, and the MySQL connector is mysql-connector-java-5.1.45.

> This is a basic configuration file for flume-sql sinking to file; maybe it can help you reduce the complexity of your config file in your use case:

Thanks, I'll try it. Yes, I'm just trying to sink mysql -> flume -> solr, but I hit this error when the flume agent is running.

Thanks, Sir