keedio / flume-ng-sql-source

Flume Source to import data from SQL Databases

Issues faced while using 1.4.3 version

rishit606 opened this issue

Hi,

I was earlier using version 1.3.7 but tried to move to 1.4.3 (the latest) to avoid the quotes in the output data file.
However, while testing the new version I ran into the issue below:

  1. The output of a custom query always repeats after 10k rows. Is the max.rows property mandatory in 1.4.3?

Kindly advise on the above.
Thanks,
Rishit

Hi rishit606,
I tried to reproduce this with the latest release, 1.5.0, but I am not getting repeated rows with 20k rows.
max.rows is not mandatory. It sets a maximum number of rows; if the query's result set cannot reach it, the main thread is put to sleep when processing events.

Thanks lazaromedina,
The 10k rows case was without setting the max.rows property.
If I set the max.rows property, it works fine.
In my project the row count goes beyond 3 crore (3 x 10^7), so I was thinking of not using max.rows, as I am not always sure about the count.

thanks,
Rishit Shah.

Hi rishit606,
I'm not sure if I'm understanding your problem with the parameter "max.rows".

  • Are you getting repeated rows when "max.rows" is not set? Not setting the parameter is equivalent to the Flume context using the default value of 10000.
  • If you set a custom value for "max.rows", you are not getting repeated rows. Is the value that seems to work (no duplicate rows) higher or lower than the default value (10k)?
  • For your use case (more than 3 x 10^7 rows), not setting "max.rows" will fetch batches of 10^4, so maybe you could try increasing this value to at least 10^5.
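
To put numbers on the last point, here is a rough Python sketch of the arithmetic. It assumes, as described above, that "max.rows" acts as the number of rows fetched per batch. The function name `batches_needed` is made up for illustration and is not part of flume-ng-sql-source.

```python
def batches_needed(total_rows: int, max_rows: int = 10_000) -> int:
    """Number of fetches needed to read total_rows at max_rows rows per fetch.

    Hypothetical helper: it only models the arithmetic discussed in this
    thread, not the source's actual fetching code.
    """
    if total_rows < 0 or max_rows <= 0:
        raise ValueError("total_rows must be >= 0 and max_rows must be > 0")
    # Ceiling division without floating point.
    return -(-total_rows // max_rows)


# 3 crore rows = 3 x 10^7, compared at the default (10^4) and at 10^5.
for max_rows in (10_000, 100_000):
    print(max_rows, batches_needed(30_000_000, max_rows))
```

With the default value this comes to 3000 batches, and with 10^5 it comes to 300.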

Regards, Luis

Hi Luis,
Sorry for the late reply.
Regarding your points above, here are my replies:
Point 1: Yes, the data repeats after 10k rows.
Point 2: I set max.rows = 50000. The data repeats after 50000 rows.
Point 3: I could not understand what you are trying to say.

My question is: in version 1.3.7 I was not getting any duplicate rows (no max.rows property set). So is the default value of 10k set in the new version?

Thanks,
Rishit shah

Hi Rishit,

  • Yes, in the latest release, 1.5.0, the default value of 10k is set.
  • In version 1.3.7 it is also set.
  • max.rows has been configurable, with a default of 10k, since 1.3.0.
  • About the repeated rows, my first guess is that the status file, which keeps a JSON with "lastindex", is not being loaded between starts and stops of flume-ng. If the status file cannot be read, lastindex is set to zero, but this is just a conjecture.
  • Can you please enable debugging when launching flume-ng, or at least attach the info trace? I am looking for (expecting) an [ERROR] line like "...reading status file..."
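
To illustrate the conjecture about the status file, here is a minimal Python sketch of the fallback behaviour. It is not the source's actual code (which is Java). The function name `load_last_index` is hypothetical, and the status file layout, a JSON object with a "lastindex" key, is assumed only from the description above.

```python
import json


def load_last_index(status_path: str) -> int:
    """Return the saved "lastindex", or 0 if the status file is unusable.

    Hypothetical sketch: the key name and file layout are assumptions
    taken from this thread, not from the real implementation.
    """
    try:
        with open(status_path, encoding="utf-8") as fh:
            return int(json.load(fh)["lastindex"])
    except (OSError, ValueError, KeyError, TypeError):
        # Missing, unreadable, or malformed status file: fall back to zero,
        # which would make the next query start again from the first row.
        return 0
```

If the real source behaves like this, a status file that is never read back would make every run start from index zero again, which would look like repeated rows in the output.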

best, Luis