confluentinc / kafka-connect-hdfs

Kafka Connect HDFS connector

Issues related to NullPointerException handling in the DataWriter.java class.

kaushiksrinivas opened this issue · comments

We see the below PRs focusing on avoiding NPEs in the DataWriter class.

  1. #422
  2. #496

In the case of the first PR, the NPEs were handled by clearing the topicPartitionWriters collection as well as the assignment upon close().
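
For reference, a minimal sketch of the cleanup pattern that PR describes, assuming close() tears down each writer and then clears both collections. The names mirror DataWriter, but the body is a reconstruction, not the actual connector code:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.kafka.common.TopicPartition;

// Hypothetical sketch only: TopicPartitionWriter here is a stand-in
// interface, not the connector's real class.
class DataWriterCloseSketch {
  interface TopicPartitionWriter {
    void close();
  }

  private final Map<TopicPartition, TopicPartitionWriter> topicPartitionWriters = new HashMap<>();
  private final Set<TopicPartition> assignment = new HashSet<>();

  public void close() {
    for (TopicPartitionWriter writer : topicPartitionWriters.values()) {
      writer.close(); // flush and release per-partition resources
    }
    // Clearing both collections means a later stop()/close() cannot
    // dereference writers that no longer exist, avoiding the NPEs.
    topicPartitionWriters.clear();
    assignment.clear();
  }
}
```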

With the changes from #496, we see that the code in lines 384 to 410 (the `for (String topic : topics ...)` loop) in https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/DataWriter.java is never hit, as topicPartitionWriters is always empty.
Is this by design or an unintended side effect?

We had reported another issue, #538, in lines 401 to 408, due to which the connector goes into an irrecoverable crash loop. That problem stays hidden with the above changes, hence this query/issue.
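
To make the question concrete, here is a hypothetical reconstruction of the shape of that code path: the topic set is derived from topicPartitionWriters, so an empty map means the loop body can never execute. The real method is syncWithHive() in DataWriter.java; everything below is a simplified stand-in:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.kafka.common.TopicPartition;

// Hypothetical reconstruction of the syncWithHive() shape; the real method
// lives in io.confluent.connect.hdfs.DataWriter.
class SyncWithHiveSketch {
  private final Map<TopicPartition, Object> topicPartitionWriters = new HashMap<>();

  void syncWithHive() {
    // The topic set is derived from the writers' keys, so an empty
    // topicPartitionWriters map yields an empty topics set...
    Set<String> topics = new HashSet<>();
    for (TopicPartition tp : topicPartitionWriters.keySet()) {
      topics.add(tp.topic());
    }
    // ...and this loop body (lines 384-410 in the issue) is never entered.
    for (String topic : topics) {
      System.out.println("would create Hive table / recover partitions for " + topic);
    }
  }
}
```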

Hi @kaushiksrinivas, the topicPartitionWriters are initialized in the constructor here, and syncWithHive(), which contains the lines you mention, is called after the initialization, so I don't see how the writers would always be empty. Are you referring to a specific case? Could you clarify the flow of logic that would end up having the writers empty?

Hi @dosvath
Below is the code flow.
Install the worker and the connector, then write a few records to HDFS (completing the flush size and committing the file).
Now if we restart the connector, we always see topicPartitionWriters to be empty in syncWithHive().

We have put debug statements in syncWithHive() and never observed topicPartitionWriters to be non-empty at this point. open(partitions) calls this function too, but it calls syncWithHive() before setting topicPartitionWriters.
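
Concretely, the ordering we believe we are seeing looks like the following hypothetical sketch: the constructor leaves topicPartitionWriters empty, and open() runs syncWithHive() before any writers are put into the map, so on restart the sync always observes an empty map:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.common.TopicPartition;

// Hypothetical sketch of the call ordering; not the actual DataWriter source.
class OpenOrderingSketch {
  private final Map<TopicPartition, Object> topicPartitionWriters = new HashMap<>();

  void open(Collection<TopicPartition> partitions) {
    syncWithHive(); // runs FIRST: topicPartitionWriters is still empty here
    for (TopicPartition tp : partitions) {
      topicPartitionWriters.put(tp, new Object()); // writers created only AFTER the sync
    }
  }

  void syncWithHive() {
    // Iterates topics derived from topicPartitionWriters, which is empty
    // at this point on every restart.
  }
}
```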

In fact, not once have we seen syncWithHive() enter the for (String topic : topics ...) loop in any of our tests; our debug logs have confirmed this. Could it be an issue with partition assignment to the tasks at this point in time?

Before 5.4.2, we observed the code enter the for loop when it was still using the assignment; confirmed with debug logs again.
Can you please check if you can reproduce this and confirm?
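
For comparison, a sketch of the pre-5.4.2 shape we observed, where the topics were derived from the assignment (which the runtime repopulates after a restart) rather than from the writers map. Again a reconstruction, not the actual source:

```java
import java.util.HashSet;
import java.util.Set;

import org.apache.kafka.common.TopicPartition;

// Hypothetical sketch of the pre-5.4.2 behavior: topics derived from the
// assignment, which is non-empty once partitions are (re)assigned.
class AssignmentBasedSyncSketch {
  private final Set<TopicPartition> assignment = new HashSet<>();

  void syncWithHive() {
    Set<String> topics = new HashSet<>();
    for (TopicPartition tp : assignment) {
      topics.add(tp.topic()); // populated by the runtime after a rebalance
    }
    for (String topic : topics) {
      System.out.println("Hive sync runs for " + topic);
    }
  }
}
```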

Hi @dosvath
Any insights on this, please?

Hi @kaushiksrinivas, thank you for bringing up the issue. I am actively looking into this and believe I may have an update for you soon.

Hi @dosvath
Thanks for taking this up. Let us know if any tests or patches need to be done; we can help too.

Hi @kaushiksrinivas, in which version do you see the code working correctly? How did you determine #496 introduced the issue? Which Confluent platform version are you using?

Hi @kaushiksrinivas, do you mean the issue is not present on Confluent Platform version 5.4.2, or on connector version 5.4.2?
I made a DEBUG branch based on connector version 5.3.4 and reverted the changes from #496 on top. I still don't see context.assignment populated, and thus the syncWithHive block is still not executed. I'm testing on Confluent Platform 6.x. I'm guessing the issue may be the difference in the handling of start() and open() in the new Confluent Platform version.
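
If that guess is right, it matches the standard SinkTask lifecycle: start() runs before any partitions are assigned, and open() is only invoked after the rebalance, so context.assignment() is typically empty during start(). A minimal probe (hypothetical class, standard Kafka Connect API):

```java
import java.util.Collection;
import java.util.Map;

import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

// Minimal lifecycle probe: context.assignment() is typically empty during
// start(); partitions only arrive via open() after the rebalance.
public class LifecycleProbeTask extends SinkTask {
  @Override
  public void start(Map<String, String> props) {
    System.out.println("start: assignment=" + context.assignment()); // usually empty here
  }

  @Override
  public void open(Collection<TopicPartition> partitions) {
    System.out.println("open: partitions=" + partitions); // populated here
  }

  @Override
  public void put(Collection<SinkRecord> records) {}

  @Override
  public void stop() {}

  @Override
  public String version() {
    return "sketch";
  }
}
```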

Hi @dosvath
I am referring to the HDFS sink connector version here, not the Confluent Platform version.

In version 5.4.2 and onwards, we do not see the code flow enter the for loop. The version where this code was still working and running was 5.1.0 in our case. We have not tested all the versions between 5.1.0 and 5.4.2.

We are not sure whether this is the correct behavior. If the highlighted for-loop flow is never run, other problems stay hidden, like the one we pointed out above.

Thank you @kaushiksrinivas, I will be looking into this again this week.

Hi @kaushiksrinivas, the issue will be fixed by #568.