CleverTap / clevertap-android-sdk

CleverTap Android SDK

SDK 5.1.0's new CTWorkManager breaks custom WorkerFactory implementations and can't rollback

henryglendening-hh opened this issue · comments

Describe the bug
Our app uses a custom WorkerFactory to create Workers, and this custom WorkerFactory is assigned to the WorkManager. I believe the new code added in SDK version 5.1.0 breaks this (specifically line 37 of CTWorkManager.kt): our custom WorkerFactory does not know how to handle the com.clevertap.android.sdk.pushnotification.work.CTFlushPushImpressionsWork class and fails to create the corresponding worker, resulting in a crash.

Since this is a breaking change, it would be helpful for the SDK version number to reflect that and for the release notes to indicate that the implementing app should use a DelegatingWorkerFactory to mitigate any friction with CleverTap's use of the WorkManager.

Additionally, due to the crashes that our application has experienced in production relating to com.clevertap.android.sdk.pushnotification.work.CTFlushPushImpressionsWork, we tried to roll back our CleverTap SDK version to 5.0.0 (where neither this class nor CTWorkManager.kt existed yet). Surprisingly, the app continues to crash with the same exception: it is unable to handle com.clevertap.android.sdk.pushnotification.work.CTFlushPushImpressionsWork.

Do you know why this crash would continue to happen for users who are running SDK version 5.0.0? Is it possible that the com.clevertap.android.sdk.pushnotification.work.CTFlushPushImpressionsWork worker instruction is being dispatched dynamically/remotely, affecting users who were previously running 5.1.0 but are now on 5.0.0?

To Reproduce
Steps to reproduce the behavior:

  1. Set up a custom WorkerFactory for use with the WorkManager in an app
  2. Use CleverTap SDK 5.1.0
  3. ...
  4. Experience crash from the custom WorkerFactory not knowing how to handle com.clevertap.android.sdk.pushnotification.work.CTFlushPushImpressionsWork Worker creation
  5. Downgrade CleverTap SDK to 5.0.0
  6. ...
  7. Continue seeing the same crash occur, despite com.clevertap.android.sdk.pushnotification.work.CTFlushPushImpressionsWork being absent from the app binary

Expected behavior
When the SDK version is rolled back from 5.1.0 to 5.0.0, the WorkManager should no longer receive requests to create a Worker for com.clevertap.android.sdk.pushnotification.work.CTFlushPushImpressionsWork.

Screenshots/Logs
If applicable, add screenshots to help explain your problem.
In case of crashes, share the entire crash logs as a .txt file

Environment (please complete the following information):

  • Device: any
  • OS: Android 10, 11, 12, 13
  • CleverTap SDK Version 5.1.0 -> 5.0.0
  • Android Studio Giraffe | 2022.3.1

Additional context
Here's a screenshot of the classes.dex contents for the com.clevertap.android.sdk.pushnotification directory for the APK that's running SDK 5.0.0

Screenshot 2023-08-09 at 9 08 33 AM

@henryglendening-hh What is the crash? Can you attach the full stack trace?

@piyush-kukadiya The crash is due to a custom exception thrown from our custom WorkerFactory implementation. Our WorkerFactory's override of createWorker(appContext: Context, workerClassName: String, workerParameters: WorkerParameters) throws an exception because it unexpectedly receives com.clevertap.android.sdk.pushnotification.work.CTFlushPushImpressionsWork as the value for the workerClassName parameter.

Here's the stacktrace:

Fatal Exception: java.lang.IllegalArgumentException: unknown worker class name: com.clevertap.android.sdk.pushnotification.work.CTFlushPushImpressionsWork
    at com.hayhouse.hayhouseaudio.workers.factory.MainWorkerFactory.createWorker(MainWorkerFactory.kt:24)
    at androidx.work.WorkerFactory.createWorkerWithDefaultFallback(WorkerFactory.java:83)
    at androidx.work.impl.WorkerWrapper.runWorker(WorkerWrapper.java:243)
    at androidx.work.impl.WorkerWrapper.run(WorkerWrapper.java:145)
    at androidx.work.impl.utils.SerialExecutorImpl$Task.run(SerialExecutorImpl.java:96)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
    at java.lang.Thread.run(Thread.java:923)

@henryglendening-hh Thanks for pointing this out. We will add this to our changelog.

"it would be helpful for the SDK version number to reflect that and include notes indicating that the implementing app should use a DelegatingWorkerFactory to mitigate any friction with CleverTap's use of the WorkManager."

I would like to know why, instead of handling CTFlushPushImpressionsWork in your custom factory, you decided to roll back the CleverTap SDK version to 5.0.0.

@piyush-kukadiya We downgraded the SDK in order to lessen the impact of the crash on our users while we developed & validated a patch. Why would it be a problem to downgrade an SDK?

@henryglendening-hh While I'm not entirely certain, it appears that the WorkManager library works by saving scheduled workers in its own database. This database includes the name of the worker, which must be provided to the WorkerFactory to execute the task. I assume that a CleverTap worker was scheduled and recorded in the database while the application was running CleverTap SDK version 5.1.0. After the SDK was downgraded, the entry persisted in the database even though the SDK version had changed, ultimately leading to the crash.

@piyush-kukadiya Thank you for the explanation. From what I can tell, I'm not sure it could be happening this way:

"the Work Manager library functions by saving scheduled workers in its own database"

I don't believe this step would be occurring, as the overridden createWorker function of the custom WorkerFactory fails to create the requested worker, which should mean that the worker couldn't be scheduled since it failed to get created.

@henryglendening-hh I can't find any other cause apart from this. Where else could this worker name be coming from? To verify this, if you are able to reproduce the crash, you can check for an entry in the WorkName table in androidx.work.workdb. Attaching a screenshot for reference.
Screenshot 2023-08-09 at 9 30 12 PM
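
As an alternative to opening the database file directly, the persisted entries can also be inspected through the public WorkManager API. The following is only a sketch, assuming the work was enqueued through the standard request builders, which add the worker's class name as a tag by default. The function name and the log tag are hypothetical, and whether cancelling the work is appropriate after a downgrade is a judgment call for the app.

```kotlin
import android.content.Context
import android.util.Log
import androidx.work.WorkManager

// Fully qualified worker class name taken from the stack trace in this thread.
private const val CT_FLUSH_WORKER =
    "com.clevertap.android.sdk.pushnotification.work.CTFlushPushImpressionsWork"

// Hypothetical helper: call it from a background thread, because get() blocks.
fun auditPersistedCleverTapWork(context: Context, cancel: Boolean = false) {
    val workManager = WorkManager.getInstance(context)

    // Work requests are tagged with their worker class name by default,
    // so entries persisted by an earlier SDK version should show up here.
    val infos = workManager.getWorkInfosByTag(CT_FLUSH_WORKER).get()
    infos.forEach { info ->
        Log.d("WorkAudit", "id=${info.id} state=${info.state} tags=${info.tags}")
    }

    if (cancel) {
        // Cancels pending work so the factory is no longer asked to create it.
        workManager.cancelAllWorkByTag(CT_FLUSH_WORKER)
    }
}
```

If this logs an ENQUEUED entry on a build that ships SDK 5.0.0, that would support the explanation that the entry was persisted while 5.1.0 was installed.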

As a permanent solution, I would recommend handling CTFlushPushImpressionsWork in your factory.

@piyush-kukadiya Thank you and yes, that is the permanent solution that we are targeting otherwise 👍 As far as reproducing goes, I'm not sure how to do that - can you provide us any insight into how to achieve this (specifically, how the work for CTFlushPushImpressionsWork gets triggered)?

@piyush-kukadiya And as far as handling CTFlushPushImpressionsWork in our factory goes, what do you recommend? I'm currently implementing the recommendation from an Android engineer at Google, where a DelegatingWorkerFactory is used and my custom WorkerFactory returns null for CTFlushPushImpressionsWork (thus allowing the default WorkerFactory to handle creation of that worker implicitly).
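
For readers who want to see roughly what that setup looks like, here is a minimal sketch. SyncWorker, AppWorkerFactory, the greeting dependency, and buildWorkManagerConfiguration are hypothetical names used only for illustration; the actual factory in this thread (MainWorkerFactory) is not shown in the issue. How the Configuration is handed to WorkManager (through Configuration.Provider on the Application, or through manual initialization) depends on the app and the WorkManager version, so that part is left out.

```kotlin
import android.content.Context
import androidx.work.Configuration
import androidx.work.DelegatingWorkerFactory
import androidx.work.ListenableWorker
import androidx.work.Worker
import androidx.work.WorkerFactory
import androidx.work.WorkerParameters

// Hypothetical app worker that needs an extra constructor argument,
// which is the usual reason for having a custom WorkerFactory at all.
class SyncWorker(
    context: Context,
    params: WorkerParameters,
    private val greeting: String
) : Worker(context, params) {
    override fun doWork(): Result = Result.success()
}

// Hypothetical app factory: builds the workers it knows about and
// returns null (instead of throwing) for everything else.
class AppWorkerFactory(private val greeting: String) : WorkerFactory() {
    override fun createWorker(
        appContext: Context,
        workerClassName: String,
        workerParameters: WorkerParameters
    ): ListenableWorker? = when (workerClassName) {
        SyncWorker::class.java.name -> SyncWorker(appContext, workerParameters, greeting)
        else -> null // unknown (for example third-party) workers fall through
    }
}

// Builds the WorkManager configuration with a DelegatingWorkerFactory.
// When no delegate creates the worker, WorkManager falls back to its
// default reflection-based instantiation.
fun buildWorkManagerConfiguration(): Configuration {
    val delegatingFactory = DelegatingWorkerFactory().apply {
        addFactory(AppWorkerFactory(greeting = "hello"))
    }
    return Configuration.Builder()
        .setWorkerFactory(delegatingFactory)
        .build()
}
```

With this arrangement, a request for CTFlushPushImpressionsWork is not matched by the app factory, so creation is left to the default behaviour rather than failing inside app code.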

@henryglendening-hh We have updated our docs here

  1. CTFlushPushImpressionsWork gets scheduled only when
    • a push is rendered on the device from CleverTap, and
    • both the OS and the app's target SDK are above Android O, and
    • the push is rendered in the app's main process (check here and here).
  2. If scheduling succeeds, the work is stored in the Work DB and runs once the following constraints are met:
    • the device is charging, and
    • the device has a network connection (check here).
  3. The Android docs for WorkerFactory say "The factory is invoked every time a work runs", so we can assume that your factory received the CTFlushPushImpressionsWork name because the work was triggered (check here).
  4. When the scheduled worker runs based on its constraints, WorkManager initialises the Worker class here, which calls createWorker() on your custom WorkerFactory here. If your createWorker() returns null, WorkManager initialises the worker class using reflection, which is the default factory behaviour here.
  5. In your case, createWorker() throws an exception instead of returning null. If it returned null, the work library would gracefully handle worker instantiation through reflection, which is not happening here.
  6. Your current implementation will crash for any third-party dependency that ships Workers unknown to you.
  7. The ideal solution is to return null for unknown workers. Please check the updated docs, which provide the recommended solution.
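
The difference between points 4 and 5 can be illustrated with a small, self-contained model. This is not the WorkManager source: ModelWorker, ModelWorkerFactory, and the two sample workers are simplified stand-ins invented for this sketch (the real workers take a Context and WorkerParameters), and the fallback function only mirrors the behaviour described above, where a null result leads to reflection-based instantiation.

```kotlin
// Simplified stand-ins for androidx.work types; NOT the real library classes.
abstract class ModelWorker {
    abstract fun doWork(): String
}

// Stands in for a third-party worker such as CTFlushPushImpressionsWork.
class ThirdPartyWork : ModelWorker() {
    override fun doWork() = "third-party work ran"
}

// Stands in for a worker the app knows how to build itself.
class AppWork : ModelWorker() {
    override fun doWork() = "app work ran"
}

fun interface ModelWorkerFactory {
    fun createWorker(workerClassName: String): ModelWorker?
}

// Models the described flow: ask the custom factory first, and only
// when it returns null instantiate the class by name via reflection.
fun createWorkerWithDefaultFallback(
    factory: ModelWorkerFactory,
    workerClassName: String
): ModelWorker? {
    // An exception thrown by the custom factory propagates to the caller.
    factory.createWorker(workerClassName)?.let { return it }
    return try {
        Class.forName(workerClassName)
            .asSubclass(ModelWorker::class.java)
            .getDeclaredConstructor()
            .newInstance()
    } catch (e: Exception) {
        null
    }
}

// Behaves like the factory in the stack trace: throws for unknown names.
val throwingFactory = ModelWorkerFactory { name ->
    if (name == AppWork::class.java.name) AppWork()
    else throw IllegalArgumentException("unknown worker class name: $name")
}

// Behaves as recommended in point 7: returns null for unknown names.
val nullReturningFactory = ModelWorkerFactory { name ->
    if (name == AppWork::class.java.name) AppWork() else null
}

fun main() {
    val thirdParty = ThirdPartyWork::class.java.name
    // The null-returning factory lets the reflective fallback create the worker.
    println(createWorkerWithDefaultFallback(nullReturningFactory, thirdParty)?.doWork())
    // The throwing factory never reaches the fallback.
    println(runCatching { createWorkerWithDefaultFallback(throwingFactory, thirdParty) }.isFailure)
}
```

In this model the only change between the crashing and the working case is the else branch of the custom factory, which matches the recommendation to return null for worker classes the app does not own.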

@piyush-kukadiya Thank you! I haven't been able to reproduce triggering the CTFlushPushImpressionsWork worker, but my patch seems to be successful and has eliminated the crash. The solution we have in place using a DelegatingWorkerFactory also does what you mention in point 7 and the docs, so we've got our bases covered. Thank you for your support triaging and working through this 🙂