IndexError: List index out of range when calling linker.estimate_parameters_using_expectation_maximisation(training_blocking_rule)
sthamodh opened this issue
What happens?
Hi,
The following code is taken from the example page here: spark example
This code has worked many times before without any issues.
The error is raised in the expectation maximization step, as shown below.
I did a little snooping around and traced the error back to the call below. Running it on its own after the SparkLinker step gives the same error, as shown in the screenshots below.
linker._settings_obj._get_comparison_levels_corresponding_to_training_blocking_rule("l.first_name = r.first_name and l.surname = r.surname")
I have no idea why this happens, and all of the code I developed for my own solution (deduplicating corporate addresses) also fails at this step.
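Since the screenshots do not carry over as text, the isolation step described above can be sketched as a small helper that calls the same private method and returns the traceback as a string. This is only a sketch: `probe_training_rule` is a hypothetical name, `linker` is assumed to be an already-constructed Splink 3 linker (for example the SparkLinker from the example), and the helper relies on the private `_settings_obj` attribute and method quoted above, which may differ in other Splink versions.

```python
import traceback


def probe_training_rule(linker, blocking_rule):
    """Call the private settings method directly and capture an IndexError as text.

    `linker` is assumed to be an existing Splink 3 linker; nothing here
    builds one. Returns (levels, None) on success or (None, traceback_text)
    if the call raises IndexError.
    """
    settings = linker._settings_obj
    try:
        levels = settings._get_comparison_levels_corresponding_to_training_blocking_rule(
            blocking_rule
        )
    except IndexError:
        # Keep the full traceback as a string so it can be pasted into the issue.
        return None, traceback.format_exc()
    return levels, None


# Hypothetical usage, with `linker` already created elsewhere:
# levels, tb = probe_training_rule(
#     linker, "l.first_name = r.first_name and l.surname = r.surname"
# )
# print(tb if tb is not None else levels)
```

Pasting the returned traceback text into the issue makes the failing frame searchable in a way that a screenshot is not.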
To Reproduce
- I used the spark example
- I ran it on a single-user cluster on Databricks with the 12.2 LTS ML runtime. Here is a screenshot of the cluster configuration:
OS:
Databricks runtime version: 12.2 LTS ML (includes Apache Spark 3.3.2, Scala 2.12)
Splink version:
3.9.13
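To report the environment as text rather than from memory, a minimal sketch like the one below could be run in the same notebook. `report_versions` is a hypothetical helper, and the package list is an assumption: `sqlglot` and `pyspark` are included only because Splink's Spark backend depends on them, not because either is known to be the cause of this error.

```python
import platform
from importlib import metadata


def report_versions(packages=("splink", "sqlglot", "pyspark")):
    """Return installed versions for the given distributions (None if absent)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    # Include the interpreter version alongside the package versions.
    versions["python"] = platform.python_version()
    return versions


# Hypothetical usage:
# for name, version in report_versions().items():
#     print(f"{name}: {version}")
```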
Have you tried this on the latest master branch?
- I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- I agree
See this comment
Closed by #2079