moj-analytical-services / splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

Home Page: https://moj-analytical-services.github.io/splink/

threshold_selection_tool_from_labels_table does not work using spark

guylissak opened this issue

What happens?

https://moj-analytical-services.github.io/splink/charts/threshold_selection_tool_from_labels_table.html

Hi, the threshold_selection_tool_from_labels_table functionality does not work when using the Spark linker.
The same code works for me when using DuckDB.

Error:
ParseError: Invalid expression / Unexpected token. Line 1, Col: 14.
l.first_name�[4m�[0m = r.first_name

To Reproduce

Run the code from this documentation page using the Spark linker:

https://moj-analytical-services.github.io/splink/charts/threshold_selection_tool_from_labels_table.html

OS:

Databricks

Splink version:

3.9.14

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

Thanks for the report.

The characters �[4m�[0m are ANSI escape codes for terminal text formatting. Specifically:

  • �[4m is the ANSI escape code for enabling underline.
  • �[0m is the ANSI escape code for resetting all text attributes (including underline).
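In the error text above, each � most likely stands in for the non-printing ESC control character (0x1B) that begins every ANSI escape code. As a minimal sketch, assuming the stray characters are ANSI SGR formatting sequences of the form ESC[...m, they can be stripped from a pasted string with a regular expression. `strip_ansi` is a hypothetical helper, not part of Splink:

```python
import re

# Matches ANSI SGR ("Select Graphic Rendition") sequences such as ESC[4m or ESC[0m:
# the ESC character (0x1B), "[", optional numeric parameters separated by ";", then "m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text: str) -> str:
    """Remove ANSI text-formatting codes from a string (hypothetical helper)."""
    return ANSI_SGR.sub("", text)


# The expression from the error message, with the escape codes written out explicitly.
pasted = "l.first_name\x1b[4m\x1b[0m = r.first_name"
cleaned = strip_ansi(pasted)
print(repr(cleaned))  # 'l.first_name = r.first_name'
```

Only SGR codes are targeted here; other escape sequences (cursor movement and so on) would need a broader pattern.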

I therefore think this is likely to be a copy-and-paste problem, possibly related to #2018.
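If copy-pasting is the cause, one way to confirm it is to scan the pasted settings for invisible control characters before passing them to the linker. The sketch below walks any nested dict/list and reports every string containing a control character (Unicode category "Cc"), ignoring tabs and newlines because multi-line SQL is legitimate. `find_control_chars` is hypothetical, and the settings dictionary is a trimmed-down illustration rather than a complete Splink configuration:

```python
import unicodedata
from typing import Any, Iterator


def find_control_chars(obj: Any, path: str = "settings") -> Iterator[tuple[str, str]]:
    """Yield (path, repr of string) for each string in a nested dict/list that
    contains a control character other than tab, newline or carriage return."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_control_chars(value, f"{path}[{key!r}]")
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            yield from find_control_chars(value, f"{path}[{i}]")
    elif isinstance(obj, str):
        if any(ch not in "\t\n\r" and unicodedata.category(ch) == "Cc" for ch in obj):
            yield path, repr(obj)


# Hypothetical, trimmed-down settings dictionary for illustration only.
settings = {
    "comparisons": [
        {
            "comparison_levels": [
                {"sql_condition": "l.first_name\x1b[4m\x1b[0m = r.first_name"},
                {"sql_condition": "ELSE"},
            ]
        }
    ]
}

for location, value in find_control_chars(settings):
    print(location, value)
```

Reporting the path, rather than silently cleaning the strings, makes it easy to see exactly which pasted value picked up the formatting codes.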

I've tried copy-pasting the DuckDB code, and it works fine on my end.

Are you able to post a reproducible example using the Spark linker where this error occurs?