mattjbishop / splink-on-fabric

MoJ Splink on Microsoft Fabric

splink-on-fabric

An experiment to get the MoJ Splink Spark demo(s) running on Microsoft Fabric.

Notebooks:

Running Splink on Fabric

The best way to get Splink running is to make use of the new Environments feature in Fabric.

Note

Environments are a preview feature and may change before general release (as of 2024-03-05).

To use Splink in this demo, you need to:

  1. Upload the similarity UDF jar into the Lakehouse that you are using.
  2. In your environment, add a "spark.jars" Spark property that points to the jar file. Use an ABFS path to point to the file, e.g. abfss://00000000-0000-0000-0000-000000000000@onelake.dfs.fabric.microsoft.com/00000000-0000-0000-0000-000000000000/Files/scala-udf-similarity-0.1.1_spark3.x.jar

     Note: You can get the correct ABFS path for your file by right-clicking on it in your Lakehouse file listing and selecting "Copy ABFS Path".

  3. In your environment, add Splink as a public library from PyPI.
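
The shape of the ABFS path used for the "spark.jars" property can be sketched in Python. This is a minimal illustration, not part of the repo: `build_abfs_path` and `jar_is_listed` are hypothetical helper names, and it assumes the first GUID in the example path is the workspace ID and the second is the Lakehouse ID. Copying the path from the Lakehouse UI remains the safer route.

```python
import re

# Host taken from the example path in the steps above.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"
_GUID = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


def build_abfs_path(workspace_id: str, lakehouse_id: str, file_name: str) -> str:
    """Build an ABFS path in the shape shown in the steps (hypothetical helper).

    Assumption: the GUID before '@' is the workspace ID and the GUID after
    the host is the Lakehouse ID.
    """
    for label, value in (("workspace_id", workspace_id), ("lakehouse_id", lakehouse_id)):
        if not _GUID.match(value):
            raise ValueError(f"{label} does not look like a GUID: {value!r}")
    return f"abfss://{workspace_id}@{ONELAKE_HOST}/{lakehouse_id}/Files/{file_name}"


def jar_is_listed(spark_jars_value: str, jar_name: str) -> bool:
    """Return True if jar_name is one of the comma-separated spark.jars entries."""
    entries = [e.strip() for e in spark_jars_value.split(",") if e.strip()]
    return any(e.rsplit("/", 1)[-1] == jar_name for e in entries)


# In a notebook attached to the environment, the value passed to jar_is_listed
# would typically come from spark.conf.get("spark.jars") (assumed to be readable
# in your session).
```

In a notebook, `jar_is_listed` offers a quick sanity check that the environment picked up the jar before any Splink code runs.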

License: MIT License
