moj-analytical-services / splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

Home page: https://moj-analytical-services.github.io/splink/

[Splink 4] Move key user-facing classes/functions to `__init__.py`?

RobinL opened this issue

If `__init__.py` contains:

__version__ = "4.0.0.dev2"

from splink.blocking_rule_library import block_on
from splink.datasets import splink_datasets
from splink.linker import Linker
from splink.settings_creator import SettingsCreator

from splink.sqlite.database_api import SQLiteAPI

# Backends wrapped in try/except are silently skipped if their dependencies are missing
try:
    from splink.duckdb.database_api import DuckDBAPI
except ImportError:
    pass

try:
    from splink.postgres.database_api import PostgresAPI
except ImportError:
    pass

try:
    from splink.spark.database_api import SparkAPI
except ImportError:
    pass

We could then write:

from splink import DuckDBAPI, Linker, SettingsCreator, block_on, splink_datasets
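The sketch below shows what the re-export pattern does without installing Splink. It writes a throwaway package to a temporary directory and imports it. The package name `demo_pkg`, its modules, and the missing dependency `not_a_real_dependency_xyz` are hypothetical stand-ins for `splink`, its submodules, and an uninstalled backend such as pyspark. The sketch checks two things. First, the top-level `Linker` should be the same object as the submodule's. Second, importing the silently skipped backend should give only Python's generic error.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical __init__.py mirroring the proposal above
INIT_SOURCE = textwrap.dedent("""\
    # Re-export a core class at the top level of the package
    from demo_pkg.linker import Linker

    # Optional backend: silently skipped if its dependency is missing
    try:
        from demo_pkg.spark_api import SparkAPI
    except ImportError:
        pass
""")


def check_reexports() -> tuple[bool, str]:
    """Return (top-level Linker is the submodule's Linker, error from importing SparkAPI)."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text(INIT_SOURCE)
        (pkg / "linker.py").write_text("class Linker:\n    pass\n")
        # Hypothetical backend whose third-party dependency is not installed
        (pkg / "spark_api.py").write_text(
            "import not_a_real_dependency_xyz\n\nclass SparkAPI:\n    pass\n"
        )
        sys.path.insert(0, tmp)
        try:
            demo_pkg = importlib.import_module("demo_pkg")
            linker_mod = importlib.import_module("demo_pkg.linker")
            same_object = demo_pkg.Linker is linker_mod.Linker
            error = ""
            try:
                from demo_pkg import SparkAPI  # noqa: F401
            except ImportError as exc:
                error = str(exc)
            return same_object, error
        finally:
            sys.path.remove(tmp)
            # Forget the temporary modules so the demo can be re-run cleanly
            for name in [m for m in sys.modules if m.split(".")[0] == "demo_pkg"]:
                del sys.modules[name]


same, error = check_reexports()
print(same)
print(error)
```

The user-facing import is shorter, and the class is still the same object defined in the submodule. The cost is that a missing optional backend surfaces only as a generic "cannot import name" error.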

Alternatively, you can define a module-level `__getattr__` (PEP 562) in `__init__.py`:

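# Called only when normal attribute lookup on the module fails
def __getattr__(name):
    if name == "SparkAPI":
        try:
            from splink.spark.database_api import SparkAPI
        except ImportError as e:
            # Replace the generic error with an actionable message,
            # chaining the original exception for debugging
            raise ImportError(
                "SparkAPI cannot be imported because its dependencies are not "
                "installed. Please `pip install pyspark`."
            ) from e
        return SparkAPI
    # Preserve the standard behaviour for any other missing name
    raise AttributeError(f"module 'splink' has no attribute '{name}'")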

This is useful if you want a custom error message raised at the point where the user runs `from splink import SparkAPI`.

Otherwise, the user would just get `ImportError: cannot import name 'SparkAPI' from 'splink'`.
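Here is a comparable sketch of the `__getattr__` approach. It again uses a hypothetical throwaway package (`lazy_pkg`) and an uninstalled stand-in dependency rather than Splink itself. Because `__getattr__` raises `ImportError` rather than `AttributeError`, the custom message should propagate through `from lazy_pkg import SparkAPI`. Unknown names should still produce the standard `AttributeError`.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical __init__.py using a module-level __getattr__ (PEP 562)
INIT_SOURCE = textwrap.dedent('''\
    def __getattr__(name):
        # Called only when normal attribute lookup on the package fails
        if name == "SparkAPI":
            try:
                from lazy_pkg.spark_api import SparkAPI
            except ImportError as exc:
                raise ImportError(
                    "SparkAPI cannot be imported because its dependencies are not "
                    "installed. Please `pip install pyspark`."
                ) from exc
            return SparkAPI
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
''')


def error_messages() -> tuple[str, str]:
    """Return the messages raised for a missing optional backend and for an unknown name."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "lazy_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text(INIT_SOURCE)
        # Hypothetical backend whose third-party dependency is not installed
        (pkg / "spark_api.py").write_text(
            "import not_a_real_dependency_xyz\n\nclass SparkAPI:\n    pass\n"
        )
        sys.path.insert(0, tmp)
        backend_msg = unknown_msg = ""
        try:
            try:
                from lazy_pkg import SparkAPI  # noqa: F401
            except ImportError as exc:
                backend_msg = str(exc)
            lazy_pkg = importlib.import_module("lazy_pkg")
            try:
                getattr(lazy_pkg, "NotAThing")
            except AttributeError as exc:
                unknown_msg = str(exc)
            return backend_msg, unknown_msg
        finally:
            sys.path.remove(tmp)
            # Forget the temporary modules so the demo can be re-run cleanly
            for name in [m for m in sys.modules if m.split(".")[0] == "lazy_pkg"]:
                del sys.modules[name]


backend_msg, unknown_msg = error_messages()
print(backend_msg)
print(unknown_msg)
```

This approach has a trade-off. Names resolved through `__getattr__` are not stored in the module's namespace, so they do not show up in `dir(splink)` unless a module-level `__dir__` is also defined.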