moj-analytical-services / splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

Home Page: https://moj-analytical-services.github.io/splink/

[FEAT] Rename cols in graph metric tables

zslade opened this issue

Is your proposal related to a problem?

Describe the solution you'd like

I think some of the column names in the graph metrics tables (generated by linker.compute_graph_metrics()) should be renamed to make them more descriptive and to align them with the documentation on metrics.

For example, the n_nodes column in the cluster metrics table (i.e. the count of nodes in a cluster) is referred to as 'cluster size' in the documentation. Renaming the column to cluster_size would be more consistent. Likewise, it may be a good idea to rename the density and centralisation columns to cluster_density and cluster_centralisation respectively, to make it clear that these metrics quantify characteristics of clusters (rather than, say, edges).
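To make the proposal concrete, here is a minimal sketch of the renaming applied to rows shaped like the cluster metrics table. It is plain Python rather than Splink code: the `PROPOSED_RENAMES` mapping and `rename_columns` helper are hypothetical names used for illustration, and the `cluster_id` column and the row values are assumed, not taken from real output.

```python
# Hypothetical mapping from the current column names to the proposed ones.
PROPOSED_RENAMES = {
    "n_nodes": "cluster_size",
    "density": "cluster_density",
    "centralisation": "cluster_centralisation",
}


def rename_columns(rows, renames=PROPOSED_RENAMES):
    """Return copies of the rows with keys renamed.

    Columns that are not in the mapping are left untouched.
    """
    return [{renames.get(key, key): value for key, value in row.items()} for row in rows]


# Illustrative row only: "cluster_id" and the values are assumptions.
cluster_metrics = [
    {"cluster_id": "a", "n_nodes": 3, "density": 1.0, "centralisation": 0.0},
]

renamed = rename_columns(cluster_metrics)
print(list(renamed[0].keys()))
```

A mapping like this could also serve as a reference for users updating downstream code that still reads the old column names.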

Describe alternatives you've considered

Additional context