graspologic-org / graspologic

Python package for graph statistics

Home Page: https://graspologic-org.github.io/graspologic/

[BUG] Leiden on integer node ids and starting_communities

daxpryce opened this issue

If I have a graph whose node ids are of type int, the resulting partition map should also be of type int -> int. If I then pass this partition map to leiden as starting_communities, the call fails because the node ids in the partition map are not of type str.

Example Code

import unittest

import networkx as nx
from graspologic.partition import leiden


class Example(unittest.TestCase):
    def test_hashable_nonstr_with_starting_communities(self):
        seed = 1234
        # erdos_renyi_graph labels its nodes with ints (0..n-1)
        first_graph = nx.erdos_renyi_graph(20, 0.4, seed=seed)
        second_graph = nx.erdos_renyi_graph(21, 0.4, seed=seed)
        third_graph = nx.erdos_renyi_graph(19, 0.4, seed=seed)

        first_partitions = leiden(first_graph)
        # Feeding the int -> int partition map back in raises the TypeError below
        second_partitions = leiden(second_graph, starting_communities=first_partitions)
        third_partitions = leiden(third_graph, starting_communities=second_partitions)

Full Traceback

self = <test_leiden.TestLeiden testMethod=test_hashable_nonstr_with_starting_communities>

    def test_hashable_nonstr_with_starting_communities(self):
        seed = 1234
        first_graph = nx.erdos_renyi_graph(20, 0.4, seed=seed)
        second_graph = nx.erdos_renyi_graph(21, 0.4, seed=seed)
        third_graph = nx.erdos_renyi_graph(19, 0.4, seed=seed)

        first_partitions = leiden(first_graph)
>       second_partitions = leiden(second_graph, starting_communities=first_partitions)

tests/partition/test_leiden.py:213:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

graph = <networkx.classes.graph.Graph object at 0x124667c40>, starting_communities = {0: 1, 1: 2, 2: 0, 3: 1, ...}, extra_forced_iterations = 0
resolution = 1.0, randomness = 0.001, use_modularity = True, random_seed = None, weight_attribute = 'weight', is_weighted = None
weight_default = 1.0, check_directed = True, trials = 1

    def leiden(
        graph: Union[
            List[Tuple[Any, Any, Union[int, float]]],
            nx.Graph,
            np.ndarray,
            scipy.sparse.csr.csr_matrix,
        ],
        starting_communities: Optional[Dict[str, int]] = None,
        extra_forced_iterations: int = 0,
        resolution: float = 1.0,
        randomness: float = 0.001,
        use_modularity: bool = True,
        random_seed: Optional[int] = None,
        weight_attribute: str = "weight",
        is_weighted: Optional[bool] = None,
        weight_default: float = 1.0,
        check_directed: bool = True,
        trials: int = 1,
    ) -> Dict[str, int]:
... PYDOC OMITTED FOR SOME SEMBLANCE OF BREVITY ...
        _validate_common_arguments(
            starting_communities,
            extra_forced_iterations,
            resolution,
            randomness,
            use_modularity,
            random_seed,
            is_weighted,
            weight_default,
            check_directed,
        )
        if not isinstance(trials, int):
            raise TypeError("trials must be a positive integer")
        if trials < 1:
            raise ValueError("trials must be a positive integer")
        node_id_mapping, edges = _validate_and_build_edge_list(
            graph, is_weighted, weight_attribute, check_directed, weight_default
        )

>       _modularity, partitions = gn.leiden(
            edges=edges,
            starting_communities=starting_communities,
            resolution=resolution,
            randomness=randomness,
            iterations=extra_forced_iterations + 1,
            use_modularity=use_modularity,
            seed=random_seed,
            trials=trials,
        )
E       TypeError: argument 'starting_communities': 'int' object cannot be converted to 'PyString'

graspologic/partition/leiden.py:340: TypeError

Additional Details

We map the graph's node ids to string versions before calling the native code, then re-reference the original node ids on the way back out for both the graph and the returned partitions, but we never do the same for starting_communities.
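
The asymmetry can be reproduced without graspologic at all. The following is a toy sketch, not the library's code: fake_native_leiden and wrapper_leiden are hypothetical stand-ins, str() is assumed as the id-to-string conversion, and the "partition" returned is a trivial placeholder.

```python
from typing import Any, Dict, List, Optional, Tuple


def fake_native_leiden(
    edges: List[Tuple[str, str, float]],
    starting_communities: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Hypothetical stand-in for the native call: it only accepts str node ids."""
    if starting_communities is not None:
        for node_id in starting_communities:
            if not isinstance(node_id, str):
                raise TypeError("starting_communities keys must be str")
    # Placeholder result: every node lands in community 0.
    nodes = {node for source, target, _ in edges for node in (source, target)}
    return {node: 0 for node in nodes}


def wrapper_leiden(
    edges: List[Tuple[Any, Any, float]],
    starting_communities: Optional[Dict[Any, int]] = None,
) -> Dict[Any, int]:
    """Hypothetical wrapper that mirrors the asymmetry described above."""
    to_original: Dict[str, Any] = {}
    str_edges: List[Tuple[str, str, float]] = []
    for source, target, weight in edges:
        # Graph node ids are stringified on the way in...
        to_original[str(source)] = source
        to_original[str(target)] = target
        str_edges.append((str(source), str(target), weight))
    # ...but starting_communities is passed through untouched.
    native = fake_native_leiden(str_edges, starting_communities)
    # Returned partitions are mapped back to the original node ids.
    return {to_original[node]: community for node, community in native.items()}
```

In this sketch the first call returns an int -> int map, and passing that map back in as starting_communities raises a TypeError, which is the same round trip that fails in the test above.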

A proper fix is to refactor the mapping into a separate object that we can use for the provided graph, the returned partitions, and the starting_communities partition map.
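
One possible shape for such an object is sketched below. It is an illustration only: the NodeIdMapper name, its encode/decode methods, the use of str() for conversion, and the choice to drop starting-community entries for nodes that are not in the graph are all assumptions, not the actual graspologic implementation.

```python
from typing import Dict, Hashable, Iterable


class NodeIdMapper:
    """Hypothetical two-way map between original node ids and str ids."""

    def __init__(self, node_ids: Iterable[Hashable]) -> None:
        self._to_str: Dict[Hashable, str] = {}
        self._to_original: Dict[str, Hashable] = {}
        for node_id in node_ids:
            as_str = str(node_id)
            # Distinct ids such as 1 and "1" would collide once stringified.
            if as_str in self._to_original and self._to_original[as_str] != node_id:
                raise ValueError(f"node ids collide as strings: {as_str!r}")
            self._to_str[node_id] = as_str
            self._to_original[as_str] = node_id

    def encode(self, partitions: Dict[Hashable, int]) -> Dict[str, int]:
        """Original ids -> str ids, e.g. for starting_communities.

        Assumption: entries for nodes unknown to this graph are dropped.
        """
        return {
            self._to_str[node_id]: community
            for node_id, community in partitions.items()
            if node_id in self._to_str
        }

    def decode(self, partitions: Dict[str, int]) -> Dict[Hashable, int]:
        """Str ids -> original ids, e.g. for partitions returned by native code."""
        return {
            self._to_original[node_id]: community
            for node_id, community in partitions.items()
        }
```

With a mapper built from the graph's nodes, the wrapper could call encode on starting_communities before the native call and decode on the result afterwards, so the same mapping serves all three uses. Whether unknown nodes should be dropped or rejected is a design decision this sketch does not settle.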