[BUG] Leiden on integer node ids and starting_communities
daxpryce opened this issue
If I have a graph where the node ids are of type int, the resulting partition map should also be of type int->int. If I use this partition map when I call leiden with starting_communities, it fails because the node id in the partition map is not of type str.
Example Code
import unittest

import networkx as nx
from graspologic.partition import leiden


class Example(unittest.TestCase):
    def test_hashable_nonstr_with_starting_communities(self):
        seed = 1234
        first_graph = nx.erdos_renyi_graph(20, 0.4, seed=seed)
        second_graph = nx.erdos_renyi_graph(21, 0.4, seed=seed)
        third_graph = nx.erdos_renyi_graph(19, 0.4, seed=seed)
        # Node ids are ints, so each partition map is int -> int.
        first_partitions = leiden(first_graph)
        # Fails here: starting_communities keys are int, not str.
        second_partitions = leiden(second_graph, starting_communities=first_partitions)
        third_partitions = leiden(third_graph, starting_communities=second_partitions)
Full Traceback
self = <test_leiden.TestLeiden testMethod=test_hashable_nonstr_with_starting_communities>
    def test_hashable_nonstr_with_starting_communities(self):
        seed = 1234
        first_graph = nx.erdos_renyi_graph(20, 0.4, seed=seed)
        second_graph = nx.erdos_renyi_graph(21, 0.4, seed=seed)
        third_graph = nx.erdos_renyi_graph(19, 0.4, seed=seed)
        first_partitions = leiden(first_graph)
>       second_partitions = leiden(second_graph, starting_communities=first_partitions)
tests/partition/test_leiden.py:213:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
graph = <networkx.classes.graph.Graph object at 0x124667c40>, starting_communities = {0: 1, 1: 2, 2: 0, 3: 1, ...}, extra_forced_iterations = 0
resolution = 1.0, randomness = 0.001, use_modularity = True, random_seed = None, weight_attribute = 'weight', is_weighted = None
weight_default = 1.0, check_directed = True, trials = 1
    def leiden(
        graph: Union[
            List[Tuple[Any, Any, Union[int, float]]],
            nx.Graph,
            np.ndarray,
            scipy.sparse.csr.csr_matrix,
        ],
        starting_communities: Optional[Dict[str, int]] = None,
        extra_forced_iterations: int = 0,
        resolution: float = 1.0,
        randomness: float = 0.001,
        use_modularity: bool = True,
        random_seed: Optional[int] = None,
        weight_attribute: str = "weight",
        is_weighted: Optional[bool] = None,
        weight_default: float = 1.0,
        check_directed: bool = True,
        trials: int = 1,
    ) -> Dict[str, int]:
        ... PYDOC OMITTED FOR SOME SEMBLANCE OF BREVITY ...
        _validate_common_arguments(
            starting_communities,
            extra_forced_iterations,
            resolution,
            randomness,
            use_modularity,
            random_seed,
            is_weighted,
            weight_default,
            check_directed,
        )
        if not isinstance(trials, int):
            raise TypeError("trials must be a positive integer")
        if trials < 1:
            raise ValueError("trials must be a positive integer")
        node_id_mapping, edges = _validate_and_build_edge_list(
            graph, is_weighted, weight_attribute, check_directed, weight_default
        )
>       _modularity, partitions = gn.leiden(
            edges=edges,
            starting_communities=starting_communities,
            resolution=resolution,
            randomness=randomness,
            iterations=extra_forced_iterations + 1,
            use_modularity=use_modularity,
            seed=random_seed,
            trials=trials,
        )
E TypeError: argument 'starting_communities': 'int' object cannot be converted to 'PyString'
graspologic/partition/leiden.py:340: TypeError
Additional Details
We map the graph's node ids to string versions before calling the native code, then map them back to the original node ids on the way out, for both the graph and the returned partitions. We never do the same for starting_communities, so its keys reach the native code with their original types.
A proper fix is to refactor the mapping into a separate object that we can use for the provided graph, the return partitions, as well as the starting communities partition map.
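To make the proposed refactor concrete, here is a minimal sketch of what such a mapping object could look like. The class name NodeIdMapper and its methods are hypothetical and are not part of graspologic. The choice to drop starting_communities entries for nodes that are absent from the graph is also an assumption, made here because the example above passes partitions from a 21-node graph into a 19-node graph.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


class NodeIdMapper:
    """Hypothetical helper: a two-way map between original node ids and str ids."""

    def __init__(self) -> None:
        self._to_str: Dict[Hashable, str] = {}
        self._to_original: Dict[str, Hashable] = {}

    def to_str(self, node: Hashable) -> str:
        # Assign a fresh string id the first time a node is seen.
        if node not in self._to_str:
            str_id = str(len(self._to_str))
            self._to_str[node] = str_id
            self._to_original[str_id] = node
        return self._to_str[node]

    def map_edges(
        self, edges: Iterable[Tuple[Hashable, Hashable, float]]
    ) -> List[Tuple[str, str, float]]:
        # Convert an edge list to the str-keyed form a native call would expect.
        return [(self.to_str(s), self.to_str(t), float(w)) for s, t, w in edges]

    def map_communities(self, communities: Dict[Hashable, int]) -> Dict[str, int]:
        # Assumption: entries for nodes not present in the graph are dropped.
        return {
            self._to_str[node]: community
            for node, community in communities.items()
            if node in self._to_str
        }

    def restore(self, partitions: Dict[str, int]) -> Dict[Hashable, int]:
        # Map str-keyed results back to the caller's original node ids.
        return {self._to_original[k]: v for k, v in partitions.items()}


mapper = NodeIdMapper()
edges = mapper.map_edges([(0, 1, 1.0), (1, 2, 1.0)])
starting = mapper.map_communities({0: 1, 1: 2, 2: 0, 99: 5})
restored = mapper.restore({"0": 3, "1": 3, "2": 4})
```

With one object owning both directions of the mapping, the graph, starting_communities, and the returned partitions all go through the same conversion, so int node ids never reach the native call.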