SINr-Embeddings / sinr

The SINr approach to train interpretable word and graph embeddings

Community with one disconnected node

nicolasdugue opened this issue

Describe the bug
If the graph contains a community made of a single node that is disconnected from the rest of the graph, the embedding of that node is nan.
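
A quick way to confirm the symptom is to scan the embedding vectors for nan entries. The sketch below assumes the embeddings are available as a plain mapping from word to list of floats; that layout, the function name words_with_nan and the toy data are hypothetical and are not part of the SINr API.

```python
import math


def words_with_nan(embeddings):
    """Return the keys whose vector contains at least one nan entry."""
    return [
        word
        for word, vec in embeddings.items()
        if any(math.isnan(x) for x in vec)
    ]


# Hypothetical toy data: "isolated" stands for the node that sits alone
# in its community and is disconnected from the rest of the graph.
toy = {
    "cat": [0.2, 0.8],
    "dog": [0.5, 0.5],
    "isolated": [float("nan"), float("nan")],
}

print(words_with_nan(toy))
```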

To Reproduce

  1. Build a graph in which one node is alone in its community and disconnected from the rest of the graph.
  2. The node should appear in sinr.out_of_LgCC.
  3. If the isolated word is in one of the similarity datasets, running the similarity evaluation raises a warning such as RuntimeWarning: invalid value encountered in double_scalars, reported on the line cosine_sim.append(np.dot(vec1,vec2)/(norm(vec1)*norm(vec2))).
  4. get_out_of_LgCC_coms has a bug. It can be corrected as follows, and the community it then reports should be the one containing only the isolated node:
    def get_out_of_LgCC_coms(self, communities):
        """Get communities that are not in the Largest Connected Component (LgCC).

        :param communities: Partition object of the communities as obtained by calling a Networkit community detection algorithm
        :type communities: Partition
        :returns: Indices of the communities outside the LgCC
        :rtype: list[int]

        """
        out_of_LgCC_coms = []
        for com in communities.getSubsetIds():
            members = set(communities.getMembers(com))
            intersection = members.intersection(self.out_of_LgCC)
            if len(intersection) > 0:
                print(intersection)
                out_of_LgCC_coms.append(com)
        return out_of_LgCC_coms
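
The intersection logic of the corrected method can be tried without Networkit. The sketch below is a standalone illustration, not the SINr code: ToyPartition is a hypothetical stand-in that exposes only getSubsetIds and getMembers, the two Partition methods the method above calls, and out_of_lgcc_coms is a free-function variant that takes the out-of-LgCC nodes as an argument instead of reading self.out_of_LgCC.

```python
class ToyPartition:
    """Hypothetical stand-in for a Networkit Partition (two methods only)."""

    def __init__(self, node_to_com):
        # Group nodes by community id.
        self._coms = {}
        for node, com in node_to_com.items():
            self._coms.setdefault(com, set()).add(node)

    def getSubsetIds(self):
        return set(self._coms)

    def getMembers(self, com):
        return set(self._coms[com])


def out_of_lgcc_coms(communities, out_of_lgcc):
    """Ids of communities with at least one member outside the LgCC."""
    outside = set(out_of_lgcc)
    return [
        com
        for com in communities.getSubsetIds()
        if outside & set(communities.getMembers(com))
    ]


# Nodes 0-3 form two communities inside the LgCC; node 4 is isolated
# and sits alone in community 2.
partition = ToyPartition({0: 0, 1: 0, 2: 1, 3: 1, 4: 2})
result = out_of_lgcc_coms(partition, out_of_lgcc=[4])
print(result)
```

Only the community holding the isolated node is returned, which matches the expectation stated in step 4.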

Behavior expected
The connected components of the graph should be examined, and components containing only one node should be removed. Such nodes probably have degree 0, so they can be removed safely.
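
The expected behavior can be sketched on a plain adjacency mapping. This is a library-agnostic illustration under stated assumptions: the graph is an undirected dict from node to set of neighbors, and the function names connected_components and drop_singleton_components are hypothetical. It does not use Networkit and is not the fix adopted in SINr.

```python
def connected_components(adjacency):
    """Return the connected components of an undirected graph as sets."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal from an unvisited node.
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neigh in adjacency[node]:
                if neigh not in seen:
                    seen.add(neigh)
                    stack.append(neigh)
        components.append(component)
    return components


def drop_singleton_components(adjacency):
    """Return a copy of the graph without single-node components."""
    keep = set()
    for component in connected_components(adjacency):
        if len(component) > 1:
            keep |= component
    return {
        node: {m for m in neighbors if m in keep}
        for node, neighbors in adjacency.items()
        if node in keep
    }


# Node 3 has degree 0 and forms a component on its own.
graph = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
cleaned = drop_singleton_components(graph)
print(cleaned)
```

Filtering on component size rather than on degree alone also covers a node whose only edge is a self-loop, which has a nonzero degree but is still isolated from the rest of the graph.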

Screenshots
Screenshot from 2024-06-03 11-26-31