graspologic-org / graspologic

Python package for graph statistics

Home Page: https://graspologic-org.github.io/graspologic/

It feels like the matrix being embedded, or at least its singular values, should be stored by ASE/LSE

ebridge2 opened this issue · comments

Is your feature request related to a problem? Please describe.

ZG2 is not perfect and sometimes goes awry. It seems reasonable to at least expose the singular values to users, so they can judge for themselves whether the cutoff ZG2 chose was sensible. Alternatively, since the Laplacian is an internal calculation, one could simply set self.L_norm = L_norm after computing the Laplacian of the adjacency matrix, and let the user do whatever they want with the representation of the network before it is embedded.

@ebridge2 singular values are stored as an attribute: https://github.com/microsoft/graspologic/blob/ff34382d1ffa0b7ea5f0e005525b7364f977e86f/graspologic/embed/ase.py#L83

unsure how i feel about storing the matrix - somewhat opposed as it requires that object to carry around a (potentially large) array

I meant for all of the dimensions -- it seems like one has no way to decide how well ZG2 did otherwise post-hoc; e.g., you can't actually produce a full scree plot, which is a normal step for dimensionality reduction.
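For reference, a minimal sketch of what a full scree needs: every singular value of the matrix, not only the leading ones. It uses plain NumPy on a small toy graph; `full_scree` is a hypothetical helper name, not part of graspologic.

```python
import numpy as np


def full_scree(adjacency):
    """Return every singular value of `adjacency`, largest first."""
    # compute_uv=False skips the singular vectors, but this is still a
    # full dense decomposition, so it only suits modestly sized graphs.
    return np.linalg.svd(np.asarray(adjacency, dtype=float), compute_uv=False)


# Toy undirected graph: two dense 10-node blocks, sparse between blocks.
rng = np.random.default_rng(0)
n = 20
labels = np.repeat([0, 1], n // 2)
probs = np.where(labels[:, None] == labels[None, :], 0.8, 0.1)
upper = np.triu(rng.random((n, n)) < probs, k=1)
A = (upper + upper.T).astype(float)  # symmetric, zero diagonal

scree = full_scree(A)
for rank, value in enumerate(scree[:5], start=1):
    print(f"{rank}: {value:.3f}")
```

Plotting `scree` against its index gives the scree plot. The cost of the full decomposition is the trade-off in question.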

part of the issue with that would be doing a full SVD by default, then?

Very good call. However, I think there are a lot of tricks for getting "surrogates", at least for what I imagine to be the most common case (e.g., a weighted, undirected adjacency matrix). I think you can reasonably use the fact that the matrix X being embedded (a weighted, undirected adjacency matrix with diagonal aug, or its Laplacian) is PSD: its singular values are then its eigenvalues, which sum to the trace of X. So you can compute the proportion of the singular values retained without a full SVD, as the sum of the top k singular values (where k is the ZG2 choice) divided by the trace of X. If this value is big, it's pretty clear your elbow was probably not losing much information from the embedded matrix. I don't work with directed matrices, so it's unclear to me how common that situation is.
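A rough sketch of that ratio, under the assumption that X really is symmetric PSD (if X has negative eigenvalues, the trace is no longer the sum of the singular values and the ratio stops being a proportion). The name `top_k_trace_proportion` is hypothetical, and the toy X is made PSD by construction rather than taken from a real graph.

```python
import numpy as np


def top_k_trace_proportion(X, k):
    """Sum of the k largest singular values of X divided by trace(X).

    Only a meaningful proportion when X is symmetric PSD.
    """
    X = np.asarray(X, dtype=float)
    # A full SVD is used here for clarity only; the point of the surrogate
    # is that a truncated solver can supply the top k values instead.
    top_k = np.linalg.svd(X, compute_uv=False)[:k]
    return top_k.sum() / np.trace(X)


# PSD by construction: a rank-3 signal plus a small PSD perturbation.
rng = np.random.default_rng(0)
B = rng.normal(size=(30, 3))
noise = 0.05 * rng.normal(size=(30, 30))
X = B @ B.T + noise @ noise.T

p3 = top_k_trace_proportion(X, k=3)
p_all = top_k_trace_proportion(X, k=30)
print(f"top-3 proportion: {p3:.3f}")
```

Here the first three singular values carry almost all of the trace, which is the "big value" case described above. A small ratio would suggest the elbow cut off a meaningful part of the spectrum.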