jvwilliams23 / pyssam

A Python library for biomedical statistical shape and appearance modelling.

Home Page: https://pyssam.readthedocs.io/

How to explain the meaning of "ssm_obj.pca_model_components"

fengyasi opened this issue · comments

Hello Williams,
Sorry to disturb you.
I'd like to ask how to understand the meaning of "ssm_obj.pca_model_components". If I have 20 landmark files and I want to get the first three eigenvectors of the covariance matrix, can these three eigenvectors represent the information of the above 20 landmark files?
Best wishes!

Hi Feng,
Thanks for raising this and using our library!
Yes, you are correct: pca_model_components represents the eigenvectors of the covariance matrix. This is inherited from the StatisticalModelBase class, in the function create_pca_model (see statistical_model_base.py):

    # perform principal component analysis to train shape model
    self.pca_object, self.required_mode_number = self.do_pca(
      dataset, desired_variance
    )
    # get principal components (eigenvectors) and variances (eigenvalues)
    self.pca_model_components = self.pca_object.components_
    self.variance = self.pca_object.explained_variance_

Building on the jupyter notebook tutorial, you would get the three leading eigenvectors using:

eigenvectors_first_three = ssm_obj.pca_model_components[:3]

Hope that helps?

Best,
Josh

Thanks for your kind reply.
I have also tried another method, morphomatics (https://github.com/morphomatics/morphomatics); the code is:

import pyvista as pv
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from morphomatics.geom import Surface
from morphomatics.stats import StatisticalShapeModel
from morphomatics.manifold import FundamentalCoords

# import keras
# load surfaces
# mesh1 = pv.read('b1-1.obj')  # reference shape
# mesh2 = pv.read('Wrapping2.obj')  # Mapping shape 1, format should be .obj?

mesh1 = pv.read('testAAANRR001.obj')  # reference shape
mesh2 = pv.read('testAAANRR002.obj')  # Mapping shape 1, format should be .obj?
mesh3 = pv.read('testAAANRR003.obj')  # Mapping shape 2, format should be .obj?

meshes = [mesh1, mesh2, mesh3]

# show
pl = pv.Plotter(shape=(1, len(meshes)))
for i in range(len(meshes)):
    pl.subplot(0, i)
    pl.add_mesh(meshes[i])
    pl.view_xz()
    pl.camera.roll += 0
    pl.camera.zoom(2)
# pl.show()
# print(type(meshes))

# get the point cloud data
# to Surface type

as_surface = lambda mesh: Surface(mesh.points, mesh.faces.reshape(-1, 4)[:, 1:])  # method
surfaces = [as_surface(m) for m in meshes]

# construct model
SSM = StatisticalShapeModel(lambda ref: FundamentalCoords(ref))  # can this be modified?
# print(SSM)
SSM.construct(surfaces)
data = SSM.coeffs
print(data)

# show mean
pl = pv.Plotter()
pl.add_mesh(pv.PolyData(SSM.mean.v, meshes[0].faces))
pl.view_xz()
pl.camera.roll += 0
# pl.show(window_size=(840,400))

print(SSM.modes.shape)  # its modes of variation
print(SSM.variances.shape)  # its per-mode variances (understood as the eigenvalues of the covariance matrix)
print(SSM.coeffs.shape)  # its shape coefficients, uniquely determining all input shapes (understood as the eigenvectors of the covariance matrix)

SSM.coeffs appears to be the eigenvectors of the covariance matrix, which is an n*(n-1) matrix. I think SSM.coeffs is different from ssm_obj.pca_model_components, which I found when I tried the code in the tutorial:

import pyssam
from copy import copy
import matplotlib.pyplot as plt
import numpy as np
from glob import glob

LANDMARK_DIR = "D:\PengChen\DeepLearning\ssm\pyssam-main\example_data\lung_landmarks"

landmark_files = glob(LANDMARK_DIR + "/landmarks*.csv")
if len(landmark_files) == 0:
    raise AssertionError(
        "The directories you have declared are empty.",
        "\nPlease check your input arguments.",
    )

print(len(landmark_files))
landmark_coordinates = np.array(
    [np.loadtxt(l, delimiter=",") for l in landmark_files]
)
print(landmark_coordinates.shape)

fig = plt.figure()
ax = plt.axes(projection="3d")  # 3D axes
for i in range(len(landmark_coordinates)):
    ax.scatter3D(landmark_coordinates[i, :, 0], landmark_coordinates[i, :, 1], landmark_coordinates[i, :, 2],
                 c='g', marker='*')

plt.show()

for i, data in enumerate(landmark_coordinates):
    print(f"New data {i + 1} 坐标点数量: {data.shape[0]}")

ssm_obj = pyssam.SSM(landmark_coordinates)
ssm_obj.create_pca_model(ssm_obj.landmarks_columns_scale)
mean_shape_columnvector = ssm_obj.compute_dataset_mean()
mean_shape = mean_shape_columnvector.reshape(-1, 3)
shape_model_components = ssm_obj.pca_model_components
print(shape_model_components.shape)
# (48, 18018), where 18018 = 6006*3 and 6006 is the number of landmarks;
# does this mean that these 48 samples have 18018 eigenvectors?


def plot_cumulative_variance(explained_variance, target_variance=-1):
    number_of_components = np.arange(0, len(explained_variance)) + 1
    fig, ax = plt.subplots(1, 1)
    color = "blue"
    ax.plot(number_of_components, explained_variance * 100.0, marker="o", ms=2, color=color, mec=color, mfc=color)
    if target_variance > 0.0:
        ax.axhline(target_variance * 100.0)

    ax.set_ylabel("Variance [%]")
    ax.set_xlabel("Number of components")
    ax.grid(axis="x")
    plt.show()


def plot_shape_modes(
        mean_shape_columnvector,
        mean_shape,
        original_shape_parameter_vector,
        shape_model_components,
):
    weights = [-2, 0, 2]
    fig, ax = plt.subplots(1, 3)
    for j, weights_i in enumerate(weights):
        shape_parameter_vector = copy(original_shape_parameter_vector)
        shape_parameter_vector[mode_to_plot] = weights_i
        mode_i_coords = ssm_obj.morph_model(
            mean_shape_columnvector,
            shape_model_components,
            shape_parameter_vector
        ).reshape(-1, 3)

        offset_dist = pyssam.utils.euclidean_distance(
            mean_shape,
            mode_i_coords
        )
        # colour points blue if closer to point cloud centre than mean shape
        mean_shape_dist_from_centre = pyssam.utils.euclidean_distance(
            mean_shape,
            np.zeros(3),
        )
        mode_i_dist_from_centre = pyssam.utils.euclidean_distance(
            mode_i_coords,
            np.zeros(3),
        )
        offset_dist = np.where(
            mode_i_dist_from_centre < mean_shape_dist_from_centre,
            offset_dist * -1,
            offset_dist,
        )
        if weights_i == 0:
            ax[j].scatter(
                mode_i_coords[:, 0],
                mode_i_coords[:, 2],
                c="gray",
                s=1,
            )
            ax[j].set_title("mean shape")
        else:
            ax[j].scatter(
                mode_i_coords[:, 0],
                mode_i_coords[:, 2],
                c=offset_dist,
                cmap="seismic",
                vmin=-1,
                vmax=1,
                s=1,
            )
            ax[j].set_title(f"mode {mode_to_plot} \nweight {weights_i}")
        ax[j].axis('off')
        ax[j].margins(0, 0)
        ax[j].xaxis.set_major_locator(plt.NullLocator())
        ax[j].yaxis.set_major_locator(plt.NullLocator())

    plt.show()


print(f"To obtain {ssm_obj.desired_variance * 100}% variance, {ssm_obj.required_mode_number} modes are required")
plot_cumulative_variance(np.cumsum(ssm_obj.pca_object.explained_variance_ratio_), 0.9)

for mode_to_plot in [0, 1, 2, 15]:
    print(f"explained variance is {ssm_obj.pca_object.explained_variance_ratio_[mode_to_plot]}")
    plot_shape_modes(
        mean_shape_columnvector,
        mean_shape,
        ssm_obj.model_parameters,
        ssm_obj.pca_model_components,
    )

The shape of shape_model_components is (48, 18018), where 18018 = 6006*3 and 6006 is the number of landmarks.
Does this mean that these 48 samples have 18018 eigenvectors? I'm not sure.

Best wishes.

Hi @fengyasi.

Thanks for sharing this. I was not aware of this library!

Here is what I think is happening from some quick checks just now:
Indeed, when one does a singular value decomposition (np.linalg.svd(ssm_obj.landmarks_columns_scale)), this returns the eigenvectors as an (18018, 18018) array.
If we add the kwarg full_matrices=False (np.linalg.svd(ssm_obj.landmarks_columns_scale, full_matrices=False)), the components are a (48, 18018) array (which is the same shape as shape_model_components).
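
To make the shapes concrete, here is a minimal sketch (my own illustration, using random data as a stand-in for ssm_obj.landmarks_columns_scale, with a smaller feature dimension so it runs quickly):

import numpy as np

data = np.random.rand(48, 500)  # stand-in for the real (48, 18018) landmark matrix

# full SVD: the right singular vectors span the whole feature space
_, _, vt_full = np.linalg.svd(data)
print(vt_full.shape)  # (500, 500); with the real data this would be (18018, 18018)

# reduced SVD: only min(n_samples, n_features) components are returned,
# matching the shape of pca_model_components
_, _, vt_thin = np.linalg.svd(data, full_matrices=False)
print(vt_thin.shape)  # (48, 500); with the real data, (48, 18018)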

I can dig into it a bit more tomorrow, or perhaps at the weekend. Maybe we will need to add an option to change this behaviour to use full SVD.

Josh

Hi Josh:

I believe your method is faster because, unlike the method mentioned in my previous question, it does not require rigid and non-rigid registration. If you can modify your code so that your eigenvectors capture the global morphological features, it would be a very attractive approach.

[attached figure: example of SSM-based shape clustering]

Thanks a lot.

Hi @fengyasi,

I will look into modifying it. Our code is faster because using np.linalg.svd with full_matrices=False is over 30x faster than with full_matrices=True.
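
If you want to see the difference yourself, here is a rough benchmark sketch (my own illustration with scaled-down random data, not pyssam code):

import timeit
import numpy as np

data = np.random.rand(48, 4000)  # scaled-down stand-in for the (48, 18018) matrix

t_full = timeit.timeit(lambda: np.linalg.svd(data), number=3)
t_thin = timeit.timeit(lambda: np.linalg.svd(data, full_matrices=False), number=3)
print(f"full_matrices=True: {t_full:.2f} s; full_matrices=False: {t_thin:.2f} s")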

Also, was this figure produced using pyssam or morphomatics? Looks very nice!

Josh

Hi @fengyasi

I have created a modified SSM example jupyter notebook that uses SVD via np.linalg.svd. You can find this tutorial here (on the development branch). You will see the new functions numpy_pca(), reconstruct_with_svd() and morph_with_svd().

In this case, the eigenvector matrix has shape (18018, 18018). The singular values are obviously the same, so there is minimal change to the resultant modes. It seems the first mode is weighted more heavily with the current sklearn.PCA.
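
For anyone reading without the notebook to hand, here is a minimal sketch of PCA via full SVD (my rough outline of the idea; the numpy_pca() in the development-branch notebook is the authoritative version):

import numpy as np

def numpy_pca(dataset):
    # centre the data, then take the right singular vectors as the eigenvectors
    mean = dataset.mean(axis=0)
    _, singular_values, eigenvectors = np.linalg.svd(dataset - mean, full_matrices=True)
    # eigenvalues of the covariance matrix, recovered from the singular values
    variances = singular_values**2 / (dataset.shape[0] - 1)
    return eigenvectors, variances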

I can add an option to alternate between the sklearn.PCA and np.linalg.svd backends in the do_pca() method. Would this be of value to you? Could you please explain the benefit of this? Also, I would be happy for you to make modifications to the code and to review your pull request.

Best,
Josh

Hi @jvwilliams23

Thank you for your reply, which helped me understand your code better. The results of plot_shape_modes and dev_plot_shape_modes are close.

The image that I used yesterday (#2 (comment)) is from a reference called “Detecting Clinically Meaningful Shape Clusters in Medical Image Data: Metrics Analysis for Hierarchical Clustering applied to Healthy and Pathological Aortic Arches” by Bruse et al. In this study, the authors used the SSM to extract global features and create shape clusters that are correlated with clinical assessment. I believe your code can also achieve similar results.

Does the code now have the capability to extract the global features of different organs, such as aneurysms, the brain, etc. (for example, modes 1, 2, and 3 representing 90% of the information in the original landmark coordinate files)? I will try your new code.

Thanks again for your help!

Best wishes.

feng

Hi @fengyasi.

Yes, our code can extract shape features (as we show in the documentation). This is the most common task in the SSM field. You should just be able to replace the landmark files in the tutorial with your own landmark data.

To reduce the number of modes to e.g. the first three, see the documentation for the morph_model function (here). You can set num_modes=3 to use the eigenvectors and eigenvalues from the three most dominant modes to deform the model.
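
For example, a minimal sketch using the tutorial variables above (the weight values here are arbitrary, and my call should be checked against the morph_model docs):

import numpy as np

# weights for the three dominant modes; values should be within about +/- 3
shape_parameter_vector = np.zeros(ssm_obj.pca_model_components.shape[0])
shape_parameter_vector[:3] = [1.0, -0.5, 2.0]

new_shape = ssm_obj.morph_model(
    mean_shape_columnvector,
    ssm_obj.pca_model_components,
    shape_parameter_vector,
    num_modes=3,
).reshape(-1, 3)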

It is currently just working with point cloud coordinates. I have code to use this to produce surface files like in your figure above. I will upload it sometime next week.

Josh

Hi @jvwilliams23
Sure. If it's convenient for you: in my experience, the models need rigid registration (such as the Iterative Closest Point method) and non-rigid registration before the SSM process. I hope this is helpful.
Thanks!

Hi @jvwilliams23
If I understand your code correctly, shape_model_components (the eigenvectors of the covariance matrix, obtained by PCA) should be the shape features. The dimensions of shape_model_components are (number of landmark files) x (number of points * 3); is each column a set of shape features? I'm not sure.
Uploading Remesh.zip…
Enclosed with this message are the new code and data I used.
feng

I'm not really sure what you mean. Each row is a principal component (a mode of variation); each column represents an x, y, or z component of a specific coordinate. So each item (i, j) in the 2D array is how much one coordinate moves in one direction under mode of variation i.

You will perhaps benefit from the sklearn.PCA documentation. You will see that n_components == min(n_samples, n_features).
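
To illustrate with a toy example (random data, not pyssam):

import numpy as np
from sklearn.decomposition import PCA

data = np.random.rand(48, 500)  # 48 samples, 500 features
pca = PCA().fit(data)
print(pca.components_.shape)  # (48, 500): n_components == min(n_samples, n_features)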

I apologize for not being clear. What I meant to ask is: can each row of shape_model_components (the eigenvectors of the covariance matrix, obtained by PCA) represent one shape feature?

It represents all of the shape features. Each row is one separate mode of variation in the landmarks.

Thanks, I understand.

So can we calculate the shape coefficients of the input models (to do clustering) by changing the code, just like SSM.coeffs in morphomatics (https://morphomatics.github.io/tutorials/tutorial_ssm/)? It's another package.

Here is the equivalence between pyssam and morphomatics:

pyssam.ssm.shape_model_components = morphomatics.SSM.modes
pyssam.ssm.variances = morphomatics.SSM.variances
pyssam.ssm.fit_model_parameters() = morphomatics.SSM.coeffs

The final one (coeffs) seems to be computed automatically in morphomatics when you define the model. This is not done automatically in pyssam, but we have the fit_model_parameters() function (docs). I should add a tutorial for this. However, you can see its usage in our tests, where it is used extensively, e.g. test_ssm.py.

You can copy code from the tests into your script to find the unique coefficients for each sample, which will allow you to reproduce the clustering analysis you showed previously.
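
As a rough sketch of the whole pipeline (the fit_model_parameters usage follows test_ssm.py; the hierarchical clustering step uses scipy as an illustrative stand-in for the analysis in Bruse et al., it is not part of pyssam):

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# one coefficient vector per training sample
coeffs = np.array([
    ssm_obj.fit_model_parameters(sample, ssm_obj.pca_model_components)
    for sample in ssm_obj.landmarks_columns_scale
])

# cluster the samples by their shape coefficients
labels = fcluster(linkage(coeffs, method="ward"), t=2, criterion="maxclust")
print(labels)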

Hi @jvwilliams23

Thank you so much! I have over 100 models, and it is complex to perform non-rigid registration analysis on each one individually. You have helped me a lot!

I also have another question: Can we generate new models with surfaces under different modes?

Best wishes.

No problem!

Not currently, but I do have the code to do this somewhere else and it would be an easy post-processing step to add in the next couple of weeks. Pyssam currently just uses point clouds (landmarks).

If you are interested, it is the surface morphing algorithm from Grassi et al. (2011). Is this what you refer to when you say "generate new models with surfaces under different modes"?

Hi @jvwilliams23
Yes, that's exactly it!
Point clouds only have node coordinates; there is no connectivity between nodes.
We want to generate three-dimensional models of different organs, such as aneurysms (https://ieeexplore.ieee.org/abstract/document/9349107?casa_token=Nm8KBKndM-EAAAAA:jovQvtc1htKBL6zvfoUeK7b6N4tyE5rtzxf8ePnGNajHZTn81RUGjB32OL1A7CyyBMEgx9F1nVSW)
and the femur you mentioned above (https://www.sciencedirect.com/science/article/abs/pii/S1350453310002109?via%3Dihub).
Best regards.
feng

Hi @jvwilliams23 :
I have tried to use pyssam.ssm.fit_model_parameters(). When calling this function, two inputs need to be given: the input sample and shape_model_components. The function returns model_parameters, but I think model_parameters (a 1D array used to perturb each principal component by some amount, where values should all be within +/- 3) is not the shape coefficients of the input models.

I tried to use another input sample, but we still can't obtain the shape coefficients, so I want to ask what kind of input sample should be given.
test aorta000_results.xlsx
test aorta001_results.xlsx

from simpleicp import PointCloud, SimpleICP
import numpy as np
import pyssam
import pandas as pd
from glob import glob
from scipy.spatial import KDTree
import matplotlib.pyplot as plt
from copy import copy

# read point clouds from Excel files into n-by-3 numpy arrays

LANDMARK_DIR = r"D:\PengChen\ssmaaa"  # raw string so the backslashes are literal
landmark_files = glob(LANDMARK_DIR + "/test aorta*.xlsx")
# print(len(landmark_files))
if len(landmark_files) == 0:
    raise AssertionError(
        "The directories you have declared are empty.",
        "\nPlease check your input arguments.",
    )

# list to store the point-cloud data from each file
stl_data_list = []
# loop over the files, reading and storing the coordinates
min_points = float('inf')  # initialise the minimum number of points to +infinity

for file in landmark_files:
    # read the coordinates from the Excel file
    df = pd.read_excel(file, skiprows=0)
    num_points = df.shape[0]  # number of rows, i.e. number of coordinate points
    min_points = min(min_points, num_points)
    vertices = df.values
    sorted_vertices = vertices[vertices[:, 2].argsort()]
    # stl_data_list.append(df.values)
    stl_data_list.append(sorted_vertices)


for i, data in enumerate(stl_data_list):
    print(f"Stl data {i + 1} 坐标点数量: {data.shape[0]}")

landmark_coordinates = np.array(stl_data_list)

# landmark_coordinates = landmark_coordinates[0:-1, :, :]
print(landmark_coordinates.shape)
#
ssm_obj = pyssam.SSM(landmark_coordinates)
ssm_obj.create_pca_model(ssm_obj.landmarks_columns_scale)
mean_shape_columnvector = ssm_obj.compute_dataset_mean()
mean_shape = mean_shape_columnvector.reshape(-1, 3)
shape_model_components = ssm_obj.pca_model_components
print(shape_model_components.shape)

coeff = ssm_obj.fit_model_parameters(landmark_coordinates[1], shape_model_components)
print(coeff)

It is hard to help, because you have not provided the outputs from the different print statements in your code.

Nevertheless, it looks like you are using STL coordinates to train your SSM, and these have no correspondence. You need to use landmarks, where each sample has the same number of landmarks and they are sorted in the same order (landmark 1 is the same feature on each sample). So yeah, if you try fit_model_parameters, it is going to output something which is completely wrong.
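
As a quick sanity check (a sketch using the variable names from your script; equal point counts alone do not prove correspondence):

# every sample must have the same number of points, in the same anatomical order
shapes = {data.shape for data in stl_data_list}
assert len(shapes) == 1, f"samples have differing point counts: {shapes}"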

Also, if you only have 2 samples (like the 2 xlsx files you have attached), you are not going to have a model capable of producing correct output.

Hi @jvwilliams23
Thanks for your reply.
So now I have tried the landmark files of the lungs that you provided in the folder (https://github.com/jvwilliams23/pyssam/tree/main/example_data), together with the code from test_ssm.py:

test_sample_id = 1
target_shape = ssm_obj.landmarks_columns_scale[test_sample_id]
model_parameters = ssm_obj.fit_model_parameters(target_shape, ssm_obj.pca_model_components)
print(model_parameters)

It outputs a 1D array of shape (48,):

[-1.11580007  2.21165852 -1.01869685  0.23541563 -1.27556996 -1.01021429
 -0.639636    0.82138931  0.17766148 -0.50953353  2.02758919  0.33178339
  0.79850343 -0.03060536 -1.07358324  0.43292614  0.71824538 -0.66852739
  0.6010668   0.10419736 -0.22799726 -0.06395666  1.08279304  1.82692897
  0.19250186 -0.0548615   0.71844014  1.29350938  3.4408299   0.35765348
  1.5586059  -0.73511378  1.14755897  0.25332925 -0.58883713 -0.33806624
  0.59766521  0.57114284  1.04533285 -0.05502311 -0.41330958  0.11245477
  0.72370723  0.13904692 -0.10897357  0.38755082 -0.99579438  0.9715206 ]

I also tested the above code on my own samples (21 abdominal aortic aneurysms), and it outputs this 1D array:

[ 1.27339304 -0.68895829 -0.37692336 -0.70247151  2.37149469  0.60298648
  0.80844234  1.16810228 -0.45902085  1.2220032  -0.16509319 -0.12481842
  0.39846164 -1.41646007  1.57211267  0.98331405  0.92809662 -0.01017539
  0.25381958 -0.18431586 -5.20646252]

So, I guess those are the shape coefficients of sample 1, right?

Moreover, following your suggestion, I made each sample have the same number of points and generated different shapes under different pca_model_components. Here is the relationship between explained variance and the number of components:
[figure: explained variance vs. number of components]
Here are the new shapes under modes 0 and 1:
[figures: new shapes generated under modes 0 and 1]
That's all the results.
Best regards!
feng

Looks like a good start! The coefficients make sense (between +/- 3). The last one is quite large, but I think that is because that mode accounts for very low variance, so it is floating-point error.
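
As a further sanity check, you could reconstruct a sample from its fitted coefficients and compare against the original (a sketch along the lines of test_ssm.py; the reconstruction error should be small when all modes are used):

test_sample_id = 1
target_shape = ssm_obj.landmarks_columns_scale[test_sample_id]
model_parameters = ssm_obj.fit_model_parameters(target_shape, ssm_obj.pca_model_components)

# rebuild the sample from its coefficients and measure the mean absolute error
reconstruction = ssm_obj.morph_model(
    mean_shape_columnvector,
    ssm_obj.pca_model_components,
    model_parameters,
)
print(abs(reconstruction - target_shape).mean())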

I uploaded the function for mesh morphing (to convert a new SSM output to a surface mesh). I have not added any example tutorial or testing yet. The code is here.

Hi @jvwilliams23

Thanks for your work!

I have tried to use the code in morph_mesh.py to generate a new surface mesh; however, there may be something wrong with the code.

For example, there is sometimes confusion between landmarks_target and landmark_target in the code:
[screenshot: landmarks_target vs. landmark_target usage in morph_mesh.py]

There are also some places in the comments that I don't understand. In the function scale_and_align_coordinates, the comments say:
[screenshot: comments in scale_and_align_coordinates]
but the function returns landmarks_target, landmarks_template, coords_template, std_scale:
[screenshot: return statement of scale_and_align_coordinates]

So could you modify the code when it's convenient for you? It would help me a lot!

Best regards

feng

Updated the code, should work now. Need to update docs and then will merge to main

The updated morph_mesh code is available in the latest release (0.2.2) on PyPI. The docs have also been updated to show how it works, with a jupyter notebook tutorial. Hope that helps @fengyasi.

If there is nothing else, I will close this in 1 week due to inactivity. @fengyasi