ZAL-NASK / CSMC

CSMC is a Python library for performing column subset selection in matrix completion tasks. It provides an implementation of the CSSMC method, which aims to complete missing entries in a matrix using a subset of columns.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

CSMC

CSMC is a Python library for performing column subset selection in matrix completion tasks. It provides an implementation of the CSSMC method, which aims to complete missing entries in a matrix using a subset of columns.

Columns Selected Matrix Completion (CSMC) is a two-stage approach for low-rank matrix recovery. In the first stage, CSMC samples columns of the input matrix and recovers a smaller column submatrix. In the second stage, it solves a least squares problem to reconstruct the whole matrix.

Alt text

CSMC supports numpy arrays and pytorch tensors.

Installation

You can install CSMC using pip:

pip install -i  csmc

Usage

  1. Generate random data
import numpy as np
import random 

n_rows = 50
n_cols = 250
rank = 3

x = np.random.default_rng().normal(size=(n_rows, rank)) 
y = np.random.default_rng().normal(size=(rank, n_cols)) 
M = np.dot(x, y)

M_incomplete = np.copy(M)
num_missing_elements = int(0.7 * M.size)
indices_to_zero = random.sample(range(M.size), k=num_missing_elements)
rows, cols = np.unravel_index(indices_to_zero, M.shape)
M_incomplete[rows, cols] = np.nan
  1. Fill with CSNN algorithm
from csmc import CSMC
solver = CSMC(M_incomplete, col_number=100)
M_filled = solver.fit_transform(M_incomplete)
  1. Fill with Nuclear Norm Minimization with SDP (NN algorithm)
from csmc import NuclearNormMin
solver = NuclearNormMin(M_incomplete)
M_filled = solver.fit_transform(M_incomplete, np.isnan(M_incomplete))
  1. Fill with Frank-Wolfe (Conditional Gradient Method)
from csmc import CGM
solver = CGM(M_incomplete)
M_filled = solver.fit_transform(M_incomplete, np.isnan(M_incomplete))

Algorithms

Examples

Configuration

To adjust the number of threads used for intraop parallelism on CPU, modify variable:

NUM_THREADS = 8

in settings.py

Citation

Krajewska, A., Niewiadomska-Szynkiewicz E. (2023). Matrix Completion with Column Subset Selection.

Krajewska, A. (2023). Efficient matrix completion for data recovery in data-driven IT applications. Systems Research Institute Polish Academy of Sciences.

About

CSMC is a Python library for performing column subset selection in matrix completion tasks. It provides an implementation of the CSSMC method, which aims to complete missing entries in a matrix using a subset of columns.

License:MIT License


Languages

Language:Python 100.0%