rdspring1/RzLinear

Model compression: matrix multiplication against a compressed weight matrix, using the state-of-the-art ROBE-Z compression scheme.

Speed:

This implementation is quite slow; a better implementation is being developed for practical use.
Update (Feb 22, 22): The current implementation is 3x slower than a full (uncompressed) matrix multiplication for large matrices. A faster implementation is coming soon, along with measurements of the forward and backward passes.
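
A comparison like the one above can be timed with a small helper. What follows is a minimal sketch using only the standard library, not this repository's benchmark: `time_callable` and its parameters are hypothetical names, and the two `sum(range(...))` workloads are stand-ins for the dense and compressed layers.

```python
import statistics
import time


def time_callable(fn, warmup=3, iters=10, sync=None):
    """Return the median wall-clock time (seconds) of one call to fn().

    sync is an optional zero-argument callable that blocks until pending
    asynchronous work has finished (hypothetical hook; see the note below).
    """
    # Warm-up calls are not timed (caches, lazy initialization, etc.).
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued work before stopping the clock
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in workloads; replace with the real forward (or forward + backward) calls.
baseline = time_callable(lambda: sum(range(10_000)))
candidate = time_callable(lambda: sum(range(30_000)))
print(f"baseline: {baseline:.6f}s, candidate: {candidate:.6f}s")
```

CUDA kernels are launched asynchronously, so when timing on a GPU one would typically pass `sync=torch.cuda.synchronize`. To include the backward pass, `fn` would also need to call `.backward()` on a scalar derived from the output.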

Notes:

Use weight_size > 256 (SMLS x SMLS) when using TILED=True.

Sample Usage:

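import numpy as np
import torch
import torch.nn as nn
from RzLinear import RzLinear

weight_size = 1000  # number of parameters in the shared (compressed) weight array
input_dim = 1000
output_dim = 1000
chunk_size = 2

# Alternative: random uniform initialization in [-1/sqrt(input_dim), 1/sqrt(input_dim)]
#hashed_weight = nn.Parameter(torch.from_numpy(np.random.uniform(-1/np.sqrt(input_dim), 1/np.sqrt(input_dim), size=((weight_size,))).astype(np.float32)))
# Weights are set to 0..weight_size-1, so each output value identifies the slot it was read from.
hashed_weight = nn.Parameter(torch.from_numpy(np.arange(weight_size).astype(np.float32)))
rzlinear = RzLinear(input_dim, output_dim, chunk_size, hashed_weight).to("cuda:0")

# With an identity input, output_v should expose the effective weight matrix.
input_v = torch.eye(input_dim).to("cuda:0")
output_v = rzlinear(input_v)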
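
Because the input above is an identity matrix, the output is expected to be the effective weight matrix that the layer reads out of hashed_weight. The pure-Python sketch below illustrates the general idea of chunked weight hashing as it is commonly described for ROBE-Z: consecutive chunks of the virtual matrix are hashed to a start slot in a small shared array and read contiguously from there. It is an illustration under stated assumptions, not the CUDA kernel in this repository. The hash function, its constants `a` and `b`, the row-major layout, and all helper names are hypothetical.

```python
P = 2_147_483_647  # prime modulus (2**31 - 1) for the illustrative hash


def robez_index(row, col, output_dim, chunk_size, weight_size, a=9973, b=12345):
    """Map entry (row, col) of the virtual matrix to a slot in the shared array."""
    flat = row * output_dim + col                   # row-major position (assumed layout)
    chunk_id, offset = divmod(flat, chunk_size)     # which chunk, and where inside it
    start = ((a * chunk_id + b) % P) % weight_size  # hashed start slot of the chunk
    return (start + offset) % weight_size           # chunk stays contiguous, wrapping around


def materialize(hashed_weight, input_dim, output_dim, chunk_size):
    """Build the full input_dim x output_dim matrix backed by hashed_weight."""
    n = len(hashed_weight)
    return [
        [hashed_weight[robez_index(r, c, output_dim, chunk_size, n)]
         for c in range(output_dim)]
        for r in range(input_dim)
    ]


def matmul(x, w):
    """Plain dense product x @ w on nested lists."""
    return [
        [sum(x_row[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
        for x_row in x
    ]


# Tiny analogue of the sample: 16 stored weights back a 4 x 6 virtual matrix.
input_dim, output_dim, chunk_size = 4, 6, 2
hashed_weight = [float(i) for i in range(16)]
W = materialize(hashed_weight, input_dim, output_dim, chunk_size)

identity = [[1.0 if i == j else 0.0 for j in range(input_dim)] for i in range(input_dim)]
output = matmul(identity, W)  # identity input returns the effective weight matrix
for row in output:
    print(row)
```

With the sizes used in the sample (input_dim = output_dim = 1000, weight_size = 1000), a virtual matrix of 1,000,000 entries is backed by 1,000 stored parameters, a 1000x reduction in parameter count.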

About

A compressed alternative to matrix multiplication using the state-of-the-art ROBE-Z compression scheme.

MIT License
