koraykv / unsup

Some unsupervised learning modules using Torch

Memory boom when using `unsup.zca_whiten`

russellfei opened this issue · comments

Very excited that the unsup module has a ZCA whitening function.
(I have used Torch since May 2014, but never realized a whitening function was available 0_o)

I'll skip the installation, which went fine.

Test code

require 'image'
require 'unsup'
-- actually I used these lines before
-- im = image.lena()
-- im = image.scale(image.lena(),128,128)
im = image.scale(image.lena(),32,32)
out = unsup.zca_whiten(im)
image.display(out)

Error

  • With the full-size image, i.e., 512x512, my 16 GB MacBook just tells me memory is full (WHAT???)
  • With a smaller image, 128x128, qlua quickly consumed 2.5+ GB of memory (I just killed the process. Wow, pretty scary)
  • With 32x32, everything works fine. (Thankfully)

Summary

  • I wrote my own whitening function in Torch following Andrew Ng's UFLDL tutorial; it runs within seconds and never consumes such a HUGE amount of memory.
  • Maybe the correlation matrix or its inverse is being computed in an inappropriate way? Or is there something wrong with UFLDL?
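For reference, the ZCA whitening recipe usually attributed to the UFLDL tutorial can be written as below. This is a sketch from memory rather than a quote: it assumes zero-mean inputs $x^{(i)} \in \mathbb{R}^{n}$, $m$ examples, and a small regularizer $\epsilon$. Note that $\Sigma$ is $n \times n$, so its size depends only on how many features each example has.

```latex
\Sigma = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \left(x^{(i)}\right)^{\top},
\qquad
\Sigma = U \,\mathrm{diag}(\lambda_1, \dots, \lambda_n)\, U^{\top}

x_{\mathrm{ZCA}} = U \,\mathrm{diag}\!\left(\frac{1}{\sqrt{\lambda_1 + \epsilon}}, \dots, \frac{1}{\sqrt{\lambda_n + \epsilon}}\right) U^{\top} x
```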

@russellfei maybe you can contribute your "better" whitening function to unsup :)

@soumith I'm NOT SURE about my implementation, but what I wrote follows the formula in UFLDL.

In addition, I ran into the same memory blow-up while writing my script. The key point is how you view the input data, i.e., which dimension you want to decorrelate.
Take a b x d x h x w tensor as an example:
if we remove the correlations along the h dimension only, memory stays manageable even when h x w is large;
if we want to remove the correlations across all h x w positions (the flattened tensor), memory use is devastating.
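To make the two views concrete, here is a rough sketch in plain Lua (no Torch needed). The sizes b, d, h, w and the helper name `whitening_shapes` are made up for illustration. It only counts how many samples and features each view produces and how many entries the resulting covariance matrix has.

```lua
-- Given the total element count of a tensor and the number of features
-- per sample, return (samples, features, covariance entries).
-- `whitening_shapes` is a hypothetical helper, not part of unsup.
local function whitening_shapes(n_total, n_features)
   local n_samples = n_total / n_features
   return n_samples, n_features, n_features * n_features
end

-- Hypothetical batch: 100 images, 3 channels, 128 x 128 pixels.
local b, d, h, w = 100, 3, 128, 128
local total = b * d * h * w

-- View 1: decorrelate along h only; each (b, d, w) position is one sample.
local s1, f1, c1 = whitening_shapes(total, h)
-- View 2: decorrelate the flattened h*w plane; each (b, d) pair is one sample.
local s2, f2, c2 = whitening_shapes(total, h * w)

print(string.format("features = h:   %d samples x %d features, %d covariance entries", s1, f1, c1))
print(string.format("features = h*w: %d samples x %d features, %d covariance entries", s2, f2, c2))
```

The covariance grows with the square of the feature count, so moving from h to h*w features multiplies its size by w squared, regardless of how many samples there are.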

Could you spare a minute to check my code? It's only ~20 short lines. I can mail it to you.
(Sorry, I have no access to gist currently; otherwise that would be the perfect way to share scripts.)

ZCA/PCA allocates an NxN variance-covariance matrix.
When the input is 512x512 = 262144 px, torch requires 262144 × 262144 × sizeof(float) = 256 GB of memory.
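The same arithmetic can be checked for the three image sizes tried in this thread with a few lines of plain Lua. This is only a back-of-the-envelope sketch: `cov_bytes` is a made-up helper, and it counts the covariance matrix alone, ignoring any temporaries the decomposition needs.

```lua
-- Bytes needed to store a dense N x N covariance matrix.
-- `cov_bytes` is a hypothetical helper for illustration only.
local function cov_bytes(n_features, bytes_per_element)
   return n_features * n_features * bytes_per_element
end

local GiB = 2^30

for _, side in ipairs({32, 128, 512}) do
   local n = side * side   -- one feature per pixel
   print(string.format("%dx%d: N = %d, float = %.4f GiB, double = %.4f GiB",
      side, side, n, cov_bytes(n, 4) / GiB, cov_bytes(n, 8) / GiB))
end
```

With 4-byte floats the 512x512 case comes to exactly 256 GiB. Torch's default tensor type is DoubleTensor (8 bytes per element), which doubles every figure, so the 128x128 case needs about 2 GiB for the covariance alone, in the same range as the 2.5+ GB reported above.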

@nagadomi, so according to what you said, PCA/ZCA whitening removes the correlation between pixels?

All right, so zca_whiten is correct, but in practice it is limited to small image patches or small grayscale images. My explanation above was specific to one input format, i.e., 4-D tensors.
Thanks @nagadomi, I've figured out this memory blow-up.
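One common way to stay within that limit is to whiten small patches instead of whole images. The Torch7 sketch below is not unsup's implementation: `zca_whiten_rows` and `extract_patches` are hypothetical helpers, the 8x8 patch size and eps = 1e-5 are arbitrary choices, and a random tensor stands in for a real grayscale image. With 8x8 patches the covariance is only 64 x 64.

```lua
require 'torch'

-- ZCA-whiten the rows of X (n samples x d features).
-- Hypothetical helper; eps is an assumed regularizer value.
local function zca_whiten_rows(X, eps)
   eps = eps or 1e-5
   local n = X:size(1)
   local mean = X:mean(1)                        -- 1 x d per-feature mean
   local Xc = X - mean:expandAs(X)               -- centered data
   local cov = torch.mm(Xc:t(), Xc):div(n)       -- d x d covariance
   local e, V = torch.symeig(cov, 'V')           -- eigenvalues, eigenvectors
   -- 1 / sqrt(lambda + eps); clamp guards against tiny negative eigenvalues
   local scale = e:clone():clamp(0, math.huge):add(eps):pow(-0.5)
   local W = torch.mm(torch.mm(V, torch.diag(scale)), V:t())
   return torch.mm(Xc, W), W, mean
end

-- Cut an h x w image into non-overlapping p x p patches, one per row.
local function extract_patches(img, p)
   local blocks = img:unfold(1, p, p):unfold(2, p, p)   -- (h/p) x (w/p) x p x p
   local n = blocks:size(1) * blocks:size(2)
   return blocks:contiguous():view(n, p * p)
end

local img = torch.rand(128, 128)      -- stand-in for a grayscale image
local X = extract_patches(img, 8)     -- 256 patches x 64 features
local Xw, W, mean = zca_whiten_rows(X)
print(X:size(1), X:size(2), W:size(1), W:size(2))
```

The returned W and mean can be reused to whiten new patches of the same size, so the decomposition only has to be computed once.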

This issue can be closed. Thanks all!