Crappy MNIST autoencoder to do weight learning on 784-32-784.
sample/sgdweights has W, b, Wp, bp.

The autoencoder is run like:

    m = np.tanh(np.dot(x, W) + b)
    y = np.dot(m, Wp) + bp

MSE loss is 0.5122 on MNIST train after training for 20 epochs;
drops to 0.5076 after 100 epochs. What is the theoretical minimum?
(For a purely linear 784-32-784 net the floor is the PCA-32
reconstruction error, by Eckart-Young; the tanh hidden layer could in
principle beat that. Sketches for reproducing these numbers are
appended at the end of these notes.)

Adam is getting:
    100 epochs, MSE 0.4962, 0.4339
    200 epochs, MSE 0.4956, 0.4335
    300 epochs, MSE 0.4951, 0.4330
    400 epochs, MSE 0.4951, 0.4330

Currently losing to PCA:
    PCA-32,  MSE 0.49
    PCA-128, MSE 0.21
    PCA-256, MSE 0.08

784-128-32-128-784:
    100 epochs, MSE 0.3742, 0.3180

784-256-128-32-128-256-784:
    100 epochs, MSE 0.2940, 0.2450
    200 epochs, MSE 0.2602, 0.2227
    300 epochs, MSE 0.2409, 0.2129
    400 epochs, MSE 0.2299, 0.2096
    500 epochs, MSE 0.2207, 0.2063

Same net with Adam (much better):
    100 epochs, MSE 0.2449, 0.2195
    200 epochs, MSE 0.2161, 0.2088
    300 epochs, MSE 0.1989, 0.2053
    400 epochs, MSE 0.1881, 0.2042
    500 epochs, MSE 0.1803, 0.2048

784-256-32-256-784, Adam -- let's use this:
    100 epochs, MSE 0.2681, 0.2344
    500 epochs, MSE 0.2100, 0.2239

784-256-128-64-32-64-128-256-784, Adam (damn, that's a deep network):
    100 epochs,  MSE 0.2514, 0.2235
    500 epochs,  MSE 0.1795, 0.1974
    1000 epochs, MSE 0.1584, 0.2006

MNIST 2 experiments:
    PCA-32  0.459
    RBM-32

New preproc, much more reasonable:
    0.0320/0.0741 -- big ReLU 784-256-128-64-32-4, 1000 epochs
    0.0176/0.0172 -- linear 784-32-784, Adam, 25 epochs
    0.0173/0.0169 -- 784-32-784, Adam, 100 epochs
    0.0172/0.0172 -- PCA-32
    0.0140/0.0168 -- 784-256-128-32-8, Adam, 1000 epochs -- SAVED2
    0.0077/0.0077 -- 784-256-32-256-784, Adam, 500 epochs
    0.0071/0.0071 -- 784-256-32-256-784, Adam, 1000 epochs -- SAVED

2's only:
    0.0363 -- 784-256-4-256-784, Adam, 3000 epochs
    0.0265 -- big ReLU 784-256-128-64-32-4
    0.0153 -- linear 784-32-784, Adam, 50 epochs
    0.0153 -- PCA-32
    0.0106 -- 784-256-32-256-784, Adam, 500 epochs
    0.0082 -- 784-256-32-256-784, Adam, 1000 epochs
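
Appendix: a few numpy sketches. First, a minimal sketch of running the
saved 784-32-784 weights and scoring them. It assumes x is an (N, 784)
float array and that "MSE" means the plain mean over samples and pixels
(the exact reduction isn't recorded above):

    import numpy as np

    def forward(x, W, b, Wp, bp):
        # 784-32-784: tanh hidden layer, linear output, as in the notes
        m = np.tanh(np.dot(x, W) + b)    # (N, 32) code
        y = np.dot(m, Wp) + bp           # (N, 784) reconstruction
        return m, y

    def mse(x, y):
        # assumed reduction: mean over all samples and all pixels
        return np.mean((y - x) ** 2)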
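
On the "theoretical minimum" question: for a linear autoencoder with a
k-unit bottleneck, the best reconstruction lives in the top-k principal
subspace (Eckart-Young), which is presumably why linear 784-32-784
converges to exactly the PCA-32 numbers above (0.0172/0.0172, 0.0153).
A sketch for computing that floor, assuming X is the (N, 784) training
matrix:

    import numpy as np

    def pca_recon_mse(X, k=32):
        # reconstruction error of rank-k PCA = floor for a linear
        # k-bottleneck autoencoder (Eckart-Young)
        mu = X.mean(axis=0)
        Xc = X - mu
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        P = Vt[:k]                        # (k, 784) top-k directions
        recon = Xc @ P.T @ P + mu
        return np.mean((X - recon) ** 2)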
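
For the SGD/Adam runs, the gradients of mean((y - x)**2) through the
784-32-784 net are straightforward backprop; a sketch under the same
assumptions as above:

    import numpy as np

    def grads(x, W, b, Wp, bp):
        # backprop for L = mean((y - x)**2); returns a dict of gradients
        N, D = x.shape
        m = np.tanh(np.dot(x, W) + b)
        y = np.dot(m, Wp) + bp
        dy = 2.0 * (y - x) / (N * D)      # dL/dy
        dWp = np.dot(m.T, dy)
        dbp = dy.sum(axis=0)
        dm = np.dot(dy, Wp.T)
        dz = dm * (1.0 - m ** 2)          # through tanh
        dW = np.dot(x.T, dz)
        db = dz.sum(axis=0)
        return {'W': dW, 'b': db, 'Wp': dWp, 'bp': dbp}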
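
And since Adam wins consistently in the tables above, here is the
standard Adam update (Kingma & Ba) in the same style. The actual
hyperparameters used for these runs aren't recorded, so these are the
usual defaults:

    import numpy as np

    class Adam:
        def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
            self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
            self.m, self.v, self.t = {}, {}, 0

        def step(self, params, grads):
            # one in-place Adam update with bias correction
            self.t += 1
            for k in params:
                self.m[k] = self.b1 * self.m.get(k, 0) + (1 - self.b1) * grads[k]
                self.v[k] = self.b2 * self.v.get(k, 0) + (1 - self.b2) * grads[k] ** 2
                mhat = self.m[k] / (1 - self.b1 ** self.t)
                vhat = self.v[k] / (1 - self.b2 ** self.t)
                params[k] -= self.lr * mhat / (np.sqrt(vhat) + self.eps)

Wiring it up: params = {'W': W, 'b': b, 'Wp': Wp, 'bp': bp}, then per
step call opt.step(params, grads(x, **params)).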