romeric / Fastor

A lightweight high performance tensor algebra framework for modern C++


Why do Tensors have this exact size?

Bbllaaddee opened this issue

Description

Hey there!

I was playing around with small tensor objects when I discovered this particular peculiarity.

Here's the little program:

#include <iostream>
#include <Fastor/Fastor.h>

using namespace Fastor;

int main()
{
	std::cout << sizeof(Tensor<double>) << std::endl;
	std::cout << sizeof(Tensor<double,1>) << std::endl;
	std::cout << sizeof(Tensor<double,2>) << std::endl;
	std::cout << sizeof(Tensor<double,3>) << std::endl;
	std::cout << sizeof(Tensor<double,4>) << std::endl;
	std::cout << sizeof(Tensor<double,5>) << std::endl;
	std::cout << sizeof(Tensor<double,6>) << std::endl;
	std::cout << sizeof(Tensor<double,7>) << std::endl;
	std::cout << sizeof(Tensor<double,8>) << std::endl;
	std::cout << sizeof(Tensor<double,9>) << std::endl;
	std::cout << sizeof(Tensor<double,10>) << std::endl;
}

which outputs the following:

16
16
16
32
32
48
48
64
64
80
80

So different-sized tensors, for example Tensor<double,3> and Tensor<double,4>, end up with the same size!
If the sequence is continued, the pattern repeats: every two neighbours have the same size.
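
For reference, the numbers above are consistent with rounding the raw payload (n doubles, at least one) up to the next multiple of 16 bytes, i.e. one 128-bit SSE register. The following is only a sketch that reproduces the measured sizes; it is not Fastor's actual implementation, and the rounding width presumably depends on the SIMD flags the library was built with:

#include <cstddef>

// Sketch: round the payload up to the next multiple of the 16-byte SSE width.
constexpr std::size_t padded_bytes(std::size_t n)
{
	constexpr std::size_t simd_width = 16;  // bytes in one SSE register
	std::size_t payload = (n == 0 ? 1 : n) * sizeof(double);
	return ((payload + simd_width - 1) / simd_width) * simd_width;
}

static_assert(padded_bytes(0) == 16, "");  // Tensor<double>
static_assert(padded_bytes(3) == 32, "");  // Tensor<double,3>
static_assert(padded_bytes(4) == 32, "");  // Tensor<double,4>
static_assert(padded_bytes(5) == 48, "");  // Tensor<double,5>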

Use case

I'm using 3D objects extensively in my simulation, so I finally decided to implement some encapsulation for a 3D vector.

But from what I've learned, you shouldn't reinvent the wheel (so use libraries for tensor contraction, for example xD).
So one of my ideas was to try Fastor::Tensor<double,3>, since it's highly optimized and already has all the operations I need.
That's where my little test came in.

If this is the case, that option seems very much out of the question, because it would take 33% more memory!
Imagine a std::vector of such tensors and some large number of particles.
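
To put a number on it, here is a small sketch of the footprint difference, using a made-up particle count (the per-element sizes are the ones reported above):

#include <cstddef>
#include <iostream>
#include <Fastor/Fastor.h>

int main()
{
	// Hypothetical particle count, purely for illustration.
	constexpr std::size_t num_particles = 1000000;

	// Padded Fastor tensors: 32 bytes each on this build.
	std::size_t padded   = num_particles * sizeof(Fastor::Tensor<double,3>);
	// A plain array of three doubles: 24 bytes each.
	std::size_t unpadded = num_particles * 3 * sizeof(double);

	std::cout << padded << " vs " << unpadded << " bytes\n";  // ~32 MB vs ~24 MB, 33% more
}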

Questions

  1. Is this the desired behavior, and if so, why?
  2. Am I right, that this is then unusable for something like my case?
  3. Is this still cache-friendly despite the extra data?

Maybe to ensure 16-byte alignment?

If not, I'd guess it is for SIMD optimization: if the padded part of the SSE or AVX vector is stored in memory as zeros, you can skip setting the padded lanes to zero at load time?

commented

Yes, this is expected. It is due to SIMD vectorisation. If you want the tensors to have the expected sizes, you can define the macro FASTOR_DONT_VECTORISE. If your system is that constrained (stack-size wise), then this is the only way to get around it. You will still have access to all the features of Fastor.
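
For completeness, a minimal sketch of how that would look, assuming the macro takes effect when defined before the first Fastor include (it can presumably also be passed on the compiler command line as -DFASTOR_DONT_VECTORISE):

// Define the macro before including Fastor so tensors are laid out without SIMD padding.
#define FASTOR_DONT_VECTORISE
#include <Fastor/Fastor.h>
#include <iostream>

int main()
{
	// Expected to print 24 (3 * sizeof(double)) instead of the padded 32.
	std::cout << sizeof(Fastor::Tensor<double,3>) << std::endl;
}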

Scaling the tensors up to such sizes is extremely cache friendly, as the data can be packed neatly into a CPU vector register once loaded (in case it is not already loaded), and masking the upper part of the register can be avoided.
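
As a rough illustration of that last point (plain AVX intrinsics, not Fastor internals): when the object is padded out to a full register, the whole block can be pulled in with an ordinary load, whereas an unpadded 3-double object would need a masked load to avoid touching memory past its end. The function names here are hypothetical and the snippet assumes AVX support (e.g. -mavx):

#include <immintrin.h>

// p points at a 4-double (padded) block: a plain load, no masking needed.
__m256d load_padded(const double* p)
{
	return _mm256_loadu_pd(p);
}

// p points at exactly 3 doubles: mask off the fourth lane so the load
// never reads past the end of the object.
__m256d load_unpadded(const double* p)
{
	const __m256i mask = _mm256_setr_epi64x(-1, -1, -1, 0);
	return _mm256_maskload_pd(p, mask);
}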