Add AVX/AVX2 version of functions



redorav opened this issue 5 years ago

Diogo Teixeira · Answer 1 · Mon Jul 15 2019 00:20:10 GMT+0800 (China Standard Time)

I wonder if it would be worth having float8 (for 256-bit AVX) and float16 (for AVX-512) types. They would be a superset of actual HLSL, but could be handy. Maybe even add some calls to query their availability at runtime.

Just wondering whether it would be possible, and whether there are negative implications to going down this path. I'm asking because hlslpp is a very accessible and readable way to write SIMD code in general. It's probably the easiest I've used so far, tbh, and I'm getting great results. It could be a good way to extend it into the future, assuming it's possible.

Emilio López · Answer 2 · Mon Jul 15 2019 00:49:47 GMT+0800 (China Standard Time)

@diogovtx Sure, I've always thought of extending hlsl++ to things that technically aren't in HLSL but easily derive from it. To some extent I already have; take the quaternion class, for example. I would like to do float8 and float16, although I'd need to work out how to support them on platforms that don't have such wide registers.

I'm not quite sure how to do that last part. I started on double vectors and got relatively far using AVX, but I don't have a solution for making them work on NEON. Should they even be supported there? Should it be possible to mix and match scalar versions of the library?

Diogo Teixeira · Answer 3 · Mon Jul 15 2019 01:16:58 GMT+0800 (China Standard Time)

I guess you'd have to rely on the preprocessor to conditionally enable float8 and float16. They could simply be unavailable, or be emulated using float4.
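A minimal sketch of the compile-time approach, assuming the compiler defines `__AVX__` when targeting AVX (GCC and Clang with `-mavx`, MSVC with `/arch:AVX`). The type name `float8_sketch` and its members are hypothetical, not hlsl++'s actual float8; the fallback stores two 4-wide halves, which in a real build would be a pair of float4 registers (SSE or NEON) rather than plain arrays.

```cpp
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical float8 for illustration only (not the hlsl++ type).
struct float8_sketch {
#if defined(__AVX__)
    // AVX build: all 8 lanes live in one 256-bit register.
    __m256 v;

    static float8_sketch load(const float* p) { return { _mm256_loadu_ps(p) }; }
    void store(float* p) const { _mm256_storeu_ps(p, v); }
    friend float8_sketch operator+(float8_sketch a, float8_sketch b) {
        return { _mm256_add_ps(a.v, b.v) };
    }
#else
    // Fallback build: two 4-wide halves standing in for a pair of float4 registers.
    struct float4_half { float x[4]; };
    float4_half lo, hi;

    static float8_sketch load(const float* p) {
        float8_sketch r;
        for (int i = 0; i < 4; ++i) { r.lo.x[i] = p[i]; r.hi.x[i] = p[i + 4]; }
        return r;
    }
    void store(float* p) const {
        for (int i = 0; i < 4; ++i) { p[i] = lo.x[i]; p[i + 4] = hi.x[i]; }
    }
    friend float8_sketch operator+(float8_sketch a, float8_sketch b) {
        float8_sketch r;
        for (int i = 0; i < 4; ++i) {
            r.lo.x[i] = a.lo.x[i] + b.lo.x[i];
            r.hi.x[i] = a.hi.x[i] + b.hi.x[i];
        }
        return r;
    }
#endif
};
```

The interface is the same on both paths, so calling code doesn't change; only the storage and the number of instructions per operation differ.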

When they're enabled by the preprocessor, it would be the responsibility of the programmer using hlslpp to ensure the supported path is taken at runtime. Some hlslpp runtime query calls could help with that, as API calls outside the HLSL spec, of course.
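One way such a runtime query could look, as a sketch assuming GCC or Clang on x86, where the `__builtin_cpu_supports` builtin is available. The helper names (`cpu_has_avx`, `cpu_has_avx512f`, `widest_float_width`) are hypothetical and not part of hlsl++; other compilers (MSVC, for instance, would need `__cpuid` instead) fall back here to reporting the 4-wide baseline only.

```cpp
// Hypothetical runtime-query helpers (not part of hlsl++).
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_HAVE_CPU_BUILTIN 1
#endif

// True if the running CPU reports AVX (256-bit float8 path).
inline bool cpu_has_avx() {
#if defined(SKETCH_HAVE_CPU_BUILTIN)
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;  // assumption: unknown compiler/arch is treated as 128-bit only
#endif
}

// True if the running CPU reports AVX-512F (512-bit float16 path).
inline bool cpu_has_avx512f() {
#if defined(SKETCH_HAVE_CPU_BUILTIN)
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return false;  // same assumption as above
#endif
}

// Widest float vector the running CPU supports, in lanes: 16, 8, or the 4-wide baseline.
inline int widest_float_width() {
    if (cpu_has_avx512f()) return 16;
    if (cpu_has_avx()) return 8;
    return 4;
}
```

The query only helps if the wider paths were actually compiled in, for example in separate translation units built with `-mavx` or `-mavx512f`. Running AVX instructions on a CPU without AVX raises an illegal-instruction fault, which is why the check has to come before dispatching to those paths.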

Emilio López · Answer 4 · Mon Jul 15 2019 01:26:53 GMT+0800 (China Standard Time)

I'm not sure I understand what you mean by runtime query calls. Do you mean informing the user whether there is hardware support for the feature they're trying to use?

Diogo Teixeira · Answer 5 · Mon Jul 15 2019 03:16:06 GMT+0800 (China Standard Time)

Yes. That's what I meant. Sorry if I wasn't clear.

Emilio López · Answer 6 · Sun Oct 13 2019 20:48:10 GMT+0800 (China Standard Time)

@diogovtx I have recently introduced the float8 type. Unfortunately, the nice swizzles that were possible with the smaller types lead to a combinatorial explosion (8^8), so instead there's a templated swizzle function that serves that purpose. I took a look at OpenCL, and it's not clear from the docs what they provide; I think that, short of language support, this can't be done. Anyhow, the functionality is there, give it a spin if you want :)
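With 8 components there are 8^8 = 16,777,216 distinct 8-lane swizzles alone, far too many to spell out as named members the way float4's `xyzw` combinations can be. A templated function sidesteps that by taking the lane indices as template arguments. The sketch below is a portable, hypothetical illustration (`float8_data` and this `swizzle` signature are assumptions, not the hlsl++ API) and needs C++17 for the fold expression.

```cpp
#include <array>

// Hypothetical 8-lane storage used only for this illustration.
using float8_data = std::array<float, 8>;

// Compile-time swizzle: each template argument picks a source lane (0-7),
// and the result has one element per index.
template <int... Idx>
std::array<float, sizeof...(Idx)> swizzle(const float8_data& v) {
    static_assert(sizeof...(Idx) >= 1 && sizeof...(Idx) <= 8, "swizzle takes 1 to 8 lanes");
    static_assert(((Idx >= 0 && Idx < 8) && ...), "lane index out of range");
    return { v[Idx]... };
}
```

For example, `swizzle<7, 6, 5, 4, 3, 2, 1, 0>(v)` reverses `v`, and `swizzle<0, 0>(v)` returns a 2-element result. Because the indices are compile-time constants, a SIMD implementation could map them to a fixed shuffle; on AVX2, for instance, an 8-index swizzle could lower to a single `_mm256_permutevar8x32_ps`.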

Emilio López · Answer 7 · Fri Dec 23 2022 04:24:45 GMT+0800 (China Standard Time)

Closing due to inactivity (and because AVX has been implemented in many parts of the codebase, such as matrices and float8).