redorav / hlslpp

Math library using hlsl syntax with SSE/NEON support

Add AVX/AVX2 version of functions

redorav opened this issue · comments

I wonder if it would be worth having float8 (for 256-bit AVX) and float16 (for AVX-512) types. They would be a superset of actual hlsl, but could be handy. Maybe even add some calls to query their availability at runtime.

Just wondering if it would be possible and whether there are negative implications to going down this path. Asking because hlslpp is a very accessible and readable way to write simd code in general. Probably the easiest I've used so far, tbh, and I'm getting great results. Could be a good way to extend it into the future, assuming it's possible.

@diogovtx Sure, I've always thought of extending hlsl++ to things that technically aren't in hlsl but easily derive from it. To some extent I already have; take the quaternion class, for example. I would like to do float8 and float16, although I'd need to see how to support them on platforms that don't have such wide registers.

I'm not quite sure how to do that last part. I started doing double vectors and got it relatively far using AVX, but I don't have a solution on how I might make it work on NEON. Should it even be supported? Should I be able to mix and match scalar versions of the library?

I guess you'd have to rely on the preprocessor to conditionally enable float8 and float16. On platforms without wide registers they could simply be unavailable, or emulated using float4.
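To make the "unavailable or emulated" idea concrete, here is a minimal sketch of how an 8-wide type could be switched by the preprocessor. Everything in it is hypothetical and not hlslpp's actual code: `vec4`, `vec8`, `make_vec8`, `store` and the `SKETCH_HAS_NATIVE_FLOAT8` macro are names made up for illustration. It assumes the compiler defines `__AVX__` when AVX code generation is enabled (as GCC, Clang and MSVC do) and falls back to two 4-wide halves otherwise.

```cpp
#include <cassert>

#if defined(__AVX__)
  #include <immintrin.h>
  #define SKETCH_HAS_NATIVE_FLOAT8 1
#else
  #define SKETCH_HAS_NATIVE_FLOAT8 0
#endif

// Hypothetical 4-wide stand-in (plain scalars here, not hlslpp's float4).
struct vec4 { float v[4]; };

inline vec4 add(const vec4& a, const vec4& b) {
    vec4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

#if SKETCH_HAS_NATIVE_FLOAT8
// Native path: one 256-bit register holds all eight floats.
struct vec8 { __m256 reg; };
inline vec8 make_vec8(const float* p) { return vec8{ _mm256_loadu_ps(p) }; }
inline vec8 add(const vec8& a, const vec8& b) { return vec8{ _mm256_add_ps(a.reg, b.reg) }; }
inline void store(const vec8& a, float* out) { _mm256_storeu_ps(out, a.reg); }
#else
// Emulated path: the same interface built from two 4-wide halves.
struct vec8 { vec4 lo, hi; };
inline vec8 make_vec8(const float* p) {
    vec8 r;
    for (int i = 0; i < 4; ++i) { r.lo.v[i] = p[i]; r.hi.v[i] = p[i + 4]; }
    return r;
}
inline vec8 add(const vec8& a, const vec8& b) { return vec8{ add(a.lo, b.lo), add(a.hi, b.hi) }; }
inline void store(const vec8& a, float* out) {
    for (int i = 0; i < 4; ++i) { out[i] = a.lo.v[i]; out[i + 4] = a.hi.v[i]; }
}
#endif

// Small self-check: the 8-wide add must match the scalar result on either path.
inline bool vec8_add_matches_scalar() {
    const float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    const float b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    float out[8];
    store(add(make_vec8(a), make_vec8(b)), out);
    for (int i = 0; i < 8; ++i) {
        if (out[i] != a[i] + b[i]) return false;
    }
    return true;
}
```

Calling code only sees `vec8`, `add` and `store`, so it compiles unchanged whichever path the preprocessor picks.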

When enabled by the preprocessor, responsibility would fall on the programmer using hlslpp to ensure the supported path is taken at runtime. Some hlslpp runtime query calls could help with that, using API calls outside of hlsl spec ofc.

I'm not sure I understand what you mean by runtime query calls, do you mean to inform the user that there is hardware support for the feature they're trying to use?

Yes. That's what I meant. Sorry if I wasn't clear.
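For illustration, a runtime query of this kind could look roughly like the sketch below. The function names (`sketch_cpu_has_avx2`, `select_sum8`) are hypothetical and not part of hlslpp. The check uses the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports` on x86 and conservatively reports "not supported" everywhere else; an MSVC build would need a `__cpuid`-based check instead, which is omitted here. The "wide" function is only a placeholder for where an AVX2 implementation would go.

```cpp
#include <cassert>

// Hypothetical query: true only when the running CPU is known to support AVX2.
inline bool sketch_cpu_has_avx2() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                        // safe to call more than once
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;                                // unknown platform: assume no AVX2
#endif
}

// Portable fallback that works on any hardware.
inline int sum8_scalar(const int* p) {
    int s = 0;
    for (int i = 0; i < 8; ++i) s += p[i];
    return s;
}

// Placeholder for the wide path. A real build would compile an AVX2 version
// in a separate translation unit; here it simply forwards to the scalar code.
inline int sum8_wide_placeholder(const int* p) {
    return sum8_scalar(p);
}

using sum8_fn = int (*)(const int*);

// The caller, not the library, decides which path to take at runtime.
inline sum8_fn select_sum8() {
    return sketch_cpu_has_avx2() ? &sum8_wide_placeholder : &sum8_scalar;
}
```

The selection is typically done once at startup and the resulting function pointer cached, so the query cost is not paid per call.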

@diogovtx I have recently introduced the float8 type. Unfortunately the nice swizzles that were possible with smaller types end up in a combinatorial explosion (8^8), so instead there's a templated swizzle function that serves that purpose. I took a look at OpenCL and it's not clear from the docs what they provide; I think short of language support this can't be done. Anyhow, the functionality is there, give it a spin if you want :)
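As a rough idea of why a template avoids the explosion: instead of declaring one named member per permutation, the indices become compile-time parameters of a single function. The sketch below is a generic illustration on a plain array-backed struct; `vec8` and `swizzle` here are hypothetical and do not reproduce hlslpp's actual swizzle signature.

```cpp
#include <cassert>

// Hypothetical 8-wide value type backed by a plain array.
struct vec8 { float v[8]; };

// One function template covers all 8^8 index combinations; each
// instantiation is generated only when it is actually used.
template <int I0, int I1, int I2, int I3, int I4, int I5, int I6, int I7>
inline vec8 swizzle(const vec8& a) {
    static_assert(I0 >= 0 && I0 < 8 && I1 >= 0 && I1 < 8 &&
                  I2 >= 0 && I2 < 8 && I3 >= 0 && I3 < 8 &&
                  I4 >= 0 && I4 < 8 && I5 >= 0 && I5 < 8 &&
                  I6 >= 0 && I6 < 8 && I7 >= 0 && I7 < 8,
                  "swizzle indices must be in the range [0, 8)");
    return vec8{{ a.v[I0], a.v[I1], a.v[I2], a.v[I3],
                  a.v[I4], a.v[I5], a.v[I6], a.v[I7] }};
}
```

For example, `swizzle<7, 6, 5, 4, 3, 2, 1, 0>(a)` reverses the components and `swizzle<0, 0, 0, 0, 0, 0, 0, 0>(a)` broadcasts the first one. A SIMD-backed version could map the same indices onto shuffle or permute instructions.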

Closing due to inactivity (and AVX has been implemented in many parts of the codebase, such as matrices and float8).