noloader / SHA-Intrinsics

SHA-1, SHA-256 and SHA-512 compression functions using Intel, ARMv8 and Power8 SHA intrinsics


Just a question..

QRCS-CORP opened this issue

Hey,
I used this code in my SHA256 implementation:
https://github.com/Steppenwolfe65/CEX/blob/master/CEX/SHA256Compress.h

I made some adjustments for vectors, and it processes one block at a time, but otherwise, the implementation was fairly straightforward.
The containing class (SHA256) uses a run-time cpuid check to test for SHA-NI, SIMD capabilities, and available cores, and, like all of the digest implementations in the library, it can be multi-threaded in a tree-hashing arrangement.
I think the implementation should be OK, but I don't have the SHA-NI instructions on my dev-box, so if you ever get a chance, could you take a look to see if it looks correct? It would be much appreciated (untested code makes me very nervous ;o).

Regards,
John
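
For readers following along, the run-time SHA-NI check described above can be sketched roughly as below. This is a minimal illustration, not the code from CEX or from this repository: the names `sha_bit_set` and `has_sha_ni` are hypothetical. It relies on the documented fact that the SHA extensions are reported in CPUID leaf 7, sub-leaf 0, EBX bit 29, and it assumes a GCC/Clang or MSVC toolchain; on any other target it simply reports false.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#  include <intrin.h>
#  define SKETCH_X86_MSVC 1
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__i386__) || defined(__x86_64__))
#  include <cpuid.h>
#  define SKETCH_X86_GNUC 1
#endif

// SHA extensions flag: CPUID.(EAX=7, ECX=0):EBX bit 29.
constexpr bool sha_bit_set(std::uint32_t ebx) {
    return ((ebx >> 29) & 1u) != 0;
}

// Hypothetical helper: true if the CPU reports the SHA extensions.
// Returns false on non-x86 targets or when leaf 7 is unavailable.
inline bool has_sha_ni() {
#if defined(SKETCH_X86_MSVC)
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 0);                 // regs[0] = highest supported leaf
    if (regs[0] < 7) return false;
    __cpuidex(regs, 7, 0);            // regs[1] = EBX
    return sha_bit_set(static_cast<std::uint32_t>(regs[1]));
#elif defined(SKETCH_X86_GNUC)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid_count returns 0 if the requested leaf is not supported.
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return false;
    return sha_bit_set(ebx);
#else
    return false;
#endif
}
```

A dispatcher would typically call `has_sha_ni()` once at start-up and cache the result, then select either the SHA-NI compression function or a portable fallback.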

I think the implementation should be OK, but I don't have the SHA-NI instructions on my dev-box

Use Intel's emulator: the Intel® Software Development Emulator (SDE).

Ok, I'll try it.. thanks

I recall seeing the emulator around, but never tried it. I plan to add AVX512 support to the cipher modes at some point, so this could be useful.. I'll test it out tomorrow.

@Steppenwolfe65 ,

Yeah, the latest processors I have lack AVX512, too. My last two purchases were a Celeron J3455 (for SHA) and an Athlon 845 X4 (for AMD RDRAND). Those 250 dollar purchases add up.

I can give you remote SSH access to the machines. I'll need your authorized_keys file. You will connect using an IP and port, like steppenwolf@71.244.244.203:1525. Email me at noloader, gmail account if you are interested.

@noloader AVX512 is only on Xeons AFAIK, so server market, whereas the library is being designed to target app developers (a lot of automation and ease-of-use patterns), which is why it's just a future todo at this point. Still, it would be fairly easy to implement, and interesting to see what AES-NI + AVX512 running a parallelized CTR across 72 cores would do.. ;o)
I'm using an i7-6700 in an HP all-in-one, nice dev box, but probably not so great for benchmarking (shared video, noisy Win10 OS).
I appreciate the offer of remote access, but I'll try the emulator, and like I said, the implementation was fairly simple, I just removed the loop and added vectors, so it should be OK..
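
One way to gain confidence in a one-block SHA-NI routine, whether on real hardware or under the emulator, is to compare it against a plain scalar compression function on a known-answer vector. The sketch below is a portable FIPS 180-4 style reference, not the CEX or SHA-Intrinsics code. The names `ref::compress_block` and `ref::known_answer_ok` are hypothetical. The check uses the standard single-block "abc" test vector.

```cpp
#include <array>
#include <cstdint>

namespace ref {

// SHA-256 round constants (FIPS 180-4).
constexpr std::uint32_t K[64] = {
    0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5,
    0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174,
    0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da,
    0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967,
    0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85,
    0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070,
    0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3,
    0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
};

inline std::uint32_t rotr(std::uint32_t x, unsigned n) {
    return (x >> n) | (x << (32 - n));
}

// Processes exactly one 64-byte block, updating the 8-word state in place.
inline void compress_block(std::array<std::uint32_t, 8>& state,
                           const std::uint8_t* block) {
    std::uint32_t w[64];
    for (int i = 0; i < 16; ++i) {      // load big-endian message words
        w[i] = (std::uint32_t(block[4 * i]) << 24) |
               (std::uint32_t(block[4 * i + 1]) << 16) |
               (std::uint32_t(block[4 * i + 2]) << 8) |
                std::uint32_t(block[4 * i + 3]);
    }
    for (int i = 16; i < 64; ++i) {     // message schedule expansion
        std::uint32_t s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
        std::uint32_t s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    std::uint32_t a = state[0], b = state[1], c = state[2], d = state[3];
    std::uint32_t e = state[4], f = state[5], g = state[6], h = state[7];
    for (int i = 0; i < 64; ++i) {      // 64 rounds
        std::uint32_t S1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25);
        std::uint32_t ch = (e & f) ^ (~e & g);
        std::uint32_t t1 = h + S1 + ch + K[i] + w[i];
        std::uint32_t S0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22);
        std::uint32_t maj = (a & b) ^ (a & c) ^ (b & c);
        std::uint32_t t2 = S0 + maj;
        h = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }
    state[0] += a; state[1] += b; state[2] += c; state[3] += d;
    state[4] += e; state[5] += f; state[6] += g; state[7] += h;
}

// Known-answer check: SHA-256("abc") fits in one padded block.
inline bool known_answer_ok() {
    std::array<std::uint32_t, 8> st = {{
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
        0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19}};
    std::uint8_t block[64] = {0x61, 0x62, 0x63, 0x80};  // "abc" + padding bit
    block[63] = 0x18;                                   // message length: 24 bits
    compress_block(st, block);
    const std::array<std::uint32_t, 8> expected = {{
        0xba7816bf, 0x8f01cfea, 0x414140de, 0x5dae2223,
        0xb00361a3, 0x96177a9c, 0xb410ff61, 0xf20015ad}};
    return st == expected;
}

} // namespace ref
```

Feeding the same padded block and initial state to the SHA-NI path and comparing the resulting eight state words against `ref::compress_block` gives a quick sanity check before moving on to multi-block and randomized inputs.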