Caligatio / jsSHA

A JavaScript/TypeScript implementation of the complete Secure Hash Standard (SHA) family (SHA-1, SHA-224/256/384/512, SHA3-224/256/384/512, SHAKE128/256, cSHAKE128/256, and KMAC128/256) with HMAC.

Home Page: https://caligatio.github.io/jsSHA/

Performance

z3dev opened this issue · comments

Can you supply some performance statistics for the various input types and output types?

Basically, I’d like to hash numeric values, but these values (types) are not supported by the API.

All of the input types are transformed into an internal representation (a packed native JavaScript number) which is then used throughout the code. The input/output type that requires the least overhead is "UINT8ARRAY" as it's just a shift+or operation. All the converters are in converters.ts.
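The shift+or packing described above can be sketched in plain JavaScript. This is an illustration of the idea, not the library's exact converter; the big-endian byte order here is an assumption based on the SHA word layout:

```javascript
// Pack runs of 4 bytes (big-endian) into 32-bit words via shift+or,
// mirroring the internal packed-number representation described above.
function packBytesBE(bytes) {
  const words = [];
  for (let i = 0; i < bytes.length; i += 4) {
    words.push(
      ((bytes[i] << 24) |
        (bytes[i + 1] << 16) |
        (bytes[i + 2] << 8) |
        bytes[i + 3]) >>> 0 // >>> 0 keeps the result an unsigned 32-bit value
    );
  }
  return words;
}

console.log(packBytesBE(new Uint8Array([0x01, 0x02, 0x03, 0x04]))[0].toString(16));
// → "1020304"
```

Because a `Uint8Array` already exposes raw bytes, this is why the "UINT8ARRAY" path has the least conversion overhead.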

As a side note, numbers in and of themselves aren't hashable. Are they integers (32- or 64-bit?), floats, or something else? Little- or big-endian? The SHA-1 and SHA-2 families both deal with byte streams and SHA-3 deals with bit streams (my implementation is limited to bytes).

A few more details...

We have 2D and 3D data structures, implemented using Float32Array. Basically, arrays of points, either X,Y or X,Y,Z.

I need to hash the data structures.

Float32Array.prototype.buffer gives you an ArrayBuffer which is something jsSHA can take.
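A minimal illustration of that suggestion, using only standard JavaScript:

```javascript
// A typed array's .buffer property exposes the underlying ArrayBuffer,
// which is raw bytes and therefore hashable.
const pts = new Float32Array([1.5, 2.5, 3.5]);
const buf = pts.buffer;

console.log(buf instanceof ArrayBuffer); // → true
console.log(buf.byteLength);             // → 12 (3 floats * 4 bytes)
```

One caveat: if the typed array is a view onto part of a larger buffer (e.g. created via `subarray()`), `.buffer` covers the whole underlying allocation, so you may need `byteOffset` and `byteLength` to slice out just the view's bytes.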

Cool. I’ll try this out then. Thanks.

Some performance statistics would be good. I’m sure this library would be useful in many cases.

@Caligatio here's a little feedback from my quick testing.

I really like the API, which is easy to work with. Also, the support for ArrayBuffer seems to be perfect as well.

Here's the code of the quick hash function.

```js
const hashGeom3 = (geom) => {
  // `geometry` here comes from the surrounding application (JSCAD)
  const polygons = geometry.geom3.toPolygons(geom)
  const buffer = new Float64Array(3)
  const hash = new jsSHA("SHA-256", "ARRAYBUFFER")
  polygons.forEach((polygon) => {
    const points = geometry.poly3.toPoints(polygon)
    points.forEach((point) => {
      buffer[0] = point[0]
      buffer[1] = point[1]
      buffer[2] = point[2]
      // the "ARRAYBUFFER" input format expects an ArrayBuffer,
      // so pass the typed array's underlying buffer, not the view itself
      hash.update(buffer.buffer)
    })
  })
  const key = hash.getHash("HEX")
  return key
}
```

Let me know if you see any issues with the code.

As far as performance, here are a few numbers (elapsed times are in milliseconds):

  • polygons 512: sha256 elapsed 19, json elapsed 3
  • polygons 2592: sha256 elapsed 36, json elapsed 11
  • polygons 10368: sha256 elapsed 63, json elapsed 54
  • polygons 64800: sha256 elapsed 309, json elapsed 280

(json is JSON.stringify() on the geom object)

This is strictly feedback.

I also looked at the code, and tried to determine the memory footprint of the sha256 class. It seems very compact. Are there any buffers kept?

P.S. This is definitely faster than MD5 hashes, even the fastest.

I'm glad it's giving you the speed you want/need!

In terms of buffers, every 4 bytes of data get packed into a JavaScript number (which is actually a float). SHAKE-128 has the largest state at 168 unpacked bytes. The library will also store incomplete chunks of data, which are guaranteed to be no larger than the underlying state size.

It will also temporarily buffer any input while processing it, but doesn't keep it around any longer than that processing call. For example, if you pass in 1024 bytes of data, it will create a temporary variable holding 256 JavaScript numbers.

Good information. Thanks.

I still think that a performance test would be worth the effort, as lots of people search for hashing algorithms, and very few implementations publish performance information. For instance, knowing the relative performance of the various algorithms would help decide which to use when.
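A relative comparison like the one suggested above can be produced with a very small harness. This is a generic sketch; `fakeHash` is a hypothetical stand-in for whichever hash implementation (jsSHA, Node crypto, etc.) you want to time:

```javascript
// Minimal benchmark harness sketch: runs `fn` on `input` `iterations`
// times and returns the average elapsed milliseconds.
function benchmark(fn, input, iterations = 10) {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) fn(input);
  return (Date.now() - start) / iterations;
}

// Trivial stand-in "hash" (sum of bytes), just to demonstrate usage.
const fakeHash = (bytes) => bytes.reduce((a, b) => (a + b) & 0xff, 0);
const avgMs = benchmark(fakeHash, new Uint8Array(1024).fill(7));
console.log(avgMs >= 0); // → true
```

Running the same harness over each algorithm and a range of input sizes would give the relative numbers being asked for here.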

Just for information (and because I ran into this thread), I was comparing the performance of different libraries and algorithms for file hashing.

So in my case, I'm more interested in speed and low collision risk. Leaving this here in case someone else ends up in this thread.

Here's a small extract from my simple benchmark:

 >> Sample 3 - 17.11 MB
* XXHash3: 23.796ms
* XXHash128: 19.111ms
* md5-file.MD5: 43.050ms
* Node.MD5: 48.725ms
* jssha.SHA1: 291.356ms
* Node.SHA1: 35.071ms
* jssha.SHA256: 388.465ms
* Node.SHA256: 56.407ms

 >> Sample 4 - 84.32 MB
* XXHash3: 98.887ms
* XXHash128: 97.379ms
* md5-file.MD5: 207.953ms
* Node.MD5: 213.164ms
* jssha.SHA1: 1400.008ms
* Node.SHA1: 165.200ms
* jssha.SHA256: 1865.832ms
* Node.SHA256: 261.393ms

 >> Sample 5 - 45.02 MB
* XXHash3: 54.408ms
* XXHash128: 47.330ms
* md5-file.MD5: 118.463ms
* Node.MD5: 111.317ms
* jssha.SHA1: 743.692ms
* Node.SHA1: 87.461ms
* jssha.SHA256: 1019.186ms
* Node.SHA256: 134.761ms

And I get similar results on multiple machines.

So my conclusions:

  • XXHash3 is the fastest; no surprise, it's designed for that
  • The Node crypto module's performance is quite good
  • jsSHA is really slow, almost 8~10 times slower than Node for the same task, but it works on the web

Interesting, but are you really making like-for-like comparisons? It seems that XXHash is a 'binding' to the C implementation, so it should be far faster. Maybe you should try XXHashJS, which is written in 100% JavaScript.

And yeah, some performance numbers for the various algorithms would be very helpful.

@z3dev stole the words right out of my text box :) Both XXHash and the native node implementations are compiled code and will always beat interpreted languages in terms of performance. If you're in a position to use a C-based implementation, that is definitely the way to go.

Wow, those were some quick answers, guys. 😄
Indeed, I almost put a sentence about interpreted languages in my comment; maybe I should have.

I totally agree, both xxhash and node are bindings to C/C++ implementations and most of the performance probably comes from there. I knew from the start it would be slower, I just wanted to know how big the difference was in practical usage. And with so much IO involved, I was expecting a x3~4 slowdown, but not x10... at least now I know.

As I said, I will continue to use jssha, mostly for the web, but I will be more careful with the amount of data to hash.
Thinking about it, I need to check if there are some decent wasm implementations of algorithms like sha1/sha256 out there; it could be the best of both worlds.
Because yes, a native addon is fast, but compiling and maintaining a node addon is quite a pain 😼