39 minutes ago by boulos
I feel like youâd want something a bit safer than âwe donât store the keys and just rely on the hash to be really goodâ [1], putting âplease do not use this for serious tasksâ in a comment embedded in the header file isnât a clear enough warning.
Itâs not clear to me that that probability of collision assumptions hold. Itâs basically assuming that the hashing is perfect and distributes any inputs to the full 64-bit space with uniform probability. Thatâs the usual hash map / randomized algorithm hope, but does BigCrush or similar avalanche testing really prove that? (Presumably not, otherwise there wouldnât be image attacks for things like md5).
[1] https://github.com/wangyi-fudan/wyhash/blob/d2a305811972f391...
23 minutes ago by undefined
a few seconds ago by graderjs
I agree that not storing keys is riskier, and it's not a risk that's mitigated even if it is a perfect uniform distribution because you can still have collisions. Maybe I don't understand the reasons for that design but personally I don't think it's a good design for a hash table. But even so the hash underlying it is a very good hash.
Big crush and smhasher give a good indication of uniform distribution, but nothing can guarantee no collisions because you're always going to have collisions it's always a possibility. Even if you have a perfect permutation of the 64 bit space, the minute you go beyond 64 bits of keys you're going to collide within 64 bits of hash.
By all the tests they run md5 is a much poorer hash than many others. But it's a or it was a cryptographic hash. It's different.
34 minutes ago by laserbeam
Meow hash claims 3-4x faster hashing over this, still passes smhasher, and is a few years old. https://mollyrocket.com/meowhash
And there are a bunch of other good suggestions here in the comments looking at around the same 50-60gb/sec speed
a minute ago by wangyi_fudan
moew is not good at short keys. portability is also a concern
39 minutes ago by zucker42
Based on the comment in the source, the hash table doesn't handle hash collisions. And furthermore, I don't see the code which benchmarks this versus other hash tables, just the claim that it is 2-3x faster. Specifically, comparisons for different tables sizes using the same hash function would be good.
The API this presents is not really inspiring either.
36 minutes ago by undefined
an hour ago by kouteiheika
According to this ahash is faster:
https://github.com/tkaitchuck/aHash/blob/master/compare/read...
Did anything change?
an hour ago by dietrichepp
I see my name in the "thanks" section, it's nice, but I don't know why I'm there.
36 minutes ago by soedirgo
Probably for your PR from way back when: https://news.ycombinator.com/item?id=19359870
44 minutes ago by pansa2
See also âNew fastest portable hash: wyhashâ from 2019:
an hour ago by jandrewrogers
A change in the last ten years is that it now makes a lot of sense, due to the development of nuanced algorithms, to optimize your hash infrastructure for the details of the specific application. This has been a positive development generally when it comes to efficient software.
an hour ago by fakename11
They should throw the "Simple Example" through a code formatter. Would look more professional. Other than that, this is a nice project, congrats to the author.
Daily digest email
Get a daily email with the the top stories from Hacker News. No spam, unsubscribe at any time.