Hacker News

39 minutes ago by boulos

I feel like you’d want something a bit safer than “we don’t store the keys and just rely on the hash to be really good” [1], putting “please do not use this for serious tasks” in a comment embedded in the header file isn’t a clear enough warning.

It’s not clear to me that that probability of collision assumptions hold. It’s basically assuming that the hashing is perfect and distributes any inputs to the full 64-bit space with uniform probability. That’s the usual hash map / randomized algorithm hope, but does BigCrush or similar avalanche testing really prove that? (Presumably not, otherwise there wouldn’t be image attacks for things like md5).

[1] https://github.com/wangyi-fudan/wyhash/blob/d2a305811972f391...

23 minutes ago by undefined

[deleted]

a few seconds ago by graderjs

I agree that not storing keys is riskier, and it's not a risk that's mitigated even if it is a perfect uniform distribution because you can still have collisions. Maybe I don't understand the reasons for that design but personally I don't think it's a good design for a hash table. But even so the hash underlying it is a very good hash.

Big crush and smhasher give a good indication of uniform distribution, but nothing can guarantee no collisions because you're always going to have collisions it's always a possibility. Even if you have a perfect permutation of the 64 bit space, the minute you go beyond 64 bits of keys you're going to collide within 64 bits of hash.

By all the tests they run md5 is a much poorer hash than many others. But it's a or it was a cryptographic hash. It's different.

34 minutes ago by laserbeam

Meow hash claims 3-4x faster hashing over this, still passes smhasher, and is a few years old. https://mollyrocket.com/meowhash

And there are a bunch of other good suggestions here in the comments looking at around the same 50-60gb/sec speed

a minute ago by wangyi_fudan

moew is not good at short keys. portability is also a concern

39 minutes ago by zucker42

Based on the comment in the source, the hash table doesn't handle hash collisions. And furthermore, I don't see the code which benchmarks this versus other hash tables, just the claim that it is 2-3x faster. Specifically, comparisons for different tables sizes using the same hash function would be good.

The API this presents is not really inspiring either.

36 minutes ago by undefined

[deleted]

an hour ago by dietrichepp

I see my name in the "thanks" section, it's nice, but I don't know why I'm there.

36 minutes ago by soedirgo

Probably for your PR from way back when: https://news.ycombinator.com/item?id=19359870

44 minutes ago by pansa2

See also “New fastest portable hash: wyhash” from 2019:

https://news.ycombinator.com/item?id=19357895

an hour ago by jandrewrogers

A change in the last ten years is that it now makes a lot of sense, due to the development of nuanced algorithms, to optimize your hash infrastructure for the details of the specific application. This has been a positive development generally when it comes to efficient software.

an hour ago by fakename11

They should throw the "Simple Example" through a code formatter. Would look more professional. Other than that, this is a nice project, congrats to the author.

Daily digest email

Get a daily email with the the top stories from Hacker News. No spam, unsubscribe at any time.