Skip to main content

Timeline for Hash map without collision check

Current License: CC BY-SA 3.0

32 events
when toggle format what by license comment
Mar 15, 2018 at 9:23 audit Suggested edits
Mar 15, 2018 at 9:24
Mar 10, 2018 at 4:31 history tweeted twitter.com/StackSoftEng/status/972328990877863936
Mar 8, 2018 at 16:53 vote accept CodeSandwich
Mar 7, 2018 at 9:17 comment added Jules @TKK - with a secure hash (e.g. SHA256), the hash produced is believed to be effectively indistinguishable from random for any known input data.
Mar 7, 2018 at 0:06 answer added StackOverthrow timeline score: 0
Mar 6, 2018 at 23:54 comment added StackOverthrow @Jules The probability of a random hash collision is only relevant if your data is random.
Mar 6, 2018 at 14:57 comment added JimmyJames @CodeSandwich Sorry, I'm an idiot: we know that across the space of all 257-bit pre-images, every input image will collide with at one other pre-image input on average. For the 512-bit pre-image space each pre-image collides with the of 2^256 other pre-images on average.
Mar 6, 2018 at 0:58 comment added JimmyJames @CodeSandwich As I was thinking about my last comment, it occurs to me that a non-secure hash algorithm might be a better solution here. That is, the features of SHA-256 that make it good for security purposes don't help this solution and they may hinder it. That is, if you can predict the collisions more easily, you can be more certain about their absence. Security doesn't factor into your idea as I understand it.
Mar 5, 2018 at 23:03 comment added JimmyJames @CodeSandwich So we need to be clear with terminology here. Hashes are not 'random'. They are perfectly deterministic which is why they are useful i.e. A always maps to B. That's why using probabilistic arguments are tricky. We know that across the space of all 512-bit pre-images, every input image will collide with at least one other pre-images input. But which ones collide and how often? No one can tell you. The picture you link to just makes some really dubious arguments. Your question, would be improved by dropping it.
Mar 5, 2018 at 22:51 comment added CodeSandwich @JimmyJames Sure, it's always possible. But if we consider real world hash map keys random (when nobody tries to break it intentionally), the odds of finding a collision are really, really low, it's almost impossible. Applying pigeonhole principle here is theoretically valid, but practically useless, it would be a valid point if hash was small.
Mar 5, 2018 at 22:44 comment added candied_orange Let me make my point with some examples
Mar 5, 2018 at 22:17 comment added JimmyJames "There seems to be a lot of disregard towards 2^256 combinations." I don't think anyone here is arguing that 2^256 is not a really big number. The problem is that just because you put a number in a 32 byte array doesn't mean it will never equal another number placed in a 32 byte array. By that logic, it's impossible to see the light from distant stars due to how large space is and how unlikely it is for any of the photons they emit to strike the earth.
Mar 5, 2018 at 21:30 comment added JimmyJames @amon The math behind the birthday problem assumes all values in the image space are equally likely. Showing that a given hash algorithm has this property across all values you expect to put into it is where the issues tend to arise.
Mar 5, 2018 at 21:21 vote accept CodeSandwich
Mar 5, 2018 at 21:24
Mar 5, 2018 at 21:09 comment added JimmyJames @RobertHarvey Maybe I'm not understanding your point here but wouldn't that defeat the purpose i.e. you are talking about a collision check. Also, how would you know when you need the alternate hash on lookup?
Mar 5, 2018 at 21:01 comment added candied_orange Wow ok look I'm not arguing that 256 isn't big enough. I'm arguing that there are problems that don't go away no mater how many bits you throw at it. Humans aren't a good source of random. Hackers can make this a security issue regardless of what you think it is. I fully expect the sun to burn out before anyone counts to 256 bits. I also expect people to find ways to exploit those who think that's all this problem is about.
Mar 5, 2018 at 20:00 comment added amon @CandiedOrange Please take a look at the probability table and mathematical descriptions in the Birthday Attack Wikipedia article. We would have to calculate over a billion billion billion billion hashes before we can statistically expect to find a collision. That is a very large number. Please compare this with the probability of cosmic rays striking your CPU or your RAM and therefore corrupting your hash table. The difference is, errors due to cosmic rays do actually happen at a relevant scale.
Mar 5, 2018 at 19:56 comment added Andres F. @CandiedOrange As per the OP's question, cryptographic safety is not the point here.
Mar 5, 2018 at 19:40 history edited Robert Harvey CC BY-SA 3.0
deleted 53 characters in body
Mar 5, 2018 at 18:55 history edited CodeSandwich CC BY-SA 3.0
added 916 characters in body
Mar 5, 2018 at 18:41 comment added candied_orange @RobertHarvey It's a tad more complicated than that. A hash collision attack doesn't let you know it's happening. And if the attacker had access to billions of hashes finding a 1 in a billion collision isn't so unlikely. Math without context isn't good math.
Mar 5, 2018 at 18:21 comment added Robert Harvey @CandiedOrange: Which just means that you need code to recover from such an unlikely occurrence, code that would probably ask for a new hash.
Mar 5, 2018 at 18:16 answer added JimmyJames timeline score: -1
Mar 5, 2018 at 17:07 comment added Jules @KilianFoth - the pigeonhole principle (or rather the birthday paradox, which is related and more relevant here) suggests that with a 256 bit hash, you're likely to start seeing collisions when you have 2^128 items in storage. If you use the "short scale" for numbering, that's around 30 decillion items. Nobody has that much data. Nobody ever will have that much data.
Mar 5, 2018 at 17:03 answer added Jules timeline score: 7
Mar 5, 2018 at 6:00 review Close votes
Mar 10, 2018 at 3:01
Mar 5, 2018 at 4:51 comment added candied_orange Just because it's incredibly unlikely to flip a coin and have it land on edge doesn't mean it has no chance of happening. Things that have 1 in a billion chance of happening happen every day.
Mar 4, 2018 at 23:45 comment added Stack Exchange Broke The Law Will your hash table have 2^256 buckets? Or will it only have 64ish buckets and use the lower 6ish bits of the hash?
Mar 4, 2018 at 23:31 answer added amon timeline score: 8
Mar 4, 2018 at 23:27 comment added Kilian Foth No, you can't. Look up the "pigeonhole principle" to see why.
Mar 4, 2018 at 23:10 review First posts
Mar 5, 2018 at 7:16
Mar 4, 2018 at 23:09 history asked CodeSandwich CC BY-SA 3.0