substrings of hashes


Ravenous
04-25-2005, 09:53 AM
How unique are the strings produced by the following code going to be?

substr(md5(md5(time())),0,10);

I hash the hash because the line right before this one hashes time() as well, and I want to make sure the two strings are not equivalent.

The way I see it, there are (26+10)^10 (or 3,656,158,440,062,976, about 3.6 QUADrillion) possibilities. Given that the site expects only about 10,000 users, I should be pretty safe in assuming that each user gets a unique (and pseudorandom) string associated with their account. Safe assumption?

My only worry is that md5 may return two hashes that start with the same 10-character substring. Any thoughts on that?

One workaround (of sorts) would be to set the user's registration key to, say, 0000000000 once they have completed registration. That would reduce the chance of two users who have not yet completed registration having the same registration key at any given time.
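A minimal sketch of that workaround in PHP. $keys is a hypothetical in-memory array standing in for the users table (user id => key), and newKey() assumes keys come from substr(md5(uniqid('', true)), 0, 10); real code would query the database instead. The loop regenerates a key if it equals the completed marker or any key still on file.

```php
<?php
// Assumption: $keys is a hypothetical stand-in for the users table (user id => key).
const COMPLETED_KEY = '0000000000'; // "registration finished" marker from the post

function newKey(): string
{
    // Assumed key source: first 10 hex characters of an md5 digest.
    return substr(md5(uniqid('', true)), 0, 10);
}

function issueKey(array &$keys, int $userId): string
{
    do {
        $key = newKey();
        // Strict comparison (true) matters: with loose comparison PHP treats
        // keys such as "0e12345678" and "0e87654321" as equal numbers.
    } while ($key === COMPLETED_KEY || in_array($key, $keys, true));

    $keys[$userId] = $key;
    return $key;
}

function completeRegistration(array &$keys, int $userId): void
{
    // Once validated, the account no longer holds a live key.
    $keys[$userId] = COMPLETED_KEY;
}
```

With this in place, a new key can only clash with keys of accounts that are still pending, and even then it is simply regenerated.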

laserlight
04-25-2005, 11:26 AM
Originally posted by Ravenous
I hash the hash because the line right before this one hashes time() as well, and I want to make sure the two strings are not equivalent.
Not much point, since the two original hashes should be very different.
If they somehow are the same, then your additional hashes will also be the same.

Originally posted by Ravenous
The way I see it, there are (26+10)^10 (or 3,656,158,440,062,976, about 3.6 QUADrillion) possibilities.
Actually, there are 2**128 possible hashes, which is greater than 36**10.
On the other hand, the number of possible hashes is greatly reduced, since you're hashing the output of time(), which doesn't vary very much.
If the attacker knows the hour at which the hash was computed, he/she only has a few thousand pre-images to choose from.
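To make the "few thousand pre-images" point concrete, here is a sketch of the attacker's side in PHP. It assumes the key was built exactly as in the first post, substr(md5(md5(time())), 0, 10), and that the attacker knows the hour in which it was issued; the function names and the timestamp are hypothetical. One hour is only 3600 candidate seconds, so the search is trivial.

```php
<?php
// Assumption: key = substr(md5(md5(time())), 0, 10), as in the first post.
function keyForTimestamp(int $t): string
{
    // time() returns an int; md5() hashes its decimal string form.
    return substr(md5(md5((string) $t)), 0, 10);
}

// Try every second of the known hour: at most 3600 candidate timestamps.
function recoverTimestamp(string $key, int $hourStart): ?int
{
    for ($t = $hourStart; $t < $hourStart + 3600; $t++) {
        if (keyForTimestamp($t) === $key) {
            return $t;
        }
    }
    return null;
}

// Usage with a hypothetical issue time.
$issuedAt  = 1114437180;
$hourStart = $issuedAt - ($issuedAt % 3600);
$found     = recoverTimestamp(keyForTimestamp($issuedAt), $hourStart);
```

Once $found is known, the attacker can regenerate the key itself, so the double md5 adds no protection here.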

Actually, what are you trying to do?

mrhappiness
04-25-2005, 11:29 AM
If the inner md5 returns the same result twice, you will apply the outer md5 to the very same data and will therefore end up with the very same result:

md5(<something>) = XYZ
md5(<something_else>) = XYZ

md5(md5(<something>)) = md5(XYZ) = ABC
md5(md5(<something_else>)) = md5(XYZ) = ABC

Got it?
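A small PHP check that reconciles both views in this thread: md5(x) and md5(md5(x)) are different strings (what Ravenous observed), yet the outer md5 depends only on the inner digest, so equal inner results stay equal (mrhappiness's point). The same string is used twice here to stand in for a real md5 collision, which would behave identically.

```php
<?php
$x = 'samplestring';

// Hashing once and hashing twice give different strings ...
$once  = md5($x);
$twice = md5(md5($x));

// ... but the outer md5() sees only the inner digest, so two inputs with
// the same inner hash necessarily get the same outer hash.
$inner = md5($x);
$a = md5($inner);
$b = md5(md5($x));

var_dump($once !== $twice); // bool(true)
var_dump($a === $b);        // bool(true)
```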
If you can do something like

substr(md5(time().$id_of_newly_created_user), 0, 10)

then the hashed input contains a unique part even if time() is not unique.

Btw: maybe microtime() is more suitable?
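A sketch of that suggestion in PHP, combining microtime() with a per-user value. $userId and registrationKey() are hypothetical names. The hashed input is unique per user, so two registrations in the same second no longer hash the same string; note, though, that the input is still largely predictable (a timestamp plus a sequential id), so this addresses uniqueness more than guessability.

```php
<?php
function registrationKey(int $userId): string
{
    // microtime() with no argument returns "msec sec", e.g. "0.12345600 1114437180".
    // Appending the (hypothetical) user id makes the hashed input unique per user.
    return substr(md5(microtime() . $userId), 0, 10);
}

$keyA = registrationKey(41);
$keyB = registrationKey(42);
```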

Ravenous
04-25-2005, 06:27 PM
The goal of all this is to ensure that users who begin the registration process cannot have their accounts hijacked by a third party who guesses a currently registered but non-validated account identifier.

For instance, say I send the user a link via email to validate their email address and then finish registering their account. They would receive a link such as www.domain.com/confirm.php?key=k7eh736sdg. Something like confirm.php?userid=1345 would be considerably easier to predict, since the user IDs are sequential. I realize that the problem is minor, but I'm just trying to lock things down as much as I can.
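A sketch of the link-building and key-checking side of that flow in PHP. The function names are hypothetical; www.domain.com and confirm.php come from the post. It assumes keys are 10 lowercase hex characters, as substr(md5(...), 0, 10) produces, and treats 0000000000 as the "already registered" marker suggested earlier in the thread. The example key k7eh736sdg contains non-hex letters, so it could not have come from md5() and is only illustrative.

```php
<?php
// Assumption: keys are 10 lowercase hex characters from substr(md5(...), 0, 10).
const COMPLETED_KEY = '0000000000'; // "registration finished" marker from the thread

function confirmationLink(string $key): string
{
    // http_build_query() URL-encodes the value.
    return 'http://www.domain.com/confirm.php?' . http_build_query(['key' => $key]);
}

// On confirm.php: a cheap sanity check on $_GET['key'] before any database lookup.
// Rejects arrays (?key[]=...), wrong lengths, non-hex characters and the marker.
function isPlausibleKey($key): bool
{
    return is_string($key)
        && preg_match('/^[0-9a-f]{10}$/', $key) === 1
        && $key !== COMPLETED_KEY;
}
```

A key that passes this check still has to match a pending account in the database; the check only filters out malformed input early.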

As for md5(x) and md5(md5(x)) returning the same thing, I do not understand this, nor do I see evidence for it. I mean, a quick test of md5("samplestring") and md5(md5("samplestring")) gives me "ba5759e55b83e28b84c717b95fd7bfd3" and "d75392ba3a6215f450ef12d4216c7acd" respectively. Perhaps I'm not getting something?

And mrhappiness, per your suggestion, I've changed it to microtime().$user_email_address for the time being until a better way of doing this comes along. Thanks for the tip.

mrhappiness
04-26-2005, 03:42 AM
Originally posted by Ravenous
As for md5(x) and md5(md5(x)) returning the same thing, I do not understand this, nor do I see evidence for it. I mean, a quick test of md5("samplestring") and md5(md5("samplestring")) gives me "ba5759e55b83e28b84c717b95fd7bfd3" and "d75392ba3a6215f450ef12d4216c7acd" respectively. Perhaps I'm not getting something?

You use md5 twice because you fear that if you use it only once you might get two identical results, right?
Suppose you have two strings which have the very same md5 hash:
md5("samplestring") = "ba5759e55b83e28b84c717b95fd7bfd3"
If you have another string (e.g. "samplestring2") which has the very same hash "ba5759e55b83e28b84c717b95fd7bfd3", you calculate the md5 hash of "ba5759e55b83e28b84c717b95fd7bfd3" in both cases and will, of course, receive the same result twice.
So it isn't more secure to use md5 more than once.

Originally posted by Ravenous
And mrhappiness, per your suggestion, I've changed it to microtime().$user_email_address for the time being until a better way of doing this comes along. Thanks for the tip.

Assuming $user_email_address is unique (you don't allow multiple accounts to share the same email address, do you?), you have a unique part in the string you calculate the md5 hash from, and that should be suitable.

Weedpacket
04-26-2005, 07:54 AM
If you want a unique ID, why not use uniqid() instead of replicating (parts of) what that function already does?
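For comparison, a minimal look at uniqid() in PHP. Because it is derived from the clock, it is unique in practice but still predictable; the PHP manual states that it does not generate cryptographically secure values and should not be used where values must be unguessable. For the hijacking concern described earlier in the thread, the last line swaps in random_bytes(), which is not mentioned in the thread and requires PHP 7 or later.

```php
<?php
// uniqid() encodes the current time in microseconds as 13 hex characters.
$id = uniqid();

// With more_entropy = true, extra characters are appended to reduce the
// chance of two calls in the same microsecond returning the same id.
$idMore = uniqid('', true);

// Swapped-in alternative, not from the thread (PHP 7+):
// 5 bytes from the CSPRNG, hex-encoded, gives a 10-character key.
$key = bin2hex(random_bytes(5));
```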

Originally written by laserlight
On the other hand, the number of possible hashes is greatly reduced, since you're hashing the output of time(), which doesn't vary very much.
And then you take only the first ten characters. Since md5 output is hexadecimal (4 bits per character), that reduces the number of possible hashes to only 16**10 = 2**40, a shave over a trillion.
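The arithmetic in PHP, for the 10-character key from the first post. md5() returns 32 lowercase hex characters, so each character of the substring has 16 possible values, not 36. The collision estimate uses the standard birthday approximation and assumes keys are uniformly random, which keys derived from time() are not (see laserlight's point), so it is a best case rather than a guarantee.

```php
<?php
$estimated = 36 ** 10; // letters + digits, as estimated in the first post
$hexSpace  = 16 ** 10; // what substr(md5(...), 0, 10) can actually produce (2**40)

// Birthday approximation: probability that at least two of $n uniformly
// random keys drawn from $space possible values coincide.
function collisionProbability(int $n, float $space): float
{
    return 1.0 - exp(-($n * ($n - 1)) / (2.0 * $space));
}

$users = 10000; // expected user count from the first post
echo number_format($estimated), "\n";                   // 3,656,158,440,062,976
echo number_format($hexSpace), "\n";                    // 1,099,511,627,776
printf("%.6f\n", collisionProbability($users, $hexSpace)); // 0.000045
```

So even the smaller hex key space leaves roughly a 1-in-22,000 chance of any collision among 10,000 random keys; the real weakness is predictability, not key-space size.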