Click to See Complete Forum and Search --> : Do you agree that my code is able to generate the most unique random string?
phproy
12-23-2004, 08:34 AM
Hi guys,
I am trying to create a function to generate unique random number.
As we all know, time is the most unique variable ever, so I try to base my code on time.
$id = md5(uniqid(microtime(), true));
echo dechex(crc32($id));
I used crc32 to make the string shorter. And I am only going to use this function on 1 website so it will be impossible to get a duplicate of the microtime().
I have looped this 10000 times and every generated string is different. I want to know if you guys will agree with me that this is 99.99% foolproof.
:D
laserlight
12-23-2004, 09:06 AM
If you want the unique part to be foolproof, then check the existing numbers generated for a match.
uniqid() already uses the current time, perhaps you could use rand(), or the Mersenne Twister pseudo-random number generator instead, with mt_rand()?
e.g.
$id = sha1(uniqid(mt_rand(), true));
If you really want random numbers, look for something like your own implementation of HotBits (http://www.fourmilab.ch/hotbits/), though even then there are doubts pertaining to bias in the reference implementation.
phproy
12-23-2004, 09:46 AM
There is no way I can track the generated strings in my situation.
Anyway is the function below 99.9% foolproof?
$id = sha1(uniqid(mt_rand(), true));
I tried reading hotbits but it is seriously hurting my brain.
laserlight
12-23-2004, 09:59 AM
Anyway is the function below 99.9% foolproof?
If you're taking a substring of the SHA1 hash, then maybe not.
But if you're using the full SHA1 hash, then it probably is more collision-resistant than MD5.
phproy
12-23-2004, 10:06 AM
If SHA1 is so collision resistant, then why is everybody using MD5 for authentication scripts?
And is there any easier way to generate a 10 chars unique random numbers than using hotbits?
laserlight
12-23-2004, 10:38 AM
If SHA1 is so collision resistant, then why is everybody using MD5 for authentication scripts?
1. MD5 is faster than SHA1
2. MD5 is also pretty collision resistant, just that some parts of it have been shown to be not so
3. The reasons behind revising SHA to SHA1 was not published
4. sha1() is only available from PHP 4.3.0 onwards
And is there any easier way to generate a 10 chars unique random numbers than using hotbits?
The problem is uniqueness.
If you cant check, you'll never know.
phproy
12-23-2004, 10:49 AM
Daaaaaaaaaaaamn.....
How did you manage to get those details?
Even after loitering at at php.net for months I have never read about info like that. There must be some PHP inner circle thing going on.
Originally posted by laserlight
1. MD5 is faster than SHA1
2. MD5 is also pretty collision resistant, just that some parts of it have been shown to be not so
3. The reasons behind revising SHA to SHA1 was not published
4. sha1() is only available from PHP 4.3.0 onwards
laserlight
12-23-2004, 11:15 AM
How did you manage to get those details?
Even after loitering at at php.net for months I have never read about info like that. There must be some PHP inner circle thing going on.
Aside from #4 the rest have little to do with PHP, but are related to cryptography instead.
Weedpacket
12-24-2004, 04:28 AM
$id = sha1(uniqid(mt_rand(), true));
If it's uniqueness you want then you might as well just go
$id = uniqid(mt_rand(),true);
sha1() is deterministic, and so doesn't add any information (i.e., increase "uniqueness" if you want to commit the semantic crime of quantifying an absolute). In fact, although it is pretty unlikely, there is a chance that it will destroy information, so that there are fewer possible strings as a result. (Actually, it may be impossible for the range of strings that uniqid() produces, but I haven't seen a proof.)
All sha1() or md5() would do in this case is (a) normalise the string to a fixed length, and (b) make it harder to guess the generator. (a) is a bigger benefit.
Weedpacket
12-24-2004, 05:07 AM
Originally posted by phproy
I tried reading hotbits but it is seriously hurting my brain. That would be the radioisotopes. You're supposed to keep them shielded, you know :)
But it shouldn't be too hard to make a request for, e.g., http://www.fourmilab.ch/cgi-bin/uncgi/Hotbits?nbytes=16&fmt=hex and parse out the bit you want.
saloon12yrd
01-07-2005, 09:02 AM
Sorry, but since when is time unique?
The company I work for has developed eCommerce Systems with several terabyte of data per day will a _lot_ of requests / sessions.
I assume that your application (as most projects are) will not be of such size but in general I believe it to be a good practise to code in a way that keeps the code future-proof and ready to grow.
If you mean your "99,99% foolproof" statement literally, you'll be fine using microtime() like you suggested and can stop reading here.
However if you're aiming at 100% I suggest this:
Create a database table with a single column of the largest INT type your DB supports (probably BIGINT). Don't make it an auto-value.
Don't forget the primary key.
Create a stored procedure that
a) LOCKs the table
b) INSERTs a new row into that table with MAX(num) + 1
c) reads the new MAX()
d) UNLOCKs the table
e) returns the value read in c) to the client
This procedure is considerably slower than your suggested method but it will guarantee you a 100% unique number.
To avoid one standard-response: No, you will not run out of numbers. 64bit is _really_ big.
And no, harddisk space won't be an issue.
Regarding SHA1(): You're way above 99,99% with both SHA1() and MD5().
If I remember correctly, MD5 collides (statistically) once every 2^128 _different_ input vectors, SHA1() does so for every 2^160 vectors.
That's a lot. But bear with me, my cryptography lessons are a little aged, so the numbers might be wrong. However, you definately _are_ above 99,99%.
Hope I could help.
Dominique
saloon12yrd
01-07-2005, 10:04 AM
Some further thoughts about the uniqueness of constructs like
md5(uniqid(microtime(), true));
I will _only_ cover uniqueness and leave out security completely.
First, a definition of "collision": A behaviour where a function returns an identical output R for two different inputs V1 and V2 is defined as collision.
Now on to the thoughts:
1) The beginning:
uniqid()
The PHP manual states "uniqid() returns a [...] unique identifier based on the current time in microseconds."
Judging from the output of uniqid() (without parameters) it seems to be doing nothing more than a rather simple transformation of some kind.
So: uniqid() might (!) return the same result for the same microsecond. Without diving into PHP sources I don't know but from an educated guess I'm pretty sure of it.
In other words, uniqid() might have the same uniqueness as microtime(). For the further thoughts however, this is irrelevant.
2) Next:
uniqid('prefix')
The prefix is not merged with the microtime that uniqid() internally uses, but it merely prefixes it.
So: For a fixed prefix, the uniquess is identical to 1)
For a variable prefix however, the uniqueness increases. This is what you do with uniqid(microtime()).
By what amount it increases is unclear. Sadly this is significant for an evaluation. But again, for the further thoughts this is irrelevant.
3)
uniqid(microtime(), true)
Added entropy. Extract from the manual "...which should make the results more unique.". Hmm, rather vague. Maybe the entropy parameter increases the entropy by factors, maybe not. Most likely it will. But since I don't have enough time to read and understand the internals of the combined linear congruential generator I'm thinking worst-case and assume that it won't.
4) Hashing.
The characteristics of any hash function can be summarized as follows: For a given input string of arbitrary length the hash function will generate a string-representation of fixed length. For MD5() the length is 128 bits, which is 32 byte in hex-representation.
Read it again. A fixed string will always return the same output. That's the whole point of hashing.
Since MD5() (and most other hash functions) uses a finite charset _and_ a fixed length for the returned values, the number of different outputs that MD5() can return is limited. To be precise, MD5() can return 2^128 different hashes.
Since it is very easily possible (I don't say it's fast) to generate _more_ than 2^128 different strings, MD5() _will_ collide at some point. SHA1() by the way will too, but later.
Now a look at the code:
md5(uniqid(microtime(), true));
Lets make it simpler:
$vector = uniqid(microtime(), true);
$hash = md5($vector);
As long as $vector does not change, $hash won't change as well. Remember, that's the whole point of hashing. In short: The uniqueness of $hash is roughly equal to the uniqueness of $vector. You're not gaining anything.
5) Further thoughts:
In 4) I state "The uniqueness of $hash is roughly equal to the uniqueness of $vector.". Why only roughly? Because to be precise it will be _lower_ than the uniqueness of $vector. Why? Because I believe (and please correct me) that we're adding probabilities for collisions.
Lets say $vector has a likeliness of X to collide. For two out of X different inputs, identical outputs will be generated.
Lets further say that $hash has a likelyness of Y to collide.
This means:
If $vector1 and $vector2 are identical, then both hashes from those vectors will be identical. Again: This is the whole point of hashing.
But if $vector1 and $vector3 are _not_ identical, the two hashes from those vectors might _still_ be identical, due to a collision of MD5().
Conclusions:
Basically you might be worsening things by hashing a string that is already considered to be unique. I'm not saying that hashing is a bad thing, all I say is that you have to be careful when to use it. Again, if you feel I might be mistaken please correct me.
What might remain is the desire to keep the string at a fixed-length. For that, it might not look as good as hashing but from a uniqueness point of view you might as well use sprintf() to pad the string.
Regards,
Dominique
Weedpacket
01-07-2005, 10:19 AM
Well, that pretty much expands on my earlier post - with the proviso of using microtime(true) instead of mt_rand. I'd still use the external entropy supply. It's only a shame uniqid() doesn't use the same one specified for the session ID generator.
saloon12yrd
01-07-2005, 10:53 AM
I agree.
You pointed out the fact exacly like I did (mt_rand() vs. microtime() asside). And again I agree that the fixed length issue is generally more beneficial than "true" uniqueness.
But I thought it'd might help people to understand _why_ certain things might be not such a good idea in certain situations. Hovever for the every-day usage my posting is indeed a little too theoretical.
To add a more practical suggestion, what I personally like to use is something like
md5('magic word' . $_SERVER['HTTP_HOST'] . microtime());
microtime() varies very quickly, ensuring entropy
$SERVER['HTTP_HOST'] makes sure that two requests in the same microsecond on two servers generate different values
'magic word' is a constant intended to make guessing the vector a little more complicated for blackhats.
finally md5() ensures a safe, fixed-length string.
I consider this perfectly safe for most applications.
Regards,
Dominique
PHP Builder
Copyright WebMediaBrands Inc. All Rights Reserved.