gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

Coding > string to seed

#64765 - sgeos - Mon Dec 26, 2005 2:48 pm

What is a good string to seed function? I'm currently using something like:
Code:
u32 stringToSeed(const char *pString, u32 pSeed)
{
   u32 result = pSeed;
   int i;

   for (i = 0; pString[i]; i++)
      result = (result + pString[i]) * 69069;
   return result;
}

Does anyone have anything better?

-Brendan

#64771 - keldon - Mon Dec 26, 2005 3:13 pm

well considering your string is mostly going to contain letters of the alphabet and numbers I would suggest transposing results to letter 'a' and doing modulo 32 so that 'A' and 'a' have the same value. This can then be scaled from 32 to 69069 so that you have a more versatile range. You could also use letter frequency in reverse so that strings would yield more balanced results that are not highly distributed around 's' and 'e' if you **really** need to.

What about some of the error correction calculations such as checksum etc?

#64775 - Ultima2876 - Mon Dec 26, 2005 4:40 pm

Out of interest, why 69069?

EDIT: "The hardcore programers are the ones who were logged in at 3am on Christmas day!!!" -- I'll toast that ;P

#64776 - kusma - Mon Dec 26, 2005 5:01 pm

because 69069 is a prime number, multiply-adding with a prime number reduce the statistical properties of the checksum, making less collitions. i guess ;)

#64780 - poslundc - Mon Dec 26, 2005 7:02 pm

69069 is a number frequently used as a multiplier in linear congruential generators for random numbers. It is actually known as the "Super-Duper" LCG. More info can be found at: http://random.mat.sbg.ac.at/~charly/server/node3.html

(I've typically favoured the BCPL instead of Super-Duper and been pretty happy with the results.)

It shouldn't be necessary or relevant to scale your seed value by the multiplier - you'll be doing that anyway in the implementation of the random algorithm, so I don't imagine you'd get better results by involving the multiplier in your seed selection.

Personally, I would just add the characters... or if I wanted something to prevent collisions from things like having the same string backwards, I'd run sha1 or MD5 or some other hashing algorithm on the string to get something that'll be truly unpredictable for every string.

Dan.

#64835 - sgeos - Tue Dec 27, 2005 8:15 am

keldon wrote:
well considering your string is mostly going to contain letters of the alphabet and numbers I would suggest transposing results to letter 'a' and doing modulo 32 so that 'A' and 'a' have the same value.

Why assume English? It's going to contain values between a minimum and a maximum. What those values are will depend on the language. If I really want 'A' and 'a' to have the same value I'll write a toUpper() function and pass in an upper case string.

keldon wrote:
This can then be scaled from 32 to 69069 so that you have a more versatile range.

Note the u32 return type. I want 32 bits. Scaling belongs in another function.

keldon wrote:
You could also use letter frequency in reverse so that strings would yield more balanced results that are not highly distributed around 's' and 'e' if you **really** need to.

Neat idea, but what happens if the player really likes wxyz and q? I suspect inverse scaling could work really well give another set of requirements (which I probably should have mentioned).

Quote:
What about some of the error correction calculations such as checksum etc?

What do you assume I'm going to use this for? I want 32 bits to seed an RNG.

Ultima2876 wrote:
Out of interest, why 69069?

Because it works. I should have hidden it behind a macro.

kusma wrote:
because 69069 is a prime number

69069 is not prime. By looking at it you can tell that it factors into 69 * 1001. Its prime factorization is 3 * 7 * 11 * 13 * 23. In hex it is 0x10DCD.

poslundc wrote:
Personally, I would just add the characters...

That's basically what I'm doing. I want a 32 bit return value, and I want different permutations of the same letters to return different results. I'll look into sha1 and MD5. Thanks!

-Brendan

#64841 - poslundc - Tue Dec 27, 2005 10:57 am

Between MD5 and sha1 I'd probably favour MD5... it's been a while since I looked, but I think MD5 is a bit less computationally expensive (and a bit less secure, which of course doesn't matter a whit for your purposes) and it generates only 16 bytes as opposed to 20 (seeing as you'll only be needing the first four anyway).

Since you'll presumably be doing all of this during initialization, computation time won't be a factor for you. Still, I find it's occasionally useful to have these hashing algorithms around... it's surprising how frequently they come up in various parts of game functionality.

Dan.