How to choose a seed for generating random numbers in SAS
This post was kindly contributed by The DO Loop - go there to comment and to read the full post.
Last week I was asked a simple question: “How do I choose a seed for the random number functions in SAS?” The answer might surprise you: use any seed you like. Each seed of a well-designed random number generator is likely to give rise to a high-quality stream of random numbers, so you can view the various streams as statistically equivalent.
Random means random
To be clear, I am talking about using a seed value to initialize a modern, high-quality, pseudorandom number generator (RNG). For example, in SAS you can use the STREAMINIT subroutine to initialize the Mersenne twister algorithm that is used by the RAND function. If you are still using the old-style RANUNI or RANNOR functions in SAS, please read the article “Six reasons you should stop using the RANUNI function to generate random numbers.”
A seed value specifies a particular stream from a set of possible random number streams.
When you specify a seed, SAS generates the same set of pseudorandom numbers every time you run the program. However, there is no intrinsic reason to prefer one stream over another. The stream for seed=12345 is just as random as the stream for the nine-digit prime number 937162211.
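To make the reproducibility claim concrete, here is a minimal sketch that runs the same seeded DATA step twice and compares the results. The data set names Run1 and Run2 and the sample size of five are arbitrary choices for illustration.

```sas
/* Run the same seeded DATA step twice (data set names are illustrative) */
data Run1;
   call streaminit(12345);        /* fix the seed */
   do i = 1 to 5;
      x = rand("uniform");
      output;
   end;
run;

data Run2;
   call streaminit(12345);        /* same seed, so the same stream */
   do i = 1 to 5;
      x = rand("uniform");
      output;
   end;
run;

/* Compare the two data sets value by value */
proc compare base=Run1 compare=Run2;
run;
```

If the seed behaves as described, PROC COMPARE should report that all compared values are equal. Changing the seed in only one of the steps should produce differences.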
Some people see the number 937162211 and think that it looks “more random” than 12345. They then assume that the random number stream that follows from CALL STREAMINIT(937162211) is “more random” than the random number stream for CALL STREAMINIT(12345). Nope, random means random.
In modern pseudorandom generators, the streams for different seeds should have similar statistical properties. Furthermore, many RNGs use the base-2 representation of the seed for initialization and (12345)_{10} = (11000000111001)_{2} looks pretty random! In fact, if you avoid powers of 2, the base-2 representations of most base-10 numbers “look random.”
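One informal way to check both points is sketched below. The first step prints the seed 12345 by using the BINARYw. format, which should reproduce the base-2 string quoted above. The remaining steps draw 10,000 uniform variates from each of the two seeds and summarize them side by side. The data set names (Easy, Prime, Both) and the sample size are assumptions made for this illustration, and a comparison of summary statistics is only a sanity check, not a formal test of randomness.

```sas
/* Display the base-2 representation of the seed 12345 */
data _null_;
   seed = 12345;
   put seed= binary14.;
run;

/* One stream per DATA step: only the first STREAMINIT call in a step takes effect */
data Easy;
   call streaminit(12345);
   do i = 1 to 10000;
      x = rand("uniform");
      output;
   end;
run;

data Prime;
   call streaminit(937162211);
   do i = 1 to 10000;
      x = rand("uniform");
      output;
   end;
run;

/* Stack the two samples and label each row with its seed */
data Both;
   length Seed $ 9;
   set Easy(in=inEasy) Prime;
   if inEasy then Seed = "12345";
   else Seed = "937162211";
run;

/* Summary statistics by seed */
proc means data=Both n mean std min max;
   class Seed;
   var x;
run;
```

For a uniform distribution on (0,1), the theoretical mean is 1/2 and the standard deviation is 1/sqrt(12), which is about 0.289. Both samples should land close to those values, with neither seed looking systematically better than the other.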
Initialization: Hard for researchers, easy for users
Researchers who specialize in random number generators might criticize what I’ve said as overly simplistic. There have been many research papers written about how to take a 32-bit integer and use that information to initialize an RNG whose internal state contains more than 32 bits. There have been cases where an RNG was published and the authors later modified the initialization routine because certain seeds did not result in streams that were sufficiently random. There have been many discussions about how to create a seed initialization algorithm that is easy to call and that almost always results in a high-quality stream of random numbers.
These are hard problems, but fortunately researchers have developed ways to initialize a stream from a seed so that there is a high probability that the stream will have excellent statistical properties.
The relevant question for many SAS programmers is “can I use 12345 or my telephone number as seed values, or do I always need to type a crazy-looking nine-digit sequence?” My response is that there is no reason to prefer the crazy-looking seed over an easy-to-type sequence such as your phone number, your birthday, or the first few digits of pi.
Choosing a random seed
If you absolutely insist on using a “random seed,” SAS can help. If you call the STREAMINIT subroutine with the value 0, then SAS will use the date, time of day, and possibly other information to manufacture a seed when you call the RAND function. SAS puts the seed value into the SYSRANDOM system macro variable. That means you can use %PUT to display the seed that SAS created, as follows:
data _null_;
   call streaminit(0);   /* generate seed from system clock */
   x = rand("uniform");
run;
%put &=SYSRANDOM;

SYSRANDOM=1971603567
Every time you run this program, you will get a different seed value, which you can then use as the seed for your next program.
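As a sketch of that workflow, the following program saves the value of SYSRANDOM and passes it to STREAMINIT in a second step. The macro variable name savedSeed is a hypothetical name chosen for this example. The sketch assumes that the value in SYSRANDOM, when used as an explicit seed, reproduces the stream from the first step.

```sas
/* Step 1: let SAS manufacture a seed and draw one value */
data _null_;
   call streaminit(0);
   x = rand("uniform");
   put x=;
run;

/* Save the generated seed (savedSeed is a hypothetical name) */
%let savedSeed = &SYSRANDOM;
%put &=savedSeed;

/* Step 2: reuse the saved seed explicitly */
data _null_;
   call streaminit(&savedSeed);
   y = rand("uniform");
   put y=;
run;
```

If the assumption holds, the values of x and y in the log should match. In practice, you would copy the saved seed into your program as a literal so that later runs do not depend on the macro variable.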
A second method is to use the RAND function to generate a random integer between 1 and 2^{31}-1, which is the range of valid seed values for the Mersenne twister generator in SAS 9.4m4.
The following program generates a random seed value:
data _null_;
   call streaminit(0);
   seed = ceil( (2**31 - 1)*rand("uniform") );
   put seed=;
run;

seed=1734176512
Both of these methods will generate a seed for you. However, the randomly generated seed does not provide any benefit. For a modern, high-quality, pseudorandom number generator, the stream should have good statistical properties regardless of the seed value. Using a random seed value does not make a stream “more random” than a seed that is easier to type.