This post was kindly contributed by SAS ANALYSIS - go there to comment and to read the full post. |
There is an interesting question —
There are 5 different types of toys at a KFC restaurant. If you go there, you will get one toy randomly. How many times do you need to go to KFC in order to get all 5 toys?
The question is about probabilistic analysis. Different professionals, such as a business analyst, a statistical programmer, a mathematician and a software developer, will have different thinking pathway to solve this problem. Let’s see what they would think.
1. Business Analyst
A business analyst will tend to do scenario analysis at the first step.
Best-case scenario:
Assume I am so lucky that each time I visit KFC and get a different toy, then I only need 5 times to have all the five toys. The minimum number is 5.
Worst-case scenario:
To get a different toy I need to go to KFC five times. If there is not the toy I want, I will come back with empty hand. Thus, I will have to go to KFC 5*5 = 25 times. Of course, this scenario never happens.
OK. Then the mean times I need to go to KFC seems to be in a range [5, 25). Let’s give the simplest try — use the (5+25)/2 to get 15 times. The number is not accurate but at least we have an estimate.
2. Statistical Programmer
As a brute-force tool, simulation is the instant thought for a statistical programmer. Let the computer randomly create 10,000 trials — say a person plays the game 10,000 times. After averaging the results, the computer eventually tells the expected times to get the 5 toys.
I modified the SAS code from a post by Rick Wicklin, and set the maximum number of visits to KFC as 32. After 10,000 runs, the mean is 11.37. The hunch tells that this number should be quite close.
************(1)Simulate 10000 trials**************************;
proc iml;
K = 5; /* number of toys */
L = 32; /* max visits per trial */
/* generate NSim trials of L visits */
NSim = 10000;
x = j(Nsim, L);
call randseed(12345);
call randgen(x, "Uniform");
x = ceil(K*x); /* integers in [1,K] */
/* record the visit when 5 toys are taken */
c = j(NSim,1,0); /** allocate */
do i = 1 to NSim;
do j = 5 to L;
rowUnique = countunique(x[i, 1:j]);
if rowUnique = 5 then do;
c[i, 1] = j;
goto skip;
end;
end;
skip:
end;
/* output the result */
create _1 from c;
append from c;
close _1;
;quit;
data _2;
set _1;
/* remove the trials that didn't get 5 toys in 32 visits */
where col1 >= 5;
run;
************(2)Show the results*******************************;
proc sgplot;
density col1;
density col1 / type = kernel;
run;
proc means;
run;
3. Mathematician
Actually this question is a variant of Coupon collector’s problem. The mean/expectation and standard deviation can be derived directly by the formulas. The final expectation should be 5*(1+1/2+1/3+1/4+1/5) = 11.41. This is the answer.
4. Software Developer
A software developer considers time complexity and space complexity first. When N is approaching infinity, the question is similar to a merge sorting. Given a merge sort is O(nlog(n)), the expected times must be greater than 5*ln(5) = 8.05. At least this number will be a lower bound for this question.
This post was kindly contributed by SAS ANALYSIS - go there to comment and to read the full post. |