How to compute p-values for a bootstrap distribution
I was recently asked the following question:
I am using bootstrap simulations to compute critical values for a statistical test. Suppose I have a test statistic for which I want a p-value. How do I compute this?
The answer to this question doesn’t require knowing anything about bootstrap methods. An equivalent formulation (for a one-sided p-value) is “How do I count the number of values in a vector that are greater than or equal to a given value?” For p-values, you assume that the vector contains random values sampled from the null distribution.
You can find a fully worked example of a bootstrap computation for a paired t test on pages 11–14 of my SAS Global Forum paper, “Rediscovering SAS/IML Software: Modern Data Analysis for the Practicing Statistician.” The empirical p-value is computed on page 14.
Here’s one way to think about this problem. Suppose a vector, s, contains random values from the null distribution. In a bootstrap situation, this means that s1, s2, …, sN are the bootstrapped statistics, where si is the statistic computed on the i-th bootstrap sample, and where each bootstrap sample is sampled from the null distribution (that is, according to the null hypothesis). Let s0 be the observed value of the test statistic. Then a one-sided empirical p-value for s0 is computed as follows:
- The simplest computation is to apply the definition of a p-value. To do this, count the number of values (statistics) that are greater than or equal to the observed value, and divide by the number of values. In code, pval = sum(s >= s0)/N;
- The previous formula has a bias due to finite sampling. In particular, it returns a p-value of exactly 0 whenever none of the N values reaches s0. Some authors suggest the modification pval = (1+sum(s >= s0))/(N+1); which can never be smaller than 1/(N+1). For example, see Davison and Hinkley (1997), Bootstrap Methods and their Application, p. 141. The two formulas are essentially the same when the number of values, N, is large.
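To make the two formulas concrete, here is a minimal sketch in Python. The formulas above are written in SAS/IML syntax, so this is a translation rather than the code from the paper. The function names pvalue_simple and pvalue_adjusted and the ten-element vector are made up for illustration.

```python
def pvalue_simple(s, s0):
    """Proportion of values in s that are >= s0: sum(s >= s0)/N."""
    return sum(1 for x in s if x >= s0) / len(s)


def pvalue_adjusted(s, s0):
    """Adjusted estimate: (1 + sum(s >= s0))/(N + 1)."""
    return (1 + sum(1 for x in s if x >= s0)) / (len(s) + 1)


# Hypothetical vector standing in for N = 10 statistics from the null distribution
s = [0.1, 0.5, 0.9, 1.3, 1.7, 2.1, 2.5, 2.9, 3.3, 3.7]
s0 = 2.5  # hypothetical observed statistic

print(pvalue_simple(s, s0))    # 4 of the 10 values are >= 2.5, so 4/10
print(pvalue_adjusted(s, s0))  # (1 + 4)/(10 + 1)
```

With only ten values the two estimates differ noticeably (4/10 versus 5/11). If s0 were larger than every value in s, the simple formula would return 0, whereas the adjusted formula would return 1/11.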
See also my article on computing empirical estimates from the data.
Incidentally, if you’d like to run the bootstrap computations yourself, you can download the airlines data that I used in my SAS Global Forum paper.
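If you don't have the data at hand, the whole computation can be sketched on made-up numbers. The Python example below is an illustration under stated assumptions, not the SAS/IML program from the paper: the paired differences are hypothetical, the test statistic is assumed to be the mean difference, and the null hypothesis (mean difference of zero) is imposed by centering the differences before resampling, which is one common approach.

```python
import random
from statistics import mean

# Hypothetical paired differences (made-up numbers, not the airlines data)
diffs = [1.2, -0.4, 2.1, 0.8, -1.0, 1.5, 0.3, 2.4, -0.2, 0.9]

s0 = mean(diffs)  # observed test statistic: the mean difference

# Impose the null hypothesis (mean difference = 0) by centering the data
centered = [d - s0 for d in diffs]

rng = random.Random(12345)  # fixed seed so that the run is repeatable
N = 5000                    # number of bootstrap samples

# s[i] is the statistic computed on the i-th bootstrap sample, where each
# sample is drawn with replacement from the centered differences
s = [mean(rng.choices(centered, k=len(centered))) for _ in range(N)]

# One-sided empirical p-value, using the (1 + count)/(N + 1) adjustment
count = sum(1 for si in s if si >= s0)
pval = (1 + count) / (N + 1)
print(f"s0 = {s0:.3f}, p-value = {pval:.4f}")
```

Only the last three lines depend on the p-value formula. Everything before them is just one way to fill the vector s with statistics sampled under the null hypothesis.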