How to compute p-values for a bootstrap distribution

November 2, 2011
This post was kindly contributed by The DO Loop - go there to comment and to read the full post.

I was recently asked the following question:

I am using bootstrap simulations to compute critical values for a statistical test. Suppose I have a test statistic for which I want a p-value. How do I compute this?

The answer to this question doesn’t require knowing anything about bootstrap methods. An equivalent formulation (for a one-sided p-value) is “How do I count the number of values in a vector that are greater than a given value?” For p-values, you assume that the vector contains random values sampled from the null distribution.

You can find a fully worked example of a bootstrap computation for a paired t test on pages 11–14 of my SAS Global Forum paper, “Rediscovering SAS/IML Software: Modern Data Analysis for the Practicing Statistician.” The empirical p-value is computed on page 14.
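For readers who do not have the paper at hand, the following is a minimal sketch of how a vector of bootstrap statistics under the null hypothesis might be produced for paired data. It is not the code from the paper, and it is written in Python rather than SAS/IML. It assumes one common approach: shift the paired differences so that their mean is zero (which imposes the null hypothesis), then resample with replacement and compute a t statistic on each resample. The function names and the data values are hypothetical, chosen only for illustration.

```python
import math
import random
import statistics

def t_statistic(d):
    """One-sample t statistic for H0: mean(d) = 0."""
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))

def bootstrap_null_statistics(diffs, num_samples, seed=1):
    """Bootstrap t statistics computed from mean-centered differences.

    Centering makes the resampled population satisfy H0 (mean difference 0).
    This is an assumed approach, not necessarily the one used in the paper.
    """
    rng = random.Random(seed)
    mean_d = statistics.mean(diffs)
    centered = [d - mean_d for d in diffs]
    n = len(centered)
    stats = []
    while len(stats) < num_samples:
        resample = rng.choices(centered, k=n)   # sample with replacement
        if statistics.stdev(resample) == 0:     # skip degenerate resamples
            continue
        stats.append(t_statistic(resample))
    return stats

# Made-up paired differences, for illustration only (not the airlines data)
diffs = [1.2, -0.4, 0.8, 2.1, 0.3, -1.0, 1.5, 0.9, 0.2, 1.7]
s0 = t_statistic(diffs)                               # observed statistic
s = bootstrap_null_statistics(diffs, num_samples=2000)  # null statistics
```

Here `s` plays the role of the vector of values sampled from the null distribution, and `s0` is the observed test statistic.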

Here’s one way to think about this problem. Suppose a vector, s, contains random values from the null distribution. In a bootstrap situation, this means that s_1, s_2, …, s_N are the bootstrapped statistics, where s_i is the statistic computed on the i-th bootstrap sample, and where each bootstrap sample is sampled from the null distribution (that is, according to the null hypothesis). Let s0 be the observed value of the test statistic. Then a one-sided empirical p-value for s0 is computed as follows:

  • The simplest computation is to apply the definition of a p-value. To do this, count the number of bootstrap statistics that are greater than or equal to the observed value, and divide by the number of statistics. In code, pval = sum(s >= s0)/N;
  • The previous formula is biased because N is finite. Some authors suggest the modification
    pval = (1+sum(s >= s0))/(N+1);
    For example, see Davison and Hinkley (1997), Bootstrap Methods and their Application, p. 141. The two formulas differ by at most 1/(N+1), so they are essentially the same when the number of values, N, is large.
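The two formulas can be tried side by side on a small example. The sketch below is a Python translation of the SAS/IML expressions above, not code from the original post. The function name `empirical_pvalue`, its `adjusted` argument, and the ten values in `s` are hypothetical, chosen so that the count is easy to verify by hand.

```python
def empirical_pvalue(s, s0, adjusted=False):
    """One-sided empirical p-value for the observed statistic s0.

    s is assumed to hold statistics sampled from the null distribution.
    adjusted=False: sum(s >= s0) / N
    adjusted=True:  (1 + sum(s >= s0)) / (N + 1)
    """
    n = len(s)
    count = sum(1 for x in s if x >= s0)   # values at least as extreme as s0
    if adjusted:
        return (1 + count) / (n + 1)
    return count / n

# Hypothetical null statistics, for illustration only
s = [0.1, -1.3, 2.4, 0.7, -0.2, 1.9, -0.8, 0.4, 3.1, -1.6]
s0 = 1.9

print(empirical_pvalue(s, s0))                 # 3 of the 10 values are >= 1.9
print(empirical_pvalue(s, s0, adjusted=True))  # (1 + 3) / (10 + 1)
```

One practical difference is that the adjusted formula never returns exactly zero: if no value in s reaches s0, it returns 1/(N+1), whereas the simple formula returns 0.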

See also my article on computing empirical estimates from the data.

Incidentally, if you’d like to run the bootstrap computations yourself, you can download the airlines data that I used in my SAS Global Forum paper.

tags: Bootstrap and Resampling
