Type I error rates in test of normality by simulation

This post was kindly contributed by Heuristic Andrew - go there to comment and to read the full post.

This simulation tests the type I error rates of the Shapiro-Wilk test of normality in R and SAS.

First, we run a simulation in R. Notice the simulation is vectorized: there are no “for” loops that clutter the code and slow the simulation.


# type I error
alpha <- 0.05

# number of simulations
n.simulations <- 10000

# number of observations in each simulation 
n.obs <- 100

# a vector of test results
type.one.error <- replicate(n.simulations, 
    shapiro.test(rnorm(n.obs))$p.value)<alpha

# type I error for the whole simulation
mean(type.one.error)

# Store cumulative results in data frame for plotting 
sim <- data.frame(
    n.simulations = 1:n.simulations,
    type.one.error.rate = cumsum(type.one.error) / 
        seq_along(type.one.error))

# plot type I error as function of the number of simulations 
plot(sim, xlab="number of simulations", 
    ylab="cumulative type I error rate")

# a line for the true error rate
abline(h=alpha, col="red")

# alternative plot using ggplot2
require(ggplot2)
ggplot(sim, aes(x=n.simulations, y=type.one.error.rate)) + 
    geom_line() + 
    xlab('number of simulations') +
    ylab('cumulative type I error rate') + 
    ggtitle('Simulation of type I error in Shapiro-Wilk test') +
    geom_abline(intercept = 0.05, slope=0, col='red') +
    theme_bw()

As the number of simulations increases, the type I error rate approaches alpha. Try it in R with any value of alpha and any number of observations per simulation.

It’s elegant the whole simulation can be condensed to 60 characters:


mean(replicate(10000,shapiro.test(rnorm(100))$p.value)<0.05)

Likewise, we now do a similar simulation of the Shapiro-Wilk test in SAS. Notice there are no macro loops: the simulation is simpler and faster using a BY statement.


data normal;
 length simulation 4 i 3; /* save space and time */
 do simulation = 1 to 10000;
  do i = 1 to 100;
   x = rand('normal');
   output;
  end;
 end;
run;

proc univariate data=normal noprint ;
 by simulation;
 var x;
 output out=univariate n=n mean=mean std=std NormalTest=NormalTest probn=probn;
run;

data univariate;
 set univariate;
 type_one_error = probn<0.05;
run;

/* Summarize the type I error rates for this simulation */
proc freq data=univariate;
 table type_one_error/nocum;
run;

In my SAS simulation the type I error rate was 5.21%.

Tested with R 3.0.2 and SAS 9.3 on Windows 7.

This post was kindly contributed by Heuristic Andrew - go there to comment and to read the full post.

Post navigation