Michele Ensor recently posted a wonderful blog with a graph of the 2014 Winter Olympics medal count. I’m going to further refine that graph, making it an Olympic graph … on steroids! 🙂 Here is Michele’s graph: First, let’s give it a few simple cosmetic changes. I always like to have […]
Category: SAS
Type I error rates in test of normality by simulation
This simulation estimates the type I error rate of the Shapiro-Wilk test of normality in R and SAS.
First, we run a simulation in R. Notice the simulation is vectorized: there are no “for” loops that clutter the code and slow the simulation.
# type I error
alpha <- 0.05
# number of simulations
n.simulations <- 10000
# number of observations in each simulation
n.obs <- 100
# a vector of test results
type.one.error <- replicate(n.simulations, shapiro.test(rnorm(n.obs))$p.value) < alpha
# type I error for the whole simulation
mean(type.one.error)
# Store cumulative results in data frame for plotting
sim <- data.frame(
    n.simulations = 1:n.simulations,
    type.one.error.rate = cumsum(type.one.error) /
        seq_along(type.one.error))
# plot type I error as function of the number of simulations
plot(sim, xlab="number of simulations",
     ylab="cumulative type I error rate")
# a line for the true error rate
abline(h=alpha, col="red")
# alternative plot using ggplot2
require(ggplot2)
ggplot(sim, aes(x=n.simulations, y=type.one.error.rate)) +
    geom_line() +
    xlab('number of simulations') +
    ylab('cumulative type I error rate') +
    ggtitle('Simulation of type I error in Shapiro-Wilk test') +
    geom_abline(intercept = 0.05, slope=0, col='red') +
    theme_bw()
As the number of simulations increases, the type I error rate approaches alpha. Try it in R with any value of alpha and any number of observations per simulation.
It’s elegant that the whole simulation can be condensed to 60 characters:
mean(replicate(10000,shapiro.test(rnorm(100))$p.value)<0.05)
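To make it easier to try other values of alpha and other sample sizes, the same simulation can be wrapped in a small function. This is only a sketch: the function name shapiro.type.one.error, the seed, and the grid of settings are my own assumptions for illustration and are not part of the original simulation.

```r
# Hypothetical helper (an assumption, not from the original post):
# estimate the type I error rate of the Shapiro-Wilk test by simulation.
shapiro.type.one.error <- function(n.simulations = 10000, n.obs = 100, alpha = 0.05) {
    # shapiro.test() only accepts between 3 and 5000 observations
    stopifnot(n.obs >= 3, n.obs <= 5000)
    p.values <- replicate(n.simulations, shapiro.test(rnorm(n.obs))$p.value)
    # proportion of truly normal samples that the test rejects
    mean(p.values < alpha)
}

set.seed(1)  # arbitrary seed, only so the run is reproducible

# a small grid of settings; each estimate should land near its own alpha
settings <- expand.grid(alpha = c(0.01, 0.05, 0.10), n.obs = c(20, 100))
settings$estimate <- mapply(
    function(a, n) shapiro.type.one.error(n.simulations = 2000, n.obs = n, alpha = a),
    settings$alpha, settings$n.obs)
print(settings)
```

I used fewer simulations per setting here (2000) to keep the run short, so the estimates will be noisier than in the main simulation.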
Likewise, we now do a similar simulation of the Shapiro-Wilk test in SAS. Notice there are no macro loops: the simulation is simpler and faster using a BY statement.
data normal;
    length simulation 4 i 3; /* save space and time */
    do simulation = 1 to 10000;
        do i = 1 to 100;
            x = rand('normal');
            output;
        end;
    end;
run;
proc univariate data=normal noprint;
    by simulation;
    var x;
    output out=univariate n=n mean=mean std=std NormalTest=NormalTest probn=probn;
run;
data univariate;
    set univariate;
    /* flag simulations where the test rejects normality at alpha = 0.05 */
    type_one_error = probn<0.05;
run;
/* Summarize the type I error rates for this simulation */
proc freq data=univariate;
    table type_one_error/nocum;
run;
In my SAS simulation the type I error rate was 5.21%.
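Is 5.21% close enough to 5%? One rough way to judge is the Monte Carlo standard error of a proportion estimated from 10000 independent simulations. The sketch below assumes the true rate equals alpha and uses the usual binomial approximation; the variable names are my own.

```r
# Rough check, assuming each simulation is an independent Bernoulli trial
# with true rejection probability alpha.
alpha <- 0.05
n.simulations <- 10000
observed <- 0.0521   # rate reported for the SAS run

# standard error of the estimated proportion under the null
se <- sqrt(alpha * (1 - alpha) / n.simulations)

# distance of the observed rate from alpha, in standard errors
z <- (observed - alpha) / se

round(se, 4)   # about 0.0022
round(z, 2)    # about 0.96
```

The observed rate is less than one standard error from 5%, which is what we would expect from simulation noise alone.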
Tested with R 3.0.2 and SAS 9.3 on Windows 7.
For more posts like this, see Heuristic Andrew.
Applying conditional highlighting to a SAS Enterprise Guide report
Here’s my latest tip on how to apply conditional highlighting to a SAS Enterprise Guide report using the Summary Tables task. The Summary Tables task is a great way to point and click your way to creating simple or complex reports. Conditional highlighting is just one additional feature you can […]
Job Trends in the Analytics Market: New, Improved, now Fortified with C, Java, MATLAB, Python, Julia and Many More!
I’m expanding the coverage of my article, The Popularity of Data Analysis Software. This is the first installment, which includes a new opening and a greatly expanded analysis of the analytics job market. Here it is, from the abstract onward …
Way to go Team USA, 28 Olympic medals!
I must admit, I’m glad the 2014 Winter Olympics are over. I’ve spent too many night-time hours glued to the TV. I can now get on with my life. But first, let’s have a little SAS fun with the medal standings. How do you prefer your summary of medal standings […]
Building a SAS Stored Process Log
As a SAS stored process developer, a question sometimes pokes its way into my head: “Are people using the stored processes I write?” In fact, really I have four questions:
What stored processes are being used?
Who is using them?
When are they being used?
How are they using them?
I realized what I’ve been missing. I need a SAS stored process log.
If It Works for a Macro…
As a macro programmer …
Post Building a SAS Stored Process Log appeared first on BI Notes for SAS® Users. Go to BI Notes for SAS® Users to subscribe.
Fencing in your SAS users with LOCKDOWN
SAS administrators now have another tool to keep SAS users from straying off their permitted path: the LOCKDOWN system option. The option was introduced in “stealth mode” for SAS 9.4. In SAS 9.4M1, it became a true, documented option. For the official guide to creating “locked-down servers”, see the SAS […]
Let’s shed some light on the black box containing Neural Net model weights
One of the primary predictive modeling tools capable of fitting very complex nonlinear functions is Neural Networks (NN) in SAS advanced analytics software SAS Enterprise Miner (EM). The default option in NN EM node uses a Multilayer Perceptron model w…

