To attach() or not attach(): that is the question

This post was kindly contributed by SAS and R - go there to comment and to read the full post.

R objects that reside in other R objects can require a lot of typing to access. For example, to refer to a variable x in a dataframe df, one could type df$x. This is no problem when the dataframe and variable names are short, but can become burdensome when longer names or repeated references are required, or objects in complicated structures must be accessed.

The attach() function in R can be used to make objects within dataframes accessible in R with fewer keystrokes. As an example:


ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
names(ds)
attach(ds)
mean(cesd)
[1] 32.84768

The search() function can be used to list attached objects and packages. Let’s see what is there, then detach() the dataset to clean up after ourselves.


search()
[1] ".GlobalEnv" "ds" "tools:RGUI" "package:stats"
[5] "package:graphics" "package:grDevices" "package:utils" "package:datasets"
[9] "package:methods" "Autoloads" "package:base"
detach(ds)

As noted in section B.4.5, if a variable called cesd already exists in the workspace, then after issuing attach(ds) the name cesd may not refer to ds$cesd, because by default the workspace is searched before an attached dataframe. Name conflicts of this type are a common problem with attach(), and care should be taken to avoid them.
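A minimal sketch of such a name conflict follows. It uses a hypothetical three-row dataframe in place of the HELP data, with values invented purely for illustration.

```r
# Toy stand-in for the real dataset; the values are made up for illustration.
ds <- data.frame(cesd = c(10, 20, 30))

# A workspace variable that happens to share the column's name.
cesd <- c(1, 2, 3)

# attach() reports that cesd is masked by .GlobalEnv.
attach(ds)

workspace_mean <- mean(cesd)     # resolves to the workspace vector
dataframe_mean <- mean(ds$cesd)  # explicit reference to the column

detach(ds)  # clean up the search path

print(c(workspace = workspace_mean, dataframe = dataframe_mean))
```

Here mean(cesd) is computed from the workspace vector (giving 2) rather than from the column in ds (which gives 20), even though ds is attached.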

The help page for attach() notes that attach can lead to confusion. The Google R Style Manual gives clear advice on this point:

The possibilities for creating errors when using attach are numerous. Avoid it.

After being burned by this one too many times, we concur.

So what options exist for those who decide to go cold turkey?

  1. Reference variables directly (e.g. lm(ds$y ~ ds$x))
  2. Specify the dataframe for commands which support this (e.g. lm(y ~ x, data=ds))
  3. Use the with() function, which returns the value of whatever expression is evaluated (e.g. with(ds, lm(y ~ x)))
  4. Use the within() function, which is similar to with() but returns a modified copy of the object

Some examples may be helpful.


> # fit a linear model
> lm1 = lm(cesd ~ pcs, data=ds)

> mean(ds$cesd[ds$female==1]) # these next three are equivalent
[1] 36.88785
> with(ds, mean(cesd[female==1]))
[1] 36.88785
> with(subset(ds, female==1), mean(cesd))
[1] 36.88785
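The examples above do not exercise within(), so here is a small sketch contrasting it with with(). The dataframe toy and its columns are hypothetical and are not part of the HELP dataset.

```r
# Hypothetical toy data; not the HELP dataset used above.
toy <- data.frame(x = c(1, 2, 3, 4), y = c(2, 4, 6, 8))

# with() evaluates the expression using toy's columns and returns its value.
mean_y <- with(toy, mean(y[x > 2]))

# within() evaluates the assignments and returns a modified copy of toy.
toy2 <- within(toy, {
  ratio <- y / x
})

print(mean_y)
print(names(toy2))
print(names(toy))  # the original dataframe is unchanged
```

Neither call touches the search path, so there is nothing to detach() afterwards.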

In short, there’s never an actual need to use attach(); using it can lead to confusion or errors, and alternatives exist that avoid the problems. We recommend against it.

In SAS, all procedures use the most recent data set or must reference a data set explicitly. Very roughly speaking, using attach() in R is like relying on the implicit use of the most recent data set. Our recommendation against attach() thus mirrors our use of the data= option throughout our books.
