R scoping, assignment, confusion

In R, lexical variables are implicitly introduced by assignment. It also appears that lexical variables default to the value they had in the enclosing scope.

> b = 2
> g <- function () { print(b); b <- 3; print(b) }
> b
[1] 2
> g()
[1] 2
[1] 3
> b
[1] 2
>

It’s unclear from that example whether the first print of b in the function refers to the outer or the inner b.
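One way to probe which b the first print sees is to ask R whether a local b exists before and after the assignment. The sketch below is only an illustration, not part of the session above; the helper name `probe` and the fields of its result are made up here.

```r
b <- 2

# probe() is a hypothetical helper that mirrors g() from the session above.
probe <- function() {
  local_env <- environment()  # this call's own frame
  had_local_before <- exists("b", envir = local_env, inherits = FALSE)
  seen_first <- b             # no local b yet, so lookup falls through to the enclosing scope
  b <- 3                      # the assignment creates a new local b
  had_local_after <- exists("b", envir = local_env, inherits = FALSE)
  list(had_local_before = had_local_before,
       seen_first = seen_first,
       had_local_after = had_local_after,
       seen_second = b)
}

result <- probe()
str(result)
```

If this behaves as expected, had_local_before is FALSE and seen_first is 2: the first read resolves to the outer b, and only the assignment brings the inner b into existence.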

But this example makes my brain hurt. Here we have x, whose value is a matrix. When we assign to even a single element of the matrix inside the function, it appears that R copies the entire matrix.

> x <- matrix(1:4,c(2,2))
> x
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> x[2,2] <- 22
> x
     [,1] [,2]
[1,]    1    3
[2,]    2   22
> f <- function (z) { x[1,1] <- z; c(x[1,1],x[2,2]) }
> f(11)
[1] 11 22
> x
     [,1] [,2]
[1,]    1    3
[2,]    2   22
>
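What seems to be happening is that the assignment inside f creates a local x, initialised from the outer one, rather than touching the outer matrix. Here is a minimal sketch of one common way to live with that: return the modified value and reassign it at the call site. `set_corner`, `untouched` and `changed` are hypothetical names, not something from the session above.

```r
x <- matrix(1:4, nrow = 2)

# set_corner() is a hypothetical helper: it modifies its own copy and returns it.
set_corner <- function(m, z) {
  m[1, 1] <- z   # changes the local m only
  m
}

y <- set_corner(x, 11)
untouched <- x[1, 1]    # the caller's x is not affected by the call
changed <- y[1, 1]      # the returned copy carries the change

x <- set_corner(x, 11)  # reassigning makes the update visible to the caller
```

This avoids side effects altogether, at the cost of making the caller responsible for storing the result.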

At this point it’s unclear how to write functions that have side effects on data structures. That certainly is a strange state to get into. The language reference manual talks about scopes, but it spends all its calories on explaining closures. If you read the help for the assignment operator, help("<-"), you discover there are actually five operators with two semantics.

> f <- function (z) { x[1,1] <<- z; c(x[1,1],x[2,2]) }
> f(11)
[1] 11 22
> x
     [,1] [,2]
[1,]   11    3
[2,]    2   22

Notice the “<<-” operator. It avoids creating anything in the innermost scope of the function. Instead it searches outward through the enclosing scopes for an existing variable to modify, and creates a fresh variable in the global scope if it finds none. The five operators are <-, ->, <<-, ->>, and =. The = looks like a vestigial organ; it’s identical to <- except for some severe limits on the contexts you can use it in.
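To make the outward search concrete, here is a small sketch that assumes nothing beyond base R. The names `make_counter`, `count`, `leak`, `fresh_global` and `ten` are invented for illustration.

```r
# A closure: count lives in make_counter()'s frame, not in the inner function.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # finds count one scope out and modifies it there
    count
  }
}

counter <- make_counter()
first <- counter()
second <- counter()

# No enclosing scope defines fresh_global, so <<- ends up creating it globally.
leak <- function() {
  fresh_global <<- "created by leak()"
  invisible(NULL)
}
leak()
made_global <- exists("fresh_global", envir = globalenv(), inherits = FALSE)

# The rightward forms are the same assignments written the other way round.
10 -> ten
```

As for the limits on =: inside a call it names an argument rather than assigning, so mean(x = 1:3) and mean(x <- 1:3) are not the same thing.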

I don’t think I quite have this all figured out yet. In this example we make a list of two elements, but we also give names to those elements.

> guys = list(who = c("tom", "john"), age=c(12, 14))
> guys
$who
[1] "tom" "john"

$age
[1] 12 14

> names(guys)
[1] "who" "age"

We can then “attach” to that data structure so we can use the names directly.

> attach(guys)
> who
[1] "tom" "john"
> age
[1] 12 14
> mean(age)
[1] 13

Something similar to the automatic copy of a lexical variable is going on here as well.

> age[1] <- 13
> age
[1] 13 14
> guys$age[1]
[1] 12
> detach(guys)
> guys
$who
[1] "tom" "john"

$age
[1] 12 14

That’s kind of neat. The attach creates a scratch workspace where you can tinker with a data structure without doing damage to your master copy.
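A closer look suggests the scratch copy is an ordinary variable: age[1] <- 13 reads the attached age and then writes a new age into the current workspace, where it masks the attached one. The sketch below checks that; `here`, `copy_exists_before`, `copy_exists_after`, `scratch` and `master` are names made up for the illustration.

```r
guys <- list(who = c("tom", "john"), age = c(12, 14))
here <- environment()   # the workspace this code runs in (the global one at top level)

attach(guys)
copy_exists_before <- exists("age", envir = here, inherits = FALSE)
age[1] <- 13            # reads the attached age, writes a new variable in `here`
copy_exists_after <- exists("age", envir = here, inherits = FALSE)
detach(guys)

scratch <- get("age", envir = here, inherits = FALSE)  # the copy outlives detach()
master <- guys$age                                     # the list itself never changed
rm("age", envir = here)                                # tidy up the leftover copy
```

Because the copy lingers after detach, it can be simpler to use with(guys, mean(age)), which evaluates an expression against the list without attaching anything.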

All this suggests to me that there might be a lot of lazy evaluation going on here. That’s plausible since we saw promises in the basic type suite.
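Lazy evaluation of function arguments, at least, is easy to observe. In this sketch (the function names `ignore` and `touch` and the flag `evaluated` are hypothetical), an argument expression with a side effect only runs if the function actually uses the argument.

```r
evaluated <- FALSE

ignore <- function(arg) "arg was never used"     # never forces its argument
touch  <- function(arg) { arg; "arg was used" }  # forces it by referring to it

ignore(evaluated <- TRUE)
after_ignore <- evaluated   # the assignment inside the argument never ran

touch(evaluated <- TRUE)
after_touch <- evaluated    # now it ran, in the caller's frame
```

The argument is wrapped in a promise and evaluated in the caller's frame the first time the function touches it, which is why the flag only flips after the second call.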

None of this surprises me; this is the kind of stuff that happens in a very domain-specific, special-purpose language. I just wish it were explained someplace in a way I could understand.

0 thoughts on “R scoping, assignment, confusion”

  1. Mark

    You should really try to get hold of a copy of John M. Chambers’s book “Programming with Data: A Guide to the S Language” as it would likely answer many of your insightful questions about how this language works. Alternatively, have a look at some of his on-line publications which might provide some useful information:

    http://cm.bell-labs.com/cm/ms/departments/sia/jmc/pub.html
    http://cm.bell-labs.com/cm/ms/departments/sia/Sbook/index.html

    In your first example, the first print of b in the function refers to the outer b: no inner b exists until the assignment b <- 3 creates one. The S/R language does indeed make extensive use of lazy evaluation. An argument in S/R is evaluated only when it is needed. It does matter where this evaluation takes place. In the case of a function body, the evaluation occurs in an ‘evaluation frame’ that S/R creates. Evaluation of function arguments takes place in the frame of the caller. Side effects are highly discouraged in S/R, even in places where they are possible, such as in function arguments. The reasoning for this seems to be that lazy evaluation means you don’t necessarily know when, or even if, the assignment takes place, which is especially dangerous in a language such as S/R.
