R

I’ve been trying to add R to the tool bench I keep in my noodle. I’ve heard very good things about it, but my first impressions were crummy. The tutorial is weak, there’s no wiki, no IRC channel, the help list is hosted at Yahoo, the book is a hundred bucks – mixed signals. And then, its community is full of very smart stats dudes, which is scary – they have special words for everything!

Finally I got around to reading the language reference manual. Oh boy! You can tell a lot about a language from the primitive types. In addition to the usual boring junk, notice these: closure, promise, language, special, environment, expression, weakref. It’s the guest list for a really good party!
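To poke at that guest list yourself, here is a small sketch using typeof(). The names f and lazy are made up for illustration. As far as I can tell, promises don't show up through typeof() in ordinary code, so the last two lines only show their effect: an argument that is never used is never evaluated.

```r
f <- function(x) x + 1      # hypothetical example function

# Ask R for the internal type of a few everyday objects.
types <- c(
  closure     = typeof(f),                  # a function plus its environment
  builtin     = typeof(sum),                # primitive that evaluates its arguments
  special     = typeof(`if`),               # primitive that does not
  language    = typeof(quote(a + b)),       # an unevaluated call
  symbol      = typeof(quote(a)),           # a bare name
  expression  = typeof(expression(a + b)),
  environment = typeof(globalenv())
)
print(types)

# Arguments are passed lazily: y is never touched, so stop() never runs.
lazy <- function(x, y) x
result <- lazy(1, stop("never evaluated"))
```

Each name in the vector is the type I expect typeof() to report for the value next to it.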

Great sentence: “All objects except NULL can have one or more attributes attached to them. Attributes are stored as a list where all elements are named.” You can build some really nice data structures given that, for example custom class systems. So when you want to have, oh, time series, you can slip them in without messing with the language kernel: “The tsp attribute is used to hold parameters of time series, start, end, and frequency. This construction is mainly used to handle series with periodic substructure such as monthly or quarterly data.”
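A rough sketch of what those two quotes look like at the console. The data, the "units" attribute, and the "celsius" class are all invented for illustration; attr(), attributes(), ts(), tsp(), and structure() are ordinary base R.

```r
# Hang an arbitrary named attribute on a plain numeric vector.
x <- c(3.1, 4.1, 5.9, 2.6)
attr(x, "units") <- "cm"        # "units" is a made-up attribute name
str(attributes(x))              # a named list with one element

# A quarterly series: the start, end, and frequency live in the tsp attribute.
q <- ts(c(10, 12, 9, 14, 11, 13, 10, 15), start = c(2020, 1), frequency = 4)
tsp(q)                          # start, end, frequency
typeof(q)                       # still a plain double vector underneath

# A tiny class system: the class is just another attribute,
# and print() dispatches on it.
temp <- structure(21.5, class = "celsius")   # hypothetical class
print.celsius <- function(x, ...) cat(unclass(x), "degrees C\n")
print(temp)
```

Nothing here touches the language itself. The time series and the toy class are both just vectors with extra named baggage.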

Then somebody is having fun.

> class(d2)
[1] "data.frame"
> typeof(d2)
[1] "list"
> attributes(attributes(d2))
$names
[1] "names" "class" "row.names"

or

> 0:1
[1] 0 1
> 0:10
 [1]  0  1  2  3  4  5  6  7  8  9 10
> 2*1:3
[1] 2 4 6
> 0:1*1:10
 [1]  0  2  0  4  0  6  0  8  0 10
>

Very amusing, I like a cheerful language.
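For anyone puzzling over the two transcripts, here is my reading of them as a small sketch. A data frame is a list with a few attributes bolted on, and the colon operator binds tighter than multiplication, after which the shorter vector is recycled to match the longer one. The scores data is made up, and building a data frame by hand like this is only for illustration, not how you would normally do it.

```r
# A data frame built by hand: a list plus row.names and class attributes.
scores <- list(id = 1:3, score = c(90, 85, 77))   # made-up data
attr(scores, "row.names") <- 1:3
class(scores) <- "data.frame"
is.data.frame(scores)           # TRUE
typeof(scores)                  # "list"
names(attributes(scores))

# ":" is evaluated before "*", so this is (0:1) * (1:10) ...
a <- 0:1 * 1:10
b <- (0:1) * (1:10)
identical(a, b)                 # TRUE

# ... and the two-element vector is recycled five times to reach length 10.
recycled <- rep(0:1, times = 5) * 1:10
identical(a, recycled)          # TRUE

# Parentheses change the meaning: this is 2:3, not 2 * (1:3).
(2 * 1):3
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.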

0 thoughts on “R”

  1. Mark

    Weak Tutorial?!? Book costing $100??? Huh?!! The R project has some of the most extensive online (and free) documentation of any open source project [http://cran.r-project.org/manuals.html]. There are *numerous* tutorials for all levels of experience [http://cran.r-project.org/other-docs.html] – many quite good. Yeah, some knowledge of statistics helps – after all, this is a tool developed by and for statisticians (with a heritage coming out of Bell Labs – the original S language). Prefer printed books about R (and S/S-plus) – despite their cost? There are many to choose from – I’d recommend “Programming with Data: A Guide to the S Language” if you really want to appreciate the language itself.

  2. Anton Tagunov

    Really amusing!
    But the link to “R” points back to this blog entry.
    Could you possibly supply some other link, where more info could be found?

  3. Bill Tozier

    After wrestling and being put off by the CRAN website for several weeks, I eventually had to subcontract to some folks from the University to work on a consulting project for us. As they were working I watched them, and it was all made clear. I think that those of us who aren’t using R in a classroom setting are just going to have to help each other.

    Oh, and one book is still practically required for a lot of the special cool packages like nnet, MASS, and trellis: Venables and Ripley, Modern Applied Statistics with S.

    It all comes down to how we were never taught statistics correctly in school, I guess. Kids these days have it easy….

  4. Ben Hyde

    Mark – Feel free to ignore my opinion.

    But, the first example involving statistics in the tutorial is “2*pt(-2.43, df = 13)”. The first actual bit of data analysis is something called fivenum (whose help talks about something called a hinge?) then we get a stem and leaf plot? The tutorial introduces the concept of a data.frame only as a side effect of introducing data.frame and never actually provides the reader with enough information to create a model in his head of how list, vector, array, matrix, and data.frame relate to each other. The concept of models doesn’t show up until section eleven, after defining your own binary operators, and we don’t get to lm until after we climb over a section that uses the word “homoscedastic.”

  5. Mark

    I assume you ran into ‘fivenum’ and ‘stem/tree’ in the “Introduction to R” document – which isn’t so much a tutorial as it is a quick overview of the capabilities of R. It does assume some background in modern statistics – R and S are closely associated with Exploratory Data Analysis (EDA) and Confirmatory Data Analysis – the above are common techniques in EDA. John Tukey’s classic 1977 book by the same title was a ground-breaking reference to a new way of thinking about and performing data analysis combining statistics and graphical visualization – there are a number of more recent books on this topic. See http://www.itl.nist.gov/div898/handbook/eda/eda.htm or Google for other useful references. [John Tukey worked at Bell Labs – hence the connection to S and subsequently R]. The other R “tutorials” are probably more approachable than this particular Introduction to R document. R is quite a powerful language in itself and one need not necessarily use its functions which are oriented toward EDA.
