
Getting strings into code in base R

I’m reasonably often asked how to take the value of a character string variable and use it as a variable name in, e.g., the survey package. This is the sort of quasiquotation that the tidyverse uses a lot. It’s needed much more often there because tidyverse functions take bare variable names as arguments, but sometimes you need it in base R as well.

I should first say that quasiquotation in base R should be a last resort. If you want to iterate over variables, you will usually be better off keeping those variables as elements of a list. R isn’t SAS or Stata; we have lists and function calls, and we don’t need to do everything with macros. However, sometimes you really do want to use the value of a character string as a variable name, especially in a model formula, and quasiquotation respects the structure of the formula better than string manipulation does.
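For comparison, here is a small sketch of the list-based approach, using the built-in mtcars data rather than anything from the survey example. A data frame is a list of columns, so we can loop over the columns themselves instead of over their names:

```r
# A data frame is a list of columns, so vapply() can iterate over the
# columns directly; no variable names need to be turned into code
slopes <- vapply(mtcars[c("wt", "hp")],
                 function(x) unname(coef(lm(mtcars$mpg ~ x))[2]),
                 numeric(1))
slopes
```

The catch is that every fitted model has the formula mtcars$mpg ~ x, which is fine for a quick loop but not for a model you want to read or report. That is where putting the name into the formula itself becomes worthwhile.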

Suppose we have a variable v whose value is the string "sch.wide", and we want to do

svyglm(api00~api99+sch.wide, design=dclus2)

That is, we want to take the value of v and put it into a model formula api00~api99+the_value_of_v.

You can do this with paste:

suppressMessages(library(survey))
data(api)
dclus2<-svydesign(id=~dnum+snum, fpc=~fpc1+fpc2, data=apiclus2)
v<-"sch.wide"
f <- as.formula(paste("api00~api99",v,sep="+"))
f
## api00 ~ api99 + sch.wide
model<-svyglm(f, design=dclus2)
model
## 2 - level Cluster Sampling design
## With (40, 126) clusters.
## svydesign(id = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)
## 
## Call:  svyglm(formula = f, design = dclus2)
## 
## Coefficients:
## (Intercept)        api99  sch.wideYes  
##      36.897        0.928       46.990  
## 
## Degrees of Freedom: 125 Total (i.e. Null);  37 Residual
## Null Deviance:       2340000 
## Residual Deviance: 33890     AIC: 1119

There are two problems with this. First, it’s ugly and fragile, because paste doesn’t understand model formulas; imagine building one with transformations and interactions. Second, the call stored in the model records formula = f rather than api00~api99+sch.wide, so the printed output doesn’t show which model was actually fitted.

The better way to do it is with bquote, which quotes an expression except at the places you mark, where it evaluates the marked part first and substitutes the result.

bquote(1+2)
## 1 + 2
two<-2
bquote(1+two)
## 1 + two
bquote(1+.(two))
## 1 + 2
eval(bquote(1+.(two)))
## [1] 3

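Anything wrapped in the .() operator is evaluated, and its value replaces the .(), much like an inline code chunk in R Markdown. Because v holds a string, we wrap it in as.name() so that the symbol sch.wide, not the string "sch.wide", ends up in the formula.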
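To see why the conversion to a symbol matters, and how bquote copes with the transformations that make paste awkward, here is a sketch using the built-in mtcars data (the dataset and variable names are just for illustration, not from the survey example):

```r
v <- "hp"

# .(v) inserts the character string itself, not a variable
bquote(mpg ~ cyl + .(v))
## mpg ~ cyl + "hp"

# .(as.name(v)) inserts the symbol hp
bquote(mpg ~ cyl + .(as.name(v)))
## mpg ~ cyl + hp

# Transformations and interactions are written as ordinary R code;
# only the variable name is substituted, with no string assembly
f <- eval(bquote(mpg ~ cyl * log(.(as.name(v)))))
fit <- lm(f, data = mtcars)
```

With paste, the same formula would need "log(" and ")" glued around the name by hand.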

bquote(svyglm(api00~api99+.(as.name(v)), design=dclus2))
## svyglm(api00 ~ api99 + sch.wide, design = dclus2)
model<-eval(bquote(svyglm(api00~api99+.(as.name(v)), design=dclus2)))
model
## 2 - level Cluster Sampling design
## With (40, 126) clusters.
## svydesign(id = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)
## 
## Call:  svyglm(formula = api00 ~ api99 + sch.wide, design = dclus2)
## 
## Coefficients:
## (Intercept)        api99  sch.wideYes  
##      36.897        0.928       46.990  
## 
## Degrees of Freedom: 125 Total (i.e. Null);  37 Residual
## Null Deviance:       2340000 
## Residual Deviance: 33890     AIC: 1119

And it works!!
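The same pattern extends to several variables. The following is a sketch, again with lm() and mtcars standing in for svyglm() and the survey data, and with an illustrative covariates vector. It fits one model per covariate and then one model with all of them, building the right-hand side by joining the symbols with + before splicing it into the call:

```r
covariates <- c("wt", "hp", "disp")

# One model per covariate; each stored call shows the actual formula
models <- lapply(covariates, function(v)
  eval(bquote(lm(mpg ~ cyl + .(as.name(v)), data = mtcars))))
models[[1]]$call
## lm(formula = mpg ~ cyl + wt, data = mtcars)

# One model with everything: fold the symbols together with `+`
rhs <- Reduce(function(a, b) bquote(.(a) + .(b)),
              lapply(c("cyl", covariates), as.name))
bquote(mpg ~ .(rhs))
## mpg ~ cyl + wt + hp + disp
fit_all <- eval(bquote(lm(mpg ~ .(rhs), data = mtcars)))
```

Because the formula is built as a call rather than a string, fit_all$call also prints the full formula instead of a placeholder variable.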

bquote now also has an unquote-and-splice operator, ..() (the equivalent of !!! in the tidyverse), which is enabled with splice = TRUE. It also has a where argument that lets you choose the environment in which the .() terms are evaluated.
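A brief sketch of both features, assuming R 4.0.0 or later (where splice was added); the column names and the env environment are only illustrative:

```r
# With splice = TRUE, ..() inserts each element of a list as its own argument
cols <- lapply(c("mpg", "wt"), as.name)
bquote(cbind(..(cols)), splice = TRUE)
## cbind(mpg, wt)

# where = chooses the environment in which the .() terms are evaluated
env <- new.env()
env$v <- "hp"
bquote(mpg ~ .(as.name(v)), where = env)
## mpg ~ hp
```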