Author Archives: mar36

Comparing Stan to JAGS for Bayesian Inference (Part 1?)

Stan is a new, open source, Bayesian inference tool. Stan is based on the the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. In order to compare Stan with JAGS, a gibbs sampler approach to Bayesian inference, I used the classic WinBUGS rats example. The rats example uses a hierarchical model to look at the growth curves of 30 rats.

In this example, I used R with RStan and R2jags packages. I used the code on the RStan Getting Started and the examples from the JAGS sourceforge page as starting points. I used a very large number of samples to test run times. Data and model files needed are below.

rats.csv

rats.bug

rats.txt

rats.stan

Both RStan and R2jags use the same list structure for the model data. In in R2jags, you need to specify which variables you wish to monitor, while RStan defaults to all of the variables. Both require an external model file and have very similar commands to fit the model. So, within R, both packages are very similar.

When you run the code, you will note that JAGS starts the sampling immediately, while Stan requires a lengthy compilation step. You will also see that I used two different calls to “stan”.

 stan.fit <- stan(file = 'rats.stan', data = rats_dat, verbose = FALSE, warmup=1000,iter = 11000, n_chains = 3) stan.fit2 <- stan(fit=stan.fit, data = rats_dat, verbose = FALSE, warmup=1000,iter = 11000, n_chains = 3) 

The first compiles the model and runs the sampler, while the second uses the model from the first command. This eliminates the need to recompile the model, saving a good chunk of time. Notice the first argument has changed from “file=” to “fit=”. I also discovered that adding the “thin” option to either “stan” call creates a segfault, which I have reported the the Stan maintainers.

The answers that each program produced were identical (to the first decimal, at least). Both gave means, standard deviations, and key quantiles of the posterior distributions of the parameters, as well as R-hat, a measure of convergence you would like close to 1. The plots are slightly different, with RStan color coding the R-hat value vs. reporting it for R2jags. The output from RStan does not include a DIC value, at least by default.

R2jags Output

Inference for Bugs model at "rats.bug", fit using jags, 3 chains, each with 10000 iterations (first 1000 discarded), n.thin = 9 n.sims = 3000 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat alpha.c 242.390 3.712 237.001 240.654 242.431 244.237 247.839 1.129 alpha0 106.316 4.508 99.444 103.915 106.331 108.864 113.246 1.043 beta.c 6.185 0.109 5.968 6.116 6.185 6.258 6.394 1.002 sigma 6.136 1.182 5.266 5.776 6.065 6.395 7.088 1.001 tau.alpha 0.005 0.001 0.003 0.004 0.005 0.006 0.008 1.001 tau.beta 4.137 1.538 1.904 3.073 3.902 4.904 7.857 1.001 tau.c 0.027 0.004 0.020 0.024 0.027 0.030 0.036 1.001 deviance 967.610 23.168 941.121 957.316 966.043 976.319 998.467 1.001 n.eff alpha.c 3000 alpha0 3000 beta.c 1400 sigma 3000 tau.alpha 3000 tau.beta 3000 tau.c 3000 deviance 3000 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 268.5 and DIC = 1236.1 DIC is an estimate of expected predictive error (lower deviance is better). 

plot of chunk unnamed-chunk-1

RStan output

Inference for Stan model: anon_model. 3 chains: each with iter=11000; warmup=1000; thin=1; 11000 iterations saved. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff alpha[1] 239.9 0.0 2.7 234.6 238.1 239.9 241.7 245.1 15672 alpha[2] 247.8 0.0 2.7 242.6 246.0 247.8 249.6 253.1 16865 alpha[3] 252.4 0.0 2.7 247.1 250.6 252.4 254.2 257.7 20956 alpha[4] 232.6 0.0 2.7 227.3 230.8 232.5 234.3 237.8 21050 alpha[5] 231.6 0.0 2.7 226.4 229.8 231.6 233.4 236.8 20250 alpha[6] 249.7 0.0 2.7 244.5 247.9 249.7 251.5 255.1 18570 alpha[7] 228.7 0.0 2.7 223.4 226.9 228.7 230.5 234.0 21257 alpha[8] 248.4 0.0 2.7 243.0 246.6 248.4 250.2 253.7 16278 alpha[9] 283.3 0.0 2.7 278.0 281.5 283.3 285.1 288.6 19244 alpha[10] 219.2 0.0 2.7 213.9 217.4 219.2 221.0 224.6 17471 alpha[11] 258.2 0.0 2.7 252.9 256.4 258.2 260.1 263.6 21935 alpha[12] 228.2 0.0 2.7 222.9 226.4 228.2 230.0 233.5 19933 alpha[13] 242.4 0.0 2.7 237.1 240.6 242.4 244.2 247.7 14641 alpha[14] 268.3 0.0 2.7 262.9 266.4 268.3 270.1 273.7 20241 alpha[15] 242.8 0.0 2.7 237.6 241.0 242.8 244.6 248.0 14321 alpha[16] 245.3 0.0 2.7 239.9 243.5 245.3 247.1 250.5 15314 alpha[17] 232.2 0.0 2.7 227.0 230.4 232.2 234.0 237.4 20874 alpha[18] 240.5 0.0 2.7 235.2 238.6 240.5 242.3 245.6 16301 alpha[19] 253.8 0.0 2.7 248.5 252.0 253.8 255.6 259.1 22001 alpha[20] 241.6 0.0 2.7 236.2 239.8 241.6 243.4 246.9 14330 alpha[21] 248.5 0.0 2.7 243.3 246.8 248.5 250.3 253.8 19178 alpha[22] 225.2 0.0 2.7 220.0 223.4 225.2 227.1 230.5 19358 alpha[23] 228.5 0.0 2.7 223.2 226.7 228.5 230.3 233.8 19742 alpha[24] 245.1 0.0 2.7 239.9 243.3 245.1 246.9 250.2 14794 alpha[25] 234.5 0.0 2.7 229.2 232.7 234.5 236.2 239.7 18778 alpha[26] 254.0 0.0 2.7 248.7 252.2 254.0 255.8 259.2 19957 alpha[27] 254.4 0.0 2.7 249.1 252.6 254.4 256.2 259.7 21171 alpha[28] 243.0 0.0 2.7 237.7 241.2 243.0 244.8 248.3 14643 alpha[29] 217.9 0.0 2.7 212.6 216.1 217.9 219.7 223.3 18663 alpha[30] 241.4 0.0 2.6 236.3 239.6 241.4 243.2 246.6 16161 beta[1] 6.1 0.0 0.2 5.6 5.9 6.1 6.2 6.5 1717 beta[2] 7.1 0.0 0.3 6.6 6.9 7.1 7.2 7.5 1052 beta[3] 6.5 0.0 0.2 6.0 6.3 6.5 6.6 7.0 1218 beta[4] 5.3 0.0 0.3 4.8 5.2 5.3 5.5 5.8 1187 beta[5] 6.6 0.0 0.2 6.1 6.4 6.6 6.7 7.0 1936 beta[6] 6.2 0.0 0.2 5.7 6.0 6.2 6.3 6.7 1651 beta[7] 6.0 0.0 0.2 5.5 5.8 6.0 6.1 6.4 1702 beta[8] 6.4 0.0 0.2 5.9 6.3 6.4 6.6 6.9 1926 beta[9] 7.1 0.0 0.3 6.6 6.9 7.1 7.2 7.6 1444 beta[10] 5.8 0.0 0.2 5.4 5.7 5.8 6.0 6.3 1514 beta[11] 6.8 0.0 0.2 6.3 6.6 6.8 7.0 7.3 780 beta[12] 6.1 0.0 0.2 5.7 6.0 6.1 6.3 6.6 1924 beta[13] 6.2 0.0 0.2 5.7 6.0 6.2 6.3 6.6 1283 beta[14] 6.7 0.0 0.3 6.2 6.5 6.7 6.9 7.2 1523 beta[15] 5.4 0.0 0.3 4.9 5.2 5.4 5.6 5.9 1000 beta[16] 5.9 0.0 0.2 5.4 5.8 5.9 6.1 6.4 1920 beta[17] 6.3 0.0 0.2 5.8 6.1 6.3 6.4 6.8 2247 beta[18] 5.8 0.0 0.2 5.4 5.7 5.8 6.0 6.3 1694 beta[19] 6.4 0.0 0.2 5.9 6.3 6.4 6.6 6.9 1184 beta[20] 6.1 0.0 0.2 5.6 5.9 6.1 6.2 6.5 1666 beta[21] 6.4 0.0 0.2 5.9 6.2 6.4 6.6 6.9 1618 beta[22] 5.9 0.0 0.2 5.4 5.7 5.9 6.0 6.3 2248 beta[23] 5.8 0.0 0.2 5.3 5.6 5.8 5.9 6.2 1597 beta[24] 5.9 0.0 0.2 5.4 5.7 5.9 6.0 6.4 2203 beta[25] 6.9 0.0 0.2 6.4 6.7 6.9 7.1 7.4 1778 beta[26] 6.5 0.0 0.2 6.1 6.4 6.5 6.7 7.0 1912 beta[27] 5.9 0.0 0.2 5.4 5.7 5.9 6.1 6.4 1546 beta[28] 5.8 0.0 0.2 5.4 5.7 5.8 6.0 6.3 1776 beta[29] 5.7 0.0 0.2 5.2 5.5 5.7 5.8 6.2 1514 beta[30] 6.1 0.0 0.2 5.7 6.0 6.1 6.3 6.6 1630 mu_alpha 242.4 0.0 2.8 236.9 240.6 242.4 244.3 247.9 12061 mu_beta 6.2 0.0 0.1 6.0 6.1 6.2 6.3 6.4 2855 sigmasq_y 37.3 0.1 5.7 27.8 33.2 36.7 40.7 50.0 2879 sigmasq_alpha 219.1 1.2 64.6 125.7 173.1 208.1 252.0 374.3 2934 sigmasq_beta 0.3 0.0 0.1 0.1 0.2 0.3 0.3 0.5 408 sigma_y 6.1 0.0 0.5 5.3 5.8 6.1 6.4 7.1 2864 sigma_alpha 14.7 0.0 2.1 11.2 13.2 14.4 15.9 19.3 2966 sigma_beta 0.5 0.0 0.1 0.4 0.5 0.5 0.6 0.7 402 alpha0 106.3 0.1 3.7 99.0 103.9 106.3 108.8 113.6 5365 lp__ -438.4 0.1 6.9 -453.0 -442.7 -437.9 -433.5 -426.2 2614 Rhat alpha[1] 1 alpha[2] 1 alpha[3] 1 alpha[4] 1 alpha[5] 1 alpha[6] 1 alpha[7] 1 alpha[8] 1 alpha[9] 1 alpha[10] 1 alpha[11] 1 alpha[12] 1 alpha[13] 1 alpha[14] 1 alpha[15] 1 alpha[16] 1 alpha[17] 1 alpha[18] 1 alpha[19] 1 alpha[20] 1 alpha[21] 1 alpha[22] 1 alpha[23] 1 alpha[24] 1 alpha[25] 1 alpha[26] 1 alpha[27] 1 alpha[28] 1 alpha[29] 1 alpha[30] 1 beta[1] 1 beta[2] 1 beta[3] 1 beta[4] 1 beta[5] 1 beta[6] 1 beta[7] 1 beta[8] 1 beta[9] 1 beta[10] 1 beta[11] 1 beta[12] 1 beta[13] 1 beta[14] 1 beta[15] 1 beta[16] 1 beta[17] 1 beta[18] 1 beta[19] 1 beta[20] 1 beta[21] 1 beta[22] 1 beta[23] 1 beta[24] 1 beta[25] 1 beta[26] 1 beta[27] 1 beta[28] 1 beta[29] 1 beta[30] 1 mu_alpha 1 mu_beta 1 sigmasq_y 1 sigmasq_alpha 1 sigmasq_beta 1 sigma_y 1 sigma_alpha 1 sigma_beta 1 alpha0 1 lp__ 1 Sample were drawn using NUTS2 at Wed Sep 5 21:55:07 2012. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1). 

plot of chunk unnamed-chunk-1

One interesting difference between the two outputs is the R-hat value for the alpha0 variable. Alpha0 is generated quantity, representing the value of the intercept when non-centered data is used. For R2jags, the value of R-hat is 1.228, while R-hat is 1 for RStan. A quick look at the output indicates that R2jags used a thin value of 9, while RStan defaults to 1 (and creates a segfault when changed).

There is a major speed difference between the two methods. Now, it should be noted that JAGS can be faster on some problems and Stan can be faster on others. On my machine, the JAGS sampler took 7.1 seconds, while the Stan sampler took 25 seconds, which was a bit shocking to me. And this was timed using the second “stan” call, when the Stan model was pre-compiled. This is something that needs to be explored further.

Installing RStan in Ubuntu

Based on RStan Getting Started

R packages

Assuming you have the most up to date version of R, the following packages need to be installed. This assumes you have the c2d4u PPA available. See here for more information.

 sudo apt-get install r-cran-rcpp r-cran-inline r-cran-rcpp 

C++ Compiler

You probably have a C++ compiler installed, but just in case:

 sudo apt-get install build-essential 

The “RStan Getting Started” page suggests modifying the C++ compiler optimization level. On my install of Ubuntu, the optimization was set as the page suggested. You can check with:

 grep "CXXFLAGS =" /usr/lib/R/etc/Makeconf 

Among the output of grep, you should see:

 CXXFLAGS = -O3 -pipe -g $(LTO) 

If you do not, use your favorite editor to edit “/ust/lib/R/etc/Makeconf” so that the CXXFLAGS line has “-03” as well as everything else that is there.

Install RStan

From within R, run the following commands. You will need to select a mirror, although it shouldn’t actually use it.

 options(repos = c(getOption("repos"), rstan = "http://wiki.stan.googlecode.com/git/R")) install.packages('rstan', type = 'source') 

It is a large download (for R, 2.3 Mb) and will take some time to compile.

Testing RStan

Now that RStan is installed, you can test it with the following code. RStan compiles the model, then runs it. It will take much longer to compile the code than it takes to generate the samples.

Installing the Development Version of R on Ubuntu (alongside the current version of R)

A recent question was asked on the R-sig-Debian on installing the development version of R alongside the stable version. While most users of R do not need to have two versions installed on their machine, it is useful if you are developing packages for CRAN and want to test your package.

The latest development version of R is not available as a Ubuntu package, so it will need to be built from source. Before you build from source, you will need to make sure that all the compilers and libraries needed to build R are available. Since the development version is similar to the current version of R, the following commands will install all the packages you need (plus a couple more).

 sudo apt-get build-dep r-base sudo apt-get install subversion ccache 

The next steps were suggested by Dirk Eddelbuettel. First, you need to download the current version of r-devel from svn. Create a directory for the r-devel code. The scripts below use the path “~/svn” and if you want to change the location, you will need to make the appropriate changes in all the scripts.

 mkdir ~/svn/ 

Now checkout the current r-devel code from svn.

 cd ~/svn/ svn co https://svn.r-project.org/R/trunk r-devel/R 

The svn checkout will take some time. Next you need a script that will build r-devel, but install it in a location different from the stable version of R. Again, thanks to Dirk, here is that script. Note that the executable is installed in “/usr/local/lib/R-devel/bin”, but that can be changed.

A second script (that you should place in the appropriate location) will allow you to easily launch the development version of R.

R 2.15.1 Available

Ubuntu packages for the latest release of R are now available on CRAN and RutteR PPA. If you have either repository installed, Ubuntu should have updated R automatically. If you do not have either repository isnallted, see the Installing R tab above.

The announcment from the R Core Team about 2.15.1 can be found here.

For Ubuntu (and Debian) users, the following has also been added:

  • src/X11/*: Applied revised patch by Philip Johnson to to set the X11 icon and window class for improved desktop experience; patch source and support files are in debian/icon-class-patch/.
  • debian/rules: Install icon png file and desktop file from Philip too
  • debian/r-base-core.dirs: Add two directories for icon and desktop

In short, this means when you create windows from within R, the R logo will appear in the window title bar (if your desktop has that feature enabled) and in any application that manages your windows.