catbarchart is an R function I wrote for a statistics course. Together with a helper function, it plots continuous data against a categorical response variable.

Here is the plot you get if you take the well-known Cars93 dataset in R and plot some of its continuous variables against a categorical (response) variable:

Continuous variables: “MPG.city”, “MPG.highway”, “Horsepower”, “Price” from the Cars93 Dataset in R

Categorical variable: “Origin” (US vs. Non-US)

Let’s take a look at the output of this function:

Looking at this plot you can *visually* infer many things:

- Among cars with low MPG.city and MPG.highway, more are of US origin
- Among cars with medium MPG.city and MPG.highway, more are of non-US origin
- Among cars with high MPG.city and MPG.highway, there are no US-origin cars

And so on.
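These visual readings can be cross-checked with a plain count table. The following is a minimal sketch, not part of the original functions: it assumes the same five equal-width levels that the helper function further down produces, and the names `mpg_city_level` and `counts` are made up for illustration.

```r
library(MASS)  # recommended package shipped with R; provides Cars93
data(Cars93)

# Same five labels the helper function uses for every continuous variable
labs <- c("low", "low-medium", "medium", "medium-high", "high")

# Bin MPG.city into five equal-width intervals (hypothetical variable name)
mpg_city_level <- cut(Cars93$MPG.city, breaks = 5, labels = labs)

# Cross-tabulate the binned variable against the response variable
counts <- table(MPG.city = mpg_city_level, Origin = Cars93$Origin)
print(counts)
```

Each row of the table corresponds to one group of bars in the MPG.city panel, and each column to one fill colour.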

Now let’s take a look at how to plot this data using catbarchart. First, pass the dataset to the **continous2categorical** function. It converts each continuous variable into a categorical one by cutting its range into five equal-width levels: low, low-medium, medium, medium-high, and high. Note: the last variable in the input dataset must be categorical. This is the response variable, and **continous2categorical** does not modify it.
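To make the binning step concrete, here is a small sketch of what `cut()` does with `breaks = 5`. The vector `mpg` is a made-up toy example, not data from Cars93.

```r
# Hypothetical toy values, chosen only to illustrate the binning
mpg <- c(15, 18, 22, 25, 31, 46)
labs <- c("low", "low-medium", "medium", "medium-high", "high")

# breaks = 5 splits the observed range (15 to 46) into five equal-width intervals
binned <- cut(mpg, breaks = 5, labels = labs)

print(data.frame(mpg, binned))
print(table(binned))  # a level can be empty, since the bins are equal-width, not equal-count
```

Because the intervals have equal width rather than equal counts, a single large value stretches the range and can leave some levels sparsely populated or empty.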

The output of **continous2categorical** is then passed to the **catbarchart** function.

And now to these two functions (the secret sauce):

    # continous2categorical function. This function takes a data frame of continuous
    # variables and converts it to a data frame of categorical variables.
    # The last variable is the response variable and is left unchanged.
    continous2categorical <- function(x) {
      numberoffactors <- ncol(x) - 1
      labs <- c("low", "low-medium", "medium", "medium-high", "high")
      # Start from a copy so the response column and the column names carry over
      out <- x
      for (i in seq_len(numberoffactors)) {
        # cut() splits the range of each variable into 5 equal-width intervals
        out[[i]] <- cut(x[[i]], breaks = 5, labels = labs)
      }
      return(out)
    }

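    # catbarchart plots each variable against the response variable.
    # The last variable is the response variable.
    library(ggplot2)    # ggplot(), aes(), geom_bar()
    library(gridExtra)  # grid.arrange()
    catbarchart <- function(x) {
      xcolumnnames <- colnames(x)
      responsecol <- ncol(x)
      # One dodged bar chart: counts per level of `column`, filled by `response`.
      # Note: recent ggplot2 releases spell ..count.. as after_stat(count).
      plot_hist <- function(column, data, response) {
        ggplot(data, aes(x = get(column), ..count..)) +
          geom_bar(aes(fill = get(response)), position = "dodge") +
          xlab(column) +
          scale_fill_discrete(name = response)
      }
      myplots <- lapply(colnames(x), plot_hist, data = x, response = xcolumnnames[responsecol])
      # Drop the last plot, which would show the response against itself
      myplots <- myplots[-length(myplots)]
      grid.arrange(grobs = myplots, ncol = 1)
    }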

How to use these two functions?

    library(MASS)
    data(Cars93)
    mycars <- Cars93[, c("MPG.city", "MPG.highway", "Horsepower", "Price", "Origin")]
    catbarchart(continous2categorical(mycars))

Let me know if you have questions. In an upcoming blog post I will go into detail on how **ggplot** was used in the **catbarchart** function.