catbarchart: Plotting categorical data in R

catbarchart is an R function I wrote for a statistics course. Together with a helper function, it plots continuous data against a categorical response variable.

Here is the plot you get if you take the famous Cars93 dataset in R and plot some of its continuous variables against a categorical variable (the response):

Continuous variables: “MPG.city”, “MPG.highway”, “Horsepower”, “Price” from the Cars93 dataset (MASS package)
Categorical variable: “Origin” (US vs. non-US)

Let’s take a look at the output of this function:

Looking at this plot, you can visually infer many things:

  1. More US-origin cars than non-US cars have low MPG.city and MPG.highway.
  2. For medium MPG.city and MPG.highway, there are more non-US cars.
  3. For high MPG.city and MPG.highway, there are no US-origin cars.

And so on.
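The counts behind the MPG.city panel can also be read as a contingency table. The sketch below assumes the MASS package is installed (the usage example later in the post loads it too). It bins MPG.city directly with cut(), using breaks = 5 and the same five labels as continous2categorical, so it does not depend on that function. The names mpg_bins and counts are only illustrative.

```r
# Sketch: tabulate binned MPG.city against Origin for the Cars93 data.
library(MASS)  # provides the Cars93 data set
data(Cars93)

# Same labels and equal-width binning (breaks = 5) as continous2categorical
labs <- c("low", "low-medium", "medium", "medium-high", "high")
mpg_bins <- cut(Cars93$MPG.city, breaks = 5, labels = labs)

# Rows: MPG.city bin; columns: Origin level.
# Each cell is the height of one bar in the MPG.city panel.
counts <- table(MPG.city = mpg_bins, Origin = Cars93$Origin)
print(counts)
```

Each row compares US and non-US counts within one MPG band, which is what each pair of side-by-side bars shows.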

Now let’s look at how to plot this data with catbarchart. First, pass the dataset to the continous2categorical function. This function converts each continuous variable into a categorical one: cut() splits the variable’s range into five equal-width intervals labelled low, low-medium, medium, medium-high and high. Note: the last variable in the input dataset must be categorical, because it is the response variable, and continous2categorical does not modify it. All other columns must be numeric, since cut() only accepts numeric input.
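To see what the binning step does, the snippet below applies cut() with breaks = 5 and the same labels to a small made-up vector. The numbers are hypothetical and are not taken from Cars93. When breaks is a single number, cut() splits the observed range into that many equal-width intervals, so a bin can end up empty.

```r
# Hypothetical city-MPG values, chosen only to illustrate the binning
mpg <- c(15, 18, 20, 22, 25, 29, 33, 40, 46)
labs <- c("low", "low-medium", "medium", "medium-high", "high")

# The range 15..46 is split into five intervals of width 6.2
# (cut() widens the two outer edges very slightly)
binned <- cut(mpg, breaks = 5, labels = labs)
print(table(binned))
```

Here medium-high (about 33.6 to 39.8) gets no values. The intervals have equal width, not equal counts, so a skewed variable can put most observations into one or two bins.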

The output of continous2categorical is then passed to the catbarchart function.

And now for the two functions themselves (the secret sauce):

# continous2categorical function. This function takes a data frame of continuous variables
# and converts each of them to a categorical variable with five equal-width bins.
# The last variable is the response variable and is returned unchanged.
continous2categorical <- function(x){
  labs <- c("low", "low-medium", "medium", "medium-high", "high")
  out <- x
  numberoffactors <- ncol(x) - 1
  for (i in seq_len(numberoffactors)){
    # cut() with breaks = 5 splits the range of column i into five equal-width intervals
    out[[i]] <- cut(x[[i]], breaks = 5, labels = labs)
  }
  # the response column (the last one) is left as it is
  return(data.frame(out))
}
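# catbarchart plots each variable against the response variable.
# The last variable is the response variable.
# Requires the ggplot2 and gridExtra packages.
catbarchart <- function(x){

  xcolumnnames <- colnames(x)
  responsecol <- ncol(x)
  response <- xcolumnnames[responsecol]

  # one bar chart per predictor: bar height is the row count, bars are
  # coloured by response level and placed side by side ("dodge")
  plot_hist <- function(column, data, response){
    ggplot(data, aes(x = .data[[column]], fill = .data[[response]])) +
      geom_bar(position = "dodge") +
      xlab(column) +
      scale_fill_discrete(name = response)
  }

  # build one plot for every column except the response (the last one)
  myplots <- lapply(xcolumnnames[-responsecol], plot_hist, data = x, response = response)

  # stack the plots in a single column
  grid.arrange(grobs = myplots, ncol = 1)

}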
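Each panel that catbarchart draws shows the number of rows for every (bin, response level) pair, with the bars for each bin placed side by side by position = "dodge". The minimal base-R sketch below prints those numbers instead of drawing them. Like catbarchart, it uses lapply over every column except the last. The function cattables and the toy data frame are hypothetical and are not part of the original post.

```r
# Hypothetical text-only companion to catbarchart: for every predictor
# (all columns except the last), count rows per (bin, response level).
cattables <- function(x) {
  response <- names(x)[ncol(x)]
  predictors <- names(x)[-ncol(x)]
  tabs <- lapply(predictors, function(col) table(x[[col]], x[[response]]))
  names(tabs) <- predictors
  tabs
}

# Small made-up data frame that is already categorical, response last
toy <- data.frame(
  Horsepower = factor(c("low", "high", "low", "medium", "high", "low")),
  Price      = factor(c("low", "medium", "low", "high", "high", "medium")),
  Origin     = factor(c("USA", "non-USA", "USA", "USA", "non-USA", "non-USA"))
)
tabs <- cattables(toy)
print(tabs)
```

Applied to the output of continous2categorical, each table should contain the same counts as the corresponding panel of the plot.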

How do you use these two functions?

library(MASS)       # Cars93 dataset
library(ggplot2)    # used inside catbarchart
library(gridExtra)  # grid.arrange()
data(Cars93)
mycars <- Cars93[,c("MPG.city", "MPG.highway", "Horsepower", "Price", "Origin")]
catbarchart(continous2categorical(mycars))
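Both functions treat the last column as the response. If the response is somewhere else in your data frame, you can move it to the end with a base-R column reorder, as in the sketch below. The data frame and its values are made up for illustration. The reordered data frame can then be passed to continous2categorical in the same way as mycars above.

```r
# Hypothetical data frame whose response (Origin) is not the last column
df <- data.frame(
  Origin = factor(c("USA", "non-USA", "USA")),
  MPG.city = c(18, 25, 31),
  Price = c(15.9, 33.9, 12.1)
)

# Move the response to the end, keeping the other columns in their order
response <- "Origin"
df <- df[, c(setdiff(names(df), response), response)]
print(names(df))
```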

Let me know if you have questions. In an upcoming blog post I will go into detail on how ggplot was used in the catbarchart function.
