Why you should always specify colClasses when reading tables in R

Today, I was importing a table that looked like this into R:

with the command:

 r <- read.table("passes_stat", colClasses = c("integer", "character"))

Why did I put extra effort into specifying colClasses? Because if I hadn't, the read.table function would have had to make an educated guess about what type of data each of my columns contains.

This would not only take longer (see the post here), but in this particular case it would also strip my leading zeroes: a value such as 007 in a column guessed to be numeric is stored as 7 (see, for example, line number 5). Worse, I might not even have noticed that it happened.

 r <- read.table("passes_stat")
 View(r)
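
The effect is easy to reproduce without the original file. Here is a minimal sketch using made-up data (the values and the txt variable are illustrative assumptions, not the contents of passes_stat), read through the text argument of read.table instead of from disk:

```r
# Hypothetical stand-in for the original file: an integer column
# followed by an ID column whose values have leading zeroes.
txt <- "1 00123
2 00045
3 10000"

# Let read.table guess the types: the ID column is converted to integer,
# so the leading zeroes are silently dropped.
guessed <- read.table(text = txt)

# State the types explicitly: the ID column stays a character vector.
typed <- read.table(text = txt, colClasses = c("integer", "character"))

print(class(guessed$V2))  # "integer"
print(guessed$V2)         # 123 45 10000
print(class(typed$V2))    # "character"
print(typed$V2)           # "00123" "00045" "10000"
```

A quick str(r) or sapply(r, class) right after the import is a cheap way to catch this kind of silent conversion.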

About mqm5775
https://www.biostars.org/u/2884/
Bioinformatics of sequences. Sex chromosomes. Enjoying DNA in my computer. Great ape Y chromosome evolution, specifically heterochromatin variability and analysis of male fertility genes. Creative use of visualization, gene expression of multi-copy gene families, genome assembly. Enthusiastic about learning and applying new technologies: Pacific Biosciences (expert experience), Oxford Nanopore, BioNano Genomics.
