Is Excel a useful tool for analyzing data in biology/bioinformatics?

Excel is a commonly used tool for analyzing data in biology, but it has a bad habit of converting gene names to dates. Here is how you can fix the gene-to-date problem and continue using Excel productively in your research.

I had a chat with Robert Aboukhalil about whether Excel should be used for analyzing biological data. He has a PhD in bioinformatics from Cold Spring Harbor Laboratory, where we studied at the same time. Now he works as a bioinformatics software engineer in industry. He uses cloud computing and web development to create tools that help biologists analyze sequencing data.

Robert and I agree that we should all be pragmatic about our research and not listen to people who say that you should never use Excel for biology.

We both use Excel for some things, like manual curation and entering measurements, and then switch to R or other languages for big data or specialized plotting and statistics.

Excel converts some gene names to dates

We also discussed how Excel mistakes some gene names for dates.

For instance, it turns OCT4 into October 4th. Because this affects only a few genes, it is easy to overlook, and the literature is littered with supplementary datasets containing these date-formatted genes. The problem shows up in downstream analysis: databases cannot match the date-formatted entries to any gene name, so those genes silently drop out. You might miss important results related to these genes!
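If you suspect a gene list has already been through Excel, a quick check can flag the damaged entries before they reach downstream analysis. Here is a minimal Python sketch, not Robert's app. It assumes the mangled names appear in Excel's English day-month style (such as "4-Oct"), which may differ with your locale or cell formatting. The function name `find_date_like` and the example gene list are made up for illustration.

```python
import re

# Assumption: mangled symbols look like "4-Oct" or "1-Sep" (English day-month style).
# Other locales or cell formats may render them differently.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$",
    re.IGNORECASE,
)


def find_date_like(symbols):
    """Return the entries that look like dates rather than gene symbols."""
    return [s for s in symbols if DATE_LIKE.match(s.strip())]


# Hypothetical gene column exported from a spreadsheet
genes = ["TP53", "4-Oct", "BRCA1", "1-Sep", "2-Mar", "SOX2"]
suspicious = find_date_like(genes)
print(suspicious)
```

Anything this flags is worth checking by hand against the original data, because the date alone does not always tell you which gene symbol it started as.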

Robert talked about a few ways of avoiding this and shared a free app he built for getting around the issue, available at http://gum.co/oct4th.

You can catch up with Robert on Twitter at @RobAboukhalil (and me at @MariaNattestad).

Be sure to sign up for emails to hear when new episodes come out and to have your say about what I should include in future episodes.

