Is Excel a useful tool for analyzing data in biology/bioinformatics?

Excel is a commonly used tool for analyzing data in biology, but it has a bad habit of converting gene names to dates. Here is how you can fix the gene-to-date problem and continue using Excel productively in your research.

I had a chat with Robert Aboukhalil about whether Excel should be used for analyzing biological data. He has a PhD in bioinformatics from Cold Spring Harbor Laboratory, where we studied at the same time. Now he works as a bioinformatics software engineer in industry. He uses cloud computing and web development to create tools that help biologists analyze sequencing data.

Robert and I agree that we should all be pragmatic about our research and not listen to people who say that you should never use Excel for biology.

We both use Excel for some things, like manual curation and entering measurements, and then switch to R or other languages for big data or specialized plotting and statistics.

Excel converts some gene names to dates

We also discussed how Excel mistakes some gene names for dates.

For instance, it turns OCT4 into October 4th. Because this affects only a few genes, it is easy to overlook, and the literature is littered with supplementary datasets containing these date-formatted genes. The trouble comes in downstream analysis: databases do not recognize gene names in date format, so those genes fail to match. You might miss important results related to these genes!
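If you want to check whether a gene list has already been affected, a quick scan for date-shaped entries can help. Here is a minimal Python sketch, assuming the converted names appear in the day-month form such as "4-Oct"; other locales or cell formats may look different. The function name `find_date_like_names` and the example list are made up for illustration, and this is not how Oct4th itself works.

```python
import re

# Month abbreviations as they appear in date-formatted cells (assumed English locale).
_MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Matches strings such as "4-Oct" or "1-Sep": day, hyphen, month abbreviation.
_DATE_LIKE = re.compile(rf"^\d{{1,2}}-(?:{_MONTHS})$", re.IGNORECASE)


def find_date_like_names(names):
    """Return the entries that look like dates rather than gene names."""
    return [n for n in names if _DATE_LIKE.match(n.strip())]


# Hypothetical gene column exported from a spreadsheet.
genes = ["TP53", "4-Oct", "BRCA1", "1-Sep", "SOX2"]
print(find_date_like_names(genes))  # → ['4-Oct', '1-Sep']
```

This only flags suspicious entries. Mapping them back to the original gene names is a separate step and may need a manual look, since the date alone does not always tell you which gene it came from.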

Robert talked about a few ways of avoiding this and shared a free app he built for getting around the issue. You can get Oct4th as a command-line tool: `pip install oct4th` or use the web app at

You can catch up with Robert on Twitter at @RobAboukhalil (and me at @MariaNattestad).
