A practical introduction to doing bioinformatics research

Bioinformatics is a huge field and nobody can be an expert on everything, but here are a few recommendations for how to get started learning and doing bioinformatics. I also share my favorite podcasts and online genomics communities.

I got this question from one of you by email (paraphrased for anonymity):

I am a first year PhD student in a wet lab, where I am the only one doing bioinformatics. I started learning R after your first video, but I feel that I need more guidance on how to get started with bioinformatics.

Here is the path that I would recommend for beginners in bioinformatics:

1. Start with a foundation in Python/R and bash

In Python/R: Just get to the point where you can read in data and run a statistical test.

In Bash: Be able to navigate around directories, open files, and run a program (like sort, uniq, or wc) and output the result to a file. Check out my intro video on bash for quick tutorial on getting started on the command-line.

2. Do a small project

Start with a simple project where you take some data, do some analysis of it, and present the results with plots in a powerpoint presentation or equivalent.

You can use publicly available data. A good source is the UCSC genome browser, where you can use the Table Browser to download csv files.

It is even better if you have data from your own project or from your lab that you can play with, which will make your results more interesting to everyone around you, and more publishable!

Start by actually understanding where the data comes from, why it was generated, and what kinds of biological questions you may be able to answer with it.

A few ideas for what to do with data: look into statistical tests to run, check out machine learning techniques like PCA, look for correlations, or just start plotting to see if there are any patterns you want to investigate further.

Google as you go!

If you get completely stuck and google doesn’t answer your questions after several tries, you can ask coding questions on Stack Overflow or bioinformatics questions on Biostars.

3. Occasionally do tool safaris

Make sure you are aware of the commonly used tools for your field, which means you know their names and their main purpose. You can always learn more when you need them. Here is an example.

There is a tool called bedtools that is super useful for comparing sets of genomic intervals, called bed files. If you are asking a question like how many of these variants we found overlap with lincRNA’s, bedtools will give you an answer. Before I knew bedtools existed, there were a few times that I had questions like that and wrote whole python scripts to do the comparison.

If I had known bedtools existed, even if I had no idea how to use it yet, I would just have googled it, figured out how to use it, and solved my problem much faster. I didn’t need to spend weeks learning how to use every tool in the NGS field, but if I had done a couple of searches on Google for useful bioinformatics tools, I would have been aware enough to go back and find it later.

4. Build tools to fill gaps as they come up in your research

If you want to do bioinformatics as a career, I would recommend over time finding places where there are gaps in the tools available and filling it in by writing a tool of your own to solve the problem. Writing tools is fun and puts you in a place where finding a job is easier, especially in industry.

Making a tool is as easy as building an R package or turning a python script into a command-line program using argparse, but it can also be as complicated as web applications. For any tool you are building, consider starting with a minimum viable product. You will naturally find gaps as you are doing analysis, so just focus on solving the problems as they come up in the projects you do.

Productive entertainment:

Look for podcasts, blogs, or people to follow on Twitter. For podcasts, I recommend following Mendelspod for news and stories, especially around biotechologies and the future. And the Effort Report for discussions on life and career as an academic.

Look for blogs related to your field of interest, and follow people on Twitter. When I go to conferences, I use TweetDeck to follow the conference hashtag and see what people are saying about the talks. There are also relevant subreddits, like reddit.com/r/genomics. And Quora is starting to have some decent answers on broader questions related to the field of bioinformatics or speculation on what the future holds.

And for fun:

Check out the PhD Comics if you haven’t already. I read through all of them right when I was applying to graduate schools and I thought, hey graduate school sounds really hard but I think I can do it.

The comic XKCD is also awesome, more related to programming/math/and generally geeky stuff but it has a great sense of humor.

And there is a great dystopian movie called Gattaca that gives some perspective on what the social impact may be of the work we are doing right now to understand the human genome.

If you want more bioinformatics tips and videos like this one, subscribe for weekly email updates below:

1 thought on “Getting started with bioinformatics”

Getting Started with Bioinformatics – Frank's World · June 14, 2017 at 10:24 am

[…] The accompanying blog post has all the links mentioned. […]

Comments are closed.

Related Posts


How to write a bash script that takes user input

Quick guide to writing a bash script on the Mac/Linux command-line Writing a bash script is important for setting up pipelines to process data or run a series of different tools. It also makes your Read more...


Is Excel useful in biology research?

Is Excel a useful tool for analyzing data in biology/bioinformatics? Excel is a commonly used tool for analyzing data in biology, but it has a bad habit of converting gene names to dates. Here is Read more...


Intro to data wrangling

Data wrangling is a very practical skill that you will definitely need in your data science or bioinformatics work. It is all about getting data into the right format so it can be used by Read more...