1. Start by opening the terminal (on a Mac, press command-space and type “terminal”).
  2. I’m going to show you how to navigate around your file system, but first we are going to make a quick change that will make navigating easier (forever).

    Type this in:

    touch ~/.bash_profile
    open ~/.bash_profile
    

    That should open a new window with TextEdit (or another text editor if you have changed your settings). Then you can copy in the following code:

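    function prompt
    {
        # \[ and \] wrap the non-printing color codes so bash can measure the prompt width correctly
        local BGreen='\[\e[1;32m\]'   # Bold green
        local BIBlue='\[\e[1;94m\]'   # Bold bright blue
        local GRAY='\[\e[0;37m\]'     # Gray (unused, handy for customizing)
        local BYellow='\[\e[1;33m\]'  # Bold yellow (unused)
        local CYAN='\[\e[1;36m\]'     # Bold cyan (unused)
        local BLACK='\[\e[0;30m\]'    # Black for the text you type; use '\[\e[0m\]' for the terminal default
        export PS1="
    ${BGreen}\u@\h:${BIBlue}\w${BLACK}
    $ "
    }
    prompt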
    

    Once you have saved that file, quit the terminal and start it again. Prompt text that used to be black should now be green and blue. That works on a Mac; on another system, or when using ssh, edit your ~/.bashrc file instead of ~/.bash_profile. The code we added has two advantages: it adds color to help orient us, and it shows the full path of where you are instead of only the name of the folder you are in.
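
To get a feel for what the color codes in the prompt do, you can print them directly. This is only a rough sketch: the text user@host:~/Desktop is a made-up stand-in for what \u, \h and \w would expand to, and the variable names are chosen just for this example.

```shell
# \033 and \e are two spellings of the same escape character.
green=$(printf '\033[1;32m')   # bold green
blue=$(printf '\033[1;94m')    # bold bright blue
reset=$(printf '\033[0m')      # back to the terminal's default color
# Hypothetical prompt text, colored the same way as the prompt function.
sample="${green}user@host:${blue}~/Desktop${reset}"
printf '%s\n' "$sample"
```

Swapping the numbers (for example 33 for yellow or 36 for cyan) is a quick way to preview a color before putting it in your prompt.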

  3. Basic navigation (much easier to follow along in the video for this one)
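    cd ~/Desktop            # go to the Desktop folder (~ is your home folder)
    mkdir my_folder         # make a new folder
    cd my_folder            # move into it
    cd ..                   # move back up one level (to Desktop)
    cd my_folder            # move into my_folder again
    mkdir another_folder    # make a folder inside my_folder
    cd another_folder       # move into it
    cd ../..                # move up two levels at once (back to Desktop)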
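
If you want to check where you are at each step, pwd prints the current folder and ls lists what is in it. Here is a small sketch of the same walk-through. It assumes a throwaway folder made with mktemp instead of your Desktop, and start_dir is just a name picked for this example.

```shell
# Work inside a throwaway directory so nothing on the Desktop is touched.
start_dir=$(mktemp -d)
cd "$start_dir"
mkdir my_folder
cd my_folder
mkdir another_folder
ls                  # lists: another_folder
cd another_folder
pwd                 # the path ends in my_folder/another_folder
cd ../..
pwd                 # back in the starting directory
```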
    
  4. Make and edit a file
    touch my_file.txt
    open my_file.txt
    # OR try nano if it's available on your system
    nano my_file.txt
    # OR check out vim if you don't mind a slightly steeper learning curve (but way more power)
    vim my_file.txt
    # OR use another editor: I use SublimeText instead of TextEdit, and it can be set up to open from the command line
    

    You can copy the following text into your new file so we have something to play with for the next part.

    Pikachu	Electric	none
    Bulbasaur	Grass	Poison
    Charmander	Fire	none
    Squirtle	Water	none
    Caterpie	Bug	none
    Weedle	Bug	Poison
    Pidgey	Normal	Flying
    Ivysaur	Grass	Poison
    Charmeleon	Fire	none
    Wartortle	Water	none
    Metapod	Bug	none
    Kakuna	Bug	Poison
    Pidgeotto	Normal	Flying
    Venusaur	Grass	Poison
    Charizard	Fire	Flying
    Blastoise	Water	none
    Butterfree	Bug	Flying
    Beedrill	Bug	Poison
    Pidgeot	Normal	Flying
    

    Make sure you save the file, then close it and let’s look at it from the command-line.
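
The columns need to be separated by real tabs for the later steps to work. One way to check is sketched below, assuming a hypothetical two-row file called sample.txt that is written with printf so the tabs are guaranteed.

```shell
# sample.txt is a hypothetical two-row file; \t in printf writes a real tab.
printf 'Pikachu\tElectric\tnone\n' > sample.txt
printf 'Bulbasaur\tGrass\tPoison\n' >> sample.txt
# cat -t displays each tab as ^I, so tabs and spaces are easy to tell apart.
cat -t sample.txt
```

Running cat -t on my_file.txt in the same way should show ^I between the columns if the tabs survived the copy and paste.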

  5. There are several ways to look at the contents of a file. These are the ones I use regularly.
    • Output everything to the terminal:
      cat my_file.txt
    • Output the first 10 lines (the default) or 5 lines:
      head my_file.txt
      head -n 5 my_file.txt
      

      (replace head with tail to get the last lines)

    • Show the file and close it without clogging up the terminal:
      less -S my_file.txt
      # use arrow keys to move around the file
      # type q to quit
      # type h to see help menu
      # the -S stops it from wrapping long lines. 
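
A quick sketch of head and tail side by side, assuming a hypothetical 20-line file called numbers.txt so it is obvious which lines come back:

```shell
# numbers.txt is a hypothetical file holding the numbers 1 to 20, one per line.
seq 1 20 > numbers.txt
first_three=$(head -n 3 numbers.txt)
last_two=$(tail -n 2 numbers.txt)
echo "$first_three"   # prints 1, 2 and 3 on separate lines
echo "$last_two"      # prints 19 and 20 on separate lines
```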
      
  6. Let’s count how many Pokémon of each type are on our list here. We are going to build this up step by step using the “|” character called a pipe.
    1. Print everything to the terminal
      cat my_file.txt
    2. Pipe that to cut to get the second column
      cat my_file.txt | cut -f 2

      (If you still see the whole file there, then the tabs were converted to spaces at some point. If you ever need to convert weird spacing to tabs, use this trick:

      cat my_file.txt | awk '{$1=$1;print}' OFS='\t' | cut -f 2 
      

      If you are curious how this works: awk splits each line into columns on any run of spaces or tabs. Setting column 1 to itself ($1=$1) is pretend work that forces awk to rebuild the line, joining the columns with the output field separator (OFS), which we set to a tab ('\t'); print then outputs the rebuilt line. I found this on Stack Overflow once when searching for how to fix weird spacing, and it works like a charm.
      Insert that into the other examples if necessary.)
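
To see the awk trick in isolation, here is a minimal sketch. It assumes a hypothetical file called spaced.txt whose columns are separated by runs of spaces instead of tabs.

```shell
# spaced.txt is a hypothetical file with space-separated columns.
printf 'Pikachu   Electric  none\n' > spaced.txt
printf 'Weedle    Bug       Poison\n' >> spaced.txt
# With no tabs in the line, cut has nothing to split on and prints whole lines.
cut -f 2 spaced.txt
# After awk rebuilds each line with tabs, cut finds the second column.
second_col=$(awk '{$1=$1; print}' OFS='\t' spaced.txt | cut -f 2)
echo "$second_col"   # prints Electric, then Bug
```

Note that this approach treats every run of spaces as a column break, so it would not suit a file where a single column contains spaces.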

    3. Pipe that to uniq -c which consolidates and counts (-c) the rows
      cat my_file.txt | cut -f 2 | uniq -c 

      Output:

         1 Electric
         1 Grass
         1 Fire
         1 Water
         2 Bug
         1 Normal
         1 Grass
         1 Fire
         1 Water
         2 Bug
         1 Normal
         1 Grass
         1 Fire
         1 Water
         2 Bug
         1 Normal

      Notice that some of these types are listed more than once. This is because uniq can only consolidate entries that are right after each other.

    4. We can fix this by first sorting the list
      cat my_file.txt | cut -f 2 | sort
    5. And now we’ll pipe the sorted output to uniq -c
      cat my_file.txt | cut -f 2 | sort | uniq -c

      Output:

         6 Bug
         1 Electric
         3 Fire
         3 Grass
         3 Normal
         3 Water
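
The difference that sorting makes is easiest to see on a tiny input. A sketch, assuming a made-up four-line list held in a variable called types:

```shell
# Hypothetical input: Bug appears three times, but not all in a row.
types='Bug
Fire
Bug
Bug'
echo "$types" | uniq          # Bug, Fire, Bug: only the adjacent repeat is merged
echo "$types" | sort | uniq   # Bug, Fire: sorting first puts all the repeats together
```
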
    6. For clarity we can sort this output by the counts
      cat my_file.txt | cut -f 2 | sort | uniq -c | sort -k1,1nr 

      With no arguments, sort orders lines alphabetically using the whole line. Here I am saying I want to sort by -k1,1: only columns 1 through 1 (meaning just the first column), n: numerically, and r: in reverse (highest number at the top).
      Output:

         6 Bug
         3 Fire
         3 Grass
         3 Normal
         3 Water
         1 Electric
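
Here is a small sketch of what the n and r flags change, using three made-up numbers held in a variable called nums:

```shell
# Hypothetical input: three numbers, one per line.
nums='10
9
100'
echo "$nums" | sort       # 10, 100, 9: compared character by character
echo "$nums" | sort -n    # 9, 10, 100: compared as numbers
echo "$nums" | sort -nr   # 100, 10, 9: numeric, largest first
```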
      

      Sticking together -n and -r with the -k1,1 is a pattern that allows you to sort by multiple columns in different ways. A great example in bioinformatics is how bedtools asks you to sort .bed files first by chromosome (column 1) and then within each chromosome group, increasing numerically by position (column 2): sort -k1,1 -k2,2n
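
As a rough illustration of that two-column sort, here is a sketch on a hypothetical four-row file called regions.bed. The chromosome names and positions are invented for the example.

```shell
# regions.bed is a hypothetical tab-separated file: chromosome, start, end.
printf 'chr2\t500\t600\n' > regions.bed
printf 'chr1\t1000\t1100\n' >> regions.bed
printf 'chr1\t200\t300\n' >> regions.bed
printf 'chr2\t50\t150\n' >> regions.bed
# Column 1 alphabetically, then column 2 numerically within each chromosome.
sorted_bed=$(sort -k1,1 -k2,2n regions.bed)
echo "$sorted_bed"
```

The rows should come out as chr1 200, chr1 1000, chr2 50, chr2 500. Without the n on the second key, 1000 would sort before 200 because the comparison would be character by character.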

    7. Ta-daa! By stringing a few commands together you can do all sorts of good stuff in bash, like counting categories of Pokémon.
  7. Wrapping up for this time, I want to leave you with a few shortcuts and general tips:
    • Get out of trouble with control-c. Use it to stop a running program, or to get your prompt back when the terminal seems stuck (for example, because you accidentally opened a quote and didn’t close it).
    • Use control-a to jump to the beginning of a line you are typing
    • Use control-e to jump to the end of a line you are typing
    • Press tab to complete a command or file name; press it twice to see the options when there is more than one match
    • See your history of commands:
      history
    • Where to get help:
      Most programs will let you type --help to get a quick help printout.

      sort --help

      You can also type in man before a program’s name to get the manual for it.

      man sort

      Of course Google is great when you have a specific thing you are trying to do; look for Stack Overflow results to get the best quick answers.
