UNIX and the bash shell

henrykmwong edited this page Nov 13, 2013 · 16 revisions

Useful shell command syntax

The shell provides several operators for comparing numbers, which are useful in if/else statements.

-eq: equal; -ne: not equal; -ge: greater than or equal to; -le: less than or equal to; -gt: greater than; -lt: less than.

Example: if [ 4 -ge 2 ]; then… (note the square brackets, with a space after [ and before ])

Note: these operators only have meaning inside the test brackets (or the test command); "4 -ge 2" typed on its own is not a valid command.
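As a minimal sketch, the operators above can be used in an if/elif/else statement as follows. The function name compare_numbers is hypothetical and chosen only for illustration.

```shell
# compare_numbers is a hypothetical helper, not part of the original page.
compare_numbers () {
    if [ "$1" -gt "$2" ]; then
        echo "$1 is greater than $2"
    elif [ "$1" -eq "$2" ]; then
        echo "$1 equals $2"
    else
        echo "$1 is less than $2"
    fi
}

compare_numbers 4 2
```

Quoting "$1" and "$2" keeps the test from breaking if an argument is empty.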

Creating and running a shell script file

You can save shell commands in a code file (a shell script file), which should end in the extension .sh. To be safe, so that it executes as a bash shell script from any shell, it's best to put

#!/bin/bash

as the first line of the file, so that it is always executed using the bash shell.

You can run the commands in a shell script in three ways:

./myShellFile.sh

. myShellFile.sh

source myShellFile.sh

To do it the first way, UNIX needs to know that it is an executable file, so you would need to do: chmod u+x myShellFile.sh
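One practical difference between these ways: ./myShellFile.sh runs the script in a new process, while . and source run it in your current shell, so variables set by the script persist afterwards. A small sketch, assuming a throwaway script written to a temporary directory (the directory and the variable name greeting are made up for the example):

```shell
# Create a throwaway script in a temporary directory (paths are illustrative).
demo_dir=$(mktemp -d)
cat > "$demo_dir/myShellFile.sh" <<'EOF'
#!/bin/bash
greeting="hello from the script"
echo "$greeting"
EOF

chmod u+x "$demo_dir/myShellFile.sh"

# Way 1: runs in a new bash process, so 'greeting' is not set afterwards.
"$demo_dir/myShellFile.sh"
echo "after ./ : greeting='${greeting:-}'"

# Way 3 (same as '. file'): runs in the current shell, so 'greeting' persists.
source "$demo_dir/myShellFile.sh"
echo "after source: greeting='${greeting:-}'"
```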

Converting text files between DOS (Windows) and UNIX

Text files created in UNIX have lines that end with the ASCII character \n (i.e., "new line" or "line feed"). Lines in text files created in DOS end with both \r and \n, i.e., \r\n. \r is a "carriage return" and you'll sometimes see it printed as "^M" when you open a DOS text file in a UNIX text editor.

DOS text files with \r in them can sometimes mess things up in UNIX because a given command or program in UNIX doesn't know what to do with the \r's, and at least one of you was running into this when trying to run a shell script file.

On the SCF Linux machines, which run the Ubuntu variant of Linux, you can convert from DOS format to UNIX with fromdos myFile, and convert back with todos myFile.

In other variants of UNIX, you'll need dos2unix and unix2dos in place of fromdos and todos.
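If none of these utilities is installed, the carriage returns can also be stripped with tr. This is a substitute technique rather than what fromdos itself does, and the file names below are hypothetical.

```shell
work_dir=$(mktemp -d)
# Build a small DOS-style file: each line ends in \r\n.
printf 'line one\r\nline two\r\n' > "$work_dir/dosFile.txt"

# Delete every carriage return, leaving UNIX-style \n line endings.
tr -d '\r' < "$work_dir/dosFile.txt" > "$work_dir/unixFile.txt"

# Count the bytes: the UNIX file is shorter by one byte per line.
wc -c < "$work_dir/dosFile.txt"
wc -c < "$work_dir/unixFile.txt"
```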

export

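export -f functionname # export the function so that child bash processes can use it. Plain export does the same for a variable: it takes a variable local to the current shell and adds it to the environment that child processes inherit.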
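A short sketch of the effect, using bash -c to start a child process. The names course and say_course are hypothetical.

```shell
# Hypothetical names for illustration.
course="stat243"
say_course () { echo "course is ${course:-unset}"; }

# Before exporting, a child bash process does not see the variable.
bash -c 'echo "child sees: ${course:-unset}"'

export course          # variable is now inherited by child processes
export -f say_course   # function is now inherited by child bash processes

bash -c 'say_course'
```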

for loop

for i in $(seq 1 1 $num) #create a for loop from 1 to num (an integer variable). Put "read num" before this for loop to read the value of num from input.
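A complete loop needs do and done around the body. A minimal sketch, with num set directly rather than read from input so that it runs unattended:

```shell
num=5          # in an interactive script this could come from: read num
total=0
for i in $(seq 1 1 "$num"); do
    total=$((total + i))
    echo "i=$i running total=$total"
done
```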

function

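Hello () { echo "Hello World $1 $2"; }; Hello arg1 arg2 #This shows how to pass parameters to a function. Note the space after { and the semicolon before }, both of which bash requires on a single line.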
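Written over several lines, the same function might look like the sketch below. The use of $# (the number of arguments) is an addition for illustration.

```shell
Hello () {
    # $1 and $2 are the first two arguments; $# is the argument count.
    echo "Hello World $1 $2 ($# arguments)"
}

Hello arg1 arg2
```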

cut

In addition to selecting columns based on a delimiter using the -d and -f flags, you can also select columns based on fixed width format. For example if you want the 2nd through 5th characters of every line in file.txt:

cut -c2-5 file.txt
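To compare the two modes side by side, here is a small sketch on made-up data piped in from printf instead of file.txt:

```shell
# Hypothetical sample data: three '|'-delimited fields per line.
sample=$(printf 'alpha|10|x\nbravo|20|y\n')

# Characters 2 through 5 of each line.
chars=$(printf '%s\n' "$sample" | cut -c2-5)

# Second field, using '|' as the delimiter.
field=$(printf '%s\n' "$sample" | cut -d'|' -f2)

echo "$chars"
echo "$field"
```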

grep

  • grep -v : selects non-matching lines
  • grep -o : return only the matched part of a line
    
  • grep -i "string" FILE #It searches for the given string/pattern case insensitively
    
  • grep -w "string" FILE #It searches for the string as a whole word and does not match substrings of longer words
    
  • grep -A N "string" FILE #It prints N lines after the match
    
  • grep -B N "string" FILE #It prints N lines before the match
    
  • grep -C N "string" FILE #It shows N lines before and after the match
    
  • grep -r "string" * #Search in all files under the current directory and its subdirectories.
    
  • grep -c "string" FILE #Count how many lines match the given string
    
  • grep -l "string" * #Show the names of the files that contain a match for the given pattern
    
  • grep -o -b "string" FILE #Show the byte offset of each match, counted from the start of the file
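A few of these flags can be tried out on a small file. The sketch below assumes a hypothetical notes.txt created in a temporary directory.

```shell
grep_dir=$(mktemp -d)
printf 'Statistics\nstat243 notes\ncomputing\nSTAT rocks\n' > "$grep_dir/notes.txt"

# -c counts matching lines; -i ignores case.
n_any_case=$(grep -c -i "stat" "$grep_dir/notes.txt")

# -w only matches "stat" as a whole word, so "stat243" and "Statistics" are skipped.
n_word=$(grep -c -i -w "stat" "$grep_dir/notes.txt")

# -v inverts the match: lines without "stat" in any case.
no_stat=$(grep -v -i "stat" "$grep_dir/notes.txt")

echo "$n_any_case $n_word $no_stat"
```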
    

sed

  • sed 's/string1/string2/' FILE1 > FILE2 #Change the first "string1" in FILE1 to "string2" and save to FILE2
  • sed 's/string1/string2/g' FILE1 > FILE2 #Change every "string1" in FILE1 to "string2" and save to FILE2
    
  • Use & as matched pattern:
    
  • echo 123 abc | sed 's/[0-9]*/& &/' shows 123 123 abc 
    
  • Use \1 and \2 to refer to the groups captured by \( \) and rearrange them:
    
  • echo "stat" "comp" | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/' shows comp stat
    
  • sed '2d' filename # remove the 2nd line
    
  • sed '$d' filename # remove the last line
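The commands above can be checked on a few lines of made-up input piped through sed, as in this sketch:

```shell
sed_input=$(printf 'one one\ntwo\nthree\n')

# Without g, only the first "one" on each line is replaced.
first_only=$(printf '%s\n' "$sed_input" | sed 's/one/1/')

# With g, every "one" on each line is replaced.
all=$(printf '%s\n' "$sed_input" | sed 's/one/1/g')

# Delete line 2, then swap two words using \1 and \2.
no_second=$(printf '%s\n' "$sed_input" | sed '2d')
swapped=$(echo "stat comp" | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')

echo "$swapped"
```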
    

awk

  • awk 'BEGIN{FS = "|"}; $15 > 200' FILE #Return the lines whose value in column 15 is greater than 200. The column separator is |
    
  • awk -F"|" '{SUM+=$15;} END {print SUM;}' FILE #Set the column separator to |, calculate the sum of column 15, and print it at the end.
    
  • awk '{temp = $2; $2 = $3; $3 = temp; print}' FILE #Switch column 2 and column 3.
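The same three patterns, sketched on a made-up three-column input so the results are easy to check. Column 2 stands in for column 15 here, and setting OFS is an addition so that the swapped output keeps | as its separator.

```shell
# Hypothetical data: name|count|city, with '|' as the column separator.
awk_input=$(printf 'ann|150|oakland\nbob|250|berkeley\ncat|300|albany\n')

# Keep only lines whose column 2 exceeds 200.
big=$(printf '%s\n' "$awk_input" | awk -F'|' '$2 > 200')

# Sum column 2 and print the total at the end.
sum_col2=$(printf '%s\n' "$awk_input" | awk -F'|' '{SUM += $2} END {print SUM}')

# Swap columns 2 and 3; OFS keeps '|' as the output separator.
swapped_cols=$(printf '%s\n' "$awk_input" | awk 'BEGIN{FS = OFS = "|"} {temp = $2; $2 = $3; $3 = temp; print}')

echo "$sum_col2"
```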
    

ssh

ssh -X username@$1.berkeley.edu ps aux | sed '1d' | grep "exec/R" | sort -k3nr,3 | head -n "$2" >tmp.q #log into the SCF machine named by the first argument ($1) and run ps aux there; then, on the local machine, delete the first (header) line, keep the lines containing "exec/R", sort them numerically in descending order on column 3, keep the first $2 rows, and write the result to a file named tmp.q.
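The filtering part of that pipeline can be tried without logging in anywhere. The sketch below feeds it made-up ps-style output; the users, PIDs and paths are invented for illustration.

```shell
# Made-up 'ps aux'-style output (header plus three processes); values are illustrative only.
fake_ps=$(printf '%s\n' \
  'USER PID %CPU %MEM COMMAND' \
  'ann 101 12.5 1.0 /usr/lib/R/bin/exec/R' \
  'bob 102 97.0 2.0 /usr/lib/R/bin/exec/R' \
  'cat 103 50.0 0.5 /usr/bin/python')

# Drop the header, keep R processes, sort by column 3 (CPU) descending, keep the top 1.
top_r=$(printf '%s\n' "$fake_ps" | sed '1d' | grep "exec/R" | sort -k3nr,3 | head -n 1)
echo "$top_r"
```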

head, tail

  • head -n -3: prints the entire file except for the last 3 lines
    
  • tail -n +3: prints the entire file except for the first 2 lines (i.e., starting from the third line)
    
  •     Even though +3 = 3, "tail -n 3" is quite different from "tail -n +3"!
    
  •     I often forget the finicky details - but it's handy to know that the functionality exists.
    
  •     An easy way to experiment with head/tail (to check the behavior of those edge cases) is to use 'seq' (which prints out a sequence of numbers) e.g., "seq 1 20 | head -n -3"
    
  • head/tail -q: never prints the file names (which would otherwise happen if you have multiple files e.g., 'head file1 file2')    
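Following the seq suggestion, here is a small sketch that captures the output of a few variants so they can be compared. It leaves out head -n -3 because negative counts are a GNU extension and may not work on other systems.

```shell
# seq 1 10 prints the numbers 1 to 10, one per line.
last_three=$(seq 1 10 | tail -n 3)     # the last 3 lines
from_third=$(seq 1 10 | tail -n +3)    # everything from line 3 onward
first_three=$(seq 1 10 | head -n 3)    # the first 3 lines

echo "$last_three"
```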
    

find, locate

  • ``locate <dir_name>``: list the paths whose name contains the given string, e.g. ``locate stat243-fall-2013``
    
  • ``find . -type d -name '<dir_name>'``: find a certain file, in this case a directory with a given name. The syntax can be explained as follows: . is the directory where the search starts. If you can narrow this down to a specific place, the search will be much faster than, say, recursively searching from the root directory. ``-type`` tells find to return only one type of file, and 'd' stands for 'directory'. Without it, find would return files of every type. The flag ``-name`` gives the name to match (shell wildcards are allowed), so with a pattern such as 'stat243-fall-2013*' and no ``-type d``, the file 'stat243-fall-2013.csv' would also be included in the result. Example: ``find . -type d -name 'stat243-fall-2013'``      
    

In general, find is much more powerful than locate. In fact, if you run man find, you will see that you can specify the search depth, and with the -exec flag you can also pass the found results to another command, among other things. locate is usually much faster if you have no idea where the file/directory could be, because it looks names up in a prebuilt database instead of walking the file system.
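A sketch of these find options on a small directory tree built in a temporary location. The tree is made up for the example.

```shell
# Build a small hypothetical directory tree to search.
find_root=$(mktemp -d)
mkdir -p "$find_root/courses/stat243-fall-2013"
touch "$find_root/courses/stat243-fall-2013.csv"

# Without -type, both the directory and the .csv file match the pattern.
all_matches=$(find "$find_root" -name 'stat243-fall-2013*' | wc -l)

# With -type d, only the directory matches.
dir_matches=$(find "$find_root" -type d -name 'stat243-fall-2013*')

# -maxdepth limits how deep find descends; -exec runs a command on each result.
find "$find_root" -maxdepth 2 -type d -name 'stat243*' -exec basename {} \;
```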

Regular Expression Basic Syntax

  • {n} where n is an integer >=1 : repeats the previous item exactly n times
  • a{2} matches aa
    
  • {n,m} where n>=0,m>=n : Repeats the previous item between n and m times
    
  • a{2,4} matches aaaa, aaa, or aa
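These repetition counts can be tested with grep. The sketch assumes grep -E (extended regular expressions, where the braces need no backslashes) and -x (the whole line must match).

```shell
# -E enables extended regular expressions, where {n} and {n,m} need no backslashes.
# -x requires the whole line to match, so "aaaaa" (five a's) is rejected by a{2,4}.
exact_two=$(printf 'a\naa\naaa\n' | grep -E -x 'a{2}')
two_to_four=$(printf 'a\naa\naaa\naaaa\naaaaa\n' | grep -E -x 'a{2,4}')

echo "$exact_two"
echo "$two_to_four"
```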
    

Variables

  • Var1=`head -n1 FILE` # assign the first line of FILE to the variable Var1. Note that the backtick (`) is not a single quote (').
  • declare -a ARRAY # define array ARRAY, here is an example
    •     declare -a Week
      
    •     Week[0]="Sun" #Assign value
      
    •     Week[1]="Mon" #Assign value
      
    •     echo ${Week[1]} #Print value of Week[1]
      
    •     Mon
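Putting the two ideas together in a runnable sketch: FILE here is a hypothetical file created in a temporary directory, and $(...) is used as the equivalent of backticks.

```shell
var_dir=$(mktemp -d)
printf 'first line\nsecond line\n' > "$var_dir/FILE"

# $(...) is equivalent to backticks and is easier to nest.
Var1=$(head -n1 "$var_dir/FILE")

declare -a Week
Week[0]="Sun"
Week[1]="Mon"

echo "$Var1"
echo "${Week[1]}"     # one element
echo "${#Week[@]}"    # number of elements
echo "${Week[@]}"     # all elements
```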
      

Basic calculations

You can use the bc command to do basic calculations. It reads expressions from a file or from standard input, so to evaluate an expression held in a variable or typed on the command line, you need to use a pipe. E.g.,

echo "7 + 8 + 9" | bc
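For whole numbers, bash's built-in arithmetic expansion is an alternative to bc. This is a different technique from the one above, shown here only as a sketch.

```shell
# Integer arithmetic with bash's built-in $(( )) expansion; no external command needed.
sum=$((7 + 8 + 9))
echo "$sum"

# $(( )) truncates division to an integer; use bc when you need decimals.
quotient=$((7 / 2))
echo "$quotient"
```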

OS X idiosyncrasies

Developer tools

If you are working on your own Mac and you haven't programmed much before, you may want to install the Developer Tools from your install DVD (they are not part of the default install). Many Unix and bash tools aren't installed unless you do this.

Even after you do this, there are some Unix utilities that aren't available by default in OSX, and this turns out to include wget. One way to install these things is to use Fink, which has a nice GUI here

external link: http://finkcommander.sourceforge.net/

wget on OSX and curl

In the case of wget, it also turns out that OSX does come with a very similar tool called curl that you can use instead.

cd into aliased directories on OSX

I found a little shell script + bit of code so that Mac OSX will navigate directories aliased in the Finder on the command line (otherwise you can't cd into those directories). This is useful for me because I usually keep all my code in one place and my course materials in another, and wanted to create an aliased subdirectory in my course directory for the R code relevant to this course:

external link: http://hints.macworld.com/article.php?story=20050828054129701

You also need to compile the c code getTrueName.c and put the executable somewhere where it will be on your PATH.

.bash_profile vs .bashrc

If you're on OS X, the system seems to use your .bash_profile rather than .bashrc, so I put this in my .bashrc rather than worry about what gets read when (see bottom of page):

external link: http://www.joshstaiger.org/archives/2005/07/bash_profile_vs.html

Run Multiple Jobs in Background

We can use the UNIX command 'screen' to run multiple jobs in the background.

e.g. Suppose I want to run two R files example1.R and example2.R in the background. Then:

Start the 1st job

  • screen -t 1
  • R --no-save < example1.R

To detach, type Control-a followed by Control-d. To go back, run screen -ls to list your sessions, then type screen -r followed by the ID of the session you want.

Start the 2nd job

  • screen -t 2
  • R --no-save < example2.R

To detach, type Control-a followed by Control-d. To go back, run screen -ls to list your sessions, then type screen -r followed by the ID of the session you want.

Use top to see your job ID. You can close your terminal without interrupting the running