Biotools for Comparative Microbial Genomics Wiki

Unix cheat sheet

This is a very simplified and rough introduction to using a terminal on a unix machine. The unix command line interface is a very powerful environment and there is much more to it than described here.

This document describes:

  • Some useful concepts
  • A brief overview of the command line shell
  • Directories and the file system
  • Working with files
  • Reading the contents of text files
  • Invoking executables
  • Redirection and pipes
  • Permissions and access

Some useful concepts

This is a brief overview of some useful concepts in unix.

  • The shell: Although unix has a graphical interface called X Windows, it's often easier and quicker to run programs by typing commands into a terminal window. Access to unix from other operating systems is usually through a terminal client, e.g. PuTTY for Windows.
  • Users: All programs are run as a specific user, so you have to log into the system as that user with a password.
  • Files and processes: Everything is a file or a process, and the input and output of files and processes can be sent to each other (see redirection and pipes).
  • Permissions: All files, directories and programs have access permissions. A user cannot see the contents of a file or run a program unless the permissions allow it.

A brief overview of the command line shell

The shell is what runs when you open a terminal window. It provides a lot of information and tools to help you run programs.

The command prompt: When you open a terminal, the text at the bottom of the screen next to the cursor will look something like this:

interaction[maq]:/home/projects/MicrobialGenomicsGroup>

This is useful because it tells you who you are and where you are. It can be configured in different ways, but the example above shows the machine name, the username in square brackets '[ ]' and the current directory after the colon ':'.

Command line history and auto-completion:

  • Previously entered commands can be edited or executed again using the up arrow key
  • Filenames can be auto-completed using the tab key

Environment variables:

  • When you log in to a machine or open a terminal, a set of variables, collectively called the environment, is created. These variables do things like telling the shell where to look for programs.
  • The printenv command will list all the environment variables. The list can be quite long.

printenv SHELL

This command prints the executable for the current shell. Unix offers many shells, all of which are subtly different.

printenv HOME

This command prints the location of the user's home directory.

printenv PATH

The PATH variable is particularly important. It consists of a list of directories, separated by colons, which are searched when a command is typed. It is often useful to edit the PATH variable to add directories where executables are stored.

Getting help: Many unix commands have one or more manpages (manual pages). Try typing man commandname to see them.
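As a rough sketch of editing PATH, the lines below add a directory to the search list for the current session only. The directory name my_tools is a made-up example, and the syntax assumes a Bourne-style shell such as sh or bash; other shells (e.g. csh) set variables differently.

```shell
# Hypothetical directory holding your own programs; the name is only an example.
tools_dir="$HOME/my_tools"

# Append it to PATH for the current shell session only.
PATH="$PATH:$tools_dir"
export PATH

# Show the search list with one directory per line.
printenv PATH | tr ':' '\n'

# Ask the shell which file it would run for the command name 'ls'.
ls_location=$(command -v ls)
echo "ls is found at: $ls_location"
```

The change is lost when the terminal is closed. To keep it, the same lines are usually placed in the shell's startup file, whose name depends on the shell in use.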


Directories and the file system

The shell logs you into a directory in the file system. There are some rules about the file system.

  • Case sensitivity: the names of files, directories and executables are case sensitive, so x.txt and X.txt are two different files
  • Path delimiter: the unix shell uses the forward slash '/' to separate files and directories, NOT the backslash '\' like MS-DOS

There are some special characters which are used when defining the location of a file, directory or program:

  • The root directory: The root directory is defined by the single slash '/' and represents the first node of the directory tree. It is similar to 'C:' on a Windows machine.
  • The '.' directory: The '.' character is used to refer to the current directory when it's part of a file path.
  • The home directory: Most users are assigned a home directory where files can be created. This can be referenced using the '~' character.
  • Absolute versus relative paths: Absolute paths are defined from the root directory e.g.
/usr/bin/perl

Relative paths can be defined from the current directory e.g.

./script.pl

Relative paths can be defined from the home directory e.g.

~/script.pl
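The three kinds of path can be tried out safely in a throwaway directory. This is only a sketch: it assumes the mktemp command is available to create a temporary directory, and script.pl here is just an empty placeholder file.

```shell
# Work in a throwaway directory so nothing important is touched.
start_dir=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"
touch script.pl

# A relative path starts from the current directory ...
relative_listing=$(ls ./script.pl)
# ... while an absolute path starts from the root directory '/'.
absolute_listing=$(ls "$workdir/script.pl")
echo "relative: $relative_listing"
echo "absolute: $absolute_listing"

# '~' is expanded by the shell to the home directory stored in HOME.
home_dir=~
echo "home: $home_dir"

# Tidy up.
cd "$start_dir"
rm -r "$workdir"
```

Both ls commands refer to the same file; only the way of naming it differs.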

Here are some useful commands for getting around the filesystem:

  • pwd: prints the current, or "working" directory
  • cd: changes the current working directory to a new location
cd /usr/bin
  • mkdir: creates a new directory
mkdir newdir
  • rmdir: removes a directory. The directory must be empty, so it may not contain any subdirectories or files
rmdir newdir
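The commands above can be combined into a short practice session. The sketch below assumes mktemp is available and works inside a temporary directory; it also shows rmdir refusing to remove a directory that still contains a file (placeholder is a made-up file name).

```shell
start_dir=$(pwd)
workdir=$(mktemp -d)      # temporary practice area
cd "$workdir"

mkdir newdir              # create a directory
cd newdir                 # move into it
inside=$(pwd)             # pwd reports where we are now
echo "now in: $inside"

cd "$workdir"             # go back using an absolute path
touch newdir/placeholder  # put a file inside newdir

# rmdir only removes empty directories, so this attempt is refused
# and an error message is printed.
if rmdir newdir; then
    outcome="removed"
else
    outcome="refused"
fi
echo "rmdir on a non-empty directory: $outcome"

# Empty the directory first, then rmdir succeeds.
rm newdir/placeholder
rmdir newdir

cd "$start_dir"
rmdir "$workdir"
```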

Working with files

Files reside in directories and can contain text or binary information. Files can be created, copied, moved, renamed and deleted with the following commands.

ls: lists the contents of directories. Run just as 'ls', the command lists the contents without any other information. More information can be obtained by supplying some arguments.

list showing permissions, user, group, size, modification date and filename

ls -l

as above but file sizes are printed in human readable form

ls -lh

as above but sort the results by file size in descending order

ls -lhS

as above but in ascending order

ls -lhSr

list the contents of all directories recursively from the current directory

ls -lR

list files sorted by modification time, most recently modified first

ls -lt

as above but in reverse order, so the most recently modified files come last

ls -ltr
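To see the effect of the size sorting options, the following sketch creates three files of different sizes in a temporary directory. The file names are invented for the example, and it assumes mktemp and an ls that supports -S.

```shell
start_dir=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"

# Three files of 1, 10 and 100 bytes.
printf 'a' > small.txt
printf 'aaaaaaaaaa' > medium.txt
printf '%0100d' 0 > large.txt

ls -lhS                      # long listing, largest file first

largest_first=$(ls -S)       # names only, largest first
smallest_first=$(ls -Sr)     # names only, smallest first
echo "$largest_first"
echo "$smallest_first"

cd "$start_dir"
rm -r "$workdir"
```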

touch: used with a filename. If the file doesn't already exist, a new empty file is created. Otherwise the modification time of the file is changed to the current time

touch filename

cp: copies filename1 to filename2 or into a directory and leaves the original file untouched. It can also be used to copy directories

cp original_file new_file
cp file directory/
cp -r directory new_directory

mv: moves or renames a file. Unlike cp, no copy is left at the original location. It can also be used to move or rename whole directories, with no extra option needed

mv original_file new_file
mv file directory/
mv dir1 dir2

rm: deletes a file. With -r it can also be used to delete a directory and its contents. Deleted files are not moved to a recycle bin, so use with care.

rm file
rm -r dir
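A short practice run of the file commands, using the same placeholder names as above. This is a sketch that assumes mktemp is available, so everything happens in a temporary directory.

```shell
start_dir=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"

touch original_file                 # create an empty file
cp original_file new_file           # copy: both files now exist
mkdir directory
mv new_file directory/              # move new_file into the directory
mv original_file renamed_file       # rename within the same directory
cp -r directory new_directory       # copy a directory and its contents

listing=$(ls)                       # what is in the practice area now
echo "$listing"

rm renamed_file                     # delete a single file
rm -r directory new_directory       # delete directories and their contents
remaining=$(ls)                     # nothing should be left

cd "$start_dir"
rmdir "$workdir"
```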

File name advice: It's best not to use spaces or special characters such as " ' < > $ @ in filenames. Underscores '_' and hyphens '-' are fine.
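The trouble with spaces can be seen in this small sketch (file names are invented, mktemp is assumed to be available). The shell splits a command line on spaces, so a name containing one must be quoted every time it is used.

```shell
start_dir=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"

# Without quotes the shell splits on the space, so this creates
# two files called 'my' and 'data.txt'.
touch my data.txt

# With quotes the name stays together as one file called 'my data.txt'.
touch "my data.txt"

# An underscore avoids the problem altogether.
touch my_data.txt

# Count the files that now exist.
file_count=$(( $(ls | wc -l) ))
echo "number of files: $file_count"

cd "$start_dir"
rm -r "$workdir"
```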

Reading the contents of text files

The contents of text files (but not binary files) can be read quite easily through the terminal.

cat: prints the contents of one or more files to the terminal, one after the other (it concatenates them)

cat file1 file2

more: shows the contents of a text file one screen at a time. Press 'q' to return to the prompt

more filename

less: better than more because the up and down keys can be used to scroll up and down. Some useful key commands are:

  • space: scroll forward one screen
  • b: scroll backward one screen
  • g: scroll to the top of the file
  • G: scroll to the end of the file
  • q: quit to the prompt
  • /text: searches the file for the word 'text'

less filename

head: shows the first few lines of the given file(s). A hyphen and number can be passed to determine how much of the file is shown

head filename
head -5 filename

tail: shows the last few lines of the given file(s). A hyphen and number can be passed to determine how much of the file is shown

tail filename
tail -5 filename

grep: searches through text files for a search term and prints matching lines. grep is a complex command and has many options.

to search two files for a search term

grep searchterm file1 file2

to print the lines of two files that do not contain the search term

grep -v searchterm file1 file2

to count the number of matching lines per file

grep -c searchterm file1 file2

wc: counts the number of words or lines a file contains

print the word count of a file

wc -w file.txt

print the line count of a file

wc -l file.txt
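The reading commands can be tried on two small made-up files. This sketch assumes mktemp is available; the gene and species names are invented sample data. It uses head -n 2, which is the more portable spelling of head -2.

```shell
start_dir=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"

# Create two small sample files.
printf 'gene1 coli\ngene2 subtilis\ngene3 coli\ngene4 aureus\n' > file1
printf 'gene5 coli\n' > file2

combined=$(cat file1 file2)          # both files, one after the other
first_two=$(head -n 2 file1)         # first two lines
last_one=$(tail -n 1 file1)          # last line
matching=$(grep coli file1 file2)    # matching lines, prefixed by file name
excluded=$(grep -v coli file1)       # lines without the search term
counts=$(grep -c coli file1 file2)   # matching lines per file
line_count=$(( $(wc -l < file1) ))   # number of lines
word_count=$(( $(wc -w < file1) ))   # number of words

echo "$matching"
echo "$counts"
echo "lines: $line_count, words: $word_count"

cd "$start_dir"
rm -r "$workdir"
```

When grep is given more than one file, each line of its output starts with the name of the file it came from.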

Invoking executables

Some files can be marked as executable which means they can run as programs and perform tasks.

Invoking executables residing in the PATH directories: These executables can be invoked without including the path to the executable. So even though a program called perl is found in /usr/bin, because this directory is normally part of the PATH, it can be called as

perl myscript.pl

Invoking non-PATH executables: These must be called giving an absolute or relative path to the program

/usr/bin/perl myscript.pl
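To check that both ways of calling a program run the same executable, this sketch uses ls as a stand-in, since it exists on any unix machine. The command -v builtin reports the file the shell finds on the PATH; its exact location varies between systems.

```shell
# Find out which file the shell runs for the name 'ls'.
ls_path=$(command -v ls)
echo "ls lives at: $ls_path"

# Call it by name (found via PATH) ...
by_name=$(ls /)
# ... and by its absolute path (PATH is not consulted).
by_path=$("$ls_path" /)

if [ "$by_name" = "$by_path" ]; then
    echo "both calls gave the same listing"
fi
```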

Getting help: there are a number of ways of getting help in unix but they can vary a lot

  • man: many commands have a manual page which can be viewed with e.g. "man ls"
  • whatis: may provide a one-line description of the command e.g. "whatis wc"
  • apropos: returns a list of commands with the keyword in their manual page
  • -h or --help: many programs print a short usage summary when run with one of these options
  • arguments: executables take arguments, e.g. to set options and to name the files to be operated on. Options are usually prefixed by one or two hyphens
  • wildcards: in a filename the asterisk * matches any sequence of characters (including none) whilst the ? character matches exactly one character
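A small sketch of the two wildcards, run in a temporary directory with invented file names (mktemp is assumed to be available). The shell replaces each pattern with the matching file names before the command runs.

```shell
start_dir=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"
touch a.txt ab.txt abc.txt notes.md

# '*' matches any run of characters, so all three .txt files match.
star_matches=$(echo *.txt)
echo "matches for *.txt: $star_matches"

# '?' matches exactly one character, so only ab.txt matches.
question_match=$(echo a?.txt)
echo "matches for a?.txt: $question_match"

cd "$start_dir"
rm -r "$workdir"
```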

Redirection and pipes

Most unix processes write their output to the standard output (the terminal screen) and take their input from standard input (the keyboard). There is also a standard error, which is also usually the terminal screen. These inputs and outputs can be redirected to files. Unfortunately the details of redirection can vary slightly depending on which shell program is being used.

Redirecting output: The standard output can be sent to a file using the > character. If the file already exists, its contents are replaced.

ls -l > list.txt

Appending to a file

echo 'hello' >> list.txt

Redirecting input

sort < filewithdata.txt

Both at the same time

sort < input.txt > output.txt
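The redirections above can be tried together in a temporary directory. This sketch assumes mktemp and a Bourne-style shell (sh, bash and similar); the 2> form for standard error in particular is written differently in csh-style shells. The file names are invented.

```shell
start_dir=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"

printf 'pear\napple\nfig\n' > input.txt    # '>' creates or overwrites a file
sort < input.txt > output.txt              # read from one file, write to another
echo 'zucchini' >> output.txt              # '>>' adds to the end of the file
sorted_contents=$(cat output.txt)
echo "$sorted_contents"

# Standard output and standard error can be sent to different files.
# ls fails here because the file does not exist; '|| true' only stops
# that failure from aborting a script.
ls no_such_file > out.log 2> err.log || true
normal_text=$(cat out.log)     # empty: nothing was listed
error_text=$(cat err.log)      # holds the error message from ls

cd "$start_dir"
rm -r "$workdir"
```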

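Pipes '|' allow a process to send its output directly to another process instead of to a file. Several pipes can be chained to build complex "pipelines" of commands.

The output of one command can be passed to the next as follows. This example counts the entries in the current directory whose names do not end in .txt but do contain '.coli'. Note that grep patterns are regular expressions, not shell wildcards, so the dot is escaped and no asterisks are needed.

ls -1 | grep -v '\.txt$' | grep -c '\.coli'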
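Here is a pipeline of this kind run on some invented file names in a temporary directory (mktemp is assumed to be available). Because grep treats its pattern as a regular expression rather than a shell wildcard, the patterns are written as '\.txt$' (ends in .txt) and '\.coli' (contains .coli).

```shell
start_dir=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"
touch E.coli_K12.fasta E.coli_O157.fasta B.subtilis.fasta E.coli_notes.txt readme.txt

# 1. list one name per line
# 2. drop names ending in .txt
# 3. count the remaining names containing '.coli'
coli_count=$(ls -1 | grep -v '\.txt$' | grep -c '\.coli')
echo "non-text E.coli files: $coli_count"

cd "$start_dir"
rm -r "$workdir"
```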

File system permissions

File system permissions ensure that file contents or executables can only be examined or invoked by users with the correct authorisation. They can often also be a source of problems when using data or programs created by others where you can't access a file or directory due to the permissions set.

To view permissions, type ls -l in a directory containing some files. The output will look like this:

-rw-r----- 1 user1 cdrom 11802 Jul  9 10:02 file.txt

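The 10 character string "-rw-r-----" describes the file type and permissions. The first character shows the type of entry: a hyphen for an ordinary file or d for a directory. In the remaining nine characters, r indicates read permission, w indicates write permission, x indicates that the file can be executed, and a hyphen indicates that the permission has not been granted. There are three sets of r, w, x or - to control access by the user to whom the file belongs, members of the file's group and anyone else, e.g. -rwxrwxrwx means anyone can read, write and execute the file. In the example above, user1 can read and write file.txt, members of the group cdrom can only read it, and everyone else has no access.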

  • whoami: will display your username
  • groups: will display the groups you are a member of
  • chmod: file permissions can be changed using the chmod command

This command takes a complex set of arguments.

  • The user, group or other are represented by u,g and o. a is used to represent all
  • whether permissions are granted or revoked is determined using + or - respectively
  • read, write or execute permissions are represented by r,w and x

So to remove write and execute permissions for the group and others try

chmod go-wx data.txt

to give everyone read and write permissions

chmod a+rw data.txt
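The effect of the two chmod commands can be watched with ls -l. In this sketch data.txt is a fresh empty file in a temporary directory (mktemp is assumed to be available), and cut -c1-10 is used only to keep the first ten characters of the listing, i.e. the permission string.

```shell
start_dir=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"
touch data.txt

# Give everyone read and write permission.
chmod a+rw data.txt
after_grant=$(ls -l data.txt | cut -c1-10)
echo "after a+rw:  $after_grant"

# Take write and execute permission away from group and others.
chmod go-wx data.txt
after_revoke=$(ls -l data.txt | cut -c1-10)
echo "after go-wx: $after_revoke"

cd "$start_dir"
rm -r "$workdir"
```

The owner keeps read and write access throughout, because the second command only names g and o.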

How to count number of columns in tab separated file

Sometimes you may need to know how many columns are in a text file. This command prints the number of columns in the last line of the file, using tab (\t) as the separator. To change the separator to ; change the expression FS="\t" to FS=";". Replace file.txt with the name of your file:

awk 'BEGIN {FS="\t"} END {print NF}' file.txt
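As a quick check, the sketch below builds a small tab separated file with made-up contents (table.tsv is an invented name, mktemp is assumed to be available) and counts its columns. The second awk command is an optional extra: it prints the column count of every line and sort -u reduces that to the distinct values, so a single number means every line has the same number of columns.

```shell
start_dir=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"

# A three-column table: one header line and two data lines.
printf 'id\tname\tlength\n1\tgeneA\t900\n2\tgeneB\t1200\n' > table.tsv

# Number of columns in the last line of the file.
last_line_columns=$(awk 'BEGIN {FS="\t"} END {print NF}' table.tsv)
echo "columns in last line: $last_line_columns"

# Distinct column counts over all lines.
distinct_counts=$(awk 'BEGIN {FS="\t"} {print NF}' table.tsv | sort -u)
echo "distinct column counts: $distinct_counts"

cd "$start_dir"
rm -r "$workdir"
```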