Biotools for Comparative Microbial Genomics Wiki

What is R?

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

R commands

Start R and change the working directory to where the statistics files are found.

R                                   # start R from the shell
setwd("/home/student/Genomes/")     # change the working directory
getwd()                             # check the current working directory
[1] "/home/student/Genomes"
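A small sketch for checking that the working directory really holds the input files used further down (codonUsage.all and aaUsage.all). The helper name use_data_dir and its files argument are hypothetical, not part of R; the sketch only relies on base setwd(), which returns the previous directory, and file.exists().

```r
# Hypothetical helper (not part of R): change to `path`, warn about any of the
# expected input files that are missing there, and return the previous
# working directory invisibly so it can be restored with setwd().
use_data_dir <- function(path, files = c("codonUsage.all", "aaUsage.all")) {
  old <- setwd(path)               # setwd() returns the old directory
  found <- file.exists(files)      # checked relative to the new directory
  if (!all(found)) {
    warning("missing in ", getwd(), ": ", paste(files[!found], collapse = ", "))
  }
  invisible(old)
}

# Usage with the path from the example above:
# old <- use_data_dir("/home/student/Genomes/")
# ... run the analysis ...
# setwd(old)
```

Returning the old directory makes it easy to go back once the analysis is done.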


Reshape table to matrix (heatmap)

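This example illustrates a formatting problem you may run into when working with multiple values per genome: turning a "long" table, with one row per observation, into a "wide" one, with one row per ID. Suppose you have: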

State, Year, Value
KY, 1998, 56
KY, 1997, 78
IL, 1998, 48
IL, 1997, 72

and you want:

State, 1997_value, 1998_value
KY, 78, 56
IL, 72, 48

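Use the reshape() function from base R, where data is the data frame holding the first table. The new columns are named after the value column and the time value (Value.1998 and Value.1997, in order of first appearance) rather than 1997_value and 1998_value: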

reshape(data, idvar="State", timevar="Year", direction="wide")
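A self-contained version of the State/Year example, building the long table in code instead of reading it from a file. Reordering the year columns at the end is an optional extra step, not something reshape() does by itself.

```r
# The long table from the example: one row per State and Year
data <- data.frame(
  State = c("KY", "KY", "IL", "IL"),
  Year  = c(1998, 1997, 1998, 1997),
  Value = c(56, 78, 48, 72),
  stringsAsFactors = FALSE
)

# Wide table: one row per State, one Value.<Year> column per Year
wide <- reshape(data, idvar = "State", timevar = "Year", direction = "wide")

# Optional: put the year columns in ascending order
wide <- wide[, c("State", "Value.1997", "Value.1998")]
print(wide)
```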

Reference the last column of a data frame

codon[,length(codon)]   # length() of a data frame is its number of columns
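A short sketch comparing a few equivalent ways to get the last column. The data frame df and its count values are made up for illustration; the first score values are taken from the codon output further down.

```r
# Toy data frame (hypothetical), laid out like codonUsage.all
df <- data.frame(Name  = c("OrgA", "OrgB"),
                 score = c(3.05528, 3.02789),
                 count = c(10L, 20L))

# For a data frame, length() and ncol() both return the number of columns
stopifnot(length(df) == ncol(df))

last_by_index <- df[, length(df)]                # the last column as a vector
last_by_ncol  <- df[[ncol(df)]]                  # same vector via [[ ]]
last_as_df    <- df[, ncol(df), drop = FALSE]    # kept as a one-column data frame
```

ncol() states the intent more explicitly than length(), and drop = FALSE is useful when later code expects a data frame.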

Heatmaps

Codon usage heatmap

To create a heatmap of the codon usage, follow this pipeline. Make sure that your data structures look like the examples below.

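install.packages("gplots")   # only needed once
library(gplots)
# Read the long table: one row per organism and codon
codon <- read.table("codonUsage.all")
colnames(codon) <- c('Name', 'codon', 'score', 'count')
codon <- codon[1:3]          # keep Name, codon and score; drop count
# One row per organism, one score.<codon> column per codon
test <- reshape(codon, idvar="Name", timevar="codon", direction="wide")
# Numeric matrix of scores with the organism names as row names
codonMatrix <- data.matrix(test[2:length(test)])
rownames(codonMatrix) <- test$Name
# scale="column" centres and scales each codon column across organisms
codon_heatmap <- heatmap.2(codonMatrix,
    scale="column",
    main="Codon usage",
    xlab="Codon fraction",
    ylab="Organism",
    trace="none",
    margins=c(8, 20))
# Copy the on-screen plot to a PDF file, then close the screen device
dev.print(pdf, "codonUsage.pdf")
dev.off()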

The format of each data structure is shown below:

> codon
                                              Name codon   score
1             Acidaminococcus_fermentans_DSM_20731   AAA 3.05528
2             Acidaminococcus_fermentans_DSM_20731   CAA 0.30650
........
> test
                                   Name score.AAA score.CAA score.GAA score.TAA
1  Acidaminococcus_fermentans_DSM_20731   3.05528   0.30650   5.23985   0.15237
65   Acidaminococcus_intestini_RyC-MR95   3.02789   0.91191   4.91588   0.16988
........
> codonMatrix
                                     score.AAA score.CAA score.GAA score.TAA score.ACA
Acidaminococcus_fermentans_DSM_20731   3.05528   0.30650   5.23985   0.15237   0.34499
Acidaminococcus_intestini_RyC-MR95     3.02789   0.91191   4.91588   0.16988   0.98450
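A runnable sketch of the same pipeline on a tiny table, using the two organisms and four codon scores shown above. Two things differ from the pipeline and are choices made here, not part of the original: it uses base R's stats::heatmap() instead of gplots::heatmap.2() so that no extra package is needed, and it opens the PDF device with pdf() before plotting instead of copying the screen with dev.print(). It also strips the "score." prefix that reshape() adds, so the axis shows bare codons. The file name codonUsage_sketch.pdf is hypothetical.

```r
# Two organisms and four codons; scores copied from the output above
org <- c("Acidaminococcus_fermentans_DSM_20731", "Acidaminococcus_intestini_RyC-MR95")
codon <- data.frame(
  Name  = rep(org, each = 4),
  codon = rep(c("AAA", "CAA", "GAA", "TAA"), times = 2),
  score = c(3.05528, 0.30650, 5.23985, 0.15237,
            3.02789, 0.91191, 4.91588, 0.16988),
  stringsAsFactors = FALSE
)

# Same reshaping steps as the pipeline: one row per organism
test <- reshape(codon, idvar = "Name", timevar = "codon", direction = "wide")
codonMatrix <- data.matrix(test[-1])       # drop the Name column
rownames(codonMatrix) <- test$Name

# Remove the "score." prefix added by reshape(), e.g. "score.AAA" -> "AAA"
colnames(codonMatrix) <- sub("^score\\.", "", colnames(codonMatrix))

# Open the PDF device first, draw, then close it
out <- file.path(tempdir(), "codonUsage_sketch.pdf")
pdf(out, width = 10, height = 6)
heatmap(codonMatrix, scale = "column", main = "Codon usage",
        xlab = "Codon fraction", ylab = "Organism",
        margins = c(8, 20), cexRow = 0.8)
invisible(dev.off())
```

Opening the file device up front means the script also works in a non-interactive session (for example with Rscript), where there is no screen device to copy from.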


Amino acid heatmap

To create a heatmap of the amino acid usage, follow this pipeline. Make sure that your data structures look like the examples below.

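library(gplots)
# Read the long table: one row per organism and amino acid
aa <- read.table("aaUsage.all")
colnames(aa) <- c('Name', 'aa', 'score')
# One row per organism, one score.<amino acid> column per amino acid
test <- reshape(aa, idvar="Name", timevar="aa", direction="wide")
# Numeric matrix of scores with the organism names as row names
aaMatrix <- data.matrix(test[2:length(test)])
rownames(aaMatrix) <- test$Name
# Column-scaled heatmap using the cm.colors palette
stat_heatmap <- heatmap.2(aaMatrix,
    scale="column",
    main="Amino acid usage",
    xlab="Amino acid fraction",
    ylab="Organism",
    trace="none",
    margins=c(8, 20),
    col = cm.colors(256))
# Copy the on-screen plot to a PDF file, then close the screen device
dev.print(pdf, "aaUsage.pdf")
dev.off()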
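By default the heatmap clusters the columns too, so the amino acids end up in dendrogram order. A sketch of keeping them in their original order, using base R's stats::heatmap(), where Colv = NA turns off column clustering and reordering; with gplots::heatmap.2 the corresponding settings are, as far as I know, Colv = FALSE together with dendrogram = "row". The matrix holds the scores for two organisms and six amino acids from the aaMatrix output; the output file name is hypothetical.

```r
# Two organisms and six amino acids, scores copied from the aaMatrix output
aaMatrix <- matrix(
  c(8.1275, 9.0013, 7.2975, 10.1203, 5.7992, 3.8577,
    7.7623, 8.7881, 7.0802,  9.7019, 6.3094, 4.0698),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("Acidaminococcus_fermentans_DSM_20731", "Acidaminococcus_intestini_RyC-MR95"),
    c("score.G", "score.A", "score.V", "score.L", "score.I", "score.F")
  )
)

out <- file.path(tempdir(), "aaUsage_sketch.pdf")
pdf(out, width = 10, height = 6)
# Colv = NA: no column dendrogram and no reordering of the columns
hm <- heatmap(aaMatrix, Colv = NA, scale = "column", col = cm.colors(256),
              main = "Amino acid usage", xlab = "Amino acid fraction",
              ylab = "Organism", margins = c(8, 20), cexRow = 0.8)
invisible(dev.off())

# heatmap() invisibly returns the row and column orders it used
print(hm$colInd)
```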

The format of each data structure is shown below:

> aa
                                  Name aa  score
1 Acidaminococcus_fermentans_DSM_20731  G 8.1275
2 Acidaminococcus_fermentans_DSM_20731  A 9.0013
........
> str(aa)
'data.frame':	620 obs. of  3 variables:
$ Name : Factor w/ 31 levels "Acidaminococcus_fermentans_DSM_20731",..:  1 ...
$ aa   : Factor w/ 20 levels "A","C","D","E",..: 6 1 18 10 8 5 20 19 7 9 ...
$ score: num  8.13 9 7.3 10.12 5.8 ...
> test
                                   Name score.G score.A score.V score.L score.I score.F
1  Acidaminococcus_fermentans_DSM_20731  8.1275  9.0013  7.2975 10.1203  5.7992  3.8577
21   Acidaminococcus_intestini_RyC-MR95  7.7623  8.7881  7.0802  9.7019  6.3094  4.0698
........
> aaMatrix
                                     score.G score.A score.V score.L score.I score.F
Acidaminococcus_fermentans_DSM_20731  8.1275  9.0013  7.2975 10.1203  5.7992  3.8577
Acidaminococcus_intestini_RyC-MR95    7.7623  8.7881  7.0802  9.7019  6.3094  4.0698
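In the str(aa) output, 620 observations are exactly 31 organisms times 20 amino acids, i.e. one row per organism and amino acid. If a row is missing, reshape() does not complain; it silently fills the gap with NA, and NA cells can cause problems in the distance and clustering steps of the heatmap. A minimal check before reshaping, assuming the column names Name and aa used above; the helper is_complete_long and the toy table are hypothetical.

```r
# Hypothetical helper: TRUE when every organism (id) has exactly one row
# for every amino acid (key) that occurs anywhere in the table
is_complete_long <- function(df, id = "Name", key = "aa") {
  counts <- table(df[[id]], df[[key]])
  all(counts == 1)
}

# Toy table (made-up names and values): OrgB has no row for amino acid A
aa <- data.frame(
  Name  = c("OrgA", "OrgA", "OrgB"),
  aa    = c("G", "A", "G"),
  score = c(8.1, 9.0, 7.8),
  stringsAsFactors = FALSE
)
print(is_complete_long(aa))

# reshape() still succeeds, but the missing cell becomes NA
test <- reshape(aa, idvar = "Name", timevar = "aa", direction = "wide")
print(test)
```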