source : Most of the material here was borrowed and adapted from
under the Creative Commons Attribution (CC_BY) license.
In 1991, R was created by Ross Ihaka and Robert Gentleman in the Department of Statistics at the University of Auckland. In 1993 the first announcement of R was made to the public. Ross’s and Robert’s experience developing R is documented in a 1996 paper in the Journal of Computational and Graphical Statistics:
Ross Ihaka and Robert Gentleman. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3):299–314, 1996
In 1995, Martin Mächler made an important contribution by convincing Ross and Robert to use the GNU General Public License to make R free software. This was critical because it allowed for the source code for the entire R system to be accessible to anyone who wanted to tinker with it (more on free software later).
In 1996, a public mailing list was created (the R-help and R-devel lists) and in 1997 the R Core Group was formed, containing some people associated with S and S-PLUS. Currently, the core group controls the source code for R and is solely able to check in changes to the main R source tree. Finally, in 2000 R version 1.0.0 was released to the public.
The primary R system is available from the Comprehensive R Archive Network, also known as CRAN. CRAN also hosts many add-on packages that can be used to extend the functionality of R.
The R system is divided into two conceptual parts: the base R system that you download from CRAN, and everything else.
R functionality is divided into a number of packages.
- The base R system contains, among other things, the base package, which is required to run R and contains the most fundamental functions.
- The other packages contained in the base system include utils, stats, datasets, graphics, grDevices, grid, methods, tools, parallel, compiler, splines, tcltk, stats4.
- There are also recommended packages: boot, class, cluster, codetools, foreign, KernSmooth, lattice, mgcv, nlme, rpart, survival, MASS, spatial, nnet, Matrix.
When you download a fresh installation of R from CRAN, you get all of the above, which represents a substantial amount of functionality. However, there are many other packages available:
- the bnlearn package from CRAN
- the Rgraphviz package from the Bioconductor project
To download R, go to CRAN, the Comprehensive R Archive Network. CRAN is composed of a set of mirror servers distributed around the world and is used to distribute R and R packages. Don’t try to pick a mirror that’s close to you; instead use the cloud mirror, https://cloud.r-project.org, which automatically figures it out for you.
A new major version of R comes out once a year, and there are 2-3 minor releases each year. It’s a good idea to update regularly. Upgrading can be a bit of a hassle, especially for major versions, which require you to reinstall all your packages, but putting it off only makes it worse.
For our hands-on sessions in Konstanz, we recommend downloading R version 3.4.3.
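To check which version of R is installed, and which base and recommended packages came with it, something like the following sketch can be run in the console. The variable names are illustrative only, and the exact output depends on your installation.

```r
# Version string of the running R installation
R.version.string

# TRUE if the installed version is at least the one recommended for the course
version_ok = getRversion() >= "3.4.3"
version_ok

# Names of the installed packages with priority "base" and "recommended"
base_pkgs = rownames(installed.packages(priority = "base"))
recommended_pkgs = rownames(installed.packages(priority = "recommended"))
base_pkgs
recommended_pkgs
```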
If you use Homebrew, we do NOT recommend installing R via Homebrew.
The use of X11 (including tcltk) requires XQuartz to be installed since it is no longer part of OS X. Always re-install XQuartz when upgrading your OS X to a new major version.
Much of your time in R will be spent in the R interactive console. This is where you will run all of your code, and it can be a useful environment to try out ideas before adding them to an R script file.
The first thing you will see in the R interactive session is a bunch of information, followed by a “>” and a blinking cursor. It operates on the idea of a “Read, Evaluate, Print loop” (REPL): you type in commands, R tries to execute them, and then returns a result.
Start R: click on the icon on your desktop, or in the Start menu (if you allowed the Setup program to make either or both of these).
Stop R by typing q() at the command prompt.
Remarks
- Note the parentheses (): if you type q by itself, you will get some confusing output which is actually R trying to tell you the definition of the q function; more on this later.
- When you quit, R will ask you if you want to save the workspace (that is, all of the variables you have defined in this session); in general, you should say “no” to avoid clutter and unintentional confusion of results from different sessions.
- If you say “yes” to saving your workspace, it is saved in a hidden file named .RData. By default, when you open a new R session in the same directory, this workspace is loaded and a message informing you so is printed:
[Previously saved workspace restored]
When you start R, a console window is opened. The console has a few basic menus at the top; check them out on your own.
The console is where you enter commands for R to execute interactively, meaning that the command is executed and the result is displayed as soon as you hit the Enter key. The simplest thing you can do with R is arithmetic.
For example, at the command prompt >, type in 2+2 and hit Enter; you will see:
2+2
#> [1] 4
R prints the answer with a preceding “[1]”. Don’t worry about this; for now, think of it as indicating output.
If you type in an incomplete command, R will wait for you to complete it:
> 2 +
+
Any time you hit Return and the R session shows a “+” instead of a “>”, it means it’s waiting for you to complete the command. If you want to cancel a command, you can simply hit “Esc” and R will give you back the “>” prompt.
When using R as a calculator, the order of operations is the same as you would have learnt back in school.
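From highest to lowest precedence:
- Parentheses: ( and )
- Exponents: ^
- Multiplication and division: * and / (equal precedence, evaluated left to right)
- Addition and subtraction: + and - (equal precedence, evaluated left to right)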
3 + 5 * 2
#> [1] 13
Use parentheses to group operations in order to force the order of evaluation if it differs from the default, or to make clear what you intend.
(3 + 5) * 2
#> [1] 16
This can get unwieldy when not needed, but clarifies your intentions.
(3 + (5 * (2 ^ 2))) # hard to read
3 + 5 * 2 ^ 2 # clear, if you remember the rules
3 + 5 * (2 ^ 2) # if you forget some rules, this might help
The text after each line of code is called a “comment”. Anything that follows the hash (or octothorpe) symbol # is ignored by R when it executes code.
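A few more cases of the precedence rules may help, with the result of each expression given in a comment. This is only an illustrative sketch; the expressions are not from the original material.

```r
8 / 4 * 2     # 4: division and multiplication are evaluated left to right
10 - 4 + 2    # 8: subtraction and addition are evaluated left to right
2 ^ 3 ^ 2     # 512: exponentiation is evaluated right to left, i.e. 2 ^ (3 ^ 2)
-2 ^ 2        # -4: the exponent is applied before the minus sign
(-2) ^ 2      # 4: parentheses force the minus sign to be applied first
```

When in doubt about how an expression will be evaluated, adding parentheses is the safer choice.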
Most of R’s functionality comes from its functions. A function takes zero, one or multiple arguments, depending on the function, and returns a value. To call a function, enter its name followed by a pair of parentheses, and include any arguments inside the parentheses.
exp(10)
#> [1] 22026.47
To find out more about a function called function_name, type ?function_name.
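As a small sketch of calling functions with one or more arguments, the lines below use a few functions that ship with R. The particular functions and values are chosen only for illustration.

```r
sqrt(16)                     # one argument; gives 4
round(3.14159, digits = 2)   # second argument passed by name; gives 3.14
log(100, base = 10)          # logarithm to base 10; gives 2
args(round)                  # prints the arguments that round accepts
```

Passing arguments by name, as with digits and base above, makes a call easier to read than relying on their position.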
Which function calculates sums? And what arguments does it take?
We can store values in variables by giving them a name and using the assignment operator = (another form of the assignment operator, <-, is also in use):
x = 1 / 40
# This is equal to :
# x <- 1 / 40
R automatically creates the variable x and stores the value 0.025 in it, but by default doesn’t print anything.
To ask R to print the value, just type the variable name by itself:
x
#> [1] 0.025
Our variable x can be used in place of a number in any calculation that expects a number:
exp(x)
#> [1] 1.025315
Notice also that variables can be reassigned:
x = 100
x used to contain the value 0.025 and now has the value 100.
Assignment values can contain the variable being assigned to:
x = x + 1
x
#> [1] 101
The right-hand side of the assignment can be any valid R expression.
“Create an object called x with the value 7. What is the value of x^x? Save the value in an object called y. If you assign the value 20 to the object x, does the value of y change? What does this indicate about how R assigns values to objects?”
Remarks
Variable names can contain letters, numbers, underscores and periods. They cannot start with a number or an underscore, and cannot contain spaces. Different people use different conventions for long variable names; these include periods.between.words, underscores_between_words and camelCaseToSeparateWords.
What you use is up to you, but be consistent.
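The sketch below shows three common naming styles side by side, and uses make.names() from base R to show how an invalid name would be repaired. The variable names and values are made up for illustration.

```r
# Three common naming styles; all are valid R names
mean.height = 1.75   # periods between words
mean_height = 1.75   # underscores between words
meanHeight  = 1.75   # camel case

# make.names() turns a character string into a syntactically valid name
make.names("2nd value")     # "X2nd.value": the name may not start with a digit or contain a space
make.names("mean_height")   # already valid, so it is returned unchanged
```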
There are a few useful commands you can use to interact with the R session.
The ls() function
ls will list all of the variables and functions stored in the global environment (your working R session):
ls()
#> [1] "x" "y"
Remarks
- Here we didn’t give any arguments to ls, but we still needed to give the parentheses to tell R to call the function.
- If we type ls by itself, R will print out the source code for that function!
The rm() function
You can use rm to delete objects you no longer need:
rm(x)
If you have lots of things in your environment and want to delete all of them, you can pass the results of ls to the rm function:
rm(list = ls())
In this case we’ve combined the two. Just like the order of operations, anything inside the innermost parentheses is evaluated first, and so on.
We’ve also specified that the results of ls should be used for the list argument in rm.
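Because rm(list = ls()) removes everything, a more selective variant can be useful. The sketch below assumes a made-up naming scheme in which temporary objects share the prefix tmp_, and uses the pattern argument of ls to remove only those objects.

```r
# Hypothetical objects: two temporary ones and one we want to keep
tmp_height = 1.8
tmp_weight = 75
other = "keep me"

ls(pattern = "^tmp_")              # "tmp_height" "tmp_weight"

# Remove only the objects whose names start with "tmp_"
rm(list = ls(pattern = "^tmp_"))

remaining = ls()
"other" %in% remaining             # TRUE: other is still there
"tmp_height" %in% remaining        # FALSE: the temporary objects are gone
```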
Remarks
Pay attention when R does something unexpected! Errors are thrown when R cannot proceed with a calculation. Warnings, on the other hand, usually mean that the function has run, but it probably hasn’t worked as expected.
In both cases, the message that R prints out usually gives you clues about how to fix the problem.
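To make the difference between a warning and an error concrete, here is a minimal sketch. The helper classify is hypothetical (it is not part of R) and relies on tryCatch from base R, which goes slightly beyond what this introduction covers.

```r
# Hypothetical helper: reports whether evaluating an expression
# finishes cleanly, raises a warning, or stops with an error
classify = function(expr) {
  tryCatch({
    expr
    "no problem"
  },
  warning = function(w) "warning",
  error = function(e) "error")
}

classify(log(1))     # "no problem"
classify(log(-1))    # "warning": R can return NaN but flags the result
classify(log("a"))   # "error": R cannot take the logarithm of text
```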
Writing functions is simple. Paste the following code into your console:
sum.of.squares = function(x, y) {
  # Square each argument and add the results
  res = x^2 + y^2
  return(res)
}
You have now created a function called sum.of.squares which requires two arguments and returns the sum of the squares of these arguments. Since you ran the code through the console, the function is now available, like any of the other built-in functions within R. Running sum.of.squares(3,4) will give you the answer 25.
sum.of.squares(x = 3, y = 4)
#> [1] 25
# or simply: sum.of.squares(3, 4)
The procedure for writing any other function is similar, involving three key steps: define the function, load the function into the R session, and use the function.
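As one more illustration of defining, loading and using a function, here is a small sketch. The function name fahrenheit_to_celsius and its argument temp_f are made up for this example.

```r
# Define the function: convert a temperature from Fahrenheit to Celsius
fahrenheit_to_celsius = function(temp_f) {
  temp_c = (temp_f - 32) * 5 / 9
  return(temp_c)
}

# Running the definition above in the console loads the function into the session.

# Use the function
fahrenheit_to_celsius(212)   # 100
fahrenheit_to_celsius(32)    # 0
```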
A data frame is a very important data type in R. It’s pretty much the de facto data structure for most tabular data and what we use for statistics.
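Some additional information on data frames:
- They are usually created by read.csv() and read.table().
- They can also be created with the data.frame() function.
- The number of rows and columns can be found with nrow() and ncol(), respectively.
dat = data.frame(id = letters[1:10], x = 1:10, y = 11:20)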
dat
#> id x y
#> 1 a 1 11
#> 2 b 2 12
#> 3 c 3 13
#> 4 d 4 14
#> 5 e 5 15
#> 6 f 6 16
#> 7 g 7 17
#> 8 h 8 18
#> 9 i 9 19
#> 10 j 10 20
Useful functions for data frames:
- head() - shows the first 6 rows
- tail() - shows the last 6 rows
- dim() - returns the dimensions
- nrow() - number of rows
- ncol() - number of columns
- str() - structure of each column
- names() - shows the names attribute for a data frame, which gives the column names
# Load the built-in data set "cars"
data("cars")
Try the useful functions above on the data frame cars:
head(cars)
tail(cars)
dim(cars)
nrow(cars)
ncol(cars)
str(cars)
names(cars)
cars[2, ]   # 2nd row, all columns
cars[, 1]   # all rows, 1st column
cars$speed  # the speed column of the cars data frame
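The same indexing ideas can be combined to add a column or to keep only some rows. The sketch below rebuilds the small dat data frame from earlier; the new column z and the condition x > 8 are chosen only for illustration.

```r
dat = data.frame(id = letters[1:10], x = 1:10, y = 11:20)

dat$z = dat$x + dat$y     # add a new column computed from two existing ones
big = dat[dat$x > 8, ]    # keep only the rows where x is greater than 8

nrow(big)                 # 2
big$z                     # 28 30
mean(dat$y)               # 15.5
```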
Often when we’re coding we want to control the flow of our actions. This can be done by setting actions to occur only if a condition or a set of conditions is met. Alternatively, we can also set an action to occur a particular number of times.
There are several ways you can control flow in R. For conditional statements, the most commonly used approaches are the constructs:
# if
if (condition is true) {
  perform action
}
# if ... else
if (condition is true) {
  perform action
} else {  # that is, if the condition is false,
  perform alternative action
}
Example
if (1 == 1) {
  x = 2
} else {
  x = 3
}
print(x)
#> [1] 2
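Repeating an action a particular number of times is usually done with a for loop. The following is a minimal sketch, not part of the original material, that combines a loop with an if statement; the variable names total and i are arbitrary, and %% is R's remainder operator.

```r
total = 0
for (i in 1:5) {
  if (i %% 2 == 0) {
    total = total + i   # only even values of i are added
  }
}
total   # 6, that is 2 + 4
```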
To install any package, use install.packages():
install.packages("bnlearn") ## install the bnlearn package
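Since installing downloads the package each time, it can be convenient to install only when a package is missing. The helper install_if_missing below is hypothetical (it is not part of R) and is built from requireNamespace() and install.packages(). It is demonstrated on splines, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is not already available
install_if_missing = function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Return (invisibly) whether the package is available afterwards
  invisible(requireNamespace(pkg, quietly = TRUE))
}

install_if_missing("splines")     # ships with R, so no download happens
# install_if_missing("bnlearn")   # would install bnlearn from CRAN if it is absent

library(splines)                  # load the package so its functions can be used
```

Installing is needed only once per R installation, whereas library() has to be called in every new session that uses the package.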
You can’t ever learn all of R, but you can learn how to build a program and how to find help to do the things that you want to do.