Learning Objectives

  • To gain familiarity with the buttons and options in the R GUI
  • To understand variables and how to assign to them
  • To be able to manage your workspace in an interactive R session
  • To be able to use mathematical and comparison operations
  • To be able to call functions

1 Introduction

Source: Most of the material here was borrowed and adapted from

under the Creative Commons Attribution (CC BY) license.

1.1 What is R ?

In 1991, R was created by Ross Ihaka and Robert Gentleman in the Department of Statistics at the University of Auckland. In 1993 the first announcement of R was made to the public. Ross’s and Robert’s experience developing R is documented in a 1996 paper in the Journal of Computational and Graphical Statistics:

Ross Ihaka and Robert Gentleman. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3):299–314, 1996

In 1995, Martin Mächler made an important contribution by convincing Ross and Robert to use the GNU General Public License to make R free software. This was critical because it allowed for the source code for the entire R system to be accessible to anyone who wanted to tinker with it (more on free software later).

In 1996, public mailing lists were created (the R-help and R-devel lists), and in 1997 the R Core Group was formed, containing some people associated with S and S-PLUS. Currently, the core group controls the source code for R, and only its members can check in changes to the main R source tree. Finally, in 2000, R version 1.0.0 was released to the public.

1.2 Design of the R System

The primary R system is available from the Comprehensive R Archive Network, also known as CRAN. CRAN also hosts many add-on packages that can be used to extend the functionality of R.

The R system is divided into 2 conceptual parts:

  1. The “base” R system that you download from CRAN (available for Linux, Windows and Mac, and as source code)
  2. Everything else.

R functionality is divided into a number of packages.

  • The “base” R system contains, among other things, the base package which is required to run R and contains the most fundamental functions.
  • The other packages contained in the “base” system include utils, stats, datasets, graphics, grDevices, grid, methods, tools, parallel, compiler, splines, tcltk, stats4.
  • There are also “Recommended” packages: boot, class, cluster, codetools, foreign, KernSmooth, lattice, mgcv, nlme, rpart, survival, MASS, spatial, nnet, Matrix.

When you download a fresh installation of R from CRAN, you get all of the above, which represents a substantial amount of functionality. However, there are many other packages available:

  • There are over 4000 packages on CRAN that have been developed by users and programmers around the world.
    • Remark for the hands-on session : We will manually install and use the bnlearn package from CRAN
  • There are also many packages associated with the Bioconductor project.
    • Remark for the hands-on session : We will manually install and use the Rgraphviz package from the Bioconductor project


2 Obtaining R

To download R, go to CRAN, the Comprehensive R Archive Network. CRAN is composed of a set of mirror servers distributed around the world and is used to distribute R and R packages. Don’t try to pick a mirror that’s close to you; instead, use the cloud mirror, https://cloud.r-project.org, which automatically figures it out for you.

A new major version of R comes out once a year, and there are 2-3 minor releases each year. It’s a good idea to update regularly. Upgrading can be a bit of a hassle, especially for major versions, which require you to reinstall all your packages, but putting it off only makes it worse.

For our hands-on sessions in Konstanz, R version 3.4.3 is recommended.

2.1 Install R for Windows

  1. Download latest R for Windows : go to https://cloud.r-project.org/bin/windows/base/release.htm
  2. Install R : Leave all default settings in the installation options
Note: c.f.

2.2 Install R for Mac OS

If you are a user of homebrew, we do NOT recommend you install R via homebrew.

  1. Download latest R for Mac OS :
  2. Install R : Leave all default settings in the installation options.
  3. Install XQuartz : Leave all default settings in the installation options.
Note: c.f.


3 Getting Started with R

Much of your time in R will be spent in the R interactive console. This is where you will run all of your code, and it can be a useful environment for trying out ideas before adding them to an R script file.

The first thing you will see in the R interactive session is a bunch of information, followed by a “>” and a blinking cursor. The console operates on the idea of a “Read, Evaluate, Print Loop” (REPL):

  • you type in commands
  • R tries to execute them
  • and then returns a result.

3.1 Start and stop R

Starting R

Click on the icon on your desktop, or in the Start menu (if you allowed the Setup program to make either or both of these).

Stopping R

Stop R by typing q() at the command prompt.

Remarks

  • Note the (): if you type q by itself, you will get some confusing output, which is actually R trying to tell you the definition of the q function; more on this later.
  • When you quit, R will ask you if you want to save the workspace (that is, all of the variables you have defined in this session); in general, you should say “no” to avoid clutter and unintentional confusion of results from different sessions.
    • Note: When you say “yes” to saving your workspace, it is saved in a hidden file named .RData. By default, when you open a new R session in the same directory, this workspace is loaded and a message informing you so is printed:
      • [Previously saved workspace restored]

3.2 Using R as a calculator

When you start R, a console window is opened. The console has a few basic menus at the top; check them out on your own.

The console is where you enter commands for R to execute interactively, meaning that the command is executed and the result is displayed as soon as you hit the Enter key. The simplest thing you can do with R is arithmetic.

For example, at the command prompt >, type in 2+2 and hit Enter; you will see

2+2 
#> [1] 4

R prints the answer with a preceding “[1]”. Don’t worry about this for now; think of it as indicating output.

If you type in an incomplete command, R will wait for you to complete it:

> 2 +
+

Any time you hit return and the R session shows a “+” instead of a “>”, it means it’s waiting for you to complete the command. If you want to cancel a command you can simply hit “Esc” and R will give you back the “>” prompt.

When using R as a calculator, the order of operations is the same as you would have learnt back in school.

From highest to lowest precedence:

  • Parentheses: (, )
  • Exponents: ^
  • Multiply and divide: *, / (equal precedence, evaluated left to right)
  • Add and subtract: +, - (equal precedence, evaluated left to right)

3 + 5 * 2
#> [1] 13

Use parentheses to group operations in order to force the order of evaluation if it differs from the default, or to make clear what you intend.

(3 + 5) * 2
#> [1] 16

This can get unwieldy when not needed, but clarifies your intentions.

(3 + (5 * (2 ^ 2))) # hard to read
3 + 5 * 2 ^ 2       # clear, if you remember the rules
3 + 5 * (2 ^ 2)     # if you forget some rules, this might help

The text after each line of code is called a “comment”. Anything that follows after the hash (or octothorpe) symbol # is ignored by R when it executes code.
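Two precedence cases tend to surprise newcomers and are easy to try in the console. The lines below are a small sketch added for illustration: operators of equal precedence, such as / and *, are evaluated from left to right, and ^ binds more tightly than a leading minus sign.

```r
# Equal precedence: evaluated left to right, so this is (8 / 4) * 2
8 / 4 * 2
#> [1] 4

# Parentheses change the grouping: 8 / (4 * 2)
8 / (4 * 2)
#> [1] 1

# ^ is applied before the leading minus, so this is -(2 ^ 2)
-2 ^ 2
#> [1] -4

# Wrap the negative number in parentheses to square it
(-2) ^ 2
#> [1] 4
```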

3.3 Functions

Most of R’s functionality comes from its functions. A function takes zero, one or multiple arguments, depending on the function, and returns a value. To call a function, enter its name followed by a pair of parentheses, with any arguments inside the parentheses.

exp(10)
#> [1] 22026.47

To find out more about a function called function_name type ?function_name.
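As a small illustration (a sketch that is not part of the original material), the built-in round function takes more than one argument. Arguments can be matched by position or by name, and the built-in args function lists the arguments a function accepts without opening its help page.

```r
# Positional matching: the second argument is the number of digits
round(3.14159, 2)
#> [1] 3.14

# The same call with the second argument named explicitly
round(3.14159, digits = 2)
#> [1] 3.14

# A function with no arguments still needs the parentheses
Sys.Date()

# List the arguments of a function without opening the help page
args(round)
```

Naming arguments makes a call longer, but it tends to be easier to read when a function has many options.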

Exercise

Which function calculates sums? And what arguments does it take?

3.4 Variables and assignment

We can store a value in a variable by giving it a name and using the assignment operator = (another assignment operator, <-, is also in common use):

x = 1 / 40
# This is equal to :
# x <- 1 / 40

R automatically creates the variable x and stores the value 0.025 in it, but by default doesn’t print anything.

To ask R to print the value, just type the variable name by itself:

x
#> [1] 0.025

Our variable x can be used in place of a number in any calculation that expects a number:

exp(x)
#> [1] 1.025315

Notice also that variables can be reassigned:

x = 100

x used to contain the value 0.025 and now it has the value 100.

Assignment values can contain the variable being assigned to:

x = x + 1
x
#> [1] 101

  • The right hand side of the assignment can be any valid R expression.
  • The right hand side is fully evaluated before the assignment occurs.

Exercise

“Create an object called x with the value 7. What is the value of x^x? Save the value in an object called y. If you assign the value 20 to the object x, does the value of y change? What does this indicate about how R assigns values to objects?”

Remarks

Variable names can contain letters, numbers, underscores and periods. They cannot start with a number or an underscore, and they cannot contain spaces. Different people use different conventions for long variable names, including :

  • periods.between.words
  • underscores_between_words
  • camelCaseToSeparateWords

What you use is up to you, but be consistent.
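The naming rules can be checked directly in the console. The following is a minimal sketch with made-up variable names (mean.height, mean_height2, total, Total); it also shows that R is case-sensitive and uses the built-in make.names function to show how R would repair names that are not valid.

```r
# Valid names: letters, digits, periods and underscores
mean.height = 1.75
mean_height2 = 1.80

# R is case-sensitive, so these are two different variables
total = 1
Total = 2
total == Total
#> [1] FALSE

# make.names() shows how R would repair names that are not valid
make.names(c("2nd.try", "my value", "ok_name"))
#> [1] "X2nd.try" "my.value" "ok_name"
```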

3.5 Managing your environment

There are a few useful commands you can use to interact with the R session.

ls() function

ls will list all of the variables and functions stored in the global environment (your working R session):

ls()
#> [1] "x"   "y"

Remarks

  • Note here that we didn’t give any arguments to ls, but we still needed to give the parentheses to tell R to call the function.
  • If we type ls by itself, R will print out the source code for that function!

rm() function

You can use rm to delete objects you no longer need:

rm(x)

If you have lots of things in your environment and want to delete all of them, you can pass the results of ls to the rm function:

rm(list = ls())

Here we’ve combined the two functions. Just like the order of operations, anything inside the innermost parentheses is evaluated first, so ls() runs before rm().

We’ve also specified that the results of ls should be used for the list argument of rm.
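Deleting everything is not always what you want. The sketch below, which uses hypothetical variable names (tmp_a, tmp_b, keep_me), removes only the objects whose names match a pattern. It relies on the pattern argument of ls and on the built-in exists function to check the result; the printed output assumes no other objects with these names are defined.

```r
# Hypothetical variables for illustration
tmp_a = 1
tmp_b = 2
keep_me = 3

# List only the objects whose names start with "tmp_"
ls(pattern = "^tmp_")
#> [1] "tmp_a" "tmp_b"

# Remove just those objects and leave everything else in place
rm(list = ls(pattern = "^tmp_"))

# Check which objects are still defined
exists("tmp_a")
#> [1] FALSE
exists("keep_me")
#> [1] TRUE
```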

Remarks

Pay attention when R does something unexpected! Errors are thrown when R cannot proceed with a calculation. Warnings, on the other hand, usually mean that the function has run, but it probably hasn’t worked as expected.

In both cases, the message that R prints out usually gives you clues about how to fix the problem.
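To see the difference between a warning and an error, the following sketch triggers one of each and captures the message text with the base R functions tryCatch and conditionMessage. The variable names are made up for illustration, and the exact wording of the messages depends on your R version and language settings, so no output is shown.

```r
# as.numeric("abc") normally returns NA together with a warning;
# here tryCatch captures the warning text instead
warning_message = tryCatch(
  as.numeric("abc"),
  warning = function(w) conditionMessage(w)
)
warning_message

# Adding a number and a character string is an error: R cannot proceed.
# Here tryCatch captures the error text instead of stopping
error_message = tryCatch(
  1 + "a",
  error = function(e) conditionMessage(e)
)
error_message
```

In everyday interactive use you would simply read the message in the console; tryCatch is only used here so that both cases can run in one go.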

3.6 Writing functions

Writing functions is simple. Paste the following code into your console:

sum.of.squares = function(x,y) {
  res = x^2 + y^2
  return(res)
}

You have now created a function called sum.of.squares which requires two arguments and returns the sum of the squares of these arguments. Since you ran the code through the console, the function is now available, like any of the other built-in functions within R. Running sum.of.squares(3,4) will give you the answer 25.

sum.of.squares(x = 3, y = 4)
#> [1] 25
# or just RUN : sum.of.squares(3, 4)

The procedure for writing any other functions is similar, involving three key steps:

  • Define the function in an R script
  • Load the function into the R session
  • Use the function
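The three steps above can be sketched as follows. This is only an illustration: the script is written to a temporary file with writeLines so that the example is self-contained, whereas in practice you would save the function in a file from your editor. The base R function source then loads it into the session.

```r
# Step 1: define the function in an R script
# (written to a temporary file here; normally you would use your editor)
script_path = tempfile(fileext = ".R")
writeLines(c(
  "sum.of.squares = function(x, y) {",
  "  res = x^2 + y^2",
  "  return(res)",
  "}"
), script_path)

# Step 2: load the function into the R session
source(script_path)

# Step 3: use the function
sum.of.squares(3, 4)
#> [1] 25
```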

3.7 Data frames

A data frame is a very important data type in R. It’s pretty much the de facto data structure for most tabular data and what we use for statistics.

Some additional information on data frames:

  • Usually created by read.csv() and read.table().
  • Can also be created with the data.frame() function.
  • Find the number of rows and columns with nrow() and ncol(), respectively.
  • Rownames are usually 1, 2, …, n.

Creating data frames

dat = data.frame(id = letters[1:10], x = 1:10, y = 11:20)
dat
#>    id  x  y
#> 1   a  1 11
#> 2   b  2 12
#> 3   c  3 13
#> 4   d  4 14
#> 5   e  5 15
#> 6   f  6 16
#> 7   g  7 17
#> 8   h  8 18
#> 9   i  9 19
#> 10  j 10 20
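In practice, data frames are more often read from a file than typed in. The following sketch assumes a small, made-up CSV file, written to a temporary location so the example is self-contained, and reads it back with read.csv. The names csv_path and dat_csv are hypothetical.

```r
# Write a small CSV file to a temporary location (made-up data)
csv_path = tempfile(fileext = ".csv")
writeLines(c("id,x,y", "a,1,11", "b,2,12", "c,3,13"), csv_path)

# Read the file back in as a data frame; the first line gives the column names
dat_csv = read.csv(csv_path)
dat_csv

nrow(dat_csv)
#> [1] 3
ncol(dat_csv)
#> [1] 3
```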

Useful functions

  • head() - show first 6 rows
  • tail() - show last 6 rows
  • dim() - returns the dimensions
  • nrow() - number of rows
  • ncol() - number of columns
  • str() - structure of each column
  • names() - shows the names attribute for a data frame, which gives the column names.

# Load the built-in data set "cars"
data("cars")

Exercise

Try the useful functions listed above on the cars data frame :

head(cars)
tail(cars)
dim(cars)
nrow(cars)
ncol(cars)
str(cars)
names(cars)

Subsetting data

cars[2,] # 2nd row, all columns
cars[,1] # All rows, 1st column
cars$speed # The speed column of the cars data frame
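Rows can also be selected with a condition instead of a position. The lines below are a small sketch on the same built-in cars data set; the variable name fast is made up for illustration.

```r
data("cars")

# Rows where speed is greater than 20, all columns
fast = cars[cars$speed > 20, ]
nrow(fast)
#> [1] 7

# Select a column by name instead of by position
head(cars[, "dist"])
#> [1]  2 10  4 22 16 10

# Combine a row condition with a column selection
cars[cars$speed > 20 & cars$dist < 60, "speed"]
```

The comparison cars$speed > 20 produces one TRUE or FALSE per row, and only the rows marked TRUE are kept.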

3.8 Flow control

Often when we’re coding we want to control the flow of our actions. This can be done by setting actions to occur only if a condition or a set of conditions is met. Alternatively, we can also set an action to occur a particular number of times.

There are several ways you can control flow in R. For conditional statements, the most commonly used approaches are the constructs:

# if
if (condition is true) {
  perform action
}

# if ... else
if (condition is true) {
  perform action
} else {  # that is, if the condition is false,
  perform alternative action
}

Example

if(1 == 1){
  x = 2
}else{
  x = 3
}
print(x)
#> [1] 2
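Repeating an action a particular number of times is usually done with a for loop, which is not shown above. The following is a minimal sketch that combines a loop with an if statement; the variable names total and i are made up for illustration. It also shows a few comparison operators, which return TRUE or FALSE and can be used as conditions.

```r
# Add up the even numbers from 1 to 5
total = 0
for (i in 1:5) {
  if (i %% 2 == 0) {   # %% gives the remainder after division
    total = total + i
  }
}
total
#> [1] 6

# Comparison operators return TRUE or FALSE
3 < 5
#> [1] TRUE
3 >= 5
#> [1] FALSE
3 != 5
#> [1] TRUE
```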

3.9 Get new functions: Packages

To install any package, use install.packages() :

install.packages("bnlearn")  ## install the bnlearn package
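Installing a package does not make its functions available straight away; it still has to be loaded with library() in each session. The sketch below uses a hypothetical helper, is_installed, built on the base R function requireNamespace, to check whether a package is present before trying to load it.

```r
# Hypothetical helper: TRUE if a package is installed, FALSE otherwise
is_installed = function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# The stats package ships with R, so this should be TRUE
is_installed("stats")
#> [1] TRUE

# Load bnlearn if it is available; otherwise say how to get it
if (is_installed("bnlearn")) {
  library(bnlearn)
} else {
  message("bnlearn is not installed; run install.packages(\"bnlearn\") first")
}
```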

You can’t ever learn all of R, but you can learn how to build a program and how to find help to do the things that you want to do.


