
Stat 216 Lab Intro: Introduction to R

1. Starting R

To start R, click the Windows logo in the lower-left corner of the screen, then click the letter A (or any other single letter) to bring up an alphabetical index of all programs, and select R -> R x64. The first thing you will see is the R console. This is where you enter commands, receive warnings and see the output of your programs.

If at any time you want to learn more about R, visit the R Project webpage at http://www.r-project.org/, which is also where R can be downloaded for free. You can reach it from the Help menu -> R Project home page. For further instruction, such as a more in-depth tutorial, go to the Help menu -> Html help.

2. Scripts and the R-Gui

To start, we will open a script file where we will type and save commands. Go to File → New script; this opens a blank R Editor. The advantage of writing a script rather than entering commands directly into the console is that, for more involved work, it is easier to make modifications and run them again. To run part (or all) of your script, highlight the relevant lines and press Ctrl-R. To select the whole script, press Ctrl-A. You can also run a single line by placing your cursor on it and clicking the ‘run’ button, which sits immediately to the right of the ‘save’ icon just below the menu bar in the R-Gui. This is handy when you want to step through your code line by line.

The blank R Editor (the script file) is where we will type commands and read in data. The output will appear in the R console or, for graphs, in a Graphics window. You can make the Console, Editor or Graphics window active either by clicking on it or by selecting it under Window in the menu bar. Type the commands below into your script or console to explore what happens.

3. Getting help

The most important command is help:

help()
help(matrix)
?matrix

You should see the help file for matrix. It tells you what arguments the function matrix takes and any default values they have. It also describes the object the function returns and provides some examples at the end. Don’t worry if you don’t understand it all right now; in fact, the key to reading any help file is to extract the information you need and not to worry too much about the rest. The commands above only work if you know the exact name of the function you are searching for. Now try typing

?Matrix

Instead of a help file, you should see a message in the R console saying that no documentation for ‘Matrix’ was found. This is because R is CaSe SEnsiTive, so Matrix is not the same as matrix.
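Case sensitivity applies to the names you create as well, not just to function names. A small sketch (the names sales and Sales are chosen only for illustration):

```r
# R treats upper- and lower-case letters as different characters,
# so sales and Sales are two separate variables
sales <- 10
Sales <- 20
sales                      # prints [1] 10
Sales                      # prints [1] 20
identical(sales, Sales)    # FALSE: they hold different values
```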

01/18/21 2

Stat 216 Lab Manual

You can also perform a “fuzzy matching” search when you don’t know the exact name of a function or you are just looking for an idea in general. For example, if we want to find the command for computing the mean of a dataset, we can enter:

??mean

You will be presented with several results whose title, function name, or general concept involves the word “mean”. Each result shows the library (package) that the function belongs to, followed by two colons and then the name of the function. For example, the entry base::colSums tells us that the function colSums in the library base is used to Form Row and Column Sums and Means. Can you find the library and function in the list that would give us the arithmetic mean of a dataset?

4. Arithmetic

Basic arithmetic can be performed in R using the following symbols: +, -, /, *, and ^ or ** (exponents). Note that R follows the usual rules of precedence (brackets first, then exponents, then multiplication and division, then addition and subtraction). Try the following:

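2+3*2^2

R evaluates 2^2 first, then multiplies by 3, then adds 2, giving 14. Even so, it is good practice to use brackets to make the intended order explicit, for example 2+(3*(2^2)).

5. Assigning values to variables

Assign the value 10.6 to a variable named x: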

x<-10.6

The “<-” tells R to take the value to the right of the symbol and store it in a variable whose name is given on the left. You can also use the “=” symbol. When you make an assignment, R does not print out any information. If you want to see what value a variable has, type the name of the variable on a separate line and run it.

6. Names for objects (variables, vectors, functions, matrices)

Descriptive names are better than single-letter names. If a good descriptive name consists of more than one word, use a period rather than a space to separate the words. A name should begin with a letter and contain only letters, numbers, periods and underscores. You should also avoid names that are already in use as functions. For example, t.test is a function that performs one- and two-sample t-tests, so we shouldn’t use t.test as the name of a vector or matrix.

7. Creating Vectors

You can create a vector with the c command.

december.sales <- c(3,5,33,15)
colours <- c('red','blue','green')

The c command produces a vector, and “<-” (or “=”) assigns it to the named object. You can print out an object by typing its name on the next line of your script or in the console.

december.sales
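A short sketch tying together the arithmetic, assignment and naming rules above. The variable names are made up for illustration, and sum(), which adds up the entries of a vector, comes up again in section 10.

```r
# Precedence: ^ is evaluated first, then *, then +
no.brackets   <- 2 + 3 * 2^2     # 2 + 3 * 4 = 14
with.brackets <- (2 + 3) * 2^2   # 5 * 4 = 20
also.power    <- 2**2            # ** is accepted as a synonym for ^, giving 4
no.brackets                      # typing a name on its own line prints its value
with.brackets

# "=" assigns just like "<-"
december.sales = c(3, 5, 33, 15)
total.sales <- sum(december.sales)   # 3 + 5 + 33 + 15 = 56
total.sales
```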



R objects are stored in your workspace (the computer’s memory) while R is running. You can save them to a file on disk at the end of a session, and you can delete them with the “rm” command. The “ls” command lists all the objects you currently have in your workspace. [“ls” stands for list, “rm” stands for remove.]

ls()
rm(colours)
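Because objects live in the workspace until you remove them or end the session, saving one to disk is a separate step. A minimal sketch, assuming a temporary file from tempfile() stands in for a real file name of your choosing:

```r
colours <- c("red", "blue", "green")
december.sales <- c(3, 5, 33, 15)
ls()                        # lists the objects in the workspace

rm(colours)                 # remove one object
"colours" %in% ls()         # FALSE: it is gone from the workspace

# Save an object to a file on disk, remove it, then load it back
sales.file <- tempfile(fileext = ".RData")   # temporary path, used only for this example
save(december.sales, file = sales.file)
rm(december.sales)
load(sales.file)            # recreates december.sales in the workspace
december.sales
```

The check uses "colours" %in% ls() rather than exists("colours") because the graphics package that loads with R also provides a function called colours(), so exists() would still find that one after rm().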

8. Other ways to make vectors

• a vector of consecutive natural numbers
d <- 1:10

• a sequence from 1 to 5 in increments of 0.5
e <- seq(1,5,.5)

• the repeat function, rep
Repeat the numbers 1 to 6, twice:
y <- rep(1:6,2)
Repeat each number in the first vector the number of times given by the matching entry of the second vector:
y <- rep(c(9,8,7), c(1,2,3))

NOTE: It’s not a good idea to assign anything to the object “c”, since this is the name of the function that produces a vector. Other single-letter names to avoid are q, s, t, C, D, F, I, and T.

You can subscript vectors by giving a range of positions in square brackets:

• fourth element of vector e
e[4]

• first through third elements of vector e
e[1:3]

• first and sixth elements of vector e
e[c(1,6)]

• all but the second element of vector e
e[-2]

• all but the first and third elements of vector e
e[-c(1,3)]

9. Matrices

Another common object is a matrix. Here we will set up a 2 row x 3 column matrix whose entries are 5, 11, 15, 21, 25, 7.

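matrix(c(5,11,15,21,25,7),nrow=2,ncol=3)

Notice that R fills the matrix column by column: from top to bottom down the first column, then down the second, and so on. Adding byrow=TRUE makes R fill it row by row instead.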
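To see the difference between the default column-by-column filling and byrow = TRUE, here is a small comparison using the same six entries. The printed layouts in the comments are what the calls should display.

```r
entries <- c(5, 11, 15, 21, 25, 7)

# Default: R fills down each column in turn
by.column <- matrix(entries, nrow = 2, ncol = 3)
by.column
#      [,1] [,2] [,3]
# [1,]    5   15   25
# [2,]   11   21    7

# byrow = TRUE fills across each row instead
by.row <- matrix(entries, nrow = 2, ncol = 3, byrow = TRUE)
by.row
#      [,1] [,2] [,3]
# [1,]    5   11   15
# [2,]   21   25    7
```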



10. Subscripting Matrices

You can access various parts of a matrix using square brackets, as with vectors. First, we’ll build a 4-by-5 matrix whose entries are the natural numbers from one to twenty.

mat <- matrix(1:20,4,5)

• show the first row
mat[1,]

• show the first 3 columns
mat[,1:3]

• show the row 2, column 3 entry
mat[2,3]

• show the matrix, omitting the 3rd column
mat[,-3]

We can ask R to find the total number of entries in a matrix or vector:
length(mat)

We can find the sum of the entries in a matrix or vector:
sum(mat)

We can also find the mean of the entries in a matrix or vector:

mean(mat)
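If you want to check your output, here is a sketch listing the values each command above should return. They follow from R filling 1 to 20 down the columns, so the first column is 1, 2, 3, 4 and the first row is 1, 5, 9, 13, 17.

```r
mat <- matrix(1:20, 4, 5)   # column by column: first column is 1, 2, 3, 4

mat[1, ]          # first row: 1 5 9 13 17
mat[2, 3]         # row 2, column 3: 10
dim(mat[, 1:3])   # 4 3 (all four rows, first three columns)
dim(mat[, -3])    # 4 4 (the 3rd column dropped)
length(mat)       # 20 entries
sum(mat)          # 210, which is 20 * 21 / 2
mean(mat)         # 10.5, which is 210 / 20
```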

If we have a vector, we can find the standard deviation using sd and the variance using var:

book.prices <- c(103.23, 99.52, 68.20)
mean(book.prices)
sd(book.prices)
var(book.prices)
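var and sd compute the sample variance and sample standard deviation, which divide the sum of squared deviations from the mean by n - 1 rather than n: s² = Σ(xᵢ − x̄)² / (n − 1), and s is the square root of s². A sketch that works the formula out by hand and compares it with the built-in functions:

```r
book.prices <- c(103.23, 99.52, 68.20)
n <- length(book.prices)

# Sample variance: subtract the mean from every price, square,
# add the squares, then divide by n - 1
manual.var <- sum((book.prices - mean(book.prices))^2) / (n - 1)

# Sample standard deviation: square root of the sample variance
manual.sd <- sqrt(manual.var)

all.equal(manual.var, var(book.prices))   # TRUE
all.equal(manual.sd, sd(book.prices))     # TRUE
```

all.equal is used instead of == because the two calculations can differ in the last few decimal places due to floating-point rounding.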

11. Reading in External Data from a CSV file

Unfortunately, it is rare to have just a few data points that you do not mind typing in at the prompt. It is much more common to have many data points stored in an external file. Here we will examine how to read a data set from a file using the read.csv function. For the purposes of our course, the data files are in the format called “comma separated values” (csv). That is, each line contains a row of values, which can be numbers or letters, and the values are separated by commas. We also assume that the very first row contains a list of labels, which are used to refer to the different columns of values.

The command to read the data file is read.csv. We have to give the command at least one argument, but we will give three different arguments to show how the command can be used in different situations. The first argument is the name of the file. The second argument indicates whether or not the first row is a set of labels. The third argument indicates that the values on each line are separated by commas. (For read.csv, header=TRUE and sep="," are already the defaults, so writing them out simply makes these choices explicit.) The following command will read in the data and assign it to a variable called trials:



trials <- read.csv(file="simple.csv",header=TRUE,sep=",")

Did you get an error or a warning in your console? You may need to first set your working directory to the STAT 216 lab files directory (that is, the folder where the .csv files for our course have been saved). We set our working directory using the setwd() function, giving the path where the data files will be found in quotation marks, as follows.

If you are working remotely on the campus lab computers, the data files are found by typing
setwd("S:/instructors/Bree_Wilton/Stat216")

If you are working on your home computer, you will need to download the data files from D2L and save them all to one folder on your computer. Once you have saved them, go to that folder and copy the path from the navigation bar at the top. Mine looks like this:
C:\Users\C0248111\OneDrive – Camosun College\Documents\M_drive\STAT 216\R Labs Online F2020\Data
You can copy and paste this inside the brackets in setwd, BUT make sure you change all of the backward slashes “\” to forward slashes “/” or you will get an error. It should look like this:
setwd("C:/Users/C0248111/OneDrive – Camosun College/Documents/M_drive/STAT 216/R Labs Online F2020/Data")

Check that our files are now in the working directory by typing
dir()
You should now see the file simple.csv in the list that is printed in the console. Let’s try again, and look at the dataset by typing trials on a new line afterwards:
trials <- read.csv(file="simple.csv",header=TRUE,sep=",")
trials

We can now access elements of trials just as we did for matrices, by specifying complete rows, complete columns or specific values within square brackets. Can you display the last column of trials? We can also access whole columns of trials by typing trials$mass, or more generally dataset$columnname.
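If you want to practise the working-directory steps without the course files, here is a self-contained sketch. It assumes nothing about your own folders: it writes a small made-up CSV file (the file name example_trials.csv and its values are hypothetical, not the course’s simple.csv) into R’s temporary directory, moves there with setwd(), reads the file back, and then returns to where you started.

```r
# Write a small, made-up CSV file into R's temporary folder
example.dir <- tempdir()
writeLines(c("trial,mass",
             "1,2.5",
             "2,3.1",
             "3,2.8"),
           file.path(example.dir, "example_trials.csv"))

old.wd <- setwd(example.dir)   # setwd() returns the previous working directory
getwd()                        # confirm where R is now looking
files.seen <- dir()            # the CSV file should appear in this listing
files.seen

example.trials <- read.csv(file = "example_trials.csv", header = TRUE, sep = ",")
example.trials                 # print the whole data set
example.trials$mass            # one column, by its label
example.trials[2, ]            # one row, by position

setwd(old.wd)                  # go back to the original working directory
```

Saving the value returned by setwd() is a convenient way to undo the change afterwards, so later commands keep reading from your usual folder.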



Lab Intro Assignment

The assignment will be accepted until 4:30 PM on the due date listed on D2L. Assignments submitted after that point will receive a late penalty. You should try to submit your assignment early, if possible.

Your assignment should be submitted electronically through the D2L Assignment page. Please submit a PDF or Word document. Files that contain several PHOTOS WILL NOT BE ACCEPTED. At the top of the first page of your assignment, type your first and last name, your section and the lab number. For example:

Bree Wilton STAT 216 D08 Lab 1

For this lab please include all code and output for the questions below.

1. Creating vectors

a) Create a vector, x, to contain the values {116, 216, 218, 219}, and display the values in x.
b) Create a vector, y, to contain a sequence of values from 10 to 24 in increments of 2, then display the values in y.
c) Create a vector, z, that repeats the numbers 2 to 5 three times, and display the values in z.
d) Create a vector, w, that contains the words “good” “better” “best”, and display the values in w.

2. Create a matrix:
a) 3 rows and 2 columns, as in the following matrix:
   123   12
   234   23
   345   34
b) Show the second row and first and second column.
c) Calculate the mean of the first column.
d) Calculate the standard deviation of the third row.