Geography 4/595: Geographic Data Analysis
Winter 2019

Exercise 1: Getting and using R and RStudio
Finish by Monday, January 14

1. Introduction

The object of this exercise is to install and set up R and RStudio, and to experiment with some basic procedures. R is actually a computer language (that is quite similar to the S language for data analysis and visualization developed at AT&T’s Bell Labs), but is best thought of as an “environment” for producing both numerical and graphical analyses of data. R has several advantages for us here, because

R has a fairly steep learning curve, which these exercises are designed to diminish. The home page for the “R project” is at http://www.r-project.org

Both the Mac and Windows versions of R have their own built-in GUIs (Graphical User Interfaces), but they are a little idiosyncratic. RStudio (https://www.rstudio.com) is a free and open-source environment for running R, and it looks and behaves virtually the same in both Windows and OS X or MacOS, and so it will be used throughout the course.

Read through the following before beginning…

2. Getting R

R can be downloaded from one of the “CRAN” (Comprehensive R Archive Network) sites. In the US, the main site is at http://cran.us.r-project.org/. To download R, go to a CRAN website, and look in the “Download and Install R” area. Click on the appropriate link.

Windows 10 (and 7 & 8)

Note: Depending on the age of your computer and version of Windows, you may be running either a “32-bit” or “64-bit” version of the Windows operating system. If you have the 64-bit version (most likely), R will install the appropriate version (R x64 3.5.2) and will also (for backwards compatibility) install the 32-bit version (R i386 3.5.2). You can run either, but you will probably just want to run the 64-bit version.

  1. On the “R for Windows” page, for example, click on the “base” link (which should take you to the “Download R-3.5.2 for Windows” page, or you can click directly on this link: http://cran.us.r-project.org/bin/windows/).
  2. On this page, click on either the “base” or the “install R for the first time” link.
  3. On the next page, click on the “Download R 3.5.2 for Windows” link, and save that file to your hard disk when prompted. Saving to the desktop is fine.
  4. To begin the installation, double-click on the downloaded file, or open it from a downloads window. Don’t be alarmed by “unknown publisher”-type warnings. Windows’ UAC (User Account Control) will also worry about an unidentified program wanting access to your computer. Click on “Run”.
  5. Select the proposed options in each part of the install dialog. When the “Select Components” screen appears, just accept the standard choices.

Mac OS X (and MacOS Sierra)

  1. On the “R for Mac OS X” page (http://cran.us.r-project.org/bin/macosx/), there are multiple packages that could be downloaded, but the one you want is one of the two topmost ones. If you are running Mavericks, Yosemite, El Capitan, or MacOS Sierra or High Sierra, download the R-3.5.2.pkg in the next step; if you are running an earlier version of OS X, download the R-3.2.1-snowleopard.pkg in the next step.
  2. To download the package click on the (e.g.) “R-3.5.2.pkg (Latest release)” link.
  3. After the package finishes downloading, right-click in the downloads window of your browser, and click on “Show in Finder” (or just look in the Downloads folder). This will open a new Finder window with the installer package.
  4. Then double-click on the installer package, and after a few screens, select a destination for the installation of the R framework (the program) and the R.app GUI. Note that you will have to supply the Administrator’s password. Close the window when the installation is done.
  5. An application will appear in the Applications folder: R.app. You may want to drag R.app to the Dock to make R easier to start up.

There are three sort of technical “FAQ” pages that contain additional information that may be useful for working out the kinks. These include

3. Set Up

Both the Windows and OS X/MacOS versions of R come with built-in GUIs (graphical user interfaces) that are broadly similar, but there are slight differences in how each works, and what a “best practices” workflow and set of working folders looks like. These differences are obviated by using RStudio (see below).

Windows

It will be useful while running R on Windows to create one (or more) “working folders” that R can use to store its internal workspace (which will appear in that folder as a file named .RData), and into which you can download or create data sets (e.g. in Excel or ArcGIS), or files containing R “source code” or scripts (e.g. using a text editor like the built-in script editor in R). Once that folder is created, a shortcut (icon) on your desktop can be created that points to that working folder while starting up R. On Windows, it is possible to create a number of working folders, in which the data for specific projects can be conveniently stored. For this class, one folder will probably do the job.

To create a working folder,

  1. start Windows Explorer (right-click on the Start button, and click on “Explore”)
  2. browse to or create a new folder that will contain the R data and files (e.g. create a new folder called “geog495” or something). Pick a sensible location for this folder; on Windows 10, probably in the c:\Users\xxxx\Documents\ folder (e.g. c:\Users\bartlein\Documents\)
  3. open that folder by clicking on it, and
  4. create two folders in the geog495 folder you just created called R and data (File > New > Folder etc.). These folders will be empty at first.

To create the desktop shortcut,

  1. find the “R 3.5.2” shortcut (icon) in the Start Menu (Start > Programs > R) or on the desktop.
  2. right-click on the icon, and click on “Create Shortcut”
  3. paste the shortcut back onto the desktop
  4. right-click on the new shortcut, and click on “Properties”
  5. on the “Shortcut” tab, in the field called “Start in:” enter the full path to the R folder you just created (e.g. c:\Users\bartlein\Documents\geog495\R; Windows will help fill this in), and
  6. on the “General” tab, change the name of the shortcut to the working folder name (e.g. “geog495”).

If the shortcut has been properly created, you can click on it to start R, and it will automatically assume that its working folder is the one you created. Other shortcuts and working folders can be created.

OS X and MacOS

The Mac version of R has a built-in Workspace browser, which makes the maintenance of separate workspaces straightforward, and so it is unnecessary to create a desktop shortcut. To create a working folder, the procedure is similar to that on Windows:

  1. click on Finder, and open a new window if necessary, and browse to your User/Documents folder (where User is your user name).
  2. click on File > New Folder, and create a new folder in your User/Documents folder and name this geog495
  3. create subfolders in that folder named R and data.

4. Starting R

To start the R “gui” (graphical user interface), just click on the shortcut you just created (in Windows) or on the R.app GUI (Mac) in the Applications folder (which you can copy to the Dock).

After a brief pause, you should see the message:

R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin15.6.0 (64-bit)
... [Previously saved workspace restored]

appear in the “RConsole” window. In Windows, you can verify that R is looking at the correct folder (working directory) by clicking on File > Change dir… on the RGui menu. If you’re in the folder you just created, fine, otherwise you could browse to it from that dialog.

On the Mac, you will probably be in a default folder. You can use the Misc > Change Working Directory… menu command to browse to the folder you created above. To change the working directory that R starts with each time, you can use the R > Preferences… > General, Startup dialog.
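The working directory can also be checked from the console rather than through the menus. The following is a minimal sketch using the base R functions getwd() and basename(); the variable name wd is just an illustrative choice.

```r
# Ask R which folder it is currently using as its working directory
wd <- getwd()
print(wd)

# Show just the last folder in that path; with the setup described above
# this would be the R subfolder of geog495
basename(wd)
```

If the folder printed is not the one you expect, use the menu commands described above to change it.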

The command window (or RConsole) is where you type commands and view text (as opposed to graphics) results. The prompt is the character “>” (in red in Windows, usually) at the bottom of the text in the R Console window. If you’ve scrolled away from the prompt, typing anything in the window will bounce you back.

The Windows and Mac versions each have built-in “script editors” which are simple text editors that allow commands to be entered, sent to R, and saved for later reuse. On Windows, the script editor is started using the File > New Script dialog, while on the Mac it’s File > New Document. (In RStudio, it’s File > New File > R Script.)

5. Quitting R

There are several ways to quit R: clicking on the “close window” button, choosing File > Exit from the RGui menu on Windows, clicking R > Quit R on the RGui menu on the Mac (or clicking on the power switch), or typing quit() at the command prompt (or more simply q()). (Note that you must type the parentheses.) R will ask if you want to save the current workspace image. In general, you’ll want to do that, but there are cases when you might not want to (e.g. you’ve accidentally deleted some intermediate results).

6. Getting Help

The first thing to do in learning new software is figure out how to get help. R has several approaches:

The key links on the help page are:

  1. “An Introduction to R” (the built-in main manual)
  2. “Packages”, which lists the contents of the basic and added packages that R knows about.
  3. “Search Engine and Keywords” which allows you to search for function names and the keywords associated with each function, and for information on built-in data sets.

On the Mac, the Help menu has a search window as well as a link to the HTML help (Help > R Help).
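Help can also be requested directly from the command line. The sketch below uses the standard functions help(), ?, help.search() and apropos(); the search terms are only examples, and the interactive() guard is there solely so that the sketch does not try to open a help viewer if it is run as a batch script.

```r
# apropos() lists objects whose names match a pattern; here, everything
# whose name starts with "read."
matches <- apropos("^read\\.")
print(matches)

# The calls below open help viewers, so this sketch only runs them in an
# interactive session (at the console you would simply type them).
if (interactive()) {
  help("read.csv")                    # help page for one function
  ?read.csv                           # shorthand for the same page
  help.search("working directory")    # search help pages by keyword
}
```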

One of the issues with R is that error messages can be rather obscure. The most frequent sources of errors are simple typos, followed by those generated by copying and editing code. With time, you’ll develop a feel for what the error messages mean.

7. Installing and Using RStudio

RStudio (http://www.rstudio.com) is an IDE (integrated development environment) that provides a consistent environment for running R across different platforms (i.e. Windows, OS X or MacOS, Linux). The “environment” contains four “panes”, two of which include the standard command-line “console” interface of R, and a code or script editor that is generally more useful than those built into the standard R applications for Windows or the Mac, plus two other panes that provide a graphics window, help window, workspace summary and so on. The panes are tiled, and remain in the foreground, making it a little easier to navigate around the different windows that appear in the Windows and Mac applications. The IDE also provides other nice features that assist in coding in general (like autocompletion), in doing the report writing and documentation required for “reproducible research”, and in developing R packages. RStudio is still under development (the current version is Version 1.1.383), and so there are occasional problems that arise, but most are minor.

Installing RStudio is not too complicated. The download page is at https://www.rstudio.com/products/rstudio/download/, and after a few clicks you can choose the version for the particular operating system (Windows, OS X, several flavors of Linux) that you’re using.

Windows

  1. Clicking on the Windows installation package (e.g. RStudio 1.1.463 - Windows Vista/7/8/10 ) will bring up a standard Windows download dialog box. Save the file to an appropriate place.
  2. Open the downloaded file.
  3. Accept the proposed defaults, and quit the installer when finished. You should see an RStudio icon on the desktop.

Mac

  1. Download the Mac disk-image installation (e.g. RStudio 1.1.463 - Mac OS X 10.6+ (64-bit)). Save the file.
  2. Open the downloaded file (RStudio-1.1.463.dmg) by clicking on it. This will open a dialog box that asks if you want to open the file with the default DiskImageMounter application.
  3. Clicking on ok will open a window showing icons for your Applications folder and the RStudio.app. Drag and drop the RStudio.app onto the Applications folder icon.
  4. If you wish, browse to your Applications folder and drag RStudio.app to the Dock.

RStudio is flexible enough in its layout (Tools > Options > Pane Layout), that individual work habits can be accommodated. A typical layout might be:

Useful menus in RStudio include:

In practice, R scripts (*.R) can be opened or created in the script editing pane. Individual lines of code (or the whole script) can be “run” or sent to the console by selecting them, and clicking on the “Run” icon at the top of the script pane, or by selecting and then pressing Ctrl-Enter (Windows) or Command-Enter (Mac). The standard R command line can also be used in the R Console pane.

Graphical output can be viewed in a larger format by using the Zoom tool on the Plots pane.

Another feature of RStudio is its ability to create “R Notebook” and “R Markdown” documents that combine text, code and the results of executing that code, an element of what is known as “Reproducible Research”. This feature will be discussed further as the course goes on.

8. Projects in RStudio

One very nice feature of RStudio is its ability to create Projects, which help a lot in keeping data (e.g. .csv-type text files, or R’s internal .RData format) and scripts (e.g. files that end in .R, or .Rmd) organized. Also, multiple Projects can reside on the same machine (or user account on the machine). Project folders can be created internally in RStudio, but it may be easier to create the folders outside of RStudio, and then use the File > New Project > Existing Directory dialog to browse to that folder or directory. A useful folder hierarchy uses the two subfolders of the working folder described above: the R one for code (the .RData workspace file, and *.R and *.Rmd source files), and the data one for downloaded data. Then in the New Project dialog, one would browse to, say, c:\Users\bartlein\Documents\geog495\R\ (Windows), or User/Documents/geog495/R/ (OS X/MacOS) to create the Project file (R.Rproj), and download data to c:\Users\bartlein\Documents\geog495\data\ (Windows), or User/Documents/geog495/data/ (OS X/MacOS).

It’s also possible to begin where you left off, by browsing to a Project file, and simply clicking on it.

9. A Data Set

The Summit Cr. geomorphic data set consists of 88 observations of 11 variables along a 0.8-km stretch of Summit Cr. in eastern Oregon. This data set was collected by Pat McDowell, Frank Magilligan and their students as part of their study of the effects of cattle “exclosures” on the morphology of stream channels. They divided this stretch of Summit Cr. into individual “hydrologic units” (HUs) that were either pools, shallow “riffles,” or straight “glides.” The overall study area is divided into three sections: an upstream reach (reach A) in which cattle are permitted to graze, a middle reach (reach B) from which cattle have been excluded, and a downstream reach (reach C), in which cattle were again permitted to graze.

The dataset contains the following information:

Col.  name      scale         R class    Definition
====  ========  ============  =========  =============================================================
1     Location  alphanumeric  character  ID for a particular cross section
2     Reach     nominal       factor     Reach (A=upstream reach (grazed); B=exclosure (no cattle); C=downstream (grazed))
3     HU        nominal       factor     Hydrologic unit type (P=pool; R=riffle; G=“glide” or straightwater stretch)
4     CumLen    ratio         numeric    cumulative distance downstream from the upstream end of the study area (m)
5     Length    ratio         numeric    length of a hydrologic unit (m)
6     DepthWS   ratio         numeric    depth of the channel from the water surface to the bottom (m)
7     WidthWS   ratio         numeric    width of the channel at the water surface (m)
8     WidthBF   ratio         numeric    width of the channel at the bankfull stage (m)
9     HUAreaWS  ratio         numeric    area covered by the hydrologic unit at the water surface (sq m)
10    HUAreaBF  ratio         numeric    area covered by the hydrologic unit at the bankfull stage (sq m)
11    wsgrad    ratio         numeric    water-surface gradient (m/m, i.e. dimensionless)

The above table is sometimes referred to as a “codebook” that provides an expanded definition for each variable. (There is a tradeoff between shortish variable names, which are efficient to type, and longish variable names that are more self-explanatory.)

10. Importing the Data Set

Reading data

R can read data from a number of different sources, including text (ascii) data and the .csv (comma separated values) format of Excel spreadsheets, as well as from an internal format, which is text-based, but not easily readable by humans. R stores the data, names of variables, etc. in an efficient form in its workspace (.RData) that can be saved and reloaded.

At the time of this writing, the most efficient way to open and import a new data set is in .csv format, which can be downloaded from a web page, either the “data sets” page on the course web page, or from a link on one of the exercise pages like this one.

Importing a data set or shape file into R is a two-step procedure: 1) getting or downloading the data set from a server onto the computer you’re using, and 2) reading into R.

To download the Summit Cr. data set, (Step 1)

  1. right-click on a link to a data set on a web page, like this one: [sumcr.csv]
  2. then save the file (using Internet Explorer, click on “Save target as…”; for Firefox, click on “Save link as…”; or using Safari on the Mac, click on “Download Linked Files As…”)
  3. then browse to the data subfolder in the working folder created above, and
  4. save the file.

To read the Summit Cr. data set into R (Step 2), type the following:
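The command itself did not survive in this copy of the exercise, so the following is only a sketch of what the read step might look like, based on the description of read.csv() below. The relative path is an assumption: it supposes that R’s working directory is the geog495/R folder and that sumcr.csv was saved in geog495/data. The name csvfile is an illustrative choice; adjust the path to match your own folders.

```r
# Path to the downloaded file, relative to the working directory.
# ASSUMPTION: the working directory is geog495/R and the file is in
# geog495/data; change csvfile if your folders are arranged differently.
csvfile <- file.path("..", "data", "sumcr.csv")

if (file.exists(csvfile)) {
  # read.csv() returns a data frame; "<-" stores it under the name sumcr
  sumcr <- read.csv(csvfile)
} else {
  # Report where R looked, to make a misplaced file easier to track down
  message("Could not find ", csvfile, " from ", getwd())
}
```

In R 4.0.0 and later, read.csv() reads text columns as character rather than factor unless stringsAsFactors = TRUE is supplied, so the classes reported for Reach and HU may differ from those in the codebook above.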

NOTE: This will only work if the file was downloaded to the data folder in the working folder. If you saved it somewhere else, like your Downloads folder, you should move it into your working folder. On the Mac, you may have to change the working folder using the Misc > Change Working Directory… menu command. On either a PC or a Mac, you can verify that the file is in the right place by typing:
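The original command is not shown in this copy; one way to make the check, sketched here with the base function list.files(), is to list the .csv files in the data folder. The folder location is an assumption (working directory geog495/R, data in geog495/data), and datadir and csvfiles are illustrative names.

```r
# Folder expected to hold the download
# (ASSUMPTION: the working directory is geog495/R)
datadir <- file.path("..", "data")

# List the .csv files found there; character(0) means none were found
csvfiles <- list.files(datadir, pattern = "\\.csv$")
print(csvfiles)

# TRUE only if sumcr.csv is among them
"sumcr.csv" %in% csvfiles
```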

NOTE: Punctuation, spelling and case are important. R is case sensitive; in other words, Sumcr is not the same thing as sumcr, and Read.csv is not the same as read.csv.

If you’re not in the working directory, you can use File > Change dir… (Windows) or Misc > Change Working Directory… (Mac). You can also “strong-arm” the change of the working directory using the setwd() function:

Note the use of the double-backslash “\\” in specifying the folder paths in Windows. (R treats a single backslash “\” as an escape character inside quoted strings, and so the first backslash “escapes” the second, telling R to treat the combination like a single backslash.)
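As a sketch of how setwd() and the doubled backslashes might be used, the lines below take the example Windows path from earlier in this exercise as a placeholder; substitute your own folder. The variable names workdir and workdir_alt are illustrative.

```r
# ASSUMPTION: example path from the text; replace with your own folder
workdir <- "c:\\Users\\bartlein\\Documents\\geog495\\R"

# cat() shows the string as R stores it, with single backslashes
cat(workdir, "\n")

# Forward slashes also work in Windows paths and need no escaping
workdir_alt <- "c:/Users/bartlein/Documents/geog495/R"

# Only change directory if the folder actually exists on this machine
if (dir.exists(workdir)) {
  setwd(workdir)
}

# Confirm where R is now looking
getwd()
```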

The read.csv() function creates a data frame “object” called “sumcr” that contains the data from the .csv file. Note that the data frame object doesn’t need to have the same name as the file, but by convention it usually does. The “<-” arrow is called the “assignment operator”, which, as it sounds, assigns whatever object is to its right to whatever object is to its left, sometimes creating a new object in the process. In reading a line of text, the operator is usually spoken as “gets” as in “the dataframe sumcr gets the contents of the sumcr.csv file.” In newer versions of R, the equals (=) sign can be used, but in most existing texts and .pdf files, the <- version is used.
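A small sketch of the two assignment forms, using made-up object names x and y rather than anything from the data set:

```r
x <- 5    # read as "x gets 5"
y = 5     # the equals sign does the same job at the command line

# Both objects now hold the same value
identical(x, y)
```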

The advantage of the download-first-then-read-in approach is that you have an Excel-editable copy of the data set in your working folder.

An alternative approach for reading data is to use the file.choose() function to browse to a particular file:

This will open a “Select file…” dialog box. There’s a disadvantage to this approach in that it is not “reproducible”–at some later time, you may not be able to recall what file was read in to produce a particular result.
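A sketch of the file.choose() approach follows. Because it opens a dialog box, it is wrapped in an interactive() check so that nothing happens if the lines are run as a batch script; the name csvfile is illustrative.

```r
if (interactive()) {
  # Browse to the file; file.choose() returns its full path as text
  csvfile <- file.choose()

  # Read the chosen file into a data frame
  sumcr <- read.csv(csvfile)

  # Print the path so it can be copied into a script for later reference
  print(csvfile)
}
```

Printing (or otherwise recording) the chosen path is one way to offset the reproducibility problem noted above.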

Looking at the data

The first thing to do is to check to see that R indeed has the Summit Cr. data frame in its workspace. This can be done by typing ls() (the list function) at the command line, or (Windows) clicking on Misc > List objects on the RGui menu.

The data frame can be examined by simply typing the name of the data frame at the command line (e.g. sumcr), which will create a lot of output, or by typing head(sumcr), which lists the first six lines (and guess what tail(sumcr) does...).
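These checks might look like the sketch below. So that the lines run on their own, a small stand-in data frame is created if sumcr has not been read in yet; its values are placeholders and are not Summit Cr. measurements.

```r
# Stand-in with PLACEHOLDER values, used only if sumcr is not yet loaded
if (!exists("sumcr")) {
  sumcr <- data.frame(Reach = rep(c("A", "B", "C"), length.out = 8),
                      WidthWS = seq_len(8))
}

ls()                  # names of the objects in the workspace
head(sumcr)           # first rows of the data frame (six by default)
head(sumcr, n = 3)    # n controls how many rows are shown
tail(sumcr)           # last rows of the data frame
```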

The names() function can be used to get a list of the variables in a data frame, e.g.:
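A minimal sketch of names() in use. If sumcr has not been read in, an empty stand-in with three of the codebook’s variable names is created so the lines still run; the real data frame should list all 11 variables.

```r
# Empty stand-in (no data values), used only if sumcr is not yet loaded
if (!exists("sumcr")) {
  sumcr <- data.frame(Location = character(0),
                      Reach = character(0),
                      WidthWS = numeric(0))
}

names(sumcr)    # the variable (column) names
ncol(sumcr)     # number of variables
nrow(sumcr)     # number of observations
```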

The individual variables are referred to by a “compound” name consisting of the data frame name and the variable name, joined by a dollar sign ($), e.g. sumcr$WidthWS. Note that variable names are case-sensitive too (e.g. the name sumcr$WidthWS is not the same as sumcr$widthws). This manner of referring to variables can be made less cumbersome by using the attach() function. For example, try typing the following (don’t type the material in parentheses, or the comments within a line, just the text in the Courier type face):

Then try typing attach(sumcr), press Enter, and now type WidthWS on the next line (should work ok now).
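The sequence described above might be sketched as follows. The stand-in data frame holds placeholder values and is only created if sumcr is not already loaded; the detach() call at the end is an addition that undoes attach(), and w is an illustrative name.

```r
# Stand-in with PLACEHOLDER values, used only if sumcr is not yet loaded
if (!exists("sumcr")) {
  sumcr <- data.frame(WidthWS = c(1, 2, 3))
}

sumcr$WidthWS     # the compound name always works

attach(sumcr)     # make the variables visible by their short names
w <- WidthWS      # the short name can now be used on its own
print(w)
detach(sumcr)     # remove the data frame from the search path again
```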

11. What to hand in.

Use the summary() function to produce a quick summarization of the data set:
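A sketch of the summary() call is given below; the stand-in data frame (placeholder values, created only if sumcr is not loaded) is there just so the lines run on their own. What you hand in should be the summary of the real Summit Cr. data.

```r
# Stand-in with PLACEHOLDER values, used only if sumcr is not yet loaded
if (!exists("sumcr")) {
  sumcr <- data.frame(Reach = factor(c("A", "B", "C")),
                      WidthWS = c(1, 2, 3))
}

summary(sumcr)            # one block of summary statistics per variable
summary(sumcr$WidthWS)    # summary of a single variable
```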

To print the summary out, select the text, and click on the “print” icon, or use File > Print.

12. Quitting RStudio

R does not automatically save any script files you may have created or any updates that may have been made to .RData, but there are dialogs that should pop up when quitting RStudio. Quit RStudio using the File > Quit Session… menu. A dialog box will pop up saying “Quit R Session, Save workspace image to …” Click on “Save”, and likewise for any .R or .Rmd scripts you may have created.